Reports until 14:14, Wednesday 24 May 2017
H1 ISC (CDS, OpsInfo)
jeffrey.kissel@LIGO.ORG - posted 14:14, Wednesday 24 May 2017 - last comment - 11:31, Thursday 25 May 2017(36381)
First Recovery Problem: ALS X Laser Head Power too High (Because of HVAC Upgrade / X VEA Temperature), Defying Thresholds Momentarily, Causing Virtually Untraceable Errors Everywhere
J. Driggers, K. Izumi, S. Dwyer, J. Kissel, V. Adya, T. Shaffer

We've been able to find green alignment quite easily. Yes!!

However, once we were able to get ALS X well aligned and locked, with Green WFS and ITM camera alignment systems ON, the ALS X Beckhoff state machine turned into a blinking MEDM light show. The arm would remain locked with good alignment, but the ALSX guardian would go into fault.

The obvious symptoms were several momentary errors on the Beckhoff screens for PDH, Fiber PLL, and VCO (reached from the ALS Overview Screen), each of which complained about the others.
We started with slow, calculated attempts to disable various parts of the state machine, e.g.
    - using H1:ALS-X_FIBR_LOCK_LOGIC_FORCE to force the fiber PLL lock, or 
    - by hitting reset (H1:ALS-X_VCO_CONTROLS_CLEARINT) on the H1:ALS-X_VCO_TUNEOFS to reset the frequency finding servo.
We then degraded to a bit of button mashing, after which the state machine would just restore everything to what it was before we started.

Finally, Sheila showed us how to dig down an alternate path for finding errors via the sitemap > SYS > EtherCAT overview and follow the error messages from there. However, the screens only show explanatory text while an error is present, which makes tracing a momentary error frustrating at best. Our path down this rabbit hole was
    sitemap >
        SYS >
            EtherCAT overview > ECAT_CUST_SYSTEM.adl
                X-End PLC2 (because it showed a text "ALS-X; ISC-EX") > H1_X1PLC2.adl
                    Als (which had no text) > H1ALS_X1_PLC2.adl
                        X (had no text) > H1ALS_X1_PLC2_X.adl
                            Potential BUG: On this screen the "Lock" and "Refl" were showing constant errors but "Laser" ended up being the problem
                            Laser (maybe showed only momentary text) > H1ALS_X1PLC2_X_LASER.adl
                                Head > H1ALS_X1_PLC2_X_LASER_HEAD.adl
                                    After careful scrutiny of this screen we found that the ALS-X laser diode 2's power monitor
                                        H1:ALS-X_LASER_HEAD_LASERDIODE2POWERMONITOR
                                    was bouncing between 2.038 and 2.039, which is just hovering along the edge of the user-defined tolerance of
                                        H1:ALS-X_LASER_HEAD_LASERDIODEPOWERTOLERANCE == 0.2
                                    from the user-defined nominal
                                        H1:ALS-X_LASER_HEAD_LASERDIODEPOWERNOMINAL == 1.842
                                    After increasing the threshold on deviations from the nominal from 0.2 to 0.5, the entire state machine became happy and normal.
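For reference, the arithmetic behind that tolerance window can be sketched as below. This is only an illustration: the actual comparison lives in the Beckhoff/TwinCAT code, which is not shown in this log, so the symmetric |reading - nominal| <= tolerance check and the function name deviation_margin are assumptions.

```python
def deviation_margin(reading, nominal, tolerance):
    """Return how far a reading sits inside (+) or outside (-) the window.

    ASSUMPTION: the fault condition is a symmetric window around the nominal,
    i.e. a fault is raised when abs(reading - nominal) > tolerance.
    """
    return tolerance - abs(reading - nominal)


NOMINAL = 1.842  # value of H1:ALS-X_LASER_HEAD_LASERDIODEPOWERNOMINAL quoted above

for tolerance in (0.2, 0.5):          # old and new tolerance settings
    for reading in (2.038, 2.039):    # the two readbacks seen on the screen
        margin = deviation_margin(reading, NOMINAL, tolerance)
        state = "OK" if margin >= 0 else "FAULT"
        print(f"tol={tolerance} reading={reading} margin={margin:+.3f} {state}")
```

Under that assumption, the displayed readbacks sit only about 0.003 to 0.004 inside the 0.2 window, so any brief excursion slightly above them would trip the check, while the 0.5 setting leaves roughly 0.3 of headroom.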

This is a problem we'd never seen before. Upon further inspection while writing this log (because we found it hard to believe that laser diodes would produce *more* power than before), I took a look at the 15 day trend of this laser power vs. the X VEA temperature (as measured by the PCAL Receiver's Temperature Sensor), and indeed, the laser power follows it nicely. We should be prepared for the HVAC upgrade to impact a lot more than just suspension alignments (LHO aLOG 36331).
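If one wanted to put a number on "follows it nicely", a Pearson correlation coefficient between the two trends would be one way to do it. The helper below is a minimal sketch and not something that was run for this log; the function name pearson is hypothetical, and it assumes both trends have already been fetched and resampled onto the same time stamps.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 for pearson(temperature_trend, laser_power_trend) would support the by-eye impression; the two argument names here are placeholders for whatever trend data is pulled.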

Lesson Learned: The state machine for the ALS system is really hard to debug when there are momentary errors.
    - We should change the Beckhoff error reporting to be latching
    - We should change these automatically generated screens to *always* display text, so that one can navigate around them with comfort
    - There may actually be a bug in the reporting system
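To illustrate what "latching" would buy us, here is a minimal sketch of the idea in Python. It is not the TwinCAT implementation; the class name LatchedError and the example message text are hypothetical, chosen only to show the behavior of holding the first error until an operator clears it.

```python
class LatchedError:
    """Hold the first error seen until it is explicitly reset."""

    def __init__(self):
        self.latched = False
        self.message = ""

    def update(self, error_active, message=""):
        # Capture the error the first time it appears; ignore later changes
        # so a momentary fault stays visible after the condition clears.
        if error_active and not self.latched:
            self.latched = True
            self.message = message
        return self.latched

    def reset(self):
        # Operator acknowledgement clears the latch.
        self.latched = False
        self.message = ""


latch = LatchedError()
# Hypothetical sequence of samples: the fault is present for one sample only.
samples = [(False, ""), (True, "laser diode power out of range"), (False, "")]
for active, text in samples:
    latch.update(active, text)

print(latch.latched, latch.message)
```

With a latch like this, the explanatory text would still be on the screen when someone navigates to it, rather than vanishing as soon as the reading drops back inside the window.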
Images attached to this report
Comments related to this report
daniel.sigg@LIGO.ORG - 17:25, Wednesday 24 May 2017 (36397)

Inspecting the TwinCAT code for the laser head indeed revealed a mistake when calling the error handler: the list of error messages was never passed down, which in turn prevents the bits from "lighting up."

jeffrey.kissel@LIGO.ORG - 11:31, Thursday 25 May 2017 (36421)CDS