H1 SUS (CDS, ISC)
jeffrey.kissel@LIGO.ORG - posted 13:20, Wednesday 13 March 2024 - last comment - 15:21, Wednesday 13 March 2024(76343)
Status report: New-and-improved WD system on ETMs looks great; one WD trip at the PUM stage; threshold was arbitrary and then just barely surpassed at 25 [um_RMS]
E. Capote, L. Dartez, J. Kissel, R. Short

As you know, Oli and I installed a revamped watchdog system on the ETMs yesterday (LHO:76305). 
In doing so, we arbitrarily set the watchdog threshold for all stages at 25 [um_RMS] (over the 0.1 to 10 Hz band).

Elenna and Louis reported yesterday evening that we had *one* WD trip of the PUM stage after the lock-loss of the fourth of four successful locks (see the last phrase of the last sentence in LHO:76315). Further, they were concerned because, apparently, "the watchdog reset itself."

To the first point -- that there even was a watchdog trip -- the first attachment shows
    - The ISC_LOCK guardian state (H1:GRD-ISC_LOCK_STATE_N); remember, 600 is NOMINAL_LOW_NOISE
    - The 4 stages of OSEM watchdog "is it tripped?" channels (H1:SUS-ETMX_??_WDMON_STATE, with ?? = M0, R0, L1, and L2 for Main Chain and Reaction Chain top stages, the UIM and PUM stages)
    - The 4 stages of OSEM watchdog "is the trigger actively suggesting it should be tripped?" channels (H1:SUS-ETMX_??_WDMON_CURRENTTRIG)
    - The PUM (L2) stage BLRMS trigger signals themselves, now for the first time in physical units! (H1:SUS-ETMX_L2_WD_OSEMAC_RMSLP_??_OUT16)

One can see that lock losses are typically kicking the PUM, up to 10-20 [um_RMS], but otherwise, the quiescent BLRMS is around 0.5 [um_RMS].
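For anyone wanting to cross-check these numbers offline, here is a minimal sketch of a band-limited RMS over the 0.1 to 10 Hz band. It is an illustration only: the front-end watchdog's actual filter chain is not described in this entry, and the function name `band_limited_rms`, the 64 Hz sample rate, and the test signal are all assumptions made for the example.

```python
import cmath
import math

def band_limited_rms(samples, fs, f_lo=0.1, f_hi=10.0):
    """RMS of `samples` restricted to [f_lo, f_hi] Hz, via a direct DFT.

    Offline illustration only (hypothetical helper), not the front-end code.
    """
    n = len(samples)
    df = fs / n                                   # frequency resolution [Hz]
    k_lo = max(1, math.ceil(f_lo / df))           # skip the DC bin
    k_hi = min((n - 1) // 2, math.floor(f_hi / df))  # stay below Nyquist
    power = 0.0
    for k in range(k_lo, k_hi + 1):
        xk = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                 for i, x in enumerate(samples))
        # Factor 2 accounts for the matching negative-frequency bin.
        power += 2.0 * abs(xk) ** 2 / n ** 2
    return math.sqrt(power)

# Assumed test signal: 1 Hz line (in band) plus a larger 20 Hz line (out of band).
fs = 64.0
n = 640  # 10 s of data
sig = [math.sin(2 * math.pi * 1.0 * i / fs)
       + 5.0 * math.sin(2 * math.pi * 20.0 * i / fs) for i in range(n)]
rms = band_limited_rms(sig, fs)
print(f"band-limited RMS = {rms:.4f}")
```

Only the in-band 1 Hz line contributes, so the result is its amplitude divided by sqrt(2); the out-of-band 20 Hz line is ignored even though it is five times larger.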

To the second point -- we're still investigating, but the second attachment shows that the PUM WD is reset (H1:SUS-ETMX_L2_WDMON_STATE goes from 2 to 1) at 2024-03-13 06:44:32 UTC.

In general, the guardian system has been built on the axiom of "GUARDIAN SHALL NOT TOUCH WATCHDOGS," so we're quite skeptical that this was not human-related. The human action that seems to be coincident with the reset on *this* lock acquisition attempt, as opposed to the three attempts immediately prior, is that the ISC_LOCK guardian was taken to "INIT." It's not a smoking gun, but it's at least a bread crumb. We'll keep looking.

Finally, the third attachment shows an overall assessment of typical maximum values of the BLRMS for all stages. Here's a summary of the assessment in tabular form:

Stage        Quiescent Average Value        Maximum Transient Value
                    [um_RMS]                     [um_RMS]
M0                    0.35                         6.35
L1                    0.32                        18.34
L2                    0.44                        29.02

(where, out of laziness and worry about too crowded a plot, I assume R0 has the same behavior.)

In the meantime, the watchdog is working exactly as expected; let's bump up the PUM WD threshold to 30 [um_RMS].
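As a quick sanity check of the table against the old and proposed thresholds, here is a small sketch. The values are copied from the table above; the helper name `stages_exceeding` is made up for this example, and applying the 30 [um_RMS] value to a single dictionary of maxima is a simplification (the proposal only changes the PUM threshold).

```python
# Maximum transient BLRMS per stage, copied from the table above [um_RMS].
max_transient = {"M0": 6.35, "L1": 18.34, "L2": 29.02}

def stages_exceeding(threshold, maxima=max_transient):
    """Return the stages whose maximum transient exceeds `threshold` (hypothetical helper)."""
    return [stage for stage, value in maxima.items() if value > threshold]

old_threshold = 25.0   # arbitrary value set at install [um_RMS]
new_threshold = 30.0   # proposed PUM (L2) value [um_RMS]

tripped_old = stages_exceeding(old_threshold)
tripped_new = stages_exceeding(new_threshold)
margin_l2 = new_threshold / max_transient["L2"]

print("exceeds 25:", tripped_old)
print("exceeds 30:", tripped_new)
print(f"L2 headroom at 30: {100 * (margin_l2 - 1):.1f}%")
```

With these numbers only L2 exceeds 25 [um_RMS], and nothing exceeds 30 [um_RMS], though the L2 headroom at 30 is only a few percent above the largest transient seen so far.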
Images attached to this report
Comments related to this report
ryan.short@LIGO.ORG - 15:21, Wednesday 13 March 2024 (76354)

No evidence in Guardian logs around the time of the L2 WD untripping:

$: guardctrl log -a 1394347485 -b 1394347491 | grep ETMX
2024-03-13_06:44:27.268440Z ALS_DIFF [DOWN.run] ezca: H1:SUS-ETMX_L1_LOCK_L_RSET => 2.0
2024-03-13_06:44:27.268440Z ALS_DIFF [DOWN.run] ezca: H1:SUS-ETMX_L1_DRIVEALIGN_L2P_RSET => 2.0
2024-03-13_06:44:30.704829Z ISC_LOCK [PREP_FOR_LOCKING.main] ezca: H1:SUS-ETMX_L1_LOCK_P_RSET => 2
2024-03-13_06:44:30.705395Z ISC_LOCK [PREP_FOR_LOCKING.main] ezca: H1:SUS-ETMX_L1_LOCK_Y_RSET => 2
2024-03-13_06:44:31.063310Z ISC_LOCK [PREP_FOR_LOCKING.main] ezca: H1:SUS-ETMX_L3_LOCK_P => ON: FM7
2024-03-13_06:44:32.078345Z ISC_LOCK [PREP_FOR_LOCKING.main] ezca: H1:SUS-ETMX_L3_LOCK_Y => ON: FM7
2024-03-13_06:44:32.634872Z SUS_ETMX EDGE: TRIPPED->RESET
2024-03-13_06:44:32.636179Z SUS_ETMX calculating path: RESET->ALIGNED
2024-03-13_06:44:32.642122Z SUS_ETMX new target: SAFE
2024-03-13_06:44:32.643532Z SUS_ETMX executing state: RESET (9)
2024-03-13_06:44:32.649093Z SUS_ETMX [RESET.main] Turning off all LOCK outputs
2024-03-13_06:44:32.776339Z SUS_ETMX [RESET.main] ezca: H1:SUS-ETMX_M0_LOCK_L_SW2 => 1024
2024-03-13_06:44:32.871341Z ISC_LOCK [PREP_FOR_LOCKING.main] ezca: H1:SUS-ETMX_L2_DAMP_MODE1_RSET => 2
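For reference, the GPS times passed to guardctrl above bracket the reset time quoted in the main entry. A minimal sketch of the conversion, using only the Python standard library and assuming a fixed GPS-UTC offset of 18 leap seconds (valid for dates from 2017-01-01 onward, as of this writing); the helper names are made up for this example:

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
LEAP_SECONDS = 18  # assumed constant GPS-UTC offset; only valid from 2017-01-01 on

def utc_to_gps(t):
    """Convert a timezone-aware UTC datetime to integer GPS seconds."""
    return int((t - GPS_EPOCH).total_seconds()) + LEAP_SECONDS

def gps_to_utc(gps):
    """Convert GPS seconds to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps - LEAP_SECONDS)

def parse_guardian_stamp(stamp):
    """Parse a timestamp in the format shown in the log lines above."""
    return datetime.strptime(stamp, "%Y-%m-%d_%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

# Timestamp of the SUS_ETMX "EDGE: TRIPPED->RESET" line in the log above.
reset_gps = utc_to_gps(parse_guardian_stamp("2024-03-13_06:44:32.634872Z"))
window_start = gps_to_utc(1394347485)
window_end = gps_to_utc(1394347491)

print(reset_gps)
print(window_start.isoformat(), "to", window_end.isoformat())
```

The TRIPPED->RESET edge lands at GPS 1394347490, inside the 1394347485 to 1394347491 window that was searched.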