Reports until 09:08, Wednesday 17 May 2023
H1 CAL (CDS, GRD, OpsInfo)
jeffrey.kissel@LIGO.ORG - posted 09:08, Wednesday 17 May 2023 (69688)
CAL_AWG_LINES guardian not robust across Tuesdays / DAQ Restarts / Model Reboots
J. Kissel, T. Shaffer

TJ noticed that the CAL_AWG_LINES guardian node went into error after Corey and I started the calibration suite this morning (LHO:69684).
After some digging and trending, we realized:
    (1) During yesterday's OMC front-end code restarts and several DAQ restarts (LHO:69655), the CAL_AWG_LINES node had obliviously remained in its LINES_ON state.
    (2) Across several of yesterday evening's post-maintenance lock re-acquisitions, the ISC_LOCK guardian merely re-requested the CAL_AWG_LINES "LINES_ON" state. Since CAL_AWG_LINES was already in the LINES_ON state, no action was taken. However,
    (3) Today, when we took ISC_LOCK to NLN_CAL_MEAS, which requests CAL_AWG_LINES to go to IDLE (through TURN_LINES_OFF), the node tried to access the awg test point it had started at May 16 2023 04:47:00 UTC (i.e. Monday night, May 15 2023 21:47:00 PDT, likely when the PEM team turned the lines back on after being done for the night), and the code fell over.
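
The sequence (1) through (3) can be replayed with a toy model. This is only a sketch to make the failure mode concrete: it does not use the real guardian or awg interfaces, and every class and function name in it (ToyExcitationServer, ToyHandle, ToyNode, StaleHandleError, replay) is hypothetical. The one behavior it borrows from the description above is that re-requesting the state a node is already in takes no action.

```python
class StaleHandleError(RuntimeError):
    """Hypothetical error: the handle outlived the test point it pointed at."""


class ToyExcitationServer:
    """Stand-in for the front end / DAQ. Each restart invalidates old handles."""

    def __init__(self):
        self.generation = 0

    def restart(self):
        self.generation += 1


class ToyHandle:
    """Stand-in for an excitation started before a restart."""

    def __init__(self, server):
        self.server = server
        self.generation = server.generation

    def stop(self):
        if self.generation != self.server.generation:
            raise StaleHandleError("test point no longer exists")


class ToyNode:
    """Minimal node: a request for the current state is a no-op."""

    def __init__(self, server):
        self.server = server
        self.state = "IDLE"
        self.handle = None
        self.log = []

    def request(self, target):
        if target == self.state:
            self.log.append(f"{target}: already there, no action")
            return
        if target == "LINES_ON":
            self.handle = ToyHandle(self.server)
            self.state = "LINES_ON"
        elif target == "IDLE":
            try:
                self.handle.stop()
                self.handle = None
                self.state = "IDLE"
            except StaleHandleError as err:
                self.state = "ERROR"
                self.log.append(f"ERROR: {err}")


def replay():
    server = ToyExcitationServer()
    node = ToyNode(server)
    node.request("LINES_ON")  # lines started Monday night
    server.restart()          # Tuesday restarts, node stays in LINES_ON
    node.request("LINES_ON")  # lock re-acquisition re-request, no action
    node.request("IDLE")      # NLN_CAL_MEAS: the stale handle is touched
    return node.state, node.log
```

In this toy, the re-request after the restart never rebuilds the handle, so the first attempt to turn the lines off is the first time the stale handle is touched, and that is where the node goes into error.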

(A) There's nothing to be really sad about here. It just means we missed a few thermalizations.
(B) I don't think it's urgent to somehow *make* the calls to python / guardian / AWG system robust across computer reboots and DAQ restarts, though see thoughts below.
(C) Hopefully, I'll have enough of a plan on what to do about the IFO's thermalization (continuing with all the actions in LHO:69593) that I won't need this guardian.
(D) The good enough "solution" is for a human to check the logs, confirm it died because it couldn't access test points that don't exist, and "just" reload the guardian just before we're done with our calibration suite in NLN_CAL_MEAS. I would expect this only needs doing across a Tuesday, and there's only one of those left before the observing run.

In the fullness of time, though, if we do want to start using guardians to drive awg, we will need to figure out a way around this. 
For example, currently, as the CAL_AWG_LINES and awg processes are written, we can't "just" ask the guardian to cycle through its LINES ON and LINES OFF states, because the code is in error. It's also not really good practice for a guardian manager node (in this case ISC_LOCK) to "just assume" what's going on and toggle the LOAD button.
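
One possible direction, sketched here only as an illustration and not as what CAL_AWG_LINES does today, is for the node itself to tolerate stale handles when turning the lines off, so that no manager has to guess and hit LOAD. The sketch rests on a loud assumption: a handle whose test point no longer exists means the restart already removed the excitation, so there is nothing left to stop. The names turn_lines_off, StaleHandleError and FakeHandle are hypothetical and stand in for whatever the real awg calls would raise and return.

```python
class StaleHandleError(RuntimeError):
    """Hypothetical: raised when a handle's test point no longer exists."""


def turn_lines_off(handles, log):
    """Stop every excitation handle, discarding the ones that went stale.

    Assumption: a stale handle means a restart already cleared the
    excitation, so skipping it is safe. Returns the number of stale
    handles that were discarded.
    """
    stale = 0
    for handle in handles:
        try:
            handle.stop()
        except StaleHandleError as err:
            stale += 1
            log.append(f"discarding stale handle: {err}")
    handles.clear()  # whether stopped or stale, no handle is kept around
    return stale


class FakeHandle:
    """Hypothetical handle used only to exercise turn_lines_off."""

    def __init__(self, alive):
        self.alive = alive
        self.stopped = False

    def stop(self):
        if not self.alive:
            raise StaleHandleError("test point not found")
        self.stopped = True
```

Logging each discarded handle keeps the human check from (D) possible after the fact, and a nonzero return value gives a downstream state something to act on, for example refusing to report the lines as cleanly turned off.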