While locking DRMI, we noticed that the POP_90 signal was much larger than normal. Evan noticed that PRM was still aligned during this phase, when it (and SRM) should normally be misaligned. The PRM guardian showed a request of 'MISALIGNED' while the state readback still showed 'ALIGNED'. Looking at the guardian log for PRM, I see that it stopped logging ~3 hours before we started trying to lock. See attached screenshot for error messages.
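The symptom described above (request and state readback disagreeing, plus a log that has gone quiet) can be written down as a simple check. What follows is only a minimal sketch in plain Python, not the Guardian API and not any existing site script: the `NodeStatus` record, the `looks_hung` helper, and the 10-minute silence threshold are all hypothetical, and the SUS_SRM values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one guardian node (not a real Guardian type)."""
    name: str
    request: str                   # state that was requested
    state: str                     # state the node reports being in
    seconds_since_last_log: float  # time since the node last wrote to its log


def looks_hung(node, max_log_silence_s=600.0):
    """Flag a node whose state disagrees with its request AND whose log is stale.

    The 600 s threshold is an arbitrary assumption, chosen only for this example.
    """
    mismatch = node.request != node.state
    silent = node.seconds_since_last_log > max_log_silence_s
    return mismatch and silent


# SUS_PRM values follow the entry above; SUS_SRM values are illustrative only.
nodes = [
    NodeStatus("SUS_PRM", "MISALIGNED", "ALIGNED", 3 * 3600.0),
    NodeStatus("SUS_SRM", "MISALIGNED", "MISALIGNED", 5.0),
]

hung = [n.name for n in nodes if looks_hung(n)]
print(hung)
```

Requiring both conditions avoids flagging a node that is simply mid-transition (mismatch but still logging) or one that is idle in its requested state (quiet log but no mismatch).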
J. Kissel, N. Kijbunchoo, T. Sadecki
We tried several things to resurrect / fix the problem:
- Switching the operation mode from EXEC to PAUSE and back, and from EXEC to STOP and back
- Restarting the node from the command line
- Stopping the node from the command line
- Destroying the node from the command line
all to no avail.
This doesn't seem to be a show-stopping problem, so we're just going to continue as is and email people.
The procedure given in LHO aLOG entry 16880 has been 100% successful in restoring hung Guardian nodes at LLO. We have found that DAQ restarts are usually responsible for causing nodes to hang, hence we reboot the Guardian/script machine following Tuesday maintenance as a preventative measure. N.B. Jamie has also provided a script to help expedite identifying the hung nodes; see LLO aLOG entry 20615.
J. Kissel, T. Sadecki
Stuart! You rock! We followed the "procedure" from LHO aLOG 16880, and now SUS_PRM is no longer a member of the walking dead. The PRM node is now responsive to requests and has been remanaged by the ALIGN_IFO manager. Very good!