Displaying report 1-1 of 1.
Reports until 18:40, Friday 01 May 2015
H1 GRD (DetChar, GRD, ISC)
jeffrey.kissel@LIGO.ORG - posted 18:40, Friday 01 May 2015 - last comment - 19:34, Friday 01 May 2015(18162)
ISC_DRMI Guardian Complains of Dead Channels, Causing Locking Procedure to Stall
J. Kissel, N. Kijbunchoo, K. Izumi

While I was peacefully explaining tilt-horizontal coupling to Nutsinee and waiting for the DRMI to lock, it acquired, but the ISC_DRMI guardian node then got stuck in the DRMI_1F_LOCKED_ASC state, complaining in the SPF DIFFs that the channel H1:ASC-INMATRIX_P_1_9 (the REFLA RF9I to INP1_P element) is dead. Kiwamu pointed us to Jamie's solution from the last time this occurred (see LHO aLOG 17545 for the problem, and LHO aLOG 17548 for the fix), but this time we're 100% confident that no one has made any change to the guardian code.

I've tried reloading the guardian code, but that's all I'm willing to do. We've been working so hard to get DRMI up since we lost lock from violin mode problems.

I note that Dave has been reporting that the guardian machine is grossly overloaded today (LHO aLOG 18152), but at this point I can only claim these two things are connected anecdotally.

I've left a message at the Guardian Help Desk.
Comments related to this report
jeffrey.kissel@LIGO.ORG - 19:34, Friday 01 May 2015 (18163)
D. Barker, J. Kissel, and then J. Rollins

Before the call from Jamie:
Dave and I tried chasing a solution to the above problem a little further by not just reloading the guardian code, but restarting the node as Sheila had done in LHO aLOG 17545. Unlike that previous situation, though, the problem cleared with the restart. Regrettably, the node came up in "INIT" and when I tried requesting the same state it had frozen in, "DRMI_1F_LOCKED_ASC," it dropped the DRMI lock. It didn't kill the ALS COMM over DIFF lock though, which is nice.

Then with Jamie on the phone:
By then DRMI had recovered to LOCK_DRMI_1F, but remained stationary because when you restart a subordinate node, its manager -- in this case ISC_LOCK -- loses management possession and doesn't know what to do. The trouble is that the only way for ISC_LOCK to regain possession of all of its subordinates is to go to the INIT state, which I didn't want to do because we already had ALS COMM, ALS DIFF, and DRMI locked up.

Jamie recommended that I try to force ISC_LOCK to regain possession as follows:
- Put the ISC_LOCK in MANUAL mode (via the "all" states subscreen)
- Jump to INIT (this *worked* and repossessed the ISC_DRMI node)
- Jump back to the state it *was* in before it had been put into manual, and switch back to EXEC.
However, upon switching back to EXEC, the IFO lost all locks.

Jamie thinks it's because I should have requested the state *after* the one where it was stuck. I think (now, while writing this log) that I just went to the wrong state, period (i.e. not the state it was in before I went to manual). *sigh* Oh well.
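
For anyone following along later, here is a toy Python model of the possession problem and the repossession steps above. It is only a sketch: the Node and Manager classes, the repossess() helper, and the state name "PLACEHOLDER_STATE" are all made up for illustration and are not the Guardian API. The one point it tries to make is that the manager's state gets recorded *before* switching to MANUAL, so there is no guessing about where to jump back to. It does not settle whether the right target is that state or the one after it.

```python
# Toy model only: these classes are hypothetical and are NOT the Guardian API.

class Node:
    """A subordinate node that remembers which manager (if any) possesses it."""

    def __init__(self, name):
        self.name = name
        self.manager = None  # name of the managing node, or None
        self.state = "INIT"

    def restart(self):
        # In this model, a restart brings the node up in INIT with no manager.
        self.manager = None
        self.state = "INIT"


class Manager:
    """A manager node that starts out possessing all of its subordinates."""

    def __init__(self, name, subordinates, state):
        self.name = name
        self.subordinates = subordinates
        self.state = state
        self.mode = "EXEC"
        for node in self.subordinates:
            node.manager = self.name

    def possesses_all(self):
        return all(node.manager == self.name for node in self.subordinates)

    def jump(self, state):
        # Jumping by hand is only allowed in MANUAL mode in this model.
        if self.mode != "MANUAL":
            raise RuntimeError("jump requires MANUAL mode")
        self.state = state
        if state == "INIT":
            # INIT is the only place where possession is (re)claimed.
            for node in self.subordinates:
                node.manager = self.name


def repossess(manager):
    """Reclaim subordinates via INIT, then return to the recorded state."""
    previous = manager.state  # record this *before* switching to MANUAL
    manager.mode = "MANUAL"
    manager.jump("INIT")
    manager.jump(previous)
    manager.mode = "EXEC"
    return previous


drmi = Node("ISC_DRMI")
lock = Manager("ISC_LOCK", [drmi], state="PLACEHOLDER_STATE")

drmi.restart()                   # subordinate restarted
lost = not lock.possesses_all()  # manager no longer possesses it
returned_to = repossess(lock)    # MANUAL -> INIT -> recorded state -> EXEC
```

In this model, `lost` is true after the restart, and after repossess() the manager possesses ISC_DRMI again and is back in EXEC in the recorded state.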

We're on our way back up...
