Displaying report 1-1 of 1.
Reports until 11:14, Monday 07 December 2015
H1 SEI
hugh.radkins@LIGO.ORG - posted 11:14, Monday 07 December 2015 - last comment - 11:47, Monday 07 December 2015(24028)
HAM 2 & 3 ISI Guardian and Isolation issues

As Patrick reported, the Guardians for the HAM2 & 3 ISIs were reporting an issue; TJ is looking into that more deeply now.  Regarding the Guardian update I did on 24 November: that update touched only common files, and the code had been exercised on most platforms as of 1 Dec.  According to my notes, it had not been exercised on HAM2, 3, 4, or 6, the BS, or ITMX before this morning.

This morning I restarted the guardians for HAMs 2 & 3 and the error condition went away.  We still don't know what caused the error condition.

Next, both ISIs were tripping during the Isolation process.  It looked like this was occurring as the ISI was switching the GS13s back to high gain.  I disabled the Sensor Correction, thinking the high microseism might be the problem, but the tripping continued.  So I disabled the guardians, SEI and ISI, and brought the ISIs to Isolation using the command scripts.  This way the GS13s could be kept in low gain.  I was then able to switch the GS13s back to high gain after giving the platforms a few moments once Isolation was completed.

I tested the guardian this morning on HAM5 and it worked fine, so while the first reported error gives us reason for suspicion, the fundamental performance of the guardian seems sound.  Again, while there are runtime parameter files unique to each ISI (although they are generally the same), the guardians are all running the same common code.  Isolation is the same on a HAM and on the two stages of a BSC.

My first theory would be that the position offsets on HAMs 2 & 3 are larger than those for the other platforms.  This may cause the GS13s to be noisier just after Isolation completes, right as the GS13s are switched, causing the trip.  However, the HAM6 verticals are about 5 to 10 times those of all the others, and the HAM2 & 3 horizontals are maybe only a little larger than the HAM5 drives.  So that may be the issue, but it looks to be subtle.  Does the payload combine with the high microseism and the GS13 switch to trip the platform?  Maybe, except the HAM3 payload is much different from HAM5's, which in turn is much different from HAM2's.  What about the alignment with respect to the IFO?  The HAM2 & 3 platforms, both HEPI and ISI, are rotated compared to the other HAMs.  What about FeedForward?  HAM2 & 3 are getting X & RY FF from HEPI, as is HAM6, but the HAM6 GS13s aren't switched like those on HAM2 & 3.  HAM4 & 5 are getting their FF signals from the ISI Stage0 L4Cs.

So, could the alignment of HAM2 & 3 make them more vulnerable to the incoming high microseism?  The BLRMS would not suggest this, and the arrangement of the arms (NW & SW) would indicate the coastal noise would be the same for both arms.

So the FF may be the issue.  The guardian update also turns on the FF; this has proven to be a non-issue for turn-on transients, but the microseism environment may be contributing.

Comments related to this report
jameson.rollins@LIGO.ORG - 11:47, Monday 07 December 2015 (24030)

A couple of things:

Remember that just because different systems are all running the same common code, that doesn't mean they all exercise the same code paths.  The actual code run by the BSC ISI stages may be slightly different from what is run by the HAMs.  Different functions may be executed.  In some cases the parameters given to the different chamber guardians actually specify which code paths should be executed.
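
To illustrate the point, here is a minimal sketch of how one shared function can take different paths depending on the parameters it is given.  None of this is the actual isiguardianlib code: the function name, the parameter keys (SWITCH_GS13_GAIN, FF_SOURCE) and the step names are all hypothetical and chosen only for the example.

```python
# Hypothetical sketch: the same "common" function, driven by per-chamber
# parameters, ends up executing different steps on different chambers.

def isolation_steps(params):
    """Return the ordered list of steps one chamber would run."""
    steps = ["ramp_isolation_loops"]

    # Only chambers whose parameter file asks for it do the gain switch.
    if params.get("SWITCH_GS13_GAIN", False):
        steps.insert(0, "gs13_to_low_gain")
        steps.append("gs13_to_high_gain")

    # The feedforward source is also a parameter, so this branch differs too.
    ff_source = params.get("FF_SOURCE")
    if ff_source is not None:
        steps.append("enable_ff_from_" + ff_source)

    return steps


# Two made-up parameter sets standing in for two chambers.
chamber_a = {"SWITCH_GS13_GAIN": True, "FF_SOURCE": "hepi"}
chamber_b = {"SWITCH_GS13_GAIN": False, "FF_SOURCE": "stage0_l4c"}

print(isolation_steps(chamber_a))
print(isolation_steps(chamber_b))
```

A successful test on one chamber therefore only covers the branches that chamber's parameters select.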

If a guardian node throws an error, a restart may cause the immediate issue to clear by resetting the state of the system, but it doesn't actually fix the error.  If no code has been changed, then the error will necessarily occur again when the same conditions are met.

Study the exception traceback message, since usually that's pointing to exactly what the problem is.  In this case, the error was:

2015-12-07T11:12:59.06689 ISI_HAM3 [INIT.main] determining how to get to defined state...
2015-12-07T11:12:59.08430 ISI_HAM3 W: Traceback (most recent call last):
2015-12-07T11:12:59.08432   File "/ligo/apps/linux-x86_64/guardian-1485/lib/python2.7/site-packages/guardian/worker.py", line 459, in run
2015-12-07T11:12:59.08433     retval = statefunc()
2015-12-07T11:12:59.08433   File "/opt/rtcds/userapps/release/isi/common/guardian/isiguardianlib/ISI_STAGE/states.py", line 52, in main
2015-12-07T11:12:59.08434     error_message = isolation_util.check_if_isolation_loops_on(control_level)
2015-12-07T11:12:59.08434   File "/opt/rtcds/userapps/release/isi/common/guardian/isiguardianlib/util.py", line 30, in wrapper
2015-12-07T11:12:59.08435     .format(nth=const.NTH[arg_number].lower(), func=func.__name__, allowed=allowed, passed_arg=args[arg_number]))
2015-12-07T11:12:59.08435 WrongArgument: The 1st argument of check_if_isolation_loops_on must be in {'HIGH': {'INDEX': 400, 'MAIN': ('FM4', 'FM5', 'FM6', 'FM7'), 'BOOST': ('FM8',)}}. Passed ROBUST.

This is telling you exactly what/where the error is: the check_if_isolation_loops_on function is getting an incorrect input argument when called during the INIT state.
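
For anyone unfamiliar with this kind of check, the following is a rough sketch of an argument-validating decorator that raises an error of the same shape.  It is an assumption about the general pattern, not the real code in isiguardianlib/util.py: the decorator name check_arg, the NTH table, the ALLOWED_LEVELS dictionary and the stub function body are all hypothetical.  Only the wording of the message is modelled on the traceback above.

```python
import functools


class WrongArgument(Exception):
    """Raised when a positional argument is not in the allowed set."""


# Hypothetical lookup used to build the "1st/2nd/3rd" part of the message.
NTH = {0: "1st", 1: "2nd", 2: "3rd"}


def check_arg(arg_number, allowed):
    """Reject calls whose positional argument `arg_number` is not in `allowed`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if args[arg_number] not in allowed:
                raise WrongArgument(
                    "The {nth} argument of {func} must be in {allowed}. "
                    "Passed {passed_arg}.".format(
                        nth=NTH[arg_number], func=func.__name__,
                        allowed=sorted(allowed), passed_arg=args[arg_number]))
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical per-chamber table in which only HIGH is defined.
ALLOWED_LEVELS = {"HIGH": {"INDEX": 400}}


@check_arg(0, ALLOWED_LEVELS)
def check_if_isolation_loops_on(control_level):
    # Stub standing in for the real check; returns no error message.
    return None


def try_level(level):
    """Call the checked function and report either 'ok' or the error text."""
    try:
        check_if_isolation_loops_on(level)
        return "ok"
    except WrongArgument as exc:
        return str(exc)


print(try_level("HIGH"))
print(try_level("ROBUST"))
```

In this sketch the allowed set comes from a per-chamber table, so a caller that passes a level the table does not define (here ROBUST) fails on every call until either the table or the caller is changed, which is why a restart alone cannot fix it.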
