Reports until 15:14, Wednesday 05 July 2023
H1 ISC (GRD, OpsInfo)
thomas.shaffer@LIGO.ORG - posted 15:14, Wednesday 05 July 2023 - last comment - 21:39, Thursday 06 July 2023(71078)
Added wrapper to DRMI/PRMI states that use getdata

Over the weekend we ran into several cases (alog71043, alog71026, alog71008) where we tried to get data via the cdsutils getdata function in an ISC_LOCK guardian state and it returned nothing. This caused an error in ISC_LOCK, which was fixed by simply reloading the node, since the function just had to try again to get the data. This is not a new problem, but it is another reminder that we have to be prepared for different outcomes any time we request data.

Some months ago, with Jonathan's help, I made a function wrapper that can be used to handle hung data grabs. While a hung grab is not the issue we saw over the weekend, it is still a good idea to use the wrapper whenever we try to get data in a Guardian node. The file is (userapps)/sys/h1/guardian/timeout_utils.py, and it provides either a decorator (@timeout) or a wrapper function (call_with_timeout) that can be used.
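For anyone without the file at hand, here is a minimal sketch of how a timeout wrapper of this kind could work. It is not the contents of timeout_utils.py: the name call_with_timeout_sketch, the timeout keyword, its 5 second default, and the choice to return None on timeout are all assumptions made for illustration, and the real call_with_timeout may behave differently.

```python
import threading
import time


def call_with_timeout_sketch(func, *args, timeout=5.0, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread.

    Returns the function's result, or None if it has not finished
    within `timeout` seconds. (Hypothetical behaviour, see lead-in.)
    """
    result = {}

    def worker():
        try:
            result['value'] = func(*args, **kwargs)
        except Exception as exc:
            # Keep the exception so it can be re-raised in the caller.
            result['error'] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        # The call is still hung; give up and let the caller decide.
        return None
    if 'error' in result:
        raise result['error']
    return result.get('value')


def slow_fetch(delay):
    """Hypothetical stand-in for a data grab that takes `delay` seconds."""
    time.sleep(delay)
    return 'done'
```

With this sketch a hung grab and a grab that legitimately returns nothing both come back as None, so the caller still has to check the result before using it.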

For the specific issue we saw over the weekend, a solution is to check that the data is actually there before trying to do anything with it (i.e. if data:). Using this situation as an example:

 

# This wrapper should handle hung nds data grabs
popdata_prmi = call_with_timeout(cdu.getdata, 'LSC-POPAIR_B_RF90_I_ERR_DQ', -60)
# This conditional handles None data returned
if popdata_prmi.data:
    if popdata_prmi.data.max() < 20:
        log('no POPAIR RF90 flashes above 20, going to CHECK MICH FRINGES')
        return 'CHECK_MICH_FRINGES'
    else:
        self.timer['PRMI_POPAIR_check'] = 60
Comments related to this report
thomas.shaffer@LIGO.ORG - 15:30, Wednesday 05 July 2023 (71079)

I should have added that this fix was loaded into ISC_LOCK by Tony during commissioning today and is ready for our next relock.

camilla.compton@LIGO.ORG - 21:39, Thursday 06 July 2023 (71127)OpsInfo

This threw the attached error at 2023-07-07 04:14 UTC. I edited the PRMI and DRMI checkers in ISC_LOCK from 'if popdata_prmi.data:' to 'if popdata_prmi:'.

This seemed to work, but I'm not sure it will cover every case. If this goes into error again, I suggest the operator start by reloading ISC_LOCK and, if necessary, comment out the "elif self.timer['PRMI_POPAIR_check']" block of code. Tagging OpsInfo.

After this edit and a reload, the checker seems to work well, logging that there were no RF18 flashes above 120 (true) and moving to PRMI locking before the old 5 minute 'try_PRMI' timer finished.
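As a possible way to cover more of the cases, the check could be pulled into a small helper that treats a missing result, a missing data attribute, and an empty data array the same way. The sketch below is only an illustration and is not what is loaded in ISC_LOCK: peak_or_none and FakeBuffer are hypothetical names, and it assumes the fetch returns either None or an object with a .data sequence.

```python
def peak_or_none(buf):
    """Return the maximum of buf.data, or None if there is no usable data."""
    if buf is None:
        # The fetch returned nothing at all.
        return None
    data = getattr(buf, 'data', None)
    if data is None or len(data) == 0:
        # The object came back, but without any samples in it.
        return None
    return max(data)


class FakeBuffer:
    """Hypothetical stand-in for the object returned by a data fetch."""

    def __init__(self, data):
        self.data = data
```

In the checker this would read something like "peak = peak_or_none(popdata_prmi)" followed by "if peak is not None and peak < 20:", which avoids both calling .data on None and testing the truth value of an array.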

Images attached to this comment