Reports until 11:02, Thursday 21 December 2023
H1 CDS
david.barker@LIGO.ORG - posted 11:02, Thursday 21 December 2023 - last comment - 12:10, Thursday 21 December 2023(74959)
Upgrade of cdslogin caused EDC and SDF failures

Erik, Jonathan, Dave:

We upgraded cdslogin to deb12 at 09:45 PST this morning. This caused the EDC on h1susauxb123 to restart constantly. As a short-term solution we removed all the channels served by cdslogin (lock-loss-alert and remote-access channels) from the H1EDC.ini channel list and restarted the DAQ. The EDC is now running stably.
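For reference, the channel removal could be scripted along these lines. This is only a sketch, not the procedure we actually used: it assumes the ini file lists each channel as a `[CHANNEL_NAME]` section header, and the function name `drop_channels` and the prefixes passed to it are hypothetical.

```python
def drop_channels(ini_text, prefixes):
    """Return ini_text without the [CHANNEL] sections whose name starts
    with one of the given prefixes (assumed layout: one section per channel)."""
    kept = []
    skipping = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        # A section header decides whether the following lines are kept.
        if stripped.startswith("[") and stripped.endswith("]"):
            name = stripped[1:-1]
            skipping = name.startswith(tuple(prefixes))
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

The result should be written to a new file and diffed against the original before the DAQ is restarted.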

The loss of EDC channels caused a secondary Guardian issue, which is now resolved.

The h1cdssdf system would not run, again because it connects to the cdslogin lock-loss-alert channels. We have temporarily removed these channels from the monitor.req and safe.snap files. This also caused a secondary Guardian problem, which has now been resolved.

Current situation:

Alarms system is running

Lock loss alert system is offline.

EDC is missing all of its cdslogin channels

No slow channels from EDC in the DAQ between the times of 09:50:00 and 10:21:00

Comments related to this report
david.barker@LIGO.ORG - 11:13, Thursday 21 December 2023 (74962)

Erik is working on a deb11 container as a temporary solution to get the LLA code running again on cdslogin. Another possible solution is to move the code to another machine. We hope to get text and phone call alerting back online before tonight's operator owl shift.

david.barker@LIGO.ORG - 11:55, Thursday 21 December 2023 (74965)

Lock loss alerts are online again. I tested that Twilio texts and phone calls are working. I reset the settings to their 09:15 state by converting the safe.snap into a set of caput operations. At the next TOO we will add the LLA channels back into h1cdssdf to put them back under SDF control.
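A minimal sketch of the snap-to-caput conversion, for anyone who needs to repeat it. It assumes BURT-style data lines of the form `NAME COUNT VALUE ...` with an optional `--- Start ... / --- End ...` header block; the real safe.snap layout may differ, and the function name `snap_to_caput` is hypothetical. It only builds the command strings and does not run them.

```python
import shlex

def snap_to_caput(snap_text):
    """Build one 'caput NAME VALUE' command per data line of a snapshot.
    Assumed line format: NAME COUNT VALUE [extra fields ignored]."""
    commands = []
    in_header = False
    for line in snap_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("--- Start"):
            in_header = True
            continue
        if stripped.startswith("--- End"):
            in_header = False
            continue
        # Skip header contents, blank lines and comment lines.
        if in_header or not stripped or stripped.startswith("#"):
            continue
        fields = shlex.split(stripped)  # keeps quoted string values together
        if len(fields) < 3:
            continue
        name, _count, value = fields[:3]
        commands.append(f"caput {shlex.quote(name)} {shlex.quote(value)}")
    return commands
```

Printing the list and reviewing it before executing anything is the safer route, since every caput writes to a live channel.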

erik.vonreis@LIGO.ORG - 12:10, Thursday 21 December 2023 (74966)

The text/phone alert system is now running in a Debian 11 container. The remote access IOC is also running in a Debian 11 container.

Under Debian 12 the alert system crashed in a way similar to the EDC. That crash hasn't recurred under Debian 11, so we should try re-attaching the EDC to these systems.