Displaying report 1-1 of 1.
Reports until 10:23, Sunday 08 December 2024
H1 CDS
david.barker@LIGO.ORG - posted 10:23, Sunday 08 December 2024 - last comment - 16:51, Sunday 08 December 2024(81676)
alarms issue 01:00 Sunday 08dec2024

Jonathan, Dave:

The alarms service on cdslogin stopped reporting around 01:00 this morning. Symptoms: the status file was not being updated (which caused the alarm block on the CDS Overview MEDM to turn PURPLE), and the report file was not being updated either. Presumably no alarms were sent from that time onwards.

At 08:10 I restarted alarms.service on cdslogin. A new report file was created but not written to, and /tmp/alarm_status.txt was not changed (still frozen at 01:00), but I did get a startup text. Fourteen minutes later the files started being written. I raised a test alarm and got a text, but no email.
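
The frozen status file is the symptom that an independent check could catch. The following is only a minimal sketch of such a staleness check, not the actual alarms code: the function names and the 120-second threshold are assumptions chosen for illustration.

```python
import os
import time


def file_age_seconds(path, now=None):
    """Return seconds since `path` was last modified, or None if it is missing."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is reported as "no age available".
        return None
    if now is None:
        now = time.time()
    return now - mtime


def is_stale(path, max_age_s=120.0, now=None):
    """True if the file is missing or was last written more than max_age_s ago.

    max_age_s is a hypothetical threshold; it should be a few times the
    normal update period of the file being watched.
    """
    age = file_age_seconds(path, now)
    return age is None or age > max_age_s
```

For example, is_stale("/tmp/alarm_status.txt") could be polled from a process that does not depend on the alarms service itself, so that a freeze like the one at 01:00 is flagged even when the service cannot report it.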

At 09:38, after not getting a keepalive email at 09:00 or any SSH login emails, I rebooted cdslogin. The behavior was the same as at 08:10: the report file was created but not written to, the tmp file was not created, and the startup text was sent successfully. After 14 minutes alarms started running and writing to the file system; test alarms are texted, but no emails are sent at all.

Jonathan is going to check on bepex.

Comments related to this report
david.barker@LIGO.ORG - 10:53, Sunday 08 December 2024 (81678)

Jonathan rebooted bepex, which fixed the no-email problem with alarms and alerts. I raised a test alarm and a test alert to myself and got both texts and emails.

david.barker@LIGO.ORG - 11:01, Sunday 08 December 2024 (81680)
david.barker@LIGO.ORG - 16:51, Sunday 08 December 2024 (81685)

Alarms got stuck again around noon today, presumably due to a recurring bepex issue. I have edited the code to skip bepex entirely and use only Twilio for texts. alarms.service was restarted on cdslogin at 16:48 PST.
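
For readers unfamiliar with this kind of workaround, here is a rough sketch of how one notification channel can be skipped while the others stay active. It is not the alarms code: the notify function, the channel names "email" and "text", and the callables behind them are all hypothetical, and no real Twilio or mail API is shown.

```python
def notify(message, channels, disabled=frozenset()):
    """Send `message` through each enabled channel and report what happened.

    channels: dict mapping a channel name to a callable taking the message.
    disabled: names of channels to skip (for example a path that is
              currently known to hang).
    """
    results = {}
    for name, send in channels.items():
        if name in disabled:
            results[name] = "skipped"
            continue
        try:
            send(message)
            results[name] = "sent"
        except Exception as exc:
            # One failing channel must not prevent the others from being tried.
            results[name] = "failed: %s" % exc
    return results
```

Calling notify("test alarm", channels, disabled={"email"}) would then attempt only the text channel. Keeping the skip list as data rather than as an edit to the sending logic makes it easy to re-enable email once the underlying problem is understood.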
