Reports until 10:26, Sunday 01 December 2024
H1 CDS
david.barker@LIGO.ORG - posted 10:26, Sunday 01 December 2024 - last comment - 11:33, Sunday 01 December 2024(81561)
cdslogin alarms and alert system not working since 4am Sun 01 Dec 2024

cdslogin is in a strange state: not quite down, but not sending alarms or alerts. It is pingable, and I am still using it as an ssh tunnel for my NoMachine connection to opslogin0, but it does not accept any new ssh logins and my existing shell on it cannot find any commands.

Comments related to this report
david.barker@LIGO.ORG - 10:36, Sunday 01 December 2024 (81563)

File system error starting at 03:42:20 this morning

Images attached to this comment
david.barker@LIGO.ORG - 11:24, Sunday 01 December 2024 (81564)

I power cycled cdslogin remotely via IPMI (10:50 power down, 10:52 power up) to force a fsck. The system came back up in operational mode, and the systemd services alarms and locklossalerts started normally.

These services write to the local file system, which is presumably why they were down when the local FS switched to read-only mode.
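A read-only remount like this one can be detected from the mount table. The following is a minimal sketch, not the check used on cdslogin: it assumes a Linux host exposing /proc/mounts, and the function names read_only_mounts and check_local_mounts are hypothetical.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains the 'ro' flag.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Match the exact 'ro' option, not substrings such as 'errors=remount-ro'
        if "ro" in options:
            result.append(mount_point)
    return result


def check_local_mounts(path="/proc/mounts"):
    """Read the live mount table (assumed Linux path) and list read-only mounts."""
    with open(path) as f:
        return read_only_mounts(f.read())
```

A monitor on a separate machine would still be needed to act on the result, since a host in this state may be unable to run new commands. Some mounts are read-only by design, so the output would need filtering to the file systems of interest.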

Two improvements spring to mind:

Make alarms and alerts memory-resident only, with no reliance on any file system.

Make these services portable to cdsssh in case cdslogin becomes unusable.
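As an illustration of the first idea, recent alert history could be held in a bounded in-memory buffer rather than written to disk. This is only a sketch under that assumption; the class InMemoryAlertLog and its methods are hypothetical and do not describe how the alarms or locklossalerts services are implemented.

```python
import time
from collections import deque


class InMemoryAlertLog:
    """Keep the most recent alerts in memory, with no file system access."""

    def __init__(self, max_entries=1000):
        # deque with maxlen silently discards the oldest entry when full
        self._entries = deque(maxlen=max_entries)

    def record(self, message, timestamp=None):
        """Store an alert; use the current time if no timestamp is given."""
        ts = time.time() if timestamp is None else timestamp
        self._entries.append((ts, message))

    def recent(self, n=10):
        """Return up to the n most recent (timestamp, message) pairs, oldest first."""
        return list(self._entries)[-n:]
```

The trade-off is that history is lost on restart, which may be acceptable if the priority is that alerts keep flowing when the local disk fails.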

david.barker@LIGO.ORG - 11:28, Sunday 01 December 2024 (81565)
david.barker@LIGO.ORG - 11:33, Sunday 01 December 2024 (81566)

There is no indication of any mains power issues at 03:42 this morning: there are no UPS reports, and the three phases of the corner station mains-mon look good throughout.