Reports until 14:21, Wednesday 21 May 2025
H1 CDS
david.barker@LIGO.ORG - posted 14:21, Wednesday 21 May 2025 - last comment - 10:49, Thursday 22 May 2025(84521)
h1sush7 crash

At 13:25:11 PDT on Wed 21 May 2025, all models on h1sush7 stopped running. The SWWD to h1iopseih7 started its countdown with 100% IPC receive errors.

I set a bypass time of 999999 on h1iopseih7 to interrupt the countdown.

We first restarted all the models. This did not clear the error, and we found a PCI bus issue: only one of the four ADCs was visible.

As a precautionary measure we fenced h1sush7 from the Dolphin fabric.

Next we power cycled the computer. This fixed the PCI bus issues: all cards are visible and all models started running.

I cleared the SWWDs and handed the system over to the control room.

Note: there were TIMING error flashes during this recovery period, which we have not tracked down.

[Wed May 21 13:25:11 2025] h1iopsush7: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Wed May 21 13:25:11 2025] h1susauxh7: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Wed May 21 13:25:11 2025] h1sussqzin: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Wed May 21 13:25:11 2025] h1susfc1: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
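For anyone wanting to pull the affected models out of kernel log lines like these, here is a minimal Python sketch. It is not an existing CDS tool. The function name `parse_adc_timeouts` and the regular expression are illustrative assumptions based only on the four lines quoted above, and the timestamp parsing assumes an English (C) locale.

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the log excerpt in this report:
#   [Wed May 21 13:25:11 2025] <model>: ERROR - <message>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<model>\w+):\s+ERROR\s+-\s+(?P<msg>.*)$"
)


def parse_adc_timeouts(lines):
    """Return (timestamp, model) pairs for lines reporting an ADC timeout.

    Hypothetical helper for illustration only.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "ADC timeout" in m.group("msg"):
            ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
            hits.append((ts, m.group("model")))
    return hits


# The four lines quoted in this report.
log = [
    "[Wed May 21 13:25:11 2025] h1iopsush7: ERROR - An ADC timeout error has been detected, waiting for an exit signal.",
    "[Wed May 21 13:25:11 2025] h1susauxh7: ERROR - An ADC timeout error has been detected, waiting for an exit signal.",
    "[Wed May 21 13:25:11 2025] h1sussqzin: ERROR - An ADC timeout error has been detected, waiting for an exit signal.",
    "[Wed May 21 13:25:11 2025] h1susfc1: ERROR - An ADC timeout error has been detected, waiting for an exit signal.",
]

hits = parse_adc_timeouts(log)
models = [model for _, model in hits]
# True when every model reported the timeout in the same second.
same_second = len({ts for ts, _ in hits}) == 1

print(models, same_second)
```

Checking that all timestamps fall in the same second is a quick way to confirm the models stopped together rather than one after another.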
 

Comments related to this report
david.barker@LIGO.ORG - 10:49, Thursday 22 May 2025 (84541)

Added this crash to FRS20317