h1susauxh2 models stopped running at 23:43:02 PDT Fri 18apr2025 with an ADC timeout error.
I am able to ssh onto the machine, and initial scans suggest we have lost an ADC in this system (only 7 of 8 are seen with lspci). We will need to power cycle the IO Chassis before deciding whether an ADC replacement is needed.
This is an auxiliary SUS frontend for HAM2 with ADCs only, so no control function has been lost.
Dmesg:
[Fri Apr 18 23:43:12 2025] rts_cpu_isolator: LIGO code is done, calling regular shutdown code
[Fri Apr 18 23:43:12 2025] h1iopsusauxh2: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Fri Apr 18 23:43:12 2025] h1susauxh2: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
h1susauxh2 is running again; no hardware issues were found.
When opening an FRS ticket for this, I found a similar one from 20 April 2019 (FRS12775), when a reboot of the computer fixed the problem. At 09:02 I stopped the models and powered down h1susauxh2 from the command line. After a minute I powered it back up using IPMI. All 8 ADC cards are visible and the models started with no problems.
FRS for today's issue: FRS33903