At 04:26:10 Sun 25 Feb 2024 PST all the models on h1psl0 crashed.
lspci shows we have lost an ADC card; only 3 of the 4 are visible on the bus.
dmesg output:
[Wed Feb 21 12:08:41 2024] nfs: server 10.101.0.17 OK
[Sun Feb 25 04:27:11 2024] rts_cpu_isolator: LIGO code is done, calling regular shutdown code
[Sun Feb 25 04:27:11 2024] h1ioppsl0: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Sun Feb 25 04:27:11 2024] h1psldbb: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Sun Feb 25 04:27:11 2024] h1pslfss: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Sun Feb 25 04:27:11 2024] h1pslpmc: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Sun Feb 25 04:27:11 2024] h1psliss: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
(diskless)root@h1psl0:/home/controls#
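A minimal sketch of pulling the affected model names out of dmesg text like the above. The regular expression and the function name `timed_out_models` are illustrative assumptions, not part of the front-end code; the sample lines are copied from the output above.

```python
import re

# Hypothetical helper: matches lines of the form
# "[<timestamp>] <model>: ERROR - An ADC timeout error ..."
TIMEOUT = re.compile(r"^\[[^\]]+\] (\w+): ERROR - An ADC timeout error")

def timed_out_models(dmesg_text):
    """Return model names that logged an ADC timeout, in log order."""
    models = []
    for line in dmesg_text.splitlines():
        match = TIMEOUT.match(line.strip())
        if match:
            models.append(match.group(1))
    return models

SAMPLE = """\
[Wed Feb 21 12:08:41 2024] nfs: server 10.101.0.17 OK
[Sun Feb 25 04:27:11 2024] rts_cpu_isolator: LIGO code is done, calling regular shutdown code
[Sun Feb 25 04:27:11 2024] h1ioppsl0: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Sun Feb 25 04:27:11 2024] h1psldbb: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Sun Feb 25 04:27:11 2024] h1pslfss: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Sun Feb 25 04:27:11 2024] h1pslpmc: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
[Sun Feb 25 04:27:11 2024] h1psliss: ERROR - An ADC timeout error has been detected, waiting for an exit signal.
"""

print(timed_out_models(SAMPLE))
# ['h1ioppsl0', 'h1psldbb', 'h1pslfss', 'h1pslpmc', 'h1psliss']
```

All five models (the IOP and the four user models) report the timeout at the same timestamp.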
The 4th ADC card (adc-3) is missing.
ADC cards (PCI bus): 6d, 75, 78
16bitDAC cards (PCI bus): 6f, 7c, b8, bb
lspci -nvt listing:
-+-[0000:b2]-+-00.0-[b3-bd]----00.0-[b4-bd]----01.0-[b5-bd]----00.0-[b6-bd]--+-01.0-[b7-b8]----00.0-[b8]----04.0 10b5:9056 <<< DAC dac-2
| | +-04.0-[b9]--
| | +-05.0-[ba-bb]----00.0-[bb]----04.0 10b5:9056 <<< DAC dac-3
| | +-07.0-[bc]-- <<< Empty slot
| | \-09.0-[bd]-- <<< Empty slot
| +-02.0-[be-c7]----00.0-[bf-c7]----01.0-[c0-c7]----00.0-[c1-c7]--+-01.0-[c2-c3]----00.0-[c3]----00.0 1221:8682 <<< Contec BIO6464
| | +-04.0-[c4]--
| | +-05.0-[c5]--
| | +-07.0-[c6]--
| | \-09.0-[c7]--
.
+-[0000:64]-+-00.0-[65-6f]----00.0-[66-6f]----01.0-[67-6f]----00.0-[68-6f]--+-01.0-[69]----00.0 10ee:d8c6 <<< LIGO Timing Card
| | +-04.0-[6a]--
| | +-05.0-[6b]-- <<< Empty Slot (cooling for TC and cable run)
| | +-07.0-[6c-6d]----00.0-[6d]----04.0 10b5:9056 <<< ADC adc-0
| | \-09.0-[6e-6f]----00.0-[6f]----04.0 10b5:9056 <<< DAC dac-0
| +-02.0-[70-7c]----00.0-[71-7c]----01.0-[72-7c]----00.0-[73-7c]--+-01.0-[74-75]----00.0-[75]----04.0 10b5:9056 <<< ADC adc-1
| | +-04.0-[76]--
| | +-05.0-[77-78]----00.0-[78]----04.0 10b5:9056 <<< ADC adc-2
| | +-07.0-[79-7a]--+-[0000:7a]---04.0 10b5:9056 <<< Corrupted address, should be ADC adc-3
| | | \-[0000:79]---00.0 10b5:8111
| | \-09.0-[7b-7c]----00.0-[7c]----04.0 10b5:9056 <<< DAC dac-1
| +-05.0 8086:2034
IO Chassis Layout:
10:29 PST: I power cycled h1psl0. As expected, this did not fix the issue. The lspci scan now reports nothing in the adc-3 slot, and dac-1's bus address has shifted from 7c to 7b.
The second Adnaco backplane now reads:
| +-02.0-[70-7b]----00.0-[71-7b]----01.0-[72-7b]----00.0-[73-7b]--+-01.0-[74-75]----00.0-[75]----04.0 10b5:9056 <<< ADC adc-1
| | +-04.0-[76]--
| | +-05.0-[77-78]----00.0-[78]----04.0 10b5:9056 <<< ADC adc-2
| | +-07.0-[79]-- <<< MISSING ADC adc-3
| | \-09.0-[7a-7b]----00.0-[7b]----04.0 10b5:9056 <<< DAC dac-1
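A rough sketch of how the two second-backplane scans could be compared automatically. It assumes every ADC and DAC card appears as a `10b5:9056` device at `04.0` on its own bus, as in the listings in this entry; that ID is shared by ADCs and DACs, so the script reports bus numbers only and cannot tell the card types apart. The name `plx_buses` is hypothetical, and the sample lines are copied from the listings with the annotations removed.

```python
import re

# Captures the bus number of each leaf of the form "[75]----04.0 10b5:9056".
# The optional "0000:" prefix covers the corrupted "[0000:7a]" entry.
LEAF = re.compile(r"\[(?:[0-9a-f]{4}:)?([0-9a-f]{2})\]-+04\.0 10b5:9056")

def plx_buses(listing):
    """Return bus numbers of all 10b5:9056 devices, in listing order."""
    return LEAF.findall(listing)

BEFORE = r"""
| +-02.0-[70-7c]----00.0-[71-7c]----01.0-[72-7c]----00.0-[73-7c]--+-01.0-[74-75]----00.0-[75]----04.0 10b5:9056
| | +-04.0-[76]--
| | +-05.0-[77-78]----00.0-[78]----04.0 10b5:9056
| | +-07.0-[79-7a]--+-[0000:7a]---04.0 10b5:9056
| | \-09.0-[7b-7c]----00.0-[7c]----04.0 10b5:9056
"""

AFTER = r"""
| +-02.0-[70-7b]----00.0-[71-7b]----01.0-[72-7b]----00.0-[73-7b]--+-01.0-[74-75]----00.0-[75]----04.0 10b5:9056
| | +-04.0-[76]--
| | +-05.0-[77-78]----00.0-[78]----04.0 10b5:9056
| | +-07.0-[79]--
| | \-09.0-[7a-7b]----00.0-[7b]----04.0 10b5:9056
"""

print(plx_buses(BEFORE))  # ['75', '78', '7a', '7c']
print(plx_buses(AFTER))   # ['75', '78', '7b']
```

The count drops from four devices to three on this backplane, and the last entry moves from 7c to 7b because the empty 07.0 slot no longer consumes a secondary bus number.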