Displaying report 1-1 of 1.
Reports until 08:03, Monday 21 September 2020
H1 DAQ
david.barker@LIGO.ORG - posted 08:03, Monday 21 September 2020 - last comment - 10:30, Monday 21 September 2020(56834)
DAQ crash at 00:20 Monday 21 Sept

All of the DAQ machines are unresponsive except for h1daqtw0. Both frame writers were writing the 00:20 frame at the time of the crash.

Comments related to this report
david.barker@LIGO.ORG - 08:13, Monday 21 September 2020 (56835)

We brought up the data concentrators' consoles on SMCIPMI. They do not report any errors, just the regular login prompt, which does not respond to the keyboard. Jonathan is in the process of rebooting the DAQ, starting with the DCs.

david.barker@LIGO.ORG - 08:17, Monday 21 September 2020 (56836)

h1daqnds0 console shown for reference.

david.barker@LIGO.ORG - 08:25, Monday 21 September 2020 (56838)

Opened FRS15602

jonathan.hanks@LIGO.ORG - 09:15, Monday 21 September 2020 (56839)
While recovering the daqd system, the 10Gb port on h1daqdc0 went down after it had been up and receiving traffic. dmesg reported the following errors:

[  698.537347] sfc 0000:02:00.0 ens2f0np0: RX DMA error (event: c0011004:00111001)
[  698.537550] sfc 0000:02:00.0 ens2f0np0: resetting (RECOVER_OR_ALL)
[  698.580015] sfc 0000:02:00.0 ens2f0np0: efx_ef10_rx_push_exclusive_rss_config: failed rc=-1
[  698.580224] sfc 0000:02:00.0 ens2f0np0: MC command 0x80 inlen 164 failed rc=-22 (raw=22) arg=789
[  698.580944] sfc 0000:02:00.0 ens2f0np0: has been disabled
ifup failed to bring the interface back up, so I rebooted the system.
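
A minimal sketch for spotting this failure in kernel logs, assuming the sfc messages keep the "[timestamp] sfc <PCI address> <interface>: <message>" layout shown above. The helper names (sfc_events, disabled_interfaces) are hypothetical and not part of any existing DAQ tooling; they only pick out sfc lines and report interfaces the driver says "has been disabled".

```python
import re

# Hypothetical pattern, assuming lines like:
# "[  698.537347] sfc 0000:02:00.0 ens2f0np0: RX DMA error (...)"
SFC_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+sfc\s+(?P<pci>\S+)\s+(?P<iface>\S+):\s+(?P<msg>.*)$"
)


def sfc_events(dmesg_text):
    """Return (timestamp, pci_address, interface, message) tuples for sfc lines."""
    events = []
    for line in dmesg_text.splitlines():
        m = SFC_LINE.match(line.strip())
        if m:
            events.append((float(m["ts"]), m["pci"], m["iface"], m["msg"]))
    return events


def disabled_interfaces(dmesg_text):
    """Interfaces the sfc driver reported as disabled (these needed manual recovery)."""
    return sorted(
        {iface for _, _, iface, msg in sfc_events(dmesg_text) if msg == "has been disabled"}
    )
```

Fed the five lines above, disabled_interfaces returns ["ens2f0np0"]. Matching only the final "has been disabled" message keeps transient resets from being flagged; that choice is an assumption about which message matters.
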
jonathan.hanks@LIGO.ORG - 09:17, Monday 21 September 2020 (56840)
While doing the recovery, I brought up the h1daq*1 systems and then moved on to the h1daq*0 systems. While I was working on h1daq*0, h1daqdc1 locked up again. Same symptoms: no error messages on the console, and no response to console or network input.
jonathan.hanks@LIGO.ORG - 09:18, Monday 21 September 2020 (56841)
All the daqd systems are up and running now.
jonathan.hanks@LIGO.ORG - 10:30, Monday 21 September 2020 (56842)
The firmware and driver versions of the 10Gb cards on the DC machines:

h1daqdc0
Part Number: SFN7x02F
driver: sfc
version: 4.1
firmware-version: 6.2.5.1000 rx0 tx0

h1daqdc1
Part Number: SFN7x02F
driver: sfc
version: 4.1
firmware-version: 6.2.7.1000 rx0 tx0
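
A small sketch for comparing listings like the two above, assuming each host's output is captured as plain "key: value" lines (the driver, version and firmware-version lines match the format `ethtool -i <interface>` prints; where the Part Number line came from is not stated). The function names are hypothetical. Applied to the two listings above, the only differing field is firmware-version.

```python
def parse_nic_info(text):
    """Parse 'key: value' lines into a dict; lines without a colon are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def diff_nic_info(a, b):
    """Return {field: (a_value, b_value)} for fields whose values differ."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }
```

Splitting on the first colon only keeps values that contain colons (for example PCI bus addresses) intact.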
