Reports until 08:23, Monday 26 August 2024
H1 CDS
david.barker@LIGO.ORG - posted 08:23, Monday 26 August 2024 - last comment - 17:09, Monday 26 August 2024(79708)
EY Dolphin Crash

TJ, Jonathan, EJ, Dave:

Around 01:30 this morning we had a Dolphin crash of all the frontends at EY (h1susey, h1seiey, h1iscey). h1susauxey is not on the Dolphin network and was not impacted.

We could not ping these machines, but were able to get some diagnostics from their IPMI management ports.

At 07:57 we powered down h1[sus,sei,isc]ey for about a minute and then powered them back on.

We checked that the IX Dolphin switch at EY was responsive on the network.

All the systems came back with no issues. SWWD and model WDs were cleared. TJ is recovering H1.

Comments related to this report
jonathan.hanks@LIGO.ORG - 08:29, Monday 26 August 2024 (79709)
Screenshots of the consoles retrieved via IPMI. h1iscey showed a screen similar to h1seiey's, with the same crash dump.

h1iscey, h1seiey - crash in the Dolphin driver.
h1susey - kernel panic, with a note that a LIGO real-time module had been unloaded.
Images attached to this comment
david.barker@LIGO.ORG - 08:27, Monday 26 August 2024 (79710)

Crash time: 01:43:47 PDT

david.barker@LIGO.ORG - 08:51, Monday 26 August 2024 (79711)
Images attached to this comment
david.barker@LIGO.ORG - 12:01, Monday 26 August 2024 (79717)

Reboot/Restart LOG:

Mon26Aug2024
LOC TIME HOSTNAME     MODEL/REBOOT
07:59:27 h1susey      ***REBOOT***
07:59:30 h1seiey      ***REBOOT***
08:00:04 h1iscey      ***REBOOT***
08:01:04 h1seiey      h1iopseiey  
08:01:17 h1seiey      h1hpietmy   
08:01:30 h1seiey      h1isietmy   
08:01:32 h1susey      h1iopsusey  
08:01:45 h1susey      h1susetmy   
08:01:47 h1iscey      h1iopiscey  
08:01:58 h1susey      h1sustmsy   
08:02:00 h1iscey      h1pemey     
08:02:11 h1susey      h1susetmypi 
08:02:13 h1iscey      h1iscey     
08:02:26 h1iscey      h1caley     
08:02:39 h1iscey      h1alsey     
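
For anyone wanting to check how long each frontend took to come back, here is a minimal Python sketch that parses the log lines above and reports, per host, the number of models restarted and the seconds between the reboot entry and the last model start. The function names (parse_log, summarize) and the hard-coded date are assumptions made for illustration; this is not the tool that generated the log.

```python
from datetime import datetime

# Log lines copied from the Reboot/Restart LOG above.
LOG = """\
LOC TIME HOSTNAME     MODEL/REBOOT
07:59:27 h1susey      ***REBOOT***
07:59:30 h1seiey      ***REBOOT***
08:00:04 h1iscey      ***REBOOT***
08:01:04 h1seiey      h1iopseiey
08:01:17 h1seiey      h1hpietmy
08:01:30 h1seiey      h1isietmy
08:01:32 h1susey      h1iopsusey
08:01:45 h1susey      h1susetmy
08:01:47 h1iscey      h1iopiscey
08:01:58 h1susey      h1sustmsy
08:02:00 h1iscey      h1pemey
08:02:11 h1susey      h1susetmypi
08:02:13 h1iscey      h1iscey
08:02:26 h1iscey      h1caley
08:02:39 h1iscey      h1alsey
"""


def parse_log(text, date="2024-08-26"):
    """Return (timestamp, hostname, item) tuples; skip headers and blank lines."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # header line ("LOC TIME ...") or blank line
        time_str, host, item = parts
        try:
            stamp = datetime.strptime(f"{date} {time_str}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # first field is not a HH:MM:SS time
        entries.append((stamp, host, item))
    return entries


def summarize(entries):
    """Map hostname -> (models restarted, seconds from reboot to last model start)."""
    reboot_time = {}
    last_model_time = {}
    model_count = {}
    for stamp, host, item in entries:
        if item == "***REBOOT***":
            reboot_time[host] = stamp
        else:
            model_count[host] = model_count.get(host, 0) + 1
            last_model_time[host] = max(stamp, last_model_time.get(host, stamp))
    summary = {}
    for host, rebooted in reboot_time.items():
        if host in last_model_time:
            elapsed = int((last_model_time[host] - rebooted).total_seconds())
            summary[host] = (model_count[host], elapsed)
    return summary


if __name__ == "__main__":
    for host, (count, seconds) in sorted(summarize(parse_log(LOG)).items()):
        print(f"{host}: {count} models, {seconds} s from reboot to last model start")
```

The elapsed time only covers the interval recorded in the log; it does not include the roughly one minute the machines were powered down before 07:59.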
 
 

david.barker@LIGO.ORG - 17:09, Monday 26 August 2024 (79727)

FYI: There was a pending filter module change for h1susetmypi which got installed when this model was restarted this morning.