Displaying report 1-1 of 1.
Reports until 20:47, Monday 22 May 2023
H1 CDS
david.barker@LIGO.ORG - posted 20:47, Monday 22 May 2023 - last comment - 20:51, Monday 22 May 2023(69805)
Dolphin problems, several frontends are down

FRS28008

IPC was found to be flashing red/green. There were DACKILLs on h1sus[b123, h2a, h34, h56], h1asc0 and h1lsc0. The long-range Dolphin links from CS to the ends were all red (ends to corner all green). I started by Dolphin-disabling the switch ports for omc, asc, lsc and oaf, then all the sus. Finally I disabled the three h1cdsrfm ports (corner, ex, ey), which stopped the flashing.

Prior to rebooting any SUS, I set all the SEI SWWDs into bypass mode.

As a first recovery, I rebooted h1susb123. It came back with no problems.

I then rebooted h1cdsrfm, which started with no issues. At this point it still showed no CS->ES traffic because all the CS senders were disabled.

I then rebooted all the remaining CS FEs which had been Dolphin-disabled. All came back OK, and all the h1cdsrfm CS->ES went green.

I untripped the SWWD and handed the system over to Corey, Jim and Rahul to complete the recovery.

h1cdsrfm dmesg reports a Dolphin IX adapter error at the time this IPC problem started.


[Tue Apr 11 11:14:37 2023] IXH Adapter 2 : Enable option to create session to all available nodes.
[Mon May 22 19:57:30 2023] IXH Adapter 2 : Uncorrectable error on adapter PCIe SLOT detected - Event_type=0x3 Event_data=0x20
[Mon May 22 19:57:30 2023] IXH Adapter 2 : Port 0 is not operational -- UpTime: 0 sec - Event = 3
[Mon May 22 20:11:51 2023] IXH Adapter 1 : Port 0 is not operational -- UpTime: 0 sec - Event = 0
[Mon May 22 20:11:57 2023] IXH Adapter 0 : Port 0 is not operational -- UpTime: 0 sec - Event = 0
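
A minimal Python sketch for pulling the IXH adapter events out of dmesg output like the lines above and finding the first uncorrectable error. The regular expression is fitted only to the lines quoted in this report; the function names (`parse_ixh_lines`, `first_uncorrectable`) and the `SAMPLE` constant are hypothetical, and other driver messages may use a different layout.

```python
import re
from datetime import datetime

# Copied from the dmesg excerpt in this report.
SAMPLE = """\
[Tue Apr 11 11:14:37 2023] IXH Adapter 2 : Enable option to create session to all available nodes.
[Mon May 22 19:57:30 2023] IXH Adapter 2 : Uncorrectable error on adapter PCIe SLOT detected - Event_type=0x3 Event_data=0x20
[Mon May 22 19:57:30 2023] IXH Adapter 2 : Port 0 is not operational -- UpTime: 0 sec - Event = 3
[Mon May 22 20:11:51 2023] IXH Adapter 1 : Port 0 is not operational -- UpTime: 0 sec - Event = 0
[Mon May 22 20:11:57 2023] IXH Adapter 0 : Port 0 is not operational -- UpTime: 0 sec - Event = 0
"""

# Assumed layout: "[<human-readable timestamp>] IXH Adapter <n> : <message>"
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+IXH Adapter (?P<adapter>\d+)\s*:\s*(?P<msg>.*)$"
)


def parse_ixh_lines(text):
    """Return a list of (timestamp, adapter number, message) tuples."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an IXH adapter line
        when = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
        events.append((when, int(match.group("adapter")), match.group("msg")))
    return events


def first_uncorrectable(events):
    """Return (timestamp, adapter) of the first uncorrectable error, or None."""
    for when, adapter, msg in events:
        if "Uncorrectable error" in msg:
            return when, adapter
    return None


if __name__ == "__main__":
    events = parse_ixh_lines(SAMPLE)
    print(first_uncorrectable(events))
    # Adapters that reported a non-operational port, in log order.
    print([a for _, a, msg in events if "not operational" in msg])
```

Timestamp parsing assumes an English locale, matching the day and month abbreviations in the excerpt.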
 

Images attached to this report
Comments related to this report
rahul.kumar@LIGO.ORG - 20:42, Monday 22 May 2023 (69807)SUS

All SUS are back online.

david.barker@LIGO.ORG - 20:50, Monday 22 May 2023 (69808)

Here is an example of the IPC flashing on h1iopseib2.

Images attached to this comment
david.barker@LIGO.ORG - 20:51, Monday 22 May 2023 (69809)

Mon22May2023
LOC TIME HOSTNAME     MODEL/REBOOT
20:13:51 h1susb123    ***REBOOT***
20:16:15 h1susb123    h1iopsusb123
20:16:28 h1susb123    h1susitmy   
20:16:41 h1susb123    h1susbs     
20:16:54 h1susb123    h1susitmx   
20:17:07 h1susb123    h1susitmpi  
20:20:47 h1cdsrfm     ***REBOOT***
20:21:59 h1sush2a     ***REBOOT***
20:23:36 h1sush34     ***REBOOT***
20:24:12 h1sush2a     h1iopsush2a 
20:24:25 h1sush2a     h1susmc1    
20:24:38 h1sush2a     h1susmc3    
20:24:51 h1sush2a     h1susprm    
20:25:04 h1sush2a     h1suspr3    
20:25:11 h1sush56     ***REBOOT***
20:25:46 h1sush34     h1iopsush34 
20:25:59 h1sush34     h1susmc2    
20:26:12 h1sush34     h1suspr2    
20:26:25 h1sush34     h1sussr2    
20:26:43 h1asc0       ***REBOOT***
20:27:15 h1sush56     h1iopsush56 
20:27:28 h1sush56     h1sussrm    
20:27:41 h1sush56     h1sussr3    
20:27:54 h1lsc0       ***REBOOT***
20:27:54 h1sush56     h1susifoout 
20:28:07 h1sush56     h1sussqzout 
20:28:19 h1asc0       h1iopasc0   
20:28:32 h1asc0       h1asc       
20:28:45 h1asc0       h1ascimc    
20:28:58 h1asc0       h1ascsqzifo 
20:29:08 h1oaf0       ***REBOOT***
20:29:29 h1lsc0       h1ioplsc0   
20:29:30 h1omc0       ***REBOOT***
20:29:42 h1lsc0       h1lsc       
20:29:55 h1lsc0       h1lscaux    
20:30:08 h1lsc0       h1sqz       
20:30:21 h1lsc0       h1ascsqzfc  
20:30:49 h1oaf0       h1iopoaf0   
20:30:58 h1omc0       h1iopomc0   
20:31:02 h1oaf0       h1pemcs     
20:31:11 h1omc0       h1omc       
20:31:15 h1oaf0       h1tcscs     
20:31:24 h1omc0       h1omcpi     
20:31:28 h1oaf0       h1susprocpi 
20:31:41 h1oaf0       h1seiproc   
20:31:54 h1oaf0       h1oaf       
20:32:07 h1oaf0       h1calcs     
20:32:20 h1oaf0       h1susproc   
20:32:33 h1oaf0       h1calinj    
20:32:46 h1oaf0       h1bos       
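
A minimal Python sketch for summarizing a reboot log in the three-column layout above (local time, hostname, model or reboot marker). It groups entries by host and reports the seconds between the `***REBOOT***` entry and the last model entry for that host. The function name `summarize_reboots` and the `SAMPLE` excerpt are hypothetical, and it assumes all entries fall on the same day, as they do here.

```python
from datetime import datetime

REBOOT_MARK = "***REBOOT***"

# A short excerpt copied from the log in this comment.
SAMPLE = """\
Mon22May2023
LOC TIME HOSTNAME     MODEL/REBOOT
20:13:51 h1susb123    ***REBOOT***
20:16:15 h1susb123    h1iopsusb123
20:16:28 h1susb123    h1susitmy
20:16:41 h1susb123    h1susbs
20:16:54 h1susb123    h1susitmx
20:17:07 h1susb123    h1susitmpi
20:20:47 h1cdsrfm     ***REBOOT***
"""


def summarize_reboots(log_text):
    """Return {host: {"models": [...], "seconds_to_last_model": int or None}}."""
    hosts = {}
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # date line, header row, or blank line
        stamp, host, entry = parts
        try:
            when = datetime.strptime(stamp, "%H:%M:%S")
        except ValueError:
            continue  # first column is not a time
        rec = hosts.setdefault(host, {"reboot": None, "models": [], "last": None})
        if entry == REBOOT_MARK:
            rec["reboot"] = when
        else:
            rec["models"].append(entry)
            rec["last"] = when

    summary = {}
    for host, rec in hosts.items():
        elapsed = None
        if rec["reboot"] is not None and rec["last"] is not None:
            elapsed = int((rec["last"] - rec["reboot"]).total_seconds())
        summary[host] = {"models": rec["models"], "seconds_to_last_model": elapsed}
    return summary


if __name__ == "__main__":
    for host, info in summarize_reboots(SAMPLE).items():
        print(host, info["seconds_to_last_model"], info["models"])
```

Hosts with a reboot entry but no model entries, such as h1cdsrfm in this log, get `None` for the elapsed time.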
 
