The IPC was found to be flashing red/green. There were DACKILLs on h1sus[b123, h2a, h34, h56], h1asc0 and h1lsc0. The long-range Dolphin CS->ends IPCs were all red (ends->corner were all green). I began Dolphin-disabling the switch ports for omc, asc, lsc and oaf, then all the SUS. Finally I disabled the three h1cdsrfm ports (corner, EX, EY), which stopped the flashing.
Prior to rebooting any SUS, I set all the SEI SWWDs into bypass mode.
As a first recovery step, I rebooted h1susb123. It came back with no problems.
I then rebooted h1cdsrfm, which started with no issues. At this point it still showed no CS->ES traffic because all the CS senders were disabled.
Next I rebooted all the remaining CS front ends (FEs) that had been Dolphin-disabled. All came back OK, and all the h1cdsrfm CS->ES IPCs went green.
I untripped the SWWDs and handed the system over to Corey, Jim and Rahul to complete the recovery.
The h1cdsrfm dmesg reports a Dolphin IX adapter error at the time the IPC problem started:
[Tue Apr 11 11:14:37 2023] IXH Adapter 2 : Enable option to create session to all available nodes.
[Mon May 22 19:57:30 2023] IXH Adapter 2 : Uncorrectable error on adapter PCIe SLOT detected - Event_type=0x3 Event_data=0x20
[Mon May 22 19:57:30 2023] IXH Adapter 2 : Port 0 is not operational -- UpTime: 0 sec - Event = 3
[Mon May 22 20:11:51 2023] IXH Adapter 1 : Port 0 is not operational -- UpTime: 0 sec - Event = 0
[Mon May 22 20:11:57 2023] IXH Adapter 0 : Port 0 is not operational -- UpTime: 0 sec - Event = 0
All SUS are back online.
Here is an example of the IPC flashing on h1iopseib2.
Mon22May2023
LOC TIME HOSTNAME MODEL/REBOOT
20:13:51 h1susb123 ***REBOOT***
20:16:15 h1susb123 h1iopsusb123
20:16:28 h1susb123 h1susitmy
20:16:41 h1susb123 h1susbs
20:16:54 h1susb123 h1susitmx
20:17:07 h1susb123 h1susitmpi
20:20:47 h1cdsrfm ***REBOOT***
20:21:59 h1sush2a ***REBOOT***
20:23:36 h1sush34 ***REBOOT***
20:24:12 h1sush2a h1iopsush2a
20:24:25 h1sush2a h1susmc1
20:24:38 h1sush2a h1susmc3
20:24:51 h1sush2a h1susprm
20:25:04 h1sush2a h1suspr3
20:25:11 h1sush56 ***REBOOT***
20:25:46 h1sush34 h1iopsush34
20:25:59 h1sush34 h1susmc2
20:26:12 h1sush34 h1suspr2
20:26:25 h1sush34 h1sussr2
20:26:43 h1asc0 ***REBOOT***
20:27:15 h1sush56 h1iopsush56
20:27:28 h1sush56 h1sussrm
20:27:41 h1sush56 h1sussr3
20:27:54 h1lsc0 ***REBOOT***
20:27:54 h1sush56 h1susifoout
20:28:07 h1sush56 h1sussqzout
20:28:19 h1asc0 h1iopasc0
20:28:32 h1asc0 h1asc
20:28:45 h1asc0 h1ascimc
20:28:58 h1asc0 h1ascsqzifo
20:29:08 h1oaf0 ***REBOOT***
20:29:29 h1lsc0 h1ioplsc0
20:29:30 h1omc0 ***REBOOT***
20:29:42 h1lsc0 h1lsc
20:29:55 h1lsc0 h1lscaux
20:30:08 h1lsc0 h1sqz
20:30:21 h1lsc0 h1ascsqzfc
20:30:49 h1oaf0 h1iopoaf0
20:30:58 h1omc0 h1iopomc0
20:31:02 h1oaf0 h1pemcs
20:31:11 h1omc0 h1omc
20:31:15 h1oaf0 h1tcscs
20:31:24 h1omc0 h1omcpi
20:31:28 h1oaf0 h1susprocpi
20:31:41 h1oaf0 h1seiproc
20:31:54 h1oaf0 h1oaf
20:32:07 h1oaf0 h1calcs
20:32:20 h1oaf0 h1susproc
20:32:33 h1oaf0 h1calinj
20:32:46 h1oaf0 h1bos