Displaying report 1-1 of 1.
Reports until 12:31, Monday 25 October 2021
H1 CDS
david.barker@LIGO.ORG - posted 12:31, Monday 25 October 2021 - last comment - 13:35, Monday 25 October 2021(60382)
WP10002 Reboot all corner station Dolphin'ed front end computers

Patrick, TJ, Jonathan, Erik, Dave:

We have rebooted all the corner station front end computers on the Dolphin fabric. Patrick first put all the systems into safe mode. Then, for each front end, we stopped the models, stopped the Dolphin drivers, disabled its Dolphin switch port and rebooted the computer.

The front ends were rebooted in the following order: h1susb123, h1sush2a, h1sush2b, h1sush34, h1seib1, h1seib2, h1seib3, h1seih16, h1seih23, h1seih45, h1seih7, h1oaf0, h1lsc0, h1asc0, h1oaf1, h1cdsrfm, h1sush56, h1sush7.

We rebooted h1sush56 and h1sush7 last because they were in use when the reboots started. h1oaf0 developed a DAQ FIFO-FULL error, so its models were restarted a second time.
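The procedure and ordering described above can be sketched as follows. This is a minimal illustration only and assumes nothing about the real tooling: the step strings are descriptive labels taken from this entry, not actual CDS commands, and `reboot_order` and `reboot_plan` are hypothetical helpers. Hosts still in use are moved to the end while every other host keeps its original position, which is how h1sush56 and h1sush7 ended up last.

```python
"""Hypothetical sketch of the reboot sequencing described in this entry.

The step strings are descriptive labels, not real CDS commands or scripts.
"""
from typing import Iterable, List, Set

# Per-front-end steps, in the order given in the log entry.
PER_HOST_STEPS = (
    "stop models",
    "stop Dolphin drivers",
    "disable Dolphin switch port",
    "reboot front end computer",
)


def reboot_order(hosts: Iterable[str], in_use: Set[str]) -> List[str]:
    """Keep the given order, but move hosts that are still in use to the end."""
    host_list = list(hosts)
    idle = [h for h in host_list if h not in in_use]
    busy = [h for h in host_list if h in in_use]
    return idle + busy


def reboot_plan(hosts: Iterable[str], in_use: Set[str]) -> List[str]:
    """Safe mode for all systems first, then the per-host steps host by host."""
    plan = ["put all systems into safe mode"]
    for host in reboot_order(hosts, in_use):
        plan.extend(f"{host}: {step}" for step in PER_HOST_STEPS)
    return plan


if __name__ == "__main__":
    hosts = ["h1sush56", "h1susb123", "h1sush7", "h1lsc0"]
    for step in reboot_plan(hosts, in_use={"h1sush56", "h1sush7"}):
        print(step)
```

The partition is stable, so passing the full host list in the order used today with `in_use={"h1sush56", "h1sush7"}` leaves those two hosts at the end, matching the order listed above.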

As seen before, stopping the Dolphin drivers was silent on every machine except h1lsc0, which prints a 'Disabling IRQ #16' message to the console.

To protect the DAQ from possible bad data from h1oaf1, h1oaf1 was run with a non-standard DAQ port until all the reboots had been completed, at which point its DAQ data stream was resumed.

Before the reboots began, we ran the checklut script on the Dolphin'ed machines; the results are below (the end station machines are included for completeness).

controls@h1boot1:~/zombie$ ./cmd_dolphin 'python3 /home/controls/zombie/checklut.py'
h1cdsrfm
h1cdsrfm: less than 20/24 LUT entries used
h1sush2b
h1sush2b: ***** WARNING 23/24 LUT ENTRIES FILLED ****
h1sush2a
h1sush2a: ***** WARNING 23/24 LUT ENTRIES FILLED ****
h1oaf1
Open failed :: No error 2 (0x2)
h1oaf1: less than 20/24 LUT entries used
h1oaf0
h1oaf0: ***** WARNING 23/24 LUT ENTRIES FILLED ****
h1sush34
h1sush34: ***** WARNING 23/24 LUT ENTRIES FILLED ****
h1lsc0
h1lsc0: less than 20/24 LUT entries used
h1susb123
h1susb123: ***** WARNING 23/24 LUT ENTRIES FILLED ****
h1sush56
h1sush56: ***** WARNING 24/24 LUT ENTRIES FILLED ****
h1asc0
h1asc0: less than 20/24 LUT entries used
h1susex
h1susex: less than 20/24 LUT entries used
h1susey
h1susey: less than 20/24 LUT entries used
h1seih23
h1seih23: less than 20/24 LUT entries used
h1seih45
h1seih45: less than 20/24 LUT entries used
h1seiex
h1seiex: less than 20/24 LUT entries used
h1seiey
h1seiey: less than 20/24 LUT entries used
h1iscex
h1iscex: less than 20/24 LUT entries used
h1seih16
h1seih16: less than 20/24 LUT entries used
h1iscey
h1iscey: less than 20/24 LUT entries used
h1seib3
h1seib3: less than 20/24 LUT entries used
h1seib2
h1seib2: less than 20/24 LUT entries used
h1seib1
h1seib1: less than 20/24 LUT entries used
h1sush7
h1sush7: less than 20/24 LUT entries used
h1seih7
h1seih7: less than 20/24 LUT entries used
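A listing like the one above is easier to scan when reduced to the hosts that printed a warning. The helper below is a hypothetical sketch, not part of the zombie scripts: it assumes only the two line formats visible in this output and skips everything else (bare host-name lines and messages such as "Open failed ...").

```python
"""Hypothetical helper that summarises checklut.py output captured via cmd_dolphin.

The two line formats matched below are copied from the output in this entry;
any other line is ignored.
"""
import re
from typing import Dict, List, Tuple

WARN_RE = re.compile(r"^(\S+): \*+ WARNING (\d+)/(\d+) LUT ENTRIES FILLED")
OK_RE = re.compile(r"^(\S+): less than \d+/\d+ LUT entries used")


def summarise(output: str) -> Tuple[Dict[str, int], List[str]]:
    """Return ({warned host: LUT entries filled}, [hosts reporting 'less than'])."""
    warned: Dict[str, int] = {}
    ok: List[str] = []
    for raw in output.splitlines():
        line = raw.strip()
        m = WARN_RE.match(line)
        if m:
            warned[m.group(1)] = int(m.group(2))
            continue
        m = OK_RE.match(line)
        if m:
            ok.append(m.group(1))
    return warned, ok


# A few lines copied from the listing above.
SAMPLE = """\
h1sush2b
h1sush2b: ***** WARNING 23/24 LUT ENTRIES FILLED ****
h1oaf1
Open failed :: No error 2 (0x2)
h1oaf1: less than 20/24 LUT entries used
h1sush56
h1sush56: ***** WARNING 24/24 LUT ENTRIES FILLED ****
"""

if __name__ == "__main__":
    warned, ok = summarise(SAMPLE)
    # Fullest tables first.
    for host, filled in sorted(warned.items(), key=lambda kv: -kv[1]):
        print(f"{host}: {filled} LUT entries filled")
    print(f"{len(ok)} host(s) below the 20-entry threshold")
```

Run over the full pre-reboot listing, this would flag six hosts: h1sush56 at 24/24 and h1susb123, h1sush2a, h1sush2b, h1sush34 and h1oaf0 at 23/24.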
 

Comments related to this report
david.barker@LIGO.ORG - 12:33, Monday 25 October 2021 (60383)

12:33: Ran the checklut script again; all front ends now report the 'less than 20/24' message.

david.barker@LIGO.ORG - 12:37, Monday 25 October 2021 (60384)

FC1 RMS went high soon after the reboot, causing the HAM7 SWWD to trip h1iopseih7's DACKILL. I bypassed the SUS trip; TJ and Patrick resolved the ring-up, and the watchdogs were then restored.

Patrick discovered a bug in the FC1 main MEDM screen, which reported the SWWD as tripped when it was only in its countdown state. This is most likely left over from when the SWWD was simply 'on/off' and had no countdown states.
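The likely failure mode can be illustrated with a small sketch. Everything here is hypothetical: the state names and numeric values are invented for illustration and are not the real SWWD channel encoding, and the MEDM screen itself is not Python. The point is only that a check written for a two-state watchdog ("anything other than OK is tripped") misreports the newer countdown states.

```python
"""Hypothetical illustration of the SWWD display bug described above.

State names and values are invented for this sketch; they are not the real
SWWD channel encoding.
"""
from enum import IntEnum


class SwwdState(IntEnum):
    OK = 0
    COUNTDOWN_1 = 1  # hypothetical first countdown stage
    COUNTDOWN_2 = 2  # hypothetical second countdown stage
    TRIPPED = 3


def legacy_shows_tripped(state: int) -> bool:
    # Logic from the on/off era: any state other than OK is shown as tripped.
    return state != SwwdState.OK


def fixed_shows_tripped(state: int) -> bool:
    # Only the final state is a trip; countdown states are not.
    return state == SwwdState.TRIPPED


if __name__ == "__main__":
    for s in SwwdState:
        print(f"{s.name:12s} legacy={legacy_shows_tripped(s)} fixed={fixed_shows_tripped(s)}")
```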

david.barker@LIGO.ORG - 13:35, Monday 25 October 2021 (60385)

After the reboots, the only issue we see is IPC errors on the h1susprocpi model [on h1oaf0] receiving channels from the h1omcpi model [on h1lsc0]. Almost every second there are one or two errors.
