aLIGO LHO Logbook

H1 CDS

david.barker@LIGO.ORG - posted 20:40, Saturday 16 September 2017 - last comment - 11:02, Monday 18 September 2017(38675)

CDS computing recovery after this morning's power outage

To verify that front ends could be started, I first started all the non-dolphin FECs (except for the mid station PEMs); there were no problems. Before starting h1psl0 I consulted with Peter King to see if doing so would cause any PSL issues; he said it would not.

I noticed a timing error in the corner station and traced it to the IO Chassis for h1seib3 (ITMX), which was powered down. Its front panel switch was in the OFF position. Richard and I think this was accidentally switched off much earlier and only caused a problem after the computer was power cycled. Before powering this IO Chassis up, I first powered down the two AI chassis because of the known issue with the 16-bit DACs outputting voltage when the IO Chassis is ON and the FEC is OFF (after h1seib3 was operational I powered the AI chassis back on).

I restarted all the dolphin FECs, using IPMI for the end stations and front panel switches for the MSR. Many FECs started with IRIG-B errors, in both the positive and negative directions. We know from experience that some of these take many minutes to clear, so I continued with the non-FEC restarts.

I consulted with Jim Warner on starting the end station BRS computers (it was OK to do so) and the HEPI Pump Controllers (we will leave them until Monday).

I went to EY, the first of many trips as it turned out. I powered up h1ecaty1 and h1hwsey. I noticed that h1brsey was already powered up, but its code did not appear to be running.

Back in the control room, I noticed that all EY FECs had the same large positive IRIG-B error, indicating a problem with the IRIG-B fanout. Back at EY I confirmed that the IRIG-B fanout was reporting the date as mid June 1999. After some issues, I power cycled the IRIG-B chassis and rebooted the FECs. The front ends then ran correctly.

At this point I ran out of time. There is a timing issue with h1iscex, and the Beckhoff timing fanout is reporting an error on the fourth Duotone slave, which I suspect is h1iscex.

To be done:

Start mid station PEM FECs.

Power up h1ecatx1 and h1hwsex.

Start Beckhoff slow controls code on h1ecat[x,y]1. Start HWS code on h1hwse[x,y].

Investigate Duotone Timing error at EX, get h1iscex running.

Start HEPI pump controllers.

Start digital video servers.

Start BRS code.

Start PSL diode room Beckhoff computer.

Comments related to this report

david.barker@LIGO.ORG - 20:43, Saturday 16 September 2017 (38676)

Attached: CDS site overview MEDM screen.

Images attached to this comment

patrick.thomas@LIGO.ORG - 08:12, Monday 18 September 2017 (38677)

I was able to use remote desktop to connect to h1ecaty1. The terminal for the EPICS IOC was open, but it appeared that something had not started properly. I used the icon on the desktop to restart the computer. This appears to have worked.

I cannot reach h1ecatx1. It will likely need to be turned on locally.

patrick.thomas@LIGO.ORG - 09:39, Monday 18 September 2017 (38680)

Jeff B. powered on h1ecatx1. I logged in with remote desktop and found the same issue as I had with h1ecaty1 (screenshot attached). I used the icon on the desktop to restart the computer. This appears to have worked.

Images attached to this comment

patrick.thomas@LIGO.ORG - 10:56, Monday 18 September 2017 (38681)

BURT-restored the FMCS IOC to restore alarm levels:

patrick.thomas@zotws12:/ligo/cds/lho/h0/burt/2017/09/15/06:00$ burtwb -f h0fmcs.snap

patrick.thomas@LIGO.ORG - 11:02, Monday 18 September 2017 (38683)

All the weather stations except the CS one had lost their connections. I restarted and BURT-restored the IOCs on h0epics.