aLIGO LHO Logbook

H1 CDS

jeffrey.kissel@LIGO.ORG - posted 21:00, Friday 10 April 2015 - last comment - 21:11, Friday 10 April 2015(17827)

h1boot server crashed 19:05 PDT

J. Kissel, A. Staley, D. Sigg, S. Dwyer, E. Hall, K. Arai, G. Grabeel [remote D. Barker]

The stuff that only happens on a Friday evening.

At 19:05 PDT, the h1boot server crashed. We discovered this because all EPICS information from the front ends (StripTools, MEDM screens, etc.) had suddenly flat-lined. After a few calls, Dave suggested resetting the h1boot computer in the Mass Storage Room (MSR, the noisy computer room adjacent to the control room, where all the front ends live). Unfortunately, because the file system had not been checked in 381 days, it had to go through 2 excruciatingly long file system checks on its way back up. An hour later, the initialization completed, and all workstation displays of data instantly came back.

There were no apparent hiccups in any front-end processes; they picked up right where they left off. Some workstations had to be rebooted, but others continued on without issue. We continue on to restoring the IFO post HAM6-vent!

In the meantime, we were able to squeeze in some valuable offline commissioning time (see attached).

Non-image files attached to this report

2015-04-10_h1boot_server_crash.pdf

Comments related to this report

david.barker@LIGO.ORG - 21:11, Friday 10 April 2015 (17828)DAQ

h1boot's log files do not show any activity around the time of the freeze-up (19:05 PDT). It looks like it just froze; Evan said the console was blank, so only a front panel reset could be done.

The main problem with the extended unavailability of the /opt/rtcds NFS file system was with the DAQ frame writers. The times (PDT) when frames were not being written are:

h1fw0: 19:38 - 20:46

h1fw1: 19:51 - 20:46

No data was recorded from 19:51 to 20:46 PDT (55 minutes).
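
The 55-minute figure is the overlap of the two frame-writer outages: data is lost only while both writers are down at the same time. A minimal sketch of that arithmetic follows, using the times quoted above. The names OUTAGES, parse and common_gap are hypothetical, chosen for illustration, and are not part of any CDS or DAQ tool.

```python
from datetime import datetime

# Frame-writer outage windows as quoted in this entry (PDT, 10 April 2015).
OUTAGES = {
    "h1fw0": ("19:38", "20:46"),
    "h1fw1": ("19:51", "20:46"),
}


def parse(hhmm):
    """Turn an HH:MM string into a datetime on the day of the crash."""
    return datetime.strptime("2015-04-10 " + hhmm, "%Y-%m-%d %H:%M")


def common_gap(outages):
    """Return (start, end) of the interval in which every writer was down.

    The overlap starts at the latest outage start and ends at the earliest
    outage end. Returns None if the outages do not overlap.
    """
    start = max(parse(s) for s, _ in outages.values())
    end = min(parse(e) for _, e in outages.values())
    return (start, end) if start < end else None


gap = common_gap(OUTAGES)
gap_minutes = int((gap[1] - gap[0]).total_seconds() // 60) if gap else 0

if gap:
    print("No frames from", gap[0].strftime("%H:%M"), "to",
          gap[1].strftime("%H:%M"), "PDT:", gap_minutes, "minutes")
```

Between 19:38 and 19:51 only h1fw0 was down, so h1fw1 was still writing frames during that window.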

The periodic rsyncs of h1boot have just completed. They are reporting the usual level of file activity on /opt/rtcds, so it looks like the file system came back intact.