Displaying report 1-1 of 1.
Reports until 15:04, Tuesday 13 March 2018
H1 DAQ (CDS)
david.barker@LIGO.ORG - posted 15:04, Tuesday 13 March 2018 - last comment - 08:31, Wednesday 14 March 2018(40996)
h1fw0 went unstable, looks like reboot fixed it

I restarted the DAQ at 12:05 PDT. About 90 minutes later h1fw0 crashed and was auto-restarted by monit. This was quickly followed by another restart. At this point I started monitoring it carefully and noticed that it was very slow in writing its frame files, to the extent that it was sometimes more than 64 seconds behind. Most of the time it managed to almost catch up.
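For illustration only, the "seconds behind" figure could be estimated from the name of the newest frame file. This sketch assumes the files follow the common `<site>-<description>-<GPS start>-<duration>.gwf` naming pattern; the sample file name, the function names and the GPS values are hypothetical and are not taken from this log.

```python
import re

# Assumed frame file naming: <site>-<description>-<GPS start>-<duration>.gwf
FRAME_RE = re.compile(
    r"^(?P<site>[A-Z])-(?P<tag>[A-Za-z0-9_]+)-(?P<start>\d+)-(?P<dur>\d+)\.gwf$"
)


def frame_end_gps(name):
    """Return the GPS second at which the data in this frame file ends."""
    m = FRAME_RE.match(name)
    if m is None:
        raise ValueError("not a recognised frame file name: %r" % name)
    return int(m.group("start")) + int(m.group("dur"))


def writer_lag(latest_name, now_gps):
    """Seconds between the end of the newest frame on disk and the current GPS time."""
    return now_gps - frame_end_gps(latest_name)


# Hypothetical example: a 64 s frame ending at GPS 1205000000, checked 100 s later.
print(writer_lag("H-H1_R-1204999936-64.gwf", 1205000100))
```

A healthy writer would normally show a lag of about one frame length or less. A lag that stays well above that is consistent with the slow writing described above.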

I verified that h1fw0's E18 RAID was not reporting any errors. On h1fw0 I noticed that looking at the /trend mount froze the terminal, so as a first try I rebooted h1fw0 (leaving h1ldasgw0 running). It came back up correctly and the problem appears to have been resolved. The /trend issue was caused by an old entry in /etc/fstab trying to mount from h1tw0, a machine which was turned off several weeks ago. I have corrected the fstab file. Although the entry was still in the file for the current run, /trend is now resolving as an empty mount point (as it should) and is no longer freezing the terminal.
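A stale entry of this kind can be found by scanning fstab for network mounts that point at retired hosts. The following is a minimal sketch, assuming the old /trend entry was an NFS-style `host:/export` line. The sample fstab contents and the function names are hypothetical and are not copied from h1fw0.

```python
def network_mounts(fstab_text):
    """Return (server, export, mount_point, fstype) for each NFS-style fstab entry."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fstype = fields[:3]
        if fstype.startswith("nfs") and ":" in device:
            server, export = device.split(":", 1)
            entries.append((server, export, mount_point, fstype))
    return entries


def stale_mounts(fstab_text, retired_hosts):
    """Return the network mount entries whose server is in retired_hosts."""
    return [e for e in network_mounts(fstab_text) if e[0] in retired_hosts]


# Hypothetical fstab contents, for illustration only.
SAMPLE_FSTAB = """
# device              mount    type  options   dump pass
/dev/sda1             /        ext4  defaults  0    1
h1tw0:/trend          /trend   nfs   ro,bg     0    0
h1ldasgw0:/frames     /frames  nfs   rw        0    0
"""

for server, export, mount_point, fstype in stale_mounts(SAMPLE_FSTAB, {"h1tw0"}):
    print("stale entry: %s:%s on %s (%s)" % (server, export, mount_point, fstype))
```

On a real machine the text would be read from /etc/fstab, and the list of retired hosts would have to be maintained by hand.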

The attached plot shows 30 minutes of second trends of the full file size for h1fw0 (red), h1fw1 (green) and h1fw2 (blue). fw1 and fw2 are in lock step; fw0 is one cycle behind initially and catches up later.

Images attached to this report
Comments related to this report
david.barker@LIGO.ORG - 16:02, Tuesday 13 March 2018 (40997)

I spoke too soon: h1fw0 restarted itself 93 minutes later.

I performed a full power cycle of both h1fw0 and h1ldasgw0 (the Solaris QFS server), which has fixed these issues in the past. The system has been back for 15 minutes and I'm monitoring it closely.

david.barker@LIGO.ORG - 08:31, Wednesday 14 March 2018 (41005)

Looks like it is fixed now; it has been running for 18 hours.
