Reports until 14:40, Thursday 14 May 2015
H1 CDS (DAQ, DCS)
david.barker@LIGO.ORG - posted 14:40, Thursday 14 May 2015 (18429)
h1fw0 unstable, h1ldasgw0 rebooted

Dan, Greg, Jim, Dave

Perhaps coincidentally, h1fw0 has been unstable since the LDAS tape robot was relocated late Tuesday. Its log files suggest that a slow disk system is causing the problems.

We were given permission to reboot the h1ldasgw0 Solaris QFS/NFS server, and Dan took the opportunity to upgrade this Solaris system (last upgrade 9/11/2014).

The only potential problem with unmounting the disk system from h1nds0 is with the SYS_DIAG Guardian node. It makes regular NDS requests for the past 30 seconds of BRS data to check that the data is not flatlined. Since this data should be served by the daqd process from memory, and not by the nds process from disk, we are confident this will not cause any problems for the node.
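For reference, a flatline test of the kind described above can be sketched as follows. This is not the SYS_DIAG code: the function name, the tolerance parameter and the handling of an empty buffer are assumptions for illustration, and the NDS fetch of the 30 seconds of data is left out.

```python
def is_flatlined(samples, tolerance=0.0):
    """Return True if the samples show no variation beyond `tolerance`.

    `samples` is any iterable of numbers, e.g. the last 30 seconds of a
    channel. Hypothetical helper, not taken from the Guardian node.
    """
    values = list(samples)
    if not values:
        # Assumption: an empty buffer is reported as flatlined, so that
        # missing data is flagged rather than silently passed.
        return True
    # A flat signal has (almost) no spread between its extremes.
    return max(values) - min(values) <= tolerance
```

A node would call this on each fetched buffer and raise a notification when it returns True.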

Procedure is:

h1fw0: stop monit, stop daqd process (via telnet), un-mount h1ldasgw0

h1nds0: stop monit, stop nds process (via init.d script), un-mount h1ldasgw0

h1ldasgw0: perform upgrades, reboot, share QFS via NFS

h1fw0: mount h1ldasgw0, start monit (which in turn starts daqd)

h1nds0: mount h1ldasgw0, start nds process via init.d script, start monit.

This has not helped: h1fw0 has restarted twice since this was done.