At 11:10 PST, h1boot generated an nfsd error (see below). All front end computers have locked up. We tried rebooting h1susauxh2, but h1boot is no longer permitting this (missing root file system). We need to reboot h1boot first, then all front end computers.
h1boot has gone 230 days without a file system check. An fsck on the /opt/rtcds (800GB) file system is in progress and will take almost an hour to complete.
h1boot completed its fsck and started running. It looks like all the front end real-time cores ran the entire time. This raises an interesting question: should they run when there is no operator/guardian control of them? Did guardian keep trying to control the system and, with no feedback, continue to 'push' the system?
The DAQ data continued to flow for the entire time. We see evidence of some channels glitching when front end control was recovered.
We will test this scenario further to see if a solution is needed.
Downtime was 11:10 - 12:40 PST
Filed an FRS for this since we lost 1.5hrs of commissioning time. This is FRS #4429.
Note: it wasn't clear what units of time to enter for "Orig. Est", and I entered 90 (as in minutes), but this turned out to be an entry of 90hrs! I believe it has been corrected to 1.5hrs.
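As a quick check of the downtime figure, here is a minimal sketch that converts the 11:10 - 12:40 PST window above into both minutes and hours. The variable names are illustrative only, and it assumes both times fall on the same day; it does not touch or model the FRS form itself.

```python
from datetime import datetime

# Downtime window taken from this entry (both times PST, assumed same day).
start = datetime.strptime("11:10", "%H:%M")
end = datetime.strptime("12:40", "%H:%M")

downtime = end - start
downtime_minutes = downtime.total_seconds() / 60
downtime_hours = downtime.total_seconds() / 3600

# The same interval expressed in both units.
print(f"{downtime_minutes:.0f} min = {downtime_hours:.1f} hrs")
```

Entering the minutes value (90) into a field that is read as hours overstates the downtime by a factor of 60, which is the mix-up described in the note above.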