Displaying report 1-1 of 1.
Reports until 15:23, Saturday 09 August 2014
H1 AOS
david.barker@LIGO.ORG - posted 15:23, Saturday 09 August 2014 (13316)
h1fw0 back up and running

Greg, Dan, Dave

h1fw0 is back up and running. 

The problem was with disk9 in the raid raid-dcs-h1a. Its status LED was flashing yellow instead of steady green. The disk appears to have only partially failed, so the raid continued to try to use it. The result was an unstable file system which could not keep up with the frame writing.

The first step was to run the Oracle 'guds' command to provide diagnostics.

The second step was to stop reading from this file system and have h1fw0 only write to it. It was still unstable.

The third step was to power cycle the Solaris box h1ldasgw0, remount the file system on h1fw0 and restart the frame writer. It was still unstable.

The fourth and most drastic step was to walk to the LDAS server room and physically remove the offending disk9 from the raid (hot removal). This forced the raid to stop trying to use disk9 and to start using the hot-swap spare.

At the time of writing, h1fw0 has been running for 20 minutes, which is longer than it has run at any point since 4am this morning.
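
A stall of this kind could in principle be spotted by checking how recently the newest file in the frame directory was modified. The following is a minimal Python sketch of that idea, not the monitoring actually used on h1fw0; the function names, the directory path and the 60-second threshold are all hypothetical.

```python
import os
import time


def newest_mtime(directory):
    """Return the latest modification time of any regular file in
    `directory`, or None if it contains no regular files."""
    newest = None
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                mtime = entry.stat().st_mtime
                if newest is None or mtime > newest:
                    newest = mtime
    return newest


def writer_is_stalled(directory, max_age_seconds, now=None):
    """Return True if no file in `directory` has been modified within
    the last `max_age_seconds` (hypothetical staleness criterion)."""
    if now is None:
        now = time.time()
    newest = newest_mtime(directory)
    if newest is None:
        # An empty directory is treated as a stalled writer.
        return True
    return (now - newest) > max_age_seconds
```

For example, `writer_is_stalled("/path/to/frames", 60)` would report a stall if nothing in that (hypothetical) directory had been written in the last minute. A suitable threshold depends on the frame length being written.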
