TITLE: 05/02 Day Shift: 15:00-23:00 UTC (08:00-16:00 PDT), all times posted in UTC
STATE of H1: Observing
OUTGOING OPERATOR: Jeff
QUICK SUMMARY: H1 is still locked although maintenance day has begun. I switched the OPS_OBSERVATORY_MODE to Preventative Maintenance.
5.1 magnitude EQ south of Tonga at 14:06 UTC (07:56 PT).
The first half of the shift was in Observing with no apparent problems. Range is in the upper 60s Mpc and environmental conditions are good. With a little luck the second half of the shift will be a lot quieter than yesterday.
TITLE: 05/02 Eve Shift: 23:00-07:00 UTC (16:00-00:00 PDT), all times posted in UTC
STATE of H1: Observing at 67Mpc
INCOMING OPERATOR: Jeff
SHIFT SUMMARY: Recovering from the many Alaska earthquakes and front end reboots. Observing for the past 2 hours.
LOG:
Jeff Kissel, Kiwamu Izumi, Jenne Driggers, TJ Shaffer
The two 6+ earthquakes in Alaska along with the front end restarts did a number on us, but Jeff K and Kiwamu worked hard and we are now back to Observing at 68Mpc.
I arrived on shift with initial alignment just finishing up. We could not go past LOCKING_ARMS_GREEN without damping the severely rung-up bounce modes, so we spent some time damping those and trying to get them below the noise. After we were able to move on from there, we got DRMI after a handful of tries. We stopped at RF_DARM to damp the bounce modes further, and then checked on the OM alignment at ANALOG_CARM. We then had to engage the SOFT loops slowly, per Jenne's suggestion. At this point we could see that the roll modes were also very rung up and, knowing this, still proceeded to lose lock at ROLL_MODE_DAMPING. Doh.
So I ran through the whole thing again, and this time I damped the roll modes by hand. The rest of locking was smooth sailing.
Now I can finally eat.
To hopefully speed up recovery in the future, I have fixed the IFO_ALIGN_COMPACT screen so that it is compatible with Time Machine.
I have also made a driftmon screen (accessible from IFO_ALIGN_COMPACT) that looks at either the OSEM or OpLev witness at the bottom stage of each optic. The pre-existing driftmon screen looked at the top stage. This new screen is also compatible with Time Machine, and color-coded to match the sliders screen.
Hopefully, in the future we will just have to Time Machine the one driftmon screen, move the sliders to match those locations, and then go forward. This removes the need for operators or the recovery team to trend or Time Machine each channel for each optic individually, so it should speed things up significantly.
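For what it's worth, the same lookup could also be scripted. Below is a minimal Python sketch of pulling bottom-stage witness values from a known-good time with the nds2-client bindings; the GPS time, NDS server, and channel names are illustrative placeholders only, not the channels actually wired into the new screen (the real bottom stages differ by suspension type, OSEM witness vs. optical lever).

import nds2

GPS_GOOD = 1177700000          # hypothetical GPS time of a known-good alignment
OPTICS = ['PRM', 'SRM', 'BS', 'ITMX', 'ITMY', 'ETMX', 'ETMY']

conn = nds2.connection('nds.ligo-wa.caltech.edu', 31200)
for optic in OPTICS:
    for dof in ('P', 'Y'):
        chan = 'H1:SUS-%s_M3_WIT_%s_DQ' % (optic, dof)   # placeholder witness channel name
        buf = conn.fetch(GPS_GOOD, GPS_GOOD + 4, [chan])[0]
        print('%-5s %s slider target: %10.1f' % (optic, dof, buf.data.mean()))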
All IOP models were restarted recently, which gives me an opportunity to check on the AUTOCAL status of the 18-bit DACs at startup. We have 100% success, with all taking the usual 5.3 seconds except for two taking 5.1 seconds and two taking 6.5 seconds.
h1sush2a
[ 129.028682] h1iopsush2a: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 134.392007] h1iopsush2a: DAC AUTOCAL SUCCESS in 5345 milliseconds
[ 141.426432] h1iopsush2a: DAC AUTOCAL SUCCESS in 6576 milliseconds +++
[ 146.789845] h1iopsush2a: DAC AUTOCAL SUCCESS in 5345 milliseconds
[ 152.582684] h1iopsush2a: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 157.944027] h1iopsush2a: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 163.305387] h1iopsush2a: DAC AUTOCAL SUCCESS in 5344 milliseconds
h1sush2b
[ 49.239824] h1iopsush2b: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 54.603246] h1iopsush2b: DAC AUTOCAL SUCCESS in 5344 milliseconds
h1sush34
[ 258.304447] h1iopsush34: DAC AUTOCAL SUCCESS in 5345 milliseconds
[ 263.667874] h1iopsush34: DAC AUTOCAL SUCCESS in 5341 milliseconds
[ 269.461621] h1iopsush34: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 274.824274] h1iopsush34: DAC AUTOCAL SUCCESS in 5340 milliseconds
[ 280.618002] h1iopsush34: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 285.979340] h1iopsush34: DAC AUTOCAL SUCCESS in 5344 milliseconds
h1sush56
[ 200.361619] h1iopsush56: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 205.724956] h1iopsush56: DAC AUTOCAL SUCCESS in 5345 milliseconds
[ 211.523836] h1iopsush56: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 216.886349] h1iopsush56: DAC AUTOCAL SUCCESS in 5345 milliseconds
[ 222.680190] h1iopsush56: DAC AUTOCAL SUCCESS in 5344 milliseconds
h1susb123
[ 48.320209] h1iopsusb123: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 53.681546] h1iopsusb123: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 59.472545] h1iopsusb123: DAC AUTOCAL SUCCESS in 5345 milliseconds
[ 64.838906] h1iopsusb123: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 71.868287] h1iopsusb123: DAC AUTOCAL SUCCESS in 6576 milliseconds ***
[ 77.231569] h1iopsusb123: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 82.594837] h1iopsusb123: DAC AUTOCAL SUCCESS in 5345 milliseconds
[ 87.957423] h1iopsusb123: DAC AUTOCAL SUCCESS in 5345 milliseconds
h1oaf0
[ 51.570076] h1iopoaf0: DAC AUTOCAL SUCCESS in 5345 milliseconds
h1susey
[ 51.283680] h1iopsusey: DAC AUTOCAL SUCCESS in 5346 milliseconds
[ 56.644737] h1iopsusey: DAC AUTOCAL SUCCESS in 5346 milliseconds
[ 62.436958] h1iopsusey: DAC AUTOCAL SUCCESS in 5346 milliseconds
[ 67.803614] h1iopsusey: DAC AUTOCAL SUCCESS in 5346 milliseconds
[ 73.164682] h1iopsusey: DAC AUTOCAL SUCCESS in 5346 milliseconds
h1iscey
[ 1163.540088] h1iopiscey: DAC AUTOCAL SUCCESS in 5133 milliseconds ---
h1susex
[284637.769316] h1iopsusex: DAC AUTOCAL SUCCESS in 5346 milliseconds
[284643.127603] h1iopsusex: DAC AUTOCAL SUCCESS in 5343 milliseconds
[284648.914014] h1iopsusex: DAC AUTOCAL SUCCESS in 5328 milliseconds
[284654.271652] h1iopsusex: DAC AUTOCAL SUCCESS in 5345 milliseconds
[284659.629553] h1iopsusex: DAC AUTOCAL SUCCESS in 5383 milliseconds
h1iscex
[ 365.004169] h1iopiscex: DAC AUTOCAL SUCCESS in 5134 milliseconds ---
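For the record, a minimal Python sketch of how these AUTOCAL durations can be tabulated from saved console output, assuming the kernel log lines keep exactly the format pasted above (pipe the dmesg text into it on stdin):

import re
import sys
from collections import defaultdict

# Matches lines like:
#   [  129.028682] h1iopsush2a: DAC AUTOCAL SUCCESS in 5344 milliseconds
PATTERN = re.compile(r'(h1iop\w+): DAC AUTOCAL SUCCESS in (\d+) milliseconds')

durations = defaultdict(list)
for line in sys.stdin:
    m = PATTERN.search(line)
    if m:
        durations[m.group(1)].append(int(m.group(2)))

for model, times in sorted(durations.items()):
    print('%-14s %d successful AUTOCALs, min/max %d/%d ms'
          % (model, len(times), min(times), max(times)))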
Nothing unusual. There are obvious signs of diode current adjustments.
Concur with Ed, everything looks normal.
TITLE: 05/01 Day Shift: 15:00-23:00 UTC (08:00-16:00 PDT), all times posted in UTC
STATE of H1: Earthquake
INCOMING OPERATOR: TJ
SHIFT SUMMARY:
H1 recovery is ongoing. The BS SEI front-end computer crashed while we were trying to damp extremely rung-up bounce modes; Dave took advantage of the downtime to reboot the DAQ. Currently, locking the PRC is a daunting task. Jeff K is trying his hand at aligning the RMs to the REFL WFS.
LOG:
H1 down due to heavy EQ activity in Alaska.
15:39 Bubba and Chris out to lube fans in corner
17:26 Fil out to vault
20:10 Chris to MY
20:29 Fil to MY
20:45 Begin WD resetting
Ed, Jeff K, Richard, Dave:
h1seib2 locked up at 13:05 PDT this afternoon after running for 215 days. Its console reported an uptime of 399,422 seconds, and the error was the same as seen on the other locked-up consoles ("fixing recursive fault but reboot is needed!").
I rebooted all of the front end computers with run-times exceeding 208 days. The front ends which were not rebooted are (followed by their uptimes): h1oaf0 (168 days), h1psl0 (134 days), h1seiex (4 days), h1susex (3 days).
I also rebooted the DAQ computer h1nds0 which had been running 209 days. The following DAQ computers were not rebooted: h1dc0 (194 days), h1nds1 (91 days), h1tw1 (111 days), h1broadcast0 (57 days), h1build (102 days), h1boot (90 days).
Notes:
h1psl0 and h1oaf0 could be scheduled for reboots during the May vent event.
h1dc0 is scheduled to be rebooted during 5/2 maintenance, to clear this before the May vent event.
Reboot details:
I was able to take h1seib2 out of the Dolphin fabric before it was reset. I found that I was able to manage the computer via the IPMI management port (so only the standard gigabit ethernet ports were disabled) and reset the computer using this method. The computer came back correctly.
For every dolphin'ed corner station the procedure was: kill all models, become root user, take local host out of dolphin fabric, issue 'reboot' command. This worked on every machine, and the h1psl models were never glitched.
For every non-dolphin'ed corner station the procedure was the same sans the dolphin removal.
At EX, h1iscex was the only dolphin'ed machine which needed a reboot, but the procedure failed and the machine did not come back from the soft reboot. After using IPMI to reset it, all other EX models were Dolphin-glitched and needed code restarts (not computer reboots). The sus-aux machine rebooted with no issues.
At EY things were worse: following the procedure, none of the dolphin'ed machines restarted correctly after the soft reboot. IPMI resets got h1seiey and h1iscey going, but not h1susey (the faster computer model). We ended up IPMI power-cycling h1susey to get the code to run; last Friday morning this is how h1susex was recovered too. The sus-aux machine rebooted with no issues.
h1dc0 did not shut down cleanly (I think this is a known daqd issue) and needed a front-panel RESET button press. It was a slow restart because the OS had been running in excess of 214 days (like we didn't know this!). It kicked out an error attempting to NFS mount h1tw0 (absent) and then started running.
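For reference, the IPMI reset vs. power-cycle distinction above maps onto two different ipmitool chassis commands; here is a rough Python wrapper, assuming ipmitool is installed and using placeholder BMC hostnames and credentials:

import subprocess

def ipmi_power(bmc_host, action, user='admin', password='CHANGEME'):
    """Issue an ipmitool chassis power command to a front end's BMC.
    action='reset' is the warm reset that recovered h1seiey and h1iscey;
    action='cycle' removes and restores power, which is what finally got
    h1susey's code running.  Host and credentials here are placeholders."""
    cmd = ['ipmitool', '-I', 'lanplus', '-H', bmc_host,
           '-U', user, '-P', password, 'chassis', 'power', action]
    return subprocess.run(cmd, capture_output=True, text=True, check=True)

# e.g. ipmi_power('h1susey-mgmt', 'status')   # hypothetical BMC hostname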
After the reboots, here are the current uptimes for the front end computers:
h1psl0 up 134 days
h1seih16 up 59 min
h1seih23 up 57 min
h1seih45 up 55 min
h1seib1 up 1:01
h1seib2 up 1:15
h1seib3 up 1:00
h1sush2a up 1:06
h1sush2b up 1:05
h1sush34 up 1:04
h1sush56 up 1:02
h1susb123 up 1:10
h1susauxh2 up 1 day
h1susauxh34 up 22 min
h1susauxh56 up 19 min
h1susauxb123 up 9:54
h1oaf0 up 168 days
h1lsc0 up 54 min
h1asc0 up 53 min
h1pemmx up 17 min
h1pemmy up 16 min
h1susauxey up 45 min
h1susey up 19 min
h1seiey up 35 min
h1iscey up 35 min
h1susauxex up 41 min
h1susex up 3 days
h1seiex up 4 days
h1iscex up 40 min
Here is a photo of h1seib2's console after it had locked up.
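Since the uptime list above mixes the three formats that the uptime command prints (days, hours:minutes, and minutes), here is a small Python sketch of normalizing them to days for a threshold check like the 208-day cutoff used for the reboots; it assumes the lines are saved verbatim to a file whose name is a placeholder.

import re

def uptime_days(text):
    """Convert 'up 134 days', 'up 1:15' (hours:minutes), or 'up 19 min'
    into a floating-point number of days."""
    m = re.search(r'up\s+(\d+)\s+day', text)
    if m:
        return float(m.group(1))
    m = re.search(r'up\s+(\d+):(\d+)', text)
    if m:
        return (int(m.group(1)) + int(m.group(2)) / 60.0) / 24.0
    m = re.search(r'up\s+(\d+)\s+min', text)
    if m:
        return int(m.group(1)) / 1440.0
    raise ValueError('unrecognized uptime string: %r' % text)

for line in open('fe_uptimes.txt'):        # placeholder file holding the lines above
    host, rest = line.split(None, 1)
    days = uptime_days(rest)
    flag = '  <-- over 208 days' if days > 208 else ''
    print('%-14s %7.2f days%s' % (host, days, flag))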
J. Kissel, E. Merilh, J. Driggers, K. Izumi, D. Barker
Just an update on the recovery process this morning:
- ~12:30 UTC Giant EQ in Alaska (LHO aLOG 35930). The EQ is on our tectonic plate, so it happens too fast for Bartlett to switch the ISI systems to Robust / EQ mode, so they all trip.
- ~14:30 UTC Giant aftershock (LHO aLOG 35931). Same story: the EQ is on our tectonic plate, so it happens too fast for Bartlett to switch the ISI systems to Robust / EQ mode, so they all trip.
- ~15:00 UTC Ed Merilh takes over from Bartlett.
- 16:30 UTC Ed having trouble with initial alignment (specifically, the PRC WFS loops would not close). The solution was to skip to MICH_DARK_LOCKED and make sure the BS was well aligned, then go back to PRC and move PRM in pitch until the AS port looked good.
- ~17:30 UTC Finished with initial alignment; the microseism and EQ band have rung down sufficiently to begin locking.
- ~18:00 UTC Found that we couldn't get past LOCKING ALS because of the ETMY bounce mode. Spent a LONG time running with ALS_DIFF at a factor of 10 less gain, slowly damping the mode with gains starting at 0.001 and eventually getting up to 0.028, and then...
- ~20:05 UTC The SEI BS front end crashed. Since we're already down, we voted for just completing all corner station and end station front-end reboots. Dave's working on that now; more news on everything later.
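Since that slow bounce-mode damping is essentially just stepping a damping gain up in small increments while watching the mode, here is a rough pyepics sketch of that kind of ramp; the channel name, gain steps, and dwell time are placeholders, not the settings actually used.

import time
import epics

GAIN_PV = 'H1:SUS-ETMY_M0_DARM_DAMP_V_GAIN'          # placeholder PV for the damping gain
STEPS = [0.001, 0.002, 0.005, 0.010, 0.020, 0.028]   # illustrative ramp, ending at the value quoted above
DWELL = 60                                           # seconds to sit at each step (placeholder)

for gain in STEPS:
    epics.caput(GAIN_PV, gain)
    print('damping gain set to %.3f, waiting %d s' % (gain, DWELL))
    time.sleep(DWELL)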
h1seib2 froze up at 13:15 local time; we are now restarting all machines with more than 208 days of run time.
(manual run of the scripts, crontabs were not operational)
Starting CP3 fill. LLCV enabled. LLCV set to manual control. LLCV set to 50% open. Fill completed in 40 seconds. TC B did not register fill. LLCV set back to 18.0% open.
Starting CP4 fill. LLCV enabled. LLCV set to manual control. LLCV set to 70% open. Fill completed in 630 seconds. TC A did not register fill. LLCV set back to 42.0% open.
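As an aside, the overfill sequence in those two log lines (enable the LLCV, switch to manual control, open it up, wait for a thermocouple to see liquid, restore the nominal opening) is simple enough to sketch in Python/pyepics. The PV names, threshold, and timeout below are placeholders, not the real autofill script that lives with the vacuum crontab.

import time
import epics

def overfill_cp(llcv_pos_pv, llcv_mode_pv, tc_pv, fill_pct, nominal_pct,
                tc_cold=-50.0, timeout=1200):
    """Run one CP overfill: open the LLCV to fill_pct, wait until the
    thermocouple reads colder than tc_cold (liquid overflow) or until
    timeout seconds pass, then restore nominal_pct.  All PV names and
    numbers here are placeholders."""
    epics.caput(llcv_mode_pv, 'MANUAL')    # placeholder: manual control mode
    epics.caput(llcv_pos_pv, fill_pct)     # placeholder: % open
    start = time.time()
    filled = False
    while time.time() - start < timeout:
        if epics.caget(tc_pv) < tc_cold:   # placeholder: TC temperature channel
            filled = True
            break
        time.sleep(5)
    epics.caput(llcv_pos_pv, nominal_pct)
    return filled, time.time() - start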
I've reconfigured the crontab for user vacuum on vacuum1; autofill should run automatically on Wednesday.
Increased CP4 to 43% open.
We had two more lockups over the weekend. Luckily they were on non-dolphin'ed corner station machines (SUS-AUX) and I was able to work remotely with the operator on getting these machines reset via the front-panel reset button.
Here is a summary of the lockups since Wednesday night:
h1susauxb123 | Mon 11:24 01 may 2017 UTC (04:24 PDT) |
h1susauxh2 | Sun 20:20 30 apr 2017 UTC (13:30 PDT) |
h1susex | Fri 12:43 28 apr 2017 UTC (05:43 PDT) |
h1seiex | Thu 13:10 27 apr 2017 UTC (06:10 PDT) |
Extended table, and added recent h1seib2 lockup:
computer | lock-up time (local) | computer uptime at lockup | timer reset date-time (local) |
h1seib2 | Mon 13:05 5/1 PDT | 215 days | Wed 22:07 4/26 PDT |
h1susauxb123 | Mon 04:24 5/1 PDT | 215 days | no data |
h1susauxh2 | Sun 13:30 4/30 PDT | 214 days | Wed 22:18 4/26 PDT |
h1susex | Fri 05:43 4/28 PDT | 209 days | Wed 23:34 4/26 PDT |
h1seiex | Thu 06:10 4/27 PDT | 209 days | Wed 21:53 4/26 PDT |
Had to accept a bunch of SDF diffs.
SEI - HEPI had some TRAMP (ramp time) and setpoint diffs, and the ISIs had some filter differences.
SUS - A few differences with setpoints
ASC - IMC PZT offset diffs
Actually, the earthquakes were in BC, Canada - that country that borders the US to the north (or east of Alaska).
The SDF diffs on HAMs 2 and 3 are because these chambers (for reasons we don't understand) can't have their Guardians change the gains on the GS13s. On these chambers it works just fine to use the SDF system to revert the GS13 gains, if you do them all at once. This used to also be settable from the Commands screen on the chamber overview, but the script that does the switching was written in Perl (sensor_hilo in userapps/isi/common/scripts), so it no longer runs on our new Debian workstations. There is currently no easy way to reset the gains for these chambers.
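If someone wants to resurrect sensor_hilo, a Python/pyepics rewrite would probably be only a few lines. Here is a rough sketch; the gain-switch channel naming below is a placeholder guess and would need to be checked against the real HAM2/HAM3 GS13 switch channels.

import epics

def set_gs13_gain(chamber, state):
    """Switch all six GS13s on one HAM-ISI between high and low gain at once."""
    value = {'high': 1, 'low': 0}[state]
    for dof in ('H1', 'H2', 'H3', 'V1', 'V2', 'V3'):
        pv = 'H1:ISI-%s_GS13INF_%s_GAINSW' % (chamber, dof)   # placeholder PV name
        epics.caput(pv, value)

# e.g. set_gs13_gain('HAM2', 'high')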