Activity Log: All Times in UTC (PT)
07:00 (00:00) Take over from TJ
08:18 (01:18) Jenne & Nutsinee – Leaving the site
08:23 (01:23) Guardian error – Trying to select INIT. ISC_LOCK reload fixed the problem.
08:30 (01:30) Redoing initial alignment
12:51 (05:51) Finished initial alignment – Trying to relock
12:57 (05:57) Lockloss at LOCK_DRMI_1F
12:58 (05:58) Peter – Going into LVEA to take some photos
13:03 (06:03) Peter – Out of LVEA
13:10 (06:10) Continuing to work on relocking
15:00 (08:00) Turn over to Travis

End of Shift Summary:
Title: 10/21/2015, Owl Shift 07:00 – 15:00 (00:00 – 08:00) All times in UTC (PT)
Support: Jenne, Nutsinee, Jeff K., Sheila, Kiwamu
Incoming Operator: Travis
Shift Summary:
No progress in relocking. Redid initial alignment.
No luck with INPUT_ALIGN: the power jumps to 1, then the lock immediately breaks.
Spoke with Sheila; no progress, so Sheila came to the site.
Completed initial alignment and started relocking. Made it to LOCK_DRMI_1F before lockloss.
Making progress at relocking, but still having difficulties. Sheila and Kiwamu are working through the problems.
Things that Jeff B and I have done:
Jeff was having difficulty locking the X arm in IR for initial alignment. Jeff moved SR3 pitch to bring the witness sensors and oplev back to where they were before the power outage (both indicated that SR3 lost 1-2 urad of pitch in the outage), but we still could not lock the X arm in IR. Jeff then re-ran the green WFS alignment and moved PR3 by 0.5 urad in yaw.
We checked all shutters to restore them to the way they were before the outage (opened the ISCT1 spare, IOT2L spare, and X end fiber shutters). This uncovered an interesting bug: if the X green beam shutter is open and the fiber shutter is closed, opening the fiber shutter will close the green beam shutter. In most other situations both of these shutters seem to work fine.
After this the X arm locked. Jeff went through the rest of initial alignment without incident until SRC align. Since SR3 was closer to the alignment from before the outage, we re-engaged the cage servo, which had been turned off earlier, probably because SR3 had moved. The SRC alignment was way off when we started the SRC align step, so Jeff aligned by hand to get us close.
We were then able to lock DRMI several times, but had difficulty engaging the DRMI ASC, as people described last night. We tried doing things by hand; it seemed like we were OK engaging the MICH and INP1 loops, but PRC2 was a problem. In August, Evan and I changed the input matrix for this loop to no longer use REFL 45 (alog 20811). I found the old input matrix in the svn and tried it. This worked once, so I've reverted the matrix in the guardian.
Input matrix for PRC2 since August 24th: REFL_A_RF9_I - REFL_B_RF9_I
Input matrix before August 24th and now:
# PRC2 REFLA45I - REFLA9I
asc_intrix_pit['PRC2', 'REFL_A_RF45_I'] = asc_intrix_yaw['PRC2', 'REFL_A_RF45_I'] = 0.83
asc_intrix_pit['PRC2', 'REFL_A_RF9_I'] = asc_intrix_yaw['PRC2', 'REFL_A_RF9_I'] = 0.5
asc_intrix_pit['PRC2', 'REFL_B_RF9_I'] = asc_intrix_yaw['PRC2', 'REFL_B_RF9_I'] = 0.5
asc_intrix_pit['PRC2', 'REFL_B_RF45_I'] = asc_intrix_yaw['PRC2', 'REFL_B_RF45_I'] = 0.83
This matrix gets reset in the full lock ASC engage states, so this is not a change to the full lock configuration.
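For context, here is a minimal sketch (not the actual guardian code) of how these lscparams dictionary entries might get pushed to the ASC input matrix before the DRMI ASC loops are engaged. It assumes asc_intrix_pit/asc_intrix_yaw are plain dicts keyed by (dof, sensor) tuples as above, and the channel-name pattern is an illustrative placeholder, not the real H1 ASC input-matrix channel naming.

# Sketch only: write the PRC2 input-matrix elements from lscparams to the
# front end via ezca (the EPICS wrapper available inside guardian code).
# The channel-name pattern is a placeholder, not the real ASC matrix channels.
import lscparams

def write_prc2_intrix(ezca):
    for (dof, sensor), val in lscparams.asc_intrix_pit.items():
        if dof == 'PRC2':
            ezca['ASC-INMATRIX_P_%s_%s' % (dof, sensor)] = val
    for (dof, sensor), val in lscparams.asc_intrix_yaw.items():
        if dof == 'PRC2':
            ezca['ASC-INMATRIX_Y_%s_%s' % (dof, sensor)] = val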
After this change we made it past DRMI, to the point where the DHARD WFS are engaged during the CARM offset reduction. The bounce and roll modes were rung up; we spent a few minutes damping them but probably moved on too soon, and we lost lock in the final stages of the CARM offset reduction, probably due to the roll modes.
Difficult morning for locking. After many lock failures, and at Jenne's suggestion, I started another initial alignment (one had been done earlier during the day). No problem with ALS green, but I could not get INPUT_ALIGN to complete: it would grab lock with good power for a second or two and then drop. Ran through a couple of things with Sheila on the phone, but made no real progress. Sheila is coming in to work on resolving this problem.
Title: 10/21/2015, Owl Shift 07:00 – 15:00 (00:00 – 08:00) All times in UTC (PT)
State of H1: 07:00 (00:00) Relocking after yesterday's power outage
Outgoing Operator: TJ
Quick Summary: Wind is a light to gentle breeze; seismic activity is low. Working on IFO recovery.
TITLE: "10/20 [EVE Shift]: 23:00-07:00UTC (16:00-00:00 PDT), all times posted in UTC"
STATE Of H1: Unlocked, struggling
SUPPORT: Jenne, Jeff K, Nutsinee, Sheila (by phone)
SHIFT SUMMARY: Problems from the beginning of my shift. Initial alignment presented two issues alog22705 and alog22707. Then trouble with DRMI has taken up the rest of my shift, alog22709 and alog22711. Let's hope for some good luck from here.
INCOMING OPERATOR: Jeff B
J. Kissel, J. Driggers, T. Shaffer
After several lock losses of DRMI that were obviously caused by the DRMI WFS running away with the optics, we've begun the arduous journey of going through each step of the DRMI WFS turn-on process. We're trying the usual -- identify which step is problematic, slow things down, try turning on with low gains and no boosts, etc. We've run through tuning the dark offsets, suspecting we might have problems there given our earlier experience with the TransMon QPDs, but they didn't change much. We've checked whitening filters and gains; all looks OK. Jenne's gonna stick it out with Bartlett for a few more attempts of finagling the WFS loops, in hopes of working through them enough that they offload goodness to the SUS, but TJ and I are going to call it a night.
J. Kissel, J. Driggers, T. Shaffer
After a very quick transition through ALS (nice!), we were stuck trying to get through DRMI 1f lock acquisition. After all of the usual tricks of taking SRM out of the equation and touching up alignment here and there, we began to be suspicious of the Beckhoff settings of our error-signal photodiodes, since Beckhoff settings are where just about all of our problems have been this evening. Gains, whitening filters -- all checked out. However, finally just *looking* at the error signal in dataviewer, we saw little to nothing. Jenne caught it -- the REFL shutter on ISCT was closed (similar to the earlier shutter gotcha, LHO aLOG 22696). 40 mins ... *flush*
J. Kissel, J. Driggers, T. Shaffer [E. Hall and S. Dwyer remotely]
Right after finally getting through XARM initial alignment, we advanced to the PRM initial alignment state and found the ALIGN_IFO guardian manager erroring out. It was relatively easy to trace the problem to /opt/rtcds/userapps/release/isc/h1/guardian/lscparams.py, in which ALIGN_IFO was expecting to find a variable "prm_m2_cross_over" defined. We found that this definition had been commented out of the code (on line 27), yet (a) there were no svn diffs, (b) the last time the file was touched was Oct 13 (by Evan), and (c) we've done initial alignment several times since Oct 13th. A call to Evan and Sheila revealed that they were just as baffled as we were as to how this could have possibly worked for 7 days. At their advice, we uncommented the variable definition, reloaded the guardian code, and the state succeeded admirably, without error. #facepalm We could try to come up with some boogey-man, malicious theories involving power outages and gremlins as to how this could possibly be true, but instead we move on with our day. The functional lscparams.py, with the uncommented, well-defined prm_m2_cross_over, has now been committed to the userapps repo. 30 minutes ... *flush*
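For the record, a sketch of the kind of definition ALIGN_IFO was expecting to find in lscparams.py; the numerical value below is a placeholder only, since the actual crossover setting isn't recorded in this entry.

# lscparams.py, line 27 (sketch): the definition ALIGN_IFO imports.
# The value is a placeholder, not the real PRM M2 crossover setting.
prm_m2_cross_over = 0.0  # placeholder value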
J. Kissel, J. Driggers, T. Shaffer [S. Dwyer remotely]
While working our way through initial alignment, we were having great difficulties keeping the XARM locked on Red. Looking in every drawer and under every rug we could find, we found three small problems:
(1) The TransMon X QPD B dark offset sum compensation needed to be changed from +0.8 to -1.2. Not crazy that a site-wide power outage changed the QPD dark offsets.
(2) Digging around the analog signal chain, we found that the TransMon X QPD A and B whitening gains were +18 [dB], when they had been +9 [dB] 24 hours ago, before the power outage. Not crazy that some Beckhoff settings did not get restored properly. I'm worried there are more, but I guess we'll find those later...
(3) The ALIGN_IFO manager had been continuously requesting PRM and SRM to be misaligned for the XARM state, but PRM and SRM were not misaligned. After re-requesting misaligned (on their respective individual SUS guardian nodes), PRM and SRM actually misaligned, which cleaned up the AS AIR RF45 PDH error signal nicely, and the XARM locked right up.
Two hours ... *flush*
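As an aside, this kind of post-outage settings hunt lends itself to a quick scripted comparison. Below is a minimal sketch using pyepics; the channel names and expected values are made-up placeholders standing in for the real TransMon QPD offset and whitening-gain channels.

# Sketch: compare a few restore-sensitive settings against their expected
# values after a power outage.  Channel names and expected values here are
# illustrative placeholders, not the actual H1 channels.
from epics import caget

expected = {
    'H1:EXAMPLE-TRX_QPD_B_SUM_OFFSET': -1.2,    # placeholder channel/value
    'H1:EXAMPLE-TRX_QPD_A_WHITEN_GAIN': 9.0,    # placeholder channel/value
    'H1:EXAMPLE-TRX_QPD_B_WHITEN_GAIN': 9.0,    # placeholder channel/value
}

for chan, want in expected.items():
    have = caget(chan)
    status = 'OK' if have is not None and abs(have - want) < 1e-3 else 'CHECK'
    print('%-40s expected %8.3f found %s [%s]' % (chan, want, have, status))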
J. Kissel, for R. McCarthy, J. Worden, G. Moreno, J. Hanks, R. Bork, C. Perez, R. Blair, K. Kawabe, P. King, J. Oberling, J. Warner, H. Radkins, N. Kijbunchoo, E. King, B. Weaver, T. Sadecki, E. Hall, P. Thomas, S. Karki, D. Moraru, G. Mendell

Well, it turns out the IFO is a complicated beast with a lot of underlying infrastructure that we rely on to even begin recovering the IFO. Since LHO so infrequently loses power, I summarize here the IFO systems that are necessary before we can begin the alignment / recovery process, with pointers to aLOGs and/or the names of the people who did the work, so that we have a global perspective on all of the worlds that need attention when all power dies. One could consider this a sort of checklist, so I've roughly prioritized the items into stages, where the items within each stage can be done in parallel if the man-power exists and/or is on-site.

Stage 1
--------------------
Facilities - Richard
Vacuum - John / Kyle / Gerardo

Stage 2
--------------------
CDS
    Work Stations - Richard
    Control Room FOMs - Operators / Carlos
    DC Power Supplies - Richard

Stage 3
--------------------
CDS continued
    Front-Ends and I/O Chassis - Dave (LHO aLOG 22694, LHO aLOG 22704)
    Timing System
    Guardian Machine (comes up OK with a simple power cycle)
Beckhoff PLCs - Patrick (LHO aLOG 22671)
PSL - Peter / Jason / Keita (LHO aLOG 22667, LHO aLOG 22674, LHO aLOG 22693)
    Laser Chillers
    Front Ends
    TwinCAT Beckhoff (separate from the rest of the IFO's Beckhoff)
    IO Rotation Stage
TCS - Nutsinee / Elli (LHO aLOG 22675)
    Laser Chillers
    TCS Rotation Stage (runs on the same Beckhoff chassis as the IO Rotation Stage, along with some PSL PEM stuff)
ALS Green Lasers - Keita
    The interlocks for these lasers are on top of the ISCT-Ends, and need a key turn as well as a "start" button push, so it's a definite trip to the end stations.
PCAL Lasers - Sudarshan
    These either survived the power outage, don't have an interlock, or can be reset remotely. I asked Sudarshan about the health of the PCAL lasers, and he was able to confirm goodness without leaving the control room.
High-Voltage - Richard McCarthy
    ESD Drivers, PZTs
HEPI Pumps and Pump Servos - Hugh (LHO aLOG 22679)

Stage 4
------------------
Cameras - Carlos
    PCAL Spot-position Cameras
    Green and IR cameras
SDF System - Betsy / Hugh
    Changing the default start-up SAFE.snap tables to OBSERVE.snap tables (LHO aLOG 22702)
Hardware Injections - Chris Biwer / Keith Riles / Dave Barker
    These have not yet been restarted.
DMT / LDAS - Greg Mendell / Dan Moraru (LHO aLOG 22701)

May we have to exercise this list very infrequently, if at all, in the future!
C. Vorvick should be added to the list of participants! Apologies for anyone else that slipped from my mind late in the evening.
DCS has fully recovered from this morning's power outage. Affected by the power outage were all compute nodes, a couple of switches, the ACSLS server, and a disk expansion chassis. The switches and ACSLS server have single power supplies and so could not be moved onto UPSes previously without disruption. They are now UPS-protected, as is the E18X expansion chassis that had mistakenly been left off UPS. We took advantage of the downtime to patch and reboot all Solaris servers. The LDAS gateway failed to come back up and required intervention, but that issue has been resolved. The Condor central manager is now running on new hardware. The gstlal-calibration packages were updated cluster-wide, and llldd partitions are now locked into memory on all servers other than dmt-er.
Annulus ion pump is now running on its own.
Aux pump cart was turned off at 12:38 pm. Turbo pump and flex hoses were decoupled from the annulus system.
Title: 10/20 Day Shift 15:00-23:00 UTC (8:00-16:00 PDT). All times in UTC.
State of H1: Aligning
Shift Summary: H1 was down when I arrived this morning due to power outage. After extensive recovery and some minimal maintenance day activities sprinkled in, I began initial alignment. After a couple of hours of diagnosing why we weren't able to see the ALS beam, Keita discovered that the ISCT1 shutter was closed. At about the same time, we got hammered by a 7.1 EQ in Vanuatu ringing up both the 0.03-0.1 Hz and 0.1-0.3 Hz bands to over 1 um/s. Waiting for these to ring down.
Incoming operator: TJ
Activity log:
15:07 Jeff B to mezzanine to do TCS chiller work
15:09 Peter K to LVEA checking power supplies
15:13 Joe D to LVEA checking emergency lights
15:15 Gerardo to MY
15:17 Peter K done
15:20 Hugh to HEPI pump station
15:26 Hugh done
15:40 Joe D done
15:40 Kyle to beer garden
15:46 Joe D to mids and ends checking emergency lights
15:24 Gerardo to EY
15:50 Jodi to both mids
16:11 Nutsinee to mezzanine
16:15 Jason, Elli, Jeff B, and Nutsinee to LVEA for TCS chiller work
16:21 Fire dept. to EX
16:31 Hugh to EY for HEPI pump restart
16:47 Fire dept. done
16:53 Jason, Elli, Jeff B, and Nutsinee done
17:00 Hugh leaving EY, going to EX
17:11 Fil done
17:14 Joe D done
17:28 Dave B to mids for PEM work
17:36 Fil to EY
17:39 Jeff B and Nutsinee out of LVEA
17:47 Dave B reports no phone at MY
17:57 Bubba to LVEA checking batteries and LTS
18:00 Kyle to Y28 hammer drilling
18:00 Elli and guests touring LVEA
18:12 Jeff B done
18:12 Dave B done
18:17 Chris S beam tube sealing Xarm
18:19 Hugh done in mezz.
18:43 John to MY
18:54 Carlos and Sudarshan to EY for PCal camera restart
19:30 Gerardo to MX
20:00 Gerardo done
20:21 Carlos to both ends for PCal camera resets
21:20 Carlos back
21:55 Kyle and Gerardo to MY
22:39 Kyle and Gerardo done
Won't need helium leak test ability now that we have the leak location narrowed down to a small area. Local pressure gauges to be used in lieu of the helium setup.
[Basically everyone in the control room]
We spent a lot of time working on pre-initial alignment, since we weren't seeing any light at the ALS transmission cameras or PDs.
All of the test mass optics and PR3 were restored to where their oplevs thought they were before the power outage (since the oplevs are independent of ISI pointing). The TMS suspensions were aligned using the baffle PD script.
Eventually, it was discovered that a shutter on ISCT1 was closed, and blocking the transmitted green beams.
Keita and Daniel are looking into why it was closed at all when the power went out and why it wasn't opened after the burt restore that Patrick did, but I have also added it to the ISC_LOCK DOWN state, so that we don't get bitten by this again.
We're getting good flashing in the arm cavities, both for green and IR, so as soon as this earthquake finishes ringing down, we can do initial alignment and finally relock.
[Jenne, TJ]
After finishing the green initial alignment step, we noticed that the COMM beatnote power was tiny - something like -32dBm. No good. After trying some PR3 alignment with no effect, we looked at some more shutters. As it turns out, the PSL green beam also has a shutter, so we weren't getting any PSL green to the beat PD. Opening that immediately fixed the problem. TJ finished off hand-aligning PR3 to maximize the beatnote power, and we went off to our next initial alignment step. (See aLog 22705 for that "fun"....)
This shutter was also added to the ISC_LOCK down state, so we don't get bitten by it either.
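For reference, here is a minimal sketch (under stated assumptions; this is not the actual ISC_LOCK code) of what forcing these shutters open in the guardian DOWN state could look like; the shutter channel names below are placeholders.

# Sketch only: open the ISCT1 transmitted-green and PSL green shutters in the
# ISC_LOCK DOWN state so a closed shutter can't silently block the beams.
# The channel names below are placeholders, not the real shutter channels.
from guardian import GuardState

class DOWN(GuardState):
    def main(self):
        ezca['SYS-MOTION_C_SHUTTER_ISCT1_STATE'] = 1       # placeholder channel
        ezca['SYS-MOTION_C_SHUTTER_PSL_GREEN_STATE'] = 1   # placeholder channel
        return True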
Two hours ... *flush*
Rolf, Richard, Jeff, Betsy, Travis, Filiburto, Carlos, Jonathan, Greg, Sudarshan, Dave
Front Ends
All IOCs and front-end computers were power cycled. Recovery of the Dolphin systems was delayed due to a fault in h1seiex, which needed a second reboot to clear. (Jeff requests that the split of the single Dolphin master into three be bumped up in priority.)
Some IOP models went into the "negative" IRIG range for a few minutes. Some corner-station systems went into the high positive range and took several hours to come down to the operational range. At the time of this report, only SEIH45 has an IRIG error.
Yesterday I checked that there were no partial filter module loads or modified files, so the restart should not have loaded any new filters.
Hartmann Wavefront Sensors
Elli restarted the HWS code at both EX and EY. The EDCU had alerted us that this was not running; the EDCU is now GREEN.
DAQ
The DAQ rode through the outage and recovery with no problems. I have cleared the CRC errors that accumulated between the FECs and the DAQ concentrator due to the restarts.
DMT
Greg reports all DMT systems are fully recovered.
PCAL camera
Sudarshan and Carlos brought the PCAL camera systems back online.
SDF
All systems are now using their OBSERVE.snap files for their SDF reference.
Restart log
The full restart log is attached. The filter-module and DAQ restart strings show the last start times for each model, alphabetically.
After clearing problems with the Dolphin network and untripping all watchdogs, there remained inconsequential IRIG-B timing errors present on the CDS State Word on several front ends: h1sush56, h1seih45, h1seib2, h1psl0. These errors are a result of the IRIG-B system not starting up in sync with the 1 PPS timing signal. They eventually go away after some time as the IRIG-B slowly synchronizes, as it did in this case. I document it just for future reference: these errors are not surprising and have little to no impact on recovery.
J. Kissel, B. Weaver, J. Driggers, H. Radkins
When all front ends die and restart, they come back pointing to their SAFE.snap SDF file. Once the front-end computers were up, running, and mostly happy, and we began to *use* them to recover the IFO, we started changing all of the SAFE.snaps to the nominal OBSERVE.snaps to help us continue to figure out what settings were out of place. This is a pretty tedious task, but once through, we reverted everything and requested the ISC guardians to run their DOWN states. This worked out quite well, but we really could use a script that switches all FEs' SDF tables from SAFE.snap to OBSERVE.snap.
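In the spirit of that wish, here is a rough sketch of what such a script could look like, assuming each front-end model exposes SDF request channels of the form shown; the DCUID list and FEC channel names below are placeholders that would need to be confirmed against the SDF MEDM screens.

# Sketch: ask every front-end model's SDF system to load OBSERVE.snap instead
# of SAFE.snap.  The DCUID list and FEC channel names are placeholders.
from epics import caput

DCUIDS = [10, 11, 12]   # placeholder list of model DCUIDs

for dcuid in DCUIDS:
    prefix = 'H1:FEC-%d_SDF_' % dcuid
    caput(prefix + 'NAME', 'OBSERVE')   # requested snap file (placeholder channel)
    caput(prefix + 'RELOAD', 1)         # trigger table load (placeholder channel)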
J. Kissel, K. Thorne, D. Barker, R. Bork, J. Hanks, R. Blair

Since it was unclear where to find the recovery process for the front ends, given how interwoven they are, I outline the process that Keith had suggested and that we ended up following once Dave got in. Depending on the length of the power outage, some front-end computers may or may not survive the outage. However, again, given the interwoven systems, we've found it best to perform a systematic shut-down such that all computers and their interactions can be brought up in a controlled fashion. With the current setup of the Dolphin network, the power-down and power-up procedure should be performed at the end stations first, because the corner-station computers won't start until the end-station Dolphin network is up and functional. As Dave mentions, I've requested that LHO adopt LLO's splitting of the Dolphin fabrics, such that one can truly exercise the end stations independently of the corner.

Power-down and power-up procedure:
- Power down all front-end computers in the MSR (for the corner station) or the Entrance Lobby (for the end stations). This is done by holding down the power button for ~5 [sec].
- Power down all I/O chassis in the CDS high bays. The rocker switches on the front panels don't always work, so you may have to use the rocker switch on the back of the chassis, above where the +/- 24 [V] comes in.
- Power cycle the DC power supplies (recommended by LLO; it is unclear whether Richard did this before we got in in the morning, and we did *not* do this systematically when we ran this power-down / power-up procedure today).
- Power up the I/O chassis.
- Wait for the timing slaves to be happy (relatively quick, but be sure to check).
- Turn on the front-end computers.

Once you turn on the front-end computers, they will automatically start the front-end processes. Recall that for SUS front ends it may look (from the GDS-TP screens and the CDS overview) like the SUS computers are not coming back, but that's merely because they're running through the 18-bit DAC auto-calibration, which takes ~3 to 5 minutes. This happens once the IOP model is started, so from the CDS overview screen it will look like the IOP model came up dead and the user models didn't start. Give it a few minutes before you get sad and go to restart the front-end processes by hand.
Kiwamu and I spent about 15 minutes locked with a CARM offset of 10 picometers to damp the bounce and roll modes. We lost lock after that, possibly because the IFO became misaligned as we were sitting at 10 pm damping.
In the next lock, we saw that the ETMX violin modes were also rung up; Kiwamu lowered the damping gains to stop the PUM saturations.
We made it through engaging the ASC in full lock, and found that we couldn't lock the OMC because the Kepco power supply was off.
After we made it to low noise, I cleared 82 diffs in the ASC SDF. Most of these were due to the dark offset script that Jenne and Jeff ran last night. I accidentally accepted all of them with one button click (I hit "accept all" assuming it applied only to the first page, which I could read, but "accept all" really means accept all). Betsy pointed me to the last version to be accepted in SDF, so I was able to check the things I had inadvertently accepted. There were a few oddball items, like the ADS SIG DEMOD TRAMPs (I accepted the new 3-second TRAMPs) and offsets in the SRC1 loops (now set to 0).