Ed, Sheila
Trouble with LOCKING_ALS sent us looking at the HV settings on ETMX. It seemed that the L3 stages were set to LO volts rather than HI. We believe that when we toggled the L3 LL HI/LO control, the state of the UL changed as well. We're also not sure why SDF didn't catch this change; we couldn't find it in SDF with a search.
A similar thing:
OMCDCPD whitening settings were incorrect and the DCPD_MATRIX elements were all zero. These records did exist in SDF, but were set to zero there.
This morning started with a smorgasbord of troubles. Patrick aLogged what happened there. After we seemingly got everything back up, there were still some lingering issues with connections/channels that were finally resolved through a half-dozen or so phone calls with Dave Barker. His aLogs should show the gory details. I'm finally trying to get things re-aligned so I can get this ship sailing again.
Existing MEDMs continued to be connected to h1iscex, but no new connections were possible. Also I was unable to ping or ssh to h1iscex on the FE-LAN. This also meant that the Dolphin manager was unable to put this node into an offline state. The only recourse was to put SUS-EX and SEI-EX into a safe state and remotely power cycle h1iscex via its IPMI management port. As expected, this in turn glitched the attached Dolphin nodes in the EX-Fabric (h1susex and h1seiex). I restarted all the models on these two systems and Ed is now recovering EX.
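For reference, the remote power cycle through the IPMI management port is a single ipmitool call; below is a minimal sketch wrapped in Python, with the management hostname and credentials as placeholders (not the real ones for h1iscex).

import subprocess

# Sketch only: remote power cycle of a front end via its IPMI management port.
subprocess.check_call([
    'ipmitool', '-I', 'lanplus',
    '-H', 'h1iscex-mgmt',      # assumed IPMI management hostname
    '-U', 'admin',             # placeholder credentials
    '-P', 'password',
    'chassis', 'power', 'cycle'])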
At approximately 07:50 PDT this morning the /opt/rtcds file system (served by h1fs0) became full. This caused some front-end EPICS processes to segfault (example dmesg output for h1susb123 shown below); presumably these models' EPICS processes were trying to do some file access at the time. The CDS overview is attached, showing which specific models had problems. At this point guardian stopped running because it could not connect to critical front ends. LOCKLOSS_SHUTTER_CHECK also reported an NDS error at this time (log shown below); further investigation is warranted since h1nds0 was running at the time.
On trying to restart h1susitmx, the errors showed that /opt/rtcds was full. This is a ZFS file system, served by h1fs0. I first attempted to delete some old target_archive directories, but ran into file-system-full errors when running the 'rm' command. As root, I manually destroyed all the ZFS Snapshots for the month of May 2016. This freed up 22.3GB of disk which permitted me to start the failed models.
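The cleanup amounted to listing the dataset's snapshots and destroying the May 2016 ones. A minimal sketch of that kind of cleanup follows; the dataset name and snapshot naming pattern are assumptions, not the actual h1fs0 layout.

import subprocess

DATASET = 'h1fs0/opt-rtcds'    # placeholder dataset name

# List every snapshot of the dataset ('zfs list -t snapshot' is standard ZFS).
out = subprocess.check_output(
    ['zfs', 'list', '-H', '-t', 'snapshot', '-o', 'name', '-r', DATASET])

# Destroy only the snapshots whose names indicate May 2016.
for snap in out.decode().split():
    if '2016-05' in snap:
        subprocess.check_call(['zfs', 'destroy', snap])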
Note that only the model EPICS processes had failed, the front end cores were still running. However in order to cleanly restart the models I first issued a 'killh1modelname' and then ran 'starth1modelname'. Restarting h1psliss did not trip any shutters and the PSL was operational at all times.
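The restart sequence, sketched for a few of the models that appear in the dmesg output below (the list is illustrative, not the complete set that failed, and each pair should be run on the front end that hosts the model):

import subprocess

failed_models = ['h1susitmx', 'h1susitmy', 'h1susbs']   # example subset

for model in failed_models:
    # kill<model> / start<model> are the standard CDS wrapper scripts
    subprocess.check_call(['kill' + model])
    subprocess.check_call(['start' + model])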
I've handed the front ends over to Patrick and Ed for IFO locking, I'll work on file system cleanup in the background.
I've opened FRS6488 to prevent a recurrence of this.
[1989275.036661] h1susitmxepics[25707]: segfault at 0 ip 00007fd13403c894 sp 00007fffb426b9a0 error 4 in libc-2.10.1.so[7fd133fda000+14c000]
[1989275.045095] h1susitmxepics used greatest stack depth: 2984 bytes left
[1989275.086076] h1susbsepics[25384]: segfault at 0 ip 00007f2a5348e894 sp 00007fff908c88e0 error 4 in libc-2.10.1.so[7f2a5342c000+14c000]
[1989275.127643] h1susitmyepics[25166]: segfault at 0 ip 00007f5905a59894 sp 00007fff20f878d0 error 4 in libc-2.10.1.so[7f59059f7000+14c000]
2016-10-23T14:51:50.62907 LOCKLOSS_SHUTTER_CHECK W: Traceback (most recent call last):
2016-10-23T14:51:50.62909 File "/ligo/apps/linux-x86_64/guardian-1.0.2/lib/python2.7/site-packages/guardian/worker.py", line 461, in run
2016-10-23T14:51:50.62910 retval = statefunc()
2016-10-23T14:51:50.62910 File "/opt/rtcds/userapps/release/isc/h1/guardian/LOCKLOSS_SHUTTER_CHECK.py", line 50, in run
2016-10-23T14:51:50.62911 gs13data = cdu.getdata(['H1:ISI-HAM6_BLND_GS13Z_IN1_DQ','H1:SYS-MOTION_C_SHUTTER_G_TRIGGER_VOLTS'],12,self.timenow-10)
2016-10-23T14:51:50.62911 File "/ligo/apps/linux-x86_64/cdsutils/lib/python2.7/site-packages/cdsutils/getdata.py", line 78, in getdata
2016-10-23T14:51:50.62912 for buf in conn.iterate(*args):
2016-10-23T14:51:50.62912 RuntimeError: Requested data were not found.
2016-10-23T14:51:50.62913
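The RuntimeError above is what cdsutils.getdata raises when NDS cannot supply the requested data. A minimal sketch, assuming the guardian state looks roughly like the frame in the traceback, of guarding that call so the node retries instead of going into error (this is not the current LOCKLOSS_SHUTTER_CHECK code):

import cdsutils as cdu
from guardian import GuardState

class CHECK_SHUTTER(GuardState):        # illustrative state name
    def run(self):
        try:
            gs13data = cdu.getdata(
                ['H1:ISI-HAM6_BLND_GS13Z_IN1_DQ',
                 'H1:SYS-MOTION_C_SHUTTER_G_TRIGGER_VOLTS'],
                12, self.timenow - 10)  # self.timenow set earlier, per the traceback
        except RuntimeError:
            # NDS did not have the data; returning False keeps the state
            # running so the fetch is retried on the next cycle
            return False
        # ... continue with the shutter check using gs13data ...
        return True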
Started Beckhoff SDF for h1ecatc1 PLC2, h1ecatx1 PLC2, and h1ecaty1 PLC2 by following the instructions at the end of this wiki: https://lhocds.ligo-wa.caltech.edu/wiki/UpdateChanListBeckhoffSDFSystems
controls@h1build ~ 0$ starth1sysecatc1plc2sdf
h1sysecatc1plc2sdfepics: no process found
Specified filename iocH1.log does not exist.
h1sysecatc1plc2sdfepics H1 IOC Server started
controls@h1build ~ 0$ starth1sysecatx1plc2sdf
h1sysecatx1plc2sdfepics: no process found
Specified filename iocH1.log does not exist.
h1sysecatx1plc2sdfepics H1 IOC Server started
controls@h1build ~ 0$ starth1sysecaty1plc2sdf
h1sysecaty1plc2sdfepics: no process found
Specified filename iocH1.log does not exist.
h1sysecaty1plc2sdfepics H1 IOC Server started
Everything was going well until 10 minutes before the end of the shift. The IFO was locked at NLN (26 W) and the range was fairly steady around 60 Mpc. Then at 14:50 UTC the IFO lost lock, guardian went into error, and it looks like various front-end models have crashed.
08:06 UTC Stefan done with commissioning. IFO is locked at NLN (26 W).
14:50 UTC Lock loss. ISC_LOCK node in error. LOCK_LOSS_SHUTTER_CHECK node in error. Hit LOAD on ISC_LOCK. Lots of voice alarms. Various front-end models are white.
Ed, Patrick 14:50 UTC The IFO lost lock. Guardian reported ISC_LOCK node in error and LOCK_LOSS_SHUTTER_CHECK node in error. The guardian overview turned dark red. I hit LOAD on the ISC_LOCK guardian. Ed came in and I turned around and saw a bunch of frontend models were white. I thought it must have been a power glitch, so I called Richard. He reported that he did not receive any notices related to a power problem and suggested I call Dave. I have done so, but have not been able to reach him yet. I am no longer thinking it is a power glitch. The laser is still up and all of the machines in the MSR appear to be running. I have attached a screenshot of the initial guardian error and the cds overview.
This spectrum was taken with the POP_A PIT QPD offset removed (see snapshots).
Left plot: Current noise against O1 references.
Right plot: Current noise against tonight's 40 W noise, and the noise from last night (POP_A PIT QPD offset was on, TCS ring heater was transitioning - see previous elog).
Plot 1 shows the DC signals of all 4 I segments of ASA36. Note that seg 3 is ~2.5 times larger than the others.
Plot 2 shows the updated AS_A_RF36_I matrix - the gains for seg 3 have been dropped to -0.4 from -1.
Plot 3 shows the resulting error signal - it now crosses zero where the buildups and couplings for SRCL are good.
Closed the SRC1 PIT and YAW loops with a gain of 10, and input matrix element of 1. I will leave this setting for the night - although it is not in guardian yet.
I accepted the funny matrix in SDF, and added this to the SRM ASC high power state. The loops should only come on for input powers less than 35 Watts. Nutsinee and I tested it once.
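A minimal guardian-style sketch of the conditional engagement described above ('ezca' is the EPICS interface guardian provides; the power readback and SRC1 channel names are placeholders, not the ones actually in use):

if ezca['IMC-PWR_IN_OUTPUT'] < 35:                 # assumed input-power readback
    ezca['ASC-SRC1_P_GAIN'] = 10                   # gains quoted above
    ezca['ASC-SRC1_Y_GAIN'] = 10
    ezca.switch('ASC-SRC1_P', 'INPUT', 'OUTPUT', 'ON')   # close the loops
    ezca.switch('ASC-SRC1_Y', 'INPUT', 'OUTPUT', 'ON')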
Stefan, Terra
We had a large peak rise quickly at 27.41 Hz around 6 UTC. A bit of searching gave us Jeff's alog identifying it as the bounce mode of PR2; as such we lowered gain of MICH from 2 --> 1.2 which eventually allowed it to ring down.
We were wondering whether the auxiliary noise depends on the TCS state or PR3 spot position. Last night we had a ringheater change with several short locks over the night, and tonight we had some alignment change.
Attached are AUX spectra from last night, from early tonight, and just now. For some reason the early tonight spectra were significantly better (albeit not quite O1 quality). We could not correlate it with alignment or heating in a systematic way.
TITLE: 10/22 Eve Shift: 23:00-07:00 UTC (16:00-00:00 PDT), all times posted in UTC
STATE of H1: Commissioning
INCOMING OPERATOR: Patrick
SHIFT SUMMARY:
I would recommend that instead of putting the extra factor of 2 gain in PRCL2, people double the gain in PRCL1. The guardian doesn't touch the gain in PRCL2, but later in the locking sequence it will adjust PRCL1 to the nominal value. If people adjust the gain in PRCL2 and forget to reset it to 1, this can cause problems.
If we consistently are needing higher gain, there is a parameter in LSC params that adjusts the gains for lock acquisition.
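In guardian-style terms, the recommendation amounts to something like the following (channel names assumed for illustration; 'ezca' is the EPICS interface guardian provides):

# Double the first PRCL gain stage; guardian will set this back to nominal
# later in the locking sequence.
ezca['LSC-PRCL1_GAIN'] = 2 * ezca['LSC-PRCL1_GAIN']
# Leave ezca['LSC-PRCL2_GAIN'] at 1, so nothing is left behind if forgotten.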
These states are designed for in-lock adjustment of the spot position on PR2 and PR3. They are a guardian implementation of the scripts
/ligo/home/controls/sballmer/20160927/pr2spotmove.py
/ligo/home/controls/sballmer/20160927/pr3spotmove.py
So far, turning the PRC1 ASC loops off has to be done manually.
Instructions
PR3_SPOT_MOVE:
WFS state: disable the PRC1 loop that ordinarily moves the PRM to center POP_A.
Use the PRM alignment sliders to move the beam spot on PR3. The script slaves all other optics in order to avoid taxing the WFS.
PR2_SPOT_MOVE:
WFS state: disable the PRC1 loop that ordinarily moves the PRM to center POP_A.
Use the PR3 alignment sliders to move the beam spot on PR2. The script slaves all other optics in order to avoid taxing the WFS (see the sketch below).
(see also alogs 30030, 28442, 28627)
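Roughly, the slaving works by watching the master optic's alignment slider and applying proportional offsets to the other optics, so the spot moves without the WFS having to do the work. A sketch of that idea, with placeholder channel names and coupling ratios (the real ones live in the pr2spotmove.py / pr3spotmove.py scripts above):

import time
from ezca import Ezca

ezca = Ezca(ifo='H1')                               # EPICS interface (assumed usage)

MASTER = 'SUS-PRM_M1_OPTICALIGN_P_OFFSET'           # placeholder master slider
SLAVES = {'SUS-PR2_M1_OPTICALIGN_P_OFFSET': -0.3,   # placeholder slaved optics
          'SUS-PR3_M1_OPTICALIGN_P_OFFSET':  0.1}   # and coupling ratios

master0 = ezca[MASTER]
slaves0 = {ch: ezca[ch] for ch in SLAVES}

while True:
    delta = ezca[MASTER] - master0                  # how far the master slider has moved
    for ch, ratio in SLAVES.items():
        ezca[ch] = slaves0[ch] + ratio * delta      # slaved optics follow proportionally
    time.sleep(1)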
Stefan, Sheila, Terra, Ed and Nutsinee
We have found that alignment and TCS together can improve our noise hump from 100 Hz to 1 kHz. We have reverted both the alignment and the TCS changes back to July, and we seem to be stable at 50 Watts with a carrier recycling gain around 28.
Times:
22:18:18 Oct 23rd (before the alignment move, at 40 Watts) and 22:28:12 (after)
Ten minutes of data starting at 23:33:33 is with the better alignment at 40 Watts; 10 minutes starting at 0:19 UTC Oct 23 is at 50 Watts. (We redid A2L at 40 Watts but not 50 Watts; there is MICH and SRCL FF retuning still to be done.)
We saw that the TCS changes of the last few days made a small improvement in the broadband noise lump from 200 Hz to 1 kHz, so we decided to retry several of the noise tests we had done before without success. We only moved the POP_A spot position in pitch; moving it in yaw made the carrier recycling gain drop but didn't help the noise. The attached screenshot shows the spectra, and the coherence with IMC WFS, our best jitter sensors in lock. We have lots of coherence with these signals at frequencies above where the HPO changed the spectra.
Apparently this alog was not clear enough. Kiwamu recommended bigger fonts.
The main message: the blue trace in the attachment was taken at 50 Watts, and there is no broad noise lump, just the jitter peaks from structures on the PSL.
The assumption in my earlier comment was correct: during the settling of the recent ring heater changes, Mode26 had shifted outside of the usual guardian-controlled bandpasses; an already existing filter needed to be turned on. After this, we powered up and damped with no problems. Locked for 2.5 hours at this point, needing only normal phase changes. I walked Ed through how to check on this and change the filter, but this large a change only occurs after a ring heater change, so it will not be a normal issue operators need to worry about.
Reminder that I've requested to be called anytime there is a persistent PI problem.
I've added step-by-step instructions for how to attempt to handle this scenario in the Operator PI Wiki (under 'If a PI seems unresponsive') in case I can't be reached. The PI Wiki can also be opened from the PI MEDM screen. Working on automating this.
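The automation could look something like the sketch below; this is purely an illustration of the wiki procedure, with the RMS monitor channel, bandpass filter bank, threshold, and filter module all placeholders, and 'ezca' the guardian-provided EPICS interface.

import time

RMS_CHAN  = 'SUS-ETMY_PI_DAMP_MODE26_RMSMON'   # placeholder mode RMS monitor
BP_FILTER = 'SUS-ETMY_PI_DAMP_MODE26_BP'       # placeholder bandpass filter bank
THRESHOLD = 10.0                               # made-up ring-up threshold

last = ezca[RMS_CHAN]
time.sleep(60)
# If the mode is above threshold and still growing, the usual bandpass has
# probably been left behind by the frequency shift: engage the alternate one.
if ezca[RMS_CHAN] > THRESHOLD and ezca[RMS_CHAN] > last:
    ezca.switch(BP_FILTER, 'FM2', 'ON')        # FM2 as the alternate filter (assumed)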