Ed, Sheila
Trouble with LOCKING_ALS sent us looking for HV settings on ETMX. It seemed that the L3 stages were set to LO volts rather than HI. We believe that when we toggled the L3 LL HI/LO control, the state of the UL changed as well. We're also not sure why SDF didn't grab this change, and we couldn't find it in SDF with a search.
A similar thing:
OMC DCPD whitening settings were incorrect and the DCPD_MATRIX elements were all zero. These records did exist in SDF, but were set to zero there as well.
This morning started with a smorgasbord of troubles. Patrick aLogged what happened there. After we seemingly got everything back up there were still some lingering issues with connections/channels that were finally resolved through a half-dozen or so phone calls with Dave Barker. His aLogs should show the gory details. I'm finally trying to get things re-aligned so I can get this ship sailing again.
Existing MEDMs continued to be connected to h1iscex, but no new connections were possible. Also I was unable to ping or ssh to h1iscex on the FE-LAN. This also meant that the Dolphin manager was unable to put this node into an offline state. The only recourse was to put SUS-EX and SEI-EX into a safe state and remotely power cycle h1iscex via its IPMI management port. As expected, this in turn glitched the attached Dolphin nodes in the EX-Fabric (h1susex and h1seiex). I restarted all the models on these two systems and Ed is now recovering EX.
At approximately 07:50 PDT this morning the /opt/rtcds file system (served by h1fs0) became full. This caused some front-end EPICS processes to segfault (example dmesg output for h1susb123 shown below). Presumably these models' EPICS processes were attempting some file access at the time. The CDS overview is attached, showing which specific models had problems. At this point guardian stopped running because it could not connect to critical front ends. LOCKLOSS_SHUTTER_CHECK also reported an NDS error at this time (log shown below); further investigation is warranted since h1nds0 was running at the time.
On trying to restart h1susitmx, the errors showed that /opt/rtcds was full. This is a ZFS file system, served by h1fs0. I first attempted to delete some old target_archive directories, but ran into file-system-full errors when running the 'rm' command. As root, I manually destroyed all the ZFS snapshots for the month of May 2016. This freed up 22.3 GB of disk, which permitted me to start the failed models.
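For the record, a minimal sketch of that kind of snapshot cleanup, driven from Python via the standard zfs command-line tools (the dataset name and snapshot-name pattern below are illustrative placeholders, not the actual commands run on h1fs0):

import subprocess

DATASET = 'h1fs0-pool/opt-rtcds'   # illustrative dataset name only

# list snapshots under the dataset, oldest first, with the space each holds
out = subprocess.check_output(
    ['zfs', 'list', '-t', 'snapshot', '-r', DATASET,
     '-H', '-o', 'name,used', '-s', 'creation'])
for line in out.decode().splitlines():
    name, used = line.split('\t')
    if '2016-05' in name:                      # the May 2016 snapshots
        subprocess.check_call(['zfs', 'destroy', name])
        print('destroyed %s (was using %s)' % (name, used))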
Note that only the model EPICS processes had failed, the front end cores were still running. However in order to cleanly restart the models I first issued a 'killh1modelname' and then ran 'starth1modelname'. Restarting h1psliss did not trip any shutters and the PSL was operational at all times.
I've handed the front ends over to Patrick and Ed for IFO locking, I'll work on file system cleanup in the background.
I've opened FRS6488 to prevent a recurrence of this.
[1989275.036661] h1susitmxepics[25707]: segfault at 0 ip 00007fd13403c894 sp 00007fffb426b9a0 error 4 in libc-2.10.1.so[7fd133fda000+14c000]
[1989275.045095] h1susitmxepics used greatest stack depth: 2984 bytes left
[1989275.086076] h1susbsepics[25384]: segfault at 0 ip 00007f2a5348e894 sp 00007fff908c88e0 error 4 in libc-2.10.1.so[7f2a5342c000+14c000]
[1989275.127643] h1susitmyepics[25166]: segfault at 0 ip 00007f5905a59894 sp 00007fff20f878d0 error 4 in libc-2.10.1.so[7f59059f7000+14c000]
2016-10-23T14:51:50.62907 LOCKLOSS_SHUTTER_CHECK W: Traceback (most recent call last):
2016-10-23T14:51:50.62909 File "/ligo/apps/linux-x86_64/guardian-1.0.2/lib/python2.7/site-packages/guardian/worker.py", line 461, in run
2016-10-23T14:51:50.62910 retval = statefunc()
2016-10-23T14:51:50.62910 File "/opt/rtcds/userapps/release/isc/h1/guardian/LOCKLOSS_SHUTTER_CHECK.py", line 50, in run
2016-10-23T14:51:50.62911 gs13data = cdu.getdata(['H1:ISI-HAM6_BLND_GS13Z_IN1_DQ','H1:SYS-MOTION_C_SHUTTER_G_TRIGGER_VOLTS'],12,self.timenow-10)
2016-10-23T14:51:50.62911 File "/ligo/apps/linux-x86_64/cdsutils/lib/python2.7/site-packages/cdsutils/getdata.py", line 78, in getdata
2016-10-23T14:51:50.62912 for buf in conn.iterate(*args):
2016-10-23T14:51:50.62912 RuntimeError: Requested data were not found.
2016-10-23T14:51:50.62913
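For reference, the failing line is the cdsutils getdata call shown in the traceback. A minimal sketch of guarding that call inside the state's run() method, so missing NDS data produces a notification and a retry rather than a node error (the channel list and call signature are copied from the traceback; the notify/retry handling is illustrative guardian style, not the actual LOCKLOSS_SHUTTER_CHECK.py fix):

import cdsutils as cdu

# inside the run() method of the state, as in the traceback above
channels = ['H1:ISI-HAM6_BLND_GS13Z_IN1_DQ',
            'H1:SYS-MOTION_C_SHUTTER_G_TRIGGER_VOLTS']
try:
    gs13data = cdu.getdata(channels, 12, self.timenow - 10)
except RuntimeError as err:
    # NDS had no data for the requested span (as happened this morning);
    # post a notification and retry on the next cycle instead of faulting.
    notify('NDS data not available: %s' % err)
    return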
Started Beckhoff SDF for h1ecatc1 PLC2, h1ecatx1 PLC2, and h1ecaty1 PLC2 by following the instructions at the end of this wiki: https://lhocds.ligo-wa.caltech.edu/wiki/UpdateChanListBeckhoffSDFSystems
controls@h1build ~ 0$ starth1sysecatc1plc2sdf
h1sysecatc1plc2sdfepics: no process found
Specified filename iocH1.log does not exist.
h1sysecatc1plc2sdfepics H1 IOC Server started
controls@h1build ~ 0$ starth1sysecatx1plc2sdf
h1sysecatx1plc2sdfepics: no process found
Specified filename iocH1.log does not exist.
h1sysecatx1plc2sdfepics H1 IOC Server started
controls@h1build ~ 0$ starth1sysecaty1plc2sdf
h1sysecaty1plc2sdfepics: no process found
Specified filename iocH1.log does not exist.
h1sysecaty1plc2sdfepics H1 IOC Server started
Everything was going well until 10 minutes before the end of the shift. The IFO was locked at NLN (26 W) and the range was fairly steady around 60 Mpc. Then at 14:50 UTC the IFO lost lock, guardian went into error, and it looks like various frontend models have crashed.
08:06 UTC Stefan done commissioning. IFO is locked at NLN (26 W)
14:50 UTC Lock loss. ISC_LOCK node in error. LOCK_LOSS_SHUTTER_CHECK node in error. Hit load on ISC_LOCK. Lots of voice alarms. Various frontend models are white.
Ed, Patrick 14:50 UTC The IFO lost lock. Guardian reported ISC_LOCK node in error and LOCK_LOSS_SHUTTER_CHECK node in error. The guardian overview turned dark red. I hit LOAD on the ISC_LOCK guardian. Ed came in and I turned around and saw a bunch of frontend models were white. I thought it must have been a power glitch, so I called Richard. He reported that he did not receive any notices related to a power problem and suggested I call Dave. I have done so, but have not been able to reach him yet. I am no longer thinking it is a power glitch. The laser is still up and all of the machines in the MSR appear to be running. I have attached a screenshot of the initial guardian error and the cds overview.
This spectrum was taken with the POP_A PIT QPD offset removed (see snapshots).
Left plot: Current noise against O1 references.
Right plot: Current noise against tonight's 40W noise, and the noise from last night (POP_A PIT QPD offset was on, TCS ringheater was transitioning - see previous elog.)
Plot 1 shows the DC signals of all 4 I segments of ASA36. Note that seg 3 is ~2.5 times larger than the others.
Plot 2 shows the updated AS_A_RF36_I matrix - the gains for seg 3 have been dropped to -0.4 from -1.
Plot 3 shows the resulting error signal - it now crosses zero where the buildups and couplings for SRCL are good.
Closed the SRC1 PIT and YAW loops with a gain of 10 and an input matrix element of 1. I will leave this setting for the night - although it is not in guardian yet.
I accepted the funny matrix in SDF, and added this in the SRM ASC high power state. The loops should only come on for input powers less than 35 Watts. Nutsinee and I tested it once.
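For the record, the matrix change amounts to rescaling the seg-3 elements of the AS_A_RF36_I input matrix from -1 to -0.4. A sketch of doing this from a Python/pyepics shell follows; the channel names are placeholders only, not the actual matrix records:

from epics import caget, caput

# Rescale the (illustrative) seg-3 elements of the AS_A_RF36_I input matrix
# from -1 to -0.4 to compensate for segment 3 being ~2.5x larger than the others.
for dof in ('PIT', 'YAW'):
    chan = 'H1:ASC-AS_A_RF36_I_%s_SEG3_GAIN' % dof   # placeholder channel name
    old = caget(chan)
    caput(chan, old * 0.4)
    print('%s: %g -> %g' % (chan, old, old * 0.4))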
Stefan, Terra
We had a large peak rise quickly at 27.41 Hz around 6 UTC. A bit of searching gave us Jeff's alog identifying it as the bounce mode of PR2; as such, we lowered the MICH gain from 2 to 1.2, which eventually allowed it to ring down.
We were wondering whether the auxiliary noise depends on the TCS state or PR3 spot position. Last night we had a ringheater change with several short locks over the night, and tonight we had some alignment change.
Attached are AUX spectra from last night, from early tonight, and from just now. For some reason the spectra from early tonight were significantly better (albeit not quite O1 quality). We could not correlate this with alignment or heating in a systematic way.
TITLE: 10/22 Eve Shift: 23:00-07:00 UTC (16:00-00:00 PST), all times posted in UTC
STATE of H1: Commissioning
INCOMING OPERATOR: Patrick
SHIFT SUMMARY:
I would recommend that instead of putting the extra factor of 2 gain in PRCL2, people double the gain in PRCL1. The guardian doesn't touch the gain in PRCL2, but later in the locking sequence it will adjust PRCL1 to the nominal value. If people adjust the gain in PRCL2 and forget to reset it to 1, this can cause problems.
If we consistently need higher gain, there is a parameter in LSC params that adjusts the gains for lock acquisition.
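In other words, if extra gain is needed during acquisition, do it like this (from a Python/pyepics shell) rather than touching PRCL2; the gain record names here are assumed from the filter-bank names above:

from epics import caget, caput

# Double PRCL1 rather than PRCL2: guardian restores PRCL1 to its nominal
# value later in the locking sequence but never touches PRCL2, so a
# forgotten factor of 2 left in PRCL2 would persist into full lock.
caput('H1:LSC-PRCL1_GAIN', 2 * caget('H1:LSC-PRCL1_GAIN'))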
The states are designed for in-lock adjustment of the spot position on PR2 and PR3. It is a guardian implementation of the scripts
/ligo/home/controls/sballmer/20160927/pr2spotmove.py
/ligo/home/controls/sballmer/20160927/pr3spotmove.py
So far, turning the PRC1 ASC loops off has to be done manually.
Instructions
PR3_SPOT_MOVE:
WFS state: disable the PRC1 loop that ordinarily moves the PRM to center POP_A.
Use the PRM alignment sliders to move the beam spot on PR3. The script slaves all other optics in order to avoid taxing the WFS.
PR2_SPOT_MOVE:
WFS state: disable the PRC1 loop that ordinarily moves the PRM to center POP_A.
Use the PR3 alignment sliders to move the beam spot on PR2. The script slaves all other optics in order to avoid taxing the WFS (a conceptual sketch follows below).
(see also alogs 30030, 28442, 28627)
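A conceptual sketch of what the slaving amounts to for the PR3 move (the coupling gains, the set of followed optics, and the slider channel names below are illustrative only; the real numbers live in pr3spotmove.py):

from epics import caget, caput

# Illustrative coupling gains from the PRM pitch slider to the followed optics.
FOLLOWERS = {'PR2': -0.7, 'PR3': 0.3}

prm_ref = caget('H1:SUS-PRM_M1_OPTICALIGN_P_OFFSET')
refs = {o: caget('H1:SUS-%s_M1_OPTICALIGN_P_OFFSET' % o) for o in FOLLOWERS}

def follow():
    """Re-apply the slaved offsets after the operator moves the PRM slider."""
    d = caget('H1:SUS-PRM_M1_OPTICALIGN_P_OFFSET') - prm_ref
    for optic, k in FOLLOWERS.items():
        caput('H1:SUS-%s_M1_OPTICALIGN_P_OFFSET' % optic,
              refs[optic] + k * d)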
Stefan, Sheila, Terra, Ed and Nutsinee
We have found that alignment and TCS together can improve our noise hump from 100 Hz-1 kHz. We have reverted both alignment and TCS to their July settings, and we seem to be stable at 50 Watts with a carrier recycling gain around 28.
Times:
22:18:18 Oct 23rd (before the alignment move, at 40 Watts) and 22:28:12 (after)
Ten minutes of data starting at 23:33:33 is with the better alignment at 40 Watts; 10 minutes starting at 0:19 UTC Oct 23 is at 50 Watts. (We redid A2L at 40 Watts, but not at 50 Watts; there is MICH and SRCL FF retuning still to be done.)
We saw that the TCS changes of the last few days made a small improvement in the broadband noise lump from 200 Hz-1 kHz, so we decided to retry several of the noise tests we had done before without success. We only moved the POP_A spot position in pitch; moving it in yaw made the carrier recycling gain drop but didn't help the noise. The attached screenshot shows the spectra, and the coherence with IMC WFS, our best jitter sensors in lock. We have lots of coherence with these signals at frequencies above where the HPO changed the spectra.
Apparently this alog was not clear enough. Kiwamu recommended bigger fonts.
The main message: The blue trace in the attachment was taken at 50Watts, and there is no broad noise lump, just the jitter peaks from structures on the PSL
My assumption in the earlier comment was correct: during the settling of the recent ring heater changes, Mode26 had shifted outside of the usual guardian-controlled bandpasses; an already existing filter needed to be turned on. After this, we powered up and damped with no problems. Locked for 2.5 hours at this point, needing only normal phase changes. I walked Ed through how to check on this and change the filter, but a change this large only occurs after a ring heater change, so it will not be a normal issue operators need to worry about.
Reminder that I've requested to be called anytime there is a persistent PI problem.
I've added step-by-step instructions for how to attempt to handle this scenario in the Operator PI Wiki (under 'If a PI seems unresponsive') if for some reason I can't be reached. The PI Wiki can also be opened from the PI medm screen. Working on automating this.
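Until that automation exists, here is a sketch of the manual step (engaging the pre-built, shifted bandpass) from a Python/ezca shell; the filter-bank name and FM slots below are illustrative placeholders, not the actual MODE26 layout:

from ezca import Ezca

ezca = Ezca()   # assumes the IFO environment variable is set to H1

BANK = 'SUS-PI_PROC_COMPUTE_MODE26_BP'   # illustrative bank name
ezca.switch(BANK, 'FM3', 'OFF')   # nominal bandpass, now off the shifted mode frequency
ezca.switch(BANK, 'FM4', 'ON')    # already existing bandpass covering the new frequency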
I kept losing lock from NLN at 26 W after a fairly constant amount of time. Each time ETMY would saturate and the whole DARM spectrum would jump up for a while beforehand. I have begun to suspect that it might be related to a fairly broad hump seen in Terra's PI DTT template that grows in height and width as the lock progresses. It is centered somewhere around 15008 Hz and the peak grows from around 100 to 10000 before the lock loss. I tried sitting at DC_READOUT and then INCREASE_POWER and playing around with a bunch of PI settings, assuming it was related to either mode 18 or mode 26. I tried various filters in the H1SUSPROCPI_PI_PROC_COMPUTE_MODE26_BP filter bank, different phases and gains, and even turned the entire PI output to ETMY (H1:SUS-ETMY_PI_ESD_DRIVER_PI_DAMP_SWITCH) off and back on. Nothing really seemed to have any effect and it eventually broke lock again. Relocking has been fairly robust. The IR transmission at CHECK_IR has been kind of poor towards the end of the shift, but it doesn't seem to hinder further locking. At the beginning of the last lock at NLN I changed the TCS ITMX CO2 power by a small amount per Sheila's request.
07:40 UTC NLN at 26 W
07:50 UTC Damped PI mode 27 by flipping sign of gain
07:56 UTC Played around with damping PI modes 18 and 26. Not sure if they just came down on their own.
08:10 UTC Damped PI mode 27 by flipping sign of gain
09:05 UTC Lock loss. EY constantly saturating and PI modes 9, 18 and 26 ringing up. Attempts to damp these by changing their phases did not seem to help.
09:30 UTC Lock loss immediately upon reaching DC_READOUT. IFO got that far without intervention. OMC SUS and HAM6 ISI tripped.
09:35 UTC X arm IR transmission is ~ .4 at CHECK_IR. Able to bring it to ~ .7 by adjusting H1:ALS-C_COMM_VCO_CONTROLS_SETFREQUENCYOFFSET from 0 to ~ -156. This dropped Y arm transmission. Brought back by moving H1:ALS-C_DIFF_PLL_CTRL_OFFSET.
10:00 UTC Pausing at ROLL_MODE_DAMPING. Waiting for ITM roll modes to damp.
10:12 UTC DC_READOUT_TRANSITION. Stable here.
10:13 UTC DC_READOUT.
10:17 UTC Stable at DC_READOUT. Moving on.
10:25 UTC NLN at 26 W
10:35 UTC Flipped sign of PI mode 27 for third time since NLN.
10:57 UTC Flipped sign of PI mode 27 again
11:18:50 UTC Changed TCS ITMX CO2 power using rotation stage. Changed requested power from 0.200 W to 0.201 W. Measured output of 0.195 W at 42.2733 deg changed to 0.195 - 0.196 W at 42.3040 deg.
11:24 UTC EY saturating again
11:25 UTC Tried dropping power to 10 W by going to manual -> adjust power -> auto, power request through guardian. Lock loss. Peak on PI DTT at 15008.9 got really broad. HAM6 ISI and OMC SUS tripped.
11:45 UTC Having a hard time getting stable transmission at CHECK_IR. Moving on.
12:06 UTC Pausing at DC_READOUT.
12:12 UTC Stable. Moving on. Various large optics saturating until reaching ~ NOISE_TUNINGS.
12:20 UTC NLN at 25.9 W
12:22 UTC Changed TCS ITMX CO2 power using rotation stage: requested .200 W -> .198 W; measured .196 W -> .194 W; 42.2800 deg -> 42.2483 deg
12:30 UTC Flipped sign of PI mode 27 gain
12:46 UTC Flipped sign of PI mode 27 gain
13:18 UTC Lock loss.
13:46 UTC Pausing at DC_READOUT. Played around with various settings for modes 18 and 26. The 15008 Hz line seemed to come down a little, but not sure it was anything I did.
14:12 UTC Going to INCREASE_POWER. Sitting at INCREASE_POWER trying lots of different things to damp mode 26. No effect.
I believe this was all due to a different BP filter on Mode26 needing to be engaged (one outside those that are currently controlled by guardian), in light of the frequency shift due to recent RH changes. I have called and told Ed which to turn on. I repeat: please call me at any time if there is persistent PI trouble. Often a simple fix might save a night of not locking/stress, and I'm happy to be woken up. The PI Help Desk phone number is on the control room whiteboard.