Many thanks to Ryan for the short-term fix of reverting our DS service back to login5. After making this change (and increasing the sleeptime to 300 seconds), I restarted the ligods and apache2 services on cdswiki, and the CDS web pages are back. As Ryan said, the login screens will revert to the old-style form. As soon as the new DS is available, we will switch back.
This will allow the Vacuum, Facilities and Detector Engineering teams to monitor LHO remotely for the remainder of the long weekend.
Thanks again to Ryan.
TITLE: 01/15 Day Shift: 16:00-00:00 UTC (08:00-16:00 PST), all times posted in UTC
STATE of H1: Aligning
OUTGOING OPERATOR: Ed
CURRENT ENVIRONMENT:
Wind: 3mph Gusts, 1mph 5min avg
Primary useism: 0.03 μm/s
Secondary useism: 0.27 μm/s
QUICK SUMMARY: I could not log in to the alog earlier. This seems to be resolved now (see Dave's alog). Ed was having trouble with initial alignment when I arrived and was on the phone with Keita. Ed told me that Keita sent a message to Sheila, Jenne and Kiwamu to ask for assistance. Sheila called me and I told her about the diag main message I was seeing: 'IM_ALIGNED: IM3 P is out of its nominal range of 1961'. I sent her a plot of the IM 1-4 witness sensor positions over the last 7 days. She suggested that I move IM3 P back to 1961 and retry initial alignment. I did so and am now on the initial alignment of the arms on green. The X arm locked easily, but I am not getting any signal for the Y arm. There is a spot on the camera, but it does not seem to move with the optics. I trended back and moved the positions of the ITM, TMS and ETM to no avail. I have just started looking for the dither align script.
The auth team has reported server issues, meaning that access to services with LIGO.ORG authentication is sporadic. I can confirm that I can 2FA SSH into LHO CDS and, as is self-evident, I can make alogs. Patrick reports that he cannot log into the alog to make entries.
The auth team are actively working this problem.
I'm not sure if this is related to our ongoing SAML-DS error which has prevented LHO CDS web pages from being accessible since yesterday morning.
Very directly related.
https://login.ligo.org/idp/shibboleth needs to return a signed XML document, not a human-readable error about missing files. The LIGO-SAML-DS service on every SP running it trips over this. The very short-term fix is to modify the "master" property in your ligods.ini to point to the site IdP (login5. for LHO) rather than the master (login.) and restart the ligods service. This is not something you should do outside of emergencies, since you really do want the latest metadata. This just happens to be a failure mode that hadn't been thought of at the time the service was written; it does handle the "cannot talk to main IdP at all" case fine (e.g. Internet outage or similar).
I was just able to log in.
Ryan, thanks for the description of the problem. Should we go ahead with the short-term fix you mention or do we know if service restoration is imminent?
I would go ahead and make the change, with an eye towards reverting once folks are back at work next week. You can check if the central service has been fixed by hitting the first URL in my other comment (if you get a large blob of XML, you're good; if you get "missing file", not so good). No ETA, since I have no management involvement with the central IdP server.
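The "large blob of XML" versus "missing file" check described above could be sketched in Python roughly as follows. This is only an illustration: the function name and the sample bodies are hypothetical, and the check only looks at whether the response parses as XML with a SAML 2.0 metadata root element (EntityDescriptor or EntitiesDescriptor). It does not verify the signature, which the real service would need to do.

```python
import xml.etree.ElementTree as ET

# Root element names used by SAML 2.0 metadata documents.
SAML_METADATA_ROOTS = {"EntityDescriptor", "EntitiesDescriptor"}


def looks_like_saml_metadata(body: str) -> bool:
    """Return True if body parses as XML with a SAML metadata root element.

    Hypothetical helper for illustration; it does NOT check the signature.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # A plain-text error message fails to parse as XML.
        return False
    # ElementTree reports namespaced tags as "{namespace}LocalName".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in SAML_METADATA_ROOTS


# Made-up sample bodies for illustration only (not real server output).
good_body = (
    '<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" '
    'entityID="https://idp.example.org/idp/shibboleth"/>'
)
bad_body = "missing file"

print(looks_like_saml_metadata(good_body))
print(looks_like_saml_metadata(bad_body))
```

In practice the body would be whatever the URL from the earlier comment returns; fetching it is left out here so that the sketch has no network dependency.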
... and in bad form replying to myself, permanently changing sleeptime to something more like 300 seconds would not be amiss either, since the default 15 seconds induces a large load on all of the IdPs from purely "hello, are you there?" checks.
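As a rough illustration of the load argument, the arithmetic below compares the number of checks per day at the two sleeptime values. It assumes one check per sleeptime interval per running service and ignores the time each check takes; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def checks_per_day(sleeptime_s: float) -> float:
    """Checks per day, assuming one check per sleeptime interval (an assumption)."""
    return SECONDS_PER_DAY / sleeptime_s


default_rate = checks_per_day(15)    # default sleeptime from the comment above
relaxed_rate = checks_per_day(300)   # suggested sleeptime
reduction = default_rate / relaxed_rate

print(default_rate, relaxed_rate, reduction)
```

Under that assumption, each service drops from 5760 to 288 checks per day, a factor of 20 fewer.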
We are seeing glitches that come in sets and look like stacks of harmonics in the most recent lock. It looks like they can be explained as the non-linear upconversion of some 1 kHz violin modes, based on the spacing between glitches. We think that these glitches happen when the violin modes get high enough to run into some non-linearity in the sensing. I used the derivative of OMC-DCPD_SUM, since it seems like these glitches should not care about the DC value and would most likely be some kind of slew-rate limit. The 1 kHz violin modes dominate the RMS, in particular a pair at 1009.44 and 1009.487 Hz. The beat frequency between these gives a period of about 21 seconds, which is the spacing between bursts of glitches. The glitches occur when the amplitude of the DCPD derivative is highest. The amplitude has a period of 21 seconds because of the two 1 kHz violin modes. Comparing to the previous lock, when these glitches were not present, the amplitude of these two modes is no higher. But there are a number of other modes near 1 kHz and several of those are substantially higher. So they may have pushed the amplitude into the nonlinear region. Attached is a PDF showing the glitches as they appear on the summary page (they are most clearly seen at 2 kHz in Omicron), and the comparison of the 1 kHz spectrum with the previous lock which did not have these glitches. The second page shows the bursts of glitches compared to the amplitude of the DCPD derivative.
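The 21 second figure can be checked with a few lines of Python. This is a minimal sketch: the two frequencies are the ones quoted above, while the amplitudes passed to the envelope function are arbitrary illustrative values, not measured ones.

```python
import math


def beat_period_s(f1_hz: float, f2_hz: float) -> float:
    """Period of the beat between two tones, 1 / |f1 - f2|."""
    return 1.0 / abs(f1_hz - f2_hz)


def envelope(a1: float, a2: float, delta_f_hz: float, t_s: float) -> float:
    """Amplitude envelope of a1*cos(2*pi*f1*t) + a2*cos(2*pi*f2*t).

    The envelope is sqrt(a1^2 + a2^2 + 2*a1*a2*cos(2*pi*delta_f*t)),
    assuming both tones start in phase at t = 0.
    """
    return math.sqrt(
        a1 * a1 + a2 * a2 + 2.0 * a1 * a2 * math.cos(2.0 * math.pi * delta_f_hz * t_s)
    )


period = beat_period_s(1009.44, 1009.487)
print(f"beat period: {period:.1f} s")

# Arbitrary illustrative amplitudes: the envelope peaks once per beat period.
delta_f = 1.0 / period
peak = envelope(1.0, 0.5, delta_f, 0.0)             # tones in phase
trough = envelope(1.0, 0.5, delta_f, period / 2.0)  # tones out of phase
print(peak, trough)
```

The envelope reaches its maximum once per beat period, which is consistent with bursts of glitches appearing when the combined amplitude is highest.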
If the interferometer is up I will spend some time damping them tonight.
It is important to notice that this 2kHz glitch line has been appearing and disappearing quite irregularly in the past, but when it is present the associated Omicron glitches are of high SNR. In fact, the last time this line showed up was all the way back on 29-30 November:
* 2kHz glitch line started to show on first 29th Nov lock
* 2kHz glitch line disappears on the 30th Nov
Originally I thought that the 2kHz glitch line could have been related to the PCALX roaming calibration lines, based on Evan's alog on PCALX roaming calibration line frequency changes. The 2kHz glitch line seems to have started as soon as the detector locked after the PCALX calibration line at 1001.3Hz was activated on UTC 2016-11-30 17:16:00, and the glitch line then disappeared around the time the cal line at 2001.3Hz was moved to 2501Hz at 2016-11-30 22:07:00. The fact that the time coincidence was not precise made me believe that it may have been accidental. It can now be confirmed that the two are unrelated, because no such PCALX roaming line was set at 2001.3Hz during the current appearance of the 2kHz glitch line.
The obvious question is whether the November 2kHz high SNR glitch line shows a similar 21 second spacing between bursts of glitches. The answer is yes, as seen next during the disappearance of the 2kHz glitch line on the 30th Nov (attached are the original images from which this image was made):
A zoom around the beginning of the above spectrum shows the ~21 s periodicity of the features:
A closer look at the 2nd harmonic violin modes for 30 mins during the time of the 2kHz glitch line (in blue) and 30 mins after the glitch line disappears (in red) shows that only a few violin modes were higher during the time of the 2kHz glitch line:
There are two cases when the blue lines are higher than the red:
* At about 1003.7Hz:
* At about 1009.4Hz:
It is clear that only the pair at around 1009.45Hz would beat with a periodicity of about 20 seconds. In this case the lower frequency violin mode of the pair does not change much in amplitude; it is the higher frequency violin mode of the pair which increases by 30.
I have also compared the peak amplitudes of the 1009.45Hz pair for 4 different cases: two of them correspond to times when the 2kHz glitch line was present, and the other two (dashed lines) to times when the glitch line was not present. The comparison shows that the higher frequency line has to be high enough to cause the nonlinearity for the 2kHz glitch line to be present:
Nutsinee has just now created a damping filter for this pair of violin modes, so hopefully this will be enough to keep these peaks from growing to the point where the 2kHz glitch line appears, but only time will tell.
A small note for operational purposes: Borja's 2kHz glitch thresholds on 1009.44Hz and 1009.49Hz correspond to
8e-13 and 6.5e-13 m/sqrt(Hz) in the (dtt calibrated) CAL-DELTAL_EXTERNAL_DQ channel in November
and
7e-13 and 4.4e-13 m/sqrt(Hz) in January.
Now that the guardian is turning the damping on, these two modes should be well controlled. But if something goes wrong and the modes ring up, I would suggest that operators take some time to damp the modes when they get close to 1e-12 on the DARM FOM.
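The guidance above could be expressed as a simple check. This is a hypothetical sketch only: "close to 1e-12" is not quantified in the note, so the 80% warning margin, the function name and the status strings are all assumptions made for illustration.

```python
# Amplitude on the DARM FOM, in m/sqrt(Hz), that the note says to stay below.
DAMP_LEVEL = 1e-12
# "Close to" is not quantified in the note; this fraction is an assumed margin.
WARN_FRACTION = 0.8


def mode_status(amplitude: float) -> str:
    """Classify a violin mode amplitude against the 1e-12 guidance level."""
    if amplitude >= DAMP_LEVEL:
        return "damp now"
    if amplitude >= WARN_FRACTION * DAMP_LEVEL:
        return "approaching"
    return "ok"


# January threshold values quoted above for the 1009.44Hz and 1009.49Hz modes.
for name, amplitude in (("1009.44Hz", 7e-13), ("1009.49Hz", 4.4e-13)):
    print(name, mode_status(amplitude))
```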
10:16:08 OMC DCPD Saturation
Nothing jumps out as apparent at this time.
Initiating re-lock. Had to re-align Beam Splitter/PRM and PR2. We'll see.
ITMY Roll Mode seems to be an ongoing obstacle to locking.
17 minutes for damping. Added 30 degrees to phase (60 total) and increased gain from 10 to 20.
Unsuccessful lock attempt @ LOWNOISE_ASC. Decrease in ASAIR_B_RF90.
11:08UTC Begin Initial Alignment
H1 back up and running for 3hr22min. Livingston is just getting back on the scoreboard but not to low noise yet. Running the script just to "freshen" the alignment for coincidence with LLO.
08:21:45UTC H1 back to Observing
TITLE: 01/15 Eve Shift: 00:00-08:00 UTC (16:00-00:00 PST), all times posted in UTC
STATE of H1: Observing at 64.5071Mpc
INCOMING OPERATOR: Ed
SHIFT SUMMARY: Things have been quiet since Keita and Peter fixed the ISS. The range increased slightly over the past hour and is now hanging out at 64 Mpc.
LOG:
03:01 Keita and Peter to LVEA troubleshooting the ISS board
04:58 Back to Observe
The range is still bad, but at least we are back to where we were before the ISS issue. Accepted two SDF differences.
Patrick told me that the ISS diffracted power was swinging around more than usual. Indeed, the diffraction minimum and diffraction maximum values were larger than usual even with the second loop disabled. After a quick survey of the ISS MEDM screen, I saw that the calculated diffracted power was inconsistent with the AOM drive voltage. At one point I saw that the reported diffracted power was ~15% for a reported AOM drive of less than 0.2 V, which I know to be false. Some time around 7:17 - 7:18 UTC the output of the AOM driver suddenly dropped. I had a quick look at some other signals but didn't find anything that matched. I am wondering whether the ground reference voltage of the PSL rack has changed relative to the PSL table surface. There does not appear to be any coincidence with the LVEA temperature, so I cannot think of a reason why it would change all of a sudden.
In the meantime, it is probably worth trying to keep the AOM drive at ~0.5 V. One does this by adjusting the first loop ISS reference signal (bottom left hand corner slider of the first loop ISS MEDM screen). The adjustment should be done with the second loop ISS off.
Plots of the ISS reference signal and its monitor are attached. The reference signal comes from a DAC. The monitor is read off the ISS board from an OP-27. One thing to note is that the monitor output is somewhat noisier than the reference signal. This might be related to the grounding issue (if there is indeed one).
Problem was tracked down to the loss of -18V in PSL ISS AA chassis. Power cycling seems to have fixed it.
I was called and came to the site at 6PM to work with Peter who was also called earlier.
We have found that some of the ISS 1st loop signals were much smaller than they should be, starting 23:20 local time yesterday. PDA readback was only 0.7V or so when it should be 3.2, PDB was also too small, and ISS reference voltage readback was only -0.1 when it was set to -0.48.
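As a quick arithmetic check on the numbers above, the sketch below computes the ratio of each observed readback to its expected value. The values are the approximate ones quoted in this entry ("0.7V or so"), and the dictionary keys and function name are just labels chosen for illustration.

```python
# (observed, expected) readbacks in volts, as quoted in the entry above.
readbacks = {
    "PDA": (0.7, 3.2),
    "ISS reference": (-0.1, -0.48),
}


def readback_ratio(observed: float, expected: float) -> float:
    """Fraction of the expected value that was actually read back."""
    return observed / expected


ratios = {name: readback_ratio(obs, exp) for name, (obs, exp) in readbacks.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Both ratios come out at roughly 0.2. Given how approximate the quoted readbacks are, this is only suggestive of a common scale factor on the readback path, not proof of one.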
We were first suspicious about analog failure of the ISS chassis, but when we went to the floor, everything looked normal in analog land. We measured the OLTF of ISS 1st loop and the UGF was about 38kHz.
Voltages on one of the DB9 that goes to the AA looked correct.
Then we went to the PSL ISS AA and found that the LEDs for negative voltage were off, and the positive voltage LED was dim (will attach picture later). Peter power cycled it and everything seems to be back to normal.
Things to do for Tuesday:
Check PSL ISS AA chassis power board.
Things unrelated to this but are troubling:
We also went to the power supply area on the mezzanine, and things look OK except that MANY power supplies have faulty power indicator lamps; some of them were flickering on and off.
Also, the oplev power supply has a label saying +18V, but the actual voltage is more like +10V.
The readback failure matters because we are now using the 1st loop PDA and PDB values for the power normalization of the 2nd loop digital AC coupling.
With the AA chassis failing, H1:PSL-ISS_SECONDLOOP_INPUT_NORM hit the bottom of its limiter and no power normalization was done. Without a proper power normalization this AC coupling is known to cause loss of lock.
First picture: LEDs of the front panel with negative power failure (top chassis).
Second picture: LEDs of the back panel with negative power failure.
Third picture: UGF of ISS 1st loop.