In this elog entry I'll describe a statistical study of all the loud glitches from the last two months. Unfortunately, I don't have any clear conclusion, but I identified 151 glitches of the kind we are looking for.
Looking at the Detchar summary pages, I selected the most stable and longest lock stretches from June 1st till today. For each of them, loud glitches were selected by looking at the minutes when the inspiral range dropped more than 5 Mpc with respect to the general trend (computed with a running average of 30 minutes). The list of lock stretches is reported in the first attached text file (lock_stretches.txt).
The range is sampled at 1/60 Hz, so to better identify the glitch time I loaded LSC-DARM_IN1 and computed a band-limited RMS (BLRMS) between 30 and 300 Hz. Empirically, this turned out to be a good indicator of the kind of glitches we are trying to investigate. The time of the glitch is taken as the time corresponding to the maximum value of said BLRMS. This works very well for our glitches, but may give slightly wrong results (within one minute) for other kinds of glitches.
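The BLRMS-based refinement of the glitch time described above can be sketched as follows (an illustrative reconstruction, not the actual analysis code; the band-limiting here is done with a simple FFT mask rather than a proper filter):

```python
import numpy as np

def blrms_peak_time(x, fs, t0, f_lo=30.0, f_hi=300.0, win=0.25):
    """Return the time (s, relative to t0) of the maximum BLRMS.

    x   : DARM time series for the minute of interest
    fs  : sample rate (Hz)
    win : running-RMS window length (s)
    """
    # band-limit to f_lo..f_hi via an FFT mask (sketch only)
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < f_lo) | (f > f_hi)] = 0.0
    bp = np.fft.irfft(X, n=len(x))
    # running RMS over win-second windows
    n = max(1, int(win * fs))
    rms = np.sqrt(np.convolve(bp**2, np.ones(n) / n, mode="same"))
    return t0 + np.argmax(rms) / fs
```

The glitch time is then the argmax of this running RMS, accurate to well within the one-minute range sampling.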
In total I identified 285 loud glitches, as defined above. The corresponding GPS times are listed in the second attached text file (glitch_times.txt) together with a number which classifies the glitch (see more later).
First of all, I wanted to understand if those loud glitches, regardless of their shape and origin, are clustered in particular hours of the day. The first attached figure shows three histograms:
I don't see any dramatic clustering; however, it seems that there are a few more glitches happening between 5pm and 8pm. Not very significant though. Moreover, remember that this analysis covers all locks that I judged to be good, without any information on the activity of the commissioners.
The second attached plot is the same kind of histogram, but here I restricted the analysis to the period of time marked as ER7 (between GPS 1117400416 and GPS 1118329216). This should be a more controlled period, without too much commissioning or tube cleaning. A total of 167 glitches were identified in this period.
Basically, the same daily distribution as before is visible, although the predominance of 5pm-8pm is somewhat less pronounced.
The third attached plot shows the same histogram again, but for all the non ER7 periods. It looks like the dominant periods are later in the night, between 7pm and 10pm.
In conclusion, I don't see any striking dependency on the time of the day.
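For reference, the hour-of-day binning used for these histograms amounts to something like the following (a sketch; it assumes site local time is UTC-7 and ignores leap seconds, which is irrelevant for hourly bins):

```python
GPS_EPOCH_UNIX = 315964800   # GPS time 0 = 1980-01-06 00:00:00 UTC

def local_hour(gps, utc_offset_hours=-7):
    """Hour of day (0-23) at the site for a GPS timestamp.

    Leap seconds (~17 s in 2015) are ignored; that error is
    negligible when binning into one-hour histogram bins.
    """
    unix = gps + GPS_EPOCH_UNIX
    return int((unix // 3600 + utc_offset_hours) % 24)
```

A histogram then follows from e.g. `np.bincount([local_hour(g) for g in gps_times], minlength=24)`.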
I looked into the 285 glitches one by one, to try a classification based on their shape. The kind of glitches we are hunting down have a very clear shape, as pointed out by Keita. Here is my classification system:
Class 0: unidentified origin (not clear what caused the range to drop...)
Class 1: like the glitches we are looking for, but kind of small and sometimes not completely certain
Class 2: definitely the glitches we are looking for
Class 3: somewhat slower glitches, with a duration of 10-100 ms
Class 5: general noise increase on a longer time scale (seconds)
Class 6: messy things, including clear human actions (swept sines, etc..)
The classification is based on the behavior of the BLRMS in the 30-300 Hz band, the time series, and a 100 Hz high-passed version of the time series.
In total, I could identify 151 glitches of class 1 and 2, which most likely correspond to what we are looking for. Attached figures 4 and 5 show two examples of class 1 and class 2 glitches. I saved similar plots for all 285 glitches, so ask me if you are interested in seeing all of them.
I repeated the same statistical analysis described above, but this time using only class 1 and 2 glitches. The 6th attached plot shows the dependency on the time of the day. It is unclear to me whether there is anything significant. The peak is between 6pm and 7pm...
I also checked whether there is a correlation with the day of the week, see the 7th plot. Not clear either, although it seems safe to exclude that there are fewer glitches over the weekend; if anything, the contrary.
Finally, the very last plot shows the glitch rate as a function of the date. It seems that three days were particularly glitchy: June 8th, July 25th and August 1st.
Added 356 channels. The following channels remain unmonitored:
H1:GRD-LSC_CONFIGS_LOGLEVEL
H1:GRD-LSC_CONFIGS_MODE
H1:GRD-LSC_CONFIGS_NOMINAL_S
H1:GRD-LSC_CONFIGS_REQUEST
H1:GRD-LSC_CONFIGS_REQUEST_S
H1:GRD-LSC_CONFIGS_STATE_S
H1:GRD-LSC_CONFIGS_STATUS
H1:GRD-LSC_CONFIGS_TARGET_S
I'm not sure why they weren't caught by the conlog_create_pv_list.bsh script.
ALL TIMES IN UTC
15:00 IFO locked at 32 Mpc
15:21 Looked at PSL status. Everything looks good.
15:23 Robert S. out into the LVEA to do some HF acoustic injections while the IFO is locked.
16:32 Sent Katie out to LVEA to join Robert.
16:50 Robert out temporarily. Katie still in.
16:54 TJ working with SYS_DIAG. Red Guardian backgrounds are to be ignored until further notice.
17:00 Robert and Katie out until lock re-established.
17:10 Kyle and company out to Y2-8 (~300m from Y-End) to deliver equipment.
17:34 Guardian:OMC_LOCK in error as Kiwamu warned me. He's in a meeting. Jamie is looking into it for now.
18:00 Operator training in the control room.
22:00 Filled the chiller water level. Low Level alarm sounding on unit. Filling restored it back to normal.
The steps that lead to success:
- First I got the best mode frequency estimate so far, from Keith Riles' alog: 508.2892 Hz (19190).
- Next I drove at a frequency nominally 1/10min above: 508.2892 Hz + 1/(60 sec * 10) = 508.290867 Hz. (Note: I picked a higher frequency because the neighbouring mode is at a lower frequency.)
- This produces a beat signal with a period of 13 min, maybe +-10 sec.
- Thus, today our best mode frequency estimate is f0 +- df = 508.2892 Hz + 1/(60 sec * 10) - 1/(60 sec * 13) +- df = 508.289585 Hz +- 0.000017 Hz.
- Thus, a simple awggui drive at that frequency should run away with at most 1/df/360deg = 167 sec/deg, or 2.8 min/deg.
- I guessed an initial phase and left the drive running during the commissioning call, and got a nice decline of the mode.
- Next I designed a feed-back filter that matched my successful awggui drive (all fast enough that my feed-back phase didn't run away).
- That loop looked good so far - except that we tripped the silly hardware watchdogs several times - they take us out at about 1/2 of the DAC range... :-(
- We then turned on all violin damping filters. Notably this includes MODE5 on ETMY, which is a broadband filter also affecting the drive at 508.2896 Hz.
- We will observe this state for a while before completely declaring success. The filters currently used are FM1, FM4 and FM5 (a +12 deg), and a gain of about +500. The settings are not in Guardian yet.
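The frequency bookkeeping in the steps above can be reproduced directly (pure arithmetic; note that with these exact numbers the propagated uncertainty comes out to about 1.6e-5 Hz, consistent with the quoted +-0.000017 Hz):

```python
# Beat-note arithmetic from the steps above (bookkeeping only).
f_prior = 508.2892                   # best prior estimate, Keith Riles' alog (Hz)
f_drive = f_prior + 1.0 / (60 * 10)  # drive 1/(10 min) above: 508.290867 Hz
T_beat = 60 * 13                     # observed beat period: 13 min (+-10 sec)
f0 = f_drive - 1.0 / T_beat          # mode sits 1/T_beat below the drive
df = 10.0 / T_beat**2                # propagate the +-10 sec beat uncertainty
slew = 1.0 / df / 360.0              # worst-case phase run-away, sec per degree
```

With these numbers f0 comes out to 508.289585 Hz and the slew to roughly 170 sec/deg, matching the "2.8 min/deg" estimate above.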
Filiberto, Ed
Fil alerted me to the Low Water Level alarm going off in the chiller room. I added 400ml. Alarm reset. Situation returned to normal.
As I showed and Daniel knew and confirmed, moving the HEPIs at 2 um/sec was too fast to avoid upsetting the BSC ISI State1 T240 sensors. With larger offsets, which therefore bleed down for longer periods of time, the T240s have longer to get to their trip point and trip the watchdog.
I changed the H1:LSC-X(Y)_TIDAL_CTRL_BLEEDRATEs from 2 to 1 um/s; the change in SDF has been accepted. Will follow up after we've had a few lock stretches and bleedoffs.
Did a quick scan through DV and there are 3 trips of the ETMY ISI from this bleedoff since July 6. It looks like there have been no trips of this type on the ETMX since then.
Carlos, Patrick
On h1ecatx1 we found that the Power Efficiency Diagnostics -> AnalyzeSystem task was enabled. Its 'Triggers' setting was 'At 6:00 AM every 14 days', the 'Last Run Time' was '08/04/2015 2:13:32 PM', and 'Wait for idle for' was '2 hours'. It was disabled on h1ecaty1 and h1ecatc1. This is something that Beckhoff had told me about a while ago and I just thought to recheck. I suspect this may be the cause of h1ecatx1 crashing every two weeks on Tuesday.
In addition we disabled the following services on h1ecaty1: Windows Defender, Windows Update, Windows Search, Adobe Acrobat Updater; and the following services on h1ecatc1: Windows Defender, Windows Update, Adobe Acrobat Updater.
On Tuesday, an upgrade was made to all ISI models to separate the WD untrip function from the Saturation Clearing function. Also, an automated saturation bleed-off was included to subtract accumulated saturations over time: saturations occurring in a given minute are subtracted from the total 60 minutes later. These saturations have mainly been an issue for HAM6 when the shutter triggers and jars the ISI table. While we can accumulate 8192 saturations on the actuators before tripping the ISI Watchdog, hundreds of these saturations can occur for each shutter trigger.
To prevent these saturations from ultimately tripping the ISI, the commissioners added a reset of the watchdog to the guardian. Since Tuesday, the bleed-off has been running but the guardian reset was also operational; that is, until Sheila commented it out yesterday around 22:00 UTC. On the attached 24-hour trend of the saturation counter for the HAM6 actuator, I noted the approximate time when the guardian reset feature was disabled and the one time when the new CLEAR SATURATIONS button on the watchdog MEDM was pressed. You can clearly see the accumulated saturations dropping off one hour later. What is not clear is whether the clearing will happen quickly enough in a state where the system is triggering the shutter frequently due to conditions or commissioning activities. Still, at least here the conditions were such that the saturations were able to get back to zero.
TJ has been tasked with a SYS_DIAG addition to alarm/notify when the saturation total exceeds some percentage of the maximum, prompting manual clearing to stop the ISI from tripping.
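The bleed-off bookkeeping described above amounts to something like the following (a hypothetical sketch, not the actual front-end code): each minute's saturation count is added to a running total and subtracted again 60 minutes later.

```python
from collections import deque

class SaturationCounter:
    """Sketch of the automated saturation bleed-off described above.

    Saturations occurring in a given minute are subtracted from the
    running total `bleed_minutes` later; the ISI watchdog trips if the
    total reaches `trip_level` (8192 for the actuators) first.
    """
    def __init__(self, bleed_minutes=60, trip_level=8192):
        # full-length deque of zeros: appending evicts the count
        # from exactly bleed_minutes ago
        self.history = deque([0] * bleed_minutes, maxlen=bleed_minutes)
        self.total = 0
        self.trip_level = trip_level

    def tick(self, sats_this_minute):
        """Advance one minute; return True if the WD would trip."""
        self.total -= self.history[0]        # bleed off the old minute
        self.history.append(sats_this_minute)
        self.total += sats_this_minute
        return self.total >= self.trip_level
```

So a single shutter trigger producing a few hundred saturations clears itself an hour later, but repeated triggers within the hour still accumulate toward the trip level.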
Summary:
I looked at all 10 events in Gabrielle's alog that are without the "look somewhat different in the spectral shape" caveat, i.e. I only picked the events that were similar according to Gabrielle.
We don't know what these are, but we do know that this is NOT a sensing-side software/hardware glitch.
All of these look very similar (see attached): the DCPD current suddenly shoots up, there's a msec-ish rise, then an equally sudden and fast fall of msec order, and then the servo slowly follows up.
Details:
The spikes for larger glitches are on the order of 10^-15 m to 10^-14 m, or 0.1%-ish RIN (0.02 mA), but there are smaller ones.
DC current increases, which means that X becomes shorter and/or Y becomes longer (if the IFO calibration is correct).
What we were able to exclude so far is:
Things that are not yet completely ruled out are:
Is there anything that helps us understand if this is happening inside or outside of the arms?
Plots:
These are the four biggest glitches from Gabrielle's list, but I looked at all 10 that are without the "look somewhat different" remark.
On top is the calibrated DARM displacement, properly dewhitened and then rewhitened again to make the glitch visible while keeping the phase intact above 600 Hz or so, with the second violin resonances mildly band-stopped (-20 dB). The "calibration" of this is 10^-10 meters, and the sign is correct for the msec signal (i.e. positive means X-Y going positive).
Middle is the OMC DCPD SUM. The DARM UGF is about 40 Hz (1/e time = 1/(2*pi*40) = 4 msec), so the glitch shape is almost undisturbed by the DARM servo.
Bottom (of the bottom) is the shape of the glitch after subtracting the background motion in OMC DCPD SUM.
Some details on the Virgo glue glitches can be found at the following elog entries:
WI glitch rate: shows the time domain shape of some of the glitches, which were quite fast kicks
Glitches: Another example time series
Glitch: where the excitation of one of the test mass modes is evident
Some Virgo jargon translation
Pr_B1_ACp = DARM error signal, equivalent to AS_RF_Q
Hrec = calibrated (high passed at ~10 Hz) strain, equivalent to CAL_DELTAL_EXTERNAL
WI = West input mirror, equivalent to ITMY
J. Kissel, M. Wade
In order to test out the new GDS calibration pipeline infrastructure, Maddie needs the newly installed EPICS values that represent the DARM loop model reference values at each calibration line to be populated. Since we don't have a new, vetted model that represents the current interferometer yet, I've installed the ER7 values, which have been precalculated by Maddie. There are some SDF vs. EPICS values that still need sorting since some of the values are very, very small. Below, I quote the values entered.
For the PCAL line at 36.7 Hz:
H1:CAL-CS_TDEP_PCALY_LINE1_REF_A_REAL -6.6965e-17
H1:CAL-CS_TDEP_PCALY_LINE1_REF_A_IMAG 7.8784e-18
H1:CAL-CS_TDEP_PCALY_LINE1_REF_C_REAL 1.2887e+06
H1:CAL-CS_TDEP_PCALY_LINE1_REF_C_IMAG -193720
H1:CAL-CS_TDEP_PCALY_LINE1_REF_D_REAL 8.9995e+09
H1:CAL-CS_TDEP_PCALY_LINE1_REF_D_IMAG 1.1573e+10
H1:CAL-CS_TDEP_PCALY_LINE1_REF_C_NOCAVPOLE_REAL 0.999
H1:CAL-CS_TDEP_PCALY_LINE1_REF_C_NOCAVPOLE_IMAG -0.0462
For the DARM line at 37.3 Hz:
H1:CAL-CS_TDEP_ESD_LINE1_REF_A_REAL -6.4774e-17
H1:CAL-CS_TDEP_ESD_LINE1_REF_A_IMAG 7.6853e-18
H1:CAL-CS_TDEP_ESD_LINE1_REF_C_REAL 1.288e+06
H1:CAL-CS_TDEP_ESD_LINE1_REF_C_IMAG -196820
H1:CAL-CS_TDEP_ESD_LINE1_REF_D_REAL 9.1216e+09
H1:CAL-CS_TDEP_ESD_LINE1_REF_D_IMAG 1.1789e+10
H1:CAL-CS_TDEP_ESD_LINE1_REF_C_NOCAVPOLE_REAL 0.999
H1:CAL-CS_TDEP_ESD_LINE1_REF_C_NOCAVPOLE_IMAG -0.0469
For the PCAL line at 331.9 Hz:
H1:CAL-CS_TDEP_PCALY_LINE2_REF_A_REAL -8.1357e-19
H1:CAL-CS_TDEP_PCALY_LINE2_REF_A_IMAG -3.8605e-21
H1:CAL-CS_TDEP_PCALY_LINE2_REF_C_REAL 373720
H1:CAL-CS_TDEP_PCALY_LINE2_REF_C_IMAG -890150
H1:CAL-CS_TDEP_PCALY_LINE2_REF_D_REAL 7.7692e+10
H1:CAL-CS_TDEP_PCALY_LINE2_REF_D_IMAG -8.0292e+09
H1:CAL-CS_TDEP_PCALY_LINE2_REF_C_NOCAVPOLE_REAL 0.9206
H1:CAL-CS_TDEP_PCALY_LINE2_REF_C_NOCAVPOLE_IMAG -0.4128
More details to come.
Jamie, Dan, Gabriele, Evan
There are some files in evan.hall/Public/Templates/LSC/CARM/FrequencyCouplingAuto that will allow for automated transfer function measurements of frequency noise into DARM (using IOP channels) at specified intervals. Running ./launcher 1200 15 FreqCoupIOPrun.sh will open a dtt template, record the transfer function (and coherence) between the appropriate IOP channels, and then save a timestamped copy of said template. This then repeats every 20 minutes for 15 iterations (i.e., about 5 hours).
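The repeat logic of the launcher presumably amounts to something like the following (a hypothetical Python sketch, not the actual script in Evan's directory; `task` stands in for opening the dtt template, recording the transfer function, and saving the timestamped copy):

```python
import time

def launcher(interval_s, niter, task, sleep=time.sleep):
    """Hypothetical sketch of `launcher <interval_s> <niter> <script>`.

    Runs `task` (a callable standing in for the measurement script)
    `niter` times, waiting `interval_s` seconds between iterations,
    e.g. interval_s=1200 and niter=15 for one measurement every
    20 minutes over about 5 hours.
    """
    for i in range(niter):
        task(i)                   # record + save one transfer function
        if i < niter - 1:
            sleep(interval_s)     # wait until the next iteration
```

The injectable `sleep` is just for testability; the real wrapper could equally be a shell loop around the dtt template.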
The idea here is to hook up the usual broadband, analog frequency excitation into the CARM error point and leave it running for the duration of the thermal tuning (1 V rms, 300 Hz HPF, 30 kHz LPF, with the CARM IN2 slider at -17 dB seems to be OK). This conclusion (i.e., to do a broadband frequency measurement only) was the conclusion that Gabriele and I came to after finding that a similar excitation injected into the ISS error point was highly nonstationary in the OMC DCPD sum.
If we want to do broadband noise injection to both the ISS and CARM, then some modification of the script will be required; i.e., we probably will have to ramp the two excitations on and off so that they are not injecting simultaneously. That's not hard, since both the CARM board and the ISS board have digitally controlled enable/disable switches.
Right now a thermal sweep with transfer function measurements does not really seem compatible with the heinous upconversion in DARM from the 508.289 Hz mode. Dan and I tried for some time to continue on with the work started by Jenne, Stefan, and Kiwamu, but we could not make progress either. We even tried (on Stefan's suggestion) sending in an awg excitation at the mode frequency into the EY ESD, but this did not have any real effect on the mode height. If it's useful, the EY mode #8 BLRMS filters are set up to measure the behavior of the violin modes in the neighborhood of this one.
This was rewritten to accommodate interleaved measurements of the ISS and FSS couplings:
./launcher 1200 15 FreqCoupIOPrun.py
It will turn on an FSS injection, measure the FSS coupling, turn off the injection, then repeat for the ISS (via the second-loop error point injection). It remains to be seen what the appropriate ISS drive level is.
IOP channels for ISS second loop array:
H1:IOP-PSL0_MADC0_20 = H1:PSL-ISS_SECONDLOOP_PD1
...
H1:IOP-PSL0_MADC0_27 = H1:PSL-ISS_SECONDLOOP_PD8
Earlier today or yesterday, Sheila added some new "shuttered" states to the ALS guardians. The ISC_LOCK guardian requests these states during the "bounce violin mode damping" state. The new ALS states are just empty; they exist so that we can have a final nominal state and know when we've completed everything. This prevents the ALS guardians from reporting things like "lockloss" or "fault" when the green lasers are shuttered. The new states only have a guardian edge coming from the "IR found" state.
However, in ISC_LOCK we were requesting these states and then immediately closing the green shutters. If the ALS guardians detect that there is no green light in the arms before they've been told to go to this new state, they'll end up in "lockloss" or "fault". I don't totally understand why, but if they are then told to go to the "shuttered" state from somewhere other than "IR found", this causes a lockloss of the full IFO.
Anyhow, for now I've put a 2 second sleep between the node requests and the closing of the green shutters. At least one time we have now successfully made it through this state after my fix. I will consult with Sheila and Jamie tomorrow about the best way to handle this - should we build in more explicit ways for the guardian to get to the "shuttered" state?
As a side note, we had several locklosses earlier in the day at this state, and I was unable to find any obvious reason for the lockloss. Since these new states were already in use, I suspect that this problem is what caused those locklosses.
Since we were working on violin mode damping anyway (aLog 20280), we also tried some ETMX violin damping tuning. At some point, we tripped the ETMX analog PUM current RMS watchdog. We tried setting the H1:SUS-ETMX_BIO_L2_RMSRESET to zero, then back to one (which worked to reset the ETMY watchdog, when we tripped that once or twice). We tried this reset procedure several times, but the watchdog isn't resetting. I don't know enough about this system to diagnose it further, so I await guidance from someone more knowledgeable in the morning. Our low frequency noise is kind of terrible, which might be because it looks like we aren't driving the ETMX L2 stage at all, but we are able to get to the Nominal Low Noise state (the trip happened at the DC Readout state).
Attached is a screenshot of the current state of the ETMX suspension screen, including the tripped L2 RMS watchdog.
Afterwards, we punched in zero again in H1:SUS-ETMX_BIO_L2_RMSRESET. Even though we did not punch in 1 this time, for some reason the current monitors came back active and the watchdogs untripped. After a minute or so, we set it back to 1 and confirmed that the coils were still active by observing the current monitors.
Another strange thing is that, even though the coils are now active, the screen still shows a red light at the top, as Jenne showed in the attachment. This needs to be investigated during the day tomorrow.
Jenne and I drove down to EX and power cycled the PUM driver. That untripped the watchdog. [Also, I power cycled the AA chassis two slots below the PUM driver by accident. Sorry.]
We've seen this failure mode before: https://alog.ligo-wa.caltech.edu/aLOG/index.php?callRep=17385
Yes. The Y PUM watchdogs can be reset from MEDM but the Xs cannot.
Related alogs: 17610, 19720. Jenne, Stefan. We had highly rung-up violin modes tonight; in particular 508.289 Hz was outrageous. Nominally it should be on ETMY. We pushed on that optic in pitch, yaw, and length, and with all sorts of different filters (narrow ones, broad ones, different feed-back phases, etc.). We succeeded in ringing up all three neighbouring modes, and then damping them again. But 508.289 never changed its size. We even tried pushing ETMX (its modes are closest to ETMY), but no success... (There was a typo in the frequency in the first version, hence Evan's comment.)
Just to be clear, it is 508.289 Hz.
In all the last lock stretches we saw a lot of very loud glitches, clearly visible as large drops in the range. See reports from Sheila and Lisa.
I had a look at the time period pointed out by Lisa (1122456180 - 1122486360). I wrote a MATLAB script (attached) that loads the range and selects the glitch times by looking for large drops. Since the range is sampled once per minute, I then look at the DARM error point and search for an excess of noise in the 30 to 300 Hz region. In this way, 16 loud glitches were selected.
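The range-drop selection can be sketched as follows (a Python rendering of the idea, not the attached MATLAB script): flag minutes where the once-per-minute range falls more than a threshold below a running-average trend.

```python
import numpy as np

def select_range_drops(range_mpc, drop_mpc=5.0, trend_min=30):
    """Return minute indices where the inspiral range drops more
    than `drop_mpc` Mpc below a `trend_min`-minute running average.

    range_mpc : inspiral range, sampled once per minute (Mpc)
    """
    kernel = np.ones(trend_min) / trend_min
    trend = np.convolve(range_mpc, kernel, mode="same")   # running average
    return np.flatnonzero(trend - range_mpc > drop_mpc)   # glitchy minutes
```

Each flagged minute is then refined to a sub-second glitch time using the DARM 30-300 Hz excess-noise search described above.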
Most of them are similar, see the first figure. I checked with the MATLAB script the values of all suspension actuation signals, using the MASTER channels. No saturation or unusual behavior is visible.
I also checked that there is no drop or change in the IFO powers. Then, as suggested by Sheila, I checked the DACKILL signals as well as the CPU_METER signals. Nothing there.
Finally, I selected the loudest glitch (1122479776) and looked at all (literally) *_DQ channels. I couldn't find any interesting correlation, except for signals that are known to be related to the DARM loop, like for example ETM* and ITM* control signals. For this glitch and a couple of the loudest ones I could see a similar glitch in H1:LSC-ASAIR_A_RF45_Q_ERR_DQ, H1:ASC-AS_A_RF45_I_YAW_OUT_DQ, H1:ASC-X_TR_A_NSUM_OUT_DQ, H1:ASC-Y_TR_A_NSUM_OUT_DQ. All these are signals somehow sensitive to DARM.
In conclusion, I couldn't find any channel, except DARM ones, that are correlated with those glitches.
In addition to what was reported above, I can add that the glitches are not correlated with any clear level crossing of any of the control signals (MASTER*).
I also checked that there is no overflow on any of the FEM-*_CPU_METER signals.
Here is the list of times of the glitches in this period.
1122457875.48352 **
1122458467.51526 **
1122464968.42584
1122466633.18567
1122467932.76263 **
1122470377.97369
1122470958.48877
1122471214.02692 **
1122474562.14758
1122478259.22369 **
1122479126.35669
1122479776.82788
1122481918.12170 **
1122483108.98047
1122484101.65674
1122485013.84241
Those marked with ** look somewhat different in the spectral shape, so they may be uncorrelated with the others.
The loudest glitches on the 26th were ETMY L3 DAC overflows (alog), as were the ones on the 28th. That's not the case in this lock; there were no overflows in the ETM L2 or L3 DACs, or in the OMC DCPDs.
Here is the working script; the previous one had some copy-and-paste mistakes...