Rana, Evan
Yesterday, Rana made a low-pass filter in order to remove EX ESD saturations when ALS DIFF is used as a DARM sensor. However, today we continued to see some locklosses around the time that DARM is handed off from ALS DIFF to AS45Q.
Therefore, we looked at FM8 and FM10 in the DARM filter bank, which are low-pass filters used with DIFF only. We found that we could relax the amount of rolloff in order to win some phase at 10 Hz and below (see attached foton plot). There is not nearly as much suppression above 500 Hz, but it doesn't really matter since most of the rms drive to the EX ESD is accumulated below 100 Hz. The attached spectrum shows the EX UIM and ESD drives both before (blue) and after (red) retuning. There is no increase in the rms drive to the ESD (part of this is also due to Rana's retuning of the UIM/ESD crossover).
Finally, the attached transfer function shows the ALS DIFF OLTF before (blue) and after (red) retuning. The "before" OLTF was taken yesterday and has 4 dB less gain than the red transfer function (because we added 4 dB of missing gain in the LSC-DARM violin bandstop filters).
Additionally, we switched FM1 and FM7 in LSC-DARM. FM7 was a boost that would cause the OMC transmission to flutter when first engaged. FM1 was the suspension compensation, and since it is always on there is no point in having it in such prime real estate. So we've switched them and updated the appropriate settings in the ISC_LOCK and ALS_DIFF guardians.
We were also getting some saturations during this transition due to some unsuppressed 0.2 Hz microseism in the ESD drive. Ideally, all of the low frequency force is taken care of by the upper stages, but this was not the case here:
The new filters have a no-ramp integrator and an RG at 0.2 Hz which is Q-matched to the microseism bump. The L3 signal has much less 0.2 Hz signal and our ESD saturations are somewhat reduced. This seems to work fine on both ETMx and ETMy all through the low noise state.
Guardian scripts updated to reflect the new filters. Now the ETMX / ETMY switcheroo for flipping to low noise ESD state can be done by gain ramping alone -- no filter switching to match actuators.
Current version:
The previous upgrade to r1449 included a patch to throw errors if any other internal state inconsistencies (like that which caused the "double main()" bug) are detected. The errors encountered since Friday are from failures of these internal consistency checks, apparently from situations not fully covered in the battery of unit tests. This patch covers those situations. I'm looking into improvements to the unit tests to cover more of the user phase space.
No nodes have been restarted.
ISC_LOCK and ISC_DRMI were restarted around 2015-07-19 22:10:00 Z.
The probability that a freeze-up does not impact at least one dolphined FE is very small, so I'm using the h1boot dolphin node manager's logs to data mine these events. The dolphin manager was restarted when h1boot was rebooted Tuesday 7th July, so the data epoch starts at that time.
As I was seeing with my monitoring programs, the restarts preferentially happen in the 20-40 minute block within the hour. The first histogram is the number of events within the hour, divided into 10 minute blocks.
We are also seeing more events recently; the second histogram shows the number of events per day. The spike on Tue 14th is most probably due to front end computer reboots during maintenance. Friday's increase is not so easily explained.
FE IOC freeze up time listing:
controls on h1boot
grep "not reachable by ethernet" /var/log/dis_networkmgr.log |awk '{print $2r, $4}'|awk 'BEGIN{FS=":"}{print $1":"$2}'|sort -u
total events 197
minutes within the hour, divided into 10 min blocks
00-09 11 :*****
10-19 11 :*****
20-29 67 :*********************************
30-39 79 :****************************************
40-49 17 :*********
50-59 12 :******
events per day in July (start tue 07)
wed 08 09 :*****
thu 09 09 :*****
fri 10 08 :****
sat 11 07 :****
sun 12 08 :****
mon 13 09 :*****
tue 14 22 :***********
wed 15 10 :*****
thu 16 20 :**********
fri 17 38 :*******************
sat 18 16 :********
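The binning behind the first histogram can be sketched in a few lines of Python. This is only an illustration: the 'HH:MM' input format, the function names and the sample events are assumptions, and the star scaling (one star per two events) is a guess rather than the rule used for the listing above.

```python
from collections import Counter


def ten_minute_blocks(timestamps):
    """Count events per 10-minute block of the hour.

    `timestamps` is an iterable of strings ending in ':MM' (assumed format).
    Returns six counts for minutes 00-09, 10-19, ..., 50-59.
    """
    counts = Counter(int(ts.rsplit(":", 1)[-1]) // 10 for ts in timestamps)
    return [counts.get(block, 0) for block in range(6)]


def render(counts, per_star=2):
    """Format counts as text histogram lines, one star per `per_star` events."""
    lines = []
    for block, n in enumerate(counts):
        stars = "*" * (n // per_star)
        lines.append(f"{block * 10:02d}-{block * 10 + 9:02d} {n:2d} :{stars}")
    return lines


# Made-up event times, only to exercise the functions.
sample = ["03:25", "03:31", "07:38", "11:05", "14:29", "19:33"]
block_counts = ten_minute_blocks(sample)
for line in render(block_counts):
    print(line)
```

Feeding the de-duplicated output of the grep pipeline into `ten_minute_blocks` would give the per-block totals, provided each line ends in the minute field.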
This is a very clever analysis, Dave. I checked the LLO logs (there are three: corner, x-end, y-end). So far we only see these issues when we have a front-end down for IO chassis or new hardware installs.
Rana, Matt, Evan
Low-noise commissioning was derailed by various CARM-reduction locklosses:
During the ALS DIFF to DARM RF transitions, there were saturations of the ETM ESD actuators which caused occasional unlocks. This was due to the poor sensing noise in the ALS, and so any change in the dark noise or EMI at the end stations would slow down the whole lock acq sequence.
We put the following low pass filter in DARM: started with the RED RLP80, but found the BLUE RLP180 was good enough so that the control signal is now dominated by low frequency stuff instead of the hash at 1 kHz. The script now turns this on by default and turns it off to recover the 100 Hz phase margin as soon as we have transitioned to a low noise DARM RF signal.
* also, while writing this, we had the second Guardian crash of the night using the latest build with the 'no double main()' fix. Jamie has been notified.
Today we've had >5 locklosses of the 3rd kind: where the CARM handoff from using arm transmission to REFL PDH fails. Attached is a lockloss plot of one of these events.
@ -1.2 seconds, CARM is on normalized REFL9_I
@ -1.3 seconds, it is starting the transition and the arm powers are starting to dive
Why are the arm powers diving here? I suspect that the normalized REFL signal may have some instability depending upon the L2A -> power coupling, so we might try freezing the normalization at that time.
Also, since we only normalize by Y arm power for CARM, we're getting a DARM-> CARM coupling at f < 0.1 Hz...
We're having some mysterious lock losses as we move from large CARM offset to less large offset. With the DRMI locked, the ALS starts bringing the arms in and the ALS / IMC lose lock.
Suspecting the recent FSS tunings, we looked at the FSS screen. The FAST gain was down at +5 dB. Also, the EOM Drive readback was up at +3V. The attached plot shows the EOM readback (PC_MON) as the FAST gain is tuned.
I have left it at 22.2 dB, where the EOM drive is minimized. In mid-April, there is a series of entries from Rick and Peter where the loop is tuned up, but the fast gain is turned down incrementally from 21 to 5 dB. Why so??
Also, it seems like we should aim for a ~250 kHz UGF for the FSS to avoid the peaking at 1.8 MHz where the notch is not getting all of the EOM resonance.
Cataloging the many ways in which we are breaking lock or failing to lock since Friday, we found this one:
Sitting quietly at 10W DC readout, there was a slow ring up of a pitch instability in several ASC signals. Perhaps it's time we went back to seriously controlling the ASC in the hard/soft basis instead of the optic basis in which it's currently done. The frequency is ~1 Hz and the time constant is ~1 min. It would be great if someone can look at the signs of the fluctuations in the OL and figure out if this was dHard or cHard or whatever.
In the attached plot, I've plotted the OpLev pit signals during the time of this ringup (0702 UTC on 7/19). The frequency is 1 Hz. It appears with the same sign and similar magnitudes in all TMs except ITMX (there's a little 1 Hz signal in ITMX, but much smaller).
Evan, Matt, Rana
We again saw the pitch instability tonight. We tried reducing it in a few ways, but the only successful way was to turn off the SRCL FF.
It appears that at higher powers, the SRCL_FF provides a feedback path for the pitch signals to get back to the Arms (since SRCL_FF drives the ITMs; and both of them as of Thursday). i.e. cSOFT has a secondary feedback path that includes some length<->angle couplings and produces a high Q unstable resonance. I don't understand how this works and I have never heard of this kind of instability before. But we repeatedly were able to see it ringup and ringdown by enabling SRCLFF.
To enable use of SRCL_FF, we've put a high pass filter into the SRCL_FF. This cuts off the SRCL_FF gain below a few Hz while preserving the phase above 10 Hz (where we want the subtraction to be effective). HP filter Bode plot attached.
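To illustrate the trade-off described above, here is a minimal sketch of how a high-pass removes gain at low frequency while costing little phase at 10 Hz. It assumes a first-order high-pass with a hypothetical 1 Hz corner; the filter actually installed in SRCL_FF may have a different order and corner, so the numbers are only indicative.

```python
import cmath
import math


def highpass_response(f_hz, corner_hz):
    """Complex response of a first-order high-pass, H = (jf/fc) / (1 + jf/fc)."""
    s = 1j * f_hz / corner_hz
    return s / (1 + s)


CORNER_HZ = 1.0  # hypothetical corner, not the installed SRCL_FF filter

for f in (0.1, 1.0, 10.0, 30.0):
    h = highpass_response(f, CORNER_HZ)
    gain_db = 20 * math.log10(abs(h))
    phase_deg = math.degrees(cmath.phase(h))
    print(f"{f:5.1f} Hz: {gain_db:6.1f} dB, {phase_deg:5.1f} deg")
```

With these assumed values the gain is down by about 20 dB at 0.1 Hz, while the phase lead at 10 Hz stays below 6 degrees.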
The DAC for the BS M2 stage can put out 131000 counts, but the RMS is only 500 counts after transitioning into 'state 3' of the coil driver (Acq OFF, LP ON).
Seems like we're not in the best state here. We don't want to reduce the BS range for acquisition.
Should we be putting in an offset to avoid the DAC glitches or has this DAC been improved by some EEPROM upgrades?
Has anyone in DetChar seen BS DAC glitches from ER7?
I don't remember us ever noticing BS M2 DAC glitches in ER7. The only ones we really saw were on MC2 M3. Looking back at two locks (Jun 5 5 UTC and Jun 8 14 UTC), BS M2 was centered on 0 with a peak-to-peak range of 6000 counts. So we don't really know what happens when we hit +/- 2^16 counts. I think we've seen that the -2^16 crossing is often the worst.
I just looked at some zero-crossings in MICH and the BS M2 noisemons during these two locks. Three quadrants look to be clear of DAC glitches. But in the UR noisemon there seem to be glitches at 300 Hz which match up pretty well with the times of zero crossings.
The first attached plot is the Omega triggers of the noisemon with vertical lines showing zero-crossings in MASTER_OUT. The second plot is a different lock, where the conclusion still holds but the timing seems less exact. Next is a spectrum comparison of the UR NOISEMON versus MASTER_OUT. There's a notch around 300 Hz, presumably to avoid ringing up the BS violin modes, but the noisemon sees something coming out of the DAC in this range. For comparison, the other spectrum is the same thing for LR, where there doesn't seem to be excess noise in the notch.
I don't see any evidence of these glitches affecting MICH (I think that's where BS glitches would show up the best). That's probably why we never noticed these. We mostly look for things that affect DARM, although we sometimes serendipitously find other things (we noticed MC2 because it showed up in MCL).
It's weird for DAC glitches to show up at high frequency, and the timing doesn't exactly match the zero crossings. It's probably worth keeping a close eye on the noisemons if more range is used on the DACs, and for detchar to check whether the DAC glitches had any effect on the BS stability.
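For anyone repeating the zero-crossing comparison, the crossing times can be extracted from a sampled drive signal roughly as follows. This is a generic sketch, not the code used for the attached plots: the function names, the sample values and the sample rate are all made up, and fetching the real MASTER_OUT data is left out.

```python
def zero_crossing_indices(samples):
    """Return indices i where the sign changes between samples[i-1] and samples[i].

    Zero is treated as non-negative.
    """
    crossings = []
    for i in range(1, len(samples)):
        prev, curr = samples[i - 1], samples[i]
        if (prev < 0 <= curr) or (prev >= 0 > curr):
            crossings.append(i)
    return crossings


def crossing_times(samples, sample_rate_hz, t0=0.0):
    """Convert crossing indices to times in seconds relative to t0."""
    return [t0 + i / sample_rate_hz for i in zero_crossing_indices(samples)]


# Made-up DAC counts, only to exercise the functions.
drive = [120, 40, -15, -60, -5, 30, 80, 10, -20]
print(zero_crossing_indices(drive))
print(crossing_times(drive, sample_rate_hz=16.0))
```

The resulting times can then be overlaid on the glitch trigger times to see how closely they line up.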
Evan, Matt, Stefan
- Matt wrote a many-optic ASC relief function, which we added to the DRMI guardian. This saves us some time.
- For the rest, we were chasing random fast lock losses that hit us pretty much anywhere: Prep_TR_CARM, REFL_TRANS, just sitting on resonance at low power, and sitting at high power.
- Evan started to systematically go through and check loop gains.
- He found the digital REFL CARM loop gain to be slightly low and increased the Tr_REFLAIR_9 gain from -0.5 to -0.8.
- ALS DIFF looked fine.
What I did:
- Take a filter that ramps over 3 sec (always on).
- Edit the foton file to a 1 sec ramp.
- Start the ramp, but before it finishes, load the new filter.
- The filter module keeps ramping and never finishes...
- I could reproduce this twice.
- I attached a snap of the still-ramping FM1 on LSC-REFLBIAS.
- To fix it, I considered rebooting, but since I suspected the problem to be a runaway counter, I simply added a filter with a 600 sec ramp time (long enough to catch the original filter ramp). 10 min later (the time it took to write this log) it was fixed...
Evan, Rana, Stefan
After we mostly fixed the CARM_ON_TR lock losses we ran into the REFL_TRANS ones. There is definitely a loop instability on TR_REFL_TRANS. We sped up this transition, which at least once seemed to help. However, we also randomly lost lock at other places, and never made it to low noise. We'll have another systematic approach tomorrow.
Handoff of CARM from in-air REFL to in-vac REFL is now automated via the IN_VACUUM_REFL state in the ISC_LOCK guardian. It will run after ENGAGE_ASC and before BOUNCE_VIOLIN_MODE_DAMPING. It was tested and seems to work fine.
MattE, Hong, Kiwamu, Rana
We've made a temporary hookup at EY to get the in-vac, IR, TransMon QPD signals into the new fast 'h1susetmypi'. This is so that we can monitor the amplitude and frequency of the unstable opto-acoustic modes in the interferometer (0910.2716). The only cabling change we've made is to add a breakout board at the AA chassis, so things ought to run as before after the EQ rings down.
Cabling Details:
The TransMon QPD cable goes into a Transimpedance/Whitening amplifier (D1001974) with Z = 1000 Ohms. Then there's a 0.4:40 pole:zero stage with a gain of 1 at DC. The output of this board then goes through a whitening chassis and then the output of that box (in the rack near the BSC) goes into the electronics room and into an ISC AA Chassis via a 9-pin dsub. We put the breakout board at the AA side of this cable. We used clip-doodles to go into a SCSI breakout board and via ribbon cable into the ADC for the h1susetmypi. This is a temporary setup to allow us to commission the model software. In this setup, since we're using the whitening filter outputs, we also get the whitening gain and amplification which is used for the QPD servos. Also we do not need to use the PEM patch panels as initially planned.
The transimpedance box has a single pole at 80 kHz. The whitening filter has no poles below 80 kHz. So these should be fast enough to let us see PI modes up to 30 kHz.
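As a quick check of the bandwidth claim, the attenuation and phase lag of a single 80 kHz pole at 30 kHz can be worked out directly. The sketch below covers only the transimpedance pole quoted above; the function name is just for illustration, and any further roll-off from the whitening stages or the ADC is not modeled.

```python
import math


def single_pole_response(f_hz, pole_hz):
    """Magnitude (linear) and phase (degrees) of 1 / (1 + j f/fp)."""
    ratio = f_hz / pole_hz
    magnitude = 1.0 / math.sqrt(1.0 + ratio ** 2)
    phase_deg = -math.degrees(math.atan(ratio))
    return magnitude, phase_deg


mag, phase = single_pole_response(30e3, 80e3)
print(f"{20 * math.log10(mag):.2f} dB, {phase:.1f} deg")
```

This gives a loss of roughly 0.6 dB and a lag of about 20 degrees at 30 kHz, so the pole by itself barely affects the amplitude of a PI mode at that frequency.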
We had to use some critical electrical tape to keep the BNC-clip shields from shorting with each other; take care in working near the AA side of this cabling - it may put offsets into the QPDs and disturb the lock acquisition.
Sheila, Stefan
Our first version of this fix had a bug: the linear fit below the sqrt limiter meant that there is a non-zero chance to drive the arm in the wrong direction, resulting in a DRMI lock-loss. We addressed this in the next version of the code with a quadratic fit:
- The initial limiter is set at l=1e-4. Above it (x>l), we still simply have sqrt(x) for CARM_TR.
- Below x<-3*l, we have TR_CARM=0
- Between -3*l<x<l we add the 2nd path f=A*(x-x0)^2 + f0, with
  - f0=-sqrt(l) = -0.01
  - x0=-3 l = -3e-4
  - A = 1/(16*l^(3/2)) = 62500
- The sum of the two paths gives a smooth interpolation.
We tested this code and verified that the FE code does what it should.
- Next we wanted to optimize the threshold limit l:
  - Looking at past locks, the pk2pk during PREP_TR_CARM in TR_CARM is about 0.1 cts
  - Thus the following parameters might be even better:
    - l=0.01
    - f0=-sqrt(l) = -0.1
    - x0=-3 l = -3e-2
    - A = 1/(16*l^(3/2)) = 62.5
- We installed this as version V2 of this code.
- Due to the earthquake we have not yet tested this.
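The interpolation can be checked numerically with a short Python sketch. It is not the front-end code: it assumes the second path is the quadratic f = A*(x-x0)^2 + f0 added to a sqrt path whose input is limited at l, and the function names are made up. With the quoted coefficients this sum matches sqrt(x) in value and slope at x = l and reaches zero with zero slope at x = x0.

```python
import math


def second_path_coefficients(l):
    """Return (x0, f0, A) for a given limiter threshold l."""
    x0 = -3.0 * l
    f0 = -math.sqrt(l)
    a = 1.0 / (16.0 * l ** 1.5)
    return x0, f0, a


def smooth_tr_carm(x, l=1e-4):
    """Sum of the limited sqrt path and the quadratic second path."""
    x0, f0, a = second_path_coefficients(l)
    if x <= x0:
        return 0.0
    if x >= l:
        return math.sqrt(x)
    # The limited sqrt path holds sqrt(l) here; the second path adds the quadratic.
    return math.sqrt(l) + a * (x - x0) ** 2 + f0


for l in (1e-4, 1e-2):
    x0, f0, a = second_path_coefficients(l)
    print(f"l={l:g}: x0={x0:g}, f0={f0:g}, A={a:g}")
```

Because the slope at x0 is zero and the function is non-decreasing everywhere, a small negative input can no longer produce an output of the wrong sign.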
Evan, Stefan
After the earthquake we had a chance to test the code:
- The good news: the smooth turn-on of TR_CARM described above seems to work just fine, every time.
- The bad news: we still sometimes lost lock 13 seconds after we grabbed TR_CARM.
- Attached are traces of TR_CARM_INMON for one failed attempt and 2 successful attempts.
- At any rate, the lock losses happen when we are already clean on the sqrt(x) part, so we will keep the code change.
The lock loss happens most likely during the engaging of LSC-MCL FM3 (BounceRG):
Guardian line 642: ezca.switch('LSC-MCL', 'FM3', 'ON')
Guardian log: 2015-07-18T07:58:51.84052 ISC_LOCK [PREP_TR_CARM.main] ezca: H1:LSC-MCL => ON: FM3
We moved the LSC-MCL FM3 engaging to the end of CARM_ON_TR (line 733): ezca.switch('LSC-MCL', 'FM3', 'ON')
This seems to have fixed the problem, at least as far as we can tell (we are at 2 out of 2 for this type of lock loss...). We also moved the zeroing of REFL_BIAS matrix elements to the DOWN state.
Evan, Stefan
We implemented the PR3 feed-back in ENGAGE_ASC in the ISC_LOCK guardian:
- We decided to leave the feed-back on PR2 during DRMI. This allows us to absorb the first correction to the initial alignment to PR2.
- We then switch the feed-back to PR3 in ENGAGE_ASC. This locks down PR3 during power-up; we confirmed that the REFL beam no longer moves in that transition.
- Details:
  - We modified the DRMI guardian ENGAGE_DRMI_ASC state to prepare both PR2 and PR3 for feed-back.
  - PR3 gains were set to roughly match PR2.
  - The settings on PR3 are:
    - ezca['SUS-PR3_M3_ISCINF_P_GAIN'] = 2
    - ezca['SUS-PR3_M3_ISCINF_P_GAIN'] = 5
    - SUS-PR3_M1_LOCK have an integrator, a -120dB and a gain of 1
  - Also, it now has the flag self.usePR3, which can select feed-back to PR2 (False) or PR3 (True). The rest of the ENGAGE_DRMI_ASC state uses this flag.
  - The default is self.usePR3=False, i.e. it does just the old PR2 engaging.
  - The PRC2 loop is always off during the CARM reduction sequence. It is re-engaged in ENGAGE_ASC with feed-back to PR3 with the following steps:
    - The output of SUS-PR2_M3_ISCINF_P and SUS-PR2_M3_ISCINF_Y is held.
    - The ASC-PRC2_P and ASC-PRC2_Y filters are cleared.
    - The output matrix is updated:
      - asc_outrix_pit['PR3', 'PRC2'] = 1
      - asc_outrix_yaw['PR3', 'PRC2'] = -1 # required to keep the same sign as PR2
    - And the loops are turned on again.
  - The current loop gain is still low; the step response is on the order of 30 sec.
  - ENGAGE_ASC also has the self.usePR3 flag (default is True), so it is still backwards compatible.
  - The whole sequence (engage DRMI on PR2, switch to PR3 in full lock) was tested successfully once before an earthquake hit.
I have just installed a new version of Guardian core:
guardian r1449
It addresses the "double main" execution bug that has been plaguing the system. See guardian bug 879, ECR 1078.
The new version is in place, but none of the guardian nodes have been restarted yet to pull in the new version.
You can either manually restart the nodes with 'guardctrl restart', or just try rebooting the whole guardian machine. I might start with the former, to just target the important lock acquisition nodes (ISC_LOCK, etc.), and wait until Tuesday maintenance for a full restart of the Guardian system.
ISC_LOCK and ISC_DRMI were restarted around 2015-07-19 07:07:00 Z.