SUS_PRM guardian became unresponsive, but has now been restored.
The control room reported that the SUS_PRM guardian had become completely unresponsive. The log was showing the following epicsMutex error:
2015-02-24T03:45:51.412Z SUS_PRM [ALIGNED.run] USERMSG: Alignment not enabled or offsets have changed from those saved.
epicsMutex pthread_mutex_unlock failed: error Invalid argument
epicsMutexOsdUnlockThread _main_ (0x2999f20) can't proceed, suspending.
The first line is the last legitimate guardian log message before the hang. The node was not responding to guardctrl stop or restart commands. I then tried killing the node using the guardctrl interface to the underlying runit supervision system:
controls@h1guardian0:~ 0$ guardctrl sv kill SUS_PRM
controls@h1guardian0:~ 0$
This did kill the main process, but it unfortunately left the worker subprocess orphaned, which I then had to kill manually:
controls@h1guardian0:~ 0$ ps -eFH | grep SUS_PRM
...
controls 18783 1 4 130390 37256 7 Feb19 ? 05:13:18 guardian SUS_PRM (worker)
controls@h1guardian0:~ 0$ kill 18783
controls@h1guardian0:~ 0$
After everything was cleared out, I was able to restart the node normally:
controls@h1guardian0:~ 0$ guardctrl restart SUS_PRM
stopping node SUS_PRM...
ok: down: SUS_PRM: 49s, normally up
starting node SUS_PRM...
controls@h1guardian0:~ 0$
At this point SUS_PRM appears to be back to functioning normally.
However, I have no idea why this happened or what it means. This is the first time I've seen this issue. The setpoint monitoring in the new guardian version installed last week means that nodes with the monitoring enabled (such as SUS_PRM) are doing many more EPICS reads per cycle than they were previously. As the channels being monitored aren't changing very much, these additional reads shouldn't incur much of a performance hit. I will investigate and continue monitoring.
Sheila, Alexa, Elli, Evan, Gabriele
The DHARD WFS plant for pitch changes significantly with the CARM offset reduction, which we think could be caused by miscentering on the optics combined with radiation pressure. We therefore attempted to center the green beams on the ETMs this morning. We used the PCAL cameras and the code that Elli had been using, but did not use the fitting function, instead just judging by eye the position of our beams relative to the crosshairs that mark the center of the optic.
For the Y arm we moved the green QPD A yaw from 0.5 (photo 100) to -0.5 (photo 105). For the X arm we ended up not moving anything; there are no QPD offsets on the X arm.
We found that we were not well aligned after this work, though that might have been unrelated to what we did. We have now reverted the offsets, but it would be worth trying to put them back in at some point.
We had some suspicious alignment issues today. To align the two arms, we ran the green WFS, which feed back to the ITMs, ETMs, and TMSs, as we have been doing for several weeks now. Then we followed with our input pointing, PRM alignment, MICH alignment, and SRM alignment. We could then lock IR with ALS COMM and produce a good build-up in the X arm; however, when we locked ALS DIFF, we were about 5% below the nominal IR build-up in the Y arm. We kept seeing an oscillation in AS 45Q before transitioning to RF DARM, and we consistently lost lock. DRMI was also taking longer to lock (about 12 min). We got frustrated and decided to start aligning from scratch. We cleared all the WFS history, used the nominal QPD offsets as described above, and then used the baffle PDs to align the TMS. We then adjusted the ITMs and ETMs by hand to get the green build-up high in both arms. This restored the IR build-up in the Y arm when locked on ALS DIFF, and allowed us to reach full lock, with DRMI locking on the order of minutes. As a sanity check, I looked at the PCAL image in green of the Y arm, and it looked the same as above with the corresponding QPD offset (image LHOX 106). I am not sure why the green WFS took us to a "bad" alignment today; they have been reliable in the past.
[Stuart A, Jeff K]
To accommodate QUAD top stage bounce and roll mode damping using AS WFS as sensors (see LHO aLOG entry 16868), it has been necessary to prepare the h1sus QUAD models ready for tomorrow's planned roll-out. All h1sus QUAD model links to the common library (QUAD_MASTER.mdl) were previously broken to provide DARM ERR damping at M0 (see LHO aLOG entry 16655). Therefore, we proceeded to work with the locally modified QUAD models, until such a time that the 'best' damping approach can be incorporated into the common library part for use at both sites. The following changes have been made:
(1) ETMs: added RFM receivers for H1:ASC-ETMX_AS_B_RF45_Q_P & H1:ASC-ETMY_AS_B_RF45_Q_P at the top-level (e.g. see h1susetmx_top-level_new.png below).
(2) ITMs: added IPC receiver for H1:ASC-ITM_AS_B_RF45_Q_P.
(3) All QUADs: added an input matrix within the QUAD/M0/DARM_DAMP block for DARM_ERR + AS_B_RF45 (e.g. see h1susetmx_DARM_DAMP.png below).
(4) All QUADs: routed AS WFS from the top-level through to DARM_DAMP and into the summation block (e.g. see h1susetmx_M0.png below).
These models were test built, which produced the following (IPC related) errors:
- H1:ASC-ETMX_AS_A_RF45_Q_P & H1:ASC-ETMX_AS_A_RF45_Q_P
- H1:LSC-ETMX_L_SUSETMX & H1:LSC-ETMY_L_SUSETMY
- H1:LSC-OAF_DARM_ERR
The first is expected, due to changes needing to be made to the h1asc model. For the remaining issues we need to coordinate with Kiwamu, who is updating/splitting the h1lsc model. The plan is to build, install, and restart the models tomorrow.
The cdsutils installation has been updated to r440, which includes a new CDSMatrix object that, once initialized, can be used to reference CDS front end EPICS matrix elements by name.
NOTE: this object uses the standard (row, column) ordering, for consistency with most matrix element reference conventions.
Example usage:
jameson.rollins@operator1:~ 0$ cdsutils --version
cdsutil 440
jameson.rollins@operator1:~ 0$ guardian -i
--------------------
aLIGO Guardian Shell
--------------------
ezca prefix: H1:

In [1]: from cdsutils import CDSMatrix

In [2]: m = CDSMatrix('LSC-PD_DOF_MTRX', rows={'DARM': 1, 'MICH': 2,}, cols={'OMC': 1, 'POP_A_RF45_Q': 5})

In [3]: m
Out[3]:

In [4]: m('MICH', 'POP_A_RF45_Q')
Out[4]: 'LSC-PD_DOF_MTRX_2_5'

In [6]: m['MICH', 'POP_A_RF45_Q']
Out[6]: 0.0

In [7]: m['DARM', 'POP_A_RF45_Q'] = 0
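For illustration, here is a minimal sketch of the (row, column) to channel-name mapping the object provides. The class and method names below are hypothetical stand-ins, not the actual cdsutils API; the real CDSMatrix also does EPICS reads/writes.

```python
# Hypothetical minimal re-implementation of the (row, column) -> channel-name
# mapping. Illustrative only; the real CDSMatrix lives in cdsutils.

class MatrixNames:
    """Map (row, column) labels to CDS front end matrix element channel names."""

    def __init__(self, prefix, rows, cols):
        self.prefix = prefix
        self.rows = rows  # e.g. {'DARM': 1, 'MICH': 2}
        self.cols = cols  # e.g. {'OMC': 1, 'POP_A_RF45_Q': 5}

    def channel(self, row, col):
        # CDS matrix elements are named PREFIX_<row>_<col>, 1-indexed,
        # with the standard (row, column) ordering noted above.
        return '%s_%d_%d' % (self.prefix, self.rows[row], self.cols[col])

m = MatrixNames('LSC-PD_DOF_MTRX',
                rows={'DARM': 1, 'MICH': 2},
                cols={'OMC': 1, 'POP_A_RF45_Q': 5})
print(m.channel('MICH', 'POP_A_RF45_Q'))  # LSC-PD_DOF_MTRX_2_5
```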
Jeff, Dan, others...
Sheila and I noticed last night that the ETMY L2 stage RMS current watchdog had tripped - the circled indicator lights in the first figure were red. Today with Jeff we tested whether this has any effect on the L2 actuation and reset the watchdog.
Turns out that with the watchdog tripped you can still actuate on the L2 stage, but with about 3x less force at DC than with the watchdog untripped. So, maybe this explains our crummy alignment stability last night.
The second plot shows the effect of applying a large offset in pitch to ETMY L2 before and after the watchdog was reset, as observed by the optical lever.
I've added the SUS-ETM{X,Y}_BIO_L2_MON channels as conditions to the ETM warning lights on the OPS overview screen.
I ran my (now 3x faster) BruCo script using a lock stretch that Dan pointed me to, from last Friday (GPS time 1107840506 + 600 s). The report can be found at the following address:
https://ldas-jobs.ligo.caltech.edu/~gabriele.vajente/bruco_1107840506/
Here is an excerpt, but you should definitely read the whole book:
The second bullet is indeed very strange, since it seems that SRCL/PRCL contribute pretty much uniformly to the DARM background. It seems rather strange to me that sensing noise of SRCL/PRCL can couple up to those frequencies. One simple way to test it would be to add a band-stop filter at 1-2 kHz in the PRCL and SRCL loops and see if the coherence with DARM gets reduced. If not, it means that the same source of noise contributes in a similar way to both the RF signals (used for PRCL and SRCL) and the DC signals at the AS port. I can imagine things like high order modes and RF modulation amplitude noise...
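For anyone wanting to reproduce one slice of such a report, here is a rough sketch of the underlying coherence test on synthetic data. The channel contents, the 0.1 threshold, and the segment length are stand-ins; BruCo itself runs this over every recorded channel against DARM.

```python
# Rough sketch of the coherence test BruCo automates: estimate the
# magnitude-squared coherence between DARM and one auxiliary channel and flag
# frequency bins above a threshold. Data here is synthetic; in practice the
# channels would come from NDS or frame files.
import numpy as np
from scipy.signal import coherence

fs = 1024.0                                # sample rate [Hz]
rng = np.random.default_rng(0)
n = int(60 * fs)                           # 60 s of data
aux = rng.standard_normal(n)               # auxiliary channel (e.g. SRCL)
darm = 0.5 * aux + rng.standard_normal(n)  # DARM with partial coupling

f, coh = coherence(darm, aux, fs=fs, nperseg=4096)
flagged = f[coh > 0.1]                     # bins with notable coherence
print('%d of %d bins above threshold' % (flagged.size, f.size))
```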
J. Kissel, S. Aston
As per ECR E1500090, Integration Issue 1015, and Work Permit 5069, we are preparing to copy LLO's infrastructure implementation of using AS_A_RF45_Q_PIT (i.e. the pitch signal calculated from the Q phase of the 45 [MHz] demodulated signal from the AS A WFS in HAM6, see D1200666 and T1100472) to damp the four QUADs' roll modes. However, we found ourselves asking many questions of the implementation, namely whether blindly copying is the right thing to do, mostly because it requires Reflected Memory (RFM) IPC connections between the ASC model and the QUAD models. We think the answer is that blindly copying is NOT the right thing to do. We think we have a better sensor that is more suitable to how the roll modes appear in the H1 ASC sensors, and we think we don't have enough IPC head room. See discussion below. Please tell us your thoughts on this. If we don't hear any push back by ~9a PT tomorrow morning, we're going forward with the changes to the scheme as decided below.
(1) Recently, LHO has pushed forward many different ways to *reduce* the number of ISC-to-ETM reflected memory (RFM) communications:
(a) We have split the ISC end-station models into two parts, an ALS and an ISC model (LHO aLOG 16443).
(b) We've moved the RFM connection for differential tidal control from being sent directly to the ETMs to being sent to the end-station ISC models first, and then sent to the SUS models via Dolphin PCIE (T1400733).
(c) We removed the SEI EX / EY computers from the RFM loop (LHO aLOG 16775).
(d) We will be (also tomorrow) re-distributing our corner-station LSC front end models to move the DARM path from h1lsc.mdl to h1omc.mdl (E1500041).
Indeed, even LLO -- though they've only been found to have *intermittent* IPC errors -- is testing out a faster CPU for their LSC computer to decrease the clock cycle turn around time. ...
So, for the two new ASC-to-ETM RFM connections that LLO has added, should we do the same thing as in (1b) and feed the RFM connections to the end-station ISC models first, and then to the SUS via Dolphin PCIE? Here is the current status of relevant IPC errors / clock-cycle turnaround times:

           Avg Clock Cycle     IPC           Receiving Channels                  Error Rate
           (CPU Max, [usec])   Errors?                                           [n errors/sec]
LLO
  LSC      35                  No
  ASC      161                 No
  ISCEX    27                  No
  ISCEY    25                  No
  [ALSEX]  [does not exist]
  [ALSEY]  [does not exist]
  SUSETMX  50                  Intermittent  L1:LSC-ETMX_[DARM_ERR,L_SUSETMX]    less than 1
  SUSETMY  49                  Intermittent  L1:LSC-ETMY_[DARM_ERR,L_SUSETMX]    less than 1
LHO
  LSC      36                  Yes           H1:ALS-[X,Y]_ARM_RFM                X: 100 to 1000, Y: 1 to 10
  ASC      160                 Yes           H1:ALS-[X,Y]_REFL_SLOW_RFM          2048
  ISCEX    5                   No
  ISCEY    5                   No
  ALSEX    32                  No
  ALSEY    31                  No
  SUSETMX  48                  Yes           H1:LSC-ETMX_[DARM_ERR,L_SUSETMX]    1 to 25, 5 to 40
  SUSETMY  49                  Yes           H1:LSC-ETMY_[DARM_ERR,L_SUSETMX]    10 to 120, 5 to 60

Reminders:
- All of the above models are running at a sampling rate of 16384 [Hz], so they have 60 [usec] to complete each clock cycle.
- All IPCs have a delay of 1 clock-cycle built in, but if the computations do not complete, or only just barely have enough time to complete, then the IPC misses even the 1 clock-cycle and throws an error.
- The RFM loops (there are separate loops for each end station) are now LSC -> SUSETM[X,Y] -> ISCE[X,Y] -> OAF -> ASC -> LSC, where the delay between the LSC and SUSETM[X,Y], as well as between the ISCE[X,Y] and OAF, is the one-way four [km] light travel time through a glass fiber, n * L / c = 1.5 * 3995 [m] / 3e8 [m/s] = 20 [usec], and the delay between nodes in the same building is believed to be dt = 0.7 [usec]. This means the LSC-back-to-LSC delay is 2 * n * L / c + 3 * dt = 2 * 1.5 * 3995 [m] / 3e8 [m/s] + 3 * 0.7e-6 [s] = 42 [usec].

(2) AS_RF45_Q_PIT is the *only* feedback sensor whose signal is piped to the SUS for roll damping, but the roll coupling into it is a dirt effect due to imperfections in the real system.
What guarantee do we have that we'll have equal success with this and only this sensor? To help answer the question, I took a look at 5 of the more recent times when the IFO was fully locked on DC readout:
(a) 2015-02-12 07:35:56 UTC
(b) 2015-02-13 05:23:45 UTC
(c) 2015-02-21 01:53:30 UTC
(d) 2015-02-21 03:08:20 UTC
(e) 2015-02-23 09:14:31 UTC
In each of these, one of two roll modes is visible in DARM: 13.18 [Hz] in (a) and (b), and 13.81 [Hz] in (b), (c), (d), and (e); there doesn't appear to be a pattern in where they appear. At these times, I've taken a 0.05 [Hz] resolution amplitude spectral density of DARM and *every* ASC channel, i.e.
AS A -- 45 and 36, I and Q, PIT and YAW
AS B -- 45 and 36, I and Q, PIT and YAW
REFL A -- 45 and 9, I and Q, PIT and YAW
REFL B -- 45 and 9, I and Q, PIT and YAW
TRX -- A and B, PIT and YAW
TRY -- A and B, PIT and YAW
POP -- A and B, PIT and YAW
or 52 channels plus DARM, and compared the amplitude and coherence to DARM in these channels for the 5 times. For each lock stretch, when either the 13.18 [Hz] mode or the 13.81 [Hz] mode is visible, it has an amplitude of ~2e-8 [ct] in DARM. After staring at the plots for an hour or three, I've convinced myself (and Stuart) that in 4 out of the 5 lock stretches, it is either AS B RF45 Q PIT or YAW that sees both modes reliably, *not* AS A RF45 Q PIT. Further, observability does not guarantee controllability, so it's not immediately clear which of the sensor signals will actually serve well for feedback. Finally, in the 5 lock stretches, only two of the four possible QUADs' highest roll modes have appeared. BUT I tested the hypothesis "does the lower frequency mode appear only in the TRs, and the higher in POP?"
on the hunch that those two signals would be able to distinguish between an ETM roll mode and an ITM roll mode: again, I've convinced myself that the lower frequency mode (13.18 [Hz]) is only ever visible in POP -- implying it's an ITM -- and the higher frequency mode is visible in the TRs -- implying it's an ETM. Maybe. Here's my point: if we want to be able to damp both of these modes, then just one ASC sensor isn't the right answer, and which sensor sees the mode best is IFO dependent.
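As a cross-check of the RFM loop delay arithmetic quoted earlier in this entry, the numbers can be re-derived in a few lines (the constants are the ones quoted above):

```python
# Re-checking the RFM loop delay arithmetic: two one-way fiber traversals
# plus three same-building node hops.
n = 1.5        # index of refraction of the fiber
L = 3995.0     # one-way arm length [m]
c = 3e8        # speed of light [m/s]
dt = 0.7e-6    # same-building node-to-node delay [s]

one_way = n * L / c                 # one fiber traversal, ~20 [usec]
round_trip = 2 * one_way + 3 * dt   # LSC -> SUSETM -> ISCE -> OAF -> ASC -> LSC

print('one way:    %.2f usec' % (one_way * 1e6))     # ~19.98 usec
print('round trip: %.2f usec' % (round_trip * 1e6))  # ~42.05 usec
# At 16384 [Hz] a model has 1/16384 [s] ~= 61 [usec] per clock cycle,
# so a ~42 [usec] loop delay leaves little margin.
```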
Comparing two stretches of lock that Dan pointed me to (one from Friday, with good stable sensitivity, and one from last night, with worse sensitivity), I find a clear non-stationary behavior of the sensitivity; see the first two plots, which are spectrograms of ten minutes of data during each lock.
Analyzing last night's lock, I computed the band-limited RMS between 100 and 300 Hz, and it indeed shows an almost regular oscillation at about 120-140 mHz. I compared this oscillation with a few auxiliary channels, and found good correlation with the SRM angular motion (third plot), and especially with the AS_RF36 WFS signals (fourth plot).
The last two plots show my usual least square fit of the BLRMS using a set of auxiliary channels:
In summary, I think we need to commission an angular control for the SRM, and it seems we have good signals from AS_RF36. The main motion is in pitch.
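For reference, here is a minimal sketch of a 100-300 Hz band-limited RMS computation of the sort described above, on synthetic data. The sample rate, filter order, and 1-second averaging window are illustrative choices, not necessarily what the actual analysis used.

```python
# Sketch of a 100-300 Hz band-limited RMS (BLRMS): band-pass the data,
# then take the RMS over short chunks. Input here is synthetic white noise;
# in practice it would be DARM.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 4096.0                             # sample rate [Hz]
dur = 60                                # duration [s]
rng = np.random.default_rng(1)
x = rng.standard_normal(int(dur * fs))  # stand-in for DARM data

# Band-pass 100-300 Hz with a 4th-order Butterworth filter
sos = butter(4, [100.0, 300.0], btype='bandpass', fs=fs, output='sos')
xb = sosfiltfilt(sos, x)

# RMS over 1-second chunks -- slow enough to resolve a ~130 mHz oscillation
blrms = np.sqrt((xb.reshape(dur, -1) ** 2).mean(axis=1))
print('mean BLRMS: %.3f' % blrms.mean())
```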
= Ops Summary =
== Morning Meeting ==
Visitors: Stuart, Gabriele, Ken, Myron. Welcome to Tumbleweed Land.
SEI (Hugh) Things seem okay. Platforms survived the weekend.
SUS (Betsy) I can't hear you. Sorry :-(
ISC *silence*
CDS (Richard) Things are somewhat stable. Loading/unloading ldas racks.
Vacuum ...
== Activities ==
9:58 Elli and Dave to IOT2R
*Difficulty requesting MICH_DARK_LOCKED. The beam was too low and the PD wasn't picking up all the signal. Alexa adjusted SR2 - All is well (for now)*
10:49 Mitchell to Mid X
11:54 Mitchell back
12:36 Got an automated call from Hanford Emergency System saying the alarm will go off at some point...
12:40 Elli and Dave back
13:38 Myron and Ken to Mid X
15:04 Myron and Ken back
Jodi - The big truck that was supposed to come move the racks is not coming today.
Laser Status:
SysStat is good
Output power is 28.7 W (should be around 30 W)
FRONTEND WATCH is RED
HPO WATCH is RED
PMC: It has been locked 1 day, 2 hr 34 minutes (should be days/weeks)
Reflected power is 2 Watts and PowerSum = 24.4 Watts (Reflected Power should be <= 10% of PowerSum)
FSS: It has been locked for 0 h and 44 min (should be days/weeks)
Threshold on transmitted photo-detector PD = 0.70 V (should be 0.9 V)
ISS: The diffracted power is around 13.8% (should be 5-15%)
Last saturation event was 1 d 2 h and 4 minutes ago (should be days/weeks)
Added 250mL of water.
As I logged Friday (16836), I set the minor alarm point to +-2 psi, but that is just too tight for End Y, where the pressure sensor channels are the most noisy (by 4 to 10x). I have opened the non-alarm band to +-3 psi. Please note if these alarms continue; the trends suggest that this might still be just a tad too close.
Attached is 5 seconds of full data where the ISC sent a whole bunch (5e5 nm) of LONG output to the HEPI all at once. The fast channels (shown on HPI ISO X) triggered the HEPI watchdog, while the slow channels never see a thing. Why would ISC do such a thing?
Patrick T., Cyrus R. The work described under permit 5066 is complete. No issues came up.
A late entry, with data: it doesn't look like the ETMx offsets that Keita implemented (alog 16591) in an attempt to decouple the large pitch of the main chain from the reaction chain were doing much.
Attached are pitch transfer functions of the top, middle, and lower stages of the ETMx, comparing various times when the large pitch bias and the Keita offsets were on and off. There is not much change in any of the configurations in any of the 3 stages of the ETMx. Note that, according to Jeff's alog 16824, the Keita offsets were all turned off last Thursday night.
After clearing all the SEI Overflows on the GDS_TP MEDMs Friday, this morning some platforms report new overflows and some don't. Where did these come from?
Okay--No Overflows reported over the weekend on:
HEPI: HAMs 2, 4 & 5, BSCs ITMY, ITMX, BS, ETMY; ISIs: HAMs 2, 4, 5, ITMY, ITMX, ETMY.
So that leaves the platforms where Overflows did occur:
HPIs HAMs 3, 6, & BSC ETMX and ISIs HAM3, HAM6, BS, & ETMX.
Okay so there is almost a complete pattern here. The BS is the only platform that had overflows on the ISI but not on the HEPI.
So for the overflows on the ISIs: HAM3, HAM6, ETMX, and the BS have overflows on the GS13s, the L4Cs, and on the DACs.
For HEPI there are overflows on the ETMX DAC; but, strangely, while the GDS-TP Overflows (e.g. H1:FEC-55_ACCUM_OVERFLOW) have non-zero numbers, none of the ADC or DAC accumulators show any totals on HAM3 or HAM6 (e.g. H1:FEC55_ADC_OVERFLOW_ACC_2_9)...
See the attached for the inconsistent circled numbers: there are totals on the overflow accumulator but none on the ADCs or DACs to show where they originate.
So next, I'll try to put a reason to the Overflows: maybe an earthquake or a trip (although I don't believe we had any trips). HAM6, with the Fast Shutter, may be the only platform with a valid reason for overflows. And why do the HAM3 and HAM6 HEPI overflows not show where they came from?
model restarts logged for Sat 21/Feb/2015
no restarts reported
model restarts logged for Sun 22/Feb/2015
2015_02_22 03:42 h1fw0
one unexpected restart.
Dan, Sheila
OP Lev Damping on ETMs
We have turned the op lev damping for the ETMs off completely after the interferometer is locked. This is now in the guardian. The attached spectra show the witness sensors with oplev damping on (dashed lines) and off (solid lines). We knew these loops were not good, but this clearly shows it. The bottom right plot shows the DHARD WFS error signal with the oplev damping on and off.
DHARD WFS
The DHARD loop, which was designed based on measurements made at a CARM offset of about 50 picometers with high op lev damping gain (see alog 16501 comment), needs to be redesigned to take into account changes in the plant on resonance and with the op lev damping off or reduced. The attached plots show the change in the transfer function: the first one shows the open loop gain in the situation where the loop was designed, the transfer function on resonance, and a partial measurement of the TF with the op lev damping gain reduced. You can see this loop becomes unstable, which explains why we have had to reduce the gain several times in the last few weeks. After turning off the op lev damping, we attempted to take some measurements that could be used to design a new loop; these are shown in the last screen shot and can be found in my directory under /Alingment/DHARD_WFS_NO_OPLEV_WHITE_NOISE.xml. While the coherence is not good at some frequencies, this should be better than what we have now.
Attachments:
1) Spectra showing that op Lev damping was imposing noise, and is now off
2) Swept sine measurements showing that DHARD PIT becomes unstable as we go on resonance and reduce Op Lev gains.
3) swept sine measurements of DHARD Pit plants on resonance and with reduced op lev gain
4) noise injection measurements of DHARD plant with op lev damping off.
Other locking notes from tonight:
- The ETMY violin modes were higher tonight than Friday by a factor of several. I made +/-60deg filters in the ETMY_L2_DAMP_MODE2 filter bank and toyed with the gain and phases. With a gain of 500k and no phase shift the peak height decreased by ~20% in half an hour. So, perhaps some progress.
- On Friday there was some coherence above 100 Hz with the OMC ASC loops while the dither was on, so I installed some more aggressive Butterworth rolloff filters in the dither error signal filter banks. I also added some poles and zeros to the POSX filter bank to counteract a somewhat poor job of plant inversion for this d.o.f. Tonight the dither alignment was robust, but the IFO itself was quite noisy, so it's not clear whether this was a measurable improvement over Friday.
- Progress was made on the OMC autolocker code. The challenge here is that there is no high-frequency readback of PZT2, only the DC monitor channel is recorded at >16Hz. There is some lag between the PZT2 output and the DC monitor readback, so while the code can find the mode (by sweeping the cavity), the voltage it expects the mode to be at isn't the same as the PZT slider value. This needs some work, more fast DQ channels in the OMC model would be helpful.
We made it to low noise a couple of times, but the pitch motion at the AS port was so bad that the DC lock was not very stable.
J. Kissel, S. Dwyer
During the earthquake we captured new safe.snaps for the ETMs. Note that in doing so, I brought the SUSs to "safe" by hand and then requested the guardian to go to "SAFE", and vice versa brought the SUS to "ALIGNED" via guardian and then restored everything that the guardian didn't touch -- all because I finally wanted to compare my impression of what the guardian should be doing against what the guardian currently does, after months of neglect from me and commissioning by others. Most notably, it DOES NOT touch any part of the locking filters, and it DID turn on BOTH degrees of freedom of optical lever damping. For the record, in the safe.snap I gathered, I turned OFF ALL euler basis output switches: LOCK, DITHER, DAMP, OL DAMP, DARM DAMP, etc. Further, I've turned OFF the large offsets installed by Keita referenced in LHO aLOG 16591. Those offsets, and all the expected LOCK filter banks and optical lever Pitch damping, have been turned back ON. I've committed the new safe.snap to the userapps repo. Currently, we want optical lever damping to be either under human control or ISC guardian control, so we have commented out the lines in the INIT and ENGAGE_DAMPING states which call the function susobj.olDampOutputSwitchWrite to turn the levers ON. I've committed SUS.py to the userapps repo.
J. Kissel, B. Weaver
The above log is unclear regarding Keita's offsets. I had turned the offsets in M0 DRIVEALIGN L2L, L2P, and L2Y, as well as the R0 TEST L, OFF while capturing the safe.snap, and have LEFT them OFF since. To repeat: the M0 DRIVEALIGN L2L, L2P, and L2Y offsets, as well as the R0 TEST L offset, are now OFF and should remain OFF, because they have proven to be ineffective (Betsy is posting an aLOG with the proof shortly).