J. Kissel, K. Izumi, J. Warner, S. Dwyer

After another ETMX front-end failure this morning (see LHO aLOG 35861, 35857, etc.), the recovery of the IFO was much easier because of yesterday morning's lessons learned about not running initial alignment scripts that suffer from bit rot (see LHO aLOG 35839). However, after completing recovery, the SDF system's OBSERVE.snap let us know that some of the same critical initial alignment references were changed at 14:17 UTC, namely
- the green ITM camera reference points:
H1:ALS-X_CAM_ITM_PIT_OFS
H1:ALS-X_CAM_ITM_YAW_OFS
and
- the transmission monitor's red QPDs:
H1:LSC-X_TR_A_LF_OFFSET

After discussing with Jim, he'd heard that Corey (a little surprisingly) didn't have too much trouble turning on the green ASC system; if these ITM camera offsets are large, then the error signals are large, and we'd have had the same trouble closing the loops as yesterday.

We traced the change down to Dave's reboot of h1alsex & h1iscex this morning at around 14:15 UTC (see LHO aLOG 35862), which restored those two models' out-of-date safe.snap files. Recall that the safe.snaps for these computers are soft-linked to the down.snaps in the userapps repo:
/opt/rtcds/lho/h1/target/h1alsex/h1alsexepics/burt ]$ ls -l safe.snap
lrwxrwxrwx 1 controls controls 62 Mar 29 2016 safe.snap -> /opt/rtcds/userapps/release/als/h1/burtfiles/h1alsex_down.snap
/opt/rtcds/lho/h1/target/h1iscex/h1iscexepics/burt ]$ ls -l safe.snap
lrwxrwxrwx 1 controls controls 62 Mar 29 2016 safe.snap -> /opt/rtcds/userapps/release/isc/h1/burtfiles/h1iscex_down.snap
where the "safe.snap" in the local "target" directories is what the front end uses to restore its EPICS records (which is why we've intentionally commandeered the file with a soft link to a version-controlled file in the userapps repo).

We've since reverted the above offsets to their OBSERVE values, and I've accepted those OBSERVE values into the safe.snap / down.snap and committed the updated snap to the userapps repo. In the attached screenshots, the "EPICS VALUE" is the correct OBSERVE value, and the "SETPOINT" is the errant safe.snap value, so they show what I've accepted as the current correct value.
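As a sanity check for the future, here is a minimal sketch (assuming only the paths quoted above, and not part of any site infrastructure) that verifies the target-area safe.snap files are still soft links into the userapps repo, so a front-end restart restores the version-controlled down.snap settings:

import os

# Expected soft links from the local target areas to the userapps down.snap files
LINKS = {
    '/opt/rtcds/lho/h1/target/h1alsex/h1alsexepics/burt/safe.snap':
        '/opt/rtcds/userapps/release/als/h1/burtfiles/h1alsex_down.snap',
    '/opt/rtcds/lho/h1/target/h1iscex/h1iscexepics/burt/safe.snap':
        '/opt/rtcds/userapps/release/isc/h1/burtfiles/h1iscex_down.snap',
}

for link, target in LINKS.items():
    # Flag anything that is no longer a soft link, or points somewhere unexpected
    if not os.path.islink(link) or os.path.realpath(link) != os.path.realpath(target):
        print('WARNING: {} is not a soft link to {}'.format(link, target))
    else:
        print('OK: {} -> {}'.format(link, target))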
The fundamental problem here is our attempt to maintain two files with nearly duplicate information (safe and observe contain mostly the same settings; realistically, only one file is ever going to be well maintained).
I've added a test to DIAG_MAIN to check if the ITM camera references change. It's not a terribly clever test: it just checks whether the camera offset is within a small range around a hard-coded value for pitch and yaw for each ITM. These values will need to be adjusted if the cameras are moved or if the reference spots are moved, meaning there will be 3 places these values need to be updated (both the OBSERVE and safe.snap files and, now, DIAG_MAIN), but hopefully this will help keep us from getting bitten by changed references again. The code is attached below.
@SYSDIAG.register_test
def ALS_CAM_CHECK():
    """Check that the ALS CAM OFS references haven't changed.
    Will need to be updated if the cameras are moved.
    """
    # Nominal reference offsets and allowed range for each ITM green camera
    nominal_dict = {
        'X': {'PIT': 285.850, 'YAW': 299.060, 'range': 5},
        'Y': {'PIT': 309.982, 'YAW': 367.952, 'range': 5},
    }
    for opt, vals in nominal_dict.items():
        for dof in ['PIT', 'YAW']:
            cam = ezca['ALS-{}_CAM_ITM_{}_OFS'.format(opt, dof)]
            # Flag the offset if it has drifted outside the allowed band
            if not (vals[dof] - vals['range']) < cam < (vals[dof] + vals['range']):
                yield 'ALS {} CAM {} OFS changed from {}'.format(opt, dof, vals[dof])
Summary: no apparent change in the induced wavefront from the point absorber.
After a request from Sheila and Kiwamu, I checked the status of the ITMX point absorber with the HWS.
If I look at the wavefront approximately 13 minutes after lock acquisition, I see the same magnitude of optical path distortion across the wavefront (approximately a 60 nm change over 20 mm). This is the same scale of OPD that was seen around 17-March-2017.
Note that the whole pattern has shifted slightly because of some on-table work in which a pick-off beam-splitter was placed in front of the HWS.
Thanks Aidan.
We were wondering about this because of the reappearance of the broad noise lump from 300-800 Hz in the last week, which is clearly visible on the summary pages (links in this alog). We also now have broad coherence between DARM and IMC WFS B pit DC, which I do not think we had before today. We didn't see any obvious alignment shift that could have caused this. It also seems to be getting better or going away if you look at today's summary page.
Here is a bruco for the time when the jitter noise was high: https://ldas-jobs.ligo-wa.caltech.edu/~sheila.dwyer/bruco_April27/
TITLE: 04/28 Owl Shift: 07:00-15:00 UTC (00:00-08:00 PST), all times posted in UTC
STATE of H1: LOCKING by Jim, but still in CORRECTIVE MAINTENANCE (will write an FRS unless someone else beats me to it again!)
INCOMING OPERATOR: Jim
SHIFT SUMMARY:
Groundhog Day shift, with H1 fine for the first 6 hrs and then EX going down again (but this time SUS; see earlier alog). This time I kept away from even breathing on the TMSx dither scripts & simply restored ETMx & TMSx to their values before all of the front-end hubbub of this morning. I was able to get ALSx aligned & this is where I'm handing off to Jim (he had to tweak ALSy & I see that he is already tweaking up a locked PRMI). Much better outlook than yesterday at this time for sure!
LOG:
FRS Assigned & CLOSED/RESOLVED for h1susex frontend crash:
https://services.ligo-la.caltech.edu/FRS/show_bug.cgi?id=7995
H1SUSEX front end computer and IO chassis were rebooted this morning to deal with the issue posted by Corey. Richard / Peter
Same failure mode on h1susex today as h1seiex had yesterday. Therefore we were not able to take h1susex out of the Dolphin fabric, and so all Dolphin-connected models were glitched after the reboot of h1susex.
Richard power cycled h1susex and its IO chassis. I killed all models on h1seiex and h1iscex, and then started all models on these computers. No IRIG-B timing excursions. Cleared IPC and CRC errors; Corey reset the SWWD. IFO recovery has started.
Here are the front end computer uptimes (times since last reboot), taken at 10:02 this morning. The longest any machine has run is 210 days, dating back to the site power outage on 30 Sep 2016. (A sketch for regenerating these tables follows the two lists below.)
h1psl0 up 131 days, 18:38, 0 users, load average: 0.37, 0.13, 0.10
h1seih16 up 210 days, 3:00, 0 users, load average: 0.11, 0.14, 0.05
h1seih23 up 210 days, 3:00, 0 users, load average: 0.62, 1.59, 1.37
h1seih45 up 210 days, 3:00, 0 users, load average: 0.38, 1.31, 1.17
h1seib1 up 210 days, 3:00, 0 users, load average: 0.02, 0.04, 0.01
h1seib2 up 210 days, 3:00, 0 users, load average: 0.02, 0.08, 0.04
h1seib3 up 210 days, 3:00, 0 users, load average: 0.00, 0.05, 0.06
h1sush2a up 210 days, 3:00, 0 users, load average: 1.64, 0.59, 0.56
h1sush2b up 210 days, 3:00, 0 users, load average: 0.00, 0.00, 0.00
h1sush34 up 210 days, 3:00, 0 users, load average: 0.00, 0.03, 0.00
h1sush56 up 210 days, 3:00, 0 users, load average: 0.00, 0.00, 0.00
h1susb123 up 210 days, 3:00, 0 users, load average: 0.17, 1.07, 1.10
h1susauxh2 up 210 days, 3:00, 0 users, load average: 0.00, 0.00, 0.00
h1susauxh34 up 117 days, 17:07, 0 users, load average: 0.08, 0.02, 0.01
h1susauxh56 up 210 days, 3:00, 0 users, load average: 0.00, 0.00, 0.00
h1susauxb123 up 210 days, 2:07, 0 users, load average: 0.00, 0.00, 0.00
h1oaf0 up 164 days, 20:16, 0 users, load average: 0.10, 0.24, 0.23
h1lsc0 up 207 days, 41 min, 0 users, load average: 0.06, 0.57, 0.65
h1asc0 up 210 days, 3:00, 0 users, load average: 1.03, 1.82, 1.80
h1pemmx up 210 days, 3:53, 0 users, load average: 0.05, 0.02, 0.00
h1pemmy up 210 days, 3:53, 0 users, load average: 0.00, 0.00, 0.00
h1susauxey up 205 days, 23:41, 0 users, load average: 0.07, 0.02, 0.00
h1susey up 210 days, 3:06, 0 users, load average: 0.14, 0.04, 0.01
h1seiey up 206 days, 51 min, 0 users, load average: 0.00, 0.03, 0.00
h1iscey up 210 days, 3:07, 0 users, load average: 0.04, 0.21, 0.20
h1susauxex up 210 days, 3:16, 0 users, load average: 0.00, 0.00, 0.00
h1susex up 3:06, 0 users, load average: 0.00, 0.00, 0.00
h1seiex up 1 day, 2:57, 0 users, load average: 0.00, 0.00, 0.00
h1iscex up 177 days, 21:59, 0 users, load average: 0.08, 0.33, 0.24
Here is the list of free RAM on the front end computers in kB:
h1psl0 4130900
h1seih16 4404868
h1seih23 4023316
h1seih45 4024452
h1seib1 4754280
h1seib2 4763256
h1seib3 4753960
h1sush2a 4009216
h1sush2b 5160476
h1sush34 4389108
h1sush56 4400720
h1susb123 4013144
h1susauxh2 5338804
h1susauxh34 5351172
h1susauxh56 5350144
h1susauxb123 5339900
h1oaf0 9102096*
h1lsc0 4065228
h1asc0 3988536
h1pemmx 5358464
h1pemmy 5358352
h1susauxey 5352196
h1susey 64277012~
h1seiey 4758336
h1iscey 4117788
h1susauxex 5349644
h1susex 64301796~
h1seiex 4769840
h1iscex 4138204
* oaf has 12GB
~ end station sus have 66GB
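For reference, a minimal sketch of how tables like the two above could be regenerated; it assumes passwordless ssh as the controls user to each front end, and the host list here is abbreviated to a few examples:

import subprocess

HOSTS = ['h1psl0', 'h1seih16', 'h1susex', 'h1iscex']  # extend to the full list above

for host in HOSTS:
    # 'uptime' reports time since last reboot and load averages
    up = subprocess.check_output(['ssh', host, 'uptime']).decode().strip()
    # free RAM in kB (4th column of the 'Mem:' line from 'free')
    mem = subprocess.check_output(['ssh', host, "free | awk '/Mem:/ {print $4}'"]).decode().strip()
    print('{}: {} | free RAM: {} kB'.format(host, up, mem))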
At 12:44 UTC (5:44am PDT):
(attached is a screenshot of all the WHITE screens we have for EX.)
Tally of activities taken to recover
Noticed some new YELLOW on the Guardian Overview (and Intent Bit) MEDM. It's related to the Time Ramp for the HEPI IPSs (i.e. H1:HPI-ETMX_IPS_[H/V]P_TRAMP). The other (6) channels have 30 sec, but the setpoints for these two are currently 0. Leaving this for the SEI crew to remedy (perhaps next time we drop out of OBSERVING).
Attached is a screen shot showing the medms in question.
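A minimal sketch (using pyepics directly, and only the two TRAMP channels named above) of the quick check one could run after dropping out of OBSERVING:

import epics  # pyepics

EXPECTED_TRAMP = 30.0  # seconds, matching the other HEPI IPS channels

for dof in ['HP', 'VP']:
    chan = 'H1:HPI-ETMX_IPS_{}_TRAMP'.format(dof)
    val = epics.caget(chan)
    if val != EXPECTED_TRAMP:
        print('{} = {} s (expected {} s)'.format(chan, val, EXPECTED_TRAMP))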
TITLE: 04/28 Owl Shift: 07:00-15:00 UTC (00:00-08:00 PST), all times posted in UTC
STATE of H1: Observing at 61Mpc
OUTGOING OPERATOR: Nutsinee
CURRENT ENVIRONMENT:
Wind: 9mph
Primary useism: 0.02 μm/s
Secondary useism: 0.11 μm/s (50th percentile)
QUICK SUMMARY:
H1 range hovering just above 60Mpc & winds are slightly high at the end stations.
TITLE: 04/28 Eve Shift: 23:00-07:00 UTC (16:00-00:00 PST), all times posted in UTC
STATE of H1: Observing at 63Mpc
INCOMING OPERATOR: Corey
SHIFT SUMMARY: One lockloss. *Probably* caused by the Noise Eater (see attached plot). No issue recovering.
Got an NPRO out-of-range message on DIAG_MAIN, so I went into the LVEA and toggled the Noise Eater. Not sure if the noise eater went bad and caused the lockloss or the other way around. Anyway, resuming locking now.
Verbal Alarms complained about a TypeError, then crashed. There were similar errors in the log last night when the seismic FE tripped.
Back to Observe 06:09 UTC
Been Observing for 5.5 hours. A big tour group in the control room early in the evening. Wind seems to be dying down. No issue to report.
J. Kissel

I've gathered a full set of transfer functions to measure the IFO sensing and actuator functions. All looks quite normal; detailed analysis to come. The data lives here:

Sensing Function Measurements
2017-04-27_H1DARM_OLGTF_4to1200Hz_25min.xml
2017-04-27_H1_PCAL2DARMTF_4to1200Hz_8min.xml
2017-04-27_H1_PCAL2DARMTF_BB_5to1000Hz_0p25BW_250avgs_5min.xml
2017-04-27_H1_OMCDCPDSUM_to_DARMIN1.xml <-- new measurement that captures the [mA/ct] scale of the digital portion of the sensing function

Actuation Function Measurements
UIM: 2017-04-27_H1SUSETMY_L1_iEXC2DARM_25min.xml
     2017-04-27_H1SUSETMY_L1_PCAL2DARM_8min.xml
PUM: 2017-04-27_H1SUSETMY_L2_iEXC2DARM_17min.xml
     2017-04-27_H1SUSETMY_L2_PCAL2DARM_8min.xml
TST: 2017-04-27_H1SUSETMY_L3_iEXC2DARM_8min.xml
     2017-04-27_H1SUSETMY_L3_PCAL2DARM_8min.xml

All has been committed to the Calibration SVN.
While updating the FMCS MEDM screens for the migration to the BACNet IOC I noticed that the alarm status for chiller pump 2 at end Y (H0:FMC-EY_CY_ALARM_2) has been active (value = 2) for at least a month. Is this normal?
This is NOT a chiller pump alarm. It is a "Chiller 2" alarm. There are two devices which are supplying chilled water for the building HVAC - The Chiller is a refrigeration machine which cools the water and there is a separate chilled water pump (CWP) which circulates the water through the chiller and up to the building.
This chiller has two cooling circuits - one has a known fault - hence the alarm.
To confuse the issue further there are two chillers and two chilled water pumps at the end station - this provides us redundancy in case of failure.
The critical alarm is the "Chilled Water Supply Temperature". This temperature is currently normal.
I added two band stops to the CHARD loops, which have reduced the CHARD drives from 15-25 Hz by about a factor of 10. ASC noise should no longer be limiting our DARM sensitivity above 15 Hz, although the noise is only slightly better from 15-20 Hz.
The first attachment shows the slight improvement in the DARM noise, the difference in the drives (the cut-offs I added were 2nd-order elliptic bandstops), and the reduction in the coherence between the ASC drives and DARM. For CHARD Y, the first screenshot shows the loop measurement before I added the bandstop; the bandstop should have reduced the phase at the upper UGF (2.5 Hz) by about 5 degrees. For CHARD P, I reduced the gain by about 3 dB; the third screenshot shows the before measurement in blue and a measurement after I reduced the gain but before I added the cut-off in red. For CHARD P the cut-off only reduced the phase at the upper UGF of 3 Hz by 6 degrees, so we are left with almost 50 degrees of phase margin.
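As a rough cross-check of the quoted phase costs, here is a minimal sketch (the sample rate, passband ripple, and stopband depth are assumptions for illustration, not the filter actually loaded) showing that an elliptic bandstop over 15-25 Hz only costs a few degrees of phase at the 2.5-3 Hz upper UGFs:

import numpy as np
from scipy import signal

fs = 512.0  # assumed ASC model rate [Hz]
# 2nd-order elliptic bandstop over 15-25 Hz; 1 dB ripple and 30 dB stop depth are assumed
z, p, k = signal.ellip(2, 1, 30, [15.0, 25.0], btype='bandstop', output='zpk', fs=fs)

for f_ugf in (2.5, 3.0):  # CHARD Y and CHARD P upper UGFs quoted above
    # Evaluate the filter response at the UGF and report its phase contribution
    w, h = signal.freqz_zpk(z, p, k, worN=[f_ugf], fs=fs)
    print('phase of bandstop at {:.1f} Hz: {:.1f} deg'.format(f_ugf, np.degrees(np.angle(h[0]))))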
I also re-ran the noise budget injections for CHARD, DHARD, MICH, SRCL, PRCL, and IMC PZT jitter. The only real change is that the ASC noise is lower, and there is a larger gap between the sum and the measured noise at 25 Hz. I am not able to download GDS strain from this afternoon, so I will post a noise budget when I can get the data.
Was there a change to H1:ASC-CHARD_P_GAIN from -0.10 to -0.14? (This came up as an SDF Diff for the next lock.)
Yes, Corey, sorry I forgot to load the guardian.
The first attachment is the noise budget with measurements from yesterday. You can see that the broad lump that we blame on beam size jitter is worse: there is a gap between the measured noise and the sum of the predicted noises from 300-800 Hz which was not present in the noise budget from early January (here). Looking at the summary pages, you can see that this has happened in the last week (April 18th compared to yesterday). Kiwamu and I had a look at some alignment sensors, and at first glance it doesn't seem like we've had an unusual alignment change this week. We asked Aidan to check the Hartmann data to see if there has been a change in absorption on ITMX.
The linear jitter is also slowly getting worse, which you can see by comparing the 350 Hz peak to January. The next two attached pngs are screenshots of the jitter transfer functions measured yesterday using the IMC PZT. You can compare these to measurements from mid-February and see that the coupling is about 50% worse for yaw and almost a factor of 2 worse for pit.
The 4th attachment shows a comparison of the coherence between DARM and the IMC WFS DC signals for February and earlier today. We now have broad coherence between IMC WFS B pit and DARM, which I don't think I have seen before, even when we had a broad lump of noise in DARM before our pre-O2 alignment change.
The last attachment shows coherences between DARM and the bullseye PD on the PSL.
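For anyone who wants to reproduce the coherence comparison, here is a minimal sketch using gwpy; the WFS channel name and GPS times are placeholders, and NDS access is assumed:

from gwpy.timeseries import TimeSeriesDict

# Placeholder channel names and GPS span; substitute the channels/times of interest
CHANS = ['H1:GDS-CALIB_STRAIN', 'H1:IMC-WFS_B_DC_PIT_OUT_DQ']
data = TimeSeriesDict.fetch(CHANS, 1177300000, 1177301024)

# Coherence between DARM and the IMC WFS B pit DC signal
coh = data[CHANS[0]].coherence(data[CHANS[1]], fftlength=8, overlap=4)
plot = coh.plot()
ax = plot.gca()
ax.set_xscale('log')
ax.set_xlim(10, 1000)
ax.set_ylabel('Coherence')
plot.savefig('darm_imcwfsb_coherence.png')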
Evan G., Robert S.

Looking back at Keith R.'s aLOGs documenting changes happening on March 14 (see 35146, 35274, and 35328), we found that one cause seems to be the shuttering of the OpLev lasers on March 14. Right around this time, 17:00 UTC on March 14 at EY and 16:07 UTC at EX, there is an increase in line activity. The correlated cause is Travis' visit to the end station to take images of the Pcal spot positions. The images are taken using the Pcal camera system and need the OpLevs to be shuttered so that a clean image can be taken without light contamination. We spoke with Travis and he explained that he disconnected the USB interface between the DSLR and the ethernet adapter, and used a laptop to directly take images. Around this time, the lines seem to get worse in the magnetometer channels (see, for example, the plots attached to Keith's aLOG 35328).

After establishing this connection, we went to the end stations to turn off the ethernet adapters for the Pcal cameras (the cameras are blocked anyway, so this active connection is not needed). I made some magnetometer spectra before and after this change (see attached). This shows that a number of lines in the magnetometers are reduced or are now down in the noise. Hopefully this will mitigate some of the recent reports of combs in h(t).

We also performed a short test turning off another ethernet adapter for the H1 illuminator and PD. This was turned off at 20:05:16 18/04/2014 UTC and turned back on at 20:09:56 UTC. I'll post another aLOG with this investigation as well.
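A minimal sketch of the before/after magnetometer spectrum comparison described above; the channel name, GPS spans, and line frequencies below are placeholders, and NDS access via gwpy is assumed:

import numpy as np
from gwpy.timeseries import TimeSeries

CHAN = 'H1:PEM-EY_MAG_EBAY_SUSRACK_X_DQ'  # placeholder magnetometer channel
before = TimeSeries.fetch(CHAN, 1176500000, 1176501024)  # placeholder pre-change span
after = TimeSeries.fetch(CHAN, 1176600000, 1176601024)   # placeholder post-change span

asd_before = before.asd(fftlength=64, overlap=32)
asd_after = after.asd(fftlength=64, overlap=32)

# Compare line heights at a few example frequencies
freqs = asd_before.frequencies.value
for f in (28.61, 31.41, 62.83):
    i = np.argmin(np.abs(freqs - f))
    print('{:.2f} Hz: before {:.3g}, after {:.3g}'.format(f, asd_before.value[i], asd_after.value[i]))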
Good work! That did a lot of good in DARM. Attached are spectra in which many narrow lines went away or were reduced (comparing 22 hours of FScan SFTs before the change (Apr 18) with 10 hours of SFTs after the change (Apr 19)). We will need to collect much more data to verify that all of the degradation that began March 14 has been mitigated, but this first look is very promising - many thanks!
Fig 1: 20-50 Hz
Fig 2: 50-100 Hz
Fig 3: 100-200 Hz
Attached are post-change spectra using another 15 hours of FScan SFTs since yesterday. Things continue to look good.
Fig 1: 20-50 Hz
Fig 2: 50-100 Hz
Fig 3: 100-200 Hz
Correction: the date is 18/04/2017 UTC.
Another follow-up with more statistics. The mitigation from turning off the ethernet adapter continues to be confirmed with greater certainty. Figures 1-3 show spectra from pre-March 14 (1210 hours), a sample of post-March 14 data (242 hours), and post-April 18 (157 hours) for 20-50 Hz, 50-100 Hz and 100-200 Hz. With enough post-April 18 statistics, one can also look more closely at the difference between pre-March 14 and post-April 18. Figures 4-6 and 7-9 show such comparisons with different orderings and therefore different overlays of the curves. It appears there are lines in the post-April 18 data that are stronger than in the pre-March 14 data, and lines in the earlier data that are not present in the recent data. Most notably, 1-Hz combs with +0.25-Hz and +0.50-Hz offsets from integers have disappeared.

Narrow low-frequency lines that are distinctly stronger in recent data include these frequencies:
21.4286 Hz
22.7882 Hz - splitting of 0.0468 Hz
27.4170 Hz
28.214 Hz
28.6100 Hz - PEM in O1
31.4127 Hz and 2nd harmonic at 62.8254 Hz
34.1840 Hz
34.909 Hz (absent in earlier data)
41.8833 Hz
43.409 Hz (absent in earlier data)
43.919 Hz
45.579 Hz
46.9496 Hz
47.6833 Hz
56.9730 Hz
57.5889 Hz
66.7502 Hz (part of 1 Hz comb in O1)
68.3677 Hz
79.763 Hz
83.315 Hz
83.335 Hz
85.7139 Hz
85.8298 Hz
88.8895 Hz
91.158 Hz
93.8995 Hz
95.995 Hz (absent in earlier data)
107.1182 Hz
114.000 Hz (absent in earlier data)

Narrow low-frequency lines in the earlier data that no longer appear include these frequencies:
20.25 Hz - 50.25 Hz (1-Hz comb wiped out!)
24.50 Hz - 62.50 Hz (1-Hz comb wiped out!)
29.1957 Hz
29.969 Hz

Note that I'm not claiming change points occurred for the above lines on March 14 (as I did for the original set of lines flagged) or on April 18. I'm merely noting a difference in average line strengths before March 14 vs. after April 18. Change points could have occurred between March 14 and April 18, shortly before March 14, or shortly after April 18.
To pin down better when the two 1-Hz combs disappeared from DARM, I checked Ansel's handy-dandy comb tracker and found the answer immediately. The two attached figures (screen grabs) show the summed power in the teeth of those combs. The 0.5-Hz offset comb is elevated before March 14, jumps up after March 14, and drops down to normal after April 18. The 0.25-Hz offset comb is highly elevated before March 14, jumps way up after March 14, and drops down to normal after April 18. These plots raise the interesting question of what was done on April 18 that went beyond the mitigation of the problems triggered on March 14.
Figure 1 - Strength of 1-Hz comb (0.5-Hz offset) vs time (March 14 is day 547 after 9/15/2014, April 18 is day 582)
Figure 2 - Strength of 1-Hz comb (0.25-Hz offset) vs time
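For reference, here is a minimal sketch of the comb-tracking metric (summed power in the comb teeth), under the assumption that the input is a one-sided PSD stored as a numpy array with uniform frequency resolution df; this is not the actual comb tracker code:

import numpy as np

def comb_power(psd, df, offset, fmin=20.0, fmax=200.0, spacing=1.0):
    """Sum the PSD power in the bins nearest f = n*spacing + offset within [fmin, fmax]."""
    n_lo = int(np.ceil((fmin - offset) / spacing))
    n_hi = int(np.floor((fmax - offset) / spacing))
    teeth = np.arange(n_lo, n_hi + 1) * spacing + offset
    idx = np.rint(teeth / df).astype(int)
    return psd[idx].sum()

# Example with a flat placeholder PSD at df = 1/64 Hz, spanning 0-1024 Hz
df = 1.0 / 64
psd = np.ones(int(1024 / df) + 1)
print(comb_power(psd, df, offset=0.25), comb_power(psd, df, offset=0.50))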