Wed Aug 30 10:07:53 2023 INFO: Fill completed in 7min 49secs
Jordan confirmed a good fill curbside
Not sure why we lost lock. 16:18 UTC.
We just had a second instance of a DAQ CRC error from h1sush2b to DC0 at 08:41:46 Wed 30 Aug 2023 PDT.
This is identical to the one we had 05:56:09 Sun 27 Aug 2023 PDT. FRS28945 was opened for the Sun event, I have added today's event to it.
The event is characterized as:
For one of the 1/16th-second data blocks in GPS second 1377445324, all of the DAQ data sent from h1sush2b's local DC to DAQ-DC0 was invalid. However, for the same 1/16th second, the data block sent to DAQ-DC1 was valid.
Yesterday Jonathan configured the frame writers to store the MD5 checksums of the frame files. The error occurred at GPS=1377445324; rounding this time down to a multiple of 64 gives the GPS start time used in the full frame file name, and we can verify that the checksums for this file differ between FW0 and FW1:
>>> 1377445324 - (1377445324 % 64)
1377445312
Using the NDS to view the files (it mounts them read-only):
ssh root@h1daqnds0 "cat /frames/full/13774/H-H1_R-1377445312-64.md5"
fa59b37c1f1f02aeda8123ea81e70009
ssh root@h1daqnds1 "cat /frames/full/13774/H-H1_R-1377445312-64.md5"
1347619a6b8e18e103ceb86b5df75380
And they are indeed different (as are the frame file sizes, not shown).
As before, we can say that the FW0 frame is bad, and the FW1 frame is good and should be archived instead.
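For reference, a minimal sketch of automating the same check, assuming passwordless ssh to the NDS machines as in the commands above (this helper script is hypothetical, not an existing tool):
#!/usr/bin/env python3
# Hypothetical helper sketching the manual check above: round a GPS time down
# to its 64-second frame boundary, then compare the stored MD5 checksums of the
# corresponding full frame file on the two NDS machines.
import subprocess

def frame_start(gps):
    # Full frames are 64 s long, so the file start is the GPS time rounded down.
    return gps - (gps % 64)

def frame_md5(host, gps_start):
    # Read the stored .md5 for the H-H1_R frame starting at gps_start on host.
    path = f"/frames/full/{str(gps_start)[:5]}/H-H1_R-{gps_start}-64.md5"
    result = subprocess.run(["ssh", f"root@{host}", "cat", path],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

start = frame_start(1377445324)            # -> 1377445312
md5_fw0 = frame_md5("h1daqnds0", start)
md5_fw1 = frame_md5("h1daqnds1", start)
print(start, md5_fw0, md5_fw1, "MATCH" if md5_fw0 == md5_fw1 else "DIFFER")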
Yesterday during the process of changing lights in the MSR, it was discovered that an end cap had been dislodged from a section of duct work. The end cap was replaced and secured with tek screws. This in turn redirected a large portion of the air flow to the control room, which increased the noise level. I have adjusted the flow to the control room to a more pleasant audible level. The air flow was 19000 cfm and is now set to 12000 cfm.
TITLE: 08/30 Day Shift: 15:00-23:00 UTC (08:00-16:00 PST), all times posted in UTC
STATE of H1: Observing at 143Mpc
OUTGOING OPERATOR: TJ
CURRENT ENVIRONMENT:
SEI_ENV state: CALM
Wind: 15mph Gusts, 10mph 5min avg
Primary useism: 0.03 μm/s
Secondary useism: 0.07 μm/s
QUICK SUMMARY:
We dropped Observing at 15:16 due to the squeezer losing lock; back in Observing at 15:18.
Squeezer kept losing lock; eventually it gave the message "ISS is off, check alog 70050", which I did, following the instructions there and setting opo_grTrans_setpoint_uW to 60. I accepted the new OPO_TEC_temp in SDF and brought us back to Observing at 16:03.
FC1 seems to have some strange things going on; in the screenshot, see the circled FC1_M1_OSEMINF_T3_INMON, which seems to have changed behavior in the past 12 hours, and we see excess motion in FC1_M1_DAMP_P/R/V compared to the previous/normal locks.
On the FC GREEN trans camera, the beam spot appears to be visibly dithering in pitch while locked. No input suspensions (ZM1/2/3 or FC1/2) show dithers running, but the DAMP filters on FC1 all seem to have a lot of motion, even when the FC is unlocked.
TITLE: 08/29 Eve Shift: 23:00-07:00 UTC (16:00-00:00 PST), all times posted in UTC
STATE of H1: Observing at 136Mpc
INCOMING OPERATOR: TJ
SHIFT SUMMARY: Standing down for high winds on site for the first half of the shift. Able to relock once wind was quieter and have been observing since. Winds are still high so things are less stable and range is lower.
Tried relocking three times, but could not get past locking ALS due to the (now much higher) wind gusts. Jenne made the call to stand down until the winds died down (alog 72536).
The RO_WATER alarm and EX/EY dust alarms have continued to alarm on/off throughout the shift.
LOG:
Start Time | System | Name | Location | Laser_Haz | Task | Time End |
---|---|---|---|---|---|---|
21:46 | CAL | Tony | PCAL lab | LOCAL | PCAL work | 23:46 |
22:18 | FAC | Tyler + 1 | Site, EndY | N | Site tour, air handlers | 23:18 |
23:49 | VAC | Janos | EY | - | Circulation measurements | 00:10 |
State of H1: Lock Acquisition
Winds have died down enough to the point where I'm attempting to lock H1. So far so good; there was one lockloss while trying to lock DRMI, but this attempt is currently up to ENGAGE_ASC_FOR_FULL_IFO.
Also, the RO_WATER alarm has been going off/on for the past 3 hours. Let Bubba know, also tagging facilities.
While H1 was standing down this evening waiting for the high winds to quiet down before relocking, Jeff and Jenne made the observation that the ground motion we usually see in the 30-100mHz frequency band of the ETM ISI BLRMS from wind was also bleeding into the higher frequency bands, visible up through the 30Hz band. See attachment #1 for wind speeds, attachment #2 for the 30-300mHz bands, and attachment #3 for the 3-30Hz bands.
At first, this behavior was thought to be unphysical, but then Jeff pointed out the ASD of the BRS "super-sensor," where tilt motion is subtracted out (attachment #5, ETMX used as an example). It can be seen that the "BRS RY OUT" trace is elevated from low frequencies up to above 1 Hz compared to the same trace from a day ago (attachment #4), showing that this is indeed the result of strong winds tilting the end stations.
Yes, the winds do tilt the buildings, and if the sensor is above the rotation center then you get a translation. The real model must be more complex because the slab is bending - so there is not a single "rotation center".
In any case, the translation has been seen for a long time. See, for example some of my data and some of Robert's data shown on page 16 and 17 of https://dcc.ligo.org/LIGO-G1501371.
Note - we built a wind fence instead of using the inside/outside sensors. This was a good choice, even though it took a while...
Following on last week's log https://alog.ligo-wa.caltech.edu/aLOG/index.php?callRep=72444
As planned, two technicians from Macdonald Miller visited site today to continue investigation of the runaway and failure of chilled water system 1 at EY. A failing flow switch was discovered. Further investigation showed that not only had the wiring for the flow switch been chewed away, exposing/severing the wiring (bunny, mice, etc.), but that a CBUS line in the chiller's harness had also been met with curious teeth. Both damaged items have a ~20 week lead time but are on order.
Purely as a contingency plan (given part lead times), we cross-referenced flow switch part numbers on the corner station chillers. In the event of a chilled water apocalypse, we are confident that flow switches from either of two non-running corner station chillers would be suitable fill-ins to keep the end stations functional.
Because we incurred mild glycol loss during today's work at EY, Chris Soike brought a barrel of glycol up to EY. We monitored pressures at the VEA closely and replenished losses at the makeup tank. The makeup tank is presently sitting at ~75% capacity. Supply and return pressures look good. The system continues to be supplied and satisfied by chiller 2 and chilled water pump 2.
C. Soike, B. Haithcox, A. Tarralbo, T. Guidry
Good catch, summary, and photos! (tagging EPO)
(Wow 5-months for parts?? Eeek. )
Gabriele and I think we have found the problem causing the large 102 Hz line. Today I plotted the LSC control and error signals, the LSC FF out signals and the OMC DCPD sum.
The 102 Hz line is clearly evident in SRCLFF out and OMC DCPD sum, but not present in the LSC control or error signals, or in MICHFF out.
The line showed up on August 4 when the SRCL feedforward was retuned. We have made no changes to the SRCL FF since.
Looking at the actual SRCL FF filter, there is an incredibly high-Q (therefore narrow and hard to see without fine resolution) feature at precisely 102.1 Hz. Gabriele will post more details in a comment.
In short, we think this is the problem, and are taking the steps to fix it.
Edit to add:
How can we avoid this problem in the future? This feature is likely an artifact of running the injection to measure the feedforward with the calibration lines on, so a spurious feature right at the calibration line frequency appeared in the fit. Since it is so narrow, it required incredibly fine resolution to see in the plot; for example, Gabriele and I had to bode plot in foton from 100 to 105 Hz with 10000 points to see the feature. However, this feature is incredibly evident just by inspecting the zpk of the filter, especially if you use the "mag/Q" display in foton and look for poles and zeros with a Q of 3e5 (!!). If we make sure to run the feedforward injection with the cal lines off and/or do a better job of checking our work after we produce a fit, we can avoid this problem.
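As an illustration of how well a feature like this hides, here is a toy sketch (a made-up zero/pole pair at 102.1 Hz with the pole Q set to the ~3e5 quoted above, not the actual SRCL FF filter) comparing a coarse frequency sweep with a fine 10000-point sweep like the one we used in foton:
# Toy illustration only: not the real filter. A narrow feature of width
# ~ f0/Q ~ 0.3 mHz easily slips between the points of a coarse bode plot.
import numpy as np
from scipy import signal

f0, Qp, Qz = 102.1, 3e5, 1e3
w0 = 2 * np.pi * f0
num = [1, w0 / Qz, w0**2]   # zero pair (broad)
den = [1, w0 / Qp, w0**2]   # pole pair (extremely narrow)

# Coarse sweep over 100-105 Hz: the spike mostly falls between grid points
# and barely registers.
f_coarse = np.linspace(100, 105, 100)
_, h_coarse = signal.freqs(num, den, worN=2 * np.pi * f_coarse)
print("max |H|, 100-point sweep:  ", np.abs(h_coarse).max())

# Fine sweep, like the 10000-point foton bode from 100 to 105 Hz: the spike
# is resolved and stands far above unity gain.
f_fine = np.linspace(100, 105, 10000)
_, h_fine = signal.freqs(num, den, worN=2 * np.pi * f_fine)
print("max |H|, 10000-point sweep:", np.abs(h_fine).max())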
We did make sure to check the MICH feedforward in case the same error had occurred, but luckily everything looks fine there!
We removed the high Q zero/pole pair, saved and reloaded the filter.
We have relocked with no sign of the 102 Hz peak. Great! Tagging DetChar since they pointed out this problem first and Cal since they made adjustments to calibration lines to avoid this problem (and may want to undo those changes).
Lockloss @ 23:50 UTC - no immediate obvious cause.
H1 had relocked to NLN by 23:18 UTC after the SQZ team ran their checks, but the range was significantly lower (~120Mpc), so investigations into the loss of sensitivity were ongoing. H1 was not observing between reaching NLN and this lockloss.
Winds have picked up this afternoon, now over 30mph, although unsure if this would be the cause of the lockloss. It will make relocking challenging, however.
The hourly forecast (link) suggests we should have wind similar to or worse than right now for the next several hours, and then the wind will start to calm down (a bit, not a lot) around 8pm or 9pm. I've suggested to RyanS that if the IFO doesn't lock in the next few tries, he leave the IFO in DOWN for 2-3 hours until the wind starts to come down.
Separately, we had a surprising amount of trouble locking this afternoon, but much of that time was before the wind picked up. We don't really know why we were able to successfully lock the one time we did. I'm hopeful that once the wind calms down, locking will be straightforward, but it may not be.
Benoit, Ansel, Derek
Benoit noticed that for recent locks, the 102.13 Hz calibration line is much louder than typical for the first few hours of the lock. An example of this behavior is shown in the attached spectrogram of H1 strain data on August 5 - this is the first day this behavior appeared. Ansel noted that this feature includes a comb-like structure around the line that is only present in the H1:GDS-CALIB_STRAIN_NOLINES channel and not H1:GDS-CALIB_STRAIN (see spectra for CALIB_STRAIN and CALIB_STRAIN_NOLINES on Aug 5). This issue is also visible in the PCAL trends for the 102.13 Hz line.
We are not sure if the excess noise near 102.13 Hz is from the calibration line itself or another noise source that is near the line. However, the behavior has been present for every lock since 12:30 UTC on August 5 2023.
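For anyone who wants to reproduce this kind of check, here is a minimal sketch, assuming gwpy with NDS2 access from a LIGO workstation (the spectrogram parameters are arbitrary choices for illustration, not those used for the attached plots):
# Sketch: spectrogram of an hour of strain starting at 2023-08-05 12:30 UTC,
# zoomed around the 102.13 Hz calibration line.
from gwpy.timeseries import TimeSeries

data = TimeSeries.fetch("H1:GDS-CALIB_STRAIN", "2023-08-05 12:30", "2023-08-05 13:30")
spec = data.spectrogram(stride=60, fftlength=30) ** (1 / 2.)   # ASD spectrogram
plot = spec.plot(norm="log")
ax = plot.gca()
ax.set_ylim(100, 105)          # zoom in around the line
plot.savefig("strain_spectrogram_102Hz.png")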
FYI,
$ gpstime Aug 05 2023 12:30 UTC
PDT: 2023-08-05 05:30:00.000000 PDT
UTC: 2023-08-05 12:30:00.000000 UTC
GPS: 1375273818.000000
so... this behavior seems to have started at 5:30a local time on a Saturday. Therefore *very* unlikely that the start of this issue is intentional / human change driven. The investigation continues.... making sure to tag CAL.
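The same conversion as a minimal sketch in Python, assuming astropy is available (the gpstime command above is the tool actually used here):
from astropy.time import Time

# UTC -> GPS; should reproduce the 1375273818 from the gpstime output above.
print(Time("2023-08-05 12:30:00", scale="utc").gps)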
Other facts and recent events: Attached are 2 screenshots that show the actual *digital* excitation is not changing with time in any way.
:: 2023-08-08_H1PCALEX_OSC7_102p13Hz_Line_3mo_trend.png shows the specific oscillator -- PCALX's OSC7, which drives the 102.13 Hz line -- via the EPICS channel version of its output. The minute trend shows the max, min, and mean of the output, and there's no change in amplitude.
:: 2023-08-08_H1PCALEX_EXC_SUM_3mo_trend.png shows a trend of the total excitation sum from PCAL X. This also shows *no* change in amplitude over time.
Both trends show the Aug 02 2023 change-in-amplitude kerfuffle I caused, which Corey found and a bit later rectified -- see LHO:71894 and subsequent comments -- but that was done, over with, and solved, definitely by Aug 03 2023 UTC, and is unrelated to the start of this problem. It's also well after I installed new oscillators and rebooted the PCALX, PCALY, and OMC models on Aug 01 2023 (see LHO:71881).
The front-end version of the calibration's systematic error at 102.13 Hz also shows the long, time-dependent issue -- this will allow us to trend the issue against other channels.
Folks in the calibration group have found that the online monitoring system -- the "grafana" pages -- for the overall DARM response function systematic error at each PCAL calibration line frequency is showing *huge* amounts of systematic error during these times when the amplitude of the line is super loud. That quantity can be written equivalently as (see T1900169):
- overall DARM response function systematic error
- (absolute reference) / (Calibrated Data Product) [m/m]
- ( \eta_R )^(-1)
- (C / (1+G))_pcal / (C / (1+G))_strain
- CAL-DELTAL_REF_PCAL_DQ / GDS-CALIB_STRAIN
Though this metric is super useful because it makes it dreadfully obvious that things are going wrong, it is not in any normal frame structure, so you can't compare it against other channels to find out what's causing the systematic error. However -- remember -- we commissioned a front-end version of this monitoring during ER15 -- see LHO:69285. That means the channels
H1:CAL-CS_TDEP_PCAL_LINE8_COMPARISON_OSC_FREQ << the frequency of the monitor
H1:CAL-CS_TDEP_PCAL_LINE8_SYSERROR_MAG_MPM << the magnitude of the systematic error
H1:CAL-CS_TDEP_PCAL_LINE8_SYSERROR_PHA_DEG << the phase of the systematic error
tell you what's supposed to be*** equivalent information.
*** "What's supposed to be" really means "roughly equivalent," for the following reasons:
(1) Because we're human, one system displays the systematic error \eta_R, and the other displays the inverse ( \eta_R )^(-1).
(2) Because this is early days for the front-end system, it uses the "less complete" calibrated channel CAL-DELTAL_EXTERNAL_DQ rather than the "fully correct" channel GDS-CALIB_STRAIN.
But because the problem is so dreadfully obvious in these metrics, even though they're only *roughly* equivalent, you can see the same thing. In the attached screenshot, I show both metrics for the most recent observation stretch, between 10:15 and 14:00 UTC on 2023-Aug-09. Let's use this front-end metric to narrow down the problem via trending.
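As a starting point for that trending, a minimal sketch assuming gwpy with NDS2 access from a control-room workstation (channel names as listed above; time span as in the attached screenshot):
from gwpy.timeseries import TimeSeriesDict

channels = [
    "H1:CAL-CS_TDEP_PCAL_LINE8_SYSERROR_MAG_MPM",   # magnitude of the systematic error
    "H1:CAL-CS_TDEP_PCAL_LINE8_SYSERROR_PHA_DEG",   # phase of the systematic error
]
# Observation stretch quoted above: 10:15-14:00 UTC on 2023-Aug-09
data = TimeSeriesDict.fetch(channels, "2023-08-09 10:15", "2023-08-09 14:00")
plot = data.plot()
plot.savefig("pcal_line8_syserror_trend.png")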
There appears to be no change in the PCALX analog excitation monitors either. Attached is a trend of some key channels in the optical follower servo -- the analog feedback system that serves as intensity stabilization and excitation power linearization for the PCAL laser light that gets transmitted to the test mass, the actuator of which is an acousto-optic modulator (an AOM). There seem to be no major differences in the max, min, and mean of these signals before vs. after these problems started on Aug 05 2023:
H1:CAL-PCALX_OFS_PD_OUT_DQ
H1:CAL-PCALX_OFS_AOM_DRIVE_MON_OUT_DQ
I believe this is caused by the presence of another line very close to the 102.13 Hz pcal line. This second line is present at the start of a lock stretch but seems to go away as the lock stretch continues. I have attached a plot showing a zoom-in on an ASD around 102.1-102.2 Hz right at the start of a lock stretch (orange), where the second peak is evident, and well into a lock stretch (blue), where the PCAL line is still present but the second peak right below it in frequency is gone. This ASD is computed using an hour of data for each curve, so we can get the resolution needed to separate these two peaks.
I don't know the origin of this second line. However, a quick fix to the issue could be moving the PCAL line over by about a Hz. The second attached plot shows that the spectrum looks pretty clean from 101-102 Hz, so somewhere in there would probably be okay as a new location for the PCAL line.
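To make the resolution argument concrete, here is a toy sketch with synthetic data (not IFO data; the 512 Hz sample rate and line amplitudes are made up) showing that two lines 1.67 mHz apart only separate once the FFT length reaches the hour scale used for the attached ASDs:
# Two toy lines at 102.13000 Hz and 102.12833 Hz in white noise: with a 60 s
# FFT (~17 mHz bins) they merge into one peak; with a 3600 s FFT (~0.28 mHz
# bins) they appear as two distinct peaks.
import numpy as np
from scipy import signal

fs = 512.0                            # Hz, assumed sample rate for the toy data
t = np.arange(0, 3600.0, 1 / fs)      # one hour of data
x = (np.sin(2 * np.pi * 102.13000 * t)            # stand-in "calibration line"
     + 0.5 * np.sin(2 * np.pi * 102.12833 * t)    # stand-in nearby mystery line
     + 0.1 * np.random.randn(t.size))

for fftlength in (60.0, 3600.0):
    f, pxx = signal.welch(x, fs=fs, nperseg=int(fftlength * fs))
    band = (f > 102.10) & (f < 102.16)
    peaks, _ = signal.find_peaks(pxx[band], height=0.1 * pxx[band].max())
    print(f"fftlength={fftlength:6.0f} s: peaks at {np.round(f[band][peaks], 5)} Hz")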
Since it looks like the additional noise is at 102.12833 Hz, I did a quick check in Fscan data from Aug 5 for channels where there is high coherence with DELTAL_EXTERNAL at 102.12833 but *not* at 102.13000 Hz. This narrows down to just a few channels:
(lines git issue opened as we work on this.)
As a result of Ansel's discovery, and conversation on the CAL call today -- I've moved the calibration line frequency from 102.13 to 104.23 Hz. See LHO:72108.
This line may have appeared in the previous lock the day before (Aug 4). The daily spectrogram for Aug 4 shows a line near 100 Hz starting at 21:00 UTC.
Looking at alogs leading up to the time Derek notes above, I noticed that Gabriele retuned and tested new LSC FF. This change may be related to this new peak. Remembering some issues we had recently where DHARD filter impulses were ringing up violin modes, I checked the new LSC FF filters and how they are engaged in the guardian. Some of them have no ramp time, and the filter bank is turned on immediately along with the filters in the guardian. I have no idea why that would cause a peak at 102 Hz, but I updated those filters to have a 3 second ramp.
Reloaded the H1LSC model to load in Elenna's filter changes
Now that the calibration line has been moved, the comb-like structure at the calibration line frequency is no longer present (checked in the CLEAN channel).
We can also see the shape of the 102.12833 Hz line much more clearly without the overlapping calibration line. I have attached a plot for reference on the width and shape.
As discussed in today's commissioning meeting, I checked TMSX and ETMX movement for a kick during locking and couldn't see anything suspicious. I did find some increased motion/noise every 8 Hz in TMSX, 1 s into ENGAGE_SOFT_LOOPS when ISC_LOCK isn't explicitly doing anything (plot attached). However, this noise was present prior to Aug 4th (July 30th attached).
TMS is suspicious, as Betsy found that the TMSs have violin modes at ~103-104 Hz.
Jeff draws attention to 38295, showing modes of the quad blade springs above 110 Hz, and 24917, showing quad top wire modes above 300 Hz.
Elenna notes that with the calibration lines off (as we are experimenting with for the current lock), we can still see this 102 Hz peak at ISC_LOCK state ENGAGE_ASC_FOR_FULL_IFO. We were mistaken.
To preserve documentation, this problem has now been solved, with more details in 72537, 72319, and 72262.
The cause of this peak was a spurious, narrow 102 Hz feature in the SRCL feedforward that we didn't catch when the filter was made. This has now been fixed, and the cause of the mistake has been documented in the first alog listed above so we hopefully don't repeat this error.