The TRANSITION_BACK_TO_ETMX state of the SUS_CHARGE guardian has been causing the majority of our locklosses prior to Tuesday Maintenance, e.g. 69437, 70861.
I looked into this again and found that the ITMX LOCK_L gain is being turned off before the ITMX to ETMX transition has completed. The ramp time of the swap is 20 seconds but the code only waits 15 seconds before moving on, see attached plot and code.
We are using an ezca.get_LIGOFilter call, and the wait=True parameter isn't waiting the full ramp_time specified; I've created Issue 15 for this. I've changed the wait to False and added a 20 s sleep. The SUS_CHARGE guardian will need to be reloaded when we are out of observing.
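For reference, a minimal sketch of the workaround, assuming ezca is available as it is in guardian code and a ramp_gain-style call on the LIGOFilter object; the channel name below is illustrative, not necessarily the one addressed in SUS_CHARGE:

```python
import time

RAMP_TIME = 20  # seconds, matching the ITMX -> ETMX swap ramp

# Illustrative filter bank; SUS_CHARGE addresses the relevant LOCK_L bank
itmx_lock_l = ezca.get_LIGOFilter('SUS-ITMX_L3_LOCK_L')

# wait=True was returning after ~15 s instead of the full ramp_time (Issue 15),
# so ramp without waiting and then sleep for the full ramp explicitly
itmx_lock_l.ramp_gain(0, ramp_time=RAMP_TIME, wait=False)
time.sleep(RAMP_TIME)
```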
Once I remembered that SUS_CHARGE is not one of the guardian nodes monitored for the observation intent bit (although its subordinate nodes are), I reloaded it this morning at 14:02 UTC so Camilla's changes are pushed in.
This fixed the SUS_CHARGE code and we successfully stayed locked throughout both ESD transitions on Tuesday.
Total time taken is 18 minutes, which is longer than the 15 minute target. I've reduced the slowy_ramp_on_bias() ramp time back from 60 to 20 seconds, since I've reverted some of the 68987 changes I was blaming for the locklosses. ESD_EXC_{QUAD} will need to be reloaded for each quad for this to take effect.
Currently the SUS_CHARGE code is taking 16m45s. We could look at further decreasing some tramps.
TJ noticed this morning that the L2L tramps for ITMX and ETMX weren't reverted by the code. I've added lines to SUS_CHARGE to save and revert these after the ESD transitions, and also to save and reset the ITMX_L2L gain so it's not hard-coded.
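As a rough illustration of the save-and-restore approach (ezca assumed available as in guardian code; the channel names here are examples only, not the exact ones used in SUS_CHARGE):

```python
# Record the settings before the transition...
saved_tramp = ezca['SUS-ITMX_L2_LOCK_L_TRAMP']
saved_gain  = ezca['SUS-ITMX_L2_LOCK_L_GAIN']

# ... run the ESD transition, which temporarily changes them ...

# ... then restore them afterwards instead of writing hard-coded values
ezca['SUS-ITMX_L2_LOCK_L_TRAMP'] = saved_tramp
ezca['SUS-ITMX_L2_LOCK_L_GAIN']  = saved_gain
```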
I just turned off the EX19 damping Ryan and I turned on last night (FM1,2,10 Gain -30 71066) as the mode seemed to be very slowly turning around and the damping was so slow anyway, see attached. Tagging SUS.
Following up on the DARM bicoherence observation and the non-stationarity of low frequency noise: the noise in DARM between 20 and 40 Hz is correlated with the amplitude of the 2.6 Hz peak.
The attached plot is a histogram of the DARM RMS in the 20-37 Hz region (computed by summing bins in a whitened spectrogram) versus the DARM RMS around the 2.6 Hz peak (computed with a band-pass filter between 2.4 and 2.7 Hz).
There is a clear correlation between the DARM noise in the 20-40 Hz region and the RMS in the 2.6 Hz region.
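A rough sketch of how such a comparison could be made with gwpy; the channel name, stride, and GPS times below are placeholders, not the exact analysis used here:

```python
import matplotlib.pyplot as plt
from gwpy.timeseries import TimeSeries

# placeholder 9-hour stretch of calibrated strain
start, end = 1372600000, 1372632400
darm = TimeSeries.get('H1:GDS-CALIB_STRAIN', start, end)

# RMS around the 2.6 Hz peak: band-pass 2.4-2.7 Hz, then RMS with a 10 s stride
peak_rms = darm.bandpass(2.4, 2.7).rms(10)

# RMS in the 20-37 Hz band: sum bins of a median-whitened spectrogram
spec = darm.spectrogram(10, fftlength=2).ratio('median')
band_rms = spec.crop_frequencies(20, 37).value.sum(axis=1)

# 2D histogram of the two band-limited RMS series
plt.hexbin(peak_rms.value, band_rms, gridsize=50, bins='log')
plt.xlabel('DARM RMS, 2.4-2.7 Hz')
plt.ylabel('Whitened DARM RMS, 20-37 Hz')
plt.show()
```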
An exploration of where the 2.6 Hz peak is visible and coherent with DARM
The ETMX M0 / R0 / TMSX are interesting signals, maybe something isn't working properly in the R0 tracking loop?
The usual ASC suspects, especially CHARD_Y.
Several DC centering loops.
Some OMC signals.
No smoking gun
I have some evidence that could indicate that this peak is related to some instability in the HARD loops.
First, we noticed this 2.6 Hz peak is very prevalent in the OMC and OM3 suspensions. Evan was tracking the presence of the peak in OM3 yaw and noted it appeared sometime around April 9-14. This was a tricky time because many things happened: we powered up, we adjusted the compensation plates and OMC to reduce scatter, etc. Evan also noticed it doubled in size on June 22, which corresponds to when we went down in power. I used the OMC SUS master out channels as an indicator of when this peak appeared. I noticed it was not present April 7th and then appeared on April 8th, UTC time. More specifically, it appeared in the Lownoise ASC guardian state. This was around the time I was working on removing RPC completely from the ASC control and working with Dan and others on increasing the power. It appears that the peak pops up right when I turned off the RPC and transitioned all the HARD loops to the new high power controllers.
The fact that the peak height doubled when we went down in power also bolsters this theory: if there was something a bit marginal in the loop at higher power, the shift in the radiation pressure plant could have worsened the marginality. To further confirm, I looked at spectra of the HARD and SOFT loops, and the 2.6 Hz peak becomes prominent around the time I changed the loop control and turned off RPC.
As a test, Gabriele and I would like to make small changes in the HARD loop gains and see if we can pinpoint which loop is marginal. If we're lucky, the peak could be fixed with a gain change. If we're less lucky, maybe we need to redo one or more loop controllers.
First attachment is an ndscope screenshot. I plotted the guardian state, RPC gains, and the SWSTATs for all the HARD loops. The cursor is on the time when I turned off all the RPC gains and switched the loop controllers. GPS time: 1364955781. This also corresponds to when the peak appears in the OMC suspension channels.
Second attachment is a dtt screenshot of a plot comparing the OMC sus master out spectra now (blue refs) with the spectra on April 7 before the ASC change (red live).
Edit to add: Gabriele and I tested raising and lowering various ASC loop gains but we saw no difference in the peak.
We also hypothesized that this is related to the reaction chain tracking and/or damping. We turned off the reaction chain tracking of ETMX for L, P and R and saw no change in the peak.
Summary of the report:
Link to the full report: Report
V. Bossilkov, L. Dartez
Summary
We gave the simulines calibration injection scheme another go at LHO during this week's commissioning window. In short, the commissioning endeavor was partially successful: we did not lose lock and we were able to gather useful data for the actuation stages and the DARM OLG. However, additional commissioning time is required before we can fully adopt simulines for the regular calibration measurements at LHO.
Objective
The goal of today's session included the following:
1. Determine whether we can run simulines at LHO reliably without losing lock.
2. Assess the data quality of the recorded simulines hdf5 files and confirm that they can be read and post-processed.
3. Evaluate the spacing of the common frequency vector that is used for all measurements.
4. Tweak the injection amplitudes and integration lengths to get a good enough SNR while keeping the amplitudes and integration lengths as low as possible.
Code Execution
We ran two trials with injections at different amplitudes: 0.5*A and 0.1*A, where A is the corresponding amplitude typically used for the same injections in DTT. Simulines uses a configuration file in .INI format to inform the various injections of their frequency vectors, amplitudes, and other settings. The two files we used, both of which are located in /ligo/groups/cal/src/simulines/h1_test_settings, are named as follows:
Half the DTT amplitude: settings_h1_0.5.ini
One tenth the DTT amplitude: settings_h1_0.1.ini
These files are contained within the simulines repo as of commit bf25b1b4. The simulines code was run using the following command from within the repo's simulines directory:
python simulines.py -i ../h1_test_settings/settings_h1_0.5.ini
The terminal output for both runs is attached here. All measurements were placed in /ligo/groups/cal/H1/measurements as they would be during a normal pydarm measure execution. Once recorded, we ran a quick script that Vlad wrote to take a look at each measurement's uncertainty as it compares to those taken with DTT. The script used lives at /ligo/home/louis.dartez/projects/20230705/simulines_commissioning/simulinesUncert_H1.py. This script compares each simulines injection with a corresponding DTT injection. The exact DTT injections used for this initial test run are listed in the script source code (uploaded to this alog for redundancy).
Results
We were able to successfully run simulines for all of the normal calibration measurement sweeps (we did not take any broadband measurements). The table below contains plots comparing the uncertainty from the simulines (SL) injections vs the same using diaggui (DTT). The top plot in each figure shows an overlay of the uncertainty from the SL injection and the DTT injection. The bottom plot contains the residual (SL_unc / DTT_unc). The left and right columns correspond to the 0.5*"DTT Amplitude" and 0.1*"DTT Amplitude", respectively. Each actuation stage is displayed on its own row.
| Actuation stage | 0.5*Ampl | 0.1*Ampl |
|---|---|---|
| ETMX L1 | ![]() | ![]() |
| ETMX L2 | ![]() | ![]() |
| ETMX L3 | ![]() | ![]() |
The two simulines injections were at the following GPS times:
Trial 1 Start: 2023-07-05 19:02:04.672 UTC (GPS 1372618942.672165)
Trial 2 Start: 2023-07-05 19:25:27.781 UTC (GPS 1372620345.781883)
Each simulines suite ran for approximately 23 minutes.
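As a quick cross-check of the quoted times, the UTC/GPS conversion can be reproduced with gwpy.time; this is just a convenience, not part of the measurement:

```python
from gwpy.time import to_gps, from_gps

print(to_gps('2023-07-05 19:02:04.672'))  # ~1372618942.672 (Trial 1)
print(from_gps(1372620345.781883))        # 2023-07-05 19:25:27.781883 (Trial 2)
```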
Naoki, Sheila, Vicky. Because of recent issues with SQZ_ASC not engaging (e.g. 71057, 70890), the range can hover around 120 Mpc instead of >145 Mpc (!!). So, we added a state to SQZ_MANAGER that operators can use to clear the ASC, "RESET_SQZ_ASC". See the new guardian graph (vs the old graph).
Tagging OpsInfo: If the range hovers low (~120 Mpc) and SQZ_MANAGER has a yellow notification like "SQZ ASC AS42 not on?? Please RESET_SQZ_ASC", you can do the following (~10 seconds):
If this doesn't work, the following is a longer solution (~5 min) which includes resetting the squeezer ASC offsets, in case that's the problem:
Usually our ASC offsets don't change much, unless IFO output alignment or beam changes. SVN code diffs attached, and committed to version 25949.
Naoki, Sheila, Elenna
We tested high NLG. We increased the OPO trans power from 85uW to 105uW, which corresponds to an NLG of 22.7 and generated squeezing of 18.3dB according to the NLG calculator. We measured sqz/asqz with high NLG as shown in the attached figure. The squeezing above 1 kHz slightly improved, but the squeezing below 500 Hz got worse.
At 1kHz, squeezing is 3.6dB, mean squeezing is 13.5dB, and anti squeezing is 16.7dB.
We suspect that the degradation below 500 Hz is due to the SRCL detuning, so we tried to change the SRCL offset. When we changed the SRCL offset from -175 to -150, squeezing below 500 Hz slightly improved. So the degradation below 500 Hz is likely due to the SRCL detuning. We also tried SRCL offsets of -125 and -100, but they are almost the same as -150. Every time we changed the SRCL offset, we adjusted the sqz angle to maximize the squeezing at high frequency.
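For context, each step amounts to writing a single offset and re-optimizing the squeeze angle before taking a spectrum. A hypothetical sketch, assuming ezca is available as in a guardian shell; the channel name below is purely illustrative, not a confirmed SRCL offset channel:

```python
for offset in (-175, -150, -125, -100):
    # illustrative channel name for the SRCL offset
    ezca['LSC-SRCL1_OFFSET'] = offset
    # after each step: re-tune the SQZ angle for best high-frequency squeezing,
    # then take a DARM spectrum before moving to the next value
```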
How to increase the pump power
To be clear: we did not change the current SRCL offset permanently. But we would like to reduce it to at least -150 for the reasons stated above. Tagging CAL as they are the main group we thought of in regards to this change.
Noise between 15 and 50 Hz seems to be highly non-stationary, as shown in the median-whitened spectrogram attached, computed over 9 hours of data.
TITLE: 07/05 Eve Shift: 23:00-07:00 UTC (16:00-00:00 PST), all times posted in UTC
STATE of H1: Observing at 146Mpc
CURRENT ENVIRONMENT:
SEI_ENV state: CALM
Wind: 10mph Gusts, 6mph 5min avg
Primary useism: 0.02 μm/s
Secondary useism: 0.08 μm/s
QUICK SUMMARY:
Inherited from Tony in Nominal_Low_Noise, detector has been locked for 17:43 and just came out of commissioning. Now in Observing.
Leftover from last Wednesday's update to the LSC iterative feedforward was another update to the MICH FF. I was able to test it today. Attached is a screenshot of the injections. The blue reference trace is the injection with no MICH feedforward on (SRCL FF on). The green trace was our previous MICH feedforward and the red live trace is the new MICH feedforward. There is a small improvement from 20-30 Hz, so we are going with the new feedforward. Old FF filter was in FM5 ("6-22b-23"), new feedforward filter is in FM3 ("6-28b-23"). The update is SDFed and guardianized.
Elenna, Sheila, Brina
When we engaged the OM2 heater on June 27th, we saw a reduction in input jitter coupling to DARM (70864) and a shift in OMC alignment (70886). Today we attempted to recreate the changes we saw when turning on the TSAMS by shifting the OMC alignment, while watching IMC PZT lines, OM1 + OM3 dither lines, and the optical gain. We see that the alignment shifts are the right size to explain what happened when the TSAMs came on, but the optical gain shifts from alignment are smaller. We also see that the alignments that reduce input jitter aren't the same as those that reduce output jitter.
The first screenshot shows the alignment shift that happened when we first turned on the OM2 heater. Today we added offsets into OMC QPD A and B; we found that adding an offset of about 0.03 to OMC B made a shift somewhat similar to what happened when OM2 was heating up (the second attachment shows a step which roughly undoes the alignment shift from June 27th). We saw that there was a 13.8% decrease in IMC PZT P coupling to DARM when we undid that step. On the 27th there was a 26% decrease in jitter coupling to DARM, looking at the known peak around 120 Hz. So the change that we see in jitter coupling from this alignment shift has the right order of magnitude to explain the change in jitter coupling on the 27th, although confusingly we improved the jitter by undoing the move. This move also made the coupling of the OM1 pitch dither line to DARM worse, while OM3 got slightly better (3rd screenshot attached).
We quickly tried to move this DOF to see if we could further reduce the jitter coupling, we weren't able to get much more of an improvement.
During all of our moves today we saw small changes in optical gain; for OMC alignment changes comparable in size to what happened on the 27th, the optical gain changes were smaller than the 2% drop seen with the TSAMs.
TITLE: 07/05 Day Shift: 15:00-23:00 UTC (08:00-16:00 PST), all times posted in UTC
STATE of H1: Observing at 148Mpc
CURRENT ENVIRONMENT:
SEI_ENV state: CALM
Wind: 11mph Gusts, 8mph 5min avg
Primary useism: 0.02 μm/s
Secondary useism: 0.08 μm/s
QUICK SUMMARY:
Inherited 9.75 hour lock.
19:00 Commissioning Time!!
We are in Commissioning State for some Calibration Measurements, Squeezing updates, and a filter change for LSC MICH.
19:00 UTC Calibration Measurements.
20:00 UTC OMC changes
22:27 SEI Measurements on HAM3.
22:30 SQUEEZE Guardian changes and reload.
22:43 UTC LSC MICH changes.
Current IFO Status: Locked in NOMINAL_LOW_NOISE for 17.5 hours and just got back to OBSERVING.
| Start Time | System | Name | Location | Laser Haz | Task | Time End |
|---|---|---|---|---|---|---|
| 17:53 | FAC | Karen | Optic lab Vac Prep | n | Technical Cleaning | 18:13 |
| 18:08 | VAC | Janos | EX | N | Turning on pump by CP8 | 18:38 |
| 20:50 | VAC | Gerardo | Mid Y | N | Working on Hepta back at 22:54 | 22:20 |
| 22:37 | SQZ | Vickie | Remote | n | Making changes and reloading squeeze guardian | 22:47 |
| 22:51 | SEI | Jim | Remote | n | Sei measurements on HAM3 | 22:52 |
| 22:52 | LSC | Elenna | CTRL RM | n | LSC MICH changes | 23:02 |
Tony, Jonathan, Erik, Keith T, Mike T, Dave:
Executive Summary:
psinject spontaneously restarted itself due to an out-of-memory problem on h1hwinj1 yesterday morning.
This was a result of the upgrade of the system on Tue 27 June 2023. It takes 7 days to run out of memory.
In the short term (over next few days) we will monitor the memory usage, and restart the psinject process during a lock_loss event to make it through to next week.
Next Tuesday we will downgrade to the original version of the LAL pulsar binary.
Details:
On Tuesday 4th July 2023 at 10:32 the psinject process on h1hwinj1 cleanly shut down and was then restarted by monit. The shutdown and restart of this process involves ramping the output of the INJ_CW filter module on h1calinj, which takes H1 out of observation mode.
Looking through the logs we found that h1hwinj1 had slowly run out of memory over the 7 days since the upgrade of the code on Tuesday 27th June 2023.
We did a quick estimate of the memory leak rate between 10am and 2pm today and came up with 1MB/min. Starting with 10GB of free memory, at this rate the memory is exhausted in about 7 days, which is what we saw.
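The arithmetic behind that estimate, as a quick back-of-envelope check:

```python
free_mb = 10 * 1024      # ~10 GB of free memory, in MB
leak_mb_per_min = 1      # measured leak rate between 10am and 2pm
days_to_exhaust = free_mb / leak_mb_per_min / 60 / 24
print(round(days_to_exhaust, 1))  # ~7.1 days, consistent with what we saw
```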
Last Tuesday several things were upgraded on h1hwinj1:
1. psinject code was changed to use gpstime instead of tconvert (makes LHO = LLO)
2. python3 version increased from 3.4 to 3.6 (makes LHO = LLO)
3. lalapps was upgraded from 6.25 to 9.2 (not done at LLO)
We think it is most probably the lalapps upgrade which is causing the memory leak. LLO did the first two upgrades two weeks ago on l1hwinj1 and are not seeing any memory issues.
In lalapps 6.25, /usr/bin/lalapps_Makefakedata_v4 is a 70K binary. In lalapps 9.2.1 it is a launcher script, spawning /usr/bin/lalpulsar_Makefakedata_v4, which is a 53K binary. It also issues the warning
"WARNING: 'lalapps_Makefakedata_v4' has been renamed to 'lalpulsar_Makefakedata_v4'"
Actions:
We will restart psinject before it stops itself next Tuesday during an appropriate lock loss time.
Next Tuesday we will downgrade LAL from 9.2.1 to 6.25.1 so LHO and LLO are identical.
Over the weekend we ran into a few times (alog71043, alog71026, alog71008) that we tried to get data via cdsutils getdata function in an ISC_LOCK guardian state, and it returned nothing. This caused an error in ISC_LOCK, fixed by simply reloading the node since the function just had to try again to get the data. This is not a new thing, but it's definitely another reminder that we have to be prepared for different outcomes anytime we request data.
Some months ago I made, with Jonathan's help, a function wrapper that can be used to handle hung data grabs. While not the issue we saw over the weekend, it's still a good idea to use this whenever we try getting data in a Guardian node. The file is (userapps)/sys/h1/guardian/timeout_utils.py and there is either a decorator (@timeout) or a wrapper function (call_with_timeout) that can be used.
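For illustration, one way such a wrapper can be written is sketched below; this is a sketch only, and the actual contents of timeout_utils.py may differ:

```python
import concurrent.futures

def call_with_timeout(func, *args, timeout=30, **kwargs):
    """Run func(*args, **kwargs); return None if it hasn't finished after timeout seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # the hung call keeps running in its background thread; we just stop waiting for it
        return None
    finally:
        pool.shutdown(wait=False)
```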
For the specific issue we saw over the weekend, a solution is to just do a simple check that the data is actually there before trying to do anything with it (ie. if data:). Using this situation as a good example:
```python
# This wrapper should handle hung nds data grabs
popdata_prmi = call_with_timeout(cdu.getdata, 'LSC-POPAIR_B_RF90_I_ERR_DQ', -60)
# This conditional handles None data returned
if popdata_prmi.data:
    if popdata_prmi.data.max() < 20:
        log('no POPAIR RF90 flashes above 20, going to CHECK MICH FRINGES')
        return 'CHECK_MICH_FRINGES'
else:
    self.timer['PRMI_POPAIR_check'] = 60
```
I should have added that this fix was loaded into ISC_LOCK by Tony during commissioning today and is ready for our next relock.
This threw the attached error at 2023-07-07 04:14 UTC. I edited ISC_LOCK for the PRMI and DRMI checkers, changing 'if popdata_prmi.data:' to 'if popdata_prmi:'.
This seemed to work, but I'm not sure if it will cover every case. If this goes into error again I suggest the operator start by reloading ISC_LOCK; if necessary, the "elif self.timer['PRMI_POPAIR_check']" block of code can be commented out. Tagging OpsInfo.
After this edit and a reload, the checker seems to work well, logging that there were no RF18 flashes above 120 (true) and moving to PRMI locking before the old 5 minute 'try_PRMI' timer finished.
Erik, TJ, Dave:
TJ found that gpstime on h1guardian1 is warning of an expired leapseconds file. The file in question is /usr/share/zoneinfo/leapseconds which has an expiration time of 1687910400 in UNIX seconds which is Tue 27 Jun 2023 05:00:00 PM PDT.
Currently H1 is locked for commissioning, so we don't want to do any updates on h1guardian1 at this time. As a temporary measure I increased the expiration time to 1787910400, Fri 28 Aug 2026 02:46:40 AM PDT by manually editing the file.
This is handled by the tzdata debian package on the guardian machine.
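For reference, the substituted expiry value can be sanity-checked with a one-liner (just a convenience conversion, not part of the fix):

```python
from datetime import datetime, timezone

print(datetime.fromtimestamp(1787910400, tz=timezone.utc))
# 2026-08-28 09:46:40+00:00, i.e. Fri 28 Aug 2026 02:46:40 AM PDT
```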
J. Kissel
To help out studies of the end station VEA temperatures and HVAC system, I've created templates of the relevant channels similar to those described in LHO:70284, but for the X and Y-end VEAs. You can find them in /ligo/home/jeffrey.kissel/Templates/NDScope/:
xvea_temp_study.yaml
yvea_temp_study.yaml
All of the Template files can be found here:
/opt/rtcds/userapps/release/cds/h1/scripts/fom_startup/nuc32
Both EX And EY MEDM screens are updated.