H1 CAL (CDS)
jeffrey.kissel@LIGO.ORG - posted 10:25, Tuesday 12 September 2023 - last comment - 13:07, Wednesday 13 September 2023(72830)
h1calcs Model Rebooted; Gating for CALCS \kappa_U is now informed by KAPPA_UIM Uncertainty (rather than KAPPA_TST)
J. Kissel, D. Barker
WP #11423

Dave has graciously compiled, installed, and restarted the h1calcs model. This brings in the bug fix from LHO:72820, which resolves the issue identified in LHO:72819, in which the front-end CAL-CS KAPPA_UIM library block was receiving the KAPPA_TST uncertainty.

Thus h1calcs is now using rev 26218 of the library part /opt/rtcds/userapps/release/cal/common/models/CAL_CS_MASTER.mdl.

I'll confirm that the UIM uncertainty is the *right* uncertainty during the next nominal low noise stretch later today (2023-09-12 ~20:00 UTC).
Comments related to this report
jeffrey.kissel@LIGO.ORG - 17:37, Tuesday 12 September 2023 (72848)
Circa 9:30 - 10:00a PDT (2023-09-12 16:30-17:00 UTC)
Post-compile, but prior to install, Dave ran a routine foton -c check on the filter file to confirm that there were no changes in
    /opt/rtcds/lho/h1/chans/H1CALCS.txt
besides "the usual" flip of the header (see IIET:11481, which has since become cds/software/advLigoRTS:589).

Also relevant: remember that every front-end model's filter file is a softlink into the userapps repo,
    $ ls -l /opt/rtcds/lho/h1/chans/H1CALCS.txt 
    lrwxrwxrwx 1 controls controls 58 Sep  8  2015 /opt/rtcds/lho/h1/chans/H1CALCS.txt -> /opt/rtcds/userapps/release/cal/h1/filterfiles/H1CALCS.txt
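
As an aside, here is a minimal sketch of what that routine check looks like (the exact invocation is from memory, so treat the details as an assumption): let foton -c rewrite a scratch copy of the installed file, then diff it against the original; anything beyond the usual header flip is a red flag.
    $ cp /opt/rtcds/lho/h1/chans/H1CALCS.txt /tmp/H1CALCS_check.txt      # work on a scratch copy
    $ foton -c /tmp/H1CALCS_check.txt                                    # let foton re-write / canonicalize the file
    $ diff /opt/rtcds/lho/h1/chans/H1CALCS.txt /tmp/H1CALCS_check.txt    # expect only the header change; coefficient diffs mean trouble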

Upon running the check, he found that foton -c had actually changed filter coefficients.
Alarmed by this, he ran an svn revert on the userapps "source" file for H1CALCS.txt in
    /opt/rtcds/userapps/release/cal/h1/filterfiles/H1CALCS.txt
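
In shell terms (a sketch, not a transcript of Dave's exact commands), that "revert" amounts to:
    $ cd /opt/rtcds/userapps/release/cal/h1/filterfiles
    $ svn status H1CALCS.txt    # 'M' means the working copy is locally modified relative to the last commit
    $ svn revert H1CALCS.txt    # discards the local modifications, restoring the last *committed* revision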

He walked me through what had happened, and what he did to fix it, *verbally* with me on TeamSpeak, and we agreed -- "yup, that should be fine."

Flash forward to NOMINAL_LOW_NOISE at 14:30 PDT (2023-09-12 20:25:57 UTC): TJ and I find that the GDS-CALIB_STRAIN trace on the wall looks OFF, and yet there are no impactful SDF DIFFs. I.e. TJ says "Alright Jeff... what'd you do..." upon seeing the front-wall FOM show GDS-CALIB_STRAIN at 2023-09-12 20:28 UTC.

After some panic (having not actually done anything but restart the model), I started opening up CALCS screens trying to figure out "uh oh, how can I diagnose the issue quickly..." I tried two things before I figured it out:
    (1) I go through the inverse sensing function filter bank (H1:CAL-CS_DARM_ERR) and look at the foton file ... and realize -- it looks OK, but if I'm really gonna diagnose this, I need to find the number that was installed on 2023-08-31 (LHO:72594)...
    (2) I also open up the actuator screen for the ETMX L3 stage (H1:CAL-CS_DARM_ANALOG_ETMX_L3) ... and upon staring for a second I see FM3 has a "TEST_Npct_O4" filter in it, and I immediately recognize -- just by the name of the filter -- that this is *not* the "HFPole" that *should* be there after Louis restored it on 2023-08-07 (LHO:72043).

After this, I put two and two together and realized that Dave had "reverted" to some stale, bad filter file.

As such, I went to the filter archive for the H1CALCS model, and looked for the filter file as it stood on 2023-08-31 -- the last known good time:

/opt/rtcds/lho/h1/chans/filter_archive/h1calcs$ ls -ltr
[...]
-rw-rw-r-- 1 advligorts advligorts 473361 Aug  7 16:42 H1CALCS_1375486959.txt
-rw-rw-r-- 1 advligorts advligorts 473362 Aug 31 11:52 H1CALCS_1377543182.txt             # Here's the last good one
-rw-r--r-- 1 controls   advligorts 473362 Sep 12 09:32 H1CALCS_230912_093238_install.txt  # Dave compiles first time
-rw-r--r-- 1 controls   advligorts 473377 Sep 12 09:36 H1CALCS_230912_093649_install.txt  # Dave compiles the second time
-rw-rw-r-- 1 advligorts advligorts 473016 Sep 12 09:42 H1CALCS_1378572178.txt             # Dave installs his "reverted" file
-rw-rw-r-- 1 advligorts advligorts 473362 Sep 12 13:50 H1CALCS_1378587040.txt             # Jeff copies Aug 31 11:52 H1CALCS_1377543182.txt into current and installs it
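
With the archive in hand, a quick diff between the last good file and the "reverted" one is enough to confirm which filter modules changed (e.g. the ETMX L3 FM3 swap described above); a sketch:
    $ cd /opt/rtcds/lho/h1/chans/filter_archive/h1calcs
    $ diff H1CALCS_1377543182.txt H1CALCS_1378572178.txt | less    # last good (Aug 31) vs. the "reverted" install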


Talking with him further in prep for this aLOG, we identify that when Dave said "I reverted it," he meant that he ran an "svn revert" on the userapps copy of the file, which "reverted" the file to the last time it was committed to the repo, i.e. 
    r26011 | david.barker@LIGO.ORG | 2023-08-01 10:15:25 -0700 (Tue, 01 Aug 2023) | 1 line

    FM CAL as of 01aug2023
i.e. before 2023-08-07 (LHO:72043) and before 2023-08-31 (LHO:72594).
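
The gotcha is that svn revert restores the working copy to its last *committed* state, not its last *loaded* state. A hedged sketch of the check that would have exposed how stale that commit was before reverting:
    $ cd /opt/rtcds/userapps/release/cal/h1/filterfiles
    $ svn info H1CALCS.txt | grep "Last Changed"    # Last Changed Rev: 26011, dated 2023-08-01, i.e. already ~6 weeks stale
    $ svn log -l 3 H1CALCS.txt                      # recent commit history for just this file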

Yikes! This is the calibration group's procedural bad -- we should be committing the filter file to the userapps svn repo every time we make a change.

So yeah, in doing normal routine things that all should have worked, Dave fell into a trap we left for him.

I've now committed the H1CALCS.txt filter file to the repo as rev 26254:

    r26254 | jeffrey.kissel@LIGO.ORG | 2023-09-12 16:26:11 -0700 (Tue, 12 Sep 2023) | 1 line

    Filter file as it stands on 2023-08-31, after 2023-08-07 LHO:72043 3.2 kHz ESD pole fix and  2023-08-31 LHO:72594 calibration update for several reasons.
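
For completeness, the commit itself is nothing fancy; roughly (a sketch, with a paraphrased log message):
    $ cd /opt/rtcds/userapps/release/cal/h1/filterfiles
    $ svn commit -m "Filter file as it stands on 2023-08-31 (post LHO:72043 and LHO:72594)" H1CALCS.txt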


By 2023-09-12 20:50:44 UTC I had loaded in H1CALCS_1378587040.txt, which was a simple "cp" copy of H1CALCS_1377543182.txt (the last good filter file, created during the 2023-08-31 calibration update), and the DARM FOM and GDS-CALIB_STRAIN returned to normal.
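
The restore itself was just a copy of the archived 2023-08-31 file back over the (softlinked) live filter file, followed by a coefficient load; roughly (a sketch, with the load step described from memory):
    $ cp /opt/rtcds/lho/h1/chans/filter_archive/h1calcs/H1CALCS_1377543182.txt \
         /opt/rtcds/userapps/release/cal/h1/filterfiles/H1CALCS.txt    # the same file /opt/rtcds/lho/h1/chans/H1CALCS.txt points to
    # ...then hit LOAD COEFFICIENTS on the h1calcs GDS_TP screen, which archives the load as H1CALCS_1378587040.txt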

All of the panic and the fix happened prior to us going to OBSERVATION_READY at 2023-09-12 21:00:28 UTC, so there was no observation-ready segment with bad calibration.

I also confirmed that all was restored and well by checking in on both
 -- the live front-end systematic error in DELTAL_EXTERNAL_DQ (using the tools from LHO:69285), and
 -- the low-latency systematic error in GDS-CALIB_STRAIN (using the auto-generated plots on https://ldas-jobs.ligo-wa.caltech.edu/~cal/).
Images attached to this comment
jeffrey.kissel@LIGO.ORG - 13:07, Wednesday 13 September 2023 (72863)CDS
Just some retroactive proof from the last few days' worth of measurements and models of systematic error in the calibration.

First, a trend of the front-end-computed values of systematic error, shown in 2023-09-12_H1CALCS_TrendOfSystematicError.png, which reviews the timeline of what happened.

Next, grabs from the GDS measured vs. modeled systematic error archive, which show similar information but in hourly snapshots:
    2023-09-12 13:50 - 14:50 UTC 1378561832-1378565432 Pre-maintenance, pre-model-recompile, calibration good, H1CALCS_1377543182.txt 2023-08-31 filter file running.
    2023-09-12 19:50 - 20:50 UTC 1378583429-1378587029 BAD 2023-08-01, last-svn-commit, r26011, filter file in place.
    2023-09-12 20:50 - 21:50 UTC 1378587032-1378590632 H1CALCS_1378587040.txt copy of 2023-08-31 filter installed, calibration goodness restored.

Finally, I show the systematic error in GDS-CALIB_STRAIN trends from the calibration monitor "grafana" page, which show that, because we weren't in ANALYSIS_READY during all this kerfuffle, the systematic error as reported by that system was none the wiser that any of this had happened.

*phew* Good save team!!
Images attached to this comment