Jeff K, Sheila, TJ, Jim
Jim had two more locklosses at engage ASC while trying to relock after this afternoon's lockloss. In one case, there aren't many clues except for the usual error signals deviating from 0: 1252450014
In the earlier one, 1252449154, the lockloss happened after all the ASC was on and the first convergence checker had passed. At that point in the state we increase the gains of some loops and turn off the offsets in AS36Q which are set in DRMI. I noticed that we had a 4 second wait for a 10 second ramp on the offsets, so I set the wait time to match the TRAMP of the offsets.
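The wait-time fix can be sketched as follows. Note this is illustrative, not the real ISC_LOCK code: `FakeEzca` is a stand-in for the guardian's EPICS channel-access object, and the specific channel name is an assumption.

```python
# Hedged sketch: make the guardian wait as long as the offset ramp itself,
# instead of a hard-coded 4 s wait for a 10 s ramp.
# "FakeEzca" is a stand-in stub for the guardian ezca object; the channel
# name below is illustrative, not taken from the real ISC_LOCK code.

class FakeEzca(dict):
    """Minimal stand-in for the guardian ezca object."""
    def read(self, chan):
        return self[chan]

ezca = FakeEzca({'ASC-AS_A_RF36_Q_PIT_TRAMP': 10.0})

# Wait for the configured ramp time rather than a fixed number:
offset_wait = ezca.read('ASC-AS_A_RF36_Q_PIT_TRAMP')
print(offset_wait)  # the wait now tracks the configured TRAMP
```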
The attached ndscope for this lockloss shows both CHARD and INP1 Y hitting the soft limiters in the 5 seconds before the lockloss. We use the soft limiters to allow us to turn on all the ASC loops at once. In principle we would like to have no soft limiters on our slowest loops, and only limit those which move fast, so that the slow ones can keep up. We haven't taken the time to figure out which loops actually need the limiters, so they are on all the loops that get engaged here. This means we are probably unnecessarily limiting the ability of our slower loops to keep up in some cases, so eventually we want to be more selective about which soft limiters we use.
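Conceptually, an error-signal ("soft") limiter just clamps the signal fed to a loop so that fast loops cannot run away while slower loops converge. A minimal sketch, with a made-up limit value:

```python
# Hedged sketch of what an error-signal ("soft"/SMOOTH) limiter does:
# clamp the error signal so a fast loop cannot run away while slower
# loops are still converging. The limit value here is a made-up example.

def soft_limit(err, limit):
    """Clamp err to the range [-limit, +limit]."""
    return max(-limit, min(limit, err))

print(soft_limit(-6.3, 2.0))  # a large excursion is held at the limit: -2.0
print(soft_limit(0.5, 2.0))   # small signals pass through unchanged: 0.5
```

This clamping is also why a limited error signal "flat-lines" on an ndscope trace when the limiter is engaged.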
The soft limiters are only meant to help us when we are first engaging the loops. Once all the loops are on and have converged once, the limiters shouldn't be on anymore, since all they will do is slow down the ASC system's ability to respond to disturbances. In this lockloss, INP1 and CHARD Y hit their limiters after everything was converged (I am not sure what caused the disturbance). I changed the guardian code so that the soft limiters are turned off after the first convergence check.
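The intent of that guardian change can be sketched as below. The loop names are the limited loops listed in this entry, and the `*_SMOOTH_ENABLE` switches are the ones visible in the guardian log excerpts; the code structure itself is illustrative, not the real ISC_LOCK implementation, and `ezca` is a plain dict standing in for the guardian's channel-access object.

```python
# Hedged sketch of the guardian change: once the first convergence checker
# passes, disable the error-signal limiters via the *_SMOOTH_ENABLE
# switches. "ezca" is a dict-like stub, not the real guardian object.

LIMITED_LOOPS = ['INP1_P', 'INP1_Y', 'CHARD_P', 'CHARD_Y',
                 'PRC2_P', 'PRC2_Y', 'SRC1_P', 'SRC1_Y',
                 'SRC2_P', 'SRC2_Y']

# All limiters start enabled while the loops are being engaged:
ezca = {'ASC-%s_SMOOTH_ENABLE' % loop: 1 for loop in LIMITED_LOOPS}

def disable_soft_limiters(ezca):
    """Turn off every error-signal limiter after first convergence."""
    for loop in LIMITED_LOOPS:
        ezca['ASC-%s_SMOOTH_ENABLE' % loop] = 0

converged = True  # stand-in for the first convergence checker passing
if converged:
    disable_soft_limiters(ezca)
```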
Jeff is looking through some of our recent locklosses from ENGAGE_ASC to see if this could potentially have helped us avoid other similar locklosses.
I finished Jeff K's table below. Of the 9 locklosses from ENGAGE_ASC_FOR_FULL_IFO (STATE N = 430), 6 hit soft limiters, most consistently INP1_Y and CHARD_Y. It's tough to say whether the locklosses correlate with being before or after the PRC2/SRC1 gain increase at the end of that state.
Loops that get error signal limiters:
PRC2_
INP1_
CHARD_
SRC1_
SRC2_
Loops that don't: DHARD, MICH, ADS3 (PRC1)
(1) We need to wait for the ramp-off of the offset on MICH.
(2) After the first convergence checker, we now turn off the soft limiters. Previously we didn't turn them off until after the *second* convergence checker, which is after the PRC2/SRC1 boosts get turned on. So maybe the limited loops couldn't keep up with the new boosts.
Locklosses around ENGAGE_ASC_FOR_FULL_IFO (STATE N = 430)
search for guardian state 430
LLN | GPS | UTC | What State? | What Hits Soft Limits? | Before or After PRC2/SRC1 Gain Increases?
  | 1252450014 | September 13 2019, 22:46:36 UTC | 435 (no, survived) |  | Well after, INP1, CHARD runaway
0 | 1252449154 | September 13 2019, 22:32:16 UTC | 430 | INP1, CHARD | After
  | 1252421630 | September 13 2019, 14:53:32 UTC | 435 (no, survived) |  | Well after, INP1, CHARD runaway
  | 1252252912 | September 11 2019, 16:01:34 UTC | 429 | INP1 is wildly oscillating | Before (Sheila caused, increased INP1 gain too much)
1 | 1252249409 | September 11 2019, 15:03:11 UTC | 430 | none | After
  | 1252145321 | September 10 2019, 10:08:23 UTC | 435 (no, survived) |  | Well after, INP1, CHARD runaway
2 | 1252130742 | September 10 2019, 06:05:24 UTC | 430 | CHARD |
  | 1252043449 | September 09 2019, 05:50:31 UTC | 430 | INP1_Y, CHARD_Y | After, INP1_P runaway
  | 1251914435 | September 07 2019, 18:00:17 UTC | 435 | none | Well after, INP1_Y, CHARD_Y runaway
3 | 1251913373 | September 07 2019, 17:42:35 UTC | 430 | CHARD |
4 | 1251807482 | September 06 2019, 12:17:44 UTC | 430 | CHARD, INP1 |
  | 1251806452 | September 06 2019, 12:00:34 UTC | 430 | none | After
  | 1231738298 | September 05 2019, 17:04:40 UTC | 435 | none | Well after, INP1_Y, CHARD_Y runaway
  | 1251638577 | September 05 2019, 13:22:39 UTC | 430 | CHARD_Y, INP1_Y | After
  | 1251636875 | September 05 2019, 12:54:17 UTC | 430 | none | After
Follow-up on this change: there have been several more locklosses in and around ENGAGE_ASC_FOR_FULL_IFO (see, e.g., the operator report in LHO aLOG 51961). I'm not sure this guardian code change, which moves where the error signal ("smooth") limiters are turned off, worked. One collective series during Travis' Sunday shift:

GPS Time | UTC Time | State | Error Limiters Hit | Lock Loss Before/After PRC2/SRC1 Gain Increases | Did Error Limiters Get Turned Off?
1252609254.84 | Sep 15 2019, 19:00:36 UTC | 430 | none, but close | After | Yes (made it to where they're turned off)
1252610293.71 | Sep 15 2019, 19:17:55 UTC | 430 | CHARD_Y, INP1_Y | After | No
1252611608.05 | Sep 15 2019, 19:39:50 UTC | 430 | none, but close | After | Yes (made it to where they're turned off)
1252612537.75 | Sep 15 2019, 19:55:19 UTC | 430 | CHARD_Y, INP1_Y | After | No
1252613443.62 | Sep 15 2019, 20:10:25 UTC | 430 | CHARD_Y, INP1_Y | After | No
1252615042.75 | Sep 15 2019, 20:37:04 UTC | 430 | CHARD_Y, INP1_Y | After | No
1252666222.31 | Sep 16 2019, 10:50:04 UTC | 430 | CHARD_Y, INP1_Y | After | No

For the record, so others can perform the same study:

Command for the lockloss tool used:
lockloss -c /opt/rtcds/userapps/release/isc/h1/ndscope/asc_fullIFO.yml select

- Looking at the error signals in the ndscope template (you'll likely have to zoom the y-axis out), look for "flat-lining", where the error signal gets large and then goes to what looks like an artificially flat trace. Leading up to the above locklosses, this happens at ~ -6 for INP1_Y and ~ -2 for CHARD_Y.
- Looking through the guardian log, see where lines like these

2019-09-16_10:49:38.220815Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-PRC2_P_SW1 => 16
2019-09-16_10:49:38.474971Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-PRC2_P => OFF: FM1
2019-09-16_10:49:38.474971Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-PRC2_Y_SW1 => 16
2019-09-16_10:49:38.723825Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-PRC2_Y => OFF: FM1
2019-09-16_10:49:38.724816Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-SRC1_Y_SW1 => 16
2019-09-16_10:49:38.975953Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-SRC1_Y => OFF: FM1
2019-09-16_10:49:38.977005Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-SRC1_P_GAIN => 12
2019-09-16_10:49:38.977982Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-AS_A_RF36_Q_PIT_SW1 => 8
2019-09-16_10:49:39.229301Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-AS_A_RF36_Q_PIT => OFF: OFFSET
2019-09-16_10:49:39.230172Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-AS_A_RF36_Q_YAW_SW1 => 8
2019-09-16_10:49:39.481531Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-AS_A_RF36_Q_YAW => OFF: OFFSET
2019-09-16_10:49:39.482179Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-AS_A_RF36_Q_YAW_OFFSET => 0
2019-09-16_10:49:39.482503Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-AS_A_RF36_Q_PIT_OFFSET => 0

happen, in relation to lines like

2019-09-15_19:39:50.260416Z ISC_LOCK JUMP: ENGAGE_ASC_FOR_FULL_IFO->LOCKLOSS
2019-09-15_19:39:50.260416Z ISC_LOCK calculating path: LOCKLOSS->NOMINAL_LOW_NOISE
2019-09-15_19:39:50.267164Z ISC_LOCK new target: DOWN
2019-09-15_19:39:50.273348Z ISC_LOCK executing state: LOCKLOSS (2)
2019-09-15_19:39:50.274286Z ISC_LOCK [LOCKLOSS.enter]
2019-09-15_19:39:50.395063Z ISC_LOCK JUMP target: DOWN

namely, whether they're before or after lines like

2019-09-15_19:39:46.596598Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-INP1_P_SMOOTH_ENABLE => 0
2019-09-15_19:39:46.596943Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-INP1_Y_SMOOTH_ENABLE => 0
2019-09-15_19:39:46.597208Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-CHARD_P_SMOOTH_ENABLE => 0
2019-09-15_19:39:46.597478Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-CHARD_Y_SMOOTH_ENABLE => 0
2019-09-15_19:39:46.598132Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-PRC2_P_SMOOTH_ENABLE => 0
2019-09-15_19:39:46.598455Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-PRC2_Y_SMOOTH_ENABLE => 0
2019-09-15_19:39:46.598808Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-SRC1_P_SMOOTH_ENABLE => 0
2019-09-15_19:39:46.599082Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-SRC1_Y_SMOOTH_ENABLE => 0
2019-09-15_19:39:46.599343Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-SRC2_P_SMOOTH_ENABLE => 0
...

if they happen at all. As indicated above, most did not.
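The guardian-log check described above can be automated with a small script like the one below. The log excerpt is taken from this entry; the parsing approach (first `SMOOTH_ENABLE => 0` line vs. the LOCKLOSS jump) is a sketch of the manual procedure, not an existing tool.

```python
# Hedged sketch: scan a guardian log for the first *_SMOOTH_ENABLE => 0
# line and for the LOCKLOSS jump, and report whether the error-signal
# limiters were turned off before the lock was lost.

log = """\
2019-09-15_19:39:46.596598Z ISC_LOCK [ENGAGE_ASC_FOR_FULL_IFO.run] ezca: H1:ASC-INP1_P_SMOOTH_ENABLE => 0
2019-09-15_19:39:50.260416Z ISC_LOCK JUMP: ENGAGE_ASC_FOR_FULL_IFO->LOCKLOSS
"""

def limiters_off_before_lockloss(log):
    """Return True if the limiters were disabled before the lockloss jump."""
    smooth_t = lockloss_t = None
    for line in log.splitlines():
        stamp = line.split()[0]
        if 'SMOOTH_ENABLE => 0' in line and smooth_t is None:
            smooth_t = stamp
        if 'JUMP: ENGAGE_ASC_FOR_FULL_IFO->LOCKLOSS' in line:
            lockloss_t = stamp
    # These ISO-like guardian timestamps sort lexically, so a string
    # comparison is enough to order the two events.
    return (smooth_t is not None and lockloss_t is not None
            and smooth_t < lockloss_t)

print(limiters_off_before_lockloss(log))  # True for this excerpt
```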
Sheila confirms, after discussing my review with her, that the code changes likely did not get loaded. I've loaded the ISC_LOCK guardian code this afternoon, during the calibration measurement period.

2019-09-16_21:16:49.389046Z ISC_LOCK LOAD REQUEST
2019-09-16_21:16:49.390035Z ISC_LOCK RELOAD requested. reloading system data...
2019-09-16_21:16:49.423530Z /opt/rtcds/userapps/release/isc/h1/guardian/ISC_LOCK.py: inconsistent use of tabs and spaces in indentation
2019-09-16_21:16:50.217943Z /opt/rtcds/userapps/release/isc/h1/guardian/ISC_LOCK.py: inconsistent use of tabs and spaces in indentation
2019-09-16_21:16:50.274528Z ISC_LOCK module path: /opt/rtcds/userapps/release/isc/h1/guardian/ISC_LOCK.py
2019-09-16_21:16:50.275249Z ISC_LOCK user code: /opt/rtcds/userapps/release/isc/h1/guardian/switch_SDF_source_files.py
2019-09-16_21:16:50.275249Z ISC_LOCK user code: /opt/rtcds/userapps/release/sys/common/guardian/cdslib.py
2019-09-16_21:16:50.275249Z ISC_LOCK user code: /opt/rtcds/userapps/release/isc/h1/guardian/lscparams.py
2019-09-16_21:16:50.275249Z ISC_LOCK user code: /opt/rtcds/userapps/release/isc/h1/guardian/ISC_GEN_STATES.py
2019-09-16_21:16:50.275249Z ISC_LOCK user code: /opt/rtcds/userapps/release/isc/h1/guardian/ISC_library.py
2019-09-16_21:16:50.275249Z ISC_LOCK user code: /opt/rtcds/userapps/release/isc/h1/guardian/fast_ezca.py
2019-09-16_21:16:55.589754Z ISC_LOCK system archive: code changes detected and committed
2019-09-16_21:16:56.618081Z ISC_LOCK system archive: id: d347c69389743cbdc979e7dfbc2df8a33cd97f2a (221543529)
2019-09-16_21:16:56.618594Z ISC_LOCK RELOAD complete
2019-09-16_21:16:56.619981Z ISC_LOCK W: RELOADING @ NLN_CAL_MEAS.run