Reports until 12:46, Monday 09 October 2017
H1 SEI (SUS)
sam.cooper@LIGO.ORG - posted 12:46, Monday 09 October 2017 - last comment - 15:55, Wednesday 11 October 2017(38919)
What should we do during large earthquakes?

J Warner, S Dwyer, S Cooper

We've been looking at what we should do during large earthquakes. For the magnitude 8.1 Mexico earthquake (GPS: 1188881718, alog 38570), the attached plots show the state of the SEI Guardian (State N), the L2 watchdog (L2 WDMON) channel, the L3 Oplev, and the HEPI L4Cs (used because the ground STSs saturated) for the chambers ITMX, ITMY, ETMX, and ETMY. During the earthquake, all the ISIs tripped, as did the ITMX suspension watchdog. From these plots we think that the decrease in amplitude of the Oplev signal is due to the reduction in ground motion around this time rather than to damping of the ISI, as both the damping and the reduction in ground motion occurred at similar times.

Images attached to this report
Comments related to this report
jim.warner@LIGO.ORG - 14:42, Monday 09 October 2017 (38946)

We've also talked a bit about the seismic watchdogs and why the ISIs trip after the guardian shuts off the isolation loops. Both ETMs are in the damped state right now, so we set the T240 threshold to 100 counts. Sure enough, the T240s started counting saturations but did not trip the watchdog. The attached plot shows the T240 saturation counts, the threshold, and the ST1 WD mon state. The dip on the top left plot is where we reduced the threshold, the spike on the bottom left is where the model started counting T240 saturations, and the flat line on the bottom right shows the watchdog didn't trip. This is as it should be.

However, what I think I've seen during ISI trips before is that the ST1 T240s saturate, ST1 trips, and ST2 runs for a little bit and then trips. This results in ST1 getting whacked pretty hard. I'll try to see if that's what happened with this earthquake.

Images attached to this comment
jeffrey.kissel@LIGO.ORG - 15:23, Monday 09 October 2017 (38948)SEI
J. Kissel, inspired by conversation from S. Cooper, S. Dwyer, J. Warner

I'll remind folks that this collective SEI/SUS watchdog system has been built up sporadically over ~10 years, in fits and starts, as reactionary and quick solutions to various problems by several generations of engineers and scientists. Also, the watchdog system was designed almost entirely to protect the hardware from a software failure, and never to handle this latest suggestion: protecting the hardware from the earth. So I apologize on behalf of that history for how clunky and confusing things are when discussing what to do in that situation.

Also, I'll remind people that there are three "areas" of watchdogs: 
    (1) in software, inside the user model -- typically defined by the subsystem experts
    (2) in software, inside the upper level iop model -- typically defined by CDS software group, with input from subsystem experts
    (3) in hardware, either in the AA/AI chassis, or built into the analog coil drivers -- typically defined during initial aLIGO design phase

In my reply here, I'll only be referring to (1) & (2), though I still have an ECR pending approval regarding (3) -- see E1600270 and/or FRS Ticket 6100.

With all that primer done, here's what we should do with the suspension user watchdogs (1), and not necessarily just for earthquakes:
    (a) Remove all connection between the SUS and ISI user watchdogs. The independent software watchdogs (2) should cover us in any bad scenarios that that connection was designed to protect against.
    (b) Update the RMS system to be *actually* an RMS, and especially, one for which we can define a time constant. The RMS system that is currently installed is some Frankenstein brought alive before bugs in the RCG were appreciated (namely LHO aLOG 19658), and before I understood how to use the RCG's RMS function in general. The independent software watchdog (2) is a good role model for this.
    (c) We should rip out all USER MODEL usage of the DACKILL part. The way the DACKILL is used across suspension types and platforms with many payloads is confusing and inconsistent. Any originally designed intent of this part is now covered by the independent software watchdog.
    (d) Once (b) is complete, we should tailor and lower the time constants and the band-passing to better match the digital usage of the stage. For example, the worst that can happen to a PUM stage is getting sent junk ASC and Violin Mode Damping control feedback signals when the IFO has lost lock but the guardian has not figured it out and switched off control.
    (e part 1) Upon watchdog trip, we should consider leaving the alignment offsets alone. Suddenly turning off alignment offsets often causes just as much of a kick to the system as what had originally set off the watchdog. HEPI has successfully implemented such a system.
    (e part 2) We should re-think the interaction between the remaining USER watchdog system and the Guardian. Currently, after a watchdog trip the guardian state immediately jumps to "TRIPPED" and begins shutting off all outputs and bringing the digital control system to "SAFE."
    (f) Add a "bypass" feature to the watchdog such that a user can request "at all costs, continue to try damping the top mass" in the case of earthquakes.
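
To make item (b) concrete, here is a minimal sketch of a running RMS with an explicit time constant. It is not the RCG's RMS part or the installed watchdog code: the function name, the single-pole smoothing of the mean square, and the example sample rate are all assumptions made for illustration.

```python
import math


def running_rms(samples, sample_rate_hz, time_constant_s):
    """Exponentially weighted running RMS (hypothetical helper, not RCG code).

    The squared signal is low-passed with a single pole whose time
    constant is time_constant_s, then the square root is taken.
    """
    # Per-sample smoothing factor for the requested time constant.
    alpha = 1.0 - math.exp(-1.0 / (sample_rate_hz * time_constant_s))
    mean_square = 0.0
    rms = []
    for x in samples:
        # Move the running mean square a fraction alpha toward x**2.
        mean_square += alpha * (x * x - mean_square)
        rms.append(math.sqrt(mean_square))
    return rms


# Synthetic example: a constant 3-count signal sampled at 16 Hz (assumed rate).
rms = running_rms([3.0] * 320, sample_rate_hz=16, time_constant_s=1.0)
```

With a definition like this, a watchdog comparison is just `rms[-1] > threshold`, and shortening `time_constant_s` per stage, as in (d), makes the monitor respond faster at the cost of being more sensitive to short transients.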
jim.warner@LIGO.ORG - 15:55, Wednesday 11 October 2017 (38985)

I'm attaching some more plots of what happened to the ISIs during this earthquake. The first plot is the saturation count time series for all seismometers and actuators for the test mass ISIs. All of the chambers saturated on the Stage 2 actuators first; this is the first green spike. This tripped the high gain DC-coupled isolation loops, and probably caused Stage 2 to hit its lockers. The watchdog stopped counting all saturations for 3 seconds (by design), then immediately tripped the damping loops on the saturated L4Cs or T240s. I'm not sure why the GS13s don't show up here.
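
The sequence described above (a saturation trips the isolation loops, saturations are ignored during a pause, then any remaining saturation trips the damping loops) can be sketched as a small state machine. This is only an illustration of the logic as described here, not the ISI watchdog model: the state names, the single combined saturation flag, and the function name are assumptions.

```python
def staged_watchdog(saturated, sample_rate_hz, pause_s=3.0):
    """Per-sample watchdog state for a stream of saturation flags.

    Hypothetical states (not the real model's names):
      'ISOLATED' - isolation and damping loops on
      'DAMPING'  - isolation loops tripped, damping loops still on
      'TRIPPED'  - everything off
    saturated is an iterable of booleans, True when any monitored
    sensor or actuator is saturating on that sample.
    """
    pause_samples = int(round(pause_s * sample_rate_hz))
    state = "ISOLATED"
    ignore_left = 0
    states = []
    for sat in saturated:
        if state == "ISOLATED" and sat:
            # First saturation: drop the isolation loops, start the pause.
            state = "DAMPING"
            ignore_left = pause_samples
        elif state == "DAMPING":
            if ignore_left > 0:
                # Saturations are not counted during the pause.
                ignore_left -= 1
            elif sat:
                # Still saturating once the pause is over: full trip.
                state = "TRIPPED"
        states.append(state)
    return states


# Synthetic flags at 1 Hz: saturation outlasts the 3 s pause, so it trips.
flags = [False, True, True, True, True, True, False]
print(staged_watchdog(flags, sample_rate_hz=1))
```

In this sketch, passing a `pause_s` longer than the saturation leaves the final state at 'DAMPING', which is the ride-through behavior under discussion.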

The second plot I attach shows how long ETMX was saturating different sensors. The L4Cs were saturated for about 45 seconds; the T240s and GS13s were saturated for minutes. The L4Cs never had their analog gains switched, but the chamber guardian should have switched the GS13s automatically. So if we increase the pause time in the watchdog (between shutting off the isolation loops and full shutdown), I think this shows that for this earthquake the ride-through time would need to be more than 45 seconds.
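
One way to turn a saturation estimate like this into a number is to measure the longest continuous saturated stretch in a sensor time series. The sketch below assumes a simple absolute-value limit in counts. The function name, the limit, and the sample rate are hypothetical, and the data in the example are synthetic, not taken from this earthquake.

```python
def longest_saturation_s(samples, sample_rate_hz, limit_counts):
    """Longest continuous stretch, in seconds, with |sample| >= limit_counts.

    Hypothetical helper for choosing a watchdog ride-through time.
    """
    longest = 0
    current = 0
    for x in samples:
        if abs(x) >= limit_counts:
            current += 1
            longest = max(longest, current)
        else:
            # Any unsaturated sample ends the current stretch.
            current = 0
    return longest / sample_rate_hz


# Synthetic record at 2 Hz: 90 railed samples between quiet stretches.
data = [0] * 10 + [32768] * 90 + [0] * 10
print(longest_saturation_s(data, sample_rate_hz=2, limit_counts=32760))  # 45.0
```

A real sensor may flicker in and out of saturation, so a practical version would probably merge stretches separated by short gaps before picking the ride-through time.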

 

Images attached to this comment