Reports until 19:38, Thursday 15 January 2015
H1 CDS (SUS)
jeffrey.kissel@LIGO.ORG - posted 19:38, Thursday 15 January 2015 - last comment - 11:41, Friday 16 January 2015(16101)
First Attempt at Defining SUS SDF File for H1SUSMC2
J. Kissel, T. Sadekci, B. Weaver

We've made a first attempt to use the State Definition File (SDF, see e.g. LLO aLOG 15907, or G1500060) system on H1SUSMC2. You can find a play-by-play below, but most importantly, we think we've found a major flaw in the system. 
Here's the use case:
Taking MC2 from SAFE to ALIGNED using the SUS guardian, then using the IMC_LOCK guardian to lock the IMC, uses several _SW1S or _SW2S channels (i.e. the bit words that define the first and second halves of the switchable buttons in a filter bank) in H1 SUS MC2 -- for the exact list, see first attached screenshot. For example, in the M1_LOCK_L bank, the input and FM1 are regularly switched ON and OFF by guardian, but FM2 should always be ON. As such, we'd want the SDF system to monitor FM2, but not FM1 or the input switch. The second attachment, captured *after* we changed all settings to being monitored and then brought the SUS up from SAFE to IMC LOCKED, shows this. BUT, all three are controlled by the SW1S channel, which we must choose to be *either* monitored or not. So, if we choose not to monitor the input and FM1, we lose the ability to monitor FM2.

Jonathan and Dave inform me that monitoring each bit individually has been considered, but not yet implemented. I think this IMC use case scenario (which is representative of a LOT of suspension and ISC filter banks) demonstrates that we DEFINITELY need a bit-by-bit monitoring system before we can reliably roll out the SDF system for filter banks. Note -- for EPICS channels with unique identifiers, like a GAIN, MASTERSWITCH, or matrix elements, the SDF system is already great.
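To make the request concrete, here is a minimal Python sketch of what a bit-by-bit comparison could look like. It is not the SDF implementation. The bit positions in BIT are hypothetical placeholders chosen for illustration (the real SW1S layout should be taken from the front-end code), and mask_for and differs are made-up helper names.

```python
# Hypothetical bit positions inside one switch word, for illustration only.
BIT = {"INPUT": 2, "FM1": 4, "FM2": 6}

def mask_for(names):
    """Build a mask with one bit set for every named switch to be watched."""
    mask = 0
    for name in names:
        mask |= 1 << BIT[name]
    return mask

def differs(setpoint, readback, mask):
    """True if setpoint and readback disagree on any watched (masked) bit."""
    return ((setpoint ^ readback) & mask) != 0

ALL_BITS = mask_for(BIT)          # watch the whole word, as SDF does today
FM2_ONLY = mask_for(["FM2"])      # watch only the bit that must stay ON

setpoint = mask_for(["INPUT", "FM1", "FM2"])   # snapshot: all three ON
guardian_toggled = mask_for(["FM2"])           # guardian turned INPUT and FM1 OFF
fm2_dropped = mask_for(["INPUT", "FM1"])       # FM2 was switched OFF by mistake

print(differs(setpoint, guardian_toggled, ALL_BITS))  # whole word: flags a difference
print(differs(setpoint, guardian_toggled, FM2_ONLY))  # masked: no difference
print(differs(setpoint, fm2_dropped, FM2_ONLY))       # masked: difference caught
```

With a per-bit mask, the guardian's routine toggling of the input and FM1 no longer shows up as a difference, while an unexpected change to FM2 still does.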


Play-by-play:
- Find the directory in which a given user front end code's safe.snap lives:
    jeffrey.kissel@opsws8:~$ cd /opt/rtcds/lho/h1/target/h1susmc2/h1susmc2epics/burt/
    jeffrey.kissel@opsws8:/opt/rtcds/lho/h1/target/h1susmc2/h1susmc2epics/burt$ pwd
    /opt/rtcds/lho/h1/target/h1susmc2/h1susmc2epics/burt
- Make sure it's a soft link to the userapps repo:
    jeffrey.kissel@opsws8:/opt/rtcds/lho/h1/target/h1susmc2/h1susmc2epics/burt$ ls -l safe.snap
    lrwxrwxrwx 1 controls controls 63 Jan 11 15:19 safe.snap -> /opt/rtcds/userapps/release/sus/h1/burtfiles/h1susmc2_safe.snap
- Add a "1" to the end of the line for all EPICS settings channels in the safe.snap file, such that all settings go from being unmonitored to monitored -- do so using Jamie's USERAPPS/sys/common/scripts/sdf_set_monitor, documented in LLO aLOG 15907:
    jeffrey.kissel@opsws8:/opt/rtcds/lho/h1/target/h1susmc2/h1susmc2epics/burt$ sdf_set_monitor 1 safe.snap
- Try to be too clever, and use the command line to push the "LOAD Table Only" button:
    jeffrey.kissel@opsws8:/opt/rtcds/lho/h1/target/h1susmc2/h1susmc2epics/burt$ caput H1:FEC-39_SDF_RELOAD 1
    Old : H1:FEC-39_SDF_RELOAD           0
    New : H1:FEC-39_SDF_RELOAD           1
- Watch with sadness as the MC2 suspension gets immediately forced to safe.snap and the IMC loses lock. Why? Because *all three* SDF load buttons write to the same channel, and a request of "1" performs the "Load Settings and Table" action. Too greedy.
- Use the SUS guardian (just because that's the screen I had open) to request ALIGNED again; see no action; realize it's because MC2 is managed by the IMC_LOCK manager, which eventually requests the same and restores the IMC.
- Begin to hand edit the modified safe.snap
    jeffrey.kissel@opsws8:/opt/rtcds/lho/h1/target/h1susmc2/h1susmc2epics/burt$ gedit safe.snap&
- Get scared that you're not changing the right channel, make a few mistakes, hit the reload button, eventually run into the fundamental flaw described above and stop.
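For readers who want to see what the "add a 1 to the end of the line" step amounts to, here is a rough Python sketch. It is not the sdf_set_monitor script itself. It assumes that an unmonitored settings line has exactly three whitespace-separated fields (channel name, element count, value), that a fourth field is the monitor flag, and that settings lines start with "H1:". The function name and the example channel lines are illustrative only.

```python
def set_monitor_flag(lines, flag="1", prefix="H1:"):
    """Append a monitor flag to every three-field settings line.

    Header lines, blank lines, and lines that already carry a fourth
    field are passed through unchanged.
    """
    out = []
    for line in lines:
        fields = line.split()
        is_setting = bool(fields) and fields[0].startswith(prefix)
        if is_setting and len(fields) == 3:
            out.append(line.rstrip() + " " + flag)
        else:
            out.append(line)
    return out

# Illustrative input, not copied from a real safe.snap.
example = [
    "--- Start BURT header",
    "H1:SUS-MC2_M1_LOCK_L_GAIN 1 1.0",
    "H1:SUS-MC2_M1_LOCK_L_SW1S 1 84 1",
]
for line in set_monitor_flag(example):
    print(line)
```

Working on a copy of the file and diffing the result before reloading would have made the hand-editing step above less nerve-wracking.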
Images attached to this report
Comments related to this report
daniel.sigg@LIGO.ORG - 11:00, Friday 16 January 2015 (16113)

This was one of the main findings with our test setup last summer. The control values for the standard filter modules need support at the bit level. This required both a mask field to indicate which bits are watched, and support for command strings which are modelled after ezcaswitch. Back then we used the following convention:

The supported command strings were of the form “keyword [[bit]+ cmd]+”. The keyword is either “bits” or “switch” supporting a generic bit encoded value or a standard filter module. The allowed fields for ‘cmd’ can be one of the following:

  • ON: indicates that the switch or filter stage is on,
  • OFF: indicates that the switch or filter stage is off, and
  • MAN: indicates that the switch or filter is in manual mode (i.e., not watched).

For the type “bits” the allowed values for bit are B0 to B31 and ALL. For the type “switch” (standard filter module) the allowed values for bit are one of the following:

  • INPUT: denotes the state of the input switch,
  • OFFSET: denotes the state of the offset enable,
  • FM1, … FM10: denotes the state of the individual filter stages,
  • LIMIT: denotes the state of the limiter,
  • DECIMATION: denotes the state of the decimation filter,
  • OUTPUT: denotes the state of the output switch,
  • HOLD: denotes the state of the hold output switch,
  • ALL: includes all of the above,
  • IO: includes INPUT, DECIMATION and OUTPUT, and
  • FMALL: includes all filter stages.

All switches which are not listed were set to off.
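As an illustration of the convention described above, here is a small Python sketch that parses such a command string into a per-switch state. It is not the code from the test setup. The function name parse_command is made up, and treating tokens case-insensitively is an assumption. Unlisted switches default to OFF, as stated above.

```python
FILTERS = [f"FM{i}" for i in range(1, 11)]
SWITCH_NAMES = ["INPUT", "OFFSET"] + FILTERS + ["LIMIT", "DECIMATION", "OUTPUT", "HOLD"]
SWITCH_GROUPS = {
    "ALL": SWITCH_NAMES,
    "IO": ["INPUT", "DECIMATION", "OUTPUT"],
    "FMALL": FILTERS,
}
BIT_NAMES = [f"B{i}" for i in range(32)]
BIT_GROUPS = {"ALL": BIT_NAMES}
COMMANDS = {"ON", "OFF", "MAN"}

def parse_command(text):
    """Parse 'keyword [[bit]+ cmd]+' into a dict of name -> ON/OFF/MAN."""
    tokens = text.split()
    if len(tokens) < 3:
        raise ValueError("need a keyword, at least one bit, and a cmd")
    keyword = tokens[0].lower()
    if keyword == "switch":
        names, groups = SWITCH_NAMES, SWITCH_GROUPS
    elif keyword == "bits":
        names, groups = BIT_NAMES, BIT_GROUPS
    else:
        raise ValueError(f"unknown keyword: {tokens[0]}")
    state = {name: "OFF" for name in names}   # unlisted switches are OFF
    pending = []                              # bits waiting for their cmd
    for token in tokens[1:]:
        token = token.upper()
        if token in COMMANDS:
            if not pending:
                raise ValueError(f"cmd {token} has no bits before it")
            for name in pending:
                state[name] = token
            pending = []
        elif token in groups:
            pending.extend(groups[token])
        elif token in names:
            pending.append(token)
        else:
            raise ValueError(f"unknown bit name: {token}")
    if pending:
        raise ValueError("bits listed without a trailing cmd")
    return state

# The M1_LOCK_L case from the report: guardian owns INPUT and FM1, FM2 must be ON.
state = parse_command("switch INPUT FM1 MAN FM2 ON")
print(state["INPUT"], state["FM1"], state["FM2"], state["OUTPUT"])
```

Here MAN entries would be left out of the watch mask, while ON and OFF entries would define both the mask and the expected value.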

jameson.rollins@LIGO.ORG - 11:41, Friday 16 January 2015 (16114)

I really don't think the filter module monitoring is such a big deal, assuming we can get guardian internal settings monitoring working. If a guardian node is flipping bits in a filter module, then that node can just monitor the entire filter bank. We shouldn't have multiple different guardian nodes touching the same filter module, so this really shouldn't be an issue. This is not a "fundamental" flaw in the system, so no need to panic. We can easily work around this.