Reports until 13:48, Tuesday 03 March 2015
H1 SEI (ISC)
jeffrey.kissel@LIGO.ORG - posted 13:48, Tuesday 03 March 2015 (17044)
H1 ISI ITMY Watchdog's Payload Flag allows for some IPC errors
J. Kissel, D. Barker

Hearing about LLO's mishap with a Dolphin PCIe IPC error causing an errant ISI watchdog trip (LLO aLOG 17036), I was reminded of when Dave and Vincent implemented an allowance for (what they thought was) 10 [errors / sec] during the H2OAT, when such errors were triggering these types of false alarms all the time (see LHO aLOG 5373, or WP 3698). Poking around the BSC-ISI models to see if any of that was still in place (since it was all implemented at the top level of the model, not as part of any library), I found that ISI ITMY still has some of this residual stuff, but the other 4 BSC-ISIs do not.

Check out the first attachment for how it's implemented for ISI ITMY: the error rate spigot is fed into a comparator with a constant 150, and the output of the comparator is fed into the ISIPAYLOAD block. But this is sort of bogus. You can tell that the *idea* was to send an error if the rate (in [errors/sec]) was higher than 150. But in the front end, running at 4096 [Hz], the errors per *second* will either be 0 or 4096, and it's unclear what happens on a cycle-by-cycle basis. I like the idea, but the implementation doesn't really make sense.
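As a rough illustration of the intended logic only (not the actual Simulink wiring; the function name and the output polarity, 1 for OK, are assumptions made for this sketch), the threshold comparison can be written in Python. If the rate input really only ever takes the values 0 or 4096, as argued above, then any threshold between the two gives the same answer:

```python
THRESHOLD = 150  # constant fed to the comparator in the ITMY model

def payload_ok_threshold(error_rate, threshold=THRESHOLD):
    """Hypothetical helper: return 1 (OK) unless the IPC error rate
    (in errors/sec) exceeds the threshold, in which case return 0."""
    return 0 if error_rate > threshold else 1

# With only 0 or 4096 as possible inputs, the threshold value is irrelevant:
# every threshold strictly between them yields the same OK/not-OK decision.
for rate in (0, 4096):
    results = {t: payload_ok_threshold(rate, t) for t in (1, 150, 4000)}
    print(rate, results)
```

A threshold of 150 would only matter if the rate channel could report intermediate values.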

The second attachment shows the beam splitter's implementation, which is identical to that of the other 3 non-ITMY chambers -- and to how LLO has theirs implemented. The error rate is inverted, which means if the error rate is zero, the payload block gets a 1 (for "running" or "OK"), and when the error rate is non-zero, the payload block gets a 0 and trips the payload flag because it thinks the model is 0xDEADBEEF. This is an OK implementation, but it's only really useful for the use case where the SUS model is permanently down, AND it's bad for the use case where there's a single drop in communication in any cycle within a second.

Then I checked to see what the HAMs are doing. Hooray! Something different! These guys have a block that I'd never seen before, an "ISI_RFM_ERROR_CONVERTER." This is a part of the ${userapps}/isi/common/models/isipayloadwd.mdl library, which has the following helpful note above it:
"Dolphin RFM error logic inverter. ISIPAYLOADWD is coded to work with EZCAREAD parts which use 0 for not connected, and 1 for connected. Dolphins RFM error channel reports errors per second so 0 is good (no errors) and or more is bad (up to the rate the code is running at)."
Inside is a choice block, which passes 0 if the error rate from the IPC block is greater than 0, and 1 otherwise. So, this is really just a glorified version of what's implemented on the non-H1ISIITMY BSC chambers at LHO and LLO. And the same use-case flaws apply.
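For reference, the choice-block behavior described above boils down to something like the following Python sketch. The function name is made up for illustration; the real part is a Simulink block in the library model, not Python:

```python
def rfm_error_converter(error_rate):
    """Hypothetical stand-in for the choice block: map an IPC error rate
    (errors/sec, 0 = good) onto an EZCAREAD-style status
    (1 = connected, 0 = not connected)."""
    return 0 if error_rate > 0 else 1

# Any non-zero rate, even a single error in one second, reads as "not connected".
for rate in (0, 1, 4096):
    print(rate, rfm_error_converter(rate))
```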

We should 
(a) Make sure all chambers at both sites are doing the same thing, and
(b) Allow for more than one cycle's worth of errors -- in fact, allow for several seconds' worth of errors -- before tripping the ISI via the payload flag's IPC connection failure.
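One way to picture proposal (b) is a simple persistence counter on the error-rate channel. This is only a Python sketch under stated assumptions: the class name, the once-per-second update, and the 5-second allowance are all hypothetical, and a real implementation would live in the front-end model rather than in Python:

```python
class IpcErrorDebounce:
    """Hypothetical sketch: report 'not OK' (0) only after the IPC error
    rate has been non-zero for more than allowed_bad_seconds in a row."""

    def __init__(self, allowed_bad_seconds=5):  # 5 s is an arbitrary choice
        self.allowed_bad_seconds = allowed_bad_seconds
        self.bad_seconds = 0

    def update(self, error_rate):
        """Call once per second with the latest errors/sec reading.
        Returns 1 (OK) or 0 (trip the payload flag)."""
        if error_rate > 0:
            self.bad_seconds += 1
        else:
            self.bad_seconds = 0  # any clean second resets the count
        return 0 if self.bad_seconds > self.allowed_bad_seconds else 1


# A one-second glitch followed by a sustained outage.
rates = [0, 4096, 0, 0] + [4096] * 7
debounce = IpcErrorDebounce(allowed_bad_seconds=5)
status = [debounce.update(r) for r in rates]
print(status)
```

In this sketch the isolated glitch never trips the flag, while the sustained outage does once it outlasts the allowance. Resetting on the first clean second is a design choice; a leaky counter would be stricter about repeated short glitches.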
Images attached to this report