Around 08:30 PDT, h1iopsush2a had an ADC-DAC timing glitch. Pressing DIAG_RESET cleared the ADC error, but the glitch was large enough to desynchronize the DAC channels, so the IOP went into its safe state, in which it drives no DAC channels.
At 09:24 I did a simple restart (stop-all-models, then start-all-models). Within 5 minutes the IOP glitched again; /proc/h1iopsush2a/status showed a large ADC timeout of 180 µs at 09:34.
At 09:48 I performed a full power cycle of the CPU and IO Chassis. The sequence was:
1. stop-all-models
2. take the node out of the Dolphin fabric
3. power down the CPU
4. power down the IO Chassis*
5. power up the IO Chassis (wait for good timing lock)
6. power up the CPU (the models autostart)
* Before powering down the IO Chassis, I noted that the timing slave card showed good timing status.
This time the IOP IRIG-B went into a positive excursion; it topped out at 1500 and is now on its way down. The system has been running for 25 minutes with no recurrence of the ADC-DAC timing issue.
The autocalibrations of the 18-bit DACs all succeeded, but the third card consistently takes longer to calibrate (6572 ms versus 5341-5345 ms for the others):
controls@h1sush2a ~ 0$ dmesg|grep CAL
[ 50.143460] h1iopsush2a: DAC AUTOCAL SUCCESS in 5341 milliseconds
[ 55.506760] h1iopsush2a: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 62.536318] h1iopsush2a: DAC AUTOCAL SUCCESS in 6572 milliseconds
[ 67.899590] h1iopsush2a: DAC AUTOCAL SUCCESS in 5345 milliseconds
[ 73.693450] h1iopsush2a: DAC AUTOCAL SUCCESS in 5344 milliseconds
[ 79.056792] h1iopsush2a: DAC AUTOCAL SUCCESS in 5345 milliseconds
[ 84.425154] h1iopsush2a: DAC AUTOCAL SUCCESS in 5345 milliseconds
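To keep an eye on the slow third card across future restarts, the autocal times can be pulled out of dmesg and the outlier flagged automatically. This is a minimal sketch, assuming the log format shown above, that the cards appear in dmesg in card order, and an arbitrary 10% threshold over the fastest card; flag_slow_autocal is a hypothetical helper name, not an existing CDS tool:

```shell
# Hypothetical helper (not part of the CDS tools): read dmesg lines on stdin,
# collect the "DAC AUTOCAL ... in N milliseconds" durations in log order
# (assumed to be card order), and print any card that took more than 10%
# (assumed threshold) longer than the fastest card.
flag_slow_autocal() {
    awk '
        /DAC AUTOCAL/ {
            # The duration is the field immediately before "milliseconds".
            for (i = 2; i <= NF; i++)
                if ($i == "milliseconds") { n++; t[n] = $(i - 1) + 0 }
        }
        END {
            if (n == 0) exit 1    # no autocal lines found
            min = t[1]
            for (k = 2; k <= n; k++) if (t[k] < min) min = t[k]
            for (k = 1; k <= n; k++)
                if (t[k] > 1.10 * min)
                    printf "card %d: %d ms (fastest %d ms)\n", k, t[k], min
        }
    '
}
```

Usage: `dmesg | grep CAL | flag_slow_autocal`. Against the seven lines above it reports only the third card (6572 ms against a fastest time of 5341 ms). Comparing against the fastest card rather than a fixed time keeps the check independent of the absolute calibration duration.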