At 10:30 PST the h1iopoaf0 model detected an ADC/DAC error and stopped driving the DAC outputs. Before restarting the models, I pressed the DIAG_RESET button to verify that the ADC error in the STATE_WORD cleared (it did) and that the DAC error remained latched (it did). We also captured h1iopoaf0's /proc status file and the dmesg output.
Here is the /proc/h1iopoaf0/status file before the restart:
root@h1oaf0 /proc/h1iopoaf0 0# cat /proc/h1iopoaf0/status
startGpsTime=1162832563
uptime=5511
cpuTimeEverMax=8
cpuTimeEverMaxWhen=1162832990
adcHoldTime=15
adcHoldTimeEverMax=90
adcHoldTimeEverMaxWhen=1162837830
adcHoldTimeMax=17
adcHoldTimeMin=13
adcHoldTimeAvg=14
usrTime=2
usrHoldTime=3
cycle=25821
gps=1162838074
buildDate=Nov 10 2016 08:33:32
cpuTimeMax(cur,past sec)=3,5
cpuTimeMaxCycle(cur,past sec)=21,0
cycleHist: 3=65068@17 4=466@65535 5=2@1
DAC #0 18-bit buf_size=40
DAC #1 16-bit fifo_status=0 (OK)
ADC #0 read time MAX=17 Current=14
ADC #1 read time MAX=0 Current=0
ADC #2 read time MAX=0 Current=0
ADC #3 read time MAX=0 Current=0
ADC #4 read time MAX=0 Current=0
ADC #5 read time MAX=0 Current=0
The adcHoldTimeEverMax value of 90 looks very high: on other systems we see it in the 60-70 range, but not this high. Its timestamp, adcHoldTimeEverMaxWhen=1162837830, corresponds to Nov 10 2016 10:30:13 PST, which is the time of the event.
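The GPS-to-PST conversion above can be reproduced with a short sketch. It assumes a fixed GPS-UTC offset of 17 leap seconds (valid from July 2015 through the end of 2016) and a fixed UTC-8 offset, since daylight saving time had already ended on Nov 6 2016. The function name `gps_to_pst` is hypothetical; in practice a dedicated GPS-time utility with a full leap-second table would be the safer choice.

```python
from datetime import datetime, timedelta, timezone

# GPS time counts seconds since 1980-01-06 00:00:00 UTC and ignores leap
# seconds, so UTC lags GPS time by the accumulated leap-second count.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# Assumption: GPS-UTC = 17 s, valid 2015-07-01 through 2016-12-31 only.
LEAP_SECONDS_2016 = 17

# Assumption: Pacific Standard Time (UTC-8); DST ended 2016-11-06.
PST = timezone(timedelta(hours=-8), "PST")


def gps_to_pst(gps_seconds: int, leap_seconds: int = LEAP_SECONDS_2016) -> datetime:
    """Convert a GPS timestamp to PST using a fixed leap-second offset."""
    utc = GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)
    return utc.astimezone(PST)


if __name__ == "__main__":
    # adcHoldTimeEverMaxWhen from the status dump
    print(gps_to_pst(1162837830).strftime("%b %d %Y %H:%M:%S %Z"))
```

Running it prints `Nov 10 2016 10:30:13 PST`, matching the event time. Timestamps after 2016 need the offset bumped to 18 s.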
On other systems running 16-bit DACs, fifo_status is always 2. Before the restart h1iopoaf0 reported fifo_status=0; after the restart it also reports 2. According to the manual, these values mean:
2 | FIFO in low quarter
0 | FIFO in 2nd or 3rd quarter
In other words, before the restart the 16-bit DAC FIFO was sitting fuller than it normally does. I've set up monitors to report on ADC and FIFO status if this happens again.
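The kind of check such a monitor performs can be sketched as below. This is not the monitor that was actually deployed; it is a hypothetical illustration that parses the status text shown above. The function names, the regular expression, and the alarm threshold of 70 (taken from the 60-70 range seen on other systems) are all assumptions, and treating fifo_status=2 as "normal" relies only on the observation that other 16-bit DAC systems always report 2.

```python
import re

# fifo_status meanings as quoted from the manual in this entry.
FIFO_STATUS_MEANING = {
    2: "FIFO in low quarter",
    0: "FIFO in 2nd or 3rd quarter",
}

# Hypothetical threshold: other systems sit around 60-70.
ADC_HOLD_EVER_MAX_LIMIT = 70

# Matches e.g. "DAC #1 16-bit fifo_status=0 (OK)"
FIFO_RE = re.compile(r"DAC #(\d+) 16-bit fifo_status=(\d+)")


def parse_status(text: str) -> dict:
    """Collect simple key=value lines from an IOP /proc status dump."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        # Skip lines whose "key" contains spaces (DAC/ADC/cycleHist lines).
        if sep and " " not in key:
            fields[key] = value.strip()
    return fields


def check_status(text: str) -> list:
    """Return warnings for a high ADC hold time or an unusual DAC FIFO state."""
    warnings = []
    fields = parse_status(text)
    hold = int(fields.get("adcHoldTimeEverMax", "0"))
    if hold > ADC_HOLD_EVER_MAX_LIMIT:
        when = fields.get("adcHoldTimeEverMaxWhen", "?")
        warnings.append(f"adcHoldTimeEverMax={hold} at GPS {when}")
    for dac, status in FIFO_RE.findall(text):
        status = int(status)
        if status != 2:
            meaning = FIFO_STATUS_MEANING.get(status, "unknown")
            warnings.append(f"DAC #{dac} fifo_status={status} ({meaning})")
    return warnings


if __name__ == "__main__":
    # Excerpt of the pre-restart dump from this entry.
    sample = (
        "adcHoldTimeEverMax=90\n"
        "adcHoldTimeEverMaxWhen=1162837830\n"
        "DAC #0 18-bit buf_size=40\n"
        "DAC #1 16-bit fifo_status=0 (OK)\n"
    )
    for warning in check_status(sample):
        print("WARNING:", warning)
```

Note that the IOP prints "(OK)" next to fifo_status=0, so a monitor keyed only on that text would not have flagged this event; the sketch checks the numeric value instead.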