Reports until 10:50, Thursday 04 January 2018
H1 CDS
jonathan.hanks@LIGO.ORG - posted 10:50, Thursday 04 January 2018 - last comment - 14:52, Thursday 04 January 2018(39984)
Restarted models on h1lsc0
I restarted the h1lsc0 models today.  Dave Barker is trending the timing information and state word and may have more to add.  At this point we suspect a glitch in the IRIG-B, as seen yesterday.

The system was not responsive over the network, either via ssh or EPICS CA.  The MEDM screen on the control room wall showed everything green, but trying to view the MEDM screen from another computer (a new connection) failed, with the channels not connecting.

Going to the console showed the repeated error 'nf_conntrack: table full, dropping packet'.  The system was set to track 64k connections.  I changed the limit (until reboot) to 100,000.  At that point new connections could be made and the MEDM screens went red with IPC errors.  I am surprised by this behavior; I would have thought the IPC bit would have gone bad on the other machines regardless of the state of the lsc machine.  At this point I killed all the models and restarted them.  Then TJ and I went through and cleared all the IPC errors throughout the site after verifying that they were related to h1lsc0.
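
For reference, a minimal sketch of how the connection tracking table could be checked before it fills up. It assumes the usual Linux procfs entries nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter (older kernels may expose them elsewhere). The function names and the 90% threshold are illustrative, not something that is installed on the front ends.

```python
from pathlib import Path
from typing import Tuple

# Assumed procfs location of the netfilter connection tracking counters.
# This requires the nf_conntrack module to be loaded.
PROC_DIR = Path("/proc/sys/net/netfilter")


def read_conntrack(proc_dir: Path = PROC_DIR) -> Tuple[int, int]:
    """Read (current tracked connections, table limit) from procfs."""
    count = int((proc_dir / "nf_conntrack_count").read_text())
    limit = int((proc_dir / "nf_conntrack_max").read_text())
    return count, limit


def conntrack_usage(count: int, limit: int) -> float:
    """Return the fraction of the connection tracking table in use."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return count / limit


def is_near_full(count: int, limit: int, threshold: float = 0.9) -> bool:
    """True when table usage is at or above the (hypothetical) threshold."""
    return conntrack_usage(count, limit) >= threshold
```

On the machine itself this could be called as `is_near_full(*read_conntrack())`, for example from a periodic check, so that a filling table is noticed before packets are dropped.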

Reviewing the dmesg output and filtering out nf_conntrack errors showed an ADC TIMEOUT on h1lscaux, h1omc, h1sqz, h1omcpi, h1lsc at 7246965.68s since boot.
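
A rough sketch of the filtering step, assuming the default dmesg line format with a leading "[seconds since boot]" stamp. The helper names are made up for illustration, and the text of the ADC timeout line used in any example input is a placeholder, not the real kernel message. As a sanity check on the number above, 7246965.68 s is about 83.9 days of uptime.

```python
import re
from datetime import timedelta

# Default dmesg format: "[   12345.678901] message text"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def filter_dmesg(lines, exclude="nf_conntrack"):
    """Yield (seconds_since_boot, message) for lines that do not mention `exclude`."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines without a timestamp prefix
        seconds, message = float(match.group(1)), match.group(2)
        if exclude in message:
            continue
        yield seconds, message


def since_boot(seconds):
    """Convert a seconds-since-boot stamp into a timedelta for easier reading."""
    return timedelta(seconds=seconds)
```

Feeding the saved dmesg output through `filter_dmesg` leaves only the lines of interest, and `since_boot(7246965.68).days` gives the whole days of uptime at the time of the timeout.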

Comments related to this report
david.barker@LIGO.ORG - 11:14, Thursday 04 January 2018 (39985)

BTW: the network filter connection tracking problem seen on h1lsc0 today was also seen on h1oaf0 in November 2016.

alog: Link

david.barker@LIGO.ORG - 11:30, Thursday 04 January 2018 (39986)

The EDCU is configured to read two EPICS channels from the h1ioplsc0 model via channel access (H1:FEC-7_STATE_WORD and H1:IOP-LSC0_ADC_DT_OUTMON). Of the two, the latter should be constantly changing and would show if it froze to a single value. Trending this channel shows that the EDCU did not lose its connection to h1ioplsc0 this morning, but the hourly autoburt could not connect at 10:10 PST. The autoburt could however connect to the user models on h1lsc0 at this time (only the IOP model was disconnected).
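
The "froze to a single value" check on a channel like H1:IOP-LSC0_ADC_DT_OUTMON can be sketched as below. This only operates on a list of samples already pulled from the trend; how the samples are fetched is not shown, and the function names and the run-length threshold are hypothetical.

```python
def longest_constant_run(samples):
    """Return the length of the longest run of identical consecutive samples."""
    longest = 0
    current = 0
    previous = object()  # sentinel that compares unequal to every sample
    for value in samples:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


def looks_frozen(samples, max_run=5):
    """Flag a channel as frozen if it repeats one value more than `max_run` times in a row."""
    return longest_constant_run(samples) > max_run
```

A channel that is expected to change constantly should only ever show short runs, so a long run in the trend would point to a lost connection or a stalled model.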

 

jonathan.hanks@LIGO.ORG - 14:52, Thursday 04 January 2018 (39999)
Restarted the models again around 22:50 UTC.  There is work going on in the CER, moving fibers and such around, so we suspect this glitch was caused by that work.