Displaying report 1-1 of 1.
Reports until 12:02, Tuesday 08 November 2016
H1 CDS (TCS)
david.barker@LIGO.ORG - posted 12:02, Tuesday 08 November 2016 - last comment - 13:28, Tuesday 08 November 2016(31316)
h1oaf0 problems

WP6287 Add PEM ADC to h1oaf0, reconfigure h1iopoaf0 to read new ADC

Jim installed a 7th ADC into the h1oaf0 IO Chassis this morning. On power up, the IOP model did not start well and reported ADC and DAC errors.

Since we had to restart the h1iopoaf0 model anyway to clear the DAC errors, we installed the new code which reads the new ADC.

After running for about an hour, the h1oaf0 machine stopped making new network connections with the console error

nf_conntrack: table full, dropping packet

repeating frequently. On h1lsc0 we issued the command to remotely take h1oaf0 out of the corner station dolphin fabric, and on h1oaf0's console we issued the command for it to prepare-shutdown from the fabric. With no connection between h1oaf0 and h1boot (dolphin master) we had little confidence we could reboot h1oaf0 without glitching most of the corner station.

Researching the error, we found that it is possible to expand the netfilter connection tracking table size on the fly with the command

echo 256000 > /proc/sys/net/netfilter/nf_conntrack_max

(the max is at the default of 65536).
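As a rough illustration of how table occupancy could be checked before or after raising the limit, the sketch below reads the kernel's standard nf_conntrack_count and nf_conntrack_max entries and prints the fill level. The function name conntrack_usage is hypothetical and not part of any CDS tooling; the optional path arguments exist only so the function can be exercised on a machine without the nf_conntrack module loaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): print conntrack
# table usage as "count/max (pct%)".
# Arguments (both optional): path to the count file, path to the max file.
# Defaults are the standard netfilter entries under /proc.
conntrack_usage() {
    count_file="${1:-/proc/sys/net/netfilter/nf_conntrack_count}"
    max_file="${2:-/proc/sys/net/netfilter/nf_conntrack_max}"

    # Fail early if either file cannot be read (e.g. module not loaded).
    count=$(cat "$count_file") || return 1
    max=$(cat "$max_file") || return 1

    # Integer percentage, rounded down.
    echo "$count/$max ($((count * 100 / max))%)"
}
```

Called with no arguments on the front end, conntrack_usage reads the live values; a count close to the max would be consistent with the "table full" message above.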

This cleared the error: new MEDMs and Guardian could establish CA links, and we could SSH onto the machine. At this point we again issued the dolphin prepare-shutdown command, with more confidence that it was successful, but there is still a chance of a corner station crash.

We will make the larger table size the default by creating the file on the boot server (h1boot)

/diskless/root/etc/sysctl.conf

with one line

net.netfilter.nf_conntrack_max = 256000

We will test this on the next reboot of h1oaf0 (waiting for a good time in case a CS crash is precipitated).
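One possible way to check after a reboot whether the configured value took effect is to extract the setting from the sysctl.conf file and compare it with the running value. The sketch below assumes a simple "key = value" file format; the function name sysctl_conf_value is hypothetical, and the file path is a parameter so the same check could be pointed at /diskless/root/etc/sysctl.conf on h1boot or at /etc/sysctl.conf on the front end.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): print the value
# configured for a key in a sysctl.conf-style file. The last match wins.
# Exits non-zero if the key is not present.
sysctl_conf_value() {
    key="$1"
    file="${2:-/etc/sysctl.conf}"
    awk -F= -v key="$key" '
        /^[[:space:]]*[#;]/ { next }    # skip comment lines
        {
            k = $1; v = $2
            # trim surrounding whitespace from key and value
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", k)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
            if (k == key) found = v
        }
        END { if (found != "") print found; else exit 1 }
    ' "$file"
}
```

On the running machine, the output of sysctl_conf_value net.netfilter.nf_conntrack_max can be compared with that of sysctl -n net.netfilter.nf_conntrack_max; a mismatch would indicate the file was not applied at boot.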

Note that around the time of the nf_conntrack errors the IOP reported ADC and DAC errors. It is still possible that the new ADC was the cause of these errors, and it may be removed if another error is seen.

Comments related to this report
keith.thorne@LIGO.ORG - 12:16, Tuesday 08 November 2016 (31318)
Will likely propagate to LLO when possible
david.barker@LIGO.ORG - 13:28, Tuesday 08 November 2016 (31324)

It looks like our change to /etc/sysctl.conf didn't work: the front end computer had defaulted back to 65536. We manually set it to 256000 for now.
