aLIGO LHO Logbook

H1 CDS

david.barker@LIGO.ORG - posted 15:33, Friday 28 April 2017 - last comment - 15:41, Friday 28 April 2017 (35880)

front end computer crashes may be related to sched_clock overflow after 208.5 days

Ryan has tracked down the core problem. With older kernels, sched_clock overflows after about 208.5 days of uptime. My previous alogs showed that these computers have been running since the last power outage (30 Sep 2016 06:45), and the error messages on h1susex and h1seiex suggest that the clock was zeroed late Wednesday night (26 Apr 2017 22:00). The interval between these two times is 208.6 days.
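As a quick check of the numbers, the sketch below computes the uptime between the two times quoted above and the overflow threshold. The threshold assumes the commonly cited mechanism for these kernels: the x86 cycles-to-nanoseconds conversion multiplies the cycle count by a scale factor pre-shifted by 2^10, so the 64-bit intermediate product (roughly ns * 2^10) wraps once ns reaches 2^54. The function `cycles_to_ns` and the 2 GHz clock rate are hypothetical illustrations, not the kernel's actual code or our hardware.

```python
from datetime import datetime

# Times quoted in this alog (site local time; both are PDT, so no DST shift).
power_outage = datetime(2016, 9, 30, 6, 45)   # front ends booted after the outage
clock_zeroed = datetime(2017, 4, 26, 22, 0)   # clock zeroing seen on h1susex/h1seiex
uptime_days = (clock_zeroed - power_outage).total_seconds() / 86400

# Assumed mechanism: ns = (cycles * scale) >> 10 in 64-bit unsigned arithmetic,
# where scale = (ns per cycle) << 10, so the product is about ns * 2**10.
SCALE_SHIFT = 10
overflow_ns = 2**64 >> SCALE_SHIFT            # 2**54 ns
overflow_days = overflow_ns / 1e9 / 86400

def cycles_to_ns(cycles: int, scale: int) -> int:
    """Hypothetical cycle-to-ns conversion, truncating the product to 64 bits as C would."""
    product = (cycles * scale) & (2**64 - 1)
    return product >> SCALE_SHIFT

print(f"uptime at clock zeroing: {uptime_days:.1f} days")
print(f"overflow threshold:      {overflow_days:.1f} days")

# Hypothetical 2 GHz clock: 2 cycles per ns, so scale = (0.5 ns) << 10 = 512.
cpu_hz = 2_000_000_000
scale = (10**9 << SCALE_SHIFT) // cpu_hz
for true_ns in (overflow_ns - 10**9, overflow_ns + 10**9):   # 1 s before / after the wrap
    cycles = true_ns * cpu_hz // 10**9
    print(f"true {true_ns / 1e9:.1f} s -> reported {cycles_to_ns(cycles, scale) / 1e9:.1f} s")
```

One second past the threshold the reported time drops back to about 1 s, which is consistent with the clock zeroing suggested by the h1susex and h1seiex error messages.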

We are investigating why only two computers have crashed, why both are at EX, why both crashed around 6 am, and why they crashed one day apart.

If you google 'sched clock overflows in 208 days' you will find many articles on this. One article references kernel 2.6.32; we are running 2.6.34.

Comments related to this report

david.barker@LIGO.ORG - 15:41, Friday 28 April 2017 (35882)


I saw a posting saying the bug was fixed in kernel 2.6.38.
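To see which front ends might be at risk, a minimal sketch like the one below could compare a kernel release string against 2.6.38 and estimate the days left before the wrap. It assumes the 2.6.38 fix version from the posting above (not verified here) and the 2^54 ns threshold; the helpers `kernel_tuple` and `days_until_overflow` are hypothetical names for illustration.

```python
import re

FIXED_IN = (2, 6, 38)                  # assumed fix version, taken from the posting above
OVERFLOW_DAYS = 2**54 / 1e9 / 86400    # ~208.5 days, assuming the 2**54 ns threshold

def kernel_tuple(release: str) -> tuple:
    """Parse the leading numeric part of a release string, e.g. '2.6.34-rt56' -> (2, 6, 34)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing third component (e.g. '3.2') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())

def days_until_overflow(release: str, uptime_s: float):
    """Days left before the assumed sched_clock wrap, or None if the kernel is assumed fixed."""
    if kernel_tuple(release) >= FIXED_IN:
        return None
    return OVERFLOW_DAYS - uptime_s / 86400

# Example with a hypothetical uptime of 200 days on a 2.6.34 kernel.
print(days_until_overflow("2.6.34", 200 * 86400))
```

On a Linux front end, `platform.release()` returns the same string as `uname -r`, and the first field of `/proc/uptime` is the uptime in seconds, so those two values could be passed in directly.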