I am reminded of a kernel 2.6.34 bug whereby a system is prone to lock up once it has been up for 208.5 days. h1boot had been running for 215 days when it froze. This bug is also most probably the reason for h1build's freeze ten days before h1boot's. The dates agree with the restart data shown in this alog: Link
This will all be resolved soon when we transition the front ends, boot and build machines to a later kernel.
david.barker@LIGO.ORG - 16:05, Tuesday 05 September 2017 (38528)
The longest-running front ends have been up for 123 days, so there is no need to reboot them soon. The Gentoo DAQ machines are running kernel 2.6.35, which includes a fix for this problem. This is evidenced by h1tw1, which has been running for 239 days, well beyond the 208.5-day onset of the problem.
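As a rough cross-check of the numbers above, here is a minimal Python sketch. It assumes, which this alog does not state, that the 208.5-day figure corresponds to a 64-bit nanosecond clock value overflowing at 2**54 ns, as commonly reported for this class of kernel bug. The function names and the uptime table are illustrative only; the uptimes are the ones quoted in this entry.

```python
# Assumption (not from the alog): the lockup threshold is 2**54 nanoseconds.
OVERFLOW_NS = 2 ** 54
NS_PER_DAY = 86_400 * 10 ** 9  # seconds per day times nanoseconds per second


def threshold_days() -> float:
    """Return the assumed overflow threshold expressed in days."""
    return OVERFLOW_NS / NS_PER_DAY


def past_threshold(uptime_days: float) -> bool:
    """True if the given uptime exceeds the assumed threshold."""
    return uptime_days > threshold_days()


# Uptimes in days as quoted in this entry (labels are illustrative).
uptimes = {
    "h1boot (2.6.34, at freeze)": 215,
    "longest-running front end (2.6.34)": 123,
    "h1tw1 (2.6.35, has the fix)": 239,
}

if __name__ == "__main__":
    print(f"assumed threshold: {threshold_days():.1f} days")
    for name, days in uptimes.items():
        print(f"{name}: {days} days, past threshold: {past_threshold(days)}")
```

The check only says whether an uptime is beyond the threshold. Whether a machine is actually at risk also depends on its kernel, which is why h1tw1 on 2.6.35 is past the threshold and still running.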
h1boot locked up due to 208.5 days bug.