Reports until 08:13, Sunday 23 March 2014
H1 CDS
david.barker@LIGO.ORG - posted 08:13, Sunday 23 March 2014 - last comment - 08:34, Sunday 23 March 2014(10937)
possible problem with h1boot, investigating

I'm noticing a possible problem with h1boot, the NFS server for the /opt/rtcds file system. Machines that mount this file system are not letting me log in (they freeze after accepting my password), and h1boot is not responding to ping requests.
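A quick way to separate "host is down" from "NFS service is hung" is to probe the NFS port directly with a short timeout, so the check itself cannot freeze the way a login on an NFS client does. The sketch below is illustrative only: it substitutes a TCP connect for the ICMP ping mentioned above, assumes NFS is served over TCP on the standard port 2049, and the helper name `tcp_port_open` is hypothetical.

```python
import socket


def tcp_port_open(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 2049 is the standard NFS port; that h1boot serves NFS over TCP
    is an assumption, not something stated in the report.
    """
    try:
        # create_connection resolves the name and applies the timeout
        # to the connect attempt, so a dead server cannot hang the check.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


# Example (run from a machine that does not itself depend on /opt/rtcds):
#   tcp_port_open("h1boot")
```

A `False` result here, together with failed pings, would point at the server itself rather than at an individual client.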

The disk-to-disk backup of h1boot at 05:00 this morning completed normally at 05:04. MEDM snapshot images suggest the problem appeared at 06:42 this morning.

Comments related to this report
david.barker@LIGO.ORG - 08:34, Sunday 23 March 2014 (10938)

Here are the central syslogs for the event.

Mar 23 06:42:18 h1boot kernel: [12441189.877539] CPU 0 
Mar 23 06:42:18 h1boot kernel: [12441189.877544] Modules linked in:
Mar 23 06:42:18 h1boot kernel: [12441189.877953] 
Mar 23 06:42:18 h1boot kernel: [12441189.878157] Pid: 4652, comm: nfsd Not tainted 2.6.34.1 #7 X8DTU/X8DTU
Mar 23 06:42:18 h1boot kernel: [12441189.878369] RIP: 0010:[<ffffffff8102f70e>]  [<ffffffff8102f70e>] find_busiest_group+0x3bc/0x784
Mar 23 06:42:18 h1boot kernel: [12441189.878785] RSP: 0018:ffff8801b9cefa60  EFLAGS: 00010046
Mar 23 06:42:18 h1boot kernel: [12441189.878993] RAX: 0000000000000000 RBX: ffff880001e0e3c0 RCX: 0000000000000000
Mar 23 06:42:18 h1boot kernel: [12441189.879403] RDX: 0000000000000000 RSI: ffff880001e0e4d0 RDI: 00000000fe15cf79
Mar 23 06:45:25 script0 kernel: [12378753.814330] nfs: server h1boot not responding, still trying
Mar 23 06:45:27 script0 kernel: [12378755.721463] nfs: server h1boot not responding, still trying
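The gap between the kernel trace on h1boot and the first NFS timeout reported by a client can be read off the excerpt with a small parser. This is a rough sketch, not a tool used at the site: the regular expression, the helper names `parse_line` and `first_event`, and the hard-coded year are all assumptions made for illustration, and only three lines copied from the excerpt are used as input.

```python
import re
from datetime import datetime

# Matches lines of the form:
#   "Mar 23 06:42:18 h1boot kernel: [12441189.877539] message"
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) kernel: \[\s*(?P<uptime>\d+\.\d+)\] ?(?P<msg>.*)$"
)


def parse_line(line, year=2014):
    """Parse one syslog line; return a dict, or None if it does not match.

    Syslog timestamps carry no year, so it is supplied as an assumption.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
    return {
        "time": when,
        "host": m.group("host"),
        "uptime": float(m.group("uptime")),
        "msg": m.group("msg"),
    }


def first_event(lines, host, needle=""):
    """Return the first parsed record from `host` whose message contains `needle`."""
    for line in lines:
        rec = parse_line(line)
        if rec and rec["host"] == host and needle in rec["msg"]:
            return rec
    return None


# Lines copied from the syslog excerpt.
SAMPLE = [
    "Mar 23 06:42:18 h1boot kernel: [12441189.877539] CPU 0 ",
    "Mar 23 06:42:18 h1boot kernel: [12441189.878157] Pid: 4652, comm: nfsd Not tainted 2.6.34.1 #7 X8DTU/X8DTU",
    "Mar 23 06:45:25 script0 kernel: [12378753.814330] nfs: server h1boot not responding, still trying",
]

trace = first_event(SAMPLE, "h1boot")
timeout = first_event(SAMPLE, "script0", "not responding")
delay = (timeout["time"] - trace["time"]).total_seconds()
print(f"first client timeout {delay:.0f} s after the h1boot kernel trace")
```

For these lines the delay works out to 187 seconds (06:42:18 to 06:45:25), which fits the 06:42 onset suggested by the MEDM snapshots.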