Reports until 13:48, Monday 23 July 2012
X1 SUS
james.batch@LIGO.ORG - posted 13:48, Monday 23 July 2012 (3541)
X1 Tripleteststand running intermittently
The tripleteststand is malfunctioning. The symptoms are as follows: the GPS time for the user model and the IOP model freezes at the same value for both; killing the user model and restarting the IOP then gives a GPS time of 0, and the IOP reports that no ADC/DAC cards exist. The dmesg command shows the following kernel messages, logged while the IOP module is being loaded:

[13290.003398] x1ioptriple: Allocated daq shmem; set at 0xffffc9001a10d000
[13290.003399] x1ioptriple: configured to use 5 cards
[13290.003400] x1ioptriple: Initializing PCI Modules
[13290.003408] x1ioptriple: ADC card on bus a; device 4 prim a
[13290.003409] x1ioptriple: adc card on bus a; device 4 prim a
[13290.003415] x1ioptriple: pci0 = 0xffffffff
[13290.003423] resource map sanity check conflict: 0xffffffff 0x1000001fe 0xffc00000 0xffffffff reserved
[13290.003426] ------------[ cut here ]------------
[13290.003431] WARNING: at arch/x86/mm/ioremap.c:98 __ioremap_caller+0xd5/0x301()
[13290.003432] Hardware name: X8DTU
[13290.003433] Info: mapping multiple BARs. Your kernel is fine.
[13290.003434] Modules linked in: x1ioptriplefe(+) mbuf [last unloaded: x1ioptriplefe]
[13290.003437] Pid: 5760, comm: insmod Not tainted 2.6.34.1 #7
[13290.003438] Call Trace:
[13290.003442]  [] warn_slowpath_common+0x77/0x8f
[13290.003444]  [] warn_slowpath_fmt+0x3c/0x3e
[13290.003446]  [] __ioremap_caller+0xd5/0x301
[13290.003449]  [] ? T.484+0x13/0x15
[13290.003451]  [] ? pci_bus_read_config_dword+0x66/0x74
[13290.003452]  [] ioremap_nocache+0x12/0x14
[13290.003457]  [] mapAdc+0x65/0x251 [x1ioptriplefe]
[13290.003461]  [] mapPciModules+0x6c1/0x824 [x1ioptriplefe]
[13290.003465]  [] init_module+0x242/0x97d [x1ioptriplefe]
[13290.003468]  [] ? init_module+0x0/0x97d [x1ioptriplefe]
[13290.003471]  [] do_one_initcall+0x59/0x149
[13290.003475]  [] sys_init_module+0xd1/0x231
[13290.003477]  [] system_call_fastpath+0x16/0x1b
[13290.003478] ---[ end trace 31c35fdb3a0a9ba9 ]---
[13290.003481] ioremap reserve_memtype failed -22
[13290.003483] x1ioptriple: pci2 = 0xffffffff
[13290.003485] resource map sanity check conflict: 0xffffffff 0x1000001fe 0xffc00000 0xffffffff reserved
[13290.003487] ioremap reserve_memtype failed -22
[13290.003488] x1ioptriple: ADC I/O address=0xffffffff  0x0
[13290.003491] BUG: unable to handle kernel NULL pointer dereference at (null)
[13290.003919] IP: [] mapAdc+0xd8/0x251 [x1ioptriplefe]
[13290.004137] PGD 1b7cbc067 PUD 1b7c84067 PMD 0 
[13290.004352] Oops: 0000 [#1] SMP 
[13290.004562] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:27:01.0/class
[13290.004974] CPU 2 
[13290.004979] Modules linked in: x1ioptriplefe(+) mbuf [last unloaded: x1ioptriplefe]
[13290.005596] 
[13290.005801] Pid: 5760, comm: insmod Tainted: G        W  2.6.34.1 #7 X8DTU/X8DTU
[13290.006215] RIP: 0010:[]  [] mapAdc+0xd8/0x251 [x1ioptriplefe]
[13290.006637] RSP: 0018:ffff8801b7d49dc8  EFLAGS: 00010292
[13290.006846] RAX: 000000000000003f RBX: 0000000000000000 RCX: 000000000000003f
[13290.007059] RDX: 0000000000020cc5 RSI: ffffffff8179c135 RDI: 000000000000000a
[13290.007272] RBP: ffff8801b7d49df8 R08: 000000007ffffff2 R09: 000000000000000a
[13290.007487] R10: 0000000000000006 R11: 00000000ffffffff R12: ffffffffa000f130
[13290.007701] R13: ffff8801be8ad800 R14: 0000000000000000 R15: 0000000000000000
[13290.007914] FS:  00007f957e7836f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
[13290.008326] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13290.008537] CR2: 0000000000000000 CR3: 00000001a165d000 CR4: 00000000000006e0
[13290.008746] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13290.008955] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[13290.009165] Process insmod (pid: 5760, threadinfo ffff8801b7d48000, task ffff8801bdd0f200)
[13290.009571] Stack:
[13290.009771]  0000000000000000 ffffffffa000f130 0000000000000001 0000000000000000
[13290.009985] <0> 0000000000000000 0000000000000000 ffff8801b7d49e48 ffffffffa00087b7
[13290.010401] <0> ffff8801b7d49e18 00000000b7d49e58 ffff8801b7d49e48 ffff8801b7d49e58
[13290.011023] Call Trace:
[13290.011231]  [] mapPciModules+0x6c1/0x824 [x1ioptriplefe]
[13290.011445]  [] init_module+0x242/0x97d [x1ioptriplefe]
[13290.011659]  [] ? init_module+0x0/0x97d [x1ioptriplefe]
[13290.011872]  [] do_one_initcall+0x59/0x149
[13290.012081]  [] sys_init_module+0xd1/0x231
[13290.012291]  [] system_call_fastpath+0x16/0x1b
[13290.012501] Code: 00 02 00 00 e8 0c cc 01 e1 8b 35 91 e8 49 00 48 89 c2 49 89 c6 48 c7 c7 fc d3 00 a0 31 c0 e8 a9 57 00 00 4e 89 34 fd 20 5e 01 a0 <41> 8b 36 48 c7 c7 19 d4 00 a0 31 c0 e8 90 57 00 00 4a 8b 14 fd 
[13290.013247] RIP  [] mapAdc+0xd8/0x251 [x1ioptriplefe]
[13290.013468]  RSP 
[13290.018602] CR2: 0000000000000000
[13290.019122] ---[ end trace 31c35fdb3a0a9baa ]---

The only way to recover is to power off the computer, power off the I/O Chassis at its power supply, wait, then power up the I/O Chassis followed by the computer. At that point the IOP model can be started normally, followed by the user model. So far, the system has died three times since Wednesday, July 18.
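
The failure signature in the dmesg output above could be checked for automatically before attempting a restart. The following is a minimal sketch in Python, not part of the front-end code: the function names `scan_dmesg` and `looks_like_card_mapping_failure` are hypothetical, and the patterns are taken only from the messages quoted in this entry (a PCI address printed as 0xffffffff, the "ioremap reserve_memtype failed" line, and the NULL pointer dereference). Other failure modes may print different text.

```python
import re

# Patterns copied from the kernel messages quoted in this entry.
# They are assumptions about what this particular failure looks like.
BAD_PCI_ADDR = re.compile(r"pci\d+ = 0xffffffff")
IOREMAP_FAIL = re.compile(r"ioremap reserve_memtype failed")
NULL_DEREF = re.compile(r"BUG: unable to handle kernel NULL pointer dereference")


def scan_dmesg(text):
    """Count how many lines of dmesg text match each failure pattern."""
    counts = {"bad_pci_addr": 0, "ioremap_fail": 0, "null_deref": 0}
    for line in text.splitlines():
        if BAD_PCI_ADDR.search(line):
            counts["bad_pci_addr"] += 1
        if IOREMAP_FAIL.search(line):
            counts["ioremap_fail"] += 1
        if NULL_DEREF.search(line):
            counts["null_deref"] += 1
    return counts


def looks_like_card_mapping_failure(text):
    """True if the text shows both an all-ones PCI address and a NULL dereference."""
    counts = scan_dmesg(text)
    return counts["bad_pci_addr"] > 0 and counts["null_deref"] > 0
```

The text to scan could be obtained with, for example, `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. A positive result only indicates that the log resembles the one above; it does not by itself show that the power cycle described here is required.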