Reports until 19:53, Saturday 16 July 2016
H1 ISC
stefan.ballmer@LIGO.ORG - posted 19:53, Saturday 16 July 2016 - last comment - 08:58, Sunday 17 July 2016(28458)
Computer gremlins

Carl, Evans, Stefan,

We had h1ecatx1 crash. A reboot didn't bring back the connection - we had to log in and manually start start.bat.

Then we ran into strange Guardian behaviour, which we tracked down to EPICS channels reporting different values on h1guardian0 and on the operator machines.

In particular, on h1guardian0:

In [8]: ezca['ALS-Y_LOCK_ERROR_FLAG']
Out[8]: 1
 

while on operator 0 I get:

In [7]: ezca['ALS-Y_LOCK_ERROR_FLAG']
Out[7]: 0

 

Comments related to this report
david.barker@LIGO.ORG - 20:20, Saturday 16 July 2016 (28459)

The problem was an EPICS gateway that was stuck with the incorrect value. I restarted the gateway between the slowcontrols-lan and the fe-lan. Guardian now connects directly to the h1ecaty1 Beckhoff IOC and sees the correct value.

Here are the diagnostics:

On the workstation nucws20, I ran 'caget -d 5' to return an integer value rather than the enumerated string:

david.barker@nucws20: caget -d 5 H1:ALS-Y_LOCK_ERROR_FLAG

H1:ALS-Y_LOCK_ERROR_FLAG

    Value:            0

The same command on h1guardian0:

controls@h1guardian0:~ 0$ caget -d 5 H1:ALS-Y_LOCK_ERROR_FLAG

H1:ALS-Y_LOCK_ERROR_FLAG

    Value:            1
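The comparison above can be automated. This is a minimal sketch, assuming the 'caget -d 5' output from each host has already been captured as text; the function names are hypothetical and not part of any EPICS tool.

```python
import re

def parse_caget_value(output):
    """Return the text after 'Value:' in captured caget output, or None."""
    match = re.search(r"^\s*Value:\s*(\S+)\s*$", output, flags=re.MULTILINE)
    return match.group(1) if match else None

def values_disagree(outputs_by_host):
    """Map host -> parsed value and report whether the hosts see different values."""
    values = {host: parse_caget_value(text) for host, text in outputs_by_host.items()}
    return values, len(set(values.values())) > 1

# Outputs copied from the two sessions above.
captured = {
    "nucws20": "H1:ALS-Y_LOCK_ERROR_FLAG\n\n    Value:            0\n",
    "h1guardian0": "H1:ALS-Y_LOCK_ERROR_FLAG\n\n    Value:            1\n",
}
values, mismatch = values_disagree(captured)
print(values, mismatch)
```

A mismatch like this one suggests that the two hosts are not talking to the same Channel Access server.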

More information can be obtained with the cainfo command:

controls@h1guardian0:~ 0$ cainfo H1:ALS-Y_LOCK_ERROR_FLAG

H1:ALS-Y_LOCK_ERROR_FLAG

    State:            connected

    Host:             h1egw0.cds.ligo-wa.caltech.edu:42076

    Access:           read, write

    Native data type: DBF_ENUM

    Request type:     DBR_ENUM

    Element count:    1

CA.Client.Exception...............................................

    Warning: "Identical process variable names on multiple servers"

    Context: "Channel: "H1:ALS-Y_LOCK_ERROR_FLAG", Connecting to: h1egw0.cds.ligo-wa.caltech.edu:42076, Ignored: h1ecaty1.cds.ligo-wa.caltech.edu:5064"

    Source File: ../cac.cpp line 1297

    Current Time: Sat Jul 16 2016 19:56:42.065413401

..................................................................
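The Context line of the warning names both the server the client picked and the one it ignored. The sketch below pulls those out so they can be logged or checked. It assumes the warning text has the same layout as the one pasted above; the regular expression and function name are illustrative only.

```python
import re

# Pattern matched against the 'Context:' line of the CA client warning.
WARNING_CONTEXT = re.compile(
    r'Channel: "(?P<channel>[^"]+)", '
    r'Connecting to: (?P<used>[^,\s]+), '
    r'Ignored: (?P<ignored>[^"\s]+)'
)

def parse_duplicate_pv_warning(text):
    """Return channel, used server and ignored server, or None if no match."""
    match = WARNING_CONTEXT.search(text)
    return match.groupdict() if match else None

context_line = (
    '    Context: "Channel: "H1:ALS-Y_LOCK_ERROR_FLAG", '
    'Connecting to: h1egw0.cds.ligo-wa.caltech.edu:42076, '
    'Ignored: h1ecaty1.cds.ligo-wa.caltech.edu:5064"'
)
info = parse_duplicate_pv_warning(context_line)
print(info)
```

Here the client chose the gateway (h1egw0) and ignored the Beckhoff IOC (h1ecaty1), which is how the stale gateway value reached guardian.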

After the errant gateway was restarted:

controls@h1guardian0:~ 0$ cainfo H1:ALS-Y_LOCK_ERROR_FLAG

H1:ALS-Y_LOCK_ERROR_FLAG

    State:            connected

    Host:             h1ecaty1.cds.ligo-wa.caltech.edu:5064

    Access:           read, write

    Native data type: DBF_ENUM

    Request type:     DBR_ENUM

    Element count:    1
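To check which server a channel is being served from, the Host field of the cainfo output is the thing to look at. This is a rough sketch, assuming the output is captured as text in the layout shown above; the helper names are hypothetical.

```python
def parse_cainfo(output):
    """Collect the indented 'Key: value' lines of cainfo output into a dict."""
    fields = {}
    for line in output.splitlines():
        # Field lines are indented; the channel name line is not.
        if not line[:1].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

def served_by(output, expected_host):
    """True if the Host field (without its port) equals expected_host."""
    host = parse_cainfo(output).get("Host", "")
    return host.split(":")[0] == expected_host

after_restart = (
    "H1:ALS-Y_LOCK_ERROR_FLAG\n"
    "    State:            connected\n"
    "    Host:             h1ecaty1.cds.ligo-wa.caltech.edu:5064\n"
    "    Access:           read, write\n"
)
print(served_by(after_restart, "h1ecaty1.cds.ligo-wa.caltech.edu"))
```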

controls@h1guardian0:~ 0$ caget -d 5 H1:ALS-Y_LOCK_ERROR_FLAG

H1:ALS-Y_LOCK_ERROR_FLAG

    Value:            0

david.barker@LIGO.ORG - 08:58, Sunday 17 July 2016 (28461)DAQ, GRD

I've opened an FRS ticket for this. We should either remove one of the two paths by which guardian can connect to remote IOCs, or ensure that only one of them is reliably used.

https://services.ligo-la.caltech.edu/FRS/show_bug.cgi?id=5892
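One possible way to pin a client to a single connection path is through the standard Channel Access environment variables EPICS_CA_AUTO_ADDR_LIST and EPICS_CA_ADDR_LIST. The sketch below only builds such an environment. It is an illustration, not the fix adopted in the ticket; the helper name is hypothetical, and whether an explicit address list suits the guardian machine's other channels is an assumption that would need checking.

```python
import os

def direct_ca_environment(ioc_addresses, base_env=None):
    """Return a copy of the environment that searches only the listed servers.

    EPICS_CA_AUTO_ADDR_LIST=NO turns off the automatic broadcast search list,
    and EPICS_CA_ADDR_LIST gives the space-separated servers to search instead.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["EPICS_CA_AUTO_ADDR_LIST"] = "NO"
    env["EPICS_CA_ADDR_LIST"] = " ".join(ioc_addresses)
    return env

# Host name taken from the cainfo output earlier in this report.
env = direct_ca_environment(["h1ecaty1.cds.ligo-wa.caltech.edu"], base_env={})
print(env)
```

The resulting dict could be passed as the environment of a client process, so that only the listed IOC answers its channel searches.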