Carl, Evans, Stefan,
h1ecatx1 crashed. A reboot didn't bring back the connection; we had to log in and start start.bat manually.
We then ran into strange guardian behaviour, which we tracked down to EPICS channels having different values on h1guardian0 and on the operator machines.
In particular, on h1guardian0:
In [8]: ezca['ALS-Y_LOCK_ERROR_FLAG']
Out[8]: 1
while on operator 0 I get:
In [7]: ezca['ALS-Y_LOCK_ERROR_FLAG']
Out[7]: 0
The problem was an EPICS gateway that was stuck with the incorrect value. I restarted the gateway between the slowcontrols-lan and the fe-lan; guardian now connects directly to the h1ecaty1 Beckhoff IOC and sees the correct value.
Here are the diagnostics:
On the workstation nucws20, I ran 'caget -d 5' to return an integer value rather than the enumerated string:
david.barker@nucws20: caget -d 5 H1:ALS-Y_LOCK_ERROR_FLAG
H1:ALS-Y_LOCK_ERROR_FLAG
Value: 0
The same command on h1guardian0:
controls@h1guardian0:~ 0$ caget -d 5 H1:ALS-Y_LOCK_ERROR_FLAG
H1:ALS-Y_LOCK_ERROR_FLAG
Value: 1
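The comparison above can be scripted when more than one channel is suspect. The following is a minimal sketch, not the procedure used here: it assumes the `caget -d 5` output has already been captured as text on each machine, and the helper names `parse_caget_value` and `find_disagreement` are hypothetical.

```python
def parse_caget_value(text):
    """Return the integer after 'Value:' in `caget -d 5` output, or None."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Value:"):
            return int(stripped.split(":", 1)[1])
    return None


def find_disagreement(outputs):
    """Return host -> parsed value if the hosts disagree, else an empty dict.

    `outputs` maps a host name to the caget text captured on that host.
    """
    values = {host: parse_caget_value(text) for host, text in outputs.items()}
    return values if len(set(values.values())) > 1 else {}


# Output copied from the two sessions above.
outputs = {
    "nucws20": "H1:ALS-Y_LOCK_ERROR_FLAG\nValue: 0\n",
    "h1guardian0": "H1:ALS-Y_LOCK_ERROR_FLAG\nValue: 1\n",
}
mismatch = find_disagreement(outputs)
print(mismatch)
```

A non-empty result means at least one client is reading a stale or different copy of the channel.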
More information can be obtained with the cainfo command:
controls@h1guardian0:~ 0$ cainfo H1:ALS-Y_LOCK_ERROR_FLAG
H1:ALS-Y_LOCK_ERROR_FLAG
State: connected
Host: h1egw0.cds.ligo-wa.caltech.edu:42076
Access: read, write
Native data type: DBF_ENUM
Request type: DBR_ENUM
Element count: 1
CA.Client.Exception...............................................
Warning: "Identical process variable names on multiple servers"
Context: "Channel: "H1:ALS-Y_LOCK_ERROR_FLAG", Connecting to: h1egw0.cds.ligo-wa.caltech.edu:42076, Ignored: h1ecaty1.cds.ligo-wa.caltech.edu:5064"
Source File: ../cac.cpp line 1297
Current Time: Sat Jul 16 2016 19:56:42.065413401
..................................................................
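The two useful facts in this output are the server the client actually connected to and the server it ignored. As a rough sketch, assuming the cainfo text has been saved to a string, they can be extracted as follows; `parse_cainfo` is a hypothetical helper, and the regular expressions are matched only against the output format shown above.

```python
import re


def parse_cainfo(text):
    """Pull the connected host and any ignored duplicate server from cainfo text."""
    host = re.search(r"^\s*Host:\s*(\S+)", text, re.MULTILINE)
    ignored = re.search(r'Ignored:\s*([^\s"]+)', text)
    return {
        "host": host.group(1) if host else None,
        "ignored": ignored.group(1) if ignored else None,
    }


# Relevant lines copied from the cainfo output above.
sample = '''H1:ALS-Y_LOCK_ERROR_FLAG
State: connected
Host: h1egw0.cds.ligo-wa.caltech.edu:42076
Warning: "Identical process variable names on multiple servers"
Context: "Channel: "H1:ALS-Y_LOCK_ERROR_FLAG", Connecting to: h1egw0.cds.ligo-wa.caltech.edu:42076, Ignored: h1ecaty1.cds.ligo-wa.caltech.edu:5064"
'''

info = parse_cainfo(sample)
print(info["host"])     # server the client is connected to
print(info["ignored"])  # duplicate server that was ignored, if any
```

Here the connected host is the gateway (h1egw0) and the ignored server is the IOC itself (h1ecaty1), which is the situation that produced the stale value.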
After the errant gateway was restarted:
controls@h1guardian0:~ 0$ cainfo H1:ALS-Y_LOCK_ERROR_FLAG
H1:ALS-Y_LOCK_ERROR_FLAG
State: connected
Host: h1ecaty1.cds.ligo-wa.caltech.edu:5064
Access: read, write
Native data type: DBF_ENUM
Request type: DBR_ENUM
Element count: 1
controls@h1guardian0:~ 0$ caget -d 5 H1:ALS-Y_LOCK_ERROR_FLAG
H1:ALS-Y_LOCK_ERROR_FLAG
Value: 0
I've opened an FRS ticket for this. We should either stop giving guardian two possible routes to remote IOCs or ensure that only one connection is reliably used.
https://services.ligo-la.caltech.edu/FRS/show_bug.cgi?id=5892