J. Kissel, K. Izumi, J. Warner, S. Dwyer

After another ETMX front-end failure this morning (see LHO aLOG 35861, 35857, etc.), recovering the IFO was much easier, thanks to yesterday morning's lesson about not running initial alignment scripts that suffer from bit rot (see LHO aLOG 35839). However, after we completed the recovery, the SDF system's OBSERVE.snap told us that some of the same critical initial alignment references had been changed at 14:17 UTC, namely:
- the green ITM camera reference points:
  H1:ALS-X_CAM_ITM_PIT_OFS
  H1:ALS-X_CAM_ITM_YAW_OFS
- the red transmission monitor QPD offset:
  H1:LSC-X_TR_A_LF_OFFSET

When we discussed this with Jim, he said he'd heard that Corey (a little surprisingly) didn't have much trouble turning on the green ASC system. That is surprising because if these ITM camera offsets are large, the error signals are large, and we would have had the same trouble closing those loops as yesterday.

We traced the change to Dave's reboot of h1alsex and h1iscex this morning at around 14:15 UTC (see LHO aLOG 35862), during which those two models restored their out-of-date safe.snap files. Recall that the safe.snap files for these computers are soft links to the down.snap files in the userapps repo:

/opt/rtcds/lho/h1/target/h1alsex/h1alsexepics/burt ]$ ls -l safe.snap
lrwxrwxrwx 1 controls controls 62 Mar 29 2016 safe.snap -> /opt/rtcds/userapps/release/als/h1/burtfiles/h1alsex_down.snap

/opt/rtcds/lho/h1/target/h1iscex/h1iscexepics/burt ]$ ls -l safe.snap
lrwxrwxrwx 1 controls controls 62 Mar 29 2016 safe.snap -> /opt/rtcds/userapps/release/isc/h1/burtfiles/h1iscex_down.snap

The safe.snap in each local "target" directory is what the front end uses to restore its EPICS records when it restarts, which is why we've intentionally commandeered that file with a soft link to a version-controlled file in the userapps repo.
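As a quick sanity check on the soft-link arrangement described above, a script could confirm that each model's safe.snap is still a link into the userapps tree rather than a stale local file. The sketch below is a minimal, hypothetical helper (check_safe_snap_link is not an existing site tool); it assumes Python 3.5+ for os.path.commonpath and uses only the standard library.

```python
import os


def check_safe_snap_link(burt_dir, expected_root):
    """Return a list of problems with the safe.snap link in burt_dir.

    Hypothetical helper, not an existing site tool. An empty list means
    safe.snap is a soft link that resolves to an existing file somewhere
    under expected_root.
    """
    problems = []
    link = os.path.join(burt_dir, 'safe.snap')
    if not os.path.islink(link):
        problems.append('{} is not a soft link'.format(link))
        return problems
    # realpath follows the link (and any nested links) to its final target
    target = os.path.realpath(link)
    root = os.path.realpath(expected_root)
    if os.path.commonpath([target, root]) != root:
        problems.append('{} points outside {}: {}'.format(link, root, target))
    if not os.path.exists(target):
        problems.append('{} is a dangling link to {}'.format(link, target))
    return problems
```

For example, calling it with burt_dir set to /opt/rtcds/lho/h1/target/h1alsex/h1alsexepics/burt and expected_root set to /opt/rtcds/userapps/release would be expected to return an empty list for the links listed above.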
We've since reverted the above offsets to their OBSERVE values, and I've accepted those OBSERVE values into the safe.snap / down.snap and committed the updated snap file to the userapps repo. In the attached screenshots, "EPICS VALUE" is the correct OBSERVE value and "SETPOINT" is the errant safe.snap value, so they show what I've accepted as the current correct values.
The fundamental problem here is that we're trying to maintain two files with nearly duplicate information: safe and OBSERVE contain mostly the same settings, and realistically only one file is ever going to be well maintained.
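Because the two files carry mostly the same settings, one way to catch drift before a reboot bites us is to diff them directly. The sketch below assumes BURT-style snap files (a header between "--- Start BURT header" and "--- End BURT header", then one "channel count value ..." record per line, with any extra fields such as an SDF monitor flag ignored); read_snap and diff_snaps are hypothetical names. Quoted string values containing spaces are not handled, and channels present in only one file are skipped.

```python
def read_snap(path):
    """Parse a BURT-style snapshot into {channel: value string}.

    Assumed format: optional header block, then records whose first field
    is the channel name, second the element count, third the value.
    """
    values = {}
    in_header = False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith('--- Start BURT header'):
                in_header = True
                continue
            if line.startswith('--- End BURT header'):
                in_header = False
                continue
            if in_header or not line:
                continue
            fields = line.split()
            if len(fields) >= 3:
                values[fields[0]] = fields[2]
    return values


def diff_snaps(safe_path, observe_path, tol=1e-6):
    """Return (channel, safe value, observe value) for channels that differ.

    Numeric values are compared as floats with an absolute tolerance, so
    '299.06' and '2.990600000000000e+02' count as equal; non-numeric values
    are compared as strings.
    """
    safe = read_snap(safe_path)
    observe = read_snap(observe_path)
    diffs = []
    for chan in sorted(set(safe) & set(observe)):
        a, b = safe[chan], observe[chan]
        try:
            differs = abs(float(a) - float(b)) > tol
        except ValueError:
            differs = a != b
        if differs:
            diffs.append((chan, a, b))
    return diffs
```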
I've added a test to DIAG_MAIN to check whether the ITM camera references change. It's not a terribly clever test: it just checks whether each camera offset is within a small range (±5) around a hard-coded pitch and yaw value for each ITM. These values will need to be adjusted if the cameras or the reference spots are moved, which means there are now three places where they must be updated (the OBSERVE and safe.snap files and, now, DIAG_MAIN), but hopefully this will keep us from getting bitten by changed references again. The code is attached below.
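@SYSDIAG.register_test
def ALS_CAM_CHECK():
    """Check that the ALS CAM OFS references haven't changed.
    Will need to be updated if the cameras are moved.
    """
    # SYSDIAG and ezca come from the DIAG_MAIN guardian environment (not imported here).
    # Nominal camera offsets for each ITM, and the allowed +/- window around them.
    nominal_dict = {
        'X': {'PIT': 285.850, 'YAW': 299.060, 'range': 5},
        'Y': {'PIT': 309.982, 'YAW': 367.952, 'range': 5},
    }
    # .items() works under both Python 2 and 3 (.iteritems() is Python 2 only)
    for opt, vals in nominal_dict.items():
        for dof in ['PIT', 'YAW']:
            cam = ezca['ALS-{}_CAM_ITM_{}_OFS'.format(opt, dof)]
            # Flag the offset if it falls outside nominal +/- range
            if not (vals[dof] - vals['range'] < cam < vals[dof] + vals['range']):
                yield 'ALS {} CAM {} OFS changed from {}'.format(opt, dof, vals[dof])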
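The check logic can also be exercised offline, without guardian, by passing in any mapping that stands in for ezca. The sketch below is a hypothetical standalone variant of ALS_CAM_CHECK (als_cam_check and NOMINAL are illustrative names, not part of DIAG_MAIN); it keeps the same window test and nominal values, assumes channel names without the "H1:" prefix as in the code above, and additionally reports the current value in the message.

```python
# Hypothetical copy of the nominal values hard-coded in DIAG_MAIN above.
NOMINAL = {
    'X': {'PIT': 285.850, 'YAW': 299.060, 'range': 5},
    'Y': {'PIT': 309.982, 'YAW': 367.952, 'range': 5},
}


def als_cam_check(ezca, nominal=NOMINAL):
    """Yield a message for every ITM camera offset outside its window.

    ezca is anything indexable by channel name; a plain dict keyed like
    'ALS-X_CAM_ITM_PIT_OFS' stands in for the guardian ezca object here.
    """
    for opt in sorted(nominal):
        vals = nominal[opt]
        for dof in ['PIT', 'YAW']:
            cam = ezca['ALS-{}_CAM_ITM_{}_OFS'.format(opt, dof)]
            # Same strict window test as the DIAG_MAIN version
            if not (vals[dof] - vals['range'] < cam < vals[dof] + vals['range']):
                yield 'ALS {} CAM {} OFS changed from {} (now {})'.format(
                    opt, dof, vals[dof], cam)
```

Keeping the window test in the same chained-comparison form as the original means a NaN reading is still flagged, which a rewrite such as abs(cam - nominal) >= range would silently miss.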