Jeff K., Jeff B., Jenne, TJ, Jim, Dave:
Around 4pm PST, TJ reported that OMC had tripped and the watchdog could not be untripped. Jeff K. recommended a model restart. Unfortunately, due to a miscommunication, we first mistakenly restarted the OMC model on the LSC front end (sorry OMC). We then restarted the correct SUS-OMC model on SUSH56, which did not fix it. Next we restarted all the models on SUSH56 (including the IOP), which also did not fix it. We then stopped all models and started only the IOP and SUS-SRM for further debugging. (In the meantime, the SWWD on the IOP had tripped SEI for HAM5 and HAM6.)
After some debugging, we found that the Perl script sus/common/scripts/wdreset_all.pl was throwing an error about not finding the PERL CA LIBRARY. Jim tracked this down to a missing CaTools.pm Perl module in the userapps/release/guardian directory. It turns out that this file was removed from the SVN repository back on 2 March 2015, and the LHO working directory was only updated this afternoon by Jenne and TJ. This explains why the watchdog resets worked last night but not this afternoon.
In the meantime, we manually reset the watchdogs for SUS-SRM/SR3/OMC and SEI HAM5/6, and set the SDF back to OBSERVE for SUSH56IOP, SUSSRM/SR3/OMC, and OMC.
For now, we have manually copied the CaTools.pm file into userapps/release/sus/common/scripts to get the watchdog reset script working again.
This raises an FRS:
A Perl module used by the watchdog system has been deprecated. The watchdog system should be changed to use Python instead of Perl (or perhaps Bash for exceptionally simple scripts).
FRS LINK
Stuart, it was broken because I updated the same folder when I was visiting LLO. I am at fault for the CaTools.pm links being broken at both sites, though I had no idea that simply updating from SVN could cause this.