H1 Upgrade to RCG5.1.1
Jonathan, Erik, EJ, Jim, TJ, Dave:
This morning we upgraded H1 from RCG5.0.5 to RCG5.1.1. The 5.0.5 boot server h1vmboot1 was retired, replaced with the 5.1.1 boot server h1vmboot0.
We had performed several test upgrades using a small number of non-Dolphined front ends. On Friday 10th we upgraded h1susaux[h2, h34, h56, ex, ey] and h1pemmx to RCG5.1.0 (as it was at the time). During that upgrade we encountered several issues resulting from file ownership/permissions. A new version, RCG5.1.1, was then released with more relaxed file ownership rules.
Monday afternoon we upgraded the test front ends again from 5.1.0 to 5.1.1, this time not rebooting the computers, just restarting the models. We also did a full 5.1.1 install and rebooted h1build in preparation for the Tuesday main upgrade. Unfortunately, several end station front end computers locked up around this time (h1susey, h1seiey, h1iscey, h1susex). To get these systems running again over Monday night we reverted their installs back to 5.0.5. While doing this we found that the DAQ INI file had changed for h1isietmy due to model changes since its last restart.
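A change in a model's DAQ INI file, like the one found for h1isietmy, can be spotted by comparing the channel sections of the old and new files. The following is only a minimal sketch: it assumes the INI files use standard [section] headers with one section per channel plus a [default] section, and the function names and the channel names in the usage note are hypothetical, not taken from the real files.

```python
import configparser


def channel_sections(ini_text):
    """Return the set of channel section names found in INI text.

    Assumption: each channel is one [section]; a [default] section
    holds shared settings and is not a channel.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case as written in the file
    parser.read_string(ini_text)
    return {name for name in parser.sections() if name.lower() != "default"}


def diff_channels(old_text, new_text):
    """Return (added, removed) channel names, each as a sorted list."""
    old = channel_sections(old_text)
    new = channel_sections(new_text)
    return sorted(new - old), sorted(old - new)
```

For example, given an old file containing only the hypothetical channel H1:EXAMPLE_A and a new file that also contains H1:EXAMPLE_B, diff_channels returns H1:EXAMPLE_B as added and nothing as removed. A non-empty result means the DAQ needs a restart to read the new channel list.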
This morning, starting at 8am, we started the main upgrade. Jim undid the late 2022 changes to h1isiham4 and h1isietmy which we detected Monday evening. Boot and Dolphin management services were stopped on h1vmboot1 and started on h1vmboot0.
First we did a test reboot of h1build; no end station problems were seen. We then rebooted h1ecatmon0 and got the slow controls SDF online again. Because most susaux and pemmx had already been upgraded, these were left alone to feed data to the DAQ to keep that running.
We stopped all of the H1 models using the "rcg stop --all" command.
First we rebooted h1susauxb123 because it is running the EDC, needed for vacuum trending.
Next EY was upgraded, using h1vmboot1 to log in and issue the reboot command after the Dolphin switch port was fenced.
After the EY Dolphin fabric checked out, EX was upgraded.
At this point the DAQ was restarted to read the new INI files. RCG5.1.1 added slow channels to every model for IPC diagnostics.
The corner station front ends were booted in the order PSL, SUS, SEI, ISC.
The last machine to be rebooted was h1cdsrfm.
h1iopseih16 would not start because this front end had lost contact with the second Adnaco backplane in its IO Chassis. The h1seih16 front end computer and IO Chassis were power cycled, which resolved this problem.
At this point all of the front ends had been upgraded and we had a green overview with the exception of WD dackills.
A new H1EPICS_GRD.ini was generated following the creation of the CAMERA_SERVO node.
We did a second DAQ restart, with an EDC restart included.
At 10:40 we handed control of H1 over to the control room.
DAQ starts were not without issues. We needed to restart gds0 and gds1 to resync their channel lists. After running for several minutes we also had spontaneous restarts of fw0 and fw1 (just one each).
Erik has found that some IOP models, which have SWWDs, have had bad DAQ data since the 5.0.5 upgrade in Oct. This manifests itself as a channel hop, by 2 channels in the ini file. A fix is forthcoming.
After the RCG upgrade work was done Ryan and I recovered all SUS (after resetting the WD) and later Ryan recovered the SEI.
The SUS were set to ALIGNED state (at first Damped and then toggled to ALIGNED).
Tue14Feb2023
LOC TIME HOSTNAME MODEL/REBOOT
09:00:01 h1build ***REBOOT***
09:12:54 h1ecatmon0 ***REBOOT***
09:19:05 h1susauxb123 ***REBOOT***
09:29:56 h1iscey ***REBOOT***
09:30:33 h1susey ***REBOOT***
09:31:57 h1seiey ***REBOOT***
09:32:40 h1pemmy ***REBOOT***
09:37:39 h1iscex ***REBOOT***
09:37:48 h1susex ***REBOOT***
09:38:06 h1seiex ***REBOOT***
09:41:53 h1psl0 ***REBOOT***
09:43:35 h1susb123 ***REBOOT***
09:43:43 h1sush2a ***REBOOT***
09:43:48 h1sush2b ***REBOOT***
09:43:49 h1sush34 ***REBOOT***
09:44:07 h1sush56 ***REBOOT***
09:44:16 h1sush7 ***REBOOT***
09:44:34 h1cdsh8 ***REBOOT***
09:48:20 h1seib2 ***REBOOT***
09:48:21 h1seib1 ***REBOOT***
09:48:24 h1seib3 ***REBOOT***
09:48:31 h1seih16 ***REBOOT***
09:48:33 h1seih23 ***REBOOT***
09:48:36 h1seih45 ***REBOOT***
09:48:52 h1seih7 ***REBOOT***
09:49:13 h1asc0 ***REBOOT***
09:49:22 h1lsc0 ***REBOOT***
09:49:24 h1oaf0 ***REBOOT***
09:49:30 h1omc0 ***REBOOT***
09:55:10 h1cdsrfm ***REBOOT***
10:01:06 h1oaf0 ***REBOOT***
10:08:05 h1seih16 ***REBOOT***
10:19:38 h1susauxb123 h1edc[[DAQ]]
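The restart log above can be checked mechanically, for instance to list front ends that needed more than one reboot. The following is a minimal sketch assuming each log line has the form "HH:MM:SS hostname event"; the function names are hypothetical and not part of any CDS tool.

```python
from collections import Counter
from datetime import datetime


def parse_log(text):
    """Parse 'HH:MM:SS hostname event' lines into (time, host, event) tuples.

    Lines that do not start with a valid time (such as the column
    header or the date line) are skipped.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        try:
            stamp = datetime.strptime(parts[0], "%H:%M:%S").time()
        except ValueError:
            continue
        entries.append((stamp, parts[1], parts[2].strip()))
    return entries


def repeated_reboots(entries):
    """Return sorted hostnames that have more than one ***REBOOT*** entry."""
    counts = Counter(host for _, host, event in entries
                     if event == "***REBOOT***")
    return sorted(host for host, n in counts.items() if n > 1)
```

Applied to the log above, repeated_reboots(parse_log(log_text)) picks out h1oaf0 and h1seih16, the two machines that were rebooted a second time. The final h1susauxb123 line is a model restart (h1edc), not a reboot, so it is not counted.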