RCG 3.0.3 upgrade
Jim, Dave:
All the frontends were rebuilt against RCG tag 3.0.3 on Friday afternoon before the shutdown. The build started with an empty H1.ipc file to clean out any unused IPC channels. Now that all the frontends have been restarted today, they are running 3.0.3.
Infrastructure Restart
Richard, Carlos and Jim:
The CDS infrastructure was restarted between 7:00 and 8:30. This included networking, DC power, timing, DNS, DHCP, NTP, NFS and auth.
Front End Startup
Richard, Jim, Dave:
h1boot was started. Then h1psl0 was started as a non-Dolphin FEC. This permitted the DAQ to be restarted (it needs a running front end). Then the remaining non-Dolphin MSR FECs were started. Then Richard started the EX, EY, MX and MY FECs. Finally we started the MSR Dolphin machines. All started up once the fabric was complete.
h1lsc0 reported timing issues. The timing card on this IO Chassis is independently powered; I activated this DC supply and the system started normally.
h1pemmy needed one more IOP restart, as is expected for these non-IRIGB units.
During the startup of the SWWDs, all MSR Dolphin channels were found to be not working. We tracked this down to the Dolphin switches not being powered up, which was easily resolved.
Eagle-eyed Jim noticed the system time on h1psl0 was way off (by 7 hours). We assume this was because it was the first machine to be started, soon after h1boot (which perhaps had not yet NTP synced). We manually set the time and handed control back to NTPD.
EPICS Gateways
Dave:
All the EPICS gateway processes on cdsegw0 were restarted using the start.all script. This started too many gateways, resulting in duplicate data paths. The redundant gateways were removed and the script was corrected.
h1hwinj1
Jim, Dave:
We started the PSINJECT process on h1hwinj1. Again the SL firewall prevented EPICS CA access, which we re-remembered and fixed. CW injection to PCAL is running.
Slow Controls SDF
I started Jonathan's SDF monitor pseudo-target on h1build.
Vacuum Controls and FMCS cell phone notification
I got the cell phone alarm texter running again on cdslogin.
PI Model changes
Tega, Dave
A new PI_MASTER.mdl file was created at 15:10 Friday, which just missed the 3.0.3 build cut-off time. We recompiled and restarted the four models which use this file (h1omcpi, h1susitmpi, h1susetmxpi and h1susetmypi). A DAQ restart was also required.
NDS processes not giving trend data
Jeff, Jim, Dave
Here was an interesting one: the NDS processes were giving real-time data but no archived trend data. The cronjob which keeps the jobs directory in check by deleting all files older than one day today cleaned out the directory and then deleted the jobs directory itself. Re-creating the directory and stopping the cronjob from erasing it again fixed the problem.
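For illustration only, here is a minimal Python sketch of a cleanup that removes old files but can never remove the directory it is cleaning. This is not the actual cronjob, whose contents are not shown in this entry; the function name prune_old_files and its arguments are hypothetical, and the one-day age limit is taken from the description above.

```python
import os
import time


def prune_old_files(directory, max_age_seconds=24 * 60 * 60, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Hypothetical helper: the directory itself is never removed, and
    subdirectories and symlinks are left untouched.
    Returns the list of paths that were deleted.
    """
    now = time.time() if now is None else now

    # Make sure the directory exists, so a missing jobs directory is
    # re-created rather than left absent.
    os.makedirs(directory, exist_ok=True)

    # Snapshot the listing first so we do not delete while iterating.
    with os.scandir(directory) as it:
        entries = list(it)

    removed = []
    for entry in entries:
        # Only plain files are candidates; never directories.
        if not entry.is_file(follow_symlinks=False):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.path)
    return removed
```

Because the function only ever calls os.remove on entries inside the directory, an empty directory is simply left empty.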
h1tw0 RAID issues
Carlos, Ryan
h1tw0 did not like being power cycled and the new RAID controller card stopped working again. It looks like we are going to have to recreate the RAID again tomorrow.
DAQ instability
Dan, Jim, Dave
Sadly we leave the system with an unstable h1fw0. Earlier in the day h1fw1 was unstable, then h1fw0 became very unstable. Dan did some QFS diagnostics and I did some NFS diagnostics. We cannot see any reason for the instabilities: all NFS/disk access is well within bandwidth, with no indication of packet/re-transmission errors.
In the evening I started the camera copy software, which sends digital camera centroid data from the corner station to the ALS models at the end stations.