WP11598 Upgrade HWS computer hardware
Jonathan, Erik, TJ, Camilla, Dave:
Yesterday Jonathan and Erik replaced the original h1hwsmsr computer with a spare V1 computer. They moved the bootdisk and the /data RAID disks over to the new computer, and restored the /data NFS file system for the ITMY HWS code (h1hwsmsr1). At the time the new computer was not connecting to the ITMX HWS camera.
This morning Camilla initialized the camera connection, after which we were able to control the camera and see images from it.
This afternoon at 1pm during the commissioning period we stopped the temporary HWS ITMX IOC on cdsioc0 and Camilla started the actual HWS ITMX code on h1hwsmsr. We verified that the code is running correctly, images are being taken, and settings were restored from SDF.
During the few minutes between stopping the dummy IOC and starting the actual IOC both the EDC and h1tcshwssdf SDF had disconnected channels, which then reconnected over the subsequent minutes after channel restoration.
As Naoki did in 75023, with SQZ_ANG_ADJUST in DOWN, I adjusted the OPO temperature and H1:SQZ-ADF_OMC_TRANS_PHASE to improve SQZ in the yellow BLRMs (350Hz region). Our SQZ angle has changed from 167 to 180 degrees. Since I did this, the servo has moved the angle away from and then back towards this optimum of 180 degrees; I will continue to watch.
BLRMs at 350Hz and below improved from this change. Other BLRMs remained the same. The change in SQZ angle came from Jenne's adjustments to the OMC QPD offsets, which affected the SQZ ASC (expected) and then the SQZ angle servo.
Plots of SQZ angle and SQZ ASC attached.
When I changed the OPO temp and SQZ angle, the AS42_SUM_NORM almost halved, unsure if that is expected (see upper left of plot).
I ran a broadband calibration suite at 20:00:40 UTC using pydarm measure --run-headless bb
diag> save /ligo/groups/cal/H1/measurements/PCALY2DARM_BB/PCALY2DARM_BB_20240103T200048Z.xml
/ligo/groups/cal/H1/measurements/PCALY2DARM_BB/PCALY2DARM_BB_20240103T200048Z.xml saved
diag> quit
EXIT KERNEL
INFO | bb measurement complete.
INFO | bb output: /ligo/groups/cal/H1/measurements/PCALY2DARM_BB/PCALY2DARM_BB_20240103T200048Z.xml
INFO | all measurements complete.
After this completed I ran the simulines sweep using gpstime;python /ligo/groups/cal/src/simulines/simulines/simuLines.py -i /ligo/groups/cal/src/simulines/simulines/settings_h1.ini;gpstime
GPS start: 1388347800
GPS stop: 1388349124
2024-01-03 20:31:46,637 | INFO | File written out to: /ligo/groups/cal/H1/measurements/DARMOLG_SS/DARMOLG_SS_20240103T200944Z.hdf5
2024-01-03 20:31:46,656 | INFO | File written out to: /ligo/groups/cal/H1/measurements/PCALY2DARM_SS/PCALY2DARM_SS_20240103T200944Z.hdf5
2024-01-03 20:31:46,668 | INFO | File written out to: /ligo/groups/cal/H1/measurements/SUSETMX_L1_SS/SUSETMX_L1_SS_20240103T200944Z.hdf5
2024-01-03 20:31:46,680 | INFO | File written out to: /ligo/groups/cal/H1/measurements/SUSETMX_L2_SS/SUSETMX_L2_SS_20240103T200944Z.hdf5
2024-01-03 20:31:46,691 | INFO | File written out to: /ligo/groups/cal/H1/measurements/SUSETMX_L3_SS/SUSETMX_L3_SS_20240103T200944Z.hdf5
Attached is a screenshot of the calibration monitor, and the pydarm report.
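As a cross-check of the GPS start and stop times quoted for the simulines sweep, a minimal sketch of the GPS-to-UTC conversion follows. It assumes a fixed GPS-UTC offset of 18 s (correct for dates after 1 Jan 2017 until another leap second is announced); the function and constant names are illustrative and are not part of gpstime or simuLines.

```python
from datetime import datetime, timedelta, timezone

# GPS time counts seconds from 1980-01-06 00:00:00 UTC and ignores leap seconds.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# Assumption: hard-coded GPS-UTC offset, valid for dates after 2017-01-01.
GPS_UTC_OFFSET_S = 18


def gps_to_utc(gps_seconds):
    """Convert integer GPS seconds to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - GPS_UTC_OFFSET_S)


start = gps_to_utc(1388347800)  # GPS start from the sweep
stop = gps_to_utc(1388349124)   # GPS stop from the sweep
duration_s = (stop - start).total_seconds()

print(start.isoformat(), stop.isoformat(), duration_s)
```

With these inputs the sweep runs from 20:09:42 to 20:31:46 UTC (1324 s), which lines up with the 20:31:46 timestamps on the "File written out" lines.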
H1 is still locked. Commissioning will run from 20:00 UTC until 23:00 UTC in coordination with LLO. The main items during this period are a calibration suite, SQZ work, and a CAL DARM measurement.
Wed Jan 03 10:06:32 2024 INFO: Fill completed in 6min 29secs
Gerardo confirmed a good fill curbside
TITLE: 01/03 Day Shift: 16:00-00:00 UTC (08:00-16:00 PST), all times posted in UTC
STATE of H1: Observing at 152Mpc
OUTGOING OPERATOR: Tony
CURRENT ENVIRONMENT:
SEI_ENV state: CALM
Wind: 1mph Gusts, 0mph 5min avg
Primary useism: 0.04 μm/s
Secondary useism: 0.43 μm/s
QUICK SUMMARY:
- H1 currently on a 16 hour lock
- Seismic activity is low, CDS/DMs ok
Yesterday John Z restarted the BLRMS GDS monitor; the control room SEIS BLRMS FOM on nuc5 started displaying data from 9pm onwards.
Verbal Alarms, which had been reporting leaps.py issues, started working normally yesterday afternoon. We currently don't understand the original error or why it cleared up.
TITLE: 01/03 Eve Shift: 00:00-08:00 UTC (16:00-00:00 PST), all times posted in UTC
STATE of H1: Observing at 157Mpc
INCOMING OPERATOR: Tony
SHIFT SUMMARY:
Other than the one hitch related to the Leap Second, this was a fairly nice and smooth shift post-Maintenance.
LOG:
Rainy night continues. H1 has been locked for 4.25hrs (with about 45min out of observing due to the CW injection Leap Second issue). Rode through a M4.9 EQ from Japan.
At 0037utc, H1 was bumped out of OBSERVING due to an SDF change. The change has been this channel:
I chatted with Louis (phoned him because it is a CAL channel), but he mentions this is not an "online calibration issue", and this has to do with the Continuous Wave Injection. It also does not appear to be guardian-related. After chatting with Louis, since Jonathan and Erik are on-site, I just notified them about the issue.
We have currently been OUT of OBSERVING for about 40+min.
Attached is another screenshot with:
Currently, Dave, Erik, & Jonathan are investigating.
0128: Back to OBSERVING! (thanks to Dave, Erik, Jonathan, and also Louis for my first chat with him).
Dave, Erik, Jonathan: This was caused by an expired leap second file on the h1hwinj1 machine. It looked like the CW injection process was restarting. After investigation we noticed that the psinject service was failing with an error loading the leap second database file. The solution was to update the tzdata package. On SL7 this is "yum update tzdata". After we did that, the injection process was able to start up and run.
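To catch this kind of expiry ahead of time, a small check along the following lines might help. This is only a sketch under assumptions: it assumes the file in question follows the NIST/IERS leap-seconds.list layout, where the expiry is given on a line starting with "#@" as seconds since 1900-01-01 (NTP epoch). The log does not say which file psinject actually reads, and the sample text below is an illustrative fragment, not the contents of the file on h1hwinj1.

```python
import re
from datetime import datetime, timedelta, timezone

# leap-seconds.list timestamps are seconds since the NTP epoch, 1900-01-01 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def leap_file_expiry(text):
    """Return the expiry from the '#@ <ntp seconds>' line, or None if absent."""
    for line in text.splitlines():
        match = re.match(r"#@\s+(\d+)", line)
        if match:
            return NTP_EPOCH + timedelta(seconds=int(match.group(1)))
    return None


def is_expired(text, now):
    """Treat a file with no expiry line as expired, to be conservative."""
    expiry = leap_file_expiry(text)
    return expiry is None or now >= expiry


# Illustrative fragment only (hypothetical sample, not the real file).
SAMPLE = "#\tsample fragment\n#@\t3928521600\n3692217600\t37\t# 1 Jan 2017\n"

print(leap_file_expiry(SAMPLE))
print(is_expired(SAMPLE, datetime(2024, 1, 3, tzinfo=timezone.utc)))
```

In practice the text would be read from wherever the installed file lives on the machine being checked, with `now` set to the current UTC time plus whatever warning margin is wanted.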
TITLE: 01/03 Eve Shift: 00:00-08:00 UTC (16:00-00:00 PST), all times posted in UTC
STATE of H1: Observing at 152Mpc
OUTGOING OPERATOR: Austin
CURRENT ENVIRONMENT:
SEI_ENV state: CALM
Wind: 6mph Gusts, 4mph 5min avg
Primary useism: 0.03 μm/s
Secondary useism: 0.46 μm/s
QUICK SUMMARY:
Austin handed off an H1 nearly at NLN. I eventually took H1 to Observe at 0011utc, but since 0037 H1 has been getting bumped out of OBSERVING due to a gain change (i.e. H1:CAL-INJ_CW_GAIN) for the CALINJ SDF.
Attached is the last 20+ min of these gain changes from 1.0 to 0.0 every ~1min. Have Louis on the phone now.
Forgot to mention that Jim chatted with me regarding the HAM3 ISI, and that if the glitching issues return, I should phone him if it is before ~8pm. Luckily, we have not had to deal with this for our current 24+hr lock!
After the glitches last night, I went to HAM3 to do the usual fixes. I powered off the corner 2 CPS, unplugged the boards in the satellite chassis at the chamber, put it all back together and powered everything back on. The CPS haven't glitched since then, so seems like maybe things are fine now.
The 65-100 Hz BLRMS are a good witness of these glitches for a couple of hours before they started tripping the HAM3 ISI. The attached trend shows about 5 hours before the first HAM3 trip on Monday. The top row shows the raw H2 and V2 CPS in counts, the middle row is the watchdog state, and the bottom row shows the 65-100 Hz BLRMS for corner 2 and corner 3. The H2 and V2 CPS start seeing glitches that don't trip the ISI about 3 hours before the first trip; these glitches don't really show up in the corner 3 CPS. They also don't coincide with locklosses if the ISI doesn't trip. Under normal circumstances these BLRMS are well below 10 nm; the first few glitches are up to 600 nm, but a glitch of ~1000 nm causes the ISI to trip. There haven't been any glitches since I touched the CPS yesterday, so I think we are in the clear for now.
I'm still not sure of the right way to alarm on this, but some sort of days-ish timeseries trend when ISI trips on CPS would probably be a good place to start.
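One possible starting point for such an alarm is a simple threshold check on the 65-100 Hz BLRMS trend, sketched below. The ~1000 nm trip level and the sub-10 nm quiet level come from the trend described above; the 100 nm warning threshold, the 3 hour window, the two-glitch minimum, and all function names are hypothetical choices for illustration, not a tested alarm.

```python
# Hypothetical warning level, chosen between the quiet level (<10 nm)
# and the ~1000 nm glitches that tripped the ISI.
WARN_NM = 100.0
TRIP_NM = 1000.0  # approximate trip-level glitch size from the trend


def find_glitches(times_s, blrms_nm, warn_nm=WARN_NM):
    """Return (time, value) pairs for samples at or above the warning level."""
    return [(t, v) for t, v in zip(times_s, blrms_nm) if v >= warn_nm]


def should_alarm(times_s, blrms_nm, min_glitches=2, window_s=3 * 3600):
    """Alarm if enough glitches fall within window_s of the latest glitch.

    Assumes times_s is sorted in ascending order.
    """
    glitches = find_glitches(times_s, blrms_nm)
    if not glitches:
        return False
    latest = glitches[-1][0]
    recent = [t for t, _ in glitches if latest - t <= window_s]
    return len(recent) >= min_glitches


# Made-up samples: quiet, a 600 nm glitch, quiet, then a 450 nm glitch.
times = [0, 3600, 7200, 9000]
values = [5.0, 600.0, 8.0, 450.0]

print(find_glitches(times, values))
print(should_alarm(times, values))
```

Requiring more than one glitch in the window is meant to avoid alarming on a single isolated spike; whether that is the right trade-off would need checking against real trends.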
TITLE: 01/02 Day Shift: 16:00-00:00 UTC (08:00-16:00 PST), all times posted in UTC
STATE of H1: Lock Acquisition
INCOMING OPERATOR: Corey
SHIFT SUMMARY:
- Light maintenance day, mainly VAC turbo/ion pump work and a HWS card swap from the CDS team
- Ran an initial alignment after ISC went through CHECK MICH FRINGES thrice - ran without issue
Note that there was a large tow truck on site towing a car from 22:16 - 22:25 UTC. Tagging DetChar in case this shows up on their end as noise.
- Lockloss @ 23:10 - cause unknown
- H1 is currently relocking at TRANSITION FROM ETMX
LOG:
| Start Time | System | Name | Location | Laser_Haz | Task | Time End |
|---|---|---|---|---|---|---|
| 16:06 | FAC | Tyler | LVEA/Mids | N | Check 3IFO items | 16:47 |
| 16:06 | FAC | Ken | EX | N | Lighting | 19:54 |
| 16:24 | VAC | Janos/Travis | MY/EY | N | Turbo test | 19:30 |
| 16:24 | FAC | Karen | EY | N | Tech clean | 17:30 |
| 16:25 | FAC | Kim | EX | N | Tech clean | 17:44 |
| 16:27 | FAC | Randy | LVEA | N | Checks | 16:45 |
| 16:35 | FAC | Chris | LVEA | N | Pest control | 18:03 |
| 17:26 | CDS | Erik | MSR | N | Replace HWS sensor card | 21:18 |
| 17:39 | PEM | Mitch | EX/Y | N | DM FAMIS task | 18:19 |
| 18:16 | FAC | Karen/Kim | LVEA | N | Tech clean | 19:41 |
| 18:18 | CDS | Tony/Randy | EX/Y | N | Deploy end station laptops | 19:20 |
| 18:24 | VAC | Janos | LVEA | N | Ion pump work | 20:07 |
| 18:59 | EPO | Oli + tour | LVEA | N | Tour | 19:48 |
| 19:09 | SEI | Jim | CR | N | HAM 5 TFs | 19:31 |
| 19:57 | CDS | Tony | EY/EX | N | Set up workstations | 21:05 |
| 20:14 | ISC | TJ | LVEA | N | LVEA sweep | 20:30 |
| 20:19 | FAC | Ken/Richard | FCES | N | Checks | 20:39 |
| 23:11 | CDS | Tony | CER | N | Take pictures | 23:13 |
At 23:37 Sun 31 Dec 2023 PST the h1hwsmsr computer crashed. At this time: EDC disconnect count went to 88, Slow Controls SDF (h1tcshwssdf) discon_chans count = 15, GRD DIAG_MAIN cannot connect to HWS channel
The main impact on the IFO is that the ITMX HWS camera cannot be controlled and is stuck in the ON state (taking images at 7Hz).
Time line for camera control:
| Time | Event |
|---|---|
| 23:22 Sun 31 Dec 2023 PST | Lock Loss, ITMX and ITMY cams = ON |
| 23:37 Sun 31 Dec 2023 PST | h1hwsmsr computer crash, no ITMX cam control |
| 04:37 Mon 01 Jan 2024 PST | H1 lock, ITMY cam = OFF, ITMX stuck ON |
Tagging DetChar in case the 7Hz comb reappears since the ITMX HWS camera was left on for the observing stretch starting this morning at 12:41 UTC.
I also removed ITMX from the "hws_loc" list in the HWS test in DIAG_MAIN and restarted the node at 18:08 UTC so that DIAG_MAIN could run again and clear the SPM diff (tagging OpsInfo). This did not take H1 out of observing.
Similar to what I did on 23 Dec 2023 when we lost h1hwsex, I have created a temporary HWS ITMX dummy IOC which is running under a tmux session on cdsioc0 as user=ioc. All of its channels are zero except for the 15 being monitored by h1tcshwssdf which are set to the corresponding OBSERVE.snap values.
EDC and SDF are back to being GREEN.
The H1:PEM-CS_MAG_LVEA_OUTPUTOPTICS_Y_DQ channel (74900) shows the 7Hz comb has been present since 07:37 UTC 01 Jan 2024, when the h1hwsmsr computer crashed. We plan to restart the code that turns the camera off during locks (74951) during commissioning today.
In 75124 Jonathan, Erik and Dave replaced the computer, and today we were again able to communicate with the camera (we needed to use the alias init_hws_cam='/opt/EDTpdv/initcam -f /opt/EDTpdv/camera_config/dalsa_1m60.cfg'). Between 18:25 and 18:40 UTC we adjusted the camera from 7Hz to 5Hz, then off, and left it back at 7Hz. We plan to stop Dave's dummy IOC and restart the code later today. Once this is successful, the CDS team will look at replacing h1hwsex (75004) and h1hwsey (73906). Erik has WP 11598.
From 23:35 UTC these combs are gone (75159).
Erik is building a new h1hwsex computer and will install it at EX in the next hour.
h1hwsex had crashed and only needed a reboot.
I installed the "new" V1 server as h1hwsey at EY. It's physically connected and running, but is not on the network. It requires some more in-person work, which we'll do Friday or earlier when out of observing.
I've restarted the camera control software on h1hwsex (ETMX) and h1hwsmsr (ITMX). All the HWS cameras are now off (external trigger mode).