Displaying report 1-1 of 1.
Reports until 09:58, Friday 05 January 2024
H1 CDS
david.barker@LIGO.ORG - posted 09:58, Friday 05 January 2024 - last comment - 11:18, Monday 08 January 2024 (75192)
h1hwsmsr stopped running at 22:14 Thu 04 Jan 2024 PST

Camilla, Erik, Dave:

h1hwsmsr (HWS ITMX and /data RAID) computer froze at 22:14 Thu 04 Jan 2024 PST. The EDC disconnect count went to 88 at this time.

Erik and Camilla have just viewed h1hwsmsr's console, which indicated a HWS driver issue at the time of the freeze. They rebooted the computer to get the /data RAID NFS shared to h1hwsex and h1hwsmsr1. The ITMX HWS code is currently not running; we will start it during this afternoon's commissioning break.

One theory for the recent instabilities is that they are caused by the camera_control code I started just before the break to ensure the HWS cameras are inactive (in external trigger mode) when H1 is locked. Every minute the camera_control code gets the status of the camera, which, along with the status of H1, lets it decide whether the camera needs to be turned ON or OFF. With the main HWS code getting frames from the camera while the control code is getting the camera status, the two sets of camera calls could collide.
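The ON/OFF decision described above can be sketched as follows. This is an illustration only, not the actual camera_control code: the function names are hypothetical, and it assumes the camera should be ON whenever H1 is not locked, which the description implies but does not state outright.

```python
from typing import Optional


def desired_camera_state(h1_locked: bool) -> str:
    """Camera is OFF (external trigger mode) while H1 is locked.

    Assumption: the camera should be ON whenever H1 is not locked.
    """
    return "OFF" if h1_locked else "ON"


def camera_action(h1_locked: bool, camera_state: str) -> Optional[str]:
    """Return "ON" or "OFF" if the camera must be switched, else None.

    camera_state is the status read back from the camera ("ON" or "OFF").
    Returning None means no command needs to be sent to the camera.
    """
    target = desired_camera_state(h1_locked)
    return None if camera_state == target else target
```

Returning None when the camera is already in the wanted state keeps the number of commands sent to the camera down, although the status read itself is still a camera operation.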

To test this theory, we will turn the camera_control code off at noon. I will rework the code to reduce the number of camera operations to the bare minimum.

Comments related to this report
camilla.compton@LIGO.ORG - 12:57, Friday 05 January 2024 (75200)TCS

At ~20:00 UTC we left the HWS code running (restarted ITMX) but stopped Dave's camera control code 74951 on ITMX, ITMY and ETMY, leaving the cameras off. They'll be left off over the weekend until Tuesday. ETMX is still down from yesterday 75176.

If the computers remain up over the weekend we'll look at incorporating the camera control into the hws code to avoid crashes. 

camilla.compton@LIGO.ORG - 15:25, Friday 05 January 2024 (75203)

Erik swapped h1hwsex to a new v1 machine. We restarted the HWS code and turned the camera to external trigger mode so it too should remain off over the weekend.

ryan.short@LIGO.ORG - 16:29, Friday 05 January 2024 (75208)OpsInfo

I've commented out the HWS test in DIAG_MAIN entirely (only ITMY was being checked) since no HWS cameras are capturing data. Tagging OpsInfo.

erik.vonreis@LIGO.ORG - 17:24, Friday 05 January 2024 (75210)

Trace from h1hwsmsr crash attached.

camilla.compton@LIGO.ORG - 11:18, Monday 08 January 2024 (75248)TCS

All 4 computers remained up and running over the weekend with the camera on/off code paused. We'll look into either making Dave's code smarter or incorporating the camera on/off switching into the hws-server code so that we don't send multiple calls to the camera at the same time, which is our leading theory as to why these HWS computers have been crashing.
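One way to avoid simultaneous camera calls, if the on/off switching were moved into the same process as the frame grabbing, would be to route every camera call through a single lock. The sketch below is a hedged illustration only: SerializedCamera, FakeCamera, get_frame and get_status are hypothetical names, not the hws-server or camera driver API, and a threading.Lock only protects callers within one process.

```python
import threading
import time


class SerializedCamera:
    """Wraps a camera-like object so only one call reaches it at a time."""

    def __init__(self, camera):
        self._camera = camera
        self._lock = threading.Lock()

    def call(self, method_name, *args, **kwargs):
        # Hold the lock for the whole camera call so that frame grabs
        # and status queries can never overlap.
        with self._lock:
            return getattr(self._camera, method_name)(*args, **kwargs)


class FakeCamera:
    """Hypothetical stand-in for the camera; records overlapping calls."""

    def __init__(self):
        self.active = 0
        self.max_active = 0
        self._count_lock = threading.Lock()

    def _busy(self, result):
        with self._count_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # pretend the hardware call takes some time
        with self._count_lock:
            self.active -= 1
        return result

    def get_frame(self):
        return self._busy("frame")

    def get_status(self):
        return self._busy("OFF")
```

With both the frame-grabbing thread and the on/off thread calling through the same SerializedCamera instance, FakeCamera.max_active should never exceed 1. If the two codes stay as separate processes, an inter-process mechanism such as a lock file would be needed instead.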
