Sheila, Louis, with help from Camilla and TJ
Louis and I have had several locklosses transitioning to the new DARM configuration. We don't understand why.
This transition was done several times in December; one of these times was at 15:15 UTC on December 19th, when the guardian state was used (74977). Camilla used the guardian git to show us the code that was loaded that morning, which does seem very much the same as what we are using now (the only difference is a ramp time, which Louis found was wrong and corrected; this ramp time doesn't actually matter though, since the value is only being reset to the value it is already at).
We also looked in the filter archive and see that the H1SUSETMX filters have not been reloaded since December 14th, so the filters should be the same. We also looked at the filters themselves and believe that they are correct.
In the last attachment to 74790 you can see that this configuration has more drive to the ESD at the microseism (the reduction in the ESD RMS comes from reduced drive around a few Hz), so this may be less robust when there is more wind and microseism. I don't think this is our current problem though, because we are losing lock due to a 2.6Hz oscillation saturating the ESD.
We've tried to do this transition both in the way it was done in December (using the NEW_DARM state) and by setting the flag in the TRANSITION_FROM_ETMX state, which I wrote in December but we hadn't tested until today. This code looks to have set everything up correctly, but we still lose lock due to a 2.6Hz saturation of the ESD.
Camilla looked at the transition we did on December 19th, there was also a 2.6Hz ring up at that time, but perhaps with the lower microseism we were able to survive this. A solution may be to ramp to the new configuration more quickly (right now we use a 5 second ramp).
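As a purely illustrative sketch of the kind of flag-gated gain ramp discussed above (not the actual TRANSITION_FROM_ETMX code), something like the following could ramp to the new configuration more quickly; the flag, filter-bank names, gain values, and ramp time are all placeholders, not the real H1 settings.

```python
# Hypothetical sketch only: a flag-gated gain ramp in a guardian-style state,
# in the spirit of the TRANSITION_FROM_ETMX flag and the faster-ramp idea above.
# The flag, filter names, gains, and ramp time are placeholders.
from ezca import Ezca

USE_NEW_DARM = True   # placeholder for the flag checked in TRANSITION_FROM_ETMX
RAMP_TIME = 2         # seconds; shorter than the current 5 s ramp

ezca = Ezca()         # in guardian the ezca object is provided by the framework;
                      # here the channel prefix is taken from the site environment

def ramp_to_new_darm_config():
    if not USE_NEW_DARM:
        return
    # Ramp the ETMX L3 (ESD) drivealign gain to its new value (placeholder numbers).
    esd = ezca.get_LIGOFilter('SUS-ETMX_L3_DRIVEALIGN_L2L')
    esd.ramp_gain(1.0, ramp_time=RAMP_TIME, wait=False)
    # Ramp the L2 (PUM) stage at the same time so the crossover moves together.
    pum = ezca.get_LIGOFilter('SUS-ETMX_L2_DRIVEALIGN_L2L')
    pum.ramp_gain(0.5, ramp_time=RAMP_TIME, wait=True)
```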
Elenna suggested ASC could be making this transition unstable and that we could think about raising the gain of an ASC loop during the transition. On Friday's lockloss, attached, you can see CSOFT and DSOFT YAW wobble at 2.6Hz. The HARD loops look fine.
Nyath, Dave, Jonathan, Erik: At around 19:14 UTC the router failed again. Erik followed the procedure in T2300212 to remove the router and use the backup router. The backup router had been tested on the test stand. The config needed two changes to the route table to make things work, largely an artifact of it having been built and tested on the test stand: there was an old test stand IP address in the route table, and one of the internal routes needed a larger network mask. The new router appears to be working, and we are able to allow remote access to CDS and to reach the outside world. Nyath also checked the GC switch that the CDS router connects to, to make sure there were no problems with MAC address locks preventing the change in routers.
IFO is LOCKING
IFO is LOCKING at CHECK_MICH_FRINGES
Erik says the old NAT router's fan is not spinning. This is not a hot-swap item. He and Jonathan are working on installing the spare router.
To mirror Anamaria's findings in LLO alog 69020 on the quad vertical motion changing over the last 150 days, I've trended the same channels with the addition of the outside air temperature. I did this in ndscope, so the plots aren't overlapped and offset like Anamaria's. Definitely not as good looking, but much faster for me. It also seems that some of our CS on-chamber temperature sensors are only working intermittently.
This tells the same story that we (re?)found back in November/December (alog74575, alog74533): the outside air temperature is affecting the sag of our suspensions more than we're accounting for with any control, offsets, or HVAC settings. During that time when we couldn't lock, we tried adding some vertical offsets to the BS and ITMs, but ended up reverting all of that and instead realigned PR3 to a good point and then adjusted the COMM beatnote on table for this better corner alignment. How to handle these large vertical motions was never really solved.
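For anyone who wants to reproduce this kind of long trend programmatically rather than in ndscope, a minimal gwpy sketch is below; the channel names are illustrative placeholders for quad vertical sensors and an outside-air temperature sensor, not necessarily the exact channels trended here.

```python
# Minimal sketch: fetch ~150 days of minute trends with gwpy and plot them.
# Channel names are illustrative placeholders, not necessarily those used above.
from gwpy.timeseries import TimeSeriesDict

channels = [
    'H1:SUS-ITMX_M0_DAMP_V_INMON.mean,m-trend',
    'H1:SUS-ETMX_M0_DAMP_V_INMON.mean,m-trend',
    'H1:PEM-CS_TEMP_OUTSIDE_AIR.mean,m-trend',   # placeholder outside-temperature channel
]
data = TimeSeriesDict.get(channels, 'Aug 8 2023', 'Jan 5 2024')

# One panel per channel with a shared time axis, roughly like the attached scopes.
plot = data.plot(separate=True, sharex=True)
plot.savefig('quad_vertical_vs_temperature.png')
```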
We think that our OMC throughput has degraded since installation, 74022, 73873, 69707. Dana has checked that the balance of the two OMC DCPDs has not changed since installation, 74683.
Today the question came up of whether we know if there has been a gradual degradation of the OMC throughput. The calibration group tracks optical gain, which depends on several things including OMC throughput. This is a trend of kappa C, which is renormalized to 1 when the calibration is reset, shown by the time cursors. This doesn't offer any conclusive evidence that there has been a gradual degradation of the OMC over O4; there has been a slow 2% degradation of the circulating power, which can probably explain the slow decrease in kappa C shown here.
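A rough sketch of how a trend like this can be produced: fetch the kappa_C trend and renormalize it to 1 at a calibration reset. The reset GPS time below is a placeholder, and this assumes minute trends are available for the channel.

```python
# Sketch: trend H1's sensing scale factor (kappa_C) over O4 and renormalize it
# to 1 at a chosen calibration-reset time.  The reset GPS time is a placeholder,
# and minute-trend availability for this channel is assumed.
from gwpy.timeseries import TimeSeries

kappa_c = TimeSeries.get('H1:GDS-CALIB_KAPPA_C.mean,m-trend',
                         'May 24 2023', 'Jan 5 2024')   # roughly the start of O4 onward

reset_gps = 1370000000            # placeholder GPS time of a calibration reset
ref = kappa_c.crop(reset_gps, reset_gps + 3600).mean()   # average over the hour after the reset
kappa_c_norm = kappa_c / ref      # renormalized so the trend is 1 at the reset

plot = kappa_c_norm.plot(ylabel='kappa_C (renormalized)')
plot.savefig('kappa_c_trend.png')
```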
Fri Jan 05 10:10:10 2024 INFO: Fill completed in 10min 6secs
Gerardo confirmed a good fill curbside.
Camilla, Erik, Dave:
h1hwsmsr (HWS ITMX and /data RAID) computer froze at 22:14 Thu 04 Jan 2024 PST. The EDC disconnect count went to 88 at this time.
Erik and Camilla have just viewed h1hwsmsr's console, which indicated a HWS driver issue at the time. They rebooted the computer to get the /data RAID NFS shared to h1hwsex and h1hwsmsr1. Currently the ITMX HWS code is not running, we will start it during this afternoon's commissioning break.
One theory of the recent instabilities is the camera_control code I started just before the break to ensure the HWS cameras are inactive (in external trigger mode) when H1 is locked. Every minute the camera_control code gets the status of the camera, which along with the status of H1 lets it decide if the camera needs to be turned ON or OFF. Perhaps with the main HWS code getting frames from the camera and the control code getting the camera status, there is a possible collision risk.
To test this, we are turning the camera_control code off at noon. I will rework the code to minimize the number of camera operations to the bare minimum.
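For context, here is a stripped-down sketch of the once-a-minute polling loop described above; the helper functions are placeholder stubs, and the real code's interfaces to the camera and to H1's lock state will differ. The point is only that this loop and the main HWS frame-grabbing code both talk to the camera.

```python
# Stripped-down sketch of the once-a-minute camera_control loop described above.
# The three helpers are placeholder stubs, not the real interfaces.
import time

POLL_INTERVAL = 60  # seconds between status checks


def query_camera_status():
    """Placeholder: ask the camera for its current trigger mode."""
    return 'external_trigger'


def h1_is_locked():
    """Placeholder: read H1's lock state (e.g. from an EPICS channel)."""
    return True


def set_camera_trigger_mode(mode):
    """Placeholder: command the camera into 'external' or 'free_run' mode."""
    print('setting camera trigger mode to', mode)


def control_loop():
    """Keep the camera effectively off (external trigger) while H1 is locked.

    Both this loop and the main HWS frame-grabbing code talk to the camera, so
    without coordination the two can hit the camera at nearly the same time.
    """
    while True:
        status = query_camera_status()
        if h1_is_locked() and status != 'external_trigger':
            set_camera_trigger_mode('external')   # camera off while locked
        elif not h1_is_locked() and status == 'external_trigger':
            set_camera_trigger_mode('free_run')   # camera on for HWS when unlocked
        time.sleep(POLL_INTERVAL)
```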
At ~20:00 UTC we left the HWS code running (restarted ITMX) but stopped Dave's camera control code 74951 on ITMX, ITMY, ETMY, leaving the cameras off. They'll be left off over the weekend until Tuesday. ETMX is still down from yesterday 75176.
If the computers remain up over the weekend we'll look at incorporating the camera control into the hws code to avoid crashes.
Erik swapped h1hwsex to a new v1 machine. We restarted the HWS code and turned the camera to external trigger mode so it too should remain off over the weekend.
I've commented out the HWS test entirely (only ITMY was being checked) from DIAG_MAIN since no HWS cameras are capturing data. Tagging OpsInfo.
Trace from h1hwsmsr crash attached.
All 4 computers remained up and running over the weekend with the camera on/off code paused. We'll look into either making Dave's code smarter or incorporating the camera on/off switching into the hws-server code, so that we don't send multiple calls to the camera at the same time; that is our leading theory as to why these HWS computers have been crashing.
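One possible pattern for that incorporation (an illustrative sketch, not the actual hws-server code) would be to route every camera operation through a single lock, so frame grabs and trigger-mode changes can never hit the camera concurrently:

```python
# Illustrative pattern only: serialize all camera access inside the HWS server
# with one lock, so the frame loop and the on/off logic never overlap.
import threading


class CameraManager:
    def __init__(self, camera):
        self._camera = camera               # placeholder camera object
        self._lock = threading.Lock()

    def grab_frame(self):
        with self._lock:                    # serialized with trigger-mode changes
            return self._camera.get_frame()

    def set_active(self, active):
        with self._lock:                    # serialized with frame grabs
            mode = 'free_run' if active else 'external'
            self._camera.set_trigger_mode(mode)
```

The HWS frame loop would then call grab_frame() and the lock-state logic would call set_active(), so only one of them touches the camera at a time.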
Ibrahim, Dave, Jonathan:
The CDS-GC NAT router (rtr-msr-cds) stopped running at 06:20 PST this morning. Symptoms are that all computer networking between CDS and GC is down, but the control room VOIP phones still work.
Jonathan and Ibrahim power cycled rtr-msr-cds and all is working again now.
Opened FRS30104 to cover this issue
To get the DTS EPICS channels working again (and fix EDC disconnection to these) I restarted the DTS services on cdsioc0 (dts_tunnel, dts_env). The EDC has reconnected to these channels.
TITLE: 12/18 Day Shift: 16:00-00:00 UTC (08:00-16:00 PST), all times posted in UTC
STATE of H1: Observing at ??Mpc
OUTGOING OPERATOR:
CURRENT ENVIRONMENT:
SEI_ENV state: CALM
Wind: ??
Primary useism: ??
Secondary useism: ??
QUICK SUMMARY:
IFO is in NLN and OBSERVING as of 13:04UTC
Computers in Control Room are NOT working - here is a summary:
- I cannot sign into any of the computers - getting a "password incorrect" error despite being able to sign in immediately on my laptop (to write this).
- DARM FOM is frozen
- Picket fence is frozen
- Teamspeak is frozen
- Cameras are frozen
- No Range is being shown
This seems to be some CDS issue, so I will call the relevant people to notify them and get it rectified.
Other than this, it seems that microseism is on the rise.
TITLE: 01/05 Day Shift: 16:00-00:00 UTC (08:00-16:00 PST), all times posted in UTC
STATE of H1: Observing at 149Mpc
OUTGOING OPERATOR: Ryan C
CURRENT ENVIRONMENT:
SEI_ENV state: CALM
Wind: 6mph Gusts, 4mph 5min avg
Primary useism: 0.13 μm/s
Secondary useism: 0.76 μm/s
QUICK SUMMARY:
CDS Network is back after restarting router in MSR.
- Problem has been fixed (for now), though we don't know the root cause.
- After I called Dave, Jonathan called back and walked me through which MSR power to cycle - after doing this:
- Restarting all NUCs now
Other:
- Potential computer slowdowns
My initial sessions on cdslogin were for some reason slow, but it is now working normally.
Status of H1: Relocking at CHECK_VIOLINS
11:04 UTC H1 called for assistance as the NLN timer had expired with IA having already been run. We were at PREP_ASC_FOR_FULL_IFO, and I can see that the violins look huge on DARM. H1 lost lock at MAX_POWER twice this morning/night, so that's probably why (tagging SUS); most of the other locklosses have been at ALS, presumably because of the increased ground motion. I'm going to have to spend some time damping violins, it seems.
After spending about an hour in OMC_WHITENING damping violins and increasing gains, we reacquired NLN at 13:04 UTC and went back into Observing at 13:04 UTC.
TITLE: 01/05 Eve Shift: 00:00-08:00 UTC (16:00-00:00 PST), all times posted in UTC
STATE of H1: Lock Acquisition
INCOMING OPERATOR: Ryan C
SHIFT SUMMARY:
High Microseism (over the 95th percentile) is making acquisition tough.
Seems like the best chance for locking is right after an alignment.
Have had mixed luck transitioning states for SEI_CONF (between USEISM & WINDY) for most of the shift. Overall, I had more luck in WINDY, but still only made it to NLN once: the sole lock of the evening, at 2.5 hrs.
Ran an alignment with non-trivial results in that some states would not complete. Ended up spending about an hour on alignment.
Picket Fences have been flashing yellow quite a bit (don't recall this yesterday, although I wasn't looking at them as much).
The end of the shift consisted of frantically trying to get H1 to NLN (lots of quick/early locklosses, only getting as far as LOCKING ALS), to no avail.
LOG: