All the models on h1oaf0 have stopped running. At the time of writing it looks like a possible timing glitch caused by in-rack work in the CER. We will hold off on restarts until the rack wiring work is completed, since there are ongoing alignment measurements.
Soon after, we lost the h1seih16 models.
Initially I thought these were just timing glitches, but logging onto the front ends and scanning the IO Chassis cards using lspci shows that many cards are missing. This suggests +24V DC power issues.
For example, h1seih16 is showing the first Adnaco backplane (timing_card, ADC0, DAC0) but is missing the second Adnaco (ADC1, ADC2, ADC3, DAC1). The third backplane is empty; the fourth has the Contec card, which is seen.
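As a rough sketch of that check (not the exact commands used), something like the following can be run on a front end to count the IO Chassis cards lspci currently sees; the match patterns and expected counts are placeholders, not the real card identifiers.

```python
# Hypothetical sketch: count IO Chassis PCIe cards visible to the host.
# The patterns and expected counts are placeholders; substitute whatever
# vendor/device strings the ADC, DAC, timing and Contec cards report.
import subprocess

EXPECTED = {"adc_dac": 6, "contec": 1}       # placeholder expected counts
PATTERNS = {"adc_dac": "General Standards",  # placeholder lspci match string
            "contec": "Contec"}              # placeholder lspci match string

lspci_out = subprocess.run(["lspci"], capture_output=True, text=True).stdout
for name, pattern in PATTERNS.items():
    seen = sum(pattern.lower() in line.lower() for line in lspci_out.splitlines())
    status = "OK" if seen == EXPECTED[name] else "CARDS MISSING"
    print(f"{name}: saw {seen}, expected {EXPECTED[name]} -> {status}")
```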
The plan is to let Marc finish the accelerometer work he was doing in the CER at the time of these model crashes (WP11653: replacing accelerometer power conditioners) before starting any model recovery.
Fri Jan 26 12:33:45 2024 INFO: Fill completed in 3min 42secs
Second time was the charm; Gerardo confirmed a good fill curbside.
Replaced the 6-port +24V DC power strip with a 24-port DC power strip in the TCS / PEM racks per WP11649.
Marc, Oli, Dave:
For Marc's change over to a bigger +24V power strip in the OAF/TCS CER rack, I powered down the h1oaf0 frontend to permit its IO Chassis to be power cycled.
Sequence was
All is looking good from a CDS perspective.
Dana, Louis
We analyzed every measurement of the sensing function taken between the start of O4 and October 27th to assess whether each was reliable; the results are summarized in the table below:
Report ID | GPS time [s] | Time locked prior to measurement [h] |
---|---|---|
20230504T055052Z | 1367214670 | 6+ |
20230505T012609Z | 1367285187 | 5.2 |
20230505T174611Z | 1367343989 | 5.2 |
20230505T200419Z | 1367352277 | 0.2 |
20230506T182203Z | 1367432541 | 4.7 |
20230508T180014Z | 1367604032 | 6+ |
20230509T070754Z | 1367651292 | 5.8 |
20230510T062635Z | 1367735213 | 3.5 |
20230517T163625Z | 1368376603 | 6+ |
20230616T161654Z | 1370967432 | 3.4 |
20230620T234012Z | 1371339630 | 2.9 |
20230621T191615Z | 1371410193 | 2.1 |
20230621T211522Z | 1371417340 | 4.0 |
20230628T015112Z | 1371952290 | 4.8 |
20230716T034950Z | 1373514608 | 6+ |
20230727T162112Z | 1374510090 | 6+ |
20230802T000812Z | 1374970110 | 2.6 |
20230817T214248Z | 1376343786 | 6+ |
20230823T213958Z | 1376862016 | 4.3 |
20230830T213653Z | 1377466631 | 3.7 |
20230906T220850Z | 1378073348 | 3.9 |
20230913T183650Z | 1378665428 | 6+ |
20230928T193609Z | 1379964987 | 6+ |
20231004T190945Z | 1380481803 | 4.7 |
20231018T190729Z | 1381691267 | 6+ |
20231027T203619Z | 1382474197 | 6+ |
Ideally, the detector should have been in lock for at least three hours before a sensing function measurement is made, to make sure the thermalization process is complete. However, a couple of measurements were made when the detector had only been locked for about two hours (06/21, 08/02), and one particularly problematic measurement was made when the detector had only been locked for about 10 minutes (05/05). This last measurement should certainly not be included in the GPR calculation.
The code used to obtain the detector lock state and history given a report ID is attached below. Note: To run this code, you will need access to pydarm, so run the following command in the terminal before executing the file: source /ligo/groups/cal/local/bin/activate
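The attached script is the authoritative version; as a minimal sketch of the same kind of lookup, assuming gwpy is available in that environment and using the observing-ready segment flag as a stand-in for the lock definition, one could do:

```python
# Minimal sketch (not the attached script): estimate how long H1 had been
# locked before a report's measurement time. Assumes gwpy with segment
# database access; H1:DMT-ANALYSIS_READY:1 is used here as a stand-in for
# whatever lock/thermalization definition the real analysis uses.
from datetime import datetime
from gwpy.segments import DataQualityFlag
from gwpy.time import to_gps

def hours_locked_before(report_id, lookback_hours=12):
    """Return hours of continuous 'lock' immediately preceding the report time."""
    t_meas = to_gps(datetime.strptime(report_id, "%Y%m%dT%H%M%SZ"))
    flag = DataQualityFlag.query(
        "H1:DMT-ANALYSIS_READY:1",
        t_meas - lookback_hours * 3600, t_meas)
    for seg in flag.active:
        if seg[0] <= t_meas <= seg[1]:
            return float(t_meas - seg[0]) / 3600.0
    return 0.0

print(hours_locked_before("20230505T200419Z"))  # table above lists 0.2 h
```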
Report 20230505T200419Z changed to 'invalid' in LHO:75629.
Patrick, Jonathan, Erik, Dave:
A summary and follow-up on the recent FMCS EPICS IOC freeze-ups.
Timeline (all times local)
Sat 13 Jan 15:57 FMCS IOC flatlining started. After a restart, the IOC would run anywhere from 1 to 14 hours before flatlining again
Tue 16 Jan 23:12 FMCS IOC running under systemd control, auto-restart code running which restarts IOC if flatlined for 10 mins
Wed 17 Jan 10:48 fmcs-epics-cds machine power cycled
Thu 18 Jan 11:25 After power cycle, IOC ran 25 hours with no flatline, previous longest run 14 hours [Power cycle of computer fixed it]
Thu 18 Jan 11:25 Patrick installed new version of the FMCS IOC code. This added new diagnostic EPICS channels.
Thu 18 Jan 14:58 systemd control of FMCS IOC under puppet configuration management
Tue 23 Jan 10:33 DAQ+EDC restarted to trend new FMCS diagnostic channels
Summary:
After running error-free for many years, and 90 days after the last computer reboot, the FMCS IOC code became unstable. At random times it would stop updating its EPICS values, flatlining them at their last value.
At this point the code was started manually and ran in a screen session.
To facilitate the auto restart of the code while the problem was being investigated, the IOC code was moved to a procServ environment and put under systemd control. A script running on cdsmanager monitored the FMCS channel H0:FMC-EX_CY_H2O_SUP_DEGF every minute. If its value did not change for 10 minutes, the systemd fmcs_ioc.service was restarted on fmcs-epics-cds.
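For reference, a minimal sketch of that watchdog logic is below (the actual script on cdsmanager may differ); it assumes pyepics is available and that the service on fmcs-epics-cds can be restarted over passwordless ssh.

```python
# Sketch of the flatline watchdog described above (not the production script).
# Assumes pyepics and passwordless ssh from cdsmanager to fmcs-epics-cds.
import subprocess
import time
from epics import caget

CHANNEL = "H0:FMC-EX_CY_H2O_SUP_DEGF"
FLATLINE_MINUTES = 10

def restart_ioc():
    # hypothetical remote restart command
    subprocess.run(
        ["ssh", "fmcs-epics-cds", "sudo", "systemctl", "restart", "fmcs_ioc.service"],
        check=False)

last_value = None
stale_minutes = 0
while True:
    value = caget(CHANNEL)
    if value is not None and value == last_value:
        stale_minutes += 1
    else:
        stale_minutes = 0
        last_value = value
    if stale_minutes >= FLATLINE_MINUTES:
        restart_ioc()
        stale_minutes = 0
    time.sleep(60)
```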
A soft reboot of fmcs-epics-cds did not fix the problem. We then tried a hard power down, a 30-second wait, then power back on. This appeared to fix the immediate problem.
We decided to upgrade the software to see if this would prevent future occurrences in 90+ days' time.
At the time of writing, 8 days after the upgrade, there have been zero flatline instances.
One caveat to the new FMCS code is that we have not directly verified that it records when the fire pumps run. However, the new code's diagnostic channels do permit verification that the fire pump bacnet device is functioning correctly.
The fire pump status is a binary device. The new code does not allow binary records to read device data directly, so Patrick created intermediate analog input (AI) records to read the bacnet devices. These records are defined in the EPICS db file fmcs_bacnet_bi_to_ai.db.
For the fire pumps, the AI records' INP (device address) fields are:
field(INP, "@bacnet12075 3 5 85")
field(INP, "@bacnet12075 3 6 85")
In bacnet-speak, the device string is @bacnet<dev_id> <data_type> <dev_chan_num> <chan_type>.
In this case fire_pump_1 reads channel 5 and fire_pump_2 reads channel 6. Device 12075 reads only the operational status of the two fire pumps.
The new code provides diagnostic channels for bacnet devices, in this case the three channels:
H0:FMC-BACNET_12075_TX
H0:FMC-BACNET_12075_RX
H0:FMC-BACNET_12075_ER
Trending these channels since they were added to the DAQ Tuesday morning shows zero errors, with the TX and RX counts increasing linearly in step; they are almost, but not quite, identical to each other.
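A minimal sketch of that trend check, assuming gwpy with NDS2 access to these slow channels (the actual trending was likely done with a standard trend viewer):

```python
# Hedged sketch: fetch the three bacnet diagnostic channels and confirm the
# error counter stays at zero while TX and RX climb together. The time span
# is simply the period since the Tue 23 Jan DAQ restart mentioned above.
from gwpy.timeseries import TimeSeriesDict

channels = [
    "H0:FMC-BACNET_12075_TX",
    "H0:FMC-BACNET_12075_RX",
    "H0:FMC-BACNET_12075_ER",
]
data = TimeSeriesDict.get(channels, "2024-01-23 10:33", "2024-01-26 12:00")
print("max error count:", data["H0:FMC-BACNET_12075_ER"].max())
print("final TX-RX difference:",
      (data["H0:FMC-BACNET_12075_TX"] - data["H0:FMC-BACNET_12075_RX"]).value[-1])
```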
Closes FAMIS#26439, last checked 74987
BRS Driftmon (attachment1)
All within range; however, in the last day BRS-X drift has increased steeply.
Aux BRS Channels (attachment2)
ETMX BRS temperature has declined over the last day, lining up with the increase in BRS-X drift.
Fri Jan 26 10:03:24 2024 INFO: Fill completed in 3min 22secs
Dubious fill. From the discharge line pressure it looks like a reasonable fill. From the TC perspective it was not optimal; TC-B min was only -72C, which barely tripped the fill. There is no indication of a steady LN2 flow after the trip.
I've zoomed in on the plot to show the TC details.
Gerardo agreed this does not look like a good fill, we will try again at 12:30 today.
Closes FAMIS#26278, last checked 75471
Corner Station Fans (attachment1)
MR_FAN5_170_2 became noisier after being turned off and back on last Thursday (attachment2), but is still well below the level of concern and has been getting quieter over the course of the week.
All other fans are looking normal and within range.
Outbuilding Fans (attachment3)
All fans are looking normal and within range.
Closes FAMIS#26228, last checked 75219
Laser Status:
NPRO output power is 1.819W (nominal ~2W)
AMP1 output power is 68.31W (nominal ~70W)
AMP2 output power is 138.9W (nominal 135-140W)
NPRO watchdog is GREEN
AMP1 watchdog is GREEN
AMP2 watchdog is GREEN
PMC:
It has been locked 5 days, 19 hr 23 minutes
Reflected power = 18.55W
Transmitted power = 108.4W
PowerSum = 127.0W
FSS:
It has been locked for 10 days 1 hr and 27 min
TPD[V] = 0.5292V
ISS:
The diffracted power is around 1.9%
Last saturation event was 10 days, 1 hour and 59 minutes ago
Possible Issues:
PMC reflected power is high
FSS TPD is low
ISS diffracted power is low
So far, the O4 break vent & commissioning work is proceeding as planned, with only minor hiccups. Yesterday's TCS water leak caused a pause in the HAM6 in-chamber OMC alignment work, but we were able to resume fairly quickly this morning with only a short amount of downtime.
The attached snapshot shows the planned schedule outline with green checkmarks next to the items completed so far on the left hand side. On the right is a running log of the activities which have taken place daily (for ease of viewing all in one place).
Teams currently on:
TJ, Jason, Camilla, Fil.
WP11612, FRS25384: swapping because the current CO2X laser was slowly deteriorating; details in 75249.
Removed: Access 50LT 20706.21015D. Installed 20306-20419D, which was refurbished/re-gassed in March 2020. Table layout: T1200007.
Still to do after the CO2X cooling line issue is solved:
FRS 30283 created for this issue.
Tagging EPO for Thermal Compensation System (TCS) photos.
Today's activities:
- Three viewports were taken off at EX for the baffle work. They were inspected and found intact in every respect; the inspection report is in the comments section. These viewports have already been put back, as the work behind them was finished.
- The EX RGA is being rebuilt after many leaks were found; this is in progress and will be finished tomorrow.
- The new Hepta header elbow pieces have been staged at all stations.
Dana, Jenne
We looked at how the microseism levels have impacted the detector duty factor since the start of aLIGO to see if our ability to maintain lock during microseisms has gotten better or worse. The attached figure shows the fraction of time the detector maintained lock over the course of a week as a function of the average microseism level over that same week for each week in O4a, as well as each week in (1) O1, (2) O2, (3) O3a, and (4) O3b.
According to the data, it seems we may be slightly worse off in terms of our ability to maintain lock during high levels of microseism in O4 than we were in O3, but better off than we were in O1 and O2. However, it is difficult to directly compare the data between observing runs, given that during certain runs the average microseism level was consistently higher than in others. There were also certainly other sources of lockloss besides microseism. But it can at least be said that maintaining lock during microseisms does not appear to be a bigger issue now than it used to be.
Note: The two large data gaps present in O2 were removed.
This is the code I used to produce the attached figure and some additional figures in case anyone would like to repeat this study in the future.
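Below is a minimal sketch of the kind of calculation involved (not the attached code itself), assuming gwpy with segment database and NDS2 access; the segment flag and ground-motion BLRMS channel named here are reasonable placeholders rather than necessarily the ones used in the analysis.

```python
# Sketch: for each week of a run, compute the fraction of time H1 held lock
# and the average ground-motion BLRMS in the secondary-microseism band.
# FLAG and SEISCHAN are placeholder choices, not the analysis's actual inputs.
import numpy as np
from gwpy.segments import DataQualityFlag
from gwpy.timeseries import TimeSeries

FLAG = "H1:DMT-ANALYSIS_READY:1"                     # stand-in for "in lock"
SEISCHAN = "H1:ISI-GND_STS_ITMY_Z_BLRMS_100M_300M"   # 0.1-0.3 Hz band
WEEK = 7 * 24 * 3600

def weekly_points(run_start, run_end):
    """Return (mean microseism, lock fraction) pairs, one per week of the run."""
    points = []
    for t0 in np.arange(run_start, run_end, WEEK):
        t1 = min(t0 + WEEK, run_end)
        flag = DataQualityFlag.query(FLAG, t0, t1)
        lock_fraction = float(abs(flag.active)) / float(t1 - t0)
        seis = TimeSeries.get(SEISCHAN, t0, t1)
        points.append((seis.mean().value, lock_fraction))
    return points

# e.g. points = weekly_points(gps_start_of_run, gps_end_of_run)
```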