As per WP 12455, Dave and Jonathan replaced the power supply on h1daqframes-1. Given the space available in the rack, we decided to power off the system so that we could move it safely and replace the power supply.
We did some double checking to verify which machines were hooked together. We followed the physical connections and cross-checked them against the names and numbers of the interfaces on each machine. This was a good thing, as the labels on the front of the system were wrong. This is the frame disk associated with h1daqfw1.
We powered down h1daqfw1 and verified that the link to h1daqframes-1 had gone dark. After that we powered off h1daqframes-1, pushed it forward enough to replace the power supply, pushed it back into place, and restarted it.
After the disk server was back we restarted h1daqfw1 as well.
Dave will fix the labeling.
h1daqfw2 is now producing raw, second-trend, and minute-trend frames identical to those from h1daqfw[01]. The issues mentioned in https://alog.ligo-wa.caltech.edu/aLOG/index.php?callRep=84132 have been resolved. There are times when we do expect mismatches: these are generally when data is missing, where the daqd may replay recent data while the new frame writer writes zeros. We will continue tests on the frame writer, including software updates. The next test is to watch the behavior of h1daqfw2 when h1daqfw0 and h1daqfw1 are restarted. It is designed to automatically adjust its channel list, leave no gaps in the frames, and need no restart to accommodate the changes.
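As an illustration of the kind of identity check involved (this is not the production comparison tooling; the mount points and file naming are assumptions), a byte-for-byte comparison of the frames from two writers can be done with checksums:

# Minimal sketch: checksum frame files from two writers over the same span
# and report any differences. Paths and the ".gwf" glob are illustrative
# assumptions, not the real frame archive layout.
import hashlib
from pathlib import Path

FW0 = Path("/frames/fw0")   # hypothetical location of h1daqfw0 frames
FW2 = Path("/frames/fw2")   # hypothetical location of h1daqfw2 frames

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

for f0 in sorted(FW0.glob("*.gwf")):
    f2 = FW2 / f0.name
    if not f2.exists():
        print(f"missing on fw2: {f0.name}")
    elif sha256(f0) != sha256(f2):
        print(f"mismatch: {f0.name}")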
Jonathan, Dave:
We replaced the hardware running h1daqfw2 today. This was done to put in a system with more memory. We pulled a spare DAQ unit out of the test stand (most recently named x6dc0). While doing that we swapped its memory with that of a broken daqd system, so the new h1daqfw2 now has 256GB of RAM. The system is up and running again; I'm tracking the setup in puppet so that we have a short rebuild time. This was done to support investigating differences between the frames generated on h1daqfw[01] and h1daqfw2. I need to be able to run modified versions of daqd to see some of the internal state that is not otherwise exposed, in order to understand and fix differences in the encoding of the frames. When I ran daqd on the old hardware, it ran out of memory, even after reducing buffer sizes.
The TwinCAT slow controls software was updated to incorporate LSC-REFL_B. This includes controls and readbacks for a new demod chassis and a new delay line.
Today I set up h1daqfw2 as a platform to test a new frame writer for use after O4. For the fw hardware I repurposed h1digivideo3, which is an older Xeon server with 10 cores and 64GB of RAM. I added 2x2TB old hard disks in a RAID 0 configuration (to improve the write performance). At this point I am not looking to do any mid- to long-term storage of frames. I did not connect this to the data stream via Dolphin. Instead I am running a new instance of cps_xmit on h1daqnds0 and using that over a new dedicated 1G link to h1daqfw2. I've updated the puppet config for h1daqnds0 to make this a persistent change. At this point I am running the new frame writer on h1daqfw2, and it is producing frames. I need to do some more configuration (mainly around the run number server) so the frames will be identical to those output from the other frame writers (the difference should be in metadata in the frame headers, not the recorded data). In simulated data setups I have produced frames that are byte-for-byte identical to daqd frames, so it is fairly likely that after I get that working I will see identical frames. The point of this frame writer is to move towards an auto-reconfiguring/restartless system that is able to adjust on the fly to channel changes, remove some other limitations of the daqd, and become the ngdd project's frame writer for downstream derived data products. The first things I will look at with this are the memory and CPU requirements under the H1 load. This testing will be ongoing.
WP12321 Add FMCSSTAT channels to EDC
Erik, Dave:
Recently FMCS-STAT was expanded to monitor FCES, EX and EY temperatures. These additional channels were added to the H1EPICS_FMCSSTAT.ini file. A DAQ and EDC restart was required.
WP12339 TW1 raw minute trend file offload
Dave:
The copy of the last 6 months of raw minute trends from the almost full SSD-RAID on h1daqtw1 was started. h1daqnds1 was temporarily reconfigured to serve these data from their temporary location while the copy proceeds.
A restart of the h1daqnds1 daqd was needed; this was done when the DAQ was restarted for the EDC changes.
WP12333 New digivideo server and network switch, move 4 cameras to new server
Jonathan, Patrick, Fil, Dave:
A new Cisco POE network switch, called sw-lvea-aux1, was installed in the CER below the current sw-lvea-aux. This is a dual-powered switch; both power supplies are DC powered. Note, sw-lvea-aux has one DC and one AC power supply; this has been left unchanged for now.
Two multimode fiber pairs were used to connect sw-lvea-aux1 back to the core switch in the MSR.
For testing, four relatively unused cameras were moved from h1digivideo1 to the server h1digivideo4. These are MC1 (h1cam11), MC3 (h1cam12), PRM (h1cam13) and PR3 (h1cam14).
The new server's IOC is missing two EPICS channels compared with the old IOC, _XY and _AUTO. To green up the EDC despite these missing channels, a dummy IOC is being run (see alog).
The MC1, MC3, PRM and PR3 camera images on the control room FOM (nuc26) started showing compression issues, mainly several seconds of smeared green/magenta horizontal stripes every few minutes. This was tracked to maxed-out CPU resources, and has been temporarily fixed by stopping one of these camera viewers.
EY Timing Fanout Errors
Daniel, Marc, Jonathan, Erik, Ibrahim, Dave:
Soon after lunchtime the timing system started flashing RED on the CDS overview. Investigation tracked this down to the EY fanout, port_5 (numbering from zero, so the sixth physical port). This port sends the timing signal to h1iscey's IO Chassis LIGO Timing Card.
Marc and Dave went to EY at 16:30 with spare SFPs and timing card. After swapping these out with no success, the problem was tracked to the fanout port itself. With the original SFPs, fiber and timing card, using port_6 instead of port_5 fixed the issue.
For the initial SFP switching, we just stopped all the models on h1iscey (h1iopiscey, h1iscey, h1pemey, h1caley, h1alsey). Later, when we replaced the timing cards, h1iscey was fenced from the Dolphin fabric and powered down.
The operator put all EY systems (SUS, SEI and ISC) into a safe mode before the start of the investigation.
DAQ Restart
Erik, Dave:
The 0-leg restart was non-optimal. A new EDC restart procedure was being tested, whereby both trend-writers were turned off before h1edc was restarted to prevent channel-hopping which causes outlier data.
The reason for the DAQ restart was an expanded H1EPICS_FMCSSTAT.ini file.
After the restart of the 0-leg it was discovered that there were some naming issues with the FMCS STAT FCES channels. Erik regenerated a new H1EPICS_FMCSSTAT.ini and the EDC/0-leg were restarted again.
Following both 0-leg restarts, FW0 spontaneously restarted itself after running only a few minutes.
When the EDC and the 0-leg were stable, the 1-leg was restarted. During this restart NDS1 came up with a temporary daqdrc serving TW1 past data from its temporary location.
Reboots/Restarts
Tue18Feb2025
LOC TIME HOSTNAME MODEL/REBOOT
09:45:03 h1susauxb123 h1edc[DAQ] <<< first edc restart, incorrect FCES names
09:46:02 h1daqdc0 [DAQ] <<< first 0-leg restart
09:46:10 h1daqtw0 [DAQ]
09:46:11 h1daqfw0 [DAQ]
09:46:12 h1daqnds0 [DAQ]
09:46:19 h1daqgds0 [DAQ]
09:47:13 h1daqgds0 [DAQ] <<< GDS0 needed a restart
09:52:58 h1daqfw0 [DAQ] <<< Spontaneous FW0 restart
09:56:21 h1susauxb123 h1edc[DAQ] <<< second edc restart, all channels corrected
09:57:44 h1daqdc0 [DAQ] <<< second 0-leg restart
09:57:55 h1daqfw0 [DAQ]
09:57:55 h1daqtw0 [DAQ]
09:57:56 h1daqnds0 [DAQ]
09:58:03 h1daqgds0 [DAQ]
10:03:00 h1daqdc1 [DAQ] <<< 1-leg restart
10:03:12 h1daqfw1 [DAQ]
10:03:13 h1daqnds1 [DAQ]
10:03:13 h1daqtw1 [DAQ]
10:03:21 h1daqgds1 [DAQ]
10:04:07 h1daqgds1 [DAQ] <<< GDS1 restart
10:04:48 h1daqfw0 [DAQ] <<< Spontaneous FW0 restart
17:20:37 h1iscey ***REBOOT*** <<< power up h1iscey following timing issue on fanout port
17:22:17 h1iscey h1iopiscey
17:22:30 h1iscey h1pemey
17:22:43 h1iscey h1iscey
17:22:56 h1iscey h1caley
17:23:09 h1iscey h1alsey
We added some code to recognize the more recent timing board firmware revisions.
I started testing the PCALX_STAT Guardian node today.
/opt/rtcds/userapps/release/cal/h1/guardian/PCALX_STAT.py
It created a new ini file, but the DAQ was not restarted after this new ini file was created.
As it currently stands this is a draft of the final product that will be tested for a week and further refined.
This Guardian node does not make any changes to the IFO; its only job is to determine whether the PCALX arm is broken or not. TJ has already added it to the Guardian Ignore list.
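For illustration only (this is not the contents of PCALX_STAT.py; the state names, readback channel, and threshold below are all hypothetical), a minimal monitor-only Guardian node of this kind could be sketched as:

# Hypothetical sketch of a monitor-only Guardian node. The channel name,
# threshold, and state names are illustrative assumptions. `ezca` is
# provided by Guardian at runtime.
from guardian import GuardState

nominal = 'PCALX_OK'

PD_CHAN = 'CAL-PCALX_TX_PD_OUT16'   # hypothetical readback channel
THRESHOLD = 1e-3                    # hypothetical "arm is dark" threshold

class INIT(GuardState):
    def main(self):
        return True

class PCALX_OK(GuardState):
    def run(self):
        if abs(ezca[PD_CHAN]) < THRESHOLD:
            return 'PCALX_BROKEN'   # jump when the readback drops out
        return True

class PCALX_BROKEN(GuardState):
    def run(self):
        if abs(ezca[PD_CHAN]) >= THRESHOLD:
            return 'PCALX_OK'       # jump back once the readback recovers
        return True

edges = [('INIT', 'PCALX_OK'),
         ('PCALX_OK', 'PCALX_BROKEN'),
         ('PCALX_BROKEN', 'PCALX_OK')]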
J. Kissel, echoing E. Dohmen

Just a bit of useful info from E.J. that I think others might be interested in (and giving myself bread crumbs to find the info in the future):
- The PRODUCTION (H1 included) systems are only at RCG 5.1.4, with no plans on upgrading soon.
- The h1susetmx computer, however, is at a prototype version of 5.3.0 in order to support LIGO DAC testing (see LHO:79735).
- One can find the running release notes for modern versions (since RCG 2.0) of the RCG at https://git.ligo.org/cds/software/advligorts/-/blob/master/NEWS?ref_type=heads
This information is also shown on the CDS overview. Models with a purple mark are built using a non-production RCG version. Clicking on the purple mark opens the RCG MEDM for more details.
Please see alog-78920 for details.
FWIW - The Stanford test stands are running RCG 5.3 to allow development and testing of the SPI readout code.
There are also some release notes here:
As part of yesterday's maintenance, the atomic clock has been resynchronized with GPS. The tolerance has been reduced to 1000ns again. Will see how long it lasts this time.
Marc, Daniel
We measured the gain and phase difference between the new DAC and the existing 20-bit DAC in SUS ETMX. For this we injected 1kHz sine waves and measured the gain and phase shifts between the two. We started with a digital gain value of 275.65 for the new DAC and adjusted it to 275.31 after the measurement to keep the gains identical. The new DAC implements a digital AI filter that has a gain of 1.00074 and a phase of -5.48° at 1kHz, which corresponds to a delay of 15.2µs.
This puts the relative gain (new/old) to 1.00074±0.00125 and the delay to 13.71±0.66µs. The variations can be due to the gain variations in the LIGO DAC, the 20-bit DAC, the ADC or the AA chassis.
DAC | Channel Name | Gain | Adjusted | Diff (%) | Phase (°) | Delay (us) |
0 | H1:SUS-ETMX_L3_ESD_DC | 1.00239 | 1.00114 | 0.11% | -5.29955 | -14.72 |
1 | H1:SUS-ETMX_L3_ESD_UR | 1.00026 | 0.99901 | -0.10% | -5.10734 | -14.19 |
2 | H1:SUS-ETMX_L3_ESD_LR | 1.00000 | 0.99875 | -0.12% | -4.93122 | -13.70 |
3 | H1:SUS-ETMX_L3_ESD_UL | 1.00103 | 0.99978 | -0.02% | -5.11118 | -14.20 |
4 | H1:SUS-ETMX_L3_ESD_LL | 1.00088 | 0.99963 | -0.04% | -5.21524 | -14.49 |
8 | H1:SUS-ETMX_L1_COIL_UL | 1.00400 | 1.00275 | 0.27% | -4.72888 | -13.14 |
9 | H1:SUS-ETMX_L1_COIL_LL | 1.00295 | 1.00170 | 0.17% | -4.88883 | -13.58 |
10 | H1:SUS-ETMX_L1_COIL_UR | 1.00125 | 1.00000 | 0.00% | -5.08727 | -14.13 |
11 | H1:SUS-ETMX_L1_COIL_LR | 1.00224 | 1.00099 | 0.10% | -4.92882 | -13.69 |
12 | H1:SUS-ETMX_L2_COIL_UL | 1.00325 | 1.00200 | 0.20% | -4.78859 | -13.30 |
13 | H1:SUS-ETMX_L2_COIL_LL | 1.00245 | 1.00120 | 0.12% | -4.55283 | -12.65 |
14 | H1:SUS-ETMX_L2_COIL_UR | 1.00175 | 1.00050 | 0.05% | -4.52503 | -12.57 |
15 | H1:SUS-ETMX_L2_COIL_LR | 1.00344 | 1.00219 | 0.22% | -5.00466 | -13.90 |
 | Average | 1.00199 | 1.00074 | 0.07% | -4.93611 | -13.71 |
 | Standard Deviation | 0.00125 | 0.00125 | 0.13% | 0.23812 | 0.66 |
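As a quick sanity check on the phase-to-delay conversion used in the table above (a pure time delay tau produces a phase of -360*f*tau degrees at frequency f), a couple of lines of Python reproduce the quoted average delay:

# Convert the average measured phase at the 1 kHz injection frequency into
# an equivalent time delay; the numbers are taken from the table above.
f_inj = 1000.0        # Hz, injection frequency
phi_deg = -4.93611    # degrees, average adjusted phase difference
tau_us = -phi_deg / 360.0 / f_inj * 1e6
print(f"delay = {tau_us:.2f} us")   # ~13.71 us, matching the quoted average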
The FPGA filter is:
zpk([585.714+i*32794.8;585.714-i*32794.8;1489.45+i*65519.1;1489.45-i*65519.1;3276.8+i*131031; \
3276.8-i*131031;8738.13+i*261998;8738.13-i*261998], \
[11555.6+i*17294.8;11555.6-i*17294.8;2061.54+i*26720.6;2061.54-i*26720.6;75000+i*93675; \
75000-i*93675;150000+i*187350;150000-i*187350;40000],1,"n")
Jennie, Jenne, Sheila
I pushed Jenne's updated cleaning but cannot check if this is better or worse until our problems getting data from nds2 are fixed.
I ran the following:
cd /ligo/gitcommon/NoiseCleaning_O4/Frontend_NonSENS/lho-online-cleaning/Jitter/CoeffFilesToWriteToEPICS/
python3 Jitter_writeEPICS.py
I accepted these DIFFs in the OAF model in OBSERVE.snap, but we might have to revert them before the end of the commissioning period today if we find the cleaning is worse.
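For context only (this is not the contents of Jitter_writeEPICS.py), writing a set of coefficients to front-end EPICS records from Python typically amounts to something like the sketch below; the channel names and values are purely hypothetical:

# Illustrative only: push a list of cleaning coefficients to EPICS records
# with pyepics. The record names and values below are hypothetical.
from epics import caput

coeffs = {
    'H1:OAF-JITTER_CLEAN_COEFF_0': 0.0123,
    'H1:OAF-JITTER_CLEAN_COEFF_1': -0.0456,
}

for chan, value in coeffs.items():
    caput(chan, value, wait=True)   # wait for each write to complete
    print(f"wrote {value} to {chan}")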
GPS 139627823 - 16 mins quiet time from last night.
11:12:41 UTC - 11:18:38 UTC quiet time just before cleaning implemented.
11:19:55 UTC new cleaning drops.
11:31:30 UTC end of quiet time.
I took the following jitter comparison measurements
Old Cleaning quiet time: 11:19:55 UTC 04/04/2024 light blue
New cleaning quiet time: 13:50:05 UTC 04/04/2024 red
It's hard to tell if the new cleaning is better, and I have reverted the coefficients in OBSERVE.snap to what they were this morning.
J. Kissel, O. Patane

A follow-up from yesterday's work on installing the infrastructure of the upgrades to the ETM and TMS watchdog systems: in this aLOG I cover how I've filled out that infrastructure in order to obtain the calibrated BLRMS that forms the trigger signal for the user watchdog.

Remember, any sensible BLRMS system should
(1) Take a signal, and filter it with a (frequency) band-limiting filter, then
(2) Take the square, then the average, then the square root, i.e. the RMS, then
(3) Low-pass the RMS signal, since only the "DC" portion of the RMS has interesting frequency content.
As a bonus, if your signal is not calibrated, then you can add
(0) Take the input to the band-limiting filter, and calibrate it (and through the power of linear algebra, it doesn't really matter whether you band-limit first and *then* calibrate).

This screenshot shows the watchdog overview screen conveying this BLRMS system. Here are the details of the BANDLIM and RMSLP filters for each of the above steps:

(0) H1:SUS-ETMX_??_WD_OSEMAC_BANDLIM_??
FM6 ("10:0.4") and FM10 ("to_um") are exact copies of the calibration filters that are, and have "always been", in the OSEMINF banks. These are highlighted in yellow in the first attachment.
FM6 :: ("10:0.4") :: zpk([10],[0.4],1,"n") :: inverting the frequency response of the OSEM satellite amp
FM10 :: ("to_um") :: zpk([],[],0.0233333,"n") :: converting [ADC counts] into [um], assuming an ideal OSEM which has a response of 95 [uA / mm], differentially read out with a 242 kOhm transimpedance and digitized with a 2^16 / 40 [ct / V] ADC.
In addition, I also copied over the GAIN from the OSEMINF banks, such that each OSEM trigger signal remains "normalized" to an ideal 95 [uA / mm] OSEM. These are highlighted in dark green in the first attachment.

(1) H1:SUS-ETMX_??_WD_OSEMAC_BANDLIM_??
FM1 :: ("acBandLim") :: zpk([0;8192;-8192],[0.1;9.99999;9.99999],10.1002,"n") :: 0.1 to 10 Hz band-pass

(2) This is a major part of the upgrade -- the front-end code that does the RMS was changed from the nonsense "cdsRms" block (see LHO:1265) to a "cdsTrueRMS" block (see LHO:19658).

(3) H1:SUS-ETMX_??_WD_OSEMAC_RMSLP_??
FM1 :: ("10secLP") :: butter("LowPass",4,0.1) :: 4th-order butterworth filter with a corner frequency at 0.1 Hz, i.e. a 10 second low pass. This is highlighted in magenta in the second attachment.

These are direct copies from other newer suspension models that had this infrastructure in place. I've committed the filter files to the userapps repo, /opt/rtcds/userapps/release/sus/h1/filterfiles/ -- H1SUSETMX.txt, H1SUSETMY.txt, H1SUSETMXPI.txt, H1SUSETMYPI.txt, H1SUSTMSX.txt, H1SUSTMSY.txt are all committed as of rev 27217. All of these settings were captured in each model's safe.snap. I've not yet accepted them in the OBSERVE.snaps.
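As an offline illustration of steps (0)-(3) above (scipy, not the front-end filter modules; the sample rate, the synthetic input, and the use of a butterworth band-pass in place of the exact "acBandLim" design are assumptions), the chain looks like:

# Offline sketch of the BLRMS chain: calibrated input -> band-limit ->
# true RMS -> low-pass. Sample rate, input data, and the band-pass design
# are illustrative assumptions, not the front-end implementation.
import numpy as np
from scipy import signal

fs = 256.0                                   # Hz, assumed sample rate
x_um = np.random.randn(int(600 * fs))        # stand-in for a calibrated OSEM signal [um]

# (1) band-limit to 0.1 - 10 Hz
sos_bp = signal.butter(4, [0.1, 10.0], btype="bandpass", fs=fs, output="sos")
x_bl = signal.sosfilt(sos_bp, x_um)

# (2) true RMS: square, running average, square root
win = int(1.0 * fs)                          # 1 s averaging window (assumption)
rms = np.sqrt(np.convolve(x_bl**2, np.ones(win) / win, mode="same"))

# (3) low-pass the RMS at 0.1 Hz, i.e. the "10secLP" butter("LowPass",4,0.1)
sos_lp = signal.butter(4, 0.1, btype="lowpass", fs=fs, output="sos")
blrms = signal.sosfilt(sos_lp, rms)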
Here's a handy script that demos using the python bindings to foton in order to easily populate simple filters from a python script. I've only used this from the control room workstations, whose environment has already been built up for me, so I can't claim any knowledge of the details of which packages this script needs. But if you have the base cds conda environment, this "should work."
The previous timing master, which was again running out of range on the voltage to the OCXO (see alogs 68000 and 61988), has been retuned using the mechanical adjustment of the OCXO.
Today's readback voltage is at +3.88V. We will keep it running over the next few months to see if it eventually settles.
Today's readback voltage is at +3.55V.
Today's readback voltage is at +3.116V.
Today's readback voltage is at +1.857V.
Today's readback voltage is at +0.951V.
Today's readback voltage is at -2.511V
Labels are now good. MSR-RACK09 U02-U03 (bottom unit) is h1daqframes-1. MSR-RACK09 U04-U05 (next to bottom unit) is h1daqframes-0.
Closed FRS27399