Yesterday morning the BS M2 stage coil driver was swapped out with a spare in an attempt to improve the seemingly bad UR Noise Monitor Channel. The swapped-in unit showed a similar condition, only in the LR Channel. As the test plan for TACQ Coil Drivers doesn't require testing of these circuits, we have no data to correlate these issues. However, when the original coil driver was put back into service and re-tested, the resulting measurements were essentially identical to the original measurements. This seems to be a strong indication that the issue is a condition of the monitor board in the chassis and not cabling/AA/ADC channels, etc. The far left image is the original spectra for the BS M2 stage. The middle image is the spectra for the substitute coil driver. The far right image is the spectra for the original coil driver after re-install.
Nutsinee, Elli
HWS plates replaced on ETMX and ETMY HWS. We shifted the beam path on the ETMX HWS path to get the dimmer reflection from a wedge, to drop the power on the HWS. (This is now the lowest power we can get with the current optics.) The power level onto the ETMX HWS looks good; the CCD is mostly unsaturated with the plate on. I have changed the exposure time to 13000 microseconds, down about 20% from the previous 16000 microsecond setting. There are about 30 saturated pixels at this setting. The ETMY HWS image is too saturated.
The ITMY HWS path is aligned to the green beam again. (This alignment changes every time the SR2 and SR3 alignments change.) However, the ITMY SLED is dead. We need to swap out this SLED and adjust the maximum power levels on the ITMX and ITMY SLEDs.
Greg, Nutsinee, Aidan, Elli
In the current configuration, the ITMX CO2 laser is delivering 0.2W of central heating to ITMX, as it has been for the last few months. The power put onto ITMX is still 0.2W, as measured by the power meter on the CO2X table (H1:TCS-ITMX_CO2_LSRPWR_MTR_OUTPUT). But the requested power needed to get 0.2W output is now 0.71W, up from 0.35W yesterday. The rotation stage does not appear to be returning to the same location.
Details:
The CO2 laser power is requested by setting H1:TCS-ITMX_CO2_LASERPOWER_POWER_REQUEST to the desired value, and then a rotation stage moves a 1/2 wave plate to the required angle for that power. After the model restart this morning, we had to request a very different power (0.44W this morning, compared to 0.35W yesterday and 0.3W in March) to get the same output power in H1:TCS-ITMX_CO2_LSRPWR_MTR_OUTPUT. None of the gains or calibration values had changed this morning, so we suspect hysteresis in the rotation stage. I moved the rotation stage back and forth between minimum power and 0.2W output power and back again. I needed to keep requesting higher powers to bring the output power to 0.2W, although there seems to be no clear pattern to the change. Perhaps the rotation stage is sticking sometimes. See the attached data set of 20 mins of data where I was moving the rotation stage around.
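For orientation, the requested power maps to a waveplate angle through the usual half-wave-plate/polarizer sin-squared law. The snippet below is only a minimal illustration of that nominal mapping, not the actual TCS rotation-stage code; P_MAX is a placeholder for the full power available at the waveplate, and the zero-angle reference is assumed to be the minimum-power position.

import numpy as np

P_MAX = 59.0  # [W] placeholder: full power available at the waveplate (losses not modeled)

def angle_for_power(p_request, p_max=P_MAX):
    """Waveplate angle [deg] that nominally delivers the requested power,
    assuming P = p_max * sin^2(2*theta), theta measured from the minimum-power position."""
    p_request = np.clip(p_request, 0.0, p_max)
    return 0.5 * np.degrees(np.arcsin(np.sqrt(p_request / p_max)))

def power_at_angle(theta_deg, p_max=P_MAX):
    """Power delivered if the stage actually sits at theta_deg."""
    return p_max * np.sin(np.radians(2.0 * theta_deg)) ** 2

# Request values that produced ~0.2 W at the output at different times
# (March, yesterday, this morning, now):
for p in (0.3, 0.35, 0.44, 0.71):
    print(f"request {p:4.2f} W -> nominal angle {angle_for_power(p):5.2f} deg")

In this picture, a stage that ends up short of its commanded angle delivers less power than requested, so needing ever-larger requests for the same 0.2 W is at least consistent with stick-slip or hysteresis in the stage.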
Other comments on ITMX CO2 laser:
We noticed a few other things while trending the output power:
The laser temperature as measured by H1:TCS-ITMX_CO2_LASERTEMPERATURE jumped from 23 degrees Celsius to ~27C on 17 March; see the attached plot. The temperature has been fluctuating a lot more since then. This does not correlate with the enclosure temperature, the laser power has been steady at 59W, and the chiller settings have not changed. Greg and Matt were poking around the CO2 lasers that day, although according to the alog no changes were made to the ITMX CO2 laser (alogs 17303, 17302).
The laser has been mode hopping since 10 April when Greg swapped out the CO2 laser AOM (alog 17737). We hadn't seen this previously. The laser power has been fluctuating by >1% (1W fluctuations from 59W output power). This does not have a big effect on the power level going onto ITMX.
There's a correlation between spikes in output power of the laser and shifts in the rotation stage. This is quite possible if there is a small reflection from somewhere on or after the rotation stage that is coupling back into the laser and causing the laser mode to shift.
The laser temperature shift may be a glitch in the electronics box. Rich and I noticed something similar at LLO but we could never identify exactly where it came from and then it seemed to disappear. We'll look into this some more.
There definitely seems to be an electronics glitch associated with the laser interlock controller (D1200745). If we look at the attached plot from about a month ago, we see that during an event where the laser was turned off and back on again, there were jumps in the flowrate and temperature voltage monitors (both of which run through D1200745) at the same time that there was a 10% increase in the current to the laser. The laser interlock controller is connected to the laser RF driver to turn the laser on and off.
Taking a look at the laser head output, I'm not sure that those glitches are the laser mode hopping. The power doesn't look like it jumps as much as you would expect for a mode hop. I suspect Aidan is correct that a small back-reflection into the laser may be causing some of these odd effects, since the reflected beam can steal gain.
RCG2.9.1 upgrade [WP5158]
Daniel, Betsy, Jeff, Jim, Dave:
We upgraded all H1 models to RCG tag version 2.9.1 this morning. The H1.ipc file was created from scratch, which removed 24 obsolete IPC channels. Daniel performed an ODC common model change. The end station models were restarted several times to resync to the IPC file. Some duplicate channels were found in the end station CAL models; these were resolved with explicit EX, EY naming.
The h1psliss model core dumped during an SDF table load; more details in an earlier posting. We tried to reproduce this error on the DTS x1psl0 ISS model but were unable to do so.
Two 18bit DACs were discovered to have failed their autocal, details in earlier posting.
PSL ODC update [WP5157]
Dave:
The ODC component of the h1psliss model was modified: the EPICS part for CHANNEL_OUTMON was changed to an unsigned integer part, and the CHANNEL_OUT data type was changed to UINT32. The change went in when the PSL was restarted today, closing WP5157.
I checked that all fast ODC DQ channels in the DAQ are now of data type 7 (unsigned integer 32 bit).
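For the record, this check can be repeated from a control room machine with the nds2-client Python bindings. This is only a sketch: the server name, port, and channel glob below are placeholders, and the attribute names are as I recall them from the nds2 Python documentation.

import nds2

conn = nds2.connection('h1nds1', 8088)             # placeholder NDS2 server/port
for ch in conn.find_channels('H1:*ODC*_DQ'):       # placeholder glob for the fast ODC DQ channels
    ok = (ch.data_type == nds2.channel.DATA_TYPE_UINT32)
    print(f"{ch.name:60s} {'uint32 OK' if ok else 'NOT uint32'}")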
The rather lengthy restart log is attached.
Following the discovery of an 18bit-DAC which fails its autocal in x1susex, I moved the card within the chassis to see if the DAC autocal failure follows the card. It does.
On inspection, this 18bit DAC differs from the other 4 in the chassis in that the S/N label on the solder side is smaller and placed closer to the front of the card. When we next open the h1susb123 and h1sush56 IO Chassis, we should check if those cards are visually different as well.
Entry by Kyle
LLCV % open for CP2 increased by 20% of its traditional output ~ 10 days ago for some reason -> today's delivery resulted in slow response (shouldn't be related) -> xfer line likely will warm up tonight before PID settles -> delivery rate wasn't abnormally short -> uncertain why response is different -> will investigate
This entry by John
This is likely due to the extra draw on the CP2 dewar after connecting all the 3IFO storage containers to N2 purge. The dewar pressure is probably falling but we should be able to reduce the storage purge.
If not we can use the self pressurizing circuit of the dewar to increase the boiloff.
(Doug) I realigned the ETMx pointing; it needed only a few micro-radians of adjustment since it was upgraded. Several other OPLVR lasers will need to be realigned in the near future, but I hope we can do them as we replace them with the new installation components.
The ETMy OPLVR laser overheated when its cooling fan quit. I pulled it, and Jason and I repaired the unit and will run it overnight on the bench. We tweaked the alignment, temperature settings, and current to get the best stabilization from it, and will look at it in the AM to see if it continues to be stable. Given Control Room permission, we will reinstall it in the AM. We had added one too many diodes to the fan to slow down the RPM and quiet the acoustic noise, which caused too large a current draw, causing the fan to overheat and stop. Doug and Jason
07:00 Jeff K begins RCG upgrade; the on-table external shutter went closed just after; ISS having some trouble. Sigg took control of the FE and is working on the ISS issue
08:15 ISS fixed and shutter reset
08:20 Hugh to LVEA to check HEPI accumulators (pressure and charging)
08:23 Elli & Nutsi out to HWS table by HAM4
08:25 Started un-tripping all suspensions except for BS pending the M2 stage coil driver swap.
08:27 Fil swapping the BS M2 coil driver
08:32 Corey to MY
08:40 Jim Batch restarting DAQ
08:43 Fil reports coil driver swapped. Also, seismometer in beer garden is connected.
08:48 Rick, Jason and Kiwamu headed towards PSL to fix periscope PZT
08:50 Kyle into LVEA and then to Y VEA
08:51 Elli and Nutsi out of LVEA
09:00 brought end stations to offline and safe. (sei/sus)
09:11 Jim and Corey to EX to restart damper on BRS
09:13 Betsy out to LVEA to look for stuff
09:25 McCarthy out to LVEA
09:31 Greg to TCSY table
09:40 ISS FE model re-started
09:44 McCarthy out of LVEA
09:45 Corey out to LVEA/Jim back from EX
09:53 Hugh out of LVEA
09:54 Hugh out to Ends to check HEPI actuators
09:55 Corey out of LVEA
09:56 RCG upgrade is COMPLETE! .....
10:00 Elli and Nutsi to end stations to put plates on cameras
10:12 Doug to EX to check opLev - did a tiny bit of alignment
10:29 Cris and Karen out of LVEA. Karen to EY
10:32 Elli and Nutsi back from ends
10:38 Betsy out to LVEA
10:41 Dick out to LVEA to ISC racks 1/2/4
10:46 Kyle back from end stations
10:53 Gerardo making deliveries to end stations
10:54 2nd LN2 delivery arrived. I don't remember the first one getting here but the alarms tell the tale.
11:01 Cris to EX
11:15 Port-O-Let maintenance on site
11:15 Hugh back from EY
11:20 original BS M2 coil driver returned to SUS C5.
11:36 Karen leaving EY
12:04 Jason/Rick/Kiwamu out of PSL
12:07 Greg, Elli & Nutsi out of LVEA
12:21 Dick out of LVEA
12:24 Fil and Hugh out to LVEA to press the centering button on the new seismometer
13:23 Fil and Andres @ MY
13:33 Gerardo out to LVEA to retrieve a PC by the X manifold.
13:35 Hugh out of LVEA
13:50 Gerardo out of LVEA
14:01 Kyle to Mid Stations
14:37 Hugh into CER
15:14 Fil and co. back from Mid station
15:15 Kyle back from Mid Stations
While clearing out many settings diffs in SDF after the RCG 2.9.1 upgrade today, Kissel and I discovered a few issues; the worst offender is the first one:
- OPTICALIGN values are not monitored in SDF since they constantly change. However, after reboots these values are read in via a burt from the SAFE.SNAP file. Often these files are more than a day old (in many cases weeks old), and therefore restore a stale IFO alignment. We need a better way to save local IFO alignments in the SAFE.SNAP now that we have rerouted how we transition suspensions from aligned to misaligned. (A stopgap snapshot idea is sketched after this list.)
- The precision of the SDF writing feature differs from the precision of the reading feature, so values do not clear from the diff list when you attempt to ACCEPT them.
- Usage of the CONFIRM and LOAD TABLE buttons is still confusing.
- Settings on the DIFFS screen represent the switch settings by reading SWSTAT, but the MON list still shows the SW1 and SW2 settings. This means the TOTAL DIFFS and NOT MONITORED counts never add up to the same number: one SW line item on the DIFF screen turns into two line items when pushed to the NOT MONITORED screen.
- The ramp matrix (specifically the LSC one) is constantly reporting changes; the time stamp keeps updating even though the value is not actually changing. Haven't looked at other ramp matrices yet.
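On the first item above, until SDF handles alignment values natively, a stopgap could be to snapshot the OPTICALIGN offsets over EPICS channel access just before a reboot and restore them afterwards. The sketch below uses pyepics; the optic list is illustrative only, and the channel-name pattern follows the usual H1:SUS-<OPTIC>_M0_OPTICALIGN_<DOF>_OFFSET convention, which should be checked against the live database (not every suspension type uses M0 for its alignment offsets).

import json, time
from epics import caget

OPTICS = ['ITMX', 'ITMY', 'ETMX', 'ETMY', 'PRM', 'SRM']   # illustrative, not a complete list
DOFS = ['P', 'Y']

snapshot = {}
for optic in OPTICS:
    for dof in DOFS:
        pv = f'H1:SUS-{optic}_M0_OPTICALIGN_{dof}_OFFSET'  # verify stage name per suspension type
        snapshot[pv] = caget(pv)

fname = time.strftime('opticalign_%Y%m%d_%H%M%S.json')
with open(fname, 'w') as f:
    json.dump(snapshot, f, indent=2)
print(f'wrote {len(snapshot)} offsets to {fname}')

Restoring would be the corresponding caput loop over the saved file after the burt restore completes.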
I've done a little bit more clean up -- this user interface is fantastic once you get to know how to do the operations you want. For example, I wanted to
- Get rid of the "errant file" red messages on the corner station BSC-ISI tables
- Reduce the channels "not found" and "not initialized" to zero on the HAM-ISI tables
both of which require one to write (i.e. confirm) something and then force the front end to re-read that newly written table (i.e. load) to clear the errors. So, I've
- selected to monitor the ISI's Master Switches (which instantly become a diff, because the "safe" state is still with the master switch OFF, and we're currently using the ISIs),
- confirmed the change (on the main table screen),
- loaded the table (on the "SDF RESTORE" screen),
- selected to unmonitor the master switch,
- and confirmed (on the main table).
Very nice!
J. Kissel, D. Sigg, D. Barker, J. Batch, B. Weaver, E. Merilh

We've upgraded all front-end models to RCG 2.9.1. Recovery of the IFO is on-going, given that we had our usual chaotic array of activities, but all signs point toward success. The upgrade was a little harder for some models / front-ends than others (details below), but we now have a completely green CDS overview screen (except for the saturation indicators that are always present on certain ISC models, and the currently-railing STS2 plugged into STS B causing all corner station seismic models to saturate -- more on that later). Details of the hiccups below.

Details
---------
Problem children:

- PSL ISS front-end model
This guy was the nastiest. The model somehow did not have a safe.snap file or softlink in the /opt/rtcds/lho/h1/target/h1psliss/h1pslissepics/burt/ directory. We didn't notice this until much later, but at first it caused the model to simply not start, because it couldn't find (what used to be) "the burt button" EPICS record that gave the OK status to get started. Daniel and I tried blindly restarting the model, and recompiling-reinstalling-and-restarting it, with no success. Finally, Daniel figured out that if he burt restored *while* the front end was coming up, he could get the model to start. Later, Betsy was making the effort to turn on the PSL ISS model's SDF system: she created a table but could not load it. After about 30 [sec] of trying, the front end model core dumped, seg faulted, and just died (only this user model crashed; every other model on the front end survived). It was only after investigating this that we found out about the missing target-area safe.snap. The userapps repo had a safe.snap, so we softlinked the target area to the userapps repo and restarted the front-end model. All has been well since. We don't understand how a front end model can exist without *something* in the target area called "safe.snap".

- The PCAL X and Y front-end models
Because it appears to be related (both in symptom and potentially in responsible parties), I mention this here. Last night, when I was capturing all settings in prep for today's model restarts, I captured new safe.snaps for the PCAL front end models, because they were not yet under SDF control. In doing so, I immediately noticed that these models *also* didn't have safe.snap files in the target area. I didn't think much of it at the time, because I know the history of the PCAL models, BUT now that I see a similar problem with the psliss model, I worry that the problem is systematic. Will investigate further.

- The end-station ISC, SEI, and SUS models
We had only planned to restart the end stations' ISC and PEM front ends, because the SUS and SEI had been upgraded last week. However, when we restarted the ISC models, we found lots of continuous IPC errors between the ISC models and SEI and SUS. We think we traced this down to the clearing of the entire IFO's IPC file yesterday. Dave and Jim had thought they'd recompiled and reinstalled the SEI and SUS, which should have populated the IPC file *without* restarting the models, but this didn't appear to be successful. So, we ended up recompiling, reinstalling, and restarting the SEI and SUS models in addition to every other model at the end stations (again). All errors are clear now, as mentioned above.
Also: as mentioned by Daniel (LHO aLOG 17969), we had forgotten to update some of the ODC library parts before getting started with the recompiling yesterday, so only some models received the upgrade to their ODC. There also seem to be a few bugs with the updated version, from what we can see; will follow up with the ODC team. However, since we've run out of time, we'll include the remainder of these in the model recompiling already planned for next Tuesday. Models that need restarting to receive the ODC update:
- All corner station SEI and SUS models
- Corner station TCS model
- Corner station PEM model
- Corner station LSC, ASC, and OMC models
- Corner station CAL model
(Note: the SUS AUX models don't have any ODC in them, so they do not need the update.)
The DAQ system was also updated to 2.9.1: h1fw0, h1fw1, h1nds0, h1nds1, h1dc0, h1broadcast0. The NDS1 protocol is now reported as 12.2, so a few control room tools will need to be updated to handle the protocol version change; they should function properly as they are for now. There were issues with duplicate channel names when restarting the data concentrator. These were caused by the ODC parts in h1calex and h1caley being named simply ODC at the top level of the model instead of EX_ODC and EY_ODC. This delayed the restart of the data concentrator by several minutes. The h1asc model is running a specially modified awgtpman to allow more testpoints, as it was under RCG-2.9.
JasonO, KiwamuI, RickS (and RobertS, in spirit)
Today, we moved the PZT-controlled mirror from the top of the IO periscope down to the surface of the optical table and swapped it with with a turning mirror that was on the table. I.e. IO_MB_M6 (top of periscope) swapped with IO_MB_M4 (turning mirror immediately downstream of thin-film polarizers). Note that the PZT blocks the (weak) beam transmitted through the mirror, so we removed beam dump IO_MB_BD6.
We first installed an iris at the top mirror output, using a C-clamp to attach a temporary plate to the top of the periscope.
We removed the top mirror mounting plate, installed a temporary Ameristat skirt using cleanroom tape, and used a single-edge razor blade to remove some of the periscope damping material. This allowed the top plate to drop down to the required position.
We swapped the pitch actuator on the upper mirror mount for a non-lockable actuator that doesn't interfere with the mounting plate.
We then used existing irises on the table in the path transmitted by the bottom periscope mirror and the iris we installed plus the spot on the outside of the PSL enclosure that reflects from the HAM1 input port to align the two mirrors we swapped.
We were able to re-install the protective shield for the vertical path up the periscope in its original orientation.
We expect that RobertS will assess whether moving the PZT mirror off the top of the periscope, where its induced noise is amplified by the periscope resonance, has reduced that noise.
A few images are attached below.
Some comments from the point of view of the IMC control.
After today's upgrade to RCG 2.9.1 we looked to see if the DAC AUTOCAL was successful using these procedural notes. The last check was done on the BSCs only a few weeks ago (alog 17597). During today's check, we found errors on h1susb123 again and also on h1sush56. Both show that the AUTOCAL failed for 1 of their DACs. As well, Kissel logged into LLO and found all AUTOCALs reported SUCCESS on all SUS front ends since their last computer restart.
The LHO errors were as follows:
controls@h1sush56 ~ 0$ dmesg | grep AUTOCAL
[ 60.359217] h1iopsush56: DAC AUTOCAL SUCCESS in 5134 milliseconds
[ 65.510812] h1iopsush56: DAC AUTOCAL SUCCESS in 5134 milliseconds
[ 70.661410] h1iopsush56: DAC AUTOCAL SUCCESS in 5134 milliseconds
[ 75.813017] h1iopsush56: DAC AUTOCAL FAILED in 5134 milliseconds
[ 80.963620] h1iopsush56: DAC AUTOCAL SUCCESS in 5134 milliseconds
[8443363.521348] h1iopsush56: DAC AUTOCAL SUCCESS in 5134 milliseconds
[8443368.669944] h1iopsush56: DAC AUTOCAL SUCCESS in 5133 milliseconds
[8443373.818544] h1iopsush56: DAC AUTOCAL SUCCESS in 5133 milliseconds
[8443378.967145] h1iopsush56: DAC AUTOCAL FAILED in 5133 milliseconds
[8443384.115739] h1iopsush56: DAC AUTOCAL SUCCESS in 5133 milliseconds
The first of the 2 above calibrations was the reboot 97 days ago, on Jan 14, 2015 when we upgraded to RCG 2.9.
________________________________________________________________
controls@h1susb123 ~ 0$ dmesg | grep AUTOCAL
[ 61.101850] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[ 66.252460] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[ 71.833569] h1iopsusb123: DAC AUTOCAL SUCCESS in 5133 milliseconds
[ 77.416848] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[ 82.567454] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[ 87.718046] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[ 92.869668] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[ 98.021279] h1iopsusb123: DAC AUTOCAL FAILED in 5134 milliseconds
[6643697.827654] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[6643702.976255] h1iopsusb123: DAC AUTOCAL SUCCESS in 5133 milliseconds
[6643708.553386] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[6643714.130498] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[6643719.278911] h1iopsusb123: DAC AUTOCAL SUCCESS in 5133 milliseconds
[6643724.427687] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[6643729.576295] h1iopsusb123: DAC AUTOCAL SUCCESS in 5133 milliseconds
[6643734.724703] h1iopsusb123: DAC AUTOCAL FAILED in 5133 milliseconds
[8443632.920341] h1iopsusb123: DAC AUTOCAL SUCCESS in 5133 milliseconds
[8443638.069031] h1iopsusb123: DAC AUTOCAL SUCCESS in 5133 milliseconds
[8443643.646215] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[8443649.223336] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[8443654.371885] h1iopsusb123: DAC AUTOCAL SUCCESS in 5133 milliseconds
[8443659.520452] h1iopsusb123: DAC AUTOCAL SUCCESS in 5134 milliseconds
[8443664.669135] h1iopsusb123: DAC AUTOCAL SUCCESS in 5133 milliseconds
[8443669.817631] h1iopsusb123: DAC AUTOCAL FAILED in 5133 milliseconds
The first of the 3 above calibrations was the reboot 97 days ago, on Jan 14, 2015 when we upgraded to RCG 2.9, then a restart/calibration by Kissel April 1 (mentioned above), then today's RCG 2.9.1 upgrade.
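As a cross-check on the bookkeeping: the bracketed dmesg numbers are seconds since the IOP host booted, so the most recent entries at ~8,443,363 s correspond to 8443363 / 86400 ≈ 97.7 days after boot, consistent with a Jan 14 boot and today's Apr 21 restart. The throwaway parser below (a hypothetical helper, not an existing CDS tool) summarizes output piped in from dmesg, e.g. dmesg | grep AUTOCAL | python3 autocal_summary.py:

import re
import sys

# Matches lines like "[ 60.359217] h1iopsush56: DAC AUTOCAL SUCCESS in 5134 milliseconds"
PATTERN = re.compile(r'\[\s*([\d.]+)\]\s+(\S+):\s+DAC AUTOCAL (SUCCESS|FAILED)')

for line in sys.stdin:
    m = PATTERN.search(line)
    if not m:
        continue
    uptime_s, model, result = float(m.group(1)), m.group(2), m.group(3)
    flag = '   <-- failed card' if result == 'FAILED' else ''
    print(f'{model}: {result:7s} at {uptime_s / 86400:7.2f} days after boot{flag}')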
J. Kissel, B. Weaver, D. Barker, R. McCarthy, and J. Batch

We've traced down which suspensions are *using* these DAC cards that always fail their calibration: susb123's is used by the ITM ESD, and sush56's is used by the bottom stage of SR3. Both are currently not used for any local or global control, so we don't *think* this should cause any glitches. @DetChar -- can you confirm this? I'm worried that when the DAC *noise* crosses zero there are still glitches. Further -- Dave has confirmed that there are 18-bit DACs on the DAQ test stand which fail the calibration, and those cards specifically appear to be of a different generation of board than the ones that pass the calibration regularly. We suspect that this is the case for the H1 DAC cards as well. However, because they're not used by anything, we figure we'll wait until we *have* to swap out the SUS DACs for the newer-better, fixed-up EEPROM version of the board to investigate further. That's the plan so far. Stay tuned!
Detchar would like to request that the HWS cameras in the center building be turned off for a few minutes at a known time. We're trying to track down some glitches in PEM and ISI sensors that happen every second, and Robert suspects the HWS. Just a few minutes with them off, and then on again, would be fine; we don't need the IFO to be in any particular state, as long as the ISIs are running fine. We would need the precise times (UTC or GPS preferred), as the channels that record the camera state don't seem trustworthy (alog).
This afternoon I turned all the HWS off and I will leave them off all night (both of the corner station HWS were on prior to this).
It seems like the HWS was in fact the culprit. The HWS was turned off at Apr 21 20:46:48 UTC, according to TCS-ITMX_HWS_DALSACAMERASWITCH. I checked the BLND_Z of the GS13s on BS and ITMX, and the table 2 PSL accelerometer. All three had glitches every second before the HWS was turned off. They all continued to glitch for 11 more seconds (until the end of the minute), and then all stopped at the exact same time. Attached is a spectrogram of the ITMX GS13. It's hard to see the glitches in the PSL by spectrogram or even Omega scan, but they're very apparent in the Omicron triggers.
Here are three better spectrograms showing the transitioning off of the HWS and the loud once per second glitches going away in the ISI-*_ST2_BLND_Z_GS13_CUT_IN1 channels. These plots are made with https://ldvw.ligo.caltech.edu using 0.25 seconds per FFT and normalization turned on. Conclusions same as Andy's post above.
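For anyone wanting to reproduce these plots offline rather than through ldvw, something like the gwpy snippet below should give a comparable 0.25 s FFT, normalized spectrogram around the switch-off time. This is only a sketch: the channel name is copied from the text above (so the exact DQ suffix should be verified), the times bracket the Apr 21 20:46:48 UTC switch-off quoted above, and the median-ratio step is my rough stand-in for the ldvw normalization.

from gwpy.timeseries import TimeSeries
from gwpy.time import to_gps

chan = 'H1:ISI-ITMX_ST2_BLND_Z_GS13_CUT_IN1_DQ'   # verify exact channel/DQ name
start = to_gps('2015-04-21 20:44:00')             # a few minutes before the HWS switch-off
end   = to_gps('2015-04-21 20:50:00')

data = TimeSeries.get(chan, start, end)
spec = data.spectrogram2(fftlength=0.25, overlap=0.125) ** (1 / 2.)
spec = spec.ratio('median')                       # normalize each frequency bin by its median
plot = spec.plot(norm='log')
plot.savefig('itmx_gs13_hws_off.png')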
David Shoemaker asked the good question: do these glitches even show up in DARM? Well, that's hard to say. There are once-per-second glitches that show up in the ISI channels, and once-per-second glitches that show up in DARM. We don't know if they have the same cause. Figures are: 1. DARM once-per-second glitches; 2, 3. BS and ITMX; 4. overlay of all, showing that the glitches in DARM are just slightly ahead in time (in this 0.25 sec/fft view, unless there is some sample-rate timing bias).
In order to test whether they are both caused by the HWS it would be really useful if folks on site could turn the HWS on, then off, for a minute or so in each configuration during a low-noise lock and record the UTC times of those states.
We got to a low noise state, though not as low noise as our best, with the spectrum about a factor of 10 worse around 90 Hz than our best reference. We were in low noise with the HWS off from 10:42:30 UTC to 10:47:30 UTC. I turned the cameras on according to Elli's instructions, and we left the cameras on from 10:48:20 UTC to 10:53:40 UTC.
Andy, Duncan When looking for times when the HWS camera was on or off, I found that the minute trends indicated that it was off on Apr 18 6:30 UTC for ~27 minutes. But the second trends indicate that it was turned off 20 minutes later than that (and back on at the same time). The raw data (sampled at 16 Hz) indicates that the camera was never turned off. This was originally found using data over NDS2, but Duncan has confirmed by using lalframe to read the frames directly. I've attached a plot below. The channels are H1:TCS-ITM{X,Y}_HWS_DALSACAMERASWITCH.
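To make the comparison concrete, here is the kind of cross-check involved, sketched with gwpy (the trend suffixes follow gwpy's ',s-trend'/',m-trend' channel-name convention; the start/stop times are placeholders around the window in question):

from gwpy.timeseries import TimeSeries
from gwpy.time import to_gps

chan  = 'H1:TCS-ITMX_HWS_DALSACAMERASWITCH'
start = to_gps('2015-04-18 06:00:00')
end   = to_gps('2015-04-18 07:30:00')

raw    = TimeSeries.get(chan, start, end)                    # 16 Hz raw data
strend = TimeSeries.get(chan + '.mean,s-trend', start, end)  # second trend
mtrend = TimeSeries.get(chan + '.mean,m-trend', start, end)  # minute trend

for name, ts in [('raw', raw), ('s-trend', strend), ('m-trend', mtrend)]:
    print(f'{name:8s} min={ts.value.min():g} max={ts.value.max():g}')

If the camera really never switched off, all three minima should be 1; a 0 in only the trends points at the trend frames rather than the instrument.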
I was able to run the check successfully on the Caltech (CIT) cluster using a MATLAB script, i.e., there the raw, minute, and second trends agree. The MATLAB script uses ligo_data_find. But if I run the same code on the Hanford cluster, it produces the results Andy and Duncan saw, i.e., the trends disagree. So there seems to be a difference between the trend frames at these two locations. I have attached the MATLAB scripts here in case someone wants to test them.
This is because the trend data from the two CDS framewriters can disagree. This happens if a framewriter restarts during the period covered by the trend file, and the averages from each framewriter are computed using a different number of values. These differences only happen with the trend data. See below for the details.
Note that at LHO, LDAS is using the CDS fw1 framewriter as the primary source of the scratch trends (saved at LHO for the past month) and the CDS fw0 framewriter as the primary source of the archive trends (copied to CIT and saved permanently at LHO and CIT). If a framewriter goes down, it will still write out the trend data based on what data it has since it restarted. Thus you can get trend frames that contain data averages for only part of the time period covered by the file.

For the time given in this alog, the trend files under /archive (from framewriter-0) and /scratch (from framewriter-1) differ in size:

$ ls -l /archive/frames/.../H-H1_M-1113372000-3600.gwf
-r--r--r-- 1 ldas ldas 322385896 Apr 18 00:27 /archive/frames/.../H-H1_M-1113372000-3600.gwf
$ ls -l /scratch/frames/.../H-H1_M-1113372000-3600.gwf
-r--r--r-- 1 ldas ldas 310156193 Apr 18 00:46 /scratch/frames/.../H-H1_M-1113372000-3600.gwf

Note that both files pass FrCheck (but have different checksums) and contain valid data according to framecpp_verify (e.g., run with the --verbose --data-valid options). However, if I dump out the data for one of the channels in question, I get:

$ FrDump -i /archive/frames/.../H-H1_M-1113372000-3600.gwf -t H1:TCS-ITMX_HWS_DALSACAMERASWITCH.mean -d 5 | grep "0:"
0: 1 1 1 1 1 1 1 1 1 1
10: 1 1 1 1 1 1 1 1 1 1
20: 1 1 1 1 1 1 1 1 1 1
30: 1 1 1 1 1 1 1 1 1 1
40: 1 1 1 1 1 1 1 1 1 1
50: 1 1 1 1 1 1 1 1 1 1
$ FrDump -i /scratch/frames/.../H-H1_M-1113372000-3600.gwf -t H1:TCS-ITMX_HWS_DALSACAMERASWITCH.mean -d 5 | grep "0:"
0: 0 0 0 0 0 0 0 0 0 0
10: 0 0 0 0 0 0 0 0 0 0
20: 0 0 0 0 0 0 0 0 1 1
30: 1 1 1 1 1 1 1 1 1 1
40: 1 1 1 1 1 1 1 1 1 1
50: 1 1 1 1 1 1 1 1 1 1

These frames start at

$ tconvert 1113372000
Apr 18 2015 05:59:44 UTC

and the 0's fill roughly the first 28 minutes of the /scratch file (copied from framewriter-1), while the /archive version (copied from framewriter-0) contains only 1's. Thus, I predict framewriter-1 restarted at around Apr 18 2015 06:28:00 UTC, and it seems that 0's get filled in for times before that. If I check H1:TCS-ITMX_HWS_DALSACAMERASWITCH.n, which gives the number of values used to compute the averages, it is also 0 when the above numbers are 0, indicating the 0's came from times when framewriter-1 had no data.

Note that this behavior only occurs for second-trend and minute-trend data. If data is missing in the raw or commissioning data, no file is written out. Thus, we never find a difference between the raw (H1_R) or commissioning (H1_C) frames among valid frames written by both framewriters. Note that the diffH1fb0vsfb1Frames process seen in the first row of green lights here, http://ldas.ligo-wa.caltech.edu/ldas_outgoing/archiver/monitor/d2dMonitor.html, is continuously checking that the raw frames from the two framewriters are the same. (The same process runs at LLO too.) If differences are found, it sends out an email alert. I've never received an alert, except when the RAID disk-arrays have either filled up (and 0 byte files were written by one framewriter) or when a RAID disk-array hung in some way that caused corrupt files to be written. In both cases, the files on the problem array never pass FrCheck and are never copied into the LDAS system. Thus, the behavior above is specific to the second-trend and minute-trend frames.

To avoid this issue, code should check the .n channel to make sure the full number of samples were used to obtain the average. Otherwise, some of the trend data gets filled in with zeros.
Greg said:
Thus, I predict framewriter-1 restarted at around Apr 18 2015 06:28:00 UTC. It seems that 0's get filled in for times before that.
The restart log for 17 April says
2015_04_17 23:28 h1fw1
With local PDT time = UTC - 7, Greg gets a gold star.
There should also be a .n channel which tells you how many samples were included in the average.
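Following that advice, a trend consumer can mask out any minute whose .n count is less than a full minute of samples (16 Hz x 60 s = 960 for this channel, given the 16 Hz raw rate quoted above). A hedged sketch, with placeholder times and gwpy's trend-suffix convention assumed:

import numpy as np
from gwpy.timeseries import TimeSeries
from gwpy.time import to_gps

chan  = 'H1:TCS-ITMX_HWS_DALSACAMERASWITCH'
start = to_gps('2015-04-18 05:59:44')   # start of the trend frame discussed above
end   = to_gps('2015-04-18 06:59:44')

mean = TimeSeries.get(chan + '.mean,m-trend', start, end)
nsmp = TimeSeries.get(chan + '.n,m-trend', start, end)

good = nsmp.value >= 16 * 60            # only keep fully-populated minutes
masked_mean = np.where(good, mean.value, np.nan)
print(f'{int((~good).sum())} of {good.size} minutes were incomplete; '
      f'mean over complete minutes = {np.nanmean(masked_mean):.3f}')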