There is a write-up of the bug which is possibly causing our front-end issues here.
I confirmed that our 2.6.34 kernel source on h1boot does have the arch/x86/include/asm/timer.h file with the bug.
To summarize: there is a low-level counter in the kernel which has a wrap-around bug in kernel versions 2.6.32 through 2.6.38. The counter wraps around after 208.499 days of uptime, and many other kernel subsystems assume this can never happen.
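For reference, the 208.499-day number is consistent with the commonly cited form of this bug: sched_clock() converts TSC cycles to nanoseconds using a 10-bit fixed-point scale factor, so the 64-bit product overflows once roughly 2^54 ns of uptime have accumulated. Below is a minimal sketch of that arithmetic, assuming the 10-bit shift; it is a sanity check, not taken from the write-up itself.

```python
# Back-of-the-envelope check of the 208.499-day figure, assuming the widely
# reported cause: the cycles-to-ns conversion in asm/timer.h uses a 10-bit
# fixed-point scale factor, so the 64-bit product overflows once the machine
# has accumulated 2**(64-10) nanoseconds of uptime.

SHIFT = 10                       # assumed CYC2NS_SCALE_FACTOR
overflow_ns = 2 ** (64 - SHIFT)  # ns of uptime before the product overflows
days = overflow_ns / 1e9 / 86400

print(f"wrap-around after {days:.3f} days")   # ~208.500, i.e. the ~208.5-day figure above
```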
At 06:10 Thursday morning h1seiex partially locked up. At 05:43 Friday morning h1susex partially locked up. As of late Wednesday night, all H1 front ends had been running for 208.5 days. The error messages on the h1seiex and h1susex consoles show timers which had been reset Wednesday night, within two hours of each other.
We are going on the assumption that the timer wrap-around has put the front-end computers in a fragile state where lock-ups may happen. We don't know why only two computers at EX have seen the issue, why it happened around 6 am, or why the lock-ups were one day apart. Nothing happened around 6 am this morning.
I am filing a work permit to reboot all H1 front end computers and DAQ computers which are running kernels with this bug.
15:03 UTC
15:14 Intention bit Undisturbed.
Also got the strange "Conflicting IFO Status" verbal just before the intention bit verbal.
TITLE: 04/29 Owl Shift: 07:00-15:00 UTC (00:00-08:00 PST), all times posted in UTC
STATE of H1: Observing at 64Mpc
INCOMING OPERATOR: Ed
SHIFT SUMMARY:
Nice shift, with H1 locked for over 14.5 hrs and only a trio of glitches. Luckily no front-end issues (knock on wood). L1 continues to be down due to possible alignment issues following yesterday's EQ. Chatted briefly with Doug L, who said Adam M was inbound.
LOG:
While looking into possible sources of glitches, I noticed one Evan mentioned earlier this week (alog #35800) about ITMx glitching every few seconds. Is this acceptable?
I was using the Oplev Overview screen to grab the oplev sum channels, and while doing that I noticed that the buttons on it for the SUS screens looked different from the main SUS screens one gets via the Sitemap. The problem button links on the oplev sum screen were for ITMx, BS, & ITMy. The windows which I clicked open had INVALID/white areas and just looked subtly different (perhaps this MEDM screen is calling up old SUS screens).
Attached is a screenshot showing the difference for the BS. The Oplev screen opens a screen called SUS_CUST_HLTX_OVERVEW.adl (vs. the one from the Sitemap, which is SUS_CUST_BSFM_OVERVIEW.ADL).
Alaska 5.4-magnitude EQ arrived; it originated there at 11:15 UTC.
Fairly quiet. Two glitches on H1.
See BLRMS 0.03-0.1 Hz seismic elevating.
Whoa. Actually looks like we'll be shaken more by a quake from Alaska (5.4 magnitude w/ 4.1 μm/s) which should be inbound soon... watching & waiting.
TITLE: 04/29 Owl Shift: 07:00-15:00 UTC (00:00-08:00 PST), all times posted in UTC
STATE of H1: Observing at 65Mpc
OUTGOING OPERATOR: Nutsinee
CURRENT ENVIRONMENT:
Wind: 5mph
Primary useism: 0.02 μm/s
Secondary useism: 0.08 μm/s (at 50th percentile)
QUICK SUMMARY:
Nutsinee mentioned glitches in this current lock, but we haven't had one for almost an hour. H1 is going on 7.5 hrs of being locked. Let's see if our front ends give us grief tonight/this morning.
TITLE: 04/29 Eve Shift: 23:00-07:00 UTC (16:00-00:00 PST), all times posted in UTC
STATE of H1: Observing at 66Mpc
INCOMING OPERATOR: Corey
SHIFT SUMMARY: Some oplev-related tweaking at the beginning of the shift. No trouble relocking after the earthquake. The range seems a bit glitchy during this lock stretch. I looked at random channels but no luck so far. The Detchar Hveto page doesn't have data for today yet.
LOG:
23:40 Sheila+TJ to power cycle EX Oplev
23:55 Sheila+TJ out
00:21 Observe
00:54 Sheila going back to EX to adjust oplev power. Switching off BRSX (forgot to go out of Observe while I did this). Out of Observe shortly after.
01:05 Sheila out. BRS turned back on. Back to Observe
Oplev glitches seem okay so far since Sheila turned the oplev power back up. Not much else to report.
We noticed on today's Hveto page that ETMX oplev is glitching. TJ and I went out and I turned the power knob down by about 1/8th of a turn. This reduced the output power by about 1%, and so far we don't have glitches.
After an hour it became clear that that power setting was actually worse, so I went back to the end station and turned the power up again at about 1 UTC. The glitching seems better in the first half hour at this higher power, but we will have to wait longer to see if it is really better. The attached screenshot shows the interferometer build up (which can explain some of the changes in the oplev sum) as well as the oplev sum for the past 5 hours.
Hopefully this means that we will be OK with glitches over the weekend.
I've used my IMC beam center measurements to calculate the change of the beam on IM1 in yaw, and propagated that change to the IO Faraday Isolator input, CalciteWedge1. The change on IM1 is calculated by comparing an ideal IMC (centered beams) to recent beam spot measurements from March 2017. Nominal IM alignments are from the vent, July 2014, when the IMC REFL beam was routed through the IMs and the beam was well centered on the input and output of the IO Faraday.
My calculations show that the beam on CalciteWedge1 has moved +8.1 mm, which is in the -X IFO direction, and the incident angle has changed by -1217 urad, reducing the incident angle from 6.49 deg to 6.42 deg.
Beam Changes on IM1, IM2, and the IO Faraday input, CalciteWedge1:
        | change | units |
im1 yaw |   -6.8 | mm    |
im1 yaw |    253 | urad  |
im2 yaw |   -8.4 | mm    |
im2 yaw |  -1417 | urad  |
cw1 yaw |    8.1 | mm    |
cw1 yaw |  -1217 | urad  |
The beam change on IM1 is well understood, since it comes from the IMC beam spot changes. The IM positions can be assumed to have some error; however, I've done the same calculations with IM positions from before and after the vent, and the change on CalciteWedge1 varies only by about 1 mm.
A change of 8 mm (+/- 1 mm) on the IO FI input is significant.
The optics inside the Faraday Rotator are only 20 mm in diameter, and there is a small loss in aperture due to the optic mounts.
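For anyone wanting to reproduce the geometry, the propagation amounts to carrying the [offset, angle] change from IM1 down the chain with ABCD ray-transfer matrices. Below is a minimal sketch under assumed placeholder distances; it is not the calculation used for the table above and will not reproduce those numbers.

```python
import numpy as np

# Minimal sketch (not the script used above) of carrying an [offset, angle]
# change down the input-optics chain with ABCD ray-transfer matrices.
# The distances are placeholders, not the as-built HAM2 layout, and only the
# free-space steps are shown; reflections off IM2 etc. (which also change the
# angle) would add their own matrices.

def free_space(L_m):
    """ABCD matrix for free-space propagation over L_m metres."""
    return np.array([[1.0, L_m],
                     [0.0, 1.0]])

# yaw change on IM1 from the table above: -6.8 mm offset, +253 urad angle
ray_im1 = np.array([-6.8e-3, 253e-6])      # [offset in m, angle in rad]

L_im1_im2 = 0.5    # placeholder distance IM1 -> IM2 [m]
L_im2_cw1 = 1.0    # placeholder distance IM2 -> CalciteWedge1 [m]

ray_cw1 = free_space(L_im2_cw1) @ free_space(L_im1_im2) @ ray_im1
print(f"offset at CW1: {ray_cw1[0]*1e3:+.1f} mm, angle: {ray_cw1[1]*1e6:+.0f} urad")
```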
Evan G., Jeff K.

Summary: The calibration measurement data collected yesterday has now been analyzed using our Markov Chain Monte Carlo (MCMC) methods. We detail the results below. Nothing abnormal was found, and we find that the time-varying factors can track changes in the sensing gain, coupled-cavity pole, and actuation coefficients.

Details: We analyzed the data collected during yesterday's calibration measurements, see LHO aLOG 35849. We have simplified the process for analyzing the data; rather than running two separate Matlab scripts to generate the MCMC results, we can now run just one script to get the final model results. For the sensing function, we run ${CALSVN}/trunk/Runs/O2/H1/Scripts/SensingFunctionTFs/runSensingAnalysis_H1_O2.m, while for actuation, we run ${CALSVN}/trunk/Runs/O2/H1/Scripts/FullIFOActuatorTFs/analyzeActuationTFs.m. Note that the actuation calibration script has not yet been converted to an IFO-agnostic script; once it is converted, it will be renamed.

Below is a table of values and their associated uncertainties for yesterday's measurements, together with the modeled values from the reference measurement of 3 Jan 2017 for comparison:

Parameter                                | Reference Value 2017-01-03 | MAP (95% C.I.) 2017-01-03 (MCMC) | MAP (95% C.I.) 2017-04-27 (MCMC)
Optical Gain K_C [ct/m]                  | 1.088e6                    | 1.088e6 (0.0002e6)               | 1.124e6 (0.0002e6)
Coupled Cav. Pole Freq. f_c [Hz]         | 360.0                      | 360.0 (7.6)                      | 343.4 (2.6)
Residual Sensing Delay tau_C [us]        | 0.67                       | 0.67 (6.7)                       | -1.8 (1.8)
SRC Detuning Spring Freq. f_s [Hz]       | 6.91                       | 6.91 (0.1)                       | 7.4 (0.04)
Inv. Spring Qual. Factor 1/Q_s [ ]       | 0.0046                     | 0.046 (0.016)                    | 0.009256 (0.0069)
UIM/L1 Actuation Strength K_UIM [N/ct]   | 8.091e-8                   | 8.091e-8 (0.2%)                  | 8.0818e-8 (0.18%)
PUM/L2 Actuation Strength K_PUM [N/ct]   | 6.768e-10                  | 6.768e-10 (0.02%)                | 6.795e-10 (0.08%)
TST/L3 Actuation Strength K_TST [N/ct]   | 4.357e-12                  | 4.357e-12 (0.02%)                | 4.537e-12 (0.07%)
UIM/L1 residual time delay [usec]        | n/a                        | n/a                              | 29.1 (35.5)
PUM/L2 residual time delay [usec]        | n/a                        | n/a                              | 7.7 (3.1)
TST/L3 residual time delay [usec]        | n/a                        | n/a                              | 10.2 (1.8)

These values are derived from MCMC fitting to the data values. The attached plots show these results for the sensing and multiple-stage actuation functions.

We have added a new feature to the MCMC analysis: modeling the residual time delay in the actuation. We expect zero usec of residual time delay, provided the model accurately captures all dynamics of the actuation; deviations from zero can reveal un-modeled dynamics. For example, the PUM and TST residual time delays are inconsistent with zero usec, but we expect that this is due to imperfect modeling of the complicated violin resonances of the quad suspension.

The time-varying factors are doing a good job tracking the changes between the reference model and the currently measured parameters (see the time-varying factors summary page).

For reference, the following parameter files were used:
${CALSVN}/trunk/Runs/O2/H1/params/2017-01-24/modelparams_H1_2017-01-24.conf (rev4401, last changed 4396)
${CALSVN}/trunk/Runs/O2/H1/params/2017-04-27/measurements_2017-04-27_sensing.conf (rev4596, last changed 4596)
${CALSVN}/trunk/Runs/O2/H1/params/2017-04-27/measurements_2017-04-27_ETMY_L1_actuator.conf (rev4596, last changed 4596)
${CALSVN}/trunk/Runs/O2/H1/params/2017-04-27/measurements_2017-04-27_ETMY_L2_actuator.conf (rev4596, last changed 4596)
${CALSVN}/trunk/Runs/O2/H1/params/2017-04-27/measurements_2017-04-27_ETMY_L3_actuator.conf (rev4596, last changed 4596)
There is a typo in the reference value for 1/Q_s above: it should be 0.046 (not 0.0046 as typed above).
For a bigger picture look at these 2017-04-27 actuation function measurements, I attach a plot of the model against measurement *before* dividing out the frequency dependence. This helps discern what the overall phase of the actuator stages should be relative to each other during O2.
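The analysis itself lives in the Matlab scripts listed above. Purely as an illustration of the kind of fit involved, here is a minimal Python/emcee sketch of MCMC-fitting a sensing-function model (optical gain, coupled-cavity pole, residual delay, detuned SRC spring) to a measured transfer function. The model form is the usual approximation and the "data" are synthesized placeholders, so none of the conventions or numbers here should be taken as the CALSVN implementation.

```python
import numpy as np
import emcee

# Illustrative only (NOT the CALSVN Matlab analysis): MCMC fit of an
# approximate sensing-function model to a measured transfer function.

def sensing_model(f, K_C, f_c, tau_us, f_s, inv_Q):
    """Optical gain x detuned-SRC spring x coupled-cavity pole x residual delay."""
    spring = f**2 / (f**2 + f_s**2 - 1j * f * f_s * inv_Q)
    pole = 1.0 / (1.0 + 1j * f / f_c)
    delay = np.exp(-2j * np.pi * f * tau_us * 1e-6)
    return K_C * spring * pole * delay

def log_prob(theta, f, tf_meas, sigma):
    K_C, f_c, tau_us, f_s, inv_Q = theta
    if K_C <= 0 or f_c <= 0 or f_s <= 0 or not (0 < inv_Q < 1):
        return -np.inf                              # crude flat priors
    resid = tf_meas - sensing_model(f, *theta)
    return -0.5 * np.sum(np.abs(resid)**2 / sigma**2)

# Placeholder "measurement": synthesize data from nominal parameter values.
f = np.logspace(1, 3, 50)
truth = (1.1e6, 350.0, 1.0, 7.0, 0.05)
sigma = 0.01 * np.abs(sensing_model(f, *truth))
rng = np.random.default_rng(0)
tf_meas = sensing_model(f, *truth) + sigma * (rng.normal(size=f.size)
                                              + 1j * rng.normal(size=f.size))

ndim, nwalkers = 5, 32
p0 = np.array(truth) * (1 + 1e-3 * rng.normal(size=(nwalkers, ndim)))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=(f, tf_meas, sigma))
sampler.run_mcmc(p0, 2000, progress=False)
samples = sampler.get_chain(discard=500, flat=True)   # emcee >= 3
print("posterior medians:", np.median(samples, axis=0))
```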
TITLE: 04/28 Day Shift: 15:00-23:00 UTC (08:00-16:00 PST), all times posted in UTC
STATE of H1: Observing at 0Mpc
INCOMING OPERATOR: Nutsinee
SHIFT SUMMARY:
LOG:
IFO was unlocked from the EX SUS outage when I arrived; locking was a little more problematic than usual, but not bad.
18:15 Bubba to MX
18:45 Dan to MSR
22:00 Lockloss from 6.8 EQ in the Philippines
22:00 Bubba, Chandra to LVEA to close gate valves and finish craning clean room
22:45 Start locking again
[JeffK, JimW, Jenne]
In hopes of making some debugging a bit easier, we have updated the safe.snap files in SDF just after a lockloss from NomLowNoise.
We knew that an earthquake was incoming (yay seismon!), so as soon as the IFO broke lock, we requested Down so that it wouldn't advance any farther. Then we accepted most of the differences so that everything except ISIBS, CS ECAT PLC2, EX ECAT PLC2, and EY ECAT PLC2 (which don't switch between Observe.snap and safe.snap) was green.
Jeff is looking at making it so that the ECAT models switch between safe and observe snap files like many of the other models, so that ISIBS will be the only model that has diffs (21 of them).
Note that if the IFO loses lock from any state other than NLN, we shouldn't expect SDF to all be green. But, since this is the state of things when we lose lock from NLN, it should be safe to revert to these values, in hopes of helping to debug.
After talking with Jeff and Sheila, I have made a few of the OBSERVE.snap files in the target directory a link to the OBSERVE.snap in userapps.
This list includes:
I have also updated the switch_SDF_source_files.py script that is called by ISC_LOCK on DOWN and on NOMINAL_LOW_NOISE. I changed the exclude list to only exclude the h1sysecatplc[1or3] "front ends". The sei models will stay in OBSERVE always just as before. This was tested in DOWN and in NLN, and has been loaded into the Guardian.
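As an illustration only, here is a minimal sketch of the relinking step described above (replacing a model's OBSERVE.snap in the target directory with a symlink to the userapps copy). Both path templates and the example model name are hypothetical placeholders, not the actual paths or list of models.

```python
import os

# Hedged sketch: make the target-directory OBSERVE.snap a symlink to the
# userapps copy, so edits live in one (version-controlled) place.
# Path templates and model names below are placeholders.

TARGET = "/opt/rtcds/lho/h1/target/{model}/{model}epics/burt/OBSERVE.snap"
USERAPPS = "/opt/rtcds/userapps/release/{sys}/h1/burtfiles/{model}_OBSERVE.snap"

def relink(model, sys):
    target = TARGET.format(model=model)
    source = USERAPPS.format(model=model, sys=sys)
    if os.path.islink(target) and os.readlink(target) == source:
        return                                # already linked correctly
    if os.path.exists(target) or os.path.islink(target):
        os.rename(target, target + ".bak")    # keep the old snap file around
    os.symlink(source, target)

# example call (hypothetical model name):
# relink("h1sustmsx", "sus")
```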
Summary: no apparent change in the induced wavefront from the point absorber.
After a request from Sheila and Kiwamu, I checked the status of the ITMX point absorber with the HWS.
If I look at the wavefront approximately 13 minutes after lock-aquisition, I see the same magnitude of optical path distortion across the wavefront (approximately 60nm change over 20mm). This is the same scale of OPD that was seen around 17-March-2017.
Note that the whole pattern has shifted slightly because of some on-table work in which a pick-off beam-splitter was placed in front of the HWS.
Thanks Aidan.
We were wondering about this because of the reappearance of the broad noise lump from 300-800 Hz in the last week, which is clearly visible on the summary pages (links in this alog). We also now have broad coherence between DARM and IMC WFS B pit DC, which I do not think we have had today. We didn't see any obvious alignment shift that could have caused this. It also seems to be getting better or going away if you look at today's summary page.
Here is a bruco for the time when the jitter noise was high: https://ldas-jobs.ligo-wa.caltech.edu/~sheila.dwyer/bruco_April27/
TITLE: 04/28 Owl Shift: 07:00-15:00 UTC (00:00-08:00 PST), all times posted in UTC
STATE of H1: LOCKING by Jim, but still in CORRECTIVE MAINTENANCE (will write an FRS unless someone else beats me to it again!)
INCOMING OPERATOR: Jim
SHIFT SUMMARY:
Groundhog Day shift, with H1 fine for the first 6 hrs and then EX going down again (but this time SUS... see earlier alog). This time I kept away from even breathing on the TMSx dither scripts & simply restored ETMx & TMSx to their values from before all of the front-end hubbub of this morning. I was able to get ALSx aligned, & this is where I'm handing off to Jim (he had to tweak ALSy, & I see that he is already tweaking up a locked PRMI). Much better outlook than yesterday at this time for sure!
LOG:
FRS Assigned & CLOSED/RESOLVED for h1susex frontend crash:
https://services.ligo-la.caltech.edu/FRS/show_bug.cgi?id=7995
Attached is a list of 2.6.3x-kernel machines and their runtimes. It looks like h1nds0 will wrap around Sunday afternoon.
h1nds0's timer will wrap around to zero about 11pm Sunday local time.
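A minimal sketch of how such a prediction can be made from a host's uptime is below. The uptime value is a placeholder, not h1nds0's actual uptime, and the 2^54 ns threshold is the same wrap-around figure discussed in the earlier entry.

```python
from datetime import datetime, timedelta

# Predict when a host will hit the ~208.5-day wrap-around, given its current
# uptime (e.g. read from /proc/uptime on that machine).  The uptime below is
# a placeholder value for illustration only.

WRAP_DAYS = 2**54 / 1e9 / 86400          # ~208.4999 days (see earlier entry)

uptime_s = 17_800_000.0                  # placeholder: current uptime in seconds
remaining = timedelta(days=WRAP_DAYS) - timedelta(seconds=uptime_s)
print("expected wrap-around at:", datetime.now() + remaining)
```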