Completed final tests on the bake enclosure heater today with the contractor. We have four thermocouple readouts, all of which act as over-heat protection as well as temperature read-back. If the temperature rises more than a user-defined margin (up to 20 degC) above the set point, the heater trips off.
We secured four type T thermocouples in the enclosure (one touches a metal flange at the bottom of CP4 and the other three float in air), plus one type K that floats in air and is connected to CDS at TE202A (formerly the CP3 exhaust temperature readout) for remote monitoring and alarm messaging. It currently reads 29 degC, higher than the other four, which read around 21 degC.
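As a sanity check on the interlock behavior described above, here is a minimal sketch of the trip condition; the function and parameter names are illustrative only, not the contractor's actual controller logic:

```python
# Illustrative sketch of the over-temperature trip described above,
# NOT the actual heater controller code.

def heater_permitted(readings_degC, setpoint_degC, margin_degC=20.0):
    """Trip (return False) if ANY thermocouple exceeds setpoint + margin.

    Each of the four enclosure thermocouples independently acts as
    over-heat protection, so a single high reading is enough to trip.
    The margin is user-defined, up to 20 degC.
    """
    trip_level = setpoint_degC + margin_degC
    return all(t <= trip_level for t in readings_degC)

# Example: with a 150 degC set point and the maximum 20 degC margin,
# one sensor at 171 degC trips the heater; all at/below 170 degC do not.
```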
We found a spill from the turbo's portable chiller this morning. The flex hose had slipped off a barbed connection, so Kyle replaced it with metal hose and threaded fittings. The fluid is DI water, mixed with some residual anti-corrosion slime.
We will set or adjust text/email alarms for the following channels:
J. Oberling, E. Merilh
The last couple of days have been somewhat frustrating. After installing the new FE pick-off last Friday, Ed and I attempted to re-acquire the PMC alignment, and failed. While we were successfully re-aligning the beam to the PMC, we were also misaligning the beam path towards the future home of the 70W amplifier; the beam was being driven too low and was beginning to clip on PBS02. We removed the FE pick-off and re-established our PMC alignment, and took the opportunity to install 2 irises; one between mirrors M04 and M05 and one between mirrors M06 and M07 (we would have preferred to have this iris between mirror M07 and the PMC, but there wasn't room to install the iris without blocking the reflected PMC beam). This was completed by COB Monday. This done, we re-installed the pick-off on Tuesday morning and, using primarily mirror M02 (but some adjustment of M01 was necessary), were able to recover the PMC alignment while mostly maintaining our beam path towards the 70W amp. We then tweaked the pick-off alignment, which resulted in a necessary small re-tweak of the PMC alignment. We took visibility measurements both before and after the pick-off installation:
One thing we noted was a loss of power incident on the PMC. Before the pick-off installation we had ~25W incident on the PMC; after the installation and alignment we only had ~23W incident on the PMC. We could find no obvious place where we are losing 2W of power; no obvious clipping or misalignments. Perhaps some clipping in the new pick-off? More on that below.
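For reference, the PMC visibility quoted in measurements like these is conventionally computed from the reflected power with the cavity unlocked versus locked; the sketch below uses that standard definition with illustrative numbers (the actual measured values are in the attachments and may use a slightly different convention):

```python
def pmc_visibility(p_refl_unlocked, p_refl_locked):
    """Fractional visibility: how much of the reflected power
    disappears when the PMC locks (1.0 = perfect mode matching).

    Standard definition; numbers here are illustrative only.
    """
    return (p_refl_unlocked - p_refl_locked) / p_refl_unlocked

# e.g. 23 W reflected unlocked, 2.3 W reflected locked -> visibility 0.90
```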
To begin this morning, we installed the Wincam to take a quick beam profile measurement of the FE beam and check for any obvious clipping from the new pick-off. There wasn't any; the beam looked as it did after our first install attempt last Friday, and very close to when we finished the NPRO swap in September 2017. We decided to move on to installation of AMP_WP01 and PBSC01, the first new on-table components for the 70W amplifier; PBSC01 replaces mirror M02, and together with AMP_WP01 gives us the ability to switch back and forth between 70W amplifier and FE-only operation. We reduced the power from the FE to ~300mW using the HWP inside the FE (thereby reducing the NPRO power delivered to the MOPA) and installed AMP_WP01. We then made some rough marks on the table to assist with installation of PBSC01, and removed mirror M02. PBSC01 was installed on the table in place of M02 and we began alternating translation of the mount and yaw of the optic to recover the PMC alignment.
During this process we noticed that the output power of the FE was changing. Without touching any power control optics, the FE power had drifted from 300mW to ~6.9W. We decided to lower the NPRO power by reducing the injection current from the NPRO power supply; we dropped it to ~1.26A from its operating point of 2.222A, which brought the FE output power back to ~300mW. Continuing the alignment, the FE power continued to increase on its own, getting up to 1.5W. At this point I noticed that I could adjust the HWP in the FE to reduce the power slightly, which indicates a possible shift in polarization from the NPRO. We broke for lunch and consulted with Matt about this; maybe they had seen similar behavior at LLO? Turns out they had not. We took some trends and, to the best we could tell, the FE power was following the NPRO power (I'll post the trends as a comment to this alog tomorrow). At this time we also noticed that when the FE is running at full power, the new pick-off is saturating; we will adjust this after PBSC01 installation. We decided to continue the alignment after lunch, this time putting the FE and NPRO output powers on a StripTool so we could monitor them in real time while we worked. Continuing the alignment, the FE output power continued to slowly increase on its own. Once it got to ~800mW, we decided to lower the NPRO injection current again. I dropped it from 1.26A to 1.0A; this lowered the FE output power to ~300mW, where it stayed for the remainder of the afternoon. I have never seen this behavior before and am unclear as to the cause.
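A simple way to codify what we were watching for by eye on the StripTool would be a drift alarm that flags samples outside a tolerance band around the nominal power; the thresholds and function below are hypothetical, chosen to match the ~300mW operating point we were trying to hold:

```python
def power_drift_alarm(samples_W, nominal_W=0.3, tolerance_W=0.1):
    """Return sample indices where FE output power drifted outside
    nominal +/- tolerance without any operator change.

    Nominal and tolerance values are illustrative, matching the
    ~300 mW operating point described above.
    """
    return [i for i, p in enumerate(samples_W)
            if abs(p - nominal_W) > tolerance_W]

# With the drift we saw (0.3 W creeping toward 0.8 W), the alarm
# starts firing once samples pass 0.4 W.
```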
Regardless, by slowly translating and yawing PBSC01, using progressively further away alignment references, we are able to recover the majority of the PMC alignment; we did not have to touch mirror M01 at all. We fine-tuned the PMC alignment with mirrors M06 and M07 and took a visibility measurement:
One thing to note is we are once again down in power incident on the PMC. Before PBSC01 installation there was ~23W incident on the PMC; after installation there is ~20.5W. While there is some leakage from PBSC01 towards the 70W amplifier beam path, it's not 2.5W worth. Once again we could find no obvious clipping or misalignment downstream of PBSC01 that would cause this loss of power. Looking upstream, however, we see a good deal of scatter. We can't easily tell where it's coming from; I suspect scatter from the new pick-off. I've attached a couple of pictures showing this scatter.
Tomorrow our plan is to get a picture of the beam profile post-PBSC01 installation, and then to begin investigating this scatter. We know we need to adjust the alignment of the pick-off to prevent saturation of the PD, maybe that will help with the scatter as well. Once that is taken care of we plan on moving on to measuring the beam caustics of the FE for mode matching modeling.
Attachments:
StephenA, AlenaA, NikoL, MarekS, JimW, RickS
Jim and crew completed the installation of the shield panels today. They also adjusted the compression of the upper-right (when viewed from the ETM) flexure gap to ~0.220".
Everything seems to be installed as designed.
NikoL, MarekS, TravisS, RickS
Began assessing the centering of the Pcal beams on the input and output apertures using targets mounted to the Pcal window flanges on the A1 adapter. We plan to continue this work in the morning, going inside the vacuum envelope to assess centering on the Pcal periscope relay mirrors.
We plan to install the Pcal target on the ETM suspension cage for this work.
In this photo, reflections of the beam tube surface can be seen in the baffle; these should not be confused with the smooth finish of the baffle itself.
Here is a photo logging the S/Ns of periscope components, collected during the above Pcal Y-end baffle and shields install effort. These articles conveniently do not appear to be assembled into any of the existing D1200174 assemblies. :(
Previous work: LHO aLOG 40759
Build records: T1800172
Summary of flexure measurements documented in T1800172:
| Flexure Location (viewed from Front per D1200174-v8) | Flexure Gap (in) |
|---|---|
| Upper Left | 0.210 |
| Upper Right | 0.230 |
| Lower Right | 0.150 |
| Lower Left | 0.110 |
| Flexure Location (viewed from Front per D1200174-v8) | Flexure Gap (in) |
|---|---|
| Upper Left | 0.210 |
| Upper Right | 0.220 |
| Lower Right | n/a |
| Lower Left | n/a |
2018 LHO End Y Flexure Gap, all baffles installed
SQZ6 enclosure was moved south of HAM6. Cabling for table in the SQZ bay was moved to new table location. SQZ team will let us know if we missed a cable. Power and E-Stop cables still need to be terminated.
Nutsinee Daniel
All outside cables are in place and connected.
Activity Summary:
Still at outbuildings:
Activity Details (all times UTC):
28-02-18 15:35 ChrisS to MY to drop off insulation for bakeout
28-02-18 15:54 Rick at EY, Pcal
28-02-18 16:05 Terry to SQZ bay
28-02-18 16:42 Terry out of the SQZ bay
28-02-18 16:40 Hugh to LVEA
28-02-18 16:55 Rick called, EY is laser safe
28-02-18 16:58 Hugh out of the LVEA
28-02-18 17:07 Hugh de-isolating the BS HEPI, then re-isolating
28-02-18 17:08 Hanford Fire Department at EX testing sensors
28-02-18 17:08 Jason and Ed to the PSL
28-02-18 17:21 Alena and Michael to EY
28-02-18 17:22 Travis to EY
28-02-18 17:26 Corey to SQZ bay and then optics lab
28-02-18 17:27 Jim and Stephen to EY
28-02-18 17:33 TJ to HAM6
28-02-18 17:40 Travis back from EY
28-02-18 17:43 TJ back from optics lab
28-02-18 18:11 TJ to HAM6
28-02-18 18:18 Jaimie starting work on new Guardian machine, currently getting a work permit
28-02-18 18:18 Fil, Liz, and Deisy to EX to work on access system
28-02-18 18:24 Mike and visitor to LVEA
28-02-18 18:45 Mike and visitor out of the LVEA
28-02-18 19:15 Betsy and Travis to EX to stage for in-vacuum work
28-02-18 19:15 TJ out of the LVEA
28-02-18 19:40 Fil, Liz, and Deisy back from EX
28-02-18 19:48 Jason and Ed to PSL
28-02-18 19:56 Betsy and Travis back from EX
28-02-18 20:20 Terry and Sheila to SQZ bay
28-02-18 20:22 Karen done at MX
28-02-18 20:25 Fil to TCS to install new limiter hardware
28-02-18 20:50 Fil, Liz, to HAM6 to work on cabling
28-02-18 21:00 Corey to EY with 2" optics
28-02-18 21:08 Gerardo to MY to remove a cable for use at MX
28-02-18 21:10 Arm Crew: Mark and Mark and TJ, taking the arm off HAM6
28-02-18 21:11 Arm crew will be opening the rollup door
28-02-18 21:12 Arm crew will ensure the HAM6 soft cover is on while the rollup door is in use
28-02-18 21:18 TJ to HAM6
28-02-18 21:19 Karen done at MY
28-02-18 21:23 Nutsinee to HAM6
28-02-18 21:24 Chandra to LVEA
28-02-18 21:27 Dave changes oplog
28-02-18 21:50 Nutsinee back from LVEA
28-02-18 21:50 Corey back from EY
28-02-18 21:51 Travis to LVEA
28-02-18 21:51 Rick to LVEA
28-02-18 21:59 Rick back from the LVEA
28-02-18 22:02 Travis, Rick, and Niko, to EY for Pcal work
28-02-18 22:05 JeffB to LVEA to retrieve parts
28-02-18 22:36 MarkP to LVEA to deliver cables
28-02-18 22:37 MarkP back from LVEA
28-02-18 22:37 DaveB to Mezz to check on chillers
28-02-18 22:37 Chandra to MY
28-02-18 22:54 Richard to old PSL chiller closet and then HAM6
28-02-18 23:12 Betsy to LVEA for supplies
28-02-18 23:13 DaveB to CER to disconnect the TCS FE input to the newly installed summing box
28-02-18 23:17 Mark and Mark done at HAM6, arm is off, SQZ table is placed
28-02-18 23:20 DaveB to TCS chillers to read the setpoint from the chiller display
28-02-18 23:32 Rick, Travis, Niko, Marek, leaving EY
28-02-18 23:32 EY is laser safe
28-02-18 23:48 Patrick restarting Beckhoff Laser Safety Code
28-02-18 23:56 Gerardo to MY to assist Kyle, then on to EY
28-02-18 23:57 Rick, Travis, Niko, Marek, back from EY
Hey, look, there's an updated oplog format! Original format for date and time was l-o-n-g. Shortened the numerical date, dropped the seconds for time, and dropped the callout of UTC.
Old: Feb 27 2018 15:25:45 UTC ChrisS to all out buildings for FAMIS/Maint.: fire ext. charging lifts
New: 28-02-18 15:35 ChrisS to MY to drop off insulation for bakeout
Update, as of 00:32UTC, all times in UTC:
01-03-18 00:16 Patrick done restarting code
01-03-18 00:16 Liz back from LVEA, Fil still at HAM6
01-03-18 00:27 Gerardo is back from EY
01-03-18 00:28 Jason and Ed done in the PSL
This was done to add a readback channel for the squeezer laser interlock and to add an interlock for a spare laser. It will have tripped the power supplies for connected lasers off and back on.
Waited until work was complete at end Y.
Feb 28 13:47:45 conlog-master conlogd[7513]: Unexpected problem with CA circuit to server "h1ecatc1.cds.ligo-wa.caltech.edu:5064" was "Connection reset by peer" - disconnecting
Feb 28 13:47:45 conlog-master conlogd[7513]: terminate called after throwing an instance of 'sql::SQLException'
Feb 28 13:47:45 conlog-master conlogd[7513]: what(): Invalid JSON text: "Invalid escape character in string." at position 44 in value for column 'events.data'.
2018-02-28T21:47:45.308479Z 8 Execute INSERT INTO events (pv_name, time_stamp, event_type, has_data, data) VALUES('H1:SQZ-CLF_FLIPPER_NAME', '1519854464944719307', 'update', 1, '{"type":"DBR_STS_STRING","count":1,"value":["?"],"alarm_status":"NO_ALARM","alarm_severity":"NO_ALARM"}')
2018-02-28T21:47:45.308688Z 8 Query rollback
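The rejected INSERT above fails because a garbled channel-access string value makes it into the JSON payload with an escape sequence MySQL will not accept. One defensive sketch (field names mirror the conlog row shown above, but the exact schema and fix are assumptions) is to serialize the value through json.dumps, decoding raw bytes permissively first so illegal escapes can never reach the database:

```python
import json

def safe_event_data(dbr_type, value):
    """Serialize an EPICS value field so any non-ASCII or control
    bytes are escaped legally, avoiding the "Invalid escape character
    in string" rejection seen in the conlog error above.

    Hypothetical helper; field names mirror the logged row, not the
    actual conlog source.
    """
    if isinstance(value, bytes):
        # Garbled channel-access strings may arrive as raw bytes;
        # decode permissively rather than passing them through verbatim.
        value = value.decode("utf-8", errors="replace")
    return json.dumps({"type": dbr_type, "count": 1, "value": [value]})
```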
WP7388 TCS CO2 laser summing chassis
Richard, Fil, Cheryl, Dave:
While both TCS CO2 lasers were OFF, we installed the new laser summing chassis in the CS CER.
At 14:00 PST I stopped h1tcscs from driving the ITM[X,Y] chiller setpoint control voltages. Soon after, Fil installed the summing box (D1500265) in the path between the TCS AI chassis and the ITMX and ITMY chiller units on the mech room mezzanine.
From 14:05 to 15:15 we ran the chillers in this mode. The chillers' LCD displays show the temperature setpoints for ITMX and ITMY to be 19.6C and 19.5C respectively.
The laser head temperatures reached an equilibrium value after about 45 minutes.
WHILE WE ARE TESTING THE SUMMING BOX AND CALIBRATING THE H1TCSCS MODEL, THE DAC OUTPUT SHOULD NOT BE TURNED ON.
Updated Open Light Voltages (OSEMs sitting face down on surfaces so darkish):
| M0 OSEM | NEW OLV | NEW OFFSET | NEW GAIN | OLD OFFSET | OLD GAIN |
|---|---|---|---|---|---|
| F1 | 27600 | 13800 | 1.087 | -15109 | 0.993 |
| F2 | 25600 | 12800 | 1.172 | -14961 | 1.003 |
| F3 | 27400 | 13700 | 1.095 | -14740 | 1.018 |
| LF | 29300 | 14650 | 1.024 | -15567 | 0.964 |
| RT | 26100 | 13050 | 1.149 | -15396 | 0.974 |
| SD | 27500 | 13750 | 1.091 | -15555 | 0.964 |
| R0 OSEM | NEW OLV | NEW OFFSET | NEW GAIN | OLD OFFSET | OLD GAIN |
|---|---|---|---|---|---|
| F1 | 29000 | -14500 | 1.034 | -15529 | 0.966 |
| F2 | 25000 | -12500 | 1.200 | -15029 | 0.998 |
| F3 | 26100 | -13050 | 1.149 | -14977 | 1.002 |
| LF | 29100 | -14550 | 1.031 | -15397 | 0.974 |
| RT | 27000 | -13500 | 1.111 | -15525 | 0.966 |
| SD | 27100 | -13550 | 1.107 | -15141 | 0.991 |
| L1 OSEM | NEW OLV | NEW OFFSET | NEW GAIN | OLD OFFSET | OLD GAIN |
|---|---|---|---|---|---|
| UL | 27225 | -13613 | 1.102 | -14611 | 1.027 |
| LL | 28285 | -14143 | 1.061 | -15059 | 0.996 |
| UR | 24350 | -12175 | 1.232 | -12868 | 1.116 |
| LR | 20780 | -10390 | 1.444 | -11361 | 1.166 |
| L2 OSEM | NEW OLV | NEW OFFSET | NEW GAIN | OLD OFFSET | OLD GAIN |
|---|---|---|---|---|---|
| UL (#572) | 18500 | -9250 | 1.62 | -10667 | 1.41 |
| LL (kept #428) | 22700 | -11350 | 1.32 | -12291 | 1.220 |
| UR (#522) | 22000 | -11000 | 1.36 | -10612 | 1.413 |
| LR (#526) | 21400 | -10700 | 1.40 | -10163 | 1.476 |
Note, we swapped out 3 of the 4 L2 stage AOSEMs which were showing lower OLVs than we were satisfied with, especially since Stuart at LLO had sent back the latest batch of 8 tested AOSEMs we had fabbed here, which had slightly higher OLVs. We removed AOSEM D0901065 S/Ns 321, 332, and 473; these AOSEMs could be used elsewhere if needed. ICS has been updated.
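The NEW OFFSET and NEW GAIN columns in the tables above appear to follow a simple rule: the offset is half the open-light value (with sign convention differing by stage; M0 is tabulated positive, the other stages negative) and the gain normalizes the OLV to a common 30000-count full scale. A sketch, assuming that rule:

```python
def osem_calibration(olv_counts, target=30000):
    """Offset centers the readout at half the open-light value and the
    gain normalizes it to a common full scale (30000 counts here),
    reproducing the NEW OFFSET and NEW GAIN columns above.

    Sign convention: negative offsets as in the R0/L1/L2 tables
    (the M0 table lists the same magnitudes with positive sign).
    """
    offset = -olv_counts / 2
    gain = round(target / olv_counts, 3)
    return offset, gain

# e.g. the R0 F1 BOSEM: OLV 29000 -> offset -14500, gain 1.034
```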
WP 7372 FRS 9743
As reported in LHO aLogs 40171 and 40201, the Corner 4 vertical actuator on HEPI appeared to be very close to the plus-side mechanical stop and exhibited some clipping at larger strokes. To fix this, the following was done:
With HEPI isolating, corners 4 and 2 were mechanically locked; managed this pretty well, not tripping until well locked.
Partially installed the Actuator locking screws and positioned the 0.1" shims between the Tripod Base and the Top Foundation.
Pressurized the support jack to hold the Actuator just enough to prevent it from dropping.
Loosened the horizontal 1/2-20 SHCS holding the Actuator to the Actuator Brackets.
With those bolts loose, tightened the Actuator locking screws, raising the actuator until the shims were clamped. Since I was able to start and turn the locking screws, I had confidence the Tripod assembly was not horizontally out of position and so did not need to loosen the vertical Actuator Bracket bolts. Had that not been the case, this chore would have taken considerably more time: instead of just 6 bolts to loosen and then tighten, it would have been 18, with a couple of iterations of tightening one group, then the other, and loosening the first to relieve strain.
Now, with the Actuator effectively back in the 'Installation' setup, I was ready to zero the IPS. However, looking at the actuator/stop gap suggested that the ~3000 ct reading was appropriate, so the IPS was not zero'd. The Actuator locking screws were removed (almost forgot one!) and the platform was unlocked.
At this point, before and after positions were viewed on trends to determine new isolation targets for the vertical DOFs: the computed Z, RX, & RY positions changed with the change of V4. These shifts were applied to the old target locations to arrive at new ones.
| DOF | Unlocked Before | Unlocked After | V4 Shift | Target Before | Target After |
|---|---|---|---|---|---|
| Z | -10000 | -85000 | -75000 | -47000 | -122k (later -92k) |
| RX | -69000 | -21000 | +48000 | -78000 | -30000 |
| RY | -93000 | -47000 | +56000 | -96000 | -40000 |
After re-isolating with these new computed values, it was clear the actuators were pulling down to isolate the platform at the new Z location. So, I 'raised' the target position from -122k to -92k. This served to put the actuator drives roughly around zero rather than all negative, which in turn puts the free-hanging platform much closer to the isolated position. It also put the IPSs in a better balanced readout. I can't imagine that 30um in vertical position will impact any beam centering, but of course this can easily be recovered. The newly computed RX & RY targets were very close to the free-hanging tilts; I cannot explain why the Z was off enough to warrant further action.
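The target re-computation above is just the old target plus the measured free-hanging shift, per DOF; a minimal sketch using the table's values (IPS counts):

```python
def shifted_targets(old_targets, shifts):
    """Apply the free-hanging position change from the V4 work to the
    old isolation targets. Values are IPS counts from the table above.
    """
    return {dof: old_targets[dof] + shifts[dof] for dof in old_targets}

old = {"Z": -47000, "RX": -78000, "RY": -96000}
shift = {"Z": -75000, "RX": 48000, "RY": 56000}
# -> Z: -122000 (later raised by hand to -92000), RX: -30000, RY: -40000
```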
WP closed; FRS Pending
We are setting up a new guardian host machine. The new machine (currently "h1guardian1", but to be renamed "h1guardian0" after the transition is complete) is running Debian 9 "stretch", with all CDS software installed from pre-compiled packages from the new CDS debian software archives. It has been configured with a completely new "guardctrl" system that will manage all the guardian nodes under the default systemd process manager. A full description of the new setup will come in a future log, after the transition is complete.
The new system is basically ready to go, and I am now beginning the process of transferring guardian nodes over to the new host. For each node to be transferred, I will stop the process on the old machine, and start it fresh on the new system.
I plan on starting with SUS and SEI in HAM1, and will move through the system ending with HAM6.
There's been a bit of a hitch with the guardian upgrade. The new machine (h1guardian1) has been set up and configured. The new supervision system and control interface are fully in place, and all HAM1 and HAM2 SUS and SEI nodes have been moved to the new configuration. The configuration is currently documented in the guardian gitlab wiki.
Unfortunately, node processes are occasionally spontaneously seg faulting for no apparent reason. The failures are happening at a rate of roughly one every 6 hours or so. I configured systemd to catch and log coredumps from segfaults for inspection (using the systemd-coredump utility). After we caught our next segfault (which happened only a couple of hours later), Jonathan Hanks and I started digging into the core to see what we could ferret out. It appears to be some sort of memory corruption error, but we have not yet determined where in the stack the problem is coming from. I suspect that it's in the pcaspy EPICS portable channel access python bindings, but it could be in EPICS. I think it's unlikely that it's in python2.7 itself, although we aren't ruling anything out.
We then set up the processes to be run under electric fence to try to catch any memory out-of-bounds errors. This morning I found two processes that had been killed by efence, but I have not yet inspected the core files in depth. Below are the coredump summaries from coredumpctl on h1guardian1.
This does not bode well for the upgrade. Best case we figure out what we think is causing the segfaults early in the week, but there still won't be enough time to fix the issue, test, and deploy before the end of the week. A de-scoped agenda would be to just do a basic guardian core upgrade in the existing configuration on h1guardian0 and delay the move to Debian 9 and systemd until we can fully resolve the segfault issue.
Here is the full list of nodes currently running under the new system:
HPI_HAM1 enabled active
HPI_HAM2 enabled active
ISI_HAM2 enabled active
ISI_HAM2_CONF enabled active
SEI_HAM2 enabled active
SUS_IM1 enabled active
SUS_IM2 enabled active
SUS_IM3 enabled active
SUS_IM4 enabled active
SUS_MC1 enabled active
SUS_MC2 enabled active
SUS_MC3 enabled active
SUS_PR2 enabled active
SUS_PR3 enabled active
SUS_PRM enabled active
SUS_RM1 enabled active
SUS_RM2 enabled active
If any of these nodes show up white on the guardian overview screen, it's likely because they have crashed. Please let me know and I will deal with them asap.
guardian@h1guardian1:~$ coredumpctl info 11512
PID: 11512 (guardian SUS_MC)
UID: 1010 (guardian)
GID: 1001 (controls)
Signal: 11 (SEGV)
Timestamp: Sat 2018-03-03 11:56:20 PST (4h 50min ago)
Command Line: guardian SUS_MC3 /opt/rtcds/userapps/release/sus/common/guardian/SUS_MC3.py
Executable: /usr/bin/python2.7
Control Group: /user.slice/user-1010.slice/user@1010.service/guardian.slice/guardian@SUS_MC3.service
Unit: user@1010.service
User Unit: guardian@SUS_MC3.service
Slice: user-1010.slice
Owner UID: 1010 (guardian)
Boot ID: 870fed33cb4446e298e142ae901c1830
Machine ID: 699a2492538f4c09861889afeedf39ab
Hostname: h1guardian1
Storage: /var/lib/systemd/coredump/core.guardianx20SUS_MC.1010.870fed33cb4446e298e142ae901c1830.11512.1520106980000000000000.lz4
Message: Process 11512 (guardian SUS_MC) of user 1010 dumped core.
Stack trace of thread 11512:
#0 0x00007f1255965646 strlen (libc.so.6)
#1 0x00007f12567c86ab EF_Printv (libefence.so.0.0)
#2 0x00007f12567c881d EF_Exitv (libefence.so.0.0)
#3 0x00007f12567c88cc EF_Exit (libefence.so.0.0)
#4 0x00007f12567c7837 n/a (libefence.so.0.0)
#5 0x00007f12567c7f30 memalign (libefence.so.0.0)
#6 0x00007f1241cba02d new_epicsTimeStamp (_cas.x86_64-linux-gnu.so)
#7 0x0000556e57263b9a call_function (python2.7)
#8 0x0000556e57261d45 PyEval_EvalCodeEx (python2.7)
#9 0x0000556e5727ea7e function_call.lto_priv.296 (python2.7)
#10 0x0000556e57250413 PyObject_Call (python2.7)
...
guardian@h1guardian1:~$ coredumpctl info 11475
PID: 11475 (guardian SUS_MC)
UID: 1010 (guardian)
GID: 1001 (controls)
Signal: 11 (SEGV)
Timestamp: Sat 2018-03-03 01:33:51 PST (15h ago)
Command Line: guardian SUS_MC1 /opt/rtcds/userapps/release/sus/common/guardian/SUS_MC1.py
Executable: /usr/bin/python2.7
Control Group: /user.slice/user-1010.slice/user@1010.service/guardian.slice/guardian@SUS_MC1.service
Unit: user@1010.service
User Unit: guardian@SUS_MC1.service
Slice: user-1010.slice
Owner UID: 1010 (guardian)
Boot ID: 870fed33cb4446e298e142ae901c1830
Machine ID: 699a2492538f4c09861889afeedf39ab
Hostname: h1guardian1
Storage: /var/lib/systemd/coredump/core.guardianx20SUS_MC.1010.870fed33cb4446e298e142ae901c1830.11475.1520069631000000000000.lz4
Message: Process 11475 (guardian SUS_MC) of user 1010 dumped core.
Stack trace of thread 11475:
#0 0x00007fa7579b5646 strlen (libc.so.6)
#1 0x00007fa7588186ab EF_Printv (libefence.so.0.0)
#2 0x00007fa75881881d EF_Exitv (libefence.so.0.0)
#3 0x00007fa7588188cc EF_Exit (libefence.so.0.0)
#4 0x00007fa758817837 n/a (libefence.so.0.0)
#5 0x00007fa758817f30 memalign (libefence.so.0.0)
#6 0x00005595da26610f PyList_New (python2.7)
#7 0x00005595da28cb8e PyEval_EvalFrameEx (python2.7)
#8 0x00005595da29142f fast_function (python2.7)
#9 0x00005595da29142f fast_function (python2.7)
#10 0x00005595da289d45 PyEval_EvalCodeEx (python2.7)
...
After implementing the efence stuff above, we came in to find more coredumps the next day. On a cursory inspection of the coredumps, we noted that they all showed completely different stack traces. This is highly unusual and pathological, and prompted Jonathan to question the integrity of the physical RAM itself. We swapped out the RAM with a new 16G ECC stick and let it run for another 24 hours.
When next we checked, we discovered only two efence core dumps, indicating an approximate factor of three increase in the mean time to failure (MTTF). However, unlike the previous scatter shot of stack traces, these all showed identical "mprotect" failures, which seemed to point to a side effect of efence itself running in to limits on per process memory map areas. We increased the "max_map_count" (/proc/sys/vm/max_map_count) by a factor of 4, again left it running overnight, and came back to no more coredumps. We cautiously declared victory.
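The max_map_count change above was applied with sysctl; a hypothetical helper like the one below (not part of our actual procedure) shows the read-and-scale step, since writing /proc/sys/vm/max_map_count itself requires root:

```python
# Hypothetical helper mirroring the fix above: read the kernel's
# per-process memory-map limit and compute the 4x value we applied.
# Writing the new value back (sysctl -w vm.max_map_count=...) needs root.

def scaled_map_count(path="/proc/sys/vm/max_map_count", factor=4):
    """Return factor * the current value stored in the given sysctl file."""
    with open(path) as f:
        current = int(f.read().strip())
    return current * factor
```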
I then started moving the remaining guardian nodes over to the new machine. I completed the new setup by removing the efence, and rebooting the new machine a couple of times to work out the kinks. Everything seemed to be running ok...
Until more segfault/coredumps appeared. A couple of hours after the last reboot of the new h1guardian1 machine, there were three segfaults, all with completely different stack traces. I'm now wondering if efence was somehow masking the problem. My best guess is that efence was slowing down the processes quite a bit (by increasing system call times), which increased the MTTF by a similar factor. Or the slower processes were less likely to run into some memory-corruption race condition.
I'm currently running memtest on h1guardian1 to see if anything shows up, but it's passed all tests so far...
16 seg faults overnight, after rebooting the new guardian machine at about 9pm yesterday. I'll be reverting guardian to the previous configuration today.
Interestingly, though, almost all of the stack traces are of the same type, unlike before, when they were all different. Here's the trace we're seeing in 80% of the instances:
#0 0x00007ffb9bfe4218 malloc_consolidate (libc.so.6)
#1 0x00007ffb9bfe4ea8 _int_free (libc.so.6)
#2 0x000055d2caca7bc5 list_dealloc.lto_priv.1797 (python2.7)
#3 0x000055d2cacdb127 frame_dealloc.lto_priv.291 (python2.7)
#4 0x000055d2caccb450 fast_function (python2.7)
#5 0x000055d2caccb42f fast_function (python2.7)
#6 0x000055d2caccb42f fast_function (python2.7)
#7 0x000055d2caccb42f fast_function (python2.7)
#8 0x000055d2cacc3d45 PyEval_EvalCodeEx (python2.7)
#9 0x000055d2cace0a7e function_call.lto_priv.296 (python2.7)
#10 0x000055d2cacb2413 PyObject_Call (python2.7)
#11 0x000055d2cacf735e instancemethod_call.lto_priv.215 (python2.7)
#12 0x000055d2cacb2413 PyObject_Call (python2.7)
#13 0x000055d2cad69c7a call_method.lto_priv.2801 (python2.7)
#14 0x000055d2cad69deb slot_mp_ass_subscript.lto_priv.1204 (python2.7)
#15 0x000055d2cacc6c5b PyEval_EvalFrameEx (python2.7)
#16 0x000055d2cacc3d45 PyEval_EvalCodeEx (python2.7)
#17 0x000055d2cace0a7e function_call.lto_priv.296 (python2.7)
#18 0x000055d2cacb2413 PyObject_Call (python2.7)
#19 0x000055d2cacf735e instancemethod_call.lto_priv.215 (python2.7)
#20 0x000055d2cacb2413 PyObject_Call (python2.7)
#21 0x000055d2cad69c7a call_method.lto_priv.2801 (python2.7)
#22 0x000055d2cad69deb slot_mp_ass_subscript.lto_priv.1204 (python2.7)
#23 0x000055d2cacc6c5b PyEval_EvalFrameEx (python2.7)
#24 0x000055d2cacc3d45 PyEval_EvalCodeEx (python2.7)
#25 0x000055d2cace0a7e function_call.lto_priv.296 (python2.7)
#26 0x000055d2cacb2413 PyObject_Call (python2.7)
#27 0x000055d2cacf735e instancemethod_call.lto_priv.215 (python2.7)
#28 0x000055d2cacb2413 PyObject_Call (python2.7)
#29 0x000055d2cad69c7a call_method.lto_priv.2801 (python2.7)
#30 0x000055d2cad69deb slot_mp_ass_subscript.lto_priv.1204 (python2.7)
Here's the second most common trace:
#0 0x00007f7bf5c32218 malloc_consolidate (libc.so.6)
#1 0x00007f7bf5c32ea8 _int_free (libc.so.6)
#2 0x00007f7bf5c350e4 _int_realloc (libc.so.6)
#3 0x00007f7bf5c366e9 __GI___libc_realloc (libc.so.6)
#4 0x000055f7eaad766f list_resize.lto_priv.1795 (python2.7)
#5 0x000055f7eaad6e55 app1 (python2.7)
#6 0x000055f7eaafd48b PyEval_EvalFrameEx (python2.7)
#7 0x000055f7eab0142f fast_function (python2.7)
#8 0x000055f7eab0142f fast_function (python2.7)
#9 0x000055f7eab0142f fast_function (python2.7)
#10 0x000055f7eab0142f fast_function (python2.7)
#11 0x000055f7eaaf9d45 PyEval_EvalCodeEx (python2.7)
#12 0x000055f7eab16a7e function_call.lto_priv.296 (python2.7)
#13 0x000055f7eaae8413 PyObject_Call (python2.7)
#14 0x000055f7eab2d35e instancemethod_call.lto_priv.215 (python2.7)
#15 0x000055f7eaae8413 PyObject_Call (python2.7)
#16 0x000055f7eab9fc7a call_method.lto_priv.2801 (python2.7)
#17 0x000055f7eab9fdeb slot_mp_ass_subscript.lto_priv.1204 (python2.7)
#18 0x000055f7eaafcc5b PyEval_EvalFrameEx (python2.7)
#19 0x000055f7eaaf9d45 PyEval_EvalCodeEx (python2.7)
#20 0x000055f7eab16a7e function_call.lto_priv.296 (python2.7)
#21 0x000055f7eaae8413 PyObject_Call (python2.7)
#22 0x000055f7eab2d35e instancemethod_call.lto_priv.215 (python2.7)
#23 0x000055f7eaae8413 PyObject_Call (python2.7)
#24 0x000055f7eab9fc7a call_method.lto_priv.2801 (python2.7)
#25 0x000055f7eab9fdeb slot_mp_ass_subscript.lto_priv.1204 (python2.7)
#26 0x000055f7eaafcc5b PyEval_EvalFrameEx (python2.7)
#27 0x000055f7eaaf9d45 PyEval_EvalCodeEx (python2.7)
#28 0x000055f7eab16a7e function_call.lto_priv.296 (python2.7)
#29 0x000055f7eaae8413 PyObject_Call (python2.7)
#30 0x000055f7eab2d35e instancemethod_call.lto_priv.215 (python2.7)
Pump down curve attached. XBM and YBM are isolated.
Hugh, Alvaro, TJ, Sheila
This afternoon Alvaro and TJ routed the fibers in HAM6 and Hugh installed the fiber feedthroughs. Alvaro and I used the 532 nm (eye safe) fiber laser to check the transmission. We measured 5.7 mW out of the fiber laser, 5 mW out of the 532 nm collimator, and 3 mW out of the CLF/seed collimator. The fiber laser power might have been fluctuating during our measurements, but the fibers are working.
The feedthrough labeled SN8 is on the flange D4-2 connected to the fiber (SN..) which goes to the 532nm pump path. Feedthrough SN9 is on the flange D4-3, inside the chamber it is connected to fiber SN _ which goes to the collimator for the seed/clf path.
This alog will be updated in the morning with the fiber serial numbers, and after I double check that I have the flange numbers correct.
h1iscey front end glitched at 14:15 PST. We are holding off on its restart until we contact EY group.
killed and started all models on h1iscey with EY permission.
I have seen glitches on my test stand H1-style ISCEX machine here at LLO (actually quite frequently). They persist even with the GE FANUC RFM removed. I have not tried an L1-style model mix yet.
We believe this was physically due to brushing equipment past the cables which loop out of the front of the rack at the end station. Note, these racks are in the middle receiving bay, so they frequently see traffic traversing in and out of the VEA.
(Mark D, Mark L, Gerardo M)
Ion pump 12 was removed from the beam tube, and the vacuum port on the beam tube was covered with a blank. The ion pump's port was also covered with a blank. Varian SQ051 number 70095.
Measured dew point of -43.7 degC after IP12 removal.
Attached is a 500 day trend of 16 face BOSEMs from a few different randomly sampled suspensions in the corner station.
All of the sampled BOSEMs see 200-600 counts of decay over ~365 days of the plotted data.
Note, "face" facing BOSEMs on different types of suspensions have different reference names (T1 vs F1 vs RT BOSEMs are mounted in different locations on the different types of suspensions). For reference, search "Controls Arrangement Poster" by J. Kissel to see all of the configurations (E1100109 is the HLTS one for example). Or see the rendering on each medm screen.
Attached is the first plot I made of a few different randomly sampled suspensions, which included some vertically mounted BOSEMs. Those trends are plotted in brown and show other factors, such as temperature, in their shape over the last 500 days. Of the remaining red "face"-mounted BOSEMs on the plot, all 11 show a downward trend of a couple hundred counts.
Using the same random selection of face OSEM channels as Betsy in the original aLOG entry above, but for LLO, 500 day trends are attached below. OSEM open-light decay trends appear similar between sites, with in general 100-600 counts of decay over ~500 days of the plotted data. However, it should be noted that the IM suspensions also included in the trends employ AOSEMs, and not BOSEMs, but the decay trends for both types of OSEMs appear to be consistent.
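As a rough scale for the trends above, the observed decay can be expressed as a fraction of a typical open-light value per year; the sketch below uses 27000 counts as an illustrative OLV (a typical BOSEM value from the tables earlier in this log), which is an assumption, not a per-sensor calibration:

```python
def olv_decay_rate(counts_lost, days, olv_counts=27000):
    """Rough fractional open-light decay per year.

    27000 counts is an illustrative BOSEM OLV; individual sensors vary
    (roughly 18500-29300 counts in the tables earlier in this log).
    """
    return (counts_lost / olv_counts) * (365.0 / days)

# The 200-600 counts over ~365 days seen at LHO works out to
# roughly 0.7-2.2 % of OLV per year.
```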