I have moved a subset of guardian nodes to the new configuration on h1guardian1. This is to try to catch more of the segfaults we were seeing during the last upgrade attempt, which we have not been able to reproduce in testing.
The nodes should function normally on the new system, but given what we saw before we expect to see segfaults with a mean time to failure of about 100 hours. I will be babysitting the nodes on the new setup, and will restart them as soon as they crash.
The nodes that have been moved to the new system are all the SUS and SEI nodes in the input chambers, BS, and the arms. No nodes from HAM4, HAM5, or HAM6 were moved. Full list of nodes now running on h1guardian1:
jameson.rollins@opsws12:~ 0$ ssh guardian@h1guardian1 list
HPI_BS HPI_ETMX HPI_ETMY HPI_HAM1 HPI_HAM2 HPI_HAM3 HPI_ITMX HPI_ITMY
ISI_BS_ST1 ISI_BS_ST1_BLND ISI_BS_ST1_SC ISI_BS_ST2 ISI_BS_ST2_BLND ISI_BS_ST2_SC
ISI_ETMX_ST1 ISI_ETMX_ST1_BLND ISI_ETMX_ST1_SC ISI_ETMX_ST2 ISI_ETMX_ST2_BLND ISI_ETMX_ST2_SC
ISI_ETMY_ST1 ISI_ETMY_ST1_BLND ISI_ETMY_ST1_SC ISI_ETMY_ST2 ISI_ETMY_ST2_BLND ISI_ETMY_ST2_SC
ISI_HAM2 ISI_HAM2_SC ISI_HAM3 ISI_HAM3_SC
ISI_ITMX_ST1 ISI_ITMX_ST1_BLND ISI_ITMX_ST1_SC ISI_ITMX_ST2 ISI_ITMX_ST2_BLND ISI_ITMX_ST2_SC
ISI_ITMY_ST1 ISI_ITMY_ST1_BLND ISI_ITMY_ST1_SC ISI_ITMY_ST2 ISI_ITMY_ST2_BLND ISI_ITMY_ST2_SC
SEI_BS SEI_ETMX SEI_ETMY SEI_HAM2 SEI_HAM3 SEI_ITMX SEI_ITMY
SUS_BS SUS_ETMX SUS_ETMY SUS_IM1 SUS_IM2 SUS_IM3 SUS_IM4 SUS_ITMX SUS_ITMY
SUS_MC1 SUS_MC2 SUS_MC3 SUS_PR2 SUS_PR3 SUS_PRM SUS_RM1 SUS_RM2 SUS_TMSX SUS_TMSY
jameson.rollins@opsws12:~ 0$
NOTE: Until the new system has been put fully into production, "guardctrl" interaction with these nodes on h1guardian1 is a bit different. To start/stop the nodes, or get status or view the logs, you will need to send the appropriate guardctrl command to guardian@h1guardian1 over ssh, e.g.:
jameson.rollins@opsws12:~ 0$ ssh guardian@h1guardian1 status SUS_BS
● guardian@SUS_BS.service - Advanced LIGO Guardian service: SUS_BS
Loaded: loaded (/usr/lib/systemd/user/guardian@.service; enabled; vendor preset: enabled)
Drop-In: /home/guardian/.config/systemd/user/guardian@.service.d
└─timeout.conf
Active: active (running) since Sun 2018-04-08 14:48:47 PDT; 1h 53min ago
Main PID: 24724 (guardian SUS_BS)
CGroup: /user.slice/user-1010.slice/user@1010.service/guardian.slice/guardian@SUS_BS.service
├─24724 guardian SUS_BS /opt/rtcds/userapps/release/sus/common/guardian/SUS_BS.py
└─24745 guardian-worker SUS_BS /opt/rtcds/userapps/release/sus/common/guardian/SUS_BS.py
Apr 08 14:48:50 h1guardian1 guardian[24724]: SUS_BS executing state: ALIGNED (100)
Apr 08 14:48:50 h1guardian1 guardian[24724]: SUS_BS [ALIGNED.enter]
Apr 08 16:01:45 h1guardian1 guardian[24724]: SUS_BS REQUEST: ALIGNED
Apr 08 16:01:45 h1guardian1 guardian[24724]: SUS_BS calculating path: ALIGNED->ALIGNED
Apr 08 16:01:45 h1guardian1 guardian[24724]: SUS_BS same state request redirect
Apr 08 16:01:45 h1guardian1 guardian[24724]: SUS_BS REDIRECT requested, timeout in 1.000 seconds
Apr 08 16:01:45 h1guardian1 guardian[24724]: SUS_BS REDIRECT caught
Apr 08 16:01:45 h1guardian1 guardian[24724]: SUS_BS [ALIGNED.redirect]
Apr 08 16:01:45 h1guardian1 guardian[24724]: SUS_BS executing state: ALIGNED (100)
Apr 08 16:01:45 h1guardian1 guardian[24724]: SUS_BS [ALIGNED.enter]
jameson.rollins@opsws12:~ 0$
A couple of the SEI systems did not come back up to the same states they were in before the move. This caused trips on ETMY HPI and ETMX ISI_ST1. I eventually recovered everything back to the states they were in at the beginning of the day.
The main problem I've been having is with the ISI_*_SC nodes. They are all supposed to be in the SC_OFF state, but a couple of the nodes are cycling between TURNING_OFF_SC and SC_OFF. For instance, ISI_ITMY_ST2_SC is showing the following:
2018-04-09_00:00:39.728236Z ISI_ITMY_ST2_SC new target: SC_OFF
2018-04-09_00:00:39.729272Z ISI_ITMY_ST2_SC executing state: TURNING_OFF_SC (-14)
2018-04-09_00:00:39.729667Z ISI_ITMY_ST2_SC [TURNING_OFF_SC.enter]
2018-04-09_00:00:39.730468Z ISI_ITMY_ST2_SC [TURNING_OFF_SC.main] timer['ramping gains'] = 5
2018-04-09_00:00:39.790070Z ISI_ITMY_ST2_SC [TURNING_OFF_SC.run] USERMSG 0: Waiting for gains to ramp
2018-04-09_00:00:44.730863Z ISI_ITMY_ST2_SC [TURNING_OFF_SC.run] timer['ramping gains'] done
2018-04-09_00:00:44.863962Z ISI_ITMY_ST2_SC EDGE: TURNING_OFF_SC->SC_OFF
2018-04-09_00:00:44.864457Z ISI_ITMY_ST2_SC calculating path: SC_OFF->SC_OFF
2018-04-09_00:00:44.865347Z ISI_ITMY_ST2_SC executing state: SC_OFF (10)
2018-04-09_00:00:44.865730Z ISI_ITMY_ST2_SC [SC_OFF.enter]
2018-04-09_00:00:44.866689Z ISI_ITMY_ST2_SC [SC_OFF.main] SENSCOR_Y_IIRHP FMs:[4] is not in the correct configuration
2018-04-09_00:00:44.866988Z ISI_ITMY_ST2_SC [SC_OFF.main] USERMSG 0: SENSCOR_Y_IIRHP FMs:[4] is not in the correct configuration
2018-04-09_00:00:44.927099Z ISI_ITMY_ST2_SC JUMP target: TURNING_OFF_SC
2018-04-09_00:00:44.927619Z ISI_ITMY_ST2_SC [SC_OFF.exit]
2018-04-09_00:00:44.989053Z ISI_ITMY_ST2_SC JUMP: SC_OFF->TURNING_OFF_SC
2018-04-09_00:00:44.989577Z ISI_ITMY_ST2_SC calculating path: TURNING_OFF_SC->SC_OFF
2018-04-09_00:00:44.989968Z ISI_ITMY_ST2_SC new target: SC_OFF
2018-04-09_00:00:44.991117Z ISI_ITMY_ST2_SC executing state: TURNING_OFF_SC (-14)
2018-04-09_00:00:44.991513Z ISI_ITMY_ST2_SC [TURNING_OFF_SC.enter]
2018-04-09_00:00:44.993546Z ISI_ITMY_ST2_SC [TURNING_OFF_SC.main] timer['ramping gains'] = 5
2018-04-09_00:00:45.053773Z ISI_ITMY_ST2_SC [TURNING_OFF_SC.run] USERMSG 0: Waiting for gains to ramp
Note that the problem seems to be that, once SC_OFF has been reached, the node fails a check that the SENSCOR filter banks are in the correct state. Here are the nodes that are having problems, and the messages they're throwing:
ISI_HAM2_SC     [SC_OFF.main] SENSCOR_GND_STS_Y_FIR FMs:[1] is not in the correct configuration
ISI_HAM3_SC     [SC_OFF.main] SENSCOR_GND_STS_Y_FIR FMs:[1] is not in the correct configuration
ISI_BS_ST2_SC   [SC_OFF.main] SENSCOR_Y_IIRHP FMs:[4] is not in the correct configuration
ISI_BS_ST1_SC   [SC_OFF.main] SENSCOR_GND_STS_Y_WNR FMs:[6] is not in the correct configuration
ISI_ITMX_ST2_SC [SC_OFF.main] SENSCOR_Y_IIRHP FMs:[4] is not in the correct configuration
ISI_ITMY_ST2_SC [SC_OFF.main] SENSCOR_Y_IIRHP FMs:[4] is not in the correct configuration
ISI_ETMY_ST1_SC [SC_OFF.main] SENSCOR_GND_STS_Y_WNR FMs:[6] is not in the correct configuration
ISI_ETMY_ST1_SC [SC_OFF.main] SENSCOR_GND_STS_Y_WNR FMs:[6] is not in the correct configuration
I've tried to track down where exactly the problem is coming from, but haven't been able to figure it out yet. It looks like the expected configuration just does not match how the filters are currently set. I will need to consult with the SEI folks tomorrow to sort this out. In the meantime, I'm leaving all of the above nodes paused.
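For context, the messages above appear to come from a check in SC_OFF.main that compares the engaged SENSCOR filter modules (FMs) against an expected configuration; when they don't match, the node jumps back to TURNING_OFF_SC, producing the cycling seen in the log. A minimal plain-Python sketch of that kind of check, with made-up expected values (this is not the actual SC node code):

# Hypothetical sketch of the kind of filter-module (FM) configuration check
# that SC_OFF.main appears to perform; bank names and expected FM numbers
# below are illustrative only, not the real SC node configuration.

EXPECTED_FMS = {
    'SENSCOR_Y_IIRHP': {4},
    'SENSCOR_GND_STS_Y_FIR': {1},
    'SENSCOR_GND_STS_Y_WNR': {6},
}

def check_sc_off_config(engaged):
    """Return messages for banks whose engaged FMs differ from the expected set.

    engaged -- dict mapping bank name -> set of currently engaged FM numbers
               (in the real node this would come from EPICS readbacks).
    """
    msgs = []
    for bank, expected in EXPECTED_FMS.items():
        if engaged.get(bank, set()) != expected:
            msgs.append('%s FMs:%s is not in the correct configuration'
                        % (bank, sorted(expected)))
    return msgs

# Example: FM6 engaged on SENSCOR_Y_IIRHP instead of FM4 produces a message;
# in the node, that message is followed by a jump back to TURNING_OFF_SC.
print(check_sc_off_config({'SENSCOR_Y_IIRHP': {6},
                           'SENSCOR_GND_STS_Y_FIR': {1},
                           'SENSCOR_GND_STS_Y_WNR': {6}}))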
The "sei_config" guardian nodes (ISI_*_{BLND,SC}) were showing import errors having to do with not finding the module "SEI_CONFIG". It looks like this module was renamed to "sei_config" on March 26, but the nodes that import it were not updated to import the module under the new name. I've updated them appropriately, and committed to the SVN:
jameson.rollins@opsws12:/opt/rtcds/userapps/release/isi/h1/guardian 0$ svn status
M       ISI_BS_ST1_BLND.py
M       ISI_BS_ST1_SC.py
M       ISI_BS_ST2_BLND.py
M       ISI_BS_ST2_SC.py
M       ISI_ETMX_ST1_BLND.py
M       ISI_ETMX_ST1_SC.py
M       ISI_ETMX_ST2_BLND.py
M       ISI_ETMX_ST2_SC.py
M       ISI_ETMY_ST1_BLND.py
M       ISI_ETMY_ST1_SC.py
M       ISI_ETMY_ST2_BLND.py
M       ISI_ETMY_ST2_SC.py
M       ISI_HAM2_SC.py
M       ISI_HAM3_SC.py
M       ISI_HAM4_SC.py
M       ISI_HAM5_SC.py
M       ISI_HAM6_SC.py
M       ISI_ITMX_ST1_BLND.py
M       ISI_ITMX_ST1_SC.py
M       ISI_ITMX_ST2_BLND.py
M       ISI_ITMX_ST2_SC.py
M       ISI_ITMY_ST1_BLND.py
M       ISI_ITMY_ST1_SC.py
M       ISI_ITMY_ST2_BLND.py
M       ISI_ITMY_ST2_SC.py
jameson.rollins@opsws12:/opt/rtcds/userapps/release/isi/h1/guardian 0$ svn commit
Sending        ISI_BS_ST1_BLND.py
Sending        ISI_BS_ST1_SC.py
Sending        ISI_BS_ST2_BLND.py
Sending        ISI_BS_ST2_SC.py
Sending        ISI_ETMX_ST1_BLND.py
Sending        ISI_ETMX_ST1_SC.py
Sending        ISI_ETMX_ST2_BLND.py
Sending        ISI_ETMX_ST2_SC.py
Sending        ISI_ETMY_ST1_BLND.py
Sending        ISI_ETMY_ST1_SC.py
Sending        ISI_ETMY_ST2_BLND.py
Sending        ISI_ETMY_ST2_SC.py
Sending        ISI_HAM2_SC.py
Sending        ISI_HAM3_SC.py
Sending        ISI_HAM4_SC.py
Sending        ISI_HAM5_SC.py
Sending        ISI_HAM6_SC.py
Sending        ISI_ITMX_ST1_BLND.py
Sending        ISI_ITMX_ST1_SC.py
Sending        ISI_ITMX_ST2_BLND.py
Sending        ISI_ITMX_ST2_SC.py
Sending        ISI_ITMY_ST1_BLND.py
Sending        ISI_ITMY_ST1_SC.py
Sending        ISI_ITMY_ST2_BLND.py
Sending        ISI_ITMY_ST2_SC.py
Transmitting file data .........................
Committed revision 17121.
jameson.rollins@opsws12:/opt/rtcds/userapps/release/isi/h1/guardian 0$
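The fix in each of those files amounts to importing the configuration module under its new lowercase name; schematically (the exact import statements in the node files may differ):

# Schematic only; the actual import lines in the node files may differ.
# Before the March 26 rename (now fails with an ImportError):
#   import SEI_CONFIG
# After updating to the new module name:
import sei_config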
A couple of other notes about the ISI_*_BLND and ISI_*_SC nodes:
As of 03:00 UTC, rough pumping was stopped; pumping will restart on Monday.
We started roughing out the vertex today and will resume on Monday. We might reach hard vacuum on Tuesday, at which point HAM6 in-chamber work and viewport work can resume.
Pumped on the BSC10 annulus all day at around 2-3e-6 Torr. Powered on the AIP, but it never fell below 10 LED lights; it looks like we need to replace that pump. Left the AIP valved out and ON, and the turbo cart valved out and ON, for the weekend.
Yesterday we suspected a large air leak at EY when the pressure wouldn't drop below the e-4 Torr range with the main turbo. Before pump down, we had valved the new IP11+baffle and the new HV NEG+housing in to the main volume in order to pump them out without needing separate pump carts. We valved the IP+baffle out last night, and the main volume pressure quickly fell to normal levels in the e-6 Torr range.
Today, after finally bagging it, we found a large air leak in the chevron baffle nipple/housing measuring 5e-1 mbar-L/s. We also found a gappy mini conflat flange on one of the HV feedthroughs that appeared to be the culprit until the bag test on the nipple. I was able to tighten the mini conflat metal to metal. It may have a small leak, but with the other large leak present it's too hard to tell until we separate them.
On Monday we will remove the IP and baffle and investigate. We are hoping the baffle nipple leak is from a gasket and not a cracked weld. The new IP isolation valve allows us to vent one side up to air without interfering with the main volume pump down.
Before leak checking, I assumed the gate of the all-metal 1-1/2" right-angle valve on top of the IP was leaking, because it seemed bottomed out (it seals via a knife edge) and the other side was up to air and closed off with a KF40. I replaced it with a new "easy to close" all-metal valve.
Note that I tested the newly rebuilt IP11 prior to installation next to the beam tube. It shipped under vacuum with a pinch tube and 16.5" blank, so I connected the HV cables and turned it ON without supplemental pumping. It behaved normally. If the mini conflat on the feedthrough leaks, the leak is likely small, but we will remeasure since I torqued the gappy flange.
Transitioned the turbo backing pump from the QDP80 to a scroll pump this evening, and purged and shut down the QDP80. The water lines are still partially valved in to continue cooling the pump, and there looks to be a very small water leak from the booster pump. The purge air skid is still on; we should be able to turn it off next week. The EY main air compressors are also running for the turbo safety valve.
A solution to mode match the output of the 70 W power amplifier to the pre-modecleaner was found that did not mechanically clash too badly with already installed components. The lenses were positioned and the initial pre-modecleaner visibility was ~60%. After adjusting the lenses, the visibility improved to ~88%.

Upon increasing the power incident on the pre-modecleaner, the visibility dropped to ~78%. Some tweaking of the lens positions under full power will be required, which is not surprising. Attached are the two transfer functions measured at low and high power. Both have a unity gain frequency of ~5.5 kHz. At the moment ~56 W is transmitted through the pre-modecleaner.

Transfer functions of the frequency servo were also taken. We were able to push the unity gain frequency out to ~540 kHz with a phase margin of ~60 degrees.
Jason / Peter
TITLE: 04/06 Day Shift: 15:00-23:00 UTC (08:00-16:00 PDT), all times posted in UTC
STATE of H1: Planned Engineering
INCOMING OPERATOR: None
LOG:
13:30 (6:30) Peter to PSL enclosure
14:27 (7:27) Tyler, Mark, Chris to LVEA -- HAM5 door attachment
15:00 (8:00) Start of shift
15:57 (8:57) Fil, Ed to EY -- cable clean up
16:00 (9:00) Jason to PSL enclosure -- 70W work.
16:17 (9:17) Gerardo to HAM5 -- pump down
16:18 (9:18) Terry, Sheila to LVEA
16:23 (9:23) Chandra to EY -- vacuum work
16:26 (9:26) Terry, Sheila out of LVEA
16:27 (9:27) Tyler, Mark, Chris out of LVEA -- doors on
16:28 (9:28) Tyler, Mark, Chris to LVEA -- crane forklift
16:33 (9:33) Richard to LVEA
16:40 (9:40) Richard out of LVEA
17:13 (10:13) Gerardo out of LVEA
17:19 (10:19) Tyler, Mark, Chris out of LVEA
17:28 (10:28) Bubba, guest to LVEA
17:52 (10:52) Gerardo to LVEA -- start roughing vertex
18:07 (11:07) Peter out of PSL enclosure
18:09 (11:09) Jason out of PSL enclosure
18:14 (11:14) Bubba, guest out of LVEA
18:14 (11:14) Bubba, guest to EX
18:46 (11:46) Fil, Ed back from EY
19:30 (12:30) Peter to PSL enclosure
19:35 (12:35) Jason to PSL enclosure -- 70W work
19:58 (12:58) Sebastien to PSL enclosure
20:13 (13:13) TVo, Georgia, Dan to SQZ Bay
21:17 (14:17) Terra to SQZ Bay -- talk to TVo
21:23 (14:23) Terra, TVo out of SQZ Bay
21:50 (14:50) Gerardo to HAM5 -- check on cart
22:00 (15:00) Gerardo back from HAM5
22:00 (15:00) Gerardo to EY -- assist Chandra with vacuum work
22:20 (15:20) Peter, Jason, Sebastien out of PSL enclosure
22:30 (15:30) Georgia, Sebastien to SQZ Bay
23:00 (16:00) End of shift
Before starting to rough out the vertex, the annulus system for HAM5 was prepped for pump down by inserting two stoppers into the HAM6 annulus system at both door flanges, north and south. The HAM5 annulus system was pumped down with an aux cart, and after 2.5 hours of pumping the annulus pressure reached 6.1e-05 Torr; no leak checking was performed. The aux cart continues to pump down on the system.
After dealing with the HAM5 annulus, the purge air blowdown dew point was measured at -32.1 °C. We then proceeded to equalize the entire system to atmosphere.
Following procedure, rough down was initiated at the vertex at 12:21 local time.
Per WP#7461
As requested by Chandra, the PT246 cell phone alarm limit has been increased from 5.0e-09 to 1.0e-08 Torr to prevent false-positive alarms this weekend. I took the opportunity to change Chandra's and Gerardo's contact email addresses to their new Caltech ones.
P.S. We are bypassing the INTERLOCK cell phone calls until Monday.
FYI -
We've updated several files in the {userapps}/trunk/isi/ directories. These can be installed on site when it is convenient.
These updates add a feature to the ISI-Watchdog so that when the watchdog is tripped it 'cushions' the isolation loop turn-off to reduce the impulse sent to the Suspensions.
Details for the installation are in the SEI log 1323.
A description of the new parts can be found in T1800031, "Smooth Ramp for Isolation Turnoff" by Dane Stocks.
The ECR is E1800026 and the Integration Issue is 9889.
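As a rough illustration of the "cushioned" turn-off idea only (this is not the code from T1800031; the interface and ramp parameters below are made up), a trip handler might step the isolation loop gain down to zero over a fixed interval rather than cutting it in one step:

import time

def cushioned_turnoff(set_loop_gain, ramp_time=2.0, steps=20, start_gain=1.0):
    """Ramp an isolation loop gain smoothly to zero after a watchdog trip.

    set_loop_gain -- callable that writes the loop gain (hypothetical interface)
    ramp_time     -- total ramp duration in seconds (illustrative value)
    steps         -- number of discrete gain steps in the ramp
    """
    for i in range(steps - 1, -1, -1):
        set_loop_gain(start_gain * i / steps)  # step the gain down toward zero
        time.sleep(ramp_time / steps)          # spread the steps over the ramp

# Example: print the ramp instead of writing to a real control system.
cushioned_turnoff(lambda g: print('gain = %.2f' % g), ramp_time=0.1)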
Nutsinee Kijbinchoo, Terry McRae, Jeff Kissel, Keita Kawabe, Daniel Sigg, TJ Shafer, Sheila,
yak
Here are some more photos that may be of interest:
The rest are just random HAM5 work photos. ResourceSpace is broken for me.
Dan B, Terra H., Alexei C., Seb N, TVo
We attempted to characterize the SR3 RoC heater's mode-matching range by mode scanning the OMC. However, we had trouble getting the OMC alignment loops to close. Even though the alignment on the QPDs using OM1 and OM2 was relatively close to the loop offsets to start, the OMC SUS would rail and push the beam off of the QPDs. We've had this problem before, but we couldn't figure out what the issue was this time around. Although we got flashes in the OMC, the alignment was not good enough to get a proper measurement, so we'll have to try again some other time.
FYI: For the upcoming vent, we've turned off the SR3 driver and left the IMC locked with 2.5 W input. Also, a lot of the digital cameras weren't loading for some reason, even after requesting a reboot.
It looks like the fast shutter closed at around 5:30 UTC last night, which would make it impossible to close OMC QPD or AS centering loops.
I agree, but there was still light on the QPDs even after the fast shutter said it was "closed", which makes me think it was actually still open. This seems very odd to me. Keita's alog here shows some step-by-step instructions on how to use the shutter, but I still don't fully understand the logic.
A note on the SC nodes:
Since these new SC nodes are still in a bit of a testing phase, I don't think all of the filters that will be used are in the configuration file. One way we could get around this, until the config file is set up exactly how SEI wants it, is to remove the check temporarily. I'm hesitant to remove it entirely, but that might be best, since as things stand the check doesn't allow for any testing of new filters. A sketch of what a temporary bypass could look like follows below.
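One hedged sketch of "removing the check temporarily" (the flag and function names here are made up, not the actual node code) would be to guard the existing configuration check behind a module-level switch so it is easy to re-enable later:

# Hypothetical module-level switch; set to False while new filters are being
# tested, and back to True once the configuration file matches what SEI wants.
ENFORCE_SC_FILTER_CHECK = False

def sc_off_config_ok(check_config):
    """Run the existing configuration check only when enforcement is enabled.

    check_config -- callable returning True if the SENSCOR banks match the
                    configuration file (stands in for the existing check).
    """
    if not ENFORCE_SC_FILTER_CHECK:
        return True  # temporarily skip the check
    return check_config()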
As of 7:50 am this morning (after I restarted 5 nodes last night):
Including the five nodes I restarted last night, that's 23 segfaults out of 68 nodes in roughly 18 hours, i.e. a mean time to failure of about 6 hours. That is a much higher failure rate than we saw previously. I'm reverting all nodes back to h1guardian0.
All nodes have been reverted back to h1guardian0.