Back to Observing Mode @ 0:42 UTC.
Richard, Gerardo, Jim, Dave:
While the IFO was in commissioning we went to Mid-Y to investigate the h1pemmy failure which happened at 03:53 PDT this morning. We found that the IO Chassis was not powered up, but the +24V power supply was running. The fuse for the +24V line in the fuse units at the top of the DC power supply rack had blown. We replaced the fuse, powered up the IO Chassis, and noted the DC power supply was drawing a steady 3.5A as expected. It was then we noticed the AA-Chassis was not on. It has a fuse on the +18V line which had also blown. Putting a new fuse in caused a hot electrical odor from the AA-Chassis, so the fuse was removed to de-energize it.
Initially we left all the systems down (computer and IO Chassis) and returned to the corner station. But with h1pemmy not running, the DAQ EDCU and CONLOG were red on the overview screen. We remotely powered up the h1pemmy computer, which then started the h1ioppemmy and h1pemmy models. With the EPICS IOCs running, this has appeased EDCU and CONLOG. For now the two models have FEC and DAQ errors as they are not actually running; only the EPICS IOC is running. We will remain in this state until the AA-Chassis is running again.
A trend of a seismic signal and the front end status shows they both went down at 03:53 PDT Thu 01 Oct 2015. We presume the failure of the AA-Chassis and the overcurrent on the +18V line glitched the +24V line, which in turn blew the fuse.
Activity Log: All Times in UTC (PT)
15:00 (08:00) Take over from TJ
15:05 (08:05) GRB Alert – Switch to Observing mode
16:33 (09:33) Jodi – Delivering storage items to mechanical building
16:59 (09:59) Jodi – Finished with delivery
18:31 (11:31) GRB Alert – In Observing mode before notice
20:47 (13:47) Switch to Commissioning mode – Commissioning work while LLO is down
21:26 (14:26) Kyle & Gerardo – Going to Mid-Y to start rotating pump and leak detector
21:35 (14:35) Dave & Jim – Going to Mid-Y to check on PEM problem
22:31 (15:31) Filiberto – Going to Mid-Y to work on bad AA chassis
22:45 (15:45) Lockloss – Commissioning activities
23:00 (16:00) Turn over to Travis
End of Shift Summary:
Title: 10/01/2015, Day Shift 15:00 – 23:00 (08:00 – 16:00) All times in UTC (PT)
Support: Sheila, Mike
Incoming Operator: Travis
Shift Summary:
- 15:00 IFO locked. Intent Bit = Commissioning Mode. Wind is calm, no seismic activity. All appears normal. Sheila doing some testing while LLO is relocking.
- 15:05 (08:05) GRB Alert. Switch to Observing mode
- 18:31 (11:31) GRB Alert. In Observing mode
- 23:00 (16:00) Relocking
Just in case you're wondering why LHO sees two noise bumps at 315 and 350Hz (attached, middle blue) but not at LLO, we don't fully understand either but here is the summary.
There are three things here: environmental noise level, PZT servo, and jitter coupling to DARM. Even though the former two explain part of the LLO-LHO difference, they cannot explain all of it, and the coupling at LHO seems to be larger.
Reducing the PSL chiller flow will help but that's not a solution for the future.
Reimplementing PZT servo at LHO will help and this should be done. Squashing it all will be hard, though, as we are talking about the jitter between 300 and 370Hz and there's a resonance at 620Hz.
Reducing coupling is one area that was not well explored. Past attempts at LHO were on top of dubious IMC WFS quadrant gain imbalances.
1. Environmental difference
These bumps are supposed to be from the beam jitter caused by PSL periscope resonances (not from the PZT mirror resonances). In the attached you can see that the bumps in H1 (middle blue) correspond to the bumps in PSL periscope accelerometer (top blue). (Don't worry, we figured out which server we need to use for DTT to give us correct results.)
Because of the PSL chiller flow difference between LLO and LHO (LHO alog; couldn't find the LLO alog but we have MattH's word), the LLO periscope noise level is in general lower than LHO's. However, the difference in the accelerometer signal is not enough to explain the difference in the IFO.
For example, at 350Hz the LHO PSL periscope is only a factor of 2 noisier than LLO's, and at 330Hz LHO is quieter than LLO by more than a factor of 2. Yet at LHO there is a huge hump in DARM which grows and shrinks but never goes away, while LLO DARM is dead flat.
At LLO they do have a servo to suppress noise at about 300Hz, but it shouldn't be doing much, if anything, at 350Hz (see the next section).
So yes, it seems like environmental difference is one of the reasons why we have larger noise.
But the jitter to DARM coupling itself seems to be larger.
Turning down the chiller flow will help but that's not a solution for the future.
2. Servo difference
At LLO there's a servo to squash beam jitter in PIT at 300Hz. LHO used to have it but now it is disabled.
At LLO, the IOOWFS_A_I_PIT signal is used to suppress PIT jitter, targeting the 300Hz peak which sits right on some mechanical resonance/notch structure in PZT PIT (which LHO also has), and the servo reduced the noise between about 270 and about 320Hz (LLO alog 19310).
The same servo was successfully copied to LHO with some modification, also targeting the 300Hz bump (except that YAW was more coherent than PIT, so we used the YAW signal), with somewhat less (but not much less) aggressive gain and bandwidth. At that time the 300Hz bump was problematic together with the 250Hz and 350Hz bumps. Look at the plots from alogs 20059 and 20093.
Somehow 250Hz and 300Hz subsided, and now LHO is suffering from 315Hz and 350Hz bumps (compare the attached with the above mentioned alog). Since we never had time to tune the servo filter to target either of the new bumps, and since turning the servo on without modification is going to make marginal improvement at 300Hz and will make 250Hz/350Hz somewhat worse due to gain peaking, it was disabled.
Reimplementing the servo to target 315 and 350Hz bumps will help. But it's not going to be easy to make this servo wide band enough to squash everything because of 620Hz resonance, which is probably something in the PZT mirror itself (look at the above mentioned alog 20059 for open loop transfer function of the current servo, for example). In principle we can go even wider band, but we'll need more than 2kHz sampling rate for that. We could stiffen the mount if 620Hz is indeed the mount.
3. Coupling difference
As I wrote in the environment difference, from the accelerometer data and IFO signal, it seems as if the coupling is larger at LHO.
There are many jitter coupling measurements at LHO but the best one to look at is this one. We should be able to make a direct comparison with LLO but I haven't looked.
Anyway, it is known that the coupling depends on IMC alignment and OMC alignment (and probably the IFO alignment).
At LHO, IMC WFS has offsets in PIT and YAW in an attempt to minimize the coupling. This is on top of dubious imbalances in IMC WFS quadrant gains at LHO (see alog 20065: the minimum quadrant gain is a factor of 16 smaller than the maximum). We should fix that before spending much time on studying the jitter coupling via alignment.
At LLO, there's no such imbalance and there's no such offset.
The coupling of these peaks into DARM appears to pass through a null near the beginning of each full-power lock stretch, perhaps indicating that this coupling can be suppressed through TCS heating.
Already from the summary pages one can see that at the beginning of each lock, these peaks are present in DARM, then they go away for about 20 minutes, and then they come back for the duration of the lock.
I looked at the coherence (both magnitude and phase) between DARM and the IMC WFS error signals at three different times during a lock stretch beginning on 2015-09-29 06:00:00 Z. Blue shows the signals 10 minutes before the sign flip, orange shows the signals near the null, and purple shows the signals 20 minutes after the sign flip.
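For reference, here is a minimal sketch (not the actual script used for this analysis) of this kind of coherence and cross-spectrum phase calculation with scipy, assuming the DARM and IMC WFS time series have already been fetched as numpy arrays at a common, assumed sample rate:

# Hypothetical sketch: magnitude-squared coherence and cross-spectrum phase
# between DARM and an IMC WFS error signal.  'darm' and 'wfs' are assumed to be
# numpy arrays covering the same stretch of data, both sampled at fs Hz.
import numpy as np
from scipy.signal import coherence, csd

fs = 16384.0           # assumed common sample rate
nperseg = int(8 * fs)  # 8 s segments -> 0.125 Hz resolution around 300-350 Hz

def coherence_and_phase(darm, wfs):
    f, cxy = coherence(darm, wfs, fs=fs, nperseg=nperseg)  # coherence magnitude
    _, pxy = csd(darm, wfs, fs=fs, nperseg=nperseg)        # cross spectral density
    phase = np.degrees(np.angle(pxy))                      # relative phase [deg]
    return f, cxy, phase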
One can also see that the peaks in the immediate vicinity of 300 Hz decay monotonically from the beginning of the lock stretch onward; my guess is that these are generated by some interaction with the beamsplitter violin mode and have nothing to do with jitter.
Addendum:
alog 20051 shows the PZT to IMCWFS transfer function (without servo) for PIT and YAW. Easier to see which resonance is on which DOF.
Chris Biwer and other members of the hardware injections team will likely be doing coherent hardware injections in the near future, and these will hopefully be detected successfully by one or more of the low-latency data analysis pipelines. Currently, we are still testing the EM follow-up infrastructure, so the "Approval Processor" software is configured to treat hardware injections like regular triggers. Therefore, these significant GW "event candidates" should cause audible alarms to sound in each control room, similar to a GRB alarm. The operator at each site will be asked to "sign off" by going to the GraceDB page for the trigger and answering the question, "At the time of the event, was the operating status of the detector basically okay, or not?" You can also enter a comment.
For the purpose of these tests, if you are the operator on shift, please:
* Do not disqualify the trigger based on it being a hardware injection -- we know it is! So, please sign off with "OKAY" if the detector was otherwise operating OK.
* Pay attention to whether the audible alarm sounded. In the past we had issues at one site or the other, so this is one of the things we want to test.
* Feel free to enter a comment on the GraceDB page when you sign off, like maybe "this was a hardware injection and the audible alarm sounded".
* You may get a phone call from a "follow-up advocate" who is on shift to remotely help check the trigger.
Note: in the future, once the EM follow-up project is "live", a known hardware injection will not cause the control-room alarms to sound (unless it is a blind injection). You should not write anything in the alog about alarms from GW event candidates, because that is potentially sensitive information and the alogs are publicly readable.
IFO has been locked at NOMINAL_LOW_NOISE, 23.0W, 72Mpc for the past 5 hours. Wind and seismic activity are low. 4 ETM-Y saturation alarms. Received GRB alert at 18:31 UTC (11:31 PT) - LHO was in Observing mode during this event.
The attached plot shows the 2 day trend of the RF45 glitches. There were no glitches in the past day. The large glitches 24 hours ago were us. This is not inconsistent with a cable or connection problem. No one should be surprised, if the problem reappears.
Title: 10/01/2015, Day Shift 15:00 – 23:00 (08:00 – 16:00) All times in UTC (PT)
State of H1: At 15:00 (08:00) Locked at NOMINAL_LOW_NOISE, 23.0W, 72Mpc
Outgoing Operator: TJ
Quick Summary: Wind is calm, no seismic activity. All appears normal. Intent Bit at Commissioning while LLO was recovering from a lockloss.
Title: 10/1 OWL Shift: 7:00-15:00UTC (00:00-8:00PDT), all times posted in UTC
State of H1: Locked but not Observing for inj
Shift Summary: I had one lockloss, but it came back up with relative ease. The RF Noise wasn't bothering me.
Incoming Operator: Jeff B
Activity Log:
Relocked @ 14:38
Sheila wants to do a quick injection while LLO is down.
The excitation ended just before we got the GRB alert, but I was making an excitation at the time of the GRB itself (LLO was not in observing, so we were taking advantage of some single-IFO time to investigate noise at 78 Hz in DARM that may come from EX).
When we heard the alert I stopped the DTT session and Jeff B went to observing, but even while we were out of observing there were stretches with no excitations running. GraceDB lists 1127747079.41 as the event time for the first GRB alert, and unfortunately my excitation was running at that time. My last excitation was ramping down by 1127747090, as shown in the first attached dataviewer screenshot, where the GRB time is approximately in the middle of the plot, so I was exciting the ETMX ISI at the time of the event.
The two channels that I was putting excitations on were H1:ISI-ETMX_ST2_ISO_X_EXC and H1:ISI-ETMX_ST2_ISO_Y_EXC. These were white noise excitations that produced ISI motions of 0.1 nm/rtHz at 20 Hz, with an amplitude that slowly drops off as the frequency increases up to 100 Hz (0.02 nm/rtHz). The excitation was bandpassed from 20 Hz to 200 Hz. They produced no features in the DARM spectrum, although they were intended to excite the peaks at 78-80 Hz.
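As an illustration only (the actual excitations were injected with the standard awg tools), a band-limited white-noise stream like the one described above could be generated along these lines; the sample rate and filter order are assumptions:

# Hypothetical sketch: white noise bandpassed to 20-200 Hz, roughly like the
# ISI excitation described above.  Amplitude scaling to nm/rtHz is omitted.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 4096          # assumed excitation sample rate [Hz]
duration = 60      # seconds of noise to generate
rng = np.random.default_rng(0)

white = rng.standard_normal(int(fs * duration))
sos = butter(4, [20.0, 200.0], btype='bandpass', fs=fs, output='sos')
excitation = sosfiltfilt(sos, white)   # noise restricted to the 20-200 Hz band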
Lockloss @ 13:59 UTC
ITMX saturation, and it tripped SUS OMC WD. No obvious reason for lockloss.
H1IOPPEMMY and H1PEMMY both started to report errors for FE, ADC, and DDC on the CDS Overview around 12:55 UTC. There was also a red indicator around TDS, so I checked out the timing screen; there seems to be a problem with Port 13 ("Invalid or no data").
Since this is only PEM at MidY, I have NOT taken us out of Observing.
The I/O chassis is no longer visible to the computer h1pemmy. This is not critical to the operation of the interferometer. This can wait until Tuesday to fix unless someone desperately needs PEM data from MY.
Humming along @ 75Mpc. Have had a handful of glitches during my shift, but the RF noise seems to be in control for now.
I'm posting an early version of this alog so that people can see it, but plan to edit again with the results of the second test.
Yesterday I took a few minutes to follow up on the measurements in alog 21869. This time, in addition to driving TMS, I drove the ISI in the beam direction to reproduce the motion caused by the backreaction to TMS motion. We also briefly had a chance to move the TMSX angle while exciting L.
The main conclusions are:
Comparison of ISI drive to TMS drive for X and Y
The attached screenshot shows the main results of the first test (driving ISIs and TMSs). In the top right plot you can see that I got the same amount of ISI motion for 3 cases (driving ETMX ISI, TMSX, ETMY ISI), and that driving TMSY with the same amplitude as TMSX resulted in a 50% smaller motion of the ISI. Shaking the TMS in the L direction induces a larger motion, as measured by the GS13s, in the direction perpendicular to the beam than in the beam direction, which was not what I expected. I chose the drive strength to get the same motion in the beam direction, so I have not reproduced the largest motion of the ISI with this test. If there is a chance, it would be interesting to repeat this measurement reproducing the backreaction in the direction perpendicular to the beam.
The middle panels of the first screenshot show the motion measured by OSEMs. The TMS OSEMs see about a factor of 10 more motion when the TMS is driven than when the ISI is driven. The signal is also visible in the quad top mass OSEMs, but not lower down the chain. For the X end, the longitudinal motion seen by the top mass is about a factor of 2 higher when the TMS is excited than when the ISI is excited (middle left panel), which could be because I have not reproduced the full backreaction of the ISI to the TMS motion. However, it is strange that for ETMY the top mass OSEM signal produced by driving TMS is almost 2 orders of magnitude larger than the motion produced by moving the ISI. It seems more likely that this is a problem of cross coupling between the OSEMs than real mechanical coupling. The ETMY top mass OSEMs are noisier than ETMX's, as Andy Lundgren pointed out (20675). It would be interesting to see a transfer function between TMS and the quad top mass to see if this is real mechanical coupling or just cross talk, as sketched below.
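A minimal sketch of that suggested transfer function estimate (the standard H1 estimator, Pxy/Pxx), assuming the TMS drive/OSEM and quad top-mass OSEM time series are available as numpy arrays; the sample rate and segment length are assumptions:

# Hypothetical sketch: estimate the TMS -> quad top-mass transfer function as
# H(f) = Pxy(f)/Pxx(f), with x = TMS signal and y = quad top-mass OSEM signal.
import numpy as np
from scipy.signal import csd, welch

fs = 256.0               # assumed OSEM channel sample rate [Hz]
nperseg = int(64 * fs)   # long segments for reasonable frequency resolution

def transfer_function(x, y):
    f, pxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, pxx = welch(x, fs=fs, nperseg=nperseg)
    return f, pxy / pxx   # complex TF estimate; compare magnitude around 78 Hz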
In the bottom left panel of the first screenshot, you can compare the TMS QPD B yaw signals. The TMS drive produces larger QPD signals than the ISI drive, as you would expect for both end stations. My first guess would be that driving the ISI in the beam direction could cause TMS pitch, but shouldn't cause as much yaw motion of the TMS. However, we see the ETMX ISI drive in the yaw QPDs, but not pitch. The Y ISI drive does not show up in the QPDs at all.
Lastly, the first plot in the first screenshot shows that the level of noise in DARM produced by driving the ETMX ISI is nearly the same as what is produced by driving TMSX. Since the TMS motion (seen by TMS osems) is about ten times higher when driving TMS, we can conclude that this coupling is not through TMS motion but the motion of something else that is attached to the ISI. Driving ETMY ISI produces nothing in DARM but driving TMSY produces a narrow peak in DARM.
For future reference:
I drove ETMX-ISI_ST2_ISO_X_EXC with an amplitude of 0.0283 cnts at 75 Hz from 20:07:47 to 20:10:00UTC sept 29th
I drove 2000 cnts in TMSX test L from 20:10:30 to 20:13:30UTC
I drove ETMY-ISI_ST2_ISO_Y_EXC with an amplitude of 0.0612 cnts at 75 Hz from 20:13:40 to 20:16:30UTC
I drove 2000 cnts in TMSY test L from 20:17:10 to 20:20:10UTC
Driving TMSX L while rastering TMS position and angle
I put a 2000 cnt drive on TMSX L from about 2:09 UTC September 30th to 2:37, when I broke the lock. We found a ghost beam that hits QPD B when TMS is misaligned by 100 urad in the positive pitch direction. There is about 0.5% as much power in this beam as in the main beam (not accounting for the dark offset). I got another chance to do this this afternoon, and was able to move the beam completely off of the QPDs, which did not make the noise coupling go away or reduce it much. We can conclude, then, that scatter off of the QPDs is not the main problem. There were changes in the shape of the peak in DARM as TMS moved, and changes in the noise at 78 Hz (which is normally non-stationary). Plots will be added tomorrow.
Speculation
There is a feature in the ETMX top mass OSEMs (especially P and T) around 78 Hz that is vaguely in the right place to be related to the excess noise in the QPDs and DARM. Also, Jeff showed us some B&K measurements from Arnaud (7762) that might hint at a quad cage resonance at around 78 Hz, although the measured Q looks a little low to explain the spectrum of the TMSX QPDs or the feature in DARM. One could speculate that the motion driving the noise at 78Hz is the quad cage resonance, but this is not very solid. Robert and Anamaria have data from their PEM injections that might be able to shed some light on this.
The units in the attached plots are wrong; the GS13s are calibrated in nm, not meters.
This morning I got the chance to do some white noise excitations on the ETMX ISI, in the X and Y directions. The attached screenshot shows the result, which is that for ISI motion a factor of 10-100 above the normal level, over a wide range of frequencies, no noise shows up in DARM. So the normal level of ISI motion in the X and Y directions is not driving the noise in DARM at 78 Hz. We could do the same test for the other ISI DOFs to eliminate them as well.
C. Biwer, J. Kissel. Taking advantage of single IFO time to run PCAL vs DARM hardware injections. More details later.
PCAL Injection tests complete. PCAL X has been restored to nominal configuration.
Injection / Approx. End Time (GPS):
DARM 1 – 1127683335
PCAL 1 – 1127683906
PCAL 2 – 1127684171
PCAL 3 – 1127684465
DARM 2 – 1127684766
DARM 3 – 1127685143
More details and analysis to come. These were run from the hwinjection machine as hinj.
Usual DARM command:
awgstream H1:CAL-INJ_TRANSIENT_EXC 16384 coherenttest1from15hz_1126257408.out 1.0 -d -d
PCAL command:
awgstream H1:CAL-PCALX_SWEPT_SINE_EXC 16384 coherenttest1from15hz_1126257408.out 1.0 -d -d
We turned OFF the 3 [kHz] PCAL line during the excitation. We're holding off on observation mode to confer about other single-IFO tests we can do while L1 is down.
I've attached omega scans of the PCAL and DARM injections. All injections used the 15Hz template from aLog 21838.
The SNRs of the Pcal injections seem a bit lower than intended. Omega reports SNR 10.5 for the injection through the normal path, which is about right. But for the Pcal injections, the SNRs are 5.5, 7.6, and 7.2. Note that these are the SNRs in CAL-DELTAL; someone should check in GDS strain as well. Links to scans below: Standard path, Pcal 1, Pcal 2, Pcal 3.
*** Cross-reference: See alog 22124 for summary and analysis
When DTT gets data from NDS2, it apparently gets the wrong sample rate if the sample rate has changed. The plot shows the result. Notice that the 60 Hz magnetic peak appears at 30 Hz in the NDS2 data displayed with DTT. This is because the sample rate was changed from 4 to 8k last February. Keita pointed out discrepancies between his periscope data and Peter F's. The plot shows that the periscope signal, whose rate was also changed, has the same problem, which may explain the discrepancy if one person was looking at NDS and the other at NDS2. The plot shows data from the CIT NDS2. Anamaria tried this comparison for the LLO data and the LLO NDS2 and found the same type of problem. But the LHO NDS2 just crashes with a Test timed-out message.
Robert, Anamaria, Dave, Jonathan
It can be a factor of 8 (or 2 or 4 or 16) using DTT with NDS2 (Robert, Keita)
In the attached, the top panel shows the LLO PEM channel pulled off of the CIT NDS2 server, and the bottom shows the same channel from the LLO NDS2 server, both from the exact same time. The LLO server result happens to be correct, but the frequency axis of the CIT result is a factor of 8 too small while its Y axis is a factor of sqrt(8) too large.
Jonathan explained this to me:
keita.kawabe@opsws7:~ 0$ nds_query -l -n nds.ligo.caltech.edu L1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ
Number of channels received = 2
Channel Rate chan_type
L1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 2048 raw real_4
L1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 16384 raw real_4
keita.kawabe@opsws7:~ 0$ nds_query -l -n nds.ligo-la.caltech.edu L1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ
Number of channels received = 3
Channel Rate chan_type
L1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 16384 online real_4
L1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 2048 raw real_4
L1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 16384 raw real_4
As you can see, both at CIT and LLO the raw channel sampling rate was changed from 2048Hz to 16384Hz, and raw is the only thing available at CIT. However, at LLO, there's also "online" channel type available at 16k, which is listed prior to "raw".
Jonathan told me that DTT probably takes the sampling rate from the first entry in the channel list, regardless of the epoch during which each sampling rate was actually in use. In this case DTT takes 2048Hz from CIT but 16384Hz from LLO, while obtaining the 16kHz data from both. If that's true, the CIT result gets a frequency scaling of 1/8 as well as an amplitude scaling of sqrt(8).
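A quick back-of-the-envelope check of those two factors, assuming DTT labels data actually sampled at f_s = 16384 Hz with f_s' = f_s/8 = 2048 Hz (X_k is the FFT of an N-sample segment):

    f' = k*f_s'/N = f/8,    S'(f') ~ 2*|X_k|^2 / (f_s'*N) = 8*S(f),    ASD' = sqrt(S') = sqrt(8)*ASD

i.e. every feature lands at 1/8 of its true frequency with sqrt(8) times its true amplitude, consistent with the CIT trace in the attachment.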
FYI, for the corresponding H1 channel in CIT and LHO NDS2 server, you'll get this:
keita.kawabe@opsws7:~ 0$ nds_query -l -n nds.ligo.caltech.edu H1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ
Number of channels received = 2
Channel Rate chan_type
H1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 8192 raw real_4
H1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 16384 raw real_4
keita.kawabe@opsws7:~ 0$ nds_query -l -n nds.ligo-wa.caltech.edu H1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ
Number of channels received = 3
Channel Rate chan_type
H1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 16384 online real_4
H1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 8192 raw real_4
H1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ 16384 raw real_4
In this case, the data from LHO happens to be good, but CIT frequency is a factor of 2 too small and magnitude a factor of sqrt(2) too large.
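The same check can be scripted; here is a minimal sketch using the nds2-client Python bindings (treat the port number and attribute names as assumptions from memory):

# Sketch: list every sample-rate entry each NDS2 server reports for a channel,
# to spot the duplicate-rate situation described above.
import nds2

def list_rates(server, channel):
    conn = nds2.connection(server, 31200)
    for ch in conn.find_channels(channel):
        print(server, ch.name, ch.sample_rate)

list_rates('nds.ligo.caltech.edu', 'H1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ')
list_rates('nds.ligo-wa.caltech.edu', 'H1:PEM-CS_ACC_PSL_PERISCOPE_X_DQ')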
Part of this is that DTT does not handle the case of a channel changing sample rate over time.
DTT retrieves a channel list from NDS2 that includes all the channels with their sample rates; it takes the first entry for each channel name and ignores any following entries in the list with different sample rates. It uses the first sample rate it receives as the sample rate for the channel at all possible times. So when it retrieves data it may be 8k data, but DTT treats it as 4k data and interprets it incorrectly.
I worked up a band-aid that inserts a layer between DTT and NDS2 and essentially makes it ignore specified channel/sample rate combinations. This has let Robert do some work. We are not sure how this scales and are investigating a fix to DTT.
As a follow-up we have gone through two approaches to fix this:
The ext_alert.py script, which periodically queries GraceDB, had failed. I have just restarted it; instructions for restarting are in https://lhocds.ligo-wa.caltech.edu/wiki/ExternalAlertNotification
Getting this process to autostart is now on our high priority list (FRS3415).
Here is the error message displayed before I did the restart:
File "ext_alert.py", line 150, in query_gracedb
return query_gracedb(start, end, connection=connection, test=test)
File "ext_alert.py", line 150, in query_gracedb
return query_gracedb(start, end, connection=connection, test=test)
File "ext_alert.py", line 135, in query_gracedb
external = log_query(connection, 'External %d .. %d' % (start, end))
File "ext_alert.py", line 163, in log_query
return list(connection.events(query))
File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 441, in events
uri = self.links['events']
File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 284, in links
return self.service_info.get('links')
File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 279, in service_info
self._service_info = self.request("GET", self.service_url).json()
File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 325, in request
return GsiRest.request(self, method, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 201, in request
response = conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1038, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 415, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 371, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
File "/usr/lib/python2.7/ssl.py", line 241, in recv
return self.read(buflen)
File "/usr/lib/python2.7/ssl.py", line 160, in read
return self._sslobj.read(len)
ssl.SSLError: The read operation timed out
I have patched the ext_alert.py script to catch SSLError exceptions and retry the query [r11793]. The script will retry up to 5 times before crashing completely, which is something we may want to rethink if we have to.
I have requested that both sites svn up and restart the ext_alert.py process at the next convenient opportunity (i.e., the next time it crashes).
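For reference, here is a minimal sketch of what the retry wrapper might look like (the actual patch is r11793 in the SVN; query_gracedb and its arguments are taken from the traceback above, so the exact structure is an assumption):

import ssl

MAX_RETRIES = 5  # after this many consecutive SSL timeouts the script still crashes

def query_gracedb_with_retry(start, end, connection, test=False):
    """Retry the existing ext_alert.py GraceDB query when the SSL read times out."""
    for attempt in range(MAX_RETRIES):
        try:
            return query_gracedb(start, end, connection=connection, test=test)
        except ssl.SSLError:
            if attempt == MAX_RETRIES - 1:
                raise  # give up; the script crashes as before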