The ext_alert.py script, which periodically queries GraceDB, had failed. I have just restarted it; instructions for restarting are at https://lhocds.ligo-wa.caltech.edu/wiki/ExternalAlertNotification
Getting this process to autostart is now on our high-priority list (FRS3415).
Here is the error message displayed before I restarted it:
  File "ext_alert.py", line 150, in query_gracedb
    return query_gracedb(start, end, connection=connection, test=test)
  File "ext_alert.py", line 150, in query_gracedb
    return query_gracedb(start, end, connection=connection, test=test)
  File "ext_alert.py", line 135, in query_gracedb
    external = log_query(connection, 'External %d .. %d' % (start, end))
  File "ext_alert.py", line 163, in log_query
    return list(connection.events(query))
  File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 441, in events
    uri = self.links['events']
  File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 284, in links
    return self.service_info.get('links')
  File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 279, in service_info
    self._service_info = self.request("GET", self.service_url).json()
  File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 325, in request
    return GsiRest.request(self, method, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 201, in request
    response = conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1038, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 415, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 371, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib/python2.7/ssl.py", line 241, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 160, in read
    return self._sslobj.read(len)
ssl.SSLError: The read operation timed out
I have patched the ext_alert.py script to catch SSLError exceptions and retry the query [r11793]. The script will retry up to 5 times before crashing completely; we may want to revisit that limit if it proves necessary.
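For anyone reading along, here is a minimal sketch of the catch-and-retry pattern described above. It is not the code committed in r11793. The names query_with_retry, query_func and wait are hypothetical, and counting "5 retries" as five further attempts after the first failure is an assumption.

```python
import ssl
import time

MAX_RETRIES = 5  # figure taken from the entry above; how the patch counts is an assumption


def query_with_retry(query_func, retries=MAX_RETRIES, wait=0.0):
    """Call query_func(), retrying when it raises ssl.SSLError.

    query_func is a hypothetical zero-argument callable wrapping the
    GraceDB query. After `retries` failed retries the last SSLError is
    re-raised, so the caller still fails loudly.
    """
    attempt = 0
    while True:
        try:
            return query_func()
        except ssl.SSLError:
            attempt += 1
            if attempt > retries:
                raise  # out of retries: let the error propagate
            time.sleep(wait)  # optional pause before the next attempt
```

A loop is used here rather than recursion so that repeated failures do not stack up frames in the traceback. In the script the wrapped call might look like query_with_retry(lambda: log_query(connection, query)), which is likewise only illustrative.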
I have requested that both sites svn up and restart the ext_alert.py process at the next convenient opportunity (the next time it crashes).