Displaying report 1-1 of 1.
Reports until 17:15, Monday 01 June 2015
H1 CDS (CAL)
david.barker@LIGO.ORG - posted 17:15, Monday 01 June 2015 - last comment - 11:48, Tuesday 02 June 2015(18759)
GRB Alert script running on h1fescript0, runs for several minutes and then stops

Sudarshan, Duncan, Branson, Andrew, Michael T, Greg, Dave:

I got a lot further in installing the GRB alert system at LHO. It now runs, but fails after a couple of minutes. Here is a summary of the install:

LHO and LLO sysadmins decided to run the GRB code on the front-end script machine (Ubuntu12). At LHO it is called h1fescript0.

I requested a Robot GRID Cert for this machine, and Branson very quickly issued the cert for GraceDB queries last Friday.

Following Duncan's instructions and the GraceDB install instructions, I was able to install the python-ligo-gracedb module. The initial install failed because I was using the Debian Squeeze repository (which uses python2.6) rather than Wheezy, which uses python2.7; Michael resolved this.

Greg told us how to install the GRID cert on the machine and set up the environment variable so the program could find it.
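
As a rough illustration of the environment lookup, here is a minimal sketch. The log does not name the variable that was set; the names X509_USER_CERT, X509_USER_KEY and X509_USER_PROXY are an assumption based on the usual grid-certificate convention, and find_credentials is a hypothetical helper, not part of ext_alert.py or the GraceDB client.

```python
import os


def find_credentials(environ=os.environ):
    """Return a (cert, key) path pair from the environment, or None.

    ASSUMPTION: the variable names follow the common X.509 grid
    convention; the log does not say which variable was used.
    """
    cert = environ.get("X509_USER_CERT")
    key = environ.get("X509_USER_KEY")
    if cert and key:
        # Separate certificate and key files (typical for a robot cert)
        return cert, key
    proxy = environ.get("X509_USER_PROXY")
    if proxy:
        # A proxy file holds both the certificate and the key
        return proxy, proxy
    return None
```

Calling find_credentials() before starting the script gives a quick check that the environment is set up as intended.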

I found a bug in the lookback code: it appears the start and stop times were reversed in the arguments to client.events().
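
To make the ordering explicit, here is a small sketch of how the lookback query string could be built with the earlier time first. The query format is copied from the call to client.events() in ext_alert.py; the helper name lookback_query and its arguments are hypothetical, and it assumes the caller supplies the current time in the same units the script already uses.

```python
def lookback_query(lookback, now):
    """Build the 'External' query string covering the last `lookback` seconds.

    Hypothetical helper: `now` is the current time as an integer number
    of seconds, `lookback` is the window length in seconds.
    """
    if lookback < 0:
        raise ValueError("lookback must be non-negative")
    start = now - lookback
    # Earlier time (start) goes first, later time (now) second
    return 'External %d.. %d' % (start, now)
```

With a 10 hour lookback (36000 s), the start of the window is 36000 s before the time passed as now.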

For testing, I saw that a GRB event had happened within the past 10 hours, so I ran the program with a 10 hour lookback. It found the event and posted it to EPICS (see attachment).

But after running for several minutes, it stopped running with an error. This is reproducible.

controls@h1fescript0:scripts 0$ python ext_alert.py run -l 36000
Traceback (most recent call last):
  File "ext_alert.py", line 396, in <module>
    events = list(client.events('External %d.. %d' % (start, now)))
  File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 450, in events
    response = self.get(uri).json()
  File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 212, in get
    return self.request("GET", url, headers=headers)
  File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 325, in request
    return GsiRest.request(self, method, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/ligo/gracedb/rest.py", line 200, in request
    conn.request(method, url, body, headers or {})
  File "/usr/lib/python2.7/httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 776, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1157, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
socket.error: [Errno 110] Connection timed out
 

Images attached to this report
Comments related to this report
keith.thorne@LIGO.ORG - 17:42, Monday 01 June 2015 (18763)CDS
We were having the same issues at LLO - Duncan and Jamie were looking at it.  We've got the robot cert, etc. all set up.  Likely can move to standard operation tomorrow.
duncan.macleod@LIGO.ORG - 11:48, Tuesday 02 June 2015 (18779)

The errors Keith mentioned seeing at LLO are unrelated; I cannot reproduce the connection timeout down there.

I have reproduced the timeout error at LHO as suggested, and have written up a retry workaround that will re-send the query up to 5 times in the event of a timeout error. This seems to run stably at LHO. The logging has been updated to record failed queries.
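
For readers without access to the SVN commit, a retry of this kind might look roughly like the sketch below. This is not the committed code: the function name query_with_retry, the logger name, and the wait parameter are hypothetical, and it assumes "up to 5 times" means five attempts in total.

```python
import logging
import socket
import time

logger = logging.getLogger("ext_alert")  # hypothetical logger name


def query_with_retry(query_func, max_tries=5, wait=0.0):
    """Call query_func(), retrying on socket errors.

    ASSUMPTION: max_tries counts attempts in total. The last failure
    is re-raised so the caller still sees the original error.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return query_func()
        except socket.error as err:
            # Record each failed query, as the updated logging does
            logger.warning("query failed (attempt %d of %d): %s",
                           attempt, max_tries, err)
            if attempt == max_tries:
                raise
            time.sleep(wait)

# Possible use with the call from the traceback above:
# events = query_with_retry(
#     lambda: list(client.events('External %d.. %d' % (start, now))))
```

Wrapping the whole list(...) call means a timeout partway through the query is retried from the beginning rather than resumed.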

The SVN commit was made from h1fescript0 with Dave Barker's LIGO.ORG ID (unintentionally).
