Reports until 17:28, Thursday 01 September 2016
H1 DAQ
jonathan.hanks@LIGO.ORG - posted 17:28, Thursday 01 September 2016 (29452)
Frame writer status update and H1FW0 frame differences
On Tuesday we reconfigured h1fw1 to run the newer code (same code as h1fw0), and put the older code on h1fw2 for comparison purposes.

So:
h1fw0 and h1fw1 are running r4252 from branch-3.0 in svn.
h1fw2 is running as the advLigoRTS-3.0.3 release tagged for O1.

We have not seen frame writer crashes in several weeks. However, we are seeing h1fw0 produce a small number of frames that differ from those produced by h1fw1 and h1fw2.

Analysis by Greg Mendell and David Barker suggests that the daqd occasionally does not receive all of the data and issues retransmit requests for the missing portion. When this happens, some of the data is stored out of order.

So apparently data that should be stored as

1 2 3 4 5

may be stored as:

1 3 4 5 2
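The pattern above is what you would get if retransmitted data were appended in arrival order instead of being placed by position. The following is a minimal sketch of that hypothesis. It assumes each packet carries a sequence number, which is an assumption here. It is not the daqd receive code, and the function names are hypothetical.

```python
# Hypothetical illustration only (not daqd code): each packet is assumed to
# carry a 1-based sequence number. Packet 2 is lost on first transmission and
# arrives last, after a retransmit request.

def store_in_arrival_order(arrivals):
    """Append payloads as they arrive (reproduces the suspected misordering)."""
    buffer = []
    for _seq, payload in arrivals:
        buffer.append(payload)
    return buffer


def store_by_sequence(arrivals, n_packets):
    """Place each payload in the slot given by its sequence number."""
    buffer = [None] * n_packets
    for seq, payload in arrivals:
        buffer[seq - 1] = payload  # sequence numbers start at 1
    missing = [i + 1 for i, p in enumerate(buffer) if p is None]
    if missing:
        raise ValueError(f"packets still missing: {missing}")
    return buffer


arrivals = [(1, "1"), (3, "3"), (4, "4"), (5, "5"), (2, "2")]
print(store_in_arrival_order(arrivals))  # ['1', '3', '4', '5', '2']
print(store_by_sequence(arrivals, 5))    # ['1', '2', '3', '4', '5']
```

Indexing by sequence number makes the stored order independent of arrival order. That is why, under this assumption, a late retransmit would only show up as misordering if the receive path appends rather than indexes.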

I will review the network receive code tomorrow to see if this is what is happening. We find it curious that the data it skips is inserted later on rather than in its original position.

We are not sure what is causing this, but a leading theory is a network issue between h1dc0 and h1fw0. Next Tuesday Dave plans to swap the switch connections of h1fw0 and h1fw1. If the behavior moves from h1fw0 to h1fw1, we will be able to attribute it to the port, SFP, or fiber currently connecting h1fw0 to the switch.

I will work with Carlos to see if we can get error counters from the switch.