aLIGO LHO Logbook

H1 CDS (CDS, DAQ, TCS)

andrew.lundgren@LIGO.ORG - posted 09:53, Tuesday 21 April 2015 - last comment - 09:24, Friday 24 April 2015(17971)

Minute trends, second trends, and raw data disagree on when HWS camera was off

Andy, Duncan

When looking for times when the HWS camera was on or off, I found that the minute trends indicated that it was off on Apr 18 6:30 UTC for ~27 minutes. But the second trends indicate that it was turned off 20 minutes later than that (and back on at the same time). The raw data (sampled at 16 Hz) indicates that the camera was never turned off.

This was originally found using data over NDS2, but Duncan has confirmed by using lalframe to read the frames directly. I've attached a plot below. The channels are H1:TCS-ITM{X,Y}_HWS_DALSACAMERASWITCH.

Images attached to this report

Comments related to this report

shivaraj.kandhasamy@LIGO.ORG - 14:39, Tuesday 21 April 2015 (17981)DAQ, DetChar

I ran the same check on the Caltech (CIT) cluster using MATLAB code, and there the raw, minute, and second trends agree. The MATLAB code uses ligo_data_find. But if I run the same code on the Hanford cluster, it produces the results Andy and Duncan saw, i.e., the trends disagree. So there seems to be a difference between the trend frames at these two locations. I have attached the MATLAB code here in case someone wants to test it.

Non-image files attached to this comment

getDataAll.m

getFrameData.m

gregory.mendell@LIGO.ORG - 14:32, Thursday 23 April 2015 (18029)

This is because the trend data from the two CDS framewriters can disagree. This happens if a framewriter restarts during the period covered by a trend file, so that the averages from each framewriter are computed using different numbers of values. These differences only happen with the trend data. See below for the details.

Note that at LHO, LDAS uses the CDS fw1 framewriter as the primary source of the scratch trends (saved at LHO for the past month) and the CDS fw0 framewriter as the primary source of the archive trends (copied to CIT and saved permanently at LHO and CIT).

If a framewriter goes down, then after it restarts it still writes out trend data, computed only from the data it has received since the restart.

Thus you can get trend frames that contain data averages for only part of the time period covered by the file.
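To make the mechanism concrete, here is a toy model of minute-trend averaging, written as a hypothetical sketch rather than the actual daqd/framewriter code. It assumes a complete minute of a 16 Hz channel holds 16 × 60 = 960 samples, that a framewriter with no data for a minute records mean = 0 and n = 0 (the zero-filling reported in this alog), and an illustrative restart 1696 s (28 min 16 s) into the file; `minute_trend` is a made-up name.

```python
import numpy as np

RAW_RATE = 16     # Hz, rate of the HWS camera-switch channel quoted above
TREND_LEN = 60    # seconds averaged into one minute-trend sample
FILE_LEN = 3600   # seconds covered by one H-H1_M-*.gwf file


def minute_trend(raw, have_data):
    """Hypothetical model of how a framewriter forms minute trends.

    raw       -- raw samples for the whole file (FILE_LEN * RAW_RATE values)
    have_data -- True where this framewriter actually received the sample
    Returns (mean, n) per minute. Minutes with no data get mean = 0 and
    n = 0, mimicking the zero-filling described in this alog.
    """
    per_min = RAW_RATE * TREND_LEN
    raw = np.asarray(raw, dtype=float).reshape(-1, per_min)
    have = np.asarray(have_data, dtype=bool).reshape(-1, per_min)
    n = have.sum(axis=1)
    total = np.where(have, raw, 0.0).sum(axis=1)
    mean = np.divide(total, n, out=np.zeros_like(total), where=n > 0)
    return mean, n


# Camera switch was on (1) for the whole hour in the raw data.
t = np.arange(FILE_LEN * RAW_RATE) / RAW_RATE   # seconds since file start
raw = np.ones_like(t)

# fw0 ran throughout; fw1 is assumed to restart 1696 s (28 min 16 s) in.
mean_fw0, n_fw0 = minute_trend(raw, np.ones_like(t, dtype=bool))
mean_fw1, n_fw1 = minute_trend(raw, t >= 1696)

print(mean_fw0[:30])   # all 1
print(mean_fw1[:30])   # 0 for the first 28 minutes, then 1
print(n_fw1[27:30])    # counts 0, 704, 960: minute 28 is only partly covered
```

Both framewriters see the same raw samples, yet their minute-trend files differ, exactly as in the /archive versus /scratch comparison. Note that the partly covered minute still averages to 1; only its .n value reveals it is incomplete.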

For the time given in this alog, the trend files under /archive (from framewriter-0) and /scratch (from framewriter-1) differ in size:

$ ls -l /archive/frames/.../H-H1_M-1113372000-3600.gwf
-r--r--r--   1 ldas     ldas     322385896 Apr 18 00:27 /archive/frames/.../H-H1_M-1113372000-3600.gwf

$ ls -l /scratch/frames/.../H-H1_M-1113372000-3600.gwf
-r--r--r--   1 ldas     ldas     310156193 Apr 18 00:46 /scratch/frames/.../H-H1_M-1113372000-3600.gwf

Note that both files pass FrCheck (but have different checksums) and contain valid data according to framecpp_verify (e.g., run with the --verbose --data-valid options).

However, if I dump out the data for one of the channels in question, I get:

$ FrDump -i /archive/frames/.../H-H1_M-1113372000-3600.gwf -t H1:TCS-ITMX_HWS_DALSACAMERASWITCH.mean -d 5 | grep "0:"
     0:           1           1           1           1           1           1           1           1           1           1
    10:           1           1           1           1           1           1           1           1           1           1
    20:           1           1           1           1           1           1           1           1           1           1
    30:           1           1           1           1           1           1           1           1           1           1
    40:           1           1           1           1           1           1           1           1           1           1
    50:           1           1           1           1           1           1           1           1           1           1

$ FrDump -i /scratch/frames/.../H-H1_M-1113372000-3600.gwf -t H1:TCS-ITMX_HWS_DALSACAMERASWITCH.mean -d 5 | grep "0:"
     0:           0           0           0           0           0           0           0           0           0           0
    10:           0           0           0           0           0           0           0           0           0           0
    20:           0           0           0           0           0           0           0           0           1           1
    30:           1           1           1           1           1           1           1           1           1           1
    40:           1           1           1           1           1           1           1           1           1           1
    50:           1           1           1           1           1           1           1           1           1           1

These frames start at:

$ tconvert 1113372000
Apr 18 2015 05:59:44 UTC
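The conversion tconvert performs here can be reproduced by hand: GPS time counts seconds from 1980-01-06 00:00:00 UTC without leap seconds, so UTC lags GPS by the leap seconds accumulated since then. The sketch below hard-codes the 16 s offset valid in April 2015 (it became 17 s after the leap second at the end of 2015-06-30); `gps_to_utc` is a hypothetical helper, and a real tool should use a full leap-second table as tconvert does.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)   # GPS time 0, expressed in UTC
# GPS - UTC offset from 2012-07-01 until the leap second at the end of
# 2015-06-30. Hard-coded only for the date in this alog.
LEAP_SECONDS_APR_2015 = 16


def gps_to_utc(gps_seconds, leap=LEAP_SECONDS_APR_2015):
    """Convert a GPS time to a naive UTC datetime (valid only for spring 2015)."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap)


print(gps_to_utc(1113372000))   # 2015-04-18 05:59:44
```

GPS 1113372000 is exactly 12886 days and 6 hours after the GPS epoch; subtracting the 16 leap seconds gives the 05:59:44 UTC that tconvert reports.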

and the 0's fill the first 28 minutes of the /scratch file (copied from framewriter-1), while the /archive version (copied from framewriter-0) contains only 1's.

Thus, I predict framewriter-1 restarted at around Apr 18 2015 06:28:00 UTC. It seems that 0's get filled in for times before that.
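The estimate above is simple arithmetic on the FrDump output: count the leading zero-filled minute-trend values and add that many minutes to the file start time. A minimal sketch, using the 60 `.mean` values dumped from the /scratch file (28 zeros followed by 32 ones); `first_nonzero_minute` is a hypothetical helper. It only works here because the raw data shows the camera was really on; in general a genuine 0 is indistinguishable from a zero-filled one, which is why the .n channel is the robust test.

```python
from datetime import datetime, timedelta

FILE_START_UTC = datetime(2015, 4, 18, 5, 59, 44)   # from tconvert above

# H1:TCS-ITMX_HWS_DALSACAMERASWITCH.mean from the /scratch (fw1) file,
# one value per minute, as printed by FrDump above.
scratch_mean = [0] * 28 + [1] * 32


def first_nonzero_minute(values):
    """Index of the first minute-trend value that is not zero."""
    return next(i for i, v in enumerate(values) if v != 0)


k = first_nonzero_minute(scratch_mean)
print(k)                                       # 28
print(FILE_START_UTC + timedelta(minutes=k))   # 2015-04-18 06:27:44
```

Since minute 27 has no data and minute 28 has some, the restart falls inside the minute starting at 06:27:44 UTC, consistent with a restart at around 06:28:00 UTC.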

If I check H1:TCS-ITMX_HWS_DALSACAMERASWITCH.n, which gives the number of values used to compute each average, it is also 0 wherever the numbers above are 0, indicating that the 0's come from times when framewriter-1 had no data.

Note that this behavior only occurs for second-trend and minute-trend data.

If data is missing from the raw or commissioning data, no file is written out. Thus, we never find a difference between valid raw (H1_R) or commissioning (H1_C) frames written by the two framewriters. Note that the diffH1fb0vsfb1Frames process, seen in the first row of green lights here,

http://ldas.ligo-wa.caltech.edu/ldas_outgoing/archiver/monitor/d2dMonitor.html

is continuously checking that the raw frames from the two framewriters are the same. (The same process runs at LLO too.)

If differences are found, it sends out an email alert.
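How diffH1fb0vsfb1Frames compares the frames is not described here. As a purely hypothetical sketch, the check below hashes the two framewriters' copies of a raw frame and flags any difference; it assumes valid raw frames from fw0 and fw1 are byte-identical, which this alog does not state, and `frames_differ` is an illustrative name, not the LDAS tool. The demo uses throwaway files, including a 0-byte file like those written when a RAID array fills up.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256sum(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def frames_differ(fw0_file, fw1_file):
    """Hypothetical check: True if the two framewriters' copies differ."""
    return sha256sum(fw0_file) != sha256sum(fw1_file)


# Demo on throwaway files standing in for the two framewriters' output.
with tempfile.TemporaryDirectory() as d:
    a, b, c = (Path(d) / name for name in ("fw0.gwf", "fw1.gwf", "bad.gwf"))
    a.write_bytes(b"frame data")
    b.write_bytes(b"frame data")
    c.write_bytes(b"")   # e.g. a 0-byte file from a full RAID array
    identical = frames_differ(a, b)
    empty = frames_differ(a, c)

print(identical, empty)   # False True
```

A byte-level comparison like this would not suit the trend frames, whose checksums legitimately differ between framewriters, as shown above.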

I've never received an alert, except when the RAID disk arrays have either filled up (and 0-byte files were written by one framewriter) or when a RAID disk array hung in some way that caused corrupt files to be written. In both cases, the files on the problem array never pass FrCheck and are never copied into the LDAS system.

Thus, this behavior is a feature of the second-trend and minute-trend frames only. To avoid this issue, code should check the .n channel to make sure the full number of samples was used to obtain each average. Otherwise, some of the trend data may be zeros filled in for times when the framewriter had no data.
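A minimal sketch of the .n check, assuming you have already read the `.mean` and `.n` trend channels into arrays. The expected count for a complete sample is an assumption here: it is taken to be sample rate × trend length (16 × 60 = 960 for a minute trend of a 16 Hz channel, 16 for a second trend), so verify it against known-good data first. `valid_trend` and the sample values are hypothetical.

```python
import numpy as np


def valid_trend(mean, n, expected_n):
    """Return the trend means with under-filled samples replaced by NaN.

    expected_n is the .n value of a complete trend sample. Assumed here to be
    sample rate * trend length (e.g. 16 * 60 = 960 for a minute trend of a
    16 Hz channel); verify against known-good data before relying on it.
    """
    mean = np.asarray(mean, dtype=float)
    n = np.asarray(n)
    return np.where(n == expected_n, mean, np.nan)


# Hypothetical .mean/.n values around a framewriter restart.
mean = [0, 0, 1, 1]
n = [0, 0, 704, 960]
clean = valid_trend(mean, n, expected_n=16 * 60)
print(clean)   # nan for the first three samples, 1.0 for the last
```

Replacing bad samples with NaN rather than dropping them keeps the time axis aligned; if partial averages are acceptable for your use, relax the test to `n > 0`.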

david.barker@LIGO.ORG - 16:06, Thursday 23 April 2015 (18032)

Greg said:

Thus, I predict framewriter-1 restarted at around Apr 18 2015 06:28:00 UTC. It seems that 0's get filled in for times before that.

The restart log for April 17 says:

2015_04_17 23:28 h1fw1

With local PDT = UTC - 7, 23:28 PDT on April 17 is 06:28 UTC on April 18, so Greg gets a gold star.
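The time-zone arithmetic can be checked by parsing the restart-log entry directly. This sketch uses a fixed UTC - 7 offset, as stated above; a more general tool would use a time-zone database (for example `zoneinfo.ZoneInfo("America/Los_Angeles")`) to handle the PST/PDT switch automatically. The log format string is inferred from the single entry shown.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")   # local time at LHO = UTC - 7

# Entry from the h1fw1 restart log, in local (PDT) time.
log_entry = "2015_04_17 23:28"
restart_local = datetime.strptime(log_entry, "%Y_%m_%d %H:%M").replace(tzinfo=PDT)
restart_utc = restart_local.astimezone(timezone.utc)
print(restart_utc)   # 2015-04-18 06:28:00+00:00
```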

daniel.sigg@LIGO.ORG - 09:24, Friday 24 April 2015 (18041)

There should also be a .n channel which tells you how many samples were included in the average.