Dave and I had been discussing https://alog.ligo-wa.caltech.edu/aLOG/index.php?callRep=36294. One solution we thought about was simply disabling the optimization that gave us bad data. The reasoning was that the optimization was put in when we used 1-minute second-trend frames; opening one 10-minute second-trend frame should be faster than opening ten 1-minute frames, so it may matter less now. Before deciding, I wanted to characterize the impact of the shared table of contents optimization.
NDS1 variant | Time (real) | Notes |
---|---|---|
Current production code | 0.723s | Fast, but broken around fw restarts. |
No TOC optimization | 2.077s | Slow, but always works. |
TOC optimization w/ knowledge of fw restarts | 1.595s | Medium speed, but always correct. |
My thoughts on this are:
1. Getting correct data from nds1 is important. Restarting the DAQ should not be a reason to give bad data, even if it is only for a few minutes.
2. We should preserve the table of contents optimization. It has a measurable impact: 0.723s with it versus 2.077s without it in this test.
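To put the table above in relative terms, here is a small sketch that computes the slowdown of each variant against the production code. It uses only the three wall-clock numbers reported in this entry; the dictionary keys are labels I made up for illustration.

```python
# Wall-clock ("real") times in seconds, copied from the table above.
timings = {
    "production (shared TOC)": 0.723,
    "no TOC optimization": 2.077,
    "TOC + fw restart knowledge": 1.595,
}

baseline = timings["production (shared TOC)"]

# Slowdown factor of each variant relative to the production code.
slowdown = {name: t / baseline for name, t in timings.items()}

for name, t in timings.items():
    print(f"{name:28s} {t:.3f}s  {slowdown[name]:.2f}x")

# Fraction of the "no optimization" penalty that the restart-aware
# variant wins back.
recovered = (timings["no TOC optimization"] - timings["TOC + fw restart knowledge"]) / (
    timings["no TOC optimization"] - baseline
)
print(f"penalty recovered: {recovered:.0%}")
```

On these single samples, disabling the optimization costs roughly 2.9x, the restart-aware variant costs roughly 2.2x, and it recovers about 36% of the penalty. These are short runs over a window with several restarts, so treat the ratios as indicative only.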
All timings are from my laptop, connecting to the LHO DTS x1nds1 server via an ssh tunnel. Note that there were several restarts of the nds/fw/dc on the DTS during this time window, so the penalty for not being able to share the table of contents is high in these short samples. I ran each of the queries below multiple times to make sure the disk cache was hot, and picked a representative time for each case.
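For anyone wanting to repeat this kind of measurement, here is a minimal sketch of a harness that runs a command once to warm the cache and then times several runs. It is not what I used for the numbers in this entry (those come from running `time` by hand); the function name `time_command` and the choice of the median as the "representative" value are assumptions made for illustration.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5, warmup=1):
    """Return (median, all_times) of wall-clock seconds for running cmd."""
    # Warm-up runs are not timed; they only get the disk cache hot.
    for _ in range(warmup):
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        times.append(time.perf_counter() - start)
    return statistics.median(times), times


# Stand-in command so the sketch runs anywhere. To time the real query,
# substitute the nds_query invocation used in this entry, for example:
# cmd = ["nds_query", "-d", "400", "-n", "localhost", "-p", "8088",
#        "-s", "1179600000", "-e", "1179603000",
#        "X1:DAQ-DC0_DATA_RATE.mean,s-trend"]
cmd = [sys.executable, "-c", "pass"]

median, times = time_command(cmd, runs=3)
print(f"median {median:.3f}s over {len(times)} runs")
```

The median is less sensitive than the mean to one slow run caused by a cold cache or a hiccup in the ssh tunnel.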
The measurements:
Quick query against x1nds1 with the current production code (no changes)
$ time nds_query -d 400 -n localhost -p 8088 -s 1179600000 -e 1179603000 X1:DAQ-DC0_DATA_RATE.mean,s-trend
Data for 600 seconds starting at GPS: 1179600000
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179600600
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179601200
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179601800
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179602400
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
real 0m0.723s
user 0m0.008s
sys 0m0.000s
Quick query against x1nds1 with the frame table of contents optimization disabled.
$ time nds_query -d 400 -n localhost -p 8088 -s 1179600000 -e 1179603000 X1:DAQ-DC0_DATA_RATE.mean,s-trend
Data for 600 seconds starting at GPS: 1179600000
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179600600
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179601200
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179601800
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179602400
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
real 0m2.077s
user 0m0.000s
sys 0m0.008s
Quick query against x1nds1 using the table of contents optimization with some additional knowledge of fw restart times. This is where we need to be: always correct, but still able to use some frame access optimizations. This can be improved further by taking into account which restarts are also channel list changes and which are just restarts.
$ time nds_query -d 400 -n localhost -p 8088 -s 1179600000 -e 1179603000 X1:DAQ-DC0_DATA_RATE.mean,s-trend
Data for 600 seconds starting at GPS: 1179600000
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179600600
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179601200
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179601800
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
Data for 600 seconds starting at GPS: 1179602400
Channel type nWords units
X1:DAQ-DC0_DATA_RATE.mean real_8 600
real 0m1.595s
user 0m0.004s
sys 0m0.004s
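The idea behind the restart-aware variant can be sketched as follows: consecutive frames share one table of contents only if no fw restart falls between them. This is an illustration of the concept, not the nds1 code. The function name `group_frames_by_restart` and the restart times in the example are hypothetical; the frame start times are the five 600-second blocks from the queries above.

```python
from bisect import bisect_right


def group_frames_by_restart(frame_starts, restart_times):
    """Split sorted frame start times into runs that may share one TOC.

    A new run begins whenever at least one restart falls after the
    previous frame's start and at or before the current frame's start.
    """
    restarts = sorted(restart_times)
    groups = []
    prev_epoch = None
    for start in frame_starts:
        # The count of restarts at or before this frame's start
        # identifies which "epoch" the frame belongs to.
        epoch = bisect_right(restarts, start)
        if groups and epoch == prev_epoch:
            groups[-1].append(start)
        else:
            groups.append([start])
        prev_epoch = epoch
    return groups


# Frame start times (GPS) for the five 600 s blocks in the query above.
frame_starts = [1179600000, 1179600600, 1179601200, 1179601800, 1179602400]

# Hypothetical restart times, made up for this example only.
restart_times = [1179600900, 1179602000]

groups = group_frames_by_restart(frame_starts, restart_times)
for g in groups:
    print(f"one TOC read shared by {len(g)} frame(s): {g}")
```

With these made-up restarts the five frames fall into three runs, so the table of contents would be read three times instead of five. The improvement mentioned above would amount to passing in only the restarts that also changed the channel list, which would make the runs longer.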