Summary: The epics channel H1:GRD-ISC_LOCK_STATE_N recorded at least one lockloss 1/4 of a second before the guardian log records entering that state. We are trying to investigate locklosses that we keep having while transitioning to the TR_CARM signal; there are a lot of things happening quickly, and we would like to understand the relative timing.
Details:
If i use the lockloss tool to plot the middle lockloss in this list:
2018-08-05_22:04:58Z ISC_LOCK LOCKING_ALS -> LOCKLOSS
2018-08-05_22:18:04Z ISC_LOCK START_TR_CARM -> LOCKLOSS_DRMI
2018-08-05_22:18:08Z ISC_LOCK ACQUIRE_DRMI_1F -> LOCKLOSS
the time listed in the title of the plot is 2018-08-05 22:18:04.69000UTC. The channel H1:GRD-ISC_LOCK_STATE_N goes to "LOCKLOSS_DRMI" (state 3) at time 0 in the plot.
However:
We can manually pull up the guardian log around the lockloss time (because the feature of the lockloss tool that used to do this for us doesn't work anymore). We see that the guardian log records itself entering the LOCKLOSS_DRMI state a quarter of a second later (2018-08-05_22:18:04.929352Z)
2018-08-05_22:18:01.409750Z ISC_LOCK new target: CARM_TO_TR
2018-08-05_22:18:01.411666Z ISC_LOCK executing state: START_TR_CARM (201)
2018-08-05_22:18:01.412065Z ISC_LOCK [START_TR_CARM.enter]
2018-08-05_22:18:01.671585Z ISC_LOCK [START_TR_CARM.main] ezca: H1:LSC-MCL => ON: FM3
2018-08-05_22:18:01.716869Z ISC_LOCK [START_TR_CARM.main] ezca: H1:ALS-C_COMM_PLL_BOOST => 0
2018-08-05_22:18:01.717531Z ISC_LOCK [START_TR_CARM.main] timer['wait'] = 2
2018-08-05_22:18:03.717666Z ISC_LOCK [START_TR_CARM.run] timer['wait'] done
2018-08-05_22:18:03.796693Z ISC_LOCK [START_TR_CARM.run] ezca: H1:ALS-C_COMM_VCO_CONTROLS_ENABLE => 0
2018-08-05_22:18:03.797668Z ISC_LOCK [START_TR_CARM.run] ezca: H1:ALS-C_REFL_DC_BIAS_GAIN => 24
2018-08-05_22:18:03.798400Z ISC_LOCK [START_TR_CARM.run] timer['wait'] = 2
2018-08-05_22:18:04.828452Z ISC_LOCK JUMP target: LOCKLOSS_DRMI
2018-08-05_22:18:04.830722Z ISC_LOCK [START_TR_CARM.exit]
2018-08-05_22:18:04.922221Z ISC_LOCK JUMP: START_TR_CARM->LOCKLOSS_DRMI
2018-08-05_22:18:04.922822Z ISC_LOCK calculating path: LOCKLOSS_DRMI->CARM_5_PICOMETERS
2018-08-05_22:18:04.923199Z ISC_LOCK new target: ACQUIRE_DRMI_1F
2018-08-05_22:18:04.929352Z ISC_LOCK executing state: LOCKLOSS_DRMI (3)
2018-08-05_22:18:04.929352Z ISC_LOCK [LOCKLOSS_DRMI.enter]
I have verified the system time on h1guardian1 is correct
david.barker@CDSADMIN!! ssh root@h1guardian1 date +"%T.%N";date +"%T.%N"
10:14:37.515521821
10:14:37.517143842
I trended the STATE_N and the exec time, expecting the exec time to have increased around the lock loss. Oddly, it was 0 around the time that STATE_N changed values. Dead end.
Then I looked at other jump transitions. What Sheila posted above seems to be the norm. The STATE_N channel will change first before there is any info in the log about the jump.
If we look at this jump transition below, the node gets a new target on the second line and then registers it as a jump in the third. The gps time of line 2 is 1217633455.83, but when I pulled the data, the gps time of the STATE_N transition to ALIGN_RECYCLING_MIRRORS (99) is at 1217633455.75. About a tenth of a second before we start to see anything on the log about this state. This matches with Sheila's example above, it registers the jump target in the log about a tenth of a second after the STATE_N value changed. The process must just take this long to exit the current state, calculate the new path in the graph, and then begin to start running the next state.
2018-08-06_23:30:37.828905Z ISC_LOCK calculating path: ACQUIRE_PRMI->ACQUIRE_DRMI_1F
2018-08-06_23:30:37.836971Z ISC_LOCK new target: ALIGN_RECYCLING_MIRRORS
2018-08-06_23:30:37.905879Z ISC_LOCK JUMP target: ALIGN_RECYCLING_MIRRORS
2018-08-06_23:30:37.914834Z ISC_LOCK [ACQUIRE_PRMI.exit]
2018-08-06_23:30:37.941559Z ISC_LOCK JUMP: ACQUIRE_PRMI->ALIGN_RECYCLING_MIRRORS
2018-08-06_23:30:37.959656Z ISC_LOCK calculating path: ALIGN_RECYCLING_MIRRORS->ACQUIRE_DRMI_1F
2018-08-06_23:30:37.971022Z ISC_LOCK new target: ACQUIRE_DRMI_1F
2018-08-06_23:30:37.985272Z ISC_LOCK executing state: ALIGN_RECYCLING_MIRRORS (99)
2018-08-06_23:30:37.986421Z ISC_LOCK [ALIGN_RECYCLING_MIRRORS.enter]
So, this is normal (for ISC_LOCK at least, I haven't checked on other nodes yet). I don't know if we need it to be much faster, but it may be something that we have to keep in mind.
I'm trying to understand the timing of the events inside of guardian (a 1/4 second is definitely a bit of a discrepency), but I'm curious why this timing issue is of such concern.
The EPICS time stamp as seen at the time of the event I believe is set by the system clock on h1guardian1. The time stamp recorded in the data for the transition might be set by the DAQD EDCU. We should confirm with Jonathan and Dave. The logs are time stamped when they are processed by the journald on h1guardian1.
While the setting of the STATE channels and the emission of the log that you're looking at happen adjacent to each other in the code, they're definitely not guaranteed to end up with the same timestamp, since there's various other variables at play. 1/4 second does seems pretty different, though, I agree, so I will try to investigate what's causing the relative timing slop.
But I'm puzzled why this is of such concern. Is it just about wanting to know which time is more authoritative? I would just use either as a guide to look at the real-time data as recorded via DAQD. If you're worried about when exactly a slow channel actuated by guardian changed, look at the record of the slow channel directly. But please do help me understand what the ultimate concern is so that I can know how to help.