Displaying report 1-1 of 1.
Reports until 16:45, Wednesday 14 May 2025
H1 ISC (OpsInfo)
thomas.shaffer@LIGO.ORG - posted 16:45, Wednesday 14 May 2025 (84387)
ALS Beckhoff vs Guardian locking error codes and expanded thresholds

Sheila D, TJ S.

Summary: the ALS arm guardians will now log the Beckhoff fiber locking error messages (untested though, testing later), and I've relaxed some of the thresholds set in Beckhoff. This is all nice, but I don't anticipate major gains from this.

Why: Our ALS locking has not been as reliable as it once was, and it has been one of the primary reasons our locking is not fully autonomous. We have been discussing ways to help ALS locking without a large-scale ALS table mode matching effort, which would be very time and resource heavy and could negatively impact the ALS system while we are still in a run. One issue Sheila pointed out is that, at times before this recent break, the Beckhoff automation code would flag some type of error and stop the locking automation; the guardian node would see this, drop to the unlocked state, and start over. Sheila thinks the Beckhoff errors are often based on thresholds that are far too strict, and that we should rely more on the guardian node for error checking.

Guardian Changes: While the Beckhoff error codes for fiber locking are in the DAQ (H1:ALS-X_FIBR_LOCK_ERROR_CODE), this binary value needs a map to convert it back to something human readable. The guardian node doesn't look at these codes directly either, so we haven't had an easy way to see what the Beckhoff automation is failing on...until now! The binary value first has to be converted to hex; the hex values and their associated error messages come from a table in E1300482. I wrote a quick dictionary with a string message for each of the values that are in EPICS, and added it to the ALS nodes as a decorator on many different locking states. The decorator frequently checks for errors and logs them so we can track what is happening a bit better. It won't be perfect though, since it's just a decorator and it's possible that we will miss some of these error messages.
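For anyone wanting to picture the decode-and-log step, here is a minimal sketch, not the code that went into the ALS nodes. It assumes the error code is a bitmask where each bit flags one error. The messages are the ones quoted in this entry, but the bit positions, the names FIBER_LOCK_ERRORS, decode_error_code and log_fiber_errors, and the read_code callable are all placeholders; the real assignments are in E1300482.

```python
import functools

# Hypothetical bit-to-message map. The messages are the ones quoted in this
# entry; the bit positions are placeholders, NOT the E1300482 assignments.
FIBER_LOCK_ERRORS = {
    0x01: "Reference cavity transmission PD error",
    0x02: "Reference cavity transmission below the limit",
    0x04: "Beat note power too low",
    0x08: "Beat note out of range of frequency comparator",
    0x10: "Laser error",
}


def decode_error_code(code):
    """Return the message for every set bit in an integer error code."""
    code = int(code)
    messages = []
    bit = 1
    while bit <= code:
        if code & bit:
            # Bits missing from the map are reported rather than dropped.
            messages.append(
                FIBER_LOCK_ERRORS.get(bit, "Unknown error bit 0x%X" % bit)
            )
        bit <<= 1
    return messages


def log_fiber_errors(read_code, log=print):
    """Decorator factory: decode and log any error bits before the wrapped call.

    read_code is any callable returning the current error code (assumed here;
    in a node it would read the error code channel), and log is any callable
    that accepts a string.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for message in decode_error_code(read_code()):
                log("Fiber lock error: " + message)
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Because the check only runs when the wrapped function is called, an error bit that sets and clears between calls is never logged, which is the limitation noted above.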

What codes have we been seeing? In 2025, there was a trend in the X arm error codes during some of the times when ALS was fussy. Attachment 1 is an example where a "beat note out of range of frequency comparator" code is followed by a "laser error" code. This generally only lasts a few minutes though, and it looks like there were issues before these codes popped up. For the Y arm, the error codes are up for <1 second and will often appear a minute or more after the arm loses lock, so I don't think they are too useful. The most common error codes that pop up are: Reference cavity transmission PD error, Reference cavity transmission below the limit, Beat note power too low.

Threshold Changes: One thing that had already been done was lowering the PLL reference cavity transmission low limit (ref cav trans low limit) - alog83547. Today, I changed the PLL beat note frequency range from 3MHz-150MHz to 2MHz-350MHz. The max frequency I saw was near 300MHz, so 350MHz seemed like enough to get it out of the way. The beat note minimum was already at -40dB, well out of the way (see attachment 2 for examples). There might be some more places to relax the Beckhoff thresholds, but these relate to the most common errors that we saw, except for the very vague "Laser error" code, which will need more time to look into.
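As a quick sanity check of the new beat note frequency window, the sketch below compares the old and new ranges against the roughly 300MHz maximum mentioned above. The names are illustrative only, and 300.0 is an approximation of "near 300MHz", not a measured value.

```python
# Beat note frequency limits in MHz, taken from the text above.
OLD_RANGE_MHZ = (3.0, 150.0)
NEW_RANGE_MHZ = (2.0, 350.0)

# Approximate highest beat note frequency seen ("near 300MHz"); assumed value.
OBSERVED_MAX_MHZ = 300.0


def in_range(freq_mhz, limits):
    """True if freq_mhz lies within the inclusive (low, high) limits."""
    low, high = limits
    return low <= freq_mhz <= high


old_ok = in_range(OBSERVED_MAX_MHZ, OLD_RANGE_MHZ)   # False: above the 150MHz limit
new_ok = in_range(OBSERVED_MAX_MHZ, NEW_RANGE_MHZ)   # True: inside the new window
headroom_mhz = NEW_RANGE_MHZ[1] - OBSERVED_MAX_MHZ   # 50.0 MHz of margin
```

With these numbers the old upper limit would have flagged the observed maximum, while the new one leaves about 50MHz of headroom.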

Images attached to this report