aLIGO LHO Logbook

H1 SYS

jameson.rollins@LIGO.ORG - posted 12:34, Sunday 18 August 2013 (7472)

Guardian development / IMC Guardian update

[Jamie, Kiwamu, Mark, Stefan]

Summary

On Friday, we were finally able to get the "new" Guardian supervisors running and supervising various components of the input mode cleaner. Guardian "supervisors" are the main guardian processes that run the guardian code for a particular domain/component/subsystem. In this case, we had three components (or "systems") running, delineated by channel access prefixes:

GRD SUS-MC2: supervisor for the MC2 suspension (H1:SUS-MC2_)
GRD IMC: supervisor for IMC (H1:IMC-)
GRD SYS-IMC: manager supervisor for the above

States for each of the systems were defined, and we were able to run the supervisors through their paces a bit to confirm that the behavior was more-or-less as expected. In fact, I would say everything really worked quite well, surpassing my expectations. Unfortunately we didn't quite get to the point where I felt comfortable leaving the supervisors running on their own, so I shut them down before I left on Friday evening.

The old IMC autolocker was not restarted.

Over the next couple of days I'll attempt to build out some of the infrastructure to start/stop/restart the supervisors at will, to ease their commissioning. I'll also work on documentation in preparation for the **Guardian review, Monday, August 26th, 12:00 PDT**.

Details

A guardian "system" consists of states connected together into a directed state graph. It is described in a "system description directory", which is passed to a guardian supervisor process as its primary argument. When the supervisor is launched it instantiates its own EPICS channel server, which is used for accepting state requests and reporting status.

When the system is in a given system state, the supervisor is executing the run script for that state. If the state run script "Returns", the supervisor transitions to the next state in the sequence to reach the requested state. Once the requested state is reached, the system remains in that state until a new request is issued. If the state code instead exits with a "return target", the supervisor will transition to the target state. If the system is being run in "un-managed" mode, it will then attempt to re-reach the original requested state on its own. This is known as "recovery". Otherwise, if the system is run in "managed" mode, the request will be reset to the recovery target and the system will wait for instructions from its manager.
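The request/recovery behavior described above can be sketched as a toy model. This is only an illustration, not the actual Guardian supervisor code: the class name, the `step` method, the state names, and the simplification that a "return target" is always a state on the current sequence are all assumptions made for the example.

```python
class ToySupervisor:
    """Toy model of the request/recovery logic (not the real Guardian code)."""

    def __init__(self, sequence, managed=False):
        # sequence: ordered list of states leading to the requested state
        # (hypothetical; the real supervisor derives this from the state graph)
        self.sequence = sequence
        self.managed = managed
        self.state = sequence[0]
        self.request = sequence[-1]

    def step(self, result):
        """Advance one step given the result of the current state's run script.

        result is None while the script is still running, True for a plain
        return, or a state name for a "return target".
        """
        if result is None:
            # Run script still executing: stay in the current state.
            return self.state
        if result is True:
            # Plain return: move one state closer to the request, or stay
            # put if the requested state has already been reached.
            if self.state != self.request:
                idx = self.sequence.index(self.state)
                self.state = self.sequence[idx + 1]
            return self.state
        # "Return target": jump to the named state.
        self.state = result
        if self.managed:
            # Managed mode: reset the request to the recovery target and
            # wait for the manager to issue a new request.
            self.request = result
        # Un-managed mode: the request is unchanged, so later plain returns
        # walk the sequence back up to it ("recovery").
        return self.state
```

For example, an un-managed `ToySupervisor(["DOWN", "ACQUIRE", "LOCKED"])` that receives the return target `"DOWN"` while in `LOCKED` keeps `LOCKED` as its request and climbs back on subsequent returns, while the same system with `managed=True` has its request reset to `"DOWN"` and stays there.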

The supervisor process itself is now functioning as a true finite state machine. In its primary run state the supervisor runs the state code for the current system state (sorry for the overloading of the term "state": the supervisor has "states" of operation of its state machine that are distinct from the "states" of the system it is controlling).

Status

Once we got things finally running, the new supervisor finite state machine architecture was working really quite well, even better than I expected. The system responded very quickly. We could issue requests, after which the supervisor would immediately calculate the path to the new requested state and ratchet through the state sequence to get there. We could easily stop at intermediate states to check status, and then instruct the system to continue on its way.
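As a rough illustration of what "calculating the path to the requested state" can look like on a directed state graph, here is a minimal sketch using breadth-first search. The choice of breadth-first search, the function name `find_path`, and the example states are assumptions for illustration; this entry does not describe the algorithm the supervisor actually uses.

```python
from collections import deque


def find_path(graph, start, goal):
    """Return the shortest list of states from start to goal, or None.

    graph maps each state name to the states directly reachable from it.
    """
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk the predecessor links back to the start.
            path = []
            while state is not None:
                path.append(state)
                state = previous[state]
            return path[::-1]
        for nxt in graph.get(state, ()):
            if nxt not in previous:
                previous[nxt] = state
                queue.append(nxt)
    return None  # goal is not reachable from start


# Hypothetical three-state graph, for illustration only.
EXAMPLE_GRAPH = {
    "DOWN": ["ACQUIRE"],
    "ACQUIRE": ["LOCKED", "DOWN"],
    "LOCKED": ["DOWN"],
}
```

With this example graph, `find_path(EXAMPLE_GRAPH, "DOWN", "LOCKED")` gives the sequence DOWN, ACQUIRE, LOCKED, which is the kind of state sequence the supervisor would then ratchet through one state at a time.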

For things to work, there are really two aspects: there's the guardian supervisor itself, and the system description and its state run code. Beyond the behavior of the supervisor itself (which seemed to be working quite well), we managed to get the actual guardian code for the systems under test in a good enough state such that the graphs were well constructed and the state transitions made sense. We found some bugs in the supervisor that I was able to fix immediately, and we worked on the actual system state code until the systems were behaving properly.

We ran the IMC and SUS-MC2 supervisors in managed mode, where we were manually coordinating their states to lock the IMC. Once we were happy with their behavior we were even able to run the SYS-IMC manager, which coordinated the transitions of IMC and SUS-MC2 to automatically lock the IMC and recover the system back to the locked state. I will try to post descriptions of the systems we had working, including the system graphs and state descriptions, in the next couple of days.

However, things weren't working quite well enough that I felt comfortable leaving them running. The supervisor would occasionally get into a hung state that I was not immediately able to diagnose and that required restarting the supervisor. The SYS-IMC manager also seemed to miss some transitions, likely due to buggy programming of the SYS-IMC state code. There are also a lot of missing features and infrastructure work still needed.

I'll be posting further as more of this stuff gets in place.