J. Batch, J. Kissel

After the plug got kicked out of the wall (see LHO aLOG 5794), we had to turn off all front-end computers and power cycle the IO chassis. Upon restart and restoration of the front-end computers, I launched a DTT transfer function session, hoping to resume transfer functions on the TMS. However, as soon as I started, I noticed that my excitation would drop out intermittently. UH OH. All lights on the GDS_TP screen showed green.

Jim to the rescue!! He immediately recognized the problem (from my verbal story only!) to be that there were too many awgtpman processes running on the front end. Like it's happened before or something. He confirmed the problem by logging into the h1susb6 front end (on which h1sustmsy lives), and running

controls@h1susb6 ~ 0$ ps aux | grep awgtpman

This revealed duplicated invocations of awgtpman *for each model* (not just TMS). This was quickly and easily resolved with a

sudo pkill awgtpman

which killed all of the awgtpman processes. Monit -- the program running on every front end which monitors various important processes, ensuring they're up and running -- then properly restarted only one process per model. This was confirmed by another grep of ps aux:

controls@h1susb6 ~ 0$ ps aux | grep awgtpman
root 28813 0.2 3.1 279784 190428 ? Ssl 16:04 0:04 /opt/rtcds/lho/h1/target/gds/bin/awgtpman -s h1susetmy -1 -l /opt/rtcds/lho/h1/target/gds/awgtpman_logs/h1susetmy.log
root 28823 0.3 3.1 279540 190132 ? Ssl 16:04 0:05 /opt/rtcds/lho/h1/target/gds/bin/awgtpman -s h1sustmsy -1 -l /opt/rtcds/lho/h1/target/gds/awgtpman_logs/h1sustmsy.log
root 28835 0.2 3.1 279408 189904 ? Ssl 16:04 0:04 /opt/rtcds/lho/h1/target/gds/bin/awgtpman -s h1iopsusb6 -4 -l /opt/rtcds/lho/h1/target/gds/awgtpman_logs/h1iopsusb6.log
controls 31462 0.0 0.0 6156 412 pts/0 S+ 16:32 0:00 grep --colour=auto awgtpman
controls@h1susb6 ~ 0$

We logged into h1seib6, h1pemey, and h1susauxb6, which also showed the same symptoms -- and we rectified the problem there too.

WHY DID THIS HAPPEN?
/etc/rc.local is a local startup script that's run only when the computer is hard-booted/power-cycled, like this afternoon -- which rarely happens, believe it or not (typically it's just the front-end process that gets restarted). This very low-level shell script is hosted on the h1boot server, so a change to it immediately gets propagated to every front end. This script had recently been modified to invoke all models' front-end-process startup scripts on that given front end BEFORE Monit is turned on.

The problem is that both the startup scripts and Monit start awgtpman processes, but they do it in *different*, *independent* ways. Regardless of the order in which Monit or the model start scripts are invoked, two awgtpman processes would get started per model.

The change to the /etc/rc.local script is a temporary fix. The motivation for the fix is unknown to Jim. *COUGH*.

What needs to happen is a permanent change to the model start scripts, such that they call awgtpman in the same way that Monit does. Then, because Monit checks for the existence of an awgtpman started by its own method, it will not fire off a new process. This requires a change to the RCG code generator, which writes the front-end model startup scripts. Such a change should then be tested extensively on the DAQ Test Stand (or some other non-observatory location), then released to the sites as a tagged version of the RCG code, which is then installed at a well-determined time that is known not to interfere with current activities.
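
Until such a change is released, the duplicate check that Jim did by eye could be sketched as a small shell helper. This is only a sketch under stated assumptions: the function name find_duplicate_awgtpman is hypothetical (it is not part of the RCG or GDS tools), and it assumes the model name is always the argument that follows -s on the awgtpman command line, as in the ps listing in this entry.

```shell
#!/bin/sh
# Hypothetical helper (not part of RCG/GDS): report models that have
# more than one awgtpman process.
# Reads ps-style lines on stdin and prints "<model> <count>" for every
# model seen more than once. Assumes the model name is the argument
# following "-s", as in the listing in this log entry.
find_duplicate_awgtpman() {
    awk '
        {
            seen = 0
            for (i = 1; i < NF; i++) {
                # Only count lines whose command is awgtpman itself,
                # which skips the "grep awgtpman" line.
                if ($i ~ /(^|\/)awgtpman$/) {
                    seen = 1
                } else if (seen && $i == "-s") {
                    count[$(i + 1)]++
                    break
                }
            }
        }
        END {
            for (m in count)
                if (count[m] > 1) print m, count[m]
        }
    ' | sort
}

# Example use on a front end:
#   ps aux | find_duplicate_awgtpman
```

No output means every model has at most one awgtpman. Any line printed names a model with duplicates, in which case the sudo pkill awgtpman described above, followed by Monit's restart, is the remedy used in this entry.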