The problem found with the diagnostic board was that the bottom DB25 connector bundle was loose. This is due to the hardware involved, which has no jack screws to allow an adapter to be screwed into the mating connector and the cable into the connector.
The connector/cable combination was re-seated. Changing from remote to local was then indicated on the MEDM screen, whereas previously it had remained on "LOCAL".
This closes work permit 7159.
Richard / Peter
Turned the Kobelco ON at 7:45 am local time. Working through the vent procedure, turning off/valving out equipment to prepare for the slow vent today.
To verify that the front ends could be started, I first started all the non-Dolphin FECs (except for the mid station PEMs); there were no problems. Before starting h1psl0 I consulted with Peter King to see if there would be any PSL issues when I did so; he said there would be none.
I noticed a timing error in the corner station and found it to be the IO Chassis for h1seib3 (ITMX), which was powered down. Its front panel switch was in the OFF position. Richard and I think this was accidentally switched much earlier, and only caused a problem after the computer was power cycled. Before powering this IO Chassis up, I first powered down the two AI chassis because of the issue with the 16-bit DACs outputting voltage when the IO Chassis is ON and the FEC is OFF (after h1seib3 was later operational I powered the AI chassis back on).
I restarted all the Dolphin FECs, using IPMI for the end stations and the front panel switches for the MSR. Many FECs started with IRIG-B errors, in both the positive and negative directions. We know from experience that some of these take many minutes to clear, so I continued with the non-FEC restarts.
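While waiting for those errors to clear, a small watch script can save repeated checks of the CDS overview. Below is a minimal sketch only: it assumes pyepics is available, and the IRIG-B offset channel pattern (H1:FEC-<dcuid>_IRIGB_TIME_DIFF), the DCU IDs, and the error threshold are placeholders that would need to be replaced with the real values.

import time
from epics import caget

FEC_DCUIDS = [91, 92, 93]     # hypothetical DCU IDs for the restarted FECs
THRESHOLD_US = 10.0           # assumed acceptable IRIG-B offset (microseconds)

def irigb_errors():
    """Return {dcuid: offset} for FECs whose IRIG-B offset is still out of range."""
    bad = {}
    for dcuid in FEC_DCUIDS:
        # channel name pattern is an assumption, for illustration only
        offset = caget("H1:FEC-%d_IRIGB_TIME_DIFF" % dcuid)
        if offset is None or abs(offset) > THRESHOLD_US:
            bad[dcuid] = offset
    return bad

while True:
    remaining = irigb_errors()
    if not remaining:
        print("all IRIG-B errors cleared")
        break
    print("still waiting on:", remaining)
    time.sleep(60)            # these errors can take many minutes to clear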
I consulted with Jim Warner on starting the end station BRS computers (it was OK to do so) and the HEPI Pump Controllers (we will leave them till Monday).
I went to EY, the first of many trips as it turned out. I powered up h1ecaty1 and h1hwsey. I noticed that h1brsey was already powered up, but its code does not appear to be running.
Back in the control room, I noticed all EY FECs had the same large positive IRIG-B error, indicating a problem with the IRIG-B Fanout. Back at EY I confirmed the IRIG-B fanout was reporting the date as mid June 1999. After some issues, I power cycled the IRIG-B chassis and rebooted the FECs. The front ends were now running correctly.
At this point I ran out of time. There is a timing issue with h1iscex, and the Beckhoff timing fanout is reporting an error with the fourth Duotone slave, which I suspect is h1iscex.
To be done:
Start mid station PEM FECs.
Power up h1ecatx1 and h1hwsex.
Start Beckhoff slow controls code on h1ecat[x,y]1. Start HWS code on h1hwse[x,y].
Investigate Duotone timing error at EX, get h1iscex running.
Start HEPI pump controllers.
Start digital video servers.
Start BRS code.
Start PSL diode room Beckhoff computer.
At approximately 8:12 this morning the site had a power outage. The GC and CDS UPS logs show mains power shutting off and returning to normal within a 16 second span. This of course means anything not on UPS shut down. Bubba was able to log in and verify FMCS was functioning fine. I came to the site and verified that, despite the alarms we were receiving, no Fire Pumps were running. I have restarted the EPICS interface to FMCS to try to eliminate the text alarms.
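As a quick sanity check after restarting that interface, something like the following could confirm the FMCS EPICS channels are connected again and no longer INVALID. This is only a sketch: the channel names are placeholders, not the real FMCS channel list, and it assumes pyepics.

from epics import PV

# placeholder names -- substitute the real FMCS fire-pump channels
FMCS_CHANNELS = [
    "H0:FMC-CS_FIRE_PUMP_1_STATUS",
    "H0:FMC-CS_FIRE_PUMP_2_STATUS",
]

for name in FMCS_CHANNELS:
    pv = PV(name)
    value = pv.get(timeout=5.0)
    # a None value or INVALID severity (3) means the texter will keep alarming
    print("%s  value=%s  severity=%s" % (name, value, pv.severity))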
I also started the 2FA machine so others could log in and monitor their systems.
There was a fire near the substation that provides our site power. I do not know whether whatever caused the power outage also caused the fire, whether the fire caused our outage, or whether they were unrelated; it is not likely that they were unrelated. I will attempt to contact electrical dispatch Monday morning to see if they know.
Kyle is on site checking vacuum. Robert is crunching data and everything seems stable.
Dave will probably come in later today to get the front ends running again.
It looks like the only "hiccup" experienced by the vacuum system following the site-wide power outage was that IP6's controller defaulted to a "standby" state, i.e. High Voltage was disabled -> I re-enabled it. The vacuum bake ovens isolated from the roughing pumps at the loss of power, but the turbo pumps all spun down. As such, the bake load contents were exposed to a Torr*L or two of turbo exhaust, but at least reversed viscous flow was prevented. Both ovens remained above 100 C and the parts should clean up with the turbos now restarted. The RGAs for the VBOs were isolated at the time and should not be affected by the spun-down turbos. 1100 hrs local -> Kyle leaving site now. Robert S. and Richard M. still on site.
Richard and Jonathan have fixed remote login to CDS.
h1tw1 is reporting a failed power supply on its RAID, presumably the one connected to facility power. I can confirm the RAID is operational and raw minute trend files are being written, so the second PS is OK.
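For reference, a check like the one below is one way to confirm the trend writer is still keeping up. It is only a sketch: the frames directory path is an assumption, not the actual location on h1tw1.

import os
import time

TREND_DIR = "/frames/trend/minute_raw"   # hypothetical path to the raw minute trends

newest = 0.0
for root, _dirs, files in os.walk(TREND_DIR):
    for f in files:
        newest = max(newest, os.path.getmtime(os.path.join(root, f)))

age_minutes = (time.time() - newest) / 60.0
# if the newest file is only a few minutes old, the writer (and the surviving
# power supply) is doing its job
print("newest minute-trend file is %.1f minutes old" % age_minutes)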
The cell phone texter is waiting for the FMCS channels to restart. This is imminent, so I'll keep it running for now.
FMCS access still not working for me. Not urgent.
I am able to see FMCS from my computer at home; all FMCS systems have restarted and are working their way back to normal.
We had what appears to be a site-wide power glitch at 08:12 PDT this morning. Both the CDS and GC UPS systems report being on battery power for about 6 seconds. Systems on UPS power rode through this; all other systems are down.
Vacuum controls are OK; the MEDM overview screen is attached. The DAQ is also OK.
Systems that are down include the end station Beckhoff, all front ends, and FMCS. We are getting multiple text alarms due to these systems being INVALID.
Here is the GC-UPS system's report:
Subject: lookout UPS gc-osb Power Failure !!!
lookout UPS gc-osb Power Failure !!!
APC : 001,035,0858
DATE : 2017-09-16 08:12:37 -0700
HOSTNAME : lookout
VERSION : 3.14.12 (29 March 2014) debian
UPSNAME : gc-osb
CABLE : Ethernet Link
DRIVER : SNMP UPS Driver
UPSMODE : Stand Alone
STARTTIME: 2017-08-22 08:29:07 -0700
STATUS : ONBATT
LINEV : 120.0 Volts
LOADPCT : 26.0 Percent
BCHARGE : 100.0 Percent
TIMELEFT : 209.0 Minutes
MBATTCHG : 15 Percent
MINTIMEL : 5 Minutes
MAXTIME : 0 Seconds
MAXLINEV : 121.0 Volts
MINLINEV : 120.0 Volts
OUTPUTV : 120.0 Volts
SENSE : Unknown
ITEMP : 22.0 C
BATTV : 211.0 Volts
LINEFREQ : 60.0 Hz
LASTXFER : Unacceptable line voltage changes
NUMXFERS : 4
XONBATT : 2017-09-16 08:12:31 -0700
TONBATT : 6 Seconds
CUMONBATT: 6556 Seconds
XOFFBATT : 2017-09-10 14:13:57 -0700
SELFTEST : OK
STESTI : 168
STATFLAG : 0x05060010
EXTBATTS : 20
BADBATTS : 0
END APC : 2017-09-16 08:12:37 -0700
Here are the CDS UPS reports (20 seconds on battery power):
Name : ups-msr-0
Location : LHO MSR
Contact : CDS Administrators
http://ups-msr-0.cds.ligo-wa.caltech.edu
http://10.99.3.10
Serial # : ZA0522010987
Device Ser #: PD0615240264
Date: 09/16/2017
Time: 08:12:29
Code: 0x0109
Warning - UPS: On battery power in response to an input power problem.
---------------------------------------------------------
Name : ups-msr-0
Location : LHO MSR
Contact : CDS Administrators
http://ups-msr-0.cds.ligo-wa.caltech.edu
http://10.99.3.10
Serial # : ZA0522010987
Device Ser #: PD0615240264
Date: 09/16/2017
Time: 08:12:49
Code: 0x010A
Informational - UPS: No longer on battery power.
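Aside: the GC-UPS report above is the usual apcupsd key/value dump, so fields like STATUS and TONBATT can be pulled out programmatically if we ever want the texter to summarize them. The snippet below is purely illustrative, not a tool in use on site.

def parse_ups_report(text):
    """Parse "KEY : VALUE" lines from an apcupsd-style report into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _sep, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

sample = """STATUS   : ONBATT
TONBATT  : 6 Seconds
BCHARGE  : 100.0 Percent"""
info = parse_ups_report(sample)
print(info["STATUS"], info["TONBATT"])   # -> ONBATT 6 Seconds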
For reasons unknown, we are unable to initiate new remote login sessions to lhocds and cdslogin. Bubba is currently logged in; presumably he started his session prior to the power glitch. Jonathan suspects the cdsadminctrl machine.
The good news is Bubba is able to remotely view the FMCS status.
The bad news about the remote access issue is that I am currently unable to turn off the cell phone texter, and we are getting multiple messages. This will quiet down as the alarms age.
The CDS site overview MEDM screen is attached.