H. Radkins, J. Kissel

Similar to what we've seen before on front-end computers for HAM2 -- both SEI and SUS -- (see LHO aLOGs 11481, 10849, 10375, 8964, 8424, 7385), we found the h1seiham23 computer inexplicably unable to drive past its IOP model this morning. As with each of the other 6 times, we had to kill the user front-end processes, restart the IOP process, and reset all watchdogs to regain actuation. A good guess at the source this time would be the problems with h1sush2b this weekend, but a trend of the CDS State Word DACKILL bit shows the indicative status changed at Aug 9 2014 22:30 UTC (15:30 Saturday afternoon, after Dave finished his bootfest with h1sush2b). So ... still no smoking gun for this problem, nor any better solution than the time-honored, sledge-hammer fix: restarting all models on the computer.

We STILL need a better indication of this problem. The DK bit in the CDS word often gets ignored by CDS admins, because they assume it's the user or IOP DACKILLs that have tripped after their reboots (as expected). SEI / SUS commissioners -- though the CDS state word is now on everyone's overview screen -- lose track of which bits mean what; the only non-invasive power they have is to reset all the watchdogs they have; and there are often things like IPC errors (the bit right next to it) that don't affect the performance of the platform, so the word cries wolf often. Presumably, to better indicate the problem, we have to identify the problem to begin with...

Details:
-------------------
- Tried turning on the ISI, saw the actuators trip. A plot of the trip shows a clear unstable ramp-up of output -- but no sign of movement from the sensors.
- Found no DAC outputs reported by the IOP, even though the user model had requested them.
- The "DK" bit shows red, but all MEDM USER and IOP DACKILL controls have been reset. The bit went red at Aug 9 2014 22:30 UTC (15:30 Saturday afternoon, after Dave finished his bootfest with h1sush2b).
- Checked h1seih23 dmesg:
      controls@opsws4$ ssh h1seih23
      controls@h1seih23$ dmesg
  This reports some node errors, but unfortunately the time stamps are meaningless. In the "recent" history, it reports things like
      "Session for node 52 is disabled - Status = 0x5"
      "Heartbeat alive-check for node=52 failed (cnt=6614 state=0x1 deb=0 val=0)."
      "Session for node 20 is disabled - Status = 0x5"
      "Session for node 20 is disabled - Status = 0xf"
  though, according to T1400026, no front-end process is assigned to DCUID 52, and h1ascimc.mdl is using DCUID 20, so these should be totally unrelated.
- Checked the /proc/h1iopseih23/status file:
      controls@opsws4$ ssh h1seih23
      controls@h1seih23$ vi /proc/h1iopseih23/status
  This reports that the DAC FIFO status for both DACs is OK:
      DAC #0 16-bit fifo_status=2 (OK)
      DAC #1 16-bit fifo_status=2 (OK)
- Every one of the previous aLOGs of this problem says "kill user models, restart IOP process, restart user models."
- Killed all user models, restarted the IOP model, restarted the user models, cleared all watchdogs, and the problem cleared up.
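Since checking /proc/<iop>/status by hand is easy to get wrong, here is a minimal sketch of scripting the DAC FIFO check. It assumes exactly the line format quoted above ("DAC #0 16-bit fifo_status=2 (OK)") and is just an illustration, not a supported CDS tool.

```python
import re

# Sketch: scan the text of a /proc/<iop>/status file for DAC FIFO status
# lines of the form "DAC #0 16-bit fifo_status=2 (OK)" and report any DAC
# that is not reporting OK. Assumes exactly this line format.
FIFO_RE = re.compile(r"DAC #(\d+).*fifo_status=(\d+) \((\w+)\)")

def bad_fifos(status_text):
    """Return a list of (dac_number, status_code) for DACs not reporting OK."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in FIFO_RE.finditer(status_text)
            if m.group(3) != "OK"]

# "BAD" below is a made-up non-OK status string, for illustration only.
sample = """DAC #0 16-bit fifo_status=2 (OK)
DAC #1 16-bit fifo_status=3 (BAD)"""
print(bad_fifos(sample))  # [(1, 3)]
```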
I raised alarm levels for all LVEA dust monitors (#'s 2, 3, 4, 13 & 15) except for those which are still monitoring clean areas (#'s 5, 6 & 16). Raised 0.3um levels to 1200 & 1500 counts (minor & major, respectively) and 0.5um levels to 600 & 800 counts.
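As a sketch of how these new thresholds classify a reading, a small lookup table; the count values come from this entry, while the function itself is purely illustrative and not part of the alarm system.

```python
# New LVEA dust-monitor alarm thresholds from this entry (counts):
# 0.3um: minor 1200, major 1500; 0.5um: minor 600, major 800.
THRESHOLDS = {
    "0.3um": (1200, 1500),  # (minor, major)
    "0.5um": (600, 800),
}

def alarm_level(size, counts):
    """Classify a particle count as 'ok', 'minor', or 'major'."""
    minor, major = THRESHOLDS[size]
    if counts >= major:
        return "major"
    if counts >= minor:
        return "minor"
    return "ok"

print(alarm_level("0.3um", 1300))  # minor
```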
I was not able to change the EX dust monitors (because they are new monitors). I set the EY dust monitors to the same levels as the LVEA ones above.
Laser Status
PMC
FSS
ISS
Ran purge air compressors for ~10 minutes @ ~0945 hrs. local
The h1seib1 front-end computer had been locked up since Aug. 10, 18:46 PDT. Restarted the computer this morning at 08:35 PDT. IRIG-B timing was bad until 08:39. No other computers glitched when restarting this Dolphin-connected front end.
model restarts logged for Sun 10/Aug/2014
2014_08_10 08:32 h1fw1
unexpected restart of h1fw1, other writer was running, no data loss.
(Borja)
This entry is a summary of the manual measurement results of the ETMY charge, taken before and after Rai's ionized gas injection discharge run1 and run2.
The measurement technique is the standard procedure of injecting a 4 Hz sinusoidal signal with an amplitude of 91.5 V into each quadrant of the ESD (with the exception of quadrant LL, because it did not respond to the excitation; more about this in another entry). We then monitor the deflection of the ETMY in both pitch and yaw by looking at the oplev, which was carefully centred before any measurement was taken. The deflection was measured in diaggui as a power spectrum of pitch and yaw, with BW = 0.01 Hz over the range 1 - 5 Hz, using 2 averages (for the before measurements) and 3 averages (for the after-discharge measurements).
During the measurements, the coherence between excitation and pitch and between excitation and yaw was monitored to be sure that the excitation was being observed. The phase (in degrees) of the transfer function between the excitation and the oplev pitch and yaw was measured to take into account the sign of the deflection.
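The normalisation and sign bookkeeping described above can be sketched as follows; the numbers and the simple +/-90 degree phase cut are illustrative assumptions, not the exact analysis used for the attached results.

```python
import math

# Sketch of the convention described above: the oplev deflection amplitude
# at the 4 Hz drive is divided by the 91.5 V excitation amplitude, and the
# transfer-function phase decides the sign of the deflection.
# The +/-90 degree phase cut is an illustrative assumption.
EXC_AMPLITUDE_V = 91.5  # drive amplitude used in the measurement

def normalised_deflection(amplitude_urad, tf_phase_deg):
    """Return signed deflection per volt [urad/V] from amplitude and phase."""
    sign = 1.0 if math.cos(math.radians(tf_phase_deg)) >= 0 else -1.0
    return sign * amplitude_urad / EXC_AMPLITUDE_V

# Example with made-up numbers: 0.002 urad line amplitude, 180 deg phase
print(round(normalised_deflection(2e-3, 180.0), 8))  # -2.186e-05
```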
Attached to this entry is a document with the results for each measurement. I also provide the combined plots of the results in the standard form "normalised deflection [µrad/V] vs V BIAS", where the deflection is normalised by dividing it by the excitation amplitude (91.5 V). The plots contain quite a lot of information (explained in the plot legends), so a zoomed version around the Veff values (the deflection zero crossings) is also included.
Next I show a table summarising, for each quadrant, the pitch and yaw slopes and the Veff values.
|                            | UL before | UL after1 | UL after2 | UR before | UR after1 | UR after2 | LR before | LR after1 | LR after2 |
| Veff PITCH [V]             | 112       | 34        | 43        | 52        | -         | 11        | 123       | 33        | 31        |
| PITCH slope [10^-7 µrad/V] | -2.585    | -2.61     | -2.63     | 1.89      | -         | 2.35      | -2.65     | -2.63     | -2.655    |
| Veff YAW [V]               | 125       | 72        | 77        | 103       | -         | 1         | 144       | 48        | 54        |
| YAW slope [10^-7 µrad/V]   | -2.21     | -2.30     | -2.25     | 2.34      | -         | 2.54      | 2.32      | 2.37      | 2.365     |
Looking at the Veff values in the table above, we notice that the first discharge run reduced the charge (as measured by Veff) by a factor of between 2 and 4, depending on the quadrant and the type of deflection (pitch or yaw). The second discharge run, however, had a much smaller effect and in some cases even showed an increased charge.
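Since Veff is the zero crossing of the normalised-deflection-vs-bias line, it follows directly from a straight-line fit as -intercept/slope. A minimal sketch with made-up data points (not the measured values above):

```python
import numpy as np

# Sketch: Veff is the bias voltage where the normalised deflection crosses
# zero, i.e. -intercept/slope of a straight-line fit of deflection vs bias.
# The data points below are fabricated for illustration only.
bias = np.array([-200.0, -100.0, 0.0, 100.0, 200.0])  # V BIAS [V]
deflection = 2.5e-7 * (bias - 50.0)                   # fake line with Veff = 50 V

slope, intercept = np.polyfit(bias, deflection, 1)
veff = -intercept / slope
print(round(veff, 3))  # 50.0
```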
(R. Weiss)
Rai has suggested that the results may indicate that, in the second ionized gas discharge run, the gate valve may not have been opened into a clear aperture, even though the measured ion currents were larger. It would be advisable to learn the best injection method, for which it all needs to be done a third time. Rai thinks that the results so far are useful enough to decide whether we want to build more of the ionizers. Another very important input to the final decision will be the effect that the green laser light has on the charging of the ETMY.
G. Moreno, K. Ryan, B. Sorazu, J. Worden, R. Weiss

Preliminary results from the first discharging indicate the charge has been reduced to 1/4 of the originally measured value. As we were dissatisfied with the negative-to-positive ion ratio and the flow rates in the initial attempt, we made a second injection on August 8 with 1/2 the flow rate and about 3 times the ion current into the test mass chamber. The charge on the ETMY has not been measured yet. The new flow conditions are closer to those in the original experiment done at MIT, which reduced the charge to 0 ± 10% of its initial value. Again, once all the data is together, a more informative posting of the results will be made.
model restarts logged for Sat 09/Aug/2014
2014_08_09 01:52 h1fw0
2014_08_09 04:02 h1fw0
2014_08_09 04:07 h1fw0
2014_08_09 04:12 h1fw0
2014_08_09 04:16 h1fw0
2014_08_09 04:20 h1fw0
2014_08_09 04:24 h1fw0
2014_08_09 05:16 h1fw0
2014_08_09 05:22 h1fw0
2014_08_09 05:31 h1fw0
2014_08_09 05:41 h1fw0
2014_08_09 05:53 h1fw0
2014_08_09 05:56 h1fw0
2014_08_09 06:07 h1fw0
2014_08_09 06:10 h1fw0
2014_08_09 06:16 h1fw0
2014_08_09 06:20 h1fw0
2014_08_09 06:30 h1fw0
2014_08_09 06:52 h1fw0
2014_08_09 07:04 h1fw0
2014_08_09 07:09 h1fw0
2014_08_09 07:17 h1fw0
2014_08_09 07:20 h1fw0
2014_08_09 07:25 h1fw0
2014_08_09 07:33 h1fw0
2014_08_09 07:37 h1fw0
2014_08_09 07:42 h1fw0
2014_08_09 07:50 h1fw0
2014_08_09 07:53 h1fw0
2014_08_09 08:01 h1fw0
2014_08_09 08:06 h1fw0
2014_08_09 08:12 h1fw0
2014_08_09 08:16 h1fw0
2014_08_09 08:23 h1fw0
2014_08_09 08:28 h1fw0
2014_08_09 08:31 h1fw0
2014_08_09 08:36 h1fw0
2014_08_09 08:40 h1fw0
2014_08_09 08:48 h1fw0
2014_08_09 11:48 h1iopsush2b
2014_08_09 11:48 h1susim
2014_08_09 11:54 h1iopsush2b
2014_08_09 12:11 h1iopsush2b
2014_08_09 12:11 h1susim
2014_08_09 14:03 h1fw0
2014_08_09 14:11 h1fw0
2014_08_09 14:28 h1fw0
2014_08_09 14:42 h1fw0
2014_08_09 14:59 h1fw0
Unexpected restarts of h1fw0 due to disk problems. Restart of crashed h1sush2b. Restarts of h1fw0 following disk repair.
We'll stay at this power level until we confirm the amount of loss in the IMC.
Alexa, Kiwamu
We performed another ring down measurement using the REFL port. The data still did not make sense.
We probably should switch the approach to something else, e.g. a cavity pole measurement (see alog 5429).
The setup:
Since we had the PRM reflection on ISCT1, we used this light instead of the one on IOT2L (see alog 13280). First, by wiggling PRM and looking at the REFL analog camera, we confirmed that there was no clipping in the REFL path. In fact, the beam already looked quite well centered on the REFL camera from the beginning, thanks to our alignment effort (see Alexa's previous alog 13317). We intentionally misaligned PR2 to avoid any interference from the main interferometer.
We moved a PDA55, which had been on ISCT1 for triggering the REFL shutter, to a point right in front of the 3-f broadband PD. In the process of aligning the beam, we touched a steering mirror in front of the broadband PD. We then made sure that the gain setting was at its minimum of 0 dB. Since at this point the light was still too bright for the PDA55, we placed an ND1 filter in the path to cut down the laser power. This resulted in a DC voltage of about 440 mV when the IMC was in lock -- therefore the dark offset of the PD was negligible this time.
The results:
We repeated the same measurement as the previous one, but with the PRM reflection. We flipped the polarity of the fast path in the IMC common mode board to rapidly unlock the cavity. In every measurement we obtained a 1/e time of roughly 20 µs, coarsely measured with the cursor on a digital oscilloscope. Obviously, the results don't make sense. Attached is a photo of the oscilloscope display.
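As a rough sanity check on numbers like these, for a simple single-pole cavity response the 1/e decay time maps to an equivalent pole frequency via f_pole = 1/(2*pi*tau). A sketch of that conversion; whether this simple model applies to the REFL signal here is an assumption (and is exactly what is in question):

```python
import math

# Sketch: convert a measured 1/e ringdown time to the equivalent pole
# frequency of a simple single-pole cavity response, f_pole = 1/(2*pi*tau).
# Assumes a single-pole model; whether that applies to REFL is the question.

def pole_from_ringdown(tau_seconds):
    """Equivalent single-pole frequency [Hz] for a 1/e decay time."""
    return 1.0 / (2.0 * math.pi * tau_seconds)

print(round(pole_from_ringdown(20e-6)))  # 7958
```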
In Y-end from 1735-1745 hrs. local, X-end 1755-1800, LVEA 1805-1810
Alexa, Kiwamu, Borja, Stefan

With the IMC locking reliably and the input beam hitting the BS, we:
- Found a good alignment for PRM and got the REFL beam centered on ISCT1.
- Tweaked the PR2 alignment to get fringes in PRX (ITMY misaligned).
- Locked PRX on the carrier and tweaked down the REFL power with PR2 and PRM.
- Installed a camera looking down into HAM5 from HAM4. We could clearly find the beam on the Faraday cage.
- Using SR2, steered the beam through the Faraday and verified that it arrives in HAM6.

Attached is an alignment snapshot.

Remarks:
- While the input beam through IM3, IM4 and PR2 might still be going a bit up and down, after PR3 it should be pretty good (we didn't change the PR3, BS, ITMX and SR3 alignment compared to alog 12816).
- We didn't try to tweak the SRM yet.
Greg, Dan, Dave
h1fw0 is back up and running.
The problem was with disk9 in the RAID raid-dcs-h1a. Its status LED was flashing yellow instead of steady green, but it appears to have only partially failed, and the RAID continued to try to use it. The result was an unstable file system which could not keep up with the frame writing.
Step 1 was to run the Oracle 'guds' command to provide diagnostics.
The second step was to stop reading this file system and only have h1fw0 write to it; still unstable.
The third step was to power cycle the Solaris box h1ldasgw0, remount the file system on h1fw0, and restart the frame writer; still unstable.
The fourth and most drastic step was to walk to the LDAS server room and physically remove the offending disk #9 from the RAID (hot removal). This forced the RAID to stop trying to use disk9 and to start using the hot-swap spare.
At the time of writing h1fw0 has been running for 20 mins, which is longer than it had managed since 4am this morning.
Came on site to resolve this issue. The console for h1sush2b shows many errors and is frozen. The first recovery attempt is a power cycle of the front-end computer.
First, removed h1sush2b from the Dolphin IPC network using h1susb123 as a remote disabler; this worked.
Power cycled the computer and let the models start themselves. As I suspected, h1sush2b was actively corrupting the DAQ data streams to the concentrator. As soon as h1sush2b was powered down, all front ends' DAQ data became good again.
The IOP model started with a large IRIG-B timing error, DAQ status 0x4000. Stopped h1susim and restarted h1iopsush2b; still got IRIG-B errors. Checked that the IRIG-B signal is OK; suspect the problem is at the IO chassis end.
Step 2 was a full power-down of both the computer and the IO chassis. Stopped all models, and removed h1sush2b from Dolphin using itself to do this. Powered down h1sush2b. In the CER, turned the front panel switch to OFF; this did nothing. Disconnected the 24V DC power cable at the power strip, noting that this was the only thing plugged into this strip (the two other IO chassis use a different strip) -- maybe a hint. Powered the IO chassis back up, switched it to ON, and waited for the timing slave to sync. In the MSR, powered up h1sush2b; both models started with no errors.
So, not sure why h1sush2b died, but I suspect a glitch in the CER at 00:08 this morning.
for the systems whose DAQ data was corrupted, the time period we have no data is:
00:06 - 11:50 PDT August 9th.
(Stefan, Kiwamu, Koji, Alexa)
We repeated the ringdown measurement with the low gain setting (0 dB compared to the nominal 30 dB) and the ND0.6 filter removed (we still need to eventually revert to the nominal configuration). This time we found a ringdown time of approximately 25 µs, which is still longer than we expected. We want to repeat this measurement with REFL from ISCT1; however, we could not find the beam for a while and then got distracted with other things... This is still to be done.
We adjusted the gains, phase rotation, I-Matrix and IMC WFS matrix. We can now engage the WFS; this increased the trans PD signal to ~1200 and was stable. I have attached a screenshot of the current configuration.
We increased the PSL power to 1.9W.
The beam is too high on MC3 by 1.5 mm and horizontally off by 2.138 mm in the direction away from IOT2L. This result is almost the same as the previous measurement in alog 8943, so we are fine with this off-centering.
(Richard, Cyrus, Alexa, Kiwamu)
GigE camera 08 of PR2 was first rebooted and then had to be adjusted because the image was clipped.