Currently h1oaf0 has been stable for 22 hours following the one-stop cable-transceiver replacement (as suggested by Daniel).
When the oaf stopped driving the DAC, the h1iop model's proc status file showed a very large value for adcHoldTimeEverMax (in the 90s), while most systems show this value at around 17 µs.
On the assumption that this value is an indicator of a failing PCI-bus extender transceiver, I have written a script that scans all the front end computers and reports it. The script was run at 10:10 PST and the results are tabulated below.
Note that all values are in the 16-21 µs range except for the h1suse[x,y] systems, which are in the 70s. The end station SUS machines are the newer type, and this is a known issue unrelated to possibly failing one-stop fibers.
IOP model | adcHoldTimeEverMax (µs) |
h1iopsush2a | 17 |
h1iopsush2b | 18 |
h1iopsush34 | 19 |
h1iopsush56 | 20 |
h1iopsusauxh34 | 18 |
h1iopsusauxh56 | 18 |
h1iopsusauxh2 | 18 |
h1iopsusauxb123 | 19 |
h1ioppsl0 | 17 |
h1iopsusex | 74 |
h1iopseiex | 21 |
h1iopiscex | 18 |
h1iopsusauxex | 20 |
h1iopsusey | 71 |
h1iopseiey | 20 |
h1iopiscey | 18 |
h1iopsusauxey | 19 |
h1iopoaf0 | 17 |
h1iopsusb123 | 17 |
h1iopseib1 | 18 |
h1iopseib2 | 18 |
h1iopseib3 | 21 |
h1ioplsc0 | 17 |
h1iopseih16 | 19 |
h1iopseih23 | 16 |
h1iopseih45 | 17 |
h1iopasc0 | 17 |
h1ioppemmx | 18 |
h1ioppemmy | 19 |
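
For reference, the kind of check described above can be sketched in a few lines of Python. This is not the script that produced the table, only a minimal illustration under stated assumptions: that the proc status text holds one `name=value` (or `name: value`) entry per line, and that a value more than twice the median across models is worth a second look. The function names `parse_hold_time` and `flag_outliers` and the factor of 2 are hypothetical choices made for this example.

```python
import re
from statistics import median

FIELD = "adcHoldTimeEverMax"


def parse_hold_time(status_text, field=FIELD):
    """Return the integer value of `field` in the status text, or None if absent.

    Assumption: entries look like 'name=value' or 'name: value', one per line.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(field) + r"\s*[=:]\s*(\d+)", re.MULTILINE
    )
    match = pattern.search(status_text)
    return int(match.group(1)) if match else None


def flag_outliers(values, factor=2.0):
    """Return {model: value} for models whose value exceeds factor * median.

    The factor of 2 is an arbitrary illustrative threshold.
    """
    typical = median(values.values())
    return {model: v for model, v in values.items() if v > factor * typical}


# A subset of the values tabulated above.
sample = {
    "h1iopsush2a": 17,
    "h1iopsusex": 74,
    "h1iopseiex": 21,
    "h1iopsusey": 71,
    "h1iopoaf0": 17,
}
print(flag_outliers(sample))
```

On this subset only the two end station SUS IOP models are flagged, matching the note above. Using the median rather than the mean keeps the two known high readings from pulling up the reference value.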