Similar to Thu morning, the picket fence server data update slowly degraded overnight and eventually stopped updating.
All was working until 02:48 PDT, when some data cycles started taking about 35 seconds (a cycle normally takes about 3 seconds). This transition is shown in the top trend.
About an hour later, at 03:48, the long reads became more frequent and regular (bottom plot).
At 07:13 the updates stopped completely; this is the current situation.
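For reference, the kind of check that separates the normal ~3 second cycles from the ~35 second ones can be sketched as below. This is only an illustration: the function name, the 10 second threshold, and the sample times are assumptions, not part of the picket fence code or its EPICS channels.

```python
def slow_cycles(update_times_s, threshold_s=10.0):
    """Return (cycle_start_s, duration_s) for every cycle longer than threshold_s.

    update_times_s: times (in seconds) at which the server data updated.
    threshold_s: assumed cutoff between a normal (~3 s) and a slow (~35 s) cycle.
    """
    flagged = []
    for prev, curr in zip(update_times_s, update_times_s[1:]):
        duration = curr - prev
        if duration > threshold_s:
            flagged.append((prev, duration))
    return flagged


# Hypothetical update times: regular 3 s cycles with one 35 s cycle in between.
example_times = [0.0, 3.0, 6.0, 41.0, 44.0]
print(slow_cycles(example_times))
```

A count of flagged cycles per hour would distinguish the occasional long reads seen after 02:48 from the regular ones seen after 03:48.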
Thanks Dave - we'll follow up with you and Erik offline to see if we can figure out what's up. We are not seeing this at Stanford. (Also tagging SEI so that I see it.)
- Later update - yes we are seeing this at Stanford, see comments below :(
I ran the code on opslogin0 in non-EPICS mode (no caputs to the IOC) and it appears to be working. For the record here is the command line output:
/opt/rtcds/userapps/trunk/isi/h1/scripts/Picket-Fence/Picket_fence_code_v2.py:751: ObsPyDeprecationWarning: Deprecated keyword loglevel in __init__() call - ignoring.
self.seedlink_clients.append(SeedlinkUpdater(self.stream, myargs=self.args, lock=self.lock))
Downloading from server: cwbpub.cr.usgs.gov:18000
US_HLID:00BHZ, US_NEW:00BHZ, US_MSO:00BHZ
Downloading from server: pnsndata.ess.washington.edu:18000
UW_OTR:HHZ, UO_LAIR:HHZ
Here is the cleaned-up pickets dictionary being used (the commented-out BBB and NLWA entries are removed):
pickets= {
"HLID":{
"Latitude":43.562,
"Longitude":-114.414,
"Channel":"US_HLID:00BHZ",
"PreferredServer":"cwbpub.cr.usgs.gov:18000"
},
"NEW":{
"Latitude":48.264,
"Longitude":-117.123,
"Channel":"US_NEW:00BHZ",
"PreferredServer":"cwbpub.cr.usgs.gov:18000"
},
"OTR":{
"Latitude":48.08632,
"Longitude":-124.34518,
"Channel":"UW_OTR:HHZ",
"PreferredServer":"pnsndata.ess.washington.edu:18000"
},
"MSO":{
"Latitude":46.829,
"Longitude":-113.941,
"Channel":"US_MSO:00BHZ",
"PreferredServer":"cwbpub.cr.usgs.gov:18000"
},
"LAIR":{
"Latitude":43.16148,
"Longitude":-123.93143,
"Channel":"UO_LAIR:HHZ",
"PreferredServer":"pnsndata.ess.washington.edu:18000"
}
}
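The "Downloading from server" lines in the command line output group the channels by their PreferredServer. A minimal sketch of that grouping is below; it is not taken from Picket_fence_code_v2.py, the helper name channels_by_server is hypothetical, and the dictionary is an abbreviated copy with the coordinates left out.

```python
from collections import defaultdict


def channels_by_server(pickets):
    """Group picket channels by PreferredServer, keeping dictionary order."""
    grouped = defaultdict(list)
    for station in pickets.values():
        grouped[station["PreferredServer"]].append(station["Channel"])
    return dict(grouped)


# Abbreviated copy of the pickets dictionary (Latitude/Longitude omitted).
pickets = {
    "HLID": {"Channel": "US_HLID:00BHZ", "PreferredServer": "cwbpub.cr.usgs.gov:18000"},
    "NEW": {"Channel": "US_NEW:00BHZ", "PreferredServer": "cwbpub.cr.usgs.gov:18000"},
    "OTR": {"Channel": "UW_OTR:HHZ", "PreferredServer": "pnsndata.ess.washington.edu:18000"},
    "MSO": {"Channel": "US_MSO:00BHZ", "PreferredServer": "cwbpub.cr.usgs.gov:18000"},
    "LAIR": {"Channel": "UO_LAIR:HHZ", "PreferredServer": "pnsndata.ess.washington.edu:18000"},
}

for server, channels in channels_by_server(pickets).items():
    print(f"Downloading from server: {server}")
    print(", ".join(channels))
```

Three of the five pickets depend on cwbpub.cr.usgs.gov, so an outage of that one server removes most of the fence.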
I restarted picket fence on nuc5 by running /opt/rtcds/userapps/release/isi/h1/scripts/Picket-Fence/picket_epics.sh from a controls vnc remote display session and it is running normally again.
USGS sent an email at 10am reporting that one of the servers LHO uses (cwbpub.cr.usgs.gov) is being migrated. We are working on using a backup server in its place.
email:
The NEIC is in process of migrating services currently hosted at the Denver Federal Center to a new location. As a result, the availability of waveform services provided on the CWB 137.227.224.97 (cwbpb) are in flux. This may last for some time, possibly into early 2024.
Thank you very much for this monitoring. The new EPICS channels are coming in handy.
We also had a similar crash at Stanford and there is also a report of a restart at LLO this morning: LLO aLog 67616. The evidence seems to indicate the problem stems from the USGS server side.
Looking at the error messages on our local computer, it seems that the connection to the servers timed out, the picket fence then attempted an automatic restart, and the restart failed because lsim chokes when trying to filter an empty data vector. This error is unintended behavior in the filtering script, so I need to dig into what to do to make sure the restart works properly.
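One possible shape for the fix, as a rough sketch only: skip the filtering step when a station returns no samples, so that an empty vector never reaches lsim. The names filter_if_nonempty and demo_filter are hypothetical, and demo_filter is a stand-in for the real lsim-based filter, not the actual picket fence code.

```python
def filter_if_nonempty(samples, filter_fn):
    """Apply filter_fn to samples, or return None if there is nothing to filter.

    Returning None lets the caller skip this update and wait for the next one
    instead of crashing during an automatic restart.
    """
    if len(samples) == 0:
        return None
    return filter_fn(samples)


def demo_filter(samples):
    """Stand-in filter (mean removal). Like the real one, it fails on empty input."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


print(filter_if_nonempty([], demo_filter))
print(filter_if_nonempty([1.0, 2.0, 3.0], demo_filter))
```

The length check works for plain lists and for NumPy arrays alike; whether to skip the update or retry the connection on an empty vector is a separate design decision.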
More importantly, we need to ping our USGS friends to see if this is part of some maintenance situation.
Edgard