Ryan C, Jonathan, Dave:
Starting around 18:28 Sun 15jan2025, the control room reported name resolution issues within CDS. The GC WIFI also went offline.
The CDS alarm system froze up at 18:28, which agrees with the time the other services went offline.
Jonathan is reporting issues contacting GC DNS and management machines, indicating this could be a GC issue.
Jonathan is heading to the site to investigate.
From the control room perspective:
teamspeak continues to run on the verbal machine.
phones continue to work
alog is accessible if the IP address is used, not the name.
scripts are failing if they need to resolve names. This is preventing squeezer work, and H1's range is down to the 80s.
the alarm/alert system cannot resolve twilio's address, so no alarm texts/emails can be sent.
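The workaround noted above (reaching the alog by IP address when its name will not resolve) could be built into scripts as a fallback. The following is only a minimal sketch under stated assumptions, not how any CDS script is actually written: the function name resolve_with_fallback, the table FALLBACK_ADDRESSES, and the hostname and address in it are hypothetical placeholders.

```python
import socket

# Hypothetical table of last-known addresses for critical hosts.
# The hostname and address below are placeholders, not real site values.
FALLBACK_ADDRESSES = {"alog.example.org": "192.0.2.10"}


def resolve_with_fallback(hostname, fallback=None, resolver=socket.gethostbyname):
    """Return (address, source), where source is "dns" or "fallback".

    Tries normal name resolution first. If the lookup fails, returns the
    statically configured address for the host when one exists; otherwise
    the original lookup error is re-raised.
    """
    if fallback is None:
        fallback = FALLBACK_ADDRESSES
    try:
        return resolver(hostname), "dns"
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        if hostname in fallback:
            return fallback[hostname], "fallback"
        raise
```

A script using this could log a warning whenever the source is "fallback", so that a DNS outage is noticed even though the script keeps running. A static table can go stale if addresses change, so it would suit only a small set of critical hosts.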
The issue has been resolved by power cycling the sw-osb163-0 switch. DNS and a few other key services hang off this switch.
I restarted the switch around 20:14 local time. Ryan C. confirms that he has access to the alog. I can get to the management machines and the DNS servers, both locally and via offsite routes.
Alarms restarted itself at 20:20 and I restarted alerts at 20:54. Test messages confirmed these services are working correctly.
Opened FRS34439 to cover this, specifically how it impacted control room operations.