Reports until 11:38, Wednesday 23 December 2015
H1 CDS
david.barker@LIGO.ORG - posted 11:38, Wednesday 23 December 2015 (24415)
problems with LHO CDS backup systems

For reference, here is an email I sent today to lho-all reporting recent problems with the CDS backup systems, along with the short-term and long-term solutions:

Over the past few weeks we have developed problems with the CDS backup systems, namely:

  • /ligo file system has filled
  • backing up the front end boot server (h1boot) causes EPICS freeze-ups on the front ends and has on one occasion caused lock loss
  • tape backup hardware failed yesterday

These problems all stem from aging hardware (most of it is over 4 years old) and have appeared at an unfortunate time (during an observation run and just before a major holiday).
 
We have new file servers, disks and a tape robot on order and plan to replace all this aging hardware in January. The new hardware will provide much greater file-system and tape-backup capacity and speed. In the meantime we will get by with disk-to-disk-to-disk backups (each file on three disk systems, all protected with UPS power).
 
The main thing we can all do to help in the meantime is to ensure that all critical hand-edited files are under SVN version control, and that the repository is updated promptly whenever a file is modified (prompt commits maintain a good history of the changes made to a file and permit restoring its previous versions).
 
We should also refrain from writing large (many GB) files to the /ligo disk system.
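
For anyone who wants to check their own area before writing more data, here is a minimal sketch of how large files could be listed. It is only an illustration: the function name find_large_files and the size threshold are assumptions made for this example and are not part of any CDS tool.

```python
import os
import stat


def find_large_files(root, min_bytes):
    """Return (size_in_bytes, path) pairs for regular files under root
    that are at least min_bytes in size, largest first.

    Hypothetical helper for illustration only.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so linked files are not counted
                info = os.lstat(path)
            except OSError:
                # File vanished or is unreadable; skip it
                continue
            if stat.S_ISREG(info.st_mode) and info.st_size >= min_bytes:
                hits.append((info.st_size, path))
    hits.sort(reverse=True)
    return hits
```

For example, find_large_files("/ligo/home/some.user", 1024**3) would list files of 1 GiB or more under that directory (the path here is only a placeholder). Walking a large tree generates file-system load of its own, so it is best run on a single directory rather than on all of /ligo.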
 
I have a script called check_h1_files_svn_status which reports any outstanding local mods on critical IFO configuration and control files.
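
The script itself is not reproduced here. As a rough sketch of the same idea (not the actual check_h1_files_svn_status implementation), outstanding local modifications can be found by running `svn status` on the files of interest and reading the first column of each output line. The function names and status table below are assumptions made for this illustration.

```python
import subprocess

# First-column codes printed by `svn status` for items that differ from
# the repository. Property-only changes (second column) are ignored here.
OUTSTANDING = {
    "M": "modified",
    "A": "added",
    "D": "deleted",
    "R": "replaced",
    "C": "conflicted",
    "?": "unversioned",
    "!": "missing",
}


def parse_svn_status(text):
    """Turn `svn status` output into a list of (state, path) pairs."""
    results = []
    for line in text.splitlines():
        # Seven status columns plus one space come before the path
        if len(line) < 9:
            continue
        code = line[0]
        if code in OUTSTANDING:
            results.append((OUTSTANDING[code], line[8:]))
    return results


def outstanding_mods(paths):
    """Run `svn status` on the given working-copy paths and report
    anything that has not been committed. Raises CalledProcessError
    if svn exits with an error."""
    completed = subprocess.run(
        ["svn", "status", *paths],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_svn_status(completed.stdout)
```

An empty list from outstanding_mods means the listed files match the repository; anything reported as modified or unversioned still needs an `svn commit` or `svn add`.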
 
many thanks,
Dave