RAL Tier1 weekly operations castor 10/12/2012
From GridPP Wiki
Operations News
- The new rsyslog configuration has been tested and works with non-rsyslog logs (e.g. xrootd, nsd), so once it is rolled out we can turn off backups on the headnodes.
- The tape verification script has been tested and works at RAL. It is the tape equivalent of Shaun's checksumValidator script for disk servers.
- A new CIP is ready for testing. It fixes the bug whereby some service classes wrongly report an UNDEFINED path in CASTOR.
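As an illustration of the kind of check a checksum validator performs, here is a minimal Python sketch. It is not Shaun's checksumValidator or the tape verification script: the function names, the input format (a mapping of file path to recorded checksum) and the choice of Adler-32 are assumptions made for this example only.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit lowercase hex string."""
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so large files are read in chunks.
            checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")


def find_mismatches(expected):
    """Compare recorded checksums with recomputed ones.

    `expected` is a hypothetical mapping of file path -> recorded hex checksum.
    Returns a list of (path, recorded, actual) tuples for files that differ.
    """
    mismatches = []
    for path, recorded in expected.items():
        actual = adler32_of_file(path)
        if actual != recorded.lower().zfill(8):
            mismatches.append((path, recorded, actual))
    return mismatches
```

An empty list from `find_mismatches` means every file matched its recorded checksum; anything else lists the files that would need investigating.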
Operations Problems
- (Mon) Poor performance on the ATLAS stager. Stats were rebuilt, but this caused numerous locking sessions. These persisted after the stats rebuild was halted and only cleared when the node hosting the ATLAS stager was restarted.
- (Tue) An apparent transient network failure lasting ~5 minutes around 07:55 affected batch, transfers and the CASTOR DB.
Blocking Issues
None
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB: none
Advanced Planning
Tasks
- Simplify and document Quattor templates to make them easier to maintain
- Test and certify 2.1.13-5 with simplified Quattor templates
Interventions
- Upgrade stagers from 2.1.12 to 2.1.13 and central services (NS, CUPV, VDQM) from 2.1.11 to 2.1.13
Staffing
- CASTOR on-call person:
  - Matthew
- Staff absence/out of the office:
  - (Mon) Matthew A/L
  - (Mon-Wed) Chris at SDB user meeting, The Hague
  - (Mon-Wed) Brian at ATLAS Jamboree, CERN
  - (Thu-Fri) DS Group Away Day, DL
