RAL Tier1 weekly operations castor 24/04/2012
Operations News
- Increased the number of d2d copy slots for atlasStripInput '07 and '08 servers to help drain disk servers more quickly (Fri)
- Fixed SLS tape monitoring
Operations Problems
- Xrootd problems (transfer failures) for atlasStripDeg were traced to a missing path/svc mapping in xrd.cf on the Atlas DLF machine (Wed)
- Atlas declared 2 files lost due to the clock being out of sync on the Atlas stager machine. The problem has been fixed and a Nagios check created to monitor any time drift (Thur)
- Gdss445 had d2d copy issues affecting draining mode. Fixed by recreating the LSF dynamic libraries and restarting the LSF client daemons (Thur)
- Gdss209 (atlasScratchDisk) went down on Friday night and was recovered on Sunday morning
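
For reference, the kind of time-drift check mentioned above for the stager clock problem could look roughly like the sketch below. This is not the check deployed at RAL: the function name `classify_drift`, the thresholds, and the assumption that a clock offset in seconds has already been measured (for example against an NTP server) are all hypothetical. The only fixed part is the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Hypothetical sketch of a Nagios-style clock drift check (not the RAL check).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def classify_drift(offset_seconds, warn=0.5, crit=2.0):
    """Map a measured clock offset (seconds) to a Nagios status and message.

    warn and crit are illustrative thresholds, not values from the notes.
    Pass None when no measurement could be obtained.
    """
    if offset_seconds is None:
        return UNKNOWN, "CLOCK UNKNOWN - no offset measurement available"
    drift = abs(offset_seconds)  # drift in either direction counts
    if drift >= crit:
        return CRITICAL, f"CLOCK CRITICAL - offset {drift:.3f}s"
    if drift >= warn:
        return WARNING, f"CLOCK WARNING - offset {drift:.3f}s"
    return OK, f"CLOCK OK - offset {drift:.3f}s"
```

A wrapper script would print the returned message and exit with the returned status code, so that Nagios can alert before the drift becomes large enough to affect the stager.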
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
| Description | Start | End | Type | Affected VO(s) | Lead by | 
|---|---|---|---|---|---|
| CIP 2.2.0 upgrade (STC) | TBD | TBD | At-risk | All | Matthew | 
Advanced Planning
Tasks
- Test and re-apply CIP upgrade (Jens, Matthew)
- Test and certify 2.1.12-4 and 2.1.11-9 (Matthew, Chris)
- Stress testing of Transfer Manager (TM) (Shaun, All) DONE
- Ganglia monitoring for TM (Rob, Chris) IN PROGRESS
- Re-instantiate certification on HyperV VMs using Quattor+Puppet (Rob)
- Stress testing of CV11 generation disk servers on preprod (Rob, Matthew)
- Selection of disk-only prototype solution (Shaun, Rob, Brian, James)
- Switch to Tape Gateway on repack and test (Tim, Matthew) DONE
Interventions
- Upgrade repack to 2.1.12-4 (Apr)
- Switch from LSF to TM after 2.1.11-8 upgrade. Will need to better stress-test TM on preprod with more disk servers. (Apr)
- Switch to Tape Gateway (TG) once it has been tested on repack (May)
- Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jun)
- Upgrade Oracle to 11g (Jun)
- Upgrade to 2.1.12 on Tier1 instances once we are happy with TM and TG performance (Jul)
Staffing
- Castor on Call person: Chris
- Staff absence/out of the office:
  - Rob (A/L)
 
