RAL Tier1 weekly operations castor 29/03/2010
Summary of Previous Week
-  Matthew:
- CASTOR Database Way Forward
- Tier1 Open Day talk
- Investigating safeguarding CASTOR Tier0 data (T2K,MICE,MINOS)
- Organizing CASTOR panel session at GridPP24
- Finalizing 2.1.8/2.1.9 Test Plan and Stress Testing specifications
 
-  Shaun:
- Tier1 Open Day talk
- LHCb Jamboree
- Scheduling Upgrades
- Fixing deployment problems
- COD duties
 
-  Chris:
- Tested maximum number of job slots for root protocol with Raja
- Building 4 cold stand-by central castor servers and doing the final configuration
- Deploying disk servers
- DepMon duties
- Castor on Call duties Mon-Tue
- Doing work related to Tier1 Security Group project
 
-  Cheney:
- Cleaning machine room
- Investigating SLS timeouts
- Building new robot controller
- Fixing ZFS on new robot controller
- Investigating Oracle install problems
- Checking over castor151 backups
- Relocating fibre channel switches
- Replacing failed drive in VTL
- Fixing backup problems on nagger
- Bringing up tape servers with MIR problems
 
-  Tim:
- ..
 
-  Richard:
- Deploying some disk servers into cmsNonProd and lhcbNonProd
- Continuing with stress-testing of pre-prod instance and contributing towards test-plan
 
-  Brian:
- Clearance of stuck migration files
- Chase up of redeployment tickets.
- T2 work
 
-  Jens:
- Mostly background work, a little CIP 2.2.0 development.
 
Developments for this week
-  Matthew:
- Tier1 Open Day
- CASTOR DB Disaster Recovery plans
- CASTOR On Duty work
- Publishing list of 'approved exceptions' - changes that don't require formal change control
 
-  Shaun:
- Tier1 Open Day
- Presenting upgrade timelines
- CASTOR SRM Monitoring
- Testing SRM 2.8-6
 
-  Chris:
- Test SL5 (64bit) disk server with xfs
- Test cold stand-by central castor servers and then write documentation
- Disk server deployment duties
- Test Quattor disk server procedure and build castor disk server
- Castor 2.1.8/2.1.9 upgrade work
- Doing work related to Tier1 Security Group project
 
-  Richard:
- Tweaking stress-testing script to meet requirements of test-plan
- Running stress-testing script on pre-prod instance
 
-  Brian:
- T1 Open Day
- T2 Storage for LHC Media/Start of 7TeV Day
- T2s
 
-  Jens:
- See if I can get round to finishing new CIP features for ATLAS and test on preprod or cert.
 
Operations Issues
- Problem transferring files to gdss346 (atlasSimRaw) due to an error during deployment
Blocking issues
None
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
| Description | Start | End | Type | Affected VO(s) | 
|---|---|---|---|---|
| update LSF license keys | 26/03/2010 12:00 | 26/03/2010 12:30 | At-risk | All | 
| update LSF license keys | 29/03/2010 09:30 | 29/03/2010 10:30 | At-risk | All | 
Advanced Planning
- Upgrade to 2.1.8/2.1.9 2010
- CASTOR Instance for Non LHC 2010Q2
- Install/enable gridftp-internal on Gen (Before 2.1.8 upgrade)
Staffing
- Castor on Call person: Matthew
-  Staff absences: 
- None
 
