RAL Tier1 weekly operations castor 07/12/2009
From GridPP Wiki
								
												
Summary of Previous Week
- Writing a sanity check to cross-ref tsbn stats (Cheney)
- Wrote discussion docs on virtualisation of castor and monitoring (Cheney)
- Created a Fabric Service & Delivery page on wiki (Cheney)
- Nagios plugin priority reviewed (Cheney)
- Started writing nrpe plugin training course (Cheney)
- Restarted build of cdbe07 (Cheney)
- Started build of castoradm1 replacement (Cheney)
- Test building a set of Quattor templates for SLC 4.6 (Richard)
- Talk to CERN about getting a copy of their SLC 4.8 templates for Quattor (Richard)
- Updated Tier1 wiki on Quattor (Richard)
- Continue looking at tape problems thrown up with repack (Tim)
- CoD duties (Shaun)
- Repacking bad tapes (Tim)
- Investigation of ATLAS migration (Shaun)
- SRM development (Shaun)
- Working on polymorphic build (Chris)
- Negotiating with Platform about LSF licences (Chris)
- Working on Puppet servers: upgraded puppetdev and fixed problem on puppetmaster with corrupted YAIM information (Chris)
- Disk Draining For ATLAS SimStrip (Brian)
- Planning disk draining for LHCb (Brian)
- Cleansing of canbemigr candidates from bad files in DATADISKTAPE and FARM (Brian)
- Two minor bugfix tweaks to CIP 2.0.3 (Jens)
- Developing Tier1 Change Management procedure, using CIP changes (Matthew)
- CASTOR input to GridPP review (Matthew)
- Arranging cover over the Christmas period (Matthew)
- Learning about Quattor (Matthew)
Developments for this week
- Developing our January upgrade strategy (All)
- More polymorphic server work (Chris)
- Review configuration for new lsf-triplet and run some tests (Chris)
- Concentrate more on preproduction and work which Richard is doing (Chris)
- More build of castoradm1 replacement (Cheney)
- Build of new robot controller (Cheney)
- More investigation of ATLAS backlog (Shaun)
- More SRM development (Shaun)
- Continue looking at tape problems thrown up with repack (Tim)
- Finalizing CIP 2.1.0 testing and releasing it to CERN, CNAF, and ASGC (Jens)
- Setting up replacement CIP on more resilient hardware (Jens)
- Setting up new CIP instance for T2K etc. (Jens)
- Investigate lhcbUser D2D copy problems (Matthew)
- CoD work (Matthew)
Operations Issues
- Ongoing migration problems on ATLAS, which we believe are now fixed
Blocking issues
- Lack of Quattor configuration files for SLC 4.8 is stopping us from evaluating Quattor alongside CASTOR 2.1.8. Preprod setup will initially proceed with a Kickstart-based deployment.
Planned, Scheduled and Cancelled Interventions
- Deploy new CIP for T2K, ASAP (Pending approval)
- Replace CIP hosting machine with new one with more resilient hardware, after 21/12/09 (Pending approval)
- Deploy new LSF triplets, 14/01/10 (Pending approval)
Advanced Planning
- Gen upgrade to 2.1.8 2010Q1
- Install/enable gridftp-internal on Gen (This year/before 2.1.8 upgrade)
Staffing
- CASTOR on-call person: Matthew
