RAL Tier1 weekly operations castor 29/02/2016
From GridPP Wiki
Revision as of 10:41, 26 February 2016 by Alison Packer
Operations News
- No disk server issues this week.
- glibc updates applied and all CASTOR systems rebooted. Initial issues with head nodes: 7 failed to reboot due to their build history. ACTION: their Quattor builds need revisiting so that this does not recur.
- Main CIP system failed; we have failed over to the test CIP machine. The hardware failure will be fixed, then we will fail back over to the production system.
- The 11.2.0.4 DB client update had to be rescheduled and should go ahead on Monday 29th. It has been running in pre-production for a considerable amount of time, so the change should be transparent.
 
- CASTOR 2.1.15 update:
 - Nameserver upgrade 29th Feb - 3rd March; downtime for all VOs.
 - Stager upgrade for one VO in the week commencing 21/3/16.
 
- Repack updated to 2.1.14-15.
- 2.1.15 works on preprod (RAL xroot RPM build) but has not yet been put under stress.
- CASTOR 2.1.16 coming soon: SRM integration into the CASTOR code base.
- ATLAS gSOAP errors; JK (advised by SdW) restarted the SRM front ends.
- CMS AAA is still an issue.
- LHCb upload is still problematic.
 
- The VO DiRAC people from Leicester are coming online.
- The 2.1.15 upgrade had its first airing in change control; 2.1.15 is currently not working for us.
- New tape-backed disk servers for the Tier1, to replace CV11; recommendation made to Martin.
- Wiki page on merging tape pools created by Shaun.
- 2.1.15 name server tested.
- New SRM on vcert2.
- New SRM (SL6) with bug fixes available; needs testing.
- gfal-cat command failing for ATLAS reads of nsdumps from CASTOR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=117846. Developers are looking to fix this, tracked in: https://ggus.eu/index.php?mode=ticket_info&ticket_id=118842
- LHCb batch jobs failing to copy results into CASTOR; changes made seem to have improved the situation but not fixed it (Raja). Increasing the number of connections to the NS DB (more threads).
- BD looking at porting the persistent tests to Ceph.
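The failing gfal-cat read reported in GGUS 117846 can be sketched as below. This is illustrative only: the SRM endpoint and nsdump path are placeholders, not the real RAL CASTOR locations, and a valid grid proxy would be needed for an actual read.

```shell
# Sketch of the failing read path (GGUS 117846). gfal-cat, from gfal2-util,
# streams a remote file to stdout. The endpoint and path are hypothetical.
NSDUMP_URL="srm://srm-atlas.example.org/castor/example.org/atlas/nsdump.txt"

if command -v gfal-cat >/dev/null 2>&1; then
    # With the affected files this read fails as described in the ticket.
    gfal-cat "$NSDUMP_URL" || echo "gfal-cat read failed"
else
    echo "gfal2-util not installed; command shown for illustration only"
fi
```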