RAL Tier1 weekly operations castor 07/01/2011
From GridPP Wiki
								
												
Operations News
- ORACLE successfully upgraded to 10.2.0.5
- CMS disk servers successfully upgraded to SL5 64bit
- Checksumming turned on for cmsWanIn on 2/2/11 and for everything else on 7/2/11
- Fix for bad checksums (upgraded gridftp rpm) rolled out to lhcbMdst on 2/2/11 and to everything else (apart from Gen) on 7/2/11
- New puppetmaster02 rolled out for all Quattorized disk servers on 3/2/11
- Inactive job manager monitoring script rolled out to all primary job managers on 3/2/11
- 2.1.9-10 installed on Preprod - testing can now start
Operations Issues
- Lost tape CS7541. 78 files declared lost to LHCb. Remaining files were restaged as they were on disk.
- The number of incompletely transferred LHCb files receiving wrong checksums kept increasing until the fix was rolled out. The checksums have since been corrected and the migration queue has been reduced.
- A small number of files (<10) were given wrong checksums when they should contain '0000'. The fix rolled out for LHCb helps with this bug as well.
Blocking Issues
- Lack of production-class hardware running ORACLE 10g needs to be resolved before CASTOR for Facilities goes into full production. The hardware is now being ordered.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
| Description | Start | End | Type | Affected VO(s) | 
|---|---|---|---|---|
| Upgrade gridftp RPM on remaining LHCb, ATLAS and CMS disk servers | 07/02/2011 10:00 | 07/02/2011 12:00 | At-Risk | ATLAS,CMS,LHCb | 
| Roll out WAN tuning changes to cmsWanIn and cmsWanOut | 08/02/2011 09:00 | 08/02/2011 16:00 | At-Risk | CMS | 
| Upgrade and quattorize Gen disk servers to SL5 64 bit | 15/02/2011 08:00 | 15/02/2011 16:00 | Downtime | Gen | 
| Roll out WAN tuning changes to remaining CMS disk pools | 15/02/2011 10:00 | 15/02/2011 12:00 | At-Risk | CMS | 
| Roll out WAN tuning changes to all remaining disk servers (STC) | 01/03/2011 09:00 | 01/03/2011 16:00 | At-Risk | ATLAS,LHCb,Gen | 
Advanced Planning
- Upgrade Gen disk servers to SL5 64bit and Quattorize the remaining non-Quattorized disk servers
- CASTOR certification and upgrade to 2.1.10 and upgrade of SRM to 2.10 which incorporates:
  - fix for gridftp-internal to support multiple service classes, enabling checksums for Gen
  - fix to report files on draining disk servers accessed by FTS to be NEARLINE not UNAVAILABLE
- Upgrade the NS to 2.1.10
Staffing
- CASTOR on-call person: Matthew
- Staff absence/out of the office:
  - Chris (all week)
  - Richard (Mon, Tue, Thu)
