RAL Tier1 weekly operations castor 24/05/2019
From GridPP Wiki
Revision as of 09:49, 24 May 2019 by Tom Byrne
Standing agenda
1. Achievements this week
2. Problems encountered this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
5. Special topics
6. Actions
7. Review Fabric tasks
1. Link
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Achievements this week
- No operational issues reported
 
Operation problems
- T2K issues with finding files on tape (GGUS 140870) - currently with Alastair.
- ATLAS are periodically submitting SAM tests that impact availability and cause pointless callouts - currently with TA.
- Nagios alert for subject alternative names on stagers needs updating - with the production team.
 
Plans for next few weeks
- Examine further standardisation of CASTOR pool settings.
  - CASTOR team to generate a list of nonstandard settings and consider whether they are justified.
- CASTOR tape testing to continue after the production tape robot networking is installed.
- Decommission lhcbDst, then move them to wlcgTape.
  - Agreed with Raja to keep the data in the lhcbDst pool until 7 June (2 months after the last storage element was copied to Echo).
- Migration of the LHCb VO to wlcgTape is planned for Tuesday 28 May.
  - Downtime has been declared.
- Kevin has done some storageD functional tests with the new tape robot.
  - Sent data to several different tapes; recalls from multiple tapes still need to be tested.
 
Long-term projects
- New CASTOR WLCGTape instance.
  - Migration of name server to VMs on 2.1.17-xx is waiting until aliceDisk is decommissioned.
- CASTOR disk server migration to Aquilon.
  - Need to work with Fabric to get a stress test (see above).
- The problem of castor-functional-test1 has been absorbed into the task of sorting out worker node grid-mapfile generation and distribution.
  - RA to make a VM for James.
- SL7 VM headnodes need changes to their personalities for the facilities.
 
Actions
- AD wants us to make sure that experiments cannot write to the part of the namespace that was used for d1t0 data: namespace cleanup/deletion of empty dirs.
  - Some discussion about what exactly is required and how it can actually be implemented.
  - CASTOR team proposal is either:
    - to switch all of these directories to a fileclass with a requirement for a tape copy but no migration route; this will cause an error whenever any writes are attempted.
    - to run a recursive nschmod on all the unneeded directories to make them read-only.
  - CASTOR team split over the correct approach.
- Problem with functional test node using a personal proxy which expires some time in July.
  - RA met with JJ and requested an appropriate certificate.
  - GP to follow up with JJ.
- RA and DM to sit down to sort out the storage metric question.
  - Plan to create new metrics for GridPP6.
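
For the read-only option in the first action above (recursive nschmod on the unneeded directories), a minimal sketch of the mode calculation might look like the following. It only computes the permission bits each directory would end up with; it does not call nschmod or touch the name server. The directory path and the helper names read_only_mode and chmod_plan are hypothetical and chosen for illustration, and the starting modes are assumed to be already known.

```python
import stat

# All three write bits (user, group, other) combined: 0o222.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def read_only_mode(mode: int) -> int:
    """Return the mode with every write bit cleared; other bits are unchanged."""
    return mode & ~WRITE_BITS


def chmod_plan(dirs_with_modes: dict) -> dict:
    """Map each directory path to the octal mode string that makes it read-only.

    The input maps a path to its current mode (an int). Nothing is changed on
    any file system or name server; the result is only a plan to review.
    """
    return {
        path: format(read_only_mode(mode), "03o")
        for path, mode in dirs_with_modes.items()
    }


if __name__ == "__main__":
    # Hypothetical d1t0 directory with a typical group-writable mode.
    plan = chmod_plan({"/castor/example/d1t0/dir": 0o775})
    for path, mode in plan.items():
        print(mode, path)
```

Producing a plan first, rather than changing modes directly, would let the team review the affected directories before anything is applied, which may help given the disagreement over the approach.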
 
 
Staffing
- Rob is back very briefly on Thursday and Friday
 
AoB
On Call
GP on call.