RAL Tier1 weekly operations castor 16/12/2016
Contents
Draft agenda
1. Problems encountered this week
2. Upgrades/improvements made this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
1. Castor 2.1.15
2. SL7 upgrade on tape servers
5. Special topics
6. Actions
7. Anything for CASTOR-Fabric?
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Operation problems
gdss685 (atlasStripInput) failed. It was put back in production after two drives were replaced and rebuilt
gdss677 (cmsTape) failed and was removed from production
Heavy I/O load on the CV11 cmsTape disk servers due to lots of tape recalls and writes. SAM tests failed
Slow migration of diamond data to tape. Fdscts09 was showing very slow performance when writing to tape. The issue was resolved after Tim changed a network cable that this server uses for outbound traffic
Operation news
The firmware on all CV13 disk servers has been upgraded to the latest version (RT177723)
The total number of transfer slots was increased from 4000 to 8000 on the Dell2015 cmsTape servers, which fixed the problem with the failing SAM tests
Putting the CV11 disk servers in cmsTape into read-only mode for a few hours cleared the load (e-log)
Plans for next week
RA will continue development work on Castor 2.1.15
GP will continue development work on the tape-server SL7 upgrade
Long-term projects
Castor 2.1.15 upgrade has been postponed until January 2017
First draft of castor tapeserver features completed and published for review.
Actions
Drain 10% of the 13 generation of disk servers (lhcbDst) for decommissioning (RT181930)
Merge CMS 2010, 2011 and 2012 tape families (RT181914, RT181913, RT181912)
AoTechnicalB
CV13 firmware upgrade
Staffing
RA is on call next week and during the Christmas closure
