From GridPP Wiki
								
												
| Gareth Smith |
Latest revision as of 14:07, 11 December 2013
RAL Tier1 Operations Report for 11th December 2013
| Review of Issues during the week 4th to 11th December 2013. | 
- A number of files (some tens) have been found to be missing from Castor as part of the Atlas renaming exercise. Currently around 280 files are missing out of around 7 million renamed. So far these have been older files, as the renaming started with those. The missing files are being catalogued and dealt with in blocks. So far we are not aware of any systematic pattern to the missing files. These numbers are broadly in line with those seen by other Tier1s.
- Independently of the above, one specific file was reported missing by Atlas in a GGUS ticket (and has been declared lost to them).
- Batch work for the non-LHC VOs was stopped and drained Tuesday/Wednesday (10/11 Dec) for the software server to be moved.
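To put the Castor figures above in proportion, the following is a minimal sketch that works out the missing-file rate from the two numbers quoted in this report. Both inputs are approximate ("around 280" and "around 7 million"), and the variable names are illustrative only, so the result should be read as an order-of-magnitude estimate.

```python
# Approximate figures quoted in the report for the Atlas renaming exercise.
missing_files = 280          # "around 280 files missing"
renamed_files = 7_000_000    # "around 7 million renamed"

# Fraction of renamed files found to be missing from Castor.
fraction = missing_files / renamed_files

# The same rate expressed per million renamed files.
per_million = fraction * 1_000_000

print(f"Missing fraction: {fraction:.4%}")
print(f"Missing per million renamed: {per_million:.0f}")
```

On these figures the loss rate is roughly 40 files per million renamed.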
| Resolved Disk Server Issues | 
- None.
| Current operational status and issues | 
- Atlas have reported batch job failures. The most probable cause is thought to be a high inode count in CVMFS. We are rolling out some updates to the worker nodes that require a reboot of each WN, and will monitor the effect of this on the job failures.
| Ongoing Disk Server Issues | 
- None.
| Notable Changes made this last week. | 
- A UPS/Generator load test was carried out successfully this morning.
- A Condor update and a reduction in the memory over-commit (as well as kernel/errata updates) are being rolled out across the batch farm.
- FTS3 has been upgraded (to version 3.1.46-1.el6).
| Declared in the GOC DB | 
- None.
| Advanced warning for other interventions | 
| The following items are being discussed and are still to be formally scheduled and announced. | 
- There will be an interruption to the small VOs' software server as it is to be physically moved.
Listing by category:
-  Databases:
- Switch LFC/FTS/3D to new Database Infrastructure.
 
-  Castor:
- Castor 2.1.14 testing is starting. It is expected to be a few months before deployment.
 
-  Networking:
- Possible move of Tier1 core network switch in January (TBC).
- Implementation of new site firewall.
-  Update core Tier1 network and change connection to site and OPN including:
- Install new Routing layer for Tier1
- Change the way the Tier1 connects to the RAL network.
- These changes will lead to the removal of the UKLight Router.
 
 
-  Fabric
- Firmware updates on remaining EMC disk arrays (Castor, FTS/LFC)
- There will be circuit testing of the remaining (i.e. non-UPS) circuits in the machine room during 2014.
 
| Entries in GOC DB starting between the 4th and 11th December 2013. | 
| Service | Scheduled? | Outage/At Risk | Start | End | Duration | Reason | 
|---|---|---|---|---|---|---|
| Whole Site | SCHEDULED | WARNING | 11/12/2013 10:00 | 11/12/2013 12:00 | 2 hours | RAL Tier1 site in warning state due to UPS/generator test. | 
| Open GGUS Tickets (Snapshot at time of meeting) | 
| GGUS ID | Level | Urgency | State | Creation | Last Update | VO | Subject | 
|---|---|---|---|---|---|---|---|
| 99556 | Yellow | Very Urgent | In Progress | 2013-12-06 | 2013-12-10 | | NGI Argus requests for NGI_UK |
| 98249 | Red | Urgent | In Progress | 2013-10-21 | 2013-12-10 | SNO+ | please configure cvmfs stratum-0 for SNO+ at RAL T1 | 
| 98122 | Red | Less Urgent | Waiting Reply | 2013-10-17 | 2013-12-09 | cernatschool | CVMFS access for the cernatschool.org VO | 
| 97868 | Red | Less Urgent | In Progress | 2013-10-08 | 2013-12-04 | T2K | CVMFS for t2k.org | 
| 97385 | Red | Less Urgent | In Progress | 2013-09-17 | 2013-12-09 | HyperK | CVMFS for hyperk.org | 
| 97025 | Red | Less Urgent | On Hold | 2013-09-03 | 2013-11-05 | | Myproxy server certificate does not contain hostname |
| 86152 | Red | Less Urgent | On Hold | 2012-09-17 | 2013-10-18 | | correlated packet-loss on perfsonar host |
| Availability Report | 
| Day | OPS | Alice | Atlas | CMS | LHCb | Comment | 
|---|---|---|---|---|---|---|
| 04/12/13 | 100 | 100 | 100 | 100 | 100 | |
| 05/12/13 | 100 | 100 | 100 | 100 | 100 | |
| 06/12/13 | 100 | 100 | 100 | 100 | 100 | |
| 07/12/13 | 100 | 100 | 100 | 100 | 100 | |
| 08/12/13 | 100 | 100 | 100 | 100 | 100 | |
| 09/12/13 | 100 | 100 | 100 | 100 | 100 | |
| 10/12/13 | 100 | 100 | 100 | 100 | 100 | |
