| 
| General updates |  
| 
Tuesday 10th December
 
 There is a pre-GDB today on Identity Federation in WLCG. It will discuss existing federation work around the community and set a WLCG direction. Join by Vidyo if you wish to contribute!
 There is a GDB tomorrow. The agenda will cover security; provisioning of EGI core services; SHA-2 readiness; ops coordination updates; an update on networking and report from HEPiX.
 A meeting of the middleware readiness working group will take place on Thursday afternoon.
 Minutes from the GridPP technical meeting on Friday are available.
 The draft Tier-2 availability/reliability report was circulated last week. Corrections due by 15th December. Also please check the VO reports and the EGI/NGI report!
 Note: LAL reports that VOs running SAM tests under a regular account are hitting fair-share limits, which shows up in the test results and subsequently affects the A/R results.
 There are plans for a January HEPSYSMAN at Birmingham.
 The Sussex GGUS access issue was resolved. For future reference, GGUS support access can be applied for via this page.
 
 Tuesday 3rd December
 
 Although ready, the UK CA will wait to move to default SHA-2 certificates in January (WLCG overall has not confirmed readiness). 
 There is an EGI push for ARGUS deployment - a central server is being configured at RAL.
 Minutes from Monday's regular WLCG ops call are available. Generally quiet.
 Monday 25th November
 
 There is a pre-GDB on Identity Federation in WLCG (agenda). The next GDB is on 11th December.
 EMI-3 WN tarball status (and glexec)?
 There is an LFC outage today (see the downtime announcement).
 The middleware readiness group are setting a time for their meeting. More site admins are needed! Discussions will surround the items in the twiki.
 There was an email thread last week on ATLAS plans to move jobs/data away from a site going into downtime. The focus seemed to be on the execute not the storage side of things. 
 A new SAM interface is available for checking.
 Glue2 information validation is ongoing. Look to the monitoring summary page for more information.
 
 |  
| WLCG Operations Coordination - Agendas |  
| 
Tuesday 10th December
 
 Confirmation of the multi-core task force with this mandate. Some concerns about overlaps with the machine/job features TF.
 Discussion of experiment Christmas plans
 Update of the baseline versions (https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions). BDII update important for SAM BDII nodes at CERN.
 Tier-1 WNs on OPN is now being tracked here.
 ALICE - MC will continue over break. Best efforts approach appreciated.
 ATLAS - plans for ramp up of MC production. Repro and analysis ramp up also expected in coming weeks.
 CMS - Run2 MC samples prep starting. "Appreciate all support from the sites we can get, but don’t expect normal levels of support, especially for T2 sites"
 LHCb: Usage of distributed grid resources mainly for Monte Carlo productions. Surveillance by the operations team on a best effort basis. Also note a new CVMFS dashboard for LHCb.
 Christmas plans summary: "All experiments will run activities over christmas at non negligible scale. They do not require special effort from sites or WLCG in general, while best effort support is highly appreciated"
 
 WMS decommissioning: WMS usage by CMS looks to be decreasing, but it is variable.
 glexec: 31 tickets remain open. Status tracked here.
 FTS3: testing ongoing
 Tracking tools: An engineer will be on-call for GGUS over the vacation period.
 perfSONAR: Code maintenance is an issue given the BNL funding cuts; OSG and ESnet options are being looked at. 3.3.2 is out soon. See the Status & Plans update. Sites are asked to make the perfSONAR main page (https://<hostname>/toolkit) accessible for the central operations activity. The plan is for OSG to host the perfSONAR-PS central service; the BNL dashboard is not all correct.
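 For sites wanting to list the toolkit pages they should expose, the `https://<hostname>/toolkit` pattern above can be expanded with a few lines of Python. This is only a sketch: the hostnames are hypothetical placeholders, the helper name `toolkit_url` is ours, and nothing is contacted over the network.

```python
from urllib.parse import urlunsplit


def toolkit_url(hostname: str) -> str:
    """Return the perfSONAR toolkit main-page URL for a bare hostname."""
    host = hostname.strip().lower()
    # Reject anything that is clearly not a bare hostname.
    if not host or "/" in host or " " in host:
        raise ValueError(f"not a bare hostname: {hostname!r}")
    return urlunsplit(("https", host, "/toolkit", "", ""))


# Hypothetical hostnames, for illustration only.
hosts = ["ps-latency.example.ac.uk", "ps-bandwidth.example.ac.uk"]
urls = [toolkit_url(h) for h in hosts]
for url in urls:
    print(url)
```

 The resulting URLs can then be checked by hand in a browser from outside the site firewall.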
 
 IPv6: request from CMS to have IPv6 supported on SLC5 at CERN. Alistair D taking on ATLAS role for IPv6 testing.
 Middleware readiness: Meeting planned for 12th December.
 Machine/job features: Discussion of the current implementation versus a proposed route for minimizing draining waste (MDW) CPU time for multi-core pilots.
 SHA-2: still some updates at sites ongoing (>10 sites). "by mid January the WLCG infrastructure is expected to be essentially ready". OSG plans to move in mid-January.
 VOMRS: VOMS-Admin still in testing.
 Tuesday 3rd December
 |  
| Tier-1 - Status Page |  
| 
Tuesday 17th December
 
 Rolling updates to Worker nodes (applying kernel/errata updates, updating Condor version and slightly reducing memory overcommit) ongoing. Other updates to Grid Services being applied.
 Checking systems ahead of Christmas break. During the holiday we will have our usual out of hours cover supplemented by a brief daily check of systems.
 |  
| Storage & Data Management - Agendas/Minutes |  
| 
Monday 9th December
 
 Spacetokens for non-LHC VOs - recommendations.
 Tuesday 8th October
 
 The DPM workshop agenda and registration page will appear here.
 Monday 30th September
 
 A DPM workshop is being organised in Edinburgh for 13th December. GridPP PMB anticipated covering travel for of the order of 10 UK sysadmins for this event. Interest should be indicated during the storage group meeting.
 
 |  
 
| Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06 |  
| 
Tuesday 26th November
 Tuesday 5th November
 
 A reminder to keep an eye on the SL HS06 page for odd ratios. Steve takes HS06 cpu numbers direct from ATLAS and the page does get stuck every now and then.
 The metrics page has been updated. 
 Tuesday 13th August
 |  
| Documentation - KeyDocs |  
| 
See the worst KeyDocs list for documents needing review now and the names of the responsible people.
 Monday 11 November
 
 The plan for the adoption of backup servers continues to evolve. Please see the latest version here. The new version contains details of tests and concluding operations for site and VO admins.
 The approved VOs page continues to be updated with the newest data from the operations portal. 
 Note: T2K now requires liblockfile-devel.
 Tuesday 5th November
 
 Document states will be reviewed at the core ops meeting this coming Thursday.
 Tuesday 1st October
 
 The approved VOs page has been updated with the newest data from the operations portal. Note that the VOMS records for LondonGrid now contain some alternative VOMS servers. The migration plan for use of these backup servers is now documented here.
 |  
| Interoperation - EGI ops agendas |  
| 
Tuesday 3rd December
 
 Additional notes:
 The 2.6.16 version of dCache mentioned has a serious bug in the migration module; 2.6.17 has this fixed, so it should be used in preference. The possibility of skipping 2.6.16 in the overall release of EMI-3 is being discussed.
 Note that the CREAM updates mentioned in this meeting contain security updates and so are recommended.
 Looking for CREAM/LSF plugin staged rollout, but don't believe there are any such sites in the UK
 SHA-2: 17 sites remaining in the EGI that are publishing SHA-2 and alarming; I don't think that any such sites in the UK (just a couple) are unaccounted for/previously documented.
 It was asked when CAs would start issuing SHA-2 certs only (the UK noted that it is planning to from January).
 
 Next meeting: (last for 2013) 16th December
 gLite support calendar.
 
 |  
| Monitoring - Links MyWLCG |  
| 
Tuesday 10th December
 
 Feedback transmitted and discussed by consolidation group; next meeting is now in January. 
 Tuesday 26th November
 
 As noted by Alessandra, if possible we'd like site feedback on the consolidated monitoring prototype before the next meeting a week on Friday to report back to the group (with thanks to everyone who has already contributed)
 
 Some notes to form a wiki on Graphite are to be found here: https://www.gridpp.ac.uk/wiki/MonitoringTools. These are still under development; if there are areas people would find useful that could be expanded, please let David know.
 
 Glasgow dashboard now packaged and can be downloaded here. 
 |  
| On-duty - Dashboard ROD rota |  
| 
Monday 9th December
 Tuesday 3rd December
 
 Both Nagios servers upgraded to SAM update22 and the active instance has moved back to Oxford again.
 Some critical alarms from the old instance had to be dealt with directly.
 The Bristol and RAL PPD ARC CEs have had a few issues after the upgrade. Luke opened a ticket with the ARC developers and it is ongoing.
 EFDA Jet and Sussex had gLExec issues after the test became critical.  A new admin is starting in Sussex but gLExec may take a while to be sorted. Jet has opened a ticket to solve their gLExec problem.
 Brunel has an APEL ticket open.
 UCL SL6 upgrade on-going and may have issues.
 RALPP dcache mid mon ticket still open
 |  
| Rollout Status WLCG Baseline |  
| 
Tuesday 29th Oct
Yesterday the first stage rollout request (for the CREAMCE) in months has come through.
I've updated the Stage of the Nation page.
 Tuesday 8th Oct
There have been updates to EMI2 and 3 yesterday, but no new request for Staged Rollout.
There is a problem with dcap-libs: [GGUS 97805] 
References
 
 |  
| Security - Incident Procedure Policies Rota |  
| 
Tuesday 19th November
 
 There was a team meeting last Friday 15th November. Next meeting on 29th.
 Just a couple of site issues showing up in Pakiti.
 Looking at ARGUS server for UK NGI.
 Tuesday 29th October
 
 There was a team meeting on Friday 25th.
 A couple of critical warnings are appearing in Pakiti and being followed up.
 |  
 | 
| Services - PerfSonar dashboard | GridPP VOMS |  
| 
Tuesday 26th November
 
 The main perfSONAR issues this week affect Manchester and Sussex.
 Tuesday 19th November
 
 There is a new dashboard. Feedback is welcome.
 Manchester, Durham, Glasgow and Sussex show problems across the board.
 Tuesday 1st October
 
 PerfSONAR latency hosts configured to use the WLCG meshes should now have a traceroute measurement archive (MA) accessible from the GUI under 'Service Graphs' --> 'Traceroute'. Here is an example.
 Tuesday 17th September
 
 Upgrading/re-installing hosts to v3.3.1/mesh is only making slow progress.
 There is a new view of the status between sites.
 An outage at Manchester due to central switch maintenance means that VOMS is not going to be contactable for a period this morning. It is clear that we need the backup VOMS instances fully available to VOs - please can someone take a lead?
 |  
| Tickets |  
| 
Monday 9th December 2013, 15.30 GMT
34 Open tickets in the UK. In the interests of efficiency/laziness I only looked at the tickets that were updated in the last 7 days as I went over all the tickets last week. And here's what I spied (there's not much going on really).
 TIER 1
https://ggus.eu/ws/ticket_info.php?ticket=99556 (6/12)
As seen on TB-SUPPORT, a ticket is in for an NGI level argus server at the Tier 1. I'm sure this will be discussed elsewhere in the meeting. In progress (9/12)
 https://ggus.eu/ws/ticket_info.php?ticket=97385 (17/9)
The HyperK cvmfs ticket. This one is almost done, Catalin remarks that once he's happy he'll solve this ticket. The other cvmfs tickets (Sno+, cern@school, T2K) are also chugging along nicely. In progress (9/12)
 SHEFFIELD
https://ggus.eu/ws/ticket_info.php?ticket=98594 (4/11)
LHCb problems transferring job results out of Sheffield. If progress has stalled could the ticket be put On Hold? Or if it's still chugging along can we get an update? In progress (27/11)
 SUSSEX
https://ggus.eu/ws/ticket_info.php?ticket=99198 (26/11)
WN-glexec Nagios test failures. Daniela extended the ticket one more time on the 8th, it really could do with some love (as tickets can't be extended forever). In progress (3/12)
 https://ggus.eu/ws/ticket_info.php?ticket=99524 (6/12)
This Nagios ticket (CADist-Check) looks like it can be closed, as Daniela reminds us the onus is on us to solve our tickets (in all senses of "solved"!). In progress (6/12)
 QMUL
https://ggus.eu/ws/ticket_info.php?ticket=99428 (4/12)
Queen Mary's perfsonar latency box appears to be broken somehow, in a non-obvious way (from my observations, perfsonar's preferred way of breaking). Chris is looking at it, but might have to ask on the perfsonar list (I had forgotten that there was a perfsonar list). In progress (9/12)
 SOLVED CASE PILE
There isn't much excitement on the Solved Case pile. The ngs.ac.uk removal tickets were dealt with quickly by the UK Vomses Teamses. The lfc webdav ticket (91658) has been solved, with a read-only lfc ready to be prodded. And a number of sites have been solving their publishing problems - I found the RAL one quite interesting as it has the summary of what they did at RAL to get their publishing to work with Condor (https://ggus.eu/ws/ticket_info.php?ticket=99162).
 |  
| Tools - MyEGI Nagios |  
| 
Tuesday 26th November
 
 Regional Nagios updated to release 22. It is a gLite to UMD update and it required a fresh installation.
 There have been some internal changes in SAM-Nagios. Test probes are now the responsibility of the product teams. Some test names have been changed as a result of this reorganization. For example the org.sam.CREAMCE-DirectJobSubmit test has become emi.cream.CREAMCE-DirectJobSubmit. This does not affect the operational activities.
 Could all site admins please look at the services associated with their site and mail Kashif if anything odd is noticed. Site admins can reschedule tests for their sites and it would be helpful if most functionalities are tested.
 Also, look at MyEGI, which can be useful with links to the Dashboard, GSTAT, Accounting Portal and GGUS.
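 Sites with local scripts that match on SAM test names may want a small translation step for the renames noted above. The following is a minimal sketch, assuming a hand-maintained lookup table; only the one rename quoted in these notes is listed, and the names `RENAMED_TESTS` and `current_test_name` are hypothetical, not part of SAM-Nagios.

```python
# Old test name -> new test name. Only the rename quoted in the notes
# above is included; any further entries would need to be confirmed
# against the regional Nagios before being added.
RENAMED_TESTS = {
    "org.sam.CREAMCE-DirectJobSubmit": "emi.cream.CREAMCE-DirectJobSubmit",
}


def current_test_name(name: str) -> str:
    """Return the post-reorganization name, or the input if unchanged."""
    return RENAMED_TESTS.get(name, name)


print(current_test_name("org.sam.CREAMCE-DirectJobSubmit"))
```

 Names that are not in the table pass through unchanged, so the helper is safe to apply to every test name a script encounters.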
 |  
| VOs - GridPP VOMS VO IDs Approved VO table |  
| 
Tuesday 9 December 2013
 
 Backup VOMS server
 VO managers still need to check sites - scotgrid, northgrid, southgrid, londongrid and gridpp VOs were going first, but have not yet updated their status.
 Monday 2nd December 2013
 
 Instant UI - progress (thanks, Stephen Jones)
 Backup VOMS server
 Most VOs have updated the operations portal (scotgrid and southgrid to go).
 T2K.org have tested their resources and are happy they work. Other VOs have not tested them yet. 
 Monday 25th November 2013
 
 CVMFS progress - but not quite there yet. 
 6 VOs (cern@school, gridpp, na62, pheno, sno+, t2k.org) have updated their VOID card entries and updated the wiki.
 Storage
 Gfal2 - GGUS 99043, 99044, 99055, 99067 - not performant, but very interesting functionality
 Webdav now enabled on LFC@RAL and ports free from firewall - needs testing
 Tuesday 19 November 2013
 |  |