| 
| General updates |  
| 
 Tuesday 19th May
 
	There was a GDB last week. The summary is available.
	The summary of the pre-GDB about batch systems is available.
	GridPP contacts for other VOs established (these are a current priority). Contacts expected to provide weekly updates on progress and status. 
 DIRAC: Jens Jensen (-> Brian Davies?) – vo being created
 LIGO: Catalin Condurache – vo being created
 LOFAR: George Ryan
 LSST: Alessandra Forti
 LZ: David Colling
 UKQCD: Jeremy Coles
 
 	gLExec: Matt is redirecting effort from producing a relocatable gLExec tarball to writing a recipe that sites could follow. He comments that this is a lot more involved than he would like for a tarball install, but thinks that it is the only way to proceed with any confidence.
	gstat is not supported. Note this ticket.
	The network issues resolution process/procedure...
	Assessment of the impact on User Communities/NGIs of the EGI core activities 2015 (results uploaded to the meeting page)
	The EGI conference is taking place this week - link to the detailed agenda. 
	A reminder that the HEPSYSMAN & security training meeting is taking place 1st-3rd June. 
	STFC (through Catalin Condurache) are interested in investigating joining EGI Fed Cloud
 
 Tuesday 12th May
 Monday 11th May
 
 There was an EGI Operations Management Board (OMB) meeting on 30th April. 
 Operations updates:
 12 service types will be removed from GOC DB due to not being used. They are defined in GGUS 113432
 A list Tools-admins at mailman.egi.eu has been created for ops tools administrator discussion.
 EGI OLA period 1 May 2015 - 30 April 2016
 Security coordination moves to CERN after SNIC.
 Only NGI-Argus servers should accept Nagios probes
 What HPC facilities are available in NGIs for federating?
 Suggestion for common RC suspension process.
 EGI conference in Lisbon 18-22 May.
 FedCloud
 No stable monitoring tests. Proposal to create a new CLOUD-MON_CRITICAL (inc. eu.egi.cloud.APEL-Pub; eu.egi.cloud.OCCI-VM ...). 
 New sites IN2P3-IRES (FR) and NCG-INGRID-PT (PT). 2 others in process.
 EGI to provide capacity to instantiate virtual machines to run the computational tasks (on earth observation datasets) generated by the users of the ESA funded Terradue for the development of the e-Collaboration for Earth Observation (e-CEO) platform.
Auger moving to production on FedCloud.
 EGI CSIRT
 Concern about effort going into perfSONAR issues (cacti; web interface; shellshocked...)
 CRITICAL CVE handling. Want EGI CSIRT hook into site re-certification by NGIs.
 Have no way to probe specific WNs. Proposed pakiti client run manually. (More UK feedback given).
 EGI-CSIRT got reviewed by TI and certified according to maturity parameters. Looking to run review on sites/NGIs.
 UMD support for SL5/SL6
 Torque 4.2 is not backward compatible with 2.5.7. Update not recommended. Move to Torque 2.5.13 (patched by SVG) using the AppDB repository with highest priority.
 SL5 support aligned with RHEL5. In "Maintenance" until March 31, 2017 ... but >80% of sites are not using it anyway, and some sites are on SL7 and struggling with MW deployment.
 Supporting CentOS7 in UMD requires scheduling the end of SL5 support in UMD.
 EPEL7/CentOS7: 13 products are ready for EPEL7.
 No move from SL5 campaign foreseen. 
 60% of cloud sites base their cloud infrastructure on a RHEL-compatible distribution. Most of these are Ubuntu.
 Proposal: UMD4: September 2015. Decommissioning of SL5: March 2016.
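
 The Torque advice above can be sketched as a small version check. This is only an illustration: the version numbers come from the notes above, while the function names and the "downgrade"/"update"/"ok" labels are hypothetical, and how the installed version string is obtained is left to the site.

```python
import re

# Versions taken from the notes above.
INCOMPATIBLE_MAJOR = 4      # Torque 4.2.x is not backward compatible with 2.5.7
RECOMMENDED = (2, 5, 13)    # patched 2.5.13 build


def parse_version(text):
    """Return the leading dotted numeric part of a version string as a tuple of ints."""
    match = re.match(r"(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def torque_advice(installed):
    """Classify an installed Torque version (labels are hypothetical)."""
    version = parse_version(installed)
    if version[0] >= INCOMPATIBLE_MAJOR:
        return "downgrade"   # incompatible major release picked up
    if version < RECOMMENDED:
        return "update"      # older than the patched 2.5.13
    return "ok"
```

 For example, a package release string such as "4.2.10-1.el6" would be classified as "downgrade", since only the leading dotted number is compared.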
 ARGO Central Monitoring
 Deploy test central instance in May. Review results in June.
 High availability instances deployment in July (Croatia and Greece). Monitor during August.
 Switch A/R engine in September.
 Decommission NGI instances October 2015 (they can still be run for local alarms).
 EGI Strategy Summary
 See document. Basically: Expand cloud. Push 'commons' and open platforms.
 "Consider open science as a production and dissemination system that needs integrated, easy and fair access to several types of shared resources (physical, digital, intellectual), engaged communities that contribute to the process and collaborates in the management and stewardship of the resources, a suitable governance with rules to allow/exclude access, to resolve conflicts, and finally financial support for the long-term availability". 
 Tuesday 5th May
 
 It is a CMS week this week.
 A pre-GDB on batch systems is taking place next Tuesday 12th May. More T2 participation is sought. Still need to define T2 GDB rep. 
 CHEP'15 proceedings submissions due by May 17th.
 April A/R figures circulated. No real issues this month except getting UCL (VAC/Cloud only) site correctly monitored.
 |  
| WLCG Operations Coordination - Agendas |  
| 
Thursday 7th May
 
 The agenda. Minutes
 News: Alessandra will present the WLCG workshop conclusions at next week's GDB.
 Middleware news: UMD 3.12.0 released this week (fixes for ARGUS-PAP and dCache server)
 Middleware baselines: dCache 2.6.x removed. New version 2.10.28/ 2.12.8 of dCache. Sites should avoid simultaneous updates.
 Middleware issues: a major upgrade of Torque arrived in EPEL (from torque-2.5.7 to torque-4.2.10), which is not compatible with the standard EMI Torque installation. For sites that have already upgraded, the patched 2.5.13 version of Torque has been pushed to the EMI third-party repo so that they can downgrade.
 T0 & T1 upgrades: FTS 3.2.33 upgraded at CERN & RAL.
 T0 news: the batch HTCondor pilot is open for grid submission. Lower-than-usual WLCG availability figures in March for ATLAS and CMS - possible overload.
 T1 feedback: NTR
 T2 feedback: NTR
 OS support in UMD: Plans in EGI for CentOS7 support. 13 products are ready for EPEL7, but in general CentOS7 is not a viable option for sites. The release of UMD4 (supporting EPEL7 and Ubuntu) is foreseen for September 2015 and the decommissioning of SL5 for March 2016. It is likely that some products relevant for WLCG will not be ready for EPEL7 before 2016. The requirement for WLCG is to provide SL6 until the end of Run2, however, there are already offers for resources on CentOS7 and this is an incentive for experiments to validate their software on it.
 ALICE: CASTOR at CERN - some re-reco job instabilities. 
 ATLAS: ~running full. Considering increasing job lengths for all MCORE jobs. Need sites to provide MCORE resources. Rucio/FTS issue was discovered - fix via update. Tier-0 data and computing workflow fully commissioned.
 CMS: CMS production activities continue - several sites reported network saturation. Evaluating the use of selected "strong" Tier-2 sites to add computing capacity for DIGI-RECO. Plan to drop support of CRC32 checksum in CMS data transfer systems.
 LHCb: Various operational issues reported - CASTOR/CERN SRM access problems; other data access issues.
 gLExec: ATLAS 61 out of 94 sites. RAL, RALPP and TW-FTT issue was due to a bug in the pilot code that showed up with ARC CE + Condor sites.
 SHA-2:  old VOMS server aliases (lcg-)voms.cern.ch were removed on Tue Apr 28.
 RFC proxies: RFC proxy readiness to be followed up per experiment. SAM-Nagios proxy renewal code fix to support RFC proxies.
 Machine/Job features: NTR
 MW readiness: 10th meeting on 6th May (agenda). WG is making a check-point of goals and priorities. ARGUS testbed at CERN is set up and ready to start. Pakiti client requested at other test sites. 
 MC deployment: NTR
 IPv6: LHCb: DIRAC was made IPv6-compatible back in November, but testing only started in April. Issue found at CERN with a Python library (wrong IPv6 address returned).
 Network/Transfers WG: NTR
 HTTP deployment: perfSONAR - Security: NDT 3.7.0.1 was released. The latest perfSONAR Toolkit version that all sites should be running is 3.4.2-12.pSPS. Network performance incidents process put in place as was agreed at the last meeting. OSG/Datastore validation progressing well. Publishing results to message bus progressing, development has finalized for esmond2mq prototype. Recent meeting focussed on FTS performance. Next meeting 3rd June. Plan is to focus it on latency ramp up and proximity service.
 |  
| Tier-1 - Status Page |  
| 
Monday 18th May
 
 A reminder that there is a weekly Tier-1 experiment liaison meeting. 
 The agenda follows this format:
 1. Summary of Operational Status and Issues
 2. Highlights/summary of the Tier1 Monday operations meeting (Grid Services; Fabric; CASTOR and Other)
 3. Experiment plans and operational issues (CMS; ATLAS; LHCb; ALICE and Others)
 4. Special presentations
 5. Actions
 6. Highlights for Operations Bulletin Latest
 7. AoB
 
 Tuesday 12th May
 
 Remaining CREAM CEs were turned off last week.
 The problems with our primary network router are still being followed up - likely to be an intervention one morning next week (to be planned).
 We are planning an update to the version of the Oracle database behind Castor. Dates to be finalised.
 |  
| Storage & Data Management - Agendas/Minutes |  
| 
Tuesday 19th May
 Tuesday 21st April
 
 Has there been any Tier-1 contact with DiRAC?
 Proposal to set up an 'other VOs' users list. GridPP-Users is too tied to WLCG projects.
 Wednesday 15 April
 
 Backing up data from DiRAC to GridPP (tape)
 More case studies on supporting non-LHC VOs on GridPP: we have a lot of great stuff that can do great stuff - non-LHC VOs tend to have less regimented data models so maybe we need more case studies.
 |  
 
| Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06 |  
| 
Tuesday 12th May
 
 Issues noted with sync for Brunel, Liv, ECDF (see EGI ticket 113473). Message broker issues (memory related) are likely the underlying EGI problem.
 Need to check on VAC sync publishing.
 Tuesday 21st April
 
 (Slight) Accounting delays seen for: UCL; Sheffield; QMUL & RALPP. 
 Tuesday 14th April
 
 APEL delays for UCL; Sheffield; RALPP and Bristol
 |  
| Documentation - KeyDocs |  
| 
See the worst KeyDocs list for documents needing review now and the names of the responsible people.
 Tuesday 21st April
 
 The Approved VOs document has been updated to take account of changes to the Ops Portal VOID cards. For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503. Sites that support SNOPLUS.SNOLAB.CA should ensure that their configuration conforms to these settings: Approved VOs
 
 KeyDocs still need updating since agreements reached at last core ops meeting.
 
 New section in Wiki called "Project Management Pages".
 The idea is to cluster all Self-Edited Site Tracking Tables in here. Sites should keep entries in Current Activities up to date. Once a Self-Edited Site Tracking Table has served its purpose, PM to move it to Historical Archive or otherwise dispose of the table.
 |  
| Interoperation - EGI ops agendas |  
| 
Tuesday 21st April
 
 There was an EGI ops meeting on Monday 20th.
 David updated the UK SL5 response.
 Please review the agenda/minutes.
 Monday 9th March
 
 The agenda for February's EGI ops meeting is here. Minutes are here
 
 APEL 1.4.0
 Added Month and Year columns to primary key of CloudSummaries table in cloud schema.
 DPM-Xrootd 3.5.2 is in EPEL stable - this is the first version of the component compatible with xrootd4
 gLExec-wn - v. 1.2.3: lcmaps-plugins-c-pep 1.3.0-1 & mkgltempdir 0.0.5-1
 "The lcmaps-plugins-c-pep-1.3.0-1 preferably needs the argus-pep-api-c-2.3.0. This version will be released into EMI & UMD repositories in a near future."
 UMD 3.11.0 released on 16.02.2015, UMD 3.11.1 released on 4.03.2015
 lcg-CA 1.62 noted with an intention to broadcast these as they occur as opposed to monthly.
 EGI looking at the decommissioning of SL5, possibly by end of 2015, as a byproduct of adding CentOS 7 to UMD. NGIs to make a note if extended SL5 support is required.
 Vincenzo Spinoso has joined EGI Ops team from NGI_IT. Vincenzo will chair EGI Ops. 
 Next meeting is April 20th.
 
 |  
| Monitoring - Links MyWLCG |  
| 
 Tuesday 31st March
 Monday 7th December
 |  
| On-duty - Dashboard ROD rota |  
| 
Monday 11th May
 
 Rota responses awaited from Andrew and Daniela.
 Handover summary should be uploaded to the bulletin please.
 Tuesday 28th April
 
 Glasgow: A GLUE2 problem is transient and doesn't have a short-term solution (if the service status was checked a little more frequently it would help).  Currently on hold. IC sometimes see this too.
 
 UCL: No change to the on-going situation. UCL has hopped from one downtime to another this week. Note – AM visiting UCL this week to set up VAC. Services will be decommissioned after this step.
 Tuesday 21st April
 
 UCL have put themselves into a downtime until the 21st April. (Start of next week). Noted this in their outstanding tickets.
 Birmingham's availability has steadily recovered over the week - and the low availability ticket against them should be closable next week.
 
 |  
| Rollout Status WLCG Baseline |  
| 
Tuesday 12th May
 
 MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.
 Tuesday 17th March
 
 Daniela has updated the EMI-3 testing table (https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3). Please check it is correct for your site. We want a clear view of where we are contributing.
 There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.
 Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.
 
 |  
| Security - Incident Procedure Policies Rota |  
| 
Tuesday 19th May
 
 EGI SVG and CSIRT Advisory "Critical/Low?": "VENOM: QEMU vulnerability" (CVE-2015-3456)
 Issue with VM appliance - image ships with ...
 EGI SVG Advisory 'High' Risk - DIRAC SQL injection vulnerability [EGI-SVG-2014-7553]
 IGTF is about to release an update to the trust anchor repository (1.64)
 Tuesday 12th May
 |  
 | 
| Services - PerfSonar dashboard | GridPP VOMS |  
| 
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).
 Tuesday 12th May
 
 LHCOPN & LHCONE joint meeting at LBL June 1st & 2nd. Agenda taking shape.
 Tuesday 31st March
 Tuesday 10th March
 
 From the recent WLCG meeting, two slides (1 & 2) give the direction of the network monitoring and metrics progress: integration of perfSONAR event types into experiment monitoring and an architecture for data to get from RSV probes to client. Components described on slide 3.
 The next LHCOPN and LHCONE joint meeting will take place on Monday 1st and Tuesday 2nd of June 2015 in Berkeley (US) (hosted by LBL and ESnet).
 |  
| Tools - MyEGI Nagios |  
| 
Tuesday 17th February
 
 Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?
 Tuesday 27th January
 
 Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/
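
 The failover that was expected here can be sketched as trying a list of broker endpoints in order. This is an illustrative sketch only, not how the monitoring actually selects a broker: the two URLs are those quoted above, while the helper names and the plain TCP reachability check are assumptions.

```python
import socket
from urllib.parse import urlparse

# Broker endpoints quoted in the note above, in preference order.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened (hypothetical check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_broker(urls, reachable=tcp_reachable):
    """Return the first URL whose host and port pass the reachability check, else None."""
    for url in urls:
        parsed = urlparse(url)
        if reachable(parsed.hostname, parsed.port):
            return url
    return None
```

 Calling pick_broker(BROKERS) re-evaluates the list on every call, so a cached endpoint list would not delay the switch in this sketch. The reachable argument can be replaced with a stub for testing without network access.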
 
 |  
| VOs - GridPP VOMS VO IDs Approved VO table |  
| 
Tuesday 19th May
 
 There is a current priority for enabling/supporting our joining communities. 
 Tuesday 5th May
 
 We have a number of VOs to be removed. Dedicated follow-up meeting proposed.
 Tuesday 28th April
 
 For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk  and voms03.gridpp.ac.uk  have both been updated from 15003 to 15503.
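
 A site could check for stale port entries along the following lines. This is a rough sketch that assumes the client configuration uses vomses-style lines of five quoted fields (alias, host, port, server DN, VO name); the function name is hypothetical and the DN in the sample is a placeholder, not the real server DN.

```python
import shlex

EXPECTED_PORT = 15503   # new port from the note above
VO = "snoplus.snolab.ca"
HOSTS = {"voms02.gridpp.ac.uk", "voms03.gridpp.ac.uk"}


def stale_entries(vomses_text):
    """Return (host, port) pairs for the VO whose port differs from EXPECTED_PORT."""
    stale = []
    for line in vomses_text.splitlines():
        fields = shlex.split(line, comments=True)
        if len(fields) < 5:
            continue  # blank, comment-only, or malformed line
        _alias, host, port, _dn, vo = fields[:5]
        if vo.lower() == VO and host in HOSTS and int(port) != EXPECTED_PORT:
            stale.append((host, int(port)))
    return stale


# Sample input: the DN below is a placeholder for illustration only.
SAMPLE = '''
"snoplus.snolab.ca" "voms02.gridpp.ac.uk" "15003" "/placeholder/DN" "snoplus.snolab.ca"
"snoplus.snolab.ca" "voms03.gridpp.ac.uk" "15503" "/placeholder/DN" "snoplus.snolab.ca"
'''

for host, port in stale_entries(SAMPLE):
    print(f"{host} still uses port {port}, expected {EXPECTED_PORT}")
```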
 Tuesday 31st March
 
 LIGO are in need of additional support for debugging some tests.
 LSST now enabled on 3 sites. No 'own' CVMFS yet.
 |  
| Site Updates |  
| 
Tuesday 24th February
 
 Next review of status today.
 Tuesday 27th January
 
 Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster
 Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.
 Tuesday 2nd December 
 
 Multicore status. Queues available (63%)
 YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)
 NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)
 
 According to our table for cloud/VMs (26%)
 YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)
 NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)
 
 GridPP DIRAC jobs successful  (58%)
 YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)
 NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)
 
 IPv6 status
 Allocation - 42%
 YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)
 NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)
 
 Dual stack nodes - 21%
 YES: Brunel; IC; QMUL; Oxford (4)
 NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)
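
 The percentages quoted in this status summary are consistent with dividing each YES count by 19 sites and rounding to the nearest whole number. The total of 19 is an inference from the YES and NO counts listed (for example 12 + 7), not a figure stated in the notes. A quick check:

```python
TOTAL_SITES = 19  # assumption: inferred from the YES + NO counts listed above

# YES counts as given in the lists above.
yes_counts = {
    "Multicore queues": 12,
    "Cloud/VMs": 5,
    "GridPP DIRAC jobs": 11,
    "IPv6 allocation": 8,
    "IPv6 dual stack": 4,
}


def percent(yes, total=TOTAL_SITES):
    """Share of sites answering YES, rounded to the nearest whole percent."""
    return round(100 * yes / total)


summary = {name: percent(count) for name, count in yes_counts.items()}
for name, value in summary.items():
    print(f"{name}: {value}%")
```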
 
 Tuesday 21st October
 
 High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).
 Tuesday 9th September
 
 Intel announced the new generation of Xeon based on Haswell.
 
 |  |