| 
| General updates |  
| 
Tuesday 16th April
 
 There was an EGI OMB on Friday (agenda)
 It has been agreed that tickets stuck without a response after several reminders will be manually closed as 'unsolved' by GGUS (flow diagram).
 HEPiX is taking place this week in Bologna. (agenda)
 Monday 8th April
 
 As pointed out in Alessandra's email last week, aspects of the experiment computing model evolution are addressed in this DM presentation for ATLAS and CMS and in this one on LHCONE.
 Due to the Easter networking outages at RAL (4 days in total), APEL is still catching up. Some sites are publishing several months of data, and the server has gone from processing 6 million to 16 million events per day. Sites have been seen to time out while the consumer table is being optimized (a new issue); this is being investigated. Please be patient while the service catches up. WLCG and GridPP are aware that accounting data for March is not yet up to date (for the monthly and quarterly reports).
 On Thursday at the EGI Community Forum there is a talk on EMI-3 APEL - if you are a sysadmin and at the EGI CF that day please consider attending to give feedback on the approach. We can also get Will along to an ops meeting.
 
 |  
| WLCG Operations Coordination - Agendas |  
| 
Tuesday 16th April
 Extracts from the 11th April 2013 meeting minutes
 
 A new IPv6 compatibility task force is being created to test IPv6 within the experiments' frameworks. Site representatives are needed. Given the IPv6 effort in the UK, perhaps someone would like to join?
 Middleware (WLCG Baseline)
 There was a security release for CREAM; sites should upgrade to it.
 The baseline versions table now contains the versions of clients to deploy on UIs and WNs.
 EMI-3 has been released but no product is baseline yet; still, sites are free to upgrade services to EMI-3 (the WN needs more testing).
 A CERN WLCG repository will be created in the coming days. It can augment the EGI WLCG repository and/or serve for failover, and will serve various use cases (HEP_OSlibs, XrootD plugins).
Experiments
 CMS Requests to the Tier-2 sites
 Fair share allocations: 50% Role=production or Role=t1production, 40% Role=pilot, 10% remaining CMS
 Provide and publish 48h job queues
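 A fair share is a long-term usage target rather than a hard slot cap, but it can help to see what the CMS request above means for a given site. The sketch below, assuming a hypothetical number of CMS job slots, converts the 50/40/10 split into the number of slots each group would occupy on average when CMS fills its whole allocation. The function and variable names are illustrative only; mapping these groups onto real batch-system groups is site-specific.

```python
from typing import Dict

# Fair-share targets requested by CMS for Tier-2 sites (percent of the CMS allocation).
# Group labels are copied from the minutes; the batch-system mapping is site-specific.
CMS_FAIRSHARE: Dict[str, int] = {
    "Role=production or Role=t1production": 50,
    "Role=pilot": 40,
    "remaining CMS": 10,
}


def shares_to_slots(shares: Dict[str, int], cms_slots: int) -> Dict[str, int]:
    """Turn percentage shares into steady-state slot targets, rounding down.

    `cms_slots` is a hypothetical total of slots the site gives to CMS.
    """
    total = sum(shares.values())
    if total != 100:
        raise ValueError(f"fair shares add up to {total}%, expected 100%")
    return {group: cms_slots * pct // 100 for group, pct in shares.items()}
```

 For example, `shares_to_slots(CMS_FAIRSHARE, 200)` gives 100, 80 and 20 slots respectively. Rounding down may leave a few slots unassigned when the total is not a multiple of 10.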
CVMFS
 A SAM probe for CVMFS, currently in preparation, may be included in the experiment SAM suites. Is this enough for experiment testing?
glexec
 LHCb has to reimplement a good portion of DIRAC because it no longer works; no timeline has been given.
 The ATLAS PanDA implementation is ongoing and might be finished by the end of May.
squid
 There was a request to upgrade squid by the end of April to enable the new monitoring; however, the new monitoring isn't visible yet. CMS has already sent out instructions to their sites; ATLAS will do so when ready, hopefully once the monitoring becomes visible. The upgrade will require opening an additional port.
 Monday 8th April
 
 The dates of the next WLCG Operations Coordination meetings are: Thursday 11th and 25th April,  15:30 CEST.
 The agenda for Thursday is currently based on the standing items. Let Alessandra or Jeremy know if you have items you would like raised/discussed.
 |  
| Tier-1 - Status Page |  
| 
Tuesday 16th April
 
 Planned intervention on the database behind FTS/LFC services on Thursday was successful.
 Starting more extensive tests of alternative batch system (slurm).
 All new worker nodes in production. Now over 10K job slots (with hyperthreading).
 Part of the new disk purchase deployed (540 TB to AtlasDataDisk and 720 TB to CMSDisk).
 Investigations are ongoing into problems at batch job set-up.
 |  
| Storage & Data Management - Agendas/Minutes |  
| 
17 April
 
 Good buzz at EGI CF last week: excellent GridPP presence, loads of useful people to talk to.  We spent today's meeting comparing notes.
 Tuesday 9th April
 Monday 1st April
 
 DDN report - see slides circulated by Pete G.
 Wed 20 March 2013 
 
 Ruminated over the agenda items from last week's GDB
 EMI roadmap (dCache, and other things)
 FTS support for HTTP - we knew this but how do we make use of it now
 Storage accounting records (need an updated APEL)
 Work of storage group(s) on interfaces and protocols, and future furlongpebbles.
 RAL D1T0 evaluation.
 The evaluation seems to be settling on HDFS and CEPH, which will be run anyway.
 What about Lustre?
 Presentation to PMB next Monday, but no decision yet.
 
 |  
 
| Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06 |  
| 
Tuesday 12th March
 
 APEL publishing stopped for Lancaster, QMUL and ECDF
 Tuesday 12th February
 
 SL HS06 page shows some odd ratios. Steve says he now takes "HS06 cpu numbers direct from ATLAS" and his page does get stuck every now and then.
 An update of the metrics page has been requested. 
 |  
| Interoperation - EGI ops agendas |  
| 
Tuesday 9th April
 
 There was an EGI ops meeting on 3rd April.
 UMD/SR - note issues with CREAM in UMD-2 - also there's a new CREAM in EMI-2, with security updates.  Does anyone in the UK run CREAM from UMD-2 at the moment?
 EMI-2 WN tarball has passed SR. Expect a deadline for the upgrade soon. gLite 3.2 WN tarballs should be updated ASAP.
 EMI-3 WMS on SL6 doesn't work with Argus (GGUS 92773)
 EMI-3 VOMS critical issue; fix scheduled for 18th April.
 Only APEL and VOMS appear to have stopped supporting YAIM core in the early EMI-3 release.
 Tuesday 2nd April
 
 Minutes of the 20th March EGI ops meeting are available.
 
 |  
| Monitoring - Links MyWLCG |  
| 
Tuesday 9th April
 
 David C has material to present (Glasgow solutions to monitoring) but cannot make our Tuesday ops meeting. Looking at options.
 Tuesday 5th February
 
 Task will focus on probes and sharing of useful tools - suggestions and comment welcome
 
 Glasgow dashboard now packaged and can be downloaded here. 
 |  
| On-duty - Dashboard ROD rota |  
| 
Monday 15th April
 
 Many alarms because of a networking problem at the Tier-1 at the start of the week.
 Three sites have open EMI tickets.
 Monday 1st April
 
 A new GOCDB field related to the ROD email address was not populated. Emails should now reach the team.
 Tuesday 5th March 
 
 Handling tickets related to EMI-1 probes - what to expect.
 Recommendation with respect to upgrading CE (drain first)
 Tuesday 12th February
 
 Need all ROD members to complete availability survey for the rota.
 |  
| Rollout Status WLCG Baseline |  
| 
Tuesday 2nd April
 
 EMI-1 components should be out of production. Nagios probes will report critical this month. Services remaining (without special condition) beyond 30th April will need to be placed in downtime.
 Monday 4th March
 
 EMI early adopters list by component.
 Do we have a Staged Rollout list for EMI3?
 Tuesday 5th February
 References
 
 |  
| Security - Incident Procedure Policies Rota |  
| 
Tuesday 16th April
 
 Sites are continuing to upgrade their kernels to rectify CVE-2013-0871.   This vulnerability is still considered HIGH risk by EGI-CSIRT.
 Monday 8th April
 
 We have a number of site notifications from Pakiti. Please check your site summary.
 Tuesday 2nd April
 
 Reminder about ptrace kernel issue (CVE-2013-0871)
 Thanks to all those sites that took part in the security challenge
 Tuesday 5th March
 
 Two openafs vulnerabilities announced (CVE-2013-1794 and CVE-2013-1795).  Further details available at http://www.openafs.org/security.  Updated RPMS for SL5/6 available.
 
 |  
 | 
| Services - PerfSonar dashboard | GridPP VOMS |  
| 
Tuesday 9th April
 
 It is now getting urgent to configure and have enabled the backup VOMS instances at Oxford and Imperial. Please can we arrange a follow-up meeting (postponed last week as Daniela was out).
 Tuesday 2nd April
 
 Impending electrical work at Manchester - we need to commission the backup VOMS arrangement as soon as possible.
 Monday 18th February
 
 PerfSonar tests to BNL reveal poor rates for several sites since the upgrade.
 Tuesday 5th February
 
 NGS VOMS to be switched off this week
 |  
| Tickets |  
| 
Monday 15th April 2013 14.30 BST
26 open UK tickets this week; most seem to be in hand. Here are the ones that jump out.
I have an ill-timed appointment at the vet's so I might not make it to the meeting in time, but the important items are the gridpp.ac.uk ticket and the remaining three EMI-1 upgrade tickets, which need updating by the corresponding sites (Glasgow, Durham, RALPP).
 NGI/gridpp.ac.uk
https://ggus.eu/ws/ticket_info.php?ticket=93337 (15/4)
I wasn't sure where this one should be sent: the submitter is having certificate problems with the gridpp.ac.uk website, possibly due to out-of-date CA certs. Assigned (15/4). Update: Andrew sorted this out, and the user reports the problem solved. Looks like this can be closed.
 GLASGOW
https://ggus.eu/ws/ticket_info.php?ticket=93343 (15/4)
This ticket has been assigned to NGS-GLASGOW, which I'm almost certain is wrong; can one of the Glasgow chaps check and reassign it to themselves if I'm right? Assigned (15/4). Update: Gareth solved this one.
 EMI-1 upgrade
Only the DPM tickets at Glasgow and Durham, and the dCache ticket at RALPP, remain. There are special circumstances around all of them (DPM and dCache versioning is quite separate from the EMI number), but all three have requests for updates on them.
GLASGOW: https://ggus.eu/ws/ticket_info.php?ticket=92805
DURHAM: https://ggus.eu/ws/ticket_info.php?ticket=92804
RALPP: https://ggus.eu/ws/ticket_info.php?ticket=91997 Update: Chris has solved the ticket; although there are still errors on the dashboard, everything is upgraded.
 RHUL
https://ggus.eu/ws/ticket_info.php?ticket=92969 (29/3)
Biomed reported seeing negative used-space values for the RHUL DPM. Govind attempted to apply the old patch without success, and has opened a new ticket with the DPM devs: https://ggus.eu/tech/ticket_show.php?ticket=93026 In Progress (might want to set it On Hold if a new patch looks slow in coming) (10/4)
 Of interest
https://ggus.eu/ws/ticket_info.php?ticket=92498
I overlooked this one last week, but QMUL's ticket charting their upgrade to EMI-3 APEL might be of interest.
 |  
| Tools - MyEGI Nagios |  
| 
Tuesday 16th April
 
 Installation of DIRAC instances at IC pending return of Janusz.
 Tuesday 13th November
 
 Noticed two issues during the Tier-1 power cut. The SRM and direct CREAM submission tests use the top BDII defined in the Nagios configuration to query for the resource. These tests started to fail because the RAL top BDII was not accessible. They don't use BDII_LIST, so I cannot define more than one BDII. I am looking into how to make this more robust.
 
 The Nagios web interface was not accessible to a few users because GOCDB was down. This is a bug in SAM-Nagios and I have opened a ticket.
 Site availability has not been affected by this issue, because Nagios sends a warning alert when it cannot find the resource through the BDII.
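 One way to make such tests more robust is to try a list of top BDIIs in order and use the first one that answers, instead of a single configured host. The sketch below assumes a BDII_LIST-style value of comma-separated host:port entries (2170 being the usual BDII port); the function names are hypothetical and this is not how SAM-Nagios itself works. A TCP connect only shows that the port is open, so a real probe would go on to run an LDAP query against the chosen endpoint.

```python
import socket
from typing import Callable, Iterable, List, Optional


def parse_bdii_list(value: str) -> List[str]:
    """Split an assumed comma-separated 'host:port,host:port' string into entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def tcp_reachable(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to 'host:port' can be opened."""
    host, _, port = endpoint.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # Unreachable host, refused connection, timeout, or malformed entry.
        return False


def first_available_bdii(
    endpoints: Iterable[str],
    probe: Callable[[str], bool] = tcp_reachable,
) -> Optional[str]:
    """Walk the list in order and return the first endpoint the probe accepts."""
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    return None  # every BDII in the list failed
```

 A test would then call something like `first_available_bdii(parse_bdii_list("top-bdii-a.example.ac.uk:2170,top-bdii-b.example.org:2170"))` (hypothetical hosts) and only report a failure if the result is `None`. The `probe` argument is injectable so the failover logic can be checked without network access.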
 
 |  
| VOs - GridPP VOMS VO IDs Approved VO table |  
| 
Monday 8th April
 
 Please note Chris W is away this week.
 Information is being gathered for the Q1 2013 quarterly report. 
 Tuesday 2 April 2013 
 Monday 4th March 2013
 Monday 26th February 2013
 
 NGS VOMS server. Durham fixed. Last site is Glasgow, and I'm running tests now. Hopefully this should now be fixed https://ggus.eu/ws/ticket_info.php?ticket=90356 - note that this has taken 3 months to complete.  
 SNO+ reports lcg-cp timeouts for large files. I suspect this is a problem with the UI.  
 Issues with Proxy renewal. 
 Certificate for RAL myproxy server doesn't match advertised hostname (how does this work at all?).
 Other myproxy issues as well. GGUS#99105 GGUS#9172
 SNO+ Questions 
 
 Jobs appear to fail, but have uploaded output and it is in LFC
 
 MC production
 Want 2-3 people managing this
 Shifters monitoring sites and filing tickets 
 How best to manage certificates - currently upload two proxies to myproxy - one for jobs to renew and one for the UI to renew. 
 How best to do this - should they use a robot cert?
 
 |  |