DIRAC FileCatalog MetaData
From GridPP Wiki
This Wiki page has been frozen and will soon become obsolete. The current version can be found under https://github.com/ic-hep/gridpp-dirac-users/wiki.
Note: We don't have much experience with this yet. Use at your own risk and please report back, so we can improve the documentation.
The DIRAC FileCatalog supports two types of metadata: metadata for files and metadata for directories.
Metadata should always be indexed. Unfortunately, DIRAC currently allows you to create unindexed metadata. To avoid this, create the index first and only then attach the metadata, as follows:
Through the CLI
For a file:
dirac-dms-filecatalog-cli
create index:
FC:/gridpp/user/d/daniela.bauer>meta index -f testfiles int
Added metadata field testfiles of type int
meta show lists all the tags available for your VO (here: gridpp):
FC:/gridpp/user/d/daniela.bauer>meta show
FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
DirectoryMetaFields : {'JMMetaInt2': 'INT'}
attach metadata to files:
FC:/gridpp/user/d/daniela.bauer>meta set test-man testfiles 1
/gridpp/user/d/daniela.bauer/test-man {'testfiles': '1'}
FC:/gridpp/user/d/daniela.bauer>meta set test-qmul testfiles 1
/gridpp/user/d/daniela.bauer/test-qmul {'testfiles': '1'}
find all files that are associated with a certain metadata tag:
FC:/gridpp/user/d/daniela.bauer>find /gridpp testfiles=1
Query: {'testfiles': 1}
/gridpp/user/d/daniela.bauer/test-man
/gridpp/user/d/daniela.bauer/test-qmul
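Because DIRAC will accept unindexed metadata, it can help to check the meta show listing before running meta set. The following is a minimal sketch and not part of DIRAC: is_indexed is a hypothetical helper, and the dictionary layout is assumed to mirror the meta show output above.

```python
def is_indexed(meta_fields, name, kind="file"):
    """Return True if `name` is an indexed field of the given kind.

    meta_fields is assumed to have the two keys printed by `meta show`:
    'FileMetaFields' and 'DirectoryMetaFields'.
    """
    key = "FileMetaFields" if kind == "file" else "DirectoryMetaFields"
    return name in meta_fields.get(key, {})


# Field listing copied from the `meta show` output above.
fields = {
    "FileMetaFields": {
        "testfiles": "INT",
        "experiment": "VARCHAR(128)",
        "JMMetaInt3": "INT",
        "JMMetaInt": "INT",
    },
    "DirectoryMetaFields": {"JMMetaInt2": "INT"},
}

print(is_indexed(fields, "testfiles"))                  # True
print(is_indexed(fields, "testdir", kind="directory"))  # False
```

At this point in the walkthrough testdir has not been indexed yet, which is why the second check is negative.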
For a directory:
dirac-dms-filecatalog-cli
FC:/gridpp/user/d/daniela.bauer>meta index -d testdir int
Added metadata field testdir of type int
FC:/gridpp/user/d/daniela.bauer>meta show
FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
DirectoryMetaFields : {'JMMetaInt2': 'INT', 'testdir': 'INT'}
FC:/gridpp/user/d/daniela.bauer>meta set /gridpp/user/d/daniela.bauer testdir 1
/gridpp/user/d/daniela.bauer {'testdir': '1'}
'find' does not seem to work for directories. Hrmpf.
You can verify that the metadata is set on a directory by doing:
FC:/> meta get /gridpp/user/d/daniela.bauer
!testdir : 1
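The same directory steps can presumably be driven from Python. The sketch below is an outline under stated assumptions rather than a tested recipe: tag_directory is a hypothetical helper, client is expected to behave like the FileCatalogClient used in the API section, the '-d' flag is assumed to select a directory index in the same way '-f' selects a file index, and every call is assumed to return DIRAC's usual result dictionary with an 'OK' flag.

```python
def tag_directory(client, path, field, value, field_type="INT"):
    """Index `field` for directories if needed, then set it on `path`.

    `client` is any object offering getMetadataFields, addMetadataField
    and setMetadata, e.g. a FileCatalogClient (assumption).
    """
    res = client.getMetadataFields()
    if not res["OK"]:
        raise RuntimeError(res.get("Message", "getMetadataFields failed"))
    dir_fields = res["Value"].get("DirectoryMetaFields", {})

    # Create the index first so the metadata never ends up unindexed.
    if field not in dir_fields:
        res = client.addMetadataField(field, field_type, "-d")
        if not res["OK"]:
            raise RuntimeError(res.get("Message", "addMetadataField failed"))

    # Attach the value to the directory itself.
    res = client.setMetadata(path, {field: value})
    if not res["OK"]:
        raise RuntimeError(res.get("Message", "setMetadata failed"))
    return res
```

With a DIRAC UI sourced and a valid proxy, this would be called as tag_directory(FileCatalogClient(), "/gridpp/user/d/daniela.bauer", "testdir", 1). Passing the client in as an argument keeps the helper importable on machines without DIRAC installed.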
Through the API
#!/usr/bin/env python
"""
requires a DIRAC UI to be set up (source bashrc)
and a valid proxy: dirac-proxy-init -g [your vo here]_user
"""
from __future__ import print_function
# DIRAC does not work otherwise
from DIRAC.Core.Base import Script
Script.initialize()
# end of DIRAC setup
from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

def main():
    fcc = FileCatalogClient()

    # show available fields
    print(fcc.getMetadataFields())

    # create a new *file* (-f) index
    res = fcc.addMetadataField('testmetaapi', 'INT', '-f')
    print(res)

    # index a file or two with this metadata
    fcc.setMetadata("/gridpp/user/d/daniela.bauer/testfile_01.txt", {"testmetaapi": 2})
    fcc.setMetadata("/gridpp/user/d/daniela.bauer/test100mb.data", {"testmetaapi": 2})

    print("Testing the find command for testmetaapi = 2")
    print(fcc.findFilesByMetadata({"testmetaapi": 2}, "/gridpp"))


if __name__ == "__main__":
    main()
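The script above prints the raw return values. DIRAC calls conventionally return a dictionary with an 'OK' flag plus either a 'Value' or a 'Message' entry, so a small wrapper can make failures explicit. This is a sketch only: unwrap is a hypothetical helper, not a DIRAC function, and the two sample dictionaries are made up for illustration.

```python
def unwrap(result):
    """Return result['Value'], or raise if the DIRAC call reported failure."""
    if not result.get("OK"):
        raise RuntimeError(result.get("Message", "unknown DIRAC error"))
    return result.get("Value")


# Illustrative result dictionaries; real ones would come from fcc.* calls.
ok = {"OK": True, "Value": ["/gridpp/user/d/daniela.bauer/testfile_01.txt"]}
failed = {"OK": False, "Message": "illustrative error text"}

print(unwrap(ok))

try:
    unwrap(failed)
except RuntimeError as err:
    print("DIRAC call failed:", err)
```

In the script this would wrap each call, for example unwrap(fcc.findFilesByMetadata({"testmetaapi": 2}, "/gridpp")), so that a failed addMetadataField or setMetadata stops the run instead of passing unnoticed.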
The official DIRAC documentation on the topic can be found here.