Data Discovery (DataCite)
DataCite is an organization founded in Germany to make data more accessible and more useful. Its purpose is to develop and support methods to locate, identify, and cite data and other research objects. Specifically, DataCite develops and supports the standards behind persistent identifiers for data, and its members assign those identifiers. DataCite is a registration agency for Digital Object Identifiers (DOIs).
The DataCite Metadata Schema is a list of metadata elements chosen by DataCite for the accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions. It was created to help DOI users document the resources being assigned DOIs. The resource being identified can be of any kind, but it is typically a dataset. DataCite uses the term "dataset" in its broadest sense, to include not only numerical data but also any other research data outputs. The recommendation has three parts (termed spirals): mandatory concepts, recommended concepts, and optional concepts.
In the terminology we use (described below), DataCite is an organization that created a set of recommendations at three levels (described in the schema description document) and an XML schema (a dialect) for implementing those recommendations. The dialect is currently used in the DataCite search portal and to create DOI landing pages, and it can be useful to communities that are trying to improve the way they share metadata. The recommendations are useful for communities looking for expert guidance about which metadata elements support data discovery. Our work explores how those recommendations can also be useful for communities that already use other dialects.
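The relationship between recommendation levels, concepts, and dialects can be sketched as a small data structure. This is only an illustration of the terminology, not part of DataCite or of any tool: the `Concept` class and the `unmapped` helper are hypothetical names, and the three sample concepts and their paths are copied from the mandatory table in the next section.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A recommendation concept and the dialect paths that can hold it."""
    name: str
    level: str  # "mandatory", "recommended" or "optional"
    # Maps a dialect name (e.g. "DIF", "ISO") to its known paths.
    paths: dict[str, list[str]] = field(default_factory=dict)


# A small excerpt of the mandatory level (hypothetical container name).
CONCEPTS = [
    Concept("Resource Title", "mandatory", {
        "DIF": ["/dif:DIF/dif:Entry_Title"],
        "ISO": ["/*/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation"
                "/gmd:title/gco:CharacterString"],
    }),
    Concept("Naming Authority", "mandatory", {
        "ISO": ["/*/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation"
                "/gmd:identifier/gmd:MD_Identifier/gmd:authority"],
    }),
    Concept("Responsible Party Identifier", "mandatory", {}),
]


def unmapped(concepts: list[Concept], dialect: str,
             level: str = "mandatory") -> list[str]:
    """Names of concepts at `level` that have no known path in `dialect`."""
    return [c.name for c in concepts
            if c.level == level and not c.paths.get(dialect)]


if __name__ == "__main__":
    for dialect in ("DIF", "ISO"):
        print(dialect, "has no path for:", unmapped(CONCEPTS, dialect))
```

A community that already uses another dialect could use a listing like this to see which mandatory concepts its dialect cannot yet express.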
DataCite Metadata Schema for the Publication and Citation of Research Data - Mandatory
The DataCite Metadata Schema is a list of core metadata properties chosen for the accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions. The resource that is being identified can be of any kind, but it is typically a dataset. We use the term 'dataset' in its broadest sense. We mean it to include not only numerical data, but any other research data outputs.
Source: The DataCite Metadata Schema

Concept | Description | Dialect (Fit) Paths |
---|---|---|
Resource Identifier | Identifier for the resource described by the metadata | ADIwg /adiwg:project/adiwg:idinfo/adiwg:ids/adiwg:projguid DIF /dif:DIF/dif:Data_Set_Citation/dif:Dataset_DOI DCAT /dct:identifier ECHO /*/echo:DataSetId ECHO (1) /*/echo:ShortName \| /*/echo:LongName ECS /ecs:LocalGranuleID HDF5.1 /hdf5:HDF5-File/hdf5:RootGroup/hdf5:Group[@Name='METADATA']/hdf5:Group[@Name='INVENTORYMETADATA']/hdf5:Group[@Name='ProductSpecificMetadata']/hdf5:Attribute[@Name='identifier_file_uuid']/hdf5:Data/hdf5:DataFromFile HDF5.1 /hdf5:HDF5-File/hdf5:RootGroup/hdf5:Attribute[@Name='identifier_file_uuid']/hdf5:Data/hdf5:DataFromFile ISO /*/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:identifier/gmd:MD_Identifier/gmd:code/gco:CharacterString ISO-1 /mdb:MD_Metadata/mdb:identificationInfo/*/mri:citation/cit:CI_Citation/cit:identifier/mcc:MD_Identifier/mcc:code/gco:CharacterString THREDDS /thredds:catalog/thredds:dataset/@ID netCDF /nc:netcdf/nc:attribute[@nc:name=id]/@nc:value |
Naming Authority | The organization responsible for the maintenance of the identifier (namespace) Note: DIF and ECHO use the Global Change Master Directory (GCMD) keywords. In DCAT this is a controlled vocabulary with only one value (DOI) | HDF5.1 /hdf5:HDF5-File/hdf5:RootGroup/hdf5:Attribute[@Name='naming_authority']/hdf5:Data/hdf5:DataFromFile ISO /*/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:identifier/gmd:MD_Identifier/gmd:authority ISO-1 /mdb:MD_Metadata/mdb:identificationInfo/*/mri:citation/cit:CI_Citation/cit:identifier/mcc:MD_Identifier/mcc:codeSpace/gco:CharacterString THREDDS /thredds:catalog/thredds:dataset/@authority netCDF /nc:netcdf/nc:attribute[@nc:name=naming_authority]/@nc:value |
Author / Originator | The principal author of the resource Note: In CSW this concept is called Creator | DIF /dif:DIF/dif:Data_Set_Citation/dif:Dataset_Creator ECHO /echo:Contacts/echo:Contact[Role='Data Originator'] ECHO /echo:Contacts/echo:Contact[Role='Investigator'] ECHO /echo:Contacts/echo:Contact[Role='Producer'] ECHO /echo:Contacts/echo:Contact[Role='INVESTIGATOR'] ECS /ecs:Author EML concat(//eml:eml/eml:dataset/eml:creator/eml:individualName/eml:givenName/eml:text,//eml:eml/eml:dataset/eml:creator/eml:individualName/eml:surName/eml:text) FGDC /fgdc:metadata/fgdc:idinfo/fgdc:citation/fgdc:citeinfo/fgdc:origin HDF5.1 /hdf5:HDF5-File/hdf5:RootGroup/hdf5:Attribute[@Name='creator_name']/hdf5:Data/hdf5:DataFromFile ISO /*/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty[gmd:role/gmd:CI_RoleCode[normalize-space()="author"]] ISO /*/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty[gmd:role/gmd:CI_RoleCode[normalize-space()="originator"]] ISO-1 /mdb:MD_Metadata/mdb:identificationInfo/*/mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[cit:role/cit:CI_RoleCode[normalize-space()="author"]] ISO-1 /mdb:MD_Metadata/mdb:identificationInfo/*/mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[cit:role/cit:CI_RoleCode[normalize-space()="originator"]] THREDDS //thredds:dataset/thredds:creator/thredds:name netCDF /nc:netcdf/nc:attribute[@nc:name=creator_name]/@nc:value |
Responsible Party Identifier | A unique identifier for a person or an organization | |
Responsible Party Identifier Type | The type of a unique identifier for a person or an organization | |
Resource Title | A short description of the resource. The title should be descriptive enough so that when a user is presented with a list of titles the general content of the data set can be determined. | ADIwg /adiwg:project/adiwg:idinfo/adiwg:citation/adiwg:citeinfo/adiwg:title DIF /dif:DIF/dif:Entry_Title DIF /dif:DIF/dif:Data_Set_Citation/dif:Dataset_Title DCAT /dct:title Dryad dcterms:title ECHO /*/echo:ShortName>/*/echo:LongName ECS /*/ecs:ShortName > /*/ecs:LongName EML /eml:dataset/eml:title/eml:text FGDC /fgdc:metadata/fgdc:idinfo/fgdc:citation/fgdc:citeinfo/fgdc:title HDF5.1 /hdf5:HDF5-File/hdf5:RootGroup/hdf5:Attribute[@Name='title']/hdf5:Data/hdf5:DataFromFile HDF5.1 /hdf5:HDF5-File/hdf5:RootGroup/hdf5:Group[@Name='METADATA']/hdf5:Group[@Name='COLLECTIONMETADATA']/hdf5:Attribute[@Name='LongName']/hdf5:Data/hdf5:DataFromFile ISO /*/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString ISO-1 /mdb:MD_Metadata/mdb:identificationInfo/*/mri:citation/cit:CI_Citation/cit:title/gco:CharacterString UMM /umm:UMM/umm:CollectionCitation/umm:Title OGC-SOS /sos:Capabilities/ows:ServiceIdentification/ows:Title SERF /serf:SERF/serf:Entry_Title THREDDS /thredds:catalog/thredds:dataset/@name THREDDS /thredds:catalog/thredds:dataset/thredds:metadata/dc:title THREDDS //thredds:dataset[1]/@name netCDF /nc:netcdf/nc:attribute[@nc:name=title]/@nc:value |
Publisher | Publisher of the cited resource | DIF /dif:DIF/dif:Data_Set_Citation/dif:Dataset_Publisher DIF /dif:DIF/dif:Reference/dif:Publisher DCAT /dct:publisher FGDC /fgdc:metadata/fgdc:idinfo/fgdc:citation/fgdc:citeinfo/fgdc:pubinfo/fgdc:publish HDF5.1 /hdf5:HDF5-File/hdf5:RootGroup/hdf5:Attribute[@Name='publisher']/hdf5:Data/hdf5:DataFromFile ISO //gmd:CI_ResponsibleParty[gmd:role/gmd:CI_RoleCode[normalize-space()="publisher"]]/gmd:organisationName/gco:CharacterString ISO-1 //cit:CI_Responsibility[cit:role/cit:CI_RoleCode[normalize-space()="publisher"]]/cit:party/cit:CI_Organisation/cit:name/gco:CharacterString THREDDS //thredds:dataset/thredds:publisher/thredds:name THREDDS //thredds:metadata/thredds:publisher/thredds:name netCDF /nc:netcdf/nc:attribute[@nc:name=publisher_name]/@nc:value |
Resource Creation/Revision Date | The date the resource was created or revised | DIF /dif:DIF/dif:Data_Set_Citation/dif:Dataset_Release_Date DCAT /dct:issued ECHO /*/echo:InsertTime ECHO /*/echo:LastUpdate ECS /*/ecs:RevisionDate FGDC /fgdc:metadata/fgdc:idinfo/fgdc:citation/fgdc:citeinfo/fgdc:pubdate HDF5.1 /hdf5:HDF5-File/hdf5:RootGroup/hdf5:Attribute[@Name='date_created']/hdf5:Data/hdf5:DataFromFile ISO //gmd:CI_Citation/gmd:date/gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode[normalize-space()="creation"]]/gmd:date/gco:Date ISO //gmd:CI_Citation/gmd:date/gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode[normalize-space()="creation"]]/gmd:date/gco:DateTime ISO //gmd:CI_Citation/gmd:date/gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode[normalize-space()="revision"]]/gmd:date/gco:Date ISO //gmd:CI_Citation/gmd:date/gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode[normalize-space()="revision"]]/gmd:date/gco:DateTime ISO //gmd:CI_Citation/gmd:date/gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode[normalize-space()="publication"]]/gmd:date/gco:Date ISO //gmd:CI_Citation/gmd:date/gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode[normalize-space()="publication"]]/gmd:date/gco:DateTime ISO-1 //cit:CI_Citation/cit:date/cit:CI_Date[cit:dateType/cit:CI_DateTypeCode[normalize-space()="creation"]]/cit:date/gco:DateTime ISO-1 //cit:CI_Citation/cit:date/cit:CI_Date[cit:dateType/cit:CI_DateTypeCode[normalize-space()="revision"]]/cit:date/gco:DateTime ISO-1 //cit:CI_Citation/cit:date/cit:CI_Date[cit:dateType/cit:CI_DateTypeCode[normalize-space()="publication"]]/cit:date/gco:DateTime SERF /serf:SERF/serf:Service_Citation/serf:Release_Date THREDDS /thredds:catalog/thredds:metadata/thredds:date[@type='created'] netCDF /nc:netcdf/nc:attribute[@nc:name=date_created]/@nc:value |
xPath Note: The xPaths in this table use several wildcards. // matches any path, so //gmd:CI_ResponsibleParty indicates a gmd:CI_ResponsibleParty anywhere in an XML file. /*/ matches a single level that can hold one of several possible elements, usually one of several concrete realizations of an abstract object. For example, /*/gmd:identificationInfo could be gmd:MD_Metadata/gmd:identificationInfo or gmi:MI_Metadata/gmd:identificationInfo, and gmd:identificationInfo/*/gmd:descriptiveKeywords could be gmd:identificationInfo/gmd:MD_DataIdentification/gmd:descriptiveKeywords or gmd:identificationInfo/srv:SV_ServiceIdentification/gmd:descriptiveKeywords.
Fit: The fit of the dialect path with the concept is estimated on a scale of 1 = excellent two-way fit, 2 = one-way fit or some other problem, 3 = extension required.
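The wildcard behavior described in the xPath note can be tried out with a short Python sketch. It uses only the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath: the `*` and `//` wildcards work, but predicates that call functions such as `normalize-space()` need a full XPath 1.0 engine (for example lxml). The sample records, the "Example Org" name, and the helper names `make_record`, `title_of`, and `responsible_parties` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the ISO prefixes used in the table.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Resource Title path from the table, minus the leading "/*/": ElementTree
# paths are relative to the element searched, which here is the root itself.
TITLE_PATH = ("gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation"
              "/gmd:title/gco:CharacterString")


def make_record(root_tag: str, title: str) -> ET.Element:
    """Build a tiny, made-up ISO record with the given root element."""
    decls = " ".join(f'xmlns:{p}="{uri}"' for p, uri in NS.items())
    xml = f"""
<{root_tag} {decls}>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString>{title}</gco:CharacterString></gmd:title>
          <gmd:citedResponsibleParty>
            <gmd:CI_ResponsibleParty>
              <gmd:organisationName>
                <gco:CharacterString>Example Org</gco:CharacterString>
              </gmd:organisationName>
            </gmd:CI_ResponsibleParty>
          </gmd:citedResponsibleParty>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</{root_tag}>"""
    return ET.fromstring(xml.strip())


def title_of(root: ET.Element):
    """Resolve the title; '*' matches whichever identification element is present."""
    return root.findtext(TITLE_PATH, namespaces=NS)


def responsible_parties(root: ET.Element):
    """'.//' finds gmd:CI_ResponsibleParty at any depth below the root."""
    return root.findall(".//gmd:CI_ResponsibleParty", NS)


if __name__ == "__main__":
    # The same path works for both concrete root elements.
    for tag in ("gmd:MD_Metadata", "gmi:MI_Metadata"):
        record = make_record(tag, f"Sample title under {tag}")
        print(title_of(record), len(responsible_parties(record)))
```

Because the root element is matched by position rather than by name, one path serves both gmd:MD_Metadata and gmi:MI_Metadata records, which is the reason the table writes the first step as a wildcard.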