Documenting Data Centers and Distributors

From Earth Science Information Partners (ESIP)

Describing how to access data is an important role of metadata because, for many users, data access follows data discovery. Once users discover an interesting data set, they want to get the data so they can use it. This access information is usually provided as a description of a Data Center or a Distributor, contact information, and a set of links to the data.

ECHO

The ECHO metadata model includes several data access elements.

Organizations

<echo:ArchiveCenter/>
<echo:Contact>
   <echo:Role>distributor</echo:Role>
   <echo:HoursOfService/>
   <echo:Instructions/>
   <echo:OrganizationName/>
   <echo:OrganizationAddresses/>
   <echo:OrganizationPhones/>
   <echo:OrganizationEmails/>
   <echo:ContactPersons/>
</echo:Contact>

NASA GCMD Directory Interchange Format

Organizations

Organizations (and contacts) that are responsible for distributing data are "Data Centers" in the Directory Interchange Format. They have the following properties:

<dif:Data_Center uuid="UUID">
    <dif:Data_Center_Name/>
    <dif:Data_Center_URL/>
    <dif:Data_Set_ID/>
    <dif:Personnel/>
</dif:Data_Center>

The Data_Center field is required and can be repeated.

Download Details

Details about the downloads are included in the Distribution section of the metadata record. They include:

<dif:Distribution>
    <dif:Distribution_Media/>
    <dif:Distribution_Size/>
    <dif:Distribution_Format/>
    <dif:Fees/>
</dif:Distribution>

The Distribution field is highly recommended and may be repeated.

ISO 19115

Data distribution information is described in "gmd:MD_Distribution" sections in ISO 19115. They have the following properties:

<gmd:distributionInfo>
    <gmd:MD_Distribution>
        <gmd:distributionFormat/>
        <gmd:distributor/>
        <gmd:transferOptions/>
    </gmd:MD_Distribution>
</gmd:distributionInfo>

NetCDF Attribute Convention for Data Discovery

Information about Data Centers is included in a set of publisher attributes in the Attribute Convention for Data Discovery (ACDD):

<nc:attribute name="publisher_name"/>
<nc:attribute name="publisher_url"/>
<nc:attribute name="publisher_email"/>

THREDDS Catalog

Information about Data Centers is included in the publisher element of the THREDDS catalog metadata:

<thredds:metadata>
   <thredds:publisher>
      <thredds:name/>
      <thredds:contact url="url" email="email"/>
   </thredds:publisher>
</thredds:metadata>


Connections

The distribution information in these models overlaps significantly, so it is possible to represent the most important content in all three dialects and to do some translations without losing information. A few differences might be important in specific cases:

DIF to ISO

  1. The DIF model separates information about organizations (Data Centers) from information about people that work in these organizations (Personnel). The ISO model combines organizations, positions, and people into a single object (gmd:CI_ResponsibleParty or gmd:CI_Party in 19115-1). The information in the dif:Data_Center object is combined with information from the dif:Data_Center/dif:Personnel object into a single gmd:distributorContact object in the ISO model.
  2. The DIF model assigns roles to people and provides a list of standard role names (Investigator, Technical Contact, or DIF Author). The ISO model has a longer list of role names that includes distributor. In the translation from DIF to ISO, the dif:Personnel/dif:Role translates to gmd:positionName in order to preserve the DIF information as well as the standard ISO code.
  3. The Data_Center_Name includes a ShortName and LongName that must be selected from the GCMD Data Center Keyword list. A decision must be made about how these should be combined in the ISO gmd:organisationName. Currently the combination is gmd:organisationName = dif:ShortName > dif:LongName.
  4. The dif:Data_Set_ID included in the dif:Data_Center object identifies the dataset and is controlled by the Data Center. The inclusion of this attribute in the dif:Data_Center object makes it difficult to reuse Data Center information across multiple records and may make it more difficult to translate this information into different dialects. The translation to ISO separates the identifier from the distribution information. It becomes a gmd:MD_Identifier for the data set with an authority of the Data Center.
  5. The dif:Distribution_Format field holds a format name from the suggested Format Keywords list. The ISO gmd:MD_Format object includes a name that can match the DIF keyword along with a Version, a reference to the specification (19115-1), and other information.
  6. The dif:Distribution_Media holds a media name from the suggested Media Keywords. The ISO gmd:MD_Medium object includes a gmd:name element that can hold the dif:Distribution_Media information, along with a gmd:MD_MediumFormatCode codelist that provides a shared vocabulary for medium formats, and other information.
  7. Many-to-Many relationships: All three of the distribution elements (dif:Data_Center, dif:Distribution, and dif:Related_URL) are repeatable in a DIF record. If there is more than one of any of these elements, the relationships between them may not be clear. For example, if there are two dif:Distribution/dif:Distribution_Format objects and two dif:Data_Center objects, it is not clear how one could tell which format is distributed by which dif:Data_Center or, if there is more than one GET DATA URL, which format is available from which URL. These relationships are clear in the ISO model because related information is grouped in a distributionInfo or distributor object.
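
Items 1 through 4 can be sketched in a few lines of Python using the standard-library xml.etree.ElementTree module. This is only an illustration under stated assumptions, not the translation actually used: the namespace URI bound to the dif: prefix, the child element names dif:Short_Name, dif:Long_Name, and dif:Last_Name, the sample values, and the shape of the returned dictionary are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the dif: prefix; adjust it to match your records.
DIF_NS = {"dif": "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"}

# Illustrative record; every name and value here is invented for this sketch.
SAMPLE_DIF = """
<dif:DIF xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">
  <dif:Data_Center>
    <dif:Data_Center_Name>
      <dif:Short_Name>EXAMPLE/DAAC</dif:Short_Name>
      <dif:Long_Name>Example Distributed Active Archive Center</dif:Long_Name>
    </dif:Data_Center_Name>
    <dif:Data_Set_ID>example_dataset_v1</dif:Data_Set_ID>
    <dif:Personnel>
      <dif:Role>Technical Contact</dif:Role>
      <dif:Last_Name>Doe</dif:Last_Name>
    </dif:Personnel>
  </dif:Data_Center>
</dif:DIF>
"""


def text_of(element, path):
    """Return the stripped text found at path, or '' if it is missing."""
    return (element.findtext(path, default="", namespaces=DIF_NS) or "").strip()


def data_center_to_iso_fields(data_center):
    """Map one dif:Data_Center element to ISO-style field names."""
    short_name = text_of(data_center, "dif:Data_Center_Name/dif:Short_Name")
    long_name = text_of(data_center, "dif:Data_Center_Name/dif:Long_Name")
    return {
        # Items 1 and 3: organization and person merge into one contact,
        # with the name built as ShortName > LongName.
        "distributorContact": {
            "organisationName": f"{short_name} > {long_name}",
            "individualName": text_of(data_center, "dif:Personnel/dif:Last_Name"),
            # Item 2: the DIF role is preserved as the position name,
            # while the ISO role code is set to distributor.
            "positionName": text_of(data_center, "dif:Personnel/dif:Role"),
            "role": "distributor",
        },
        # Item 4: the data set identifier is separated from the contact
        # and carries the Data Center as its authority.
        "identifier": {
            "code": text_of(data_center, "dif:Data_Set_ID"),
            "authority": short_name,
        },
    }


root = ET.fromstring(SAMPLE_DIF)
for data_center in root.findall("dif:Data_Center", DIF_NS):
    print(data_center_to_iso_fields(data_center))
```

Returning a plain dictionary keeps the sketch short; a real translation would write these values into the corresponding ISO XML elements.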

ECHO to ISO

  1. The echo:ArchiveCenter is typically the same as the echo:OrganizationName in the echo:Contact with the echo:Role = "distributor".
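
Because this match is typical rather than guaranteed, a translator may want to check it before merging the two values. The following Python sketch shows one way to do that with the standard-library xml.etree.ElementTree module. The namespace URI urn:example:echo is hypothetical, the sample record is invented, and the sketch assumes that echo:Role carries its value as element text and that contacts sit under echo:Collection/echo:Contacts/echo:Contact.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only so the echo: prefix resolves.
ECHO_NS = {"echo": "urn:example:echo"}

# Illustrative record; the organization names are invented.
SAMPLE_ECHO = """
<echo:Collection xmlns:echo="urn:example:echo">
  <echo:ArchiveCenter>EXAMPLE_DAAC</echo:ArchiveCenter>
  <echo:Contacts>
    <echo:Contact>
      <echo:Role>distributor</echo:Role>
      <echo:OrganizationName>EXAMPLE_DAAC</echo:OrganizationName>
    </echo:Contact>
    <echo:Contact>
      <echo:Role>investigator</echo:Role>
      <echo:OrganizationName>Example University</echo:OrganizationName>
    </echo:Contact>
  </echo:Contacts>
</echo:Collection>
"""


def distributor_names(collection):
    """Return the organization names of all contacts whose role is distributor."""
    names = []
    for contact in collection.findall("echo:Contacts/echo:Contact", ECHO_NS):
        role = contact.findtext("echo:Role", default="", namespaces=ECHO_NS)
        if role.strip().lower() == "distributor":
            name = contact.findtext(
                "echo:OrganizationName", default="", namespaces=ECHO_NS
            )
            names.append(name.strip())
    return names


def archive_center_is_distributor(collection):
    """True if echo:ArchiveCenter equals some distributor's organization name."""
    archive = collection.findtext(
        "echo:ArchiveCenter", default="", namespaces=ECHO_NS
    ).strip()
    return bool(archive) and archive in distributor_names(collection)


collection = ET.fromstring(SAMPLE_ECHO)
print(distributor_names(collection))
print(archive_center_is_distributor(collection))
```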

Crosswalks

Each entry below gives the Concept, its Description, and the Dialect (Fit) and Path for each dialect.

Concept: Media
Description: Media on which the resource is available
Dialect (Fit) and Paths:
  ISO /gmd:MD_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:transferOptions/gmd:MD_DigitalTransferOptions/gmd:offLine/gmd:MD_Medium/gmd:name/gmd:MD_MediumNameCode/@codeListValue
  ISO-1 /mdb:MD_Metadata/mdb:distributionInfo/mrd:MD_Distribution/mrd:transferOptions/mrd:MD_DigitalTransferOptions/mrd:offLine/mrd:MD_Medium/mrd:name/cit:CI_Citation/cit:title/gco:CharacterString
  DIF /dif:DIF/dif:Distribution/dif:Distribution_Media

Concept: Resource Cost or Fees
Description: Cost associated with access to the resource
Dialect (Fit) and Paths:
  ISO /*/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributionOrderProcess/gmd:MD_StandardOrderProcess/gmd:fees/gco:CharacterString
  ISO-1 /mdb:MD_Metadata/mdb:distributionInfo/mrd:MD_Distribution/mrd:distributionFormat/mrd:MD_Format/mrd:formatDistributor/mrd:MD_Distributor/mrd:distributionOrderProcess/mrd:MD_StandardOrderProcess/mrd:fees/gco:CharacterString
  ECHO /*/echo:Price
  DIF /dif:DIF/dif:Distribution/dif:Fees

Concept: Resource Format
Description: The physical or digital manifestation of the resource
Dialect (Fit) and Paths:
  ISO /*/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorFormat/gmd:MD_Format/gmd:name/gco:CharacterString
  ISO /*/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributionFormat/gmd:MD_Format/gmd:name/gco:CharacterString
  ISO-1 /mdb:MD_Metadata/mdb:distributionInfo/mrd:MD_Distribution/mrd:distributionFormat/mrd:MD_Format
  ISO-1 /mdb:MD_Metadata/mdb:distributionInfo/mrd:MD_Distribution/mrd:distributor/mrd:MD_Distributor/mrd:distributorFormat/mrd:MD_Format
  ECHO /*/echo:DataFormat
  DIF /dif:DIF/dif:Distribution/dif:Distribution_Format

Concept: Transfer Size
Description: The size of the digital resource
Dialect (Fit) and Paths:
  ISO /gmd:MD_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributionFormat/gmd:MD_Format/gmd:formatDistributor/gmd:MD_Distributor/gmd:distributorTransferOptions/gmd:MD_DigitalTransferOptions/gmd:transferSize/gco:Real
  ISO /gmd:MD_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorTransferOptions/gmd:MD_DigitalTransferOptions/gmd:transferSize/gco:Real
  ISO-1 /mdb:MD_Metadata/mdb:distributionInfo/mrd:MD_Distribution/mrd:distributionFormat/mrd:MD_Format/mrd:formatDistributor/mrd:MD_Distributor/mrd:distributorTransferOptions/mrd:MD_DigitalTransferOptions/mrd:transferSize/gco:Real
  ISO-1 /mdb:MD_Metadata/mdb:distributionInfo/mrd:MD_Distribution/mrd:distributor/mrd:MD_Distributor/mrd:distributorTransferOptions/mrd:MD_DigitalTransferOptions/mrd:transferSize/gco:Real
  ECHO /echo:Granule/echo:DataGranule/echo:SizeMBDataGranule
  DIF /dif:DIF/dif:Distribution/dif:Distribution_Size

Concept: Distribution Contact
Description: Contact information for the organization or individual that distributes the resource.
Dialect (Fit) and Paths:
  ISO /gmd:MD_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorContact/gmd:CI_ResponsibleParty
  ISO-1 /mdb:MD_Metadata/mdb:distributionInfo/mrd:MD_Distribution/mrd:distributor/mrd:MD_Distributor/mrd:distributorContact/cit:CI_Responsibility
  ECHO /echo:Collection/echo:Contacts/echo:Contact
  DIF /dif:DIF/dif:Data_Center/dif:Personnel/dif:Role
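
The DIF paths for the four dif:Distribution concepts in this crosswalk can be applied directly to a record. The Python sketch below does so with the standard-library xml.etree.ElementTree module. The namespace URI bound to dif: and all sample values are assumptions for illustration. ElementTree does not accept absolute paths on an element, so the leading /dif:DIF/ step is stripped and each path is evaluated relative to the root.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the dif: prefix; adjust it to match your records.
DIF_NS = {"dif": "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"}

# Concept -> DIF path, copied from the crosswalk.
DIF_CROSSWALK = {
    "Media": "/dif:DIF/dif:Distribution/dif:Distribution_Media",
    "Resource Cost or Fees": "/dif:DIF/dif:Distribution/dif:Fees",
    "Resource Format": "/dif:DIF/dif:Distribution/dif:Distribution_Format",
    "Transfer Size": "/dif:DIF/dif:Distribution/dif:Distribution_Size",
}

# Illustrative record with two repeated Distribution sections (invented values).
SAMPLE_DIF = """
<dif:DIF xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">
  <dif:Distribution>
    <dif:Distribution_Media>Online</dif:Distribution_Media>
    <dif:Distribution_Size>1.2 GB</dif:Distribution_Size>
    <dif:Distribution_Format>NetCDF</dif:Distribution_Format>
    <dif:Fees>None</dif:Fees>
  </dif:Distribution>
  <dif:Distribution>
    <dif:Distribution_Media>DVD</dif:Distribution_Media>
    <dif:Distribution_Size>4 GB</dif:Distribution_Size>
    <dif:Distribution_Format>HDF</dif:Distribution_Format>
    <dif:Fees>Contact the data center</dif:Fees>
  </dif:Distribution>
</dif:DIF>
"""

ROOT_STEP = "/dif:DIF/"


def extract_concepts(root):
    """Return {concept: [values]} for every DIF path in the crosswalk."""
    found = {}
    for concept, path in DIF_CROSSWALK.items():
        # Drop the root step so the path is relative to the root element.
        relative_path = path[len(ROOT_STEP):]
        found[concept] = [el.text for el in root.findall(relative_path, DIF_NS)]
    return found


for concept, values in extract_concepts(ET.fromstring(SAMPLE_DIF)).items():
    print(concept, values)
```

Each concept maps to a list because dif:Distribution may be repeated. The lists stay in document order, but nothing in the record ties a given format to a given Data Center.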

xPath Note: The xPaths included in this table use several wildcards. // means any path, so //gmd:CI_ResponsibleParty indicates a gmd:CI_ResponsibleParty anywhere in an XML file. /*/ indicates a single level with several possible elements. This usually indicates one of several concrete realizations of an abstract object. For example, /*/gmd:identificationInfo could be gmd:MD_Metadata/gmd:identificationInfo or gmi:MI_Metadata/gmd:identificationInfo, and gmd:identificationInfo/*/gmd:descriptiveKeywords could be gmd:identificationInfo/gmd:MD_DataIdentification/gmd:descriptiveKeywords or gmd:identificationInfo/srv:SV_ServiceIdentification/gmd:descriptiveKeywords.

Fit: The fit of the dialect path with the concept is estimated on a scale of 1 = excellent two-way fit, 2 = one-way fit or some other problem, 3 = extension required.
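
The two wildcards described in the xPath note can be tried out with Python's standard-library xml.etree.ElementTree module, which supports a subset of XPath. The sketch below uses a skeletal, deliberately not schema-valid record invented for the example. ElementTree evaluates paths relative to an element, so //x is written as .//x, and the leading /*/ step corresponds to the root element itself.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the ISO 19139 family used by the prefixes in the note.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
    "srv": "http://www.isotc211.org/2005/srv",
}

# Skeletal record for illustration only; it is not schema-valid.
SAMPLE_ISO = """
<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
                 xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:srv="http://www.isotc211.org/2005/srv">
  <gmd:contact><gmd:CI_ResponsibleParty/></gmd:contact>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification><gmd:descriptiveKeywords/></gmd:MD_DataIdentification>
  </gmd:identificationInfo>
  <gmd:identificationInfo>
    <srv:SV_ServiceIdentification><gmd:descriptiveKeywords/></srv:SV_ServiceIdentification>
  </gmd:identificationInfo>
  <gmd:distributionInfo>
    <gmd:MD_Distribution>
      <gmd:distributor>
        <gmd:MD_Distributor>
          <gmd:distributorContact><gmd:CI_ResponsibleParty/></gmd:distributorContact>
        </gmd:MD_Distributor>
      </gmd:distributor>
    </gmd:MD_Distribution>
  </gmd:distributionInfo>
</gmi:MI_Metadata>
"""


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.split("}", 1)[1]


root = ET.fromstring(SAMPLE_ISO)

# "//gmd:CI_ResponsibleParty": the element at any depth.
all_parties = root.findall(".//gmd:CI_ResponsibleParty", NS)

# "/*/gmd:identificationInfo": works whether the root is gmd:MD_Metadata
# or gmi:MI_Metadata, because the path starts below the root element.
identification = root.findall("gmd:identificationInfo", NS)

# "gmd:identificationInfo/*/...": the * stands for one concrete element.
concrete_types = [local_name(e) for e in root.findall("gmd:identificationInfo/*", NS)]
keywords = root.findall("gmd:identificationInfo/*/gmd:descriptiveKeywords", NS)

print(len(all_parties), len(identification), concrete_types, len(keywords))
```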

Metadata Implementation