Attribute Convention for Data Discovery Object Conventions

From Earth Science Information Partners (ESIP)
Revision as of 22:41, February 6, 2014 by Graybeal

Object Conventions for Data Discovery

The ability to group data and metadata into objects has become an important organizational tool as the complexity of even "simple" datasets increases. Groups have been included in HDF for many years and in netCDF as part of the extended model for nearly as long. In addition to organizing data, these groups can be used to naturally organize metadata elements into commonly used objects.

This grouping capability could be very helpful for organizing the attributes defined in this convention into reusable objects. For example, at least twelve discovery attributes that are currently independent could be combined into a reusable citation object:

<attribute name="id" value="netcdf/attribute/@name=id"/>
<attribute name="title" value="netcdf/attribute/@name=title"/>
<attribute name="creator_email" value="netcdf/attribute/@name=creator_email"/>
<attribute name="creator_name" value="netcdf/attribute/@name=creator_name"/>
<attribute name="creator_url" value="netcdf/attribute/@name=creator_url"/>
<attribute name="institution" value="netcdf/attribute/@name=institution"/>
<attribute name="publisher_name" value="netcdf/attribute/@name=publisher_name"/>
<attribute name="publisher_url" value="netcdf/attribute/@name=publisher_url"/>
<attribute name="publisher_email" value="netcdf/attribute/@name=publisher_email"/>
<attribute name="date_created" value="netcdf/attribute/@name=date_created"/>
<attribute name="date_modified" value="netcdf/attribute/@name=date_modified"/>
<attribute name="date_issued" value="netcdf/attribute/@name=date_issued"/>

These attributes can be combined into:

<group name="citation" uuid="UUID">
  <attribute name="objectType" value="acdd:Citation"/>
  <attribute name="title" value="netcdf/attribute/@name=title"/>
  <attribute name="identifier" value="netcdf/attribute/@name=id"/>
  <group name="creationDate" uuid="UUID">
    <attribute name="objectType" value="acdd:Date"/>
    <attribute name="date" value="netcdf/attribute/@name=date_created"/>
    <attribute name="dateType" value="creation"/>
  </group>
  <group name="modificationDate" uuid="UUID">
    <attribute name="objectType" value="acdd:Date"/>
    <attribute name="date" value="netcdf/attribute/@name=date_modified"/>
    <attribute name="dateType" value="modified"/>
  </group>
  <group name="issuedDate" uuid="UUID">
    <attribute name="objectType" value="acdd:Date"/>
    <attribute name="date" value="netcdf/attribute/@name=date_issued"/>
    <attribute name="dateType" value="issued"/>
  </group>
  <group name="originator" uuid="UUID">
    <attribute name="objectType" value="acdd:ResponsibleParty"/>
    <attribute name="role" value="originator"/>
    <attribute name="individualName" value="netcdf/attribute/@name=creator_name"/>
    <attribute name="organisationName" value="netcdf/attribute/@name=institution"/>
    <attribute name="electronicMailAddress" value="netcdf/attribute/@name=creator_email"/>
    <group name="onlineResource" uuid="UUID">
      <attribute name="objectType" value="acdd:OnlineResource"/>
      <attribute name="linkage" value="netcdf/attribute/@name=creator_url"/>
      <attribute name="name" value="A title for the URL"/>
      <attribute name="description" value="A description of the URL"/>
      <attribute name="function" value="information"/>
    </group>
  </group>
  <group name="publisher" uuid="UUID">
    <attribute name="objectType" value="acdd:ResponsibleParty"/>
    <attribute name="role" value="publisher"/>
    <attribute name="individualName" value="netcdf/attribute/@name=publisher_name"/>
    <attribute name="organisationName" value="netcdf/attribute/@name=publisher_organization"/>
    <attribute name="electronicMailAddress" value="netcdf/attribute/@name=publisher_email"/>
    <group name="onlineResource" uuid="UUID">
      <attribute name="objectType" value="acdd:OnlineResource"/>
      <attribute name="linkage" value="netcdf/attribute/@name=publisher_url"/>
      <attribute name="name" value="A title for the URL"/>
      <attribute name="description" value="A description of the URL"/>
      <attribute name="function" value="information"/>
    </group>
  </group>
</group>

This example illustrates several ideas:

  1. The citation object contains several other types of objects: responsible parties and dates.
  2. The dates and responsible parties carry types and roles, which allow the standard to be extended with shared vocabularies for these elements rather than with new named attributes.
  3. Each object is identified by a UUID that has no semantics. This is proposed in order to avoid problems with multiple groups that share the same name.
  4. Each object has a type, proposed here in an acdd namespace that would describe these types and their semantics.
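
As a sketch of the second idea, a further responsible party could be described by reusing the same object type with a different role value, without defining any new attribute names. The group name, the placeholder values, and the choice of role are assumptions made for illustration only; pointOfContact is a term from the ISO 19115 CI_RoleCode list, which this draft does not explicitly adopt.

```xml
<!-- Hypothetical addition inside the citation group:
     a third party distinguished only by its role value -->
<group name="pointOfContact" uuid="UUID">
  <attribute name="objectType" value="acdd:ResponsibleParty"/>
  <attribute name="role" value="pointOfContact"/>
  <attribute name="individualName" value="The name of a contact person"/>
  <attribute name="organisationName" value="The name of a contact organization"/>
  <attribute name="electronicMailAddress" value="The email address of the contact"/>
</group>
```

Under the flat convention, the same information would need new named attributes, for example something like contact_name and contact_email (hypothetical names).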

A more complete example:

<?xml version='1.0' encoding='UTF-8'?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Metadata Reference -->
  <attribute name="naming_authority" value="netcdf/attribute/@name=naming_authority"/>
  <attribute name="Metadata_Conventions" value="netcdf/attribute/@name=Metadata_Conventions"/>
  <attribute name="Metadata_Link" value="netcdf/attribute/@name=Metadata_Link"/>
  <attribute name="metadata_link" value="netcdf/attribute/@name=metadata_link"/>
  <!-- Text Search -->
  <group name="citation" uuid="UUID">
    <attribute name="objectType" value="acdd:Citation"/>
    <attribute name="title" value="netcdf/attribute/@name=title"/>
    <attribute name="identifier" value="netcdf/attribute/@name=id"/>
    <group name="creationDate" uuid="UUID">
      <attribute name="objectType" value="acdd:Date"/>
      <attribute name="date" value="netcdf/attribute/@name=date_created"/>
      <attribute name="dateType" value="creation"/>
    </group>
    <group name="modificationDate" uuid="UUID">
      <attribute name="objectType" value="acdd:Date"/>
      <attribute name="date" value="netcdf/attribute/@name=date_modified"/>
      <attribute name="dateType" value="modified"/>
    </group>
    <group name="issuedDate" uuid="UUID">
      <attribute name="objectType" value="acdd:Date"/>
      <attribute name="date" value="netcdf/attribute/@name=date_issued"/>
      <attribute name="dateType" value="issued"/>
    </group>
    <group name="originator" uuid="UUID">
      <attribute name="objectType" value="acdd:ResponsibleParty"/>
      <attribute name="role" value="originator"/>
      <attribute name="individualName" value="netcdf/attribute/@name=creator_name"/>
      <attribute name="organisationName" value="netcdf/attribute/@name=institution"/>
      <attribute name="electronicMailAddress" value="netcdf/attribute/@name=creator_email"/>
      <group name="onlineResource" uuid="UUID">
        <attribute name="objectType" value="acdd:OnlineResource"/>
        <attribute name="linkage" value="netcdf/attribute/@name=creator_url"/>
        <attribute name="name" value="A title for the URL"/>
        <attribute name="description" value="A description of the URL"/>
        <attribute name="function" value="information"/>
      </group>
    </group>
    <group name="publisher" uuid="UUID">
      <attribute name="objectType" value="acdd:ResponsibleParty"/>
      <attribute name="role" value="publisher"/>
      <attribute name="individualName" value="netcdf/attribute/@name=publisher_name"/>
      <attribute name="organisationName" value="netcdf/attribute/@name=publisher_organization"/>
      <attribute name="electronicMailAddress" value="netcdf/attribute/@name=publisher_email"/>
      <group name="onlineResource" uuid="UUID">
        <attribute name="objectType" value="acdd:OnlineResource"/>
        <attribute name="linkage" value="netcdf/attribute/@name=publisher_url"/>
        <attribute name="name" value="A title for the URL"/>
        <attribute name="description" value="A description of the URL"/>
        <attribute name="function" value="information"/>
      </group>
    </group>
  </group>
  <attribute name="summary" value="netcdf/attribute/@name=summary"/>
  <group name="keywords" uuid="UUID">
    <attribute name="objectType" value="acdd:Keyword"/>
    <attribute name="type" value="theme"/>
    <attribute name="keyword"
      value="netcdf/attribute/@name=keywords1,netcdf/attribute/@name=keywords2,netcdf/attribute/@name=keywords3"/>
    <group name="projectKeywordThesaurus" uuid="UUID">
      <attribute name="objectType" value="acdd:Citation"/>
      <attribute name="title" value="netcdf/attribute/@name=keyword_vocabulary"/>
      <group name="revision" uuid="UUID">
        <attribute name="objectType" value="acdd:Date"/>
        <attribute name="date" value="2011-06-06"/>
        <attribute name="dateType" value="revision"/>
      </group>
    </group>
  </group>
  <attribute name="standard_name_vocabulary" value="netcdf/attribute/@name=standard_name_vocabulary"/>
  <attribute name="comment" value="netcdf/attribute/@name=comment"/>
  <group name="extent" uuid="UUID">
    <attribute name="objectType" value="acdd:Extent"/>
    <attribute name="description" value="Description of the extent"/>
    <attribute name="geographicIdentifier" value="Identifier of the extent"/>
    <attribute name="westBoundLongitude" value="netcdf/attribute/@name=geospatial_lon_min" type="float"/>
    <attribute name="eastBoundLongitude" value="netcdf/attribute/@name=geospatial_lon_max" type="float"/>
    <attribute name="southBoundLatitude" value="netcdf/attribute/@name=geospatial_lat_min" type="float"/>
    <attribute name="northBoundLatitude" value="netcdf/attribute/@name=geospatial_lat_max" type="float"/>
    <attribute name="beginPosition" value="netcdf/attribute/@name=time_coverage_start"/>
    <attribute name="endPosition" value="netcdf/attribute/@name=time_coverage_end"/>
    <attribute name="verticalMinimum" value="netcdf/attribute/@name=geospatial_vertical_min" type="float"/>
    <attribute name="verticalMaximum" value="netcdf/attribute/@name=geospatial_vertical_max" type="float"/>
  </group>
  <!-- Other Extent Information -->
  <attribute name="geospatial_lat_units" value="netcdf/attribute/@name=geospatial_lat_units"/>
  <attribute name="geospatial_lat_resolution" value="netcdf/attribute/@name=geospatial_lat_resolution" type="float"/>
  <attribute name="geospatial_lon_units" value="netcdf/attribute/@name=geospatial_lon_units"/>
  <attribute name="geospatial_lon_resolution" value="netcdf/attribute/@name=geospatial_lon_resolution" type="float"/>
  <attribute name="geospatial_vertical_units" value="netcdf/attribute/@name=geospatial_vertical_units"/>
  <attribute name="geospatial_vertical_resolution" value="netcdf/attribute/@name=geospatial_vertical_resolution" type="float"/>
  <attribute name="geospatial_vertical_positive" value="netcdf/attribute/@name=geospatial_vertical_positive"/>
  <attribute name="time_coverage_duration" value="netcdf/attribute/@name=time_coverage_duration"/>
  <attribute name="time_coverage_resolution" value="netcdf/attribute/@name=time_coverage_resolution"/>
  <attribute name="acknowledgment" value="netcdf/attribute/@name=acknowledgment"/>
  <group name="contributor" uuid="UUID">
    <attribute name="objectType" value="acdd:ResponsibleParty"/>
    <attribute name="role" value="netcdf/attribute/@name=contributor_role"/>
    <attribute name="individualName" value="netcdf/attribute/@name=contributor_name"/>
    <attribute name="organisationName" value="netcdf/attribute/@name=contributor_organization"/>
    <attribute name="electronicMailAddress" value="netcdf/attribute/@name=contributor_email"/>
    <group name="onlineResource" uuid="UUID">
      <attribute name="objectType" value="acdd:OnlineResource"/>
      <attribute name="linkage" value="netcdf/attribute/@name=contributor_url"/>
      <attribute name="name" value="A title for the URL"/>
      <attribute name="description" value="A description of the URL"/>
      <attribute name="function" value="information"/>
    </group>
  </group>
  <!--<attribute name="project" value="netcdf/attribute/@name=project1,netcdf/attribute/@name=project2"/>-->
  <group name="project" uuid="UUID">
    <attribute name="objectType" value="acdd:Keyword"/>
    <attribute name="type" value="project"/>
    <attribute name="keyword" value="Conventions for HDF"/>
    <group name="projectKeywordThesaurus" uuid="UUID">
      <attribute name="objectType" value="acdd:Citation"/>
      <attribute name="title" value="Title for project keyword thesaurus"/>
      <group name="revision" uuid="UUID">
        <attribute name="objectType" value="acdd:Date"/>
        <attribute name="date" value="2011-06-06"/>
        <attribute name="dateType" value="revision"/>
      </group>
    </group>
  </group>
  <attribute name="history" value="netcdf/attribute/@name=history"/>
  <group name="history" uuid="UUID">
    <group name="processStep" uuid="UUID">
      <attribute name="objectType" value="gmi:LE_ProcessStep"/>
      <attribute name="description" value="A description of the processing step"/>
      <attribute name="dateTime" value=""/>
      <group name="processor" uuid="UUID">
        <attribute name="objectType" value="acdd:ResponsibleParty"/>
        <attribute name="role" value="processor"/>
        <attribute name="individualName" value="The name of a processing person"/>
        <attribute name="organisationName" value="The name of a processing organization"/>
        <attribute name="positionName" value="The name of a processing position"/>
        <attribute name="electronicMailAddress" value="The email address of the processor"/>
      </group>
      <attribute name="source" value="UUID,UUID,UUID"/>
      <group name="processingInformation" uuid="UUID">
        <attribute name="objectType" value="gmi:LE_Processing"/>
        <attribute name="identifier" value="A unique identifier for the processing"/>
        <group name="algorithm">
          <attribute name="description"
            value="A brief description of the algorithm used in this step"/>
          <group name="citation" uuid="UUID">
            <attribute name="objectType" value="acdd:Citation"/>
            <attribute name="title" value="The title of the algorithm document"/>
            <attribute name="identifier" value="A unique identifier for the algorithm"/>
            <group name="creationDate" uuid="UUID">
              <attribute name="objectType" value="acdd:Date"/>
              <attribute name="date" value="A date associated with the algorithm"/>
              <attribute name="dateType" value="The type of the date"/>
            </group>
            <group name="citedResponsibleParty" uuid="UUID">
              <attribute name="objectType" value="acdd:ResponsibleParty"/>
              <attribute name="role" value="The role of a responsible party"/>
              <attribute name="individualName" value="The name of a responsible party"/>
            </group>
          </group>
        </group>
      </group>
      <attribute name="output" value="UUID,UUID,UUID"/>
    </group>
  </group>
  <!-- Other Attributes -->
  <attribute name="processing_level" value="netcdf/attribute/@name=processing_level"/>
  <attribute name="license" value="netcdf/attribute/@name=license"/>
  <attribute name="cdm_data_type" value="netcdf/attribute/@name=cdm_data_type"/>
  <dimension name="altitude" length="/netcdf/dimension/@name=altitude/@length"/>
  <dimension name="lat" length="/netcdf/dimension/@name=lat/@length"/>
  <dimension name="lon" length="/netcdf/dimension/@name=lon/@length"/>
  <dimension name="time" length="/netcdf/dimension/@name=time/@length"/>
  <variable name="lat" shape="lat" type="double">
    <attribute name="coverage_content_type" value="coordinate"/>
    <attribute name="_CoordinateAxisType" value="Lat"/>
    <attribute name="actual_range" type="double" value="-90.0 90.0"/>
    <attribute name="coordsys" value="geographic"/>
    <attribute name="fraction_digits" type="int" value="4"/>
    <attribute name="long_name" value="Latitude"/>
    <attribute name="point_spacing" value="even"/>
    <attribute name="standard_name" value="latitude"/>
    <attribute name="units" value="degrees_north"/>
    <attribute name="axis" value="Y"/>
  </variable>
  <variable name="lon" shape="lon" type="double">
    <attribute name="coverage_content_type" value="coordinate"/>
    <attribute name="_CoordinateAxisType" value="Lon"/>
    <attribute name="actual_range" type="double" value="0.0 360.0"/>
    <attribute name="coordsys" value="geographic"/>
    <attribute name="fraction_digits" type="int" value="4"/>
    <attribute name="long_name" value="Longitude"/>
    <attribute name="point_spacing" value="even"/>
    <attribute name="standard_name" value="longitude"/>
    <attribute name="units" value="degrees_east"/>
    <attribute name="axis" value="X"/>
  </variable>
  <variable name="time" shape="time" type="double">
    <attribute name="coverage_content_type" value="coordinate"/>
    <attribute name="actual_range" type="double" value="1.3121568E9 1.3121568E9"/>
    <attribute name="fraction_digits" type="int" value="0"/>
    <attribute name="long_name" value="Centered Time"/>
    <attribute name="units" value="seconds since 1970-01-01T00:00:00Z"/>
    <attribute name="standard_name" value="time"/>
    <attribute name="axis" value="T"/>
    <attribute name="_CoordinateAxisType" value="Time"/>
  </variable>
  <variable name="altitude" shape="altitude" type="double">
    <attribute name="actual_range" type="double" value="0.0 0.0"/>
    <attribute name="fraction_digits" type="int" value="0"/>
    <attribute name="long_name" value="Altitude"/>
    <attribute name="positive" value="up"/>
    <attribute name="standard_name" value="altitude"/>
    <attribute name="units" value="m"/>
    <attribute name="axis" value="Z"/>
    <attribute name="_CoordinateAxisType" value="Height"/>
    <attribute name="_CoordinateZisPositive" value="up"/>
  </variable>
  <variable name="variable_1" type="netcdf/variable/@type" shape="netcdf/variable/@shape">
    <attribute name="units" value="netcdf/variable/attribute/@name=units"/>
    <attribute name="long_name" value="netcdf/variable/attribute/@name=long_name"/>
    <attribute name="standard_name" value="netcdf/variable/attribute/@name=standard_name"/>
    <attribute name="coverage_content_type" value="physicalMeasurement"/>
  </variable>
  <variable name="variable_2" type="netcdf/variable2/@type" shape="netcdf/variable2/@shape">
    <attribute name="long_name" value="netcdf/variable2/attribute/@name=long_name"/>
    <attribute name="standard_name" value="netcdf/variable2/attribute/@name=standard_name"/>
  </variable>
  <variable name="qualityVariable_1" type="netcdf/qualityVariable1/@type"
    shape="netcdf/qualityVariable1/@shape">
    <attribute name="units" value="netcdf/qualityVariable1/attribute/@name=units"/>
    <attribute name="long_name" value="netcdf/qualityVariable1/attribute/@name=long_name"/>
    <attribute name="standard_name" value="netcdf/qualityVariable1/attribute/@name=standard_name"/>
  </variable>
  <variable name="qualityVariable_2" type="netcdf/qualityVariable2/@type"
    shape="netcdf/qualityVariable2/@shape">
    <attribute name="units" value="netcdf/qualityVariable2/attribute/@name=units"/>
    <attribute name="long_name" value="netcdf/qualityVariable2/attribute/@name=long_name"/>
    <attribute name="standard_name" value="netcdf/qualityVariable2/attribute/@name=standard_name"/>
    <attribute name="coverage_content_type" value="qualityInformation"/>
  </variable>
  <variable name="modelResult_1" type="netcdf/modelResult1/@type"
    shape="netcdf/modelResult1/@shape">
    <attribute name="units" value="netcdf/modelResult1/attribute/@name=units"/>
    <attribute name="long_name" value="netcdf/modelResult1/attribute/@name=long_name"/>
    <attribute name="standard_name" value="netcdf/modelResult1/attribute/@name=standard_name"/>
    <attribute name="coverage_content_type" value="modelResult"/>
  </variable>
</netcdf>

Both of these examples are first drafts. They will certainly be improved with input from the group.