ISO Lineage

From Earth Science Information Partners (ESIP)
Revision as of 15:59, September 29, 2017 by Jkozimor

Tracking data sources and the processing applied to them is becoming increasingly important as scientists seek to identify trends and unexpected changes in the environment. Recording these transformations and processing steps, generally termed lineage, is an important role of high-quality metadata. The ISO metadata standard provides a simple lineage model based on sources that are either used or produced in a series of process steps. Sources and process steps are linked together to describe the lineage of a resource. Despite its simplicity, this model is helpful in many cases.

The Model

[Figure: overview of the ISO lineage model, which links sources to process steps.]

Sources can be input to, or output from (19115-2), a process step. Each process step has associated processing and algorithm information (also added in 19115-2). These additions make it important to use 19115-2 if you need good lineage descriptions.

[Figure: ISO 19115-2 Lineage UML.]

The UML model used by the ISO Standard describes lineage in more detail and includes the definitions of sources and processSteps. In some cases, a simple descriptive statement can describe the lineage effectively. In more complex cases, multiple sources and process steps might be required. The capabilities to specify the spatial and temporal extent of a source and to describe the rationale for a process step are new in the ISO Standard. Note that each source can have any number of associated sourceSteps and that each processStep can have any number of sources (and outputs in ISO 19115-2).

Sources and Steps

The original ISO 19115 Source descriptions (LI_Source) were extended in 19115-2 to include several more elements. The LE_Source includes the following elements:

LE_Source
+ description[0..1]: CharacterString
+ scaleDenominator[0..1]: MD_RepresentativeFraction
+ sourceReferenceSystem[0..1]: MD_ReferenceSystem
+ sourceCitation[0..1]: CI_Citation
+ sourceExtent[0..*]: EX_Extent
+ processedLevel[0..1]: MD_Identifier
+ resolution[0..1]: LE_NominalResolution

Process steps (LE_ProcessStep) and their related classes include:

LE_ProcessStep
+ description: CharacterString
+ rationale[0..1]: CharacterString
+ dateTime[0..1]: DateTime
+ processor[0..*]: CI_ResponsibleParty
+ processingInformation[0..*]: LE_Processing
+ report[0..*]: LE_ProcessStepReport

LE_Processing
+ identifier: MD_Identifier
+ softwareReference[0..*]: CI_Citation
+ procedureDescription[0..1]: CharacterString
+ documentation[0..*]: CI_Citation
+ runTimeParameters[0..1]: CharacterString
+ algorithm[0..*]: LE_Algorithm

LE_Algorithm
+ citation: CI_Citation
+ description: CharacterString

LE_ProcessStepReport
+ name: CharacterString
+ description[0..1]: CharacterString
+ fileType[0..1]: CharacterString

The ISO lineage model is simple but is probably sufficient for many common processing scenarios. In complex processing scenarios it may provide only summary information. The CI_Citations in LE_Source, LE_Processing, and LE_Algorithm help here: the resources referenced by these citations can provide more detail when necessary.
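To make the cardinalities above more concrete, the model can be sketched as a handful of Python dataclasses. This is only an illustrative reduction, not part of the ISO standard or of any library: the class and field names are Python-style renderings of the UML names, CI_Citation and MD_Identifier are collapsed to simple stand-ins, several elements are omitted, and every value in the example is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    """Stand-in for CI_Citation; only a title is kept in this sketch."""
    title: str


@dataclass
class Algorithm:
    """LE_Algorithm: citation and description are both mandatory."""
    citation: Citation
    description: str


@dataclass
class Processing:
    """LE_Processing: identifier is mandatory, the rest optional or repeatable."""
    identifier: str  # MD_Identifier simplified to a string (assumption)
    software_reference: List[Citation] = field(default_factory=list)
    procedure_description: Optional[str] = None
    run_time_parameters: Optional[str] = None
    algorithm: List[Algorithm] = field(default_factory=list)


@dataclass
class Source:
    """LE_Source, reduced to a few of the listed elements."""
    description: Optional[str] = None
    source_citation: Optional[Citation] = None
    processed_level: Optional[str] = None


@dataclass
class ProcessStep:
    """LE_ProcessStep with its source (input) and output roles."""
    description: str
    rationale: Optional[str] = None
    source: List[Source] = field(default_factory=list)
    output: List[Source] = field(default_factory=list)
    processing_information: List[Processing] = field(default_factory=list)


def steps_using(source, steps):
    """Derive the sourceStep role: the steps that take `source` as input."""
    return [step for step in steps if any(s is source for s in step.source)]


# Hypothetical example: one step that turns a raw source into a calibrated one.
raw = Source(description="raw swath (hypothetical)")
calibrated = Source(description="calibrated swath (hypothetical)")
calibrate = ProcessStep(
    description="Calibrate the raw swath (hypothetical)",
    source=[raw],
    output=[calibrated],
    processing_information=[
        Processing(
            identifier="calibration-v1 (hypothetical)",
            algorithm=[
                Algorithm(Citation("Calibration notes (hypothetical)"),
                          "linear gain and offset")
            ],
        )
    ],
)

print([step.description for step in steps_using(raw, [calibrate])])
```

In this sketch the sourceStep role is derived from the steps rather than stored on the source, which avoids circular references between the two classes. The XML encoding instead stores both directions as references.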

XML Implementation

Implementing these relationships in XML can seem daunting, but it is accomplished using ids and references. The LE_Source and LE_ProcessStep objects (the boxes in the UML) are implemented as independent children of the LI_Lineage object, each with a unique identifier. The relationships between the boxes (the source, output, and sourceStep roles) are implemented as references to those identifiers.

The example below shows the lineage section of a DART metadata record. This DART dataset is made up of data from three different deployments. Each of these is listed as a source in the second part of the lineage section. Each source includes:

   * an id (D165_1999, D165_2000, and D165_2001),
   * a spatial and temporal extent defined using a reference to a full description in a different part of the record (e.g. xlink:href="#Extent_D165_2001"), and
   * a sourceStep, which is also defined by a reference to a full definition located in the first part of the lineage section (e.g. xlink:href="#Received_D165_2001").

The processing of each source is described in the first part of the lineage section. In this case, the process is the receipt of the data by the archive. The processSteps include:

   * a brief description of the process,
   * when it was done,
   * who did it, defined by a reference to the seriesMetadataContact defined elsewhere in the record, and
   * a reference to the source that was processed (e.g. gmd:source xlink:href="#D165_1999").

Note the use of ids within this record to identify sources and process steps and to make links between them.

<gmd:dataQualityInfo>
   <gmd:DQ_DataQuality>
      <gmd:scope>
         <gmd:DQ_Scope id="datasetScope">
            <gmd:level>
               <gmd:MD_ScopeCode codeList="./resources/codeList.xml#MD_ScopeCode" codeListValue="dataset"/>
            </gmd:level>
            <gmd:extent xlink:href="#boundingExtent"/>
         </gmd:DQ_Scope>
      </gmd:scope>
      <gmd:lineage xlink:title="Dart Buoy D165 Processing">
         <gmd:LI_Lineage uuid="95BD4CCC-D27D-8DE4-E040-0AC8C5BB43B64">
            <gmd:statement>
               <gco:CharacterString>Dart Buoy D165 Processing</gco:CharacterString>
            </gmd:statement>
            <gmd:processStep>
               <gmd:LI_ProcessStep id="Received_D165_1999">
                  <gmd:description>
                     <gco:CharacterString>Received edited data D165_1999-ed</gco:CharacterString>
                  </gmd:description>
                  <gmd:dateTime>
                     <gco:DateTime>2005-09-02T00:00:00</gco:DateTime>
                  </gmd:dateTime>
                  <gmd:processor xlink:href="#seriesMetadataContact"/>
                  <gmd:source xlink:href="#D165_1999"/>
               </gmd:LI_ProcessStep>
            </gmd:processStep>
            <gmd:processStep>
               <gmd:LI_ProcessStep id="Received_D165_2000">
                  <gmd:description>
                     <gco:CharacterString>Received edited data D165_2000-ed</gco:CharacterString>
                  </gmd:description>
                  <gmd:dateTime>
                     <gco:DateTime>2005-09-02T00:00:00</gco:DateTime>
                  </gmd:dateTime>
                  <gmd:processor xlink:href="#seriesMetadataContact"/>
                  <gmd:source xlink:href="#D165_2000"/>
               </gmd:LI_ProcessStep>
            </gmd:processStep>
            <gmd:processStep>
               <gmd:LI_ProcessStep id="Received_D165_2001">
                  <gmd:description>
                     <gco:CharacterString>Received edited data D165_2001-ed</gco:CharacterString>
                  </gmd:description>
                  <gmd:dateTime>
                     <gco:DateTime>2005-09-02T00:00:00</gco:DateTime>
                  </gmd:dateTime>
                  <gmd:processor xlink:href="#seriesMetadataContact"/>
                  <gmd:source xlink:href="#D165_2001"/>
               </gmd:LI_ProcessStep>
            </gmd:processStep>
            <gmd:source>
               <gmd:LI_Source id="D165_1999">
                  <gmd:description>
                     <gco:CharacterString>gov.noaa.ngdc.dart:D165_1999</gco:CharacterString>
                  </gmd:description>
                  <gmd:sourceExtent xlink:href="#Extent_D165_1999"/>
                  <gmd:sourceStep xlink:href="#Received_D165_1999"/>
               </gmd:LI_Source>
            </gmd:source>
            <gmd:source>
               <gmd:LI_Source id="D165_2000">
                  <gmd:description>
                     <gco:CharacterString>gov.noaa.ngdc.dart:D165_2000</gco:CharacterString>
                  </gmd:description>
                  <gmd:sourceExtent xlink:href="#Extent_D165_2000"/>
                  <gmd:sourceStep xlink:href="#Received_D165_2000"/>
               </gmd:LI_Source>
            </gmd:source>
            <gmd:source>
               <gmd:LI_Source id="D165_2001">
                  <gmd:description>
                     <gco:CharacterString>gov.noaa.ngdc.dart:D165_2001</gco:CharacterString>
                  </gmd:description>
                  <gmd:sourceExtent xlink:href="#Extent_D165_2001"/>
                  <gmd:sourceStep xlink:href="#Received_D165_2001"/>
               </gmd:LI_Source>
            </gmd:source>
         </gmd:LI_Lineage>
      </gmd:lineage>
   </gmd:DQ_DataQuality>
</gmd:dataQualityInfo>
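
The id and reference mechanism used in this record can be checked mechanically. The following is a minimal sketch, using only Python's standard library, that indexes the id attributes in a lineage fragment and reports whether each local xlink:href points at one of them. The fragment is a trimmed copy of the first deployment above with namespace declarations added so that it parses on its own. The function names are hypothetical and the sketch is not an ISO validation tool.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xlink:href attribute.
HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed copy of the D165_1999 step and source from the record above.
LINEAGE = """\
<gmd:LI_Lineage xmlns:gmd="http://www.isotc211.org/2005/gmd"
                xmlns:gco="http://www.isotc211.org/2005/gco"
                xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:processStep>
    <gmd:LI_ProcessStep id="Received_D165_1999">
      <gmd:description>
        <gco:CharacterString>Received edited data D165_1999-ed</gco:CharacterString>
      </gmd:description>
      <gmd:source xlink:href="#D165_1999"/>
    </gmd:LI_ProcessStep>
  </gmd:processStep>
  <gmd:source>
    <gmd:LI_Source id="D165_1999">
      <gmd:description>
        <gco:CharacterString>gov.noaa.ngdc.dart:D165_1999</gco:CharacterString>
      </gmd:description>
      <gmd:sourceStep xlink:href="#Received_D165_1999"/>
    </gmd:LI_Source>
  </gmd:source>
</gmd:LI_Lineage>
"""


def index_by_id(root):
    """Map every id attribute in the tree to the element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve(root):
    """List (owner id, role, target id, found) for each local xlink:href."""
    targets = index_by_id(root)
    links = []
    for owner in targets.values():
        for child in owner:
            href = child.get(HREF)
            if href and href.startswith("#"):
                role = child.tag.split("}")[1]  # strip the namespace part
                target_id = href[1:]
                links.append((owner.get("id"), role, target_id, target_id in targets))
    return links


root = ET.fromstring(LINEAGE)
links = resolve(root)
for owner_id, role, target_id, found in links:
    status = "ok" if found else "missing"
    print(f"{owner_id} --{role}--> {target_id} ({status})")
```

References such as #seriesMetadataContact and #Extent_D165_1999 point outside the lineage section, so a check like this would need to be run against the whole record rather than the lineage fragment alone to resolve them.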

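The XML below shows parts of the lineage section for a CoastWatch Swath dataset. The comments mark whether each referenced source would be an input or an output of the process step in 19115-2.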

<gmd:lineage>
  <gmd:LI_Lineage>
    <gmd:processStep>
      <gmd:LI_ProcessStep id="121">
        <gmd:description>
          <gco:CharacterString>
             * Ingest and calibrate: ingests raw satellite data to TeraScan data format.
             * Automatic navigation: corrects an ingested AVHRR pass file.
          </gco:CharacterString>
        </gmd:description>
        <gmd:dateTime gco:nilReason="Not complete"/>
        <gmd:processor>...</gmd:processor>
        <gmd:source xlink:href="#HRPT_AVHRR_L0"/>  <!-- 19115-2: input -->
        <gmd:source xlink:href="#HRPT_AVHRR_L1B"/> <!-- 19115-2: input -->
        <gmd:source xlink:href="#TDF_Temp"/>       <!-- 19115-2: output -->
      </gmd:LI_ProcessStep>
    </gmd:processStep>
    <gmd:processStep>
      <gmd:LI_ProcessStep id="122">
        ...
        <gmd:source xlink:href="#TDF_Temp"/>      <!-- 19115-2: input -->
        <gmd:source xlink:href="#SST_Cloud_TDF"/> <!-- 19115-2: output -->
      </gmd:LI_ProcessStep>
    </gmd:processStep>
    <gmd:processStep>...</gmd:processStep>
    <gmd:processStep>...</gmd:processStep>
    <gmd:source>
      <gmd:LI_Source id="HRPT_AVHRR_L1B">
        <gmd:description>
          <gco:CharacterString>
            HRPT is a live data feed as the spacecraft goes over a receiving station.
          </gco:CharacterString>
        </gmd:description>
        <gmd:sourceCitation></gmd:sourceCitation>
        <gmd:sourceExtent>
          <gmd:EX_Extent>
            <gmd:temporalElement>
              <gmd:EX_TemporalExtent>
                <gmd:extent>
                  <gml:TimePeriod gml:id="tp_1030059.81238">
                    <gml:beginPosition>2003-11-10</gml:beginPosition>
                    <gml:endPosition/>
                  </gml:TimePeriod>
                </gmd:extent>
              </gmd:EX_TemporalExtent>
            </gmd:temporalElement>
          </gmd:EX_Extent>
        </gmd:sourceExtent>
        <gmd:sourceStep xlink:href="#121"/>
      </gmd:LI_Source>
    </gmd:source>
    <gmd:source>
      <gmd:LI_Source id="HRPT_AVHRR_L0">
      ...
      </gmd:LI_Source>
    </gmd:source>
    <gmd:source>...</gmd:source>
    <gmd:source>...</gmd:source>
    <gmd:source>...</gmd:source>
    <gmd:source>...</gmd:source>
  </gmd:LI_Lineage>
</gmd:lineage>
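
Once the input and output roles are known, the order of the processing chain follows from the sources that steps share. The sketch below is a hypothetical illustration in Python: the roles are copied by hand from the comments in the excerpt above, because the excerpt encodes both roles as gmd:source and does not make them machine-readable. Only steps 121 and 122 are included, since the other steps are elided in the excerpt.

```python
# Roles transcribed from the XML comments in the CoastWatch excerpt.
STEPS = {
    "121": {"inputs": ["HRPT_AVHRR_L0", "HRPT_AVHRR_L1B"], "outputs": ["TDF_Temp"]},
    "122": {"inputs": ["TDF_Temp"], "outputs": ["SST_Cloud_TDF"]},
}


def upstream_steps(steps):
    """For each step, list the steps whose outputs it consumes."""
    producer = {}
    for step_id, roles in steps.items():
        for source_id in roles["outputs"]:
            producer[source_id] = step_id
    return {
        step_id: sorted({producer[s] for s in roles["inputs"] if s in producer})
        for step_id, roles in steps.items()
    }


def primary_inputs(steps):
    """Sources that no listed step produces, i.e. where the chain starts."""
    produced = {s for roles in steps.values() for s in roles["outputs"]}
    consumed = {s for roles in steps.values() for s in roles["inputs"]}
    return sorted(consumed - produced)


print(upstream_steps(STEPS))
print(primary_inputs(STEPS))
```

Here step 122 depends on step 121 because TDF_Temp is an output of the first and an input of the second. This is the kind of relationship that the separate source and output roles in 19115-2 let a reader recover without relying on comments.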