Lineage syntax, semantics and transport

From Earth Science Information Partners (ESIP)

Lineage Syntax

Most lineages, whether prescriptive, descriptive or other, can be represented as a graph. However, there is as yet no standard syntax for representing such graphs, so each value-added site is likely to use its own syntax internally.
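To make the graph view concrete, here is a minimal sketch in Python of a lineage held as a directed graph, where each product points back to the things it was derived from. The class name, method names and node labels are all hypothetical and chosen only for illustration; they are not taken from ES3, Giovanni or any other existing schema.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical in-memory lineage graph: each node maps to its direct inputs."""

    def __init__(self):
        # node name -> set of names it was directly derived from
        self._inputs = defaultdict(set)

    def add_derivation(self, output, inputs):
        """Record that `output` was derived from every item in `inputs`."""
        self._inputs[output].update(inputs)

    def ancestors(self, node):
        """Return everything `node` depends on, directly or indirectly."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self._inputs.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative labels only: a two-step processing chain.
graph = LineageGraph()
graph.add_derivation("regridded_field", ["raw_model_output"])
graph.add_derivation("time_average", ["regridded_field"])

print(sorted(graph.ancestors("time_average")))
```

Any concrete syntax, whether site-specific or shared, would then be a serialization of a structure like this one, plus whatever metadata is attached to the nodes and edges.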

Still, we have to start somewhere. One approach is to begin with the ES3 schema used by UCSB in the Second Provenance Challenge and generalize it as necessary to accommodate inter-site lineages. Because the ES3 schema already includes mechanisms for nesting workflows and for encapsulating largely arbitrary extra metadata, it is a natural starting point for an inter-site workflow syntax. In this sense, it is also similar to Giovanni's approach to lineage.

Lineage Semantics

At the same time, there must be some standardization of semantics to ensure the inclusion of key information and to enable clients to be written (eventually) to use the lineage metadata.

I suggest that we use the userMetadata element to store an initial cut at lineage metadata, specifically information identifying the site and its contact information (e.g. a URL), following the semantics of the ISO 19115 standard for those attributes.

Lineage Transport

This is in some ways the most difficult part, and refers to the protocol request/response by which lineage is transported. There are three ways to do this:

1. Embed the lineage information in the "data" response of existing protocols, e.g., in the netCDF attributes of a WCS/netCDF profile response.

2. Develop a completely separate protocol, e.g. http://some.site.edu/lineage? http://webapps.datafed.net/dvoy_services/ogc.wsfl?SERVICE=wcs&REQUEST=GetCoverage&VERSION=1.0.0&CRS=EPSG:4326&COVERAGE=THREDDS_GFS.T&FORMAT=GML&BBOX=-180,-90,180,90,1350,1350&TIME=2005-12-06T12:00:00Z&WIDTH=900&HEIGHT=400&DEPTH=0

3. Extend existing protocols to include a LINEAGE request/response, e.g.: http://webapps.datafed.net/dvoy_services/ogc.wsfl?SERVICE=wcs&REQUEST=GetLineage&VERSION=1.0.0&CRS=EPSG:4326&COVERAGE=THREDDS_GFS.T&FORMAT=GML&BBOX=-180,-90,180,90,1350,1350&TIME=2005-12-06T12:00:00Z&WIDTH=900&HEIGHT=400&DEPTH=0

I suggest we take a stab at the third approach. See the Lineage scenarios for an example.
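On the client side, the third approach could be as simple as rewriting the REQUEST parameter of an existing data request. The sketch below, in Python, assumes that a GetLineage request would accept the same parameters as the corresponding GetCoverage request, as the example URLs above suggest. The function name is hypothetical, and GetLineage is the proposed request name rather than part of any published WCS version.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_lineage_request(data_url, request_value="GetLineage"):
    """Turn a data request URL into the matching (proposed) lineage request.

    Every query parameter is kept in its original order; only the value of
    REQUEST is replaced. The parameter name is matched case-insensitively.
    """
    parts = urlsplit(data_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    rewritten = [
        (key, request_value if key.upper() == "REQUEST" else value)
        for key, value in params
    ]
    # Leave ':' and ',' unescaped so CRS, BBOX and TIME stay readable.
    query = urlencode(rewritten, safe=":,")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


coverage_url = (
    "http://webapps.datafed.net/dvoy_services/ogc.wsfl?SERVICE=wcs"
    "&REQUEST=GetCoverage&VERSION=1.0.0&CRS=EPSG:4326&COVERAGE=THREDDS_GFS.T"
    "&FORMAT=GML&BBOX=-180,-90,180,90,1350,1350&TIME=2005-12-06T12:00:00Z"
    "&WIDTH=900&HEIGHT=400&DEPTH=0"
)
lineage_url = to_lineage_request(coverage_url)
print(lineage_url)
```

One attraction of this design is that a client holding the URL it used to fetch the data already has everything it needs to ask for the lineage of exactly that subset.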