Difference between revisions of "Preservation Ontology"

Revision as of 10:56, April 5, 2011

About

Supporting the long-term preservation of Earth system science data and information is a core goal of the Data Preservation and Stewardship Cluster. As such, a formalism is needed to codify the information. A forward-looking approach is to leverage semantic web technologies to capture the knowledge representation and to enable flexible use of this information.

Roadmap

We would like to start with practical use cases in preservation modeling, then work on a small and manageable model, and continually iterate on a working design. Ideally, the model should converge on supporting the information identified in the Provenance and Context Content Standard.

Some major steps planned:

1. Define what we would like from a "Preservation Ontology"

Initially covered in the 2011-03-09 telecon.

2. Define practical Use Cases

3. Extract high-level requirements

4. Adopt/reuse existing provenance models

A good starting point: OPMO: The Open Provenance Model OWL Ontology

5. Extend model with more focus on Earth science preservation

Include provenance and context

6. Infuse model into data systems

Are some aspects covered by ACCESS project(s)?

7. Updates and Refinements

Approach

In addition to following the roadmap steps, the approach will include the following:

Follow closely the Provenance and Context Content Standard.
- e.g. processing history, data formats used, product development history, algorithms, ATBDs, product tools, QA, validation, software.
- On preservation, do we want to model with provenance and/or context?
- Mark P on provenance vs context:
  - provenance is for reproducibility.
  - context is for someone to use the information for something else.
- Bruce B's additional comments:
  - context information can probably be usefully divided into
    - documentation, which is text, images, and tables that provide information about the Earth science data
    - other data, which will usually be numbers or strings that might be needed to understand the Earth science data. For example, if one were dealing with a digital glacier photo, a file with the latitude and longitude of each pixel might help a user understand the image.
  - It is not clear what to do about data that does not necessarily appear in the published data files but is critical to the meaning of the Earth science data. Examples include calibration coefficients, radiative transfer parameters, or even supporting data, such as temperature and humidity profiles used as input (or, to back up one more step, the radiosonde or satellite data used to produce numerical weather forecasts).
Start with Open Provenance Model
- The W3C Provenance Incubator Group has a good comparison of different provenance models and their mappings in the Provenance Vocabulary Mappings.
- More specifically, initially adopt OPMO: The Open Provenance Model OWL Ontology.
- Then extend it with a more Earth science-specific domain model.
Explore whether it is possible to map some of the preservation model information to ISO 19115 - Metadata for Geographic Data.
- ISO 19115:2003 and 19115-2:2009 have Lineage sections.
- ISO LI and LE lineage mostly covers processing provenance.
- Consider this in light of the Metadata Evolution for NASA Earth Science Data System (MENDS) recommendations.
  - reference: MENDS Breakout at ESIP 2011 Winter meeting.
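
As a rough illustration of what starting from the Open Provenance Model could look like, the sketch below records a small production run using the OPM node kinds (Artifact, Process, Agent) and OPM edge names (used, wasGeneratedBy, wasControlledBy, wasTriggeredBy, wasDerivedFrom), with edges pointing from effect to cause. It is not the OPMO ontology itself and uses no semantic web tooling. The `ProvenanceGraph` class, the `lineage` helper, and all node identifiers (such as `level2_granule` and `calibration_coefficients`) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Node and edge names follow the Open Provenance Model; the class and
# all identifiers below are hypothetical and only for illustration.
NODE_KINDS = {"Artifact", "Process", "Agent"}
EDGE_KINDS = {"used", "wasGeneratedBy", "wasControlledBy",
              "wasTriggeredBy", "wasDerivedFrom"}


@dataclass
class ProvenanceGraph:
    nodes: dict = field(default_factory=dict)  # node id -> node kind
    edges: list = field(default_factory=list)  # (effect, edge kind, cause)

    def add_node(self, node_id, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, effect, kind, cause):
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        if effect not in self.nodes or cause not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((effect, kind, cause))

    def lineage(self, node_id):
        """Return every node reachable by following edges from effect to cause."""
        seen, stack = set(), [node_id]
        while stack:
            current = stack.pop()
            for effect, _, cause in self.edges:
                if effect == current and cause not in seen:
                    seen.add(cause)
                    stack.append(cause)
        return seen


graph = ProvenanceGraph()
graph.add_node("level1_granule", "Artifact")
graph.add_node("calibration_coefficients", "Artifact")
graph.add_node("level2_granule", "Artifact")
graph.add_node("retrieval_run", "Process")
graph.add_node("science_team", "Agent")

graph.add_edge("retrieval_run", "used", "level1_granule")
graph.add_edge("retrieval_run", "used", "calibration_coefficients")
graph.add_edge("retrieval_run", "wasControlledBy", "science_team")
graph.add_edge("level2_granule", "wasGeneratedBy", "retrieval_run")

upstream = graph.lineage("level2_granule")
```

Here `upstream` holds everything the output granule depends on, including the calibration coefficients that would not appear in the published data file. An Earth science extension would presumably add domain-specific node and edge kinds on top of this core.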

Use Cases

Here are some initial grouping placeholders:

Capturing preservation information

capturing data production provenance
capturing data product context

Using preservation information

provenance for reproducibility
comparison of production runs from two granules.
context for reuse in other domains
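
One way the "comparison of production runs from two granules" use case might be exercised is sketched below, assuming each run's provenance has already been flattened into a simple key/value record. The `compare_runs` function and every field name and value (`algorithm_version`, `calibration_table`, `input_granule`) are hypothetical placeholders, not part of any agreed model.

```python
def compare_runs(run_a, run_b):
    """Return {field: (value_a, value_b)} for each provenance field that differs.

    A field missing from one record is reported with None on that side.
    """
    differences = {}
    for key in sorted(set(run_a) | set(run_b)):
        value_a = run_a.get(key)
        value_b = run_b.get(key)
        if value_a != value_b:
            differences[key] = (value_a, value_b)
    return differences


# Hypothetical provenance records for the runs that produced two granules.
run_for_granule_a = {
    "algorithm_version": "2.1",
    "calibration_table": "cal_2010",
    "input_granule": "L1_0001",
}
run_for_granule_b = {
    "algorithm_version": "2.2",
    "calibration_table": "cal_2010",
    "input_granule": "L1_0002",
}

differences = compare_runs(run_for_granule_a, run_for_granule_b)
```

In this sketch the two runs share a calibration table, so only the algorithm version and input granule are reported as differing.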

Model

tbd

Infusion

tbd

References

Provenance Vocabulary Mappings by the W3C Provenance Incubator Group
