Creating a long term trend data set from multiple data sets

From Earth Science Information Partners (ESIP)
Revision as of 08:11, July 9, 2013 by Ramapriyan (talk | contribs)

Summary: Input datasets are gathered to produce a new dataset. The inputs cover a long time period and come from multiple sources. The producers of the inputs are no longer available for consultation. For this use case to succeed, the provenance of the inputs must already have been captured. Likewise, provenance must be captured for the derived long-term dataset so that its future users can rely on it.

Objective/Context: Create a dataset from multiple previously produced datasets to enable long-term use. Assume the producers of the input datasets are no longer available for consultation.

Actors: Producers disconnected from consumers; consumers; archivist/curator; data distributor; science data intermediaries

Sequence of events:

  • Identify purpose of creating long-term dataset
  • Write proposal
  • Get funding
  • Identify input datasets required
  • Search for (discover) input datasets
  • Negotiate rights (in case of restricted data)/obtain datasets
  • Consult with data intermediary
  • Verify that datasets are applicable for purpose
  • Design algorithms
  • Create software
  • Generate long term trend dataset
  • Document lessons learned
  • Document provenance
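The last steps above (generate the long-term dataset, document provenance) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `build_long_term_record`, the yearly dictionary layout, and the rule that earlier sources in the list take precedence where inputs overlap are all hypothetical. A real long-term record would usually also need intercalibration between sources.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: (source name, {year: value}).
Series = Dict[int, float]


def build_long_term_record(
    inputs: List[Tuple[str, Series]]
) -> Tuple[Series, Dict[int, str]]:
    """Merge input series into one record and note which source supplied each year.

    Assumption for this sketch: where sources overlap, the source listed
    first wins. No calibration or bias adjustment is applied.
    """
    merged: Series = {}
    provenance: Dict[int, str] = {}
    for source, series in inputs:
        for year, value in series.items():
            if year not in merged:
                merged[year] = value
                provenance[year] = source
    # Return the merged record in chronological order.
    return dict(sorted(merged.items())), provenance


inputs = [
    ("sensor_a", {1990: 1.0, 1991: 1.1}),
    ("sensor_b", {1991: 9.9, 1992: 1.2}),
]
merged, provenance = build_long_term_record(inputs)
print(merged)
print(provenance)
```

Keeping the per-year source alongside the merged values means a future user can trace any value back to its input dataset without consulting the original producers.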

PCCS artifacts: For each input dataset, decide which artifacts must be preserved to satisfy the use case. Artifacts must also be preserved for the derived long-term dataset that the use case generates.

  • the data itself
  • processing version
  • source code *
  • format
  • size of data
  • parameter descriptions
  • content descriptions
  • tools and web apps needed to read/use/transform the data
  • reputation of source
  • calibration method
  • processing method
  • algorithms used *
  • difference from previous versions *
  • data version
  • processing history *
  • history of what has happened to the data since it was created *
  • data inputs used *
  • precursor products used *
  • instrument schematics, etc. *
  • instrument capabilities and characteristics
  • calibration/validation data *
  • validation method *
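
One way to track the decision described above is a simple checklist per dataset. The sketch below is a minimal illustration, not an official PCCS schema: the class `DatasetRecord`, its fields, and the choice to treat every listed artifact as required are assumptions, and only a subset of the artifact names from the list is included. The storage locations shown are made-up examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Artifact names copied from the list above (a subset). Treating all of
# them as required is an assumption made for this sketch.
ARTIFACTS: List[str] = [
    "the data itself",
    "processing version",
    "source code",
    "format",
    "parameter descriptions",
    "calibration method",
    "algorithms used",
    "processing history",
    "data inputs used",
    "validation method",
]


@dataclass
class DatasetRecord:
    """Hypothetical record of which artifacts were preserved for one dataset."""

    name: str
    # Maps an artifact name to where it is stored (path, URL, or note).
    preserved: Dict[str, str] = field(default_factory=dict)

    def missing_artifacts(self) -> List[str]:
        """Return the listed artifacts that have no preserved location yet."""
        return [a for a in ARTIFACTS if a not in self.preserved]


record = DatasetRecord(
    name="example_input",
    preserved={
        "the data itself": "archive/example_input.nc",
        "format": "netCDF",
    },
)
print(record.missing_artifacts())
```

The same record type could be filled in for each input dataset and for the derived long-term dataset, so gaps in preserved artifacts are visible before the producers become unavailable.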