Difference between revisions of "Interagency Data Stewardship/Identifiers"

From Earth Science Information Partners (ESIP)

Revision as of 15:05, September 16, 2009

Identifiers Testbed Activities

These wiki pages chronicle the identifiers testbed activities of the Data Stewardship cluster.

Proposal Text

This is the text related to the identifiers testbed from the proposal to the federation.

The Preservation and Stewardship Cluster and the NASA Technology Infusion Working Group have been considering permanent naming schemes for data products. These identifiers can serve as references in journal articles and must include versioning representations. Many naming options have been promoted, but the best choices for Earth science data require careful examination. Two datasets may differ only in format, byte order, data type, access method, etc., creating facets (dimensions) not relevant to classification schemes for books (Library of Congress, Dewey Decimal).

Ultimate Benefit: Permanent, unique names for Federation data products.

Cost: $5K for Programmer 2 to set up a test archive where data can be retrieved via the candidate naming schemes, as identified by the Provenance and Stewardship Cluster.

Goals and Objectives (feel free to update/modify/redirect if need be)

Unique and lasting data identifiers are needed for a wide range of purposes and at various scales, from data sets, to collections of data sets, to individual files or data objects. Likewise, a large variety of identification schemes have been developed, each of which satisfies a subset of the overall needs and is most appropriately applied at particular scales.

The ESIP Preservation and Stewardship cluster has recognized that, as a consequence, data centers will need to support multiple identification schemes and different identifiers at different scales. The purpose of this testbed activity is to test and demonstrate the applicability of selected schemes to a wide variety of Earth science data types, with the ultimate goal of recommending a suite for use by ESIP federation members...

Data Sets to be used

  • NSIDC's glacier photo collection (random place, time, source, with repeats occasionally)
  • NSIDC's GLAS data ("Picket fence" spatial organization over short time periods spaced intermittently)
  • GSFC's Ozone datasets (orbit data partitioned to keep all daylight data together; and profile data sets)
  • GSFC's 3-D merged cloud data set
  • NOAA/Bruce B. - ERBE (long-time series from multiple satellites)
  • NOAA/Bruce B. - Hurricane Ike collection (spatially organized data collection)
  • ORNL - Luyssert data set (example of ways for tracking pieces from multiple different field datasets going into a synthesized product)

Identifier schemes

The Identifier Characteristics Table lists the significant characteristics of each of the identifier schemes examined.

In this testbed, we will be using the following schemes:

  • DOI
  • PURL
  • UUID
  • OID
  • ARK
  • XRI (presuming it is still deemed worthwhile after further study)
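
To make the syntactic differences between the schemes listed above concrete, here is a minimal Python sketch that produces one example identifier string per scheme for a single versioned data product. It is an illustration only, not the testbed's implementation. The function name `candidate_identifiers`, the product name `glacier_photos`, the hosts `data.example.org` and `purl.example.org`, the DOI prefix `10.5555`, and the ARK name assigning authority number `99999` are all placeholder assumptions. The OID is derived from the UUID under the `2.25` arc, which is one possible convention and not a choice made by the testbed. XRI is omitted because its use is still undecided. None of these strings are registered or resolvable.

```python
import uuid


def candidate_identifiers(local_name, version):
    """Return one example identifier string per scheme (all values hypothetical)."""
    # Fold the version into the name so each version gets its own identifier.
    name = f"{local_name}.v{version}"
    # A name-based (version 5) UUID is deterministic: same input, same UUID.
    u = uuid.uuid5(uuid.NAMESPACE_URL, f"https://data.example.org/{name}")
    return {
        "DOI": f"doi:10.5555/{name}",                     # placeholder prefix
        "PURL": f"https://purl.example.org/data/{name}",  # placeholder host
        "UUID": str(u),
        "OID": f"2.25.{u.int}",                           # UUID expressed as an OID arc
        "ARK": f"ark:/99999/{name}",                      # placeholder authority number
    }


ids = candidate_identifiers("glacier_photos", 1)
for scheme, value in ids.items():
    print(f"{scheme:5} {value}")
```

In this sketch the DOI, PURL, and ARK strings carry a human-readable name and version, while the UUID and OID are opaque. That contrast is one of the characteristics the testbed is meant to compare across data sets of different scales.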


Testbed Telecon Minutes