Interagency Data Stewardship/Identifiers/UseCases

From Earth Science Information Partners (ESIP)


General Use Case

An archive is responsible for a Data Type (in NASA EOS parlance, an ESDT). Call it Datatype DT. It has processed that data twice, each time with a different version (reflecting a major algorithm update, a calibration update, or a similar update to the version of one of its inputs). Call those versions 1 and 2 (in NASA EOS parlance, Collection 1 and Collection 2). This results in two Data Sets, DT1 and DT2. Assume that DT1 is a closed data set and will have no further changes, but DT2 is an open data set, so it could have additional granules added to it, or older broken granules removed or replaced.

The archive maintains a database of metadata, and can present a web page of information for the data type DT, as well as for each of the data sets DT1 and DT2. To support that, it can produce URLs into its web site for each of those entities. That page can include textual information, structured metadata, links to download the data, etc.

The curator does [#1] to construct an identifier for either the data type, or the data set, and uses [#2] mechanism to produce more specific identifiers from which the precise granule membership can be determined.
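To illustrate what "precise granule membership" means for an open data set, here is a minimal sketch. It assumes the archive records when each granule was added and, if applicable, removed. The `Granule` class, the `members_as_of` function, the granule names, and every date other than the citation time are hypothetical and not part of any archive's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Granule:
    """One granule, with the interval during which it belonged to the data set."""
    name: str
    added: datetime
    removed: Optional[datetime] = None  # None means the granule is still present


def members_as_of(granules, when):
    """Return the sorted names of granules that were members at time `when`."""
    return sorted(
        g.name
        for g in granules
        if g.added <= when and (g.removed is None or when < g.removed)
    )


utc = timezone.utc
# Hypothetical history for the open data set DT2.
dt2 = [
    Granule("g001", datetime(2009, 6, 1, tzinfo=utc)),
    # g002 was broken; it is replaced by g002b after the citation time.
    Granule("g002", datetime(2009, 6, 1, tzinfo=utc),
            removed=datetime(2010, 3, 1, tzinfo=utc)),
    Granule("g002b", datetime(2010, 3, 1, tzinfo=utc)),
]

cited_at = datetime(2010, 1, 1, 14, 0, tzinfo=utc)
print(members_as_of(dt2, cited_at))  # ['g001', 'g002']
```

For a closed data set such as DT1, no granule is ever added or removed after closing, so the result is the same for any later time and the identifier alone is sufficient.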

On Jan 1, 2010 at 2:00PM UTC, a researcher downloads some data and wants to cite it:

[#3] All of the data from DT1

[#4] All of the data from DT2

[#5] Doesn't download any specific data, but refers to the data type in a general way.

At some point in the future the data type is transferred to a new organization. This could happen in a couple of ways:

[#6] the whole organization goes away, including the server that hosted the first level of identifiers.

[#7] the archive remains active, but a particular data type is transferred to another archive.

DOIs

Assume for now that we use CrossRef DOIs.

Central Registration

A central organization (ESIP Federation?) registers as a CrossRef organization (10.12345) and plays 'middle man' to register DOIs for us. The archive curator logs into the ESIP Federation site, enters some basic metadata, registers DT, and is assigned a DOI. It could also register each of DT1 and DT2 if we want.

[#3] Since DT1 is closed, the DOI itself is enough to resolve the specific granule membership.

Smith, John. Some Earth Science Data. DT1. DOI: 10.12345/DT1.

This DOI gets pointed at a URI on the ESIP Fed site, which redirects to the Archive metadata page, something like

http://archivea.agency.gov/DT1

[#4] Since DT2 is open, the reference must be qualified by something: either a date/time stamp or some other identifier component.

Smith, John. Some Earth Science Data. DT2. DOI: 10.12345/DT2, 2010-01-01T14:00:00.

[#5]

Smith, John. Some Earth Science Data. DT. DOI: 10.12345/DT.

[#6] The curator logs into the ESIP Fed site and does a 'global' search/replace on some prefix of its identifiers for the new archive's URI scheme.

[#7] The curator logs into the ESIP Fed site and updates the particular database record with the identifiers for the data type that is being moved.
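The two migration cases can be sketched with a small in-memory table standing in for the central site's database. This is an illustration only: the `CentralRegistry` class, its method names, and the `archiveb.example.org` host are hypothetical, and a real site would sit in front of the DOI registration service rather than a Python dictionary.

```python
class CentralRegistry:
    """Hypothetical stand-in for the central site's DOI-to-URL table."""

    def __init__(self):
        self._targets = {}

    def register(self, doi, url):
        self._targets[doi] = url

    def resolve(self, doi):
        return self._targets[doi]

    def update(self, doi, new_url):
        """Case #7: repoint a single record at its new home."""
        if doi not in self._targets:
            raise KeyError(doi)
        self._targets[doi] = new_url

    def replace_prefix(self, old_prefix, new_prefix):
        """Case #6: rewrite every target URL that starts with old_prefix.

        Returns the number of records changed.
        """
        changed = 0
        for doi, url in list(self._targets.items()):
            if url.startswith(old_prefix):
                self._targets[doi] = new_prefix + url[len(old_prefix):]
                changed += 1
        return changed


registry = CentralRegistry()
for name in ("DT", "DT1", "DT2"):
    registry.register(f"10.12345/{name}", f"http://archivea.agency.gov/{name}")

# Case #7: only DT2 is transferred to another archive.
registry.update("10.12345/DT2", "http://archiveb.example.org/DT2")

# Case #6: the original archive goes away, so the remaining records move together.
moved = registry.replace_prefix("http://archivea.agency.gov/",
                                "http://archiveb.example.org/")
print(moved, registry.resolve("10.12345/DT1"))
```

In both cases the published DOIs stay unchanged; only the target URLs behind them are edited.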

Distributed Registration

Each archive individually joins CrossRef, is assigned its own organization code, and can assign its DOIs itself. Cases [#3], [#4], and [#5] are similar to the central case, with one fewer redirection needed to resolve.

[#6] and [#7] The curator for the archive updates the CrossRef database itself to point to the new archive.
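The citation forms for cases [#3], [#4], and [#5] are the same under either registration model; only the DOI prefix would differ. As a rough sketch, a helper like the following could produce them. The function name `cite` and its parameters are hypothetical, and the layout simply mirrors the example citations given for central registration.

```python
from datetime import datetime, timezone


def cite(author, title, name, doi, accessed=None):
    """Build a citation string.

    `accessed` must be a timezone-aware datetime. It is only needed for an
    open data set, where it qualifies the DOI with the time of download.
    """
    citation = f"{author}. {title}. {name}. DOI: {doi}"
    if accessed is not None:
        stamp = accessed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
        citation += f", {stamp}"
    return citation + "."


when = datetime(2010, 1, 1, 14, 0, tzinfo=timezone.utc)
print(cite("Smith, John", "Some Earth Science Data", "DT1", "10.12345/DT1"))
print(cite("Smith, John", "Some Earth Science Data", "DT2", "10.12345/DT2", when))
print(cite("Smith, John", "Some Earth Science Data", "DT", "10.12345/DT"))
```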

PURLs

This is very similar to the DOI case, and could be done in either a centralized or a distributed way. We could either have each organization register for its own "purl.org" prefix, or set up an ESIP Fed. instance of the PURL resolver at something like "purl.esipfed.org", and everything would work identically. (Though we could put more policies/guidelines in place for "purl.esipfed.org" registration.)

Assume the archive registers and gets the http://purl.org/net/archivea prefix.

[#1] It assigns these identifiers:

[#3]

Smith, John. Some Earth Science Data. DT1. http://purl.org/net/archivea/DT1.

[#4] Since DT2 is open, the reference must be qualified by something: either a date/time stamp or some other identifier component.

Smith, John. Some Earth Science Data. DT2. http://purl.org/net/archivea/DT2/2010-01-01T14:00:00.

[#5]

Smith, John. Some Earth Science Data. DT. http://purl.org/net/archivea/DT.

[#6] The curator logs into the PURL server and redirects the prefix as a whole to the new archive.

[#7] Either the curator redirects the prefix for the data type at the PURL server level to the new archive, or maintains the redirection in its own server. In either case after redirection, the published PURLs still end up on the right pages at the new archive.
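The PURL behaviour described above can be pictured as longest-prefix redirection: the most specific registered prefix wins, and the rest of the path is carried over to the target. The sketch below is an assumption about how such a lookup could work, not the PURL server's actual implementation; `resolve_purl`, the redirect table, and the `archiveb.example.org` host are hypothetical.

```python
def resolve_purl(path, redirects):
    """Return the target URL for `path` using the longest matching prefix.

    `redirects` maps registered path prefixes (without a trailing slash)
    to target base URLs. The unmatched tail of the path is appended.
    """
    best = max(
        (p for p in redirects if path == p or path.startswith(p + "/")),
        key=len,
        default=None,
    )
    if best is None:
        raise LookupError(path)
    return redirects[best] + path[len(best):]


redirects = {"/net/archivea": "http://archivea.agency.gov"}
print(resolve_purl("/net/archivea/DT2/2010-01-01T14:00:00", redirects))

# Case #7: only DT2 moves, so a more specific prefix is added for it.
redirects["/net/archivea/DT2"] = "http://archiveb.example.org/DT2"
print(resolve_purl("/net/archivea/DT2/2010-01-01T14:00:00", redirects))
print(resolve_purl("/net/archivea/DT1", redirects))

# Case #6: the whole archive moves, so the top-level prefix is repointed.
redirects["/net/archivea"] = "http://archiveb.example.org"
print(resolve_purl("/net/archivea/DT1", redirects))
```

Because the date/time qualifier is part of the path tail, it survives the redirect and reaches the new archive intact.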

DOI Use Cases

Finding a data set referenced in a paper from the DOI in its citation

While reading a journal article, a scientist comes across a reference to a data set that might be useful in their work. Since the authors of the paper had formally cited the data set, including a DOI for the data set as a whole, the user accesses the data set by either 1) using the DOI add-on to the Firefox browser; or 2) using one of the several DOI resolvers available on the web. In either case, the user is taken to the home web page for that version of the data set: a place from which data set metadata and documentation can be accessed, and which also contains links to access the data, perhaps through several different mechanisms.
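Using a web resolver amounts to appending the DOI to the resolver's base URL. A minimal sketch, assuming the public resolver at https://doi.org/ (the helper name `doi_to_url` is hypothetical, and 10.12345 is the placeholder prefix used on this page, not a registered one):

```python
from urllib.parse import quote


def doi_to_url(doi, resolver="https://doi.org/"):
    """Turn a DOI as printed in a citation into a resolver URL."""
    doi = doi.strip()
    # Citations often print the DOI with a leading "DOI:" label.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return resolver + quote(doi, safe="/")


print(doi_to_url("DOI: 10.12345/DT1"))  # https://doi.org/10.12345/DT1
```

Following that URL in a browser triggers the redirect to the data set's home page.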

Accessing a data set discovered on the ESIP Federation data set registry

While perusing the ESIP Federation data set registry (if we can get such a thing to exist), a user runs across a description of a data set they'd like to evaluate for use in their work. Part of the registry entry for the data set is its DOI. They click on this and are redirected to the home web page for that version of the data set.

NOTES: Such a registry could at least be populated with NSIDC metadata using OAI-PMH metadata harvesting...
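As a rough illustration of what harvesting involves, an OAI-PMH harvester issues `ListRecords` requests against a repository's base URL and follows resumption tokens until the list is exhausted. The sketch below only builds the request URLs. The helper name and the `archive.example.org` endpoint are hypothetical; this page does not give an actual endpoint.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    When a resumption token is supplied it is sent on its own, because the
    protocol treats resumptionToken as an exclusive argument.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


base = "http://archive.example.org/oai"  # hypothetical endpoint
print(list_records_url(base))
print(list_records_url(base, resumption_token="abc123"))
```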