Request for Endorsement of EZID

From Earth Science Information Partners (ESIP)

DRAFT Request for Endorsement of EZID Service Account for Metadata Testbed & Report to ESIP Federation Executive Committee
From Data Stewardship & Preservation Cluster
January 21, 2011

Executive Summary

To further the Data Stewardship & Preservation (DS&P) Cluster’s objective of exploring and recommending identifier schemes for digital objects within the geospatial domain, a Metadata Testbed is being created to assign unique identifiers from a limited number of identification schemes to one or more digital collections and their components, and to test the practicability of doing so. The EZID service, created and maintained by the California Digital Library (CDL) / DataCite Consortium (DataCite), has been chosen as the most viable registration agency / service to facilitate the creation and long-term maintenance of two of the chosen identification schemes: Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs). Use of the EZID service requires organizational support from the ESIP Federation, initially for Metadata Testbed activities and possibly for more permanent / broader use by the ESIP Federation upon evaluation and approval by both the DS&P cluster and the ESIP Federation Executive Committee. This memo provides context for, and requests endorsement of, a proposal to approach CDL/DataCite for use of its EZID service for one year, with the possibility of renewal.


Purpose of Metadata Testbed Activity: One issue that has long concerned the DS&P cluster is the unique, persistent identity of digital resources at both the dataset level and the component or granule level. These and related discussions have led to a desire to recommend to the geospatial community which identification scheme(s) might be best suited for geospatial digital objects and, possibly, which practices for establishing unique identity should be promulgated. In support of both recommendations, a Metadata Testbed has been set up to ground the theoretical evaluation of identification schemes with an assessment of the operational considerations involved in their technology and implementation, thereby helping to clarify the choice among identification schemes.
Planned Activities: Based on assessment criteria generated from both DS&P cluster and ESDSWG Technology Infusion Working Group (TIWG) discussions, nine identification schemes were chosen for testing on at least one, and ideally several, collections of digital objects representing canonical collections that meet the needs of four different use cases. (A more complete discussion of the assessment criteria, and an abstract evaluation of the chosen identification schemes, can be found in a paper soon to be made available by DS&P and TIWG members.) The nine identification schemes are the Archival Resource Key (ARK), the Digital Object Identifier (DOI), the Extensible Resource Identifier (XRI), the Handle System, the Life Science Identifier (LSID), the Object Identifier (OID), the Persistent Uniform Resource Locator (PURL), the Uniform Resource Identifier / Name / Locator (URI / URN / URL), and the Universally Unique Identifier (UUID).
On the assumption that a literature review and theoretical evaluation of identification schemes may not tell the full story, the DS&P cluster decided to subject the identification schemes to a further test of implementation before recommending which identification scheme(s) to use for geospatial resources.
Planned Outcomes:
  1. A paper discussing the utility of the chosen identification schemes and recommending one or more of them for use with geospatial resources.
  2. A paper discussing the operational considerations associated with assigning identifiers from each of the nine identification schemes to at least one dataset and its components, and the impact of those considerations on the recommendations of the first paper.
  3. Further discussion of best practices for the assignment / use of unique identifiers, based on the experience gained from the above activities.
Operational Questions to be asked
From the DS&P cluster
  • Technical, i.e., how sound is the technology underlying the identifier scheme (including, but not limited to, scalability of ID creation / maintenance, e.g., via an API or web-based service; ease of transformation to citable / resolvable URLs; acceptance / support by standardization bodies; framework for name authority; and algorithms for establishing / ensuring uniqueness);
  • User Value, i.e., what value does the scheme provide to end users: are end users’ lives made easier or more difficult by using the scheme, especially for purposes of bibliographic citation;
  • Archive Value, i.e., what value does the scheme provide to the archive or archives managing data with these identifiers: is the job of data management made easier or more complicated;
  • Operational, i.e., what organizational commitments are required or desirable for the data creator / provider / broker using the identification scheme (including, but not limited to, the short-term and long-term costs, if any, of the benefits / services received, and the business model / strategy underlying and/or supporting the service / organization that provides the persistence of the identifiers).
From the ESIP Fed Executive Committee
  • Are there activities / responsibilities associated with the EZID user groups of which the ESIP Federation should be aware?
  • What is the relationship of the EZID user groups to the DataCite Consortium, if any?
  • The ESIP Federation may be interested in allowing members who do not wish to establish their own relationship with CDL/DataCite to use the EZID service through the Federation. One outcome of the Metadata Testbed activities would be an assessment of the feasibility, and the ongoing costs to the ESIP Federation, of brokering that service. We anticipate that the Metadata Testbed activities will be completed within a year, and would therefore like to establish a time-bounded agreement with CDL/DataCite that could be renewed upon completion of the testbed activities and reports.

Proposal statement to Steering Committee

  • Negotiate an agreement with CDL/DataCite for the ESIP Federation to use the CDL EZID service to assign both DOIs and ARKs for up to two digital collections and their components / granules.
  • Limit the agreement to a one-year period tied to the DS&P Metadata Testbed activities, which are designed to assess the efficacy of identification schemes, including DOIs and ARKs, for geospatial digital objects.
  • Included in the report from the DS&P cluster regarding Metadata Testbed activities will be:
    • An evaluation of the feasibility, pros / cons, and recommendations regarding continuation of the ESIP Federation’s relationship with CDL/DataCite and continued use of the EZID service.
    • Recommendations for a plan covering any necessary responsibilities / actions related to the creation / maintenance of the EZID-assigned persistent IDs created for the digital collections and components / granules involved in the Metadata Testbed activities.
  • Costs for use of the EZID service for the period of the agreement will include:
    • Annual fee of $ _ (see attached email; we should discuss which fee category we would fall into)
    • Costs of programmer support for Metadata Testbed activities related to the assignment of DOIs and ARKs (built into the existing scope of work)
    • Volunteer support for ID implementation / reporting by DS&P cluster members
  • EZID services provided:
    • EZID stores the identifier string, its metadata, and internally generated administrative metadata.
    • EZID stores passwords only as one-way hashes, for account security. All stored data for the identifiers owned by the ESIP group would be accessible through the API; the method for doing so is described in the API documentation.
  • ESIP Federation responsibilities:
    • The ESIP user group would be responsible for maintaining the location of the digital resources and for keeping the location metadata associated with each resource’s persistent ID current. This value, the target URL for the resource, can be set by the group through the EZID API, Version 2. It appears that separate targets may be possible for different formats of a resource, e.g., the JPEG and TIFF versions of this collection / set.