Revised Proposal to Federation

From Earth Science Information Partners (ESIP)

2009 Committee Budget Request Version 3: April 19, 2009

The Products and Services Committee is requesting $40,000 to invest in activities that will ultimately add value to the products and services offered by Federation members. We seek to develop an internal testbed to evaluate prototype standards, metadata services, protocols, and best practices, as listed in Appendix A. These standards and protocols will help users discover, use, and understand Federation products and services. The testbed will serve as a forum for innovative collaboration across all sectors of the Federation, and will enable our Committee (and the Federation more generally) to work as a virtual organization to address deficiencies in how our members' products and services are made available.

While any of the testbed focus areas listed in Appendix A could form the basis of a proposal to an external funding program (such as NASA ROSES/ACCESS), significant advantages accrue from gaining the inputs, requirements, feedback, and buy-in from the entire Earth science community, as represented within the Federation. The testbed will provide a framework for the entire Federation community to explore competing approaches for a new generation of standards and services for Federation offerings. The Federation wiki will be used to share comments about each testbed focus area, including: problem statement, requirements, use cases, current approaches and limitations, new approaches implemented in the testbed, feedback regarding new standards, and suggested best practices. Many of the focus areas will be carried out in conjunction with other Federation Clusters, Working Groups, and Standing Committees. Additional focus areas will be added later upon demand.

The testbed is not intended to test new technologies; rather, it will be a unified environment for gaining consensus on community best practices. The testbed will require computer and web programming support at the 0.5 FTE level to set up, populate, and maintain the infrastructure. Implementation will proceed in close collaboration with the Federation Front Office. We are seeking support for two programmers at the 0.25 FTE level each: one based at George Mason University with specialized knowledge of semantics (Programmer 1), and one based at the Federation Front Office with specialized knowledge of web services (Programmer 2). All expenses are for labor.


Appendix A: Top 10 Testbed Focus Areas

Essential:

1. Permanent Unique Object Identifiers: The Preservation and Stewardship Cluster and the NASA Technology Infusion Working Group have been considering permanent naming schemes for data products. These identifiers can serve as references in journal articles and must include versioning representations. Many naming options have been promoted, but the best choices for Earth science data require careful examination. Two datasets may differ only in format, byte order, data type, access method, etc., creating facets (dimensions) that are not relevant to classification schemes for books (Library of Congress, Dewey Decimal). Ultimate Benefit: Permanent, unique names for Federation data products. Cost: $5K for Programmer 2 to set up a test archive where data can be retrieved via the candidate naming schemes, as identified by the Preservation and Stewardship Cluster.

2. Expert Skills Service: The Federation collectively includes an exceptionally wide range of expertise among its participating members. The testbed will enable us to explore various approaches to capturing expert skills residing within the Federation and categorizing them in a knowledge base so that this information can be offered as a service. Ultimate Benefit: Promotion of expert skills available within the Federation. Cost: $5K split between Programmers 1 and 2 to implement and test a knowledge base, as identified by the Federation Front Office and the Semantic Web Cluster.

3. Semantic Web Services: The Semantic Web Cluster has been developing ontologies for data services, data types, and science concepts. The testbed will enable providers to register their products and services semantically, which will provide more precise descriptions of their offerings. Ultimate Benefit: Better classification of Federation products and services. Cost: $5K for Programmer 1 to set up a test archive where data can be registered via criteria established by the Semantic Web Cluster.

4. Customized Inventories for GEOSS Societal Benefit Areas: The Air Quality Working Group has been developing an inventory of air quality data and data services. Other GEOSS Societal Benefit Areas could benefit from a similar capability to highlight offerings from Federation members. Ultimate Benefit: Better promotion of targeted Federation products and services. Cost: $5K for Programmer 1 to set up a test inventory based on criteria established by the Water and Carbon Clusters.

Important:

5. Advertising of Federation Member Services: Several machine-processable mechanisms for advertising services (such as service casting) have been evaluated by the NASA Tech Infusion Working Group. The testbed will provide an appropriate demonstration area for this promising new technology. This aggregation service could be used to gather and disseminate information about the wide variety of web-based services (e.g., OGC, REST, SOAP) and interfaces available from ESIP members. Ultimate Benefit: Better accessibility of Federation products and services. Cost: $4K for Programmer 2 to set up a test service with input from the Tech Infusion Working Group.

6. Data and Service Quality: It is desirable to associate standardized quality measures with Federation products and services. Data quality has many dimensions (instrument accuracy, algorithm effectiveness, etc.), and these are not currently described in any consistent manner. Service quality is essentially an unexplored area that requires new measures altogether. Some quality measures will be application-dependent. The results are potentially applicable to improving existing standards, such as the OAIS Reference Model. Ultimate Benefit: Better description of Federation products and services. Cost: $4K for Programmer 1 to set up a classification scheme and testbed for data registered to standard quality measures.

7. Production History Provenance: Provenance is the key to reproducibility in science and to allowing users to trust the data they obtain. A major component of provenance is the understanding and verification of production history and its ties to the context of the data. We will use the testbed with representative datasets to: i) separate provenance tracking into physical object production, discrete file production, database production, ad hoc workflow production, and Web services types of data production; ii) develop a generic data flow (data - process - data) for each kind of production; and iii) show how to instantiate the data flow in a recoverable record by developing and implementing a design, with use cases. Ultimate Benefit: Better description of value-added Federation products and services. Cost: $4K for Programmer 2 to capture this information in metadata, as identified by the Products and Services Committee.

Desired:

8. Metadata for Custom/Virtual Products and Services: Many ESIP members offer products and services created on demand. Full metadata descriptions require new standards that are robust enough to describe and invoke these offerings. We will use the testbed to prototype hybrid data-service metadata standards. Ultimate Benefit: Better description of innovative Federation products and services. Cost: $3K for Programmer 1 to set up and implement metadata extensions for several virtual data products that are redirected by the testbed, as identified by the Products and Services Committee.

9. Intellectual Property Rights: An important use of provenance is to track and verify the delegation of Intellectual Property Rights (IPR). We will use the testbed to: i) identify IPR bundles applicable to Earth science data, documentation, and services; ii) develop use cases for transfer of IPR bundles; and iii) show how to instantiate IPR recording and transfer mechanisms. Ultimate Benefit: Better description of value-added Federation products and services. Cost: $3K for Programmer 2 to capture this information in metadata, as identified by the Products and Services Committee.

10. Metadata Harvesting: Federation products and services are not cataloged in any single consistent form. Any Federation inventory must be able to harvest existing clearinghouses and to distinguish primary from secondary sources. Ultimate Benefit: Inventory of Federation products and services. Cost: $2K for Programmer 2 to set up and implement a harvesting protocol.
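
As a cross-check on the figures above, the ten line-item costs can be tallied against the $40,000 request and broken down by priority tier and by programmer. The sketch below is illustrative only and is not part of the proposal as submitted. The names `FOCUS_AREAS` and `tally` are hypothetical, and it assumes that the $5K for focus area 2, which the text describes only as "split between Programmers 1 and 2", is divided evenly.

```python
# Costs in thousands of US dollars, transcribed from Appendix A.
# Each entry: (focus area number, tier, cost, {programmer: share of cost}).
FOCUS_AREAS = [
    (1, "Essential", 5.0, {"Programmer 2": 1.0}),
    # Assumption: the proposal says "split" without giving proportions;
    # an even 50/50 split is used here for illustration.
    (2, "Essential", 5.0, {"Programmer 1": 0.5, "Programmer 2": 0.5}),
    (3, "Essential", 5.0, {"Programmer 1": 1.0}),
    (4, "Essential", 5.0, {"Programmer 1": 1.0}),
    (5, "Important", 4.0, {"Programmer 2": 1.0}),
    (6, "Important", 4.0, {"Programmer 1": 1.0}),
    (7, "Important", 4.0, {"Programmer 2": 1.0}),
    (8, "Desired", 3.0, {"Programmer 1": 1.0}),
    (9, "Desired", 3.0, {"Programmer 2": 1.0}),
    (10, "Desired", 2.0, {"Programmer 2": 1.0}),
]


def tally(areas):
    """Sum costs by tier and by programmer; return two dicts in $K."""
    by_tier = {}
    by_programmer = {}
    for _number, tier, cost, shares in areas:
        by_tier[tier] = by_tier.get(tier, 0.0) + cost
        for programmer, share in shares.items():
            by_programmer[programmer] = (
                by_programmer.get(programmer, 0.0) + cost * share
            )
    return by_tier, by_programmer


by_tier, by_programmer = tally(FOCUS_AREAS)
total = sum(by_tier.values())

print(f"Total request: ${total:.1f}K")
for tier, cost in by_tier.items():
    print(f"  {tier}: ${cost:.1f}K")
for programmer, cost in sorted(by_programmer.items()):
    print(f"  {programmer}: ${cost:.1f}K")
```

Under the even-split assumption, the line items sum to the requested $40K ($20K Essential, $12K Important, $8K Desired), with $19.5K attributed to Programmer 1 and $20.5K to Programmer 2. This is roughly consistent with the two equal 0.25 FTE positions described in the main text.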