Difference between revisions of "Discovery White Paper"

From Earth Science Information Partners (ESIP)

Revision as of 15:41, September 2, 2011

NOTE: This is the ESIP Discovery Cluster's forum for working on a white paper for NSF's EarthCube program. If you feel that you have something positive to contribute and aren't yet a member of the cluster, feel free to join (i.e., the wiki and the monthly telecons). If you'd like to work on sections of the White Paper, add your name to that section of the outline and start writing (note: multiple authors for a section are encouraged but should collaborate with each other)! We are looking for roughly 2-3 pages of text (plus images, if any), so be pithy.


Introduction - Ruth & Chris L.

The Grand Challenge

Earth science data is growing by leaps and bounds, not only in pure data volume, but also in variety. This bounty presents new opportunities to Earth science practitioners, providing more views into how Earth processes work and particularly how the various Earth systems interact. However, integrating these datasets poses significant challenges as scientists try to incorporate the growing diversity into their scientific processes:

  • discovering data available from the ever-increasing set of data providers
  • employing tools and services with the data
  • harmonizing different datasets so they can be integrated

The NSF EarthCube program takes square aim at the issues that make data integrability so difficult with its goal of a fully interoperable digital access infrastructure. Such an infrastructure would encompass discovery of data across disciplines and from distributed data providers, interoperable services and data formats, and tools and services that can work with a wide range of datasets.

The ESIP Federation

The Federation of Earth Science Information Partners (ESIP) has also been addressing interworkable data since its inception in 1998. The ESIP Federation is composed of a wide variety of Earth science data, information and service providers: academic institutions, commercial interests, government agencies at federal, state and local levels, and non-governmental organizations. Members also cover a wide range of missions, from educational to research to applications, as well as a wide range of disciplines: solid-earth, oceanography, atmospheric sciences, land surface, ecology, and demographics.

This diversity has forced ESIP to confront many of the challenges to data integration. At the same time, it virtually mandates a loosely knit organization. While ESIP has a well-defined governance structure with respect to business activities, technical progress most often comes out of ESIP "clusters". Clusters are self-organizing groups of people within the federation who come together to tackle a particular issue, with integration across the federation usually the main goal. Some clusters are domain-focused, such as the Air Quality cluster, while others are formed to address particular aspects of interoperability, such as the Discovery and Earth Science Collaboratory clusters.

The Discovery Cluster

The Discovery Cluster began as the Federated Search Cluster in 2009 to address the problem of discovering Earth science data over the widest possible variety of data providers. In keeping with the federated aspect of the ESIP Federation at large, a federated search solution was developed based on the OpenSearch (http://www.opensearch.org) conventions. In January of 2011, the Federated Search Cluster was broadened to include subscription-based ("*casting") methods of discovery, at which point it was renamed the Discovery Cluster. The Discovery Cluster works to develop usable federated solutions to the problem of distributed and diverse providers, leveraging existing standards, conventions and technologies, with a predilection for simple solutions that have a high likelihood of voluntary adoption.

Example Scenario - Ruth

Technology framework - Chrises Mattman & Lynnes

Federated Search Framework

The distributed, diverse nature of the ESIP federation militates in favor of a federated solution to the basic problem of Search. Federated search for Earth science data actually has a long history within the Earth science disciplines. The EOSDIS (Earth Observing System Data and Information System) Version 0 system implemented a federated search across 8 remote sensing data centers as early as 1994. In the wider community, Amazon implemented a federated search named A9 to search across vendors. The conventions used for A9 eventually became known as OpenSearch (http://www.opensearch.org). OpenSearch has the virtue of being extremely simple to understand and implement. Simplicity is a critical aspect of any framework that needs widespread, voluntary adoption in a community with significant variation in technical capacity.
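To give a sense of that simplicity, the following is a minimal sketch of how a client might fill in an OpenSearch URL template, in which required parameters appear as {name} and optional ones as {name?}. The template URL, host, and the fill_template helper are hypothetical and shown for illustration only; they are not taken from any ESIP provider or from the cluster's own conventions.

```python
import re
from urllib.parse import quote

# Hypothetical URL template of the kind published in an OpenSearch
# description document; the host and path are made up for illustration.
TEMPLATE = ("http://example.org/opensearch?q={searchTerms}"
            "&count={count?}&startPage={startPage?}")

# Matches {name} (required) and {name?} (optional) placeholders.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def fill_template(template, **values):
    """Substitute placeholders in an OpenSearch-style URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(values[name]), safe="")
        if optional:
            # Optional parameters with no value become the empty string.
            return ""
        raise KeyError(f"missing required parameter: {name}")
    return _PARAM.sub(substitute, template)


url = fill_template(TEMPLATE, searchTerms="sea surface temperature", count=10)
print(url)
```

A federated client would repeat this substitution for each provider's template and then merge the returned result feeds, so a provider can join the federation largely by publishing such a template.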

Datacasting Frameworks

Servicecasting Framework

Integrating Frameworks Together

One problem that is largely unsolved is that of linking data with the services that operate on them...

Governance

Looking to the future - Chris Lynnes

The preceding text demonstrates how a lightweight standard or convention can nonetheless enable significant interoperability with respect to discovering data and services, and furthermore, how similar, interlocking conventions can provide cross-cutting interoperability, in this case between services and data. However, these are not the only Earth science entities that we should like to encompass in our drive to make systems "interworkable". Data and services (or tools) can be combined in sequences to form scientific workflows. The analysis results from executing these workflows may also be thought of in a fashion similar to data. And the results themselves may be aggregated into an experiment, in much the same way that different model runs are aggregated into an ensemble. Many of the key discovery attributes of workflows, results and experiments can be inherited from the data and service building blocks from which they are made. As a result, it is not too ambitious to hope that the entire "information stack", from data and services, up through workflows, results and experiments, can be interoperable (or interworkable) both horizontally (data with data, result with result) and vertically (data with tool with workflow with result with experiment). Such an interoperability framework would convey the key advantage of presenting everything in the proper context: a given result could be traced back down through the analysis workflow to the tools/services and data that went into the result. This rich context would be further enhanced by supporting some basic social networking technology, allowing researchers to annotate any level of the information stack (from data/service up through experiment) with contextual knowledge.

Such an "Earth Science Collaboratory (ESC)" (Fig. x) has been proposed within the ESIP Federation, with an Earth Science Collaboratory Cluster formed to push the idea forward. The ESC would allow researchers to share not just data, but tools, services, analysis workflows (i.e., techniques), and results as easily as links are shared today in tools such as Facebook, thus preserving the full context of a given result as well as the contextual knowledge added by the researcher. However, there are potential benefits for many other types of user. For instance, science assessment committees would be able to share with each other not only the (usually highly processed) end results and articles but also the input data and tools, greatly increasing transparency of the assessment. Novice graduate students would be able to "follow" more experienced researchers in the field, thus learning how to handle the data properly and avoiding common pitfalls. Educators would be able to put together science stories that trace back to the original data, allowing them to give students exposure to what "real" data look like, and how they are eventually processed to yield a compelling story. Users of Decision Support Systems (DSS) would be able to collaborate in real time with the scientist whose research is incorporated into the DSS, providing a valuable bridge over the chasm that often separates research and operations.

Such an Earth Science Collaboratory faces a number of hurdles, both technical and non-technical. However, the NSF EarthCube is aligned along the same axis, and could therefore provide the critical impetus toward realization of the ESC.

Conclusion (All)