Difference between revisions of "Discovery White Paper"

From Earth Science Information Partners (ESIP)
Revision as of 06:26, September 2, 2011

NOTE: This is the ESIP Discovery Cluster's forum for working on a white paper for NSF's EarthCube program. If you feel that you have something positive to contribute and aren't yet a member of the cluster, feel free to join (i.e., the wiki and the monthly telecons). If you'd like to work on sections of the white paper, add your name to that section of the outline and start writing (note: multiple authors for a section are encouraged but should collaborate with each other)! We are looking for roughly 2-3 pages of text (plus images, if any), so be pithy.


Introduction - Ruth

Describe the "Grand Challenge"

Describe ESIP as a whole, the cluster concept, and what this cluster is all about (i.e., background)

Describe at a really high level the technologies involved (i.e., OpenSearch, Collection casting, service casting, data casting, aggregation, etc.)

  • Need to put these in the context of the overall ESIP approach, i.e., federation with emphasis on lightweight capabilities that are inherently distributed.

Example Scenario - Ruth

Technology framework - Chris Mattmann

Governance

Looking to the future - Chris Lynnes

The preceding text demonstrates how a lightweight standard or convention can nonetheless enable significant interoperability in discovering data and services, and furthermore, how similar, interlocking conventions can provide cross-cutting interoperability, in this case between services and data. However, these are not the only Earth science entities that we would like to encompass in our drive to make systems "interworkable". Data and services (or tools) can be combined in sequences to form scientific workflows. The analysis results from executing these workflows may also be treated in a fashion similar to data. And the results themselves may be aggregated into an experiment, in much the same way that different model runs are aggregated into an ensemble. Many of the key discovery attributes of workflows, results and experiments can be inherited from the data and service building blocks from which they are made. As a result, it is not too ambitious to hope that the entire "information stack", from data and services up through workflows, results and experiments, can be interoperable (or interworkable) both horizontally (data with data, result with result) and vertically (data with tool with workflow with result with experiment). Such an interoperability framework would convey the key advantage of presenting everything in its proper context: a given result could be traced back down through the analysis workflow to the tools/services and data that went into it. This rich context would be further enhanced by supporting some basic social networking technology, allowing researchers to annotate any level of the information stack (from data/service up through experiment) with contextual knowledge.
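To make the "information stack" idea concrete, here is a minimal sketch in Python. It is not an ESIP convention or an existing API: the Entity class, its field names, and the sample entities are all hypothetical, chosen only to illustrate how discovery attributes might be inherited from building blocks, how a result might be traced back to its inputs, and how an annotation might attach at any level.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Entity:
    """One item in the information stack (hypothetical model, not an ESIP spec)."""
    name: str
    kind: str  # "data", "service", "workflow", "result", or "experiment"
    keywords: Set[str] = field(default_factory=set)
    inputs: List["Entity"] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)

    def discovery_keywords(self) -> Set[str]:
        """Return this entity's keywords plus those inherited from its inputs."""
        inherited = set(self.keywords)
        for item in self.inputs:
            inherited |= item.discovery_keywords()
        return inherited

    def trace(self, depth: int = 0) -> List[str]:
        """List this entity and everything beneath it, indented by stack depth."""
        lines = ["  " * depth + f"{self.kind}: {self.name}"]
        for item in self.inputs:
            lines.extend(item.trace(depth + 1))
        return lines


# Hypothetical example entities, bottom of the stack first.
sst = Entity("sea surface temperature granule", "data", {"ocean", "temperature"})
regrid = Entity("regridding service", "service", {"regrid"})
workflow = Entity("monthly anomaly workflow", "workflow", inputs=[sst, regrid])
result = Entity("anomaly map", "result", inputs=[workflow])
experiment = Entity("anomaly ensemble", "experiment", inputs=[result])

# A researcher's contextual note can be attached at any level of the stack.
result.annotations.append("Check the cloud mask before use")

if __name__ == "__main__":
    print("\n".join(experiment.trace()))
    print(sorted(experiment.discovery_keywords()))
```

In this sketch the experiment declares no keywords of its own, yet it is discoverable under the keywords of the data and service it ultimately depends on, and trace() walks from the experiment back down to those building blocks.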

Such an "Earth Science Collaboratory (ESC)" (Fig. x) has been proposed within the ESIP Federation, with an Earth Science Collaboratory Cluster formed to push the idea forward. The ESC would allow researchers to share not just data, but also tools, services, analysis workflows (i.e., techniques), and results as easily as links are shared today in tools such as Facebook, thus preserving the full context of a given result as well as the contextual knowledge added by the researcher. There are also potential benefits for many other types of user. For instance, science assessment committees would be able to share with each other not only the (usually highly processed) end results and articles but also the input data and tools, greatly increasing the transparency of the assessment. Novice graduate students would be able to "follow" more experienced researchers in the field, thus learning how to handle the data properly and avoiding common pitfalls. Educators would be able to put together science stories that trace back to the original data, allowing them to give students exposure to what "real" data look like and how they are eventually processed to yield a compelling story. Users of Decision Support Systems (DSS) would be able to collaborate in real time with the scientists whose research is incorporated into the DSS, providing a valuable bridge over the chasm that often separates research and operations.

Such an Earth Science Collaboratory faces a number of hurdles, both technical and non-technical. However, NSF's EarthCube program is aligned in the same direction and could therefore provide the critical impetus toward realization of the ESC.

Conclusion (All)