Difference between revisions of "NASA ACCESS09: Tools and Methods for Finding and Accessing Air Quality Data"

From Earth Science Information Partners (ESIP)

Revision as of 15:10, June 6, 2009


Tools and Methods for Air Quality Data Access and Discovery Services

The Air Quality Community Catalog: Tools and Methods for Data Access and Discovery Services
Short title: Tools for Data Access and Discovery Services

Background

Recent developments offer outstanding opportunities to fulfill the information needs of the Earth Sciences and to support many societal benefit areas. The satellite sensing revolution of the 1990s now yields near-real-time observations of many Earth System parameters. The data from surface-based monitoring networks now routinely provide detailed characterization of atmospheric and surface parameters. The ‘terabytes’ of data from these surface and remote sensors can now be stored, processed and delivered in near-real time, and the instantaneous ‘horizontal’ diffusion of information via the Internet now permits, in principle, the delivery of the right information to the right people at the right place and time. Standardized computer-computer communication languages and the emerging Service-Oriented information systems now facilitate the flexible processing of raw data into high-grade scientific or ‘actionable’ knowledge. Last but not least, the World Wide Web has opened the way to generous sharing of data and tools, leading to faster knowledge creation through collaborative analysis in real and virtual workgroups.

Nevertheless, Earth scientists and societal decision makers face significant hurdles. The production of Earth observations and models is rapidly outpacing the rate at which these observations are assimilated and metabolized into actionable knowledge that can produce societal benefits. The “data deluge” problem is especially acute for analysts interested in climate change and atmospheric processes. These processes are inherently complex, the numerous relevant data range from detailed surface-based chemical measurements to extensive satellite remote sensing, and the integration of these data requires the use of sophisticated models. As a consequence, Earth Observations (EO) are under-utilized in science and for making societal decisions.

The impediments from the user's point of view are succinctly stated in the report by NAS (1989). In short: there are no data for what the user needs; if there are needed data, the user cannot find them; if she can find them, she cannot access them; if she can access them, she does not know how good they are; if she finds the data good, she cannot merge them with other data. The data provider faces a similar set of hurdles: there are no users for the data; if there are users, the provider cannot find them; if she can find users, she does not know how to deliver the data; if she can deliver, she does not know how to make them more valuable...

Approach

  • Service Orientation, while accepted, has not been widely adopted for serving NASA products
  • SOA allows the creation of loosely coupled, agile, data systems
  • SOA -> requires ability to Publish, Find, Bind (Register, Discover, Access)
  • Stages: Acquisition, Repackaging, Usage
    • Acquisition is a stovepipe
    • Usage is a value chain
    • In-between is the 'market'
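
The Publish, Find, Bind pattern named above can be sketched in a few lines. This is only an illustration under stated assumptions: the Registry and ServiceRecord classes, the dataset names and the example.org endpoints are all hypothetical and are not part of DataFed or any NASA system.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Metadata a provider publishes about one data access service (hypothetical)."""
    name: str
    protocol: str                      # e.g. "WCS" or "WMS"
    endpoint: str                      # made-up URL, for illustration only
    keywords: set = field(default_factory=set)


class Registry:
    """Toy in-memory registry standing in for a real catalog."""

    def __init__(self):
        self._records = {}

    def publish(self, record):
        # Publish (Register): the provider adds a service description.
        self._records[record.name] = record

    def find(self, keyword, protocol=None):
        # Find (Discover): the user searches by keyword and, optionally, protocol.
        matches = (
            r for r in self._records.values()
            if keyword in r.keywords and (protocol is None or r.protocol == protocol)
        )
        return sorted(matches, key=lambda r: r.name)

    def bind(self, name):
        # Bind (Access): hand the client the endpoint it should call.
        return self._records[name].endpoint


registry = Registry()
registry.publish(ServiceRecord("pm25_surface", "WCS", "http://example.org/wcs", {"pm2.5", "surface"}))
registry.publish(ServiceRecord("aod_satellite", "WMS", "http://example.org/wms", {"aod", "satellite"}))

found = registry.find("pm2.5")
endpoint = registry.bind(found[0].name)
```

The provider and the user never talk to each other directly; they share only the registry, which is what keeps the coupling loose.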


The Network

  • Fan-In, Fan-Out
  • (so is GCI) not central
  • Holarchy: data flow up into the pool through the aggregator network and down through the disaggregator/filter network

ScaleFreeNetwork3.png

  • Data distributed through Scale-free aggregation network. Metadata contributed along the line of usage. Homogenized and shared.
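
A minimal sketch of the fan-in/fan-out idea, assuming the pool is a flat list of records: aggregation merges the provider feeds into one pool and tags each record with its source, and disaggregation filters the pool for one user. The function names, provider feeds and values are made up for illustration.

```python
def aggregate(provider_feeds):
    """Fan-in: merge per-provider records into one pool, tagging each with its source."""
    pool = []
    for provider in sorted(provider_feeds):
        for record in provider_feeds[provider]:
            pool.append({**record, "provider": provider})
    return pool


def disaggregate(pool, **wanted):
    """Fan-out: filter the shared pool down to what one user asked for."""
    return [r for r in pool if all(r.get(k) == v for k, v in wanted.items())]


# Hypothetical feeds; the values are placeholders, not real measurements.
feeds = {
    "EPA": [{"parameter": "PM2.5", "value": 12.0}],
    "NASA": [{"parameter": "AOD", "value": 0.3}, {"parameter": "PM2.5", "value": 10.5}],
}
pool = aggregate(feeds)
pm25_for_user = disaggregate(pool, parameter="PM2.5")
```

In a real network the aggregators and filters would themselves be chained services rather than two functions in one process.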


This proposal...application of the GEOSS concepts in the federated data system, DataFed. The proposal focuses on the SOA aspects of Publish, Find, Bind. ...a contribution to the emerging architecture of GEOSS. It is recognized that it represents just one of the many configurations that are consistent with the loosely defined concept of GEOSS.


The implementation details and the various applications of DataFed are reported elsewhere [4]-[6].



Data Value Chain Stages: Acquisition - Mediation - Application

  • Acquisition: Data from Sensor -> CalVal -> Data exposed
  • Mediation: Accessible/Reusable -> Leverageable
  • Application: Processed -> LeveragedSynergy -> Productivity

Provider and User Oriented Designs

  • Providers offer their wares ... to reach the maximum number of users in many applications

GEOSS Fanin Fanout.png
GEOSSUIC Diagram.png
PublishFindBind.png


Data as Service

  • Wrappers, reusable tools and methods (wrapper classes for: SeqImage/Seq File, SQL, netCDF) into WCS and WMS
  • WCS/WMS GetCapabilities Conventions that allow metadata reuse
  • WCS Capabilities expanded -> WMS (combination of WCS and Render); build the metadata in two steps: WCS first, then augment with WMS fields
  • WMS GetCapabilities > ISO Maker Tool to publish metadata
  • Google Analytics sensors: where do we put them so that we can identify usage patterns?

To illustrate the Network, Coding (faceting) through metadata, WMS to show data

    • WCS next
  • Binding to data goes through standard data access protocols; publishing and finding require a metadata system
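
The two-step metadata build (WCS first, then WMS fields) and the GetCapabilities request can be sketched as follows. The `service` and `request` query parameters are the standard OGC key-value pairs; the endpoint URL, the helper names and the metadata fields are assumptions made for this illustration.

```python
from urllib.parse import urlencode


def get_capabilities_url(endpoint, service):
    """Build an OGC GetCapabilities request URL using key-value-pair encoding."""
    query = urlencode({"service": service, "request": "GetCapabilities"})
    return f"{endpoint}?{query}"


def augment_with_wms(wcs_metadata, wms_fields):
    """Step 2: copy the WCS-derived record and add the rendering (WMS) fields."""
    merged = dict(wcs_metadata)
    merged.update(wms_fields)
    return merged


# Step 1: metadata taken from a WCS coverage description (values are made up).
wcs_metadata = {
    "name": "pm25_surface",
    "title": "Surface PM2.5",
    "bbox": (-125.0, 24.0, -66.0, 50.0),
}
# Step 2: fields that only matter once the coverage is rendered as a map layer.
layer = augment_with_wms(wcs_metadata, {"styles": ["default"], "format": "image/png"})

url = get_capabilities_url("http://example.org/ows", "WCS")
```

Because the WMS record is derived from the WCS record, the shared fields are written once and reused, which is the point of the metadata-reuse conventions.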

Data Discovery

  • Semantic Mediation - Repackaging/homogenizing metadata - added value comes from incorporating user actions back into the semantic relationships.
  • Metadata system for publishing and finding content has to be jointly developed between data providers and users.
  • Generic catalog systems - metadata collection of not only what provider has done but also tracking what users need
    • Collecting and Enhancing Metadata from observing Users
  • Communication along the value chain, in both directions
  • Metadata: the glue and the message
  • Market approach: many providers, many users, many products
  • Faceted search
    • user is happy
  • Search by usage data
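
Faceted search and search by usage can be illustrated with a small sketch. The catalog records, the facet names (parameter, platform, provider) and the access log are hypothetical; a real catalog would draw them from the published metadata and from observed user actions.

```python
from collections import Counter

# Hypothetical catalog records; facet names and values are illustrative only.
DATASETS = [
    {"id": "a", "parameter": "PM2.5", "platform": "surface", "provider": "EPA"},
    {"id": "b", "parameter": "AOD", "platform": "satellite", "provider": "NASA"},
    {"id": "c", "parameter": "PM2.5", "platform": "satellite", "provider": "NASA"},
]


def facet_filter(records, **selected):
    """Keep the records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def facet_counts(records, facet):
    """Count the remaining records per value of one facet, to guide the next choice."""
    return Counter(r[facet] for r in records)


def rank_by_usage(records, access_log):
    """Order records by how often users accessed them, most used first."""
    hits = Counter(access_log)
    return sorted(records, key=lambda r: (-hits[r["id"]], r["id"]))


pm25 = facet_filter(DATASETS, parameter="PM2.5")
platforms = facet_counts(pm25, "platform")
ranked = rank_by_usage(pm25, ["c", "a", "c"])
```

Each facet choice narrows the result set, and the counts tell the user how many datasets remain under each further choice.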

Description of how users will discover and use services provided by NASA, other agencies, academia...

  • Detail on discovery services
  • System components for persistent availability of these services
    • machine-to-machine interface
    • GUI interface

Classes of Users

  • by value chain
  • by level of experience
  • by ...

Data Access and Usage

Provider Oriented Catalog:

  • All data from a provider (subset of all data)
  • Metadata only (Standard protocol)
  • Provider metadata (meta-meta-meta data)


User Oriented Catalog:

  • The right data to the right user at the right time (subset of all data)
  • Seamlessly accessible (Standard protocol)
  • Complete Metadata (meta-meta-meta data)


  • Has to handle derived data (Raw-processed Pyramid --- less along the value chain-network)

Performance Measurement and Feedback

Metadata from Providers

Metadata from Users

  • Google Analytics/Google Sitemap - provides feedback and helps the market process by improving the "shopping" experience for users; creates value for both users and producers.
  • Amazon - collects data on user actions in order to help the next user navigate to books of interest, and collects data on the text in each book in order to relate books to one another.
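
A minimal sketch of how metadata from users could be collected, assuming each user session is simply the list of datasets accessed in it. Counting co-accesses gives an "also used with" relation, similar in spirit to the Amazon example above. The session contents and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_access_counts(sessions):
    """Count how often two datasets are accessed within the same user session."""
    related = defaultdict(Counter)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            related[a][b] += 1
            related[b][a] += 1
    return related


def also_used(related, dataset_id, limit=3):
    """Datasets most often accessed together with dataset_id (ties broken by name)."""
    ranked = sorted(related[dataset_id].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:limit]]


# Hypothetical access sessions, e.g. reconstructed from web analytics logs.
sessions = [
    ["pm25_surface", "aod_satellite"],
    ["pm25_surface", "aod_satellite", "fire_pixels"],
    ["aod_satellite", "fire_pixels"],
]
related = co_access_counts(sessions)
suggestions = also_used(related, "pm25_surface")
```

The resulting relation is itself metadata: it can be stored next to the provider-supplied description and offered to the next user.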

Metadata from Mediators

Management

Community approach:

  • ESIP AQ Workgroup with links to
    • GEO CoP -> Linking multi-region (global), multi SBA
  • Agency (Air Quality Information Partnership (AQIP))
    • EPA
    • NASA
    • NOAA
    • DOE ....
  • ESIP
    • Semantic Cluster
      • Offer: rich, highly textured data needing semantics
      • Needed: Semantics of the data descriptions and finding
    • Web Services and Orchestration Cluster
      • Offer: A rich array of WCS data access services
      • Different workflow & orchestration clients
    • Meetings
      • Winter
      • Summer
    • Telecons
    • OGC WCS netCDF
      • Stefano
      • Ben
      • Max Cugliano
  • Other Related Proposals/Projects
    • Show our CC proposal to AQWG
      • Ask them if they have a way to use this CC as a testbed
      • Add a paragraph to the proposal to indicate the way theirs fits in

Links

ESIP


DataFed wiki

NASA Existing Component - Links

  • Atmosphere Data Reference Sheet - Datasets identified to be relevant to atmospheric research.
  • Giovanni - provides a simple and intuitive way to visualize, analyze, and access vast amounts of Earth science remote sensing data without having to download the data. Giovanni metadata briefly describes each parameter.
  • parameter information pages - provide short descriptions of important geophysical parameters; information about the satellites and sensors which acquire data relevant to these parameters; links to GES DAAC datasets which contain these parameters; and external data source links where data or information relevant to these parameters can be found.
  • Mirador - new search-and-order Web interface that employs the Google Mini appliance for metadata keyword searches. Other features include quick response, a data file hit estimator, a Gazetteer (geographic search by feature name) and event search.
  • Frosty
  • GCMD - has selected Air Quality datasets; provides many discovery metadata keywords, citations, etc., but lacks standard data access.
  • A-Train Data Depot - to process, archive, allow access to, visualize, analyze and correlate distributed atmospheric measurements from A-Train instruments.
  • Atmospheric Composition Data and Information Service Center - a portal to the Atmospheric Composition (AC) specific, user-driven, multi-sensor, on-line, easy-access archive and distribution system employing data analysis and visualization, data mining, and other user-requested techniques for better science data usage.
  • WIST - Warehouse Inventory Search Tool. This search-and-order tool is the primary access point to 2,100 EOSDIS and other Earth science data sets.
  • FIND - The FIND Web-based system enables users to locate data and information held by members of the Federation (DAACs are Type 1 ESIPs.) FIND incorporates EOSDIS data available from the DAAC Alliance data centers as well as data from other Federation members, including government agencies, universities, nonprofit organizations, and businesses.
  • SESDI Semantically-Enabled Science Data Integration | Vision - ACCESS Project, Peter Fox - will demonstrate how ontologies implemented within existing distributed technology frameworks will provide essential, re-useable, and robust, support for an evolution to science measurement processing systems (or frameworks) as well as for data and information systems (or framework) support for NASA Science Focus Areas and Applications.