NASA ACCESS09: Tools and Methods for Finding and Accessing Air Quality Data

From Earth Science Information Partners (ESIP)

Air Quality Cluster > AQIP Main Page > Proposal | NASA ACCESS Solicitation | Context | Resources | Forum | Participants

Tools and Methods for Finding and Delivering Air Quality Data

Short: Tools for Finding and Delivering Air Quality Data

This proposal is in response to the solicitation: Advancing Collaborative Connections for Earth System Science (ACCESS) 2009. In particular, it focuses on providing "means for users to discover and use services being made available by NASA, other Federal agencies, academia, the private sector, and others". This proposal is offering tools and methods for data access and discovery services for Air Quality-related datasets. However, the tools, methods and infrastructure should be applicable to other domains of science and application.

There are abundant impediments to seamless and effective data usage that both data providers and users encounter. The impediments from the user's point of view are succinctly stated in the report by NAS (1989), in short: the user can not find the data, if she can find them, she can not access them, if she can access then, she does not know how good they are, if she finds the data good, she can not merge them with other data. The data provider face a similar set of hurdles: the provider can not find the users, if she can find users, she does not know how to seamlessly deliver the data, if she can deliver, she does not know how to make them more valuable to the users. This project intends to provide support to overcome the first two hurdles: finding the user/provider and accessing/delivering the desired data.


Approach

Architectural Approach: The architectural basis of the proposed work is Service Oriented Architecture (SOA) for the publishing, finding, binding to and delivery of data as services. The critical aspect of SOA is the loose coupling between service providers and service users. Loose coupling is accomplished through plug-and-play connectivity facilitated by standards-based data access service protocols. SOA is the only architecture that we are aware of that allows seamless connectivity of data between a rich set of provider resources and diverse array of users. Service providers registers services in a suitable service registry, the users discover the desired service and access the data The result is a dynamic binding mechanism for the construction of loosely-coupled work-flow applications.

Service orientation, has been accepted as the desired way of delivering Earth Observation (EO) data products. However, formal standards-based data-as-service offerings have been slow delivery of data products in NASA and other Agencies. While offering images through OGC WMS standard interface is becoming common for many Federal Agency data products, there is currently no effective way for the users to find those services advertised and exposed over dispersed over the web pages.

Management Approach: Core group Community - Find-bind ... was developend and tested durion AIP2 by the cooomm Collaborative,

Engineering and Implementation: The three major components of this SOA-based project are: (1) facilities for publication of data services (2) facilities to find data services and (3) facilities to access data services. Everything is a service. mash ups to other systems. Connectivity-workflow. The engineering design of the proposed work is Each of these components will be supported by a set of tools and methods.

Technology Approach: The project will rely on three key mature and widely used standard protocols for interoperability : (1) OGC WMS, WCS and WFS web services for accessing data; (2)ISO 19115 for geographic metadata and (3) HTTP Get for inter-service data transfer. Furthermore, the developing software is implemented using three key maturing 'intellectual technologies': (1) Tagging for flexible, user-extensible structuring and annotation of diverse metadata; (2) Faceted search technology for navigation through multidimensional data discovery and (3) Ajax-based dynamic user interface to the data discovery

Data as Service

  • Wrappers, reusable tools and methods (wrapper classes for: SeqImage/Seq File, SQL, netCDF) into WCS and WMS
    • Unidata (netCDF), CF (CF working group; Naming); GALEON (WCS-netCDF) building on top of...

Metadata as Service

  • Metadata not service: readme file
  • Metadata as service: Capability


WCS/WMS GetCapabilities Conventions that allow metadata reuse

  • WCS Capabilitie expanded -> WMS (Combination of WCS and Render) (build metadata in 2 steps wcs and then augment with wms fields)
  • WMS GetCapabilities > ISO Maker Tool publish metadata

Publishing (metadata -> WAF)

The data discovery through the clearinghouse is aided by ISO metadata for Geospatial data which is prepared for each dataset.

The metadata has the primary purpose to facilitate finding and accessing the data in order to help dealing with first two hurdles that the users face. Clearly, the air quality specific metadata such as sampling platform, data domain and measured parameters etc. need to be defined by air quality users. Dealing with the hurdles of data quality and multi-sensory data integration are topics of future efforts.

The metadata is prepared by transforming and augmenting OGC GetCapabilities into ISO metadata records. The GetCapabilities document provides initial metadata for an ISO 19115 metadata record, through tools and methods provided this metadata is extended to include AQ-specific metadata. The ISO record is validated and saved into the AQ community catalog. The community catalog is registered as a component in the GEOSS Component and Service Registry (CSR). The GEOSS Clearinghouses query the GEOSS CSR for catalogs and then periodically harvest the catalogs for their metadata records, ending the metadata publishing process.

20090505 AIP2 Metadata Registration.png

Finding (...)

The finding of air quality data is accomplished in two stages: a coarse filter generic to all Earth Observations through the clearinghouse and then a high resolution filter specific to air quality in the AQ Catalog Browser.

The GEOSS clearinghouse provides a coarse filter using generic discovery metadata to find Earth Observations. However, the clearinghouse exposes a search API which enables machine queries to be made. The returned records can be further searched using the entire metadata record originally submitted allowing a more refined search specific to a particular community. The AQ Community has built a catalog browser interface to the clearinghouse which enables this two step process. After the initial coarse filter, the returned records are browsed using a customized, faceted search interface was built to search the extended AQ metadata and find AQ data using specific filters such as sampling platform and data structure. Finding the right data is further enhanced by the user's ability to immediately view data as WMS through multiple clients provided by ESRI, Compusult and others. This proposal allow query results to be embedded in another web page. The proposal will also link to multiple available WMS viewers to browse layers available in catalog.

Additionally, the finding process will be augmented through web-based analytics that monitor user activity of the AQ catalog. These analytics will highlight where users come from spatially and virtually (i.e. by search engine, link...) as well as what datasets, spatial and temporal domain and parameters are most viewed or what datasets were viewed together. This feedback will help to hone the catalog to provide more useful information to users such as more datasets in a certain domain or tips for what others who viewed this dataset also viewed. The metrics will also help providers by identifying who some of the users of the data are.

20090505 AIP2 ADC UICSlide8.PNG

Binding

Once the data are accessible through standard service protocols and discoverable through the clearinghouse they can be incorporated and browsed in any client application including the ESRI and Compusult GEO Portals.

The registered datasets are also directly accessible to air quality specific, work-flow based clients which can perform value-adding data processing and analysis.

The loose coupling between the growing data pool in GEOSS and workflow-based air quality client software shows the benefits of the Service Oriented Architecture to the Air Quality and Health Societal Benefit Area.

Add new browser and client

Background

Recent developments offer outstanding opportunities to fulfill the information needs for Earth Sciences and support for many societal benefit areas. The satellite sensing revolution of the 1990's now yield near-real-time observations of many Earth System parameters. The data from surface-based monitoring networks now routinely provide detailed cgaracterisation of atmospheric and surface parameters. The ‘terabytes’ of data from these surface and remote sensors can now be stored, processed and delivered in near-real time and the instantaneous ‘horizontal’ diffusion of information via the Internet now permits, in principle, the delivery of the right information to the right people at the right place and time. Standardized computer-computer communication languages and the emerging Service-Oriented information systems now facilitate the flexible processing of raw data into high-grade scientific or ‘actionable’ knowledge. Last but not least, the World Wide Web has opened the way to generous sharing of data and tools leading to faster knowledge creation through collaborative analysis in real and virtual workgroups.

Nevertheless, Earth scientists and societal decision makers face significant hurdles. The production of Earth observations and models are rapidly outpacing the rate at which these observations are assimilated and metabolized into actionable knowledge that can produce societal benefits. The “data deluge” problem is especially acute for analysts interested in climates change and atmospheric processes are inherently complex, the numerous relevant data range form detailed surface-based chemical measurements to extensive satellite remote sensing and the integration of these requires the use of sophisticated models. As a consequence, Earth Observations (EO) are under-utilized in science and for making societal decisions.

Web Services

A Web Service is a URL addressable resource that returns requested data, e.g. current weather or the map for a neighborhood. Web Services use standard web protocols: HTTP, XML, SOAP, WSDL allow computer to computer communication, regardless of their language or platform. Web Services are reusable components, like ‘LEGO blocks’, that allow agile development of richer applications with less effort. Visionaries (e.g. Berners-Lee, the ‘father’ of the Internet) argue that Web services can transform the web from a medium for viewing and downloading to distributed data/knowledge-exchange and computing.

Enabling Protocols of the Web Services architecture: Connect: Extensible Markup Language (XML) is the universal data format that makes data and metadata sharing possible. Communicate. Simple Object Access Protocol (SOAP) is the new W3C protocol for data communication, e.g. making and responding to requests. Describe. Web Service Description Language (WSDL) describes the functions, parameters and the returned results from a service. Discover. Universal Description, Discovery and Integration (UDDI) is a broad W3C effort for locating and understanding web services.

Service Oriented Architecture (SOA)) provides methods for systems development and integration where systems package functionality as interoperable services. SOA allows different applications to exchange data with one another. Service-orientation aims at a loose coupling of services with operating systems, programming languages and other technologies that underlie applications. These services communicate with each other by passing data from one service to another, or by coordinating an activity between two or more services. SOA can be seen in a continuum, from older concepts of distributed computing and modular programming, through to current practices of mashups, and Cloud Computing.


20090505 AIP2 ADC UICSlide3.PNG
There are numerous Earth Observations that are available and in principle useful for air quality applications such as informing the public and enforcing AQ standards. However, connecting a user to the right observations or models is accompanied by an array of hurdles.

The GEOSS Common Infrastructure allows the reuse of observations and models for multiple purposes

Even in the narrow application of Wildfire smoke, observations and models can be reused.
20090505 AIP2 ADC UICSlide4.PNG

20090505 AIP2 ADC UICSlide5.PNG
The ADC and UIC are both participating stakeholders in the functioning of the GEOSS information system that overcomes these hurdles. The UIC is in position to formulate questions and the ADC can provide infrastructure that delivers the answers.

20090505 AIP2 ADC UICSlide6.PNG
The data reuse is possible through the service oriented architecture of GEOSS.

  • Service providers registers services in the GEOSS Clearinghouse.
  • Users discover the needed service and access the data

The result is a dynamic binding mechanism for the construction of loosely-coupled work-flow applications.

20090505 AIP2 ADC UICSlide7.PNG
The metadata has the primary purpose to facilitate finding and accessing the data in order to help dealing with first two hurdles that the users face. Clearly, the air quality specific metadata such as sampling platform, data domain and measured parameters etc. need to be defined by air quality users. Dealing with the hurdles of data quality and multi-sensory data integration are topics of future efforts.


The finding of air quality data is accomplished in two stages.

  • the data are filtered through the generic discovery mechanism of the clearinghouse
  • then air quality specific filters such as sampling platform and data structure are applied

20090505 AIP2 ADC UICSlide9.PNG
Once the data are accessible through standard service protocols and discoverable through the clearinghouse they can be incorporated and browsed in any application including the ESRI and Compusult GEO Portals.

20090505 AIP2 ADC UICSlide10.PNG
The registered datasets are also directly accessible to air quality specific, work-flow based clients which can perform value-adding data processing and analysis.

The loose coupling between the growing data pool in GEOSS and workflow-based air quality client software shows the benefits of the Service Oriented Architecture to the Air Quality and Health Societal Benefit Area.
20090505 AIP2 ADC UICSlide11.PNG

The Network

  • Fan-In, Fan-Out
  • (so is GCI) not central
  • holarchy , data up into the pool though the aggregator network and down the disaggregator/filter network

ScaleFreeNetwork3.png

  • Data distributed through Scale-free aggregation network. Metadata contributed along the line of usage. Homogenized and shared.


This proposal...application of the GEOSS concepts in the federated data system, DataFed. The proposal focuses on the SAO aspects of the publish find bind. ...a contribution to the emerging architecture of GEOSS. It is recognized that it represents just one of the many configurations that is consistent with the loosely defind concept of GEOSS.


The implementation details and the various applications of DataFed are reported elsewhere [4]-[6].



Data Value Chain Stages: Acquisition - Mediation - Application

  • Acquisition: Data from Sensor -> CalVal -> Data exposed
  • Mediation: Accessible/Reusable -> Leverable
  • Application: Processed -> LeveragedSynergy -> Productivity

Provider and User Oriented Designs

  • Providers offers it wares ... to reach maxinum users in many applications

GEOSS Fanin Fanout.png
GEOSSUIC Diagram.png
PublishFindBind.png

Community

To illustrate the Network, Coding (faceting) through metadata, WMS to show data

    • WCS next
  • Binding to data through standard data access protocols, publishing and finding requires metadata system
  • google analytic sensors - where do we put it? so that we can identify users pattern.

Community

Data Discovery

  • Semantic Mediation - Repackaging/homogenizing metadata - added value comes from incorporating user actions back into the semantic relationships.
  • Metadata system for publishing and finding content has to be jointly developed between data providers and users.
  • Generic catalog systems - metadata collection of not only what provider has done but also tracking what users need
    • Collecting and Enhancing Metadata from observing Users
  • Communication along the value chain, in both direction;
  • Metadata the glue and the message
  • Market approach; many providers; many users; may products
  • Faceted search
    • user is happy
  • Search by usage data

Description on how users will discover and use services provided by NASA, other Agencies, academia..

  • Detail on discovery services
  • System components for persistent availability of these services
    • machine-to-machine interface
    • GUI interface

Classes of Users

  • by value chain
  • by level of experience
  • by ...

Data Assess and Usage

Provider Oriented Catalog:

  • All data from a provider (subset fall data)
  • Metadata only (Standard protocol)
  • Provider metadata (meta-meta-meta data)


User Oriented Catalog:

  • The right data to the user at the right time the right (subset fall data)
  • Seamlessly accessable (Standard protocol)
  • Complete Metadata (meta-meta-meta data)


  • Has to handle derived data (Raw-procssed Pyramid --- less along the value chain-network)

Performance Measurement and Feedback

Metadata from Providers

  • Provide discovery, access information
  • Providers can also provide information about how users behave once they are at their site

Metadata from Users

  • Google Analytics/Google Sitemap - provides feedback and helps market process by improving "shopping" experience for users - creates values to both users and producers.
  • Amazon - collects data on user actions in order to help the next user navigate to books of interest. collects data on text in the book in order to relate books together
  • Recently Viewed widget Tracks for the user the last things that they viewed

Metadata from Mediators

  • Alexa Amazon Web Information Serviceprovides analytics about another site. This is good for us b/c we could access analytics about distributed data access and show them uniformly in catalog (Site Overview pages, Traffic Detail pages and Related Links pages)
    • Alexa also example of site pulling in information about one URL from multiple sources. (Amazon example)
  • perform collaborative filtering based upon data collected from more than one [data provider]
  • Avinash Kaushik - Evangelist for Google Analytics
  • Key Performance Indicators -
    • Where do people come from? search, other links - % of visits across all traffic source
    • Bounce rate - Came to one page and left immediately (Good for us b/c it helps direct people to the right information/right place/right time)
      • Combining where people come from and how many people from that source bounce lets you know if you are targeting the right people...
      • Wrong audience
      • Wrong Landing page
    • Visitor loyalty - # that come in a give duration
    • Recency of visit - do you retain people over time
  • What are the key outcomes you want people to do (i.e. subscribe to feed, click on data link, click on metadata link...)
  • What is the top content on site?

Management

Community approach:

  • ESIP AQ Workgroup with links to
    • GEO CoP -> Linking multi-region (global), multi SBA
  • Agency (Air Quality Information Partnership (AQIP))
    • EPA
    • NASA
    • NOAA
    • DOE ....
  • ESIP
    • Semantic Cluster
      • Offer: a rich highly textured data needing semantics
      • Needed: Semantics of the data descriptions and finding
    • Web Services and Orchestration Cluster
      • Offer: A rich array of WCS data access services
      • Different workflow & orchestration clients
    • Meetings
      • Winter
      • Summer
    • Telecons
    • OGC WCS netCDF
      • Stefano
      • Ben
      • Max Cugliano
  • Other Related Proposals/Projects
    • Show our CC proposal to AQWG
      • Ask the if they have a way to use this CC as testbed
      • Add a paragraph into the proposal to indicate the way their fits in

links

ESIP


DataFed wiki

NASA Existing Component - Links

  • Atmosphere Data Reference Sheet - Datasets identified to be relevant to atmospheric research.
  • Giovanni - that provides a simple and intuitive way to visualize, analyze, and access vast amounts of Earth science remote sensing data without having to download the data GIOVANNI metadata describes briefly parameter
  • parameter information pages - provide short descriptions of important geophysical parameters; information about the satellites and sensors which acquire data relevant to these parameters; links to GES DAAC datasets which contain these parameters; and external data source links where data or information relevant to these parameters can be found.
  • Mirador new search and order Web interface employs the Google mini appliance for metadata keyword searches. Other features include quick response, data file hit estimator, Gazetteer (geographic search by feature name capability), event search
  • Frosty
  • GCMD - Has selected Air Quality datasets; provides lots of discovery metadata keywords, citation, etc. lacks standard data access.
  • A-Train Data Depot - to process, archive, allow access to, visualize, analyze and correlate distributed atmospheric measurements from A-Train instruments.
  • Atmospheric Composition Data and Information Service Center - is a portal to the Atmospheric Composition (AC) specific, user driven, multi-sensor, on-line, easy access archive and distribution system employing data analysis and visualization, data mining, and other user requested techniques for the better science data usage.
  • WIST - Warehouse Inventory Search Tool. search-and-order tool is the primary access point to 2,100 EOSDIS and other Earth science data sets
  • FIND - The FIND Web-based system enables users to locate data and information held by members of the Federation (DAACs are Type 1 ESIPs.) FIND incorporates EOSDIS data available from the DAAC Alliance data centers as well as data from other Federation members, including government agencies, universities, nonprofit organizations, and businesses.
  • SESDI Semantically-Enabled Science Data Integration | Vision - ACCESS Project, Peter Fox - will demonstrate how ontologies implemented within existing distributed technology frameworks will provide essential, re-useable, and robust, support for an evolution to science measurement processing systems (or frameworks) as well as for data and information systems (or framework) support for NASA Science Focus Areas and Applications.