Subcommittee on Interoperability

Subcommittee Telecons

Charge for Subcommittee:

Decide on a standard that we can use to build a connection for flowing data, and then demonstrate that ability. We can then improve this capability by incrementally expanding the scope of what is transferred.

Guiding Principles:

We should be compliant with OGC (the Open Geospatial Consortium) and GEO (Group on Earth Observations) standards; we should also coordinate to ensure we fit into future plans for the EN (Environmental Information Exchange Network).

The initial focus will be on air quality data, but we need to also consider metadata and standard names (for things like pollutants/parameters, units of measure, time, and station/site identifiers). The WMO (World Meteorological Organization) and GEO may be guideposts for this.

In order to facilitate the work of the Interoperability Subcommittee, a wiki workspace was set up on the topic of Interoperability of Data Systems. This workspace is on the ESIP wiki and will be used to accommodate inter-agency and inter-disciplinary participation.

WCS

We relied on a heuristic method of discussing standards that appear to be making universal inroads (e.g., into GEO), are supported by reputable organizations (W3C, OGC, OASIS), and have been widely accepted in the information technology and environmental monitoring communities. This led us to decide that the WCS (Web Coverage Service, from OGC) would be the first service we attempt to pilot. There was general agreement that everyone seems comfortable with WCS and that it is a safe place to start (e.g., for units, nomenclature, etc.); however, it may cover only a subset of what needs to be considered.

We need to agree on:

  1. a messaging exchange scheme (perhaps SOAP, KVP, or HTTP/REST?)
  2. a common definition of layer (that is, if a model has 15 layers and 100 parameters, must you get them all?)
  3. payload structures and formats.

Messaging Exchange Scheme

Using WCS, the call is a single URL string, so key-value pairs (KVP) are the only option.
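
As a rough illustration of the KVP style, the snippet below assembles a WCS 1.1-style GetCoverage request as a single URL string. The endpoint, coverage identifier, and parameter values are placeholders for discussion, not an agreed interface.

# Sketch of a WCS 1.1-style GetCoverage request assembled from key-value pairs.
# The endpoint and coverage identifier are hypothetical placeholders.
from urllib.parse import urlencode

endpoint = "http://example.gov/cgi-bin/wcs"  # placeholder air quality data server

params = {
    "SERVICE": "WCS",
    "VERSION": "1.1.0",
    "REQUEST": "GetCoverage",
    "IDENTIFIER": "ozone_1hr",  # hypothetical coverage name
    "BOUNDINGBOX": "-125,24,-66,50,urn:ogc:def:crs:EPSG::4326",
    "TIMESEQUENCE": "2008-05-01T00:00:00Z/2008-05-01T23:00:00Z",
    "FORMAT": "application/x-netcdf",  # or text/csv, if the server offers it
}

print(endpoint + "?" + urlencode(params))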

Common Definition of Layer

A layer is known as a coverage and describes one parameter of one dataset.

Payloads - Format for sending Data

The list of payloads we want to consider is:

  1. netCDF (from Unidata) with CF metadata/conventions
  2. KML (or KMZ) (Keyhole Markup Language)
  3. CSV (comma-separated values)
  4. EN-compliant XML

netCDF with CF conventions and CSV are seen as the preferred payload formats for WCS; netCDF has more advantages, but CSV is simpler to use.

Each of these would require more definition before we can implement it, and we should probably pick one to begin with. For example, what would the CSV structure be – would there be minimum requirements for station or raster data?
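
To make the CSV question concrete, one possible minimal layout for station (point) data is sketched below using Python's csv module. The column names and example values are placeholders only, not a proposed standard.

# One possible minimal CSV layout for station (point) observations.
# Column names and values are illustrative only.
import csv
import sys

header = ["station_id", "latitude", "longitude", "datetime_utc",
          "parameter", "averaging_time", "value", "units"]
rows = [
    ["010730023", 33.553, -86.815, "2008-05-01T13:00:00Z", "ozone", "1-hour", 0.041, "ppm"],
    ["010730023", 33.553, -86.815, "2008-05-01T14:00:00Z", "ozone", "1-hour", 0.047, "ppm"],
]

writer = csv.writer(sys.stdout)
writer.writerow(header)
writer.writerows(rows)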

What payloads get sent? What is the “payload” if WCS is used? The KML format is widely used. CSV files can be used. XML is desired for the Exchange Network; file sizes are an issue, as is how to convert into Oracle. There is a need to engage Rudy Husar in this dialogue. David McCabe also has ideas that should be explored.

Classification of Air Quality Data

The air quality data that is to be exchanged can also be classified in many ways.

There are measurements, aggregates (daily summaries, MSA summaries, etc.), events, method descriptions, etc.

Measurements can be broad in space: on a 3-D model grid or 2-D satellite field of view (raster data), or more limited in space to a path (lidar or mobile monitor) or point (stationary monitor).

Regarding time, measurements can also be a continuous series (6-second, 5-minute, or 1-hour) or discrete/instantaneous (including aggregates).
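
One way to summarize this classification is as a pair of tags attached to each dataset, as in the sketch below; the names are invented purely to illustrate the categories above.

# Illustrative tags for the spatial/temporal classification described above.
from dataclasses import dataclass
from enum import Enum

class SpatialType(Enum):
    GRID = "3-D model grid (raster)"
    SWATH = "2-D satellite field of view (raster)"
    PATH = "lidar or mobile monitor"
    POINT = "stationary monitor"

class TemporalType(Enum):
    CONTINUOUS = "regular series (e.g., 6-second, 5-minute, 1-hour)"
    DISCRETE = "instantaneous or aggregate values"

@dataclass
class DatasetClassification:
    spatial: SpatialType
    temporal: TemporalType

# Example: hourly ozone from a stationary monitor.
hourly_ozone = DatasetClassification(SpatialType.POINT, TemporalType.CONTINUOUS)
print(hourly_ozone)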


Metadata

There are two types of metadata: technical (like grid size, file creation date, etc.) and business (like data source and data quality indicators, model run characteristics, or descriptions of the data and how they were manipulated).

We briefly discussed a separate kind of metadata (“operational”?) to notify downstream users that something upstream has changed: the CAP (Common Alerting Protocol) from OASIS, which GEO is investigating. Getting news about critical data events was something that participants at the Data Summit thought was important. Atom and RSS are also possibilities for this.
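
If Atom were used for this kind of operational notification, a feed entry announcing a data change might look roughly like the sketch below. It uses only the Python standard library; the feed title, identifier, and event wording are invented for illustration and do not represent any agreed convention.

# Minimal Atom entry announcing an upstream data change (illustrative only).
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

feed = ET.Element(f"{{{ATOM}}}feed")
ET.SubElement(feed, f"{{{ATOM}}}title").text = "AQS data change notices (example)"
ET.SubElement(feed, f"{{{ATOM}}}updated").text = datetime.now(timezone.utc).isoformat()

entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
ET.SubElement(entry, f"{{{ATOM}}}title").text = "1-hour ozone values revised for 2008-05-01"
ET.SubElement(entry, f"{{{ATOM}}}id").text = "urn:example:aqs:change:2008-06-01:0001"
ET.SubElement(entry, f"{{{ATOM}}}updated").text = "2008-06-01T12:00:00Z"
ET.SubElement(entry, f"{{{ATOM}}}summary").text = (
    "Hourly ozone measurements for 2008-05-01 were resubmitted by the reporting agency."
)

print(ET.tostring(feed, encoding="unicode"))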

Another type of metadata discussed was that related to “discovery” of services. That is, for the ‘system of systems’ in the value chain, what data is available, from where, and how.
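
On the discovery side, the standard WCS GetCapabilities operation already lets a client ask a server what coverages it offers. The sketch below issues such a request and lists the advertised coverage identifiers; the endpoint URL is a placeholder, not a recommended server.

# Sketch of service discovery via WCS GetCapabilities (placeholder endpoint).
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

endpoint = "http://example.gov/cgi-bin/wcs"  # placeholder; substitute a real WCS server

query = urlencode({"SERVICE": "WCS", "VERSION": "1.1.0", "REQUEST": "GetCapabilities"})
with urlopen(endpoint + "?" + query) as response:
    capabilities = ET.parse(response)

# Print whatever coverage identifiers the server advertises, ignoring namespaces.
for elem in capabilities.iter():
    if elem.tag.endswith("Identifier"):
        print(elem.text)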


Outstanding Issues with Interoperability Using WCS

In exploring the idea of using WCS for the exchange of air quality information, several practical issues have arisen. These will need to be worked out prior to any development work.

  1. For most of my academic users, we need an 'asynchronous' request. They would prefer to submit a request and, along with that, an email address that the results can be sent to. The WCS model does not generally support such a process.
  2. The other thing my users (academics and AIRNow) need is a "changed since date" argument. That is, they want to query AQS for 1-hour ozone data for May 01, 2008, that has changed since my last query run on June 01, 2008. The WCS model does not include this (one possible form of such an argument is sketched after this list).
  3. In the WCS, what is a coverage, and how can we best make use of it? For example, is a coverage a single pollutant/parameter? Are ozone and wind speed two coverages requiring two separate requests? Is the 1-hour ozone one coverage and the 8-hour ozone another? Can a coverage have data for 20 years in it? Can it have all of the PM2.5 species or all of the criteria pollutants? I'm trying to understand what the "coverage" is and how many we'll have in AQS - it could be millions and millions. The definition of coverage seems pretty loose in the standard, so this may be up to us to define. We'd prefer a small number of coverages that can be subsetted (for "output coverages"), but we would have to extend the domain of KVPs to include things like parameter and duration (maybe 'averaging time' is more general to other data sources) to make this happen - or is this handled via the field variables? What would you recommend for coverages?
    1. A coverage can be encoded as a single pollutant/parameter. Each version of the data (1-hr, 8-hr, ...) needs to be a different coverage. The query allows data selection within a geographic bounding box and an arbitrary time range. WCS coverages for the criteria pollutants (e.g., those given in the zip files on AMTIC) would make sense.
  4. All WCS requests have to use a bounding box - a state, county, or specific station cannot be specified (I see that Rudy has asked the WCS group to make a modification!).
  5. Parameter names. I understand the naming standard/process that Rudy took us through during one of the calls, but we've got 1,500 parameters, 5-10 averaging times, many, many methods, etc., that I don't want to have to redescribe. The community is familiar with what's in AQS - can we use those parameter codes or names?
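
To ground items 2 and 3 above: vendor-specific KVP parameters are one way such needs are sometimes handled, but nothing in the sketch below is part of the WCS standard. The CHANGED_SINCE, PARAMETER, and DURATION names, and the endpoint, are hypothetical placeholders for discussion.

# Hypothetical extension of a WCS KVP request with vendor-specific parameters.
# CHANGED_SINCE, PARAMETER, and DURATION are NOT part of the WCS standard;
# they are placeholders for the AQS-specific needs discussed above.
from urllib.parse import urlencode

params = {
    "SERVICE": "WCS",
    "VERSION": "1.1.0",
    "REQUEST": "GetCoverage",
    "IDENTIFIER": "aqs_criteria",  # hypothetical broad coverage to be subsetted
    "BOUNDINGBOX": "-125,24,-66,50,urn:ogc:def:crs:EPSG::4326",
    "TIMESEQUENCE": "2008-05-01T00:00:00Z/2008-05-01T23:00:00Z",
    "FORMAT": "text/csv",
    # Hypothetical vendor-specific extensions:
    "PARAMETER": "44201",  # e.g., the AQS parameter code for ozone
    "DURATION": "1-hour",  # averaging time
    "CHANGED_SINCE": "2008-06-01T00:00:00Z",
}

print("http://example.gov/cgi-bin/aqs-wcs?" + urlencode(params))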

WCS Tools and Resources

What other materials on interoperability should be collected?

Need a WCS tutorial we could use/modify?

Current State of Data Flow and Interoperability between the 'Core' Data Systems

  • AQS and AIRNow need to better communicate about what we want to share; then a web service can be added.

The following EPA-affiliated systems currently provide OGC-WCS for multiple kinds of data:

* NASA MODIS mod04/6/7 (AOD, COT, Ozone, etc.)
  forwarded from lpweb.nascom.nasa.gov/cgi-bin/modisserver
* NASA CALIPSO LIDAR (Backscatter)
* NESDIS-GOES_Biomass-Burning (CO, PM25, etc.)
* EPA CMAQ (Met & AQ)
* EPA AIRNow (GMT-hourly Ozone, PM25)
  using Datafed, then adding capabilities such as regridding to CMAQ
* EPA AQS Datamart (GMT-hourly Ozone, PM25 plus average, maxes)
* UVNet (irradiance)
Note rsigserver is also an OGC-WMS serving images and animations (PNG, MPEG, KMZ) to EGS.

Note Datafed also has OGC-WMS.

  • EPA AIRNow (via above Datafed & RSIG)
  • EPA AQS (via above RSIG)
  • Louis Sweeny expressed reservations about the extensibility and compatibility of WCS as a general-purpose data transport protocol, since it was originally designed for GIS coverages, but agreed that it's what we have, so it's the place to start.
  • Keyhole Markup Language (KML) was adopted as an open standard for format, not for interoperability.
  • The files put out in KML format are tied to the Google Earth service.
  • USGS is automatically updating earthquake data.

A related question involves the future of the National Environmental Information Exchange Network (NEIEN): what is the next generation for this network? Also, it was noted that the Exchange Network Leadership Council (ENLC) has plans for an exchange network. Chris Clark (OEI) can help with technical issues on NEIEN and on web services. It was suggested that Nick send Steve a note about what is needed, and Steve will see that the note is forwarded to Chris. In addition, Linda Travers (OEI) and Chet Wayland (OAQPS) could be interested in the future of interoperability via the ENLC. Their input on resource issues for the exchange of data should probably be sought. How can such a program be developed with limited resources? A more “meaty” proposal could help move this activity forward in OEI; for example, OAQPS could be put forward as an example user.

Possible Interop tests/documentation among the core network nodes(?)

A quick implementation to demonstrate the concept is important to success. Also, having a client that is easy to understand and use is important to show the value of common interoperability work. A spreadsheet with a macro, or a Google Earth implementation, was considered.

Managers are concerned with “see / feel / touch”. We need something quick to show managers, an indication of how it works, and identification of what the benefits are. How do you make it tangible? Put it on a spreadsheet and tie it up for the broader community. Target Google Earth (a minimal KML sketch follows the list below). Think in service-oriented terms. Identify a client.

The following are desirable:

  • a table of data;
  • a list of services;
  • questions for EPA managers on types of data transfer;
  • a demonstration or something easy to visualize.
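
As one way to make the demonstration tangible in Google Earth, the sketch below wraps a single station value in a KML placemark using only the Python standard library; the station, coordinates, and value are made up for illustration.

# Minimal KML placemark for a single station observation (illustrative values).
import xml.etree.ElementTree as ET

KML = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML)

kml = ET.Element(f"{{{KML}}}kml")
doc = ET.SubElement(kml, f"{{{KML}}}Document")

placemark = ET.SubElement(doc, f"{{{KML}}}Placemark")
ET.SubElement(placemark, f"{{{KML}}}name").text = "Example station: ozone 0.041 ppm"
ET.SubElement(placemark, f"{{{KML}}}description").text = (
    "1-hour ozone, 2008-05-01 13:00 UTC (illustrative value)"
)
point = ET.SubElement(placemark, f"{{{KML}}}Point")
# KML coordinates are longitude,latitude[,altitude]
ET.SubElement(point, f"{{{KML}}}coordinates").text = "-86.815,33.553,0"

ET.ElementTree(kml).write("example_station.kml", xml_declaration=True, encoding="UTF-8")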

There are several data systems affiliated with EPA that could be made interoperable using the OGC WCS standard protocol.

  • EIS

AQS and AIRNow are going to pilot the WCS interface.

How about moving RSIG's OGC-WCS airnowserver to a public SonomaTech computer to remove the 24-hour delay in accessing files from Datafed's AIRNow WCS?

How about, in the long term, having all of the data providers for small site data send their data into the AQS Datamart (on an hourly or daily basis), so that their data could become OGC-WCS-accessible via the existing rsigserver (which already has access to the AQS Datamart database and could likely be quickly modified to handle this additional data)?


Following the group discussion at the March 12 telecon, it was recommended that a subcommittee be formed on interoperability of data systems to address the diversity of interoperable data standards and to make recommendations. Several volunteers agreed to participate, including David McCabe, Steve Young, Nick Mangus, Tim Dye, and Rudy Husar. The interested data systems should monitor the activities of the interoperability group. The initial activities of the group should include:

  1. Identify interoperability standards and methods;
  2. Test and apply these standards to several EPA data systems;
  3. Apply GEO principles and architecture, and make use of ESIP venues and community.