Subcommittee on Interoperability



Charge for Subcommittee:

Decide on a standard that we can use to build a connection to flow data and then demonstrate that ability. We can then improve this ability by incrementally expanding the scope of what is transferred.

Guiding Principles:

We should be compliant with OGC (the Open Geospatial Consortium) and GEO (Group on Earth Observations) standards; we should also coordinate to ensure we fit into future plans for the EN (Environmental Information Exchange Network).

The initial focus will be on air quality data, but we also need to consider metadata and standard names (for things like pollutants/parameters, units of measure, time, and station/site identifiers). The WMO (World Meteorological Organization) and GEO may be guideposts for this.

In order to facilitate the work of the Interoperability Subcommittee, a wiki workspace was set up on the topic of Interoperability of Data Systems. This workspace is on the ESIP wiki and will be used to accommodate inter-agency and inter-disciplinary participation.

WCS

We relied on a heuristic method: discussing standards that appear to be making universal inroads (e.g., into GEO), are supported by reputable organizations (W3C, OGC, OASIS), and have been widely accepted in the information technology and environmental monitoring communities. This led us to decide that the WCS (Web Coverage Service from OGC) would be the first service we attempt to pilot. There was general agreement that everyone seems happy with WCS and that it is a safe place to start (e.g., for units, nomenclature, etc.); however, it may cover only a subset of what needs to be considered.

We need to agree on:

  1. a messaging exchange scheme (perhaps SOAP, KVP, or HTTP/REST?); a KVP example is sketched after this list
  2. a common definition of "layer" (that is, if a model has 15 layers and 100 parameters, must you retrieve all of them?)
  3. payload structures and formats.
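
For the first item, here is a minimal sketch of what a KVP-encoded WCS GetCoverage request could look like; the endpoint, coverage name, and parameter values are hypothetical placeholders, not an agreed interface.

  import urllib.parse
  import urllib.request

  # Hypothetical WCS endpoint and coverage name, for illustration only.
  base_url = "http://example.org/wcs"
  params = {
      "SERVICE": "WCS",
      "VERSION": "1.0.0",
      "REQUEST": "GetCoverage",
      "COVERAGE": "pm25_hourly",       # hypothetical coverage/layer name
      "CRS": "EPSG:4326",
      "BBOX": "-125,24,-66,50",        # lon/lat bounding box
      "TIME": "2008-05-01T00:00:00Z",
      "WIDTH": "100",
      "HEIGHT": "50",
      "FORMAT": "CSV",                 # one of the candidate payload formats
  }

  # KVP simply puts key=value pairs on the URL of an HTTP GET;
  # a SOAP binding would instead POST an XML-wrapped request.
  url = base_url + "?" + urllib.parse.urlencode(params)
  print(url)
  # data = urllib.request.urlopen(url).read()   # would fetch the coverage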

Messaging Exchange Scheme

Common Definition of Layer

Payloads - Format for sending Data

The list of payloads we want to consider is:

  1. NetCDF (from Unidata) with CF metadata/conventions
  2. KML (or KMZ) (Keyhole Markup Language)
  3. CSV (comma-separated values)
  4. EN-compliant XML

Each of these would require more definition before we can implement it, and we should probably pick one to begin with. For example, what would the CSV structure be – would there be minimum requirements for station or raster data?
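
As a discussion starter only, a minimal station-data CSV might carry the columns sketched below; the column names and values are made up rather than a proposed standard, and raster data would need a different layout.

  import csv
  import io

  # Hypothetical minimum columns for station (point-monitor) observations.
  columns = ["station_id", "latitude", "longitude", "datetime_utc",
             "parameter", "value", "units"]
  rows = [
      ["SITE001", 40.82, -73.90, "2008-05-01T13:00:00Z", "PM2.5", 11.3, "ug/m3"],
      ["SITE001", 40.82, -73.90, "2008-05-01T14:00:00Z", "PM2.5", 12.1, "ug/m3"],
  ]

  buf = io.StringIO()
  writer = csv.writer(buf)
  writer.writerow(columns)
  writer.writerows(rows)
  print(buf.getvalue())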

Which payloads get sent, and what is the “payload” if WCS is used? The KML format is widely used, and CSV files can also be used. XML is desired for the Exchange Network; file sizes are an issue, as is how to convert into Oracle. There is a need to engage Rudy Husar in this dialogue; David McCabe also has ideas that should be explored.

Classification of Air Quality Data

The air quality data that is to be exchanged can also be classified in many ways.

There are measurements, aggregates (daily summaries, MSA summaries, etc.), events, method descriptions, etc.

Measurements can be broad in space: on a 3-D model grid or 2-D satellite field of view (raster data), or more limited in space to a path (lidar or mobile monitor) or point (stationary monitor).

Regarding time, measurements can be either a continuous series (6-second, 5-minute, or 1-hour) or discrete/instantaneous (including aggregates).
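
To make this classification concrete, a small controlled vocabulary might look like the following sketch; the category names are illustrative only, not an agreed taxonomy.

  from enum import Enum

  # Illustrative categories only; the actual vocabulary still needs to be agreed.
  class SpatialExtent(Enum):
      GRID_3D = "3-D model grid (raster)"
      SATELLITE_FOV_2D = "2-D satellite field of view (raster)"
      PATH = "path (lidar or mobile monitor)"
      POINT = "point (stationary monitor)"

  class TemporalType(Enum):
      CONTINUOUS = "continuous series (e.g., 6 s, 5 min, 1 h)"
      DISCRETE = "discrete/instantaneous (including aggregates)"

  # Example: an hourly value from a fixed monitor.
  print(SpatialExtent.POINT, TemporalType.CONTINUOUS)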


Metadata

There are two types of metadata: technical (like grid size, file creation date, etc.) and business (like data source and data quality indicators, model run characteristics, or descriptions of the data and how they were manipulated).
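
A rough sketch of how the two kinds of metadata could travel alongside a dataset; the field names and values here are hypothetical.

  # Hypothetical metadata record separating technical from business metadata.
  dataset_metadata = {
      "technical": {
          "grid_size": [268, 259],
          "file_created": "2008-05-21T00:00:00Z",
          "format": "NetCDF (CF conventions)",
      },
      "business": {
          "data_source": "AIRNow",
          "data_quality": "preliminary, not yet validated",
          "model_run": "12 km forecast, 06Z cycle",
          "processing": "hourly values aggregated to daily maximum",
      },
  }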

We briefly discussed a separate kind of metadata (“operational”?) to notify downstream users that something upstream has changed: the CAP (Common Alerting Protocol) from OASIS, which GEO is investigating. Getting news about critical data events was something that participants at the Data Summit thought was important. Atom and RSS are also possibilities for this.

Another type of metadata discussed was that related to “discovery” of services. That is, for the ‘system of systems’ in the value chain, what data is available, from where, and how.

WCS Tools and Resources

Here is a screencast on DataFed WCS (5 min on WCS, 4 min on the DataFed app). Also, here is a paper on the topic and the related PPT. To see WCS live, select a dataset from the DataFed catalog and then click on the WCS button in the second column, say on the AirNow tabular data WCS link. In the Format field select CSV Table, click on GetCoverage, and a CSV table is returned; your browser may invoke Excel to show the table.
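
The same GetCoverage call can also be scripted rather than clicked through. A rough sketch follows, in which the URL merely stands in for whatever GetCoverage URL the DataFed WCS page generates.

  import csv
  import urllib.request

  # Placeholder URL; in practice, copy the GetCoverage URL produced by the DataFed WCS page.
  url = ("http://example.org/wcs?SERVICE=WCS&VERSION=1.0.0&REQUEST=GetCoverage"
         "&COVERAGE=airnow_pm25&CRS=EPSG:4326&BBOX=-125,24,-66,50"
         "&TIME=2008-05-01T13:00:00Z&WIDTH=100&HEIGHT=50&FORMAT=CSV")

  with urllib.request.urlopen(url) as resp:
      text = resp.read().decode("utf-8")

  # Inspect the returned CSV table in code instead of opening it in Excel.
  for row in csv.reader(text.splitlines()):
      print(row)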

What other materials on interoperability should be collected?

Need a WCS tutorial we could use/modify?

Current state of data flow and interoperability between the 'core' data systems

  • AQS and AIRNOW need to better communicate on what we want to share; then a web service can be added.
  • RSIG uses WCS, but Louis Sweeny has expressed reservations.
  • Keyhole Markup Language (KML) was adopted as an open standard for format, not for interoperability.
  • Files put out in KML format are tied to the Google Earth service.
  • USGS is automatically updating earthquake data.

A related question involves the future of the National Environmental Information Exchange Network (NEIEN): what is the next generation for this network? It was also noted that the Exchange Network Leadership Council (ENLC) has plans for an exchange network. Chris Clark (OEI) can help with technical issues on the NEIEN and on web services. It was suggested that Nick send Steve a note about what is needed, and Steve will see that the note is forwarded to Chris. In addition, Linda Travers (OEI) and Chet Wayland (OAQPS) could be interested in the future of interoperability via the ENLC; their input on resource issues for the exchange of data should probably be sought. How can such a program be developed with limited resources? A more “meaty” proposal could help move this activity forward in OEI; for example, OAQPS could be put forward as an example user.

There are several data systems affiliated with EPA that could be made interoperable using the OGC WCS standard protocol.

Possible Interop tests/documentation among the core network nodes(?)

A quick implementation to demonstrate the concept is important to success. Also, having a client that is easy to understand and use is important to show the value of common interoperability work. A spreadsheet with a macro or a Google Earth implementation was considered.

Managers are concerned with “see / feel / touch”. We need something quick to show managers, an indication of how it works, and identification of what the benefits are. How do you make it tangible? Put it on a spreadsheet and tie it up for the broader community. Target Google Earth. Think in service-oriented terms. Identify a client.
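
For the Google Earth option, the demonstration could be as small as writing one placemark per monitoring station. A minimal hand-rolled sketch follows; the station IDs, coordinates, and values are made up.

  # Minimal KML for a quick Google Earth demonstration; all values are made up.
  stations = [
      ("SITE001", -73.90, 40.82, "PM2.5 = 11.3 ug/m3"),
      ("SITE002", -87.65, 41.85, "PM2.5 = 9.7 ug/m3"),
  ]

  placemarks = "\n".join(
      "  <Placemark>\n"
      f"    <name>{sid}</name>\n"
      f"    <description>{desc}</description>\n"
      f"    <Point><coordinates>{lon},{lat},0</coordinates></Point>\n"
      "  </Placemark>"
      for sid, lon, lat, desc in stations
  )

  kml = (
      '<?xml version="1.0" encoding="UTF-8"?>\n'
      '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
      "<Document>\n" + placemarks + "\n</Document>\n</kml>\n"
  )

  with open("aq_demo.kml", "w") as f:   # open this file in Google Earth to view
      f.write(kml)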

In summary, the following are desirable:

  • a table of data;
  • a list of services;
  • questions for EPA managers on types of data transfer;
  • a demonstration or something easy to visualize.

Subcommittee Telecons



At the March 12 telecon, the group recommended that a subcommittee be formed on the interoperability of data systems to address the diversity of interoperable data standards and to make recommendations. Several volunteers agreed to participate, including David McCabe, Steve Young, Nick Mangus, Tim Dye, and Rudy Husar. The interested data systems should monitor the activities of the interoperability group. The initial activities of the group should include:

  1. Identify interoperability standards and methods;
  2. Test and apply these standards to several EPA data systems;
  3. Apply GEO principles and architecture, and make use of ESIP venues and community.