Subcommittee on Interoperability
- 2008-07-02: Subcommittee on Interoperability Telecon Minutes
- 2008-07-02: RHusar: Data Reuse through Federation ppt slides
- 2008-06-04: Subcommittee on Interoperability Telecon Minutes
- 2008-05-21: Subcommittee on Interoperability Telecon Minutes
- 2008-05-07: Subcommittee on Interoperability Telecon Minutes
Charge for Subcommittee:
Decide on a standard that we can use to build a connection to flow data and then demonstrate that ability. We can then improve this ability by incrementally expanding the scope of what is transferred.
We should be compliant with OGC (the Open Geospatial Consortium) and GEO (Group on Earth Observations) standards; we should also coordinate to ensure we fit into future plans for the EN (Environmental Information Exchange Network).
The initial focus will be on air quality data, but we also need to consider metadata and standard names (for things like pollutants/parameters, units of measure, time, and station/site identifiers). The WMO (World Meteorological Organization) and GEO may be guideposts for this.
In order to facilitate the work of the Interoperability Subcommittee, a wiki workspace was set up on the topic of Interoperability of Data systems. This workspace is on the ESIP wiki and will be used to accommodate inter-agency and inter-disciplinary participation.
We relied on a heuristic method: discussing standards that appear to be making universal inroads (e.g., into GEO), are supported by reputable organizations (W3C, OGC, OASIS), and have been widely accepted in the information technology and environmental monitoring communities. This led us to decide that the WCS (Web Coverage Service from OGC) would be the first service we attempt to pilot. There was general agreement that everyone seems happy with WCS and that it is a safe place to start (e.g., for units, nomenclature, etc.); however, it may cover only a subset of what needs to be considered.
We need to agree on:
- a messaging exchange scheme (perhaps SOAP or KVP or HTTP/REST?)
- a common definition of layer (that is, if a model has 15 layers and 100 parameters, must you get it all?)
- payload structures and formats.
Messaging Exchange Scheme
Using WCS, the call is a single string, so key-value pairs (KVP) are the only option.
The architectural issues of authentication (will we require it?) and query size limits (most systems will refuse a query if the returned package would be inordinately large) have not yet been considered.
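The single-string call can be sketched as follows. This is a minimal sketch in Python, assuming the WCS 1.0.0 KVP parameter names (SERVICE, VERSION, REQUEST, COVERAGE, CRS, BBOX, TIME, FORMAT) and a start/end form for TIME. The base URL, the coverage name, and the "CSV" format label are hypothetical placeholders; a real server advertises its own values. Gridded coverages would also need a size or resolution parameter, which is omitted here.

```python
from urllib.parse import urlencode


def build_getcoverage_url(base_url, coverage, bbox, time_range, fmt,
                          version="1.0.0"):
    """Assemble a WCS GetCoverage request as one KVP query string.

    bbox is (min_lon, min_lat, max_lon, max_lat); time_range is (start, end).
    """
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", version),
        ("REQUEST", "GetCoverage"),
        ("COVERAGE", coverage),
        ("CRS", "EPSG:4326"),
        ("BBOX", ",".join(str(v) for v in bbox)),
        ("TIME", "/".join(time_range)),  # assumed start/end form
        ("FORMAT", fmt),
    ]
    # Keep commas, slashes and colons readable instead of percent-encoding them.
    query = urlencode(params, safe=",/:")
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


if __name__ == "__main__":
    # Hypothetical endpoint and coverage name, for illustration only.
    print(build_getcoverage_url(
        "http://example.org/wcs",
        "ozone",
        (-125, 24, -66, 50),
        ("2008-05-01T00:00:00", "2008-05-01T23:00:00"),
        "CSV",
    ))
```

Everything the server needs travels in the URL, which is why the bounding box and time range are the main levers for subsetting.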
Common Definition of Layer
A layer is known as a coverage and describes one parameter of one dataset.
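To make the consequence of this definition concrete, the sketch below groups records by (dataset, parameter), so each distinct pair becomes one coverage. The record layout and the names CoverageKey and group_by_coverage are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class CoverageKey(NamedTuple):
    """One coverage = one parameter of one dataset (hypothetical layout)."""
    dataset: str
    parameter: str


def group_by_coverage(records):
    """Group record dicts (with 'dataset' and 'parameter' keys) by coverage."""
    coverages = {}
    for record in records:
        key = CoverageKey(record["dataset"], record["parameter"])
        coverages.setdefault(key, []).append(record)
    return coverages


if __name__ == "__main__":
    # Illustrative records only; the values are made up.
    records = [
        {"dataset": "AQS", "parameter": "ozone", "value": 0.041},
        {"dataset": "AQS", "parameter": "wind speed", "value": 3.2},
        {"dataset": "AQS", "parameter": "ozone", "value": 0.044},
    ]
    for key, rows in group_by_coverage(records).items():
        print(key.dataset, key.parameter, len(rows))
```

Under this reading, a model with 100 parameters exposes 100 coverages, and a client asks only for the ones it needs.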
Payload formats for sending Data
The list of payloads we want to consider is:
- NetCDF (from unidata) with CF metadata/conventions
- KML (or KMZ) (Keyhole Markup Language)
- CSV (comma separated values)
- EN compliant XML
Bolded options are seen as preferred payload formats for WCS. netCDF has more advantages, but CSV is simpler to use.
Each of these would require more definition before we can implement it, and we should probably pick one to begin with. For example, what would the CSV structure be: would there be minimum requirements for station or raster data?
What payloads get sent, etc? What is the “payload” if WCS is used? The KML format is widely used. CSV files can be used. XML is desired for the exchange network; file sizes are an issue, as is how to convert into Oracle. There is a need to engage Rudy Husar in this dialogue. David McCabe, also, has ideas that should be explored.
We will definitely need to exchange raw data (that is, sample measurements). We will probably need to exchange NAAQS averages (8- and 24-hour averages) and possibly daily summary information. Site and monitor description files are also something we will need to exchange.
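As an illustration of how an 8-hour average relates to the raw hourly measurements, here is a minimal sketch of a running mean over hourly values. It is a simplification, not the regulatory calculation: the rule that a window needs at least 6 valid hours is an assumption made for this example, and rounding and other completeness rules are not shown.

```python
def running_means(hourly_values, window=8, min_valid=6):
    """Mean of each run of `window` consecutive hourly values.

    Missing hours are given as None. A window with fewer than `min_valid`
    valid hours yields None (assumed completeness rule, for illustration).
    The result at index i covers hours i .. i + window - 1.
    """
    results = []
    for start in range(len(hourly_values) - window + 1):
        valid = [v for v in hourly_values[start:start + window]
                 if v is not None]
        if len(valid) >= min_valid:
            results.append(sum(valid) / len(valid))
        else:
            results.append(None)
    return results


if __name__ == "__main__":
    # Made-up hourly values, for illustration only.
    print(running_means([10, 20, 30, 40, 50, 60, 70, 80, 90]))
```

Whether averages like these are exchanged directly or recomputed by the receiver from raw data is one of the decisions the payload definition has to settle.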
A key remaining task is to identify which fields will be mandatory in all exchanges and which will be optional. The optional fields will likely be data elements that are only available from a subset of the participating systems, so they cannot be required of all. The mandatory fields will be a set of lowest-common-denominator fields that all systems carry and that are necessary to completely describe a measurement.
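Once the mandatory set is agreed, a receiving system could check each incoming CSV row against it. The sketch below shows one way to do that. The MANDATORY_FIELDS tuple is a placeholder only (the names are borrowed from the AQS field list on this page), since the actual set has not been decided, and the sample rows are made up.

```python
import csv
import io

# Placeholder set: the subcommittee has NOT yet agreed on the mandatory fields.
MANDATORY_FIELDS = (
    "Latitude Measure",
    "Longitude Measure",
    "Parameter Code",
    "Date GMT",
    "24 Hour GMT",
    "Measurement Value Standard",
    "Measure Unit Name Standard",
)


def missing_mandatory(row, mandatory=MANDATORY_FIELDS):
    """Return the mandatory field names that are absent or blank in a row."""
    return [name for name in mandatory if not (row.get(name) or "").strip()]


def check_csv(text):
    """Return {data row number: [missing fields]} for rows that fail."""
    problems = {}
    reader = csv.DictReader(io.StringIO(text))
    for number, row in enumerate(reader, start=1):
        missing = missing_mandatory(row)
        if missing:
            problems[number] = missing
    return problems


# Illustrative rows; "Measurement Uncertainty" is treated as optional here.
SAMPLE = (
    "Latitude Measure,Longitude Measure,Parameter Code,Date GMT,24 Hour GMT,"
    "Measurement Value Standard,Measure Unit Name Standard,"
    "Measurement Uncertainty\n"
    "35.9,-78.9,44201,2008-05-01,13:00,0.041,Parts per million,\n"
    "35.9,-78.9,44201,2008-05-01,14:00,,Parts per million,\n"
)

if __name__ == "__main__":
    print(check_csv(SAMPLE))
```

Optional fields may be blank or absent without causing a failure, which matches the idea that only some systems can supply them.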
Potential Fields for AQS/AirNow/VIEWS Web Service CSV File
Below is a list of the fields (and descriptions) typically requested with RAW DATA from AQS / Data Mart. The field names are based on the XML tags which are based on the EPA standard naming conventions, but they can be changed.
- Latitude Measure The monitoring site's angular distance north or south of the equator measured in decimal degrees.
- Longitude Measure The monitoring site's angular distance east of the prime meridian measured in decimal degrees.
- Horizontal Datum The Datum associated with the Latitude and Longitude measures.
- Horizontal Accuracy The reported accuracy (in meters) of the Latitude and Longitude measures.
- Tribal Code If the monitor is reported as being on tribal land, the BIA code of the tribe to which that land belongs.
- State Code The FIPS code of the state in which the monitor resides.
- County Code The FIPS code of the county in which the monitor resides.
- Site Number A unique number within the county identifying the site.
- Parameter Code The code corresponding to the parameter measured by the monitor.
- POC This is the “Parameter Occurrence Code” used to distinguish different instruments that measure the same parameter at the same site.
- AQS Parameter Description The name assigned in AQS to the parameter measured by the monitor. Parameters may be pollutants or non-pollutants (this term has the same meaning as “substance”).
- Date Local The calendar date in Local Standard Time at the monitor for the sample.
- 24 Hour Local The time of day on a 24-hour clock in Local Standard Time for the sample.
- Date GMT The calendar date in Greenwich Mean Time at the monitor for the sample.
- 24 Hour GMT The time of day on a 24-hour clock in Greenwich Mean Time for the sample.
- Year GMT The calendar year of the sample in Greenwich Mean Time.
- Day In Year GMT The sequential day in the year of the sample in Greenwich Mean Time.
- Measurement Value Standard The measured sample value in standard units of measure for the parameter.
- Measure Unit Name Standard The standard units of measure for the parameter.
- Last Update Date The date and time (EST) of the last time this data was updated.
- Duration Description The length of time that air passes through the monitoring device before it is analyzed (measured). So, it represents an averaging period in the atmosphere (for example, a 24-hour sample duration draws ambient air over a collection filter for 24 straight hours). For continuous monitors, it can represent an averaging time of many samples (for example, a 1-hour value may be the average of four one-minute samples collected during each quarter of the hour).
- Frequency Description How often the monitor takes a sample. For hourly data (Duration = 1 hour), this field is null and means the frequency is continuous (that is, also 1 hour). Other typical values are daily, every third day, etc.
- Minimum Detectable Limit The minimum sample concentration detectable for the monitor and method. NOTE: IF SAMPLES ARE REPORTED BELOW THIS LEVEL, THEY MAY BE REPLACED WITH ½ THIS LIMIT.
- Measurement Uncertainty The total uncertainty associated with a reported measurement as indicated by the reporting agency.
- Qualifier Description Sample values may have qualifiers that indicate why they are missing or that they are out of the ordinary. Types of qualifiers are: null data, exceptional event, natural events, and quality assurance.
- Event Data Flag An indication of whether the sample value has been flagged as associated with an exceptional (out of the ordinary) air pollution event by the submitter.
- Certification Indicator An indication of the certification status of the data. The regulations require that submitting state, local, and tribal agencies certify criteria pollutant data by June of the year after it was submitted.
- Method Type An indication of whether the method used to collect the data is a federal reference method (FRM), equivalent to a federal reference method, an approved regional method, or none of the above (non-federal reference method).
- Method Description The text that describes the process and/or tools that manage storage, disposal, treatment, and other handling protocols designed for and/or used in taking the sample.
- State Name The name of the state where the measurement was taken.
- County Name The name of the county where the measurement was taken.
- MSA Name The name of the metropolitan statistical area where the measurement was taken. (In AQS, this will be replaced by CBSA soon.)
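Several of the fields above are derived from others. As a minimal sketch, the code below derives the four GMT fields from the local standard date and time plus a UTC offset, and applies the half-the-limit substitution mentioned under Minimum Detectable Limit. The date and time string formats, the offset argument, and the function names are assumptions made for this example; they are not AQS definitions.

```python
from datetime import datetime, timedelta


def gmt_fields(date_local, hour_local, utc_offset_hours):
    """Derive the GMT fields from Local Standard Time.

    date_local is 'YYYY-MM-DD' and hour_local is 'HH:MM' (assumed formats).
    utc_offset_hours is the standard-time offset, e.g. -5 for Eastern.
    """
    local = datetime.strptime(f"{date_local} {hour_local}", "%Y-%m-%d %H:%M")
    gmt = local - timedelta(hours=utc_offset_hours)
    return {
        "Date GMT": gmt.strftime("%Y-%m-%d"),
        "24 Hour GMT": gmt.strftime("%H:%M"),
        "Year GMT": gmt.year,
        "Day In Year GMT": gmt.timetuple().tm_yday,
    }


def apply_mdl(value, mdl):
    """Replace a value below the minimum detectable limit with half the limit."""
    return mdl / 2 if value < mdl else value


if __name__ == "__main__":
    # 8 p.m. Eastern Standard Time on New Year's Eve falls on the next GMT day.
    print(gmt_fields("2008-12-31", "20:00", -5))
    print(apply_mdl(1.5, 2.0))
```

The year-end case shows why Year GMT and Day In Year GMT cannot simply be copied from the local date.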
Classification of Air Quality Data
The air quality data that is to be exchanged can also be classified in many ways.
There are measurements, aggregates (daily summaries, MSA summaries, etc.), events, method descriptions, etc.
Measurements can be broad in space: on a 3-D model grid or 2-D satellite field of view (raster data), or more limited in space to a path (lidar or mobile monitor) or point (stationary monitor).
Regarding time, measurements can also be a continuous (6 second, 5 minute, or 1 hour) series or discrete/instantaneous (including aggregates).
There are two types of metadata: technical (like a grid size, file creation date, etc.) and business (like data source and data quality indicators, model run characteristics, or descriptions of data and how it was manipulated).
We briefly discussed a separate kind of metadata (“operational”?) to notify downstream users that something upstream has changed: the CAP (Common Alerting Protocol) from OASIS, which GEO is investigating. Getting news about critical data events was something that participants at the Data Summit thought was important. Atom and RSS are also possibilities for this.
Another type of metadata discussed was that related to “discovery” of services. That is, for the ‘system of systems’ in the value chain, what data is available, from where, and how.
Outstanding Issues with interop. using WCS
In exploring the idea of using WCS for the exchange of air quality information, several practical issues have arisen. These will need to be worked out prior to any development work.
- For most of my academic users, we need an 'asynchronous' request. They would prefer to submit a request along with an email address that the results can be sent to. The WCS model does not generally support such a process.
- The other thing my users (academics and AirNow) need is a "changed since date" argument. That is, they want to query AQS for 1-hour ozone data for May 01, 2008, that has changed since my last query run on June 01, 2008. The WCS model does not include this.
- In the WCS, what is a coverage and how can we best make use of that? For example, is a coverage a single pollutant/parameter? Are ozone and wind speed two coverages requiring two separate requests? Is the 1-hour ozone one coverage and the 8-hour ozone another? Can a coverage have data for 20 years in it? Can it have all of the PM2.5 species or all of the criteria pollutants? I'm trying to understand what the "coverage" is and how many we'll have in AQS - could be millions and millions. The definition of coverage seems pretty loose in the standard, so this may be up to us to define. We'd prefer a small number of coverages that can be subsetted (for "output coverages"), but we would have to extend the domain of KVPs to include things like parameter and duration (maybe 'averaging time' is more general to other data sources) to make this happen. Or is this handled via the field variables? What would you recommend for coverages?
- A coverage can be encoded as a single pollutant parameter. Each version of the data (1-hr, 8-hr...) needs to be a different coverage. The query allows data selection in the geographic bounding box and arbitrary time range. WCS coverages for the criteria pollutants (e.g. those given in the zip files on AMTIC) would make sense.
- All WCS requests have to use a bounding box - a state or county or specific station cannot be specified (I see that Rudy has asked the WCS to make a mod!).
- Parameter names. I understand the naming standard/process that Rudy took us through during one of the calls, but we've got 1500 parameters, 5-10 averaging times, many many methods, etc. that I don't want to have to redescribe. The community is familiar with what's in AQS - can we use those parameter codes or names?
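Until the "changed since date" need is settled on the service side, the same effect can be approximated on the client by filtering returned records on their Last Update Date. The sketch below assumes each record is a dict carrying that field as an ISO-style string; the record layout, the timestamp format, and the function name are hypothetical.

```python
from datetime import datetime


def changed_since(records, since):
    """Keep records whose 'Last Update Date' is later than `since`.

    Both values are ISO-style strings such as '2008-06-01 00:00:00'
    (assumed format).
    """
    cutoff = datetime.fromisoformat(since)
    return [
        record for record in records
        if datetime.fromisoformat(record["Last Update Date"]) > cutoff
    ]


if __name__ == "__main__":
    # Made-up records for 1-hour ozone on May 01, 2008.
    records = [
        {"Date GMT": "2008-05-01", "24 Hour GMT": "13:00",
         "Last Update Date": "2008-05-20 08:00:00"},
        {"Date GMT": "2008-05-01", "24 Hour GMT": "14:00",
         "Last Update Date": "2008-06-15 09:30:00"},
    ]
    # Only the record updated after the June 01 query run is returned.
    print(changed_since(records, "2008-06-01 00:00:00"))
```

This still transfers every record for the period, so it does not reduce load on the server; it only shows what the requested argument would select.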
WCS Tools and Resources
- Screencast on DataFed WCS (5 min WCS, 4 min DataFed app)
- DataFed Protocol-based Data Access Paper | PPT.
- To see WCS live, select a dataset from the DataFed catalog and then click on the WCS button in the second column, say on the AirNow tabular data WCS link. In the Format field select CSV Table. Click on GetCoverage and a CSV table is returned. Your browser may invoke Excel to show the table.
- Interoperability Resources
What other materials on interoperability should be collected?
Need a WCS tutorial we could use/modify?
Current State of Data Flow and Interoperability between the 'Core' Data Systems
- AQS and AIRNOW need to better communicate on what we want to share; then a web service can be added.
The following EPA-affiliated systems currently provide OGC-WCS for multiple kinds of data:
- RSIG http://badger.epa.gov/rsig/rsigserver serves the following data:
* NASA MODIS mod04/6/7 (AOD, COT, Ozone, etc.) forwarded from lpweb.nascom.nasa.gov/cgi-bin/modisserver
* NASA CALIPSO LIDAR (Backscatter)
* NESDIS-GOES_Biomass-Burning (CO, PM25, etc.)
* EPA CMAQ (Met & AQ)
* EPA AIRNow (GMT-hourly Ozone, PM25) using datafed then adding capabilities such as regridding to CMAQ
* EPA AQS Datamart (GMT-hourly Ozone, PM25 plus average, maxes)
* UVNet (irradiance)
Note rsigserver is also an OGC-WMS serving images and animations (PNG, MPEG, KMZ) to EGS.
- Datafed http://webapps.datafed.net/ogc_EPA.wsfl serves many kinds of data including AIRNow (on a 24-hour delay).
Note Datafed also has OGC-WMS.
- EPA AIRNow (via above Datafed & RSIG)
- EPA AQS (via above RSIG)
- Louis Sweeny expressed reservations about the extensibility and compatibility of WCS as a general purpose data transport protocol, since it was originally designed for GIS coverages, but agreed that it's what we have, so it's the place to start.
- Keyhole Markup Language (KML) was adopted as an open standard for format, not for interoperability.
- The files put out in KML format are related to service from Google Earth.
- USGS is automatically updating earthquake data.
A related question involves the future of the National Environmental Information Exchange Network (NEIEN): what is the next generation of this network? It was also noted that the Exchange Network Leadership Council (ENLC) has plans for an exchange network. Chris Clark (OEI) can help with tech issues on NEIEN and on web services. It was suggested that Nick send Steve a note about what is needed, and Steve will see that the note is forwarded to Chris. In addition, Linda Travers (OEI) and Chet Wayland (OAQPS) could be interested in the future of interoperability via ENLC. Their input on resource issues for the exchange of data should probably be sought. How can such a program be developed with limited resources? A more “meaty” proposal could help move this activity forward in OEI; for example, OAQPS could be put forward as an example user.
SOS as an alternative to WCS
Given the issues identified above related to using a WCS to transfer point data, the GEO standards were reviewed again to see if there is a more appropriate alternative.
Looking at the GEOSS Registries web page, there is the OGC Sensor Observation Service (SOS): SOS Standard.
It seems to have the following advantages:
- It is geared toward point observations from monitoring stations and not at a complete geographic coverage.
- It addresses some of the same issues that Tim Dye brought up about dictionaries of descriptions and how certain things must be domain specific - like the parameter names for air quality.
- The SOS GetObservation operation includes an ad-hoc query capability that allows a client to filter observations by time, space, sensor, and phenomena.
- It allows the requestor to think of the data in their own terms: "the consumer might approach this problem from either a sensor-centric or an observation-centric point of view."
- What I'm not sure of is how it would handle aggregated data - like our NAAQS or daily averages. Would this be a phenomenon, or a different class of "observation"?
- The preferred payload is still GML or some other "SWE" markup language, so we may have to bend that rule to allow for CSV; but at least the discovery and query model is a better fit for us than the WCS model.
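The kind of ad-hoc filtering described above (by time, space, sensor, and phenomenon) can be pictured with a small local stand-in. This sketch does not call an SOS server or use any SOS library; the Observation record and the filter_observations function are hypothetical names that only illustrate the query model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Observation:
    """Hypothetical point observation from one monitoring station."""
    sensor: str
    phenomenon: str
    time: datetime
    lat: float
    lon: float
    value: float


def filter_observations(observations, start=None, end=None, bbox=None,
                        sensor=None, phenomenon=None):
    """Select observations matching every filter that is not None.

    bbox is (min_lon, min_lat, max_lon, max_lat).
    """
    selected = []
    for obs in observations:
        if start is not None and obs.time < start:
            continue
        if end is not None and obs.time > end:
            continue
        if bbox is not None:
            min_lon, min_lat, max_lon, max_lat = bbox
            inside = (min_lon <= obs.lon <= max_lon
                      and min_lat <= obs.lat <= max_lat)
            if not inside:
                continue
        if sensor is not None and obs.sensor != sensor:
            continue
        if phenomenon is not None and obs.phenomenon != phenomenon:
            continue
        selected.append(obs)
    return selected


if __name__ == "__main__":
    # Made-up observations, for illustration only.
    data = [
        Observation("site-A", "ozone", datetime(2008, 5, 1, 13), 35.9, -78.9, 0.041),
        Observation("site-A", "wind speed", datetime(2008, 5, 1, 13), 35.9, -78.9, 3.2),
        Observation("site-B", "ozone", datetime(2008, 5, 2, 13), 40.0, -105.0, 0.052),
    ]
    # Sensor-centric view: everything from one station.
    print(filter_observations(data, sensor="site-A"))
    # Observation-centric view: one phenomenon within a time window.
    print(filter_observations(data, phenomenon="ozone",
                              end=datetime(2008, 5, 1, 23)))
```

Unlike the WCS bounding-box-only request, a station or parameter can be named directly here, which is the fit we are looking for with point data.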
Possible Interop tests/documentation among the core network nodes(?)
A quick implementation to demonstrate the concept is important to success. Also, having a client that is easy to understand and use is important to show the value of common interoperability work. A spreadsheet with a macro or a Google Earth implementation was considered.
Managers are concerned with “see / feel / touch”. We need something quick to show managers, an indication of how it works, and identification of what the benefits are. How do you make it tangible? Put it on a spreadsheet and tie-up for the broader community. Target Google Earth. Think in service-oriented terms. Identify a client.
The following are desirable:
- a table of data;
- a list of services;
- questions for EPA managers on types of data transfer;
- a demonstration or something easy to visualize.
There are several data systems affiliated with EPA, which could be made interoperable using the WCS OGC Standard Protocol.
AQS and Airnow are going to pilot the WCS interface.
How about moving RSIG's OGC-WCS airnowserver to a public SonomaTech computer to remove the 24-hour delay in accessing files from Datafed's AIRNow WCS?
How about, in the long-term, having all of the data providers for small site data send their data into the AQS Datamart (on an hourly or daily basis) so their data could become OGC-WCS-accessible via the existing rsigserver (which already has access to the AQS Datamart DB and could likely be quickly modified to handle this additional data)?
At the March 12 telecon, the group recommended that a subcommittee be formed on interoperability of data systems to address the diversity of interoperable data standards and to make recommendations. Several volunteers agreed to participate, including: David McCabe, Steve Young, Nick Mangus, Tim Dye, and Rudy Husar. The interested data systems should monitor the activities of the interoperability group. The initial activities of the group should include:
- Identify interoperability standards and methods,
- Test and apply these standards to several EPA data systems,
- Apply GEO principles and architecture and ESIP venues and community.