Example SQL for Stations
Back to WCS Access to netCDF Files
This is an example on how to serve Station-Point data through WCS data access protocol. The basis of the use case is the [http:AIRNow.gov AIRNow] dataset that represents hourly near-real-time surface observations of PM10, PM2.5 and ozone over the US. This example use may be applicable to all datasets where:
- Locations: the monitoring is performed a fixed geographic points, i.e. monitoring stations
- Times: the sampling is over a fixed time range (e.g. hour or day) and fixed periodicity (every six hour or every third day)
- Parameters: the number of observed Earth Observation parameters is finite and known
The Locations, Times and Parameters constitute the dimensions of the Station-Point data space.
Airnow data source
Daily PM 2.5, PM 10 and Ozone data can be downloaded from date encoded folders. The data and location tables are in daily text files.
- Folder for 2011-07-05: http://www.epa.gov/airnow/2011/20110705/
- PM 10 observations: http://www.epa.gov/airnow/2011/20110705/pm10_070511.obs
- PM 2.5 observations: http://www.epa.gov/airnow/2011/20110705/pmfine_070511.obs
- Ozone observations: http://www.epa.gov/airnow/2011/20110705/super_070511.obs
- Active monitoring sites: http://www.epa.gov/airnow/2011/20110705/active-sites.dat
These constitute the raw input data in their native formats. These are then transferred to a SQL database. The schema to the database is totally up to the implementers. The WCS server uses specific, well defined database VIEWS that are created for WCS. See VIEW descriptions below.
View in CORE data catalog and datafed browser
The Design of the SQL Database
The most common and flexible method of storing station time series point data is an SQL database. The data collector designs the schema, which captures all the measured data and metadata. This very often requires a database with lots of tables, there is no one-size-fits-all design around because of differing requirements.
Fortunately, relational databases were invented just to solve this problem:
Physical Data Tables and Virtual Data Views
The physical database on the left is the real data storage, which contains multiple tables and typical relationships of a fully normalized schema.
On the right is the virtual WCS data access view, a flat, simple data source to select data from. The WCS does not have to understand the physical structure at all. The real configuration of the WCS system is in the making of these SQL views.
- Physical database:
- The center of this database is the location table.
- Each location has a unique loc_code as the key column.
- Each measure parameter has its own data table
- loc_code for the location of the station
- datetime for the observation time
- data columns value and quality flag.
- Virtual View
- loc_code, lat and lon is selected from the location table
- datetime, pm10 and pm10 quality flag is selected from the pm10 data table
- loc_code is used to join location to the data
- the result is a flat view with six columns
The SQL view definition:
CREATE VIEW pm10_data AS SELECT location.loc_code, lat, lon, datetime, pm10, pm10_qf FROM location INNER JOIN pm10 ON location.loc_code = pm10.loc_code
Example query to above SQL view, getting pm10 data from Continental US at 2010-01-17T09:00:00.
The WCS point data is published using a fairly popular design philosophy called convention over configuration. Rather than allowing any name, longitude or längengrad, and then mark it to be the longitude column somewhere else, the field name is fixed to lon. On the other hand the data and metadata column names can be anything.
Location View
The Location view needs at least 3 columns: loc_code, lat and lon. These columns are joined with the data table in the WCS data queries.
The standard field names are loc_code, loc_name (optional), lat, lon, elev (optional). It is important to use these names, since client software, like datafed browser, expects them. Any other fields can be added. The system will just copy the additional data to output.
There are two reasons to use an SQL view also for the location table.
- The location data may be distributed over several tables. In this case, the view can hide the SQL joins.
- The location table most likely has different names for the columns. In this case, the view may do nothing else but rename SiteCode to loc_code and latitude to lat.
Once the SQL view is created, it can be published using Web Feature Service, WFS. Web Coverage Service, WCS, was originally designed to serve gridded data, and in the DescribeCoverage response there is no convenient way to encode the location dimension. WFS, on the other hand, was designed to serve static geographical features, and matches well for the job.
Example WFS request to get the WFS AIRNOW location table from the location SQL view.
The WFS standard is defined in WFS 1.0.0.
Notes on the Physical Database Schema Possibilities
There are numerous ways to organize the physical data tables, and all of these ways have pros and cons. The AIRNOW database is organized by creating a physical data table for each parameter separately. This arrangement has a some advantages: fastest to query single parameters and no need for NULLs
Other possibilies are:
Big Wide table: One big table with columns loc_code datetime pmfine pm10 super. In this case, querying multiple parameters at the same time requires no joins. Unfortunately, missing data must be expressed with NULLs.
Long and skinny table: loc_code datetime param_code param_value. In this case each row contains two data columns: the parameter code, telling what the measurement actually is, and the parameter value. In this case new parameter codes can be added any time without changing the database schema.
There is no universally best solution, pros and conses must be weighted for your case.
The main point is Physical - Logical Separation! Whatever is your physical schema, the flat views are used for data access. The view system allows you to change the physical schema completely without changing the WCS configuration.
Client-side view of AIRNow WCS
Map Query for large area and single time instance: Identifier=AIRNOW RangeSubset=pmfine TimeSequence=2011-07-01T18:00:00 BoundingBox=-90,35,-70,45] . The actual WCS getCoverage call is:
Time Series Query for a time range and a single location: Identifier=AIRNOW RangeSubset=pmfine[location[420010001]] TimeSequence=2005-06-01/2011-09-01/PT1H
The syntax RangeSubset=pmfine[location[420010001]] is standard as documented in WCS 1.1.2 Standards. It specifies the pmfine parameter, filters by dimension location selecting loc_code = 420010001.
TimeSequence=2005-06-01/2011-09-01/PT1H has time_min/time_max/periodicity. PT1H is hourly, P1D is daily.
AIRNow registered in GEO Air Quality Community Catalog
Once the AIRNow WCS service is available as a tested and functioning web service, it is ready to be published in a service catalog(s) where the potential clients can find it and access (bind to) it. In other words, it is ready to be included in a network following the publish-find-bind triad of Service Oriented Architecture.
Unfortunately, a general registry and catalog of OGC W*S services does not exist. The task of identifying and organizing the available WCS data access services then falls on the communities within specific domains. For the Air Quality domain, a subset of the available interoperable services are gathered in the GEO Air Quality Community Catalog, which is harvested by the GEO Clearinghouse. The service can be registered in multiple catalogs.
View AIRNow in in the AQ Community Catalog.
and datafed browser
Configuring the WFS Service and WCS Service for point data
In the CF-NetCDF, the CF metadata is enough to do all the configuration. The configuration contains:
With SQL views in place, you only need to tell the WFS and WCS servers the view names. In the AIRNOW case, WFS configuration for locations.
view_alias : location_with_statistics columns : loc_code, loc_name, lat, lon view_filter : param_abbr='pm10' and data_count > 0
Since the location_with_statistics contains all the parameter statistics, the view must be filtered to include pm10 only.
The data configuration needs the view and some basic metadata. Configuring the PM 10:
pm10: title : Particulate Matter 10 datatype : float units : ug/cm^3 data_view : view_alias : pm10_data, columns : loc_code, lat, lon, datetime, pm10, pm10_qf
External Examples
TODO: separate page for CIRA, more infor about EBAS