Example SQL for Stations

From Earth Science Information Partners (ESIP)

Back to WCS Access to netCDF Files

A Real life example how to serve station timeseries point data through WCS data access protocol.

AIRNOW

Daily PM 2.5, PM 10 and Ozone data can be downloaded from date encoded folders. The data and location tables are in text files.

Folder for 2011-07-05: http://www.epa.gov/airnow/2011/20110705/

PM 10 observations: http://www.epa.gov/airnow/2011/20110705/pm10_070511.obs

Active sites: http://www.epa.gov/airnow/2011/20110705/active-sites.dat

Below is the description of the public database interface.

The Design of the SQL Database

The most flexible method of storing station time series point data is an SQL database. Typically, the collector of the data designs a schema, that captures all the measured data, metadata of the measurements, station instrument information etc. This typically requires a database with lots of tables and relationships between them. There is no one-size-fits-all design around, because requirements for data collecting systems are different.

Fortunately, relational databases were invented just to solve this standardization problem:

SQL views.png

The complex database on the left is the real data storage, which is not an easy thing to understand.

On the right, is the requirement for the WCS: a flat, simple table listing place, time and data value.

The tool that brings these two together is an SQL View. It allows the administrator to create a view, a virtual table, that can be queried with a simple filter, taking the complexity off from the WCS point access system.


Location table

The Location table needs at least 3 columns: loc_code, lat and lon

The names are fixed. This goes with fairly popular design philosophy, convention over configuration. Rather than allowing any name, longitude or längengrad, and then mark it to be the longitude column somewhere else, the field name is fixed to lon. This is done at the SQL View level, renaming the fields.

Location table.png

The fixed field names are loc_code, loc_name (optional), lat, lon, elev (optional). It is important to use these names, since client software, like datafed browser, expects them.

Any other fields can be added. The system will just copy the additonal data to output.

Raw Location Table Example

Loc with id.png

This location table has no loc_code as a single field. The primary key contains three fields: state, county and site codes. The loc_id is a surrogate key, automatically generated number and carries no information by itself.

In this case, the location SQL view can create the loc_code in two ways. It can either concatenate the 3 numeric codes into a 9-character loc_code field, or use loc_id as a loc_code. If loc_id is used as a loc_code, it has the drawback that it must be maintained forever, where as state, county, and site ID's are given from outside.

Parameter Views

Similarly, parameters are published as views. The picture below shows a table with columns loc_code, lat, lon, datetime and pm10. It could also contain data quality metadata for pm10.

Simple data view.png

The field names are standardized, datetime is the standard name for observation time column. The parameter column names can be anything, as well as the metadata flag columns.

It is to be noted, that the parameter view is a result of a join, and therefore properly normalized. There is no denormalized table, that repeats the loc_code, lat and lon columns over and over again.

Fact - Data Table

There are numerous ways to organize the data tables. In the AIRNOW database the parameters are pmfine(PM 2.5), pm10 and super(Ozone)

  • Big Wide table: One big table with columns loc_code datetime pmfine pm10 super
    • Fast to query all the three data fields, no joining.
    • Needs NULLs for missing data
  • Long and skinny table: loc_code datetime param_code param_value
    • The param_code is either pmfine, pm10 or super, telling what data the param value is.
    • No need for NULLs
  • Own table for each parameter. The pm10 table contains fields loc_code datetime pm10
    • fastest to query single parameters
    • No need for NULLs
    • More difficult for system administrator.

There is no universally best solution, pros and conses must be weighted for your case.

Data Views in SQL

Physical - Logical Separation!

The concept of a view is of ultimate importance in database practice. It allows the system administrator to change the real data tables, without breaking existing software, because the views can be maintained separately. For example, you can change the physical table structure from wide table to long and skinny table and still maintain the pm10 view above.

Sometimes, the views have to do so much joining and filtering, that the performance drop to unacceptable level. In this case, it's advisable to create a snapshot view. Most promising method is to create a location table with integer based loc_id, and the data tables that use this loc_id as a foreign key. Querying the data then uses one inner join. This structure allows very efficient filtering by geographical region, since filters like lat between 24 and 52 first gets the locations, which then can be used to filter the data table efficiently by using indexes.

Configuring the WCS Service and WFS Service for point data

WCS Server Point.png

WFS for locations

WCS for data

using python dictionaries ??