WCS Wrapper Configuration for Point Data

From Earth Science Information Partners (ESIP)
 
== Storing Point Data in a Relational Database ==
 
Point data is often stored in SQL databases. There is no standard schema, like the CF-1.0 convention for NetCDF files, so it is not possible to just connect and start serving. You have to create a configuration description.

One of the most powerful ideas in relational database design is the concept of a '''view'''. You don't need to change the existing data tables; creating a view that makes your DB look like the required one is enough.

== The Simplest Case: Configure with SQL Views ==
  
=== Location Table/View ===

The common requirement across different databases is a location table. The current implementation is based on time series from stationary locations.
  
    table/view location
    +----------+-------+---------+-------+
    | loc_code | lat   | lon     | elev  |
    +----------+-------+---------+-------+
    | KMOD     | 37.63 | -120.95 |  30.0 |
    | KSTL     | 38.75 |  -90.37 | 172.0 |
    | KUGN     | 42.42 |  -87.87 | 222.0 |
    | ...      |       |         |       |
    +----------+-------+---------+-------+
  
 
Here loc_code is the primary key and (lat, lon) is the location. Optional fields can be added. Your database may have a location table with a different name and different field names, but that does not matter. The CIRA VIEWS database has a location table, but it's called '''Site''' and it spells out the full word '''longitude'''. The datafed browser uses the standard names loc_code, loc_name, lat and lon for browsing; these names are needed for plug-and-play compatibility. In the CIRA VIEWS database, the view creation would be:

    create view location as
    select
        SiteCode as loc_code,
        Latitude as lat,
        Longitude as lon
    from Site
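The same aliasing can be exercised end to end with Python's built-in sqlite3 module. This is a self-contained sketch with dummy data standing in for the real CIRA database; it verifies that the view presents the standard column names to any client:

```python
import sqlite3

# In-memory database standing in for the provider's DB.
con = sqlite3.connect(':memory:')
con.executescript("""
    create table Site (SiteCode text, Latitude real, Longitude real);
    insert into Site values ('KMOD', 37.63, -120.95);
    insert into Site values ('KSTL', 38.75, -90.37);

    -- The view presents Site under the standard datafed names.
    create view location as
    select
        SiteCode as loc_code,
        Latitude as lat,
        Longitude as lon
    from Site;
""")

cur = con.execute('select loc_code, lat, lon from location order by loc_code')
names = [d[0] for d in cur.description]  # column names as the WCS sees them
rows = cur.fetchall()
print(names)  # ['loc_code', 'lat', 'lon']
print(rows)
```

The underlying Site table is untouched; only the view layer speaks the standard vocabulary.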
 
The primary key is loc_code, unique for each location.

Because WCS does not have a good place to describe a location table, we use WFS, the Web Feature Service, for the same purpose. [http://128.252.202.19:8080/CIRA?service=WFS&Version=1.0.0&Request=GetFeature&typename=VIEWS&outputFormat=text/csv Sample WFS call.]
  
=== Data Views ===

Each data variable needs a view. For example:

    create view TEMP_V as
    select
        location.loc_code,
        location.lat,
        location.lon,
        TEMP_base.datetime,
        TEMP_base.temp,
        TEMP_base.flag
    from location
    inner join TEMP_base on TEMP_base.loc_code = location.loc_code

Each parameter has its own data view that looks like:
  
    view TEMP_V
    +----------+-------+---------+------------+------+------+
    | loc_code | lat   | lon     | datetime   | temp | flag |
    +----------+-------+---------+------------+------+------+
    | KMOD     | 37.63 | -120.95 | 2009-06-01 | 87.8 | X    |
    | KMOD     | 37.63 | -120.95 | 2009-06-02 | 82.3 |      |
    | KSTL     | 38.75 |  -90.37 | 2009-06-01 | 78.6 |      |
    | ...      |       |         |            |      |      |
    +----------+-------+---------+------------+------+------+

    view DEWP_V
    +----------+-------+---------+------------+------+
    | loc_code | lat   | lon     | datetime   | dewp |
    +----------+-------+---------+------------+------+
    | KMOD     | 37.63 | -120.95 | 2009-06-01 | 51.4 |
    | KMOD     | 37.63 | -120.95 | 2009-06-02 | 51.4 |
    | KSTL     | 38.75 |  -90.37 | 2009-06-01 | 34.9 |
    | ...      |       |         |            |      |
    +----------+-------+---------+------------+------+
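A complete view like TEMP_V can then be queried with plain SQL. The following sqlite3 sketch builds the join view from the earlier example with dummy data and pulls the time series for one station, which is the shape of query the WCS generates:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
    create table location (loc_code text, lat real, lon real);
    insert into location values ('KMOD', 37.63, -120.95);
    insert into location values ('KSTL', 38.75, -90.37);

    create table TEMP_base (loc_code text, datetime text, temp real, flag text);
    insert into TEMP_base values ('KMOD', '2009-06-01', 87.8, 'X');
    insert into TEMP_base values ('KMOD', '2009-06-02', 82.3, '');
    insert into TEMP_base values ('KSTL', '2009-06-01', 78.6, '');

    -- The view from the Data Views section.
    create view TEMP_V as
    select location.loc_code, location.lat, location.lon,
           TEMP_base.datetime, TEMP_base.temp, TEMP_base.flag
    from location
    inner join TEMP_base on TEMP_base.loc_code = location.loc_code;
""")

# Time series for one station.
rows = con.execute(
    "select datetime, temp from TEMP_V where loc_code = ? order by datetime",
    ('KMOD',)).fetchall()
print(rows)  # [('2009-06-01', 87.8), ('2009-06-02', 82.3)]
```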
  
=== Example Configuration ===

All of the configuration is done using python dictionaries and lists. The syntax is simple. This is a list:

    ['loc_code', 'lat', 'lon']

and this is a dictionary:

    {'key1':'value1', 'key2':'value2'}

The test provider [http://128.252.202.19:8080/point point] is an example of how to configure this service to use a SQL database to serve point data.
The data for the demonstration is stored in sqlite, which is distributed with python by default. The project has the following files:

* [http://128.252.202.19:8080/static/point/pntdata.py pntdata.py]: This script creates the test database and fills it with dummy data.
* '''pntdata.db''': The sqlite database file created by pntdata.py.
* [http://128.252.202.19:8080/static/point/point_config.py point_config.py]:
** Declares the location table in the SQL database.
** Maps the coverages and fields to SQL tables.
* [http://128.252.202.19:8080/static/point/point_WCS.py point_WCS.py] is the custom WCS handler.
* [http://128.252.202.19:8080/static/point/point_WFS.py point_WFS.py] is the custom WFS handler that delivers the location table.
  
==== Location Table Configuration ====

From file [http://128.252.202.19:8080/static/point/point_config.py point_config.py].

In its simplest form, SQL views are used to create the required location table, so no aliasing is needed:

    location_info = {
        'location':{
            'service':'WFS',
            'version':'1.0.0',
            },
        }

These are the standard names that datafed uses:

* The dimension name is "location".
* No aliasing is needed, since the DB table/view and column names are standard.
* The view/table name in the DB is "location".
* The columns are "lat", "lon" and "loc_code"; loc_code is a text type, not an integer.
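Since location_info is an ordinary Python dictionary, the conventions above can be checked programmatically. A minimal sketch, assuming a hypothetical helper (check_location_info is not part of the datafed distribution):

```python
def check_location_info(info):
    """Sanity-check a location axis declaration.
    Hypothetical helper, not part of the datafed distribution."""
    axis = info.get('location')
    if axis is None:
        raise ValueError("the dimension name must be 'location'")
    if axis.get('service') != 'WFS':
        raise ValueError("the location table is served via WFS")
    return True

location_info = {'location': {'service': 'WFS', 'version': '1.0.0'}}
print(check_location_info(location_info))  # True
```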
  
In the CIRA/VIEWS case, the table and fields are aliased:

    VIEWS_location_info = {
        'location':{
            'service':'WFS',
            'version':'1.0.0',
            'table_alias':'Site',
            'columns':{
                'loc_code':{'column_alias':'SiteCode'},
                'lat':{'column_alias':'Latitude'},
                'lon':{'column_alias':'Longitude'},
                }
            },
        }

* The dimension name is still "location".
* The location table is called "Site".
* "SiteCode", "Latitude" and "Longitude" are aliased to "loc_code", "lat" and "lon".
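Under such a mapping, generating the select list is just a walk over the columns dictionary. A hypothetical sketch of the idea (the real datafed SQL generator is more involved):

```python
def select_list(axis):
    """Build 'Original as standard' fragments from a location axis mapping.
    Sketch only; the real datafed generator handles more cases."""
    parts = []
    for std_name, spec in axis['columns'].items():
        parts.append("%s as %s" % (spec['column_alias'], std_name))
    return "select %s from %s" % (', '.join(parts), axis['table_alias'])

VIEWS_location_info = {
    'location': {
        'service': 'WFS',
        'version': '1.0.0',
        'table_alias': 'Site',
        'columns': {
            'loc_code': {'column_alias': 'SiteCode'},
            'lat': {'column_alias': 'Latitude'},
            'lon': {'column_alias': 'Longitude'},
        },
    },
}

sql = select_list(VIEWS_location_info['location'])
print(sql)  # select SiteCode as loc_code, Latitude as lat, Longitude as lon from Site
```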
  
==== Data Table Configuration ====

From file [http://128.252.202.19:8080/static/point/point_config.py point_config.py].

Coverage information and its description:

    point_info = {
        'SURF_MET':
            {
                'Title':'Surface Meteorological Observations',
                'Abstract':'Dummy test data.',

Next come the covered area and time. The Time dimension is a true dimension here but, contrary to grid data, the X-Y dimensions for point data are not dimensions; they are attributes of the location dimension. The time dimension format is [http://en.wikipedia.org/wiki/ISO_8601 ISO 8601] (start-inclusive)/(end-inclusive)/periodicity: PT1H means a period of one hour, P1D would mean a period of one day.

                'axes':{
                    'X':(-180, 179.75),
                    'Y':(-90, 89.383),
                    'T':iso_time.parse('2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H'),
                    },

Then comes the description of the fields:

                'fields':{
                    'TEMP':{
                        'Title':'Temperature',
                        'datatype': 'float',
                        'units':'deg F',

The location table is a real dimension; Latitude and Longitude are attributes along the location axis, not dimensions themselves. So a typical point dataset with locations and regular time intervals is a 2-dimensional dataset. In this case the location table is shared, so we use the previously declared variable 'location_info'. If the location tables are parameter specific, they need to be specified individually.

                        'axes':location_info,

Next come the access instructions. This configuration uses 'complete_view', so the administrator has created the view that joins together the location table and the temperature data table. The SQL query will typically look like: select loc_code, lat, lon, datetime, temp, flag from TEMP_V where datetime = '2009-09-01' and (lat between 34 and 44) and (lon between -90 and -80). This is by far the easiest way to configure the WCS.

                        'complete_view':{
                            'view_alias':'TEMP_V',
                            'columns':['loc_code','lat','lon','datetime','temp', 'flag'],
                            },
                        },
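The expansion of such a time spec can be sketched with the standard library. This toy parser handles only the whole-hour and whole-day periodicities used above; the distribution's own iso_time module is the real implementation:

```python
from datetime import datetime, timedelta

def parse_period(spec):
    """Expand 'start/end/periodicity' for the simple PT1H / P1D cases only.
    Toy sketch; the real iso_time.parse is more general."""
    start_s, end_s, period = spec.split('/')
    start = datetime.strptime(start_s, '%Y-%m-%dT%H:%M:%S')
    end = datetime.strptime(end_s, '%Y-%m-%dT%H:%M:%S')
    step = {'PT1H': timedelta(hours=1), 'P1D': timedelta(days=1)}[period]
    times = []
    t = start
    while t <= end:  # both endpoints are inclusive
        times.append(t)
        t += step
    return times

times = parse_period('2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H')
print(len(times))  # 49 hourly steps: two full days, endpoints inclusive
```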
==== Configure with SQL table and name aliasing ====

It is not necessary to create a complete view. The datafed WCS can join the location and data tables itself, as long as it has the table and field names. The second field in [http://128.252.202.19:8080/static/point/point_config.py point_config.py] is configured using name aliasing. In this case, table_alias tells the service that the data table is named "DEWP_base" and its data field is named "dewp".

                    'DEWP':{
                        'Title':'Dewpoint',
                        'units':'deg F',
                        'axes':location_info,
                        'table_alias':'DEWP_base',
                        'data_columns':[
                            {'name':'dewp'},
                            ],
                        },
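Given a simple join by loc_code, the SQL processor can generate the join clause such as 'inner join DEWP_base on DEWP_base.loc_code = location.loc_code' on its own. A hypothetical sketch of that generation step (the actual datafed SQL builder handles aliases, filters and joins more generally):

```python
def data_query(field):
    """Sketch of the SQL the WCS could generate for a field configured
    with table_alias and data_columns. Hypothetical simplification."""
    table = field['table_alias']
    col = field['data_columns'][0]['name']
    return (
        "select location.loc_code, location.lat, location.lon, "
        "{t}.datetime, {t}.{c} "
        "from location "
        "inner join {t} on {t}.loc_code = location.loc_code"
    ).format(t=table, c=col)

dewp_field = {
    'table_alias': 'DEWP_base',
    'data_columns': [{'name': 'dewp'}],
}
sql = data_query(dewp_field)
print(sql)
```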
=== Query Examples ===

==== GetCapabilities ====

Returns the only coverage, ASOS.

http://128.252.202.19:8080/NCDC?service=WCS&version=1.1.2&Request=GetCapabilities

==== DescribeCoverage ====

Returns the dimensions and all the fields of ASOS:

http://128.252.202.19:8080/NCDC?Service=WCS&Version=1.1.2&Request=DescribeCoverage&identifiers=ASOS

==== GetCoverage ====

Timeseries for one location. RangeSubset both selects fields and filters them by dimensions.

RangeSubset='''BEXT'''<nowiki>[</nowiki>'''location'''<nowiki>[</nowiki>'''13935'''<nowiki>]]</nowiki> selects field '''BEXT''', filtering dimension '''location''' by loc_code='''13935'''.

The parameter '''store=true''' makes the response an XML document containing some meta information and a URL to the CSV result. You need to copy the URL from the XML envelope and paste it into the address bar to retrieve the data yourself.

<html><a href="http://128.252.202.19:8080/NCDC?service=WCS&version=1.1.2&Request=GetCoverage&format=text/csv&store=true&identifier=ASOS&RangeSubset=BEXT[location[13935]]&TimeSequence=2010-05-01T00:00:00Z/2010-06-20T00:00:00Z/PT1M">http://128.252.202.19:8080/NCDC?service=WCS&version=1.1.2&Request=GetCoverage&format=text/csv&store=true&identifier=ASOS&RangeSubset=BEXT[location[13935]]&TimeSequence=2010-05-01T00:00:00Z/2010-06-20T00:00:00Z/PT1M</a></html>

The same query with '''store=false''' returns both the XML envelope and the CSV file in the same request.

So far '''only Mozilla Firefox''' can open it automatically. Internet Explorer, Safari and Chrome are not recommended here.

<html><a href="http://128.252.202.19:8080/NCDC?service=WCS&version=1.1.2&Request=GetCoverage&format=text/csv&store=false&identifier=ASOS&RangeSubset=BEXT[location[13935]]&TimeSequence=2010-05-01T00:00:00Z/2010-06-20T00:00:00Z/PT1M">http://128.252.202.19:8080/NCDC?service=WCS&version=1.1.2&Request=GetCoverage&format=text/csv&store=false&identifier=ASOS&RangeSubset=BEXT[location[13935]]&TimeSequence=2010-05-01T00:00:00Z/2010-06-20T00:00:00Z/PT1M</a></html>

== CIRA / VIEWS: Populating the configuration from a database ==

Open the [https://sourceforge.net/projects/aq-ogc-services/files_beta/ download page] in another tab, then download and unzip '''ows-point-1.2.3.zip''' or later. Copy the '''web''' folder over '''/OWS/web'''.

The web/static/'''CIRA''' provider is a real production provider, serving data from http://views.cira.colostate.edu/web/ online. Eventually, this service will move to be hosted by them.

The data in CIRA/VIEWS is all in one table, so the configuration of WCS fields can be done by scanning that table and writing the information into the dictionary.

There is only one data table, AirFact3, that holds all the parameters, so each row must carry a parameter code. This makes the joining a little more complicated:

    select ....
    from Site
    inner join AirFact3 on AirFact3.SiteID = Site.SiteID
    inner join Parameter on AirFact3.ParamID = Parameter.ParamID

And we need to filter just non-aggregated valid data from certain programs:

    AirFact3.AggregationID = 1
    and Site.latitude is not null
    and Site.Longitude is not null
    and AirFact3.FactValue <> -999
    and AirFact3.ProgramID in (10001, 10005, 20002) -- ('INA', 'IMPPRE', 'ASPD')
    and Parameter.ParamCode = '%s'

This is explained in detail in the comments of the script [http://128.252.202.19:8080/static/CIRA/VIEWS_metadata.py VIEWS_metadata.py], a non-trivial script of about 250 lines. It counts the occurrences of the VIEWS parameters and writes a list of parameters with enough data into a local sqlite database.

The configuration script [http://128.252.202.19:8080/static/CIRA/CIRA_config.py CIRA_config.py] loads the metadata from that small local database.
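The idea behind the parameter scan can be sketched in a few lines: count each parameter's valid rows and keep only those with enough data. This sketch flattens the real schema (which joins Parameter via ParamID) into a single dummy table with an arbitrary threshold; VIEWS_metadata.py itself is far more thorough:

```python
import sqlite3

# Dummy single-table stand-in for AirFact3; the real schema stores
# ParamID and joins the Parameter table to get the code.
con = sqlite3.connect(':memory:')
con.executescript("""
    create table AirFact3 (ParamCode text, FactValue real);
    insert into AirFact3 values ('MF', 1.0);
    insert into AirFact3 values ('MF', 2.0);
    insert into AirFact3 values ('MF', 3.0);
    insert into AirFact3 values ('OC', 1.5);
""")

MIN_ROWS = 2  # arbitrary threshold for this sketch
rows = con.execute("""
    select ParamCode, count(*)
    from AirFact3
    where FactValue <> -999
    group by ParamCode
    having count(*) >= ?
""", (MIN_ROWS,)).fetchall()
print(rows)  # [('MF', 3)]
```

The real script writes the surviving parameter list into a small local sqlite database, which CIRA_config.py then loads at startup.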
  
== Some Different DB Schema types ==

In this documentation three different schemas are presented. Each of them relies on the location table. None of them is best for everything; they make different compromises for different reasons.

=== One Big Data Table ===

In this case, all the data is in the same table:

    +----------+------------+------+------+------+
    | loc_code | datetime   | TEMP | DEWP | VIS  |
    +----------+------------+------+------+------+
    | KMOD     | 2009-06-01 | 87.8 | 51.4 | 10   |
    | KMOD     | 2009-06-02 | 82.3 | 51.4 | NULL |
    | KSTL     | 2009-06-01 | 78.6 | 34.9 | 18   |
    | ...      |            |      |      |      |
    +----------+------------+------+------+------+

The foreign key to the location table is loc_code. The primary key is (loc_code, datetime).

'''Strengths:''' Simple; no joining when querying all the fields.

'''Downsides:''' Needs nulls for missing data; querying just one field is inefficient.

=== Long And Skinny Table ===

In this case, all the data is in the same table:

    +----------+------------+------+-------+
    | loc_code | datetime   | data | param |
    +----------+------------+------+-------+
    | KMOD     | 2009-06-01 | 87.8 | TEMP  |
    | KMOD     | 2009-06-02 | 82.3 | TEMP  |
    | KSTL     | 2009-06-01 | 78.6 | TEMP  |
    | KMOD     | 2009-06-01 | 51.4 | DEWP  |
    | KMOD     | 2009-06-02 | 51.4 | DEWP  |
    | KSTL     | 2009-06-01 | 34.9 | DEWP  |
    | KMOD     | 2009-06-01 | 10   | VIS   |
    | KMOD     | 2009-06-02 | 10   | VIS   |
    | KSTL     | 2009-06-01 | 18   | VIS   |
    | ...      |            |      |       |
    +----------+------------+------+-------+

'''Strengths:''' No nulls; easy to add fields.

'''Downsides:''' Querying requires extra filtering with a parameter index, slower than the others.

=== One Data Table For Each Param ===

Each parameter has its own data table.

    +----------+------------+------+
    | loc_code | datetime   | TEMP |
    +----------+------------+------+
    | KMOD     | 2009-06-01 | 87.8 |
    | KMOD     | 2009-06-02 | 82.3 |
    | KSTL     | 2009-06-01 | 78.6 |
    | ...      |            |      |
    +----------+------------+------+

    +----------+------------+------+
    | loc_code | datetime   | DEWP |
    +----------+------------+------+
    | KMOD     | 2009-06-01 | 51.4 |
    | KMOD     | 2009-06-02 | 51.4 |
    | KSTL     | 2009-06-01 | 34.9 |
    | ...      |            |      |
    +----------+------------+------+

    +----------+------------+-----+
    | loc_code | datetime   | VIS |
    +----------+------------+-----+
    | KMOD     | 2009-06-01 | 10  |
    | KMOD     | 2009-06-02 | 10  |
    | KSTL     | 2009-06-01 | 18  |
    | ...      |            |     |
    +----------+------------+-----+

'''Strengths:''' No nulls; easy to add tables; easy to add heterogeneous flag fields; fastest queries for a single parameter.

'''Downsides:''' More tables; querying all the parameters at once requires a massive join.
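The extra parameter filter that the long-and-skinny layout pays on every query looks like this in practice. A sqlite3 sketch with the sample rows above (the table name obs is hypothetical):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
    create table obs (loc_code text, datetime text, data real, param text);
    insert into obs values ('KMOD', '2009-06-01', 87.8, 'TEMP');
    insert into obs values ('KMOD', '2009-06-01', 51.4, 'DEWP');
    insert into obs values ('KMOD', '2009-06-02', 82.3, 'TEMP');
""")

# Every single-parameter query must also filter by the param column,
# which is the overhead this schema pays; an index on (param, loc_code)
# would be needed to keep it fast.
rows = con.execute(
    "select datetime, data from obs where param = ? and loc_code = ? "
    "order by datetime", ('TEMP', 'KMOD')).fetchall()
print(rows)  # [('2009-06-01', 87.8), ('2009-06-02', 82.3)]
```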
 

Latest revision as of 13:03, October 8, 2010


Storing Point Data in a Relational Database

Point data is often stored in SQL databases. There's no standard schema, like CF-1.0 convention for NetCDF files, so it is not possible to just connect and start serving. You have to create the configuration description.

One of the most powerful ideas in relational database design is the concept of a view. You don't need to change the existing data tables, creating a view that makes your DB to look like the one needed is enough.

The Simplest Case: Configure with SQL Views

Location Table/View

The common thing between different databases is, that they need to have a location table. The current implementation is based on time series from stationary locations.

   table/view location
   +----------+-------+---------+-------+
   | loc_code | lat   | lon     | elev  |
   +----------+-------+---------+-------+
   | KMOD     | 37.63 | -120.95 |  30.0 |
   | KSTL     | 38.75 |  -90.37 | 172.0 |
   | KUGN     | 42.42 |  -87.87 | 222.0 |
   |...       |       |         |       |
   +----------+-------+---------+-------+

Here loc_code is the primary key and lat,lon is the location. Optional fields can be added. Your database may have a location table with different name and different field names, but that does not matter. CIRA VIEWS database has a location table, but it's called Site and it spells full longitude. The datafed browser uses standard names loc_code, loc_name, lat and lon for browsing; to get plug-and-play compatibility we these names are needed. In the CIRA VIEWS database, the view creation would be:

   create view location as 
   select 
       SiteCode as loc_code, 
       Latitude as lat, 
       Longitude as lon 
   from Site

The primary key is loc_code, being unique for all the locations.

Because WCS does not have a good place to describe a location table, we use WFS, Web Feature Service to do the same. Sample WFS Call.

Data Views

Each data variable needs a view. For example:

   create view TEMP_V as
   select 
       location.loc_code, 
       location.lat, 
       location.lon, 
       TEMP_base.datetime, 
       TEMP_base.temp,
       TEMP_base.flag
   from location
   inner join TEMP_base on TEMP_base.loc_code = location.loc_code

Each parameter has its own data view that looks like

   view TEMP_V
   +----------+-------+---------+------------+------+------+
   | loc_code | lat   | lon     | datetime   | temp | flag |
   +----------+-------+---------+------------+------+------+
   | KMOD     | 37.63 | -120.95 | 2009-06-01 | 87.8 | X    |
   | KMOD     | 37.63 | -120.95 | 2009-06-02 | 82.3 |      |
   | KSTL     | 38.75 |  -90.37 | 2009-06-01 | 78.6 |      |
   | ...      |       |         |            |      |      |
   +----------+-------+---------+------------+------+------+
   view DEWP_V
   +----------+-----------------+------------+------+
   | loc_code | lat   | lon     | datetime   | dewp |
   +----------+-----------------+------------+------+
   | KMOD     | 37.63 | -120.95 | 2009-06-01 | 51.4 |
   | KMOD     | 37.63 | -120.95 | 2009-06-02 | 51.4 |
   | KSTL     | 38.75 |  -90.37 | 2009-06-01 | 34.9 |
   | ...      |       |         |            |      |
   +----------+-----------------+------------+------+

Example Configuration

All of the configuration is done using python dictionaries and lists. The syntax is simple, This is a list:

   ['loc_code', 'lat', 'lon']

and this is a dictionary:

   {'key1':'value1', 'key2': 'value2' }

The test provider point is an example how to configure this service to use a SQL database to serve point data.

The data for the demonstration is stored for into sqlite, which is distributed with python by default. The project has following files:

  • pntdata.py: This script creates the test database and fills it with dummy data.
  • pntdata.db: The sqlite database file created by pntdata.py
  • point_config.py:
    • Declares the location table in the SQL database.
    • Mapping the coverages and fields to SQL tables.
  • point_WCS.py is the custom WCS handler
  • point_WFS.py is the custom WFS handler that delivers the location table.

Location Table Configuration

From file point_config.py

In it's simplest case, SQL views are used to create required location table, so no aliasing is needed.

   location_info = {
       'location':{
           'service':'WFS',
           'version':'1.0.0',
           },
       }

These are the standard names that datafed uses:

  • The dimension name is "location".
  • No aliasing is needed, since the DB table/view and column names are standard.
  • The view/table name in the DB is "location".
  • The columns are lat", "lon" and "loc_code" and loc_code is a text type, not an integer.


In the CIRA/VIEWS case, table and fields are alised:

   VIEWS_location_info = {
       'location':{
           'service':'WFS',
           'version':'1.0.0',
           'table_alias':'Site',
           'columns':{
               'loc_code':{'column_alias':'SiteCode'},
               'lat':{'column_alias':'Latitude'},
               'lon':{'column_alias':'Longitude'},
               }
           },
       }
  • The dimension name is still "location"
  • The location table is called "Site"
  • "SiteCode", "Latitude" and "Longitude" are aliased to "loc_code", "lat" and "lon".

Data Table Configuration

From file point_config.py

Coverage information and it's descriptions:

   point_info = {
       'SURF_MET':
           {
               'Title':'Surface Meteorological Observations',
               'Abstract':'Dummy test data.',

The covered area and time. The Time dimension is a true dimension here, but contrary to grid data, the X-Y dimensions for point data are not dimensions, but attributes of the location dimension. Time dimension format is ISO 8601 (start-inclusive)/(end-inclusive)/periodicity. PT1H means Periodicity Time 1 Hour, P1D would mean Periodicity Time 1 Day

               'axes':{
                   'X':(-180, 179.75), 
                   'Y':(-90, 89.383),
                   'T':iso_time.parse('2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H'),
                   },

Then comes the description of the fields.

               'fields':{
                   'TEMP':{
                       'Title':'Temperature',
                       'datatype': 'float',
                       'units':'deg F',

The location table is a real dimension; latitude and longitude are attributes along the location axis, not dimensions themselves. So a typical point dataset with locations and regular time intervals is a two-dimensional dataset. In this case, the location table is shared, so we use the previously declared variable 'location_info'. If the location tables are parameter specific, they need to be specified individually.

                       'axes':location_info,

The access instructions. This configuration is using 'complete_view', so the administrator has created the view that joins together the location table and the temperature data table. The SQL query will typically look like select loc_code, lat, lon, datetime, temp, flag from TEMP_V where datetime = '2009-09-01' and (lat between 34 and 44) and (lon between -90 and -80). This is by far the easiest way to configure the WCS.

                       'complete_view':{
                           'view_alias':'TEMP_V',
                           'columns':['loc_code','lat','lon','datetime','temp', 'flag'],
                           },
                       },
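To make the complete_view idea concrete, here is a small SQLite sketch. The base table names (location, temp_obs) and the sample rows are made up; only the view name TEMP_V and its column list follow the configuration above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE location (loc_code TEXT, lat REAL, lon REAL)")
con.execute("CREATE TABLE temp_obs (loc_code TEXT, datetime TEXT, temp REAL, flag TEXT)")
con.executemany("INSERT INTO location VALUES (?, ?, ?)",
                [("KSTL", 38.75, -85.0), ("KMOD", 37.63, -120.95)])
con.executemany("INSERT INTO temp_obs VALUES (?, ?, ?, ?)",
                [("KSTL", "2009-09-01", 78.6, "V"),
                 ("KMOD", "2009-09-01", 87.8, "V")])

# The administrator-created complete view: the WCS only ever queries TEMP_V.
con.execute("""
    CREATE VIEW TEMP_V AS
    SELECT l.loc_code, l.lat, l.lon, t.datetime, t.temp, t.flag
    FROM location l JOIN temp_obs t ON t.loc_code = l.loc_code
""")

# The kind of query the WCS generates for one time step and a bounding box.
rows = con.execute("""
    SELECT loc_code, temp FROM TEMP_V
    WHERE datetime = '2009-09-01'
      AND (lat BETWEEN 34 AND 44) AND (lon BETWEEN -90 AND -80)
""").fetchall()
print(rows)  # only KSTL falls inside the bounding box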

== Configure with SQL Table and Name Aliasing ==

It is not necessary to create a complete view. The datafed WCS can join the location and data tables itself, as long as it has the table and field names. The second field in point_config.py is configured using name aliasing. In this case, table_alias indicates that the data table is named "DEWP_base", and data_columns indicates that its data field is named "dewp".

                   'DEWP':{
                       'Title':'Dewpoint',
                       'datatype': 'float',
                       'units':'deg F',
                       'axes':location_info,
                       'table_alias':'DEWP_base',
                       'data_columns':[
                           {'name':'dewp'},
                           ],
                       },
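With only the aliases given, the service generates the join itself. The SQLite sketch below shows the kind of query it would produce for the DEWP field; the exact SQL datafed emits may differ, and the sample rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE location (loc_code TEXT, lat REAL, lon REAL)")
con.execute("CREATE TABLE DEWP_base (loc_code TEXT, datetime TEXT, dewp REAL)")
con.execute("INSERT INTO location VALUES ('KMOD', 37.63, -120.95)")
con.execute("INSERT INTO DEWP_base VALUES ('KMOD', '2009-06-01', 51.4)")

# The shared location table is joined to the aliased data table on loc_code.
row = con.execute("""
    SELECT l.loc_code, l.lat, l.lon, d.datetime, d.dewp
    FROM location l JOIN DEWP_base d ON d.loc_code = l.loc_code
""").fetchone()
print(row)  # ('KMOD', 37.63, -120.95, '2009-06-01', 51.4)
```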

== Query Examples ==

=== GetCapabilities ===

Returns the only coverage, ASOS.

http://128.252.202.19:8080/NCDC?service=WCS&version=1.1.2&Request=GetCapabilities

=== DescribeCoverage ===

Returns the dimensions and all the fields of ASOS:

http://128.252.202.19:8080/NCDC?Service=WCS&Version=1.1.2&Request=DescribeCoverage&identifiers=ASOS

=== GetCoverage ===

A time series for one location. RangeSubset both selects fields and filters them by dimensions.

RangeSubset=BEXT[location[13935]] selects field BEXT filtering by dimension location by loc_code=13935.

The parameter store=true makes the service return an XML document containing some meta information and a URL to the CSV result. You need to copy the URL from the XML envelope and paste it into the address bar to retrieve the data yourself.

http://128.252.202.19:8080/NCDC?service=WCS&version=1.1.2&Request=GetCoverage&format=text/csv&store=true&identifier=ASOS&RangeSubset=BEXT[location[13935]]&TimeSequence=2010-05-01T00:00:00Z/2010-06-20T00:00:00Z/PT1M

The same query with store=false returns both the XML envelope and the CSV data in the same response.

So far only Mozilla Firefox can open this combined response automatically; Internet Explorer, Safari and Chrome are not recommended here.

http://128.252.202.19:8080/NCDC?service=WCS&version=1.1.2&Request=GetCoverage&format=text/csv&store=false&identifier=ASOS&RangeSubset=BEXT[location[13935]]&TimeSequence=2010-05-01T00:00:00Z/2010-06-20T00:00:00Z/PT1M
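These long URLs are easier to assemble programmatically. A minimal sketch with the Python standard library, using the same base URL and parameter values as the example above:

```python
from urllib.parse import urlencode

base = "http://128.252.202.19:8080/NCDC"
params = {
    "service": "WCS",
    "version": "1.1.2",
    "Request": "GetCoverage",
    "format": "text/csv",
    "store": "false",
    "identifier": "ASOS",
    "RangeSubset": "BEXT[location[13935]]",
    "TimeSequence": "2010-05-01T00:00:00Z/2010-06-20T00:00:00Z/PT1M",
}
# urlencode percent-escapes the brackets and colons for us.
url = base + "?" + urlencode(params)
print(url)
```

The brackets in RangeSubset come out percent-encoded (%5B/%5D), which servers accept just as well as the literal form.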

== CIRA / VIEWS: Populating the Configuration from a Database ==

Open the download page in another tab, download and unzip ows-point-1.2.3.zip or later. Copy the web folder over /OWS/web.

The web/static/CIRA folder is a real production provider, serving data from http://views.cira.colostate.edu/web/ online. Eventually, this service will move to be hosted by them.

The data in CIRA/VIEWS is all in one table. Therefore, the configuration of WCS fields can be done by scanning that table and writing the information into the dictionary.

There is only one data table, AirFact3, which holds all the parameters, so each row must carry a parameter code. This makes the joining a little more complicated:

   select ....
   from Site
   inner join AirFact3 on AirFact3.SiteID = Site.SiteID
   inner join Parameter on AirFact3.ParamID = Parameter.ParamID

And we need to filter just non-aggregated valid data from certain programs:

   AirFact3.AggregationID = 1 
   and Site.latitude is not null 
   and Site.Longitude is not null 
   and AirFact3.FactValue <> -999
   and AirFact3.ProgramID in (10001, 10005, 20002) -- ('INA', 'IMPPRE', 'ASPD')
   and Parameter.ParamCode = '%s'

This is explained in detail in the comments of the script VIEWS_metadata.py, a non-trivial script about 250 lines long. It counts the occurrences of each VIEWS parameter and writes a list of parameters with enough data into a local SQLite database.

The configuration script CIRA_config.py loads the metadata from that small local database.
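The count-and-store idea behind VIEWS_metadata.py can be sketched as follows. The table and column names mirror the AirFact3/Parameter schema above, but the in-memory sample rows and the MIN_ROWS threshold are made up for illustration.

```python
import sqlite3

# Stand-in for the big VIEWS database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE AirFact3 (ParamID INT, FactValue REAL)")
src.execute("CREATE TABLE Parameter (ParamID INT, ParamCode TEXT)")
src.executemany("INSERT INTO Parameter VALUES (?, ?)", [(1, "SO4"), (2, "NO3")])
src.executemany("INSERT INTO AirFact3 VALUES (?, ?)",
                [(1, 0.5), (1, 0.7), (1, 0.9), (2, 0.1)])

# Keep only parameters with enough data (threshold is hypothetical).
MIN_ROWS = 2
counts = src.execute("""
    SELECT p.ParamCode, COUNT(*) AS n
    FROM AirFact3 f JOIN Parameter p ON f.ParamID = p.ParamID
    GROUP BY p.ParamCode HAVING COUNT(*) >= ?
""", (MIN_ROWS,)).fetchall()

# Stand-in for the small local metadata database that CIRA_config.py reads.
meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE params (ParamCode TEXT, n INT)")
meta.executemany("INSERT INTO params VALUES (?, ?)", counts)
print(counts)  # [('SO4', 3)]
```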

== Some Different DB Schema Types ==

In this documentation three different schemas are presented. Each of them relies on the location table. None of them is best for everything; they make different compromises for different reasons.

=== One Big Data Table ===

In this case, all the data is in the same table:

   +----------+------------+------+------+------+
   | loc_code | datetime   | TEMP | DEWP | VIS  |
   +----------+------------+------+------+------+
   | KMOD     | 2009-06-01 | 87.8 | 51.4 | 10   |
   | KMOD     | 2009-06-02 | 82.3 | 51.4 | NULL |
   | KSTL     | 2009-06-01 | 78.6 | 34.9 | 18   |
   | ...      |            |      |      |      |
   +----------+------------+------+------+------+

The foreign key to the location table is loc_code. The primary key is (loc_code, datetime).

Strengths: Simple; no joining is needed when querying all the fields.

Downsides: Needs NULLs for missing data; querying just one field is inefficient.

=== Long and Skinny Table ===

In this case, all the data is in one long, narrow table:

   +----------+------------+------+-------+
   | loc_code | datetime   | data | param |
   +----------+------------+------+-------+
   | KMOD     | 2009-06-01 | 87.8 | TEMP  |
   | KMOD     | 2009-06-02 | 82.3 | TEMP  |
   | KSTL     | 2009-06-01 | 78.6 | TEMP  |
   | KMOD     | 2009-06-01 | 51.4 | DEWP  |
   | KMOD     | 2009-06-02 | 51.4 | DEWP  |
   | KSTL     | 2009-06-01 | 34.9 | DEWP  |
   | KMOD     | 2009-06-01 | 10   | VIS   |
   | KMOD     | 2009-06-02 | 10   | VIS   |
   | KSTL     | 2009-06-01 | 18   | VIS   |
   | ...      |            |      |       |
   +----------+------------+------+-------+

Strengths: No NULLs; easy to add fields.

Downsides: Querying requires extra filtering by the parameter column, and it is slower than the others.
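The long-and-skinny layout can still be presented to the WCS in the wide form via a pivot view, using conditional aggregation. The table name obs and the sample rows below are made up; the column names follow the example table above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs (loc_code TEXT, datetime TEXT, data REAL, param TEXT)")
con.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", [
    ("KMOD", "2009-06-01", 87.8, "TEMP"),
    ("KMOD", "2009-06-01", 51.4, "DEWP"),
    ("KMOD", "2009-06-01", 10.0, "VIS"),
])

# One output row per (loc_code, datetime); each MAX(CASE ...) picks the
# value of one parameter, which is the extra filtering step noted above.
row = con.execute("""
    SELECT loc_code, datetime,
           MAX(CASE WHEN param = 'TEMP' THEN data END) AS temp,
           MAX(CASE WHEN param = 'DEWP' THEN data END) AS dewp,
           MAX(CASE WHEN param = 'VIS'  THEN data END) AS vis
    FROM obs GROUP BY loc_code, datetime
""").fetchone()
print(row)  # ('KMOD', '2009-06-01', 87.8, 51.4, 10.0)
```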

=== One Data Table for Each Param ===

Each parameter has its own data table.

   +----------+------------+------+
   | loc_code | datetime   | TEMP |
   +----------+------------+------+
   | KMOD     | 2009-06-01 | 87.8 |
   | KMOD     | 2009-06-02 | 82.3 |
   | KSTL     | 2009-06-01 | 78.6 |
   | ...      |            |      |
   +----------+------------+------+


   +----------+------------+------+
   | loc_code | datetime   | DEWP |
   +----------+------------+------+
   | KMOD     | 2009-06-01 | 51.4 |
   | KMOD     | 2009-06-02 | 51.4 |
   | KSTL     | 2009-06-01 | 34.9 |
   | ...      |            |      |
   +----------+------------+------+


   +----------+------------+-----+
   | loc_code | datetime   | VIS |
   +----------+------------+-----+
   | KMOD     | 2009-06-01 | 10  |
   | KMOD     | 2009-06-02 | 10  |
   | KSTL     | 2009-06-01 | 18  |
   | ...      |            |     |
   +----------+------------+-----+


Strengths: No NULLs; easy to add tables; easy to add heterogeneous flag fields; fastest queries for a single parameter.

Downsides: More tables; querying all the parameters at once requires a massive join.
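The "massive join" for the table-per-parameter schema can be sketched like this; the table names temp_t, dewp_t and vis_t are made up, and every extra parameter adds one more join on (loc_code, datetime).

```python
import sqlite3

con = sqlite3.connect(":memory:")
for table, col in [("temp_t", "temp"), ("dewp_t", "dewp"), ("vis_t", "vis")]:
    con.execute(f"CREATE TABLE {table} (loc_code TEXT, datetime TEXT, {col} REAL)")
con.execute("INSERT INTO temp_t VALUES ('KMOD', '2009-06-01', 87.8)")
con.execute("INSERT INTO dewp_t VALUES ('KMOD', '2009-06-01', 51.4)")
con.execute("INSERT INTO vis_t  VALUES ('KMOD', '2009-06-01', 10.0)")

# All parameters at once: join every per-parameter table on the shared key.
row = con.execute("""
    SELECT t.loc_code, t.datetime, t.temp, d.dewp, v.vis
    FROM temp_t t
    JOIN dewp_t d ON d.loc_code = t.loc_code AND d.datetime = t.datetime
    JOIN vis_t  v ON v.loc_code = t.loc_code AND v.datetime = t.datetime
""").fetchone()
print(row)  # ('KMOD', '2009-06-01', 87.8, 51.4, 10.0)
```

An outer join would be needed instead if some parameters can be missing for a given time step.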