Difference between revisions of "WCS Wrapper Configuration for Point Data"
Line 194: | Line 194: | ||
and Parameter.ParamCode = '%s' | and Parameter.ParamCode = '%s' | ||
− | + | This is explained in detail in the comments of the script [http://128.252.202.19:8080/static/CIRA/VIEWS_data.py VIEWS_data.py], which compiles all the metadata into a local sqlite dabase, and [http://128.252.202.19:8080/static/CIRA/CIRA_config.py CIRA_config.py], which loads the metadata from that database. | |
== Some Different DB Schema types == | == Some Different DB Schema types == |
Revision as of 10:59, July 29, 2010
Back to WCS Wrapper Configuration
Storing Point Data in a Relational Database
Point data is often stored in SQL databases. There's no standard schema, like CF-1.0 convention for NetCDF files, so it is not possible to just connect and start serving. You have to create the configuration description.
One of the most powerful ideas in relational database design is the concept of a view. You don't need to change the existing data tables, creating a view that makes your DB to look like the one needed is enough.
All of the configuration is done using python dictionaries and lists. The syntax is simple, This is a list:
['loc_code', 'lat', 'lon']
and this is a dictionary:
{'key1':'value1', 'key2': 'value2 }
The Simplest Case: Configure with SQL Views
The test provider point is an example how to configure this service to use a SQL database to serve point data.
This is demonstrated in the test provider point.
The demonstration is using sqlite, which is distributed with python by default. The project has following files:
- pntdata.py: This script creates the test database and fills it with dummy data.
- pntdata.db: The sqlite database file created by pntdata.py
- point_config.py
- WCS coverage information
- Mapping the coverages and fields to SQL tables.
- point_WFS.py is the custom WFS handler that delivers the location table.
- point_WCS.py is the custom WCS handler
- Loads the metadata. In this demo version, this is done by hardcoding the tables in point_config.py.
- Gets db connection. The metadata mappings allows the service to generate SQL on it's own.
Location Table/View
The common thing between different databases is, that they need to have a location table. The current implementation is based on time series from stationary locations.
table/view location +----------+-------------------------+ | loc_code | lat | lon | elev | +----------+-------------------------+ | KMOD | 37.63 | -120.95 | 30.0 | | KSTL | 38.75 | -90.37 | 172.0 | | KUGN | 42.42 | -87.87 | 222.0 | |... | | | | +----------+-------------------------+
Here loc_code is the primary key and lat,lon is the location. Optional fields can be added. Your database may have a location table with different name and different field names, but that does not matter. CIRA VIEWS database has a location table, but it's called Site and it spells full longitude. The datafed browser uses standard names loc_code, loc_name, lat and lon for browsing; to get plug-and-play compatibility we these names are needed. In the CIRA VIEWS database, the view creation would be:
create view location as select SiteCode as loc_code, Latitude as lat, Longitude as lon from Site
The primary key is loc_code, being unique for all the locations.
Data Views
Each data variable needs a view. For example:
create view TEMP_V as select location.loc_code, location.lat, location.lon, TEMP_base.datetime, TEMP_base.temp, TEMP_base.flag from location inner join TEMP_base on TEMP_base.loc_code = location.loc_code
Each parameter has its own data view that looks like
view TEMP_V +----------+-----------------+------------+------+------+ | loc_code | lat | lon | datetime | temp | flag | +----------+-----------------+------------+------+------+ | KMOD | 37.63 | -120.95 | 2009-06-01 | 87.8 | X | | KMOD | 37.63 | -120.95 | 2009-06-02 | 82.3 | | | KSTL | 38.75 | -90.37 | 2009-06-01 | 78.6 | | | ... | | | | | | +----------+-----------------+------------+------+------+
view DEWP_V +----------+-----------------+------------+------+ | loc_code | lat | lon | datetime | dewp | +----------+-----------------+------------+------+ | KMOD | 37.63 | -120.95 | 2009-06-01 | 51.4 | | KMOD | 37.63 | -120.95 | 2009-06-02 | 51.4 | | KSTL | 38.75 | -90.37 | 2009-06-01 | 34.9 | | ... | | | | | +----------+-----------------+------------+------+
Example Configuration
This a running example from point_config.py
First comes the coverage information and it's descriptions:
point_info = { 'SURF_MET': { 'Title':'Surface Meteorological Observations', 'Abstract':'Dummy test data.',
The covered area and time. The Time dimension is a true dimension here, but contrary to grid data, the X-Y dimensions for point data are not dimensions, but attributes of the location dimension.
'axes':{ 'X':(-180, 179.75), 'Y':(-90, 89.383), 'T':iso_time.parse('2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H'), },
Then comes the description of the fields.
'fields':{ 'TEMP':{ 'Title':'Temperature', 'datatype': 'float', 'units':'deg F',
The location table is a real dimension. In this case, the location table is shared, so we use the previously declared variable 'location_info' If the location tables are parameter specific, they need to be specified individually.
'axes':location_info,
The access instructions. This configuration is using 'complete_view', so the administrator has created the view that joins together the location table and the temperature data table. The SQL query will typically look like select loc_code, lat, lon, datetime, temp, flag from TEMP_V where datetime = '2009-09-01 and (lat between 34 and 44) and (lon between -90 and -80). This is by far the easiest way to configure the WCS.
'complete_view':{ 'view_alias':'TEMP_V', 'columns':['loc_code','lat','lon','datetime','temp', 'flag'], }, },
Configure with SQL table and name aliasing
It is not necessary to create a complete view. The datafed WCS can join the location and data tables, as long as it has the table and field names. The second field in point_config.py is an configured using name aliasing. In this case, the table_alias tells, that the data table is named "DEWP_base" and its data field is named "dewp".
'DEWP':{ 'Title':'Dewpoint', 'datatype': 'float', 'units':'deg F', 'axes':location_info, 'table_alias':'DEWP_base', 'data_columns':[ {'name':'dewp'}, ], },
CIRA/VIEWS: Populating the configuration from a database
Location Table
In the original database, the location table is called 'Site' and columns are SiteCode, Latitude and Longitude. These are aliased here.
VIEWS_location_info = { 'location':{ 'service':'WFS', 'version':'1.0.0', 'table_alias':'Site', 'columns':{ 'loc_code':{'column_alias':'SiteCode'}, 'lat':{'column_alias':'Latitude'}, 'lon':{'column_alias':'Longitude'}, } }, }
Data Table
The data in CIRA/VIEWS is all in one table. Therefore, the configuration of WCS fields can be done by scanning that table and writing the information into the dictionary.
There is only one data table, AirFact3, that holds all the parameters. Therefore each row must have parameter code. This makes the joining a little more complicated:
select .... from Site inner join AirFact3 on AirFact3.SiteID = Site.SiteID inner join Parameter on AirFact3.ParamID = Parameter.ParamID
And we need to filter just non-aggregated data from certain programs:
AirFact3.AggregationID = 1 and Site.latitude is not null and Site.Longitude is not null and AirFact3.FactValue <> -999 and AirFact3.ProgramID in (10001, 10005, 20002) -- ('INA', 'IMPPRE', 'ASPD') and Parameter.ParamCode = '%s'
This is explained in detail in the comments of the script VIEWS_data.py, which compiles all the metadata into a local sqlite dabase, and CIRA_config.py, which loads the metadata from that database.
Some Different DB Schema types
In this documentation three different schemas are presented. Each of them rely on the location table. None of them is the best for everything, they make different compromises for different reasons.
One Big Data Table
In this case, all the data is in the same table:
+----------+------------+------+------+------+ | loc_code | datetime | TEMP | DEWP | VIS | +----------+------------+------+------+------+ | KMOD | 2009-06-01 | 87.8 | 51.4 | 10 | | KMOD | 2009-06-02 | 82.3 | 51.4 | NULL | | KSTL | 2009-06-01 | 78.6 | 34.9 | 18 | | ... | | | | | +----------+------------+------+------+------+
The foreign key to location table is loc_code. The primary key is (loc_code, datetime)
Strengths: Simple, No joining when querying all the fields.
Downsides: Needs nulls for missing data, querying just one field is inefficient.
Long And Skinny Table
In this case, all the data is in the same table:
+----------+------------+------+-------+ | loc_code | datetime | data | param | +----------+------------+------+-------+ | KMOD | 2009-06-01 | 87.8 | TEMP | | KMOD | 2009-06-02 | 82.3 | TEMP | | KSTL | 2009-06-01 | 78.6 | TEMP | | KMOD | 2009-06-01 | 51.4 | DEWP | | KMOD | 2009-06-02 | 51.4 | DEWP | | KSTL | 2009-06-01 | 34.9 | DEWP | | KMOD | 2009-06-01 | 10 | VIS | | KMOD | 2009-06-02 | 10 | VIS | | KSTL | 2009-06-01 | 18 | VIS | | ... | | | | +----------+------------+------+------+
Strengths: No nulls, Easy to add fields.
Downsides: Querying requires extra filtering with parameter index, slower than others.
One Data Table For Each Param
Each parameter has its own data table.
+----------+------------+------+ | loc_code | datetime | TEMP | +----------+------------+------+ | KMOD | 2009-06-01 | 87.8 | | KMOD | 2009-06-02 | 82.3 | | KSTL | 2009-06-01 | 78.6 | | ... | | | +----------+------------+------+
+----------+------------+------+ | loc_code | datetime | DEWP | +----------+------------+------+ | KMOD | 2009-06-01 | 51.4 | | KMOD | 2009-06-02 | 51.4 | | KSTL | 2009-06-01 | 34.9 | | ... | | | +----------+------------+------+
+----------+------------+-----+ | loc_code | datetime | VIS | +----------+------------+-----+ | KMOD | 2009-06-01 | 10 | | KMOD | 2009-06-02 | 10 | | KSTL | 2009-06-01 | 18 | | ... | | | +----------+------------+-----+
Strengths: No nulls, Easy to add tables, easy to add heterogeneous flag fields, fastest queries for single parameter.
Downsides: More tables, querying all the parameters at once requires a massive join.
Configuring the WCS using SQL Views
Since these are just python objects, they can be generated a database as well.
Location Table Configuration
The location table for coverage SURF_MET. Because WCS does not have a good place to describe a location table, we use WFS, Web Feature Service to do the same. Sample WFS Call.
location_info = { 'location':{ 'service':'WFS', 'version':'1.0.0',
In the CIRA/VIEWS database, we're not authorized to create a view. So we need to map the 'Site' table and it's columns.
'table_alias':'Site', 'columns':{ 'loc_code':{'column_alias':'SiteCode'}, 'lat':{'column_alias':'Latitude'}, 'lon':{'column_alias':'Longitude'}, } }, }
Data Table Configuration using SQL View
point_info = {
First key is the coverage information and it's descriptions:
'SURF_MET': { 'Title':'Surface Meteorological Observations', 'Abstract':'Dummy test data.',
The covered area and time. The Time dimension is a true dimension here, but contrary to grid data, the X-Y dimensions for point data are not dimensions, but attributes of the location dimension.
'axes':{ 'X':(-180, 179.75), 'Y':(-90, 89.383), 'T':iso_time.parse('2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H'), },
Then comes the description of the fields.
'fields':{ 'TEMP':{ 'Title':'Temperature', 'datatype': 'float', 'units':'deg F',
The location table is a real dimension. In this case, the location table is shared, so we use the previously declared variable 'location_info' If the location tables are parameter specific, they can be specified individually.
'axes':location_info,
The access instructions. This configuration is using 'complete_view', so the administrator has created the view that joins together the location table and the temperature data table. The SQL query will typically look like select loc_code, lat, lon, datetime, temp, flag from TEMP_V where datetime = '2009-09-01 and (lat between 34 and 44) and (lon between -90 and -80). This is by far the easiest way to configure the WCS.
'complete_view':{ 'view_alias':'TEMP_V', 'columns':['loc_code','lat','lon','datetime','temp', 'flag'], }, },
The end of the first field.
Data Table Configuration using by Mapping Original Tables
For demonstration purposes, the next field is configured without a view.
'DEWP':{ 'Title':'Dewpoint', 'datatype': 'float', 'units':'deg F', 'axes':location_info,
First we tell which table contains the data.
'table_alias':'DEWP_base',
The CIRA/VIEWS has FactDate, not datetime, so we have to map.
'datetime_alias':'FactDate',
If you have a simple join by loc_code, then the point the SQL processor can just generate 'inner join DEWP_base on DEWP_base.loc_code = location.loc_code'. In the CIRA/VIEWS, it's more complicated. We need to join the Site table for locations and since all the data is in the same table, we also need to filter by the parameter code, which now requires joining the parameter table. Also the foreign key is SiteID, not the loc_code alias SiteCode.
'joins':( 'inner join AirFact3 on AirFact3.SiteID = Site.SiteID ' + 'inner join Parameter on AirFact3.ParamID = Parameter.ParamID'),
The default common data filter is empty. Again, CIRA/VIEWS needs more. For example for Fine Mass (MF):
- Aggregation: some data is aggregated, we're interested in the raw data only.
- Exclude null lat/lon and -999 in data.
- Only get data from certain programs.
- Finally: Since all the parameters are in the same table, filter by param code.
'common_data_filter':( 'AggregationID = 1 and Site.latitude is not null and Site.Longitude is not null ' + 'and AirFact3.FactValue <> -999 ' + 'and AirFact3.ProgramID in (10001, 10005, 20002) -- ('INA', 'IMPPRE', 'ASPD') ' + "and Parameter.ParamCode = 'MF'"),
Finally: All the data is in the 'FactValue', so we need to map the column.
'data_columns':[ {'name':'MF', 'column_alias':'FactValue'}, ],
In this point database, the above is not necessary since fields have standard names. But we still need the data column.
'data_columns':[ {'name':'dewp'}, ], },
The end of field.
}, },
If you have another coverage, add it here:
'Cov-Name': { 'Title': ...