Difference between revisions of "WCS Wrapper Configuration for Cubes"

From Earth Science Information Partners (ESIP)
(New page: Back to WCS Wrapper [http://sourceforge.net/p/aq-ogc-services/home/ Project on SourceForge] Questions and comments should go to [http://sourceforge.net/...)
 
 
(13 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
[[WCS_Access_to_netCDF_Files| Back to WCS Wrapper]]   
 
 +
 +
[[WCS_Wrapper_Configuration| Back to WCS Wrapper Configuration]]
  
 
[http://sourceforge.net/p/aq-ogc-services/home/ Project on SourceForge]
 
  
Questions and comments should go to [http://sourceforge.net/p/aq-ogc-services/discussion/ sourceforge discussions], bug reports to [http://sourceforge.net/p/aq-ogc-services/tickets/ sourceforge tickets]. For urgent issues, contact Kari Hoijarvi at 314-935-6099(w) or 314-843-6436(h).
+
== Serving Data from complete NetCDF-CF files ==
 
 
== Structure of OWS/web ==
 
 
 
'''OWS/web''' is for system developers only.
 
 
 
OWS/web/'''static''' contains static web content. You can put any documentation here and it will be served as a web page or download. The home page index.html is pretty much mandatory, and you should change favicon.ico to reflect your organization. We highly recommend that you customize these to document your WCS service.
 
 
 
OWS/web/static/'''cache''' is a folder for temporary files. The service uses it for output files. Anything you put there will be deleted when space is needed.
 
 
 
The installation contains the example datasets OWS/web/static/'''testprovider''' and OWS/web/static/'''point'''. The testprovider is a demo NetCDF dataset; point is an example of how to serve point data from a SQL database. Every service will have a folder with the same name here.
 
 
 
You may now check the provider page [http://localhost:8080/testprovider http://localhost:8080/testprovider], which is served as a static file. Any file under static becomes accessible this way.
 
 
 
== The Human Interface: Create the index.html Front Pages for Visitors. ==
 
 
 
If no query is present, the server gives a default page '''index.html'''. You should provide pages for your server and for all the providers.
 
  
The server index.html is at '''OWS/web/static/index.html''' and is displayed at the URL http://localhost:8080/. An example of an external server's index.html is [http://128.252.202.19:8080/ here].
+
Read [[Creating_NetCDF_CF_Files| How to pack your data to NetCDF Files]].
  
Every provider folder should also have an index.html, such as '''OWS/web/static/testprovider/index.html''', which is displayed at http://localhost:8080/testprovider. An example of an external provider front page is [http://128.252.202.19:8080/HTAP here].
+
The best thing is to have all your heterogeneous data in one NetCDF file. In that case, your data shares all the main dimensions: Latitude, Longitude, Time, Elevation, Wavelength, etc. This enables the WCS service to extract '''map slices''', '''time series''' or cubes of any combination of the dimensions.
  
Every provider should have a wcs_capabilities.conf that lists keywords and contact information. The format is simple: copy one from the testprovider and edit it.
+
Copy your NetCDF file to web/static/myproject
  
=== Windows Implementation Bug ===
+
* Every NetCDF file becomes a coverage.
'''Important''' There is a bug deep in the Python core libraries that makes serving text files tricky. The files need to be encoded with the Unix-style line ending convention '\n' instead of the Windows-style '\r\n'.
+
* Every Variable in the file becomes a field.  
  
To fix this, issue command:
+
The configuration is automatic: the /OWS/web/'''owsadmin.py''' script extracts the metadata from these cubes, and you are ready to run.
  
     python /OWS/web/owsadmin.py unix_nl filename.html
+
     python owsadmin.py wcs_prepare -ao
  
for every html file you serve.
+
This extracts all the metadata from all the providers.
  
 
== Serving data from periodic collection of NetCDF files ==
 
  
Sometimes you have accumulated a huge number of small NetCDF files, like daily slices from a model output. You could combine those into one big cube, but you for a terabyte of files, that may not be an option.  
+
Sometimes you have accumulated a huge number of small NetCDF files, like daily slices from a model output. You could combine those into one big cube, but for a terabyte of files, that may not be an option. The HTAP demo provider is an example of how to serve such a collection directly. Look at [http://128.252.202.19:8080/static/HTAP/HTAP_wcs.py HTAP_wcs.py] and [http://128.252.202.19:8080/static/HTAP/HTAP_config.py HTAP_config.py].
 
 
Download our HTAP test package [http://sourceforge.net/downloads/aq-ogc-services/ows/custom-netcdf-1.2.0.zip/ custom-netcdf-1.2.0.zip]. It has only two days of data to keep the download small. Then read the [http://localhost:8080/HTAP custom provider page].
 
 
 
== Storing Point Data in a Relational Database ==
 
 
 
Provider [http://localhost:8080/point point] is an example of how to configure this service to use a SQL database to serve point data.
 
 
 
Point data is often stored in SQL databases. There is no standard schema comparable to the CF-1.0 convention for NetCDF files, so it is not possible to just connect and start serving. You have to create the configuration description.
 
 
 
Therefore, the WCS query processor needs to know what to select and join. This information must be edited into the configuration script.
 
 
 
=== Notes on SQL ===
 
 
 
One of the most powerful ideas in relational database design is the concept of a view. You don't need to change the existing data tables; creating a view that makes your DB look like the one needed is usually enough. This is by far the easiest way to configure your WCS.
 
 
 
It is better to design a normalized schema and to optimize only once benchmarks are available. In particular, filtering small lat/lon ranges is much more efficient on a normalized location table than on a denormalized data table.
 
 
 
=== Location Table ===
 
 
 
What the different databases have in common is that they all need a location table.
 
 
 
    table location
    +----------+-------+---------+-------+
    | loc_code | lat   | lon     | elev  |
    +----------+-------+---------+-------+
    | KMOD     | 37.63 | -120.95 |  30.0 |
    | KSTL     | 38.75 |  -90.37 | 172.0 |
    | KUGN     | 42.42 |  -87.87 | 222.0 |
    | ...      |       |         |       |
    +----------+-------+---------+-------+
 
 
 
Here loc_code is the primary key and lat, lon is the location. Optional fields can be added. The CIRA VIEWS database has a location table, but it is called '''Site''' and it spells out '''longitude''' in full. The datafed browser uses the standard names loc_code, loc_name, lat and lon for browsing. For plug-and-play compatibility we recommend using these names. In the CIRA VIEWS database, the view creation would be:
 
 
 
    create view location as
 
    select
 
        SiteCode as loc_code,
 
        Latitude as lat,
 
        Longitude as lon
 
    from Site
 
 
 
The primary key is loc_code, which is unique for each location.
 
 
 
If the fields have different names they can be aliased in the configuration.
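The view approach can be tried out with Python's built-in sqlite3 module. The following is a minimal sketch, assuming a cut-down '''Site''' table with only the three columns named above; the in-memory database and the bounding box values are illustrative and are not part of the WCS wrapper itself.

```python
import sqlite3

# In-memory database standing in for an existing station database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Site (SiteCode text primary key, Latitude real, Longitude real);
    insert into Site values ('KMOD', 37.63, -120.95);
    insert into Site values ('KSTL', 38.75, -90.37);
    insert into Site values ('KUGN', 42.42, -87.87);

    -- Expose the existing table under the standard column names.
    create view location as
    select
        SiteCode as loc_code,
        Latitude as lat,
        Longitude as lon
    from Site;
""")

# Filter a lat/lon box through the view, using only the standard names.
rows = conn.execute(
    "select loc_code, lat, lon from location "
    "where lat between ? and ? and lon between ? and ? "
    "order by loc_code",
    (38, 43, -91, -87),
).fetchall()
print(rows)
```

The original Site table is left untouched; only the view uses the names loc_code, lat and lon.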
 
 
 
== Some Different DB Schema types ==
 
 
 
Three different schemas are presented in this documentation. Each of them has good and bad points.
 
 
 
=== One Big Data Table ===
 
 
 
In this case, all the data is in the same table:
 
 
 
    +----------+------------+------+------+------+
    | loc_code | datetime   | TEMP | DEWP | VIS  |
    +----------+------------+------+------+------+
    | KMOD     | 2009-06-01 | 87.8 | 51.4 | 10   |
    | KMOD     | 2009-06-02 | 82.3 | 51.4 | NULL |
    | KSTL     | 2009-06-01 | 78.6 | 34.9 | 18   |
    | ...      |            |      |      |      |
    +----------+------------+------+------+------+
 
 
 
The foreign key to location table is loc_code. The primary key is (loc_code, datetime)
 
 
 
'''Strengths:''' Simple, No joining when querying all the fields.
 
 
 
'''Downsides:''' Needs nulls for missing data, querying just one field is inefficient.
 
 
 
=== Long And Skinny Table ===
 
 
 
In this case, all the data is in the same table:
 
 
 
    +----------+------------+------+-------+
    | loc_code | datetime   | data | param |
    +----------+------------+------+-------+
    | KMOD     | 2009-06-01 | 87.8 | TEMP  |
    | KMOD     | 2009-06-02 | 82.3 | TEMP  |
    | KSTL     | 2009-06-01 | 78.6 | TEMP  |
    | KMOD     | 2009-06-01 | 51.4 | DEWP  |
    | KMOD     | 2009-06-02 | 51.4 | DEWP  |
    | KSTL     | 2009-06-01 | 34.9 | DEWP  |
    | KMOD     | 2009-06-01 | 10   | VIS   |
    | KMOD     | 2009-06-02 | 10   | VIS   |
    | KSTL     | 2009-06-01 | 18   | VIS   |
    | ...      |            |      |       |
    +----------+------------+------+-------+
 
 
 
'''Strengths:''' No nulls, Easy to add fields.
 
 
 
'''Downsides:''' Querying requires extra filtering with parameter index, slower than others.
 
 
 
=== One Data Table For Each Param ===
 
 
 
Each parameter has its own data table. In this case there's no need for nulls, and it is the fastest layout for single-parameter queries.
 
 
 
    +----------+------------+------+
    | loc_code | datetime   | TEMP |
    +----------+------------+------+
    | KMOD     | 2009-06-01 | 87.8 |
    | KMOD     | 2009-06-02 | 82.3 |
    | KSTL     | 2009-06-01 | 78.6 |
    | ...      |            |      |
    +----------+------------+------+

    +----------+------------+------+
    | loc_code | datetime   | DEWP |
    +----------+------------+------+
    | KMOD     | 2009-06-01 | 51.4 |
    | KMOD     | 2009-06-02 | 51.4 |
    | KSTL     | 2009-06-01 | 34.9 |
    | ...      |            |      |
    +----------+------------+------+

    +----------+------------+-----+
    | loc_code | datetime   | VIS |
    +----------+------------+-----+
    | KMOD     | 2009-06-01 | 10  |
    | KMOD     | 2009-06-02 | 10  |
    | KSTL     | 2009-06-01 | 18  |
    | ...      |            |     |
    +----------+------------+-----+
 
 
 
 
 
'''Strengths:''' No nulls, easy to add tables, easy to add heterogeneous flag fields, fastest queries for a single parameter.
 
 
 
'''Downsides:''' More tables, querying all the parameters at once requires a massive join.
 
 
 
== Configuring the WCS using SQL Views ==
 
 
 
This is demonstrated in the test provider '''point'''.
 
 
 
The demonstration uses sqlite, which is distributed with Python by default. The project has the following files:
 
 
 
* '''pntdata.py''': This script creates the test database and fills it with dummy data.
 
* '''pntdata.db''': The sqlite database file created by pntdata.py
 
* '''point_config.py'''
 
** WCS coverage information
 
** Mapping the coverages and fields to SQL tables.
 
* '''point_WCS.py'''
 
** Loads the metadata. In this demo version, this is done by hardcoding the tables in point_config.py. In CIRA/VIEWS this is done by querying the parameter table.
 
** Gets the db connection. The metadata mappings allow the service to generate SQL on its own.
 
 
 
 
 
=== Contents of point_config.py ===
 
 
 
All of these are python dictionaries and lists.
 
 
 
The syntax is simple. This is a list:
 
 
 
    ['loc_code', 'lat', 'lon']
 
 
 
This is a dictionary:
 
 
 
    {'key1': 'value1', 'key2': 'value2'}
 
 
 
Since these are just Python objects, they can be generated from a database as well.
 
 
 
=== Location Table Configuration ===
 
 
 
First, the location table for coverage SURF_MET. Because WCS does not have a good place to describe a location table, we use WFS (Web Feature Service) for that purpose. [http://128.252.202.19:8080/CIRA?service=WFS&Version=1.0.0&Request=GetFeature&typename=VIEWS&outputFormat=text/csv Sample WFS Call.]
 
 
 
    location_info = {
 
        'location':{
 
            'service':'WFS',
 
            'version':'1.0.0',
 
 
 
In the '''CIRA/VIEWS''' database, we're not authorized to create a view, so we need to map the 'Site' table and its columns.
 
 
 
            'table_alias':'Site',
 
            'columns':{
 
                'loc_code':{'column_alias':'SiteCode'},
 
                'lat':{'column_alias':'Latitude'},
 
                'lon':{'column_alias':'Longitude'},
 
                }
 
            },
 
        }
 
 
 
=== Data Table Configuration using SQL View ===
 
 
 
    point_info = {
 
 
 
The first key is the coverage and its descriptions:
 
 
 
        'SURF_MET':
 
            {
 
                'Title':'Surface Meteorological Observations',
 
                'Abstract':'Dummy test data.',
 
 
 
The covered area and time. The Time dimension is a true dimension here, but contrary to grid data, the X-Y dimensions for point data are not dimensions, but attributes of the location dimension.
 
 
 
                'axes':{
 
                    'X':(-180, 179.75),
 
                    'Y':(-90, 89.383),
 
                    'T':iso_time.parse('2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H'),
 
                    },
 
 
 
Then comes the description of the fields.  
 
 
 
 
 
                'fields':{
 
                    'TEMP':{
 
                        'Title':'Temperature',
 
                        'datatype': 'float',
 
                        'units':'deg F',
 
 
 
The location table is a real dimension. In this case, the location table is shared, so we use the previously declared variable 'location_info'. If the location tables are parameter specific, they can be specified individually.
 
 
 
                        'axes':location_info,
 
 
 
The access instructions. This configuration uses 'complete_view', which means the administrator has created a view that joins the location table and the temperature data table. The SQL query will typically look like '''select loc_code, lat, lon, datetime, temp, flag from TEMP_V where datetime = '2009-09-01' and (lat between 34 and 44) and (lon between -90 and -80)'''. This is by far the easiest way to configure the WCS.
 
 
 
                        'complete_view':{
 
                            'view_alias':'TEMP_V',
 
                            'columns':['loc_code','lat','lon','datetime','temp', 'flag'],
 
                            },
 
                        },
 
 
 
The end of the first field.
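To make the 'complete_view' idea concrete, here is a small sketch using sqlite3. It assumes a hypothetical base table named TEMP_base (by analogy with DEWP_base) and a few dummy rows; only the view name TEMP_V, its column list and the shape of the query come from the configuration above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dummy tables; TEMP_base is a hypothetical name for the raw data table.
    create table location (loc_code text primary key, lat real, lon real);
    create table TEMP_base (
        loc_code text, datetime text, temp real, flag text,
        primary key (loc_code, datetime));
    insert into location values ('KMOD', 37.63, -120.95);
    insert into location values ('KUGN', 42.42, -87.87);
    insert into TEMP_base values ('KMOD', '2009-09-01', 87.8, 'ok');
    insert into TEMP_base values ('KUGN', '2009-09-01', 71.2, 'ok');
    insert into TEMP_base values ('KUGN', '2009-09-02', 69.5, 'ok');

    -- The complete view: location and data joined, with the configured columns.
    create view TEMP_V as
    select l.loc_code, l.lat, l.lon, t.datetime, t.temp, t.flag
    from location l
    inner join TEMP_base t on t.loc_code = l.loc_code;
""")

# Same shape as the typical query: one time, one lat/lon box.
rows = conn.execute(
    "select loc_code, lat, lon, datetime, temp, flag from TEMP_V "
    "where datetime = ? and (lat between ? and ?) and (lon between ? and ?)",
    ("2009-09-01", 34, 44, -90, -80),
).fetchall()
print(rows)
```

Because the join lives in the view, the query needs no knowledge of the underlying tables, which is why this configuration only lists a view name and its columns.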
 
 
 
 
 
=== Data Table Configuration by Mapping Original Tables ===
 
 
 
For demonstration purposes, the next field is configured without a view.
 
 
 
                    'DEWP':{
 
                        'Title':'Dewpoint',
 
                        'datatype': 'float',
 
                        'units':'deg F',
 
                        'axes':location_info,
 
 
 
First we tell which table contains the data.
 
 
 
                        'table_alias':'DEWP_base',
 
 
 
The '''CIRA/VIEWS''' database has FactDate, not datetime, so we have to map it.
 
 
 
                        'datetime_alias':'FactDate',
 
 
 
If you have a simple join by loc_code, then the SQL processor can just generate 'inner join DEWP_base on DEWP_base.loc_code = location.loc_code'. In CIRA/VIEWS, it's more complicated. We need to join the Site table for locations and, since all the data is in the same table, we also need to filter by the parameter code, which in turn requires joining the parameter table. Also, the foreign key is SiteID, not SiteCode, the column aliased to loc_code.
 
 
 
                        'joins':(
 
                            'inner join AirFact3 on AirFact3.SiteID = Site.SiteID ' +
 
                            'inner join Parameter on AirFact3.ParamID = Parameter.ParamID'),
 
 
 
The default common data filter is empty. Again, CIRA/VIEWS needs more:
 
* Aggregation: some data is aggregated, we're interested in the raw data only.
 
* Exclude null lat/lon and -999 in data.
 
* Only get data from certain programs.
 
* Finally: Since all the parameters are in the same table, filter by param code.
 
 
 
 
 
                        'common_data_filter':(
                            'AggregationID = 1 and Site.latitude is not null and Site.Longitude is not null ' +
                            'and AirFact3.FactValue <> -999 ' +
                            # program codes for these IDs: 'INA', 'IMPPRE', 'ASPD'
                            'and AirFact3.ProgramID in (10001, 10005, 20002) ' +
                            "and Parameter.ParamCode = 'MF'"),
 
 
 
Finally: All the data is in the 'FactValue', so we need to map the column.
 
 
 
                        'data_columns':[
 
                            {'name':'MF', 'column_alias':'FactValue'},
 
                            ],
 
 
 
In this point database, the above is not necessary since fields have defaults. But we still need the data column.
 
 
 
                        'data_columns':[
 
                            {'name':'dewp'},
 
                            ],
 
                        },
 
 
 
The end of field.
 
  
                    },
+
To try it out yourself, open the [https://sourceforge.net/projects/aq-ogc-services/files_beta/ download page] in another tab and get '''custom-netcdf-1.2.3.zip''' or later. It contains the folders GEMAQ-v1p0, GEOSChem-v45 and web. The web folder contains the code; the other folders contain the data. Copy these folders into /OWS so that the "web" folder is copied on top of the existing "web". Under web/static, HTAP is the custom grid demo; CIRA is the point demo and is not relevant here.
            },
 
  
If you have another coverage, add it here:
+
The custom handler does three things.
 +
* It gets the datetime from the TimeSequence parameter. Time ranges are reported as an error.
 +
* It finds the daily netcdf file by the datetime.
 +
* It redirects the extractor to get the subcube from that file.
  
        'Cov-Name':
+
To achieve this, it uses standard object-oriented inheritance and method overriding.
            {
 
                'Title': ...
 

Latest revision as of 12:04, October 8, 2010

Serving Data from complete NetCDF-CF files

Read How to pack your data to NetCDF Files.

The best thing is to have all your heterogeneous data in one NetCDF file. In that case, your data shares all the main dimensions: Latitude, Longitude, Time, Elevation, Wavelength, etc. This enables the WCS service to extract map slices, time series or cubes of any combination of the dimensions.

Copy your NetCDF file to web/static/myproject

  • Every NetCDF file becomes a coverage.
  • Every Variable in the file becomes a field.

The configuration is automatic: the /OWS/web/owsadmin.py script extracts the metadata from these cubes, and you are ready to run.

   python owsadmin.py wcs_prepare -ao 

This extracts all the metadata from all the providers.

Serving data from periodic collection of NetCDF files

Sometimes you have accumulated a huge number of small NetCDF files, like daily slices from a model output. You could combine those into one big cube, but for a terabyte of files, that may not be an option. The HTAP demo provider is an example of how to serve such a collection directly. Look at HTAP_wcs.py and HTAP_config.py.

To try it out yourself, open the download page in another tab and get custom-netcdf-1.2.3.zip or later. It contains the folders GEMAQ-v1p0, GEOSChem-v45 and web. The web folder contains the code; the other folders contain the data. Copy these folders into /OWS so that the "web" folder is copied on top of the existing "web". Under web/static, HTAP is the custom grid demo; CIRA is the point demo and is not relevant here.

The custom handler does three things.

  • It gets the datetime from the TimeSequence parameter. Time ranges are reported as an error.
  • It finds the daily NetCDF file by the datetime.
  • It redirects the extractor to get the subcube from that file.

To achieve this, it uses standard object-oriented inheritance and method overriding.
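The three steps can be sketched with ordinary Python inheritance. This is only an illustration and not the code in HTAP_wcs.py: the class names BaseCoverageHandler and DailyFileHandler, the method resolve_file, the file name pattern and the folder layout are all assumptions made for the example.

```python
import datetime
import os


class BaseCoverageHandler:
    """Hypothetical base class: serves every request from one fixed cube file."""

    def __init__(self, cube_path):
        self.cube_path = cube_path

    def resolve_file(self, query):
        # Default behaviour: the whole coverage lives in a single NetCDF file.
        return self.cube_path

    def get_coverage(self, query):
        # A real service would pass this path to the subcube extractor;
        # this sketch only reports which file would be read.
        return {"file": self.resolve_file(query), "bbox": query.get("bbox")}


class DailyFileHandler(BaseCoverageHandler):
    """Hypothetical override: pick one daily file based on TimeSequence."""

    def __init__(self, folder, pattern="model_%Y%m%d.nc"):
        super().__init__(cube_path=None)
        self.folder = folder
        self.pattern = pattern  # assumed naming scheme, one file per day

    def resolve_file(self, query):
        time_seq = query["TimeSequence"]
        # Step 1: accept a single datetime only; ranges and lists are errors.
        if "/" in time_seq or "," in time_seq:
            raise ValueError("time ranges are not supported: " + time_seq)
        when = datetime.datetime.strptime(time_seq[:10], "%Y-%m-%d")
        # Step 2: find the daily file for that date.
        # Step 3 happens in the inherited get_coverage, which uses this path.
        return os.path.join(self.folder, when.strftime(self.pattern))


handler = DailyFileHandler("GEMAQ-v1p0")
result = handler.get_coverage(
    {"TimeSequence": "2001-01-02T00:00:00", "bbox": (-10, 30, 10, 50)})
print(result["file"])
```

Only resolve_file is overridden; the request handling in get_coverage is inherited unchanged, which is the point of using method overriding here.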