Revision as of 08:30, September 2, 2010
Glossary for Common Terms and Standard Names in Datafed WCS Wrapper Framework
AQ_uFIND
A front end to GEOSS Clearinghouse. Currently can be used to find WMS services.
Example use: AQ_uFIND.aspx?datatype=point
Capabilities Processor
This component creates the standard XML documents for the WCS and WFS services.
It operates on the metadata and the data configuration. From the metadata the capabilities document gets its Title, Abstract, Keywords, contact person and so on.
From the data configuration the processor gets the full information for each coverage.
WCS Wrapper Configuration for Cubes
WCS Wrapper Configuration for Point Data
Coverage Processor for Cubes
The Coverage Processor is a component that performs three different activities:
- WCS Query Parser. The syntax is checked and the output is a binary object holding all the query elements.
- Subsetter. This component finds the desired coverage and applies filters to read a subset of the data:
- Fields: A client querying wind data may be interested in speed and direction, but reject air pressure.
- Bounding Box: restrict the response to a certain geographical area.
- Time: default time, one time, a list of times, or a periodic range of times.
- Grid size and interpolation: high-resolution data can be interpolated to a lower resolution.
- By dimension: select only one or some of the wavelengths or elevations.
- Formatter. The binary data is returned in the desired format. Currently supported formats are NetCDF-CF for cubes and CSV, Comma Separated Values, for points.
The NetCDF-CF based processor is completely generic for any compatible netCDF-CF file.
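The bounding-box subsetting step described above can be sketched in plain Python. This is an illustration, not the framework's code; it assumes a regular lat-lon grid held as nested lists, and the function and variable names are hypothetical:

```python
def subset_grid(data, lats, lons, bbox):
    """Return the part of a regular lat-lon grid inside
    bbox = (min_lon, min_lat, max_lon, max_lat).
    `data` is a list of rows, one row per latitude."""
    min_lon, min_lat, max_lon, max_lat = bbox
    lat_idx = [i for i, v in enumerate(lats) if min_lat <= v <= max_lat]
    lon_idx = [j for j, v in enumerate(lons) if min_lon <= v <= max_lon]
    sub = [[data[i][j] for j in lon_idx] for i in lat_idx]
    return sub, [lats[i] for i in lat_idx], [lons[j] for j in lon_idx]

# A dummy 1-degree global grid with synthetic values.
lats = [v - 90 for v in range(181)]
lons = [v - 180 for v in range(360)]
data = [[i * 1000 + j for j in range(len(lons))] for i in range(len(lats))]

sub, sub_lats, sub_lons = subset_grid(data, lats, lons, (-90, 30, -60, 50))
```

The real Coverage Processor works on netCDF-CF variables and also handles time selection and interpolation, but the filtering idea is the same.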
Coverage Processor for Points
The Coverage Processor for points is a component that performs three different activities:
- WCS Query Parser. The syntax is checked and the output is a binary object holding all the query elements.
- SQL Query Compiler.
- Fields:
- Select the required fields
- Join in the location table
- Join in the tables for the fields
- Bounding Box:
- Add a where filter by location lat and lon.
- Time: Add a filter by the data datetime.
- Location: if requested, add a filter by location.
- Formatter. The table data is returned in the desired format. Currently the only supported format is CSV, Comma Separated Values.
SQL processors can either be configured for supported DB schema types, or custom written for esoteric databases.
By writing a custom processor, anything can be used as a data source.
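As an illustration of the SQL Query Compiler step, here is a minimal sketch. The function name and the column names (lat, lon, datetime) are assumptions modeled on the complete_view configuration shown elsewhere in this glossary; the real compiler also handles table and column aliasing and joins:

```python
def compile_point_query(view, columns, bbox=None, times=None):
    """Build a parameterized SELECT over a complete view, adding WHERE
    filters for the bounding box and the requested datetimes."""
    sql = "select %s from %s" % (', '.join(columns), view)
    where = []
    params = []
    if bbox:
        min_lon, min_lat, max_lon, max_lat = bbox
        where.append("(lat between ? and ?) and (lon between ? and ?)")
        params += [min_lat, max_lat, min_lon, max_lon]
    if times:
        where.append("datetime in (%s)" % ', '.join('?' * len(times)))
        params += list(times)
    if where:
        sql += " where " + " and ".join(where)
    return sql, params

sql, params = compile_point_query(
    'TEMP_V', ['loc_code', 'lat', 'lon', 'datetime', 'temp', 'flag'],
    bbox=(-90, 34, -80, 44), times=['2009-09-01 12:00:00'])
```

Using placeholders and a parameter list, instead of pasting values into the SQL string, keeps the generated query safe against malformed input.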
Custom Coverage Processor for Cubes
A custom module for data storage that is not supported directly.
Example: HTAP_wcs.py custom WCS module
The HTAP demo service stores data in daily netCDF-CF files. The custom module must:
- Check that only one datetime in the TimeSequence is used, because getting a time series from separate files is not supported.
- Locate the correct file. Normally the file name is just the coverage identifier plus the '.nc' extension. In this case the template is GEMAQ-v1p0_SR1_sfc_%(year)s_%(doy)s.nc, where year and doy (the Julian day of year) get replaced.
HTAP_wcs.py implements this by inheriting from the default netCDF-CF processor and overriding its _input_file method.
    def _input_file(self, query):
        # Check that the query has exactly one datetime.
        if len(query.time) != 1:
            raise owsutil.OwsError('CustomError',
                'time parameter must match exactly one datetime')
        # Get the single datetime.
        datetimes = query.time.all_datetimes()
        dt = datetimes[0]
        # Now get the file template from the configuration.
        metadata = self._load_config(query)[query.identifier]
        files = metadata['files']
        ncfile = files['template'] % {'year': str(dt.year),
                                      'doy': str(iso_time.day_of_year(dt)).zfill(3)}
        # Now we have the filename; get the folder and return the full file path.
        src_root, last = os.path.split(__file__)
        while last != 'web':
            src_root, last = os.path.split(src_root)
        return os.path.join(src_root, files['path'], ncfile)
By writing a custom processor, anything can be used as a data source.
Custom Coverage Processor for Points
The CIRA/VIEWS service cannot access the database directly due to a firewall. The SQL query can still be compiled from the configuration, but it has to be executed via a proxy server. The custom processor overrides the getcoverage method and does just that.
By writing a custom processor, anything can be used as a data source.
Data Configuration for Cubes
For standard netCDF-CF files, the configuration is automatic.
- Each file becomes a coverage.
- Each variable becomes a field.
This is by far the easiest way to create a WCS service. Examples are testprovider, which comes with the installation package, and NASA, which serves some datasets downloaded from NASA.
For daily netCDF-CF files it is possible to create a service without compiling them into a single file. See Serving data from periodic collection of NetCDF files as an example.
By creating a custom handler, it is possible to use anything as a data source.
Data Configuration for Points
Programmed instructions that tell the framework how to access the data.
There should be a netCDF-CF-like convention for SQL databases. That would allow the point coverage processor to just connect to the DB and serve, without any other configuration. But since no such convention exists, manual configuration is needed.
From file point_config.py
Coverage information and its description:

point_info = {
    'SURF_MET': {
        'Title': 'Surface Meteorological Observations',
        'Abstract': 'Dummy test data.',
The covered area and time. The time dimension is a true dimension here, but contrary to grid data, the X-Y dimensions for point data are not dimensions but attributes of the location dimension. The time dimension format is ISO 8601 (start-inclusive)/(end-inclusive)/periodicity: PT1H means a periodicity of one hour, and P1D would mean a periodicity of one day.
        'axes': {
            'X': (-180, 179.75),
            'Y': (-90, 89.383),
            'T': iso_time.parse('2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H'),
        },
Then comes the description of the fields.
        'fields': {
            'TEMP': {
                'Title': 'Temperature',
                'datatype': 'float',
                'units': 'deg F',
The location table is a real dimension; latitude and longitude are attributes along the location axis, not dimensions themselves. So a typical point dataset with locations and regular time intervals is a 2-dimensional dataset. In this case the location table is shared, so we use the previously declared variable location_info. If the location tables are parameter specific, they need to be specified individually.
                'axes': location_info,
The access instructions. This configuration uses 'complete_view', so the administrator has created a view that joins together the location table and the temperature data table. The SQL query will typically look like select loc_code, lat, lon, datetime, temp, flag from TEMP_V where datetime = '2009-09-01' and (lat between 34 and 44) and (lon between -90 and -80). This is by far the easiest way to configure the WCS.
                'complete_view': {
                    'view_alias': 'TEMP_V',
                    'columns': ['loc_code', 'lat', 'lon', 'datetime', 'temp', 'flag'],
                },
            },
By creating a custom handler, it is possible to store data anywhere. You still need to declare the configuration in a python dictionary, just like above.
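To see the complete_view idea end to end, here is a self-contained sketch using the sqlite3 module that ships with Python. The table and view names mirror the example above; the data is dummy:

```python
import sqlite3

# Build an in-memory database shaped like the configuration example:
# a location table, a data table, and a joining view TEMP_V.
con = sqlite3.connect(':memory:')
con.executescript("""
create table location (loc_code text primary key, lat real, lon real);
create table TEMP_base (loc_code text, datetime text, temp real, flag text);
create view TEMP_V as
    select l.loc_code, l.lat, l.lon, t.datetime, t.temp, t.flag
    from location l inner join TEMP_base t on t.loc_code = l.loc_code;
insert into location values ('KSTL', 38.75, -90.37);
insert into TEMP_base values ('KSTL', '2009-09-01', 78.6, '');
""")

# The kind of query the point coverage processor would generate.
rows = con.execute(
    "select loc_code, lat, lon, datetime, temp, flag from TEMP_V "
    "where datetime = '2009-09-01' "
    "and (lat between 34 and 44) and (lon between -91 and -80)").fetchall()
```

Because the view already joins the location attributes onto each data row, the processor never needs to know the underlying table layout.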
Datafed Browser
TODO: describe classic browser
TODO: describe browsing WCS without df catalog http://webapps.datafed.net/datafed.aspx?wcs=http://128.252.202.19:8080/CIRA&coverage=VIEWS&param_abbr=SO4f
TODO: Describe GE Plugin browser
Feature Processor for Points
Web Feature Service, WFS, is well suited to publishing geographic information that does not change over time.
With datafed WCS it is used to publish the location table for point data: the WCS DescribeCoverage document does not support such rich dimensions well, and location tables are static geographic information.
This component performs three different activities:
- WFS Query Parser. The syntax is checked and the output is a binary object holding all the query elements.
- Subsetter.
- Each field may have a different location table. If the data is sparse, so that some fields have data only in a few locations, it makes sense to return only those locations.
- Locations may also be filtered by geographic bounding box.
- Other WFS filters are not implemented.
- Formatter. The data is returned in the desired format. Currently the only supported format is CSV, Comma Separated Values.
GEOSS Clearinghouse
The Clearinghouse is a component in the GEOSS Common Infrastructure. One of its functions is the GEOSS Components and Services Registry.
Google Earth
Google Earth is a standalone program that can display data on a 3D globe.
Google Earth Plugin is a browser extension that puts an interactive Google Earth on a web page.
ISO 19115 Metadata
Description of a service, with a strictly defined XML presentation. Contains service URLs and metadata about the service.
ISO 19115 Maker
A public service to create an ISO 19115 record from a WCS or WMS service.
If the Capabilities document contains necessary keywords, the document can be created automatically: ISO 19115 for AIRNOW pmfine WMS.
Without the keywords, the metadata can be passed via URL parameters.
KML Keyhole Markup Language
KML is the way to describe content in Google Earth and Google Maps. KML documentation is hosted by google.
KML Maker
Datafed tools produce KML directly from data, which can be served by WCS or WMS services.
KML from a CIRA/VIEWS showing SO4f and direct link
KML from NASA giovanni WMS and direct link
Fast Precompiled Examples:
Location Table for Points
The location table describes the location dimension for point data.
The fields that datafed uses are:
- Mandatory fields:
- loc_code: A unique text field, used to identify a location.
- lat: Latitude of the location in degrees_north.
- lon: Longitude of the location in degrees_east.
- Optional datafed fields:
- loc_name: Reasonably short text describing location.
- elev: elevation in meters.
- data specific fields:
- Any field with any name
Good loc_codes are short abbreviations like ACAD and YOSE for Acadia and Yosemite National Parks. Completely numeric loc_codes are possible, but they are more difficult to recognize, and since leading zeros are significant, tools like Excel may treat them as numbers and cut the zeros off.
If the loc_codes are long, say 9 characters, it is useful to generate a numeric 16-bit primary key for the location table and use it for joining the data tables with the location table. This may help in indexing and speed things up quite a bit.
Example: CIRA/VIEWS location table
Part of the CSV response:
loc_code, loc_name, lat, lon, elev
WHRI1, White River NF, 39.1536, -106.8209, 3413.5
WICA1, Wind Cave, 43.5576, -103.4838, 1296
WIMO1, Wichita Mountains, 34.7323, -98.713, 509
YELL2, Yellowstone NP 2, 44.5653, -110.4002, 2425
Metadata
Abstract, Contact Information, Keywords and any other such documentation that is needed in classifying or finding the service. The metadata is accessible for the user via capabilities and coverage description documents.
Every provider should have a wcs_capabilities.conf that lists keywords and contact information. The format is simple: copy one from the testprovider and edit it.
# this file provides some information about the provider
# and is incorporated into the respective WCS responses.
# all currently available field identifiers are listed below.
# please define every identifier only once.
# other identifiers will be ignored, input is case sensitive.
# the format is always <identifier>: <value>.
# whitespaces before and after <value> will be stripped.
# KEYWORDS can take a comma separated list that will then be
# included in the respective keyword tags
# empty lines and lines starting with "#" will be ignored.
PROVIDER_TITLE: National Climate Data Center
PROVIDER_ABSTRACT: National Climate Data Center is the worlds largest archive of climate data.
KEYWORDS: Domain:Aerosol, Platform:Network, Instrument:Unknown, DataType:Point, Distributor:DataFed, Originator:NCDC, TimeRes:Minute, Vertical:Surface, TopicCategory:climatologyMeteorologyAtmosphere
FEES: NONE
CONSTRAINTS: NONE
PROVIDER_SITE: http://lwf.ncdc.noaa.gov/oa/ncdc.html
CONTACT_INDIVIDUAL: Climate Contact, Climate Services Branch, National Climatic Data Center
CONTACT_PHONE: 828-271-4800
CONTACT_STREET: 151 Patton Avenue Room 468
CONTACT_CITY: Asheville
CONTACT_ADM_AREA: North Carolina
CONTACT_POSTCODE: 28801-5001
CONTACT_COUNTRY: USA
CONTACT_EMAIL: ncdc.info@noaa.gov
Here is the real live NCDC wcs_capabilities.conf
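A minimal parser for this format, following the rules stated in the comments above (one '<identifier>: <value>' per line, whitespace stripped, the first definition of an identifier wins, KEYWORDS split on commas), could look like this sketch:

```python
def parse_capabilities_conf(text):
    """Parse the wcs_capabilities.conf format: '<identifier>: <value>'
    lines, '#' comments and blank lines ignored, whitespace stripped,
    every identifier defined only once, KEYWORDS split on commas."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or ':' not in line:
            continue
        key, value = line.split(':', 1)
        key, value = key.strip(), value.strip()
        if key in info:
            continue  # define every identifier only once
        if key == 'KEYWORDS':
            info[key] = [k.strip() for k in value.split(',')]
        else:
            info[key] = value
    return info

sample = """\
# comment line, ignored
PROVIDER_TITLE: testprovider
PROVIDER_TITLE: duplicate, ignored
KEYWORDS: Domain:Aerosol, DataType:Point
FEES: NONE
"""
info = parse_capabilities_conf(sample)
```

Note that splitting on the first ':' only is what keeps values such as URLs and the Domain:Aerosol style keywords intact.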
NetCDF-CF
The NetCDF file format contains four kinds of information:
- Global attributes
- Simple name=value pairs
- Dimensions
- Only declares the length of the dimension
- Contains no dimension data.
- Variables
- Array data with any number of dimensions.
- Zero dimensions meaning scalar data.
- Variable Attributes:
- Simple name=value pairs associated to a variable.
While these are enough to describe any data, it is not easy to interpret what the data actually means. What is self-evident for humans is difficult for a computer program to reason about. If you have a NetCDF viewer, it should be possible to just open the file and display the data on a geographic map. But making a program that can automatically find the geographic dimensions in a NC file is very difficult.
Conventions come to the rescue. CF-1.0 standardizes many things:
- Standard name: what the measured data is about
- Units
- How to tell that a variable is one of the following:
- Data Variable, containing real data.
- Dimension Coordinate Variable, containing dimension coordinates.
- Dimension Bounds Variable, containing lower and upper bounds of a dimension coordinate.
- Projection
With this part standardized, your program can list the data variables for you and tell you exactly what you can filter by.
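The variable classification above can be sketched over plain dictionaries. This is an illustration of the CF idea, not a netCDF reader: a variable whose name matches a dimension is treated as a coordinate variable, and a variable named by another variable's bounds attribute holds dimension bounds. The sample names are hypothetical:

```python
def classify_variables(dimensions, variables):
    """Classify each variable as 'coordinate', 'bounds' or 'data'
    using the CF rules described above.  `variables` maps each
    variable name to its attribute dictionary."""
    bounds_names = {attrs.get('bounds') for attrs in variables.values()}
    kinds = {}
    for name in variables:
        if name in dimensions:
            kinds[name] = 'coordinate'
        elif name in bounds_names:
            kinds[name] = 'bounds'
        else:
            kinds[name] = 'data'
    return kinds

dims = ['time', 'lat', 'lon']
variables = {
    'time': {'units': 'hours since 2009-09-01'},
    'lat': {'units': 'degrees_north', 'bounds': 'lat_bnds'},
    'lon': {'units': 'degrees_east'},
    'lat_bnds': {},
    'temp': {'standard_name': 'air_temperature', 'units': 'K'},
}
kinds = classify_variables(dims, variables)
```

With this classification, a program can list only the data variables to the user and treat the rest as axes.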
Links:
CF 1.0 - 1.4 contain conventions for cube data.
The unofficial CF-1.5 contains the point data encoding. The site has an expired certificate; add a security exception.
Point Data Example
All of the configuration is done using Python dictionaries and lists. The syntax is simple. This is a list:
['loc_code', 'lat', 'lon']
and this is a dictionary:
{'key1':'value1', 'key2': 'value2' }
The test provider point is an example of how to configure this service to use an SQL database to serve point data.
The data for the demonstration is stored in sqlite, which is distributed with Python by default. The project has the following files:
- pntdata.py: This script creates the test database and fills it with dummy data.
- pntdata.db: The sqlite database file created by pntdata.py
- point_config.py:
- Declares the location table in the SQL database.
- Maps the coverages and fields to SQL tables.
- point_WCS.py is the custom WCS handler
- point_WFS.py is the custom WFS handler that delivers the location table.
Location Configuration for Points
From file point_config.py
In its simplest case, SQL views are used to create the required location table, so no aliasing is needed.
location_info = {
    'location': {
        'service': 'WFS',
        'version': '1.0.0',
    },
}
These are the standard names that datafed uses:
- The dimension name is "location".
- No aliasing is needed, since the DB table/view and column names are standard.
- The view/table name in the DB is "location".
- The columns are "lat", "lon" and "loc_code", and loc_code is a text type, not an integer.
In the CIRA/VIEWS case, the table and fields are aliased:
VIEWS_location_info = {
    'location': {
        'service': 'WFS',
        'version': '1.0.0',
        'table_alias': 'Site',
        'columns': {
            'loc_code': {'column_alias': 'SiteCode'},
            'lat': {'column_alias': 'Latitude'},
            'lon': {'column_alias': 'Longitude'},
        }
    },
}
- The dimension name is still "location"
- The location table is called "Site"
- "SiteCode", "Latitude" and "Longitude" are aliased to "loc_code", "lat" and "lon".
SQL Database for Points
Currently the datafed WCS for points supports one kind of point data: Fixed locations and regular intervals.
Point data is often stored in SQL databases. There is no standard schema like the CF-1.0 convention for NetCDF files, so it is not possible to just connect and start serving. You have to create the configuration description.
One of the most powerful ideas in relational database design is the concept of a view. You don't need to change the existing data tables; creating a view that makes your DB look like the required one is enough.
The Simplest Case: Configure with SQL Views
Location Table/View
What different databases have in common is that they all need a location table. The current implementation is based on time series from stationary locations.
table/view location
+----------+-------+---------+-------+
| loc_code | lat   | lon     | elev  |
+----------+-------+---------+-------+
| KMOD     | 37.63 | -120.95 |  30.0 |
| KSTL     | 38.75 |  -90.37 | 172.0 |
| KUGN     | 42.42 |  -87.87 | 222.0 |
| ...      |       |         |       |
+----------+-------+---------+-------+
Here loc_code is the primary key and lat, lon is the location. Optional fields can be added. Your database may have a location table with a different name and different field names, but that does not matter. The CIRA VIEWS database has a location table, but it is called Site and spells out its longitude field in full. The datafed browser uses the standard names loc_code, loc_name, lat and lon for browsing; these names are needed for plug-and-play compatibility. In the CIRA VIEWS database, the view creation would be:
create view location as
    select SiteCode as loc_code, Latitude as lat, Longitude as lon
    from Site
The primary key is loc_code, being unique for all the locations.
Because WCS does not have a good place to describe a location table, we use WFS, Web Feature Service, to do that. Sample WFS Call.
Data Views
Each data variable needs a view. For example:
create view TEMP_V as
    select location.loc_code, location.lat, location.lon,
           TEMP_base.datetime, TEMP_base.temp, TEMP_base.flag
    from location
    inner join TEMP_base on TEMP_base.loc_code = location.loc_code
Each parameter has its own data view that looks like
view TEMP_V
+----------+-------+---------+------------+------+------+
| loc_code | lat   | lon     | datetime   | temp | flag |
+----------+-------+---------+------------+------+------+
| KMOD     | 37.63 | -120.95 | 2009-06-01 | 87.8 | X    |
| KMOD     | 37.63 | -120.95 | 2009-06-02 | 82.3 |      |
| KSTL     | 38.75 |  -90.37 | 2009-06-01 | 78.6 |      |
| ...      |       |         |            |      |      |
+----------+-------+---------+------------+------+------+
view DEWP_V
+----------+-------+---------+------------+------+
| loc_code | lat   | lon     | datetime   | dewp |
+----------+-------+---------+------------+------+
| KMOD     | 37.63 | -120.95 | 2009-06-01 | 51.4 |
| KMOD     | 37.63 | -120.95 | 2009-06-02 | 51.4 |
| KSTL     | 38.75 |  -90.37 | 2009-06-01 | 34.9 |
| ...      |       |         |            |      |
+----------+-------+---------+------------+------+
WCS Capabilities Document
The document contains all the high level information about a service.
- Metadata
- List of Coverages.
From NRL service at datafed:
The Document contains:
- Description of the Service
- Title=Naval Research Laboratory
- Abstract=Naval Research Laboratory Marine Meteorology Division
- Keywords Domain:Aerosol etc...
- Contact information
- HTTP access information
- List of coverages in the service
- Identifier=NAAPS
- Title=NOGAPS and NAAPS fields in press coordinate
- Abstract=none
- Latitude and Longitude bounds = whole world
WCS Describe Coverage Document
The document describes the coverage in detail, so that the user knows what the data is and what the dimensions of the data are.
Cube Data:
- Title=NOGAPS and NAAPS fields in press coordinate
- Abstract=none
- Identifier=NAAPS
- Latitude and Longitude bounds in the EPSG:4326 and WGS 84 projections, regardless of the data's own projection.
- Grid bounds and size in the WGS 84 projection, which is currently the only supported projection.
- Time dimension
- Fields of Coverage. The example is the second on the list.
- Title=specific humidity - nogaps
- Identifier=sphu
- Units of measure, UOM=(g/g)
- NullValue=nan, the IEEE standard value for not-a-number.
- InterpolationMethods=none
- Elevation dimension, values from 1000 to 10 mbar.
- Supported Coordinate Systems, at the very end, linear lat-lon urn:ogc:def:crs:EPSG::4326 and lon-lat urn:ogc:def:crs:OGC:2:84
- Supported Formats: image/netcdf and application/x-netcdf are the same format, netCDF-CF. There is a need to standardize this format.
WCS GetCoverage Query
The main query to get data from a WCS
CIRA/VIEWS small geo range for one datetime &store=false
CIRA/VIEWS small geo range for one datetime &store=true
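Assembling such a GetCoverage URL can be sketched in Python. The parameter spellings below are an assumption based on the WCS 1.1 key-value-pair style used in the examples above; check them against the actual service before relying on this sketch:

```python
from urllib.parse import urlencode

def getcoverage_url(base, identifier, fields, bbox, timestamp,
                    fmt='text/csv', store=False):
    """Assemble a WCS 1.1 GetCoverage URL.  Parameter names are
    illustrative, modeled on the KVP requests shown in this glossary."""
    params = {
        'service': 'WCS',
        'version': '1.1.2',
        'request': 'GetCoverage',
        'identifier': identifier,
        'rangesubset': fields,
        'boundingbox': '%s,urn:ogc:def:crs:OGC:2:84'
                       % ','.join(str(v) for v in bbox),
        'timesequence': timestamp,
        'format': fmt,
        'store': 'true' if store else 'false',
    }
    return base + '?' + urlencode(params)

url = getcoverage_url('http://128.252.202.19:8080/CIRA', 'VIEWS',
                      'SO4f', (-90, 34, -80, 44), '2009-09-01T12:00:00')
```

urlencode takes care of percent-encoding the colons and commas in the bounding box and time values.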
WFS Capabilities Document
The document contains all the high level information about a service
TODO: implement
TODO: samples
WFS DescribeFeatureType
TODO: implement
TODO: samples
WFS GetFeature Query
The main query to get data from a WFS