Glossary

Glossary for Common Terms and Standard Names in Datafed WCS Wrapper Framework

AQ_uFIND

A front end to the GEOSS Clearinghouse. It can currently be used to find WMS services.

Example use: AQ_uFIND.aspx?datatype=point

Capabilities Processor

This component creates the standard XML documents for WCS and WFS services.

It operates on Metadata and data configuration. From Metadata the capabilities document gets the Title, Abstract, Keywords, Contact person, and so on. From the data configuration the processor gets full information about each coverage.

WCS Wrapper Configuration for Cubes

WCS Wrapper Configuration for Point Data

Coverage Processor

The Coverage Processor is a component that performs three different activities:

WCS Query Parser. The syntax is checked and the output is a binary object containing all the query elements.
Subsetter. This component finds the desired coverage and applies filters to read a subset of the data:
- Fields: A client querying wind data may be interested in speed and direction, but reject air pressure.
- Bounding Box: restrict response to certain geographical area.
- Time: Default time, One time, list of times, periodic range of times.
- Grid size and interpolation: High resolution data can be interpolated to lower resolution.
- By dimension: Select only one or some wavelengths, elevations, or locations.
Formatter. The binary data is returned in the desired format. Currently supported are NetCDF-CF for cubes and CSV (Comma Separated Values) for points.
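
The three activities can be sketched for point data as follows. This is only an illustration under stated assumptions, not the framework's code: the function names parse_query, subset and format_csv, the query keys, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical sample rows; the real framework reads NetCDF-CF files or SQL.
ROWS = [
    {"loc_code": "ACAD", "lat": 44.38, "lon": -68.26,
     "datetime": "2009-09-01T12:00:00", "speed": 3.1, "direction": 270.0, "pressure": 1012.0},
    {"loc_code": "YOSE", "lat": 37.71, "lon": -119.70,
     "datetime": "2009-09-01T12:00:00", "speed": 1.4, "direction": 90.0, "pressure": 1009.0},
    {"loc_code": "ACAD", "lat": 44.38, "lon": -68.26,
     "datetime": "2009-09-01T13:00:00", "speed": 2.8, "direction": 265.0, "pressure": 1011.0},
]
KEY_COLUMNS = ["loc_code", "lat", "lon", "datetime"]


def parse_query(params):
    """Query parser: check the syntax and return the query elements."""
    west, south, east, north = (float(v) for v in params["bbox"].split(","))
    if west > east or south > north:
        raise ValueError("bbox must be west,south,east,north")
    return {
        "fields": params["fields"].split(","),
        "bbox": (west, south, east, north),
        "time": params.get("time"),  # None means "no time filter" in this sketch
    }


def subset(rows, query):
    """Subsetter: filter by bounding box and time, keep only requested fields."""
    west, south, east, north = query["bbox"]
    keep = KEY_COLUMNS + query["fields"]
    for row in rows:
        if not (west <= row["lon"] <= east and south <= row["lat"] <= north):
            continue
        if query["time"] is not None and row["datetime"] != query["time"]:
            continue
        yield {name: row[name] for name in keep}


def format_csv(rows, columns):
    """Formatter: return the subset as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


query = parse_query({"fields": "speed,direction",
                     "bbox": "-80,30,-60,50",
                     "time": "2009-09-01T12:00:00"})
result = format_csv(subset(ROWS, query), KEY_COLUMNS + query["fields"])
```

Here the client asked for speed and direction only, so the pressure column never reaches the formatter.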

The NetCDF-CF based processor is completely generic for any compatible netCDF-CF file.

SQL processors can either be configured for the supported DB schema types or custom written for esoteric databases.

By writing a custom processor, anything can be used as a data source.

Cube Data Configuration

For standard netCDF-CF files, the configuration is automatic. Each file becomes a coverage, and each variable becomes a field. This is by far the easiest way to create a WCS service. Examples are testprovider which comes with the installation package, and NASA which serves some datasets downloaded from NASA.

For daily netCDF-CF files it is possible to create a service without compiling them into a single file. See Serving data from periodic collection of NetCDF files as an example.

By creating a custom handler, it is possible to store data anywhere.

Datafed Browser

TODO: describe classic browser

TODO: describe browsing WCS without df catalog http://webapps.datafed.net/datafed.aspx?wcs=http://128.252.202.19:8080/CIRA&coverage=VIEWS&param_abbr=SO4f

TODO: Describe GE Plugin browser

Feature Processor

Web Feature Service, WFS, is well suited to publishing geographic information that does not change over time.

With datafed WCS it is used to publish the location table for point data, because the WCS DescribeCoverage document does not support such rich dimensions well and location tables are static geographic information.

The Feature Processor is a component that performs three different activities:

WFS Query Parser. The syntax is checked and the output is a binary object containing all the query elements.

Subsetter.
- Each field may have a different location table. If the data is sparse, so that some fields have data at only a few locations, it makes sense to return only those locations.
- Locations may also be filtered by geographic bounding box.
- Other WFS filters are not implemented.

Formatter. The data is returned in the desired format. Currently the only supported format is CSV (Comma Separated Values).
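
The idea of a per-field location table can be sketched like this. The dictionaries, the approximate coordinates, and the function name location_table are hypothetical and only illustrate the filtering; the framework reads this information from its own configuration.

```python
# Hypothetical location table: loc_code -> (lat, lon), approximate values.
LOCATIONS = {
    "ACAD": (44.38, -68.26),
    "YOSE": (37.71, -119.70),
    "GRCA": (36.07, -112.12),
}

# Hypothetical sparse data: the locations that actually report each field.
FIELD_LOCATIONS = {
    "TEMP": {"ACAD", "YOSE", "GRCA"},
    "SO4f": {"ACAD", "YOSE"},
}


def location_table(field, bbox=None):
    """Return CSV text listing only the locations that have data for `field`.

    bbox is an optional (west, south, east, north) tuple in degrees.
    """
    lines = ["loc_code,lat,lon"]
    for code in sorted(FIELD_LOCATIONS[field]):
        lat, lon = LOCATIONS[code]
        if bbox is not None:
            west, south, east, north = bbox
            if not (west <= lon <= east and south <= lat <= north):
                continue
        lines.append(f"{code},{lat},{lon}")
    return "\n".join(lines) + "\n"


sparse = location_table("SO4f")
western = location_table("TEMP", bbox=(-125, 30, -100, 40))
```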

GEOSS Clearinghouse

The Clearinghouse is a component in the GEOSS Common Infrastructure. One of its functions is the GEOSS Components and Services Registry.

Google Earth

http://earth.google.com/

TODO: describe standalone and plugin

describe images and static points

describe dynamic points

ISO 19115 Metadata

Description of a service, with a strictly defined XML presentation. Contains service URLs and metadata about the service.

ISO 19115 Maker

A public service to create an ISO 19115 record from a WCS or WMS service.

If the Capabilities document contains necessary keywords, the document can be created automatically: ISO 19115 for AIRNOW pmfine WMS.

Without keywords in the Capabilities document, the metadata can be passed via URL parameters.

KML Keyhole Markup Language

KML is the way to describe content in Google Earth and Google Maps. KML documentation is hosted by Google.

KML Maker

Datafed tools produce KML directly out of data, which can be produced with WCS or WMS services.

KML from a CIRA/VIEWS showing SO4f and direct link

KML from NASA giovanni WMS and direct link

Precompiled examples:

Point Demo

Gridded Demo

Location Table

The location table describes the location dimension for point data.

The fields that datafed uses are:

Mandatory fields:

- loc_code: A unique text field, used to identify a location.
- lat: Latitude of the location in degrees_north
- lon: Longitude of the location in degrees_east

Optional datafed fields:

- loc_name: Reasonably short text describing location.
- elev: elevation in meters.

Data-specific fields:
- Any field with any name

Good loc_codes are short abbreviations like ACAD and YOSE for Acadia and Yosemite National Parks. Completely numeric loc_codes are possible, but they are more difficult to recognize. Leading zeros are significant, and tools like Excel may treat such codes as numbers and strip the zeros.

If the loc_codes are long, like 9 characters, it's useful to generate a numeric 16-bit primary key for the location table and use it for joining the data tables with the location table. This may help in indexing and speed things up quite a bit.
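
A minimal sketch of such a schema, using Python's built-in sqlite3 module. The table names, the column names other than the datafed fields listed above, and the sample rows are hypothetical. SQLite has no 16-bit integer type, so INTEGER stands in for the small numeric key here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (
        loc_id   INTEGER PRIMARY KEY,   -- compact numeric key used for joins
        loc_code TEXT NOT NULL UNIQUE,  -- TEXT keeps leading zeros intact
        lat      REAL NOT NULL,         -- degrees_north
        lon      REAL NOT NULL,         -- degrees_east
        loc_name TEXT,
        elev     REAL                   -- meters
    );
    CREATE TABLE temp_data (
        loc_id   INTEGER NOT NULL REFERENCES location(loc_id),
        datetime TEXT NOT NULL,
        temp     REAL
    );
    CREATE INDEX temp_data_idx ON temp_data (loc_id, datetime);
""")

# Hypothetical sample locations; the second has a long, all-numeric loc_code.
conn.executemany(
    "INSERT INTO location (loc_code, lat, lon, loc_name) VALUES (?, ?, ?, ?)",
    [
        ("ACAD", 44.38, -68.26, "Acadia National Park"),
        ("010730023", 33.55, -86.82, "Hypothetical numeric-code site"),
    ],
)

# Look up the generated keys and store data rows against them.
loc_id = {code: key for key, code in conn.execute("SELECT loc_id, loc_code FROM location")}
conn.executemany(
    "INSERT INTO temp_data VALUES (?, ?, ?)",
    [
        (loc_id["ACAD"], "2009-09-01T12:00:00", 61.0),
        (loc_id["010730023"], "2009-09-01T12:00:00", 84.5),
    ],
)

# The join uses the numeric key, but the client still sees the text loc_code.
rows = conn.execute("""
    SELECT l.loc_code, l.lat, l.lon, d.datetime, d.temp
    FROM temp_data AS d JOIN location AS l ON l.loc_id = d.loc_id
    ORDER BY l.loc_code
""").fetchall()
```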

Example: CIRA/VIEWS location table

Metadata

Abstract, Contact Information, Keywords and any other such documentation that is needed in classifying or finding the service. The metadata is accessible for the user via capabilities and coverage description documents.

Every provider should have a wcs_capabilities.conf that lists keywords and contact information. The format is simple: copy one from the testprovider and edit it.

   # this file provides some information about the provider
   # and is incorporated into the respective WCS responses.
   # all currently available field identifiers are listed below.
   # please define every identifier only once.
   # other identifiers will be ignored, input is case sensitive.
   # the format is always <identifier>: <value>.
   # whitespaces before and after <value> will be stripped.
   # KEYWORDS can take a comma separated list that will then be
   # included in the respective keyword tags
   # empty lines and lines starting with "#" will be ignored.
   PROVIDER_TITLE: National Climatic Data Center
   PROVIDER_ABSTRACT: National Climatic Data Center is the world's largest archive of climate data.
   KEYWORDS: Domain:Aerosol, Platform:Network, Instrument:Unknown, DataType:Point, Distributor:DataFed, Originator:NCDC, TimeRes:Minute, Vertical:Surface, TopicCategory:climatologyMeteorologyAtmosphere
   FEES: NONE
   CONSTRAINTS: NONE
   PROVIDER_SITE: http://lwf.ncdc.noaa.gov/oa/ncdc.html
   CONTACT_INDIVIDUAL: Climate Contact, Climate Services Branch, National Climatic Data Center
   CONTACT_PHONE: 828-271-4800
   CONTACT_STREET: 151 Patton Avenue Room 468
   CONTACT_CITY: Asheville 
   CONTACT_ADM_AREA: North Carolina 
   CONTACT_POSTCODE: 28801-5001
   CONTACT_COUNTRY: USA
   CONTACT_EMAIL: ncdc.info@noaa.gov

Here is the real live NCDC wcs_capabilities.conf
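
The rules stated in the file's comments can be sketched as a small parser. This is not the framework's own reader; the function name parse_capabilities_conf is hypothetical, and the set of known identifiers is simply copied from the sample above.

```python
# Identifiers taken from the sample file above.
KNOWN_IDENTIFIERS = {
    "PROVIDER_TITLE", "PROVIDER_ABSTRACT", "KEYWORDS", "FEES", "CONSTRAINTS",
    "PROVIDER_SITE", "CONTACT_INDIVIDUAL", "CONTACT_PHONE", "CONTACT_STREET",
    "CONTACT_CITY", "CONTACT_ADM_AREA", "CONTACT_POSTCODE", "CONTACT_COUNTRY",
    "CONTACT_EMAIL",
}


def parse_capabilities_conf(text):
    """Parse '<identifier>: <value>' lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comments are ignored
        # Split on the FIRST colon only: values such as URLs and
        # 'Domain:Aerosol' keywords contain colons themselves.
        identifier, separator, value = line.partition(":")
        identifier = identifier.strip()
        if not separator or identifier not in KNOWN_IDENTIFIERS:
            continue  # other identifiers are ignored; input is case sensitive
        value = value.strip()
        if identifier == "KEYWORDS":
            value = [kw.strip() for kw in value.split(",") if kw.strip()]
        info[identifier] = value
    return info


sample = """
# a comment line
KEYWORDS: Domain:Aerosol, DataType:Point
FEES:   NONE
PROVIDER_SITE: http://lwf.ncdc.noaa.gov/oa/ncdc.html
Fees: lower case is a different identifier
"""
conf = parse_capabilities_conf(sample)
```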

NetCDF-CF

NetCDF file format contains four kinds of information:

Global attributes
- Simple name=value pairs

Dimensions
- Only declares the length of the dimension
- Contains no dimension data.

Variables
- Array data with any number of dimensions.
- Zero dimensions meaning scalar data.

Variable Attributes:
- Simple name=value pairs associated to a variable.

While these are enough to describe any data, it's not easy to interpret what the data actually means. What is self-evident for humans is difficult for a computer program to reason about. If you have a NetCDF viewer, it should be possible to just open the file and display the data on a geographic map. But making a program that can automatically get the geographic dimensions from a NetCDF file is very difficult.

Conventions come to the rescue. CF-1.0 standardizes many things:

Standard name: what the measured data is about
Units
How to tell that a variable is one of the following:
- Data Variable, containing real data.
- Dimension Coordinate Variable, containing dimension coordinates.
- Dimension Bounds Variable, containing lower and upper bounds of a dimension coordinate.
Projection

With this information, your program can list the data variables for you and tell you exactly what you can filter by.
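
As a rough sketch of how a program can tell the variable kinds apart, the following works on a plain dictionary description of a file instead of a real NetCDF library. The file layout and the function name classify are hypothetical. The two rules used are from CF: a coordinate variable is a one-dimensional variable named after its dimension, and a bounds variable is the one named by a coordinate variable's bounds attribute. Real files may also contain grid mapping and auxiliary coordinate variables, which this simplified sketch would report as data.

```python
# Hypothetical in-memory description of a NetCDF-CF file.
DIMENSIONS = {"time": 24, "lat": 180, "lon": 360, "bnds": 2}
VARIABLES = {
    "time": {"dims": ("time",), "attrs": {"units": "hours since 2009-09-01 00:00:00"}},
    "lat": {"dims": ("lat",), "attrs": {"units": "degrees_north", "bounds": "lat_bnds"}},
    "lon": {"dims": ("lon",), "attrs": {"units": "degrees_east"}},
    "lat_bnds": {"dims": ("lat", "bnds"), "attrs": {}},
    "temp": {"dims": ("time", "lat", "lon"),
             "attrs": {"standard_name": "air_temperature", "units": "K"}},
}


def classify(dimensions, variables):
    """Split variables into coordinate, bounds and data variables."""
    coords = {
        name for name, var in variables.items()
        if name in dimensions and var["dims"] == (name,)
    }
    bounds = {
        variables[name]["attrs"]["bounds"]
        for name in coords
        if "bounds" in variables[name]["attrs"]
    }
    data = set(variables) - coords - bounds
    return {
        "coordinate": sorted(coords),
        "bounds": sorted(bounds),
        "data": sorted(data),
    }


kinds = classify(DIMENSIONS, VARIABLES)
# The dimensions of a data variable are what a client can filter by.
filterable = {name: VARIABLES[name]["dims"] for name in kinds["data"]}
```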

Links:

CF 1.0 - 1.4 contain conventions for cube data.

Unofficial CF-1.5 contains point data encoding. Expired certificate, add security exception.

CF Conventions

NetCDF Conventions

Unidata NetCDF documentation

Creating NetCDF CF Files

Point Data Configuration

Programmed instructions that tell the framework how to access data.

There should be a netCDF-CF like convention for SQL databases. That would allow the point coverage processor to simply connect to the DB and serve, without any other configuration. But since such a convention does not exist, manual configuration is needed.

From file point_config.py

Coverage information and its descriptions:

   point_info = {
       'SURF_MET':
           {
               'Title':'Surface Meteorological Observations',
               'Abstract':'Dummy test data.',

The covered area and time. The Time dimension is a true dimension here, but contrary to grid data, the X-Y dimensions for point data are not dimensions, but attributes of the location dimension. The time dimension format is ISO 8601: (start-inclusive)/(end-inclusive)/periodicity. PT1H means a period of one hour; P1D would mean a period of one day.

               'axes':{
                   'X':(-180, 179.75), 
                   'Y':(-90, 89.383),
                   'T':iso_time.parse('2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H'),
                   },
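
To show what the 'T' axis string describes, here is a stand-alone sketch that expands it into individual time steps. It does not use or reproduce the framework's iso_time module; the functions parse_period and expand_time_axis are hypothetical, and only the D, H, M and S period designators are handled.

```python
import re
from datetime import datetime, timedelta

# Matches a small subset of ISO 8601 durations, e.g. P1D, PT1H, PT30M, P1DT12H.
_PERIOD = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_period(text):
    """Convert an ISO 8601 duration (days, hours, minutes, seconds) to a timedelta."""
    match = _PERIOD.match(text)
    if not match:
        raise ValueError("unsupported period: %r" % text)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    delta = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    if delta == timedelta(0):
        raise ValueError("period must be longer than zero: %r" % text)
    return delta


def expand_time_axis(spec):
    """Expand 'start/end/period' into a list of datetimes, both ends inclusive."""
    start_text, end_text, period_text = spec.split("/")
    start = datetime.strptime(start_text, "%Y-%m-%dT%H:%M:%S")
    end = datetime.strptime(end_text, "%Y-%m-%dT%H:%M:%S")
    step = parse_period(period_text)
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += step
    return times


times = expand_time_axis("2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H")
```

Two days at one-hour steps with both ends included gives 49 time steps.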

Then comes the description of the fields.

               'fields':{
                   'TEMP':{
                       'Title':'Temperature',
                       'datatype': 'float',
                       'units':'deg F',

The location table is a real dimension; latitude and longitude are attributes along the location axis, not dimensions themselves. So a typical point dataset with locations and regular time intervals is a 2-dimensional dataset. In this case, the location table is shared, so we use the previously declared variable 'location_info'. If the location tables are parameter specific, they need to be specified individually.

                       'axes':location_info,

The access instructions. This configuration is using 'complete_view', so the administrator has created the view that joins together the location table and the temperature data table. The SQL query will typically look like select loc_code, lat, lon, datetime, temp, flag from TEMP_V where datetime = '2009-09-01' and (lat between 34 and 44) and (lon between -90 and -80). This is by far the easiest way to configure the WCS.

                       'complete_view':{
                           'view_alias':'TEMP_V',
                           'columns':['loc_code','lat','lon','datetime','temp', 'flag'],
                           },
                       },
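
How a query like the one above could be derived from the 'complete_view' entry can be sketched with the built-in sqlite3 module. The function build_query, the underlying tables, and the sample rows are hypothetical; the framework's own SQL generation may differ. Column and view names come from the trusted configuration, while the filter values are passed as bound parameters.

```python
import sqlite3

# The 'complete_view' entry from the configuration above.
complete_view = {
    "view_alias": "TEMP_V",
    "columns": ["loc_code", "lat", "lon", "datetime", "temp", "flag"],
}


def build_query(view, bbox, time):
    """Return (sql, parameters) for one time and a west,south,east,north box."""
    west, south, east, north = bbox
    sql = (
        "SELECT {columns} FROM {alias} "
        "WHERE datetime = ? AND (lat BETWEEN ? AND ?) AND (lon BETWEEN ? AND ?)"
    ).format(columns=", ".join(view["columns"]), alias=view["view_alias"])
    return sql, (time, south, north, west, east)


# Hypothetical tables and the administrator-created view that joins them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (loc_code TEXT PRIMARY KEY, lat REAL, lon REAL);
    CREATE TABLE temp_data (loc_code TEXT, datetime TEXT, temp REAL, flag TEXT);
    CREATE VIEW TEMP_V AS
        SELECT l.loc_code, l.lat, l.lon, d.datetime, d.temp, d.flag
        FROM temp_data AS d JOIN location AS l ON l.loc_code = d.loc_code;
    INSERT INTO location VALUES ('AAA', 40.0, -85.0), ('BBB', 50.0, -85.0);
    INSERT INTO temp_data VALUES
        ('AAA', '2009-09-01', 71.5, 'ok'),
        ('BBB', '2009-09-01', 60.2, 'ok'),
        ('AAA', '2009-09-02', 69.0, 'ok');
""")

sql, params = build_query(complete_view, bbox=(-90, 34, -80, 44), time="2009-09-01")
rows = conn.execute(sql, params).fetchall()
```

Only location AAA lies inside the box on the requested day, so a single row comes back.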

By creating a custom handler, it is possible to store data anywhere. You still need to declare the configuration in a python dictionary, just like above.

Point Location Configuration

Programmed instructions that tell the framework how to access data. This includes, but is not limited to:

For Cube Coverages:
- Automatic: Information of variables and dimensions extracted from netCDF-CF files.
- Manual: Hand-edited python dictionaries describing the netCDF files, their variables and dimensions.

For Point Coverages:
- Hand-edited python dictionaries describing Names and Columns of Location and Data tables.
- Custom modules for databases that are too esoteric to configure in a purely declarative manner.

Cube: Configuring NetCDF based Cube Data

Point:

Location Table Configuration

Data Table Configuration

SQL Database for Points

Currently the datafed WCS for points supports one kind of point data: fixed locations and regular intervals.

Storing Point Data in a Relational Database

WCS Capabilities Document

The document contains all the high-level information about a service.

The document contains:

Description of the Service
Machine Readable and Human Readable Name.
Keywords
Contact Information
HTTP access information
List of coverages in the service
- Machine Readable and Human Readable Name.
- Keywords
- Latitude and Longitude bounds.
- Time range in version 1.0.0

Example Version 1.1.2

Example Version 1.0.0

WCS Describe Coverage Document

The document describes the coverage in detail, so that the user knows what the data is and what are the dimensions of the data.

Description of the Coverage
Machine Readable and Human Readable Name.
Keywords
Latitude and Longitude bounds.
Grid bounds in the projection of the data, if applicable
Grid size in the projection of the data, if applicable
Time dimension.
Supported Coordinate Systems
Supported Formats
Supported Interpolations
Fields of Coverage in versions 1.1.x
- Name
- Units
- Other dimensions, like elevation or wavelength, if applicable
- Reference to location dimension, if applicable

Example Version 1.1.2

Example Version 1.0.0

WCS GetCoverage Query

The main query to get data from a WCS.

GetCoverage for points
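
As a hedged illustration, a GetCoverage request in the key-value encoding of WCS 1.1 could be assembled like this. The endpoint, coverage and field names are taken from the Datafed Browser entry in this glossary; the time value, the bounding box, the 'text/csv' format string, and the helper name get_coverage_url are assumptions, and this glossary does not confirm that the server accepts exactly these values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_coverage_url(endpoint, coverage, field, bbox, time, fmt):
    """Build a WCS 1.1 key-value GetCoverage URL."""
    west, south, east, north = bbox
    params = {
        "service": "WCS",
        "version": "1.1.2",
        "request": "GetCoverage",
        "identifier": coverage,
        # Longitude/latitude box followed by the CRS it is expressed in.
        "BoundingBox": f"{west},{south},{east},{north},urn:ogc:def:crs:OGC:2:84",
        "TimeSequence": time,
        "RangeSubset": field,
        "format": fmt,
    }
    return endpoint + "?" + urlencode(params)


url = get_coverage_url(
    endpoint="http://128.252.202.19:8080/CIRA",
    coverage="VIEWS",
    field="SO4f",
    bbox=(-130, 24, -65, 52),        # assumed area of interest
    time="2009-09-01T12:00:00",      # assumed time
    fmt="text/csv",                  # assumed format identifier
)
query = parse_qs(urlsplit(url).query)
```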

TODO: samples

WFS Capabilities Document

The document contains all the high level information about a service

TODO: samples

WFS DescribeFeatureType

The document contains all the high-level information about a service.

TODO: samples

WFS GetFeature Query

The main query to get data from a WFS.

TODO: samples