Revision as of 08:30, September 2, 2010
Glossary for Common Terms and Standard Names in Datafed WCS Wrapper Framework
AQ_uFIND
A front end to GEOSS Clearinghouse. Currently can be used to find WMS services.
Example use: AQ_uFIND.aspx?datatype=point
Capabilities Processor
This component creates the standard XML documents for the WCS and WFS services.
It operates on the metadata and the data configuration. From the metadata the capabilities document gets its Title, Abstract, Keywords, contact person and so on.
From the data configuration the processor gets the full information for each coverage.
WCS Wrapper Configuration for Cubes
WCS Wrapper Configuration for Point Data
Coverage Processor for Cubes
The Coverage Processor is a component that performs three different activities:
- WCS Query Parser. The syntax is checked and the output is a binary object holding all the query elements.
- Subsetter. This component finds the desired coverage and applies filters to read a subset of the data:
- Fields: A client querying wind data may be interested in speed and direction, but reject air pressure.
- Bounding Box: restrict the response to a certain geographical area.
- Time: default time, one time, a list of times, or a periodic range of times.
- Grid size and interpolation: high-resolution data can be interpolated to a lower resolution.
- By dimension: select only one or some of the wavelengths or elevations.
- Formatter. The binary data is returned in the desired format. Currently supported formats are NetCDF-CF for cubes and CSV, Comma Separated Values, for points.
The NetCDF-CF based processor is completely generic for any compatible netCDF-CF file.
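The bounding-box subsetting step described above can be sketched in plain Python. This is an illustration, not the framework's code; it assumes a regular lat-lon grid held as nested lists, and the function and variable names are hypothetical:

```python
def subset_grid(data, lats, lons, bbox):
    """Return the part of a regular lat-lon grid inside
    bbox = (min_lon, min_lat, max_lon, max_lat).
    `data` is a list of rows, one row per latitude."""
    min_lon, min_lat, max_lon, max_lat = bbox
    lat_idx = [i for i, v in enumerate(lats) if min_lat <= v <= max_lat]
    lon_idx = [j for j, v in enumerate(lons) if min_lon <= v <= max_lon]
    sub = [[data[i][j] for j in lon_idx] for i in lat_idx]
    return sub, [lats[i] for i in lat_idx], [lons[j] for j in lon_idx]

# A dummy 1-degree global grid with synthetic values.
lats = [v - 90 for v in range(181)]
lons = [v - 180 for v in range(360)]
data = [[i * 1000 + j for j in range(len(lons))] for i in range(len(lats))]

sub, sub_lats, sub_lons = subset_grid(data, lats, lons, (-90, 30, -60, 50))
```

The real Coverage Processor works on netCDF-CF variables and also handles time selection and interpolation, but the filtering idea is the same.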
Coverage Processor for Points
The Coverage Processor for points is a component that performs three different activities:
- WCS Query Parser. The syntax is checked and the output is a binary object holding all the query elements.
- SQL Query Compiler.
- Fields:
- Select the required fields
- Join in the location table
- Join in the tables for the fields
- Bounding Box:
- Add a where filter by location lat and lon.
- Time: Add a filter by the data datetime.
- Location: if requested, add a filter by location.
- Formatter. The table data is returned in the desired format. Currently the only supported format is CSV, Comma Separated Values.
SQL processors can either be configured for supported DB schema types, or custom written for esoteric databases.
By writing a custom processor, anything can be used as a data source.
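As an illustration of the SQL Query Compiler step, here is a minimal sketch. The function name and the column names (lat, lon, datetime) are assumptions modeled on the complete_view configuration shown elsewhere in this glossary; the real compiler also handles table and column aliasing and joins:

```python
def compile_point_query(view, columns, bbox=None, times=None):
    """Build a parameterized SELECT over a complete view, adding WHERE
    filters for the bounding box and the requested datetimes."""
    sql = "select %s from %s" % (', '.join(columns), view)
    where = []
    params = []
    if bbox:
        min_lon, min_lat, max_lon, max_lat = bbox
        where.append("(lat between ? and ?) and (lon between ? and ?)")
        params += [min_lat, max_lat, min_lon, max_lon]
    if times:
        where.append("datetime in (%s)" % ', '.join('?' * len(times)))
        params += list(times)
    if where:
        sql += " where " + " and ".join(where)
    return sql, params

sql, params = compile_point_query(
    'TEMP_V', ['loc_code', 'lat', 'lon', 'datetime', 'temp', 'flag'],
    bbox=(-90, 34, -80, 44), times=['2009-09-01 12:00:00'])
```

Using placeholders and a parameter list, instead of pasting values into the SQL string, keeps the generated query safe against malformed input.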
Custom Coverage Processor for Cubes
A custom module for data storage that is not supported directly.
Example: HTAP_wcs.py custom WCS module
The HTAP demo service stores data in daily netCDF-CF files. The custom module must:
- Check that only one datetime in the TimeSequence is used, because getting a time series from separate files is not supported.
- Locate the correct file. Normally the file name is just the coverage identifier plus the '.nc' extension. In this case the template is GEMAQ-v1p0_SR1_sfc_%(year)s_%(doy)s.nc, where year and doy (the Julian day of year) get replaced.
HTAP_wcs.py implements this by inheriting from the default netCDF-CF processor and overriding its _input_file method.
    def _input_file(self, query):
        # Check that the query has exactly one datetime.
        if len(query.time) != 1:
            raise owsutil.OwsError('CustomError',
                'time parameter must match exactly one datetime')
        # Get the single datetime.
        datetimes = query.time.all_datetimes()
        dt = datetimes[0]
        # Now get the file template from the configuration.
        metadata = self._load_config(query)[query.identifier]
        files = metadata['files']
        ncfile = files['template'] % {'year': str(dt.year),
                                      'doy': str(iso_time.day_of_year(dt)).zfill(3)}
        # Now we have the filename; get the folder and return the full file path.
        src_root, last = os.path.split(__file__)
        while last != 'web':
            src_root, last = os.path.split(src_root)
        return os.path.join(src_root, files['path'], ncfile)
By writing a custom processor, anything can be used as a data source.
Custom Coverage Processor for Points
The CIRA/VIEWS service cannot access the database directly due to a firewall. The SQL query can still be compiled from the configuration, but it has to be executed via a proxy server. The custom processor overrides the getcoverage method and does just that.
By writing a custom processor, anything can be used as a data source.
Data Configuration for Cubes
For standard netCDF-CF files, the configuration is automatic.
- Each file becomes a coverage.
- Each variable becomes a field.
This is by far the easiest way to create a WCS service. Examples are testprovider, which comes with the installation package, and NASA, which serves some datasets downloaded from NASA.
For daily netCDF-CF files it is possible to create a service without compiling them into a single file. See Serving data from periodic collection of NetCDF files as an example.
By creating a custom handler, it is possible to use anything as a data source.
Data Configuration for Points
Programmed instructions that tell the framework how to access the data.
There should be a netCDF-CF-like convention for SQL databases. That would allow the point coverage processor to just connect to the DB and serve, without any other configuration. But since no such convention exists, manual configuration is needed.
From file point_config.py
Coverage information and its description:

point_info = {
    'SURF_MET': {
        'Title': 'Surface Meteorological Observations',
        'Abstract': 'Dummy test data.',
The covered area and time. The time dimension is a true dimension here, but contrary to grid data, the X-Y dimensions for point data are not dimensions but attributes of the location dimension. The time dimension format is ISO 8601 (start-inclusive)/(end-inclusive)/periodicity: PT1H means a periodicity of one hour, and P1D would mean a periodicity of one day.
        'axes': {
            'X': (-180, 179.75),
            'Y': (-90, 89.383),
            'T': iso_time.parse('2009-09-01T12:00:00/2009-09-03T12:00:00/PT1H'),
        },
Then comes the description of the fields.
        'fields': {
            'TEMP': {
                'Title': 'Temperature',
                'datatype': 'float',
                'units': 'deg F',
The location table is a real dimension; latitude and longitude are attributes along the location axis, not dimensions themselves. So a typical point dataset with locations and regular time intervals is a 2-dimensional dataset. In this case the location table is shared, so we use the previously declared variable location_info. If the location tables are parameter specific, they need to be specified individually.
                'axes': location_info,
The access instructions. This configuration uses 'complete_view', so the administrator has created a view that joins together the location table and the temperature data table. The SQL query will typically look like select loc_code, lat, lon, datetime, temp, flag from TEMP_V where datetime = '2009-09-01' and (lat between 34 and 44) and (lon between -90 and -80). This is by far the easiest way to configure the WCS.
                'complete_view': {
                    'view_alias': 'TEMP_V',
                    'columns': ['loc_code', 'lat', 'lon', 'datetime', 'temp', 'flag'],
                },
            },
By creating a custom handler, it is possible to store data anywhere. You still need to declare the configuration in a python dictionary, just like above.
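To see the complete_view idea end to end, here is a self-contained sketch using the sqlite3 module that ships with Python. The table and view names mirror the example above; the data is dummy:

```python
import sqlite3

# Build an in-memory database shaped like the configuration example:
# a location table, a data table, and a joining view TEMP_V.
con = sqlite3.connect(':memory:')
con.executescript("""
create table location (loc_code text primary key, lat real, lon real);
create table TEMP_base (loc_code text, datetime text, temp real, flag text);
create view TEMP_V as
    select l.loc_code, l.lat, l.lon, t.datetime, t.temp, t.flag
    from location l inner join TEMP_base t on t.loc_code = l.loc_code;
insert into location values ('KSTL', 38.75, -90.37);
insert into TEMP_base values ('KSTL', '2009-09-01', 78.6, '');
""")

# The kind of query the point coverage processor would generate.
rows = con.execute(
    "select loc_code, lat, lon, datetime, temp, flag from TEMP_V "
    "where datetime = '2009-09-01' "
    "and (lat between 34 and 44) and (lon between -91 and -80)").fetchall()
```

Because the view already joins the location attributes onto each data row, the processor never needs to know the underlying table layout.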
Datafed Browser
TODO: describe classic browser
TODO: describe browsing WCS without df catalog http://webapps.datafed.net/datafed.aspx?wcs=http://128.252.202.19:8080/CIRA&coverage=VIEWS&param_abbr=SO4f
TODO: Describe GE Plugin browser
Feature Processor for Points
Web Feature Service, WFS, is well suited to publishing geographic information that does not change over time.
With datafed WCS it is used to publish the location table for point data: the WCS DescribeCoverage document does not support such rich dimensions well, and location tables are static geographic information.
This component performs three different activities:
- WFS Query Parser. The syntax is checked and the output is a binary object holding all the query elements.
- Subsetter.
- Each field may have a different location table. If the data is sparse, so that some fields have data only in a few locations, it makes sense to return only those locations.
- Locations may also be filtered by geographic bounding box.
- Other WFS filters are not implemented.
- Formatter. The data is returned in the desired format. Currently the only supported format is CSV, Comma Separated Values.
GEOSS Clearinghouse
The Clearinghouse is a component in the GEOSS Common Infrastructure. One of its functions is the GEOSS Components and Services Registry.
Google Earth
Google Earth is a standalone program that can display data on a 3D globe.
Google Earth Plugin is a browser extension that puts an interactive Google Earth on a web page.
ISO 19115 Metadata
Description of a service, with a strictly defined XML presentation. Contains service URLs and metadata about the service.
ISO 19115 Maker
A public service to create an ISO 19115 record from a WCS or WMS service.
If the Capabilities document contains necessary keywords, the document can be created automatically: ISO 19115 for AIRNOW pmfine WMS.
Without the keywords, the metadata can be passed via URL parameters.
KML Keyhole Markup Language
KML is the way to describe content in Google Earth and Google Maps. KML documentation is hosted by google.
KML Maker
Datafed tools produce KML directly from data, which can be served by WCS or WMS services.
KML from a CIRA/VIEWS showing SO4f and direct link
KML from NASA giovanni WMS and direct link
Fast Precompiled Examples:
Location Table for Points
The location table describes the location dimension for point data.
The fields that datafed uses are:
- Mandatory fields:
- loc_code: A unique text field, used to identify a location.
- lat: Latitude of the location in degrees_north.
- lon: Longitude of the location in degrees_east.
- Optional datafed fields:
- loc_name: Reasonably short text describing location.
- elev: elevation in meters.
- data specific fields:
- Any field with any name
Good loc_codes are short abbreviations like ACAD and YOSE for Acadia and Yosemite National Parks. Completely numeric loc_codes are possible, but they are more difficult to recognize, and since leading zeros are significant, tools like Excel may treat them as numbers and cut the zeros off.
If the loc_codes are long, say 9 characters, it is useful to generate a numeric 16-bit primary key for the location table and use it for joining the data tables with the location table. This may help in indexing and speed things up quite a bit.
Example: CIRA/VIEWS location table
Part of the CSV response:
loc_code, loc_name, lat, lon, elev
WHRI1, White River NF, 39.1536, -106.8209, 3413.5
WICA1, Wind Cave, 43.5576, -103.4838, 1296
WIMO1, Wichita Mountains, 34.7323, -98.713, 509
YELL2, Yellowstone NP 2, 44.5653, -110.4002, 2425
Metadata
Abstract, Contact Information, Keywords and any other such documentation that is needed in classifying or finding the service. The metadata is accessible for the user via capabilities and coverage description documents.
Every provider should have a wcs_capabilities.conf that lists keywords and contact information. The format is simple: copy one from the testprovider and edit it.
# this file provides some information about the provider
# and is incorporated into the respective WCS responses.
# all currently available field identifiers are listed below.
# please define every identifier only once.
# other identifiers will be ignored, input is case sensitive.
# the format is always <identifier>: <value>.
# whitespaces before and after <value> will be stripped.
# KEYWORDS can take a comma separated list that will then be
# included in the respective keyword tags
# empty lines and lines starting with "#" will be ignored.
PROVIDER_TITLE: National Climate Data Center
PROVIDER_ABSTRACT: National Climate Data Center is the worlds largest archive of climate data.
KEYWORDS: Domain:Aerosol, Platform:Network, Instrument:Unknown, DataType:Point, Distributor:DataFed, Originator:NCDC, TimeRes:Minute, Vertical:Surface, TopicCategory:climatologyMeteorologyAtmosphere
FEES: NONE
CONSTRAINTS: NONE
PROVIDER_SITE: http://lwf.ncdc.noaa.gov/oa/ncdc.html
CONTACT_INDIVIDUAL: Climate Contact, Climate Services Branch, National Climatic Data Center
CONTACT_PHONE: 828-271-4800
CONTACT_STREET: 151 Patton Avenue Room 468
CONTACT_CITY: Asheville
CONTACT_ADM_AREA: North Carolina
CONTACT_POSTCODE: 28801-5001
CONTACT_COUNTRY: USA
CONTACT_EMAIL: ncdc.info@noaa.gov
Here is the real live NCDC wcs_capabilities.conf
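A minimal parser for this format, following the rules stated in the comments above (one '<identifier>: <value>' per line, whitespace stripped, the first definition of an identifier wins, KEYWORDS split on commas), could look like this sketch:

```python
def parse_capabilities_conf(text):
    """Parse the wcs_capabilities.conf format: '<identifier>: <value>'
    lines, '#' comments and blank lines ignored, whitespace stripped,
    every identifier defined only once, KEYWORDS split on commas."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or ':' not in line:
            continue
        key, value = line.split(':', 1)
        key, value = key.strip(), value.strip()
        if key in info:
            continue  # define every identifier only once
        if key == 'KEYWORDS':
            info[key] = [k.strip() for k in value.split(',')]
        else:
            info[key] = value
    return info

sample = """\
# comment line, ignored
PROVIDER_TITLE: testprovider
PROVIDER_TITLE: duplicate, ignored
KEYWORDS: Domain:Aerosol, DataType:Point
FEES: NONE
"""
info = parse_capabilities_conf(sample)
```

Note that splitting on the first ':' only is what keeps values such as URLs and the Domain:Aerosol style keywords intact.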
NetCDF-CF
The NetCDF file format contains four kinds of information:
- Global attributes
- Simple name=value pairs
- Dimensions
- Only declares the length of the dimension
- Contains no dimension data.
- Variables
- Array data with any number of dimensions.
- Zero dimensions meaning scalar data.
- Variable Attributes:
- Simple name=value pairs associated to a variable.
While these are enough to describe any data, it is not easy to interpret what the data actually means. What is self-evident for humans is difficult for a computer program to reason about. If you have a NetCDF viewer, it should be possible to just open the file and display the data on a geographic map. But making a program that can automatically find the geographic dimensions in a NC file is very difficult.
Conventions come to the rescue. CF-1.0 standardizes many things:
- Standard name: what the measured data is about
- Units
- How to tell that a variable is one of the following:
- Data Variable, containing real data.
- Dimension Coordinate Variable, containing dimension coordinates.
- Dimension Bounds Variable, containing lower and upper bounds of a dimension coordinate.
- Projection
With this part standardized, your program can list the data variables for you and tell you exactly what you can filter by.
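The variable classification above can be sketched over plain dictionaries. This is an illustration of the CF idea, not a netCDF reader: a variable whose name matches a dimension is treated as a coordinate variable, and a variable named by another variable's bounds attribute holds dimension bounds. The sample names are hypothetical:

```python
def classify_variables(dimensions, variables):
    """Classify each variable as 'coordinate', 'bounds' or 'data'
    using the CF rules described above.  `variables` maps each
    variable name to its attribute dictionary."""
    bounds_names = {attrs.get('bounds') for attrs in variables.values()}
    kinds = {}
    for name in variables:
        if name in dimensions:
            kinds[name] = 'coordinate'
        elif name in bounds_names:
            kinds[name] = 'bounds'
        else:
            kinds[name] = 'data'
    return kinds

dims = ['time', 'lat', 'lon']
variables = {
    'time': {'units': 'hours since 2009-09-01'},
    'lat': {'units': 'degrees_north', 'bounds': 'lat_bnds'},
    'lon': {'units': 'degrees_east'},
    'lat_bnds': {},
    'temp': {'standard_name': 'air_temperature', 'units': 'K'},
}
kinds = classify_variables(dims, variables)
```

With this classification, a program can list only the data variables to the user and treat the rest as axes.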
Links:
CF 1.0 - 1.4 contain conventions for cube data.
The unofficial CF-1.5 contains the point data encoding. The site has an expired certificate; add a security exception.
Point Data Example
All of the configuration is done using Python dictionaries and lists. The syntax is simple. This is a list:
['loc_code', 'lat', 'lon']
and this is a dictionary:
{'key1':'value1', 'key2': 'value2' }
The test provider point is an example of how to configure this service to use an SQL database to serve point data.
The data for the demonstration is stored in sqlite, which is distributed with Python by default. The project has the following files:
- pntdata.py: This script creates the test database and fills it with dummy data.
- pntdata.db: The sqlite database file created by pntdata.py
- point_config.py:
- Declares the location table in the SQL database.
- Maps the coverages and fields to SQL tables.
- point_WCS.py is the custom WCS handler
- point_WFS.py is the custom WFS handler that delivers the location table.
Location Configuration for Points
From file point_config.py
In its simplest case, SQL views are used to create the required location table, so no aliasing is needed.
location_info = {
    'location': {
        'service': 'WFS',
        'version': '1.0.0',
    },
}
These are the standard names that datafed uses:
- The dimension name is "location".
- No aliasing is needed, since the DB table/view and column names are standard.
- The view/table name in the DB is "location".
- The columns are "lat", "lon" and "loc_code", and loc_code is a text type, not an integer.
In the CIRA/VIEWS case, the table and fields are aliased:
VIEWS_location_info = {
    'location': {
        'service': 'WFS',
        'version': '1.0.0',
        'table_alias': 'Site',
        'columns': {
            'loc_code': {'column_alias': 'SiteCode'},
            'lat': {'column_alias': 'Latitude'},
            'lon': {'column_alias': 'Longitude'},
        }
    },
}
- The dimension name is still "location"
- The location table is called "Site"
- "SiteCode", "Latitude" and "Longitude" are aliased to "loc_code", "lat" and "lon".
SQL Database for Points
Currently the datafed WCS for points supports one kind of point data: Fixed locations and regular intervals.
Point data is often stored in SQL databases. There is no standard schema like the CF-1.0 convention for NetCDF files, so it is not possible to just connect and start serving. You have to create the configuration description.
One of the most powerful ideas in relational database design is the concept of a view. You don't need to change the existing data tables; creating a view that makes your DB look like the required one is enough.
The Simplest Case: Configure with SQL Views
Location Table/View
What different databases have in common is that they all need a location table. The current implementation is based on time series from stationary locations.
table/view location
+----------+-------+---------+-------+
| loc_code | lat   | lon     | elev  |
+----------+-------+---------+-------+
| KMOD     | 37.63 | -120.95 |  30.0 |
| KSTL     | 38.75 |  -90.37 | 172.0 |
| KUGN     | 42.42 |  -87.87 | 222.0 |
| ...      |       |         |       |
+----------+-------+---------+-------+
Here loc_code is the primary key and lat, lon is the location. Optional fields can be added. Your database may have a location table with a different name and different field names, but that does not matter. The CIRA VIEWS database has a location table, but it is called Site and spells out its longitude field in full. The datafed browser uses the standard names loc_code, loc_name, lat and lon for browsing; these names are needed for plug-and-play compatibility. In the CIRA VIEWS database, the view creation would be:
create view location as
    select SiteCode as loc_code, Latitude as lat, Longitude as lon
    from Site
The primary key is loc_code, being unique for all the locations.
Because WCS does not have a good place to describe a location table, we use WFS, Web Feature Service, to do that. Sample WFS Call.
Data Views
Each data variable needs a view. For example:
create view TEMP_V as
    select location.loc_code, location.lat, location.lon,
           TEMP_base.datetime, TEMP_base.temp, TEMP_base.flag
    from location
    inner join TEMP_base on TEMP_base.loc_code = location.loc_code
Each parameter has its own data view that looks like
view TEMP_V
+----------+-------+---------+------------+------+------+
| loc_code | lat   | lon     | datetime   | temp | flag |
+----------+-------+---------+------------+------+------+
| KMOD     | 37.63 | -120.95 | 2009-06-01 | 87.8 | X    |
| KMOD     | 37.63 | -120.95 | 2009-06-02 | 82.3 |      |
| KSTL     | 38.75 |  -90.37 | 2009-06-01 | 78.6 |      |
| ...      |       |         |            |      |      |
+----------+-------+---------+------------+------+------+
view DEWP_V
+----------+-------+---------+------------+------+
| loc_code | lat   | lon     | datetime   | dewp |
+----------+-------+---------+------------+------+
| KMOD     | 37.63 | -120.95 | 2009-06-01 | 51.4 |
| KMOD     | 37.63 | -120.95 | 2009-06-02 | 51.4 |
| KSTL     | 38.75 |  -90.37 | 2009-06-01 | 34.9 |
| ...      |       |         |            |      |
+----------+-------+---------+------------+------+
WCS Capabilities Document
The document contains all the high level information about a service.
- Metadata
- List of Coverages.
From NRL service at datafed:
The Document contains:
- Description of the Service
- Title=Naval Research Laboratory
- Abstract=Naval Research Laboratory Marine Meteorology Division
- Keywords Domain:Aerosol etc...
- Contact information
- HTTP access information
- List of coverages in the service
- Identifier=NAAPS
- Title=NOGAPS and NAAPS fields in press coordinate
- Abstract=none
- Latitude and Longitude bounds = whole world
WCS Describe Coverage Document
The document describes the coverage in detail, so that the user knows what the data is and what the dimensions of the data are.
Cube Data:
- Title=NOGAPS and NAAPS fields in press coordinate
- Abstract=none
- Identifier=NAAPS
- Latitude and Longitude bounds in the EPSG:4326 and WGS 84 projections, regardless of the data's own projection.
- Grid bounds and size in the WGS 84 projection, which is currently the only supported projection.
- Time dimension
- Fields of Coverage. The example is the second on the list.
- Title=specific humidity - nogaps
- Identifier=sphu
- Units of measure, UOM=(g/g)
- NullValue=nan, the IEEE standard value for not-a-number.
- InterpolationMethods=none
- Elevation dimension, values from 1000 to 10 mbar.
- Supported Coordinate Systems, at the very end, linear lat-lon urn:ogc:def:crs:EPSG::4326 and lon-lat urn:ogc:def:crs:OGC:2:84
- Supported Formats: image/netcdf and application/x-netcdf are the same format, netCDF-CF. There is a need to standardize this format.
WCS GetCoverage Query
The main query to get data from a WCS
CIRA/VIEWS small geo range for one datetime &store=false
CIRA/VIEWS small geo range for one datetime &store=true
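Assembling such a GetCoverage URL can be sketched in Python. The parameter spellings below are an assumption based on the WCS 1.1 key-value-pair style used in the examples above; check them against the actual service before relying on this sketch:

```python
from urllib.parse import urlencode

def getcoverage_url(base, identifier, fields, bbox, timestamp,
                    fmt='text/csv', store=False):
    """Assemble a WCS 1.1 GetCoverage URL.  Parameter names are
    illustrative, modeled on the KVP requests shown in this glossary."""
    params = {
        'service': 'WCS',
        'version': '1.1.2',
        'request': 'GetCoverage',
        'identifier': identifier,
        'rangesubset': fields,
        'boundingbox': '%s,urn:ogc:def:crs:OGC:2:84'
                       % ','.join(str(v) for v in bbox),
        'timesequence': timestamp,
        'format': fmt,
        'store': 'true' if store else 'false',
    }
    return base + '?' + urlencode(params)

url = getcoverage_url('http://128.252.202.19:8080/CIRA', 'VIEWS',
                      'SO4f', (-90, 34, -80, 44), '2009-09-01T12:00:00')
```

urlencode takes care of percent-encoding the colons and commas in the bounding box and time values.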
WFS Capabilities Document
The document contains all the high level information about a service
TODO: implement
TODO: samples
WFS DescribeFeatureType
TODO: implement
TODO: samples
WFS GetFeature Query
The main query to get data from a WFS