WCS NetCDF-CF Updates

From Earth Science Information Partners (ESIP)
Revision as of 10:48, May 5, 2010 by Rhusar

WCS for NetCDF-CF Progress Report and Update: April 2010

Large Scale Data Processing using WCS

Processing large amounts of data, ranging from 100 MB to several GB, is always a challenge. Computer networks work better with a large number of small chunks, because any transfer can fail and only the missing piece then needs to be retried.

The WCS client operation in the browser has been improved to use a chunked approach. This allows processing large queries even if the service cannot supply them in a single response. The data is read using small subqueries, and the aggregator or other data consumer processes it one chunk at a time, so even large queries can be aggregated.
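The chunked approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the datafed client code: `fetch_chunk` is a hypothetical stand-in for one small WCS subquery, `fake_fetch` only simulates a service, and the 30-day chunk size is arbitrary.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List, Tuple


def split_time_range(start: date, end: date,
                     days_per_chunk: int) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (first_day, last_day) sub-ranges covering start..end inclusive."""
    one_day = timedelta(days=1)
    first = start
    while first <= end:
        last = min(first + timedelta(days=days_per_chunk) - one_day, end)
        yield first, last
        first = last + one_day


def chunked_mean(start: date, end: date, days_per_chunk: int,
                 fetch_chunk: Callable[[date, date], List[float]]) -> float:
    """Aggregate a large time range one chunk at a time.

    Only a running total and a count are kept, so memory use does not
    grow with the size of the full query.
    """
    total = 0.0
    count = 0
    for first, last in split_time_range(start, end, days_per_chunk):
        values = fetch_chunk(first, last)  # hypothetical small subquery
        total += sum(values)
        count += len(values)
    if count == 0:
        raise ValueError("no data in the requested range")
    return total / count


def fake_fetch(first: date, last: date) -> List[float]:
    """Hypothetical service: returns one value (the day of month) per day."""
    days = (last - first).days + 1
    return [float((first + timedelta(days=i)).day) for i in range(days)]


chunks = list(split_time_range(date(2009, 6, 1), date(2009, 8, 31), 30))
summer_mean = chunked_mean(date(2009, 6, 1), date(2009, 8, 31), 30, fake_fetch)
```

A failed subquery in such a loop would only require repeating that one chunk, not the whole query.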

What is still missing is batch processing with retry. A system where you submit a query with processing instructions and retrieve the results later is planned.

Adding periodic time filters

For analysis purposes, it is useful to be able to filter the time dimension periodically, for example to get only June, July and August, only Mondays and Fridays, or only hours 11-13.

This can be done with standard WCS by enumerating datetimes: TimeSequence=2009-06-01,2009-06-02,...2009-08-31 but this very quickly produces long URLs that servers are unable to process. Most browsers limit URL length to a small value, like 2000 characters. While that is enough for URLs that are typed by a human, it is too small for computer-generated queries.
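To see how quickly enumeration grows, the following sketch builds the TimeSequence parameter for every day of one summer and of ten summers. The helper names are illustrative, and only the parameter itself is counted; the rest of the query URL would add to the length.

```python
from datetime import date, timedelta


def enumerate_days(first, last):
    """Return every date from first to last, inclusive."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def summer_time_sequence(years):
    """Build a TimeSequence parameter listing each June-August day of the given years."""
    days = []
    for year in years:
        days.extend(enumerate_days(date(year, 6, 1), date(year, 8, 31)))
    return "TimeSequence=" + ",".join(day.isoformat() for day in days)


one_summer = summer_time_sequence([2009])
ten_summers = summer_time_sequence(range(2000, 2010))
print(len(one_summer), len(ten_summers))
```

One summer of daily values already takes about a thousand characters, and ten summers exceed a 2000-character limit several times over.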

The WCS server now supports filters like days_of_week=Mon+Fri

These filters are extensions to the standard.
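How such a periodic filter selects time steps can be sketched like this. The function and parameter names are assumptions made for illustration and are not taken from the server code; a filter that is left as `None` accepts everything.

```python
from datetime import datetime, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def matches(moment, months=None, days_of_week=None, hours=None):
    """Return True if the datetime passes every filter that was given."""
    if months is not None and moment.month not in months:
        return False
    # datetime.weekday() returns 0 for Monday through 6 for Sunday
    if days_of_week is not None and DAY_NAMES[moment.weekday()] not in days_of_week:
        return False
    if hours is not None and moment.hour not in hours:
        return False
    return True


def hourly(start, count):
    """Return count hourly time steps beginning at start."""
    return [start + timedelta(hours=i) for i in range(count)]


# One week of hourly time steps; 2009-06-01 was a Monday.
week = hourly(datetime(2009, 6, 1), 7 * 24)

# Every noon hour of every weekend day in the summer months.
selected = [t for t in week
            if matches(t, months={6, 7, 8}, days_of_week={"Sat", "Sun"}, hours={12})]
```

Out of the 168 hourly steps, only noon on Saturday and noon on Sunday remain, which is the kind of selection that would otherwise need an enumerated TimeSequence.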

New Features and Bug Fixes

2010-03-10, By Decker:

  • upgraded server code to WCS version 1.1.2 requirements
  • implemented minimum requirements for WCS 1.1.2
  • totally restructured wcs.py in the process to be able to support new versions more easily in the future (hopefully)
  • versions 1.1.0 and 1.1.1 will be treated by the same code as 1.1.2 as no real protocol differences were found in the spec documents
  • introduced wcs_capabilities.conf in provider dir to provide some static but provider specific settings (Contact, Title, etc); also see inline documentation in that file
  • moved providers from static to own providers dir, so that provider data is not freely accessible via static path any more
  • index.html contents will now be delivered via owsutil.StringResponse
  • moved logs out of static
  • changed owsadmin from a tool generating static xml documents to a tool that collects all relevant metadata for realtime generation of XML responses to all requests
  • metadata is saved as a pickled nested dict in "metadata.dat" in the provider dir -> completely removed template concept
  • added get_range method to all relevant iso_time Time classes
  • changed pync3 time filtering code to handle exact points in time and time ranges
  • added very basic support for multiple CRS definitions
  • introduced config.float_precision to globally set number of digits that floats should be rounded to before comparing them
  • added config options for supported formats and CRSs to be announced in responses
  • updated inline documentation
  • moved ows_name, wcs_name, etc. from owsadmin to owsutil
  • updated ExceptionReport generation
  • owsparser will not mask ";" characters any more as they should be escaped (%3B) if they are not meant to be interpreted by the parser
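The idea behind config.float_precision in the list above, rounding floats before comparing them, can be illustrated with a short sketch. The value 6 and the helper names are assumptions for illustration, not the server's actual settings or functions.

```python
FLOAT_PRECISION = 6  # assumed value, standing in for config.float_precision


def floats_equal(a, b, digits=FLOAT_PRECISION):
    """Compare two floats after rounding both to the same number of digits."""
    return round(a, digits) == round(b, digits)


def find_coordinate(coordinates, target, digits=FLOAT_PRECISION):
    """Return the index of target in a coordinate list, or -1 if it is absent."""
    for index, value in enumerate(coordinates):
        if floats_equal(value, target, digits):
            return index
    return -1


# Computed coordinates rarely match typed values bit for bit:
# 3 * 0.1 is stored as 0.30000000000000004, not 0.3.
latitudes = [i * 0.1 for i in range(5)]
position = find_coordinate(latitudes, 0.3)
```

An exact lookup such as `latitudes.index(0.3)` would fail here, while the rounded comparison finds the coordinate at index 3.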

2010-04-02, By Hoijarvi:

  • Added more query unit tests.

2010-04-05, By Hoijarvi:

  • Upgraded to NetCDF 4 library. This version allows cubes bigger than 2 GB to be used as data source.

2010-04-07, By Decker:

  • Internal improvements in ISO 8601 time parsing, correctly rejecting time zone information from full dates without time.
  • Darcs update: upgraded the repository format from the original to darcs-2. New darcs access requires darcs 2.
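The time zone rule mentioned above can be sketched with a small parser. This is an illustrative sketch, not the iso_time module: it covers only the extended formats YYYY-MM-DD and YYYY-MM-DDThh:mm:ss with an optional zone, and the name `parse_iso` is hypothetical.

```python
import re
from datetime import date, datetime, timedelta, timezone

_DATE = r"(\d{4})-(\d{2})-(\d{2})"
_TIME = r"T(\d{2}):(\d{2}):(\d{2})"
_ZONE = r"(Z|[+-]\d{2}:\d{2})"
# The zone is only allowed inside the optional time part,
# so a bare date followed by a zone does not match.
_PATTERN = re.compile(rf"{_DATE}(?:{_TIME}{_ZONE}?)?")


def parse_iso(text):
    """Parse a date or datetime; reject a time zone on a date without time."""
    found = _PATTERN.fullmatch(text)
    if found is None:
        raise ValueError("not a supported ISO 8601 value: %r" % text)
    year, month, day = (int(found.group(i)) for i in (1, 2, 3))
    if found.group(4) is None:
        return date(year, month, day)
    hour, minute, second = (int(found.group(i)) for i in (4, 5, 6))
    zone = found.group(7)
    tzinfo = None
    if zone == "Z":
        tzinfo = timezone.utc
    elif zone is not None:
        sign = 1 if zone[0] == "+" else -1
        offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
        tzinfo = timezone(sign * offset)
    return datetime(year, month, day, hour, minute, second, tzinfo=tzinfo)
```

With this pattern, "2010-04-07" and "2010-04-07T12:00:00Z" are accepted, while "2010-04-07Z" raises a ValueError.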

2010-04-09, By Hoijarvi:

  • Homepages of the service and the providers are now redirected properly, so the html documents can use relative addresses for images and other hyperlinks.
  • Added real W3C-Schema based XML validation for Capabilities and Coverage Descriptions, fixed automated creation of them.

2010-04-14, By Hoijarvi:

  • Fixed NCML interpretation to avoid accumulation of rounding errors when creating values for dimension variables.
  • Allowed querying without a bbox. The spec requires either a time or a bbox filter.
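The rounding problem behind the NCML fix can be illustrated as follows. This is a sketch of the general technique only, assuming dimension values are generated from a start value and an increment; the function names are illustrative and not taken from the server code.

```python
def values_by_accumulation(start, increment, count):
    """Add the increment repeatedly; each step inherits the error of all earlier steps."""
    values = []
    current = start
    for _ in range(count):
        values.append(current)
        current += increment
    return values


def values_by_multiplication(start, increment, count):
    """Compute each value directly from its index, so errors do not build up."""
    return [start + index * increment for index in range(count)]


accumulated = values_by_accumulation(0.0, 0.1, 11)
direct = values_by_multiplication(0.0, 0.1, 11)
print(accumulated[-1], direct[-1])
```

With repeated addition the eleventh value ends up slightly below 1.0, while the index-based form gives exactly 1.0. On a long axis such as a 0.1 degree grid the accumulated drift can grow with every step, whereas the index-based values each carry only the rounding of a single multiplication and addition.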

2010-04-15, By Hoijarvi:

  • Re-enabled month, day_of_week, and hour_of_day filters to allow queries like: every noon hour of every weekend in every summer month. This is a non-standard extension.
  • switched from standard python urlparse to own version, to allow semicolon to be used as a separator as it is used in specs.

2010-04-27, By Hoijarvi:

  • Optimization to allow big queries: 500 MB cubes used to cause out of memory exception.
  • Optimized datafed client to do big aggregations with multiple small queries. This enables aggregation of multi-gigabyte queries.

2010-04-29, By Hoijarvi:

  • Fix to allow dimensions to have only one coordinate.

2010-05-05, By Decker:

  • remove unused imports from desc_t
  • updated webpy_logger to wsgilog 0.2
  • improved query parser (parse_qsl)
  • improved is_num_type()

External Installations: Juelich

Juelich HTAP Models Capabilities Document

This is a large set of models with monthly data. They have been registered for the datafed client.

Example: Mole Fraction of SO2 in the Air

View the Full Catalog


External Installations: Northrop Grumman GeoEnterpriseLab

Capabilities of CALPuff This is a demo for NetCDF daily slices. It uses Smokefire data. Browse

Capabilities of losangeles Smokefire data of Los Angeles in 2009. Browse

Capabilities of New Orleans 2009 Smokefire data of Los Angeles in 2009. Browse

Capabilities of Niagara Fire Browse

Capabilities of New York Fire Browse

Santa Barbara Fire Browse

Oak Ridge National Laboratory SiB3 Carbon Flux Browse

Current Installations at Datafed

Capabilities of MODIS4_AOT AOT at 0.55 micron for both ocean (best) and land (corrected) cached at datafed. This 0.1 degree resolution dataset covers 10 years and is served from one 85 gigabyte NetCDF cube. Browse

Capabilities of NAAPS 4-dimensional AOD model from the Naval Research Laboratory. Browse


Capabilities of Southeast Asia 2006 Southeast Asia Emission Inventory in 2006 for the NASA INTEX-B Mission, cached at datafed. Browse


CMAQ_Baron data from CATHALAC cached at datafed

Browse 20 km resolution and Browse 5 km resolution

Capabilities of CMAQ_DISP An aerosol-related dataset derived from the Total Ozone Mapping Spectrometer (TOMS) sensor, cached at datafed. Browse

Wish List

  • Support both -180..180 and 0..360 coordinate systems.
  • Allow queries that wrap around in longitude, so that 179..-179 or 359..1 is treated as a two-degree query, as the spec says.
  • Support bounds variables, especially for time dimension.
  • Support WCS 1.0.0, which has a filter for the elevation/depth dimension, as well as WIDTH, HEIGHT and DEPTH sampling filters.
  • Automated index.html generation. Currently, index pages must be written by hand and are often out of date.
  • Online creation of new datasets on external servers.
  • Proper support of coordinate reference systems (CRS).
  • Proper generation of Coverage descriptions to enumerate irregular time dimensions.
  • Proper generation of Coverage descriptions for additional dimensions like wavelength.
  • Filtering additional dimensions, like elevation, via RangeSubset parameter.
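The wrap-around longitude item in the list above could be handled along the following lines. This is only a sketch of one possible approach, not planned server code: a query whose west edge is greater than its east edge is split into two ordinary ranges at the seam of the coordinate system. The function names and the default of -180..180 are assumptions.

```python
def split_longitude_range(west, east, lon_min=-180.0, lon_max=180.0):
    """Split a query into ranges that do not cross the seam of the coordinate system.

    A query with west <= east is returned unchanged. A query with
    west > east is read as wrapping around the seam and becomes two ranges.
    """
    if west <= east:
        return [(west, east)]
    return [(west, lon_max), (lon_min, east)]


def total_width(ranges):
    """Sum the widths of the ranges, in degrees."""
    return sum(east - west for west, east in ranges)


dateline = split_longitude_range(179.0, -179.0)            # -180..180 system
meridian = split_longitude_range(359.0, 1.0, 0.0, 360.0)   # 0..360 system
```

Both example queries come out two degrees wide, and each of the resulting ranges can be served as a normal subquery.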