Difference between revisions of "WCS Access to netCDF Files"

From Earth Science Information Partners (ESIP)

Revision as of 07:23, December 20, 2007


WCS Wrapper for netCDF-CF Data Files


Kari Hoijarvi (hoijarvi@me.wustl.edu) 314 935 5772, Ed Fialkowski (edfialk@gmail.com), Rudolf Husar (rhusar@me.wustl.edu)

Introduction

The purpose of this effort is to create a portable software template for accessing netCDF-formatted data using the WCS protocol. Using that protocol will allow any WCS-compliant client software to access the stored data. It is hoped that the standards-based data access service will promote the development and use of distributed data processing and analysis tools.

The initial effort is focused on developing and applying the WCS wrapper template to the HTAP global ozone model outputs created for the HTAP global model comparison study. These model outputs are being managed by Martin Schultz's group at Forschungszentrum Juelich, Germany. It is hoped that, following a successful implementation at Juelich, the WCS interface could also be implemented at Michael Schulz's AeroCom server, which archives the global aerosol model outputs.

It is proposed that the HTAP data information system adopt the Web Coverage Service (WCS) as its standard data query language. The adoption of a set of interoperability standards is a necessary condition for building an agile data system from loosely coupled components for HTAP. During 2006/2007, members of the HTAP TF made considerable progress in evaluating and selecting suitable standards. They also participated in the extension of several international standards, most notably standard names (CF Convention), data formats (netCDF-CF) and a standard data query language (OGC Web Coverage Service, WCS).


The netCDF-CF Data Format

The netCDF-CF file format is a common way of storing and transferring gridded meteorological and air quality model results. The CF convention for structuring and naming netCDF-formatted data further enhances the semantics of the netCDF files. Most of the recent model outputs conform to netCDF-CF. The netCDF-CF convention is a key step toward standards-based storage and transmission of Earth Science data.

The netCDF-CF data format is supported by a robust set of well-documented and maintained low-level libraries for creating, maintaining and accessing data in that format on multiple platforms (Linux, Windows). The low-level libraries provided by UNIDATA also offer a clear application programming interface (API). On the server side, the libraries can be used to create and to subset the netCDF data files. On the client side, the libraries allow easy access to the transmitted netCDF contents. Thus, both the data servers and the application developers are enabled by the robust netCDF libraries.

The existing names for atmospheric chemicals in the CF convention were inadequate to accommodate all the parameters used in the HTAP modeling. To remedy this shortcoming, the list of standard names was extended by the HTAP community under the leadership of C. Textor. She also became a member of the CF convention board, which is the custodian of the standard names. The standard names for HTAP models were developed using a collaborative wiki workspace. It should be noted, however, that at this time the CF naming convention has only been developed for the model parameters and not for the various observational parameters. (See Textor, need a better paragraph.) The naming of individual chemical parameters will follow the CF convention used by the Climate and Forecast (CF) communities.

The netCDF-CF data format is most useful for the exchange of multidimensional gridded model data. It has also been demonstrated that the netCDF format is well suited for the encoding and transfer of station monitoring data. Traditionally, satellite data were encoded and transferred using the HDF format. The new netCDF version 4 (beta) library provides a common API for the netCDF and HDF-5 data formats.

Data access through OGC WCS protocol

WCS Protocol.png

The WCS protocol consists of a communications protocol and a data query language. The WCS data access protocol is defined by the international Open Geospatial Consortium (OGC), which is also the key organization responsible for interoperability standards in GEOSS. Since the WCS protocol was originally developed for the GIS community, it was necessary to adapt it to the needs of "Fluid Earth Sciences". The Earth Science Community has actively pursued the adaptation and testing of the WCS interoperability standards, spearheaded by the Unidata-driven GALEON interoperability program.

The WCS communications protocol consists of three service calls: getCapabilities, describeCoverage and getCoverage. The combination of the three services permits the linking of WCS clients and servers using loosely coupled connections consistent with Service Oriented Architecture (SOA). The client first invokes the getCapabilities service, which returns an XML file listing the datasets (coverages) offered by the server. Given the list, the client then selects a particular coverage and issues a describeCoverage request to the server. The returned XML file describes the specific coverage and also contains specific data access instructions for that coverage. The main data query service is getCoverage, in which the user specifies the desired data subset as well as the desired data format. In our case, the return format is netCDF-CF, which can be accessed and manipulated by the client-side netCDF libraries.

The WCS getCoverage service incorporates a data query language to request specific data from the server. The data queries are formulated in terms of physical coordinates, i.e. a space-time query consisting of (1) the geographic bounding box, (2) the time range and (3) the parameter (coverage).
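
The three service calls and the space-time query can be illustrated with a minimal Python sketch. It assumes the key-value-pair GET encoding of WCS 1.0.0; the server address, the coverage name "ozone", the CRS and the format string "NetCDF" are placeholders chosen for illustration, not values taken from an actual HTAP server. A complete request may need further parameters (for example grid size or resolution) depending on the WCS version and the server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the address of a real WCS server.
BASE_URL = "http://example.org/wcs"


def wcs_url(base_url, request, **params):
    """Return a WCS key-value-pair GET URL for one service call."""
    query = {"service": "WCS", "version": "1.0.0", "request": request}
    query.update(params)
    # Keep ',', ':' and '/' unescaped so bbox and time stay readable.
    return base_url + "?" + urlencode(query, safe=",:/")


# Step 1: list the coverages offered by the server.
capabilities_url = wcs_url(BASE_URL, "GetCapabilities")

# Step 2: ask for the description of one coverage (made-up name).
describe_url = wcs_url(BASE_URL, "DescribeCoverage", coverage="ozone")

# Step 3: the space-time query: bounding box, time range and parameter.
bbox = (-180, -90, 180, 90)  # min lon, min lat, max lon, max lat
coverage_url = wcs_url(
    BASE_URL,
    "GetCoverage",
    coverage="ozone",
    crs="EPSG:4326",
    bbox=",".join(str(v) for v in bbox),
    time="2001-06-01T00:00:00Z/2001-06-30T00:00:00Z",
    format="NetCDF",
)

for url in (capabilities_url, describe_url, coverage_url):
    print(url)
```

Because every call is an ordinary URL, any HTTP client can issue it; the returned netCDF-CF file can then be opened with the client-side netCDF libraries.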

Networking using the netCDF-CF and WCS Protocols

The combination of the netCDF-CF format and the WCS protocol offers the means to create agile, loosely coupled data flow networks based on Service Oriented Architecture (SOA). The netCDF-CF data format provides a compact, standards-based, self-describing format for transmitting Earth Science data packets. The OGC WCS protocol supports the Publish, Find and Bind operations required for service oriented architecture. A network of interoperable nodes can be established if each of the nodes complies with the above standard protocols.

An important (though incomplete) initial set of nodes for the HTAP information network already exists, as shown in Figure 3. Each of these nodes is, in effect, a portal to an array of datasets that it exposes through its own interface. Thus, connecting these existing data portals would provide an effective initial approach to incorporating a large fraction of the available observational and model data into the HTAP network. The US nodes DataFed, NEISGEI and Giovanni are already connected through standard (or pseudo-standard) data access services. In other words, data mediated through one of the nodes can be accessed and utilized in a companion node. Similar connectivity is being pursued with the European data portals Juelich, AeroCom, EMEP and others.

HTAP Network.png
Figure 3. Initial HTAP Information Network Configuration

(Here we could say a few words about each of the main provider nodes) Federated Data System DataFed; NASA Data System Giovanni; Emission Data System NEISGEI; Juelich Data System; AeroCom; EMEP.

Design goals of WCS-netCDF Wrapper:

  • Promote WCS as standard interface.
  • Promote NetCDF and CF-1.0 conventions
  • Make it easy to deliver your data via WCS from your own server or workstation.
  • No intrusion: You can have the NetCDF files where you want them to be.
  • A normal PC is enough, no need to have a server.
  • Minimal configuration

The purpose of the wrapper software is to respond to the three HTTP GET service queries. The WCS-netCDF wrapper software was developed for DataFed using Python 2.5, C++, lxml 1.3.6 and the NetCDF 3.6.1 beta 1 libraries. It has been tested on the Windows and Linux operating systems, and public test servers have been prepared for both platforms. It is free and open source, licensed under WTFPL v2.

Wrapper software description and installation

The WCS wrapper for netCDF software has a triple functionality:

  • Accessing netCDF-CF file contents over the HTTP GET Internet protocol
  • Imposing a standard data query language using the WCS standard
  • Allowing easy (non-intrusive) adaptation to evolving standards.
WCS netCDF Wrapper.png

The low-level netCDF libraries provided by UNIDATA provide an Application Programming Interface (API) for netCDF files. These libraries can be called from application programming languages such as Java and C. The API facilitates creating and accessing netCDF data files within an operating environment but not over the Internet. In other words, the API is a standard library interface, not a web service interface to netCDF data.

However, such libraries are not adequate to access user-specified data subsets over the Web. For standardized web-based access, another layer of software is required to connect the high-level user queries to the low-level interface of the netCDF libraries. We call this interface the WCS-netCDF Wrapper.

The main components of the wrapper software are shown schematically in the figure at left. At the lowest level are open source libraries for accessing netCDF and XML files. At the next level are Python scripts for extracting spatially subset slices for specific parameters and times. At the third level is the WCS interpreter, which parses the WCS URL.
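
The layering can be sketched in a few lines of Python. This is an illustration only, not the wrapper's actual code: the function names are hypothetical, the grid is a small in-memory list standing in for a netCDF variable, and the query uses a simplified set of parameters.

```python
from urllib.parse import urlparse, parse_qs


def parse_wcs_query(url):
    """Top layer: turn a WCS GET URL into a dict with lower-case keys."""
    raw = parse_qs(urlparse(url).query)
    return {key.lower(): values[0] for key, values in raw.items()}


def index_range(axis, low, high):
    """Middle layer: slice bounds of the ascending axis values in [low, high]."""
    hits = [i for i, value in enumerate(axis) if low <= value <= high]
    if not hits:
        raise ValueError("requested range lies outside the grid")
    return hits[0], hits[-1] + 1  # half-open bounds


def subset(grid, lats, lons, bbox):
    """Cut the box (min lon, min lat, max lon, max lat) out of a 2-D grid."""
    min_lon, min_lat, max_lon, max_lat = bbox
    j0, j1 = index_range(lats, min_lat, max_lat)
    i0, i1 = index_range(lons, min_lon, max_lon)
    return [row[i0:i1] for row in grid[j0:j1]]


# Lowest layer stand-in: a 3 x 4 lat/lon grid held in memory
# instead of being read through the netCDF library.
lats = [-45.0, 0.0, 45.0]
lons = [-90.0, 0.0, 90.0, 180.0]
grid = [[10 * j + i for i in range(len(lons))] for j in range(len(lats))]

url = ("http://example.org/wcs?service=WCS&request=GetCoverage"
       "&coverage=ozone&bbox=-10,-10,100,50")
params = parse_wcs_query(url)
bbox = tuple(float(v) for v in params["bbox"].split(","))
result = subset(grid, lats, lons, bbox)
print(params["coverage"], result)
```

Keeping the URL parsing separate from the index arithmetic means that a change in the query syntax only touches the top layer, while the subsetting code stays the same.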

The Capabilities and Description files are created automatically from the NetCDF files, but you can provide a template containing information about your organization, contacts and other metadata.

The library also has some useful features for Python programmers who need access to NetCDF files. The datafed.nc3 module provides wrappers for all the essential C calls, and the datafed.iso_time module helps in interpreting ISO 8601 time range encoding.
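
To give an idea of what interpreting an ISO 8601 time range involves, here is a small sketch using only the Python standard library. It does not reproduce the datafed.iso_time interface, which is not described here; the function names are made up, and only the "start/end" form and single instants are handled (an optional third period part is rejected).

```python
from datetime import datetime


def parse_instant(text):
    """Parse one ISO 8601 timestamp; a trailing 'Z' is read as UTC."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def parse_time_range(text):
    """Return (start, end); a single instant yields start == end."""
    parts = text.split("/")
    if len(parts) == 1:
        instant = parse_instant(parts[0])
        return instant, instant
    if len(parts) != 2:
        raise ValueError("only 'start/end' ranges are handled in this sketch")
    start, end = parse_instant(parts[0]), parse_instant(parts[1])
    if end < start:
        raise ValueError("time range ends before it starts")
    return start, end


start, end = parse_time_range("2001-06-01T00:00:00Z/2001-06-30T12:00:00Z")
print(start.isoformat(), end.isoformat())
```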