AQ Community server software

From Earth Science Information Partners (ESIP)
Revision as of 23:57, September 5, 2011 by Rhusar (talk | contribs) (→‎Real Data-to-WCS-Mapping structure)


Session topics: Standards and conventions | Implementation for gridded and station data | Development tools | Server performance

Issues re. the use of netCDF and other data formats

netCDF is a standard format for multi-dimensional data. CF-netCDF is used both as an archival format for gridded data and as a payload format for WCS queries.

  • Issue: ambiguity and completeness of CF
    • Issue: CF (udunits) time format not the same as ISO Time format (as used by WCS)
    • Issue: geo-referencing (also see CF-ML discussion "the need to store lat/lon coordinates in a CF-compliant netCDF file")
    • What is missing in CF?
    • server-independent CF-API package
  • Issue: We should define a standard python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
  • Issue: other output formats
    • support fused into server or add-on concept (possibly using the public W*S/NetCDF interface)
    • Delivery of (small) data sets in ASCII/csv format?
  • Issue: Reading other gridded input data formats? (e.g. GRIB)
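
The mismatch between CF (udunits) time and ISO time noted in the list above can be illustrated with a small conversion. This is only a sketch, not the AComServ implementation: the function name `cf_time_to_iso` is hypothetical, and it assumes the standard Gregorian calendar, a reference time given in UTC, and only the fixed-length units seconds, minutes, hours and days.

```python
from datetime import datetime, timedelta, timezone

# Seconds per supported CF/udunits time unit. Calendar-dependent units
# such as "months" or "years" are deliberately left out of this sketch.
_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def cf_time_to_iso(value, units):
    """Convert a CF time value (e.g. 6 with units
    'hours since 2011-09-05 00:00:00') to an ISO 8601 UTC string."""
    unit, since, reference = units.split(None, 2)
    if since != "since" or unit not in _UNIT_SECONDS:
        raise ValueError("unsupported CF time units: %r" % units)
    # Assumption: the reference time carries no offset and is meant as UTC.
    ref = datetime.fromisoformat(reference.strip()).replace(tzinfo=timezone.utc)
    moment = ref + timedelta(seconds=value * _UNIT_SECONDS[unit])
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(cf_time_to_iso(6, "hours since 2011-09-05 00:00:00"))
```

A server-independent CF package would also have to handle non-standard calendars and time zone offsets in the reference time, which this sketch ignores.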

Data server performance issues/solutions

  • Issue: especially big datasets take a long time to prepare for delivery (slicing/subsetting, etc.)
    • direct streaming of datasets to the client could be part of the solution
    • generated datasets could be cached for a while, so they could be delivered again when there is a request with compatible parameters
    • problem: both proposals might be mutually exclusive to some degree
  • Issue: XML metadata assembly might take a long time depending on the catalogue content, e.g. with a large number of identifiers
    • GetCapabilities response Metadata is very static anyway, other responses (DescribeCoverage) could be cached for a while
      • attention: DescribeCoverage response depends on parameters
  • Issue: management overhead when opening NetCDF
    • when opening a NetCDF file, some metadata has to be read and data structures have to be set up
      • input files could be kept open for a while to avoid this overhead
  • Issue: temp file space is limited on WCS server
    • streaming approach for store=false parameter would not require additional local storage
    • temp file approach for store=true parameter could be limited by a maximum dataset size
      • requires a reliable output file size estimator
      • server would return an exception if estimated size is over given threshold
      • would force people to use store=false for large datasets
      • should not violate WCS 1.1 standard (too badly) as only store=false is mandatory
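
The output file size estimator and threshold check proposed for store=true requests could look roughly like the following. It is a minimal sketch under stated assumptions: the output is uncompressed netCDF, all fields share one data type and shape, and the names `estimate_bytes`, `check_store_request`, `CoverageTooLarge` and the 4096-byte header allowance are hypothetical, not part of AComServ or the WCS standard.

```python
from math import prod  # Python 3.8+

# Bytes per value for the netCDF classic data types.
_TYPE_BYTES = {"byte": 1, "char": 1, "short": 2, "int": 4, "float": 4, "double": 8}


class CoverageTooLarge(Exception):
    """Raised when a store=true request exceeds the temp file limit."""


def estimate_bytes(shape, dtype="float", n_fields=1, header_bytes=4096):
    """Rough size of an uncompressed subset: number of grid cells times
    bytes per value times number of fields, plus an assumed header allowance."""
    return header_bytes + n_fields * prod(shape) * _TYPE_BYTES[dtype]


def check_store_request(shape, store, max_bytes, **kwargs):
    """Return the estimated size; refuse store=true requests over the limit."""
    size = estimate_bytes(shape, **kwargs)
    if store and size > max_bytes:
        raise CoverageTooLarge(
            "estimated %d bytes exceeds limit of %d; retry with store=false"
            % (size, max_bytes)
        )
    return size


# 24 time steps on a 180 x 360 grid of floats, streamed (store=false).
print(check_store_request((24, 180, 360), store=False, max_bytes=10**6))
```

Requests with store=false pass through unchanged, which matches the idea of steering large requests towards streaming.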

Data Servers: Technical Realization (IT) Issues and Solutions

  • which W*S protocol for which purpose, how to combine?
    • WMS for display/preview of spatial data
    • WFS .. for station description/spatial metadata?
    • WCS for "everything else"? (gridded ("raw") datasets)
  • WCS Data structure hierarchy: DataHub; Service; Coverage; Field; Flag
    • WCS 1.1 terminology: Service->Group of similar datasets; Coverage->Dataset; Field->Parameter; Flag->Flag

Gridded data service through WCS

WCS exists in multiple versions: 1.0, 1.1.2, 2.0. The AQ Community Server (AComServ) is currently implemented using WCS 1.1.2. This generally works well.
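
For orientation, a WCS 1.1.2 GetCoverage request in key-value-pair form can be assembled as sketched below. The helper name `get_coverage_url`, the server address and the coverage identifier are hypothetical, and the default format string `application/x-netcdf` is an assumption about what a given server advertises; check the server's DescribeCoverage response for the supported formats and CRS.

```python
from urllib.parse import urlencode


def get_coverage_url(base_url, identifier, bbox, time_sequence,
                     fmt="application/x-netcdf", store=False):
    """Build a WCS 1.1.2 KVP GetCoverage URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in the WGS 84 lon/lat CRS.
    """
    params = [
        ("service", "WCS"),
        ("version", "1.1.2"),
        ("request", "GetCoverage"),
        ("identifier", identifier),
        ("BoundingBox",
         ",".join(str(v) for v in bbox) + ",urn:ogc:def:crs:OGC:2:84"),
        ("TimeSequence", time_sequence),
        ("format", fmt),
        ("store", "true" if store else "false"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical server and coverage, for illustration only.
print(get_coverage_url("http://example.org/wcs", "example_coverage",
                       (-130, 20, -60, 55), "2011-09-05T00:00:00Z"))
```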

  • Issue: Extraction of vertical levels
  • Issue: current state of WCS 2.0?
    • core released, but extensions still in draft (how do we know what is valid?)
  • Issue: serve "virtual" WCS datasets with continuous time line assembled from many source files
    • create a "wrapper" module that can handle such cases?
    • Kari has already done something like this for HTAP datasets, this could be a starting point
  • Issue: desirable time filtering options in WCS: hour of day, day of week, day of month, etc.
    • Kari has already created such filters, but so far they are outside the standard
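
The time filtering options listed above (hour of day, day of week, day of month) can be sketched as a plain filter over the time axis of a coverage. This is an illustration only, not Kari's existing filters: the function name `filter_times` and its parameter names are hypothetical, and days of the week are assumed to follow the ISO numbering (Monday = 1, Sunday = 7).

```python
from datetime import datetime, timedelta


def filter_times(times, hours_of_day=None, days_of_week=None, days_of_month=None):
    """Keep only the datetimes that match every filter that is given.

    Each filter is a set of allowed values; None means "no restriction".
    """
    def keep(t):
        return ((hours_of_day is None or t.hour in hours_of_day)
                and (days_of_week is None or t.isoweekday() in days_of_week)
                and (days_of_month is None or t.day in days_of_month))
    return [t for t in times if keep(t)]


# One week of hourly time steps starting on 2011-09-05.
start = datetime(2011, 9, 5)
hourly = [start + timedelta(hours=h) for h in range(7 * 24)]

# Noon values on weekends only.
for t in filter_times(hourly, hours_of_day={12}, days_of_week={6, 7}):
    print(t.isoformat())
```

The selected time steps would then be used to index the time dimension of the source data before delivery.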

Delivery of station-point data

  • Issue: use WCS, WFS, or a combination of both? If combined, in which way?

Access rights

  • Issue: technical options to restrict access to datasets?

Server co-development tools, methods

Server code is maintained through SourceForge; Darcs code repositories are available at WUSTL and in Juelich.

  • Issues: Platform independence (netcdf interface), Documentation

Relationship to non-AComServ (non-NetCDF) WCS servers

  • data format(s)
    • many WCS clients don't understand NetCDF
  • Issue: protocol compatibility
    • might need to implement more optional features of WCS
  • standard compliance
    • will need a test suite for 1.1.2 (and manage to run it)

Real Data-to-WCS 1.1.2 Mapping

  • Data hub that exposes the data ==> Provider ==> WCS Service
  • Observation platform or network ==> Dataset ==> WCS Coverage
  • Observation parameter/variable ==> Parameter ==> WCS Field
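
The mapping above can be sketched as a small nested data structure. The class names `WcsService`, `WcsCoverage` and `WcsField`, the `add` helper and the example identifiers are hypothetical and only illustrate the hierarchy; they are not taken from the AComServ code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WcsField:
    """Observation parameter/variable, exposed as a WCS Field."""
    identifier: str
    units: str = ""


@dataclass
class WcsCoverage:
    """Observation platform or network (dataset), exposed as a WCS Coverage."""
    identifier: str
    fields: Dict[str, WcsField] = field(default_factory=dict)


@dataclass
class WcsService:
    """Data hub (provider), exposed as a WCS Service."""
    identifier: str
    coverages: Dict[str, WcsCoverage] = field(default_factory=dict)

    def add(self, coverage_id, field_id, units=""):
        """Register a parameter under a dataset, creating the dataset if needed."""
        coverage = self.coverages.setdefault(coverage_id, WcsCoverage(coverage_id))
        coverage.fields[field_id] = WcsField(field_id, units)
        return coverage.fields[field_id]


# Hypothetical hub with one network and two parameters.
hub = WcsService("example_hub")
hub.add("example_network", "pm25", "ug/m3")
hub.add("example_network", "ozone", "ppb")
print(sorted(hub.coverages["example_network"].fields))
```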