Michael Decker (MDecker)
-- Michael Decker (MDecker) 10:14, 23 August 2011 (MDT)
Data Servers: Technical Realization (IT) Issues and Solutions --- Summary
Issues re. the use of netCDF and other data formats
netCDF is a standard format for multi-dimensional data. CF-netCDF is used both as an archival format for gridded data and as a payload format for WCS query responses.
- Issue: ambiguity and completeness of CF
- development of a server-independent (Python) CF-API library
- some (beta) code is available at FZJ: CommonUtils.cf_netcdf (suggestions for a better name are welcome)
- Brainstorming: What is missing in CF?
- Issue: the CF (udunits) time format differs from the ISO 8601 time format used by WCS
- the two cases can be handled by separate code paths, but a uniform format would be less confusing
- could try to get an ISO time recommendation into CF; separate code would still be needed, because it would only be a recommendation
- Issue: geo-referencing
- CF supports projections (through grid mapping attributes), but the WCS server does not use them so far
- the CF-Metadata list recently discussed how projections should be handled and specified; see the thread http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2011/007935.html
- Issue: we should define a standard NetCDF Python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
- Issue: other output formats
- support built into the server, or an add-on concept (possibly using the public W*S/NetCDF interface)
- delivery of (small) data sets in ASCII/CSV format?
- Issue: reading other gridded input data formats (e.g. GRIB)?
- Issue: traceability and revision tracking of datasets (in WCS metadata as well as in NetCDF metadata)
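For the CF (udunits) vs. ISO 8601 time issue, a minimal sketch of the two conversion paths using only the Python standard library. It assumes a proleptic Gregorian calendar, UTC reference times and simple "<unit> since <date>" units strings; other CF calendars, month/year units and time zone offsets are not handled (a full CF time library such as cftime would be needed for those). The function names are hypothetical and not part of CommonUtils.cf_netcdf.

```python
import re
from datetime import datetime, timedelta, timezone

# Length of each udunits-style time unit in seconds. Months and years are
# deliberately missing: their length depends on the calendar.
UNIT_SECONDS = {
    "second": 1, "seconds": 1, "s": 1,
    "minute": 60, "minutes": 60, "min": 60,
    "hour": 3600, "hours": 3600, "h": 3600,
    "day": 86400, "days": 86400, "d": 86400,
}

# "<unit> since YYYY-MM-DD[ HH:MM[:SS]]"; anything after that (e.g. a time zone) is ignored.
UNITS_RE = re.compile(
    r"\s*(\w+)\s+since\s+(\d{4}-\d{1,2}-\d{1,2})(?:[ T](\d{1,2}:\d{2}(?::\d{2})?))?"
)


def parse_cf_units(units):
    """Return (seconds per unit, reference time as UTC datetime) for a CF time units string."""
    match = UNITS_RE.match(units)
    if match is None or match.group(1).lower() not in UNIT_SECONDS:
        raise ValueError("unsupported CF time units: %r" % units)
    unit, day, clock = match.groups()
    clock = clock or "00:00:00"
    if clock.count(":") == 1:
        clock += ":00"
    origin = datetime.strptime(day + " " + clock, "%Y-%m-%d %H:%M:%S")
    return UNIT_SECONDS[unit.lower()], origin.replace(tzinfo=timezone.utc)


def cf_to_iso(value, units):
    """CF numeric time value -> ISO 8601 string as used in WCS requests."""
    factor, origin = parse_cf_units(units)
    return (origin + timedelta(seconds=value * factor)).strftime("%Y-%m-%dT%H:%M:%SZ")


def iso_to_cf(stamp, units):
    """ISO 8601 string (YYYY-MM-DDTHH:MM:SSZ) -> CF numeric time value."""
    factor, origin = parse_cf_units(units)
    moment = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (moment - origin).total_seconds() / factor
```

For example, the CF value 30 with units "hours since 2011-08-22 00:00:00" corresponds to the WCS time string 2011-08-23T06:00:00Z.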
Server co-development tools, methods
Server code is maintained on SourceForge (bug tracker, tarballs); Darcs code repositories are available at WUSTL and in Juelich.
- Issue: Version control
- maintain a common codebase
- Issue: Documentation
- so far, mainly inline technical documentation
- more documentation on operating the server is needed
- proposal: use Sphinx for proper documentation (the inline technical documentation is easy to include)
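For the Sphinx proposal: Sphinx's autodoc extension imports Python modules and renders their docstrings, so the existing inline documentation can be reused while operational documentation is written as separate pages. A minimal conf.py sketch; the docs/ directory layout is an assumption.

```python
# conf.py: minimal Sphinx configuration sketch (directory layout is an assumption)
import os
import sys

# Assumes conf.py lives in docs/ one level below the package root, so that
# autodoc can import modules such as CommonUtils.cf_netcdf.
sys.path.insert(0, os.path.abspath(".."))

project = "AComServ"
extensions = [
    "sphinx.ext.autodoc",   # pulls the existing inline docstrings into the manual
    "sphinx.ext.viewcode",  # adds links from the docs to the highlighted source
]

# An .rst page then only needs a directive such as:
#   .. automodule:: CommonUtils.cf_netcdf
#      :members:
```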
Use of WMS, WCS, WFS, ... in combination?
Data display/preview is done through WMS. AQ data can be delivered through WCS and WFS. In AComServ, WCS is used to transfer n-dimensional grid and point-station data, and WFS to deliver monitoring station descriptions.
- Issue: WMS interface for preview; "latest" token for dynamic links?
- generic WMS service operating on an external WCS
- the "latest" token could be realized at the WCS-WMS interface: the WMS (or the client) uses the WCS metadata to find and request the latest time; the latest time could also be the WMS server's default response if no time is requested
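A minimal sketch of how the WCS-WMS interface could resolve the "latest" token, assuming the available time steps have already been read from the WCS metadata as ISO 8601 strings (YYYY-MM-DDTHH:MM:SSZ). The function name and the exact-match rule for all other values are assumptions.

```python
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%SZ"


def resolve_time(requested, available):
    """Map a WMS TIME value onto one of the time steps the WCS advertises.

    'latest', or no TIME parameter at all, selects the most recent step;
    any other value must match an advertised step exactly.
    """
    if not available:
        raise ValueError("coverage has no time steps")
    if requested is None or requested.strip().lower() == "latest":
        # Compare as datetimes rather than relying on string order.
        return max(available, key=lambda stamp: datetime.strptime(stamp, ISO))
    if requested not in available:
        raise ValueError("time %r not available" % requested)
    return requested
```

Because the token is resolved on every request, a dynamic link such as a preview URL containing TIME=latest keeps pointing at the newest data.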
Gridded data service through WCS
WCS exists in multiple versions: 1.0, 1.1.2 and 2.0. The AQ Community Server (AComServ) is currently implemented using WCS 1.1.2, which generally works well.
- Issue: serve "virtual" WCS datasets with a continuous time line assembled from many source files
- clients should only need one query to receive the whole time series in one piece, instead of needing client-side logic to request multiple pieces
- could create a "wrapper" module that handles such cases with knowledge of the server-side file structure
- Kari has already done something like this for the HTAP datasets; this could be a starting point
- Issue: desirable time filtering options in WCS: hour of day, day of week, day of month, etc.
- Issue: Extraction of vertical levels
- already defined through the RangeSubset/FieldSubset parameter
- is this definition OK for us, or do we need something else/better?
- potential problem: only an enumeration of levels is possible, no ranges
- Issue: current state of WCS 2.0? The core is released, but the extensions are still drafts (how do we keep track of what is currently valid?)
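For the "virtual" dataset wrapper, a minimal sketch of the server-side mapping from one identifier to the source files needed for a requested time range. It assumes one file per day and a strftime-style naming pattern; the catalog dict, identifier and paths are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical catalog entry: one virtual identifier backed by daily files.
VIRTUAL_DATASETS = {"example_virtual_id": "/data/example/%Y/example_%Y%m%d.nc"}


def files_for_range(identifier, start, end):
    """List the daily source files that together cover [start, end], inclusive."""
    if end < start:
        raise ValueError("end date before start date")
    pattern = VIRTUAL_DATASETS[identifier]
    files = []
    day = start
    while day <= end:
        files.append(day.strftime(pattern))
        day += timedelta(days=1)
    return files
```

The wrapper would then read the requested time slices from each file in order and concatenate them along the time axis before encoding a single response, so the client sends only one query. For files in netCDF-3 or netCDF-4 classic format, netCDF4-python's MFDataset can perform this aggregation along the unlimited dimension.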
Delivery of station-point data
- Issue: use WCS or WFS, Combination of both?
Access rights
- Issue: technical options to restrict access to datasets?
Data server performance issues/solutions?
Define performance issues, measurements, ideas
- Issue: large datasets in particular take a long time to prepare for delivery (slicing/subsetting, etc.)
- direct streaming of datasets to the client could be part of the solution
- generated datasets could be cached for a while, so they can be delivered again for a later request with compatible parameters
- problem: the two proposals may be mutually exclusive to some degree (a streamed dataset is not stored, so there is nothing to cache)
- Issue: management overhead when opening NetCDF
- when opening a NetCDF file, some metadata has to be read and data structures have to be set up
- input files could be kept open for a while to avoid this overhead
- Issue: temporary file space is limited on the WCS server
- the streaming approach for store=false would not require additional local storage
- the temp file approach for store=true could be limited by a maximum dataset size
- requires a reliable output file size estimator
- the server would return an exception if the estimated size exceeds a given threshold
- this would force people to use store=false for large datasets
- should not violate the WCS 1.1 standard (too badly), since only store=false is mandatory
- Issue: XML metadata assembly may take a long time depending on the catalog content, e.g. with many identifiers
- the GetCapabilities metadata is largely static anyway; other responses (DescribeCoverage) could be cached for a while
- caveat: the DescribeCoverage response depends on the request parameters
- minor issue compared to actual data delivery performance
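For caching DescribeCoverage responses (or paths of generated store=true datasets), a minimal sketch of a time-limited cache whose key is built from the request parameters, since the response depends on them. The class name and the default lifetime are hypothetical; streamed responses would bypass it.

```python
import time


class ResponseCache:
    """Keep generated responses for a limited time, keyed by request parameters."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}

    @staticmethod
    def key(params):
        # OGC KVP parameter names are case-insensitive and unordered, so the key
        # lower-cases the names and sorts the pairs; values are kept as given.
        return tuple(sorted((name.lower(), value) for name, value in params.items()))

    def get(self, params):
        """Return the cached response, or None if missing or expired."""
        key = self.key(params)
        entry = self._entries.get(key)
        if entry is None:
            return None
        created, response = entry
        if self.clock() - created > self.ttl_seconds:
            del self._entries[key]
            return None
        return response

    def put(self, params, response):
        self._entries[self.key(params)] = (self.clock(), response)
```

The injectable clock makes expiry easy to test; a long lifetime suits the mostly static GetCapabilities document, a shorter one the parameter-dependent responses.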
Relationship to non-AComServ (non-NetCDF) WCS servers
- data format(s)
- most WCS clients don't understand NetCDF
- Issue: protocol compatibility
- might need to implement more optional features of WCS
- standard compliance
- will need a test suite for 1.1.2 (and manage to run it)
Linkages to non-WCS servers
- Issue: is there a need?
- Issue: which protocols? (OpenDAP?, GIS servers?)
-- Michael Decker (MDecker) 10:03, 23 August 2011 (MDT)
CF-API:
- need to read and write CF-compliant files easily
- add a python interface to ucar libcf? http://www.unidata.ucar.edu/software/libcf/
Performance/Virtual Datasets
- non-compressed data preferred
- many files vs. single file for queries
- mapping: many files -> single identifier
- Kari: might be too slow
- Michael: should not matter much for performance
- queries might get very large
- query size needs to be limited on the server side (the DataFed browser currently handles this on the client side)
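For a server-side query size limit (and the output size estimator needed for the store=true threshold), a minimal sketch that estimates the uncompressed payload from the subset shape. It ignores header and coordinate-variable overhead, so it slightly underestimates; the 500 MiB default limit and the function names are hypothetical.

```python
import math

# Bytes per value for the netCDF classic numeric types (uncompressed).
BYTES_PER_VALUE = {"byte": 1, "short": 2, "int": 4, "float": 4, "double": 8}


class QueryTooLarge(Exception):
    """Raised instead of producing an oversized response."""


def estimate_bytes(shape, dtype="float", n_fields=1):
    """Uncompressed payload size of a subset: values x bytes per value x fields."""
    return math.prod(shape) * BYTES_PER_VALUE[dtype] * n_fields


def check_query(shape, dtype="float", n_fields=1, limit=500 * 1024 ** 2):
    """Return the estimated size, or raise QueryTooLarge if it exceeds the limit."""
    size = estimate_bytes(shape, dtype, n_fields)
    if size > limit:
        raise QueryTooLarge("estimated %d bytes exceeds the limit of %d bytes" % (size, limit))
    return size
```

For example, a subset of 24 times x 10 levels x 180 x 360 grid points of float data is estimated at 62,208,000 bytes, which the server can check before doing any slicing.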
Common NetCDF Python Interface, NetCDF4
- Kari cloned the PyNIO interface for Windows, so cross-platform development is not a problem right now
- solve other problems first, but keep an eye on it
- NetCDF4 makes things more complicated and might not map easily to WCS
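To keep the choice of NetCDF Python interface open, server code could go through one small adapter. A sketch assuming only the subset shared by netCDF4-python's Dataset and SciPy's scipy.io.netcdf_file is used (the variables mapping, slicing, close()); PyNIO could be added as a further fallback. The function name is hypothetical.

```python
def open_netcdf(path):
    """Open a netCDF file read-only with whichever backend is installed.

    Tries netCDF4-python first (reads netCDF-3 and netCDF-4), then falls
    back to scipy.io.netcdf_file (netCDF-3 classic files only).
    """
    try:
        from netCDF4 import Dataset
    except ImportError:
        pass
    else:
        return Dataset(path, "r")
    try:
        from scipy.io import netcdf_file
    except ImportError:
        raise ImportError("neither netCDF4 nor scipy is installed")
    return netcdf_file(path, "r")
```

Callers would then restrict themselves to, e.g., ds.variables["name"][:] and ds.close(), since attribute access differs between the two backends.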
Delivery of other data formats, other input formats
- need to map other formats to WCS and/or CF concept
- differentiate between format (NetCDF) and convention (CF)
- chain with WMS server for default views/previews
Traceability and revision tracking of datasets
- when dealing with real-time data, always try to get current data, but always expect your data to be old
- would be nice to have WCS field for "last updated" date, same for NetCDF/CF (global attribute?)
- we can make something up on our own for a start
- try to propose that for CF (and WCS)
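Until a field is agreed on, a minimal sketch of stamping a dataset's global attributes on every update. last_updated is a hypothetical attribute name of our own; history is the existing CF/NUG audit-trail attribute, to which new entries are appended. The attributes are handled as a plain dict, independent of the netCDF library used to write them.

```python
from datetime import datetime, timezone


def stamp_update(global_attrs, note, now=None):
    """Record a dataset update in a dict of netCDF global attributes."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Hypothetical "last updated" field until CF/WCS define one.
    global_attrs["last_updated"] = stamp
    # Append a timestamped line to the free-text 'history' audit trail.
    entry = "%s: %s" % (stamp, note)
    history = global_attrs.get("history", "")
    global_attrs["history"] = entry if not history else history + "\n" + entry
    return global_attrs
```

The same timestamp could be copied into the WCS metadata, so both places report the same revision.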
Delivery of Point Station data
- put configuration into the SQL database as much as possible (views, stored procedures, etc.)
- try to maintain unit tests for this
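A minimal sketch of keeping station selection logic in the database and covering it with a unit test. It uses SQLite from the standard library so it is self-contained; SQLite has views but no stored procedures, so only a view is shown. The schema, view name and station rows are hypothetical.

```python
import sqlite3
import unittest

# Hypothetical schema: station metadata lives in the database, and the
# service only ever reads the view, so selection rules can change in SQL.
SCHEMA = """
CREATE TABLE station (
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    lat REAL NOT NULL,
    lon REAL NOT NULL,
    active INTEGER NOT NULL DEFAULT 1
);
CREATE VIEW station_description AS
    SELECT code, name, lat, lon FROM station WHERE active = 1;
"""


def make_db(path=":memory:"):
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def station_descriptions(con):
    """Rows the WFS would turn into station descriptions."""
    return con.execute(
        "SELECT code, name, lat, lon FROM station_description ORDER BY code"
    ).fetchall()


class StationViewTest(unittest.TestCase):
    def test_only_active_stations_are_published(self):
        con = make_db()
        con.executemany(
            "INSERT INTO station VALUES (?, ?, ?, ?, ?)",
            [("A1", "Alpha", 50.9, 6.4, 1), ("B2", "Beta", 38.6, -90.3, 0)],
        )
        self.assertEqual(station_descriptions(con), [("A1", "Alpha", 50.9, 6.4)])


if __name__ == "__main__":
    unittest.main()
```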
Access restrictions to WCS
- HTTP Basic authentication
- API key
- does not have to be 100% secure, more about connecting with the users, knowing who they are
- firewalling for small user groups
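A minimal WSGI middleware sketch combining the two options: an API key passed as a query parameter, or HTTP Basic credentials. The parameter name apikey, the in-memory credential tables and the function names are hypothetical. Basic credentials are only base64-encoded, not encrypted, which fits the "does not have to be 100% secure" goal, but HTTPS should be used where possible.

```python
import base64
import hmac
from urllib.parse import parse_qs

# Hypothetical credential tables; in practice these would come from configuration.
API_KEYS = {"demo-key-123": "alice"}
USERS = {"bob": "s3cret"}


def identify_user(environ):
    """Return the user name for a WSGI request, or None if unauthenticated."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    for key in query.get("apikey", []):
        if key in API_KEYS:
            return API_KEYS[key]
    auth = environ.get("HTTP_AUTHORIZATION", "")
    if auth.startswith("Basic "):
        try:
            decoded = base64.b64decode(auth[6:]).decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            return None
        user, _, password = decoded.partition(":")
        expected = USERS.get(user)
        # Constant-time comparison on bytes avoids leaking password prefixes.
        if expected is not None and hmac.compare_digest(expected.encode(), password.encode()):
            return user
    return None


def require_user(app):
    """WSGI middleware: reject unauthenticated requests with 401."""
    def wrapper(environ, start_response):
        user = identify_user(environ)
        if user is None:
            start_response("401 Unauthorized", [
                ("WWW-Authenticate", 'Basic realm="WCS"'),
                ("Content-Type", "text/plain"),
            ])
            return [b"authentication required\n"]
        environ["REMOTE_USER"] = user  # lets the wrapped service log who asked
        return app(environ, start_response)
    return wrapper
```

Since the wrapped service sees REMOTE_USER, it can log requests per user, which covers the "knowing who they are" goal.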
Relationship with other Servers
- write a wrapper for other data formats
WCS 2.0
- more modular, core and extensions
- potentially easier to use/implement
- CF-NetCDF extension coming
Processing Services
- the community provides an online processing service for its discipline, e.g. averaging
- not part of W*S, but separate service
- protocol: web processing service http://www.opengeospatial.org/standards/wps
Time filtering
- day of week, hour of day, day of month,...
- describe non-standard features in capabilities document?
- might be difficult to get into official standard?
- does not interfere with standard if you don't use it
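A minimal sketch of the hour-of-day, day-of-week and day-of-month filters applied to a coverage's ISO 8601 time steps. How the filters would be exposed (e.g. as vendor-specific request parameters advertised in the capabilities document) is left open; the function name and the Monday == 0 weekday convention (Python's) are assumptions.

```python
from datetime import datetime


def filter_times(stamps, hours=None, weekdays=None, days=None):
    """Keep ISO 8601 time steps matching the given filters.

    hours: hours of day (0-23); weekdays: Monday == 0 ... Sunday == 6;
    days: days of month (1-31). A filter set to None is not applied.
    """
    kept = []
    for stamp in stamps:
        t = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        if hours is not None and t.hour not in hours:
            continue
        if weekdays is not None and t.weekday() not in weekdays:
            continue
        if days is not None and t.day not in days:
            continue
        kept.append(stamp)
    return kept
```

Filters combine with AND, so hours={12} with weekdays={5, 6} selects weekend noon values only; a request that sets none of them gets the unfiltered time series, which is why the extension does not interfere with standard requests.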