− | == -- [[User:MDecker|MDecker]] 10:14, 23 August 2011 (MDT) ==
| |
| | | |
− | == Data Servers: Technical Realization (IT) Issues and Solutions --- Summary ==
| |
− | ===Issues re. the use of netCDF and other data formats===
| |
− | netCDF is a standard format for multi-dimensional data. CF-netCDF is used both as an archival format for gridded data and as a payload format for WCS queries.
| |
− | * Issue: ambiguity and completeness of CF
| |
− | ** ''development of a server independent (python) CF-API library''
| |
− | *** some (beta) code available at FZJ: [http://repositories.icg.kfa-juelich.de/hg/CommonUtils/file/faad03a63f98/CommonUtils/cf_netcdf.py CommonUtils.cf_netcdf], feel free to suggest a better name
| |
− | ** Brainstorming: What is missing in CF?
| |
− | ** Issue: CF (udunits) time format not the same as ISO Time format (as used by WCS)
| |
− | *** those two cases can be processed with different code, but uniformity would be less confusing
| |
− | *** could try to get ISO time recommendation into CF; would still need different code because it's only a recommendation
| |
− | ** Issue: geo-referencing
| |
− | *** CF offers support for projections (see [http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#grid-mappings-and-projections here]), but they are not used by the WCS Server so far
| |
− | *** CF-Metadata List had some discussions about handling and specification of projections recently, see thread http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2011/007935.html
| |
− |
| |
− | * Issue: We should agree on a standard NetCDF Python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
| |
− | * Issue: other output formats
| |
− | ** support fused into server or add-on concept (possibly using the public W*S/NetCDF interface)
| |
− | ** Delivery of (small) data sets in ASCII/csv format?
| |
− | * Issue: Reading other gridded input data formats? (e.g. GRIB)
| |
− | * Issue: traceability and revision tracking of datasets (in WCS metadata as well as in NetCDF metadata)
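The mismatch between CF (udunits) time values and the ISO time strings used by WCS can be illustrated with a small conversion helper. This is only a minimal sketch: the function name <code>cf_time_to_iso</code> is made up for illustration, the reference time is assumed to be UTC in the standard calendar, and only fixed-length units (seconds, minutes, hours, days) are handled.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per supported time unit. Months and years are left out on purpose
# because their length is not fixed.
_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

# Matches strings such as "hours since 2011-08-22 00:00:00" or "days since 2000-01-01".
_UNITS_RE = re.compile(
    r"^(?P<unit>\w+) since (?P<date>\d{4}-\d{1,2}-\d{1,2})"
    r"(?:[ T](?P<time>\d{1,2}:\d{2}(?::\d{2})?))?$"
)


def cf_time_to_iso(offset, units):
    """Convert a CF time offset plus its units string to an ISO 8601 UTC string.

    Hypothetical helper: assumes a UTC reference time and the standard calendar.
    """
    m = _UNITS_RE.match(units.strip())
    if m is None or m.group("unit") not in _UNIT_SECONDS:
        raise ValueError("unsupported CF time units: %r" % units)
    year, month, day = (int(p) for p in m.group("date").split("-"))
    parts = [int(p) for p in (m.group("time") or "0:00").split(":")]
    second = parts[2] if len(parts) > 2 else 0
    ref = datetime(year, month, day, parts[0], parts[1], second, tzinfo=timezone.utc)
    stamp = ref + timedelta(seconds=offset * _UNIT_SECONDS[m.group("unit")])
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, <code>cf_time_to_iso(24, "hours since 2011-08-22 00:00:00")</code> yields the ISO string for midnight on 23 August 2011. Non-standard CF calendars (noleap, 360_day) would need separate handling.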
| |
− |
| |
− | ===Server co-development tools, methods ===
| |
− | Server code is maintained through SourceForge (bug tracker, tarballs); Darcs code repositories are available at WUSTL and in Juelich.
| |
− | * Issue: Version control
| |
− | ** maintain a common codebase
| |
− | * Issue: Documentation
| |
− | ** mainly inline tech documentation to date
| |
− | ** need more documentation regarding operation
| |
− | ** proposal: [http://sphinx.pocoo.org/ sphinx] for proper documentation (easy to include inline tech doc)
| |
− |
| |
− | ===Use of WMS, WCS, WFS .. in combination?===
| |
− | Data display/preview is through WMS. AQ data can be delivered through WCS and WFS. In AComServ, WCS is used for transferring n-dimensional grid and point-station data; WFS is used for delivering monitoring station descriptions.
| |
− | * Issue: WMS interface for preview; "latest" token for dynamic links?
| |
− | ** generic WMS service operating on an external WCS
| |
− | ** "latest" token could be realized on WCS-WMS interface by using the metadata on the WMS/client side and requesting the latest time; latest time could be default response of WMS server if nothing else requested
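A minimal sketch of how a "latest" token might be resolved on the WMS/client side from the time values listed in the coverage metadata. The function name and the convention that a missing time parameter also means "latest" are assumptions for illustration; the time values are assumed to be ISO 8601 UTC strings in one uniform format, so that string order equals chronological order.

```python
def resolve_time(requested, available_times):
    """Map a requested time (or the token "latest") to a concrete time value.

    available_times: ISO 8601 UTC strings taken from the coverage metadata,
    all in the same format so they sort chronologically as plain strings.
    """
    if not available_times:
        raise ValueError("coverage advertises no time values")
    # No time given, or the "latest" token: fall back to the newest time step.
    if requested is None or requested.lower() == "latest":
        return max(available_times)
    if requested not in available_times:
        raise ValueError("time %r not available" % requested)
    return requested
```

The resolved value would then be placed into the actual WCS request, so the WCS itself never sees the non-standard token.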
| |
− |
| |
− | ===Gridded data service through WCS===
| |
− | WCS exists in multiple versions: 1.0, 1.1.2, 2.0. The AQ Community Server (AComServ) is now implemented using WCS 1.1.2.
| |
− | This generally works well.
| |
− | * Issue: serve "virtual" WCS datasets with continuous time line assembled from many source files
| |
− | ** clients should only have to issue one query to receive the whole time series in one piece, instead of needing client-side logic to request multiple pieces
| |
− | ** could create a "wrapper" module that can handle such cases with knowledge of the server-side file structure
| |
− | *** Kari has already done something like this for HTAP datasets; this could be a starting point
| |
− | * Issue: desirable time filtering options in WCS: hour of day, day of week, day of month, etc.
| |
− | * Issue: Extraction of vertical levels
| |
− | ** already defined through the rangesubset/fieldsubset parameter
| |
− | ** is this definition OK for us or would we need something else/better?
| |
− | *** potential problem: only enumeration of levels possible, no ranges
| |
− | * Issue: current state of WCS 2.0? core released, but extensions still in draft (how do we know/keep track of what is currently valid?)
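The "wrapper" idea for virtual datasets can be sketched as a lookup that maps one requested time range to the source files that cover it. This is not the existing HTAP code; the catalog layout (a list of <code>(first_time, last_time, path)</code> tuples) and the function name are assumptions made for illustration.

```python
from datetime import datetime


def files_for_range(catalog, start, end):
    """Return the paths of all source files that overlap [start, end].

    catalog: list of (first_time, last_time, path) tuples, one per source file
    (hypothetical layout). The result is ordered by the files' first time step.
    """
    # Two closed intervals overlap if each one starts before the other ends.
    hits = [entry for entry in catalog if entry[0] <= end and entry[1] >= start]
    return [path for _, _, path in sorted(hits)]


# Example: one file per month behind a single coverage identifier.
catalog = [
    (datetime(2011, 8, 1), datetime(2011, 8, 31, 23), "o3_201108.nc"),
    (datetime(2011, 6, 1), datetime(2011, 6, 30, 23), "o3_201106.nc"),
    (datetime(2011, 7, 1), datetime(2011, 7, 31, 23), "o3_201107.nc"),
]
selected = files_for_range(catalog, datetime(2011, 7, 15), datetime(2011, 8, 5))
```

The server would then read the requested subset from each selected file in order and concatenate along the time axis, so the client sees one continuous series.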
| |
− |
| |
− | ===Delivery of station-point data===
| |
− | * Issue: use WCS or WFS, or a combination of both?
| |
− |
| |
− | === Access rights ===
| |
− | * Issue: technical options to restrict access to datasets?
| |
− |
| |
− | ===Data server performance issues/solutions? ===
| |
− | Define performance issues, measurements, ideas
| |
− |
| |
− | * Issue: especially big datasets take a long time to prepare for delivery (slicing/subsetting, etc.)
| |
− | ** direct streaming of datasets to the client could be part of the solution, [[Streaming_and_or_netCDF_File|click here]] for details
| |
− | ** generated datasets could be cached for a while, so they could be delivered again when there is a request with compatible parameters
| |
− | ** problem: both proposals might be mutually exclusive to some degree
| |
− | * Issue: management overhead when opening NetCDF
| |
− | ** when opening a NetCDF file, some metadata has to be read and data structures have to be set up
| |
− | *** input files could be kept open for a while to avoid this overhead
| |
− | * Issue: temp file space is limited on WCS server
| |
− | ** streaming approach for store=false parameter would not require additional local storage
| |
− | ** temp file approach for store=true parameter could be limited by a maximum dataset size
| |
− | *** requires a reliable output file size estimator
| |
− | *** server would return an exception if estimated size is over given threshold
| |
− | *** would force people to use store=false for large datasets
| |
− | *** should not violate WCS 1.1 standard (too badly) as only store=false is mandatory
| |
− | * Issue: XML metadata assembly might take a long time depending on the catalog content, e.g. with a large number of identifiers
| |
− | ** GetCapabilities response Metadata is very static anyway, other responses (DescribeCoverage) could be cached for a while
| |
− | *** attention: DescribeCoverage response depends on parameters
| |
− | ** minor issue compared to actual data delivery performance
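The size limit for store=true requests depends on an output size estimator. A rough sketch, assuming uncompressed output where the payload is simply the number of grid cells times the bytes per value; the exception class, the function names and the threshold are all hypothetical, and a real estimator would also have to account for header and coordinate variables.

```python
import math


class DatasetTooLarge(Exception):
    """Raised when a store=true request would exceed the temp space limit."""


def estimate_bytes(shape, itemsize=4):
    """Estimate the uncompressed payload size for a subset of the given shape.

    itemsize=4 assumes 32-bit values; header and coordinate data are ignored.
    """
    return math.prod(shape) * itemsize


def check_store_request(shape, store, max_bytes, itemsize=4):
    """Return the estimated size, or raise if a store=true request is too big."""
    size = estimate_bytes(shape, itemsize)
    # Only store=true needs temp file space; store=false would be streamed.
    if store and size > max_bytes:
        raise DatasetTooLarge(
            "estimated %d bytes exceeds limit of %d; use store=false" % (size, max_bytes)
        )
    return size
```

A subset with shape (24, 10, 180, 360) comes to about 62 MB under these assumptions, so a 50 MB limit would reject it for store=true but still allow it for store=false.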
| |
− |
| |
− | ===Relationship to non-AComServ (non-NetCDF) WCS servers===
| |
− | * data format(s)
| |
− | ** most WCS clients don't understand NetCDF
| |
− | * Issue: protocol compatibility
| |
− | ** might need to implement more optional features of WCS
| |
− | * standard compliance
| |
− | ** will need a test suite for 1.1.2 (and manage to run it)
| |
− | ===Linkages to non-WCS servers===
| |
− | * Issue: is there a need?
| |
− | * Issue: which protocols? (OpenDAP?, GIS servers?)
| |
− |
| |
− |
| |
− | == -- [[User:MDecker|MDecker]] 10:03, 23 August 2011 (MDT) ==
| |
− |
| |
− | CF-API:
| |
− | * need to read and write CF-compliant files easily
| |
− | ** add a python interface to ucar libcf? http://www.unidata.ucar.edu/software/libcf/
| |
− |
| |
− | Performance/Virtual Datasets
| |
− | * non-compressed data preferred
| |
− | * many files vs. single file for queries
| |
− | ** mapping: many files -> single identifier
| |
− | *** Kari: might be too slow
| |
− | *** Michael: should not matter so much for performance
| |
− | *** queries might get very large
| |
− | *** need to limit query size on server side (datafed browser: client side management currently)
| |
− |
| |
− | Common NetCDF Python Interface, NetCDF4
| |
− | * Kari cloned PyNIO interface for Windows, so no problem right now for cross platform development
| |
− | * solve other problems first, keep an eye open
| |
− | * NetCDF4 makes things more complicated, might not be mappable to WCS easily
| |
− |
| |
− | Delivery of other data formats, other input formats
| |
− | * need to map other formats to WCS and/or CF concept
| |
− | * differentiate between format (NetCDF) and convention (CF)
| |
− | * chain with WMS server for default views/previews
| |
− |
| |
− | Traceability and revision tracking of Datasets
| |
− | * always try to get current data when dealing with real-time data; always expect your data to be old
| |
− | * would be nice to have WCS field for "last updated" date, same for NetCDF/CF (global attribute?)
| |
− | ** we can make something up on our own for a start
| |
− | ** try to propose that for CF (and WCS)
| |
− |
| |
− | Delivery of Point Station data
| |
− | * put config into SQL database as much as possible (views, stored procedures, etc)
| |
− | ** try to maintain unit tests for this
| |
− |
| |
− | Access restrictions to WCS
| |
− | * HTTP Basic authentication
| |
− | * API key
| |
− | * does not have to be 100% secure, more about connecting with the users, knowing who they are
| |
− | * firewalling for small user groups
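Both lightweight options (HTTP Basic authentication and an API key) can be checked with the Python standard library. The sketch below only illustrates the mechanics; the function names and the key registry (a plain dict mapping key to user label) are assumptions, and where the key is carried in the request is left open.

```python
import base64
import hmac


def parse_basic_auth(header):
    """Return (user, password) from an Authorization header value, or None."""
    scheme, _, payload = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(payload.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers invalid base64 and invalid UTF-8
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def api_key_owner(key, known_keys):
    """Return the user label registered for an API key, or None if unknown.

    known_keys: dict mapping key string to user label (hypothetical registry).
    """
    for candidate, owner in known_keys.items():
        # Constant-time comparison, so timing does not leak key prefixes.
        if hmac.compare_digest(candidate.encode("utf-8"), key.encode("utf-8")):
            return owner
    return None
```

Since the goal is mainly to know who the users are, returning the user label (rather than just true/false) lets the server log requests per user.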
| |
− |
| |
− | Relationship with other Servers
| |
− | * write a wrapper for other data formats
| |
− |
| |
− | WCS 2.0
| |
− | * more modular, core and extensions
| |
− | * potentially easier to use/implement
| |
− | * CF-NetCDF extension coming
| |
− |
| |
− | Processing Services
| |
− | * community provides online processing service for their discipline, for example averaging
| |
− | ** not part of W*S, but separate service
| |
− | ** protocol: web processing service http://www.opengeospatial.org/standards/wps
| |
− |
| |
− | Time filtering
| |
− | * day of week, hour of day, day of month,...
| |
− | * describe non-standard features in capabilities document?
| |
− | * might be difficult to get into official standard?
| |
− | * does not interfere with standard if you don't use it
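The time filtering options listed above could be applied to the time axis before any data is read. A minimal sketch with hypothetical parameter names (none of these are part of the WCS standard); each filter is optional and the filters are combined with AND.

```python
from datetime import datetime


def filter_times(times, hours=None, weekdays=None, days_of_month=None):
    """Keep only the time steps that match every filter that is given.

    hours: allowed hours of day (0-23)
    weekdays: allowed days of week (Monday=0 ... Sunday=6)
    days_of_month: allowed days of month (1-31)
    A filter left as None is not applied.
    """
    def keep(t):
        return ((hours is None or t.hour in hours)
                and (weekdays is None or t.weekday() in weekdays)
                and (days_of_month is None or t.day in days_of_month))

    return [t for t in times if keep(t)]


# Example: noon values on Mondays only.
times = [datetime(2011, 8, 22, 0), datetime(2011, 8, 22, 12), datetime(2011, 8, 23, 12)]
monday_noon = filter_times(times, hours={12}, weekdays={0})
```

A client that does not send any of these extra parameters gets the unfiltered time axis, which matches the point that the extension does not interfere with the standard if it is not used.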
| |