Difference between revisions of "Michael Decker (MDecker)"

From Earth Science Information Partners (ESIP)
 
==  -- [[User:MDecker|MDecker]] 10:14, 23 August 2011 (MDT) ==
 
  
== Data Servers: Technical Realization (IT) Issues and Solutions --- Summary ==
 
===Issues re. the use of netCDF and other data formats===
 
netCDF is a standard format for multi-dimensional data. CF-netCDF is used both as an archival format for gridded data and as a payload format for WCS queries.
 
* Issue: ambiguity and completeness of CF
 
** ''development of a server independent (python) CF-API library''
 
*** some (beta) code available at FZJ: [http://repositories.icg.kfa-juelich.de/hg/CommonUtils/file/faad03a63f98/CommonUtils/cf_netcdf.py CommonUtils.cf_netcdf], feel free to suggest a better name
 
** Brainstorming: What is missing in CF?
 
** Issue: CF (udunits) time format not the same as ISO Time format (as used by WCS)
 
*** those two cases can be processed with different code, but uniformity would be less confusing
 
*** could try to get ISO time recommendation into CF; would still need different code because it's only a recommendation
 
** Issue: geo-referencing
 
*** CF offers support for projections (see [http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#grid-mappings-and-projections here]), but they are not used by the WCS Server so far
 
*** CF-Metadata List had some discussions about handling and specification of projections recently, see thread http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2011/007935.html
 
 
* Issue: We should define a standard NetCDF python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
 
* Issue: other output formats
 
** support fused into server or add-on concept (possibly using the public W*S/NetCDF interface)
 
** Delivery of (small) data sets in ASCII/csv format?
 
* Issue: Reading other gridded input data formats? (e.g. GRIB)
 
* Issue: traceability and revision tracking of datasets (in WCS metadata as well as in NetCDF metadata)
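The mismatch between CF (udunits) time and ISO time noted above can be bridged with a small conversion layer. The following is a minimal sketch, assuming a standard Gregorian calendar, reference times in UTC, and only fixed-length units (seconds to days). The function names <code>parse_cf_units</code>, <code>cf_to_iso</code> and <code>iso_to_cf</code> are hypothetical and are not part of CommonUtils or any existing library.

```python
from datetime import datetime, timedelta, timezone

# Seconds per CF/udunits time unit; only fixed-length units are handled.
_UNIT_SECONDS = {
    "second": 1, "seconds": 1,
    "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}

# Reference time layouts accepted by this sketch (assumed to be UTC).
_REFERENCE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%d",
)


def parse_cf_units(units):
    """Split e.g. 'hours since 1990-01-01 00:00:00' into
    (seconds per unit, timezone-aware reference datetime)."""
    unit, since, ref = units.strip().split(None, 2)
    if since.lower() != "since" or unit.lower() not in _UNIT_SECONDS:
        raise ValueError("unsupported CF time units: %r" % units)
    for fmt in _REFERENCE_FORMATS:
        try:
            ref_dt = datetime.strptime(ref.strip(), fmt)
            break
        except ValueError:
            continue
    else:
        raise ValueError("unsupported reference time: %r" % ref)
    return _UNIT_SECONDS[unit.lower()], ref_dt.replace(tzinfo=timezone.utc)


def cf_to_iso(value, units):
    """Convert a CF time offset to an ISO 8601 string as used in WCS requests."""
    seconds_per_unit, ref = parse_cf_units(units)
    moment = ref + timedelta(seconds=value * seconds_per_unit)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def iso_to_cf(iso_time, units):
    """Convert an ISO 8601 UTC string back to a CF time offset."""
    seconds_per_unit, ref = parse_cf_units(units)
    moment = datetime.strptime(iso_time, "%Y-%m-%dT%H:%M:%SZ")
    moment = moment.replace(tzinfo=timezone.utc)
    return (moment - ref).total_seconds() / seconds_per_unit
```

Keeping both directions in one module would let the server accept ISO times from WCS queries and translate them to offsets on the CF time axis without separate code paths spread through the server.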
 
 
===Server co-development tools, methods ===
 
Server code is maintained through SourceForge (bug tracker, tarballs); Darcs code repositories are available at WUSTL and in Juelich.
 
* Issue: Version control
 
** maintain a common codebase
 
* Issue: Documentation
 
** mainly inline tech documentation to date
 
** need more documentation regarding operation
 
** proposal: [http://sphinx.pocoo.org/ sphinx] for proper documentation (easy to include inline tech doc)
 
 
===Use of WMS, WCS, WFS .. in combination?===
 
Data display/preview is through WMS. AQ data can be delivered through WCS and WFS. In AComServ, WCS is used for transferring n-dimensional grid and point-station data; WFS is used for delivering monitoring station descriptions.
 
* Issue: WMS interface for preview; "latest" token for dynamic links?
 
** generic WMS service operating on external wcs
 
** "latest" token could be realized on WCS-WMS interface by using the metadata on the WMS/client side and requesting the latest time; latest time could be default response of WMS server if nothing else requested
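The "latest" token idea above could be resolved on the WMS side roughly as follows. This is only a sketch under stated assumptions: the available time steps are taken from the coverage metadata as ISO 8601 UTC strings in one uniform layout (so that string order equals time order), and the function name <code>resolve_time_token</code> is hypothetical.

```python
def resolve_time_token(requested, available_times):
    """Return the concrete time step to request from the WCS.

    requested: an ISO 8601 string, the token 'latest', or None
    available_times: ISO 8601 UTC strings advertised in the coverage metadata
    """
    if not available_times:
        raise ValueError("coverage advertises no time steps")
    # No explicit time, or the 'latest' token: default to the newest time step.
    if requested is None or requested.lower() == "latest":
        return max(available_times)
    if requested not in available_times:
        raise ValueError("time step not available: %r" % requested)
    return requested
```

Treating a missing time parameter the same as "latest" matches the suggestion that the latest time could be the default WMS response.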
 
 
===Gridded data service through WCS===
 
WCS exists in multiple versions: 1.0, 1.1.2, 2.0. The AQ Community Server (AComServ) is now implemented using WCS 1.1.2.
 
This generally works well.
 
* Issue: serve "virtual" WCS datasets with continuous time line assembled from many source files
 
** clients should only have to do one query to receive the whole time series in one piece instead of requiring client-side logic to request multiple pieces
 
** could create a "wrapper" module that can handle such cases with knowledge of the server-side file structure
 
*** Kari has already done something like this for HTAP datasets; this could be a starting point
 
* Issue: desirable time filtering options in WCS: hour of day, day of week, day of month, etc.
 
* Issue: Extraction of vertical levels
 
** already defined through the rangesubset/fieldsubset parameter
 
** is this definition OK for us or would we need something else/better?
 
*** potential problem: only enumeration of levels possible, no ranges
 
* Issue: current state of WCS 2.0? core released, but extensions still in draft (how do we know/keep track of what is currently valid?)
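The "wrapper" module for virtual datasets mentioned above needs, at its core, a lookup from a requested time range to the source files that cover it. The following is a minimal sketch of that lookup only; the <code>SourceFile</code> record, the <code>files_for_range</code> function and the idea of a pre-built catalog listing each file's time span are assumptions for illustration, not the existing HTAP code.

```python
from datetime import datetime
from typing import List, NamedTuple


class SourceFile(NamedTuple):
    """One physical file behind a virtual coverage (hypothetical record)."""
    path: str
    start: datetime  # first time step in the file (inclusive)
    end: datetime    # last time step in the file (inclusive)


def files_for_range(catalog: List[SourceFile],
                    t_min: datetime, t_max: datetime) -> List[SourceFile]:
    """Return the files overlapping [t_min, t_max], ordered by start time."""
    if t_min > t_max:
        raise ValueError("t_min must not be later than t_max")
    # Two closed intervals overlap if each starts before the other ends.
    hits = [f for f in catalog if f.start <= t_max and f.end >= t_min]
    return sorted(hits, key=lambda f: f.start)
```

The server would then subset each returned file and concatenate the pieces along the time axis, so the client sees a single coverage identifier and sends a single query.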
 
 
===Delivery of station-point data===
 
* Issue: use WCS or WFS, Combination of both?
 
 
=== Access rights ===
 
* Issue: technical options to restrict access to datasets?
 
 
===Data server performance issues/solutions? ===
 
Define performance issues, measurements, ideas
 
 
* Issue: especially big datasets take a long time to prepare for delivery (slicing/subsetting, etc.)
 
** direct streaming of datasets to the client could be part of the solution, [[Streaming_and_or_netCDF_File|click here]] for details
 
** generated datasets could be cached for a while, so they could be delivered again when there is a request with compatible parameters
 
** problem: both proposals might be mutually exclusive to some degree
 
* Issue: management overhead when opening NetCDF
 
** when opening a NetCDF file, some metadata has to be read and data structures have to be set up
 
*** input files could be kept open for a while to avoid this overhead
 
* Issue: temp file space is limited on WCS server
 
** streaming approach for store=false parameter would not require additional local storage
 
** temp file approach for store=true parameter could be limited by a maximum dataset size
 
*** requires a reliable output file size estimator
 
*** server would return an exception if estimated size is over given threshold
 
*** would force people to use store=false for large datasets
 
*** should not violate WCS 1.1 standard (too badly) as only store=false is mandatory
 
* Issue: XML metadata assembly might take a long time depending on the catalog content, e.g. with a lot of identifiers
 
** GetCapabilities response Metadata is very static anyway, other responses (DescribeCoverage) could be cached for a while
 
*** attention: DescribeCoverage response depends on parameters
 
** minor issue compared to actual data delivery performance
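The output file size estimator proposed above for <code>store=true</code> requests could start out as simple as the sketch below. It assumes uncompressed output, a fixed number of bytes per value, and a rough constant allowance for header and metadata; the names <code>estimate_bytes</code>, <code>check_store_request</code> and <code>CoverageTooLarge</code> as well as the default values are hypothetical.

```python
from math import prod  # available from Python 3.8


class CoverageTooLarge(Exception):
    """Raised when a store=true request would exceed the temp space limit."""


def estimate_bytes(shape, bytes_per_value=4, n_fields=1, header_bytes=8192):
    """Estimate the size of an uncompressed result file.

    shape: number of grid points per dimension after subsetting,
           e.g. (time, lat, lon)
    header_bytes: rough allowance for header and metadata (an assumption)
    """
    return n_fields * prod(shape) * bytes_per_value + header_bytes


def check_store_request(shape, store, max_bytes, bytes_per_value=4):
    """Return the estimated size; refuse store=true requests above max_bytes."""
    size = estimate_bytes(shape, bytes_per_value=bytes_per_value)
    if store and size > max_bytes:
        raise CoverageTooLarge(
            "estimated %d bytes exceeds limit of %d; use store=false"
            % (size, max_bytes))
    return size
```

In a server, <code>CoverageTooLarge</code> would be translated into a WCS exception report, while <code>store=false</code> requests pass through to the streaming path regardless of size.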
 
 
===Relationship to non-AComServ (non-NetCDF) WCS servers===
 
* data format(s)
 
** most WCS clients don't understand NetCDF
 
* Issue: protocol compatibility
 
** might need to implement more optional features of WCS
 
* standard compliance
 
** will need a test suite for 1.1.2 (and manage to run it)
 
===Linkages to non-WCS servers===
 
* Issue: is there a need?
 
* Issue: which protocols? (OpenDAP?, GIS servers?)
 
 
 
==  -- [[User:MDecker|MDecker]] 10:03, 23 August 2011 (MDT) ==
 
 
CF-API:
 
* need to read and write CF-compliant files easily
 
** add a python interface to ucar libcf? http://www.unidata.ucar.edu/software/libcf/
 
 
Performance/Virtual Datasets
 
* non-compressed data preferred
 
* many files vs. single file for queries
 
** mapping: many files -> single identifier
 
*** Kari: might be too slow
 
*** Michael: should not matter so much for performance
 
*** queries might get very large
 
*** need to limit query size on server side (datafed browser: client side management currently)
 
 
Common NetCDF Python Interface, NetCDF4
 
* Kari cloned PyNIO interface for Windows, so no problem right now for cross platform development
 
* solve other problems first, keep an eye open
 
* NetCDF4 makes things more complicated, might not be mappable to WCS easily
 
 
Delivery of other data formats, other input formats
 
* need to map other formats to WCS and/or CF concept
 
* differentiate between format (NetCDF) and convention (CF)
 
* chain with WMS server for default views/previews
 
 
Traceability and revision tracking of Datasets
 
* always try to get current data when dealing with real time data, always expect your data to be old
 
* would be nice to have WCS field for "last updated" date, same for NetCDF/CF (global attribute?)
 
** we can make something up on our own for a start
 
** try to propose that for CF (and WCS)
 
 
Delivery of Point Station data
 
* put config into SQL database as much as possible (views, stored procedures, etc)
 
** try to maintain unit tests for this
 
 
Access restrictions to WCS
 
* HTTP Basic authentication
 
* API key
 
* does not have to be 100% secure, more about connecting with the users, knowing who they are
 
* firewalling for small user groups
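Since the goal above is mainly to know who the users are, the first two options can be kept very light. The sketch below shows decoding an HTTP Basic <code>Authorization</code> header and mapping an API key to a user; the function names and the in-memory key table are assumptions for illustration, not part of the current server.

```python
import base64
import hmac


def parse_basic_auth(header):
    """Decode 'Basic <base64(user:password)>' into (user, password).

    Returns None if the header is not a well-formed Basic credential.
    """
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def identify_user(api_key, known_keys):
    """Return the user owning api_key, or None if the key is unknown.

    known_keys: dict mapping user name -> issued API key (hypothetical store)
    """
    for user, key in known_keys.items():
        # Constant-time comparison avoids leaking key contents via timing.
        if hmac.compare_digest(api_key.encode("utf-8"), key.encode("utf-8")):
            return user
    return None
```

Either function would sit in front of the WCS request handler, so that each request can be logged with the user it belongs to.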
 
 
Relationship with other Servers
 
* write a wrapper for other data formats
 
 
WCS 2.0
 
* more modular, core and extensions
 
* potentially easier to use/implement
 
* CF-NetCDF extension coming
 
 
Processing Services
 
* community provides online processing service for their discipline, for example averaging
 
** not part of W*S, but separate service
 
** protocol: web processing service http://www.opengeospatial.org/standards/wps
 
 
Time filtering
 
* day of week, hour of day, day of month,...
 
* describe non-standard features in capabilities document?
 
* might be difficult to get into official standard?
 
* does not interfere with standard if you don't use it
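The time filtering options listed above amount to selecting time steps by calendar properties before the data are subset. A minimal sketch, assuming the time axis is already available as Python <code>datetime</code> objects; the function name <code>filter_times</code> and its parameter names are hypothetical and not part of any WCS standard.

```python
from datetime import datetime, timedelta


def filter_times(times, hours=None, weekdays=None, days_of_month=None):
    """Keep the time steps that match every criterion that is given.

    hours: hours of day to keep (0-23), or None for no restriction
    weekdays: days of week to keep (Monday=0 ... Sunday=6), or None
    days_of_month: days of month to keep (1-31), or None
    """
    def keep(t):
        return ((hours is None or t.hour in hours)
                and (weekdays is None or t.weekday() in weekdays)
                and (days_of_month is None or t.day in days_of_month))

    return [t for t in times if keep(t)]


# Usage: pick the noon time steps from three days of hourly data.
hourly = [datetime(2011, 8, 22) + timedelta(hours=h) for h in range(72)]
noon_steps = filter_times(hourly, hours={12})
```

Because every criterion defaults to no restriction, a request that does not use the extension returns the full time axis, in line with the point that the feature does not interfere with the standard if unused.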
 

Latest revision as of 01:25, August 24, 2011