Candidate Technical Topics

From Earth Science Information Partners (ESIP)

Revision as of 06:52, August 3, 2011

Air Quality Data Network (ADN), Non-IT Issues

Virtual to Real Purpose and scope of ADN | AQ network stakeholders | Relationship to integrating initiatives | Governance, legitimacy, impediments

What is the purpose of ADN?

  • Facilitate access to air quality information (measurements, model results, analyses) in order to enable users (stakeholders) to make better informed choices.
  • Boost scientific understanding of air chemistry and pollution transport processes by enabling synthesis views based on multi-platform observations and model ensembles.
  • Reduce inconsistency by improving the standardisation and documentation of existing data sets and by exposing data processing and quality control procedures.
  • Ensure reproducibility of scientific analyses by making data sets traceable and by defining responsibilities within the data chain.

Who is involved in the ADN and what are their roles?

  • Data providers: mandated (official) data sources vs. scientific data sets; what are the implications of a real ADN for them?
  • Data federators/network hubs: who does it and why? Who has a mandate? What are the respective roles and focal points?
  • Users: Who are they? What is the interaction with them? Who should become a user but doesn't know about it yet?

What organizations are stakeholders in the network? How do they relate to ADN?

How is ADN governed, what gives it legitimacy, and what are the impediments?

What are the minimum requirements for ADN?

What few things must be the same so that everything else can be different?

  • Balancing autonomy and interoperability
  • Who formulates the requirements?
  • When is an ADN an ADN?

What is the scope of ADN?

Geographic coverage; variables; data types: ambient fixed-station observations, satellite observations, emissions, models?

What data (processing) levels are served through ADN?

Who decides this: the ADN or the provider? Data levels: raw, derived.

Data Servers: Technical Realization (IT) Issues and Solutions

netCDF, CF, WCS standards and conventions | Implementation for gridded and station data | Development tools | Server performance

Issues re. the use of netCDF and other data formats

netCDF is a standard format for multidimensional data. CF-netCDF is used both as an archival format for gridded data and as the payload format of WCS responses.

  • Issue: ambiguity and completeness of CF
    • Issue: CF (udunits) time format differs from the ISO 8601 time format (as used by WCS)
    • Issue: geo-referencing
    • What is missing in CF?
    • Server-independent CF API
  • Issue: We should define a standard python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
  • Issue: Delivery of (small) data sets in ASCII/csv format
  • Issue: Reading of GRIB data (?)
  • Issue: traceability and revision tracking of datasets (in WCS metadata as well as in netCDF metadata)
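The CF/ISO time mismatch and the small-dataset CSV delivery issue meet in one place: a CSV export needs ISO 8601 time stamps, while CF-netCDF stores numeric offsets such as "hours since 2011-08-03 00:00:00". The sketch below converts between the two and writes a CSV. It is a minimal illustration, assuming UTC reference times, only the unit names listed, and the CF "standard"/"gregorian" calendar; the function names are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Length in seconds of the CF/udunits time units handled by this sketch.
_UNIT_SECONDS = {
    "seconds": 1, "second": 1, "s": 1,
    "minutes": 60, "minute": 60, "min": 60,
    "hours": 3600, "hour": 3600, "h": 3600,
    "days": 86400, "day": 86400, "d": 86400,
}


def parse_cf_time_units(units):
    """Split e.g. 'hours since 2011-08-03 00:00:00' into (seconds per unit, reference time)."""
    unit, sep, ref = units.strip().partition(" since ")
    if not sep:
        raise ValueError(f"not a CF time units string: {units!r}")
    step = _UNIT_SECONDS[unit.strip().lower()]
    ref = ref.strip().replace("T", " ").rstrip("Z")
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d"):
        try:
            ref_time = datetime.strptime(ref, fmt)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unsupported reference time: {ref!r}")
    # Assumption: a reference time without an explicit offset is UTC.
    return step, ref_time.replace(tzinfo=timezone.utc)


def cf_to_iso(values, units):
    """Convert numeric CF time coordinate values to ISO 8601 UTC strings."""
    step, ref_time = parse_cf_time_units(units)
    return [(ref_time + timedelta(seconds=v * step)).strftime("%Y-%m-%dT%H:%M:%SZ")
            for v in values]


def series_to_csv(times, values, units, name):
    """Render a small station time series as CSV with an ISO 8601 time column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time", name])
    for iso, value in zip(cf_to_iso(times, units), values):
        writer.writerow([iso, value])
    return buf.getvalue()
```

For example, `cf_to_iso([0, 1, 25], "hours since 2011-08-03 00:00:00")` gives `2011-08-03T00:00:00Z`, `2011-08-03T01:00:00Z` and `2011-08-04T01:00:00Z`. Python's `datetime` is proleptic Gregorian, so it matches the CF standard calendar only after 1582-10-15; other CF calendars (noleap, 360_day) would need a dedicated library such as cftime.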

Use of WMS, WCS, WFS .. in combination?

Data display and preview are provided through WMS. AQ data can be delivered through WCS and WFS. In AComServ, WCS is used to transfer n-dimensional grid data and point-station data, and WFS to deliver monitoring station descriptions.

  • Issue: WMS interface for preview; "latest" token for dynamic links?
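One way a "latest" token could work for dynamic links: the link itself keeps `TIME=latest`, and the server replaces the token with the newest available time step each time the request arrives, so the same URL always previews current data. The sketch below shows only that server-side substitution; the token name and helper functions are hypothetical, and the list of available times would come from the dataset's time axis.

```python
from datetime import datetime
from urllib.parse import parse_qs


def _parse_iso(t):
    # fromisoformat() before Python 3.11 does not accept a trailing 'Z'.
    return datetime.fromisoformat(t.replace("Z", "+00:00"))


def resolve_time_token(requested, available_times):
    """Return the newest available time if 'latest' was requested, else the value unchanged."""
    if requested.strip().lower() != "latest":
        return requested
    if not available_times:
        raise LookupError("layer has no time steps")
    return max(available_times, key=_parse_iso)


def resolve_request_time(query_string, available_times):
    """Parse a GetMap query string and pin a TIME=latest token to a concrete time."""
    # OGC KVP parameter names are case-insensitive, so normalise keys to upper case.
    params = {k.upper(): v[0]
              for k, v in parse_qs(query_string, keep_blank_values=True).items()}
    if "TIME" in params:
        params["TIME"] = resolve_time_token(params["TIME"], available_times)
    return params
```

Explicit times pass through untouched, so ordinary requests behave as before; only the token is rewritten.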

WCS versions

WCS exists in multiple versions: 1.0, 1.1.2, and 2.0. The AQ Community Server (AComServ) is currently implemented using WCS 1.1.2. To do: state the WCS version (WCS 2.0) issue here in about one sentence.
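Part of the version issue is that the GetCoverage key-value syntax changes between versions: 1.0 uses COVERAGE/BBOX/TIME, 1.1.x uses identifier/BoundingBox/TimeSequence/store, and 2.0 uses coverageId plus one subset per axis. The sketch below builds a roughly equivalent request for each. The endpoint, coverage name and format string are hypothetical; the WCS 2.0 axis labels (`Long`, `Lat`, `time`) are assumptions that a real client would read from DescribeCoverage; and the output-grid parameters WCS 1.0 needs (WIDTH/HEIGHT or RESX/RESY) are omitted.

```python
from urllib.parse import urlencode


def getcoverage_url(endpoint, version, coverage, bbox, time, fmt="application/x-netcdf"):
    """Build a KVP GetCoverage URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    west, south, east, north = bbox
    if version.startswith("1.0"):
        params = [("SERVICE", "WCS"), ("VERSION", version), ("REQUEST", "GetCoverage"),
                  ("COVERAGE", coverage), ("CRS", "EPSG:4326"),
                  ("BBOX", f"{west},{south},{east},{north}"),  # lon/lat order in WCS 1.0
                  ("TIME", time), ("FORMAT", fmt)]
    elif version.startswith("1.1"):
        params = [("service", "WCS"), ("version", version), ("request", "GetCoverage"),
                  ("identifier", coverage),
                  # CRS84 keeps lon/lat order; an EPSG:4326 URN would imply lat/lon.
                  ("BoundingBox", f"{west},{south},{east},{north},"
                                  "urn:ogc:def:crs:OGC:1.3:CRS84"),
                  ("TimeSequence", time), ("format", fmt), ("store", "false")]
    elif version.startswith("2.0"):
        # Axis labels are assumptions; they must match the coverage description.
        params = [("service", "WCS"), ("version", version), ("request", "GetCoverage"),
                  ("coverageId", coverage),
                  ("subset", f"Long({west},{east})"), ("subset", f"Lat({south},{north})"),
                  ("subset", f'time("{time}")'), ("format", fmt)]
    else:
        raise ValueError(f"unsupported WCS version: {version}")
    return endpoint + "?" + urlencode(params)
```

A client talking to servers of different versions would need to carry this kind of mapping, which is one argument for agreeing on a single version within the network.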

Gridded data service through WCS

This generally works well.

  • Issue: Extraction of vertical levels?
  • Issue: Ambiguity of WCS: core plus extensions (do we know which parts are valid?)
  • Issue: serve "virtual" WCS datasets with continuous time line assembled from many source files
  • Issue: desirable time filtering options in WCS (hour of day, day of week, day of month, etc. filters)
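The "virtual" dataset and the time filters can both be expressed on one continuous time axis: the server maps each requested time to a source file and an index within it, and filters are simple predicates on the time stamps. The sketch assumes a hypothetical layout of one file per day named `o3_YYYYMMDD.nc` with 24 hourly steps from 00:00 UTC; real archives would need an index of the actual file contents.

```python
from datetime import datetime, timedelta

# Hypothetical naming scheme: one file per day holding 24 hourly steps from 00:00 UTC.
FILE_PATTERN = "o3_%Y%m%d.nc"


def locate(t, pattern=FILE_PATTERN):
    """Map a time on the virtual, continuous time axis to (source file, time index in it)."""
    return t.strftime(pattern), t.hour


def files_for_interval(start, end, pattern=FILE_PATTERN):
    """List the source files a request for [start, end] has to read."""
    day = start.date()
    names = []
    while day <= end.date():
        names.append(day.strftime(pattern))
        day += timedelta(days=1)
    return names


def time_filter(times, hours=None, weekdays=None, monthdays=None):
    """Keep time stamps that pass every given filter (weekdays: Monday=0 ... Sunday=6)."""
    return [t for t in times
            if (hours is None or t.hour in hours)
            and (weekdays is None or t.weekday() in weekdays)
            and (monthdays is None or t.day in monthdays)]


if __name__ == "__main__":
    hourly = [datetime(2011, 8, 1) + timedelta(hours=h) for h in range(7 * 24)]
    # 14:00 on the weekend days of the first week of August 2011.
    print(time_filter(hourly, hours={14}, weekdays={5, 6}))
```

How such filters would be expressed in a WCS request (e.g. as a vendor-specific parameter) is an open question; the sketch only shows the selection logic.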

Delivery of station-point data

  • Issue: use WCS or WFS, Combination of both?
  • Access rights?
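If WFS carries the station descriptions, a client would first ask for the stations in a region and then request their data. The sketch below builds the first step as a WFS 1.1.0 GetFeature request; the endpoint and the feature type name `aq:stations` are hypothetical, and WFS 2.0 would use TYPENAMES and COUNT instead of TYPENAME and MAXFEATURES.

```python
from urllib.parse import urlencode


def station_features_url(endpoint, type_name, bbox, max_features=1000):
    """WFS 1.1.0 GetFeature URL for station descriptions in bbox=(west, south, east, north)."""
    west, south, east, north = bbox
    params = [
        ("SERVICE", "WFS"), ("VERSION", "1.1.0"), ("REQUEST", "GetFeature"),
        ("TYPENAME", type_name),
        # The EPSG:4326 URN implies latitude/longitude axis order in WFS 1.1.0.
        ("BBOX", f"{south},{west},{north},{east},urn:ogc:def:crs:EPSG::4326"),
        ("MAXFEATURES", str(max_features)),
    ]
    return endpoint + "?" + urlencode(params)
```

The axis-order comment matters in practice: swapping latitude and longitude is a common source of empty or wrong results when combining WFS and WCS servers.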

Data server performance issues/solutions?

Define performance issues, measurements, ideas

  • Issue: large datasets in particular take a long time to prepare for delivery (slicing, subsetting, etc.)
    • direct streaming of datasets to the client could be part of the solution
    • generated datasets could be cached for a while, so they could be delivered again when there is a request with compatible parameters
    • problem: both proposals might be mutually exclusive to some degree
  • Issue: XML metadata assembly might take a long time depending on the catalogue content, e.g. when it contains many identifiers
    • GetCapabilities response metadata is largely static anyway; other responses (DescribeCoverage) could be cached for a while
      • attention: DescribeCoverage response depends on parameters
  • Issue: management overhead when opening NetCDF
    • when opening a NetCDF file, some metadata has to be read and data structures have to be set up
      • input files could be kept open for a while to avoid this overhead
  • Issue: temp file space is limited on WCS server
    • streaming approach for store=false parameter would not require additional local storage
    • temp file approach for store=true parameter could be limited by a maximum dataset size
      • requires a reliable output file size estimator
      • server would return an exception if estimated size is over given threshold
      • would force people to use store=false for large datasets
      • should not violate WCS 1.1 standard (too badly) as only store=false is mandatory
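Two of the ideas above, the size threshold for store=true and caching of generated responses, can be sketched in a few lines. The limits, the header allowance and the function names are hypothetical; the estimate ignores compression, and the cache only reuses results for requests that are identical after normalising parameter names and order, not for merely compatible ones (e.g. a sub-region of a cached result).

```python
import time

# Hypothetical limits; real values would depend on the server's temp file space.
MAX_STORED_BYTES = 500 * 1024 * 1024
HEADER_BYTES = 64 * 1024  # rough allowance for netCDF header and metadata


def estimate_size(shape, bytes_per_value=4):
    """Rough output size: number of grid values x bytes per value + a fixed header allowance."""
    n_values = 1
    for dim in shape:
        n_values *= dim
    return n_values * bytes_per_value + HEADER_BYTES


def check_store(store, shape, bytes_per_value=4):
    """Reject store=true requests whose estimated size exceeds the temp-space threshold."""
    size = estimate_size(shape, bytes_per_value)
    if store and size > MAX_STORED_BYTES:
        raise ValueError(f"estimated output of {size} bytes is too large for store=true; "
                         "repeat the request with store=false")
    return size


class ResponseCache:
    """Small time-limited cache for generated responses, keyed on normalised parameters."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    @staticmethod
    def key(params):
        # KVP names are case-insensitive and unordered, so normalise both; values stay as-is.
        return tuple(sorted((name.lower(), str(value)) for name, value in params.items()))

    def get(self, params):
        entry = self._entries.get(self.key(params))
        if entry is None or self.clock() - entry[0] > self.ttl:
            return None
        return entry[1]

    def put(self, params, response):
        self._entries[self.key(params)] = (self.clock(), response)
```

In a server, the ValueError would be turned into a WCS exception report. The tension noted above remains: a streamed (store=false) response is never written out in full, so there is nothing to cache unless the stream is also copied somewhere.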

Server co-development tools, methods

Server code is maintained through SourceForge; Darcs code repositories are available at WUSTL and in Juelich.

  • Issues: Version control, Documentation
  • Issue: find a platform-independent Python NetCDF Interface that satisfies our needs
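While no single interface is agreed on, a thin wrapper can at least hide which library opens the file. The sketch tries netCDF4 (`netCDF4.Dataset`), SciPy (`scipy.io.netcdf_file`, netCDF-3 only) and PyNIO (`Nio.open_file`) in that order; the order and the function names are assumptions. All three return objects with a `variables` mapping, but attribute access, masking and scaling differ, so this only unifies the opening step.

```python
import importlib

# Candidate backends in an assumed order of preference: (module name, opener).
_BACKENDS = [
    ("netCDF4", lambda m, path: m.Dataset(path, "r")),
    ("scipy.io", lambda m, path: m.netcdf_file(path, "r", mmap=False)),  # netCDF-3 only
    ("Nio", lambda m, path: m.open_file(path, "r")),  # PyNIO
]


def available_backends():
    """Names of the backend modules that can be imported on this machine."""
    found = []
    for name, _ in _BACKENDS:
        try:
            importlib.import_module(name)
        except ImportError:
            continue
        found.append(name)
    return found


def open_netcdf(path):
    """Open path read-only with the first importable backend; return (backend name, file)."""
    for name, opener in _BACKENDS:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        return name, opener(module, path)
    raise RuntimeError("no netCDF backend found (tried netCDF4, scipy.io, Nio)")
```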

Relationship to non-AComServ WCS servers

  • Issue: protocol compatibility, standard compliance, data format(s)

Linkages to non-WCS servers

  • Issue: is there a need?
  • Issue: which protocols? (OPeNDAP? GIS servers?)

Data Network: Technical Realization (IT) Issues and Solutions

Network components, servers, clients, catalog | Mediated access | Catalog implementation, integration | Network level data flow, statistics, performance

What is the design philosophy

Service-oriented (everything is a service); components and network designed for change; open source (everything?!)

Functionality of an Air Quality Data Network Catalog (ADNC)?

Content and structure (granularity) of ADNC?

Interoperability of ADNC

Interoperability with whom? What standards are needed? CF naming extensions?

Access rights and access management

What are the generic (ISO, GEOSS, INSPIRE) and the AQ-specific discovery metadata?

Minimal metadata for data provenance, quality, and access constraints?
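As a starting point for the discussion, a minimal record could carry one field per item named above plus basic discovery fields, with a completeness check the catalog runs on ingest. The field names below are hypothetical and not taken from ISO 19115, GEOSS or INSPIRE, although those standards have corresponding elements (e.g. lineage and access/use constraints) that a real record would map to.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MinimalRecord:
    """Hypothetical minimal ADN catalog record; field names are illustrative only."""
    identifier: str
    title: str
    provider: str            # who is responsible for the data set
    provenance: str          # source and processing steps (lineage)
    quality: str             # e.g. QC level or a reference to the QC procedure
    access_constraints: str  # e.g. "none", "registration required"
    keywords: list = field(default_factory=list)

    def missing(self):
        """Names of required text fields that are empty or blank."""
        return [f.name for f in fields(self)
                if f.type is str and not getattr(self, f.name).strip()]
```

A catalog could then refuse, or flag, records where `missing()` is non-empty.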

Single AQ Catalog? Distributed? Service-oriented?

Network-level data flow, usage statistics (Google Analytics), performance