Candidate Technical Topics

From Earth Science Information Partners (ESIP)


Air Quality Data Network (ADN), Non-IT Issues

Virtual to Real: Purpose and scope of ADN | AQ network stakeholders | Relationship to integrating initiatives | Governance, legitimacy, impediments

What is the purpose of ADN?

  • Facilitate access to air quality information (measurements, model results, analyses) in order to enable users (stakeholders) to make better informed choices.
  • Boost scientific understanding of air chemistry and pollution transport processes by enabling synthesis views based on multi-platform observations and model ensembles.
  • Reduce chaos by improving standardization and documentation of existing data sets and by exposing data processing and quality control procedures.
  • Ensure reproducibility of scientific analyses by allowing for traceability of data sets and definition of responsibilities within the data chain.

Who is involved in the ADN and what are their roles?

  • Data providers: mandated data origin vs. scientific data sets; what are the implications of a real ADN for them?
  • Data federators/network hubs: who does it and why? Who has a mandate? What are the respective roles and focal points?
  • Users: Who are they? What is the interaction with them? Who should become a user but doesn't know about it yet?

What organizations are stakeholders in the network? How do they relate to ADN?

What are the ADN governance, legitimacy, and impediments?

What are the minimum requirements for ADN?

What few things must be the same, so that everything else can be different?

  • Autonomy - Interoperability balance...
  • Who formulates the requirements?
  • When is an ADN an ADN?

What is the scope of ADN?

Geographic, Variables, Ambient fixed station observations, satellite observations, emissions, models?

What data (processing) level is served through ADN?

Is ADN the decider? Why not the provider? Raw data, derived data?

Data Servers: Technical Realization (IT) Issues and Solutions

netCDF, CF, WCS standards and conventions | Implementation for gridded and station data | Development tools | Server performance

Issues re. the use of netCDF and other data formats

netCDF is a standard format for multi-dimensional data. CF-netCDF is used both as an archival format for gridded data and as a payload format for WCS queries.

  • Issue: ambiguity and completeness of CF
    • development of a server independent (python) CF-API library
    • Brainstorming: What is missing in CF?
    • Issue: the CF (udunits) time format is not the same as the ISO time format (as used by WCS)
      • those two cases can be processed with different code, but uniformity would be less confusing
      • could try to get ISO time recommendation into CF; would still need different code because it's only a recommendation
    • Issue: geo-referencing
  • Issue: We should define a standard NetCDF Python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
  • Issue: other output formats
    • support fused into server or add-on concept (possibly using the public W*S/NetCDF interface)
    • Delivery of (small) data sets in ASCII/csv format?
  • Issue: Reading other gridded input data formats? (e.g. GRIB)
  • Issue: traceability and revision tracking of datasets (in WCS metadata as well as in NetCDF metadata)
  • Info: CF standard name grammar: CF metadata grammar concept
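
The CF versus ISO time mismatch noted above can be bridged with a small conversion helper. The following is a minimal sketch only, not the server's code: the function name cf_time_to_iso is hypothetical, and it assumes a standard Gregorian calendar, fixed-length units (seconds to days), and a UTC reference date written as "YYYY-MM-DD" or "YYYY-MM-DD HH:MM:SS".

```python
from datetime import datetime, timedelta, timezone

# Seconds per CF/udunits time unit; only fixed-length units are handled here.
_UNIT_SECONDS = {
    "second": 1, "seconds": 1,
    "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}


def cf_time_to_iso(value, units):
    """Convert a CF time value (e.g. 6, 'hours since 2010-01-01 00:00:00')
    to an ISO 8601 UTC string as used in WCS requests."""
    unit, since, reference = units.split(None, 2)
    if since != "since" or unit not in _UNIT_SECONDS:
        raise ValueError("unsupported CF time units: %r" % units)
    # Assumption: the reference date is in UTC and in one of two layouts.
    ref = reference.strip()
    fmt = "%Y-%m-%d %H:%M:%S" if " " in ref else "%Y-%m-%d"
    origin = datetime.strptime(ref, fmt).replace(tzinfo=timezone.utc)
    moment = origin + timedelta(seconds=value * _UNIT_SECONDS[unit])
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(cf_time_to_iso(6, "hours since 2010-01-01 00:00:00"))
```

Non-standard CF calendars (noleap, 360_day) and month or year units would need a dedicated library and are deliberately left out of this sketch.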

Server co-development tools, methods

Server code is maintained through SourceForge (bug tracker, tarballs). Darcs code repositories are available at WUSTL and in Juelich.

  • Issue: Version control
    • maintain a common codebase
  • Issue: Documentation
    • mainly inline tech documentation to date
    • need more documentation regarding operation
    • proposal: sphinx for proper documentation (easy to include inline tech doc)

Use of WMS, WCS, WFS .. in combination?

Data display/preview is through WMS. AQ data can be delivered through WCS and WFS. In AComServ, WCS is used for transferring n-dimensional grid and point-station data; WFS is used for delivering monitoring station descriptions.

  • Issue: WMS interface for preview; "latest" token for dynamic links?
    • generic WMS service operating on external WCS
    • "latest" token could be realized on WCS-WMS interface by using the metadata on the WMS/client side and requesting the latest time; latest time could be default response of WMS server if nothing else requested
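
The "latest" token idea can be illustrated with a short sketch. This is an assumption-laden example, not an existing AComServ or WMS function: resolve_time is a hypothetical name, and the time positions are assumed to be ISO 8601 UTC strings in one uniform format, as they might be read from the coverage metadata.

```python
def resolve_time(requested, available_times):
    """Return the time position to render.

    'latest' or a missing TIME parameter (None) selects the newest position.
    """
    if not available_times:
        raise ValueError("coverage has no time positions")
    if requested is None or requested == "latest":
        # ISO 8601 UTC strings in one uniform format sort chronologically as text.
        return max(available_times)
    if requested not in available_times:
        raise ValueError("time %r is not available" % requested)
    return requested


times = ["2011-05-01T00:00:00Z", "2011-05-03T00:00:00Z", "2011-05-02T00:00:00Z"]
print(resolve_time("latest", times))
```

Treating a missing time parameter the same way as "latest" corresponds to the default-response option mentioned in the bullet above.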

Gridded data service through WCS

WCS exists in multiple versions: 1.0, 1.1.2, 2.0. The AQ Community Server (AComServ) is currently implemented using WCS 1.1.2. This generally works well.

  • Issue: serve "virtual" WCS datasets with continuous time line assembled from many source files
    • clients should only have to issue one query to receive the whole time series in one piece, instead of needing client-side logic to request multiple pieces
    • could create a "wrapper" module that can handle such cases with knowledge of the server-side file structure
      • Kari has already done something like this for HTAP datasets, this could be a starting point
  • Issue: desirable time filtering options in WCS: hour of day, day of week, day of month, etc.
  • Issue: Extraction of vertical levels
    • already defined through the rangesubset/fieldsubset parameter
    • is this definition OK for us or would we need something else/better?
      • potential problem: only enumeration of levels possible, no ranges
  • Issue: current state of WCS 2.0? core released, but extensions still in draft (how do we know/keep track of what is currently valid?)
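
The "wrapper" for virtual datasets with a continuous time line could start from a read plan like the one below. This is a minimal sketch under stated assumptions, not the existing HTAP solution: plan_reads, the index layout, and the file names are hypothetical, and each source file is assumed to cover one contiguous time span.

```python
from datetime import datetime


def plan_reads(file_index, start, end):
    """Return [(path, read_start, read_end)] covering [start, end] in time order.

    file_index: list of (first_time, last_time, path) tuples, one per source file.
    """
    plan = []
    for first, last, path in sorted(file_index):
        if last < start or first > end:
            continue  # file lies entirely outside the requested range
        # Clip the request to the span this file actually holds.
        plan.append((path, max(first, start), min(last, end)))
    return plan


# Hypothetical yearly files behind one virtual coverage
index = [
    (datetime(2001, 1, 1), datetime(2001, 12, 31), "o3_2001.nc"),
    (datetime(2000, 1, 1), datetime(2000, 12, 31), "o3_2000.nc"),
]
for step in plan_reads(index, datetime(2000, 7, 1), datetime(2001, 3, 1)):
    print(step)
```

The server would then read each planned slice and concatenate the results along the time axis, so the client issues a single request.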

Delivery of station-point data

Access rights

  • Issue: technical options to restrict access to datasets?

Data server performance issues/solutions?

Define performance issues, measurements, ideas

  • Issue: especially big datasets take a long time to prepare for delivery (slicing/subsetting, etc.)
    • direct streaming of datasets to the client could be part of the solution
    • generated datasets could be cached for a while, so they could be delivered again when there is a request with compatible parameters
    • problem: both proposals might be mutually exclusive to some degree
  • Issue: management overhead when opening NetCDF
    • when opening a NetCDF file, some metadata has to be read and data structures have to be set up
      • input files could be kept open for a while to avoid this overhead
  • Issue: temp file space is limited on WCS server
    • streaming approach for store=false parameter would not require additional local storage
    • temp file approach for store=true parameter could be limited by a maximum dataset size
      • requires a reliable output file size estimator
      • server would return an exception if estimated size is over given threshold
      • would force people to use store=false for large datasets
      • should not violate WCS 1.1 standard (too badly) as only store=false is mandatory
  • Issue: XML metadata assembly might take a long time depending on the catalog content, e.g. with a large number of identifiers
    • GetCapabilities response Metadata is very static anyway, other responses (DescribeCoverage) could be cached for a while
      • attention: DescribeCoverage response depends on parameters
    • minor issue compared to actual data delivery performance
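
The size limit for store=true requests could be sketched as follows. All names (DatasetTooLarge, estimate_bytes, check_request) and all numbers (4 bytes per value, a 64 KiB header allowance, a 500 MiB default limit) are illustrative assumptions, not part of the WCS standard or of the existing server.

```python
class DatasetTooLarge(Exception):
    """Raised when a store=true request would exceed the temp-file budget."""


def estimate_bytes(shape, bytes_per_value=4, overhead=64 * 1024):
    """Rough upper bound: number of values times value size, plus a header allowance."""
    count = 1
    for length in shape:
        count *= length
    return count * bytes_per_value + overhead


def check_request(shape, store, max_bytes=500 * 1024 ** 2):
    """Return the estimated output size, or raise if a stored result would be too large."""
    size = estimate_bytes(shape)
    if store and size > max_bytes:
        raise DatasetTooLarge(
            "estimated %d bytes exceeds limit of %d; retry with store=false"
            % (size, max_bytes)
        )
    return size


# A (time, lat, lon) subset of 24 x 180 x 360 float values
print(check_request((24, 180, 360), store=True))
```

Requests with store=false bypass the limit in this sketch, since a streamed response would not need local temp space.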

Relationship to non-AComServ (non-NetCDF) WCS servers

  • data format(s)
    • most WCS clients don't understand NetCDF
  • Issue: protocol compatibility
    • might need to implement more optional features of WCS
  • standard compliance
    • will need a test suite for 1.1.2 (and manage to run it)

Linkages to non-WCS servers

  • Issue: is there a need?
  • Issue: which protocols? (OpenDAP?, GIS servers?)

Data Network: Technical Realization (IT) Issues and Solutions

Network components, servers, clients, catalog | Mediated access | Catalog implementation, integration | Network level data flow, statistics, performance

What is the design philosophy?

Service oriented (everything is a service), Component and network design for change; open source (everything?!)

Functionality of an Air Quality Data Network Catalog (ADNC)?

Content and structure (granularity) of ADNC?

Interoperability of ADNC

This is a key question for the goal of transforming the ADNC from virtual to real. A number of existing systems provide services either in real time or from archived data. Connecting these services, so that data from all of the different sources can be made available through a single interface (note: this doesn't mean one implementation of this interface!), poses both technical and non-technical challenges. Different protocol versions, different OGC services, different metadata descriptions and different delivery data formats need to be recognized, and some harmonisation must be achieved.

Of the existing services, the Community WCS server hubs (Datafed and Juelich) and the NASA/DLR ACP are probably the most advanced in implementing data services through the OGC WCS standard. Yet it remains to be demonstrated that these services can be connected in a fully interoperable, loosely coupled sense.

Interoperability with whom? what standards are needed? CF Naming extensions?

Access rights and access management

What are the generic (ISO, GEOSS, INSPIRE) and the AQ-specific discovery metadata?

Minimal metadata for data provenance, quality, access constraints?

Single AQ Catalog? Distributed? Service-oriented?

Network-level data flow, usage statistics (Google Analytics), performance