Difference between revisions of "Candidate Technical Topics"

From Earth Science Information Partners (ESIP)
 
(43 intermediate revisions by 5 users not shown)
 
<noinclude>{{AQ CoP Solta2011 Backlinks}}</noinclude>
 
<noinclude>{{AQ CoP Solta2011 Backlinks}}</noinclude>
  
=== Selected Server Topics ===
+
==Air Quality Data Network (ADN), Non-IT Issues==
==== Design Issues ====
+
[[AQ Network From Virtual to Real|Virtual to Real]]
* WCS version 1.0, 1.1.2, 2.0?
+
Purpose and scope of ADN | AQ network stakeholders | Relationship to integrating initiatives | Governance, legitimacy, impediments
* Combining WMS, WFS, WCS?
+
===What is the purpose of ADN? ===
==== Server for Different Data Types====
+
* Facilitate access to air quality information (measurements, model results, analyses) in order to enable users (stakeholders) to make better informed choices.
* Grid data (model, emiss., sat.)
+
* Boost scientific understanding of air chemistry and pollution transport processes by enabling synthesis views based on multi-platform observations and model ensembles.
* Point-Station (surf. Netw.)
+
* Reduce chaos by improving standardization and documentation of existing data sets and by exposing data processing and quality control procedures.
* Other data types?
+
* Ensure reproducibility of scientific analyses by allowing for traceability of data sets and definition of responsibilities within the data chain.
====Server Maintenance-Support====
 
* SourceForge, Docum. Guides
 
* Server code governance
 
====Server Performance ====
 
* Remote access or cache (??)
 
* Extraction of vertical levels (WCS problem)
 
* Streaming concept:
 
* idea is to speed delivery of netcdf files by avoiding a local copy action before the download
 
* requires separation of header creation and data transfer (this is probably accomplished in C API but we are not sure about Python API)
 
* requires knowledge about new file dimension before creation (to show status bar etc.)
 
* might not require "real streaming formats" such as ncstream [ncstream is a new format that differs from classic netcdf or netcdf 4, hence users would have to locally convert from ncstream to netcdf if they want a netcdf file in the end. Therefore, ncstream may be useful for specific client applications, but maybe not for general file download]
 
* Request caching (store metadata so that they are available for a subsequent request with same parameters)
 
* File caching (seems to work for Windows but failed on Linux - would be obsolete if streaming works)
 
  
===Selected Network Topics===
+
===Who is involved in the ADN and what are their roles?===
====AQ Network Design Issues====
+
* Data providers: mandated data origin vs. scientific data sets; what are the implications of a real ADN for them?
* Autonomy-interop. balance
+
* Data federators/network hubs: who does it and why? Who has a mandate? What are the respective roles and focal points?
* Network Catalog(s)
+
* Users: Who are they? What is the interaction with them? Who should become a user but doesn't know about it yet?
====AQ Community Catalog====
+
===What organizations are stakeholders in the network? How do they relate to ADN?===
* Domain/Application Catalog(s)
+
===What are the ADN governance, legitimacy, and impediments?===
====Network Metadata Issues====
+
===What are the minimum requirements for ADN? ===
* Discovery metadata for AQ
+
What few things must be the same, so that everything else can be different?
* Provenance, quality, security
+
* Autonomy - Interoperability balance...
====Network Operation, Maintenance====
+
* Who formulates the requirements?
* Governance, Legitimacy
+
* When is an ADN an ADN?
  
===Selected Client Topics===
+
===What is the scope of AQN? ===
====Client Applications====
+
Geographic, Variables, Ambient fixed station observations, satellite observations, emissions, models?
* Regulations/Directives
+
===What data (processing) level served through ADN?===
* Air Quality/Composition. Science
+
Is ADN a decider? Why not the Provider? Data Raw, Derived;
* Informing the public
+
 
====Client Design Issues====
+
==Data Servers: Technical Realization (IT) Issues and Solutions==
* Desktop vs web-based
+
netCDF, CF, WCS standards and conventions | Implementation for gridded and station data | Development tools | Server performance
* Workflow? Mashups?
+
===Issues re. the use of netCDF and other data formats===
====Community tools methods====
+
netCDF is a standard format for multi-dimensional data. CF-netCDF is used both as an archival format for grid data and as a payload format for WCS queries.
* Tools …
+
* Issue: ambiguity and completeness of CF
* Etc etc
+
** ''development of a server independent (python) CF-API library''
 +
*** some (beta) code available at FZJ: [http://repositories.icg.kfa-juelich.de/hg/CommonUtils/file/faad03a63f98/CommonUtils/cf_netcdf.py CommonUtils.cf_netcdf], feel free to suggest a better name
 +
** Brainstorming: What is missing in CF?
 +
** Issue: CF (udunits) time format not the same as ISO Time format (as used by WCS)
 +
*** those two cases can be processed with different code, but uniformity would be less confusing
 +
*** could try to get ISO time recommendation into CF; would still need different code because it's only a recommendation
 +
** Issue: geo-referencing
 +
*** CF offers support for projections (see [http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#grid-mappings-and-projections here]), but they are not used by the WCS Server so far
 +
*** CF-Metadata List had some discussions about handling and specification of projections recently, see thread http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2011/007935.html
 +
 
 +
* Issue: We should define a standard NetCDF python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
 +
* Issue: other output formats
 +
** support fused into server or add-on concept (possibly using the public W*S/NetCDF interface)
 +
** Delivery of (small) data sets in ASCII/csv format?
 +
* Issue: Reading other gridded input data formats? (e.g. GRIB)
 +
* Issue: traceability and revision tracking of datasets (in WCS metadata as well as in NetCDF metadata)
 +
* Info: CF standard name grammar: [http://www.met.reading.ac.uk/~jonathan/CF_metadata/14.1/ CF metadata grammar concept]
 +
 
 +
===Server co-development tools, methods ===
 +
Server code is maintained through SourceForge (bugtracker, tarballs); Darcs code repositories are available at WUSTL and in Juelich.
 +
* Issue: Version control
 +
** maintain a common codebase
 +
* Issue: Documentation
 +
** mainly inline tech documentation to date
 +
** need more documentation regarding operation
 +
** proposal: [http://sphinx.pocoo.org/ sphinx] for proper documentation (easy to include inline tech doc)
 +
 
 +
===Use of WMS, WCS, WFS .. in combination?===
 +
Data display/preview is through WMS. AQ data can be delivered through WCS or WFS. In AComServ, WCS is used for transferring n-dimensional grid and point-station data; WFS is used for delivering monitoring station descriptions.
 +
* Issue: WMS interface for preview; "latest" token for dynamic links?
 +
** generic WMS service operating on external wcs
 +
** "latest" token could be realized on WCS-WMS interface by using the metadata on the WMS/client side and requesting the latest time; latest time could be default response of WMS server if nothing else requested
 +
 
 +
===Gridded data service through WCS===
 +
WCS is implemented in multiple versions: 1.0, 1.1.2, 2.0. The AQ Community Server (AComServ) is now implemented using WCS 1.1.2.
 +
This generally works well.
 +
* Issue: serve "virtual" WCS datasets with continuous time line assembled from many source files
 +
** clients should only have to do one query to receive the whole time series in one piece, instead of requiring client-side logic to request multiple pieces
 +
** could create a "wrapper" module that can handle such cases with knowledge of the server-side file structure
 +
*** Kari has already done something like this for HTAP datasets, this could be a starting point
 +
* Issue: desirable time filtering options in WCS: hour of day, day of week, day of month, etc.
 +
* Issue: Extraction of vertical levels
 +
** already defined through the rangesubset/fieldsubset parameter
 +
** is this definition OK for us or would we need something else/better?
 +
*** potential problem: only enumeration of levels possible, no ranges
 +
* Issue: current state of WCS 2.0? core released, but extensions still in draft (how do we know/keep track of what is currently valid?)
 +
 
 +
===Delivery of station-point data===
 +
* Issue: use WCS or WFS, Combination of both? (see also [[WCS_Server_Software#WCS_Server_for_Station-Point_Data_Type]])
 +
 
 +
=== Access rights ===
 +
* Issue: technical options to restrict access to datasets?
 +
 
 +
===Data server performance issues/solutions? ===
 +
Define performance issues, measurements, ideas
 +
 
 +
* Issue: especially big datasets take a long time to prepare for delivery (slicing/subsetting, etc.)
 +
** direct streaming of datasets to the client could be part of the solution, [[Streaming_and_or_netCDF_File|click here]] for details
 +
** generated datasets could be cached for a while, so they could be delivered again when there is a request with compatible parameters
 +
** problem: both proposals might be mutually exclusive to some degree
 +
* Issue: management overhead when opening NetCDF
 +
** when opening a NetCDF file, some metadata has to be read and data structures have to be set up
 +
*** input files could be kept open for a while to avoid this overhead
 +
* Issue: temp file space is limited on WCS server
 +
** streaming approach for store=false parameter would not require additional local storage
 +
** temp file approach for store=true parameter could be limited by a maximum dataset size
 +
*** requires a reliable output file size estimator
 +
*** server would return an exception if estimated size is over given threshold
 +
*** would force people to use store=false for large datasets
 +
*** should not violate WCS 1.1 standard (too badly) as only store=false is mandatory
 +
* Issue: XML Metadata assembly might take a long time depending on the catalog content, e.g. with a lot of Identifiers
 +
** GetCapabilities response Metadata is very static anyway, other responses (DescribeCoverage) could be cached for a while
 +
*** attention: DescribeCoverage response depends on parameters
 +
** minor issue compared to actual data delivery performance
 +
 
 +
===Relationship to non-AComServ (non-NetCDF) WCS servers===
 +
* data format(s)
 +
** most WCS clients don't understand NetCDF
 +
* Issue: protocol compatibility
 +
** might need to implement more optional features of WCS
 +
* standard compliance
 +
** will need a test suite for 1.1.2 (and manage to run it)
 +
===Linkages to non-WCS servers===
 +
* Issue: is there a need?
 +
* Issue: which protocols? (OpenDAP?, GIS servers?)
 +
 
 +
==Data Network: Technical Realization (IT) Issues and Solutions==
 +
Network components, servers, clients, catalog | Mediated access | Catalog implementation, integration | Network level data flow, statistics, performance
 +
===What is the design philosophy ===
 +
Service oriented (everything is a service), Component and network design for change; open source (everything?!)
 +
===Functionality of an Air Quality Data Network Catalog (ADNC)? ===
 +
===Content and structure (granularity) of ADNC?===
 +
===Interoperability of ADNC===
 +
This is a key question for achieving the goal to transform the ADNC from virtual to real. There are a number of systems out there which provide services in either real-time or from archived data. Connecting these services, so that data from all of the different sources can be made available through a single interface (note: this doesn't mean one implementation of this interface!) represents different technical and non-technical challenges. Different protocol versions, different OGC services, different metadata descriptions and different data formats upon delivery need to be recognized and some harmonisation must be achieved here.
 +
 
 +
From the existing services the Community WCS server hubs (Datafed and Juelich) and the NASA/DLR ACP are probably most advanced in terms of implementing data services through the OGC WCS standard. Yet, it remains to be demonstrated that these services can be connected in the fully interoperable loose coupling sense.
 +
 
 +
Interoperability with whom? what standards are needed? CF Naming extensions?
 +
 
 +
=== Access rights and access management===
 +
===What are the generic (ISO, GEOSS, INSPIRE) and the AQ-specific discovery metadata?===
 +
===Minimal metadata for data provenance, quality, access constraints?===
 +
===Single AQ Catalog? Distributed? Service-oriented?===
 +
===Network-level data flow, usage statistics (GoogleAnalytics), performance===

Latest revision as of 07:55, August 30, 2011


Air Quality Data Network (ADN), Non-IT Issues

Virtual to Real Purpose and scope of ADN | AQ network stakeholders | Relationship to integrating initiatives | Governance, legitimacy, impediments

What is the purpose of ADN?

  • Facilitate access to air quality information (measurements, model results, analyses) in order to enable users (stakeholders) to make better informed choices.
  • Boost scientific understanding of air chemistry and pollution transport processes by enabling synthesis views based on multi-platform observations and model ensembles.
  • Reduce chaos by improving standardization and documentation of existing data sets and by exposing data processing and quality control procedures.
  • Ensure reproducibility of scientific analyses by allowing for traceability of data sets and definition of responsibilities within the data chain.

Who is involved in the ADN and what are their roles?

  • Data providers: mandated data origin vs. scientific data sets; what are the implications of a real ADN for them?
  • Data federators/network hubs: who does it and why? Who has a mandate? What are the respective roles and focal points?
  • Users: Who are they? What is the interaction with them? Who should become a user but doesn't know about it yet?

What organizations are stakeholders in the network? How do they relate to ADN?

What are the ADN governance, legitimacy, and impediments?

What are the minimum requirements for ADN?

What few things must be the same, so that everything else can be different?

  • Autonomy - Interoperability balance...
  • Who formulates the requirements?
  • When is an ADN an ADN?

What is the scope of AQN?

Geographic, Variables, Ambient fixed station observations, satellite observations, emissions, models?

What data (processing) level served through ADN?

Is ADN a decider? Why not the Provider? Data Raw, Derived;

Data Servers: Technical Realization (IT) Issues and Solutions

netCDF, CF, WCS standards and conventions | Implementation for gridded and station data | Development tools | Server performance

Issues re. the use of netCDF and other data formats

netCDF is a standard format for multi-dimensional data. CF-netCDF is used both as an archival format for grid data and as a payload format for WCS queries.

  • Issue: ambiguity and completeness of CF
    • development of a server independent (python) CF-API library
    • Brainstorming: What is missing in CF?
    • Issue: CF (udunits) time format not the same as ISO Time format (as used by WCS)
      • those two cases can be processed with different code, but uniformity would be less confusing
      • could try to get ISO time recommendation into CF; would still need different code because it's only a recommendation
    • Issue: geo-referencing
  • Issue: We should define a standard NetCDF python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
  • Issue: other output formats
    • support fused into server or add-on concept (possibly using the public W*S/NetCDF interface)
    • Delivery of (small) data sets in ASCII/csv format?
  • Issue: Reading other gridded input data formats? (e.g. GRIB)
  • Issue: traceability and revision tracking of datasets (in WCS metadata as well as in NetCDF metadata)
  • Info: CF standard name grammar: CF metadata grammar concept
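
The CF (udunits) versus ISO time issue listed above can be illustrated with a small conversion sketch. This is not the AComServ code; the function names are hypothetical, only fixed-length units (seconds to days) are handled, and the reference time is assumed to be UTC and written as "YYYY-MM-DD" or "YYYY-MM-DD HH:MM:SS".

```python
from datetime import datetime, timedelta, timezone

# Seconds per CF/udunits time unit; calendar units (months, years) are not handled.
_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def parse_cf_time_units(units):
    """Split a CF units string such as 'hours since 1970-01-01 00:00:00'."""
    unit, since, reference = units.split(None, 2)
    if since != "since" or unit not in _UNIT_SECONDS:
        raise ValueError("unsupported CF time units: %r" % units)
    reference = reference.strip()
    # Assumption: the reference time is UTC, with or without a time of day.
    fmt = "%Y-%m-%d %H:%M:%S" if " " in reference else "%Y-%m-%d"
    ref = datetime.strptime(reference, fmt).replace(tzinfo=timezone.utc)
    return _UNIT_SECONDS[unit], ref


def cf_to_iso(value, units):
    """Convert a CF time offset to an ISO 8601 UTC string as used in WCS requests."""
    seconds_per_unit, ref = parse_cf_time_units(units)
    moment = ref + timedelta(seconds=value * seconds_per_unit)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def iso_to_cf(iso_string, units):
    """Convert an ISO 8601 UTC string back to a CF time offset."""
    seconds_per_unit, ref = parse_cf_time_units(units)
    moment = datetime.strptime(iso_string, "%Y-%m-%dT%H:%M:%SZ")
    moment = moment.replace(tzinfo=timezone.utc)
    return (moment - ref).total_seconds() / seconds_per_unit
```

Keeping both directions in one small module would at least confine the two time notations to a single place in the server code.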

Server co-development tools, methods

Server code is maintained through SourceForge (bugtracker, tarballs); Darcs code repositories are available at WUSTL and in Juelich.

  • Issue: Version control
    • maintain a common codebase
  • Issue: Documentation
    • mainly inline tech documentation to date
    • need more documentation regarding operation
    • proposal: sphinx for proper documentation (easy to include inline tech doc)

Use of WMS, WCS, WFS .. in combination?

Data display/preview is through WMS. AQ data can be delivered through WCS or WFS. In AComServ, WCS is used for transferring n-dimensional grid and point-station data; WFS is used for delivering monitoring station descriptions.

  • Issue: WMS interface for preview; "latest" token for dynamic links?
    • generic WMS service operating on external wcs
    • "latest" token could be realized on WCS-WMS interface by using the metadata on the WMS/client side and requesting the latest time; latest time could be default response of WMS server if nothing else requested
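
One possible reading of the "latest" token idea is sketched below: the WMS side takes the list of time stamps from the coverage metadata and substitutes the newest one when the client asks for "latest" or gives no time at all. The function name and the list-of-strings input are assumptions made for illustration, not part of any WMS or WCS specification.

```python
def resolve_time_token(requested, available_times):
    """Return the time stamp to use for a requested TIME value.

    available_times: ISO 8601 UTC strings of identical layout, e.g. taken
    from a DescribeCoverage response (assumed input, not a real API).
    """
    if not available_times:
        raise ValueError("coverage has no time stamps")
    if requested is None or requested.lower() == "latest":
        # ISO 8601 strings of identical layout sort chronologically.
        return max(available_times)
    if requested not in available_times:
        raise ValueError("time %r is not available" % requested)
    return requested
```

With this approach a dynamic link never needs to change: the token is resolved on each request against the current metadata.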

Gridded data service through WCS

WCS is implemented in multiple versions: 1.0, 1.1.2, 2.0. The AQ Community Server (AComServ) is now implemented using WCS 1.1.2. This generally works well.

  • Issue: serve "virtual" WCS datasets with continuous time line assembled from many source files
    • clients should only have to do one query to receive the whole time series in one piece, instead of requiring client-side logic to request multiple pieces
    • could create a "wrapper" module that can handle such cases with knowledge of the server-side file structure
      • Kari has already done something like this for HTAP datasets, this could be a starting point
  • Issue: desirable time filtering options in WCS: hour of day, day of week, day of month, etc.
  • Issue: Extraction of vertical levels
    • already defined through the rangesubset/fieldsubset parameter
    • is this definition OK for us or would we need something else/better?
      • potential problem: only enumeration of levels possible, no ranges
  • Issue: current state of WCS 2.0? core released, but extensions still in draft (how do we know/keep track of what is currently valid?)
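
The "virtual" dataset and time-filtering issues above could be approached with a small wrapper that knows which source file covers which time span. The sketch below is only an illustration of that idea: the index layout, the file names and both function names are hypothetical and do not describe the existing HTAP wrapper.

```python
from datetime import datetime


def files_for_range(index, start, end):
    """Select the source files that overlap the requested time range.

    index: list of (first_time, last_time, path) tuples, one per source file
    (a hypothetical server-side description of the file structure).
    """
    selected = [entry for entry in index if entry[0] <= end and entry[1] >= start]
    # Sorting by first_time yields the files in chronological order.
    return [path for _, _, path in sorted(selected)]


def keep_hours(times, hours):
    """Filter time stamps by hour of day, e.g. hours={12} for noon values only."""
    return [t for t in times if t.hour in hours]


# Usage with three hypothetical monthly files:
index = [
    (datetime(2011, 3, 1), datetime(2011, 3, 31, 23), "model_201103.nc"),
    (datetime(2011, 1, 1), datetime(2011, 1, 31, 23), "model_201101.nc"),
    (datetime(2011, 2, 1), datetime(2011, 2, 28, 23), "model_201102.nc"),
]
paths = files_for_range(index, datetime(2011, 1, 15), datetime(2011, 2, 10))
```

The client would still issue a single query; the wrapper would read the selected files in order and concatenate the slices along the time axis.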

Delivery of station-point data

Access rights

  • Issue: technical options to restrict access to datasets?

Data server performance issues/solutions?

Define performance issues, measurements, ideas

  • Issue: especially big datasets take a long time to prepare for delivery (slicing/subsetting, etc.)
    • direct streaming of datasets to the client could be part of the solution, click here for details
    • generated datasets could be cached for a while, so they could be delivered again when there is a request with compatible parameters
    • problem: both proposals might be mutually exclusive to some degree
  • Issue: management overhead when opening NetCDF
    • when opening a NetCDF file, some metadata has to be read and data structures have to be set up
      • input files could be kept open for a while to avoid this overhead
  • Issue: temp file space is limited on WCS server
    • streaming approach for store=false parameter would not require additional local storage
    • temp file approach for store=true parameter could be limited by a maximum dataset size
      • requires a reliable output file size estimator
      • server would return an exception if estimated size is over given threshold
      • would force people to use store=false for large datasets
      • should not violate WCS 1.1 standard (too badly) as only store=false is mandatory
  • Issue: XML Metadata assembly might take a long time depending on the catalog content, e.g. with a lot of Identifiers
    • GetCapabilities response Metadata is very static anyway, other responses (DescribeCoverage) could be cached for a while
      • attention: DescribeCoverage response depends on parameters
    • minor issue compared to actual data delivery performance
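
The size-limit proposal for store=true requests can be sketched as follows. This is a rough illustration under stated assumptions: the output is taken to be uncompressed, the 4 bytes per value and the 4096-byte header allowance are guesses, and the exception class and function names are hypothetical. A real server would report the failure as a WCS exception report.

```python
class OutputTooLarge(Exception):
    """Raised when a store=true request would exceed the temp file limit."""


def estimate_bytes(shape, bytes_per_value=4, header_bytes=4096):
    """Estimate the size of an uncompressed output file for a subset of given shape."""
    count = 1
    for n in shape:
        count *= n
    return header_bytes + count * bytes_per_value


def check_store_request(shape, store, max_bytes):
    """Return the estimated size, refusing store=true requests over the limit."""
    size = estimate_bytes(shape)
    if store and size > max_bytes:
        raise OutputTooLarge(
            "estimated %d bytes exceeds limit of %d; retry with store=false"
            % (size, max_bytes)
        )
    return size
```

Streamed (store=false) requests pass the check regardless of size, since they would not occupy temp file space on the server.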

Relationship to non-AComServ (non-NetCDF) WCS servers

  • data format(s)
    • most WCS clients don't understand NetCDF
  • Issue: protocol compatibility
    • might need to implement more optional features of WCS
  • standard compliance
    • will need a test suite for 1.1.2 (and manage to run it)

Linkages to non-WCS servers

  • Issue: is there a need?
  • Issue: which protocols? (OpenDAP?, GIS servers?)

Data Network: Technical Realization (IT) Issues and Solutions

Network components, servers, clients, catalog | Mediated access | Catalog implementation, integration | Network level data flow, statistics, performance

What is the design philosophy

Service oriented (everything is a service), Component and network design for change; open source (everything?!)

Functionality of an Air Quality Data Network Catalog (ADNC)?

Content and structure (granularity) of ADNC?

Interoperability of ADNC

This is a key question for achieving the goal to transform the ADNC from virtual to real. There are a number of systems out there which provide services in either real-time or from archived data. Connecting these services, so that data from all of the different sources can be made available through a single interface (note: this doesn't mean one implementation of this interface!) represents different technical and non-technical challenges. Different protocol versions, different OGC services, different metadata descriptions and different data formats upon delivery need to be recognized and some harmonisation must be achieved here.

From the existing services the Community WCS server hubs (Datafed and Juelich) and the NASA/DLR ACP are probably most advanced in terms of implementing data services through the OGC WCS standard. Yet, it remains to be demonstrated that these services can be connected in the fully interoperable loose coupling sense.

Interoperability with whom? what standards are needed? CF Naming extensions?

Access rights and access management

What are the generic (ISO, GEOSS, INSPIRE) and the AQ-specific discovery metadata?

Minimal metadata for data provenance, quality, access constraints?

Single AQ Catalog? Distributed? Service-oriented?

Network-level data flow, usage statistics (GoogleAnalytics), performance