Solta 2011 Agenda

From Earth Science Information Partners (ESIP)

Revision as of 08:41, August 3, 2011


Monday Evening: Registration and Social

Tue 8.00-10.00: Self-Introduction, 5 mins/participant

In order to make efficient use of our time in Croatia, we ask you all to prepare for the workshop in the following ways (aside from arranging your travel etc.):

  • Slides 1-2: Name, Institution, Relevant research, development or organizational work on AQ data system interoperability and networking
  • Slide(s) 3-(4): Involvement and participation in projects, programs, i.e. list of Integrating Initiatives. Potential contributions.

Tue 10.30-12.30: Hubs, servers, ADN, CoP

This session will be a status report on the current state of data accessibility and networking.

  • current state of AQ data networking with the standards-based, community-based server software
  • Role of GEO AQ CoP

What is the purpose of ADN?
Who are the participating users of the network? What are their roles?
What organizations are stakeholders in the network?
How do they relate to ADN?

Tue 16.00-18.00: IT: Community server software

This session covers: standards and conventions | implementation for gridded and station data | development tools | server performance

Issues re. the use of netCDF and other data formats
netCDF is a standard format for multi-dimensional data. CF-netCDF is used both as an archival format for gridded data and as a payload format for WCS queries.

  • Issue: ambiguity and completeness of CF
    • Issue: CF (udunits) time format not the same as ISO Time format (as used by WCS)
    • Issue: geo-referencing (also see CF-ML discussion "the need to store lat/lon coordinates in a CF-compliant netCDF file")
    • What is missing in CF?
    • server-independent CF-API package
  • Issue: We should define a standard Python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
  • Issue: other output formats
    • support fused into server or add-on concept (possibly using the public W*S/NetCDF interface)
    • Delivery of (small) data sets in ASCII/csv format?
  • Issue: Reading other gridded input data formats? (e.g. GRIB)
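
The mismatch between CF (udunits) time and ISO time noted above can be illustrated with a minimal sketch. It assumes UTC, the standard calendar, fixed-length units only, and a reference date written exactly as `YYYY-MM-DD HH:MM:SS`. The function name `cf_time_to_iso` is hypothetical and is not part of the server code.

```python
from datetime import datetime, timedelta, timezone

# Seconds per unit for the fixed-length units handled by this sketch.
_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def cf_time_to_iso(value, units):
    """Convert a CF-style offset such as (36, 'hours since 2011-08-01 00:00:00')
    into an ISO 8601 timestamp. Assumes UTC and the standard calendar."""
    unit, _, reference = units.partition(" since ")
    ref = datetime.strptime(reference.strip(), "%Y-%m-%d %H:%M:%S")
    ref = ref.replace(tzinfo=timezone.utc)
    seconds = value * _UNIT_SECONDS[unit.strip().lower()]
    return (ref + timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M:%SZ")


print(cf_time_to_iso(36, "hours since 2011-08-01 00:00:00"))
```

A real implementation would also need to handle other calendars, time zones in the reference date, and the remaining udunits spellings, which is part of why a shared, server-independent package is attractive.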

Data server performance issues/solutions

  • Issue: especially big datasets take a long time to prepare for delivery (slicing/subsetting, etc.)
    • direct streaming of datasets to the client could be part of the solution; see the Streaming_and_or_netCDF_File page for details
    • generated datasets could be cached for a while, so they could be delivered again when there is a request with compatible parameters
    • problem: both proposals might be mutually exclusive to some degree
  • Issue: XML Metadata assembly might take a long time depending on the catalogue content, e.g. with a lot of Identifiers
    • GetCapabilities response Metadata is very static anyway, other responses (DescribeCoverage) could be cached for a while
      • attention: DescribeCoverage response depends on parameters
  • Issue: management overhead when opening NetCDF
    • when opening a NetCDF file, some metadata has to be read and data structures have to be set up
      • input files could be kept open for a while to avoid this overhead
  • Issue: temp file space is limited on WCS server
    • streaming approach for store=false parameter would not require additional local storage
    • temp file approach for store=true parameter could be limited by a maximum dataset size
      • requires a reliable output file size estimator
      • server would return an exception if estimated size is over given threshold
      • would force people to use store=false for large datasets
      • should not violate WCS 1.1 standard (too badly) as only store=false is mandatory
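
The size-limit idea for store=true requests could look roughly like the sketch below. It assumes the output size can be approximated as cells × fields × bytes per value, with no compression or header overhead. The names `estimate_bytes`, `check_request`, `OutputTooLarge`, and the 500 MB default are all hypothetical.

```python
from math import prod


class OutputTooLarge(Exception):
    """Hypothetical error for a store=true request that exceeds the size limit."""


def estimate_bytes(shape, bytes_per_value=4, n_fields=1):
    """Rough uncompressed size of a subset: cells * fields * bytes per value."""
    return prod(shape) * n_fields * bytes_per_value


def check_request(shape, store, max_bytes=500 * 1024**2, **kwargs):
    """Return the delivery mode, or raise if a stored result would be too large."""
    if not store:
        # store=false: the result is streamed, so no temp space is needed.
        return "stream"
    size = estimate_bytes(shape, **kwargs)
    if size > max_bytes:
        raise OutputTooLarge(f"estimated {size} bytes exceeds limit of {max_bytes}")
    return "tempfile"


print(check_request((10, 100, 100), store=True))
```

In a server, the raised error would be translated into a WCS exception report, and the estimator would have to account for the actual output format before it could be called reliable.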

Data Servers: Technical Realization (IT) Issues and Solutions

  • which W*S protocol for which purpose, how to combine?
    • WMS for display/preview of spatial data
    • WFS .. for station description/spatial metadata?
    • WCS for "everything else"? (gridded ("raw") datasets)
  • WCS Data structure hierarchy: DataHub; Service; Coverage; Field; Flag
    • WCS 1.1 terminology: Service->Group of similar datasets; Coverage->Dataset; Field->Parameter; Flag->Flag

Gridded data service through WCS
WCS exists in multiple versions: 1.0, 1.1.2, 2.0. The AQ Community Server (AComServ) is now implemented using WCS 1.1.2. This generally works well.

  • Issue: Extraction of vertical levels
  • Issue: current state of WCS 2.0?
    • core released, but extensions still in draft (how do we know what is valid?)
  • Issue: serve "virtual" WCS datasets with continuous time line assembled from many source files
    • create a "wrapper" module that can handle such cases?
    • Kari has already done something like this for HTAP datasets, this could be a starting point
  • Issue: desirable time filtering options in WCS: hour of day, day of week, day of month, etc.
    • Kari has already created such filters, but so far they are outside the standard
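
As an illustration of the time filtering options listed above (hour of day, day of week, day of month), here is a minimal sketch that works on a plain list of timestamps. It is not the existing filter code mentioned above. The function name `filter_times` and its parameters are hypothetical.

```python
from datetime import datetime


def filter_times(times, hours=None, weekdays=None, days_of_month=None):
    """Keep timestamps that match every given filter.

    Each filter is a set of allowed values; None means 'no restriction'.
    Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
    """
    def keep(t):
        return ((hours is None or t.hour in hours)
                and (weekdays is None or t.weekday() in weekdays)
                and (days_of_month is None or t.day in days_of_month))

    return [t for t in times if keep(t)]


# Hourly time axis covering two days.
axis = [datetime(2011, 8, d, h) for d in (1, 2) for h in range(24)]
noon_only = filter_times(axis, hours={12})
print(len(noon_only))
```

The same predicate could be applied to the time axis of a coverage to select indices before subsetting, so that only the matching time steps are read from the source files.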

Delivery of station-point data

  • Issue: use WCS or WFS, Combination of both/which combination?

Access rights

  • Issue: technical options to restrict access to datasets?

Server co-development tools, methods
Server code is maintained through SourceForge; Darcs code repositories are available at WUSTL and in Juelich.

  • Issues: Platform independence (netcdf interface), Documentation

Relationship to non-AComServ (non-NetCDF) WCS servers

  • data format(s)
    • many WCS clients don't understand NetCDF
  • Issue: protocol compatibility
    • might need to implement more optional features of WCS
  • standard compliance
    • will need a test suite for 1.1.2 (and manage to run it)

Real Data-to-WCS-Mapping Structure

  • Data hub that exposes the data ==> Provider ==> WCS Service
  • Observation platform or network ==> Dataset ==> WCS Coverage
  • Observation parameter/variable ==> Parameter ==> WCS Field
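
The mapping above can be pictured as a small nested data structure. The sketch below is only an illustration: the class names `WcsService`, `WcsCoverage`, `WcsField`, and the example identifiers are hypothetical and are not taken from the AComServ code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WcsField:
    """Observation parameter/variable (Parameter)."""
    name: str


@dataclass
class WcsCoverage:
    """Observation platform or network (Dataset)."""
    identifier: str
    fields: List[WcsField] = field(default_factory=list)


@dataclass
class WcsService:
    """Data hub that exposes the data (Provider)."""
    provider: str
    coverages: List[WcsCoverage] = field(default_factory=list)

    def paths(self):
        """List every provider/coverage/field combination the service offers."""
        return [f"{self.provider}/{c.identifier}/{f.name}"
                for c in self.coverages for f in c.fields]


# Made-up example content.
service = WcsService("ExampleHub", [
    WcsCoverage("example_network", [WcsField("ozone"), WcsField("pm25")]),
])
print(service.paths())
```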

Tue 16.00-18.00: NoIT: ADN scope, providers, users

  • Short description of major data hubs
    • DataFed
    • FJ Juelich
    • NGC
    • VIEWS
    • EBAS

    • DLR
    • RSIG
    • AIRNow
    • EEA NRT

Wed 8.00-10.00: Breakout reports, general server items

Report from the IT breakout session: Community server software
Report from the non-IT breakout session: ADN scope, providers, users
Report from the pre-workshop Data Catalog side meeting
Possible discussion topics/focus on cross-overs between IT and non-IT issues

  • standard definitions (clarity, ambiguity, completeness, ...)
  • standard development and documentation
  • open-source server software development
  • platform issues, portability
  • coding language(s), code interchangeability
  • coding style and software development approaches
  • Data Content
  • organisation of data
  • data formats, standard compliance
  • data access
  • performance
  • flexibility
  • user friendliness
  • meeting user demands (fitness for purpose)
  • governance, responsibilities, etc.
  • Open Source collaborative approach. Issues?
  • General software design: Multi-layer, Multi-protocol. Standard-Convention driven
  • Porting, Installation. Issues?
  • Maintenance, governance. Issues?
  • Criteria for single (trusted ?) 'primary' data source
  • Designations for secondary, derived, augmented data sources

Wed 10.30-12.30: AQ network: Servers, Catalog, Clients

Preparing the way forward...

What few things must be the same, so that everything else can be different?

Metadata for finding and understanding (CF, ISO)
Data access/use constraints, quality control, data versioning, etc.
What is the design philosophy?
Service oriented (everything is a service), Component and network design for change; open source (everything?!)
Network-level data flow, usage statistics (GoogleAnalytics), performance

... goal is to obtain a good basis for discussion in the following breakout sessions, both from the IT and non-IT sides.


  • Server Software Design (uFIND). Issues?

Functionality of an Air Quality Data Network Catalog (ADNC)?
Content and structure (granularity) of ADNC?
Interoperability of ADNC

  • Interoperability with whom? what standards are needed? CF Naming extensions?
  • AQ Discovery Metadata Convention (for use in ISO, Data Catalogs...)
  • Extend CF Naming conventions for Point Data
  • Devise human-readable CF naming equivalents?

Access rights and access management

What are the generic (ISO, GEOSS, INSPIRE) and the AQ-specific discovery metadata?
Minimal metadata for data provenance, quality, access constraints?
Single AQ Catalog? Distributed? Service-oriented?

  • GI-cat
  • uFind

Wed 16.00-18.00: Relationship, cooperation, governance

  • User perspective, value chain .. user can not find..

  • EPA - HTAP Terry Keating??
  • EEA - H. Anderson??, Peder ??


Relationships, cross-thematic links (EGIDA, ESIP) How can we collaborate?

Group dinner

Thu 8.00-10.00: Networking impediments, opportunities

  • Clear statements about obstacles..
    • no organizational structure,
    • no dedicated funding or
    • no clear idea yet how to do it (or several independent ideas?).
  • Opportunities, fixes
    • Identification of manageable work packages
    • Reusable components, resources

JJ Bogardi, Global Water System Project (GWSP) at EGIDA, Bonn:
Nature of Networking Projects: Complex funding .., multiple obligations. Interdisciplinary and international. Differing project maturity. Mixture of paid and voluntary contributors. Governance and project cultures may differ.
Which glue keeps it together? Trust and personal affinity. Common objectives and scientific values. Mutual respect. Mutual benefit (win-win). Complementarity. Donor dictate
„Lethal“ ingredients. Turf mentality. Budget discrepancies. Too much competition. Lack of data and information exchange. Donor jealousy


What can we do to achieve a "win-win" situation?

Thu 10.30-12.30: ADN user relations/help, whom, what?

  • Target User communities
  • Use cases for different applications (from scientists to managers, media people and the general public)?
  • How can users find out about (each) system? Big future issues: data quality, traceability, metadata..

Relationship to non-AComServ WCS servers

  • Issue: protocol compatibility, standard compliance, data format(s)
  • Issue: is there a need? How can we benefit from "other" data? How can they benefit from AQ data?
  • Issue: which protocols? (OpenDAP?, GIS servers?)

Thu 16.00-18.00: Workshop outputs, outcomes, plans?

What are the anticipated outputs? Agreement on community WCS server for grid and point data; server governance, distributed catalog; workshop summary
What are the anticipated outcomes? Better understanding of the network, higher level of trust and concrete steps toward turning the ADN from virtual to real
What are the short-term opportunities? Do we have common long-term goals and visions?

Friday: Boat trip