Solta 2011 Agenda

From Earth Science Information Partners (ESIP)


Monday Evening: Registration and Social

Tue 8.00-10.00: Self-Introduction, 5 mins/participant

Husar/Vidic


In order to make efficient use of our time in Croatia, we ask you all to prepare for the workshop in the following ways (aside from arranging your travel etc.):

  • Slides 1-2: Name, Institution, Relevant research, development or organizational work on AQ data system interoperability and networking
  • Slide(s) 3-(4): Involvement and participation in projects, programs, i.e. list of Integrating Initiatives. Potential contributions.

Tue 10.30-12.30: Introduction of Hubs, 5 min each, CoP 5 min

Schultz/Bernonville


This session will be a status report from major data hubs:

  • DataFed
  • FJ Juelich
  • NGC/CIERA
  • EBAS
  • DLR/ACP
  • EEA NRT
  • AIRNow
  • AQMEII

--Afternoon 16:00 US Nodes: VIEWS, AeroCom, RSIG, GIOVANNI --
Existing Data Hubs

  • Hubs are established organizations to deliver AQ data
  • Part of their mandate is to integrate, harmonize data
  • The data offerings are directed toward clients
  • Provide data through conventional data transfer

--- Current - Network comparison ---
Data integration is pursued for:

  • Facilitating access and reuse through a single access point
  • Providing more comprehensive information by combining complementary data.

Data hubs already perform data integration within their respective domains. AQ data networking extends the scope of the integration by:

  • Integrating the integrating hubs (non-intrusively!)
  • Using a standard interfacing protocol for loose coupling, i.e. networking
  • Providing generic processing services that are applicable to all data


Data Catalogs

  • AQ Community Catalog
  • GI-cat


Networking: "What few things must be the same?"

  1. Hubs expose a fraction of their holdings as standards-based data access web services
  2. Data resources are
  • Role of GEO AQ CoP
  • Role of Integrating Initiatives

Tue 16.00-18.00: IT: AQ Community server software

Decker/Hoijarvi


This session covers standards and conventions | Implementation for gridded and station data | Development tools | Server performance

Issues re. the use of netCDF and other data formats
netCDF is a standard format for multi-dimensional data. CF-netCDF is used both as an archival format for gridded data and as a payload format for WCS queries.

  • Issue: ambiguity and completeness of CF
    • Issue: CF (udunits) time format not the same as ISO Time format (as used by WCS)
    • Issue: geo-referencing (also see CF-ML discussion "the need to store lat/lon coordinates in a CF-compliant netCDF file")
    • What is missing in CF?
    • server-independent CF-API package
  • Issue: We should define a standard python interface (PyNIO, python-netcdf4, scipy.io.netcdf?)
  • Issue: other output formats
    • support fused into server or add-on concept (possibly using the public W*S/NetCDF interface)
    • Delivery of (small) data sets in ASCII/csv format?
  • Issue: Reading other gridded input data formats? (e.g. GRIB)
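
The CF (udunits) versus ISO time mismatch noted above can be illustrated with a small conversion. This is a minimal sketch, assuming the standard (Gregorian) calendar, UTC, fixed-length units, and a reference date written as "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss". The function name cf_time_to_iso is hypothetical and is not part of any existing server code; other CF calendars (e.g. noleap, 360_day) and time zone suffixes are not handled.

```python
from datetime import datetime, timedelta, timezone

# Seconds per CF/udunits time unit handled by this sketch (fixed-length units only).
_UNIT_SECONDS = {
    "second": 1, "seconds": 1,
    "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}


def cf_time_to_iso(value, units):
    """Convert a CF time value plus its units string to an ISO 8601 UTC string.

    Example units string: 'hours since 2011-01-01 00:00:00'.
    Assumption: standard calendar and UTC reference time.
    """
    unit, since, reference = units.split(None, 2)
    if since != "since":
        raise ValueError(f"not a CF time units string: {units!r}")
    ref = reference.strip().replace("T", " ")
    fmt = "%Y-%m-%d %H:%M:%S" if " " in ref else "%Y-%m-%d"
    start = datetime.strptime(ref, fmt).replace(tzinfo=timezone.utc)
    moment = start + timedelta(seconds=value * _UNIT_SECONDS[unit.lower()])
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

A server could apply such a conversion when translating between the time axis stored in a CF-netCDF file and the ISO time strings used in WCS requests and responses.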

Data server performance issues/solutions

  • Issue: especially big datasets take a long time to prepare for delivery (slicing/subsetting, etc.)
    • direct streaming of datasets to the client could be part of the solution
    • generated datasets could be cached for a while, so they could be delivered again when there is a request with compatible parameters
    • problem: both proposals might be mutually exclusive to some degree
  • Issue: XML Metadata assembly might take a long time depending on the catalogue content, i.e. with a lot of Identifiers
    • GetCapabilities response Metadata is very static anyway, other responses (DescribeCoverage) could be cached for a while
      • attention: DescribeCoverage response depends on parameters
  • Issue: management overhead when opening NetCDF
    • when opening a NetCDF file, some metadata has to be read and data structures have to be set up
      • input files could be kept open for a while to avoid this overhead
  • Issue: temp file space is limited on WCS server
    • streaming approach for store=false parameter would not require additional local storage
    • temp file approach for store=true parameter could be limited by a maximum dataset size
      • requires a reliable output file size estimator
      • server would return an exception if estimated size is over given threshold
      • would force people to use store=false for large datasets
      • should not violate WCS 1.1 standard (too badly) as only store=false is mandatory
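
The size limit for store=true requests discussed above could look roughly like the following. This is a sketch under stated assumptions, not the AComServ implementation: it assumes uncompressed output, 4-byte values, and a fixed allowance for the file header, and all names (estimate_output_bytes, check_store_request, DatasetTooLarge) are hypothetical.

```python
from math import prod


class DatasetTooLarge(Exception):
    """Raised when a store=true request would exceed the temp-space limit."""


def estimate_output_bytes(shape, bytes_per_value=4, header_bytes=4096):
    """Estimate the size of an uncompressed subset.

    shape is the number of cells along each requested dimension,
    e.g. (time, lat, lon). The header allowance is an assumed constant.
    """
    return prod(shape) * bytes_per_value + header_bytes


def check_store_request(shape, store, max_bytes):
    """Return the estimated size, or raise if a stored result would be too large.

    Requests with store=false are never rejected here, because a streamed
    response needs no temporary file on the server.
    """
    size = estimate_output_bytes(shape)
    if store and size > max_bytes:
        raise DatasetTooLarge(
            f"estimated {size} bytes exceeds limit of {max_bytes}; use store=false"
        )
    return size
```

The exception message points the client toward store=false, which matches the intent of steering large requests to the streaming path.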

Data Servers: Technical Realization (IT) Issues and Solutions

  • which W*S protocol for which purpose, how to combine?
    • WMS for display/preview of spatial data
    • WFS .. for station description/spatial metadata?
    • WCS for "everything else"? (gridded ("raw") datasets)
  • WCS Data structure hierarchy: DataHub; Service; Coverage: Field; Flag
    • WCS 1.1 terminology: Service->Group of similar datasets; Coverage->Dataset; Field->Parameter; Flag->Flag
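
For the WCS case in the list above, a GetCoverage request in key-value form might be assembled as sketched below. The parameter names follow the WCS 1.1 key-value encoding. The function name, the base URL, the coverage identifier, the output format string, and the CRS URN in the usage line are illustrative assumptions only.

```python
from urllib.parse import urlencode


def get_coverage_url(base_url, identifier, bbox, time_sequence,
                     fmt="application/x-netcdf", store=False,
                     crs="urn:ogc:def:crs:OGC:2:84"):
    """Build a WCS 1.1.2 GetCoverage URL in key-value form.

    bbox is (min_x, min_y, max_x, max_y); the CRS is appended as the last
    BoundingBox element. The default format and CRS values are assumptions.
    """
    params = [
        ("service", "WCS"),
        ("version", "1.1.2"),
        ("request", "GetCoverage"),
        ("identifier", identifier),
        ("BoundingBox", ",".join(str(v) for v in bbox) + "," + crs),
        ("TimeSequence", time_sequence),
        ("format", fmt),
        ("store", "true" if store else "false"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical server and coverage names, for illustration only.
url = get_coverage_url("http://example.org/wcs", "pm25_surface",
                       (-180, -90, 180, 90), "2011-01-01T12:00:00Z")
```

Note that urlencode percent-encodes commas and colons in the values, which servers are expected to decode.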

Gridded data service through WCS
WCS exists in multiple versions: 1.0, 1.1.2, 2.0. The AQ Community Server (AComServ) is now implemented using WCS 1.1.2. This generally works well.

  • Issue: Extraction of vertical levels
  • Issue: current state of WCS 2.0?
    • core released, but extensions still in draft (how do we know what is valid?)
  • Issue: serve "virtual" WCS datasets with continuous time line assembled from many source files
    • create a "wrapper" module that can handle such cases?
    • Kari has already done something like this for HTAP datasets, this could be a starting point
  • Issue: desirable time filtering options in WCS: hour of day, day of week, day of month, etc.
    • Kari has already created such filters, but so far they are outside the standard
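
The time filtering options listed above (hour of day, day of week, day of month) can be sketched as a simple selection over a time axis. This is an illustration only and does not reproduce the existing filters; the function name filter_times and its parameters are hypothetical.

```python
from datetime import datetime


def filter_times(times, hours_of_day=None, days_of_week=None, days_of_month=None):
    """Keep only the datetimes that match every filter that is given.

    A filter left as None places no constraint. days_of_week follows the
    Python convention: Monday is 0 and Sunday is 6.
    """
    def keep(t):
        return ((hours_of_day is None or t.hour in hours_of_day)
                and (days_of_week is None or t.weekday() in days_of_week)
                and (days_of_month is None or t.day in days_of_month))

    return [t for t in times if keep(t)]


# Example time axis: four steps on a Monday and one step on the next day.
axis = [datetime(2011, 8, 22, h) for h in (0, 6, 12, 18)] + [datetime(2011, 8, 23, 12)]
noon_only = filter_times(axis, hours_of_day={12})
```

Since such filters are outside the WCS standard, a server would have to expose them through a non-standard request parameter or a separate processing service.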

Delivery of station-point data

  • Issue: use WCS, WFS, or a combination of both? If a combination, which one?

Access rights

  • Issue: technical options to restrict access to datasets?

Server co-development tools, methods
Server code is maintained through SourceForge; Darcs code repositories are available at WUSTL and in Juelich.

  • Issues: Platform independence (netcdf interface), Documentation

Relationship to non-AComServ (non-NetCDF) WCS servers

  • data format(s)
    • many WCS clients don't understand NetCDF
  • Issue: protocol compatibility
    • might need to implement more optional features of WCS
  • standard compliance
    • will need a test suite for 1.1.2 (and manage to run it)

Real Data-to-WCS Mapping Structure

  • Data hub that exposes the data ==> Provider ==> WCS Service
  • Observation platform or network ==> Dataset ==> WCS Coverage
  • Observation parameter/variable ==> Parameter ==> WCS Field
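
The three-level mapping above can be written down as a small data model. This is a sketch only: the class names mirror the WCS 1.1 terms used in the mapping, while the describe helper and all example values (hub, network, and parameter names) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Observation parameter/variable, exposed as a WCS Field."""
    name: str
    units: str = ""


@dataclass
class Coverage:
    """Observation platform or network (a dataset), exposed as a WCS Coverage."""
    identifier: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class Service:
    """Data hub (provider) that exposes the data, mapped to a WCS Service."""
    provider: str
    coverages: List[Coverage] = field(default_factory=list)

    def describe(self):
        """Flatten the hierarchy into 'provider/coverage/field' paths."""
        return [f"{self.provider}/{c.identifier}/{f.name}"
                for c in self.coverages for f in c.fields]


# Hypothetical hub with one surface network and two parameters.
hub = Service("ExampleHub", [
    Coverage("surface_network", [Field("pm25", "ug/m3"), Field("o3", "ppb")]),
])
paths = hub.describe()
```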

Tue 16.00-18.00: NoIT: ADN scope, providers, users

Vik/Fialkowski



    • VIEWS
    • GIOVANNI
    • RSIG
    • AeroCom

Wed 8.00-10.00: Breakout reports, general server items

Eckhardt/Gaussev


Report from the IT breakout session: Community server software
Report from the non-IT breakout session: ADN scope, providers, users
Report from the pre-workshop Data Catalog side meeting
Possible discussion topics/focus on cross-overs between IT and non-IT issues

  • standard definitions (clarity, ambiguity, completeness, ...)
  • standard development and documentation
  • open-source server software development
  • platform issues, portability
  • coding language(s), code interchangeability
  • coding style and software development approaches
  • Data Content
  • organisation of data
  • data formats, standard compliance
  • data access
  • performance
  • flexibility
  • user friendliness
  • meeting user demands (fitness for purpose)
  • governance, responsibilities, etc.
  • Open Source collaborative approach. Issues?
  • General software design: Multi-layer, Multi-protocol. Standard-Convention driven
  • Porting, Installation. Issues?
  • Maintenance, governance. Issues?
  • Criteria for single (trusted ?) 'primary' data source
  • Designations for secondary, derived, augmented data sources

Wed 10.30-12.30: AQ network: Servers, Catalog, Clients

Bigagli/Robinson


Preparing the way forward...

What few things must be the same, so that everything else can be different?

Metadata for finding and understanding (CF, ISO)
Data access/use constraints, quality control, data versioning, etc.
What is the design philosophy?
Service oriented (everything is a service), Component and network design for change; open source (everything?!)
Network-level data flow, usage statistics (GoogleAnalytics), performance
... goal is to obtain a good basis for discussion in the following breakout sessions, both from the IT and non-IT sides.

  • Server Software Design (uFIND). Issues?

Functionality of an Air Quality Data Network Catalog (ADNC)?
Content and structure (granularity) of ADNC?
Interoperability of ADNC

  • Interoperability with whom? what standards are needed? CF Naming extensions?
  • AQ Discovery Metadata Convention (for use in ISO, Data Catalogs...)
  • Extend CF Naming conventions for Point Data
  • Devise human-readable CF naming equivalents?

Access rights and access management

What are the generic (ISO, GEOSS, INSPIRE) and the AQ-specific discovery metadata?
Minimal metadata for data provenance, quality, access constraints?
Single AQ Catalog? Distributed? Service-oriented?

  • GI-cat
  • uFind

Wed 16.00-18.00: Relationship, cooperation, governance

Nativi/Domenico


  • User perspective, value chain .. user can not find..

  • EPA - HTAP Terry Keating??
  • EEA - H. Anderson??, Peder ??


Relationships, cross-thematic links (EGIDA, ESIP) How can we collaborate?

Group dinner

Thu 8.00-10.00: Networking impediments, opportunities

Kjeld/Ludewig


  • Clear statements about obstacles..
    • no organizational structure,
    • no dedicated funding or
    • no clear idea yet how to do it (or several independent ideas?).
  • Opportunities, fixes
    • Identification of manageable work packages
    • Reusable components, resources

JJ Bogardi, Global Water System Project (GWSP) at EGIDA, Bonn:
Nature of Networking Projects: Complex funding .., multiple obligations. Interdisciplinary and international. Differing project maturity. Mixture of paid and voluntary contributors. Governance and project cultures may differ.
Which glue keeps it together? Trust and personal affinity. Common objectives and scientific values. Mutual respect. Mutual benefit (win-win). Complementarity. Donor dictate
„Lethal“ ingredients. Turf mentality. Budget discrepancies. Too much competition. Lack of data and information exchange. Donor jealousy


What can we do to achieve a "win-win" situation?

Thu 10.30-12.30: ADN user relations/help, whom, what?

Galmarini/Dye


  • Target User communities
  • Use cases for different applications (from scientists to managers, media people and the general public)?
  • How can users find out about (each) system? Big future issues: data quality, traceability, metadata..

Relationship to non-AComServ WCS servers

  • Issue: protocol compatibility, standard compliance, data format(s)
  • Issue: is there a need? How can we benefit from "other" data? How can they benefit from AQ data?
  • Issue: which protocols? (OpenDAP?, GIS servers?)

Thu 16.00-18.00: Workshop outputs, outcomes, plans?

Schultz/Husar


What are the anticipated outputs? Agreement on community WCS server for grid and point data; server governance, distributed catalog; workshop summary
What are the anticipated outcomes? Better understanding of the network, higher level of trust and concrete steps toward turning the ADN from virtual to real
What are the short-term opportunities? Do we have common long-term goals and visions?

Friday: Boat trip