HTAP GEOSS

From Federation of Earth Science Information Partners
Revision as of 10:47, April 16, 2007 by ERobinson

HTAP Information System (HTAP IS)

The HTAP Information System (HTAP IS) has to facilitate communication, provide a shared workspace, and offer tools and methods for data and model analysis. It needs to support open communication among the collaborating analysts. The enabling technologies may include wikis and other groupware, social software, science blogs, Skype, etc., along with the traditional communication channels. The HTAP Task Force (TF) also needs a workspace where its tools and artifacts are housed. The most important component of the HTAP IS is the HTAP Integrated Data System (HTAP IDS), discussed below.

The HTAP IS will consist of a connectivity infrastructure (cyberinfrastructure) and a set of user-centric tools to empower the collaborating analysts. The HTAP IS will not compete with the existing tools of its members. Rather, it will embrace and leverage those resources through an open, collaborative federation philosophy. Its main contribution is to connect the participating TF analysts, their information resources and their tools.


Figure 1. Architecture of the HTAP Integrated Data System

A focal point of the HTAP IS is the integrated dataset to be used for model evaluation and pollutant characterization. The front end of the data system is designed to produce this high-quality integrated dataset from the available observational, emission and modeling resources. The back end of the data system is aimed at deriving knowledge from the data through a variety of comparisons: model-model, observation-observation, model-observation, etc.

The primary goal of the data system is to allow the flow and integration of observational, emission and model data. Model evaluation requires that the observations and, if possible, the emissions are fixed. For this reason it is desirable to prepare an integrated observational database against which the various model implementations can be compared. The integrated dataset should be as inclusive as possible, but this goal needs to be tempered by the many limitations that preclude broad, inclusive data integration. The proposed HTAP data system will therefore be a hybrid of distributed and fixed components.


HTAP Integrated Data System (HTAP IDS)

The HTAP IDS will (1) facilitate seamless access to distributed data, (2) allow easy connectivity of data processing components through standard interfaces, and (3) provide a set of basic tools for data processing, integration, and comparisons.

Traditionally, data processing, integration, and model comparisons have been performed using dedicated software tools that were handcrafted for specific applications. Earth observations and modeling of hemispheric transport are currently pursued by individual projects and programs in the US and Europe. These constitute autonomous systems with well-defined purpose and functionality. The key role of the Task Force is to assess the contributions of the individual systems and to integrate them into a system of systems.

Both the data providers and the HTAP analyst-users will be distributed. However, they will be connected through an integrated HTAP database, which should be stable and virtually fixed. The section below describes the main component of this distributed data system.

The multiple steps required to prepare the integrated dataset are shown on the left side of Figure 1. The sequence of operations can be viewed as a value chain that transforms the raw input data into a highly organized integrated dataset.

The operations that prepare the integrated dataset can be broken into distinct services that operate sequentially on the data stream. Each service is defined by its functionality and by a firmly specified service interface. In principle, the standards-based interface allows services to be linked into chains using formal workflow software. The main benefit of such a Service Oriented Architecture (SOA) is the ability to build agile application programs that can respond to changes in the data input conditions as well as the output requirements.
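The idea of chaining services through a common interface can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the Record schema, the two example services (a quality-control step and a unit conversion) and the chain helper are hypothetical and are not part of any existing HTAP software.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Record:
    """One observation in the data stream (hypothetical schema)."""
    station: str
    value: float
    unit: str

# The shared "service interface": a service maps a record stream to a record stream.
Service = Callable[[Iterable[Record]], Iterable[Record]]

def drop_invalid(records: Iterable[Record]) -> Iterable[Record]:
    """Quality-control service: discard negative (physically impossible) values."""
    return (r for r in records if r.value >= 0)

def to_micrograms(records: Iterable[Record]) -> Iterable[Record]:
    """Homogenization service: convert mg/m3 to ug/m3 so all records share a unit."""
    for r in records:
        if r.unit == "mg/m3":
            yield Record(r.station, r.value * 1000.0, "ug/m3")
        else:
            yield r

def chain(*services: Service) -> Service:
    """Compose services left to right into a single service."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for service in services:
            records = service(records)
        return records
    return run

raw = [
    Record("A", 0.012, "mg/m3"),
    Record("B", -1.0, "ug/m3"),
    Record("C", 8.5, "ug/m3"),
]
pipeline = chain(drop_invalid, to_micrograms)
integrated = list(pipeline(raw))
print(integrated)
```

Because every service has the same signature, a step can be added, removed or replaced without touching the others, which is the agility the SOA approach is meant to provide.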

The flexibility offered by chaining and orchestrating distributed, loosely coupled web services provides the architectural framework for building agile data systems in support of future, demanding HTAP applications.

The service-oriented software architecture is an implementation of the System of Systems approach, which is the design philosophy of GEOSS. Each service can be operated by an autonomous provider, and the "system" that implements the service sits behind the service interface. Combining the independent services constitutes a System of Systems. In other words, following the SOA approach, not only the data providers but also the processing services can be distributed and executed by different participants. This flexible approach to distributed computing allows the distribution of labor and the easy creation of different processing configurations.

The part of the integrated data system to the right of the integrated dataset (Figure 1) aids the analysts in performing high-level analysis such as data-data comparisons, ..., and in particular the comparison of models and observations. The service-oriented architecture of the HTAP IS is well suited for the rapid implementation of model intercomparison techniques. At this time, neither the specific model evaluation protocols nor the supporting information system is well defined. It is anticipated, however, that the observational and modeling members of the HTAP TF will develop such protocols soon.
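Since the evaluation protocols are not yet defined, the following Python sketch only illustrates what a basic model-observation comparison might look like. The matching key (station, date), the sample values and the choice of mean bias and root-mean-square error as statistics are all assumptions made for illustration, not an HTAP protocol.

```python
import math
from typing import Dict, Tuple

Key = Tuple[str, str]  # (station id, date): a hypothetical matching key

def compare(observed: Dict[Key, float], modeled: Dict[Key, float]) -> Dict[str, float]:
    """Compare model output with observations at the points both have in common."""
    common = sorted(set(observed) & set(modeled))
    if not common:
        raise ValueError("no overlapping points to compare")
    diffs = [modeled[k] - observed[k] for k in common]
    n = len(diffs)
    return {
        "n": n,
        "mean_bias": sum(diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
    }

# Made-up values; only the two S1 points appear in both datasets.
observed = {
    ("S1", "2001-07-01"): 10.0,
    ("S1", "2001-07-02"): 14.0,
    ("S2", "2001-07-01"): 6.0,
}
modeled = {
    ("S1", "2001-07-01"): 12.0,
    ("S1", "2001-07-02"): 13.0,
    ("S3", "2001-07-01"): 9.0,
}
stats = compare(observed, modeled)
print(stats)
```

Holding the observed dictionary fixed and swapping in the output of different models gives the kind of like-for-like comparison that a fixed integrated dataset is intended to support.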

HTAP Information Network

The architecture described above needs to be implemented as soon as possible so that the HTAP integrated dataset can be created and the model-data comparisons can commence. An important (though incomplete) initial set of nodes for the HTAP information network already exists, as shown in Figure 3. Each of these nodes is, in effect, a portal to an array of datasets that it exposes through its own interface. Thus, connecting these existing data portals would be an effective initial approach to incorporating a large fraction of the available observational and model data into the HTAP network. The US nodes DataFed, NEISGEI and Giovanni are already connected through standard (or pseudo-standard) data access services. In other words, data mediated through one of the nodes can be accessed and utilized in a companion node. Similar connectivity is being pursued with the European data portals Juelich, AeroCom, EMEP and others.
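How a network of portals can behave like one data source is sketched below in Python. The sketch is purely illustrative: the DataNode and Federation classes, the node names and the dataset names are hypothetical, and the in-memory lists stand in for whatever data access services the real nodes expose.

```python
from typing import Dict, List

class DataNode:
    """A data portal that exposes its datasets through one shared interface."""

    def __init__(self, name: str, datasets: Dict[str, List[float]]):
        self.name = name
        self._datasets = datasets

    def list_datasets(self) -> List[str]:
        return sorted(self._datasets)

    def get(self, dataset: str) -> List[float]:
        return self._datasets[dataset]

class Federation:
    """Routes a request to whichever connected node serves the dataset."""

    def __init__(self, nodes: List[DataNode]):
        self._nodes = nodes

    def catalog(self) -> Dict[str, str]:
        """Map every dataset name to the name of the node that serves it."""
        return {d: node.name for node in self._nodes for d in node.list_datasets()}

    def get(self, dataset: str) -> List[float]:
        for node in self._nodes:
            if dataset in node.list_datasets():
                return node.get(dataset)
        raise KeyError(dataset)

# Hypothetical nodes and datasets with made-up values.
node_a = DataNode("node_a", {"surface_ozone": [31.0, 28.5]})
node_b = DataNode("node_b", {"model_ozone": [29.0, 30.0], "emissions": [1.2]})
network = Federation([node_a, node_b])

print(network.catalog())
print(network.get("model_ozone"))
```

The analyst asks the federation for a dataset by name and does not need to know which node holds it; adding a new portal only requires that it implement the same two methods.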

Figure 3. Initial HTAP Information Network Configuration

(Here we could say a few words about each of the main provider nodes) Federated Data System DataFed; NASA Data System Giovanni; Emission Data System NEISGEI; Juelich Data System; AeroCom; EMEP.

HTAP Datasets - need list of others

See 20 selected datasets in the federated data system DataFed:

TOMS_AI_G - Satellite
SURF_MET - Surface
SEAW_US - Satellite
SCIAMACHYm - Satellite
RETRO ANTHRO - Emission
OnEarth JPL - Satellite
OMId - Satellite
NAAPS GLOBAL - Model
MOPITT Day - Satellite
MODISd G - Satellite
MODIS Global Fire - Satellite
MISRm G - Satellite
GOMEm G - Satellite
GOCART G OL - Model
EDGAR - Emission
CALIPSO - Satellite
AIRNOW - Surface
VIEWS OL - Surface
AERONETd - Surface
AEROCOM LOA - Model
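As a small illustration, the dataset list above can be held in a simple catalog and grouped by data type. The names and types are taken from the list; the dictionary layout and the by_type helper are assumptions made for this sketch and do not reflect how DataFed stores its catalog.

```python
from collections import Counter
from typing import List

# Dataset name -> data type, as given in the list above.
DATASETS = {
    "TOMS_AI_G": "Satellite",
    "SURF_MET": "Surface",
    "SEAW_US": "Satellite",
    "SCIAMACHYm": "Satellite",
    "RETRO ANTHRO": "Emission",
    "OnEarth JPL": "Satellite",
    "OMId": "Satellite",
    "NAAPS GLOBAL": "Model",
    "MOPITT Day": "Satellite",
    "MODISd G": "Satellite",
    "MODIS Global Fire": "Satellite",
    "MISRm G": "Satellite",
    "GOMEm G": "Satellite",
    "GOCART G OL": "Model",
    "EDGAR": "Emission",
    "CALIPSO": "Satellite",
    "AIRNOW": "Surface",
    "VIEWS OL": "Surface",
    "AERONETd": "Surface",
    "AEROCOM LOA": "Model",
}

def by_type(kind: str) -> List[str]:
    """Return the names of all datasets of one type, in alphabetical order."""
    return sorted(name for name, k in DATASETS.items() if k == kind)

counts = Counter(DATASETS.values())
print(counts)
print(by_type("Model"))
```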

HTAP Relationship to GEO and GEOSS

There is an outstanding opportunity to develop a mutually beneficial and supportive relationship between the activities of the HTAP Task Force and those of the Group on Earth Observations (GEO). The national and organizational members of GEO have adopted a general architectural framework for turning Earth observations into societal benefits. The three main components of this architecture are models and observations, which both feed into the third component, decision support systems for a variety of societal decision-making processes.

Figure 4. Architecture Framework for HTAP and GEOSS

This general GEO framework is well suited as an architectural guide to the HTAP program. However, it lacks the specific guidelines and details that are needed for application areas such as HTAP. The HTAP program provides an opportunity to apply and extend the GEO framework. In the case of HTAP, the major activities are shown in the architectural diagram of Figure 4. The modeling is conducted through global-scale chemical transport models. The observations arise from satellite, ground-based and airborne measurements of chemical constituents and their hemispheric transport. In the case of HTAP, a third input data stream is needed for emissions, which are neither observations nor models.

The HTAP decision support system consists primarily of humans. They need to be supported by an IT infrastructure and a set of enabling tools.

The first cluster is composed of the analysts who are the members of the HTAP Task Force. Their products are the 2007 and 2009 assessment reports submitted to the HTAP co-chairs and to EMEP. A second, shorter report is prepared and submitted to the EMEP executive body, which is the decision-making body of the LRTAP Convention. (Terry, Andre this description of the HTAP DSS needs your help).

Developing a higher-resolution design chart for the HTAP DSS is an important task because it can guide the design and implementation of the supporting information system. Furthermore, the detailed DSS architectural map may also serve as a communications channel for the interacting system of systems components. The insights gained in developing the HTAP DSS may also help DSS design in similar applications.

The implementation of the GEO framework utilizes the concept of the Global Earth Observation System of Systems (GEOSS). Traditionally, Earth observations were performed by well-defined systems, such as specific satellites and monitoring networks, which were designed and operated using systems engineering principles. However, GEO recognized that understanding the Earth system requires the utilization and integration of the individual, autonomous systems.

While system science is a well-developed engineering and scientific discipline, the understanding and development of Systems of Systems is in its infancy. The work of the HTAP TF may provide an empirical testbed for the study of this new and promising integration architecture. Since the HTAP TF activity encompasses virtually all aspects of GEOSS system of systems integration, it is an attractive "near-term opportunity" to demonstrate the GEOSS concept. An initial low-key demonstration could be accomplished as part of the HTAP TF 2009 assessment. Such a GEOSS demonstration is particularly timely since the data resources, data mediators and connectivity infrastructure are nearly ready to be connected into a system of systems. Also, there are strong societal drivers to update and extend the LRTAP Convention to cover the air pollution impacts of one continent on another.

An HTAP-GEOSS demonstration would also show the System of Systems approach working not as a stovepipe but as a dynamic network.

This sequence of activities constitutes an end-to-end approach that turns observations and models into actionable knowledge for societal decision making. One could say that this is an orthogonal approach to the more deliberate, step-by-step development of GEOSS.