NSF Air Quality Observatory: AQ Observatory Proposal

From Earth Science Information Partners (ESIP)

=Collaborators and Other Personnel=

== Misc Stuff ==
This project will advance understanding about distributed, service-oriented architectures in the presence of semantic impedance. Particular emphasis will be placed on designing for simplicity and extensibility through abstract data typing, service adaptors and polymorphism. Each of these builds upon the above paragraphs and leverages prior work as follows:

* Abstract Data Typing - A crucial factor in the successes of Unidata and DataFed has been the generalization of data access methods through the use of data models. Note, for example, Unidata's development of the netCDF data model in 1988, which deeply informed much follow-on work, such as OPeNDAP. Such models are equivalent to very high-level abstract data types, and they set the stage for syntactically and, potentially, semantically type-checked interfaces and interoperability.

* Service Adaptors - Though theoretically possible, it is entirely unrealistic to expect data providers or the builders of tools and services to adopt high-level data abstractions or retrofit their data-access interfaces to use them. Such expectations are especially unrealistic with the passage of time, as new users create new applications of older data and tools. The approach envisioned in this project is to make extensive use of Web services that function as data wrappers, transformers, aggregators and so forth, each of which exploits data abstraction as a framework in which to project one data type onto another, i.e., to perform semantic impedance matching along with other useful data-synthesis functions. As described above (see Interoperable Data-Processing Services), prototype service adaptors will be among the outcomes of this project. These will build on prior work, especially THREDDS and GALEON, where experience has been gained, for example, in
** creating virtual aggregations of data from distributed servers;
** cross-projecting data types and coordinate systems to yield semantic interoperability between the 4-dimensional atmospheric modeling community and the GIS community.

* Polymorphism - Programming languages vary in the degrees to which functions can handle inputs of varying types. Some are largely free of data types (user beware of sending some function the wrong type of data), some enforce very strict type-checking (user protected from type-mismatch errors), and still others check types but facilitate the creation of functions whose inputs can accept multiple types of data, a practice dubbed polymorphism. Investigators for this project envision semantic-level polymorphism realized via Web services that perform type checking but allow multiple types of input, perhaps by passing service requests off to other more appropriate Web services. The linchpin for such polymorphism in the AQO will be the Common Data Model, perhaps extended to embrace more data types, because it provides a framework in which to define and to check data types at a high semantic level.

Significant simplicity is gained through these three design principles. If the air-quality and meteorology communities jointly have interest in applying N distinct classes of tools and services to M distinct classes of data, then a straightforward approach to universal access and usability requires order NxM code-development efforts. In contrast--though the challenges of abstraction and polymorphism are great--the intended outcome of this project is a prototype that requires order N+M code-development efforts, roughly one each for creating service adaptors that project the semantics of a given data type, tool or service onto the Common Data Model.
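The N+M economy described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration (all class, field and function names are invented for the example, not part of DataFed or the Unidata CDM): N per-type adaptors project source records onto a toy common model, and M tools are each written once against that model, so any tool can consume any data type without NxM pairwise converters.

```python
# Minimal sketch of the N+M adaptor pattern: one adaptor per data type
# projects onto a (toy) common model; each tool reads only the common model.
# All names here are hypothetical illustrations.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CommonModel:
    """Stand-in for a Common-Data-Model record: variable, units, values."""
    variable: str
    units: str
    values: List[float]


# N adaptors: one per source data type, projecting onto the common model.
def adapt_aq_station(record: dict) -> CommonModel:
    return CommonModel(record["pollutant"], record["units"], record["obs"])

def adapt_met_grid(record: dict) -> CommonModel:
    return CommonModel(record["var"], record["unit"], record["grid_values"])

ADAPTORS: Dict[str, Callable[[dict], CommonModel]] = {
    "aq_station": adapt_aq_station,
    "met_grid": adapt_met_grid,
}

# M tools: each written once, against the common model only.
def tool_mean(cm: CommonModel) -> float:
    return sum(cm.values) / len(cm.values)

def tool_peak(cm: CommonModel) -> float:
    return max(cm.values)


def run(tool: Callable[[CommonModel], float], kind: str, record: dict) -> float:
    """Any tool works with any data type via that type's single adaptor."""
    return tool(ADAPTORS[kind](record))


if __name__ == "__main__":
    aq = {"pollutant": "PM2.5", "units": "ug/m3", "obs": [12.0, 30.0, 18.0]}
    met = {"var": "temperature", "unit": "K", "grid_values": [280.0, 290.0]}
    print(run(tool_mean, "aq_station", aq))   # 20.0
    print(run(tool_peak, "met_grid", met))    # 290.0
```

Adding a new data type here costs one adaptor; adding a new tool costs one function; every new combination comes for free, which is the N+M versus NxM argument in miniature.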

The above technical advances will be realized in an end-to-end prototype designed to enable intellectual advances in the air-quality domain. The use cases articulated below will require cross-community sharing of services and data, including aggregations of observed and simulated information from multiple sources. An unusual aspect of the AQO will be the joining of real-time, asynchronous data streams with more traditional pull-style Web services.

Revision as of 04:56, January 25, 2006





Project Summary

Research and management of air quality is addressed by diverse but linked communities that deal with emissions, atmospheric transport and removal processes, chemical transformations and effects. Among the most complex of the attendant cross-disciplinary links are those between atmospheric chemistry and meteorology.

The goal of this Air Quality Observatory (AQO) project is to enhance the linkage between these communities through effective cyberinfrastructure. In air quality, a recently developed infrastructure (DataFed) provides access to over 30 distributed data sets (emissions, concentrations, depositions), along with Web-based processing capabilities that serve both research and management needs. In meteorology, long-standing infrastructure developed by Unidata supports a large community of researchers, educators, and decision-makers needing observational data. These two cyberinfrastructure components are most useful at present within the scopes of their respective communities, and great opportunity exists for widening their combined effectiveness.

Intellectual Merit. The overarching contribution of this project is to advance cross-community interoperability among cyberinfrastructure components that are critical in multidisciplinary environmental observation. Interoperability topics will include: access methods for heterogeneous data sources; adapters and transformers to aid in coupling and chaining Web services from distinct communities; and designing service-oriented architectures for simplicity and extensibility in the presence of semantic impedance. Leveraging the current capabilities of Unidata and DataFed, new understanding will be gained of the benefits of abstract data typing, service adaptors and polymorphism in such architectures.

The framework for these advances will be an end-to-end prototype whose functional design will enable intellectual advances in the application domain (as well as in cyberinfrastructure). Several use cases will be explored that require cross-community sharing of tools and data, including aggregations of observed and simulated information from multiple sources. An unusual aspect of the AQO will be the extent to which it joins Unidata's push-style capabilities for handling real-time, asynchronous data streams with more traditional pull-style Web services.

Broader Impact. The AQO will support diverse learners within and beyond the DataFed and Unidata communities per se. Lessons learned in this project will inform builders of other cross-disciplinary cyberinfrastructure, especially those facing semantic impedances and the challenges of real-time streams. Finally, the observatory will support many users, such as federal and state AQ managers performing status and trend analyses, managing exceptional events, or evaluating monitoring networks.

The AQO will leverage, augment and integrate DataFed and Unidata in a prototype cyberinfrastructure component that better serves researchers, decision-makers and teachers of air quality, meteorology, and related fields by overcoming key difficulties. The research team from Washington University and Unidata has decades of experience in developing information technologies and applying them to air-quality analysis, meteorology, and environmental engineering.

NSF and Related Projects

Rudolf Husar's "Collaboration Through Virtual Workgroups" (NSF ATM-ITR small grant #0113868, $445,768, 9/01/01-8/31/04). The key technology finding of this project is that web services are now mature enough for the integration of distributed, heterogeneous, and autonomous datasets into homogeneous, federated datasets. The resulting toolset, DataFed (http://datafed.net), allows real-time, 'just-in-time' data analysis for the characterization and explanation of major air pollution events. In fact, in recent years, such agile analyses have provided real-time support to air quality managers.

Ben Domenico's "Thematic Real-time Environmental Distributed Data (THREDDS)" project (DUE-0121623, $900,001, 10/01/2001-09/30/2003) was a collaborative initiative to build a software infrastructure to provide students, educators, and researchers with Internet access to large collections of real-time and archived environmental datasets from distributed servers. Unidata is also nearing completion of "THREDDS Second Generation" (DUE-0333600, $554,993, 10/01/03 to 09/30/06), which focuses on integrating GIS information via Open GIS protocols. Additional THREDDS details can be found here: http://www.unidata.ucar.edu/projects/THREDDS/. Dr. Domenico has been involved in several other successful efforts: "Unidata 2008: Shaping the Future of Data Use in the Geosciences" (ATM-0317610, $22,800,000, 10/01/03 to 09/30/08); "Linked Environments for Atmospheric Discovery (LEAD)" (ATM-0331587, $11,250,000, 10/01/03 to 09/30/08); and "DLESE Data Services: Facilitating the Development and Effective Use of Earth System Science Data in Education" (EAR-0305045, $390,985, 09/30/03 to 08/31/06).

Ken Goldman's "Interactive Learning Environment for Introductory Computer Science" project (EIA-0305954, $514,996.00, 8/15/03-7/31/06) involves the development of JPie, an interactive programming environment designed to make object-oriented software development accessible to a wider audience. Programs are constructed by graphical manipulation of functional components so inexperienced programmers can achieve early success without the steep learning curve that precedes development in a traditional textual language. Recent JPie extensions support object-oriented access to relational databases and live client/server development, including dynamic server interface changes, with support for both SOAP and CORBA.

Intellectual and Technical Merit

The overarching technological contribution of this project is to advance cross-community interoperability among cyberinfrastructure components that are critical in contemporary environmental observation. The tangible outcomes will include a prototype observatory that provides genuine end-to-end services needed by two distinct communities of users and simultaneously advances the state of the art in designing observatories for multidisciplinary communities of users. Each of the communities participating in this study has operational systems that will be leveraged to create the prototype, but the marriage of their systems presents significant design challenges within which to study important interoperability questions.

Specifically, the joining of the air-quality and meteorology communities will require (1) effective global access to distinct but overlapping, heterogeneous data streams and data sets; (2) use of these data in distinct but overlapping sets of tools and services, to meet complex needs for analysis, synthesis, display and decision support; and (3) new combinations of these data and (chained) services such as can be achieved only in a distributed, service-oriented architecture that exhibits excellence of functional and technical design, including means to overcome the semantic differences that naturally arise when communities develop with distinct motivations and interests.

Interoperable Data Access Methods. Regarding effective global access to heterogeneous data streams and data sets, the contributions of this research will advance the state of interdisciplinary data use. Specific emphasis will be placed on mediating access to diverse types of data from remote and in-situ observing systems, combined with simulated data from sophisticated, operational forecast models. The observational sources will include satellite- and surface-based air quality and meteorological measurements, emission inventories and related data from the remarkably rich arrays of resources presently available via Unidata and DataFed.

The tangible output of this research component will be an extended Common Data Model (including the associated metadata structures), realized in the form of (interoperable) Web services that will meet the data-access needs of both communities and that can become generally accepted standards.

Interoperable Data-Processing Services. Interoperability among Web services at the physical and syntactic level is, of course, assured by the underlying Internet protocols, though semantic interoperability is not. In the case of SOAP-based services, WSDL descriptions permit syntax checking, but higher-level meanings of data exchange are inadequately described in the schema. Hence, SOAP-based services developed by different organizations for different purposes are rarely interoperable in a meaningful way. The research contribution on this topic will include--within key contexts for environmental data use--development of Web-service adapters that provide loosely coupled, snap-together interfaces for Web services created autonomously in distinct communities.

Distributed Applications. The Service Oriented Architecture (SOA) movement, among others, indicates ongoing intellectual interest in the (unmet) challenges of distributed computing. Our team’s experience with SOA in recent years has demonstrated that useful applications can be built via Web-service chaining, but our current prototypes--including DataFed--operate within a context where service interoperability is assured by internal, community-specific conventions. As the AQO evolves into a fully networked system, distributed applications built upon its infrastructure will need to be robust, evolvable and linkable to the system of systems that is the Web.

Broader Impacts

The Air Quality Observatory, through its technologies and applications, will have a broad impact on the evolving cyberinfrastructure, air quality management and atmospheric science.

Impact on Cyberinfrastructure

Infusion of Web Service Technologies. The agility and responsiveness of the evolving cyberinfrastructure is accomplished by loose coupling and user-driven dynamic rearrangement of its components. Service orientation and web services are the key architectural and technological features of the AQO. These new paradigms have been applied by the proposing team for several years, generating applications and best-practice procedures that use these new approaches. Through collaborative activities, multi-agency workgroups and formal publications, the web-service-based approach will be infused into the broader earth science cyberinfrastructure. [support letter?]

Technologies for Wrapping Legacy Datasets. The inclusion of legacy datasets in the cyberinfrastructure necessitates wrapping them with formal interfaces for programmatic access, i.e., turning data into services. In the course of developing these interfaces to a wide variety of air quality, meteorology and other datasets, the proposal team has developed an array of data-wrapping procedures and tools. Wide distribution of these wrappers will assure the rapid growth of the content shareable through the cyberinfrastructure and the science and societal benefits resulting from the "network effect".

Common Data Models for Multiple Disciplines. The major resistance to the horizontal diffusion of data through the cyberinfrastructure arises from the variety of physical and semantic data structures for earth science applications. Common data models are emerging that allow uniform queries and standardized, self-describing returned data types. Through the development, promotion and extensive application of these common, cross-disciplinary data models, the AQO will contribute to interoperability within the broader earth science community.

Impact on Air Quality Management

Federal and State Air Quality Status and Planning. DataFed has already been used extensively by federal and state agencies to prepare status and trend analysis and to support various planning processes. The new air quality observatory with the added meteorological data and tools will more effectively ???

Exceptional Air Quality Events. AQ management is increasingly responsive to the detection, analysis and management of short-term events. The combined DataFed-Unidata system and the extended cyberinfrastructure of the AQO will be able to support these activities with increased effectiveness, through the just-in-time delivery of actionable knowledge to decision makers in AQ management organizations as well as the general public. [support letter?]

Monitoring Network Assessment. A current revolution in remote and surface-based sensing of air pollutants is paralleled by a bold new National Ambient Air Monitoring Strategy (NAAMS ref). The effectiveness of the new strategy will depend heavily on cyberinfrastructure for data collection, distribution and analysis for a variety of applications. The cyberinfrastructure will also be needed to assess the overall effectiveness of the monitoring system, which now links agencies, disciplines, media and global communities. [support letter?]

Impact on Atmospheric Science and Education

Chemical Model Evaluation and Augmentation. Dynamic air quality models are driven by emissions data and/or scenarios and a module that includes air chemistry and meteorology to calculate the source-receptor relationships. The chemistry models themselves can be embedded in larger earth-system models, and they can serve as inputs into models for health, ecological and economic effects. The air quality observatory will provide homogenized data resources for model validation and also for assimilation into advanced models [fix]. A good example is the assimilation of satellite-based smoke emission estimates into near-term forecast models. [support letter?]

International Air Chemistry Collaboration. A significant venue for advancing global atmospheric chemistry is through international collaborative projects that bring together the global research community to address complex new issues such as intercontinental pollutant transport. The AQO will be able to support these scientific projects using real-time global scale data resources, the user-configurable processing chains and the user-defined virtual dashboards. [support letter?]

Near-Term Application of GEOSS. A deeper understanding of the earth system is now being pursued by a Global Earth Observation System of Systems (GEOSS, ref) which now includes the cooperation of over sixty nations. Air quality was identified as one of the near-term opportunities for demonstrating GEOSS through real examples. The AQO prototype can serve as a test bed for GEOSS demonstrations. [support letter?]

Long Term Sustainability

Project Description: Air Quality Observatory (AQO)

Introduction

Research and management of air quality is addressed by several diverse communities. Pollutant emissions are determined by environmental engineers, atmospheric transport and removal processes are mainly in the domain of meteorologists, pollutant transformations are in the purview of atmospheric chemists and air-quality analysts, and the impacts of air pollution are assessed by health scientists, ecologists, and economists. Among the most dynamic and structurally complex cross-disciplinary links is that between atmospheric chemistry and meteorology.

In recent years, command-and-control-style air quality management has been giving way to a more participatory approach that includes the key stakeholders and encourages the application of more science-based 'weight of evidence' approaches to controls. The air quality regulations now emphasize short-term monitoring, and air quality goals are set to glide toward 'natural background' levels. EPA has also adopted a new National Ambient Air Monitoring Strategy {NAAMS} to provide more relevant and timely data for these complex management needs. Real-time surface-based monitoring networks now routinely provide patterns of fine particles and ozone throughout the US. Satellite sensors with global coverage depict the pattern of haze, smoke and dust in stunning detail. The emergence of a new cooperative spirit to make effective use of these developments is exemplified in the Global Earth Observation System of Systems (with over sixty member nations), where air quality is identified as one of the near-term opportunities for collaborative data integration.

The increased data supply and the demand for higher grade AQ information products is a grand challenge for both environmental and information science communities. The current dedicated ‘stove-pipe’ information systems are unable to close the huge information supply-demand gap. Fortunately, information technologies now offer outstanding opportunities. Gigabytes of data can now be stored, processed and delivered in near-real time. Standardized computer-computer communication protocols and Service-Oriented Architectures (SOA) now facilitate the flexible processing of raw data into high-grade ‘actionable’ knowledge. The instantaneous ‘horizontal’ diffusion of information via the Internet now permits, in principle, the delivery of the right information to the right people at the right place and time.

The vision of this research is to improve air quality through a more supportive information infrastructure. The specific project objectives are to (1) improve the interoperability infrastructure that links the air quality and the meteorological communities, (2) develop a prototype air quality observatory (AQO) and (3) demonstrate the utility of the AQO through three cross-cutting use cases.

Interoperability Infrastructure

Overcoming Semantic Impedance

This project envisions innovative uses of Web services to enable new levels of cross-discipline interoperability among the tools, data sets and data streams employed by members of the air-quality and meteorology communities. Although needs for interoperability could possibly be addressed on a case-by-case basis as need arises, this is unrealistic, as the effort would be roughly proportional to NxM, where N is the number of data types and M is the number of tools, processes and services that pertain (aggregating these from all of the communities involved).

The approach embodied by this proposal reduces the effort through design simplicity akin to that of middleware, except that the middle layer will be constructed of Web services capable of data processing tasks (repackaging, transforming, aggregating, etc.) that support mapping or projecting one data type onto another. Rather than creating NxM mapping services, project investigators envision a many-to-one or a many-to-few set of mappings that exploit an extended version of the Common Data Model under development in Unidata (see the netCDF/CDM Web page). This model will serve as a high-level, abstract data type onto which can be projected most of the types of data presently used in the air-quality, air-chemistry and meteorology communities.

Fewer than N+M such projections will be required to capture the essential semantics of data use in both communities, thus setting the stage for unprecedented levels of interoperability (i.e., semantic as well as syntactic impedance matching) among the air-quality and meteorology communities.

Figure ???

The polymorphism of the data access system is shown in Figure ??? The flexible connections of DataFed are expedited by dataset-specific wrappers that homogenize the data into virtual cubes. Adapters facilitate queries to the abstract data cubes through multiple data access protocols. Additional flexibility is provided by a choice of data formats compatible with each of the protocols. The Common Data Model developed at Unidata serves a similar purpose.
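The wrapper/adapter layering just described can be made concrete with a toy sketch (the class, adapter and dataset names are hypothetical; real DataFed wrappers and the Unidata CDM are far richer). A dataset-specific wrapper produces one homogenized "virtual cube", and two different protocol-style adapters answer queries from that same cube:

```python
# Toy illustration of wrapper/adapter layering: a wrapper homogenizes data
# into a virtual cube; protocol adapters expose the same cube through
# different query styles. All names are hypothetical.

from typing import Dict, List, Tuple

class VirtualCube:
    """Wrapper product: observations keyed by (lon, lat, time)."""
    def __init__(self, cells: Dict[Tuple[float, float, int], float]):
        self.cells = cells

    def select(self, bbox: Tuple[float, float, float, float],
               trange: Tuple[int, int]) -> List[float]:
        x0, y0, x1, y1 = bbox
        t0, t1 = trange
        return [v for (x, y, t), v in sorted(self.cells.items())
                if x0 <= x <= x1 and y0 <= y <= y1 and t0 <= t <= t1]

# Two protocol adapters over the same abstract cube:
def wcs_style_query(cube: VirtualCube, bbox, trange) -> List[float]:
    """A WCS-like coverage request: bounding box plus time range."""
    return cube.select(bbox, trange)

def point_style_query(cube: VirtualCube, lon: float, lat: float) -> List[float]:
    """A point-time-series request, expressed via the same cube interface."""
    return cube.select((lon, lat, lon, lat), (0, 10**9))

if __name__ == "__main__":
    cube = VirtualCube({(-90.0, 38.6, 0): 12.0, (-90.0, 38.6, 1): 15.0,
                        (-87.6, 41.9, 0): 30.0})
    print(wcs_style_query(cube, (-95, 35, -85, 45), (0, 0)))  # [12.0, 30.0]
    print(point_style_query(cube, -90.0, 38.6))               # [12.0, 15.0]
```

The point of the sketch is that both access protocols query one abstraction, so adding a protocol does not require touching the underlying dataset wrappers.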

Carrying this a step further, the various Web services will themselves be interoperable in a fashion that allows the composition or chaining of functions. Hence, with a minimum of semantic complexity (and software development) the capabilities of the AQO can grow in a combinatorial fashion, evolving and adapting to the changing needs of a user base that increases both in size and diversity. Subsequent sections indicate how the achievement of this vision can be leveraged on extant cyberinfrastructure components and standards in the air-quality and meteorology communities.

Standards Based Interoperability

Data and service interoperability among Air Quality Observatory (AQO) participants will be fostered through the implementation of accepted standards and protocols. Adherence to standards will foster interoperability not only within the AQO but also among other observatories, cyberinfrastructure projects, and the emerging GEOSS efforts. Standards for finding, accessing, portraying and processing geospatial data are defined by the Open Geospatial Consortium (OGC) (Buehler and McKee, 1998). The AQO will implement many of the OGC specifications for discovering and interacting with its data and tools.

The most established OGC specification is the Web Map Server (WMS) for exchanging map images, but the Web Feature Service (WFS) and Web Coverage Service (WCS) are gaining wider implementation. WFS provides queries to discrete feature data, output in Geography Markup Language (GML) format. WCS allows access to multi-dimensional data that represent coverages, such as grids. While these standards are based on the geospatial domain, they are being extended to support non-geographic data "dimensions." For example, WCS is designed to ultimately support coverage formats other than grids, such as point data of spatially continuous phenomena that vary over time (e.g., a dataset from a temperature monitoring network).

The success of OGC specifications has led to efforts to develop interfaces between them and other common data access protocols (e.g., OPeNDAP, THREDDS). For example, the GALEON Interoperability Experiment, led by Ben Domenico at Unidata, is developing a WCS interface to netCDF datasets to "map" multi-dimensional atmospheric model output into the three-dimensional geospatial framework.

AQO development will actively explore the use of, and extensions to, the WCS specification for accommodating air quality and atmospheric data. Advances from GALEON will be incorporated, and the WCS will be extended to provide a powerful interface for building multi-dimensional queries to monitoring, model, and satellite data. WCS supports x, y, z and time dimensions and allows server-defined "dimensions" for other parameters.

Interaction with a WCS service takes the form of a client-server 'conversation', initiated by the client requesting the server's list of capabilities (e.g., datasets). Next, the client requests a more detailed description of a desired dataset, including the available return data formats. In the third call, the client sends a specific data request to the server, formulated in terms of a physical bounding box (X,Y,H) and a time range. Following such a universal data request, the server can deliver the data in the desired format, so that further processing in the client environment can proceed in a seamless manner. Currently the first two steps of the conversation are performed by humans, but it is hoped that new technologies will aid the execution of 'find' and 'bind' operations for distributed data and services.
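The three-step conversation can be sketched as the key-value-pair HTTP GET requests defined by WCS 1.0 (GetCapabilities, DescribeCoverage, GetCoverage). The endpoint URL and the coverage name below are hypothetical placeholders; the request parameter names follow the WCS 1.0 convention.

```python
# Sketch of the three-step WCS 'conversation' as HTTP GET request URLs.
# The server endpoint and coverage name are hypothetical.

from urllib.parse import urlencode

ENDPOINT = "http://example.org/wcs"  # hypothetical WCS server

def wcs_url(request: str, **params) -> str:
    """Build a WCS 1.0 key-value-pair request URL."""
    base = {"SERVICE": "WCS", "VERSION": "1.0.0", "REQUEST": request}
    base.update(params)
    return ENDPOINT + "?" + urlencode(base)

# Step 1: ask the server for its capabilities (available datasets).
caps = wcs_url("GetCapabilities")

# Step 2: ask for the detailed description of one coverage,
# including the data formats it can return.
desc = wcs_url("DescribeCoverage", COVERAGE="pm25_surface")

# Step 3: request the data itself, by bounding box, time range and format.
data = wcs_url("GetCoverage", COVERAGE="pm25_surface",
               BBOX="-125,24,-66,50", TIME="2006-01-24/2006-01-25",
               FORMAT="NetCDF")

if __name__ == "__main__":
    for url in (caps, desc, data):
        print(url)
```

In practice steps 1 and 2 are where the human currently sits; automating them is the 'find' and 'bind' problem noted above.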

Other evolving and emerging specifications will be explored. OGC Catalog Services (CSW) support publishing and searching collections of metadata and services. The OGC Sensor Web Enablement (SWE) activity includes a number of emerging specifications, including the Sensor Observation Service (SOS) for retrieving observation datasets such as those from monitoring networks. The proposed Web Processing Service (WPS) offers geospatial operations, including traditional GIS processing and spatial analysis algorithms, to clients across networks.

Interoperability testing and prototyping will be conducted through service compliance and standards gap analysis. Use of OGC specifications and interaction with OGC during the development of the AQO prototype will be facilitated by Northrop Grumman. Northrop Grumman chairs the OGC Compliance and Interoperability Subcommittee and is nearing completion of an open source compliance engine soon to be adopted by the OGC. Compliance testing of the AQO prototype will ensure more complete interoperability and establish an AQO infrastructure that can be networked with other infrastructure developments.

New Activities Extending the Infrastructure

The proposed Observatory will consist of a large collection of independent data services, as well as applications that operate on those data services in order to collect, analyze and disseminate information. Applications will be created by multiple organizations and will require interaction with the data and applications created by other organizations. Furthermore, individual applications will need to be modified over time without disruption of the other applications that depend upon them.

To support this high degree of interoperability and dynamic change, we plan to leverage our ongoing research efforts on the creation of a shared infrastructure for the execution of distributed applications {TG2005}. Important goals for this work include support for installation, execution, and evolution (live upgrades) of long-running distributed applications. For those applications that require high levels of robustness, the infrastructure will provide strong guarantees that installed applications will continue to execute correctly in spite of failures and attacks, and that they will not interfere with one another. To support widespread sharing of resources and information, the computing infrastructure is being designed in a decentralized way, with computing resources provided by a multitude of independently administered hosts with independent security policies.

The execution model for this infrastructure captures a wide class of applications and supports integration of legacy systems, including applications written using SOAP. The execution model consists of an interconnected graph of data repositories and the work flow transactions that access them. By separating repositories from computation, the model simplifies the creation of applications that span multiple organizations. For example, one organization might install an application that reads from the data repository of a second organization and writes into a repository used by a third organization. Each application will specify its own security policies and fault-tolerance requirements.

Building on prior experience in constructing distributed systems infrastructure {GSMAS1995, SGM2005, PEMDG2000}, work is underway to design and implement algorithms, protocols, and middleware for a practical shared computing infrastructure that is incrementally deployable. The architecture will feature dedicated data servers and transaction servers that communicate over the Internet, that run on heterogeneous hosts, and that are maintained and administered by independent service providers. For applications that require fault-tolerance, the servers will participate in replica groups that use an efficient Byzantine agreement protocol to survive arbitrary failures, provided that the number of faulty replicas in a group is less than one third of the total number of replicas. Consequently, the infrastructure will provide guarantees that once information enters the system, it will continue to be processed to completion, even though processing spans multiple applications and administrative domains.
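The "fewer than one third" bound above is the standard Byzantine agreement threshold: a group of n replicas can tolerate f arbitrarily faulty members only when n >= 3f + 1. A one-line helper makes the resulting group-sizing arithmetic explicit (this is illustrative arithmetic, not code from the proposed middleware):

```python
# Byzantine fault tolerance sizing: a replica group of n members survives
# f arbitrary (Byzantine) faults only while f < n/3, i.e. n >= 3f + 1.

def max_faults_tolerated(n: int) -> int:
    """Largest f satisfying n >= 3f + 1."""
    return (n - 1) // 3

if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, "replicas tolerate", max_faults_tolerated(n), "faults")
```

So the smallest useful replica group has four members (tolerating one fault), and tolerance grows by one for every three additional replicas.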

The Observatory will be able to benefit from this infrastructure in several ways, most notably ease of application deployment and interoperability. In addition, the infrastructure will provide opportunities for reliable automated data monitoring. For example, we anticipate that ongoing computations, such as those that perform “gridding” operations on generated data points, will be installed into the system and happen automatically. Moreover, some of the data analysis that is currently performed on demand could be installed into the system for ongoing periodic execution. This will result in the availability of shared data repositories not only for raw data, but also for information that is the result of computational synthesis of data obtained from multiple data sources. Researchers will be able to install applications into the infrastructure that make further use of these derived data sources, as well as the raw data sources, as input. The fact that installation in this infrastructure is managed as a computation graph provides additional structure for certifying the source and derivation of information. Knowing the sources and destinations of each information flow in the computation graph could enable, for example, the construction of a computation trace for a given result. This could be useful for verifying the legitimacy of the information, since the trace would reveal the source of the raw data and how the result was computed.
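The provenance idea can be sketched as a backward walk over the computation graph (the graph below is a hypothetical toy; the real infrastructure would maintain this structure as part of application installation):

```python
# Sketch: because installation is managed as a computation graph, a
# result can be traced back through its derivation to the raw sources.
def trace(graph, node, seen=None):
    """Return the set of all upstream nodes that feed `node`."""
    seen = set() if seen is None else seen
    for parent in graph.get(node, []):
        if parent not in seen:
            seen.add(parent)
            trace(graph, parent, seen)
    return seen

# edges: node -> list of inputs it was derived from (invented names)
graph = {"sulfate_map": ["gridded"],
         "gridded": ["station_obs", "met_fields"]}
print(sorted(trace(graph, "sulfate_map")))
# ['gridded', 'met_fields', 'station_obs']
```

Such a trace answers both provenance questions at once: which raw data went in, and which intermediate computations produced the result.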

Prototype Air Quality Observatory

The functional design of the system will incorporate responding to AQ events in real time, delivering needed information to decision makers (AQ managers, the public), and overcoming syntactic and semantic impedances. The cyberinfrastructure design of the AQO will accommodate a variety of data sources, respond to different types of AQ events, offer simplicity via a Common Data Model, foster interoperability through standard protocols, provide user-programmable components for data filtering, aggregation and fusion, employ a service-oriented architecture for linking loosely coupled web services, and facilitate the creation of user-configurable monitoring consoles. The software application framework and the prototype will also serve as a test bed for testing advanced computer science ideas on ...

Figure ???

The prototype AQO will be an extension of the DataFed air quality information system, depicted in Figure ???. On the left are the providers of heterogeneous distributed air quality-related datasets, and on the far right are the users requiring reports and other high-level information products. The first processing stage is data homogenization into a common data model such as a multi-dimensional “cube,” which is accomplished by data wrappers. Next, the cubed data are sliced, diced and otherwise filtered or aggregated by simple tools for exploring a given air pollution situation. More elaborate processing occurs in applications that chain a set of web services as part of user-programmable web applications. The final products are the reports and summaries for consideration by decision makers. The underlying infrastructure and some of the tools for this type of analysis have been operational in DataFed since 2004. This includes a catalog for publishing, finding and describing datasets. There are data access wrapper classes for station-monitoring data, sequential images, grids, and trajectories. Web services exist for gridding, aggregation, filtering, spatial and temporal rendering, view overlays and annotations. User-defined data views are created by chaining web services for each data layer and then applying the overlay services for the creation of multi-layered views (e.g., maps, time charts).
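A minimal sketch of this wrap, filter, render, overlay chaining follows; the function names are invented stand-ins for the actual DataFed wrapper and web services, and the "cube" is reduced to a flat value list:

```python
# Sketch of DataFed-style view assembly (hypothetical service names):
# each layer is a chain of services applied to "cubed" data, and the
# rendered layers are then overlaid into a multi-layered view.
def wrap(raw):                # data wrapper: homogenize into a "cube"
    return {"values": raw}

def filter_cube(cube, lo):    # filter service: keep values >= a cutoff
    return {"values": [v for v in cube["values"] if v >= lo]}

def render(cube, layer):      # rendering service: produce a named layer
    return f"{layer}:{len(cube['values'])} points"

def overlay(layers):          # overlay service: combine rendered layers
    return " | ".join(layers)

pm_layer = render(filter_cube(wrap([3, 9, 12, 1]), lo=5), "PM2.5")
view = overlay([pm_layer, render(wrap([2, 4]), "wind")])
print(view)  # PM2.5:2 points | wind:2 points
```

Each stage consumes and produces the same cube shape, which is what lets the services be chained in user-defined order.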

However, the cross-linkages to meteorology and other fields have been minimal.

The Unidata system will be the mediator for passing meteorological data into the observatory. The enabling Unidata technologies include real-time data distribution, advanced push technologies, cross-disciplinary desktop visualization tools, mechanisms for tracking events, standards-based remote access, and others. The prototype Observatory will be greater than the sum of its parts, since it will enable access to the data and functionality of both systems and will foster the fusion of multi-disciplinary data and the synergism of the combined functionality.

Figure ???

AQO Architecture. The architecture of the Air Quality Observatory is that of a network, as illustrated in Figure ???. The network will consist of nodes that can act as both servers and clients of air quality-relevant data. Since the nodes belong to different organizations and serve a variety of communities, they have different data needs. There is also considerable heterogeneity in how the nodes internally conduct their business of finding, accessing, transforming, and delivering data. In Figure ??? the Unidata and DataFed network nodes are shown in more detail. [Unidata node description…]. The DataFed system accesses air quality-relevant data in the form of tables, images, grids, etc. from a variety of sources.


Other server nodes in the AQO network will include the NASA Goddard DAAC server, which provides access to the immense volume of satellite data contained in its “data pool.” At that server, a WCS interface is being implemented that will facilitate its joining the AQO network. The managers of the Goddard DAAC node have also expressed strong interest in accessing air quality data from DataFed and meteorological data from Unidata. Similarly, there are good prospects of adding an EPA node to the AQO that will be able to serve air quality model forecasts as well as provide access to an array of monitoring data. The participation of these additional AQO nodes is to be arranged and performed independently. However, this NSF AQO prototype project will provide the architectural framework for such networking and connectivity tools such as adapters, and it can also serve as the testbed for the expanding set of AQO nodes. Attracting new nodes and guiding their inclusion will be pursued by the project team members through multiple venues: membership in workgroups, ESIP, training workshops....

The interoperability of these heterogeneous systems can be achieved through the adoption of a uniform query protocol such as that provided by the OGC Web Coverage Service. To achieve this level of interoperability, a significant role is assigned to the adapter services that can translate data formats and make other adjustments to the data syntax. These services can be provided by the server, by the client, or by third-party mediators, such as DataFed [and Unidata?]. This type of “loose coupling” between clients and servers will allow the creation of dynamic, user-defined processing chains.
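A toy sketch of such an adapter service follows; the CSV dialect, function names, and payload are invented for illustration, and a real adapter would implement the actual WCS GetCoverage interface rather than this simplified call:

```python
# Sketch of an adapter service: it projects a provider's native payload
# onto a common, WCS-like query interface, so clients stay loosely
# coupled from provider-specific formats.
def native_csv_service(bbox):
    # a provider that only speaks its own CSV dialect
    # (bbox is ignored in this toy; a real service would subset by it)
    return "lat,lon,value\n38.6,-90.2,12.5"

def csv_to_coverage(csv_text):
    """Translate the provider's CSV into a list of record dicts."""
    header, *rows = csv_text.splitlines()
    keys = header.split(",")
    return [dict(zip(keys, row.split(","))) for row in rows]

def wcs_adapter(bbox):
    """Uniform 'GetCoverage'-style call backed by the native service."""
    return csv_to_coverage(native_csv_service(bbox))

print(wcs_adapter(bbox=(-91, 38, -89, 39)))
```

Because the adapter owns the translation, it can live at the server, at the client, or at a third-party mediator, exactly the placement flexibility described above.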

The AQO will be connected through a mediated peer-to-peer network. The mediation will be performed by centralized or virtually centralized catalog services, which enable the Publish and Find operations needed for loosely coupled, web-service-based networking. The Bind (i.e., data access) operation will be executed directly through a protocol-driven peer-to-peer approach. The Unidata THREDDS system performs the meteorological data brokering, while the DataFed Catalog serves the same purpose for the AQ data. Other candidate AQO nodes are currently brokered through their own catalog services, e.g., ECHO for the NASA Goddard DAAC. The unification (physical or virtual) of these distributed catalog services will be performed using the best available emerging web service brokering technologies.
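The mediated Publish-Find-Bind pattern can be sketched as follows (the catalog entries, keywords, and endpoints are fictitious):

```python
# Sketch of mediated peer-to-peer networking: a central catalog handles
# Publish and Find; Bind (data access) then goes directly to the
# provider's own endpoint, peer to peer.
catalog = {}

def publish(name, endpoint, keywords):
    """Publish: register a dataset's endpoint and search keywords."""
    catalog[name] = {"endpoint": endpoint, "keywords": set(keywords)}

def find(keyword):
    """Find: discover datasets by keyword via the mediator."""
    return [name for name, e in catalog.items() if keyword in e["keywords"]]

publish("VIEWS_PM25", "http://example.org/wcs/views", ["aerosol", "pm25"])
publish("NEXRAD", "http://example.org/thredds/nexrad", ["radar", "met"])

hits = find("pm25")
endpoint = catalog[hits[0]]["endpoint"]  # Bind: contact provider directly
print(hits, endpoint)
```

Only the thin Publish/Find layer needs to be centralized (or virtually centralized); the heavyweight data traffic never passes through the mediator.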

Combined Air Quality - Meteorology Tools

"Today the boundaries between all disciplines overlap and converge at an accelerating pace. Progress in one area seeds advances in another. New tools can serve many disciplines, and even accelerate interdisciplinary work." ( Rita R. Colwell, Director, NSF, February 2003). Many in the AQ and the meteorological communities have had a longstanding desire to create new knowledge together, but with marginal success. It is hoped that a the AQ Observatory with its shared data and tools will increase their combined creativity and productivity.

The development of the cyberinfrastructure that brings together real-time air quality data through DataFed and meteorological data through the Unidata system offers the possibility of creating powerful new synergistic tools, such as the Combined Aerosol Trajectory Tool (CATT) [CATT ref]. In the CATT tool, air quality and pollutant transport (trajectory) data are combined in an exploration program that highlights the source regions from which high or low pollutant concentrations originate. Advanced data fusion algorithms applied in CATT have already contributed to the characterization of unexpected aerosol source regions, such as the nitrate source over the Upper Midwest {Husar and Poirot 2005}.
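The core CATT fusion step can be sketched as below; the trajectory records and concentration cutoff are hypothetical, and the real tool works with full back-trajectory geometries rather than scalar tags:

```python
# Sketch of CATT-style fusion: tag each back-trajectory with the
# pollutant concentration measured at its receptor, then flag the
# "dirty" trajectories that exceed a cutoff for highlighting.
trajectories = [
    {"id": "t1", "sulfate": 9.8},   # receptor concentration, ug/m3
    {"id": "t2", "sulfate": 1.2},
    {"id": "t3", "sulfate": 7.5},
]

def flag_dirty(trajs, cutoff):
    """Mark trajectories whose receptor concentration exceeds `cutoff`."""
    return [dict(t, dirty=t["sulfate"] > cutoff) for t in trajs]

for t in flag_dirty(trajectories, cutoff=5.0):
    print(t["id"], "dirty" if t["dirty"] else "clean")
```

Mapping the flagged ("dirty") trajectories together reveals the candidate source regions, which is the highlighting shown in Figure 3.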

Figure 3 illustrates the current capabilities of the tool by highlighting, in red, the airmass trajectories that carry the highest concentration of sulfate over the Eastern US on a particular day. Currently, the real-time application of such a diagnostic tool is not possible, since the necessary IT infrastructure for bringing together and fusing the AQ and transport data does not exist.


Figure 3. CATT diagnostic tool for dirty and clean air source regions.



With the IT infrastructure of the Air Quality Observatory, which seamlessly links real-time AQ monitoring data to current and forecast meteorology, the CATT tool could be a significant addition to the toolbox of air quality analysts and meteorologists.

Other synergistic tools include Analyst Consoles, which consist of an array of maps and charts similar to the 'meteorological wall' on which forecasters post relevant current and forecast information. The view-based data processing and rendering system of DataFed is well suited to create such data views from the distributed data streams. Early prototyping has shown that Virtual Analyst Consoles are indeed feasible. However, considerable development is required to make the preparation of user-defined data views easy and fast. Facilities are also needed to lay out the views in a console according to the user's needs.

Exciting new synergism possibilities are offered through the advanced high-resolution forecast modeling of the LEAD project of Unidata. [Hey Ben, did I hear that with LEAD a user could set up and run a [nested?] local forecast model over a user-defined spatial window? If so, it would be a terrific new way to derive smoke emissions from forest fires! Let's compare notes on that.]

We plan to investigate multiple paradigms for the construction of interoperable distributed applications. One promising approach builds upon a separation of data and process. In particular, data repositories are seen entirely as passive entities that are acted upon by processes or transactions that are separately installed into the infrastructure. In this way, a variety of organizations can contribute to the Observatory by installing a mixture of data and processing into the infrastructure. Some repositories may contain raw data, while others may be derived from ongoing computation. Data repositories and application processes may be installed to physically execute on the same host, but logically separating them provides a design advantage over traditional workflow models. In particular, when computation is not tied to a particular data server, it becomes easier to construct applications that span multiple organizations. Processes may be installed into the infrastructure as independent entities for ongoing execution, without integrating the code for those processes into the data server implementations.

Use Cases for Prototype Demonstration

The proposed Air Quality Observatory prototype will demonstrate the benefits of the IIT through three use cases that are integrative [cross-cutting], make a true contribution to AQ science and management, and place significant demands on the IIT. Use case selection is driven by user needs: [letters EPA, LADCO, IGAC]. Not by coincidence, these topics are areas of active research in atmospheric chemistry and transport at CAPITA and other groups. The cases will be end-to-end, connecting real data producers and mediators as well as decision makers. The prototype will demonstrate seamless data discovery and access, flexible analysis tools, and delivery.

[use future IT scenarios to illustrate the contribution of the advanced AQP IT]

1) Intercontinental Pollutant Transport. Sahara dust over the Southeast; Asian dust and pollution. [20+ JGR papers facilitated on the Asian Dust Events of April 1998 - now more can be done, faster and better, with the AQO] [letter from Terry Keating?]

2) Exceptional Events. The second AQO use case will be a demonstration of a real-time data access/processing/delivery/response system for Exceptional Events (EE). Exceptional AQ events include smoke from natural and some anthropogenic fires, windblown dust events, volcanoes, and long-range pollution transport events from sources such as other continents. A key feature of exceptional events is that they tend to be episodic, with very high short-term concentrations. The AQO prototype system will provide real-time characterization and near-term forecasting that can be used to trigger preventive action, such as warnings to the public. Exceptional events are also important for long-term AQ management, since EE samples can be flagged for exclusion from the National Ambient Air Quality Standards calculations. The IIT is supported by both state agencies and the federal government... [need a para on the IIT support to global science, e.g., IGAC projects] During extreme air quality events, the stakeholders need more extensive 'just in time' analysis, not just qualitative air quality information.

3) Midwestern Nitrate Anomaly. Over the last two years, a mysterious pollutant source has caused pollutant levels to rise in excess of the AQ standard over much of the Upper Midwest in winter/spring. Nitrogen sources are suspected, since a sharp rise in nitrate aerosol is a key feature of the polluted air. The phenomenon has eluded detection and quantification since the area was not monitored, but recent intense sampling campaigns have implicated NOx and ammonia release from agricultural fields during snow melt. This AQO use case will integrate and facilitate access to data on soil quality, agricultural fertilizer concentration and flow, snow chemistry, surface meteorology, and air chemistry.

Participant Qualifications

DataFed is a community-supported effort led by CAPITA at Washington University. While the data integration web services infrastructure was initially supported by specific information technology grants from NSF and NASA, the data resources are contributed by autonomous providers. The application of the federated data and tools is in the hands of users as part of specific projects. Just as data quality improves by passing through many hands, the analysis tools will also improve with use and feedback from data analysts. A partial list of projects is at http://datafed.net/projects. Rudolf Husar is Professor of Mechanical Engineering, director of the Center for Air Pollution Impact and Trend Analysis (CAPITA), and will lead the DataFed integration into the AQO. Dr. Husar brings 30+ years of experience in AQ analysis and environmental informatics to the AQO project.

Unidata is a diverse community of education and research institutions vested in the common goal of sharing data, tools to access the data, and software to use and visualize the data. Successful cooperative endeavors have been launched through Unidata and its member institutions to enrich the geosciences community. Unidata's governing committees facilitate consensus building for future directions for the program and establish standards of involvement for the community. Ben Domenico is Deputy Director of Unidata. Since its inception in 1983, Domenico has been an engine that turned Unidata into one of the earliest examples of successful cyberinfrastructure, providing data, tools, and general community-building support to the meteorological research and education community.

The Kenneth Goldman research group at Washington University has recently developed JPie, a novel visual programming environment that supports live construction of running applications. In addition, the group is currently working on algorithms and middleware for a fault-tolerant shared infrastructure that supports evolvable, long-running distributed applications. Kenneth J. Goldman is an Associate Professor in the Washington University Department of Computer Science and Engineering and brings to this project over 20 years of research experience in distributed systems and programming environments.

Northrop Grumman Corporation (NG) contributes expertise in development and implementation of geospatial applications, architectures and enterprise-wide solutions. NG is a Principal Member of the OGC and has been influential in defining an interoperable open infrastructure that is shared across user communities. Through its development of OGC’s Compliance Testing Tools, NG leads the geospatial community in insight into service providers’ and GIS vendors’ compliance to OGC standards. Stefan Falke, Systems Engineer with NG, will lead the NG team. He brings experience in applying OGC based services (including DataFed's) to NG's projects. As a part-time research professor of Environmental Engineering at Washington University, he is involved in air quality cyberinfrastructure projects for satellite and emissions data. Dr. Falke is co-lead for the ESIP Air Quality Cluster.

Workplan

The AQO project will be led by Rudolf Husar (CAPITA) and Ben Domenico (Unidata). CAPITA and Unidata, with their rich histories and the experience of their staff, will be the pillars of the AQO. The active members of the AQO network will come from the ranks of data providers, data users, and value-adding mediator-analysts. The latter group will consist of existing AQ research projects funded by EPA, NASA, NOAA, and NSF that have data, tools, or expertise to contribute to the shared AQO pool.

Husar will lead the integration of DataFed...

Domenico will lead the integration of Unidata...

Goldman will develop the capability of creating distributed applications...

Falke will coordinate the OGC interoperability compliance and standard gap analysis.

Coordination among activities and organizations will be fostered through a combination of virtual and physical interaction. Approximately every six months, the core AQO team will meet at the Earth Science Information Partners (ESIP) Federation meeting, as part of the Air Quality Cluster. The ESIP Federation and AQ Cluster provide an inter-agency and inter-organization environment in which to coordinate these activities. Communication between meetings will be handled through a shared Wiki site.


Infrastructure

Prototype

Use Cases


Glossary of Acronyms

References Cited

1. Husar, R.B., et al. The Asian Dust Events of April 1998; J. Geophys. Res. Atmos. 106, 18317-18330, 2001. See event website: http://capita.wustl.edu/Asia-FarEast/

2. Wayland, R.A.; Dye, T.S. AIRNOW: America’s Resource for Real-Time and Forecasted Air Quality Information; Environmental Manager, September 2005, 19-27.

3. {(NAAMS)} National Ambient Air Monitoring Strategy, Draft, OAQPS. USEPA, December 2005, http://www.epa.gov/ttn/amtic/files/ambient/monitorstrat/naamstrat2005.pdf

4. {TG2005} Thorvaldsson, Haraldur D.; Goldman, Kenneth J. "Architecture and Execution Model for a Survivable Workflow Transaction Infrastructure." Washington University Department of Computer Science and Engineering, Technical Report TR-2005-61, December 2005.

5. {GSMAS1995} Kenneth J. Goldman, Bala Swaminathan, T. Paul McCartney, Michael D. Anderson, and Ram Sethuraman. “The Programmers' Playground: I/O Abstraction for User-Configurable Distributed Applications.” IEEE Transactions on Software Engineering, 21(9):735-746, September 1995.

6. {SGM2005} Sajeeva L. Pallemulle, Kenneth J. Goldman, and Brandon E. Morgan. Supporting Live Development of SOAP and CORBA Servers. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS’05), pages 553-562, Washington DC, 2005.

7. {PEMDG2000} Jyoti K. Parwatikar, A. Maynard Engebretson, T. Paul McCartney, John D. Dehart, and Kenneth J. Goldman. “Vaudeville: A High Performance, Voice Activated Teleconferencing Application,” Multimedia Tools and Applications, 10(1): 5-22, January 2000.

8. Husar, R.; Poirot, R. DataFed and Fastnet: Tools for Agile Air Quality Analysis; Environmental Manager, September 2005, 39-41.

Buehler, K. and L. McKee, The OpenGIS Guide: Introduction to Interoperable Geoprocessing and the OpenGIS Specification, Waltham, MA (1998).

Biographical Sketches

Collaborators and Other Personnel

Misc Stuff

This project will advance understanding about distributed, service-oriented architectures in the presence of semantic impedance. Particular emphasis will be placed on designing for simplicity and extensibility through abstract data typing, service adaptors and polymorphism. Each of these builds upon the above paragraphs and leverages prior work as follows:

  • Abstract Data Typing - A crucial factor in the successes of Unidata and DataFed has been the generalization of data access methods through the use of data models. Note, for example, Unidata's development of the netCDF data model in 1988, which deeply informed much follow-on work, such as OpenDAP. Such models are equivalent to very high-level abstract data types, and they set the stage for syntactically and, potentially, semantically type-checked interfaces and interoperability.
  • Service Adaptors - Though theoretically possible, it is entirely unrealistic to expect data providers or the builders of tools and services to adopt high-level data abstractions or retrofit their data-access interfaces to use them. Such expectations are especially unrealistic with the passage of time, as new users create new applications of older data and tools. The approach envisioned in this project is to make extensive use of Web services that function as data wrappers, transformers, aggregators and so forth, each of which exploits data abstraction as a framework in which to project one data type onto another, i.e., to perform semantic impedance matching along with other useful data-synthesis functions. As described above (see Interoperable Data-Processing Services), prototype service adaptors will be among the outcomes of this project. These will build on prior work, especially THREDDS and GALEON, where experience has been gained, for example, in
    • creating virtual aggregations of data from distributed servers;
    • cross-projecting data types and coordinate systems to yield semantic interoperability between the 4-dimensional atmospheric modeling community and the GIS community.
  • Polymorphism - Programming languages vary in the degrees to which functions can handle inputs of varying types. Some are largely free of data types (user beware of sending some function the wrong type of data), some enforce very strict type-checking (user protected from type-mismatch errors), and still others check types but facilitate the creation of functions whose inputs can accept multiple types of data, a practice dubbed polymorphism. Investigators for this project envision semantic-level polymorphism realized via Web services that perform type checking but allow multiple types of input, perhaps by passing service requests off to other more appropriate Web services. The linchpin for such polymorphism in the AQO will be the Common Data Model, perhaps extended to embrace more data types, because it provides a framework in which to define and to check data types at a high semantic level.
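A minimal sketch of such semantic-level polymorphism follows; the type names, services, and dispatch table are invented for illustration, and a real implementation would check types against the Common Data Model rather than a string tag:

```python
# Sketch of polymorphic dispatch among Web services: a service checks
# the input's declared type, handles the types it knows, and passes
# other requests on to a more appropriate service.
def station_service(data):
    """Handles station-point data by projecting it onto a grid first."""
    return "gridded station points"

DISPATCH = {"station": station_service}  # registry of delegate services

def grid_service(data):
    """Accepts multiple input types: handles grids, delegates the rest."""
    kind = data["type"]
    if kind == "grid":
        return "rendered grid"
    return DISPATCH[kind](data)  # not our type: delegate

print(grid_service({"type": "grid"}))     # handled directly
print(grid_service({"type": "station"}))  # delegated
```

Type checking still happens (an unregistered type fails loudly), but the caller sees a single service interface that accepts more than one data type, which is the polymorphism envisioned here.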

Significant simplicity is gained through these three design principles. If the air-quality and meteorology communities jointly have an interest in applying N distinct classes of tools and services to M distinct classes of data, then a straightforward approach to universal access and usability requires on the order of N×M code-development efforts. In contrast--though the challenges of abstraction and polymorphism are great--the intended outcome of this project is a prototype that requires on the order of N+M code-development efforts, roughly one each for creating service adaptors that project the semantics of a given data type, tool or service onto the Common Data Model.
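The scaling argument in concrete numbers (N and M here are arbitrary illustrative values, not counts from the actual communities):

```python
# N tool classes applied to M data classes:
# pairwise integration needs N*M efforts; adapters that project each
# tool and each data class onto a Common Data Model need about N + M.
def pairwise(n, m):
    return n * m

def via_common_model(n, m):
    return n + m

n, m = 12, 9
print(pairwise(n, m), "vs", via_common_model(n, m))  # 108 vs 21
```

The gap widens as either community grows, which is why the adaptor-plus-Common-Data-Model design is the centerpiece of the prototype.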

The above technical advances will be realized in an end-to-end prototype designed to enable intellectual advances in the air-quality domain. The use cases articulated below will require cross-community sharing of services and data, including aggregations of observed and simulated information from multiple sources. An unusual aspect of the AQO will be the joining of real-time, asynchronous data streams with more traditional pull-style Web services.