NSF Air Quality Observatory: AQ Observatory Proposal

From Earth Science Information Partners (ESIP)





Air Quality Observatory (AQO)


Proposal in a Nutshell:
Topic: Air Quality Observatory (AQO): Prototype Based on a Modular Service-based Infrastructure
IT Infrastructure: Standard Data & Tools Sharing || Orchestration of Distributed Apps || Communication, Cooperation, Coordination
AQO Prototype: DataFed extensions || THREDDS extensions || DataFed/THREDDS fusion || Link to GIS || Community work space
Use Cases: Real-time AQ Event Detection, Analysis and Response || Intercontinental Transport || Midwest Nitrate Mystery
Management: CAPITA/Unidata Collaborators Team || Multi-agency and Project Participation || ESIP Facilitation

Introduction

Traditionally, air quality analysis was a slow, deliberate investigative process occurring months or years after the monitoring data had been collected. Satellites, real-time pollution detection and the World Wide Web have changed all that. Analysts can now observe air pollution events as they unfold. They can ‘congregate’ through the Internet in ad hoc virtual work-groups to share their observations and collectively create the insights needed to elucidate the observed phenomena. Air quality analysis is becoming much more agile and responsive to the needs of air quality managers, the public and the scientific community.

The era of near real-time air quality analysis began in the late 1990s, with the availability of real-time satellite data over the Internet. High-resolution color satellite images were uniquely suited for the early detection and tracking of extreme natural or anthropogenic pollution events. In April 1998, for example, a group of analysts keenly followed and documented on the Web, in real time, the trans-continental transport and impact of Asian dust from the Gobi desert on the air quality over the Western US (Husar et al., 2001, http://capita.wustl.edu/Asia-FarEast).

The high value of qualitative real-time air quality information to the public is well demonstrated by EPA’s successful AIRNOW program (Wayland and Dye, 2005). However, during extreme air quality events like those described above, air quality managers need more extensive ‘just-in-time analysis’, not just qualitative air quality information. In the 1998 Asian Dust Event, for example, local air quality managers in Oregon and Washington used the real-time analysis to issue health advisories. Soon after the Central American Smoke Event, the federal EPA granted some states ‘exceptional event’ exemptions from ozone standard violations. These responsive air quality management actions were largely facilitated by the agile event analyses provided by the ad hoc community of scientists and managers collaborating through the Internet.

In recent years, air quality management has also changed. The old command-and-control style is giving way to a more participatory approach that includes all the key stakeholders from multiple jurisdictions and applies scientific and ‘weight of evidence’ approaches. Air quality regulations now emphasize short-term monitoring, while long-term air quality goals are set to glide toward ‘natural background’ levels over the coming decades. In response to these and other developments, EPA has undertaken a major redesign of the monitoring system that provides the data input for air quality management. The new National Ambient Air Monitoring Strategy (NAAMS), through its multi-tier integrated monitoring system, is geared to provide more relevant and timely data for these complex management needs. All these changes in management style place a considerable burden on the information system that supports them. Fortunately, both air quality monitoring and data dissemination technologies have also advanced considerably since the 1990s.
Recent developments offer outstanding opportunities to fulfill the information needs of the new agile air quality management approach. Surface-based air pollution monitoring networks now routinely provide high-grade spatio-temporal and chemical patterns throughout the US for PM2.5 and ozone. Satellite sensors with global coverage and kilometer-scale spatial resolution provide real-time snapshots that depict the pattern of haze, smoke and dust in stunning detail. The terabytes of data from these surface and remote sensors can now be stored, processed and delivered in near-real time. The instantaneous ‘horizontal’ diffusion of information via the Internet now permits, in principle, the delivery of the right information to the right people at the right place and time. Standardized computer-to-computer communication languages and Service-Oriented Architectures (SOA) facilitate the flexible processing of raw data into high-grade ‘actionable’ knowledge. Last but not least, the World Wide Web has opened the way to generous sharing of data and tools, leading to faster knowledge creation through collaborative analysis and virtual workgroups.

Nevertheless, air quality analysts face significant hurdles; the new developments have introduced a new set of problems. The “data deluge” problem is especially acute for analysts interested in aerosol pollution, since aerosols are inherently complex and since there are so many different kinds of relevant data: extensive new surface-based monitoring networks, meteorological and aerosol forecast models, satellite imagery and associated data products, etc.

[Summary] Recent developments in surface and satellite sensing, along with new information technologies, now allow real-time, ‘just-in-time’ data analysis for the characterization and partial explanation of major air pollution events; more in-depth post-analysis, integration and fusion can also be performed using the federated historical data resources and tools. By making many spatio-temporal data sources available through a single web interface and in a consistent format, the DataFed tools allow anyone to view, process, overlay and display many types of data to gain insight into atmospheric physical and chemical processes. A goal of the current effort is to encourage use of these tools by a broad community of air pollution researchers and analysts, so that a growing group of empowered analysts may enhance the rate at which our collective knowledge of air pollution evolves. In recent years, such agile analyses have provided occasional real-time support to air quality managers and to the public, but much more could be done. The current challenge is to incorporate such support into the air quality management process in a more regular and robust way.


The multidisciplinary topic of air quality (air chemistry, meteorology, health science, ecology) is a national priority and is pursued in diverse organizations (EPA, NOAA, NASA, regional and state agencies, etc.), each organization being both a producer and a consumer of AQ-related information. This 'messy' information system is further complicated by the fact that the value chain that turns raw AQ data into 'actionable knowledge' for decision making has many steps; the data processing nodes are distributed among different organizations, and many nodes include human 'processors'. While the AQ science and management system has worked in the past, it was hampered by marginal support from a suitable information flow infrastructure. This problem of AQ information access, integration and delivery will be greatly exacerbated in the near future. The current revolution in both satellite remote sensing and surface air chemistry measurements delivers higher quality and much higher quantity of AQ data that need to be assimilated into the AQ analysis systems. Air quality simulation and forecast models now also require more input verification, assimilation and augmentation. At the same time, the use of AQ information by AQ managers at federal and state levels is being transformed from command-and-control to a more flexible 'weight-of-evidence' style. Added to these changes is the emergence of a new cooperative spirit exemplified in the Global Earth Observation System of Systems (GEOSS, 60+ nation membership), where air quality is identified as one of the near-term opportunities for demonstrating the benefits of GEOSS. The increased supply of and demand for highly refined, just-in-time air quality information is a grand challenge for both the information science and environmental science communities.

Fortunately, recent developments and the convergence of various information technologies can close the gap between AQ information supply and demand. Particular value can be added by a web-based cyberinfrastructure that can benefit virtually all components of the information system. ...[Internet II; cyberinfrastructure efforts at NSF, NASA, NOAA, EPA as well as industry] [from info stovepipes to open networking]

The goal of this project is to build an infrastructure to support the science, management and education related to air quality. This goal is to be achieved through an Air Quality Observatory (AQO) based on a modular, service-based infrastructure. The AQO concept will be demonstrated through a prototype that will integrate AQ-relevant data from surface, satellite and other sensors, and will provide data analysis tools for agile, analyst-configurable data processing. The users of the prototype will...

Infrastructure for Sharing AQ Data, Services and Tools

Current Infrastructure

DataFed is an infrastructure for real-time integration and web-based delivery of distributed monitoring data. The federated data system, DataFed (http://datafed.net), aims to support air quality management and science by more effective use of relevant data. Building on the emerging pattern of the Internet itself, DataFed assumes that datasets and new data processing services will continue to emerge spontaneously and autonomously on the Internet, as shown schematically in Figure 1. Example data providers include the AIRNOW project, modeling centers and the NASA Distributed Active Archive Centers (DAAC).

DataFed is not a centrally planned and maintained data system but a facility to harness the emerging resources through powerful dynamic data integration technologies and a collaborative federation philosophy. The key roles of the federation infrastructure are to (1) facilitate registration of the distributed data in a user-accessible catalog; (2) ensure data interoperability based on the physical dimensions of space and time; (3) provide a set of basic tools for data exploration and analysis. The federated datasets can be queried by simply specifying a latitude-longitude window for spatial views, a time range for time views, etc. This universal access is accomplished by ‘wrapping’ the heterogeneous data, a process that turns data access into a standardized web service, callable through well-defined Internet protocols.

The result of this ‘wrapping’ process is an array of homogeneous, virtual datasets that can be queried by spatial and temporal attributes and processed into higher-grade data products.
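The wrapping idea described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual DataFed API: the `Query`, `wrap` and `fake_airnow_fetch` names are hypothetical stand-ins for provider-specific access code being adapted to one uniform space-time query signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Query:
    bbox: Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
    time_range: Tuple[str, str]              # ISO 8601 (start, end)

Record = Dict[str, float]

def wrap(native_fetch: Callable[..., List[Record]]) -> Callable[[Query], List[Record]]:
    """Adapt a provider-specific fetch function to the uniform query interface."""
    def query(q: Query) -> List[Record]:
        min_lon, min_lat, max_lon, max_lat = q.bbox
        # The provider handles time selection in its own way...
        rows = native_fetch(start=q.time_range[0], end=q.time_range[1])
        # ...while spatial filtering is applied uniformly for every provider.
        return [r for r in rows
                if min_lon <= r["lon"] <= max_lon and min_lat <= r["lat"] <= max_lat]
    return query

# A fake provider returning point observations, standing in for a real feed.
def fake_airnow_fetch(start: str, end: str) -> List[Record]:
    return [{"lon": -90.2, "lat": 38.6, "pm25": 14.0},
            {"lon": -122.3, "lat": 47.6, "pm25": 9.5}]

airnow = wrap(fake_airnow_fetch)
midwest = airnow(Query(bbox=(-95, 35, -85, 43), time_range=("2005-06-01", "2005-06-02")))
# midwest contains only the St. Louis-area observation
```

Once every dataset is behind the same `wrap`-style adapter, the same browsing and overlay tools can serve all of them without provider-specific code.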

WCS GALEON here

The Service Oriented Architecture (SOA) of DataFed is used to build web applications by connecting the web service components (e.g. services for data access, transformation, fusion, rendering, etc.) in Lego-like assembly. The generic web tools created in this fashion include catalogs for data discovery, browsers for spatial-temporal exploration, multi-view consoles, animators, multi-layer overlays, etc. (Figure 2).

A good illustration of the federated approach is the real-time AIRNOW dataset described in a companion paper in this issue (Wayland and Dye, 2005). The AIRNOW data are collected from the States, aggregated by the federal EPA and used for informing the public (Figure 1) through the AIRNOW website. In addition, the hourly real-time O3 and PM2.5 data are also made accessible to DataFed, where they are translated on the fly into a uniform format. Through the DataFed web interface, any user can access and display the AIRNOW data as time series and spatial maps, perform spatial-temporal filtering and aggregation, generate spatial and temporal overlays with other data layers, and incorporate these user-generated data views into their own web pages. As of early 2005, over 100 distributed air quality-relevant datasets have been ‘wrapped’ into the federated virtual database. About a dozen satellite and surface datasets are delivered within a day of the observations, and two model outputs provide PM forecasts.
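The temporal aggregation mentioned above is the kind of server-side operation a federated view can request. A minimal, assumption-laden sketch (plain Python, not DataFed code; the observation values are made up) of rolling hourly PM2.5 observations into daily means:

```python
from collections import defaultdict
from statistics import mean

# Hourly PM2.5 observations as (timestamp, value) pairs -- illustrative data.
hourly = [
    ("2005-06-01T00:00", 12.0),
    ("2005-06-01T12:00", 18.0),
    ("2005-06-02T06:00", 30.0),
]

def daily_mean(obs):
    """Aggregate hourly observations into daily means."""
    by_day = defaultdict(list)
    for timestamp, value in obs:
        by_day[timestamp[:10]].append(value)   # group on the date part
    return {day: mean(vals) for day, vals in sorted(by_day.items())}

daily = daily_mean(hourly)
# daily == {"2005-06-01": 15.0, "2005-06-02": 30.0}
```

The same grouping pattern generalizes to spatial aggregation (group on a grid cell instead of a date) and to other statistics (maxima for exceedance screening).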

This is about data protocols, discovery, access processing services

DataFed

Unidata

Extending Current Infrastructure

DataFed wrappers - data access and homogenization

Standards Based Interoperability

Data and service interoperability among Air Quality Observatory (AQO) participants will be fostered through the implementation of accepted standards and protocols. Adherence to standards will foster interoperability not only within the AQO but also among other observatories, cyberinfrastructure projects, and GEOSS efforts.

Standards for finding, accessing, portraying and processing geospatial data are defined by the Open Geospatial Consortium (OGC). The AQO will implement many of the OGC specifications for interacting with its data and tools. The most well-established OGC specification is the Web Map Service for exchanging map images, but the Web Feature Service and Web Coverage Service are gaining wider implementation. The OGC specifications we expect to use in developing the AQO prototype are described in Table X.

Table X: OGC Specifications

WMS: Web Map Services support the creation, retrieval and display of registered and superimposed map views of information that can come simultaneously from multiple sources.
WFS: The Web Feature Service defines interfaces for accessing discrete geospatial data encoded in GML.
WCS: Web Coverage Services allow access to multi-dimensional data that represent coverages, such as grids and point data of spatially continuous phenomena.
CSW: Catalog services support the ability to publish and search collections of descriptive information (metadata) for data, services and related information objects. Metadata in catalogs represent resource characteristics that can be queried and presented for evaluation and further processing by both humans and software.
SWE: The Sensor Web Enablement specifications include SensorML for describing sensor instruments, Observations & Measurements for describing sensor data, the Sensor Observation Service for retrieving sensor data, and the Sensor Planning Service for managing sensors. These specifications are discussion papers within OGC.
WPS: Web Processing Services offer geospatial operations, including traditional GIS processing and spatial analysis algorithms, to clients across networks. WPS is currently a proposed specification.
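As a concrete example of how an AQO client would exercise the most established of these specifications, the sketch below builds a WMS 1.1.1 GetMap request. The parameter names follow the OGC WMS 1.1.1 specification; the endpoint URL and layer name are hypothetical.

```python
from urllib.parse import urlencode

def getmap_url(endpoint, layer, bbox, width=600, height=400):
    """Build a WMS 1.1.1 GetMap request URL for a single layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",               # default style
        "SRS": "EPSG:4326",         # plain lat-lon coordinates
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)

# Hypothetical AQO endpoint and layer; bbox covers the continental US.
url = getmap_url("http://example.org/wms", "pm25_hourly", (-125, 24, -66, 50))
```

Because every WMS server answers the same GetMap grammar, the same client code can overlay map layers from AIRNOW, satellite providers and model outputs without per-source logic.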

While these standards originate in the geospatial domain, they are being extended to support non-spatial aspects of geospatial data. For example, the WFS revision working group is presently revising the specification to include support for time, and WCS is being revised to support coverage formats other than grids. These extensions are particularly relevant to air quality data, since access, analysis and display of such data are not limited to maps. The implementation and use of OGC specifications within the AQO will enhance the observatory's ability to adapt to new data and service offerings. At the same time, it will also provide a testing environment for identifying limitations of current geospatial standards and feeding that insight back to OGC for incorporation into future versions of its specifications.

The success of the OGC specifications has led to efforts to develop interfaces between them and other common data access protocols. For example, the GALEON Interoperability Experiment, led by Ben Domenico at Unidata, is developing a WCS interface to netCDF.


Unidata THREDDS middleware for data discovery and use; and test beds that assure the data exchange is indeed interoperable, e.g. Unidata-OGC GALEON Interoperability Experiment/Network. [much more Unidata stuff here] [Stefan OGC W*S standards] [CAPITA data wrapping, virtual SQL query for point data]

New Activities Extending the Infrastructure

Common Data Model [How about Stefano Nativi's semantic mediation]

Networking. [Semantic mediation of distributed data and services] [Jeff Ullman Mediator-as-view] [Purposeful pursuit of maximizing the Network Effect] [Value chains, value networks]

The novel technology development will focus on a framework for building distributed data analysis applications using loosely coupled web service components. With these technologies, applications will be built by dynamically 'orchestrating' the information processing components ...[to perform an array of user-defined processing applications]. The user-configurable applications will include Analyst Consoles for real-time monitoring and analysis of air pollution events, workflow programs for more elaborate processing, and tools for intelligent multi-sensory data fusion. Most of these technologies are already part of the CAPITA DataFed access and analysis system, developed with support from NSF, NASA, EPA and other agencies. Similarly, an increasing array of web service components is now being offered by various providers. However, a crucial missing piece is the testing of service interoperability and the development of the necessary service adapters that will facilitate interoperability and service chaining... [more on evolvable, fault-tolerant web apps from Ken Goldman here] [also link to Unidata LEAD project here]
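The Lego-like service chaining described above can be sketched with plain functions standing in for remote web service calls. Everything here is illustrative: `access`, `filter_exceedances` and `render` are hypothetical services, and in the real system each stage would be a SOAP or HTTP call rather than a local function.

```python
from functools import reduce

def access(_):
    # Stand-in for a data-access service returning point observations.
    return [{"site": "A", "pm25": 42.0}, {"site": "B", "pm25": 8.0}]

def filter_exceedances(rows, threshold=35.0):
    # Stand-in for a filtering service flagging high concentrations.
    return [r for r in rows if r["pm25"] > threshold]

def render(rows):
    # Stand-in for a rendering service producing a text report.
    return "; ".join(f"{r['site']}={r['pm25']}" for r in rows)

def chain(*services):
    """Compose services left-to-right into one callable pipeline."""
    return lambda payload=None: reduce(lambda data, svc: svc(data), services, payload)

# Orchestrate access -> filter -> render into a single application.
report = chain(access, filter_exceedances, render)()
# report == "A=42.0"
```

The point of the loose coupling is that `filter_exceedances` can be swapped for, say, a spatial aggregation service without touching the other stages, which is what makes the applications analyst-configurable.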

The AQO will support networked community interactions by creating web-based communication channels, aid cooperation through the sharing and reuse of multidisciplinary (air chemistry, meteorology, etc.) AQ data, services and tools, and provide infrastructure support for coordination among researchers and managers pursuing common objectives such as research, management and educational projects. [Unidata community support] The exploratory data analysis tools built on top of this infrastructure will seamlessly access these data, facilitate data integration and fusion operations, and allow user configuration of the analysis steps ...[including simple diagnostic AQ models driven by data in the Unidata system]. The resulting insights will help develop AQ management responses to the observed phenomena and contribute to their scientific elucidation. [cyberinfrastructure - long end-to-end value chain, many players]

Prototype Air Quality Observatory

Extending Current Prototype

DataFed

THREDDS

New Prototyping Activities

Networking: Connecting DataFed, THREDDS and other nodes to make a System of Systems

Service Interoperability and Chaining

Processing Applications, novel ways [Loose Coupling, Service Adapters, Ken Goldman stuff]...

Cross-Cutting Use Cases for Prototype Demonstration

The proposed Air Quality Observatory prototype will demonstrate the benefits of the IIT through three use cases that are cross-cutting, make a true contribution to AQ science and management, and also place significant demands on the IIT. Use case selection is driven by user needs: [letters EPA, LADCO, IGAC]. Not by coincidence, these topics are areas of active research in atmospheric chemistry and transport at CAPITA and other groups. The cases will be end-to-end, connecting real data producers and mediators as well as decision makers. The prototype will demonstrate seamless data discovery and access, flexible analysis tools, and delivery.

1) Intercontinental Pollutant Transport. Sahara dust over the Southeast, Asian dust, pollution, [20+ JGR papers facilitated on the Asian Dust Events of April 1998 - now more can be done, faster and better with AQO] [letter from Terry Keating?]

2) Exceptional Events. The second AQO use case will demonstrate a real-time data access/processing/delivery/response system for Exceptional Events (EE). Exceptional AQ events include smoke from natural and some anthropogenic fires, windblown dust events, volcanoes, and long-range pollution transport events from sources such as other continents. A key feature of exceptional events is that they tend to be episodic, with very high short-term concentrations. The AQO prototype will provide real-time characterization and near-term forecasting that can be used to trigger preventive actions, such as warnings to the public. Exceptional events are also important for long-term AQ management, since EE samples can be flagged for exclusion from the National Ambient Air Quality Standards calculations. The IIT is supported by both state agencies and the federal government... [need a para on the IIT support to global science, e.g. IGAC projects] During extreme air quality events, the stakeholders need more extensive 'just-in-time analysis', not just qualitative air quality information.


3) Midwestern Nitrate Anomaly. Over the last two years, a mysterious pollutant source has caused pollutant levels to rise in excess of the AQ standard over much of the Upper Midwest in winter and spring. Nitrogen sources are suspected, since a sharp rise in nitrate aerosol is the key chemical signature. The phenomenon has eluded detection and quantification since the area was not monitored, but recent intense sampling campaigns have implicated NOx and ammonia release from agricultural fields during snow melt. This AQO use case will integrate and facilitate access to data on soil quality, agricultural fertilizer concentration and flow, snow chemistry, surface meteorology and air chemistry.

Observatory Guiding Principles, Governance, Personnel

Guiding Principles: openness, networking, 'harnessing the winds' [of change in technology, attitudes]

The governance of the Observatory ... reps from data providers (NASA/NOAA), users (EPA), AQ science/projects. Use the agency-neutral ESIP/AQ cluster as the interaction platform -- AQO project wiki on ESIP. Use ESIP meetings to hold AQO project meetings. [Ben/Dave could use help here on governance]...

[everybody needs to show off their hats and feathers here, don't be too shy] The AQO project will be led by Rudolf Husar and Ben Domenico. Husar is Professor of Mechanical Engineering and director of the Center for Air Pollution Impact and Trend Analysis (CAPITA), and brings 30+ years of experience in AQ analysis and environmental informatics to the AQO project. Ben Domenico is Deputy Director of Unidata. Since its inception in 1983, Domenico has been an engine that turned Unidata into one of the earliest examples of successful cyberinfrastructure, providing data, tools and general community-building support to the meteorological research and education community. CAPITA and Unidata, with their rich history and the experience of their staff, will be the pillars of the AQO. The active members of the AQO network will come from the ranks of data providers, data users and value-adding mediator-analysts. The latter group will consist of existing AQ research projects funded by EPA, NASA, NOAA and NSF that have data, tools or expertise to contribute to the shared AQO pool. The communication venue for the AQO will be the Earth Science Information Partners (ESIP), as part of the Air Quality Cluster [agency/organization neutral].

DataFed is a community-supported effort. While the data integration web services infrastructure was initially supported by specific information technology grants from NSF and NASA, the data resources are contributed by the autonomous providers. The application of the federated data and tools is in the hands of users as part of specific projects; a partial list is at http://datafed.net/projects. Just as data quality improves by passing through many hands, the analysis tools will also improve with use and feedback from data analysts. At this time the DataFed-FASTNET user community is small, but substantial efforts are under way to encourage and facilitate broader participation through larger organizations such as the Earth Science Information Partners (ESIP) Federation (NASA, NOAA, EPA main member agencies) and the Regional Planning Organizations (RPOs) for regional haze management.

Broader Impacts of the Air Quality Observatory

Impact on data providing agencies [letter from NASA?]

Impact on user agencies [letter from EPA, RPOs?]

International, Earth Science Process [letter from IGAC?]

Activity Schedule

Infrastructure

Prototype

Use Cases

References Cited

Husar, R.B., et al. The Asian Dust Events of April 1998; J. Geophys. Res. Atmos. 2001, 106, 18317-18330. Event website: http://capita.wustl.edu/Asia-FarEast/

Husar, R.; Poirot, R. DataFed and FASTNET: Tools for Agile Air Quality Analysis; Environmental Manager 2005, September, 39-41.

Wayland, R.A.; Dye, T.S. AIRNOW: America’s Resource for Real-Time and Forecasted Air Quality Information; Environmental Manager 2005, September, 19-27.

National Ambient Air Monitoring Strategy (NAAMS)

Biographical Sketches

Collaborators and Other Personnel