NSF Air Quality Observatory: AQ Observatory Proposal

From Earth Science Information Partners (ESIP)
 
 
=<center>Project Summary</center>=
 
  
Research and management of air quality are addressed by diverse, but linked, communities that deal with emissions, atmospheric transport and removal processes, chemical transformations, and environmental/health effects. Among the most complex of the attendant cross-disciplinary links is the interaction between atmospheric chemistry and meteorology.
  
The goal of this Air Quality Observatory (AQO) project is to enhance the linkage between these communities through effective cyberinfrastructure.  In air quality, a recently developed infrastructure (DataFed) provides access to over 30 distributed data sets (emissions, concentrations, depositions), along with web-based processing capabilities that serve both research and management needs. In meteorology, long-standing infrastructure developed by Unidata supports a large community of researchers, educators, and decision-makers needing observational data. These two cyberinfrastructure components are most useful at present within the scopes of their respective communities, and great opportunity exists for widening their combined effectiveness.
  
'''Intellectual Merit.''' The overarching contribution of this project is to advance cross-community interoperability among cyberinfrastructure components that are critical in multidisciplinary environmental observation. Interoperability topics will include: access methods for heterogeneous data sources; adapters and transformers to aid in connecting Web services from distinct communities; and designing service-oriented architectures for simplicity and extensibility in the presence of semantic impedance. Leveraging the current capabilities of Unidata and DataFed, new understandings will be gained about the benefits of abstract data typing, service adapters, and polymorphism in such architectures.
  
The framework for these advances will be an end-to-end prototype whose functional design will enable intellectual advances in the application domain (as well as in cyberinfrastructure). Several use cases will be explored that require cross-community sharing of tools and data, including aggregations of observed and simulated information from multiple sources. An unusual aspect of the AQO will be the extent to which it joins Unidata's push-style capabilities for handling real-time, asynchronous data streams with more traditional pull-style Web services.
  
'''Broader Impact.''' The AQO will support diverse learners within and beyond the DataFed and Unidata communities.  Lessons learned in this project will inform builders of other cross-disciplinary cyberinfrastructure, especially those facing semantic impedances and the challenges of real-time streams. Finally, the observatory will support many users, such as federal and state AQ managers performing status and trend analyses, managing exceptional events, or evaluating monitoring networks.
  
The AQO will leverage, augment, and integrate DataFed and Unidata in a prototype cyberinfrastructure component that better serves researchers, decision-makers, teachers and students of air quality, meteorology, and related fields by overcoming key difficulties. The research team from Washington University and Unidata has decades of experience in developing information technologies and applying them to air-quality analysis, meteorology, and environmental engineering.
  
=<center>Past NSF Projects</center>=
  
Rudolf Husar's "Collaboration Through Virtual Workgroups" project (NSF ATM-ITR small grant #0113868, $445,768, 9/01/01 to 8/31/04) found that web services are mature enough for the integration of distributed, heterogeneous, and autonomous datasets into homogeneous, federated datasets. The resulting DataFed (http://datafed.net) tools allowed real-time, 'just-in-time' data analysis for the characterization of major air pollution events.
  
Ben Domenico's "Thematic Real-time Environmental Distributed Data Services (THREDDS)" project (DUE-0121623, $900,001, 10/01/2001 to 09/30/2003) was a collaborative initiative to build a software infrastructure that provides students, educators, and researchers with Internet access to large collections of real-time and archived environmental datasets from distributed servers. Unidata is also nearing completion of "THREDDS Second Generation" (DUE-0333600, $554,993, 10/01/03 to 09/30/06), which focuses on integrating GIS information via Open GIS protocols. Additional THREDDS details can be found at http://www.unidata.ucar.edu/projects/THREDDS/. Dr. Domenico has been involved in several other successful efforts: "Unidata 2008: Shaping the Future of Data Use in the Geosciences" (ATM-0317610, $22,800,000, 10/01/03 to 09/30/08); "Linked Environments for Atmospheric Discovery (LEAD)" (ATM-0331587, $11,250,000, 10/01/03 to 09/30/08); and "DLESE Data Services: Facilitating the Development and Effective Use of Earth System Science Data in Education" (EAR-0305045, $390,985, 09/30/03 to 08/31/06).
  
Ken Goldman's "Interactive Learning Environment for Introductory Computer Science" project (EIA-0305954, $514,996, 8/15/03 to 7/31/06) involves the development of JPie, an interactive programming environment designed to make object-oriented software development accessible to a wider audience.  Programs are constructed by graphical manipulation of functional components, so inexperienced programmers can achieve early success without the steep learning curve that precedes development in a traditional textual language. Recent JPie extensions support dynamic server interface changes, with support for both SOAP and CORBA.
  
=<center>Intellectual and Technical Merit</center>=
The overarching technological contribution of this project is to advance cross-community '''interoperability''' among cyberinfrastructure components that are critical in contemporary environmental observation. The tangible outcomes will include a prototype observatory that provides genuine end-to-end services needed by two distinct communities of users and ''simultaneously'' advances the state of the art in designing observatories for multidisciplinary communities of users. Each of the communities participating in this study has operational systems that will be leveraged to create the prototype, but the marriage of their systems presents significant design challenges within which to study important interoperability questions.
  
Specifically, the joining of the air-quality and meteorology communities will require (1) effective global access to distinct but overlapping, heterogeneous data streams and data sets; (2) use of these data in distinct but overlapping sets of tools and services, to meet complex needs for analysis, synthesis, display, and decision support; and (3) new ''combinations'' of these data and (chained) services such as can be achieved only in a distributed, service-oriented architecture that exhibits excellence of functional and technical design, including means to overcome the semantic differences that naturally arise when communities develop with distinct motivations and interests.
 
'''Interoperable Data Access Methods.''' The contributions of this research will advance the state of interdisciplinary data use by achieving effective global access to heterogeneous data streams and data sets. Specific emphasis will be placed on mediating access to diverse types of data from remote and in-situ observing systems, combined with simulated data from sophisticated, operational forecast models. The observational sources will include satellite- and surface-based air quality and meteorological measurements, emission inventories, and related data from the remarkably rich arrays of resources presently available via Unidata and DataFed.
 
The tangible output of this research component will be an extended Common Data Model (including the associated metadata structures), realized in the form of (interoperable) Web services that will meet the data-access needs of both communities and that can become generally accepted standards.
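As an illustration of this intent (and not the actual DataFed or Unidata interfaces), the following sketch shows the kind of uniform query signature and self-describing envelope such a Common Data Model service might present; all names and fields here are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sketch: every source (surface monitor, satellite grid,
# model output) answers the same query signature and returns the same
# self-describing envelope, regardless of its native format.
@dataclass
class CDMResult:
    variable: str            # e.g. "PM25"
    units: str               # physical units of the values
    coords: Dict[str, list]  # named coordinate axes (time, bounding box, ...)
    values: List[float]      # flattened data values
    provenance: str          # originating service or community

def query(source: str, variable: str, bbox, time_range) -> CDMResult:
    """Uniform query shared by all wrapped sources. A real implementation
    would dispatch to a remote service; this stub only shows the shape."""
    return CDMResult(variable=variable, units="ug/m3",
                     coords={"time": list(time_range), "bbox": list(bbox)},
                     values=[], provenance=source)
```

Because every source answers in the same envelope, downstream tools can be written once against `CDMResult` rather than once per network or format.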
 
'''Interoperable Data-Processing Services.''' Interoperability among Web services at the physical and syntactic levels is, of course, assured by the underlying Internet protocols, though semantic interoperability is not. In the case of SOAP-based services, WSDL descriptions permit syntax checking, but higher-level meanings of data exchange are inadequately described in the schema. Hence, SOAP-based services developed by different organizations for different purposes are rarely interoperable in a meaningful way. The research contribution on this topic will include, within key contexts for environmental data use, the development of Web service adapters that provide loosely coupled, snapping interfaces for Web services created autonomously in distinct communities.
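A minimal sketch of the adapter idea follows; the two service signatures are entirely hypothetical stand-ins for autonomously developed air-quality and meteorological services with different parameter names and units:

```python
# Hypothetical sketch: two autonomously developed services expose the same
# capability (fetch a concentration field) under different parameter names
# and units; an adapter lets the foreign service snap into the local chain.
def aq_service(param, region, day):             # air-quality community style
    return {"param": param, "unit": "ug/m3", "data": [12.0, 15.5]}

def met_service(var_name, domain, valid_time):  # meteorology community style
    return {"param": var_name, "unit": "kg/m3", "data": [1.2e-8, 1.55e-8]}

def met_as_aq(param, region, day):
    """Adapter: make the meteorological service callable through the
    air-quality interface, converting kg/m3 to ug/m3 on the way out."""
    result = met_service(var_name=param, domain=region, valid_time=day)
    return {"param": result["param"], "unit": "ug/m3",
            "data": [v * 1e9 for v in result["data"]]}
```

The adapter absorbs both the syntactic mismatch (parameter names) and a simple semantic one (units), so callers written for one community's interface can consume the other's data unchanged.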
 
'''Distributed Applications'''.  The Service-Oriented Architecture (SOA) movement, among others, indicates ongoing intellectual interest in the (unmet) challenges of distributed computing. Our team’s experience with SOA in recent years has demonstrated that useful applications can be built via Web-service chaining, but our current prototypes, including DataFed, operate within a context where service interoperability is assured by internal, community-specific conventions.  As the AQO evolves into a fully networked system, distributed applications built upon its infrastructure will need to be robust, evolvable, and linkable to the system of systems that is the Web. The project approach to these goals will be based on abstract data typing, polymorphism, and standards-based service adapters.
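The chaining notion can be sketched as follows; the services and record layout are illustrative assumptions, not DataFed's actual API. Each service accepts and returns the same self-describing record, so user-configured chains can be rearranged freely:

```python
# Illustrative sketch of Web-service chaining: each processing service
# takes and returns the same self-describing record, so services compose
# in any order without bespoke glue code.
def subset(record, threshold):
    """Keep only values at or above the threshold."""
    return dict(record, values=[v for v in record["values"] if v >= threshold])

def daily_mean(record):
    """Aggregate the remaining values to a single mean."""
    vals = record["values"]
    return dict(record, values=[sum(vals) / len(vals)] if vals else [])

def chain(record, *services):
    """Pipe the record through a user-configured sequence of services."""
    for svc in services:
        record = svc(record)
    return record

obs = {"variable": "PM25", "values": [8.0, 40.0, 22.0]}
result = chain(obs, lambda r: subset(r, 15.0), daily_mean)
# values at or above 15.0 are [40.0, 22.0]; their mean is 31.0
```

In a fully networked AQO, each step would be a remote service call rather than a local function, but the compositional contract is the same.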
  
=<center>Broader Impacts</center>=  
The Air Quality Observatory, through its technologies and applications, will have broader impact on the evolving cyberinfrastructure, air quality management, and atmospheric sciences.  
 
===Impact on Cyberinfrastructure===   
 
  
'''Infusion of Web Service Technologies'''. The agility and responsiveness of the evolving cyberinfrastructure are accomplished by loose coupling and user-driven dynamic rearrangement of its components. Service orientation and Web services are the key architectural and technological features of the AQO. These paradigms have been applied by the proposal team for several years, generating applications and best-practice procedures. Through collaborative activities, multi-agency workgroups, and formal publications, the Web-service-based approach will be infused into the broader earth science cyberinfrastructure.
 
'''Technologies for Wrapping Legacy Datasets'''. Including legacy datasets in the cyberinfrastructure requires wrapping them with formal interfaces for programmatic access, i.e., turning data into services. In the course of developing such interfaces to a wide variety of air quality, meteorological, and other datasets, the proposal team has developed an array of data wrappers, procedures, and tools. Wide distribution of these wrappings will assure the rapid growth of the content shareable through the cyberinfrastructure, and of the science and societal benefits resulting from the “network effect”.
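A toy sketch of such wrapping, assuming a hypothetical flat-file legacy dataset (the station names and values below are invented for illustration):

```python
import csv
import io

# Hypothetical legacy dataset: a flat CSV file of monitoring records,
# previously usable only by hand. The wrapper below exposes it through a
# formal query interface without modifying the original data.
LEGACY_CSV = """station,date,pm25
STL01,2005-06-01,18.2
STL01,2005-06-02,35.7
CHI02,2005-06-01,12.9
"""

def query_station(station, raw=LEGACY_CSV):
    """Programmatic access to the legacy file: filter rows, parse types,
    and return self-describing records instead of raw text."""
    rows = csv.DictReader(io.StringIO(raw))
    return [{"station": r["station"], "date": r["date"],
             "pm25_ugm3": float(r["pm25"])}
            for r in rows if r["station"] == station]
```

Once behind such an interface, the legacy file can be served over the network and consumed by the same chained services as any natively wrapped source.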
  
'''Common Data Models for Multiple Disciplines'''. The major resistance in the horizontal diffusion of data through the cyberinfrastructure arises from the variety of physical and semantic data structures for earth science applications. Common data models are emerging that allow uniform queries and standardized, self-describing returned data types. Through the development, promotion, and extensive application of these common, cross-disciplinary data models, the AQO will contribute to interoperability within the broader earth science community.
  
 
===Impact on Air Quality Management===
 
  
'''Federal and State Air Quality Status and Planning'''. DataFed has already been used extensively by federal and state agencies to prepare status and trend analyses and to support various planning processes. The new air quality observatory, with its added meteorological data and tools, will serve these communities more effectively.
  
'''Exceptional Air Quality Events'''. AQ management is increasingly responsive to the detection, analysis, and management of short-term pollution events. The combined DataFed-Unidata system and the extended cyberinfrastructure of the AQO will support these activities with increased effectiveness, in particular through the just-in-time delivery of actionable knowledge to decision makers in AQ management organizations as well as to the general public.
  
'''Monitoring Network Assessment'''. A current revolution in remote and surface-based sensing of air pollutants is paralleled by a bold new National Ambient Air Monitoring Strategy<sup>1</sup>. The effectiveness of the new strategy will depend heavily on cyberinfrastructure for data collection, distribution, and analysis for a variety of applications. The cyberinfrastructure will also help assess the overall effectiveness of the monitoring system, which now includes data from multiple agencies, disciplines, media, and global communities.
  
===Impact on Atmospheric Science and Education===
  
'''Chemical Model Evaluation and Augmentation'''. Dynamic air quality models are driven by emissions data and/or scenarios, together with a module that combines air chemistry and meteorology, to calculate source-receptor relationships. The chemistry models themselves can be embedded in larger earth system models, and they can serve as inputs into models for health, ecological, and economic effects. The AQO will provide homogenized data resources for model validation and for assimilation into advanced models. A good example is the assimilation of satellite-based smoke emission estimates into near-term forecast models.

'''International Air Chemistry Collaboration'''.  A significant venue for advancing global atmospheric chemistry is through international collaborative projects that bring together the global research community to address complex new issues such as intercontinental pollutant transport. The AQO will be able to support these scientific projects using real-time global scale data resources, the user-configurable processing chains, and the user-defined virtual dashboards. (See collaboration letter from T. Keating). 
  
'''Near-Term Application of GEOSS'''. A deeper understanding of the earth system is now being pursued by a Global Earth Observation System of Systems (GEOSS, ref) which now includes the cooperation of over 60 nations. Air quality was identified as one of the near-term opportunities for demonstrating GEOSS through real examples. The AQO prototype can serve as a test bed for GEOSS demonstrations.
  
'''Education and Outreach Impact'''. Twice each year, Unidata will report on progress in implementing the AQO prototype at its User and Policy Committee meetings, encourage participation in the prototype, and solicit community input.  Unidata site surveys conducted in 2001 and 2002 showed the significance of Unidata's impact in the educational community: over 21,000 students per year use Unidata tools and data in classrooms and labs; more than 1,800 faculty and research staff use Unidata products in both teaching and research; over 130 Unidata-connected university programs influence over 40,000 K-12 students; nearly 900 teacher-training participants have used Unidata software; Unidata-based weather web sites at colleges and universities have over 400,000 hits per day. Many of the Unidata universities serve large numbers of students from underrepresented and minority groups. Besides the technological and research advances, an important outcome of this initiative will be bringing together the existing Unidata community and the DataFed community.
  
=<center>Project Description: Air Quality Observatory (AQO)</center>=  
  
 
==Introduction==
 
  
Research and management of air quality are addressed by several diverse communities. Pollutant emissions are determined by environmental engineers; atmospheric transport and removal processes are mainly in the domain of meteorologists; pollutant transformations are in the purview of atmospheric chemists and air-quality analysts; and the impacts of air pollution are assessed by health scientists, ecologists, and economists. Among the most dynamic and structurally complex cross-disciplinary links is the interaction between atmospheric chemistry and meteorology.
  
Command-and-control style air quality management has recently given way to a more participatory approach that includes the key stakeholders and encourages the application of more science-based ‘weight of evidence’ approaches to controls. The air quality regulations now emphasize short-term monitoring, and air quality goals are set to glide toward ‘natural background’ levels. The EPA has also adopted a new National Ambient Air Monitoring Strategy<sup>1</sup> to provide more relevant and timely data for these complex management needs. Real-time surface-based monitoring networks now routinely provide patterns of fine particles and ozone throughout the US. Satellite sensors with global coverage depict the pattern of haze, smoke, and dust in stunning detail. The emergence a new cooperative spirit to make effective use of these developments is exemplified in the Global Earth Observation System of Systems (with over 60 member nations), where air quality is identified as one of the near-term opportunities for collaborative data integration and analysis.  
  
The increased data supply and the demand for higher-grade AQ information products pose a grand challenge to both the environmental and information science communities. The current dedicated 'stove-pipe' information systems are unable to close the large information supply-demand gap. Fortunately, information technologies now offer outstanding opportunities to fulfill the information needs of the new, agile air quality management system. Terabytes of data from surface and remote sensors can now be stored, processed, and delivered in near-real time. Standardized computer-computer communication protocols and Service-Oriented Architectures (SOA) facilitate the flexible processing of raw data into high-grade 'actionable' knowledge. The instantaneous 'horizontal' diffusion of information via the Internet permits, '''in principle''', the delivery of the right information to the right people at the right place and time.
  
From the environmental science and engineering point of view, air quality is a highly multidisciplinary topic that spans air chemistry, atmospheric physics, meteorology, health science, ecology, and other fields. The range of data needed for analysis and interpretation is now much richer, including high-resolution satellite data on PM concentrations, emissions, meteorology, and effects. Meteorological and air quality simulation and forecast models also require more input verification and augmentation. The 'data deluge' problem is especially acute for analysts interested in aerosol pollution, since aerosols are inherently complex and so many different kinds of data are relevant.

The vision of this research is to improve air quality through a more supportive information infrastructure. The specific project objectives are to (1) improve the interoperability infrastructure that links the air quality and meteorological communities; (2) develop a prototype Air Quality Observatory (AQO); and (3) demonstrate the utility of the AQO through three cross-cutting use cases.
  
The AQ data need to be 'metabolized' into higher-grade knowledge by AQ analysis systems, but the value-adding chain that turns raw AQ data into 'actionable knowledge' for decision making consists of many steps, including human 'processors'. The data processing nodes are distributed among different organizations (EPA, NOAA, NASA, regional and state agencies, and academia), each organization being both a producer and a consumer of AQ-related information. The system must deliver relevant information to a broad range of stakeholders (federal, state, local, industry, international). Furthermore, the type of data, the level of aggregation and filtering, and the frequency at which sensory data are provided to the air quality management system differ greatly depending on whether the data are applied to policy, regulatory, or operational decisions. The infrastructure needs to support both real-time, 'just-in-time' data analysis and traditional in-depth post-analysis.

==Interoperability Infrastructure==
  
While the current AQ science and management systems do work, their efficiency and effectiveness are hampered by marginal support from a suitable information flow infrastructure, which today consists largely of dedicated 'stove-pipe' systems.

===Overcoming Semantic Impedance===
  
This project envisions innovative uses of Web services to enable new levels of cross-discipline interoperability among the tools, data sets, and data streams employed by members of the air-quality and meteorology communities. Two complementary approaches are proposed to reduce semantic impedance: polymorphism and standards-based interoperability.
  
[[Image:CommonDaataModelS.gif|left|frame|Figure 1a. Unidata's Common Data Model as a basis for abstract data typing.]]
  
In this project, polymorphism means Web services that (1) maintain sophisticated awareness of and sensitivity to the types of input information they receive and (2) take differing actions, dependent upon this data-type discernment. Such flexibility requires sufficient ''descriptive'' capacity to characterize data inputs and outputs at a high (i.e., semantic rather than syntactic) level. Such ''abstract data typing'' will be achieved in this project by relying on and augmenting the Common Data Model under development at Unidata, sketched in Figure 1a.
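The Common Data Model itself is defined by Unidata; purely as a loose illustration of abstract data typing (all class and field names below are invented for this sketch), a dataset description might carry enough semantic information for a service to discern its abstract type without touching the underlying storage:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of abstract data typing in the spirit of a common data
# model: a dataset is described by named dimensions and variables, so a
# service can reason about *what* the data is, not how it is stored.

@dataclass
class Variable:
    name: str
    units: str
    dimensions: tuple  # e.g. ("time", "lat", "lon")

@dataclass
class DatasetDescription:
    name: str
    dimensions: dict                      # dimension name -> length
    variables: list = field(default_factory=list)

    def is_grid(self) -> bool:
        # The abstract type is discerned from the description alone.
        return any({"lat", "lon"} <= set(v.dimensions) for v in self.variables)

pm25 = DatasetDescription(
    name="pm25_analysis",
    dimensions={"time": 24, "lat": 90, "lon": 180},
    variables=[Variable("pm25", "ug/m3", ("time", "lat", "lon"))],
)
print(pm25.is_grid())  # -> True; a grid-aware service would be selected here
```

A polymorphic rendering service could inspect such a description and dispatch to a map renderer for grids or a time-series renderer for station data.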
  
The power and simplicity potentially gained via polymorphism, combined with an appropriate set of data-transformation and -projection services, lie in the fact that for N types of data and M applications, the number of adapter components is roughly N+M, rather than the NxM that would be required with a many-to-many approach.  
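The N+M economy can be sketched with a toy example (all formats and function names here are invented): two readers and two writers meet at a common in-memory record form, so four adapters cover all four conversion paths.

```python
# Sketch: N format readers and M application writers meet at a common
# in-memory model (a list of dicts), so N+M adapters cover all N*M paths.

def read_csv(text):            # reader 1 -> common model
    header, *rows = [line.split(",") for line in text.strip().splitlines()]
    return [dict(zip(header, r)) for r in rows]

def read_kv(text):             # reader 2: "key=value;..." records
    return [dict(p.split("=") for p in line.split(";"))
            for line in text.strip().splitlines()]

def write_table(records):      # writer 1: plain-text report rows
    keys = sorted(records[0])
    return "\n".join(" ".join(r[k] for k in keys) for r in records)

def write_json_like(records):  # writer 2: serialized records
    return repr(records)

# Any reader output feeds any writer: 2 + 2 adapters, 4 conversion paths.
common = read_csv("site,pm25\nSTL,12\nCHI,9")
assert write_table(common) == write_table(read_kv("pm25=12;site=STL\npm25=9;site=CHI"))
```

Adding a third data format requires one new reader, not a new converter for every application.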
  
One application of this concept may be seen in Figure 1b, which illustrates an approach to complex query mediation. In this example, flexible connections are expedited by data-specific '''wrappers''' that homogenize the data into virtual data cubes. Data access '''adapters''' facilitate queries to the data cubes through multiple data access protocols.
  
[[Image:DataFed AdapterS.gif|left|frame|Figure 1b. Flexible access through data wrappers and query adapters addressing the virtual (not physical) data cube.]]

Carrying these principles a step further, key Web services can themselves be interoperable in a fashion that allows the composition or chaining of functions. Hence, with a minimum of semantic complexity (and software development), the capabilities of the AQO can grow in a combinatorial fashion, evolving and adapting to the changing needs of a user base that increases both in size and diversity.

===Current Infrastructure===

'''DataFed''' is an infrastructure for real-time integration and web-based delivery of distributed monitoring data. The federated data system (http://datafed.net) aims to support air quality management and science through more effective use of relevant data. Building on the emerging pattern of the Internet itself, DataFed assumes that datasets and new data processing services will continue to emerge spontaneously and autonomously on the Internet. Example data providers include the AIRNOW project, modeling centers, and the NASA Distributed Active Archive Centers (DAACs). DataFed is not a centrally planned and maintained data system but a facility for harnessing these emerging resources through powerful dynamic data-integration technologies and a collaborative federation philosophy.

The key roles of the federation infrastructure are to (1) facilitate registration of the distributed data in a user-accessible catalog; (2) ensure data interoperability based on the physical dimensions of space and time; and (3) provide a set of basic tools for data exploration and analysis. The federated datasets can be queried by simply specifying a latitude-longitude window for spatial views, a time range for time views, and so on. This universal access is accomplished by 'wrapping' the heterogeneous data, a process that turns data access into a standardized web service callable through well-defined Internet protocols. The result of this 'wrapping' process is an array of homogeneous, virtual datasets that can be queried by spatial and temporal attributes and processed into higher-grade data products.
+
The Service-Oriented Architecture (SOA) of DataFed is used to build web applications by connecting web-service components (e.g., services for data access, transformation, fusion, and rendering) in Lego-like assembly. The generic web tools created in this fashion include catalogs for data discovery, browsers for spatial-temporal exploration, multi-view consoles, animators, and multi-layer overlays (Figure 2).

A good illustration of the federated approach is the real-time AIRNOW dataset (Wayland and Dye, 2005). The AIRNOW data are collected from the States, aggregated by the federal EPA, and used to inform the public through the AIRNOW website. In addition, the hourly real-time O3 and PM2.5 data are made accessible to DataFed, where they are translated on the fly into a uniform format. Through the DataFed web interface, any user can access and display the AIRNOW data as time series and spatial maps, perform spatial-temporal filtering and aggregation, generate spatial and temporal overlays with other data layers, and incorporate these user-generated data views into their own web pages.

As of early 2005, over 100 distributed air quality-relevant datasets have been 'wrapped' into the federated virtual database. About a dozen satellite and surface datasets are delivered within a day of the observations, and two model outputs provide PM forecasts.

===Standards Based Interoperability===

Data and service interoperability among Air Quality Observatory (AQO) participants will be fostered through the implementation of accepted standards and protocols. (In the above diagrams, connectors will be standards-compliant to the greatest practical extent.) Adherence to standards will foster interoperability not only within the AQO but also among other observatories, cyberinfrastructure projects, and the emerging GEOSS efforts. Standards for finding, accessing, portraying, and processing geospatial data are defined by the Open Geospatial Consortium (OGC)<sup>2</sup>.

The most established OGC specification is the ''Web Map Server (WMS)'' for exchanging map images, but the ''Web Feature Service (WFS)'' and ''Web Coverage Service (WCS)'' are gaining wider implementation. WFS provides queries to discrete feature data output in Geography Markup Language (GML) format. WCS allows access to multi-dimensional data that represent coverages, such as grids. While these standards are rooted in the geospatial domain, many are designed to be extended to support non-geographic data "dimensions."

The success of the OGC specifications has led to efforts to develop interfaces between them and other common data access protocols (e.g., OPeNDAP, THREDDS). For example, the GALEON Interoperability Experiment, led by B. Domenico, is developing a WCS interface to netCDF datasets to "map" multi-dimensional atmospheric model outputs into the three-dimensional geospatial framework. AQO development will apply and extend the WCS specification to accommodate air quality and atmospheric data: advances from GALEON will be incorporated, and the WCS will be extended to provide a powerful interface for building multi-dimensional queries to monitoring, model, and satellite data.  
 
  
WCS service interaction proceeds as a client-server 'conversation': the client first requests the server's list of capabilities (e.g., its datasets). Based on the returned capabilities, the client requests a more detailed description of a desired dataset, including the choice of return data formats. In the third call, the client sends a specific data request, formulated in terms of a physical bounding box (X, Y, H) and a time range. Following such universal data requests, the server delivers the data in the desired format, so that further processing in the client environment can proceed seamlessly. Currently, the first two steps of the conversation are performed by humans, but it is hoped that new technologies will aid the (semi-)automatic execution of 'find' and 'bind' operations for distributed data and services.
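The three-step conversation can be sketched as key-value-pair request URLs in the style of WCS 1.0 KVP encoding (the endpoint is hypothetical, and for brevity this omits parameters a real server would require, such as VERSION and CRS):

```python
from urllib.parse import urlencode

# Sketch of the three-step WCS 'conversation' as KVP-encoded requests.
BASE = "http://example.org/wcs"  # hypothetical endpoint

def get_capabilities():
    # Step 1: ask the server what it offers.
    return BASE + "?" + urlencode({"SERVICE": "WCS", "REQUEST": "GetCapabilities"})

def describe_coverage(coverage):
    # Step 2: ask for a detailed description of one dataset.
    return BASE + "?" + urlencode({
        "SERVICE": "WCS", "REQUEST": "DescribeCoverage", "COVERAGE": coverage})

def get_coverage(coverage, bbox, time_range, fmt="NetCDF"):
    # Step 3: request a subset by bounding box and time range.
    # bbox: (min_lon, min_lat, max_lon, max_lat); time_range: "start/end"
    return BASE + "?" + urlencode({
        "SERVICE": "WCS", "REQUEST": "GetCoverage", "COVERAGE": coverage,
        "BBOX": ",".join(map(str, bbox)), "TIME": time_range, "FORMAT": fmt})

url = get_coverage("pm25", (-125, 24, -66, 50), "2005-07-01/2005-07-02")
print(url)
```

The uniformity of the request grammar is what lets one client address monitoring, model, and satellite servers alike.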
  
Other evolving and emerging specifications will be explored. OGC ''Catalog Services'' support publishing and searching collections of metadata and services. The OGC Sensor Web Enablement activity includes a number of emerging specifications, including ''Sensor Observation Service'' for retrieving observation datasets such as those from ground-based monitoring networks. The proposed ''Web Processing Service'' offers geospatial operations, including traditional GIS processing and spatial analysis algorithms, to clients across networks.
  
Interoperability testing and prototyping will be conducted through service compliance and standards gap analysis. Use of OGC specifications and interaction with OGC during the development of the AQO prototype will be facilitated by Northrop Grumman. Northrop Grumman acts as the Chair of the OGC Compliance and Interoperability Subcommittee and is nearing completion of an open source compliance engine soon to be adopted by the OGC. Compliance testing of the AQO prototype will ensure more complete interoperability and establish an AQO infrastructure that can be networked with other infrastructure developments.
  
==Prototype Air Quality Observatory==
  
Much of the effort in this project will be devoted to the Air Quality Observatory prototype. The functional design of the system will address responding to AQ events in real time, delivering needed information to decision makers (AQ managers, the public), and overcoming syntactic and semantic impedances. The cyberinfrastructure design of the AQO will accommodate a variety of data sources; respond to different types of AQ events; offer simplicity via a Common Data Model; foster interoperability through standard protocols; provide user-programmable components for data filtering, aggregation, and fusion; employ a service-oriented architecture for linking loosely coupled web services; and facilitate the creation of user-configurable monitoring consoles. The software application framework and the prototype will also serve as a test bed for advanced computer science ideas on robust distributed computing.
  
[[Image:DataFed_ValueChainS.gif|left|frame|Figure 2. End-to-end information system for air quality analysis]]
The prototype AQO will be an extension of the [http://DataFed.net DataFed] air quality information system, depicted in Figure 2. On the left side of the end-to-end system are the providers of heterogeneous, distributed air quality-related datasets, and on the far right are the users requiring reports and other high-level information products. The first processing stage is data homogenization into a common data model such as a multi-dimensional "cube," accomplished by data wrappers. Next, the cubed data are sliced, diced, and otherwise filtered or aggregated by simple tools for exploring a given air pollution situation. More elaborate processing, in step 3, occurs through user-programmable web applications. The final products, prepared by analysts, are the reports and summaries for consideration by decision makers.

The underlying infrastructure and tools for this type of analysis have been operational in DataFed since 2004. The system includes a catalog for publishing, finding, and describing datasets. There are data access wrapper classes for station-monitoring data, sequential images, grids, trajectories, and other data types. Wrappers for specific datasets can be prepared semi-automatically by filling out a registration form. Web services exist for gridding, aggregation, filtering, spatial and temporal rendering, view overlays, and annotations. User-defined data views are created by chaining web services for each data layer and then applying the overlay services for the creation of multi-layered views (e.g., maps, time charts).   
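The value-adding chain of access, filtering, aggregation, and rendering can be sketched as composed functions (the stage names, record layout, and values are invented; real DataFed services are web services, not local functions):

```python
# Hypothetical service chain in the spirit of the value-adding pipeline:
# access -> filter -> aggregate -> render, each stage a small 'service'
# that consumes and returns the common record form.

def access():                      # stage 1: homogenized records
    return [{"site": "STL", "pm25": 14.0}, {"site": "STL", "pm25": 18.0},
            {"site": "LAX", "pm25": 22.0}]

def filter_site(records, site):    # stage 2: slice/dice
    return [r for r in records if r["site"] == site]

def average(records, key):         # stage 3: aggregate
    return sum(r[key] for r in records) / len(records)

def render(value, label):          # stage 4: report line for the analyst
    return f"{label}: {value:.1f} ug/m3"

def chain(site):
    # The chain itself is user-configurable: stages can be swapped or added.
    return render(average(filter_site(access(), site), "pm25"), site)

print(chain("STL"))  # -> STL: 16.0 ug/m3
```

Chaining loosely coupled stages this way is what allows each data layer's view to be assembled independently and then overlaid.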
  
The [http://www.unidata.ucar.edu/ Unidata] system will be the mediator for passing meteorological data into the observatory. The enabling Unidata technologies include real-time data distribution, advanced push technologies, cross-disciplinary desktop visualization tools,  mechanisms for tracking events, standards-based remote access, and others. The prototype Observatory will be greater than the sum of its parts since it will enable access to data and functionality of both systems and it will foster the fusion of multi-disciplinary data and synergism of the combined functionality.  
  
The OGC specifications we expect to use in developing the AQO prototype are summarized in Table X.

{| border="1" cellpadding="5" cellspacing="0" align="center"
|+'''Table X: OGC Specifications'''
|-
! style="background:#efefef;" | Specification
! style="background:#efefef;" | Description
! style="background:#efefef;" | AQO Use
|-
|WMS
|Web Map Services support the creation, retrieval, and display of registered and superimposed map views of information that can come simultaneously from multiple sources.
|DataFed supports WMS both as a server and as a client. By serving WMS layers, other map viewers can access and interact with air quality data. As a WMS client, DataFed can make use of the numerous WMS servers available.
|-
|WFS
|The Web Feature Service defines interfaces for accessing discrete geospatial data encoded in GML.
|Within AQO, WFS will allow users to build queries to retrieve point monitoring data in table formats (GML, CSV, etc.).
|-
|WCS
|Web Coverage Services allow access to multi-dimensional data that represent coverages, such as grids and point data of spatially continuous phenomena.
|The early phases of AQO development will actively explore the use of the WCS specification. Advances made in GALEON will be incorporated, and the WCS will be extended to provide a powerful interface for building multi-dimensional queries to monitoring, model, and satellite data.
|-
|CSW
|Catalog services support publishing and searching collections of metadata, services, and related information objects. Metadata in catalogs represent resource characteristics that can be queried and presented for humans and software.
|The CSW specification offers an approach for the DataFed and THREDDS catalogs, which have not yet implemented the OGC catalog service, to interoperate at the catalog level by exchanging metadata. An AQO catalog service would also provide an interface to other catalogs that have implemented the specification, such as Geospatial One Stop.
|-
|SWE
|Specifications emerging from the Sensor Web Enablement activity include ''SensorML'' for describing instruments, ''Observations & Measurements'' for describing sensor data, ''Sensor Observation Service'' for retrieving data, and ''Sensor Planning Service'' for managing sensors.
|Ground-based environmental monitors are a key consideration in the development of these "sensor" specifications. Much of the data accessible through THREDDS and DataFed originate from monitoring networks; by building on the SWE data models for describing and exchanging sensor data, AQO could adopt the SWE specifications.
|-
|WPS
|The proposed Web Processing Service offers geospatial operations, including traditional GIS processing and spatial analysis algorithms, to clients across networks.
|The proposed AQO plans to include services not only for accessing and visualizing data but also for conducting data analysis. The ongoing WPS specification effort could serve as a useful resource for developing these analysis services and, if WPS is adopted as a specification, would provide another interoperable connection to the broader environmental and geospatial communities.
|}

While these standards are rooted in the geospatial domain, they are being extended to support non-spatial aspects of geospatial data. For example, the WFS revision working group is presently revising the specification to include support for time, and WCS is being revised to support coverage formats other than grids.

===AQO Architecture===
[[Image:Unidata DataFedLink IDDnodeS.gif|left|frame|Figure 3. Network architecture of the Air Quality Observatory]]
  
The architectural design of the Air Quality Observatory follows that of '''networks''', as illustrated in Figure 3. In fact, two networks will be operating. At the bottom is the Unidata LDM network, which pushes meteorological data among its nodes. This network will receive air quality data from DataFed through a dedicated LDM node; the benefit of this arrangement is that the university meteorology departments that are already members of the LDM network will receive real-time air quality data through their normal channels. The associated AQO network will disseminate air quality data and also ingest meteorological data delivered, in real time, via the LDM network. To the users of each network, the other network will be opaque. This two-way coupling will be a challenging experiment in linking the push-based messaging of LDM with the pull-driven data flow of the AQO.
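One way to picture the push-pull coupling is a bridge node that buffers pushed products for later pull-side queries. The sketch below is a deliberate simplification with invented names; the real LDM protocol and AQO query interfaces are far richer:

```python
from collections import deque

# Hypothetical bridge node: the push-based feed (LDM-style) delivers
# products to a callback, which buffers them; the pull-based side
# (AQO-style) queries the buffer on demand. Each network stays opaque
# to the users of the other.

class BridgeNode:
    def __init__(self):
        self.buffer = deque()

    def on_push(self, product):          # invoked by the push network
        self.buffer.append(product)

    def pull(self, kind):                # invoked by pull-side clients
        return [p for p in self.buffer if p["kind"] == kind]

node = BridgeNode()
node.on_push({"kind": "met", "field": "wind", "hour": 0})
node.on_push({"kind": "aq", "field": "pm25", "hour": 0})
print(len(node.pull("met")))  # -> 1
```

A production bridge would also need retention policies and delivery guarantees, which is precisely where the experiment's challenges lie.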
  
The AQO network will consist of nodes that can act both as servers and as clients of air-quality-relevant data. The nodes will belong to different organizations that serve a variety of communities and have different data needs. There is also considerable heterogeneity in how the nodes conduct their business internally: finding, accessing, transforming, and delivering data. In Figure 3, the Unidata and DataFed network nodes are shown in more detail.  
  
Other server nodes in the AQO network will include the NASA Goddard DAAC server, which provides access to an immense volume of satellite data in its "data pool." A WCS interface being implemented at that server will facilitate its joining the AQO network, and the managers of the Goddard DAAC node have also expressed strong interest in accessing air quality data from DataFed and meteorological data from Unidata. Similarly, there are good prospects of adding an EPA node to the AQO, which would serve air quality model forecasts as well as provide access to an array of monitoring data. The participation of these additional AQO nodes is to be arranged independently; however, this NSF AQO prototype project will provide the architectural framework for such networking, connectivity tools such as adapters, and a test bed for the expanding set of AQO nodes. Attracting new nodes and guiding their inclusion will be pursued by the project team through multiple venues: workgroups, ESIP Federation meetings, and training workshops.
  
'''''Standards gap analysis''''' To fully exploit the multi-dimensional nature of the data (x, y, z, time, multiple parameters), query statements and portrayal services need to support more than the traditional GIS map-focused perspective. Current OGC specifications lay a solid foundation upon which to add these capabilities. AQO development will extend and customize standards as needed and will forward these modifications to OGC for consideration in future versions of the specifications. The AQO development team has extensive experience in evaluating and enhancing geospatial standards; for example, Northrop Grumman is presently involved in a National Technology Alliance project testing and extending OGC specifications to more fully support the temporal dimension.  
The '''interoperability''' of these heterogeneous systems can be achieved through the adoption of a uniform query protocol such as that provided by the OGC Web Coverage Service. To achieve this level of interoperability, a significant role is assigned to the adapter services (Figure 3), which translate data formats and make other adjustments to the data syntax. These services can be provided by the data server, by the client, or by third-party mediators such as DataFed and Unidata. This type of "loose coupling" between clients and servers will allow the creation of dynamic, user-defined processing chains.  
  
The AQO will be connected through a mediated peer-to-peer network. The mediation will be performed by centralized or virtually centralized '''catalog services''', which enable the Publish and Find operations needed for loosely coupled, web-service-based networking. The Bind (i.e., data access) operation will be executed directly through a protocol-driven peer-to-peer approach. The Unidata THREDDS system performs the meteorological data brokering, while the DataFed Catalog serves the same purpose for the AQ data. Other candidate AQO nodes are currently brokered through their own catalog services, e.g., ECHO for the NASA Goddard DAAC. The unification (physical or virtual) of these distributed catalog services will be performed using the best available emerging web-service brokering technologies.
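The Publish-Find-Bind pattern can be sketched as follows (class, field, and endpoint names are invented; real catalogs such as THREDDS or ECHO carry much richer metadata):

```python
# Sketch of Publish-Find-Bind: providers publish dataset metadata to a
# catalog; clients find entries by attribute, then bind (access) directly
# against the returned endpoint, peer to peer.

class Catalog:
    def __init__(self):
        self.entries = []

    def publish(self, name, parameter, endpoint):
        self.entries.append({"name": name, "parameter": parameter,
                             "endpoint": endpoint})

    def find(self, parameter):
        return [e for e in self.entries if e["parameter"] == parameter]

catalog = Catalog()
catalog.publish("airnow_pm25", "pm25", "http://example.org/wcs/airnow")
catalog.publish("gfs_wind", "wind", "http://example.org/wcs/gfs")

hits = catalog.find("pm25")
print(hits[0]["endpoint"])  # the client would now bind to this service
```

Virtual unification of several such catalogs amounts to federating their `find` operations behind one interface.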
  
===Combined Air Quality - Meteorology Tools===
 
  
"Today the boundaries between all disciplines overlap and converge at an accelerating pace. Progress in one area seeds advances in another. New tools can serve many disciplines, and even accelerate interdisciplinary work." (Rita R. Colwell, Director, NSF, February 2003). Over the past decades, many in the AQ and the meteorological communities have had a desire to create new knowledge together, but with marginal success.
  
The AQO will offer the possibility of creating powerful new synergistic tools, such as the Combined Aerosol Trajectory Tool, CATT<sup>3</sup>.  In CATT, air quality and pollutant transport data are combined in an exploration tool that highlights the source regions from which high or low pollutant concentrations originate. Advanced data fusion algorithms applied in CATT have already contributed to the elucidation of unexpected aerosol source regions, such as the nitrate source over the Upper Midwest<sup>4</sup>.   
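The core idea behind CATT can be illustrated in greatly simplified form (function names, place names, and values below are invented; the real tool works with gridded back-trajectories and measured aerosol concentrations):

```python
# Simplified sketch of the CATT idea: each back-trajectory arriving at a
# receptor is tagged with the concentration measured on arrival; paths in
# the high-concentration class mark likely 'dirty' source directions.

def tag_trajectories(trajectories, concentrations, threshold):
    # trajectories: list of paths; concentrations: value measured at arrival
    tagged = []
    for path, conc in zip(trajectories, concentrations):
        label = "dirty" if conc >= threshold else "clean"
        tagged.append((label, path))
    return tagged

trajs = [["STL", "Ohio Valley"], ["STL", "Upper Midwest"], ["STL", "Plains"]]
sulfate = [18.0, 6.0, 4.0]       # hypothetical arrival concentrations
tagged = tag_trajectories(trajs, sulfate, threshold=10.0)
dirty = [path for label, path in tagged if label == "dirty"]
print(dirty)  # only the Ohio Valley pathway exceeds the threshold
```

The AQO's contribution would be supplying both inputs, trajectories from the meteorological side and concentrations from the AQ side, in real time through one interoperable interface.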
  
Figure 4 illustrates the current capabilities of the tool, which highlights in red the airmass trajectories that carried the highest concentrations of sulfate over the Eastern US on a particular day. Currently, the real-time application of such a diagnostic tool is not possible, since the necessary IT infrastructure for bringing together and fusing the AQ and transport data does not exist.
 
  
The novel technology development will focus on the framework for building distributed data analysis applications using loosely coupled web service component.  By these technologies, applications will be built by dynamically 'orchestrating' the information processing components. .....[to perform an array of user-defined processing applications]. The user-configurable applications will include Analysts Consoles for real-time monitoring and analysis of air pollution  events, workflow programs for more elaborate processing and tools for intelligent multi-sensory data fusion. Most of these technologies are already part of the CAPITA DataFed access and analysis system, developed through support from NSF, NASA, EPA and other agencies.  Similarly, and increasing array of web service components are now being offered various providers. However, a crucial missing piece is the testing of service interoperability  and the development of the necessary service-adapters that will facilitate interoperability and service chaining......  [more on evolvable, fault tolerance web apps ..from Ken Goldman here] [also link to Unidata LEAD project here]
 
  
[[[Ken Goldman added the following Friday, January 20, 2006:
+
[[Image:CATTTrajToolS.gif|left|frame|Figure 4. CATT diagnostic tool for dirty and clean air source regions.]] <br>
  
With the IT infrastructure of the Air Quality Observatory, which seamlessly links real-time AQ monitoring data to current and forecast meteorology, the CATT tool will be a significant addition to the toolbox of air quality analysts and meteorologists.
  
Other synergistic tools include Analysts Consoles, which consist of an array of maps and charts similar to the 'meteorological wall.' The view-based data processing and rendering system of DataFed is well suited to create such data views from distributed data streams. Early prototyping has shown that Virtual [http://www.datafed.net/consoles/realtime_consoles.asp?datetime=now-48&image_width=260&image_height=110 Analysts Consoles] are indeed feasible. However, considerable development is required to make the preparation of user-defined data views easy and fast.
  
===Distributed Computing Research and Applications===
  
The currently used simple Web service execution engine of DataFed will be extended with advanced capabilities arising from research in distributed computing.
  
The proposed Observatory will consist of a large collection of independent data services, as well as applications that operate on those data services in order to collect, analyze, and disseminate information.  Applications will be created by multiple organizations and will require interaction with data and applications created by other organizations.  Furthermore, individual applications will need to be modified over time without disruption of the other applications that depend upon them.  Computing resources will be provided by a multitude of independently administered hosts with independent security policies.
  
To support this high degree of interoperability, we plan to investigate multiple paradigms for the construction of interoperable distributed applications.  One promising approach involves a logical separation of data and process. Data repositories are seen as purely passive entities, which are acted upon by separately installed processes (transactions). In this way, independent organizations can contribute to the Observatory by installing into the infrastructure an interoperating mixture of data and processes.  This separation also simplifies construction and maintenance of applications that span multiple organizations, because the code for ongoing processes need not be integrated with the code for data servers that may be developed by others. We plan to leverage our ongoing research efforts on the creation of a shared infrastructure for the reliable execution of long-lived, distributed applications<sup>5</sup>.  Important design goals include decentralized control, support for installation, distributed execution, and evolution (live upgrades of the system and applications), as well as the aforementioned principles of polymorphism and high-level abstract data typing.
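The data/process separation described above can be sketched as follows; the class names, repository names, and transaction are hypothetical illustrations, not the actual middleware API:

```python
# Sketch of the data/process separation: repositories are purely passive
# stores, while separately installed transactions read from and write to
# them. All names here are hypothetical, chosen only for illustration.

class Repository:
    """Passive data store; it never initiates computation itself."""
    def __init__(self, name):
        self.name = name
        self.records = []

    def read_all(self):
        return list(self.records)

    def append(self, record):
        self.records.append(record)


def hourly_average_transaction(source, sink):
    """A transaction installed by one organization that bridges two others:
    it reads raw observations from `source` and writes a summary to `sink`."""
    values = [rec["value"] for rec in source.read_all()]
    if values:
        sink.append({"metric": "hourly_mean",
                     "value": sum(values) / len(values)})


# Three organizations contribute independently: one owns each repository,
# and a third installs the transaction that connects them.
raw_obs = Repository("org-A/raw-ozone")
summaries = Repository("org-C/summaries")
for v in (31.0, 35.0, 42.0):
    raw_obs.append({"value": v})

hourly_average_transaction(raw_obs, summaries)
print(summaries.read_all())
```

Because the transaction holds no state of its own, it can be upgraded or replaced without touching either repository.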
 
  
Building on prior experience in constructing distributed systems infrastructure<sup>6,7,8</sup>, work is underway to design and implement algorithms, protocols, and middleware for a practical shared computing infrastructure that is incrementally deployable.  The architecture will feature dedicated servers that communicate over the Internet, run on heterogeneous hosts, and are maintained and administered by independent service providers.  The execution model for this infrastructure captures a wide class of applications and supports integration of legacy systems, including applications written using SOAP.  The system configuration consists of an interconnected graph of data repositories and the workflow transactions that access them.  We will investigate analysis of the graph for "information traces" that verify information authenticity.  Each application will specify its own security policies and fault-tolerance requirements.  For critical applications, servers will participate in replica groups that use an efficient Byzantine agreement protocol to survive a bounded number of arbitrary failures.  Through this mechanism, the infrastructure will provide guarantees that once information enters the system, it will continue to be processed to completion, even though processing spans multiple applications and administrative domains.
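The fault bound behind such replica groups can be made concrete. The sizing rule n = 3f + 1 is the standard Byzantine agreement requirement (faulty replicas must number fewer than one third of the group); the functions below are an illustrative sketch, not part of the proposed middleware:

```python
# Replica-group sizing for Byzantine fault tolerance: to survive f
# arbitrary (Byzantine) faults, a group needs at least 3f + 1 replicas,
# i.e. the faulty replicas must be fewer than one third of the group.

def min_group_size(max_faults: int) -> int:
    """Smallest replica group tolerating `max_faults` Byzantine faults."""
    return 3 * max_faults + 1

def tolerated_faults(group_size: int) -> int:
    """Largest f satisfying f < group_size / 3."""
    return (group_size - 1) // 3

for f in range(4):
    print(f"f={f}: need {min_group_size(f)} replicas")
```

Note the practical consequence: a group of four replicas already survives one arbitrary failure, but six replicas still tolerate only one.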
  
The Observatory will benefit from this infrastructure in several ways, most notably ease of application deployment and interoperability.  In addition, the infrastructure will provide opportunities for reliable automated data monitoring.  For example, we anticipate that ongoing computations, such as those that perform “gridding” operations on generated data points, will be installed into the system.  Some of the data analysis currently performed on demand could be installed for periodic execution.  This will result in the availability of shared data repositories not only for raw data, but also for information that is the result of computational synthesis of data obtained from multiple data sources.  Researchers will be able to install applications into the infrastructure to make further use of these data sources, as well as the raw data sources, as input to their applications.
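This computation-graph structure also supports the "information traces" mentioned earlier: walking the graph backwards from a result reveals the raw sources and derivation steps behind it. A minimal sketch, with hypothetical node names:

```python
# Sketch of an "information trace": given a computation graph linking data
# repositories through transactions, walk backwards from a derived result
# to every raw source it depends on. Node names are hypothetical.

# graph: result -> list of its direct inputs
computation_graph = {
    "fused-analysis": ["gridded-pm25", "nam-winds"],
    "gridded-pm25": ["raw-airnow-points"],
    "raw-airnow-points": [],
    "nam-winds": [],
}

def information_trace(node, graph):
    """Return every upstream node contributing to `node`, depth-first."""
    seen, stack = [], [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph[current])
    return seen

trace = information_trace("fused-analysis", computation_graph)
print(trace)
```

Such a trace is what would let an analyst verify which monitoring network and which model run stand behind a synthesized data product.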
  
==Use Cases for Prototype Demonstration==
  
The proposed AQO prototype will be demonstrated through three cross-cutting use cases. The cases are true scientific challenges as well as contributions to AQ management, and they are areas of active research in atmospheric chemistry and transport at CAPITA and other groups. The cases will be end-to-end applications, connecting real data producers and mediators, as well as decision makers.
#'''Intercontinental Pollutant Transport'''.  Sahara dust over the Southeast, Asian dust and pollution over the Western US, and smoke from Canada and Mexico constitute significant contributions to air pollution. The problem is being addressed by the Task Force on Hemispheric Transport of Air Pollutants under the Convention on Long-range Transboundary Air Pollution. According to the Task Force Co-chair, "The proposed AQO will provide a powerful tool for integrating information for use by the Task Force." (See letter of collaboration by T. Keating)
#'''Exceptional Events'''. This use case will demonstrate real-time data access/processing/delivery/response as well as the capability of the AQO to trigger managerial responses. Exceptional AQ events include smoke from natural and some anthropogenic fires, windblown dust events, volcanoes, and long-range pollution transport from sources such as other continents. The AQO prototype system will provide real-time characterization and near-term forecasting that can serve as triggers for preventive action, such as public health warnings. Exceptional events are also important for long-term AQ management, since EE samples can be flagged for exclusion from the National Ambient Air Quality Standards calculations. AQO IT support will be provided to federal and state agencies. (See letter of collaboration by R. Poirot)
#'''Midwestern Nitrate Anomaly'''. Over the last few years, a mysterious pollutant source has caused pollutant levels to rise in excess of the AQ standard over much of the Upper Midwest in winter and spring. Nitrogen sources are suspected. The phenomenon has eluded detection and quantification since the area was not monitored, but recent intense sampling campaigns have implicated NOx and ammonia release from agricultural fields during and after snow melt. This AQO use case will integrate and facilitate access to data on soil quality, agricultural fertilizer concentration and flow, snow chemistry, surface meteorology, and air chemistry.
  
==Participant Qualifications==
  
DataFed is a community-supported effort led by CAPITA at Washington University. While the data integration Web services infrastructure was initially supported by specific information technology grants from NSF and NASA, the data resources are contributed by autonomous providers. The application of the federated data and tools is in the hands of users as part of specific projects. Just as data quality improves by passing through many hands, the analysis tools will also improve with use and feedback from data analysts. A partial list is at http://datafed.net/projects. Rudolf Husar is Professor of Mechanical Engineering and Director of the Center for Air Pollution Impact and Trend Analysis (CAPITA), and will lead the DataFed integration into the AQO. Dr. Husar brings 30+ years of experience in AQ analysis and environmental informatics to the AQO project.
  
Unidata is a diverse [http://www.unidata.ucar.edu/community community] of education and research institutions vested in the common goal of sharing data, tools to access the data, and software to use and visualize the data. Successful cooperative endeavors have been launched through Unidata and its member institutions to enrich the geosciences community. Unidata's governing committees facilitate consensus building for future directions for the program and establish standards of involvement for the community. Ben Domenico is Deputy Director of Unidata. Since its inception in 1983, Domenico has played a key role in turning Unidata into one of the  earliest examples of  successful cyberinfrastructure; providing data, tools, and general building support to the meteorological research and education community. 
  
The Goldman research group at Washington University has been working on JPie, a novel visual programming environment that supports live construction of running applications.  In addition, the group is currently working on algorithms and middleware for a fault-tolerant shared infrastructure that supports evolvable long-running distributed applications. Kenneth J. Goldman is an Associate Professor in the Washington University Department of Computer Science and Engineering and brings to this project over 20 years of research experience in the areas of distributed systems and programming environments.
  
Northrop Grumman Corporation (NG) contributes expertise in development and implementation of geospatial applications, architectures and enterprise-wide solutions.  NG is a Principal Member of the OGC and has been influential in defining an interoperable open infrastructure that is shared across user communities. Through its development of OGC’s Compliance Testing Tools, NG leads the geospatial community in insight into service providers’ and GIS vendors’ compliance to OGC standards. Stefan Falke, Systems Engineer with NG, will lead the NG team. He brings experience in applying OGC-based services (including DataFed's) to NG's projects. As a part-time research professor of Environmental Engineering at Washington University, he is involved in air quality cyberinfrastructure projects for satellite and emissions data. Dr. Falke is co-lead for the ESIP Air Quality Cluster.
  
==Workplan==
   
The AQO project will be led by Rudolf Husar (CAPITA) and Ben Domenico (Unidata). CAPITA and Unidata, with their rich history and the experience of their staff, will be the pillars of the AQO. The active members of the AQO network will come from the ranks of data providers, data users, and value-adding mediator-analysts. The latter group will consist of existing AQ research projects funded by EPA, NASA, NOAA, and NSF that have data, tools, or expertise to contribute to the shared AQO pool. Ken Goldman will use the AQO as a testbed for advanced distributed application frameworks. Stefan Falke will coordinate the OGC interoperability compliance and geospatial standards gap analysis.
  
Coordination among activities and organizations will be fostered through a combination of virtual and physical interaction. Approximately every six months the core AQO team will meet as part of the Earth Science Information Partners (ESIP)/Air Quality Cluster. The ESIP Federation and AQ Cluster provide an inter-agency and inter-organization environment. General team communication will be handled through a shared Wiki site.
  
=<center>Collaborators</center>=
  
The Air Quality Observatory will be a community activity involving many of the key data providers, users and researchers. Four specific collaborations have been singled out to indicate the nature of this collaborative activity.  
  
'''Terry Keating''', Co-Chair of the Task Force on Hemispheric Transport of Air Pollutants under the Convention on Long-range Transboundary Air Pollution. Dr. Keating and the Task Force will collaborate with the Observatory regarding Intercontinental Pollutant Transport issues.
  
'''Rich Poirot''', Co-Chair of the National Inter-RPO Monitoring and Data Analysis Workgroup will collaborate with the AQO team in the area of regional haze and fine particle pollution.
 
 
  
'''Neil Frank''', Senior Air Quality Data Advisor, USEPA, Emissions, Monitoring and Analysis Division will continue long-term collaboration of his office with DataFed in the area of regulatory decision making.  
 
'''Richard Wirtz''', Executive Director of the Foundation for Earth Science, will collaborate with the AQO project by providing support through the ESIP Air Quality Cluster.
  
=References Cited=
  
1. National Ambient Air Monitoring Strategy, Draft, OAQPS. USEPA, December 2005, http://www.epa.gov/ttn/amtic/files/ambient/monitorstrat/naamstrat2005.pdf
  
2. Buehler, K. and McKee, L. The Open GIS Guide: Introduction to Interoperable Geoprocessing and the OpenGIS Specification, Waltham, MA (1998).
 
  
3. Husar, R. B.; Höijärvi, K.; Poirot, R. L.; Kayin, S.; Gebhart, K. A.; Schichtel, B. A. and Malm, W. C. Combined Aerosol Trajectory Tool, CATT: Status Report on Tools Development and Use. Paper #97, A&WMA Specialty Conference on Regional and Global Perspectives on Haze: Causes, Consequences and Controversies, Asheville, NC, October 25-29, 2004.
  
4. Husar, R.; Poirot, R. DataFed and Fastnet: Tools for Agile Air Quality Analysis; Environmental Manager 2005, September, 39-41
  
5. Thorvaldsson, H. D.; Goldman, K. J. "Architecture and Execution Model for a Survivable Workflow Transaction Infrastructure." Washington University Department of Computer Science and Engineering, Technical Report TR-2005-61, December 2005.
  
6. Goldman, K. J.; Swaminathan, B.; McCartney, T. P.; Anderson, M. D. and Sethuraman, R. The Programmers' Playground: I/O Abstraction for User-Configurable Distributed Applications. IEEE Transactions on Software Engineering, 21(9):735-746, September 1995.
 
 
7. Pallemulle, S. L.; Goldman, K. J. and Morgan, B. E. Supporting Live Development of SOAP and CORBA Servers. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05), pages 553-562, Washington DC, 2005.

8. Parwatikar, J. K.; Engebretson, A. M.; McCartney, T. P.; Dehart, J. D. and Goldman, K. J. Vaudeville: A High Performance, Voice Activated Teleconferencing Application. Multimedia Tools and Applications, 10(1): 5-22, January 2000.
[[Category:AQOProposal]][[PS::Done]][[PS::Atomic]][[date::2006-01-25]]

Latest revision as of 19:50, June 2, 2012






The goal of this Air Quality Observatory (AQO) project is to enhance the linkage between these communities through effective cyberinfrastructure. In air quality, a recently developed infrastructure (DataFed) provides access to over 30 distributed data sets (emissions, concentrations, depositions), along with web-based processing capabilities that serve both research and management needs. In meteorology, long-standing infrastructure developed by Unidata supports a large community of researchers, educators, and decision-makers needing observational data. These two cyberinfrastructure components are most useful at present within the scopes of their respective communities, and great opportunity exists for widening their combined effectiveness.

Intellectual Merit. The overarching contribution of this project is to advance cross-community interoperability among cyberinfrastructure components that are critical in multidisciplinary environmental observation. Interoperability topics will include: access methods for heterogeneous data sources; adapters and transformers to aid in connecting Web services from distinct communities; and designing service-oriented architectures for simplicity and extensibility in the presence of semantic impedance. Leveraging the current capabilities of Unidata and DataFed, new understandings will be gained about the benefits of abstract data typing, service adapters, and polymorphism in such architectures.

The framework for these advances will be an end-to-end prototype whose functional design will enable intellectual advances in the application domain (as well as in cyberinfrastructure). Several use cases will be explored that require cross-community sharing of tools and data, including aggregations of observed and simulated information from multiple sources. An unusual aspect of the AQO will be the extent to which it joins Unidata's push-style capabilities for handling real-time, asynchronous data streams with more traditional pull-style Web services.

Broader Impact. The AQO will support diverse learners within and beyond the DataFed and Unidata communities. Lessons learned in this project will inform builders of other cross-disciplinary cyberinfrastructure, especially those facing semantic impedances and the challenges of real-time streams. Finally, the observatory will support many users, such as federal and state AQ managers performing status and trend analyses, managing exceptional events, or evaluating monitoring networks.

The AQO will leverage, augment, and integrate DataFed and Unidata in a prototype cyberinfrastructure component that better serves researchers, decision-makers, teachers and students of air quality, meteorology, and related fields by overcoming key difficulties. The research team from Washington University and Unidata has decades of experience in developing information technologies and applying them to air-quality analysis, meteorology, and environmental engineering.

Past NSF Projects

Rudolf Husar's "Collaboration Through Virtual Workgroups" project (NSF ATM-ITR small grant #0113868, $445,768, 9/01/01 to 8/31/04, ITR 2001-2004) found that web services are mature enough for integration of distributed, heterogeneous, and autonomous datasets into homogeneous, federated datasets. The developed DataFed (http://datafed.net) tools allowed real-time, 'just-in-time' data analysis for the characterization of major air pollution events.

Ben Domenico's "Thematic Real-time Environmental Distributed Data (THREDDS)" project (DUE-0121623, $900,001, 10/01/2001 to 09/30/2003) was a collaborative initiative to build a software infrastructure to provide students, educators, and researchers with Internet access to large collections of real-time and archived environmental datasets from distributed servers. Unidata is also nearing completion on "THREDDS Second Generation" (DUE-0333600, $554,993, 10/01/03 to 09/30/06) which focuses on integrating GIS information via Open GIS protocols. Additional THREDDS details can be found here: http://www.unidata.ucar.edu/projects/THREDDS/. Dr. Domenico has been involved in several other successful efforts: "Unidata 2008: Shaping the Future of Data Use in the Geosciences"(ATM-0317610, $22,800,000, 10/01/03 to 09/30/08); "Linked Environments for Atmospheric Discovery (LEAD)" (ATM-0331587, $11,250,000, 10/01/03 to 09/30/08); and "DLESE Data Services: Facilitating the Development and Effective Use of Earth System Science Data in Education" (EAR-0305045, $390,985, 09/30/03 to 08/31/06).

Ken Goldman's "Interactive Learning Environment for Introductory Computer Science" project (EIA-0305954, $514,996.00, 8/15/03 to 7/31/06) involves the development of JPie, an interactive programming environment designed to make object-oriented software development accessible to a wider audience. Programs are constructed by graphical manipulation of functional components so inexperienced programmers can achieve early success without the steep learning curve that precedes development in a traditional textual language. Recent JPie extensions support dynamic server interface changes, with support for both SOAP and CORBA.

Intellectual and Technical Merit

The overarching technological contribution of this project is to advance cross-community interoperability among cyberinfrastructure components that are critical in contemporary environmental observation. The tangible outcomes will include a prototype observatory that provides genuine end-to-end services needed by two distinct communities of users and simultaneously advances the state of the art in designing observatories for multidisciplinary communities of users. Each of the communities participating in this study has an operational system that will be leveraged to create the prototype, but the marriage of their systems presents significant design challenges within which to study important interoperability questions.

Specifically, the joining of the air-quality and meteorology communities will require (1) effective global access to distinct but overlapping, heterogeneous data streams and data sets; (2) use of these data in distinct but overlapping sets of tools and services, to meet complex needs for analysis, synthesis, display, and decision support; and (3) new combinations of these data and (chained) services such as can be achieved only in a distributed, service-oriented architecture that exhibits excellence of functional and technical design, including means to overcome the semantic differences that naturally arise when communities develop with distinct motivations and interests.

Interoperable Data Access Methods. The contributions of this research will advance the state of interdisciplinary data use by achieving effective global access to heterogeneous data streams and data sets. Specific emphasis will be placed on mediating access to diverse types of data from remote and in-situ observing systems, combined with simulated data from sophisticated, operational forecast models. The observational sources will include satellite- and surface-based air quality and meteorological measurements, emission inventories, and related data from the remarkably rich arrays of resources presently available via Unidata and DataFed.

The tangible output of this research component will be an extended Common Data Model (including the associated metadata structures), realized in the form of (interoperable) Web services that will meet the data-access needs of both communities and that can become generally accepted standards.

Interoperable Data-Processing Services. Interoperability among Web services at the physical and syntactic level is, of course, assured by the underlying Internet protocols, though semantic interoperability is not. In the case of SOAP-based services, WSDL descriptions permit syntax checking, but higher-level meanings of data exchange are inadequately described in the schema. Hence, SOAP-based services developed by different organizations for different purposes are rarely interoperable in a meaningful way. The research contribution on this topic will include--within key contexts for environmental data use--development of Web service adapters that provide loosely coupled, snap-together interfaces for Web services created autonomously in distinct communities.

Distributed Applications. The Service Oriented Architecture (SOA) movement, among others, indicates ongoing intellectual interest in the (unmet) challenges of distributed computing. Our team’s experience with SOA in recent years has demonstrated that useful applications can be built via Web-service chaining, but our current prototypes--including DataFed--operate within a context where service interoperability is assured by internal, community-specific conventions. As the AQO evolves into a fully networked system, distributed applications built upon its infrastructure will need to be robust, evolvable, and linkable to the system of systems that is the Web. The project approach to these goals will be based on abstract data typing, polymorphism, and standards-based service adaptors.

Broader Impacts

The Air Quality Observatory, through its technologies and applications, will have broader impact on the evolving cyberinfrastructure, air quality management, and atmospheric sciences.

Impact on Cyberinfrastructure

Infusion of Web Service Technologies. The agility and responsiveness of the evolving cyberinfrastructure is accomplished by loose coupling and user-driven dynamic rearrangement of its components. Service orientation and Web services are the key architectural and technological features of the AQO. The proposal team has applied these paradigms for several years, generating applications and best-practice procedures. Through collaborative activities, multi-agency workgroups, and formal publications, the Web-service-based approach will be infused into the broader earth science cyberinfrastructure.

Technologies for Wrapping Legacy Datasets. The inclusion of legacy datasets in the cyberinfrastructure necessitates wrapping them with formal interfaces for programmatic access, i.e. turning data into services. In the course of developing these interfaces to a wide variety of air quality, meteorological, and other datasets, the proposal team has developed an array of data wrappers, procedures, and tools. Wide distribution of these wrappers will assure the rapid growth of the content shareable through the cyberinfrastructure and the science and societal benefits resulting from the “network effect”.

Common Data Models for Multiple Disciplines. The major impediment to the horizontal diffusion of data through the cyberinfrastructure is the variety of physical and semantic data structures used in earth science applications. Common data models are emerging that allow uniform queries and standardized, self-describing returned data types. Through the development, promotion, and extensive application of these common, cross-disciplinary data models, the AQO will contribute to interoperability within the broader earth science community.

Impact on Air Quality Management

Federal and State Air Quality Status and Planning. DataFed has already been used extensively by federal and state agencies to prepare status and trend analyses and to support various planning processes. The new air quality observatory, with the added meteorological data and tools, will serve these communities more effectively.

Exceptional Air Quality Events. AQ management is increasingly focused on the detection, analysis, and management of short-term pollution events. The combined DataFed-Unidata system and the extended cyberinfrastructure of the AQO will support these activities with increased effectiveness, through the just-in-time delivery of actionable knowledge to decision makers in AQ management organizations as well as to the general public.

Monitoring Network Assessment. A current revolution in remote and surface based sensing of air pollutants is paralleled by a bold new National Ambient Air Monitoring Strategy 1. The effectiveness of the new strategy will depend heavily on cyberinfrastructure for data collection, distribution, and analysis for a variety of applications. The cyberinfrastructure will also help assess the overall effectiveness of the monitoring system that now includes data from multiple agencies, disciplines, media, and global communities.

Impact on Atmospheric Science and Education

Chemical Model Evaluation and Augmentation. Dynamic air quality models are driven by emissions data and/or scenarios and a module that includes air chemistry and meteorology to calculate the source receptor relationships. The chemistry models themselves can be embedded in larger earth system models, and they can serve as inputs into models for health, ecological and economic effects. The AQO will provide homogenized data resources for model validation and also for assimilation into advanced models.

International Air Chemistry Collaboration. A significant venue for advancing global atmospheric chemistry is through international collaborative projects that bring together the global research community to address complex new issues such as intercontinental pollutant transport. The AQO will be able to support these scientific projects using real-time global scale data resources, the user-configurable processing chains, and the user-defined virtual dashboards. (See collaboration letter from T. Keating).

Near-Term Application of GEOSS. A deeper understanding of the earth system is now being pursued by a Global Earth Observation System of Systems (GEOSS, ref) which now includes the cooperation of over 60 nations. Air quality was identified as one of the near-term opportunities for demonstrating GEOSS through real examples. The AQO prototype can serve as a test bed for GEOSS demonstrations.

Education and Outreach Impact. Twice each year, Unidata will report on progress in implementing the AQO prototype at its User and Policy Committee meetings, encourage participation in the prototype, and solicit community input. Unidata site surveys conducted in 2001 and 2002 showed the significance of Unidata's impact in the educational community: over 21,000 students per year use Unidata tools and data in classrooms and labs; more than 1,800 faculty and research staff use Unidata products in both teaching and research; over 130 Unidata-connected university programs influence over 40,000 K-12 students; nearly 900 teacher-training participants have used Unidata software; Unidata-based weather web sites at colleges and universities have over 400,000 hits per day. Many of the Unidata universities serve large numbers of students from underrepresented and minority groups. Besides the technological and research advances, an important outcome of this initiative will be bringing together the existing Unidata community and the DataFed community.

Project Description: Air Quality Observatory (AQO)

Introduction

Research and management of air quality is addressed by several diverse communities. Pollutant emissions are determined by environmental engineers; atmospheric transport and removal processes are mainly in the domain of meteorologists; pollutant transformations are in the purview of atmospheric chemists and air-quality analysts; and the impacts of air pollution are assessed by health scientists, ecologists, and economists. Among the most dynamic and structurally complex cross-disciplinary links is the interaction between atmospheric chemistry and meteorology.

Command-and-control style air quality management has recently given way to a more participatory approach that includes the key stakeholders and encourages the application of more science-based ‘weight of evidence’ approaches to controls. The air quality regulations now emphasize short-term monitoring, and air quality goals are set to glide toward ‘natural background’ levels. The EPA has also adopted a new National Ambient Air Monitoring Strategy1 to provide more relevant and timely data for these complex management needs. Real-time surface-based monitoring networks now routinely provide patterns of fine particles and ozone throughout the US. Satellite sensors with global coverage depict the pattern of haze, smoke, and dust in stunning detail. The emergence of a new cooperative spirit to make effective use of these developments is exemplified in the Global Earth Observation System of Systems (with over 60 member nations), where air quality is identified as one of the near-term opportunities for collaborative data integration and analysis.

The increased data supply and the demand for higher grade AQ information products is a grand challenge for both the environmental and information science communities. The current dedicated ‘stove-pipe’ information systems are unable to close the huge information supply-demand gap. Fortunately, new information technologies now offer outstanding opportunities. Gigabytes of data can now be stored, processed, and delivered in near-real time. Standardized computer-computer communication protocols and Service-Oriented Architectures (SOA) facilitate the flexible processing of raw data into high-grade ‘actionable’ knowledge. The instantaneous ‘horizontal’ diffusion of information via the Internet permits, in principle, the delivery of the right information to the right people at the right place and time.

The vision of this research is to improve air quality through a more supportive information infrastructure. The specific project objectives are to (1) improve the interoperability infrastructure that links the air quality and meteorological communities; (2) develop a prototype Air Quality Observatory (AQO); and (3) demonstrate the utility of the AQO through three cross-cutting use cases.

Interoperability Infrastructure

Overcoming Semantic Impedance

This project envisions innovative uses of Web services to enable new levels of cross-discipline interoperability among the tools, data sets, and data streams employed by members of the air-quality and meteorology communities. Two complementary approaches are proposed to reduce semantic impedance: polymorphism and standards-based interoperability.

Figure 1a. Unidata's Common Data Model as a basis for abstract data typing.

In this project, polymorphism means Web services that (1) maintain sophisticated awareness of and sensitivity to the types of input information they receive and (2) take differing actions, dependent upon this data-type discernment. Such flexibility requires sufficient descriptive capacity to characterize data inputs and outputs at a high (i.e., semantic rather than syntactic) level. Such abstract data typing will be achieved in this project by relying on and augmenting the Common Data Model, sketched as shown in Figure 1a, under development in Unidata.

The power and simplicity potentially gained via polymorphism, combined with an appropriate set of data-transformation and -projection services, lies in the fact that for N types of data and M applications, roughly N+M adaptor components are needed, rather than the N×M required by a many-to-many approach.
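The N+M adapter arithmetic can be sketched concretely. In the toy below (the format names, readers, and renderers are hypothetical stand-ins, not actual DataFed or Unidata components), each of N source formats gets one reader into a common data model and each of M applications gets one renderer out of it, so any source reaches any application through N+M components:

```python
# Sketch of the N+M adapter pattern around a common data model.
# All names are illustrative, not actual DataFed/Unidata APIs.
from dataclasses import dataclass

@dataclass
class DataCube:
    """Minimal stand-in for a Common Data Model object."""
    variable: str
    values: list

# N readers: source format -> common model (one per format)
READERS = {
    "netcdf": lambda raw: DataCube("ozone", list(raw)),
    "csv":    lambda raw: DataCube("ozone", [float(v) for v in raw.split(",")]),
}

# M renderers: common model -> application output (one per application)
RENDERERS = {
    "map":   lambda cube: f"map of {cube.variable} ({len(cube.values)} points)",
    "chart": lambda cube: f"chart of {cube.variable} ({len(cube.values)} points)",
}

def convert(source_fmt, raw, target):
    """Any source reaches any application through the common model."""
    return RENDERERS[target](READERS[source_fmt](raw))
```

Adding a new source format then requires one new reader rather than one converter per application, and likewise for a new application.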

One application of this concept may be seen in Figure 1b, which illustrates an approach to complex query mediation. In this example, flexible connections are expedited by data-specific wrappers that homogenize the data into virtual data cubes. Data access adapters facilitate queries to the data cubes through multiple data access protocols.

Figure 1b. Flexible access through data wrappers and query adapters addressing the virtual (not physical) data cube

Carrying these principles a step further, key Web services can themselves be interoperable in a fashion that allows the composition or chaining of functions. Hence, with a minimum of semantic complexity (and software development) the capabilities of the AQO can grow in a combinatorial fashion, evolving and adapting to the changing needs of a user base that increases both in size and diversity.

Standards Based Interoperability

Data and service interoperability among Air Quality Observatory (AQO) participants will be fostered through the implementation of accepted standards and protocols. (In the above diagrams, connectors will be standards-compliant to the greatest practical extent.) Adherence to standards will foster interoperability not only within the AQO but also among other observatories, cyberinfrastructure projects, and the emerging GEOSS efforts. Standards for finding, accessing, portraying, and processing geospatial data are defined by the Open Geospatial Consortium (OGC)2.

The most established OGC specification is the Web Map Server (WMS) for exchanging map images, but the Web Feature Service (WFS) and Web Coverage Service (WCS) are gaining wider implementation. WFS provides queries to discrete feature data output in Geography Markup Language (GML) format. WCS allows access to multi-dimensional data that represent coverages, such as grids. While these standards are based on the geospatial domain, many are designed to be extended to support non-geographic data "dimensions."

The success of the OGC specifications has led to efforts to develop interfaces between them and other common data access protocols (e.g. OPeNDAP, THREDDS). For example, the GALEON Interoperability Experiment, led by B. Domenico, is developing a WCS interface to netCDF datasets to "map" the multi-dimensional atmospheric model outputs into the three-dimensional geospatial framework. AQO development will apply and extend the WCS specification for accommodating air quality and atmospheric data. Advances from GALEON will be incorporated, and the WCS will be extended to provide a powerful interface for building multi-dimensional queries to monitoring, model, and satellite data.

WCS interaction takes the form of a client-server 'conversation.' The client first asks the server for its list of capabilities (e.g. available datasets). Based on the returned capabilities, the client requests a more detailed description of a desired dataset, including the choice of return data formats. In the third call, the client sends a specific data request, formulated in terms of a physical bounding box (X,Y,H) and a time range. Following such a universal data request, the server delivers the data in the desired format, so that further processing in the client environment can proceed seamlessly. Currently the first two steps of the conversation are performed by humans, but it is hoped that new technologies will aid the (semi-)automatic execution of 'find' and 'bind' operations for distributed data and services.
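The three-step conversation can be illustrated by the GET requests a WCS 1.0 client would issue; the server URL and coverage name below are hypothetical placeholders, not actual AQO endpoints:

```python
# Sketch of the three-step WCS 1.0 'conversation' as client GET requests.
# BASE and the coverage name are hypothetical placeholders.
from urllib.parse import urlencode

BASE = "http://example.org/wcs"  # hypothetical AQO node

def wcs_url(request, **params):
    """Build a WCS 1.0 request URL with the common query parameters."""
    query = {"SERVICE": "WCS", "VERSION": "1.0.0", "REQUEST": request, **params}
    return BASE + "?" + urlencode(query)

# Step 1: what coverages (datasets) does the server offer?
caps = wcs_url("GetCapabilities")

# Step 2: describe one coverage, including available output formats.
desc = wcs_url("DescribeCoverage", COVERAGE="pm25_surface")

# Step 3: request the data itself by bounding box and time range.
data = wcs_url(
    "GetCoverage",
    COVERAGE="pm25_surface",
    BBOX="-125,24,-66,50",          # lon/lat box over the continental US
    TIME="2006-07-01/2006-07-03",
    FORMAT="NetCDF",
)
```

Only the third request transfers data; the first two are the 'find' steps that the text notes are currently performed by humans.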

Other evolving and emerging specifications will be explored. OGC Catalog Services support publishing and searching collections of metadata and services. The OGC Sensor Web Enablement activity includes a number of emerging specifications, including Sensor Observation Service for retrieving observation datasets such as those from ground-based monitoring networks. The proposed Web Processing Service offers geospatial operations, including traditional GIS processing and spatial analysis algorithms, to clients across networks.

Interoperability testing and prototyping will be conducted through service compliance and standards gap analysis. Use of OGC specifications and interaction with OGC during the development of the AQO prototype will be facilitated by Northrop Grumman. Northrop Grumman acts as the Chair of the OGC Compliance and Interoperability Subcommittee and is nearing completion of an open source compliance engine soon to be adopted by the OGC. Compliance testing of the AQO prototype will ensure more complete interoperability and establish an AQO infrastructure that can be networked with other infrastructure developments.

Prototype Air Quality Observatory

Much of the effort in this project will be devoted to the Air Quality Observatory prototype. The functional design of the system will incorporate real-time response to AQ events, delivery of needed information to decision makers (AQ managers, the public), and means of overcoming syntactic and semantic impedances. The cyberinfrastructure design of the AQO will accommodate a variety of data sources; respond to different types of AQ events; offer simplicity via a Common Data Model; foster interoperability through standard protocols; provide user-programmable components for data filtering, aggregation, and fusion; employ a service-oriented architecture for linking loosely coupled web services; and facilitate the creation of user-configurable monitoring consoles. The software application framework and the prototype will also serve as a test bed for advanced computer science ideas on robust distributed computing.

Figure 2. End-to-end information system for air quality analysis

The prototype AQO will be an extension of the DataFed air quality information system, depicted in Figure 2. On the left side of the end-to-end system are the providers of heterogeneous, distributed, air-quality-related datasets, and on the far right are the users requiring reports and other high-level information products. The first processing stage is data homogenization into a common data model such as a multi-dimensional “cube,” accomplished by data wrappers. Next, the cubed data are sliced, diced, and otherwise filtered or aggregated by simple tools for exploring a given air pollution situation. More elaborate processing, in step 3, occurs through user-programmable web applications. The final products (prepared by analysts) are the reports and summaries for consideration by decision-makers.

The underlying infrastructure and tools for this type of analysis have been operational in DataFed since 2004. The system includes a catalog for publishing, finding, and describing datasets. There are data-access wrapper classes for station-monitoring data, sequential images, grids, trajectories, and other data types. Wrappers for specific datasets can be prepared semi-automatically by filling out a registration form. Web services exist for gridding, aggregation, filtering, spatial and temporal rendering, view overlays, and annotations. User-defined data views are created by chaining web services for each data layer and then passing the layers to the overlay services for the creation of multi-layered views (e.g. maps, time charts).
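The chaining of per-layer processing services into a multi-layered view can be sketched as follows; the service names, data values, and text rendering are illustrative stand-ins, not the actual DataFed services:

```python
# Sketch of DataFed-style view construction by service chaining.
# Each stage consumes the previous stage's output; an overlay
# combines the per-layer chains. All names/data are illustrative.

def time_filter(records, day):
    """Select observations for one day (a simple filtering service)."""
    return [r for r in records if r["day"] == day]

def spatial_average(records):
    """Aggregate a set of station values (an aggregation service)."""
    vals = [r["value"] for r in records]
    return sum(vals) / len(vals) if vals else None

def render_layer(name, value):
    """Render one data layer (stand-in for a map/chart renderer)."""
    return f"{name}: {value:.1f}"

def overlay(*layers):
    """Combine rendered layers into one multi-layered view."""
    return " | ".join(layers)

obs = [
    {"day": 1, "value": 12.0}, {"day": 1, "value": 18.0},
    {"day": 2, "value": 30.0},
]

# One processing chain per layer, then an overlay of the layers.
sulfate_layer = render_layer("sulfate", spatial_average(time_filter(obs, 1)))
view = overlay(sulfate_layer, "state boundaries")
```

Because each stage has the same shape (data in, data out), users can re-order or extend a chain without modifying the services themselves.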

The Unidata system will be the mediator for passing meteorological data into the observatory. The enabling Unidata technologies include real-time data distribution, advanced push technologies, cross-disciplinary desktop visualization tools, mechanisms for tracking events, standards-based remote access, and others. The prototype Observatory will be greater than the sum of its parts since it will enable access to data and functionality of both systems and it will foster the fusion of multi-disciplinary data and synergism of the combined functionality.


AQO Architecture

Figure 3. Network architecture of the Air Quality Observatory

The architectural design of the Air Quality Observatory follows that of networks, as illustrated in Figure 3. In fact, there will be two networks operating. At the bottom is the Unidata LDM network that pushes meteorological data among its nodes. This network will receive air quality data from DataFed through a dedicated LDM node. The benefit of this arrangement is that the university meteorology departments that are already members of the LDM network will receive real-time air quality data through their normal channels. The associated AQO network will disseminate air quality data and also ingest meteorological data delivered, in real time, via the LDM network. To the users of each network, the other network will be opaque. This two-way coupling of the two networks will be a challenging experiment in linking the push-based messaging of LDM with the pull-driven data flow of AQO.
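A minimal sketch of this push/pull coupling, assuming a dedicated bridge node that buffers asynchronously pushed products so pull-style clients can query them on demand (all names here are illustrative, not LDM or DataFed APIs):

```python
# Sketch of a push-to-pull bridge between an LDM-style feed and
# pull-style AQO clients. All names are illustrative.
from collections import deque

class BridgeNode:
    """Buffers pushed products; serves them to pull-style clients."""

    def __init__(self, maxlen=100):
        # Bounded buffer: old products age out as new ones arrive.
        self.buffer = deque(maxlen=maxlen)

    def on_push(self, product):
        """Called by the push network (e.g. an LDM ingest hook)."""
        self.buffer.append(product)

    def pull_latest(self, n=1):
        """Called by pull-style clients (e.g. a WCS front end)."""
        return list(self.buffer)[-n:]

node = BridgeNode()
for obs in ["radar_0000Z", "radar_0015Z", "radar_0030Z"]:
    node.on_push(obs)          # asynchronous push deliveries
latest = node.pull_latest(2)   # on-demand pull
```

The design choice here is that each side keeps its native interaction style: the push network never waits on clients, and pull clients never subscribe to the feed.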

The AQO network will consist of nodes that can act both as servers and as clients of air-quality-relevant data. The nodes will belong to different organizations that serve a variety of communities and have different data needs. There is also considerable heterogeneity in how the nodes conduct their business internally: finding, accessing, transforming, and delivering data. In Figure 3, the Unidata and DataFed network nodes are shown in more detail.

Other server nodes in the AQO network will include the NASA Goddard DAAC server, which provides access to an immense volume of satellite data contained in their “data pool.” At that server, a WCS interface is being implemented that will facilitate its joining the AQO network. The managers of the Goddard DAAC node have also expressed strong interest in accessing air quality data from DataFed and meteorological data from Unidata. Similarly, there are good prospects of adding an EPA node to the AQO, which will serve air quality model forecasts as well as provide access to an array of monitoring data. The participation of these additional AQO nodes is to be arranged and performed independently. However, this NSF AQO prototype project will provide the architectural framework for such networking and connectivity tools such as adapters, and also can serve as the testbed for the expanding AQO nodes. Attracting new nodes and guiding their inclusion will be pursued by the project team members through multiple venues: membership through workgroups, ESIP Federation meetings, and training workshops.

The interoperability of these heterogeneous systems can be achieved through the adoption of a uniform query protocol such as that provided by the OGC Web Coverage Service. To achieve this level of interoperability, a significant role is assigned to the adapter services (Figure 3) that can translate data formats and make other adjustments to the data syntax. These services can be provided by the data server, by the client, or by third-party mediators, such as DataFed and Unidata. This type of “loose coupling” between clients and servers will allow the creation of dynamic, user-defined processing chains.

The AQO will be connected through a mediated peer-to-peer network. The mediation will be performed by centralized or virtually centralized catalog services which enable the Publish-Find operations needed for loosely-coupled, web-service-based networking. The Bind, i.e. data access operation, will be executed directly through a protocol-driven peer-to-peer approach. The Unidata THREDDS system performs the meteorological data brokering, while the DataFed Catalog serves the same purpose for the AQ data. Other candidate AQO nodes are currently brokered through their own catalog services, e.g. ECHO for the NASA Goddard DAAC. The unification (physically or virtually) of these distributed catalog services will be performed using the best available emerging web service brokering technologies.
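The Publish-Find-Bind pattern underlying this mediated network can be sketched with a toy in-memory catalog; in the actual architecture, THREDDS and the DataFed Catalog play this brokering role, and the endpoints below are hypothetical:

```python
# Sketch of Publish-Find-Bind mediation: a toy in-memory catalog
# stands in for THREDDS / the DataFed Catalog. Endpoints are hypothetical.

class Catalog:
    """Central (or virtually central) registry of published datasets."""

    def __init__(self):
        self.entries = []

    def publish(self, name, keywords, endpoint):
        """A data provider registers a dataset and its access endpoint."""
        self.entries.append({"name": name, "keywords": set(keywords),
                             "endpoint": endpoint})

    def find(self, keyword):
        """A client discovers datasets by keyword."""
        return [e for e in self.entries if keyword in e["keywords"]]

catalog = Catalog()
catalog.publish("pm25_surface", ["aerosol", "surface"], "http://example.org/wcs")
catalog.publish("gfs_winds", ["wind", "grid"], "http://example.org/thredds")

# Find, then bind: the client queries the catalog, then contacts the
# returned endpoint directly (peer-to-peer), bypassing the mediator.
hits = catalog.find("aerosol")
endpoint = hits[0]["endpoint"]
```

Only Publish and Find go through the mediator; the Bind (data access) step is a direct, protocol-driven exchange between peers, as the text describes.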

Combined Air Quality - Meteorology Tools

"Today the boundaries between all disciplines overlap and converge at an accelerating pace. Progress in one area seeds advances in another. New tools can serve many disciplines, and even accelerate interdisciplinary work." (Rita R. Colwell, Director, NSF, February 2003). Over the past decades, many in the AQ and the meteorological communities have had a desire to create new knowledge together, but with marginal success.

The AQO will offer the possibility of creating powerful new synergistic tools, such as the Combined Aerosol Trajectory Tool, CATT3. In CATT, air quality and pollutant transport data are combined in an exploration tool that highlights the source regions from which high or low pollutant concentrations originate. Advanced data fusion algorithms applied in CATT have already contributed to the elucidation of unexpected aerosol source regions, such as the nitrate source over the Upper Midwest4.

Figure 4 illustrates the current capabilities of the tool that highlights in red the airmass trajectories that carry the highest concentration of sulfate over the Eastern US on a particular day. Currently, the real-time application of such a diagnostic tool is not possible, since the necessary IT infrastructure for bringing together and fusing the AQ and transport data does not exist.


Figure 4. CATT diagnostic tool for dirty and clean air source regions.



With the IT infrastructure of the Air Quality Observatory, which seamlessly links real-time AQ monitoring data to current and forecast meteorology, the CATT tool will be a significant addition to the toolbox of air quality analysts and meteorologists.

Other synergistic tools include Analysts Consoles, which consist of an array of maps and charts similar to the 'meteorological wall.' The view-based data processing and rendering system of DataFed is well suited to creating such data views from distributed data streams. Early prototyping has shown that Virtual Analysts Consoles are indeed feasible. However, considerable development is required to make the preparation of user-defined data views easy and fast.

Distributed Computing Research and Applications

The simple Web service execution engine currently used by DataFed will be extended with advanced capabilities arising from research in distributed computing.

The proposed Observatory will consist of a large collection of independent data services, as well as applications that operate on those data services in order to collect, analyze, and disseminate information. Applications will be created by multiple organizations and will require interaction with data and applications created by other organizations. Furthermore, individual applications will need to be modified over time without disruption of the other applications that depend upon them. Computing resources will be provided by a multitude of independently-administered hosts with independent security policies.

To support this high degree of interoperability, we plan to investigate multiple paradigms for the construction of interoperable distributed applications. One promising approach involves a logical separation of data and process. Data repositories are seen as purely passive entities, which are acted upon by separately installed processes (transactions). In this way, independent organizations can contribute to the Observatory by installing into the infrastructure an interoperating mixture of data and processes. This separation also simplifies construction and maintenance of applications that span multiple organizations, because the code for ongoing processes need not be integrated with the code for data servers that may be developed by others. We plan to leverage our ongoing research efforts on the creation of a shared infrastructure for the reliable execution of long-lived, distributed applications [5]. Important design goals include decentralized control, support for installation, distributed execution, and evolution (live upgrades of the system and applications), as well as the aforementioned principles of polymorphism and high-level abstract data typing.
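The data/process separation described above can be illustrated with a minimal sketch. All class and function names are assumed for illustration; the actual infrastructure is the subject of the cited research, not this fragment.

```python
# Minimal sketch (assumed names, not the actual infrastructure) of the
# data/process separation: repositories are passive key-value stores,
# while independently installed transactions read from some repositories
# and write derived results into others.

class Repository:
    """Purely passive data holder; it performs no computation itself."""
    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store[key]

def averaging_transaction(source, sink, key):
    """A process installed separately from the data it acts upon."""
    readings = source.get(key)
    sink.put(key + "_mean", sum(readings) / len(readings))

# One organization contributes raw data...
raw = Repository()
raw.put("pm25", [10.0, 14.0, 12.0])

# ...another installs a transaction that derives a product from it.
derived = Repository()
averaging_transaction(raw, derived, "pm25")
```

The point of the separation is that the transaction can be upgraded or replaced without touching the repository code, and vice versa.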

Building on prior experience in constructing distributed systems infrastructure [6,7,8], work is underway to design and implement algorithms, protocols, and middleware for a practical shared computing infrastructure that is incrementally deployable. The architecture will feature dedicated servers that communicate over the Internet, run on heterogeneous hosts, and are maintained and administered by independent service providers. The execution model for this infrastructure captures a wide class of applications and supports integration of legacy systems, including applications written using SOAP. The system configuration consists of an interconnected graph of data repositories and the workflow transactions that access them. We will investigate analysis of the graph for "information traces" that verify information authenticity. Each application will specify its own security policies and fault-tolerance requirements. For critical applications, servers will participate in replica groups that use an efficient Byzantine agreement protocol to survive a bounded number of arbitrary failures. Through this mechanism, the infrastructure will provide guarantees that once information enters the system, it will continue to be processed to completion, even though processing spans multiple applications and administrative domains.
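The configuration graph and its "information traces" can be sketched abstractly. The node names and graph encoding below are hypothetical; the sketch only shows the idea of walking the graph backwards from a derived product to its raw sources.

```python
# Illustrative sketch of the system configuration described above: a
# directed graph whose nodes are data repositories and workflow
# transactions. An "information trace" walks the graph backwards from a
# derived product to the raw sources it came from. All names are assumed.

edges = {  # node -> list of upstream nodes it reads from
    "grid_product": ["gridding_txn"],
    "gridding_txn": ["station_repo", "model_repo"],
    "station_repo": [],
    "model_repo": [],
}

def trace(node, graph):
    """Return the set of source repositories a node ultimately depends on."""
    upstream = graph.get(node, [])
    if not upstream:          # a node with no inputs is a raw source
        return {node}
    sources = set()
    for parent in upstream:
        sources |= trace(parent, graph)
    return sources

sources = trace("grid_product", edges)
```

Verifying authenticity would additionally require that each edge carry cryptographic evidence, which is beyond this sketch.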

The Observatory will benefit from this infrastructure in several ways, most notably ease of application deployment and interoperability. In addition, the infrastructure will provide opportunities for reliable automated data monitoring. For example, we anticipate that ongoing computations, such as those that perform “gridding” operations on generated data points, will be installed into the system. Some of the data analysis currently performed on demand could be installed for periodic execution. This will result in the availability of shared data repositories not only for raw data, but also for information that is the result of computational synthesis of data obtained from multiple data sources. Researchers will be able to install applications into the infrastructure to make further use of these data sources, as well as the raw data sources, as input to their applications.
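As a concrete (hedged) example of the kind of periodic "gridding" computation mentioned above, scattered monitor readings can be interpolated onto a regular grid with inverse-distance weighting. The station coordinates and values are made up, and IDW is just one common choice of method.

```python
# A sketch of a "gridding" operation: interpolate scattered monitor
# readings onto a regular lat/lon grid by inverse-distance weighting.
# Stations, grid, and the choice of IDW are illustrative assumptions.

def idw_grid(stations, lats, lons, power=2.0):
    """stations: list of (lat, lon, value). Returns grid[i][j] values."""
    grid = []
    for la in lats:
        row = []
        for lo in lons:
            num = den = 0.0
            for slat, slon, val in stations:
                d2 = (la - slat) ** 2 + (lo - slon) ** 2
                if d2 == 0.0:          # grid point coincides with a monitor
                    num, den = val, 1.0
                    break
                w = 1.0 / d2 ** (power / 2.0)
                num += w * val
                den += w
            row.append(num / den)
        grid.append(row)
    return grid

# Two monitors (roughly St. Louis and Chicago) with PM values in ug/m3.
stations = [(38.6, -90.2, 12.0), (41.9, -87.6, 20.0)]
grid = idw_grid(stations, lats=[38.6, 41.9], lons=[-90.2, -87.6])
```

Installed for periodic execution, such a computation would keep a shared gridded product continuously up to date as new raw observations arrive.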

Use Cases for Prototype Demonstration

The proposed AQO prototype will be demonstrated through three cross-cutting use cases. The cases pose genuine scientific challenges and also contribute to AQ management; these topics are areas of active research in atmospheric chemistry and transport at CAPITA and other groups. Each case will be an end-to-end application, connecting real data producers and mediators, as well as decision makers.

  1. Intercontinental Pollutant Transport. Sahara dust over the Southeast, Asian dust and pollution over the Western US, and smoke from Canada and Mexico constitute significant contributions to air pollution. The problem is being addressed by the Task Force on Hemispheric Transport of Air Pollutants under the Convention on Long-range Transboundary Air Pollution. According to the Task Force Co-Chair, "The proposed AQO will provide a powerful tool for integrating information for use by the Task Force." (See letter of collaboration by T. Keating)
  2. Exceptional Events. This use case will demonstrate real-time data access/processing/delivery/response as well as the capability of AQO to trigger managerial responses. Exceptional AQ events include smoke from natural and some anthropogenic fires, windblown dust events, volcanoes, and also long-range pollution transport events from sources such as other continents. The AQO prototype system will provide real-time characterization and near-term forecasting that can be used for preventive action triggers, such as public health warnings. Exceptional events are also important for long-term AQ management since EE samples can be flagged for exclusion from the National Ambient Air Quality Standards calculations. AQO IT support will be provided to federal and state agencies. (See letter of collaboration by R. Poirot)
  3. Midwestern Nitrate Anomaly. Over the last few years, a mysterious pollutant source has caused the rise of pollutant levels in excess of the AQ standard over much of the Upper Midwest in the winter/spring. Nitrogen sources are suspected. The phenomenon has eluded detection and quantification since the area was not monitored, but recent intense sampling campaigns have implicated NOx and ammonia release from agricultural fields during and after snow melt. This AQO use case will integrate and facilitate access to data from soil quality, agricultural fertilizer concentration and flow, snow chemistry, surface meteorology, and air chemistry.

Participant Qualifications

DataFed is a community-supported effort led by CAPITA at Washington University. While the data integration Web services infrastructure was initially supported by specific information technology grants from NSF and NASA, the data resources are contributed by the autonomous providers. The application of the federated data and tools is in the hands of users as part of specific projects. Just as data quality improves by passing through many hands, the analysis tools will also improve with use and feedback from data analysts. A partial list of projects is at http://datafed.net/projects. Rudolf Husar is Professor of Mechanical Engineering and Director of the Center for Air Pollution Impact and Trend Analysis (CAPITA) and will lead the DataFed integration into AQO. Dr. Husar brings 30+ years of experience in AQ analysis and environmental informatics to the AQO project.

Unidata is a diverse community of education and research institutions vested in the common goal of sharing data, tools to access the data, and software to use and visualize the data. Successful cooperative endeavors have been launched through Unidata and its member institutions to enrich the geosciences community. Unidata's governing committees facilitate consensus building for future directions for the program and establish standards of involvement for the community. Ben Domenico is Deputy Director of Unidata. Since the program's inception in 1983, Domenico has played a key role in turning Unidata into one of the earliest examples of successful cyberinfrastructure, providing data, tools, and community-building support to the meteorological research and education community.

The Goldman research group at Washington University has been working on JPie, a novel visual programming environment that supports live construction of running applications. In addition, the group is currently working on algorithms and middleware for a fault-tolerant shared infrastructure that supports evolvable long-running distributed applications. Kenneth J. Goldman is an Associate Professor in the Washington University Department of Computer Science and Engineering and brings to this project over 20 years of research experience in the areas of distributed systems and programming environments.

Northrop Grumman Corporation (NG) contributes expertise in development and implementation of geospatial applications, architectures and enterprise-wide solutions. NG is a Principal Member of the OGC and has been influential in defining an interoperable open infrastructure that is shared across user communities. Through its development of OGC’s Compliance Testing Tools, NG leads the geospatial community in insight into service providers’ and GIS vendors’ compliance to OGC standards. Stefan Falke, Systems Engineer with NG, will lead the NG team. He brings experience in applying OGC-based services (including DataFed's) to NG's projects. As a part-time research professor of Environmental Engineering at Washington University, he is involved in air quality cyberinfrastructure projects for satellite and emissions data. Dr. Falke is co-lead for the ESIP Air Quality Cluster.

Workplan

The AQO project will be led by Rudolf Husar (CAPITA) and Ben Domenico (Unidata). CAPITA and Unidata, with their rich history and the experience of their staff, will be the pillars of the AQO. The active members of the AQO network will come from the ranks of data providers, data users, and value-adding mediators-analysts. The latter group will consist of existing AQ research projects funded by EPA, NASA, NOAA, and NSF that have data, tools, or expertise to contribute to the shared AQO pool. Ken Goldman will use the AQO as a testbed for the testing of advanced distributed application frameworks. Stefan Falke will coordinate the OGC interoperability compliance and geospatial standards gap analysis.

Coordination among activities and organizations will be fostered through a combination of virtual and physical interaction. Approximately every six months the core AQO team will meet as part of the Earth Science Information Partners (ESIP)/Air Quality Cluster. The ESIP Federation and AQ Cluster provide an inter-agency and inter-organization environment. General team communication will be handled through a shared Wiki site.

Collaborators

The Air Quality Observatory will be a community activity involving many of the key data providers, users and researchers. Four specific collaborations have been singled out to indicate the nature of this collaborative activity.

Terry Keating, Co-Chair of the Task Force on Hemispheric Transport of Air Pollutants under the Convention on Long-range Transboundary Air Pollution. Dr. Keating and the Task Force will collaborate with the Observatory regarding Intercontinental Pollutant Transport issues.

Rich Poirot, Co-Chair of the National Inter-RPO Monitoring and Data Analysis Workgroup will collaborate with the AQO team in the area of regional haze and fine particle pollution.

Neil Frank, Senior Air Quality Data Advisor, USEPA, Emissions, Monitoring and Analysis Division will continue long-term collaboration of his office with DataFed in the area of regulatory decision making.

Richard Wirtz, Executive Director of the Foundation for Earth Science, collaborates with the AQO project by providing support through the ESIP Air Quality Cluster.

References Cited

1. National Ambient Air Monitoring Strategy, Draft, OAQPS. USEPA, December 2005, http://www.epa.gov/ttn/amtic/files/ambient/monitorstrat/naamstrat2005.pdf

2. Buehler, K. and McKee, L. The Open GIS Guide: Introduction to Interoperable Geoprocessing and the OpenGIS Specification, Waltham, MA (1998).

3. Husar, R. B.; Höijärvi, K.; Poirot, R. L.; Kayin S.; Gebhart, K. A.; Schichtel, B. A. and Malm, W. C. Combined Aerosol Trajectory Tool, CATT: Status Report on Tools Development and Use, Paper # 97, A&WMA Specialty Conference on Regional and Global Perspectives on Haze: Causes, Consequences and Controversies, Asheville, NC, October 25-29, 2004.

4. Husar, R.; Poirot, R. DataFed and Fastnet: Tools for Agile Air Quality Analysis; Environmental Manager 2005, September, 39-41

5. Thorvaldsson, H. D.; Goldman, K. J. "Architecture and Execution Model for a Survivable Workflow Transaction Infrastructure." Washington University Department of Computer Science and Engineering, Technical Report TR-2005-61, December 2005.

6. Goldman, K. J.; Swaminathan, B.; McCartney, T. P.; Anderson, M. D. and Sethuraman, R. The Programmers' Playground: I/O Abstraction for User-Configurable Distributed Applications. IEEE Transactions on Software Engineering, 21(9):735-746, September 1995.

7. Pallemulle, S. L; Goldman, K.J. and Morgan, B. E. Supporting Live Development of SOAP and CORBA Servers. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS’05), pages 553-562, Washington DC, 2005.

8. Parwatikar, J. K.; Engebretson, A. M.; McCartney, T. P.; Dehart, J. D. and Goldman, K. J. Vaudeville: A High Performance, Voice Activated Teleconferencing Application, Multimedia Tools and Applications, 10(1): 5-22, January 2000.