Talk:NSF Air Quality Observatory: NSF Proposal Response

To add to the discussion, log in to DataFed wiki
Begin each entry with ====Username: Subject====
To respond, add dots ====......Username: Subject====
Indent response text by adding : for each tab.
Sign your entry by ending with '~~~~'.

This discussion page contains responses and comments on the 'Proposal must cover' topics stated in the NSF Solicitation. (I have introduced tentative headings for each requirement, thinking these might eventually serve as subsection titles in the proposal. Dave Fulker (Dave.Fulker))

Terrific, tightening the connection between these NSF requirements and the proposal body is a really good move. Let's see how we can fold these headings into the proposal body. Rhusar 19:02, 19 January 2006 (EST)

1. Research Team: Team includes information scientists and environmental researchers from two of four areas: ecology, ocean science, atmospheric science, environmental engineering

Unidata certainly has applied information scientists that can contribute to the effort, but we do not have researchers in that area in the sense of individuals who publish regularly in the information science literature. Ben Domenico (BDomenico)

The Information Science in the team is covered through Prof. Ken Goldman in the CompSci department at WashU. His main research is in distributed computing, most recently 'Shared Computing Infrastructure for Survivable and Evolvable Distributed Applications'. A nice topic for the distributed With the applied information scientists at your shop and ours, and Ken Goldman, I think we are covered on the IT science/application area. I would like to include some theoretical/empirical work on the 'Network Effect'. Another possible information science area is the 'Semantic Mediation' stuff, a la Stefano Nativi & co. The applied IT is clearly Unidata's forte. The Atmospheric Science and Environmental Engineering areas are covered by the CAPITA air pollution group and the group of 'Collaborators' (traitors? :)) that will supply their letters of support/commitment. Rhusar

2. Realization of a Prototype: Develop & deploy a prototype observatory as a component of cyberinfrastructure

This should be done at WUSTL CAPITA, but we could definitely provide support to help you integrate the relevant Unidata components. Ben Domenico (BDomenico)

Yes, the prototype observatory will be the DataFed air quality data sharing system. A good part of the proposal will be on the description of the prototype. The 3 Use Cases will show off the prototype in action. The vital links to Unidata will be through (1) access to Met data through THREDDS; (2) standardisation of data flow interfaces, e.g. GALEON II, the sequel; (3) transferring Unidata experience in community building (vague...need to refine it). I see the key feature of the prototype to be the 'networking'. We know that the Unidata resources and tools work like a charm -- for the Unidata community. Ditto for DataFed serving the AQ community. Same for other 'island-kingdoms'. The big task is to make them work together... but how?? Rhusar

3. Cyberinfrastructure Design Questions: Identify one+ cyberinfrastructure design/functionality questions for the observatory and describe how these will be addressed

One key IIT issue we are working on is interoperability between the data systems of atmospheric science and those of GIS. Another is real-time access to air quality data. A third is coupled atmospheric and plume dispersion models. We have tools that can be used in the first two. In the third area, someone else would have to be the dispersion expert, but we can provide real-time access to the weather forecast model output. Ben Domenico (BDomenico)

The IIT topics that CAPITA is pursuing and contributing are:

(1) Tools and procedures for wrapping existing air quality data into standard web-accessible resources.

(2) Development and mediation of data transformation and caching services that make existing data analyst-friendly.

(3) Development and demonstration of data analysis tools built on loosely coupled web services. Rhusar

I think getting this right will be crucial to ensuring that our proposal has sufficient specificity to overcome potential reviewer concerns about (overly) general concepts for seamless integration, interoperability and/or homogeneity. As a starting point, I might emphasize parallel system-design ideas about functionality and cyberinfrastructure, with attendant topics as shown below.

Functional Design

Responding to Natural Events: (automated) triggers derived from observations...

Responding to the Needs of People: user requests, user-steered processing...

Overcoming Semantic Impedance: mediation of inter-community & inter-tool issues...

Cyberinfrastructure Design

Classes of Data Sources: extant or expected streams or sets of data...

Classes of Events: routine data arrivals, (detected) exceptional events, user requests...

Classes of Data-Analysis Tools: GIS, IDV, interactive models, decision-support systems...

Classes of Data Transformers and Aggregators: THREDDS & OGC capabilities...

Simplicity and Polymorphism: an architecture that reduces design complexity (say for N data sources and M data-analysis tools) from NxM to fewer than N+M code modules...

A Common Data Model: an abstract data-typing framework to enable polymorphism...

This is just a preliminary (and naive?) approach to the overall design. Dave Fulker (Dave.Fulker)
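To make the NxM versus N+M argument above concrete, here is a minimal sketch, assuming a toy common data model. The `Observation` type, the two source adapters, and the two tool adapters are all hypothetical names invented for illustration; they are not DataFed, THREDDS, or IDV interfaces. Note that this sketch only reaches N+M modules; getting below N+M would need further sharing among adapters, which is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Observation:
    """Hypothetical common data model: one point observation."""
    parameter: str
    lat: float
    lon: float
    time: str
    value: float


# --- N source adapters: each maps one raw format INTO the common model ---
def from_csv_rows(rows: List[List[str]]) -> List[Observation]:
    return [Observation(r[0], float(r[1]), float(r[2]), r[3], float(r[4]))
            for r in rows]


def from_dicts(records: List[dict]) -> List[Observation]:
    return [Observation(d["param"], d["lat"], d["lon"], d["time"], d["value"])
            for d in records]


# --- M tool adapters: each maps the common model OUT to one tool format ---
def to_table_text(obs: List[Observation]) -> str:
    return "\n".join(f"{o.parameter},{o.lat},{o.lon},{o.time},{o.value}"
                     for o in obs)


def to_point_features(obs: List[Observation]) -> List[dict]:
    return [{"geometry": {"type": "Point", "coordinates": [o.lon, o.lat]},
             "properties": {"parameter": o.parameter, "time": o.time,
                            "value": o.value}}
            for o in obs]


SOURCES: Dict[str, Callable] = {"csv": from_csv_rows, "dicts": from_dicts}
TOOLS: Dict[str, Callable] = {"table": to_table_text, "features": to_point_features}


def deliver(source_name: str, raw, tool_name: str):
    """Any source can feed any tool because both sides meet at Observation."""
    return TOOLS[tool_name](SOURCES[source_name](raw))


def module_counts(n_sources: int, m_tools: int) -> Dict[str, int]:
    """Adapters needed with pairwise converters vs. a shared data model."""
    return {"pairwise": n_sources * m_tools,
            "via_common_model": n_sources + m_tools}
```

With, say, 20 data sources and 5 analysis tools, `module_counts(20, 5)` gives 100 pairwise converters against 25 adapters through the shared model, and adding a new tool costs one adapter rather than 20.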

4. Domain Questions: Identify one+ compelling environmental research questions and through a prototype demonstrate the IIT for solving the questions

We are suggesting three air quality use cases that would be significantly enhanced by a prototype observatory: (1) Intercontinental air pollutant transport, e.g. Asian dust, smoke and pollution to N America; (2) Air quality (smoke-dust-pollution) events, possibly mixed with urban-industrial emissions, such as those occurring over the Eastern US, which are also relevant to air quality management; (3) Mystery nitrate over the Upper Midwest. Agricultural nitrogen (?) released from the fields at snowmelt? The use cases involve real-time as well as post-analysis, with significant publishable research contributions. Is this the right mix? Do we need use cases with more sex appeal to NSF (academic) types? Are these good IIT demo cases? Rhusar

5. New Users of Real Data: Demonstrates the prototype utility to outside users; attract and include general users

I believe many members of the Unidata community would be interested in this prototype. We already have a large community of users, a regular newsletter, and a governing structure with a users committee and a policy committee that each meet twice a year. We would report on the prototype at these meetings and solicit input from all those groups.

College and university site surveys conducted in 2001 and 2002 showed the significance of Unidata's impact in the educational community:
- Over 21,000 students per year use Unidata tools and data in classrooms and labs.
- More than 1,800 faculty and research staff use Unidata products in teaching and research.
- Unidata-connected university programs influence over 40,000 K-12 students.
- Nearly 900 teacher-training participants have used Unidata software.
- Unidata-based weather web sites at colleges and universities have over 400,000 hits per day.
- 404 participants attended Unidata workshops during the previous 5-year period. Ben Domenico (BDomenico)

The air quality community that will use the system is an extension of the current DataFed users: Federal agencies (EPA AQ management); State and Regional agencies (RPOs); international (e.g. IGAC). More user classes? Here is where the supporting letters should help in spelling out how the Observatory can/will help. Rhusar

6. Leveraging Extant Cyberinfrastructure: Leverage existing cyberinfrastructure developments, e.g. ITR, NMI or SEIII programs

We consider Unidata one of the earliest examples of successful cyberinfrastructure and this project could leverage our IDD/LDM (NSF ATM) technologies, our THREDDS Data Server (THREDDS was sponsored by NSF NSDL) and our work with dynamically steered high res local forecast models from LEAD (NSF ITR). Ben Domenico (BDomenico)

Unidata has the big guns here, so we are counting on their weight in this area. At CAPITA this project is a continuation and extension of a sequence of phased air quality cyberinfrastructure projects: NSF ITR project (2001-4), NASA REASoN project (2004-9), and numerous smaller EPA/State projects. AQO will also leverage some of the tools and experiences from the NSF-supported astronomical Virtual Observatory (e.g. VOTable, SkyQuery). Any other observatory to be leveraged? Rhusar

We might mention the EPA Emissions "Cyberinfrastructure" project as it seems a likely user and provider to an AQO. Stefan Falke (Sfalke)

7. IT Gap Analysis: Identify specific IT gaps and their importance to environmental observatory cyberinfrastructure

I can't figure out what the difference is between these IT gaps and the IIT gaps above. Are not IIT issues one form of IT issue? Ben Domenico (BDomenico)

Maybe the Ken Goldman topics on distributed computing issues; interoperability of heterogeneous distributed web services (service adapters); semantic web service chaining (description/matching beyond WSDL and the SOAP envelope). More ideas? Rhusar

I think the distinction between these two is one of design versus need, and I have rewritten the Wiki paraphrasing of requirement 3 to emphasize this distinction. The comments I introduced under requirement 3 show one approach to articulating an overall architecture/design. I suggest addressing this requirement by focusing on a few specific challenges, namely: conducting research, teaching, and making decisions in an environment where end-user tools must be matched to needed (often aggregated) data sets and streams and where these matchings are impeded by semantic variations among the pertinent disciplines and tools. In this context, each need for tool-matching and for data-aggregation is an instance of an IT gap, and we propose to fill these gaps not individually (an NxM problem) but more generally via abstract data-typing (an N+M or less problem) and related IT advances, as manifest in the overarching system design. Dave Fulker (Dave.Fulker)

8. End-to-End Results: Pursue an end-to-end approach to some component of cyberinfrastructure

Identify the providers, consumers and the value-adding mediators in the data flow. This needs a graphic on the air quality data flow, value-adding chain/value network. Also, the end-to-end for the use cases should start with the shared data from the providers and preferably end with decision makers, the public, or educators/students. Rhusar 15:38, 17 January 2006 (EST)

9. Domain-Tool Advances: Advance the technological capabilities of the environmental research community

The actual integration would have to be done at CAPITA, but you could consider an end-to-end system of the sort depicted in End-to-End LEAD Prototype, with a dispersion model connected to the output of the weather forecast model. Of course other parts of it would have to be adapted, but that would be the general idea. By the way, if you do have a machine with lots of memory and WebStart running, you can actually fire up our IDV and interact directly with the data on THREDDS servers from that web page. Ben Domenico (BDomenico)

LEAD - good stuff for an end-to-end system demo! Also, using the forecast model to drive a vanilla trajectory server or a fancier diagnostic Monte Carlo dispersion model would be a nice chaining of services, packaged into a webapp. Its wide usage would be guaranteed by federal, state, and academic users, particularly since Roland Draxler's HYSPLIT cannot be run as a chainable service. Rhusar
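As a rough illustration of the service chaining described above, here is a minimal sketch in which a wind "service" feeds a trajectory "service". Everything in it is an assumption made for illustration: the function names, the uniform wind field standing in for forecast model output, and the simple forward-Euler advection on a flat degrees-per-meter conversion. It is not HYSPLIT, LEAD, or any Unidata interface.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical interface: (lat, lon, hour) -> (u, v) wind in m/s.
WindService = Callable[[float, float, float], Tuple[float, float]]

METERS_PER_DEG_LAT = 111_000.0  # rough flat conversion, assumption


def uniform_wind_service(u: float, v: float) -> WindService:
    """Stand-in for forecast model output: the same wind everywhere."""
    def service(lat: float, lon: float, hour: float) -> Tuple[float, float]:
        return (u, v)
    return service


def trajectory_service(wind: WindService, lat: float, lon: float,
                       hours: float, step_hours: float = 1.0
                       ) -> List[Tuple[float, float]]:
    """Forward-Euler air parcel trajectory driven by any WindService."""
    path = [(lat, lon)]
    dt = step_hours * 3600.0  # seconds per step
    for i in range(int(hours / step_hours)):
        u, v = wind(lat, lon, i * step_hours)
        lat += (v * dt) / METERS_PER_DEG_LAT
        lon += (u * dt) / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
        path.append((lat, lon))
    return path


# Chaining: the output of one service is the input of the next.
path = trajectory_service(uniform_wind_service(0.0, 10.0), 38.6, -90.2, hours=3)
```

Because `trajectory_service` depends only on the `WindService` signature, the uniform wind could be swapped for a wrapper around real forecast output without touching the trajectory code, which is the point of making each step chainable.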

10. Leveraging Extant Data: Extensive leveraging of existing or expected data streams/sets

This could make use of many of the Unidata data streams, but it would seem the most important ones would be the output of numerical forecast models (forecast winds and precip, etc.) and radar data (precip purging the atmosphere and perhaps polluting land and streams). Ben Domenico (BDomenico)

Indeed, we should stress that for years, both Unidata (20+ years) and DataFed (2+ years) have been mediating the flow of data to their respective communities. DataFed has 20+ AQ-relevant datasets registered and accessible through standard protocols. Rhusar

11. Extensibility: Leads to a flexible prototype, amenable to extension as technology evolves

One could cite the Unidata program as a whole adapting over the years from satellite broadcast to Internet data distribution, and now our ongoing work in areas like GALEON for evolving international standards. Ben Domenico (BDomenico)

Similarly, the data systems used in AQ evolved from mainframe access (1960s), to the 'Voyager data browser' (1970s), to file-based internet data sharing (1980s, 1990s), and now to convergence with Met and GIS through service protocol-based interoperability, a la GALEON, yeah!! Rhusar

12. Proscriptions: Project should not create new data or models; should not overlap with existing projects

This would be an integration of existing data, data systems and models. It would leverage existing IT infrastructure and make it useful in an entirely new context and for a broader community. Ben Domenico (BDomenico)

Amen. Rhusar