Scheffe: Breakout Sessions: Objectives & Questions

Explanation of breakout sessions:

The workshop objective is data system interoperability focused on the air quality community. The 4 leading applications of the first day might give the impression that this is another technical discussion of data needs and a menu of insights and ideas. These are only business case examples to 1) flesh out the data (availability, movement, access, usage) issues and 2) eventually help convey (market?) a future proposal. Marketing an IT strategy on its own is a bit abstract and will likely fall on deaf ears. So, the cases (model evaluation, trends, health, forecasting) are designed to produce enough variety of needs to funnel into a more robust solution. There are many other venues to more completely address any one of the business case examples from a science and programmatic perspective; this venue is to address those cases from a data movement perspective. Accordingly, we will not be debating in depth what data is needed, but rather come in with some basic assumptions of what data exists and discuss how best to organize, access, integrate, and so forth. [In the big picture, the business case examples are the end and the IT support is the means. In this workshop we reverse that, with obvious risk of escalating confusion.]

1) After hearing brief overviews of data systems and processing centers, the application area breakouts (health, forecasting, model evaluation, characterization) define their needs and ideals in accessing data, using the processing centers, and preparing data/information for their users. They also identify the present challenges in working with data systems and processing centers to achieve their application area objectives.

2) After re-aligning into information system breakouts on day 2 (data systems, processing centers, user visualization/analysis), the groups determine current gaps in meeting the ideals defined by the application area breakouts and how interoperability and shared tools/methods can address those gaps.

Groups A2, B2, and C2 target the IT and organizational issues in relating and connecting the data systems, processing centers, and users.
A2 focuses on making the data systems accessible and part of an interoperable AQ network.
B2 focuses on integrating the data from the data systems and interoperability among the processing centers.
C2 focuses on making the output from the data systems and processing centers relevant to user visualization and analysis.

Breakout session questions.

What challenges in data formatting conventions, software constraints, and security policy do EPA systems face in participating in a broader data sharing community, and what viable solutions exist? [This question is intended to address barriers such as firewall policies, organizational software requirements, and others.]

Is it more efficient to produce multiple databases that include common information that can be tailored to a particular “system”, or should systems be largely distributed in nature and rely on the root repositories of information? Or, perhaps more realistically, are both practices applied in most cases? [This question attempts to address the objective of having a dynamic source of information that is updated by the provider but can be accessed seamlessly for downstream analysis purposes, without the need to conduct periodic data dumps to rebuild data inputs.]

GEOSS-borne DataFed (http://datafedwiki.wustl.edu/index.php/DataFed_Wiki) and Earth Science Information Partners (ESIP), http://wiki.esipfed.org/index.php/Main_Page, appear to embrace the basic tenets of data system interoperability and have used “air” as a powerful illustrative medium. Does this DataFed/ESIP structure benefit the air quality user community? Or, does it have enormous potential to do so? What needs to be accomplished to elevate DataFed/ESIP to a more prominent position in mainstream data sharing and analysis activities? Or, has it reached that point? What, then, is the relationship among DataFed/ESIP and other organizational systems? Should other systems make some form of a commitment to embrace “standards” inherent in the DataFed/ESIP venture? What would those commitments be? By the way, what is the distinction between DataFed and ESIP?

Why bother? I have the ability to retrieve data and manipulate it the way I need to. Do not encumber me with all this collaboration and data harmonization talk.

Air quality modeling results offer an enormous suite of time, space, and composition information: 1 typical CMAQ run provides about 100 species * 20 vertical layers * (surface area of CONUS / 144 km^2 per grid cell) values. And, modelers do not want to be burdened trying to explain what this all means, and modeling-policy types are just plain paranoid in general (sometimes, anyway). And, let’s not forget the nicely processed emission input fields. How do we deal with this? Recognizing these realities, is there value in having EPA provide 1 or 2 “seminal” base case and future year scenarios with adequate metadata descriptions that can be archived and queried on a time-space-species basis (yes, all species, all vertical levels, and all hours of an entire year or ensemble year run)?
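To give a feel for the scale behind the product above, here is a rough back-of-the-envelope sketch in Python. The species count, layer count, and 144 km^2 cell area come from the question itself. Everything else is an assumption for illustration only: a CONUS surface area of roughly 8.08 million km^2, reading the product as one value per species per layer per grid cell at each output time, hourly output over a 365-day year, and 4 bytes per stored value. Actual modeling domains and file formats will differ, so treat the result as an order-of-magnitude estimate and not a description of any particular CMAQ archive.

```python
# Back-of-the-envelope size of a model archive, following the product in the text:
# species * vertical layers * (surface area of CONUS / 144 km^2).

N_SPECIES = 100          # from the text
N_LAYERS = 20            # from the text
CELL_AREA_KM2 = 144      # from the text (a 12 km x 12 km grid cell)

# Assumptions, not from the source:
CONUS_AREA_KM2 = 8.08e6  # approximate contiguous U.S. surface area
HOURS_PER_YEAR = 8760    # hourly output, 365-day year
BYTES_PER_VALUE = 4      # single-precision storage


def horizontal_cells(domain_area_km2, cell_area_km2):
    """Number of grid cells needed to tile the domain area."""
    return round(domain_area_km2 / cell_area_km2)


def values_per_output_time(n_species, n_layers, n_cells):
    """Values in one output time: one per species, layer, and cell."""
    return n_species * n_layers * n_cells


def values_per_year(per_time, hours=HOURS_PER_YEAR):
    """Values in a full year of hourly output."""
    return per_time * hours


if __name__ == "__main__":
    cells = horizontal_cells(CONUS_AREA_KM2, CELL_AREA_KM2)
    per_time = values_per_output_time(N_SPECIES, N_LAYERS, cells)
    per_year = values_per_year(per_time)
    terabytes = per_year * BYTES_PER_VALUE / 1e12
    print(f"grid cells:             {cells:,}")
    print(f"values per output time: {per_time:,}")
    print(f"values per year:        {per_year:,}")
    print(f"approx. size (TB):      {terabytes:.1f}")
```

Under these assumptions, a single annual run lands in the terabyte range, which is the practical reason the question asks about archiving only 1 or 2 scenarios and querying them on a time-space-species basis.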

How can EPA’s GEOSS AMI facilitate interoperability of systems? Can EPA-AMI add an integration component to the existing project-by-project structure?

Can we focus on a specific example to highlight the benefits of facilitating observation and model integration efforts, such as working with HTAP to build or support related efforts to form an observational base for evaluating transport models? And, can we create a North American component for regional model evaluation support?

I like the features in VIEWS, why can’t our own EPA system do the same?

Why is it so hard to access AQS (myth or fact, or a bit of both)? And, does DataMart handle earlier AQS concerns?

These questions are a start…