AQ 2007 10 31 Discussion

From Federation of Earth Science Information Partners


Data lineage tracking

  • Do we need to coordinate conventions for tracking provenance, even in readme files?
  • How do we recognize and track sources of variance (and their magnitude) within and across datasets throughout the processing chain?
    • Take as an example the case where the x and y variances are not the same.
    • Which processing tools compound error? How do we account for it?
  • How do we make apparent critical aspects of data interpretation as these aspects change along the processing chain? Which propagate?
    • Do we need rules? How about conventions? A community organization to provide oversight and promote use, especially responsible use?
  • Air quality forecasting application
    • plus special/exceptional events
  • Assessment of control programs application
    • attention to special/exceptional events
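To make the "which tools compound error" question concrete: if processing steps contribute independent errors, their variances add, so a lineage record can carry a running uncertainty through the chain. A minimal sketch, assuming independence; the step names and magnitudes below are purely hypothetical:

```python
import math

def propagate_variance(steps):
    """Combine independent error sources by summing variances.

    `steps` is a list of (name, standard_deviation) pairs, one per
    processing step; returns the combined standard deviation.
    """
    total_variance = sum(sigma ** 2 for _, sigma in steps)
    return math.sqrt(total_variance)

# Hypothetical processing chain for a gridded AQ product:
chain = [
    ("instrument noise", 0.30),
    ("registration/regridding", 0.40),
    ("cloud screening", 0.12),
]
combined = propagate_variance(chain)
# The combined sigma exceeds any single step but is far less than their sum.
```

Even this toy model makes one point visible: a step whose sigma is small relative to the others contributes almost nothing to the total, which is exactly the kind of fact a lineage convention could surface.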

Focus: HTAP and exceptional events

  • Identify users in these applications
  • Connect the existing projects (Airnow Tech, RSIG, Giovanni, etc.)
    • Exceptional events: Airnow Tech is becoming source for surface data
    • HTAP: model evaluation is needed (e.g., RSIG, Giovanni)
  • How to support analysis and comparison of datasets
  • Create "interoperability network" and connect appropriate players
  • Lineage info required is driven by uses: How to capture this up front?
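If the lineage info required is driven by uses, one way to capture it up front is a table mapping each application to the fields it needs. A sketch only; both the application names and the field names below are illustrative, not an agreed convention:

```python
# Hypothetical mapping from application to the lineage fields it requires.
LINEAGE_REQUIREMENTS = {
    "exceptional events": ["surface data source", "event time window", "qa flags"],
    "htap model evaluation": ["model version", "regridding method", "averaging kernel"],
}

def required_fields(applications):
    """Union of lineage fields needed by the selected applications."""
    needed = set()
    for app in applications:
        needed.update(LINEAGE_REQUIREMENTS.get(app, []))
    return sorted(needed)
```

A data provider could run this at ingest time to know which lineage fields must be captured before any of the listed uses becomes possible.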

Data quality considerations

  • Can we list spurious sources of variance that must be taken into consideration as we visualize or composite datasets?
    • Cloud cover
    • Registration issues
      • sensitivity to neighboring measurements
    • Instrument issues
      • resolution
        • types of interpixel contamination
      • interference (NOx, chemical, electrical)
      • "viewing" angle (each pixel, terrain issues)
    • Timing issues
      • Time of day
      • Day of week
      • Lunar cycle
      • Special (car exhaust, sporting event, volcanic activity)
    • Magnetic issues (no OMI for Rio)
      • Raised as a confounding issue only known to remote sensing elders
      • How much of an effect is seen at other sites (besides Rio)?
  • What is the best way to determine "best available evidence"?
    • How do we know if a remote sensing product has been verified with ground data and in situations comparable to use?
  • How and why might we tag a dataset "bad data"?
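A lightweight way to tag the spurious-variance sources above, and an overall "bad data" verdict, is a per-dataset quality record. This is a sketch of one possible structure; the field and flag names are hypothetical, not any agency's convention:

```python
from dataclasses import dataclass, field

@dataclass
class QualityRecord:
    """Illustrative per-dataset quality tags (names are hypothetical)."""
    dataset_id: str
    flags: dict = field(default_factory=dict)   # variance source -> note
    verified_against_ground: bool = False
    verdict: str = "unreviewed"                 # e.g. "usable", "bad data"

    def flag(self, source, note):
        """Record a known spurious-variance source and soften the verdict."""
        self.flags[source] = note
        if self.verdict == "unreviewed":
            self.verdict = "use with caution"

rec = QualityRecord("omi_no2_2007_10_31")
rec.flag("cloud cover", "40% of scene cloud-screened")
rec.flag("viewing angle", "edge-of-swath pixels excluded")
```

The point is not the data structure but the behavior: tagging any known issue automatically downgrades the dataset from "unreviewed", so "best available evidence" can never silently mean "never examined".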

Use Cases: Question-Driven Approach

Puget Sound Challenge

  • Highlight at EPA OEI Symposium Nov 14

Wildfire Scenario Southern California

  • How to advertise the needs and opportunities?

Beijing China Olympics

  • Door may be closed to participate ahead of time
  • High visibility for anything accomplished

How much NO2 is man made?

  • How would we provide data to justify the statement, "Man-made emissions of nitrogen oxides dominate total emissions"?

Thessaloniki dataset

  • Perhaps useful to formally discuss Ernest Hilsenrath's Thessaloniki slide

Compare appropriate use of "near real time" vs "standard product"

  • Provides an example in which absolutes are not all there is
  • Would give experience in a limited context that might generalize
  • Has immediate effect on use and users

Is the air safe to breathe?

  • Can't give a satisfying answer: spatial/temporal variation
  • Is it getting better?
  • How do we get people who are used to tracking environmental indicators and in situ monitoring to track and use the remote sensing data?
    • Need to train and re-train experts and decision makers


Technology problem

  • Statement was made that this may not be a technology problem
  • But we don't have technology for capturing inference or causes
    • we are barely handling ontology when we need epistemology, grammar, rhetoric, and dialectics
  • We don't capture emergent properties
    • Rudy's frames provide slots for lots of the necessary ancillary data
    • Frame-based reasoning is brittle and doesn't handle ambiguity
  • Compare "readme" files and wiki
    • wiki can be more interesting and see larger audience participation
    • wiki may provide too many structural decisions; needs hourly grooming and attention
  • How do we capture the chain of decisions involved in data collection, processing, and use?
    • The "chain" begins before the sensor is designed and extends through many intermediaries out to each user in new contexts.
      • How do we tag data appropriately for an educator or policy maker?
      • Decisions are made between perceived choices. At the extreme, what if there are whole new sets of considerations that hadn't arisen at the time the mission was designed? What if they arise for some irreversible processing decision?
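One low-tech way to capture the chain of decisions is an append-only log attached to the dataset, where each entry records who decided what, when, among which perceived choices, and why. The structure below is a sketch, not an established convention, and the example entries are invented:

```python
import datetime

def log_decision(log, actor, decision, alternatives, rationale):
    """Append one decision record; earlier entries are never rewritten."""
    log.append({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "actor": actor,
        "decision": decision,
        "alternatives_considered": alternatives,
        "rationale": rationale,
    })
    return log

lineage = []
log_decision(lineage, "instrument team", "fixed afternoon overpass time",
             ["morning overpass"], "afternoon boundary layer is well mixed")
log_decision(lineage, "processing team", "cloud fraction threshold 0.3",
             ["0.2", "0.5"], "balance coverage vs. contamination")
```

Because the log records the alternatives that were *perceived* at the time, a later user facing a whole new set of considerations can at least see which options were never on the table when an irreversible processing decision was made.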

Social problem

  • How do we get people to use the technology?
    • Dedicate a person to capturing ephemera
    • But there is an inestimable volume of such material; it would be too expensive
  • Question may be, "How do we elevate those few really significant details?"
    • What tools would facilitate this? Easy way to raise "level 3" objections or warnings
  • Different communities don't know enough to ask each other for the "right" things
    • Need low level interaction to raise common ground
  • Sometimes more than a "chain" of intermediaries: a whole social network
    • There is feedback even to the design of the next sensor
    • Create a wire diagram of specific flows (chain diagrams)
    • Examine tools from popular social networking sites (Orkut, Amazon, Friendster, LinkedIn)

Systemic problem

  • Are there appropriate rewards for investment of time in metadata?
  • How do we control for bias, institutional and other?
    • There are always reasons to "sell" aspects of what was done
    • Everyone has a bias; courts require identification of advocacy; should science?
    • Science assumes unbiased judgment: is that non-rhetorical assumption valid?
      • data processing and presentation can be/is rhetorical, advocating a decision
  • Are the right people at the table?
    • The ESIP Federation was established to assure that the right mix of experts is involved in data decisions.
    • Can we expedite reviews using "federalism" (balanced interests)?

Useful analogies

  • Process for creating "data spaces" used for data sheets in education
    • DLESE and NSDL Data Access Working Group
    • Data sheets are one-page summaries giving the "vibe of the thing"
  • Stock Market
    • Used to be little information available; everyone used a broker
    • Now day traders can get high-level real-time data
    • Idiots get weeded out
  • Radar images in weather report
    • Having Google Earth OMI visualizations scares folks
    • It is not like weather; it makes it seem like only stupid people live in Northeast urban areas
    • The 2004 SCIAMACHY image got front-page coverage showing European pollution sites; people knew red meant bad. This raises the whole issue of color scales and interpretation; the AQI has a regulatory requirement for color values.
    • Might better inform people who could drive change.
  • Ecosystem Model
    • Data services are in an environment competing/cooperating
    • Diversity is good to avoid collapse
    • Valuable to have multiple overlapping products to compare

Opening Issues

  • What is the particular niche for this group?
    • Comparison to science meeting
      • Was this more like a science meeting?
    • Can we segment topics so all parties participate in all discussions?
    • Is there a way to capture all the discussion that happened around presentations?
  • Special opportunities to build "knowledge base"
    • Journal articles may not capture all available discussion
    • Could we test tools that would allow us to elaborate and apportion the significance of topics? Maybe we could use a wiki for this.
    • The comment was made that our measurements are "unstable", which elicited the comment, "Then what are we doing here?" How do we avoid defaming all our data with details? Certainly these satellite differences are significant. How can we state that while questioning the details of our understanding?
  • Opportunity to reach across communities
    • Clearly an advantage of the ESIP Federation is the opportunity to put technology people together with data people, scientists, educators, and applied science people. How can we take best advantage of this opportunity?
    • It might be that we could use discussion tools to help create an effective knowledge base that communicates appropriately to different audiences. Policy makers need to see that there is a human effect, not that there is still some question about which effects are the most significant.
  • Data/tool Decision Tree
    • Wonder if we could set up a table comparing products
    • Might do this by data source, visualization tool, processing tool, etc.
  • Usage considerations
    • Legal
      • EPA always has to defend its judgments in court
      • Court needs "preponderance of evidence" or "beyond a reasonable doubt"
        • This is quite different from 95% statistical certainty.
    • Science
      • Needs detailed information about sources and models used in "correcting" data
    • Education
      • Special considerations for "real-time" data
      • Special considerations for hiding and introducing complexity
  • How can we leverage the NO2 work as a pathfinder to accelerate our collective capacity to use data from new platforms such as the Orbiting Carbon Observatory (OCO) targeted for launch in 2008 or NPOESS?