Output from Visualization Summit

From Earth Science Information Partners (ESIP)

* RESPONSES TO KEY VISUALIZATION QUESTIONS

Earth Data Visualization Summit. Santa Barbara, CA. October 26-27, 2009. Participants: Bruce Caron, Marty Landsfeld, Kevin Ward, Tommy Jasmin, Kevin Hussey, Jeff McWhirter, Robert Simmon, Marit Jentoft-Nilsen, Eric Russell, Suresh Santhanavannan, Tom Rink, John Moreland, David Nadeau, Chris Torrence. This summit was funded by a NASA REASoN grant: NNX06AB08A

Overall questions

Question 1: Imagine the perfect earth data remote sensing visualization tool/system. What are the main components of this system?

  • Needs to closely track needs and abilities of a variety of audiences
  • Support multiple GUIs (meaning API-based back-end to support tools?)
  • Open Source
  • Solves data format/standards problem — new data type? derivative data types?
  • Social web, community-based: support collaborative workflow and data discovery
  • One-stop repository to support aggregate querying, e.g., What is the causal relationship between meningitis outbreaks and precipitation patterns in sub-Saharan Africa?
  • Provide ability to store and track provenance

Question 2: Visualization for understanding vs. conveying — It’s all about the user: How should tools and visualizations be tailored for distinct user groups?

Two different issues:

  • Conveying — you know the answer, need to display it in a good way. “Static”
  • Understanding — trying to find the answer via exploration. Needs more facilities. “Interactive”

It is hard to build a general-purpose tool that satisfies both power users and non-power users across disciplines. The same tool needs to scale with the user’s abilities.

Question 3: Are there user groups that you know are under-served by the current data visualization technology? What needs to change to serve these groups?

  • Everyone is underserved, but not equally.
  • PIs/those closest to data are best served by way of their familiarity with the data.
  • Solving this problem is an argument in favor of plug-in based development.
  • Tools need to be developed to support each audience — different audiences have wildly different needs in terms of capabilities and end result.
  • Of course, serving user groups only comes after they have located the data.

Question 4: Delivering data vs. images of data. What are the sweet spots for each? What are the areas where we need to focus or change?

  • It depends on audience: images and data can serve both “public” and scientific audiences
    - In general, images are great for the general public or less interested specialists (maybe 80%); data needs to be available for the interested 20%.
    - The <a href="http://www.ifp.illinois.edu/nabhcs/abstracts/shneiderman.html">Ben Shneiderman UI design mantra</a> “overview, zoom & filter, details on demand” applies in this case.

Images sweet spots

  • As a means of discovery and monitoring production
  • Able to meet needs of wider audience [formats and general ease of use]

Data sweet spots

  • Scientist users who create their own visualizations
  • Maintaining metadata (provenance)
  • Can be used to derive multiple visualizations

Question 5: How can visualizers bridge the science-outreach divide? How to teach the public about science and the scientists about the public?

Visualization is a compelling medium that science communicators can use to make complex scientific ideas approachable to a broad audience. Carl Sagan’s Cosmos series is the prototypical example, weaving visuals with narrative to explain astrophysics. It is crucial to define an audience: there is no such thing as a uniform general public. Concrete visualizations, such as planet walks and painted lines representing sea level rise, can be particularly effective. Effective visualization requires focus: emphasize important elements of a dataset, and de-emphasize or eliminate less important data. Ideally, tools would be designed by visualizers, not computer scientists. Try for verisimilitude: make things appear how the audience expects them to appear (for example, Google Earth’s discontinuous boundaries between scenes are very distracting).

(T1)
Perhaps current outreach programs are too top-down; we need a tighter, iterative relationship between viz developers, scientists, and the outreach audience. We didn’t like the notion that the public can’t understand science, and we considered the question: aren’t scientists part of the public? It’s important to bring scientists and their work into the community. Can effective visualization facilitate the two-way motivation needed between scientists and the community they serve?

Features for tools: easy-to-read descriptions embedded with the displays, and a simple initial presentation that allows progressive disclosure of information and concepts as far as the user desires.

(T2)

  • Tune the visualization to the audience — but how?
    – use of toolkits, plug-ins, component frameworks
    – providing different presentations of the same data
  • Provide HTML/Flash/CMS for data
    – easier mechanism for scientists/educators to convey info
    – equivalent of wiki or “build your own web page” for general public

Part B: How to teach the public about science and the scientists about the public?

  • Force scientists, through NSF, etc., to meet new requirements on publishing
  • Use emerging technologies like social media
  • Enable new ways of publishing
    – e.g. Data CMS, like RAMADDA

General comments:

  • Visualization is conveying information
  • Teacher knows the answer, needs to find way to convey
  • Scientist does not know the answer, needs exploratory analysis
  • There is a declining importance of traditional journal publications

Current Issues

Question 1C: You tackle visualization tasks every day. What is the one thing that you do every day that needs to be different in 5 years for your life to improve?

  • data search and retrieval
  • one data format to rule them all — standards-based in all aspects (data structure, metadata)

Question 2C: Name the three (or more) top obstacles today that limit the effective development/deployment of earth data visualizations.

Question 3C: State of gridded data fusion capabilities: what are the main obstacles/opportunities?

(T1)
Error tracking and error propagation during data fusion have to be done and provided to the user. For example, when resampling data for fusion we need to keep track of the errors and how they affect the grid fusion.

Knowledge embedded in the tool to facilitate data fusion: a sort of knowledge-based system that recommends to the user how the data fusion should be done.

Provide raw data, processed re-gridded data, and the data fusion software packaged together, so that users have all the pieces needed for data fusion.

Algorithms, tools, and gridding are data dependent. Which techniques are used depends on the data being brought together.

Embed data units, scaling, and offsets within the data to facilitate fusion. We have data in different units; for example, we have temperature data in Fahrenheit and in Celsius. Scaling and unit conversion have to be embedded in the data or readily interpreted by the tool.
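As a minimal sketch of what embedded units, scaling, and offsets buy a fusion tool, the Python below unpacks stored integers and converts two temperature fields to a shared unit before combining them. The attribute names `scale_factor`, `add_offset`, and `units` follow a common metadata convention but are assumptions here, as are the sample fields, the helper names, and the simple average used as a stand-in for fusion.

```python
# Hypothetical packed fields; names and values are illustrative only.

def unpack(values, attrs):
    """Apply the scale and offset stored with the data: real = packed * scale + offset."""
    scale = attrs.get("scale_factor", 1.0)
    offset = attrs.get("add_offset", 0.0)
    return [v * scale + offset for v in values]


def to_celsius(values, units):
    """Convert temperatures to Celsius based on the embedded unit string."""
    if units == "degC":
        return list(values)
    if units == "degF":
        return [(v - 32.0) * 5.0 / 9.0 for v in values]
    if units == "K":
        return [v - 273.15 for v in values]
    raise ValueError(f"unsupported unit: {units}")


def load_celsius(field):
    """Unpack a field and normalize it to Celsius using only its own metadata."""
    real = unpack(field["data"], field["attrs"])
    return to_celsius(real, field["attrs"]["units"])


field_a = {"data": [2000, 2150],
           "attrs": {"scale_factor": 0.01, "add_offset": 0.0, "units": "degC"}}
field_b = {"data": [680, 707],
           "attrs": {"scale_factor": 0.1, "add_offset": 0.0, "units": "degF"}}

a = load_celsius(field_a)
b = load_celsius(field_b)

# A plain average stands in for a real fusion step (an assumption of this sketch).
fused = [(x + y) / 2.0 for x, y in zip(a, b)]
```

Because each field carries its own scaling and unit metadata, the tool never has to guess how to interpret the stored numbers.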

Strengthen interaction between standards for data fusion: OPeNDAP to OGC to web services.

(T2)
How do we get the grid itself – some domains end up creating gridded data from point observation data. How do the different algorithms and parametrizations affect the outcome? How do we show the provenance?

How to handle large scale grids – tiling, etc. It’s not just a visualization issue – we need integrated data systems to deal with this issue – not just client applications.
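One way to picture tiling is sketched below in Python: a large grid is cut into fixed-size tiles, and only the tiles overlapping a requested window are selected for delivery. The grid size, tile size, window, and function names are all assumptions chosen for illustration, not a description of any existing system.

```python
def tile_bounds(nx, ny, tile):
    """Yield (x0, x1, y0, y1) half-open index ranges covering an nx-by-ny grid."""
    for y0 in range(0, ny, tile):
        for x0 in range(0, nx, tile):
            yield (x0, min(x0 + tile, nx), y0, min(y0 + tile, ny))


def tiles_for_window(nx, ny, tile, wx0, wx1, wy0, wy1):
    """Return only the tiles that overlap the half-open window [wx0, wx1) x [wy0, wy1)."""
    return [t for t in tile_bounds(nx, ny, tile)
            if t[0] < wx1 and t[1] > wx0 and t[2] < wy1 and t[3] > wy0]


# Assumed example: a 720 x 361 grid cut into 128 x 128 tiles.
all_tiles = list(tile_bounds(720, 361, 128))
needed = tiles_for_window(720, 361, 128, 100, 300, 50, 150)
print(len(all_tiles), len(needed))
```

A server holding the tiles can answer a window request by sending the few tiles in `needed` instead of the whole grid.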

Services include – gridding (e.g., Barnes objective analysis), varying temporal and spatial resolutions, resampling, irregular and unstructured grids, pushing analysis onto the server due to the data set size. Need to stream them, etc.
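For readers unfamiliar with the gridding step, here is a simplified single-pass sketch in the spirit of Barnes objective analysis: each grid value is a Gaussian distance-weighted average of the point observations. The full method adds successive correction passes, and conventions for the smoothing parameter vary, so the `kappa` parameter, the function name, and the sample observations are assumptions of this sketch.

```python
import math


def barnes_single_pass(obs, grid_points, kappa):
    """Grid point observations with Gaussian distance weights.

    obs: list of (x, y, value) observations
    grid_points: list of (x, y) target locations
    kappa: smoothing parameter in squared distance units (assumed convention)
    """
    result = []
    for gx, gy in grid_points:
        weight_sum = 0.0
        value_sum = 0.0
        for ox, oy, val in obs:
            d2 = (gx - ox) ** 2 + (gy - oy) ** 2
            w = math.exp(-d2 / kappa)
            weight_sum += w
            value_sum += w * val
        # If every weight underflows to zero, no observation informs this point.
        result.append(value_sum / weight_sum if weight_sum > 0 else float("nan"))
    return result


# Two hypothetical observations and three grid points along the line between them.
observations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
targets = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
gridded = barnes_single_pass(observations, targets, kappa=1.0)
```

Changing `kappa` changes how far each observation's influence reaches, which is one concrete way the choice of parametrization affects the resulting grid.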

For example, in NCEP’s global 0.5 degree GFS model, a single 3D field has dimensions of 720×361×26 with 61 time steps. This results in 412,233,120 points, or about 1.6 GB of data per field (assuming 4 bytes per value). Lots of data!
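The arithmetic behind that figure can be checked with a few lines of Python. The 4 bytes per value is an assumption (32-bit floats); the dimensions are the ones quoted above.

```python
nx, ny, nz, nt = 720, 361, 26, 61   # grid dimensions and time steps from the text
bytes_per_value = 4                  # assumption: each value stored as a 32-bit float

points = nx * ny * nz * nt
size_gb = points * bytes_per_value / 1e9   # decimal gigabytes

print(points)             # 412233120
print(round(size_gb, 2))  # 1.65
```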

(T3)
Grids are a sampling of an underlying continuous function that is reality. That sampling has aliasing artifacts. Those artifacts may vary from point to point within the data (imagine a satellite image at an angle across the Earth’s surface – the sample size varies from one side of the image to the next).

Grid to grid fusion often involves resampling those grids to merge them into a common grid. This introduces more aliasing artifacts. To reduce those artifacts we should use an interpolation function that models that underlying continuous function. However, we often do not. Perhaps the “correct” interpolation function is not known, a subject of debate, or not available in the software.

To get a handle on the artifacts introduced, and the interpolation function to use (or not use), we need to track the source of the data and the propagation of error. This becomes a file format issue because files often store the data result, but not the path to that result and the error function.
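The effect of the interpolation choice, and the value of carrying the path and error along with the result, can be illustrated with a small 1D sketch. It samples a known function (a sine, standing in for the unknown continuous reality), resamples it onto an offset grid with two interpolation methods, and stores a provenance record with each result. The grids, function names, and record layout are assumptions made for illustration.

```python
import math


def resample_nearest(src_x, src_v, dst_x):
    """Take the value of the closest source sample for each target location."""
    out = []
    for x in dst_x:
        i = min(range(len(src_x)), key=lambda k: abs(src_x[k] - x))
        out.append(src_v[i])
    return out


def resample_linear(src_x, src_v, dst_x):
    """Linearly interpolate; assumes src_x is ascending and covers every target x."""
    out = []
    for x in dst_x:
        for i in range(len(src_x) - 1):
            if src_x[i] <= x <= src_x[i + 1]:
                t = (x - src_x[i]) / (src_x[i + 1] - src_x[i])
                out.append((1 - t) * src_v[i] + t * src_v[i + 1])
                break
        else:
            raise ValueError("target location outside source grid")
    return out


def max_abs_error(values, truth):
    return max(abs(v - t) for v, t in zip(values, truth))


src_x = [i * 0.5 for i in range(13)]          # source grid: 0.0 to 6.0
dst_x = [0.25 + i * 0.5 for i in range(12)]   # target grid, offset by half a cell
src_v = [math.sin(x) for x in src_x]
truth = [math.sin(x) for x in dst_x]          # known only because this is a synthetic test

results = {}
for name, method in [("nearest", resample_nearest), ("linear", resample_linear)]:
    values = method(src_x, src_v, dst_x)
    results[name] = {
        "values": values,
        # The provenance record travels with the data instead of being discarded.
        "provenance": {
            "source": "sin(x) sampled at 0.5 spacing",
            "interpolation": name,
            "max_abs_error": max_abs_error(values, truth),
        },
    }
```

With real data the true function is not available, so the error entry would have to come from an error model rather than a direct comparison; the point of the sketch is only that the result and its history are stored together.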

Question 4C: Gridded data and GIS… getting these to talk to each other. How to make smart maps of GIS point and polygon data AND gridded satellite data.

Future Development Ideas

Question 1F: 2D, 3D, 4D: What’s the future of data display?

(T1)
- Data, software, and hardware all can have different dimensionality.

- How soon, if ever, will 3 spatial dimension displays (stereo glasses, etc.) become common?

- Issues of user interaction/control of displays, augmented reality (such as an iPhone location app) where user location in the real world drives position in a 3 spatial dimension world shown on 2D hardware.

- Single user vs. collaborative displays – for a single user you could have level-of-detail optimizations, where gaze direction drives the detail of the display.

(T2)
Google Earth but with data.
Immersive technology:
2D – image, contour
3D – volume, surface
4D – 3D + time, animated
5D – multiple parameters

User interacts with data. Showing data in different ways, both geographically and analytically. Not just pretty pictures.
Probing
Transects through data
Scatter plot
Slice & dice
Time series analysis
Multiple linked views into same data (“4-up”)
Geographic displays coupled with charts
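Several of these interactions reduce to simple index operations on a time-varying grid. The Python sketch below shows probing, a time series at a point, and a transect along a row, using a tiny synthetic dataset indexed as `data[t][y][x]`. The layout, the function names, and the data values are assumptions for illustration only.

```python
def probe(data, t, y, x):
    """Return the single value at one time step and grid location."""
    return data[t][y][x]


def time_series(data, y, x):
    """Return the value at one grid location across all time steps."""
    return [frame[y][x] for frame in data]


def transect_row(data, t, y):
    """Return the values along one grid row at a single time step."""
    return list(data[t][y])


# Synthetic 3 x 3 x 3 dataset: value = 100*t + 10*y + x, so results are easy to check.
data = [[[100 * t + 10 * y + x for x in range(3)] for y in range(3)] for t in range(3)]

print(probe(data, 2, 1, 0))       # 210
print(time_series(data, 1, 2))    # [12, 112, 212]
print(transect_row(data, 0, 2))   # [20, 21, 22]
```

Linked views follow from the same idea: one selected location can feed a map display and a chart of `time_series` at once.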

For doing science, 3D has a problem with perspective.

Exploration capability. Educators are adopting 3D, via Google Earth.

It seems like we have the tools, but on a 2D display you can control only pixel color, transparency, and glyph.

(T3)
Our consensus is that 3D and beyond are not a good use of resources and that visualizations should concentrate on 2D and 2.5D. 3D is desirable, but it offers very little return on the massive effort required.

There are several reasons for this:

Humans are really poor at perceiving depth. Studies have shown that humans do not perceive more than 6 depths at one time. A person’s eyes are rarely of the same strength, which contributes to depth perception problems.

Computer technology, at least at this point, does a very poor job of creating the illusion of depth.

Human perception in the Z plane is only about 10% of our 2D perception strength.

Question 2F: Java/Spring/AJAX vs. Flex/Flash… what will rule in 5 years? What are the issues that need to be watched in terms of data visualization?

(Tri)
All of these technologies have problems. Some are heavyweight and work well for applications, but not on the web (Java). Some are overly complex (AJAX). Some are proprietary or nearly so (Flash, Silverlight). Often they lack well-defined toolkits for building effective user interfaces. Weak standards support among browsers (such as IE) complicates matters.

An evolving trend is HTML 5 and JavaScript, plus JavaScript-based toolkits. These cover up browser quirks, leverage the scheduling and sandboxing features of browser JavaScript engines, and provide a client-side GUI. These toolkits, as they emerge and mature, may provide a good technology for building vis tools accessed via browsers on platforms from desktops to iPhones.

(Square)
The level of programming skill novices need to develop rich applications would be a factor. There’s some relationship to the previous question concerning presentation vs. interactive data analysis. Flash and Java seem well established, respectively, in dealing with these broad user categories. Given the popularity of small devices and social networking, are Flash, Java, and others amenable to these environments? At the current pace of technological change, 5 years could be a little outside the range of useful prediction.

(T3)

  • Both will be around and viable/popular still
  • Plug-in presence used to be an issue, but much less so now
  • Other technologies that may emerge as contenders
    – Canvas w/video audio HTML 5 tags
    – Silverlight (less likely to dominate IOO)
  • Issues to watch in terms of data visualization
    – ability to save/export to alternate formats, e.g. export Flex app as iPhone app
    – capabilities that will emerge with Canvas
    – API (specifically relating to data import) enhancements

Question 3F: Standards for viewing earth data: what is the future… JPEG 2000? GeoTIFF? KML? Where are we going?

(T3)
The data formats are generally adequate, but the structure of the data is often inadequate. GeoTIFF can easily be abused, JPEG 2000 isn’t well supported, KML is useful but primitive (limited support for map projections). Hopefully we’ll refine existing standards, rather than proliferating poorly supported standards.

Question 4F: Data and video: Thoughts on building animations for video distribution. Where is this going?

(Red)
Tools need to have more batch processing capabilities.

Automated production (creation and serving from web) vs one-off visualizations.

Time series data vs fly-throughs – which animations need human intervention?

There is a distinction between animations and video with voice-over, music, and closed captioning. Video takes much more time and resources.

(Green)
We agree that if the data set is appropriate for time series distribution, applications should contain a component to output animation video.

(Blue)
Video is dying. It’s been years since we produced DVDs or tape. When a video is created, it’s in MPEG4, AVI, or whatever format can be played in PowerPoint or on a web page.

However, the trend is away from these canned video presentations and instead towards live demos of visualization software run on the presenter’s laptop. This is often more credible, and it allows the presenter to adapt their presentation up to the last minute before their talk, or during their talk.

Question 5F: Open Source / COTS vs. custom tools — issues with not-invented-here and the ability to write plugins for existing packages (or is the future going to continue to be comprised of an ever-expanding repertoire of software)?

(T2)
Significant factors:

  • Politics are a major driver. Motivation to take credit for tangible results, branding etc.

Assessment:

  • Status quo will likely continue. Proliferation of semi-redundant software is probably overall a positive.
    – refinement
    – reinforces successes.

(Red)
Too often scientists, etc., need to get their work done and do not (or cannot) have the luxury of time to use COTS, to do effective software engineering and management, i.e., to do the right thing from a software engineering perspective.

This is neither good nor bad – it is just the reality of the way things work.

(Green)
Open Source
Has well-documented code and API.
Good design architecture for extension and customization, i.e., has a plug-in capability. No need to change or access system software.
Good community involvement: has anyone been able to do this?
If the answers above are yes, then why not use it?

COTS: proprietary data formats are a problem. In either case, support for issues with the software may be out of the user’s control.

Custom tools can be very optimized to do a few tasks very well, from UI level to rendering level. A good visualization system would allow developers to extend the system at these different levels.
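A plug-in capability of the kind discussed here can be as small as a registry that maps names to functions, so new renderers are added without touching the core. The Python sketch below is one hypothetical shape for that; the registry, decorator, and the two toy renderers are assumptions for illustration, not the design of any particular package.

```python
RENDERERS = {}  # hypothetical registry: plug-in name -> rendering function


def register_renderer(name):
    """Decorator that adds a rendering function to the registry under a name."""
    def decorator(func):
        RENDERERS[name] = func
        return func
    return decorator


@register_renderer("ascii_bars")
def ascii_bars(values):
    """Toy plug-in: draw each value as a row of '#' characters."""
    return "\n".join("#" * int(v) for v in values)


@register_renderer("summary")
def summary(values):
    """Toy plug-in: describe the data instead of drawing it."""
    return f"min={min(values)} max={max(values)} n={len(values)}"


def render(name, values):
    """Core entry point: dispatch to whichever plug-in was requested."""
    if name not in RENDERERS:
        raise KeyError(f"no renderer plug-in named {name!r}")
    return RENDERERS[name](values)


print(render("summary", [1, 3, 2]))   # min=1 max=3 n=3
print(render("ascii_bars", [1, 2]))
```

The same pattern could be repeated at other levels, for example separate registries for data readers, UI panels, and rendering back ends.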