Output from Visualization Summit

RESPONSES TO BIG QUESTIONS (edited)

=========================

still in progress

                         Overall questions

Question 1: Imagine the perfect earth data remote sensing visualization tool/system. What are the main components of this system?

 * Needs to closely track needs and abilities of a variety of
   audiences
 * Support multiple GUIs (meaning API-based back-end to support
   tools?)
 * Open Source
 * Solves data format/standards problem -- new data type? derivative
   data types?
 * Social web, community-based: support collaborative workflow and
   data discovery
 * One-stop repository to support aggregate querying -- e.g., What is
   the causal relationship between meningitis outbreaks and
   precipitation patterns in sub-Saharan Africa?
 * Provide ability to store and track provenance

Question 2: Visualization for understanding vs. conveying -- It's all about the user: How should tools and visualizations be tailored for distinct user groups?

Two different issues:

 * Conveying -- you know the answer, need to display it in a good
   way. "Static"
 * Understanding -- trying to find the answer via exploration. Needs
   more facilities. "Interactive"

It is hard to build a general-purpose tool that satisfies both power users and non-power users across disciplines. The same tool needs to scale with the user's abilities.

Question 3: Are there user groups that you know are under-served by the current data visualization technology? What needs to change to serve these groups?

 * Everyone is underserved, but not equally.
 * PIs/those closest to data are best served by way of their
   familiarity with the data.
 * Solving this problem is an argument in favor of plug-in based
   development.
 * Tools need to be developed to support each audience -- different
   audiences have wildly different needs in terms of capabilities and
   end result.
 * Of course, serving user groups only comes after they have located
   the data.

Question 4: Delivering data vs. images of data. What are the sweet spots for each? What are the areas where we need to focus or change?

 * It depends on audience: images and data can serve both "public"
   and scientific audiences
   - In general images are great for the general public or less
   interested specialists (maybe 80%), data needs to be available for
   the interested 20%.
   - The Ben Shneiderman UI design mantra "overview, zoom & filter,
   details on demand" applies in this case.

Images sweet spots

 * As a means of discovery and monitoring production
 * Able to meet needs of wider audience [formats and general ease of
   use]

Data sweet spots

 * Scientist users who create their own visualizations
 * maintaining metadata (provenance)
 * can be used to derive multiple visualizations

Question 5: How can visualizers bridge the science-outreach divide? How to teach the public about science and the scientists about the public?

Visualization is a compelling medium that science communicators can use to make complex scientific ideas approachable to a broad audience. Carl Sagan’s Cosmos series is the prototypical example, weaving visuals with narrative to explain astrophysics. It is crucial to define an audience: there is no such thing as a uniform general public. Concrete visualizations, such as planet walks and painted lines representing sea level rise, can be particularly effective. Effective visualization requires focus: emphasize important elements of a dataset, and de-emphasize or eliminate less important data. Ideally, tools would be designed by visualizers, not computer scientists. Try for verisimilitude: make things appear how the audience expects them to appear (for example, Google Earth’s discontinuous boundaries between scenes are very distracting).

(T1) Perhaps current outreach programs are too top-down; we need a tighter, iterative relationship between viz developers, scientists and the outreach audience. We didn’t like the notion that the public can’t understand science, and we pose the question: aren’t scientists part of the public? It’s important to bring scientists and their work into the community. Can effective visualization facilitate the two-way motivation needed between scientists and the community they serve?

Features for tools: easy to read, with descriptions embedded in the displays. A simple initial presentation that allows progressive disclosure of information and concepts as far as the user desires.

(T2)

 * Tune the visualization to the audience -- but how?
   - use of toolkits, plug-ins, component frameworks
   - providing different presentations of the same data

 * Provide HTML/Flash/CMS for data
   - easier mechanism for scientists/educators to convey info
   - equivalent of wiki or “build your own web page” for
   general public

Part B, How to teach the public about science and the scientists about the public?

 * Compel scientists through new publishing requirements from NSF, etc.
 * Use emerging technologies like social media
 * Enable new ways of publishing
   – e.g. Data CMS, like RAMADDA

General comments:

 * Visualization is conveying information
 * Teacher knows the answer, needs to find way to convey
 * Scientist does not know the answer, needs exploratory analysis
 * There is a declining importance of traditional journal
   publications

                           Current Issues

Question 1C: You tackle visualization tasks every day. What is the one thing that you do every day that needs to be different in 5 years for your life to improve?

 * data search and retrieval
 * one data format to rule them all -- standards-based in all aspects
   (data structure, metadata)

Question 2C: Name the three (or more) top obstacles today that limit the effective development/deployment of earth data visualizations.

Question 3C: State of gridded data fusion capabilities: what are the main obstacles/opportunities?

(T1) Error tracking and error propagation during data fusion have to be done and the results provided to the user. For example, when resampling data for fusion we need to keep track of the errors and how they affect the grid fusion.
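As a rough illustration of what tracking error through a resampling step could look like, here is a minimal sketch for a single linear interpolation between two samples. It assumes the two input errors are independent 1-sigma values, which the notes do not state; the function name is hypothetical. It covers only propagated measurement error, not the error of the interpolation model itself.

```python
import math

def lerp_with_error(v0, s0, v1, s1, t):
    """Linearly interpolate between two samples and propagate their errors.

    v0, v1: sample values; s0, s1: their 1-sigma errors (assumed independent).
    t: fractional position between the samples, 0 <= t <= 1.
    """
    value = (1.0 - t) * v0 + t * v1
    # Variances of independent terms add, each scaled by its squared weight.
    sigma = math.sqrt(((1.0 - t) * s0) ** 2 + (t * s1) ** 2)
    return value, sigma

value, sigma = lerp_with_error(10.0, 0.5, 14.0, 0.5, 0.5)
print(value, round(sigma, 3))
```

A tool that carried `sigma` alongside `value` through every regridding step could then show the accumulated uncertainty to the user.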

Embed knowledge in the tool to facilitate data fusion: a sort of knowledge-based system that recommends to the user how the data fusion should be done.

Provide raw data, processed re-gridded data, and data fusion software packaged together, so that users have all the pieces needed for data fusion.

Algorithms, tools, and gridding are data dependent. Which techniques are used depends on the data being brought together.

Embed data units, scaling, and offsets within the data to facilitate fusion. We have data in different units; for example, we have temperature data in Fahrenheit and in Celsius. Scaling and unit conversion information has to be embedded in the data or be readily interpreted by the tool.
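A minimal sketch of how a tool might use embedded units, scaling, and offsets to bring two variables onto a common footing. The dictionary layout, the unit strings, and the function names are assumptions made for illustration; `scale_factor` and `add_offset` are named after the packing attributes commonly used in netCDF files.

```python
def unpack(raw, scale_factor=1.0, add_offset=0.0):
    """Turn a packed (stored) value into a physical value."""
    return raw * scale_factor + add_offset

def to_celsius(value, units):
    """Convert a temperature to degrees Celsius based on its units string."""
    if units == "degC":
        return value
    if units == "degF":
        return (value - 32.0) * 5.0 / 9.0
    if units == "K":
        return value - 273.15
    raise ValueError(f"unknown temperature units: {units!r}")

def decode(var):
    """Unpack every stored value of a variable and express it in Celsius."""
    return [
        to_celsius(unpack(r, var["scale_factor"], var["add_offset"]), var["units"])
        for r in var["raw"]
    ]

# Two hypothetical variables holding the same temperatures in different units.
grid_a = {"raw": [680, 770], "scale_factor": 0.1, "add_offset": 0.0, "units": "degF"}
grid_b = {"raw": [20, 25], "scale_factor": 1.0, "add_offset": 0.0, "units": "degC"}

print([round(v, 2) for v in decode(grid_a)])
print([round(v, 2) for v in decode(grid_b)])
```

Because the conversion information travels with the data, the fusion step itself never needs to know which units the provider chose.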

Strengthen interaction between standards for data fusion: OPeNDAP to OGC to web services.

(T2) How do we get the grid itself – some domains end up creating gridded data from point observation data. How do the different algorithms and parameterizations affect the outcome? How do we show the provenance?

How to handle large-scale grids – tiling, etc. It’s not just a visualization issue – we need integrated data systems to deal with this issue – not just client applications.
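To make the tiling idea concrete, here is a minimal sketch that splits a grid into fixed-size square tiles so that a server could deliver only the pieces a client needs. The function name, the tile size, and the 720×361 grid are illustrative assumptions.

```python
def tile_bounds(nx, ny, tile):
    """Yield (x0, y0, x1, y1) index ranges covering an nx-by-ny grid.

    Ranges are half-open: columns x0..x1-1 and rows y0..y1-1.
    """
    for y0 in range(0, ny, tile):
        for x0 in range(0, nx, tile):
            # Tiles on the right and bottom edges are clipped to the grid.
            yield (x0, y0, min(x0 + tile, nx), min(y0 + tile, ny))

tiles = list(tile_bounds(720, 361, 256))
print(len(tiles))
print(tiles[-1])
```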

Services include – gridding (e.g., Barnes objective analysis), varying temporal and spatial resolutions, resampling, irregular and unstructured grids, pushing analysis onto the server due to the data set size. Need to stream them, etc.
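For readers unfamiliar with the gridding service mentioned above, here is a minimal sketch of the first pass of a Barnes-style analysis: each grid value is a Gaussian distance-weighted average of the point observations. The full scheme adds successive correction passes, which are omitted here. The function name, the sample observations, and the smoothing parameter `kappa` are assumptions for illustration.

```python
import math

def barnes_first_pass(obs, grid_points, kappa):
    """Grid point observations with Gaussian distance weights.

    obs: list of (x, y, value) observations.
    grid_points: list of (x, y) target locations.
    kappa: smoothing parameter in squared distance units (larger = smoother).
    """
    result = []
    for gx, gy in grid_points:
        weight_sum = 0.0
        value_sum = 0.0
        for ox, oy, value in obs:
            r2 = (gx - ox) ** 2 + (gy - oy) ** 2
            w = math.exp(-r2 / kappa)
            weight_sum += w
            value_sum += w * value
        # If every observation is too far away the weights underflow to zero.
        result.append(value_sum / weight_sum if weight_sum > 0.0 else float("nan"))
    return result

obs = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
gridded = barnes_first_pass(obs, [(1.0, 0.0), (0.0, 0.0)], kappa=1.0)
print([round(g, 2) for g in gridded])
```

Changing `kappa` changes the gridded result, which is one reason the choice of algorithm and parameters belongs in the provenance.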

For example, in NCEP’s global 0.5 degree GFS model, a single 3D field has dimensions of 720×361×26 and 61 time steps. This results in 412,233,120 points, or about 1.6 GB of data per field. Lots of data!
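The arithmetic behind those figures can be checked with a short sketch. The 4 bytes per value is an assumption (single-precision floats); the notes do not state the storage type, but it is the size that reproduces the quoted figure. The reading of the dimensions as longitude, latitude, and vertical levels is also an assumption.

```python
# Dimensions quoted in the notes, presumably longitude x latitude x levels,
# followed by the number of time steps.
nx, ny, nz, nt = 720, 361, 26, 61

# Assumption: each value is stored as a 4-byte float.
BYTES_PER_VALUE = 4

points = nx * ny * nz * nt
size_bytes = points * BYTES_PER_VALUE

print(f"{points:,} points")
print(f"{size_bytes / 1e9:.2f} GB")
```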

(T3) Grids are a sampling of an underlying continuous function that is reality. That sampling has aliasing artifacts. Those artifacts may vary from point to point within the data (imagine a satellite image at an angle across the Earth’s surface – the sample size varies from one side of the image to the next).

Grid to grid fusion often involves resampling those grids to merge them into a common grid. This introduces more aliasing artifacts. To reduce those artifacts we should use an interpolation function that models that underlying continuous function. However, we often do not. Perhaps the “correct” interpolation function is not known, a subject of debate, or not available in the software.

To get a handle on the artifacts introduced, and the interpolation function to use (or not use), we need to track the source of the data and the propagation of error. This becomes a file format issue because files often store the data result, but not the path to that result and the error function.
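A minimal sketch of the kind of record a file format could carry so that the path to a result travels with the result. All class names, field names, and the sample values are hypothetical; this is not an existing format.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingStep:
    operation: str        # e.g. "regrid"
    parameters: dict      # settings needed to repeat the step
    error_note: str = ""  # free-text description of the error introduced

@dataclass
class GridProvenance:
    source: str
    steps: list = field(default_factory=list)

    def record(self, operation, error_note="", **parameters):
        """Append one processing step and return self so calls can be chained."""
        self.steps.append(ProcessingStep(operation, parameters, error_note))
        return self

    def describe(self):
        """Return a readable summary of the path from source to result."""
        lines = [f"source: {self.source}"]
        for number, step in enumerate(self.steps, start=1):
            text = f"{number}. {step.operation} {step.parameters} {step.error_note}"
            lines.append(text.rstrip())
        return "\n".join(lines)

history = GridProvenance(source="hypothetical_swath_file")
history.record("regrid", error_note="interpolation error not quantified",
               method="bilinear", target_resolution_deg=0.5)
history.record("unit_conversion", from_units="degF", to_units="degC")
print(history.describe())
```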

Question 4C: Gridded data and GIS... getting these to talk to each other. How to make smart maps of GIS point and polygon data AND gridded satellite data.

                      Future Development Ideas

Question 1F: 2D, 3D, 4D: What's the future of data display?

(T1) - Data, software, and hardware all can have different dimensionality.

- How soon, if ever, will displays with 3 spatial dimensions (stereo glasses, etc.) become common?

- Issues of user interaction/control of displays, and augmented reality (such as an iPhone location app), where the user’s location in the real world drives position in a world of 3 spatial dimensions shown on 2D hardware.

- Single user vs. collaborative displays – for single user you could have level of detail optimizations, where eye look direction drives detail of display.

(T2) Google Earth, but with data. Immersive technology:

 * 2D – image, contour
 * 3D – volume, surface
 * 4D – 3D + time, animated
 * 5D – multiple parameters

User interacts with data. Showing data in different ways, both geographically and analytically. Not just pretty pictures.

 * Probing
 * Transects through data
 * Scatter plot
 * Slice & dice
 * Time series analysis
 * Multiple linked views into same data (“4-up”)
 * Geographic displays coupled with charts

For doing science, 3D has a problem with perspective

Exploration capability. Educators are adopting 3D, via Google Earth.

Seems like we have the tools. But on a 2D display, you can control pixel color, transparency, and glyph, that’s it.

(T3) Our consensus is that 3D and beyond are not a good use of resources and that visualizations should concentrate on 2D and 2.5D. 3D is desirable, but offers very little return on the massive effort required.

There are many reasons for this:

Humans are really poor at perceiving depth. Studies have shown that humans do not perceive more than 6 depths at one time. Human eyes are rarely of the same strength, which contributes to depth perception problems.

Computer technology, at least at this point, does a very poor job of creating the illusion of depth.

Human perception in the Z plane is only about 10% of our 2D perception strength.

Question 2F: Java/Spring/AJAX vs. Flex/Flash... what will rule in 5 years? What are the issues that need to be watched in terms of data visualization?

(Tri) All of these technologies have problems. Some are heavyweight and work well for applications, but not on the web (Java). Some are overly complex (AJAX). Some are proprietary or nearly so (Flash, Silverlight). Often they lack well-defined toolkits for building effective user interfaces. Weak standards support among browsers (such as IE) complicates matters.

An evolving trend is HTML 5 and JavaScript, plus JavaScript-based toolkits. These cover up browser quirks, leverage the scheduling and sandboxing features of browser JavaScript engines, and provide a client-side GUI. These toolkits, as they emerge and mature, may provide a good technology for building vis tools accessed via browsers on platforms from desktops to iPhones.

(Square) The level of programming skill novices need to develop rich applications would be a factor. There’s some relationship to the previous question concerning presentation vs. interactive data analysis: Flash and Java seem well established, respectively, in dealing with these broad user categories. Given the popularity of small devices and social networking, are Flash, Java and others amenable to these environments? With the current pace of technological acceleration, 5 years could be a little outside the range of useful prediction.

(T3)

 * Both will be around and viable/popular still

 * Plug-in presence used to be an issue, but much less so now

 * Other technologies that may emerge as contenders
   – Canvas w/video audio HTML 5 tags
   – Silverlight (less likely to dominate IOO)

 * Issues to watch in terms of data visualization
   – ability to save/export to alternate formats
   e.g. export flex app as iPhone app
   – capabilities that will emerge with Canvas
   – API (specifically relating to data import) enhancements

Question 3F: Standards for viewing earth data: what is the future... JPEG 2000? GeoTIFF? KML? Where are we going?

(T3) The data formats are generally adequate, but the structure of the data is often inadequate. GeoTIFF can easily be abused, JPEG 2000 isn’t well supported, KML is useful but primitive (limited support for map projections). Hopefully we’ll refine existing standards, rather than proliferating poorly supported standards.

Question 4F: Data and video: Thoughts on building animations for video distribution. Where is this going?

(Red) Tools need to have more batch processing capabilities.

Automated production (creation and serving from web) vs one-off visualizations.

Time series data vs fly-throughs – which animations need human intervention.

There is a distinction between animations vs video with voice-over, music, closed captioning. Video takes much more time, resources.

(Green) We agree that if the data set is appropriate for time series distribution, applications should contain a component to output animation video.

(Blue) Video is dying. It’s been years since we produced DVDs or tape. When a video is created, it’s in MPEG4, AVI, or whatever format can be played in PowerPoint or on a web page.

However, the trend is away from these canned video presentations and instead towards live demos of visualization software run on the presenter’s laptop. This is often more credible, and it allows the presenter to adapt their presentation up to the last minute before their talk, or during their talk.

Question 5F: Open Source / COTS vs. custom tools -- issues with not-invented-here and the ability to write plugins for existing packages (or is the future going to continue to be comprised of an ever-expanding repertoire of software)?

(T2) Significant factors:

 * Politics are a major driver. Motivation to take credit for
   tangible results, branding etc.

Assessment:

 * Status quo will likely continue. Proliferation of semi-redundant
   software is probably overall a positive.
   – refinement
   – reinforces successes.

(Red) Too often scientists, etc., need to get their work done and do not (or cannot) have the luxury of time to use COTS, to do effective software engineering and management, i.e., to do the right thing from a software engineering perspective.

This is neither good nor bad – it is just the reality of the way things work.

(Green) Open Source:

 * Has well-documented code and API
 * Good design architecture for extension and customization, i.e.,
   has a plug-in capability
 * No need to change or access system software
 * Good community involvement
 * Has anyone been able to do this? If the answer to the above is
   yes, then why not use it?

COTS: proprietary data formats are a problem. In either case, support for issues with the software may be out of the user’s control.

Custom tools can be very optimized to do a few tasks very well, from UI level to rendering level. A good visualization system would allow developers to extend the system at these different levels.

                       Book related questions

Question 1B: Specific content suggestions

 * Data / metadata standards and conventions
 * Standards and conventions for visual display (color map issues /
   conflicts between domains)
 * Color maps – selection based upon different intents (perception of
   different colors; generic conventions such as “blue = cold”,
   “red = hot”; “rainbow” is problematic, etc.)
 * Software interoperability (and relation to file format standards
   and conventions)
 * Survey of existing tools. What’s the current state of the
   technology? What software is available now? What needs to be
   fixed?
 * Future directions. What do we need to do better?
 * Common (dimension-independent) issues vs. problems specific to 2D
   or 3D data

(T3)

 * How to incentivize sharing of data in usable formats
   - a) requiring it
   - b) build an online community, with a ranking system for
   contributors
   - c) build the best tool, and providers will want to comply (Google
   Earth is a good example – everyone wants to get their data working
   in GE now)
 * The most common current problems with the status quo: data
   retrieval, data formats, etc.
 * Current state-of-the-art tool summaries
 * Post meeting, have all attendees agree on book chapter outline

(Tri)

 * Should discuss file formats/interoperability issues in an abstract
   way
 * Examples of successes: can be partial successes such as NetCDF,
   which has a very flexible syntax, but conventions for using that
   syntax were not well established, so many variations exist
 * Describe example visualizations, going through the thought behind
   design decisions – examples of this already are “sound on sound”
   tutorials and visualization blogs such as http://eagereyes.org/

Question 2B: Audience ideas and considerations

 * non-professional visualizers as a way of communicating the
   practices, but not "dumbed-down"

Question 3B: Website issues (links to durable URLs for product info, etc.)

 * durable links
 * Have online examples or not: examples will become dated quickly
   vs. interaction with examples discussed in book that will
   facilitate understanding of them

Question 4B: Getting others involved OR not

Perhaps, but there must be a strong commitment; the number of contributors still needs to be kept small so that logistics do not overwhelm.

What is the equivalent of pagerank for searching geoscience data?

Criteria for computing "pagerank":

 * Spatial and temporal resolution (assuming high-frequency and
   -resolution = good)
 * Time distance from desired date
 * Quality of metadata (compliant with standards)
 * Includes calibration / control data
 * Provenance
 * Lowest error/trustworthiness
 * Data provider (credibility of curator)
 * Consistency (outlier detection)
 * Supported data formats (NetCDF vs. custom binary)
 * Popularity (links, citations, social ranking)
 * Frequency of updates (is dataset current and reliably updated)
 * Access method (direct online access better than placing an order)
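One simple way such criteria could be combined is a weighted average of per-criterion scores. The sketch below is only an illustration: the weights, the criterion keys, and the two sample datasets are invented, since the notes list the criteria but not their relative importance or how each would be measured.

```python
# Hypothetical weights; a real system would need to agree on these.
WEIGHTS = {
    "resolution": 2.0,
    "time_distance": 2.0,
    "metadata_quality": 1.5,
    "provenance": 1.5,
    "provider_credibility": 1.0,
    "popularity": 1.0,
    "access_method": 0.5,
}

def rank_score(scores, weights=WEIGHTS):
    """Weighted average of per-criterion scores, each expected in [0, 1].

    Criteria missing from `scores` count as 0.
    """
    total_weight = sum(weights.values())
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight

datasets = {
    "dataset_a": {"resolution": 0.9, "time_distance": 0.8, "metadata_quality": 1.0,
                  "provenance": 1.0, "provider_credibility": 0.9, "popularity": 0.4,
                  "access_method": 1.0},
    "dataset_b": {"resolution": 1.0, "time_distance": 0.9, "metadata_quality": 0.2,
                  "popularity": 0.9, "access_method": 0.0},
}

ranked = sorted(datasets, key=lambda name: rank_score(datasets[name]), reverse=True)
print(ranked)
```

With these made-up weights, a dataset with complete metadata and provenance outranks a slightly higher-resolution one that lacks them.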

Common Sources of Error and How Can Visualization Tools Help? (not absolutely certain what this actual question was)

Error or uncertainty? Uncertainty can be shown by flagging areas at certain data quality levels, showing confidence intervals, or plotting the results of model ensembles.
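As a small illustration of the model ensemble option, the sketch below reduces a set of ensemble members to a per-point mean and spread, then flags points whose spread exceeds a threshold. The function names, the sample values, and the threshold are assumptions; standard deviation is just one possible measure of spread.

```python
import statistics

def ensemble_summary(members):
    """Return per-point mean and sample standard deviation.

    members: list of equal-length lists, one list per ensemble member.
    """
    means, spreads = [], []
    for values in zip(*members):
        means.append(statistics.mean(values))
        spreads.append(statistics.stdev(values))
    return means, spreads

def flag_uncertain(spreads, threshold):
    """Mark points whose ensemble spread exceeds the threshold."""
    return [s > threshold for s in spreads]

# Three hypothetical ensemble members, each with values at two grid points.
members = [[1.0, 10.0], [2.0, 10.5], [3.0, 9.5]]
means, spreads = ensemble_summary(members)
flags = flag_uncertain(spreads, threshold=0.75)
print(means, spreads, flags)
```

The flags could drive hatching or desaturation in a map, and the spreads could drive error bars in a chart.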

Sources of error:

 * Floating-point representation as text
 * Gridding (of points), resampling, changing resolution,
   reprojecting
 * Interpolation
 * Incomplete or inconsistent metadata
 * Undocumented satellite data correction
 * Incorrect math (e.g., floor vs. ceil vs. trunc)
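Two of these sources are easy to demonstrate. The sketch below shows how rounding functions disagree for negative numbers, which matters when turning a coordinate into a grid cell index, and how writing a float as short text loses information. The `cell_index` helper and the sample coordinates are hypothetical.

```python
import math

# floor, ceil and trunc agree only in special cases; for negatives they differ.
x = -2.5
print(math.floor(x), math.ceil(x), math.trunc(x))

def cell_index(coord, origin, spacing):
    """Index of the grid cell containing coord, using floor (rounds down)."""
    return math.floor((coord - origin) / spacing)

# A longitude just west of the origin belongs to cell -1, not cell 0.
floor_index = cell_index(-0.25, 0.0, 0.5)
trunc_index = math.trunc((-0.25 - 0.0) / 0.5)
print(floor_index, trunc_index)

# Writing a float with a fixed number of decimals does not round-trip.
value = 0.1 + 0.2
short_text = f"{value:.2f}"
full_text = repr(value)
print(float(short_text) == value, float(full_text) == value)
```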

What visualization tools can do:

 * Represent error in visualization using error bars, or color coding
   by uncertainty
 * Only possible when the error or possible sources of error are
   captured in the data or metadata