Lecture 23 Information Visualization

“The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that’s going to be a hugely important skill in the next decades, … because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.” Hal Varian, Google’s Chief Economist, The McKinsey Quarterly, Jan 2009

Why Infoviz?

Analyze

  • Expand memory
  • Find patterns
  • Develop and assess hypotheses
  • Discover errors in data

Communicate

  • Communicate information to others
  • “Seeing is believing”
  • “A picture is worth a thousand words”
  • Share and persuade
  • Collaborate and revise

Human Insight

Anscombe's quartet
I x  | I y   | II x | II y | III x | III y | IV x | IV y
10.0 | 8.04  | 10.0 | 9.14 | 10.0  | 7.46  | 8.0  | 6.58
8.0  | 6.95  | 8.0  | 8.14 | 8.0   | 6.77  | 8.0  | 5.76
13.0 | 7.58  | 13.0 | 8.74 | 13.0  | 12.74 | 8.0  | 7.71
9.0  | 8.81  | 9.0  | 8.77 | 9.0   | 7.11  | 8.0  | 8.84
11.0 | 8.33  | 11.0 | 9.26 | 11.0  | 7.81  | 8.0  | 8.47
14.0 | 9.96  | 14.0 | 8.10 | 14.0  | 8.84  | 8.0  | 7.04
6.0  | 7.24  | 6.0  | 6.13 | 6.0   | 6.08  | 8.0  | 5.25
4.0  | 4.26  | 4.0  | 3.10 | 4.0   | 5.39  | 19.0 | 12.50
12.0 | 10.84 | 12.0 | 9.13 | 12.0  | 8.15  | 8.0  | 5.56
7.0  | 4.82  | 7.0  | 7.26 | 7.0   | 6.42  | 8.0  | 7.91
5.0  | 5.68  | 5.0  | 4.74 | 5.0   | 5.73  | 8.0  | 6.89
  • 4 x-y data sets
  • Nearly identical means, standard deviations, correlations, and linear regression lines (equal to two or three decimal places)
  • What’s the difference?
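The claim that the four sets share their summary statistics can be checked directly from the table. A minimal sketch using only the Python standard library; the `QUARTET` dictionary and the `summarize` function are illustrative names, not part of the lecture:

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the table: set name -> (x values, y values).
QUARTET = {
    "I": ([10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0],
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": ([10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0],
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": ([10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0],
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Means, sample variances, Pearson r, and least-squares fit y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),
        "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }

for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name.ljust(4) + "  ".join(f"{key}={value:.3f}" for key, value in stats.items()))
```

The four printed rows agree to two or three decimal places, which is exactly why the numbers alone cannot reveal that set II is a curve, set III has one outlier, and set IV is a vertical stack plus one extreme point.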

The Datasaurus

Justin Matejka, George Fitzmaurice. Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. CHI 2017
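The paper's title names the technique: repeatedly nudge points toward a target shape, keep only moves that leave the summary statistics unchanged to two decimal places, and, as in simulated annealing, occasionally accept a move away from the target while the "temperature" is still high. The Python sketch below is a simplified illustration of that idea, not the authors' code; the circular target, step size, linear cooling schedule, and the names `rounded_stats`, `dist_to_circle`, and `morph` are all assumptions.

```python
import math
import random

def rounded_stats(xs, ys, places=2):
    """x/y means, x/y sample standard deviations, and Pearson r, rounded to `places`."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    stats = (mx, my, math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
             sxy / math.sqrt(sxx * syy))
    return tuple(round(s, places) for s in stats)

def dist_to_circle(x, y, center, radius):
    """Distance from (x, y) to a circle, used here as a hypothetical target shape."""
    return abs(math.hypot(x - center[0], y - center[1]) - radius)

def morph(xs, ys, center, radius, iters=20_000, step=0.5, max_temp=0.4, seed=0):
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)
    target_stats = rounded_stats(xs, ys)  # every accepted state must keep these
    for i in range(iters):
        temp = max_temp * (1 - i / iters)  # assumed linear cooling schedule
        k = rng.randrange(len(xs))
        old = (xs[k], ys[k])
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        closer = dist_to_circle(*new, center, radius) < dist_to_circle(*old, center, radius)
        if not closer and rng.random() >= temp:
            continue  # reject a worse move unless the temperature allows it
        xs[k], ys[k] = new
        if rounded_stats(xs, ys) != target_stats:
            xs[k], ys[k] = old  # undo: the rounded statistics drifted
    return xs, ys

rng = random.Random(1)
xs0 = [rng.uniform(20, 80) for _ in range(60)]
ys0 = [rng.uniform(20, 80) for _ in range(60)]
xs1, ys1 = morph(xs0, ys0, center=(50, 50), radius=25, iters=10_000)
print(rounded_stats(xs0, ys0))
print(rounded_stats(xs1, ys1))  # same tuple as the line above
```

Plotting the points before and after with any plotting tool should show them drifting toward the circle, while the two printed tuples stay identical by construction.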

Cholera Map

Snow, John (1855). On the Mode of Communication of Cholera.
  • John Snow
    • Often regarded as a founder of modern epidemiology
  • 1854 cholera outbreak in Soho, London
  • At the time, cholera was widely believed to spread through "miasma" (bad air)
  • Snow plotted the homes of infected patients on a map
  • Identified the center of infection at a water pump on Broad Street
  • Removing the pump handle is credited with ending the outbreak, although cases were already declining

Nightingale's Rose

Example of polar area diagram by Florence Nightingale (1820–1910).

This "Diagram of the causes of mortality in the army in the East" was published in Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army and sent to Queen Victoria in 1858.

This graphic indicates the number of deaths that occurred from preventable diseases (in blue), those that were the results of wounds (in red), and those due to other causes (in black).

The legend reads: The Areas of the blue, red, & black wedges are each measured from the centre as the common vertex. The blue wedges measured from the centre of the circle represent area for area the deaths from Preventable or Mitigable Zymotic diseases, the red wedges measured from the centre the deaths from wounds, & the black wedges measured from the centre the deaths from all other causes. The black line across the red triangle in Nov. 1854 marks the boundary of the deaths from all other causes during the month. In October 1854, & April 1855, the black area coincides with the red, in January & February 1855, the blue coincides with the black. The entire areas may be compared by following the blue, the red, & the black lines enclosing them.

  • Work by Florence Nightingale
    • Famed nurse, but also accomplished statistician
    • Innovator in info viz
  • Deaths during Crimean War
    • Red: wounds
    • Blue: preventable disease
    • Black: other causes
  • Led to commission of inquiry on sanitation
  • Led to significant improvements in barracks and hospitals
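Because the legend says the wedges represent deaths "area for area", a wedge's radius must grow with the square root of the count, not linearly. A small Python sketch of that conversion, assuming twelve equal monthly wedges; the function names and the `scale` parameter are illustrative:

```python
import math

def wedge_radius(count, n_wedges=12, scale=1.0):
    """Radius that makes one circular sector's area equal scale * count."""
    theta = 2 * math.pi / n_wedges  # every month gets the same angle
    # Sector area is 0.5 * theta * r**2, so r = sqrt(2 * area / theta).
    return math.sqrt(2 * scale * count / theta)

def wedge_area(radius, n_wedges=12):
    """Area of one sector with the given radius."""
    theta = 2 * math.pi / n_wedges
    return 0.5 * theta * radius ** 2

# Four times the deaths -> twice the radius, four times the area.
print(round(wedge_radius(400) / wedge_radius(100), 6))  # 2.0
```

Scaling the radius linearly instead would make a fourfold increase look sixteen times larger, the same size-encoding trap the lecture returns to later.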

Napoleon's Russian Campaign

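Charles Minard's 1869 map of Napoleon's 1812 campaign shows six types of information: the size of the army (troops remaining), its location in two geographic dimensions, the course and direction of its movement, temperature, and time (dates during the retreat).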

Small Multiples

  • Eadweard Muybridge (1878): do a horse's feet all leave the ground at once during a gallop?
  • Small multiples arrange many similar visualizations together
  • Permit comparison
  • Animation uses adjacency in time instead of space
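Small multiples only support comparison when every panel shares the same scale. A text-only Python sketch: each series becomes a one-line sparkline, drawn either on a shared scale or on its own scale. The series names and values are made up for illustration, and the shared-scale option is an assumption about good practice rather than something stated on the slide.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight levels, lowest to highest

def sparkline(values, lo, hi):
    """Map each value onto one of eight bar characters between lo and hi."""
    span = (hi - lo) or 1  # avoid dividing by zero for a flat series
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1) + 0.5)] for v in values)

def small_multiples(series, shared_scale=True):
    """Render one sparkline per series, optionally using one scale for all panels."""
    all_values = [v for vals in series.values() for v in vals]
    lines = []
    for name, vals in series.items():
        lo, hi = (min(all_values), max(all_values)) if shared_scale else (min(vals), max(vals))
        lines.append(f"{name}: {sparkline(vals, lo, hi)}")
    return "\n".join(lines)

demo = {"A": [0, 4, 8], "B": [0, 2, 4]}  # hypothetical data
print(small_multiples(demo, shared_scale=True))   # A: ▁▅█ / B: ▁▃▅
print(small_multiples(demo, shared_scale=False))  # A: ▁▅█ / B: ▁▅█
```

With independent scales, series B looks identical to A even though its values are half as large; the shared scale keeps that difference visible across panels.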

NYC Weather

Newspapers and magazines usually choose simplicity over detail, because it is hard to show much information in a single, comprehensible display. Here, by contrast, the attention to detail and to graphic design is evident.

This graph, from the New York Times (Jan. 7, 1979), valiantly and successfully shows 2,200 numbers summarizing the trends and patterns in New York City's weather in 1980. The three aligned charts show temperature, precipitation, and relative humidity. In the temperature chart, the area between the daily low and the daily high is filled.

What makes this graph successful despite the large amount of information presented is (a) clear visual comparison between the 1980 data and the long-run averages, (b) clear textual labels, and (c) visual separation of the three series. For example, it is easy to see that March and April were about normal in temperature but much wetter.

What is Visualization?

Why?

Taxonomy

Charts

Common Chart Types

  • Line
  • Bar
  • Pie
  • Scatter
  • Bubble
  • Radar

Visual Variables

As defined by Jacques Bertin (1974)

Pre-Attentive Processing

“Pre-attentive processing of visual information is performed automatically on the entire visual field detecting basic features of objects in the display. Such basic features include colors, closure, line ends, contrast, tilt, curvature and size. These simple features are extracted from the visual display in the pre-attentive system and later joined in the focused attention system into coherent objects. Pre-attentive processing is done quickly, effortlessly and in parallel without any attention being focused on the display.” Treisman (1985)

Pre-attentive Visual Variables

  • Color (Healey '96)
  • Shape (Chipman '96)
  • Combining color and shape prevents pre-attentive detection (Healey '96)

Pre-Attentive Visual Variables

  • length: Treisman & Gormican [1988]
  • width: Julesz [1985]
  • size: Treisman & Gelade [1980]
  • curvature: Treisman & Gormican [1988]
  • number: Julesz [1985]; Trick & Pylyshyn [1994]
  • terminators: Julesz & Bergen [1983]
  • intersection: Julesz & Bergen [1983]
  • closure: Enns [1986]; Treisman & Souther [1985]
  • color (hue): Nagy & Sanchez [1990, 1992]; D'Zmura [1991]; Kawai et al. [1995]; Bauer et al. [1996]
  • intensity: Beck et al. [1983]; Treisman & Gormican [1988]
  • flicker: Julesz [1971]
  • direction of motion: Nakayama & Silverman [1986]; Driver & McLeod [1992]
  • binocular lustre: Wolfe & Franzel [1988]
  • stereoscopic depth: Nakayama & Silverman [1986]
  • 3-D depth cues: Enns [1990]
  • lighting direction: Enns [1990]

Text Not Pre-attentive

SUBJECTPUNCHEDQUICKLYOXIDIZEDTCEJBUSDEHCNUPYLKCIUQDEZIDIXO
CERTAINQUICKLYPUNCHEDMETHODSNIATRECYLKCIUQDEHCNUPSDOHTEM
SCIENCEENGLISHRECORDSCOLUMNSECNEICSHSILGNESDROCERSNMULOC
GOVERNSPRECISEEXAMPLEMERCURYSNREVOGESICERPELPMAXEYRUCREM
CERTAINQUICKLYPUNCHEDMETHODSNIATRECYLKCIUQDEHCNUPSDOHTEM
GOVERNSPRECISEEXAMPLEMERCURYSNREVOGESICERPELPMAXEYRUCREM
SCIENCEENGLISHRECORDSCOLUMNSECNEICSHSILGNESDROCERSNMULOC
SUBJECTPUNCHEDQUICKLYOXIDIZEDTCEJBUSDEHCNUPYLKCIUQDEZIDIXO
CERTAINQUICKLYPUNCHEDMETHODSNIATRECYLKCIUQDEHCNUPSDOHTEM
SCIENCEENGLISHRECORDSCOLUMNSECNEICSHSILGNESDROCERSNMULOC

Elementary Abstract Data Types

Nominal (qualitative)

  • no inherent order
  • city names, types of diseases, ...

Ordinal (qualitative)

  • ordered, but not at measurable intervals
  • cold, warm, hot; historical eras …

Quantitative

  • Numeric
  • Some absolute (fixed zero, ratio scale): mass, length
  • Some relative (arbitrary zero, interval scale): date, position
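The absolute/relative distinction decides which arithmetic is meaningful. A short Python sketch using only the standard library; the dates, masses, epochs, and the `days_since` helper are arbitrary examples chosen for illustration. Differences between dates do not depend on the chosen zero, but their ratios do, whereas a ratio of masses survives a change of units.

```python
from datetime import date

def days_since(d, epoch):
    """Days from an (arbitrary) epoch to date d."""
    return (d - epoch).days

d1, d2 = date(2000, 1, 1), date(2020, 1, 1)

# Relative quantity (arbitrary zero): the difference is the same for any epoch ...
for epoch in (date(1, 1, 1), date(1970, 1, 1)):
    assert days_since(d2, epoch) - days_since(d1, epoch) == 7305

# ... but the ratio changes with the epoch, so "twice as late" has no meaning.
ratio_ce = days_since(d2, date(1, 1, 1)) / days_since(d1, date(1, 1, 1))
ratio_unix = days_since(d2, date(1970, 1, 1)) / days_since(d1, date(1970, 1, 1))
print(f"{ratio_ce:.3f} vs {ratio_unix:.3f}")

# Absolute quantity (fixed zero): a mass ratio does not depend on the unit.
KG_TO_LB = 2.20462
heavy_kg, light_kg = 10.0, 5.0
print(heavy_kg / light_kg, (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB))
```

This is why encodings that invite ratio judgments (such as bar length from a zero baseline) suit absolute quantities better than relative ones.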

Relational

  • Connections between items
  • Social network, subway map

Visual Variables

Jacques Bertin (1974)
Which visual variables for which abstract data types?

Visual Variable Accuracy

Mackinlay '88; Cleveland & McGill '84

Ranking Visual Variables by Utility

Rank | Quantitative     | Ordinal          | Nominal
1    | Position         | Position         | Position
2    | Length           | Density          | Color Hue
3    | Angle            | Color Saturation | Texture
4    | Slope            | Color Hue        | Connection
5    | Area             | Texture          | Containment
6    | Volume           | Connection       | Density
7    | Density          | Containment      | Color Saturation
8    | Color Saturation | Length           | Shape
9    | Color Hue        | Angle            | Length
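One way to use such a ranking is a greedy assignment in the spirit of Mackinlay's work: take data fields in order of importance and give each one the highest-ranked visual variable for its type that is still unused. The Python sketch below uses only the nine ranks listed above; the greedy strategy, the one-use-per-variable rule (which ignores that position offers both x and y), and the field names are simplifying assumptions, not Mackinlay's actual algorithm.

```python
# Rankings from the table (best first), truncated at rank 9.
RANKING = {
    "quantitative": ["Position", "Length", "Angle", "Slope", "Area", "Volume",
                     "Density", "Color Saturation", "Color Hue"],
    "ordinal": ["Position", "Density", "Color Saturation", "Color Hue", "Texture",
                "Connection", "Containment", "Length", "Angle"],
    "nominal": ["Position", "Color Hue", "Texture", "Connection", "Containment",
                "Density", "Color Saturation", "Shape", "Length"],
}

def assign_encodings(fields):
    """Greedily map (name, data_type) pairs, most important first, to visual variables."""
    used, encoding = set(), {}
    for name, data_type in fields:
        choice = next((v for v in RANKING[data_type] if v not in used), None)
        if choice is None:
            raise ValueError(f"no unused visual variable left for {name!r}")
        used.add(choice)
        encoding[name] = choice
    return encoding

# Hypothetical fields, listed from most to least important.
print(assign_encodings([("price", "quantitative"), ("category", "nominal"), ("rating", "ordinal")]))
# {'price': 'Position', 'category': 'Color Hue', 'rating': 'Density'}
```

Ordering fields by importance matters: the most important field takes position, and later fields fall back to less accurate channels.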

Pie Charts

“There is no data that can be displayed in a pie chart that cannot be displayed better in some other type of chart.” John Tukey

Case Study:
Challenger Shuttle Explosion

History

Challenger disaster – Jan 28, 1986. Space shuttle Challenger broke apart 73 seconds into its flight. Disintegration of the entire vehicle began after an O-ring seal in its right solid rocket booster failed at liftoff.
  • Jan 28, 1986
  • Challenger shuttle scheduled for launch
  • History of problems with O-rings
  • Unusually cold weather
  • Engineers argued against launch
  • NASA went ahead anyway
  • Why?
"With the data available to them, and with NASA knowing as well as they that the design was flawed and that temperature might be a causal factor, the engineers argued that the Challenger ought not to fly so far out of the field database, the firmest evidence available." Robison et al.

Debate about Launch


  • 2 of the 13 pages of material faxed to NASA by Morton Thiokol
  • No author lists, so responsibility does not fall on any one person
  • 3 different names for the same rocket: 61A LH (NASA number), SRM no. 22A (Thiokol number), and the launch date (handwritten in the margin)
  • 6 types of damage: erosion, soot, depth, location, extent, view
  • Reader must integrate information across many charts and examples

Viz to Make the Argument

  • Visualization by Morton Thiokol engineers
  • Shows problems and non-problems
  • Temperature is not encoded pre-attentively
  • Viewer must look closely and read the labels

A Better Alternative

Summary

Your Lying Eyes

Howard Wainer, “How to Display Data Badly,” The American Statistician, 1984

What's Wrong?

Examine the axes…

The Baseline Matters
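A truncated baseline changes the ratio the eye reads from bar lengths. A small Python sketch of the arithmetic; the values 50 and 55, the baseline of 45, and the `apparent_ratio` name are hypothetical:

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of two bar lengths when both bars start at `baseline` instead of zero."""
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)

print(apparent_ratio(50, 55))               # true ratio: 1.1
print(apparent_ratio(50, 55, baseline=45))  # bars look twice as different: 2.0
```

A 10% difference in the data is drawn as a 100% difference in bar length, which is why bar charts should normally start at zero.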

What's Wrong?

Magazine article displaying Cornell's cost and rank.
  • A low rank number is good!
  • Different time scales
    • Tuition from 1965
    • Rank from 1988
  • Not really tuition
    • Relative to income

False Perspective

Size Encoding

  • What is the visual variable?
  • Height? Diameter? Surface area? Volume?
  • Data change 5.5x
  • Volume change 270x
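Tufte quantifies this kind of distortion with his Lie Factor: the size of the effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is |second - first| / first. A Python sketch that takes the slide's 5.5x data change and 270x volume change at face value; the function names are illustrative:

```python
def effect_size(first, second):
    """Relative change between two values, Tufte's 'size of effect'."""
    return abs(second - first) / first

def lie_factor(data_first, data_second, shown_first, shown_second):
    """Effect shown in the graphic divided by the effect in the data (1.0 = faithful)."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)

def volume_ratio(linear_scale):
    """Scaling all three dimensions of a drawn object multiplies its volume by scale**3."""
    return linear_scale ** 3

print(round(lie_factor(1, 5.5, 1, 270), 1))  # 59.8
print(volume_ratio(2))                       # 8
```

Tufte treats Lie Factors outside roughly 0.95 to 1.05 as substantial distortion; a value near 60 means the drawn change vastly overstates the real one.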

Network Visualization

  • Core and Outliers?
  • In fact, symmetric 3x3x4 torus!
  • The apparent core is an artifact of layout algorithms and dimensionality constraints
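The torus claim can be checked without drawing anything: in a 3x3x4 torus every node has the same degree and the same eccentricity (its largest shortest-path distance to any other node), so no node is structurally more central than another. A standard-library Python sketch that builds the graph and runs a breadth-first search from every node; `torus_graph` and `eccentricity` are illustrative names:

```python
from collections import deque
from itertools import product

def torus_graph(dims):
    """Adjacency sets for a wrap-around grid (torus) with the given side lengths."""
    nodes = list(product(*(range(d) for d in dims)))
    adj = {n: set() for n in nodes}
    for n in nodes:
        for axis, size in enumerate(dims):
            for step in (-1, 1):
                m = list(n)
                m[axis] = (m[axis] + step) % size  # wrap around the edge
                adj[n].add(tuple(m))
    return adj

def eccentricity(adj, start):
    """Largest breadth-first-search distance from start to any other node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

adj = torus_graph((3, 3, 4))
print(len(adj), {len(nbrs) for nbrs in adj.values()}, {eccentricity(adj, n) for n in adj})
# 36 {6} {4}
```

Each set contains a single value, so every node looks the same from the graph's point of view; any "core" in a 2-D drawing must come from the layout, not the data.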

How Not to Lie

Tufte: Graphical Integrity

Summary

Interaction

Outline

Exploratory Data Analysis

John Tukey

Invented:

John Tukey

“If we need a short suggestion of what exploratory data analysis is, I would suggest that it is
  • an attitude and
  • a flexibility and
  • some graph paper (or transparencies, or both).
No catalogue of techniques can convey a willingness to look for what can be seen, whether or not anticipated. Yet this is at the heart of exploratory data analysis. The graph paper - and transparencies - are there, not as a technique, but rather as recognition that the picture-examining eye is the best finder we have of the wholly unanticipated.”

Exploratory Data Analysis

John Tukey
  • Pioneered use of computer visualization in exploratory data analysis
  • PRIM-9 System
    • Project, Rotate, Isolate, Mask (filter) in up to 9 dimensions
  • Direct manipulation
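The PRIM-9 operations correspond to simple geometric steps. The Python sketch below is illustrative only and is not PRIM-9's implementation: rotation within the plane of two coordinate axes, projection onto two axes for display, and masking as a filter. The function names and the 4-D sample points are hypothetical.

```python
import math

def rotate(points, i, j, angle):
    """Rotate every point by `angle` radians within the plane of axes i and j."""
    c, s = math.cos(angle), math.sin(angle)
    rotated = []
    for p in points:
        q = list(p)
        q[i] = c * p[i] - s * p[j]
        q[j] = s * p[i] + c * p[j]
        rotated.append(tuple(q))
    return rotated

def project(points, axes=(0, 1)):
    """Keep only the chosen coordinates, e.g. the two shown on screen."""
    return [tuple(p[a] for a in axes) for p in points]

def mask(points, keep):
    """Hide points that fail the predicate (isolating a subset works the same way)."""
    return [p for p in points if keep(p)]

# Hypothetical 4-D data: spin it in the (0, 2) plane, then view axes 0 and 1.
data = [(1.0, 0.0, 0.0, 2.0), (0.0, 1.0, 1.0, -1.0), (2.0, 2.0, 0.5, 0.0)]
for _ in range(3):  # each step is one frame of an animated rotation
    data = rotate(data, 0, 2, math.radians(10))
    print(project(mask(data, lambda p: p[3] >= 0.0)))
```

Redrawing after each small rotation is what lets the eye pick out structure that no single 2-D projection reveals.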

Summary

Data Interaction

Goal

Direct Manipulation Motivation

Advanced Users

  • Think abstractly
  • Have a language of relevant actions
  • Can describe a sequence of actions (program)
  • Can debug, retry
  • Read the manual

Amateurs/Novices

  • Think concretely
  • Know what they want done, but not what to call it
  • Can show, not tell
  • Don't know what they did wrong
  • Learn by doing

Direct Manipulation Examples

Advanced Users

  • DOS command line
  • Hotkeys
  • HTML/LaTeX/WikiText source code
  • Write programs
  • Write SQL database queries
  • Zork, Adventure

Amateurs/Novices

  • Desktop files & folders
  • Menus
  • WYSIWYG editors
  • Record macros
  • Excel, some Access
  • Pong, Space Invaders

Direct Manipulation Paradigm

Shneiderman, Ben (1983). Direct Manipulation: A Step Beyond Programming Languages

Direct Manipulation Utility

DM interfaces

  • Shows data to explore
  • Shows available actions
  • Immediate feedback
  • Physical interactions directly manipulate data
  • Reversible
  • Discoverable actions
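Reversibility is commonly implemented with an undo stack: each direct-manipulation gesture is recorded as a command that knows how to undo itself. A minimal Python sketch; the `MoveCommand` and `History` names and the dictionary of object positions are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MoveCommand:
    """Drag one object to a new position; remembers the old one so it can be undone."""
    positions: dict  # object name -> (x, y), a stand-in for what is on screen
    name: str
    new_pos: tuple
    old_pos: Optional[tuple] = None

    def do(self):
        self.old_pos = self.positions[self.name]
        self.positions[self.name] = self.new_pos

    def undo(self):
        self.positions[self.name] = self.old_pos

class History:
    def __init__(self):
        self.done, self.undone = [], []

    def run(self, command):
        command.do()  # immediate feedback: the change is applied at once
        self.done.append(command)
        self.undone.clear()  # a new action discards the redo branch

    def undo(self):
        if self.done:
            command = self.done.pop()
            command.undo()
            self.undone.append(command)

    def redo(self):
        if self.undone:
            command = self.undone.pop()
            command.do()
            self.done.append(command)

positions = {"file.txt": (0, 0)}
history = History()
history.run(MoveCommand(positions, "file.txt", (5, 3)))
history.undo()
print(positions)  # {'file.txt': (0, 0)}
```

Because every action can be taken back, novices can learn by doing without fear of breaking anything.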

Amateurs/Novices

  • Think concretely
  • Know what they want done but not what to call it
  • Can show, not tell
  • Don't know what they did wrong
  • Learn by doing

Direct Manipulation Tradeoff

Direct Manipulation for Exploring Data

FilmFinder: Interactive Scatterplots
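FilmFinder couples dynamic-query controls (sliders and buttons that define range filters) with a starfield scatterplot that is redrawn immediately on every change. A Python sketch of the filtering step only; the `Film` fields, the sample records, and the `dynamic_query` name are hypothetical, and rendering is left out:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Film:
    title: str
    year: int
    popularity: float  # hypothetical 0-10 score
    length_min: int

def dynamic_query(films, year_range, min_popularity, max_length):
    """Films satisfying every slider; rerun on each slider move for immediate feedback."""
    first, last = year_range
    return [f for f in films
            if first <= f.year <= last
            and f.popularity >= min_popularity
            and f.length_min <= max_length]

# Placeholder records, not real film data.
FILMS = [
    Film("Film A", 1972, 8.1, 175),
    Film("Film B", 1985, 6.4, 95),
    Film("Film C", 1993, 7.7, 127),
    Film("Film D", 1999, 5.2, 88),
]

# Simulate dragging the minimum-popularity slider upward.
for threshold in (5.0, 7.0, 8.0):
    hits = dynamic_query(FILMS, (1970, 2000), threshold, 200)
    print(threshold, [f.title for f in hits])
```

The filter is cheap enough to run on every slider event, which is what makes the feedback feel continuous rather than query-and-wait.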

Interface Choices

More modern example: Google Maps

Pre-attentive Interaction

Data Linking and Brushing

Baseball: Eick & Wills '95
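Linking and brushing can be modeled as one selection shared by every view: brushing a region in one plot selects records, and every other plot highlights those same records. A Python sketch with hypothetical baseball-style records; the `LinkedViews` class, attribute names, and numbers are invented for illustration and are not Eick & Wills' data:

```python
from dataclasses import dataclass, field

@dataclass
class LinkedViews:
    records: dict  # record id -> {attribute: value}
    selected: set = field(default_factory=set)

    def brush(self, x_attr, y_attr, x_range, y_range):
        """Select records whose (x_attr, y_attr) point lies in the brushed rectangle."""
        (x0, x1), (y0, y1) = x_range, y_range
        self.selected = {rid for rid, r in self.records.items()
                         if x0 <= r[x_attr] <= x1 and y0 <= r[y_attr] <= y1}
        return self.selected

    def view(self, attr):
        """Any other view: each record's value plus whether it is highlighted."""
        return [(rid, r[attr], rid in self.selected) for rid, r in self.records.items()]

players = LinkedViews({
    "p1": {"years": 3, "salary": 0.5, "home_runs": 12},
    "p2": {"years": 12, "salary": 4.0, "home_runs": 30},
    "p3": {"years": 10, "salary": 1.2, "home_runs": 8},
})
players.brush("years", "salary", x_range=(8, 15), y_range=(0, 2))  # veterans paid little
print(players.view("home_runs"))
# [('p1', 12, False), ('p2', 30, False), ('p3', 8, True)]
```

Because the selection lives in one shared place, any number of views stay consistent without knowing about each other.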

What is learned from brushing

Baseball: Eick & Wills '95

Summary