
CDS 6324

DATA VISUALIZATION

Lecture 4: Graphics Integrity, Visualization Design and Tools


“delight both by wonder
of the spectacle and the
accuracy of expression”
--Federico Cesi
Revision: Data & Image Models
Data & Image Models

❑ Formal specification
❑ Data model: relational data; N,O,Q types
❑ Image model: visual encoding channels
❑ Encodings map data to visual variables

❑ Choose expressive and effective encodings


❑ Rule-based tests of expressiveness
❑ Perceptual effectiveness rankings
From Data Model to N, O, Q
❑ People Count: Q-Ratio, Measure
❑ Year: Q-Interval (O), Dimension
❑ Age: Q-Ratio (O), Depends!
❑ Sex: N, Dimension
❑ Marital Status: N, Dimension
Image Model - Visual Encoding Variables
Position (x 2)
Size
Value
Texture
Color
Orientation
Shape

[Bertin, Graphics and Graphic Information Processing 1981]


Mackinlay’s Ranking

Conjectured effectiveness of encodings by data type


Design Criteria [Mackinlay 86]

❑ Expressiveness
A set of facts is expressible in a visual language if the sentences (i.e.
the visualizations) in the language express all the facts in the set of
data, and only the facts in the data.

❑ Effectiveness
A visualization is more effective than another visualization if the
information conveyed by one visualization is more readily perceived
than the information in the other visualization.
In A Nutshell

❑ Tell the truth and nothing but the truth
(don’t lie, and don’t lie by omission)

❑ Use encodings that people decode better
(where better = faster and/or more accurate)
Graphics Excellence &
Graphics Integrity
Graphics Excellence

● The well-designed presentation of interesting data is a matter of
○ substance,
○ statistics, and
○ design.
Graphics Excellence

● consists of complex ideas communicated with


○ clarity
○ precision
○ efficiency
Graphics Excellence
● is that which gives
the viewer
○ the greatest
number of ideas
○ in the shortest time
○ with the least ink
Graphics Excellence

● is nearly always multivariate


● requires telling the truth about the data
Graphics Excellence

Graphical Excellence begins with Graphical Integrity …
Graphics Integrity

● “not lying with statistics”


● tell the truth about data
Graphics Integrity

($11,014)

-$4,200,000
[Day Mines, Inc., 1974 Annual Report]
Graphics Integrity

[New York Times, 8/8/78]


Graphics Integrity
● Formalizing Distortion on Infographics

Perceived = Actual^x, where x = 0.8 ± 0.3
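
To make the power law above concrete, here is a minimal sketch in JavaScript. The function name `perceivedArea` is hypothetical; the default exponent 0.8 is the midpoint of the stated range, and 0.5 and 1.1 are its two ends.

```javascript
// Perceived size under the power law: perceived = actual^x.
// The name perceivedArea is illustrative, not from any library.
function perceivedArea(actualArea, exponent = 0.8) {
  return Math.pow(actualArea, exponent);
}

// A mark with 10x the area of a unit mark, as read by three viewers.
const lowReader = perceivedArea(10, 0.5);  // x at the low end of the range
const midReader = perceivedArea(10);       // x = 0.8
const highReader = perceivedArea(10, 1.1); // x at the high end of the range

console.log(lowReader.toFixed(2), midReader.toFixed(2), highReader.toFixed(2));
```

With x = 0.8, a tenfold increase in area reads as roughly a 6.3-fold increase, and the spread across viewers is wide, which is one reason area is a risky encoding for quantities.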


Graphics Integrity : Examples lacking integrity

● Perceived Area
○ grows more slowly than measured area
○ varies between people
○ changes with experience
○ changes with context
○ changes with loading
Graphics Integrity

● Guidelines:
○ The representation of numbers, as physically measured on the graphic, should be directly proportional to the quantities represented
○ Clear, detailed labels on the graphic itself should defeat distortion and ambiguity, and mark important events

Lie Factor = (size of effect shown in graphic) / (size of effect in data)
Graphics Integrity
● Lie Factor

Lie Factor = [(5.3 – 0.6) / 0.6] / [(27.5 – 18) / 18] = 7.833 / 0.528 = 14.8
[New York Times, 8/9/78]
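
The calculation on this slide can be checked with a short sketch. The helper names `relativeChange` and `lieFactor` are hypothetical; the four numbers are the ones shown above.

```javascript
// Relative change between two values: (second - first) / first.
function relativeChange(first, second) {
  return (second - first) / first;
}

// Lie factor = size of effect shown in graphic / size of effect in data.
function lieFactor(graphicFirst, graphicSecond, dataFirst, dataSecond) {
  return relativeChange(graphicFirst, graphicSecond) /
         relativeChange(dataFirst, dataSecond);
}

// Numbers from the slide: the graphic goes 0.6 -> 5.3, the data goes 18 -> 27.5.
const exampleLie = lieFactor(0.6, 5.3, 18, 27.5);
console.log(exampleLie.toFixed(1)); // prints "14.8"
```

A lie factor of 1.0 means the graphic changes in proportion to the data; values far above or below 1.0 indicate distortion.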
Graphics Integrity

● Past and Future are reversed


● Foreshortening is confusing two issues
● Scale is crazy
Graphics Integrity

● Design vs Data Variation


○ We expect that patterns will continue
○ Don’t confuse design variation and data variation
Graphics Integrity

[NSF, 1974]
Graphics Integrity

[NSF, 1974]
Graphics Integrity

Design variation:
● Lie Factor = 15.1
[New York Times, 19/12/78]
Graphics Integrity

• 454% in data
• 4280% in graphic

Design variation:
● Lie Factor = 9.4

[New York Times, 4/9/79]


Graphics Integrity

• 708% in data
• 6700% in graphic

Design variation:
● Lie Factor = 9.5
Graphics Integrity
Good example:
dollar amounts adjusted
for inflation
Graphics Integrity
Bad example
Graphics Integrity
● Done well
● Adjusted for inflation
● Adjusted for population
● No chart junk
Graphics Integrity

Context is essential for graphical integrity


• Graphics must not quote data out of context
Graphics Integrity
Summary of Graphics Integrity
• The size of the graphic should match the size of the
quantity (Lie Factor = 1.0)
• Labels, explanations and events should be on the graph
• Data variation should dominate, not design variation
• Time-series with money should be inflation adjusted and
standardized
• The number of dimensions of data should match the
number of dimensions in the graphic
• Graphics should be put in context
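
The inflation and standardization point in this summary can be sketched as follows. Every figure in the `budget` array is a made-up illustration value, and the names `realPerCapita`, `priceIndex`, and `population` are hypothetical.

```javascript
// Convert nominal amounts to constant (base-year) money per capita.
// All figures below are made-up illustration values, not real statistics.
const budget = [
  { year: 1, nominal: 100, priceIndex: 100, population: 10 },
  { year: 2, nominal: 150, priceIndex: 125, population: 12 },
  { year: 3, nominal: 200, priceIndex: 160, population: 16 },
];

function realPerCapita(rows, baseIndex) {
  return rows.map((row) => {
    // Deflate: scale the nominal amount by baseIndex / priceIndex.
    const real = (row.nominal * baseIndex) / row.priceIndex;
    // Standardize: divide by population.
    return { year: row.year, real, perCapita: real / row.population };
  });
}

const adjusted = realPerCapita(budget, 100);
console.log(adjusted);
```

In this invented series the nominal amount doubles, yet the real amount per person falls from 10 to about 7.8, so a chart of the raw figures would tell the opposite story.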
Deceptive Charts
Bars Must Always Start at Zero

Storytelling With Data, page 51
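
One way to see why, sketched with hypothetical values: when the axis starts above zero, bar heights are no longer proportional to the values they stand for. The names `drawnHeight`, `smallBar`, `largeBar`, and `cutoff` are illustrative.

```javascript
// Height of a bar as drawn when the value axis starts at `axisStart`.
function drawnHeight(value, axisStart) {
  return value - axisStart;
}

// Hypothetical values: two bars of 55 and 60 on an axis truncated at 50.
const smallBar = 55;
const largeBar = 60;
const cutoff = 50;

// What the data says.
const dataRatio = largeBar / smallBar;
// What the eye sees on the truncated axis.
const visualRatio = drawnHeight(largeBar, cutoff) / drawnHeight(smallBar, cutoff);
// What the eye sees when the axis starts at zero.
const zeroBasedRatio = drawnHeight(largeBar, 0) / drawnHeight(smallBar, 0);

console.log({ dataRatio, visualRatio, zeroBasedRatio });
```

Here the larger bar looks twice as tall although the value is only about 9% larger; with a zero baseline the drawn ratio equals the data ratio.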


Avoid Pie Charts and 3-D
Be Careful with Dual Axes

Storytelling With Data, pages 67-68


Synchronize Axes
Make sure scale is in standard direction
Independent variable on x-axis
What is Wrong With This Graph?

Truthful Art, page 46


Visualization Design
Principles
Tufte’s Design Principles

❑ Data-Ink Maximization and Graphical Redesign


❑ Remove Chartjunk
❑ Multi-functioning Graphical Elements
❑ Data Density
❑ Small Multiples and Parallel Sequencing
Data Ink and Graphic Redesign
Tufte’s Five Laws of Data-Ink:
1. Above all else show the data.
2. Maximize the data-ink ratio.
3. Erase non-data-ink.
4. Erase redundant data-ink.
5. Revise and edit.
Data Ink and Graphic Redesign
❑ Maximize data‐ink ratio
Data-ink ratio = (data-ink) / (total ink used in graphic)
= proportion of a graphic’s ink devoted to the non-redundant display of data-information
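
As a rough illustration of the ratio, the sketch below treats "ink" as a per-element measure such as pixels drawn. The element list, its ink amounts, and the name `dataInkRatio` are all hypothetical.

```javascript
// Data-ink ratio = data-ink / total ink used in graphic.
// Ink amounts are hypothetical (for example, pixels drawn per element).
const chartElements = [
  { name: "bars", ink: 400, isData: true },
  { name: "axis ticks", ink: 50, isData: false },
  { name: "grid lines", ink: 250, isData: false },
  { name: "frame and background", ink: 300, isData: false },
];

function dataInkRatio(elements) {
  const total = elements.reduce((sum, e) => sum + e.ink, 0);
  const data = elements
    .filter((e) => e.isData)
    .reduce((sum, e) => sum + e.ink, 0);
  return data / total;
}

// Before the redesign: every element is drawn.
const inkBefore = dataInkRatio(chartElements);

// After erasing grid lines, frame, and background (non-data-ink), keeping ticks.
const kept = chartElements.filter((e) => e.isData || e.name === "axis ticks");
const inkAfter = dataInkRatio(kept);

console.log(inkBefore.toFixed(2), inkAfter.toFixed(2));
```

The ticks are kept on purpose: erasing non-data-ink "within reason" still leaves the reader enough scaffolding to read values.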
Data Ink and Graphic Redesign
❑ Example

Erase non-data-ink (within reason)
Erase redundant data-ink
Redesign – Bar Chart
Avoid Chartjunk

❑ Extraneous visual elements that detract from message:

“Chartjunk promoters imagine that numbers and
details are boring, dull, and tedious, requiring
ornament to enliven.

If the numbers are boring … then you’ve got the
wrong numbers”
Avoid Chartjunk

"Readers of a report should be unaware of its


'design.' Rather, they should be enticed into reading it
by interesting content, logical arrangement and
simple presentation. The printed page should appear
natural and authoritative, avoiding gimmicks which
might get in the way of its documentary character."
- Paul Rand, "Design," in Speaking Out on Annual Reports (New York, 1983)
Multi-functioning Elements

“Mobilize every graphical element,
perhaps several times over,
to show the data”

❑ In other words, try to make every graphical element
on the page encode data.
Multi-functioning Elements
❑ Stem and Leaf Plots
❑ construct the distribution of a variable
with the numbers themselves

“The simplest and most useful meaningful mark
is a digit.”
- John Tukey
Tufte, Visual Display of Quantitative Information, pg. 140
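
A stem-and-leaf display is simple enough to build by hand. The sketch below is a minimal version that assumes non-negative integers, uses the last digit as the leaf, and skips stems with no values; the function name `stemAndLeaf` and the sample values are hypothetical.

```javascript
// Build a stem-and-leaf display: stem = tens and above, leaf = last digit.
// Assumes non-negative integers; empty stems are not listed in this sketch.
function stemAndLeaf(values) {
  const stems = new Map();
  const sorted = [...values].sort((x, y) => x - y);
  for (const v of sorted) {
    const stem = Math.floor(v / 10);
    const leaf = v % 10;
    if (!stems.has(stem)) stems.set(stem, []);
    stems.get(stem).push(leaf);
  }
  // Each output line is "stem | leaves", with the digits doubling as marks.
  return [...stems.entries()].map(
    ([stem, leaves]) => `${stem} | ${leaves.join("")}`
  );
}

// Hypothetical sample.
const sampleLines = stemAndLeaf([12, 15, 21, 23, 23, 28, 30, 34, 47]);
console.log(sampleLines.join("\n"));
```

Each row's length shows the frequency, as a bar would, while every digit still carries an actual data value, which is what makes the digits multi-functioning.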
Multi-functioning Elements
❑ Stem and Leaf Plots
❑ The volcanoes are ordered by height and grouped by region.
❑ Here, for instance, are the volcanoes corresponding to the line
13 | 47830: one in Cameroon, one in Hawaii, and three in Guatemala:

https://ilyabirman.net/meanwhile/all/tufte-mystery/
Multi-functioning Elements
❑ Chernoff Faces
❑ Show a bunch of variables
at once via facial features
like lips, eyes, and nose
size.

Tufte, Visual Display of Quantitative Information, pg. 142

https://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/
Multi-functioning Elements
❑ Chernoff Faces
❑ Chernoff faces for lawyers'
ratings of twelve judges

https://en.wikipedia.org/wiki/Chernoff_face
Multi-functioning Elements
❑ Coordinate labels as Marks

Tufte, Visual Display of Quantitative Information, pg. 140


Maximize Data Density
❑ Maximize data density and the size of the data
matrix within reason

Data density = (# entries in data matrix) / (area of data graphic)
Maximize Data Density
❑ This graph valiantly and successfully shows 2,200 numbers that
summarize the trends and patterns in weather in New York City in
1980. The three aligned charts show temperature, precipitation, and
relative humidity. In the graph of temperature, the area is filled
between the daily low and daily high.
❑ Visually separates three different time series
❑ Aligns the axes on different plots to assist in making comparisons
across the time series
❑ Uses markers to clearly denote when extremes occurred
❑ Contains clear textual labels
❑ Contains clear textual labels
Small Multiples and Parallel Sequencing

❑ Previous examples improved data density by using a LARGE data
matrix.

Data density = (# entries in data matrix) / (area of data graphic)

❑ Alternatively, we can reduce the size of the graphic: the Shrink
Principle
❑ Repeated application of this principle leads to a Small Multiples
design
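
The effect of the Shrink Principle on data density can be sketched numerically. The count of 2,200 echoes the weather example earlier in the lecture; the canvas dimensions and the name `dataDensity` are hypothetical.

```javascript
// Data density = number of entries in data matrix / area of data graphic.
function dataDensity(entries, width, height) {
  return entries / (width * height);
}

// Hypothetical graphic: 2,200 numbers on a 20 x 10 unit canvas.
const fullSize = dataDensity(2200, 20, 10);

// Shrink Principle: same data, half the width and half the height.
const shrunk = dataDensity(2200, 10, 5);

console.log(fullSize, shrunk, shrunk / fullSize); // prints 11 44 4
```

Halving both dimensions quarters the area, so density quadruples, and four shrunken panels fit in the space of the original. That is the arithmetic behind small multiples.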
Small Multiples and Parallel Sequencing

❑ Escape flatland – small multiples, parallel sequencing


❑ Data is multivariate
❑ Doesn’t necessarily mean 3D projection
❑ How can we enhance multivariate data on inherently 2D
surfaces?
Small Multiples

[New York Times]


Parallel Sequencing
Tufte’s Design Principles
❑ Data-Ink Maximization and Graphical Redesign
❑ Remove Chartjunk
❑ Multi-functioning Graphical Elements
❑ Data Density
❑ Small Multiples and Parallel Sequencing
Visualization Tools
Visualization Tool Stack
Charting Tools / Graphical Interfaces
❑ Charting Tools: Excel, Many Eyes, Google Charts
❑ Interactive Data Exploration: Tableau, Qlik, PowerBI, Lyra, Polestar, Voyager
Declarative Languages
❑ Visual Analysis Grammars: VizQL, ggplot2, Vega-Lite
❑ Visualization Grammars: Protovis, D3.js, Vega
Programming Toolkits
❑ Component Architectures: Prefuse, Flare, Improvise, VTK
❑ Graphics / Visualization APIs: OpenGL, Java2D, Python, R
Why Declarative Languages?
❑ Programming by describing what, not how
❑ Separate specification (what you want) from execution (how
it should be computed)
❑ In contrast to imperative programming, where you must give
explicit steps.
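// Assumes an <svg> element exists and my_data, xscale, yscale are defined elsewhere.
d3.select("svg").selectAll("rect")
    .data(my_data)                 // bind one datum per rectangle
  .enter().append("rect")          // create a rect for each unmatched datum
    .attr("x", function(d) { return xscale(d.foo); })
    .attr("y", function(d) { return yscale(d.bar); });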
Why Declarative Languages?

❑ Faster iteration. Less code. Larger user base.


❑ Better visualization. Smart defaults.
❑ Reuse. Write-once, then re-apply.
❑ Performance. Optimization, scalability.
❑ Portability. Multiple devices, renderers, inputs.
❑ Programmatic generation.
    ❑ Write programs which output visualizations.
    ❑ Automated search & recommendation.
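
As a sketch of programmatic generation, the function below returns one declarative chart specification per field. The object layout follows the general shape of a Vega-Lite specification (`data`, `mark`, `encoding`), but the helper name `histogramSpec`, the field names, and the sample rows are hypothetical. The specs are only built here, not rendered.

```javascript
// Build a Vega-Lite-style spec (a plain object) describing a binned bar chart.
// The spec states what to show; a renderer would decide how to draw it.
function histogramSpec(rows, field) {
  return {
    data: { values: rows },
    mark: "bar",
    encoding: {
      x: { field: field, type: "quantitative", bin: true },
      y: { aggregate: "count", type: "quantitative" },
    },
  };
}

// Hypothetical rows and field names.
const people = [
  { age: 23, income: 31 },
  { age: 35, income: 52 },
  { age: 41, income: 47 },
];

// One program, many visualizations: generate a spec per field.
const specs = ["age", "income"].map((f) => histogramSpec(people, f));
console.log(JSON.stringify(specs[0], null, 2));
```

Because each spec is plain data, a program can enumerate, compare, and rank candidate charts, which is what makes automated search and recommendation feasible.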
Recap
Graphics Excellence
Tufte’s Principles:
❑ Graphical excellence is the well-designed presentation of
interesting data: a matter of substance, statistics, and design.
❑ Graphical excellence consists of complex ideas communicated with
clarity, precision and efficiency.
Graphics Excellence
Tufte’s Principles:
Graphical excellence is that
which gives to the viewer the
greatest number of ideas in the
shortest time with the least ink
in the smallest space
Graphics Integrity
Graphical Integrity Principles
1. Avoid distortion and ambiguity (Lie Factor = 1.0)
2. Show data variation, not design variation (avoid fancy
chart junk / tricks!)
3. Account for inflation
4. Graphics should be put in context.
Tufte’s Design Principles
❑ Data-Ink Maximization and Graphical Redesign
❑ Remove Chartjunk
❑ Multi-functioning Graphical Elements
❑ Data Density
❑ Small Multiples and Parallel Sequencing
The KISS Principle

Keep it Simple, Stupid!

“It seems that perfection is reached not when there is nothing
to add, but when there is nothing left to take away.”
- Antoine de Saint-Exupéry

*This quote sums up Edward Tufte’s approach to data visualization.
