
3 Essential steps for better data visualization
Dr. Abby Benninghoff

3 Essentials

1 Use the right kind of plot for your data
2 Figures should elucidate information, not obfuscate
3 Apply principles of effective formatting

3 Essential steps for better data visualization. January 20, 2016

1 Use the right kind of plot for your data

Pictures as figures
• Never assume that anyone knows what is in a picture
– Use arrows or markers to identify features
– Include a scale bar
– Specify the meanings of colors
– Include key explanations in the figure legend

Use diagrams to convey complex ideas
This works, particularly as an overview in an introduction section or a review paper.
But it is probably too much for a quick slide in a presentation.


Use diagrams to convey complex ideas

This, not so much.

What type of graph?
• Line graphs for dynamic comparisons
– Dependent variable changes with respect to
time, dose, etc.
– Do not overcrowd the graph with too many (>4) lines
– Use curve-fitting when appropriate
(provide curve fit model)
• Scatter plot for correlations
– Most commonly made in two dimensions
– Line of best fit – linear regression
– Show r statistic, confidence interval lines or
other indication of robustness of the data
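
The line of best fit and the r statistic mentioned for scatter plots can be computed by hand. The following is a minimal sketch, assuming ordinary least squares on paired data; the helper name linear_fit and the dose/response numbers are hypothetical and only for illustration.

```python
import math

def linear_fit(xs, ys):
    """Return slope, intercept and Pearson r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical data, not taken from the slides
dose = [0.0, 1.0, 2.0, 3.0, 4.0]
response = [1.0, 3.1, 4.9, 7.2, 8.8]
slope, intercept, r = linear_fit(dose, response)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
```

Printing the fitted equation and r next to the plot (or in the figure legend) gives the reader the indication of robustness that the bullet asks for.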

What type of graph?


• Bar graph for subdividing and comparing
data
– Preferred as vertical rather than horizontal
– Used when there is no continuum of data points that would work as a line graph
– Make bars the same width, with the space between bars one-half of a bar width
• Pie chart to compare parts of a whole
– Useful for visually showing how much subgroups contribute to the whole
– Problematic if too many groups are shown
– Problematic if color is not allowed (shading is okay, but patterns look bad)
What type of graph?
• Box plot useful for descriptive statistics
– Will highlight differences in data set populations
– Box outline shows 25th to 75th percentile
– Line shows the median
– Error bars show a defined distance set by user (min to max, 10th to 90th percentile)
– Dots indicate possible outliers
[Figure: box plot of tumor volume (mm3, log scale from 0.01 to 100) for Water and Tart Cherry groups on AIN93G and TWD diets; letters a, ab, bc and c mark statistical groupings]
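
The pieces of a box plot can be worked out directly from the data. This is a rough sketch, assuming 10th to 90th percentile whiskers and taking the line inside the box as the median (the usual convention); box_plot_summary and the sample values are hypothetical.

```python
import statistics

def box_plot_summary(values, low_pct=10, high_pct=90):
    """Summarize one group the way the box plot bullets describe it."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1 to 99
    pct = statistics.quantiles(values, n=100, method="inclusive")
    low, high = pct[low_pct - 1], pct[high_pct - 1]
    return {
        "box": (pct[24], pct[74]),           # 25th to 75th percentile
        "line": statistics.median(values),   # line inside the box
        "whiskers": (low, high),             # user-defined error bars
        "dots": [v for v in values if v < low or v > high],
    }

# Hypothetical measurements for one treatment group
group = list(range(1, 12))
summary = box_plot_summary(group)
print(summary)
```

Whichever whisker definition is chosen, it should be stated in the figure legend, since min to max and 10th to 90th percentile whiskers look alike but mean different things.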


What type of graph?


Contour plot
• Useful for multi-dimensional data
• Contour lines indicate regions that
fit within a defined scale
• Shows scaled values in
relationship to two experimental
factors or other reference
measurements on XY axes
or other coordinate system
• Color is very effective in
these plots

2 Graphs should elucidate information, not obfuscate

Do not manipulate to (de)emphasize observations
• Do not manipulate the figure to sway the reader or inappropriately emphasize or disguise trends
• Do not start axes at midpoint in scale without visual cue (e.g., axis break)
[Figure: body weight (g) versus weeks of age for DIO, TWD and AIN93G groups, shown in two versions]
[Figure: final body weight (g) for male and female mice, None versus DSS, on AIN93G, TWD and DIO diets, plotted with the Y-axis starting at 0 and again with the Y-axis starting at 18]
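
A little arithmetic shows why an axis that starts at a midpoint can mislead. The sketch below assumes simple bars drawn from the axis minimum; drawn_height_ratio is a hypothetical helper and the two body weights are made-up numbers.

```python
def drawn_height_ratio(value_a, value_b, axis_min=0.0):
    """Ratio of bar heights as drawn when the axis starts at axis_min."""
    if min(value_a, value_b) <= axis_min:
        raise ValueError("both values must lie above the axis minimum")
    return (value_a - axis_min) / (value_b - axis_min)

# Hypothetical group means of 28 g and 24 g
true_ratio = drawn_height_ratio(28.0, 24.0)             # axis starts at 0
truncated_ratio = drawn_height_ratio(28.0, 24.0, 18.0)  # axis starts at 18
print(f"axis from 0: {true_ratio:.2f}x, axis from 18: {truncated_ratio:.2f}x")
```

The same 4 g difference looks much larger on the truncated axis, which is why a visual cue such as an axis break is needed.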

Including/excluding data
• Do not arbitrarily delete data points without clear justification
– Experimental error documented in your laboratory notebook
– Statistical test for outliers
• Don’t “massage” line fits or change parameters post-hoc to best fit your data
[Figure: scatter plot of Variable B versus Variable A with a regression fit]
Consider how the four points colored blue influence the regression fit. Are these outliers? Do they have an outsized impact on the apparent trend?
[Figure: two panels of [3H]-E2 Bound (%) versus inhibitor concentration (Log M) for E2, PFOS and 10:2 FtOH]
A switch from 4-parameter to 3-parameter curve allows for fit of 3rd data set, but is this post-hoc change in analysis appropriate?
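
One way to see how much a few points drive a regression is to fit the line with and without them. This is only an illustrative sketch with made-up data; it does not decide whether the points are outliers, and any exclusion still needs the justification listed above.

```python
def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical data: a flat cloud plus two suspect points at high x
core = [(0.2, 20.0), (0.4, 21.0), (0.6, 19.0), (0.8, 20.0)]
suspect = [(1.4, 34.0), (1.5, 36.0)]

slope_all = slope(core + suspect)
slope_core = slope(core)
print(f"all points: {slope_all:.2f}, without suspect points: {slope_core:.2f}")
```

If the trend changes sign or size this much, the reader deserves to see both the points and the reasoning for keeping or dropping them.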


Presenting multiple figures
• Similar data presented jointly should be shown at the same scale (most of the time) to enable appropriate comparisons
• Data groupings should imply what type of analysis was performed.
– This presentation suggests two separate Student’s t-tests. But what if a two-way ANOVA were performed?
[Figure: three bar-graph panels of response (units) for control and treatment: Water (Y-axis 0 to 8), GT (Y-axis 0 to 5) and GT redrawn (Y-axis 0 to 8)]
The center panel uses a different scale for the same type of data as in the left panel. Clearly, a comparison between “Water” and “GT” is intended. But the different scales obfuscate the comparison.
The right panel corrects this problem. Now you can see more easily that the response was overall a bit lower for both control and treatment in the GT group compared to the Water group. The added dashed red line helps you visualize the impact of the selected scale.
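
Picking one scale for all panels can be automated before plotting. A minimal sketch, assuming non-negative data with a zero baseline; shared_y_limits, the tick step and the panel values are hypothetical.

```python
import math

def shared_y_limits(panels, step=2.0):
    """Return one (low, high) axis range that fits every panel."""
    highest = max(max(values) for values in panels.values())
    # Round up to the next multiple of `step` so the top tick is a round number
    upper = math.ceil(highest / step) * step
    return 0.0, upper

# Hypothetical control and treatment means for each panel
panels = {"Water": [6.8, 4.1], "GT": [4.6, 2.9]}
limits = shared_y_limits(panels)
print(limits)
```

Applying the same limits to every panel keeps bar heights comparable across panels, as in the corrected right-hand panel.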

Graph style should reflect experiment design
• Data groupings should reflect the experiment design and the type of statistical analysis performed.
– This presentation suggests two separate Student’s t-tests. But what if a two-way ANOVA were performed?
[Figure: separate Water and GT panels of response (units) for control and treatment]
– This presentation groups the data by control/treatment and by Water/GT, appropriate for this 2x2 experiment design.
[Figure: single grouped bar graph of response (units) for control and treatment, with Water and GT bars side by side; letters a, ac, b and c mark statistical groupings]
– Note that the first approach did not allow for the most interesting comparison, water vs. GT!


3 Apply principles of effective formatting

Characteristics of well-designed figures
• Neatness – Image is clean and sharp, invites attention
• Readability – Eye can discern important information easily, quickly
• Font – Text is large enough to be read, placed well and limited
• Size – Graph is sized appropriately for anticipated reduction during printing.
• Aesthetics – Balanced graphs, good use
of white space, eyes drawn to most important
features
• Use of color – Distinguishes important
content, pleasing to eye
• Consistency – Similar graphs have same
stylistic scheme (line width, font type
and size, labeling, scales, etc.)

A poorly formatted graph


Let me count the problems
1. No units
2. Scale isn’t appropriate
3. Axis, error bar and bar outline thickness is inconsistent
4. Font on Y-axis too small, serif font
5. Axis title fonts uneven
6. Dependent and independent
variables not on the right axes
7. Minor ticks not helpful
8. Too many X-axis labels
9. Patterns are rather obnoxious, difficult to distinguish
10. Categories on Y-axis are not in any useful order

How can I fix it?
• Scale is more appropriate; subdivide Y-axis into big units; don’t use minor ticks (unless log scale)
• Use large font for axis titles, two sizes smaller for axis labels; be consistent in font sizes
• Use solid fills and big patterns to distinguish bars
• Use the same thickness for axes, bar outlines and error bars, thick enough to be seen but not overly thick
• Dependent variable (measurement) on the Y-axis
• Independent variable (treatments) on the X-axis
• Order your categories so that they are easily interpreted, tell the story; use white for control or reference group which should be placed at the left-most position
Keep the figure looking clean and easily readable



Poorly formatted line graph
Straight from Excel with no tinkering.
What is missing? What are the problems?
1. Grid lines are distracting
2. No axis labels
3. Category labels are not informative
4. Series aren’t labeled
5. Position of data points not clear (no symbols)
6. No error bars
7. Fonts too small
[Figure: default Excel line graph with series named Series1 and Series3, Y-axis 0 to 0.12, X-axis categories 1 to 7]

Reformatted (not Excel)
• Keep the field behind the data clear of any grid lines (unless needed to show reference measurement, such as for normalized data)
• Follow previous recommendations for font sizes, axes thickness, etc. Also use the same thickness for symbol outline.
• Use big symbols, easy to see when graph is reduced for publication. Use shading to help distinguish symbols.
• If the first data point falls on top of the Y-axis, then offset the axes for clarity.
• Include a legend to identify symbols


Examples of poorly done graphs

The 3D temptation
• Do not use 3D unless absolutely
necessary.
• Many other ways to show 3 levels of data
without using a rotated 3D graph


When 3D works

(Actually, these plots incorporate 4 levels of information; 3 different axes and color!)
Exceptions to the “rules”
• Sometimes, breaking the rules of figure formatting is necessary to make the graph clearer and to emphasize the key findings

• Using horizontal lines to emphasize data trends, deviation from baseline


• (Note that these graphs have other design problems)


Exceptions to the “rules”


• Use minor ticks to
distinguish logarithmic
scales
• A signal to the reader
that the scale is not
linear
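
For a base-10 log axis, the minor ticks sit at 2 to 9 times each power of ten, which is what gives the scale its recognizable uneven spacing. The sketch below generates those positions; log_ticks is a hypothetical helper, and the exponent range is chosen only as an example.

```python
def log_ticks(min_exp, max_exp):
    """Major and minor tick positions for a base-10 log axis."""
    # Major ticks at each power of ten
    major = [10.0 ** e for e in range(min_exp, max_exp + 1)]
    # Minor ticks at 2x to 9x each power of ten, below the top major tick
    minor = [m * 10.0 ** e
             for e in range(min_exp, max_exp)
             for m in range(2, 10)]
    return major, minor

# Example range: 0.01 to 100
major, minor = log_ticks(-2, 2)
print(major)
print(len(minor), "minor ticks")
```

Most plotting software places these ticks automatically once the axis is set to log; the point is to leave them visible rather than turn them off.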

Exceptions to the “rules”
• Horizontal graphs are okay, if they aid in data interpretation


(Judicious) Use of color in publications


• Use color if it is necessary to emphasize
your point
• Do not use it merely to jazz up the paper; color is typically costly for publication
• Suggestion: keep your control/reference white and color the other groups.
• Use colors with lots of contrast that go well
together; yellow or orange with blue; primary
colors; shades within a color.
• Avoid red + green
• All this said, this example graph works just
fine as B&W for print publication with the
right formatting
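
A quick way to check whether a color scheme will survive B&W printing is to convert each color to a gray level and compare. This sketch assumes the Rec. 601 luma weights as the grayscale conversion and an arbitrary minimum gap of 50 gray levels; both function names and the threshold are assumptions for illustration. It checks grayscale only, not color vision deficiency.

```python
def gray_level(rgb):
    """Approximate gray level (0-255) using Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def distinguishable_in_gray(color_a, color_b, min_gap=50.0):
    """True if two colors differ by at least min_gap gray levels."""
    return abs(gray_level(color_a) - gray_level(color_b)) >= min_gap

red = (255, 0, 0)
green = (0, 128, 0)
blue = (0, 0, 255)
yellow = (255, 255, 0)

print("red vs green:", distinguishable_in_gray(red, green))
print("blue vs yellow:", distinguishable_in_gray(blue, yellow))
```

Under these assumptions, red and medium green collapse to nearly the same gray, while blue and yellow stay far apart, in line with the color pairings suggested above.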

[Figure: the same graph shown in Black & White and in Color]

• Color costs money in hard copy publication! So, be prepared to have both B&W and Color versions of your data.
• Definitely use color in your presentations

Some figures are clearly better in color

Some figures require color


Poorly designed multi-panel figure

What’s wrong with this multi-panel figure?

Better design

• Only need to show the legend once since it clearly applies to both panels
• A graph title is optional (depends on reviewer); use the letters to identify the
panels in the figure legend
• Mirror formatting, scale, colors, etc. between the panels when appropriate
• Don’t have to repeat Y-axis title for panel B since it is the same measurement

? Can I use trendy infographics?

What is an infographic?
• A visual image such as a chart or diagram used to represent information or data
• Highly shareable!
• Makes information accessible to non-experts


When to use infographic style?


• Excellent for oral presentations, poster presentations
– Especially for coordinating complex ideas presented in introduction or conclusion
• Excellent for public presentation
– Infographics excel at distilling complex data into a simple visual format
• Not generally appropriate for professional journals
– Insufficient detail, often no statistics shown

Use elements of the infographic style
• Just as with a regular figure, an infographic must be easily discernible within the time frame allowed for viewing
• Create new data charts using a different design that focuses less on details
• Avoid excessive detail, as seen in some infographic examples
• You may want to work with a professional graphic designer to achieve the desired look and style
[Figure: bar graph of incidence (%) for AIN93G and TWD diets, No supplement versus Tart Cherry, with an asterisk (*) annotation]
[Infographic: Tart cherry supplemented diet. 40% fewer mice with colon tumors, but only for those fed a healthy diet]

For more information on infographics, see http://www.slideshare.net/IQ_Agency/5-rulesinfographicsuccess



? What software do you recommend?

Software options
• Excel is just awful for making science graphs
– Especially bad for multi-panel figures
– Thinks too much for you
• Try out other software that specializes in scientific data presentation
– SigmaPlot http://www.sigmaplot.com/
• Haven’t worked with this one in years
• Advanced graphics with somewhat steep learning curve
– GraphPad Prism http://www.graphpad.com/
• Easy to use interface
• Integrates statistics with graphing
• Drawback – pricey for individuals ($100/yr); bulk licensing available
• 30-day trial available


Primer on Prism
• Organization of sheets by
folder structure
– Data tables
– Notes
– Analyses
– Charts
– Layouts
• Each data table is linked
to other “family” sheets

Advantages of Prism
• Very customizable
• Create your own “template” and then use that for other charts
• Excellent for multi-panel figures
[Figure: multi-panel figure (panels A to I) for AIN93G (AIN), TWD, DIO, MM and VMM diet groups. Panel titles: Food intake, Energy intake, Final body weight, Gain in fat mass, Fasting Glucose, Oral Glucose Tolerance, Insulin, Leptin, Resistin, Tumor Multiplicity, Tumor size]

Advantages of Prism
• Integrates basic statistics
– Student’s t-test
– One-way and two-way ANOVA
– Histogram analyses
– Contingency analysis

• Includes interpretation of
results to help students
understand analysis results
• Excellent for curve fitting

Other tools for data visualization
• Create diagrams or simple infographics using MS PowerPoint
• Create cartographic data using Tableau
– Free public platform available for non-protected data
• Work with digital image data using Adobe Photoshop or other photo editing software*
– Follow discipline-specific rules about editing image-based data
[Figure: Tableau maps of Enrollment: College of Agriculture & Applied Sciences, Fall 2013, by Countries, US States and Utah Counties. About Tableau maps: www.tableausoftware.com/mapdata]


Comments and revisions of sample figures

Comments on sample figures
• Patterns of lines difficult to discern
– Is it important to be able to distinguish
these lines?
– Some colors are quite similar to each other.
I see two different green dotted lines that
are quite hard to distinguish

• Need a legend to define each of the traces

• Label “Mean N = 355” doesn’t make much sense in context of its location. Does one of the lines indicate the mean response?
• Labels need to be bold, much larger in
size. This figure is likely to be reduced for
publication.


Comments on sample figures


• Not necessary to label each data
point. Crowds the graph.

• Mixture of colors is confusing.


– Avoid red border around image,
especially on the light green
background
– Use more contrast between colors for the different locations. Phoenix and Raleigh are quite similar.

• Place Y axis label along the axis, not at the top
• Not necessary to use minor ticks;
makes figure look crowded.

Revised example
[Figure: line graph titled “Average monthly temperature”; Y-axis Temperature (°F), 20 to 100; X-axis Month, 1 to 12; legend Phoenix, Raleigh, Minneapolis]


Comments on sample figure


• The grid behind the data is unnecessary, crowds the
graph
• If you remove the grid, then make sure the X and Y axes have ticks at each of the number labels
• Font size for the X and Y axis labels should be a bit bigger
• Use a sans serif font, such as Arial or Helvetica
• Not entirely clear whether lines represent connections between symbols or regression analysis
• Why a rectangular plot? Scales seem very similar for X and Y axes. Perhaps a square plot is needed to show relationships between variables more precisely.
• Symbols need to be easily distinguished. X and the
asterisk are too similar, especially right near each
other.
Comments on sample figure
[Figure: revised plot of Head at Toe (m) versus Differential Head over Levee (m), axes from 0 to 8, with legend Cch1 (r² = 0.994), Cch3 (r² = 0.999), Cch4 (r² = 0.996), Cch5 (r² = 0.990)]
I assumed the connecting line was a regression in this example, and showed the r² value.
I also shortened the legend text, as the 1.47E-1 cm²/s can easily go into the figure legend text.
But, if showing the figure in a presentation, the author may want to keep that info.

Contact for more information or questions

Abby D. Benninghoff
Department of Animal, Dairy and Veterinary Sciences
abby.benninghoff@usu.edu
435-797-8649
