Professional Documents
Culture Documents
Tufte in R
Tufte in R
Tufte in R
Tufte in R
Lukasz Piwek (http://www.lukaszpiwek.com)
Updates
Introduction
Minimal line plot
Range-frame (or quartile-frame) scatterplot
Dot-dash (or rug) scatterplot
Marginal histogram scatterplot
Minimal boxplot
Minimal barchart
Slopegraph
Sparklines
Stem-and-leaf display
Discussion
Updates
6th of July 2017: (1) New category - interactive plots made in Tufte-style with R - the first addition is basic
line plot and basic barchart with the use of package highcharter (http://jkunst.com/highcharter/); (2)
Revised slopegraph in base graphics - Thomas Leeper has implemented his slopegraph functions into
development version package on GitHub (https://github.com/leeper/slopegraph); (3) Revised sparklines
in lattice get a gray bands (thanks to Bryan Urban for sharing his code on this); (4) Improved quality of all
example figures; (5) Commented code and removed plot for range frame plot and dot-dash plot in base
graphics produced with Steven Murdoch’s custom GitHub fancyaxis function
(https://github.com/sjmurdoch/fancyaxis) - doesn’t work anymore; (6) Commented code and removed
plot for slopegraph produced with James Keirstead’s GitHubs function (https://github.com/jkeirstead/r-
slopegraph) - doesn’t work anymore.
30th of May 2016: (1) Added a new category - marginal histogram scatterplots for base graphics and ggplot2;
(2) added new sparklines in base graphics with plotSparklineTable
(http://finzi.psych.upenn.edu/R/library/epanetReader/html/plotSparklineTable.html) function from
epanetReader (https://cran.r-project.org/web/packages/epanetReader/index.html) package; (3) added
new slopegraph in base graphic with bumpchart (http://www.inside-
r.org/packages/cran/plotrix/docs/bumpchart) function from plotrix (https://cran.r-
http://motioninsocial.com/tufte/ Page 1 of 29
Tufte in R 11/01/2022, 19:20
project.org/web/packages/plotrix/) package (thanks to Jim Lemon forthis suggestion); (4) added range-
frame scatterplot in ggplot2 with function qfplot (https://raw.githubusercontent.com/bearloga/Quartile-
frame-Scatterplot/master/qfplot.R) by Mikhail Popov.
16th of February 2016: (1) Added stem-and-leaf displays; (2) added sparklines in ggplot2 (thanks to Wouter
Van Der Bijl (http://stackoverflow.com/users/4341440/axeman) for addressing my StackOverflow
(http://stackoverflow.com/questions/35434760/sparklines-in-ggplot2/35436104#35436104) question); (3)
back to all-in-one-page format and revised sparklines to simplify; (4) range-frame and dot-dash plots split
into separate sections for clarity; (5) overlong lines in ggplot2 version of dot-dash plots is back due to recent
updates in ggplot2 that crash previous solution; (6) changed data set for slopegraph to match it between
graphical systems.
17th of October 2015: Thank you for the kind words regarding this project! Major changes in this update: (1)
added the first two methods to create Sparklines using Base Graphics and Lattice; (2) the document has
become too large to keep on one-page. Its now split into two separate pages-chapters: Chapter 1: Line plot,
Boxplot, Barchart and Slopegraph and Chapter 2: Sparklines; (3) corrected overlong lines produces in y-axis
when making dot-dash plots with panel.rug() for Lattice (thanks to Josh O’Brien
(http://stackoverflow.com/questions/33222841/lattices-panel-rug-produces-different-line-length-with-
wide-plot)) and with geom_rug() for ggplot2 (thanks to BondedDust
(http://stackoverflow.com/questions/33223311/ggplot2-geom-rug-produces-different-line-length-with-
wide-plot/33224205?noredirect=1#comment54254444_33224205)); (4) added this list of updates to keep a
reasonable track of changes.
Introduction
Motivation
The idea behind Tufte in R is to use R (http://www.r-project.org) - the most powerful open-source statistical
programming language - to replicate excellent visualisation practices developed by Edward Tufte
(http://www.edwardtufte.com/tufte/books_vdqi). It’s not a novel approach - there are plenty of excellent
R functions and related packages wrote by people who have much more expertise in programming than
myself. I simply collect those resources in one place in an accessible and replicable format, adding a few bits
of my own coding discoveries.
Format
Each visualisation is provided in three graphical systems used in R: base graphics (http://rstudio-pubs-
static.s3.amazonaws.com/7953_4e3efd5b9415444ca065b1167862c349.html), lattice (https://stat.ethz.ch/R-
manual/R-devel/library/lattice/html/Lattice.html) and ggplot2 (http://ggplot2.org). As an example data I
mainly use basic data sets easily accessible within R. Occasionally I use data from package psych
developed by William Revelle, package MASS developed by Brian Ripley with collegues and various
custom data I link via my Gist profile (https://gist.github.com/GeekOnAcid).
Requirements
You need the most recent version of R (http://www.r-project.org) installed on you computer. You also need
a basic understanding of R and there are some great online tutorials to get you started. I also recommend R
Studio (https://www.rstudio.com/products/RStudio/) as an integrated development environment for R.
I use resources from a number of R packages. You can install all those packages at once via R console using
the command below:
http://motioninsocial.com/tufte/ Page 2 of 29
Tufte in R 11/01/2022, 19:20
We start by plotting the most basic graph from page 65 of The Visual Display of Quantitative Information - a Edward Tufte, The Visual Display
minimal line plot. This one is important because it illustrates the most elemental principle - that of of Quantitative Information
minimalism with reduced ‘data-ink’. As Tufte explains, the ‘data-ink’ (total ink used to print the graphic) (Cheshire, 1983), p. 65
ratio should equal to ‘1 - proportion of graphic that can be erased without loss of data-information’. The
primary challenge is therefore to modify the default graphs produced with R so that we remove as much of
‘non-data ink’ as possible. As you will soon see, this is done by subtracting and deconstructing existing R
graphs to get rid of as much ‘non-data ink’ as possible.
(http://stackoverflow.com/questions/12901396/contro
axis-ticks-and-axis-lines-
separately-on-r-lattice-xyplot)
from Stackoverflow to draw only
http://motioninsocial.com/tufte/ Page 3 of 29
Tufte in R 11/01/2022, 19:20
axes ticks.
library(lattice)
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
xyplot(y~x, xlab="", ylab="", pch=16, col=1, border = "transparent", type="o",
abline=list(h = c(max(y),max(y)-1), lty = 2),
scales=list(x=list(at=x,labels=x, fontfamily="serif", cex=1),
y=list(at=seq(1,6,1), fontfamily="serif", cex=1,
label=sprintf("$%s",seq(300,400,20)))),
par.settings = list(axis.line = list(col = "transparent"), dot.line=list(lwd=0)),
axis = function(side, line.col = "black", ...) {
if(side %in% c("left","bottom")) {axis.default(side = side, line.col = "black", ...)}})
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif",
current.panel.limits()$ylim[1]/1.3, cex=1,
"Per capita\nbudget expandures\nin constant dollars")
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif",
current.panel.limits()$ylim[1]/5.5, cex=1, "5%")
http://motioninsocial.com/tufte/ Page 4 of 29
Tufte in R 11/01/2022, 19:20
library(highcharter)
x <- 1967:1977
y <- c(290,318,372,385,385,372,386,380,390,400,380)
d <- data.frame(x, y)
highchart() %>%
hc_chart(type = "scatter") %>%
hc_subtitle(text = "Per capita budget expanditures in constant dollars") %>%
hc_yAxis(labels = list(format = "${value}")) %>%
hc_add_series(data = d) %>%
hc_add_theme(hc_theme_tufte())
$400
$380
$360
$340
$320
$300
$280
1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977
http://motioninsocial.com/tufte/ Page 5 of 29
Tufte in R 11/01/2022, 19:20
Range frame plot in base graphics Edward Tufte, The Visual Display
of Quantitative Information
http://motioninsocial.com/tufte/ Page 6 of 29
Tufte in R 11/01/2022, 19:20
http://motioninsocial.com/tufte/ Page 7 of 29
Tufte in R 11/01/2022, 19:20
http://motioninsocial.com/tufte/ Page 8 of 29
Tufte in R 11/01/2022, 19:20
http://motioninsocial.com/tufte/ Page 9 of 29
Tufte in R 11/01/2022, 19:20
(http://deanattali.com/2015/03/29/ggExtra-
r-package/) to create margin
histograms. There is no option to
create dot-histograms, just the
standard bar-based histograms.
http://motioninsocial.com/tufte/ Page 10 of 29
Tufte in R 11/01/2022, 19:20
library(ggplot2)
library(ggExtra)
library(ggthemes)
p <- ggplot(faithful, aes(waiting, eruptions)) + geom_point() + theme_tufte(ticks=F) +
theme(axis.title=element_blank(), axis.text=element_blank())
ggMarginal(p, type = "density")
Margin densityplot
library(ggplot2)
library(ggExtra)
library(ggthemes)
p <- ggplot(faithful, aes(waiting, eruptions)) + geom_point() + theme_tufte(ticks=F) +
theme(axis.title=element_blank(), axis.text=element_blank())
ggMarginal(p, type = "boxplot", size=10, fill="transparent")
Margin boxplot
Minimal boxplot
http://motioninsocial.com/tufte/ Page 11 of 29
Tufte in R 11/01/2022, 19:20
project.org/web/packages/PerformanceAnalytics/inde
package with dedicated
as.Tufte=T argument. This
http://motioninsocial.com/tufte/ Page 12 of 29
Tufte in R 11/01/2022, 19:20
Minimal barchart
http://motioninsocial.com/tufte/ Page 13 of 29
Tufte in R 11/01/2022, 19:20
library(ggplot2)
library(ggthemes)
library(psych)
library(reshape2)
d <- melt(colMeans(msq[,c(2,7,34,36,42,43,46,55,68)],na.rm = T)*10)
d$trait <- rownames(d)
ggplot(d, aes(x=trait, y=value)) + theme_tufte(base_size=14, ticks=F) +
geom_bar(width=0.25, fill="gray", stat = "identity") + theme(axis.title=element_blank()) +
scale_y_continuous(breaks=seq(1, 5, 1)) +
geom_hline(yintercept=seq(1, 5, 1), col="white", lwd=1) +
annotate("text", x = 3.5, y = 5, adj=1, family="serif",
label = c("Average scores\non negative emotion traits
from 3896 participants\n(Watson et al., 1988)"))
http://motioninsocial.com/tufte/ Page 14 of 29
Tufte in R 11/01/2022, 19:20
library(psych)
library(reshape)
library(highcharter)
values <- 1 + abs(rnorm(12))
d <- melt(colMeans(msq[,c(2,7,34,36,42,43,46,55,68)], na.rm = T)*10)
trait <- row.names(d)
value <- as.vector(d[,1])
highchart() %>%
hc_chart(type = "column") %>%
hc_add_series(data = value) %>%
hc_xAxis(categories = row.names(d)) %>%
hc_add_theme(hc_theme_tufte2())
0
afraid ashamed guilty hostile irritable jittery nervous scared upset
http://motioninsocial.com/tufte/ Page 15 of 29
Tufte in R 11/01/2022, 19:20
Slopegraph
http://motioninsocial.com/tufte/ Page 16 of 29
Tufte in R 11/01/2022, 19:20
Slopegraph in ggplot2 with ggslopegraph (with bugs, in preparation) been logged here on Github
(https://github.com/leeper/slopegraph/issues/17)
Sparklines
There is no ‘out-of-box’ solution in the existing packages that truly replicate Tufte-style sparklines. Main
issues are scaling the size of the plot and labeling of the points - those factors are likely to change depending
on the data set you’re plotting, so you will have to adjust specific parameters (which I highlight for every
graphical system). To make the output more consistent, every sparkline plot will be automatically saved in
the working directory in a vector format as a PDF (using pdf() and dev.off() functions).
A word of warning - in its current format, making sparklines requires a bit more advanced knowledge of R.
Its far from perfect - proceed with caution.
http://motioninsocial.com/tufte/ Page 17 of 29
Tufte in R 11/01/2022, 19:20
library(RCurl)
dd <- read.csv(text = getURL("https://gist.githubusercontent.com/GeekOnAcid/da022affd36310c96cd4/
raw/9c2ac2b033979fcf14a8d9b2e3e390a4bcc6f0e3/us_nr_of_crimes_1960_2014.csv"))
d <- dd[,c(2:11)]
pdf("sparklines_base.pdf", height=10, width=6)
par(mfrow=c(ncol(d),1), mar=c(1,0,0,8), oma=c(4,1,4,4))
for (i in 1:ncol(d)){
plot(d[,i], lwd=0.5, axes=F, ylab="", xlab="", main="", type="l", new=F)
axis(4, at=d[nrow(d),i], labels=round(d[nrow(d),i]), tick=F, las=1, line=-1.5,
family="serif", cex.axis=1.2)
axis(4, at=d[nrow(d),i], labels=names(d[i]), tick=F, line=1.5,
family="serif", cex.axis=1.4, las=1)
text(which.max(d[,i]), max(d[,i]), labels=round(max(d[,i]),0),
family="serif", cex=1.2, adj=c(0.5,3))
text(which.min(d[,i]), min(d[,i]), labels=round(min(d[,i]),0),
family="serif", cex=1.2, adj=c(0.5,-2.5))
ymin <- min(d[,i]); tmin <- which.min(d[,i]); ymax<-max(d[,i]); tmax<-which.max(d[,i]);
points(x=c(tmin,tmax), y=c(ymin,ymax), pch=19, col=c("red","blue"), cex=1)
rect(0, summary(d[,i])[2], nrow(d), summary(d[,i])[4], border=0,
col = rgb(190, 190, 190, alpha=90, maxColorValue=255))}
axis(1, at=1:nrow(dd), labels=dd$Year, pos=c(-5), tick=F, family="serif", cex.axis=1.4)
dev.off()
http://motioninsocial.com/tufte/ Page 18 of 29
Tufte in R 11/01/2022, 19:20
(https://cran.r-
library(epanetReader)
library(reshape) project.org/web/packages/epanetReader/index.html)
library(RCurl) package by Bradley Eck. This
dd <- read.csv(text = getURL("https://gist.githubusercontent.com/GeekOnAcid/da022affd36310c96cd4/ compact solution requires data to
raw/9c2ac2b033979fcf14a8d9b2e3e390a4bcc6f0e3/us_nr_of_crimes_1960_2014.csv"))
be in long table format and it has
d <- melt(dd[,c(2:11)])
a limited customisation options.
pdf("sparklines_base_epanetReader.pdf", height=6, width=10)
plotSparklineTable(d, row.var = 'variable', col.vars = 'value') Great for making a rapid
dev.off() summaries with sparklines.
Sparklines in lattice
You have much better control over the location and size of sparklines when you use lattice. The only
problem are right-side labels for which you have to use grid library in order to ‘hack’ the view parameters
with functions pushViewport() and popViewport() . You can learn more about this in an extensive
collection of grid vignettes (https://stat.ethz.ch/R-manual/R-devel/library/grid/html/grid-
package.html).
http://motioninsocial.com/tufte/ Page 19 of 29
Tufte in R 11/01/2022, 19:20
library(lattice)
library(latticeExtra)
library(grid)
library(reshape)
library(RCurl)
dd <- read.csv(text = getURL("https://gist.githubusercontent.com/GeekOnAcid/da022affd36310c96cd4/
raw/9c2ac2b033979fcf14a8d9b2e3e390a4bcc6f0e3/us_nr_of_crimes_1960_2014.csv"))
d <- melt(dd, id="Year")
names(d)[1] <- "time"
pdf("sparklines_lattice.pdf", height=10, width=8)
xyplot(value~time | variable, d, xlab="", ylab="", strip=F, lwd=0.7, col=1, type="l",
layout=c(1,length(unique(d$variable))), between = list(y = 1),
scales=list(y=list(at=NULL, relation="free"), x=list(fontfamily="serif")),
par.settings = list(axis.line = list(col = "transparent"),
layout.widths=list(right.padding=20, left.padding=-5)),
panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
pushViewport(viewport(xscale=current.viewport()$xscale-5,
yscale=current.viewport()$yscale, clip="off"))
panel.text(x=tail(x,n=1), y=tail(y,n=1), labels=levels(d$variable)[panel.number()],
fontfamily="serif", pos=4)
popViewport()
panel.text(x=x[which.max(y)], y=max(y), labels=round(max(y),0), cex=0.8,
fontfamily="serif",adj=c(0.5,2.5))
panel.text(x=x[which.min(y)], y=min(y), labels=round(min(y),0), cex=0.8,
fontfamily="serif",adj=c(0.5,-1.5))
panel.text(x=tail(x,n=1), y=tail(y,n=1), labels=round(tail(y,n=1),0), cex=0.8,
fontfamily="serif", pos=4)
panel.points(x[which.max(y)], max(y), pch=16, cex=1)
panel.points(x[which.min(y)], min(y), pch=16, cex=1, col="red")
panel.rect(min(x), quantile(y, 0.25), max(x), quantile(y, 0.75),
col = "grey", border = "transparent", alpha = 0.4)
})
dev.off()
http://motioninsocial.com/tufte/ Page 20 of 29
Tufte in R 11/01/2022, 19:20
Sparklines in ggplot2
Created with the help of Wouter
Van Der Bijl
(http://stackoverflow.com/users/4341440/axeman)
as part of this StackOverflow
(http://stackoverflow.com/questions/35434760/sparkl
in-ggplot2/35436104#35436104)
question.
http://motioninsocial.com/tufte/ Page 21 of 29
Tufte in R 11/01/2022, 19:20
library(ggplot2)
library(ggthemes)
library(dplyr)
library(reshape)
library(RCurl)
dd <- read.csv(text = getURL("https://gist.githubusercontent.com/GeekOnAcid/da022affd36310c96cd4/
raw/9c2ac2b033979fcf14a8d9b2e3e390a4bcc6f0e3/us_nr_of_crimes_1960_2014.csv"))
d <- melt(dd, id="Year")
names(d) <- c("Year","Crime.Type","Crime.Rate")
d$Crime.Rate <- round(d$Crime.Rate,0)
mins <- group_by(d, Crime.Type) %>% slice(which.min(Crime.Rate))
maxs <- group_by(d, Crime.Type) %>% slice(which.max(Crime.Rate))
ends <- group_by(d, Crime.Type) %>% filter(Year == max(Year))
quarts <- d %>% group_by(Crime.Type) %>%
summarize(quart1 = quantile(Crime.Rate, 0.25),
quart2 = quantile(Crime.Rate, 0.75)) %>%
right_join(d)
pdf("sparklines_ggplot.pdf", height=10, width=8)
ggplot(d, aes(x=Year, y=Crime.Rate)) +
facet_grid(Crime.Type ~ ., scales = "free_y") +
geom_ribbon(data = quarts, aes(ymin = quart1, max = quart2), fill = 'grey90') +
geom_line(size=0.3) +
geom_point(data = mins, col = 'red') +
geom_point(data = maxs, col = 'blue') +
geom_text(data = mins, aes(label = Crime.Rate), vjust = -1) +
geom_text(data = maxs, aes(label = Crime.Rate), vjust = 2.5) +
geom_text(data = ends, aes(label = Crime.Rate), hjust = 0, nudge_x = 1) +
geom_text(data = ends, aes(label = Crime.Type), hjust = 0, nudge_x = 5) +
expand_limits(x = max(d$Year) + (0.25 * (max(d$Year) - min(d$Year)))) +
scale_x_continuous(breaks = seq(1960, 2010, 10)) +
scale_y_continuous(expand = c(0.1, 0)) +
theme_tufte(base_size = 15, base_family = "Helvetica") +
theme(axis.title=element_blank(), axis.text.y = element_blank(),
axis.ticks = element_blank(), strip.text = element_blank())
dev.off()
http://motioninsocial.com/tufte/ Page 22 of 29
Tufte in R 11/01/2022, 19:20
Stem-and-leaf display
Stem-and-leaf display is not exactly a ‘Tuftesque’ solution as it invented in the beginning of 20 century but Edward Tufte, The Visual Display
was only popularised in 1980s by John Tukey. A stem-and-leaf display is a display for presenting of Quantitative Information
quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a (Cheshire, 1983), p. 140 Based on
distribution. Stem-and-leaf plot is the only visualisation in this collection thats printed in the console in R John Tukey, Exploratory Data
rather than being processed with any graphical system. Analysis (Addison-Wesley, 1970).
http://motioninsocial.com/tufte/ Page 23 of 29
Tufte in R 11/01/2022, 19:20
##
## The decimal point is 1 digit(s) to the left of the |
##
## 16 | 070355555588
## 18 | 000022233333335577777777888822335777888
## 20 | 00002223378800035778
## 22 | 0002335578023578
## 24 | 00228
## 26 | 23
## 28 | 080
## 30 | 7
## 32 | 2337
## 34 | 250077
## 36 | 0000823577
## 38 | 2333335582225577
## 40 | 0000003357788888002233555577778
## 42 | 03335555778800233333555577778
## 44 | 02222335557780000000023333357778888
## 46 | 0000233357700000023578
## 48 | 00000022335800333
## 50 | 0370
http://motioninsocial.com/tufte/ Page 24 of 29
Tufte in R 11/01/2022, 19:20
##
## ***Stem and Leaf plot for infant birth weight (in grams) ***
## Grouped by levels of whether mother smoked during pregnancy (1) or not (0)
##
## 0
## :
## The decimal point is 2 digit(s) to the right of the |
##
## 10 | 2
## 12 | 3
## 14 | 799
## 16 | 03
## 18 | 9037
## 20 | 66809
## 22 | 4480358
## 24 | 14450025
## 26 | 24423558
## 28 | 144468822288
## 30 | 6668990088
## 32 | 003333377227
## 34 | 0266794479
## 36 | 011355037779
## 38 | 0366814478
## 40 | 00551577
## 42 |
## 44 | 9
## 46 |
## 48 | 9
##
##
## 1
## :
## The decimal point is 3 digit(s) to the right of the |
##
## 0 | 7
## 1 | 1
## 1 | 889999
## 2 | 11112223344444444
## 2 | 5555566677888899999
## 3 | 0000011111233333444
## 3 | 6666778999
## 4 | 2
http://motioninsocial.com/tufte/ Page 25 of 29
Tufte in R 11/01/2022, 19:20
Discussion
LOG IN WITH
OR SIGN UP WITH DISQUS ?
Name
http://motioninsocial.com/tufte/ Page 26 of 29
Tufte in R 11/01/2022, 19:20
Thank you Sandra, I hope to roll the well-overdue update soon, I'm glad its a useful resource for you :)
△ ▽ • Reply • Share ›
library(psych)
library(reshape)
library(highcharter)
1△ ▽ • Reply • Share ›
Thanks Joshua, I will update the code within the next month - keep me posted if you add more "tuftesque" themes to
highcharter.
1△ ▽ • Reply • Share ›
library(prettyR)
cs_stretch<-stretch_df(d,"group","value")
library(plotrix)
pdf("cancer_survival.pdf",width=10,height=12)
bumpchart(cs_stretch[,3:6],rank=FALSE,
top.labels=c(5,10,15,20),labels=cs_stretch[,1],
main="Years cancer survival by group",mar=c(2,12,5,12))
dev.off()
Jim
1△ ▽ • Reply • Share ›
Thank you Jim - I will test it and implement in my next update that I'm preparing for this month.
△ ▽ • Reply • Share ›
http://motioninsocial.com/tufte/ Page 27 of 29
Tufte in R 11/01/2022, 19:20
I notice from the examples by Leeper that the dataframes do not use a have a defined observation that is named, rather it is
silent and only pulled by row.names(). I am struggling to get similar results and wondering if there is a hint
http://motioninsocial.com/tufte/ Page 28 of 29
Tufte in R 11/01/2022, 19:20
silent and only pulled by row.names(). I am struggling to get similar results and wondering if there is a hint
robert
△ ▽ • Reply • Share ›
Indeed, it does work for me now so my advice - install newest version of all packages and R, restart everything, try again
with clear environment.
△ ▽ • Reply • Share ›
The culprit is the # in the second line. When I copy your code and delete it: col.lines <- rep(col.lines, length.out=nrow(df)), it
works perfectly.
△ ▽ • Reply • Share ›
Thanks for the hint - I will keep an eye when the new version is released to update the code.
△ ▽ • Reply • Share ›
http://motioninsocial.com/tufte/ Page 29 of 29