Professional Documents
Culture Documents
Lecture #3
Lecture #3
Lecture #3
Lecture 3
Visualizing data
Learning Objectives
What is a visualization?
Representing numerical and categorical data with graphical elements (color, shape, position, size)
4
GET1030
Boxplots
Barplots
Lineplots
Scatterplots
Histograms
KDE plots
Violinplots
Joint plots
5
GET1030
Boxplot
1st quartile =
Median =
3rd quartile =
Minimum =
Maximum =
6
GET1030
Boxplot (Outliers)
1.5 x IQR
Calculating outliers using Tukey’s rule
IQR = 75th percentile - 25th percentile
Median = 10
Minimum = 3
Maximum = 17
7
GET1030
Categories in boxplots
variables
1. days of the week
2. bill size
categories
1. smoke or not
8
GET1030
Mean =
Variance =
Standard Deviation =
9
GET1030
10
GET1030
Scatterplot
11
GET1030
Scatterplot
12
GET1030
Scatterplot
Shows the numerical relationship between two variables. Color can be use categorical variables.
13
GET1030
Histogram
14
GET1030
Histogram
15
GET1030
16
GET1030
17
GET1030
18
GET1030
Histogram
https://tinlizzie.org/histograms/
19
GET1030
20
GET1030
Violinplot
21
GET1030
Jointplots
22
GET1030
Pairplots
23
Lecture 3
Visualizing data
24
GET1030
From The Commercial and Political Atlas; Representing, by Means of Stained Copper-Plate
Charts, the Exports, Imports, and General Trade of England, at a Single View (1785)
25
GET1030
26
GET1030
‘Coxcom’ (polar area graph) visualization of the cause of death in the army (1850s
27
Crimea War)
*preventable causes, wounds, accidents
GET1030
The scatterplot
29
GET1030
The scatterplot
30
GET1030
Statisticians often disagree with a type of visualization aesthetic common in the news, which was most
influential developed by Nigel Holmes, while working for TIME magazine in the 1970s.
*as we’ll see later, this is the kind of graph that Tufte would classify as ‘chartjunk’
31
GET1030
Additional resources
https://exhibits.stanford.edu/dataviz 32
GET1030
33
GET1030
Edward Tufte:
Against “chartjunk”
Popularized sparklines.
Famous for several concepts:
lie factor, the data-ink ratio (against decoration), small multiples and
the data density of a graphic.
34
GET1030
To grab attention?
To show trends?
To help scientists analyze data?
35
Lecture 3
Visualizing data
36
GET1030
Axis Cropping
Kendall Fortney, “5 Ways Data Visualizations can Lie”, Towards data science,
https://towardsdatascience.com/5-ways-data-visualizations-can-lie-46e54f41de37
37
GET1030
Axis Scalling
38
GET1030
Axis Scaling
Kendall Fortney, “5 Ways Data Visualizations can Lie”, Towards data science,
https://towardsdatascience.com/5-ways-data-visualizations-can-lie-46e54f41de37 39
GET1030
Maryland is bigger than the others (by 3%). The 3D effect visually
adds more volume to NH, tricking your eyes. Without labels telling
the percentages there would be little to no chance of accurately
guessing it.
Kendall Fortney, “5 Ways Data Visualizations can Lie”, Towards data science,
https://towardsdatascience.com/5-ways-data-visualizations-can-lie-46e54f41de37 40
GET1030
42
GET1030
Multiple dimensions
43
GET1030
44
GET1030
Spurious correlations
http://www.tylervigen.com/spurious-correlations 45
GET1030
Cabanski, C., Gilbert, H., & Mosesova, S. (2018). Can Graphics Tell Lies? A Tutorial on How
To Visualize Your Data. Clinical and translational science, 11(4), 371–377.
doi:10.1111/cts.12554
46
GET1030
List of caveats
https://www.data-to-viz.com/caveats.html
48
Lecture 3
Visualizing data
49
GET1030
Alternative modes
Drucker (2011)
50
GET1030
Alternative modes
Drucker (2011)
51
GET1030
52
Hepword and Church (2011)
GET1030
53
Hepword and Church (2011)
GET1030
54
Hepword and Church (2011)
GET1030
55
GET1030
By Nathan Yau
https://flowingdata.com/2017/01/24/one-data
set-visualized-25-ways/
56
GET1030
Jer Thorp
https://medium.com/@blprnt/you-say-data-i-say-system-54e84aa7a421
57
GET1030
● Culturally and historically specific ‘ways of seeing’ that reflect values, cultures,
aesthetics and prejudices.
58
GET1030
World ⇢ data
59
GET1030
Data ⇢ image
60
GET1030
Image ⇢ eye
Visual cultures - what are they and where do they come from?
61
GET1030
Data infrastructures
Visualization devices, rather than ‘tools’ (they emphasize world-making)
‘Communicative objectivity’ Halpern
Tufte advocates: championing efficiency and parsimony, maximizing the
‘data–ink’ ratio, and eliminating ‘chartjunk’
62
GET1030
63
GEK 2050
GET1030
You must explain the computational thinking process that led to this
project: how data was modeled, which aspects of the analysis were
automated, and what output was produced. You should also explain what
are the potential blind spots and limitations of the project.
File format: one file with your name clearly stated in the body of the text
(5 marks will be deducted if this missing). For citations, please use the
Chicago Manual of Style 17th Edition, Author-Date format. Do not cite
academic journals is if they were websites.
64
GEK 2050
GET1030
65
GET1030
https://xkcd.com/688/
66
GET1030
References
Drucker, Johanna. 2011. “Humanities Approaches to Graphical Display.” Digital Humanities Quarterly 005 (1).
Hepworth, Katherine, and Christopher Church. 2019. “Racism in the Machine: Visualization Ethics in Digital Humanities
Projects.” Digital Humanities Quarterly 012 (4).
Gray, Jonathan, Liliana Bounegru, Stefania Milan, and Paolo Ciuccarelli. 2016. “Ways of Seeing Data: Toward a Critical
Literacy for Data Visualizations as Research Objects and Research Devices.” In Innovative Methods in Media and
Communication Research, edited by Sebastian Kubitschko and Anne Kaun, 227–51. Cham: Springer International
Publishing. https://doi.org/10.1007/978-3-319-40700-5_12.
Thorp, Jer. 2017. “You Say Data, I Say System.” Hacker Noon. July 13, 2017.
https://hackernoon.com/you-say-data-i-say-system-54e84aa7a421.
Friendly, Michael and Daniel Denis. “The early origins and development of the scatterplot.” Journal of the History of the
Behavioral Sciences, 41(2), 103–130.
Nundy, Surajit et al. 2000. “Why are angles misperceived?”. Proc Natl Acad Sci 97(10): 5592–5597.
67