
QBUS5011 Week 10

Data Visualisation

Discipline of Business Analytics


The University of Sydney Business School
Reading

Graph Workflow - Demetris Christodoulou
https://graphworkflow.com
- Graph workflow model
- Graph objective
- Qualities of data graphics
- Retinal variables
- Graph identification
Data Wrangling - Mapping to Lectures

1. Understanding (Week 9)
2. Assembling (Week 9)
3. Cleaning (Week 9)
4. Transforming (Week 10)
5. Sharing (Weeks 10 and 11)
Motivation

Humans have not evolved to decode information from data tables. We have evolved to decode visual information.
- Graph Workflow, Demetris Christodoulou
Motivation

Consider a table of iPod sales during 2002-2015


Motivation

Using that table, try to answer the following:
- What sort of information can you decode by reading this table?
- How quickly can you decode this information?
- How much confidence do you have in the accuracy of the information that you have decoded?
- Will you remember this information after 10 minutes? How about after 1 day?
Motivation

Now try to answer the questions using this graph


Motivation

Clearly a visual representation can be a vastly more efficient way to share information between humans.
Data Visualisation Principles
Choosing the Right Visualisation

Christodoulou suggests three questions to guide your selection:
1. What is the statistical context of the question?
2. What are the properties of the data?
3. Who is the intended audience?
Statistical Context

The statistical context refers to the number and types of statistical variables involved.

For example, a univariate context involves a single variable, while a multivariate context involves relationships between multiple variables.

This extends further to hypervariate and multi-way contexts, which involve a larger number of variables or a categorical complication, respectively.
Statistical Context - Categories

- Distributional analysis of a single variable
- Correlation analysis between variables
- Comparative analysis of magnitudes and differences
- Compositional analysis of how components make up a total
- Temporal analysis
- Spatial analysis
- etc.
Statistical Context - Rules of Thumb

- Univariate questions suggest the use of quantile plots, distribution plots and box plots, or bar charts if the data is categorical.
- Multivariate questions suggest the use of the scatter plot and smoothing estimators, and clever use of retinal variables.
- Hypervariate and multi-way questions need more advanced encoding strategies, such as the use of small multiples and parallel coordinate plots.
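As a minimal sketch of the small multiples idea, the block below draws one scatter panel per category on shared axes. The segment names and the random data are made up for illustration, and the Agg backend is only assumed so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
segments = ["A", "B", "C", "D"]  # hypothetical categories
# Made-up data: one (x, y) sample per segment
data = {seg: (rng.normal(size=50), rng.normal(size=50)) for seg in segments}

# One panel per segment; shared axes keep the panels directly comparable
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(6, 6))
for ax, seg in zip(axes.flat, segments):
    x, y = data[seg]
    ax.scatter(x, y, s=10)
    ax.set_title(f"Segment {seg}")
```

Sharing the x and y axes is a design choice here: it lets the reader compare panels without re-reading each scale.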
Properties of Data

The three so-called “properties” of data
- Qualitative level
- Ordered level
- Quantitative interval-ratio level
have different presentation requirements.

For example, with qualitative variables care must be taken to ensure that the levels do not convey a particular order or preference.
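One way to avoid implying an order between qualitative levels is to colour them from a qualitative palette, which uses distinct hues rather than a light-to-dark ramp. The sketch below assumes matplotlib's built-in "tab10" colormap; the region names and counts are made up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

regions = ["North", "South", "East", "West"]  # hypothetical qualitative levels
customers = [120, 95, 140, 110]               # made-up counts

# Take the first few colours of a qualitative colormap: distinct hues, no ramp
colours = list(plt.get_cmap("tab10").colors[: len(regions)])

fig, ax = plt.subplots()
bars = ax.bar(regions, customers, color=colours)
ax.set_ylabel("Customers")
```

A sequential colormap (light to dark) would suit an ordered variable instead, because the ramp itself suggests a ranking.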
Intended Audience

The audience of the visualisation ultimately decides on the form.

This is further complicated by the fact that the audience is rarely an individual, and it may be impossible to provide a visualisation which perfectly targets the “decoding ability” of everyone.
Intended Audience

The decoding ability of the audience is influenced by many factors such as:
- understanding of the problem context
- prior exposure to the visualisation type (ubiquity)
- understanding of the statistical types involved
- complexity of visualisation
- strength or obviousness of the effect
Intended Audience

The audience's decoding ability can be hard to judge objectively since you are usually intimately acquainted with the data.

Small missteps can confuse or muddle the point.

To overcome this you can accompany visualisations with a short written guide or explanation that includes at least one example interpretation.
Accessibility
Visual Impairments

It is estimated that over 50% of the Australian population has one or more eye conditions [1].

Many of these conditions affect, in some way, how well a visualisation can be decoded.

For example, colour blindness might cause various plot elements to be interpreted as belonging to the same category, or make them impossible to decode.

[1] https://www.aihw.gov.au/reports/eye-health/eye-health/contents/how-common-is-visual-impairment
Visual Impairments - Colour Blindness
Accessibility - Rules of Thumb

- only use colour-blind compatible colour maps/sets
- use at least two methods of differentiation (colour, line style, markers, shading etc.)
- however, too much differentiation adds cognitive burden
- use simple and standard fonts
- adjust element and text sizes so that they are readable at the intended scale
- provide written alternatives for readers who are blind
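The rules of thumb above can be sketched as follows: each series is distinguished by colour, line style and marker at once, and text sizes are set explicitly. The hex codes are three colours from the Okabe-Ito colour-blind friendly palette (this choice is an assumption, not the lecture's), and the data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

colours = ["#0072B2", "#E69F00", "#009E73"]  # Okabe-Ito blue, orange, green
linestyles = ["-", "--", ":"]
markers = ["o", "s", "^"]

x = np.arange(6)
fig, ax = plt.subplots()
for i, (c, ls, m) in enumerate(zip(colours, linestyles, markers), start=1):
    # Made-up series; colour, line style and marker all differ between series
    ax.plot(x, i * x, color=c, linestyle=ls, marker=m, label=f"Series {i}")

ax.set_xlabel("x", fontsize=12)
ax.set_ylabel("y", fontsize=12)
ax.legend(fontsize=12)
```

If the colours are lost, for example in a greyscale print, the line styles and markers still separate the series.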
Visualisations with Python
Top 3 Libraries

- matplotlib is modelled on MATLAB’s plotting functions and is the most stable
- seaborn is an opinionated, higher-level library built on top of matplotlib
- plotly was originally a Javascript plotting library but now has libraries for Python and R

For QBUS5011 we will mostly use matplotlib.


Figures
A figure is like a painter's canvas upon which we can draw elements.

Within a figure we can have one or more axes, each of which is normally associated with a plot.
Figures
Looking more closely at a matplotlib figure there are many
elements which can be enabled, disabled or customised.
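A minimal sketch of these ideas: one figure holding two axes, with a few elements (title, legend, grid, spines) switched on or off. The data and labels are made up, and the Agg backend is assumed only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# One figure (the canvas) containing two axes (the plots)
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot([1, 2, 3, 4], [1, 4, 9, 16], label="y = x^2")
ax_line.set_title("Line plot")
ax_line.legend()       # enable the legend
ax_line.grid(True)     # enable grid lines

ax_bar.bar([0, 1, 2], [3, 1, 2])
ax_bar.set_title("Bar chart")
ax_bar.spines["top"].set_visible(False)    # disable the top border
ax_bar.spines["right"].set_visible(False)  # disable the right border

fig.suptitle("One figure, two axes")
```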
pyplot

The matplotlib library is expansive and allows for very low level control over plotting elements.

For most tasks it is recommended to use the pyplot sub-module which provides a higher level interface.
Your first plot
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16]) # Line plot
plt.show() # Display figure
pyplot is stateful

“In matplotlib.pyplot various states are preserved across function calls, so that it keeps track of things like the current figure and plotting area, and the plotting functions are directed to the current axes” [2]

This means that the functions we use implicitly refer to an existing current Figure and current Axes, or create them anew if none exist [3].

[2] https://matplotlib.org/stable/tutorials/introductory/pyplot.html#intro-to-pyplot
[3] https://realpython.com/python-matplotlib-guide/#stateful-versus-stateless-approaches
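The stateful behaviour can be seen in a short sketch: the first `plt.plot` call creates a Figure and Axes implicitly, and later `plt` calls act on whatever is current. The explicit `fig, ax` style is shown alongside for comparison; the titles are made up, and the Agg backend is assumed so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean state

# Implicit (stateful) style: no figure exists yet, so pyplot creates one
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
implicit_ax = plt.gca()        # the Axes that plt.plot just drew on
plt.title("Implicit")          # applied to the current Axes

# Explicit style: keep references and call methods on them directly
fig2, ax2 = plt.subplots()     # this new figure now becomes the current one
ax2.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax2.set_title("Explicit")
```

After the second figure is created, `plt.title` would target `ax2`, not the first plot, which is why the explicit style is often easier to follow once several figures are involved.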
Plotting in Ed

import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
# Save plot as a file, which Ed will open
plt.savefig("plot.png")
Scatter plot
import matplotlib.pyplot as plt
# Red circular markers; scatter takes colour and marker as keyword arguments
plt.scatter([1, 2, 3, 4], [1, 4, 9, 16], c='r', marker='o')
plt.show()
Integrating Real Data

The first two parameters of functions like plot and scatter can be
of any array-like type. This means that Pandas Series objects can
be directly passed to these functions.

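import matplotlib.pyplot as plt
import pandas as pd

# Load the marketing data set (the file must be in the working directory)
marketing = pd.read_csv('DirectMarketing.csv')

salary = marketing['Salary']
spent = marketing['AmountSpent']

# Pandas Series are passed straight to scatter
plt.scatter(salary, spent)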
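If the CSV file is not to hand, the same pattern can be tried with a small stand-in DataFrame. The values below are made up and are not from DirectMarketing.csv; only the column names are taken from the example above. Axis labels are added so the plot can be read on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the marketing data set
marketing = pd.DataFrame({
    "Salary": [30000, 45000, 60000, 75000, 90000],
    "AmountSpent": [400, 650, 900, 1200, 1500],
})

fig, ax = plt.subplots()
# Pandas Series are array-like, so they can be passed directly
ax.scatter(marketing["Salary"], marketing["AmountSpent"])
ax.set_xlabel("Salary")
ax.set_ylabel("Amount spent")
```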
Histograms

plt.hist(marketing['Salary'])
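The number of bins strongly affects how a histogram reads, so it is often worth setting explicitly. The sketch below uses randomly generated salaries as a stand-in for the real column; the distribution parameters and the choice of 20 bins are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
salary = rng.normal(loc=55000, scale=15000, size=500)  # made-up salaries

fig, ax = plt.subplots()
# hist returns the bin counts and bin edges as well as drawing the bars
counts, edges, _ = ax.hist(salary, bins=20)
ax.set_xlabel("Salary")
ax.set_ylabel("Frequency")
```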
Bar Charts
xpos = [0, 1, 2, 3, 4, 5, 6, 7]
height = [507, 269, 123, 567, 245, 346, 429, 329]

plt.bar(xpos, height)
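Bar positions can also be given as category names, which matplotlib then uses as the tick labels. The sketch below reuses the heights from the example above; the single-letter category names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D", "E", "F", "G", "H"]  # hypothetical categories
height = [507, 269, 123, 567, 245, 346, 429, 329]

fig, ax = plt.subplots()
bars = ax.bar(labels, height)  # string positions become tick labels
ax.set_ylabel("Count")
```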
Mediums
Saving Figures

Once you have completed a figure it is time to publish it, either by sending it to the audience as a standalone image file or by embedding it in a bigger publication, e.g. a report or slides.

pyplot provides a simple function for saving the state of a figure to a file.

plt.savefig("plot.png")

With savefig the image file type is automatically inferred from the
extension in the filename.
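As a sketch of the extension-based inference, the block below saves one figure three times, changing only the file extension. The temporary output directory and file names are assumptions made so the sketch does not clutter the working directory.

```python
import pathlib
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])

out_dir = pathlib.Path(tempfile.mkdtemp())  # hypothetical output location
saved = []
for ext in ("png", "pdf", "svg"):
    path = out_dir / f"plot.{ext}"
    fig.savefig(path)  # format is inferred from the extension
    saved.append(path)
```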
Raster graphics

Raster graphics are those that consist of a grid of pixels.

Most image formats that you encounter are raster graphics formats
e.g. JPEG, PNG and GIF. Video formats such as MP4 store the
image information in a raster format.

Raster graphics are sometimes called “bitmap” graphics.

Raster images have a fixed resolution, therefore scaling up (zooming) will cause an apparent loss of quality.
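The fixed resolution of a raster image is set at export time. As a rough sketch, the block below saves the same 4 by 3 inch figure as PNG at two `dpi` settings and reads back the pixel dimensions; the helper name `pixel_size` is made up for this example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))  # size in inches
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])

def pixel_size(dpi):
    """Save the figure as PNG in memory and return (width, height) in pixels."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    buf.seek(0)
    img = plt.imread(buf, format="png")  # array of shape (height, width, channels)
    return img.shape[1], img.shape[0]

low = pixel_size(50)    # 4 in * 50 dpi by 3 in * 50 dpi
high = pixel_size(200)  # 4 in * 200 dpi by 3 in * 200 dpi
```

Choosing a higher `dpi` gives more pixels to work with, but the image is still fixed at that size once saved.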
Vector Graphics

Vector graphics consist of a set of drawing instructions for the image elements, such as lines, curves and polygons.

In contrast to raster graphics they can be drawn at any scale without an apparent loss of quality.

Common examples of vector file formats are PDF, SVG and EPS.
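To see that a vector file really is a list of drawing instructions, the sketch below saves a figure as SVG into memory and inspects the resulting text. The variable names are illustrative only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])

buf = io.BytesIO()
fig.savefig(buf, format="svg")
svg_text = buf.getvalue().decode("utf-8")

# SVG is plain text: shapes are described by elements such as <path>
n_paths = svg_text.count("<path")
```

Opening the same SVG in a text editor shows the path commands that a viewer replays at whatever scale is requested.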
Vector Graphics

When exporting your figures it is highly recommended that you use a vector image format for the following reasons:
- scales to any desired resolution
- typically smaller file size
- printable
- sharing and redistribution of the image will not cause a loss of quality
Animations

Simple animations can be generated by using a loop to generate multiple figures, one per frame.

However, matplotlib provides various convenience functions to combine these frames into different output forms, e.g. GIF or MP4 files.
Animations
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
ln, = ax.plot([], [], 'r')  # Empty line, filled in by update()
ax.set_xlim(-3, 3)
ax.set_ylim(-10.2, 10.2)

def update(i):
    # Redraw the sine curve with amplitude and phase set by frame value i
    x = np.linspace(-3, 3)
    y = i*np.sin(x+i*2)
    ln.set_xdata(x)
    ln.set_ydata(y)

ani = FuncAnimation(fig, update, frames=np.linspace(0, 10))

# Writing MP4 requires ffmpeg to be installed
ani.save("sine.mp4")
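Where ffmpeg is not available, a GIF can be written instead. The sketch below is a variation on the example above, not the lecture's own code: it assumes matplotlib's `PillowWriter`, a simpler sine wave, and a temporary output path.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

fig, ax = plt.subplots()
(ln,) = ax.plot([], [], "r")
ax.set_xlim(-3, 3)
ax.set_ylim(-1.2, 1.2)
x = np.linspace(-3, 3, 100)

def update(phase):
    # Shift the sine wave by the phase given for this frame
    ln.set_data(x, np.sin(x + phase))
    return (ln,)

ani = FuncAnimation(fig, update, frames=np.linspace(0, 2 * np.pi, 10))

# Hypothetical output location; PillowWriter writes GIFs without ffmpeg
out_path = os.path.join(tempfile.mkdtemp(), "sine.gif")
ani.save(out_path, writer=PillowWriter(fps=10))
```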
Interactivity

Increasingly there is demand for visualisations to be interactive.

Often this is because
- the visualisation is large
- the visualisation is complex
- interactivity is seen as more engaging
Technical Affordances

Interactive visualisations cannot be published using standard image files.

Technical affordances need to be provided for interaction, e.g. responding to user input.

Currently the solutions are to leverage web-browser technology, i.e. HTML and Javascript, or a standalone software product like Tableau.
Browser Based Interactivity

Browser-based interactions are the dominant solution because they:
- are cross-platform
- work on almost any modern computing device
- can be accessed over a network on demand
- don’t require any software to be installed
- are readily deployed as standalone websites or integrated into existing ones
plot.ly Dash

The state of the art when it comes to browser-based interactivity is plot.ly Dash, which is a library for generating interactive visualisations and dashboards with your choice of Python, R or Javascript.

Interactivity is provided through callbacks, which are functions that get triggered when a specified interaction occurs, e.g. a mouse click on a graph or a change to a drop-down selection.
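The callback idea itself can be sketched without Dash. The block below is not Dash code: it swaps in matplotlib's own event hooks (`mpl_connect`) to register a function that runs when a mouse click occurs, and then simulates a click so the sketch runs without a display. The handler name and click coordinates are made up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
from matplotlib.backend_bases import MouseEvent

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])

clicks = []

def on_click(event):
    # Callback: runs whenever a mouse button is pressed on the figure
    clicks.append((event.x, event.y))

# Register the callback for mouse-press events
cid = fig.canvas.mpl_connect("button_press_event", on_click)

# Simulate a click at pixel (100, 120), since there is no real user here
fake_click = MouseEvent("button_press_event", fig.canvas, 100, 120, button=1)
fig.canvas.callbacks.process("button_press_event", fake_click)
```

In a browser-based tool the pattern is the same: the library watches for the interaction and calls the registered function with details of what happened.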
plot.ly Dash

LIVE DEMO
Too Many Guns

https://toomanyguns.herokuapp.com/
LIVE DEMO
