
THE COMPLETE DATA VISUALIZATION COURSE
CHARTS AND DATA TYPES
THERE IS NEVER ONLY ONE RIGHT VISUALIZATION

• NUMERICAL & CATEGORICAL
• NUMERICAL & NUMERICAL
• TIME SERIES


COLORS
CHOOSE 2-3 COLORS FOR YOUR CHART

• PREDETERMINED – COMPANY COLORS: CLIENTS REQUEST SPECIFIC COLORS
• ONLINE TOOLS – CREATE YOUR OWN CUSTOM PALETTE WITH THE AID OF ONLINE TOOLS
• OUR COLOR PALETTES – YOU CAN USE ANY OF OUR TEMPLATES TO BUILD YOUR OWN GRAPHS AND CHARTS
Bar Chart
[Figure: Car Listings by Brand – bar chart; y-axis: Number of Listings]
• COMMUNICATE YOUR INTENTIONS CLEARLY
• MAKE SURE YOUR CHART ISN’T MISLEADING
• INTUITIVE
• APPROPRIATE FOR NON-TECHNICAL AUDIENCES
• ONE OF THE MOST COMMONLY USED CHARTS
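A common way a bar chart becomes misleading is a value axis that does not start at 0, which exaggerates differences between bars. As a minimal sketch, assuming plain-text output is enough to show the idea, the hypothetical helper `text_bar_chart` below scales every bar from a zero baseline and sorts bars by value (the sorting is an assumed readability choice). The brands and values are made up for illustration.

```python
def text_bar_chart(data, width=40):
    """Render a horizontal bar chart as text (hypothetical helper).

    Bars are sorted from largest to smallest and scaled from a zero
    baseline, so bar lengths stay proportional to the values.
    """
    if any(v < 0 for v in data.values()):
        raise ValueError("this sketch only handles non-negative values")
    top = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        # Length is value / max measured from 0, never from the smallest value.
        length = round(width * value / top) if top else 0
        lines.append(f"{label:<{label_width}} | {'#' * length} {value}")
    return "\n".join(lines)


# Usage with made-up data: Brand B's bar is exactly twice as long as Brand A's.
print(text_bar_chart({"Brand A": 300, "Brand B": 600, "Brand C": 150}, width=20))
```

With a plotting library the same rule applies: keep the value axis of a bar chart anchored at 0.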
Pie Chart
[Figure: Cars by Engine Fuel Type – pie chart with slices for Diesel, Gas, Petrol and Other; slice labels: 46%, 36%, 14%, 4%]
• DON’T USE WHEN THE DATA DOESN’T SUM TO 100%
• DON’T USE WHEN THERE ARE TOO MANY CATEGORIES
• APPROPRIATE FOR NON-TECHNICAL AUDIENCES
• WIDELY USED, DESPITE CRITICISM
• A FEW CATEGORIES
• NO 3D OR DOUGHNUT CHARTS
• DATA SUMS UP TO 100%
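The two pie chart rules (the data sums to 100% and there are only a few categories) can be checked before plotting. A minimal sketch, where `check_pie_data` is a hypothetical helper; the cut-off of 5 categories and the 0.5 percentage-point rounding tolerance are assumptions, since the course gives no exact numbers.

```python
import math


def check_pie_data(shares, max_categories=5, tol=0.5):
    """Return a list of problems that make the data unsuitable for a pie chart.

    shares maps category -> percentage. max_categories and tol are assumed
    thresholds, not values taken from the course.
    """
    problems = []
    total = sum(shares.values())
    # Allow a small tolerance because published percentages are usually rounded.
    if not math.isclose(total, 100, abs_tol=tol):
        problems.append(f"shares sum to {total:g}%, not 100%")
    if len(shares) > max_categories:
        problems.append(f"{len(shares)} categories is too many for a pie chart")
    if any(v < 0 for v in shares.values()):
        problems.append("negative shares cannot be parts of a whole")
    return problems


print(check_pie_data({"A": 46, "B": 36, "C": 14, "D": 4}))  # []
print(check_pie_data({"A": 60, "B": 60}))  # ['shares sum to 120%, not 100%']
```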
Stacked Area Chart
[Figure: Popularity of engine fuel types (1982-2016) – stacked area chart of Gas, Petrol and Diesel; x-axis: year, y-axis: Number of Cars]
• AVOID WHEN YOU HAVE TOO MANY CATEGORIES – A LINE CHART WORKS BETTER
• AVOID WITH CATEGORIES OF SIMILAR SIZE – IT IS DIFFICULT TO JUDGE THE SIZE OF NON-RECTANGULAR SHAPES
• ORDER CATEGORIES BY SIZE – TO IMPROVE READABILITY
• COMPARE VOLUME AMONG FEATURES
• AT LEAST THREE FEATURES
• ORDERING FOR AT LEAST TWO OF THEM
• Y-AXIS MUST START AT 0 – WE’RE MEASURING VOLUME
• TIME SERIES DATA
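Ordering categories by size and stacking them from 0 is a data-preparation step that happens before drawing. A minimal sketch, assuming each category is a list with one value per time step: the hypothetical helper `stack_layers` sorts categories by their total and returns the cumulative top edge of each layer. Putting the largest category at the bottom is an assumed convention.

```python
def stack_layers(series):
    """Order categories by total size and return cumulative layer tops.

    series maps a category name to a list of values, one per time step.
    Returns (name, top_edge) pairs from the bottom layer upwards.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) != 1:
        raise ValueError("every series needs one value per time step")
    (n_steps,) = lengths
    # Largest total first, so it forms the bottom layer (assumed convention).
    order = sorted(series, key=lambda name: sum(series[name]), reverse=True)
    running = [0] * n_steps  # the stack starts at 0 because we measure volume
    layers = []
    for name in order:
        running = [base + v for base, v in zip(running, series[name])]
        layers.append((name, running))
    return layers


for name, top in stack_layers({"A": [1, 2, 3], "B": [5, 5, 5], "C": [1, 1, 1]}):
    print(name, top)
# B [5, 5, 5]
# A [6, 7, 8]
# C [7, 8, 9]
```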
Line Chart
• WHEN YOU HAVE A LARGE PERIOD OF TIME, NARROW IT DOWN TO GAIN MORE INSIGHT
[Figure: S&P vs FTSE Returns (H2 2008) – line chart of GSPC500 and FTSE100 returns from 7/1/2008 to 12/1/2008; y-axis: return (%)]
• BE CAREFUL NOT TO INCLUDE TOO MANY CATEGORIES, TO AVOID A SPAGHETTI CHART
• UP TO SEVERAL CATEGORIES
• TIME SERIES DATA
• Y-AXIS DOESN’T HAVE TO START AT 0
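Narrowing a long period down, as in the H2 2008 example, is a filtering step before plotting. A minimal sketch assuming prices arrive as (date, price) pairs; `returns_in_window` is a hypothetical helper. Rebasing to the first observation in the window is one common way to put two indices, such as the S&P 500 and FTSE 100, on the same percentage scale. The prices below are made up.

```python
from datetime import date


def returns_in_window(prices, start, end):
    """Keep observations with start <= day <= end and express each one as a
    percentage return relative to the first price inside that window."""
    window = sorted((d, p) for d, p in prices if start <= d <= end)
    if not window:
        raise ValueError("no observations in the requested window")
    base = window[0][1]
    return [(d, 100 * (p / base - 1)) for d, p in window]


prices = [
    (date(2008, 1, 2), 100.0),   # outside H2 2008, dropped by the filter
    (date(2008, 7, 1), 80.0),
    (date(2008, 8, 1), 88.0),
    (date(2008, 12, 1), 72.0),
]
h2_2008 = returns_in_window(prices, date(2008, 7, 1), date(2008, 12, 31))
print([round(r, 1) for _, r in h2_2008])  # [0.0, 10.0, -10.0]
```

Because the series are returns rather than volumes, the y-axis of the resulting line chart does not need to start at 0.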
Histogram

• SIMILAR TO A BAR CHART, BUT WITH NO GAP BETWEEN BINS
• TO CREATE A HISTOGRAM, DETERMINE THE INTERVAL SIZE AND CHOOSE THE NUMBER OF BINS
• DISTRIBUTION OF A NUMERIC VARIABLE
• THE VARIABLE’S RANGE OF VALUES IS SPLIT INTO INTERVALS OR BINS
• Y-AXIS – NUMBER OF OBSERVATIONS WITHIN EACH INTERVAL (OR DENSITY)
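Splitting a variable's range into bins and counting observations per bin can be sketched in plain Python. This assumes equal-width bins over the data's range, half-open intervals [lo, hi), and the maximum value placed in the last bin; `histogram_counts` is a hypothetical helper, not a library function.

```python
def histogram_counts(values, n_bins):
    """Split the range of values into n_bins equal-width intervals and count
    the observations in each. Intervals are half-open [lo, hi), except the
    last one, which also includes the maximum value."""
    if n_bins < 1:
        raise ValueError("need at least one bin")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all values being identical
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of past it.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


edges, counts = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 5], n_bins=4)
print(edges)   # [1.0, 2.0, 3.0, 4.0, 5.0]
print(counts)  # [1, 2, 3, 3]
```

Here the interval size follows from the chosen number of bins; fixing the interval size first and deriving the bin count works the same way.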
CHOOSING THE NUMBER OF BINS

• START WITH A VERY LARGE NUMBER TO OBSERVE THE DATA PATTERN
• REDUCE THE NUMBER
• CHOOSE SEVERAL BINS, SUCH THAT THE PATTERN IN THE DATA IS VISIBLE

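There are scientific approaches; however, they are not often used in practice, because real data has noise, is discrete, etc. With $n$ the number of observations:
• Scott’s rule (bin width): $h = 3.49\,\sigma\,n^{-1/3}$, where $\sigma$ is the sample standard deviation
• Sturges’ rule (number of bins): $K = 1 + 3.322\log_{10} n$
• Doane’s rule (number of bins): $K = 1 + \log_2(n) + \log_2\left(1 + \frac{|b|}{\sigma_b}\right)$, where $b$ is the sample skewness and $\sigma_b = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}}$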
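The three rules above can be computed directly with Python's `math` and `statistics` modules. Scott's rule returns a bin width, while Sturges' and Doane's rules return a number of bins, which you would typically round up in practice. Estimating the skewness $b$ from population moments is an assumption of this sketch, and the function names are hypothetical.

```python
import math
import statistics


def scott_bin_width(values):
    # Scott's rule: h = 3.49 * sigma * n^(-1/3), sigma = sample standard deviation.
    n = len(values)
    return 3.49 * statistics.stdev(values) * n ** (-1 / 3)


def sturges_bins(n):
    # Sturges' rule: K = 1 + 3.322 * log10(n), where 3.322 ~ 1 / log10(2).
    return 1 + 3.322 * math.log10(n)


def doane_bins(values):
    # Doane's rule: K = 1 + log2(n) + log2(1 + |b| / sigma_b),
    # b = sample skewness, sigma_b = sqrt(6(n - 2) / ((n + 1)(n + 3))).
    n = len(values)
    if n < 3:
        raise ValueError("Doane's rule needs at least 3 observations")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    b = m3 / m2 ** 1.5 if m2 else 0.0
    sigma_b = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    return 1 + math.log2(n) + math.log2(1 + abs(b) / sigma_b)


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(math.ceil(sturges_bins(len(data))), math.ceil(doane_bins(data)))
print(scott_bin_width(data))
```

For symmetric data ($b = 0$) Doane's rule reduces to $1 + \log_2 n$, which matches Sturges' rule because $3.322\log_{10} n \approx \log_2 n$.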
Scatter Plot
[Figure: Relationship between Area and Price of California Real Estate – scatter plot; x-axis: Area (sq. ft.), y-axis: Price (thousands of $)]
• USE TRANSPARENCY TO AVOID OVERPLOTTING
• A THIRD VARIABLE CAN BE SHOWN WITH A COLOR PARAMETER
• DISPLAYS EACH POINT FROM THE DATA, INSTEAD OF SHOWING AN AGGREGATED FORM
• SHOWS THE RELATIONSHIP BETWEEN VARIABLES
Regression Plot
[Figure: Advertisement vs Sales – regression plot; x-axis: Sales in 1000 $, y-axis: Budget in 1000 units; fitted line y = 0.0487x + 4.243, R² = 0.7529]
• THERE EXIST MANY TYPES OF RELATIONSHIPS BETWEEN VARIABLES
• SOMETIMES THERE IS NO APPARENT RELATIONSHIP BETWEEN FEATURES
• USED TO DETERMINE RELATIONSHIPS BETWEEN PREDICTOR(S) AND OUTCOME
• THE REGRESSION LINE & EQUATION HELP US QUANTIFY THE RELATIONSHIP
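An equation of the form y = 0.0487x + 4.243 together with R² is what an ordinary least squares fit with a single predictor produces. A self-contained sketch of that computation; `fit_line` is a hypothetical name, and returning NaN for R² when y is constant is a choice of this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept, plus R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance, so the slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")  # undefined for constant y
    return slope, intercept, r2


slope, intercept, r2 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {slope:g}x + {intercept:g}, R² = {r2:g}")  # y = 2x + 1, R² = 1
```

An R² near 0 corresponds to the case where there is no apparent linear relationship between the features.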
ADDITIONAL RESOURCES
• https://seaborn.pydata.org/tutorial/regression.html
• http://www.cookbook-r.com/
• http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
• https://python-graph-gallery.com/100-calling-a-color-with-seaborn/
• https://www.data-to-viz.com/
• https://coolors.co/
