Common Visualization Idioms

• Reusable Scatter Plot
• Bar Chart: Vertical & Horizontal
• Charts: Pie, Line & Area
Visualization Idioms
• Visualization idiom: "a specific sequence of
data enrichment and enhancement
transformations, visualization mappings and
rendering transformations that produce an
abstract display of a scientific data set".
• The visualization process is a series of transformations that converts raw
simulated data into a displayable image.
Introduction
• Data visualization provides an important suite of tools for
gaining a qualitative understanding of data. It is helpful
when exploring a dataset, and it can help with identifying
patterns, corrupt data, outliers, and much more.
• With a little domain knowledge, data visualizations can be
used to express and identify key relationships in plots and
charts that are more helpful to yourself and stakeholders
than measures of association or significance.
Data Visualization
• Data visualization is the graphical representation of
information and data.
• By using visual elements like charts, graphs, and maps, data
visualization techniques provide an accessible way to see
and understand trends, outliers, and patterns in data.
• In the world of Big Data, data visualization tools and technologies
are crucial for analyzing massive amounts of information and making
data-driven decisions.
• It is used in many areas, such as:
– Modeling complex events.
– Visualizing phenomena that cannot be observed directly, such as weather
patterns, medical conditions, or mathematical relationships.
Benefits of Good Data Visualization
• Our eyes are drawn to colors and patterns: we can quickly tell red from blue
and a square from a circle. Our culture is visual, including everything from
art and advertisements to TV and movies.
• Data visualization is another form of visual art that grabs our interest and
keeps our focus on the message.
• When we look at a chart, we can quickly identify the trends and outliers present
in the dataset.
• The basic uses of data visualization are as follows:
– It is a powerful technique for exploring data
with presentable and interpretable results.
– In the data mining process, it acts as a primary step in pre-processing.
– It supports data cleaning by revealing incorrect, corrupted,
or missing values.
– It helps to construct and select variables, that is, to determine
which variables to include in the analysis and which to discard.
– In data reduction, it plays a crucial role when combining categories.
Different Types of Analysis for Data Visualization
• There are three main types of analysis for data visualization:
– Univariate analysis: we analyze the properties of a single feature.
– Bivariate analysis: we compare the data between exactly 2 features.
– Multivariate analysis: we compare more than 2 variables.
Univariate Analysis Techniques for Data Visualization
• Distribution Plot
• It is one of the best univariate plots for understanding the
distribution of data.
• It is often used to analyze how an independent (input) variable
relates to the target (output) variable.
• It combines a probability density function (PDF) and a
histogram in a single plot.
• Implementation:
• The distribution plot is available in the Seaborn package.
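A minimal sketch of a distribution plot, assuming made-up 'Age' values for two classes. The slides name Seaborn; to keep the example dependency-light, this sketch swaps in plain Matplotlib with a hand-written Gaussian kernel density estimate, and the bandwidth of 4.0 is an arbitrary assumption.

```python
import math
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

random.seed(0)
# Made-up ages for two hypothetical classes (not the dataset from the slides).
groups = {
    "class 1": ("tab:blue", [random.gauss(52, 10) for _ in range(150)]),
    "class 2": ("tab:orange", [random.gauss(54, 10) for _ in range(80)]),
}

def gaussian_kde(samples, grid, bandwidth):
    """Smoothed density: the average of Gaussian bumps centred on each sample."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples) / norm
        for g in grid
    ]

grid = [20 + 0.5 * i for i in range(141)]  # ages 20 to 90
fig, ax = plt.subplots()
areas = {}
for label, (color, ages) in groups.items():
    # density=True rescales the bars so that their total area is 1.
    counts, edges, _ = ax.hist(ages, bins=15, density=True, alpha=0.4,
                               color=color, label=label)
    ax.plot(grid, gaussian_kde(ages, grid, bandwidth=4.0), color=color)
    areas[label] = sum(c * (edges[i + 1] - edges[i]) for i, c in enumerate(counts))
ax.set_xlabel("Age")
ax.set_ylabel("Density")
ax.legend()
```

Calling `fig.savefig(...)` or, with an interactive backend, `plt.show()` displays the result: the bars are the histograms and the curves are the density estimates.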
Some conclusions inferred from the distribution plot:
• The distribution plot was created on the feature ‘Age’ (input
variable), with different colors for the Survival status (output
variable), as it is the class to be predicted.
• There is a large overlapping area between the PDFs of the
different classes.
• In this plot, the sharp block-like structures are the
histogram, and the smoothed curve is the
probability density function (PDF).
Box and Whisker Plot
• This plot can be used to obtain more statistical details about the data.
• The straight lines extending from the box towards the minimum and
maximum are called whiskers.
• Points that lie outside the whiskers are considered outliers.
• The box shows the 25th, 50th, and 75th percentiles (the first quartile,
the median, and the third quartile).
• From a box plot we can also read the interquartile range (IQR),
which contains the middle 50% of the data. This gives a clear idea
of which points are outliers in the dataset.
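The quantities a box plot draws can be computed directly. The sketch below assumes made-up values and the common convention that whiskers reach at most 1.5 × IQR beyond the box; the slides do not state which whisker rule their plot uses, so that factor is an assumption.

```python
import statistics

def box_plot_summary(values, whisker_factor=1.5):
    """Quartiles, IQR and outliers as a box plot would show them."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical data with one value far from the rest.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Here the value 100 lies above the upper fence and is reported as the only outlier.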
Implementation:
1. Boxplot is available in the Seaborn library.
2. Here x is the dependent variable and y is the independent
variable. These box plots come under univariate analysis, which
means that we are exploring the data with only one variable.
3. Here we check the impact of a feature named “axil_nodes” on
the class named “Survival status”, not the relationship between
any two independent features.
From the box and whisker plot we can conclude the following observations:
• We can see how much data is present in the 1st quartile,
how many points are outliers, etc.
• For class 1, little or no data is present between
the median and the 1st quartile.
• There are more outliers for class 1 in
the feature named axil_nodes.
Bivariate Analysis Techniques for Data Visualization
• Line Plot
• This plot appears in every nook and corner
of analysis between 2 variables.
• A line plot connects the values of a series of
data points with straight lines.
• The plot may seem very simple, but it has many
applications, not only in machine learning but in many
other areas.
• Implementation:
• The line plot is available in the Matplotlib package.
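A minimal Matplotlib sketch of a line plot. The visitor counts are made up for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5, 6, 7]
visitors = [120, 135, 128, 150, 170, 165, 180]  # made-up values

fig, ax = plt.subplots()
# plot() connects consecutive (x, y) points with straight line segments.
(line,) = ax.plot(days, visitors, marker="o")
ax.set_xlabel("Day")
ax.set_ylabel("Visitors")
ax.set_title("Line plot of a made-up series")
```

The markers show the individual data points; the segments between them are what suggests the trend.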
Bar Plot
• This is one of the most widely used plots. We see it not
just in data analysis but wherever trends are analyzed
in many fields.
• Though it may seem simple, it is powerful for analyzing
data such as weekly sales figures, revenue from a
product, the number of visitors to a site on each day of a
week, etc.
• Implementation:
• The bar plot is available in the Matplotlib package.
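A minimal Matplotlib sketch of the same made-up data drawn as a vertical and as a horizontal bar chart. The day names and counts are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
visitors = [120, 135, 128, 150, 170, 165, 180]  # made-up values

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))
vertical = ax_v.bar(days, visitors)     # bar height encodes the value
horizontal = ax_h.barh(days, visitors)  # bar width encodes the value
ax_v.set_ylabel("Visitors")
ax_h.set_xlabel("Visitors")
fig.tight_layout()
```

The horizontal form is often easier to read when the category labels are long.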
• From the bar plot we can conclude the following observations:
1. We can visualize the data in a clear plot and convey the details
straightforwardly to others.
2. This plot is simple and clear, but it is not used very frequently
in data science applications.
Scatter Plot
• It is one of the most commonly used plots for visualizing
simple data in machine learning and data science.
• Each point in the dataset is plotted with respect to 2 or 3
features (columns).
• Scatter plots are available in both 2-D and 3-D. The 2-D
scatter plot is the common one, where we primarily try to
find patterns, clusters, and separability in the data.
• Implementation:
• The scatter plot is available in the Matplotlib package.
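A minimal Matplotlib sketch of a 2-D scatter plot in which each point is colored by its class label. The rows, feature names, and class names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up rows of (feature_1, feature_2, class label).
rows = [
    (1.0, 2.1, "class 1"), (1.5, 1.8, "class 1"), (2.0, 2.5, "class 1"),
    (5.0, 6.2, "class 2"), (5.5, 5.9, "class 2"), (6.1, 6.8, "class 2"),
]
colors = {"class 1": "tab:blue", "class 2": "tab:orange"}

fig, ax = plt.subplots()
for label, color in colors.items():
    xs = [x for x, _, lab in rows if lab == label]
    ys = [y for _, y, lab in rows if lab == label]
    # One scatter call per class gives each class its own color and legend entry.
    ax.scatter(xs, ys, color=color, label=label)
ax.set_xlabel("feature_1")
ax.set_ylabel("feature_2")
ax.legend()
```

With data like this the two classes form visibly separate clusters, which is the kind of separability the bullets above refer to.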
• From the scatter plot we can conclude the following observations:
– The colors are assigned to the data points according to the
target column, i.e. the class label given in the dataset.
Fundamental principles: how
Expressiveness:
• the visual encoding should express all of, and only,
the information in the dataset attributes

Effectiveness:
• the importance of the attribute should match the
salience of the channel.
• Use the strongest and most accurate channels for the
most important interpretation tasks (data)
Recall How version 1:
Channel Expressiveness and Effectiveness

Credit: T. Munzner, 2014


Roadmap so far
Part 1: principles
• Data
• Perception
• Visual encoding
• (interaction to come later)
Part 2: Methods (How v2!)
• What defines the design space?
• Taxonomy of design considerations
• How many views?
• How to reduce
A Framework for Analysis
(Munzner)

Task IDIOM

Data IDIOM

Design IDIOM
• A visualisation idiom is a distinct approach to
creating and manipulating visual representations.

• Data: the types and hierarchical salience of the information to represent
• Design: the visual encoding and organisation choices
• Interaction: the methods to man
From data model to type

Sepal and petal length for three species of iris [Fisher 1936]
Possible views – scatter plot – why?
• Distribution and correlations between variables
• Easy with 2/3 dimensions
• Harder with more!!

IAT 814 | Design Choices 1


We can remodel the data
• Add abstraction
• Change representation


Scatter plots show correlations
• Perfect positive: r = 1
• Strong positive: r = 0.97
• Positive: r = 0.8
• Strong negative: r = -0.98
• No correlation: r = 0.16
• Nonlinear correlation
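The r values on this slide are Pearson correlation coefficients. A small sketch of how r is computed, using made-up points, also shows why a clearly nonlinear pattern can still give r = 0:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(xs, [2, 4, 6, 8, 10])          # points on a rising line
perfect_negative = pearson_r(xs, [10, 8, 6, 4, 2])          # points on a falling line
nonlinear = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # y = x squared
```

The last case is a perfect parabola, yet r is 0, because r only measures linear association. This is one reason to look at the scatter plot rather than at r alone.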


However
• Scatter plots can be difficult to understand
• What alternatives are there?
• More generally, what kinds of techniques are best for what kinds of problems?
Common 2D design idioms
Design choices for tabular data
Tabular data are key-value vectors.
• Key/Attribute: property of the data that can be used to index into
(sort by/look up) the set (independent variable)
– Types: N (nominal), O (ordinal)
• Value: the actual value of an individual item
– Types: N (nominal), O (ordinal), Q (quantitative)
Credit: T. Munzner, 2014
If all you want is a single precise value ….
The first question: Table or graph?
• Will the data be used to look up and compare individual values, or
will the data need to be precise? If so, you should display it in
a table.
• Is the message contained in the shape of the data, that is, in trends,
patterns, exceptions, or comparisons that involve more than a
few values? If so, you should display it in a graph.
• NOTE: You can use both (next time and beyond).


Key-value question defines idiom choice (1) [Munzner]
• 2 values
• 1 key and 1 value
• 2 keys and 1 value
[Figure: scatterplot of Height (ft) against Circumference (ft)]
Compared to what defines choice (2) [Few]
1. Time Series
• Quantitative across equal intervals (of time)
2. Ranking
• Sequenced by size of attribute value
3. Part-Whole
• Portion that each value represents of some whole
4. Deviation
• Differ from reference (baseline)
5. Distribution
6. Correlation
• How one value affects another
7. Nominal
• Simple categorical
The Power of Space
Categorical (What/where)
• Planar position
• Hue
• Shape
• Stipple/texture
Ordered/Quantitative (How Much)
• Position common scale
• Position unaligned scale
• Length
• Tilt/angle
• Area
• Curvature
• Lightness
• Saturation
• Texture density
Relational/Same category (Grouping)
• Containment (2D)
• Connection
• Similarity (other channels)
• Proximity (position)
We encode data spatially to
• Express (show) values
• Arrange data groupings
• Separate/distinguish regions by categorical key
• Order groups by ordinal key
• Align for visual comparison along reference value
Spatial Channels
• Values: express
• Regions: separate, order, align (1D, 2D)
• Given: use (geographic; scalar fields)
Spatial Layouts
• Rectilinear
• Parallel
• Radial
• Spacefilling
• Dense
Design choices [Munzner]
Single view methods
• All information integrated in one view
• basic visual encodings
• spatial position
• color
• other channels
• pixel-oriented techniques
• visual layering
• global compositing
• item-level stacking
• glyphs
Expressing values
• Scatterplots
• Axes encode 2D values
• Color for key/category
• Additional value in size
Scatterplot idiom
[Figure: scatterplot of Height (ft) against Circumference (ft)]
Data transformations can enhance value

Log transformations show strong correlation between size and price
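A small sketch of why a log transformation can reveal a correlation. The size and price values are made up and follow an assumed power law (price = 5 × size³), which is not taken from the slides; on raw axes the relation is curved, while on log-log axes it becomes a straight line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

sizes = [1, 2, 4, 8, 16, 32, 64]        # hypothetical sizes
prices = [5 * s ** 3 for s in sizes]    # assumed power-law relation

r_raw = pearson_r(sizes, prices)
r_log = pearson_r([math.log(s) for s in sizes], [math.log(p) for p in prices])
print(f"raw r = {r_raw:.3f}, log-log r = {r_log:.3f}")
```

After taking logs of both variables the correlation is essentially 1, whereas on the raw values it is noticeably lower.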


Organising data
• List alignment
• Can be ordered along the list axis or by value
• Nominal and ranking (if axis ordered)
• Emphasises individual values
Bar chart idiom
• Categorical attributes match well with spatial regions
• Separate, order, align
• Can be hard to find patterns in the data “shape”

Credit: T. Munzner, 2014


Stacked bars
• Multidimensional tables with 2 keys/attributes
• Typically use colour or texture for 2nd
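A minimal Matplotlib sketch of stacked bars for a table with two keys (region and product) and one value. The region names, product names, and numbers are made up; colour distinguishes the second key.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

regions = ["North", "South", "East"]  # first key: position along the axis
product_a = [5, 7, 3]                 # made-up values for the second key's first level
product_b = [4, 2, 6]                 # made-up values for the second key's second level

fig, ax = plt.subplots()
bars_a = ax.bar(regions, product_a, color="tab:blue", label="Product A")
# bottom= starts each Product B bar where the Product A bar ends.
bars_b = ax.bar(regions, product_b, bottom=product_a,
                color="tab:orange", label="Product B")
ax.set_ylabel("Units")
ax.legend()
```

Only the bottom segment shares a common baseline, so the upper segments are harder to compare across bars than the totals are.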
Few’s correlation bar graph
Paired bar graph with trend lines (Few)
Streamgraphs
• Stacked time series
• Show shape of the data and part-whole relationships
• De-emphasise individual values
Line charts and dotplots
• Position to express value according to key
• Line charts use angle/shape to show trends
• Frequently time
Line Chart idiom
• Line charts, dotplots
• Good for ordered data


Mind the Gap - An Economic Chart Remake
What’s wrong?
The semantics of mark types
• Bar charts, line charts and dotplots all encode a
quantitative value against a key attribute in a rectilinear
layout.
• Often use additional encoding for other categories
• Lines also use connection marks to show inter-item
relations
• Only use lines for ordered data!
Which to use when
• Bars and bubbles emphasise comparison and
association of individual values
• Lines (explicit and implied) emphasise trends
Lines and bars
Lines imply connections
• “the more male someone is the taller he is”
Use when there is some ordered progression between the
discrete categories on the x-axis
• “12 year olds are taller than 10 year olds”
Tufte’s Sparklines
• Give a hint of the trend, but don’t show the actual axes
and scales.
• Good for dashboards and small spaces
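A text-only sketch of the sparkline idea, assuming Unicode block characters stand in for the tiny line chart. The function name and sample values are hypothetical. Each value is scaled to one of eight block heights, so only the shape of the trend survives, with no axes or scales.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest

def sparkline(values):
    """Map each value to a block whose height reflects its position in the range."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return BLOCKS[0] * len(values)  # flat series: all blocks at the lowest height
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - low) / span * top)] for v in values)

print(sparkline([120, 135, 128, 150, 170, 165, 180]))
```

Because the scaling is relative to each series' own minimum and maximum, two sparklines can be compared for shape but not for magnitude.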


Lines: Aspect ratio matters!
• Our ability to judge angles is more accurate at exact
diagonals than at arbitrary directions
• We can judge small deviations from 45 or 90 degrees (e.g. 43°) but cannot see
the difference between 20 and 22 degrees
• Multiscale banking to 45 degrees: an algorithm that computes informative
aspect ratios to maximise line segments close to the diagonal
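A sketch of the basic idea behind banking to 45 degrees. This is the simpler median-absolute-slope variant, not the multiscale algorithm named above: it picks the width/height ratio at which the median line segment is drawn at 45 degrees. The function name and data are hypothetical, and the sketch assumes strictly increasing x values and a non-constant series.

```python
import statistics

def bank_to_45(xs, ys):
    """Width/height aspect ratio that puts the median absolute slope at 45 degrees."""
    points = list(zip(xs, ys))
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # On screen a data slope s is drawn with slope s * (x_range / y_range) * (height / width).
    # Setting the median of that to 1 and solving for width / height gives:
    return statistics.median(slopes) * x_range / y_range

aspect = bank_to_45([0, 1, 2, 3, 4], [0, 2, 1, 3, 2])
print(aspect)
```

For this made-up series the result is 2, meaning the plot should be twice as wide as it is tall.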
Matrix alignment
• Heatmap
• 2 keys, 1 value
• Good for dense encoding
• Re-ordering for clusters
Parallel layouts
• Parallel coordinates
• Many key attributes
• Different correlations
• Value vector is a line
Parallel coordinates
16K items, 5 keys
13 items, 7 keys
What about Pies?
Radial layouts
• Use polar coordinates
• 1 categorical key, 1 quantitative value
Radial idioms
Idiom: Star plot
• What (data): Table: 1 quant value, 1 categorical attribute
• How (encode): Length coding along point marks at 1D spatial position
along axis + 1D spatial position for aligned axes
Idiom: Pie chart
• What (data): Table: 1 quant value, 1 categorical attribute
• How (encode): Area and angle
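A small sketch of the pie chart encoding: each value becomes a wedge whose angle (and therefore area) is proportional to its share of the total. The category names and values are made up.

```python
def pie_angles(values):
    """Wedge angle in degrees for each category, proportional to its share."""
    total = sum(values.values())
    return {name: 360 * value / total for name, value in values.items()}

shares = {"Blue": 30, "Red": 50, "Green": 20}  # hypothetical values
angles = pie_angles(shares)

# The comparison a reader has to make by eye from two wedges:
blue_relative_to_red = 100 * shares["Blue"] / shares["Red"]
print(angles, blue_relative_to_red)
```

Here Blue is 60% of Red, a ratio that has to be judged from a 108 degree wedge against a 180 degree wedge, whereas a bar chart would show the same ratio as two lengths on a common scale.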
Percent Blue relative to Red?
Few’s criteria for an effective visualization
• Clearly indicates the nature of the relationship
• Represents the quantities accurately
• Makes it easy to compare the quantities
• Makes it easy to see the ranked order of values
• Makes obvious how people should use the information
Clearly indicates the nature of the relationship?
Represents quantities accurately?
Makes it easy to compare quantities?
Makes it easy to see ranked values?
Makes it easy to see how people should use information?
A better way
[Figure: three bar charts of Percent Water for body, brain, and blood, drawn with different y-axis scales]
Bad
Better
Even Better*
[Figure: four bar charts titled National Spending to Deal with Drug Addiction, showing the responses Too Little, About Right, and Too Much, first overall and then split by Male and Female]
