Pattern of data

Part 1 – section 4

Lecturer: Le Hoai Long (Ph.D.)


1
lehoailong@hcmut.edu.vn
Center
• The center of a distribution is located at the
median of the distribution.
• This is the point where about half of the
observations are on either side.
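
As a rough illustration of the median as the center, the sketch below uses Python's standard `statistics` module on a made-up sample (the values are hypothetical, not course data) and counts how many observations fall on each side.

```python
from statistics import median

# Hypothetical sample (illustrative values only, not from the lecture).
strengths = [28, 31, 25, 35, 30, 29, 33]

center = median(strengths)

# Count the observations strictly below and strictly above the median.
below = sum(1 for x in strengths if x < center)
above = sum(1 for x in strengths if x > center)

print("median:", center)
print("observations below:", below, "above:", above)
```

With an odd number of distinct values, the same number of observations lies on each side of the median.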

Spread
• The spread of a distribution refers to the
variability of the data.
• If the observations cover a wide range, the
spread is larger. If the observations are
clustered around a single value, the spread is
smaller.
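
A minimal sketch of how spread might be quantified, assuming the range and the population standard deviation as the measures (the slides do not name a specific measure, and both data sets are hypothetical):

```python
from statistics import pstdev

# Two hypothetical data sets with the same center but different spread.
wide = [10, 20, 30, 40, 50]
tight = [28, 29, 30, 31, 32]

def value_range(data):
    """Range: largest observation minus smallest observation."""
    return max(data) - min(data)

for name, data in (("wide", wide), ("tight", tight)):
    # pstdev is the population standard deviation.
    print(name, "range:", value_range(data), "std dev:", round(pstdev(data), 2))
```

The data set whose observations cover the wider range also has the larger standard deviation.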

Shape
• The shape of a distribution is described by the
following characteristics.
– Symmetry
– Number of peaks. Distributions can have few
or many peaks.
• Distributions with one clear peak are called
unimodal, and distributions with two clear peaks
are called bimodal.

Shape
• And by the following characteristics.
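– Skewness. Distributions with most of their
observations on the left (toward lower values)
and a long tail to the right are said to be
skewed right; distributions with most of their
observations on the right (toward higher values)
and a long tail to the left are skewed left.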
– Uniform. When the observations in a set of
data are equally spread across the range of the
distribution, the distribution is called a uniform
distribution.
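
One way to put a number on skewness is the moment coefficient of skewness, which is positive for right-skewed data, negative for left-skewed data, and zero for perfectly symmetric data. The sketch below is only an illustration: the choice of this particular coefficient is an assumption (the slides describe skewness only visually), and the data are hypothetical.

```python
from statistics import mean

def skewness(data):
    """Moment coefficient of skewness: third central moment / (variance ** 1.5)."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # population variance
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data: most values are low, with a long tail toward high values.
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]
left_skewed = [-x for x in right_skewed]  # mirror image of the data above
symmetric = [1, 2, 3, 4, 5]

for name, data in (("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)):
    print(name, round(skewness(data), 3))
```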

Gap and outlier
• Gaps: areas of a distribution where there are no
observations.
• Outliers: extreme values that differ greatly from
the other observations in a distribution.
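
For integer-valued data, gaps can be found by listing the values inside the observed range that never occur. The sketch below assumes integer observations and uses made-up data; the helper name `gap_runs` is hypothetical.

```python
from collections import Counter

# Hypothetical integer-valued observations (illustrative only).
data = [1, 2, 2, 3, 3, 3, 4, 9, 9, 10, 25]

counts = Counter(data)
# Values inside the observed range that never occur.
empty = [v for v in range(min(data), max(data) + 1) if counts[v] == 0]

def gap_runs(values):
    """Group consecutive empty values into (first, last) runs."""
    runs = []
    for v in values:
        if runs and v == runs[-1][1] + 1:
            runs[-1][1] = v
        else:
            runs.append([v, v])
    return [tuple(r) for r in runs]

gaps = gap_runs(empty)
print("gaps:", gaps)
```

In this sample the value 25 sits alone beyond a long gap, which makes it a candidate outlier.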
Chart and graph
Dotplot
• A dotplot is made up of dots plotted on a graph.
– Each dot can represent a single observation or a
specified number of observations.
– The dots are stacked in a column over a category.
– If the categories are quantitative, the pattern of data
in a dotplot can be described in terms of symmetry
and skewness.
• Dotplots are used most often to plot frequency
counts within a small number of categories,
usually with small sets of data.
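
A dotplot can be approximated in plain text. In the sketch below each "o" stands for one observation; the plot is rotated so that each stack of dots runs sideways rather than upward. The function name `text_dotplot` and the data are hypothetical, and the sketch assumes integer categories.

```python
from collections import Counter

def text_dotplot(data):
    """Return one text line per integer value, with one 'o' per observation."""
    counts = Counter(data)
    lines = []
    for value in range(min(data), max(data) + 1):
        line = f"{value:>3} | " + "o" * counts[value]
        lines.append(line.rstrip())  # values with no observations show no dots
    return lines

# Hypothetical small data set.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 7]
for line in text_dotplot(sample):
    print(line)
```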

Dotplot
• In SPSS: Graphs => Legacy dialogs => Scatter/Dot

Chart and graph
Bar Charts
• A bar chart is made up of columns plotted on
a graph.
– The columns are positioned over a label that
represents a categorical variable.
– The height of the column indicates the size of the
group defined by the column label.

Chart and graph
Histograms
• Like a bar chart, a histogram is made up of
columns plotted on a graph. Usually, there is no
space between adjacent columns.
– The columns are positioned over a label that
represents a quantitative variable.
– The column label can be a single value or a range of
values.
– The height of the column indicates the size of the
group defined by the column label.
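
The column heights of a histogram are counts of observations per range of values. A minimal sketch, assuming equal-width bins that include their lower edge and exclude their upper edge (the bin settings and data are hypothetical):

```python
def histogram_counts(data, start, width, n_bins):
    """Count observations in n_bins equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the binned range
            counts[index] += 1
    return counts

# Hypothetical quantitative data.
values = [12, 15, 21, 22, 27, 28, 29, 33, 35, 41]
heights = histogram_counts(values, start=10, width=10, n_bins=4)

for i, h in enumerate(heights):
    low = 10 + i * 10
    print(f"{low}-{low + 9}: " + "#" * h)
```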

Bar chart and histogram
• In SPSS: Graphs => Legacy dialogs => Bar (or Histogram)

Chart and graph
Difference Between Bar Charts and Histograms
• With bar charts, each column represents a
group defined by a categorical variable; and
with histograms, each column represents a
group defined by a quantitative variable.
• It is always appropriate to talk about the
skewness of a histogram. And how about bar
charts?

Chart and graph
Stemplots
• A stemplot is used to display quantitative
data, generally from small data sets (50 or
fewer observations).
• The entries on the left are called stems, and
the entries on the right are called leaves.
• Stemplots usually do not include explicit
labels for the stems and leaves.
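
A stemplot can be built by splitting each observation into a stem and a leaf. The sketch below assumes non-negative integers, with the tens as the stem and the units digit as the leaf; the function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot lines: tens as the stem, units digit as the leaf."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    lines = []
    # Include stems with no leaves so that gaps stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical small data set.
scores = [12, 15, 21, 22, 27, 28, 29, 33, 35, 51]
for line in stemplot(scores):
    print(line)
```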
Stemplot (Stem and leaf)

Chart and graph
Boxplot Basics
• A boxplot splits the data set into quartiles. The body of
the boxplot consists of a "box" which goes from the
first quartile (Q1) to the third quartile (Q3).
• Within the box, a vertical line is drawn at Q2, the
median of the data set.
• Two horizontal lines, called whiskers, extend from the
front and back of the box. The front whisker goes from
Q1 to the smallest non-outlier in the data set, and the
back whisker goes from Q3 to the largest non-outlier.
• If the data set includes one or more outliers, they are
plotted separately as points on the chart.
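
The quantities a boxplot draws can be computed directly. The sketch below rests on two assumptions that the slides do not state: outliers are flagged with the common 1.5 x IQR rule, and quartiles use the "inclusive" method of Python's `statistics.quantiles`. Other software, including SPSS, may compute quartiles slightly differently. The data are hypothetical.

```python
from statistics import quantiles

# Hypothetical data with one unusually large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 30]

# Q1, Q2 (median), Q3 using the "inclusive" quartile method.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < low_fence or x > high_fence]
inside = [x for x in data if low_fence <= x <= high_fence]

# The whiskers end at the smallest and largest non-outliers.
whisker_low, whisker_high = min(inside), max(inside)

print("Q1, Q2, Q3:", q1, q2, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```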
Boxplot
• In SPSS: Graphs => Legacy dialogs => Boxplot

Chart and graph
Scatterplot
• A scatterplot is a graphic tool used to display
the relationship between two quantitative
variables.
• A scatterplot consists of an X axis (the
horizontal axis), a Y axis (the vertical axis), and
a series of dots.
• Each dot on the scatterplot represents one
observation from a data set.
Chart and graph
Scatterplot
• Scatterplots are used to analyze patterns in
bivariate data.
• These patterns are described in terms of
linearity, slope, and strength.
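
Slope and strength can be summarized numerically. The sketch below assumes a least-squares line for the slope and the Pearson correlation coefficient for the strength of a linear pattern; both choices are assumptions, as are the data. Linearity itself is still best judged by looking at the plot.

```python
from math import sqrt
from statistics import mean

def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson correlation) for paired data."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical bivariate data with a nearly linear, increasing pattern.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, r = slope_and_r(xs, ys)
print("slope:", round(slope, 2), "correlation:", round(r, 3))
```

A positive slope indicates an increasing pattern, and a correlation close to 1 or -1 indicates a strong linear pattern.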

Compare distributions
• Focus on four features:
– Center.
– Spread.
– Shape.
– Unusual features.

Table
• Alternatively, data can be presented in table
form
– One-way table
– Two-way table

Table
• A one-way table is the tabular equivalent of a bar
chart. Like a bar chart, a one-way table displays
categorical data in the form of frequency counts
and/or relative frequencies.
– Frequency tables: a one-way table that shows the
frequency count for each category of a categorical
variable.
– Relative frequency tables: a one-way table that shows
the relative frequency (the share of all observations)
for each category of a categorical variable.
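
A one-way frequency table and its relative-frequency counterpart can be sketched with `collections.Counter`. The category labels and observations below are hypothetical.

```python
from collections import Counter

# Hypothetical categorical observations (one label per contract).
contract_types = ["Civil", "Industrial", "Civil", "Civil",
                  "Road", "Industrial", "Civil", "Road"]

frequency = Counter(contract_types)
total = sum(frequency.values())

# Relative frequency: each count divided by the total number of observations.
relative = {category: count / total for category, count in frequency.items()}

for category in frequency:
    print(f"{category:<12}{frequency[category]:>3}{relative[category]:>8.2f}")
```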
Table
• A two-way table (also called a contingency
table) is a useful tool for examining
relationships between categorical variables.
The entries in the cells of a two-way table can
be frequency counts or relative frequencies,
just like those of a one-way table.
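
A two-way table counts observations for each pair of categories. The sketch below uses hypothetical records of (contractor, quality) pairs and builds the cell counts plus row and column totals.

```python
from collections import Counter

# Hypothetical records: (contractor, quality rating) for each contract.
records = [("A", "Good"), ("A", "Good"), ("A", "Poor"),
           ("B", "Good"), ("B", "Poor"), ("B", "Poor")]

cells = Counter(records)                       # count per (row, column) pair
row_totals = Counter(row for row, _ in records)
col_totals = Counter(col for _, col in records)

rows = sorted(row_totals)
cols = sorted(col_totals)

print(" " * 6 + "".join(f"{c:>6}" for c in cols) + f"{'Total':>7}")
for r in rows:
    counts = "".join(f"{cells[(r, c)]:>6}" for c in cols)
    print(f"{r:<6}{counts}{row_totals[r]:>7}")
```

Dividing each cell by `len(records)` would turn the counts into relative frequencies.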

Be careful,
Simpson’s paradox
• Simpson's paradox (or the Yule-Simpson
effect) is a paradox in which a correlation
present in different groups is reversed when
the groups are combined.
• It occurs when frequency data are hastily
given causal interpretations.
• Simpson's paradox disappears when causal
relations are brought into consideration
(Wikipedia).

Be careful,
Simpson’s paradox
• Consider the situation of two contractors in the table
below (each cell shows good-quality contracts / total number of contracts).
• Who is better? (Long N.D. 2010)

               Type of contract
               Civil           Industrial      Total
Contractor A   40/60 (66.7%)   13/15 (86.7%)   53/75 (70.7%)
Contractor B   5/8 (62.5%)     42/50 (84.0%)   47/58 (81.0%)
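
The reversal in the table above can be checked with a few lines of arithmetic. The sketch below uses the counts from the table; the variable and function names are only illustrative.

```python
# (good-quality contracts, total contracts), taken from the table above.
results = {
    "A": {"Civil": (40, 60), "Industrial": (13, 15)},
    "B": {"Civil": (5, 8), "Industrial": (42, 50)},
}

def rate(good, total):
    """Share of good-quality contracts."""
    return good / total

def overall(contractor):
    """Pooled rate over both contract types."""
    good = sum(g for g, _ in results[contractor].values())
    total = sum(t for _, t in results[contractor].values())
    return good / total

for contractor, by_type in results.items():
    parts = ", ".join(f"{kind}: {rate(*cell):.1%}" for kind, cell in by_type.items())
    print(f"Contractor {contractor}: {parts}, Total: {overall(contractor):.1%}")
```

Contractor A has the higher rate within each contract type, yet Contractor B has the higher pooled rate. This happens because most of B's contracts are industrial, the type with the higher rate for both contractors, while most of A's contracts are civil.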