Section 4.3 Measuring Variation Pt2

Stat 1010: standard deviation

Comment (shapes and centers)


For skewed distributions, the median is a better measure of center than the mean.
For skewed distributions, the mean is
pulled toward the tail (or the direction of
the skew), but the median is not.
The mean can be greatly affected by
outliers (extreme values) in the tail of a
distribution.
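
As a small illustration of the point above, the following Python sketch uses a made-up data set (the values are hypothetical, not from the course) to show how one extreme value in the right tail pulls the mean upward while the median barely moves.

```python
from statistics import mean, median

# Hypothetical data, invented for illustration only.
values = [2, 3, 3, 4, 5]
with_outlier = values + [40]  # add one extreme value in the right tail

# Mean and median without the extreme value.
print(mean(values), median(values))

# Mean and median with the extreme value: the mean is pulled toward the tail.
print(mean(with_outlier), median(with_outlier))
```

Comparing the two printed lines shows the mean moving much further than the median, which is why the median is usually preferred for skewed data.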

Mean vs. Median


For right-skewed data, Mean > Median.
Example (histogram of bird count): Mean = 3.48, Median = 2.88.

For left-skewed data, Mean < Median.
Example (histogram of points scored): Mean = 16.52, Median = 17.11.

Comment (boxplots and the IQR)


50% of the observations fall between the lower quartile (Q1) and the upper quartile (Q3).

[Figure: boxplot with the middle 50% of the data marked.]


Comment (boxplots and the IQR)


25% of the observations fall below the lower quartile (Q1).

[Figure: boxplot with the lowest 25% of the data marked.]

Comment (boxplots and the IQR)


25% of the observations fall above the upper quartile (Q3).
Or similarly, 75% of the observations fall below the upper quartile (Q3).

[Figure: boxplot with the lower 75% and the upper 25% of the data marked.]

Comment (boxplots and the IQR)


The range of the middle 50% of the data is called the interquartile range (IQR).

[Figure: boxplot with the middle 50% of the data marked.]

IQR = Q3 − Q1 = 15 − 5 = 10
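
The IQR calculation can be sketched in Python as follows. The data set here is hypothetical, chosen only so that the quartiles match the values on the slide (Q1 = 5, Q3 = 15). The `method="inclusive"` setting is one of several quartile conventions, and other conventions can give slightly different quartiles for the same data.

```python
from statistics import quantiles

# Hypothetical data, chosen so the quartiles come out as Q1 = 5 and Q3 = 15.
data = [0, 5, 10, 15, 20]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# The IQR is the distance between the upper and lower quartiles.
iqr = q3 - q1

print(q1, q3, iqr)
```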


Comment (boxplots and the IQR)


The IQR quantifies the spread of the middle 50% of the data. It is a measure of variation.

[Figure: boxplot with the middle 50% of the data marked.]

IQR = 15 − 5 = 10

Boxplot comparisons
Can be used to check for technical problems (e.g. gene expression values).

Usually, you want to see similar characteristics (center, shape, spread) for all side-by-side boxplots.

Boxplot comparisons
20 different slides.
Centers all around 0.
Some have more spread than others.
The IQR is larger for some slides (you can see this from the rectangles at the middle of the plot).


4.3 Measures of Variation (part 2)


How much variation is there in the data?

Look for the spread of the distribution.

What do we mean by spread?

Part 1: Range and the Quartiles
Part 2: Standard deviation (and Variance)

Limitations of measures of spread


The range only considers the most extreme values (min and max). The middle observations do not affect the range at all.

These distributions can have the same range:

[Figure: two distributions, labeled 1) and 2), with the same range.]
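
To make this limitation concrete, here is a minimal Python sketch with two invented data sets (hypothetical values, not the ones pictured on the slide). Both have the same minimum and maximum, so the range is identical, yet the observations in between are arranged very differently. The sketch also prints the standard deviation, the measure this section goes on to define, to show that it does distinguish the two.

```python
from statistics import stdev

# Two hypothetical data sets with the same min (0) and max (10).
clustered = [0, 5, 5, 5, 5, 5, 10]     # most values sit at the center
spread_out = [0, 0, 0, 5, 10, 10, 10]  # most values sit at the extremes


def value_range(values):
    """Range = max - min; only the two most extreme values matter."""
    return max(values) - min(values)


# The ranges are identical even though the data sets look very different.
print(value_range(clustered), value_range(spread_out))

# The standard deviation uses every observation and tells them apart.
print(stdev(clustered), stdev(spread_out))
```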

Limitations of measures of spread


The 5-number summary and the quartiles allow some of the middle numbers to contribute to the measure of spread (better).

But we often like to summarize the spread with a single value (not 5 numbers).

Can we find a measure of spread that allows EVERY observation to contribute to the measure AND is a single value?


Yes! The standard deviation.

Involves a computation that includes EVERY observation.

Provides a single summary value of the spread of a distribution.

Standard Deviation
Based on the deviation from the mean. This is a distance from the center.

What is a deviation?

$$(x_i - \bar{x})$$

where $x_i$ is one observed value (observation $i$) and $\bar{x}$ is the mean computed from ALL observations (a measure of center).

Golf Scores (n=6)


46, 44, 50, 43, 47, 52

$$\bar{x} = \frac{282}{6} = 47 \text{ strokes}$$

[Figure: the golf scores plotted on an axis from 40 to 55.]


Deviations (distances from the mean)


Graphically

[Figure: the golf scores on an axis from 40 to 55, with deviations from the mean marked (for example −1, +3, +5).]

Deviations (distances from the mean)


Numerically

Observed value − mean = deviation

43 − 47 = −4
44 − 47 = −3
46 − 47 = −1
47 − 47 = 0
50 − 47 = +3
52 − 47 = +5
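
The deviations for the golf scores can be reproduced with a few lines of Python. This is only a sketch of the arithmetic on the slide; the variable names are illustrative.

```python
# Golf scores from the example (n = 6).
scores = [46, 44, 50, 43, 47, 52]

# Mean of all observations: 282 / 6.
mean_score = sum(scores) / len(scores)

# Deviation = observed value - mean, one per observation.
deviations = [x - mean_score for x in scores]

print(mean_score)
print(deviations)

# The positive and negative deviations cancel out
# (up to floating-point rounding in general; exactly here).
print(sum(deviations))
```

The deviations are listed in the same order as the scores, so the signs show which observations fall below the mean and which fall above it.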

Standard Deviation
Is a measure of the average of all the deviations from the mean.

Or the average distance the observations are from the mean.

Larger standard deviation → more spread.
Smaller standard deviation → less spread.


Standard Deviation
Because the actual mean of the deviations is always zero, we instead focus on the mean of the squared deviations.

Mean of the deviations:

$$\frac{-4 + 5 + (-3) + 3 + (-1) + 0}{6} = \frac{0}{6} = 0$$

Also, we divide by n − 1 rather than n for technical reasons.

And we take the square root so we can work in our original units (not squared units).

Standard Deviation
The letter s usually represents the standard deviation.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

$$s = \sqrt{\frac{\sum (\text{deviations from the mean})^2}{\text{total number of observations} - 1}}$$

Standard Deviation
Golf score example.

$$s = \sqrt{\frac{16 + 9 + 1 + 25 + 9 + 0}{6 - 1}} = \sqrt{\frac{60}{5}} = \sqrt{12} \approx 3.5 \text{ strokes}$$
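
The same calculation can be written as a short Python function. This is a minimal sketch of the formula for s, not a replacement for a statistics library. The function name `sample_std` is made up for illustration, and the result is cross-checked against `statistics.stdev` from the standard library, which also divides by n − 1.

```python
import math
from statistics import stdev


def sample_std(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n                                  # mean of all observations
    squared_deviations = [(x - x_bar) ** 2 for x in values]  # (x_i - x_bar)^2
    return math.sqrt(sum(squared_deviations) / (n - 1))


golf_scores = [46, 44, 50, 43, 47, 52]
s = sample_std(golf_scores)

print(round(s, 1))                          # rounded to one decimal, as on the slide
print(math.isclose(s, stdev(golf_scores)))  # agrees with the standard library
```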


Technical note: Variance


A commonly discussed measure of spread.

Almost the average squared deviation.

Denoted as $s^2$ (where s is the standard deviation).

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Variance
Golf score example.

Notice that the units of the variance are strokes².

$$s^2 = \frac{16 + 9 + 1 + 25 + 9 + 0}{5} = \frac{60}{5} = 12 \text{ strokes}^2$$
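
As a quick check of the relationship between the two measures, the sketch below uses Python's standard `statistics` module on the golf scores. Both `variance` and `stdev` there are the sample versions, which divide by n − 1.

```python
import math
from statistics import stdev, variance

golf_scores = [46, 44, 50, 43, 47, 52]

s2 = variance(golf_scores)  # sample variance, in squared units (strokes^2)
s = stdev(golf_scores)      # sample standard deviation, in strokes

print(s2)
print(math.isclose(s ** 2, s2))  # squaring s recovers the variance
```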

Center & Spread Choice


(General guidelines)
For symmetrical distributions (like the bell-shaped normal distribution):
Mean (center) and standard deviation (spread).
Both of these measures are affected by outliers.

For skewed distributions:
Median (center) and 5-number summary (spread).
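
To see why these guidelines are reasonable, the following sketch compares the two pairs of summaries on invented data (hypothetical values, for illustration only). Adding one extreme value changes the mean and standard deviation a great deal, while the median and the IQR move only slightly. The helper name `summarize` and the `method="inclusive"` quartile convention are choices made for this example.

```python
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Return (mean, standard deviation, median, IQR) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return mean(values), stdev(values), median(values), q3 - q1


# Hypothetical, roughly symmetric data, and a copy with one extreme value added.
symmetric = [8, 9, 10, 10, 10, 11, 12]
skewed = symmetric + [60]

print(summarize(symmetric))
print(summarize(skewed))
```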
