Standard Deviation, Standardization and Outliers


EXTENSION MATERIAL

2 Standard deviation, standardization and outliers
Try this worksheet after you have completed section 2.6. You may find these
techniques useful in your project.

Standard deviation
The standard deviation is the statistic most commonly used to represent the dispersion
of a set of data. It measures the spread of the data in the set from the mean. In your
project you may wish to show the moderator that you know how to calculate this by
hand rather than relying on your GDC.

If you do the calculation by hand, you can check it is correct using your GDC. You
don’t want to lose marks because of errors in simple mathematical processes.

EXAMPLE 1
Follow the steps below to calculate the standard deviation of this set of data:
12 10 8 12 11 10 12 8 12 10
Step 1 The standard deviation is also called the ‘root mean squared deviation’.
To calculate the standard deviation, first calculate the mean of the set of data.
Step 2 To measure the deviation from the mean, subtract the mean from each
number in the list.
Step 3 Square each of the numbers you are left with after step 2.
Step 4 Calculate the mean of this new set of squared numbers.
Step 5 Find the square root of your answer to step 4.

Answer
Step 1 The mean, $\bar{x}$, is
$\frac{12 + 10 + 8 + 12 + 11 + 10 + 12 + 8 + 12 + 10}{10} = 105 \div 10 = 10.5$

You can set out the data values and calculations in a table like this.

            Step 2            Step 3
Number, x   $(x - \bar{x})$   $(x - \bar{x})^2$
12          1.5               2.25
10          −0.5              0.25
8           −2.5              6.25
12          1.5               2.25
11          0.5               0.25
10          −0.5              0.25
12          1.5               2.25
8           −2.5              6.25
12          1.5               2.25
10          −0.5              0.25

Step 4 The mean of $(x - \bar{x})^2$ is
$\frac{2.25 + 0.25 + 6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25 + 2.25 + 0.25}{10} = 22.5 \div 10 = 2.25$
Step 5 Standard deviation = $\sqrt{2.25} = 1.5$
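
The five steps can also be checked with a short script. This is a minimal Python sketch and not part of the original worksheet; the variable names are illustrative only.

```python
import math

data = [12, 10, 8, 12, 11, 10, 12, 8, 12, 10]

# Step 1: mean of the data
mean = sum(data) / len(data)

# Step 2: deviation of each value from the mean
deviations = [x - mean for x in data]

# Step 3: square each deviation
squared = [d ** 2 for d in deviations]

# Step 4: mean of the squared deviations
mean_squared = sum(squared) / len(squared)

# Step 5: square root of the result of step 4
sd = math.sqrt(mean_squared)

print(mean, mean_squared, sd)  # prints: 10.5 2.25 1.5
```

The lists `deviations` and `squared` correspond to the Step 2 and Step 3 columns of the table.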

© Oxford University Press 2012: this may be reproduced for class use solely for the purchaser’s institute Extension worksheet 1

Exercise 1
1 Calculate the standard deviation of this data set:
21 15 12 15 12 18 12 15 18 12

2 Which set of numbers is more spread out: the set in the example or the set in
question 1?

Algebraic method for calculating standard deviation
Let the original data values be
$x_1,\; x_2,\; x_3,\; x_4,\; \ldots,\; x_{n-1},\; x_n$

There are n numbers in the list.


Step 1 Calculate the mean of these numbers, which is denoted as $\bar{x}$.
Step 2 Subtract the mean from each number in the list to give:
$x_1 - \bar{x},\; x_2 - \bar{x},\; x_3 - \bar{x},\; \ldots,\; x_{n-1} - \bar{x},\; x_n - \bar{x}$

Step 3 Some of these numbers will be positive and others negative, so square each
number in the list so that none of the numbers is negative.
$(x_1 - \bar{x})^2,\; (x_2 - \bar{x})^2,\; (x_3 - \bar{x})^2,\; \ldots,\; (x_{n-1} - \bar{x})^2,\; (x_n - \bar{x})^2$

Step 4 Find the sum of this new list. This is written as
$\sum_{k=1}^{n} (x_k - \bar{x})^2$
Then find the mean, by dividing the sum by the number of data points in the list (n).
This is written as
$\frac{\sum_{k=1}^{n} (x_k - \bar{x})^2}{n}$
Step 5 Take the square root.
Standard deviation = $\sqrt{\frac{\sum_{k=1}^{n} (x_k - \bar{x})^2}{n}}$

Alternative method for calculating the standard deviation
Step 1 Square each number in your original list.
Step 2 Find the mean of these numbers (mean of the squares).
Step 3 Find the mean of the original set of numbers and square it
(square of the mean).
Step 4 Subtract the square of the mean from the mean of the squares
[answer to step 2 – answer to step 3].
Step 5 Take the square root of your answer.
The formula for this is: Standard deviation = $\sqrt{\frac{\sum x_k^2}{n} - (\bar{x})^2}$
Use this method to calculate the standard deviation of the data sets above.
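
As a rough check that the two methods agree, here is a minimal Python sketch. The function names `sd_from_deviations` and `sd_from_squares` are made up for this illustration and do not come from the worksheet.

```python
import math

def sd_from_deviations(values):
    """Algebraic method: sqrt(sum((x - mean)^2) / n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_from_squares(values):
    """Alternative method: sqrt(mean of the squares - square of the mean)."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x ** 2 for x in values) / n
    return math.sqrt(mean_of_squares - mean ** 2)

example_1 = [12, 10, 8, 12, 11, 10, 12, 8, 12, 10]
print(sd_from_deviations(example_1))  # prints: 1.5
print(sd_from_squares(example_1))     # prints: 1.5
```

For other data sets the two results may differ in the last decimal places because of floating-point rounding, so compare them to a sensible number of significant figures.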

Exercise 2
1 Calculate the mean and the standard deviation of each set of data
a 5 6 7 8 9 10 11
b 65 66 67 68 69 70 71
c 50 60 70 80 90 100 110
d 7.5 7.6 7.7 7.8 7.9 8.0 8.1

2 Do you notice anything interesting about the way in which the means and the
standard deviations change?


Calculating the standard deviation from a frequency table
EXAMPLE 2
Calculate the mean and the standard deviation of the data in the frequency table:
x 1 2 3 4 5 6 7
f 6 2 0 5 6 4 9

Answer
Using the shorter method:
x       f    x × f   $x^2$   $f \times x^2$
1       6    6       1       6
2       2    4       4       8
3       0    0       9       0
4       5    20      16      80
5       6    30      25      150
6       4    24      36      144
7       9    63      49      441
Total   32   147             829

$\bar{x} = 147 \div 32 = 4.59375$
$\frac{\sum f x_k^2}{n} = 829 \div 32 = 25.90625$, where $n = \sum f = 32$

Standard deviation = $\sqrt{25.90625 - 4.59375^2} = 2.19$ (3 sf)
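
The frequency-table calculation can be reproduced with a few lines of Python. This is only a sketch of the shorter method applied to the table in Example 2; the variable names are illustrative.

```python
import math

x_values = [1, 2, 3, 4, 5, 6, 7]
frequencies = [6, 2, 0, 5, 6, 4, 9]

# Column totals from the table: sum of f, sum of x*f, sum of f*x^2
n = sum(frequencies)
sum_fx = sum(f * x for x, f in zip(x_values, frequencies))
sum_fx2 = sum(f * x ** 2 for x, f in zip(x_values, frequencies))

mean = sum_fx / n
# Mean of the squares minus the square of the mean, then square root
sd = math.sqrt(sum_fx2 / n - mean ** 2)

print(n, sum_fx, sum_fx2)   # prints: 32 147 829
print(mean, round(sd, 2))   # prints: 4.59375 2.19
```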

Exercise 3
1 Calculate the mean and the standard deviation of each set of data.
a
x 11 12 13 14 15 16 17
f 6 2 0 5 6 4 9

b x 1 2 3 4 5 6 7
f 2 5 10 14 20 34 29

c x 1 2 3 4 5 6 7
f 16 12 52 41 34 25 44

2 Find the means and standard deviations of these sets of data.
a $\sum x = 567$, $\sum (x - \bar{x})^2 = 150$, n = 15
b $\sum x = 167$, $\sum (x - \bar{x})^2 = 80$, n = 24
c $\sum x = 125$, $\sum (x - \bar{x})^2 = 85$, n = 12
d $\sum x = 652$, $\sum (x - \bar{x})^2 = 32$, n = 52


3 Find the means and standard deviations of these sets of data.
a $\sum x = 567$, $\sum x^2 = 22570$, n = 24
b $\sum x = 125$, $\sum x^2 = 2003$, n = 15
c $\sum x = 85$, $\sum x^2 = 998$, n = 30
d $\sum x = 445$, $\sum x^2 = 6250$, n = 54
e $\sum x = 250$, $\sum x^2 = 2257$, n = 29
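
When only totals are given, the mean and standard deviation follow directly from the two formulas. The Python sketch below uses hypothetical helper names and is illustrated with the totals from Example 1 (∑x = 105, ∑(x − x̄)² = 22.5, ∑x² = 1125, n = 10) rather than with the exercise data.

```python
import math

def from_squared_deviations(sum_x, sum_sq_dev, n):
    """Mean and sd when sum(x) and sum((x - mean)^2) are known."""
    return sum_x / n, math.sqrt(sum_sq_dev / n)

def from_sum_of_squares(sum_x, sum_x2, n):
    """Mean and sd when sum(x) and sum(x^2) are known."""
    mean = sum_x / n
    return mean, math.sqrt(sum_x2 / n - mean ** 2)

print(from_squared_deviations(105, 22.5, 10))  # prints: (10.5, 1.5)
print(from_sum_of_squares(105, 1125, 10))      # prints: (10.5, 1.5)
```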

Standardizing results
Statistics analyzes data in order to reach a conclusion. For example, you can analyze
examination results to compare a student’s performance in different exams.
Here are Bruce’s exam results.

Subject Mark
Mathematics 50
English 75
Physics 70
Biology 65
History 45
Drama 90
Music 85

To properly compare results across subjects you need to use the standardized score,
$z = \frac{x - \bar{x}}{sd}$
where
x = the student’s mark in that exam
$\bar{x}$ = mean mark for all the students in that exam
sd = standard deviation for all the students’ results in that exam.

Subject       Mark (x)   Mean mark ($\bar{x}$)   Standard deviation (sd)   Standardized score $z = \frac{x - \bar{x}}{sd}$
Mathematics   50         60                      20                        −0.5
English       75         50                      15                        1.67
Physics       70         65                      10                        0.5
Biology       65         70                      5                         −1
History       45         50                      15                        −0.33
Drama         90         95                      2                         −2.5
Music         85         72.5                    12.5                      1

This shows that Bruce did best in English, and then music. His worst result was for drama.
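
The standardized scores and the ranking can be reproduced with a short Python sketch. The dictionary layout and the name `z_score` are choices made for this illustration, not part of the worksheet.

```python
# Bruce's results: subject -> (mark, mean mark, standard deviation)
results = {
    "Mathematics": (50, 60, 20),
    "English": (75, 50, 15),
    "Physics": (70, 65, 10),
    "Biology": (65, 70, 5),
    "History": (45, 50, 15),
    "Drama": (90, 95, 2),
    "Music": (85, 72.5, 12.5),
}

def z_score(mark, mean, sd):
    """Standardized score: how many standard deviations the mark is from the mean."""
    return (mark - mean) / sd

z = {subject: z_score(*values) for subject, values in results.items()}

# Rank subjects from best to worst relative performance
ranked = sorted(z, key=z.get, reverse=True)
for subject in ranked:
    print(f"{subject:12s} {z[subject]:6.2f}")
```

Drama has the highest raw mark but the lowest standardized score, because the mean for that exam is high and the spread is small.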


Exercise 4
1 The table shows Fred’s exam marks, the mean mark for the year group and the
standard deviation.
Which subjects did Fred do best and worst in, compared to the rest of his year?

Subject       Mark   Mean mark   Standard deviation
Mathematics   75     60          20
English       75     50          15
Physics       75     65          18
Biology       85     70          12
Art           40     45          20
History       45     55          15
Drama         90     95          2
Music         85     72.5        12.5

2 Jenny wrote eight essays for her GCSE English. Her marks out of 20 were
12, 15, 13, 17, 10, 9, 15, 13
a Calculate Jenny’s mean mark.
b Calculate the standard deviation of Jenny’s marks.

Paula’s marks in the same eight essays were 19, 8, 4, 16, 12, 18, 5, 6.
c Calculate Paula’s mean mark.
d Calculate Paula’s standard deviation.
e Briefly compare the two sets of marks.

3 The masses of coffee, in grams, in ten jars of ‘Fine Blend’ labeled 200 g are
218, 222, 206, 212, 220, 200, 196, 222, 194, 212
a Work out the mean mass of coffee per jar.
b Work out the standard deviation for the mass of the coffee.

4 The class results for paper 1 and paper 2 in a mock exam were

(74, 59) (66, 54) (54, 56) (34, 22) (45, 63) (78, 71)
(90, 85) (49, 42) (72, 59) (45, 39) (54, 48) (34, 42)
(77, 63) (78, 45) (81, 85) (49, 37)
a Find the mean mark and standard deviation for each paper.
b A candidate scored 65 on paper 1 and 60 on paper 2. Which was his
better performance?


Outliers
An outlier is a value that is much smaller or much larger than the other values.
Normally a value is treated as an outlier if it is
smaller than ‘the lower quartile – 1.5 × the interquartile range’
or larger than ‘the upper quartile + 1.5 × the interquartile range’.

EXAMPLE 3
Fifty children do a jigsaw puzzle. The times, correct to the nearest minute, for
completing the puzzle are
Time (t min) Frequency
4 4
5 12
6 18
7 9
8 3
9 2
11 1

Construct a box and whisker graph and test for outliers.

Answer
Using the GDC, the lower quartile = 5, the median = 6, the upper quartile = 7.
The lowest value is 4 and the highest is 11.
The box and whisker graph is:
[Box and whisker graph drawn on a horizontal axis, x, from 0 to 12]

IQR = 7 – 5 = 2
5 – 1.5 × 2 = 5 – 3 = 2. The lowest value, 4, is not smaller than 2, so there are no outliers at the low end.
7 + 1.5 × 2 = 7 + 3 = 10. The highest value, 11, is larger than 10, so 11 is an outlier.
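
The outlier test can be sketched in Python as follows. Quartile conventions differ between calculators and software; the sketch assumes the `"exclusive"` method of the standard library's `statistics.quantiles`, which gives the same quartiles as those quoted above for this frequency table. The names `lower_limit` and `upper_limit` are illustrative.

```python
import statistics

# Frequency table from Example 3: time (min) -> frequency
times = {4: 4, 5: 12, 6: 18, 7: 9, 8: 3, 9: 2, 11: 1}

# Expand the table into a list of individual times
data = [t for t, f in times.items() for _ in range(f)]

# Quartiles (assumed convention: "exclusive" method)
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# Limits beyond which a value is treated as an outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = sorted({t for t in data if t < lower_limit or t > upper_limit})

print(q1, median, q3)                      # prints: 5.0 6.0 7.0
print(lower_limit, upper_limit, outliers)  # prints: 2.0 10.0 [11]
```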

Exercise 5
1 The temperatures in °C recorded each day at noon in April in Rotterdam were
8 9 7 6 8 10 11 12 9 12
13 8 10 11 13 9 13 14 11 9
12 10 9 12 11 15 12 15 23 26
a Draw a box and whisker graph to represent this information.
b Test for outliers.


2 The table shows the numbers of daisies in patches of area 25 cm².

Number of daisies Frequency


6 10
7 17
8 18
9 25
10 29
11 7
12 5
13 2
14 1

a Draw a box and whisker graph to represent this information.


b Test for outliers.

3 The numbers of words in the sentences in the first chapter of a book are

Number of words Frequency


1 1
4 2
5 2
6 5
7 6
8 17
12 25
15 12
16 18
17 5
20 2
21 1
23 2
30 1

a Draw a box and whisker graph to represent this information.


b Test for outliers.


Chapter 2 extension worked solutions


Exercise 1
1 Mean ($\bar{x}$) = sum of data ÷ number of data items

Number (x)    $x - \bar{x}$   $(x - \bar{x})^2$
21            6               36
15            0               0
12            −3              9
15            0               0
12            −3              9
18            3               9
12            −3              9
15            0               0
18            3               9
12            −3              9
Total = 150                   Total = 90

$\bar{x}$ = 150 ÷ 10 = 15
Mean of $(x - \bar{x})^2$ = 90 ÷ 10 = 9
Standard deviation = $\sqrt{9}$ = 3
2 The set in question 1 is more spread out: its standard deviation is 3, compared with 1.5 for the set in the example.

Exercise 2
1 a Mean = (5 + 6 + 7 + 8 + 9 + 10 + 11) ÷ 7 = 56 ÷ 7 = 8
$\sum (x - \bar{x})^2 = 9 + 4 + 1 + 0 + 1 + 4 + 9 = 28$
$\sum (x - \bar{x})^2 \div 7 = 4$
sd = $\sqrt{4}$ = 2
b 68, 2 c 80, 20 d 7.8, 0.2
2

Exercise 3
1 a 14.594, 2.192 b 5.307, 1.540 c 4.411, 1.813

2 a 37.8, 3.162 b 6.833, 1.826 c $10\frac{5}{12}$, 2.661 d 12.538, 0.784

3 a 23.625, 19.552 b 8.333, 8.006 c 2.833, 5.024 d 8.241, 6.916 e 8.621, 1.874

Exercise 4
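1 z-scores (in the order of the table) = 0.75, 1.67, 0.56, 1.25, –0.25, –0.67, –2.5, 1
Fred did best in English and worst in drama, compared to the rest of his year.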

2 a 13 b 2.5
c 11 d 5.68

3 a 210.2 g b 10.7 g

4 a Paper 1: mean = 61.25, sd = 17.79. Paper 2: mean = 54.375, sd = 17.06.
b Paper 1: z = 0.211. Paper 2: z = 0.329. Paper 2 was the better performance.


Exercise 5
1 a [Graph: vertical axis f from 0 to 10, horizontal axis x from 0 to 30]

b 9 – 1.5 × 4 = 3. Therefore no outliers at the low end.


13 + 1.5 × 4 = 19. Therefore 23 and 26 are both outliers.

2 a [Graph: vertical axis f from 0 to 10, horizontal axis x from 0 to 20]

b 8 – 1.5 × 2 = 5. Therefore no outliers at the low end.


10 + 1.5 × 2 = 13. Therefore 14 is an outlier.

3 a [Graph: vertical axis f from 0 to 10, horizontal axis x from 0 to 30]

b 8 – 1.5 × 8 = −4. So, no outliers at the low end.


16 + 1.5 × 8 = 28. So, 30 is an outlier.
