
Statistics

Measures of
location and spread
Twitter: @Owen134866

www.mathsfreeresourcelibrary.com
Prior Knowledge Check

1) State whether each variable is qualitative or quantitative:
a) Car colour: Qualitative
b) Miles travelled by a cyclist: Quantitative
c) Favourite pet: Qualitative
d) Number of siblings: Quantitative

2) State whether each of these variables is discrete or continuous:
e) Number of pets owned: Discrete
f) Distance walked by hikers: Continuous
g) Fuel consumption of lorries: Continuous
h) Number of peas in a pod: Discrete
i) Times taken by a group of athletes to run 1500m: Continuous

3) Calculate the 3 averages and range for the data set below:

Peas in a pod    3    4    5     6     7
Frequency        4    7    11    18    6

Mean = 5.33
Median = 6
Mode = 6
Range = 4
Teachings for
Exercise 2A and 2B
Measures of location and spread
A measure of location is a single value
which is used to represent a set of
data. Examples include the mean,
median and mode

These can also be known as ‘measures of central tendency’.

The mode or modal class is the value or class with the highest frequency.

The median is the middle value when the data is put into ascending (or descending) order.

The mean is calculated using the formula:

$\bar{x} = \dfrac{\sum x}{n}$

→ $\bar{x}$ is the mean, read as ‘x-bar’
→ $\sum x$ is the sum of the data values ($x$ represents each number in the data set)
→ $n$ is the number of ‘bits’ of data

You need to start using ‘proper’ notation!
2A/B
The mean of a sample of 25 observations is 6.4. The mean of a second sample of 30 observations is 7.2. Calculate the mean of all 55 observations.

$\dfrac{6.4 + 7.2}{2}$

Why is this calculation wrong?

→ Each data set has a different number of observations
→ The mean will be ‘weighted’ towards the data set with more observations
Sample 1: $6.4 = \dfrac{\sum x}{25}$, so multiplying by 25 gives $\sum x = 160$

Sample 2: $7.2 = \dfrac{\sum x}{30}$, so multiplying by 30 gives $\sum x = 216$

So the total sum of the data is 376.

→ There were 55 ‘bits’ of data in total

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{376}{55} = 6.84$
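The combined-mean working above can be sketched in a few lines of Python. This is only an illustration: the function name `combined_mean` and the list-of-pairs input are assumptions made for this example, not part of the original slides.

```python
def combined_mean(groups):
    """Return the mean of several samples pooled together.

    `groups` is a list of (mean, n) pairs, one pair per sample.
    """
    # Multiply each mean by its sample size to recover that sample's sum of x.
    total_sum = sum(mean * n for mean, n in groups)
    total_n = sum(n for _, n in groups)
    return total_sum / total_n


samples = [(6.4, 25), (7.2, 30)]
pooled = combined_mean(samples)
print(round(pooled, 2))
```

Weighting each mean by its sample size is what stops the smaller sample from counting as much as the larger one.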
You will need to decide which is the best average to use depending on the situation…

Mode
→ The mode is used for qualitative data, or where the data has a single mode or two modes (bi-modal)

Median
→ The median is used for quantitative data, and is usually used when the data has some extreme values (since these will not affect the median too much)

Mean
→ The mean is used for quantitative data and uses all the data in a set. It therefore gives a true measure, but can be skewed by extreme values
For data given in a frequency table, you can calculate the mean by using the formula below:

$\bar{x} = \dfrac{\sum fx}{\sum f}$

→ $\sum fx$ is the sum of the products of the data values and their frequencies
→ $\sum f$ is the sum of the frequencies
Rebecca records the shirt collar size, x, of the male students in her year. The results are shown in the table.

Collar size, x    Number of students, f    Cumulative frequency    fx
15                3                        3                       45
15.5              17                       20                      263.5
16                29                       49                      464
16.5              34                       83                      561
17                12                       95                      204

$\sum f = 95$    $\sum fx = 1537.5$

For the data, calculate:
a) The mode
b) The median
c) The mean
d) Explain why a shirt manufacturer might use the mode for setting their production quota

a) Mode = 16.5 (the collar size with the highest frequency, 34)

b) $\dfrac{95 + 1}{2} = 48$, so the median is the 48th value
→ Add the frequencies up until you get beyond this value
→ The cumulative frequency first passes 48 at collar size 16 (where it reaches 49), so the median must be 16

c) $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{1537.5}{95} = 16.2$

d) The mode is more useful in this case as it tells the manufacturer which shirt size it needs to produce the most of
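The three averages for the collar-size table can be checked with a short Python sketch. The function name `frequency_table_averages` and its arguments are hypothetical, chosen only for this illustration, and the code assumes the values are listed in ascending order.

```python
def frequency_table_averages(values, freqs):
    """Mode, median and mean for discrete data in a frequency table.

    `values` must be in ascending order; `freqs` holds the matching counts.
    """
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    mode = values[freqs.index(max(freqs))]  # first value with the top frequency

    def nth_value(position):
        # Walk the cumulative frequency until it reaches `position`.
        running = 0
        for x, f in zip(values, freqs):
            running += f
            if running >= position:
                return x

    # The median sits at position (n + 1) / 2; when n is even this falls
    # between two positions, so average the two neighbouring values.
    lower, upper = (total + 1) // 2, (total + 2) // 2
    median = (nth_value(lower) + nth_value(upper)) / 2
    return mode, median, mean


collar_sizes = [15, 15.5, 16, 16.5, 17]
students = [3, 17, 29, 34, 12]
mode, median, mean = frequency_table_averages(collar_sizes, students)
print(mode, median, round(mean, 1))
```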
The length, x mm, to the nearest mm, of a random sample of pine cones is measured. The data is shown in the table below.

Cone length (mm)    Frequency, f    Midpoint, x    fx       Cumulative frequency
30-31               2               30.5           61       2
32-33               25              32.5           812.5    27
34-36               30              35             1050     57
37-39               13              38             494      70

$\sum f = 70$    $\sum fx = 2417.5$

a) Write down the modal class
b) Estimate the mean
c) Find the median class

a) The modal class is 34-36

b) To calculate the mean from a grouped table you need to use the midpoint of each group

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{2417.5}{70} = 34.5$

c) $\dfrac{70 + 1}{2} = 35.5$, so the median is the 35.5th value
→ Add the frequencies up until you get beyond this value
→ So the median class is 34-36
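As a rough check of the grouped-mean estimate, the midpoint method can be written as a small Python function. The name `estimate_grouped_mean` and the (low, high) pair format are assumptions for this sketch.

```python
def estimate_grouped_mean(classes, freqs):
    """Estimate the mean of grouped data using class midpoints.

    `classes` is a list of (low, high) class limits as written in the table.
    """
    midpoints = [(low + high) / 2 for low, high in classes]
    return sum(f * m for m, f in zip(midpoints, freqs)) / sum(freqs)


cone_classes = [(30, 31), (32, 33), (34, 36), (37, 39)]
cone_freqs = [2, 25, 30, 13]
cone_mean = estimate_grouped_mean(cone_classes, cone_freqs)
print(round(cone_mean, 1))
```

The result is only an estimate because every value in a class is treated as if it sat at the midpoint.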
Teachings for
Exercise 2C
Measures of location and spread
You need to be able to calculate quartiles and percentiles of a data set

The median describes the middle of a set of data, splitting the data into two halves with 50% in each.

→ You can also calculate quartiles and percentiles, which are also both measures of location

→ The median is also known as the second quartile

Lowest value | 25% | Lower Quartile $Q_1$ | 25% | Median $Q_2$ | 25% | Upper Quartile $Q_3$ | 25% | Highest value

→ When combined with the median, the lower and upper quartiles split the data into 4 equal sections

→ The 10th percentile is the value with 10% of the data lower than it

→ The 90th percentile is the value with 90% of the data lower than it

→ So if your test score was in the 90th percentile, that is a good thing!

2C
→ The way the quartiles are calculated depends on whether the data is discrete or continuous…

Discrete data
Lower Quartile: divide $n$ by 4
→ If whole, the LQ is halfway between this data point and the one above
→ If not whole, round up and take that data point
Upper Quartile: find $\dfrac{3n}{4}$
→ If whole, the UQ is halfway between this data point and the one above
→ If not whole, round up and take that data point

Continuous data
Lower Quartile: divide $n$ by 4 and take that data point
Upper Quartile: find $\dfrac{3n}{4}$ and take that data point
From the large data set, the daily maximum gust (knots) during the first 20 days of June 2015 is recorded in Hurn. The data is shown below:

14 15 17 17 18
18 19 19 22 22
23 23 23 24 25
26 27 28 36 39

Find the median and quartiles for this data.

Note that we treat this as discrete data since we have all the actual values!

Median: there are 20 values, so $Q_2$ is the $\dfrac{20 + 1}{2} = 10.5$th value
→ This is halfway between the 10th value (22) and the 11th value (23)
$Q_2 = 22.5$

For $Q_1$: $\dfrac{n}{4} = \dfrac{20}{4} = 5$
→ This is whole, so take the 5.5th value, halfway between the 5th value (18) and the 6th value (18)
$Q_1 = 18$

For $Q_3$: $\dfrac{3n}{4} = \dfrac{60}{4} = 15$
→ This is whole, so take the 15.5th value, halfway between the 15th value (25) and the 16th value (26)
$Q_3 = 25.5$
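The discrete rule used for the gust data (a whole position means go halfway to the next value, otherwise round up) can be sketched in Python as below. The function name `discrete_quartile` is hypothetical, and the sketch applies the same rule to the median with position $\dfrac{2n}{4}$, which gives the same answer as the $\dfrac{n+1}{2}$ approach used on the slide.

```python
import math


def discrete_quartile(sorted_data, k):
    """Return Q1, Q2 or Q3 (k = 1, 2 or 3) for sorted discrete data."""
    n = len(sorted_data)
    if (k * n) % 4 == 0:
        # Whole position: halfway between this data point and the one above.
        position = k * n // 4
        return (sorted_data[position - 1] + sorted_data[position]) / 2
    # Not whole: round up and take that data point (positions are 1-based).
    position = math.ceil(k * n / 4)
    return sorted_data[position - 1]


gusts = [14, 15, 17, 17, 18, 18, 19, 19, 22, 22,
         23, 23, 23, 24, 25, 26, 27, 28, 36, 39]
q1, q2, q3 = (discrete_quartile(gusts, k) for k in (1, 2, 3))
print(q1, q2, q3)
```

Statistical software often uses different quartile conventions, so results from a library may not match this textbook rule exactly.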
The length of time (to the nearest minute) spent on the internet each evening by a group of students is shown in the table below.

Time spent on internet (mins)    Frequency    Cumulative frequency
30-31                            2            2
32-33                            25           27
34-36                            30           57
37-39                            13           70

a) Find an estimate for the upper quartile
b) Find an estimate for the 10th percentile

a) → The data is continuous

$\dfrac{3n}{4} = \dfrac{210}{4} = 52.5$, so the upper quartile is the 52.5th value

→ Find the group that this is in, and use linear interpolation to estimate the upper quartile…
→ You can use this formula:

$LB + \left( \dfrac{PL}{GF} \times CW \right)$

where $LB$ is the lower boundary of the group, $PL$ is the number of places into the group, $GF$ is the group frequency and $CW$ is the classwidth of the group.

→ The value is 25.5 places into the group 34-36 (it is the 52.5th value, and we have already had 27 before the group started)
→ Remember for continuous data, you will need to use 33.5 and 36.5 as the class boundaries

$Q_3 = 33.5 + \left( \dfrac{25.5}{30} \times 3 \right) = 36.05$
b) → The data is continuous
→ The 10th percentile is calculated as follows

$\dfrac{10n}{100} = \dfrac{700}{100} = 7$, so the 10th percentile is the 7th value

→ Find the group that this is in, and use linear interpolation to estimate it…
→ The value is 5 places into the group 32-33 (it is the 7th value, and we have already had 2 before the group started)
→ Remember for continuous data, you will need to use 31.5 and 33.5 as the class boundaries

$P_{10} = 31.5 + \left( \dfrac{5}{25} \times 2 \right) = 31.9$

The notation $P_{10}$ is usually used for the 10th percentile.
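Both interpolation answers can be reproduced with a direct Python translation of the $LB + \left( \dfrac{PL}{GF} \times CW \right)$ formula. The function name `interpolate` and its parameter names are illustrative assumptions.

```python
def interpolate(lower_boundary, places_into_group, group_frequency, class_width):
    """LB + (PL / GF) x CW, the linear interpolation formula from the slides."""
    return lower_boundary + (places_into_group / group_frequency) * class_width


# Upper quartile: the 52.5th value, with 27 values before the 34-36 group.
upper_quartile = interpolate(33.5, 52.5 - 27, 30, 3)
# 10th percentile: the 7th value, with 2 values before the 32-33 group.
tenth_percentile = interpolate(31.5, 7 - 2, 25, 2)
print(round(upper_quartile, 2), round(tenth_percentile, 1))
```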
Teachings for
Exercise 2D
Measures of location and spread
A measure of spread is a value which indicates how spread out the data set is. Examples include the range and interquartile range.

→ The range is the difference between the largest and smallest values, and measures the spread of all the data

→ The interquartile range is the difference between the upper and lower quartiles, and measures the spread of the middle 50% of the data

→ The interpercentile range is the difference between 2 given percentiles

2D
The table below shows the masses (tonnes) of 120 African elephants.

Mass, m (t)    Frequency    Cumulative frequency
4.0 to 4.5     13           13
4.5 to 5.0     23           36
5.0 to 5.5     31           67
5.5 to 6.0     34           101
6.0 to 6.5     19           120

Find estimates for:
a) The range
b) The interquartile range
c) The 10th to 90th percentile range

a) The range is the biggest possible value subtract the smallest possible value

$6.5 - 4.0 = 2.5$

b) For $Q_1$: $\dfrac{n}{4} = \dfrac{120}{4} = 30$, so $Q_1$ is the 30th value. Now use linear interpolation:

$Q_1 = LB + \left( \dfrac{PL}{GF} \times CW \right) = 4.5 + \left( \dfrac{17}{23} \times 0.5 \right) = 4.87$

For $Q_3$: $\dfrac{3n}{4} = \dfrac{360}{4} = 90$, so $Q_3$ is the 90th value. Now use linear interpolation:

$Q_3 = 5.5 + \left( \dfrac{23}{34} \times 0.5 \right) = 5.84$

Interquartile range: $5.84 - 4.87 = 0.97$

c) For $P_{10}$: $\dfrac{10n}{100} = \dfrac{1200}{100} = 12$, so $P_{10}$ is the 12th value. Now use linear interpolation:

$P_{10} = 4.0 + \left( \dfrac{12}{13} \times 0.5 \right) = 4.46$

For $P_{90}$: $\dfrac{90n}{100} = \dfrac{10800}{100} = 108$, so $P_{90}$ is the 108th value. Now use linear interpolation:

$P_{90} = 6.0 + \left( \dfrac{7}{19} \times 0.5 \right) = 6.18$

10th to 90th percentile range: $6.18 - 4.46 = 1.72$
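The elephant calculations all follow the same pattern: find the group containing the required position, then interpolate. A possible Python sketch of that pattern is below. The function name `grouped_quantile` is hypothetical, and the class boundaries 4.0, 4.5, …, 6.5 are taken from the worked calculations rather than from a stated table.

```python
def grouped_quantile(boundaries, freqs, position):
    """Estimate the value at a 1-based `position` by linear interpolation.

    `boundaries` lists the class boundaries in order, so class i runs from
    boundaries[i] to boundaries[i + 1] and has frequency freqs[i].
    """
    cumulative = 0
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= position:
            places_in = position - cumulative  # places into this group
            width = boundaries[i + 1] - boundaries[i]
            return boundaries[i] + (places_in / f) * width
        cumulative += f
    raise ValueError("position is beyond the end of the data")


bounds = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
freqs = [13, 23, 31, 34, 19]
n = sum(freqs)
q1 = grouped_quantile(bounds, freqs, n / 4)
q3 = grouped_quantile(bounds, freqs, 3 * n / 4)
p10 = grouped_quantile(bounds, freqs, 10 * n / 100)
p90 = grouped_quantile(bounds, freqs, 90 * n / 100)
print(round(q3 - q1, 2), round(p90 - p10, 2))
```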
Teachings for
Exercise 2E
Measures of location and spread
The variance and standard deviation can also be used to analyse a set of data

The variance and standard deviation are both measures of spread, and involve the fact that each data point deviates from the mean by the amount:

$x_i - \bar{x}$

Where $x_i$ is a specific data point and $\bar{x}$ is the mean of the data as a whole

For example…

$x_i$    $x_i - \bar{x}$
3        -4.6
6        -1.6
7        -0.6
9        1.4
13       5.4

$\text{Mean} = \dfrac{\sum x_i}{n} = \dfrac{38}{5} = 7.6$

Now we can fill in the second column, with each data point subtracting the mean…

Note that the sum of the differences from the mean will always be 0

2E
The variance is defined as:

“The average of the squared distances from the mean”

So the distances of each data point from the mean are all squared, and divided by how many there are.

→ This gives the formula:

$\text{Variance} = \dfrac{\sum (x_i - \bar{x})^2}{n}$

The squared distance is used because, as we just saw, the unsquared differences always sum to 0

This formula is equivalent (a demonstration of why is at the end of this section):

$\text{Variance} = \dfrac{\sum x_i^2}{n} - \left( \dfrac{\sum x_i}{n} \right)^2$

“Mean of the squares subtract the square of the mean”

This formula is also equivalent:

$\text{Variance} = \dfrac{S_{xx}}{n}$

The notation $S_{xx}$ is short for $\sum (x_i - \bar{x})^2$ or $\sum x_i^2 - \dfrac{\left( \sum x_i \right)^2}{n}$
The standard deviation is the square root of the variance.

The symbol $\sigma$ (lower case sigma) is used to represent standard deviation

Therefore $\sigma^2$ is usually used to represent the variance

$\sigma^2 = \dfrac{\sum (x_i - \bar{x})^2}{n}$

$\sigma^2 = \dfrac{\sum x_i^2}{n} - \left( \dfrac{\sum x_i}{n} \right)^2$

$\sigma^2 = \dfrac{S_{xx}}{n}$

This is what it looks like in the formula booklet!

→ The square root of any of these gives the Standard Deviation

$\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}}$

$\sigma = \sqrt{\dfrac{\sum x_i^2}{n} - \left( \dfrac{\sum x_i}{n} \right)^2}$

$\sigma = \sqrt{\dfrac{S_{xx}}{n}}$
→ The Standard Deviation tells you the range from the mean which contains around 68% of the data (if the data is normally distributed; you will learn about this at a later date)

For example, suppose 100 students have a mean height of 150cm and a standard deviation of 10cm.

→ Around 68 of the students are within one Standard Deviation of the mean (140cm to 160cm)
→ Around 95 of the students are within two Standard Deviations of the mean (130cm to 170cm)
The marks gained in a test by seven randomly selected students are:

3 4 6 2 8 8 5

Find the variance and standard deviation of the marks of the seven students.

→ The middle of the 3 formulae above is most commonly used when you have the ‘raw’ data

$x$      3    4    6    2    8    8    5     $\sum x = 36$
$x^2$    9    16   36   4    64   64   25    $\sum x^2 = 218$

Sub in values (we need $\sum x$ and $\sum x^2$):

$\sigma^2 = \dfrac{\sum x^2}{n} - \left( \dfrac{\sum x}{n} \right)^2 = \dfrac{218}{7} - \left( \dfrac{36}{7} \right)^2 = 4.69$

Square root:

$\sigma = 2.17$

In order, the marks are 2 3 4 5 6 8 8, with Mean = 5.14.

→ One standard deviation either side of the mean runs from $5.14 - 2.17 = 2.97$ to $5.14 + 2.17 = 7.31$
→ So out of the original 7 values, 4 are within 1 standard deviation of the mean
→ This is only 57% rather than 68%, because the data size is very small! Note that 68% is not always a possible percentage.
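The test-marks working can be checked with a short Python sketch of the ‘mean of the squares subtract the square of the mean’ formula. The function name `variance` is chosen for this illustration only; it divides by $n$, matching the slides.

```python
import math


def variance(data):
    """Variance as the mean of the squares minus the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2


marks = [3, 4, 6, 2, 8, 8, 5]
var = variance(marks)
sd = math.sqrt(var)
mean = sum(marks) / len(marks)
# Count how many marks lie within one standard deviation of the mean.
within_one_sd = [x for x in marks if abs(x - mean) <= sd]
print(round(var, 2), round(sd, 2), len(within_one_sd))
```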
Shamsa records the time spent out of school during the lunch hour to the nearest minute, x, of the female students in her year. The results are shown in the table.

Time (mins), x    Frequency, f    fx      fx²
35                3               105     3675
36                17              612     22032
37                29              1073    39701
38                34              1292    49096

$\sum f = 83$    $\sum fx = 3082$    $\sum fx^2 = 114504$

Calculate the standard deviation of the time spent out of school.

→ For tabled data, we need to use a modified formula…

$\sigma = \sqrt{\dfrac{\sum x^2}{n} - \left( \dfrac{\sum x}{n} \right)^2}$ becomes $\sigma = \sqrt{\dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2}$

Sub in values (we need to use the table to calculate these!):

$\sigma = \sqrt{\dfrac{114504}{83} - \left( \dfrac{3082}{83} \right)^2} = 0.861$
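The modified formula for tabled data translates into Python as follows. The function name `frequency_table_sd` is an assumption for this sketch; for a grouped table the `values` argument would hold the class midpoints.

```python
import math


def frequency_table_sd(values, freqs):
    """Standard deviation from a frequency table using sums of f, fx and fx^2."""
    sum_f = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return math.sqrt(sum_fx2 / sum_f - (sum_fx / sum_f) ** 2)


times = [35, 36, 37, 38]
time_freqs = [3, 17, 29, 34]
sd_times = frequency_table_sd(times, time_freqs)
print(round(sd_times, 3))
```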
Andy recorded the length, in minutes, of each telephone call he made for a month. The data is summarized in the table below.

For a grouped table we need to use the midpoints (as in the previous section)

Length of call (mins), midpoint x    Frequency, f    fx       fx²
2.5                                  4               10       25
7.5                                  15              112.5    843.75
12.5                                 5               62.5     781.25
17.5                                 2               35       612.5
40                                   0               0        0
65                                   1               65       4225

$\sum f = 27$    $\sum fx = 285$    $\sum fx^2 = 6487.5$

Calculate an estimate of the standard deviation of the length of the phone calls

Sub in values (we need to use the table to calculate these!):

$\sigma = \sqrt{\dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2} = \sqrt{\dfrac{6487.5}{27} - \left( \dfrac{285}{27} \right)^2} = 11.35$
Why are the two expressions below equivalent?

$\dfrac{\sum (x_i - \bar{x})^2}{n}$ and $\dfrac{\sum x_i^2}{n} - \left( \dfrac{\sum x_i}{n} \right)^2$

→ We are going to take the expression on the left, and show it can be written as the expression on the right…

$\dfrac{\sum (x_i - \bar{x})^2}{n}$

Square the bracket:

$= \dfrac{\sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2)}{n}$

Now the bracket is all separate terms, we can write it as 3 ‘sums’:

$= \dfrac{\sum x_i^2 - \sum 2x_i\bar{x} + \sum \bar{x}^2}{n}$

Note that $\bar{x}^2$ is ‘x-bar squared’, ie) the mean squared. It is not the mean of the squared values, although it kind of looks like that!

To show this works, take the data 3, 4, 8 (Mean = 5):

$x_i$    $x_i^2$    $2x_i\bar{x}$    $\bar{x}^2$    $x_i^2 - 2x_i\bar{x} + \bar{x}^2$
3        9          30               25             4
4        16         40               25             1
8        64         80               25             9
Sum      89         150              75             14

→ You can see that combining the calculations before summing gives the same answer as summing each part first, and combining after: $89 - 150 + 75 = 14$

Rewrite, and find ways to replace terms 2 and 3:

$= \dfrac{\sum x_i^2}{n} - \dfrac{\sum 2x_i\bar{x}}{n} + \dfrac{\sum \bar{x}^2}{n}$

Second term: since the mean is constant, and 2 is constant, we can take these out as a factor (every data point will be multiplied by $2\bar{x}$, so we may as well add them all up first, and multiply the total afterwards). Note that $\dfrac{\sum x_i}{n}$ is the mean (the sum of all data points, divided by the number of data points):

$\dfrac{\sum 2x_i\bar{x}}{n} = \dfrac{2\bar{x}\sum x_i}{n} = 2\bar{x}^2$

The second term has now been replaced:

$= \dfrac{\sum x_i^2}{n} - 2\bar{x}^2 + \dfrac{\sum \bar{x}^2}{n}$

Third term: we will be adding up $n$ copies of the mean squared (remember the table from before?). So the sum of the mean squareds is the number of bits of data, multiplied by the mean squared. Then divide the numerator and denominator by $n$:

$\dfrac{\sum \bar{x}^2}{n} = \dfrac{n\bar{x}^2}{n} = \bar{x}^2$

The third term can now be replaced:

$= \dfrac{\sum x_i^2}{n} - 2\bar{x}^2 + \bar{x}^2$

Group together like terms:

$= \dfrac{\sum x_i^2}{n} - \bar{x}^2$

Remember that $\bar{x}$ is the mean, ie the sum of all the data points, divided by $n$:

$= \dfrac{\sum x_i^2}{n} - \left( \dfrac{\sum x_i}{n} \right)^2$
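The equivalence shown above can also be checked numerically. The sketch below computes the variance both ways for the small data set 3, 4, 8 from the demonstration table; the two function names are hypothetical.

```python
def variance_by_definition(data):
    """Average of the squared distances from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def variance_by_shortcut(data):
    """Mean of the squares subtract the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2


sample = [3, 4, 8]
by_definition = variance_by_definition(sample)
by_shortcut = variance_by_shortcut(sample)
print(by_definition, by_shortcut)
```

A numerical check on one data set is not a proof, but it is a quick way to catch a sign error when working through the algebra. The two printed values may differ in the last decimal places because of floating-point rounding.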
Teachings for
Exercise 2F
Measures of location and spread
Coding can be used to make a set of values simpler to work with

If numbers in a data set are particularly large, they can all be altered in the same way to make them smaller

→ This will however affect the measures of location and dispersion that we have been calculating, and you need to be aware of these effects…

Imagine we have the following data on people’s heights (cm)

145 170 168 166 151 147 150 172

Mean = 158.625    Range = 27

If all the values were multiplied by 2, what would happen to the measures above?
→ Mean and range would double

If all the values above had 20 added to them, what would happen to the measures above?
→ Mean would increase by 20, but the range would stay the same

If you change a set of data by adding or subtracting an amount, this will not affect the range, or any other measures of spread, such as the IQR or standard deviation

2F
A scientist measures the temperature, x °C, at five different points in a nuclear reactor. Her results are given below:

332, 355, 306, 317, 340

a) Use the coding $y = \dfrac{x - 300}{10}$ to code this data
b) Calculate the mean and standard deviation of the coded data
c) Use your answer to b) to calculate the mean and standard deviation of the original data.

a) To code the data, take each starting value, subtract 300 from it, and divide the answer by 10

$x$    $y$    $y^2$
332    3.2    10.24
355    5.5    30.25
306    0.6    0.36
317    1.7    2.89
340    4      16

$\sum y = 15$    $\sum y^2 = 59.74$

b) Now we can calculate the mean and standard deviation of this new information…

$\bar{y} = \dfrac{\sum y}{n} = \dfrac{15}{5} = 3$

$\sigma_y = \sqrt{\dfrac{\sum y_i^2}{n} - \left( \dfrac{\sum y_i}{n} \right)^2} = \sqrt{\dfrac{59.74}{5} - \left( \dfrac{15}{5} \right)^2} = 1.72$

c) Original mean
→ The original data had 300 subtracted, and then was divided by 10
→ We need to reverse this, so multiply by 10, and then add 300

$3 \times 10 + 300 = 330$ ℃

Original standard deviation
→ The original data had 300 subtracted, and then was divided by 10
→ The subtracting 300 will not have affected the standard deviation, so we only need to multiply by 10

$1.72 \times 10 = 17.2$ ℃
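The reactor example can be replayed in Python to confirm that decoding the coded statistics gives the same result as working on the original temperatures directly. The helper name `mean_and_sd` is an assumption for this sketch.

```python
import math


def mean_and_sd(data):
    """Return the mean and standard deviation (dividing by n) of a list."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum(x * x for x in data) / n - mean ** 2)
    return mean, sd


temperatures = [332, 355, 306, 317, 340]
coded = [(x - 300) / 10 for x in temperatures]  # y = (x - 300) / 10

coded_mean, coded_sd = mean_and_sd(coded)
decoded_mean = coded_mean * 10 + 300  # reverse both steps for the mean
decoded_sd = coded_sd * 10            # only the division affects the spread

direct_mean, direct_sd = mean_and_sd(temperatures)
print(round(decoded_mean, 1), round(decoded_sd, 1))
print(round(direct_mean, 1), round(direct_sd, 1))
```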
From the large data set, data on the maximum gust, g knots, is recorded in Leuchars during May and June 2015.

The data was coded using $h = \dfrac{g - 5}{10}$ and the following statistics found:

$n = 61$    $\bar{h} = 2$    $S_{hh} = 43.58$

Calculate the mean and standard deviation of the maximum gust in knots.

Mean
→ The mean has had 5 subtracted and then been divided by 10
→ You can write this as a formula:

$\bar{h} = \dfrac{\bar{g} - 5}{10}$

We know the mean of h:    $2 = \dfrac{\bar{g} - 5}{10}$
Multiply by 10:           $20 = \bar{g} - 5$
Add 5:                    $25 = \bar{g}$

This is a better way to show your workings!

Standard deviation
We need to find the standard deviation of h first!
→ Use the formula above…

$\sigma_h = \sqrt{\dfrac{S_{hh}}{n}} = \sqrt{\dfrac{43.58}{61}} = 0.845…$

As the subtraction has not affected the standard deviation, we only need to undo the division by 10

→ Like with the previous example, we can write this as a formula

$\sigma_h = \dfrac{\sigma_g}{10}$

Sub in values:     $0.845 = \dfrac{\sigma_g}{10}$
Multiply by 10:    $8.45 = \sigma_g$

So $\bar{g} = 25$ knots and $\sigma_g = 8.45$ knots.
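Undoing a coding of the form $h = \dfrac{g - a}{b}$ on summary statistics can be sketched in Python as below. The function name `decode_statistics` and the parameter names `shift` and `scale` are hypothetical, and the sketch assumes `scale` is positive.

```python
import math


def decode_statistics(coded_mean, coded_sd, shift, scale):
    """Undo the coding h = (g - shift) / scale for a mean and standard deviation."""
    original_mean = coded_mean * scale + shift
    original_sd = coded_sd * scale  # the shift does not change the spread
    return original_mean, original_sd


sigma_h = math.sqrt(43.58 / 61)  # sigma_h = sqrt(S_hh / n)
mean_g, sigma_g = decode_statistics(2, sigma_h, shift=5, scale=10)
print(mean_g, round(sigma_g, 2))
```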
