Data Presentation & Interpretation


Head to savemyexams.co.uk for more awesome resources

YOUR NOTES
AS Maths CIE 

1. Data Presentation & Interpretation

CONTENTS
1.1 Statistical Measures
1.1.1 Basic Statistical Measures
1.1.2 Frequency Tables
1.1.3 Standard Deviation & Variance
1.1.4 Coding
1.2 Representation of Data
1.2.1 Data Presentation
1.2.2 Stem and Leaf Diagrams
1.2.3 Box Plots & Cumulative Frequency
1.2.4 Histograms
1.3 Working with Data
1.3.1 Interpreting Data
1.3.2 Skewness

© 2015-2021 Save My Exams, Ltd. · Revision Notes, Topic Questions, Past Papers

1.1 Statistical Measures



1.1.1 Basic Statistical Measures
Types of Data
What are the different types of data?
Qualitative data is data that is usually given in words not numbers to describe
something
For example: the colour of a teacher's car
Quantitative data is data that is given using numbers which counts or measures
something
For example: the number of pets that a student has
Discrete data is quantitative data that needs to be counted
Discrete data can only take specific values from a set of (usually finite) values
For example: the number of times a coin is flipped until a tails is obtained
Continuous data is quantitative data that needs to be measured
Continuous data can take any value within a range, so there are infinitely many possible values
For example: the height of a student
Age can be discrete or continuous depending on the context or how it is defined
If you mean how many years old a person is then this is discrete
If you mean how long a person has been alive then this is continuous

Mean, Mode, Median


What are mean, median and mode? 
Mean, median and mode are measures of location
A measure of location gives information about where the data lie on the number
line
Mean, median and mode are measures of central tendency
They describe where the centre of the data is
They are all types of averages
In Statistics it is important to be specific about which average you are referring
to
How are mean, median and mode calculated?
You should already be familiar with finding the mean, median and mode from raw,
ungrouped data
The mode is the value that occurs most often in a data set
In a frequency table the group or class that occurs most often will be referred
to as the modal class
A data set with two modes is bimodal (a data set with more than two modes is multimodal)
The median is the middle value when the data is in order of size
If there are two values in the middle of the data set, the median is the
midpoint of the two values
If finding the median from a frequency table, find the cumulative frequencies first
and then find the group or class in which the middle value lies
The mean is the sum of all the values divided by the number of values in the data
set
What are summary statistics and their notation?
Summary statistics are information that summarises a set of data values
For n items in a data set:
The sum of the data is represented by
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$$
This is usually written $\sum x$ and reads as ‘sigma x’


The mean of the data is represented by
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n}$$
This reads as ‘x bar’
You will come across more summary statistics later in the course
How do we choose the best measure of central tendency?
It is often better to use one of the averages over the others, depending on the data
set

It’s a good idea to be aware of the advantages and disadvantages of each average
The mean uses all of the data values. This is good for a large data set where all of
the values are close together, but it also means that the mean can be affected by
extreme values
The median is not affected by very high or low values so is a good average to use
in data sets with extreme values
The mode is very useful in a lot of practical situations, however often there may be
more than one mode, no mode or even a mode that is nowhere near the middle of
the data set

Worked Example
For the data set given below, find the mode, median and mean.
23 19 14 28 27 19
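As a quick check on this worked example, here is a minimal Python sketch that uses the standard library `statistics` module (`multimode` needs Python 3.8 or later). It is only a way to verify answers. In the exam you would work these out by hand or with a calculator.

```python
from statistics import mean, median, multimode

data = [23, 19, 14, 28, 27, 19]

modes = multimode(data)  # every value that occurs most often (a list, in case of ties)
mid = median(data)       # sorts the data; n is even, so it averages the two middle values
avg = mean(data)         # sum of the values divided by the number of values

print("Mode:", modes)
print("Median:", mid)
print("Mean:", round(avg, 2))
```

Sorted, the data is 14, 19, 19, 23, 27, 28. This gives a mode of 19, a median of (19 + 23)/2 = 21 and a mean of 130/6 ≈ 21.67.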

Quartiles & Range


What are quartiles? 
Quartiles and percentiles are measures of location
Quartiles divide a population or data set into four equal sections
The lower quartile, $Q_1$, splits the lowest 25% from the highest 75%
The median, $Q_2$, is the value that is 50% of the way through the data
The upper quartile, $Q_3$, splits the lowest 75% from the highest 25%
How are quartiles calculated?
For a data set of size $n$:
To find the lower quartile, calculate $\frac{n+1}{4}$
If $\frac{n+1}{4}$ is an integer then the lower quartile is the value in that position
If $\frac{n+1}{4}$ is not an integer then the lower quartile is the midpoint of the two values in the positions given by the integers either side of $\frac{n+1}{4}$
To find the upper quartile, calculate $\frac{3(n+1)}{4}$
If $\frac{3(n+1)}{4}$ is an integer then the upper quartile is the value in that position
If $\frac{3(n+1)}{4}$ is not an integer then the upper quartile is the midpoint of the two values in the positions given by the integers either side of $\frac{3(n+1)}{4}$
The median can be found in the same way using $\frac{n+1}{2}$
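The quartile rule can be written as a short Python sketch. The helper names `value_at_position` and `quartiles` are hypothetical and chosen only for illustration. Calculators, spreadsheets and software libraries sometimes use other quartile conventions and can give slightly different answers, so in an exam follow the method in these notes.

```python
def value_at_position(sorted_data, position):
    """Value at a 1-based position: a whole-number position gives that value
    directly; otherwise take the midpoint of the values at the whole numbers
    either side of the position."""
    if position == int(position):
        return sorted_data[int(position) - 1]
    below = int(position)  # whole number just below the position
    return (sorted_data[below - 1] + sorted_data[below]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the (n+1)/4, (n+1)/2 and 3(n+1)/4 positions."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("this rule needs at least 3 values")
    return (
        value_at_position(xs, (n + 1) / 4),
        value_at_position(xs, (n + 1) / 2),
        value_at_position(xs, 3 * (n + 1) / 4),
    )


print(quartiles([23, 19, 14, 28, 27, 19]))  # (16.5, 21.0, 27.5)
```

With n = 6 the positions are 1.75, 3.5 and 5.25, so each quartile is the midpoint of two neighbouring values.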

What are the range and interquartile range?


The range and interquartile range are both measures of spread
A measure of spread gives information about how spread out the data set is
The range is the difference between the largest and smallest values in the data set
All data points in the set will be included in the range, including extreme
values
The interquartile range is the difference between the upper quartile and the lower
quartile
Only the middle 50% of the data is included in the interquartile range
It is not affected by extreme values
The units for range and interquartile range are the same as the units for the
original data

Worked Example


Find the range and interquartile range for the data set given below
43 29 70 31 84 56 17 53
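A minimal Python sketch of the working for this example. It follows the (n + 1)/4 rule from these notes, with the positions for n = 8 worked out in the comments.

```python
data = [43, 29, 70, 31, 84, 56, 17, 53]
xs = sorted(data)               # [17, 29, 31, 43, 53, 56, 70, 84]
n = len(xs)                     # 8

data_range = xs[-1] - xs[0]     # largest value minus smallest value

# (n + 1)/4 = 2.25 is not an integer, so Q1 is the midpoint of the 2nd and 3rd values
q1 = (xs[1] + xs[2]) / 2
# 3(n + 1)/4 = 6.75 is not an integer, so Q3 is the midpoint of the 6th and 7th values
q3 = (xs[5] + xs[6]) / 2
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

This gives a range of 84 − 17 = 67 and an interquartile range of 63 − 30 = 33.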

Exam Tip
Be aware of the difference between averages and ranges, especially
when answering contextual questions asking you to describe or
compare data. Remember, averages give an indication of where the data
are, whilst range gives an indication of how varied the data are.

1.1.2 Frequency Tables



Frequency Tables
In most cases in your statistics course, you will come across data that is presented in a
frequency table. These allow data to be summarised and make them easier to work
with.

Ungrouped Data


A frequency table for ungrouped data shows the frequency of individual data values 
Why use a frequency table for ungrouped data?
When collecting large amounts of raw data, it is quicker to use a tally system and
then collate the results into a frequency table
Ungrouped frequency tables are normally used for numerical, discrete data
Organising data into a frequency table makes it easier to work with
Calculating averages, ranges and summary statistics can be done much
quicker from a frequency table than from raw data
It gives a clear pattern of the data
It is easy to quickly see where most of the data are and to see extreme values
A frequency table for ungrouped data keeps all of the original data values
It is still possible to calculate the actual averages, ranges and summary
statistics
How are mean, median and mode calculated from an ungrouped
frequency table?
You should already be familiar with finding the mean, median and mode from raw,
ungrouped data
The mode is the value that occurs most often in a data set
In an ungrouped frequency table the data value with the highest frequency
will be the mode
The median is the middle value when the data is in order of size
To find the median from an ungrouped frequency table, add the frequencies
together (a running total) until you reach the position halfway through the data
For a data set of $n$ values, calculate $\frac{n+1}{2}$; the median is the data value at that position in the running total (if $\frac{n+1}{2}$ is not an integer, take the midpoint of the values in the positions either side)
The mean is the sum of all the values divided by the number of values in the data
set
To find the mean, multiply each data value by its corresponding frequency,
find the sum of these values and then divide by the total frequency
The notation for the sum of these products is $\sum xf$
The formula for the mean from a frequency table is $\bar{x} = \frac{\sum xf}{\sum f}$
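The two procedures for the median (a running total of frequencies) and the mean (Σxf/Σf) can be sketched in Python. The function names are hypothetical. The sketch assumes the data values are listed in increasing order, as in a frequency table. The small table in the usage line is invented for illustration.

```python
def mean_from_table(values, freqs):
    """Mean from an ungrouped frequency table: sum(x * f) / sum(f)."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)


def value_at_rank(values, freqs, k):
    """The k-th smallest data value (1-based), found with a running total of
    the frequencies. Assumes `values` are in increasing order."""
    running = 0
    for x, f in zip(values, freqs):
        running += f
        if running >= k:
            return x
    raise ValueError("k is larger than the total frequency")


def median_from_table(values, freqs):
    """Median using the (n + 1)/2 position; if that position is not a whole
    number, average the values in the positions either side of it."""
    n = sum(freqs)
    lower = value_at_rank(values, freqs, (n + 1) // 2)
    upper = value_at_rank(values, freqs, (n + 2) // 2)
    return (lower + upper) / 2


# Hypothetical table: values 1, 2, 3, 4 with frequencies 2, 3, 4, 1
print(mean_from_table([1, 2, 3, 4], [2, 3, 4, 1]), median_from_table([1, 2, 3, 4], [2, 3, 4, 1]))
```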

Worked Example


The frequency table below gives information about the shoe sizes of a group of year 12 students.
Shoe Size 37 37.5 38 38.5 39 39.5 40 40.5 41
Frequency 3 3 12 5 16 9 7 4 1
(i)
Write down the modal shoe size.
(ii)
Explain how you know that the median shoe size is 39.
(iii)
Calculate the mean shoe size.
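One way to check answers (i) to (iii) is the Python sketch below. It follows the same steps as the notes: the highest frequency gives the mode, a running total gives the median, and Σxf/Σf gives the mean. It is a verification aid only, not the expected written working.

```python
from itertools import accumulate

sizes = [37, 37.5, 38, 38.5, 39, 39.5, 40, 40.5, 41]
freqs = [3, 3, 12, 5, 16, 9, 7, 4, 1]
n = sum(freqs)  # total number of students

# (i) Mode: the shoe size with the highest frequency
modal_size = sizes[freqs.index(max(freqs))]

# (ii) Median: (n + 1)/2 = 30.5, so we need the 30th and 31st values
cumulative = list(accumulate(freqs))  # running totals of the frequencies


def size_at_rank(k):
    """Shoe size of the k-th student when the sizes are in order (1-based)."""
    for size, cf in zip(sizes, cumulative):
        if cf >= k:
            return size


median_size = (size_at_rank((n + 1) // 2) + size_at_rank((n + 2) // 2)) / 2

# (iii) Mean: sum of (size x frequency) divided by the total frequency
mean_size = sum(x * f for x, f in zip(sizes, freqs)) / n

print(modal_size, median_size, round(mean_size, 2))
```

The running totals reach 23 at size 38.5 and 39 at size 39, so both the 30th and 31st values are 39. The mean is 2334.5/60 ≈ 38.9.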

Exam Tip
Use common sense when checking your answers: is your mean within
the range of the data? Does it seem right? A mean of 140, for example,
could not be correct if the data is about the ages of students taking an
exam at university.

Grouped Data


A frequency table for grouped data is usually used for large amounts of continuous
data. It shows the frequency of data values that are within a particular group or
class.

What are the advantages and disadvantages of grouping data?


Grouping data is especially useful when data can take a large range of different
values
Trends and patterns can be easily spotted when data has been grouped
Calculations are much quicker with data that has been grouped
It is important to be aware, however, that grouped frequency tables also have
some negatives
The actual data values are lost when data is grouped
It is only possible to calculate estimated averages, ranges and summary
statistics
Notation for grouped frequency tables
When grouping data, it is important to be clear about which group or class any
data value should be entered into
A group entry of 10 – 20 followed by 20 – 30 would be unclear because the
number 20 could be entered into both groups
If the data are discrete, this could be written as 10 – 19 and 20 – 29
For continuous data, this could be changed to 10 – and 20 –
This would most likely be represented as 10 ≤ x < 20 and 20 ≤ x < 30
Most commonly inequalities are used to group continuous data as they leave no
ambiguity
If the data are continuous, always check that there are no gaps between the upper
boundary of a class and the lower boundary of the next class
If there are gaps you will need to close these gaps by changing the boundaries
before carrying out any calculations
For example, the groups 10 ≤ x ≤ 19 followed by 20 ≤ x ≤ 29 will become
9.5 ≤ x < 19.5 and 19.5 ≤ x < 29.5
Check the inequality signs carefully
Be careful when deciding which category the data falls into. Taking the group
10 ≤ x ≤ 19 for example:
if the data had been rounded, the class would take the form 9.5 ≤ x < 19.5, as described
above
if the data had been truncated, however, then the boundaries would become
10 ≤ x < 20
this is most likely to happen with age, which is technically a continuous
variable because it can take any value within a range; however, we usually
measure age by counting completed years
Finding averages from grouped frequency tables
Instead of finding the mode, when working with a grouped frequency table we
instead find the modal class
This will be the class (group) with the greatest frequency

It is only possible to calculate an estimate for the mean and the median from a
grouped frequency table
Calculating the estimated mean is the same as for ungrouped frequency tables,
however you will need to find the midpoints first
the midpoint is the mean of the lower and upper class
boundaries
multiply each midpoint by its corresponding frequency, find the sum of these
values and then divide by the total frequency
Calculating the estimated median is more complicated and questions will only ask
for the class that the estimated median would be in

Worked Example
The table below shows the heights in cm of a group of year 12 students.
Height, h (cm)    Frequency
150 ≤ h < 155    3
155 ≤ h < 160    5
160 ≤ h < 165    9
165 ≤ h < 170    7
170 ≤ h < 175    1

(i)
Write down the class that a student of exactly 160 cm should be added to.

(ii)
Write down the modal class.

(iii)
Calculate the estimated mean height.
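A minimal Python sketch of parts (i) to (iii). It assumes the classes have the form lo ≤ h < hi, as in the table. For the estimated mean, each unknown height is replaced by its class midpoint, as described above.

```python
classes = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]  # lo <= h < hi, in cm
freqs = [3, 5, 9, 7, 1]


def class_containing(h):
    """(i) Return the class lo <= h < hi that a height h belongs to."""
    for lo, hi in classes:
        if lo <= h < hi:
            return (lo, hi)
    return None


# (ii) Modal class: the class with the greatest frequency
modal_class = classes[freqs.index(max(freqs))]

# (iii) Estimated mean: use each class midpoint in place of the unknown raw values
midpoints = [(lo + hi) / 2 for lo, hi in classes]
est_mean = sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)

print(class_containing(160), modal_class, round(est_mean, 1))
```

Because the inequality is strict at the top of each class, a height of exactly 160 cm goes into 160 ≤ h < 165. The estimated mean is 4052.5/25 = 162.1 cm.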

Exam Tip
There can be a lot of calculations when working with grouped frequency
tables, so be extra careful when using your calculator, as it is easy to make
small errors with these questions. Use the table and add information to
it as you go, as there will be marks available for showing work
within the methods.

1.1.3 Standard Deviation & Variance



Standard Deviation & Variance
The variance is another measure for the spread of the data, it measures the variability
from the mean of the data.
What is the variance and the standard deviation?
The variance is a statistic that tells us how varied a set of data is
Data that is more spread out will have a greater variance
Data that is consistent and close together will have a smaller variance
The standard deviation is the square root of the variance
The symbol for the population standard deviation is the lowercase Greek letter
sigma, $\sigma$, and the symbol for the variance is sigma squared, $\sigma^2$
Both the standard deviation and the variance are used within this course, so
make sure you look out for which one a question gives or asks for
How are the variance and standard deviation calculated?
There is more than one formula that can be used for calculating the variance, and
you should choose the most useful one
For a set of $n$ values $x_1, x_2, \dots, x_i, \dots, x_n$, the variance is the sum of the squares of
the deviations from the mean, divided by $n$ (the total frequency)
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$
This formula can be time consuming and therefore is rarely used in this
statistics course
A second, easier to use version of the variance is:
$$\text{Variance} = \frac{\sum x^2}{n} - \bar{x}^2$$
This version is easier to work with and should be used in most instances
Variance can also be written in other ways
$$\text{Variance} = \sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{1}{n}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{\sum x^2}{n} - \bar{x}^2$$
An easy way to remember this is to think of it as ‘the mean of the squares
minus the square of the mean’
Most calculators can find summary statistics such as the
standard deviation and variance fairly quickly, so practise finding them on yours
The standard deviation is the square root of the variance
$$\text{Standard deviation} = \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$
Make sure you know how to find these formulae in the formula booklet
and are familiar with the version given
The units for standard deviation are the same as the units for the data and the
units for variance are the same as the units for the data but squared
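To see that the definition and the shortcut formula agree, the Python sketch below computes both on a small made-up data set that does not come from these notes. Python's `statistics.pvariance` divides by n, which matches these notes. `statistics.variance` divides by n − 1 instead and would give a different value here.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # small illustrative data set
n = len(data)
x_bar = sum(data) / n

# Definition: mean of the squared deviations from the mean
var_definition = sum((x - x_bar) ** 2 for x in data) / n

# Shortcut: mean of the squares minus the square of the mean
var_shortcut = sum(x ** 2 for x in data) / n - x_bar ** 2

# The standard deviation is the square root of the variance
sd = math.sqrt(var_shortcut)

assert math.isclose(var_definition, var_shortcut)
assert math.isclose(var_definition, statistics.pvariance(data))  # population version, divides by n

print(var_definition, var_shortcut, sd)  # prints 4.0 4.0 2.0
```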

How are the variance and standard deviation calculated from a frequency
table?
The method for finding the variance from a frequency table is similar to that of the
mean
If calculating from a grouped frequency table, find the midpoints, $x$, first
Multiply each $x$ value (and each $x^2$ value) by its corresponding frequency and use these products
within the formulae
The formulae will become
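$$\text{Variance} = \sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f} = \frac{\sum x^2 f}{\sum f} - \left(\frac{\sum xf}{\sum f}\right)^2$$
$$\text{Standard deviation} = \sigma = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}} = \sqrt{\frac{\sum x^2 f}{\sum f} - \left(\frac{\sum xf}{\sum f}\right)^2}$$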

Worked Example
Phoom recorded the length of time, t, it took him, in minutes, to answer a
selection of further calculus exam questions. The data is summarised in the
table below.
Time, t (minutes)    Frequency
2 ≤ t < 4    1
4 ≤ t < 6    4
6 ≤ t < 8    7
8 ≤ t < 10    5
10 ≤ t < 12    2
12 ≤ t < 20    2

(i)
Calculate an estimate of the variance of the time taken to complete a
question, include units with your answer.

(ii)
Write down an estimate of the standard deviation of the time taken to
complete a question, include units with your answer.
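A minimal Python sketch of the working for (i) and (ii). It uses the class midpoints as estimated values of t and the frequency-table form of the shortcut formula.

```python
import math

classes = [(2, 4), (4, 6), (6, 8), (8, 10), (10, 12), (12, 20)]  # t in minutes
freqs = [1, 4, 7, 5, 2, 2]

mids = [(lo + hi) / 2 for lo, hi in classes]            # 3, 5, 7, 9, 11, 16
sum_f = sum(freqs)                                      # total frequency, 21
sum_xf = sum(x * f for x, f in zip(mids, freqs))        # sum of x*f, 171
sum_x2f = sum(x * x * f for x, f in zip(mids, freqs))   # sum of x^2*f, 1611

variance = sum_x2f / sum_f - (sum_xf / sum_f) ** 2      # in minutes squared
sd = math.sqrt(variance)                                # in minutes

print(round(variance, 2), round(sd, 2))
```

This gives a variance of about 10.41 minutes² and a standard deviation of about 3.23 minutes. Both are estimates because the data is grouped.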

Exam Tip
Look out for whether a question gives or asks for the standard deviation
or variance, especially if the question is using sigma notation.
Choose which formula to use wisely: most of the time the summary
statistics will be given, so only one of the formulae will be possible. On
the rare occasion that you are asked to calculate directly from a table,
think carefully about which version of the formula is quickest and
easiest to use. It will almost always be the second version given in this
revision note.

1.1.4 Coding



Coding
Sometimes data needs to be coded for further use with calculations. This is particularly
useful with data that deals with very small or very large numbers, or with data that
needs to be classified for research purposes.
What is coding?
Coding is a way of simplifying data to make it easier to work with
The coding must be carried out on all values within the data set and will normally
be done using a given formula
Coding can be carried out in a number of ways:
Adding or subtracting a constant to each data value
Multiplying or dividing each data value by a constant
A combination of both of the above
You may occasionally see coding referred to as using an assumed mean
This refers to the value chosen to subtract from the original data
It is usually chosen as an easy value, close to an estimate of the true mean
How are statistical calculations carried out with coded data?
If you know the mean or standard deviation of the original data it is possible to
find the mean or standard deviation of the coded data and vice versa
It is important to remember what the mean and standard deviation actually tell us
about the data to understand how coding calculations work
The mean is a measure of location, so whatever coding is applied to each data value
is applied to the mean in the same way
The standard deviation is a measure of spread, adding or subtracting a
constant to every value within the data set will not change the standard
deviation of the data set
Multiplying or dividing every value within the data set by a constant will
multiply or divide the standard deviation by the modulus of the constant
If the data were coded by multiplying or dividing by a negative constant, the
standard deviation is multiplied or divided by the equivalent positive value
When calculations are carried out on data that has been coded:
The original mean can be found by rearranging the coding equation to reverse the coding
For example, if the data, $x$, was coded using the formula
$$y = ax + b$$
then the mean of the coded data, $\bar{y}$, would be
$$\bar{y} = a\bar{x} + b$$
The original mean, $\bar{x}$, will be
$$\bar{x} = \frac{\bar{y} - b}{a}$$
The original standard deviation

Will be the same as the coded standard deviation if the data was coded by
adding or subtracting a constant only
Can be found by reversing the coding if the data was coded by multiplying
or dividing by a constant only
If the data was coded by a combination of both then only the multiplying
or dividing will need to be reversed to find the original standard deviation
For example, if the data, $x$, was coded using the formula
$$y = ax + b$$
then the standard deviation of the coded data, $\sigma_y$, would be
$$\sigma_y = |a|\,\sigma_x$$
The original standard deviation, $\sigma_x$, will be
$$\sigma_x = \frac{\sigma_y}{|a|}$$
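The effect of the coding y = ax + b can be checked numerically. In this Python sketch the data and the constants a = −3 and b = 10 are arbitrary illustrative choices, not values from these notes. A negative a is used to show why the modulus is needed.

```python
import math
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative original data
a, b = -3, 10                  # illustrative coding y = ax + b (a is negative on purpose)
y = [a * xi + b for xi in x]   # code every value

mean_x, sd_x = statistics.mean(x), statistics.pstdev(x)
mean_y, sd_y = statistics.mean(y), statistics.pstdev(y)

# The mean follows the coding exactly: y-bar = a * x-bar + b
assert math.isclose(mean_y, a * mean_x + b)
# The spread ignores "+ b" and is scaled by |a|: sigma_y = |a| * sigma_x
assert math.isclose(sd_y, abs(a) * sd_x)

# Reversing the coding recovers the original statistics
recovered_mean_x = (mean_y - b) / a
recovered_sd_x = sd_y / abs(a)
print(recovered_mean_x, recovered_sd_x)
```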
What will summary statistics and formulae look like with an assumed
mean?
If an assumed mean, a, has been subtracted from all data values, x, then the
summary statistics for the coded data will be
The sum of the coded data $\sum (x - a)$
The sum of the squares of the coded data $\sum (x - a)^2$
Be careful not to mix this up with the square of the sum of the coded data
$\left(\sum (x - a)\right)^2$
If an assumed mean, $a$, has been subtracted from all data values, $x$, then the
formulae for the mean and standard deviation for the coded data will be
$$\overline{x - a} = \frac{\sum (x - a)}{n}$$
$$\sigma_{x - a} = \sqrt{\frac{\sum (x - a)^2}{n} - \left(\frac{\sum (x - a)}{n}\right)^2}$$

Most questions will give you either the summary statistics or the mean of the
coded data and expect you to work with these formulae to find original
information about the data

Worked Example


A coffee machine is set to dispense 150 ml of coffee per cup.
In a random sample of 20 cups of coffee, $\sum (c - 150) = -16$, where $c$ ml is the volume of coffee in a cup.
(a)
Find the mean volume in a cup of coffee.

(b)
Given that $\sum (c - 150)^2 = 112$, find the standard deviation of the sampled
cups of coffee.
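A minimal Python sketch of the calculation. It uses the assumed-mean formulae from these notes with an assumed mean of 150 and n = 20.

```python
import math

n = 20
sum_coded = -16       # sum of (c - 150)
sum_coded_sq = 112    # sum of (c - 150)^2
assumed_mean = 150

# (a) Mean of the coded data, then undo the coding by adding 150 back
coded_mean = sum_coded / n                   # -0.8
mean_volume = coded_mean + assumed_mean      # in ml

# (b) Subtracting 150 does not change the spread, so the coded SD is the answer
sd = math.sqrt(sum_coded_sq / n - coded_mean ** 2)   # sqrt(5.6 - 0.64) = sqrt(4.96)

print(mean_volume, round(sd, 2))
```

This gives a mean of 149.2 ml and a standard deviation of about 2.23 ml.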


Exam Tip
Be careful when using the formulae for the mean and standard deviation
with coded summary statistics: you must use the
summary statistics consistently throughout. For example, if you use the
sum of the squares of the coded data in the formula for the standard
deviation, you must subtract the square of the coded mean.

1.2 Representation of Data



1.2.1 Data Presentation
Data Presentation
What graphs and diagrams should I be familiar with?
You will be expected to be able to use a variety of graphs such as:
Stem-and-leaf diagrams
Can be used with ungrouped data of a single variable
Shows all the data and the shape of its distribution
Box plots
Can be used with ungrouped data of a single variable
Shows the range, interquartile range and quartiles clearly
Very useful for comparing data patterns quickly
Cumulative frequency graphs
Can be used with continuous grouped data of a single variable
Shows the running total of the frequencies that fall below the upper bound
of each class
Histograms
Can be used with continuous grouped data of a single variable
Can be used with varying group sizes
Shows the frequencies of the group, represented by the area of each bar
You might be expected to draw a full diagram or to add to an incomplete diagram
What should I look out for when interpreting graphs?
Look carefully at the context of the information given in the graph
Check the scales on both axes carefully, including units
Sometimes the numbers will be abbreviated to fit on the scale, for example if a
population is given in millions then the number 60 will represent 60 000 000
Look carefully at the labels and units to determine how a value should be read
If there is more than one graph represented on the same set of axes take extra
care to ensure you are reading from the correct one
Beware of misleading graphs: the scales on the axes, units and representation can
be manipulated to make a graph look more or less convincing

Worked Example
A student is collecting information on his friends’ interests and believes
that his friends who only have dogs spend more time outside than his
friends who only have cats. He has surveyed 20 friends with only cats and
20 friends with only dogs and has written down the total amount of time,
rounded to the nearest hour, each of them spent outside last week.
Describe, with a reason, which diagram would be best for the student to
use to display the data.

Exam Tip
Take the time needed when working with diagrams; they are usually
‘easy marks’ questions, but it is common for students to rush them and
make careless mistakes.

1.2.2 Stem and Leaf Diagrams



Stem and Leaf Diagrams
What is a stem and leaf diagram?
A stem and leaf diagram shows ALL RAW data and groups it into class intervals
Stem and leaf diagrams lend themselves to two-digit data but can be used with
three-digit data, rarely more

The numbers in brackets indicate how many values are in that class interval
These are not always included but can be useful when there is a large amount
of data to display
How do I draw a stem and leaf diagram?
Identify the stems and the leaves
Leaves are always single digits
for example, the number 122 would be represented by 12 | 2
If starting from unordered data draw two diagrams
The first diagram should get the data into the right format
i.e. a list of stems with their corresponding leaves
The second diagram should have stems and leaves in order, with a key
This helps accuracy as values are less likely to be missed out
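The two-step process (split each value into a stem and a leaf, then put the leaves in order) can be sketched in Python. The function name is hypothetical. The sketch assumes non-negative whole numbers and uses every digit except the units digit as the stem, so 122 becomes 12 | 2. The data in the usage line comes from the range and interquartile range worked example.

```python
def stem_and_leaf(data):
    """Ordered stem-and-leaf diagram for non-negative integers: the stem is
    everything except the units digit, the leaf is the units digit."""
    leaves_by_stem = {}
    for value in sorted(data):                 # sorting first gives ordered leaves
        stem, leaf = divmod(value, 10)
        leaves_by_stem.setdefault(stem, []).append(leaf)
    lines = []
    # List every stem in the range, including stems with no leaves
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


diagram = stem_and_leaf([43, 29, 70, 31, 84, 56, 17, 53])
for line in diagram:
    print(line)
print("Key: 1 | 7 represents 17")
```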
What are stem and leaf diagrams used for?
The data is arranged into classes so at a glance it is possible to see the modal
class interval
As the data is in order the median, quartiles, maximum and minimum can be
identified easily
Check you can do this – find the minimum, maximum, median and upper and
lower quartiles from the stem and leaf diagram at the start of this revision
note
Note that these five values are those needed in order to construct a box‑and-
whisker diagram (box plot)
Outliers, once defined, can be easily identified and removed

What about back-to-back stem and leaf diagrams?


These are used when it is helpful for the data to be split into two comparable
categories, such as boy/girl, child/adult or UK/non-UK

Note that the leaves on the left-hand side of the stems (Boys) increase from the
centre outwards
Are there any variations on stem and leaf diagrams?
There are a few minor variations on stem and leaf diagrams that you may see
online or in different textbooks

Some or all the different/extra features in the diagram above may appear
These differences can be applied to back-to-back stem and leaf diagrams
With large amounts of data, the stems may be split into two rows
Every stem will be listed twice
The first row for a stem will contain leaves 0 - 4
The second row will contain leaves 5 - 9
What might I be asked to do with a stem and leaf diagram?
You may be asked to draw or complete a stem and leaf diagram
Find statistical measures – median, quartiles and interquartile range in particular
From which you may be required to draw a box-and-whisker diagram
Identify and remove outliers
Compare data shown by stem and leaf diagrams (either separate or back-to-back);
make two comments, each in terms of both the maths and the
context of the question
a comment about average (use median)
e.g. the girls’ median of 88% was higher than the boys’ median of 65% so on
average the girls performed better on the test
a comment about variation (spread) (use interquartile range)
e.g. the girls’ interquartile range of 30% was greater than the boys’ 15% so the
boys had more consistent scores on the test
Analyse what would happen to statistical measures such as the median and
quartiles if a value changed or a new value were to be added to the data

Worked Example


The following stem and leaf diagrams show the times taken by some children and adults to complete a level on a computer game.

2 | 3 represents a time of 23 seconds


(a)
Compare the times taken to complete the level between the children and
the adults.
(b)
It is later discovered that two of the adults’ times had been omitted from the
diagram: times of 23 and 42 seconds.
Briefly explain whether adding these times would change the adults’
median time.


Exam Tip
Accuracy is important
(Lightly) tick off values as you add them to a stem and leaf diagram
Check you have the right number of data values in total on your
diagram
Other checks can include ensuring the median has the same
number of values either side of it

1.2.3 Box Plots & Cumulative Frequency



Box Plots
What is a box plot?

A box plot is a graph that clearly shows key statistics from a data set
It shows the median, quartiles, minimum and maximum values and outliers
It does not show any other individual data items
The middle 50% of the data will be represented by the box section of the graph and
the lower and upper 25% of the data will be represented by each of the whiskers
Only one axis is used when graphing a box plot
It is still important to make sure the axis has a clear, even scale and is labelled
with units
Box plots are often used for comparing two sets of data
Both box plots will be drawn one above the other, on the same scale on the x-axis
They are useful for comparing data because it is easy to see the main shape
of the distribution of the data from a box plot

Worked Example
The incomplete box plot below shows the tail lengths in cm of some
students’ pets.

(i)
Given that the median tail length was 21 cm, complete the box plot.

(ii)
Find the range and interquartile range of the tail lengths.

Exam Tip
Remember a box plot is a graph and should be treated like one, even
though there is only one axis. It should have a title and a clear, even scale
that is labelled with units if there are any. If drawing two box plots on
the same axis, label each one clearly.

Cumulative Frequency

What is a cumulative frequency graph?
A cumulative frequency graph is used with data that has been organised into a
grouped frequency table
The cumulative frequency graph can be used to find estimates of percentiles and
quartiles
As the data is grouped, it is not possible to find the actual values of these
statistics
What are the main features of a cumulative frequency graph?
A cumulative frequency graph shows how much data there is up to a certain value, including the data in that group and in all the groups below it
Cumulative frequency will always be plotted on the y-axis
Consider the scale carefully because the cumulative frequency will usually reach a large number
You may be asked to add to one or both axes; remember to label both axes clearly and include units on the x-axis if they are needed
The cumulative frequency is calculated by adding the frequency in each group, or
class, to the frequency in the ones before
This is essentially accumulating the frequencies as you go
The cumulative frequency for each class must be plotted against the upper
boundary of each corresponding class
The cumulative frequency that corresponds with each upper boundary will not
only consider the frequency of the data in that class, but all of the data in the
groups below it too
When the points have been plotted they should be joined up with a smooth curve
However, sometimes the points are joined with straight lines from point to point instead
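The accumulating step can be sketched in Python. The classes and frequencies are hypothetical, each written as a (lower boundary, upper boundary, frequency) tuple, and the function name is illustrative. Each running total is paired with the upper boundary of its class; the sketch also starts the curve at (first lower boundary, 0), a common convention since no data lies below the first class.

```python
from itertools import accumulate

def cumulative_points(classes):
    """Turn (lower, upper, frequency) classes into (x, y) points to plot.

    Each cumulative frequency is paired with the UPPER boundary of its class.
    The first point is (first lower boundary, 0), as no data lies below it.
    """
    uppers = [upper for _, upper, _ in classes]
    totals = accumulate(freq for _, _, freq in classes)  # running totals
    return [(classes[0][0], 0)] + list(zip(uppers, totals))

# Hypothetical grouped data: lengths in cm
classes = [(20, 30, 4), (30, 40, 11), (40, 45, 9), (45, 60, 6)]
print(cumulative_points(classes))
```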
How do we read statistics from a cumulative frequency graph?
Quartiles and percentiles can be read from a cumulative frequency graph
The median, $Q_2$, is read from the y-axis scale at the $\frac{n}{2}$th value, where $n$ is the total frequency
The lower quartile, $Q_1$, is read from the $\frac{n}{4}$th value and the upper quartile, $Q_3$, is read from the $\frac{3n}{4}$th value
Any percentile can be read from the graph: for the $p$th percentile, find $p\%$ of the total frequency and read across from that value on the y-axis
To read the corresponding data value once the position on the y-axis is known, use a ruler to draw a line across from the y-axis to the graph and then down to the x-axis
Sometimes the frequency of values greater than or less than a particular data value will need to be found; this time you will have to read up from the x-axis to the graph and then across to the y-axis
Take particular care if the question asks for the frequency of values greater than a particular data value: the value found on the y-axis will need to be subtracted from the total frequency
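These readings are normally taken by hand with a ruler, but the same steps can be imitated numerically to check an answer. This sketch assumes the plotted points are joined with straight lines, so its results may differ slightly from readings taken from a smooth curve. The points are hypothetical (upper boundary, cumulative frequency) pairs and the helper names are illustrative.

```python
def value_at(points, cf):
    """Read across from a cumulative frequency (y) to a data value (x).

    Assumes straight lines between consecutive plotted points.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= cf <= y1 and y1 > y0:
            return x0 + (cf - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("cumulative frequency is outside the graph")

def cf_at(points, x):
    """Read up from a data value (x) to a cumulative frequency (y)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("data value is outside the graph")

# Hypothetical (upper boundary, cumulative frequency) points, starting at (20, 0)
points = [(20, 0), (30, 4), (40, 15), (45, 24), (60, 30)]
n = points[-1][1]                 # total frequency
q1 = value_at(points, n / 4)      # lower quartile
q2 = value_at(points, n / 2)      # median
q3 = value_at(points, 3 * n / 4)  # upper quartile
print(f"median = {q2:.1f}, IQR = {q3 - q1:.1f}")
# For "more than 50", subtract the reading from the total frequency
print(f"more than 50: {n - cf_at(points, 50):.1f}")
```

For the $p$th percentile, the same helper would be called with `p / 100 * n` as the cumulative frequency.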

Worked Example
The cumulative frequency graph below shows the lengths in cm, l, of a group of puppies in a training group.

(i)
Given that the group 40 < l ≤ 45 was one of the groups used in the data
collection, find the number of puppies that were in this group.

(ii)
Use the graph to find an estimate for the interquartile range of the puppies.

(iii)
x% of the puppies are more than 53.5 cm long. Use your graph to find an estimate for the value of x.


Exam Tip
If you are asked to read values from your graph, make sure you use a ruler and draw the lines on clearly to show where you took your readings from. Remember that the graph shows the accumulated frequencies, so if you need only the frequency of one class you may need to subtract the previous value.

1.2.4 Histograms

Histograms
What is a histogram?
A histogram is similar to a bar chart but with some key differences
A histogram is for displaying grouped continuous data whereas a bar chart is
for discrete or qualitative data
There will never be any gaps between the bars of adjacent groups in a
histogram
Whilst in a bar chart the frequency is read from the height of the bar, in a
histogram the height of the bar is the frequency density
On a histogram, frequency density is plotted on the y-axis
This allows a histogram to be plotted for unequal class intervals
It is particularly useful if data is spread out at either or both ends
The area of each bar on a histogram will be proportional to the frequency in that
class
How do I draw a histogram?
Step 1. Always check that there are no gaps between the upper boundary of a
class and the lower boundary of the next class
If there are gaps you will need to close them by changing the boundaries
before carrying out any calculations
Consider whether the values are rounded or truncated before closing the
gaps
Step 2. Find the class width of each group by subtracting the lower boundary
from the upper boundary
Step 3. Calculate the frequency density for each group using the formula:
$\text{frequency density} = \dfrac{\text{frequency}}{\text{class width}}$
Step 4. The histogram will be drawn with the data values on the x-axis and frequency density on the y-axis
Remember that the scale on both axes must be even, although the class widths may be uneven
Both axes should be clearly labelled and units included on the x-axis
Most often, the bars will have different widths
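Steps 2 and 3 amount to a short calculation for each class. The sketch below assumes Step 1 has already been done, so the boundaries of consecutive classes meet with no gaps; the classes are hypothetical (lower boundary, upper boundary, frequency) tuples and the function name is illustrative.

```python
def frequency_densities(classes):
    """Return (class width, frequency density) for each (lower, upper, frequency) class."""
    result = []
    for lower, upper, freq in classes:
        width = upper - lower                 # Step 2: class width
        result.append((width, freq / width))  # Step 3: frequency / class width
    return result

# Hypothetical unequal classes, e.g. times in seconds
classes = [(0, 10, 5), (10, 15, 12), (15, 20, 9), (20, 40, 8)]
for (lower, upper, _), (width, fd) in zip(classes, frequency_densities(classes)):
    print(f"{lower}-{upper}: width {width}, frequency density {fd}")
```

The bar heights are the frequency densities, so the wide 20-40 class gets a short bar even though it holds almost as many values as the 15-20 class.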
How do we interpret a histogram?
It is important to remember that the y-axis does not tell us the frequency of each bar in the histogram
The frequency of a class is found by
Frequency = Frequency Density × Class Width
You may be asked to find the frequency of part of a bar within a histogram
Find the area of that section of the bar by multiplying the bar's frequency density by the width of the section, using any information you have already found out
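Finding the frequency of part of a bar can be sketched as a sum of areas. The sketch assumes values are spread evenly within each class, which is what makes an area-based estimate reasonable; the class data and function name are hypothetical. For each bar it multiplies the frequency density by the width of the part of that bar lying inside the requested range.

```python
def frequency_between(classes, a, b):
    """Estimate how many values lie between a and b using bar areas.

    Assumes values are spread evenly within each class, so the frequency
    of part of a bar is frequency density x width of that part.
    """
    total = 0.0
    for lower, upper, freq in classes:
        density = freq / (upper - lower)
        overlap = min(upper, b) - max(lower, a)  # width of this bar inside [a, b]
        if overlap > 0:
            total += density * overlap
    return total

# Hypothetical (lower boundary, upper boundary, frequency) classes
classes = [(0, 10, 5), (10, 15, 12), (15, 20, 9), (20, 40, 8)]
print(round(frequency_between(classes, 12, 25), 2))
```

Here the estimate for values between 12 and 25 combines parts of three bars: 3 × 2.4 + 5 × 1.8 + 5 × 0.4 = 18.2.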

Worked Example
The table below and its corresponding histogram show the mass, in kg, of some newborn bottlenose dolphins.

Mass, m (kg)    Frequency
4 ≤ m < 8       4
8 ≤ m < 10      15
10 ≤ m < 12     19
12 ≤ m < 15     9
15 ≤ m < 30     6

(a)
Complete the histogram.

(b)
Estimate the number of dolphins whose mass is greater than 13 kg.
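The arithmetic for both parts follows directly from the table. For (b), the estimate assumes the masses in the 12 ≤ m < 15 class are spread evenly across that class, so only the part of that bar from 13 to 15 is counted.

```latex
\begin{aligned}
&\text{(a) Frequency density} = \frac{\text{frequency}}{\text{class width}}:\\
&\quad 4 \le m < 8: \tfrac{4}{4} = 1, \qquad 8 \le m < 10: \tfrac{15}{2} = 7.5, \qquad 10 \le m < 12: \tfrac{19}{2} = 9.5,\\
&\quad 12 \le m < 15: \tfrac{9}{3} = 3, \qquad 15 \le m < 30: \tfrac{6}{15} = 0.4\\
&\text{(b) } 13 \le m < 15: 3 \times (15 - 13) = 6, \qquad 15 \le m < 30: 6\\
&\quad \text{Estimated number with } m > 13 = 6 + 6 = 12
\end{aligned}
```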

Exam Tip
Look carefully at the scales on the axes; it will rarely be a simple 1 unit to 1 square.

1.3 Working with Data

1.3.1 Interpreting Data
Interpreting Data
You may be asked to comment on what statistics tell you about a data set, or on how removing or adding a new piece of data could change statistics you have calculated. You may also be asked to compare two data sets from a similar context.
Analysing statistics calculated from a data set:
You could be asked to use some statistics you have calculated, such as the median
or interquartile range, to make decisions about the set of data you were analysing
Writing about the mean or median gives information about where the data is
located
Writing about the range, interquartile range, standard deviation or variance
gives information about how spread out the data is
A lower value means the data set is more consistent
A greater value means the data set is more spread out, or varied
When commenting on a data set you should usually write about both a measure of
location and a measure of spread
You should pair the median with the range or interquartile range and the mean
with the standard deviation or variance
Use the mean and standard deviation/variance when the data is roughly
symmetrical and does not contain outliers
Use the median and interquartile range when the data contains outliers
You should always write your analysis in the context of the data
Sometimes a lower mean/median is better: time taken to complete a puzzle
Sometimes a higher mean/median is better: score on a test
Changing a data set:
Sometimes it might be discovered that a value was omitted from a data set and
needs to be added in
Similarly, a data value could be found to be an error and will need to be removed (cleaned) from the data set
It is important to be aware of how the measures of location and spread may
change when the data is added or removed
Adding or removing a data value to the set could change the mean, median
and quartiles
How each statistic changes will depend on where the data value lies within the data set
Adding a data point below the mean, or removing one from above will
cause the mean to decrease
Adding a data point above the mean, or removing one from below will
cause the mean to increase
The median and quartiles may or may not change depending on the data
value, you should always check these cases individually

Adding or removing an extreme value can change the mean by a lot, but it affects the median in the same way as any other value on the same side of the median
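The effect of adding or removing a value can be checked numerically. In this sketch the puzzle times are hypothetical and the function name is illustrative; it reports the mean and median before and after the change.

```python
from statistics import mean, median

def before_and_after(data, add=(), remove=()):
    """Return (old mean, new mean, old median, new median) after changing the data."""
    new = list(data) + list(add)
    for value in remove:
        new.remove(value)  # removes one occurrence of each listed value
    return mean(data), mean(new), median(data), median(new)

# Hypothetical times, in seconds, to complete a puzzle
times = [31, 35, 38, 40, 44]
for change in ([20], [50], [90]):
    old_mean, new_mean, old_median, new_median = before_and_after(times, add=change)
    print(f"add {change}: mean {old_mean} -> {new_mean:.2f}, median {old_median} -> {new_median}")
print(before_and_after(times, remove=[44]))  # removing a value above the mean
```

With these times the original mean is 37.6 and the median is 38. Adding 20 (below the mean) lowers the mean. Adding 50 or 90 (both above the median) gives the same new median, 39, but 90 pulls the mean up much further than 50 does.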
Comparing two data sets
When comparing two data sets you must comment on both:
A measure of location such as the mean, median or mode and
A measure of spread such as the range, interquartile range, variance or
standard deviation
If you comment on the mean as the measure of location you should use
the standard deviation or variance as the measure of spread
If you comment on the median as the measure of location you should use
the interquartile range as the measure of spread
You should use information about the data to decide which measure of location
and measure of spread is the best to use
If data contains extreme values then it is best to use the median and
interquartile range to compare the data sets
Extreme values can cause the mean to be an unreliable statistic
It is common to be asked to compare two data sets that are represented as two
box plots on the same scale
You should write about both the median and the interquartile range
When writing a comparison about the data you should always write in context

Worked Example
The diagram below shows a box plot of the average hopping speeds, in m s⁻¹, of some Eastern Grey kangaroos living in an Australian nature reserve.

The maximum hopping speeds, in m s⁻¹, of some Red kangaroos living in the nature reserve are summarised below.

Lower quartile: 10    Median: 13    Upper quartile: 14
Minimum value: 9    Maximum value: 16

Draw a box plot on the grid to represent the data for the Red kangaroos and compare the distribution of the hopping speeds for the Red and Eastern Grey kangaroos.

Exam Tip
Remember to always write about data sets and their distributions in the context of the question. The number of marks available in a comparison question is often an indication of how much you should say in the answer.

1.3.2 Skewness

Skewness
The distribution of a data set is either symmetrical or it has skewness.
What is skewness?
Skewness describes the way in which the data in a non-symmetrical distribution leans
A distribution that has its tail on the right side has positive skew
A distribution that has its tail on the left side has negative skew

If the distribution is shown on a box plot, comparing the distance from the median to each quartile can help you decide how it is skewed
If the median is closer to the lower quartile then the distribution has positive skew
$Q_3 - Q_2 > Q_2 - Q_1$
If the median is closer to the upper quartile then the distribution has negative skew
$Q_3 - Q_2 < Q_2 - Q_1$

Looking at the values of the statistics can help you decide whether a distribution is positively skewed or negatively skewed
In a positively skewed distribution
Mode < median < mean
In a negatively skewed distribution
Mean < median < mode

Exam Tip
You only need to be able to recognise the different types of skewness. It can also help to comment on skewness when asked to compare distributions.
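The two ways of recognising skewness described in this section can be sketched in Python. The function names are hypothetical, and both checks are rough guides: the mode, median and mean ordering in particular is a typical pattern rather than a rule that holds for every data set. The quartiles used are those of the Red kangaroos' hopping speeds from the worked example in 1.3.1; the other data are hypothetical.

```python
from statistics import mean, median, mode

def skew_from_quartiles(q1, q2, q3):
    """Compare the gaps between the median and each quartile."""
    upper_gap, lower_gap = q3 - q2, q2 - q1
    if upper_gap > lower_gap:
        return "positive"   # median closer to the lower quartile
    if upper_gap < lower_gap:
        return "negative"   # median closer to the upper quartile
    return "none (equal gaps)"

def skew_from_averages(data):
    """Compare mode, median and mean; a rough guide only, as exceptions exist."""
    mo, me, mu = mode(data), median(data), mean(data)
    if mo < me < mu:
        return "positive"
    if mu < me < mo:
        return "negative"
    return "unclear"

# Red kangaroos: 14 - 13 = 1 is less than 13 - 10 = 3, so this prints negative
print(skew_from_quartiles(10, 13, 14))

# Hypothetical data with a long right-hand tail, and its mirror image
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]
print(skew_from_averages(data))                     # mode 2, median 3, mean 4: prints positive
print(skew_from_averages([11 - x for x in data]))   # mode 9, median 8, mean 7: prints negative
```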
