Data Presentation & Interpretation
AS Maths CIE
CONTENTS
1.1 Statistical Measures
1.1.1 Basic Statistical Measures
1.1.2 Frequency Tables
1.1.3 Standard Deviation & Variance
1.1.4 Coding
1.2 Representation of Data
1.2.1 Data Presentation
1.2.2 Stem and Leaf Diagrams
1.2.3 Box Plots & Cumulative Frequency
1.2.4 Histograms
1.3 Working with Data
1.3.1 Interpreting Data
1.3.2 Skewness
© 2015-2021 Save My Exams, Ltd. · Revision Notes, Topic Questions, Past Papers
Head to savemyexams.co.uk for more awesome resources
It’s a good idea to be aware of the advantages and disadvantages of each average
The mean uses all of the data values. This is good for a large data set where all of the values are close together, but it also means that the mean can be affected by extreme values
The median is not affected by very high or low values, so it is a good average to use in data sets with extreme values
The mode is very useful in a lot of practical situations. However, there may often be more than one mode, no mode, or even a mode that is nowhere near the middle of the data set
Worked Example
For the data set given below, find the mode, median and mean.
23 19 14 28 27 19
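As a quick check of the worked example above, the three averages can be computed with Python's standard `statistics` module. This is only a sketch for checking an answer; the variable names are our own, and in an exam the working would be done by hand.

```python
from statistics import mean, median, multimode

data = [23, 19, 14, 28, 27, 19]

modes = multimode(data)   # list of the most common value(s)
middle = median(data)     # even count, so the mean of the two middle values
average = mean(data)      # sum of the values divided by how many there are

print(modes)              # [19]
print(middle)             # 21.0
print(round(average, 2))  # 21.67
```

Ordering the data by hand gives 14, 19, 19, 23, 27, 28, so the median is halfway between 19 and 23, and the mean is 130 ÷ 6.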
43 29 70 31 84 56 17
53.
Exam Tip
Be aware of the difference between averages and ranges, especially when answering contextual questions asking you to describe or compare data. Remember, averages give an indication of where the data are, whilst the range gives an indication of how varied the data are.
Exam Tip
Use common sense when checking your answers. Is your mean within the range of the data? Does it seem right? A mean of 140, for example, could not be correct if the data is about the ages of students taking an exam at university.
It is only possible to calculate an estimate for the mean and the median from a grouped frequency table
Calculating the estimated mean is the same as for ungrouped frequency tables, except that you will need to find the midpoints first
the midpoint is the mean of the lower and upper class boundaries
multiply each midpoint by its corresponding frequency, find the sum of these values and then divide by n, the total frequency
Calculating the estimated median is more complicated and questions will only ask for the class that the estimated median would be in
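The steps above can be sketched in Python. The class boundaries and frequencies below are made up purely for illustration (they are not taken from any table in these notes), and the helper name `median_class` is our own.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [
    (150, 155, 3),
    (155, 160, 5),
    (160, 165, 9),
    (165, 170, 7),
    (170, 175, 1),
]

n = sum(f for _, _, f in classes)                        # total frequency
sum_xf = sum((lo + hi) / 2 * f for lo, hi, f in classes)  # midpoint x frequency
estimated_mean = sum_xf / n

def median_class(classes):
    """Return the class in which the cumulative frequency first reaches n/2."""
    half = sum(f for _, _, f in classes) / 2
    running = 0
    for lo, hi, f in classes:
        running += f
        if running >= half:
            return (lo, hi)

print(n)                      # 25
print(estimated_mean)         # 162.1
print(median_class(classes))  # (160, 165)
```

The result is only an estimate because every value in a class is treated as if it sat at the midpoint.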
Worked Example
The table below shows the heights in cm of a group of year 12 students.
Height, h Frequency
(i)
Write down the class that a student of exactly 160 cm should be added to.
(ii)
Write down the modal class.
(iii)
Calculate the estimated mean height.
Exam Tip
There can be a lot of calculations when working with grouped frequency tables. Be extra careful when using your calculator, as it is easy to make small errors with these questions. Use the table and add information to it as you go, as there will be marks available for showing your working.
How are the variance and standard deviation calculated from a frequency table?
The method for finding the variance from a frequency table is similar to that of the mean
If calculating from a grouped frequency table, find the midpoints, x, first
Multiply each x value by its corresponding frequency and use these values within the formulae
The formulae will become
$$\text{Variance} = \sigma^2 = \frac{\Sigma (x - \bar{x})^2 f}{\Sigma f} = \frac{\Sigma x^2 f}{\Sigma f} - \left( \frac{\Sigma x f}{\Sigma f} \right)^2$$
$$\text{Standard deviation} = \sigma = \sqrt{\frac{\Sigma (x - \bar{x})^2 f}{\Sigma f}} = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \left( \frac{\Sigma x f}{\Sigma f} \right)^2}$$
Worked Example
Phoom recorded the length of time, t , it took him, in minutes, to answer a
selection of further calculus exam questions. The data is summarised in the
table below.
Time (minutes)    Frequency
2 ≤ t < 4         1
4 ≤ t < 6         4
6 ≤ t < 8         7
8 ≤ t < 10        5
10 ≤ t < 12       2
12 ≤ t < 20       2
(i)
Calculate an estimate of the variance of the time taken to complete a question. Include units with your answer.
(ii)
Write down an estimate of the standard deviation of the time taken to complete a question. Include units with your answer.
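One way to check the arithmetic for this worked example is the short Python sketch below. It uses the class midpoints and the second form of the variance formula, Σx²f/Σf − (Σxf/Σf)²; the variable names are our own and this is not the official worked solution.

```python
from math import sqrt

# (lower boundary, upper boundary, frequency) for each class in the table
table = [(2, 4, 1), (4, 6, 4), (6, 8, 7), (8, 10, 5), (10, 12, 2), (12, 20, 2)]

sum_f = sum(f for _, _, f in table)
sum_xf = sum((lo + hi) / 2 * f for lo, hi, f in table)          # midpoint x frequency
sum_x2f = sum(((lo + hi) / 2) ** 2 * f for lo, hi, f in table)  # midpoint squared x frequency

variance = sum_x2f / sum_f - (sum_xf / sum_f) ** 2
std_dev = sqrt(variance)

print(sum_f, sum_xf, sum_x2f)  # 21 171.0 1611.0
print(round(variance, 1))      # 10.4  (minutes squared)
print(round(std_dev, 2))       # 3.23  (minutes)
```

Note the units: the variance is in minutes², while the standard deviation, being its square root, is in minutes.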
Exam Tip
Look out for whether a question gives or asks for the standard deviation or the variance, especially if the question is using sigma notation. Choose which formula to use wisely. Most of the time the summary statistics will be given, so only one of the formulae will be possible. On the rare occasion that you are asked to calculate directly from a table, think carefully about which version of the formula is quickest and easiest to use. It will almost always be the second version given in this revision note.
Will be the same as the coded standard deviation if the data was coded by adding or subtracting a constant only
Can be found by reversing the coding if the data was coded by multiplying or dividing by a constant only
If the data was coded by a combination of both then only the multiplying or dividing will need to be reversed to find the original standard deviation
For example, if the data, x, was coded using the formula
$y = ax + b$
then the standard deviation of the coded data, $\sigma_y$, would be
$\sigma_y = |a| \sigma_x$ (which is simply $a \sigma_x$ when a is positive)
Most questions will give you either the summary statistics or the mean of the
coded data and expect you to work with these formulae to find original
information about the data
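The effect of coding can be illustrated with a short Python sketch. The data values and the coding y = 2x − 300 are hypothetical, chosen only to show that adding or subtracting a constant shifts the mean but leaves the standard deviation unchanged, while multiplying scales both. `statistics.pstdev` is used because it divides by n, matching the formulae in these notes.

```python
from math import isclose
from statistics import mean, pstdev

x = [148, 150, 151, 153, 158]   # hypothetical original data
a, b = 2, -300                  # hypothetical coding y = ax + b
y = [a * value + b for value in x]

# The mean is affected by both the multiplication and the addition
print(mean(x), mean(y))                          # 152 4
print(mean(y) == a * mean(x) + b)                # True

# The standard deviation is affected by the multiplication only
print(isclose(pstdev(y), abs(a) * pstdev(x)))    # True

# Reversing the coding recovers the original standard deviation
original_sd = pstdev(y) / abs(a)
print(isclose(original_sd, pstdev(x)))           # True
```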
(a)
Find the mean volume in a cup of coffee.
(b)
Given that $\Sigma (c - 150)^2 = 112$, find the standard deviation of the sampled cups of coffee.
Exam Tip
Be careful when using the formulae for the mean and standard deviation with coded summary statistics. You must make sure that you use the summary statistics consistently throughout. For example, if you use the sum of the squares of the coded data in the formula for the standard deviation, you must subtract the square of the coded mean.
Worked Example
A student is collecting information on his friends’ interests and believes
that his friends who only have dogs spend more time outside than his
friends who only have cats. He has surveyed 20 friends with only cats and
20 friends with only dogs and has written down the total amount of time,
rounded to the nearest hour, each of them spent outside last week.
Describe, with a reason, which diagram would be best for the student to
use to display the data.
Exam Tip
Take the time needed when working with diagrams. They are usually ‘easy marks’ questions, but it is common for students to rush them and make silly mistakes.
The numbers in brackets indicate how many values are in that class interval
These are not always included but can be useful when there is a large amount
of data to display
How do I draw a stem and leaf diagram?
Identify the stems and the leaves
Leaves would always be single digits
e.g. the number 122 would be represented by 12 | 2
If starting from unordered data draw two diagrams
The first diagram should get the data into the right format
i.e. a list of stems with their corresponding leaves
The second diagram should have stems and leaves in order, with a key
This helps accuracy as values are less likely to be missed out
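The two-pass method described above can be sketched in Python. The function name `stem_and_leaf` and the data are hypothetical, and the sketch assumes non-negative whole numbers where the final digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: ordered leaves} for non-negative whole numbers."""
    rows = defaultdict(list)
    # First pass: put each leaf next to its stem, in the order the data arrives
    for value in values:
        stem, leaf = divmod(value, 10)   # the leaf is the final digit
        rows[stem].append(leaf)
    # Second pass: order the stems and the leaves within each stem
    return {stem: sorted(rows[stem]) for stem in sorted(rows)}

data = [122, 108, 115, 122, 131, 104, 119]   # hypothetical, unordered data
diagram = stem_and_leaf(data)

for stem, leaves in diagram.items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 12 | 2 means 122")
```

Note that this sketch only lists stems that have at least one leaf; on a hand-drawn diagram any empty stems between the smallest and largest should still be written in.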
What are stem and leaf diagrams used for?
The data is arranged into classes so at a glance it is possible to see the modal
class interval
As the data is in order the median, quartiles, maximum and minimum can be
identified easily
Check you can do this – find the minimum, maximum, median and upper and
lower quartiles from the stem and leaf diagram at the start of this revision
note
Note that these five values are those needed in order to construct a box-and-whisker diagram (box plot)
Outliers, once defined, can be easily identified and removed
Note that the leaves on the left-hand side of the stems (Boys) increase from the centre outwards
Are there any variations on stem and leaf diagrams?
There are a few minor variations on stem and leaf diagrams that you may see
online or in different textbooks
Some or all the different/extra features in the diagram above may appear
These differences can be applied to back-to-back stem and leaf diagrams
With large amounts of data, the stems may be split into two rows
Every stem will be listed twice
The first row for a stem will contain leaves 0 - 4
The second row will contain leaves 5 - 9
What might I be asked to do with a stem and leaf diagram?
You may be asked to draw or complete a stem and leaf diagram
Find statistical measures – median, quartiles and interquartile range in particular
From which you may be required to draw a box-and-whisker diagram
Identify and remove outliers
Compare data shown by stem and leaf diagrams (either separate or back-to-back); comment on two things, and each comment should be in terms of both the maths and the context of the question
a comment about average (use median)
e.g. the girls’ median of 88% was higher than the boys’ median of 65% so on
average the girls performed better on the test
a comment about variation (spread) (use interquartile range)
e.g. the girls’ interquartile range of 30% was greater than the boys’ 15% so the
boys had more consistent scores on the test
Analyse what would happen to statistical measures such as the median and
quartiles if a value changed or a new value were to be added to the data
(a)
Compare the times taken to complete the level between the children and the
adults.
(b)
It is later discovered two of the adults’ times had been omitted from the diagram –
times of 23 and 42 seconds.
Briefly explain whether adding these times would change the adults’ median time.
Exam Tip
Accuracy is important
(Lightly) tick off values as you add them to a stem and leaf diagram
Check you have the right number of data values in total on your diagram
Other checks can include ensuring the median has the same number of values either side of it
A box plot is a graph that clearly shows key statistics from a data set
It shows the median, quartiles, minimum and maximum values and outliers
It does not show any other individual data items
The middle 50% of the data will be represented by the box section of the graph and
the lower and upper 25% of the data will be represented by each of the whiskers
Only one axis is used when graphing a box plot
It is still important to make sure the axis has a clear, even scale and is labelled
with units
Box plots are often used for comparing two sets of data
Both box plots will be drawn one above the other on the same scale on the x-axis
They are useful for comparing data because it is easy to see the main shape
of the distribution of the data from a box plot
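The five values a box plot needs can be computed with a short Python sketch. The data are hypothetical, and the quartile convention is an assumption: here the quartiles are taken as the medians of the lower and upper halves of the ordered data, leaving out the overall median when the number of values is odd. Other conventions exist and can give slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle value when n is odd so both halves are the same size
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])

lengths = [12, 15, 17, 19, 21, 22, 24, 27, 30]   # hypothetical data, in cm
minimum, q1, q2, q3, maximum = five_number_summary(lengths)

print(minimum, q1, q2, q3, maximum)  # 12 16.0 21 25.5 30
print(maximum - minimum)             # range: 18
print(q3 - q1)                       # interquartile range: 9.5
```

The box would be drawn from 16 to 25.5 with a line at the median, 21, and the whiskers would extend to 12 and 30.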
Worked Example
The incomplete box plot below shows the tail lengths in cm of some
students’ pets.
(i)
Given that the median tail length was 21 cm, complete the box plot.
(ii)
Find the range and interquartile range of the tail lengths.
Exam Tip
Remember that a box plot is a graph and should be treated like one, even though there is only one axis. It should have a title and a clear, even scale that is labelled with units if there are any. If drawing two box plots on the same axis, label each one clearly.
(i)
Given that the group 40 < l ≤ 45 was one of the groups used in the data
collection, find the number of puppies that were in this group.
(ii)
Use the graph to find an estimate for the interquartile range of the lengths of the puppies.
(iii)
x% of the puppies are longer than 53.5 cm. Use your graph to find an estimate for the value of x.
Exam Tip
If you are asked to read values from your graph, make sure you use a ruler and mark the lines on clearly to show where you took your readings from. Remember that the graph shows the accumulated frequencies, so if you need only the frequency of a class you may need to subtract the previous value.
$$\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$$
Step 4. The histogram will be drawn with the data values on the x-axis and frequency density on the y-axis
Remember that the scale on both axes must be even, although the class widths may be uneven
Both axes should be clearly labelled and units included on the x-axis
Most often, the bars will have different widths
How do we interpret a histogram?
It is important to remember that the y-axis does not tell us the frequency of each bar in the histogram
The frequency of a class is found by
Frequency = Frequency Density × Class Width
You may be asked to find the frequency of part of a bar within a histogram
Find the area of that section of the bar using any information you have already
found out
4 ≤ m < 8 4
8 ≤ m < 10 15
10 ≤ m < 12 19
12 ≤ m < 15 9
15 ≤ m < 30 6
(a)
Complete the histogram.
(b)
Estimate the number of dolphins whose weight is greater than 13 kg.
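The frequency densities and the part-bar estimate can be checked with the Python sketch below. It uses only the five classes visible in the table above (any rows missing from the table are not included), and it assumes values are spread evenly within each class. The function names are our own, and this is not the official worked solution.

```python
# (lower boundary, upper boundary, frequency) for the classes shown in the table
classes = [(4, 8, 4), (8, 10, 15), (10, 12, 19), (12, 15, 9), (15, 30, 6)]

def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)

densities = [frequency_density(*c) for c in classes]
print(densities)  # [1.0, 7.5, 9.5, 3.0, 0.4]

def estimate_above(classes, threshold):
    """Estimate how many values exceed the threshold using bar areas."""
    total = 0.0
    for lower, upper, frequency in classes:
        if threshold <= lower:
            total += frequency   # the whole bar lies above the threshold
        elif threshold < upper:
            # Only part of the bar counts: area = density x remaining width
            total += frequency_density(lower, upper, frequency) * (upper - threshold)
    return total

print(estimate_above(classes, 13))  # 12.0
```

For the 12 ≤ m < 15 bar, the section above 13 kg has width 2 and height 3, giving an area of 6; adding the 6 dolphins in the final class gives the estimate of 12.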
Adding or removing extreme values will change the value of the mean by a lot but affect the median in the same way as any other value
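A small Python sketch, using hypothetical data, shows how differently the mean and the median react when one extreme value is added.

```python
from statistics import mean, median

values = [21, 23, 24, 26, 26]   # hypothetical data
with_extreme = values + [140]   # the same data with one extreme value added

print(mean(values), median(values))   # 24 24
print(round(mean(with_extreme), 2))   # 43.33
print(median(with_extreme))           # 25.0
```

The single extreme value moves the mean from 24 to about 43, while the median only shifts from 24 to 25, just as it would if any other value above the median had been added.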
Comparing two data sets
When comparing two data sets you must comment on both:
A measure of location such as the mean, median or mode and
A measure of spread such as the range, interquartile range, variance or
standard deviation
If you comment on the mean as the measure of location you should use
the standard deviation or variance as the measure of spread
If you comment on the median as the measure of location you should use
the interquartile range as the measure of spread
You should use information about the data to decide which measure of location
and measure of spread is the best to use
If data contains extreme values then it is best to use the median and
interquartile range to compare the data sets
Extreme values can cause the mean to be an unreliable statistic
It is common to be asked to compare two data sets that are represented as two
box plots on the same scale
You should write about both the median and the interquartile range
When writing a comparison about the data you should always write in context
Draw a box plot on the grid to represent the data for the Red Kangaroos
and compare the distribution of the hopping speeds for the Red and Eastern
Grey kangaroos.
Exam Tip
Remember to always write about data sets and their distributions in the context of the question. The number of marks available in a comparison question is often an indication of how much you should say in the answer.
If the distribution is shown on a box plot, looking at the difference between the quartiles can help decide how it is skewed
If the median is closer to the lower quartile then the distribution has positive skew
$Q_3 - Q_2 > Q_2 - Q_1$
If the median is closer to the upper quartile then the distribution has negative skew
$Q_3 - Q_2 < Q_2 - Q_1$
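The quartile comparison above can be written as a small Python function. The function name is our own, the example quartiles are hypothetical, and the "symmetrical" label for equal gaps is our addition rather than something stated in these notes.

```python
def skew_from_quartiles(q1, q2, q3):
    """Classify skew by comparing the gaps either side of the median (q2)."""
    upper_gap = q3 - q2
    lower_gap = q2 - q1
    if upper_gap > lower_gap:
        return "positive skew"   # median is closer to the lower quartile
    if upper_gap < lower_gap:
        return "negative skew"   # median is closer to the upper quartile
    return "symmetrical"         # equal gaps (our own label)

print(skew_from_quartiles(10, 12, 20))  # positive skew
print(skew_from_quartiles(10, 18, 20))  # negative skew
print(skew_from_quartiles(10, 15, 20))  # symmetrical
```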
Looking at the values of the statistics can help you decide whether a distribution is positively skewed or negatively skewed
In a positively skewed distribution
Mode < median < mean
In a negatively skewed distribution
Mean < median < mode
Exam Tip
You only need to be able to recognise the different types of skewness. It can also help to comment on skewness when asked to compare distributions.