S4 Week 7 Exam Prep 2 Statistics
DIFFERENT TYPES OF DATA
Numerical data is data in number form. It can be an amount, a measurement, a
time or a score. Numerical data is also called quantitative data (from the word
quantity).
Tally tables
Tallies are little marks (////) that you use to keep a record of items you
count. Each time you count five items you draw a line across the previous
four tallies to make a group of five. Grouping tallies in fives makes it much
easier to count and get a total when you need one.
A tally table is used to keep a record when you are counting things.
Look at this tally table. A student used this to record how many cars of each
colour there were in a parking lot. He made a tally mark in the second
column each time he counted a car of a particular colour.
2. FREQUENCY TABLES
A frequency table shows the totals of the tally marks. Some frequency
tables include the tallies.
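As a rough illustration, tallying and then totalling can be sketched in Python. The list of car colours below is invented for the example (it is not the student's parking-lot data), and `tally_marks` is a hypothetical helper that draws a group of five as `||||/`.

```python
from collections import Counter

# Hypothetical observations: one entry per car counted (not the original data).
cars = ["red", "white", "blue", "white", "red", "white", "black", "blue", "white"]

# Counter plays the role of the tally column: one mark per observation.
frequency = Counter(cars)

def tally_marks(count):
    """Render a count as tallies grouped in fives, e.g. 7 -> '||||/ ||'."""
    groups, rest = divmod(count, 5)
    return " ".join(["||||/"] * groups + (["|" * rest] if rest else []))

for colour, count in frequency.most_common():
    print(f"{colour:<8}{tally_marks(count):<12}{count}")

# The total written at the bottom of the frequency column.
total = sum(frequency.values())
print(f"{'Total':<20}{total}")
```

The total at the bottom is a useful check: it should equal the number of items that were counted.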
The frequency table has space to write a total at the bottom of the
frequency column. This helps you to know how many pieces of
data were collected. In this example the student recorded the
colours of 157 cars.
Here is a frequency table without tallies. It was drawn up by the
staff at a clinic to record how many people were treated for
different diseases in one week.
3. GROUPING DATA IN CLASS INTERVALS
Sometimes numerical data needs to be recorded in different
groups.
To simplify things, the collected data can be arranged in groups called
class intervals.
A frequency table with results arranged in class intervals is called a
grouped frequency table.
The range of scores (40–84) has been divided into class intervals.
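A minimal sketch of building a grouped frequency table follows. The scores are invented for illustration, and the class width of 5 (giving 40-44, 45-49 and so on) is an assumption, since the original table is not reproduced here.

```python
from collections import Counter

# Hypothetical test scores in the range 40-84 (invented for illustration).
scores = [40, 47, 52, 55, 58, 61, 63, 63, 66, 70, 72, 75, 79, 81, 84]

WIDTH = 5      # assumed class width
LOWEST = 40    # lower bound of the first class interval

def class_interval(score):
    """Return the (lower, upper) bounds of the class interval containing score."""
    lower = LOWEST + ((score - LOWEST) // WIDTH) * WIDTH
    return (lower, lower + WIDTH - 1)

# Count how many scores fall in each class interval.
grouped = Counter(class_interval(s) for s in scores)

for (lower, upper), freq in sorted(grouped.items()):
    print(f"{lower}-{upper}: {freq}")
```

Every score falls in exactly one class interval, so the frequencies still add up to the number of scores collected.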
STEM AND LEAF DIAGRAMS
A stem and leaf diagram is a special type of table that allows you
to organise and display grouped data using the actual data values.
In a stem and leaf diagram each data item is broken into two parts:
a stem and a leaf.
The final digit of each value is the leaf and the previous digits are
the stem. The stems are written to the left of a vertical line and the
leaves are written to the right of the vertical line.
For example a score of 13 would be shown as:
Stem Leaf
1 | 3
In this case, the tens digit is the stem and the units digit is the leaf. A
larger data value such as 259 would be shown as:
Stem Leaf
25 | 9
In this case, the stem represents both the tens and the hundreds digits
while the units digit is the leaf.
WORKED EXAMPLE 1
This data set shows the ages of customers using an internet café.
34 23 40 35 25 28 18 32 37 29 19 17 32 55 36 42 33 20 25 34 48 39 36 30
Draw a stem and leaf diagram to display this data.
STEPS:
1. Group the ages in intervals of ten, 10 – 19; 20 – 29 and so on. These are
two-digit numbers, so the tens digit will be the stem.
2. List the stems in ascending order down the left of the diagram.
3. Work through the data in the order it is given, writing the units digits (the
leaves) in a row next to the appropriate stem.
4. If you need to work with the data, you can redraw the diagram, putting
the leaves in ascending order.
5. From this re-organised stem and leaf diagram you can quickly see that:
• the youngest person using the internet café was 17 years old
• the oldest person was 55 (the last data item)
• most users were in the age group 30 – 39 (the group with the largest
number of leaves).
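The steps above can be sketched in Python using the ages from the worked example. The function name `stem_and_leaf` and the printed key are illustrative choices, not part of the original slides.

```python
from collections import defaultdict

ages = [34, 23, 40, 35, 25, 28, 18, 32, 37, 29, 19, 17,
        32, 55, 36, 42, 33, 20, 25, 34, 48, 39, 36, 30]

def stem_and_leaf(values, ordered=True):
    """Map each stem (tens digit and above) to its list of leaves (units digit)."""
    diagram = defaultdict(list)
    for value in values:              # step 3: work through the data as given
        stem, leaf = divmod(value, 10)
        diagram[stem].append(leaf)
    if ordered:                       # step 4: put the leaves in ascending order
        for leaves in diagram.values():
            leaves.sort()
    return dict(sorted(diagram.items()))   # step 2: stems in ascending order

diagram = stem_and_leaf(ages)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
print("Key: 1 | 7 = 17 years")
```

The row with the most leaves is stem 3, which matches the observation that most users were aged 30 – 39.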
A back-to-back stem and leaf diagram is used to show two sets of data.
The second set of data is plotted against the same stem, but the leaves
are written to the left. This stem and leaf plot compares the battery life
of two different brands of mobile phone.
4. TWO-WAY TABLES
A two-way table shows the frequency of certain results for two
or more sets of data. Here is a two-way table showing how many
men and women drivers were wearing their seat belts when they
passed a checkpoint.
Here are two more examples of two-way tables:
Drinks and crisps sold at a school tuck shop during lunch break
5. PIE CHARTS
WORKED EXAMPLE 1
AVERAGE
‘Average’ is a word which in general use is taken to mean
somewhere in the middle.
If you count how many size fours, how many size fives and
so on, you will find that the most common (most frequent)
shoe size in the class is six.
What most people think of as the average is the value you get when you
add up all the shoe sizes and divide your answer by the number of
students:
$\frac{\text{total of shoe sizes}}{\text{number of students}} = \frac{115}{19} = 6.05 \text{ (2 d.p.)}$
This average is called the mean. The mean value tells you that the shoe
sizes appear to be spread in some way around the value 6.05. It also gives
you a good impression of the general ‘size’ of the data.
DIFFERENT TYPES OF AVERAGE
Another measure of central tendency is the middle value when the shoe
sizes are arranged in ascending order:
3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 11
If you now think of the first and last values as one pair, the second and
second to last as another pair, and so on, you can cross these numbers off
and you will be left with a single value in the middle.
3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 11
This middle value (in this case six) is known as the median.
What if there had been 20 students in the class?
For example, add an extra student with a shoe size of 11.
Crossing off pairs gives this result:
3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 11, 11
You are left with a middle pair rather than a single value (here the 10th
and 11th values, 6 and 6).
If this happens then you simply find the mean of this middle pair:
$\text{median} = \frac{6 + 6}{2} = 6$
SUMMARY
Mode - The value that appears in your list more than any other. There can
be more than one mode, but if no value occurs more often than any other
then there is no mode.
Mean - $\text{mean} = \frac{\text{sum of values}}{\text{number of values}}$. The mean may not be one
of the actual data values.
Median -
1. Arrange the data into ascending numerical order.
2. If the number of data is n and n is odd, find $\frac{n+1}{2}$ and this will
give you the position of the median.
3. If n is even, then calculate $\frac{n}{2}$ and this will give you the position of
the first of the middle pair. Find the mean of this pair.
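The summary can be sketched as three small Python functions and checked against the shoe-size data. The position rules used for the median, (n + 1)/2 for odd n and n/2 for even n, are the usual ones and are assumed here; the function names are illustrative.

```python
from collections import Counter

def mean(data):
    """Sum of the values divided by the number of values."""
    return sum(data) / len(data)

def median(data):
    ordered = sorted(data)                 # step 1: ascending order
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2            # step 2: position of the median
        return ordered[position - 1]       # positions count from 1
    position = n // 2                      # step 3: first of the middle pair
    return (ordered[position - 1] + ordered[position]) / 2

def modes(data):
    """Return every most-frequent value, or [] when no value stands out."""
    counts = Counter(data)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []                          # no value occurs more often than the rest
    return sorted(v for v, c in counts.items() if c == highest)

shoe_sizes = [3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 11]
print(round(mean(shoe_sizes), 2), median(shoe_sizes), modes(shoe_sizes))
```

`modes` returns a list because a data set can have more than one mode, or none at all.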
WORK EXAMPLE
a. i. Find the mean, median and mode of the data listed below.
1, 0, 2, 4, 1, 2, 1, 1, 2, 5, 5, 0, 1, 2, 3
MEAN
$\text{mean} = \frac{1+0+2+4+1+2+1+1+2+5+5+0+1+2+3}{15} = \frac{30}{15} = 2$
MEDIAN
Arrange all the data in order and then pick out the middle number.
0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 5, 5
There are 15 values, so the median is the 8th value, which is 2.
MODE
The mode is the number which appears most often.
Therefore the mode is 1.
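One way to check answers like these is with Python's standard `statistics` module. This is only a quick check of the three results, not a replacement for the written working.

```python
import statistics

data = [1, 0, 2, 4, 1, 2, 1, 1, 2, 5, 5, 0, 1, 2, 3]

mean = statistics.mean(data)      # total of 30 divided by 15 values
median = statistics.median(data)  # 8th value of the ordered list
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```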
WORK EXAMPLE
$\text{mean} = \frac{168}{32} = 5.25 \qquad \text{median} = \frac{5 + 6}{2} = 5.5 \qquad \text{mode} = 7$
ii. Range = 10 – 0 = 10
STEM-AND-LEAF DIAGRAM
WORKED EXAMPLE 1
The ordered stem and leaf diagram shows the number of customers served at a
supermarket checkout every half hour during an 8-hour shift.
Stem Leaf
0 | 2 5 5 6 6 6 6
1 | 1 3 3 5 5 6 7 7
2 | 1
Find:
a. The mean number of customers
b. The median number of customers
c. The modal number of customers
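A possible way to work from the diagram is to rebuild the data values first. The sketch below assumes the key 1 | 3 = 13 customers, so each value is stem × 10 + leaf; the results are computed from the diagram and are not quoted from the original slides.

```python
import statistics

# Ordered stem and leaf diagram from the worked example
# (assumed key: 1 | 3 = 13 customers).
diagram = {
    0: [2, 5, 5, 6, 6, 6, 6],
    1: [1, 3, 3, 5, 5, 6, 7, 7],
    2: [1],
}

# Rebuild each data value as stem * 10 + leaf.
values = [stem * 10 + leaf for stem, leaves in diagram.items() for leaf in leaves]

mean = statistics.mean(values)
median = statistics.median(values)   # mean of the 8th and 9th values
mode = statistics.mode(values)

print(len(values), mean, median, mode)
```

There are 16 values, one for each half hour of the 8-hour shift, so the median lies between the 8th and 9th values.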
WORKED EXAMPLE 3 (ANSWER)
The marks obtained by 100 students in a test were as follows:
Find:
a. The mean mark
$\text{Mean} = \frac{\sum xf}{\sum f}$, where $\sum xf$ means ‘the sum of the products’
and $\sum f$ means ‘the sum of the frequencies’.
$\text{Mean} = \frac{(0 \times 4) + (1 \times 19) + (2 \times 25) + (3 \times 29) + (4 \times 23)}{100} = \frac{248}{100} = 2.48$
b. The median mark
There are 100 marks, so the median is the value halfway between the 50th
and 51st marks. By inspection, both the 50th and 51st marks are 3.
$\therefore \text{Median} = 3 \text{ marks}$
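The same working can be sketched in Python. The marks (0 to 4) and their frequencies are read off the calculation in part (a); `value_at` is a hypothetical helper that uses running totals to find which mark sits at a given position.

```python
from itertools import accumulate

# Marks and frequencies read off the calculation in part (a).
marks = [0, 1, 2, 3, 4]
frequencies = [4, 19, 25, 29, 23]

total_f = sum(frequencies)                                  # sum of f
total_xf = sum(x * f for x, f in zip(marks, frequencies))   # sum of xf
mean = total_xf / total_f

# Running totals show which mark each ordered position falls on.
cumulative = list(accumulate(frequencies))

def value_at(position):
    """Return the mark in the given 1-based position of the ordered data."""
    for mark, running_total in zip(marks, cumulative):
        if position <= running_total:
            return mark
    raise ValueError("position is beyond the end of the data")

median = (value_at(50) + value_at(51)) / 2
print(mean, cumulative, median)
```

The running totals make the "by inspection" step explicit: the first 48 marks are 2 or less, so the 50th and 51st marks must both be 3.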
Find an estimate for the mean height of the children, the modal class,
the median class and an estimate for the range.
WORKED EXAMPLE 1 (ANSWER CONT…)
So, extend your table to include midpoints and then totals for each class:
Height in cm (h)   Frequency (f)   Midpoint   Frequency × midpoint
                   12              124.5      1494
                   16              134.5      2152
                   38              144.5      5491
                   24              154.5      3708
                   10              164.5      1645
                   Total = 100                Total = 14490
Estimated mean height = 14490 ÷ 100 = 144.9 cm
Cumulative frequencies: 12; 16 + 12 = 28; 38 + 28 = 66; 24 + 66 = 90; 10 + 90 = 100.
Median class: the middle of the 100 values lies between the 50th and 51st values, which
fall in the third class (frequency 38, cumulative frequency 66).
The class with the highest frequency is the modal class. In this case it is the same class as the
median class.
The shortest child could be as small as 120 cm and the tallest could be as tall as 170 cm. The best
estimate of the range is, therefore, 170 − 120 = 50 cm.
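A rough sketch of the grouped-data estimates follows. It uses the five frequencies from the table and assumes the class midpoints are evenly spaced 10 cm apart (124.5, 134.5, 144.5, 154.5, 164.5). Classes are identified by index (0 for the first class) because the class labels are not reproduced here.

```python
from itertools import accumulate

# Frequencies from the worked example; midpoints assumed to be evenly spaced.
frequencies = [12, 16, 38, 24, 10]
midpoints = [124.5, 134.5, 144.5, 154.5, 164.5]

total_f = sum(frequencies)
total_fx = sum(f * m for f, m in zip(frequencies, midpoints))
estimated_mean = total_fx / total_f

cumulative = list(accumulate(frequencies))

# Median class: first class whose cumulative frequency reaches the middle position.
middle = (total_f + 1) / 2
median_class = next(i for i, cf in enumerate(cumulative) if cf >= middle)

# Modal class: the class with the highest frequency.
modal_class = frequencies.index(max(frequencies))

print(total_fx, estimated_mean, cumulative, median_class, modal_class)
```

The mean is only an estimate because each child is treated as if their height were exactly the midpoint of their class.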
WORKED EXAMPLE 2
The history test scores for a group of 40 students are shown in the
grouped frequency table below.
Find:
a. The mean mark
b. The median mark
c. The modal mark
WORKED EXAMPLE 2
The marks obtained by 100 students in a test were as follows:
Find:
a. The mean mark
b. The median mark
c. The modal mark
HISTOGRAMS
HISTOGRAMS WITH EQUAL CLASS INTERVALS
WORK EXAMPLE 1
The table and histogram below show the heights of trees in a sample from a forestry site.
e. Why is there a gap between the columns on the right-hand side of the graph?
The frequency for the class interval is zero, so no bar is drawn.
WORK EXAMPLE 2
Joy-Anne did an experiment in her class to see what mass of raisins (in grams) the
students could hold in one hand. Here are her results.
a. Using the class intervals 16–20, 21–25, 26–30 and 31–35 draw a grouped
frequency table.
b. What is the modal class (the mode) of this data?
c. Draw a histogram to show her results.
WORK EXAMPLE 2 (ANSWER)
Joy-Anne did an experiment in her class to see what mass of raisins (in grams) the
students could hold in one hand. Here are her results.
a. Using the class intervals 16–20, 21–25, 26–30 and 31–35 draw a grouped
frequency table.
Count the number in each class to fill in
the table.
WORKED EXAMPLE 1
Here is a table showing the heights of 25 plants.
Next draw the axes. You will need to decide on a suitable scale for both the
horizontal and the vertical axes.
WORKED EXAMPLE 1 (ANSWER CONT…)
Here, 1 cm has been used to represent 10 cm on the horizontal axis (label
height in cm) and 2 cm per unit on the vertical axis (label frequency density).
Once you have done this, draw the histogram, paying careful attention to the
scales on the axes.
CUMULATIVE FREQUENCY
WORKED EXAMPLE 3
The examination marks of 300 students are summarised in the table.
WORKED EXAMPLE 4 (ANSWER)
c. the number of students who took less than 10 minutes to get to school
= 4
d. the number of students who had journey times greater than 30 minutes
Subtract the cumulative frequency at 30 minutes, 18, from the total frequency.
50 − 18 = 32
e. the number of students who took between 40 minutes and one hour to get to
school
Subtract the cumulative frequency at 40 minutes, 28, from that at 60 minutes, 42.
42 − 28 = 14
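The subtractions in parts (c) to (e) can be sketched as follows, using only the cumulative frequencies quoted in the answers and a total frequency of 50. The helper names `more_than` and `between` are illustrative.

```python
# Cumulative frequencies quoted in the worked example (journey time in minutes).
cumulative = {10: 4, 30: 18, 40: 28, 60: 42}
total = 50  # total frequency

def more_than(t):
    """Students whose journey took longer than t minutes."""
    return total - cumulative[t]

def between(t1, t2):
    """Students whose journey took more than t1 and at most t2 minutes."""
    return cumulative[t2] - cumulative[t1]

print(cumulative[10], more_than(30), between(40, 60))
```

A cumulative frequency always counts everything up to a value, so "more than" and "between" questions are answered by subtraction.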
WORKED EXAMPLE 5
Twenty bean seeds were planted for a biology experiment. The heights of
the plants were measured after three weeks and recorded as below.
$\text{Mean} = \frac{\sum fx}{\sum f}$
$\text{Mean height} = \frac{132}{20} = 6.6 \text{ cm}$
WORKED EXAMPLE 5 (ANSWER)
b. Draw a cumulative frequency curve and find an estimate for the
median height
$\text{Median height} = 7.0 \text{ cm}$
QUARTILES AND THE INTERQUARTILE RANGE
QUARTILES
2 5 6 7 8 12 14 16 20 21 30
median
We consider the 5 values on the left of the median. The middle value of these 5
values is 6 and it is called the lower quartile or the first quartile.
2 5 6 7 8 12 14 16 20 21 30
Lower half: 2 5 6 7 8; lower quartile = 6; median = 12
The first quartile can be considered as the ‘first-quarter value’.
25% (or one quarter) of the data is less than or equal to this value.
Since the median is the middle value or ‘second-quarter value’, the median is also
called the second quartile. 50% (or half) of the data is less than or equal to this
value.
We now consider the 5 values on the right of the median. The middle value of these 5
values is 20 and it is called the upper quartile or the third quartile. 75% (or three
quarters) of the data is less than or equal to this value.
2 5 6 7 8 12 14 16 20 21 30
Upper half: 14 16 20 21 30; median = 12; upper quartile = 20
We see that the quartiles obtained by the above method divide the data, which is
arranged in ascending order, into 4 roughly equal parts.
2 5 6 7 8 12 14 16 20 21 30
Lower quartile = 6, median = 12, upper quartile = 20
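The method above (find the median, then find the middle of each half) can be sketched in Python. The function names are illustrative. When the number of values is odd, the median itself is left out of both halves, as in the example.

```python
def median(ordered):
    """Middle value of an already ordered list (mean of the middle pair if even)."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]          # values to the left of the median
    upper_half = ordered[(n + 1) // 2 :]    # values to the right of the median
    return median(lower_half), median(ordered), median(upper_half)

data = [2, 5, 6, 7, 8, 12, 14, 16, 20, 21, 30]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

For these 11 values each half contains 5 values, so each quartile is an actual data value.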
RANGE AND THE INTERQUARTILE RANGE
The figure shows the range and interquartile range for the data values in Set A. The
median, range and interquartile range are indicated in the dot diagram as shown.
[Figure: dot diagram of Set A on a number line from 0 to 30, marking Q1 = 6, median = 12
and Q3 = 20, with the range and interquartile range indicated.]
These measures of spread show the degree of variation or how ‘spread out’ the data
values are.
For Set A,
range = 30 − 2 = 28
interquartile range = Q3 − Q1 = 20 − 6 = 14
WORK EXAMPLE: FINDING AND INTERPRETING THE RANGE AND INTERQUARTILE
RANGE FOR A SET OF DISCRETE DATA WITH AN EVEN NUMBER OF DATA VALUES
The data below shows the marks for a multiple choice quiz with 20
questions, taken by 8 students.
10 12 12 13 9 17 11 14
(i) For the given set of data, find $Q_1$, $Q_2$ and $Q_3$.
(ii) Find the range.
(iii) Find the interquartile range.
(i) Arranging the given data in ascending order:
9 10 11 12 12 13 14 17
$Q_2 = \frac{12 + 12}{2} = 12 \qquad Q_1 = \frac{10 + 11}{2} = 10.5 \qquad Q_3 = \frac{13 + 14}{2} = 13.5$
(ii) Range = 17 – 9 = 8
(iii) Interquartile range = $Q_3 - Q_1$
= 13.5 – 10.5
= 3
PERCENTILES AND QUARTILES
QUARTILES
Two very important percentiles are the lower and upper quartiles.
These lie 25% and 75% of the way through the data respectively.
Use the following rules to estimate the positions of each quartile within
a set of ordered data (n is the number of data values):
$Q_1$ = value in position $\frac{1}{4}(n+1)$
$Q_2$ = value in position $\frac{1}{2}(n+1)$
$Q_3$ = value in position $\frac{3}{4}(n+1)$
If the position does not turn out to be a whole number, you simply find
the mean of the pair of numbers on either side.
For example, if the position of the lower quartile turns out to be 5.25,
then you find the mean of the 5th and 6th values.
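A minimal sketch of the position approach follows. It assumes the quartile positions are (n + 1)/4, (n + 1)/2 and 3(n + 1)/4, which is consistent with the positions used in the worked examples. `value_at` is a hypothetical helper that applies the "mean of the pair on either side" rule.

```python
import math

def value_at(ordered, position):
    """Value at a 1-based position; for a fractional position,
    take the mean of the values on either side."""
    below = math.floor(position)
    if position == below:
        return ordered[below - 1]
    return (ordered[below - 1] + ordered[below]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) using the assumed (n + 1) position rules."""
    ordered = sorted(data)
    n = len(ordered)
    positions = [(n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4]
    return tuple(value_at(ordered, p) for p in positions)

quiz_marks = [10, 12, 12, 13, 9, 17, 11, 14]
q1, q2, q3 = quartiles(quiz_marks)
print(q1, q2, q3, q3 - q1)
```

For the 8 quiz marks the positions are 2.25, 4.5 and 6.75, so each quartile is the mean of a neighbouring pair of values.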
INTERQUARTILE RANGE
As with the range, the interquartile range gives a measure of how
spread out or consistent the data is.
The main difference is that the interquartile range (IQR) avoids using
extreme data by finding the difference between the lower and upper
quartiles. You are, effectively, measuring the spread of the central
50% of the data.
If one set of data has a smaller IQR than another set, then the
first set is more consistent and less spread out.
This can be a useful comparison tool.
WORK EXAMPLE
Notice that these are not whole numbers, so the lower quartile will be the mean of the 3rd and 4th
values, and the upper quartile will be the mean of the 9th and 10th values.
$Q_1 = 7 \qquad Q_3 = 12.5$
These are whole numbers, so the lower quartile is in position two and the upper
quartile is in position six.
So $Q_1$ = second position = 9 and $Q_3$ = sixth position = 15.
Thus, the IQR = 15 – 9 = 6.
QUARTILES (GROUPED DATA)
A pet shop owner weighs his mice every week to check their health.
The weights of the 80 mice are shown below:
Weight (g)      Frequency (f)   Cumulative frequency
0 < w ≤ 10      3               3
10 < w ≤ 20     5               8
20 < w ≤ 30     5               13
30 < w ≤ 40     9               22
40 < w ≤ 50     11              33
50 < w ≤ 60     15              48
60 < w ≤ 70     14              62
70 < w ≤ 80     8               70
80 < w ≤ 90     6               76
90 < w ≤ 100    4               80
[Figure: cumulative frequency graph for the mice. Horizontal axis: Weight (g), 0 to 100;
vertical axis: Cumulative frequency.]
1. The cumulative frequency (c.f.) can now be plotted on a graph, taking care to plot
the c.f. at the end of each class interval. This is because we don’t know where in
the class interval 0 < w ≤ 10 the values are, but we do know that by the end of the
class interval there are 3 pieces of data.
2. The points are now joined with straight lines. The line always starts at the bottom
of the first class interval. The resulting graph should look like this and is sometimes
called an ‘S’ curve.
From this graph we can now find estimates of the median, and upper and lower
quartiles.
There are 80 pieces of data.
The middle is the 40th piece of data.
The lower quartile is the 20th piece of data (¼ of the total pieces of data).
The upper quartile is the 60th piece of data (¾ of the total pieces of data).
[Figure: the cumulative frequency graph with the lower quartile, median and upper
quartile positions read across to the graph and down to the Weight (g) axis.]
Lower quartile is 38g
Median weight is 54g
Upper quartile is 68g
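Reading values off the graph can be imitated with linear interpolation between the plotted points. This is a sketch under the assumption that the points are joined with straight lines and that the graph starts at (0, 0). The results will differ slightly from values read by eye.

```python
from bisect import bisect_left

# Class boundaries (g) and the cumulative frequency at each one, from the table.
boundaries = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
cumulative = [0, 3, 8, 13, 22, 33, 48, 62, 70, 76, 80]

def weight_at(position):
    """Estimate the weight at a cumulative-frequency position by reading
    along the straight line joining the two neighbouring plotted points."""
    i = bisect_left(cumulative, position)
    if i == 0:
        return boundaries[0]
    x0, x1 = boundaries[i - 1], boundaries[i]
    y0, y1 = cumulative[i - 1], cumulative[i]
    return x0 + (position - y0) / (y1 - y0) * (x1 - x0)

n = cumulative[-1]
# Positions n/4, n/2 and 3n/4: the 20th, 40th and 60th pieces of data.
lower_q, median, upper_q = (weight_at(n * k / 4) for k in (1, 2, 3))
print(round(lower_q, 1), round(median, 1), round(upper_q, 1))
```

The interpolated estimates fall within 1 g of the 38g, 54g and 68g read from the graph.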
INTERQUARTILE RANGE (GROUPED DATA)
The upper and lower quartiles can now be used to find the interquartile range:
Interquartile range = Upper quartile – Lower quartile
In this example the lower quartile is 38g and the upper quartile is 68g, so the
interquartile range is 68 – 38 = 30g.
Because the upper quartile marks ¾ of the way through the data and the lower quartile
marks ¼, ½ of the data (50%) is contained between these values.
So we can also say from this that half the mice weigh between 38g and 68g.
WORK EXAMPLE
In an international competition 60 children from Britain and France
did the same Maths test. The results are in the table below:
Marks     Britain frequency   Britain c.f.   France frequency   France c.f.
1 - 5     1                                  2
6 - 10    2                                  5
11 - 15   4                                  11
16 - 20   8                                  16
21 - 25   16                                 10
26 - 30   19                                 8
31 - 35   10                                 8
a. Using the same axes draw the cumulative frequency diagram for
each country.
b. Find the median mark and the upper and lower quartiles for both
countries and the interquartile range.
c. Make a short comment comparing the two countries
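Before drawing the diagrams for part (a), the two c.f. columns can be filled in as running totals. The sketch below also finds the class containing each quartile position, assuming positions n/4, n/2 and 3n/4 for grouped data; `class_containing` is a hypothetical helper.

```python
from itertools import accumulate

classes = ["1-5", "6-10", "11-15", "16-20", "21-25", "26-30", "31-35"]
britain = [1, 2, 4, 8, 16, 19, 10]
france = [2, 5, 11, 16, 10, 8, 8]

# Cumulative frequency = running total of the frequencies.
britain_cf = list(accumulate(britain))
france_cf = list(accumulate(france))

def class_containing(position, cumulative):
    """Class interval in which the given 1-based position falls."""
    for label, cf in zip(classes, cumulative):
        if position <= cf:
            return label
    raise ValueError("position is beyond the end of the data")

# Assumed quartile positions for 60 pieces of data: 15, 30 and 45.
for name, cf in (("Britain", britain_cf), ("France", france_cf)):
    print(name, cf, [class_containing(p, cf) for p in (15, 30, 45)])
```

The last cumulative frequency in each column should equal 60, the number of children from each country.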
WORK EXAMPLE (ANSWER)
Marks     Britain frequency   Britain c.f.   France frequency   France c.f.
1 - 5     1                   1              2                  2
6 - 10    2                   3              5                  7
11 - 15   4                   7              11                 18
16 - 20   8                   15             16                 34
21 - 25   16                  31             10                 44
26 - 30   19                  50             8                  52
31 - 35   10                  60             8                  60
Both have 60 pieces of data.
Lower quartile position is 15.
Median position is 30.
Upper quartile position is 45.
[Figure: cumulative frequency diagrams for Britain and France drawn on the same axes.]