
STATISTICS

S4 Term 2 2022 - 2023


RECAP
You should already be familiar with the following concepts from
working with data:
Types of data and methods of collecting data
• Primary data – collected by the person doing the investigation
• Secondary data – collected and stored by someone else
(and accessed for an investigation)
• Data can be collected by experiment, measurement, observation or
carrying out a survey.
RECAP
Ways of organizing and displaying data
COLLECTING AND CLASSIFYING DATA
Data is a set of facts, numbers or other information.

Statistics involves a process of collecting data and using it to try to answer a question.

The flow diagram shows the four main steps involved in this process of statistical investigation:
DIFFERENT TYPES OF DATA

Categorical data is non-numerical data. It names or describes something without reference to number or size.

Colours, names of people and places, yes and no answers, opinions and choices are all categorical.

Categorical data is also called qualitative data.
DIFFERENT TYPES OF DATA
Numerical data is data in number form. It can be an amount, a measurement, a
time or a score. Numerical data is also called quantitative data (from the word
quantity).

Numerical data can be further divided into two groups:


• discrete data – this is data that can only take certain values, for example, the
number of children in a class, goals scored in a match or red cars passing a point.
When you count things, you are collecting discrete data.
• continuous data – this is data that could take any value between two given
values, for example, the height of a person who is between 1.5 m and 1.6 m tall
could be 1.5 m, 1.57 m, 1.5793 m, 1.5793421 m or any other value between 1.5 m
and 1.6 m depending on the degree of accuracy used. Heights, masses, distances
and temperatures are all examples of continuous data. Continuous data is normally
collected by measuring.
ORGANISING
DATA
1. TALLY TABLES (TALLY CHARTS)

Tallies are little marks (////) that you use to keep a record of items you count. Each time you count five items you draw a line across the previous four tallies to make a group of five. Grouping tallies in fives makes it much easier to count and get a total when you need one.

A tally table is used to keep a record when you are counting things.
Look at this tally table. A student used this to record how many cars of each
colour there were in a parking lot. He made a tally mark in the second
column each time he counted a car of a particular colour.

2. FREQUENCY TABLES
A frequency table shows the totals of the tally marks. Some frequency
tables include the tallies.
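
The idea of turning tallies into a frequency table can be sketched in a few lines of Python. This is only an illustration: the list of colours is made up (the student's actual table is not reproduced here), and `tally_marks` is a hypothetical helper that writes each completed group of five as `/////`.

```python
from collections import Counter

# Hypothetical observations; these are NOT the student's real data.
observed_colours = ["red", "white", "blue", "white", "red", "white", "black"]


def tally_marks(count):
    """Return tally marks for count, grouped in fives."""
    fives, rest = divmod(count, 5)
    groups = ["/////"] * fives + (["/" * rest] if rest else [])
    return " ".join(groups)


# Counter does the counting; dict() gives a plain frequency table.
table = dict(Counter(observed_colours))

for colour, frequency in sorted(table.items()):
    print(f"{colour:<8}{tally_marks(frequency):<12}{frequency}")
print(f"{'Total':<20}{sum(table.values())}")
```

The total at the bottom is the sum of the frequency column, which is how you check how many pieces of data were collected.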

The frequency table has space to write a total at the bottom of the frequency column. This helps you to know how many pieces of data were collected. In this example the student recorded the colours of 157 cars.

Most frequency tables will not include tally marks. Here is a frequency table without tallies. It was drawn up by the staff at a clinic to record how many people were treated for different diseases in one week.

3. GROUPING DATA IN CLASS INTERVALS
Sometimes numerical data needs to be recorded in different groups.

For example, if you collected test results for 40 students you might find that students scored between 40 and 84 (out of 100). If you recorded each individual score (and they could all be different) you would get a very large frequency table that is difficult to manage.

To simplify things, the collected data can be arranged in groups called class intervals.
A frequency table with results arranged in class intervals is called a grouped frequency table.

The range of scores (40–84) has been divided into class intervals.
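
As a rough sketch of how scores are sorted into class intervals, the Python below counts scores in classes of equal width. The scores and the choice of classes 40–49, 50–59 and so on are assumptions made for illustration; the class intervals used on the original slide are not shown in the text.

```python
def grouped_frequency(scores, lowest, width, n_classes):
    """Count whole-number scores in equal-width classes.

    The first class runs from lowest to lowest + width - 1, the next
    class starts at lowest + width, and so on.
    """
    counts = [0] * n_classes
    for score in scores:
        index = (score - lowest) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"score {score} falls outside the classes")
        counts[index] += 1
    return [
        ((lowest + i * width, lowest + (i + 1) * width - 1), counts[i])
        for i in range(n_classes)
    ]


# Hypothetical test scores between 40 and 84.
scores = [40, 47, 52, 58, 58, 61, 66, 69, 73, 84]
grouped = grouped_frequency(scores, lowest=40, width=10, n_classes=5)

for (low, high), frequency in grouped:
    print(f"{low}-{high}: {frequency}")
```

Each score lands in exactly one class, so the frequencies still add up to the number of scores.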
STEM AND LEAF DIAGRAMS
A stem and leaf diagram is a special type of table that allows you
to organise and display grouped data using the actual data values.

In a stem and leaf diagram each data item is broken into two parts:
a stem and a leaf.

The final digit of each value is the leaf and the previous digits are the stem. The stems are written to the left of a vertical line and the leaves are written to the right of the vertical line.

For example, a score of 13 would be shown as:
Stem | Leaf
1 | 3

In this case, the tens digit is the stem and the units digit is the leaf. A larger data value such as 259 would be shown as:
Stem | Leaf
25 | 9

In this case, the stem represents both the tens and the hundreds digits while the units digit is the leaf.
WORKED EXAMPLE 1
This data set shows the ages of customers using an internet café. 34 23
40 35 25 28 18 32 37 29 19 17 32 55 36 42 33 20 25 34 48 39 36 30
Draw a stem and leaf diagram to display this data.

STEPS:
1. Group the ages in intervals of ten, 10 – 19; 20 – 29 and so on. These are
two-digit numbers, so the tens digit will be the stem.
2. List the stems in ascending order down the left of the diagram.
3. Work through the data in the order it is given, writing the units digits (the
leaves) in a row next to the appropriate stem.
4. If you need to work with the data, you can redraw the diagram, putting
the leaves in ascending order.
5. From this re-organised stem and leaf diagram you can quickly see that:
• the youngest person using the internet café was 17 years old
• the oldest person was 55 (the last data item)
• most users were in the age group 30 – 39 (the group with the largest
number of leaves).
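
The steps above can be sketched in Python using the ages from Worked Example 1. The function name `stem_and_leaf` is just a label chosen for this sketch; it splits each two-digit value into a tens digit (stem) and a units digit (leaf) and puts the leaves in ascending order.

```python
from collections import defaultdict

ages = [34, 23, 40, 35, 25, 28, 18, 32, 37, 29, 19, 17,
        32, 55, 36, 42, 33, 20, 25, 34, 48, 39, 36, 30]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in order."""
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    # List every stem from smallest to largest, even one with no leaves.
    return {stem: sorted(rows.get(stem, []))
            for stem in range(min(rows), max(rows) + 1)}


diagram = stem_and_leaf(ages)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1 | 7 means 17 years old")
```

The row with the most leaves is the 30 – 39 group, which matches the conclusion in step 5.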
A back-to-back stem and leaf diagram is used to show two sets of data. The second set of data is plotted against the same stem, but the leaves are written to the left. This stem and leaf plot compares the battery life of two different brands of mobile phone.

You read the data for Brand X from right to left.

The stem is still the tens digit.
WORKED EXAMPLE 2
The ages of people on a coach transferring them from an airport to a ski resort are
as follows: 22 24 25 31 33 23 24 26 37 42 40 36 33 24 25 18 20 27 25 33 28 33 35
39 40 48 27 25 24 29
Displaying the data on a stem-and-leaf diagram produces the following diagram.

In this form the data can be analysed quite quickly:
• The youngest person is 18
• The oldest is 48
• The modal ages are 24, 25 and 33
WORKED EXAMPLE 3
Continuing from the example given above, consider a second coach from the airport
taking people to a golfing holiday. The ages of these people are shown below:
43 46 52 61 65 38 36 28 37 45 69 72 63 55 46 34 35 37 43 48 54 53 47 36 58 63 70
55 63 64

The two sets of data are displayed on the back-to-back stem-and-leaf diagram below:

4. TWO – WAY TABLES
A two-way table shows the frequency of certain results for two or more sets of data. Here is a two-way table showing how many men and women drivers were wearing their seat belts when they passed a check point.

Here are two more examples of two-way tables:
Drinks and crisps sold at a school tuck shop during lunch break

How often male and female students use Facebook

5. PIE CHARTS

Data can be displayed on a pie chart − a circle divided into sectors. The size of the sector is in direct proportion to the frequency of the data. The sector size does not show the actual frequency, but the actual frequency can be calculated easily from the angle of the sector.

WORKED EXAMPLE 1

In a survey, 240 English children were asked to vote for their favourite holiday destination.
The results are shown on the pie chart above. Calculate the actual number of votes for each destination.
ANSWER
The total of 240 votes is represented by 360°, so each 1° of a sector represents $\frac{240}{360} = \frac{2}{3}$ of a vote. Multiplying each sector angle on the pie chart by $\frac{2}{3}$ gives:
• 80 votes for Spain
• 50 votes for France
• 30 votes for Portugal
• 60 votes for Greece
• 20 votes for other destinations
WORKED EXAMPLE 2
The table shows how a student spent her day.

Draw a pie chart to show this data


ANSWER
First work out the total number of hours: 7 + 8 + 1.5 + 3 + 2.5 + 2 = 24
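
Once the total is known, each sector angle is the item's share of 360°. The Python below is a small sketch of that calculation using the six numbers of hours from the sum above; the activity names are not available in the text, so only the hours are used. The helper `frequency_from_angle` is a hypothetical name for the reverse calculation described for pie charts.

```python
def sector_angles(frequencies):
    """Return the pie chart angle in degrees for each frequency."""
    total = sum(frequencies)
    return [360 * f / total for f in frequencies]


def frequency_from_angle(angle, total):
    """Return the frequency represented by a sector of the given angle."""
    return angle * total / 360


hours = [7, 8, 1.5, 3, 2.5, 2]
angles = sector_angles(hours)
print(angles)

# Illustration only: a 90° sector of a chart showing 240 votes.
print(frequency_from_angle(90, 240))
```

The angles should always add up to 360°, which is a quick check before drawing the chart.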
MEAN, MEDIAN, MODE AND RANGE

AVERAGE
‘Average’ is a word which in general use is taken to mean somewhere in the middle.

An average is a single value used to represent a set of data.

There are three types of average used in statistics and the following shows how each can be calculated.
DIFFERENT TYPES OF AVERAGE

The shoe sizes of 19 students in a class are shown below:


4, 7, 6, 6, 7, 4, 8, 3, 8, 11, 6, 8, 6, 3, 5, 6, 7, 6, 4
How would you describe the shoe sizes in this class?

If you count how many size fours, how many size fives and
so on, you will find that the most common (most frequent)
shoe size in the class is six.

This average is called the mode.


DIFFERENT TYPES OF AVERAGE

What most people think of as the average is the value you get when you
add up all the shoe sizes and divide your answer by the number of
students:
$\dfrac{\text{total of shoe sizes}}{\text{number of students}} = \dfrac{115}{19} = 6.05$ (2 d.p.)
This average is called the mean. The mean value tells you that the shoe
sizes appear to be spread in some way around the value 6.05. It also gives
you a good impression of the general ‘size’ of the data.
The mean is sometimes referred to as a measure of ‘central tendency’ of the data.

Another measure of central tendency is the middle value when the shoe sizes are arranged in ascending order:

3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 11
If you now think of the first and last values as one pair, the second and second to last as another pair, and so on, you can cross these numbers off and you will be left with a single value in the middle.

This middle value, the 10th of the 19 values (in this case six), is known as the median.
What if there had been 20 students in the class?
For example, add an extra student with a shoe size of 11.
Crossing off pairs gives this result:

3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 11, 11

You are left with a middle pair (the 10th and 11th values, both 6) rather than a single value.
If this happens then you simply find the mean of this middle pair: (6 + 6) ÷ 2 = 6.
SUMMARY
Mode - The value that appears in your list more than any other. There can be more than one mode but if there are no values that occur more often than any other then there is no mode.
Mean - $\dfrac{\text{total of all data}}{\text{number of values}}$. The mean may not be one of the actual data values.
Median
1. Arrange the data into ascending numerical order.
2. If the number of data is n and n is odd, find $\frac{n+1}{2}$ and this will give you the position of the median.
3. If n is even, then calculate $\frac{n}{2}$ and this will give you the position of the first of the middle pair. Find the mean of this pair.
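
As a check on the summary, here is a short Python sketch applied to the shoe sizes of the 19 students. `median_by_position` is a hypothetical helper that follows the three median steps literally; `mean` and `multimode` come from Python's standard `statistics` module.

```python
from statistics import mean, multimode

shoe_sizes = [4, 7, 6, 6, 7, 4, 8, 3, 8, 11, 6, 8, 6, 3, 5, 6, 7, 6, 4]


def median_by_position(values):
    """Find the median using the position rules in the summary."""
    ordered = sorted(values)            # step 1: ascending order
    n = len(ordered)
    if n % 2 == 1:                      # step 2: n odd, position (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    first = n // 2                      # step 3: first of the middle pair
    return (ordered[first - 1] + ordered[first]) / 2


print("mean  :", round(mean(shoe_sizes), 2))
print("median:", median_by_position(shoe_sizes))
print("mode  :", multimode(shoe_sizes))
print("range :", max(shoe_sizes) - min(shoe_sizes))
```

One difference to be aware of: when every value occurs equally often, `multimode` returns all of the values, whereas the summary above would say there is no mode.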
WORK EXAMPLE
a. i. Find the mean, median and mode of the data listed below.
1, 0, 2, 4, 1, 2, 1, 1, 2, 5, 5, 0, 1, 2, 3
MEAN
$\text{mean} = \dfrac{1+0+2+4+1+2+1+1+2+5+5+0+1+2+3}{15} = \dfrac{30}{15} = 2$

MEDIAN
Arrange all the data in order and then pick out the middle number.

0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 5, 5
There are 15 values, so the median is the 8th value, which is 2.
MODE
The mode is the number which appeared most often.
Therefore the mode is 1.
WORK EXAMPLE

The frequency chart (below) shows the score out of 10 achieved by a class in a maths test.
i. Calculate the mean, median and mode for this data.
ii. Calculate the range of the data.

WORK EXAMPLE (ANSWER)

i. Transferring the results to a frequency table gives:

$\text{mean} = \dfrac{168}{32} = 5.25$, $\text{median} = \dfrac{5+6}{2} = 5.5$, $\text{mode} = 7$

ii. Range = 10 – 0 = 10
STEM-AND-LEAF DIAGRAM
WORKED EXAMPLE 1
The ordered stem and leaf diagram shows the number of customers served at a supermarket checkout every half hour during an 8-hour shift.

Stem | Leaf
0 | 2 5 5 6 6 6 6
1 | 1 3 3 5 5 6 7 7
2 | 1
Key: 1 | 1 means 11 customers

a. What is the range of customers served?
b. What is the modal number of customers served?
c. Determine the median number of customers served.
d. How many customers were served altogether during this shift?
e. Calculate the mean number of customers served every half hour.
WORKED EXAMPLE 1 (ANSWER)
a. What is the range of customers served?
The lowest number is 2 and the highest number is 21.
The range is 21 – 2 = 19 customers.

b. What is the modal number of customers served?
6 is the value that appears most often.

c. Determine the median number of customers served.
There are 16 pieces of data, so the median is the mean of the 8th and 9th values:
$\dfrac{11 + 13}{2} = \dfrac{24}{2} = 12$

d. How many customers were served altogether during this shift?
To calculate this, find the sum of all the values. Find the total for each row and then combine these to find the overall total.
Row 1: 2 + 5 + 5 + 6 + 6 + 6 + 6 = 36
Row 2: 11 + 13 + 13 + 15 + 15 + 16 + 17 + 17 = 117
Row 3: 21
36 + 117 + 21 = 174 customers in total

e. Calculate the mean number of customers served every half hour.
$\text{Mean} = \dfrac{\text{sum of data values}}{\text{number of data values}} = \dfrac{174}{16} = 10.875$ customers per half hour.
WORKED EXAMPLE 2
The marks obtained by 100 students in a test were as follows:

Find:
a. The mean mark
b. The median mark
c. The modal mark
WORKED EXAMPLE 2 (ANSWER)
a. The mean mark
$\text{Mean} = \dfrac{\sum xf}{\sum f}$, where $\sum xf$ means ‘the sum of the products’ and $\sum f$ means ‘the sum of the frequencies’.
$\text{Mean} = \dfrac{(0 \times 4) + (1 \times 19) + (2 \times 25) + (3 \times 29) + (4 \times 23)}{100} = \dfrac{248}{100} = 2.48$

b. The median mark
There are 100 students, so the median mark is the mean of the 50th and 51st numbers. By inspection, both the 50th and 51st numbers are 3.
∴ Median = 3 marks

c. The modal mark
Modal mark = 3
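
The same three answers can be checked with a short Python sketch. The marks (0 to 4) and their frequencies are read off the mean calculation in part a, since the table itself is not reproduced in the text; the function names are hypothetical labels for this sketch.

```python
marks = [0, 1, 2, 3, 4]
frequencies = [4, 19, 25, 29, 23]   # taken from the products in part a


def mean_from_table(values, freqs):
    """Mean = sum of (value x frequency) / sum of frequencies."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)


def value_at_position(values, freqs, position):
    """Return the data value at a 1-based position in the ordered data."""
    running_total = 0
    for x, f in zip(values, freqs):
        running_total += f
        if position <= running_total:
            return x
    raise ValueError("position is larger than the total frequency")


def median_from_table(values, freqs):
    n = sum(freqs)
    if n % 2 == 1:
        return value_at_position(values, freqs, (n + 1) // 2)
    lower = value_at_position(values, freqs, n // 2)
    upper = value_at_position(values, freqs, n // 2 + 1)
    return (lower + upper) / 2


def modes_from_table(values, freqs):
    highest = max(freqs)
    return [x for x, f in zip(values, freqs) if f == highest]


print(mean_from_table(marks, frequencies))
print(median_from_table(marks, frequencies))
print(modes_from_table(marks, frequencies))
```

Finding the 50th and 51st values "by inspection" is the same as the running total used in `value_at_position`: the cumulative frequencies are 4, 23, 48 and 77, so both positions fall in the group of students who scored 3.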
CALCULATING AVERAGES AND RANGES FOR GROUPED CONTINUOUS DATA
WORKED EXAMPLE 1
The heights of 100 children were measured in cm and the results
recorded in the table below:

Find an estimate for the mean height of the children, the modal class,
the median class and an estimate for the range.
WORKED EXAMPLE 1 (ANSWER)
So, extend your table to include midpoints and then totals for each class:
Height in cm (h) | Frequency (f) | Midpoint (x) | Frequency × midpoint (fx)
 | 12 | 124.5 | 1494
 | 16 | 134.5 | 2152
 | 38 | 144.5 | 5491
 | 24 | 154.5 | 3708
 | 10 | 164.5 | 1645
 | | | Total = 14490

An estimate for the mean height of the children is then:

$\text{Mean} = \dfrac{\sum xf}{\sum f} = \dfrac{14490}{100} = 144.9$ cm
WORKED EXAMPLE 1 (ANSWER CONT…)
Find the cumulative frequencies by keeping a running total of the frequency column:
12
16 + 12 = 28
38 + 28 = 66
24 + 66 = 90
10 + 90 = 100

Median class: there are 100 values, so the median lies between the 50th and 51st values. Both fall in the third class (cumulative frequency 66), so this is the median class.
The class with the highest frequency is the modal class. In this case it is the same class as the median class (frequency 38).
The shortest child could be as small as 120 cm and the tallest could be as tall as 170 cm. The best estimate of the range is, therefore, 170 − 120 = 50 cm.
WORKED EXAMPLE 2
The history test scores for a group of 40 students are shown in the
grouped frequency table below.

a. Calculate an estimate for the mean test result

b. What is the modal class?


WORKED EXAMPLE 2 (ANSWER)
a. Calculate an estimate for the mean test result.
$\text{Mean} = \dfrac{19 + 118 + 693 + 1112 + 358}{40} = \dfrac{2300}{40} = 57.5$

b. What is the modal class?
$60 \le S \le 79$

HISTOGRAMS

HISTOGRAMS WITH EQUAL CLASS INTERVALS

WORK EXAMPLE 1
The table and histogram below show the heights of trees in a sample from a forestry site.

a. How many trees are less than 5 m tall?


b. What is the most common height of tree?
c. How many trees are 20 m or taller?
d. Why do you think the class intervals include inequality symbols?
e. Why is there a gap between the columns on the right-hand side of the graph?
WORK EXAMPLE 1 (ANSWER)
a. How many trees are less than 5 m tall?
Read the frequency (vertical scale on the histogram) for the bar 0 – 5: 21 trees.

b. What is the most common height of tree?
Find the tallest bar and read the class interval from the horizontal scale: 25 ≤ h < 30.

c. How many trees are 20 m or taller?
Find the frequency for each class with heights of 20 m or more and add them together: 29 + 30 + 2 = 61.

d. Why do you think the class intervals include inequality symbols?
The horizontal scale of a histogram is continuous, so the class intervals are also continuous. The inequality symbols prevent the same height of tree falling into more than one group. For example, without the symbols a tree of height 5 m could go into two groups and thus be counted twice.

e. Why is there a gap between the columns on the right-hand side of the graph?
The frequency for that class interval is zero, so no bar is drawn.
WORK EXAMPLE 2
Joy-Anne did an experiment in her class to see what mass of raisins (in grams) the
students could hold in one hand. Here are her results.

a. Using the class intervals 16–20, 21–25, 26–30 and 31–35 draw a grouped
frequency table.
b. What is the modal class (the mode) of this data?
c. Draw a histogram to show her results.
WORK EXAMPLE 2 (ANSWER)
a. Using the class intervals 16–20, 21–25, 26–30 and 31–35 draw a grouped frequency table.
Count the number in each class to fill in the table.

b. What is the modal class (the mode) of this data?
It is not possible to find the mode of grouped data because you do not have the individual values within each group.
Instead, you find the class interval that has the greatest frequency. This is called the ‘modal class’.
The modal class is 21–25.

c. Draw a histogram to show her results.
HISTOGRAMS WITH UNEQUAL CLASS INTERVALS

WORKED EXAMPLE 1
Here is a table showing the heights of 25 plants.

Draw a histogram to show these results.


WORKED EXAMPLE 1 (ANSWER)
First work out the frequency density by adding columns to your
frequency distribution table like this:

Next draw the axes. You will need to decide on a suitable scale for both the
horizontal and the vertical axes.
WORKED EXAMPLE 1 (ANSWER
CONT…)
Here, 1 cm has been used to represent 10 cm on the horizontal axis (label
height in cm) and 2 cm per unit on the vertical axis (label frequency density).
Once you have done this, draw the histogram, paying careful attention to the
scales on the axes.
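
The frequency density column can be sketched in Python as frequency divided by class width. The three classes below are hypothetical (the plant table itself is not reproduced in the text); they were only chosen so that the frequencies add up to 25 plants.

```python
# Hypothetical classes: (lower bound in cm, upper bound in cm, frequency).
classes = [(0, 10, 5), (10, 30, 8), (30, 40, 12)]


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


densities = [frequency_density(*row) for row in classes]

for (lower, upper, frequency), density in zip(classes, densities):
    print(f"{lower} to {upper} cm: width {upper - lower}, "
          f"frequency {frequency}, frequency density {density}")
```

Because bar height is frequency density and bar width is class width, the area of each bar (height × width) gives back the frequency. This is why a wide class can have a short bar even when its frequency is large.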
CUMULATIVE FREQUENCY

WORKED EXAMPLE 3
The examination marks of 300 students are summarised in the table.

a. Draw a cumulative frequency table.
b. Construct a cumulative frequency graph to show this data.
c. Calculate an estimate for the median mark.
WORKED EXAMPLE 3 (ANSWER)
a. Draw a cumulative frequency table.

b. Construct a cumulative frequency graph to show this data.

c. Calculate an estimate for the median mark.
The median is the middle value. For continuous data, the position of the middle value can be found by dividing the total frequency by 2.
$\frac{300}{2} = 150$, so the median mark is the 150th result.
The median mark is 58.
WORKED EXAMPLE 4
This cumulative frequency curve shows the journey times to school of
different students.
WORKED EXAMPLE 4
Use the curve to find:
a. the total number of students
b. an estimate of the median journey time
c. the number of students who took less than 10 minutes to
get to school
d. the number of students who had journey times greater
than 30 minutes
e. the number of students who took between 40 minutes and
one hour to get to school.
WORKED EXAMPLE 4 (ANSWER)
a. the total number of students
The top of the curve is at 50, so this is the total frequency.

b. an estimate of the median journey time
$\frac{50}{2} = 25$, so it’s the 25th result; draw a line across from 25 on the cumulative frequency axis and drop a perpendicular from where the line cuts the graph.
Median ≈ 38 minutes

c. the number of students who took less than 10 minutes to get to school
Read off the cumulative frequency at 10 minutes.
= 4

d. the number of students who had journey times greater than 30 minutes
Subtract the cumulative frequency at 30 minutes, 18, from the total frequency.
50 – 18 = 32

e. the number of students who took between 40 minutes and one hour to get to school
Subtract the cumulative frequency at 40 minutes, 28, from that at 60 minutes, 42.
42 – 28 = 14
WORKED EXAMPLE 5
Twenty bean seeds were planted for a biology experiment. The heights of
the plants were measured after three weeks and recorded as below.

a. Find an estimate for the mean height.


b. Draw a cumulative frequency curve and find an estimate for the
median height
WORKED EXAMPLE 5 (ANSWER)
a. You will need the mid-points of the classes to help you find an estimate of
the mean, and the cumulative frequency to help find an estimate of the median,
so more columns need to be added to the table.
Don’t forget to label the new columns.

$\text{Mean} = \dfrac{\sum fx}{\sum f}$

$\text{Mean height} = \dfrac{132}{20} = 6.6$ cm
WORKED EXAMPLE 5 (ANSWER)
b. Draw a cumulative frequency curve and find an estimate for the median height.

$\frac{20}{2} = 10$, so the median height is the 10th value.

Median height = 7.0 cm
QUARTILES AND THE INTERQUARTILE RANGE
QUARTILES

Consider the following set of distinct data arranged in ascending order:
Set A: 2, 5, 6, 7, 8, 12, 14, 16, 20, 21, 30
The total number of data values is 11, i.e. n = 11. The median is the 6th value, 12.

We consider the 5 values on the left of the median (the lower half: 2, 5, 6, 7, 8). The middle value of these 5 values is 6 and it is called the lower quartile or the first quartile. The first quartile can be considered as the ‘first-quarter value’: 25% (or one quarter) of the data is less than or equal to this value.

Since the median is the middle value or ‘second-quarter value’, the median is also called the second quartile. 50% (or half) of the data is less than or equal to this value.

We consider the 5 values on the right of the median (the upper half: 14, 16, 20, 21, 30). The middle value of these 5 values is 20 and it is called the upper quartile or the third quartile. 75% (or three quarters) of the data is less than or equal to this value.

We see that the quartiles obtained by the above method divide the data, which is arranged in ascending order, into 4 roughly equal parts.

2 5 6 7 8 12 14 16 20 21 30
Lower quartile = 6, Median = 12, Upper quartile = 20
RANGE AND THE INTERQUARTILE RANGE
The figure shows the range and interquartile range for the data values in Set A. The median, range and interquartile range are indicated in the dot diagram as shown.

[Dot diagram of Set A on a number line from 0 to 30, marking Q1 = 6, the median = 12 and Q3 = 20, with the range and the interquartile range indicated]

These measures of spread show the degree of variation or how ‘spread out’ the data values are.
For Set A,
range = 30 – 2 = 28
interquartile range = Q3 – Q1 = 20 – 6 = 14
WORK EXAMPLE: FINDING AND INTERPRETING THE RANGE AND INTERQUARTILE RANGE FOR A SET OF DISCRETE DATA WITH AN EVEN NUMBER OF DATA VALUES

The data below shows the marks for a multiple choice quiz with 20 questions, taken by 8 students.

10 12 12 13 9 17 11 14
(i) For the given set of data, find Q1, Q2 and Q3.
(ii) Find the range.
(iii) Find the interquartile range.

WORK EXAMPLE (ANSWER)
(i) Arranging the given data in ascending order:

Lower half: 9 10 11 12
Upper half: 12 13 14 17

$Q_2 = \dfrac{12+12}{2} = 12$, $Q_1 = \dfrac{10+11}{2} = 10.5$, $Q_3 = \dfrac{13+14}{2} = 13.5$

(ii) Range = 17 – 9 = 8
(iii) Interquartile range = Q3 – Q1
= 13.5 – 10.5
= 3
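
The "split the data into a lower half and an upper half" method used for Set A and for the quiz marks can be sketched in Python as follows. `quartiles_by_halves` is a name made up for this sketch; when n is odd the median itself is left out of both halves, as in Set A.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median of each half of the data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper_half = ordered[half + (n % 2):]
    return median(lower_half), median(ordered), median(upper_half)


set_a = [2, 5, 6, 7, 8, 12, 14, 16, 20, 21, 30]
quiz_marks = [10, 12, 12, 13, 9, 17, 11, 14]

for name, data in [("Set A", set_a), ("Quiz marks", quiz_marks)]:
    q1, q2, q3 = quartiles_by_halves(data)
    print(f"{name}: Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, "
          f"range = {max(data) - min(data)}, IQR = {q3 - q1}")
```

Other methods of locating quartiles exist and can give slightly different values for small data sets, so it is worth stating which method has been used.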
PERCENTILES AND QUARTILES
QUARTILES
Two very important percentiles are the lower and upper quartiles. These lie 25% and 75% of the way through the data respectively.
Use the following rules to estimate the positions of each quartile within a set of n ordered data values:
lower quartile = value in position $\frac{1}{4}(n + 1)$
upper quartile = value in position $\frac{3}{4}(n + 1)$

If the position does not turn out to be a whole number, you simply find the mean of the pair of numbers on either side.
For example, if the position of the lower quartile turns out to be 5.25, then you find the mean of the 5th and 6th values.
INTERQUARTILE RANGE
As with the range, the interquartile range gives a measure of how
spread out or consistent the data is.
The main difference is that the interquartile range (IQR) avoids using
extreme data by finding the difference between the lower and upper
quartiles. You are, effectively, measuring the spread of the central
50% of the data.

If one set of data has a smaller IQR than another set, then the
first set is more consistent and less spread out.
This can be a useful comparison tool.
WORK EXAMPLE

For each of the following sets of data calculate the median, upper and lower quartiles.
In each case calculate the interquartile range.

a. 13, 12, 8, 6, 11, 14, 8, 5, 1, 10, 16, 12

b. 14, 10, 8, 19, 15, 14, 9
WORK EXAMPLE (ANSWER)

a. 13, 12, 8, 6, 11, 14, 8, 5, 1, 10, 16, 12

First sort the data into ascending order.
1, 5, 6, 8, 8, 10, 11, 12, 12, 13, 14, 16
There is an even number of items (12). So for the median, you find the value of the middle pair, the first of which is in position $\frac{12}{2} = 6$.
median = $\frac{10 + 11}{2}$ = 10.5

lower quartile position = $\frac{12 + 1}{4}$ = 3.25
upper quartile position = $\frac{3(12 + 1)}{4}$ = 9.75

Notice that these are not whole numbers, so the lower quartile will be the mean of the 3rd and 4th values, and the upper quartile will be the mean of the 9th and 10th values.
lower quartile = $\frac{6 + 8}{2}$ = 7
upper quartile = $\frac{12 + 13}{2}$ = 12.5

Thus, the IQR = 12.5 – 7 = 5.5

WORK EXAMPLE (ANSWER)

b. 14, 10, 8, 19, 15, 14, 9

The ordered data is:
8, 9, 10, 14, 14, 15, 19
The number of data is odd, so the median will be in position $\frac{7 + 1}{2} = 4$. The median is 14.

There are seven items, so calculate lower quartile position = $\frac{7 + 1}{4}$ = 2 and upper quartile position = $\frac{3(7 + 1)}{4}$ = 6.

These are whole numbers so the lower quartile is in position two and the upper quartile is in position six.
So lower quartile = second position = 9
upper quartile = sixth position = 15
Thus, the IQR = 15 – 9 = 6
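
The position rules can also be sketched in Python. This sketch assumes the lower quartile is at position (n + 1)/4, the median at (n + 1)/2 and the upper quartile at 3(n + 1)/4, which is consistent with the positions worked out in the two answers above. The function names are hypothetical.

```python
import math


def value_at(ordered, position):
    """Return the value at a 1-based position in ordered data.

    If the position is not a whole number, return the mean of the
    values on either side of it.
    """
    if position == int(position):
        return ordered[int(position) - 1]
    below = math.floor(position)
    return (ordered[below - 1] + ordered[below]) / 2


def quartiles_by_position(values):
    """Return (lower quartile, median, upper quartile)."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


data_a = [13, 12, 8, 6, 11, 14, 8, 5, 1, 10, 16, 12]
data_b = [14, 10, 8, 19, 15, 14, 9]

for name, data in [("a", data_a), ("b", data_b)]:
    q1, q2, q3 = quartiles_by_position(data)
    print(f"{name}: Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
```

For an even number of items the median position (n + 1)/2 ends in .5, so `value_at` averages the middle pair, which matches finding the first of the middle pair at position n/2.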
QUARTILES (GROUPED DATA)

QUARTILES (GROUPED DATA)

The cumulative frequency curve on the next slide shows the marks obtained by 64 students in a test. The quartiles are listed below:
• 48 students scored less than 15 marks. 15 marks is the upper quartile or third quartile.
• 32 students scored less than 13 marks. 13 marks is the second quartile, or median mark.
• 16 students scored less than 11 marks. 11 marks is the lower quartile or first quartile.
QUARTILES (GROUPED DATA)
Cumulative Frequency

A pet shop owner weighs his mice every week to check their health.
The weights of the 80 mice are shown below:

Weight (g) | Frequency (f) | Cumulative frequency
0 < w ≤ 10 | 3 | 3
10 < w ≤ 20 | 5 | 8
20 < w ≤ 30 | 5 | 13
30 < w ≤ 40 | 9 | 22
40 < w ≤ 50 | 11 | 33
50 < w ≤ 60 | 15 | 48
60 < w ≤ 70 | 14 | 62
70 < w ≤ 80 | 8 | 70
80 < w ≤ 90 | 6 | 76
90 < w ≤ 100 | 4 | 80

Cumulative means adding up, so a cumulative frequency diagram requires a running total of the frequency.
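
The running total can be sketched in Python with `itertools.accumulate`, using the frequencies from the mice table. The variable names are chosen for this sketch only.

```python
from itertools import accumulate

upper_bounds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # weight (g)
frequencies = [3, 5, 5, 9, 11, 15, 14, 8, 6, 4]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

for bound, c_f in zip(upper_bounds, cumulative):
    print(f"w <= {bound}: {c_f}")
```

The last cumulative frequency must equal the total number of mice, which is a useful check on the arithmetic.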
Cumulative Frequency
1. The cumulative frequency (c.f.) can now be plotted on a graph, taking care to plot the c.f. at the end of each class interval. This is because we don’t know where in the class interval 0 < w ≤ 10 the values are, but we do know that by the end of the class interval there are 3 pieces of data.
2. The points are now joined with straight lines. The line always starts at the bottom of the first class interval. The resulting graph should look like this and is sometimes called an ‘S’ curve.

[Graph: cumulative frequency (0 to 80) plotted against weight in g (0 to 100)]
Cumulative Frequency
From this graph we can now find estimates of the median, and upper and lower quartiles.
There are 80 pieces of data.
The middle is the 40th piece of data (the median position). Read across, then down to find the median weight.
The lower quartile is the 20th piece of data, ¼ of the total pieces of data.
The upper quartile is the 60th piece of data, ¾ of the total pieces of data.

[Graph: cumulative frequency against weight in g, with lines read across and down at the lower quartile, median and upper quartile positions]

Lower quartile is 38g
Median weight is 54g
Upper quartile is 68g
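
Reading across and down can be imitated in Python by linear interpolation between the plotted points. This sketch assumes the points are joined with straight lines and that the graph starts at (0, 0); `read_off` is a hypothetical helper, not part of the original working.

```python
upper_bounds = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # weight (g)
cumulative = [0, 3, 8, 13, 22, 33, 48, 62, 70, 76, 80]


def read_off(position):
    """Estimate the weight at a given cumulative frequency position."""
    for i in range(1, len(cumulative)):
        if position <= cumulative[i]:
            x0, x1 = upper_bounds[i - 1], upper_bounds[i]
            c0, c1 = cumulative[i - 1], cumulative[i]
            # Move along the straight line joining the two plotted points.
            return x0 + (position - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("position is larger than the total frequency")


total = cumulative[-1]
lower_quartile = read_off(total / 4)
median_weight = read_off(total / 2)
upper_quartile = read_off(3 * total / 4)

print(f"Lower quartile ~ {lower_quartile:.1f} g")
print(f"Median         ~ {median_weight:.1f} g")
print(f"Upper quartile ~ {upper_quartile:.1f} g")
```

The interpolated values come out close to the readings of 38 g, 54 g and 68 g taken from the graph. Small differences are expected, because values read by eye from a graph are estimates.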
INTERQUARTILE (GROUPED DATA)

The upper and lower quartiles can now be used to find what is called
the interquartile range and is found by:
Upper quartile – Lower quartile
In this example: Lower quartile is 38g Upper quartile is 68g

The interquartile range (IQR) = 68 – 38 = 30g

Because the interquartile range runs from the value one quarter of the way through the data to the value three quarters of the way through, ½ of the data (50%) is contained within these values.
So we can also say from this that half the mice weigh between 38g and 68g.
WORK EXAMPLE
In an international competition 60 children from Britain and 60 children from France did the same Maths test. The results are in the table below:
Marks | Britain frequency | Britain c.f. | France frequency | France c.f.
1 - 5 | 1 | | 2 |
6 - 10 | 2 | | 5 |
11 - 15 | 4 | | 11 |
16 - 20 | 8 | | 16 |
21 - 25 | 16 | | 10 |
26 - 30 | 19 | | 8 |
31 - 35 | 10 | | 8 |

a. Using the same axes draw the cumulative frequency diagram for
each country.
b. Find the median mark and the upper and lower quartiles for both
countries and the interquartile range.
c. Make a short comment comparing the two countries
WORK EXAMPLE (ANSWER)
Marks | Britain frequency | Britain c.f. | France frequency | France c.f.
1 - 5 | 1 | 1 | 2 | 2
6 - 10 | 2 | 3 | 5 | 7
11 - 15 | 4 | 7 | 11 | 18
16 - 20 | 8 | 15 | 16 | 34
21 - 25 | 16 | 31 | 10 | 44
26 - 30 | 19 | 50 | 8 | 52
31 - 35 | 10 | 60 | 8 | 60

Both have 60 pieces of data.
Median position is 30
Lower quartile position is 15
Upper quartile position is 45

[Graph: cumulative frequency (0 to 60) against marks (0 to 35) for Britain and France on the same axes]

Britain: LQ = 20, Median = 25, UQ = 29, IQR = 9
France: LQ = 13.5, Median = 19, UQ = 26, IQR = 12.5

The scores in Britain are higher with less variation.
WORK EXAMPLE
The percentage scored by 1000 students on an exam is shown on this
cumulative frequency curve.

Use the cumulative frequency curve to find an estimate for:


a. the median score
b. the lower quartile
c. the upper quartile
d. the interquartile range.
WORK EXAMPLE (ANSWER)
BOX-AND-WHISKER PLOTS

DRAWING BOX – AND – WHISKER PLOTS
All box-and-whisker plots have the same basic features. You
can see these on the diagram.
WORK EXAMPLE
The marks obtained by 80 students in an examination are shown below.

The table also shows the cumulative frequency.
a. Plot a cumulative frequency curve and hence estimate:
i. The median
ii. The interquartile range.
b. Illustrate this data using a box-and-whisker plot.
WORK EXAMPLE (ANSWER)
The marks obtained by 80 students in an examination are shown below.
The table also shows the cumulative
frequency.
a. Plot a cumulative frequency curve and
hence estimate:
i. The median
ii. The interquartile range.
From the cumulative frequency curve
Median = 55 marks
Lower quartile = 37.5 marks
Upper quartile = 68 marks
Interquartile range = 68 – 37.5
= 30.5 marks.
WORK EXAMPLE (ANSWER)

b. Illustrate this data using a box – and – whisker plot.


From the cumulative frequency curve
Median = 55 marks
Lower quartile = 37.5 marks
Upper quartile = 68 marks
Interquartile range = 68 – 37.5
= 30.5 marks.
