
MATHEMATICS

GRADE 12

TOPIC: STATISTICS

2024 SCHOOL WINTER HOLIDAY PROGRAMME

LESSON TOPIC: BIVARIATE DATA

22 June 2024
What is Statistics?

• A branch of mathematics dealing with the collection, analysis,
interpretation, and presentation of masses of numerical data.
• A collection of quantitative data.
Importance of statistics in our lives
TERMINOLOGY

DISCRETE DATA

• Data that can only take certain values. For example, the number of
learners in a class (there can’t be half a learner)

CONTINUOUS DATA

• Data that can take on any value within a certain range. For example,
the heights of a group of learners (heights could be measured in
decimals)
Scatter plot & regression line

[Figure slides: Example of a Scatter Plot; Types of Correlation (r);
Scatter Plot: Strong Positive Correlation (r); Scatter Plot: Moderate
Positive Correlation (r); Scatter Plot: Negative Correlation (r);
Scatter Plot: Moderate Negative Correlation (r); Scatter Plot:
Relationship; Scatter Plot & Line of Best Fit; Scatter Plot &
Regression Line]
Correlation between two variables

• So far we have only looked at univariate (single-variable) data,
where only one measurement x_i was made.
• Now consider bivariate data, which result in pairs of
measurements of the type (x_i; y_i).
• It is possible to examine the relationship between the two
variables.
• There may be a pattern in how the one variable changes in
relation to the other.
• A key question: How well do the two variables correlate?
Bivariate Data – Example
Take 5 businesses (the units) and for each unit record the total
income (y) for a specific month as well as the number of employees (x)
in the business at the time.

x    4         6         7         9         12
y    R50 000   R54 000   R68 000   R70 000   R110 000

• Is there a relationship between x and y?
• How can a possible relationship be investigated?
• One useful tool to VIEW the relationship is to draw a scatter plot.
Representation by Scatter Plot

Number of employees (x)   4         6         7         9         12
Total income (y)          R50 000   R54 000   R68 000   R70 000   R110 000

• Is there an obvious relationship between x and y?
• What is the general trend (for high and low values of x and y)?
Interpolation and Extrapolation

• Interpolation: Estimate values of y for values of x within the
observed range.
• Extrapolation: Estimate values of y outside the range of observed
values (predictions).
Examples of Bivariate Data
Consider the following three sets of collected bivariate data:

Example 1:
x    −4    −2    −1     1    4
y   −11    −7    −5    −1    5

Example 2:
x    −4    −2    −1    2       4
y    48    12     6    0.75    0.1875

Key Questions:
• Is there a relationship between x and y? Can y be expressed as a
relation in x?
• Can we determine the defining equation for such a relation? Can we
extend the observed data?
• How can a possible relationship be investigated? One method is to
draw a scatter plot.
• Can we identify the general trend? Can we determine the best fit
curve?
• How can we determine the degree of correlation?
  Pearson's Correlation Coefficient
  Coefficient of Determination
Scatter Plot for Examples 1 & 2

Example 1:
x    −4    −2    −1     1    4
y   −11    −7    −5    −1    5
• Strong linear relationship between x and y.
• Increasing relationship (trend).
• Linear relationship with a positive gradient.

Example 2:
x    −4    −2    −1    2       4
y    48    12     6    0.75    0.1875
• Exponential decay relationship (trend).
• Strong exponential relationship between x and y.
• Exponential relationship with a negative gradient that becomes
less steep as x increases.
Example: Defining Equation for Linear Relationship

Observed Data:
x    −4    −2    −1     1    4
y   −11    −7    −5    −1    5

• Select STAT LINEAR MODE on your calculator.
• Input the OBSERVED DATA from the table into the calculator.
• RECALL from the calculator the values of A (y-intercept) and B (gradient).
• Write down the defining equation in the form y = Bx + A.
  ⇒ Defining equation: y = 2x − 3
• Use this defining equation to extend the data table,
  OR use the calculator's capacity to extend the table:
  ŷ(−3) = −9;  ŷ(0) = −3;  ŷ(2) = 1;  ŷ(3) = 3

x    −4    −3    −2    −1     0     1    2    3    4
y   −11    −9    −7    −5    −3    −1    1    3    5
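
The calculator steps above can be mirrored in a few lines of Python. This is a minimal sketch and not part of the original lesson: the function name `least_squares_line` is an assumption chosen for illustration, and it applies the standard least squares formulae rather than any calculator routine.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient (B on the calculator)
    a = y_bar - b * x_bar    # y-intercept (A on the calculator)
    return a, b


# Observed data from the example.
xs = [-4, -2, -1, 1, 4]
ys = [-11, -7, -5, -1, 5]

a, b = least_squares_line(xs, ys)

# Extend the table to every integer x from -4 to 4.
extended = {x: a + b * x for x in range(-4, 5)}
print(f"y = {b:.2f}x + ({a:.2f})")
```

Because the five observed points lie exactly on one line, the fitted gradient and intercept agree with y = 2x − 3 up to floating-point rounding.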
Strength of Linear Relationship

[Figure: two scatter plots. Left: strong, positive linear correlation.
Right: weak, negative linear correlation.]

• Observed data will hardly ever fit the linear relationship perfectly.
• The idea is to determine the best fit line.
Properties of Correlation Coefficient

1. −1 ≤ r ≤ 1
2. r = 1 ⇒ A perfect positive linear relationship
(All points exactly on a line with a positive gradient)
3. r = −1 ⇒ A perfect negative linear relationship
(All points exactly on a line with a negative gradient)
4. r = 0 ⇒ No linear relationship.

What if the value of r is relatively close to 1?


Scatter plot & regression line
The following table provides different examples of r and how to interpret these values.

r       Interpretation
1       Perfect positive association
0,9     Strong positive association
0,5     Moderate positive association
0,2     Weak positive association
0       No association
−0,2    Weak negative association
−0,5    Moderate negative association
−0,9    Strong negative association
−1      Perfect negative association

N.B: The description of r is not provided in the formula sheet (remind learners).
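
To see where a value of r comes from, the sketch below computes Pearson's correlation coefficient directly from its definition for the two example data sets used earlier. The function name `pearson_r` is an illustrative assumption; in the lesson itself r is read off the calculator.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Example 1: points that lie exactly on a straight line.
r_linear = pearson_r([-4, -2, -1, 1, 4], [-11, -7, -5, -1, 5])

# Example 2: exponential decay, so a straight line fits less well.
r_decay = pearson_r([-4, -2, -1, 2, 4], [48, 12, 6, 0.75, 0.1875])

print(round(r_linear, 3), round(r_decay, 3))
```

The first data set gives r = 1 (up to rounding) because it is perfectly linear. The second gives a negative r that is not equal to −1, since r only measures how well a straight line describes the data.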
Coefficient of Determination
Important question: How well does the straight line represent the
relationship?

[Figure: two scatter plots with regression lines. Left: the regression
line is a good fit for the observed data. Right: the regression line
is not a good fit for the observed data.]

Coefficient of determination = r-squared = r²
As a percentage (r² × 100%): what percentage of the variation in y is
explained by variations in x?
Activity

Example: Eight randomly selected families were asked about their monthly
incomes and the amounts saved in that month. The following data was
recorded:

Amount saved      500      1 900    1 100    2 500    1 500    2 300    800      2 100
Monthly income    10 000   19 500   13 400   27 000   16 000   22 000   12 000   20 000

• Understand which variable is independent and which is dependent.
• Draw a scatter diagram.
• Use your scientific calculator to determine the correlation coefficient.
• If a linear relationship is obvious, determine the equation of the best
fit line. Use a scientific calculator.
• Determine the mean point (x̄; ȳ) and the y-intercept AND/OR at least two
predicted points that could be on the line of best fit.
• Use these points to draw the desired straight line.
Scatter Diagram
Monthly Income: x 10 000 19 500 13 400 27 000 16 000 22 000 12 000 20 000
Amount saved: y 500 1 900 1 100 2 500 1 500 2 300 800 2 100

General Trend
• Linear relationship
• Positive gradient
• Savings increase as income increases
Correlation Coefficient

Procedure :
• Select STATS LINEAR MODE
• Input available data
• RECALL value of r

r = 0.975 ⇒ A strong positive linear correlation
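
The value of r recalled from the calculator can be checked, and turned into a coefficient of determination, with a short script. This is a sketch under the assumption that income is the independent variable (x) and savings the dependent variable (y), as in the table above. The variable names are illustrative only.

```python
from math import sqrt
from statistics import mean

income = [10000, 19500, 13400, 27000, 16000, 22000, 12000, 20000]  # x
saved = [500, 1900, 1100, 2500, 1500, 2300, 800, 2100]             # y

x_bar, y_bar = mean(income), mean(saved)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, saved))
s_xx = sum((x - x_bar) ** 2 for x in income)
s_yy = sum((y - y_bar) ** 2 for y in saved)

r = s_xy / sqrt(s_xx * s_yy)   # Pearson's correlation coefficient
r_squared = r ** 2             # coefficient of determination

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

Squaring r = 0.975 gives roughly 0.95, so about 95% of the variation in savings is explained by variation in income for this sample.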


Sketch the best fit line
Use the equation y = 0.1255x − 0.6070 (with x and y both in thousands
of rand) to calculate two points (better with 3 points) on this line.
(Alternatively: use the facility of your calculator to obtain the
coordinates of these points.)

Plot three points and draw the best fit line.

Predict from the graph the savings for unobserved incomes.




Activity 1
A training manager wants to know if there is a link between the hours in
training (x) spent by a particular category of employee and their
productivity (units produced per day, y) on a job. The data below was
extracted from the files of 10 employees.

Employee                      1    2    3    4    5    6    7    8    9    10
Hours in training (x)         16   36   20   38   40   30   35   22   40   24
Units produced per day (y)    45   70   44   56   60   48   75   60   63   38

(1) Draw a scatter plot for the data on a grid.
(2) Using the least squares method, establish a linear relationship
between training hours and productivity for these employees.
(3) Draw the least squares line on the scatter plot.
(4) Estimate the productivity level for a particular employee
who has received only 22 hours of training.
(5) Determine the correlation between productivity and hours of training.
(6) Is the association strong? Advise the manager.
Scatter Plot
[Figure: scatter plot of units produced per day (y) against hours in
training (x).]
Equation Least Square Line: Using Calculator


Use Calculator:
Select STAT Mode
Select line option: Y = A + BX
Input data
Recall A = 29.22 and B = 0.89
Defining equation: y = 0.89x + 29.22
Equation Least Square Line: Using Formulae

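Use the relevant formulae:
$\hat{y} = a + b x$ where
$a = \bar{y} - b\,\bar{x}$ and
$b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
Equation of the least squares line:
y = 0.89x + 29.22 (same result)
Because of the mark allocation, the calculator option is preferred.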
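
The formula route can be followed step by step in code. The sketch below applies the standard least squares formulae for a and b to the training data. The variable and function names are assumptions made for illustration, not part of the lesson.

```python
from statistics import mean

hours = [16, 36, 20, 38, 40, 30, 35, 22, 40, 24]   # x: hours in training
units = [45, 70, 44, 56, 60, 48, 75, 60, 63, 38]   # y: units produced per day

x_bar, y_bar = mean(hours), mean(units)

# b = sum of (x - x_bar)(y - y_bar) divided by sum of (x - x_bar)^2
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, units))
s_xx = sum((x - x_bar) ** 2 for x in hours)
b = s_xy / s_xx

# a = y_bar - b * x_bar
a = y_bar - b * x_bar


def predict(x):
    """Predicted units per day for x hours of training."""
    return a + b * x


print(f"y = {b:.2f}x + {a:.2f}")
print(f"mean point: ({x_bar}; {y_bar})")
```

The least squares line always passes through the mean point (x̄; ȳ), which is why that point is a convenient one to plot when sketching the line.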
Sketch Least Squares Line
Determine at least 3 points on the line:
• Always one possibility: the mean point (x̄; ȳ) = (30.1; 55.9)
• ŷ(20) = 46.9 ⇒ (20; 46.9) is a second possible point
• ŷ(40) = 64.7 ⇒ (40; 64.7) is a third possible point
Activity 2

The relationship between age and the amount of money spent on buying
clothes per month has been studied for years. Research has shown the
following results:

Age    Spending money on buying clothes in rands
21     R1 534
27     R1 260
35     R986
42     R810
55     R450
63     R250

Activity 2 (continued)

1. Draw a scatter plot to represent the data.

2. Determine the equation of the regression line and draw it on the scatter plot.

3. Describe the trend of the data with reference to the correlation coefficient.

4. Estimate the expenditure of a 50-year-old person from the scatter plot.

5. Is the estimate in question 4 an example of interpolation or
extrapolation? Explain.
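Solution

1. [Figure: scatter plot of monthly spending on clothes against age.]

2. [Figure: regression line drawn on the scatter plot.]

3. r = −0,995
Strong negative correlation.

4. Approximately R600

5. Interpolation.
The data value is within the given range.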
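
The distinction between interpolation and extrapolation can be checked mechanically once the regression line is known. The sketch below fits the line to the Activity 2 data and labels a prediction by whether x lies inside the observed range. The helper name `classify` and the age of 70 are hypothetical, added only for illustration.

```python
from statistics import mean

ages = [21, 27, 35, 42, 55, 63]
spend = [1534, 1260, 986, 810, 450, 250]   # rands per month

x_bar, y_bar = mean(ages), mean(spend)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, spend))
s_xx = sum((x - x_bar) ** 2 for x in ages)
b = s_xy / s_xx
a = y_bar - b * x_bar


def classify(x, observed):
    """Label a prediction at x as interpolation or extrapolation."""
    if min(observed) <= x <= max(observed):
        return "interpolation"
    return "extrapolation"


estimate_50 = a + b * 50          # predicted spending at age 50
kind_50 = classify(50, ages)
kind_70 = classify(70, ages)      # hypothetical age outside the data

print(round(estimate_50), kind_50, kind_70)
```

The fitted line gives an estimate close to the R600 read from the graph, and age 50 falls between the youngest and oldest observed ages, which is what makes it interpolation.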

Activity 1 (continued)
(4) Estimate the productivity level for a particular employee
who has received only 22 hours of training.
(5) Determine the correlation between productivity and hours of training.
(6) Is the association strong? Advise the manager.
Activity 1: Solution
(4) Take a reading from the graph, or
determine ŷ(22) = 48.72 using the calculator, or
calculate: ŷ(22) = 0.89 × 22 + 29.22 = 48.8
(5) Using the calculator: r = 0.66

(6) Although there is a positive correlation, it is not a strong
correlation.
If we consider this relationship to be exponential,
the correlation coefficient is 0.68, which is also weak.
Activity 4
A learner conducted an experiment to investigate the relationship between
age (x) and resting heart rate (y) in beats per minute. He sought the
assistance of a local clinic. The information for 12 people is shown in
the table below.

Age (x)                   59   32   42   50   22   39   21   20   27   40   29   47
Resting heart rate (y)    88   74   74   93   85   71   78   82   70   75   95   75

(1) Represent the data in a scatter plot.
(2) Determine the equation of the least squares line.
(3) Draw the least squares line on the scatter plot.
(4) Calculate the correlation coefficient for the data.
(5) Use the correlation coefficient to comment on the relationship
between age and the resting heart rate.
(6) If a learner uses the least squares line to predict the resting heart
rate of a 45-year-old person, will his answer be reliable? Motivate your
answer.
Scatter plot, equation of the least squares line and sketch thereof

[Figure: scatter plot of resting heart rate (y) against age (x) with
the least squares line.]

y = 0.0954x + 76.5956
Solution

(4) Using the calculator: r = 0.140837 ≈ 0.14

(5) Very weak positive relationship

(6) Take a reading from the graph, or
determine ŷ(45) = 80.89 using the calculator, or
calculate: ŷ(45) = 0.0954 × 45 + 76.5956 = 80.8886
Not reliable, as r = 0.14 indicates a very weak correlation.
Activity 5
The data below shows the pulse rate of a sample of 12 people
when they rest and then again after 2 minutes of jogging.

Resting heart rate: R (in beats per minute)
47   55   95    65   75   78   80   72   82    76   68   62
Heart rate after jogging: J (in beats per minute)
65   68   100   78   81   90   85   84   105   88   75   80

(1) Draw a scatter plot of the data given on a grid.
(2) Calculate the equation of the least squares line for this data.
(3) Calculate the correlation coefficient.
(4) Comment on the correlation of the data.
(5) If Joan's heart rate after jogging is 86 beats per minute,
what is her resting heart rate, in beats per minute?
Solution
Use the calculator to show that (with x = R and y = J):
The least squares regression line is defined by
y = 0.81437x + 25.22587
Correlation coefficient: r = 0.89791
Show that: x̂(86) = 74.62682

(1) Scatter plot of the data: left as an exercise.
(2) Equation of the least squares line: y = 0.81x + 25.23
(3) Correlation coefficient: r = 0.898
(4) Comment on the correlation of the data: strong positive correlation.
(5) Joan's resting heart rate if her heart rate after jogging is
86 beats per minute:
x̂(86) = 74.6 beats per minute
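
Question (5) runs the regression line backwards: y is known and x is wanted. A minimal sketch of that step, assuming the line is fitted with R as x and J as y (as in the solution) and then rearranged to x = (y − a) / b. The variable names are illustrative.

```python
from statistics import mean

resting = [47, 55, 95, 65, 75, 78, 80, 72, 82, 76, 68, 62]     # R (x)
jogging = [65, 68, 100, 78, 81, 90, 85, 84, 105, 88, 75, 80]   # J (y)

x_bar, y_bar = mean(resting), mean(jogging)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(resting, jogging))
s_xx = sum((x - x_bar) ** 2 for x in resting)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Rearrange y = a + b*x to find x for a given y.
joan_jogging = 86
joan_resting = (joan_jogging - a) / b

print(f"y = {b:.5f}x + {a:.5f}")
print(f"estimated resting rate: {joan_resting:.1f} beats per minute")
```

This reproduces the calculator result of about 74.6 beats per minute.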
Activity 6
The outdoor temperature, in °C, at noon on 10 days and the
number of units of electricity, in kW, used to heat a house on
each of those days, are shown in the table below.

Noon temperature: T (in °C)
7    11   9    2    4    7    0    10   5    3
Units of electricity used: E (in kW)
32   20   27   37   32   28   41   23   33   36

(1) Draw a scatter graph that shows this information on a grid.
(2) Determine the equation of the least squares regression line.
(3) Determine the correlation coefficient.
(4) What can we conclude about the relationship between the noon
temperature and the number of units of electricity used for heating?
(5) Estimate the number of units of electricity that was used to heat the
house on a day when the outdoor temperature at noon was 8 °C.
Solution
Use the calculator to show that:
The least squares regression line is defined by
y = −1.73639x + 40.971088
Correlation coefficient: r = −0.969926
Show that: ŷ(8) = 27.07993

(1) Scatter graph: left as an exercise.
(2) Equation of the least squares regression line: y = −1.74x + 40.97
(3) Correlation coefficient: r = −0.969926
(4) Relationship between the noon temperature and the number of units
of electricity used for heating:
Strong negative correlation (r tends to −1).
If the noon temperature increases, the electricity usage decreases.
(5) Estimated number of units of electricity used to heat the house on
a day when the outdoor temperature at noon was 8 °C: ŷ(8) = 27.07993
Activity 7
The scatter plot below represents the times taken by the winners of the men's
100 m freestyle swimming event at the Olympic Games from 1972 to 2004.
The data was obtained from www.databaseOlympics.com.

[Figure: scatter plot, Time taken: Men's 100 m freestyle]

1. Calculate the equation of the least square line for this data and draw it.
2. Describe the trend that is observed in these times.
3. Give ONE reason for this trend.
4. What can be said about the efforts of the winners in the years 1976 and 1988?
5. Use your line of best fit to predict the winning time for 2008.
Line of Best Fit
1. Calculate the equation of the least squares line for this data and draw it.

Year    1972   1976   1980   1984   1988   1992   1996   2000   2004
Time    51,2   50,0   50,4   49,8   48,6   49,0   48,7   48,3   48,1

y = −0,0904x + 229

Draw a line of best fit for the data on the graph.

[Figure: Time taken: Men's 100 m freestyle, with the line y = −0,0904x + 229]

Calculator: Predicted Values
ŷ(1972) = 50,79
ŷ(1984) = 49,71
ŷ(1996) = 48,62
Trends
2. Describe the trend that is observed in these times.
3. Give ONE reason for the trend.

[Figure: Time taken: Men's 100 m freestyle, with the line y = −0,0904x + 229]

2. Negative gradient: a downward trend.
Times decreased, so the winners swam faster (improved performance).
3. Possible reasons:
• Better exercise methods
• Controlled diets
• Swimwear: less friction
• More professional approach
Interpretation
4. What can be said about the efforts of the winners in the years 1976 and 1988?

[Figure: Time taken: Men's 100 m freestyle, with the line y = −0,0904x + 229]

Calculator: Predicted Times      Observed Times
ŷ(1976) = 50,429                 y(1976) = 50,0
ŷ(1988) = 49,344                 y(1988) = 48,6

The efforts of the winners were better than expected: both observed
times are lower than the predicted times.
Prediction
5. Use your line of best fit to predict the winning time for 2008.

[Figure: Time taken: Men's 100 m freestyle]

Use the formula to calculate:
y = −0,09041666667x + 229,0927778
⇒ ŷ(2008) = (−0,0904166667 × 2008) + 229,0927778 = 47,53611107

Calculator: Predicted Time
ŷ(2008) = 47,53611111

Extrapolation from the graph:
Predict that the time in 2008 will be 47,58 seconds,
slightly smaller than 47,6.
(The estimate is not accurate.)
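
The interpretation and prediction steps for the swimming data can be reproduced in one short script. This is a sketch using the standard least squares formulae. The names `predict` and `residuals` are illustrative assumptions, and Python uses a decimal point where the slides use a decimal comma.

```python
from statistics import mean

years = list(range(1972, 2005, 4))   # 1972, 1976, ..., 2004
times = [51.2, 50.0, 50.4, 49.8, 48.6, 49.0, 48.7, 48.3, 48.1]

x_bar, y_bar = mean(years), mean(times)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, times))
s_xx = sum((x - x_bar) ** 2 for x in years)
b = s_xy / s_xx
a = y_bar - b * x_bar


def predict(year):
    """Winning time (in seconds) predicted by the least squares line."""
    return a + b * year


# Residual = observed time minus predicted time.
# A negative residual means the winner was faster than the line predicts.
residuals = {yr: t - predict(yr) for yr, t in zip(years, times)}

prediction_2008 = predict(2008)   # extrapolation beyond the data

print(f"1976 residual: {residuals[1976]:.3f}")
print(f"1988 residual: {residuals[1988]:.3f}")
print(f"predicted 2008 time: {prediction_2008:.2f} s")
```

Both 1976 and 1988 have negative residuals, which matches the conclusion that those winners did better than expected. The 2008 value is an extrapolation, so it should be treated with caution.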
Activity 8
A recording company investigates the relationship between the number of times
a CD is played by a national radio station and the national sales of the same CD
in the following week. The data below was collected for a random sample of
10 CDs. The sales figures are rounded to the nearest 50.

Number of times CD is played: P   47      34      40      34      33      50      28      53      25      46
Weekly sales of CD: S             3 950   2 500   3 700   2 800   2 900   3 750   2 300   4 400   2 200   3 400

(1) Identify the independent variable.
(2) Draw a scatter plot of this data on a grid.
(3) Determine the equation of the least squares regression line.
(4) Calculate the correlation coefficient.
(5) Predict, correct to the nearest 50, the weekly sales for a CD that
was played 45 times by the radio station in the previous week.
(6) Comment on the strength of the relationship between the variables.
Solutions
Use the calculator to show that:
The least squares regression line is defined by y = 74.2805x + 293.05755
Correlation coefficient: r = 0.945818
Show that: Ŝ(45) = 3635.68

(1) Independent variable:
P, as sales (S) depend on the number of times a CD was played.
(2) Scatter plot of the data: left as an exercise.
(3) Equation of the least squares regression line:
y = 74.28x + 293.06
(4) Correlation coefficient: r = 0.95
(5) Predicted weekly sales, correct to the nearest 50, for a CD that
was played 45 times by the radio station in the previous week: 3 650
(6) Strength of the relationship between the variables:
Strong positive correlation (r is close to 1).
The more a CD is played, the higher the sales.
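
Question (5) combines a prediction with rounding to the nearest 50. The sketch below shows one way to do both. Rounding via `50 * round(value / 50)` is an illustrative choice and not a method prescribed by the lesson.

```python
from statistics import mean

plays = [47, 34, 40, 34, 33, 50, 28, 53, 25, 46]                      # P (x)
sales = [3950, 2500, 3700, 2800, 2900, 3750, 2300, 4400, 2200, 3400]  # S (y)

x_bar, y_bar = mean(plays), mean(sales)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(plays, sales))
s_xx = sum((x - x_bar) ** 2 for x in plays)
b = s_xy / s_xx
a = y_bar - b * x_bar

predicted = a + b * 45                   # sales predicted for 45 plays
nearest_50 = 50 * round(predicted / 50)  # round to the nearest 50

print(f"predicted sales: {predicted:.2f}, to the nearest 50: {nearest_50}")
```

The unrounded prediction is about 3 635,68, which lies closer to 3 650 than to 3 600.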
Thank you
MATHEMATICS GRADE 11

PAPER 2

LESSON 1:

Statistics: Ungrouped Data: Measures of Central Tendency & Dispersion
23 June 2024
1. TERMINOLOGY
Population: Collection of all potential observations that can be found
in a given situation.

Sample: Collection of observations representing only a portion of the
population.

Variable: A variable x has a value x_i for data item (observation)
number i.
Measures of Central Tendency

We will consider the following three measures of central tendency:
• Mean: Average of the observations
• Median: Middle value
• Mode: Most frequently occurring observation
MEASURES OF CENTRAL TENDENCY OF DATA IN A FREQUENCY TABLE
How to organise Ungrouped or Raw Data
• Ungrouped data or raw data is not arranged in any meaningful fashion.
Example: The number of SMS messages received (variable x) on a certain
day by 12 students may be recorded as: 0; 3; 6; 5; 2; 5; 4; 8; 3; 5; 5 and 7.
For further analysis by hand or PC, the set of raw data is usually
arranged in ascending order.
Discuss: Mode, Mean & Median
Organised in ascending order:
0 2 3 3 4 5 5 5 5 6 7 8
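
For the discussion of mode, mean and median, the three measures for this data set can be checked with Python's standard `statistics` module. This is a small illustrative sketch and not part of the lesson.

```python
from statistics import mean, median, mode

# Raw SMS counts in the order they were recorded.
raw = [0, 3, 6, 5, 2, 5, 4, 8, 3, 5, 5, 7]

ordered = sorted(raw)           # arrange in ascending order
data_mean = mean(ordered)       # sum of the values divided by 12
data_median = median(ordered)   # average of the 6th and 7th values
data_mode = mode(ordered)       # most frequently occurring value

print(ordered)
print(round(data_mean, 2), data_median, data_mode)
```

Here the median and the mode are both 5, while the mean is slightly lower because of the small values 0 and 2.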
Determining the Median for Ungrouped Data
• The median (Q2) is the middle value in the data set.
• Location: the $\left(\frac{n+1}{2}\right)^{\text{th}}$ position, provided the data is ordered.

n odd: 7, 13, 14, 17, 20
• Location of Q2 = $\frac{5+1}{2}$ = 3rd position
⇒ Q2 = 14

n even: 7, 13, 14, 17, 20, 21
• Location of Q2 = $\frac{6+1}{2}$ = 3,5th position
• Calculate Q2 = $\frac{14+17}{2}$ = 15,5
Calculating the Mode for Ungrouped Data

The mode is the most frequently occurring observation.

For ungrouped data:
Mo can be found by an inspection of the observations.
Consider the ordered ungrouped data
3; 5; 12; 12 and 13.
Mode: Mo = 12
There can be more than one mode, as in:
5 6 6 6 7 9 9 9 10
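
Finding the mode by inspection amounts to counting how often each value occurs. The sketch below uses `statistics.multimode` (available from Python 3.8), which returns every value that shares the highest frequency. The variable names are illustrative.

```python
from statistics import multimode

single = [3, 5, 12, 12, 13]
double = [5, 6, 6, 6, 7, 9, 9, 9, 10]

modes_single = multimode(single)   # one value occurs most often
modes_double = multimode(double)   # two values tie for most frequent

print(modes_single, modes_double)
```

The second data set is bimodal: 6 and 9 each occur three times.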
SKEWNESS USING mean (x̄), median (Q2) and mode (Mo)

• The relative values of x̄, Q2 and Mo indicate the shape of the
distribution.

[Figure: three frequency curves: symmetrical, positive skew and
negative skew.]

Symmetrical:
x̄ = Q2 = Mo

Positive skew:
A few very large values
Tail develops on the right
x̄ and Q2 dragged to the right
Mo < Q2 < x̄

Negative skew:
More very large values
Tail develops on the left
x̄ and Q2 dragged to the left
x̄ < Q2 < Mo
Reliability of Measures of Central Tendency

The median is usually a good measure of the central tendency.
The median is determined by position and is therefore not
sensitive to skewness and/or outliers.

The mean is only a good measure of central tendency if the
data is symmetrical. If the data is skewed, or if there are outliers,
the mean is not reliable.

The mode is only considered significant if it has a high
frequency.
Calculating the Mean of Data
DISCUSSION Activity

1. Develop learners' calculator skills.
2. This activity must be completed across Grades 10 – 12.

Assessment Activities

Assessment activities aimed at further development of conceptual
understanding.
Conclusion : Summary of Key Points

Check that you are able to:

❖ Define terms in statistics.
❖ Organise raw data using suitable methods.
❖ Explain the meaning of, and determine, measures of central
tendency and dispersion.
❖ Describe skewness with suitable motivation.
Five-number Summary

Understanding of the five-number summary (Min, Q1, Q2, Q3, Max):
1) The Minimum value in the data set;
2) Q1: the Lower Quartile;
3) Q2: the Median;
4) Q3: the Upper Quartile;
5) The Maximum value in the data set.

Relate the five-number summary to the five fingers.
Box-and-Whisker Plot
A box-and-whisker plot is a graph representing the sample distribution
by a rectangular box covering the IQR, with whiskers at the ends of the
box indicating the dispersion beyond the interquartile range.
NB: The box-and-whisker plot must be drawn to scale and fully labelled
(five-number summary).

• IQR = Q3 − Q1 (middle 50% of observations)

[Figure: box-and-whisker plot. The box runs from Q1 to Q3, spans the
IQR and has Q2 marked inside it. The whiskers extend to Min (lowest
class limit) and Max (largest class limit).]
Box-and-Whisker Plot

• 25% of the data lies between the Minimum value and Q1
• 25% of the data lies between Q1 and the Median
• 25% of the data lies between the Median and Q3
• 25% of the data lies between Q3 and the Maximum value
• 50% of the data lies between Q1 and Q3

Clarify the percentiles with learners. The following are some special
percentiles:
▪ The median is at the 50th percentile.
▪ The lower quartile is at the 25th percentile.
▪ The upper quartile is at the 75th percentile.
Activity : Box - and - Whisker Plot
Arrange data in increasing order:
10 13 13 17 18 19 20 21 22 23 24 26 27 30 32

Median = Q2 = P50 = 21
Lower Quartile = Q1 = P25 = 17
Upper Quartile = Q3 = P75 = 26
Min = 10; Max = 32
5-Number Summary: (10; 17; 21; 26; 32)
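
The five-number summary above can be reproduced with a short function. This is a minimal sketch that assumes the quartile convention used in this lesson: Q1 is the median of the values below the median position and Q3 is the median of the values above it. Other software may use a different quartile convention and give slightly different answers. The function name is hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (Min, Q1, Q2, Q3, Max) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # values below the median position
    upper = values[half + n % 2:]   # values above the median position
    return (values[0], median(lower), median(values),
            median(upper), values[-1])


data = [10, 13, 13, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 30, 32]
summary = five_number_summary(data)
iqr = summary[3] - summary[1]       # IQR = Q3 - Q1

print(summary, iqr)
```

When n is odd, the middle value itself is left out of both halves, which is why Q1 and Q3 here are the 4th and 12th values.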
Interpret: Box-and-Whisker Plot

• Clarify the relationship between the mean and the median of the
distribution of data and clarify the skewness:

✓ Note that if the mean and the median of a data set are known, then
• If mean – median ≈ 0, then the distribution is symmetric
• If mean – median > 0, then the distribution is positively skewed
• If mean – median < 0, then the distribution is negatively skewed

Note: The length of a whisker does not describe the skewness.
Different lengths could be the result of outliers.
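
The mean minus median rule can be written as a small helper. This is a sketch only: the function name and the `tolerance` used to decide when the difference counts as approximately 0 are assumptions, since the lesson does not say how close to 0 is close enough. The second data set is hypothetical and included purely to show a skewed case.

```python
from statistics import mean, median


def describe_skewness(data, tolerance=0.5):
    """Classify skewness from the sign of mean - median.

    tolerance is an assumed cut-off for 'approximately 0'.
    """
    diff = mean(data) - median(data)
    if abs(diff) <= tolerance:
        return "symmetric"
    return "positively skewed" if diff > 0 else "negatively skewed"


# Data from the box-and-whisker activity: mean and median are both 21.
activity = [10, 13, 13, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 30, 32]

# Hypothetical data with one very large value.
stretched = [1, 2, 3, 4, 50]

shape_activity = describe_skewness(activity)
shape_stretched = describe_skewness(stretched)

print(shape_activity, shape_stretched)
```

A single very large value pulls the mean above the median, which is what the rule picks up as positive skewness.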
DISCUSSION ACTIVITY
Marks for learners in Class A:
9    14   14   19   21
23   33   35   37   37
42   45   55   56   57
59   68   75   75   75
77   78   80   81   92

[Figure: box and whisker plot for learners in Class B.]

1. Write down the five-number summary for Class A.
2. Draw the box and whisker diagram that represents
Class A's marks. Clearly indicate ALL relevant values.
3. Determine which class performed better in the June
examination and give reasons for your conclusion.
SOLUTIONS
1. Write down the five-number summary for Class A.

Position  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Mark      9  14 14 19 21 23 33 35 37 37 42 45 55 56 57 59 68 75 75 75 77 78 80 81 92

Min = 9
Q1 = P25 = $\frac{23+33}{2}$ = 28
Q2 = P50 = 55
Q3 = P75 = $\frac{75+75}{2}$ = 75
Max = 92

Five-Number Summary
(Min, Q1, Q2, Q3, Max) = (9, 28, 55, 75, 92)
ACTIVITY
Five-Number Summary for Class A
(Min, Q1, Q2, Q3, Max) = (9, 28, 55, 75, 92)
2. Draw the box and whisker diagram that represents Class A's marks.

[Figure: box and whisker diagram for Class A.]

3. Determine which class performed better in the June
examination and give reasons for your conclusion.

Class B performed better:
Q2 (Class B) > Q2 (Class A)
50% of Class B obtained a mark ≥ 60%
50% of Class A obtained a mark ≥ 55%

DISTRIBUTION OF DATA
Normal distribution curve
[Figure: symmetrical curve with the mean and median marked.]
Mean – median = 0 OR Mean = median
(the mean coincides with the median)

DISTRIBUTION OF DATA – CONT…
Skewed to the left
[Figure: curve with the tail on the left; mode, median and mean marked.]
Mean – median < 0 OR Mean < median
(mean to the left of the median)

DISTRIBUTION OF DATA – CONT…
Skewed to the right
[Figure: curve with the tail on the right; mode, median and mean marked.]
Mean – median > 0 OR Mean > median
(mean to the right of the median)
STANDARD DEVIATION AND VARIANCE

• Standard deviation (s or σ) is a measure of how spread out
numbers are. It is the square root of the variance, which is the
average of the squared differences from the mean.
• The range is not always a good measure of dispersion, as it does
not eliminate extreme values.
• Unlike the range, which only considers the extreme values, the
variance considers all the data points and then determines their
distribution.
• Note: A small standard deviation indicates that the data points are
closely distributed about the mean, while a large standard deviation
indicates that the data points are more spread out.
STANDARD DEVIATION AND VARIANCE (USING A CALCULATOR)

Clarify the relationship between the standard deviation and the
variance.
Use your calculator to find the standard deviation.
• We are often required to find out how many items are within 1, 2 or 3 standard
deviations of the mean:
• 1 standard deviation from the mean is denoted as follows: (x̄ − σ; x̄ + σ)
• 2 standard deviations from the mean is denoted as follows: (x̄ − 2σ; x̄ + 2σ)
• 3 standard deviations from the mean is denoted as follows: (x̄ − 3σ; x̄ + 3σ)

Make learners aware: these are not in the formula sheet.
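
Counting the items inside a standard deviation interval is easy to automate. The sketch below reuses the 15 values from the box-and-whisker activity and assumes the population standard deviation (`statistics.pstdev`), dividing by n. If the sample standard deviation is wanted instead, `statistics.stdev` would be the function to use. The helper name `within` is illustrative.

```python
from statistics import mean, pstdev

data = [10, 13, 13, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 30, 32]

x_bar = mean(data)
sigma = pstdev(data)   # population standard deviation (assumption)


def within(k):
    """Values strictly inside (x_bar - k*sigma ; x_bar + k*sigma)."""
    low, high = x_bar - k * sigma, x_bar + k * sigma
    return [v for v in data if low < v < high]


count_1 = len(within(1))
count_2 = len(within(2))
percent_1 = 100 * count_1 / len(data)

print(f"mean = {x_bar}, sigma = {sigma:.2f}")
print(f"within 1 sd: {count_1} values ({percent_1:.1f}%)")
print(f"within 2 sd: {count_2} values")
```

For this data set, 10 of the 15 values lie within one standard deviation of the mean, and all 15 lie within two.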
STANDARD DEVIATION AND VARIANCE CONT…

Standard deviation around the mean: values below, within, above and
outside the interval, and the related percentages.
• How many values are below, within, above or outside the interval?
• What percentage of the values are below, within, above or outside
the interval?

Discuss with the learners.
STANDARD DEVIATION AND VARIANCE (WITHOUT A CALCULATOR)
SOLUTION
IDENTIFYING OUTLIERS

In a set of data, it sometimes happens that a particular number is extremely high or low in
comparison to the other numbers. Such a number is called an outlier.

In 1977, the statistician John Tukey invented box and whisker plots and defined an outlier
to be any number in a data set which falls outside the interval:

(Q1 − 1,5 × IQR ; Q3 + 1,5 × IQR)

where Q1 and Q3 represent the lower and upper quartiles respectively. IQR represents the
inter-quartile range, IQR = Q3 − Q1.

Q1 − 1,5 × IQR is called the lower fence AND

Q3 + 1,5 × IQR is called the upper fence.
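
The fence rule can be applied directly once Q1 and Q3 are known. The sketch below uses the standard 1,5 × IQR fences with the quartiles Q1 = 17 and Q3 = 26 from the earlier box-and-whisker activity. The function name `fences` and the test value 45 are hypothetical, added for illustration.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [10, 13, 13, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 30, 32]
q1, q3 = 17, 26   # quartiles found in the box-and-whisker activity

lower, upper = fences(q1, q3)
outliers = [v for v in data if v < lower or v > upper]

# A hypothetical extra mark of 45 would lie beyond the upper fence.
hypothetical_is_outlier = not (lower <= 45 <= upper)

print(lower, upper, outliers, hypothetical_is_outlier)
```

None of the 15 observed values falls outside the fences, so this data set has no outliers.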


IDENTIFYING OUTLIERS – EXAMPLE

You will notice that the mean for the data including the outliers is different to the mean
when the outliers are excluded. The standard deviations in each case are also different.
The value of the median is only slightly affected by the exclusion of the outliers.
IDENTIFYING OUTLIERS - ACTIVITY
ACTIVITIES
SOLUTIONS
Conclusion : Summary of Key Points

Check that you are able to:

❖ State the five-number summary.
❖ Draw and interpret a box-and-whisker diagram.
❖ Use a box-and-whisker diagram to determine and describe skewness of
data.
❖ Compare, and make informed decisions from, at least two
box-and-whisker plots.
❖ Calculate and interpret variance, standard deviation and standard
deviation intervals.
❖ Identify outliers using a suitable criterion.
Concluding Remarks

The NEXT lesson will focus on histograms, frequency polygons and
ogives, which link with the work we completed today.
MATHEMATICS GRADE 11

PAPER 2

LESSON 3:

Statistics: Grouped Data: Histograms, Frequency Polygons, Ogives
24 June 2024
Grouped Data: Learning goals

Develop learners' ability to:

1. Generate/complete the frequency table and draw the ogive.
2. Draw the ogive from the histogram.
3. Identify the modal class on the graph (the steepest segment between
two plotted points) and write down its class interval.
4. Use the ogive/graph to determine Q1, Q2 & Q3, then sketch the box
and whisker plot (relate the quartiles to percentiles).
5. Show the estimated Q1, Q2 & Q3 on the frequency table.
6. Determine the estimated mean and standard deviation from the table.
7. Interpret the given ogive/graph.
RECAP: CUMULATIVE FREQUENCY

• Cumulative frequency shows the number of results that are
less than (<) or less than or equal to (≤) a stated value in a set of
data.
• To find the cumulative frequency:
• Add up the frequencies as you go down the frequency table.
• Write each running total or cumulative frequency in your
table.
NB: You can find cumulative frequencies of discrete data
and continuous data.
OGIVE CURVE

• An ogive or cumulative frequency curve is a graph that
shows the information in a cumulative frequency table.
The graph is useful for estimating the median and
inter-quartile range of the grouped data.
• You can draw an ogive of ungrouped discrete data,
grouped discrete data or grouped continuous data. It can
be drawn from a grouped frequency table or an ungrouped
frequency table.
EXAMPLE
• The following frequency table shows the time (in minutes) taken
by learners to travel to school.

Time taken to       Frequency   Cumulative        Ordered Pairs
travel to school                Frequency
−10 < t ≤ 0         0           0                 (0; 0)
0 < t ≤ 10          4           0 + 4 = 4         (10; 4)
10 < t ≤ 20         12          4 + 12 = 16       (20; 16)
20 < t ≤ 30         28          16 + 28 = 44      (30; 44)
30 < t ≤ 40         32          44 + 32 = 76      (40; 76)
40 < t ≤ 50         29          76 + 29 = 105     (50; 105)
50 < t ≤ 60         15          105 + 15 = 120    (60; 120)
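
The running totals in the table can be generated with `itertools.accumulate`. This is a short illustrative sketch: the list names are assumptions, and each cumulative frequency is paired with the upper limit of its class interval to give the points that are plotted on the ogive.

```python
from itertools import accumulate

# Upper limit of each class interval and the class frequency.
upper_limits = [0, 10, 20, 30, 40, 50, 60]
frequencies = [0, 4, 12, 28, 32, 29, 15]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Points to plot: (upper limit ; cumulative frequency).
ordered_pairs = list(zip(upper_limits, cumulative))

for pair in ordered_pairs:
    print(pair)
```

The last cumulative frequency equals the total number of learners, which is a useful check on the table.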
Example continued…

Draw the ogive as follows:


i) Draw the axes and label the variable on the x-axis and the
cumulative frequency on the y-axis.
ii) Plot the ordered pairs.
iii) Join the points to form a smooth curve.
Emphasise:
✓ Grounding the ogive: it starts at a cumulative frequency of 0.
✓ The last plotted point is the end of the graph; the ogive does not
go beyond this point.
Solutions

NB: When drawing a cumulative frequency curve from a table of grouped data,
always plot the cumulative frequencies at the upper limit of each interval.
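As a rough sketch of the note above, the ordered pairs for the travel-time example can be built by pairing each upper class limit with its cumulative frequency. The function name `ogive_points` and the variable names are assumptions made for this illustration only.

```python
from itertools import accumulate

# Upper limits and frequencies from the travel-time table, including the
# grounding class -10 < t <= 0 with frequency 0.
upper_limits = [0, 10, 20, 30, 40, 50, 60]
frequencies = [0, 4, 12, 28, 32, 29, 15]


def ogive_points(upper_limits, frequencies):
    """Pair each upper class limit with the cumulative frequency up to it."""
    return list(zip(upper_limits, accumulate(frequencies)))


points = ogive_points(upper_limits, frequencies)
for upper_limit, cumulative_frequency in points:
    print(f"({upper_limit}; {cumulative_frequency})")
```

The first pair grounds the ogive at a cumulative frequency of 0, and the last pair is the final point of the graph.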
Using the ogive to get Q1; Q2 and Q3

Use the ogive drawn in the Example to:


a) Determine the approximate values of
i) the median
ii) the lower quartile
iii) the upper quartile of the set of data.
b) What does each of these values tell you about the time taken
by the learners?
Solution
How to plot Q1, Q2, Q3 on the ogive
i) To find the approximate value of the median (M), find the midpoint of the
values plotted on the cumulative frequency axis.
• The maximum value is 120, so the median lies between the 60th and 61st
term.
• Draw a horizontal line from just above 60 until it touches the ogive.
• From that point draw a vertical line down to the horizontal axis.
• So the median ≈ 35 minutes.
ii) To find the approximate value of the lower quartile (Q1), find the midpoint
of the lower half of the values plotted on the cumulative frequency axis.
• There are 60 terms in the lower half of the data, so the lower quartile
lies between the 30th and the 31st term.
• Draw a horizontal line from just above 30 until it touches the ogive.
• From that point draw a vertical line down to the horizontal axis.
• So the lower quartile ≈ 25 minutes.
Solution continues

iii) To find the approximate value of the upper quartile (Q3), find the midpoint of the
upper half of the values plotted on the cumulative frequency axis.
• There are 60 terms in the upper half of the data, so the upper quartile lies between
the 90th (60 + 30) and the 91st term.
• Draw a horizontal line from just above 90 until it touches the ogive.
• From that point draw a vertical line down to the horizontal axis.
• So the upper quartile ≈ 45 minutes.
b)
i) The median tells us that 50% of the learners took about 35 minutes or less to
travel to school.
ii) The lower quartile tells us that 25% of the learners took about 25 minutes or less
to travel to school.
iii) The upper quartile tells us that 75% of the learners took about 45 minutes or less
to travel to school.
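The graph readings above can be checked numerically. The sketch below assumes the plotted points are joined by straight lines (the lesson draws a smooth curve, so readings from a hand-drawn graph may differ slightly) and reads the graph at exactly n/4, n/2 and 3n/4 on the cumulative frequency axis. The function name `read_ogive` is hypothetical.

```python
def read_ogive(points, target):
    """Return the x-value where a straight-line ogive reaches `target`."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Linear interpolation within the class that contains the target.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the range of the ogive")


# Ordered pairs (upper limit; cumulative frequency) from the example.
points = [(0, 0), (10, 4), (20, 16), (30, 44), (40, 76), (50, 105), (60, 120)]
n = points[-1][1]  # total number of learners, 120

q1 = read_ogive(points, n / 4)
median = read_ogive(points, n / 2)
q3 = read_ogive(points, 3 * n / 4)

print(f"Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}")
```

Under these assumptions the estimates agree with the values read from the graph: 25, 35 and approximately 45 minutes.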
OGIVE
Determining cumulative frequencies is an effective way of representing
grouped data. If you want to find the median of grouped data from a
frequency table, a useful way to do this is by first determining the
cumulative frequencies from the frequency table and then representing
the information on a cumulative frequency graph (or ogive curve).

EXAMPLE
The company HEALTHMANIA conducted a survey in Gauteng to find out
which age group most frequently uses their health supplements. The
company determined the ages of a representative sample of their
current client group. The ages of current clients were recorded and then
sorted. The company wanted to market a new health supplement to the
age group in Gauteng which most frequently uses their products.
OGIVE CONT…
NOTICE:
The total frequency of marks (77) is equal to the final cumulative frequency (77).
We can use the graph to determine estimates of the quartiles and percentiles for this data.

Position of median = $\frac{1}{2}(n + 1) = \frac{1}{2}(77 + 1) = 39$th position

Position of lower quartile = $\frac{1}{4}(n + 1) = \frac{1}{4}(77 + 1) = 19{,}5$th position

Position of upper quartile = $\frac{3}{4}(n + 1) = \frac{3}{4}(77 + 1) = 58{,}5$th position

Percentiles divide the data into hundredths. For example, the 95th percentile
can be determined as follows: $\frac{95}{100} \times 77 = 73{,}15 \approx 73$rd client
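The position calculations for n = 77 can be verified with a short script. This is a minimal sketch; the variable names are assumptions, and exact fractions are used so that no rounding error creeps in.

```python
from fractions import Fraction

n = 77  # total frequency in the HEALTHMANIA example

# Positions from the (n + 1) rule used in this lesson.
median_pos = Fraction(1, 2) * (n + 1)
lower_pos = Fraction(1, 4) * (n + 1)
upper_pos = Fraction(3, 4) * (n + 1)

# The lesson finds the 95th percentile position as 95/100 of n.
p95_pos = Fraction(95, 100) * n

for label, pos in [
    ("Median", median_pos),
    ("Lower quartile", lower_pos),
    ("Upper quartile", upper_pos),
    ("95th percentile", p95_pos),
]:
    print(f"{label}: position {float(pos)}")
```

Each position is then located on the cumulative frequency axis of the ogive and read across to the curve to estimate the corresponding age.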
Assessment Activities

Assessment Activities for Conceptual Development
ACTIVITY
Diagram sheet 2
SOLUTIONS
Diagram sheet 1
WORK AREA
EXAM QUESTIONS – AS ACTIVITIES
EXAM QUESTIONS – SOLUTIONS
Conclusion : Summary of Key Points

Check that you are able to:

❖Draw and interpret histograms, frequency polygons, cumulative
frequency tables and ogives.
❖Determine estimates of quartiles or percentiles using an ogive.
Assessment Activities : Next Level

Assessment Activities for Exam Readiness
Assessment Activities

SOLUTIONS
Conclusion : Summary of Key Points

Check that you are able to:

❖Determine and interpret measures of central tendency and dispersion in
ungrouped or grouped data.
❖Draw and interpret statistical tables, diagrams and graphs
❖Identify and describe skewness of data with appropriate motivation
❖Identify an outlier
Concluding Remarks

Following today's lesson, I want you to do the following:

Read through what the learner needs to understand and master in your learner
material.

Complete the activities.

Attempt as many other similar examples as possible on your own from the
textbook and the past exam papers.

Repeat this procedure until you are confident.

Do not forget: Practice makes perfect!
Thank you
