Grade 12 & 12 Promaths STATS 2024 June 2024
GRADE 12
TOPIC : STATISTICS
DISCRETE DATA
• Data that can only take certain values. For example, the number of
learners in a class (there can’t be half a learner)
CONTINUOUS DATA
• Data that can take on any value within a certain range. For example,
the heights of a group of learners (heights could be measured in
decimals)
Scatter plot & regression line
Example of a Scatter Plot
Types of Correlation (r)
Scatter Plot: Strong Positive Correlation (r)
Scatter Plot: Moderate Positive Correlation (r)
Scatter Plot: Negative Correlation (r)
Scatter Plot: Moderate Negative Correlation (r)
Scatter Plot: Relationship
Scatter Plot & Line of Best Fit
Correlation between two variables: Predictions
• Interpolation: Estimate values of y for values of x within the observed range.
• Extrapolation: Estimate values of y outside the range of observed values.
Examples of Bivariate Data
Consider the following three sets of collected bivariate data:
Example 1 :
Observed Data:
x −4 −2 −1 1 4
y −11 −7 −5 −1 5
Key Questions :
• Is there a relationship between x and y ? Can y be expressed as a relation in x ?
• Can we determine the defining equation for such a relation? Can we
extend the observed data?
• Select STAT LINEAR MODE on your calculator.
• Input OBSERVED DATA from table into calculator.
• RECALL from Calculator values of A (y − intercept) and B (gradient).
• Write down defining equation in the form y = Bx + A.
Defining Equation: y = 2x − 3
• Utilize this defining equation to extend data table.
• OR utilize Calculator Capacity to extend table.
ŷ(−3) = −9 ; ŷ(0) = −3 ; ŷ(2) = 1 ; ŷ(3) = 3
x −4 −3 −2 −1 0 1 2 3 4
y −11 −9 −7 −5 −3 −1 1 3 5
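The calculator steps above can be mirrored in a few lines of Python. This is only a sketch: the helper name `least_squares` is an assumption made for illustration, and it uses the standard least squares formulas rather than any particular calculator's internals.

```python
def least_squares(xs, ys):
    """Return (A, B) for the line y = A + Bx fitted by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # gradient (B on the calculator)
    a = y_bar - b * x_bar    # y-intercept (A on the calculator)
    return a, b

# Observed data from Example 1
xs = [-4, -2, -1, 1, 4]
ys = [-11, -7, -5, -1, 5]

a, b = least_squares(xs, ys)

# Extend the table to every integer x from -4 to 4
extended = {x: a + b * x for x in range(-4, 5)}
print(f"y = {b:.2f}x + ({a:.2f})")
print(extended)
```

Because every observed point lies exactly on the line, the fitted values agree with the defining equation y = 2x − 3 up to floating-point rounding.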
Strength of Linear Relationship
[Figure: two scatter plots. Left: strong, positive linear correlation. Right: weak, negative linear correlation.]
1. −1 ≤ r ≤ 1
2. r = 1 A perfect positive linear relationship
(All points exactly on line with a positive gradient)
3. r = −1 A perfect negative linear relationship.
(All points exactly on line with a negative gradient)
4. r = 0 No linear relationship.
r Interpretation
1 Perfect positive association
0,9 Strong positive association
0,5 Moderate positive association
0,2 Weak positive association
0 No association
−0,2 Weak negative association
−0,5 Moderate negative association
−0,9 Strong negative association
−1 Perfect negative association
N.B: The description of r is not provided in the formula sheet (remind learners).
Coefficient of Determination.
Important question: How well does the straight line
represent the relationship?
[Figure: two scatter plots with regression lines. Left: the regression line can be seen as a good fit for the observed data. Right: the regression line is not a good fit for the observed data.]
Coefficient of determination = r-squared = r², expressed as a percentage: r² × 100%
What percentage of variation in y is explained by variations in x ?
Activity
Example : Eight randomly selected families were asked about their monthly
incomes and amounts saved in that month. The following data was
recorded:
Amount saved: 500; 1 900; 1 100; 2 500; 1 500; 2 300; 800; 2 100
Monthly Income: 10 000; 19 500; 13 400; 27 000; 16 000; 22 000; 12 000; 20 000
General Trend
• Linear relationship
• Positive gradient
• Savings increases with increase in income
Correlation Coefficient
Monthly Income: x 10 000; 19 500; 13 400; 27 000; 16 000; 22 000; 12 000; 20 000
Amount saved: y 500; 1 900; 1 100; 2 500; 1 500; 2 300; 800; 2 100
Procedure :
• Select STATS LINEAR MODE
• Input available data
• RECALL value of r
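As a cross-check on the calculator procedure, the correlation coefficient for the income and savings data can be computed directly. This is a minimal sketch; the function name `pearson_r` is an assumption for illustration, and it applies the standard product-moment formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Monthly income (x) and amount saved (y) for the eight families
income = [10000, 19500, 13400, 27000, 16000, 22000, 12000, 20000]
saved = [500, 1900, 1100, 2500, 1500, 2300, 800, 2100]

r = pearson_r(income, saved)
r_squared = r ** 2   # coefficient of determination

print(f"r = {r:.3f}")
print(f"r squared = {r_squared:.3f} ({100 * r_squared:.1f}% of the variation in y)")
```

A value of r close to 1 is consistent with the general trend noted above: savings increase with income along a nearly straight line.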
Activity 1
A training manager wants to know if there is a link between the hours in training (x)
spent by a particular category of employee and their productivity (units produced per day, y )
on a job. The data below was extracted from the files of 10 employees.
Employee 1 2 3 4 5 6 7 8 9 10
Hours in training ( x) 16 36 20 38 40 30 35 22 40 24
Units produced per day ( y ) 45 70 44 56 60 48 75 60 63 38
Use Calculator:
Select STAT Mode
Select line option: Y = A + BX
Input data
Recall A = 29.22 and B = 0.89
Defining equation: y = 0.89x + 29.22
Equation Least Square Line: Using Formulae
Use relevant formulae:
ŷ = a + bx where
a = ȳ − b x̄ and
b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
Equation of least squares line:
y = 0.89x + 29.22 (Same result)
Due to mark allocation the calculator option is preferred.
Sketch Least Square Line
Always one possibility: (x̄; ȳ) = (30.1; 55.9)
Determine at least 3 points on line:
ŷ(20) = 46.9, so (20; 46.9) is a second possible point
ŷ(40) = 64.7, so (40; 64.7) is a third possible point
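The formula route for Activity 1 can be checked step by step in Python. This is a sketch under the assumption that a and b are obtained from the standard least squares formulas; the variable names are chosen here for illustration only.

```python
# Activity 1 data: hours in training (x) and units produced per day (y)
hours = [16, 36, 20, 38, 40, 30, 35, 22, 40, 24]
units = [45, 70, 44, 56, 60, 48, 75, 60, 63, 38]

n = len(hours)
x_bar = sum(hours) / n   # mean of x
y_bar = sum(units) / n   # mean of y

# b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), then a = y_bar - b * x_bar
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, units))
     / sum((x - x_bar) ** 2 for x in hours))
a = y_bar - b * x_bar

def y_hat(x):
    """Predicted units per day for x hours of training."""
    return a + b * x

# Three points for sketching the line: the mean point plus two predictions
points = [(x_bar, y_bar), (20, y_hat(20)), (40, y_hat(40))]
print(f"y = {b:.2f}x + {a:.2f}")
print([(round(px, 1), round(py, 1)) for px, py in points])
```

The mean point (x̄; ȳ) always lies on the least squares line, which is why it is "always one possibility" when sketching.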
Activity 2
The relationship between age and spending money on buying clothes per month has
been studied for years. Research has shown the following results:
Activity 2 (continued)
2. Determine the equation of the regression line and draw it on the scatter plot.
3. Describe the trend of the data with reference to the correlation coefficient.
4. Estimate the expenditure of a 50-year-old person from the scatter plot.
Solution
1. [Figure: scatter plot with Age on the horizontal axis]
2. [Figure: regression line drawn on the scatter plot]
3. r = −0,995
Strong negative correlation.
4. Approx. R600
5. Interpolation.
Data value is within the given range.
Activity 2
(1) Estimate the productivity level for a particular employee
who has received only 22 hours of training.
(2) Determine the correlation between productivity and hours of training.
(3) Is the association strong? Advise the manager.
…..Activity 3
(1) Take a reading from graph or
Determine yˆ(22) = 48.72 using calculator or
Calculate: ŷ(22) = 0.89(22) + 29.22 = 48.8
(2) Using calculator: r = 0.66
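The two answers for ŷ(22) differ slightly because one uses the calculator's unrounded coefficients and the other uses the rounded equation. The sketch below reproduces both, plus r, for the training data; the helper name `fit_and_r` is an assumption for illustration.

```python
from math import sqrt

def fit_and_r(xs, ys):
    """Return (a, b, r) for the least squares line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / sqrt(sxx * syy)

hours = [16, 36, 20, 38, 40, 30, 35, 22, 40, 24]
units = [45, 70, 44, 56, 60, 48, 75, 60, 63, 38]

a, b, r = fit_and_r(hours, units)

exact_prediction = a + b * 22                              # unrounded coefficients
rounded_prediction = round(b, 2) * 22 + round(a, 2)        # rounded equation

print(f"calculator-style prediction: {exact_prediction:.2f}")
print(f"rounded-equation prediction: {rounded_prediction:.2f}")
print(f"r = {r:.2f}")
```

Either prediction is acceptable at this precision; the difference comes only from rounding a and b before substituting.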
Age ( x) 59 32 42 50 22 39 21 20 27 40 29 47
Resting heart rate ( y ) 88 74 74 93 85 71 78 82 70 75 95 75
y = 0.0954x +76.5956
…..Solution
(4) Calculate the correlation coefficient for the data.
(5) Use the correlation coefficient to comment on the relationship
between age and the resting heart rate.
(6) If a learner uses the least square line to predict the resting heart rate
of a 45-year-old person, will his answer be reliable? Motivate your answer.
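Questions (4) to (6) hinge on the size of r for the age and resting heart rate data. A sketch of the calculation follows, assuming the standard product-moment formula; the function name `correlation` is hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

age = [59, 32, 42, 50, 22, 39, 21, 20, 27, 40, 29, 47]
heart_rate = [88, 74, 74, 93, 85, 71, 78, 82, 70, 75, 95, 75]

r = correlation(age, heart_rate)
print(f"r = {r:.2f}")
```

If r turns out close to 0, the least square line explains very little of the variation in heart rate, so a prediction for a 45-year-old would not be reliable even though 45 lies within the observed ages.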
(1) Draw a scatter plot of the data given on a grid: Left as an exercise.
(2 ) Calculate the equation of the least square line for this data: y = 0.81x + 25.23
(3) Calculate the correlation coefficient: r = 0.898
(4 ) Comment on the correlation of the data.
Strong positive correlation
(5 ) If Joan's heart rate after jogging is 86 beats per minute,
what is her resting heart rate, in beats per minute?
xˆ(86) = 74.6 beats per minute
Activity 6
The outdoor temperature, in °C, at noon on 10 days and the
number of units of electricity, in kW, used to heat a house on
each of those days, are shown in the table below.
Noon temperature: T (in °C) 7 11 9 2 4 7 0 10 5 3
Units of electricity used: E (in kW) 32 20 27 37 32 28 41 23 33 36
(1) Draw a scatter graph that shows this information on a grid.
(2 ) Determine the equation of the least squares regression line.
(3) Determine the correlation coefficient.
(4 ) What can we conclude about the relationship between the noon
temperature and the number of units of electricity used for heating?
(5) Estimate the number of units of electricity that was used to heat the
house on a day when the outdoor temperature at noon was 8 °C.
Solution
Use Calculator to show that:
Least square regression line is defined by
y = −1.73639x +40.971088
Correlation Coefficient: r = −0.969926
Show that: yˆ(8) = 27.07993
(1) Draw a scatter graph that shows this information on a grid: Left as exercise
(2 ) Determine the equation of the least squares regression line: y = −1.74x + 40.97
(3) Determine the correlation coefficient: r = −0.969926
(4 ) What can we conclude about the relationship between the noon
temperature and the number of units of electricity used for heating?
Strong negative correlation (r tends to −1)
If the noon temperature increases, the electricity usage decreases
(5) Estimate the number of units of electricity that was used to heat the
house on a day when the outdoor temperature at noon was 8 °C: ŷ(8) = 27.07993
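The "use calculator to show that" results for Activity 6 can also be verified in Python. This is a sketch using the standard least squares and correlation formulas; the helper name `regression` is an assumption for illustration.

```python
from math import sqrt

def regression(xs, ys):
    """Return (a, b, r) for the least squares line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / sqrt(sxx * syy)

temperature = [7, 11, 9, 2, 4, 7, 0, 10, 5, 3]          # T
electricity = [32, 20, 27, 37, 32, 28, 41, 23, 33, 36]  # E

a, b, r = regression(temperature, electricity)
estimate_at_8 = a + b * 8   # interpolation: 8 lies within the observed range 0 to 11

print(f"y = {b:.5f}x + {a:.6f}")
print(f"r = {r:.6f}")
print(f"estimate at 8 degrees: {estimate_at_8:.5f}")
```

Since 8 °C falls inside the observed temperatures and r is close to −1, the estimate can be treated as reasonably reliable.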
Activity 7
The scatter plot below represents the times taken by the winners of the men's
100 m freestyle swimming event at the Olympic Games from 1972 to 2004.
The data was obtained from www.databaseOlympics.com.
1. Calculate the equation of the least square line for this data and draw it.
2. Describe the trend that is observed in these times.
3. Give ONE reason for this trend.
4. What can be said about the efforts of the winners in the years 1976 and 1988?
5. Use your line of best fit to predict the winning time for 2008.
Line of Best fit
1. Calculate the equation of the least square line for this data and draw it.
y = −0,0904x + 229
Best Fit Line
Draw a line of best fit for the data on the graph.
y = −0,0904x + 229
Calculator : Predicted Values
• ŷ(1972) = 50,79
• ŷ(1984) = 49,71
• ŷ(1996) = 48,62
Trends
2. Describe the trend that is observed in these times.
3. Give ONE reason for the trend.
[Figure: Time taken, men's 100 m freestyle, with the line y = −0,0904x + 229]
2. Negative gradient
Downward trend
Times decreased
Swimming faster
Improved performance
3. Better exercise methods
Controlled diets
Swimwear: Less friction
More professional approach
Interpretation
4. What can be said about the efforts of the winners in the years 1976 and 1988?
Year 1972 1976 1980 1984 1988 1992 1996 2000 2004
Time 51,2 50,0 50,4 49,8 48,6 49,0 48,7 48,3 48,1
Prediction
5. Use your line of best fit to predict the winning time for 2008.
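Questions 4 and 5 can be explored numerically with the table of winning times. The sketch below fits the least squares line, compares the 1976 and 1988 times with the line, and extends the line to 2008. Names such as `predict` and `residuals` are assumptions for illustration; decimal commas in the table are written as decimal points in the code.

```python
years = [1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004]
times = [51.2, 50.0, 50.4, 49.8, 48.6, 49.0, 48.7, 48.3, 48.1]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(times) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, times))
     / sum((x - x_bar) ** 2 for x in years))
a = y_bar - b * x_bar

def predict(year):
    """Winning time predicted by the least squares line."""
    return a + b * year

# Residual = actual time minus predicted time; negative means faster than the trend
residuals = {year: t - predict(year) for year, t in zip(years, times)}

prediction_2008 = predict(2008)   # extrapolation beyond the observed years

print(f"y = {b:.4f}x + {a:.0f}")
print(f"1976 residual: {residuals[1976]:.2f}, 1988 residual: {residuals[1988]:.2f}")
print(f"predicted 2008 time: {prediction_2008:.1f} s")
```

A negative residual places the point below the line, meaning that year's winner swam faster than the trend would suggest. The 2008 value is an extrapolation, so it should be treated with more caution than a value inside 1972 to 2004.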
Thank you
MATHEMATICS GRADE 11
PAPER 2
LESSON 1:
Statistics:
Ungrouped Data: Measures of
Central Tendency &
Dispersion
23 June 2024
Stats!!!
Ungrouped
Data
What is Statistics
1. TERMINOLOGY
Population: Collection of all potential observations
that can be found in a given situation.
Will consider the following three
Measures of Central Tendency:
• Mean: Average of observations
• Median: Middle Value
• Mode: Most frequently
occurring observation.
MEASURES OF CENTRAL TENDENCIES OF DATA IN A
FREQUENCY TABLE
How to organise
Ungrouped or Raw Data
• Not arranged in any meaningful fashion
• Ungrouped Data or Raw Data
Example : The number of SMS calls received (variable x) in a certain
day by 12 students may be recorded as: 0; 3; 6; 5; 2; 5; 4; 8; 3; 5; 5 and 7.
For further analysis by hand or PC the set of raw data is usually arranged
in an ascending order.
Organised in an ascending order:
0 2 3 3 4 5 5 5 5 6 7 8
Discuss : Mode, Mean & Median
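For the "by PC" route mentioned above, Python's standard `statistics` module gives the three measures directly. This is a small sketch using the SMS data; the variable names are chosen for illustration.

```python
from statistics import mean, median, mode

# Raw data: number of SMS calls received by 12 students
raw = [0, 3, 6, 5, 2, 5, 4, 8, 3, 5, 5, 7]

ordered = sorted(raw)           # arrange in ascending order first
mean_value = mean(ordered)      # sum of observations divided by n
median_value = median(ordered)  # middle value (average of the two middle values for even n)
mode_value = mode(ordered)      # most frequently occurring observation

print(ordered)
print(f"mean = {mean_value:.2f}, median = {median_value}, mode = {mode_value}")
```

Sorting is not required by these functions, but it matches the hand method and makes the median and mode easy to confirm by inspection.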
Determining the Median for Ungrouped Data
• Median (Q2) is the middle value in the data set.
• Location: the ((n + 1)/2)th position, provided data is ordered.
n odd: 7, 13, 14, 17, 20
• Location of Q2 = (5 + 1)/2 = 3rd position
• Q2 = 14
n even: 7, 13, 14, 17, 20, 21
• Location of Q2 = (6 + 1)/2 = 3,5th position
• Calculate Q2 = (14 + 17)/2 = 15,5
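The position rule can be written out as a short function. This is a sketch; the name `median_with_position` is an assumption for illustration, and the position is reported 1-based to match the slides.

```python
def median_with_position(values):
    """Return (position, median) using the (n + 1) / 2 location rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2            # 1-based location of Q2
    if n % 2 == 1:
        # Odd n: the position is a whole number, so read the value off directly
        return position, ordered[int(position) - 1]
    # Even n: the position ends in ,5, so average the two neighbouring values
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return position, (lower + upper) / 2

print(median_with_position([7, 13, 14, 17, 20]))      # n odd
print(median_with_position([7, 13, 14, 17, 20, 21]))  # n even
```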
Calculating the Mode for Ungrouped Data
For ungrouped data:
Mo can be found by an inspection
of the observations.
Consider the ordered ungrouped data
3; 5; 12; 12 and 13.
Mode: Mo = 12
There can be more than one mode.
Example: 5 6 6 6 7 9 9 9 10 (modes: 6 and 9)
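Finding the mode by inspection amounts to counting how often each value occurs. A sketch using `statistics.multimode` (available from Python 3.8) shows both the single-mode and the two-mode case; the variable names are illustrative.

```python
from statistics import multimode

single = [3, 5, 12, 12, 13]
double = [5, 6, 6, 6, 7, 9, 9, 9, 10]

# multimode returns every value that shares the highest frequency
modes_single = multimode(single)
modes_double = multimode(double)

print(modes_single)
print(modes_double)
```

Returning a list rather than one number makes the "more than one mode" case explicit.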
SKEWNESS USING
Symmetric: x̄ = Q2 = Mo
Skewed to the right: Mo < Q2 < x̄
• A few very large values
• Tail to develop on right
• x̄ and Q2 dragged to right
Skewed to the left: x̄ < Q2 < Mo
• More very large values
• Tail to develop on left
• x̄ and Q2 dragged to left
Reliability of Measures of Central Tendency
Assessment Activities
SOLUTION
Conclusion : Summary of Key Points
Five-number Summary
Box-and-Whisker Plot
[Figure: box-and-whisker plot showing Min (lowest class limit), Q1, Q2, Q3 and Max (largest class limit), with whiskers from Min to Q1 and from Q3 to Max]
Clarify the
Percentiles!!!
CLARIFY WITH LEARNERS
Median = Q2 = P50 = 21
Lower Quartile = Q1 = P25 = 17
Upper Quartile = Q3 = P75 = 26
Min = 10 Max = 32
5-Number Summary : (10; 17; 21; 26; 32)
Interpret : Box - and - Whisker Plot
• Clarify the relationship between the Mean and the Mode on the distribution
of data and clarify the skewness:
✓ Note that if the mean and the median of a data set are known, then
Min = 9
Q1 = P25 = (23 + 33)/2 = 28
P50 = Q2 = 55
Q3 = P75 = (75 + 75)/2 = 75
Max = 92
Five-Number Summary
(Min, Q1, Q2, Q3, Max) = (9, 28, 55, 75, 92)
ACTIVITY
Five-Number Summary for Class A
( Min, Q1 , Q2 , Q3 , Max ) = (9, 28, 55, 75, 92)
2. Draw the box and whisker diagram that represents class A's marks.
3. Determine which class performed better in the June
examination and give reasons for your conclusion.
STANDARD DEVIATION AND VARIANCE
• We are often required to find out how many items are within 1, 2 or 3 standard
deviations from the mean:
• 1 standard deviation from the mean is denoted as follows: (x̄ − σ ; x̄ + σ)
• 2 standard deviations from the mean is denoted as follows: (x̄ − 2σ ; x̄ + 2σ)
• 3 standard deviations from the mean is denoted as follows: (x̄ − 3σ ; x̄ + 3σ)
Make learners aware: These are not in the formula sheet
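Counting how many items fall within k standard deviations of the mean can be sketched as below. The data set is hypothetical (it does not come from the slides), the function name `count_within` is an assumption, the population standard deviation σ is used, and values exactly on an interval endpoint are counted as inside.

```python
from statistics import mean, pstdev

def count_within(values, k):
    """Count the items lying in [mean - k*sigma, mean + k*sigma]."""
    centre = mean(values)
    sigma = pstdev(values)   # population standard deviation
    low, high = centre - k * sigma, centre + k * sigma
    return sum(1 for v in values if low <= v <= high)

# Hypothetical data: mean 5 and standard deviation 2
marks = [2, 4, 4, 4, 5, 5, 7, 9]

within_one = count_within(marks, 1)
within_two = count_within(marks, 2)
percent_within_one = 100 * within_one / len(marks)

print(f"within 1 sd: {within_one} items ({percent_within_one:.0f}%)")
print(f"within 2 sd: {within_two} items")
```

Whether endpoints count as "within" should follow the wording of the question; the inclusive choice here is only one convention.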
STANDARD DEVIATION AND VARIANCE CONT…
Standard deviation around the mean (below, within, above, outside and
related percentages, etc.).
• How many are: below, within, above, outside.
• Percentage of those that are: below, within, above, outside.
Discuss with the learners
STANDARD DEVIATION AND VARIANCE (WITHOUT A CALCULATOR)
SOLUTION
IDENTIFYING OUTLIERS
In a set of data, it sometimes happens that a particular number is extremely high or low in
comparison to the other numbers. Such a number is called an outlier.
In 1977, the statistician John Tukey invented box and whisker plots and defined an outlier
to be any number in a data set which falls outside the interval:
(Q1 − 1,5 × IQR ; Q3 + 1,5 × IQR)
where Q1 and Q3 represent the lower and upper quartiles respectively. IQR represents the
inter-quartile range, IQR = Q3 − Q1.
You will notice that the mean for the data including the outliers is different to the mean
when the outliers are excluded. The standard deviations in each case are also different.
The value of the median is only slightly affected by the exclusion of the outliers.
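The outlier rule can be sketched in code as follows. The data set is hypothetical, the names `quartiles` and `find_outliers` are assumptions, and the quartiles are taken as the medians of the lower and upper halves of the ordered data (other quartile conventions can give slightly different fences).

```python
from statistics import mean, median

def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

def find_outliers(values):
    """Values outside (Q1 - 1.5*IQR ; Q3 + 1.5*IQR)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data with one unusually large value
data = [10, 12, 13, 14, 15, 16, 17, 40]

outliers = find_outliers(data)
cleaned = [v for v in data if v not in outliers]

print(f"outliers: {outliers}")
print(f"mean with outliers: {mean(data):.2f}, without: {mean(cleaned):.2f}")
print(f"median with outliers: {median(data)}, without: {median(cleaned)}")
```

Comparing the two means and the two medians illustrates the point above: the mean shifts noticeably when the outlier is removed, while the median barely moves.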
IDENTIFYING OUTLIERS - ACTIVITY
Assessment Activities
ACTIVITIES
SOLUTIONS
Conclusion : Summary of Key Points
Concluding Remarks
MATHEMATICS GRADE 11
PAPER 2
LESSON 3:
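Time t (minutes) | Frequency | Cumulative frequency
−10 < t ≤ 0 | 0 | 0
0 < t ≤ 10 | 4 | 0 + 4 = 4
10 < t ≤ 20 | 12 | 4 + 12 = 16
20 < t ≤ 30 | 28 | 16 + 28 = 44
30 < t ≤ 40 | 32 | 44 + 32 = 76
40 < t ≤ 50 | 29 | 76 + 29 = 105
50 < t ≤ 60 | 15 | 105 + 15 = 120
Example continued…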
NB: Always remember when drawing cumulative frequency curve from a table of
grouped data, the cumulative frequencies are plotted at the upper limit of the interval.
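Building the cumulative frequency column is a running total, and each total is paired with the upper limit of its interval for plotting. A minimal sketch using `itertools.accumulate` follows; the variable names are illustrative.

```python
from itertools import accumulate

# Upper limit of each class interval and its frequency, from the table
upper_limits = [0, 10, 20, 30, 40, 50, 60]
frequencies = [0, 4, 12, 28, 32, 29, 15]

# Running total of the frequencies
cumulative = list(accumulate(frequencies))

# Points to plot on the ogive: (upper limit, cumulative frequency)
ogive_points = list(zip(upper_limits, cumulative))

print(cumulative)
print(ogive_points)
```

The last cumulative frequency equals the total number of observations, which is a quick check that the column was added correctly.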
Using the ogive to get Q1; Q2 and Q3
iii) To find the approximate value of the upper quartile (Q3), find the midpoint of the
upper half of the values plotted on the cumulative frequency axis.
• There are 60 terms in the upper half of the data, so the upper quartile lies between
60 + 30 = 90th and the 91st term.
• Draw a horizontal line from just above 90 until it touches the ogive.
• From that point draw a vertical line down to the horizontal axis.
• So the upper quartile ≈ 45 minutes.
b)
i) The median tells us that 50% of the learners took 35 minutes or less to walk to
school.
ii) The lower quartile tells us that 25% of the learners took 25 minutes or less to walk to
school.
iii) The upper quartile tells us that 75% of the learners took 45 minutes or less to walk to
school.
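Reading a quartile off the ogive can be approximated by straight-line interpolation between plotted points. This is a sketch under two assumptions: the ogive is treated as straight segments between points, and the quartiles are read at cumulative frequencies n/4, n/2 and 3n/4. The function name `read_ogive` is hypothetical.

```python
def read_ogive(upper_limits, cum_freqs, target):
    """Estimate the value at which the cumulative frequency reaches target."""
    points = list(zip(upper_limits, cum_freqs))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            # Linear interpolation inside the interval that contains target
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the cumulative frequency range")

upper_limits = [0, 10, 20, 30, 40, 50, 60]
cum_freqs = [0, 4, 16, 44, 76, 105, 120]
n = cum_freqs[-1]

q1 = read_ogive(upper_limits, cum_freqs, n / 4)
q2 = read_ogive(upper_limits, cum_freqs, n / 2)
q3 = read_ogive(upper_limits, cum_freqs, 3 * n / 4)

print(f"Q1 = {q1:.1f}, Q2 = {q2:.1f}, Q3 = {q3:.1f} minutes")
```

The estimates agree with the graph readings of about 25, 35 and 45 minutes; a hand-drawn smooth curve may differ by a small amount.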
OGIVE
Determining cumulative frequencies is an effective way of representing
grouped data. If you want to find the median of grouped data from a
frequency table, a useful way to do this is by first determining the
cumulative frequencies from the frequency table and then representing
the information on a cumulative frequency graph (or ogive curve).
EXAMPLE
The company HEALTHMANIA conducted a survey in Gauteng to find out
which age group most frequently uses their health supplements. The
company determined the ages of a representative sample of their
current client group. The ages of current clients were recorded and then
sorted. The company wanted to market a new health supplement to the
age group in Gauteng which most frequently uses their products.
OGIVE CONT…
NOTICE:
The total frequency of marks (77) is equal to the final cumulative frequency (77).
We can use the graph to determine estimates of the quartiles and percentiles for this data.
Position of median = ½(n + 1) = ½(77 + 1) = 39th position
Position of lower quartile = ¼(n + 1) = ¼(77 + 1) = 19,5th position
Position of upper quartile = ¾(n + 1) = ¾(77 + 1) = 58,5th position
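The three position formulas share one pattern, which a short function makes explicit. This is a sketch; the name `position` is an assumption for illustration.

```python
def position(n, fraction):
    """Position of a quartile in ordered data of size n: fraction * (n + 1)."""
    return fraction * (n + 1)

n = 77   # total frequency

median_position = position(n, 1 / 2)
lower_quartile_position = position(n, 1 / 4)
upper_quartile_position = position(n, 3 / 4)

print(median_position, lower_quartile_position, upper_quartile_position)
```

These positions are then located on the cumulative frequency axis of the ogive and read across to the curve.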
ACTIVITY
Diagram sheet 2
SOLUTIONS
Diagram sheet 1
WORK AREA
EXAM QUESTIONS – AS ACTIVITIES
EXAM QUESTIONS – SOLUTIONS
Conclusion : Summary of Key Points
Assessment Activities : Next Level
Assessment Activities
SOLUTIONS
Concluding Remarks
Following today's lesson, I want you to do the following:
Attempt as many other similar examples as possible on your own from the
Text-Book and the past exam papers.
Repeat this procedure until you are confident.
Thank you