CH 4 Scatter Diagrams and Correlation
Statistics
Mr M Dominguez
mdominguez@kegs.org.uk
Chapter 4: Scatter Diagrams and Correlation
Lesson 1: 4.1 to 4.5 Print out Q1 worksheet
Size 5 12 7 10 9 11 6 8
Mass 65 97 68 78 79 88 74 80
[Scatter diagram: x-axis Shoe Size]
§ 4.2 Correlation
There are many different types of correlation, not just positive or negative.
Scatter graphs are used to show whether there is a relationship between two sets
of data. The relationship between the data can be described as either:
1. A positive correlation. As one quantity increases so does the other.
2. A negative correlation. As one quantity increases the other decreases.
3. No linear correlation. Both quantities vary with no clear relationship.
[Example scatter diagrams; axis labels include Soup Sales, Shoe Size and Height]
The scatter diagram shows the heights and weights of different students.
Size 5 12 7 10 9 11 6 8
Mass 65 97 68 78 79 88 74 80
[Scatter diagram: x-axis Shoe Size]
1) The table below shows the shoe size and mass of 8 men.
(a) Plot a scatter graph for this data and draw a line of best fit.
Size 5 12 7 10 9 11 6 8
Mass 65 97 68 78 79 88 74 80
[Scatter diagram: x-axis Shoe Size]
§ 4.3 Causal Relationships
Do not draw causal implications from statements about associations, unless
your data come from a randomized experiment. Just because January and
June temperatures increase together does not mean that January
temperatures cause June temperatures to increase (or vice versa). The only
certain way to sort out causality is to move beyond statistical analysis and talk
about mechanisms.
1) The table below shows the shoe size and mass of 10 men.
(e) Find the mean shoe size and the mean mass
Size 5 12 7 10 10 9 8 11 6 8
Mass 65 97 68 92 78 78 76 88 74 80
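As a check on part (e), the two means can be worked out in a few lines. This is only a sketch: the variable names are illustrative and are not part of the worksheet.

```python
from statistics import mean

# Shoe size and mass of the 10 men from the table above
sizes_10 = [5, 12, 7, 10, 10, 9, 8, 11, 6, 8]
masses_10 = [65, 97, 68, 92, 78, 78, 76, 88, 74, 80]

# The mean point is (mean shoe size, mean mass)
mean_size = mean(sizes_10)
mean_mass = mean(masses_10)
print(mean_size, mean_mass)
```

The pair (mean shoe size, mean mass) is the mean point of the data.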
1) The table below shows the shoe size and mass of 8 men.
Size 5 12 7 10 9 11 6 8
Mass 65 97 68 78 79 88 74 80
The line of best fit should pass through the mean point:
(mean of data 1, mean of data 2)
In this case: (8.5, 78.625)
[Scatter diagram: x-axis Shoe Size]
2) The table below shows the number of people who visited a museum over a 10 day
period last summer together with the daily sunshine totals.
(a) Plot a scatter graph for this data and draw a line of best fit.
[Scatter diagram: x-axis Hours of Sunshine]
§ 4.5 Interpolation and extrapolation
Using our line of best fit we can estimate the
value of one variable when given the other.
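The difference between the two kinds of estimate can be sketched in code. An estimate made inside the range of the data is interpolation; one made outside it is extrapolation and is less reliable. The function name `estimate` and the gradient and intercept values below are assumptions for illustration (approximate least-squares values for the shoe size table), not values given in the slides.

```python
def estimate(x, gradient, intercept, x_values):
    """Estimate y from the line y = gradient * x + intercept and report
    whether x lies inside the observed data range (interpolation) or
    outside it (extrapolation)."""
    y = gradient * x + intercept
    inside = min(x_values) <= x <= max(x_values)
    return y, "interpolation" if inside else "extrapolation"

sizes = [5, 12, 7, 10, 9, 11, 6, 8]
GRADIENT = 3.85   # assumed value, for illustration only
INTERCEPT = 45.9  # assumed value, for illustration only

inside_estimate = estimate(9.5, GRADIENT, INTERCEPT, sizes)
outside_estimate = estimate(14, GRADIENT, INTERCEPT, sizes)
print(inside_estimate, outside_estimate)
```

Shoe size 9.5 lies between the smallest and largest sizes in the table, so the estimate is an interpolation; size 14 lies beyond the data, so the estimate is an extrapolation and should be treated with caution.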
Size 5 12 7 10 9 11 6 8
Mass 65 97 68 78 79 88 74 80
[Scatter diagram: x-axis Hours of Sunshine]
§ 4.6 The equation of a line of best fit
To find the equation of the line of best fit you must find
the gradient. You must also know a point on the line:
either the y-intercept or the mean point. You can then use one
of the two general equations for a straight line,
y = mx + c or y - y1 = m(x - x1).
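[Scatter diagram: Maths Score against English Score, with line of best fit]
We can find the gradient by calculating Δy/Δx from the line:
$m = \frac{\Delta y}{\Delta x} = \frac{43}{80} \approx 0.54$
$y = 0.54x + 39$
Interpret the values of the gradient and the y-intercept in the equation of the line.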
For every extra mark in English the maths mark increases by 0.54
The maths mark is approximately 39 when a student scores 0 in the English test.
We can actually use our calculator to input data and find a line of best fit.
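Calculators with a regression mode typically fit the least-squares line, so the same calculation can be sketched in code. This assumes the least-squares method; the function name `least_squares_line` is illustrative, and the data is the shoe size and mass table used throughout the chapter.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy and Sxx: sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # The line passes through the mean point, which fixes the intercept
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

sizes = [5, 12, 7, 10, 9, 11, 6, 8]
masses = [65, 97, 68, 78, 79, 88, 74, 80]

m, c = least_squares_line(sizes, masses)
print(f"y = {m:.2f}x + {c:.2f}")
```

A line drawn by eye will usually give slightly different values from the fitted line, but both should pass close to the mean point.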
Size 5 12 7 10 9 11 6 8
Mass 65 97 68 78 79 88 74 80
(i) Find the equation of the line of best fit.
This time we can't find the y-intercept from the graph.
To find it, substitute in a known point, (8.5, 78.625); hence,
[Scatter diagram: x-axis Shoe Size]
1) The table below shows the shoe size and mass of 8 men.
Size 5 12 7 10 9 11 6 8
Mass 65 97 68 78 79 88 74 80
Interpret the values of the gradient and the y-intercept.
Gradient:
[Scatter diagram: y-axis Mass (kg)]
[Scatter diagram: Weekly time on internet (hours) against Age, with line of best fit y = -0.18x + 17]
E.g. is there more agreement between height and weight, or between height and arm length?
§ 4.8 Calculating Spearman’s rank correlation coefficient
Does being good at maths make you better at biology?
n = 10
∑d² = 66
$r_s = 1 - \frac{6(66)}{10(10^2 - 1)} = 1 - \frac{6(66)}{10 \times 99} = 1 - 0.4 = 0.6$
Step 6: Comment on the value of r_s
r_s = 0.6 suggests moderate agreement (a moderate positive
correlation) between Maths and Biology scores.
As Maths scores increase, Biology scores also tend to increase.
Step 1: Rank each set of data
Step 2: Work out the differences in ranks (Why doesn’t it matter
what order we subtract in?)
Step 3: Work out the square of the differences
Step 4: Work out the sum of the square of the differences
Step 5: Work out the value of the coefficient, r_s
Step 6: Comment on the value of r_s
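Step 5 uses the standard formula for Spearman's rank correlation coefficient when there are no tied ranks. Here d is the difference between a pair of ranks and n is the number of pairs; with n = 10 and ∑d² = 66 it gives the value 0.6 found in the Maths and Biology example.

```latex
r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
```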
1) The table below shows the shoe size and mass of 8 men.
Size 5 12 7 10 9 11 6 8
Mass 65 97 68 78 79 88 74 80
Rank(S) 1 8 3 6 5 7 2 4
Rank(M) 1 8 2 4 5 7 3 6
d 0 0 1 2 0 0 1 2
d² 0 0 1 4 0 0 1 4
The shoe size and mass are in (strong) agreement, i.e. there is a (strong) positive correlation.
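The six steps can be sketched in code for the shoe size and mass table. This is a minimal sketch that assumes there are no tied values, which holds for this data; the function names `ranks` and `spearman` are illustrative.

```python
def ranks(values):
    """Step 1: rank the data, 1 = smallest (assumes no tied values)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Steps 2 to 5: differences in ranks, their squares, the sum,
    then r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = [(a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys))]
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))

sizes = [5, 12, 7, 10, 9, 11, 6, 8]
masses = [65, 97, 68, 78, 79, 88, 74, 80]

r_s = spearman(sizes, masses)
print(round(r_s, 2))
```

Here ∑d² = 10 and n = 8, so r_s = 1 - 60/504, which is about 0.88 and is consistent with the strong agreement described above.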
Calculate Spearman's rank correlation coefficient and interpret your answer.
Strong agreement between mock and GCSE % (positive correlation).
The higher a student's mark in the mock, the higher
their mark in the GCSE tends to be.
§ 4.9 PMCC
If both variables X and Y are random samples from normal distributions (the data is
symmetrical about the mean and the sample set is chosen using a random sampling
method), then the Product Moment Correlation Coefficient (PMCC) can be calculated to
give an estimate of the correlation.
However, if the variables X and Y are not random samples from a normal distribution, we
cannot use PMCC.
For example, IRG attainment based on class tests would be normally distributed, but
teachers' opinions on effort would not be.
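The slides do not give a formula for PMCC, so the sketch below assumes the standard definition r = Sxy / √(Sxx × Syy). It is applied to the shoe size and mass table purely as an illustration; the function name `pmcc` is illustrative.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

sizes = [5, 12, 7, 10, 9, 11, 6, 8]
masses = [65, 97, 68, 78, 79, 88, 74, 80]

r = pmcc(sizes, masses)
print(round(r, 2))
```

Unlike Spearman's coefficient, this calculation uses the data values themselves rather than their ranks, which is why the conditions on the data matter.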
For each of the questions below, identify the most appropriate value for
Spearman's rank correlation coefficient and Pearson's product moment correlation
coefficient from the list. Then explain your reasoning.
-0.95 -0.60 0.60 0.95