Professional Documents
Culture Documents
18MAT41 Module 4
18MAT41 Module 4
MODULE – 04
Statistical Methods
Correlation and Regression
Examples:
∑𝑛
1 (𝑥− 𝑥̅ )(𝑦− 𝑦
̅)
r= … (1)
𝑛𝜎𝑥 𝜎𝑦
∑ 𝑋𝑌
r=
√∑ 𝑋 2 ∑ 𝑌 2
Property :
ie -1 ≤ r ≤ +1.
Note : If r = ± 1 we say that x and y are perfectly correlated and if r = 0 we say that x and
Problems:
1. Calculate the karl – pearson co – efficient for the following ages of husband and
wife’s.
Roll No. 1 2 3 4 5 6 7 8 9 10
Husband’s 36 23 27 28 28 29 30 31 33 35
age (x)
Wife’s age 29 18 20 22 27 21 29 27 29 28
(y)
∑𝑋 300 ∑𝑌 250
Solu: 𝑥̅ = = = 30 𝑦̅ = = = 25 Here n = 10
10 10 10 10
x y X= x - 𝑥̅ Y = y -𝑦̅ XY X2 Y2
36 29 -7 -7 49 49 49
23 18 -3 -5 15 9 25
27 20 -2 -3 6 4 9
28 22 -2 2 -4 4 4
28 27 -1 -4 4 1 16
29 21 0 4 0 0 16
30 29 1 2 2 1 4
31 27 3 4 12 9 16
33 29 5 3 15 25 9
35 28 6 4 24 36 16
∑𝑥 ∑𝑦 ∑ 𝑋𝑌 ∑ 𝑋 2 ∑ 𝑌 2
= 300 = 250 = 123 = 138 = 164
We have
∑ 𝑋𝑌
r=
√∑ 𝑋 2 ∑ 𝑌 2
123
r= = 0.817
√(138)(164)
It is a positive correlation.
x 10 14 18 22 26 30
y 18 12 24 6 30 36
∑ 𝑋𝑌
Solu. We have r= where X = x - 𝑥̅ , Y = y -𝑦̅
√∑ 𝑋 2 ∑ 𝑌 2
Here n = 6
∑𝑋 120 ∑𝑌 126
𝑥̅ = = = 20 𝑦̅ = = = 21
10 6 10 6
x y X= x -𝑥̅ Y = y-𝑦̅ XY X2 Y2
10 18 -10 -3 30 100 9
14 18 -6 -9 53 36 81
18 18 -2 3 -6 4 9
26 18 6 9 54 36 81
∑𝑥 ∑𝑦 ∑ 𝑋𝑌 ∑ 𝑋 2 ∑ 𝑌 2
= 120 = 126 = 252 = 280 = 6304
252
We have r= = 0.6
√(280)(630)
It is a positive correlation.
3. Calculate Karl pearson co-efficient of correlation b/w the marks obtained by 8 students in
mathematics and statistics :
Statistics 8 10 15 17 20 23 24 25
Mathematics 25 30 32 35 37 40 42 45
∑ 𝑋𝑌
We have r= where X = x - 𝑥̅ , Y = y -𝑦̅
√∑ 𝑋 2 ∑ 𝑌 2
Here n = 8
∑𝑋 142 ∑𝑌 286
𝑥̅ = = = 17.75 𝑦̅ = = = 35.75
𝑛 8 𝑛 8
x y X= x -𝑥̅ Y = y-𝑦̅ XY X2 Y2
8 25 -9.75 -10.75 104.81 95.06 115.56
10 30 -7.75 -5.75 44.56 60.06 33.06
15 32 -2.75 -3.75 10.31 7.56 14.06
17 35 -0.75 -0.75 0.56 0.56 0.56
20 37 2.25 1.25 2.81 5.06 1.56
23 40 5.25 4.25 22.31 27.56 18.06
24 42 6.25 6.25 39.06 39.06 39.06
25 45 7.25 9.25 67.06 56.56 85.56
∑𝑥 ∑𝑦 ∑ 𝑋𝑌 ∑ 𝑋2 ∑ 𝑌2
= 17.75 = 35.75 = 291.48 = 291.480 = 307.48
291.48
We have r= = 0.97
√(291.48)(307.48)
It is a positive correlation.
Regression
Regression is an estimation of one independent variable in terms of the other. If x & y are
correlated, the best fitting straight line in the least square sense gives reasonably a good relation
between x & y.
The best fitting straight line of the form y = ax + b (x being the independent variable) is called
the regression line of y on x & x = ay + b (y being the independent variable) is called the
regression line of x on y.
Let y = ax + b be the equation of the regression line of y on x for a given set of n values (x,y).
𝜎𝑦
Then y- 𝑦̅ = 𝑟 (x-𝑥̅ )………….(1)
𝜎𝑥
Similarly,
𝜎𝑥
x- 𝑥̅ = 𝑟 (y-𝑦̅)………..(2)
𝜎𝑦
Thus we can conclude that r is the geometric mean (GM) of the regression coefficients since the
GM of two numbers a, b is √𝑎𝑏. That is
r = ±√(𝑐𝑜𝑒𝑓𝑓. 𝑜𝑓 𝑥 )(𝑐𝑜𝑒𝑓𝑓. 𝑜𝑓 𝑦)
The sign of r will be positive or negative according as the regression coefficients are positive or
negative.
Note: The lines of regression (1) & (2) are also of the form
∑ 𝑋𝑌 ∑ 𝑋𝑌
Y= ∑ 𝑋2
(X) and X = ∑ 𝑌 2 (Y)
This form will be useful to find out the coefficient of correction by first obtaining the lines of
regression as we have deduced that
r = ±√(𝑐𝑜𝑒𝑓𝑓. 𝑜𝑓 𝑥 )(𝑐𝑜𝑒𝑓𝑓. 𝑜𝑓 𝑦)
Problems:
1. Compute the coefficient of correlation & the equation of the lines of regression for the
following data.
x 1 2 3 4 5 6 7
y 9 8 10 12 11 13 14
∑ 𝑋𝑌
Solu: We have coefficient of correlation r =
√∑𝑋 2 ∑𝑌 2
Where Y = y- 𝑦̅ X= x- 𝑥̅ Here n = 7
∑𝑥 ∑𝑦
𝑥̅ = = 28/7 = 4 𝑦̅ = = 77/7 = 11
𝑛 𝑛
x y X= x- Y= y- 𝑦̅ XY 𝑋2 𝑌2
𝑥̅
1 9 -3 -2 6 9 4
2 8 -2 -3 6 4 9
3 10 -1 -1 1 1 1
4 12 0 1 0 0 1
5 11 1 0 0 1 0
6 13 2 2 4 4 4
7 14 3 3 9 9 9
2
∑ 𝑋 =28 ∑ 𝑌 2 =28
∑𝑥 ∑𝑦 ∑ 𝑋𝑌
= 28 = 77 = 26
26 26
y- 𝑦̅ = (x- 𝑥̅ ) x-4 = 28 (y-11)
28
2. Obtain the lines of regression and hence find the co-efficient of correlation for the
following data
x 1 3 4 2 5 8 9 10 13 15
y 8 6 10 8 12 16 16 10 32 32
Solution:
∑𝑥 ∑𝑦
𝑥̅ = = 70/10 = 7 𝑦̅ = = 150/10 =15
𝑛 𝑛
x y X= x- 𝑥̅ Y= y- 𝑦̅ xy 𝑋2 𝑌2
1 8 -6 -7 42 36 49
3 6 -4 -9 35 16 81
4 10 -3 -5 15 9 25
2 8 -5 -7 35 25 49
5 12 -2 -3 6 4 9
8 16 1 1 1 1 1
9 16 2 1 2 4 1
10 10 3 -5 -15 9 25
13 32 6 17 102 36 289
15 32 8 17 136 64 289
∑𝑥 ∑𝑦 ∑ 𝑋𝑌 ∑ 𝑋2 ∑ 𝑌2
= 10 = 150 = 360 = 204 = 818
360
y- 𝑦̅ = 360/204( x- 𝑥̅ ) x- 7 = 818 (y-15)
y = 1.764x-12.348+15
y = 1.764x + 2.652
We have
= 0.88
3. Compute the coefficient of correlation & the equation of the lines of regression for the
following data.
x 10 14 18 22 26 30
y 18 12 24 6 30 36
∑𝑥 120 ∑𝑦 126
Solu: 𝑥̅ = = = 20 𝑦̅ = = = 21
𝑛 6 𝑛 6
x y X= x- 𝑥̅ Y= y- 𝑦̅ XY 𝑋2 𝑌2
10 18 -10 -3 30 100 9
14 12 -6 -9 54 36 81
18 24 -2 3 -6 4 9
22 6 2 -15 -30 4 225
26 30 6 9 54 36 81
30 36 10 15 150 100 225
∑ 𝑋 2 =280 ∑ 𝑌 2 =630
∑𝑥 ∑𝑦 ∑ 𝑋𝑌
= 120 = 126 = 252
∑ 𝑋𝑌
We have, Coefficient of correlation r = = 0.6
√∑𝑋 2 ∑𝑌 2
252 252
y- 𝑦̅ = 280 ( x- 𝑥̅ ) ( x- 𝑥̅ ) = (y- 𝑦̅ )
630
4. If ϴ is the acute angle between the lines of regression, then show that
𝜎𝑥 𝜎𝑦 1− 𝑟 2
tan ϴ = ( ).Explain the significance when r = 0 & r = ± 1.
𝜎𝑥 2 + 𝜎𝑦 2 𝑟
Solu: W. K. T If is ϴ acute, the angle between the lines y = m1 x + c1 and y = m2 x +c2 is given by
𝑚2 − 𝑚1
tan ϴ =
1+ 𝑚1 𝑚2
𝜎𝑥
and ( x - 𝑥̅ ) = r (y - 𝑦̅)
𝜎𝑦
𝜎𝑦
We write the second of the equation as y - 𝑦̅ = ( x - 𝑥̅ ) ….(2)
𝑟𝜎𝑥
𝜎𝑥 𝜎𝑦 1− 𝑟 2
tan ϴ = ( )
𝜎𝑥 2 + 𝜎𝑦 2 𝑟
By solving we get 𝑥̅ = 13 𝑦̅ = 17
We shall now rewrite the equation of the regression lines to find the regression
coefficients.
r = ± √(𝑐𝑜𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡 𝑜𝑓 𝑥 )(𝑐𝑜𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡 𝑜𝑓 𝑦)
= ± √(0.8)(0.45)
r = 0.6
It is a positive correlation.
6 ∑(𝑥−𝑦)2 6 ∑ 𝑑2
ρ =1- or 1 -
𝑛 (𝑛2 − 1) 𝑛 (𝑛2 − 1)
Note:
(1) If the ranking of x ,y are entirely in the same order like for example, x :1 1,2,3,4,5 ; y :
1,2,3,4,5 then ∑ 𝑑 2 = ∑(𝑥 − 𝑦 )2 = 0. This will give us 𝜌 = ± 1 and is called perfect direct
correlation.
If the ranking of x and y are entirely in the opposite order like for example, x : 1,2,3,4,5
y : 5,4,3,2,1 then ∑ 𝑑 2 = 40.This will give us ρ = -1 and is called perfect inverse
correlation.
Problems:
1. Ten competitors in a beauty contest are ranked by two judges in the following
order. Compute the coefficient of correlation
I 1 6 5 3 10 2 4 9 7 8
II 6 4 9 8 1 2 3 10 5 7
6 ∑ 𝑑2
Solu : We have ρ =1-
𝑛 (𝑛2 − 1)
∑ 𝑑2 = (1 − 6)2 + (6 − 4)2 +(5 − 9)2 + (3 − 8)2 +(10 − 1)2 +(2 − 2)2 +(4 − 3)2 + (9 − 10)2
= 25 + 4+ 16 + 25 + 81 + 0 +1 +1 + 4 + 1 =158
6 (158)
Hence 𝜌 = 1 – 10(102 − 1) = 0.042
2. Ten students got the following percentage of marks in two subjects x and y.
Compute their rank correlation coefficient.
Marks in 78 36 98 25 75 82 90 62 65 39
x
Marks in 84 51 91 60 68 62 86 58 53 47
y
Solu : We prepare the table consisting of the given data along with the ranks assigned
according to their order of the magnitude.In the subject x, 98 will be awarded rank
6 ∑ 𝑑2
We have ρ =1- and n = 10 for the given data.
𝑛 (𝑛2 − 1)
6 (30)
= 1 - 10(102 − 1)
= 0.82
3. Ten competitors in music contest are ranked by 3 judges A, B,C in the following order.
Use the rank correlation coefficient to decide which pair of judges have the nearest
A 1 6 5 10 3 2 4 9 7 8
B 3 5 8 4 7 10 2 1 6 9
C 6 4 9 8 1 2 3 10 5 7
Solu : We shall compute 𝜌𝐴𝐵 , 𝜌𝐵𝐶 , 𝜌𝐶𝐴 with the help of the following table where d is
6 ∑ 𝑑2
We have ρ = 1 - 𝑛 (𝑛2− 1) and n = 10 for the given data.
6 (200)
Now, 𝜌𝐴𝐵 = 1 - 10(102 − 1) = - 0.21
6 (214)
𝜌𝐵𝐶 = 1 - 10(102 − 1) = - 0.297
6 (60)
𝜌𝐶𝐴 = 1 - = + 0.636
10(102 − 1)
It may be observed that 𝜌𝐴𝐵 and 𝜌𝐵𝐶 are negative which means their tastes
(A &B; B &C) are opposite. But 𝜌𝐶𝐴 is positive and is nearer to 1.(perfect
correlation)