Professional Documents
Culture Documents
Notes 4
Notes 4
Correlation:
If the change in one variable affects a change in the other variable, the variables are said
to be correlated.
Positive Correlation
Negative Correlation
Eg: Price and Demand of a commodity, the correlation between volume and pressure of a
perfect gas.
Covariance
𝐶𝑜𝑣 ( 𝑋, 𝑌) = 𝐸(𝑋𝑌) – 𝐸(𝑋). 𝐸(𝑌). If 𝑋 and 𝑌 are independent then 𝐶𝑜𝑣 (𝑋, 𝑌)
𝐶𝑂𝑉(𝑋,𝑌)
𝑟(𝑋, 𝑌) =
𝜎𝑋 .𝜎𝑌
1
Where 𝐶𝑂𝑉(𝑋, 𝑌) = ∑ 𝑋𝑌 − 𝑋̅ 𝑌̅
𝑛
1 ∑𝑋
𝜎𝑋 = √ ∑ 𝑋 2 − ̅̅̅̅
𝑋2 , 𝑋̅ =
𝑛 𝑛
1 ∑𝑌
𝜎𝑌 = √ ∑ 𝑌 2 − ̅𝑌̅̅2̅ , 𝑌̅ =
𝑛 𝑛
Note
Note
0<𝑟<1 Positive
𝑟 = 0 Uncorrelated
1. Calculate the correlation coefficient for the following heights (in inches) of fathers
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Solution:
X Y XY X2 Y2
= 544 = 38132
544 552
𝑋̅ = = 68 , 𝑌̅ = = 69
8 8
𝑋̅ 𝑌̅ = 68 𝑋 69 = 4692
1
𝜎𝑋 = √ ∑ 𝑋 2 − ̅̅̅̅
𝑋2 = 2.121
𝑛
1
𝜎𝑌 = √ ∑ 𝑌 2 − ̅𝑌̅̅2̅ = 2.345
𝑛
1 1
𝐶𝑂𝑉(𝑋,𝑌) ∑ 𝑋𝑌− 𝑋̅𝑌̅ 37560−4692
𝑛 8
𝑟(𝑋, 𝑌) = = = = 0.6031
𝜎𝑋 .𝜎𝑌 𝜎𝑋 .𝜎𝑌 2.121 𝑋 2.345
2. Find the correlation coefficient between industrial production and export using
Production(X) 55 56 58 59 60 60 62
EXPORT(Y) 35 38 37 39 44 43 44
Solution:
U V 𝑼 = 𝑿 − 𝟓𝟖 𝑽 = 𝒀 − 𝟒𝟎 UV U2 V2
55 35 -3 -5 15 9 25
56 38 -2 -2 4 4 4
58 37 0 -3 0 0 9
59 39 1 -1 -1 1 1
60 44 2 4 8 4 16
60 43 2 3 6 4 9
62 44 4 4 16 16 16
= 48 = 38 = 80
∑𝑈 4
Now ̅=
𝑈 = = 0.5714
𝑛 7
∑𝑉
𝑉̅ = = 0
𝑛
∑ 𝑈𝑉
𝐶𝑜𝑣 (𝑈, 𝑉) = ̅𝑉̅ = 6.857
− 𝑈
𝑛
1
𝜎𝑋 = √ ∑ 𝑈 2 − ̅̅̅̅
𝑈 2 = 2.2588
𝑛
1
𝜎𝑌 = √ ∑ 𝑉 2 − ̅̅
𝑉̅̅2 = 3.38
𝑛
1
𝐶𝑂𝑉(𝑈,𝑉) ̅𝑉
∑ 𝑈𝑉− 𝑈 ̅
𝑛
𝑟 (𝑈, 𝑉) = = = 0. 898
𝜎𝑈 .𝜎𝑉 𝜎𝑈 .𝜎𝑉
𝒙𝒚
𝟎 < 𝒙 < 𝟒, 𝟏 < 𝒚 < 𝟓
3. Two R.V.’S X and Y have joint p.d.f of 𝒇(𝒙, 𝒚) = {𝟗𝟔 .
𝟎 𝒆𝒍𝒔𝒆𝒘𝒉𝒆𝒓𝒆
Find(i) 𝑬(𝑿) (ii) 𝑬(𝒀) (iii) 𝑬(𝑿𝒀) (iv) 𝑬( 𝟐𝑿 + 𝟑𝒀 ) (v) 𝑽𝒂𝒓(𝑿) (vi)𝑽𝒂𝒓(𝒀) (vii)
𝑪𝒐𝒗(𝑿, 𝒀)
Solution:
∞ ∞
(i) 𝐸 (𝑋) = ∫−∞ ∫−∞ 𝑥 𝑓 (𝑥, 𝑦)𝑑𝑥𝑑𝑦
5 4 𝑥𝑦
= ∫1 ∫0 𝑥 𝑑𝑥𝑑𝑦
96
1 5 4 2
= ∫ ∫ 𝑥 𝑦𝑑𝑥𝑑𝑦
96 1 0
1 5 𝑥3
= ∫ 𝑦( )40 𝑑𝑦
96 1 3
5
64 5 2 y2 24
= ∫ 𝑦 𝑑𝑦 =
288 1 9 2 1 9
∞ ∞
(ii) 𝐸 (𝑌) = ∫−∞ ∫−∞ 𝑦 𝑓(𝑥, 𝑦)𝑑𝑥𝑑𝑦
5 4 𝑥𝑦
= ∫1 ∫0 𝑦 𝑑𝑥𝑑𝑦
96
1 5 4
= ∫ ∫ 𝑥𝑦 2 𝑑𝑥𝑑𝑦
96 1 0
5 2 𝑥2
1 4
= ∫ 𝑦 (2 ) 𝑑𝑦
96 1 0
1 5
=
96 (2)
∫1 𝑦 2 (42 − 0)𝑑𝑦
5
16 y 3 124
=
192 3 1 36
31
⇒ 𝐸 (𝑌) =
9
∞ ∞
(iii) 𝐸 (𝑋𝑌 ) = ∫−∞ ∫−∞ 𝑥𝑦 𝑓 (𝑥, 𝑦)𝑑𝑥𝑑𝑦
5 4 𝑥𝑦
= ∫1 ∫0 𝑥𝑦 ( )𝑑𝑥𝑑𝑦
96
1 5 4
= ∫ ∫ 𝑥 2 𝑦 2 𝑑𝑥𝑑𝑦
96 1 0
1 5 𝑥3
= ∫ 𝑦2(
96 1 3
)40 𝑑𝑦
1 5
=
96 (3)
∫1 𝑦 2 (43 − 0)𝑑𝑦
5
64 5 2 y3 248
= ∫ 𝑦2
288 1
𝑑𝑦 =
9 3 1 27
248
⇒ 𝐸 (𝑋𝑌) =
27
8 31
= 2( ) + 3( )
3 9
16+31 47
= =
3 3
∞ ∞
Now, 𝐸 (𝑋2 ) = ∫−∞ ∫−∞ 𝑥 2 𝑓 (𝑥, 𝑦)𝑑𝑥𝑑𝑦
5 4 𝑥𝑦
= ∫1 ∫0 𝑥 2 ( )𝑑𝑥𝑑𝑦
96
1 5 4 3
= ∫ ∫ 𝑥 𝑦𝑑𝑥𝑑𝑦
96 1 0
1 5 𝑥4
= ∫ 𝑦( )40 𝑑𝑦
96 1 4
1 5
=
96 (4)
∫1 𝑦 (44 − 0)𝑑𝑦
𝑑𝑦 = 8
256 5 2 y2 24
= ∫ 𝑦
384 1 3 2 1 3
⇒ 𝐸 (𝑋 2 ) = 8
8 2 8
𝑉𝑎𝑟 (𝑋 ) = E (𝑋2 ) − [𝐸 (𝑋)]2 = 8 − ( ) =
3 9
8 √8
𝜎𝑋2 = ⇒ 𝜎𝑋 =
9 3
∞ ∞
𝐸 (𝑌 2 ) = ∫−∞ ∫−∞ 𝑦 2 𝑓 (𝑥, 𝑦)𝑑𝑥𝑑𝑦
5 4 𝑥𝑦
= ∫1 ∫0 𝑦 2 ( )𝑑𝑥𝑑𝑦
96
1 5 4
= ∫ ∫ 𝑥𝑦 3 𝑑𝑥𝑑𝑦
96 1 0
1 5 𝑥2
= ∫ 𝑦3(
96 1 2
)40 𝑑𝑦
1 5
=
96 (2)
∫1 𝑦 3 (42 − 0)𝑑𝑦
16 5
=
192
∫1 𝑦 3 𝑑𝑦
5
4
1 y 624
= 13
12 4 48
1
⇒ 𝐸 (𝑌 2 ) = 13
31
= 13 – ( )2
9
92
=
81
92 √92
𝜎𝑌2 = ⇒ 𝜎𝑌 =
81 9
248 248
= - =0
27 27
Solution:
Given that 𝑉𝑎𝑟 (𝑋 ) = 36, 𝑉𝑎𝑟 (𝑌 ) = 16. Since X and Y are independent,
Let 𝑈 = 𝑋 + 𝑌 and 𝑉 = 𝑋 – 𝑌
𝑉𝑎𝑟 (𝑈 ) = 𝑉𝑎𝑟 ( 𝑋 + 𝑌 )
= 36 + 16 = 52
⇒ 𝜎𝑈 = √52
𝑉𝑎𝑟 (𝑉 ) = 𝑉𝑎𝑟 ( 𝑋 − 𝑌 )
= 36 + 16 = 52
⇒ 𝜎𝑉 = √52
= E [ X2 – Y2]
= 𝑉𝑎𝑟(𝑋) – 𝑉𝑎𝑟(𝑌)
𝐶𝑜𝑣 (𝑈, 𝑉) = 36 – 16 = 20
𝐶𝑜𝑣(𝑈,𝑉) 20 20 5
Hence 𝜌(𝑈, 𝑉 ) = = = =
𝜎𝑈 𝜎𝑉 √52√52 52 13
Regression
Lines of Regression
If the variables in a bivariate distribution are related we will find that the points in the
scattered diagram will cluster around some curve called the curve of regression. If the
curve is a straight line, it is called the line of regression and there is said to be linear
𝝈𝒀
̅ = 𝒓.
𝒚−𝒚 (𝒙 − 𝒙
̅)
𝝈𝑿
𝝈𝒀
̅ = 𝒓.
𝒙−𝒙 (𝒚 − 𝒚
̅)
𝝈𝑿
Note
Both the lines of regression passes through the mean (𝑥̅ , 𝑦̅)
𝝈𝒀
̅ = 𝒓.
𝒚−𝒚 (𝒙 − 𝒙
̅)
𝝈𝑿
𝝈𝒀
̅ = 𝒓.
𝒙−𝒙 (𝒚 − 𝒚
̅)
𝝈𝑿
𝟏 − 𝒓𝟐 𝝈𝒀 𝝈𝑿
𝒕𝒂𝒏𝜽 = ( 𝟐 )
𝒓 𝝈𝒀 + 𝝈𝑿 𝟐
Problems on Regression
1. From the following data, find (i) the two regression equations (ii) The coefficient
of correlation between the marks in Economics and Statistics (iii) The most likely
Marks in 25 28 35 32 31 36 29 38 34 32
Economics
Marks in 43 46 49 41 36 32 31 30 33 39
Statistics
Solution:
X Y ̅
𝑿−𝑿 ̅
𝒀−𝒀 ̅ )𝟐
(𝑿−𝑿 ̅ )𝟐
(𝒀−𝒀 ̅ )( 𝒀 − 𝒀
(𝑿 − 𝑿 ̅)
25 43 -7 5 49 25 -35
28 46 -4 8 16 64 -32
35 49 3 11 9 121 33
32 41 0 3 0 9 0
31 36 -1 -2 1 4 2
36 32 4 -6 16 36 -24
29 31 -3 -7 9 49 21
38 30 6 -8 36 64 -48
34 33 2 -5 4 25 -10
32 39 0 1 0 1 0
∑𝑋 320
Now 𝑋̅ = = = 32
𝑛 10
∑𝑌 380
𝑌̅ = = = 38
𝑛 10
∑(𝑋−𝑋̅)( 𝑌−𝑌̅)
Coefficient of regression of Y on X is 𝑏𝑌𝑋 = ∑( 𝑋−𝑋̅)2
93
= − = −0.6643
140
∑(𝑋−𝑋̅)( 𝑌−𝑌̅)
Coefficient of regression of X on Y is 𝑏𝑋𝑌 = ∑( 𝑌−𝑌̅)2
93
= − = −0.2337
398
⇒ 𝑥 − 32 = −0.2337(𝑦 − 38)
⇒ 𝑥 = −0.2337𝑦 + 0.2337 × 38 + 32
⇒ 𝑥 = −0.2337𝑦 + 40.8806
⇒ 𝑦 − 38 = −0.6642(𝑦 − 38)
⇒ 𝑦 = −0.6642𝑥 + 0.6642 × 32 + 38
⇒ 𝑦 = −0.6642𝑥 + 59.2576
= −0.6643 × (−0.2337)
= 0.1552
𝑟 = ±√0.1552
= ±0.394
Now we have to find the most likely marks in statistics (𝑌) when marks in Economics
⇒ 𝑦 = −0.6642𝑥 + 59.2576
Put 𝑥 = 30 we get
⇒ 𝑦 = −0.6642 × 30 + 59.2576
⇒ 𝑦 = 39.3286
2. The two lines of regression are 𝟖𝒙 − 𝟏𝟎𝒚 + 𝟔𝟔 = 𝟎, 𝟒𝟎𝒙 − 𝟏𝟖𝒚 − 𝟐𝟏𝟒 = 𝟎. The
variance of X is 9. Find (i) the mean value of X and Y (ii) correlation coefficient
between X and Y.
Solution:
Since both the lines of regression passes through the mean values 𝑥̅ and 𝑦̅, the point
⇒ 32𝑦̅ = 544
⇒ 𝑦̅ = 17
⇒ 8𝑥̅ − 10 × 17 = −66
⇒ 𝑥̅ = 13
(ii) Let us suppose that equation (A) is the equation of line of regression of Y on X and
(B) is the equation of the line regression of X on Y, we get after rewriting (A) and (B)
⇒ 10𝑦 = 8𝑥 + 66
8 66
⇒𝑦= 𝑥+
10 10
8
⇒ 𝑏𝑌𝑋 =
10
18 214
⇒𝑥= 𝑦+
40 40
18
⇒ 𝑏𝑋𝑌 =
40
8 18 9
= × =
10 40 25
3
⇒𝑟=±
5
= ±0.6
Hence 𝑟 = 0.6
Important note:
⇒ 8𝑥 = 10𝑦 − 66
10 66
⇒𝑥= 𝑦−
8 8
10
⇒ 𝑏𝑋𝑌 =
8
40 214
⇒𝑦= 𝑥−
18 18
40
⇒ 𝑏𝑌𝑋 =
18
10 40 25
= × =
8 8 9
𝑟 = 2.78
But 𝑟 2 should always lies between 0 and 1. Hence our assumption that line (A) is line of
variance of X is 25. Find (i) the mean value of X and Y (ii) correlation coefficient
between X and Y.
Solution:
Since both the lines of regression passes through the mean values 𝑥̅ and 𝑦̅, the point
⇒ 16𝑦̅ = 272
⇒ 𝑦̅ = 17
⇒ 4𝑥̅ − 5 × 17 = −33
⇒ 4𝑥̅ = −33 + 85
⇒ 𝑥̅ = 13
(ii) Let us suppose that equation (A) is the equation of line of regression of Y on X and
(B) is the equation of the line regression of X on Y, we get after rewriting (A) and (B)
⇒ 5𝑦 = 4𝑥 + 33
4 33
⇒𝑦= 𝑥+
5 5
4
⇒ 𝑏𝑌𝑋 =
5
⇒ 20𝑥 = 9𝑦 + 107
9 107
⇒𝑥= 𝑦+
20 20
9
⇒ 𝑏𝑋𝑌 =
20
4 9 3
= × =
5 20 5
𝑟 = ±0.6
Solution:
Given,
⇒ 𝑋 = 3 − 0.5𝑌
⇒ 𝑏𝑋𝑌 = −0.5
⇒ 𝑌 = 5 + 2.8𝑋
⇒ 𝑏𝑌𝑋 = 2.8