Session 18


Correlation

Smoking and Lung Capacity


• Suppose, for example, we want to investigate the relationship between cigarette smoking and lung capacity.
• We might ask a group of people about their smoking habits and measure their lung capacities.

Cigarettes (X)    Lung Capacity (Y)
0                 45
5                 42
10                33
15                31
20                29
• Scatter plot of the data (figure: lung capacity (Y) plotted against cigarettes (X))
• We can see that as smoking increases, lung capacity tends to decrease.
• The two variables change in opposite directions.
Height and Weight
• Consider the following heights and weights of five women swimmers:
Height (inches): 62 64 65 66 68
Weight (pounds): 102 108 115 128 132
• We observe that weight also tends to increase with height.
(Figure: scatter plot of weight against height)
• Sometimes two variables are related to each other.
• The values of the two variables are paired.
• A change in the value of one variable is reflected in a change in the value of the other.
• Usually the two variables are two attributes of each member of the population.
• For example:
  Height                      Weight
  Advertising Expenditure     Sales Volume
  Unemployment                Crime Rate
  Rainfall                    Food Production
  Expenditure                 Savings
Correlation
• Karl Pearson’s correlation coefficient is given by
$$ r_{XY} = \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} $$

• When the joint distribution of X and Y is known:
$$ \mathrm{Cov}(X, Y) = E(XY) - E(X)\,E(Y) $$
$$ \mathrm{Var}(X) = E(X^2) - [E(X)]^2, \qquad \mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 $$
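As an illustration of these population formulas, the sketch below computes r_XY directly from a joint probability table. The table `joint_pmf` and the helper names `expect` and `corr_from_joint` are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import math

# Hypothetical joint pmf of two 0/1 random variables: {(x, y): P(X = x, Y = y)}
joint_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def expect(g, pmf):
    """E[g(X, Y)] for a discrete joint distribution given as {(x, y): prob}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


def corr_from_joint(pmf):
    ex = expect(lambda x, y: x, pmf)
    ey = expect(lambda x, y: y, pmf)
    cov = expect(lambda x, y: x * y, pmf) - ex * ey    # Cov(X,Y) = E(XY) - E(X)E(Y)
    var_x = expect(lambda x, y: x * x, pmf) - ex ** 2  # Var(X) = E(X^2) - [E(X)]^2
    var_y = expect(lambda x, y: y * y, pmf) - ey ** 2  # Var(Y) = E(Y^2) - [E(Y)]^2
    return cov / math.sqrt(var_x * var_y)


print(round(corr_from_joint(joint_pmf), 4))  # 0.6
```

Here E(X) = E(Y) = 0.5, E(XY) = 0.4, so Cov(X, Y) = 0.15 and Var(X) = Var(Y) = 0.25, giving r = 0.15 / 0.25 = 0.6.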

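• When observations on X and Y are available:
$$ \mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$
$$ \mathrm{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \mathrm{Var}(Y) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2 $$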
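A minimal sketch of the sample formulas above, written in plain Python. The function name `pearson_r` is our own label, not something defined in the slides. It is applied here to the smoking and lung capacity data from the first example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from paired observations, using the 1/n covariance and variances."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


# Smoking (cigarettes) and lung capacity data from the first example
cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]
print(round(pearson_r(cigarettes, lung_capacity), 4))  # -0.9615
```

The factor 1/n appears in both the numerator and the denominator, so it cancels; using 1/(n − 1) throughout would give the same r.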
Properties of Correlation Coefficient

• It is unit free.

• It measures the strength of the relationship on a scale from -1 to +1.

• Because it is unit free and bounded, it can be used to compare the strength of the relationships between different pairs of variables.

• Values close to 0 indicate little or no correlation.

• Values close to +1 indicate very strong positive correlation.

• Values close to -1 indicate very strong negative correlation.
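The "unit free" property can be checked numerically. The sketch below assumes the height and weight data of the five swimmers given earlier and converts them to centimetres and kilograms using the exact conversion factors 2.54 cm per inch and 0.45359237 kg per pound. The helper `pearson_r` is a hypothetical name for a plain implementation of Pearson's formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: sum of cross-deviations over the root of the product of squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Heights (inches) and weights (pounds) of the five swimmers
height_in = [62, 64, 65, 66, 68]
weight_lb = [102, 108, 115, 128, 132]

# The same measurements in metric units
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = pearson_r(height_in, weight_lb)
r_metric = pearson_r(height_cm, weight_kg)
print(round(r_imperial, 4), round(r_metric, 4))  # the two values agree
```

Rescaling a variable by a positive constant multiplies both its deviations and its standard deviation by that constant, so the factor cancels and r is unchanged.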


Scatter Diagram
(Figure: example scatter plots of Y against X, with panels labelled Positively Correlated, Negatively Correlated, Weakly Correlated, Strongly Correlated, and Not Correlated)


• The correlation coefficient measures the strength of the linear relationship.

• r = 0 does not necessarily imply that there is no relationship between the variables.

• A relationship may exist, but it is not a linear one.

• The correlation between two random variables X and Y is a measure of the degree of linear association between the two variables.
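A small hypothetical data set shows how r can be exactly 0 even though Y is completely determined by X. The data (y = x² on a symmetric set of x values) and the helper name `pearson_r` are our own illustration, not from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from paired observations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with an exact but non-linear relationship: y = x^2
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys))  # 0.0: no linear association, although y depends on x
```

The positive and negative cross-deviations cancel because the parabola is symmetric about x = 0, so the linear measure sees nothing even though the relationship is perfect.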
Exercise
Here n = 8, x̄ = 17.20 / 8 = 2.15 and ȳ = 640 / 8 = 80.

x       y     x − x̄    y − ȳ    (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
1.25    125   -0.90     45      0.8100     2025       -40.50
1.75    105   -0.40     25      0.1600      625       -10.00
2.25     65    0.10    -15      0.0100      225        -1.50
2.00     85   -0.15      5      0.0225       25        -0.75
2.50     75    0.35     -5      0.1225       25        -1.75
2.25     80    0.10      0      0.0100        0         0.00
2.70     50    0.55    -30      0.3025      900       -16.50
2.50     55    0.35    -25      0.1225      625        -8.75
17.20   640    0         0      1.5600     4450       -79.75
                                (SSX)      (SSY)      (SSXY)

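Since Cov(X, Y) = SSXY / n, Var(X) = SSX / n and Var(Y) = SSY / n, the factors 1/n cancel:
$$ r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\mathrm{SSXY}}{\sqrt{\mathrm{SSX}\,\mathrm{SSY}}} = \frac{-79.75}{\sqrt{1.56 \times 4450}} = -0.957 $$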
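The deviation method in the table can be reproduced in a few lines. This is a plain sketch that simply re-enters the exercise data; the variable names are our own.

```python
import math

# Exercise data from the table
x = [1.25, 1.75, 2.25, 2.00, 2.50, 2.25, 2.70, 2.50]
y = [125, 105, 65, 85, 75, 80, 50, 55]

n = len(x)
x_bar = sum(x) / n  # 17.20 / 8 = 2.15
y_bar = sum(y) / n  # 640 / 8 = 80

# Sums of squares and cross-products of deviations from the means
ssx = sum((xi - x_bar) ** 2 for xi in x)
ssy = sum((yi - y_bar) ** 2 for yi in y)
ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = ssxy / math.sqrt(ssx * ssy)
print(round(ssx, 4), round(ssy, 4), round(ssxy, 4), round(r, 3))
```

Up to floating-point rounding, this should reproduce SSX = 1.56, SSY = 4450, SSXY = -79.75 and r ≈ -0.957.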
Alternative Formulas for Sums of Squares
$$ \mathrm{SSX} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad \mathrm{SSY} = \sum y^2 - \frac{(\sum y)^2}{n}, \qquad \mathrm{SSXY} = \sum xy - \frac{(\sum x)(\sum y)}{n} $$
x       y     x²        y²       xy
1.25    125   1.5625    15625    156.25
1.75    105   3.0625    11025    183.75
2.25     65   5.0625     4225    146.25
2.00     85   4.0000     7225    170.00
2.50     75   6.2500     5625    187.50
2.25     80   5.0625     6400    180.00
2.70     50   7.2900     2500    135.00
2.50     55   6.2500     3025    137.50
17.20   640   38.5400   55650    1296.25

SSX = 38.54 - (17.20)² / 8 = 38.54 - 36.98 = 1.56
SSY = 55650 - (640)² / 8 = 55650 - 51200 = 4450
SSXY = 1296.25 - (17.20)(640) / 8 = 1296.25 - 1376 = -79.75

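$$ r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\mathrm{SSXY}}{\sqrt{\mathrm{SSX}\,\mathrm{SSY}}} = \frac{-79.75}{\sqrt{1.56 \times 4450}} = -0.957 $$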
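The shortcut formulas need only the raw column totals, so no deviations from the mean are required. The following sketch re-enters the same exercise data and computes the totals in the bottom row of the table; the variable names are our own.

```python
import math

x = [1.25, 1.75, 2.25, 2.00, 2.50, 2.25, 2.70, 2.50]
y = [125, 105, 65, 85, 75, 80, 50, 55]
n = len(x)

# Raw column totals, as in the bottom row of the table
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Shortcut (computational) formulas for the sums of squares
ssx = sum_x2 - sum_x ** 2 / n
ssy = sum_y2 - sum_y ** 2 / n
ssxy = sum_xy - sum_x * sum_y / n

r = ssxy / math.sqrt(ssx * ssy)
print(round(r, 3))  # -0.957
```

These formulas are convenient by hand. In floating point, however, subtracting two large, nearly equal totals can lose precision when the means are large compared with the spread, which is why software often prefers the deviation form.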
Smoking and Lung Capacity Example

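Cigarettes (X)   X²     XY      Y²      Lung Capacity (Y)
0                0      0       2025    45
5                25     210     1764    42
10               100    330     1089    33
15               225    465     961     31
20               400    580     841     29
50               750    1585    6680    180

$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{(5)(1585) - (50)(180)}{\sqrt{\left[(5)(750) - 50^2\right]\left[(5)(6680) - 180^2\right]}} $$
$$ = \frac{7925 - 9000}{\sqrt{(3750 - 2500)(33400 - 32400)}} = \frac{-1075}{\sqrt{(1250)(1000)}} = -0.9615 $$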
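The same calculation can be written with integer sums, so every intermediate value in the hand computation can be checked exactly. This is a sketch using the table's data; the variable names are our own.

```python
import math

# Cigarettes (X) and lung capacity (Y) from the table
X = [0, 5, 10, 15, 20]
Y = [45, 42, 33, 31, 29]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)                  # 50, 180
sum_x2 = sum(x * x for x in X)                 # 750
sum_y2 = sum(y * y for y in Y)                 # 6680
sum_xy = sum(x * y for x, y in zip(X, Y))      # 1585

numerator = n * sum_xy - sum_x * sum_y         # 7925 - 9000 = -1075
denom_x = n * sum_x2 - sum_x ** 2              # 3750 - 2500 = 1250
denom_y = n * sum_y2 - sum_y ** 2              # 33400 - 32400 = 1000

r = numerator / math.sqrt(denom_x * denom_y)
print(round(r, 4))  # -0.9615
```

Multiplying SSX, SSY and SSXY by n gives this form, which avoids dividing by n until the final step.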
Exercise

Compute the coefficient of correlation for the following data set:

X: 100 200 300 400 500 600 700


Y: 30 50 60 80 100 110 130

Ans: r = 0.997
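One way to check this answer is with the n-form of the formula, using integer sums so that the arithmetic is exact until the final division. This is a sketch, and the variable names are our own.

```python
import math

X = [100, 200, 300, 400, 500, 600, 700]
Y = [30, 50, 60, 80, 100, 110, 130]
n = len(X)

# Integer sums keep the intermediate arithmetic exact
sum_x, sum_y = sum(X), sum(Y)
numerator = n * sum(x * y for x, y in zip(X, Y)) - sum_x * sum_y
denom_x = n * sum(x * x for x in X) - sum_x ** 2
denom_y = n * sum(y * y for y in Y) - sum_y ** 2

r = numerator / math.sqrt(denom_x * denom_y)
print(round(r, 3))  # 0.997
```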
The following data give indices of industrial production and the number of registered unemployed people (in lakh). Calculate the correlation coefficient.

Year   Production   Unemployed (lakh)
1991   100          15
1992   102          12
1993   104          13
1994   107          11
1995   105          12
1996   112          12
1997   103          19
1998   99           26

Ans: r = -0.619
