
Simple correlation

Basic concept:

Coefficient of correlation: The coefficient of correlation is a measure of the linear relationship between two random variables. It measures both the strength and the direction of the linear relationship between the two variables.

Karl Pearson developed a formula to measure the degree of linear relationship between two variables.

If X and Y are two random variables, then the coefficient of correlation between X and Y is denoted by $\rho_{XY}$ or $r_{XY}$ and is defined by

$$r_{XY} = \frac{Cov(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2 \cdot \frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

Example: i) Height and weight of a person.

ii) Price and demand of a thing.
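
The definition above translates directly into code. Below is a minimal sketch in plain Python (no libraries) that computes r from the population (1/n) formulas; the height/weight figures are made-up illustrative numbers, not data from the text.

```python
import math

def pearson_r(x, y):
    """r = Cov(X, Y) / sqrt(V(X) * V(Y)), population (1/n) form."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
    v_x = sum((a - x_bar) ** 2 for a in x) / n
    v_y = sum((b - y_bar) ** 2 for b in y) / n
    return cov / math.sqrt(v_x * v_y)

# Hypothetical height (cm) / weight (kg) pairs: taller people tend to weigh more
heights = [150, 155, 160, 165, 170, 175]
weights = [50, 53, 55, 60, 64, 66]
r = pearson_r(heights, weights)   # close to +1
```

Since the 1/n factors cancel between numerator and denominator, the same function also matches the last form of the formula above.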

Different kinds of correlation:

Positive correlation: If the two variables vary in the same direction, i.e. an increase or decrease in one variable results in a corresponding increase or decrease in the other variable, the correlation is called positive.

Example: Height and weight of a person.


Negative correlation: If two variables consistently vary in opposite directions, i.e. when one variable increases the other decreases, or when one decreases the other increases, the correlation is said to be negative.

Example: Price and demand of crops.

Non-sense correlation: When two variables X and Y are linearly independent, the value of the correlation coefficient is zero. But r = 0 does not mean that the two variables X and Y are unrelated. This type of correlation is known as non-sense correlation.

Example: The height and age of university student.

Properties of correlation coefficient:

i) The correlation coefficient is independent of the change of origin and scale of measurement.
ii) The value of the correlation coefficient lies between -1 and +1.
iii) The correlation coefficient is the geometric mean of the two regression coefficients.
iv) The correlation coefficient is a symmetric measure.
v) The correlation coefficient is a dimensionless quantity; it is not expressed in any units of measurement.

Measure of correlation:

The strength of correlation can be read from the value of r on the following scale:

r = -1: perfect indirect (negative) correlation
-1 < r ≤ -0.75: strong negative
-0.75 < r ≤ -0.25: intermediate negative
-0.25 < r < 0: weak negative
r = 0: no relation
0 < r < 0.25: weak positive
0.25 ≤ r < 0.75: intermediate positive
0.75 ≤ r < 1: strong positive
r = +1: perfect direct (positive) correlation
Example question: Discuss the situation when r = -1, r = +1, r = -0.93 and r = 0.65.

r = -1 means there exists a perfect negative relationship between the two variables X and Y.

r = +1 means there exists a perfect positive relationship between them.

r = -0.93 indicates that there exists a strong negative relationship between X and Y.

r = 0.65 means there exists a moderate positive relationship between them.
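
This interpretation can be mechanized. A small sketch, assuming the 0.25 / 0.75 cut-offs from the scale above (the function name describe_r is my own choice):

```python
def describe_r(r):
    """Map r in [-1, 1] to a verbal label using the 0.25 / 0.75 cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no relation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.25:
        strength = "intermediate"
    else:
        strength = "weak"
    return f"{strength} {direction}"
```

For the example question: describe_r(-1) gives "perfect negative", describe_r(-0.93) gives "strong negative", and describe_r(0.65) gives "intermediate positive".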

Theorem: Prove that the correlation coefficient lies between -1 and +1.

Proof: Let $X = x_1, x_2, x_3, \ldots, x_n$ and $Y = y_1, y_2, y_3, \ldots, y_n$ be two random variables having means $\bar{X}$ and $\bar{Y}$. Their corresponding variances are $\sigma_X^2$ and $\sigma_Y^2$.

Now by the definition of variance we can write,

$$V(X) = \sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2 \;\Rightarrow\; n\sigma_X^2 = \sum_{i=1}^{n}(X_i-\bar{X})^2$$

And

$$V(Y) = \sigma_Y^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})^2 \;\Rightarrow\; n\sigma_Y^2 = \sum_{i=1}^{n}(Y_i-\bar{Y})^2$$

By the definition of the correlation coefficient we can write,

$$r = \frac{Cov(X, Y)}{\sqrt{V(X)\,V(Y)}} \;\Rightarrow\; r = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sigma_X^2\,\sigma_Y^2}}$$

$$\therefore\; nr\sigma_X\sigma_Y = \sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}) \quad ------(i)$$
Let us consider the following expression, which, being a sum of squares, is always non-negative:

$$\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_X} \pm \frac{Y_i-\bar{Y}}{\sigma_Y}\right)^2 \ge 0$$

Expanding the square and taking the sum we get,

$$\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_X}\right)^2 + \sum_{i=1}^{n}\left(\frac{Y_i-\bar{Y}}{\sigma_Y}\right)^2 \pm 2\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_X}\right)\left(\frac{Y_i-\bar{Y}}{\sigma_Y}\right) \ge 0$$

$$\Rightarrow \frac{1}{\sigma_X^2}\sum_{i=1}^{n}(X_i-\bar{X})^2 + \frac{1}{\sigma_Y^2}\sum_{i=1}^{n}(Y_i-\bar{Y})^2 \pm 2\cdot\frac{1}{\sigma_X\sigma_Y}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}) \ge 0$$

Substituting $n\sigma_X^2$, $n\sigma_Y^2$ and equation (i),

$$\Rightarrow \frac{1}{\sigma_X^2}\,n\sigma_X^2 + \frac{1}{\sigma_Y^2}\,n\sigma_Y^2 \pm 2\cdot\frac{1}{\sigma_X\sigma_Y}\,nr\sigma_X\sigma_Y \ge 0$$

$$\Rightarrow 2n \pm 2rn \ge 0$$

$$\Rightarrow 2n(1 \pm r) \ge 0$$

$$\therefore\; 1 \pm r \ge 0$$

Considering the positive sign,

$$1 + r \ge 0 \;\Rightarrow\; r \ge -1 \;\therefore\; -1 \le r \quad -------(iii)$$

Considering the negative sign,

$$1 - r \ge 0 \;\Rightarrow\; r \le 1 \;\therefore\; r \le 1 \quad -------(iv)$$

From equations (iii) and (iv) we can write,

$$-1 \le r \le +1$$

So the correlation coefficient lies between -1 and +1.

(Proved)
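
The bound can also be checked empirically: for arbitrary random samples, the computed r never leaves [-1, 1]. A quick sketch (sample size, number of trials and seed are arbitrary choices):

```python
import random

def pearson_r(x, y):
    """Pearson correlation; the 1/n factors cancel, so plain sums suffice."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    v_x = sum((a - x_bar) ** 2 for a in x)
    v_y = sum((b - y_bar) ** 2 for b in y)
    return cov / (v_x * v_y) ** 0.5

random.seed(0)
rs = []
for _ in range(500):
    x = [random.gauss(0, 1) for _ in range(20)]
    y = [random.gauss(0, 1) for _ in range(20)]
    rs.append(pearson_r(x, y))
all_in_bounds = all(-1.0 <= r <= 1.0 for r in rs)
```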

Theorem: Show that the coefficient of correlation is independent of the change of origin and scale of measurement.

Proof: Let X and Y be two random variables. Then the correlation coefficient between X and Y is defined as,

$$r_{XY} = \frac{Cov(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \quad ------(i)$$

Let us suppose,

$$U_i = \frac{X_i - A}{h} \quad \text{and} \quad V_i = \frac{Y_i - B}{k}$$

That means we have shifted the origin of X to A and the origin of Y to B, and changed the scale of X by h and the scale of Y by k.

Now,

$$U_i = \frac{X_i - A}{h} \;\Rightarrow\; X_i = A + hU_i$$

Summing over i and dividing by n,

$$\frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}\sum_{i=1}^{n} A + h\cdot\frac{1}{n}\sum_{i=1}^{n} U_i \;\therefore\; \bar{X} = A + h\bar{U}$$

And

$$V_i = \frac{Y_i - B}{k} \;\Rightarrow\; Y_i = B + kV_i$$

$$\frac{1}{n}\sum_{i=1}^{n} Y_i = \frac{1}{n}\sum_{i=1}^{n} B + k\cdot\frac{1}{n}\sum_{i=1}^{n} V_i \;\therefore\; \bar{Y} = B + k\bar{V}$$

Putting these values in equation (i) we get,

$$r_{XY} = \frac{\sum_{i=1}^{n}(A + hU_i - A - h\bar{U})(B + kV_i - B - k\bar{V})}{\sqrt{\sum_{i=1}^{n}(A + hU_i - A - h\bar{U})^2\,\sum_{i=1}^{n}(B + kV_i - B - k\bar{V})^2}}$$

$$= \frac{hk\sum_{i=1}^{n}(U_i-\bar{U})(V_i-\bar{V})}{\sqrt{h^2\sum_{i=1}^{n}(U_i-\bar{U})^2 \cdot k^2\sum_{i=1}^{n}(V_i-\bar{V})^2}}$$

$$= \frac{hk\sum_{i=1}^{n}(U_i-\bar{U})(V_i-\bar{V})}{hk\sqrt{\sum_{i=1}^{n}(U_i-\bar{U})^2\,\sum_{i=1}^{n}(V_i-\bar{V})^2}} \quad \text{[taking } h, k > 0\text{]}$$

$$= r_{UV}$$

$$\therefore\; r_{XY} = r_{UV}$$

So the correlation coefficient is independent of the change of origin and scale of measurement.
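
This invariance is easy to verify numerically. In the sketch below the shift and scale constants A = 10, h = 2, B = -3, k = 5 are arbitrary choices (both scales positive, as the proof's last step assumes):

```python
def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return cov / (sum((a - x_bar) ** 2 for a in x)
                  * sum((b - y_bar) ** 2 for b in y)) ** 0.5

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 6.0, 9.0]
# U = (X - A)/h and V = (Y - B)/k with A = 10, h = 2, B = -3, k = 5
u = [(xi - 10.0) / 2.0 for xi in x]
v = [(yi + 3.0) / 5.0 for yi in y]
r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)   # equal to r_xy up to rounding
```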
Theorem: Prove that the correlation coefficient is the geometric mean of the two regression coefficients.

Proof: Let X and Y be two variables. Then the regression coefficient of Y on X is defined as,

$$b_{Y|X} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} \quad ---(i)$$

The regression coefficient of X on Y is defined as,

$$b_{X|Y} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(Y_i-\bar{Y})^2} \quad ---(ii)$$

By the definition of the correlation coefficient we know that,

$$r_{XY} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

Multiplying equations (i) and (ii) we get,

$$b_{Y|X} \times b_{X|Y} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} \times \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}$$

$$\Rightarrow b_{Y|X} \times b_{X|Y} = \frac{\left\{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})\right\}^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}$$

$$\Rightarrow \sqrt{b_{Y|X} \times b_{X|Y}} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

$$\Rightarrow \sqrt{b_{Y|X} \times b_{X|Y}} = r_{XY}$$

$$\therefore\; r_{XY} = \sqrt{b_{Y|X} \times b_{X|Y}}$$

∴ The correlation coefficient is the geometric mean of the two regression coefficients.

(Showed)
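
A numerical check of the identity, using small made-up data (the helper name s_xy and the variable names b_yx, b_xy are my own):

```python
def mean(v):
    return sum(v) / len(v)

def s_xy(x, y):
    """Sum of products of deviations -- the shared numerator above."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.0, 1.5, 3.0, 4.5, 5.0]
b_yx = s_xy(x, y) / s_xy(x, x)   # regression coefficient of Y on X
b_xy = s_xy(x, y) / s_xy(y, y)   # regression coefficient of X on Y
r = s_xy(x, y) / (s_xy(x, x) * s_xy(y, y)) ** 0.5
```

Here r is positive, so r equals the positive square root of the product; when both regression coefficients are negative, r is the negative root, with the sign taken from the covariance.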

Problem: aX + bY + c = 0 is an equation. Prove that the correlation coefficient between X and Y is -1 if the signs of a and b are alike, and +1 if they are different.

Proof: The given equation is,

$$aX + bY + c = 0 \quad -------(i)$$

$$aE(X) + bE(Y) + c = 0 \quad ---(ii) \quad \text{[taking expectation on both sides]}$$

Subtracting (ii) from (i) we get,

$$aX + bY - aE(X) - bE(Y) = 0$$

$$\Rightarrow a\{X - E(X)\} + b\{Y - E(Y)\} = 0$$

$$\therefore\; \{X - E(X)\} = -\frac{b}{a}\{Y - E(Y)\} \quad -----(iii)$$

By the definition of covariance we know,

$$Cov(X, Y) = E[\{X - E(X)\}\{Y - E(Y)\}] = E\left[-\frac{b}{a}\{Y - E(Y)\}\{Y - E(Y)\}\right] = -\frac{b}{a}E[\{Y - E(Y)\}^2] = -\frac{b}{a}\sigma_Y^2$$

Squaring both sides of equation (iii) and taking expectation we get,

$$E[\{X - E(X)\}^2] = \frac{b^2}{a^2}E[\{Y - E(Y)\}^2] \;\Rightarrow\; V(X) = \frac{b^2}{a^2}\sigma_Y^2$$

By the definition of correlation we get,

$$r = \frac{Cov(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{-\frac{b}{a}\sigma_Y^2}{\sqrt{\frac{b^2}{a^2}\sigma_Y^2 \cdot \sigma_Y^2}} = \frac{-\frac{b}{a}\sigma_Y^2}{\left|\frac{b}{a}\right|\sigma_Y^2} = \frac{-\frac{b}{a}}{\left|\frac{b}{a}\right|}$$
When a and b have opposite signs, $\frac{b}{a} < 0$, so $\left|\frac{b}{a}\right| = -\frac{b}{a}$ and

$$r = \frac{-\frac{b}{a}}{-\frac{b}{a}} = +1 \;\therefore\; r = +1$$

When a and b have the same sign, $\frac{b}{a} > 0$, so $\left|\frac{b}{a}\right| = \frac{b}{a}$ and

$$r = \frac{-\frac{b}{a}}{\frac{b}{a}} = -1 \;\therefore\; r = -1$$

So the correlation coefficient between X and Y is -1 if signs of a and b are alike and
+1 if they are different.

(Proved)
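
A quick numerical confirmation. The coefficients below are arbitrary examples: with a = 2, b = 3 (same sign) every point lies on a line of negative slope, so r = -1; with a = 2, b = -3 (different signs) the slope is positive and r = +1.

```python
def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return cov / (sum((a - x_bar) ** 2 for a in x)
                  * sum((b - y_bar) ** 2 for b in y)) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0]
# aX + bY + c = 0  =>  Y = -(aX + c)/b
y_same = [-(2.0 * xi + 1.0) / 3.0 for xi in x]    # a, b same sign  -> r = -1
y_diff = [-(2.0 * xi + 1.0) / -3.0 for xi in x]   # a, b different  -> r = +1
```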

Problem: If X and Y are uncorrelated, find the correlation coefficient between (X+Y) and (X-Y).

Solution: Let,

$$U = X + Y \;\therefore\; \bar{U} = \bar{X} + \bar{Y} \quad \text{[taking the sum and dividing by n]}$$

$$V = X - Y \;\therefore\; \bar{V} = \bar{X} - \bar{Y} \quad \text{[taking the sum and dividing by n]}$$

By the definition of correlation we know that,

$$r_{UV} = \frac{Cov(U, V)}{\sqrt{V(U)\,V(V)}} \quad -----(i)$$
Let,

$$V(X) = \sigma_X^2 \quad \text{and} \quad V(Y) = \sigma_Y^2$$

Now,

$$V(U) = V(X+Y) = V(X) + V(Y) + 2Cov(X, Y) = \sigma_X^2 + \sigma_Y^2 + 0 = \sigma_X^2 + \sigma_Y^2 \quad \text{[since X and Y are uncorrelated]}$$

And

$$V(V) = V(X-Y) = V(X) + V(Y) - 2Cov(X, Y) = \sigma_X^2 + \sigma_Y^2 - 0 = \sigma_X^2 + \sigma_Y^2$$

Again,

$$Cov(U, V) = E[(U - \bar{U})(V - \bar{V})] = E[(X + Y - \bar{X} - \bar{Y})(X - Y - \bar{X} + \bar{Y})]$$

$$= E[\{(X - \bar{X}) + (Y - \bar{Y})\}\{(X - \bar{X}) - (Y - \bar{Y})\}]$$

$$= E[(X - \bar{X})^2] - E[(Y - \bar{Y})^2] = \sigma_X^2 - \sigma_Y^2$$

Now putting these values in equation (i) we get,

$$r_{UV} = \frac{\sigma_X^2 - \sigma_Y^2}{\sqrt{(\sigma_X^2 + \sigma_Y^2)(\sigma_X^2 + \sigma_Y^2)}} = \frac{\sigma_X^2 - \sigma_Y^2}{\sqrt{(\sigma_X^2 + \sigma_Y^2)^2}}$$

$$\therefore\; r_{UV} = \frac{\sigma_X^2 - \sigma_Y^2}{\sigma_X^2 + \sigma_Y^2}$$

(Showed)
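
A simulation check of this formula. The standard deviations are assumed values for illustration: with σ_X = 2 and σ_Y = 1 the predicted correlation is (4 - 1)/(4 + 1) = 0.6.

```python
import random

def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return cov / (sum((a - x_bar) ** 2 for a in x)
                  * sum((b - y_bar) ** 2 for b in y)) ** 0.5

random.seed(7)
n = 50_000
x = [random.gauss(0, 2.0) for _ in range(n)]   # sigma_X = 2, so V(X) = 4
y = [random.gauss(0, 1.0) for _ in range(n)]   # sigma_Y = 1, so V(Y) = 1
u = [a + b for a, b in zip(x, y)]              # U = X + Y
v = [a - b for a, b in zip(x, y)]              # V = X - Y
r = pearson_r(u, v)   # close to (4 - 1)/(4 + 1) = 0.6
```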

Problem: If $X_1$, $X_2$ and $X_3$ are three uncorrelated variables with equal variance $\sigma^2$, show that the correlation coefficient between $X_1 + X_2$ and $X_2 + X_3$ is $\frac{1}{2}$.

Solution: Let the variance of the variables be,

$$V(X_1) = V(X_2) = V(X_3) = \sigma^2$$

As the variables are uncorrelated,

$$Cov(X_1, X_2) = Cov(X_2, X_3) = Cov(X_1, X_3) = 0$$

Let,

$$U = X_1 + X_2 \;\therefore\; \bar{U} = \bar{X}_1 + \bar{X}_2 \quad \text{[taking the sum and dividing by n]}$$

$$V = X_2 + X_3 \;\therefore\; \bar{V} = \bar{X}_2 + \bar{X}_3 \quad \text{[taking the sum and dividing by n]}$$

By the definition of correlation we know that,

$$r_{UV} = \frac{Cov(U, V)}{\sqrt{V(U)\,V(V)}} \quad -----(i)$$
We know,

$$Cov(U, V) = E[(U - \bar{U})(V - \bar{V})] = E[(X_1 + X_2 - \bar{X}_1 - \bar{X}_2)(X_2 + X_3 - \bar{X}_2 - \bar{X}_3)]$$

$$= E[\{(X_1 - \bar{X}_1) + (X_2 - \bar{X}_2)\}\{(X_2 - \bar{X}_2) + (X_3 - \bar{X}_3)\}]$$

$$= Cov(X_1, X_2) + V(X_2) + Cov(X_1, X_3) + Cov(X_2, X_3)$$

$$= 0 + \sigma^2 + 0 + 0 = \sigma^2$$

Again,

$$V(U) = E[(U - \bar{U})^2] = E[(X_1 + X_2 - \bar{X}_1 - \bar{X}_2)^2] = V(X_1) + V(X_2) + 2Cov(X_1, X_2) = \sigma^2 + \sigma^2 + 0 = 2\sigma^2$$

Similarly,

$$V(V) = V(X_2) + V(X_3) + 2Cov(X_2, X_3) = 2\sigma^2$$

Putting these values in equation (i) we get,

$$r_{UV} = \frac{\sigma^2}{\sqrt{2\sigma^2 \cdot 2\sigma^2}} = \frac{\sigma^2}{2\sigma^2} = \frac{1}{2}$$

(Showed)
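
A simulation sketch of this last result (sample size and seed are arbitrary): three independent standard-normal samples stand in for the uncorrelated variables, and the sample correlation of X1 + X2 with X2 + X3 lands near 1/2.

```python
import random

def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return cov / (sum((a - x_bar) ** 2 for a in x)
                  * sum((b - y_bar) ** 2 for b in y)) ** 0.5

random.seed(42)
n = 50_000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
u = [a + b for a, b in zip(x1, x2)]   # X1 + X2
v = [b + c for b, c in zip(x2, x3)]   # X2 + X3
r = pearson_r(u, v)   # close to 0.5
```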
