Correlation

Md. Mahfuzur Rahman
Senior Lecturer, Statistics

MZR_Spring2023_STA201
Content

• Correlation
• Simple correlation (Pearson correlation)
• Rank correlation (Spearman correlation)
Correlation

In many applications, we want to study the nature of the relationships among variables. We may also want to use these relationships to predict or estimate the values of some variables from the given values of the others. Exploring these relationships can yield important findings and provide the inputs needed for useful decisions. Some examples of such relationships are:

(i) relationship between height and weight,
(ii) relationship between weight and cholesterol level,
(iii) relationship between income and expenditure, etc.
In this type of study, we are interested in answering several important questions, some of which are:

(i) Is there a relationship between the variables? What is the nature of this relationship? What is the strength of this relationship?
(ii) If there is a relationship between the variables, how can we formulate it mathematically? How can we utilize it?
• Relationship between two or more variables

When variables are found to be related, we often want to know how close the relationship is. This type of analysis is known as correlation analysis.

The primary objective of correlation analysis is to measure:

• Degree or strength of the relationship
• Direction of the relationship

Correlation does not necessarily mean causation.
• Example:

No. of family members, X    Monthly expenditure on food (thousand taka), Y
2                           5
3                           7
6                           11
4                           8
7                           13
3                           6
6                           12
Scatter Diagram

[Figure: scatter plot of monthly expenditure on food (thousand taka) against number of family members]
[Figure: two scatter plots, titled "Positive correlation" and "Negative correlation"]
[Figure: two scatter plots, titled "Non-linear correlation" and "No correlation"]
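Simple correlation

• Pearson correlation coefficient:

$$
r = \frac{\operatorname{cov}(X, Y)}{\sqrt{v(X)\, v(Y)}}
  = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}
  = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]}}
$$

or, dropping the subscripts,

$$
r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}
$$

• X and Y are two numerical variables and n is the number of pairs.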
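As a rough illustration of the Pearson formula, the following Python sketch computes r in two ways: from deviations about the means and from the raw sums. The function names `pearson_deviation` and `pearson_sums` are illustrative choices, not part of the lecture. The data are the family-size example from these slides.

```python
from math import sqrt

def pearson_deviation(x, y):
    """Pearson r from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def pearson_sums(x, y):
    """Pearson r from the raw sums (computational form)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# x = number of family members, y = monthly expenditure on food (thousand taka)
x = [2, 3, 6, 4, 7, 3, 6]
y = [5, 7, 11, 8, 13, 6, 12]

print(round(pearson_deviation(x, y), 3))  # 0.991
print(round(pearson_sums(x, y), 3))       # 0.991
```

Both forms are algebraically identical. The raw-sum form is convenient for hand calculation from a table of x², y² and xy.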
Interpretation

• r > 0: Positive linear relationship
• r < 0: Negative linear relationship
• r = 0: No linear relationship

[Figure: scale of r running from -1 (perfect negative correlation) through 0 (no linear relationship) to +1 (perfect positive correlation); values below 0 indicate negative correlation and values above 0 indicate positive correlation]
Correlation coefficient    Correlation type
-1                         Perfect negative
-0.99 to -0.51             Strong negative
-0.5                       Moderate negative
-0.49 to -0.01             Weak negative
0                          No correlation
0.01 to 0.49               Weak positive
0.5                        Moderate positive
0.51 to 0.99               Strong positive
1                          Perfect positive
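The ranges in the table above can be turned into a small lookup, sketched below in Python. The function name `correlation_type` is hypothetical. Because the table lists values to two decimals, the sketch rounds |r| to two decimals before comparing; that rounding rule is an assumption, not something the slides specify.

```python
def correlation_type(r):
    """Label r using the table's ranges (|r| rounded to two decimals)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    magnitude = abs(r)
    if magnitude == 0:
        return "No correlation"
    if magnitude == 1:
        strength = "Perfect"
    else:
        rounded = round(magnitude, 2)  # assumption: match the table's precision
        if rounded > 0.5:
            strength = "Strong"
        elif rounded == 0.5:
            strength = "Moderate"
        else:
            strength = "Weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for value in (-1, -0.75, -0.5, -0.2, 0, 0.3, 0.5, 0.991, 1):
    print(value, "->", correlation_type(value))
```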
Assumptions:

• Both X and Y are measured on an interval or ratio scale
• The two variables follow a bivariate normal distribution
• The relationship between the variables is linear
• The sample is of adequate size to assume normality
Example (continued)

No. of family    Monthly expenditure on
members, x       food (thousand taka), y    x²     y²     xy
2                5                          4      25     10
3                7                          9      49     21
6                11                         36     121    66
4                8                          16     64     32
7                13                         49     169    91
3                6                          9      36     18
6                12                         36     144    72

∑x = 31, ∑y = 62, ∑x² = 159, ∑y² = 608, ∑xy = 310
$$
r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}
  = \frac{7 \times 310 - 31 \times 62}{\sqrt{\left[7 \times 159 - 31^2\right]\left[7 \times 608 - 62^2\right]}}
  = 0.991
$$

Interpretation: There is a very strong positive relationship between the number of family members and monthly expenditure on food. That is, the two variables increase or decrease together.
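For readers who want the intermediate arithmetic, the value of r for the family-size example (n = 7, ∑x = 31, ∑y = 62, ∑x² = 159, ∑y² = 608, ∑xy = 310) can be worked out step by step as follows:

$$
r = \frac{2170 - 1922}{\sqrt{(1113 - 961)(4256 - 3844)}}
  = \frac{248}{\sqrt{152 \times 412}}
  = \frac{248}{\sqrt{62624}}
  \approx \frac{248}{250.25}
  \approx 0.991
$$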
Properties:

• r always measures linear relationships
• r = 0 doesn't necessarily mean that X and Y are not related, but that they are not linearly related
• $r_{xy} = r_{yx}$, i.e. the correlation coefficient is a symmetrical measure
• The correlation coefficient is a dimensionless measure, implying that it is not expressed in any units of measurement
• Correlation doesn't mean causation, i.e. correlation doesn't necessarily imply any cause and effect relationship
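Three of these properties can be checked numerically. The Python sketch below is only an illustration: the helper `pearson` and the variable names are hypothetical, and the y = x² data are made up to show an exact but non-linear relationship. The family-size data come from the slides.

```python
from math import sqrt

def pearson(x, y):
    """Pearson r from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# (1) r = 0 does not mean "unrelated": y is fully determined by x here.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_nonlinear = pearson(x, y)

# (2) Symmetry and (3) freedom from units, using the family-size example.
members = [2, 3, 6, 4, 7, 3, 6]
spend_thousand = [5, 7, 11, 8, 13, 6, 12]        # thousand taka
spend_taka = [1000 * v for v in spend_thousand]  # same amounts in taka

r_xy = pearson(members, spend_thousand)
r_yx = pearson(spend_thousand, members)
r_taka = pearson(members, spend_taka)

print(abs(r_nonlinear) < 1e-12)     # True: no linear relationship
print(abs(r_xy - r_yx) < 1e-12)     # True: symmetric
print(abs(r_xy - r_taka) < 1e-12)   # True: unchanged by the unit of measurement
```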
Example:

An executive manager of a private hospital was interested in studying the relationship between the monthly number of part-time physicians (X) hired in the hospital and the monthly extra profit earned by the hospital in thousands (Y). For this purpose, the manager selected a random sample of ten months and obtained the following data:

i     Xi    Yi
1     43    175
2     49    180
3     50    186
4     12    95
5     8     75
6     32    165
7     51    190
8     30    95
9     35    130
10    23    95

(1) Draw the scatter diagram. What indications does the scatter diagram reveal?
(2) Calculate Pearson's Correlation Coefficient (r).
Rank correlation

Spearman rank correlation (Spearman's rho), $r_s$, is the sample correlation coefficient r applied to the rank-order data.
Formula:

$$
r_s = \frac{\sum x_i y_i - C}{\sqrt{\left(\sum x_i^2 - C\right)\left(\sum y_i^2 - C\right)}}
$$

where $C = \dfrac{n(n+1)^2}{4}$, $x_i$ and $y_i$ are the ranks of the paired observations, and n is the number of pairs.
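A minimal Python sketch of this formula follows. It assumes tied values share the average of the ranks they occupy, which matches the ranks 2.5 and 5.5 used in the slides' example. The function names `average_ranks` and `spearman` are illustrative, not from the lecture.

```python
from math import sqrt

def average_ranks(values):
    """Rank from smallest (1) to largest (n); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman r_s via the correction term C = n(n+1)^2 / 4."""
    a, b = average_ranks(x), average_ranks(y)
    n = len(x)
    c = n * (n + 1) ** 2 / 4
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    sum_a2 = sum(ai ** 2 for ai in a)
    sum_b2 = sum(bi ** 2 for bi in b)
    return (sum_ab - c) / sqrt((sum_a2 - c) * (sum_b2 - c))

# x = number of family members, y = monthly expenditure on food (thousand taka)
x = [2, 3, 6, 4, 7, 3, 6]
y = [5, 7, 11, 8, 13, 6, 12]

print(average_ranks(x))          # [1.0, 2.5, 5.5, 4.0, 7.0, 2.5, 5.5]
print(round(spearman(x, y), 3))  # 0.982
```

The term C equals n times the square of the mean rank, (n + 1)/2, so subtracting it plays the same role as subtracting the means in the Pearson formula.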
Example

No. of family    Monthly expenditure on     Rank of x,    Rank of y,
members, x       food (thousand taka), y    a             b             a²       b²     ab
2                5                          1             1             1        1      1
3                7                          2.5           3             6.25     9      7.5
6                11                         5.5           5             30.25    25     27.5
4                8                          4             4             16       16     16
7                13                         7             7             49       49     49
3                6                          2.5           2             6.25     4      5
6                12                         5.5           6             30.25    36     33
Total            -                          -             -             139      140    139
$$
C = \frac{n(n+1)^2}{4} = \frac{7(7+1)^2}{4} = 112
$$

$$
r_s = \frac{\sum a_i b_i - C}{\sqrt{\left(\sum a_i^2 - C\right)\left(\sum b_i^2 - C\right)}}
    = \frac{139 - 112}{\sqrt{(139 - 112)(140 - 112)}}
    = 0.982
$$

Interpretation: There is a very strong positive relationship between the number of family members and monthly expenditure.
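The intermediate arithmetic for the rank correlation of the family-size example (∑ab = 139, ∑a² = 139, ∑b² = 140, C = 112) can be written out as:

$$
r_s = \frac{27}{\sqrt{27 \times 28}} = \frac{27}{\sqrt{756}} \approx \frac{27}{27.495} \approx 0.982
$$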
Properties:

• The Spearman correlation coefficient ranges from -1 to 1, with a similar interpretation to that of the simple correlation coefficient r.
• $r_s$ is a measure of the monotonicity of a relationship
• Again, correlation doesn't mean causation, i.e. correlation doesn't necessarily imply any cause and effect relationship
Advantages and disadvantages

Advantage of r over $r_s$:

• r provides a more accurate result than $r_s$, when applicable, as r uses more information than $r_s$

Advantage of $r_s$ over r:

• $r_s$ is less affected by extreme observations
• $r_s$ can be calculated for a curvilinear (non-linear) relationship
• $r_s$ can be calculated for ordinal-level data
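The point about curvilinear relationships can be illustrated with a small Python sketch. The data below (y = 2^x) are invented for illustration, and the helper names are hypothetical. Since these data contain no ties, a simple ranking without tie handling is assumed. Here $r_s$ is computed as the Pearson coefficient of the ranks, following the definition of Spearman's rho given earlier.

```python
from math import sqrt

def pearson(x, y):
    """Pearson r from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks_no_ties(values):
    """Ranks 1..n for data without tied values (assumption for this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def spearman(x, y):
    """Spearman r_s: Pearson r applied to the ranks."""
    return pearson(ranks_no_ties(x), ranks_no_ties(y))

# A monotonic but strongly curved relationship: y doubles with each step in x.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2 ** v for v in x]

r = pearson(x, y)
r_s = spearman(x, y)

print(round(r_s, 3))  # 1.0: the ranks agree perfectly
print(r < r_s)        # True: r is pulled below 1 by the curvature
```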
