Correlation (STA 201)
Md. Mahfuzur Rahman
Senior Lecturer, Statistics
Content
Correlation
Simple correlation (Pearson correlation)
Rank correlation (Spearman correlation)
Correlation
In many applications, we may want to study the underlying nature of relationships among the variables. Furthermore, we may also want to utilize these relationships for predicting or estimating the values of some variables on the basis of the given values of the other variables. By exploring the underlying relationships, we can uncover important findings and provide the inputs required for useful decisions. Some examples of these relationships are:
(i) relationship between height and weight,
(ii) relationship between weight and cholesterol level,
(iii) relationship between income and expenditure, etc.
In this type of study, we are interested in answering several important questions, some of which are:
(i) Is there a relationship between the variables? What is the nature of this relationship? What is the strength of this relationship?
(ii) If there is a relationship between the variables, how can we formulate it mathematically? How can we utilize it?
Relationship between two or more variables
When variables are found to be related, we often want to know how close the relationship is. This type of analysis is known as correlation analysis.
The primary objective of correlation analysis is to measure:
(i) Degree or strength of relationship
(ii) Direction of relationship
Correlation does not necessarily mean causation.
Example:

No. of family members, X    Monthly expenditure on food (thousand taka), Y
2                           5
3                           7
6                           11
4                           8
7                           13
3                           6
6                           12
Scatter Diagram
[Figure: scatter plot of monthly expenditure on food (thousand taka) against number of family members.]
[Figure: two scatter plots, one showing positive correlation and one showing negative correlation.]
[Figure: two scatter plots, one showing non-linear correlation and one showing no correlation.]
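Simple correlation
Pearson correlation coefficient:

\[
r = \frac{\operatorname{cov}(X, Y)}{\sqrt{v(X)\, v(Y)}}
  = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}
  = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[ n \sum x_i^2 - \left( \sum x_i \right)^2 \right] \left[ n \sum y_i^2 - \left( \sum y_i \right)^2 \right]}}
\]

X & Y are two numerical variables and n is the number of pairs.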
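The Pearson formula can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the function names `pearson_r` and `pearson_r_deviations` are made up for this sketch, and the data are the family-size example from these notes. Both forms of the formula (raw sums and deviations from the mean) should give the same value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r via the computational (raw-sums) form of the formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least two pairs")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def pearson_r_deviations(x, y):
    """Pearson r via deviations from the means; should agree with pearson_r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Family-size example from these notes
family_members = [2, 3, 6, 4, 7, 3, 6]
food_expenditure = [5, 7, 11, 8, 13, 6, 12]

r = pearson_r(family_members, food_expenditure)
print(round(r, 3))  # 0.991
```

Note that both functions divide by zero if either variable is constant; in that case r is undefined.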
Interpretation
r > 0: Positive linear relationship
r < 0: Negative linear relationship
r = 0: No linear relationship
[Figure: scale of r from -1 to +1. r = -1 is perfect negative correlation, values between -1 and 0 indicate negative correlation, r = 0 indicates no linear relationship, values between 0 and +1 indicate positive correlation, and r = +1 is perfect positive correlation.]
Correlation coefficient    Correlation type
-1                         Perfect negative
(-.99)-(-.51)              Strong negative
-.5                        Moderate negative
(-.49)-(-.01)              Weak negative
0                          No correlation
.01-.49                    Weak positive
.5                         Moderate positive
.51-.99                    Strong positive
1                          Perfect positive
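The classification table above can be turned into a small lookup. The sketch below is one possible reading of the table, and the function name `correlation_type` is hypothetical. The table lists bands to two decimal places, so a value such as 0.995 falls between bands; this sketch assumes any magnitude strictly between 0.5 and 1 counts as strong and any magnitude strictly between 0 and 0.5 counts as weak.

```python
def correlation_type(r):
    """Label a correlation coefficient using the bands of the table.

    Assumption of this sketch: values that fall between the printed bands
    (for example 0.995) are assigned by simple thresholds on |r|.
    """
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size > 0.5:
        strength = "Strong"
    elif size == 0.5:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction}"


print(correlation_type(0.991))  # Strong positive
print(correlation_type(-0.3))   # Weak negative
```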
Assumptions:
Both X & Y are measured on an interval or ratio scale
The two variables follow a bivariate normal distribution
The relationship between the variables is linear
The sample is of adequate size to assume normality
Example (continues)

No. of family members, x    Monthly expenditure on food (thousand taka), y    x²    y²     xy
2                           5                                                 4     25     10
3                           7                                                 9     49     21
6                           11                                                36    121    66
4                           8                                                 16    64     32
7                           13                                                49    169    91
3                           6                                                 9     36     18
6                           12                                                36    144    72
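\[
r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}}
  = \frac{7 \times 310 - 31 \times 62}{\sqrt{\left[ 7 \times 159 - 31^2 \right] \left[ 7 \times 608 - 62^2 \right]}}
  = 0.991
\]

Interpretation: So, there is a very strong positive relationship between number of family members and monthly expenditure.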
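The intermediate arithmetic behind this value can be written out step by step. This is a worked check using the column totals of the example (n = 7, Σx = 31, Σy = 62, Σx² = 159, Σy² = 608, Σxy = 310):

```latex
\begin{aligned}
n \sum xy - \sum x \sum y &= 7(310) - (31)(62) = 2170 - 1922 = 248 \\
n \sum x^2 - \left( \sum x \right)^2 &= 7(159) - 31^2 = 1113 - 961 = 152 \\
n \sum y^2 - \left( \sum y \right)^2 &= 7(608) - 62^2 = 4256 - 3844 = 412 \\
r &= \frac{248}{\sqrt{152 \times 412}} = \frac{248}{\sqrt{62624}} \approx 0.991
\end{aligned}
```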
Properties:
r always measures linear relationships
r = 0 doesn’t necessarily mean that X & Y are not related, but that they are not linearly related.
r_xy = r_yx, i.e. the correlation coefficient is a symmetrical measure
The correlation coefficient is a dimensionless measure, implying that it is not expressed in any units of measurement
Correlation doesn’t mean causation, i.e. correlation doesn’t necessarily imply any cause and effect relationship
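Three of these properties can be checked numerically. The sketch below is illustrative only: `pearson_r` is a helper written for this example, the first data set is the family-size example from these notes, and the second (u, v) is a hypothetical data set chosen so that v is completely determined by u and yet r is zero.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [2, 3, 6, 4, 7, 3, 6]      # number of family members
y = [5, 7, 11, 8, 13, 6, 12]   # food expenditure in thousand taka

# Symmetry: swapping the roles of the two variables leaves r unchanged.
assert abs(pearson_r(x, y) - pearson_r(y, x)) < 1e-12

# Dimensionless: measuring expenditure in taka instead of thousand taka
# rescales y but does not change r.
y_in_taka = [1000 * value for value in y]
assert abs(pearson_r(x, y) - pearson_r(x, y_in_taka)) < 1e-12

# r = 0 does not mean "unrelated": v = u**2 exactly, but the relationship
# is not linear, so r is zero (hypothetical data).
u = [-2, -1, 0, 1, 2]
v = [value ** 2 for value in u]
print(pearson_r(u, v))  # 0.0
```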
Example:
An executive manager of a private hospital was interested in studying the relationship between the monthly number of part-time physicians (X) hired in the hospital and the monthly extra profit earned by the hospital in thousands (Y). For this purpose, the manager selected a random sample of ten months and obtained the following data:

i     Xi    Yi
1     43    175
2     49    180
3     50    186
4     12    95
5     8     75
6     32    165
7     51    190
8     30    95
9     35    130
10    23    95

(1) Draw the scatter diagram. What indications does the scatter diagram reveal?
(2) Calculate Pearson's Correlation Coefficient (r).
Rank correlation
Spearman rank correlation (Spearman’s rho) rs is the sample correlation coefficient r applied to the rank order data.
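The Spearman definition above can be illustrated in two steps: rank each variable, then apply the Pearson coefficient to the ranks. In the sketch below, the helper names `average_ranks` and `pearson_r` are made up for this illustration. Tied values are assumed to share the average of the ranks they occupy, which is how ranks such as 2.5 and 5.5 arise in the worked example of these notes.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


family_members = [2, 3, 6, 4, 7, 3, 6]
food_expenditure = [5, 7, 11, 8, 13, 6, 12]

rank_x = average_ranks(family_members)
rank_y = average_ranks(food_expenditure)
print(rank_x)  # [1.0, 2.5, 5.5, 4.0, 7.0, 2.5, 5.5]
print(rank_y)  # [1.0, 3.0, 5.0, 4.0, 7.0, 2.0, 6.0]
print(round(pearson_r(rank_x, rank_y), 3))  # 0.982
```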
Formula:

\[
r_s = \frac{\sum x_i y_i - C}{\sqrt{\left( \sum x_i^2 - C \right) \left( \sum y_i^2 - C \right)}},
\qquad \text{where } C = \frac{n (n+1)^2}{4}
\]

Here x_i and y_i are the ranks of the observations, and n is the number of pairs.
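The rank formula with the correction term C can be sketched directly. The function name `spearman_from_ranks` is hypothetical, and the function expects ranks that have already been assigned (with ties sharing an average rank). The ranks used here are those of the family-size example in these notes.

```python
from math import sqrt


def spearman_from_ranks(rank_x, rank_y):
    """Spearman rs from two lists of ranks, using C = n(n+1)^2 / 4."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("rank lists must be paired and contain at least two pairs")
    c = n * (n + 1) ** 2 / 4
    sum_xy = sum(a * b for a, b in zip(rank_x, rank_y))
    sum_x2 = sum(a * a for a in rank_x)
    sum_y2 = sum(b * b for b in rank_y)
    return (sum_xy - c) / sqrt((sum_x2 - c) * (sum_y2 - c))


# Ranks of family size (a) and food expenditure (b) from the example
rank_a = [1, 2.5, 5.5, 4, 7, 2.5, 5.5]
rank_b = [1, 3, 5, 4, 7, 2, 6]

print(round(spearman_from_ranks(rank_a, rank_b), 3))  # 0.982
```

C equals n times the square of the mean rank (n+1)/2, so subtracting it is the same as centring the ranks before applying the Pearson formula.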
Example

No. of family members, x    Monthly expenditure on food (thousand taka), y    Rank of x, a    Rank of y, b    a²       b²     ab
2                           5                                                 1               1               1        1      1
3                           7                                                 2.5             3               6.25     9      7.5
6                           11                                                5.5             5               30.25    25     27.5
4                           8                                                 4               4               16       16     16
7                           13                                                7               7               49       49     49
3                           6                                                 2.5             2               6.25     4      5
6                           12                                                5.5             6               30.25    36     33
Total                       -                                                 -               -               139      140    139
\[
C = \frac{n (n+1)^2}{4} = \frac{7 (7+1)^2}{4} = 112
\]

\[
r_s = \frac{\sum a_i b_i - C}{\sqrt{\left( \sum a_i^2 - C \right) \left( \sum b_i^2 - C \right)}}
    = \frac{139 - 112}{\sqrt{(139 - 112)(140 - 112)}}
    = 0.982
\]

Interpretation: So, there is a very strong positive relationship between number of family members and monthly expenditure.
Properties:
Spearman correlation coefficient ranges from -1 to 1 with similar interpretation to that for the simple correlation coefficient r.
rs is a measure of monotonicity of a relationship
Again, correlation doesn’t mean causation, i.e. correlation doesn’t necessarily imply any cause and effect relationship
Advantages and disadvantages
Advantage of r over rs:
r provides a more accurate result than rs, when applicable, as r uses more information than rs
Advantage of rs over r:
rs is less affected by extreme observations
rs can be calculated for curvilinear (non-linear) relationship
rs can be calculated for ordinal level of data.
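Two of the advantages of rs can be seen on small hypothetical data sets (invented for this sketch, not taken from the notes): one where y grows as the cube of x, and one where a single y value is extreme. The helpers `pearson_r`, `ranks_without_ties` and `spearman_rs` are illustrative names, and the ranking helper assumes all values are distinct.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks_without_ties(values):
    """Rank distinct values from 1 (smallest) to n (largest)."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]


def spearman_rs(x, y):
    """Spearman rs as the Pearson r of the ranks (no ties assumed)."""
    return pearson_r(ranks_without_ties(x), ranks_without_ties(y))


x = [1, 2, 3, 4, 5, 6]

# Curvilinear but monotone relationship (hypothetical): y = x cubed.
y_cubic = [value ** 3 for value in x]
print(pearson_r(x, y_cubic) < 1, spearman_rs(x, y_cubic))  # True 1.0

# One extreme observation in y (hypothetical): the ordering is unchanged,
# so rs stays at 1 while r is pulled well below 1.
y_extreme = [2, 4, 5, 7, 9, 100]
print(pearson_r(x, y_extreme) < 1, spearman_rs(x, y_extreme))  # True 1.0
```

In both cases the ranks of y are simply 1 to 6 in order, so rs treats the relationship as perfectly monotone, whereas r is reduced by the curvature or by the extreme value.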