
Duhok Polytechnic University

Technical College of Engineering


Chemical Engineering, 2nd Grade

Statistics and Probability

The Correlation
By: Dr. Firas M. AlFiky
2021-2022 Lec_007
Correlation
• Correlation finds the relationship between two quantitative
variables; on its own it does not allow causal relationships
to be inferred.
• Correlation is a statistical technique used to
determine the degree to which two variables are
related.
Scatter diagram
❖ Rectangular coordinate.
❖ Two quantitative variables.
❖ The horizontal axis is (X) and the vertical axis is (Y).
❖ Points are not joined.
❖ No frequency table.
Example
Wt. (kg)      67   69   85   83   74   81   97   92  114   85
SBP (mmHg)   120  125  140  160  130  180  150  140  200  130

(SBP) = Systolic Blood Pressure


[Figure: scatter diagram of weight (kg) against systolic blood pressure (SBP, mmHg)]
Scatter plots:
The pattern of data is indicative of the type of
relationship between your two variables:

➢ Positive relationship

➢ Negative relationship

➢ No relationship
Positive relationship
[Figure: scatter plot of height (cm) against age (weeks)]
Negative relationship
[Figure: scatter plot of reliability against age of car]
No relation
Correlation Coefficient
Statistic showing the sign and the degree of relation between two
variables.

Simple Correlation coefficient (r)


❑ It is also called Pearson's correlation or product moment correlation
coefficient.
❑ It measures the nature and strength of the association between two
quantitative variables.
❑ The sign of r denotes the nature of association.
❑ The value of r denotes the strength of association.
➢ If the sign is positive, the relation is direct: an increase in one
variable is associated with an increase in the other, and a decrease in
one variable is associated with a decrease in the other.
➢ If the sign is negative, the relation is inverse (indirect): an increase
in one variable is associated with a decrease in the other.
➢ The value of r ranges between -1 and +1.
➢ The value of r denotes the strength of the association, as
illustrated by the following scale:

Value of r          Interpretation
-1                  perfect indirect correlation
-1 to -0.75         strong indirect
-0.75 to -0.25      intermediate indirect
-0.25 to 0          weak indirect
0                   no relation
0 to 0.25           weak direct
0.25 to 0.75        intermediate direct
0.75 to 1           strong direct
+1                  perfect direct correlation
If r = 0, there is no association or correlation between the two variables.

The cut-offs below apply to the size of r regardless of its sign:

If 0 < |r| < 0.25: weak correlation.

If 0.25 ≤ |r| < 0.75: intermediate correlation.

If 0.75 ≤ |r| < 1: strong correlation.

If |r| = 1: perfect correlation.
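The cut-offs above can be written as a small lookup. The following is a minimal Python sketch; the function name `describe_correlation` is an illustrative choice, not part of the lecture, and it assumes the 0.25 and 0.75 cut-offs apply to the size of r regardless of sign.

```python
def describe_correlation(r):
    """Label r with the lecture's cut-offs at 0.25 and 0.75 (assumed to apply to |r|)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.25:
        strength = "intermediate"
    else:
        strength = "weak"
    # The sign gives the nature of the association.
    direction = "direct" if r > 0 else "indirect"
    return f"{direction} {strength} correlation"


print(describe_correlation(0.759))   # direct strong correlation
print(describe_correlation(-0.14))   # indirect weak correlation
```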
How to compute the simple correlation coefficient (r)

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
$$
Example:
A sample of 6 children was selected, data about their age in years
and weight in kilograms was recorded as shown in the following
table. It is required to find the correlation between age and weight.
Serial No.   Age (years)   Weight (kg)
1            7             12
2            6             8
3            8             12
4            5             10
5            6             11
6            9             13
Both variables are quantitative. One variable (age) is called the
independent variable and is denoted (X); the other (weight) is called
the dependent variable and is denoted (Y). To find the relation between
age and weight, compute the simple correlation coefficient using the
following formula:

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
$$
Serial   Age in years (x)   Weight in kg (y)   xy          x²          y²
1        7                  12                 84          49          144
2        6                  8                  48          36          64
3        8                  12                 96          64          144
4        5                  10                 50          25          100
5        6                  11                 66          36          121
6        9                  13                 117         81          169
Total    ∑x = 41            ∑y = 66            ∑xy = 461   ∑x² = 291   ∑y² = 742


$$
r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{(41)^2}{6}\right)\left(742 - \dfrac{(66)^2}{6}\right)}}
$$

r = 0.759, strong direct correlation.
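The hand calculation above can be checked with a few lines of code. This is a minimal sketch of the computational formula in plain Python; the function name `pearson_r` is an illustrative choice, and the data are the six children from the table.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r via the computational (sums) formula used in the lecture."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


age = [7, 6, 8, 5, 6, 9]           # x, years
weight = [12, 8, 12, 10, 11, 13]   # y, kg
r = pearson_r(age, weight)
print(f"r = {r:.4f}")
```

The result agrees with the slide's 0.759 to within rounding. Note that the function divides by zero if either variable is constant, a case the lecture does not cover.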


EXAMPLE:
Find the relationship between Anxiety and Test Scores…

Anxiety   Test score
10        2
8         3
2         9
1         7
5         6
6         5
Answer:

Anxiety (x)   Test score (y)   x²          y²          xy
10            2                100         4           20
8             3                64          9           24
2             9                4           81          18
1             7                1           49          7
5             6                25          36          30
6             5                36          25          30
∑x = 32       ∑y = 32          ∑x² = 230   ∑y² = 204   ∑xy = 129
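Calculating Correlation Coefficient

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
$$

With n = 6 and the totals from the table:

$$
r = \frac{129 - \dfrac{32 \times 32}{6}}{\sqrt{\left(230 - \dfrac{(32)^2}{6}\right)\left(204 - \dfrac{(32)^2}{6}\right)}} = \frac{-41.67}{\sqrt{59.33 \times 33.33}} \approx -0.94
$$

Indirect strong correlation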


Example:
Find the relationship between Weight and SBP.

Wt. (kg)      67   69   85   83   74   81   97   92  114   85
SBP (mmHg)   120  125  140  160  130  180  150  140  200  130
Answer:
No.   Wt. in kg (x)   SBP in mmHg (y)   x²           y²            xy
1     67              120               4489         14400         8040
2     69              125               4761         15625         8625
3     85              140               7225         19600         11900
4     83              160               6889         25600         13280
5     74              130               5476         16900         9620
6     81              180               6561         32400         14580
7     97              150               9409         22500         14550
8     92              140               8464         19600         12880
9     114             200               12996        40000         22800
10    85              130               7225         16900         11050
      ∑x = 847        ∑y = 1475         ∑x² = 73495  ∑y² = 223525  ∑xy = 127325
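Answer: Calculating Correlation Coefficient

n = 10

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
= \frac{127325 - \dfrac{847 \times 1475}{10}}{\sqrt{\left(73495 - \dfrac{(847)^2}{10}\right)\left(223525 - \dfrac{(1475)^2}{10}\right)}}
$$

$$
r = \frac{2392.5}{\sqrt{1754.1 \times 5962.5}} = \frac{2392.5}{3234.0} = 0.7398
$$

Direct intermediate correlation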
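When the column totals are already known, r can be computed from them directly, which also exposes the three intermediate quantities of the hand calculation. The sketch below is an illustration under that assumption; the function name `pearson_from_totals` and the names `sp_xy`, `ss_x`, `ss_y` are not from the lecture.

```python
from math import sqrt


def pearson_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return the numerator, the two bracketed terms, and r from column totals."""
    sp_xy = sum_xy - sum_x * sum_y / n   # numerator
    ss_x = sum_x2 - sum_x ** 2 / n       # first bracket under the root
    ss_y = sum_y2 - sum_y ** 2 / n       # second bracket under the root
    return sp_xy, ss_x, ss_y, sp_xy / sqrt(ss_x * ss_y)


# Totals from the weight / SBP table (n = 10).
sp_xy, ss_x, ss_y, r = pearson_from_totals(10, 847, 1475, 73495, 223525, 127325)
print(f"numerator = {sp_xy:.1f}, brackets = {ss_x:.1f} and {ss_y:.1f}, r = {r:.4f}")
```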
Example
The scores of 9 students in statistics and physics are as follows.
Find the relationship between them by finding Pearson's correlation
coefficient (r).

Student   Statistics   Physics
1         35           65
2         55           70
3         45           40
4         50           70
5         25           45
6         45           50
7         29           52
8         52           46
9         45           58
Answer:
Student   Physics (x)   Statistics (y)   x²           y²           xy
1         65            35               4225         1225         2275
2         70            55               4900         3025         3850
3         40            45               1600         2025         1800
4         70            50               4900         2500         3500
5         45            25               2025         625          1125
6         50            45               2500         2025         2250
7         52            29               2704         841          1508
8         46            52               2116         2704         2392
9         58            45               3364         2025         2610
          ∑x = 496      ∑y = 381         ∑x² = 28334  ∑y² = 16995  ∑xy = 21310
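Answer: Calculating Correlation Coefficient

n = 9

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
= \frac{21310 - \dfrac{496 \times 381}{9}}{\sqrt{\left(28334 - \dfrac{(496)^2}{9}\right)\left(16995 - \dfrac{(381)^2}{9}\right)}}
$$

$$
r = \frac{312.67}{\sqrt{998.89 \times 866}} = \frac{312.67}{930} = 0.336
$$

Direct intermediate correlation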
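In this answer physics is taken as x and statistics as y, the reverse of the order in which the data were listed. The formula is symmetric in x and y, so the choice does not change r. The short sketch below illustrates this with the nine students' scores; the helper `pearson_r` is an illustrative name, not from the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r via the computational (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    numerator = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    return numerator / sqrt(ss_x * ss_y)


statistics_scores = [35, 55, 45, 50, 25, 45, 29, 52, 45]
physics_scores = [65, 70, 40, 70, 45, 50, 52, 46, 58]

r_xy = pearson_r(physics_scores, statistics_scores)   # physics as x
r_yx = pearson_r(statistics_scores, physics_scores)   # statistics as x
print(f"{r_xy:.3f} {r_yx:.3f}")   # both print as 0.336
```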
Spearman Rank Correlation Coefficient (rs)
It is a non-parametric measure of correlation.

This procedure makes use of the two sets of ranks that may be assigned
to the sample values of x and y.

The Spearman rank correlation coefficient can be computed in the
following cases:

➢Both variables are quantitative.

➢Both variables are qualitative ordinal.

➢One variable is quantitative and the other is qualitative ordinal.


Procedure:
1. Rank the values of X from 1 to n, where n is the number of pairs
of values of X and Y in the sample.

2. Rank the values of Y from 1 to n.

3. Compute the value of di for each pair of observations by
subtracting the rank of Yi from the rank of Xi.

4. Square each di and compute ∑di², the sum of the squared values.
5. Apply the following formula:

$$
r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
$$

The value of rs denotes the magnitude and nature of the association,
with the same interpretation as simple r.
Example:
Find the relationship between A and B by finding the Spearman
Rank Correlation Coefficient (rs).

A 67 69 84 83 74 81 97 92 114 85

B 120 125 145 160 130 180 150 140 200 135
Answer:
A (x)   B (y)   Rank x   Rank y   di = Rank x - Rank y   di²
67      120     10       10       0                      0
69      125     9        9        0                      0
84      145     5        5        0                      0
83      160     6        3        3                      9
74      130     8        8        0                      0
81      180     7        2        5                      25
97      150     2        4        -2                     4
92      140     3        6        -3                     9
114     200     1        1        0                      0
85      135     4        7        -3                     9
                                                         ∑di² = 56
$$
r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
$$

n = 10, then:

$$
r_s = 1 - \frac{6 \times 56}{10\,(10^2 - 1)} = 1 - \frac{336}{10 \times 99} = 1 - \frac{336}{990} = 1 - 0.34
$$

rs = 0.66

Direct intermediate correlation
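The ranking procedure and formula can be sketched in Python as follows. The function names are illustrative, not from the lecture. The sketch ranks the largest value as 1, as in the table above, and assumes no value is repeated within a variable.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; repeated values are not handled here."""
    if len(set(values)) != len(values):
        raise ValueError("repeated values need average ranks")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's rs from the sum of squared rank differences."""
    n = len(x)
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


a = [67, 69, 84, 83, 74, 81, 97, 92, 114, 85]
b = [120, 125, 145, 160, 130, 180, 150, 140, 200, 135]
print(ranks_descending(a))            # [10, 9, 5, 6, 8, 7, 2, 3, 1, 4]
print(f"rs = {spearman_rs(a, b):.2f}")   # rs = 0.66
```

Ranking from smallest to largest instead gives the same rs, provided both variables are ranked in the same direction.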


***NOTE:
When a value is repeated several times, every occurrence must be given
a rank, and because the occurrences are the same value they must all
have the same rank.

Therefore, when a value is repeated, take the average of the ranks that
the repeated occurrences would occupy, and use that average as the rank
each time the repeated value appears.

Example:

Suppose the number 30 appears 3 times and the next rank to be assigned
is 4. The occurrences of 30 would occupy ranks 4, 5 and 6, but the same
value cannot have different ranks. We therefore take the average of the
ranks (4, 5 and 6), which is 5, and give the value 30 the rank 5
wherever it appears.
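The averaging rule can be sketched as below. The function name `average_ranks` is illustrative, and the list `scores` is hypothetical data built to reproduce the case described above: three values rank ahead of 30, so the three occurrences of 30 would occupy ranks 4, 5 and 6.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share their mean rank."""
    order = sorted(values, reverse=True)
    mean_rank = {}
    for v in set(values):
        first = order.index(v) + 1   # first rank position the value occupies
        count = order.count(v)       # how many positions it occupies
        mean_rank[v] = first + (count - 1) / 2
    return [mean_rank[v] for v in values]


scores = [50, 40, 35, 30, 30, 30, 20]   # hypothetical data
print(average_ranks(scores))   # [1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 7.0]
```

The rank after the tie is 7, not 6, because ranks 4, 5 and 6 have all been used up by the repeated value.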
Example
The scores of 9 students in statistics and physics are as follows.
Find the relationship between them by finding the Spearman Rank
Correlation Coefficient (rs).

Student   Statistics   Physics
1         35           65
2         55           70
3         45           40
4         50           70
5         25           45
6         45           50
7         29           52
8         52           46
9         45           58
Answer:
Student   Statistics (x)   Physics (y)   Rank x   Rank y   di = Rank x - Rank y   di²
1         35               65            7        3        4                      16
2         55               70            1        1.5      -0.5                   0.25
3         45               40            5        9        -4                     16
4         50               70            3        1.5      1.5                    2.25
5         25               45            9        8        1                      1
6         45               50            5        6        -1                     1
7         29               52            8        5        3                      9
8         52               46            2        7        -5                     25
9         45               58            5        4        1                      1
                                                                                  ∑di² = 71.5
$$
r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
$$

n = 9, then:

$$
r_s = 1 - \frac{6 \times 71.5}{9\,(9^2 - 1)} = 1 - \frac{429}{9 \times 80} = 1 - \frac{429}{720} = 1 - 0.596
$$

rs = 0.404

Direct intermediate correlation


Example:
In a study of the relationship between level of education and income,
the following data were obtained. Find the relationship between them
and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60

Levels of education, from highest to lowest: University, Secondary,
Preparatory, Primary, Illiterate.
Answer:
Sample   (X)           (Y)   Rank X   Rank Y   di = Rank X - Rank Y   di²
A        Preparatory   25    5        3        2                      4
B        Primary       10    6        5.5      0.5                    0.25
C        University    8     1.5      7        -5.5                   30.25
D        Secondary     10    3.5      5.5      -2                     4
E        Secondary     15    3.5      4        -0.5                   0.25
F        Illiterate    50    7        2        5                      25
G        University    60    1.5      1        0.5                    0.25
                                                                      ∑di² = 64
With n = 7, so that n² - 1 = 48:

$$
r_s = 1 - \frac{6 \times 64}{7 \times 48} = 1 - \frac{384}{336} = -0.14
$$

Comment:
There is an indirect weak correlation between level of education and
income.
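For an ordinal variable such as level of education, the categories must first be put in order before they can be ranked. The sketch below is one way to do this in Python. The names `LEVEL_ORDER` and `average_ranks` are illustrative, and the ordering of levels (University highest, Illiterate lowest) is the one implied by the ranks in the answer table.

```python
# Assumed ordering of the levels, highest first.
LEVEL_ORDER = ["University", "Secondary", "Preparatory", "Primary", "Illiterate"]


def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share their mean rank."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]


education = ["Preparatory", "Primary", "University", "Secondary",
             "Secondary", "Illiterate", "University"]
income = [25, 10, 8, 10, 15, 50, 60]

# Encode each level as a number so that a higher level gets a larger score.
edu_score = [len(LEVEL_ORDER) - LEVEL_ORDER.index(level) for level in education]

rank_x = average_ranks(edu_score)
rank_y = average_ranks(income)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
n = len(income)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(rank_x)   # [5.0, 6.0, 1.5, 3.5, 3.5, 7.0, 1.5]
print(f"sum of d^2 = {sum_d2}, rs = {rs:.2f}")   # sum of d^2 = 64.0, rs = -0.14
```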
