Correlation and Linear Regression Analysis
Correlation
Correlation is a statistical technique used for
measuring the relationship of two or more
variables.
Ex:
relationship between family income and
multiple correlation.
Ex: When we study the relationship between the yield of rice per acre and both
the amount of rainfall and the amount of fertilisers used, it is a problem of
multiple correlation.
In partial correlation we recognise more than two variables, but consider only
two of them to be influencing each other, with the effect of the other variables
held constant.
Scatter plot
Karl Pearson’s Coefficient of Linear correlation
Rank correlation Coefficient
Method of least squares
Scatter Diagram
A scatter diagram is a graphical method to
display the relationship between two
variables.
[Figure: Scatter Plot Examples, y plotted against x. Panels: strong relationships, weak relationships, no relationship]
Example
A researcher believes that there is a linear
relationship between the BMI (kg/m²) of
pregnant mothers and the birth-weight (BW,
in kg) of their newborns.
BMI (kg/m²)   BW (kg)
20            2.7
30            2.9
50            3.4
45            3.0
10            2.2
30            3.1
40            3.3
25            2.3
50            3.5
20            2.5
10            1.5
55            3.8
60            3.7
50            3.1
35            2.8
[Figure: Scatter diagram of BMI and Birthweight; horizontal axis 0 to 70, vertical axis 0 to 4]
Is there a linear relationship
between BMI and BW?
Scatter diagrams are important for initial
exploration of the relationship between two
quantitative variables
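As a rough illustration of this exploratory step, the sketch below places the BMI and birth-weight pairs from the example on a coarse character grid using only the Python standard library. The function name `text_scatter` and the grid size are illustrative assumptions, not part of the original material; in practice a plotting library would normally be used.

```python
# BMI (kg/m2) and birth-weight (kg) pairs from the example.
bmi = [20, 30, 50, 45, 10, 30, 40, 25, 50, 20, 10, 55, 60, 50, 35]
bw = [2.7, 2.9, 3.4, 3.0, 2.2, 3.1, 3.3, 2.3, 3.5, 2.5, 1.5, 3.8, 3.7, 3.1, 2.8]


def text_scatter(xs, ys, width=30, height=10):
    """Return a coarse text scatter plot as a list of strings.

    Hypothetical helper: the top row holds the largest y values and the
    leftmost column the smallest x values. Assumes xs and ys each
    contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column / row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(cells) for cells in grid]


for line in text_scatter(bmi, bw):
    print("|" + line)
```

The points drift from the lower left to the upper right of the grid, which is the visual pattern a positive linear relationship would produce.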
Merits:
It is a simple and non-mathematical method.
It is not influenced by extreme values.
Scatter plot is the first step to study the relationship between
variables.
Limitation:
From this method we can get the idea about the direction of the
correlation and also whether it is high or low. But we cannot
establish the exact degree of correlation.
Karl Pearson’s Coefficient of Linear
correlation
The population correlation coefficient ρ (rho)
measures the strength of the association
between the variables
The sample correlation coefficient r is an
estimate of ρ and is used to measure the
strength of the linear relationship in the
sample observations
Calculation of the
Karl Pearson’s Coefficient of Linear correlation
Sample correlation coefficient:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$
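The sample correlation coefficient, the sum of products of deviations from the means divided by the square root of the product of the two sums of squared deviations, can be sketched in a few lines of Python. The function name `pearson_r` and its input checks are assumptions made for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient from deviations about the means.

    Illustrative sketch: xs and ys are equal-length sequences of numbers
    with at least two observations and non-zero spread in each.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return s_xy / sqrt(s_xx * s_yy)


# Points lying exactly on an increasing / decreasing straight line.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

The two calls illustrate the extreme cases: a perfect increasing line gives r = +1 and a perfect decreasing line gives r = -1, up to floating-point rounding.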
[Figure: Example scatter plots of y against x for r = -1, r = -0.6, r = 0, r = +0.3 and r = +1]
Calculate correlation coefficient between tree
height and trunk diameter
Tree Height   Trunk Diameter
35            8
49            9
27            7
33            6
60            13
21            7
45            11
51            12
Calculation Example
Tree Height (y)   Trunk Diameter (x)   xy       y²        x²
35                8                    280      1225      64
49                9                    441      2401      81
27                7                    189      729       49
33                6                    198      1089      36
60                13                   780      3600      169
21                7                    147      441       49
45                11                   495      2025      121
51                12                   612      2601      144
Σ = 321           Σ = 73               Σ = 3142 Σ = 14111 Σ = 713
Calculation Example
(continued)
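$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$
$$= \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886$$
r = 0.886 → relatively strong positive
linear association between x and y
[Figure: Scatter plot of Tree Height, y (0 to 70) against Trunk Diameter, x (0 to 14)]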
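The arithmetic of this calculation example can be checked with a short script. This is a sketch using the computational form of r (built from n, Σx, Σy, Σxy, Σx² and Σy²) on the tree data; the variable names are chosen here for illustration.

```python
from math import sqrt

# Tree data from the calculation example: x = trunk diameter, y = tree height.
diameter = [8, 9, 7, 6, 13, 7, 11, 12]
height = [35, 49, 27, 33, 60, 21, 45, 51]

n = len(diameter)
sum_x = sum(diameter)                                   # 73
sum_y = sum(height)                                     # 321
sum_xy = sum(x * y for x, y in zip(diameter, height))   # 3142
sum_x2 = sum(x * x for x in diameter)                   # 713
sum_y2 = sum(y * y for y in height)                     # 14111

# Computational formula: no deviations from the mean are needed.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 3))  # 0.886
```

The column totals reproduce the sums in the table, and the rounded result agrees with r = 0.886.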
Example
Find correlation coefficient
X: 2 6 7
Y: 13 20 27
Ans: 0.945
Example
Find the correlation coefficient
X: 2 6 7
Y: 27 20 13
Ans: -0.945
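The two small examples differ only in the order of the Y values, and reversing that order changes the sign of r but not its size. The following sketch checks this numerically; the helper `corr` is a hypothetical name introduced here, not something from the original slides.

```python
from math import sqrt


def corr(xs, ys):
    """Correlation coefficient via the computational formula (illustrative helper)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


x = [2, 6, 7]
y_increasing = [13, 20, 27]
y_decreasing = [27, 20, 13]  # same values in reverse order

r_up = corr(x, y_increasing)
r_down = corr(x, y_decreasing)

print(round(r_up, 3))    # 0.945
print(round(r_down, 3))  # -0.945
```

Because the Y values are symmetric about their mean of 20, reversing them negates every deviation of Y, so the numerator changes sign while the denominator is unchanged.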
Calculate the Karl Pearson’s Coefficient of
correlation from the following data
Values of X Values of Y
12 14
9 8
8 6
10 9
11 11
13 12
7 3
r=0.949
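A worked solution for this exercise, written out as a sketch using deviations from the means (the intermediate sums are computed here and are not given in the original):

```latex
\bar{x} = \frac{70}{7} = 10, \qquad \bar{y} = \frac{63}{7} = 9

\sum (x - \bar{x})(y - \bar{y}) = 46, \qquad
\sum (x - \bar{x})^2 = 28, \qquad
\sum (y - \bar{y})^2 = 84

r = \frac{46}{\sqrt{28 \times 84}} = \frac{46}{\sqrt{2352}} \approx 0.949
```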
Q. The covariance (Cov) between the length and
weight of five items is 6 and their standard
deviations are 2.45 and 2.61 respectively. Find the
coefficient of correlation between length and weight.
Ans: 0.94
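This question relies on the identity r = Cov(X, Y) / (σx · σy), which is the same coefficient expressed through the covariance and the two standard deviations. A minimal sketch of the arithmetic, with variable names chosen for illustration:

```python
# Given values from the question.
cov_xy = 6.0   # covariance between length and weight
sd_x = 2.45    # standard deviation of length
sd_y = 2.61    # standard deviation of weight

# r = Cov(X, Y) / (sd_x * sd_y)
r = cov_xy / (sd_x * sd_y)

print(round(r, 2))  # 0.94
```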
Q. The Karl Pearson’s coefficient of correlation
and covariance between two variables X and Y are
-0.85 and -15 respectively. If variance of Y is 9,
find the standard deviation of X.
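Rearranging r = Cov(X, Y) / (σx · σy) gives σx = Cov(X, Y) / (r · σy). Standard deviations are never negative, so r and the covariance always carry the same sign, and the sketch below therefore works with their magnitudes. The variable names are illustrative.

```python
from math import sqrt

# Magnitudes of the given quantities.
r_abs = 0.85     # |r|
cov_abs = 15.0   # |Cov(X, Y)|
var_y = 9.0      # variance of Y

sd_y = sqrt(var_y)                # 3.0
sd_x = cov_abs / (r_abs * sd_y)   # rearranged from r = Cov / (sd_x * sd_y)

print(round(sd_x, 2))  # 5.88
```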