
Correlation and Linear Regression Analysis
Correlation
Correlation is a statistical technique for measuring the relationship between two or more variables.
Ex:
• The relationship between family income and expenditure on luxury items.
• The relationship between the price of a commodity and the demand for it.
Types of Correlation
Correlation can be classified in the following
ways:
i) Positive and negative
ii) Simple, partial and multiple
iii) Linear and non-linear

Positive and Negative correlation
If the two variables move in the same direction (one increases as the other increases, or one decreases as the other decreases), the correlation is said to be positive.

If the two variables move in opposite directions (one increases as the other decreases), the correlation is said to be negative.
Example:

Positive correlation      Positive correlation
 X     Y                    X     Y
10    15                   80    50
12    20                   70    45
16    22                   60    30
18    25                   40    20
20    37                   30    10

Negative correlation      Negative correlation
 X     Y                    X     Y
20    40                  100    10
30    30                   90    20
40    22                   60    30
60    15                   40    40
80    16                   30    50
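As a quick check, the direction of each example table can be confirmed numerically. The sketch below uses Pearson's coefficient (introduced later in these slides); the helper name `pearson_r` and the table labels are chosen here for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# The four example tables from the slide (labels are illustrative)
tables = {
    "positive A": ([10, 12, 16, 18, 20], [15, 20, 22, 25, 37]),
    "positive B": ([80, 70, 60, 40, 30], [50, 45, 30, 20, 10]),
    "negative A": ([20, 30, 40, 60, 80], [40, 30, 22, 15, 16]),
    "negative B": ([100, 90, 60, 40, 30], [10, 20, 30, 40, 50]),
}

# True means the coefficient is positive, False means it is negative
signs = {name: pearson_r(xs, ys) > 0 for name, (xs, ys) in tables.items()}

for name, is_positive in signs.items():
    print(name, "->", "positive" if is_positive else "negative")
```

Note that in the second positive table both columns decrease together, which still counts as positive correlation.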
Simple, Partial and Multiple
• When only two variables are studied, it is a problem of simple correlation.
  Ex: Studying the relationship between demand and supply is a simple correlation problem.
• When more than two variables are studied simultaneously, it is a problem of multiple correlation.
  Ex: Studying the relationship between the yield of rice per acre and both the amount of rainfall and the amount of fertiliser used is a multiple correlation problem.
• In partial correlation we recognise more than two variables but consider only two of them to be influencing each other, with the effect of the other influencing variables held constant.
  Ex: In the rice problem, if we limit the correlation analysis of yield and rainfall to periods when a certain average daily temperature existed, it becomes a partial correlation problem.
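For three variables, "holding the third variable constant" is commonly expressed through the first-order partial correlation coefficient. The formula below is a standard result that the slides do not state; the subscripts 1, 2 and 3 (for example yield, rainfall and temperature) are notation introduced here, and each r on the right-hand side is an ordinary simple correlation coefficient.

```latex
r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{\left(1 - r_{13}^{2}\right)\left(1 - r_{23}^{2}\right)}}
```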
Linear and non-linear
• If one variable changes at a constant rate with respect to the other, the correlation is said to be linear. When plotted, the points lie along a straight line.
• If the rate of change is not constant, the correlation is said to be non-linear or curvilinear. When plotted, the points lie along a curve.
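The distinction matters in practice because a coefficient of linear correlation can miss a curvilinear relationship entirely. The sketch below uses made-up data (y = x² on a symmetric range, an assumption chosen purely for illustration): y is fully determined by x, yet the linear correlation coefficient is zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative (not from the slides): a perfect curvilinear relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print("linear correlation of y = x^2 on a symmetric range:", r)
```

This is why a plot of the data is worth inspecting before relying on a single coefficient.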
Methods of studying correlation

• Scatter plot
• Karl Pearson’s coefficient of linear correlation
• Rank correlation coefficient
• Method of least squares
Scatter Diagram
• A scatter diagram is a graphical method of displaying the relationship between two variables.
• It plots pairs of bivariate observations (x, y) on the X-Y plane.
• Y is called the dependent variable.
• X is called the independent variable.

Scattergrams
[Figure: three scatter plots of Y against X showing positive correlation, negative correlation and no correlation]
Scatter Plot Examples
[Figure: scatter plots of y against x illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships and no relationship]
Example
• A researcher believes that there is a linear relationship between the BMI (kg/m²) of pregnant mothers and the birth-weight (BW, in kg) of their newborns.
• The following data set provides information on 15 pregnant mothers who were contacted for this study.
BMI (kg/m²)   Birth-weight (kg)
20            2.7
30            2.9
50            3.4
45            3.0
10            2.2
30            3.1
40            3.3
25            2.3
50            3.5
20            2.5
10            1.5
55            3.8
60            3.7
50            3.1
35            2.8
Scatter diagram of BMI and Birthweight
[Figure: scatter plot of the 15 observations, with BMI on the horizontal axis (0 to 70) and birth-weight on the vertical axis (0 to 4)]
Is there a linear relationship between BMI and BW?
• Scatter diagrams are important for the initial exploration of the relationship between two quantitative variables.
• In the above example, we may wish to summarize this relationship by a straight line drawn through the scatter of points.
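One common way to draw such a line is the method of least squares. The sketch below assumes that is the line intended; the slope and intercept formulas are the standard least-squares ones, and the function name `least_squares_line` is introduced here for illustration.

```python
# BMI (kg/m^2) and birth-weight (kg) for the 15 mothers in the example
bmi = [20, 30, 50, 45, 10, 30, 40, 25, 50, 20, 10, 55, 60, 50, 35]
bw = [2.7, 2.9, 3.4, 3.0, 2.2, 3.1, 3.3, 2.3, 3.5, 2.5, 1.5, 3.8, 3.7, 3.1, 2.8]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(bmi, bw)
print(f"BW = {intercept:.3f} + {slope:.4f} * BMI")
```

The fitted line always passes through the point of means (mean BMI, mean BW), which is a quick sanity check on the arithmetic.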
Merits and Limitations of scatter plot

Merits:
It is a simple, non-mathematical method.
It is not influenced by extreme values.
It is the first step in studying the relationship between variables.
Limitation:
It gives an idea of the direction of the correlation and of whether it is high or low, but it cannot establish the exact degree of correlation.
Karl Pearson’s Coefficient of Linear Correlation
• The population correlation coefficient ρ (rho) measures the strength of the linear association between the variables.
• The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.
Calculation of Karl Pearson’s Coefficient of Linear Correlation
Sample correlation coefficient:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$

or the algebraic equivalent:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
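The two forms of the formula can be checked against each other in a few lines. This is a minimal sketch: the function names are introduced here, and the data are the tree height and trunk diameter values used in the worked example later in these slides.

```python
from math import sqrt

def r_deviation_form(xs, ys):
    """r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs) *
               sum((y - mean_y) ** 2 for y in ys))
    return num / den

def r_algebraic_form(xs, ys):
    """r computed from the raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y
    den = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return num / den

diameter = [8, 9, 7, 6, 13, 7, 11, 12]     # x
height = [35, 49, 27, 33, 60, 21, 45, 51]  # y

r_dev = r_deviation_form(diameter, height)
r_alg = r_algebraic_form(diameter, height)
print(round(r_dev, 3), round(r_alg, 3))
```

The algebraic form is convenient for hand calculation from a table of column totals; the deviation form is usually the better choice in floating-point code because it avoids subtracting two large, nearly equal numbers.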
Properties of ρ and r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
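The "unit free" property can be illustrated by changing the units or origin of the data and recomputing r. The sketch below uses the seven X, Y pairs from the exercise later in these slides; the scale factor 100 and the shift of 5 are arbitrary values chosen here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [12, 9, 8, 10, 11, 13, 7]
ys = [14, 8, 6, 9, 11, 12, 3]

r_original = pearson_r(xs, ys)

# Change of scale for X (e.g. metres to centimetres) and change of origin for Y
xs_rescaled = [100 * x for x in xs]
ys_shifted = [y + 5 for y in ys]
r_transformed = pearson_r(xs_rescaled, ys_shifted)

print(round(r_original, 3), round(r_transformed, 3))
```

Both calls give the same coefficient, and the value stays within the range -1 to 1.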
Examples of Approximate r Values
[Figure: five scatter plots of y against x with r = -1, r = -0.6, r = 0, r = +0.3 and r = +1]
Calculate the correlation coefficient between tree height and trunk diameter.

Tree Height   Trunk Diameter
35            8
49            9
27            7
33            6
60            13
21            7
45            11
51            12
Calculation Example
Tree Height (y)   Trunk Diameter (x)   xy     y²      x²
35                8                    280    1225    64
49                9                    441    2401    81
27                7                    189    729     49
33                6                    198    1089    36
60                13                   780    3600    169
21                7                    147    441     49
45                11                   495    2025    121
51                12                   612    2601    144
Σy = 321          Σx = 73              Σxy = 3142   Σy² = 14111   Σx² = 713
Calculation Example (continued)

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

$$ = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886 $$

[Figure: scatter plot of tree height, y (0 to 70) against trunk diameter, x (0 to 14)]

r = 0.886 → relatively strong positive linear association between x and y
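The substitution can be reproduced directly from the column totals of the calculation table. This is a small sketch; the variable names are chosen here for readability.

```python
from math import sqrt

# Column totals from the calculation table (n = 8 trees)
n = 8
sum_x, sum_y = 73, 321   # trunk diameter, tree height
sum_xy = 3142
sum_x2, sum_y2 = 713, 14111

numerator = n * sum_xy - sum_x * sum_y   # 25136 - 23433 = 1703
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator
print(round(r, 3))  # 0.886
```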
Example
Find the correlation coefficient.
X: 2 6 7
Y: 13 20 27

Ans: 0.945
Example
Find the correlation coefficient.
X: 2 6 7
Y: 27 20 13

Ans: -0.945
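The two examples above use the same Y values in opposite order, so the coefficients should have the same magnitude and opposite signs. A minimal sketch to confirm this (the helper name `pearson_r` is introduced here):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [2, 6, 7]
r_increasing = pearson_r(xs, [13, 20, 27])  # Y rises with X
r_decreasing = pearson_r(xs, [27, 20, 13])  # Y falls as X rises

print(r_increasing, r_decreasing)
```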
Calculate Karl Pearson’s coefficient of correlation from the following data.

Values of X   Values of Y
12            14
9             8
8             6
10            9
11            11
13            12
7             3

Ans: r = 0.949
Q. The covariance (Cov) between the length and weight of five items is 6, and their standard deviations are 2.45 and 2.61 respectively. Find the coefficient of correlation between length and weight.

Ans: 0.94
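This answer follows from the relation r = Cov(X, Y) / (σx · σy), which is the deviation form of Pearson's formula with numerator and denominator each divided by n. A short sketch of the arithmetic, with variable names chosen here:

```python
# Given values from the question
cov_xy = 6.0
sd_length = 2.45
sd_weight = 2.61

# r = Cov(X, Y) / (sd_x * sd_y)
r = cov_xy / (sd_length * sd_weight)
print(round(r, 2))  # 0.94
```

The same relation can be rearranged to find a missing standard deviation when r and the covariance are known: σx = Cov(X, Y) / (r · σy).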
Q. Karl Pearson’s coefficient of correlation and the covariance between two variables X and Y are -0.85 and -15 respectively. If the variance of Y is 9, find the standard deviation of X.

Ans: Standard deviation of X = 5.88


Merits of Pearson’s Coefficient of Correlation

1. It is the most widely used algebraic method of measuring correlation.
2. It gives a numerical value to express the relationship between variables.
3. It gives both the direction and the degree of the relationship between variables.
4. It can be used for further algebraic treatment, such as the coefficient of determination.
5. It gives a single figure that expresses the degree of correlation between two variables.
Demerits of Pearson’s Coefficient of Correlation

1. It is comparatively difficult to compute by hand.
2. It is comparatively difficult to understand.
3. It requires complicated mathematical calculations.
4. It is time-consuming.
5. It is unduly affected by extreme values.
6. It assumes a linear relationship between the variables, which may not hold in real-life situations.
