
Chapter 6

Linear Regression and Correlation


Prepared by
Dr. Mohammad Bayezid Ali
Professor
Department of Finance
Jagannath University, Dhaka

McGraw-Hill/Irwin ©The McGraw-Hill Companies, Inc. 2008


GOALS

• Understand and interpret the terms dependent variable and independent variable.
• Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.
• Calculate the least squares regression equation and regression line.

Definition: Dependent and Independent Variable

• Dependent variables are variables whose values are expected to depend on the value of some other variable. For example, profit depends on the amount of revenue; agricultural production depends on the amount of rainfall or the use of fertilizer; student satisfaction in the classroom depends on the academic qualifications of the faculty or the logistic support in the classroom. In statistics, a dependent variable is also referred to as the predictand, regressand, explained variable, effect variable or target variable.
• Independent variables are variables whose values are not expected to depend on other variables. In statistics, independent variables are also known as explanatory variables, predictors, regressors or control variables.

Correlation Analysis

• Correlation analysis is the study of the relationship between variables. It is also defined as a group of techniques to measure the association between two variables. The measure of correlation is called the correlation coefficient and is denoted by the symbol 'r'.
• A scatter diagram is a chart that portrays the relationship between the two variables. It is the usual first step in correlation analysis.

Scatter Plot (or scatter diagram)

A scatter plot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

Positive Linear Correlation

[Figure: three scatter plots of y against x showing (a) positive, (b) strong positive and (c) perfect positive correlation]

Negative Linear Correlation

[Figure: three scatter plots of y against x showing (d) negative, (e) strong negative and (f) perfect negative correlation]

No Linear Correlation
[Figure: two scatter plots of y against x showing (g) no correlation and (h) nonlinear correlation]

Example: Correlation

There is a general intuition that an intelligent salesperson will be able to make more sales. The table below shows a random selection of 10 salespersons, their intelligence test scores and their sales performance in a particular month.

Salesperson   Intelligence Test Score   Sales ('000 Tk.)
1             45                        2.0
2             75                        6.5
3             50                        3.5
4             60                        5.0
5             80                        4.5
6             90                        6.0
7             85                        6.5
8             40                        2.5
9             80                        5.5
10            55                        4.5

Scatter Diagram

[Figure: scatter diagram of sales ('000 Tk.) against intelligence test score for the 10 salespersons]

Pearson Correlation Coefficient

The coefficient of correlation (r) is a measure of the direction as well as the strength of the relationship between two variables. It measures a "linear" relationship only.

Direction of the relationship between X and Y
• Positive (+r): as X goes up, Y goes up
• Negative (-r): as X goes up, Y goes down

Strength of the relationship between X and Y
• Closer to ±1.0: stronger
• Closer to 0: weaker
• When r = 0, the X, Y relationship cannot be described by a straight line

Different Levels of the Strength of the Relationship

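What does r represent?

$$ r = \frac{\text{degree to which } X \text{ and } Y \text{ vary together}}{\text{degree to which } X \text{ and } Y \text{ vary separately}} = \frac{\text{covariance of } X \text{ and } Y}{\sqrt{(\text{variance of } X)(\text{variance of } Y)}} $$

$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} $$
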
Solving for correlation coefficient
Salesperson   Test Score (X)   Sales, '000 Tk. (Y)   X²      Y²       XY
1             45               2.0                   2025    4.00     90.0
2             75               6.5                   5625    42.25    487.5
3             50               3.5                   2500    12.25    175.0
4             60               5.0                   3600    25.00    300.0
5             80               4.5                   6400    20.25    360.0
6             90               6.0                   8100    36.00    540.0
7             85               6.5                   7225    42.25    552.5
8             40               2.5                   1600    6.25     100.0
9             80               5.5                   6400    30.25    440.0
10            55               4.5                   3025    20.25    247.5

N = 10, ΣX = 660, ΣY = 46.5, ΣX² = 46500, ΣY² = 238.75, ΣXY = 3292.5

Substituting the unrounded sums into the formula for r:

$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{3292.5 - \frac{660 \times 46.5}{10}}{\sqrt{\left(46500 - \frac{660^2}{10}\right)\left(238.75 - \frac{46.5^2}{10}\right)}} = \frac{223.5}{\sqrt{2940 \times 22.525}} \approx 0.869 $$

Comment: A correlation coefficient of 0.869 implies that there exists a strong, positive statistical association between the salespersons' intelligence test scores and their sales performance.
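
The calculation above can be reproduced in a few lines of code. The following is a minimal Python sketch, not part of the original slides; the function name `pearson_r` and the variable names are illustrative choices, and the data are the ten (X, Y) pairs from the example table.

```python
from math import sqrt

# Example data: intelligence test scores (X) and sales in '000 Tk. (Y).
x = [45, 75, 50, 60, 80, 90, 85, 40, 80, 55]
y = [2.0, 6.5, 3.5, 5.0, 4.5, 6.0, 6.5, 2.5, 5.5, 4.5]


def pearson_r(xs, ys):
    """Pearson correlation coefficient using the sum-based formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


r = pearson_r(x, y)
print(f"r = {r:.3f}")  # prints r = 0.869
```

Working from the unrounded sums (ΣY² = 238.75, ΣXY = 3292.5) avoids the small drift that rounded table entries would introduce.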
Coefficient of Determination

• The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained, or accounted for, by the variation in the independent variable (X). It is the square of the coefficient of correlation.
• It ranges from 0 to 1.
• It does not give any information on the direction of the relationship between the variables.
• In our example, $r^2 = (0.8685)^2 \approx 0.754$ (using r before it is rounded to 0.869). That means 75.4% of the variation in sales performance can be explained by the variation in intelligence test scores.
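
The "proportion of variation explained" reading of r² can be checked numerically. The following is a minimal Python sketch, not part of the original slides: it fits the least squares line y = a + bx to the example data and compares the unexplained variation around that line with the total variation in Y. All variable names are illustrative.

```python
# Example data: intelligence test scores (X) and sales in '000 Tk. (Y).
x = [45, 75, 50, 60, 80, 90, 85, 40, 80, 55]
y = [2.0, 6.5, 3.5, 5.0, 4.5, 6.0, 6.5, 2.5, 5.5, 4.5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept in deviation form.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Total variation in Y, and the part left unexplained by the fitted line.
total_variation = sum((yi - mean_y) ** 2 for yi in y)
unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - unexplained / total_variation
print(f"r^2 = {r_squared:.3f}")  # prints r^2 = 0.754
```

For a simple linear regression with one independent variable, this ratio equals the square of the Pearson correlation coefficient.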

Linear Regression Model

Regression analysis is a technique used to develop an equation that expresses the linear (straight-line) relationship between two variables and to provide estimates. The objective of this analysis is to estimate a regression equation that predicts the change in the dependent variable resulting from a change in the independent variable.

The least squares principle is used to obtain a and b in the regression equation y = a + bx. The equations to determine a and b are:

$$ b = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} \qquad \text{and} \qquad a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} $$

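Computing the Slope of the Line

Since y = a + bx, the slope is

$$ b = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{3292.5 - \frac{660 \times 46.5}{10}}{46500 - \frac{660^2}{10}} = \frac{223.5}{2940} \approx 0.076 $$

where ΣXY = 3292.5 is the unrounded sum of the products (3293 when rounded).
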
Computing the Y-Intercept

$$ a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \frac{46.5}{10} - \left(0.076 \times \frac{660}{10}\right) = 4.65 - 5.016 = -0.366 $$

Hence, the regression equation of sales on test score is y = -0.366 + 0.076x. This equation implies that if the intelligence test score increases by 1 mark, sales performance is expected to increase by 0.076 thousand Tk., i.e. Tk. 76.
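
The slope and intercept formulas can be sketched in code as follows. This is a minimal Python illustration, not part of the original slides; the function name `least_squares` is a hypothetical helper, and the data are the ten (X, Y) pairs from the example.

```python
# Example data: intelligence test scores (X) and sales in '000 Tk. (Y).
x = [45, 75, 50, 60, 80, 90, 85, 40, 80, 55]
y = [2.0, 6.5, 3.5, 5.0, 4.5, 6.0, 6.5, 2.5, 5.5, 4.5]


def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_xy = sum(p * q for p, q in zip(xs, ys))
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")  # prints a = -0.367, b = 0.076
```

The intercept prints as -0.367 rather than -0.366 because the code carries the unrounded slope (about 0.07602) into the intercept formula, whereas the hand calculation rounds b to 0.076 first.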

Predicting Values with Regression Equation

If a particular salesperson obtains an intelligence test score of 94, what is the expected sales performance?

Here the regression equation of sales on test score is y = -0.366 + 0.076x.

When the test score is 94, the expected sales amount is -0.366 + 0.076 × 94 = 6.778, i.e. about Tk. 6.78 thousand.

Difference Between Correlation and
Regression

The Standard Error of Estimate

• The standard error of estimate measures the scatter, or dispersion, of the observed values around the line of regression.
• The formula used to compute the standard error is:

$$ s_{y \cdot x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}} $$

• In our example, the standard error of estimate is about 0.832 (in thousand Tk.) when computed from the unrounded sums and coefficients. Substituting the rounded values ΣY² = 239, ΣXY = 3293, a = -0.366 and b = 0.076 gives 0.847865. The lower the value of the standard error, the greater the statistical reliability of the regression estimates.
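
The standard error formula can be evaluated directly from the data. The following is a minimal Python sketch, not part of the original slides; `standard_error_of_estimate` is a hypothetical helper name, and the coefficients a and b are recomputed inside it without rounding.

```python
from math import sqrt

# Example data: intelligence test scores (X) and sales in '000 Tk. (Y).
x = [45, 75, 50, 60, 80, 90, 85, 40, 80, 55]
y = [2.0, 6.5, 3.5, 5.0, 4.5, 6.0, 6.5, 2.5, 5.5, 4.5]


def standard_error_of_estimate(xs, ys):
    """s_yx = sqrt((sum Y^2 - a*sum Y - b*sum XY) / (n - 2))."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(p * q for p, q in zip(xs, ys))
    # Least squares coefficients, kept at full precision.
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


s_yx = standard_error_of_estimate(x, y)
print(f"s_yx = {s_yx:.3f}")  # prints s_yx = 0.832
```

This shortcut formula subtracts quantities of similar size, so it is sensitive to rounding in a, b and the sums; carrying full precision through the calculation is the safer choice.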

Charles Spearman’s Coefficient of Correlation
(Rank Correlation)


Two managers are asked to rank a group of employees in order of potential for eventually becoming top managers. The rankings are as follows:

Employee   Ranking by Manager I   Ranking by Manager II
A          10                     9
B          2                      4
C          1                      2
D          4                      3
E          3                      1
F          6                      5
G          5                      6
H          8                      8
I          7                      7
J          9                      10

Compute the coefficient of rank correlation and comment on the value.

Solution: Rank Correlation

Thus we find that there is a high degree of positive correlation in the ranks assigned by the two managers.
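
The worked solution did not survive in this copy of the slides, so the following is a minimal Python sketch under a stated assumption: it uses the standard Spearman formula for untied ranks, r_s = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks given to each employee. The function name `spearman_rank_correlation` is illustrative.

```python
# Ranks given to employees A..J by each manager (from the table above).
manager_1 = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]
manager_2 = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]


def spearman_rank_correlation(ranks_1, ranks_2):
    """Spearman's rank correlation for rankings without ties."""
    n = len(ranks_1)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


r_s = spearman_rank_correlation(manager_1, manager_2)
print(f"r_s = {r_s:.3f}")  # prints r_s = 0.915
```

Here Σd² = 14 and n = 10, so r_s = 1 - 84/990, about 0.915, which is consistent with the conclusion of a high degree of positive correlation. The shortcut formula assumes no tied ranks, which holds for this data.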

End of Chapter 6
