
THE INSTITUTE OF FINANCE MANAGEMENT (IFM)

Department of Computer Science and Mathematics

BUSINESS STATISTICS I
MTU 07203

Simple Linear Regression and Correlation


Topic Content

i. Define the terms regression and correlation
ii. Identify and explain the independent and dependent variables
iii. Present data in a scatter diagram
iv. Solve the linear regression equation
v. Interpret the linear regression equation
vi. Use the linear regression equation to forecast
vii. Compute and interpret the coefficient of correlation, the coefficient of determination and the rank correlation
Regression
Definition

Regression is the determination of a statistical relationship between two or more variables.

Regression is a technique for determining the statistical relationship between two or more variables, where a change in a dependent variable is associated with, and depends on, a change in one or more independent variables.
Independent and dependent variables
A regression must contain independent and dependent variables.
Independent variable
An independent variable is a variable that is manipulated to determine the value of a dependent variable. It is not influenced or controlled by the other variables.
Dependent variable
A dependent variable is a factor or phenomenon that is changed by the effect of an associated factor or phenomenon, called the independent variable.
Scatter diagram

A scatter diagram is a suitable way to represent the relationship pictorially.
The data are presented on a graph so that the relationship can be seen at a glance.
Each pair of numbers provides one point on the diagram.
Example: The following data relate rainfall to the subsequent crop yield over five years.

                     Year 1   Year 2   Year 3   Year 4   Year 5
Rainfall in inches      4        2        5        7        8
Yield in tons          50       25       40       70       85
Example cont …
[Figure: scatter diagram of yield in tons (vertical axis) against rainfall in inches (horizontal axis)]
Types of Linear Regression
Linear regression analysis is classified into two types.
Simple linear regression
Is the relationship between one dependent variable and only one independent (explanatory) variable.
Multiple linear regression
Is the relationship between one dependent variable and more than one independent (explanatory) variable.
Simple linear regression
Linear regression is the process of determining, by statistical examination, the line of best fit.
A simple linear regression is a line connecting a dependent variable (y) and only one independent variable (x) that may be expressed as
$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i, \quad i = 1, \ldots, n$$
where $\beta_1$ and $\beta_2$ are constant parameters,
$\beta_1$ is the y-intercept of the line,
$\beta_2$ is the gradient or slope of the line,
$\varepsilon_i$ is called the error term.
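Simple Linear Regression Model…
Meaning of $\beta_1$ and $\beta_2$
[Figure: the line $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$ drawn on x and y axes, with the y-intercept and the rise and run marked]
$\beta_1$ = y-intercept
$\beta_2$ = slope (= rise/run)
$\beta_2 > 0$ is a positive slope
$\beta_2 < 0$ is a negative slope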
Simple Linear Regression cont …
We cannot compute the parameters $\beta_1$ and $\beta_2$ directly from the equation $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$, because they describe the whole population.
By taking a paired sample of size n from the same population we can estimate their values.
The estimated parameters give what we call the line of best fit (the sample regression line):
$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i, \quad i = 1, \ldots, n$$
Consider the Scatter diagram
[Figure: scatter diagram of Y against X]
Which line has the best “fit” to the data?
[Figure: the same scatter diagram with candidate lines drawn through the points, each marked “?”]
Estimation of least squares
Using the method of least squares we get $\hat{\beta}_1$ and $\hat{\beta}_2$ as the estimates of $\beta_1$ and $\beta_2$.
Least Squares Graphically
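LS minimizes, for the four points in the figure,
$$\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$$
[Figure: four data points and the fitted line $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$, with the vertical distances $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_4$ from each point to the line; for example $Y_2 = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\varepsilon}_2$]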
Least Squares
1. ‘Best fit’ means the differences between the actual Y values and the predicted Y values are a minimum. But positive differences offset negative ones, so we square the errors:
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$$
2. LS minimizes the sum of the squared differences (errors) (SSE).
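The least squares criterion can be checked numerically. The sketch below uses a small made-up data set (not from these slides) and three hypothetical candidate lines; the function name `sse` is also only an illustrative choice. It computes the sum of squared errors for each candidate, and the line with the smallest SSE is the best fit among them.

```python
def sse(xs, ys, intercept, slope):
    """Sum of squared differences between actual and predicted y values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (an assumption, not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical candidate lines, each given as (intercept, slope).
candidates = {
    "A": (0.0, 2.0),
    "B": (1.0, 1.5),
    "C": (0.05, 1.99),  # the least squares line for this data
}

results = {name: sse(xs, ys, b1, b2) for name, (b1, b2) in candidates.items()}
best = min(results, key=results.get)

for name, value in results.items():
    print(f"Line {name}: SSE = {value:.3f}")
print("Smallest SSE:", best)
```

Line A looks almost as good as line C by eye, which is why a numerical criterion is needed to separate them.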

EPI 809/Spring 2008 16


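The method of least squares
We define the residual $e_i$ as the difference between the observed value and the fitted value:
$$e_i = y_i - \hat{y}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i, \quad i = 1, \ldots, n$$
The method of least squares chooses the values of $\hat{\beta}_1$ and $\hat{\beta}_2$ that minimize the sum of squared residuals $\sum e_i^2$, where
$$\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2$$
Let $q = \sum e_i^2$; then the method involves solving the following system:
$$\frac{\partial q}{\partial \hat{\beta}_1} = 0, \quad \frac{\partial q}{\partial \hat{\beta}_2} = 0$$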
The method of least squares …
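As a sketch of the working (the slides state only the system and its result), differentiating $q = \sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2$ with respect to each estimate and setting the derivatives to zero gives the two normal equations:

```latex
\begin{align*}
\frac{\partial q}{\partial \hat{\beta}_1}
  &= -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i \right) = 0
  \quad\Longrightarrow\quad
  \sum y_i = n \hat{\beta}_1 + \hat{\beta}_2 \sum x_i \\
\frac{\partial q}{\partial \hat{\beta}_2}
  &= -2 \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i \right) = 0
  \quad\Longrightarrow\quad
  \sum x_i y_i = \hat{\beta}_1 \sum x_i + \hat{\beta}_2 \sum x_i^2
\end{align*}
```

Solving the first equation for $\hat{\beta}_1$ and substituting it into the second yields the estimators for the intercept and the slope.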

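This results in the estimated line $\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 x$, where
$$\hat{\beta}_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$
and
$$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} = \frac{\sum y - \hat{\beta}_2 \sum x}{n}$$
By defining the following
$$S_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2, \quad S_{yy} = \sum y^2 - \frac{1}{n}\left(\sum y\right)^2, \quad S_{xy} = \sum xy - \frac{1}{n}\sum x \sum y$$
one can simplify
$$\hat{\beta}_2 = \frac{S_{xy}}{S_{xx}}$$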
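These formulas translate directly into code. The following is a minimal sketch in Python; the function name `fit_line` and the small check data are illustrative assumptions, not part of the course material.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    sum_x, sum_y = sum(xs), sum(ys)
    # S_xx and S_xy as defined in the text.
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    if s_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    slope = s_xy / s_xx                      # beta_2 hat
    intercept = (sum_y - slope * sum_x) / n  # beta_1 hat
    return intercept, slope


# Check on points that lie exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

Working from the column sums mirrors the hand calculation, so intermediate values can be compared with a worked table.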
The method of least squares …
The line of best fit can also be estimated roughly by eye from a scatter diagram of the paired (x, y) values.
Application
The purpose of linear regression is to develop a mathematical relationship (model) between variables that can be used to predict the value of one variable if the value of another variable is known.
Example
A company keeps extensive records on its salespeople on the premise that sales should increase with experience. A random sample of six new salespeople produced the data on experience and sales given in the table below.

Months on job (X)                 2     4     6     8      12     14
Monthly sales (Tshs ‘000) (Y)     2.4   7.0   8.0   11.0   15.0   18.0

a) Plot a scatter diagram and estimate the line of the best fit
b) Determine the linear regression model that exists
between the two variables.
c) Project the monthly sales for 9 months experience on job
Scatter Diagram
[Figure: scatter diagram of monthly sales (vertical axis) against months on job (horizontal axis)]
Summary data
        x       y       xy      x²       y²
        2      2.4      4.8      4      5.76
        4      7.0     28.0     16     49.00
        6      8.0     48.0     36     64.00
        8     11.0     88.0     64    121.00
       12     15.0    180.0    144    225.00
       14     18.0    252.0    196    324.00
Sum    46     61.4    600.8    460    788.76
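From the table we find that, with n = 6,
$$\sum x = 46, \quad \sum y = 61.4, \quad \sum xy = 600.8, \quad \sum x^2 = 460, \quad \sum y^2 = 788.76$$
Now $\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 x$
Then,
$$\hat{\beta}_2 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}, \quad \hat{\beta}_1 = \frac{\sum y - \hat{\beta}_2 \sum x}{n}$$
$$\hat{\beta}_2 = \frac{6(600.8) - (46)(61.4)}{6(460) - (46)^2} = \frac{780.4}{644} \approx 1.2118 \approx 1.21$$
$$\hat{\beta}_1 = \frac{61.4 - 1.2118(46)}{6} \approx 0.94$$
The required line of best fit is
$$\hat{y} = 0.94 + 1.21x$$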
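The same calculation can be checked in a few lines of Python. This sketch takes the six (x, y) pairs as they appear in the summary table and also works part (c), the projection for 9 months on the job; the variable names are illustrative. Because it carries the slope at full precision rather than rounding it to 1.21 first, the intercept comes out at about 0.94.

```python
# Data pairs as listed in the summary table (sales in Tshs '000).
months = [2, 4, 6, 8, 12, 14]
sales = [2.4, 7.0, 8.0, 11.0, 15.0, 18.0]

n = len(months)
sum_x = sum(months)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(months, sales))
sum_x2 = sum(x * x for x in months)

# Least squares estimates from the column sums.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

# Part (c): projected monthly sales after 9 months on the job.
forecast = intercept + slope * 9

print(f"slope     = {slope:.4f}")
print(f"intercept = {intercept:.4f}")
print(f"forecast at x = 9: {forecast:.2f} (Tshs '000)")
```

The projection is an interpolation, since 9 months lies inside the observed range of 2 to 14 months; projections far outside that range would be less reliable.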
Correlation
Simple Correlation Analysis
Correlation is the determination of the degree of relationship between two or more variables.
Correlation analysis is the process of examining how strongly the variables are related.
The degree is measured by the coefficient of correlation, r. It can be determined by different formulas, but we will see only two:
i. Karl Pearson's product moment correlation coefficient, and
ii. Spearman's rank correlation coefficient
Coefficient of correlation
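The coefficient r is evaluated as
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
Also we use
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
It can take any value in the range $-1 \le r \le 1$.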
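The two expressions for r are the same quantity. As a short check, using the definitions of $S_{xx}$, $S_{yy}$ and $S_{xy}$ given earlier, multiply the numerator and the denominator of the second form by n:

```latex
\begin{align*}
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
  &= \frac{n S_{xy}}{\sqrt{(n S_{xx})(n S_{yy})}} \\
  &= \frac{n \sum xy - \sum x \sum y}
          {\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right]
                 \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}}
\end{align*}
```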
Perfect Positive correlation
[Figure: scatter diagram of Y against X with every point on an upward-sloping straight line]
All points lie on a straight line sloping upward.
Correlation coefficient, r = +1
Called a perfect positive linear relationship
Higher Positive correlation
[Figure: scatter diagram of Y against X with points clustered around an upward-sloping straight line]
Many points lie on or close to a straight line sloping upward.
Correlation coefficient, 0 < r < +1
Called a high (or weak, moderate, strong) positive relationship
Perfect negative correlation
[Figure: scatter diagram with every point on a downward-sloping straight line]
All points lie on a straight line sloping downward.
Correlation coefficient, r = -1
Called a perfect negative linear relationship
Higher negative correlation
[Figure: scatter diagram of Y against X with points clustered around a downward-sloping straight line]
Many points lie on or close to a straight line sloping downward.
Correlation coefficient, -1 < r < 0
Called a high (or weak, moderate, strong) negative relationship
No correlation

[Figure: scatter diagram of Y against X with points scattered in no pattern]
Points are haphazardly located with no particular direction.
Correlation coefficient, r = 0
Example
Recall the previous example. Determine the coefficient of correlation.

Solution
From the sums in the previous example,
$$S_{xx} = 107.3, \quad S_{yy} = 160.4, \quad S_{xy} = 130.1, \quad \hat{\beta}_2 = \frac{S_{xy}}{S_{xx}} = 1.21$$
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{130.1}{\sqrt{107.3 \times 160.4}} = 0.99$$
r = 0.99 shows a strong positive linear relationship.
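A quick numerical check of this result, as a sketch: the Python below recomputes $S_{xx}$, $S_{yy}$ and $S_{xy}$ from the six data pairs in the summary table of the regression example and then evaluates r. The helper name `pearson_r` is an illustrative choice.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation coefficient computed from S_xx, S_yy, S_xy."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / sqrt(s_xx * s_yy)


# Data pairs as listed in the summary table of the regression example.
months = [2, 4, 6, 8, 12, 14]
sales = [2.4, 7.0, 8.0, 11.0, 15.0, 18.0]

r = pearson_r(months, sales)
print(f"r = {r:.4f}")
```

Points lying exactly on a rising or falling straight line give r = +1 or r = -1 with this function, matching the perfect correlation cases described earlier.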


Try
The following data, obtained from claims drawn on life assurance policies for a particular category of employment, relate age at official retirement to age at death for nine males.

Age at retirement   57   62   60   57   65   60   58   62   56
Age at death        71   70   66   70   69   67   69   63   70

Calculate the product moment coefficient of correlation between the age at retirement and the age at death.
Coefficient of Determination
Is the percentage of the variation in the dependent variable that is explained by the independent (explanatory) variable.
Example:
Recall the previous example.
Determine the coefficient of determination.
Solution:
The coefficient of determination is
$$r^2 \times 100\% = (0.99)^2 \times 100\% \approx 98\%$$
This indicates that the explanatory variable (x) explains 98% of the linear variation in the dependent variable (y); the remaining 2% can be explained by other variables not considered in this case.
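To see where the "percentage of variation explained" comes from, the sketch below fits the least squares line to the six data pairs from the summary table of the regression example and compares the unexplained variation (SSE) with the total variation in y (called SST here, a name not used in the slides). For simple linear regression, 1 - SSE/SST equals r squared.

```python
# Data pairs as listed in the summary table of the regression example.
months = [2, 4, 6, 8, 12, 14]
sales = [2.4, 7.0, 8.0, 11.0, 15.0, 18.0]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n

s_xx = sum((x - mean_x) ** 2 for x in months)
s_yy = sum((y - mean_y) ** 2 for y in sales)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))

# Least squares line.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Unexplained variation (SSE) and total variation (SST, equal to S_yy).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(months, sales))
sst = s_yy

r_squared = 1 - sse / sst
r = s_xy / (s_xx * s_yy) ** 0.5

print(f"1 - SSE/SST = {r_squared:.4f}")
print(f"r squared   = {r ** 2:.4f}")
```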
Spearman Rank Correlation
Spearman rank correlation measures the degree of association between two variables.
It finds out whether the variables concerned have some association.
Spearman rank correlation, r:
$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}, \quad -1 \le r \le 1$$
where d is the difference between the ranks of each pair of the two variables,
n is the number of pairs of rankings.
Example
The manager of a company with ten operating plants of similar size producing small components has observed the following pattern of expenditure on inspection and defective parts delivered to the customer.

Observation   Inspection expenditure   Defective parts per
              (Tshs 1000)              1000 units delivered
 1            25                       50
 2            30                       35
 3            15                       60
 4            75                       15
 5            40                       46
 6            65                       20
 7            45                       28
 8            24                       45
 9            35                       42
10            70                       22
Find the rank correlation of the inspection expenditure and the number of defective units.
Solution
x     Rank of x   y     Rank of y   d     d²
15    10          60    1            9    81
24     9          45    4            5    25
25     8          50    2            6    36
30     7          35    6            1     1
35     6          42    5            1     1
40     5          46    3            2     4
45     4          28    7           -3     9
65     3          20    9           -6    36
70     2          22    8           -6    36
75     1          15    10          -9    81
                                    Sum = 310
Solution Cont…
n = 10
$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}, \quad -1 \le r \le 1$$
$$r = 1 - \frac{6 \times 310}{10(100 - 1)} = 1 - \frac{1860}{990}$$
$$r = -0.88$$
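The ranking and the arithmetic can be reproduced with a short Python sketch. It ranks each variable from largest (rank 1) to smallest, as in the solution table, and assumes there are no tied values, which holds for this data; tied values would need averaged ranks, which this sketch does not handle. The function names are illustrative.

```python
def ranks_descending(values):
    """Rank values so that the largest gets rank 1 (assumes no ties)."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_r(xs, ys):
    """Return (rank correlation, sum of squared rank differences)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


# Inspection expenditure (Tshs 1000) and defective parts per 1000 units,
# in observation order 1 to 10.
expenditure = [25, 30, 15, 75, 40, 65, 45, 24, 35, 70]
defectives = [50, 35, 60, 15, 46, 20, 28, 45, 42, 22]

r_s, sum_d2 = spearman_r(expenditure, defectives)
print(f"sum of d squared = {sum_d2}")
print(f"rank correlation = {r_s:.2f}")
```

Ranking both variables in the same direction matters: ranking one ascending and the other descending would flip the sign of the coefficient.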
Two commentators gave ratings out of 100 for sports personalities. The ratings are shown in the table below.

Personality        A    B    C    D    E    F    G
Commentator I      73   76   78   65   86   82   91
Commentator II     77   78   79   80   86   89   95

Calculate Spearman’s rank correlation coefficient for these ratings.
