Session 19
Regression Analysis
• Having determined the correlation between X and Y, we wish to determine a mathematical relationship between them.
• Dependent variable: the variable you wish to explain.
• Independent variables: the variables used to explain the dependent variable.
• Regression analysis is used to:
  ▪ predict the value of the dependent variable based on the value of at least one independent variable;
  ▪ explain the impact of changes in an independent variable on the dependent variable.
Types of Relationships
[Scatter plots of Y against X illustrating different types of relationships between the variables, including the case of no relationship.]
Simple Linear Regression Analysis
• The simplest mathematical relationship is Y = a + bX + error (linear).
• Changes in Y are related to the changes in X.
• What are the most suitable values of a (intercept) and b (slope)?
[Plot of the line y = a + b·x, showing a as the intercept on the Y-axis and b as the slope.]
Method of Least Squares
[Plot: for an observed point (xi, yi), the ERROR is the vertical distance between yi and the fitted value a + b·xi.]
The best-fitted line is the one for which all the ERRORS are minimized.
Least Squares Procedure
• We want to fit a line for which all the errors are minimum.
• We want to obtain such values of a and b in Y = a + bX + error for which all the errors are minimum.
• To minimize all the errors together, we minimize the sum of squares of errors (SSE):

  $S = \sum_{i=1}^{n} (y_i - a - b x_i)^2$

• To get the values of a and b which minimize SSE, we proceed as follows:
$\frac{\partial SSE}{\partial a} = 0 \;\Rightarrow\; -2\sum_{i=1}^{n}(y_i - a - b x_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i \quad (1)$

$\frac{\partial SSE}{\partial b} = 0 \;\Rightarrow\; -2\sum_{i=1}^{n}(y_i - a - b x_i)x_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i x_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 \quad (2)$

Solving the normal equations (1) and (2) simultaneously gives

$b = \frac{n\sum_{i=1}^{n} y_i x_i - \sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{SSXY}{SSX}$

$a = \bar{y} - b\bar{x}$
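The derivation above can be checked numerically. The sketch below computes b = SSXY/SSX and a = ȳ − b·x̄ on a small made-up data set (the numbers are purely illustrative) and verifies that the resulting estimates satisfy both normal equations:

```python
# Least-squares estimates for y = a + b*x, exactly as derived above:
# b = SSXY/SSX and a = ybar - b*xbar. The data are illustrative only.
def least_squares(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ssxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b = ssxy / ssx
    a = ybar - b * xbar
    return a, b

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(x, y)

# Normal equations (1) and (2) should hold (up to rounding) for these
# estimates: sum(y) = n*a + b*sum(x), and
# sum(y*x) = a*sum(x) + b*sum(x^2).
n = len(x)
eq1 = abs(sum(y) - (n * a + b * sum(x)))
eq2 = abs(sum(yi * xi for xi, yi in zip(x, y))
          - (a * sum(x) + b * sum(xi ** 2 for xi in x)))
print(a, b, eq1, eq2)
```

For this data SSXY = 19.9 and SSX = 10, so b = 1.99 and a = 0.05; both residual checks come out numerically zero.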
• The values of a and b obtained using the least squares method are called the least squares estimates (LSE) of a and b.
• Thus, the LSE of a and b are given by

  $\hat{a} = \bar{y} - \hat{b}\bar{x}, \qquad \hat{b} = \frac{SSXY}{SSX}.$
• Also, the correlation coefficient between X and Y is

  $r = \frac{SSXY}{\sqrt{SSX \cdot SSY}}$

• For the example data ($\bar{x} = 2.15$, $\bar{y} = 80$), $r = -0.957$.
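The formula for r can be sketched directly in code. The data in this sketch are invented (the original (x, y) values behind r = −0.957 are not reproduced on the slide); a perfectly decreasing straight line is used so the expected value of r is known to be −1:

```python
import math

# Correlation coefficient r = SSXY / sqrt(SSX * SSY), as defined above.
def correlation(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ssx = sum((xi - xbar) ** 2 for xi in x)
    ssy = sum((yi - ybar) ** 2 for yi in y)
    return ssxy / math.sqrt(ssx * ssy)

# A perfectly decreasing linear relationship gives r = -1:
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # → -1.0
```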
[Scatter plot of the example data with the fitted regression line; Y-axis from 40 to 140, X-axis from 0.25 to 2.75.]
Fitted Line is ŷ = 189.91 − 51.12x
◙ We can predict the value of y for a given value of x.
◙ 189.91 is the predicted value of y when x is zero.
◙ −51.12 is the change in the predicted value of y as a result of a one-unit change in x.
◙ For x = 2.15 (say), the predicted value of y is 189.91 − 51.12 × 2.15 = 80.002.
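The prediction step is just arithmetic on the fitted coefficients; the sketch below reproduces the numbers from the fitted line above:

```python
# Prediction from the fitted line yhat = 189.91 - 51.12*x (from the
# slide above).
def predict(x):
    return 189.91 - 51.12 * x

print(predict(2.15))  # ≈ 80.002, matching the slide
print(predict(0.0))   # ≈ 189.91, the intercept
```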
The Residuals and their Analysis
◙ Regression is an attempt to explain the value of the dependent variable Y in terms of the independent variable X.
◙ The residual is the unexplained part of Y.
  ◘ Variation in Y that is not explained by X.
◙ The smaller the residuals, the better the utility of the regression.
◙ The sum of the residuals is always zero; the least squares procedure ensures that.
◙ Residuals play an important role in investigating the adequacy of the fitted model.
◙ We obtain the coefficient of determination (R²) using the residuals.
◙ R² is used to examine the adequacy of the fitted linear model to the given data.
[Plot: for an observed point (xi, yi), the residual is the vertical distance between the point and the fitted line.]
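The claim that least-squares residuals always sum to zero follows from normal equation (1) and can be demonstrated numerically (the data below are illustrative only):

```python
# The least-squares residuals e_i = y_i - (a + b*x_i) sum to zero
# (up to floating-point rounding) -- a consequence of the normal
# equation sum(y) = n*a + b*sum(x). Data are illustrative.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(sum(residuals))  # ≈ 0 up to rounding
```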
Coefficient of Determination
◙ Each deviation splits into an explained and an unexplained part: $(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$.
◙ Total sum of squares: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
◙ Residual (Error) sum of squares: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
◙ Regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, and $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$.
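For a least-squares fit, the standard identity SST = SSR + SSE holds, where SSR = Σ(ŷᵢ − ȳ)² is the regression sum of squares and R² = SSR/SST. A sketch on illustrative data:

```python
# Sum-of-squares decomposition SST = SSR + SSE for a least-squares
# fit, and R^2 = SSR/SST. The data are illustrative only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.2, 3.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
yhat = [a + b * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)            # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained
ssr = sum((yh - ybar) ** 2 for yh in yhat)         # explained
r2 = ssr / sst
print(sst, sse + ssr, r2)  # SST equals SSE + SSR; 0 <= R^2 <= 1
```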
(a) Fit a regression line for this data using the method of least squares.

b = SSXY/SSX = 0.0625
a = ȳ − b·x̄ = 5.0500 − 0.0625 × 46.1429 = 2.1661
Ŷ = 2.1661 + 0.0625X

$r = \frac{SSXY}{\sqrt{SSX \cdot SSY}} = 0.5545, \qquad R^2 = r^2 = 0.3075$
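The arithmetic of this worked example can be reproduced from the summary statistics given on the slide (b = 0.0625, x̄ = 46.1429, ȳ = 5.0500, r = 0.5545):

```python
# Reproducing the worked example's arithmetic from its summary
# statistics (the underlying raw data are not shown on the slide).
b = 0.0625
a = 5.0500 - b * 46.1429
print(round(a, 4))      # → 2.1661, as on the slide

r = 0.5545
print(round(r * r, 4))  # → 0.3075, i.e. R^2 = r^2
```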