Linear Regression Full Version
Goals, Aims, and Requirements
Regression Analysis
In regression analysis we analyze the relationship between two or more variables.
The relationship between two or more variables can be linear or non-linear.
This week we talk only about the relationship between two variables, and we restrict our topic to linear regression.
Functional relationship vs. statistical relationship
A functional relationship is "exact"; a statistical relationship is not.
Regression:
• response Y is the dependent variable, a random variable
• the xi's are independent variables (or regressors) and are mathematical variables, i.e. they can be set/controlled or measured with negligible error
• linear regression: the parameters occur linearly
• simple linear regression: linear parameters & 1 regressor
• simple linear regression: true (population) regression line
μY|x = α + βx
sample data are used to produce the fitted regression line:
ŷ = a + bx
a: estimated intercept
b: estimated slope
• statistical model
for the response r.v. Y at a given x, say at a point yi:
yi = α + βxi + εi
ε is a random error or random disturbance, with properties:
⇒ E(ε) = 0, and
⇒ constant Var(ε) = σ² (homogeneous variance assumption)
σ² is the error variance or residual variance
⇒ in fact, ε ~ n(e; 0, σ) (see Figure 11.4)
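The statistical model above can be simulated directly; a minimal sketch in Python, where the parameter values α = 2, β = 0.5, σ = 1 and the x grid are illustrative assumptions, not values from the slides:

```python
import random

random.seed(0)

alpha, beta, sigma = 2.0, 0.5, 1.0   # assumed illustrative parameter values
x = [float(i) for i in range(20)]    # regressors: fixed, not random

# y_i = alpha + beta * x_i + eps_i, with eps_i ~ n(e; 0, sigma)
eps = [random.gauss(0.0, sigma) for _ in x]
y = [alpha + beta * xi + ei for xi, ei in zip(x, eps)]

# E(eps) = 0, so the sample mean of the errors should be near zero
mean_eps = sum(eps) / len(eps)
```

Note that only ε (and hence Y) is random here; the xi's are controlled, exactly as the model assumes.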
Figure 11.4
using the fitted model, given the point yi at xi, the fitted value is ŷi = a + bxi
• fitting: the method of least squares
⇒ form the sum of squares, SSE:
SSE = ∑i (yi − ŷi)² = ∑i (yi − a − bxi)²
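A minimal sketch of the least-squares computations, using the standard formulas b = Sxy/Sxx and a = ȳ − b·x̄ (the small data set is hypothetical, chosen only for illustration):

```python
# Hypothetical data set for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Sxx and Sxy, the building blocks of the least-squares solution
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b = Sxy / Sxx          # estimated slope
a = ybar - b * xbar    # estimated intercept

# SSE, the quantity the method of least squares minimizes
SSE = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
```

Any other choice of a and b on this data would give a larger SSE; that is the defining property of the least-squares line.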
Properties of least-squares estimators
Yi = α + βxi + εi
r.v. εi, for i = 1, 2, …, n independent ⇒ r.v. Yi, for i = 1, 2, …, n independent
E(ε) = 0 ⇒ E(Y|xi) = α + βxi, for i = 1, 2, …, n
Var(ε) = σ² ⇒ Var(Y|xi) = σ², for i = 1, 2, …, n
It can be shown that, for r.v. B, the estimator of β, and r.v. A, the estimator of α:
μB = E(B) = β; μA = E(A) = α
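The unbiasedness claims E(B) = β and E(A) = α can be checked by simulation; a sketch in which the true parameter values, sample design, and replication count are arbitrary assumptions:

```python
import random

random.seed(1)
alpha, beta, sigma = 2.0, 0.5, 1.0     # assumed true parameters
x = [float(i) for i in range(20)]
n = len(x)
xbar = sum(x) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)

a_hats, b_hats = [], []
for _ in range(2000):                  # 2000 simulated samples
    y = [alpha + beta * xi + random.gauss(0.0, sigma) for xi in x]
    ybar = sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
    a_hats.append(ybar - b * xbar)
    b_hats.append(b)

# Averages of the estimates should be close to the true alpha and beta
mean_a = sum(a_hats) / len(a_hats)
mean_b = sum(b_hats) / len(b_hats)
```

Each individual estimate varies from sample to sample, but their average settles near the true values, which is exactly what unbiasedness asserts.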
Inferences on regression coefficients
εi ~ n(ei; 0, σ) ⇒ Yi ~ n(yi; α + βxi, σ), and since A, B are linear functions of the Yi:
⇒ A ~ n(a; α, σA)
⇒ B ~ n(b; β, σB)
Slope parameter β
Hypothesis testing: H0: β = β0
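A sketch of the t test for H0: β = β0, using t = (b − β0)/(s/√Sxx) with s² = SSE/(n − 2); the small data set and the choice β0 = 0 are illustrative assumptions, and the critical value is read from a t table:

```python
import math

# Hypothetical data set for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
a = ybar - b * xbar

SSE = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(SSE / (n - 2))           # residual standard error

beta0 = 0.0                            # H0: beta = 0 (illustrative choice)
t = (b - beta0) / (s / math.sqrt(Sxx))

t_crit = 3.182                         # t_{0.025, n-2=3} from a t table
reject = abs(t) > t_crit               # two-sided test at the 5% level
```

Rejecting H0: β = 0 says the regressor carries real information about the response, not that the fit is linear in any other sense.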
Example 11.2
Intercept parameter α
Example 11.4
Prediction
Using the fitted model, we can predict:
⇒ the mean response μY|x0 (x0 need not be one of the pre-chosen xi)
⇒ a single value y0 of the r.v. Y0 at x = x0
Mean response
(1 − α) C.I. for μY|x0: ŷ0 ± t_{α/2} · s·√(1/n + (x0 − x̄)²/Sxx)
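A sketch of the confidence interval for the mean response, ŷ0 ± t_{α/2} · s·√(1/n + (x0 − x̄)²/Sxx); the data set and the choice x0 = 3.5 are hypothetical, and the critical value is read from a t table:

```python
import math

x = [1, 2, 3, 4, 5]                    # hypothetical data for illustration
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
a = ybar - b * xbar
s = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

x0 = 3.5                               # need not be one of the chosen x_i
y0_hat = a + b * x0                    # estimated mean response at x0
se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / Sxx)

t_crit = 3.182                         # t_{0.025, n-2=3} from a t table
ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
```

The (x0 − x̄)² term makes the interval widest far from x̄: the fitted line is least trustworthy at the edges of the data.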
Example 11.6
Prediction interval for single response y0
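The prediction interval for a single future response widens the mean-response interval by an extra "1 +" under the square root: ŷ0 ± t_{α/2} · s·√(1 + 1/n + (x0 − x̄)²/Sxx). A sketch on the same hypothetical data set used above:

```python
import math

x = [1, 2, 3, 4, 5]                    # hypothetical data for illustration
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
a = ybar - b * xbar
s = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

x0 = 3.5                               # illustrative prediction point
y0_hat = a + b * x0
se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / Sxx)

t_crit = 3.182                         # t_{0.025, n-2=3} from a t table
pi = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)
```

The extra "1 +" accounts for the variance of the single future observation itself, so a P.I. is always wider than the C.I. for the mean response at the same x0.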
Example 11.7
Figure 11.12
ANalysis-Of-VAriance (ANOVA) approach
ANOVA Table (see Figure 11.14)

Source of       Sum of
variation       squares   d.f.    Mean square        Computed f
Regression      SSR       1       SSR/1              SSR/s²
Error           SSE       n − 2   SSE/(n − 2) = s²
Total           SST       n − 1
(corrected SS)

Under the null hypothesis, f = (SSR/1)/(SSE/(n − 2)) follows an F-distribution with 1 and n − 2 degrees of freedom.
Coefficient of Determination, R²
R² = 1 − (SSE/SST)
A large R² generally means a good linear fit; a small R² generally means a poor linear fit.
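R² follows directly from the sums of squares; a sketch on a hypothetical data set:

```python
x = [1, 2, 3, 4, 5]                    # hypothetical data for illustration
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
a = ybar - b * xbar

SST = sum((yi - ybar) ** 2 for yi in y)
SSE = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

r2 = 1 - SSE / SST      # close to 1 here, indicating a good linear fit
```

R² reads as the proportion of the total variation in y explained by the regression, so it always lies between 0 and 1.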
The Regression Picture
ŷi = βxi + α
[Figure: observations yi scattered about the fitted line, with the distances A (from yi to the mean ȳ), B (from ŷi to ȳ), and C (from yi to ŷi) marked.]
Least-squares estimation gave us the line (β) that minimized C²:
∑i=1..n (yi − ȳ)² = ∑i=1..n (ŷi − ȳ)² + ∑i=1..n (ŷi − yi)²
A² = B² + C²
SStotal = SSreg + SSresidual
R² = SSreg/SStotal
SStotal: total squared distance of observations from the naïve mean of y (total variation)
SSreg: distance from the regression line to the naïve mean of y (variability due to x, the regression)
SSresidual: variance around the regression line (additional variability not explained by x; what the least-squares method aims to minimize)
Lack of Fit and Repeating Observations
28
Table
ab e 11.3
3
29
Data Plots and Transformations
observed residuals: ei = yi − ŷi = yi − (a + bxi)
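Observed residuals are easy to compute from the fit, and with an intercept in the model they always sum to zero, which is a useful sanity check before plotting them against x; a sketch on a hypothetical data set:

```python
x = [1, 2, 3, 4, 5]                    # hypothetical data for illustration
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
a = ybar - b * xbar

# e_i = y_i - (a + b x_i): plot these against x_i to judge the model
e = [yi - (a + b * xi) for xi, yi in zip(x, y)]

sum_e = sum(e)          # identically zero (up to rounding) for least squares
```

If a plot of the ei against xi shows curvature or a fanning-out spread, the linearity or constant-variance assumption is suspect and a transformation may help.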
Correlation
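For two variables, the sample correlation coefficient is r = Sxy/√(Sxx·Syy), and its square equals the R² of the simple linear fit; a sketch on a hypothetical data set:

```python
import math

x = [1, 2, 3, 4, 5]                    # hypothetical data for illustration
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Syy = sum((yi - ybar) ** 2 for yi in y)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = Sxy / math.sqrt(Sxx * Syy)         # sample correlation coefficient

# r**2 equals the coefficient of determination of the linear fit
b = Sxy / Sxx
a = ybar - b * xbar
SSE = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
r2 = 1 - SSE / Syy
```

Unlike regression, correlation treats x and y symmetrically: r measures the strength of the linear association without singling out a response variable.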