
Simple Linear Regression

Goals, Aims, and Requirements

Understand the concept of regression.
Simple linear regression.
The least squares method.
The coefficient of determination.
Using the estimated regression equation for estimation and prediction.
The t test.
The F test.

Regression Analysis
In regression analysis we analyze the relationship between two or more variables.

The relationship between two or more variables could be linear or non-linear.

This week we talk only about the relationship between two variables, and we restrict our topic to linear regression.

Simple Linear Regression: linear regression between two variables.

How can we use the available data to investigate such a relationship?

If a relationship exists, how can we use it to forecast the future?

Functional relationship ("exact") vs. statistical relationship

Regression:
• the response Y is the dependent variable, a random variable
• the xi's are independent variables (or regressors) and are mathematical variables, i.e. they can be set/controlled or measured with negligible error
• linear regression: the parameters occur linearly
• simple linear regression: linear parameters & 1 regressor
• simple linear regression: true (population) regression line

the "true" (population) regression line:

μY|x = α + βx

α: intercept parameter
β: slope parameter
α, β are also called the regression coefficients
• simple linear regression: fitted sample regression line

sample data are used to produce the fitted regression line:

ŷ = a + bx

a: estimated intercept
b: estimated slope
• statistical model for the response r.v. Y at a given x

say, at a point yi:

yi = α + βxi + εi

ε is a random error or random disturbance, with properties:
⇒ E(ε) = 0, and
⇒ constant Var(ε) = σ² (homogeneous variance assumption)
σ² is the error variance or residual variance
⇒ in fact, ε ~ n(ei; 0, σ) – see Figure 11.4
Figure 11.4
using the fitted model, given the point yi at xi:

ŷi = a + bxi
ei = yi − ŷi = residual (observable; different from εi)
• fitting – method of least squares
⇒ form the sum of squares of the residuals, SSE:

SSE = Σ(i=1..n) ei² = Σ(i=1..n) (yi − a − bxi)²

⇒ minimize SSE to give the least-squares estimators of α, β:

b = Σ(i=1..n) (xi − x̄)(yi − ȳ) / Σ(i=1..n) (xi − x̄)² = Sxy/Sxx
a = ȳ − b·x̄

⇒ see Example 11.1; a numerical sketch in Python follows below
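To make the computation concrete, here is a minimal Python sketch of the least squares fit. The data values are hypothetical, chosen only for illustration; they are not the data of Example 11.1.

import numpy as np

# Illustrative (made-up) data, not from Example 11.1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)              # corrected sum of squares of x
Sxy = np.sum((x - x.mean()) * (y - y.mean()))  # corrected cross product

b = Sxy / Sxx                 # estimated slope
a = y.mean() - b * x.mean()   # estimated intercept

y_hat = a + b * x
SSE = np.sum((y - y_hat) ** 2)   # the minimized sum of squared residuals
print(f"a = {a:.4f}, b = {b:.4f}, SSE = {SSE:.4f}")

np.polyfit(x, y, 1) would return the same slope/intercept pair; the explicit Sxy/Sxx form mirrors the formulas above.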
Example 11.1
Properties of the least-squares estimators

Yi = α + βxi + εi

the r.v.'s εi, i = 1, 2, …, n, independent ⇒ the r.v.'s Yi, i = 1, 2, …, n, independent
E(ε) = 0 ⇒ E(Y|xi) = α + βxi, for i = 1, 2, …, n
Var(ε) = σ² ⇒ Var(Y|xi) = σ², for i = 1, 2, …, n

It can be shown, for the r.v. B, estimator of β, and the r.v. A, estimator of α:

μB = E(B) = β;  μA = E(A) = α
σB² = σ²/Sxx;  σA² = σ²·Σ(i) xi² / (n·Sxx)

⇒ we need to estimate σ² to draw inferences about β & α


an unbiased estimator of σ² is:

s² = SSE/(n − 2)
Inferences about the regression coefficients

εi ~ n(ei; 0, σ) ⇒ Yi ~ n(yi; α + βxi, σ); since A and B are linear functions of the Yi:
⇒ A ~ n(a; α, σA)
⇒ B ~ n(b; β, σB)

Slope parameter β

(1 − α) C.I. for β in the regression line:

b ± tα/2 · s/√Sxx, with tα/2 based on n − 2 degrees of freedom

Hypothesis testing H0: β = β0:

t = (b − β0)/(s/√Sxx), with n − 2 degrees of freedom

See Examples 11.2 & 11.3 and Figures 11.8 & 11.9; a Python sketch follows below.
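A sketch of the slope inference in Python, reusing the illustrative data from the earlier sketch (again, not the textbook's example data):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
SSE = np.sum((y - (a + b * x)) ** 2)
s2 = SSE / (n - 2)             # unbiased estimator of the error variance
se_b = np.sqrt(s2 / Sxx)       # standard error of the slope estimator B

# (1 - alpha) confidence interval for beta
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print("CI for beta:", (b - t_crit * se_b, b + t_crit * se_b))

# t test of H0: beta = beta0 (here beta0 = 0, i.e. "no linear relationship")
beta0 = 0.0
t_stat = (b - beta0) / se_b
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")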
Examples 11.2 & 11.3
Intercept parameter α

(1 − α) C.I. for α in the regression line:

a ± tα/2 · s·√(Σ(i) xi² / (n·Sxx)), with tα/2 based on n − 2 degrees of freedom

Hypothesis testing H0: α = α0:

t = (a − α0) / (s·√(Σ(i) xi² / (n·Sxx))), with n − 2 degrees of freedom

See Examples 11.4 & 11.5; a Python sketch follows below.
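The corresponding computation for the intercept, sketched under the same illustrative-data assumption:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s2 = np.sum((y - (a + b * x)) ** 2) / (n - 2)

# standard error of the intercept estimator A
se_a = np.sqrt(s2 * np.sum(x ** 2) / (n * Sxx))

t_crit = stats.t.ppf(0.975, df=n - 2)   # 95% confidence
print("CI for alpha:", (a - t_crit * se_a, a + t_crit * se_a))

# t test of H0: alpha = 0
t_stat = a / se_a
print("p-value:", 2 * stats.t.sf(abs(t_stat), df=n - 2))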
Examples 11.4 & 11.5
Prediction using the fitted regression line

predict:
⇒ the mean response μY|x0 (x0 need not be one of the pre-chosen xi)
⇒ a single value y0 of the r.v. Y0 at x = x0

Mean response

(1 − α) C.I. for μY|x0:

ŷ0 ± tα/2 · s·√(1/n + (x0 − x̄)²/Sxx), with tα/2 based on n − 2 degrees of freedom

See Example 11.6 & Figure 11.11
Example 11.6
Prediction interval for a single response y0

(1 − α) prediction interval for a single response y0:

ŷ0 ± tα/2 · s·√(1 + 1/n + (x0 − x̄)²/Sxx), with tα/2 based on n − 2 degrees of freedom

See Example 11.7; a Python sketch of both intervals follows below.
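A sketch computing both the confidence interval for the mean response and the prediction interval for a single response at a new point x0. The data are the same illustrative values as before, and x0 = 3.5 is an arbitrary choice:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

x0 = 3.5                 # new point; need not be one of the observed xi
y0_hat = a + b * x0
t_crit = stats.t.ppf(0.975, df=n - 2)

# C.I. for the mean response at x0
half_ci = t_crit * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / Sxx)
# prediction interval for a single response at x0 (note the extra 1 under the root)
half_pi = t_crit * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx)

print("CI:", (y0_hat - half_ci, y0_hat + half_ci))
print("PI:", (y0_hat - half_pi, y0_hat + half_pi))

The prediction interval is always wider than the confidence interval, since it must cover the variability of an individual observation as well as the uncertainty in the fitted line.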
Example 11.7
Figure 11.12
ANalysis-Of-VAriance (ANOVA) approach

⇒ partition the total corrected sum of squares of y into 2 components:

SST = SSR + SSE

SST: total corrected sum of squares (variation that ideally would be explained by the model)
SSR: regression sum of squares (variation explained by the model)
SSE: error sum of squares (variation unexplained; pure error)
ANOVA Table (see Figure 11.14)

Source of variation   Sum of sq   dof     Mean sq             f computed
Regression            SSR         1       SSR/1               SSR/s²
Error                 SSE         n − 2   SSE/(n − 2) = s²
Total corrected       SST         n − 1

Under the null hypothesis, f = (SSR/1)/(SSE/(n − 2)) follows an F-distribution with (1, n − 2) degrees of freedom.

If fcomputed > fα(1, n − 2) ⇒ β ≠ 0. Note: this is the same as (tα/2)² for testing the slope. A Python sketch follows below.
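A sketch of the ANOVA computation and the F test, using the same illustrative data; it also prints R², which is defined on the next slide:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
y_hat = a + b * x

SST = np.sum((y - y.mean()) ** 2)   # total corrected sum of squares
SSE = np.sum((y - y_hat) ** 2)      # error sum of squares
SSR = SST - SSE                     # regression sum of squares

f_computed = (SSR / 1) / (SSE / (n - 2))
f_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)   # alpha = 0.05
verdict = "reject H0: beta = 0" if f_computed > f_crit else "do not reject H0"
print(f"f = {f_computed:.3f}, f_crit = {f_crit:.3f} -> {verdict}")

R2 = 1 - SSE / SST   # coefficient of determination
print(f"R^2 = {R2:.4f}")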
Coefficient of Determination, R²

⇒ R² as a measure of quality of fit?

R² = 1 − (SSE/SST)

a large R² generally means a good linear fit;
a small R² generally means a poor linear fit

Note: see Figure 11.10
The Regression Picture

ŷi = βxi + α (fitted line)

For each observation, let A be the distance from yi to the naïve mean ȳ, B the distance from the fitted line to ȳ, and C the distance from yi to the fitted line. Least squares estimation gave us the line that minimized the C² term:

Σ(i=1..n) (yi − ȳ)² = Σ(i=1..n) (ŷi − ȳ)² + Σ(i=1..n) (ŷi − yi)²
  A² = SStotal          B² = SSreg            C² = SSresidual

SStotal: total squared distance of the observations from the naïve mean of y (total variation)
SSreg: distance from the regression line to the naïve mean of y (variability due to x, the regression)
SSresidual: variance around the regression line (additional variability not explained by x; what the least squares method aims to minimize)

R² = SSreg/SStotal
Lack of Fit and Repeated Observations

see Figure 11.17 and Table 11.3

SSE may comprise 2 components:
⇒ pure experimental error
⇒ lack of fit

• lack of fit affects the adequacy of the linear model (Fig 11.17)
• the lack-of-fit component can be tested by having repeated observations (Table 11.3)

Table 11.3
Data Plots and Transformations

see Figure 11.19 and Table 11.6

transformations ⇒ allow a variable to enter the model in a non-linear way

non-linear: yi = α·e^(βxi)·εi
linear: ln(yi) = ln α + βxi + ln εi

Notes:
(1) the error structures in the transformed and original variables may differ (e.g. from multiplicative to additive)
(2) if the y's are transformed, use the values of the residuals in the metric of the untransformed response when comparing between models

see Example 11.9 and Figure 11.20; a Python sketch follows below.
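A sketch of fitting the exponential model via the log transformation. The data here are synthetic, generated to follow the multiplicative-error model above; they are not the data of Example 11.9:

import numpy as np

rng = np.random.default_rng(0)

# synthetic data from y = alpha * exp(beta * x) * eps, with multiplicative error
alpha_true, beta_true = 2.0, 0.3
x = np.linspace(1.0, 10.0, 20)
y = alpha_true * np.exp(beta_true * x) * np.exp(rng.normal(0.0, 0.1, x.size))

# transform: ln(y) = ln(alpha) + beta * x + ln(eps), then fit by least squares
beta_hat, ln_alpha_hat = np.polyfit(x, np.log(y), 1)  # returns [slope, intercept]
alpha_hat = np.exp(ln_alpha_hat)
print(f"alpha ~= {alpha_hat:.3f}, beta ~= {beta_hat:.3f}")

# Note (2): compare models using residuals in the original (untransformed) metric
residuals = y - alpha_hat * np.exp(beta_hat * x)
print("SSE in the original metric:", np.sum(residuals ** 2))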
Diagnostic plots of residuals – graphical detection of violations of assumptions

observed residuals: ei = yi − ŷi

• assumption of homogeneous variance – see Figures 11.21 and 11.22
• assumption of normality – normal probability plotting: see case study
• case study: Table 11.8, Figures 11.23-11.27

A plotting sketch follows below.
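A minimal sketch of the two diagnostic plots, using matplotlib and scipy with the same illustrative data as before:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

b, a = np.polyfit(x, y, 1)   # slope, intercept
e = y - (a + b * x)          # observed residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# residuals vs fitted values: look for non-constant spread (heterogeneous variance)
ax1.scatter(a + b * x, e)
ax1.axhline(0.0, linestyle="--")
ax1.set_xlabel("fitted values")
ax1.set_ylabel("residuals")

# normal probability (Q-Q) plot: points near the line support the normality assumption
stats.probplot(e, dist="norm", plot=ax2)

plt.tight_layout()
plt.show()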
Correlation

• a measure of the strength of association between 2 random variables
• population correlation coefficient, ρ ⇒ strength of the linear association between 2 variables
• sample correlation coefficient, r ⇒ see Figure 11.28:

r = Sxy/√(Sxx·Syy)

• sample coefficient of determination, r² ⇒ the amount of variation accounted for by the linear relationship: r² = SSR/SST
• test H0: ρ = 0 using t = r·√(n − 2)/√(1 − r²), with n − 2 degrees of freedom

see Example 11.11; a Python sketch follows below.
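A sketch of the correlation test with the same illustrative data; scipy.stats.pearsonr returns both r and the p-value for H0: ρ = 0, and the t statistic above reproduces that p-value by hand:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
r, p_value = stats.pearsonr(x, y)   # sample correlation and p-value for H0: rho = 0
print(f"r = {r:.4f}, r^2 = {r**2:.4f}, p = {p_value:.4g}")

# the same test by hand, via the t statistic
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
print("t =", t_stat, " p =", 2 * stats.t.sf(abs(t_stat), df=n - 2))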
Example 11.11
