
Simple Linear Regression

Goals, Aims, and Requirements

Understand the concept of regression.
Simple linear regression.
The least squares method.
The coefficient of determination.
Using the estimated regression equation for estimation and prediction.
The t test.
The F test.

Regression Analysis
In regression analysis we analyze the relationship between two or more variables.

The relationship between two or more variables could be linear or non-linear.

This week we talk only about the relationship between two variables, and we restrict our topic to linear regression.

Simple Linear Regression: linear regression between two variables.

How can we use the available data to investigate such a relationship?

If a relationship exists, how can we use it to forecast the future?

Functional relationship ("exact") vs. statistical relationship

Regression:
• the response Y is the dependent variable, a random variable
• the xi's are independent variables (or regressors) and are mathematical variables, i.e. they can be set/controlled or measured with negligible error
• linear regression: the parameters occur linearly
• simple linear regression: linear parameters & 1 regressor
• simple linear regression: true (population) regression line

the "true" (population) regression line:

μY|x = α + βx

α: intercept parameter
β: slope parameter
α, β are also called the regression coefficients
• simple linear regression: fitted sample regression line

sample data are used to produce the fitted regression line:

ŷ = a + bx

a: estimated intercept
b: estimated slope
• statistical model for the response r.v. Y at a given x

say, at a point yi:

yi = α + βxi + εi

ε is a random error or random disturbance, with properties:
⇒ E(ε) = 0, and
⇒ constant Var(ε) = σ² (homogeneous variance assumption)
σ² is the error variance or residual variance
⇒ in fact, ε ~ n(ei; 0, σ) – see Figure 11.4
Figure 11.4
using the fitted model, given the point yi at xi:

ŷi = a + bxi
ei = yi − ŷi = residual (observable; different from εi)
• fitting – method of least squares
⇒ form the sum of squares of the residuals, SSE:

SSE = Σ(i=1..n) ei² = Σ(i=1..n) (yi − a − bxi)²

⇒ minimize SSE to give the least-squares estimators of α, β:

b = Σ(i=1..n) (xi − x̄)(yi − ȳ) / Σ(i=1..n) (xi − x̄)² = Sxy/Sxx
a = ȳ − b·x̄

⇒ see Example 11.1; a numerical sketch in Python follows below
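To make the computation concrete, here is a minimal Python sketch of the least squares fit. The data values are hypothetical, chosen only for illustration; they are not the data of Example 11.1.

import numpy as np

# Illustrative (made-up) data, not from Example 11.1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)              # corrected sum of squares of x
Sxy = np.sum((x - x.mean()) * (y - y.mean()))  # corrected cross product

b = Sxy / Sxx                 # estimated slope
a = y.mean() - b * x.mean()   # estimated intercept

y_hat = a + b * x
SSE = np.sum((y - y_hat) ** 2)   # the minimized sum of squared residuals
print(f"a = {a:.4f}, b = {b:.4f}, SSE = {SSE:.4f}")

np.polyfit(x, y, 1) would return the same slope/intercept pair; the explicit Sxy/Sxx form mirrors the formulas above.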
Example 11.1
Properties of the least-squares estimators

Yi = α + βxi + εi

the r.v.'s εi, i = 1, 2, …, n, independent ⇒ the r.v.'s Yi, i = 1, 2, …, n, independent
E(ε) = 0 ⇒ E(Y|xi) = α + βxi, for i = 1, 2, …, n
Var(ε) = σ² ⇒ Var(Y|xi) = σ², for i = 1, 2, …, n

It can be shown, for the r.v. B, estimator of β, and the r.v. A, estimator of α:

μB = E(B) = β;  μA = E(A) = α
σB² = σ²/Sxx;  σA² = σ²·Σ(i) xi² / (n·Sxx)

⇒ we need to estimate σ² to draw inferences about β & α


an unbiased estimator of σ² is:

s² = SSE/(n − 2)
Inferences about the regression coefficients

εi ~ n(ei; 0, σ) ⇒ Yi ~ n(yi; α + βxi, σ); since A and B are linear functions of the Yi:
⇒ A ~ n(a; α, σA)
⇒ B ~ n(b; β, σB)

Slope parameter β

(1 − α) C.I. for β in the regression line:

b ± tα/2 · s/√Sxx, with tα/2 based on n − 2 degrees of freedom

Hypothesis testing H0: β = β0:

t = (b − β0)/(s/√Sxx), with n − 2 degrees of freedom

See Examples 11.2 & 11.3 and Figures 11.8 & 11.9; a Python sketch follows below.
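A sketch of the slope inference in Python, reusing the illustrative data from the earlier sketch (again, not the textbook's example data):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
SSE = np.sum((y - (a + b * x)) ** 2)
s2 = SSE / (n - 2)             # unbiased estimator of the error variance
se_b = np.sqrt(s2 / Sxx)       # standard error of the slope estimator B

# (1 - alpha) confidence interval for beta
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print("CI for beta:", (b - t_crit * se_b, b + t_crit * se_b))

# t test of H0: beta = beta0 (here beta0 = 0, i.e. "no linear relationship")
beta0 = 0.0
t_stat = (b - beta0) / se_b
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")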
Examples 11.2 & 11.3
Intercept parameter α

(1 − α) C.I. for α in the regression line:

a ± tα/2 · s·√(Σ(i) xi² / (n·Sxx)), with tα/2 based on n − 2 degrees of freedom

Hypothesis testing H0: α = α0:

t = (a − α0) / (s·√(Σ(i) xi² / (n·Sxx))), with n − 2 degrees of freedom

See Examples 11.4 & 11.5; a Python sketch follows below.
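The corresponding computation for the intercept, sketched under the same illustrative-data assumption:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s2 = np.sum((y - (a + b * x)) ** 2) / (n - 2)

# standard error of the intercept estimator A
se_a = np.sqrt(s2 * np.sum(x ** 2) / (n * Sxx))

t_crit = stats.t.ppf(0.975, df=n - 2)   # 95% confidence
print("CI for alpha:", (a - t_crit * se_a, a + t_crit * se_a))

# t test of H0: alpha = 0
t_stat = a / se_a
print("p-value:", 2 * stats.t.sf(abs(t_stat), df=n - 2))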
Examples 11.4 & 11.5
Prediction using the fitted regression line

predict:
⇒ the mean response μY|x0 (x0 need not be one of the pre-chosen xi)
⇒ a single value y0 of the r.v. Y0 at x = x0

Mean response

(1 − α) C.I. for μY|x0:

ŷ0 ± tα/2 · s·√(1/n + (x0 − x̄)²/Sxx), with tα/2 based on n − 2 degrees of freedom

See Example 11.6 & Figure 11.11
Example 11.6
Prediction interval for a single response y0

(1 − α) prediction interval for a single response y0:

ŷ0 ± tα/2 · s·√(1 + 1/n + (x0 − x̄)²/Sxx), with tα/2 based on n − 2 degrees of freedom

See Example 11.7; a Python sketch of both intervals follows below.
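A sketch computing both the confidence interval for the mean response and the prediction interval for a single response at a new point x0. The data are the same illustrative values as before, and x0 = 3.5 is an arbitrary choice:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

x0 = 3.5                 # new point; need not be one of the observed xi
y0_hat = a + b * x0
t_crit = stats.t.ppf(0.975, df=n - 2)

# C.I. for the mean response at x0
half_ci = t_crit * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / Sxx)
# prediction interval for a single response at x0 (note the extra 1 under the root)
half_pi = t_crit * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx)

print("CI:", (y0_hat - half_ci, y0_hat + half_ci))
print("PI:", (y0_hat - half_pi, y0_hat + half_pi))

The prediction interval is always wider than the confidence interval, since it must cover the variability of an individual observation as well as the uncertainty in the fitted line.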
Example 11.7
Figure 11.12
ANalysis-Of-VAriance (ANOVA) approach

⇒ partition the total corrected sum of squares of y into 2 components:

SST = SSR + SSE

SST: total corrected sum of squares (variation that ideally would be explained by the model)
SSR: regression sum of squares (variation explained by the model)
SSE: error sum of squares (variation unexplained; pure error)
ANOVA Table (see Figure 11.14)

Source of variation   Sum of sq   dof     Mean sq             f computed
Regression            SSR         1       SSR/1               SSR/s²
Error                 SSE         n − 2   SSE/(n − 2) = s²
Total corrected       SST         n − 1

Under the null hypothesis, f = (SSR/1)/(SSE/(n − 2)) follows an F-distribution with (1, n − 2) degrees of freedom.

If fcomputed > fα(1, n − 2) ⇒ β ≠ 0. Note: this is the same as (tα/2)² for testing the slope. A Python sketch follows below.
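A sketch of the ANOVA computation and the F test, using the same illustrative data; it also prints R², which is defined on the next slide:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
y_hat = a + b * x

SST = np.sum((y - y.mean()) ** 2)   # total corrected sum of squares
SSE = np.sum((y - y_hat) ** 2)      # error sum of squares
SSR = SST - SSE                     # regression sum of squares

f_computed = (SSR / 1) / (SSE / (n - 2))
f_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)   # alpha = 0.05
verdict = "reject H0: beta = 0" if f_computed > f_crit else "do not reject H0"
print(f"f = {f_computed:.3f}, f_crit = {f_crit:.3f} -> {verdict}")

R2 = 1 - SSE / SST   # coefficient of determination
print(f"R^2 = {R2:.4f}")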
Coefficient of Determination, R²

⇒ R² as a measure of quality of fit?

R² = 1 − (SSE/SST)

a large R² generally means a good linear fit;
a small R² generally means a poor linear fit

Note: see Figure 11.10
The Regression Picture

ŷi = βxi + α (fitted line)

For each observation, let A be the distance from yi to the naïve mean ȳ, B the distance from the fitted line to ȳ, and C the distance from yi to the fitted line. Least squares estimation gave us the line that minimized the C² term:

Σ(i=1..n) (yi − ȳ)² = Σ(i=1..n) (ŷi − ȳ)² + Σ(i=1..n) (ŷi − yi)²
  A² = SStotal          B² = SSreg            C² = SSresidual

SStotal: total squared distance of the observations from the naïve mean of y (total variation)
SSreg: distance from the regression line to the naïve mean of y (variability due to x, the regression)
SSresidual: variance around the regression line (additional variability not explained by x; what the least squares method aims to minimize)

R² = SSreg/SStotal
Lack of Fit and Repeated Observations

see Figure 11.17 and Table 11.3

SSE may comprise 2 components:
⇒ pure experimental error
⇒ lack of fit

• lack of fit affects the adequacy of the linear model (Fig 11.17)
• the lack-of-fit component can be tested by having repeated observations (Table 11.3)

Table 11.3
Data Plots and Transformations

see Figure 11.19 and Table 11.6

transformations ⇒ allow a variable to enter the model in a non-linear way

non-linear: yi = α·e^(βxi)·εi
linear: ln(yi) = ln α + βxi + ln εi

Notes:
(1) the error structures in the transformed and original variables may differ (e.g. from multiplicative to additive)
(2) if the y's are transformed, use the values of the residuals in the metric of the untransformed response when comparing between models

see Example 11.9 and Figure 11.20; a Python sketch follows below.
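A sketch of fitting the exponential model via the log transformation. The data here are synthetic, generated to follow the multiplicative-error model above; they are not the data of Example 11.9:

import numpy as np

rng = np.random.default_rng(0)

# synthetic data from y = alpha * exp(beta * x) * eps, with multiplicative error
alpha_true, beta_true = 2.0, 0.3
x = np.linspace(1.0, 10.0, 20)
y = alpha_true * np.exp(beta_true * x) * np.exp(rng.normal(0.0, 0.1, x.size))

# transform: ln(y) = ln(alpha) + beta * x + ln(eps), then fit by least squares
beta_hat, ln_alpha_hat = np.polyfit(x, np.log(y), 1)  # returns [slope, intercept]
alpha_hat = np.exp(ln_alpha_hat)
print(f"alpha ~= {alpha_hat:.3f}, beta ~= {beta_hat:.3f}")

# Note (2): compare models using residuals in the original (untransformed) metric
residuals = y - alpha_hat * np.exp(beta_hat * x)
print("SSE in the original metric:", np.sum(residuals ** 2))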
Diagnostic plots of residuals – graphical detection of violations of assumptions

observed residuals: ei = yi − ŷi

• assumption of homogeneous variance – see Figures 11.21 and 11.22
• assumption of normality – normal probability plotting: see case study
• case study: Table 11.8, Figures 11.23-11.27

A plotting sketch follows below.
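A minimal sketch of the two diagnostic plots, using matplotlib and scipy with the same illustrative data as before:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

b, a = np.polyfit(x, y, 1)   # slope, intercept
e = y - (a + b * x)          # observed residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# residuals vs fitted values: look for non-constant spread (heterogeneous variance)
ax1.scatter(a + b * x, e)
ax1.axhline(0.0, linestyle="--")
ax1.set_xlabel("fitted values")
ax1.set_ylabel("residuals")

# normal probability (Q-Q) plot: points near the line support the normality assumption
stats.probplot(e, dist="norm", plot=ax2)

plt.tight_layout()
plt.show()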
Correlation

• a measure of the strength of association between 2 random variables
• population correlation coefficient, ρ ⇒ strength of the linear association between 2 variables
• sample correlation coefficient, r ⇒ see Figure 11.28:

r = Sxy/√(Sxx·Syy)

• sample coefficient of determination, r² ⇒ the amount of variation accounted for by the linear relationship: r² = SSR/SST
• test H0: ρ = 0 using t = r·√(n − 2)/√(1 − r²), with n − 2 degrees of freedom

see Example 11.11; a Python sketch follows below.
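A sketch of the correlation test with the same illustrative data; scipy.stats.pearsonr returns both r and the p-value for H0: ρ = 0, and the t statistic above reproduces that p-value by hand:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
r, p_value = stats.pearsonr(x, y)   # sample correlation and p-value for H0: rho = 0
print(f"r = {r:.4f}, r^2 = {r**2:.4f}, p = {p_value:.4g}")

# the same test by hand, via the t statistic
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
print("t =", t_stat, " p =", 2 * stats.t.sf(abs(t_stat), df=n - 2))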
Example 11.11
