
HASTS112/HSTS112 REGRESSION ANALYSIS AND ANOVA

CHAPTER 5: MULTIPLE LINEAR REGRESSION

Introduction
Up to now, we have been dealing with regression relationships in which only
two variables were involved: one dependent variable and one independent
variable. Multiple linear regression analysis is merely an extension of simple
linear regression; the difference is that more than one independent variable
is involved in the relationship.
The multiple linear regression model is
\[
y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_{p-1} X_{p-1,i} + \varepsilon_i
\]

The term linear is used because the model is a linear function of the unknown
parameters, the $\beta$'s.

Example: Yield ($Y$) depends on many variables, for example the amount of
rainfall and the amount of fertilizer. Our model will be of the form
\[
y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i
\]
that is, a multiple linear regression model with two regressor/independent
variables ($X_{1i}$ being the amount of rainfall and $X_{2i}$ being the amount
of fertilizer).

Estimation of Parameters
There are many ways of estimating the parameters in a regression model.
However, in this course we shall focus attention on the Least Squares (LS)
method. There are two ways to apply the LS method:

(i) Estimation by substitution.

(ii) The matrix approach.

However, with multiple linear regression, the matrix approach seems to be
more appropriate.

In matrix notation, our model is given by
\[
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\]
where
\[
\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix}
1 & x_{1,1} & \dots & x_{p-1,1} \\
1 & x_{1,2} & \dots & x_{p-1,2} \\
\vdots & \vdots & & \vdots \\
1 & x_{1,n} & \dots & x_{p-1,n}
\end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}
\quad \text{and} \quad
\boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]
Consequently, the random vector $\mathbf{Y}$ has expectation
\[
E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\beta}
\]

The fitted values are obtained from
\[
\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}
\]
There is a standard formula we use to estimate the vector $\boldsymbol{\beta}$, namely
\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}
\]
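As an illustration, this formula can be computed directly. The following is a
minimal sketch in Python with NumPy; the rainfall/fertilizer numbers are made
up purely for illustration and are not part of the course material.

```python
import numpy as np

# Illustrative (made-up) data: yield depending on rainfall and fertilizer.
rainfall = np.array([2.1, 3.4, 1.8, 4.2, 3.0])
fertilizer = np.array([10.0, 12.5, 9.0, 15.0, 11.0])
y = np.array([4.0, 6.1, 3.5, 7.8, 5.6])

n = len(y)
# Design matrix X: a column of ones (intercept), then one column per regressor.
X = np.column_stack([np.ones(n), rainfall, fertilizer])

# beta_hat = (X^T X)^{-1} X^T Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Fitted values: Y_hat = X beta_hat
y_hat = X @ beta_hat
print("beta_hat =", beta_hat)
```

In practice one would use `np.linalg.solve(X.T @ X, X.T @ y)` or
`np.linalg.lstsq` rather than forming the inverse explicitly, for numerical
stability; the explicit inverse is used here only to mirror the formula above.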

Hypothesis Testing on the Parameters

We have hypotheses of the form

(A) $H_0: \beta_i = b$ versus $H_1: \beta_i \neq b$, for $i = 0, 1, \dots, p-1$

(B) $H_0: \beta_i \geq b$ versus $H_1: \beta_i < b$, for $i = 0, 1, \dots, p-1$

(C) $H_0: \beta_i \leq b$ versus $H_1: \beta_i > b$, for $i = 0, 1, \dots, p-1$

The estimate of the variance of $\hat{\beta}_i$ is given by
\[
\widehat{Var}(\hat{\beta}_i) = s^2 \times \left[(i+1), (i+1)\right]\text{th element of } (\mathbf{X}^T\mathbf{X})^{-1}
\]
where
\[
s^2 = \frac{SSE}{n-p} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-p}
\]
It then follows that, for the above hypotheses,

(A) We reject $H_0$ if
\[
|t| = \frac{|\hat{\beta}_i - b|}{\sqrt{\widehat{Var}(\hat{\beta}_i)}} > t_{\alpha/2}(n-p)
\]

(B) We reject $H_0$ if
\[
t = \frac{\hat{\beta}_i - b}{\sqrt{\widehat{Var}(\hat{\beta}_i)}} < -t_{\alpha}(n-p)
\]

(C) We reject $H_0$ if
\[
t = \frac{\hat{\beta}_i - b}{\sqrt{\widehat{Var}(\hat{\beta}_i)}} > t_{\alpha}(n-p)
\]
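A sketch of how these t-tests could be carried out numerically, reusing
`X`, `y`, `beta_hat` and `y_hat` from the estimation sketch above; the
function name and interface are ours, not prescribed by the notes.

```python
import numpy as np
from scipy import stats

def t_test_coefficient(X, y, beta_hat, y_hat, i, b=0.0, alpha=0.05):
    """Two-sided t-test of H0: beta_i = b (case (A) above)."""
    n, p = X.shape
    s2 = np.sum((y - y_hat) ** 2) / (n - p)        # s^2 = SSE / (n - p)
    xtx_inv = np.linalg.inv(X.T @ X)
    var_beta_i = s2 * xtx_inv[i, i]                # (i+1),(i+1)th element, 0-indexed
    t = (beta_hat[i] - b) / np.sqrt(var_beta_i)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)  # t_{alpha/2}(n - p)
    return t, t_crit, abs(t) > t_crit
```

The one-sided tests (B) and (C) follow the same pattern, comparing $t$ with
$\mp t_{\alpha}(n-p)$ instead.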

Analysis of Variance Approach to Multiple Linear Regression
Analysis of Variance (ANOVA) is a highly useful and flexible mode of analysis
for regression models. We will use ANOVA to compute $s^2 = \frac{SSE}{n-p}$
(an estimate of $\sigma^2$) and to check whether there is a regression
relationship.

Partitioning the Total Sum of Squares


\[
SST = SSR + SSE
\]
The matrix approach puts this as
\[
SST = \mathbf{Y}^T\mathbf{Y} - n\bar{y}^2
\]
\[
SSR = \hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{Y} - n\bar{y}^2
\]
and
\[
SSE = \mathbf{Y}^T\mathbf{Y} - \hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{Y}
\]
Once $\boldsymbol{\beta}$ has been estimated, the sums of squares can be easily computed, as in the sketch below.
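A minimal sketch of the matrix formulas (the helper name is ours), continuing
with `X`, `y` and `beta_hat` from the earlier estimation sketch:

```python
import numpy as np

def sums_of_squares(X, y, beta_hat):
    """SST, SSR and SSE via the matrix formulas above."""
    n = len(y)
    ybar = y.mean()
    sst = y @ y - n * ybar**2               # SST = Y'Y - n*ybar^2
    ssr = beta_hat @ X.T @ y - n * ybar**2  # SSR = beta_hat' X'Y - n*ybar^2
    sse = y @ y - beta_hat @ X.T @ y        # SSE = Y'Y - beta_hat' X'Y
    return sst, ssr, sse
```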

Partitioning the degrees of freedom


SST has n − 1 degrees of freedom
SSR has p − 1 degrees of freedom
SSE has n − p degrees of freedom

Mean squares
A sum of squares divided by its degrees of freedom is called a mean square.
The two important mean squares are the Regression Mean Square (MSR) and the
Error Mean Square (MSE), and these are given by
\[
MSR = \frac{SSR}{p-1}, \qquad MSE = \frac{SSE}{n-p}
\]

The Basic ANOVA table

Source of Variation    SS     d.f.     MS     F
Regression             SSR    p − 1    MSR    F = MSR/MSE
Error                  SSE    n − p    MSE
Total                  SST    n − 1

To test for the significance of the regression, our hypotheses are of the
form
\[
H_0: \beta_1 = \beta_2 = \dots = \beta_{p-1} = 0
\]
\[
H_1: \beta_i \neq 0 \text{ for at least one } i
\]
at the $\alpha$ significance level.

Test Statistic: the F-ratio

Rejection Criterion: Reject $H_0$ if $F > F_{\alpha}(p-1, n-p)$

If we reject $H_0$, we go on to test the significance of each of the
parameters, to find out which variables led to the rejection of the null
hypothesis. A sketch of the overall F-test follows.
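The whole ANOVA F-test translates into a few lines of code, reusing the
`sums_of_squares` helper defined earlier; again, the function and its
interface are illustrative, not prescribed by the notes.

```python
from scipy import stats

def f_test(X, y, beta_hat, alpha=0.05):
    """Overall test of H0: beta_1 = ... = beta_{p-1} = 0 via the F-ratio."""
    n, p = X.shape
    sst, ssr, sse = sums_of_squares(X, y, beta_hat)
    msr = ssr / (p - 1)   # MSR = SSR / (p - 1)
    mse = sse / (n - p)   # MSE = SSE / (n - p)
    f = msr / mse
    f_crit = stats.f.ppf(1 - alpha, dfn=p - 1, dfd=n - p)  # F_alpha(p-1, n-p)
    return f, f_crit, f > f_crit
```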

Exercise 5.1
Consider the following data set, where $Y$ is the dependent variable and
$X_1$ and $X_2$ are the regressors:

Y     4.1   8.5   5.2   9.6   8.7
X1    2.5   3.7   2.6   5.5   4.0
X2    3.5   4.4   3.9   4.3   4.9

Suppose the data can be described by the model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$,
where $\varepsilon_i \sim N(0, \sigma^2)$ and $Cov(\varepsilon_i, \varepsilon_j) = 0$ if $i \neq j$.

(a) Express the above model in matrix form.

(b) Find the least squares estimator of $\boldsymbol{\beta}$, given that
\[
(\mathbf{X}^T\mathbf{X})^{-1} = \begin{pmatrix}
17.2124 & 0.5764 & -4.5529 \\
0.5764 & 0.2632 & -0.3666 \\
-4.5529 & -0.3666 & 1.4035
\end{pmatrix}
\]

(c) Construct the ANOVA table and test for the significance of the regression
using $\alpha = 0.05$.

(d) Test the hypothesis $H_0: \beta_0 = 0$ versus $H_1: \beta_0 \neq 0$ at $\alpha = 0.05$.

(e) Estimate $y$ at $X_1 = 3$ and $X_2 = 4.5$.
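If you want to check your hand computations, the exercise data plug directly
into the sketches above; a possible starting point (the numerical work is
left to you):

```python
import numpy as np

# Data from Exercise 5.1.
y  = np.array([4.1, 8.5, 5.2, 9.6, 8.7])
x1 = np.array([2.5, 3.7, 2.6, 5.5, 4.0])
x2 = np.array([3.5, 4.4, 3.9, 4.3, 4.9])

# (a) Matrix form: X has a column of ones, then x1 and x2.
X = np.column_stack([np.ones(len(y)), x1, x2])

# (b) Least squares estimate (compare with the given inverse).
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# (e) Point estimate of y at X1 = 3, X2 = 4.5.
x_new = np.array([1.0, 3.0, 4.5])
y_pred = x_new @ beta_hat
```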
