Chapter Three

• MULTIPLE LINEAR REGRESSION MODEL
  – Introduction: extension of simple regression analysis
  – Assumptions
  – OLS Estimation
    • OLS exercise
  – Hypothesis Testing and Inferences
  – Coefficient of determination and test of model adequacy

Tasew T.(PhD)
3.1 Introduction

• In the simple linear regression model, we have seen the basic statistical tools and procedures for analyzing relationships between two variables.
• But in practice, economic models generally contain one dependent variable and two or more independent variables. Such models are called multiple regression models.
• Examples:
  – In the theory of demand, we study the relationship between the demand for a good (Y) and the price of the good (X2), the prices of substitute goods (X3), and the consumer’s income (X4).
  – Production function: Y = f(K, L)
Introduction to the model

• Suppose we are interested in establishing a relationship between output (Q) and labor input (L) and capital input (K). This entails a multiple regression model of the form above, with two independent variables.
• In general, the relationship is estimated by a multiple linear regression equation (model) of the form:

  Y = β1 + β2X2 + β3X3 + β4X4 + U

• Here, Y is the dependent variable and X2, X3 and X4 are the explanatory (independent) variables.
• Multiple regression analysis is used for testing hypotheses about the relationship between a dependent variable Y and two or more independent variables X, and for prediction.
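As a quick illustration, here is a minimal Python sketch of estimating such an equation by OLS. It uses numpy and statsmodels; the data and coefficients are made up purely for illustration and are not from the chapter's example.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: Y and three regressors X2, X3, X4 (made-up numbers).
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))                  # columns play the roles of X2, X3, X4
Y = 10 - 2.0 * X[:, 0] + 1.5 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(size=n)

X_design = sm.add_constant(X)                # prepends the intercept column
results = sm.OLS(Y, X_design).fit()          # ordinary least squares
print(results.params)                        # b1_hat, b2_hat, b3_hat, b4_hat
print(results.summary())                     # t-ratios, F-statistic, R-squared
```

The summary output reports the t-ratios, overall F-statistic, and R² discussed later in this chapter.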
Intro...

• Multiple regression allows us to:
  1. Disentangle and examine the separate effects of the independent variables on the dependent variable.
  2. Use multiple variables to predict Y.
  3. Assess the combined effects of the independent variables on Y.
Intro…

Previously, the bivariate regression equation was:

  Y = β1 + β2X + u

In the multivariate case, the regression equation becomes:

  Y = β1 + β2X2 + β3X3 + … + βkXk + u
3.2 Model Assumptions

• Model: Y = β1 + β2X2 + β3X3 + … + βkXk + u
• Dependent variable: Y, of size n×1.
• Independent (explanatory) variables: X2, X3, . . ., Xk, each of size n×1.

Assumptions:
• All the assumptions we made in chapter two work here.
• The only additional assumption is that there is no multicollinearity: no exact linear relationship exists between any of the explanatory variables (or, at most, minimal multicollinearity among the independent variables), i.e., no linear dependence between the regressors.
Assumption of no multicollinearity:

• Multicollinearity means the existence of a “perfect,” or “exact,” linear relationship among some or all explanatory variables included in the model.
• No multicollinearity means none of the regressors can be written as an exact linear combination of the remaining regressors in the model.
• No perfect collinearity, i.e., Xk ≠ λXj for any constant λ.
3.3 OLS Estimation of Parameters

• The k-variable population regression function involving the dependent variable Y and k−1 explanatory variables X2, X3, ..., Xk may be written as:

  Yi = β1 + β2X2i + β3X3i + … + βkXki + Ui

• The three-variable linear model:
• Here we will consider a model with two explanatory variables:

  Yi = β1 + β2X2i + β3X3i + Ui , where

• Y is the dependent variable, X2 and X3 are the explanatory variables (or regressors),
• U is the stochastic disturbance term, and i indexes the ith observation; in case the data are time series, the subscript t will denote the tth observation.
• β1 is the intercept term. As usual, it gives the mean or average effect on Y of all the variables excluded from the model, although its mechanical interpretation is the average value of Y when X2 and X3 are set equal to zero.
Estimation of parameters...

• The coefficients β2 and β3 are called the partial regression/slope coefficients.
• The partial regression coefficients are interpreted as representing the partial effect of the given explanatory variable on the explained variable, after holding constant, or eliminating the effect of, all other explanatory variables.
• In other words, each coefficient measures the average change in the dependent variable per unit change in a given independent variable, holding all other independent variables constant at their average values.
• For example, β̂2 measures the effect of X2 on Y after eliminating the effects of X3.
  – It gives the “direct” or “net” effect of a unit change in X2 on the mean value of Y, net of any effect that X3 may have on mean Y.
• Likewise, β3 measures the change in the mean value of Y per unit change in X3, holding the value of X2 constant.
Cont’d.

The matrix form of the above is:

  Y = Xβ + u

where Y is the n×1 vector of observations on the dependent variable, X is the n×k matrix of regressors (with a first column of ones for the intercept), β is the k×1 coefficient vector, and u is the n×1 disturbance vector.
Derive OLS estimators of multiple regression

  Y = β̂1 + β̂2X2 + β̂3X3 + û

  û = Y − β̂1 − β̂2X2 − β̂3X3

OLS minimizes the residual sum of squares (Σû²):

  min RSS = min Σû² = min Σ(Y − β̂1 − β̂2X2 − β̂3X3)²

Setting the partial derivatives to zero:

  ∂RSS/∂β̂1 = 2Σ(Y − β̂1 − β̂2X2 − β̂3X3)(−1) = 0
  ∂RSS/∂β̂2 = 2Σ(Y − β̂1 − β̂2X2 − β̂3X3)(−X2) = 0
  ∂RSS/∂β̂3 = 2Σ(Y − β̂1 − β̂2X2 − β̂3X3)(−X3) = 0

Rearranging the three equations gives the normal equations:

  nβ̂1 + β̂2ΣX2 + β̂3ΣX3 = ΣY                 (1)
  β̂1ΣX2 + β̂2ΣX2² + β̂3ΣX2X3 = ΣX2Y          (2)
  β̂1ΣX3 + β̂2ΣX2X3 + β̂3ΣX3² = ΣX3Y          (3)

Rewritten in matrix form (three-variable case):

  | n      ΣX2     ΣX3   |  | β̂1 |     | ΣY   |
  | ΣX2    ΣX2²    ΣX2X3 |  | β̂2 |  =  | ΣX2Y |
  | ΣX3    ΣX2X3   ΣX3²  |  | β̂3 |     | ΣX3Y |

or, in matrix notation, (X′X)β̂ = X′Y.
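The normal equations can be solved directly. Below is a minimal numpy sketch; the data arrays for Y, X2 and X3 are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

# Hypothetical data, for illustration only.
Y  = np.array([40., 44., 46., 48., 52., 58., 60., 68., 74., 80.])
X2 = np.array([ 6., 10., 12., 14., 16., 18., 22., 24., 26., 32.])
X3 = np.array([ 4.,  4.,  5.,  7.,  9., 12., 14., 20., 21., 24.])

X = np.column_stack([np.ones_like(X2), X2, X3])  # n x 3 design matrix
XtX = X.T @ X          # the 3x3 matrix of sums shown above
XtY = X.T @ Y          # the right-hand-side vector
beta_hat = np.linalg.solve(XtX, XtY)             # solves (X'X) b = X'Y
print(beta_hat)        # [b1_hat, b2_hat, b3_hat]
```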
OLS...

• Following the convention of letting lowercase letters denote deviations from sample mean values, one can derive the following formulas from the normal equations (2) and (3).

Rewritten in deviation form:

  | Σx2²    Σx2x3 |  | β̂2 |     | Σx2y |
  | Σx2x3   Σx3²  |  | β̂3 |  =  | Σx3y |
Cramer’s rule applied to the deviation-form system gives:

  β̂2 = [(Σyx2)(Σx3²) − (Σyx3)(Σx2x3)] / [(Σx2²)(Σx3²) − (Σx2x3)²]

  β̂3 = [(Σyx3)(Σx2²) − (Σyx2)(Σx2x3)] / [(Σx2²)(Σx3²) − (Σx2x3)²]

and the intercept follows from:

  β̂1 = Ȳ − β̂2X̄2 − β̂3X̄3
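The deviation-form formulas translate line for line into code. A sketch, reusing the same hypothetical data as above:

```python
import numpy as np

# Hypothetical data, for illustration only.
Y  = np.array([40., 44., 46., 48., 52., 58., 60., 68., 74., 80.])
X2 = np.array([ 6., 10., 12., 14., 16., 18., 22., 24., 26., 32.])
X3 = np.array([ 4.,  4.,  5.,  7.,  9., 12., 14., 20., 21., 24.])

# lowercase letters: deviations from sample means
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

d  = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2            # common denominator
b2 = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / d
b3 = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / d
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()
print(b1, b2, b3)   # agrees with solving the full normal equations
```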
Example: House Price Results
Dependent variable: Canadian dollars per month

Variable      Coefficient   t-ratio   A priori sign expected
Intercept        282.21       56.09          +
LnAGE            -53.10      -59.71          -
NBROOMS           48.47      104.81          +
AREABYRM           3.97       29.99          +
ELEVATOR          88.51       45.04          +
BASEMENT         -15.90      -11.32          -
OUTPARK            7.17        7.07          +
INDPARK           73.76       31.25          +
NOLEASE          -16.99       -7.62          -
LnDISTCBD          5.84        4.60          -
SINGLPAR          -4.27      -38.88          -
DSHOPCNTR        -10.04       -5.97          -
VACDIFF1           0.29        5.98          -

Notes: Adjusted R² = 0.651; regression F-statistic = 2082.27. Source: Des Rosiers and Thériault (1996). Reprinted with permission of the American Real Estate Society.
Example 1: Estimate the regression parameters using the following data

[Data table not reproduced: observations on Y, X2 and X3, together with the computed sums Σyx2, Σyx3, Σx2x3, Σx2² and Σx3².]
Example…

Hence, the estimated model is:

  Ŷ = β̂1 + 0.65X2 + 1.11X3

where X2 and X3 are fertilizer (kg) and pesticide used for corn production (Y), the slope estimates are taken from the computations below, and β̂1 = Ȳ − β̂2X̄2 − β̂3X̄3.

Interpretation of the partial slope coefficients:
• β̂2 = 0.65: holding pesticide use constant, a one-kg increase in fertilizer raises mean corn production by 0.65 units.
• β̂3 = 1.11: holding fertilizer use constant, a one-unit increase in pesticide raises mean corn production by 1.11 units.
3.4 Variances and standard errors of estimators

• We need the standard errors for two main purposes:
  – to establish confidence intervals, and
  – to test statistical hypotheses.
• The relevant formulas are as follows:
The variances of the estimated regression coefficients are estimated respectively as follows. Unbiased estimates of the variances are given by:

  Var(β̂2) = σ̂² Σx3² / [(Σx2²)(Σx3²) − (Σx2x3)²]
  Var(β̂3) = σ̂² Σx2² / [(Σx2²)(Σx3²) − (Σx2x3)²]

Alternatively, they can be computed as:

  Var(β̂2) = σ̂² / [Σx2² (1 − r23²)]
  Var(β̂3) = σ̂² / [Σx3² (1 − r23²)]

• where r23 is the coefficient of correlation between X2 and X3.
• Taking the square roots, we obtain the standard errors of the estimated regression coefficients.
• An unbiased estimator of the variance of the errors σ² is: σ̂² = ESS/(n − 3), where ESS is the error (residual) sum of squares.
Example: statistics to compute the residual sum of squares.
To compute the variances and standard errors of the estimates, we need to estimate the error variance:

  σ̂² = Σû² / (n − 3)

Alternatively, we can use this short-hand formula:

  Σû² = Σy² − β̂2Σyx2 − β̂3Σyx3
Example: variances and standard errors of estimates

  Var(β̂2): se(β̂2) = 0.2449
  Var(β̂3): se(β̂3) = 0.2645
  Residual variance: σ̂² = 13.67/7 = 1.952
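A minimal sketch of these variance formulas in code, again with the hypothetical data from the earlier sketches (the chapter's actual data table is not reproduced here):

```python
import numpy as np

# Hypothetical data, for illustration only.
Y  = np.array([40., 44., 46., 48., 52., 58., 60., 68., 74., 80.])
X2 = np.array([ 6., 10., 12., 14., 16., 18., 22., 24., 26., 32.])
X3 = np.array([ 4.,  4.,  5.,  7.,  9., 12., 14., 20., 21., 24.])
n, k = len(Y), 3

# OLS fit via the normal equations
X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.solve(X.T @ X, X.T @ Y)

u = Y - X @ b                       # residuals
sigma2 = (u @ u) / (n - k)          # unbiased error variance: ESS/(n-3)

# deviation form and the r23-based variance formulas
x2, x3 = X2 - X2.mean(), X3 - X3.mean()
r23 = (x2 @ x3) / np.sqrt((x2 @ x2) * (x3 @ x3))
se_b2 = np.sqrt(sigma2 / ((x2 @ x2) * (1 - r23 ** 2)))
se_b3 = np.sqrt(sigma2 / ((x3 @ x3) * (1 - r23 ** 2)))
print(se_b2, se_b3)                 # standard errors of b2_hat and b3_hat
```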
3.5 Hypothesis Testing and Inferences

Hypothesis testing about:
1. the significance of individual partial regression coefficients, and
2. the overall significance of the regression parameters.

• Hypothesis testing about an individual partial regression coefficient can be conducted using the t-statistic as usual.
• To test whether each coefficient is significant or not, the null and alternative hypotheses are:
  – H0: βj = 0
  – Ha: βj ≠ 0, for j = 2, 3.
• The test statistic (t-calculated) is: tj = β̂j / se(β̂j)
• Decision rule: if |tj| > tα/2, (n−3) df, we reject H0 and conclude that βj is significant, that is, the regressor variable Xj, j = 2, 3, …, k, significantly affects the dependent variable Y.

1. Individual partial coefficient test at the 5% level of significance

(1) Holding X3 constant, does X2 have an effect on Y?

  H0: β2 = 0   (∂Y/∂X2 = β2 = 0?)
  H1: β2 ≠ 0

  t2 = (β̂2 − 0) / se(β̂2) = 0.65/0.24 = 2.7

Compare with the critical value t0.025, 7 = 2.365 (two-tailed test at the 5% level).

Since |t| > tc ==> reject H0.

Answer: Yes, β̂2 is statistically significant, i.e., significantly different from zero. From our example, we reject the null hypothesis and conclude that applying fertilizer significantly affects corn production at the 5% level of significance.
Individual partial coefficient test (cont.)

(2) Holding X2 constant, does X3 have an effect on Y? That is, does pesticide significantly affect corn production?

  H0: β3 = 0   (∂Y/∂X3 = β3 = 0?)
  H1: β3 ≠ 0

  t3 = (β̂3 − 0) / se(β̂3) = 1.11/0.2645 ≈ 4.2

Critical value: t0.025, 7 = 2.365

Since |t| > |tc| ==> reject H0.

Answer: Yes, β̂3 is statistically significant, i.e., significantly different from zero. We reject the null hypothesis and conclude that using pesticide significantly affects corn production at the 5% level of significance.
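Both t-tests can be reproduced in a few lines. A sketch using the slide's estimates and scipy for the critical value:

```python
from scipy import stats

# Estimates and standard errors from the example above.
b2, se2 = 0.65, 0.2449
b3, se3 = 1.11, 0.2645
n, k = 10, 3                                    # sample size, parameters

t2 = b2 / se2                                   # ~ 2.65 (2.7 with se rounded to 0.24)
t3 = b3 / se3                                   # ~ 4.20
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - k)    # two-tailed 5%: ~ 2.365

for name, t in [("beta2", t2), ("beta3", t3)]:
    verdict = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(name, round(t, 2), verdict)
```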
2. Testing the overall significance of the multiple regression: test of model adequacy

• In multiple regression, one can test for the overall significance of a sample regression by using the F-statistic.
• It tests the overall significance of the regression parameters.
• A test for the significance of R², or a test of model adequacy, is accomplished by testing the hypotheses:
  – H0: β2 = β3 = β4 = … = βk = 0 (all variables have zero effect)
  – Ha: H0 is not true (at least one of the coefficients is non-zero)
• The test statistic is given by:
  – Fcal = (RSS/(k−1)) / (ESS/(n−k)), where RSS is the regression sum of squares and ESS is the error sum of squares.
• In our case, Fcal = (RSS/(3−1)) / (ESS/(n−3)), where k is the number of parameters estimated from the sample data (k = 3 in our case, since we estimate β1, β2, and β3) and n is the sample size.
• The model is adequate in explaining the relationship between the dependent variable and one or more of the independent variables if: Fcal > Fα(k−1, n−k)
2. Testing overall significance of the multiple regression

3-variable case: Y = β1 + β2X2 + β3X3 + u

  H0: β2 = 0, β3 = 0 (all variables have zero effect)
  H1: β2 ≠ 0 or β3 ≠ 0 (at least one variable has an effect)

Steps for testing the overall significance of the multiple regression:
1. Compute the F-statistic.
2. Look up the critical value Fc = Fα(k−1, n−k).
3. Compare F and Fc: if F > Fc ==> reject H0.
From the previous example:

• The test statistic is given by:
  – Fcal = (RSS/(k−1)) / (ESS/(n−k))
  – RSS = TSS − ESS = 1634 − 13.67 = 1620.33
  – F = (1620.33/(3−1)) / (13.67/(10−3)) ≈ 414.9
• For α = 0.01, F0.01(2, 7) = 9.55
• For α = 0.05, F0.05(2, 7) = 4.74
• Since the test statistic is greater than both tabulated values, the ratio is significant at the conventional levels of significance (1% and 5%).
• Thus, we reject the null hypothesis and conclude that the model is adequate, that is, variation (change) in corn production is significantly attributed to the effect of fertilizer and/or pesticide application.
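The same F-test, sketched in code using the slide's sums of squares and scipy for the critical values:

```python
from scipy import stats

# Sums of squares from the example above.
TSS, ESS = 1634.0, 13.67          # total and error (residual) sums of squares
n, k = 10, 3
RSS = TSS - ESS                   # regression sum of squares

F = (RSS / (k - 1)) / (ESS / (n - k))               # ~ 415
for alpha in (0.05, 0.01):
    F_crit = stats.f.ppf(1 - alpha, k - 1, n - k)   # 4.74 and 9.55
    verdict = "reject H0" if F > F_crit else "fail to reject H0"
    print(alpha, round(F, 1), verdict)
```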
3.6 The coefficient of determination (R²) and adjusted R²

• The coefficient of determination R² can be calculated as usual as RSS/TSS, where RSS is the regression sum of squares and TSS is the total sum of squares.

  R² = RSS/TSS = 1 − ESS/TSS,   Residual SS = Σû² = TSS − RSS

• From the previous example:

  R² = 1620.33/1634 ≈ 0.992
Adjusted R²

• One of the drawbacks of R² is that it is a non-decreasing function of the number of regressors.
  – As the number of explanatory (independent) variables increases, R² may increase.
  – This implies that the goodness-of-fit of an estimated model depends on the number of independent (explanatory) variables, regardless of whether they are important or not.
• Even if the added variables have no impact on the dependent variable, R² may increase.
• To eliminate this dependency, we calculate the adjusted R² (R̄²) as:

  R̄² = 1 − (1 − R²)·(n − 1)/(n − k)
Adjusted…

• To avoid this, R² can be adjusted by accounting for the degrees of freedom.
• Unlike R², the adjusted R² may increase or decrease when new variables are added to the model.
• From the previous example:

  R̄² = 1 − (1 − 0.992)·(9/7) ≈ 0.989
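Both goodness-of-fit measures follow directly from the sums of squares. A short sketch using the slide's numbers:

```python
# R-squared and adjusted R-squared from the example's sums of squares.
TSS, ESS = 1634.0, 13.67
n, k = 10, 3

R2 = 1 - ESS / TSS                            # = RSS/TSS ~ 0.992
adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k)     # ~ 0.989
print(round(R2, 3), round(adj_R2, 3))
```

Note that adjusted R² is always below R² here because the penalty factor (n−1)/(n−k) exceeds one whenever k > 1.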
