Business Statistics & Econometrics - Regression Misspecification


The multiple regression model is built on the following assumptions.

Assumption 1: Linear relationship

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$

Assumption 2: Zero expected error

$E(\varepsilon_i) = 0$

Assumption 3: The error term variance is the same for all observations

$\mathrm{var}(\varepsilon_i) = \sigma_\varepsilon^2$ for all $i$

Assumption 4: Covariance of the error terms for any two observations is zero

$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i$ and $j$ where $i \neq j$

Assumption 5: Independence of errors and explanatory variables

$\mathrm{Cov}(x_{ji}, \varepsilon_i) = 0$ for all $i$ and $j$

Assumption 6: The error terms are normally distributed

$\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$

When the assumptions above hold, the OLS linear regression parameter estimators are BLUE (Best Linear Unbiased Estimators): they are the most efficient of all linear unbiased estimators of the parameters. But what happens when one or more of the assumptions do not hold?
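Before turning to the individual problems, a minimal baseline simulation sketch (hypothetical data; numpy and statsmodels are assumed to be available) shows OLS behaving as described when the model is correctly specified:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Regressors and an error term that satisfy the assumptions:
# zero mean, constant variance, uncorrelated across observations and with the x's.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(size=n)

# True (illustrative) model: y = 2 + 1.5*x1 - 0.8*x2 + eps
y = 2.0 + 1.5 * x1 - 0.8 * x2 + eps

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # estimates close to [2.0, 1.5, -0.8]
print(fit.bse)     # their standard errors
```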
Problem 1: Including an irrelevant variable

Suppose we formed the regression equation

$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \hat{\varepsilon}_i$

when the true model is

$y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$

What problem does this cause? Will the estimator of $\beta_1$ in the wrong model still be an unbiased estimator of the true $\beta_1$?

• The estimator is still unbiased: on average, we will be correct.


• The standard errors associated with the wrong estimates are still unbiased: they allow us to form valid confidence intervals and do hypothesis testing.

• But the estimates will be inefficient: they have a larger sampling variance. The larger the correlation between $x_1$ and $x_2$, the larger the efficiency cost (see the simulation sketch after this list).
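A minimal simulation sketch of this trade-off (hypothetical data; numpy and statsmodels assumed): the true model contains only $x_1$, but the estimated model also includes an irrelevant $x_2$ that is correlated with $x_1$. The estimate of $\beta_1$ stays centred on the truth, but its sampling variance grows.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, reps = 200, 2000
b1_correct, b1_overfit = [], []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + rng.normal(scale=0.5, size=n)  # irrelevant but correlated with x1
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)        # true model uses only x1

    b1_correct.append(sm.OLS(y, sm.add_constant(x1)).fit().params[1])
    b1_overfit.append(sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1])

# Both means are close to the true value 2.0 (unbiased),
# but the overfitted model has the larger spread (inefficient).
print(np.mean(b1_correct), np.std(b1_correct))
print(np.mean(b1_overfit), np.std(b1_overfit))
```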

Problem 2: Excluding a relevant variable (omitted variable bias)

Suppose we formed the regression equation

$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\varepsilon}_i$

when the true model is

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$

$E(\hat{\beta}_1) = \beta_1 + \beta_2 \dfrac{\sum_{i=1}^{n} x_{2i}(x_{1i} - \bar{x}_1)}{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2}$
The last term, however, is the coefficient on $x_1$ in a regression of $x_2$ on $x_1$. That is, if we drop the intercept for notational simplification and form the linear regression

$x_{2i} = b_{12} x_{1i} + v_i$

then the result above means that

$E(\hat{\beta}_1) = \beta_1 + \beta_2 b_{12}$

Hence, if $\beta_2 b_{12} \neq 0$, we get a biased estimate of the true parameter value.

Scenarios

If $b_{12} = 0$, the estimator is unbiased regardless of the value of $\beta_2$.

If $b_{12} \neq 0$, the estimator is biased, and the direction of the bias depends on the signs of the two terms: if one is positive and the other negative, the estimator will on average underestimate the true value; if both are positive or both are negative, it will on average overestimate it.
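A minimal simulation sketch (hypothetical data; numpy and statsmodels assumed) that checks $E(\hat{\beta}_1) = \beta_1 + \beta_2 b_{12}$: omitting a relevant $x_2$ that is correlated with $x_1$ shifts the average estimate of $\beta_1$ by roughly $\beta_2 b_{12}$.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, reps = 200, 2000
beta1, beta2, b12 = 2.0, 1.5, 0.6   # illustrative true coefficients and x2-on-x1 slope

b1_hat = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = b12 * x1 + rng.normal(scale=0.5, size=n)           # x2 depends on x1 with slope b12
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)  # true model includes x2
    # Misspecified regression omits x2
    b1_hat.append(sm.OLS(y, sm.add_constant(x1)).fit().params[1])

print(np.mean(b1_hat))   # roughly beta1 + beta2*b12 = 2.0 + 1.5*0.6 = 2.9
```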

Nonlinearities

Example 1: Non-constant slope functions

The regression model can be applied to nonlinear relationships. This is done by transforming the nonlinear variable. For instance,

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{1i}^2 + \varepsilon_i$

Here, let $z_i = x_{1i}^2$. Then the linear regression model is

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 z_i + \varepsilon_i$

The dependent variable responds to $x_1$ nonlinearly. The first derivative with respect to $x_1$ is

$\dfrac{dy}{dx_1} = \beta_1 + 2\beta_2 x_1$

The slope is a function of $\beta_1$ and $\beta_2$, and it changes with $x_1$; the rate of change of the slope is a function of $\beta_2$ alone. That is, the second derivative is

$\dfrac{d^2 y}{dx_1^2} = 2\beta_2$
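A minimal sketch (hypothetical data; numpy and statsmodels assumed) of fitting the quadratic specification and evaluating the slope at different values of $x_1$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
x1 = rng.uniform(0, 10, size=n)

# Illustrative true relationship: y = 5 + 2*x1 - 0.3*x1^2 + noise
y = 5 + 2 * x1 - 0.3 * x1**2 + rng.normal(size=n)

# Transformation: treat x1 and x1^2 as two "linear" regressors
X = sm.add_constant(np.column_stack([x1, x1**2]))
b0, b1, b2 = sm.OLS(y, X).fit().params

# Slope dy/dx1 = b1 + 2*b2*x1 depends on where it is evaluated
for x in (1.0, 5.0, 9.0):
    print(x, b1 + 2 * b2 * x)

# The rate of change of the slope (second derivative) is the constant 2*b2
print(2 * b2)
```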

Example 2: Cobb-Douglas production function (stochastic)

$Q = A K^{\alpha} L^{\beta} e^{\varepsilon}$

Taking the natural log

$\ln Q = \ln A + \alpha \ln K + \beta \ln L + \varepsilon$

Now, let

$y = \ln Q$

$\beta_0 = \ln A$

$x_1 = \ln K$

$x_2 = \ln L$

The transformed model is

$y_i = \beta_0 + \alpha x_{1i} + \beta x_{2i} + \varepsilon_i$
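A minimal sketch (hypothetical data; numpy and statsmodels assumed) of estimating the Cobb-Douglas parameters by running OLS on the log-transformed variables:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
A, alpha, beta = 2.0, 0.3, 0.7            # illustrative technology parameters

K = rng.uniform(1, 100, size=n)           # capital input
L = rng.uniform(1, 100, size=n)           # labour input
eps = rng.normal(scale=0.1, size=n)
Q = A * K**alpha * L**beta * np.exp(eps)  # stochastic Cobb-Douglas output

# Log transformation makes the model linear in the parameters
y = np.log(Q)
X = sm.add_constant(np.column_stack([np.log(K), np.log(L)]))
fit = sm.OLS(y, X).fit()

print(np.exp(fit.params[0]))  # estimate of A (intercept is ln A)
print(fit.params[1:])         # estimates of alpha and beta
```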

Problem: Multicollinearity

We want to estimate the model

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$

But suppose the following relation holds:

$x_{2i} = \gamma x_{1i}$

Then the model becomes

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 \gamma x_{1i} + \varepsilon_i$

$y_i = \beta_0 + (\beta_1 + \beta_2 \gamma) x_{1i} + \varepsilon_i$

We can estimate the combined slope $\beta_1 + \beta_2 \gamma$, but we cannot identify the individual effects of the two variables. Intuition: if two variables always move together, how can we tell their individual effects apart? A variable's explanatory effect on the dependent variable can be identified only if the variable varies independently at least some of the time.
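A minimal sketch (hypothetical data; numpy assumed) of why the individual effects are not identified under perfect collinearity: with $x_{2i} = \gamma x_{1i}$ exactly, the design matrix has fewer directions of independent variation than columns, so $X'X$ is singular and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(5)
n, gamma = 100, 2.0

x1 = rng.normal(size=n)
x2 = gamma * x1                       # perfectly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

print(np.linalg.matrix_rank(X))       # 2, not 3: only two independent columns
print(np.linalg.det(X.T @ X))         # (numerically) zero: X'X cannot be inverted
```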

Reasons

• A variable takes the same constant value for every observation (it is then collinear with the intercept).

• Two variables always sum to a constant, so each is a perfect linear function of the other.

Multicollinearity: a data, not a model, problem

Multicollinearity is not a failure of the model: it is simply a lack of independent variation in the data.

Hypothesis testing effect

• Multicollinearity causes large standard errors for the estimated parameters, which in turn makes the t-values smaller and leads to premature acceptance of the null hypothesis (that the variable has no effect on the dependent variable), as the sketch below illustrates.
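A minimal sketch of this effect (hypothetical data; numpy and statsmodels assumed): the same model is estimated once with an $x_2$ that is nearly a copy of $x_1$ and once with an independent $x_2$. The collinear design yields much larger standard errors and smaller t-values for both slopes.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)

x2_collinear = x1 + rng.normal(scale=0.05, size=n)  # almost identical to x1
x2_indep = rng.normal(size=n)                        # varies independently of x1

for x2 in (x2_collinear, x2_indep):
    y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(fit.bse[1:], fit.tvalues[1:])  # standard errors and t-values for x1 and x2
```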

Problem: Measurement error
