Model Specification: Precision and Bias

Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2009

April 1, 2009

The Classical Linear Model:

1. Linearity: $Y = X\beta + u$.
2. Strict exogeneity: $E(u|X) = 0$.
3. No multicollinearity: $\rho(X) = K$, w.p.1.
4. No heteroskedasticity / serial correlation: $V(u|X) = \sigma^2 I_n$.

Gauss/Markov: $\hat\beta = (X'X)^{-1}X'Y$ is best linear unbiased.
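As a quick numerical check of the OLS formula (a sketch added here, not part of the original slides; Stata's bundled auto dataset and the variables price, mpg and weight are arbitrary illustration choices), the following do-file computes $(X'X)^{-1}X'Y$ by hand in Mata and compares it with the output of regress:

sysuse auto, clear
regress price mpg weight

mata:
    y = st_data(., "price")
    X = st_data(., ("mpg", "weight")), J(st_nobs(), 1, 1)   // append the constant last, as -regress- reports it
    invsym(X'*X)*X'*y                                       // (X'X)^{-1} X'Y
end

The vector displayed by the Mata line should match the coefficients reported by regress, in the order mpg, weight, _cons.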

Result: $\hat\beta$ is unbiased, that is, $E(\hat\beta) = \beta$.

Proof:

\begin{align*}
\hat\beta &= (X'X)^{-1}X'Y \\
          &= (X'X)^{-1}X'(X\beta + u) \\
          &= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u \\
          &= \beta + (X'X)^{-1}X'u.
\end{align*}

Taking expectations conditional on $X$:

\begin{align*}
E(\hat\beta \mid X) &= \beta + (X'X)^{-1}X'E(u \mid X) \\
                    &= \beta,
\end{align*}

and hence $E(\hat\beta) = \beta$ by the law of iterated expectations.

Note the crucial role of the exogeneity assumption $E(u \mid X) = 0$.
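The result can also be illustrated by simulation. The sketch below is not part of the original slides; the data-generating process (intercept 1, slope 2, standard normal regressor and error) and the number of replications are arbitrary choices. Averaging the OLS slope over many samples drawn from a model that satisfies the assumptions gives a value very close to the true coefficient:

capture program drop olssim
program define olssim, rclass
    drop _all
    set obs 100
    generate x = rnormal()
    generate y = 1 + 2*x + rnormal()    // E(u|x) = 0 holds by construction
    regress y x
    return scalar b = _b[x]
end

set seed 507
simulate slope = r(b), reps(1000) nodots: olssim
summarize slope                         // the mean of the 1,000 slope estimates is close to 2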

Specification errors, bias and imprecision

So far we have assumed that our linear model $Y = X\beta + u$ is correct.

Consider the following case

$Y = X_1\beta_1 + X_2\beta_2 + u,$

where all classical assumptions hold and $K_1$ and $K_2$ are the numbers of columns of $X_1$ and $X_2$. Trivially, our original model corresponds to $X = [X_1 \; X_2]$, with $K = K_1 + K_2$.

Consider the following scenarios regarding $\beta_2$ and the corresponding estimation strategies:

Omission of relevant variables: $\beta_2 \neq 0$, but we wrongly proceed as if $\beta_2 = 0$, that is, we regress $Y$ on $X_1$ only.

Inclusion of irrelevant variables: $\beta_2 = 0$, but we wrongly proceed as if $\beta_2$ might be $\neq 0$, that is, we regress $Y$ on $X_1$ and $X_2$ when we could have ignored $X_2$.

Biases

Let us compare results for the estimation of $\beta_1$ in the two scenarios.

I) Omission of relevant variables

First note that in this case

$Y = X_1\beta_1 + u^*,$

with $u^* = X_2\beta_2 + u$.

Let $\tilde\beta_1 = (X_1'X_1)^{-1}X_1'Y$.

This is the OLS estimator when we regress $Y$ on $X_1$ only (omitting $X_2$).

Result: $\tilde\beta_1$ will be biased unless $X_2'X_1 = 0$.

$X_2'X_1 = 0$ implies that $X_2$ and $X_1$ are uncorrelated.

In words: the omission of relevant ($\beta_2 \neq 0$) variables leads to biases in the estimation, unless the omitted variables are uncorrelated with the included variables.

Proof:

\begin{align*}
\tilde\beta_1 &= (X_1'X_1)^{-1}X_1'Y \\
              &= (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + u) \\
              &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'u, \\
E(\tilde\beta_1 \mid X) &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'E(u \mid X) \\
              &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2.
\end{align*}

This expression is equal to $\beta_1$ if $X_1'X_2 = 0$.
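A small simulation sketch of this result (not from the slides; the data-generating process, with $\beta_1 = \beta_2 = 1$ and $x_2$ built to be correlated with $x_1$, is a hypothetical choice for illustration):

clear
set seed 507
set obs 1000
generate x1 = rnormal()
generate x2 = 0.8*x1 + rnormal()    // X1'X2 != 0 by construction
generate y  = x1 + x2 + rnormal()   // true coefficients: beta1 = 1, beta2 = 1

regress y x1                        // short regression: slope on x1 around 1.8 (biased upward)
regress y x1 x2                     // long regression: both slopes close to 1

Re-running the sketch with generate x2 = rnormal() instead, so that $x_1$ and $x_2$ are independent, makes the short-regression slope come out close to 1 as well, in line with the no-bias case $X_1'X_2 = 0$.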

II) Inclusion of Irrelevant Variables

In this case we would estimate $\beta_1$ jointly with $\beta_2$ by regressing $Y$ on $X_1$ and $X_2$, that is, $\hat\beta_1$ is a subvector of

\begin{align*}
\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X'X)^{-1}X'Y.
\end{align*}

It is important to see that the estimated model still satisfies all the classical assumptions (the true $\beta_2$ just happens to be $0$), and hence $\hat\beta_1$ will be unbiased. Why?

Variances

Result: the estimator of $\beta_1$ that omits $X_2$, i.e. $\tilde\beta_1$, has a smaller variance than $\hat\beta_1$ from the long regression.
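A sketch of the standard argument behind this result, which the slide states without proof (homoskedasticity maintained; notation: $P_2 = X_2(X_2'X_2)^{-1}X_2'$ and $M_2 = I_n - P_2$):

\begin{align*}
V(\tilde\beta_1 \mid X) = \sigma^2 (X_1'X_1)^{-1},
\qquad
V(\hat\beta_1 \mid X) = \sigma^2 (X_1'M_2X_1)^{-1},
\end{align*}

where the second expression is the Frisch-Waugh-Lovell form of the long-regression variance. Since $X_1'X_1 - X_1'M_2X_1 = X_1'P_2X_1$ is positive semidefinite, $(X_1'X_1)^{-1} \preceq (X_1'M_2X_1)^{-1}$, so every coefficient in $\tilde\beta_1$ has a (weakly) smaller variance than its counterpart in $\hat\beta_1$. Note that the first expression holds even when $\beta_2 \neq 0$, because conditional on $X$ the term $X_2\beta_2$ in $u^*$ is fixed and $V(u^* \mid X) = \sigma^2 I_n$.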

Bias-variance trade-off

To summarize:

In practice we do not know which model holds (the large one or the small one).

The trade-off: estimating a small model (omitting variables) implies a gain in precision and a likely bias. A large model is less likely to be biased but will be less efficient.

Variable omission does not necessarily lead to biases.
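A companion simulation sketch for the precision side of the trade-off (again a hypothetical data-generating process, here with $\beta_2 = 0$ and $x_2$ highly correlated with $x_1$): both regressions estimate the coefficient on $x_1$ without bias, but the short regression reports a noticeably smaller standard error.

clear
set seed 508
set obs 200
generate x1 = rnormal()
generate x2 = 0.9*x1 + rnormal()    // irrelevant but highly correlated with x1
generate y  = 1 + 2*x1 + rnormal()  // beta2 = 0: x2 does not belong in the model

regress y x1                        // smaller standard error on x1
regress y x1 x2                     // still (approximately) unbiased, but less precise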

Omitted Variable Bias: an example

Computer-generated data, but based on Appleton, French and Vanderpump ("Ignoring a Covariate: an Example of Simpson's Paradox", The American Statistician, 50, 4, 1996).

Y = risk of death.
SMOKE = consumption of cigarettes.

. reg y smoke

      Source |       SS         df      MS           Number of obs =     100
-------------+------------------------------         F( 1, 98)     =  194.34
       Model |  7613.25147       1  7613.25147       Prob > F      =  0.0000
    Residual |  3839.18734      98  39.1753811       R-squared     =  0.6648
-------------+------------------------------         Adj R-squared =  0.6614
       Total |  11452.4388      99    115.6812       Root MSE      =   6.259

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |  -1.819348   .1305081   -13.94   0.000    -2.078337   -1.560359
       _cons |   158.5975   4.774249    33.22   0.000     149.1231    168.0718
------------------------------------------------------------------------------

. reg y smoke age

      Source |       SS         df      MS           Number of obs =     100
-------------+------------------------------         F( 2, 97)     = 5424.58
       Model |  11350.9524       2  5675.47622       Prob > F      =  0.0000
    Residual |  101.486373      97  1.04625126       R-squared     =  0.9911
-------------+------------------------------         Adj R-squared =  0.9910
       Total |  11452.4388      99    115.6812       Root MSE      =  1.0229

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   .9431267    .050902    18.53   0.000     .8421004    1.044153
         age |   .9804631   .0164039    59.77   0.000     .9479059     1.01302
       _cons |   12.84084   2.560392     5.02   0.000     7.759169    17.92251
------------------------------------------------------------------------------

. cor y smoke age
(obs=100)

             |        y    smoke      age
-------------+---------------------------
           y |   1.0000
       smoke |  -0.8153   1.0000
         age |   0.9797  -0.9080   1.0000

Omitting age reverses the sign of the coefficient on smoke: age is strongly positively correlated with y and strongly negatively correlated with smoke, so the short regression attributes part of the age effect to smoking, exactly as the bias term $(X_1'X_1)^{-1}X_1'X_2\beta_2$ predicts.
