Model Specification: Precision and Bias

Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2009

April 1, 2009

The Classical Linear Model:

1. Linearity: $Y = X\beta + u$.
2. Strict exogeneity: $E(u|X) = 0$.
3. No multicollinearity: $\rho(X) = K$, w.p.1.
4. No heteroskedasticity / serial correlation: $V(u|X) = \sigma^2 I_n$.

Gauss/Markov: $\hat\beta = (X'X)^{-1}X'Y$ is best linear unbiased.
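As a quick numerical check of the OLS formula (a sketch added here, not part of the original slides; Stata's bundled auto dataset and the variables price, mpg and weight are arbitrary illustration choices), the following do-file computes $(X'X)^{-1}X'Y$ by hand in Mata and compares it with the output of regress:

sysuse auto, clear
regress price mpg weight

mata:
    y = st_data(., "price")
    X = st_data(., ("mpg", "weight")), J(st_nobs(), 1, 1)   // append the constant last, as -regress- reports it
    invsym(X'*X)*X'*y                                       // (X'X)^{-1} X'Y
end

The vector displayed by the Mata line should match the coefficients reported by regress, in the order mpg, weight, _cons.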

Result: $\hat\beta$ is unbiased, that is, $E(\hat\beta) = \beta$.

Proof:

\begin{align*}
\hat\beta &= (X'X)^{-1}X'Y \\
          &= (X'X)^{-1}X'(X\beta + u) \\
          &= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u \\
          &= \beta + (X'X)^{-1}X'u.
\end{align*}

Taking expectations conditional on $X$:

\begin{align*}
E(\hat\beta \mid X) &= \beta + (X'X)^{-1}X'E(u \mid X) \\
                    &= \beta,
\end{align*}

and hence $E(\hat\beta) = \beta$ by the law of iterated expectations.

Note the crucial role of the exogeneity assumption $E(u \mid X) = 0$.
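The result can also be illustrated by simulation. The sketch below is not part of the original slides; the data-generating process (intercept 1, slope 2, standard normal regressor and error) and the number of replications are arbitrary choices. Averaging the OLS slope over many samples drawn from a model that satisfies the assumptions gives a value very close to the true coefficient:

capture program drop olssim
program define olssim, rclass
    drop _all
    set obs 100
    generate x = rnormal()
    generate y = 1 + 2*x + rnormal()    // E(u|x) = 0 holds by construction
    regress y x
    return scalar b = _b[x]
end

set seed 507
simulate slope = r(b), reps(1000) nodots: olssim
summarize slope                         // the mean of the 1,000 slope estimates is close to 2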

Specification errors, bias and imprecision

So far we have assumed that our linear model $Y = X\beta + u$ is correct.

Consider the following case

$Y = X_1\beta_1 + X_2\beta_2 + u,$

where all classical assumptions hold and $K_1$ and $K_2$ are the numbers of columns of $X_1$ and $X_2$. Trivially, our original model corresponds to $X = [X_1 \; X_2]$, with $K = K_1 + K_2$.

Consider the following scenarios regarding $\beta_2$ and the corresponding estimation strategies:

Omission of relevant variables: $\beta_2 \neq 0$, but we wrongly proceed as if $\beta_2 = 0$, that is, we regress $Y$ on $X_1$ only.

Inclusion of irrelevant variables: $\beta_2 = 0$, but we wrongly proceed as if $\beta_2$ might be $\neq 0$, that is, we regress $Y$ on $X_1$ and $X_2$ when we could have ignored $X_2$.

Biases

Let us compare results for the estimation of $\beta_1$ in the two scenarios.

I) Omission of relevant variables

First note that in this case

$Y = X_1\beta_1 + u^*,$

with $u^* = X_2\beta_2 + u$.

Let $\tilde\beta_1 = (X_1'X_1)^{-1}X_1'Y$.

This is the OLS estimator when we regress $Y$ on $X_1$ only (omitting $X_2$).

Result: $\tilde\beta_1$ will be biased unless $X_2'X_1 = 0$.

$X_2'X_1 = 0$ implies that $X_2$ and $X_1$ are uncorrelated.

In words: the omission of relevant ($\beta_2 \neq 0$) variables leads to biases in the estimation, unless the omitted variables are uncorrelated with the included variables.

Proof:

\begin{align*}
\tilde\beta_1 &= (X_1'X_1)^{-1}X_1'Y \\
              &= (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + u) \\
              &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'u, \\
E(\tilde\beta_1 \mid X) &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'E(u \mid X) \\
              &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2.
\end{align*}

This expression is equal to $\beta_1$ if $X_1'X_2 = 0$.
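A small simulation sketch of this result (not from the slides; the data-generating process, with $\beta_1 = \beta_2 = 1$ and $x_2$ built to be correlated with $x_1$, is a hypothetical choice for illustration):

clear
set seed 507
set obs 1000
generate x1 = rnormal()
generate x2 = 0.8*x1 + rnormal()    // X1'X2 != 0 by construction
generate y  = x1 + x2 + rnormal()   // true coefficients: beta1 = 1, beta2 = 1

regress y x1                        // short regression: slope on x1 around 1.8 (biased upward)
regress y x1 x2                     // long regression: both slopes close to 1

Re-running the sketch with generate x2 = rnormal() instead, so that $x_1$ and $x_2$ are independent, makes the short-regression slope come out close to 1 as well, in line with the no-bias case $X_1'X_2 = 0$.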

II) Inclusion of Irrelevant Variables

In this case we would estimate $\beta_1$ jointly with $\beta_2$ by regressing $Y$ on $X_1$ and $X_2$, that is, $\hat\beta_1$ is a subvector of

\begin{align*}
\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X'X)^{-1}X'Y.
\end{align*}

It is important to see that the estimated model still satisfies all the classical assumptions (the true $\beta_2$ just happens to be $0$), and hence $\hat\beta_1$ will be unbiased. Why?

Variances

Result: the estimator of $\beta_1$ that omits $X_2$, i.e. $\tilde\beta_1$, has a smaller variance than $\hat\beta_1$ from the long regression.
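A sketch of the standard argument behind this result, which the slide states without proof (homoskedasticity maintained; notation: $P_2 = X_2(X_2'X_2)^{-1}X_2'$ and $M_2 = I_n - P_2$):

\begin{align*}
V(\tilde\beta_1 \mid X) = \sigma^2 (X_1'X_1)^{-1},
\qquad
V(\hat\beta_1 \mid X) = \sigma^2 (X_1'M_2X_1)^{-1},
\end{align*}

where the second expression is the Frisch-Waugh-Lovell form of the long-regression variance. Since $X_1'X_1 - X_1'M_2X_1 = X_1'P_2X_1$ is positive semidefinite, $(X_1'X_1)^{-1} \preceq (X_1'M_2X_1)^{-1}$, so every coefficient in $\tilde\beta_1$ has a (weakly) smaller variance than its counterpart in $\hat\beta_1$. Note that the first expression holds even when $\beta_2 \neq 0$, because conditional on $X$ the term $X_2\beta_2$ in $u^*$ is fixed and $V(u^* \mid X) = \sigma^2 I_n$.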

Bias-variance trade-off

To summarize:

In practice we do not know which model holds (the large one or the small one).

The trade-off: estimating a small model (omitting variables) implies a gain in precision and a likely bias. A large model is less likely to be biased but will be less efficient.

Variable omission does not necessarily lead to biases.
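A companion simulation sketch for the precision side of the trade-off (again a hypothetical data-generating process, here with $\beta_2 = 0$ and $x_2$ highly correlated with $x_1$): both regressions estimate the coefficient on $x_1$ without bias, but the short regression reports a noticeably smaller standard error.

clear
set seed 508
set obs 200
generate x1 = rnormal()
generate x2 = 0.9*x1 + rnormal()    // irrelevant but highly correlated with x1
generate y  = 1 + 2*x1 + rnormal()  // beta2 = 0: x2 does not belong in the model

regress y x1                        // smaller standard error on x1
regress y x1 x2                     // still (approximately) unbiased, but less precise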

Omitted Variable Bias: an example

Computer-generated data, but based on Appleton, French and Vanderpump ("Ignoring a Covariate: an Example of Simpson's Paradox", The American Statistician, 50, 4, 1996).

Y = risk of death.
SMOKE = consumption of cigarettes.

. reg y smoke

      Source |       SS         df      MS           Number of obs =     100
-------------+------------------------------         F( 1, 98)     =  194.34
       Model |  7613.25147       1  7613.25147       Prob > F      =  0.0000
    Residual |  3839.18734      98  39.1753811       R-squared     =  0.6648
-------------+------------------------------         Adj R-squared =  0.6614
       Total |  11452.4388      99    115.6812       Root MSE      =   6.259

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |  -1.819348   .1305081   -13.94   0.000    -2.078337   -1.560359
       _cons |   158.5975   4.774249    33.22   0.000     149.1231    168.0718
------------------------------------------------------------------------------

. reg y smoke age

      Source |       SS         df      MS           Number of obs =     100
-------------+------------------------------         F( 2, 97)     = 5424.58
       Model |  11350.9524       2  5675.47622       Prob > F      =  0.0000
    Residual |  101.486373      97  1.04625126       R-squared     =  0.9911
-------------+------------------------------         Adj R-squared =  0.9910
       Total |  11452.4388      99    115.6812       Root MSE      =  1.0229

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   .9431267    .050902    18.53   0.000     .8421004    1.044153
         age |   .9804631   .0164039    59.77   0.000     .9479059     1.01302
       _cons |   12.84084   2.560392     5.02   0.000     7.759169    17.92251
------------------------------------------------------------------------------

. cor y smoke age
(obs=100)

             |        y    smoke      age
-------------+---------------------------
           y |   1.0000
       smoke |  -0.8153   1.0000
         age |   0.9797  -0.9080   1.0000

Omitting age reverses the sign of the coefficient on smoke: age is strongly positively correlated with y and strongly negatively correlated with smoke, so the short regression attributes part of the age effect to smoking, exactly as the bias term $(X_1'X_1)^{-1}X_1'X_2\beta_2$ predicts.
