Lec10 F
Chapter 7
THE ATTRIBUTES OF A GOOD MODEL
Whether a model used in empirical analysis is good, appropriate, or the "right" model cannot be determined without some reference criteria, or guidelines.
Suppose the true model is Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ but we fit Yᵢ = α₁ + α₂X₂ᵢ + vᵢ, omitting X₃. Then

E(α̂₂) = β₂ + β₃b₃₂   [7]

where b₃₂ is the slope coefficient in the regression of the omitted variable X₃ on the included variable X₂.
…BIAS DUE TO EXCLUSION OF RELEVANT X VARIABLE(S)
Therefore, α̂₂ is an unbiased estimator of β₂ if and only if the term β₃b₃₂ equals zero, that is, iff the omitted variable is truly irrelevant (β₃ = 0) or is uncorrelated with the included variable (b₃₂ = 0).
Now, from EQ. [7], we can obtain the bias introduced by the exclusion of a relevant explanatory variable as:

Specification bias = E(α̂₂) − β₂ = β₃b₃₂   [8]
The above exposition tells us that the bias due to exclusion of relevant explanatory variable(s) depends on two terms:
(a) The regression coefficient of the explanatory variable(s) excluded from the fitted model (here β₃).
(b) The covariance/correlation between the explanatory variable(s) dropped and those kept in the fitted regression model, captured here by b₃₂.
NOTE: We work with data in mean-deviation form, so the intercept drops out and the derivations simplify. We already know that in econometrics most, but not all, regression results for the intercept term have no economic interpretation.
EXCLUSION OF RELEVANT X VARIABLE(S): CONSEQUENCES
If a relevant variable is omitted, the OLS estimators of the retained slope coefficients are biased as well as inconsistent unless the omitted variable is uncorrelated with the included ones; the error variance and the standard errors are incorrectly estimated; and the usual confidence intervals and hypothesis tests are therefore unreliable.
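The bias formula in EQ.[7] can be checked with a small simulation. The sketch below (hypothetical data; numpy only) generates data from a true two-regressor model with b₃₂ = 0.5 and fits the short regression that omits X₃; the estimated slope lands near β₂ + β₃b₃₂ = 2 + 3(0.5) = 3.5 rather than the true β₂ = 2.

```python
import numpy as np

# True model: Y = 1 + 2*X2 + 3*X3 + u, with X3 correlated with X2.
rng = np.random.default_rng(0)
n = 100_000
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)   # slope of X3 on X2 (b32) is 0.5
b2, b3 = 2.0, 3.0
y = 1.0 + b2 * x2 + b3 * x3 + rng.normal(size=n)

# Short (misspecified) regression of Y on X2 alone.
X_short = np.column_stack([np.ones(n), x2])
alpha = np.linalg.lstsq(X_short, y, rcond=None)[0]

# EQ.[7] predicts E(alpha2_hat) = b2 + b3*b32 = 3.5, not the true b2 = 2.
print(alpha[1])
```

Increasing n does not help: the short-regression slope converges to 3.5, which is the inconsistency the slide describes.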
INCLUSION OF UNNECESSARY/IRRELEVANT X VARIABLES: CONSEQUENCES
Another type of specification bias may arise when the set of explanatory variables is
enlarged by inclusion of one or more irrelevant variables.
The philosophy is that so long as you include the theoretically relevant variables,
inclusion of one or more unnecessary or “nuisance” variables will not hurt—
unnecessary in the sense that there is no solid theory that says they should be
included.
In that case inclusion of such variables will certainly increase R² (and adjusted R² will increase when the absolute t value of the added variable exceeds 1).
This is called overfitting a model. But if the variables are not economically
meaningful and relevant, such a strategy is not recommended.
…INCLUSION OF UNNECESSARY/IRRELEVANT X VARIABLES: CONSEQUENCES
Suppose the true model is Yᵢ = β₁ + β₂X₂ᵢ + uᵢ but we fit the over-fitted model Yᵢ = α₁ + α₂X₂ᵢ + α₃X₃ᵢ + vᵢ. The OLS estimators of the "incorrect" model are unbiased (as well as consistent). That is, E(α̂₁) = β₁, E(α̂₂) = β₂, and E(α̂₃) = 0. If X₃ does not belong to the model, α̂₃ is expected to be zero.
Also, the estimator of the error variance σ² obtained from the over-fitted regression is correctly estimated.
…INCLUSION OF UNNECESSARY/IRRELEVANT X VARIABLES: CONSEQUENCES
The standard confidence-interval and hypothesis-testing procedures on the basis of the t and F tests remain valid.
However, the α̂'s estimated from the over-fitted regression are inefficient: their variances will generally be larger than those of the β̂'s estimated from the true model.
As a result, the confidence intervals based on the standard errors of the α̂'s will be wider than those based on the standard errors of the β̂'s of the true model.
However, we can use the F test (please see the lecture on multivariate regression) to choose the right variable(s) for our model.
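The F test referred to above can be computed directly from restricted and unrestricted residual sums of squares, F = [(RSS_r − RSS_ur)/q] / [RSS_ur/(n − k)]. A minimal numpy sketch with hypothetical simulated data (x3 irrelevant by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)               # irrelevant by construction
y = 1.0 + 2.0 * x2 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

X_r = np.column_stack([np.ones(n), x2])        # restricted: drop x3
X_ur = np.column_stack([np.ones(n), x2, x3])   # unrestricted: keep x3
q, k = 1, 3                                    # 1 restriction; 3 parameters unrestricted
F = ((rss(X_r, y) - rss(X_ur, y)) / q) / (rss(X_ur, y) / (n - k))
print(F)  # compare with the F(1, n-3) critical value
```

Here a small F, below the F(q, n − k) critical value, says x3 adds nothing and can be dropped.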
IS IT BETTER TO INCLUDE IRRELEVANT VARIABLES THAN TO
EXCLUDE THE RELEVANT ONES?
In terms of bias, yes: excluding relevant variables makes the OLS estimators biased and inconsistent, whereas including irrelevant ones leaves them unbiased. But over-inclusion is not costless: the addition of unnecessary variables will lead to a loss in the efficiency of the estimators (i.e., larger standard errors) and may also lead to the problem of multicollinearity (Why? Because unnecessary regressors are often correlated with the included ones), not to mention the loss of degrees of freedom.
WRONG FUNCTIONAL FORM (LINEAR OR NON-LINEAR).
Sometimes researchers mistakenly do not account for the nonlinear nature of
variables in a model.
Moreover, some dependent variables (such as wage, which tends to be skewed to
the right) are more appropriately entered in natural log form.
Consider the following true marginal cost (MC) model, quadratic in output Q:

MCᵢ = β₁ + β₂Qᵢ + β₃Qᵢ² + uᵢ   [11]

Instead the econometrician estimated the following linear model:

MCᵢ = α₁ + α₂Qᵢ + vᵢ   [12]

At low and high levels of output, the fitted linear model [12] will consistently underestimate the true marginal cost.
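The underestimation can be seen numerically. In the sketch below (hypothetical U-shaped MC curve, numpy only), a straight line is fitted to data generated from a quadratic MC function; at high output the linear prediction falls well below the true MC.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
q = rng.uniform(1, 10, size=n)
mc_true = 50 - 8 * q + q**2                 # hypothetical U-shaped true MC
mc = mc_true + rng.normal(scale=2, size=n)

# Misspecified linear fit of MC on Q.
X_lin = np.column_stack([np.ones(n), q])
a = np.linalg.lstsq(X_lin, mc, rcond=None)[0]

# Compare the linear prediction with the true MC at high output (Q = 10).
q_hi = 10.0
print(a[0] + a[1] * q_hi)      # linear prediction
print(50 - 8 * q_hi + q_hi**2) # true MC = 70
```

The same gap appears at the low end of the output range: a least-squares line through a convex curve under-predicts at both extremes and over-predicts in the middle.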
TESTS FOR OMITTED VARIABLE AND FUNCTIONAL FORM OF REGRESSION EQUATION
RESET Test:
Consider the original model: yᵢ = β₁ + β₂x₂ᵢ + … + β_k x_kᵢ + uᵢ   [13]
RESET adds polynomials in the OLS fitted values to EQ.[13] to detect general kinds of functional-form misspecification.
To implement RESET we need to decide how many functions of fitted values to include in an expanded regression. However, there is no right or wrong answer; the squared and cubed terms are the usual choice.
Let ŷ denote the OLS fitted values and consider the expanded equation:

yᵢ = β₁ + β₂x₂ᵢ + … + β_k x_kᵢ + δ₁ŷᵢ² + δ₂ŷᵢ³ + errorᵢ

We use this equation, testing δ₁ = δ₂ = 0 with an F test, to check whether the original equation has missed important nonlinearities.
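The RESET steps above can be sketched with numpy alone. The data are hypothetical, generated with a quadratic term that the original model omits, so the test should fire:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(0, 5, size=n)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(size=n)   # nonlinearity omitted below

def fit(X, y):
    """OLS fit: return residual sum of squares and fitted values."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ beta
    resid = y - yhat
    return resid @ resid, yhat

X0 = np.column_stack([np.ones(n), x])    # original (misspecified) model [13]
rss0, yhat = fit(X0, y)

# Expanded regression: add squared and cubed fitted values.
X1 = np.column_stack([X0, yhat**2, yhat**3])
rss1, _ = fit(X1, y)

q, k = 2, X1.shape[1]
F = ((rss0 - rss1) / q) / (rss1 / (n - k))
print(F)  # a large F signals functional-form misspecification
```

Compare F with the F(2, n − 4) critical value; refitting with the x² term included drives the statistic back toward insignificance.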
…TESTS FOR OMITTED VARIABLE AND FUNCTIONAL FORM OF REGRESSION EQUATION
To illustrate the use of the MWD test to identify which functional form is correct, we specify the hypotheses as follows:
H₀ (Linear model): Y is a linear function of the regressors, the X's.
H₁ (Log-linear model): ln Y is a linear function of the logs of the regressors, the ln X's.
…TESTS FOR OMITTED VARIABLE AND FUNCTIONAL FORM OF REGRESSION EQUATION
The MWD test involves the following steps:
Step 1: Estimate the linear model and obtain the fitted Y values; call them Yf.
Step 2: Estimate the log-linear model and obtain the fitted ln Y values; call them ln f.
Step 3: Obtain Z₁ = (ln Yf − ln f).
Step 4: Regress Y on the X's and Z₁; reject H₀ if the coefficient of Z₁ is statistically significant.
Step 5: Obtain Z₂ = (antilog of ln f − Yf).
Step 6: Regress ln Y on the ln X's and Z₂; reject H₁ if the coefficient of Z₂ is statistically significant.
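The first half of the MWD procedure (through the Z₁ regression) can be sketched as follows; the data are hypothetical and generated from a truly linear model, so the t statistic on Z₁ should be small and H₀ should not be rejected:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
x = rng.uniform(1, 10, size=n)
y = 5.0 + 3.0 * x + rng.normal(size=n)   # truly linear

def ols(X, y):
    """OLS coefficients and their t statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

X_lin = np.column_stack([np.ones(n), x])
X_log = np.column_stack([np.ones(n), np.log(x)])
yf = X_lin @ np.linalg.lstsq(X_lin, y, rcond=None)[0]             # Step 1: fitted Y
lnf = X_log @ np.linalg.lstsq(X_log, np.log(y), rcond=None)[0]    # Step 2: fitted ln Y

z1 = np.log(yf) - lnf                                             # Step 3
_, t = ols(np.column_stack([X_lin, z1]), y)                       # Step 4
print(t[-1])  # small |t| on Z1: do not reject the linear model
```

Steps 5 and 6 mirror this with Z₂ = exp(ln f) − Yf added to the log-linear regression.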
…TESTS FOR OMITTED VARIABLE AND FUNCTIONAL FORM OF REGRESSION EQUATION
LM Test:
This is an alternative to the RESET test. Estimate the model in EQ.[13] and obtain the estimated residuals, ûᵢ.
If in fact EQ.[13] is the correct model, then the residuals obtained from this model
should not be related to the regressors omitted from the model.
We now regress ûᵢ on the regressors in the original model and the variable(s) omitted from the original model:

ûᵢ = α₁ + α₂x₂ᵢ + … + α_k x_kᵢ + γ₁z₁ᵢ + … + γ_q z_qᵢ + εᵢ

If the sample size is large, it can be shown that n (the sample size) times the R² obtained from this auxiliary regression follows the chi-square distribution with df equal to the number of omitted regressors; symbolically, nR² ~ χ²(q) asymptotically.
If the computed χ² value exceeds the critical χ² value at the chosen level of significance, we reject the null of no misspecification.
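The LM steps can be sketched with numpy; the data are hypothetical, with one genuinely relevant regressor (x3) omitted from the fitted model, so nR² should far exceed the χ²(1) critical value:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

# Restricted model omits x3; keep its residuals.
X_r = np.column_stack([np.ones(n), x2])
e = y - X_r @ np.linalg.lstsq(X_r, y, rcond=None)[0]

# Auxiliary regression of the residuals on ALL regressors (included + omitted).
X_aux = np.column_stack([np.ones(n), x2, x3])
e_hat = X_aux @ np.linalg.lstsq(X_aux, e, rcond=None)[0]
r2 = 1 - np.sum((e - e_hat) ** 2) / np.sum((e - e.mean()) ** 2)

lm = n * r2
print(lm)  # compare with the chi-square(1) critical value, 3.84 at 5%
```

If x3 truly did not belong in the model, the residuals would be unrelated to it and nR² would typically stay below 3.84.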
ERRORS OF MEASUREMENT
So far we have assumed implicitly that the dependent variable Y and the explanatory variables, the X's, are measured without any errors.
Although not explicitly spelled out, this presumes that the values of the regressand as well as the regressors are accurate; that is, they are not guesstimates, extrapolated, interpolated, or rounded off in any systematic manner, or recorded with errors.
Consequences of Errors of Measurement in the Regressand:
If the regressand alone is measured with random error, the OLS estimators remain unbiased, although their estimated variances are larger than in the absence of such errors.
In short, errors of measurement in the regressand do not pose a very serious threat to OLS estimation.
…ERRORS OF MEASUREMENT
We should thus be very careful in collecting the data and making sure that some
obvious errors are eliminated.
OUTLIERS, LEVERAGE, AND INFLUENCE DATA
Observations or data points that are not "typical" of the rest of the sample are known as outliers, leverage points, or influence points.
Outliers: In the context of regression analysis, an outlier is an observation with a
large residual (ei), large in comparison with the residuals of the rest of the
observations.
Leverage: An observation is said to exert (high) leverage if it is disproportionately
distant from the bulk of the sample observations. In this case such observation(s)
can pull the regression line towards itself, which may distort the slope of the
regression line.
Influential point: If a levered observation in fact pulls the regression line toward
itself, it is called an influential point. The removal of such a data point(s) from the
sample can dramatically change the slope of the estimated regression line.
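The three concepts can be illustrated numerically. In this sketch (hypothetical data, numpy only), twenty points lie near the line y = 2x, plus one point far from the bulk and off the line; its leverage hᵢᵢ comes from the diagonal of the hat matrix H = X(X′X)⁻¹X′, and removing it restores the slope:

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.concatenate([rng.uniform(0, 1, 20), [10.0]])   # last point far from the bulk
y = np.concatenate([2 * x[:20] + rng.normal(scale=0.1, size=20),
                    [0.0]])                           # and far off the line

X = np.column_stack([np.ones_like(x), x])
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverage values h_ii
print(h[-1])   # the distant point has leverage close to 1

slope_all = np.linalg.lstsq(X, y, rcond=None)[0][1]
slope_wo = np.linalg.lstsq(X[:-1], y[:-1], rcond=None)[0][1]
print(slope_all, slope_wo)  # the influential point drags the slope far from 2
```

The distant point has both high leverage and a large effect on the fit, so it is influential; a high-leverage point lying on the line would leave the slope almost unchanged.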
…OUTLIERS, LEVERAGE, AND INFLUENCE DATA