
Unit 5.

Model selection
María José Olmo Jiménez

Econometrics

Contents
1 Introduction

2 Model specification errors

2.1 Omission of relevant variable(s)
2.2 Inclusion of irrelevant variable(s)
2.3 Incorrect functional form
2.4 Chow test

3 Computational techniques for variable selection

4 Model selection criteria

1 Introduction
• When we estimate a linear regression model we assume that the regression model used
in the analysis is “correctly” specified.

• If the model is not “correctly” specified, we encounter the problem of model specifica-
tion error or model specification bias.

• In this unit we take a close and critical look at this assumption, because searching for
the correct model is not trivial. In particular we examine the following questions:

1. How does one go about finding the “correct” model? In other words, what are the
criteria in choosing a model for empirical analysis?
2. What types of model specification errors is one likely to encounter in practice?
3. What are the consequences of specification errors?
4. How does one detect specification errors? In other words, what are some of the
diagnostic tools that one can use?
5. Having detected specification errors, what remedies can one adopt and with what
benefits?
6. How does one evaluate the performance of competing models?

What are the criteria in choosing a model for empirical analysis?


According to some econometricians, a model chosen for empirical analysis should satisfy
the following criteria:

• Be encompassing and as simple as possible; that is, other models cannot be
an improvement over the chosen model.

• Have all the relevant regressors for the dependent variable.

• Be consistent with theory, that is, it must make economic sense (Do the coefficients
have the expected signs?)

• Be jointly relevant, that is, the null hypothesis in the overall significance test is rejected.

• Have all regressors individually relevant, that is, the null hypothesis in the
individual significance tests is rejected.

• Exhibit parameter constancy; that is, the values of the parameters should be stable.
Otherwise, forecasting will be difficult.

• Exhibit data coherency, that is, the residuals estimated from the model must be
coherent with the assumptions of the model about the error terms.

2 Model specification errors
Strictly speaking, a specification error is committed when some of the assumptions of the
linear regression model are violated. The violation of the assumptions related to the error terms
(homoscedasticity, no autocorrelation and normality) will be studied in the following units. So,
in this unit we focus on the specification errors related to the explanatory variables and the
functional form:

1. Omission of relevant variable(s)

2. Inclusion of unnecessary variable(s)

3. Adopting the wrong functional form

Remark: The model can also present problems related to the sample information such as
multicollinearity, errors of measurement and outliers and influential observations.

2.1 Omission of relevant variable(s)


Definition and consequences

• We commit a specification error of omission of a relevant variable when we leave a
relevant variable out of the regression model, that is, we are underfitting the model. The
reasons may be ignorance or data unavailability.

• The consequences of this error are:

– The estimators of the regression coefficients are biased.


– The estimators of the regression coefficients are inconsistent and the bias does not
disappear as the sample size gets larger.
– The disturbance variance σ² is incorrectly estimated: its OLS estimator is biased.
– The variance of the regression coefficients is overestimated, so the standard errors
of the β̂j are greater than they should be.
– In consequence, the usual confidence interval and hypothesis-testing procedures
are likely to give misleading conclusions about the statistical significance of the
estimated parameters.
– The forecasts based on the incorrect model and the forecast (confidence) intervals
will be unreliable.

Example

Consider the model (the expectations-augmented Phillips curve):

Yt = β0 + β1 X1t + β2 X2t + ut

3
where Yt is the actual inflation rate at time t (in %), X1t the actual unemployment rate
prevailing at time t (in %) and X2t the expected inflation rate at time t (in %).

Coefficient Estimates s.e. texp p−value


β0 7.19336 1.59479 4.51054 0.0011
β1 -1.39247 0.305018 -4.56521 0.0010
β2 1.47003 0.155786 8.36263 0.0000

Suppose that for some reason we fit the model without X2 , which is a relevant variable:

Coefficient Estimate s.e. texp p−value


β0 6.12717 4.28528 1.42982 0.1806
β1 0.24493 0.63046 0.38850 0.7051

The consequences of omitting this variable are very serious:

• The estimates of the regression coefficients change; in fact, β̂1 changes from negative
to positive.

• The standard errors increase.

• β1 is no longer significant, although in fact it is.

So, the results obtained would lead us to wrong conclusions. The small simulation sketch below illustrates this omitted-variable bias.
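A minimal simulation sketch of the bias (assuming NumPy and statsmodels are available; the coefficient values, sample size and variable names are invented purely for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)                               # included regressor
x2 = 0.7 * x1 + rng.normal(size=n)                    # relevant regressor correlated with x1
y = 1.0 - 1.5 * x1 + 1.2 * x2 + rng.normal(size=n)    # true model includes x2

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
omit = sm.OLS(y, sm.add_constant(x1)).fit()           # x2 omitted: underfitted model

print(full.params[1])   # close to the true value -1.5
print(omit.params[1])   # biased towards -1.5 + 1.2*0.7 = -0.66

Re-running the simulation with different seeds shows that the bias does not vanish as n grows, in line with the inconsistency noted above.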

How does one detect this error?

• Subjective tools: if such an error is in fact present, a plot of the residuals will exhibit a
noticeable pattern (trend, cycles, ...).

• Objective tools:

– Ramsey’s regression specification error test (RESET).


– Lagrange multiplier (LM) test for adding variables.

Both tests are implemented in Gretl.

Possible solutions
The solution seems very simple: introduce the omitted variables into the model. However,
when we specify an econometric model we follow economic theory and our common sense,
and we may not be aware that we have forgotten some variables. Moreover, the omission of
relevant variables may be confused with other violations of the basic assumptions of the LRM
that need different treatments.
Thus, if we know which variable has been omitted

• and we have data about this variable, we can include it in the model.

• but we do not have data about this variable:

– we can replace it by a proxy variable, that is, a variable highly correlated with
the omitted one. In this way, the bias caused by the omitted variable
is reduced. The lagged dependent variable is often used as a proxy variable.
– we can use panel data, since they allow the bias caused by the omitted
variable to be reduced when this variable is constant over time.

2.2 Inclusion of irrelevant variable(s)


Definition and consequences

• Now let us assume that Y = β0 + β1 X1 + u is the true model, but we fit
Y = α0 + α1 X1 + α2 X2 + u. Then we commit the specification error of including
an unnecessary variable in the model, that is, we are overfitting the model. This error
usually happens because the researcher is not sure about the variable's role in the model.

• The consequences of this error are:

– The OLS estimators of the regression coefficients are unbiased and consistent.
– The error variance σ² is correctly estimated.
– The usual confidence intervals and hypothesis-testing procedures remain valid.

However,

– the estimated α̂j 's will generally be inefficient; that is, their variances will generally
be larger than those of the β̂j 's of the true model.
– Therefore, the only penalty we pay for the inclusion of the superfluous variable
is that the estimated variances of the coefficients are larger, and as a result our
probability inferences about the parameters are less precise (the simulation sketch below illustrates this).
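The following small simulation sketch (again with invented values, assuming NumPy and statsmodels) shows that adding an irrelevant regressor leaves the estimate of β1 unbiased but inflates its variance:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, reps = 100, 2000
b_true, b_over = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)        # irrelevant, but correlated with x1
    y = 2.0 + 1.0 * x1 + rng.normal(size=n)   # true model: only x1 matters
    b_true.append(sm.OLS(y, sm.add_constant(x1)).fit().params[1])
    b_over.append(sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1])

print(np.mean(b_true), np.var(b_true))   # mean close to 1.0, smaller variance
print(np.mean(b_over), np.var(b_over))   # mean still close to 1.0, larger variance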

Example 3 of Unit 3
In Unit 3 we studied the demand for a product in terms of its price per unit considering a
third-degree polynomial regression model:

Y = β0 + β1 X + β2 X² + β3 X³ + u

We obtained the following results:

Coefficient Estimate s.e. texp p−value


β0 2555.72 913.55 2.79757 0.0266
β1 -444.365 213.254 -2.08374 0.0757
β2 26.8745 16.1498 1.66408 0.1400
β3 -0.542572 0.397408 -1.36528 0.2144

Let us consider a quadratic model since the term X³ is not individually relevant. The results
are now:
Coefficient Estimate s.e. texp p−value
β0 1330.41 179.565 7.40905 0.0001
β1 -155.467 27.8676 -5.57878 0.0005
β2 4.86612 1.0303 4.72163 0.0015
X³ was clearly an irrelevant variable since:

• The OLS estimates of the regression coefficients change.

• None of the slopes was significant at the 5% significance level in the cubic model, because
their standard errors were inflated.

So, the results of the cubic model would lead us to wrong conclusions.

How does one detect this error?


The simplest way to know whether an explanatory variable is irrelevant or not is the
individual significance test.

What remedies can one adopt?

• If the elimination of this variable helps to clarify the model, we can remove it.

• In practice, if the eliminated variable was indeed irrelevant, the estimates of the
regression coefficients will be more precise, that is, their standard errors will be smaller.

Summarising

                Omission of relevant variables    Inclusion of irrelevant variables
True model      y = XA βA + XB βB + u             y = XA βA + u
Fitted model    y = XA βA + u                     y = XA βA + XB βB + u
β̂               Biased and inconsistent           Unbiased and consistent
σ̂²              Biased                            Unbiased
Inference       Not valid                         Valid

2.3 Incorrect functional form


Definition and consequences

• A multiple regression model suffers from functional form misspecification when it does
not properly account for the relationship between the dependent and the observed ex-
planatory variables.

• Sometimes the economic theory does not provide information about the functional form,
so we specify a linear model since it is the simplest one.

• However there are many possible functional forms and then, the chosen one may be
incorrect.

• The consequences of this specification error are the same as those of omitting a relevant
variable. In addition, some of the assumptions related to the error terms may be violated.

Example 1

The following table contains data about Y , U.S. expenditure on imported goods (in billions
of 1982 dollars), and X, personal disposable income (in billions of 1982 dollars), from 1968 to
1987:
Year Yi Xi Year Yi Xi
1968 135.7 1551.3 1978 274.1 2167.4
1969 144.6 1599.8 1979 277.9 2112.6
1970 150.9 1668.1 1980 253.6 2214.3
1971 166.2 1728.4 1981 258.7 2248.6
1972 190.7 1797.4 1982 249.5 2261.5
1973 218.2 1916.3 1983 282.2 2331.9
1974 211.8 1896.6 1984 351.1 2469.8
1975 187.9 1931.7 1985 367.9 2542.8
1976 229.3 2001.0 1986 412.3 2640.9
1977 259.4 2066.6 1987 439.0 2686.3

1. Firstly, we consider a linear regression model:

Yt = β0 + β1 Xt + β2 T + ut

where T is a trend variable (1 for the first year, 2 for the second...).

Coefficient Estimate s.e. texp p−value


β0 -751.465 133.542 -5.6272 0.0000
β1 0.575673 0.0890487 6.4647 0.0000
β2 -19.0095 5.08598 -3.7376 0.0016
p-value = 4.37 · 10⁻¹³ (overall sign.), R² = 0.964853

2. Secondly, we consider a regression model with logarithms:

ln Yt = β0 + β1 ln Xt + β2 T + ut ,

Coefficient Estimate s.e. texp p−value


β0 -22.0135 5.27649 -4.1720 0.0006
β1 3.66494 0.718253 5.1026 0.0001
β2 -0.0458548 0.0197892 -2.3172 0.0332
p-value = 2.16 · 10⁻¹³ (overall sign.), R² = 0.967653

The variables are individually and jointly significant in both models and the predictive power is
high (nevertheless, the R² values cannot be used to compare the models, since their dependent variables are different). So, both models seem adequate.
Then, which model does one prefer? Is the functional form of any of them incorrect?

How does one detect this error?

• Subjective tools: residual plots, that is, plots of the residuals against fitted values or
against explanatory variables.

• Objective tools:

– Ramsey’s RESET test


– Non-linearity test (squares and logarithms)

Ramsey’s RESET test


This is a general test of specification error (omission of relevant variable(s) or incorrect
functional form).

Steps:

1. From the chosen model, obtain the fitted values Ŷi .

2. Rerun the model introducing Ŷi in some form as additional regressor(s). The squared
term Ŷi² and the cubed term Ŷi³ have proven to be useful in most applications.

3. Test whether the additional regressor(s) are jointly significant. Then:

• if we reject H0 , we conclude that the model is misspecified.


• if we do not reject H0 , we conclude that the model is correctly specified.

One advantage of RESET is that it is easy to apply, for it does not require one to specify
what the alternative model is. But that is also its disadvantage, because knowing that a model
is misspecified does not necessarily help us in choosing a better alternative.
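As an illustration of these steps, here is a minimal sketch of the squares-and-cubes version of RESET (assuming NumPy and statsmodels are available; the function name and arguments are chosen for this example, not taken from any package). Gretl provides this test directly in its model menu.

import numpy as np
import statsmodels.api as sm

def reset_test(y, X):
    """Ramsey RESET: add yhat^2 and yhat^3 and test their joint significance."""
    X = sm.add_constant(X)
    base = sm.OLS(y, X).fit()
    yhat = base.fittedvalues
    X_aug = np.column_stack([np.asarray(X), yhat**2, yhat**3])
    aug = sm.OLS(y, X_aug).fit()
    # H0: the coefficients on yhat^2 and yhat^3 are both zero
    R = np.zeros((2, X_aug.shape[1]))
    R[0, -2] = 1.0
    R[1, -1] = 1.0
    res = aug.f_test(R)
    return res.fvalue, res.pvalue    # small p-value -> evidence of misspecification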

Example 1 (continuation)

Returning to Example 1 we apply RESET to the two models (the version with the squared term only).

1. For the linear regression model, after calculating the fitted values Ŷi we fit the model

Y = β0 + β1 X + β2 T + β3 Ŷ² + u

obtaining the following results:

Coefficient Estimate s.e. texp p−value


β0 -88.6937 308.264 -0.2877 0.7773
β1 0.136158 0.204594 0.6655 0.5152
β2 -2.29407 8.48322 -0.2704 0.7903
β3 0.00114487 0.000491264 2.3305 0.0332

Since the term Ŷi² is individually significant at the 5% level, one can conclude that the linear
regression model is misspecified.

2. For the model with logarithms, after calculating the fitted values of ln Yi we fit the model

ln Y = β0 + β1 ln X + β2 T + β3 (fitted ln Y)² + u.

The results are:


Coefficient Estimate s.e. texp p−value
β0 31.9347 32.0322 0.9970 0.3336
β1 -4.35777 4.75412 -0.9166 0.3729
β2 0.0496846 0.0590896 0.8408 0.4128
β3 0.208635 0.122358 1.7051 0.1075
Since the term (fitted ln Y)² is not individually significant at the 5% level, one can conclude that the
model with logarithms is correctly specified.

So, we dismiss the linear regression model.

What remedies can one adopt?


The solution to this problem is to re-specify the functional form of the model.

2.4 Chow test


• When we use a regression model involving time series data, it may happen that there
is a structural change in the relationship between the regressand Y and the regressors.
By structural change, we mean that the values of the parameters of the model do not
remain the same through the entire time period. Sometimes the structural change may
be due to external forces, policy changes, action taken by the Government or to a variety
of other causes.

• In regression models involving cross-sectional data, a similar structural change may happen
between two groups of observations.

• How do we find out whether a structural change has in fact occurred? By means of the Chow test.

The Chow test is set up as follows:

• Suppose a sample of size n divided into two independent sub-samples of sizes n1 and n2 ,
respectively (n1 + n2 = n):

Yi = β0 + β1 X1i + ... + βk Xki + ui ,   i = 1, ..., n1
Yl = β0′ + β1′ X1l + ... + βk′ Xkl + ul ,   l = n1 + 1, ..., n1 + n2

• The hypotheses of the test are:

H0 : βj = βj′ , j = 0, ..., k
H1 : at least one pair of coefficients differs

The mechanics of the Chow test are as follows:

• Estimate the model with all the observations (supposing that there is no parameter
instability) by the OLS method and obtain RSS.

• Estimate the model with the observations of the first sub-sample by the OLS method
and obtain RSS1 .

• Estimate the model with the observations of the second sub-sample by the OLS method
and obtain RSS2 .

• Now the idea behind the Chow test is that if in fact there is no structural change (i.e.,
the two regressions are essentially the same), then the RSS and RSS1 + RSS2 should
not be statistically different. Therefore, we form the following ratio:

F = {[RSS − (RSS1 + RSS2)] / (k + 1)} / {(RSS1 + RSS2) / [n − 2(k + 1)]} ~ F(k + 1, n − 2(k + 1)) under H0

• We reject H0 of parameter stability (i.e., we conclude that there is a structural change) at the α significance level if

Fexp > Fk+1, n−2(k+1); 1−α

(a small code sketch of the whole procedure follows).
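A minimal sketch of these computations (assuming NumPy, SciPy and statsmodels are available; y1, y2 are the dependent variable in the two sub-samples and X1, X2 the corresponding regressor matrices without the constant column; the function name is chosen for illustration):

import numpy as np
import statsmodels.api as sm
from scipy.stats import f

def chow_test(y1, X1, y2, X2):
    """Chow test for parameter stability across two sub-samples."""
    add = sm.add_constant
    rss  = sm.OLS(np.concatenate([y1, y2]), add(np.vstack([X1, X2]))).fit().ssr  # pooled model
    rss1 = sm.OLS(y1, add(X1)).fit().ssr
    rss2 = sm.OLS(y2, add(X2)).fit().ssr
    n = len(y1) + len(y2)
    p = X1.shape[1] + 1                       # k + 1 estimated coefficients per regression
    F = ((rss - (rss1 + rss2)) / p) / ((rss1 + rss2) / (n - 2 * p))
    pval = f.sf(F, p, n - 2 * p)              # small p-value -> reject parameter stability
    return F, pval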

Example 2

We wish to analyse the per capita food consumption, Y , as a function of the price, X1 , and
the per capita income, X2 (both adjusted by the CPI), in the periods 1927-1941 and 1948-1962,
using the data in Table 1 (the years of World War II have been omitted). We suspect that
there could be a structural change in the relationship between Y and X1 , X2 due to the
effect of World War II. We apply the Chow test:

1. We estimate the model with all the observations:

Ŷt = 77.5322 − 0.0561155 X1t + 0.287650 X2t ,   t = 1927, ..., 1962
RSS = 28.75457

Year Yi X1i X2i Year Yi X1i X2i
1927 88.9 91.7 57.7 1948 96.7 105.3 82.1
1928 88.9 92 59.3 1949 96.7 102 83.1
1929 89.1 93.1 62 1950 98 102.4 88.6
1930 88.7 90.9 56.3 1951 96.1 105.4 88.3
1931 88 82.3 52.7 1952 98.1 105 89.1
1932 85.9 76.3 44.4 1953 99.1 102.6 92.1
1933 86 78.3 43.8 1954 99.1 101.9 91.7
1934 87.1 84.3 47.8 1955 99.8 100.8 96.5
1935 85.4 88.1 52.1 1956 101.5 100 99.8
1936 88.5 88 58 1957 99.9 99.8 99.9
1937 88.4 88.4 59.8 1958 99.1 101.2 98.4
1938 88.6 83.5 55.9 1959 101 98.8 101.8
1939 91.7 82.4 60.3 1960 100.7 98.4 101.8
1940 93.3 83 64.1 1961 100.8 98.8 103.1
1941 95.1 86.2 73.7 1962 101 98.4 105.5

Table 1: Data on per capita food consumption, price and per capita income before and after
World War II

2. We estimate the model again, in the two periods separately:

Ŷt = 86.0832 − 0.215958 X1t + 0.378127 X2t ,   t = 1927, ..., 1941
RSS1 = 8.567271
Ŷt = 107.879 − 0.2258368 X1t + 0.149715 X2t ,   t = 1948, ..., 1962
RSS2 = 5.401241

3. The value of the F statistic is:

Fexp = {[28.75457 − (8.567271 + 5.401241)] / 3} / {(8.567271 + 5.401241) / (30 − 2·3)} = 8.468222

4. Conclusion: Since Fexp > F3,24;0.95 = 3.00879, we reject H0 , so a structural change in
food consumption happened due to World War II; that is to say, food consumption
habits changed after the war.

3 Computational techniques for variable selection


We have seen that it is desirable to consider regression models that employ a subset of the
candidate regressor variables. To find the subset of variables to use in the final equation,
when the economic theory does not provide us with information in this regard, it is natural to
consider fitting models with various combinations of the candidate regressors. In this section
we will discuss several computational techniques for generating subset regression models:

1. All possible regressions: This procedure requires that the analyst fit all the regression
equations involving one candidate regressor, two candidate regressors, and so on. These
equations are evaluated according to some suitable criterion and the “best” regression
model selected.

2. Stepwise regression methods: Because evaluating all possible regressions can be bur-
densome computationally, various methods have been developed for evaluating only a
small number of subset regression models by either adding or deleting regressors one at
a time. They can be classified into three broad categories:

• Forward selection
• Backward elimination
• Stepwise regression.

Forward selection
This procedure begins with the assumption that there are no regressors in the model
other than the intercept. An effort is made to find an optimal subset by inserting regressors
into the model one at a time (a small code sketch is given after the steps below).

Steps:

1. The first regressor selected for entry into the equation is the one that has the largest
simple correlation with the response variable Y .

2. We carry out all the possible regressions adding a new regressor to the previous model.

3. The second regressor chosen for entry is the one that now has the lowest p−value (lower
than α) in the corresponding individual significance test, that is, the most relevant
regressor among those that are relevant.

4. Repeat Steps 2 and 3.

5. The procedure terminates either when all the p-values at a particular step are greater
than α or when the last candidate regressor is added to the model.
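A minimal sketch of forward selection based on p-values (assuming pandas and statsmodels are available; X is a DataFrame of candidate regressors and y the dependent variable; the function is written for illustration, not taken from any package). Note that choosing the candidate with the smallest p-value at the first step is equivalent to choosing the one with the largest simple correlation with Y.

import statsmodels.api as sm

def forward_selection(y, X, alpha_in=0.05):
    """Add, one at a time, the most significant remaining candidate regressor."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for cand in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
            pvals[cand] = fit.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] < alpha_in:            # the best candidate is relevant: enter it
            selected.append(best)
            remaining.remove(best)
        else:
            break                             # no remaining candidate is significant
    return selected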

Backward elimination
Forward selection begins with no regressors in the model and attempts to insert variables
until a suitable model is obtained. Backward elimination attempts to find a good model by
working in the opposite direction (a sketch follows the steps below).

Steps:

1. We begin with a model that includes all k candidate regressors.

2. We first remove from the model the regressor which has the greatest p−value (greater
than α) corresponding to the individual significance test, that is, the most irrelevant
regressor among those that are irrelevant.

3. Now a regression model with k − 1 regressors is fitted and the procedure is repeated.

4. The backward elimination algorithm terminates when all the p−values corresponding to
the individual significance tests are less than α, that is, all the regressors in the model
are relevant.
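A corresponding sketch of backward elimination (same assumptions as the forward-selection sketch above; the function is illustrative only):

import statsmodels.api as sm

def backward_elimination(y, X, alpha_out=0.05):
    """Drop, one at a time, the least significant regressor until all are relevant."""
    selected = list(X.columns)
    while selected:
        fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
        pvals = fit.pvalues.drop('const')
        worst = pvals.idxmax()                # most irrelevant regressor
        if pvals[worst] > alpha_out:
            selected.remove(worst)
        else:
            break                             # every remaining regressor is relevant
    return selected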

Stepwise regression

• The two procedures described above suggest a number of possible combinations. One
of the most popular is the stepwise regression algorithm of Efroymson.

• Stepwise regression is a modification of forward selection in which at each step all re-
gressors entered into the model previously are reassessed via their p−value of the corre-
sponding individual significance test.

• A regressor added at an earlier step may now be redundant because of the relationships
between it and regressors now in the equation.

• If the p−value is greater than αout , that variable is dropped from the model.

• Some analysts prefer to choose the same α for adding and removing a regressor, that is,
αin = αout . Frequently we choose αin < αout , making it relatively more difficult to add
a regressor than to delete one.

The pros and cons

• The stepwise-type procedures are easy to implement.

• None of the procedures generally guarantees that the best subset regression model of
any size will be identified.

• Since all the stepwise-type procedures terminate with one final equation, inexperienced
analysts may conclude that they have found a model that is in some sense optimal.

• Part of the problem is that it is likely, not that there is one best subset model, but that
there are several equally good ones.

• The analyst should also keep in mind that the order in which the regressors enter or leave
the model does not necessarily imply an order of importance to the regressors. It is not
unusual to find that a regressor inserted into the model early in the procedure becomes
negligible at a subsequent step.

• Note that forward selection, backward elimination, and stepwise regression do not nec-
essarily lead to the same choice of final model.

• Some users have recommended that all the procedures be applied in the hopes of either
seeing some agreement or learning something about the structure of the data that might
be overlooked by using only one selection procedure. Furthermore, there is not necessarily
any agreement between any of the stepwise-type procedures and all possible regressions.

• For these reasons stepwise - type variable selection procedures should be used with
caution.

4 Model selection criteria


In this section we discuss some criteria for evaluating and comparing competitive regression
models with the same dependent variable:

• Coefficients of determination, R², adjusted or not.


• The Akaike information criterion (AIC): AIC imposes a harsher penalty than the adjusted R² for
adding more regressors. The criterion is

AIC = 2(k + 1) − 2 ln L

where L is the value of the likelihood function for the model.

• The Bayesian or Schwarz information criterion (BIC): This criterion places a
greater penalty on adding regressors as the sample size increases. This criterion is

BIC = (k + 1) ln n − 2 ln L

where n is the number of observations.

• Hannan-Quinn information criterion (HQC): This criterion is a version of the BIC
with a smaller penalty on adding regressors as the sample size increases. This criterion
is
HQC = 2(k + 1) ln ln n − 2 ln L

The model with the lowest AIC, BIC or HQC is preferred. They can be used for nested and
non-nested models.
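As an illustration, the three criteria can be computed directly from the maximized log-likelihood; the sketch below assumes a fitted statsmodels OLS result (the attributes res.llf, res.df_model and res.nobs are standard in statsmodels, while the helper function itself is written only for this example):

import numpy as np

def info_criteria(llf, n_coef, n):
    """AIC, BIC and HQC from the log-likelihood llf, the number of estimated
    coefficients n_coef = k + 1 and the sample size n."""
    aic = 2 * n_coef - 2 * llf
    bic = n_coef * np.log(n) - 2 * llf
    hqc = 2 * n_coef * np.log(np.log(n)) - 2 * llf
    return aic, bic, hqc

# For a statsmodels result `res` of a model with an intercept:
# info_criteria(res.llf, res.df_model + 1, res.nobs)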
Remark: We have discussed several model selection criteria, but one should look at these
criteria as an adjunct to the various specification tests we have discussed in this unit. Some
of the criteria discussed above are purely descriptive and may not have strong theoretical
properties. Nonetheless, they are so frequently used by the practitioner that one should be
aware of them. None of these criteria is necessarily superior to the others.

Example 3
We will apply the stepwise-type procedures to the Hald cement data given in the file
Hald_cement.xls.
Hald (1952) presents data concerning the heat evolved in calories per gram of cement (Y )
as a function of the amount of each of four ingredients in the mix: tricalcium aluminate (X1 ),
tricalcium silicate (X2 ), tetracalcium alumino ferrite (X3 ) and dicalcium silicate (X4 ).
