POSSIBLE INDIRECT MEASURES FOR ALLEVIATING MULTICOLLINEARITY

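In this sequence, we look at four possible indirect methods for alleviating a problem of
multicollinearity. For reference, the variance of the estimator of β2 in a model with two
explanatory variables, X2 and X3, is

$$
\sigma_{\hat{\beta}_2}^2 = \frac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \frac{1}{1 - r_{X_2,X_3}^2} = \frac{\sigma_u^2}{n\,\mathrm{MSD}(X_2)} \times \frac{1}{1 - r_{X_2,X_3}^2}
$$

where MSD(X2) is the mean square deviation of X2.

(1) Combine the correlated variables.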

First, if the correlated variables are similar conceptually, it may be reasonable to combine
them into some overall index.

. reg S ASVABC SM SF
----------------------------------------------------------------------------
    Source |         SS     df          MS         Number of obs =       500
-----------+------------------------------         F(  3,   496) =     81.06
     Model |  1235.0519      3  411.683966         Prob > F      =    0.0000
  Residual |  2518.9701    496  5.07856875         R-squared     =    0.3290
-----------+------------------------------         Adj R-squared =    0.3249
     Total |   3754.022    499  7.52309018         Root MSE      =    2.2536
----------------------------------------------------------------------------
         S |      Coef.  Std. Err.        t    P>|t|    [95% Conf. Interval]
-----------+----------------------------------------------------------------
    ASVABC |   1.242527    .123587    10.05    0.000     .999708    1.485345
        SM |    .091353   .0459299     1.99    0.047    .0011119    .1815941
        SF |   .2028911   .0425117     4.77    0.000    .1193658    .2864163
     _cons |   10.59674   .6142778    17.25    0.000    9.389834    11.80365
----------------------------------------------------------------------------

That is precisely what has been done with the three cognitive ASVAB variables. ASVABC
has been calculated as a weighted average of scores on subtests: ASVABAR (arithmetic
reasoning), ASVABWK (word knowledge), and ASVABPC (paragraph comprehension).

The three components are highly correlated, and by combining them as a weighted average
rather than using them individually, one avoids a potential problem of multicollinearity.

(2) Drop some of the correlated variables.

Dropping some of the correlated variables, if they have insignificant coefficients, may
alleviate multicollinearity.

However, this approach to multicollinearity is dangerous. It is possible that some of the
variables with insignificant coefficients really do belong in the model and that the only
reason their coefficients are insignificant is that there is a problem of multicollinearity.

If that is the case, their omission may cause omitted variable bias, to be discussed in
Chapter 6.

(3) Empirical restriction

$$Y = \beta_1 + \beta_2 X + \beta_3 P + u$$

A further way of dealing with the problem of multicollinearity is to use extraneous
information, if available, concerning the coefficient of one of the variables.

For example, suppose that Y in the equation above is the demand for a category of
consumer expenditure, X is aggregate disposable personal income, and P is a price index
for the category.

To fit a model of this type you would use time series data. If X and P are highly correlated,
which is often the case with time series variables, the problem of multicollinearity might be
eliminated in the following way.

$$Y' = \beta_1' + \beta_2' X' + u$$

$$\hat{Y}' = \hat{\beta}_1' + \hat{\beta}_2' X'$$

Obtain data on income and expenditure on the category from a household survey and
regress Y' on X'. (The ' marks are to indicate that the data are household data, not
aggregate data.)

A simple regression is used because there will be relatively little variation in the price
paid by the households, so price can be left out of the household equation.

$$Y = \beta_1 + \hat{\beta}_2' X + \beta_3 P + u$$

$$Z = Y - \hat{\beta}_2' X = \beta_1 + \beta_3 P + u$$

Now substitute β̂2' for β2 in the time series model. Subtract β̂2' X from both sides, and regress
Z = Y − β̂2' X on price. This is a simple regression, so multicollinearity has been eliminated.
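
The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `simple_ols`, the variable names, and all of the numbers are hypothetical, and the data are generated without a disturbance term so that the arithmetic is exact.

```python
# Minimal sketch of the two-step empirical restriction, using made-up,
# noise-free numbers so that the arithmetic is easy to follow.

def simple_ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Step 1: household (cross-section) data, hypothetical values.
# Prices barely vary across households, so Y' is regressed on X' alone.
x_house = [10.0, 20.0, 30.0, 40.0, 50.0]
y_house = [1.0 + 0.5 * x for x in x_house]   # assumed income effect of 0.5
_, b2_house = simple_ols(x_house, y_house)

# Step 2: aggregate time series data, hypothetical values.
# X and P trend together, so they are highly correlated.
x_ts = [100.0, 110.0, 121.0, 130.0, 142.0, 150.0]
p_ts = [1.00, 1.10, 1.19, 1.31, 1.40, 1.52]
y_ts = [2.0 + 0.5 * x - 1.5 * p for x, p in zip(x_ts, p_ts)]

# Construct Z = Y - b2' X and regress it on P only.
z_ts = [y - b2_house * x for y, x in zip(y_ts, x_ts)]
b1, b3 = simple_ols(p_ts, z_ts)

print(round(b2_house, 3), round(b1, 3), round(b3, 3))
```

Because the household slope here equals the income coefficient used to generate the time series data, the second regression recovers the assumed intercept and price coefficient. With real data the two income coefficients need not coincide.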

There are some problems with this technique. First, the β2 coefficients may be conceptually
different in time series and cross-section contexts.

Second, since we subtract the estimated income component β̂2' X, not the true income
component β2 X, from Y when constructing Z, we have introduced an element of
measurement error in the dependent variable.
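
To see where the measurement error comes from, substitute the time series model for Y in the definition of Z. The following is a sketch that takes the time series model as given:

$$
Z = Y - \hat{\beta}_2' X = \beta_1 + \beta_2 X + \beta_3 P + u - \hat{\beta}_2' X = \beta_1 + \beta_3 P + \left[ u + (\beta_2 - \hat{\beta}_2') X \right]
$$

The term in brackets is the effective disturbance in the regression of Z on P. In addition to u, it contains the estimation error (β2 − β̂2') multiplied by X. Since X and P are highly correlated in this setting, that extra component is also correlated with P.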

(4) Theoretical restriction

The last, but by no means least, indirect method for alleviating multicollinearity is the use of
a theoretical restriction, which is defined as a hypothetical relationship among the
parameters of a regression model.

It will be explained using a basic educational attainment model as an example. Suppose
that we hypothesize that highest grade completed, S, depends on ASVABC and on the highest
grade completed by the respondent's mother and father, SM and SF, respectively:

$$S = \beta_1 + \beta_2 \mathit{ASVABC} + \beta_3 \mathit{SM} + \beta_4 \mathit{SF} + u$$

In the fitted regression of S on ASVABC, SM, and SF shown earlier, S increases by an estimated
0.09 years for every extra year of schooling of the mother and 0.20 years for every extra year
of schooling of the father.

Mother's education is generally held to be at least as important as, if not more important
than, father's education for educational attainment, so this outcome is unexpected.

. cor SM SF
(obs=500)
        |       SM       SF
--------+------------------
     SM |   1.0000
     SF |   0.5312   1.0000

However, assortative mating leads to correlation between SM and SF, and the regression may
be suffering from multicollinearity. This could lead to erratic estimates of the coefficients.
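
As a rough gauge of how much this correlation matters, the factor 1/(1 − r²) from the variance expression for the two-variable case can be evaluated at r = 0.5312. This is only an approximation here, since the fitted model also includes ASVABC and the two-variable formula ignores it:

$$
\frac{1}{1 - r_{SM,SF}^2} = \frac{1}{1 - 0.5312^2} = \frac{1}{1 - 0.2822} \approx 1.39, \qquad \sqrt{1.39} \approx 1.18
$$

On this approximation, the variances of the coefficients of SM and SF are about 39 percent larger, and their standard errors about 18 percent larger, than they would be if SM and SF were uncorrelated, other things being equal.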

Suppose that we hypothesize that mother's and father's education are equally important.
We can then impose the restriction

$$\beta_3 = \beta_4$$

This allows us to rewrite the equation as

$$S = \beta_1 + \beta_2 \mathit{ASVABC} + \beta_3 (\mathit{SM} + \mathit{SF}) + u = \beta_1 + \beta_2 \mathit{ASVABC} + \beta_3 \mathit{SP} + u$$

Here SP is defined as the sum of SM and SF. With a single parental education variable in the
equation, the problem caused by the correlation between SM and SF has been eliminated.
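
The mechanics of imposing the restriction can be sketched in Python. Everything here is an assumption made for illustration: the helper `ols`, the six observations, and the coefficient values are hypothetical, and the data are generated without a disturbance term and with the restriction holding exactly.

```python
# Minimal sketch: imposing beta3 = beta4 by regressing on SP = SM + SF.
# All numbers are hypothetical and noise-free, chosen only for illustration.

def ols(columns, y):
    """OLS with an intercept; returns [intercept, slope_1, slope_2, ...].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]

asvabc = [-1.0, -0.5, 0.0, 0.3, 0.8, 1.2]
sm = [10.0, 12.0, 12.0, 14.0, 16.0, 12.0]
sf = [11.0, 12.0, 14.0, 13.0, 16.0, 18.0]
# Generate S so that the restriction holds exactly: both parents get 0.15.
s = [10.5 + 1.2 * a + 0.15 * m + 0.15 * f for a, m, f in zip(asvabc, sm, sf)]

sp = [m + f for m, f in zip(sm, sf)]   # SP = SM + SF
b1, b2, b3 = ols([asvabc, sp], s)
print(round(b1, 3), round(b2, 3), round(b3, 3))
```

The regression on ASVABC and SP returns a single coefficient for parental education, which in this constructed example equals the common value assigned to SM and SF.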

. g SP=SM+SF
. reg S ASVABC SP
----------------------------------------------------------------------------
    Source |         SS     df          MS         Number of obs =       500
-----------+------------------------------         F(  2,   497) =    120.22
     Model | 1223.98508      2  611.992542         Prob > F      =    0.0000
  Residual | 2530.03692    497  5.09061754         R-squared     =    0.3260
-----------+------------------------------         Adj R-squared =    0.3233
     Total |   3754.022    499  7.52309018         Root MSE      =    2.2562
----------------------------------------------------------------------------
         S |      Coef.  Std. Err.        t    P>|t|    [95% Conf. Interval]
-----------+----------------------------------------------------------------
    ASVABC |   1.243199   .1237327    10.05    0.000    1.000095    1.486303
        SP |   .1500751   .0229866     6.53    0.000    .1049123    .1952379
     _cons |   10.50285      .6117    17.17    0.000    9.301009    11.70468
----------------------------------------------------------------------------

The estimate of β3 is now 0.150.

Not surprisingly, this is a compromise between the coefficients of SM and SF in the
previous specification.

The standard error of the coefficient of SP, 0.023, is much smaller than those of SM and SF,
0.046 and 0.043. The use of the restriction has led to a large gain in efficiency and the
problem of multicollinearity has been eliminated.
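
The size of the gain can be read from the two regression outputs by taking the ratios of the reported standard errors:

$$
\frac{\text{s.e.}(\mathit{SM})}{\text{s.e.}(\mathit{SP})} = \frac{0.0459}{0.0230} \approx 2.00, \qquad \frac{\text{s.e.}(\mathit{SF})}{\text{s.e.}(\mathit{SP})} = \frac{0.0425}{0.0230} \approx 1.85
$$

The standard error of the coefficient of SP is thus roughly half of each of the standard errors obtained when SM and SF are entered separately.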

The t statistic for SP, 6.53, is very high. Thus it would appear that imposing the restriction
has improved the regression results. However, the restriction may not be valid, so we should
test it. Testing theoretical restrictions is one of the topics in Chapter 6.
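
As a preview, one standard way to test a linear restriction is an F test that compares the residual sums of squares of the restricted and unrestricted regressions. The sketch below applies it to the two outputs reported above; the function name `f_statistic` and its parameters are hypothetical, and the source leaves the details of the procedure to Chapter 6.

```python
# Minimal sketch of an F test of the restriction beta3 = beta4, using the
# residual sums of squares reported in the two regression outputs.

def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, df_unrestricted):
    """F statistic for linear restrictions, from restricted and unrestricted RSS."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_unrestricted
    return numerator / denominator

rss_r = 2530.03692   # residual SS from: reg S ASVABC SP
rss_u = 2518.9701    # residual SS from: reg S ASVABC SM SF
f = f_statistic(rss_r, rss_u, n_restrictions=1, df_unrestricted=496)
print(round(f, 2))   # 2.18
```

The 5 percent critical value of F with 1 and 496 degrees of freedom is roughly 3.86, so on these figures the restriction would not be rejected.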
Copyright Christopher Dougherty 2016.

These slideshows may be downloaded by anyone, anywhere for personal use.
Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.

The content of this slideshow comes from Section 3.4 of C. Dougherty,
Introduction to Econometrics, fifth edition 2016, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.

2016.04.30
