
LECTURE 5

MULTICOLLINEARITY AND HETEROSKEDASTICITY
Your Questions!?

1. What is the nature of the problem?
2. What are its consequences?
3. How can it be detected?
4. What are the remedial measures?
CHAPTER 10

Multicollinearity

Multicollinearity
• The nature of Multicollinearity
• Consequences of Multicollinearity
• Detection of Multicollinearity
• Remedial measures

The nature of multicollinearity

In general, multicollinearity exists when there are functional relationships
among the independent variables, that is

    λ1X1 + λ2X2 + λ3X3 + … + λkXk = 0

for constants λi that are not all zero.

For example, λ1X1 + λ2X2 = 0  =>  X1 = -(λ2/λ1) X2

If multicollinearity is perfect, the regression coefficients of the Xi
variables, the βi's, are indeterminate and their standard errors, se(βi)'s,
are infinite.
Example (3-variable case):  Y = β̂1 + β̂2X2 + β̂3X3 + u

    β̂2 = [(Σyx2)(Σx3²) − (Σyx3)(Σx2x3)] / [(Σx2²)(Σx3²) − (Σx2x3)²]

If x3 = λx2, then

    β̂2 = [(Σyx2)(λ²Σx2²) − (λΣyx2)(λΣx2²)] / [(Σx2²)(λ²Σx2²) − λ²(Σx2²)²] = 0/0

    => Indeterminate

Similarly,

    β̂3 = [(Σyx3)(Σx2²) − (Σyx2)(Σx2x3)] / [(Σx2²)(Σx3²) − (Σx2x3)²]

If x3 = λx2, then

    β̂3 = [(λΣyx2)(Σx2²) − (Σyx2)(λΣx2²)] / [(Σx2²)(λ²Σx2²) − λ²(Σx2²)²] = 0/0

    => Indeterminate
If multicollinearity is imperfect,

    x3 = λ2x2 + ε    (or x3 = λ1 + λ2x2 + ε)

where ε is a stochastic error term, then the regression coefficients,
although determinate, possess large standard errors, which means the
coefficients can be estimated but with less accuracy.

    β̂2 = [(Σyx2)(λ2²Σx2² + Σε²) − (λ2Σyx2 + Σyε)(λ2Σx2² + Σx2ε)]
         / [(Σx2²)(λ2²Σx2² + Σε²) − (λ2Σx2² + Σx2ε)²]

which is no longer of the 0/0 form.
Large variance and covariance of OLS estimators

    var(β̂2) = σu² / [Σx2² (1 − r23²)] = (σu² / Σx2²) · VIF

Variance-inflating factor:  VIF = 1 / (1 − r23²)

Higher pair-wise correlation => higher VIF => larger variance.

Here r23² is obtained from the auxiliary OLS regression X2 = α1 + α2X3 + v
(equivalently, r32² from X3 = α1' + α2'X2 + v').

In general, for the k-variable model,

    var(β̂j) = σu² / [Σxj² (1 − Rj²)] = (σu² / Σxj²) · VIFj
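As an illustration, here is a minimal Python sketch (using numpy, pandas, and statsmodels) that computes VIFj for each regressor via the auxiliary regressions; the data and variable names are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical regressors: X3 is nearly a linear function of X2
rng = np.random.default_rng(0)
X = pd.DataFrame({"X2": rng.normal(size=100)})
X["X3"] = 0.9 * X["X2"] + rng.normal(scale=0.3, size=100)

def vif(df):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    X_j on the remaining regressors plus an intercept."""
    out = {}
    for col in df.columns:
        others = sm.add_constant(df.drop(columns=col))
        r2 = sm.OLS(df[col], others).fit().rsquared
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)

print(vif(X))   # VIF > 10 is the usual rule of thumb for serious collinearity
```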
Consequences of imperfect multicollinearity

1. Although the estimated coefficients are still BLUE, the OLS estimators
   have large variances and covariances, making estimation less accurate.
2. The estimated confidence intervals tend to be much wider, leading to
   accepting the "zero null hypothesis" more readily.
3. The t-statistics of the coefficients tend to be statistically
   insignificant.
4. The R² can be very high.
5. The OLS estimators and their standard errors can be sensitive to small
   changes in the data.

(Points 3 and 4 can be detected from the regression results.)
Detection of Multicollinearity

• High R² but few significant t-ratios
• Auxiliary regressions
• Variance-inflation factor (VIF)
• Intercorrelation matrix (EViews 4)
[Example regression output: some coefficients are significant and others
 insignificant because GDP and GNP are highly correlated.]

Other examples of highly correlated pairs: CPI <=> WPI; CD rate <=> TB rate;
M2 <=> M3.
Auxiliary regressions

Basic idea:
Multicollinearity arises because one or more of the regressors are exact or
approximate linear combinations of the other regressors.
→ Regress each Xi on the remaining X variables and compute the
  corresponding R².
→ Each of these regressions is called an auxiliary regression, auxiliary to
  the main regression of Y on the X's.
Auxiliary regressions

Implementation:
• Step 1: run the auxiliary regression for Xi.
• Step 2: compute the corresponding R², designated R²_Xi.
• Step 3: compute the F statistic

    F_stat = [ R²_Xi / (k − 1) ] / [ (1 − R²_Xi) / (n − k) ]

where
- R²_Xi is the coefficient of determination from the regression of Xi on the
  remaining X variables,
- k is the number of explanatory variables in the original regression,
  including the intercept term, and
- n is the number of observations in the original regression.
Auxiliary regressions

• Step 4: compare F_stat with F_crit(k − 1, n − k).

If F_stat > F_crit, the particular Xi is collinear with the other regressors
and may be dropped from the model; otherwise we may retain that variable in
the model.
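A minimal sketch of the auxiliary-regression F test (Python with statsmodels and scipy; the data and variable names are hypothetical, and the F formula follows the slide above):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

# Hypothetical data: X3 is nearly a linear combination of X2
rng = np.random.default_rng(1)
n = 50
X = pd.DataFrame({"X2": rng.normal(size=n)})
X["X3"] = 2.0 * X["X2"] + rng.normal(scale=0.1, size=n)

k = X.shape[1] + 1          # explanatory variables incl. the intercept

for col in X.columns:
    aux = sm.OLS(X[col], sm.add_constant(X.drop(columns=col))).fit()
    r2 = aux.rsquared
    f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))      # slide's F formula
    f_crit = stats.f.ppf(0.95, k - 1, n - k)
    print(f"{col}: R2={r2:.3f}, F={f_stat:.2f}, "
          f"{'collinear' if f_stat > f_crit else 'not collinear'}")
```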
V-I-F

• Some authors use the VIF as an indicator of multicollinearity:
  the larger the value of VIFj, the more "troublesome" or collinear the
  variable Xj.
• According to this rule of thumb, if the VIF of a variable exceeds 10
  (which happens if Rj² exceeds 0.9), that variable is said to be highly
  collinear.
Remedial Measures

1. Utilise a priori information. For Y = β1 + β2X2 + β3X3 + u, if it is known
   a priori that β3 = 0.1β2, then
   Y = β1 + β2X2 + 0.1β2X3 + u = β1 + β2(X2 + 0.1X3) + u = β1 + β2Z + u.
2. Combine cross-sectional and time-series data.
3. Drop a variable(s) and re-specify the regression.
4. Transform the variables:
   (i) First-difference form.
   (ii) Ratio transformation:  Y/X3 = β1(1/X3) + β2(X2/X3) + β3 + u'.
5. Obtain additional or new data.
6. Reduce collinearity in polynomial regressions, e.g.
   Y = β1 + β2X2² + β3X3 + u'  or  Y = β1 + β2X2 + β3X3² + u'.
7. Do nothing (if the objective is only prediction).
(i) First-difference form

The basic idea of this method: if the relation holds at time t, it must also
hold at time t−1. Take the first difference of the regression and run the new
regression, which is no longer on the original variables in levels.

The first-difference regression model often reduces the severity of
multicollinearity because, even when the levels of the regressors are highly
correlated, there is no a priori reason to believe that their differences
will also be highly correlated.
(i) First-difference form

• At time t:      Yt = β1 + β2X2t + β3X3t + ut
• At time t−1:    Yt−1 = β1 + β2X2,t−1 + β3X3,t−1 + ut−1
• First difference:

    Yt − Yt−1 = β2(X2t − X2,t−1) + β3(X3t − X3,t−1) + vt,   where vt = ut − ut−1

However, this method creates some problems: the new error term vt may not
satisfy one of the assumptions of the classical linear regression model
(CLRM), since vt is likely to be serially correlated.
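A quick sketch of the first-difference transformation (Python with pandas and statsmodels; the series are hypothetical and generated only to illustrate the mechanics):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 40
# Hypothetical trending series: X2 and X3 are highly correlated in levels
x2 = np.cumsum(rng.normal(1.0, 0.5, n))
x3 = x2 + rng.normal(0.0, 0.2, n)
y = 5 + 2 * x2 + 3 * x3 + rng.normal(0.0, 1.0, n)
df = pd.DataFrame({"Y": y, "X2": x2, "X3": x3})

d = df.diff().dropna()        # first differences: ΔY, ΔX2, ΔX3
# The differenced model is run without an intercept, since the intercept
# cancels when the t-1 equation is subtracted from the t equation.
fd = sm.OLS(d["Y"], d[["X2", "X3"]]).fit()
print(fd.params)
```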
CHAPTER 11

Heteroscedasticity

Heteroscedasticity
• The nature of Heteroscedasticity
• Consequences of Heteroscedasticity
• Detection of Heteroscedasticity
• Remedial measures

Homoscedasticity Case

[Figure: density of Yi plotted against family income (expenditure on the
 vertical axis) at X = 80, 90, 100;  Var(ui) = E(ui²) = σ².]

The probability density functions of Yi at different levels of family
income, Xi, are identical.
Homoscedastic pattern of errors

[Scatter plot of yi against xi: the scattered points spread out quite
 equally across the range of x.]
Heteroscedasticity Case

[Figure: density of Yi plotted against family income (expenditure on the
 vertical axis);  Var(ui) = E(ui²) = σi².]

The variance of Yi increases as family income, Xi, increases.
Heteroscedastic pattern of errors

[Scatter plot of yt against xt: the scattered points spread out quite
 unequally, fanning out as x increases.]
Definition of heteroscedasticity:  Var(ui) = E(ui²) = σi², not constant σ²

Two-variable regression:  Y = β1 + β2X + ui

    β̂2 = Σxy / Σx² = Σki Yi = Σki (β1 + β2Xi + ui),
    where ki = xi / Σxi²,  Σki = 0,  ΣkiXi = 1

    =>  β̂2 = β2 + Σki ui,   so E(β̂2) = β2  (still unbiased)

    Var(β̂2) = E(β̂2 − β2)² = E(Σki ui)²
             = E(k1²u1² + k2²u2² + … + 2k1k2u1u2 + …)
             = k1²σ1² + k2²σ2² + … + 0 + …
             = Σki²σi²

If σ1² ≠ σ2² ≠ σ3² ≠ … (heteroscedasticity):

    Var(β̂2) = Σxi²σi² / (Σxi²)²

If σ1² = σ2² = σ3² = … = σ² (homoscedasticity):

    Var(β̂2) = σ² / Σxi²
Consequences of heteroscedasticity

1. OLS estimators are still linear and unbiased.
2. Var(β̂i)'s are no longer minimum => not the best => not efficient
   => not BLUE.
3. Var(β̂2) = Σxi²σi² / (Σxi²)² instead of Var(β̂2) = σ² / Σxi²
   (two-variable case, Y = β0 + β1X + u).
4. σ̂² = Σûi² / (n − k) is biased: E(σ̂²) ≠ σ², so SEE = σ̂ and RSS = Σûi²
   cannot be minimum.
5. The t and F statistics are unreliable.
Detection of Heteroscedasticity

Informal and formal methods
Informal Method

1. Graphical method:
Plot the estimated residuals (ûi) or their squares (ûi²) against the
predicted dependent variable (Ŷi) or any independent variable (Xi).

Observe whether the graph shows a systematic pattern, e.g. û² fanning out as
Ŷ increases. If so, heteroscedasticity exists.
GRAPHICAL METHOD

[Figure: six plots of û² against Ŷ. A random, patternless scatter indicates
 no heteroscedasticity; fan shapes, cones, and other systematic patterns
 indicate heteroscedasticity.]
Formal Methods

1- Park Test
2- Goldfeld – Quandt Test
3- White’s Heteroscedasticity Test
- No cross terms
- With cross terms

1- PARK TEST

H0: no heteroscedasticity, i.e. Var(ui) = σ²  (homoscedasticity)
H1: heteroscedasticity exists, i.e. Var(ui) = σi²

Park test procedure:
1. Run OLS on the regression Yi = β1 + β2Xi + ui and obtain the residuals ûi.
2. Square the residuals and take logs: ln(ûi²).
3. Run OLS on the regression ln(ûi²) = β1* + β2* ln Xi + vi,
   where Xi is the variable suspected of causing the heteroscedasticity.
4. Use a t-test of H0: β2* = 0 (homoscedasticity).
   If |t*| > tc  ==> reject H0  ==> heteroscedasticity exists.
   If |t*| < tc  ==> do not reject H0  ==> homoscedasticity.
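A minimal sketch of the Park test procedure (Python with statsmodels; the data are hypothetical, generated so that the error variance grows with X):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x, n)     # error s.d. grows with x

# Step 1: original regression, obtain residuals
u_hat = sm.OLS(y, sm.add_constant(x)).fit().resid

# Steps 2-3: regress ln(u^2) on ln(X)
park = sm.OLS(np.log(u_hat ** 2), sm.add_constant(np.log(x))).fit()

# Step 4: t-test on the slope; a significant slope signals heteroscedasticity
print(park.tvalues[1], park.pvalues[1])
```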


Example: Studenmund (2001), Equation 10.24, pp. 370

Procedure 1: [OLS regression output]

[Residual plot: check whether the residuals spread out.]

Procedure 2: obtain the residuals from the previous regression, square them
and take logs.
Procedures 3 & 4

    ln(ûi²) = β1* + β2* ln Xi + vi

If |t| > tc  =>  reject H0  =>  heteroscedasticity.
2- The Goldfeld-Quandt Test

H0: homoscedasticity, Var(ui) = σ²
H1: heteroscedasticity, Var(ui) = σi²

Goldfeld-Quandt test procedure:
(1) Order or rank the observations according to the values of Xi, beginning
    with the lowest X value.
(2) Omit c central observations, where c is specified a priori, and divide
    the remaining (n − c) observations into two groups, each of (n − c)/2
    observations.
(3) Run separate regressions on the two sub-samples and obtain the
    respective RSS1 and RSS2. Each RSS has [(n − c)/2 − k] df.
(4) Compute the ratio  λ = (RSS2/df) / (RSS1/df).
(5) Compare λ with Fc: if λ > Fc(0.05; (n−c)/2 − k, (n−c)/2 − k), reject H0.
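A sketch of the Goldfeld-Quandt procedure written out step by step (Python with statsmodels and scipy on hypothetical data; statsmodels also ships a ready-made het_goldfeldquandt function, the hand-rolled version below simply mirrors the slide's steps):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
n, c, k = 30, 4, 2                     # observations, omitted middle, params
x = np.sort(rng.uniform(1, 10, n))     # step 1: order the data by X
y = 1 + 2 * x + rng.normal(0, 0.4 * x, n)

m = (n - c) // 2                       # step 2: drop c central observations
lo, hi = slice(0, m), slice(n - m, n)

def rss(xs, ys):                       # step 3: sub-sample regressions
    return sm.OLS(ys, sm.add_constant(xs)).fit().ssr

df = m - k
lam = (rss(x[hi], y[hi]) / df) / (rss(x[lo], y[lo]) / df)   # step 4
f_crit = stats.f.ppf(0.95, df, df)                          # step 5
print(f"lambda = {lam:.2f}, F_crit = {f_crit:.2f}, reject H0: {lam > f_crit}")
```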
Gujarati (2003), Table 11.3: re-order the data by X.

- Run the regression on the first group of data (n1 = 13) and obtain RSS1.
- Omit the central observations (c = 4).
- Run the regression on the second group of data (n2 = 13) and obtain RSS2.

    λ = (RSS2/df) / (RSS1/df)  >  Fc ?
3.1- White's heteroscedasticity test (no cross terms) -- an LM test

H0: homoscedasticity, Var(ui) = σ²
H1: heteroscedasticity, Var(ui) = σi²

Test procedure:
(1) Run OLS on the regression Yi = β1 + β2X2i + β3X3i + … + βqXqi + ui and
    obtain the residuals ûi.
(2) Run the auxiliary regression
    ûi² = α1 + α2X2i + … + αqXqi + αq+1X2i² + … + α2q−1Xqi² + vi
    H0: all the slope coefficients α2, α3, … are zero.
(3) Compute W (or LM) = nR², which is asymptotically distributed as χ²(df).
(4) Compare W with χ²(df), where df is the number of regressors in (2),
    excluding the intercept. If W > χ²(df), reject H0.
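A minimal sketch of the no-cross-terms version, with the auxiliary regression built by hand (Python with statsmodels and scipy on hypothetical data):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
n = 150
x2, x3 = rng.uniform(1, 5, n), rng.uniform(1, 5, n)
y = 1 + 2 * x2 + 3 * x3 + rng.normal(0, x2, n)      # variance depends on x2

# (1) original regression, squared residuals
X = sm.add_constant(np.column_stack([x2, x3]))
u2 = sm.OLS(y, X).fit().resid ** 2

# (2) auxiliary regression on levels and squares (no cross terms)
Z = sm.add_constant(np.column_stack([x2, x3, x2**2, x3**2]))
aux = sm.OLS(u2, Z).fit()

# (3)-(4) W = n*R^2 ~ chi-square(df = number of regressors, excl. constant)
W = n * aux.rsquared
df = Z.shape[1] - 1
print(W, stats.chi2.ppf(0.95, df), W > stats.chi2.ppf(0.95, df))
```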
Example:  Yi = β1 + β2X2i + β3X3i + β4X4i + ui

[OLS output and White-test auxiliary regression output for the linear model]
W = nR²

χ²(0.05, 6) = 12.59
χ²(0.10, 6) = 10.64

W > χ²  =>  reject H0

The White test for the linear model: the test statistic (nR²) indicates that
heteroscedasticity exists.
2(0.05, 6) = 12.59
W< not reject Ho
2(0.10, 6) = 10.64

After transform the linear model to a log-log model.

The White test for a log-log model of the same data


43
The W-statistic (nR2) indicates heteroscedasticity is not existed
3.2- White's general heteroscedasticity test (with cross terms)

H0: homoscedasticity, Var(ui) = σ²
H1: heteroscedasticity, Var(ui) = σi²

Test procedure:
(1) Run OLS on the regression Yi = β1 + β2X2i + β3X3i + ui and obtain the
    residuals ûi.
(2) Run the auxiliary regression
    ûi² = α1 + α2X2i + α3X3i + α4X2i² + α5X3i² + α6X2iX3i + vi
(3) Compute W (or LM) = nR², asymptotically distributed as χ²(df).
(4) Compare W with χ²(df), where df is the number of regressors in (2),
    excluding the intercept. If W > χ²(df), reject H0.
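For the general version with cross terms, statsmodels provides het_white, which builds the auxiliary regression (levels, squares, and cross products) automatically; a minimal sketch on hypothetical data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(6)
n = 150
x2, x3 = rng.uniform(1, 5, n), rng.uniform(1, 5, n)
y = 1 + 2 * x2 + 3 * x3 + rng.normal(0, x2 * x3, n)   # heteroscedastic errors

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

# het_white regresses the squared residuals on levels, squares and cross
# products of the regressors and returns the LM (nR^2) statistic.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
print(lm_stat, lm_pvalue)       # small p-value => reject homoscedasticity
```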
Example 8.4 (Wooldridge, pp. 258)

W = nR²

χ²(0.05, 9) = 16.92
χ²(0.10, 9) = 14.68

W > χ²  =>  reject H0

The White test for the linear model: the test statistic indicates that
heteroscedasticity exists.
For the log-log model:

χ²(0.05, 9) = 16.92
χ²(0.10, 9) = 14.68

W < χ²  =>  do not reject H0

The White test for the log-log model: the test statistic indicates that
heteroscedasticity no longer exists.
Remedial Measures

• When σi² is known: the method of weighted least squares (WLS).
• When σi² is unknown:
  - Assumption 1
  - Assumption 2
  - Assumption 3
  - Assumption 4
General Meaning of Weighted Least Squares (WLS)

Suppose:  Y = β1 + β2X2 + β3X3 + ui
          E(ui) = 0,  E(uiuj) = 0 for i ≠ j
          Var(ui) = σi² = σ²Zi²  (for example Zi² a function of X2,
          or Zi² = E(Yi)²; one may also have Var(ui) = σ²Zi)

If all Zi = 1 (or any constant), homoscedasticity returns. But Zi can take
any value: it is the proportionality factor.

When σi² is known, to correct the heteroscedasticity, transform the
regression by dividing each term by Zi:

    Yi/Zi = β1(1/Zi) + β2(X2i/Zi) + β3(X3i/Zi) + ui/Zi

    =>  Y* = β1X1* + β2X2* + β3X3* + ui*
Theoretically, in the transformed equation:

(i)   E(ui/Zi) = (1/Zi) E(ui) = 0

(ii)  E(ui/Zi)² = (1/Zi²) E(ui²) = (1/Zi²) σ²Zi² = σ²

(iii) E(uiuj / ZiZj) = (1/ZiZj) E(uiuj) = 0   for i ≠ j

These three results satisfy the assumptions of classical OLS. Therefore,
using Zi as the weight (dividing each term by Zi) corrects the problem of
heteroscedasticity and yields BLUE estimators.

How do we determine which form of Zi to use as the weight? See the
assumptions below.
When σi² is known: the method of weighted least squares

• Consider the 2-variable model:  Yi = β1 + β2X2i + ui
• Divide both sides of the model by σi to obtain:

    Yi/σi = β1(1/σi) + β2(X2i/σi) + ui/σi

• In the transformed model the error term is ui/σi, so its variance is

    var(ui/σi) = (1/σi²) var(ui) = (1/σi²) σi² = 1 = constant
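A minimal sketch of weighted least squares (Python with statsmodels on hypothetical data; here the error standard deviations σi are assumed known, which in practice they rarely are):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 100
x = rng.uniform(1, 10, n)
sigma_i = 0.5 * x                         # assume the error s.d. is known
y = 2 + 3 * x + rng.normal(0, sigma_i, n)

X = sm.add_constant(x)
# statsmodels' WLS expects weights proportional to 1/sigma_i^2,
# which is equivalent to dividing each observation by sigma_i.
wls = sm.WLS(y, X, weights=1.0 / sigma_i**2).fit()
ols = sm.OLS(y, X).fit()
print(wls.params, wls.bse)   # typically smaller standard errors than OLS
print(ols.params, ols.bse)
```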
When σi² is not known

• Consider the two-variable model:  Yi = β1 + β2Xi + ui
• We now consider several assumptions about the pattern of
  heteroscedasticity.

• Assumption 1: the error variance is proportional to Xi²:

      E(ui²) = σi² = σ²Xi²

  - Divide both sides of the original model by Xi to obtain

      Yi/Xi = β1(1/Xi) + β2 + ui/Xi

  - The transformed disturbance term has constant variance:

      var(ui/Xi) = (1/Xi²) var(ui) = (1/Xi²) σ²Xi² = σ² = constant
If the plot of residuals against X3 shows û² spreading out in a nonlinear
"trumpet" pattern, the variance appears to increase in proportion to X3i².
Therefore we might expect σi² = σ²Zi² with Zi² = X3i², i.e. Zi = X3i.

Hence the transformed equation becomes

    Yi/X3i = β1(1/X3i) + β2(X2i/X3i) + β3 + ui/X3i

    =>  Yi* = β1X1* + β2X2* + β3 + ui*

where β3 becomes the intercept coefficient and ui* satisfies the assumptions
of classical OLS.
Example: Studenmund (2001), Eq. 10.27, pp.373

When σi² is not known

• Assumption 2: the error variance is proportional to Xi
  (the square-root transformation):

      E(ui²) = σi² = σ²Xi

  - Divide both sides of the original model by √Xi to obtain

      Yi/√Xi = β1(1/√Xi) + β2√Xi + ui/√Xi

  - The variance of the transformed error term is constant:

      var(ui/√Xi) = (1/Xi) var(ui) = (1/Xi) σ²Xi = σ² = constant
If the plot of residuals against X3 shows û² spreading out in a linear "cone"
pattern, the variance appears to increase in proportion to X3i.
Therefore we might expect σi² = σ²hi² with hi² = X3i, i.e. hi = √X3i.

The transformed equation is

    Yi/√X3i = β1(1/√X3i) + β2(X2i/√X3i) + β3√X3i + ui/√X3i

    =>  Yi* = β1X1* + β2X2* + β3X3* + ui*


Simple OLS result (Gujarati (2003), Example 11.10, pp. 424):

R&D = 192.99 + 0.0319 Sales      SEE = 2759
t =   (0.194)   (3.830)          R² = 0.478    C.V. = 0.9026
White test for heteroscedasticity:

W < χ²(0.05, 2) = 5.9914  =>  do not reject H0 at the 5% level
W > χ²(0.10, 2) = 4.6051  =>  reject H0 at the 10% level
[Residual plot: a bell-shaped pattern of residuals.]
Transformation equations:

1. Weight by √Xi:
   (Yi/√Xi) = -246.67 (1/√Xi) + 0.0367 √Xi        C.V. = 0.8195
                (-0.64)          (5.17)

   => (1)  R&D = -246.67 + 0.036 Sales       SEE = 7.25    R² = 0.3648
               (-0.64)     (5.17)

2. Weight by Xi:
   (Yi/Xi) = β1 (1/Xi) + β2

   => (2)  R&D = -243.49 + 0.0366 Sales      SEE = 0.021   R² = 0.168
               (-1.79)     (5.52)                          C.V. = 0.7467

Compare the C.V. of the two transformed regressions to determine which
weight is appropriate.
[Residual plot for the √Xi-weighted regression:
 Ŷi/√Xi = β1(1/√Xi) + β2√Xi,   C.V. = 0.8195]

After transforming by √Xi, the residuals still spread out widely.
[Residual plot for the Xi-weighted regression:
 Ŷi/Xi = β1(1/Xi) + β2,   C.V. = 0.7467]

After transforming by Xi, the residuals are more stable.
When σi² is not known

• Assumption 3: the error variance is proportional to the square of the mean
  value of Y:

      E(ui²) = σi² = σ² [E(Yi)]²

  - As in the previous two cases, divide both sides of the original model by
    E(Yi):

      Yi/E(Yi) = β1(1/E(Yi)) + β2(Xi/E(Yi)) + ui/E(Yi)

  - Similarly, the transformed error term has constant variance.
  - The problem here is that E(Yi) = β1 + β2Xi depends on the coefficients,
    which are unknown. Therefore we may first estimate Ŷi = β̂1 + β̂2Xi, which
    is an estimator of E(Yi), and then divide both sides of the original
    model by this estimated value Ŷi.
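A minimal sketch of the two-step procedure under Assumption 3 (Python with statsmodels on hypothetical data): first estimate Ŷi by OLS, then use it as the deflator, i.e. weights 1/Ŷi²:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 100
x = rng.uniform(1, 10, n)
mean_y = 2 + 3 * x
y = mean_y + rng.normal(0, 0.2 * mean_y, n)   # error s.d. proportional to E(Y)

X = sm.add_constant(x)
y_hat = sm.OLS(y, X).fit().fittedvalues       # step 1: estimate E(Yi)

# step 2: weight by 1 / y_hat^2 (equivalent to dividing the model by y_hat)
fwls = sm.WLS(y, X, weights=1.0 / y_hat**2).fit()
print(fwls.params, fwls.bse)
```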
When σi² is not known

• Assumption 4: a log transformation such as

      ln Yi = β1 + β2 ln Xi + ui

  very often reduces heteroscedasticity compared with the original
  regression.
  - Why? Because the log transformation compresses the scales in which the
    variables are measured, thereby reducing the differences among values.
  - The log transformation also lets the slope coefficient measure the
    elasticity of Y with respect to X, which is often more meaningful.
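A small sketch comparing the White test before and after a log transformation (Python with statsmodels; the data are hypothetical, so the outcome merely illustrates the idea):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(9)
n = 200
x = rng.uniform(1, 20, n)
y = np.exp(0.5 + 0.8 * np.log(x) + rng.normal(0, 0.3, n))  # multiplicative error

for label, yy, xx in [("linear", y, x), ("log-log", np.log(y), np.log(x))]:
    X = sm.add_constant(xx)
    res = sm.OLS(yy, X).fit()
    lm, pval, _, _ = het_white(res.resid, X)
    print(f"{label:8s}  W = {lm:6.2f}   p-value = {pval:.3f}")
```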
Refer to Studenmund (2001), Eq. (10.31), pp. 375.

An alternative remedy for heteroscedasticity: a weighted log-log model.

W < χ²(0.05, 5) = 11.07  =>  do not reject H0

Concluding Example
[Scatter plot of the corruption index (1 to 10) against GDP per capita
 (PERGDP, 0 to 60,000) for 133 countries.]
2(0.05, 2)= 5.9914
2(0.10, 2)= 4.60517

Since W* > 2  reject Ho


Heteroscedasticity is existed

72
Change the functional form to a log-linear model.
2(0.05, 2)= 5.9914
2(0.10, 2)= 4.60517

W* < 2(0.05,2) not reject Ho

It means there is no more heterscedasticity


problem after the change of functional
Form.

74
THE END
