LECTURE 5
MULTICOLLINEARITY AND HETEROSKEDASTICITY
Your Questions!?
CHAPTER 10
Multicollinearity
Multicollinearity
• The nature of Multicollinearity
• Consequences of Multicollinearity
• Detection of Multicollinearity
• Remedial measures
The nature of multicollinearity
In general: multicollinearity arises when there is a linear relationship among the independent variables, that is, when
$\lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_k X_k = 0$
for constants $\lambda_i$ that are not all zero.
Large variance and covariance of OLS estimators

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma_u^2}{\sum x_2^2\,(1 - r_{23}^2)} = \frac{\sigma_u^2}{\sum x_2^2}\cdot \text{VIF}$$

Variance-inflating factor: $\text{VIF} = \dfrac{1}{1 - r_{23}^2}$

Higher pair-wise correlation → higher VIF → larger variance,

where $r_{23}$ is obtained from the auxiliary OLS regression $X_2 = \alpha_1 + \alpha_2 X_3 + v$.

More generally, for the $j$-th regressor,

$$\operatorname{var}(\hat\beta_j) = \frac{\sigma_u^2}{\sum x_j^2\,(1 - R_j^2)} = \frac{\sigma_u^2}{\sum x_j^2}\cdot \text{VIF}_j$$
Consequences of imperfect multicollinearity
(Example regression output: the overall F-statistic is significant while the individual t-ratios are insignificant, a typical symptom of multicollinearity.)
V-I-F
• Some authors use the VIF as an indicator of multicollinearity: the larger the value of VIF_j, the more "troublesome" or collinear the variable X_j.
• According to this rule of thumb, if the VIF of a variable exceeds 10 (which happens when R_j² exceeds 0.9), that variable is said to be highly collinear.
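As a rough illustration of the VIF rule above (not part of the original slides), the following Python sketch computes VIF_j for each regressor with statsmodels; the data and the variable names X2, X3 are purely hypothetical.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Hypothetical data in which X3 is almost a linear function of X2,
    # so both VIFs should come out large.
    rng = np.random.default_rng(0)
    n = 100
    X2 = rng.normal(size=n)
    X3 = 2 * X2 + rng.normal(scale=0.1, size=n)
    X = sm.add_constant(pd.DataFrame({"X2": X2, "X3": X3}))

    # VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other regressors.
    for j, name in enumerate(X.columns):
        if name == "const":
            continue
        print(name, variance_inflation_factor(X.values, j))

A VIF above 10 (R_j² above 0.9) would flag the regressor as highly collinear under the rule of thumb above.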
Remedial Measures

Suppose the model is $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$.

1. Utilise a priori information
Given the prior restriction $\beta_3 = 0.1\,\beta_2$:
$$Y = \beta_1 + \beta_2 X_2 + 0.1\,\beta_2 X_3 + u = \beta_1 + \beta_2 (X_2 + 0.1 X_3) + u = \beta_1 + \beta_2 Z + u,$$
where $Z = X_2 + 0.1 X_3$.

2. Combining cross-sectional and time-series data

3. Dropping a variable(s) and re-specifying the regression
(i) First-difference form
• At time t: $Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t$
• At time t−1: $Y_{t-1} = \beta_1 + \beta_2 X_{2,t-1} + \beta_3 X_{3,t-1} + u_{t-1}$
• First difference: $Y_t - Y_{t-1} = \beta_2 (X_{2t} - X_{2,t-1}) + \beta_3 (X_{3t} - X_{3,t-1}) + v_t$
• where $v_t = u_t - u_{t-1}$
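A minimal pandas/statsmodels sketch of this first-difference remedy (not from the slides); the DataFrame df and the column names Y, X2, X3 are hypothetical, and the rows are assumed to be ordered by time.

    import pandas as pd
    import statsmodels.api as sm

    def first_difference_ols(df: pd.DataFrame):
        # Difference every series; the intercept beta_1 drops out of the differenced model.
        d = df[["Y", "X2", "X3"]].diff().dropna()
        # Regress the differenced Y on the differenced regressors without a constant.
        return sm.OLS(d["Y"], d[["X2", "X3"]]).fit()

Note that the differenced error v_t = u_t − u_{t−1} is serially correlated by construction, which is one drawback of this remedy.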
Heteroscedasticity
Heteroscedasticity
• The nature of Heteroscedasticity
• Consequences of Heteroscedasticity
• Detection of Heteroscedasticity
• Remedial measures
Homoscedasticity Case
(Figure: densities of expenditure $Y_i$ at income levels $x_1, x_2, x_3, \dots$; the spread is constant, $\operatorname{Var}(u_i) = E(u_i^2) = \sigma^2$.)

Heteroscedasticity Case
(Figure: densities of expenditure $Y_i$ against income; the spread widens as income rises, $\operatorname{Var}(u_i) = E(u_i^2) = \sigma_i^2$.)
The variance of $Y_i$ increases as family income, $X_i$, increases.
Heteroscedastic pattern of errors
(Scatter plot: the spread of the observations around the regression line widens as $y_t$ increases, a heteroscedastic pattern.)
Two-variable regression: $Y_i = \beta_1 + \beta_2 X_i + u_i$

$$\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \sum k_i Y_i = \sum k_i(\beta_1 + \beta_2 X_i + u_i), \qquad \sum k_i = 0,\ \ \sum k_i X_i = 1$$

$$\Rightarrow\ \hat\beta_2 = \beta_2 + \sum k_i u_i, \qquad E(\hat\beta_2) = \beta_2 \ \text{(still unbiased)}$$

However, under heteroscedasticity the variance of $\hat\beta_2$ can no longer be the minimum: OLS remains unbiased but is not efficient (not BLUE).
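To spell out the last step (a standard result, not shown on the slide): with $k_i = x_i / \sum x_i^2$,

$$\operatorname{var}(\hat\beta_2) = \operatorname{var}\Big(\sum_i k_i u_i\Big) = \sum_i k_i^2\,\sigma_i^2,$$

which reduces to $\sigma^2/\sum x_i^2$ only if $\sigma_i^2 = \sigma^2$ for all $i$. Under heteroscedasticity the usual OLS variance formula is therefore wrong, and OLS is no longer the minimum-variance estimator.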
Detection of Heteroscedasticity
Informal and formal methods
Informal Method
1. Graphical method:
Plot the estimated residuals ($\hat u_i$) or the squared residuals ($\hat u_i^2$) against the predicted dependent variable ($\hat Y_i$) or against any independent variable ($X_i$).
(Figure: $\hat u^2$ plotted against $\hat Y$; the spread widens with $\hat Y$, so heteroscedasticity exists.)
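A minimal Python sketch of this graphical check (not from the slides), assuming a fitted statsmodels OLS result called results (a hypothetical name):

    import matplotlib.pyplot as plt

    def plot_squared_residuals(results):
        # A fan or trumpet shape that widens with the fitted values suggests heteroscedasticity.
        plt.scatter(results.fittedvalues, results.resid ** 2)
        plt.xlabel("fitted Y")
        plt.ylabel("squared residual")
        plt.show()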
GRAPHICAL METHOD
(Figure: six panels of $\hat u^2$ plotted against $\hat Y$. The first panel shows no systematic pattern, i.e. no heteroscedasticity; the other five show systematic patterns, indicating heteroscedasticity.)
Formal Methods
1- Park Test
2- Goldfeld-Quandt Test
3- White's Heteroscedasticity Test
- No cross terms
- With cross terms
1- PARK TEST
H0 : No heteroscedasticity exists, i.e., Var(u_i) = σ² (homoscedasticity)
H1 : Heteroscedasticity exists, i.e., Var(u_i) = σ_i²

Park test procedures:
1. Run OLS on the regression $Y_i = \beta_1 + \beta_2 X_i + u_i$ and obtain $\hat u_i$.
2. Square the residuals and take logs: $\ln(\hat u_i^2)$.

Procedure 1
Check whether the residuals spread out (fan out) as the explanatory variable increases.
Procedure 2:
Obtain the residuals from the previous regression, square them and take logs.
Procedures 3 & 4:
Regress $\ln(\hat u_i^2)$ on $\ln X_i$ and test the significance of the slope coefficient; a statistically significant slope indicates heteroscedasticity.
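A minimal Python sketch of these four Park-test steps (not from the slides); the arrays y and x are hypothetical, and x is assumed to be strictly positive so that its log is defined.

    import numpy as np
    import statsmodels.api as sm

    def park_test(y, x):
        # Step 1: original regression, keep the residuals.
        step1 = sm.OLS(y, sm.add_constant(x)).fit()
        # Step 2: square the residuals and take logs.
        ln_u2 = np.log(step1.resid ** 2)
        # Steps 3-4: regress ln(u^2) on ln(x); a significant slope suggests heteroscedasticity.
        step2 = sm.OLS(ln_u2, sm.add_constant(np.log(x))).fit()
        return step2.params[1], step2.pvalues[1]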
2- The Goldfeld-Quandt Test
H0 : homoscedasticity, Var(u_i) = σ²
H1 : heteroscedasticity, Var(u_i) = σ_i²
Goldfeld-Quandt Test procedures:
(1) Order or rank the observations according to the values of Xi,
beginning with the lowest X value.
(2) Omit c central observations, where c is specified a priori, and
divide the remaining (n−c) observations into two groups each of
(n-c)/2 observations.
(3) Run separate regressions on the two sub-samples and obtain the
respective RSS1 and RSS2. Each RSS has [(n−c)/2 − k] df.
(4) Compute the λ-ratio:
$$\lambda = \frac{RSS_2 / df}{RSS_1 / df}$$
(5) Compare λ with the critical value $F_c$ with $[(n-c)/2 - k]$ df in both the numerator and the denominator; if λ > F_c (at, say, the 5% level), reject H0.
Gujarati (2003), Table 11.3: re-order the data, then compute
$$\lambda = \frac{RSS_2 / df}{RSS_1 / df} \;\overset{?}{>}\; F_c$$
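statsmodels implements this test directly; a minimal sketch (not from the slides), assuming hypothetical arrays y and X, where X already contains the constant:

    from statsmodels.stats.diagnostic import het_goldfeldquandt

    def goldfeld_quandt(y, X, sort_col=1, drop_frac=0.2):
        # Sorts the observations by X[:, sort_col], omits the middle drop_frac share,
        # fits separate regressions on the two halves, and returns the ratio of the
        # two RSS/df values as an F statistic.
        f_stat, p_value, _ = het_goldfeldquandt(y, X, idx=sort_col, drop=drop_frac,
                                                alternative="increasing")
        return f_stat, p_value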
3.1- White's heteroscedasticity test (no cross terms) (LM test)
H0 : homoscedasticity, Var(u_i) = σ²
H1 : heteroscedasticity, Var(u_i) = σ_i²

Test procedures:
(1) Run OLS on the regression $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_q X_{qi} + u_i$ and obtain the residuals $\hat u_i$.
(2) Run the auxiliary regression:
$$\hat u_i^2 = \alpha_1 + \alpha_2 X_{2i} + \dots + \alpha_q X_{qi} + \alpha_{q+1} X_{2i}^2 + \dots + \alpha_{2q-1} X_{qi}^2 + v_i$$
H0: $\alpha_2 = \alpha_3 = \dots = \alpha_{2q-1} = 0$
(3) Compute $W$ (or LM) $= nR^2 \overset{asy}{\sim} \chi^2_{df}$.
(4) Compare W with $\chi^2_{df}$ (where df is the number of regressors in (2), excluding the constant); if $W > \chi^2_{df}$, reject H0.
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$
W =
χ²(0.05, 6) = 12.59
χ²(0.10, 6) = 10.64
W > χ²  ⇒  reject H0
χ²(0.05, 6) = 12.59
χ²(0.10, 6) = 10.64
W < χ²  ⇒  do not reject H0
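For comparison, statsmodels has a built-in version of White's test; a minimal sketch (not from the slides), with hypothetical y and X (X including the constant). Note that het_white also adds the cross terms, so it corresponds to the with-cross-terms variant discussed next; the no-cross-terms auxiliary regression can be built by hand in the same way as the sketch further below.

    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_white

    def white_test(y, X):
        # het_white builds the auxiliary regression (levels, squares and cross products) itself.
        results = sm.OLS(y, X).fit()
        lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(results.resid, X)
        # lm_stat is W = n * R^2 from the auxiliary regression; a small p-value rejects H0.
        return lm_stat, lm_pvalue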
3.2- White's heteroscedasticity test (with cross terms)
H0 : homoscedasticity, Var(u_i) = σ²
H1 : heteroscedasticity, Var(u_i) = σ_i²

Test procedures:
(1) Run OLS on the regression $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ and obtain the residuals $\hat u_i$.
(2) Run the auxiliary regression:
$$\hat u_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + v_i$$
W =
χ²(0.05, 9) = 16.92
χ²(0.10, 9) = 14.68
W > χ²  ⇒  reject H0
For the log-log model:
χ²(0.05, 9) = 16.92
χ²(0.10, 9) = 14.68
W < χ²  ⇒  do not reject H0
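To mirror the slide's hand computation, a sketch (not from the slides) that builds the cross-terms auxiliary regression and W = nR² explicitly; the arrays y, x2, x3 are hypothetical.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    def white_lm_manual(y, x2, x3):
        # (1) Original regression and its residuals.
        X = sm.add_constant(np.column_stack([x2, x3]))
        u = sm.OLS(y, X).fit().resid
        # (2) Auxiliary regression of u^2 on levels, squares and the cross term.
        Z = sm.add_constant(np.column_stack([x2, x3, x2 ** 2, x3 ** 2, x2 * x3]))
        aux = sm.OLS(u ** 2, Z).fit()
        # (3) W = n * R^2, asymptotically chi-squared with df = number of auxiliary regressors.
        W = aux.nobs * aux.rsquared
        df = Z.shape[1] - 1
        return W, chi2.ppf(0.95, df)  # compare W with the 5% critical value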
Remedial Measures
• When σ_i² is known: the method of weighted least squares (WLS).
• When σ_i² is unknown:
- Assumption 1
- Assumption 2
- Assumption 3
- Assumption 4
General Meaning of Weighted Least Squares (WLS)
Suppose: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, with $E(u_i) = 0$, $E(u_i u_j) = 0$ for $i \neq j$, and $\operatorname{Var}(u_i) = \sigma_i^2 = \sigma^2 Z_i^2$. Dividing the model through by $Z_i$ gives a transformed error $u_i / Z_i$ for which:

(i) $E\!\left(\dfrac{u_i}{Z_i}\right) = \dfrac{1}{Z_i}\,E(u_i) = 0$

(ii) $E\!\left(\dfrac{u_i}{Z_i}\right)^{2} = \dfrac{1}{Z_i^2}\,E(u_i^2) = \dfrac{1}{Z_i^2}\,\sigma^2 Z_i^2 = \sigma^2$

(iii) $E\!\left(\dfrac{u_i u_j}{Z_i Z_j}\right) = \dfrac{1}{Z_i Z_j}\,E(u_i u_j) = 0$
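A minimal statsmodels sketch of WLS under the assumption Var(u_i) = σ²Z_i² (not from the slides; y, X and Z are hypothetical, with X including the constant). statsmodels expects weights proportional to 1/σ_i², i.e. 1/Z_i² here, which is equivalent to dividing the whole equation by Z_i.

    import statsmodels.api as sm

    def wls_fit(y, X, Z):
        # Weight each observation by 1/Z_i^2, i.e. divide the model through by Z_i.
        return sm.WLS(y, X, weights=1.0 / Z ** 2).fit()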
(Residual plot against $X_3$: the scatter spreads out in a nonlinear trumpet pattern.)
This is the case of Assumption 1: the plot suggests that the variance is increasing in proportion to $X_{3i}^2$.
Therefore, we might expect $\sigma_i^2 = \sigma^2 Z_i^2$ with $Z_i^2 = X_{3i}^2$, i.e., $Z_i = X_{3i}$.
Hence, the transformed equation becomes
$$\frac{Y_i}{X_{3i}} = \beta_1 \frac{1}{X_{3i}} + \beta_2 \frac{X_{2i}}{X_{3i}} + \beta_3 + \frac{u_i}{X_{3i}}$$
This becomes
$$Y_i^* = \beta_1 X_{1i}^* + \beta_2 X_{2i}^* + \beta_3 + u_i^*,$$
where $\beta_3$ is now the intercept coefficient and the transformed error $u_i^*$ satisfies the assumptions of classical OLS.
Example: Studenmund (2001), Eq. 10.27, pp.373
When σ_i² is not known: Assumption 2
(Residual plot against $X_3$: the scatter spreads out in a linear cone pattern.)
This plot suggests that the variance is increasing in proportion to $X_{3i}$.
Therefore, we might expect $\sigma_i^2 = \sigma^2 h_i^2$ with $h_i^2 = X_{3i}$, i.e., $h_i = \sqrt{X_{3i}}$.
The transformed equation is
$$\frac{Y_i}{\sqrt{X_{3i}}} = \beta_1 \frac{1}{\sqrt{X_{3i}}} + \beta_2 \frac{X_{2i}}{\sqrt{X_{3i}}} + \beta_3 \sqrt{X_{3i}} + \frac{u_i}{\sqrt{X_{3i}}}$$
White Test for heteroscedasticity
A bell-shaped pattern of residuals:
Transformation equations:
1. $\dfrac{Y_i}{X_i} = \beta_1 \dfrac{1}{X_i} + \beta_2 + \dfrac{u_i}{X_i}$, estimated as
$\hat Y_i / X_i = -246.67\,(1/X_i) + 0.0367$
  (−0.64)  (5.17)    C.V. = 0.8195
⇒ (1) $\widehat{R\&D}_i = -246.67 + 0.036\,\text{Sales}_i$    SEE = 7.25
  (−0.64)  (5.17)    R² = 0.3648
After transformation by @sqrt(x), the residuals still spread out widely.
$$\frac{Y_i}{X_i} = \beta_1 \frac{1}{X_i} + \beta_2$$
C.V. = 0.7467
After transformation by $X_i$, the spread of the residuals is more stable.
When σ_i² is not known: Assumption 3
– Similar to the previous two cases, divide both sides of the original model by $E(Y_i)$:
$$\frac{Y_i}{E(Y_i)} = \beta_1 \frac{1}{E(Y_i)} + \beta_2 \frac{X_i}{E(Y_i)} + \frac{u_i}{E(Y_i)}$$
– As before, the variance of the transformed error term is constant.
– The problem here is that $E(Y_i) = \beta_1 + \beta_2 X_i$ depends on the coefficients, which are unknown. Therefore we may first estimate $\hat Y_i = \hat\beta_1 + \hat\beta_2 X_{2i}$, which is an estimator of $E(Y_i)$, and then divide both sides of the original model by the estimated values $\hat Y_i$.
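A minimal two-step sketch of this idea (feasible WLS using the fitted values as the estimate of E(Y_i); not from the slides, with hypothetical y and X, X including the constant):

    import statsmodels.api as sm

    def fwls_by_fitted_values(y, X):
        # Step 1: OLS to obtain Y_hat, an estimator of E(Y_i).
        y_hat = sm.OLS(y, X).fit().fittedvalues
        # Step 2: weight by 1/Y_hat^2, i.e. divide the original model through by Y_hat.
        # (This assumes the fitted values are bounded away from zero.)
        return sm.WLS(y, X, weights=1.0 / y_hat ** 2).fit()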
When σ_i² is not known
• Assumption 4: a log transformation such as
$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + u_i$$
very often reduces heteroscedasticity compared with the original regression.
– Why? Because the log transformation compresses the scales in which the variables are measured, thereby reducing the differences among values.
– The log transformation also means the slope coefficient measures the elasticity of Y with respect to X, which is often a more useful measure.
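A short sketch of the log transformation followed by a re-check with White's test (not from the slides; y and x2 are hypothetical and assumed strictly positive):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_white

    def loglog_and_recheck(y, x2):
        X = sm.add_constant(np.log(x2))
        res = sm.OLS(np.log(y), X).fit()   # ln Y = b1 + b2 ln X2 + u
        lm_stat, lm_pvalue, _, _ = het_white(res.resid, X)
        # b2 is the elasticity of Y with respect to X2; a large p-value means
        # the White test no longer rejects homoscedasticity.
        return res.params, lm_pvalue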
Refer to Studenmund (2001), Eq. (10.31), pp. 375.
Alternative remedy for heteroscedasticity: weighted log-log model
W < χ²(0.05, 5) = 11.07  ⇒  do not reject H0
Concluding Example
(Scatter plot of per-capita GDP (PERGDP, 0 to 60,000) against the corruption index (CORRUPTION, 1 to 10) for 133 countries.)
χ²(0.05, 2) = 5.9914
χ²(0.10, 2) = 4.60517
Change the functional form to the log-linear model
χ²(0.05, 2) = 5.9914
χ²(0.10, 2) = 4.60517
THE END