
LECTURE 5

MULTICOLLINEARITY AND HETEROSKEDASTICITY
Your Questions!?

1. What is the nature of the problem?
2. What are its consequences?
3. How can it be detected?
4. What are the remedial measures?
CHAPTER 10

Multicollinearity

Multicollinearity
• The nature of Multicollinearity
• Consequences of Multicollinearity
• Detection of Multicollinearity
• Remedial measures

The nature of multicollinearity

In general, multicollinearity exists when there are functional relationships
among the independent variables, that is

    λ1X1 + λ2X2 + λ3X3 + … + λkXk = 0

for constants λi that are not all zero.

For example, λ1X1 + λ2X2 = 0  =>  X1 = -(λ2/λ1) X2

If multicollinearity is perfect, the regression coefficients of the Xi
variables, the βi's, are indeterminate and their standard errors, se(βi)'s,
are infinite.
Example (3-variable case):  Y = β̂1 + β̂2X2 + β̂3X3 + u

    β̂2 = [(Σyx2)(Σx3²) − (Σyx3)(Σx2x3)] / [(Σx2²)(Σx3²) − (Σx2x3)²]

If x3 = λx2, then

    β̂2 = [(Σyx2)(λ²Σx2²) − (λΣyx2)(λΣx2²)] / [(Σx2²)(λ²Σx2²) − λ²(Σx2²)²] = 0/0

    => Indeterminate

Similarly,

    β̂3 = [(Σyx3)(Σx2²) − (Σyx2)(Σx2x3)] / [(Σx2²)(Σx3²) − (Σx2x3)²]

If x3 = λx2, then

    β̂3 = [(λΣyx2)(Σx2²) − (Σyx2)(λΣx2²)] / [(Σx2²)(λ²Σx2²) − λ²(Σx2²)²] = 0/0

    => Indeterminate
If multicollinearity is imperfect,

    x3 = λ2x2 + ε    (or x3 = λ1 + λ2x2 + ε)

where ε is a stochastic error term, then the regression coefficients,
although determinate, possess large standard errors, which means the
coefficients can be estimated but with less accuracy.

    β̂2 = [(Σyx2)(λ2²Σx2² + Σε²) − (λ2Σyx2 + Σyε)(λ2Σx2² + Σx2ε)]
         / [(Σx2²)(λ2²Σx2² + Σε²) − (λ2Σx2² + Σx2ε)²]

which is no longer of the 0/0 form.
Large variance and covariance of OLS estimators

    var(β̂2) = σu² / [Σx2² (1 − r23²)] = (σu² / Σx2²) · VIF

Variance-inflating factor:  VIF = 1 / (1 − r23²)

Higher pair-wise correlation => higher VIF => larger variance.

Here r23² is obtained from the auxiliary OLS regression X2 = α1 + α2X3 + v
(equivalently, r32² from X3 = α1' + α2'X2 + v').

In general, for the k-variable model,

    var(β̂j) = σu² / [Σxj² (1 − Rj²)] = (σu² / Σxj²) · VIFj
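As an illustration, here is a minimal Python sketch (using numpy, pandas, and statsmodels) that computes VIFj for each regressor via the auxiliary regressions; the data and variable names are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical regressors: X3 is nearly a linear function of X2
rng = np.random.default_rng(0)
X = pd.DataFrame({"X2": rng.normal(size=100)})
X["X3"] = 0.9 * X["X2"] + rng.normal(scale=0.3, size=100)

def vif(df):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    X_j on the remaining regressors plus an intercept."""
    out = {}
    for col in df.columns:
        others = sm.add_constant(df.drop(columns=col))
        r2 = sm.OLS(df[col], others).fit().rsquared
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)

print(vif(X))   # VIF > 10 is the usual rule of thumb for serious collinearity
```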
Consequences of imperfect multicollinearity

1. Although the estimated coefficients are still BLUE, the OLS estimators
   have large variances and covariances, making estimation less accurate.
2. The estimated confidence intervals tend to be much wider, leading to
   accepting the "zero null hypothesis" more readily.
3. The t-statistics of the coefficients tend to be statistically
   insignificant.
4. The R² can be very high.
5. The OLS estimators and their standard errors can be sensitive to small
   changes in the data.

(Points 3 and 4 can be detected from the regression results.)
Detection of Multicollinearity

• High R² but few significant t-ratios
• Auxiliary regressions
• Variance-inflation factor (VIF)
• Intercorrelation matrix (EViews 4)
[Example regression output: some coefficients are significant and others
 insignificant because GDP and GNP are highly correlated.]

Other examples of highly correlated pairs: CPI <=> WPI; CD rate <=> TB rate;
M2 <=> M3.
Auxiliary regressions

Basic idea:
Multicollinearity arises because one or more of the regressors are exact or
approximate linear combinations of the other regressors.
→ Regress each Xi on the remaining X variables and compute the
  corresponding R².
→ Each of these regressions is called an auxiliary regression, auxiliary to
  the main regression of Y on the X's.
Auxiliary regressions

Implementation:
• Step 1: run the auxiliary regression for Xi.
• Step 2: compute the corresponding R², designated R²_Xi.
• Step 3: compute the F statistic

    F_stat = [ R²_Xi / (k − 1) ] / [ (1 − R²_Xi) / (n − k) ]

where
- R²_Xi is the coefficient of determination from the regression of Xi on the
  remaining X variables,
- k is the number of explanatory variables in the original regression,
  including the intercept term, and
- n is the number of observations in the original regression.
Auxiliary regressions

• Step 4: compare F_stat with F_crit(k − 1, n − k).

If F_stat > F_crit, the particular Xi is collinear with the other regressors
and may be dropped from the model; otherwise we may retain that variable in
the model.
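A minimal sketch of the auxiliary-regression F test (Python with statsmodels and scipy; the data and variable names are hypothetical, and the F formula follows the slide above):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

# Hypothetical data: X3 is nearly a linear combination of X2
rng = np.random.default_rng(1)
n = 50
X = pd.DataFrame({"X2": rng.normal(size=n)})
X["X3"] = 2.0 * X["X2"] + rng.normal(scale=0.1, size=n)

k = X.shape[1] + 1          # explanatory variables incl. the intercept

for col in X.columns:
    aux = sm.OLS(X[col], sm.add_constant(X.drop(columns=col))).fit()
    r2 = aux.rsquared
    f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))      # slide's F formula
    f_crit = stats.f.ppf(0.95, k - 1, n - k)
    print(f"{col}: R2={r2:.3f}, F={f_stat:.2f}, "
          f"{'collinear' if f_stat > f_crit else 'not collinear'}")
```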
V-I-F

• Some authors use the VIF as an indicator of multicollinearity:
  the larger the value of VIFj, the more "troublesome" or collinear the
  variable Xj.
• According to this rule of thumb, if the VIF of a variable exceeds 10
  (which happens if Rj² exceeds 0.9), that variable is said to be highly
  collinear.
Remedial Measures

1. Utilise a priori information. For Y = β1 + β2X2 + β3X3 + u, if it is known
   a priori that β3 = 0.1β2, then
   Y = β1 + β2X2 + 0.1β2X3 + u = β1 + β2(X2 + 0.1X3) + u = β1 + β2Z + u.
2. Combine cross-sectional and time-series data.
3. Drop a variable(s) and re-specify the regression.
4. Transform the variables:
   (i) First-difference form.
   (ii) Ratio transformation:  Y/X3 = β1(1/X3) + β2(X2/X3) + β3 + u'.
5. Obtain additional or new data.
6. Reduce collinearity in polynomial regressions, e.g.
   Y = β1 + β2X2² + β3X3 + u'  or  Y = β1 + β2X2 + β3X3² + u'.
7. Do nothing (if the objective is only prediction).
(i) First-difference form

The basic idea of this method: if the relation holds at time t, it must also
hold at time t−1. Take the first difference of the regression and run the new
regression, which is no longer on the original variables in levels.

The first-difference regression model often reduces the severity of
multicollinearity because, even when the levels of the regressors are highly
correlated, there is no a priori reason to believe that their differences
will also be highly correlated.
(i) First-difference form

• At time t:      Yt = β1 + β2X2t + β3X3t + ut
• At time t−1:    Yt−1 = β1 + β2X2,t−1 + β3X3,t−1 + ut−1
• First difference:

    Yt − Yt−1 = β2(X2t − X2,t−1) + β3(X3t − X3,t−1) + vt,   where vt = ut − ut−1

However, this method creates some problems: the new error term vt may not
satisfy one of the assumptions of the classical linear regression model
(CLRM), since vt is likely to be serially correlated.
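A quick sketch of the first-difference transformation (Python with pandas and statsmodels; the series are hypothetical and generated only to illustrate the mechanics):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 40
# Hypothetical trending series: X2 and X3 are highly correlated in levels
x2 = np.cumsum(rng.normal(1.0, 0.5, n))
x3 = x2 + rng.normal(0.0, 0.2, n)
y = 5 + 2 * x2 + 3 * x3 + rng.normal(0.0, 1.0, n)
df = pd.DataFrame({"Y": y, "X2": x2, "X3": x3})

d = df.diff().dropna()        # first differences: ΔY, ΔX2, ΔX3
# The differenced model is run without an intercept, since the intercept
# cancels when the t-1 equation is subtracted from the t equation.
fd = sm.OLS(d["Y"], d[["X2", "X3"]]).fit()
print(fd.params)
```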
CHAPTER 11

Heteroscedasticity

Heteroscedasticity
• The nature of Heteroscedasticity
• Consequences of Heteroscedasticity
• Detection of Heteroscedasticity
• Remedial measures

Homoscedasticity Case

[Figure: density of Yi plotted against family income (expenditure on the
 vertical axis) at X = 80, 90, 100;  Var(ui) = E(ui²) = σ².]

The probability density functions of Yi at different levels of family
income, Xi, are identical.
Homoscedastic pattern of errors

[Scatter plot of yi against xi: the scattered points spread out quite
 equally across the range of x.]
Heteroscedasticity Case

[Figure: density of Yi plotted against family income (expenditure on the
 vertical axis);  Var(ui) = E(ui²) = σi².]

The variance of Yi increases as family income, Xi, increases.
Heteroscedastic pattern of errors

[Scatter plot of yt against xt: the scattered points spread out quite
 unequally, fanning out as x increases.]
Definition of heteroscedasticity:  Var(ui) = E(ui²) = σi², not constant σ²

Two-variable regression:  Y = β1 + β2X + ui

    β̂2 = Σxy / Σx² = Σki Yi = Σki (β1 + β2Xi + ui),
    where ki = xi / Σxi²,  Σki = 0,  ΣkiXi = 1

    =>  β̂2 = β2 + Σki ui,   so E(β̂2) = β2  (still unbiased)

    Var(β̂2) = E(β̂2 − β2)² = E(Σki ui)²
             = E(k1²u1² + k2²u2² + … + 2k1k2u1u2 + …)
             = k1²σ1² + k2²σ2² + … + 0 + …
             = Σki²σi²

If σ1² ≠ σ2² ≠ σ3² ≠ … (heteroscedasticity):

    Var(β̂2) = Σxi²σi² / (Σxi²)²

If σ1² = σ2² = σ3² = … = σ² (homoscedasticity):

    Var(β̂2) = σ² / Σxi²
Consequences of heteroscedasticity

1. OLS estimators are still linear and unbiased.
2. Var(β̂i)'s are no longer minimum => not the best => not efficient
   => not BLUE.
3. Var(β̂2) = Σxi²σi² / (Σxi²)² instead of Var(β̂2) = σ² / Σxi²
   (two-variable case, Y = β0 + β1X + u).
4. σ̂² = Σûi² / (n − k) is biased: E(σ̂²) ≠ σ², so SEE = σ̂ and RSS = Σûi²
   cannot be minimum.
5. The t and F statistics are unreliable.
Detection of Heteroscedasticity

Informal and formal methods
Informal Method

1. Graphical method:
Plot the estimated residuals (ûi) or their squares (ûi²) against the
predicted dependent variable (Ŷi) or any independent variable (Xi).

Observe whether the graph shows a systematic pattern, e.g. û² fanning out as
Ŷ increases. If so, heteroscedasticity exists.
GRAPHICAL METHOD

[Figure: six plots of û² against Ŷ. A random, patternless scatter indicates
 no heteroscedasticity; fan shapes, cones, and other systematic patterns
 indicate heteroscedasticity.]
Formal Methods

1- Park Test
2- Goldfeld – Quandt Test
3- White’s Heteroscedasticity Test
- No cross terms
- With cross terms

1- PARK TEST

H0: no heteroscedasticity, i.e. Var(ui) = σ²  (homoscedasticity)
H1: heteroscedasticity exists, i.e. Var(ui) = σi²

Park test procedure:
1. Run OLS on the regression Yi = β1 + β2Xi + ui and obtain the residuals ûi.
2. Square the residuals and take logs: ln(ûi²).
3. Run OLS on the regression ln(ûi²) = β1* + β2* ln Xi + vi,
   where Xi is the variable suspected of causing the heteroscedasticity.
4. Use a t-test of H0: β2* = 0 (homoscedasticity).
   If |t*| > tc  ==> reject H0  ==> heteroscedasticity exists.
   If |t*| < tc  ==> do not reject H0  ==> homoscedasticity.
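A minimal sketch of the Park test procedure (Python with statsmodels; the data are hypothetical, generated so that the error variance grows with X):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x, n)     # error s.d. grows with x

# Step 1: original regression, obtain residuals
u_hat = sm.OLS(y, sm.add_constant(x)).fit().resid

# Steps 2-3: regress ln(u^2) on ln(X)
park = sm.OLS(np.log(u_hat ** 2), sm.add_constant(np.log(x))).fit()

# Step 4: t-test on the slope; a significant slope signals heteroscedasticity
print(park.tvalues[1], park.pvalues[1])
```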


Example: Studenmund (2001), Equation 10.24, pp. 370

Procedure 1: [OLS regression output]

[Residual plot: check whether the residuals spread out.]

Procedure 2: obtain the residuals from the previous regression, square them
and take logs.
Procedures 3 & 4

    ln(ûi²) = β1* + β2* ln Xi + vi

If |t| > tc  =>  reject H0  =>  heteroscedasticity.
2- The Goldfeld-Quandt Test

H0: homoscedasticity, Var(ui) = σ²
H1: heteroscedasticity, Var(ui) = σi²

Goldfeld-Quandt test procedure:
(1) Order or rank the observations according to the values of Xi, beginning
    with the lowest X value.
(2) Omit c central observations, where c is specified a priori, and divide
    the remaining (n − c) observations into two groups, each of (n − c)/2
    observations.
(3) Run separate regressions on the two sub-samples and obtain the
    respective RSS1 and RSS2. Each RSS has [(n − c)/2 − k] df.
(4) Compute the ratio  λ = (RSS2/df) / (RSS1/df).
(5) Compare λ with Fc: if λ > Fc(0.05; (n−c)/2 − k, (n−c)/2 − k), reject H0.
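A sketch of the Goldfeld-Quandt procedure written out step by step (Python with statsmodels and scipy on hypothetical data; statsmodels also ships a ready-made het_goldfeldquandt function, the hand-rolled version below simply mirrors the slide's steps):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
n, c, k = 30, 4, 2                     # observations, omitted middle, params
x = np.sort(rng.uniform(1, 10, n))     # step 1: order the data by X
y = 1 + 2 * x + rng.normal(0, 0.4 * x, n)

m = (n - c) // 2                       # step 2: drop c central observations
lo, hi = slice(0, m), slice(n - m, n)

def rss(xs, ys):                       # step 3: sub-sample regressions
    return sm.OLS(ys, sm.add_constant(xs)).fit().ssr

df = m - k
lam = (rss(x[hi], y[hi]) / df) / (rss(x[lo], y[lo]) / df)   # step 4
f_crit = stats.f.ppf(0.95, df, df)                          # step 5
print(f"lambda = {lam:.2f}, F_crit = {f_crit:.2f}, reject H0: {lam > f_crit}")
```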
Gujarati (2003), Table 11.3: re-order the data by X.

- Run the regression on the first group of data (n1 = 13) and obtain RSS1.
- Omit the central observations (c = 4).
- Run the regression on the second group of data (n2 = 13) and obtain RSS2.

    λ = (RSS2/df) / (RSS1/df)  >  Fc ?
3.1- White's heteroscedasticity test (no cross terms) -- an LM test

H0: homoscedasticity, Var(ui) = σ²
H1: heteroscedasticity, Var(ui) = σi²

Test procedure:
(1) Run OLS on the regression Yi = β1 + β2X2i + β3X3i + … + βqXqi + ui and
    obtain the residuals ûi.
(2) Run the auxiliary regression
    ûi² = α1 + α2X2i + … + αqXqi + αq+1X2i² + … + α2q−1Xqi² + vi
    H0: all the slope coefficients α2, α3, … are zero.
(3) Compute W (or LM) = nR², which is asymptotically distributed as χ²(df).
(4) Compare W with χ²(df), where df is the number of regressors in (2),
    excluding the intercept. If W > χ²(df), reject H0.
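A minimal sketch of the no-cross-terms version, with the auxiliary regression built by hand (Python with statsmodels and scipy on hypothetical data):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
n = 150
x2, x3 = rng.uniform(1, 5, n), rng.uniform(1, 5, n)
y = 1 + 2 * x2 + 3 * x3 + rng.normal(0, x2, n)      # variance depends on x2

# (1) original regression, squared residuals
X = sm.add_constant(np.column_stack([x2, x3]))
u2 = sm.OLS(y, X).fit().resid ** 2

# (2) auxiliary regression on levels and squares (no cross terms)
Z = sm.add_constant(np.column_stack([x2, x3, x2**2, x3**2]))
aux = sm.OLS(u2, Z).fit()

# (3)-(4) W = n*R^2 ~ chi-square(df = number of regressors, excl. constant)
W = n * aux.rsquared
df = Z.shape[1] - 1
print(W, stats.chi2.ppf(0.95, df), W > stats.chi2.ppf(0.95, df))
```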
Example:  Yi = β1 + β2X2i + β3X3i + β4X4i + ui

[OLS output and White-test auxiliary regression output for the linear model]
W = nR²

χ²(0.05, 6) = 12.59
χ²(0.10, 6) = 10.64

W > χ²  =>  reject H0

The White test for the linear model: the test statistic (nR²) indicates that
heteroscedasticity exists.
2(0.05, 6) = 12.59
W< not reject Ho
2(0.10, 6) = 10.64

After transform the linear model to a log-log model.

The White test for a log-log model of the same data


43
The W-statistic (nR2) indicates heteroscedasticity is not existed
3.2- White's general heteroscedasticity test (with cross terms)

H0: homoscedasticity, Var(ui) = σ²
H1: heteroscedasticity, Var(ui) = σi²

Test procedure:
(1) Run OLS on the regression Yi = β1 + β2X2i + β3X3i + ui and obtain the
    residuals ûi.
(2) Run the auxiliary regression
    ûi² = α1 + α2X2i + α3X3i + α4X2i² + α5X3i² + α6X2iX3i + vi
(3) Compute W (or LM) = nR², asymptotically distributed as χ²(df).
(4) Compare W with χ²(df), where df is the number of regressors in (2),
    excluding the intercept. If W > χ²(df), reject H0.
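For the general version with cross terms, statsmodels provides het_white, which builds the auxiliary regression (levels, squares, and cross products) automatically; a minimal sketch on hypothetical data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(6)
n = 150
x2, x3 = rng.uniform(1, 5, n), rng.uniform(1, 5, n)
y = 1 + 2 * x2 + 3 * x3 + rng.normal(0, x2 * x3, n)   # heteroscedastic errors

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

# het_white regresses the squared residuals on levels, squares and cross
# products of the regressors and returns the LM (nR^2) statistic.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
print(lm_stat, lm_pvalue)       # small p-value => reject homoscedasticity
```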
Example 8.4 (Wooldridge, pp. 258)

W = nR²

χ²(0.05, 9) = 16.92
χ²(0.10, 9) = 14.68

W > χ²  =>  reject H0

The White test for the linear model: the test statistic indicates that
heteroscedasticity exists.
For the log-log model:

χ²(0.05, 9) = 16.92
χ²(0.10, 9) = 14.68

W < χ²  =>  do not reject H0

The White test for the log-log model: the test statistic indicates that
heteroscedasticity no longer exists.
Remedial Measures

• When σi² is known: the method of weighted least squares (WLS).
• When σi² is unknown:
  - Assumption 1
  - Assumption 2
  - Assumption 3
  - Assumption 4
General Meaning of Weighted Least Squares (WLS)

Suppose:  Y = β1 + β2X2 + β3X3 + ui
          E(ui) = 0,  E(uiuj) = 0 for i ≠ j
          Var(ui) = σi² = σ²Zi²  (for example Zi² a function of X2,
          or Zi² = E(Yi)²; one may also have Var(ui) = σ²Zi)

If all Zi = 1 (or any constant), homoscedasticity returns. But Zi can take
any value: it is the proportionality factor.

When σi² is known, to correct the heteroscedasticity, transform the
regression by dividing each term by Zi:

    Yi/Zi = β1(1/Zi) + β2(X2i/Zi) + β3(X3i/Zi) + ui/Zi

    =>  Y* = β1X1* + β2X2* + β3X3* + ui*
Theoretically, in the transformed equation:

(i)   E(ui/Zi) = (1/Zi) E(ui) = 0

(ii)  E(ui/Zi)² = (1/Zi²) E(ui²) = (1/Zi²) σ²Zi² = σ²

(iii) E(uiuj / ZiZj) = (1/ZiZj) E(uiuj) = 0   for i ≠ j

These three results satisfy the assumptions of classical OLS. Therefore,
using Zi as the weight (dividing each term by Zi) corrects the problem of
heteroscedasticity and yields BLUE estimators.

How do we determine which form of Zi to use as the weight? See the
assumptions below.
When σi² is known: the method of weighted least squares

• Consider the 2-variable model:  Yi = β1 + β2X2i + ui
• Divide both sides of the model by σi to obtain:

    Yi/σi = β1(1/σi) + β2(X2i/σi) + ui/σi

• In the transformed model the error term is ui/σi, so its variance is

    var(ui/σi) = (1/σi²) var(ui) = (1/σi²) σi² = 1 = constant
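A minimal sketch of weighted least squares (Python with statsmodels on hypothetical data; here the error standard deviations σi are assumed known, which in practice they rarely are):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 100
x = rng.uniform(1, 10, n)
sigma_i = 0.5 * x                         # assume the error s.d. is known
y = 2 + 3 * x + rng.normal(0, sigma_i, n)

X = sm.add_constant(x)
# statsmodels' WLS expects weights proportional to 1/sigma_i^2,
# which is equivalent to dividing each observation by sigma_i.
wls = sm.WLS(y, X, weights=1.0 / sigma_i**2).fit()
ols = sm.OLS(y, X).fit()
print(wls.params, wls.bse)   # typically smaller standard errors than OLS
print(ols.params, ols.bse)
```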
When σi² is not known

• Consider the two-variable model:  Yi = β1 + β2Xi + ui
• We now consider several assumptions about the pattern of
  heteroscedasticity.

• Assumption 1: the error variance is proportional to Xi²:

      E(ui²) = σi² = σ²Xi²

  - Divide both sides of the original model by Xi to obtain

      Yi/Xi = β1(1/Xi) + β2 + ui/Xi

  - The transformed disturbance term has constant variance:

      var(ui/Xi) = (1/Xi²) var(ui) = (1/Xi²) σ²Xi² = σ² = constant
If the plot of residuals against X3 shows û² spreading out in a nonlinear
"trumpet" pattern, the variance appears to increase in proportion to X3i².
Therefore we might expect σi² = σ²Zi² with Zi² = X3i², i.e. Zi = X3i.

Hence the transformed equation becomes

    Yi/X3i = β1(1/X3i) + β2(X2i/X3i) + β3 + ui/X3i

    =>  Yi* = β1X1* + β2X2* + β3 + ui*

where β3 becomes the intercept coefficient and ui* satisfies the assumptions
of classical OLS.
Example: Studenmund (2001), Eq. 10.27, pp.373

When σi² is not known

• Assumption 2: the error variance is proportional to Xi
  (the square-root transformation):

      E(ui²) = σi² = σ²Xi

  - Divide both sides of the original model by √Xi to obtain

      Yi/√Xi = β1(1/√Xi) + β2√Xi + ui/√Xi

  - The variance of the transformed error term is constant:

      var(ui/√Xi) = (1/Xi) var(ui) = (1/Xi) σ²Xi = σ² = constant
If the plot of residuals against X3 shows û² spreading out in a linear "cone"
pattern, the variance appears to increase in proportion to X3i.
Therefore we might expect σi² = σ²hi² with hi² = X3i, i.e. hi = √X3i.

The transformed equation is

    Yi/√X3i = β1(1/√X3i) + β2(X2i/√X3i) + β3√X3i + ui/√X3i

    =>  Yi* = β1X1* + β2X2* + β3X3* + ui*


Simple OLS result (Gujarati (2003), Example 11.10, pp. 424):

R&D = 192.99 + 0.0319 Sales      SEE = 2759
t =   (0.194)   (3.830)          R² = 0.478    C.V. = 0.9026
White test for heteroscedasticity:

W < χ²(0.05, 2) = 5.9914  =>  do not reject H0 at the 5% level
W > χ²(0.10, 2) = 4.6051  =>  reject H0 at the 10% level
[Residual plot: a bell-shaped pattern of residuals.]
Transformation equations:

1. Weight by √Xi:
   (Yi/√Xi) = -246.67 (1/√Xi) + 0.0367 √Xi        C.V. = 0.8195
                (-0.64)          (5.17)

   => (1)  R&D = -246.67 + 0.036 Sales       SEE = 7.25    R² = 0.3648
               (-0.64)     (5.17)

2. Weight by Xi:
   (Yi/Xi) = β1 (1/Xi) + β2

   => (2)  R&D = -243.49 + 0.0366 Sales      SEE = 0.021   R² = 0.168
               (-1.79)     (5.52)                          C.V. = 0.7467

Compare the C.V. of the two transformed regressions to determine which
weight is appropriate.
[Residual plot for the √Xi-weighted regression:
 Ŷi/√Xi = β1(1/√Xi) + β2√Xi,   C.V. = 0.8195]

After transforming by √Xi, the residuals still spread out widely.
[Residual plot for the Xi-weighted regression:
 Ŷi/Xi = β1(1/Xi) + β2,   C.V. = 0.7467]

After transforming by Xi, the residuals are more stable.
When σi² is not known

• Assumption 3: the error variance is proportional to the square of the mean
  value of Y:

      E(ui²) = σi² = σ² [E(Yi)]²

  - As in the previous two cases, divide both sides of the original model by
    E(Yi):

      Yi/E(Yi) = β1(1/E(Yi)) + β2(Xi/E(Yi)) + ui/E(Yi)

  - Similarly, the transformed error term has constant variance.
  - The problem here is that E(Yi) = β1 + β2Xi depends on the coefficients,
    which are unknown. Therefore we may first estimate Ŷi = β̂1 + β̂2Xi, which
    is an estimator of E(Yi), and then divide both sides of the original
    model by this estimated value Ŷi.
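A minimal sketch of the two-step procedure under Assumption 3 (Python with statsmodels on hypothetical data): first estimate Ŷi by OLS, then use it as the deflator, i.e. weights 1/Ŷi²:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 100
x = rng.uniform(1, 10, n)
mean_y = 2 + 3 * x
y = mean_y + rng.normal(0, 0.2 * mean_y, n)   # error s.d. proportional to E(Y)

X = sm.add_constant(x)
y_hat = sm.OLS(y, X).fit().fittedvalues       # step 1: estimate E(Yi)

# step 2: weight by 1 / y_hat^2 (equivalent to dividing the model by y_hat)
fwls = sm.WLS(y, X, weights=1.0 / y_hat**2).fit()
print(fwls.params, fwls.bse)
```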
When σi² is not known

• Assumption 4: a log transformation such as

      ln Yi = β1 + β2 ln Xi + ui

  very often reduces heteroscedasticity compared with the original
  regression.
  - Why? Because the log transformation compresses the scales in which the
    variables are measured, thereby reducing the differences among values.
  - The log transformation also lets the slope coefficient measure the
    elasticity of Y with respect to X, which is often more meaningful.
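A small sketch comparing the White test before and after a log transformation (Python with statsmodels; the data are hypothetical, so the outcome merely illustrates the idea):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(9)
n = 200
x = rng.uniform(1, 20, n)
y = np.exp(0.5 + 0.8 * np.log(x) + rng.normal(0, 0.3, n))  # multiplicative error

for label, yy, xx in [("linear", y, x), ("log-log", np.log(y), np.log(x))]:
    X = sm.add_constant(xx)
    res = sm.OLS(yy, X).fit()
    lm, pval, _, _ = het_white(res.resid, X)
    print(f"{label:8s}  W = {lm:6.2f}   p-value = {pval:.3f}")
```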
Refer to Studenmund (2001), Eq. (10.31), pp. 375.

An alternative remedy for heteroscedasticity: a weighted log-log model.

W < χ²(0.05, 5) = 11.07  =>  do not reject H0

Concluding Example
[Scatter plot of the corruption index (1 to 10) against GDP per capita
 (PERGDP, 0 to 60,000) for 133 countries.]
2(0.05, 2)= 5.9914
2(0.10, 2)= 4.60517

Since W* > 2  reject Ho


Heteroscedasticity is existed

72
Change the functional form to a log-linear model.
2(0.05, 2)= 5.9914
2(0.10, 2)= 4.60517

W* < 2(0.05,2) not reject Ho

It means there is no more heterscedasticity


problem after the change of functional
Form.

74
THE END
