Heteroskedasticity in Regression
Paul E. Johnson
Department of Political Science
Center for Research Methods and Data Analysis, University of Kansas
2014
Heteroskedasticity 2 / 55
Introduction
Outline
1 Introduction
6 Practice Problems
Linear Model
yi = β0 + β1 x1i + ei

About the error term, we assumed, for all i,

E(ei) = 0 for all i
Var(ei) = E[(ei − E(ei))²] = σ² (Homoskedasticity).

There is no i subscript on σ². It is the same for all rows.
Heteroskedasticity (or heteroscedasticity): the assumption of
homogeneous variance is violated.
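To make the homoskedasticity assumption concrete, here is a small base-R simulation (all numbers are hypothetical, not from the slides): every error is drawn from the same N(0, σ²), so the residual spread should look the same everywhere along x.

```r
## Sketch: simulate homoskedastic errors and check that the residual
## variance is roughly the same in the lower and upper halves of x.
set.seed(1234)
N <- 1000
x <- runif(N, 30, 80)
e <- rnorm(N, mean = 0, sd = 5)           # Var(e_i) = 25 for every row i
y <- 3 + 0.25 * x + e
m <- lm(y ~ x)
v_low  <- var(resid(m)[x <  median(x)])   # spread in the lower half of x
v_high <- var(resid(m)[x >= median(x)])   # spread in the upper half of x
c(v_low, v_high)                          # both should be near 25
```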
Homoskedasticity means

Var(e) = σe² · I : the N × N diagonal matrix with the same σe² in every diagonal cell and 0 everywhere off the diagonal,

Var(e) = diag(σe², σe², σe², ..., σe²)
Heteroskedasticity means the diagonal elements are no longer all the same:

Var(e) = diag(σe²·w1, σe²·w2, σe²·w3, ..., σe²·wN), or equivalently Var(e) = diag(Var(e1), Var(e2), ..., Var(eN))

The weight matrix used in WLS inverts those diagonal entries:

W = diag(1/Var(e1), 1/Var(e2), 1/Var(e3), ..., 1/Var(eN))

Sometimes they call the variance of the errors Σ, and so the weights are Σ⁻¹.
Easy! Begin with “mean centered” data. The slope estimate from OLS with one variable:

β̂1 = Σ xi·yi / Σ xi² = Σ xi·(β1·xi + ei) / Σ xi² = (β1·Σ xi² + Σ xi·ei) / Σ xi² = β1 + (Σ xi·ei / Σ xi²)
Consequence 2. The OLS formula for the estimated Var(β̂) is wrong

1 The derivation of the “true variance” of the OLS estimator, Var(β̂), assumed Var[ei] was the same for all i.
2 Thus our formula that estimated Var(β̂) is wrong. And its square root, the std.err.(β̂), is wrong.
3 Thus t-tests from OLS, β̂/std.err.(β̂), are WRONG (too big).
4 And your p-values are WRONG (too small).
Proof: OLS estimate of Var(β̂) is wrong

The first term is “roughly” what OLS would calculate for the variance of β̂1.
The second term is the additional “true variance” in β̂1 that the OLS formula does not include.
WLS always proceeds with the assumption that the off-diagonals are 0, so none of the errors are correlated with each other.
GLS: Generalized Least Squares introduces those off-diagonal components. GLS is used in time series analysis.
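As a sketch of how those weights enter in practice: base R's lm() accepts a weights argument interpreted as 1/Var(ei) up to a constant. The variance function used here, Var(ei) proportional to xi, is an assumption chosen purely for illustration.

```r
## Hypothetical example: error variance proportional to x, so the WLS
## weights are 1/x, i.e. 1/Var(e_i) up to the constant sigma^2.
set.seed(42)
N <- 500
x <- runif(N, 30, 80)
e <- rnorm(N, sd = sqrt(x))          # Var(e_i) = x_i
y <- 3 + 0.25 * x + e
m_ols <- lm(y ~ x)                   # unbiased but inefficient here
m_wls <- lm(y ~ x, weights = 1 / x)  # efficient when the weights are right
coef(m_wls)                          # close to the true (3, 0.25)
```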
The WLS weight matrix, again, is

W = diag(1/Var(e1), 1/Var(e2), ..., 1/Var(eN))
The “truth” is

yi = 3 + 0.25·xi + ei (11)

Homoskedastic: [scatterplot of y against x, x from 30 to 80]

Heteroskedastic: [“Heteroskedastic Errors: 1 example scatter,” y against x, x from 20 to 80, y from −20 to 80, with the true regression line drawn in red]
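The figure can be reproduced approximately in base R. The exact variance function behind the original heteroskedastic panel is not shown on the slide, so the form used below (error sd growing linearly with x) is an assumption.

```r
## Sketch of the figure: same truth y = 3 + 0.25 x + e, drawn once with
## constant error spread and once with spread that grows with x.
set.seed(2014)
N <- 400
x <- runif(N, 20, 80)
y_homo   <- 3 + 0.25 * x + rnorm(N, sd = 8)
y_hetero <- 3 + 0.25 * x + rnorm(N, sd = 0.2 * x)  # fan-shaped scatter
par(mfrow = c(1, 2))
plot(x, y_homo,   main = "Homoskedastic");   abline(3, 0.25, col = "red")
plot(x, y_hetero, main = "Heteroskedastic"); abline(3, 0.25, col = "red")
```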
[Figure: side-by-side scatterplots of Homoskedastic y (left) and Heteroskedastic y (right) against x, x from 20 to 80, y from −20 to 80.]
Fix #1: Robust Standard Errors
We should not use the OLS formula for the estimated Var(b̂) anymore (and hence the std.err.(b̂) from OLS is no good either).
The remedy is a robust, “heteroskedasticity consistent” variance estimator.
We wish these were accurate for small samples, but they generally are not.
The best we can hope for is
consistent standard errors
asymptotically valid t-tests
Note: This does not “fix” b̂OLS. It just gives us more accurate t-ratios by correcting std.err.(b̂).
Est.Var(b̂1) = MSE / Σ xi², and the square root of that is std.err.(b̂1) (17)

The robust versions replace Var(ei) with other estimates. White’s suggestion was

Robust Est.Var(b̂1) = Σ xi²·êi² / (Σ xi²)² (18)

êi²: the “squared residual”, used in place of the unknown error variance.
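Formulas (17) and (18) can be checked by hand in base R on simulated, mean-centered data. The heteroskedastic variance function below is an assumption for illustration.

```r
## Sketch: compute the OLS variance (17) and White's robust variance (18)
## by hand for a one-variable regression on mean-centered data.
set.seed(7)
N <- 1000
x <- runif(N, 30, 80); x <- x - mean(x)     # mean-centered predictor
e <- rnorm(N, sd = 1 + 0.1 * abs(x))        # bigger errors at extreme x
y <- 0.25 * x + e
m    <- lm(y ~ x)
ehat <- resid(m)
v_ols   <- (sum(ehat^2) / (N - 2)) / sum(x^2)   # MSE / sum(x_i^2), eq. (17)
v_white <- sum(x^2 * ehat^2) / sum(x^2)^2       # eq. (18)
c(sqrt(v_ols), sqrt(v_white))    # the robust std.err. is larger here
```

Because the errors are largest where xi² is largest, White's formula weights those squared residuals more heavily and reports a bigger (more honest) variance than the OLS formula.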
Weighted Least Squares
WLS is Efficient

men: yi = β0 + β1·xi + ei (20)
women: yi = c0 + c1·xi + ui (21)

Then you wonder, “can I combine the data for men and women to estimate one model?”
This “manages” the differences of intercept and slope for men and women by adding coefficients β2 and β3.
But this ASSUMED that Var(ei) = Var(ui).
We should have tested for homoskedasticity (the ability to pool the 2 samples).
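That pooling assumption can be checked directly before combining the samples. Here is a rough base-R sketch with made-up group labels and variances; note that var.test on residuals ignores the degrees of freedom lost to each regression, so this is only an approximate check.

```r
## Hypothetical check of the pooling assumption: fit the two groups
## separately and compare the residual variances with an F test.
set.seed(99)
N <- 300
female <- rep(0:1, each = N / 2)
x <- runif(N, 30, 80)
e <- rnorm(N, sd = ifelse(female == 1, 12, 4))   # women noisier by design
y <- 3 + 0.25 * x + e
r_men   <- resid(lm(y ~ x, subset = female == 0))
r_women <- resid(lm(y ~ x, subset = female == 1))
var.test(r_women, r_men)   # rejects equal variances: do not pool blindly
```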
Random coefficient model

The basic idea is to say that the linear model has “extra” random error terms. This “Laird and Ware” notation has now become a standard. Let the “fixed” coefficients be β’s, but suppose in addition there are random coefficients b ∼ N(0, σb²):

y = Xβ + Zb + e (23)

Synonyms:
Random effects models
Mixed Models
Hierarchical Linear Models (HLM)
Multi-level Models (MLM)

I’ll probably write something on the board.
Random coefficient model

Start with the regression model that has a different slope for each case:

yi = β0 + βi·xi + ei (24)

Slope is a “random coefficient” with 2 parts:

βi = β1 + ui

so

yi = β0 + (β1 + ui)·xi + ei = β0 + β1·xi + ui·xi + ei

Note: My “new” error term is ui·xi + ei. NOT homoskedastic.
What’s the variance of that? Apply the usual rule:

Var[ui·xi + ei] = xi²·Var(ui) + Var(ei) + 2·xi·Cov(ui, ei)

Get rid of the last part by asserting that the 2 random effects are uncorrelated, so we have

Var[ui·xi + ei] = xi²·σu² + σe²
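The variance formula xi²·σu² + σe² can be verified by simulation (the σu and σe values below are hypothetical):

```r
## Sketch: draw the composite error u_i * x_i + e_i at two fixed x values
## and check its variance against x^2 * sigma_u^2 + sigma_e^2.
set.seed(321)
N <- 100000
x <- rep(c(10, 50), each = N / 2)
u <- rnorm(N, sd = 2)        # sigma_u^2 = 4
e <- rnorm(N, sd = 5)        # sigma_e^2 = 25
comp <- u * x + e
var(comp[x == 10])   # about 10^2 * 4 + 25 = 425
var(comp[x == 50])   # about 50^2 * 4 + 25 = 10025
```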
Aggregate Data
Testing for heteroskedasticity
Categorical Heteroskedasticity
Idea: Estimate the error variances for the subgroups, try to find out if they are different.

library(lmtest)
gqtest(mymod, fraction = 0.2, order.by = ~x)

Goldfeld-Quandt test

data: mymod
GQ = 3.0213, df1 = 198, df2 = 198, p-value = 1.672e-14

This excludes 20% of the cases from the middle, and compares sections.

[Figure: scatterplot of y against x, x from 30 to 70, illustrating the split.]
Toward a General Test for Heteroskedasticity
Versions of this test were proposed in Breusch & Pagan (1979) and
White (1980).
Basic Idea: If errors are homogeneous, the variance of the residuals
should not be predictable with the use of input variables.
T.S. Breusch & A.R. Pagan (1979), A Simple Test for
Heteroscedasticity and Random Coefficient Variation. Econometrica
47, 1287–1294
Breusch-Pagan test

Model the squared residuals with the other predictors (Z1i, etc.) like this:

êi² / σ̂² = γ0 + γ1·Z1i + γ2·Z2i

Here, σ̂² = MSE.
If the error is homoskedastic/Normal, the slope coefficients γ1 and γ2 will equal zero. The input variables Z can be the same as the original regression, but may also include squared values of those variables.
BP contend that one-half the regression sum of squares should be distributed as χ² with degrees of freedom equal to the number of Z variables.

library(lmtest)
mod <- lm(y ~ x1 + x2 + x3, data = dat)
bptest(mod, studentize = FALSE)  ## the classic BP test
library(lmtest)
mod <- lm(y ~ x1 + x2 + x3, data = dat)
bptest(mod)  ## Koenker's robust (studentized) version
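What the classic (non-studentized) test computes can be reproduced in base R. The data are simulated and the variance function is an assumption; note that the classic BP statistic scales by the maximum-likelihood estimate σ̂² = RSS/N.

```r
## Sketch of the classic Breusch-Pagan statistic by hand:
## regress g_i = ehat_i^2 / sigmahat^2 on the Z's; BP = (regression SS)/2.
set.seed(11)
N <- 500
z <- runif(N, 0, 10)
e <- rnorm(N, sd = 1 + 0.3 * z)            # variance rises with z
y <- 2 + 0.5 * z + e
m <- lm(y ~ z)
g <- resid(m)^2 / (sum(resid(m)^2) / N)    # note sigmahat^2 = RSS/N here
aux <- lm(g ~ z)                           # auxiliary regression on Z
bp  <- sum((fitted(aux) - mean(g))^2) / 2  # one-half the regression SS
pchisq(bp, df = 1, lower.tail = FALSE)     # chi^2, df = number of Z variables
```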
Appendix: Robust Variance Estimator Derivation
b̂ = (X'X)⁻¹ X'Y (25)

Est.Var(b̂) = MSE · (X'X)⁻¹ (27)
Those “true variances” are unknown. How can we estimate Var (b̂)?
White’s Idea

White proved that the estimator is consistent, i.e., for large samples, the value converges to the true Var(b̂).
It is sometimes called an “information sandwich” estimator. The matrix (X'X)⁻¹ is the “information matrix”. This equation gives us a “sandwich” of X'Var(e)X between two pieces of the information matrix.
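The sandwich takes only a few lines of base-R matrix algebra. This sketch uses the plain HC0 variant with no small-sample correction, which is an assumption; the slides do not say which finite-sample adjustment, if any, is intended.

```r
## Sketch: "information sandwich" (X'X)^{-1} [X' diag(ehat^2) X] (X'X)^{-1}
set.seed(5)
N  <- 400
x1 <- runif(N); x2 <- runif(N)
y  <- 1 + x1 - x2 + rnorm(N, sd = 1 + 2 * x1)   # heteroskedastic errors
m  <- lm(y ~ x1 + x2)
X  <- model.matrix(m)
bread <- solve(crossprod(X))         # (X'X)^{-1}: the "information" pieces
meat  <- crossprod(X * resid(m))     # X' diag(ehat_i^2) X: the filling
vc_hc0 <- bread %*% meat %*% bread
sqrt(diag(vc_hc0))                   # robust standard errors for b0, b1, b2
```

The line `X * resid(m)` scales row i of X by êi, so crossprod of it is exactly Σ êi²·xi·xi'.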
Practice Problems
Problems
library(lmtest)
gqtest(mod, fraction = 0.2, order.by = ~MYX)

On the other hand, the help page for gqtest also suggests using it to test a dichotomous predictor. In that case, don’t exclude a fraction in the middle; just specify the division point that splits the range of MYX in two. Sort the dataset by MYX before trying this, or it will be tricky.

library(lmtest)
dat <- dat[order(dat$MYX), ]  ## sorts rows by MYX
gqtest(mod, point = 67)       ## splits the data at row 67