"Introductory Econometrics", Chapter 8 by Wooldridge: Heteroskedasticity
Introduction

Consider the multiple linear regression model:

y = β0 + β1 x1 + β2 x2 + ... + βk xk + u

Remember that we proved unbiasedness of the OLS estimators under the first four Gauss-Markov assumptions,
→ the homoskedasticity assumption (the fifth assumption) played no role in showing that OLS is unbiased.

We introduce homoskedasticity as one of the Gauss-Markov assumptions because the estimators of the variances, Var(β̂j), are biased without it,
→ since the OLS standard errors are based directly on these variances, they are no longer valid for constructing confidence intervals and t statistics.

Thus, in the presence of heteroskedasticity:
  the usual OLS t statistics do not have t distributions,
  F statistics are no longer F distributed,
→ the statistics we used to test hypotheses under the Gauss-Markov assumptions are not valid in the presence of heteroskedasticity.

We also know that the Gauss-Markov theorem, which says that OLS is best linear unbiased, relies crucially on the homoskedasticity assumption,
→ if Var(u|x) is not constant, OLS is no longer BLUE.

As we will see later, it is possible to find estimators that are more efficient than OLS in the presence of heteroskedasticity.
Heteroskedasticity-robust Inference After OLS Estimation

Fortunately, OLS is still useful: we will learn how to adjust standard errors and t and F statistics so that they are valid in the presence of heteroskedasticity of unknown form,
→ we can report new statistics that work regardless of the kind of heteroskedasticity present in the population.

The methods in this section are known as heteroskedasticity-robust procedures because they are valid, at least in large samples, whether or not the errors have constant variance, and we do not need to know which is the case.

We will refer to them simply as robust standard errors when the context is clear.

First of all, we need a new estimate of the variances, Var(β̂j), one that is valid in the presence of heteroskedasticity.

White (1980) derived such an estimate: the heteroskedasticity-robust standard error for β̂j,
→ a careful derivation of the theory is well beyond the scope of this class, but applying heteroskedasticity-robust methods is very easy with modern econometrics packages.

Notice that White's estimator is also valid under homoskedasticity.
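What such packages compute can be sketched in a few lines: the White (HC0) variance estimator is (X'X)⁻¹ X' diag(û²) X (X'X)⁻¹. A minimal numpy illustration with simulated data (not the textbook's data set):

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS estimates with usual and White (HC0) robust standard errors.

    X is an n x (k+1) design matrix whose first column is ones.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                      # OLS estimates
    u = y - X @ beta                              # OLS residuals
    # usual standard errors: sigma^2 * (X'X)^{-1}, with dof correction
    sigma2 = u @ u / (n - p)
    se_usual = np.sqrt(np.diag(sigma2 * XtX_inv))
    # HC0 robust: (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}
    meat = X.T @ (X * (u ** 2)[:, None])
    cov_robust = XtX_inv @ meat @ XtX_inv
    se_robust = np.sqrt(np.diag(cov_robust))
    return beta, se_usual, se_robust

# simulated heteroskedastic data: the error's standard deviation grows with x
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 5, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n) * x
X = np.column_stack([np.ones(n), x])
beta, se_usual, se_robust = ols_with_robust_se(X, y)
```

Both sets of standard errors are then reported exactly as on the slides that follow: the point estimates are the same, only the standard errors differ.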
Once heteroskedasticity-robust standard errors are obtained, it is simple to construct a heteroskedasticity-robust t statistic.

Recall that the general form of the t statistic is

t = (estimate − hypothesized value) / standard error

where
  we are still using the OLS estimates,
  we have chosen the hypothesized value ahead of time,
→ the only difference between the usual OLS t statistic and the heteroskedasticity-robust t statistic is in how the standard error is computed.

Example

Let us estimate a model that allows for wage differences among four groups:
  1 married men,
  2 married women,
  3 single men,
  4 single women.

To do this, we must select a base group; we choose single men,
→ we must define dummy variables for each of the remaining groups: marrmale, marrfem, singfem.

Remember:
  since the base group is represented by the intercept, we have included dummy variables for only three of the four groups,
  if we were to add a dummy variable for single men too, we would be introducing perfect collinearity.

Thus, our model is the wage equation with these three dummy variables added.
Example

The following figure shows our estimated model (using OLS), with both the heteroskedasticity-robust standard errors (in [ ]) and the usual OLS standard errors (in ( )).

Thus, in this particular application:
  the two sets of standard errors are not very different,
  any variable that was statistically significant using the usual t statistic is still statistically significant using the heteroskedasticity-robust t statistic,
  the largest relative change in standard errors is for the coefficient on educ: the usual standard error is .0067, and the robust standard error is .0074,
  → still, the robust standard error implies a robust t statistic above 10.

We also see that the robust standard errors can be either larger or smaller than the usual standard errors:
  we do not know which will be larger ahead of time,
  as an empirical matter, the robust standard errors are often found to be larger than the usual standard errors.

Note that we do not know, at this point, whether heteroskedasticity is even present in the population model underlying our estimated model.

Here, no important conclusions are overturned by using the robust standard errors. This often happens in applied work, but in other cases the differences between the usual and robust standard errors are much larger.
Heteroskedasticity-robust Inference After OLS Estimation

Remember that the heteroskedasticity-robust standard errors are valid more often than the usual OLS standard errors,
→ why do we bother with the usual standard errors at all?

One reason they are still used in cross-sectional work is that, if the homoskedasticity assumption holds and the errors are normally distributed, then the usual t statistics have exact t distributions, regardless of the sample size.

The robust standard errors and robust t statistics are justified only as the sample size becomes large,
→ with small sample sizes, the robust t statistics can have distributions that are not very close to the t distribution, which could throw off our inference.

Thus, with large sample sizes, we can make a case for always reporting only the heteroskedasticity-robust standard errors in cross-sectional applications:
  this practice is being followed more and more in applied work,
  it is also common to report both standard errors, so that a reader can determine whether any conclusions are sensitive to the standard errors in use.

It is also possible to obtain an F statistic that is robust to heteroskedasticity of an unknown, arbitrary form:
  the heteroskedasticity-robust F statistic is also called a heteroskedasticity-robust Wald statistic,
  a general treatment of this statistic is beyond the scope of this class,
  nevertheless, since many statistics packages now compute it routinely, it is useful to know that it is available.
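One standard way to build such a robust Wald statistic (not necessarily the exact formula every package uses) is W = (Rβ̂ − r)' [R V̂ R']⁻¹ (Rβ̂ − r), with V̂ the robust covariance matrix. A sketch with simulated data:

```python
import numpy as np

def hc0_cov(X, u):
    """White (HC0) covariance: (X'X)^-1 X' diag(u^2) X (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ (X.T @ (X * (u ** 2)[:, None])) @ XtX_inv

def robust_wald(X, y, R, r):
    """Wald statistic for H0: R beta = r, using the HC0 covariance.
    Asymptotically chi-square with rank(R) degrees of freedom under H0."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    V = hc0_cov(X, y - X @ beta)
    d = R @ beta - r
    return d @ np.linalg.solve(R @ V @ R.T, d)

rng = np.random.default_rng(5)
n = 400
Z = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), Z])
# H0 is true here: neither regressor enters; errors are heteroskedastic
y = 1.0 + rng.normal(size=n) * (1 + Z[:, 0] ** 2)
R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # H0: beta1 = beta2 = 0
w = robust_wald(X, y, R, np.zeros(2))
```

The statistic w would then be compared with a chi-square critical value with 2 degrees of freedom.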
Breusch-Pagan Test for Heteroskedasticity (BP test)

In order to test for a violation of the homoskedasticity assumption, we want to test whether u² is related (in expected value) to one or more of the explanatory variables.

If H0 is false, the expected value of u², given the independent variables, can be any function of the xj.

A simple approach is to assume a linear function of the independent variables:

u² = δ0 + δ1 x1 + δ2 x2 + ... + δk xk + ν

where ν is an error term with mean zero given the xj (a reasonable assumption under H0), and the null hypothesis of homoskedasticity is

H0 : δ1 = δ2 = ... = δk = 0

If we could observe the u² in the sample, then we could easily compute a statistic to test H0 (like the F statistic),
→ just run the OLS regression of u² on x1, ..., xk, using all n observations.

Now, we never know the actual errors in the population model, but we do have estimates of them:
→ the OLS residual, ûi, is an estimate of the error ui for observation i.

Thus, we can estimate the equation

û² = δ0 + δ1 x1 + ... + δk xk + error   (2)

and test the joint significance of x1, ..., xk.

We could use the F statistic, but usually the Lagrange multiplier (LM) statistic is used instead:

LM = n · R²û²

where n is the number of observations and R²û² is the R-squared from equation (2).

Under the null hypothesis, LM is distributed asymptotically as χ²k.
Breusch-Pagan Test for Heteroskedasticity (BP test)

Summarized steps for testing for heteroskedasticity using the BP test:
  1 estimate the model (1) by OLS, as usual; obtain the squared OLS residuals, û² (one for each observation),
  2 run the regression in (2); keep the R-squared from this regression, R²û²,
  3 form the LM statistic and compute the p-value (using the χ²k distribution),
  → if the p-value is sufficiently small, that is, below the chosen significance level, then we reject the null hypothesis of homoskedasticity.

If the BP test results in a small enough p-value, some corrective measure should be taken; for example, use the heteroskedasticity-robust standard errors and test statistics discussed in the previous section.

EXAMPLE: Heteroskedasticity in Housing Price Equations

Let us test for heteroskedasticity in a simple housing price equation. The estimated equation using the levels of all variables is

price^ = −21.77 + 0.00207 lotsize + 0.123 sqrft + 13.85 bdrms
         (29.48)   (0.00064)         (0.013)       (9.01)

n = 88, R² = 0.672

This equation tells us nothing about whether the error in the population model is heteroskedastic; we need to regress the squared OLS residuals on the independent variables,
→ the R² from the regression of û² on lotsize, sqrft, and bdrms is R²û² = 0.1601,
  with n = 88, this produces an LM statistic of LM = 88 × 0.1601 ≈ 14.09,
  this gives a p-value ≈ 0.0028 (using the χ²3 distribution),
  thus, the reported standard errors are not reliable.
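The summarized steps above can be sketched in a few lines of numpy (illustrative simulated data, not the housing data set):

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Breusch-Pagan LM statistic: n * R^2 from regressing u-hat^2 on the regressors.

    X is an n x (k+1) design matrix whose first column is ones.
    Returns (LM, df); compare LM with the chi-square(df) critical value.
    """
    n, p = X.shape
    # step 1: OLS on the original model, keep the squared residuals
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2
    # step 2: regress u-hat^2 on the same regressors, keep the R-squared
    delta, *_ = np.linalg.lstsq(X, u2, rcond=None)
    fitted = X @ delta
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    # step 3: LM = n * R^2, asymptotically chi-square with k degrees of freedom
    return n * r2, p - 1

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n) * (1 + 2 * x)  # strongly heteroskedastic
lm, df = breusch_pagan_lm(X, y)
```

With heteroskedasticity this pronounced, the LM statistic comfortably exceeds the 5% χ²1 critical value of 3.84, so the test rejects homoskedasticity.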
EXAMPLE: Heteroskedasticity in Housing Price Equations

It has been noticed in many empirical applications that using the logarithmic functional form for the dependent variable reduces the occurrence of heteroskedasticity.

Let us put price, lotsize, and sqrft in logarithmic form, so that the elasticities of price with respect to lotsize and sqrft are constant. The estimated equation is:

log(price)^ = 5.61 + 0.168 log(lotsize) + 0.700 log(sqrft) + 0.037 bdrms
              (0.65)  (0.038)              (0.093)            (0.028)

n = 88, R² = 0.643

Regressing the squared OLS residuals from this regression on log(lotsize), log(sqrft), and bdrms gives R²û² = 0.0480,
  thus, LM = 4.22 (p-value = 0.239),
  → we fail to reject the null hypothesis of homoskedasticity in the model with the logarithmic functional forms.

Breusch-Pagan Test for Heteroskedasticity (BP test)

If we suspect that heteroskedasticity depends only upon certain independent variables, we can easily modify the Breusch-Pagan test:
  we simply regress û² on whatever independent variables we choose and carry out the appropriate LM test,
  remember that the appropriate degrees of freedom depend upon the number of independent variables in the regression with û² as the dependent variable.

If the squared residuals are regressed on only a single independent variable, the test for heteroskedasticity is just the usual t statistic on that variable,
→ a significant t statistic suggests that heteroskedasticity is a problem.
The White Test for Heteroskedasticity

It can be proved that the homoskedasticity assumption can be replaced with a weaker condition:
  the squared error, u², is uncorrelated with all the independent variables (xj), the squares of the independent variables (xj²), and all the cross products (xj xh for j ≠ h),
→ if this condition is satisfied, the usual OLS standard errors and test statistics are valid.

This observation motivated White (1980) to propose a test for heteroskedasticity that adds the squares and cross products of all of the independent variables to equation (2).

When the model contains k = 3 independent variables, the White test is based on an estimation of

û² = δ0 + δ1 x1 + δ2 x2 + δ3 x3 + δ4 x1² + δ5 x2² + δ6 x3² + δ7 x1 x2 + δ8 x1 x3 + δ9 x2 x3 + error   (3)

The White test for heteroskedasticity is the LM statistic for testing that all of the δj in equation (3) are zero, except for the intercept,
→ nine restrictions are being tested in this case.

Thus, for the White test:

H0 : Var(u|x) = σ²  vs  H1 : Var(u|x) = g(x)
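The auxiliary regression in equation (3) can be sketched by reusing the LM idea above: build the regressors, their squares, and their cross products, then compute n · R². A minimal illustration with simulated data (a real application would use a canned routine):

```python
import numpy as np
from itertools import combinations

def white_lm(X, y):
    """White test: LM = n * R^2 from regressing u-hat^2 on the regressors,
    their squares, and their pairwise cross products.
    X has a leading column of ones."""
    n, p = X.shape
    Z = X[:, 1:]                                   # regressors without the constant
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2
    # build [1, x_j, x_j^2, x_j * x_h] design for the auxiliary regression
    cols = [np.ones(n)]
    cols += [Z[:, j] for j in range(Z.shape[1])]
    cols += [Z[:, j] ** 2 for j in range(Z.shape[1])]
    cols += [Z[:, j] * Z[:, h] for j, h in combinations(range(Z.shape[1]), 2)]
    W = np.column_stack(cols)
    delta, *_ = np.linalg.lstsq(W, u2, rcond=None)
    fitted = W @ delta
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    df = W.shape[1] - 1      # number of restrictions: all slopes equal zero
    return n * r2, df

rng = np.random.default_rng(2)
n = 300
Z = rng.normal(size=(n, 3))
X = np.column_stack([np.ones(n), Z])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=n)  # homoskedastic
lm, df = white_lm(X, y)   # with k = 3 regressors, df = 9 as on the slide
```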
Weighted Least Squares Estimation

Suppose the heteroskedasticity is known up to a multiplicative constant:

Var(u|x) = σ² h(x)

where h(x) is a known function of the explanatory variables that determines the heteroskedasticity, and σ² is an unknown constant:
  since variances must be positive, h(x) > 0 for all possible values of the independent variables,
  we will be able to estimate the unknown population parameter σ² from a data sample.

In other cases, the heteroskedasticity function itself must be estimated,
→ example:

Var(u|x) = σ² exp(δ0 + δ1 x1 + ... + δk xk)

where x1, x2, ..., xk are the independent variables appearing in the regression model, and the δj are unknown parameters.
For a random drawing from the population, we can write

σi² = Var(ui|xi) = σ² h(xi) = σ² hi

where xi denotes all independent variables for observation i,
→ hi changes with each observation because the independent variables change across observations.

For example, consider the simple savings function

savi = β0 + β1 inci + ui ,   Var(ui|inci) = σ² inci

Here:
  h(inc) = inc; that is, the variance of the error is proportional to the level of income,
  this means that, as income increases, the variability in savings increases,
  as inc > 0, the variance is always guaranteed to be positive,
  the standard deviation of ui, conditional on inci, is σ√inci.

To estimate the βj, we will take the original equation, which contains heteroskedasticity,

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + ui   (5)

and transform it into an equation that has homoskedastic errors (and satisfies the other Gauss-Markov assumptions).

Notice that:
  since hi is just a function of xi, ui/√hi has a zero expected value conditional on xi,
  further, since Var(ui|xi) = E(ui²|xi) = σ² hi, then

Var(ui/√hi | xi) = E((ui/√hi)² | xi) = E(ui²|xi)/hi = σ² hi/hi = σ²
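A quick numerical check of this variance calculation, with simulated errors, σ² = 4, and hi = inci (illustrative values, not the savings data):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
sigma2 = 4.0
inc = rng.uniform(1, 10, n)                       # h_i = inc_i > 0
u = rng.normal(size=n) * np.sqrt(sigma2 * inc)    # Var(u_i | inc_i) = sigma^2 * inc_i
u_star = u / np.sqrt(inc)                         # transformed error u_i / sqrt(h_i)
# the transformed errors are homoskedastic: their sample variance is close to sigma^2
v = u_star.var()
```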
We can divide equation (5) by √hi to get

yi/√hi = β0 (1/√hi) + β1 (xi1/√hi) + β2 (xi2/√hi) + ... + βk (xik/√hi) + ui/√hi

Let xi0* = 1/√hi and let the other starred variables denote the corresponding original variables divided by √hi. Then:

yi* = β0 xi0* + β1 xi1* + β2 xi2* + ... + βk xik* + ui*   (6)

The starred variables rarely have a useful interpretation, but
  remember that we derived eq. (6) so that we could obtain estimators of the βj that have better efficiency properties than OLS,
  for interpreting the parameters and the model, we always want to return to the original equation (5).

In the preceding savings example, the transformed equation looks like

savi/√inci = β0 (1/√inci) + β1 √inci + ui*

→ β1 is the marginal propensity to save out of income, an interpretation we obtain from the original savings equation.
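The transformation can be sketched directly: dividing every variable by √hi and running OLS gives the same estimates as minimizing the weighted sum of squared residuals. Simulated data with hi = inci, in the spirit of the savings example:

```python
import numpy as np

def wls_by_transformation(X, y, h):
    """WLS estimates via OLS on variables divided by sqrt(h_i).
    X has a leading column of ones; h_i > 0 is the variance function value."""
    w = 1.0 / np.sqrt(h)
    beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta

def wls_by_weights(X, y, h):
    """WLS via the weighted normal equations: minimizes sum (y_i - x_i'b)^2 / h_i."""
    W = np.diag(1.0 / h)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

rng = np.random.default_rng(3)
n = 200
inc = rng.uniform(1, 10, n)
sav = 2.0 + 0.1 * inc + rng.normal(size=n) * np.sqrt(inc)   # Var(u|inc) = inc
X = np.column_stack([np.ones(n), inc])
b1 = wls_by_transformation(X, sav, inc)
b2 = wls_by_weights(X, sav, inc)
# the two routes are algebraically identical
```

This identity is exactly why the transformed regression (6) is called weighted least squares.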
Notice that the transformed equation (6):
  is linear in its parameters,
  the random sampling assumption has not changed,
  further, ui* has a zero mean and a constant variance (σ²), conditional on xi*,
→ if the original equation satisfies the first four Gauss-Markov assumptions, then the transformed equation (6) satisfies all five Gauss-Markov assumptions.

Also, if ui has a normal distribution, then ui* has a normal distribution with variance σ².

Therefore, the transformed equation satisfies the classical linear model assumptions (MLR.1 through MLR.6) if the original model does so, except for the homoskedasticity assumption.

We know that OLS has appealing properties (it is BLUE, for example) under the Gauss-Markov assumptions,
→ the discussion in the previous paragraph suggests estimating the parameters in equation (6) by OLS.

These estimators, β0*, β1*, ..., βk*, will be different from the OLS estimators in the original equation.

The βj* are examples of generalized least squares (GLS) estimators.
Since equation (6) satisfies all of the ideal assumptions:
  standard errors, t statistics, and F statistics can all be obtained from regressions using the transformed variables,
  further, because the GLS estimators of the βj are BLUE, they are more efficient than the OLS estimators β̂j obtained from the untransformed equation.

→ after we have transformed the variables, we simply use standard OLS analysis. But we must remember to interpret the estimates in light of the original equation.

The R² that is obtained from estimating (6), while useful for computing some statistics, is not especially informative: it tells us how much variation in y* is explained by the xj*.

The GLS estimators for correcting heteroskedasticity are called weighted least squares (WLS) estimators.

This name comes from the fact that the βj* minimize the weighted sum of squared residuals, where each squared residual is weighted by 1/hi:

Σi=1,...,n (yi − β0 − β1 xi1 − β2 xi2 − ... − βk xik)² / hi

  the idea is that less weight is given to observations with a higher error variance,
  whereas OLS gives each observation the same weight, because it is best when the error variance is identical for all partitions of the population.
Most modern regression packages have a feature for doing weighted least squares. Typically:
  we write out the estimated equation in the usual way,
  we just specify the weighting function,
→ the estimates and standard errors will be different from OLS, but the way we interpret those estimates, standard errors, and test statistics is the same.

Example

The following table contains OLS and WLS estimates of the saving function

sav = β0 + β1 inc + u

assuming, for the WLS, that Var(ui|inci) = σ² inci.

Our data set contains 100 families from 1970. We then add variables for family size, age of the household head, years of education for the household head, and a dummy variable indicating whether the household head is black.
Example

OBS: the standard errors for OLS are the nonrobust ones. If we really thought heteroskedasticity was a problem, we would probably compute the heteroskedasticity-robust standard errors as well; we will not do that here.
FGLS

In most cases, the exact form of the heteroskedasticity is not known, so we model it; using the exponential form from before, we can write

u² = σ² exp(δ0 + δ1 x1 + δ2 x2 + ... + δk xk) ν

where ν has a mean equal to unity, conditional on x.

If we assume that ν is actually independent of x, we can write

log(u²) = α0 + δ1 x1 + δ2 x2 + ... + δk xk + e   (8)

where e has a zero mean and is independent of x; the intercept in this equation is different from δ0, but this is not important.

Since (8) satisfies the Gauss-Markov assumptions, we can get unbiased estimators of the δj by using OLS.

Actually, what we need from this regression are the fitted values; call these ĝi. Then, the estimates of hi are

ĥi = exp(ĝi)

We now use WLS with weights 1/ĥi. We summarize the steps below.
FGLS

A feasible GLS procedure to correct for heteroskedasticity:

  1 run the regression of y on x1, x2, ..., xk and obtain the residuals, û,
  2 create log(û²) by first squaring the OLS residuals and then taking the natural log,
  3 run the regression of log(û²) on x1, x2, ..., xk and obtain the fitted values, ĝ,
  4 exponentiate the fitted values: ĥ = exp(ĝ),
  5 estimate the equation

y = β0 + β1 x1 + ... + βk xk + u

by WLS, using weights 1/ĥ.

If we could use the correct hi rather than ĥi in the WLS procedure, we know that our estimators would be BLUE.

Having to estimate hi means that the FGLS estimator is no longer unbiased (so it cannot be BLUE, either).

Nevertheless, the FGLS estimator has good properties in large samples (for example, it is asymptotically more efficient than OLS),
→ for large sample sizes, FGLS is an attractive alternative to OLS when there is evidence of heteroskedasticity.
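The five steps above can be sketched directly (illustrative simulated data; note that the weights 1/ĥi come from exponentiating fitted values, so they are positive by construction):

```python
import numpy as np

def fgls(X, y):
    """Feasible GLS with Var(u|x) modeled as sigma^2 * exp(x'delta).
    X has a leading column of ones. Returns FGLS estimates and h-hat."""
    # step 1: OLS residuals
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta_ols
    # steps 2-3: regress log(u-hat^2) on the regressors, keep fitted values g-hat
    log_u2 = np.log(u ** 2)
    delta, *_ = np.linalg.lstsq(X, log_u2, rcond=None)
    g_hat = X @ delta
    # step 4: h-hat = exp(g-hat), positive by construction
    h_hat = np.exp(g_hat)
    # step 5: WLS with weights 1/h-hat (OLS on variables divided by sqrt(h-hat))
    w = 1.0 / np.sqrt(h_hat)
    beta_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta_fgls, h_hat

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0, 3, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * np.exp(0.5 * x)  # variance grows with x
beta_fgls, h_hat = fgls(X, y)
```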
FGLS

We must remember that the FGLS estimators are estimators of the parameters in the equation

y = β0 + β1 x1 + ... + βk xk + u

  just as the OLS estimates measure the marginal impact of each xj on y, so do the FGLS estimates,
  we use the FGLS estimates in place of the OLS estimates because they are more efficient and have associated test statistics with the usual t and F distributions, at least in large samples.

Example

We estimate a demand function for daily cigarette consumption.

Since most people do not smoke, the dependent variable, cigs, is zero for most observations:
  a linear model is not ideal because it can result in negative predicted values,
  nevertheless, we can still learn something about the determinants of cigarette smoking by using a linear model.
Example

The equation estimated by ordinary least squares, with the usual OLS standard errors in parentheses, is shown in the following figure.

Therefore, we estimate the equation using the previous FGLS procedure. The estimated equation is shown in the following figure.
FGLS

When testing joint restrictions after WLS estimation:
  we should first estimate the unrestricted model by OLS,
  once we have obtained the weights, we can use them to estimate the restricted model as well,
  the F statistic can be computed as usual.

Fortunately, many regression packages have a simple command for testing joint restrictions after WLS estimation, so we need not perform the restricted regression ourselves.

Example

Comparing the OLS and FGLS estimates:
  all the coefficients maintain the same signs, and
  the biggest changes are on variables that were statistically insignificant when the equation was estimated by OLS.

The OLS and WLS estimates will always differ due to sampling error,
→ the issue is whether their difference is enough to change important conclusions.
If OLS and WLS produce statistically significant estimates that:

Remember that, by definition,