
Heteroskedasticity

Introduction

(Topic 1; based on "Introductory Econometrics", Chapter 8, by Wooldridge)

The homoskedasticity assumption: Var(u|x) is constant.

Homoskedasticity fails whenever the variance of the unobservables changes across
different segments of the population.
Example: in a consumption equation, heteroskedasticity is present if the variance
of the unobserved factors affecting consumption increases with income.
We saw that homoskedasticity is needed to justify
t tests,
F tests,
confidence intervals
for OLS estimation of the linear regression model.
In this chapter, we discuss
the available remedies when heteroskedasticity occurs, and
how to test for its presence.
We begin by briefly reviewing the consequences of heteroskedasticity for ordinary
least squares estimation.

Consequences of Heteroskedasticity for OLS

Consider the multiple linear regression model:

y = β0 + β1 x1 + β2 x2 + ... + βk xk + u

Remember that we proved unbiasedness of the OLS estimators under the first four
Gauss-Markov assumptions,
→ the homoskedasticity assumption (the fifth assumption) played no role in showing
whether OLS was unbiased.
We introduce homoskedasticity as one of the Gauss-Markov assumptions because
the estimators of the variances, Var(β̂j), are biased without it,
→ since the OLS standard errors are based directly on these variances, they are no
longer valid for constructing confidence intervals and test statistics.

Thus, in the presence of heteroskedasticity:
the usual OLS t statistics do not have t distributions,
F statistics are no longer F distributed,
→ the statistics we used to test hypotheses under the Gauss-Markov assumptions
are not valid in the presence of heteroskedasticity.
We also know that the Gauss-Markov theorem, which says that OLS is best linear
unbiased, relies crucially on the homoskedasticity assumption,
→ if Var(u|x) is not constant, OLS is no longer BLUE.
As we will see later, it is possible to find estimators that are more efficient than
OLS in the presence of heteroskedasticity.

Heteroskedasticity-robust Inference After OLS Estimation

Fortunately, OLS is still useful: we will learn how to adjust standard errors, t and F
statistics so that they are valid in the presence of heteroskedasticity of unknown
form,
→ we can report new statistics that work regardless of the kind of
heteroskedasticity present in the population.
The methods in this section are known as heteroskedasticity-robust procedures
because they are valid, at least in large samples, whether or not the errors have
constant variance, and we do not need to know which is the case.
We will refer to them simply as robust standard errors when the context is clear.

First of all, we need a new estimate of the variances, Var(β̂j), one that is valid in
the presence of heteroskedasticity.
White (1980) derived such an estimate, which yields the heteroskedasticity-robust
standard error for β̂j,
→ a careful derivation of the theory is well beyond the scope of this class, but
applying heteroskedasticity-robust methods is now very easy using
econometrics packages.
Notice that White's estimator is also valid under homoskedasticity.
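
As a concrete illustration (not from the slides), here is a minimal sketch of obtaining
White's heteroskedasticity-robust standard errors with the statsmodels package; the
data are simulated and all variable names are illustrative.

```python
# Minimal sketch: usual vs. heteroskedasticity-robust (White) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
u = rng.normal(scale=0.5 * x)              # error variance grows with x: heteroskedasticity
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)                     # regressor matrix with an intercept
ols = sm.OLS(y, X).fit()                   # usual (nonrobust) OLS standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")  # White/heteroskedasticity-robust SEs

print(ols.bse)      # usual standard errors
print(robust.bse)   # robust standard errors (valid whether or not Var(u|x) is constant)
```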

Once heteroskedasticity-robust standard errors are obtained, it is simple to
construct a heteroskedasticity-robust t statistic.
Recall that the general form of the t statistic is

t = (estimate − hypothesized value) / standard error

where
we are still using the OLS estimates,
we have chosen the hypothesized value ahead of time,
→ the only difference between the usual OLS t statistic and the
heteroskedasticity-robust t statistic is in how the standard error is computed.

Example

Let us estimate a model that allows for wage differences among four groups:
1 married men,
2 married women,
3 single men,
4 single women.
To do this, we must select a base group; we choose single men,
→ we must define dummy variables for each of the remaining groups: marrmale,
marrfem, singfem.
Remember:
since the base group is represented by the intercept, we have included
dummy variables for only three of the four groups,
if we were to add a dummy variable for single males too, we would be
introducing perfect collinearity.
Thus, our model is:

log(wage) = β0 + β1 marrmale + β2 marrfem + β3 singfem + ... + u

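
As a small continuation of the earlier sketch (assuming the ols and robust results
from that snippet), the point is that the robust t statistic re-uses the same OLS
estimate and only swaps in the robust standard error:

```python
# Sketch: the robust t statistic changes only the standard error, not the estimate.
b1 = robust.params[1]                 # identical to ols.params[1]
t_usual = (b1 - 0) / ols.bse[1]       # usual OLS t statistic for H0: beta1 = 0
t_robust = (b1 - 0) / robust.bse[1]   # heteroskedasticity-robust t statistic
print(t_usual, t_robust)
```
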
The following figure (not reproduced here) shows our estimated model (using OLS), with
both the heteroskedasticity-robust standard errors (in [ ]) and the usual OLS standard
errors (in ( )).
Thus, in this particular application:
the two sets of standard errors are not very different,
any variable that was statistically significant using the usual t statistic is still
statistically significant using the heteroskedasticity-robust t statistic,
the largest relative change in standard errors is for the coefficient on educ: the usual
standard error is .0067, and the robust standard error is .0074,
→ still, the robust standard error implies a robust t statistic above 10.

We also see that the robust standard errors can be either larger or smaller than the
usual standard errors,
we do not know which will be larger ahead of time,
as an empirical matter, the robust standard errors are often found to be
larger than the usual standard errors.
Note that we do not know, at this point, whether heteroskedasticity is even
present in the population model underlying our estimated model.
Here, no important conclusions are overturned by using the robust standard
errors. This often happens in applied work,
but in other cases the differences between the usual and robust standard
errors are much larger.

Heteroskedasticity-robust Inference After OLS Estimation

Remember that the heteroskedasticity-robust standard errors are valid more often
than the usual OLS standard errors,
→ why do we bother with the usual standard errors at all?
One reason they are still used in cross-sectional work is that, if the
homoskedasticity assumption holds and the errors are normally distributed,
then the usual t statistics have exact t distributions, regardless of the sample
size.
The robust standard errors and robust t statistics are justified only as the
sample size becomes large,
→ with small sample sizes, the robust t statistics can have distributions that
are not very close to the t distribution, which could throw off our inference.

Thus, in large sample sizes, we can make a case for always reporting only the
heteroskedasticity-robust standard errors in cross-sectional applications,
this practice is being followed more and more in applied work,
it is also common to report both standard errors, so that a reader can
determine whether any conclusions are sensitive to the standard errors in use.
It is also possible to obtain an F statistic that is robust to heteroskedasticity of an
unknown, arbitrary form,
the heteroskedasticity-robust F statistic is also called a
heteroskedasticity-robust Wald statistic,
a general treatment of this statistic is beyond the scope of this class.
Nevertheless, since many statistics packages now compute it routinely, it is
useful to know that it is available.

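
Such a robust joint test is straightforward to compute in practice. Below is a minimal,
self-contained sketch (not from the slides): when the model is fitted with a robust
covariance matrix, a joint F test of several coefficients is a heteroskedasticity-robust
Wald-type test. Data and names are illustrative.

```python
# Minimal sketch: a heteroskedasticity-robust joint (Wald/F) test with statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1 + 0.5 * df["x1"] + rng.normal(scale=np.exp(0.5 * df["x1"]), size=n)

res = smf.ols("y ~ x1 + x2", data=df).fit(cov_type="HC1")  # robust covariance matrix
print(res.f_test("x1 = 0, x2 = 0"))  # joint test computed from the robust covariance
```
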
Testing for Heteroskedasticity

The heteroskedasticity-robust standard errors provide a simple method for
computing t statistics that are asymptotically t distributed whether or not
heteroskedasticity is present.
Nevertheless, there are still some good reasons for having simple tests that can
detect its presence:
first, the usual t statistics have exact t distributions under the classical linear
model assumptions,
→ many economists still prefer to see the usual OLS standard errors and test
statistics reported, unless there is evidence of heteroskedasticity.
second, if heteroskedasticity is present, the OLS estimator is no longer the
best linear unbiased estimator,
→ we will see that it is possible to obtain a better estimator than OLS when
the form of heteroskedasticity is known.

Breusch-Pagan Test for Heteroskedasticity (BP test)

As usual, we start with the linear model

y = β0 + β1 x1 + β2 x2 + ... + βk xk + u    (1)

the assumptions MLR.1 through MLR.4 are maintained in this section. In
particular, we assume that E(u|x1, x2, ..., xk) = 0, so that OLS is unbiased.
We take the null hypothesis to be that the homoskedasticity assumption (the fifth
assumption) is true:

H0 : Var(u|x1, ..., xk) = σ²

Because we are assuming that u has a zero conditional expectation, we have that

Var(u|x1, x2, ..., xk) = E(u²|x1, x2, ..., xk)

→ the null hypothesis of homoskedasticity is equivalent to:

H0 : E(u²|x1, ..., xk) = E(u²) = σ²

Thus, in order to test for violation of the homoskedasticity assumption, we want to
test whether u² is related (in expected value) to one or more of the explanatory
variables.
If H0 is false, the expected value of u², given the independent variables, can be any
function of the xj.
A simple approach is to assume a linear function of the independent variables:

u² = δ0 + δ1 x1 + δ2 x2 + ... + δk xk + ν

where ν is an error term with mean zero given the xj (a reasonable assumption
under H0),
and the null hypothesis of homoskedasticity is

H0 : δ1 = δ2 = ... = δk = 0

If we could observe u² in the sample, then we could easily compute a statistic
to test H0 (like the F statistic),
→ just run the OLS regression of u² on x1, ..., xk, using all n observations.
Now, we never know the actual errors in the population model, but we do have
estimates of them:
→ the OLS residual, ûi, is an estimate of the error ui for observation i.
Thus, we can estimate the equation:

û² = δ0 + δ1 x1 + ... + δk xk + error    (2)

and test the joint significance of x1, ..., xk.
We could use the F statistic, but usually the Lagrange Multiplier (LM) statistic is
used instead:

LM = n · R²û²

where n is the number of observations and R²û² is the R-squared from equation (2).
Under the null hypothesis, LM is distributed asymptotically as χ²k.

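
In practice this LM test is available directly in econometrics packages. A minimal
sketch (not from the slides), assuming res is a fitted statsmodels OLS results object:

```python
# Sketch: the Breusch-Pagan LM test as implemented in statsmodels.
from statsmodels.stats.diagnostic import het_breuschpagan

lm, lm_pval, fval, f_pval = het_breuschpagan(res.resid, res.model.exog)
print(lm, lm_pval)   # LM = n * R^2 from regressing the squared residuals on the x's
```
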
Summarized steps for testing for heteroskedasticity using the BP test:
1 estimate the model (1) by OLS, as usual. Obtain the squared OLS residuals, û²
(one for each observation).
2 run the regression in (2). Keep the R-squared from this regression, R²û².
3 form the LM statistic and compute the p-value (using the χ²k distribution),
→ if the p-value is sufficiently small, that is, below the chosen significance level,
then we reject the null hypothesis of homoskedasticity.
If the BP test results in a small enough p-value, some corrective measure should be
taken. For example, use the heteroskedasticity-robust standard errors and test statistics
discussed in the previous section.

EXAMPLE: Heteroskedasticity in Housing Price Equations

Let's test for heteroskedasticity in a simple housing price equation. The estimated
equation using the levels of all variables is

price = −21.77 + 0.00207 lotsize + 0.123 sqrft + 13.85 bdrms
        (29.48)  (0.00064)         (0.013)       (9.01)

n = 88, R² = 0.672

this equation tells us nothing about whether the error in the population model is
heteroskedastic,
we need to regress the squared OLS residuals on the independent variables,
→ the R² from the regression of û² on lotsize, sqrft, and bdrms is R²û² = 0.1601,
with n = 88, this produces an LM statistic of LM = 88 × 0.1601 ≈ 14.09,
this gives a p-value ≈ 0.0028 (using the χ²3 distribution),
→ thus, the reported standard errors are not reliable.
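
The steps above translate almost line-for-line into code. A minimal sketch (not from the
slides; y and X, a regressor matrix that includes a constant, are assumed), with a final
line that reproduces the p-value quoted for the housing example:

```python
# Minimal sketch of the BP steps: OLS, regress squared residuals on the x's, LM = n*R^2.
import statsmodels.api as sm
from scipy.stats import chi2

def bp_test(y, X):
    u_hat2 = sm.OLS(y, X).fit().resid ** 2     # step 1: squared OLS residuals
    aux = sm.OLS(u_hat2, X).fit()              # step 2: auxiliary regression
    k = X.shape[1] - 1                         # number of slope coefficients
    lm = len(y) * aux.rsquared                 # step 3: LM statistic
    return lm, chi2.sf(lm, k)                  # statistic and chi-square(k) p-value

# Arithmetic check for the housing example: LM = 88 * 0.1601
print(chi2.sf(88 * 0.1601, 3))                 # approximately 0.0028
```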

It has been noticed in many empirical applications that the use of a logarithmic
functional form for the dependent variable reduces the occurrence of
heteroskedasticity.
Let us put price, lotsize, and sqrft in logarithmic form, so that the elasticities of
price, with respect to lotsize and sqrft, are constant. The estimated equation is:

log(price) = 5.61 + 0.168 log(lotsize) + 0.700 log(sqrft) + 0.037 bdrms
             (0.65)  (0.038)             (0.093)            (0.028)

n = 88, R² = 0.643

regressing the squared OLS residuals from this regression on log(lotsize),
log(sqrft), and bdrms gives R²û² = 0.0480,
thus, LM = 4.22 (p-value = 0.239),
→ we fail to reject the null hypothesis of homoskedasticity in the model with
the logarithmic functional forms.

Breusch-Pagan Test for Heteroskedasticity (BP test)

If we suspect that heteroskedasticity depends only upon certain independent variables,
we can easily modify the Breusch-Pagan test:
we simply regress û² on whatever independent variables we choose and carry out
the appropriate LM test,
remember that the appropriate degrees of freedom depend upon the number of
independent variables in the regression with û² as the dependent variable,
if the squared residuals are regressed on only a single independent variable, the test
for heteroskedasticity is just the usual t statistic on the variable,
→ a significant t statistic suggests that heteroskedasticity is a problem.

The White Test for Heteroskedasticity

It can be proved that the homoskedasticity assumption can be replaced with a
weaker condition:
the squared error, u², is uncorrelated with all the independent variables (xj), the
squares of the independent variables (xj²), and all the cross products (xj xh for
j ≠ h),
→ if this condition is satisfied, the usual OLS standard errors and test statistics
are valid.
This observation motivated White (1980) to propose a test for heteroskedasticity
that adds the squares and cross products of all of the independent variables to
equation (2).

When the model contains k = 3 independent variables, the White test is based on
an estimation of

û² = δ0 + δ1 x1 + δ2 x2 + δ3 x3 + δ4 x1² + δ5 x2² + δ6 x3²
     + δ7 x1 x2 + δ8 x1 x3 + δ9 x2 x3 + error    (3)

The White test for heteroskedasticity is the LM statistic for testing that all of
the δj in equation (3) are zero, except for the intercept,
→ nine restrictions are being tested in this case.
Thus, for the White test:

H0 : Var(u|x) = σ²   vs   H1 : Var(u|x) = g(x)

where g is a non-specified function of the regressors.
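
A minimal sketch of running the White test with statsmodels (not from the slides; res is
assumed to be a fitted OLS results object whose regressor matrix includes a constant):

```python
# Sketch: White test, i.e. the BP-type LM test augmented with squares and cross products.
from statsmodels.stats.diagnostic import het_white

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, res.model.exog)
print(lm_stat, lm_pvalue)   # a small p-value is evidence against homoskedasticity
```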

Weighted Least Squares Estimation

We will study estimation methods more efficient than OLS under
heteroskedasticity. We'll distinguish two cases:
1 the heteroskedasticity is known up to a multiplicative constant,
→ example:
Var(u|x) = σ² h(x)
where h is a known function and x denotes all the explanatory variables.
2 the heteroskedasticity function must be estimated,
→ example:
Var(u|x) = σ² exp(δ0 + δ1 x1 + ... + δk xk)
where x1, x2, ..., xk are the independent variables appearing in the
regression model, and the δj are unknown parameters.

The Heteroskedasticity Is Known up to a Multiplicative Constant

Let x denote all the explanatory variables in equation (1) and assume that

Var(u|x) = σ² h(x)    (4)

where h(x) is a known function of the explanatory variables that determines the
heteroskedasticity, and σ² is an unknown constant.
Since variances must be positive, h(x) > 0 for all possible values of the
independent variables.
We will be able to estimate the unknown population parameter σ² from a data
sample.

For a random drawing from the population, we can write

σi² = Var(ui|xi) = σ² h(xi) = σ² hi,

where xi denotes all the independent variables for observation i,
→ hi changes with each observation because the independent variables change
across observations.
For example, consider the simple savings function,

savi = β0 + β1 inci + ui,   Var(ui|inci) = σ² inci

Here:
h(inc) = inc,
that is, the variance of the error is proportional to the level of income,
this means that, as income increases, the variability in savings increases,
as inc > 0, the variance is always guaranteed to be positive,
the standard deviation of ui, conditional on inci, is σ√inci.

To estimate the βj,
we will take the original equation, which contains heteroskedasticity,

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + ui    (5)

and transform it into an equation that has homoskedastic errors (and
satisfies the other Gauss-Markov assumptions).
Notice that:
since hi is just a function of xi, ui/√hi has a zero expected value conditional
on xi,
further, since Var(ui|xi) = E(ui²|xi) = σ² hi, then

Var(ui/√hi | xi) = E((ui/√hi)² | xi) = E(ui²|xi)/hi = σ² hi/hi = σ²

We can divide equation (5) by √hi to get

yi/√hi = β0/√hi + β1 (xi1/√hi) + β2 (xi2/√hi) + ... + βk (xik/√hi) + ui/√hi

Let xi0* = 1/√hi and let the other starred variables denote the corresponding original
variables divided by √hi. Then:

yi* = β0 xi0* + β1 xi1* + β2 xi2* + ... + βk xik* + ui*    (6)

The starred variables rarely have a useful interpretation, but
remember that we derived eq. (6) so that we could obtain estimators of the
βj that have better efficiency properties than OLS,
for interpreting the parameters and the model, we always want to return to
the original equation (5).

In the preceding savings example, the transformed equation looks like

savi/√inci = β0 (1/√inci) + β1 √inci + ui*

→ β1 is the marginal propensity to save out of income, an interpretation we obtain
from the original savings equation.

Notice that our transformed equation (eq. (6)):
is linear in its parameters,
the random sampling assumption has not changed,
further, ui* has a zero mean and a constant variance (σ²), conditional on xi*,
→ if the original equation satisfies the first four Gauss-Markov assumptions, then
the transformed equation (6) satisfies all five Gauss-Markov assumptions.
Also, if ui has a normal distribution, then ui* has a normal distribution with
variance σ².
Therefore, the transformed equation satisfies the classical linear model
assumptions (MLR.1 through MLR.6) whenever the original model satisfies all of
them except (possibly) the homoskedasticity assumption.

We know that OLS has appealing properties (it is BLUE, for example) under the
Gauss-Markov assumptions,
→ the discussion in the previous paragraph suggests estimating the parameters in
equation (6) by OLS.
These estimators, β0*, β1*, ..., βk*, will be different from the OLS estimators in the
original equation.
The βj* are examples of generalized least squares (GLS) estimators.

Since equation (6) satisfies all of the ideal assumptions,
standard errors, t statistics, and F statistics can all be obtained from
regressions using the transformed variables,
further, because the GLS estimators of the βj are BLUE, they are more
efficient than the OLS estimators β̂j obtained from the untransformed
equation.
→ after we have transformed the variables, we simply use standard OLS analysis.
But we must remember to interpret the estimates in light of the original equation.
The R² that is obtained from estimating (6), while useful for computing some
statistics, is not especially informative: it tells us how much variation in y* is
explained by the xj*.

The GLS estimators for correcting heteroskedasticity are called weighted least
squares (WLS) estimators.
This name comes from the fact that the βj* minimize the weighted sum of squared
residuals, where each squared residual is weighted by 1/hi:

the sum over i = 1, ..., n of (yi − β0 − β1 xi1 − β2 xi2 − ... − βk xik)² / hi

the idea is that less weight is given to observations with a higher error
variance,
whereas OLS gives each observation the same weight because it is best when
the error variance is identical for all partitions of the population.

Most modern regression packages have a feature for doing weighted least squares.
Typically,
we write out the estimated equation in the usual way,
we just specify the weighting function,
→ the estimates and standard errors will be different from OLS, but the way we
interpret those estimates, standard errors, and test statistics is the same.

Example

The following table contains OLS and WLS estimates of the saving function

sav = β0 + β1 inc + u

assuming, for the WLS, that Var(ui|inci) = σ² inci.
Our data set contains 100 families from 1970.
We then add variables for family size, age of the household head, years of
education for the household head, and a dummy variable indicating whether the
household head is black.
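
Specifying the weighting function is typically a one-argument change. A minimal sketch
(not from the slides) for the savings example, where h(inc) = inc and hence the weights
are 1/inc; the data are simulated and the names are illustrative:

```python
# Sketch: WLS when Var(u|inc) = sigma^2 * inc, i.e. weights 1/h_i = 1/inc_i.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
inc = rng.uniform(10, 100, n)                          # income, always positive
sav = 5 + 0.15 * inc + rng.normal(scale=np.sqrt(inc))  # error s.d. proportional to sqrt(inc)

X = sm.add_constant(inc)
ols = sm.OLS(sav, X).fit()
wls = sm.WLS(sav, X, weights=1.0 / inc).fit()          # weight each observation by 1/inc_i

print(ols.params, wls.params)   # both are interpreted in terms of the original equation
```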

in (1): the OLS estimate of the marginal propensity to save (MPS) is 0.147, with a
t statistic of 2.53,
the WLS estimate of the MPS is somewhat higher: 0.172, with t = 3.02,
the standard errors of the OLS and WLS estimates of the MPS are very similar,
the intercept estimates are very different for OLS and WLS, but this should cause
no concern since the t statistics are both very small,
→ finding fairly large changes in coefficients that are insignificant is not uncommon
when comparing OLS and WLS estimates,
the R² in columns (1) and (2) are not comparable.

OBS: the standard errors reported for OLS are the nonrobust ones. If we really thought
heteroskedasticity was a problem, we would probably compute the
heteroskedasticity-robust standard errors as well; we will not do that here.

Adding demographic variables:
reduces the MPS whether OLS or WLS is used,
the standard errors also increase by a fair amount,
none of the additional variables is individually significant (in OLS and WLS),
→ are they jointly significant? Let's test.

F test based on the OLS estimates:
uses the R² from (1) and (3),
df = n − k − 1 = 100 − 5 − 1 = 94,
four restrictions,
→ F = [(0.0828 − 0.0621)/4] / [(1 − 0.0828)/94] ≈ 0.53,
the p-value = 0.715, and thus the variables are jointly insignificant.
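
A quick arithmetic check (not from the slides) of this F statistic and its p-value, using
the R-squared form of the statistic:

```python
# Check: F statistic in R-squared form and its p-value with (4, 94) degrees of freedom.
from scipy.stats import f

r2_ur, r2_r, q, df_denom = 0.0828, 0.0621, 4, 94
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_denom)
print(F, f.sf(F, q, df_denom))   # roughly 0.53 and 0.715
```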

F test based on the WLS estimates:
uses the R² from (2) and (4),
same df and restrictions,
→ F ≈ 0.50 and p-value = 0.739.
Thus, using either OLS or WLS, the demographic variables are jointly insignificant,
→ this suggests that the simple regression model relating savings to income is
sufficient.

Using the OLS residuals obtained from the OLS regression reported in column (1),
the regression of

û² on inc

yields a t statistic on inc of 0.96. Is there any need to use weighted least squares
in this example?

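
This last check is the single-regressor version of the BP test mentioned earlier. A small
sketch (assuming ols and inc from the earlier WLS snippet, with sm already imported):

```python
# Sketch: BP test using only inc; with a single regressor, the test is the t statistic on inc.
aux = sm.OLS(ols.resid ** 2, sm.add_constant(inc)).fit()
print(aux.tvalues[1])   # an insignificant t statistic gives little evidence of heteroskedasticity
```
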
The Heteroskedasticity Function Must Be Estimated: Feasible GLS (FGLS)

In most cases, the exact form of heteroskedasticity is not obvious,
→ it is difficult to find the function h(xi) of the previous section.
Nevertheless, in many cases we can model the function h and use the data to
estimate the unknown parameters in this model,
→ this results in an estimate of each hi, denoted ĥi.
Using ĥi instead of hi in the GLS transformation yields an estimator called the
feasible GLS (FGLS) estimator.

There are many ways to model heteroskedasticity, but we will study one particular,
fairly flexible approach. Assume that

Var(u|x) = σ² exp(δ0 + δ1 x1 + δ2 x2 + ... + δk xk)    (7)

where x1, x2, ..., xk are the independent variables appearing in the regression model,
and the δj are unknown parameters.
Other functions of the xj can appear, but we will focus primarily on (7).
In the notation of the previous subsection,

h(x) = exp(δ0 + δ1 x1 + δ2 x2 + ... + δk xk)

If the parameters δj were known, then we would just apply WLS, as in the previous
subsection.
If they are not, we use the data to estimate them, and then use these estimates to
construct weights.

How can we estimate the δj?
→ we will transform equation (7) into a linear form that, with slight modification,
can be estimated by OLS.
Under assumption (7), we can write

u² = σ² exp(δ0 + δ1 x1 + δ2 x2 + ... + δk xk) ν

where ν has a mean equal to unity, conditional on x.
If we assume that ν is actually independent of x, we can write

log(u²) = α0 + δ1 x1 + δ2 x2 + ... + δk xk + e    (8)

where e has a zero mean and is independent of x; the intercept in this equation is
different from δ0, but this is not important.
Since (8) satisfies the Gauss-Markov assumptions, we can get unbiased estimators
of the δj by using OLS.

As usual, we must replace the unobserved u with the OLS residuals. Therefore, we
run the regression of

log(û²) on x1, x2, ..., xk

Actually, what we need from this regression are the fitted values; call these ĝi.
Then, the estimates of hi are

ĥi = exp(ĝi)

We now use WLS with weights 1/ĥi. We summarize the steps below.

A feasible GLS procedure to correct for heteroskedasticity:
1 run the regression of y on x1, x2, ..., xk and obtain the residuals, û.
2 create log(û²) by first squaring the OLS residuals and then taking the natural log.
3 run the regression of log(û²) on x1, x2, ..., xk and obtain the fitted values, ĝ.
4 exponentiate the fitted values: ĥ = exp(ĝ).
5 estimate the equation
y = β0 + β1 x1 + ... + βk xk + u
by WLS, using weights 1/ĥ.

If we could use (the correct) hi rather than ĥi in the WLS procedure, we know that
our estimators would be BLUE.
Having to estimate hi means that the FGLS estimator is no longer unbiased (so it
cannot be BLUE, either).
Nevertheless, the FGLS estimator has good properties in large samples (for
example, it is asymptotically more efficient than OLS),
→ for large sample sizes, FGLS is an attractive alternative to OLS when there is
evidence of heteroskedasticity.
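
The five steps map directly into a short function. A minimal sketch (not from the slides);
y and X (a regressor matrix including a constant) are assumed:

```python
# Sketch: feasible GLS with h(x) = exp(delta_0 + delta_1*x_1 + ... + delta_k*x_k).
import numpy as np
import statsmodels.api as sm

def fgls(y, X):
    u_hat = sm.OLS(y, X).fit().resid                  # step 1: OLS residuals
    log_u2 = np.log(u_hat ** 2)                       # step 2: log of squared residuals
    g_hat = sm.OLS(log_u2, X).fit().fittedvalues      # step 3: fitted values g-hat
    h_hat = np.exp(g_hat)                             # step 4: h-hat = exp(g-hat)
    return sm.WLS(y, X, weights=1.0 / h_hat).fit()    # step 5: WLS with weights 1/h-hat
```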

We must remember that the FGLS estimators are estimators of the parameters in
the equation

y = β0 + β1 x1 + ... + βk xk + u.

just as the OLS estimates measure the marginal impact of each xj on y, so
do the FGLS estimates,
we use the FGLS estimates in place of the OLS estimates because they are
more efficient and have associated test statistics with the usual t and F
distributions, at least in large samples.

Example

We estimate a demand function for daily cigarette consumption.
Since most people do not smoke, the dependent variable, cigs, is zero for most
observations,
a linear model is not ideal because it can result in negative predicted values,
nevertheless, we can still learn something about the determinants of cigarette
smoking by using a linear model.

The equation estimated by ordinary least squares, with the usual OLS standard errors in
parentheses, is [equation not reproduced in this extract], where
cigs: number of cigarettes smoked per day,
income: annual income,
cigpric: per pack price of cigarettes (in cents),
educ: years of schooling,
age: age measured in years, and
restaurn: binary indicator equal to unity if the person resides in a state with
restaurant smoking restrictions.

From the OLS estimates:
neither income nor cigarette price is statistically significant,
each year of education reduces the average cigarettes smoked per day by one-half,
and the effect is statistically significant,
cigarette smoking is also related to age, in a quadratic fashion:
smoking increases with age up until age = 0.771/[2 × 0.009] ≈ 42.83,
and then smoking decreases with age,
both terms in the quadratic are statistically significant,
the presence of a restriction on smoking in restaurants decreases cigarette smoking
by almost three cigarettes per day, on average.

Do the errors contain heteroskedasticity? The BP test:
the regression of the squared OLS residuals on the independent variables produces
R²û² = 0.040,
the LM statistic is LM = 807 × 0.040 = 32.28, and this is the outcome of a χ²6
random variable,
the p-value is less than 0.000015, which is very strong evidence of
heteroskedasticity.

Therefore, we estimate the equation using the previous FGLS procedure. The estimated
equation is [not reproduced in this extract]:
the income effect is now statistically significant and larger in magnitude,
the price effect is also notably bigger, but it is still statistically insignificant,
the estimates on the other variables have, naturally, changed somewhat, but the
basic story is still the same. Cigarette smoking:
is negatively related to schooling,
has a quadratic relationship with age, and
is negatively affected by restaurant smoking restrictions.

FGLS

Two important observations before finishing this topic.

First, we must be careful in computing F statistics for testing multiple hypotheses
after estimation by WLS:
it is important that the same weights be used to estimate the unrestricted
and restricted models,
we should first estimate the unrestricted model by OLS,
once we have obtained the weights, we can use them to estimate the
restricted model as well,
the F statistic can be computed as usual.
Fortunately, many regression packages have a simple command for testing joint
restrictions after WLS estimation, so we need not perform the restricted regression
ourselves.

Secondly, our last example hints at an issue that sometimes arises in applications of
weighted least squares: the OLS and WLS estimates can be substantially different.
This is not such a big problem in the demand for cigarettes equation:
all the coefficients maintain the same signs, and
the biggest changes are on variables that were statistically insignificant when
the equation was estimated by OLS.
The OLS and WLS estimates will always differ due to sampling error,
→ the issue is whether their difference is enough to change important conclusions.

If OLS and WLS produce statistically significant estimates that:
differ in sign, or
the difference in magnitudes of the estimates is practically large,
→ we should be suspicious.
Typically, this indicates that one of the other Gauss-Markov assumptions is false,
particularly the zero conditional mean assumption on the error (MLR.3):
correlation between u and any independent variable causes bias and
inconsistency in OLS and WLS, and the biases will usually be different,
the Hausman test can be used to formally compare the OLS and WLS
estimates to see if they differ by more than the sampling error suggests. This
test is beyond the scope of this text. In many cases, an informal "eyeballing"
of the estimates is sufficient to detect a problem.

The R-Squared Form of the F Statistic

Remember that, by definition,

SSR = SST(1 − R²)

Thus, substituting SSRr = SST(1 − Rr²) and SSRur = SST(1 − Rur²) and cancelling the
common factor SST, a bit of algebra shows that

F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]
  = [(Rur² − Rr²)/q] / [(1 − Rur²)/(n − k − 1)]

Since the R² is reported with almost all regressions (whereas the SSR is not), it is
easy to use the R² from the unrestricted and restricted models to test for exclusion
of some variables.
