Chapter 11
Answers to the following questions:

1. What is the nature of heteroscedasticity?

2. What are its consequences?
3. How does one detect it?
4. What are the remedial measures?

The nature of heteroscedasticity

1. Error-learning models → the variance is expected to decrease
2. Discretionary income → the variance is expected to increase as income grows
3. Improved data-collection techniques → the variance is likely to decrease
4. Presence of outliers
5. Specification bias → some important variables are omitted from the model

6. Skewness in the distribution of one or more regressors included in the model
7. As David Hendry notes, heteroscedasticity can
also arise because of:
a) Incorrect data transformation
b) Incorrect functional form
8. The problem of heteroscedasticity is likely to be more common in cross-sectional than in time-series data.

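OLS estimation in the presence of heteroscedasticity
• The two-variable model:
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
• The OLS estimator of the slope, with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$:
$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}$$
• Its variance when $E(u_i^2) = \sigma_i^2$:
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$$
• $\hat{\beta}_2$ is still linear, unbiased, and consistent, but it no longer has minimum variance → the OLS estimators are not BLUE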
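
The slope and variance formulas above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the data values and the function names (`ols_slope`, `slope_variance_hetero`, `slope_variance_homo`) are illustrative assumptions, and the per-observation variances are treated as known.

```python
import numpy as np

def ols_slope(X, Y):
    """OLS slope in deviation form: sum(x*y) / sum(x^2)."""
    x = X - X.mean()
    y = Y - Y.mean()
    return np.sum(x * y) / np.sum(x ** 2)

def slope_variance_hetero(X, sigma2_i):
    """var(beta2_hat) = sum(x_i^2 * sigma_i^2) / (sum(x_i^2))^2."""
    x = X - X.mean()
    return np.sum(x ** 2 * sigma2_i) / np.sum(x ** 2) ** 2

def slope_variance_homo(X, sigma2):
    """Usual formula sigma^2 / sum(x_i^2); valid only under homoscedasticity."""
    x = X - X.mean()
    return sigma2 / np.sum(x ** 2)

# Made-up illustrative data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

print(ols_slope(X, Y))
# With equal variances the two variance formulas coincide.
print(slope_variance_hetero(X, np.full(5, 2.0)), slope_variance_homo(X, 2.0))
# Hypothetical variances that grow with X^2
print(slope_variance_hetero(X, 0.5 * X ** 2))
```

When all $\sigma_i^2$ are equal, the heteroscedastic formula collapses to the familiar $\sigma^2 / \sum x_i^2$, which is why the usual OLS variance is a special case.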

The method of Generalized Least Squares (GLS)
• GLS is OLS on the transformed variables that satisfy the standard least squares assumptions
• GLS estimators are BLUE
• The PRF:
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
$$Y_i = \beta_1 X_{0i} + \beta_2 X_i + u_i, \quad \text{where } X_{0i} = 1 \text{ for each } i$$
Assume that the heteroscedastic variances $\sigma_i^2$ are known, and divide through by $\sigma_i$:
$$\frac{Y_i}{\sigma_i} = \beta_1 \left(\frac{X_{0i}}{\sigma_i}\right) + \beta_2 \left(\frac{X_i}{\sigma_i}\right) + \left(\frac{u_i}{\sigma_i}\right)$$

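$$Y_i^* = \beta_1^* X_{0i}^* + \beta_2^* X_i^* + u_i^*$$
where the starred variables are the original variables divided by $\sigma_i$. The transformed error term is homoscedastic:
$$\operatorname{var}(u_i^*) = E\left(u_i^{*2}\right) = E\left(\frac{u_i}{\sigma_i}\right)^2 = \frac{1}{\sigma_i^2} E(u_i^2) \quad \text{(since } \sigma_i^2 \text{ is known)}$$
$$= \frac{1}{\sigma_i^2}\,\sigma_i^2 = 1 \quad \text{(since } E(u_i^2) = \sigma_i^2 \text{)}$$
→ the variance is a constant, so $u_i^*$ is homoscedastic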

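GLS estimators
The sample counterpart of the transformed model:
$$\frac{Y_i}{\sigma_i} = \hat{\beta}_1^* \left(\frac{X_{0i}}{\sigma_i}\right) + \hat{\beta}_2^* \left(\frac{X_i}{\sigma_i}\right) + \left(\frac{\hat{u}_i}{\sigma_i}\right)$$
$$Y_i^* = \hat{\beta}_1^* X_{0i}^* + \hat{\beta}_2^* X_i^* + \hat{u}_i^*$$
Minimize
$$\sum \hat{u}_i^{*2} = \sum \left(Y_i^* - \hat{\beta}_1^* X_{0i}^* - \hat{\beta}_2^* X_i^*\right)^2$$
that is,
$$\sum \left(\frac{\hat{u}_i}{\sigma_i}\right)^2 = \sum \left[\frac{Y_i}{\sigma_i} - \hat{\beta}_1^* \left(\frac{X_{0i}}{\sigma_i}\right) - \hat{\beta}_2^* \left(\frac{X_i}{\sigma_i}\right)\right]^2$$
The GLS estimator of the slope and its variance:
$$\hat{\beta}_2^* = \frac{\left(\sum w_i\right)\left(\sum w_i X_i Y_i\right) - \left(\sum w_i X_i\right)\left(\sum w_i Y_i\right)}{\left(\sum w_i\right)\left(\sum w_i X_i^2\right) - \left(\sum w_i X_i\right)^2}$$
$$\operatorname{var}(\hat{\beta}_2^*) = \frac{\sum w_i}{\left(\sum w_i\right)\left(\sum w_i X_i^2\right) - \left(\sum w_i X_i\right)^2}$$
where $w_i = 1/\sigma_i^2$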
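
To illustrate that the closed-form GLS expressions agree with OLS on the transformed variables, here is a minimal sketch. The data, the assumed known `sigma_i`, and the function names are hypothetical; the intercept formula in the code (weighted means) is an addition not stated above.

```python
import numpy as np

def gls_two_variable(X, Y, sigma_i):
    """Closed-form GLS for Y = b1 + b2*X with known sigma_i."""
    w = 1.0 / sigma_i ** 2                       # weights w_i = 1 / sigma_i^2
    denom = w.sum() * np.sum(w * X ** 2) - np.sum(w * X) ** 2
    b2 = (w.sum() * np.sum(w * X * Y) - np.sum(w * X) * np.sum(w * Y)) / denom
    # Intercept from the weighted normal equation (added for completeness)
    b1 = (np.sum(w * Y) - b2 * np.sum(w * X)) / w.sum()
    var_b2 = w.sum() / denom
    return b1, b2, var_b2

def ols_on_transformed(X, Y, sigma_i):
    """OLS of Y/sigma on X0/sigma and X/sigma, with no extra intercept."""
    Z = np.column_stack([1.0 / sigma_i, X / sigma_i])
    coef, *_ = np.linalg.lstsq(Z, Y / sigma_i, rcond=None)
    return coef

# Made-up illustrative data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.3, 3.8, 6.9, 7.1, 11.4, 10.2])
sigma_i = 0.5 * X            # hypothetical known standard deviations

b1, b2, var_b2 = gls_two_variable(X, Y, sigma_i)
print(b1, b2, var_b2)
print(ols_on_transformed(X, Y, sigma_i))   # should match (b1, b2)
```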

Difference between OLS and GLS

In OLS we minimize the sum of squared residuals
$$\sum \hat{u}_i^2 = \sum \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right)^2$$
In GLS we minimize a weighted sum of squared residuals with $w_i = 1/\sigma_i^2$
$$\sum w_i \hat{u}_i^2 = \sum w_i \left(Y_i - \hat{\beta}_1^* - \hat{\beta}_2^* X_i\right)^2$$
An observation coming from a population with larger $\sigma_i$ gets a relatively smaller weight, and one from a population with smaller $\sigma_i$ gets a proportionally larger weight, in minimizing the RSS

Consequences of using OLS in the presence of heteroscedasticity
$$\operatorname{var}(\hat{\beta}_2^*) \le \operatorname{var}(\hat{\beta}_2)$$
• Confidence intervals based on the latter will be unnecessarily wide
• The t and F tests are likely to give inaccurate results, and a coefficient may wrongly appear statistically insignificant
• If we persist in using the usual testing procedures despite heteroscedasticity, whatever conclusions we draw or inferences we make may be very misleading

Informal methods
1. Nature of the problem
2. Graphical method

Formal methods
1. Park test
2. Glejser test
3. Spearman’s rank correlation test
4. Goldfeld-Quandt test
5. Breusch-Pagan-Godfrey test
6. White’s general heteroscedasticity test
7. Koenker-Bassett test

Graphical method
[Figure: five panels plotting the squared residuals $\hat{u}^2$]

Park test
• The variance is assumed to be some function of the explanatory variable $X_i$:
$$\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i} \quad \text{or} \quad \ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$$
• Since $\sigma_i^2$ is generally unknown, use $\hat{u}_i^2$ as a proxy:
$$\ln \hat{u}_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i = \alpha + \beta \ln X_i + v_i$$
• If $\beta$ is statistically significant → heteroscedasticity
• Problem: $v_i$ may not satisfy the OLS assumptions and may itself be heteroscedastic
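
The two stages of the Park test can be sketched as follows. This is an illustrative sketch under stated assumptions: the simulated data (error standard deviation proportional to $X$), the seed, and the helper names `ols` and `park_test` are not from the text, and the t statistic uses conventional OLS standard errors.

```python
import numpy as np

def ols(Z, y):
    """Return coefficients, residuals, and conventional standard errors."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    s2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))
    return coef, resid, se

def park_test(X, Y):
    """Regress ln(u_hat^2) on ln(X); return beta_hat and its t statistic."""
    const = np.ones_like(X)
    _, resid, _ = ols(np.column_stack([const, X]), Y)           # stage 1
    coef, _, se = ols(np.column_stack([const, np.log(X)]),
                      np.log(resid ** 2))                       # stage 2
    return coef[1], coef[1] / se[1]

# Simulated data with error variance proportional to X^2 (assumption)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.0, 200)
Y = 1.0 + 2.0 * X + X * rng.standard_normal(200)

beta_hat, t_stat = park_test(X, Y)
print(beta_hat, t_stat)
```

The test requires $X_i > 0$ so that $\ln X_i$ is defined.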

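Glejser test
Regress the absolute residuals $|\hat{u}_i|$ on the X variable, using one of the following functional forms:
$$|\hat{u}_i| = \beta_1 + \beta_2 X_i + v_i \qquad |\hat{u}_i| = \beta_1 + \beta_2 \sqrt{X_i} + v_i$$
$$|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{X_i} + v_i \qquad |\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} + v_i$$
$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i \qquad |\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$$
Problem: the error term $v_i$ has some problems in that its expected value is nonzero, it is serially correlated and, ironically, it is itself heteroscedastic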
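
A minimal sketch of the Glejser regressions follows. It covers only the four forms that are linear in the parameters, since the two square-root forms cannot be estimated by ordinary OLS. The simulated data, the `FORMS` dictionary, and the function name `glejser_test` are illustrative assumptions.

```python
import numpy as np

# The four functional forms that are linear in the parameters
FORMS = {
    "X": lambda x: x,
    "sqrt(X)": np.sqrt,
    "1/X": lambda x: 1.0 / x,
    "1/sqrt(X)": lambda x: 1.0 / np.sqrt(x),
}

def glejser_test(X, Y, form="X"):
    """Regress |u_hat| on a transform of X; return slope and its t statistic."""
    n = len(X)
    const = np.ones(n)
    Z = np.column_stack([const, X])
    b, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    abs_resid = np.abs(Y - Z @ b)                 # |u_hat_i|
    A = np.column_stack([const, FORMS[form](X)])
    a, *_ = np.linalg.lstsq(A, abs_resid, rcond=None)
    v = abs_resid - A @ a
    s2 = v @ v / (n - 2)
    se = np.sqrt(s2 * np.linalg.inv(A.T @ A)[1, 1])
    return a[1], a[1] / se

# Simulated data with error standard deviation proportional to X (assumption)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.0, 200)
Y = 1.0 + 2.0 * X + X * rng.standard_normal(200)

for name in FORMS:
    print(name, glejser_test(X, Y, name))
```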

Spearman’s rank correlation test

Spearman’s rank correlation coefficient:
$$r_s = 1 - 6\left[\frac{\sum d_i^2}{n\left(n^2 - 1\right)}\right]$$
$d_i$ = difference in the ranks assigned to two different characteristics of the i-th individual or phenomenon
$n$ = number of individuals or phenomena ranked

1. Fit the regression to the data on Y and X and obtain the residuals $\hat{u}_i$
2. Take the absolute value of the residuals, rank both $|\hat{u}_i|$ and $X_i$, and compute Spearman’s rank correlation coefficient
3. Assuming that the population rank correlation coefficient $\rho_s$ is zero and $n > 8$, the significance of the sample $r_s$ can be tested by
$$t = \frac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}} \quad \text{with df} = n - 2$$
If t > the critical t → heteroscedasticity
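
The three steps above can be sketched in code. This is a hedged illustration: the ranking helper assumes there are no ties, and the simulated data and function names (`ranks`, `spearman_rs`, `spearman_het_test`) are assumptions for the example only.

```python
import numpy as np

def ranks(a):
    """Ranks 1..n; this sketch assumes there are no tied values."""
    order = np.argsort(a)
    r = np.empty(len(a))
    r[order] = np.arange(1, len(a) + 1)
    return r

def spearman_rs(a, b):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    d = ranks(a) - ranks(b)
    n = len(a)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def spearman_het_test(X, Y):
    """Rank-correlate |u_hat| with X and return (r_s, t) with df = n - 2."""
    Z = np.column_stack([np.ones_like(X), X])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    abs_resid = np.abs(Y - Z @ coef)
    rs = spearman_rs(abs_resid, X)
    n = len(X)
    t = rs * np.sqrt(n - 2) / np.sqrt(1.0 - rs ** 2)
    return rs, t

# Simulated data with error standard deviation proportional to X (assumption)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.0, 200)
Y = 1.0 + 2.0 * X + X * rng.standard_normal(200)

rs, t = spearman_het_test(X, Y)
print(rs, t)
```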

Goldfeld-Quandt test
• Assumes that the heteroscedastic variance, $\sigma_i^2$, is positively related to one of the explanatory variables in the regression model
• The two-variable model: $Y_i = \beta_1 + \beta_2 X_i + u_i$
• $\sigma_i^2$ is positively related to $X_i$ as
$$\sigma_i^2 = \sigma^2 X_i^2$$
where $\sigma^2$ is a constant

1. Order the observations according to the values of $X_i$, beginning with the lowest X value
2. Omit c central observations, and divide the remaining observations into two groups of (n−c)/2 observations each
3. Fit separate OLS regressions to the first (n−c)/2 observations and the last (n−c)/2 observations, and obtain the residual sums of squares RSS1 and RSS2. Each RSS has
$$\frac{n - c}{2} - k \quad \text{or} \quad \frac{n - c - 2k}{2} \ \text{df}$$
where k is the number of parameters to be estimated, including the intercept
4. Compute the ratio
$$\lambda = \frac{RSS_2 / \text{df}}{RSS_1 / \text{df}}$$
If the $u_i$ are assumed to be normally distributed and if the assumption of homoscedasticity is valid, then it can be shown that $\lambda$ follows the F distribution with numerator and denominator df each of (n−c−2k)/2. If $\lambda$ > the critical F, reject the hypothesis of homoscedasticity
• The success of the GQ test depends on the value of c and on identifying the correct X with which to order the observations
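
The four steps can be sketched as below for the two-variable case. This is an illustration under stated assumptions: the simulated data, the choice `c=40`, and the function names are hypothetical, and n − c is assumed to be even. The statistic is returned without a p-value; it would be compared with a critical F value from tables.

```python
import numpy as np

def rss(X, Y):
    """Residual sum of squares from an OLS fit of Y on a constant and X."""
    Z = np.column_stack([np.ones_like(X), X])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ coef
    return resid @ resid

def goldfeld_quandt(X, Y, c, k=2):
    """Return (lambda, df) where df = (n - c - 2k) / 2 for each sub-sample."""
    order = np.argsort(X)                  # step 1: order by X
    Xs, Ys = X[order], Y[order]
    n = len(X)
    m = (n - c) // 2                       # step 2: size of each group
    rss1 = rss(Xs[:m], Ys[:m])             # step 3: low-X group
    rss2 = rss(Xs[n - m:], Ys[n - m:])     #         high-X group
    df = m - k
    lam = (rss2 / df) / (rss1 / df)        # step 4
    return lam, df

# Simulated data with error variance proportional to X^2 (assumption)
rng = np.random.default_rng(0)
n = 200
X = np.linspace(1.0, 10.0, n)
Y = 1.0 + 2.0 * X + X * rng.standard_normal(n)

lam, df = goldfeld_quandt(X, Y, c=40)
print(lam, df)
```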

Breusch-Pagan-Godfrey test
The k-variable regression model:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$$
Assume that the error variance $\sigma_i^2$ is described as
$$\sigma_i^2 = f\left(\alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi}\right)$$
that is, $\sigma_i^2$ is some function of the nonstochastic variables Z's.
Specifically, assume that
$$\sigma_i^2 = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi}$$
that is, $\sigma_i^2$ is a linear function of the Z's.
If $\alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$ → $\sigma_i^2 = \alpha_1$ (a constant)

1. Estimate the model by OLS and obtain the residuals $\hat{u}_i$
2. Obtain $\tilde{\sigma}^2 = \sum \hat{u}_i^2 / n$, that is, the ML estimator of $\sigma^2$
3. Construct variables $p_i$ defined as $p_i = \hat{u}_i^2 / \tilde{\sigma}^2$
4. Regress $p_i$ on the Z’s as $p_i = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} + v_i$
5. Obtain the ESS (explained sum of squares) and define
$$\Theta = \frac{1}{2}\,\text{ESS}$$
Assuming the $u_i$ are normally distributed, if there is homoscedasticity and if the sample size n increases indefinitely, then
$$\Theta \underset{asy}{\sim} \chi^2_{m-1}$$
If $\Theta$ > the critical $\chi^2$, one can reject the hypothesis of homoscedasticity
• The test is sensitive to the normality assumption
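
The five steps can be sketched as follows. This is a hedged illustration: the simulated data, the choice of $Z = X$, and the function name `bpg_test` are assumptions made for the example. The function returns the statistic and its degrees of freedom, to be compared with a critical chi-square value.

```python
import numpy as np

def bpg_test(X, Y, Zvars):
    """Return (Theta, df) with Theta = ESS / 2 and df = m - 1."""
    n = len(Y)
    W = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(W, Y, rcond=None)       # step 1: OLS residuals
    resid = Y - W @ b
    sigma2_ml = resid @ resid / n                   # step 2: ML estimator
    p = resid ** 2 / sigma2_ml                      # step 3: p_i
    A = np.column_stack([np.ones(n), Zvars])
    a, *_ = np.linalg.lstsq(A, p, rcond=None)       # step 4: regress p on Z's
    ess = np.sum((A @ a - p.mean()) ** 2)           # step 5: explained SS
    return 0.5 * ess, A.shape[1] - 1

# Simulated data with error variance proportional to X^2 (assumption)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.0, 200)
Y = 1.0 + 2.0 * X + X * rng.standard_normal(200)

theta, df = bpg_test(X, Y, X)     # here the single Z variable is X itself
print(theta, df)
```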

White’s general heteroscedasticity test
• Does not rely on the normality assumption
• The three-variable regression model:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
• The k-variable regression model:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$$

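The White test proceeds as follows:
1. Estimate the regression model and obtain the residuals $\hat{u}_i$
2. Estimate the following auxiliary regression
$$\hat{u}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + v_i$$
3. H0: there is no heteroscedasticity
4. Obtain
$$n R^2 \underset{asy}{\sim} \chi^2_{\text{df}}$$
where $R^2$ comes from the auxiliary regression and df = number of regressors in it (excluding the constant term)
5. If the computed chi-square value > the critical chi-square, the conclusion is that there is heteroscedasticity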
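
For the three-variable model, the White procedure can be sketched as below. The simulated data and the function name `white_test` are illustrative assumptions; the sketch returns $nR^2$ and the degrees of freedom rather than a p-value.

```python
import numpy as np

def white_test(X2, X3, Y):
    """Return (n * R^2, df) from White's auxiliary regression."""
    n = len(Y)
    W = np.column_stack([np.ones(n), X2, X3])
    b, *_ = np.linalg.lstsq(W, Y, rcond=None)          # step 1
    u2 = (Y - W @ b) ** 2
    # Step 2: levels, squares, and cross product of the regressors
    A = np.column_stack([np.ones(n), X2, X3, X2 ** 2, X3 ** 2, X2 * X3])
    a, *_ = np.linalg.lstsq(A, u2, rcond=None)
    fitted = A @ a
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2, A.shape[1] - 1                      # step 4

# Simulated data: error standard deviation proportional to X2 (assumption)
rng = np.random.default_rng(0)
n = 200
X2 = rng.uniform(1.0, 10.0, n)
X3 = rng.uniform(1.0, 5.0, n)
Y = 1.0 + 2.0 * X2 + 3.0 * X3 + X2 * rng.standard_normal(n)

stat, df = white_test(X2, X3, Y)
print(stat, df)
```

With more regressors the auxiliary regression grows quickly, since every square and cross product is included, which uses up degrees of freedom.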

Koenker-Bassett (KB) test
• Estimate the regression model
$$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$$
• Obtain the estimated Y values and the residuals, and then estimate
$$\hat{u}_i^2 = \alpha_1 + \alpha_2 \left(\hat{Y}_i\right)^2 + v_i$$
• H0: $\alpha_2 = 0$
If this is not rejected, conclude that there is no heteroscedasticity
• It is applicable even if the error term in the original model is not normally distributed
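
A minimal sketch of the KB regression for a two-variable model follows. The simulated data and the helper names `ols` and `kb_test` are assumptions for illustration, and the t statistic uses conventional OLS standard errors.

```python
import numpy as np

def ols(Z, y):
    """Return coefficients, fitted values, residuals, and standard errors."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ coef
    resid = y - fitted
    s2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))
    return coef, fitted, resid, se

def kb_test(X, Y):
    """Regress u_hat^2 on Y_hat^2; return alpha2_hat and its t statistic."""
    const = np.ones(len(Y))
    _, y_hat, resid, _ = ols(np.column_stack([const, X]), Y)
    coef, _, _, se = ols(np.column_stack([const, y_hat ** 2]), resid ** 2)
    return coef[1], coef[1] / se[1]

# Simulated data with error variance proportional to X^2 (assumption)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.0, 200)
Y = 1.0 + 2.0 * X + X * rng.standard_normal(200)

alpha2_hat, t_stat = kb_test(X, Y)
print(alpha2_hat, t_stat)
```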

Remedial measures

• When $\sigma_i^2$ is known: the method of weighted least squares
• When $\sigma_i^2$ is not known
  • The true $\sigma_i^2$ are rarely known
  • White’s heteroscedasticity-consistent variances and standard errors → robust standard errors
  • These can be larger or smaller than the uncorrected standard errors
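
As an illustration of robust standard errors, the sketch below computes the basic White estimator, often labelled HC0, as $(Z'Z)^{-1} Z' \operatorname{diag}(\hat{u}_i^2) Z (Z'Z)^{-1}$. The choice of the HC0 variant, the simulated data, and the function name are assumptions for the example; small-sample corrections are not applied.

```python
import numpy as np

def ols_with_robust_se(X, Y):
    """Return OLS coefficients, conventional SEs, and White (HC0) SEs."""
    n = len(X)
    Z = np.column_stack([np.ones(n), X])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    b = ZtZ_inv @ Z.T @ Y
    resid = Y - Z @ b
    usual = np.sqrt(np.diag(resid @ resid / (n - 2) * ZtZ_inv))
    meat = Z.T @ (Z * resid[:, None] ** 2)          # Z' diag(u_hat^2) Z
    robust = np.sqrt(np.diag(ZtZ_inv @ meat @ ZtZ_inv))
    return b, usual, robust

# Simulated data with error variance proportional to X^2 (assumption)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.0, 200)
Y = 1.0 + 2.0 * X + X * rng.standard_normal(200)

b, usual, robust = ols_with_robust_se(X, Y)
print(b)
print(usual)
print(robust)
```

For the slope, the HC0 variance equals $\sum x_i^2 \hat{u}_i^2 / (\sum x_i^2)^2$, that is, the heteroscedastic variance formula with $\hat{u}_i^2$ standing in for the unknown $\sigma_i^2$.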

Assumptions about the heteroscedasticity pattern
1. The error variance is proportional to $X_i^2$
$$E(u_i^2) = \sigma^2 X_i^2$$
Divide the original model through by $X_i$:
$$\frac{Y_i}{X_i} = \frac{\beta_1}{X_i} + \beta_2 + \frac{u_i}{X_i} = \beta_1 \frac{1}{X_i} + \beta_2 + v_i$$
$$E(v_i^2) = E\left(\frac{u_i}{X_i}\right)^2 = \frac{1}{X_i^2} E(u_i^2) = \frac{1}{X_i^2}\,\sigma^2 X_i^2 = \sigma^2$$

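2. The error variance is proportional to $X_i$ (the square root transformation)
$$E(u_i^2) = \sigma^2 X_i$$
The model can be transformed:
$$\frac{Y_i}{\sqrt{X_i}} = \frac{\beta_1}{\sqrt{X_i}} + \beta_2 \sqrt{X_i} + \frac{u_i}{\sqrt{X_i}} = \beta_1 \frac{1}{\sqrt{X_i}} + \beta_2 \sqrt{X_i} + v_i$$
$$E(v_i^2) = E\left(\frac{u_i}{\sqrt{X_i}}\right)^2 = \frac{1}{X_i} E(u_i^2) = \frac{1}{X_i}\,\sigma^2 X_i = \sigma^2$$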

3. The error variance is proportional to the square of the mean value of Y
$$E(u_i^2) = \sigma^2 \left[E(Y_i)\right]^2$$
The transformed model:
$$\frac{Y_i}{E(Y_i)} = \frac{\beta_1}{E(Y_i)} + \beta_2 \frac{X_i}{E(Y_i)} + \frac{u_i}{E(Y_i)} = \beta_1 \frac{1}{E(Y_i)} + \beta_2 \frac{X_i}{E(Y_i)} + v_i$$
$$E(v_i^2) = \sigma^2$$
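
The three transformations share one mechanic: divide every term of the model by a scale factor and run OLS without a separate intercept. The sketch below illustrates this. The simulated data and the function name `fit_transformed` are assumptions. For the third pattern, $E(Y_i)$ is unknown, so the sketch substitutes OLS fitted values in a two-step procedure, which is an assumption added here rather than something stated above.

```python
import numpy as np

def fit_transformed(X, Y, scale):
    """OLS on the model divided through by `scale`; returns (beta1, beta2)."""
    # Columns are 1/scale (for beta1) and X/scale (for beta2); no extra intercept.
    Z = np.column_stack([1.0 / scale, X / scale])
    coef, *_ = np.linalg.lstsq(Z, Y / scale, rcond=None)
    return coef

# Simulated data with error variance proportional to X^2 (assumption)
rng = np.random.default_rng(1)
X = np.linspace(1.0, 10.0, 100)
Y = 1.0 + 2.0 * X + X * rng.standard_normal(100)

b_case1 = fit_transformed(X, Y, X)             # pattern 1: divide by X
b_case2 = fit_transformed(X, Y, np.sqrt(X))    # pattern 2: divide by sqrt(X)

# Pattern 3: replace the unknown E(Y_i) by first-step OLS fitted values.
Z = np.column_stack([np.ones_like(X), X])
b_ols, *_ = np.linalg.lstsq(Z, Y, rcond=None)
Y_hat = Z @ b_ols
b_case3 = fit_transformed(X, Y, Y_hat)

print(b_case1, b_case2, b_case3)
```

In pattern 1 the column `X / scale` is a column of ones, so $\beta_2$ appears as the intercept of the transformed regression and $\beta_1$ as the coefficient on $1/X_i$.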

4. A log transformation
$$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i$$
very often reduces heteroscedasticity
• The log transformation compresses the scales in which the variables are measured
• The slope coefficient $\beta_2$ measures the elasticity of Y with respect to X
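
A small sketch of the log-log regression follows. The data are constructed, as an assumption for illustration, from an exact constant-elasticity relation $Y = 3X^{0.7}$, so the fitted slope should recover the elasticity 0.7. The function name `log_log_fit` is hypothetical.

```python
import numpy as np

def log_log_fit(X, Y):
    """OLS of ln(Y) on a constant and ln(X); returns (beta1, beta2)."""
    Z = np.column_stack([np.ones(len(X)), np.log(X)])
    coef, *_ = np.linalg.lstsq(Z, np.log(Y), rcond=None)
    return coef

# Exact constant-elasticity data (assumption): Y = 3 * X^0.7
X = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
Y = 3.0 * X ** 0.7

b1, b2 = log_log_fit(X, Y)
print(b1, b2)            # b2 is the elasticity; b1 is ln(3)

# Scale compression: a tenfold gap in levels is a much smaller gap in logs.
print(80.0 / 8.0, np.log(80.0) / np.log(8.0))
```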

Concluding remarks on the remedial measures
• All the transformations discussed previously are ad hoc → we are essentially speculating about the nature of $\sigma_i^2$
• Some problems:
1. Beyond the two-variable model, we do not know a priori which of the X variables should be chosen for transforming the data
2. The log transformation is not applicable if some of the Y and X values are zero or negative
3. The problem of spurious correlation
4. All testing procedures using the t test, F test, etc. are, strictly speaking, valid only in large samples

