
Topic 9

HETEROSCEDASTICITY

- DEFINITION OF HETEROSCEDASTICITY
- EXAMPLES OF HET MODELS
- TESTS FOR HETEROSCEDASTICITY
- REMEDIAL METHODS: WEIGHTED AND LOGARITHMIC REGRESSIONS

HETEROSCEDASTICITY

[Figure: the line $Y = \beta_1 + \beta_2 X$ with intercept $\beta_1$, and five sample values $X_1, \dots, X_5$ marked on the X axis.]

This sequence introduces the topic of heteroscedasticity, which relates to the distribution
of the disturbance term in a regression model.


We will discuss it in the context of the regression model $Y = \beta_1 + \beta_2 X + u$. To keep the
diagram uncluttered, we will suppose that we have a sample of only five observations, the X
values of which are as shown.

If there were no disturbance term in the model, the observations would lie on the line as
shown.


Now we take account of the effect of the disturbance term. It will displace each observation
in the vertical dimension, since it modifies the value of Y without affecting X.

The disturbance term in each observation is hypothesized to be drawn randomly from a
given distribution. In the diagram, three assumptions are being made.


One is that the expected value of u in each observation is 0. That is, E(u)=0. The second is
that the distribution in each observation is normal. We are not concerned with either of
these and we will assume them to be true.
The third is that the distribution is the same for each observation: $\mathrm{Var}(u_t) = \sigma^2$. In the present
case, that means that the normal distributions shown all have the same variance. In other
words, the variance of $u_t$ does not depend on $X$; $\sigma^2$ is not a function of $X_t$.
If this condition is satisfied, the disturbance term is said to be homoscedastic
(Greek for same scattering).

Each observation is then potentially (before the sample is drawn) an equally reliable guide
to the location of the line $Y = \beta_1 + \beta_2 X$.

Once the sample has been drawn, some observations will lie closer to the line than others,
but we have no way of anticipating in advance which ones these will be.

[Figure: the same line, with the distribution of u becoming more dispersed as X increases from $X_1$ to $X_5$.]

Now consider the situation illustrated by the diagram above. The distribution of u
associated with each observation still has expected value 0 and is normal. However, the
variance is no longer constant.

Obviously, observations where u has low variance, like that for X1, will tend to be better
guides to the underlying relationship than those like that for X5, where it has a relatively
high variance.
When the distribution is not the same for each observation, the disturbance term is said to
be subject to heteroscedasticity.

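Examples of het models
Example 1. Consumption-expenditure model (variance increasing monotonically with the regressor):
$CON_t = \beta_1 + \beta_2 INC_t + u_t$
It is likely that $\mathrm{Var}(u_t) > \mathrm{Var}(u_s)$ if $INC_t > INC_s$.

Example 2. Error-learning model (variance decreasing monotonically with the regressor):
$ERRORS_t = \beta_1 + \beta_2 PRACTICE\_TIME_t + u_t$
As practice time increases, the variance as well as the average of typing errors will decrease:
the more one practices, the fewer errors one makes.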

Example 3. Average hourly earnings vs. years of education
(Data source: Current Population Survey)

CONSEQUENCES OF HETEROSCEDASTICITY

The OLS estimator is still linear and unbiased. There are two major consequences of
heteroscedasticity. One is that the standard errors of the regression coefficients are
estimated wrongly and the t tests (and F test) are invalid.
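Example:
For the simple regression model $Y_t = \beta_1 + \beta_2 X_t + u_t$, suppose there is HET, with $\mathrm{var}(u_t) = \sigma_t^2$. Then the usual formula for the variance
(and hence the standard error) of the OLS estimator,

$$\mathrm{var}(b_2) = \frac{\mathrm{var}(u_t)}{n\,\mathrm{var}(X)} = \frac{\sigma^2}{n\,\mathrm{var}(X)},$$

is a biased estimator of the true variance of $b_2$ in the presence of HET, because no single $\sigma^2$ applies to every observation:

$$\mathrm{var}(b_2) \neq \frac{\sigma_t^2}{n\,\mathrm{var}(X)}.$$

The usual t-test and F-test are invalid.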


The other is that OLS is an inefficient estimation technique. An alternative technique which
gives relatively high weight to the relatively low-variance observations should tend to yield
more accurate estimates.
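Still in the example of a simple regression model, if there is no HET,
then the variance of $b_2$ is

$$\mathrm{var}(b_2) = \sigma^2 \sum k_t^2, \qquad \text{where } \sum k_t^2 = \frac{1}{\sum (X_t - \bar{X})^2}.$$

If there is HET in the disturbance terms $u_t$, suppose the estimator of the
parameter $\beta_2$ is $b_2^*$ and

$$\mathrm{var}(u_t) = \sigma_t^2 = \lambda_t \sigma^2 \qquad (\lambda_t > 0,\ t = 1, 2, \dots, n).$$

Then

$$\mathrm{var}(b_2^*) = \sum k_t^2 \sigma_t^2 = \sigma^2 \sum k_t^2 \lambda_t = \sigma^2 \sum k_t^2 \cdot \frac{\sum k_t^2 \lambda_t}{\sum k_t^2} = \mathrm{var}(b_2)\,\frac{\sum k_t^2 \lambda_t}{\sum k_t^2}.$$

Then if

$$\frac{\sum k_t^2 \lambda_t}{\sum k_t^2} > 1, \qquad \mathrm{var}(b_2^*) > \mathrm{var}(b_2).$$

Therefore, when there is HET, the OLS estimator is not efficient anymore.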

Detection of Heteroscedasticity
(determining whether heteroscedasticity is present)

1. Graphical method
2. G-Q (Goldfeld-Quandt) test
3. Park test and Glejser test
4. White test
[Figure: scatter diagram of manufacturing output (vertical axis) against GDP (horizontal axis).]

In the scatter diagram manufacturing output is plotted against GDP, both measured in U.S.
$ millions, for 30 countries for 1997. (Data are from the UNIDO Yearbook. The sample is
restricted to countries with GDP at least $10 billion and GDP per capita at least $2000.)

The scatter diagram is dominated by the observations for Japan and the USA and it is
difficult to detect any kind of pattern.

[Figure: rescaled scatter diagram of manufacturing output against GDP, excluding Japan and the USA.]

However if those two countries are dropped and the scatter diagram rescaled, a clear
picture of heteroscedasticity emerges.


The reason for the heteroscedasticity is that variations in the size of the manufacturing
sector around the trend relationship increase with the size of GDP.


South Korea and Mexico are both countries with relatively large GDP. The manufacturing
sector is relatively important in South Korea, so its observation is far above the trend line.
The opposite was the case for Mexico, at least in 1997.

Singapore and Greece are another pair of countries with relatively large and small
manufacturing sectors. However, because the GDP of both countries is small, their
variations from the trend relationship are also small.
GOLDFELD-QUANDT TEST FOR HETEROSCEDASTICITY

The disturbance term in a regression model is said to be homoscedastic if it has the same
potential distribution in all observations. If this condition is not satisfied, it is said to be
heteroscedastic, and clearly the possible types of heteroscedasticity are endless.

However, in one particularly common type the standard deviation of the distribution is
proportional to the size of one of the explanatory variables.


This type of heteroscedasticity is illustrated in the diagram above. The standard deviation
of the distribution is proportional to X.


The Goldfeld-Quandt test is a test for this type of heteroscedasticity. The sample is divided
into three ranges containing the 3/8 of the observations with the smallest values of the X
variable, the 3/8 of the observations with the largest values, and 1/4 in the middle.
In the present case with 28 observations, the lower, middle, and upper ranges have 11, 6,
and 11 observations, respectively.


You then fit regression lines to the lower and upper ranges of the observations, as shown.


The regression line for the lower range has been buried under the observations. Here it is,
in red.

[Figure: the two subsample regression lines on the rescaled scatter diagram, with RSS1 = 157,000,000 for the lower range and RSS2 = 13,518,000,000 for the upper range.]

You then compare the residual sum of squares for the two regressions. We will denote
them RSS1 and RSS2 for the lower and upper ranges, respectively.

If the disturbance term is homoscedastic, there should be no systematic difference
between RSS1 and RSS2.

However, if the standard deviation of the distribution of the disturbance term is
proportional to the X variable, RSS2 is likely to be greater than RSS1.

$$F(n_2, n_1) = \frac{RSS_2 / n_2}{RSS_1 / n_1} = \frac{13{,}518{,}000{,}000 / 9}{157{,}000{,}000 / 9} = 86.1, \qquad F(9, 9)_{\text{crit},\,0.1\%} = 10.1$$

If it is greater, the question is whether it is significantly greater. The test statistic is the F
statistic shown above. n1 and n2 are the numbers of degrees of freedom in the lower and
upper regressions. (Normally n1 and n2 will be the same.)

In the present case we reject the null hypothesis of homoscedasticity at the 0.1% level. We
therefore need to find an alternative to straightforward OLS regression.
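
The steps above (sort by X, fit separate regressions to the lower and upper 3/8 of the sample, compare the residual sums of squares) can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names `simple_ols_rss` and `goldfeld_quandt` are hypothetical and not part of the original slides, and the critical value still has to be looked up in F tables.

```python
def simple_ols_rss(x, y):
    """Fit y = b1 + b2*x by OLS and return the residual sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, frac=3 / 8):
    """Return (F, df): upper-range RSS over lower-range RSS, each divided by its df."""
    pairs = sorted(zip(x, y))                 # sort the observations by X
    m = int(frac * len(pairs) + 0.5)          # size of each outer subsample (11 when n = 28)
    lower, upper = pairs[:m], pairs[-m:]
    rss1 = simple_ols_rss(*zip(*lower))
    rss2 = simple_ols_rss(*zip(*upper))
    df = m - 2                                # two parameters in each subsample regression
    return (rss2 / df) / (rss1 / df), df


# The statistic on the slide, recomputed from the reported RSS values.
f_slide = (13_518_000_000 / 9) / (157_000_000 / 9)
print(round(f_slide, 1))  # 86.1
```

With 28 observations the function uses 11 observations in each outer range, which leaves 9 degrees of freedom per regression, matching the F(9, 9) comparison above.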


Incidentally, why was the sample split into three ranges? Why not split it into two halves,
and compare RSS for the regressions using the two halves?


The reason is that, by omitting the central range, you increase the contrast between the
variances of the residuals, and you have a better chance of rejecting the null hypothesis of
homoscedasticity.

However, the larger the omitted central section, the smaller will be the number of degrees
of freedom in the subsample regressions, and this will make it more difficult to reject the
null hypothesis.

Thus there is a trade-off between making the omitted range too large and too small. On the
basis of experimentation, Goldfeld and Quandt recommend omitting about a quarter of the
observations.

PARK TEST AND GLEJSER TEST FOR HETEROSCEDASTICITY

Park test: Park suggests that $\sigma_i^2$ is some function of the explanatory variable $X_{ji}$.
Since $\sigma_i^2$ is generally not known, the squared OLS residuals $e_i^2$ are used in its place:

$$e_i^2 = f(X_{ji}) + \varepsilon_i, \qquad \text{with Park's form } e_i^2 = \sigma^2 X_{ji}^{\alpha} e^{\varepsilon_i},$$

$$\ln e_i^2 = \ln \sigma^2 + \alpha \ln X_{ji} + \varepsilon_i.$$

If $\alpha$ turns out to be statistically significant, it would suggest that HET is present in the data.

Glejser test: The Glejser test is similar in spirit to the Park test. Glejser suggests regressing the absolute values of
$e_i$ on the X variable that is thought to be closely associated with $\sigma_i^2$:

$$|e_i| = f(X_{ji}) + \varepsilon_i$$

$$|e_i| = \beta_1 + \beta_2 X_i + \varepsilon_i$$

$$|e_i| = \beta_1 + \beta_2 \sqrt{X_i} + \varepsilon_i$$

$$|e_i| = \beta_1 + \beta_2 \frac{1}{X_i} + \varepsilon_i$$
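
As a minimal sketch of the Park test, the logarithmic regression can be run on the OLS residuals as follows. It assumes a single explanatory variable with strictly positive values and no residual exactly equal to zero; the helper names `ols` and `park_test` are hypothetical, chosen for illustration only.

```python
import math


def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope, slope standard error, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    return intercept, slope, math.sqrt(s2 / sxx), resid


def park_test(x, y):
    """Return (alpha_hat, t statistic) from regressing ln(e^2) on ln(X)."""
    *_, resid = ols(x, y)                       # residuals of the original regression
    log_e2 = [math.log(e * e) for e in resid]   # ln e_i^2
    log_x = [math.log(xi) for xi in x]          # ln X_i
    _, alpha, se_alpha, _ = ols(log_x, log_e2)
    return alpha, alpha / se_alpha
```

The t statistic on the estimate of alpha is then compared with the usual critical values. A Glejser-style variant would replace `log_e2` with the absolute residuals and `log_x` with the chosen function of X.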
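WHITE TEST FOR HETEROSCEDASTICITY
(the most commonly used test)

Unlike the G-Q test, which requires reordering the observations with respect to the X variable that
supposedly caused HET, the White test detects HET of a general form and is easy to implement.
For example, for the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i,$$

run the auxiliary regression of the squared residuals

$$e_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + \varepsilon_i.$$

Under the null that there is no HET,

$$n \cdot R^2 \sim \chi^2_{df},$$

where $R^2$ comes from the auxiliary regression and $df$ is the number of its regressors excluding the constant (5 here).
If the observed chi-square value exceeds the critical chi-square
value at the chosen level of significance, then there is HET.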
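
The White statistic for the two-regressor example above can be computed with a small amount of Python. This is a sketch under simplifying assumptions: it solves the normal equations directly (adequate for small, well-scaled data, not numerically robust in general), and the names `ols_fit` and `white_statistic` are hypothetical.

```python
def ols_fit(columns, y):
    """OLS of y on a constant plus the given columns; returns (coefficients, residuals, R^2)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations (X'X | X'y), solved by Gauss-Jordan elimination.
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))   # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    b = [a[p][k] / a[p][p] for p in range(k)]
    resid = [yi - sum(bj * rj for bj, rj in zip(b, r)) for r, yi in zip(rows, y)]
    y_bar = sum(y) / n
    r2 = 1 - sum(e * e for e in resid) / sum((yi - y_bar) ** 2 for yi in y)
    return b, resid, r2


def white_statistic(x2, x3, y):
    """Return (n * R^2, degrees of freedom) for the auxiliary regression."""
    _, e, _ = ols_fit([x2, x3], y)
    e2 = [v * v for v in e]                       # squared residuals
    aux = [x2, x3,
           [v * v for v in x2], [v * v for v in x3],
           [u * v for u, v in zip(x2, x3)]]       # levels, squares, cross product
    _, _, r2 = ols_fit(aux, e2)
    return len(y) * r2, len(aux)
```

The returned statistic is compared with the chi-square critical value for 5 degrees of freedom, which is 11.07 at the 5% level.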
REMEDIAL METHODS: WEIGHTED AND LOGARITHMIC REGRESSIONS

$$Y = \beta_1 + \beta_2 X + u, \qquad \text{population variance of } u_i = \sigma_i^2$$

This sequence presents two methods for dealing with the problem of heteroscedasticity.
We will start with the general case, where the variance of the distribution of the disturbance
term in observation $i$ is $\sigma_i^2$.
$$\frac{Y_i}{\sigma_i} = \beta_1 \frac{1}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i}$$

If we knew $\sigma_i$ in each observation, we could derive a homoscedastic model by dividing the
equation through by it.

$$\text{population variance of } \frac{u_i}{\sigma_i} = \frac{1}{\sigma_i^2} \times \text{population variance of } u_i = \frac{\sigma_i^2}{\sigma_i^2} = 1$$

The population variance of the disturbance term in the revised model is now equal to 1 in all
observations, and so the disturbance term is homoscedastic.

$$Y' = \beta_1 H + \beta_2 X' + u', \qquad Y' = \frac{Y_i}{\sigma_i},\quad H = \frac{1}{\sigma_i},\quad X' = \frac{X_i}{\sigma_i},\quad u' = \frac{u_i}{\sigma_i}$$

In the revised model, we regress Y' on X' and H, as defined. Note that there is no intercept
in the revised model. $\beta_1$ becomes the slope coefficient of the artificial variable $1/\sigma_i$.
The revised model is described as a weighted regression model because we are
weighting observation $i$ by a factor $1/\sigma_i$. Note that we are automatically giving the highest
weights to the most reliable observations (those with the lowest values of $\sigma_i$).
$$\sigma_i = \lambda Z_i$$

Of course, in practice we do not know the value of $\sigma_i$ in each observation. However, it may
be reasonable to suppose that it is proportional to some measurable variable, $Z_i$.

$$\frac{Y_i}{Z_i} = \beta_1 \frac{1}{Z_i} + \beta_2 \frac{X_i}{Z_i} + \frac{u_i}{Z_i}$$

If this is the case, we can make the model homoscedastic by dividing through by Zi.

$$\text{population variance of } \frac{u_i}{Z_i} = \frac{1}{Z_i^2}\,\sigma_i^2 = \frac{\sigma_i^2}{\sigma_i^2 / \lambda^2} = \lambda^2$$

$$Y' = \beta_1 H + \beta_2 X' + u', \qquad Y' = \frac{Y_i}{Z_i},\quad H = \frac{1}{Z_i},\quad X' = \frac{X_i}{Z_i},\quad u' = \frac{u_i}{Z_i}$$

The disturbance term in the revised model has constant variance $\lambda^2$. We do not need to
know the value of $\lambda^2$. The crucial point is that, by assumption, it is constant.
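
The scaling procedure can be sketched as follows: divide every variable by $Z_i$ and run the no-intercept regression of Y' on H and X'. This is an illustrative implementation using only the standard library; the function name `wls_by_scaling` is hypothetical, and the two-equation normal system is solved by Cramer's rule.

```python
def wls_by_scaling(x, y, z):
    """Estimate beta1, beta2 in Y = beta1 + beta2*X + u when sd(u_i) is proportional to z_i."""
    yp = [yi / zi for yi, zi in zip(y, z)]   # Y' = Y / Z
    h = [1.0 / zi for zi in z]               # H  = 1 / Z
    xp = [xi / zi for xi, zi in zip(x, z)]   # X' = X / Z
    # Normal equations for the no-intercept regression Y' = beta1*H + beta2*X' + u'.
    shh = sum(v * v for v in h)
    sxx = sum(v * v for v in xp)
    shx = sum(u * v for u, v in zip(h, xp))
    shy = sum(u * v for u, v in zip(h, yp))
    sxy = sum(u * v for u, v in zip(xp, yp))
    det = shh * sxx - shx * shx
    beta1 = (shy * sxx - sxy * shx) / det
    beta2 = (sxy * shh - shy * shx) / det
    return beta1, beta2


# Passing z = x scales by the explanatory variable itself, so X' is identically 1
# and beta2 is estimated as the intercept of the scaled regression.
print(wls_by_scaling([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0]))
```

Because the data in the usage line lie exactly on Y = 1 + 2X, the estimates recover 1 and 2 up to floating-point error regardless of the weights chosen.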

We will illustrate this procedure with the UNIDO data on manufacturing output and GDP.
We will try scaling by population. A regression of manufacturing output per capita on GDP
per capita is less likely to be subject to heteroscedasticity.

[Figure: scatter diagram of manufacturing output per capita against GDP per capita.]

Here is the revised scatter diagram. Does it look homoscedastic? Actually, no. This is still
a classic pattern of heteroscedasticity.

[Figure: the same scatter diagram with the two subsample regression lines, RSS1 = 5,378,000 and RSS2 = 17,362,000.]

RSS2 is much larger than RSS1.

$$F(n_2, n_1) = \frac{RSS_2 / n_2}{RSS_1 / n_1} = \frac{17{,}362{,}000 / 9}{5{,}378{,}000 / 9} = 3.23, \qquad F(9, 9)_{\text{crit},\,5\%} = 3.18$$

However, the subsamples are small and high ratios can occur on a pure chance basis. The
null hypothesis of homoscedasticity is only just rejected at the 5% level.

$$\sigma_i = \lambda X_i$$

Often the X variable itself is a suitable scaling variable. After all, the Goldfeld-Quandt test
assumes that the standard deviation of the disturbance term is proportional to it.

$$\frac{Y_i}{X_i} = \beta_1 \frac{1}{X_i} + \beta_2 + \frac{u_i}{X_i}$$

Note that when we scale through by it, the $\beta_2$ term becomes the intercept in the revised
model.

$$\text{population variance of } \frac{u_i}{X_i} = \frac{1}{X_i^2}\,\sigma_i^2 = \frac{\sigma_i^2}{\sigma_i^2 / \lambda^2} = \lambda^2$$

$$Y' = \beta_1 H + \beta_2 + u', \qquad Y' = \frac{Y_i}{X_i},\quad H = \frac{1}{X_i},\quad u' = \frac{u_i}{X_i}$$

It follows that when we interpret the regression results, the slope coefficient is an estimate
of $\beta_1$ in the original model and the intercept is an estimate of $\beta_2$.
[Figure: scatter diagram of Manufacturing/GDP against 1/GDP × 1,000,000.]

Here is the corresponding scatter diagram. Is there any evidence of heteroscedasticity?

[Figure: the same scatter diagram with the two subsample regression lines, RSS1 = 0.065 and RSS2 = 0.070.]

No longer. The residual sums of squares for the two subsamples are almost identical,
indeed closer than one would usually expect on a pure chance basis under the null
hypothesis.

$$F(n_2, n_1) = \frac{RSS_2 / n_2}{RSS_1 / n_1} = \frac{0.070 / 9}{0.065 / 9} = 1.08, \qquad F(9, 9)_{\text{crit},\,5\%} = 3.18$$

As a consequence, the F statistic is not significant. The heteroscedasticity has been
eliminated.


We will now consider an alternative approach to the problem. It is possible that the
heteroscedasticity has been caused by an inappropriate mathematical specification.
Suppose, in particular, that the true relationship is in fact logarithmic.

[Figure: scatter diagram of log Manufacturing against log GDP.]

Here is the corresponding scatter diagram. No sign of heteroscedasticity.

[Figure: the same scatter diagram with the two subsample regression lines, RSS1 = 2.140 and RSS2 = 1.037.]

We confirm this with the Goldfeld-Quandt test. In this case there is no point in calculating
the conventional test statistic. RSS2 is smaller than RSS1, so it cannot be significantly
greater than RSS1.

$$F(n_1, n_2) = \frac{RSS_1 / n_1}{RSS_2 / n_2} = \frac{2.140 / 9}{1.037 / 9} = 2.06, \qquad F(9, 9)_{\text{crit},\,5\%} = 3.18$$

The null hypothesis of homoscedasticity is not rejected.

$$\log Y = \beta_1 + \beta_2 \log X + u$$

$$Y = e^{\beta_1} X^{\beta_2} e^{u}$$

Now an additive disturbance term in the logarithmic model is equivalent to a multiplicative
one in the original model.

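Regression results (standard errors in parentheses):

$$\widehat{MANU} = \underset{(5700)}{604} + \underset{(0.013)}{0.194}\,GDP \qquad R^2 = 0.89$$

$$\frac{\widehat{MANU}}{POP} = \underset{(1371)}{612}\,\frac{1}{POP} + \underset{(0.016)}{0.182}\,\frac{GDP}{POP} \qquad R^2 = 0.70$$

$$\frac{\widehat{MANU}}{GDP} = \underset{(0.019)}{0.189} + \underset{(841)}{533}\,\frac{1}{GDP} \qquad R^2 = 0.02$$

$$\log \widehat{MANU} = \underset{(0.785)}{-1.694} + \underset{(0.066)}{0.999}\,\log GDP \qquad R^2 = 0.90$$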

Here is a summary of the regressions using the four alternative specifications of the model.

The first regression suggests that, for every increase of $1 million in GDP, manufacturing
output increases by $194,000. Thus, at the margin, manufacturing accounts for 0.19 of
GDP. The intercept does not have any plausible meaning.


However, this regression was subject to severe heteroscedasticity. Although the estimate
of the coefficient of GDP is unbiased, it is likely to be relatively inaccurate. Also, and this is
a separate effect of heteroscedasticity, the standard errors, t tests and F test are invalid.

In the second regression, the estimate of the slope coefficient was a little lower. However
for this regression also the null hypothesis of homoscedasticity was rejected, but only at
the 5% level.

In the third regression the model was scaled through by GDP. As a consequence, the
intercept became an estimator of the original slope coefficient, and vice versa.

For this model the null hypothesis of homoscedasticity was not rejected. In principle,
therefore, it should yield more accurate estimates of the coefficients than the first two, and
we are able to perform tests.

For the logarithmic model also the null hypothesis of homoscedasticity was not rejected.
So we have two models which survive the Goldfeld-Quandt test. Which do you prefer?
Think about it.

You probably went for the logarithmic model, attracted by the high R2. However, in this
example, there is little to choose between the third and fourth models. Substantively, they
have the same interpretation.
In the third model, 1/GDP has a low t statistic and appears to be an irrelevant variable. The
model is telling us that manufacturing output, as a proportion of GDP, is constant. Because
it is constant, R2 is effectively 0.
The fourth model is telling us that the elasticity of manufacturing output with respect to
GDP is equal to 1. In other words, manufacturing output increases proportionally with GDP
and remains a constant proportion of it.
$$\widehat{MANU} = e^{-1.694}\,GDP^{0.999} = 0.184\,GDP^{0.999}$$
Converting the logarithmic equation back into natural units, you obtain the equation shown.
Like the third equation, it implies that manufacturing output accounts for a little over 0.18
of GDP, at the margin.
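
The conversion back into natural units is a one-line calculation, sketched here in Python. The coefficients are those reported for the logarithmic regression; the function name `predicted_manu` is hypothetical.

```python
import math

intercept, elasticity = -1.694, 0.999   # coefficients of the logarithmic regression

scale = math.exp(intercept)             # e^(-1.694)
print(round(scale, 3))                  # 0.184


def predicted_manu(gdp):
    """Fitted manufacturing output in natural units: scale * GDP^elasticity."""
    return scale * gdp ** elasticity
```

Since the elasticity is almost exactly 1, the fitted relationship is close to a straight line through the origin with slope 0.184.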

REMEDIAL METHODS: White heteroscedasticity-consistent covariance matrix estimator

Consider the model with a single independent variable,

$$Y_i = \beta_0 + \beta_1 X_i + u_i.$$

If the errors contain heteroscedasticity, then

$$\mathrm{Var}(u_i) = \sigma_i^2.$$

White (1980) provided a way to estimate a valid variance of $\hat\beta_1$:

$$\widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2\, \hat{u}_i^2}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2}$$

Briefly, it can be shown that when the above equation is multiplied by the sample size n, it converges
in probability to $E[(X_i - \mu_x)^2 u_i^2] / (\sigma_x^2)^2$, which is also the probability limit of n times the true variance of $\hat\beta_1$. The law of
large numbers and the central limit theorem play key roles in establishing these convergences. A similar
formula works in the general multiple regression model.
The square root of this variance estimate is called the White heteroscedasticity-consistent
standard error, or heteroscedasticity-robust standard error, for $\hat\beta_1$.
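
For the single-regressor case, the robust standard error can be computed directly from the formula. The sketch below also returns the conventional OLS standard error for comparison; the function name `white_robust_se` is hypothetical, and no small-sample correction is applied.

```python
import math


def white_robust_se(x, y):
    """Return (slope, conventional s.e., heteroscedasticity-robust s.e.) for simple OLS."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]                    # X_i minus the sample mean
    sst_x = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    u_hat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional formula: s^2 / sum of squared deviations of X.
    conventional = math.sqrt(sum(u * u for u in u_hat) / (n - 2) / sst_x)
    # White formula: sum of (X_i - mean)^2 * u_hat_i^2 over the squared sum of squares.
    robust = math.sqrt(sum(d * d * u * u for d, u in zip(dx, u_hat)) / sst_x ** 2)
    return b1, conventional, robust
```

The two standard errors coincide only by accident; under heteroscedasticity the robust one is the valid basis for t tests in large samples.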
Problem
Fit an earnings function using your data set, taking
EARNINGS as the dependent variable and S,
ASVABC, and MALE as the explanatory variables,
and perform a Goldfeld-Quandt test for HET in
the S dimension. (Remember to sort the
observations by S first.)
