
Notes on Multicollinearity and Heteroskedasticity

We begin with the basic multiple linear regression (MLR) model and its assumptions.

1. The Population Model


$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$
where,
$y$ = Dependent variable; observable random variable (r.v.)
$x_i$, $i = 1, \dots, k$ = Explanatory / independent variables; observable r.v.s
$u$ = Disturbance / error term; unobservable r.v.
$\alpha$, $\beta_i$, $i = 1, \dots, k$ = Unobservable parameters / constants
2. The Assumptions

2.1 The model is linear in parameters


$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$

2.2 Random Sampling


$(x_{1i}, x_{2i}, \dots, x_{ki}; y_i)$, $i = 1, \dots, n$ is a random sample from the population, such that
$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i, \quad i = 1, 2, \dots, n$$
Note: From the definition of a random sample it follows that the $u_i$ are independently and identically distributed (iid) r.v.s, so that the correlation between $u_i$ and $u_j$ is zero for any $i \neq j$.

2.3 There is no Perfect Collinearity

This means: (i) none of the $x_i$ is a constant; and (ii) there is no exact linear relationship among the explanatory variables.

2.4 Zero Conditional Mean


Eu|x 1, x 2 , . . . , x k   0
and for the random sample
Eu i |x 1i, x 2i , . . . , x ki   0

The zero conditional mean assumption


implies :
(i) Ey|x     1 x 1   2 x 2 . . . .  k x k
and Ey i |x i      1 x 1i   2 x 2i . . . .  k x ki

(ii) That u and x are uncorrelated .


In fact Eu|x 1, x 2 , . . . , x k   Eu and every
function (and not just linear functions) of x is
uncorrelated with u

3
2.5 Homoskedasticity
This is the assumption that the disturbance variance is constant:
$$\mathrm{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$$
and
$$\mathrm{Var}(u_i \mid x_{1i}, x_{2i}, \dots, x_{ki}) = \sigma^2$$

2.6 The classical assumption regarding the distribution of the error term is that the $u_i$ are independent of the $x_i$ and normally distributed, i.e.:
$$u_i \sim N(0, \sigma^2)$$
and
$$y_i \sim N(\alpha + \beta_1 x_{1i} + \dots + \beta_k x_{ki},\ \sigma^2).$$

We use the MLR model to estimate the unknown parameters $\alpha, \beta_i$, using the estimators $\hat\alpha, \hat\beta_i$.

3. Some Basic Concepts

We will use the 2-variable model to discuss some basic concepts related to estimators: their properties, the link between the OLS assumptions and the properties of the estimators, and the variances of the estimators.

3.1 The OLS Estimators


The 2-variable simple linear model is
$$y = \alpha + \beta x + u$$
or $E(y \mid x) = \alpha + \beta x$.
A random sample of observations on $x$ and $y$ is used to derive the ordinary least squares (OLS) estimators for $\alpha$ and $\beta$.
In the 2-variable model, the OLS estimators of the unknown parameters $\alpha$ and $\beta$ are:
$$\hat\beta = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} \qquad \text{and} \qquad \hat\alpha = \bar y - \hat\beta \bar x$$
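As an illustration, here is a minimal numpy sketch that computes $\hat\alpha$ and $\hat\beta$ directly from these formulas. The data, the parameter values ($\alpha = 2$, $\beta = 0.5$) and the sample size are made-up assumptions for the example.

```python
import numpy as np

# Simulated data for illustration only: y = 2 + 0.5*x + u, u ~ N(0, 1)
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)

# OLS estimators from the formulas above
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print(alpha_hat, beta_hat)   # estimates should be close to 2 and 0.5
```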

3.2 Statistical Properties of the OLS Estimators
As we saw in our Statistics classes, estimators are random variables, so $\hat\alpha$ and $\hat\beta$ are random variables.

Since the errors are normally distributed, so is $y$, and so are $\hat\alpha$ and $\hat\beta$ (being linear functions of $y$).

According to the Gauss-Markov theorem, the OLS estimators (i.e., $\hat\alpha$ and $\hat\beta$) are Best Linear Unbiased Estimators (B.L.U.E.) of $\alpha$ and $\beta$.

i.e., $\hat\beta \sim N(\beta, \mathrm{Var}(\hat\beta))$; since $\hat\beta$ is unbiased, $E(\hat\beta) = \beta$.
Similarly, $\hat\alpha \sim N(\alpha, \mathrm{Var}(\hat\alpha))$ and $E(\hat\alpha) = \alpha$.

$\hat\alpha$ and $\hat\beta$ are 'best', i.e., in the class of all linear and unbiased estimators of $\alpha$ and $\beta$, these have the least variance, or are the most 'efficient'.

3.3 The OLS Residuals


The method of least squares minimizes the sum of squared residuals:
$$\sum_i \hat u_i^2 = \sum_i (y_i - \hat\alpha - \hat\beta x_i)^2$$

The $\hat u_i$ are the least squares residuals, which are observable, and:
$$\hat u_i = y_i - \hat y_i$$
or $\hat u_i = y_i - \hat\alpha - \hat\beta x_i$.

The residuals $\hat u_i$ are used as estimators of the errors or disturbances ($u_i$), which are unobservable.

3.4 Variance of Estimators in OLS Model


It follows from assumptions 2.1 to 2.6 that in the 2-variable model:
$$\hat\beta \sim N(\beta, \mathrm{Var}(\hat\beta)), \quad \text{where} \quad \mathrm{Var}(\hat\beta) = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}.$$

Similarly, $\hat\alpha \sim N(\alpha, \mathrm{Var}(\hat\alpha))$, where
$$\mathrm{Var}(\hat\alpha) = \frac{\sigma^2 \sum_i x_i^2}{n \sum_i (x_i - \bar x)^2}.$$

 
Both $\mathrm{Var}(\hat\alpha)$ and $\mathrm{Var}(\hat\beta)$ are functions of the error variance $\sigma^2$, which is unknown.
To estimate $\mathrm{Var}(\hat\alpha)$ and $\mathrm{Var}(\hat\beta)$ we need an estimate of $\sigma^2$. How do we get that?

Estimation of the Disturbance Variance ($\sigma^2$)

We use the LS residuals $\hat u_i$ as a proxy for the unobserved errors $u_i$.
So the residual variance seems a natural estimator for the disturbance variance, $\mathrm{Var}(u) = \sigma^2$, i.e.,
$$\widetilde{\mathrm{Var}}(\hat u) = \frac{1}{n} \sum_i \hat u_i^2$$
($\hat u$ is also a r.v. since it is a function of the r.v.s $y$ and $\hat y$.)

However, $\widetilde{\mathrm{Var}}(\hat u)$ is not an unbiased estimator of $\sigma^2$. We can show that:
$$E\left(\frac{\sum_i \hat u_i^2}{n}\right) \neq \sigma^2, \qquad E\left(\sum_i \hat u_i^2\right) = (n-2)\,\sigma^2$$

So we use
$$\hat\sigma^2 = \frac{\sum_i \hat u_i^2}{n-2}$$
as an estimator of $\sigma^2$ in the 2-variable model. $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$, since
$$E(\hat\sigma^2) = \sigma^2, \quad \text{i.e.,} \quad E\left(\frac{\sum_i \hat u_i^2}{n-2}\right) = \sigma^2.$$

In the multiple regression model with $k$ explanatory variables,
$$\hat\sigma^2 = \frac{\sum_i \hat u_i^2}{n - k - 1}$$

Note: $\hat\sigma$ is not an unbiased estimator of $\sigma$, but it is consistent, i.e., $\operatorname{plim}_{n \to \infty} \hat\sigma = \sigma$.
We replace $\sigma^2$ by $\hat\sigma^2$ in the expressions for $\mathrm{Var}(\hat\alpha)$ and $\mathrm{Var}(\hat\beta)$ to get the estimated variances of $\hat\alpha$ and $\hat\beta$, i.e.,
$$\widehat{\mathrm{Var}}(\hat\beta) = \frac{\hat\sigma^2}{\sum_i (x_i - \bar x)^2}$$
and
$$\widehat{\mathrm{Var}}(\hat\alpha) = \frac{\hat\sigma^2 \sum_i x_i^2}{n \sum_i (x_i - \bar x)^2}$$

The positive square roots of these estimated variances give us the standard errors of the estimators $\hat\alpha$ and $\hat\beta$, i.e.,
$$\sqrt{\widehat{\mathrm{Var}}(\hat\beta)} = \text{std. error of } \hat\beta; \quad \text{and} \quad \sqrt{\widehat{\mathrm{Var}}(\hat\alpha)} = \text{std. error of } \hat\alpha.$$
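Continuing the numpy illustration from Section 3.1 (the simulated data and parameter values are assumptions made for the example), the estimated disturbance variance and the standard errors can be computed as:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)      # made-up DGP: alpha=2, beta=0.5, sigma=1

# OLS estimates
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# Residuals and the unbiased estimator of sigma^2 (n - 2 degrees of freedom)
u_hat = y - alpha_hat - beta_hat * x
sigma2_hat = np.sum(u_hat ** 2) / (n - 2)

# Estimated variances and standard errors of beta_hat and alpha_hat
var_beta_hat = sigma2_hat / np.sum((x - x.mean()) ** 2)
var_alpha_hat = sigma2_hat * np.sum(x ** 2) / (n * np.sum((x - x.mean()) ** 2))
print(sigma2_hat, np.sqrt(var_beta_hat), np.sqrt(var_alpha_hat))
```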

3.5 Assumptions & Properties of OLS Estimators

According to the Gauss-Markov theorem, the OLS estimators (i.e., $\hat\alpha$ and $\hat\beta$) are Best Linear Unbiased Estimators (B.L.U.E.) of $\alpha$ and $\beta$.

Assumptions 2.1 to 2.4 above are required for obtaining the estimators and for proving unbiasedness of $\hat\alpha$ and $\hat\beta$. If Assumption 2.4 (zero conditional mean) is violated and there is correlation between the errors and the x-variables, the estimators will be biased.

Assumption 2.5 (homoskedasticity) is required to prove efficiency of $\hat\alpha$ and $\hat\beta$.

Assumption 2.6 (normally distributed errors) is NOT required to prove that $\hat\alpha$ and $\hat\beta$ are B.L.U.E. This assumption is required for carrying out hypothesis tests.

4. Multicollinearity

4.1 What is Multicollinearity ?

Multicollinearity refers to the presence of high linear correlation between the explanatory variables. It is part of the data generating process (DGP) and is very common in business / social sciences data.

4.2 How does it affect properties of OLS estimators ?


OLS estimators are BLUE in the presence of multicollinearity (but perfect multicollinearity is ruled out by Assumption 2.3 above).

Why do we rule out perfect multicollinearity ?

In our discussion of the dummy variable trap we saw the OLS model in matrix form and the X-matrix. In the presence of perfect multicollinearity the columns of the X-matrix would become linearly dependent (as we saw in the case of the dummy variable trap), so the inverse of $X'X$ would not exist and we would not be able to compute the OLS estimates of the unknown parameters $\alpha$ and $\beta_j$.

4.3 What is the problem due to multicollinearity?

To understand why multicollinearity is a problem, look at the variance of the estimators in the MLR model:
$$\mathrm{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_{x_j}\,(1 - R_j^2)}, \quad \text{where}$$
$\mathrm{Var}(\hat\beta_j)$ = variance of $\hat\beta_j$, the estimated coefficient of $x_j$, the $j$th explanatory variable
$SST_{x_j}$ = total sample variation in $x_j$
$R_j^2$ = the $R^2$ from regressing $x_j$ on all the other explanatory variables

High correlation between explanatory variables means $R_j^2$ is very high (close to 1) and $(1 - R_j^2)$ is very low, so that $\mathrm{Var}(\hat\beta_j)$ tends to be high, for given values of $SST_{x_j}$ and $\sigma^2$.

High $\mathrm{Var}(\hat\beta_j)$ leads to high standard errors and low values of t-statistics (remember, the t-statistic of $\hat\beta_j$ is $\dfrac{\hat\beta_j}{se(\hat\beta_j)}$).

In the presence of multicollinearity, variables can be statistically insignificant (very low t-statistics for the individual variables), even though the F-statistic for overall significance is significant.
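A small simulation can make this concrete. In this hypothetical sketch (all numbers are made up), two regressors are generated to be almost perfectly correlated; the individual slope t-statistics then tend to be small even though the F-statistic for overall significance is very large:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 100, 2
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.02, size=n)       # x2 is almost identical to x1
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k - 1)

# Usual OLS covariance matrix and t-statistics
cov = sigma2_hat * np.linalg.inv(X.T @ X)
t_stats = beta_hat / np.sqrt(np.diag(cov))

# F-statistic for overall significance
r2 = 1 - (u_hat @ u_hat) / np.sum((y - y.mean()) ** 2)
F = (r2 / k) / ((1 - r2) / (n - k - 1))
print(t_stats[1:], F, stats.f.sf(F, k, n - k - 1))   # small slope t-stats, large F
```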

4.4 How do we detect Multicollinearity ?

We can calculate the Variance Inflation Factor, or VIF, for each explanatory variable:
$$\mathrm{VIF}(x_j) = \frac{1}{1 - R_j^2}$$

The general cutoff value for VIF is 10 (when $R_j^2$ is close to 1, say 0.9, indicating high correlation between $x_j$ and the other explanatory variables). The VIF indicates to what extent the variance (and hence the standard error) of $\hat\beta_j$ is inflated due to the correlation of $x_j$ with the other $x_i$, $i \neq j$.
But there is no exact measure or cutoff for what is 'too high'. In practice, therefore, the VIF has limited relevance.
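A sketch of the VIF calculation on simulated data, using only numpy (statsmodels provides a ready-made variance_inflation_factor function that computes the same quantity):

```python
import numpy as np

def vif(X):
    # VIF for each column of X, from the auxiliary R_j^2 of regressing
    # column j on the remaining columns (plus an intercept)
    n, k = X.shape
    out = []
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2_j = 1 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

# Illustration with two highly correlated (simulated) regressors
rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=500)
print(vif(np.column_stack([x1, x2])))        # both VIFs near the usual cutoff of 10
```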

4.5 What can we do about Multicollinearity ?

We try to have large data sets to ensure high variation in the x-variables. Look at the components of $\mathrm{Var}(\hat\beta_j)$: if the total variation in $x_j$ is high, $SST_{x_j}$ will be high, and this will reduce $\mathrm{Var}(\hat\beta_j)$ and lower the standard errors, for given values of $\sigma^2$ and $R_j^2$. Do you see that when data sets are small, $SST_{x_j}$ is also lower?

This creates a problem of 'micronumerosity'! Small data sets have low values of $SST_{x_j}$, which can lead to high $\mathrm{Var}(\hat\beta_j)$, high standard errors and low t-statistics, even if there is no multicollinearity.

Taking logs of variables may also help in the presence of multicollinearity.

There is no simple rule of thumb to tackle this problem.

Be careful about dropping explanatory variables to avoid multicollinearity. This can create a problem of omitted variable bias.

Remember, the estimators are BLUE in the presence of multicollinearity, but dropping a relevant variable can lead to biased estimators.

Readings: Section on Multicollinearity in Chapter 3 of Wooldridge (2012).

5. Heteroskedasticity

5.1 What is Heteroskedasticity ?

It is a violation of Assumption 2.5 above and is best understood in contrast with homoskedasticity, the assumption that the disturbance variance is constant.

Under homoskedasticity:
$$\mathrm{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$$
Since $E(u \mid x_1, x_2, \dots, x_k) = 0$, we can write
$$E(u^2 \mid x_1, x_2, \dots, x_k) = \sigma^2$$

In contrast, under heteroskedasticity:
$$\mathrm{Var}(u_i \mid x_1, x_2, \dots, x_k) = \sigma_i^2, \quad i = 1, \dots, n$$
Since $E(u_i \mid x_1, x_2, \dots, x_k) = 0$, we can write
$$E(u_i^2 \mid x_1, x_2, \dots, x_k) = \sigma_i^2$$

Heteroskedasticity essentially means the error variances may not be constant; rather, they may be functions of the explanatory variables or x-variables.

5.2 What problem does Heteroskedasticity cause ?

(i) When there is a relation between the error variance and the x-variables, the assumption of homoskedasticity is violated, so the OLS estimators are not efficient, not BLUE any more.

(ii) As you saw above, the error variance ($\sigma^2$) affects the variance of the estimators ($\mathrm{Var}(\hat\beta_j)$). Hence the usual standard errors of the estimators are affected, and the results of the usual hypothesis tests are no longer valid.

(iii) But the estimators are still unbiased and consistent; this follows as long as Assumption 2.4 is valid, i.e., there is zero correlation between the errors and the x-variables.

5.3 What to do about Heteroskedasticity ?

First we discuss what to do when we do not know the form of the heteroskedasticity. (When the form of the heteroskedasticity is known, we use WLS, as discussed in Section 5.5 below.)

Suppose we suspect
$$E(u_i^2 \mid x_1, x_2, \dots, x_k) = \sigma_i^2 = f(x_1, x_2, \dots, x_k)$$
but we do not know the exact functional form of $f(x_1, x_2, \dots, x_k)$.
In this case we can use heteroskedasticity-robust standard errors.

How to estimate robust standard errors ?

Recall that in the 2-variable model, under homoskedasticity:
$$\mathrm{Var}(\hat\beta) = \frac{\sum_i (x_i - \bar x)^2 \,\sigma^2}{\left[\sum_i (x_i - \bar x)^2\right]^2} = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}$$

Under heteroskedasticity:
$$\mathrm{Var}(\hat\beta) = \frac{\sum_i (x_i - \bar x)^2 \,\sigma_i^2}{\left[\sum_i (x_i - \bar x)^2\right]^2}$$

To estimate $\mathrm{Var}(\hat\beta)$ in the presence of heteroskedasticity of any form, replace the error variance $\sigma_i^2$ above by its estimator, the squared residual $\hat u_i^2$ (recall that the residuals $\hat u_i$ also have zero mean).

So robust standard error estimation involves using the following to estimate $\mathrm{Var}(\hat\beta)$:
$$\widehat{\mathrm{Var}}(\hat\beta) = \frac{\sum_i (x_i - \bar x)^2 \,\hat u_i^2}{\left[\sum_i (x_i - \bar x)^2\right]^2}$$

(ii) Robust standard errors can always be used with cross-section data and are valid even under homoskedasticity, when the data set is large.
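A minimal numpy sketch of this robust variance formula for the 2-variable model, on simulated data in which the error variance increases with x (the data and numbers are assumptions made for the illustration; robust covariance options in packages such as statsmodels implement estimators of this same type):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=np.sqrt(x))   # Var(u|x) = x : heteroskedastic

# OLS slope, intercept and residuals
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
u_hat = y - alpha_hat - beta_hat * x

# Usual (homoskedasticity-only) standard error of beta_hat
sigma2_hat = np.sum(u_hat ** 2) / (n - 2)
se_usual = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))

# Heteroskedasticity-robust standard error from the formula above
se_robust = np.sqrt(np.sum((x - x.mean()) ** 2 * u_hat ** 2)
                    / np.sum((x - x.mean()) ** 2) ** 2)
print(se_usual, se_robust)
```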

5.4 Tests for Heteroskedasticity

(i) With heteroskedasticity, the error variance ($\sigma_i^2$) is a function of the x-variables. Therefore, tests check for the presence or absence of a relation between the squared residuals ($\hat u^2$, a proxy for the error variance) and the included explanatory variables, on the basis of an auxiliary regression of the following type:
$$\hat u^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_1^2 + \delta_4 x_2^2 + \delta_5 x_1 x_2 + v$$
After this regression we test:
$$H_0: \delta_1 = \delta_2 = \dots = \delta_5 = 0 \quad \text{against} \quad H_1: \text{at least one } \delta \neq 0$$
You can see that with just 2 explanatory variables in the model, the auxiliary regression has to estimate 6 parameters. So there can be problems with degrees of freedom in models with more than 2 regressors.

(ii) The following test for heteroskedasticity can be used to conserve degrees of freedom:
$$\hat u^2 = \delta_0 + \delta_1 \hat y + \delta_2 \hat y^2 + v$$
where $\hat y$ are the predicted values from the regression of $y$ on the x-variables.
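A sketch of test (ii) on simulated data (the F-statistic below is the standard F test of the two slope coefficients in the auxiliary regression; statsmodels also provides ready-made heteroskedasticity tests):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 400
x = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=x)       # error variance rises with x (made up)

# Step 1: OLS of y on x; keep fitted values and squared residuals
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_fit = X @ b
u2 = (y - y_fit) ** 2

# Step 2: auxiliary regression of u^2 on y_fit and y_fit^2
Z = np.column_stack([np.ones(n), y_fit, y_fit ** 2])
g, *_ = np.linalg.lstsq(Z, u2, rcond=None)
resid = u2 - Z @ g
r2_aux = 1 - resid @ resid / np.sum((u2 - u2.mean()) ** 2)

# Step 3: F test of H0: both slope coefficients in the auxiliary regression are zero
q = 2
F = (r2_aux / q) / ((1 - r2_aux) / (n - q - 1))
print(F, stats.f.sf(F, q, n - q - 1))         # small p-value -> reject homoskedasticity
```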

5.5 Weighted Least Squares (WLS) Estimators

Suppose our regression model has heteroskedasticity and we know the exact functional form relating the error variance to the x-variables. In this case WLS estimators are used, as they correct for the heteroskedasticity.

Recall from our statistics class that for any random variable $y$ and a constant $m$:
$$\mathrm{Var}(my) = m^2 \,\mathrm{Var}(y)$$
so if $\mathrm{Var}(y) = \sigma^2$, then $\mathrm{Var}(my) = m^2 \sigma^2$.

Also remember that for a set of independent random variables (e.g., the error terms are assumed to be independent), the variance of the sum is equal to the sum of the variances: if $\mathrm{Var}(u_i) = \sigma^2$ for all $i$, then
$$\mathrm{Var}\left(\sum_i u_i\right) = n\sigma^2$$

We will use these results as we proceed.

Weighted Least Squares (WLS) - Example 1

Consider the following example. The model is:
$$y_i = \alpha + \beta x_i + u_i \quad (1)$$
where $\mathrm{Var}(u_i \mid x_i) = \sigma^2 f(x) = \sigma^2 x_i$.

In this case we use WLS and transform our model as follows, where each term is weighted by $\dfrac{1}{\sqrt{x_i}}$:
$$\frac{y_i}{\sqrt{x_i}} = \alpha \frac{1}{\sqrt{x_i}} + \beta \frac{x_i}{\sqrt{x_i}} + \frac{u_i}{\sqrt{x_i}} \quad (2)$$
or,
$$\frac{y_i}{\sqrt{x_i}} = \alpha \frac{1}{\sqrt{x_i}} + \beta \sqrt{x_i} + \frac{u_i}{\sqrt{x_i}} \quad (2)$$

What is the variance of the error term in Model (2)?
$$\mathrm{Var}\left(\frac{u_i}{\sqrt{x_i}}\right) = \frac{1}{x_i}\,\mathrm{Var}(u_i) = \frac{1}{x_i}\,\sigma^2 x_i = \sigma^2$$
i.e., $\mathrm{Var}\left(\dfrac{u_i}{\sqrt{x_i}}\right) = \sigma^2$.

Using WLS we have homoskedastic errors, i.e., the error variance is constant in Model (2)!

So when the exact form of the heteroskedasticity is known, we use Generalized Least Squares (GLS) based on the following general principle. Suppose:
$$\mathrm{Var}(u \mid x) = \sigma^2 f(x)$$
We transform the model using WLS with the weights $\dfrac{1}{\sqrt{f(x)}}$ as follows:
$$\frac{y}{\sqrt{f(x)}} = \alpha \frac{1}{\sqrt{f(x)}} + \beta \frac{x}{\sqrt{f(x)}} + \frac{u}{\sqrt{f(x)}} \quad (3)$$
So that:
$$\mathrm{Var}\left(\frac{u}{\sqrt{f(x)}}\right) = \frac{1}{f(x)}\,\mathrm{Var}(u) = \frac{1}{f(x)}\,\sigma^2 f(x) = \sigma^2$$

Look at the estimated coefficients in the WLS Models (2) and (3). Their interpretation is exactly the same as in Model (1).

Clearly, when the form of the heteroskedasticity is known, WLS estimators are more efficient.
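A minimal numpy sketch of the WLS transformation in Example 1 (the DGP below, with $\mathrm{Var}(u \mid x) = \sigma^2 x$, is a made-up illustration). Dividing every term of Model (1) by $\sqrt{x_i}$ and running OLS on the transformed variables gives the WLS estimator; weighted-least-squares routines in standard packages apply the same idea with weights proportional to $1/x_i$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=np.sqrt(x))   # Var(u_i | x_i) = x_i  (sigma = 1)

# WLS: weight every term of Model (1) by 1/sqrt(x_i), then run OLS on Model (2)
w = 1.0 / np.sqrt(x)
y_t = y * w
X_t = np.column_stack([w, x * w])                  # transformed intercept and regressor

coef, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
alpha_wls, beta_wls = coef
print(alpha_wls, beta_wls)                         # WLS estimates of alpha and beta
```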

Weighted Least Squares (WLS) - Example 2

Suppose the model is:
$$y = \alpha + \beta x + u, \qquad \mathrm{Var}(u) = \sigma^2$$
But the data are not available for $y$ and $x$; only data on averages are available.
E.g., instead of the years of education ($y$) and age ($x$) of each employee, for each firm you have the average number of years of education and the average age of all the employees.
I.e., the model you are estimating is:
$$\bar y = \alpha + \beta \bar x + \bar u$$
where
$$\mathrm{Var}(\bar u) = \mathrm{Var}\left(\frac{\sum_i u_i}{n}\right) = \frac{1}{n^2}\,\mathrm{Var}\left(\sum_i u_i\right) = \frac{1}{n^2}\,n\sigma^2 = \frac{1}{n}\,\sigma^2$$
(with $n$ the number of employees being averaged over).

The errors are heteroskedastic in the averages model, and the form of the heteroskedasticity is known. So we use WLS as discussed above.
Here $f(x) = \dfrac{1}{n}$, so the weights used will be $\dfrac{1}{\sqrt{f(x)}} = \sqrt{n}$.

Using WLS, the model is transformed as follows:
$$\bar y_i \sqrt{n_i} = \alpha \sqrt{n_i} + \beta\, \bar x_i \sqrt{n_i} + \bar u_i \sqrt{n_i}$$
In this model the errors are homoskedastic:
$$\mathrm{Var}(\bar u_i \sqrt{n_i}) = n_i\, \mathrm{Var}(\bar u_i) = n_i \cdot \frac{1}{n_i}\,\sigma^2 = \sigma^2$$

This example shows that if we are working with cross-section data on, say, firm-level averages for the employees of each firm, it is best to weight each observation by the square root of the number of employees in the firm. This WLS estimator is more efficient than the OLS estimator.

Carefully go through the examples in Wooldridge, Chapter 8, to understand this better.

Note: When the form of the heteroskedasticity is unknown, i.e., $f(x)$ is not known, it may be estimated using the residuals to obtain $\hat f(x)$. When $\hat f(x)$ is used to transform the data, we are using Feasible Generalized Least Squares (FGLS).
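One common feasible GLS procedure (of the kind discussed in Wooldridge, Chapter 8) estimates the variance function by regressing $\log(\hat u^2)$ on the x-variables and exponentiating the fitted values. A sketch on simulated data (the DGP and all numbers are assumptions made for the illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=x)     # heteroskedastic errors (made-up DGP)

X = np.column_stack([np.ones(n), x])

# Step 1: OLS, keep the residuals
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ b_ols

# Step 2: estimate the variance function: regress log(u_hat^2) on x, exponentiate fit
g, *_ = np.linalg.lstsq(X, np.log(u_hat ** 2), rcond=None)
f_hat = np.exp(X @ g)                       # estimated f(x), strictly positive

# Step 3: WLS using the estimated weights 1/sqrt(f_hat)
w = 1.0 / np.sqrt(f_hat)
b_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
print(b_ols, b_fgls)
```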

Suggested Readings for Heteroskedasticity: Chapter 8 in Wooldridge (2012).

Wooldridge, J.M. (2012), Introductory Econometrics: A Modern Approach, Cengage Learning (latest edition).