
Notes on Multicollinearity and Heteroskedasticity

We begin with the basic multiple linear regression (MLR) model and its assumptions.

1. The Population Model


$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$
where,
$y$ = Dependent variable; observable random variable (r.v.)
$x_i$, $i = 1, \dots, k$ = Explanatory / independent variables; observable r.v.s
$u$ = Disturbance / error term; unobservable r.v.
$\alpha$, $\beta_i$, $i = 1, \dots, k$ = Unobservable parameters / constants
2. The Assumptions

2.1 The model is linear in parameters


$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$

2.2 Random Sampling


$(x_{1i}, x_{2i}, \dots, x_{ki}; y_i)$, $i = 1, \dots, n$ is a random sample from the population, such that
$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i, \quad i = 1, 2, \dots, n$$
Note: From the definition of a random sample it follows that the $u_i$ are independently and identically distributed (iid) r.v.s, so that the correlation between $u_i$ and $u_j$ is zero for any $i \neq j$.

2.3 There is no Perfect Collinearity

This means: (i) none of the $x_i$ is a constant; and (ii) there is no exact linear relationship among the explanatory variables.

2.4 Zero Conditional Mean


Eu|x 1, x 2 , . . . , x k   0
and for the random sample
Eu i |x 1i, x 2i , . . . , x ki   0

The zero conditional mean assumption


implies :
(i) Ey|x     1 x 1   2 x 2 . . . .  k x k
and Ey i |x i      1 x 1i   2 x 2i . . . .  k x ki

(ii) That u and x are uncorrelated .


In fact Eu|x 1, x 2 , . . . , x k   Eu and every
function (and not just linear functions) of x is
uncorrelated with u

3
2.5 Homoskedasticity
This is the assumption that the disturbance variance is constant:
$$\mathrm{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$$
and
$$\mathrm{Var}(u_i \mid x_{1i}, x_{2i}, \dots, x_{ki}) = \sigma^2$$

2.6 The classical assumption regarding the distribution of the error term is that the $u_i$ are independent of the $x_i$ and normally distributed, i.e.:
$$u_i \sim N(0, \sigma^2)$$
and
$$y_i \sim N(\alpha + \beta_1 x_{1i} + \dots + \beta_k x_{ki},\ \sigma^2).$$

We use the MLR model to estimate the unknown parameters $\alpha, \beta_i$, using the estimators $\hat\alpha, \hat\beta_i$.

3. Some Basic Concepts

We will use the 2-variable model to discuss some basic concepts related to estimators: their properties, the link between the OLS assumptions and the properties of the estimators, and the variances of the estimators.

3.1 The OLS Estimators


The 2-variable simple linear model is
$$y = \alpha + \beta x + u$$
or $E(y \mid x) = \alpha + \beta x$.
A random sample of observations on $x$ and $y$ is used to derive the ordinary least squares (OLS) estimators for $\alpha$ and $\beta$.
In the 2-variable model, the OLS estimators of the unknown parameters $\alpha$ and $\beta$ are:
$$\hat\beta = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} \qquad \text{and} \qquad \hat\alpha = \bar y - \hat\beta \bar x$$
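As an illustration, here is a minimal numpy sketch that computes $\hat\alpha$ and $\hat\beta$ directly from these formulas. The data, the parameter values ($\alpha = 2$, $\beta = 0.5$) and the sample size are made-up assumptions for the example.

```python
import numpy as np

# Simulated data for illustration only: y = 2 + 0.5*x + u, u ~ N(0, 1)
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)

# OLS estimators from the formulas above
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print(alpha_hat, beta_hat)   # estimates should be close to 2 and 0.5
```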

3.2 Statistical Properties of the OLS Estimators
As we saw in our Statistics classes, estimators are random variables, so $\hat\alpha$ and $\hat\beta$ are random variables.

Since the errors are normally distributed, so is $y$, and so are $\hat\alpha$ and $\hat\beta$ (being linear functions of $y$).

According to the Gauss-Markov theorem, the OLS estimators (i.e., $\hat\alpha$ and $\hat\beta$) are Best Linear Unbiased Estimators (B.L.U.E.) of $\alpha$ and $\beta$.

i.e., $\hat\beta \sim N(\beta, \mathrm{Var}(\hat\beta))$; since $\hat\beta$ is unbiased, $E(\hat\beta) = \beta$.
Similarly, $\hat\alpha \sim N(\alpha, \mathrm{Var}(\hat\alpha))$ and $E(\hat\alpha) = \alpha$.

$\hat\alpha$ and $\hat\beta$ are 'best', i.e., in the class of all linear and unbiased estimators of $\alpha$ and $\beta$, these have the least variance, or are the most 'efficient'.

3.3 The OLS Residuals


The method of least squares minimizes the sum of squared residuals:
$$\sum_i \hat u_i^2 = \sum_i (y_i - \hat\alpha - \hat\beta x_i)^2$$

The $\hat u_i$ are the least squares residuals, which are observable, and:
$$\hat u_i = y_i - \hat y_i$$
or $\hat u_i = y_i - \hat\alpha - \hat\beta x_i$.

The residuals $\hat u_i$ are used as estimators of the errors or disturbances ($u_i$), which are unobservable.

3.4 Variance of Estimators in OLS Model


It follows from assumptions 2.1 to 2.6 that in the 2-variable model:
$$\hat\beta \sim N(\beta, \mathrm{Var}(\hat\beta)), \quad \text{where} \quad \mathrm{Var}(\hat\beta) = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}.$$

Similarly, $\hat\alpha \sim N(\alpha, \mathrm{Var}(\hat\alpha))$, where
$$\mathrm{Var}(\hat\alpha) = \frac{\sigma^2 \sum_i x_i^2}{n \sum_i (x_i - \bar x)^2}.$$

 
Both $\mathrm{Var}(\hat\alpha)$ and $\mathrm{Var}(\hat\beta)$ are functions of the error variance $\sigma^2$, which is unknown.
To estimate $\mathrm{Var}(\hat\alpha)$ and $\mathrm{Var}(\hat\beta)$ we need an estimate of $\sigma^2$. How do we get that?

Estimation of the Disturbance Variance ($\sigma^2$)

We use the LS residuals $\hat u_i$ as a proxy for the unobserved errors $u_i$.
So the residual variance seems a natural estimator for the disturbance variance, $\mathrm{Var}(u) = \sigma^2$, i.e.,
$$\widetilde{\mathrm{Var}}(\hat u) = \frac{1}{n} \sum_i \hat u_i^2$$
($\hat u$ is also a r.v. since it is a function of the r.v.s $y$ and $\hat y$.)

However, $\widetilde{\mathrm{Var}}(\hat u)$ is not an unbiased estimator of $\sigma^2$. We can show that:
$$E\left(\frac{\sum_i \hat u_i^2}{n}\right) \neq \sigma^2, \qquad E\left(\sum_i \hat u_i^2\right) = (n-2)\,\sigma^2$$

So we use
$$\hat\sigma^2 = \frac{\sum_i \hat u_i^2}{n-2}$$
as an estimator of $\sigma^2$ in the 2-variable model. $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$, since
$$E(\hat\sigma^2) = \sigma^2, \quad \text{i.e.,} \quad E\left(\frac{\sum_i \hat u_i^2}{n-2}\right) = \sigma^2.$$

In the multiple regression model with $k$ explanatory variables,
$$\hat\sigma^2 = \frac{\sum_i \hat u_i^2}{n - k - 1}$$

Note: $\hat\sigma$ is not an unbiased estimator of $\sigma$, but it is consistent, i.e., $\operatorname{plim}_{n \to \infty} \hat\sigma = \sigma$.
We replace $\sigma^2$ by $\hat\sigma^2$ in the expressions for $\mathrm{Var}(\hat\alpha)$ and $\mathrm{Var}(\hat\beta)$ to get the estimated variances of $\hat\alpha$ and $\hat\beta$, i.e.,
$$\widehat{\mathrm{Var}}(\hat\beta) = \frac{\hat\sigma^2}{\sum_i (x_i - \bar x)^2}$$
and
$$\widehat{\mathrm{Var}}(\hat\alpha) = \frac{\hat\sigma^2 \sum_i x_i^2}{n \sum_i (x_i - \bar x)^2}$$

The positive square roots of these estimated variances give us the standard errors of the estimators $\hat\alpha$ and $\hat\beta$, i.e.,
$$\sqrt{\widehat{\mathrm{Var}}(\hat\beta)} = \text{std. error of } \hat\beta; \quad \text{and} \quad \sqrt{\widehat{\mathrm{Var}}(\hat\alpha)} = \text{std. error of } \hat\alpha.$$
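Continuing the numpy illustration from Section 3.1 (the simulated data and parameter values are assumptions made for the example), the estimated disturbance variance and the standard errors can be computed as:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)      # made-up DGP: alpha=2, beta=0.5, sigma=1

# OLS estimates
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# Residuals and the unbiased estimator of sigma^2 (n - 2 degrees of freedom)
u_hat = y - alpha_hat - beta_hat * x
sigma2_hat = np.sum(u_hat ** 2) / (n - 2)

# Estimated variances and standard errors of beta_hat and alpha_hat
var_beta_hat = sigma2_hat / np.sum((x - x.mean()) ** 2)
var_alpha_hat = sigma2_hat * np.sum(x ** 2) / (n * np.sum((x - x.mean()) ** 2))
print(sigma2_hat, np.sqrt(var_beta_hat), np.sqrt(var_alpha_hat))
```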

3.5 Assumptions & Properties of OLS Estimators

According to the Gauss-Markov theorem, the OLS estimators (i.e., $\hat\alpha$ and $\hat\beta$) are Best Linear Unbiased Estimators (B.L.U.E.) of $\alpha$ and $\beta$.

Assumptions 2.1 to 2.4 above are required for obtaining the estimators and for proving unbiasedness of $\hat\alpha$ and $\hat\beta$. If Assumption 2.4 (zero conditional mean) is violated and there is correlation between the errors and the x-variables, the estimators will be biased.

Assumption 2.5 (homoskedasticity) is required to prove efficiency of $\hat\alpha$ and $\hat\beta$.

Assumption 2.6 (normally distributed errors) is NOT required to prove that $\hat\alpha$ and $\hat\beta$ are B.L.U.E. This assumption is required for carrying out hypothesis tests.

4. Multicollinearity

4.1 What is Multicollinearity ?

Multicollinearity refers to the presence of high linear correlation between the explanatory variables. It is part of the data generating process (DGP) and is very common in business / social sciences data.

4.2 How does it affect properties of OLS estimators ?


OLS estimators are BLUE in the presence of multicollinearity (but perfect multicollinearity is ruled out by Assumption 2.3 above).

Why do we rule out perfect multicollinearity ?

In our discussion of the dummy variable trap we saw the OLS model in matrix form and the X-matrix. In the presence of perfect multicollinearity the columns of the X-matrix would become linearly dependent (as we saw in the case of the dummy variable trap), so the inverse of $X'X$ would not exist and we would not be able to compute the OLS estimates of the unknown parameters $\alpha$ and $\beta_j$.

4.3 What is the problem due to multicollinearity?

To understand why multicollinearity is a problem, look at the variance of the estimators in the MLR model:
$$\mathrm{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_{x_j}\,(1 - R_j^2)}, \quad \text{where}$$
$\mathrm{Var}(\hat\beta_j)$ = variance of $\hat\beta_j$, the estimated coefficient of $x_j$, the $j$th explanatory variable
$SST_{x_j}$ = total sample variation in $x_j$
$R_j^2$ = the $R^2$ from regressing $x_j$ on all the other explanatory variables

High correlation between explanatory variables means $R_j^2$ is very high (close to 1) and $(1 - R_j^2)$ is very low, so that $\mathrm{Var}(\hat\beta_j)$ tends to be high, for given values of $SST_{x_j}$ and $\sigma^2$.

High $\mathrm{Var}(\hat\beta_j)$ leads to high standard errors and low values of t-statistics (remember, the t-statistic of $\hat\beta_j$ is $\dfrac{\hat\beta_j}{se(\hat\beta_j)}$).

In the presence of multicollinearity, variables can be statistically insignificant (very low t-statistics for the individual variables), even though the F-statistic for overall significance is significant.
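A small simulation can make this concrete. In this hypothetical sketch (all numbers are made up), two regressors are generated to be almost perfectly correlated; the individual slope t-statistics then tend to be small even though the F-statistic for overall significance is very large:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 100, 2
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.02, size=n)       # x2 is almost identical to x1
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k - 1)

# Usual OLS covariance matrix and t-statistics
cov = sigma2_hat * np.linalg.inv(X.T @ X)
t_stats = beta_hat / np.sqrt(np.diag(cov))

# F-statistic for overall significance
r2 = 1 - (u_hat @ u_hat) / np.sum((y - y.mean()) ** 2)
F = (r2 / k) / ((1 - r2) / (n - k - 1))
print(t_stats[1:], F, stats.f.sf(F, k, n - k - 1))   # small slope t-stats, large F
```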

4.4 How do we detect Multicollinearity ?

We can calculate the Variance Inflation Factor, or VIF, for each explanatory variable:
$$\mathrm{VIF}(x_j) = \frac{1}{1 - R_j^2}$$

The general cutoff value for VIF is 10 (when $R_j^2$ is close to 1, say 0.9, indicating high correlation between $x_j$ and the other explanatory variables). The VIF indicates to what extent the variance (and hence the standard error) of $\hat\beta_j$ is inflated due to the correlation of $x_j$ with the other $x_i$, $i \neq j$.
But there is no exact measure or cutoff for what is 'too high'. In practice, therefore, the VIF has limited relevance.
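A sketch of the VIF calculation on simulated data, using only numpy (statsmodels provides a ready-made variance_inflation_factor function that computes the same quantity):

```python
import numpy as np

def vif(X):
    # VIF for each column of X, from the auxiliary R_j^2 of regressing
    # column j on the remaining columns (plus an intercept)
    n, k = X.shape
    out = []
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2_j = 1 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

# Illustration with two highly correlated (simulated) regressors
rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=500)
print(vif(np.column_stack([x1, x2])))        # both VIFs near the usual cutoff of 10
```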

4.5 What can we do about Multicollinearity ?

We try to have large data sets to ensure high variation in the x-variables. Look at the components of $\mathrm{Var}(\hat\beta_j)$: if the total variation in $x_j$ is high, $SST_{x_j}$ will be high, and this will reduce $\mathrm{Var}(\hat\beta_j)$ and lower the standard errors, for given values of $\sigma^2$ and $R_j^2$. Do you see that when data sets are small, $SST_{x_j}$ is also lower?

This creates a problem of 'micronumerosity'! Small data sets have low values of $SST_{x_j}$, which can lead to high $\mathrm{Var}(\hat\beta_j)$, high standard errors and low t-statistics, even if there is no multicollinearity.

Taking logs of variables may also help in the presence of multicollinearity.

There is no simple rule of thumb to tackle this problem.

Be careful about dropping explanatory variables to avoid multicollinearity. This can create a problem of omitted variable bias.

Remember, the estimators are BLUE in the presence of multicollinearity, but dropping a relevant variable can lead to biased estimators.

Readings: Section on Multicollinearity in Chapter 3 of Wooldridge (2012).

5. Heteroskedasticity

5.1 What is Heteroskedasticity ?

It is a violation of Assumption 2.5 above and is best understood in contrast with homoskedasticity, the assumption that the disturbance variance is constant.

Under homoskedasticity:
$$\mathrm{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$$
Since $E(u \mid x_1, x_2, \dots, x_k) = 0$, we can write
$$E(u^2 \mid x_1, x_2, \dots, x_k) = \sigma^2$$

In contrast, under heteroskedasticity:
$$\mathrm{Var}(u_i \mid x_1, x_2, \dots, x_k) = \sigma_i^2, \quad i = 1, \dots, n$$
Since $E(u_i \mid x_1, x_2, \dots, x_k) = 0$, we can write
$$E(u_i^2 \mid x_1, x_2, \dots, x_k) = \sigma_i^2$$

Heteroskedasticity essentially means the error variances may not be constant; rather, they may be functions of the explanatory variables or x-variables.

5.2 What problem does Heteroskedasticity cause ?

(i) When there is a relation between the error variance and the x-variables, the assumption of homoskedasticity is violated, so the OLS estimators are not efficient, not BLUE any more.

(ii) As you saw above, the error variance ($\sigma^2$) affects the variance of the estimators ($\mathrm{Var}(\hat\beta_j)$). Hence the usual standard errors of the estimators are affected, and the results of the usual hypothesis tests are no longer valid.

(iii) But the estimators are still unbiased and consistent; this follows as long as Assumption 2.4 is valid, i.e., there is zero correlation between the errors and the x-variables.

5.3 What to do about Heteroskedasticity ?

First we discuss what to do when we do not know the form of the heteroskedasticity. (When the form of the heteroskedasticity is known, we use WLS, as discussed in Section 5.5 below.)

Suppose we suspect
$$E(u_i^2 \mid x_1, x_2, \dots, x_k) = \sigma_i^2 = f(x_1, x_2, \dots, x_k)$$
but we do not know the exact functional form of $f(x_1, x_2, \dots, x_k)$.
In this case we can use heteroskedasticity-robust standard errors.

How to estimate robust standard errors ?

Recall that in the 2-variable model, under homoskedasticity:
$$\mathrm{Var}(\hat\beta) = \frac{\sum_i (x_i - \bar x)^2 \,\sigma^2}{\left[\sum_i (x_i - \bar x)^2\right]^2} = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}$$

Under heteroskedasticity:
$$\mathrm{Var}(\hat\beta) = \frac{\sum_i (x_i - \bar x)^2 \,\sigma_i^2}{\left[\sum_i (x_i - \bar x)^2\right]^2}$$

To estimate $\mathrm{Var}(\hat\beta)$ in the presence of heteroskedasticity of any form, replace the error variance $\sigma_i^2$ above by its estimator, the squared residual $\hat u_i^2$ (recall that the residuals $\hat u_i$ also have zero mean).

So robust standard error estimation involves using the following to estimate $\mathrm{Var}(\hat\beta)$:
$$\widehat{\mathrm{Var}}(\hat\beta) = \frac{\sum_i (x_i - \bar x)^2 \,\hat u_i^2}{\left[\sum_i (x_i - \bar x)^2\right]^2}$$

(ii) Robust standard errors can always be used with cross-section data and are valid even under homoskedasticity, when the data set is large.
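A minimal numpy sketch of this robust variance formula for the 2-variable model, on simulated data in which the error variance increases with x (the data and numbers are assumptions made for the illustration; robust covariance options in packages such as statsmodels implement estimators of this same type):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=np.sqrt(x))   # Var(u|x) = x : heteroskedastic

# OLS slope, intercept and residuals
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
u_hat = y - alpha_hat - beta_hat * x

# Usual (homoskedasticity-only) standard error of beta_hat
sigma2_hat = np.sum(u_hat ** 2) / (n - 2)
se_usual = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))

# Heteroskedasticity-robust standard error from the formula above
se_robust = np.sqrt(np.sum((x - x.mean()) ** 2 * u_hat ** 2)
                    / np.sum((x - x.mean()) ** 2) ** 2)
print(se_usual, se_robust)
```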

5.4 Tests for Heteroskedasticity

(i) With heteroskedasticity, the error variance ($\sigma_i^2$) is a function of the x-variables. Therefore, tests check for the presence or absence of a relation between the squared residuals ($\hat u^2$, a proxy for the error variance) and the included explanatory variables, on the basis of an auxiliary regression of the following type:
$$\hat u^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_1^2 + \delta_4 x_2^2 + \delta_5 x_1 x_2 + v$$
After this regression we test:
$$H_0: \delta_1 = \delta_2 = \dots = \delta_5 = 0 \quad \text{against} \quad H_1: \text{at least one } \delta \neq 0$$
You can see that with just 2 explanatory variables in the model, the auxiliary regression has to estimate 6 parameters. So there can be problems with degrees of freedom in models with more than 2 regressors.

(ii) The following test for heteroskedasticity can be used to conserve degrees of freedom:
$$\hat u^2 = \delta_0 + \delta_1 \hat y + \delta_2 \hat y^2 + v$$
where $\hat y$ are the predicted values from the regression of $y$ on the x-variables.
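A sketch of test (ii) on simulated data (the F-statistic below is the standard F test of the two slope coefficients in the auxiliary regression; statsmodels also provides ready-made heteroskedasticity tests):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 400
x = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=x)       # error variance rises with x (made up)

# Step 1: OLS of y on x; keep fitted values and squared residuals
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_fit = X @ b
u2 = (y - y_fit) ** 2

# Step 2: auxiliary regression of u^2 on y_fit and y_fit^2
Z = np.column_stack([np.ones(n), y_fit, y_fit ** 2])
g, *_ = np.linalg.lstsq(Z, u2, rcond=None)
resid = u2 - Z @ g
r2_aux = 1 - resid @ resid / np.sum((u2 - u2.mean()) ** 2)

# Step 3: F test of H0: both slope coefficients in the auxiliary regression are zero
q = 2
F = (r2_aux / q) / ((1 - r2_aux) / (n - q - 1))
print(F, stats.f.sf(F, q, n - q - 1))         # small p-value -> reject homoskedasticity
```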

5.5 Weighted Least Squares (WLS) Estimators

Suppose our regression model has heteroskedasticity and we know the exact functional form relating the error variance to the x-variables. In this case WLS estimators are used, as they correct for the heteroskedasticity.

Recall from our statistics class that for any random variable $y$ and a constant $m$:
$$\mathrm{Var}(my) = m^2 \,\mathrm{Var}(y)$$
so if $\mathrm{Var}(y) = \sigma^2$, then $\mathrm{Var}(my) = m^2 \sigma^2$.

Also remember that for a set of independent random variables (e.g., the error terms are assumed to be independent), the variance of the sum is equal to the sum of the variances: if $\mathrm{Var}(u_i) = \sigma^2$ for all $i$, then
$$\mathrm{Var}\left(\sum_i u_i\right) = n\sigma^2$$

We will use these results as we proceed.

Weighted Least Squares (WLS) - Example 1

Consider the following example. The model is:
$$y_i = \alpha + \beta x_i + u_i \quad (1)$$
where $\mathrm{Var}(u_i \mid x_i) = \sigma^2 f(x) = \sigma^2 x_i$.

In this case we use WLS and transform our model as follows, where each term is weighted by $\dfrac{1}{\sqrt{x_i}}$:
$$\frac{y_i}{\sqrt{x_i}} = \alpha \frac{1}{\sqrt{x_i}} + \beta \frac{x_i}{\sqrt{x_i}} + \frac{u_i}{\sqrt{x_i}} \quad (2)$$
or,
$$\frac{y_i}{\sqrt{x_i}} = \alpha \frac{1}{\sqrt{x_i}} + \beta \sqrt{x_i} + \frac{u_i}{\sqrt{x_i}} \quad (2)$$

What is the variance of the error term in Model (2)?
$$\mathrm{Var}\left(\frac{u_i}{\sqrt{x_i}}\right) = \frac{1}{x_i}\,\mathrm{Var}(u_i) = \frac{1}{x_i}\,\sigma^2 x_i = \sigma^2$$
i.e., $\mathrm{Var}\left(\dfrac{u_i}{\sqrt{x_i}}\right) = \sigma^2$.

Using WLS we have homoskedastic errors, i.e., the error variance is constant in Model (2)!

So when the exact form of the heteroskedasticity is known, we use Generalized Least Squares (GLS) based on the following general principle. Suppose:
$$\mathrm{Var}(u \mid x) = \sigma^2 f(x)$$
We transform the model using WLS with the weights $\dfrac{1}{\sqrt{f(x)}}$ as follows:
$$\frac{y}{\sqrt{f(x)}} = \alpha \frac{1}{\sqrt{f(x)}} + \beta \frac{x}{\sqrt{f(x)}} + \frac{u}{\sqrt{f(x)}} \quad (3)$$
So that:
$$\mathrm{Var}\left(\frac{u}{\sqrt{f(x)}}\right) = \frac{1}{f(x)}\,\mathrm{Var}(u) = \frac{1}{f(x)}\,\sigma^2 f(x) = \sigma^2$$

Look at the estimated coefficients in the WLS Models (2) and (3). Their interpretation is exactly the same as in Model (1).

Clearly, when the form of the heteroskedasticity is known, WLS estimators are more efficient.
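A minimal numpy sketch of the WLS transformation in Example 1 (the DGP below, with $\mathrm{Var}(u \mid x) = \sigma^2 x$, is a made-up illustration). Dividing every term of Model (1) by $\sqrt{x_i}$ and running OLS on the transformed variables gives the WLS estimator; weighted-least-squares routines in standard packages apply the same idea with weights proportional to $1/x_i$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=np.sqrt(x))   # Var(u_i | x_i) = x_i  (sigma = 1)

# WLS: weight every term of Model (1) by 1/sqrt(x_i), then run OLS on Model (2)
w = 1.0 / np.sqrt(x)
y_t = y * w
X_t = np.column_stack([w, x * w])                  # transformed intercept and regressor

coef, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
alpha_wls, beta_wls = coef
print(alpha_wls, beta_wls)                         # WLS estimates of alpha and beta
```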

Weighted Least Squares (WLS) - Example 2

Suppose the model is:
$$y = \alpha + \beta x + u, \qquad \mathrm{Var}(u) = \sigma^2$$
But the data are not available for $y$ and $x$; only data on averages are available.
E.g., instead of the years of education ($y$) and age ($x$) of each employee, for each firm you have the average number of years of education and the average age of all the employees.
I.e., the model you are estimating is:
$$\bar y = \alpha + \beta \bar x + \bar u$$
where
$$\mathrm{Var}(\bar u) = \mathrm{Var}\left(\frac{\sum_i u_i}{n}\right) = \frac{1}{n^2}\,\mathrm{Var}\left(\sum_i u_i\right) = \frac{1}{n^2}\,n\sigma^2 = \frac{1}{n}\,\sigma^2$$
(with $n$ the number of employees being averaged over).

The errors are heteroskedastic in the averages model, and the form of the heteroskedasticity is known. So we use WLS as discussed above.
Here $f(x) = \dfrac{1}{n}$, so the weights used will be $\dfrac{1}{\sqrt{f(x)}} = \sqrt{n}$.

Using WLS, the model is transformed as follows:
$$\bar y_i \sqrt{n_i} = \alpha \sqrt{n_i} + \beta\, \bar x_i \sqrt{n_i} + \bar u_i \sqrt{n_i}$$
In this model the errors are homoskedastic:
$$\mathrm{Var}(\bar u_i \sqrt{n_i}) = n_i\, \mathrm{Var}(\bar u_i) = n_i \cdot \frac{1}{n_i}\,\sigma^2 = \sigma^2$$

This example shows that if we are working with cross-section data on, say, firm-level averages for the employees of each firm, it is best to weight each observation by the square root of the number of employees in the firm. This WLS estimator is more efficient than the OLS estimator.

Carefully go through the examples in Wooldridge, Chapter 8, to understand this better.

Note: When the form of the heteroskedasticity is unknown, i.e., $f(x)$ is not known, it may be estimated using the residuals to obtain $\hat f(x)$. When $\hat f(x)$ is used to transform the data, we are using Feasible Generalized Least Squares (FGLS).
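One common feasible GLS procedure (of the kind discussed in Wooldridge, Chapter 8) estimates the variance function by regressing $\log(\hat u^2)$ on the x-variables and exponentiating the fitted values. A sketch on simulated data (the DGP and all numbers are assumptions made for the illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=x)     # heteroskedastic errors (made-up DGP)

X = np.column_stack([np.ones(n), x])

# Step 1: OLS, keep the residuals
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ b_ols

# Step 2: estimate the variance function: regress log(u_hat^2) on x, exponentiate fit
g, *_ = np.linalg.lstsq(X, np.log(u_hat ** 2), rcond=None)
f_hat = np.exp(X @ g)                       # estimated f(x), strictly positive

# Step 3: WLS using the estimated weights 1/sqrt(f_hat)
w = 1.0 / np.sqrt(f_hat)
b_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
print(b_ols, b_fgls)
```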

Suggested Readings for Heteroskedasticity: Chapter 8 in Wooldridge (2012).

Wooldridge, J.M. (2012), Introductory Econometrics: A Modern Approach, Cengage Learning (latest edition).