
Amity Institute of Applied Sciences

MULTICOLLINEARITY

Dr. Niraj Kumar Singh

Multicollinearity
• One of the important assumptions of the classical linear regression model (CLRM) is that there is no correlation among the explanatory variables included in the model. For example, consider the linear model
yi = β0 + β1X1i + β2X2i + ui ,  i = 1, 2, …, n.
This assumption of the CLRM says that
Cor(X2i, X1i) = 0,
that is, X2i and X1i are not correlated in any way.
– The situation in which this assumption is violated is referred to as multicollinearity.
– When the explanatory variables are highly correlated, it becomes very difficult to distinguish the separate effects of each explanatory variable.
• The term "multicollinearity" means the existence of a perfect or exact linear relationship among some or all of the explanatory variables of a regression model.


For the k-variable regression model, an exact linear relationship is said to exist if the following condition is satisfied:
λ1X1 + λ2X2 + … + λkXk = 0    (1)
where the λi are not all zero simultaneously. Multicollinearity given by condition (1) is called 'perfect' multicollinearity.
To distinguish perfect multicollinearity from the other case, in which the X variables are intercorrelated but not as perfectly as in (1),

we refer to the following condition:
λ1X1 + λ2X2 + … + λkXk + νi = 0    (2)
where νi is a stochastic error term. In this case the X variables are said to be multicollinear, but not as perfectly as in (1).
To see the difference between these two types of multicollinearity, consider the following hypothetical data:
X1 : 10 15 18 24 30
X2 : 50 75 90 120 150
X3 : 52 75 97 129 152
It is apparent that X2i = 5X1i. Thus these two variables exhibit perfect multicollinearity.
Now consider the variable X3. The relationship between X1 and X3 is
X3i = 5X1i + νi ,  i = 1, 2, 3, 4, 5,
where νi = 2, 0, 7, 9, 2 for i = 1, 2, 3, 4, 5, respectively.
Here the variables X1 and X3 are multicollinear, but not as perfectly as in the previous case.
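As a quick numerical check of this distinction, one can compute the pairwise correlations and the rank of the data matrix. Below is a minimal sketch in Python; the language and library calls are illustrative assumptions, not part of the original slides:

```python
import numpy as np

# Hypothetical data from the example above
X1 = np.array([10, 15, 18, 24, 30], dtype=float)
X2 = np.array([50, 75, 90, 120, 150], dtype=float)  # X2i = 5 * X1i exactly
X3 = np.array([52, 75, 97, 129, 152], dtype=float)  # X3i = 5 * X1i + vi

print(np.corrcoef(X1, X2)[0, 1])  # exactly 1.0: perfect multicollinearity
print(np.corrcoef(X1, X3)[0, 1])  # about 0.996: high but not perfect

# Under perfect multicollinearity the data matrix loses full column rank
print(np.linalg.matrix_rank(np.column_stack([X1, X2])))  # 1, not 2
print(np.linalg.matrix_rank(np.column_stack([X1, X3])))  # 2: full rank
```

The rank deficiency in the perfect case is exactly why OLS cannot separate the individual effects of X1 and X2.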
Sources of Multicollinearity
Multicollinearity may occur due to:
• the data collection technique employed, for example, sampling over a limited range of the values taken by the regressors in the model.
• constraints on the model or on the population being sampled. For example, in the regression of electricity consumption (Y) on income (X2) and house size (X3), there is a physical constraint: families with higher incomes usually have larger homes than families with lower incomes.
• misspecification of the model, for example, adding polynomial terms, especially when the range of the X variables is small.
• an overdetermined model, which occurs when the model has more explanatory variables than observations; for example, a medical study with a small number of patients about whom information is collected on a large number of variables.
• regressors in the model sharing a common trend, that is, all increasing or decreasing over time.

For example, in the regression of consumption expenditure on income, wealth, and population, the regressors income, wealth, and population may all be growing over time.
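A short sketch of this trend effect, with hypothetical numbers (the series names, trends, and noise scales are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100)

# Two regressors that both grow over time (hypothetical trends)
income = 100 + 2.0 * t + rng.normal(scale=5.0, size=t.size)
wealth = 500 + 6.0 * t + rng.normal(scale=20.0, size=t.size)

# The shared trend alone makes the regressors almost perfectly correlated
print(np.corrcoef(income, wealth)[0, 1])  # close to 1
```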

Consequences of Multicollinearity
In cases of near or high multicollinearity, one is likely to face the following consequences:
• Although the ordinary least squares (OLS) estimators are still BLUE, they have large variances and covariances, making precise estimation difficult.
• Because of the above, the confidence intervals tend to be much wider, leading to more ready acceptance of the zero null hypothesis (i.e. that the true population coefficient is zero).
• The t-ratios of one or more coefficients tend to be statistically insignificant.
• The coefficient of determination R², the overall measure of the goodness of fit of the model, can be very high, giving a misleading picture of the fit.
• The OLS estimators and their standard errors can be sensitive to small changes in the data; ideally, OLS estimates should not change much with small changes in the data.
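These consequences can be seen together in a small simulation: two nearly collinear regressors yield a high R², inflated standard errors, wide confidence intervals, insignificant individual t-ratios, and large variance inflation factors (VIFs). The sketch below assumes Python with numpy and statsmodels, a toolchain not referenced in the slides; the data and coefficients are invented:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # x2 nearly collinear with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

print(res.rsquared)    # very high overall fit
print(res.bse)         # inflated standard errors on x1 and x2
print(res.pvalues)     # individual t-tests likely insignificant
print(res.conf_int())  # wide confidence intervals that can cover zero

# VIF_j = 1 / (1 - R_j^2); values far above 10 signal severe collinearity
print([variance_inflation_factor(X, i) for i in (1, 2)])

# Sensitivity: a small change in one observation can swing the estimates
y2 = y.copy()
y2[0] += 1.0
print(sm.OLS(y2, X).fit().params - res.params)
```

With x2 nearly a copy of x1, OLS can pin down only their combined effect, so each individual coefficient is estimated very imprecisely even though the regression as a whole fits well.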
