Chapter 4 Violations of The Assumptions of Classical Linear Regression Models
4.1. Multicollinearity
4.1.1. Introduction
One of the assumptions of the CLR model is that there are no exact linear
relationships among the independent variables and that there are at least as
many observations as independent variables (so the regressor matrix has full
rank). If either of these is violated, the OLS estimators cannot be computed
and the estimating procedure simply breaks down.
Even though the estimation procedure might not entirely break down when the
independent variables are highly correlated, severe estimation problems might
arise.
There could be two types of multicollinearity problems: perfect and
less-than-perfect (near) collinearity.
3. An overdetermined model: This happens when the model has more
explanatory variables than the number of observations. This could happen in
medical research where there may be a small number of patients about whom
information is collected on a large number of variables.
4. In time series data, the regressors may share the same trend.
While the estimated parameter values remain unbiased, the confidence we can
place in any individual estimate will be small. This presents a problem if we
believe that one or both of the variables ought to be in the model, but we
cannot reject the null hypothesis because of the large standard errors. In
other words, the presence of multicollinearity reduces the precision of the
OLS estimators.
2. The confidence intervals tend to be much wider, leading to acceptance of the
null hypothesis.
3. The t ratios may be statistically insignificant even though the overall
coefficient of determination is high.
4. The OLS estimators and their standard errors could be sensitive to small changes in
the data.
F = (R² / (k − 1)) / ((1 − R²) / (n − k))
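As a worked example, the F statistic can be computed directly from the reported R²; a minimal sketch (the values of R², n, and k below are purely illustrative):

```python
def f_from_r2(r2, n, k):
    """Overall-significance F statistic: F = (R2/(k-1)) / ((1-R2)/(n-k)),
    where k counts all estimated parameters including the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# A high R2 with n = 30 observations and k = 3 parameters:
# (0.9/2) / (0.1/27) = 121.5
print(f_from_r2(0.9, 30, 3))
```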
VIF = 1 / (1 − Rⱼ²)

Where, Rⱼ² is the coefficient of determination obtained from regressing the
j-th independent variable on the remaining independent variables.
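The VIF can be computed by running the auxiliary regression directly; a minimal NumPy sketch (the simulated data and the function name are illustrative):

```python
import numpy as np

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of
    determination from regressing column j of X on the remaining
    columns (with an intercept)."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)              # unrelated regressor
X = np.column_stack([x1, x2, x3])
print(vif(X, 0))   # large: x1 is nearly a linear function of x2
print(vif(X, 2))   # near 1: x3 is unrelated to the others
```

A common rule of thumb treats a VIF above 10 as evidence of serious multicollinearity.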
CC = √(χ² / (N + χ²))

Where, N is the total sample size.
4) Rethinking the model: incorrect choice of functional form, specification
errors, etc.
5) Prior information about some parameters of a model could also help to get
rid of multicollinearity.
This involves determining the relationship between a dependent variable
and independent variable(s) by netting out the effect of other independent
variable(s).
4.2. Autocorrelation
4.2.1. Introduction
One of the assumptions of OLS is that successive values of the error term
are independent of (unrelated to) one another.
cov(uᵢ, uⱼ) = E(uᵢuⱼ) = 0 for i ≠ j
If this assumption is not satisfied and the value of any of the disturbance terms
is related to its preceding value, then there is autocorrelation (serial
correlation).
uₜ = ρuₜ₋₁ + εₜ
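A quick simulation (with illustrative values of ρ and T) shows what such first-order autocorrelation looks like: the sample correlation between successive disturbances recovers ρ.

```python
import numpy as np

# Simulate AR(1) disturbances u_t = rho*u_{t-1} + e_t
# (rho and T are illustrative values).
rng = np.random.default_rng(42)
rho, T = 0.8, 5000
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]

# The sample correlation between u_t and u_{t-1} is close to rho.
lag_corr = np.corrcoef(u[1:], u[:-1])[0, 1]
print(round(lag_corr, 2))
```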
4.2.2. Sources of Autocorrelation
a) Inertia: The momentum built into economic data will continue until something
happens. Thus successive time series data are likely to be interdependent.
b) Specification Bias
i) Excluded variable(s)
ii) Incorrect functional form
c) Cobweb Phenomenon: Agricultural supply reacts to price with a lag of one
time period because supply decisions take time to implement. E.g., at the
beginning of this year's planting season, farmers are influenced by the prices
that prevailed last year.
d) Lags: For various reasons the behavior of some economic variables does not
change readily. E.g. Consumers do not change their consumption habits
readily for psychological, technological or institutional reasons.
Cₜ = β₁ + β₂Incomeₜ + β₃Consumptionₜ₋₁ + uₜ
If the lagged term is neglected, the error term will reflect a systematic pattern.
e) Manipulating data: e.g. taking averages, interpolation, extrapolation
f) Data transformation.
Durbin-Watson Test:

H₀: ρ = 0 (no autocorrelation)
H₁: ρ ≠ 0

The Durbin-Watson test involves the calculation of a test statistic based on
the residuals from the OLS regression procedure.
DW = Σₜ₌₂ᵀ (ε̂ₜ − ε̂ₜ₋₁)² / Σₜ₌₁ᵀ ε̂ₜ²
Expanding the d statistic:

d ≈ 2(1 − ρ̂)

Thus, when there is no serial correlation (ρ = 0), the DW statistic will be
close to 2.
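The DW statistic is straightforward to compute from the residuals; a sketch (the simulated white-noise residuals are illustrative):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2."""
    d = np.diff(resid)
    return (d @ d) / (resid @ resid)

rng = np.random.default_rng(1)
e = rng.normal(size=2000)   # serially uncorrelated residuals
print(durbin_watson(e))     # close to 2 when there is no serial correlation
```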
Graphical Method
This involves the visual inspection of the pattern taken by the residuals by
plotting them against time.
Where, Y*ₜ = Yₜ − ρ̂Yₜ₋₁ and X*ₖₜ = Xₖₜ − ρ̂Xₖ,ₜ₋₁
5) This procedure continues until the new estimates of ρ̂ differ from the old
ones by less than 0.01 or 0.005, or until 10 to 20 estimates of ρ̂ have been
obtained.
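The iterative procedure above (often called Cochrane-Orcutt) can be sketched as follows for a simple regression; the function name, tolerance, and simulated data are illustrative assumptions:

```python
import numpy as np

def cochrane_orcutt(y, x, tol=0.005, max_iter=20):
    """Iterate: estimate rho from the residuals, quasi-difference the
    data (Y*_t = Y_t - rho*Y_{t-1}, X*_t = X_t - rho*X_{t-1}),
    re-estimate, and stop once rho changes by less than tol."""
    Z = np.column_stack([np.ones(len(y)), x])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)   # OLS on original data
    rho = 0.0
    for _ in range(max_iter):
        resid = y - Z @ b
        rho_new = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
        ys = y[1:] - rho * y[:-1]
        xs = x[1:] - rho * x[:-1]
        Zs = np.column_stack([np.ones(len(ys)), xs])
        bs, *_ = np.linalg.lstsq(Zs, ys, rcond=None)
        b = np.array([bs[0] / (1 - rho), bs[1]])  # recover original intercept
    return b, rho

# Illustrative data: y = 1 + 2x + u with AR(1) errors (rho = 0.6).
rng = np.random.default_rng(3)
T, true_rho = 500, 0.6
x = rng.normal(size=T)
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = true_rho * u[t - 1] + e[t]
y = 1.0 + 2.0 * x + u
b, rho_hat = cochrane_orcutt(y, x)
print(rho_hat, b)   # rho_hat near 0.6, slope near 2
```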
The procedure selects the equation with the lowest residual sum of
squares as the best equation.
In using this procedure we could choose any limits and any spacing between
the grid points.
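The grid-search idea can be sketched as follows (a rough illustration, assuming a simple regression and a grid with 0.1 spacing):

```python
import numpy as np

def hildreth_lu(y, x, grid=None):
    """Try each rho on a grid, quasi-difference the data, and keep
    the rho whose transformed regression has the smallest residual
    sum of squares."""
    if grid is None:
        grid = np.linspace(-0.9, 0.9, 19)   # step of 0.1; any spacing works
    best_rho, best_rss = grid[0], np.inf
    for rho in grid:
        ys = y[1:] - rho * y[:-1]
        xs = x[1:] - rho * x[:-1]
        Z = np.column_stack([np.ones(len(ys)), xs])
        b, *_ = np.linalg.lstsq(Z, ys, rcond=None)
        r = ys - Z @ b
        if r @ r < best_rss:
            best_rho, best_rss = rho, r @ r
    return best_rho

# Illustrative data with AR(1) errors (true rho = 0.6):
rng = np.random.default_rng(3)
T, true_rho = 500, 0.6
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = true_rho * u[t - 1] + e[t]
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + u
print(hildreth_lu(y, x))   # a grid point near 0.6
```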
4.3. Heteroskedasticity
4.3.1. Introduction
Spherical disturbances: One of the assumptions of the CRM is that the
disturbances are spherical, meaning they have uniform variance and are
uncorrelated with each other. The case of non-uniform variances is known as
heteroskedasticity.
The variance-covariance of the error terms can be represented in a matrix form
with n columns and n rows. The diagonal terms of the matrix represent the
variances of the individual disturbances and the off-diagonal terms represent
the covariance between them.
If all the diagonal terms are the same, the disturbances are said to have
uniform variance (homoskedastic) and if they are not the same they are said to
be heteroskedastic. This means the disturbance term is thought of as being
drawn from a different distribution for each observation.
If the disturbances are homoskedastic, their variance-covariance matrix can be
written as:
E(εε') = σ²I

Where, I is an n×n identity matrix (with 1's along the diagonal and 0's
off the diagonal).
Heteroskedasticity is usually associated with cross sectional observations and
not time series.
4.3.3. Consequences of Heteroskedasticity
1) OLS is still linear and unbiased
The fact that the parameter estimators are unbiased can be seen as
follows:
β̂ = Σxᵢyᵢ / Σxᵢ² = β + Σxᵢεᵢ / Σxᵢ²

E(β̂) = β + ΣxᵢE(εᵢ) / Σxᵢ² = β
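A small Monte Carlo experiment illustrates the point: even when the error variance differs across observations, the OLS slope estimator is unbiased. All numbers below are illustrative.

```python
import numpy as np

# Monte Carlo: the OLS slope stays unbiased under heteroskedasticity.
rng = np.random.default_rng(7)
true_beta, n, reps = 2.0, 100, 2000
x = rng.uniform(1.0, 5.0, size=n)
sigma = 0.5 * x                    # error std. dev. grows with x
xd = x - x.mean()
estimates = np.empty(reps)
for r in range(reps):
    eps = rng.normal(scale=sigma)  # heteroskedastic disturbances
    y = 1.0 + true_beta * x + eps
    # slope estimator in deviation form: sum(x_i y_i) / sum(x_i^2)
    estimates[r] = (xd @ y) / (xd @ xd)

print(estimates.mean())   # close to the true slope of 2
```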
If the disturbances are distributed normally, the statistic ESS₂/ESS₁ will
have an F distribution with (n − d − 4)/2 d.f. in both the numerator and
denominator (where d is the number of central observations omitted).

If the calculated F value is greater than the tabulated value, reject
the null hypothesis of homoskedasticity (same variance).
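A rough sketch of this test on simulated data (the fraction of central observations dropped and the data-generating process are illustrative assumptions):

```python
import numpy as np

def goldfeld_quandt(y, x, drop_frac=0.2):
    """Order the observations by x, omit the middle d observations,
    fit OLS on each remaining half, and return F = ESS2/ESS1
    (ESS2 from the half suspected of having the larger variance)."""
    order = np.argsort(x)
    y, x = y[order], x[order]
    n = len(y)
    d = int(drop_frac * n)
    lo = (n - d) // 2

    def ess(ys, xs):
        Z = np.column_stack([np.ones(len(ys)), xs])
        b, *_ = np.linalg.lstsq(Z, ys, rcond=None)
        r = ys - Z @ b
        return r @ r

    return ess(y[lo + d:], x[lo + d:]) / ess(y[:lo], x[:lo])

# Illustrative data whose error variance grows with x:
rng = np.random.default_rng(11)
n = 200
x = np.linspace(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3 * x)
F = goldfeld_quandt(y, x)
print(F)   # well above 1, signalling heteroskedasticity
```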
Breusch-Pagan test
Does not require ordering of observations but requires the assumption of
normality.
It requires the specification of the relationship between the true error
variance σᵢ² and independent variable(s) Zᵢ.
To conduct the test:
a) Calculate the least squares residuals from the original regression
equation.
Yᵢ = β₀ + β₁Xᵢ + εᵢ
b) Estimate the error variance:

σ̂² = Σε̂ᵢ² / n
c) Run the regression

ε̂ᵢ² / σ̂² = α₀ + α₁Zᵢ + vᵢ
d) After obtaining the regression sum of squares (RSS) from the above
equation, calculate the following statistic:

RSS/2 ~ χ²(1)

When there are p independent variables, the relevant test would be:

RSS/2 ~ χ²(p)
If the calculated χ² value is greater than the tabulated (critical)
value, we reject the null hypothesis of homoskedasticity.
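Steps (a)-(d) can be sketched on simulated data as follows (Z is taken to be the single regressor x; all numbers are illustrative):

```python
import numpy as np

# Simulated data with error variance driven by x.
rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)

# (a) least squares residuals from the original regression
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

# (b) estimate of the error variance: sum(e_i^2)/n
sigma2 = e @ e / n

# (c) regress e_i^2 / sigma2 on Z (here Z = x)
g = e**2 / sigma2
a, *_ = np.linalg.lstsq(X, g, rcond=None)
fitted = X @ a
rss = ((fitted - g.mean()) ** 2).sum()   # regression (explained) sum of squares

# (d) RSS/2 is chi-square(1) under homoskedasticity; 3.84 is the 5% critical value
stat = rss / 2
print(stat, stat > 3.84)
```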
ε̂ᵢ² = α₀ + α₁Zᵢ + vᵢ

Compute:

nR² ~ χ²(1)

When there are p independent variables:

nR² ~ χ²(p)
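A sketch of the nR² version on simulated data (again taking Z to be the single regressor; all values are illustrative):

```python
import numpy as np

# Simulated heteroskedastic data.
rng = np.random.default_rng(9)
n = 300
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ b) ** 2

# auxiliary regression of the squared residuals on Z (here Z = x)
a, *_ = np.linalg.lstsq(X, e2, rcond=None)
resid = e2 - X @ a
r2 = 1 - (resid @ resid) / ((e2 - e2.mean()) @ (e2 - e2.mean()))

stat = n * r2   # compared with chi-square(1); 3.84 at the 5% level
print(stat, stat > 3.84)
```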