
CHAPTER 4: VIOLATIONS OF THE ASSUMPTIONS OF CLASSICAL LINEAR REGRESSION MODELS


4.1. Multicollinearity

4.1.1. Introduction
 One of the assumptions of the CLR model is that there are no exact linear
relationships among the independent variables and that there are at least as
many observations as there are parameters to be estimated (the rank condition). If either
of these is violated, the OLS estimates cannot be computed and the estimating procedure
simply breaks down.

 In estimation the number of observations should be greater than the number of
parameters to be estimated. The difference between the sample size and the number
of parameters (the degrees of freedom) should be as large as possible.

 In regression there could be an approximate linear relationship among the independent
variables.

 Even though the estimation procedure might not entirely break down when the
independent variables are highly correlated, severe estimation problems might
arise.

 There could be two types of multicollinearity problems: Perfect and less than
perfect collinearity.

 If multicollinearity is perfect, the regression coefficients of the X variables are


indeterminate and their standard errors infinite.

 If multicollinearity is less than perfect, the regression coefficients, although
determinate, possess large standard errors, which means the coefficients cannot
be estimated with great precision.

4.1.2. Sources of multicollinearity


1. The data collection method employed: For instance, sampling over a limited
range.

2. Model specification: For instance adding polynomial terms.

3. An over determined model: This happens when the model has more
explanatory variables than the number of observations. This could happen in
medical research where there may be a small number of patients about whom
information is collected on a large number of variables.

4. In time series data, the regressors may share the same trend.

4.1.3. Consequences of Multicollinearity


1. Although BLUE, the OLS estimators have larger variances, making precise
estimation difficult. OLS estimators remain BLUE because near collinearity does not
violate the assumptions of the classical model.
When the independent variables are uncorrelated, the correlation coefficient between
them is zero. However, when the correlation coefficient becomes high (close to 1) in
absolute value, multicollinearity is present, with the result that the estimated variances
of both parameters become very large.

While the estimated parameter values remain unbiased, the reliance we can place on the
value of one or the other will be small. This presents a problem if we believe that one
or both of the variables ought to be in the model, but we cannot reject the null
hypothesis because of the large standard errors. In other words, the presence of
multicollinearity reduces the precision of the OLS estimators.

2. The confidence intervals tend to be much wider, leading more readily to acceptance of
the null hypothesis.

3. The t-ratios may tend to be insignificant even though the overall coefficient of
determination is high.

4. The OLS estimators and their standard errors could be sensitive to small changes in
the data.

4.1.4. Detection of Multicollinearity


The presence of multicollinearity makes it difficult to separate the individual effects
of the collinear variables on the dependent variable. Explanatory variables are rarely
uncorrelated with each other and multicollinearity is a matter of degree.

1) A relatively high R² and a significant F-statistic with few significant t-statistics.

2) Wrong signs of the regression coefficients

3) Examination of partial correlation coefficients among the independent


variables.

4) Use subsidiary or auxiliary regressions. This involves regressing each
independent variable on the remaining independent variables and using an F-test to
determine the significance of the resulting R²:

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$

5) Using the VIF (variance inflation factor):

$$VIF = \frac{1}{1 - R^2}$$

Where, R² is the coefficient of determination from the auxiliary regression of the
independent variable in question on the remaining independent variables.

A VIF greater than 10 is commonly used to indicate the presence of multicollinearity
between continuous variables, as sketched below.
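As an illustration of devices (4) and (5), the following is a minimal sketch that computes the auxiliary-regression R², the corresponding F-statistic, and the VIF for each regressor, using Python's statsmodels. The data are simulated placeholders, not taken from the notes.

```python
import numpy as np
import statsmodels.api as sm

# Simulated placeholder data: x2 is nearly collinear with x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=100)
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])
n = X.shape[0]

for j in range(X.shape[1]):
    # Auxiliary regression of regressor j on the remaining regressors
    others = sm.add_constant(np.delete(X, j, axis=1))
    r2 = sm.OLS(X[:, j], others).fit().rsquared
    k = others.shape[1]                          # parameters in the auxiliary regression
    f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    vif = 1.0 / (1.0 - r2)                       # VIF = 1 / (1 - R^2)
    print(f"x{j + 1}: aux R^2 = {r2:.3f}, F = {f_stat:.1f}, VIF = {vif:.1f}")
```

statsmodels also ships a variance_inflation_factor helper in statsmodels.stats.outliers_influence that performs essentially the same calculation.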

When the variables to be investigated are discrete in nature, the Contingency
Coefficient (CC) is used:

2
CC 
N  2
Where, N is the total sample size

If CC is greater than 0.75, the variables are said to be collinear.
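A minimal sketch of the contingency coefficient for two discrete explanatory variables, assuming the χ² statistic comes from their cross-tabulation; the table below is invented for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented 2x3 cross-tabulation of two discrete explanatory variables
table = np.array([[30, 20, 10],
                  [10, 25, 35]])

chi2, p_value, dof, expected = chi2_contingency(table)
N = table.sum()                       # total sample size
cc = np.sqrt(chi2 / (N + chi2))       # contingency coefficient
print(f"CC = {cc:.3f}")               # CC > 0.75 is taken to signal collinearity
```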

4.1.5. Remedies of Multicollinearity


Several methodologies have been proposed to overcome the problem of
multicollinearity.

1) Do nothing: Sometimes multicollinearity is not necessarily harmful, or it is
unavoidable. If the R² of the regression exceeds the R² of the regression of
any independent variable on the other independent variables, there is not much
cause for worry. Also, if the t-statistics are all greater than 2 there should not be
much of a problem. If the estimated equation is used for prediction and the
multicollinearity problem is expected to prevail in the situation to be
predicted, we should not be much concerned about multicollinearity.

2) Drop a variable(s) from the model: This however could lead to


specification error.

3) Acquiring additional information: Multicollinearity is a sample problem.


In a sample involving another set of observations, multicollinearity might
not be present. Also, increasing the sample size would help to reduce the
severity of the collinearity problem.

4) Rethinking of the model: Incorrect choice of functional form, specification
errors, etc…

5) Prior information about some parameters of a model could also help to get
rid of multicollinearity.

6) Transformation of variables: e.g. in to logarithms, forming ratios, etc…

7) Use partial correlation and stepwise regression

 This involves determining the relationship between the dependent variable
and an independent variable while netting out the effect of the other independent
variable(s).

4.2. Autocorrelation

4.2.1. Introduction
 One of the assumptions of the OLS is that successive values of the error terms
are independent or are not related.
$$\text{cov}(u_i, u_j) = E(u_i u_j) = 0 \quad \text{for } i \neq j$$

 If this assumption is not satisfied and the value of any of the disturbance terms
is related to its preceding value, then there is autocorrelation (serial
correlation).

 In this chapter we will concentrate on first order serial correlation:

$$u_t = \rho u_{t-1} + \varepsilon_t$$
4.2.2. Sources of Autocorrelation
a) Inertia: The momentum built into economic data continues until something
happens to change it. Thus successive observations in time series data are likely to be interdependent.
b) Specification Bias
i) Excluded variable(s)
ii) Incorrect functional form
c) Cobweb Phenomenon: Agricultural supply reacts to price with a lag of one
time period because supply decisions take time to implement. E.g., at the beginning
of this year's planting of crops, farmers are influenced by the prices that prevailed
last year.
d) Lags: For various reasons the behavior of some economic variables does not
change readily. E.g. Consumers do not change their consumption habits
readily for psychological, technological or institutional reasons.

$$C_t = \beta_1 + \beta_2 \text{Income}_t + \beta_3 C_{t-1} + u_t$$
If the lagged term is neglected, the error term will reflect a systematic pattern.
e) Manipulating data: e.g. taking averages, interpolation, extrapolation
f) Data transformation.

4.2.3. Consequences of Autocorrelation


 Autocorrelation does not affect the linearity, consistency and unbiasedness of
OLS estimators.
 Under the existence of autocorrelation, OLS estimators do not have minimum
variance (not efficient)
 OLS estimators might fit the data more closely than the true regression thus
reducing the standard errors of the estimates.
 Invalid statistical inferences. The usual t and F statistics are not reliable.

 The formula used to compute the error variance, $\hat{\sigma}^2 = \dfrac{ESS}{d.f.}$, is a biased estimator of
the true error variance (it might underestimate it in most cases). As a
consequence the estimated R² could be unreliable.

4.2.4. Detection and Tests for serial Correlation

Durbin-Watson Test:

$$H_0: \rho = 0 \quad (\text{no autocorrelation})$$

$$H_1: \rho \neq 0$$
 The Durbin-Watson test involves the calculation of a test statistic based on the
residuals from the OLS regression procedure.

$$DW = \frac{\sum_{t=2}^{T} \left(\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1}\right)^2}{\sum_{t=1}^{T} \hat{\varepsilon}_t^2}$$

 The DW statistic lies between 0 and 4.

DW near 2: no first-order serial correlation
DW < 2: positive serial correlation
DW > 2: negative serial correlation

 By making several approximations, the d statistic can be written as:

$$DW \approx 2(1 - \hat{\rho})$$

Thus, when there is no serial correlation ($\hat{\rho} = 0$), the DW statistic will be close to 2.
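The DW statistic is easy to compute from the OLS residuals. The sketch below does it both directly from the formula and with statsmodels' durbin_watson helper; the data are simulated placeholders.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulated placeholder time series
rng = np.random.default_rng(1)
x = rng.normal(size=80)
y = 1.0 + 2.0 * x + rng.normal(size=80)

resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# DW = sum_{t=2}^{T} (e_t - e_{t-1})^2 / sum_{t=1}^{T} e_t^2
dw_manual = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(dw_manual, durbin_watson(resid))   # a value near 2 suggests no first-order serial correlation
```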

Graphical Method
 This involves the visual inspection of the pattern taken by the residuals by
plotting them against time.

4.2.5. Corrections for Autocorrelation

The Cochrane-Orcutt Procedure

 It involves a series of iterations, each of which produces a better estimate of
$\rho$ than the previous one. The estimated $\rho$ is used in the generalized
differencing transformation.

1) Estimate the original regression equation using OLS.


$$Y_t = \beta_0 + \beta_1 X_{1t} + \varepsilon_t$$
2) By using the residuals of the above equation, estimate the following regression:

$$\hat{\varepsilon}_t = \hat{\rho}\,\hat{\varepsilon}_{t-1} + v_t$$
3) The estimated  value is used in the estimation of the following generalized
difference transformed equation.

$$Y_t^* = \beta_0(1 - \hat{\rho}) + \beta_1 X_{1t}^* + \dots + \beta_k X_{kt}^*$$

Where $Y_t^* = Y_t - \hat{\rho}\, Y_{t-1}$ and $X_{kt}^* = X_{kt} - \hat{\rho}\, X_{k,t-1}$.

4) The estimated transformed equation yields estimates of the parameters of the
original equation. These revised parameter estimates are substituted into the original
equation and new regression residuals are obtained.

5) This procedure continues until the new estimate of $\rho$ differs from the old
one by less than 0.01 or 0.005, or until 10 to 20 estimates of $\rho$ have been
obtained.
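A minimal sketch of steps 1 to 5, written directly from the description above; the function name, tolerance, and iteration cap are illustrative choices, not part of the notes.

```python
import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(y, X, tol=0.01, max_iter=20):
    """Sketch of the Cochrane-Orcutt iterations (steps 1-5 above)."""
    y = np.asarray(y, dtype=float)
    X = sm.add_constant(np.asarray(X, dtype=float))
    beta = sm.OLS(y, X).fit().params              # step 1: OLS on the original equation
    rho = 0.0
    for _ in range(max_iter):
        resid = y - X @ beta                      # residuals of the original equation
        rho_new = sm.OLS(resid[1:], resid[:-1]).fit().params[0]   # step 2: e_t on e_{t-1}
        if abs(rho_new - rho) < tol:              # step 5: stop once rho settles
            rho = rho_new
            break
        rho = rho_new
        y_star = y[1:] - rho * y[:-1]             # step 3: generalized differencing
        X_star = X[1:] - rho * X[:-1]             # the constant column becomes (1 - rho)
        beta = sm.OLS(y_star, X_star).fit().params   # step 4: revised parameter estimates
    return beta, rho
```

statsmodels' GLSAR class implements a closely related iterative feasible-GLS estimator and may be preferable in practice.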

The Hildreth-Lu Procedure


 This involves selecting grid values of $\rho$ such as 0, 0.1, 0.2, 0.3, …, 1.0 and
estimating the transformed regression equation for each value of $\rho$:

$$Y_t^* = \beta_0(1 - \rho) + \beta_1 X_{1t}^* + \dots + \beta_k X_{kt}^* + v_t$$

 The procedure selects the equation with the lowest residual sum of
squares as the best equation.
 In using this procedure we could choose any limits and any spacing
arrangement between the grids.
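A sketch of the grid search, using the same generalized-difference transformation as in the Cochrane-Orcutt sketch; the grid follows the 0, 0.1, …, 1.0 pattern mentioned above, and the function name is illustrative.

```python
import numpy as np
import statsmodels.api as sm

def hildreth_lu(y, X, grid=np.arange(0.0, 1.01, 0.1)):
    """Fit the transformed equation for each rho on the grid and keep the
    fit with the smallest residual sum of squares."""
    y = np.asarray(y, dtype=float)
    X = sm.add_constant(np.asarray(X, dtype=float))
    best_rho, best_fit = None, None
    for rho in grid:
        y_star = y[1:] - rho * y[:-1]
        X_star = X[1:] - rho * X[:-1]
        fit = sm.OLS(y_star, X_star).fit()
        if best_fit is None or fit.ssr < best_fit.ssr:   # ssr = residual sum of squares
            best_rho, best_fit = rho, fit
    return best_rho, best_fit.params
```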

4.3. Heteroskedasticity

4.3.1. Introduction
 Spherical disturbances: One of the assumptions of the CRM is that the
disturbances are spherical. This means the disturbances have uniform variance
and are not correlated with each other. Disturbances whose variance is not
uniform are said to be heteroskedastic.
 The variance-covariance of the error terms can be represented in a matrix form
with n columns and n rows. The diagonal terms of the matrix represent the
variances of the individual disturbances and the off-diagonal terms represent
the covariance between them.
 If all the diagonal terms are the same, the disturbances are said to have
uniform variance (homoskedastic) and if they are not the same they are said to
be heteroskedastic. This means the disturbance term is thought of as being
drawn from a different distribution for each observation.
 If the disturbances are homoskedastic, their variance-covariance matrix can be
written as:
$$E(\varepsilon \varepsilon') = \sigma^2 I$$
Where, $I$ is an $n \times n$ identity matrix (with 1's along the diagonal and 0's
off the diagonal).
 Heteroskedasticity is usually associated with cross sectional observations and
not time series.

4.3.2. Causes of Heteroskedasticity


 When we consider the specific product sales of firms, the sales of large firms
are usually more volatile than those of small firms. Also, consumption
expenditure studies show that the consumption expenditure of high-income
individuals is relatively volatile compared to that of low-income individuals.
 Outliers: an outlier is an observation drawn from a different population than the
one generating the remaining sample observations.
 Violation of the assumptions of the CLRM, specifically specification error
(an omitted variable). In such a case the residuals obtained from the regression
may give the distinct impression that the error variances are not constant.
 Skewness: the distribution of some variables such as income, wealth, etc… is
skewed.
 Incorrect data transformation, incorrect functional form, etc…

4.3.3. Consequences of Heteroskedasticity
1) OLS is still linear and unbiased
 The fact that the parameter estimators are unbiased can be seen as
follows:

$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum x_i(\beta x_i + \varepsilon_i)}{\sum x_i^2} = \beta + \frac{\sum x_i \varepsilon_i}{\sum x_i^2}$$

$$E(\hat{\beta}) = \beta + \frac{E\left(\sum x_i \varepsilon_i\right)}{\sum x_i^2} = \beta$$

 This means that as we increase the number of observations, negative deviations from
the regression line will tend to be offset by positive deviations.
2) The formulas for obtaining OLS variances of the parameter estimates are biased.
 When heteroskedasticity is present, ordinary least squares estimation places
more weight on observations with large error variances than on those with smaller
variances. The weighting occurs because the sum of squared residuals
associated with large error variances is likely to be larger than the sum of squared
residuals associated with low error variances.

3) OLS estimators are inefficient (do not have minimum variance).


 This means they are no longer BLUE. Even though the OLS estimators remain
unbiased, increasing the sample size does not shrink the large deviations from the
regression line caused by the large error variances, so the estimated variances of the
OLS estimators remain inefficient.
4) When heteroskedasticity is present, the usual hypothesis tests and confidence
intervals based on the various test statistics do not hold and become unreliable leading
to wrong conclusions.

4.3.4. Detection of the Presence of Heteroskedasticity


Visual inspection of the residuals
 This involves plotting the residuals against the independent variable to which
the residuals are suspected to be related. This helps to check whether the
residuals show systematic variations with the independent variable(s).

The Goldfeld-Quandt test


 Observations are ordered by the magnitude of the independent variable
thought to be related to the variances of the disturbances.
 Two equal-sized groups are formed after a certain number of central
observations (d) are omitted.
 Separate regressions are run for each of the two groups and the ratio of their
sums of squared residuals is formed. Assuming the error variances are normally
distributed, the statistic $\dfrac{ESS_2}{ESS_1}$ has an F distribution with
$\dfrac{n - d - 4}{2}$ degrees of freedom in both the numerator and the denominator.
 If the calculated F value is greater than the tabulated value, reject
the null hypothesis of homoskedasticity (equal variances).
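A sketch of the Goldfeld-Quandt steps on simulated placeholder data; the choice of d and the variable used for ordering are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Simulated placeholder data: the error variance grows with x
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)

# 1) Order observations by the variable suspected to drive the variance
order = np.argsort(x)
x, y = x[order], y[order]

# 2) Omit d central observations and form two equal-sized groups
n, d = len(y), 20
low, high = slice(0, (n - d) // 2), slice((n + d) // 2, n)

# 3) Separate regressions; the ratio of residual sums of squares is F-distributed
ess1 = sm.OLS(y[low], sm.add_constant(x[low])).fit().ssr
ess2 = sm.OLS(y[high], sm.add_constant(x[high])).fit().ssr
df = (n - d - 4) / 2                      # d.f. in both numerator and denominator
F = ess2 / ess1
p_value = stats.f.sf(F, df, df)
print(f"F = {F:.2f}, p = {p_value:.4f}")  # small p => reject homoskedasticity
```

statsmodels also offers a ready-made het_goldfeldquandt function in statsmodels.stats.diagnostic.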

Breusch-Pagan test
 Does not require ordering of observations but requires the assumption of
normality.
 It requires the specification of the relationship between the true error variance
$\sigma_i^2$ and the independent variable(s) $Z$.
 To conduct the test:
a) Calculate the least squares residuals from the original regression
equation:

$$Y_i = \alpha_0 + \alpha_1 X_i + \varepsilon_i$$
b) Estimate the regression variance:

$$\hat{\sigma}^2 = \frac{\sum \hat{\varepsilon}_i^2}{n}$$
c) Run the regression:

$$\frac{\hat{\varepsilon}_i^2}{\hat{\sigma}^2} = \alpha_0 + \alpha_1 Z_i + v_i$$
d) After obtaining the regression sum of squares from the above equation,
calculate the following statistic
$$\frac{RSS}{2} \sim \chi^2_1$$
When there are $p$ independent variables, the relevant test statistic would be:

$$\frac{RSS}{2} \sim \chi^2_p$$
 If the calculated $\chi^2$ value is greater than the tabulated (critical)
value, we reject the null hypothesis of homoskedasticity.
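The four steps above translate directly into code. In the sketch below, RSS is the regression (explained) sum of squares of the auxiliary regression, which statsmodels exposes as the `ess` attribute; the data are simulated placeholders.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Simulated placeholder data with error variance related to Z
rng = np.random.default_rng(3)
z = rng.uniform(1, 5, size=120)
x = rng.normal(size=120)
y = 1.0 + 2.0 * x + rng.normal(scale=z)

# a) Residuals from the original regression
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# b) sigma_hat^2 = sum(e_i^2) / n
sigma2 = np.sum(resid ** 2) / len(resid)

# c) Regress the normalized squared residuals on Z
aux = sm.OLS(resid ** 2 / sigma2, sm.add_constant(z)).fit()

# d) Regression sum of squares divided by 2, compared with chi-square(1)
bp_stat = aux.ess / 2.0
p_value = stats.chi2.sf(bp_stat, df=1)
print(f"BP = {bp_stat:.2f}, p = {p_value:.4f}")   # small p => heteroskedasticity
```

A packaged version, het_breuschpagan, is also available in statsmodels.stats.diagnostic.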

The White test


 It does not require normally distributed error terms.
 So, instead of the normalized regression used above, we use the following:

$$\hat{\varepsilon}_i^2 = \alpha_0 + \alpha_1 Z_i + v_i$$

 Obtain the value of $R^2$ from this regression.

 Compute $nR^2 \sim \chi^2_1$.

 When there are $p$ independent variables: $nR^2 \sim \chi^2_p$.
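A sketch of the simplified White statistic described above (squared residuals regressed on Z, then nR² compared with a χ² critical value); the data are placeholders, and in the more general form of the test the auxiliary regression would also include squares and cross-products of the regressors.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Simulated placeholder data with error variance related to Z
rng = np.random.default_rng(4)
z = rng.uniform(1, 5, size=120)
x = rng.normal(size=120)
y = 1.0 + 2.0 * x + rng.normal(scale=z)

resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# Auxiliary regression of the (un-normalized) squared residuals on Z
aux = sm.OLS(resid ** 2, sm.add_constant(z)).fit()

white_stat = len(resid) * aux.rsquared            # n * R^2
p_value = stats.chi2.sf(white_stat, df=1)         # df = p, the number of Z variables (here 1)
print(f"nR^2 = {white_stat:.2f}, p = {p_value:.4f}")
```

statsmodels.stats.diagnostic also provides a het_white function implementing the full version of the test.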

4.3.5. What to do with heteroskedasticity


As we have seen, heteroskedasticity does not destroy the unbiasedness and
consistency properties of the OLS estimators, but they are no longer efficient. This
lack of efficiency makes the usual hypothesis testing procedures of dubious value.
Therefore, remedial measures concentrate on the variance of the error term.
Consider the model

$$Y_i = \alpha + \beta X_i + u_i, \quad \text{var}(u_i) = \sigma_i^2, \quad E(u_i) = 0, \quad E(u_i u_j) = 0 \text{ for } i \neq j$$

If we apply OLS to this model, the result will be inefficient parameter estimates since
$\text{var}(u_i)$ is not constant.
The remedial measure is transforming the above model so that the transformed model
satisfies all the assumptions of the classical regression model including
homoscedasticity. Applying OLS to the transformed variables is known as the
method of Generalized Least Squares (GLS). In short GLS is OLS on the
transformed variables that satisfy the standard least squares assumptions. The
estimators thus obtained are known as GLS estimators, and it is these estimators that
are BLUE.
To overcome the problem of heteroskedasticity, we can use an estimation method called
Weighted Least Squares (WLS).
However, we rarely know the true error variance associated with each observation.
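A sketch of WLS under the illustrative assumption that var(u_i) is proportional to Z_i²; with that assumption, WLS with weights 1/Z_i² and OLS on the transformed variables (each divided by Z_i) give the same estimates. The variable names and weighting scheme are assumptions for illustration only.

```python
import numpy as np
import statsmodels.api as sm

# Simulated placeholder data; assume var(u_i) is proportional to z_i**2
rng = np.random.default_rng(5)
z = rng.uniform(1, 5, size=120)
x = rng.normal(size=120)
y = 1.0 + 2.0 * x + rng.normal(scale=z)

X = sm.add_constant(x)

# WLS: observations with larger assumed variance receive less weight
wls_fit = sm.WLS(y, X, weights=1.0 / z ** 2).fit()

# GLS by hand: divide every variable (including the constant) by z, then run OLS
gls_fit = sm.OLS(y / z, X / z[:, None]).fit()

print(wls_fit.params, gls_fit.params)   # identical parameter estimates
```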
