
CHAPTER IV: Violations of assumptions of the CLRM

4.1 Multicollinearity
When discussing the suitability of a model, an important issue is the interaction among
the independent variables. The statistical term used for the problem that arises due to
high correlations among the independent variables in a multiple regression model is
multicollinearity. Technically, multicollinearity is caused by independent variables in
a regression model that contain common information (or are highly inter-correlated).
The presence of such highly correlated independent variables makes it difficult to
isolate the contribution that each independent variable individually makes to the
regression.

In a very extreme case, two or more variables may be perfectly correlated, which would
imply that some explanatory variables are merely linear combinations of others. The
result of this would be that some variables are fully explained by others and, thus,
provide no additional information. This is a very extreme case, however. In most
problems in finance, the independent variables are not perfectly correlated but may be
correlated to a high degree.

4.1.1 Consequences of perfect MC


We say there is perfect MC if two or more explanatory variables are perfectly
correlated with one another. In such cases, the OLS estimators of the regression
coefficients, as well as their standard errors, are indeterminate (undefined).

Example 1: Consider the multiple linear regression model involving two explanatory
variables \(X_2\) and \(X_3\):

\[ Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i \quad (1) \]

Expressing all variables in deviation form, that is, \(y_i = Y_i - \bar{Y}\),
\(x_{2i} = X_{2i} - \bar{X}_2\) and \(x_{3i} = X_{3i} - \bar{X}_3\), the OLS estimators
of \(\beta_2\) and \(\beta_3\) are given by:

\[ \hat{\beta}_2 = \frac{\left(\sum x_{2i} y_i\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{3i} y_i\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2} \quad (2) \]

\[ \hat{\beta}_3 = \frac{\left(\sum x_{3i} y_i\right)\left(\sum x_{2i}^2\right) - \left(\sum x_{2i} y_i\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2} \quad (3) \]

Suppose \(X_2\) and \(X_3\) are perfectly correlated, with \(X_2 = 2X_3\). The OLS
estimator of \(\beta_2\) becomes (replacing \(x_2\) by \(2x_3\)):

\[ \hat{\beta}_2 = \frac{\left(\sum 2x_3 y\right)\left(\sum x_3^2\right) - \left(\sum x_3 y\right)\left(\sum 2x_3^2\right)}{\left(\sum (2x_3)^2\right)\left(\sum x_3^2\right) - \left(\sum 2x_3^2\right)^2} = \frac{2\left(\sum x_3 y\right)\left(\sum x_3^2\right) - 2\left(\sum x_3 y\right)\left(\sum x_3^2\right)}{4\left(\sum x_3^2\right)^2 - 4\left(\sum x_3^2\right)^2} = \frac{0}{0} \]

Thus, \(\hat{\beta}_2\) is indeterminate. It can also be shown that \(\hat{\beta}_3\) is
indeterminate. Therefore, in the presence of perfect MC, the regression coefficients
cannot be estimated.

We have seen in Chapter 3 that the standard errors of \(\hat{\beta}_2\) and
\(\hat{\beta}_3\) are estimated by:

\[ se(\hat{\beta}_2) = \sqrt{\frac{\hat{\sigma}^2}{(1 - r_{23}^2)\sum x_{2i}^2}} \quad (4) \]

\[ se(\hat{\beta}_3) = \sqrt{\frac{\hat{\sigma}^2}{(1 - r_{23}^2)\sum x_{3i}^2}} \quad (5) \]

where \(r_{23}\) is the coefficient of correlation between \(X_2\) and \(X_3\), and
\(\hat{\sigma}^2\) is the residual variance.

The relationship \(x_{2i} = 2x_{3i}\) implies that \(X_2\) and \(X_3\) are perfectly
correlated, and consequently, \(r_{23} = 1\) (so \(r_{23}^2 = 1\)). In this case, the
expression \((1 - r_{23}^2)\) in the denominators of equations (4) and (5) equals zero.
Thus, the standard error estimators involve division by zero, and both standard errors
are likewise indeterminate.
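The indeterminacy can also be seen in matrix terms: under perfect MC the cross-product
matrix \(X'X\) is singular, so the OLS normal equations have no unique solution. A
minimal numerical sketch in Python (the data values are arbitrary, chosen only so that
\(X_2 = 2X_3\) holds exactly):

import numpy as np

# Arbitrary illustrative data in which X2 is exactly twice X3 (perfect MC)
x3 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2.0 * x3
X = np.column_stack([np.ones_like(x3), x2, x3])  # design matrix with intercept

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # prints 2, not 3: X'X is singular
# np.linalg.inv(XtX) raises LinAlgError here, so the normal equations
# (X'X)b = X'y have no unique solution -- the estimators are indeterminate,
# exactly as the 0/0 expression above shows.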

4.1.2 Consequences of strong (severe) MC


Strong multicollinearity is much more likely to occur in practice, and would arise when
there is a non-negligible, but not perfect, relationship between two or more
explanatory variables.

Example 2: Again consider model (1) above. Strong MC between \(X_2\) and \(X_3\) means
that \(r_{23}\) will be a number close to 1 or -1 (but not equal to ±1, for that would
mean perfect MC). In such cases:
- \(r_{23}^2\) approaches one
- \((1 - r_{23}^2)\) approaches zero
- \((1 - r_{23}^2)\sum x_{2i}^2\) and \((1 - r_{23}^2)\sum x_{3i}^2\) both become very small
- \(se(\hat{\beta}_2)\) and \(se(\hat{\beta}_3)\) both become very large (are inflated)

Recall that the test statistic for testing \(H_0: \beta_j = 0\) versus
\(H_1: \beta_j \neq 0\) is given by:

\[ t_j = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}, \quad j = 2, 3 \]

The decision rule is to reject the null hypothesis when \(t_j\) is 'large' in absolute
value, that is, when \(|t_j| > t_{\alpha/2}(n-3)\). Since the standard errors (the
denominators) will be very large due to strong multicollinearity, the test statistic
will be very small (tend toward zero). This often leads to incorrectly failing to
reject the null hypothesis (a Type II error)!

In general, due to strong multicollinearity, the standard errors of the regression
coefficients will be inflated, the t-ratios will be too small, and consequently, all or
some of the independent variables will appear insignificant. Note that this may happen
even though the model appears to be doing well overall (that is, a large \(R^2\) and a
significant F-ratio in the ANOVA).

Example 3: The following data (from mainstream economics) pertain to imports (Y),
GDP (\(X_2\)), stock formation (\(X_3\)) and consumption (\(X_4\)) for the years 1949-1967.

Year Y X2 X3 X4 Year Y X2 X3 X4
1949 15.9 149.3 4.2 108.1 1959 26.3 239.0 0.7 167.6
1950 16.4 161.2 4.1 114.8 1960 31.1 258.0 5.6 176.8
1951 19.0 171.5 3.1 123.2 1961 33.3 269.8 3.9 186.6
1952 19.1 175.5 3.1 126.9 1962 37.0 288.4 3.1 199.7
1953 18.8 180.8 1.1 132.1 1963 43.3 304.5 4.6 213.9
1954 20.4 190.7 2.2 137.7 1964 49.0 323.4 7.0 223.8
1955 22.7 202.1 2.1 146.0 1965 50.3 336.8 1.2 232.0
1956 26.5 212.4 5.6 154.1 1966 56.6 353.9 4.5 242.9
1957 28.1 226.1 5.0 162.3 1967 59.9 369.7 5.0 252.0
1958 27.6 231.9 5.1 164.3

Applying OLS, we obtain the following results (using EViews):


Dependent Variable: IMPORTS
Variable Coefficient Std. Error t-Statistic Prob.
C -19.98241 4.372139 -4.570396 0.0004
GDP 0.099763 0.193641 0.515194 0.6139
STOCK_FORMATION 0.446650 0.341174 1.309157 0.2102
CONSUMPTION 0.148789 0.296877 0.501181 0.6235
R-squared 0.975354
F-statistic 197.8734
Prob(F-statistic) 0.000000

The value of \(R^2\) is close to 1, meaning GDP, stock formation and consumption
together explain 97.5% of the variation in imports. Also, the F-statistic is
significant at the 1% level of significance (since Prob(F-statistic) = 0.000 < 0.01),
an indication that at least one of the explanatory variables is significant.

However, the regression coefficients corresponding to GDP, stock formation and
consumption are all insignificant at the 5% level of significance (the p-values all
exceed 0.05). This paradox (contradiction) is an indication that the standard errors
are inflated due to MC. Since an increase in GDP is often associated with an increase
in consumption, the two tend to grow together over time, leading to MC. The coefficient
of correlation between GDP and consumption is 0.999. Thus, it seems that the problem of
MC is due to the joint appearance of these two variables.
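For readers who want to reproduce these numbers outside EViews, the following is a
sketch using Python's statsmodels package (an assumption about tooling; any OLS routine
would do), with the data keyed in from the table above:

import numpy as np
import statsmodels.api as sm

# Data from the table above (1949-1967)
imports = np.array([15.9, 16.4, 19.0, 19.1, 18.8, 20.4, 22.7, 26.5, 28.1, 27.6,
                    26.3, 31.1, 33.3, 37.0, 43.3, 49.0, 50.3, 56.6, 59.9])
gdp = np.array([149.3, 161.2, 171.5, 175.5, 180.8, 190.7, 202.1, 212.4, 226.1,
                231.9, 239.0, 258.0, 269.8, 288.4, 304.5, 323.4, 336.8, 353.9, 369.7])
stock = np.array([4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1,
                  0.7, 5.6, 3.9, 3.1, 4.6, 7.0, 1.2, 4.5, 5.0])
cons = np.array([108.1, 114.8, 123.2, 126.9, 132.1, 137.7, 146.0, 154.1, 162.3,
                 164.3, 167.6, 176.8, 186.6, 199.7, 213.9, 223.8, 232.0, 242.9, 252.0])

X = sm.add_constant(np.column_stack([gdp, stock, cons]))
res = sm.OLS(imports, X).fit()
print(res.summary())                 # large R^2, significant F, small t-ratios
print(np.corrcoef(gdp, cons)[0, 1])  # about 0.999: GDP and consumption collinear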

4.1.3 Methods of detection of MC


Some degree of multicollinearity exists in almost all applications. So the question is
not whether it is present, but rather a question of degree! Also, MC is not a statistical
problem; it is rather a data (sample) problem. Therefore, we do not "test for MC" but
measure its degree in any particular sample using some rules of thumb.

Some of the methods of detecting MC are:

1. High \(R^2\) but few (or no) significant regression coefficients.

2. High pair-wise correlations among explanatory variables. Note that this is a
sufficient but not a necessary condition: small pair-wise correlations for all pairs of
regressors do not guarantee the absence of MC. In other words, there could still be
strong MC due to three or more explanatory variables that are jointly collinear.

3. Variance inflation factor (VIF): Consider the following regression model:

\[ Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + \varepsilon_i \quad (6) \]

The VIF of \(\hat{\beta}_j\) is defined as:

\[ VIF(\hat{\beta}_j) = \frac{1}{1 - R_j^2}, \quad j = 2, 3, \ldots, k \]

where \(R_j^2\) is the coefficient of determination from a regression of the jth
explanatory variable (\(X_j\)) on the remaining explanatory variables. For example, the
VIF of \(\hat{\beta}_2\) is defined as:

\[ VIF(\hat{\beta}_2) = \frac{1}{1 - R_2^2} \]

where \(R_2^2\) is the coefficient of determination from the auxiliary regression:

\[ X_{2i} = \alpha_1 + \alpha_3 X_{3i} + \alpha_4 X_{4i} + \ldots + \alpha_k X_{ki} + u_i \quad (7) \]

Rule of thumb: The benchmark (threshold) for the VIF is often given as 10. Thus, if
\(VIF(\hat{\beta}_j)\) exceeds 10, then \(\hat{\beta}_j\) is poorly estimated because of
MC, or the jth regressor (\(X_j\)) is responsible for MC.

Note: Some recent literature uses a threshold of 5 (instead of 10).

Example 4: Consider the data in Example 3. The auxiliary regression of GDP (\(X_2\)) on
stock formation (\(X_3\)) and consumption (\(X_4\)) is given by:

\[ X_{2i} = \alpha_1 + \alpha_3 X_{3i} + \alpha_4 X_{4i} + u_i \]

The coefficient of determination from this auxiliary regression (using EViews) is found
to be \(R_2^2 = 0.998203\). The VIF of \(\hat{\beta}_2\) is thus:

\[ VIF(\hat{\beta}_2) = \frac{1}{1 - R_2^2} = \frac{1}{1 - 0.998203} = 556.5799 \]

Since this figure far exceeds 10, we can conclude that the coefficient of GDP is
poorly estimated because of MC (or that GDP is responsible for MC).

The variance inflation factors for all explanatory variables (EViews output) are shown
below:

Variance Inflation Factors

Variable           Coefficient Variance   Uncentered VIF   Centered VIF
C                  19.11560               63.32197         NA
GDP                0.037497               7980.397         556.5817
STOCK_FORMATION    0.116400               6.494439         1.079783
CONSUMPTION        0.088136               9176.249         555.8980

We can see that the (centered) VIF’s corresponding to GDP and consumption are
greater than 10. This is an indication that these two variables are responsible for MC.
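The same diagnostics can be reproduced outside EViews; a sketch using statsmodels'
variance_inflation_factor, reusing the design matrix X (constant in column 0) from the
Example 3 sketch above:

from statsmodels.stats.outliers_influence import variance_inflation_factor

# variance_inflation_factor regresses column j of X on the remaining columns
# and returns 1 / (1 - R_j^2), i.e. the (centered) VIF of that regressor.
for j, name in enumerate(["GDP", "STOCK_FORMATION", "CONSUMPTION"], start=1):
    print(name, variance_inflation_factor(X, j))
# GDP and CONSUMPTION come out far above the rule-of-thumb threshold of 10.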

4.1.4 Remedial measures


To circumvent the problem of MC, some of the possibilities are:

1. Include additional observations, maintaining the original model, so that a
reduction in the correlation among the variables is attained.

2. Drop one of the collinear variables: Removing an explanatory variable that is
responsible for strong MC will surely solve the problem. However, this may not be
acceptable if there are strong theoretical reasons for including both variables, since
dropping one would then amount to model misspecification.
Example 5: Consider Example 4. The variance inflation factors indicated that the joint
appearance of GDP and Consumption has caused MC. Thus, one possibility is to drop
Consumption from the regression model. After dropping Consumption, the results are as
shown below:

Dependent Variable: IMPORTS


Variable Coefficient Std. Error t-Statistic Prob.
C -18.08911 2.148980 -8.417536 0.0000
GDP 0.196718 0.008318 23.64925 0.0000
STOCK_FORMATION 0.438727 0.332737 1.318542 0.2059
R-squared 0.974941
F-statistic 311.2514
Prob(F-statistic) 0.000000

The strange results that we saw earlier (the inconsistency between the F-test and the
t-tests) no longer prevail. Notice that the value of R-squared remains almost the same
after the removal of Consumption in the re-specified model. This is an indication that
Consumption provides no additional information (is redundant).

The variance inflation factors from the re-specified model are shown below. The VIF
for both GDP and Stock formation now becomes 1.077 (which is less than 10) – an
indication that the problem of multicollinearity has been properly dealt with.

Variance Inflation Factors

Variable           Coefficient Variance   Uncentered VIF   Centered VIF
C                  4.618114               16.04899         NA
GDP                6.92E-05               15.44894         1.077464
STOCK_FORMATION    0.110714               6.480497         1.077464

4.2 Violation of the assumption of normality of errors

One of the assumptions of the CLRM is that the disturbance terms are normally
distributed. The implication of a violation of this assumption is that the estimators
of the regression coefficients are not normally distributed either. Consequently, the
F-test of model adequacy and the t-tests of significance of the regression coefficients
will no longer be applicable.

The Jarque-Bera test is one of the common tests of normality. This test is based on the
coefficients of skewness and kurtosis of the residuals.

A distribution is said to be skewed to the right (or positively skewed) when most of
the data are concentrated on the left of the distribution, that is, the right tail
extends out farther from the distribution's center than the left tail. For such
distributions, the coefficient of skewness is greater than zero (positive). On the
other hand, if most of the data are concentrated on the right of the distribution, then
it is said to be skewed to the left (or negatively skewed). The coefficient of skewness
for a left-skewed distribution is less than zero (negative).

[Figure: (a) right-skewed distribution; (b) left-skewed distribution]

The other characteristic of a distribution is its peakedness, which is measured through
the coefficient of kurtosis. A distribution is said to be leptokurtic if it exhibits
fat tails and excess peakedness at the mean. This is a typical characteristic of most
financial time series (e.g., asset returns). Such distributions are said to have excess
kurtosis: a coefficient of kurtosis far exceeding three. The other type of distribution,
which has a smaller peak at the center, is called platykurtic. Such distributions have
a coefficient of kurtosis smaller than three.

[Figure: leptokurtic and platykurtic distributions]

The null and alternative hypotheses of the Jarque-Bera test of normality are:

H0: The errors are approximately normally distributed
H1: The errors are not normally distributed

For the normal distribution (which is bell-shaped and symmetric about the mean), the
coefficients of skewness and kurtosis are zero and three, respectively. The Jarque-Bera
test statistic compares the sample coefficients of skewness (S) and kurtosis (K) of the
residuals (\(\hat{\varepsilon}_i\)) with those of the normal distribution. The test
statistic is given by:

\[ JB = \frac{n}{6}\left[S^2 + \frac{(K - 3)^2}{4}\right] \quad (8) \]

where S and K are computed as:

\[ S = \frac{\frac{1}{n}\sum_{i=1}^{n}(\hat{\varepsilon}_i - \bar{\hat{\varepsilon}})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(\hat{\varepsilon}_i - \bar{\hat{\varepsilon}})^2\right]^{3/2}}, \qquad K = \frac{\frac{1}{n}\sum_{i=1}^{n}(\hat{\varepsilon}_i - \bar{\hat{\varepsilon}})^4}{\left[\frac{1}{n}\sum_{i=1}^{n}(\hat{\varepsilon}_i - \bar{\hat{\varepsilon}})^2\right]^{2}} \]

We reject the null hypothesis of normality if:

\[ JB > \chi_\alpha^2(2) \]

where \(\chi_\alpha^2(2)\) is the critical value from the Chi-square distribution with
two degrees of freedom for a level of significance \(\alpha\).
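A minimal sketch of the JB computation, following equation (8); the residual vector
here is a stand-in drawn from a normal distribution purely for illustration
(statsmodels also ships this test as statsmodels.stats.stattools.jarque_bera):

import numpy as np
from scipy import stats

def jarque_bera_stat(resid):
    # JB of equation (8), built from sample skewness S and kurtosis K
    e = np.asarray(resid) - np.mean(resid)
    n = len(e)
    m2 = np.mean(e**2)
    S = np.mean(e**3) / m2**1.5        # sample skewness (0 for a normal)
    K = np.mean(e**4) / m2**2          # sample kurtosis (3 for a normal)
    return n / 6.0 * (S**2 + (K - 3.0)**2 / 4.0)

rng = np.random.default_rng(0)
resid = rng.normal(size=200)                # stand-in residuals
jb = jarque_bera_stat(resid)
print(jb, jb > stats.chi2.ppf(0.95, df=2))  # normal draws: usually not rejected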

If normality is rejected, one possible reason is the presence of extreme values
(outliers) among the observed values of the variables in the regression. It might also
be the case that the assumed linear relationship between the dependent and the
independent variables is incorrect.

Example 6: Consider Example 3. EViews output of the normality test of the residuals for
the final regression model (consumption removed) is shown below:

[Figure: histogram of residuals]
Series: Residuals, Sample 1949-1967, Observations 19
Mean        -8.69e-15
Median       0.349307
Maximum      3.096290
Minimum     -4.021041
Std. Dev.    2.204495
Skewness    -0.378786
Kurtosis     2.083374
Jarque-Bera  1.119511
Probability  0.571349

The Jarque-Bera test statistic is 1.119511 with a p-value of 0.571349. Since the p-value
exceeds 0.05, we do not reject the null hypothesis of normality of residuals.

4.3 Autocorrelation of errors


Consider the following multiple linear regression model:

\[ Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \ldots + \beta_k X_{kt} + \varepsilon_t, \quad t = 1, 2, \ldots, T \quad (9) \]

where t denotes time and T is the sample size. One of the assumptions of the CLRM is:

\[ cov(\varepsilon_t, \varepsilon_{t-s}) = E(\varepsilon_t \varepsilon_{t-s}) = 0 \quad \text{for } s \neq 0 \]

This property of the regression disturbances is called non-autocorrelation or absence
of serial correlation. It tells us that the error term at time t is not correlated with
the error term at any other point in time; there is no carry-over effect from one period
to the next. In the case of cross-sectional data (such as data on the income and
expenditure of different households), this assumption is plausible, since the
expenditure behaviour of one household does not, in general, affect the expenditure
behaviour of any other household.

The assumption of non-autocorrelation is more frequently violated in relations
estimated from time series data. For instance, non-autocorrelation means that the fact
that the price of an asset is higher than expected today should not lead to a higher
(or lower) than expected price tomorrow. Generally, this is unrealistic, since future
prices are often correlated with current and past prices. Moreover, an underestimate of
one quarter's profits can result in an underestimate of profits for subsequent
quarters. Thus, the assumption of non-autocorrelation does not seem plausible here.

In general, there are a number of situations under which this assumption is violated,
that is, we have:

\[ cov(\varepsilon_t, \varepsilon_{t-s}) = E(\varepsilon_t \varepsilon_{t-s}) \neq 0 \quad \text{for } s \neq 0 \]

In such cases we say that the errors are autocorrelated or serially correlated. Here
\(\varepsilon_{t-s}\) is called a lagged value of \(\varepsilon_t\) (when \(s > 0\)).
The lagged value of a variable is simply the value that the variable took during a
previous period (e.g., \(\varepsilon_{t-1}\) is the value of \(\varepsilon_t\) lagged by
one period, \(\varepsilon_{t-2}\) is the value of \(\varepsilon_t\) lagged by two
periods, etc.).

4.3.1 Autoregressive processes

In order to study the behaviour of the disturbances under autocorrelation, we have to
specify the nature (mathematical form) of the relationship. Usually we assume that the
disturbances follow an autoregressive process of order one, denoted AR(1), which is
defined as:

\[ \varepsilon_t = \rho \varepsilon_{t-1} + u_t \quad (10) \]

where the \(u_t\)'s satisfy all assumptions of the CLRM (that is, they have zero mean,
constant variance and are uncorrelated). Here, each term is correlated with its
predecessor, so that the variance of the disturbances is partially explained by
regressing each term on its predecessor.

The parameter \(\rho\) in equation (10) is the correlation coefficient between
\(\varepsilon_t\) and \(\varepsilon_{t-1}\). The case where \(\rho > 0\) is known as
positive autocorrelation: if \(\varepsilon_{t-1}\) is positive (negative), then
\(\varepsilon_t\) is likely, on average, to be positive (negative) as well. In the case
of negative autocorrelation (\(\rho < 0\)), if the disturbance at time (t-1) is
positive, the disturbance at time t is likely to be negative, and vice versa. If
\(\rho = 0\), then there is no autocorrelation between \(\varepsilon_t\) and
\(\varepsilon_{t-1}\).
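Simulating an AR(1) disturbance process helps build intuition for the patterns the
tests below look for; a sketch (the parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(T, rho, sigma_u=1.0):
    # eps_t = rho * eps_{t-1} + u_t, with u_t i.i.d. N(0, sigma_u^2)
    eps = np.zeros(T)
    u = rng.normal(0.0, sigma_u, size=T)
    for t in range(1, T):
        eps[t] = rho * eps[t - 1] + u[t]
    return eps

eps = simulate_ar1(200, rho=0.8)             # positive autocorrelation
print(np.corrcoef(eps[1:], eps[:-1])[0, 1])  # sample estimate, close to 0.8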

4.3.2 Consequences of using OLS in the presence of autocorrelation


The consequence of ignoring autocorrelation when it is present is that the estimators of
the regression coefficients derived using OLS are still unbiased, but the standard error
estimates could be wrong. Thus, there is a possibility that wrong inferences could be
made about the significance of explanatory variables.

For simplicity, consider the simple linear regression model:

\[ Y_t = \alpha + \beta X_t + \varepsilon_t \quad (11) \]

Suppose there is positive serial correlation in the disturbances (which is often the
case in practice). If we proceed to estimate the regression coefficients using OLS,
then the standard error of the slope, \(se(\hat{\beta})\), will be under-estimated, and
consequently, the t-ratio will be large. This often leads to rejecting the null
hypothesis \(H_0: \beta = 0\) while it is true (a Type I error). Thus, the test of model
adequacy (F-test) and the tests of significance (t-tests) are invalid if there is
autocorrelation of errors.

4.3.3 Tests for the presence of autocorrelation

1. Graphical method
We first estimate the model using OLS and plot the residuals (\(\hat{\varepsilon}_t\))
against time. If neighbouring residuals cluster on one or the other side of the line
\(\hat{\varepsilon}_t = 0\), then such clustering is probably a sign that the errors
are autocorrelated. However, graphical methods may be difficult to interpret in
practice, and hence, formal statistical tests should also be applied.
[Figure: time plot of residuals showing clustering of residuals below and above the line \(\hat{\varepsilon} = 0\)]

2. Durbin-Watson (DW) test


Suppose the errors follow an autoregressive process of order one as specified in
equation (10): \(\varepsilon_t = \rho \varepsilon_{t-1} + u_t\). The null and
alternative hypotheses are:

\[ H_0: \rho = 0 \qquad H_1: \rho \neq 0 \]

The null hypothesis states that there is no serial correlation, since \(\rho = 0\)
would mean \(\varepsilon_t = u_t\), and the \(u_t\)'s are uncorrelated by definition.
Rejection of the null hypothesis is an indication that the errors are indeed
autocorrelated. The DW test statistic is computed as:

\[ d = \frac{\sum_{t=2}^{T} (\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^{T} \hat{\varepsilon}_t^2} \quad (12) \]

The numerator compares the values of the residuals at times (t-1) and t. If there is
positive autocorrelation in the errors, the successive residuals \(\hat{\varepsilon}_t\)
and \(\hat{\varepsilon}_{t-1}\) will frequently have the same sign (will lie on the same
side of the line \(\hat{\varepsilon} = 0\)). In this case, the differences
\((\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})\) in the numerator will be relatively
small, and consequently, the test statistic will be small. Thus, we reject the null
hypothesis when the test statistic (12) is 'small'.

The DW test is a non-standard test, that is, the test statistic does not follow a
standard statistical distribution such as the t-distribution, F-distribution or
Chi-square distribution. Instead, we refer to critical values developed specifically
for the DW test. Unlike the standard tests, we have two critical values, an upper
critical value \(d_U(\alpha)\) and a lower critical value \(d_L(\alpha)\), and there is
also an intermediate region where the test is inconclusive (the null hypothesis of no
autocorrelation is neither rejected nor not rejected).

Decision rule:
Reject H0 if \(d < d_L(\alpha)\)
Do not reject H0 if \(d > d_U(\alpha)\)
The test is inconclusive if \(d_L(\alpha) \leq d \leq d_U(\alpha)\)

[Figure: the DW decision rule shown diagrammatically]
Some of the limitations of the DW test are: there is a region where the test is
inconclusive; the test is valid only for the AR(1) error scheme; and the test is
invalid when lagged values of the dependent variable (e.g., \(Y_{t-1}\), \(Y_{t-2}\),
etc.) appear as regressors.
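A minimal sketch of the statistic in equation (12), assuming resid holds the OLS
residuals as an array (statsmodels also exposes it as
statsmodels.stats.stattools.durbin_watson):

import numpy as np

def durbin_watson_stat(resid):
    # d = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2  -- equation (12)
    e = np.asarray(resid)
    return np.sum(np.diff(e)**2) / np.sum(e**2)

# For uncorrelated residuals d is near 2; strong positive autocorrelation
# drives d toward 0. E.g., for the simulated AR(1) series from the earlier
# sketch: print(durbin_watson_stat(eps))  # well below 2 when rho = 0.8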

3. Breusch-Godfrey (BG) test


As stated above, the Durbin-Watson test is applicable only if the disturbances follow
an AR(1) process. The BG test is a generalization of the DW test that is applicable to
an autoregressive scheme of any order p (denoted AR(p), p = 1, 2, 3, ...). Suppose our
model is as specified in equation (11), and that the errors follow an AR(2) scheme,
which is defined by:

\[ \varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + u_t \quad (13) \]

where \(u_t\) fulfils all assumptions of the CLRM. The null hypothesis of no serial
correlation in the disturbances is given by:

\[ H_0: \rho_1 = \rho_2 = 0 \]

The steps involved in carrying out the BG test are:

i) Estimate equation (11) using OLS and obtain the residuals \(\hat{\varepsilon}_t\).
ii) Regress \(\hat{\varepsilon}_t\) on \(X_t\), \(\hat{\varepsilon}_{t-1}\) and
\(\hat{\varepsilon}_{t-2}\), that is, run the following auxiliary regression:

\[ \hat{\varepsilon}_t = \alpha + \beta X_t + \rho_1 \hat{\varepsilon}_{t-1} + \rho_2 \hat{\varepsilon}_{t-2} + v_t \quad (14) \]

iii) Obtain the coefficient of determination (\(R^2\)) from the auxiliary regression (14).
iv) Calculate the test statistic as:

\[ \chi^2 = (T - p) R^2 \]

where T is the sample size and p is the number of lagged residuals included in the
auxiliary regression. In this particular case, we have p = 2, since two lagged
residuals (\(\hat{\varepsilon}_{t-1}\) and \(\hat{\varepsilon}_{t-2}\)) are included in
equation (14).
v) Decision rule: Reject the null hypothesis of no autocorrelation if the test
statistic exceeds the critical value from the Chi-square distribution with p degrees of
freedom for a given level of significance (\(\alpha\)).
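The whole procedure is available pre-packaged; a self-contained sketch with statsmodels'
acorr_breusch_godfrey, using simulated data with AR(1) errors (all values arbitrary):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(1)
T = 100
x = rng.normal(size=T)
eps = np.zeros(T)
u = rng.normal(size=T)
for t in range(1, T):                 # AR(1) disturbances with rho = 0.7
    eps[t] = 0.7 * eps[t - 1] + u[t]
y = 1.0 + 2.0 * x + eps

res = sm.OLS(y, sm.add_constant(x)).fit()
# nlags=2 puts p = 2 lagged residuals in the auxiliary regression (14)
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(lm_pvalue, f_pvalue)            # small p-values: reject H0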

4. Tests based on the correlogram of OLS residuals


A correlogram is a plot of the autocorrelation function (ACF) and partial
autocorrelation function (PACF) of a time series of observations against the number
of lags. In testing the presence of autocorrelation among error terms, the focus is on the
PACF of the errors.

A correlogram produced by software often displays the PACF of the residuals together
with the 95% upper and lower confidence limits at each of the lags. If the function at
lag one extends beyond the upper or lower confidence limits, for instance, then this is
an indication that the errors follow an AR(1) process. Higher order error processes can
be detected similarly. The following figures illustrate the PACF of an AR(1) error
process and that of a process which exhibits no serial correlation.

[Figures: PACF for an AR(1) error process; PACF for an error process with no serial correlation]

4.3.4 Correcting for error autocorrelation


Consider the simple linear regression model in equation (11), where the errors are
generated according to the AR(1) scheme \(\varepsilon_t = \rho \varepsilon_{t-1} + u_t\).
Here, \(u_t = \varepsilon_t - \rho \varepsilon_{t-1}\) satisfies all assumptions of the
CLRM. Suppose that by applying any one of the above tests you come to the conclusion
that the errors are autocorrelated. What to do next?

Lagging equation (11) by one period and multiplying throughout by \(\rho\), we get:

\[ \rho Y_{t-1} = \rho\alpha + \rho\beta X_{t-1} + \rho\varepsilon_{t-1} \quad (15) \]

Subtracting equation (15) from equation (11), we get:

\[ Y_t - \rho Y_{t-1} = \alpha(1 - \rho) + \beta(X_t - \rho X_{t-1}) + (\varepsilon_t - \rho\varepsilon_{t-1}) \]

\[ \Rightarrow \quad Y_t^* = \alpha^* + \beta X_t^* + u_t \quad (16) \]

where \(Y_t^* = Y_t - \rho Y_{t-1}\), \(X_t^* = X_t - \rho X_{t-1}\) and
\(\alpha^* = \alpha(1 - \rho)\).

Since the error term \(u_t = \varepsilon_t - \rho\varepsilon_{t-1}\) fulfils all
assumptions of the CLRM, we can estimate equation (16) using OLS. The above
transformation is known as the Cochrane-Orcutt transformation. It requires knowledge of
the value of \(\rho\). An estimator \(\hat{\rho}\) of \(\rho\) can be obtained from a
regression of the OLS residuals \(\hat{\varepsilon}_t\) on \(\hat{\varepsilon}_{t-1}\)
without a constant term, that is: \(\hat{\varepsilon}_t = \rho\hat{\varepsilon}_{t-1} + u_t\).

Example 7: The following data are on investment (INVEST) and the value of outstanding
shares (VOS) for the years 1935-1953.

year INVEST (Y) VOS (X) year INVEST (Y) VOS (X)
1935 317.6 3078.5 1945 561.2 4840.9
1936 391.8 4661.7 1946 688.1 4900.9
1937 410.6 5387.1 1947 568.9 3526.5
1938 257.7 2792.2 1948 529.2 3254.7
1939 330.8 4313.2 1949 555.1 3700.2
1940 461.2 4643.9 1950 642.9 3755.6
1941 512.0 4551.2 1951 755.9 4833.0
1942 448.0 3244.1 1952 891.2 4924.9
1943 499.6 4053.7 1953 1304.4 6241.7
1944 547.5 4379.3

The estimated regression equation of INVEST on VOS is (EViews output):


Dependent Variable: INVEST

Variable Coefficient Std. Error t-Statistic Prob.

C -186.1598 216.2925 -0.860685 0.4014


VOS 0.175261 0.049697 3.526581 0.0026

R-squared 0.422491
F-statistic 12.43677 Durbin-Watson stat 0.552764
Prob(F-statistic) 0.002590

The F-statistic is significant at the 1% level (since Prob(F-statistic) = 0.0026 < 0.01).
This indicates that the model is adequate. However, we have to explore the model for
the presence of autocorrelation of errors.

AC diagnostics
We first plot the estimated residuals against time and look for signs of model
misspecification. A scatter plot of the estimated disturbances (residuals) is shown
below. We can see a clustering of neighbouring residuals on one or the other side of
the line \(\hat{\varepsilon} = 0\). This might be a sign that the errors are
autocorrelated. However, we need to conduct formal tests of error autocorrelation
before coming to a final conclusion.
[Figure: time plot of the OLS residuals, 1936-1952, showing clustering above and below zero]

One of the statistics displayed in the EViews output is the Durbin-Watson (DW)
statistic, here d = 0.553. Note that a p-value for the test is not provided, since the
test is non-standard; we therefore refer to the DW critical values. At the 5% level of
significance (\(\alpha = 0.05\)), the Durbin-Watson critical values (for T = 19) are
\(d_L(\alpha) = 1.180\) and \(d_U(\alpha) = 1.401\). Since d = 0.553 is less than
\(d_L(\alpha)\), we reject the null hypothesis of no serial correlation in the
residuals (\(H_0: \rho = 0\)).

We can also test for error AC using the Breusch-Godfrey (BG) test. EViews output of
this test is shown below. Since the p-values of both versions of the test (F-test and Chi-
square test) are less than 0.05, we reject the null hypothesis of no autocorrelation.
Breusch-Godfrey Serial Correlation LM Test:

F-statistic 7.940436 Prob. F(2,15) 0.0044


Obs*R-squared 9.770986 Prob. Chi-Square(2) 0.0076

Note: If the two versions yield contradictory results, it is better to rely on the
F-test, since it is applicable for large as well as small samples, while the Chi-square
test is asymptotic and requires a large sample.

Another method of detecting error autocorrelation is through the partial
autocorrelation function (PACF) of the residuals, shown below. We can see that the PACF
at lag one extends beyond the upper confidence limit, while those at higher lags lie
within the confidence limits. This is an indication that the errors follow an AR(1)
process: \(\varepsilon_t = \rho\varepsilon_{t-1} + u_t\).

[Figure: PACF of the OLS residuals]

All tests indicated that there is serial correlation in the residuals. The implication
is that all significance tests based on the OLS regression are invalid. Thus, we need
to apply the Cochrane-Orcutt transformation. This transformation requires an estimate
\(\hat{\rho}\) of \(\rho\), obtained through a regression of the OLS residuals
\(\hat{\varepsilon}_t\) on \(\hat{\varepsilon}_{t-1}\) without a constant term. This
gives the following result:
Dependent Variable: RESIDUAL

Variable Coefficient Std. Error t-Statistic Prob.

RESIDUAL(-1) 0.804902 0.205706 3.912878 0.0011

Thus, an estimate of \(\rho\) is given by \(\hat{\rho} = 0.805\).

The Cochrane-Orcutt transformation

We then apply the following (Cochrane-Orcutt) transformation:

\[ Y_t - \rho Y_{t-1} = \alpha(1 - \rho) + \beta(X_t - \rho X_{t-1}) + u_t \]

\[ \Rightarrow \quad Y_t - 0.805\,Y_{t-1} = \alpha(1 - 0.805) + \beta(X_t - 0.805\,X_{t-1}) + u_t \]

\[ \Rightarrow \quad Y_t^* = \alpha^* + \beta X_t^* + u_t \]

Note that the transformed model fulfils all basic assumptions, and thus we can estimate
the parameters in this equation by an OLS regression of \(Y_t^*\)
(= INVEST_t - 0.805 INVEST_{t-1}) on \(X_t^*\) (= VOS_t - 0.805 VOS_{t-1}). The result
(EViews output) is shown below:

Dependent Variable: INVEST_TRAN

Variable Coefficient Std. Error t-Statistic Prob.

C 66.26550 36.13147 1.834011 0.0853


VOS_TRAN 0.091288 0.026284 3.473124 0.0031

R-squared 0.429846
F-statistic 12.06259
Prob(F-statistic) 0.003137

The partial autocorrelation function of the residuals from the transformed model is
shown below. We can see that the partial correlations of the residuals at all lags lie
within the upper and lower confidence limits, an indication that the autocorrelation
structure has been properly dealt with.

[Figure: PACF of the residuals from the transformed model]

EViews output of the Breusch-Godfrey test of serial correlation of residuals of the


transformed model is shown below. We can see that the p-values of both versions of the
test (F-test and Chi-square test) exceed 0.05. Thus, the null hypothesis of no
autocorrelation cannot be rejected.
Breusch-Godfrey Serial Correlation LM Test:

F-statistic 2.046913 Prob. F(2,14) 0.1660


Obs*R-squared 4.072597 Prob. Chi-Square(2) 0.1305

4.4 Heteroscedasticity
Consider the model:

\[ Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + \varepsilon_i, \quad i = 1, 2, \ldots, n \]

One of the assumptions of the CLRM is:

\[ Var(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2, \quad i = 1, 2, \ldots, n \]

This assumption tells us that the variance remains constant across all observations.
This assumption of constant variance is referred to as homoscedasticity. However, there
are many situations in which this assumption may not hold, that is, the error terms may
be expected to be larger for some observations or periods of the data than for others.
Under such circumstances, we have the case of heteroscedasticity. Generally, under
heteroscedasticity we have:

\[ Var(\varepsilon_i) = E(\varepsilon_i^2) = k_i \sigma^2, \quad \text{where the } k_i \text{ are not all equal} \]

Heteroscedasticity often occurs when we estimate relationships from cross-sectional
data. Consider the case of modeling the relationship between the income and expenditure
of individual families. Here the assumption of homoscedasticity is not very plausible,
since we expect less variation in consumption levels for low-income families than for
high-income families. At low levels of income, the average level of consumption is low
and the variation around this level is restricted: consumption cannot fall too far
below the average level because this might mean starvation, and it cannot rise too far
above the average level because the family's assets do not allow it. These constraints
are likely to be less binding at higher income levels. This situation is displayed in
the figure below. Similarly, companies with larger profits are expected to show greater
variability in their dividend policies than companies with lower profits.

[Figure: scatter of consumption against income; the variance in consumption levels systematically increases with income]

4.4.1 Consequences of using OLS in the presence of heteroscedasticity


If the errors are heteroscedastic and yet we proceed with estimation and inference using
OLS, then the OLS estimators will still be unbiased, but the standard errors will be
either underestimated or overestimated. This invalidates any inferences made based on
the fitted model.

In the simple linear regression model \(Y_i = \alpha + \beta X_i + \varepsilon_i\),
suppose that the variance of the errors is positively related to the square of the
explanatory variable (which is often the case in practice under heteroscedasticity),
that is, \(\sigma_i^2\) increases with \(X_i^2\). Then, the OLS standard error estimate
for the slope, \(se(\hat{\beta})\), will be too low (an underestimate). In this case,
the t-ratio will be large, and consequently, the probability of incorrectly rejecting
the null hypothesis \(H_0: \beta = 0\) (a Type I error) increases. In general, the
estimated variances of the OLS estimators are biased, and the conventional tests of
significance are invalid.

4.4.2 Detection of heteroscedasticity


How can one tell whether the errors are heteroscedastic or not? One possibility is to use
a graphical method. But one rarely knows the cause or the form of heteroscedasticity
from the plot. Fortunately, there are a number of formal statistical tests for
heteroscedasticity. Here we will discuss White’s test.
Suppose that our model is:

\[ Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i \quad (17) \]

We first estimate equation (17) by OLS and obtain the residuals
(\(\hat{\varepsilon}_i\)). We then apply OLS to the following auxiliary regression:

\[ \hat{\varepsilon}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + u_i \quad (18) \]

The reason that the auxiliary regression takes this form is that it is desirable to
investigate whether the variance of the residuals varies systematically with any known
variables relevant to the model. Relevant variables include the original explanatory
variables, their squared values and their cross-products.

The null hypothesis of homoscedasticity is:

\[ H_0: \alpha_2 = \alpha_3 = \ldots = \alpha_6 = 0 \]

White's test is based on the coefficient of determination (\(R^2\)) from the auxiliary
regression. If one (or more) of the coefficients in equation (18) is statistically
significant, then the value of \(R^2\) will be relatively large. This is an indication
that the error variance (proxied by \(\hat{\varepsilon}_i^2\)) increases (or decreases)
with the values of one (or more) of the regressor variables, that is, the assumption of
constant error variance (homoscedasticity) is violated. The test statistic is given by:

\[ \chi^2 = n R^2 \]

where n is the total number of observations. Notice that the higher the value of
\(R^2\), the higher the test statistic. Thus, we reject the null hypothesis of
homoscedasticity when the test statistic is 'large'.
Decision rule: Reject H0 if the test statistic exceeds the critical value from the
Chi-square distribution with m degrees of freedom for a given level of significance
(\(\alpha\)). Here, m is the number of regressors in the auxiliary regression (m = 5 in
our case).
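In software the auxiliary regression need not be typed out by hand; a sketch with
statsmodels' het_white, assuming res is a fitted OLS results object (for instance from
one of the earlier sketches):

from statsmodels.stats.diagnostic import het_white

# res.model.exog is the design matrix including the constant. het_white
# builds the auxiliary regression -- levels, squares and cross-products, as
# in equation (18) -- internally and returns the LM statistic n*R^2.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, res.model.exog)
# Reject H0 of homoscedasticity if lm_pvalue < alpha.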

4.4.3 Dealing with heteroscedasticity

One possible way to overcome the problem of heteroscedasticity is by making certain
assumptions about the variance of the disturbances
(\(var(\varepsilon_i) = E(\varepsilon_i^2) = \sigma_i^2\)). Often we assume that
\(\sigma_i^2\) is associated with some variable, say \(Z_i\), which may be one of the
regressors in the model under consideration or some 'outside' variable, that is,
\(\sigma_i^2 = \sigma^2 Z_i^2\). In this case, all that is required to get rid of
heteroscedasticity is to divide the regression equation throughout by \(Z_i\).

As an illustration, suppose that the model under consideration is equation (17).
Dividing the equation throughout by \(Z_i\), we get:

\[ \frac{Y_i}{Z_i} = \beta_1\left(\frac{1}{Z_i}\right) + \beta_2\left(\frac{X_{2i}}{Z_i}\right) + \beta_3\left(\frac{X_{3i}}{Z_i}\right) + \frac{\varepsilon_i}{Z_i} \]

\[ \Rightarrow \quad Y_i^* = \beta_1 Z_i^* + \beta_2 X_{2i}^* + \beta_3 X_{3i}^* + \varepsilon_i^* \quad (19) \]

where \(Y_i^* = Y_i/Z_i\), \(Z_i^* = 1/Z_i\), \(X_{2i}^* = X_{2i}/Z_i\),
\(X_{3i}^* = X_{3i}/Z_i\) and \(\varepsilon_i^* = \varepsilon_i/Z_i\). Now we have:

\[ E(\varepsilon_i^*) = E\left(\frac{\varepsilon_i}{Z_i}\right) = \frac{E(\varepsilon_i)}{Z_i} = \frac{0}{Z_i} = 0 \]

\[ var(\varepsilon_i^*) = var\left(\frac{\varepsilon_i}{Z_i}\right) = E\left(\frac{\varepsilon_i}{Z_i}\right)^2 = \frac{E(\varepsilon_i^2)}{Z_i^2} = \frac{\sigma^2 Z_i^2}{Z_i^2} = \sigma^2 \]

We can see that the disturbances from the transformed model (equation 19) have constant
variance (\(\sigma^2\)), and hence, we can estimate the regression coefficients using
OLS. This method of estimation is known as weighted least-squares (WLS). WLS can be
viewed as OLS applied to transformed (weighted) data.
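A sketch of WLS under the assumption \(\sigma_i^2 = \sigma^2 Z_i^2\), where y, x2, x3
and z are assumed to be NumPy arrays (z being the variable thought to drive the error
variance):

import numpy as np
import statsmodels.api as sm

# Weighting each observation by 1/z_i^2 reproduces the division-by-Z_i
# transformation above: WLS is OLS applied to the transformed data.
X = sm.add_constant(np.column_stack([x2, x3]))
res_wls = sm.WLS(y, X, weights=1.0 / z**2).fit()
print(res_wls.summary())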

One problem with WLS is that we need to identify the regressor that induces the
heteroscedasticity in order to perform the appropriate transformation. This difficulty
can be resolved by using White's heteroscedasticity-consistent estimators. These
estimators are robust to the violation of the homoscedasticity assumption and thereby
allow consistent statistical inference in the face of heteroscedasticity. Note that
this method corrects for heteroscedasticity without altering the values of the
coefficients.

Example 8: The following data pertain to rental value (in Birr), floor area (in square
meters) and frontage size of plot (in meters) of a random sample of 68 properties.

Property Rental value Floor area Frontage Property Rental value Floor area Frontage
1 400 15 15 35 1000 30 20
2 1400 6 12 36 500 16 4
3 500 9 12.5 37 1000 16 20
4 500 12 15 38 800 32 10
5 4000 30 10 39 800 24 10
6 5000 35 25 40 3600 32 25
7 400 12 10 41 1000 30 12.5
8 600 12 16 42 1500 9 15
9 500 8 10 43 2500 24 14
10 500 24 20 44 900 12 12
11 2600 20 5 45 900 16 10
12 2800 20 12 46 1500 12 10
13 500 9 10 47 1100 18 15
14 450 12 10 48 800 11.2 8
15 3000 24 8 49 2500 21 25
16 800 12 20 50 1000 9 15
17 400 20 10 51 1800 20 5
18 400 20 10 52 1500 24 20
19 400 21 10 53 2500 28 14
20 600 20 15 54 1200 12 4
21 500 16 10 55 1500 9 20
22 500 20 10 56 1000 9 4
23 300 30 6 57 1000 9 12
24 1500 9 10 58 1500 12 25
25 600 8 10 59 1300 26 4
26 600 9 10 60 1500 16 10
27 2000 16 6 61 1000 5 12.5
28 4000 30 30 62 2000 35 15
29 1000 30 25 63 1300 16 5
30 3500 32 10 64 1700 26 4
31 700 12 12 65 1600 20 10
32 700 32 12.5 66 2000 30 8
33 3700 30 25 67 1500 21 7
34 1000 4 10 68 2000 24 20

The model we would like to fit is:

\[ Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i \quad (20) \]

where Y, \(X_2\) and \(X_3\) represent rental value, floor area and frontage size of
plot, respectively. EViews output of the fitted model is shown below.

Dependent Variable: RENTAL_VALUE

Variable Coefficient Std. Error t-Statistic Prob.

C -166.3250 306.9813 -0.541808 0.5898


FLOOR_AREA 56.34780 12.91721 4.362227 0.0000
FRONTAGE_SIZE 40.62063 17.90170 2.269093 0.0266

R-squared 0.323259
F-statistic 15.52431
Prob(F-statistic) 0.000003

The value of the F-statistic is 15.52431 with a p-value of less than 0.01. Thus, the
fitted model passes the F-test at the 1% level of significance, that is, we reject the
null hypothesis that the regressors are all insignificant (\(H_0: \beta_2 = \beta_3 = 0\)).
Moreover, both explanatory variables are individually significant. However, we have to
conduct further tests for possible violations of the basic assumptions. In particular,
we have to assess the presence of heteroscedasticity of errors, which is more prevalent
in relationships estimated from cross-sectional data.

We start with a graphical exploration of a possible relationship between the regressors
and the spread of the residuals. A scatter plot of the OLS residuals
(\(\hat{\varepsilon}\)) against floor area (\(X_2\)) is shown below. It can clearly be
seen that the spread of the residuals (i.e., the variance of the residuals) increases
with floor area. This might be an indication of a heteroscedasticity problem. However,
we need to apply formal statistical tests before arriving at a final conclusion.
[Figure: scatter plot of residuals against floor area]

The results of White’s test (EViews output) is shown below. The p-values
(corresponding to all versions of the test) are less than 0.05. Thus, we reject the null
hypothesis of homoscedasticity.
Heteroskedasticity Test: White

F-statistic 8.526744 Prob. F(5,62) 0.0000


Obs*R-squared 27.70706 Prob. Chi-Square(5) 0.0000
Scaled explained SS 21.95753 Prob. Chi-Square(5) 0.0005

As discussed earlier, the conventional test of significance (t-test) is invalid since the
disturbances are heteroscedastic. One possible solution is to use White’s
heteroscedasticity consistent estimators which are robust to the violation of the
homoscedasticity assumption. EViews output of the fitted model with
heteroscedasticity consistent standard errors is shown below.
Dependent Variable: RENTAL_VALUE
Method: Least Squares
Included observations: 68
White heteroskedasticity-consistent standard errors & covariance

Variable Coefficient Std. Error t-Statistic Prob.

C -166.3250 329.4048 -0.504926 0.6153


FLOOR_AREA 56.34780 14.44866 3.899862 0.0002
FRONTAGE 40.62063 20.51162 1.980372 0.0519

R-squared 0.323259 Mean dependent var 1406.618


Adjusted R-squared 0.302437 S.D. dependent var 1044.385
S.E. of regression 872.2727 Akaike info criterion 16.42320
Sum squared resid 49455877 Schwarz criterion 16.52112
Log likelihood -555.3887 Hannan-Quinn criter. 16.46199
F-statistic 15.52431 Durbin-Watson stat 1.968742
Prob(F-statistic) 0.000003 Wald F-statistic 9.676643
Prob(Wald F-statistic) 0.000210

The values of R-squared and the F-statistic are unchanged, but the standard errors are
now larger than those reported earlier. Thus, the OLS standard errors of the
coefficient estimates in the original fitted model were under-estimated due to
heteroscedasticity, leading to inflated t-ratios. This explains why frontage size is
now insignificant (p-value = 0.0519 > 0.05), while it was significant in the original
model.
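In statsmodels the same robust-standard-error correction is a one-line option; a sketch
assuming y and X hold the Example 8 data (rental value regressed on a constant, floor
area and frontage size):

import statsmodels.api as sm

# White heteroscedasticity-consistent covariance: coefficients identical to
# OLS, only the standard errors (and hence t-ratios and p-values) change.
res_robust = sm.OLS(y, X).fit(cov_type='HC0')
print(res_robust.summary())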

Appendix: Critical values for the Durbin-Watson test (5% level of significance)

[Table of critical values not reproduced. Note: k* is the number of regressors excluding the intercept.]
