ch 4 2023 Econometrics for acct and finance
4.1 Multicollinearity
When discussing the suitability of a model, an important issue is the interaction among
the independent variables. The statistical term used for the problem that arises due to
high correlations among the independent variables in a multiple regression model is
multicollinearity. Technically, multicollinearity is caused by independent variables in
a regression model that contain common information (or are highly inter-correlated).
The presence of such highly correlated independent variables prevents us from
obtaining insight into the true contribution to the regression from each of the
independent variables.
In a very extreme case, two or more variables may be perfectly correlated, which would
imply that some explanatory variables are merely linear combinations of others. The
result of this would be that some variables are fully explained by others and, thus,
provide no additional information. This is a very extreme case, however. In most
problems in finance, the independent variables are not perfectly correlated but may be
correlated to a high degree.
Example 1: Consider the multiple linear regression model involving two explanatory
variables X2 and X3:

Yi = β1 + β2X2i + β3X3i + εi ………………….. (1)

Expressing all variables in deviations form, that is, yi = Yi − Ȳ, x2i = X2i − X̄2 and
x3i = X3i − X̄3, the OLS estimators of β2 and β3 are given by:

β̂2 = [Σx2iyi·Σx3i² − Σx3iyi·Σx2ix3i] / [Σx2i²·Σx3i² − (Σx2ix3i)²] ……………. (2)

β̂3 = [Σx3iyi·Σx2i² − Σx2iyi·Σx2ix3i] / [Σx2i²·Σx3i² − (Σx2ix3i)²] ……………. (3)

Now suppose that X2 and X3 are perfectly correlated, say x2i = 2x3i. Substituting into (2):

β̂2 = [Σ(2x3i)yi·Σx3i² − Σx3iyi·Σ(2x3i)x3i] / [Σ(2x3i)²·Σx3i² − (Σ(2x3i)x3i)²]
   = [2Σx3iyi·Σx3i² − 2Σx3iyi·Σx3i²] / [4(Σx3i²)² − 4(Σx3i²)²] = 0/0

which is indeterminate: under perfect multicollinearity the individual regression
coefficients cannot be estimated.

CHAPTER IV: Violations of assumptions of the CLRM 52
We have seen in Chapter 3 that the standard errors of β̂2 and β̂3 are estimated by:

se(β̂2) = √[ σ̂² / ((1 − r23²)·Σx2i²) ] ……………. (4)

se(β̂3) = √[ σ̂² / ((1 − r23²)·Σx3i²) ] ……………. (5)

where r23 is the coefficient of correlation between X2 and X3, and σ̂² is the residual
variance.
Example 2: Again consider model (1) above. Strong MC between X2 and X3 means that
r23 will be a number close to 1 or −1 (but not equal to ±1, for this would mean
perfect MC). In such cases:

r23² approaches one
(1 − r23²) approaches zero
(1 − r23²)·Σx2i² and (1 − r23²)·Σx3i² both approach zero (become very small)
se(β̂2) and se(β̂3) both become very large (or will be inflated)
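The inflation described above can be illustrated numerically. The sketch below plugs assumed values of σ̂² and Σx2i² into equation (4) (both values are made up for illustration) and lets r23 approach 1:

```python
import math

# Illustration of equation (4): with the residual variance and the sum of
# squared deviations of X2 held fixed (assumed values), se(beta2_hat)
# grows without bound as the correlation r23 between X2 and X3 nears 1.
sigma2_hat = 4.0     # assumed residual variance
sum_x2_sq = 100.0    # assumed sum of squared deviations of X2

for r23 in (0.0, 0.5, 0.9, 0.99, 0.999):
    se = math.sqrt(sigma2_hat / ((1 - r23 ** 2) * sum_x2_sq))
    print(f"r23 = {r23:6.3f}   se(beta2_hat) = {se:.4f}")
```

At r23 = 0 the standard error is 0.2, while at r23 = 0.999 it is more than twenty times larger, even though nothing about the data other than the inter-correlation has changed.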
Recall that the test statistic for testing H0: βj = 0 versus H1: βj ≠ 0 is given by:

tj = β̂j / se(β̂j) , j = 2, 3

The decision rule is to reject the null hypothesis when tj is ‘large’ in absolute value,
that is, when |tj| > tα/2(n − 3). Since the standard errors (the denominators) will be
very large due to strong multicollinearity, the test statistic will be very small (tends to
zero). This often leads to incorrectly failing to reject the null hypothesis (a Type II
error)!
53 Applied Econometrics for Accounting and Finance
Example 3: The following data (from mainstream economics) pertain to imports (Y),
GDP (X2), stock formation (X3) and consumption (X4) for the years 1949 – 1967.
Year Y X2 X3 X4 Year Y X2 X3 X4
1949 15.9 149.3 4.2 108.1 1959 26.3 239.0 0.7 167.6
1950 16.4 161.2 4.1 114.8 1960 31.1 258.0 5.6 176.8
1951 19.0 171.5 3.1 123.2 1961 33.3 269.8 3.9 186.6
1952 19.1 175.5 3.1 126.9 1962 37.0 288.4 3.1 199.7
1953 18.8 180.8 1.1 132.1 1963 43.3 304.5 4.6 213.9
1954 20.4 190.7 2.2 137.7 1964 49.0 323.4 7.0 223.8
1955 22.7 202.1 2.1 146.0 1965 50.3 336.8 1.2 232.0
1956 26.5 212.4 5.6 154.1 1966 56.6 353.9 4.5 242.9
1957 28.1 226.1 5.0 162.3 1967 59.9 369.7 5.0 252.0
1958 27.6 231.9 5.1 164.3
Rule of thumb: The benchmark (threshold) for the VIF is often given as 10. Thus, if
VIF(β̂j) exceeds 10, then β̂j is poorly estimated because of MC, or the jth regressor
variable (Xj) is responsible for MC.
The coefficient of determination from this auxiliary regression (using EViews) is found
to be R2² = 0.998203. The VIF of β̂2 is thus:

VIF(β̂2) = 1 / (1 − R2²) = 1 / (1 − 0.998203) = 556.5799

Since this figure far exceeds 10, we can conclude that the coefficient of GDP is
poorly estimated because of MC (or that GDP is responsible for MC).
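The calculation is easy to reproduce. A minimal sketch (the small difference from 556.5799 is due to rounding of the reported R²):

```python
# VIF of beta2_hat from the auxiliary-regression R^2, as in the text:
# VIF = 1 / (1 - R2^2), using the reported (rounded) R2^2 = 0.998203.
r2_aux = 0.998203
vif = 1.0 / (1.0 - r2_aux)
print(f"VIF(beta2_hat) = {vif:.2f}")  # about 556.5; 556.5799 uses the unrounded R^2
```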
The variance inflation factors for all explanatory variables (EViews output) are shown
below:
Variable             Coefficient Variance   Uncentered VIF   Centered VIF
C                    19.11560               63.32197         NA
GDP                  0.037497               7980.397         556.5817
STOCK_FORMATION      0.116400               6.494439         1.079783
CONSUMPTION          0.088136               9176.249         555.8980
We can see that the (centered) VIF’s corresponding to GDP and consumption are
greater than 10. This is an indication that these two variables are responsible for MC.
The strange results that we saw earlier (the inconsistency between the F-test and t-
tests) no longer prevail. Notice that the value of R-squared remains almost the same
after the removal of Consumption in the re-specified model. This is an indication that
Consumption provides no additional information (it is redundant).
The variance inflation factors from the re-specified model are shown below. The VIF
for both GDP and Stock formation now becomes 1.077 (which is less than 10) – an
indication that the problem of multicollinearity has been properly dealt with.
Variable             Coefficient Variance   Uncentered VIF   Centered VIF
C                    4.618114               16.04899         NA
GDP                  6.92E-05               15.44894         1.077464
STOCK_FORMATION      0.110714               6.480497         1.077464
The Jarque-Bera test is one of the common tests of normality. This test is based on the
coefficients of skewness and kurtosis of the residuals.
A distribution is said to be skewed to the right (or positively skewed) when most of
the data are concentrated on the left of the distribution, that is, the right tail extends
farther from the distribution's center than the left tail. For such distributions, the
coefficient of skewness is greater than zero (positive). On the other hand, if most of
the data are concentrated on the right of the distribution, then it is said to be skewed
to the left (or negatively skewed). The coefficient of skewness for a left-skewed
distribution is less than zero (negative).
The null and alternative hypotheses of the Jarque-Bera test of normality are:

H0: the residuals are normally distributed
H1: the residuals are not normally distributed
For the normal distribution (which is bell-shaped and symmetric about the mean), the
coefficients of skewness and kurtosis are zero and three, respectively. The Jarque-Bera
test statistic compares the sample coefficients of skewness (S) and kurtosis (K) of the
residuals (ε̂i) with those of the normal distribution. The test statistic is given by:
JB = (n/6)·[ S² + (K − 3)²/4 ] …………………. (8)

where:

S = [ (1/n)·Σ(ε̂i − ε̄̂)³ ] / [ (1/n)·Σ(ε̂i − ε̄̂)² ]^(3/2)

K = [ (1/n)·Σ(ε̂i − ε̄̂)⁴ ] / [ (1/n)·Σ(ε̂i − ε̄̂)² ]²

Decision rule: Reject H0 if JB > χ²α(2), where χ²α(2) is the critical value from the
Chi-square distribution with two degrees of freedom for a level of significance α.
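Equation (8) can be computed directly from a list of residuals. In the sketch below the residual values are made up for illustration; only the formulas come from the text.

```python
# Jarque-Bera statistic, equation (8), from first principles.
# The residuals below are illustrative values, not from the example.
resid = [1.2, -0.7, 0.3, -1.5, 0.9, 0.1, -0.4, 1.8, -1.1, 0.6]
n = len(resid)
mean = sum(resid) / n

m2 = sum((e - mean) ** 2 for e in resid) / n   # second central moment
m3 = sum((e - mean) ** 3 for e in resid) / n   # third central moment
m4 = sum((e - mean) ** 4 for e in resid) / n   # fourth central moment

S = m3 / m2 ** 1.5    # coefficient of skewness (normal: 0)
K = m4 / m2 ** 2      # coefficient of kurtosis (normal: 3)
JB = (n / 6.0) * (S ** 2 + (K - 3.0) ** 2 / 4.0)
print(f"S = {S:.4f}, K = {K:.4f}, JB = {JB:.4f}")
# Reject normality at the 5% level if JB exceeds the chi-square(2)
# critical value 5.991.
```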
Example 6: Consider example 3. EViews output of normality test of the residuals for
the final regression model (consumption removed) is shown below:
[Histogram of residuals with summary statistics — Series: Residuals; Sample 1949 1967;
Observations 19; Mean −8.69e-15; Median 0.349307; Maximum 3.096290; Minimum −4.021041;
Std. Dev. 2.204495; Skewness −0.378786; Kurtosis 2.083374; Jarque-Bera 1.119511;
Probability 0.571349]
The Jarque-Bera test statistic is 1.119511 with a p-value of 0.571349. Since the p-value
exceeds 0.05, we do not reject the null hypothesis of normality of residuals.
In general, there are a number of situations under which this assumption is violated,
that is, we have:

cov(εt, εt−s) = E(εt·εt−s) ≠ 0 for s ≠ 0

In such cases we say that the errors are autocorrelated or serially correlated. Here
εt−s is called the lagged value of εt (when s > 0). The lagged value of a variable is
simply the value that the variable took during a previous period (e.g., εt−1 is the value
of εt lagged by one period, εt−2 is the value of εt lagged by two periods, etc.).
A common scheme is the first-order autoregressive, AR(1), error process:

εt = ρεt−1 + ut , |ρ| < 1

where the ut’s satisfy all assumptions of the CLRM (that is, they have zero mean,
constant variance and are uncorrelated). Here, each term is correlated with its
predecessor so that the variance of the disturbances is partially explained by regressing
each term on its predecessor.
Suppose there is positive serial correlation in the disturbances (which is often the case
in practice). If we proceed to estimate the regression coefficients using OLS, then the
standard error of the slope, se(β̂), will be under-estimated, and consequently, the t-
ratio will be large. This often leads to rejecting the null hypothesis H0: β = 0 while
it is true (a Type I error). Thus, tests of model adequacy (F-test) and tests of
significance (t-tests) are invalid if there is autocorrelation of errors.
1. Graphical method
We first estimate the model using OLS and plot the residuals (ε̂t) against time. If there
is a clustering of neighbouring residuals on one or the other side of the line ε̂t = 0,
then such clustering is probably a sign that the errors are autocorrelated. However,
graphical methods may be difficult to interpret in practice, and hence, formal statistical
tests should also be applied.
[Figure: time plot of residuals (RESID), 2002–2013]
The null hypothesis states that there is no serial correlation, since ρ = 0 would mean
εt = ut, and the ut’s are uncorrelated by definition. Rejection of the null hypothesis is
an indication that the errors are indeed autocorrelated. The DW test statistic is
computed as:

d = [ Σ(t=2..T) (ε̂t − ε̂t−1)² ] / [ Σ(t=1..T) ε̂t² ] ………………………...… (12)
The numerator compares the values of the residuals at times (t − 1) and t. If there is
positive autocorrelation in the errors, the successive residuals ε̂t and ε̂t−1 will
frequently have the same sign (will be in the same direction with respect to the line
ε̂ = 0). In this case, the differences (ε̂t − ε̂t−1) in the numerator will be relatively small,
and consequently, the test statistic will be small. Thus, we reject the null hypothesis
when the test statistic (12) is ‘small’.
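The statistic in (12) is a one-liner once the residuals are in hand. A sketch with made-up residuals chosen to display positive autocorrelation (long runs of one sign):

```python
# Durbin-Watson statistic, equation (12). The residuals are illustrative
# values exhibiting positive autocorrelation.
resid = [1.0, 1.3, 0.8, 0.9, 0.4, -0.2, -0.6, -0.9, -0.5, -0.1, 0.3, 0.7]

num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
den = sum(e ** 2 for e in resid)
d = num / den
print(f"d = {d:.4f}")
# d near 0 signals positive autocorrelation; d near 2 signals none.
```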
The DW test is a non-standard test, that is, the test statistic does not follow a standard
statistical distribution such as the t-distribution, F-distribution or Chi-square
distribution. Instead, we refer to critical values specifically developed for the DW test.
Unlike the standard tests, we have two critical values: an upper critical value dU(α)
and a lower critical value dL(α), and there is also an intermediate region where the
null hypothesis of no autocorrelation can neither be rejected nor accepted.

Decision rule:
Reject H0 if d < dL(α)
Do not reject H0 if d > dU(α)
The test is inconclusive if dL(α) ≤ d ≤ dU(α)
Some of the limitations of the DW test are: there is a certain region where the test is
inconclusive; the test is valid only for the AR(1) error scheme; and the test is invalid
when lagged values of the dependent variable (e.g., Yt−1, Yt−2, etc.) appear as
regressors.
A correlogram produced by software often displays the PACF of the residuals together
with the 95% upper and lower confidence limits at each of the lags. If the function at
lag one extends beyond the upper or lower confidence limits, for instance, then this is
an indication that the errors follow an AR(1) process. Higher order error processes can
be detected similarly. The following figures illustrate the PACF of an AR(1) error
process and that of a process which exhibits no serial correlation.
[Figures: PACF for an AR(1) error process; PACF for an error process with no serial
correlation]
Yt − ρYt−1 = β1(1 − ρ) + β2(Xt − ρXt−1) + (εt − ρεt−1)

that is,

Yt* = β1* + β2Xt* + ut ……….….…. (16)

where Yt* = Yt − ρYt−1, Xt* = Xt − ρXt−1, β1* = β1(1 − ρ) and ut = εt − ρεt−1.
Since the error term ut = εt − ρεt−1 fulfils all assumptions of the CLRM, we can
estimate equation (16) using OLS. The above transformation is known as the
Cochrane-Orcutt transformation. This transformation requires knowledge of the
value of ρ. An estimator ρ̂ of ρ can be obtained by a regression of the OLS residuals
ε̂t on ε̂t−1 without a constant term, that is: ε̂t = ρε̂t−1 + ut.
year INVEST (Y) VOS (X) year INVEST (Y) VOS (X)
1935 317.6 3078.5 1945 561.2 4840.9
1936 391.8 4661.7 1946 688.1 4900.9
1937 410.6 5387.1 1947 568.9 3526.5
1938 257.7 2792.2 1948 529.2 3254.7
1939 330.8 4313.2 1949 555.1 3700.2
1940 461.2 4643.9 1950 642.9 3755.6
1941 512.0 4551.2 1951 755.9 4833.0
1942 448.0 3244.1 1952 891.2 4924.9
1943 499.6 4053.7 1953 1304.4 6241.7
1944 547.5 4379.3
R-squared 0.422491
F-statistic 12.43677 Durbin-Watson stat 0.552764
Prob(F-statistic) 0.002590
The F-statistic is significant at the 1% level (since Prob(F-statistic) = 0.0026 < 0.01).
This indicates that the model is adequate. However, we have to explore the model for
the presence of autocorrelation of errors.
AC diagnostics
We first plot the estimated residuals against time and look for signs of model
misspecification. A scatter plot of the estimated disturbances (residuals) is shown
below. We can see a clustering of neighbouring residuals on one or the other side of the
line ε̂ = 0. This might be a sign that the errors are autocorrelated. However, we need to
conduct formal tests of error autocorrelation before coming to a final conclusion.
[Figure: time plot of residuals (RESID), 1936–1952]
One of the statistics displayed in EViews output is the Durbin-Watson (DW) statistic.
This test statistic is equal to d = 0.553. Note that the p-value for the test is not provided
since the test is non-standard, and thus, we have to refer to the DW critical values. At
the 5% level of significance (α = 0.05), the Durbin-Watson critical values (for T = 19)
are: dL(α) = 1.180 and dU(α) = 1.401. Since d = 0.553 is less than dL(α), we reject
the null hypothesis of no serial correlation in the residuals (H0: ρ = 0).
We can also test for error AC using the Breusch-Godfrey (BG) test. EViews output of
this test is shown below. Since the p-values of both versions of the test (F-test and Chi-
square test) are less than 0.05, we reject the null hypothesis of no autocorrelation.
Breusch-Godfrey Serial Correlation LM Test:
Note: If the two tests yield contradictory results, it is better to go with the results
implied by the F-test, since it is applicable for large as well as small sample sizes, while
the Chi-square test is an asymptotic test and requires a large sample size.
All tests indicated that there is serial correlation in the residuals. The implication is that
all significance tests based on OLS regression are invalid. Thus, we need to apply the
Cochrane-Orcutt transformation. This transformation requires an estimate ρ̂ of ρ.
This is achieved through a regression of the OLS residuals ε̂t on ε̂t−1 without a
constant term. This gives the following result:
Dependent Variable: RESIDUAL
Yt* = β1* + β2Xt* + ut

Note that the transformed model fulfils all basic assumptions and, thus, we can
estimate the parameters in this equation by OLS regression of Yt*
( = INVESTt − 0.805·INVESTt−1) on Xt* ( = VOSt − 0.805·VOSt−1). The result (EViews
output) is shown below:
R-squared 0.429846
F-statistic 12.06259
Prob(F-statistic) 0.003137
The partial autocorrelation function of the residuals in the transformed model is shown
below. We can see that the partial correlations of the residuals at all lags lie within the
upper and lower confidence limits – an indication that the autocorrelation structure has
been properly dealt with.
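The whole procedure can be replayed on the INVEST/VOS data with the estimated ρ̂ = 0.805. The sketch below applies the Cochrane-Orcutt transformation and re-estimates the regression by OLS in deviation form; the text does not report the EViews coefficient values, so none are asserted here.

```python
# Cochrane-Orcutt: transform both series with rho-hat = 0.805, then run
# OLS of Y* on X* (simple-regression formulas in deviation form).
invest = [317.6, 391.8, 410.6, 257.7, 330.8, 461.2, 512.0, 448.0, 499.6,
          547.5, 561.2, 688.1, 568.9, 529.2, 555.1, 642.9, 755.9, 891.2, 1304.4]
vos = [3078.5, 4661.7, 5387.1, 2792.2, 4313.2, 4643.9, 4551.2, 3244.1, 4053.7,
       4379.3, 4840.9, 4900.9, 3526.5, 3254.7, 3700.2, 3755.6, 4833.0, 4924.9, 6241.7]
rho = 0.805

y_star = [invest[t] - rho * invest[t - 1] for t in range(1, len(invest))]
x_star = [vos[t] - rho * vos[t - 1] for t in range(1, len(vos))]

n = len(y_star)                  # one observation is lost (T - 1 = 18)
xbar = sum(x_star) / n
ybar = sum(y_star) / n
beta2 = (sum((x - xbar) * (y - ybar) for x, y in zip(x_star, y_star))
         / sum((x - xbar) ** 2 for x in x_star))
beta1 = (ybar - beta2 * xbar) / (1 - rho)   # recover the original intercept
print(f"slope = {beta2:.4f}, intercept = {beta1:.4f}")
```

Note that the transformation costs one observation, since Y1* and X1* would require the unobserved 1934 values.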
4.4 Heteroscedasticity
Consider the model:

Yi = β1 + β2X2i + β3X3i + . . . + βkXki + εi , i = 1, 2, . . ., n

One of the assumptions of the CLRM is:

Var(εi) = E(εi²) = σ² , i = 1, 2, . . ., n

This assumption tells us that the variance remains constant for all observations. This
assumption of constant variance is referred to as homoscedasticity. However, there are
many situations in which this assumption may not hold, that is, the error terms may be
expected to be larger for some observations or periods of the data than for others.
Under such circumstances, we have the case of heteroscedasticity. Generally, under
heteroscedasticity we have:

Var(εi) = E(εi²) = σi² , i = 1, 2, . . ., n

A classic example is the relationship between consumption and income: the variability
of consumption tends to increase with the level of income.
In the simple linear regression model Yi = α + βXi + εi, suppose that the variance of
the errors is positively related to the square of the explanatory variable (which is
often the case in practice under heteroscedasticity), that is, σi² increases with Xi².
Then, the OLS standard error estimate for the slope, se(β̂), will be too low (an
underestimate). In this case, the t-ratio will be large, and consequently, the probability
of incorrectly rejecting the null hypothesis H0: β = 0 (a Type I error) increases. In
general, the estimated variances of the OLS estimators are biased and the conventional
tests of significance are invalid.
We first estimate equation (17) by OLS and obtain the residuals (ε̂i). We then apply
OLS to the following auxiliary regression:

ε̂i² = α1 + α2X2i + α3X3i + α4X2i² + α5X3i² + α6X2iX3i + vi

The reason that the auxiliary regression takes this form is that it is desirable to
investigate whether the variance of the residuals varies systematically with any known
variables relevant to the model. Relevant variables include the original explanatory
variables, their squared values and their cross-products.
The White test statistic is n·R², where n is the total number of observations and R² is
the coefficient of determination from the auxiliary regression. Notice that the higher
the value of R², the higher will be the test statistic. Thus, we reject the null hypothesis
of homoscedasticity when the test statistic is ‘large’.

Decision rule: Reject H0 if the test statistic exceeds the critical value from the Chi-
square distribution with m degrees of freedom for a given level of significance (α).
Here, m is the number of regressors in the auxiliary regression (m = 5 in our case).
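Once the auxiliary regression has been run, the test itself is a single comparison. In the sketch below the auxiliary R² is a hypothetical value; the χ² critical value with m = 5 degrees of freedom at α = 0.05 is 11.070.

```python
# White's test: compare n * R^2 (from the auxiliary regression) with the
# chi-square critical value with m = 5 degrees of freedom.
n = 68              # sample size in Example 8
r2_aux = 0.25       # hypothetical auxiliary-regression R^2
chi2_crit = 11.070  # chi-square(5) critical value at alpha = 0.05

stat = n * r2_aux
reject = stat > chi2_crit
print(f"nR^2 = {stat:.2f}, reject homoscedasticity: {reject}")
```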
Suppose Var(εi) = σ²Zi² for some known variable Zi. Dividing the model

Yi = β1 + β2X2i + β3X3i + εi

through by Zi gives:

Yi/Zi = β1(1/Zi) + β2(X2i/Zi) + β3(X3i/Zi) + εi/Zi

that is,

Yi* = β1Zi* + β2X2i* + β3X3i* + εi* ……………. (19)

where Yi* = Yi/Zi, Zi* = 1/Zi, X2i* = X2i/Zi, X3i* = X3i/Zi and εi* = εi/Zi.

Now we have:

E(εi*) = E(εi/Zi) = (1/Zi)·E(εi) = 0

var(εi*) = var(εi/Zi) = E(εi²/Zi²) = (1/Zi²)·E(εi²) = σ²Zi²/Zi² = σ²
We can see that the disturbances from the transformed model (equation 19) have
constant variance ( 2 ), and hence, we can estimate the regression coefficients using
OLS. This method of estimation is known as weighted least-squares (WLS). WLS can
be viewed as OLS applied to transformed (weighted) data.
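WLS as “OLS on weighted data” can be sketched directly: every column, including the constant, is divided by Zi, and OLS through the origin is run on the transformed columns. The toy data below (and the choice of Z) are made up for illustration, using a single regressor to keep the algebra short.

```python
# WLS for Y = b1 + b2*X + eps with Var(eps_i) = sigma^2 * Z_i^2:
# divide every column (including the constant) by Z_i, then run OLS
# through the origin on the transformed data. Toy data for the sketch.
y = [2.0, 4.1, 6.3, 8.0, 10.2]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [1.0, 1.5, 2.0, 2.5, 3.0]   # assumed to drive the error variance

y_s = [yi / zi for yi, zi in zip(y, z)]   # Y* = Y/Z
c_s = [1.0 / zi for zi in z]              # transformed constant, 1/Z
x_s = [xi / zi for xi, zi in zip(x, z)]   # X* = X/Z

# Normal equations for regression through the origin on (c*, x*)
s_cc = sum(c * c for c in c_s)
s_cx = sum(c * xs for c, xs in zip(c_s, x_s))
s_xx = sum(xs * xs for xs in x_s)
s_cy = sum(c * ys for c, ys in zip(c_s, y_s))
s_xy = sum(xs * ys for xs, ys in zip(x_s, y_s))

det = s_cc * s_xx - s_cx ** 2
b1 = (s_cy * s_xx - s_xy * s_cx) / det   # intercept of the original model
b2 = (s_xy * s_cc - s_cy * s_cx) / det   # slope of the original model
print(f"WLS estimates: b1 = {b1:.4f}, b2 = {b2:.4f}")
```

Since the toy data lie almost exactly on the line Y = 2X, the WLS estimates come out close to b1 = 0 and b2 = 2 regardless of the weights.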
One problem with WLS is that we need to identify the regressor that induced
heteroscedasticity in order to perform appropriate linear transformation. This difficulty
can be resolved by using White’s heteroscedasticity consistent estimators. These
estimators are robust to the violation of the homoscedasticity assumption and thereby
help us perform consistent statistical inference in the face of heteroscedasticity. Note
that this method corrects for heteroscedasticity without altering the values of the
coefficients.
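For the simple-regression slope, White's correction is easy to state: replace σ̂²·Σx_dev² in the usual variance formula by Σx_dev²·ε̂². A sketch with made-up data follows (this is the basic HC0 variant; software such as EViews typically applies a similar, degrees-of-freedom-adjusted version):

```python
import math

# HC0 (White) standard error for the slope of a simple regression,
# computed directly. The data are made up for the sketch.
y = [1.0, 2.2, 2.9, 4.5, 4.8, 6.5]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n = len(y)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

# The conventional formula uses sigma2_hat / sxx; White replaces the
# single sigma2_hat by the squared residual of each observation.
se_hc0 = math.sqrt(sum(((xi - xbar) ** 2) * (e ** 2)
                       for xi, e in zip(x, resid)) / sxx ** 2)
print(f"slope = {b2:.4f}, White (HC0) se = {se_hc0:.4f}")
```

Consistent with the text, only the standard error changes: the slope estimate b2 is the ordinary OLS value.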
Example 8: The following data pertain to rental value (in Birr), floor area (in square
meters) and frontage size of plot (in meters) of a random sample of 68 properties.
Property Rental value Floor area Frontage Property Rental value Floor area Frontage
1 400 15 15 35 1000 30 20
2 1400 6 12 36 500 16 4
3 500 9 12.5 37 1000 16 20
4 500 12 15 38 800 32 10
5 4000 30 10 39 800 24 10
6 5000 35 25 40 3600 32 25
7 400 12 10 41 1000 30 12.5
8 600 12 16 42 1500 9 15
9 500 8 10 43 2500 24 14
10 500 24 20 44 900 12 12
11 2600 20 5 45 900 16 10
12 2800 20 12 46 1500 12 10
13 500 9 10 47 1100 18 15
14 450 12 10 48 800 11.2 8
15 3000 24 8 49 2500 21 25
16 800 12 20 50 1000 9 15
17 400 20 10 51 1800 20 5
18 400 20 10 52 1500 24 20
19 400 21 10 53 2500 28 14
20 600 20 15 54 1200 12 4
21 500 16 10 55 1500 9 20
22 500 20 10 56 1000 9 4
23 300 30 6 57 1000 9 12
24 1500 9 10 58 1500 12 25
25 600 8 10 59 1300 26 4
26 600 9 10 60 1500 16 10
27 2000 16 6 61 1000 5 12.5
28 4000 30 30 62 2000 35 15
29 1000 30 25 63 1300 16 5
30 3500 32 10 64 1700 26 4
31 700 12 12 65 1600 20 10
32 700 32 12.5 66 2000 30 8
33 3700 30 25 67 1500 21 7
34 1000 4 10 68 2000 24 20
where Y, X2 and X 3 represent rental value, floor area and frontage size of plot,
respectively. EViews output of the fitted model is shown below.
R-squared 0.323259
F-statistic 15.52431
Prob(F-statistic) 0.000003
The value of the F-statistic is 15.52431 with a p-value of less than 0.01. Thus, the fitted
model passes the F-test at the 1% level of significance, that is, we reject the null
hypothesis that the regressors are all insignificant (H0: β2 = β3 = 0). Moreover, both
explanatory variables are significant. However, we have to conduct further tests for
possible violation of the basic assumptions. In particular, we have to assess the presence
of heteroscedasticity of errors, which is more prevalent in relationships estimated from
cross-sectional data.
[Figure: scatter plot of residuals (RESID) against floor area]
The results of White’s test (EViews output) are shown below. The p-values
(corresponding to all versions of the test) are less than 0.05. Thus, we reject the null
hypothesis of homoscedasticity.
Heteroskedasticity Test: White
As discussed earlier, the conventional test of significance (t-test) is invalid since the
disturbances are heteroscedastic. One possible solution is to use White’s
heteroscedasticity consistent estimators which are robust to the violation of the
homoscedasticity assumption. EViews output of the fitted model with
heteroscedasticity consistent standard errors is shown below.
Dependent Variable: RENTAL_VALUE
Method: Least Squares
Included observations: 68
White heteroskedasticity-consistent standard errors & covariance
The value of R-squared and F-statistic are unchanged, but the standard errors are now
larger than those reported earlier. Thus, the OLS standard errors of the coefficient
estimates in the original fitted model were under-estimated due to heteroscedasticity,
leading to inflated t-ratios. This explains why frontage size is now insignificant (p-
value = 0.0519 > 0.05), while it was significant in the original model.
* Note: k is the number of regressors excluding the intercept.