LAD Estimation With Random Coefficient Autocorrelated Errors
Abstract
In this paper we compare the performance of LAD and OLS in the linear regression model with errors which are randomly autocorrelated. This model yields thick-tailed error distributions which make it profitable to estimate the model by LAD. The LAD estimator for randomly autocorrelated errors is proved to be asymptotically normal. The Monte Carlo results show that LAD improves upon OLS, unless we revert to a constant autocorrelation model, where the two methods are comparable. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Thick-tailed distributions; Least absolute deviation (LAD); Random coefficient autocorrelation (RCA); Conditional heteroskedasticity (ARCH).
0. Introduction
This paper considers a linear regression model with random coefficient autocorrelated (RCA) errors. As discussed in Tsay (1987), the RCA model is characterized by changing conditional variance. In a time series setting, the conditional heteroskedasticity caused by RCA is a function of the past observations of the variable under study, while in the ARCH model the variances depend upon the lagged errors of the equation. When we consider, however, a linear regression with randomly autocorrelated errors, the difference between the two models disappears, and the conditional variance is a function of past innovations in the RCA just as in the standard ARCH case.
In the ARCH literature, there is a wealth of empirical evidence discussing how conditional heteroskedasticity affects the unconditional error distribution, causing non-normality such as leptokurtosis and/or skewness (Engle and Gonzales-Rivera, 1991).
∗ Correspondence address: Via S. Lucia 173, 80132 Napoli, Italy.
E-mail address: furnoma@tin.it (M. Furno).
512 M. Furno / Computational Statistics & Data Analysis 36 (2001) 511–523
This leads us to set aside the maximum-likelihood estimator and to consider a distribution-free estimator. The presence of thick tails suggests the choice of a robust estimator, which provides efficiency gains with respect to least squares. In this paper, to deal with thick-tailed distributions, we propose to implement the least absolute deviation (LAD) estimator.
LAD coincides with maximum likelihood when the errors follow a double exponential distribution. In all other cases, LAD is less affected by observations coming from the tails, since it minimizes the absolute value and not the squared value of the errors. This is particularly useful with leptokurtic error distributions.
LAD has already been considered in models with constant autocorrelation (Weiss, 1990). We propose to implement LAD in the presence of random autocorrelation and we prove its asymptotic normality. The simulations we perform show that LAD improves upon OLS in case of RCA errors, both in terms of bias reduction and efficiency gains. However, when we revert to the constant autocorrelation model, our results agree with Weiss's (1990) findings. His Monte Carlo study shows that the LAD-based procedure is not particularly advantageous, especially in small samples, since its sampling distribution differs from the asymptotic one. In addition, the OLS- and LAD-based procedures yield results which are comparable in many respects, thus discouraging the use of LAD.
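Although the paper reports only Monte Carlo summaries, the basic LAD-versus-OLS contrast under thick tails is easy to reproduce numerically. The following Python sketch (all names and parameter values are ours, not the paper's) approximates the LAD fit by iteratively reweighted least squares, one of several standard ways to minimize the sum of absolute residuals:

```python
import numpy as np

def lad_fit(X, y, n_iter=50, eps=1e-8):
    """Approximate argmin_b sum_t |y_t - x_t' b| by iteratively
    reweighted least squares with weights 1/|residual|."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting value
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)      # guard near-zero residuals
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=2, size=n)  # thick-tailed errors
b_lad = lad_fit(X, y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

With t(2) errors the OLS slope has infinite variance, while the LAD slope typically concentrates near the true value of 2, which is the efficiency argument made above.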
The first section of the paper briefly reviews the relevant literature. The linear regression model with random autocorrelation of the first order and the corresponding LAD objective function are in Section 2.1. In Section 2.2 we discuss the asymptotic distribution of the LAD estimator considered here. Section 3 presents more general random coefficient ARMA models for the error term, analyzing the resulting conditional heteroskedasticity. A Monte Carlo experiment is described in Sections 4 and 5, while the final section draws the conclusions.
Furno (2000) investigates the performance of LAD residuals to build LM tests for AR and/or ARCH processes. In case of non-normal distributions, LAD-based tests have greater power than the same tests built on OLS residuals. In addition, Machado and Silva (2000) show that the Glejser test for heteroskedasticity improves with the use of LAD residuals, even if the error distribution is skewed.
The characteristics of the LAD estimator, together with the good performance of
the LAD residuals in terms of testing procedures (Furno, 2000; Machado and Silva,
2000) lead us to believe that LAD can improve upon OLS.
Eq. (4) coincides with the auxiliary regression defining the pattern of the conditional heteroskedasticity. It can be estimated by replacing the unknown h_t with a function of v_t. The latter are the residuals computed in Eq. (3), that is, after purging the fixed autocorrelation. The e_{t-1}^2 in Eq. (4) are the lagged errors of Eq. (1), which is in terms of the original variables y_t and x_t.
The terms e_t and v_t can be computed by implementing LAD in Eqs. (1) and (3), respectively. When we estimate Eq. (1) by LAD, we minimize

(i) Σ_t |y_t − x_t' β| = Σ_t |e_t|.

When in Eq. (3) we transform the variables to account for the fixed correlation, the objective function is

(ii) Σ_t |(y_t − ρ y_{t-1}) − (x_t − ρ x_{t-1})' β| = Σ_t |v_t|.

In addition, but this is not implemented in this paper, we can purge the conditional heteroskedasticity as well, and the objective function is given by

(iii) Σ_t |(y_t − ρ y_{t-1}) − (x_t − ρ x_{t-1})' β| / √h_t = Σ_t |v_t / √h_t|.
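For concreteness, the three objective functions above can be written as short Python routines (a sketch under our own naming; β, ρ, and the sequence h_t are taken as given):

```python
import numpy as np

def obj_i(beta, y, X):
    """(i): sum_t |y_t - x_t' beta| = sum_t |e_t|."""
    return np.sum(np.abs(y - X @ beta))

def obj_ii(beta, rho, y, X):
    """(ii): sum_t |(y_t - rho*y_{t-1}) - (x_t - rho*x_{t-1})' beta|,
    i.e. sum_t |v_t| after the fixed-correlation transform."""
    v = (y[1:] - rho * y[:-1]) - (X[1:] - rho * X[:-1]) @ beta
    return np.sum(np.abs(v))

def obj_iii(beta, rho, h, y, X):
    """(iii): as (ii), with each term divided by sqrt(h_t)."""
    v = (y[1:] - rho * y[:-1]) - (X[1:] - rho * X[:-1]) @ beta
    return np.sum(np.abs(v / np.sqrt(h[1:])))
```

Each routine drops the first observation, which is lost to the lag in the transform.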
Eq. (4), which describes the pattern of conditional heteroskedasticity, produces an important by-product. The slope coefficient γ1 provides an estimate of ω_r^2 = [2f_r(0.5)]^{-2}, and the constant term estimates ω_a^2 = [2f_a(0.5)]^{-2}. Both ω_a^2 and ω_r^2 are very useful in computing the variance–covariance matrix of the coefficients of the main equation, thus simplifying the problem of estimating f(0.5), which usually involves non-parametric estimators. Eq. (4) can be estimated by LAD as well. This implies the minimization of the objective function

(iv) Σ_t |h_t − γ0 − γ1 e_{t-1}^2|,

which differs from the formulation proposed by Koenker and Zao (1996) for a closely related problem (footnote 2).
Weiss (1990) proves the asymptotic normality of LAD and GLAD in case of fixed serial correlation.
To estimate the RCA model of Eqs. (3) and (4), we minimize Σ_t |v_t| for the main equation, and Σ_t ||v_t| − γ0 − γ1 e_{t-1}^2| for the auxiliary regression, where we approximate the term h_t with |v_t|. The parameters of interest are φ = (β, ρ, γ1), and their normal equations are given by

n^{-1/2} Σ_t sgn(v_t) x_t* = 0,

2 Koenker and Zao (1996), in the model y_t = β0 + Σ_{i=1,p} β_i y_{t-i} + e_t, present the quantile regression estimator of the auxiliary equation defining the ARCH process e_t = (γ0 + γ1 |e_{t-1}| + ··· + γq |e_{t-q}|) ε_t. Assuming sufficient conditions for the stationarity and ergodicity of y_t and e_t, and provided a consistent estimate of the coefficients of the main equation, they prove the asymptotic normality of the coefficients γ_i of the auxiliary regression.
and √n(φ̂ − φ) = [f(F^{-1}(0.5))]^{-1} M^{-1} g_0 + o_p(1). This allows us to state the asymptotic distribution of the vector, which is normally distributed with zero mean and covariance matrix

W = [f(F^{-1}(0.5))]^{-2} M^{-1} A M^{-1},

where

A = lim n^{-1} Σ_t  [ x_t* x_t*'         x_t* e_{t-1}      x_t* e_{t-1}^2 ]
                    [ x_t*' e_{t-1}      e_{t-1}^2         e_{t-1}^3      ]
                    [ x_t*' e_{t-1}^2    e_{t-1}^3         e_{t-1}^4      ].
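A sample analogue of A can be assembled directly from the residuals. The sketch below is ours, not the paper's: x_t* is taken scalar (one regressor) so that each summand is a 3×3 block, and M and the density term are assumed to be estimated separately.

```python
import numpy as np

def sample_A(x_star, e):
    """Sample analogue n^{-1} sum_t of the 3x3 block built from
    x_t* and the lagged errors e_{t-1}, ..., e_{t-1}^4."""
    x = x_star[1:]
    el = e[:-1]                                   # e_{t-1}
    blocks = np.stack([
        np.stack([x * x,      x * el,  x * el**2], axis=1),
        np.stack([x * el,     el**2,   el**3],     axis=1),
        np.stack([x * el**2,  el**3,   el**4],     axis=1),
    ], axis=1)                                    # shape (n-1, 3, 3)
    return blocks.mean(axis=0)
```

By construction the estimate is symmetric, as the population matrix above must be.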
3. Extensions
The 4rst possible generalization is to assume that the errors follow a pth-order
random correlation process. After purging the constant autocorrelation, the errors are
de4ned as vt = i = 1; p rit et−i + at . This implies the following conditional variance:
var(vt =It−1 ) = a2 + i = 1; p r i2 et−i
2
+ i = 1; p j = i r ij et−i et−j ; (9)
which de4nes the augmented ARCH (AARCH) process (Bera et al., 1992).
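The variance formula can be checked by simulation in the first-order, mean-zero case (the RCA process of Tsay, 1987). The sketch below, with illustrative parameter values of our own choosing, generates e_t = r_t e_{t-1} + a_t with a fresh random coefficient each period:

```python
import numpy as np

def simulate_rca_errors(n, sigma_r, sigma_a, rng):
    """Simulate e_t = sum_i r_it e_{t-i} + a_t, where the random
    coefficients r_it ~ N(0, sigma_r[i]^2) are redrawn each period."""
    p = len(sigma_r)
    e = np.zeros(n + p)
    for t in range(p, n + p):
        r = rng.normal(0.0, sigma_r)              # fresh coefficients each t
        e[t] = r @ e[t - p:t][::-1] + rng.normal(0.0, sigma_a)
    return e[p:]

rng = np.random.default_rng(1)
e = simulate_rca_errors(20000, sigma_r=np.array([0.5]), sigma_a=1.0, rng=rng)
# For p = 1 with mean-zero coefficients, the unconditional variance
# is sigma_a^2 / (1 - sigma_r^2) = 4/3 here.
```

The empirical variance of the simulated series stays close to that value, while the conditional variance moves with e_{t-1}^2, exactly the ARCH-type pattern discussed above.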
If the errors follow a random coefficient ARMA(p, q) process, after purging the constant correlation, the errors become v_t = Σ_{j=1,p} r_{jt} e_{t-j} + a_t + Σ_{i=1,q} g_{it} a_{t-i}, with conditional variance

var(v_t | I_{t-1}) = Σ_{j=1,p} σ_{r_j}^2 e_{t-j}^2 + Σ_{s=1,p} Σ_{j≠s} σ_{r_{sj}} e_{t-s} e_{t-j} + σ_a^2
4. Monte Carlo
5. Results
Table 1 presents the summary statistics for GLS and GLAD when the random
correlation follows a standard normal distribution. In this set of experiments the
3 Engle et al. consider a gamma distribution to analyse skewness, but the χ2 is just a special case of the gamma distribution.
4 We could choose error terms e_t following other non-normal distributions. However, Furno (2000) shows that it is the t distribution that has the greatest influence on the results.
5 Herce (1996) presents the asymptotic distribution of LAD in the presence of unit roots.
Table 1
Random autocorrelation ρ_t = ρ + r_t; r_t is distributed as a standard normal
(a) Mean and standard deviation of the distributions of the estimated coefficients
ρ    E(1/n Σ_t ρ_t)    E(se_GLS/se_GLAD)_b0    E(se_GLS/se_GLAD)_b1    E(se_GLS/se_GLAD)_ρ̂
Table 2
Random autocorrelation ρ_t = ρ + r_t; r_t is distributed as a contaminated normal
(a) Mean and standard deviation of the distributions of the estimated coefficients
ρ     E(1/n Σ_t ρ_t)    E(se_GLS/se_GLAD)_b0    E(se_GLS/se_GLAD)_b1    E(se_GLS/se_GLAD)_ρ̂
0.0   0.03              14                      13                      21
0.3   0.34              17                      17                      29
0.6   0.56              63                      66                      82
0.9   0.93              38                      39                      72
replicates, presented in the last column of the table, with the first column of the table reporting its true value. Table 1(b) shows that the GLAD bias is lower than the GLS bias, particularly in the slope coefficient. Section (c) shows that, on average, GLAD is more efficient than GLS, since the ratios between the standard errors are always greater than one.
In sum, with random autocorrelation following a normal distribution, GLAD improves upon GLS in terms of both bias and efficiency. This is due to the conditional heteroskedasticity induced by RCA, and thus to the thick-tailed nature of the unconditional distribution.
Table 2 reports the results for RCA with a random component following a contaminated distribution. By comparing the first with the last column of the table, once again we can see that the degree of autocorrelation is increasingly underestimated as ρ increases. GLAD presents a reduced bias and a greater efficiency than GLS, and the improvements (bias reduction and efficiency gains) are enhanced with respect to those reported in the previous table. With a contaminated normal, the GLS procedure is quite unreliable in all the experiments considered here. GLAD becomes less reliable only in the highly correlated experiments, when the average correlation is equal to 0.93.
The last two tables present a different kind of experiment. In order to preserve stationarity, the impact of the random component of the autocorrelation is strongly reduced: ρ_t = ρ + λ r_t. The value of λ is reported in the second column of Tables 3 and 4.
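The design just described can be sketched in a few lines of Python (our own illustration, not the paper's code: ρ is the fixed correlation, λ the scaling coefficient, and the regression coefficients and sample size are arbitrary choices rather than the paper's settings):

```python
import numpy as np

def one_replicate(n, rho, lam, beta, rng, r_draw):
    """One Monte Carlo replicate with errors e_t = rho_t e_{t-1} + a_t,
    where rho_t = rho + lam * r_t and r_t is drawn by r_draw."""
    x = rng.normal(size=n)
    e = np.zeros(n)
    for t in range(1, n):
        rho_t = rho + lam * r_draw(rng)           # random autocorrelation
        e[t] = rho_t * e[t - 1] + rng.normal()
    y = beta[0] + beta[1] * x + e
    return x, y, e

rng = np.random.default_rng(2)
x, y, e = one_replicate(200, rho=0.3, lam=0.6, beta=(1.0, 2.0),
                        rng=rng, r_draw=lambda g: g.standard_t(4))
```

Repeating this over many replicates and fitting each one by both GLS and GLAD yields the kind of summary reported in the tables.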
Table 3 summarizes the results for r_t following a Student-t distribution with 4 degrees of freedom. The fixed correlation increases from 0 to 0.8, while the coefficient controlling the impact of the random correlation, λ, decreases from 1 to 0.1. Therefore, the first row of each section in this table provides the results for a fully randomly correlated experiment, while the fourth row of each section presents the case of an almost fixed autocorrelation coefficient. GLAD has a smaller bias than GLS in all but the fourth experiment, where the two estimators are comparable. In terms of efficiency, GLAD is preferable in the first two experiments, where the random correlation prevails. GLS instead is more efficient in the last two experiments, where the fixed serial correlation dominates (last two rows of the table). The fixed autocorrelation coefficient is overestimated in the first two rows of the table, where the random correlation prevails.
In Table 4, r_t follows a χ2 distribution with 4 degrees of freedom. This table confirms the results of the previous set of experiments. When the random correlation dominates, the GLAD estimated coefficients have smaller bias and greater efficiency than GLS. When the fixed correlation prevails, instead, the two estimators are comparable in terms of bias, while GLS is more efficient than GLAD. The fixed autocorrelation coefficient is overestimated throughout the table.
This confirms the results of Weiss (1990): when serial correlation has a fixed coefficient, LAD- and OLS-based procedures are comparable. However, we find that, when the correlation has a random component, GLAD improves upon GLS and can be profitably implemented.
6. Conclusions
This study compares the behavior of OLS and LAD procedures in the context
of randomly autocorrelated errors. The performance of the two estimators has been
analyzed by Weiss (1990) in case of constant correlation. Weiss finds that the two estimators yield very similar results, so that LAD, being more cumbersome than OLS, is not really advisable.
Table 3
Random autocorrelation ρ_t = ρ + λ r_t; r_t is distributed as a t(4)
(a) Mean and standard deviation of the distributions of the estimated coefficients
        ρ    λ    E(1/n Σ_t ρ_t)   b0            b1           ρ̂
GLS     0.0  1.0  1.0              −0.15 (5.7)   0.59 (8.1)   0.63 (0.1)
GLAD    0.0  1.0  1.0              0.06 (1.1)    0.66 (1.3)   0.60 (0.2)
GLS     0.3  0.6  0.9              0.02 (1.1)    0.61 (1.3)   0.69 (0.1)
GLAD    0.3  0.6  0.9              0.10 (0.8)    0.60 (1.0)   0.66 (0.1)
GLS     0.6  0.2  0.8              0.12 (0.5)    0.58 (0.6)   0.68 (0.1)
GLAD    0.6  0.2  0.8              0.14 (0.6)    0.59 (0.8)   0.68 (0.1)
GLS     0.8  0.1  0.9              0.05 (0.6)    0.62 (0.6)   0.77 (0.13)
GLAD    0.8  0.1  0.9              0.04 (0.7)    0.62 (0.7)   0.76 (0.16)
(b) Average bias over 500 replicates
ρ    λ    E(1/n Σ_t ρ_t)    E(se_GLS/se_GLAD)_b0    E(se_GLS/se_GLAD)_b1    E(se_GLS/se_GLAD)_ρ̂
Table 4
Random autocorrelation ρ_t = ρ + λ r_t; r_t is distributed as a χ2(4)
(a) Mean and standard deviation of the distributions of the estimated coefficients
        ρ    λ     E(1/n Σ_t ρ_t)   b0            b1           ρ̂
GLS     0.0  0.2   0.80             0.13 (1.0)    0.54 (1.4)   0.60 (0.1)
GLAD    0.0  0.2   0.80             0.10 (0.8)    0.62 (1.0)   0.58 (0.2)
GLS     0.2  0.2   0.99             −0.27 (1.8)   1.38 (2.2)   0.75 (0.1)
GLAD    0.2  0.2   0.99             0.12 (1.1)    0.64 (1.3)   0.73 (0.1)
GLS     0.6  0.1   0.99             0.01 (1.6)    0.69 (1.8)   0.83 (0.1)
GLAD    0.6  0.1   0.99             0.02 (1.1)    0.67 (1.03)  0.81 (0.1)
GLS     0.8  0.05  0.99             0.04 (0.8)    0.63 (0.7)   0.85 (0.12)
GLAD    0.8  0.05  0.99             0.02 (0.9)    0.66 (0.9)   0.84 (0.14)
(b) Average bias over 500 replicates
ρ    λ    E(1/n Σ_t ρ_t)    E(se_GLS/se_GLAD)_b0    E(se_GLS/se_GLAD)_b1    E(se_GLS/se_GLAD)_ρ̂
contaminated normal distribution, LAD provides a sizable bias reduction and a relevant improvement in efficiency with respect to least squares. In the experiments with random coefficients following a Student-t or a χ2 distribution there are stability issues involved, since these distributions render the average correlation greater than one. Therefore, we need to balance the fixed and the random components of the serial correlation in order to keep the average value of the serial correlation below unity. This
fine-tuning allows us to see that, when the random component prevails, LAD substantially improves upon least squares. On the other hand, when the random component is small, least squares and LAD yield similar results, thus confirming Weiss's (1990) findings.
Summarizing, the LAD-based procedure can be seen as an insurance policy against undetected RCA. In case of fixed correlation, OLS and LAD are comparable, but LAD is more cumbersome and possibly less efficient. In case of random correlation, the LAD procedure induces bias reduction and efficiency gains with respect to OLS. These considerations make its implementation highly advisable.
References
Bera, A., Higgins, M., Lee, S., 1992. Interaction between autocorrelation and conditional heteroscedasticity: A random coefficient approach. J. Bus. Econom. Statist. 10, 133–142.
Bollerslev, T., 1987. A conditional heteroskedastic time series model for speculative prices and rates of return. Rev. Econom. Statist. 69, 542–547.
Bollerslev, T., Wooldridge, J., 1992. Quasi-maximum likelihood estimation and inference in dynamic models. Econometric Rev. 11 (2), 143–172.
Engle, R., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of UK inflation. Econometrica 50, 987–1008.
Engle, R., Gonzales-Rivera, G., 1991. Semiparametric ARCH models. J. Bus. Econom. Statist. 9, 345–359.
Furno, M., 2000. LM tests in the presence of non-normal error distributions. Econometric Theory 16,
249–261.
Herce, M., 1996. Asymptotic theory of LAD estimation in a unit root process with finite variance errors. Econometric Theory 12, 129–153.
Koenker, R., Bassett, G., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Zao, Q., 1996. Conditional quantile estimation and inference for ARCH models.
Econometric Theory 12, 793–813.
Machado, J., Silva, J., 2000. Glejser’s test revisited. J. Econometrics 97, 189–202.
Nelson, D., 1991. Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59,
307–346.
Ruppert, D., Carroll, R., 1980. Trimmed least-squares estimation in the linear model. J. Amer. Statist.
Assoc. 75, 828–838.
Tsay, R., 1987. Conditional heteroscedastic time series models. J. Amer. Statist. Assoc. 82, 590–604.
Weiss, A., 1990. Least absolute error estimation in the presence of serial correlation. J. Econometrics
44, 127–158.