Econ 546 Project Final Report
Monica Mow
ECON 546
Themes in Econometrics
University of Victoria
April 9, 2018
ABSTRACT
This paper uses simulation-based techniques to obtain the null distribution for ARCH LM
tests of conditional heteroscedasticity to analyze the performance of the limiting null distribution
in finite samples. I estimate a GARCH model using daily S&P 500 returns to obtain Monte Carlo
LM test statistics. The simulated test statistics are compared to the sample LM test statistic to
generate Monte Carlo p-values. The Monte Carlo p-value obtained from the procedure at 199
replications is 0.25, much lower than the 0.48 associated with the sample LM statistic. This
result shows that inference based on the limiting null distribution can differ materially from
simulation-based inference in finite samples. I also provide a bias-corrected estimate for the
GARCH model conditional mean return by bootstrapping the bias. The bias was found to be
-0.02 percentage points, indicating that the maximum likelihood estimator is noticeably biased.
1. Introduction
Modeling stock returns has confounded finance researchers for quite some time. Common
models include constant expected return models (mean models) or market models such as the
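capital asset pricing model (CAPM). Stock returns exhibit what financial econometricians call
volatility clustering, which Mandelbrot (1963) described as “large changes tend[ing] to be followed by large changes, of
either sign, and small changes tend[ing] to be followed by small changes.” As time series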
variables that exhibit autocorrelation are modeled with AR and/or MA processes, so too are their
variances; the series can be estimated using an ARCH or GARCH process to treat the
autocorrelation in the variance. Before modeling any time series variable using an ARCH or
GARCH process, Engle’s (1982) Lagrange Multiplier (LM) test should be employed to test for
conditional heteroscedasticity.
In this paper, I model daily returns of the S&P 500 index using a GARCH(1,2) expected
return model and implement Dwass (1957) and Barnard’s (1963) Monte Carlo test technique to
calculate p-values for the ARCH LM test at a varying number of replications. I also provide a
bias-corrected estimate of the mean return generated from the GARCH(1,2) model using
bootstrap methods. The results shown in this paper reveal the significant changes in the p-value
associated with the LM test-statistic and the estimate for the mean return after correcting for bias.
This highlights for financial econometricians the finite sample properties of the LM test and the
related implications. This research also highlights the potential biasedness of the maximum
2. Literature Review
In his 1982 paper, Engle introduces the ARCH model and ARCH Lagrange Multiplier test
based on the autocorrelation of the squared OLS residuals. Unlike traditional models, Engle
assumes a non-constant one-period forecast variance. Past attempts to model the mean and
variables is of importance since economic theory suggests that agents respond to moments
of economic variables beyond just the mean. For example, in portfolio theory, the mean and
variance are used as measures for return and risk, which determine portfolio allocation decisions.
Engle allows for the variance to change stochastically and finds that UK inflation has significant
ARCH effects. He finds that specifying models with disturbances that follow an ARCH process
results in better-performing least squares models and more realistic forecast variances. Engle also
shows that maximum likelihood estimation is more efficient than OLS when disturbances follow
an ARCH process.
Another empirical paper by French, Schwert & Stambaugh (1987) examines the relationship
between the expected risk premium (excess return) on common stocks and predictable volatility
using a market model. The authors use a GARCH-M model on S&P 500 data but do not conduct
any ARCH LM tests. They find a positive relationship between excess returns and predictable
volatility. They also find that the GARCH-M model produces similar results to an ARIMA
This paper implements a Monte Carlo test procedure similar to Dufour, Khalaf, Bernard &
Genest (2004). However, Dufour et al. (2004) conduct a pure simulation-based procedure and do
not use actual data. The authors conduct Monte Carlo experiments under specific data generating
processes for the disturbances to analyze the properties of finite sample tests for
heteroscedasticity and ARCH effects. To analyze the power of the tests, the authors obtain
rejection percentages at a 5% nominal size using 10,000 replications. They find that Engle’s test
is undersized, which leads to substantial power losses, and that the size distortions are larger
when non-normal errors are used in the DGPs. One of the most important findings in this paper
is that the Monte Carlo tests yield significant power gains, especially with non-normal errors.
My project contributes to the literature by simulating the finite sample null distribution
through Monte Carlo procedures to analyze the properties of finite sample ARCH LM tests.
Unlike the aforementioned papers, I conduct the Monte Carlo test procedure using actual data
and also obtain bias-corrected maximum likelihood estimates for the mean return by
bootstrapping the bias.
3. Data
Daily index prices for the S&P 500 are obtained for the period January 3, 1950 to March 7,
2018. There are a total of 17,155 observations that include trading days only. The data were
retrieved from Yahoo! Finance¹. Daily returns are constructed by taking the first difference of
the log price series. Figure 1 below plots the daily returns and Figure 2 plots the daily squared
returns. Summary statistics are provided in Table 1 below. The mean daily return over the
sample period is 0.03% with a standard deviation of 0.97% and a variance of 0.93%-squared.
Daily returns over the sample range from a low of -22.90% to a high of 10.96%. The mean
squared daily return is 0.93 with a standard deviation of 5.03 and a variance of 25.30, where
returns are measured in percent before squaring.
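The return construction described above can be sketched in Python. This is only an illustration under stated assumptions, not the code used in the paper (the Appendix uses EViews): the price list is hypothetical, and the name `log_returns` is introduced here for convenience.

```python
import math
import statistics

def log_returns(prices):
    """First difference of the log price series, expressed in percent."""
    return [100.0 * (math.log(p1) - math.log(p0))
            for p0, p1 in zip(prices, prices[1:])]

# Hypothetical closing prices, for illustration only (not the S&P 500 data).
prices = [100.0, 101.0, 99.5, 99.5, 102.0]

returns = log_returns(prices)
squared = [r * r for r in returns]  # squared returns, used as a volatility proxy

# The same summary statistics that Table 1 reports.
summary = {
    "mean": statistics.mean(returns),
    "std_dev": statistics.stdev(returns),
    "variance": statistics.variance(returns),
    "min": min(returns),
    "max": max(returns),
}
```

Log returns are additive over time, so the sum of the daily values equals the log return over the whole window.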
¹ Data source: https://ca.finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC
As is evident in Figure 1, the biggest one-day decline in the S&P 500 of -22.9% occurred on
October 19, 1987, now referred to as “Black Monday.” Other periods of increased volatility
include the late-1990s and early-2000s, which coincide with the dot-com bubble. The most
recent period of increased volatility occurred during the global financial crisis in late-2008.
Figure 2 plots the daily squared returns, which further highlights the periods of increased
volatility. The spikes in volatility in 1987, the late-1990s, early-2000s and in 2008 are also
Table 1: Summary statistics (returns in %)
            Mean   Std Dev     Var      Min      Max
Return      0.03      0.97    0.93   -22.90    10.96
Return²     0.93      5.03   25.30     0.00   524.40
Figure 1 – Daily returns of S&P 500
Figure 2 – Daily squared returns of S&P 500
Figure 3 shows the correlogram of the daily returns, while Figure 4 shows that of the squared daily
returns. At first glance, the returns do not appear to be autocorrelated, as the correlogram does not
(PACFs) also do not show any large spikes, indicating that there is no autocorrelation in the
series. The latter graph of the squared daily returns is presented in order to provide evidence for
conditional heteroscedasticity (volatility clustering). The ACFs of the squared returns exhibit a
slow decay, indicating their autoregressive behaviour. The PACFs do not show as much of a
slow-decaying pattern as the ACFs but tend to spike at specific lag orders. This indicates that,
after controlling for a specific lag, all other lags exhibit autoregressive behaviour, which hints at
It should be noted that for the returns, the Ljung-Box Q-statistics are large and their
associated p-values are zero, indicating a rejection of the null hypothesis that there is no
autocorrelation up to that specific lag. Thus, we cannot say that the returns are not
autocorrelated. However, there is an even stronger rejection of the null hypothesis of no
autocorrelation in the squared returns since the Q-statistics range in the thousands after lag order
1, while they were less than 100 for the returns themselves.
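The Ljung-Box Q-statistic referred to above can be sketched as follows. This is an illustration under stated assumptions, not the EViews correlogram routine: the two input series are simulated stand-ins (white noise and a persistent AR(1) process), and the function names are introduced here.

```python
import random

def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(x, max_lag):
    """Ljung-Box Q = n (n + 2) * sum over k = 1..max_lag of rho_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

# Simulated stand-ins for the data (assumption): white noise and an AR(1) series.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
persistent = [0.0]
for shock in noise[1:]:
    persistent.append(0.8 * persistent[-1] + shock)

q_noise = ljung_box_q(noise, 12)            # small when there is no autocorrelation
q_persistent = ljung_box_q(persistent, 12)  # large when autocorrelation is strong
```

Under the null of no autocorrelation up to lag 12, Q is approximately chi-square with 12 degrees of freedom, so values in the thousands amount to a very strong rejection.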
4. Methodology
Since the squared returns exhibit a large degree of autoregressive behaviour, we could model
the series with multiple AR terms as independent variables. Mills and Markellos (2008) control
for the high number of AR terms by including an MA(1) process as an independent variable, and
model the series as a GARCH(1,2). This is consistent with the literature that suggests modeling
series that have slow-decaying ACFs and spikes in PACFs as GARCH processes (Penn State
University, n.d.). Originally, an ARCH(1) model was specified but it did not adequately treat the
autocorrelation in the error variances. This led to a zero p-value associated with the LM-statistic
for each replication, and consequently a zero Monte Carlo p-value for every different number of
replications, an uninteresting result. Instead, I follow Mills and Markellos (2008) and estimate a
GARCH(1,2) model on S&P 500 daily returns, except I exclude the MA(1) term as an
The sample regression is a GARCH(1,2) constant expected return model and regresses the
$$r_t = \mu + \epsilon_t$$
That is, the daily return is composed of a constant non-zero mean, $\mu$, and a disturbance, $\epsilon_t$,
which is the product of a time-dependent standard deviation and a stochastic process, i.e., a white
noise term:
$$\epsilon_t = \sigma_t e_t$$
$$e_t \sim \text{white noise}(0, 1)$$
The original ARCH(1) model specifies the variance equation as:
$$\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2$$
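To make the mean and ARCH(1) variance equations concrete, the following Python sketch simulates a return series from them. It rests on assumptions that are not in the paper: the white noise term is drawn as standard normal, the lagged disturbance starts at zero, and the parameter values are hypothetical.

```python
import math
import random

def simulate_arch1(n_obs, mu, alpha0, alpha1, seed=0):
    """Simulate r_t = mu + eps_t, where eps_t = sigma_t * e_t and
    sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2 (e_t drawn as standard normal)."""
    rng = random.Random(seed)
    eps_prev = 0.0  # starting value for the lagged disturbance (assumption)
    returns, variances = [], []
    for _ in range(n_obs):
        sigma2 = alpha0 + alpha1 * eps_prev ** 2   # conditional variance
        eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        returns.append(mu + eps)
        variances.append(sigma2)
        eps_prev = eps
    return returns, variances

# Hypothetical parameter values, chosen only for illustration.
returns, variances = simulate_arch1(n_obs=1000, mu=0.05, alpha0=0.5, alpha1=0.4)
```

A large shock in one period raises the conditional variance in the next, which is how volatility clustering arises in this model.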
The Monte Carlo test technique begins by estimating the sample regression, in this case, the
GARCH(1,2) mean model of daily returns. Engle’s ARCH LM test is then performed post-estimation
to obtain the sample LM test statistic, LM0. The standard error of the
regression, $\hat{\sigma}$, is used to generate a parametric random sample of residuals for each
replication j, j = 1, 2, …, J:
$$v_t \sim N(0, \hat{\sigma}^2)$$
For each replication, a series of new returns, $newr_t$, is generated using the estimated mean return
$$newr_t = \hat{\mu} + v_t$$
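The parametric generation step can be sketched in Python as follows. This is not the paper's EViews code. The values of the mean (0.0492%, written as a decimal) and the standard error (0.0097) are the estimates reported in the results section; the series length of 1,000, the seed, and the function name are assumptions made for illustration.

```python
import random
import statistics

def simulate_returns(mu_hat, sigma_hat, n_obs, rng):
    """Draw v_t ~ N(0, sigma_hat^2) and return the series mu_hat + v_t."""
    return [mu_hat + rng.gauss(0.0, sigma_hat) for _ in range(n_obs)]

rng = random.Random(546)   # arbitrary seed (assumption)
mu_hat = 0.000492          # estimated mean daily return, 0.0492% as a decimal
sigma_hat = 0.0097         # reported standard error of the regression

# One replication; the Monte Carlo procedure repeats this J times.
new_r = simulate_returns(mu_hat, sigma_hat, n_obs=1000, rng=rng)

check_mean = statistics.mean(new_r)
check_sd = statistics.stdev(new_r)
```

Because the draws are independent with constant variance, each simulated series satisfies the null hypothesis of no ARCH effects by construction.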
The J return series are used to estimate the GARCH(1,2) model J times. Obtaining the ARCH
LM test-statistic programmatically in EViews requires a few steps. First, the residuals and the
variance from the regression are obtained, where the variance is used to scale the residuals to
$$\tilde{e} = \frac{\hat{e}}{\sqrt{\tilde{\sigma}^2}}$$
Second, the weighted-residuals are squared and used in the following auxiliary regression to test
Following Mills and Markellos (2008), I use a constant and 12 lags of the squared weighted residual as
independent variables. Finally, the ARCH LM test statistic is obtained by multiplying the
number of observations by the R-squared from the auxiliary regression, and is denoted LMj.
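The steps above can be sketched in plain Python. This is an illustrative implementation, not the EViews routine: the OLS fit uses a small Gaussian-elimination solver written here, the number of observations is taken to be the sample size of the auxiliary regression, and the two residual series are simulated stand-ins.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def arch_lm_stat(resid, lags=12):
    """T * R^2 from regressing squared residuals on a constant and their lags."""
    sq = [e * e for e in resid]
    y = sq[lags:]
    X = [[1.0] + [sq[t - k] for k in range(1, lags + 1)]
         for t in range(lags, len(sq))]
    p = lags + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return len(y) * (1.0 - ss_res / ss_tot)

# Simulated stand-ins (assumption): i.i.d. residuals and ARCH(1) residuals.
rng = random.Random(7)
iid_resid = [rng.gauss(0.0, 1.0) for _ in range(1500)]
arch_resid, prev = [], 0.0
for _ in range(1500):
    prev = math.sqrt(0.5 + 0.5 * prev ** 2) * rng.gauss(0.0, 1.0)
    arch_resid.append(prev)

lm_iid = arch_lm_stat(iid_resid)    # small when there are no ARCH effects
lm_arch = arch_lm_stat(arch_resid)  # much larger when ARCH effects are present
```

In the paper's setting the input would be the weighted residuals from the fitted GARCH(1,2) model rather than these simulated series.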
To obtain a Monte Carlo p-value, I count the number of times LMj is greater than LM0:
$$p_{MC} = \frac{count + 1}{J + 1}$$
According to Dufour (2006), implementing permutation tests has the desirable feature that exact
(randomized) tests can be obtained for statistics with intractable finite sample distributions that
can be simulated, as long as there are no nuisance parameters. Dwass (1957) proved that the
choice of count will result in an exact test of size k/Q for a chosen significance level. To achieve
this result, the count must be an integer, and therefore the number of permutations, or
$$count = k \times (J + 1)/Q - 1,$$
$$\alpha \times (J + 1)$$
For example, for a significance level of 5%, J can be as low as 19. Dwass showed that the
number of replications achieves size control as long as the above is satisfied, and does not impact
the validity of the test, only the power. Increasing J will increase power but power gains at very
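The integer requirement on α × (J + 1) can be checked with a short Python sketch. Exact fractions are used to avoid floating-point rounding; the function names are introduced here for illustration.

```python
from fractions import Fraction

def is_exact_choice(alpha, n_rep):
    """True when alpha * (J + 1) is an integer, with alpha given as a Fraction."""
    return (alpha * (n_rep + 1)).denominator == 1

def smallest_replications(alpha):
    """Smallest J >= 1 for which alpha * (J + 1) is an integer."""
    n_rep = 1
    while not is_exact_choice(alpha, n_rep):
        n_rep += 1
    return n_rep

five_percent = Fraction(5, 100)
smallest_at_5pct = smallest_replications(five_percent)   # 0.05 * (19 + 1) = 1
```

The same check can be run for any candidate J before starting a simulation, so that the nominal level is attained exactly.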
4.3. Bootstrap bias correction
I also provide bias-corrected estimates of the mean return from the GARCH(1,2) model by
bootstrapping the bias. The first step is to estimate the GARCH model, save the residuals and the
parameter, $\mu_n$, while the expected value of the estimator is the average of all the mean returns
$$bias = E(\hat{\mu}_n) - \mu_n$$
$$bias_{boot} = \bar{\mu}_B - \mu_n$$
where $\bar{\mu}_B = \frac{1}{B} \sum_{b=1}^{B} \mu_{n,b}$
and B is the number of replications. Similar to the Monte Carlo test procedure, a new series
of returns is generated using the estimated mean return and a random sample of residuals. For
replacement). B estimations of the GARCH model are performed, each regressing the new return
series on a constant. The estimated mean returns from the B regressions are the $\mu_{n,b}$ in the third
equation, and are averaged to obtain $\bar{\mu}_B$. The bias-corrected mean return is then:
$$\mu_{boot} = \mu_n - bias_{boot} = 2\mu_n - \bar{\mu}_B$$
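The bootstrap bias correction described in this subsection can be sketched in Python. This is an illustration under a loud assumption: the estimator is passed in as a function, and the example uses the plain sample mean as a stand-in for the GARCH maximum likelihood estimate of the mean, which the paper obtains in EViews. The data are simulated and all names are hypothetical.

```python
import random

def bootstrap_bias_correction(returns, estimator, n_boot, seed=0):
    """Residual bootstrap of the bias of `estimator` for the mean return."""
    rng = random.Random(seed)
    mu_n = estimator(returns)                     # sample estimate
    residuals = [r - mu_n for r in returns]       # residuals from the mean model
    boot = []
    for _ in range(n_boot):
        resampled = rng.choices(residuals, k=len(residuals))  # with replacement
        boot.append(estimator([mu_n + e for e in resampled])) # re-estimate
    mu_bar = sum(boot) / n_boot                   # average bootstrap estimate
    bias = mu_bar - mu_n                          # bootstrap estimate of the bias
    return {"mu_n": mu_n, "mu_bar": mu_bar,
            "bias_boot": bias, "mu_boot": mu_n - bias}

def sample_mean(x):
    """Stand-in estimator (assumption); the paper uses the GARCH(1,2) MLE."""
    return sum(x) / len(x)

rng = random.Random(42)
toy_returns = [rng.gauss(0.05, 1.0) for _ in range(500)]   # simulated data
result = bootstrap_bias_correction(toy_returns, sample_mean, n_boot=200)
```

The corrected estimate equals twice the sample estimate minus the bootstrap average, which matches the `t_boot=2*tb-tbarb` line in the Appendix code.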
As can be seen in Table 2 below, the estimated conditional mean daily return is 0.049%. This
is quite different from the unconditional mean of 0.03% reported in Table 1. The reported
standard error is 0.0000519, which leads to a Z-statistic of 9.49 and an associated p-value of zero.
Thus, we would reject the null hypothesis of a zero mean return at all conventional levels. The variance
equation estimates indicate that the variance is adequately specified as a GARCH(1,2) process as
can be seen in the second panel of the table. The estimate for the constant term is near zero,
while the one-period lagged squared residual term has a coefficient of 0.11. The Z-statistic on the
squared residual term is 35.08 with a p-value of zero, indicating that there are statistically
significant ARCH effects. The coefficients on the GARCH terms are 0.60 and 0.29 for lags of
one and two, respectively. Both have p-values of zero, indicating that there are statistically
significant GARCH effects. If there were no ARCH or GARCH effects, only the constant term
would be significant, and if there were only ARCH effects, only the constant and squared
The reported sample LM statistic, which I denote LM0, is 11.55 with an associated p-value of
0.48. This value will be compared to each replication of the LM statistic, denoted LMj, in the
Monte Carlo test method. Since the p-value is greater than the conventional significance levels,
we cannot reject the null and can conclude that the GARCH(1,2) specification adequately treats
the conditional heteroscedasticity. The reported standard error of the regression, 0.0097, is used
in the parametric generation of new residuals in the Monte Carlo test procedure.
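As a check on the asymptotic p-value, the sketch below evaluates the chi-square upper-tail probability in Python. It assumes the reported 0.48 comes from a chi-square distribution with 12 degrees of freedom, one per lag in the auxiliary regression. The closed form used here holds only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), for even df only:
    exp(-x/2) * sum over i = 0..df/2 - 1 of (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

lm0 = 11.55                               # sample LM statistic from the text
p_asymptotic = chi2_sf_even_df(lm0, 12)   # close to the reported 0.48
```

This is the limiting-distribution benchmark against which the Monte Carlo p-values are compared.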
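I simulate Monte Carlo p-values at 19, 49, 99, 199, 499 and 999 replications. As discussed in
Section 4.2, these numbers of replications are chosen with Dwass’ (1957) criterion for an exact test
in mind: α(J + 1) is an integer at the 10% level for every J used here, and at the 5% level for
every J except 49 (0.05 × 50 = 2.5).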
The results in Table 3 show that as the number of replications increases, the p-value decreases
until the number of replications reaches 199. The p-value is as high as 0.50 at 19 replications and
as low as 0.25 at 199 replications. These values are in contrast to the sample p-value of 0.48.
Nonetheless, we would still fail to reject the null hypothesis, and can conclude that the
Table 3: Monte Carlo p-values
Number of replications   Monte Carlo p-value
19                       0.50
49                       0.28
99                       0.27
199                      0.25
499                      0.26
999                      0.27
Figure 5 shows the distribution of the LM statistics for each number of replications (i.e. the
simulated finite sample null distribution). It is interesting to see how the chi-square distribution
becomes more evident as J increases. At higher values of J (i.e. greater than 19), the majority of
the LM statistics range between eight and ten, which may be quite different from the sample
value of 11.55.
Figure 5 – Distribution of the simulated LM statistics (density of LM_STAT; one panel each for J = 19, 49, 99, 199, 499 and 999)
A bias-corrected mean return is obtained by bootstrapping the estimator’s bias with 100 and
1,000 replications. The bias is -0.0199% at 100 replications and -0.0193% at 1,000 replications.
The resulting bias-corrected mean returns are 0.0691% and 0.0685%, respectively, indicating a
Table 4: Bootstrap bias-corrected mean return, bias, and expected mean return (%)
Number of replications   $\mu_{boot}$   $bias_{boot}$   $\bar{\mu}_B$
100                      0.0691         -0.0199         0.0293
1,000                    0.0685         -0.0193         0.03
5.4. Implications
The Monte Carlo LM tests produced significant differences between the sample p-value and
the Monte Carlo p-values. At 199 replications the p-value was 0.25 compared to the sample
value of 0.48, a difference of 23 percentage points. This suggests that Engle’s LM test may lead
us to under-reject the null in finite sample sizes. This has negative consequences for modeling
time series that have ARCH effects; one might reject at higher significance levels than
appropriate², or fail to detect ARCH effects altogether. As Dufour et al. (2004) highlight, the
estimate for the conditional mean return can be inconsistent if one is unable to detect
There was also a significant difference between the sample mean daily return and its bias-corrected
counterpart. At 1,000 replications, the bias-corrected mean return was 0.0193 percentage points
higher than the 0.0492% sample estimate, an increase of almost 40%. On an annual return
basis, this difference is around 5 percentage points (the difference between 12% and 17%),
obtained by multiplying the mean daily return by 252 trading days. The consequence is that
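The annualization arithmetic can be reproduced with a few lines of Python. The inputs are the daily estimates reported in the text; the variable names are introduced here.

```python
TRADING_DAYS = 252

mle_daily = 0.0492         # sample (maximum likelihood) mean daily return, %
corrected_daily = 0.0685   # bias-corrected mean daily return at 1,000 replications, %

# Simple annualization: daily mean times the number of trading days.
annual_mle = mle_daily * TRADING_DAYS              # about 12.4%
annual_corrected = corrected_daily * TRADING_DAYS  # about 17.3%
annual_gap = annual_corrected - annual_mle         # about 4.9 percentage points

# Relative size of the correction, as a share of the sample estimate.
relative_increase = (corrected_daily - mle_daily) / mle_daily  # about 0.39
```

These figures match the rounded values of 12%, 17%, roughly 5 percentage points, and almost 40% quoted above.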
6. Conclusion
This paper obtains the finite sample null distribution of the ARCH LM statistic through
Monte Carlo simulations to reveal the poor approximation of the asymptotic null distribution in
finite samples. From the simulated LM statistics, Monte Carlo p-values are obtained at a varying
number of replications. A bias-corrected mean return is also obtained by bootstrapping the bias.
The significant difference between the sample p-value and the Monte Carlo p-value suggests
that employing the ARCH LM test in finite sample sizes can result in an under-rejection of the
null hypothesis of no conditional heteroscedasticity. This could lead to inconsistent
conditional moment estimates in GARCH models, and result in inaccurate forecast variances,
which is undesirable for investors or policymakers who regularly conduct forecasts. For
empirical papers that use ARCH LM tests, it would be advisable to conduct Monte Carlo tests.
The estimate of the mean daily return from the GARCH(1,2) model was 0.0492%. After
correcting for the negative bias, this estimate increases to 0.0685% at 1,000 replications. This
reveals that even with fairly large samples, the maximum likelihood estimator is quite biased,
and this has the implication that average returns may be higher than what models predict.
² For example, for a sample p-value of 7% and a Monte Carlo p-value of 2%, one would reject at the 10% level
when they should reject at the 5% level.
This paper creates an opening for a fair amount of future research. The Monte Carlo test
procedure in this paper randomly generates residuals parametrically, which assumes that the
residuals are normally distributed with mean zero and standard deviation equal to the standard
error of the GARCH(1,2) regression. Further research could see how the results differ by either
As in Dufour et al. (2004), the residuals could take on a standard normal, chi-square, Student’s
This paper could also be extended to look at the power of the test. Dufour et al. (2004)
analyze the power of the ARCH LM test but do not apply the procedure to actual data. One could
generate the new return series using a DGP that satisfies the alternative hypothesis of conditional
heteroscedasticity.
Other areas of future research could also re-specify the original model. The GARCH(1,2)
model is just one of several time series specifications. For example, one could instead model a
(EGARCH) or an Asymmetric Power ARCH (APARCH) process, which model the mean or
constant expected return model, one could estimate a CAPM like French et al. (1987).
Finally, the methods employed here could also be applied to other datasets that exhibit
Appendix
cd "\\home\mcmow\econ546\project\"
freeze(correl) returns.correl(12)
correl.save(t=png) \\home\mcmow\econ546\project\correl.png
freeze(correl_sq) returns_sq.correl(12)
correl_sq.save(t=png) \\home\mcmow\econ546\project\correl_sq.png
' **********************************************************
' ********** THE SAMPLE VALUE OF LM ***********
' **********************************************************
' estimate ARCH/GARCH model
equation arch1.arch(1, 2) returns c
' **********************************************************
' ****** THE MONTE CARLO VALUE OF LM ******
' **********************************************************
matrix(nrep, 1) lm_stat
if lm_mc>=lm0 then
scalar count=count+1
endif
next
scalar pvalue=(count+1)/(nrep+1)
wfsave main
A2. Code from bootstrap bias correction
cd "\\home\mcmow\econ546\project\"
rndseed 88888888
scalar nrep=100
scalar sum=0
scalar tbarb=sum/nrep
scalar bias_boot=tbarb-tb
scalar t_boot=2*tb-tbarb
scalar sum2=0
scalar vboot=sum2/nrep
wfsave bias_correct
References
Clarke, J. (2018). A brief look at some simulation methods. ECON 546: Themes in Econometrics
lecture notes.
Dufour, J-M. (2006). Monte Carlo tests with nuisance parameters: A general approach to finite-
477.
Dufour, J-M., Khalaf, L., Bernard, J-T., & Genest, I. (2004). Simulation-based finite-sample tests
French, K.R., Schwert, G.W., & Stambaugh, R.F. (1987). Expected stock returns and volatility.
Mandelbrot, B. (1963). The Variation of Certain Speculative Prices. The Journal of Business,
36(4), 394-419.
Mills, T.C. & Markellos, R.N. (2008). The Econometric Modelling of Financial Time Series
Penn State University, STAT 510 (n.d.). Department of Statistics Online Learning. Retrieved
from: https://onlinecourses.science.psu.edu/stat510/node/85
Yahoo Finance (2018, March 7). S&P 500 (^GSPC). SNP - SNP Real Time Price. Currency in
USD. Retrieved from:
https://ca.finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC