Solutions To The Review Questions at The End of Chapter 5
2. ARMA models are of particular use for financial series due to their
flexibility. They are fairly simple to estimate, can often produce reasonable
forecasts, and most importantly, they require no knowledge of any structural
variables that might be required for more “traditional” econometric analysis.
ARMA models can also be used when the data are available at high frequencies, whereas exogenous “explanatory” variables (e.g. macroeconomic variables, accounting ratios) may be unobservable at any more than monthly intervals at best.
3. yt = yt-1 + ut (1)
yt = 0.5 yt-1 + ut (2)
yt = 0.8 ut-1 + ut (3)
(a) The first two models are roughly speaking AR(1) models, while the last is
an MA(1). Strictly, since the first model is a random walk, it should be called
an ARIMA(0,1,0) model, but it could still be viewed as a special case of an
autoregressive model.
(b) We know that the theoretical acf of an MA(q) process will be zero after q
lags, so the acf of the MA(1) will be zero at all lags after one. For an
autoregressive process, the acf dies away gradually. It will die away fairly
quickly for case (2), with each successive autocorrelation coefficient taking on
a value equal to half that of the previous lag. For the first case, however, the acf
will never die away, and in theory will always take on a value of one, whatever
the lag.
Turning now to the pacf, the pacf for the first two models would have a large
positive spike at lag 1, and no statistically significant pacf’s at other lags.
Again, the unit root process of (1) would have a pacf the same as that of a
stationary AR process. The pacf for (3), the MA(1), will decline geometrically.
(c) Clearly the first equation (the random walk) is more likely to represent
stock prices in practice. The discounted dividend model of share prices states
that the current value of a share will be simply the discounted sum of all
expected future dividends. If we assume that investors form their expectations
about dividend payments rationally, then the current share price should already embody all available information about those future dividends, and the price should change only when new, unforecastable information arrives. Thus stock prices should follow a random walk.
similar rational expectations and random walk model to many other kinds of
financial series.
If the stock market really followed the process described by equations (2) or
(3), then we could potentially make useful forecasts of the series using our
model. In the latter case of the MA(1), we could only make one-step ahead
forecasts since the “memory” of the model is only that length. In the case of
equation (2), we could potentially make a lot of money by forming multiple
step ahead forecasts and trading on the basis of these.
Hence after a period, it is likely that other investors would spot this potential
opportunity and hence the model would no longer be a useful description of
the data.
(d) See the book for the algebra. This part of the question is really an extension
of the others. Analysing the simplest case first, the MA(1), the “memory” of
the process will only be one period, and therefore a given shock or
“innovation”, ut, will only persist in the series (i.e. be reflected in yt) for one
period. After that, the effect of a given shock would have completely worked
through.
For the case of the AR(1) given in equation (2), a given shock, ut, will persist
indefinitely and will therefore influence the properties of yt for ever, but its
effect upon yt will diminish exponentially as time goes on.
For the random walk given in equation (1), the series yt could be written as an infinite sum of past shocks, and therefore the effect of a given shock will persist indefinitely, and its effect will not diminish over time.
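The three cases can be summarised numerically with a small sketch (not in the original solutions) that traces the effect of a unit shock ut on yt+h for the general form yt = phi*yt-1 + ut + theta*ut-1:

```python
def impulse_response(phi=0.0, theta=0.0, horizon=6):
    # Effect of a unit shock u_t on y_{t+h} for y_t = phi*y_{t-1} + u_t + theta*u_{t-1}
    resp = [1.0]          # h = 0: contemporaneous effect of the shock
    prev = 1.0
    for h in range(1, horizon + 1):
        cur = phi * prev + (theta if h == 1 else 0.0)
        resp.append(cur)
        prev = cur
    return resp

random_walk = impulse_response(phi=1.0)    # equation (1): the shock never dies away
ar1 = impulse_response(phi=0.5)            # equation (2): effect halves each period
ma1 = impulse_response(theta=0.8)          # equation (3): effect gone after one period
```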
4. (a) Box and Jenkins were the first to consider ARMA modelling in this
logical and coherent fashion. Their methodology consists of 3 steps:
Identification - determining the appropriate order of the model using
graphical procedures (e.g. plots of autocorrelation functions).
Estimation - of the parameters of the model whose order was determined in the first stage.
This can be done using least squares or maximum likelihood, depending on
the model.
Diagnostic checking - this step is to ensure that the model actually estimated
is “adequate”. B & J suggest two methods for achieving this: overfitting and residual diagnostics.
If the model appears to be adequate, then it can be used for policy analysis
and for constructing forecasts. If it is not adequate, then we must go back to
stage 1 and start again!
(b) The main problem with the B & J methodology is the inexactness of the
identification stage. Autocorrelation functions and partial autocorrelations for
actual data are very difficult to interpret accurately, rendering the whole
procedure often little more than educated guesswork. A further problem
concerns the diagnostic checking stage, which will only indicate when the
proposed model is “too small” and would not inform on when the model
proposed is “too large”.
We can calculate the values of Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (SBIC) using the following respective formulae
AIC = ln(σ̂²) + 2k/T
SBIC = ln(σ̂²) + k ln(T)/T
where σ̂² is the residual variance, k is the total number of parameters estimated, and T is the sample size.
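A minimal sketch of the two criteria, where sigma2 stands for the residual variance, k for the number of parameters and T for the sample size:

```python
import math

def aic(sigma2, k, T):
    # AIC = ln(residual variance) + 2k/T
    return math.log(sigma2) + 2 * k / T

def sbic(sigma2, k, T):
    # SBIC = ln(residual variance) + k*ln(T)/T
    return math.log(sigma2) + k * math.log(T) / T
```

Note that for any sample larger than about 7 observations, ln(T) > 2, so SBIC penalises additional parameters more heavily than AIC and tends to select smaller models.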
5. The best way to check for stationarity is to express the model as a lag
polynomial in yt.
yt = 0.803yt-1 + 0.682yt-2 + ut
Rewrite this as
(1 - 0.803L - 0.682L²)yt = ut
We want to find the roots of the lag polynomial 1 - 0.803L - 0.682L² = 0 and determine whether they are greater than one in absolute value. It is easier (in my opinion) to rewrite this formula (by multiplying through by -1/0.682, using z for the characteristic equation and rearranging) as
z² + 1.177z - 1.466 = 0
Applying the quadratic formula,
z = [-1.177 ± √(1.177² + 4 × 1 × 1.466)]/2
z = 0.758 or -1.934
Since ALL the roots must be greater than one in absolute value for the model to be stationary, and the first root (0.758) is not, we conclude that the estimated model is not stationary in this case.
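The root calculation can be checked numerically; a short sketch applying the quadratic formula to the characteristic equation from this question:

```python
import math

# Characteristic equation z**2 + 1.177*z - 1.466 = 0 from question 5
a, b, c = 1.0, 1.177, -1.466
disc = math.sqrt(b * b - 4 * a * c)
roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]

# Stationarity requires every root to lie outside the unit circle
stationary = all(abs(z) > 1 for z in roots)
```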
6. Using the formulae above, we end up with the following values for each
criterion and for each model order (with an asterisk denoting the smallest
value of the information criterion in each case).
The result is pretty clear: both SBIC and AIC say that the appropriate model is
an ARMA(3,2).
7. We could still perform the Ljung-Box test on the residuals of the estimated
models to see if there was any linear dependence left unaccounted for by our
postulated models.
Another test of the models’ adequacy that we could use is to leave out some of
the observations at the identification and estimation stage, and attempt to
construct out of sample forecasts for these. For example, if we have 2000
observations, we may use only 1800 of them to identify and estimate the
models, and leave the remaining 200 for construction of forecasts. We would
then prefer the model that gave the most accurate forecasts.
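The holdout procedure described above amounts to a simple split of the sample; a sketch (the series here is just a stand-in for the 2000 observations):

```python
def holdout_split(series, n_holdout):
    # Keep the last n_holdout observations for out-of-sample forecast evaluation
    return series[:-n_holdout], series[-n_holdout:]

data = list(range(2000))                  # stand-in for a series of 2000 observations
train, holdout = holdout_split(data, 200) # identify/estimate on 1800, forecast the last 200
```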
8. This is not true in general. Yes, we do want to form a model which “fits” the
data as well as possible. But in most financial series, there is a substantial
amount of “noise”. This can be interpreted as a number of random events that
are unlikely to be repeated in any forecastable way. We want to fit a model to
the data which will be able to “generalise”. In other words, we want a model
which fits to features of the data which will be replicated in future; we do not
want to fit to sample-specific noise.
This clearly looks like the data are consistent with a first order moving average
process since all but the first acfs are not significant (the significant lag 4 acf is
a typical wrinkle that one might expect with real data and should probably be
ignored), and the pacf has a slowly declining structure.
Q* = T(T+2) Σ τk²/(T-k), where the sum runs from k = 1 to m.
In this case, T = 100 and m = 3. The null hypothesis is H0: τ1 = 0 and τ2 = 0 and τ3 = 0. The test statistic is calculated by substituting the sample autocorrelation coefficients into this formula.
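A sketch of the Q* calculation; the autocorrelation coefficients below are illustrative placeholders, since the sample values from the question are not reproduced here:

```python
def ljung_box(taus, T):
    # Q* = T(T+2) * sum over k of tau_k**2 / (T - k), with k = 1, ..., m
    return T * (T + 2) * sum(t ** 2 / (T - k) for k, t in enumerate(taus, start=1))

# T = 100 and m = 3 as in the question; these tau values are hypothetical
q_star = ljung_box([0.42, 0.04, 0.01], T=100)
```

Under the null, Q* follows a chi-squared distribution with m degrees of freedom, so the computed statistic would be compared with the chi-squared(3) critical value (7.81 at the 5% level).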
Suppose that we know ut-1, ut-2, ... and we are trying to forecast yt. For the MA(1), yt = ut + b1ut-1, so our forecast for yt, the one-step ahead forecast made at time t-1, is the conditional expectation
ft-1,1 = Et-1(yt) = E(ut + b1ut-1 | ut-1, ut-2, ...) = b1ut-1
But for the two-step ahead forecast,
ft-1,2 = Et-1(yt+1) = E(ut+1 + b1ut | ut-1, ut-2, ...) = 0
since neither ut+1 nor ut is known at time t-1.
(b) Given the forecasts and the actual value, it is very easy to calculate the
MSE by plugging the numbers in to the relevant formula, which in this case is
MSE = (1/N) Σ (xt+n - ft,n)², where the sum runs from n = 1 to N.
MSE = (1/3)[(1.836 - (-0.032))² + (1.302 - 0.961)² + (0.935 - 0.203)²]
= (1/3)(3.489 + 0.116 + 0.536) = 1.380
Notice also that 84% of the total MSE is coming from the error in the first
forecast. Thus error measures can be driven by one or two times when the
model fits very badly. For example, if the forecast period includes a stock
market crash, this can lead the mean squared error to be 100 times bigger than
it would have been if the crash observations were not included. This point
needs to be considered whenever forecasting models are evaluated. An idea of
whether this is a problem in a given situation can be gained by plotting the
forecast errors over time.
(c) This question is much simpler to answer than it looks! In fact, the
inclusion of the smoothing coefficient is a “red herring” - i.e. a piece of
misleading and useless information. The correct approach is to say that if we
believe that the exponential smoothing model is appropriate, then all useful
information will have already been used in the calculation of the current
smoothed value (which will of course have used the smoothing coefficient in
its calculation). Thus the three forecasts are all 0.0305.
(d) The solution is to work out the mean squared error for the exponential smoothing model in exactly the same way as in part (b), with each of the three forecasts now equal to the smoothed value of 0.0305.
Therefore, we conclude that since the mean squared error is smaller for the
exponential smoothing model than the Box Jenkins model, the former
produces the more accurate forecasts. We should, however, bear in mind that
the question of accuracy was determined using only 3 forecasts, which would
be insufficient in a real application.
11. (a) The shapes of the acf and pacf are perhaps best summarised in a table:
A couple of further points are worth noting. First, it is not possible to tell what
the signs of the coefficients for the acf or pacf would be for the last three
processes, since that would depend on the signs of the coefficients of the
processes. Second, for mixed processes, the AR part dominates from the point
of view of acf calculation, while the MA part dominates for pacf calculation.
(b) The important point here is to focus on the MA part of the model and to
ignore the AR dynamics. The characteristic equation would be
(1+0.42z) = 0
The root of this equation is -1/0.42 = -2.38, which lies outside the unit circle,
and therefore the MA part of the model is invertible.
(c) Since no values for the series y or the lagged residuals are given, the
answers should be stated in terms of y and of u. Assuming that information is
available up to and including time t, the 1-step ahead forecast would be for
time t+1, the 2-step ahead for time t+2 and so on. A useful first step would be
to write the model out for y at times t+1, t+2, t+3, t+4:
The 1-step ahead forecast would simply be the conditional expectation of y for
time t+1 made at time t. Denoting the 1-step ahead forecast made at time t as
ft,1, the 2-step ahead forecast made at time t as ft,2 and so on:
since Et[ut+1]=0 and Et[ut+2]=0. Thus, beyond 1-step ahead, the MA(1) part of
the model disappears from the forecast and only the autoregressive part
remains. Although we do not know yt+1, its expected value is the 1-step ahead
forecast that was made at the first stage, ft,1.
(e) Moving average and ARMA models cannot be estimated using OLS – they
are usually estimated by maximum likelihood. Autoregressive models can be
estimated using OLS or maximum likelihood. Pure autoregressive models
contain only lagged values of observed quantities on the RHS, and therefore,
the lags of the dependent variable can be used just like any other regressors.
However, in the context of MA and mixed models, the lagged values of the
error term that occur on the RHS are not known a priori. Hence, these
quantities are replaced by the residuals, which are not available until after the
model has been estimated. But equally, these residuals are required in order to
be able to estimate the model parameters. Maximum likelihood essentially
works around this by calculating the values of the coefficients and the
residuals at the same time. Maximum likelihood involves selecting the most
likely values of the parameters given the actual data sample, and given an
assumed statistical distribution for the errors. This technique will be
discussed in greater detail in the section on volatility modelling in Chapter 8.
The Box-Pierce and Ljung-Box statistics are given respectively by
Q = T Σ τk² and Q* = T(T+2) Σ τk²/(T-k),
where each sum runs from k = 1 to m.
(d) Forecasts from this ARMA model would be produced in the usual way.
Using the same notation as above, and letting fz,1 denote the forecast for time
z+1 made for x at time z, etc:
Model A: MA(1)
fz,1 = 0.38 - 0.10uz = 0.38 - (0.10)(0.02) = 0.378
fz,2 = fz,3 = 0.38
Note that the MA(1) model only has a memory of one period, so all forecasts
further than one step ahead will be equal to the intercept.
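A sketch of this forecasting rule for Model A, with intercept 0.38, MA coefficient -0.10 and last residual 0.02 as implied by the worked numbers:

```python
def ma1_forecasts(mu, theta, last_resid, steps):
    # For an MA(1) y_t = mu + theta*u_{t-1} + u_t, only the one-step forecast
    # uses the last residual; beyond that only the intercept survives
    forecasts = [mu + theta * last_resid]
    forecasts += [mu] * (steps - 1)
    return forecasts

fcasts = ma1_forecasts(0.38, -0.10, 0.02, 4)
```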
Model B: AR(2)
(e) The methods are overfitting and residual diagnostics. Overfitting involves
selecting a deliberately larger model than the proposed one, and examining
the statistical significances of the additional parameters. If the additional
parameters are statistically insignificant, then the originally postulated model
is deemed acceptable. The larger model would usually involve the addition of
one extra MA term and one extra AR term. Thus it would be sensible to try an
ARMA(1,2) in the context of Model A, and an ARMA(3,1) in the context of
Model B. Residual diagnostics would involve examining the acf and pacf of the
residuals from the estimated model. If the residuals showed any “action”, that
is, if any of the acf or pacf coefficients showed statistical significance, this
would suggest that the original model was inadequate. “Residual diagnostics”
in the Box-Jenkins sense of the term involved only examining the acf and
pacf, rather than the array of diagnostics considered in Chapter 4.
(f) There are obviously several forecast accuracy measures that could be
employed, including MSE, MAE, and the percentage of correct sign
predictions. Assuming that MSE is used, the MSE for each model is
MSE(Model A) = (1/4)[(0.378 - 0.62)² + (0.38 - 0.19)² + (0.38 - (-0.32))² + (0.38 - 0.72)²] = 0.175
MSE(Model B) = (1/4)[(0.681 - 0.62)² + (0.718 - 0.19)² + (0.690 - (-0.32))² + (0.683 - 0.72)²] = 0.326
Therefore, since the mean squared error for Model A is smaller, it would be
concluded that the moving average model is the more accurate of the two in
this case.
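The comparison can be reproduced with a short sketch; note that the sign of the third actual value (-0.32) is inferred so that the stated MSEs of 0.175 and 0.326 are recovered:

```python
def mse(forecasts, actuals):
    # Mean squared forecast error over the evaluation period
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)

actuals = [0.62, 0.19, -0.32, 0.72]                  # third value's sign inferred from the totals
mse_a = mse([0.378, 0.38, 0.38, 0.38], actuals)      # Model A: MA(1) forecasts
mse_b = mse([0.681, 0.718, 0.690, 0.683], actuals)   # Model B: AR(2) forecasts
```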