Time Series Analysis: Conditional Volatility Models: Mei-Yuan Chen

Department of Finance
National Chung Hsing University
Feb, 25, 2013

Contents
1 Introduction
1.1 Descriptive Statistics of Heteroskedasticity
1.2 Heteroskedastic Residuals in a Regression
2 Linear Volatility Models
2.1 ARCH Models
2.2 GARCH Models
2.2.1 Simulated and Estimate ARMA-GARCH Time series using R
2.3 ARCH-in-mean(or ARCH-M) Models
2.4 IGARCH Models
2.4.1 GARCH-L
2.5 Stochastic Volatility Models
3 Nonlinear GARCH Models
3.1 Exponential GARCH Models
3.2 GJR-GARCH Models
3.3 Smooth Transition GARCH Models
3.4 Volatility-Switching GARCH Models
3.5 Asymmetric Nonlinear Smooth Transition GARCH Models
3.6 Quadratic GARCH Models
3.7 Markov-Switching GARCH Models
3.8 ARCH Models with Conditionally Non-normal Disturbances
3.8.1 t-distributed Errors
3.9 Long Memory Stochastic Volatility (LMSV) Models
4 Testing for GARCH
4.1 Testing for Linear GARCH
4.2 Testing for Nonlinear GARCH
4.3 Testing for ARCH in the Presence of Misspecification
4.4 Testing for ARCH in the Presence of Outliers
5 Estimation for Conditional Variance Models
5.1 Quasi Maximum Likelihood Estimator
5.2 Robust Estimation
6 Diagnostic Checking: Testing Properties of Standardized Residuals
6.1 Testing for Higher-order GARCH
6.2 Testing Parameter Constancy
7 Forecasting Volatility
7.1 Forecasting the Conditional Mean in the Presence of Conditional Heteroskedasticity
7.2 Forecasting the Conditional Variance
7.3 Forecasting Conditional Volatility for Nonlinear GARCH Models
7.4 Evaluating Forecasts of Conditional Volatility
8 Multivariate Conditional Variance Models
8.1 Impulse Respond Function for Multivariate GARCH Model
8.2 Testing Spillover Effects
1 Introduction
Uncertainty, or risk, plays an important role in financial analysis and is usually measured by volatility. The volatility of an asset is not directly observable, so it must be modeled. Given a fitted model, volatility can be both measured and predicted. The prediction of volatility is crucial for option pricing and risk management (e.g., the estimation of value-at-risk). Numerous volatility models have been suggested in the literature to capture the characteristics of asset returns. The financial literature recognizes the following characteristics of asset returns.
1. The volatility of an asset evolves over time in a continuous manner.
2. Periods of large price movements alternate with periods during which prices hardly change. This feature is commonly referred to as volatility clustering.
3. The volatility of an asset does not diverge to infinity.
4. Volatility reacts asymmetrically to price movements: a large price decrease tends to raise subsequent volatility more than a price increase of the same magnitude (the leverage effect).
5. Excess kurtosis, or fat-tailedness, is commonly observed.
Let $r_t = \log(p_t) - \log(p_{t-1})$ be the log return of an asset, where $p_t$ is the price level of the asset at time $t$. Suppose the conditional mean and conditional variance of $r_t$ given $\mathcal{F}_{t-1}$ are
$$E(r_t \mid \mathcal{F}_{t-1}) = \mu_t, \qquad \operatorname{var}(r_t \mid \mathcal{F}_{t-1}) = E[(r_t - \mu_t)^2 \mid \mathcal{F}_{t-1}] = h_t, \tag{1}$$
where $\mathcal{F}_{t-1}$ denotes the information set available at time $t-1$. In general, the time series $\{r_t\}$ can be represented as the sum of a predictable and an unpredictable part,
$$r_t = E(r_t \mid \mathcal{F}_{t-1}) + \varepsilon_t. \tag{2}$$
To allow $h_t$ to vary over time, we specify $h_t \equiv h_t(\mathcal{F}_{t-1})$. Then $\varepsilon_t$ is conditionally heteroskedastic and can be expressed as
$$\varepsilon_t = z_t\sqrt{h_t},$$
where $z_t$ is white noise with mean zero and variance 1. The unconditional variance of $\varepsilon_t$ is
$$\sigma^2 \equiv E(\varepsilon_t^2) = E[E(\varepsilon_t^2 \mid \mathcal{F}_{t-1})] = E[h_t],$$
which is usually assumed to be constant, i.e., $E(h_t)$ does not depend on $t$. Many volatility models have been suggested in the literature. They can be classified into linear and nonlinear specifications.
1.1 Descriptive Statistics of Heteroskedasticity
Time variation in volatility (heteroskedasticity) is a common feature of macroeconomic and financial data. Perhaps the most straightforward way to gauge heteroskedasticity is to estimate a time series of variances on rolling samples. For a zero-mean variable $r_t - \mu_t$, this could mean
$$h_t = \frac{1}{q}\left[(r_{t-1}-\mu_{t-1})^2 + (r_{t-2}-\mu_{t-2})^2 + \cdots + (r_{t-q}-\mu_{t-q})^2\right].$$
Notice that $h_t$ depends only on lagged information. It can therefore be thought of as the prediction, made at $t-1$, of the volatility at $t$.
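The following minimal Python sketch (standard library only) makes the rolling-window estimator concrete. The function name `rolling_variance` and the convention of returning `None` for the first q dates are illustrative choices, not part of the notes. The toy series shows the abrupt behaviour discussed next: a single large return raises the estimate for exactly q periods and then drops out of the window.

```python
def rolling_variance(r, mu, q):
    """Rolling-window estimate h_t = (1/q) * sum_{s=1..q} (r_{t-s} - mu_{t-s})^2.

    Returns a list aligned with r; entries are None for the first q dates,
    where fewer than q lagged observations are available.
    """
    h = [None] * len(r)
    for t in range(q, len(r)):
        h[t] = sum((r[t - s] - mu[t - s]) ** 2 for s in range(1, q + 1)) / q
    return h


# One large return enters the window and leaves it again q = 2 periods later.
r = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
print(rolling_variance(r, [0.0] * len(r), q=2))  # [None, None, 0.0, 4.5, 4.5, 0.0]
```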
Unfortunately, this method can produce quite abrupt changes in the estimate, because each observation receives full weight for $q$ periods and then drops out of the window. An alternative is the exponential moving average (EMA) estimator of volatility. It uses all data points since the beginning of the sample but gives larger weights to recent observations. Let the weight for lag $s$ be $(1-\lambda)\lambda^{s-1}$, where $0 < \lambda < 1$, so that
$$h_t = (1-\lambda)\left[(r_{t-1}-\mu_{t-1})^2 + \lambda(r_{t-2}-\mu_{t-2})^2 + \lambda^2(r_{t-3}-\mu_{t-3})^2 + \cdots\right].$$
This can be represented and calculated recursively as
$$h_t = (1-\lambda)(r_{t-1}-\mu_{t-1})^2 + \lambda h_{t-1}.$$
The initial value (before the sample) could be set to zero or, perhaps better, to the unconditional variance in a historical sample.
This method is commonly used by practitioners. For instance, RiskMetrics uses it with $\lambda = 0.94$ for daily data. Alternatively, $\lambda$ can be chosen to minimize a criterion function such as $\sum_{t=1}^{T}[(r_t-\mu_t)^2 - h_t]^2$.
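Here is a minimal Python sketch of the EMA recursion, together with a choice of $\lambda$ by the squared-error criterion above. The grid search in `choose_lambda` and the function names are illustrative assumptions. RiskMetrics itself simply fixes $\lambda = 0.94$ for daily data.

```python
def ewma_variance(r, mu, lam=0.94, h0=0.0):
    """EMA variance: h_t = (1 - lam) * (r_{t-1} - mu_{t-1})**2 + lam * h_{t-1}.

    h0 is the value used for the first date, before any return is observed
    (zero, or the unconditional variance from a historical sample).
    """
    h = [h0]
    for t in range(1, len(r)):
        h.append((1 - lam) * (r[t - 1] - mu[t - 1]) ** 2 + lam * h[t - 1])
    return h


def choose_lambda(r, mu, grid, h0=0.0):
    """Pick lam from a grid by minimising sum_t [(r_t - mu_t)^2 - h_t]^2."""
    def loss(lam):
        h = ewma_variance(r, mu, lam, h0)
        return sum(((rt - mt) ** 2 - ht) ** 2 for rt, mt, ht in zip(r, mu, h))
    return min(grid, key=loss)


print(ewma_variance([1.0, 0.0, 0.0], [0.0] * 3, lam=0.5))  # [0.0, 0.5, 0.25]
```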
1.2 Heteroskedastic Residuals in a Regression
Suppose we have the regression model
$$y_t = x_t'\beta_0 + e_t, \qquad E(e_t) = 0, \qquad \operatorname{cov}(x_t, e_t) = 0. \tag{3}$$
In the standard case, the classical assumptions state that $e_t$ is i.i.d. (independently and identically distributed), which rules out heteroskedasticity.
If the residuals are actually heteroskedastic, least squares (LS) is nevertheless a useful estimator. It is still consistent (it converges to the true values as the sample becomes large), and it is often reasonably efficient (in terms of the variance of the estimates). However, the standard expression for the standard errors of the coefficients is not correct, except in a special case (see below).
There are two ways to handle this problem. The first is to use an estimation method other than LS that incorporates the structure of the heteroskedasticity. For instance, one can combine the regression model with an ARCH structure for the residuals and estimate the whole system by maximum likelihood (MLE). As a by-product we get correct standard errors, provided, of course, that the assumed distribution is correct. The second is to stick to OLS but use another expression for the variance of the coefficients: a heteroskedasticity-consistent covariance matrix. White's covariance matrix is the most common of these.
To test for heteroskedasticity, we can use White's test. The null hypothesis is homoskedasticity. The alternative hypothesis is the kind of heteroskedasticity that can be explained by the levels, squares, and cross products of the regressors, which is clearly a special form of heteroskedasticity. To implement White's test, take the squared fitted residuals $\hat e_t^2$ from the regression of $y_t$ on $x_t$. Regress them on a vector $z_t$ that consists of a constant, the regressors $x_{ti}$, and the products $x_{ti}x_{tj}$ (including squares, $i \le j$):
$$\hat e_t^2 = z_t'\gamma + u_t. \tag{4}$$
Then test whether all the slope coefficients (not the intercept) in $\gamma$ are zero. This can be done with the statistic $TR^2$, where $R^2$ is from (4). Its limiting distribution is $\chi^2(\dim - 1)$, where $\dim$ is the dimension of $z_t$.
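The following Python sketch implements the $TR^2$ version of White's test for the simplest case of one regressor. Then $z_t = (1, x_t, x_t^2)'$ and the limiting distribution is $\chi^2(2)$. It is a self-contained illustration under these assumptions, not a replacement for a statistics library: least squares is hand-written via the normal equations, and the simulated data have an error standard deviation proportional to $x_t$. The 5% critical value uses the fact that a $\chi^2(2)$ variable is exponential with mean 2, so its 95% quantile is $-2\ln 0.05$.

```python
import math
import random


def ols_r2(X, y):
    """Least squares of y on the columns of X (X should contain a constant column).

    Solves the normal equations X'X b = X'y by Gauss-Jordan elimination with
    partial pivoting and returns (b, R^2).
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * v for row, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    ybar = sum(y) / len(y)
    ssr = sum((v - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, v in zip(X, y))
    sst = sum((v - ybar) ** 2 for v in y)
    return b, 1.0 - ssr / sst


def white_test(y, x):
    """White's TR^2 statistic for y_t = b0 + b1 * x_t + e_t with a single regressor x_t.

    Here z_t = (1, x_t, x_t^2), so dim = 3 and TR^2 is asymptotically chi2(2) under H0.
    """
    (b0, b1), _ = ols_r2([[1.0, xi] for xi in x], y)
    e2 = [(yi - b0 - b1 * xi) ** 2 for yi, xi in zip(y, x)]
    _, r2 = ols_r2([[1.0, xi, xi * xi] for xi in x], e2)
    return len(y) * r2


CRIT_CHI2_2 = -2.0 * math.log(0.05)  # 5% critical value of chi2(2), about 5.99

rng = random.Random(0)
x = [rng.uniform(0.0, 2.0) for _ in range(2000)]
y = [1.0 + 2.0 * xi + xi * rng.gauss(0.0, 1.0) for xi in x]  # error s.d. grows with x
stat = white_test(y, x)
print(round(stat, 1), stat > CRIT_CHI2_2)
```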
2 Linear Volatility Models
In the empirical financial literature, the risk or volatility of an asset return is usually specified as its conditional variance. In particular, asset returns exhibit periods of unusually large volatility followed by periods of relative tranquillity. This phenomenon of time-varying volatility can be observed in many time series, especially financial ones. Under such conditions, the constant-variance assumption for the disturbances of conventional econometric models (homoskedasticity) is inappropriate. It is therefore important to construct econometric models that allow the variance to change over time. Time-varying volatility specifications usually fall into two groups. The family of ARCH (autoregressive conditional heteroskedasticity) models specifies constant coefficients. Stochastic volatility models consider random coefficients. This section introduces volatility models in the ARCH family and stochastic volatility models.
2.1 ARCH Models
There are two basic reasons for being interested in an ARCH model. First, suppose the residuals of the regression model (3) have ARCH features. Then an ARCH model (that is, a specification of exactly how the ARCH features are generated) can help us estimate the regression model by maximum likelihood. Second, we may be interested in understanding the ARCH features more carefully, for instance, as an input to portfolio choice or option pricing.
Engle (1982) suggests that the conditional variance can be formulated as a linear function of past squared errors. This formulation is called the autoregressive conditional heteroskedasticity (ARCH) model. The ARCH(q) model is
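$$r_t = \mu_t + \varepsilon_t, \tag{5}$$
$$\varepsilon_t = z_t\sqrt{h_t}, \qquad z_t \overset{\text{i.i.d.}}{\sim} N(0,1),$$
$$h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 = \alpha_0 + \alpha(L)\varepsilon_{t-1}^2, \tag{6}$$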
where $\alpha(L) = \alpha_1 + \alpha_2 L + \cdots + \alpha_q L^{q-1}$. Here $\mu_t$ may be formulated as a regression model with lagged dependent variables and exogenous variables contained in the information set $\mathcal{F}_{t-1}$, or as an ARMA model. $q$ is the order of the ARCH process, and $z_t$ and $\varepsilon_{t-1}$ are independent of each other. $\varepsilon_t$ is then said to follow an ARCH(q) process. ARCH models have many possible applications, since the residuals in (5) can come from an autoregression, an ARMA model, or a standard regression model. The clustering effect, a typical feature of financial data, is captured here by $h_t$: large and small errors tend to cluster together in consecutive time periods. For this model to be well defined and for the conditional variance $h_t$ to be positive, the restrictions $\alpha_0 > 0$ and $\alpha_i \ge 0$ for $i = 1, \ldots, q$ are needed. Moreover, letting $v_t \equiv \varepsilon_t^2 - h_t$, the ARCH(q) model in (6) can be rewritten as
$$\varepsilon_t^2 = \alpha_0 + \alpha(L)\varepsilon_{t-1}^2 + v_t.$$
Thus the model corresponds directly to an AR(q) model for the squared errors $\varepsilon_t^2$. The process is covariance stationary if and only if the sum of the (non-negative) autoregressive parameters is less than one. In that case the unconditional variance equals $\operatorname{var}(\varepsilon_t) = \alpha_0/(1-\alpha_1-\cdots-\alpha_q)$. The unconditional variance is infinite if $\alpha_1 + \cdots + \alpha_q \ge 1$.
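For $q = 1$ and $\mu_t = 0$, $r_t = \sqrt{h_t}\,z_t$ with $h_t = \alpha_0 + \alpha_1 r_{t-1}^2$ becomes the ARCH(1) model. Clearly $h_t > 0$ with probability one provided $\alpha_0 > 0$ and $\alpha_1 \ge 0$. Suppose $r_t$ is weakly (covariance) stationary with finite variance. Then its unconditional variance satisfies
$$\begin{aligned}
\sigma_r^2 = \operatorname{var}(r_t) &= E(r_t^2) \qquad (\text{since } \mu_t = 0)\\
&= E\{E[r_t^2 \mid \mathcal{F}_{t-1}]\} = E(\alpha_0 + \alpha_1 r_{t-1}^2)\\
&= \alpha_0 + \alpha_1 E(r_{t-1}^2) = \alpha_0 + \alpha_1\sigma_r^2,
\end{aligned}$$
so that $\sigma_r^2 = \alpha_0/(1-\alpha_1)$. Moreover, since $r_t = \varepsilon_t = z_t\sqrt{h_t}$ when $\mu_t = 0$,
$$\begin{aligned}
r_t^2 = h_t z_t^2 &= h_t[1 + (z_t^2 - 1)] = h_t + h_t(z_t^2 - 1)\\
&= [\alpha_0 + \alpha_1 r_{t-1}^2] + [h_t z_t^2 - h_t]\\
&= \alpha_0 + \alpha_1 r_{t-1}^2 + (r_t^2 - h_t) = \alpha_0 + \alpha_1 r_{t-1}^2 + v_t.
\end{aligned}$$
This is an AR(1) model in $r_t^2$, where $v_t = r_t^2 - h_t$ is a mean-zero innovation uncorrelated with its past, albeit heteroskedastic. Likewise,
$$h_t = \alpha_0 + \alpha_1 r_{t-1}^2 = \alpha_0 + \alpha_1(v_{t-1} + h_{t-1}) = \alpha_0 + \alpha_1 h_{t-1} + \alpha_1 v_{t-1}$$
is an AR(1) model in $h_t$. In addition, when the fourth moment of $r_t$ is finite,
$$\operatorname{corr}(r_t^2, r_{t-s}^2) = \frac{\operatorname{cov}(r_t^2, r_{t-s}^2)}{\operatorname{var}(r_t^2)} = \operatorname{corr}(h_t, h_{t-s}) = \alpha_1^s > 0, \qquad s \ge 1,\ \alpha_1 > 0.$$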
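Suppose $z_t$ is standard normal and $r_t$ is weakly stationary with a finite fourth moment. Then, since $E(z_t^4) = 3[E(z_t^2)]^2 = 3$,
$$\begin{aligned}
E(r_t^4) = E(z_t^4 h_t^2) &= E(z_t^4)E(h_t^2) = 3E(h_t^2)\\
&= 3E[(\alpha_0 + \alpha_1 r_{t-1}^2)^2]\\
&= 3E[\alpha_0^2 + \alpha_1^2 r_{t-1}^4 + 2\alpha_0\alpha_1 r_{t-1}^2]\\
&= 3[\alpha_0^2 + \alpha_1^2 E(r_{t-1}^4) + 2\alpha_0\alpha_1\operatorname{var}(r_{t-1})].
\end{aligned}$$
Stationarity gives $E(r_t^4) = E(r_{t-1}^4)$. Solving for $E(r_t^4)$ then yields
$$E(r_t^4) = \begin{cases} \dfrac{3[\alpha_0^2 + 2\alpha_0\alpha_1\operatorname{var}(r_t)]}{1 - 3\alpha_1^2}, & \alpha_1^2 < 1/3,\\[2ex] \infty, & \text{otherwise}. \end{cases}$$
Hence, the excess kurtosis is
$$\begin{aligned}
\kappa(r_t) = \frac{E(r_t^4)}{\operatorname{var}(r_t)^2} - 3
&= \frac{[3\alpha_0^2 + 6\alpha_0\alpha_1\operatorname{var}(r_t)](1-\alpha_1)^2}{(1-3\alpha_1^2)\alpha_0^2} - 3\\
&= \frac{\{3\alpha_0^2 + 6\alpha_0\alpha_1[\alpha_0/(1-\alpha_1)]\}(1-\alpha_1)^2}{(1-3\alpha_1^2)\alpha_0^2} - 3\\
&= \frac{6\alpha_1^2}{1-3\alpha_1^2} \ge 0,
\end{aligned}$$
given $\alpha_1^2 < 1/3$. Thus $E(r_t^4) = \infty$ if $\alpha_1 \ge \sqrt{1/3}$, and it is finite if $\alpha_1 < \sqrt{1/3}$. The existence of moments is important for inference and for interpreting the sample correlograms of $r_t$ and $r_t^2$.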
Suppose instead that $z_t$ is not Gaussian but has a finite fourth moment $E(z_t^4) < \infty$. Then $E(r_t^4) < \infty$ if $\alpha_1^2 < 1/E(z_t^4)$, and
$$\kappa(r_t) = \frac{E(z_t^4) - 3 + 2E(z_t^4)\alpha_1^2}{1 - E(z_t^4)\alpha_1^2}.$$
By the Cauchy-Schwarz inequality, $E(z_t^4) \ge [E(z_t^2)]^2 = 1$, and so $\kappa(r_t) \ge -2$. In principle there is no restriction on $\alpha_1$ (beyond $\alpha_1 < 1$) as long as $E(z_t^4)$ is close to one. If $z_t$ is not i.i.d., the restriction is even weaker.
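The ARCH(1) moment results can be checked numerically. This Python sketch codes the unconditional variance $\alpha_0/(1-\alpha_1)$ and the general excess-kurtosis formula, taking $E(z_t^4)$ as an argument. It compares the variance with the sample second moment of a long simulated path. The parameter values $\alpha_0 = 0.2$, $\alpha_1 = 0.5$ are chosen for illustration and satisfy $\alpha_1^2 < 1/3$. The sample kurtosis is not compared, because its sampling error is very large for such heavy-tailed series.

```python
import math
import random


def arch1_variance(a0, a1):
    """Unconditional variance a0 / (1 - a1) of a covariance-stationary ARCH(1)."""
    if not (a0 > 0 and 0 <= a1 < 1):
        raise ValueError("need a0 > 0 and 0 <= a1 < 1")
    return a0 / (1 - a1)


def arch1_excess_kurtosis(a1, m4=3.0):
    """Excess kurtosis [m4 - 3 + 2*m4*a1^2] / [1 - m4*a1^2], where m4 = E(z_t^4).

    m4 = 3 (Gaussian z_t) gives 6*a1^2 / (1 - 3*a1^2); infinite if m4*a1^2 >= 1.
    """
    if m4 * a1 ** 2 >= 1:
        return math.inf
    return (m4 - 3 + 2 * m4 * a1 ** 2) / (1 - m4 * a1 ** 2)


def simulate_arch1(a0, a1, n, seed=0, burn=500):
    """Simulate r_t = z_t * sqrt(h_t), h_t = a0 + a1 * r_{t-1}^2, z_t ~ i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    r_prev, out = 0.0, []
    for t in range(n + burn):
        r_prev = rng.gauss(0.0, 1.0) * math.sqrt(a0 + a1 * r_prev ** 2)
        if t >= burn:
            out.append(r_prev)
    return out


r = simulate_arch1(0.2, 0.5, 200_000)
print(arch1_variance(0.2, 0.5), sum(x * x for x in r) / len(r))  # theory vs. sample
print(arch1_excess_kurtosis(0.5))  # 6.0
print(arch1_excess_kurtosis(0.6))  # inf, since 3 * 0.36 >= 1
```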
Although the ARCH(1) model implies heavy tails and volatility clustering, in practice it does not generate enough of either. ARCH(q) with a large q does somewhat better, but at a price in terms of parsimony. It also has many inequality restrictions to impose, which unrestricted estimates may violate.
2.2 GARCH Models
Empirical applications of ARCH(q) models often call for a long lag length and a large number of parameters. To overcome this shortcoming, Bollerslev (1986) proposed the generalized ARCH, or GARCH(p, q), model. It specifies the conditional variance as a function of lagged squared errors and past conditional variances. The GARCH(p, q) process is given by
$$r_t = \mu_t + \varepsilon_t,$$
$$\varepsilon_t = z_t\sqrt{h_t}, \qquad z_t \overset{\text{i.i.d.}}{\sim} N(0,1),$$
$$h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{i=1}^{p}\beta_i h_{t-i} = \alpha_0 + \alpha(L)\varepsilon_t^2 + \beta(L)h_t, \tag{7}$$
where $\alpha(L) = \alpha_1 L + \cdots + \alpha_q L^q$, $\beta(L) = \beta_1 L + \cdots + \beta_p L^p$, and
$$p \ge 0, \quad q > 0, \quad \alpha_0 > 0, \quad \alpha_i \ge 0,\ i = 1, \ldots, q, \quad \beta_i \ge 0,\ i = 1, \ldots, p.$$
For $p = 0$, the GARCH(p, q) process reduces to an ARCH(q) process, and for $p = q = 0$, $\varepsilon_t$ is simply white noise. The process is covariance stationary if and only if $\sum_{i=1}^{q}\alpha_i + \sum_{i=1}^{p}\beta_i < 1$. In that case the unconditional variance equals $\operatorname{var}(\varepsilon_t) = \alpha_0/(1 - \sum_{i=1}^{q}\alpha_i - \sum_{i=1}^{p}\beta_i)$. The unconditional variance is infinite if $\alpha_1 + \cdots + \alpha_q + \beta_1 + \cdots + \beta_p \ge 1$.
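The Python sketch below is a direct transcription of the variance recursion in (7). It assumes the recursion starts at the unconditional variance, for both the pre-sample $\varepsilon^2$ and $h$. Other start-up rules, such as the sample variance of the residuals, are also common in practice.

```python
def garch_filter(eps, alpha0, alpha, beta, h_init=None):
    """Conditional variances from (7):
    h_t = alpha0 + sum_{i=1..q} alpha[i-1] * eps_{t-i}^2 + sum_{j=1..p} beta[j-1] * h_{t-j}.

    Pre-sample values of eps^2 and h are set to h_init, which defaults to the
    unconditional variance alpha0 / (1 - sum(alpha) - sum(beta)).
    """
    persistence = sum(alpha) + sum(beta)
    if h_init is None:
        if persistence >= 1:
            raise ValueError("no finite unconditional variance; supply h_init")
        h_init = alpha0 / (1.0 - persistence)
    h = []
    for t in range(len(eps)):
        ht = alpha0
        for i, a in enumerate(alpha, start=1):
            ht += a * (eps[t - i] ** 2 if t - i >= 0 else h_init)
        for j, b in enumerate(beta, start=1):
            ht += b * (h[t - j] if t - j >= 0 else h_init)
        h.append(ht)
    return h


# GARCH(1,1) with alpha0 = 0.1, alpha1 = 0.1, beta1 = 0.8 (unconditional variance 1):
# after a run of zero shocks, h_t decays geometrically toward alpha0 / (1 - beta1) = 0.5.
print(garch_filter([0.0] * 4, 0.1, [0.1], [0.8]))
```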
2.2.1 Simulated and Estimate ARMA-GARCH Time series using R
The R packages TSA and fGarch can simulate a time series with an ARMA(p, q) conditional mean and a GARCH(p, q) conditional variance. Example R scripts, such as TSA garch sim.R and fGarch sim.R, simulate various time series processes with specified conditional mean and variance.
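The R scripts mentioned above are not reproduced here. To illustrate what such a simulation does, the following Python sketch generates an AR(1)-GARCH(1,1) path, i.e., an ARMA(1,0) mean with GARCH(1,1) errors. It does not use or mimic the TSA or fGarch interfaces. The parameter values and the burn-in length are arbitrary illustrative choices.

```python
import math
import random


def simulate_ar1_garch11(n, phi0=0.0, phi1=0.3, a0=0.05, a1=0.1, b1=0.85,
                         seed=1, burn=1000):
    """Simulate r_t = phi0 + phi1 * r_{t-1} + eps_t, eps_t = z_t * sqrt(h_t),
    h_t = a0 + a1 * eps_{t-1}^2 + b1 * h_{t-1}, z_t ~ i.i.d. N(0, 1).

    The first `burn` draws are discarded so the start-up values do not matter.
    """
    if a1 + b1 >= 1 or abs(phi1) >= 1:
        raise ValueError("need a1 + b1 < 1 and |phi1| < 1")
    rng = random.Random(seed)
    h = a0 / (1.0 - a1 - b1)            # start at the unconditional variance
    eps, r = 0.0, phi0 / (1.0 - phi1)   # start at the unconditional mean
    rs, epss, hs = [], [], []
    for t in range(n + burn):
        h = a0 + a1 * eps ** 2 + b1 * h  # uses the previous eps and h
        eps = rng.gauss(0.0, 1.0) * math.sqrt(h)
        r = phi0 + phi1 * r + eps
        if t >= burn:
            rs.append(r)
            epss.append(eps)
            hs.append(h)
    return rs, epss, hs


r, eps, h = simulate_ar1_garch11(200_000)
print(sum(e * e for e in eps) / len(eps))  # close to a0 / (1 - a1 - b1) = 1.0
```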
2.3 ARCH-in-mean(or ARCH-M) Models
According to asset pricing theory, the return on an asset depends on the level of risk the asset carries. The conditional variance of an asset's return can therefore influence its conditional mean. Engle, Lilien, and Robins (1987) accordingly extended the basic ARCH framework by introducing the conditional variance into the mean equation, which gives the ARCH-M model. In this model the conditional variance $h_t$ is an explanatory variable in the conditional mean equation:
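$$r_t = \mu_t + \delta h_t + \varepsilon_t, \tag{8}$$
$$\varepsilon_t = z_t\sqrt{h_t}, \qquad z_t \overset{\text{i.i.d.}}{\sim} N(0,1),$$
$$h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2.$$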
This model focuses on the relationship between $r_t$ and $h_t$. An increase in the conditional variance is associated with either an increase or a decrease in the conditional mean of $r_t$. The direction depends on the sign of the partial derivative of the conditional mean function with respect to $h_t$ (here $\delta$). The ARCH-M model is thus well suited to handling the trade-off between risk and expected return that appears in many financial theories. Formally, estimation of the ARCH-M model poses no added difficulties. However, unlike in other linear ARCH models, the information matrix obtained under the auxiliary assumption of conditional normality is not block diagonal between the parameters of the conditional mean and variance equations. In a linear ARCH model, the conditional mean parameters can be estimated consistently even when $h_t$ is misspecified. In the ARCH-M model, by contrast, consistent estimation of the conditional mean parameters requires the full model to be correctly specified.
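To illustrate (8), the Python sketch below simulates an ARCH-M model with $q = 1$ and a constant $\mu_t = \mu$; both are simplifying assumptions. Here $\delta$ is the label used above for the coefficient on $h_t$. The sample mean of $r_t$ then reflects the average risk premium $\delta E(h_t) = \delta\alpha_0/(1-\alpha_1)$. Regressing $r_t$ on the true $h_t$ recovers $\delta$. With real data $h_t$ is unobserved, which is why the model is estimated jointly, for example by maximum likelihood.

```python
import math
import random


def simulate_arch_m1(n, mu=0.0, delta=0.5, a0=0.2, a1=0.3, seed=2, burn=500):
    """Simulate (8) with q = 1 and constant mu_t = mu:
    r_t = mu + delta * h_t + eps_t, eps_t = z_t * sqrt(h_t), h_t = a0 + a1 * eps_{t-1}^2.
    """
    rng = random.Random(seed)
    eps, rs, hs = 0.0, [], []
    for t in range(n + burn):
        h = a0 + a1 * eps ** 2                 # ARCH(1) conditional variance
        eps = rng.gauss(0.0, 1.0) * math.sqrt(h)
        if t >= burn:
            rs.append(mu + delta * h + eps)    # the risk premium delta * h_t enters the mean
            hs.append(h)
    return rs, hs


r, h = simulate_arch_m1(200_000)
mean_r = sum(r) / len(r)
mean_h = sum(h) / len(h)
# OLS slope of r_t on h_t (feasible here only because h_t is known in a simulation).
slope = (sum((hi - mean_h) * (ri - mean_r) for hi, ri in zip(h, r))
         / sum((hi - mean_h) ** 2 for hi in h))
print(mean_r, slope)  # near delta * a0 / (1 - a1) (about 0.143) and delta = 0.5
```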
2.4 IGARCH Models
Many empirical studies using high-frequency financial data share a common finding: the estimates of the conditional variance equation imply a high degree of persistence. In the linear GARCH(p, q) model, this persistence shows up as an approximate unit root in the autoregressive polynomial, i.e., $\alpha_1 + \cdots + \alpha_q + \beta_1 + \cdots + \beta_p = 1$. Engle and Bollerslev (1986) call this model integrated in variance, or IGARCH:
$$h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j h_{t-j}, \qquad \alpha_1 + \cdots + \alpha_q + \beta_1 + \cdots + \beta_p = 1.$$
The unconditional variance of an IGARCH(p, q) process does not exist. In addition, Lamoureux and Lastrapes (1990) note that a GARCH model with a parameter shift in the intercept of the variance equation can bias estimation toward an integrated GARCH (IGARCH) model. This result shows the importance of testing for parameter stability in the conditional variance function, especially in the near-integrated case.
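A useful special case links IGARCH to the exponential moving average of Section 1.1. An IGARCH(1,1) with $\alpha_0 = 0$, $\alpha_1 = 1-\lambda$ and $\beta_1 = \lambda$ is exactly the EMA recursion $h_t = (1-\lambda)\varepsilon_{t-1}^2 + \lambda h_{t-1}$ (with $\mu_t = 0$). The Python sketch below checks this numerically. The shock values and the start-up value are arbitrary.

```python
def igarch11_filter(eps, alpha0, alpha1, h0):
    """IGARCH(1,1): h_t = alpha0 + alpha1 * eps_{t-1}^2 + (1 - alpha1) * h_{t-1}."""
    h = [h0]
    for t in range(1, len(eps)):
        h.append(alpha0 + alpha1 * eps[t - 1] ** 2 + (1.0 - alpha1) * h[t - 1])
    return h


def ewma_zero_mean(eps, lam, h0):
    """EMA recursion with mu_t = 0: h_t = (1 - lam) * eps_{t-1}^2 + lam * h_{t-1}."""
    h = [h0]
    for t in range(1, len(eps)):
        h.append((1.0 - lam) * eps[t - 1] ** 2 + lam * h[t - 1])
    return h


eps = [0.5, -1.2, 2.0, 0.1, -0.7]
diff = max(abs(a - b) for a, b in zip(igarch11_filter(eps, 0.0, 1 - 0.94, 1.0),
                                      ewma_zero_mean(eps, 0.94, 1.0)))
print(diff)  # zero up to floating-point rounding
```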
2.4.1 GARCH-L
The linear GARCH(p, q) model successfully captures thick tails and volatility clustering, but it is not well suited to capturing the leverage effect. Empirical evidence for stock returns shows a negative correlation between current returns and future volatility. A stock price decrease tends to increase subsequent volatility by more than a stock price increase of the same magnitude. This phenomenon reflects asymmetry in the conditional variance equation, and the literature has considered many parametric formulations that account for the leverage effect. Engle and Ng (1993) compare several alternative specifications of the leverage effect. They conclude that the parameterization of Glosten, Jagannathan, and Runkle (1993) is the most promising. Their GARCH-L(1,1) specification is
$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \gamma d_{t-1}\varepsilon_{t-1}^2 + \beta_1 h_{t-1},$$
where $d_{t-1}$ is a dummy variable equal to zero if $\varepsilon_{t-1} > 0$ and equal to one if $\varepsilon_{t-1} \le 0$. This allows the impact of the squared errors on conditional volatility to differ according to the sign of the lagged error. The leverage effect described above predicts that $\gamma > 0$.
2.5 Stochastic Volatility Models
In the GARCH model, the conditional volatility of $r_t$ is driven by the same shocks as its conditional mean. Furthermore, conditional on the history of $r_t$ summarized in the information set $\mathcal{F}_{t-1}$, current volatility $h_t$ is deterministic. An alternative class of volatility models assumes that $h_t$ is subject to an additional contemporaneous shock. The basic stochastic volatility (SV) model, introduced by Taylor (1989), is given by
$$\varepsilon_t = z_t\sqrt{h_t}, \tag{9}$$
$$\ln(h_t) = \gamma_0 + \gamma_1\ln(h_{t-1}) + \gamma_2\eta_t, \tag{10}$$
with $z_t \sim$ i.i.d. $N(0,1)$, $\eta_t \sim$ i.i.d. $N(0,1)$, and $z_t$ and $\eta_t$ uncorrelated.
3 Nonlinear GARCH Models
For stock returns, volatile periods often appear to be initiated by a large negative shock. This suggests that positive and negative shocks may have an asymmetric impact on the conditional volatility of subsequent observations. Black (1976) recognized this and suggested that a possible explanation might lie in the way firms are financed. When the value of (the stock of) a firm falls, the debt-to-equity ratio increases, which in turn increases the volatility of returns on equity. Since the debt-to-equity ratio is also known as the leverage of the firm, this phenomenon is commonly referred to as the leverage effect.
The GARCH models discussed previously cannot capture such asymmetric effects of positive and negative shocks. Most nonlinear GARCH models are motivated by the desire to capture the different effects of positive and negative shocks on conditional volatility. A convenient way to compare different GARCH models is the so-called news impact curve (NIC), introduced by Pagan and Schwert (1990) and popularized by Engle and Ng (1993). The NIC measures how new information is incorporated into volatility. More precisely, it shows the relationship between the current shock, or news, $\varepsilon_t$ and the conditional volatility one period ahead, $h_{t+1}$, holding constant all other past and current information.
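A news impact curve is easy to tabulate. The Python sketch below compares the symmetric GARCH(1,1) with the GARCH-L(1,1) specification of Section 2.4.1, holding $h_t$ fixed at an assumed level `h_bar`. Evaluating the curve at the unconditional variance is a common choice. The parameter values are illustrative only.

```python
def nic_garch11(eps, a0, a1, b1, h_bar):
    """NIC of GARCH(1,1): h_{t+1} as a function of eps_t, with h_t fixed at h_bar."""
    return a0 + a1 * eps ** 2 + b1 * h_bar


def nic_garch_l11(eps, a0, a1, gamma, b1, h_bar):
    """NIC of GARCH-L(1,1): the extra term gamma * eps^2 applies when eps <= 0."""
    d = 1.0 if eps <= 0 else 0.0
    return a0 + (a1 + gamma * d) * eps ** 2 + b1 * h_bar


h_bar = 1.0  # assumed level of h_t, e.g. the unconditional variance
for e in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(e, nic_garch11(e, 0.05, 0.10, 0.85, h_bar),
          nic_garch_l11(e, 0.05, 0.05, 0.10, 0.85, h_bar))
```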
3.1 Exponential GARCH Models
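The earliest variant of the GARCH model that allows for asymmetric effects is the exponential GARCH (EGARCH) model, introduced by Nelson (1991). The EGARCH(p, q) model is given by
$$\ln(h_t) = \alpha_0 + \sum_{i=1}^{q}\alpha_i\left[\theta z_{t-i} + \gamma\left(|z_{t-i}| - E|z_{t-i}|\right)\right] + \sum_{j=1}^{p}\beta_j\ln(h_{t-j}). \tag{11}$$
Denote $\mathbf{z}_{t,q} = (z_{t-1}, \ldots, z_{t-q})'$ and $g(\mathbf{z}_{t,q}) = \sum_{i=1}^{q}\alpha_i[\theta z_{t-i} + \gamma(|z_{t-i}| - E|z_{t-i}|)]$. The function $g(\mathbf{z}_{t,q})$ is piecewise linear in $\mathbf{z}_{t,q}$, as it can be written as
$$g(\mathbf{z}_{t,q}) = \sum_{i=1}^{q}\alpha_i\left[(\theta + \gamma)z_{t-i}I(z_{t-i} \ge 0) + (\theta - \gamma)z_{t-i}I(z_{t-i} < 0) - \gamma E|z_{t-i}|\right].$$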
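The piecewise-linear form of the EGARCH news term can be verified numerically. This Python sketch uses a single lag and assumes Gaussian $z_t$, for which $E|z_t| = \sqrt{2/\pi}$. The parameter values are illustrative. With $\theta < 0$ and $\gamma > 0$, a negative standardized shock raises next-period variance more than a positive shock of the same size.

```python
import math

E_ABS_Z = math.sqrt(2.0 / math.pi)  # E|z_t| when z_t ~ N(0, 1)


def egarch_news(z, theta, gamma):
    """theta * z + gamma * (|z| - E|z|): the term multiplying alpha_i in (11)."""
    return theta * z + gamma * (abs(z) - E_ABS_Z)


def egarch_news_piecewise(z, theta, gamma):
    """Same term written piecewise: slope theta + gamma for z >= 0, theta - gamma for z < 0."""
    slope = theta + gamma if z >= 0 else theta - gamma
    return slope * z - gamma * E_ABS_Z


def egarch11_log_h(z_prev, log_h_prev, a0, a1, theta, gamma, b1):
    """ln h_t = a0 + a1 * [theta z_{t-1} + gamma (|z_{t-1}| - E|z_{t-1}|)] + b1 ln h_{t-1}."""
    return a0 + a1 * egarch_news(z_prev, theta, gamma) + b1 * log_h_prev


args = dict(log_h_prev=0.0, a0=-0.1, a1=1.0, theta=-0.1, gamma=0.2, b1=0.95)
print(math.exp(egarch11_log_h(-2.0, **args)), math.exp(egarch11_log_h(2.0, **args)))
```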
3.2 GJR-GARCH Models
The model introduced by Glosten, Jagannathan, and Runkle (1993) offers an alternative
method to allow for asymmetric effects of positive and negative shocks to volatility. The
GJR-GARCH(p, q) model can be written as
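$$h_t = \alpha_0 + \sum_{i=1}^{q}\left\{\alpha_i\varepsilon_{t-i}^2[1 - I(\varepsilon_{t-i} > 0)] + \gamma_i\varepsilon_{t-i}^2 I(\varepsilon_{t-i} > 0)\right\} + \sum_{j=1}^{p}\beta_j h_{t-j}. \tag{12}$$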
3.3 Smooth Transition GARCH Models
The GJR-GARCH model (12) can be interpreted as a threshold model. In GJR-GARCH models, the sign of the lagged shocks causes an abrupt change in parameters. Hagerud (1997) and Gonzalez-Rivera (1998) independently applied the idea of smooth transition to allow a more gradual change of parameters instead. The logistic smooth transition GARCH (LSTGARCH(p, q)) model is given by
$$h_t = \alpha_0 + \sum_{i=1}^{q}\left\{\alpha_i\varepsilon_{t-i}^2[1 - F(\varepsilon_{t-i})] + \gamma_i\varepsilon_{t-i}^2 F(\varepsilon_{t-i})\right\} + \sum_{j=1}^{p}\beta_j h_{t-j}, \tag{13}$$
where $F(\varepsilon_{t-i})$ is the logistic function
$$F(\varepsilon_{t-i}) = \frac{1}{1 + \exp(-\theta\varepsilon_{t-i})}, \qquad \theta > 0. \tag{14}$$
The function $F(\varepsilon_{t-i})$ in (14) changes monotonically from 0 to 1 as $\varepsilon_{t-i}$ increases, so the impact of $\varepsilon_{t-i}^2$ on $h_t$ also changes smoothly. When the parameter $\theta$ in (14) becomes large, the logistic function approaches a step function equal to 0 for negative $\varepsilon_{t-i}$ and 1 for positive $\varepsilon_{t-i}$. In that case, the LSTGARCH(p, q) model reduces to the GJR-GARCH(p, q) model.
The smooth transition model can also describe asymmetric effects of large and small shocks on conditional volatility, by using the exponential function
$$F(\varepsilon_{t-i}) = 1 - \exp(-\theta\varepsilon_{t-i}^2), \qquad \theta > 0. \tag{15}$$
The function $F(\varepsilon_{t-i})$ in (15) equals 1 for large negative values of $\varepsilon_{t-i}$, falls to 0 at $\varepsilon_{t-i} = 0$, and increases back to 1 for large positive values of $\varepsilon_{t-i}$. Thus, the effective coefficient of $\varepsilon_{t-i}^2$ in the exponential smooth transition GARCH (ESTGARCH) model changes from $\gamma_i$ to $\alpha_i$ and back to $\gamma_i$ again.
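The two transition functions and the resulting one-lag variance update can be sketched in Python as follows. The smoothness parameter is called `theta`, matching the $\theta$ in (14) and (15). The logistic function is evaluated in a numerically stable way, so a very large `theta` does not overflow. The usage lines show the LSTGARCH update approaching the GJR-GARCH values as `theta` grows. All parameter values are illustrative.

```python
import math


def logistic_F(eps, theta):
    """F(eps) = 1 / (1 + exp(-theta * eps)), theta > 0, evaluated without overflow."""
    x = theta * eps
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def exponential_F(eps, theta):
    """F(eps) = 1 - exp(-theta * eps^2), theta > 0."""
    return 1.0 - math.exp(-theta * eps ** 2)


def st_garch11(eps_prev, h_prev, a0, alpha1, gamma1, b1, F):
    """One step of (13) with p = q = 1:
    h_t = a0 + [alpha1 * (1 - F) + gamma1 * F] * eps_{t-1}^2 + b1 * h_{t-1}."""
    w = F(eps_prev)
    return a0 + (alpha1 * (1.0 - w) + gamma1 * w) * eps_prev ** 2 + b1 * h_prev


# GJR limit: coefficient alpha1 = 0.15 for eps <= 0 and gamma1 = 0.05 for eps > 0,
# i.e. h = 0.9375 after eps = -0.5 and h = 0.9125 after eps = +0.5.
for theta in (1.0, 10.0, 1000.0):
    h_neg = st_garch11(-0.5, 1.0, 0.05, 0.15, 0.05, 0.85, lambda e: logistic_F(e, theta))
    h_pos = st_garch11(0.5, 1.0, 0.05, 0.15, 0.05, 0.85, lambda e: logistic_F(e, theta))
    print(theta, h_neg, h_pos)
```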
3.4 Volatility-Switching GARCH Models
The LSTGARCH and GJR-GARCH models assume that the asymmetric behaviour of $h_t$ depends only on the sign of the past shocks $\varepsilon_{t-i}$. In applications, a negative shock is typically found to increase the conditional variance more than a positive shock of the same size. The ESTGARCH model, on the other hand, assumes that the asymmetry is caused by the size of the shock. Rabemananjara and Zakoïan (1993) point out that the asymmetric behaviour of $h_t$ may be more complicated, with both the sign and the size of the shock mattering. In particular, they argue that negative shocks increase future conditional volatility more than positive shocks only if the shock is large in absolute value. For small shocks they observe the opposite kind of asymmetry: small positive shocks increase the conditional volatility more than small negative shocks.
Fornari and Mele (1996, 1997) discuss a model that allows for such complicated asymmetric behaviour. The model lets all parameters in the conditional variance equation depend on the sign of the most recent shock $\varepsilon_{t-1}$. The volatility-switching GARCH (VS-GARCH) model is given by
$$h_t = \Big(\alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j h_{t-j}\Big)[1 - I(\varepsilon_{t-1} > 0)] + \Big(\alpha_0^* + \sum_{i=1}^{q}\alpha_i^*\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j^* h_{t-j}\Big)I(\varepsilon_{t-1} > 0). \tag{16}$$
Clearly, it is a generalization of the GJR-GARCH model.
3.5 Asymmetric Nonlinear Smooth Transition GARCH Models
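Anderson, Nam, and Vahid (1999) modify the VS-GARCH model by allowing the transition from one regime to the other to be smooth. The resulting asymmetric nonlinear smooth transition GARCH (ANST-GARCH) model is given by
$$h_t = \Big(\alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j h_{t-j}\Big)[1 - F(\varepsilon_{t-1})] + \Big(\alpha_0^* + \sum_{i=1}^{q}\alpha_i^*\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j^* h_{t-j}\Big)F(\varepsilon_{t-1}), \tag{17}$$
with $F(\cdot)$ a smooth transition function such as the logistic function (14).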
3.6 Quadratic GARCH Models
Sentana (1995) introduces the quadratic GARCH (QGARCH) model to cope with asymmetric effects of shocks on volatility. The QGARCH(p, q) model is specified as
$$h_t = \alpha_0 + \gamma\varepsilon_{t-1} + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j h_{t-j}. \tag{18}$$
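With one lag, the QGARCH news impact curve is the parabola $h_{t+1} = \alpha_0 + \gamma\varepsilon_t + \alpha_1\varepsilon_t^2 + \beta_1 h_t$. Its minimum lies at $\varepsilon_t^* = -\gamma/(2\alpha_1)$ rather than at zero, so for $\gamma < 0$ negative shocks raise volatility more than positive shocks of the same size. The Python sketch below computes the curve and its minimum value $\alpha_0 + \beta_1 h_t - \gamma^2/(4\alpha_1)$. The parameters are illustrative, and $h_t$ is held at an assumed level `h_bar`.

```python
def qgarch11_nic(eps, a0, gamma, a1, b1, h_bar):
    """NIC of QGARCH(1,1) from (18): h_{t+1} = a0 + gamma*eps_t + a1*eps_t^2 + b1*h_bar."""
    return a0 + gamma * eps + a1 * eps ** 2 + b1 * h_bar


def qgarch11_nic_minimum(a0, gamma, a1, b1, h_bar):
    """The parabola is minimised at eps* = -gamma / (2*a1); returns (eps*, minimum h)."""
    e_star = -gamma / (2.0 * a1)
    return e_star, a0 + b1 * h_bar - gamma ** 2 / (4.0 * a1)


params = (0.05, -0.1, 0.1, 0.85, 1.0)  # a0, gamma, a1, b1, h_bar
print(qgarch11_nic_minimum(*params))                              # minimum at eps* = 0.5
print(qgarch11_nic(-1.0, *params), qgarch11_nic(1.0, *params))  # negative shock raises h more
```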
3.7 Markov-Switching GARCH Models
In the previous specifications, the parameters of the model change according to the sign and/or the size of the shock $\varepsilon_{t-i}$. These models can therefore be interpreted as regime-switching models in which an observable variable determines the regime, similar to the SETAR and STAR models for the conditional mean.
An obvious alternative is to let an unobservable Markov process $s_t$ determine the regime. A general Markov-switching GARCH (MSW-GARCH) model is given by
$$h_t = \Big(\alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j h_{t-j}\Big)I(s_t = 1) + \Big(\alpha_0^* + \sum_{i=1}^{q}\alpha_i^*\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j^* h_{t-j}\Big)I(s_t = 2), \tag{19}$$
where $s_t$ is a two-state Markov chain with a given transition probability matrix. Hamilton and Susmel (1994) and Cai (1994) suggested Markov-switching ARCH models for the conditional variance. Gray (1996) suggested Markov-switching GARCH models for volatility.
3.8 ARCH Models with Conditionally Non-normal Disturbances
When ARCH or GARCH models are applied in empirical studies, many researchers criticize the assumption of a conditionally normal distribution, since fat-tailed data are usually observed. Bollerslev (1987) suggested using the Student-t distribution, whose kurtosis is governed by the degrees-of-freedom parameter, to fit the data better. The resulting t-GARCH model is specified as
$$r_t = \mu_t + \varepsilon_t,$$
$$\varepsilon_t \mid \mathcal{F}_{t-1} \sim f(\varepsilon_t \mid \mathcal{F}_{t-1}) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\left[\pi(\nu-2)h_{t|t-1}\right]^{-1/2}\left[1 + \frac{\varepsilon_t^2}{h_{t|t-1}(\nu-2)}\right]^{-(\nu+1)/2}, \qquad \nu > 2,$$
$$h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{i=1}^{p}\beta_i h_{t-i}, \qquad t = 1, \ldots, T.$$
Here $\mathcal{F}_{t-1}$ is the information set available up to time $t-1$, and $f(\varepsilon_t \mid \mathcal{F}_{t-1})$ is the Student-t density of $\varepsilon_t$ with $\nu$ degrees of freedom. $h_t \equiv h_{t|t-1}$ is the conditional variance of $\varepsilon_t$, and $T$ is the sample size. The density $f(\varepsilon_t \mid \mathcal{F}_{t-1})$ is symmetric around zero, and its variance and fourth moment are
$$\operatorname{Var}(\varepsilon_t \mid \mathcal{F}_{t-1}) = h_{t|t-1}, \qquad E(\varepsilon_t^4 \mid \mathcal{F}_{t-1}) = 3(\nu-2)(\nu-4)^{-1}h_{t|t-1}^2, \quad \nu > 4.$$
Furthermore, as $\nu \to \infty$ the Student-t distribution approaches a normal distribution with variance $h_{t|t-1}$. When $\nu$ is not large, however, it has fatter tails than the corresponding normal distribution. Here the degrees of freedom $\nu$ are treated as an unknown parameter, which must be estimated together with the other unknown parameters of the model. Let $\theta$ collect all the unknown parameters. The log-likelihood function for the whole sample is then
$$L_T(\theta) = \sum_{t=1}^{T}\log f(\varepsilon_t \mid \theta, \mathcal{F}_{t-1}).$$
The standard inference procedures regarding $\theta$ are available.
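The t-GARCH density and log-likelihood translate directly into code. The Python sketch below works with the log density for numerical stability, using `math.lgamma` and `math.log1p`. `t_loglik` assumes the variances $h_t$ have already been computed by some GARCH recursion. The numerical check integrates the density on a grid, with illustrative values $\nu = 5$ and $h = 2$, to confirm unit mass and variance $h$. `normal_logpdf` is included to compare against the normal limit as $\nu \to \infty$.

```python
import math


def t_logpdf(eps, h, nu):
    """log f(eps | F_{t-1}) for the Student-t density with variance h and nu > 2:
    Gamma((nu+1)/2) / Gamma(nu/2) * (pi*(nu-2)*h)^(-1/2) * (1 + eps^2/((nu-2)*h))^(-(nu+1)/2)."""
    if nu <= 2 or h <= 0:
        raise ValueError("need nu > 2 and h > 0")
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(math.pi * (nu - 2) * h)
            - 0.5 * (nu + 1) * math.log1p(eps ** 2 / ((nu - 2) * h)))


def t_loglik(eps, h, nu):
    """L_T = sum_t log f(eps_t | F_{t-1}) given residuals eps_t and variances h_t."""
    return sum(t_logpdf(e, ht, nu) for e, ht in zip(eps, h))


def normal_logpdf(eps, h):
    """Gaussian log density with mean 0 and variance h, the nu -> infinity limit."""
    return -0.5 * math.log(2.0 * math.pi * h) - 0.5 * eps ** 2 / h


# Riemann-sum check on [-60, 60]: total mass about 1 and variance about h = 2 (nu = 5).
step = 0.01
grid = [-60.0 + k * step for k in range(12001)]
dens = [math.exp(t_logpdf(x, 2.0, 5.0)) for x in grid]
mass = sum(dens) * step
var = sum(x * x * d for x, d in zip(grid, dens)) * step
print(mass, var)
print(t_logpdf(0.7, 1.5, 1e7), normal_logpdf(0.7, 1.5))  # nearly equal for huge nu
```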
3.8.1 t-distributed Errors
When the innovations $z_t$ are assumed to be normally distributed, the conditional distribution of $\varepsilon_t$ is normal with mean zero and variance $h_t$. The unconditional distribution of a series $\varepsilon_t$ whose conditional variance follows a GARCH model is, however, non-normal. In particular, the kurtosis of $\varepsilon_t$ exceeds the normal value of 3, so the unconditional distribution has fatter tails than the normal distribution.
3.9 Long Memory Stochastic Volatility (LMSV) Models
The stochastic volatility model is specified by
$$\varepsilon_t = z_t\sqrt{h_t}, \qquad \sqrt{h_t} = \sigma\exp(v_t/2).$$
Here $\{v_t\}$ is an ARMA process independent of $\{z_t\}$, and $\{z_t\}$ is independent and identically distributed with mean zero and variance one. Breidt, Crato, and Lima (1998) suggested the long memory stochastic volatility (LMSV) model for the case in which $\{v_t\}$ is a stationary long-memory process.
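When $\{v_t\}$ is Gaussian with mean zero, $\{\varepsilon_t\}$ is both covariance and strictly stationary. Denote by $\gamma(\cdot)$ the autocovariance function of $\{v_t\}$. The covariance structure of $\varepsilon_t$ follows from the properties of the lognormal distribution:
$$E(\varepsilon_t) = 0, \qquad \operatorname{var}(\varepsilon_t) = \exp(\gamma(0)/2)\,\sigma^2, \qquad \operatorname{cov}(\varepsilon_t, \varepsilon_{t+h}) = 0 \ \text{for } h \ne 0.$$
Hence $\{\varepsilon_t\}$ is a white noise sequence. When the driving noise $\{z_t\}$ is Gaussian,
$$\frac{E(\varepsilon_t^4)}{[E(\varepsilon_t^2)]^2} - 3 = 3\{\exp[\gamma(0)] - 1\} > 0,$$
so $\{\varepsilon_t\}$ displays excess kurtosis.
The process $\{\varepsilon_t^2\}$ is also both covariance and strictly stationary, with
$$E(\varepsilon_t^2) = \exp\left(\frac{\gamma(0)}{2}\right)\sigma^2,$$
$$\operatorname{var}(\varepsilon_t^2) = \sigma^4\{[1 + \operatorname{var}(z_t^2)]\exp[2\gamma(0)] - \exp[\gamma(0)]\},$$
$$\operatorname{cov}(\varepsilon_t^2, \varepsilon_{t+h}^2) = \sigma^4\{\exp[\gamma(0) + \gamma(h)] - \exp[\gamma(0)]\}, \qquad h \ne 0.$$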
The series is simple to analyze after it is transformed to the stationary process
$$\begin{aligned}
x_t = \log(\varepsilon_t^2) &= \log(h_t z_t^2) = \log(h_t) + \log(z_t^2)\\
&= \log[\sigma^2(\exp(v_t/2))^2] + \log(z_t^2)\\
&= \log(\sigma^2) + \log[(\exp(v_t/2))^2] + \log(z_t^2)\\
&= \log(\sigma^2) + 2\log[\exp(v_t/2)] + \log(z_t^2)\\
&= \log(\sigma^2) + E[\log(z_t^2)] + v_t + \{\log(z_t^2) - E[\log(z_t^2)]\}\\
&= \mu + v_t + \xi_t,
\end{aligned}$$
where $\mu = \log(\sigma^2) + E[\log(z_t^2)]$ and $\{\xi_t\}$ is i.i.d. with mean zero and variance $\sigma_\xi^2$. For example, if $z_t$ is standard normal, then $\log(z_t^2)$ is distributed as the log of a $\chi_1^2$ random variable, with $E[\log(z_t^2)] \approx -1.27$ and $\sigma_\xi^2 = \pi^2/2$ (Wishart, 1947). Therefore, $\{x_t\}$ is a long-memory Gaussian signal plus i.i.d. non-Gaussian noise, with $E(x_t) = \mu$ and
$$\gamma_x(k) = \operatorname{cov}(x_t, x_{t+k}) = \gamma(k) + \sigma_\xi^2 I_{\{k=0\}},$$
where $I_{\{k=0\}} = 1$ if $k = 0$ and $0$ otherwise.
It turns out that the autocovariance function of the process $\{\log(\varepsilon_t^2)\}$ is the same as that of a particular fractionally integrated EGARCH model; see Breidt et al. (1998).
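The constants used for $\xi_t$ can be checked by simulation. For standard normal $z_t$, $E[\log(z_t^2)] = \psi(1/2) + \log 2 = -\gamma_E - \log 2 \approx -1.2704$ and $\operatorname{var}[\log(z_t^2)] = \psi'(1/2) = \pi^2/2$. Here $\psi$ is the digamma function and $\gamma_E$ is Euler's constant. The Python sketch below compares these values with the sample moments of simulated draws. The seed and sample size are arbitrary.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329
MEAN_LOG_CHI2_1 = -EULER_GAMMA - math.log(2.0)  # E[log z^2] for z ~ N(0, 1), about -1.27
VAR_LOG_CHI2_1 = math.pi ** 2 / 2.0             # var[log z^2] = pi^2 / 2

rng = random.Random(3)
xs = [math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(200_000)]
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
print(round(MEAN_LOG_CHI2_1, 4), round(m, 3))  # -1.2704 and a nearby sample value
print(round(VAR_LOG_CHI2_1, 4), round(v, 3))   # 4.9348 and a nearby sample value
```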
The LMSV model considered in this paper extends the original LMSV model of Breidt et al. (1998). It replaces $\sqrt{h_t} = \sigma\exp(x_t/2)$ with the more general specification $\sigma_t = K(x_t)$, i.e.,
$$r_t = \sigma_t\varepsilon_t, \qquad \sigma_t = K(x_t),$$
where $K(\cdot)$ is a positive Borel function and
$$x_t = \sum_{i=1}^{\infty}\psi_i\zeta_{t-i}$$
is a long-memory linear process. The coefficients are regularly varying, $\psi_i = i^{-\beta}L(i)$ with $L(\cdot)$ slowly varying and $1/2 < \beta < 1$. The innovations $\{\zeta_t\}$ are i.i.d. with zero mean and unit variance, and independent of $\{\varepsilon_t\}$. It is also assumed that $E(x_t^2) = 1$. This model is called GLMSV in this paper. Under the GLMSV model assumptions, this paper aims to estimate and test the higher-order cross moments (or higher-order cross covariances)
$$\gamma(p, q, k) = \operatorname{cov}(r_t^p, r_{t+k}^q)$$
by using the natural sample estimate
$$\hat\gamma(p, q, k) = n^{-1}\sum_{t=1}^{n-k}(r_t^p - \hat\mu_p)(r_{t+k}^q - \hat\mu_q),$$
where $n$ is the sample size and $\hat\mu_s = n^{-1}\sum_{t=1}^{n} r_t^s$, $s = p, q$. The main implication of $\gamma(p, q, k)$ is that it can reflect various kinds of forecast efficiency.
1. (1, q, k) = 0 for all q and k speaks of the usual market efficiency;
2. Non-zero (p, q, k) with restricted q and fixed k is related to the efficiency of volatility
forecast, skewness forecast and kurtosis forecast for p = 2, 3, and 4, respectively.
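The estimator $\hat\gamma(p, q, k)$ can be coded in a few lines. In this Python sketch the sum necessarily runs over the $n-k$ available pairs $(r_t, r_{t+k})$ and is divided by $n$. The toy inputs serve only to check the arithmetic.

```python
def cross_moment(r, p, q, k):
    """gamma_hat(p, q, k) = n^{-1} * sum_{t=1}^{n-k} (r_t^p - mu_p)(r_{t+k}^q - mu_q),
    with mu_s = n^{-1} * sum_t r_t^s (full-sample means, divisor n)."""
    n = len(r)
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    mu_p = sum(x ** p for x in r) / n
    mu_q = sum(x ** q for x in r) / n
    return sum((r[t] ** p - mu_p) * (r[t + k] ** q - mu_q) for t in range(n - k)) / n


print(cross_moment([1.0, 2.0, 3.0, 4.0], 1, 1, 0))    # 1.25, the (divisor-n) sample variance
print(cross_moment([1.0, -1.0, 1.0, -1.0], 1, 1, 1))  # -0.75
```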
The limiting distribution of the normalized (p, q, k) is derived in Theorem 1. Since the
result of (1) depends on the values of and d, consistent estimates of and d are suggested
in empirical inference on (p, q, k).
4 Testing for GARCH
This section discusses several important issues in statistical testing for GARCH models:
1. Homoskedasticity vs. linear GARCH
2. Linear GARCH vs. asymmetric nonlinear GARCH
3. Homoskedasticity vs. asymmetric nonlinear GARCH
4.1 Testing for Linear GARCH
Engle (1982) developed a test for conditional heteroskedasticity in the context of ARCH models based on the Lagrange Multiplier (LM) principle. Given an ARCH(q) model,
$$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2,$$
the null of constant $h_t$ becomes $H_0: \alpha_1 = \cdots = \alpha_q = 0$. The corresponding LM test can be computed as $TR^2$, where $T$ is the sample size and $R^2$ is obtained from the auxiliary regression
$$\hat\varepsilon_t^2 = \omega_0 + \omega_1 \hat\varepsilon_{t-1}^2 + \cdots + \omega_q \hat\varepsilon_{t-q}^2 + u_t,$$
where $\hat\varepsilon_t$ is the residual from the regression for the conditional mean under the null hypothesis. The LM test statistic has an asymptotic $\chi^2(q)$ distribution. Lee (1991) shows that the LM test against a GARCH(p, q) alternative is the same as the LM test against the alternative of ARCH(q) errors.
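A minimal sketch of Engle's $TR^2$ statistic follows, assuming the residuals $\hat\varepsilon_t$ from the mean equation are already available. `ols_r2` is a small hypothetical helper (normal equations solved by Gaussian elimination) so the example needs only the standard library; $T$ is taken as the effective sample $T - q$. The simulated ARCH(1) parameters (0.2, 0.5) are arbitrary illustrative values.

```python
import random


def ols_r2(y, X):
    """R^2 of an OLS regression of y on the columns of X (X must contain a constant)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting on X'X beta = X'y
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for i in range(c + 1, k):
            f = A[i][c] / A[c][c]
            for j in range(c, k):
                A[i][j] -= f * A[c][j]
            b[i] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):  # back substitution
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - sse / sst


def arch_lm(resid, q):
    """Engle's LM statistic: (T - q) * R^2 of e_t^2 on a constant and q lags of e^2."""
    e2 = [e * e for e in resid]
    y = e2[q:]
    X = [[1.0] + [e2[t - i] for i in range(1, q + 1)] for t in range(q, len(e2))]
    return len(y) * ols_r2(y, X)


random.seed(1)
arch, prev = [], 0.0
for _ in range(3000):  # ARCH(1) data: h_t = 0.2 + 0.5 e_{t-1}^2
    prev = random.gauss(0.0, 1.0) * (0.2 + 0.5 * prev ** 2) ** 0.5
    arch.append(prev)
white = [random.gauss(0.0, 1.0) for _ in range(3000)]
print(arch_lm(arch, 2), arch_lm(white, 2))  # compare with the chi^2(2) 5% value 5.99
```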
4.2 Testing for Nonlinear GARCH
With respect to the specification of nonlinear GARCH models, there are two possible routes one might follow. First, one can start by specifying and estimating a linear GARCH model and subsequently test for the need for asymmetric or other nonlinear components in the model. Second, one can test the null hypothesis of conditional homoscedasticity directly against the alternative of asymmetric ARCH.
Engle and Ng (1993) discuss tests to check whether positive and negative shocks have a different impact on the conditional variance. Let $S_{t-1}^-$ denote a dummy variable that takes the value 1 when $\hat\varepsilon_{t-1}$ is negative and 0 otherwise, where $\hat\varepsilon_t$ are the residuals from estimating a model for the conditional mean of $r_t$ under the null of conditional homoscedasticity. The tests examine whether the squared residual $\hat\varepsilon_t^2$ can be predicted by $S_{t-1}^-$, $S_{t-1}^-\hat\varepsilon_{t-1}$, and/or $S_{t-1}^+\hat\varepsilon_{t-1}$, where $S_{t-1}^+ \equiv 1 - S_{t-1}^-$. The test statistics are computed as the $t$-ratio of the parameter $\phi_1$ in the regression
$$\hat\varepsilon_t^2 = \phi_0 + \phi_1 w_{t-1} + u_t,$$
where $w_{t-1}$ is one of $S_{t-1}^-$, $S_{t-1}^-\hat\varepsilon_{t-1}$, and $S_{t-1}^+\hat\varepsilon_{t-1}$.

When $w_{t-1} = S_{t-1}^-$, the test is called the Sign Bias (SB) test. When $w_{t-1} = S_{t-1}^-\hat\varepsilon_{t-1}$ or $w_{t-1} = S_{t-1}^+\hat\varepsilon_{t-1}$, the tests are called the Negative Size Bias (NSB) and Positive Size Bias (PSB) tests, respectively. As the SB, NSB and PSB statistics are $t$-ratios, they follow a standard normal distribution asymptotically.
The tests can be conducted jointly by estimating the regression
$$\hat\varepsilon_t^2 = \phi_0 + \phi_1 S_{t-1}^- + \phi_2 S_{t-1}^-\hat\varepsilon_{t-1} + \phi_3 S_{t-1}^+\hat\varepsilon_{t-1} + u_t.$$
The null hypothesis $H_0: \phi_1 = \phi_2 = \phi_3 = 0$ can be evaluated by computing $TR^2$, where $R^2$ is the coefficient of determination from this regression. The resulting test statistic has an asymptotic $\chi^2$ distribution with 3 degrees of freedom.
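The joint Engle-Ng regression can be sketched as follows. The code reuses a hypothetical OLS helper `ols_r2` and assumes the mean-equation residuals are given; the simulated series, in which only negative shocks raise next-period variance, is an illustrative assumption chosen so that the asymmetry is easy to detect.

```python
import random


def ols_r2(y, X):
    """R^2 of an OLS regression of y on the columns of X (X must contain a constant)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for i in range(c + 1, k):
            f = A[i][c] / A[c][c]
            for j in range(c, k):
                A[i][j] -= f * A[c][j]
            b[i] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / sst


def sign_bias_joint(resid):
    """T R^2 of e_t^2 on 1, S-_{t-1}, S-_{t-1} e_{t-1}, S+_{t-1} e_{t-1}."""
    y, X = [], []
    for t in range(1, len(resid)):
        e = resid[t - 1]
        s_neg = 1.0 if e < 0 else 0.0  # S^-_{t-1}
        s_pos = 1.0 - s_neg            # S^+_{t-1}
        X.append([1.0, s_neg, s_neg * e, s_pos * e])
        y.append(resid[t] ** 2)
    return len(y) * ols_r2(y, X)  # asymptotically chi^2(3) under H0


random.seed(2)
asym, prev = [], 0.0
for _ in range(3000):  # only negative shocks raise the next-period variance
    h = 0.2 + (0.5 * prev ** 2 if prev < 0 else 0.0)
    prev = random.gauss(0.0, 1.0) * h ** 0.5
    asym.append(prev)
print(sign_bias_joint(asym))  # compare with the chi^2(3) 5% value 7.81
```

The individual SB, NSB and PSB statistics would instead use the $t$-ratio from the single-regressor regressions, which requires coefficient standard errors not computed here.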
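Sentana (1995) discusses a test of homoscedasticity against the alternative of quadratic ARCH (QARCH). Consider the QARCH model in (18), set $\beta_1 = \cdots = \beta_p = 0$, and add the lagged shocks $\varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$ and their squares, that is,
$$h_t = \alpha_0 + \gamma_1\varepsilon_{t-1} + \gamma_2\varepsilon_{t-2} + \cdots + \gamma_q\varepsilon_{t-q} + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \cdots + \alpha_q\varepsilon_{t-q}^2.$$
Therefore, the null of conditional homoscedasticity is $H_0: \gamma_i = \alpha_i = 0$, $i = 1, \ldots, q$. An LM statistic to test this null can be computed as $TR^2$ from a regression of $\hat\varepsilon_t^2$ on a constant, $\hat\varepsilon_{t-1}, \ldots, \hat\varepsilon_{t-q}$ and $\hat\varepsilon_{t-1}^2, \ldots, \hat\varepsilon_{t-q}^2$. Asymptotically, the LM statistic is $\chi^2$ distributed with $2q$ degrees of freedom.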
Hagerud (1997) suggests two statistics to test constant conditional variance against STARCH. The STARCH(q) model is
$$h_t = \alpha_0 + \sum_{i=1}^{q}\left\{\alpha_i\varepsilon_{t-i}^2[1 - F(\varepsilon_{t-i})] + \gamma_i\varepsilon_{t-i}^2F(\varepsilon_{t-i})\right\} + \sum_{j=1}^{p}\beta_j h_{t-j},$$
where $F(\cdot)$ is either the logistic function or the exponential function. The null hypothesis of conditional homoscedasticity can again be specified as $H_0: \alpha_1 = \cdots = \alpha_q = \gamma_1 = \cdots = \gamma_q = 0$. The testing problem is complicated in this case because the parameter $\theta$ in $F(\cdot)$ is not identified under the null. The solution is to approximate the transition function by a low-order Taylor expansion. In the case of the logistic STARCH (LSTARCH) model, this results in the auxiliary model
$$h_t = \alpha_0 + \sum_{i=1}^{q}\lambda_i\varepsilon_{t-i}^2 + \sum_{i=1}^{q}\delta_i\varepsilon_{t-i}^3.$$
An LM statistic to test the equivalent null hypothesis $H_0: \lambda_1 = \cdots = \lambda_q = \delta_1 = \cdots = \delta_q = 0$ can be computed as $TR^2$ from the regression of $\hat\varepsilon_t^2$ on a constant, $\hat\varepsilon_{t-1}^2, \ldots, \hat\varepsilon_{t-q}^2$ and $\hat\varepsilon_{t-1}^3, \ldots, \hat\varepsilon_{t-q}^3$. Asymptotically, the statistic is $\chi^2$ distributed with $2q$ degrees of freedom. In the case of the exponential STARCH (ESTARCH) model, the auxiliary regression includes $\hat\varepsilon_{t-i}^4$, $i = 1, \ldots, q$, instead of $\hat\varepsilon_{t-i}^3$, $i = 1, \ldots, q$.
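Sentana's QARCH test and Hagerud's LSTARCH/ESTARCH tests differ only in which lagged terms enter the auxiliary regression for $\hat\varepsilon_t^2$, so one sketch covers all three. The helper `ols_r2`, the `kind` labels and the simulated ARCH(1) input are illustrative assumptions; each statistic is compared with a $\chi^2(2q)$ critical value.

```python
import random


def ols_r2(y, X):
    """R^2 of an OLS regression of y on the columns of X (X must contain a constant)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for i in range(c + 1, k):
            f = A[i][c] / A[c][c]
            for j in range(c, k):
                A[i][j] -= f * A[c][j]
            b[i] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / sst


def aux_lm(resid, q, kind):
    """T R^2 from the auxiliary regression of e_t^2 for the chosen alternative."""
    y, X = [], []
    for t in range(q, len(resid)):
        lags = [resid[t - i] for i in range(1, q + 1)]
        sq = [x ** 2 for x in lags]
        if kind == "qarch":      # Sentana (1995): levels and squares
            extra = lags + sq
        elif kind == "lstarch":  # Hagerud (1997), logistic: squares and cubes
            extra = sq + [x ** 3 for x in lags]
        elif kind == "estarch":  # Hagerud (1997), exponential: squares and fourth powers
            extra = sq + [x ** 4 for x in lags]
        else:
            raise ValueError("kind must be 'qarch', 'lstarch' or 'estarch'")
        X.append([1.0] + extra)
        y.append(resid[t] ** 2)
    return len(y) * ols_r2(y, X)  # asymptotically chi^2(2q) under H0


random.seed(1)
arch, prev = [], 0.0
for _ in range(3000):  # ARCH(1) data: h_t = 0.2 + 0.5 e_{t-1}^2
    prev = random.gauss(0.0, 1.0) * (0.2 + 0.5 * prev ** 2) ** 0.5
    arch.append(prev)
for kind in ("qarch", "lstarch", "estarch"):
    print(kind, aux_lm(arch, 1, kind))  # compare with the chi^2(2) 5% value 5.99
```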
4.3 Testing for ARCH in the Presence of Misspecification
The small sample properties of the LM test for linear (G)ARCH have been investigated
quite extensively. In particular, it has been found that rejection of the null hypothesis of
homoscedasticity might be due to other sorts of model misspecification, such as neglected
serial correlation (e.g., Engle, Hendry, and Trumble (1985), Bera, Higgins, and Lee (1992)
and Sullivan and Giles (1995)), nonlinearity (e.g., Bera and Higgins (1997)) and omitted
variables (e.g., Giles, Giles, and Wong (1993) and Lumsdaine and Ng (1999)) in the model
for the conditional mean.
One possible solution is to estimate the conditional mean by nonparametric regression, obtain the fitted residuals $\hat\varepsilon_t$, and then apply a significance test in the nonparametric regression of $\hat\varepsilon_t^2$ on $\hat\varepsilon_{t-1}$, $\hat\varepsilon_{t-1}^2$, $\hat\varepsilon_{t-1}^3$ and $\hat\varepsilon_{t-1}^4$.
4.4 Testing for ARCH in the Presence of Outliers
van Dijk, Franses, and Lucas (1999) study the behaviour of the LM test for ARCH based on the ARCH(q) specification
$$h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2$$
in the presence of additive outliers (AOs). If the AOs are neglected, the LM test over-rejects the null of homoscedasticity when it is in fact true. Moreover, the test has difficulty detecting genuine GARCH effects, in the sense that its power is reduced considerably. To overcome this problem, a robust LM test is suggested.
Suppose the conditional mean equation is $r_t = \phi_1 r_{t-1} + \varepsilon_t$. Lucas (1996) discusses how to use the Generalized M (GM) estimator to estimate the conditional mean. A GM estimator of $\phi_1$ is obtained from the moment condition
$$\sum_{t=1}^{T}(r_t - \phi_1 r_{t-1})r_{t-1}w_\psi(\tilde\varepsilon_t) = 0,$$
where $\tilde\varepsilon_t$ denotes the standardized residual, $\tilde\varepsilon_t \equiv (r_t - \phi_1 r_{t-1})/[\sigma_\varepsilon w_x(r_{t-1})]$, with $\sigma_\varepsilon$ a measure of scale of the residuals $\varepsilon_t \equiv r_t - \phi_1 r_{t-1}$, and $w_x$ and $w_\psi$ weight functions bounded between 0 and 1. An AO at $t = \tau$ shows up as an aberrant value of $r_\tau$ and/or of $(r_{\tau+1} - \phi_1 r_\tau)/\sigma_\varepsilon$, whereas the latter can, of course, also be caused by an AO at time $\tau + 1$. The functions $w_x$ and $w_\psi$ should be chosen such that the $(\tau+1)$st observation receives a relatively small weight if either the regressor $r_\tau$ or the standardized residual $(r_{\tau+1} - \phi_1 r_\tau)/\sigma_\varepsilon$ becomes large, so that the outlier does not influence the estimates of $\phi_1$ and $\sigma_\varepsilon$.
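The weight function $w_\psi(\tilde\varepsilon_t)$ is usually specified in terms of a function $\psi(\tilde\varepsilon_t)$ as $w_\psi(\tilde\varepsilon_t) = \psi(\tilde\varepsilon_t)/\tilde\varepsilon_t$ for $\tilde\varepsilon_t \neq 0$ and $w_\psi(0) = 1$. Common choices for the $\psi(\cdot)$ function are the Huber and Tukey bisquare functions. The Huber function is given by
$$\psi(\tilde\varepsilon_t) = \begin{cases} -c & \text{if } \tilde\varepsilon_t \le -c, \\ \tilde\varepsilon_t & \text{if } -c < \tilde\varepsilon_t \le c, \\ c & \text{if } \tilde\varepsilon_t > c, \end{cases}$$
or $\psi(\tilde\varepsilon_t) = \mathrm{med}(-c, c, \tilde\varepsilon_t)$, where med denotes the median and $c > 0$. Usually $c$ is taken equal to 1.345 to produce an estimator that has an efficiency of 95% compared to the OLS estimator if $\varepsilon_t$ is normally distributed. The weights $w_\psi(\tilde\varepsilon_t)$ implied by the Huber function have the attractive property that $w_\psi(\tilde\varepsilon_t) = 1$ if $|\tilde\varepsilon_t| \le c$. Only observations for which the standardized residual lies outside this region receive less weight. A disadvantage is that these weights decline to zero very slowly. Subjective judgement is thus required to decide whether a weight is small or not.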
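The Tukey bisquare function is given by
$$\psi(\tilde\varepsilon_t) = \begin{cases} \tilde\varepsilon_t\left(1 - (\tilde\varepsilon_t/c)^2\right)^2 & \text{if } |\tilde\varepsilon_t| \le c, \\ 0 & \text{if } |\tilde\varepsilon_t| > c. \end{cases}$$
Usually $c$ is set equal to 4.685, again to achieve 95% efficiency for normally distributed $\varepsilon_t$.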
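A third possibility is the polynomial $\psi$ function proposed in Lucas, van Dijk and Kloek (1996), given by
$$\psi(\tilde\varepsilon_t) = \begin{cases} \tilde\varepsilon_t & \text{if } |\tilde\varepsilon_t| \le c_1, \\ \mathrm{sgn}(\tilde\varepsilon_t)\,g(|\tilde\varepsilon_t|) & \text{if } c_1 < |\tilde\varepsilon_t| \le c_2, \\ 0 & \text{if } |\tilde\varepsilon_t| > c_2. \end{cases}$$
Usually, $c_1 = 2.576$ and $c_2 = 3.291$ are taken.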
The weight function $w_x(\cdot)$ for the regressor is commonly specified as
$$w_x(r_{t-1}) = \psi\left(d(r_{t-1})^\alpha\right)/d(r_{t-1})^\alpha,$$
where $d(r_{t-1})$ is the Mahalanobis distance of $r_{t-1}$, that is, $d(r_{t-1}) = |r_{t-1} - m_r|/\sigma_r$, with $m_r$ and $\sigma_r$ measures of location and scale of $r_{t-1}$, respectively. These measures can be estimated robustly by the median $m_r = \mathrm{med}(r_{t-1})$ and the median absolute deviation (MAD) $\sigma_r = 1.483\,\mathrm{med}(|r_{t-1} - m_r|)$, respectively. Finally, following Simpson, Ruppert, and Carroll (1992), the constant $\alpha$ is usually set equal to 2 to obtain robust standard errors.
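The Huber and bisquare $\psi$ functions, the implied weights $w_\psi$, and the regressor weights $w_x$ can be written down directly. This is a sketch assuming the tuning constants quoted above; using the Huber $\psi$ inside $w_x$ is our assumption, and the polynomial $\psi$ is omitted because $g(\cdot)$ is not specified here.

```python
import statistics


def huber_psi(t, c=1.345):
    """Huber psi: med(-c, c, t)."""
    return max(-c, min(c, t))


def bisquare_psi(t, c=4.685):
    """Tukey bisquare psi: t (1 - (t/c)^2)^2 inside [-c, c], zero outside."""
    return t * (1.0 - (t / c) ** 2) ** 2 if abs(t) <= c else 0.0


def weight(psi, t, **kwargs):
    """w(t) = psi(t) / t for t != 0 and w(0) = 1."""
    return 1.0 if t == 0 else psi(t, **kwargs) / t


def regressor_weights(x, alpha=2, psi=huber_psi):
    """w_x(r_{t-1}) = psi(d^alpha) / d^alpha, d = |r_{t-1} - m_r| / sigma_r (median, MAD)."""
    m_r = statistics.median(x)
    s_r = 1.483 * statistics.median([abs(v - m_r) for v in x])
    return [weight(psi, (abs(v - m_r) / s_r) ** alpha) for v in x]


for t in (0.5, 2.0, 4.0, 6.0):
    print(t, round(weight(huber_psi, t), 3), round(weight(bisquare_psi, t), 3))
print(regressor_weights([0.0, 1.0, 2.0, 3.0, 100.0]))
```

The printout illustrates the trade-off described above: the Huber weight at $\tilde\varepsilon_t = 6$ is still about 0.22, whereas the bisquare weight is exactly zero beyond $c = 4.685$.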
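Notice that the weights $w_\psi(\cdot)$ depend on the unknown parameters $\phi_1$ and $\sigma_\varepsilon$ and therefore are not fixed a priori but are determined endogenously. Consequently, the moment condition is nonlinear in $\phi_1$ and $\sigma_\varepsilon$, and estimation of these parameters requires an iterative procedure. Interpreting $w_\psi(\cdot)$ as a function of $(\phi_1, \sigma_\varepsilon)$, written $w_\psi(\phi_1, \sigma_\varepsilon)$, and denoting the estimates of $\phi_1$ and $\sigma_\varepsilon$ at the $n$th iteration by $\hat\phi_1^{(n)}$ and $\hat\sigma_\varepsilon^{(n)}$, respectively, it follows from the moment condition that $\hat\phi_1^{(n+1)}$ can be computed as the weighted least squares estimate
$$\hat\phi_1^{(n+1)} = \frac{\sum_{t=1}^{T} w_\psi(\hat\phi_1^{(n)}, \hat\sigma_\varepsilon^{(n)})\,r_{t-1}r_t}{\sum_{t=1}^{T} w_\psi(\hat\phi_1^{(n)}, \hat\sigma_\varepsilon^{(n)})\,r_{t-1}^2},$$
where the estimate of $\sigma_\varepsilon$ can be updated at each iteration using a robust estimator of scale, such as the median absolute deviation (MAD) given by $\hat\sigma_\varepsilon = 1.483\,\mathrm{med}(|\hat\varepsilon_t - \mathrm{med}(\hat\varepsilon_t)|)$.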
Let $\tilde\varepsilon_t$ be the standardized residuals and $w_\psi(\tilde\varepsilon_t)$ the corresponding weights. The weighted standardized residuals $w_\psi(\tilde\varepsilon_t)\tilde\varepsilon_t = \psi(\tilde\varepsilon_t)$ can then be constructed. A robust equivalent of the LM test for ARCH(q) is obtained as $TR^2$, where $R^2$ is from the regression of $\psi(\tilde\varepsilon_t)^2$ on a constant and $\psi(\tilde\varepsilon_{t-1})^2, \ldots, \psi(\tilde\varepsilon_{t-q})^2$. Under conventional assumptions, the outlier-robust LM test has a $\chi^2(q)$ distribution asymptotically. A similar procedure can be followed to obtain outlier-robust tests against the alternative of nonlinear ARCH. For example, a robust test against LSTARCH(q) can be computed as $TR^2$ with $R^2$ from a regression of $\psi(\tilde\varepsilon_t)^2$ on $\psi(\tilde\varepsilon_{t-1})^2, \ldots, \psi(\tilde\varepsilon_{t-q})^2$ and $\psi(\tilde\varepsilon_{t-1})^3, \ldots, \psi(\tilde\varepsilon_{t-q})^3$.
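The iterative GM estimation and the outlier-robust LM test can be combined in one sketch. It assumes Huber weights ($c = 1.345$), $\alpha = 2$ in $w_x$, an OLS starting value, and a fixed number of 50 iterations; these are illustrative choices rather than the procedure of Lucas (1996) in every detail. The simulated AR(1) series with ten additive outliers of size 20 is also an assumption made for the demonstration.

```python
import random
import statistics


def ols_r2(y, X):
    """R^2 of an OLS regression of y on the columns of X (X must contain a constant)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for i in range(c + 1, k):
            f = A[i][c] / A[c][c]
            for j in range(c, k):
                A[i][j] -= f * A[c][j]
            b[i] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / sst


def huber_psi(t, c=1.345):
    return max(-c, min(c, t))


def huber_w(t, c=1.345):
    return 1.0 if abs(t) <= c else c / abs(t)  # psi(t) / t


def mad(values):
    m = statistics.median(values)
    return 1.483 * statistics.median([abs(v - m) for v in values])


def gm_ar1(r, iters=50):
    """IRLS for the GM estimator of phi_1 in r_t = phi_1 r_{t-1} + e_t."""
    x, y = r[:-1], r[1:]
    m_r, s_r = statistics.median(x), mad(x)
    wx = [huber_w(((v - m_r) / s_r) ** 2) for v in x]  # w_x with alpha = 2
    phi = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)  # OLS start
    for _ in range(iters):
        e = [b - phi * a for a, b in zip(x, y)]
        sigma = mad(e)  # robust scale update
        w = [huber_w(ei / (sigma * wi)) for ei, wi in zip(e, wx)]
        phi = sum(wi * a * b for wi, a, b in zip(w, x, y)) / sum(wi * a * a for wi, a in zip(w, x))
    e = [b - phi * a for a, b in zip(x, y)]
    sigma = mad(e)
    psi = [huber_psi(ei / (sigma * wi)) for ei, wi in zip(e, wx)]  # psi(standardized residual)
    return phi, psi


def robust_arch_lm(psi, q):
    """T R^2 of psi_t^2 on a constant and psi_{t-1}^2, ..., psi_{t-q}^2."""
    p2 = [v * v for v in psi]
    y = p2[q:]
    X = [[1.0] + [p2[t - i] for i in range(1, q + 1)] for t in range(q, len(p2))]
    return len(y) * ols_r2(y, X)


random.seed(3)
r = [0.0]
for _ in range(499):
    r.append(0.5 * r[-1] + random.gauss(0.0, 1.0))
for tau in range(40, 500, 50):  # ten additive outliers of size 20
    r[tau] += 20.0
phi_ols = sum(a * b for a, b in zip(r[:-1], r[1:])) / sum(a * a for a in r[:-1])
phi_gm, psi = gm_ar1(r)
print(phi_ols, phi_gm, robust_arch_lm(psi, 1))
```

The OLS estimate is pulled strongly towards zero by the outliers, while the GM estimate should stay close to the true value 0.5.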
5 Estimation for Conditional Variance Models
Recall that $r_t = \log(p_t) - \log(p_{t-1})$ is the log return of an asset at time $t$, where $p_t$ is the price level of the asset at $t$. Suppose the conditional mean and conditional variance of $r_t$ given $\mathcal{F}_{t-1}$ are
$$E(r_t \mid \mathcal{F}_{t-1}) = \mu_t = G(x_t; \xi),$$
$$\mathrm{var}(r_t \mid \mathcal{F}_{t-1}) = E[(r_t - \mu_t)^2 \mid \mathcal{F}_{t-1}] = h_t,$$
where $G(x_t; \xi)$ is the conditional mean function with parameters $\xi$ for exogenous regressors $x_t$. The function $G(\cdot)$ is assumed to be at least twice continuously differentiable. Then
$$r_t = G(x_t; \xi) + \varepsilon_t.$$
As to the conditional variance, the model for the time-varying variance is assumed to have parameters $\zeta = (\alpha_0, \alpha_i, \beta_i)'$. Denote $\theta = (\xi', \zeta')'$ and its true value $\theta_0 = (\xi_0', \zeta_0')'$. Given the assumed pdf $f(z_t)$ for $z_t = \varepsilon_t/\sqrt{h_t}$, the parameters in $\theta$ can conveniently be estimated by maximum likelihood (ML). The conditional log likelihood for the $t$th observation is
$$l_t(\theta) = \ln f(\varepsilon_t/\sqrt{h_t}) - \ln\sqrt{h_t}.$$
For example, if $z_t$ is normally distributed,
$$l_t(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln h_t - \frac{\varepsilon_t^2}{2h_t}.$$
The maximum likelihood estimator (MLE) for $\theta$, denoted $\hat\theta_{ML}$, is found by maximizing the log likelihood for the full sample, which is simply the sum of the conditional log likelihoods. The MLE solves the first-order condition
$$\sum_{t=1}^{T}\frac{\partial l_t(\theta)}{\partial\theta} = 0.$$
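For a zero-mean GARCH(1,1) with normal $z_t$, the sample log likelihood $\sum_t l_t(\theta)$ can be evaluated by running the variance recursion. The sketch below assumes $\mu_t = 0$, starts $h_1$ at the sample variance (a common but not unique choice), and uses a crude grid search as a stand-in for a proper numerical optimizer; the simulated parameters (0.1, 0.1, 0.8) are arbitrary.

```python
import math
import random


def garch11_loglik(params, eps, h0=None):
    """Gaussian log likelihood sum_t l_t(theta) for a zero-mean GARCH(1,1)."""
    a0, a1, b1 = params
    if a0 <= 0 or a1 < 0 or b1 < 0 or a1 + b1 >= 1:
        return -math.inf  # outside the covariance-stationary region
    h = h0 if h0 is not None else sum(e * e for e in eps) / len(eps)  # start-up value
    ll = 0.0
    for e in eps:
        ll += -0.5 * math.log(2 * math.pi) - 0.5 * math.log(h) - e * e / (2 * h)
        h = a0 + a1 * e * e + b1 * h  # h_{t+1} from eps_t and h_t
    return ll


random.seed(4)
eps, h = [], 1.0
for _ in range(3000):  # simulate with true parameters (0.1, 0.1, 0.8)
    e = random.gauss(0.0, 1.0) * math.sqrt(h)
    eps.append(e)
    h = 0.1 + 0.1 * e * e + 0.8 * h

true_ll = garch11_loglik((0.1, 0.1, 0.8), eps)
grid = [(a0, a1, b1) for a0 in (0.05, 0.1, 0.2) for a1 in (0.05, 0.1, 0.2)
        for b1 in (0.6, 0.7, 0.8, 0.9)]
best = max(grid, key=lambda p: garch11_loglik(p, eps))
print(true_ll, best)
```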

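Denote $s_t(\theta) \equiv \partial l_t(\theta)/\partial\theta$ as the scores. As $\varepsilon_t = z_t\sqrt{h_t}$, the score can be decomposed as $s_t(\theta) = (\partial l_t(\theta)/\partial\xi', \partial l_t(\theta)/\partial\zeta')'$, where
$$\frac{\partial l_t(\theta)}{\partial\xi} = \frac{\varepsilon_t}{h_t}\frac{\partial G(x_t;\xi)}{\partial\xi} + \frac{1}{2h_t}\frac{\partial h_t}{\partial\xi}\left(\frac{\varepsilon_t^2}{h_t} - 1\right),$$
$$\frac{\partial l_t(\theta)}{\partial\zeta} = \frac{1}{2h_t}\frac{\partial h_t}{\partial\zeta}\left(\frac{\varepsilon_t^2}{h_t} - 1\right).$$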
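If the conditional distribution $f(\cdot)$ is correctly specified, the resulting estimates are consistent and asymptotically normal, that is,
$$\sqrt{T}(\hat\theta_{ML} - \theta_0) \xrightarrow{d} N(0, A_0^{-1}),$$
where $A_0^{-1}$ is the inverse of the information matrix evaluated at the true parameter vector $\theta_0$,
$$A_0 = -\frac{1}{T}\sum_{t=1}^{T}E\left[\frac{\partial^2 l_t(\theta_0)}{\partial\theta\,\partial\theta'}\right] = -\frac{1}{T}\sum_{t=1}^{T}E[H_t(\theta_0)].$$
The matrix of second-order partial derivatives of the log likelihood with respect to the parameters, $H_t(\theta) \equiv \partial^2 l_t(\theta)/\partial\theta\,\partial\theta'$, is called the Hessian matrix. The matrix $A_0$ can be consistently estimated by its sample analogue
$$A_T(\hat\theta_{ML}) = -\frac{1}{T}\sum_{t=1}^{T}\frac{\partial^2 l_t(\hat\theta_{ML})}{\partial\theta\,\partial\theta'}.$$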
5.1 Quasi Maximum Likelihood Estimator
As argued previously, conditional normality of $\varepsilon_t$ is often not a very realistic assumption for high-frequency financial time series, as the resulting model fails to capture the kurtosis in the data. Instead, one sometimes assumes that $z_t$ follows a Student-t distribution or some other distribution, say a stable distribution. The parameters in the GARCH model can then be estimated by maximizing the log likelihood corresponding to the particular distribution. As one can never be sure that the specified distribution of $z_t$ is the correct one, an alternative approach is to ignore the problem and base the likelihood on the normal distribution. This method is usually referred to as quasi maximum likelihood estimation (QMLE). In general, the resulting estimates are still consistent and asymptotically normal. However, the asymptotic variance-covariance matrix of the QMLE has to be adjusted to $A_0^{-1}B_0A_0^{-1}$, where $B_0$ is the expected value of the outer product of the gradient,
$$B_0 = \frac{1}{T}\sum_{t=1}^{T}E\left[\frac{\partial l_t(\theta_0)}{\partial\theta}\frac{\partial l_t(\theta_0)}{\partial\theta'}\right] = \frac{1}{T}\sum_{t=1}^{T}E[s_t(\theta_0)s_t(\theta_0)'].$$
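The asymptotic covariance matrix can be estimated consistently by using the sample analogues of both $A_0$, given above, and $B_0$, given by
$$B_T(\hat\theta_{QMLE}) = \frac{1}{T}\sum_{t=1}^{T}\frac{\partial l_t(\hat\theta_{QMLE})}{\partial\theta}\frac{\partial l_t(\hat\theta_{QMLE})}{\partial\theta'} = \frac{1}{T}\sum_{t=1}^{T}s_t(\hat\theta_{QMLE})s_t(\hat\theta_{QMLE})'.$$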
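The effect of the sandwich correction $A_T^{-1}B_TA_T^{-1}$ is easiest to see in the simplest special case, a constant-variance model ($\alpha_1 = \beta_1 = 0$) with a single parameter $\sigma^2$. This is a deliberately simplified illustration, not a GARCH implementation: with Gaussian quasi-likelihood the naive variance is $2\sigma^4$, while the sandwich equals $\mathrm{var}(\varepsilon_t^2) = (\kappa - 1)\sigma^4$. The Laplace data (kurtosis 6) are an assumption chosen to make the difference visible.

```python
import random


def qmle_variance_demo(eps):
    """Naive (A^{-1}) vs sandwich (A^{-1} B A^{-1}) asymptotic variance of the
    Gaussian QMLE of sigma^2 in the constant-variance model eps_t ~ (0, sigma^2)."""
    T = len(eps)
    s2 = sum(e * e for e in eps) / T  # QMLE of sigma^2 (mean assumed known and zero)
    # score s_t = -1/(2 s2) + e^2/(2 s2^2); Hessian H_t = 1/(2 s2^2) - e^2/s2^3
    scores = [-0.5 / s2 + e * e / (2 * s2 ** 2) for e in eps]
    hess = [0.5 / s2 ** 2 - e * e / s2 ** 3 for e in eps]
    A = -sum(hess) / T                  # sample analogue A_T
    B = sum(s * s for s in scores) / T  # outer-product-of-gradient B_T
    return 1.0 / A, B / A ** 2


rng = random.Random(5)
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(50000)]
naive, sandwich = qmle_variance_demo(laplace)
print(naive, sandwich, sandwich / naive)  # ratio near (kappa - 1) / 2 = 2.5
```

With normal data the two estimates coincide asymptotically, which is why the adjustment matters only when $z_t$ is non-normal.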
5.2 Robust Estimation
Several approaches to handle outliers in GARCH models have been investigated. Sakata and White (1998) consider outlier-robust estimation for GARCH models, using the technique mentioned previously. Hotta and Tsay (1998) derive test statistics to detect outliers in a GARCH model, distinguishing between outliers which do and which do not affect the conditional volatility. Franses and Ghijsels (1999) apply the outlier detection method of Chen and Liu (1993) to GARCH models. For simplicity, we consider a GARCH(1,1) model
$$\varepsilon_t = z_t\sqrt{h_t}, \qquad h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1},$$
where $\alpha_0 > 0$, $\alpha_1 > 0$, $\beta_1 > 0$ and $\alpha_1 + \beta_1 < 1$, so that the model is covariance stationary.
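As the GARCH(1,1) model can be represented as an ARMA(1,1) model for $\varepsilon_t^2$,
$$\varepsilon_t^2 = \alpha_0 + (\alpha_1 + \beta_1)\varepsilon_{t-1}^2 + \nu_t - \beta_1\nu_{t-1}, \qquad (20)$$
where $\nu_t = \varepsilon_t^2 - h_t$. In addition, (20) can be rewritten as
$$[1 - (\alpha_1 + \beta_1)L]\varepsilon_t^2 = \alpha_0 + (1 - \beta_1 L)\nu_t$$
and then
$$\nu_t = -\frac{\alpha_0}{1 - \beta_1 L} + \frac{1 - (\alpha_1 + \beta_1)L}{1 - \beta_1 L}\varepsilon_t^2,$$
where $L$ is the lag operator. Define the lag polynomial $\pi(L)$ as
$$\begin{aligned}
\pi(L) &= \frac{1 - (\alpha_1 + \beta_1)L}{1 - \beta_1 L} \\
&= (1 - \beta_1 L)^{-1}[1 - (\alpha_1 + \beta_1)L] \\
&= (1 + \beta_1 L + \beta_1^2 L^2 + \beta_1^3 L^3 + \cdots)[1 - (\alpha_1 + \beta_1)L] \\
&= 1 - \alpha_1 L - \alpha_1\beta_1 L^2 - \alpha_1\beta_1^2 L^3 - \cdots \\
&= \sum_{k=0}^{\infty}\pi_k L^k,
\end{aligned}$$
so that $\pi_0 = 1$ and $\pi_k = -\alpha_1\beta_1^{k-1}$ for $k \ge 1$.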
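Suppose that instead of the true series $\varepsilon_t$ one observes the series $e_t$ defined by
$$e_t^2 = \varepsilon_t^2 + \omega I[t = \tau],$$
where $I[t = \tau]$ is the indicator function defined as $I[t = \tau] = 1$ if $t = \tau$ and zero otherwise, and $\omega$ is a nonzero constant indicating the magnitude of the outlier. Applying the GARCH(1,1) model to the observed series $e_t^2$, it is straightforward to show that the corresponding residuals $u_t$ are given by
$$\begin{aligned}
u_t &= -\frac{\alpha_0}{1 - \beta_1 L} + \frac{1 - (\alpha_1 + \beta_1)L}{1 - \beta_1 L}e_t^2 \\
&= -\frac{\alpha_0}{1 - \beta_1 L} + \pi(L)e_t^2 \\
&= -\frac{\alpha_0}{1 - \beta_1 L} + \pi(L)(\varepsilon_t^2 + \omega I[t = \tau]) \\
&= -\frac{\alpha_0}{1 - \beta_1 L} + \pi(L)\varepsilon_t^2 + \omega\pi(L)I[t = \tau] \\
&= \nu_t + \omega\pi(L)I[t = \tau].
\end{aligned}$$
The last line can be interpreted as a regression for $u_t$, that is,
$$u_t = \omega x_t + \nu_t,$$
with $x_t = 0$ for $t < \tau$, $x_\tau = 1$, and $x_{\tau+k} = \pi_k$ for $k = 1, 2, \ldots$.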
The magnitude of the outlier at time $t = \tau$ can then be estimated as
$$\hat\omega(\tau) = \left(\sum_{t=1}^{T}x_t^2\right)^{-1}\left(\sum_{t=1}^{T}x_t u_t\right).$$
For fixed $\tau$, the $t$-statistic of $\hat\omega(\tau)$, denoted $t_{\hat\omega}(\tau)$, has an asymptotic standard normal distribution. In practice, the timing of possible outliers is of course unknown. In that case, an intuitively plausible test statistic is the maximum of the absolute values of the $t$-statistics over the entire sample, that is,
$$t_{\max}(\hat\omega) = \max_{1 \le \tau \le T}|t_{\hat\omega}(\tau)|.$$
The distribution of $t_{\max}(\hat\omega)$ is nonstandard and deserves further investigation.
The outlier detection method for GARCH(1,1) models then consists of the following steps:
1. Estimate a GARCH(1,1) model for the observed series $e_t$ and obtain estimates of the conditional variance $\hat h_t$ and of $\hat\nu_t = e_t^2 - \hat h_t$.
2. Obtain estimates $\hat\omega(\tau)$ for all possible $\tau = 1, \ldots, T$ and compute the test statistic $t_{\max}(\hat\omega)$. If the value of $t_{\max}(\hat\omega)$ exceeds the pre-specified critical value $C$, an outlier is detected at the observation for which the $t$-statistic of $\hat\omega(\tau)$ is maximized in absolute value, say $\hat\tau$.
3. Replace $e_{\hat\tau}^2$ with $\tilde e_{\hat\tau}^2 = e_{\hat\tau}^2 - \hat\omega(\hat\tau)$ and define the outlier-corrected series $\tilde e_t$ as $\tilde e_t = e_t$ for $t \neq \hat\tau$ and $\tilde e_{\hat\tau} = \mathrm{sgn}(e_{\hat\tau})\sqrt{\tilde e_{\hat\tau}^2}$.
4. Return to step 1 to estimate a GARCH(1,1) model for the series $\tilde e_t$.
The iterations terminate when the $t_{\max}(\hat\omega)$ statistic no longer exceeds the critical value $C$.
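Steps 2 and 3 can be sketched as below, under simplifying assumptions that are ours: the GARCH(1,1) parameters are treated as known instead of being estimated in step 1, the residuals $u_t$ come from the ARMA(1,1) recursion started at $u_0 = 0$, the scale of $\nu_t$ is estimated by the root mean square of $u_t$, and the regressor $x_t$ is truncated after 200 lags because $\pi_k = -\alpha_1\beta_1^{k-1}$ decays geometrically. No critical value $C$ is supplied, since its distribution is nonstandard.

```python
import math
import random


def arma_residuals(e2, a0, a1, b1):
    """u_t = e_t^2 - a0 - (a1 + b1) e_{t-1}^2 + b1 u_{t-1}, started at u_0 = 0."""
    u = [0.0]
    for t in range(1, len(e2)):
        u.append(e2[t] - a0 - (a1 + b1) * e2[t - 1] + b1 * u[-1])
    return u


def outlier_scan(u, a1, b1, max_lag=200):
    """Return (t_max, tau_hat, omega_hat) with x_tau = 1 and x_{tau+k} = pi_k."""
    T = len(u)
    sigma = math.sqrt(sum(v * v for v in u) / T)  # scale of nu_t
    pis = [1.0] + [-a1 * b1 ** (k - 1) for k in range(1, max_lag)]  # pi_0, pi_1, ...
    best = (0.0, None, None)
    for tau in range(1, T):
        xs = pis[:T - tau]
        sxx = sum(x * x for x in xs)
        omega = sum(x * v for x, v in zip(xs, u[tau:])) / sxx  # OLS estimate of omega
        t_stat = omega * math.sqrt(sxx) / sigma
        if abs(t_stat) > abs(best[0]):
            best = (t_stat, tau, omega)
    return best


random.seed(7)
a0, a1, b1 = 0.1, 0.1, 0.8
eps, h = [], 1.0
for _ in range(1000):
    e = random.gauss(0.0, 1.0) * math.sqrt(h)
    eps.append(e)
    h = a0 + a1 * e * e + b1 * h
e2 = [e * e for e in eps]
e2[500] += 300.0  # additive outlier of size omega = 300 in e_t^2 at tau = 500
t_max, tau_hat, omega_hat = outlier_scan(arma_residuals(e2, a0, a1, b1), a1, b1)
corrected = max(e2[tau_hat] - omega_hat, 0.0)  # step 3, clipped at zero
e_tilde = math.copysign(math.sqrt(corrected), eps[tau_hat])  # sgn(e_tau) * sqrt(...)
print(t_max, tau_hat, omega_hat, e_tilde)
```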
6 Diagnostic Checking: Testing Properties of Standardized Residuals
One of the assumptions made in GARCH models is that the innovations $z_t = \varepsilon_t h_t^{-1/2}$ are independent and identically distributed. Hence, if the model is correctly specified, the standardized residuals $\hat z_t = \hat\varepsilon_t\hat h_t^{-1/2}$ should possess the classical properties of well-behaved regression errors, such as constant variance and lack of serial correlation. Of particular interest is whether the standardized residuals still show signs of conditional heteroskedasticity. Lundbergh and Terasvirta (1998) suggest an LM test for remaining ARCH(m) in $\hat z_t$ computed as $TR^2$, where $R^2$ is from the auxiliary regression
$$\hat z_t^2 = \delta_0 + \delta_1\hat z_{t-1}^2 + \cdots + \delta_m\hat z_{t-m}^2 + x_t'\gamma + u_t,$$
where the vector $x_t$ consists of the partial derivatives of the conditional variance $h_t$ with respect to the parameters in the original GARCH model, scaled by $h_t^{-1}$ and evaluated under the null hypothesis, that is, $x_t \equiv h_t^{-1}\partial h_t/\partial\theta$. For example, in the case of a GARCH(1,1) model
$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1},$$
it follows that
$$\frac{\partial h_t}{\partial\theta} = (1, \varepsilon_{t-1}^2, h_{t-1})' + \beta_1\frac{\partial h_{t-1}}{\partial\theta}.$$
Then
$$x_t = \left(\frac{\sum_{i=1}^{t-1}\beta_1^{i-1}}{h_t}, \frac{\sum_{i=1}^{t-1}\beta_1^{i-1}\varepsilon_{t-i}^2}{h_t}, \frac{\sum_{i=1}^{t-1}\beta_1^{i-1}h_{t-i}}{h_t}\right)'.$$
The test statistic for the null $H_0: \delta_1 = \cdots = \delta_m = 0$ is asymptotically $\chi^2$ distributed with $m$ degrees of freedom.
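The derivative recursion makes $x_t$ cheap to compute alongside $h_t$. The sketch below plugs the true parameters in place of estimates and fixes $h_1$ at the sample variance, so that its derivative is zero; both are simplifying assumptions. Because $\alpha_0x_{1t} + \alpha_1x_{2t}$ is almost exactly 1, keeping all three components next to the constant makes the regression nearly singular, so this sketch drops $x_{1t}$, which leaves the spanned space essentially unchanged (an implementation choice of ours).

```python
import math
import random


def ols_r2(y, X):
    """R^2 of an OLS regression of y on the columns of X (X must contain a constant)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for i in range(c + 1, k):
            f = A[i][c] / A[c][c]
            for j in range(c, k):
                A[i][j] -= f * A[c][j]
            b[i] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / sst


def garch11_path(eps, a0, a1, b1, h1):
    """Return h_t and dh_t/dtheta, theta = (a0, a1, b1), with h_1 fixed at h1."""
    h, dh = h1, [0.0, 0.0, 0.0]  # h1 does not depend on theta
    hs, grads = [], []
    for e in eps:
        hs.append(h)
        grads.append(dh)
        # dh_{t+1}/dtheta = (1, e_t^2, h_t) + b1 * dh_t/dtheta
        dh = [1.0 + b1 * dh[0], e * e + b1 * dh[1], h + b1 * dh[2]]
        h = a0 + a1 * e * e + b1 * h
    return hs, grads


def remaining_arch_lm(eps, a0, a1, b1, m):
    """T R^2 of z_t^2 on 1, z_{t-1}^2, ..., z_{t-m}^2 and x_t (x_1t dropped)."""
    h1 = sum(e * e for e in eps) / len(eps)
    hs, grads = garch11_path(eps, a0, a1, b1, h1)
    z2 = [e * e / h for e, h in zip(eps, hs)]
    y, X = [], []
    for t in range(m, len(eps)):
        x_t = [g / hs[t] for g in grads[t]]  # x_t = h_t^{-1} dh_t/dtheta
        X.append([1.0] + [z2[t - i] for i in range(1, m + 1)] + x_t[1:])
        y.append(z2[t])
    return len(y) * ols_r2(y, X)  # asymptotically chi^2(m) under H0


random.seed(8)
eps, h = [], 1.0
for _ in range(2000):  # correctly specified GARCH(1,1) data
    e = random.gauss(0.0, 1.0) * math.sqrt(h)
    eps.append(e)
    h = 0.1 + 0.1 * e * e + 0.8 * h
print(remaining_arch_lm(eps, 0.1, 0.1, 0.8, 2))  # compare with the chi^2(2) 5% value 5.99
```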
6.1 Testing for Higher-order GARCH
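The LM statistic to test a GARCH(p, q) specification against either a GARCH(p + r, q) or a GARCH(p, q + s) alternative has been discussed by Bollerslev (1986). The test statistic is $TR^2$, where $R^2$ is from the regression
$$\hat z_t^2 = \delta_0 + \delta_1\hat\varepsilon_{t-q-1}^2 + \cdots + \delta_r\hat\varepsilon_{t-q-r}^2 + x_t'\gamma + u_t,$$
or
$$\hat z_t^2 = \delta_0 + \delta_1\hat z_{t-p-1}^2 + \cdots + \delta_s\hat z_{t-p-s}^2 + x_t'\gamma + u_t.$$
The statistics are asymptotically $\chi^2$ distributed with $r$ and $s$ degrees of freedom, respectively.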
6.2 Testing Parameter Constancy
The conditional variance of a financial time series measures market risk and plays a very important role in pricing derivative securities. Given that structural changes caused by various economic shocks occur all the time, it is necessary to detect any structural shift in the conditional variance in order to correctly forecast volatility and price derivative securities. Models with conditional heteroskedasticity, such as the ARCH family, are very popular and heavily used in examining financial time series. Therefore, detecting parameter instability in the conditional variance equation is an important issue, because failing to detect, and thereby ignoring, instability of the conditional variance parameters can easily lead to spurious inference.

As noted in Lamoureux and Lastrapes (1990), the high degree of persistence in GARCH models might be due to misspecification of the variance equation. By introducing dummy variables for deterministic shifts in the unconditional variance, they find that the duration of volatility shocks is conspicuously reduced. Thus, neglecting instability of the parameters in the conditional variance equation can lead to the spurious appearance of extremely strong persistence in variance. A similar point is raised by Diebold (1986), who conjectures that the apparent existence of a unit root, as in the IGARCH class of models, may be the result of shifts in regimes which affect the level of the unconditional variance. Incorrect volatility forecasts and spurious inference may therefore result from neglected structural change in the conditional variance equation. Hence, it is important to develop a test for instability in the conditional variance equation, since specifying the conditional variance equation correctly helps to obtain more precise inference and forecasts. However, relevant studies on the conditional variance function are very limited. A few exceptions are, among others, Chu (1995), Lin and Chang (1997), and Lin and Yang (1999). The LM test statistics of Chu (1995) and Lin and Chang (1997) are mainly designed for detecting parameter shifts in the conditional variance function, while the test statistics derived by Lin and Yang (1999) are capable of detecting changes in the parameters characterizing the mean and the conditional variance as well as changes in higher moments. Here, we introduce Chu's Lagrange multiplier test for parameter shifts in the conditional Gaussian GARCH model.
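Consider a standard Gaussian GARCH(p, q) model:
$$\varepsilon_t \mid \mathcal{F}_{t-1} \sim N(0, h_t), \qquad h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{i=1}^{p}\beta_i h_{t-i},$$
where $p \ge 0$, $q > 0$, $\alpha_0 > 0$, $\alpha_i \ge 0$, $\beta_i \ge 0$ for all $i$. Let $[\lambda T]$ denote the integer part of $\lambda T$. The quasi log-likelihood function under the alternative of a one-time parameter shift in the variance equation is
$$L_T(r_1, r_2, \lambda) = T^{-1}\sum_{t=1}^{[\lambda T]}\ln f_{1t} + T^{-1}\sum_{t=[\lambda T]+1}^{T}\ln f_{2t}, \qquad (21)$$
where $\lambda$ is the break-point parameter in the interval $(0, 1)$,
$$\ln f_{1t} = \frac{1}{2}\left[-\ln h_{1t} - h_{1t}^{-1}\varepsilon_t^2\right], \qquad h_{1t} = r_1'Z_t,$$
$$\ln f_{2t} = \frac{1}{2}\left[-\ln h_{2t} - h_{2t}^{-1}\varepsilon_t^2\right], \qquad h_{2t} = r_2'Z_t,$$
$$r_1 = (\alpha_{10}, \alpha_{11}, \ldots, \alpha_{1q}, \beta_{11}, \ldots, \beta_{1p})',$$
$$r_2 = (\alpha_{20}, \alpha_{21}, \ldots, \alpha_{2q}, \beta_{21}, \ldots, \beta_{2p})',$$
$$Z_t = (1, \varepsilon_{t-1}^2, \ldots, \varepsilon_{t-q}^2, h_{t-1}, \ldots, h_{t-p})'.$$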
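Let $k = p + q + 1$ be the number of parameters in the variance equation and $r = (r_1', r_2')'$. Based on (21), the score is given by
$$(\partial L_T/\partial r)_{2k\times 1} = [(\partial L_T/\partial r_1)', (\partial L_T/\partial r_2)']',$$
where
$$\begin{aligned}
\partial L_T/\partial r_1 &= T^{-1}\sum_{t=1}^{[\lambda T]}\frac{1}{2}\left[-h_{1t}^{-1}(\partial h_{1t}/\partial r_1) + (\varepsilon_t^2/h_{1t}^2)(\partial h_{1t}/\partial r_1)\right] \\
&= T^{-1}\sum_{t=1}^{[\lambda T]}\frac{1}{2}(\partial h_{1t}/\partial r_1)\left[\varepsilon_t^2h_{1t}^{-2} - h_{1t}^{-1}\right] \\
&= T^{-1}\sum_{t=1}^{[\lambda T]}\frac{1}{2}h_{1t}^{-1}(\partial h_{1t}/\partial r_1)\left[\varepsilon_t^2/h_{1t} - 1\right]
\end{aligned}$$
and
$$\begin{aligned}
\partial L_T/\partial r_2 &= T^{-1}\sum_{t=[\lambda T]+1}^{T}\frac{1}{2}\left[-h_{2t}^{-1}(\partial h_{2t}/\partial r_2) + (\varepsilon_t^2/h_{2t}^2)(\partial h_{2t}/\partial r_2)\right] \\
&= T^{-1}\sum_{t=[\lambda T]+1}^{T}\frac{1}{2}(\partial h_{2t}/\partial r_2)\left[\varepsilon_t^2h_{2t}^{-2} - h_{2t}^{-1}\right] \\
&= T^{-1}\sum_{t=[\lambda T]+1}^{T}\frac{1}{2}h_{2t}^{-1}(\partial h_{2t}/\partial r_2)\left[\varepsilon_t^2/h_{2t} - 1\right].
\end{aligned}$$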
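We are interested in testing the null hypothesis of no structural change, $H_0: r_1 = r_2$, versus $H_1: r_1 \neq r_2$, where $r_1$ is the $k \times 1$ pre-break parameter vector in the variance equation and $r_2$ is the post-break parameter vector of the same order $k$. The LM test requires estimating the null model only. The null hypothesis under test is $H_0: r_1 = r_2 = r_0$. We use the restricted quasi-maximum likelihood estimator (QMLE) $\tilde r_T$ to construct the prototype LM statistic. By Aitchison and Silvey (1958) we have
$$\begin{aligned}
LM_T(\lambda) &= T\left[\partial L_T(\tilde r_T, \lambda)/\partial r\right]'A_0^{-1}(\lambda)R'\left[RC_0(\lambda)R'\right]^{-1}RA_0^{-1}(\lambda)\left[\partial L_T(\tilde r_T, \lambda)/\partial r\right] \\
&= T\tilde\mu_T'\left[RA_0^{-1}(\lambda)R'\right]\left[RC_0(\lambda)R'\right]^{-1}\left[RA_0^{-1}(\lambda)R'\right]\tilde\mu_T, \qquad (22)
\end{aligned}$$
where $\tilde r_T$ is the quasi-maximum likelihood estimator of $r$ under the null model, $R = [I_k \;\; -I_k]$, $\tilde\mu_T$ is a $k \times 1$ Lagrange multiplier, $A_0(\lambda) \equiv \lim_{T\to\infty} -E[\partial^2 L_T(r_0, r_0, \lambda)/\partial r\,\partial r']$, $B_0(\lambda) = \lim_{T\to\infty}\mathrm{var}[T^{1/2}\partial L_T(r_0, r_0, \lambda)/\partial r]$ and $C_0(\lambda) = A_0^{-1}B_0A_0^{-1}$.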
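Given that the process
$$\left\{v_t \equiv h_{0t}^{-1}(\partial h_{0t}/\partial r)(\varepsilon_t^2/h_{0t} - 1)\right\}, \qquad h_{0t} \equiv r_0'Z_t,$$
obeys the functional central limit theorem, that is,
$$T^{-1/2}V_0^{-1/2}S_{[\lambda T]} \Rightarrow W_k(\lambda),$$
where $V_0 = \lim_{T\to\infty}T^{-1}E[(\sum_{t=1}^{T}v_t)(\sum_{t=1}^{T}v_t)']$, $S_{[\lambda T]} = \sum_{t=1}^{[\lambda T]}v_t$, and $W_k(\lambda)$ is a standard $k$-dimensional Wiener process, the LM statistic for detecting a one-time parameter shift satisfies
$$LM_T^* = \max_{\lambda}LM_T(\lambda) \Rightarrow \sup_{\lambda}B_k^2(\lambda),$$
where
$$B_k^2(\lambda) \equiv \frac{[W_k(\lambda) - \lambda W_k(1)]'[W_k(\lambda) - \lambda W_k(1)]}{\lambda(1 - \lambda)}$$
is known as the square of a tied-down Bessel process of order $k$. It follows from the continuous mapping theorem (Billingsley, 1968) that $\sup_{\lambda\in\Lambda}LM_T(\lambda) \Rightarrow \sup_{\lambda\in\Lambda}B_k^2(\lambda)$, where $\Lambda$ is a prescribed subset of $[0, 1]$, bounded away from 0 and 1, that prevents the limiting distribution from diverging.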
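Critical values for $\sup_{\lambda\in\Lambda}B_k^2(\lambda)$ are not standard but can be approximated by simulation. The sketch below approximates $W_k$ by scaled Gaussian random walks on a grid of $n = 400$ steps and takes $\Lambda = [0.15, 0.85]$; the grid size, number of replications and trimming are illustrative assumptions, and discretization makes the result slightly conservative relative to the continuous-time limit.

```python
import random


def sup_bessel_bridge(k, n=400, lo=0.15, hi=0.85, rng=random):
    """One draw of sup_{lam in [lo, hi]} B_k^2(lam), with W_k approximated by
    scaled Gaussian random walks on a grid of n steps."""
    walks = []
    for _ in range(k):
        s, path = 0.0, [0.0]
        for _ in range(n):
            s += rng.gauss(0.0, 1.0)
            path.append(s)
        walks.append(path)
    best = 0.0
    for i in range(int(lo * n), int(hi * n) + 1):
        lam = i / n
        # [W(lam) - lam W(1)]'[W(lam) - lam W(1)] with W(i/n) approximated by S_i / sqrt(n)
        num = sum((w[i] - lam * w[n]) ** 2 for w in walks) / n
        best = max(best, num / (lam * (1.0 - lam)))
    return best


def simulated_critical_value(k, level=0.05, reps=1500, seed=0):
    """Empirical (1 - level) quantile of the simulated sup statistic."""
    rng = random.Random(seed)
    draws = sorted(sup_bessel_bridge(k, rng=rng) for _ in range(reps))
    return draws[int((1.0 - level) * reps) - 1]


print(simulated_critical_value(1))  # approximate 5% critical value for k = 1
```

For Chu's test, $k = p + q + 1$ (for example $k = 3$ for GARCH(1,1)), so the same routine would be called with that $k$.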
7 Forecasting Volatility
The presence of time-varying volatility has some pronounced consequences for out-of-sample
forecasting. For simplicity, we consider the following model:
$$r_t = \phi_1 r_{t-1} + \varepsilon_t, \qquad \varepsilon_t = z_t\sqrt{h_t}, \qquad h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1}.$$
The general case of ARMA(k, l)-GARCH(p, q) models is discussed in Baillie and Bollerslev (1992).
7.1 Forecasting the Conditional Mean in the Presence of Conditional Heteroskedasticity
Let $r_{t+h|t}$ denote the $h$-step-ahead forecast of $r_t$ that minimizes the squared prediction error (SPE)
$$\mathrm{SPE}(h) \equiv E[e_{t+h|t}^2] = E[(r_{t+h} - r_{t+h|t})^2],$$
where $e_{t+h|t}$ is the $h$-step-ahead forecast error. Baillie and Bollerslev (1992) show that the forecast that minimizes the SPE is the same irrespective of whether the shocks $\varepsilon_t$ are conditionally homoscedastic or conditionally heteroskedastic. Thus, the optimal $h$-step-ahead forecast of $r_{t+h}$ is its conditional expectation at time $t$, that is,
$$r_{t+h|t} = E[r_{t+h} \mid \mathcal{F}_t].$$
For the AR(1) conditional mean, the optimal 1-step-ahead forecast is given by $r_{t+1|t} = \phi_1 r_t$, and the optimal $h$-step-ahead forecast is
$$r_{t+h|t} = \phi_1 r_{t+h-1|t} = \phi_1[\phi_1 r_{t+h-2|t}] = \cdots = \phi_1^h r_t.$$
For the $h$-step-ahead prediction error, it follows that
$$\begin{aligned}
e_{t+h|t} = r_{t+h} - r_{t+h|t} &= \phi_1 r_{t+h-1} + \varepsilon_{t+h} - \phi_1^h r_t \\
&= \phi_1^2 r_{t+h-2} + \phi_1\varepsilon_{t+h-1} + \varepsilon_{t+h} - \phi_1^h r_t \\
&= \cdots \\
&= \phi_1^h r_t + \sum_{i=1}^{h}\phi_1^{h-i}\varepsilon_{t+i} - \phi_1^h r_t \\
&= \sum_{i=1}^{h}\phi_1^{h-i}\varepsilon_{t+i}.
\end{aligned}$$
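The conditional SPE of $e_{t+h|t}$ is given by
$$\begin{aligned}
E[e_{t+h|t}^2 \mid \mathcal{F}_t] &= E\left[\left(\sum_{i=1}^{h}\phi_1^{h-i}\varepsilon_{t+i}\right)^2 \Bigm| \mathcal{F}_t\right] \\
&= \sum_{i=1}^{h}\phi_1^{2(h-i)}E[\varepsilon_{t+i}^2 \mid \mathcal{F}_t] \\
&= \sum_{i=1}^{h}\phi_1^{2(h-i)}E[h_{t+i} \mid \mathcal{F}_t],
\end{aligned}$$
where the cross terms vanish because the $\varepsilon_t$ are serially uncorrelated. In the case of homoscedastic errors, the conditional SPE for the optimal $h$-step-ahead forecast is constant, as $E[h_{t+i} \mid \mathcal{F}_t]$ is constant and equal to the unconditional variance of $\varepsilon_t$, $\sigma^2$. However, in the case of heteroskedastic errors, the conditional SPE varies over time. Since
$$E[e_{t+h|t}^2 \mid \mathcal{F}_t] = \sum_{i=1}^{h}\phi_1^{2(h-i)}\sigma^2 + \sum_{i=1}^{h}\phi_1^{2(h-i)}\left[E(h_{t+i} \mid \mathcal{F}_t) - \sigma^2\right],$$
the conditional SPE in the case of heteroskedastic errors can be either larger or smaller than in the case of homoscedastic errors.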
Recall that in the homoscedastic case, the SPE converges to the unconditional variance of $r_t$ as the forecast horizon $h$ increases, that is,
$$\lim_{h\to\infty}E[e_{t+h|t}^2 \mid \mathcal{F}_t] = \lim_{h\to\infty}\sum_{i=1}^{h}\phi_1^{2(h-i)}\sigma^2 = \frac{\sigma^2}{1 - \phi_1^2} = \sigma_r^2.$$
Moreover, the convergence is monotonic, in the sense that the $h$-step-ahead SPE is always smaller than the unconditional variance $\sigma_r^2$, while the $h$-step-ahead SPE is larger than the $(h-1)$-step SPE for all finite horizons $h$. The convergence of the SPE to the unconditional variance of the time series also holds in the present case of heteroskedastic errors, although it need not be monotonic.
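The conditional SPE for the AR(1)-GARCH(1,1) model combines the weights $\phi_1^{2(h-i)}$ with the variance forecasts $E[h_{t+i} \mid \mathcal{F}_t]$. The sketch below uses the GARCH(1,1) forecast recursion $h_{t+i|t} = \alpha_0 + (\alpha_1 + \beta_1)h_{t+i-1|t}$ (derived in the next subsection); the parameter values and the "calm" and "turbulent" starting variances are illustrative assumptions.

```python
def variance_forecasts(h_next, a0, a1, b1, horizon):
    """E[h_{t+i} | F_t], i = 1..horizon, via h_{t+i|t} = a0 + (a1 + b1) h_{t+i-1|t}."""
    out = [h_next]
    for _ in range(horizon - 1):
        out.append(a0 + (a1 + b1) * out[-1])
    return out


def conditional_spe(phi1, h_next, a0, a1, b1, horizon):
    """E[e_{t+h|t}^2 | F_t] = sum_{i=1}^{h} phi1^{2(h-i)} E[h_{t+i} | F_t]."""
    hf = variance_forecasts(h_next, a0, a1, b1, horizon)
    return sum(phi1 ** (2 * (horizon - i)) * hf[i - 1] for i in range(1, horizon + 1))


# Here sigma^2 = 0.1 / (1 - 0.9) = 1 and sigma_r^2 = 1 / (1 - 0.25) = 4/3.
for h_next in (0.3, 5.0):  # calm vs turbulent current state
    print([round(conditional_spe(0.5, h_next, 0.1, 0.1, 0.8, h), 3) for h in (1, 2, 5, 50)])
```

Starting from a turbulent state, the SPE begins above $\sigma_r^2 = 4/3$ and declines towards it, which illustrates that the convergence need not be monotonic under heteroskedasticity.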
7.2 Forecasting the Conditional Variance
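In the case of the GARCH(1,1) model, the conditional expectation of $h_{t+s}$, i.e., the optimal $s$-step-ahead forecast of the conditional variance, can be computed recursively from
$$h_{t+s|t} = \alpha_0 + \alpha_1\varepsilon_{t+s-1|t}^2 + \beta_1 h_{t+s-1|t},$$
where $\varepsilon_{t+i|t}^2 = h_{t+i|t}$ for $i > 0$ by definition, while $\varepsilon_{t+i|t}^2 = \varepsilon_{t+i}^2$ and $h_{t+i|t} = h_{t+i}$ for $i \le 0$. Alternatively, by recursive substitution we obtain
$$\begin{aligned}
h_{t+s|t} &= \alpha_0 + \alpha_1\varepsilon_{t+s-1|t}^2 + \beta_1 h_{t+s-1|t} \\
&= \alpha_0 + (\alpha_1 + \beta_1)h_{t+s-1|t} \\
&= \alpha_0 + (\alpha_1 + \beta_1)[\alpha_0 + (\alpha_1 + \beta_1)h_{t+s-2|t}] \\
&= [\alpha_0 + (\alpha_1 + \beta_1)\alpha_0] + (\alpha_1 + \beta_1)^2[\alpha_0 + (\alpha_1 + \beta_1)h_{t+s-3|t}] \\
&= [\alpha_0 + (\alpha_1 + \beta_1)\alpha_0 + (\alpha_1 + \beta_1)^2\alpha_0] + (\alpha_1 + \beta_1)^3[\alpha_0 + (\alpha_1 + \beta_1)h_{t+s-4|t}] \\
&\;\;\vdots \\
&= \alpha_0\sum_{i=0}^{s-2}(\alpha_1 + \beta_1)^i + (\alpha_1 + \beta_1)^{s-1}h_{t+1}.
\end{aligned}$$
Given the formula $\sum_{i=0}^{s-2}\rho^i = (1 - \rho^{s-1})/(1 - \rho)$, the last equation becomes
$$\begin{aligned}
h_{t+s|t} &= \alpha_0\frac{1 - (\alpha_1 + \beta_1)^{s-1}}{1 - \alpha_1 - \beta_1} + (\alpha_1 + \beta_1)^{s-1}h_{t+1} \\
&= \frac{\alpha_0}{1 - \alpha_1 - \beta_1} + (\alpha_1 + \beta_1)^{s-1}\left(h_{t+1} - \frac{\alpha_0}{1 - \alpha_1 - \beta_1}\right) \\
&= \sigma^2 + (\alpha_1 + \beta_1)^{s-1}(h_{t+1} - \sigma^2),
\end{aligned}$$
where $\sigma^2 = \alpha_0/(1 - \alpha_1 - \beta_1)$ is the unconditional variance of $\varepsilon_t$.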
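A quick numerical check that the recursive and closed-form expressions for $h_{t+s|t}$ agree; the parameter values (a highly persistent case with $\alpha_1 + \beta_1 = 0.98$) and $h_{t+1} = 3$ are illustrative assumptions.

```python
import math


def h_forecast_recursive(h_next, a0, a1, b1, s):
    """h_{t+s|t} by iterating h_{t+i|t} = a0 + (a1 + b1) h_{t+i-1|t} from h_{t+1}."""
    h = h_next  # h_{t+1|t} = h_{t+1} is known at time t
    for _ in range(s - 1):
        h = a0 + (a1 + b1) * h
    return h


def h_forecast_closed(h_next, a0, a1, b1, s):
    """h_{t+s|t} = sigma^2 + (a1 + b1)^(s-1) (h_{t+1} - sigma^2)."""
    sigma2 = a0 / (1.0 - a1 - b1)  # unconditional variance
    return sigma2 + (a1 + b1) ** (s - 1) * (h_next - sigma2)


params = (0.05, 0.08, 0.9)
print([round(h_forecast_closed(3.0, *params, s), 4) for s in (1, 10, 100)])
# Horizon at which the deviation from sigma^2 has halved: ln(0.5) / ln(a1 + b1).
print(math.log(0.5) / math.log(params[1] + params[2]))
```

Because the deviation from $\sigma^2$ decays geometrically at rate $\alpha_1 + \beta_1$, the half-life printed above (about 34 periods here) is a convenient summary of volatility persistence.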

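The $s$-step-ahead forecast error is $v_{t+s|t} = h_{t+s} - h_{t+s|t}$. As, in the GARCH(1,1) model, $h_{t+s} = \alpha_0 + \alpha_1\varepsilon_{t+s-1}^2 + \beta_1 h_{t+s-1}$, we obtain
$$\begin{aligned}
v_{t+s|t} &\equiv h_{t+s} - h_{t+s|t} \\
&= [\alpha_0 + \alpha_1\varepsilon_{t+s-1}^2 + \beta_1 h_{t+s-1}] - [\alpha_0 + \alpha_1\varepsilon_{t+s-1|t}^2 + \beta_1 h_{t+s-1|t}] \\
&= \alpha_1(\varepsilon_{t+s-1}^2 - \varepsilon_{t+s-1|t}^2) + \beta_1(h_{t+s-1} - h_{t+s-1|t}) \\
&= \alpha_1(\varepsilon_{t+s-1}^2 - h_{t+s-1} + h_{t+s-1} - h_{t+s-1|t}) + \beta_1(h_{t+s-1} - h_{t+s-1|t}) \\
&= \alpha_1\nu_{t+s-1} + (\alpha_1 + \beta_1)v_{t+s-1|t},
\end{aligned}$$
where we have used the fact that $\varepsilon_{t+i|t}^2 = h_{t+i|t}$ for $i > 0$ and the definition $\nu_t = \varepsilon_t^2 - h_t$. By continued recursive substitution, we have
$$\begin{aligned}
v_{t+s|t} &= \alpha_1\nu_{t+s-1} + (\alpha_1 + \beta_1)v_{t+s-1|t} \\
&= (\alpha_1 + \beta_1)[\alpha_1\nu_{t+s-2} + (\alpha_1 + \beta_1)v_{t+s-2|t}] + \alpha_1\nu_{t+s-1} \\
&= (\alpha_1 + \beta_1)^2[\alpha_1\nu_{t+s-3} + (\alpha_1 + \beta_1)v_{t+s-3|t}] + [\alpha_1\nu_{t+s-1} + (\alpha_1 + \beta_1)\alpha_1\nu_{t+s-2}] \\
&\;\;\vdots \\
&= \sum_{i=1}^{s-1}\alpha_1(\alpha_1 + \beta_1)^{i-1}\nu_{t+s-i}.
\end{aligned}$$
As the $\nu_t$'s are serially uncorrelated and can be written as $\nu_t = h_t(z_t^2 - 1)$, it follows that the conditional SPE of the $s$-step-ahead forecast $h_{t+s|t}$ is given by
$$E[v_{t+s|t}^2 \mid \mathcal{F}_t] = (\kappa - 1)\alpha_1^2\sum_{i=1}^{s-1}(\alpha_1 + \beta_1)^{2(i-1)}E[h_{t+s-i}^2 \mid \mathcal{F}_t],$$
where $\kappa$ is the kurtosis of $z_t$.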
7.3 Forecasting Conditional Volatility for Nonlinear GARCH Models
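For the GJR-GARCH(1,1) model,
$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2[1 - I(\varepsilon_{t-1} > 0)] + \gamma_1\varepsilon_{t-1}^2I(\varepsilon_{t-1} > 0) + \beta_1 h_{t-1}.$$
Assuming that the distribution of $z_t$ is symmetric around 0, the 2-step-ahead forecast of $h_{t+2}$ is given by
$$\begin{aligned}
h_{t+2|t} &= E\{\alpha_0 + \alpha_1\varepsilon_{t+1}^2[1 - I(\varepsilon_{t+1} > 0)] + \gamma_1\varepsilon_{t+1}^2I(\varepsilon_{t+1} > 0) + \beta_1 h_{t+1} \mid \mathcal{F}_t\} \\
&= \alpha_0 + [(\alpha_1 + \gamma_1)/2 + \beta_1]h_{t+1},
\end{aligned}$$
which follows from observing that $\varepsilon_{t+1}^2$ and the indicator function $I(\varepsilon_{t+1} > 0)$ are uncorrelated and $E[I(\varepsilon_{t+1} > 0)] = P(\varepsilon_{t+1} > 0) = 0.5$, and again using $E(\varepsilon_{t+1}^2 \mid \mathcal{F}_t) = h_{t+1}$. In general, $s$-step-ahead forecasts can be computed either recursively as
$$h_{t+s|t} = \alpha_0 + [(\alpha_1 + \gamma_1)/2 + \beta_1]h_{t+s-1|t}, \qquad (23)$$
or directly from
$$h_{t+s|t} = \alpha_0\sum_{i=0}^{s-2}[(\alpha_1 + \gamma_1)/2 + \beta_1]^i + [(\alpha_1 + \gamma_1)/2 + \beta_1]^{s-1}h_{t+1}. \qquad (24)$$
For the LSTGARCH(1,1) (logistic smooth transition GARCH(1,1)) model,
$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2[1 - F(\varepsilon_{t-1})] + \gamma_1\varepsilon_{t-1}^2F(\varepsilon_{t-1}) + \beta_1 h_{t-1},$$
where $F(\varepsilon_{t-1}) = [1 + \exp(-\theta\varepsilon_{t-1})]^{-1}$. Since $\varepsilon_{t+i}^2$ and $F(\varepsilon_{t+i})$ are uncorrelated, and $F(\varepsilon_{t+i})$ is anti-symmetric around the expected value $E(\varepsilon_{t+i}) = 0$, it follows that $E[F(\varepsilon_{t+i})] = F[E(\varepsilon_{t+i})] = 0.5$. In general, a function $G(x)$ is said to be anti-symmetric around $a$ if $G(a + x) - G(a) = -[G(a - x) - G(a)]$ for all $x$. If, furthermore, $x$ is symmetrically distributed with mean $a$, then $E[G(x)] = G[E(x)] = G(a)$. Hence, the $s$-step-ahead forecasts of the LSTGARCH(1,1) model are the same as in (23) or (24).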
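For the VS-GARCH(1,1) (volatility-switching GARCH(1,1)) model,
$$h_t = (\alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1})[1 - I(\varepsilon_{t-1} > 0)] + (\gamma_0 + \gamma_1\varepsilon_{t-1}^2 + \delta_1 h_{t-1})I(\varepsilon_{t-1} > 0), \qquad (25)$$
and the ANST-GARCH(1,1) (asymmetric nonlinear smooth transition GARCH(1,1)) model,
$$h_t = (\alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1})[1 - F(\varepsilon_{t-1})] + (\gamma_0 + \gamma_1\varepsilon_{t-1}^2 + \delta_1 h_{t-1})F(\varepsilon_{t-1}), \qquad (26)$$
the $s$-step-ahead forecasts can be computed either recursively from
$$h_{t+s|t} = (\alpha_0 + \gamma_0)/2 + [(\alpha_1 + \beta_1)/2 + (\gamma_1 + \delta_1)/2]h_{t+s-1|t},$$
or directly from
$$h_{t+s|t} = \frac{\alpha_0 + \gamma_0}{2}\sum_{i=0}^{s-2}[(\alpha_1 + \beta_1)/2 + (\gamma_1 + \delta_1)/2]^i + [(\alpha_1 + \beta_1)/2 + (\gamma_1 + \delta_1)/2]^{s-1}h_{t+1}.$$
For the QGARCH(1,1) (quadratic GARCH(1,1)) model,
$$h_t = \alpha_0 + \gamma_1\varepsilon_{t-1} + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1},$$
the asymmetric term $\gamma_1\varepsilon_{t-1}$ does not affect the forecasts of the conditional variance, since the conditional expectation of $\varepsilon_{t+i}$ with $i > 0$ is zero by assumption. Hence, point forecasts of the conditional variance can be obtained exactly as for the linear GARCH(1,1) model, either recursively as
$$h_{t+s|t} = \alpha_0 + (\alpha_1 + \beta_1)h_{t+s-1|t},$$
or directly from
$$h_{t+s|t} = \alpha_0\sum_{i=0}^{s-2}(\alpha_1 + \beta_1)^i + (\alpha_1 + \beta_1)^{s-1}h_{t+1}.$$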
For the ESTGARCH (exponential smooth transition GARCH) model, the exponential function $F(\varepsilon_{t+i}) = 1 - \exp(-\theta\varepsilon_{t+i}^2)$ is correlated with $\varepsilon_{t+i}^2$, and it is not the case that $E[F(\varepsilon_{t+i})] = F[E(\varepsilon_{t+i})]$. Therefore, it is not possible to derive a recursive or direct formula for the $s$-step-ahead forecast $h_{t+s|t}$ in this case. Instead, forecasts of the future conditional variance have to be obtained by means of simulation. As to the Markov-switching GARCH models, analytical expressions for multiple-step-ahead forecasts of the conditional variance can be obtained by exploiting the properties of the Markov process; see, e.g., Hamilton and Lin (1996), Dueker (1997), and Klaassen (1999).
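The contrast between the closed-form GJR recursion (23) and the simulation-only ESTGARCH case can be illustrated with a Monte Carlo forecast that works for any variance update $h' = \text{update}(\varepsilon, h)$. This sketch assumes standard normal $z_t$, 20,000 simulated paths, and hypothetical parameter values, including a transition parameter $\theta = 1$ for the ESTGARCH model.

```python
import math
import random


def gjr_forecast(h_next, a0, a1, g1, b1, s):
    """Recursion (23): h_{t+s|t} = a0 + [(a1 + g1)/2 + b1] h_{t+s-1|t}, symmetric z_t."""
    rho, h = (a1 + g1) / 2.0 + b1, h_next
    for _ in range(s - 1):
        h = a0 + rho * h
    return h


def simulate_forecast(update, h_next, s, n_paths=20000, seed=0):
    """Monte Carlo estimate of E[h_{t+s} | F_t] for a variance update h' = update(e, h)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        h = h_next
        for _ in range(s - 1):
            e = rng.gauss(0.0, 1.0) * math.sqrt(h)  # z_t assumed N(0, 1)
            h = update(e, h)
        total += h
    return total / n_paths


A0, A1, G1, B1, THETA = 0.1, 0.15, 0.05, 0.8, 1.0  # illustrative values


def gjr_update(e, h):
    # alpha_1 applies when e <= 0, gamma_1 when e > 0
    return A0 + (A1 if e <= 0 else G1) * e * e + B1 * h


def estgarch_update(e, h):
    f = 1.0 - math.exp(-THETA * e * e)  # exponential transition F(e)
    return A0 + A1 * e * e * (1.0 - f) + G1 * e * e * f + B1 * h


print(gjr_forecast(2.0, A0, A1, G1, B1, 5), simulate_forecast(gjr_update, 2.0, 5))
print(simulate_forecast(estgarch_update, 2.0, 5))  # no closed form available
```

For the GJR model the simulated and analytical forecasts should agree up to Monte Carlo error; for ESTGARCH only the simulated value is available.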
7.4 Evaluating Forecasts of Conditional Volatility
As discussed previously, it is quite difficult to select a suitable nonlinear GARCH model on the basis of specification tests only. The out-of-sample forecasting ability of various GARCH models is an alternative approach to judging the adequacy of different models. Suppose a GARCH model has been estimated using a sample of $T$ observations, whereas observations at $t = T + 1, \ldots, T + m + s - 1$ are held back for evaluation of $s$-step-ahead forecasts of the conditional variance.

Most studies use statistical criteria such as the mean squared prediction error (MSPE), which for a set of $m$ $s$-step-ahead forecasts is computed as
$$\mathrm{MSPE} = \frac{1}{m}\sum_{j=0}^{m-1}(\hat h_{T+s+j|T+j} - h_{T+s+j})^2,$$
or the $R^2$ from the regression
$$h_{T+s+j} = a + b\,\hat h_{T+s+j|T+j} + e_{T+s+j}, \qquad j = 0, \ldots, m-1.$$
To make these forecast evaluation criteria operational, the unobservable $h_{T+s+j}$ is usually replaced by the squared shock $\varepsilon_{T+s+j}^2$.
A common finding from forecast competitions is that all GARCH models provide seem-
ingly poor volatility forecasts and explain only very little of the variability of asset returns,
in the sense that the MSPE is very large while the R2 is very small, typically below 0.1. In
addition, the forecasts from GARCH appears to be biased, as it commonly found that a 6= 0.
Anderson and Bollerslev (1998) and Christodoulakis and Satchell (1998) demonstrated that
this poor forecasting performance is caused by the fact that the unobervable true volatil-
ity hT +s+j is approximated with the squared shock 2T +s+j . As shown by Anderson and
Bollerslev (1998), for a GARCH(1,1) model with a finite unconditional fourth moment the
population R2 for s = 1 and hT +s+J replaced with 2T +s+j is equal to
21
R2 = .
1 12 21 1
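As the condition for a finite unconditional fourth moment in the GARCH(1,1) model is $\kappa\alpha_1^2 + \beta_1^2 + 2\alpha_1\beta_1 < 1$, where $\kappa = E(z_t^4)$ is the kurtosis of the standardized shock $z_t$, the denominator of $R^2$ exceeds $\kappa\alpha_1^2$, and it follows that the population $R^2$ is bounded from above by $1/\kappa$. When $z_t$ is normally distributed, $\kappa = 3$ and the $R^2$ cannot be larger than 1/3, while the upper bound is even smaller if, for example, $z_t$ is assumed to be Student-t distributed.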
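The bound can be checked numerically. The sketch below, with hypothetical function names, evaluates the population $R^2$ formula together with the fourth-moment condition for a few $(\alpha_1, \beta_1)$ pairs, taking $\kappa = 3$ for normal $z_t$.

```python
def population_r2(alpha1, beta1):
    """Population R^2 = alpha1^2 / (1 - beta1^2 - 2 alpha1 beta1) for s = 1."""
    denom = 1.0 - beta1 ** 2 - 2.0 * alpha1 * beta1
    if denom <= 0.0:
        raise ValueError("the denominator must be positive")
    return alpha1 ** 2 / denom


def has_finite_fourth_moment(alpha1, beta1, kappa=3.0):
    """Condition kappa*alpha1^2 + beta1^2 + 2*alpha1*beta1 < 1, with kappa = E(z_t^4)."""
    return kappa * alpha1 ** 2 + beta1 ** 2 + 2.0 * alpha1 * beta1 < 1.0


for alpha1, beta1 in [(0.05, 0.90), (0.10, 0.85), (0.30, 0.60)]:
    if has_finite_fourth_moment(alpha1, beta1):
        print(alpha1, beta1, round(population_r2(alpha1, beta1), 4))
```

For the typical pair $(0.05, 0.90)$ the population $R^2$ is only 0.025, and even $(0.30, 0.60)$, which lies close to the boundary of the fourth-moment region, gives about 0.32, still below 1/3.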
Christodoulakis and Satchell (1998) explain the occurrence of apparent bias in GARCH volatility forecasts by noting that

$$\ln(\varepsilon^2_{T+s+j}) = \ln(h_{T+s+j}) + \ln(z^2_{T+s+j}),$$

or

$$\ln(\varepsilon^2_{T+s+j}) - \ln(h_{T+s+j|T+j}) = \left(\ln(h_{T+s+j}) - \ln(h_{T+s+j|T+j})\right) + \ln(z^2_{T+s+j}).$$
As $\ln(x) \approx x - 1$ for $x$ close to 1, the left-hand side of the above equation, $\ln(\varepsilon^2_{T+s+j}/h_{T+s+j|T+j})$, is approximately equal to the relative observed bias $(\varepsilon^2_{T+s+j} - h_{T+s+j|T+j})/h_{T+s+j|T+j}$, that is, the observed bias scaled by the forecast. If the GARCH forecasts are unbiased, the first term on the right-hand side has expectation zero. Hence, the expected observed bias is equal to $E[\ln(z^2_{T+s+j})]$, which in the case of normally distributed $z_t$ is equal to $-1.27$.
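The value $-1.27$ can be verified directly. For standard normal $z_t$, $z_t^2$ is $\chi^2_1$ and $E[\ln(z_t^2)] = \psi(1/2) + \ln 2 = -\gamma - \ln 2$, where $\psi$ is the digamma function and $\gamma$ is Euler's constant. The sketch below compares this closed form with a seeded Monte Carlo average; the function names, sample size and seed are arbitrary choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def expected_log_z2():
    """Closed form of E[ln z^2] for z ~ N(0, 1): digamma(1/2) + ln 2 = -gamma - ln 2."""
    return -EULER_GAMMA - math.log(2.0)


def simulate_mean_log_z2(n=200_000, seed=0):
    """Monte Carlo average of ln z^2 over n standard normal draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        total += math.log(z * z)
    return total / n


print(round(expected_log_z2(), 4))  # -1.2704
print(simulate_mean_log_z2())       # close to the closed form
```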
Andersen and Bollerslev (1998) suggest that a (partial) solution to the above-mentioned problems might be to estimate the unobserved volatility with data sampled more frequently than the time series of interest. For example, if $r_t$ is a time series of weekly returns, the corresponding daily returns, if available, might be used to obtain a more accurate measure of the weekly volatility.
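A minimal sketch of this idea, assuming demeaned daily returns and non-overlapping weeks of five trading days (both are assumptions, not part of the original text): each week's volatility is measured by the sum of its squared daily returns, which can then replace $\varepsilon^2_{T+s+j}$ in the MSPE and in the evaluation regression. The function name is hypothetical.

```python
def weekly_realized_variance(daily_returns, days_per_week=5):
    """Sum of squared daily returns within consecutive, non-overlapping weeks.

    Trailing days that do not fill a complete week are dropped.
    """
    n_weeks = len(daily_returns) // days_per_week
    rv = []
    for k in range(n_weeks):
        week = daily_returns[k * days_per_week:(k + 1) * days_per_week]
        rv.append(sum(r * r for r in week))
    return rv


# Hypothetical daily returns covering two full weeks plus one extra day.
daily = [0.01, -0.02, 0.0, 0.01, 0.02, 0.1, 0.0, 0.0, 0.0, 0.0, 0.03]
print(weekly_realized_variance(daily))  # two values; the eleventh day is ignored
```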
8 Multivariate Conditional Variance Models
The multivariate GARCH(p,q) regression model can be written as

$$y_t = \Pi z_t + \varepsilon_t, \qquad \varepsilon_t \mid \Omega_{t-1} \sim N(0, H_t),$$

where $\varepsilon_t$ and $y_t$ are $n \times 1$ vectors, $n$ is the number of random variables, $z_t$ is a $k \times 1$ vector of regressors, $k$ is the number of independent variables, $\Pi$ is an $n \times k$ coefficient matrix, $H_t$ is an $n \times n$ matrix, and $\Omega_{t-1}$ is the information set available at time $t-1$. Three specifications of the conditional variance-covariance matrix $H_t$ are presented as follows.
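First, allowing each element of $H_t$ to depend on $q$ lagged values of the squares and cross-products of $\varepsilon_t$, on $p$ lagged values of the elements of $H_t$, and on a $J \times 1$ vector of weakly exogenous variables $x_t$, the vec representation is

$$h_t = \mathrm{vech}(H_t), \qquad \tilde{x}_t = \mathrm{vech}(x_t x_t'), \qquad \eta_t = \mathrm{vech}(\varepsilon_t\varepsilon_t'),$$

$$h_t = C_0 + C_1\tilde{x}_t + \sum_{i=1}^{q} A_i\,\eta_{t-i} + \sum_{i=1}^{p} G_i\,h_{t-i}, \qquad (27)$$

where $\mathrm{vech}(\cdot)$ stacks the elements on and below the main diagonal, so that the redundant elements of the symmetric matrices are dropped.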
If we take no account of exogenous influences, the conditional variance equation becomes

$$h_t = C_0 + \sum_{i=1}^{q} A_i\,\mathrm{vech}(\varepsilon_{t-i}\varepsilon_{t-i}') + \sum_{i=1}^{p} G_i\,h_{t-i}. \qquad (28)$$
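To make the vech bookkeeping concrete, the following sketch performs one step of (28) with $p = q = 1$ using plain Python lists. The helper names are hypothetical; `vech` stacks the lower triangle column by column, so for $n = 2$ the ordering is $(h_{11,t}, h_{12,t}, h_{22,t})'$, the same ordering used in the bivariate example (29).

```python
def vech(m):
    """Stack the on- and below-diagonal elements of a square matrix column by column."""
    n = len(m)
    return [m[i][j] for j in range(n) for i in range(j, n)]


def unvech(v, n):
    """Rebuild the symmetric n x n matrix whose vech is v."""
    m = [[0.0] * n for _ in range(n)]
    k = 0
    for j in range(n):
        for i in range(j, n):
            m[i][j] = m[j][i] = v[k]
            k += 1
    return m


def matvec(a, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def vec_garch11_step(c0, a1, g1, eps_prev, h_prev):
    """One step of (28) with p = q = 1: h_t = C0 + A1 vech(e e') + G1 vech(H_{t-1})."""
    n = len(eps_prev)
    eta = vech([[ei * ej for ej in eps_prev] for ei in eps_prev])
    a_eta = matvec(a1, eta)
    g_h = matvec(g1, vech(h_prev))
    return unvech([c + x + y for c, x, y in zip(c0, a_eta, g_h)], n)
```

Returning the full symmetric matrix from `unvech` keeps the next step's input in the same form as $H_{t-1}$.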
For example, for $n = 2$ and $p = q = 1$, (28) becomes
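$$\begin{aligned}
h_t = \begin{pmatrix} h_{11,t}\\ h_{12,t}\\ h_{22,t}\end{pmatrix}
&= \begin{pmatrix} C_{01}\\ C_{02}\\ C_{03}\end{pmatrix}
+ \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix}
\begin{pmatrix} \varepsilon^2_{1,t-1}\\ \varepsilon_{1,t-1}\varepsilon_{2,t-1}\\ \varepsilon^2_{2,t-1}\end{pmatrix} \\
&\quad + \begin{pmatrix} g_{11} & g_{12} & g_{13}\\ g_{21} & g_{22} & g_{23}\\ g_{31} & g_{32} & g_{33}\end{pmatrix}
\begin{pmatrix} h_{11,t-1}\\ h_{12,t-1}\\ h_{22,t-1}\end{pmatrix}. \qquad (29)
\end{aligned}$$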
In the above formulation, we omit redundant variables such as $h_{21,t}$ and the coefficients on $\varepsilon_{2,t-1}\varepsilon_{1,t-1}$ and $h_{21,t-1}$. After eliminating the redundant terms, each of the $A_i$ and $G_i$ matrices contains $(n(n+1)/2)^2$ unique parameters.
Second, if the matrices $A_i$ and $G_i$ in the vec model are assumed to be diagonal, a diagonal representation of $H_t$ is obtained. For example, in the bivariate case, the diagonal GARCH(1,1) model is specified as
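$$\begin{aligned}
h_t = \begin{pmatrix} h_{11,t}\\ h_{12,t}\\ h_{22,t}\end{pmatrix}
&= \begin{pmatrix} C_{01}\\ C_{02}\\ C_{03}\end{pmatrix}
+ \begin{pmatrix} a_{11} & 0 & 0\\ 0 & a_{22} & 0\\ 0 & 0 & a_{33}\end{pmatrix}
\begin{pmatrix} \varepsilon^2_{1,t-1}\\ \varepsilon_{1,t-1}\varepsilon_{2,t-1}\\ \varepsilon^2_{2,t-1}\end{pmatrix} \\
&\quad + \begin{pmatrix} g_{11} & 0 & 0\\ 0 & g_{22} & 0\\ 0 & 0 & g_{33}\end{pmatrix}
\begin{pmatrix} h_{11,t-1}\\ h_{12,t-1}\\ h_{22,t-1}\end{pmatrix}, \qquad (30)
\end{aligned}$$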
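or

$$\begin{aligned}
h_{11,t} &= C_{01} + a_{11}\varepsilon^2_{1,t-1} + g_{11}h_{11,t-1},\\
h_{12,t} &= C_{02} + a_{22}\varepsilon_{1,t-1}\varepsilon_{2,t-1} + g_{22}h_{12,t-1},\\
h_{22,t} &= C_{03} + a_{33}\varepsilon^2_{2,t-1} + g_{33}h_{22,t-1}.
\end{aligned}$$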
This means that each conditional variance (or covariance) depends only on its own lagged squared residuals (or cross-products) and its own lagged values. In the bivariate GARCH(1,1) model, there are only three parameters in each of the $A_1$ and $G_1$ matrices, and in the general $n$-variate diagonal model there are $n(n+1)/2$ free parameters in each matrix. No matter which parameterization of $H_t$ is used, $H_t$ is required to be positive definite for all values of $\varepsilon_t$ and $x_t$ in the sample space.
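The following sketch illustrates why positive definiteness is a genuine concern for the diagonal model: it applies the three scalar equations of (30) once and checks the resulting $2 \times 2$ matrix with Sylvester's criterion. The function names and parameter values are hypothetical; with a large covariance intercept $C_{02}$, the implied $H_t$ is not positive definite even though both variances are positive.

```python
def diagonal_garch11_step(c0, a, g, eps_prev, h_prev):
    """Apply the scalar equations of the bivariate diagonal GARCH(1,1) model once.

    c0, a, g hold (C01, C02, C03), (a11, a22, a33), (g11, g22, g33);
    h_prev holds (h11, h12, h22) at time t-1.
    """
    e1, e2 = eps_prev
    h11, h12, h22 = h_prev
    return (c0[0] + a[0] * e1 * e1 + g[0] * h11,
            c0[1] + a[1] * e1 * e2 + g[1] * h12,
            c0[2] + a[2] * e2 * e2 + g[2] * h22)


def is_positive_definite_2x2(h11, h12, h22):
    """Sylvester's criterion for the symmetric matrix [[h11, h12], [h12, h22]]."""
    return h11 > 0.0 and h11 * h22 - h12 * h12 > 0.0


a, g = (0.1, 0.1, 0.1), (0.8, 0.8, 0.8)
eps_prev, h_prev = (1.0, 1.0), (1.0, 0.9, 1.0)
ok = diagonal_garch11_step((0.1, 0.09, 0.1), a, g, eps_prev, h_prev)
bad = diagonal_garch11_step((0.1, 0.5, 0.1), a, g, eps_prev, h_prev)
print(is_positive_definite_2x2(*ok), is_positive_definite_2x2(*bad))  # True False
```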
However, the restrictions that guarantee the positive definiteness of $H_t$ are difficult both to check and to impose at the estimation stage. Baba, Engle, Kraft and Kroner (1990) suggest the following parameterization, known as the BEKK representation, which is almost guaranteed to be positive definite:
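$$\begin{aligned}
H_t &= C_0'C_0 + \sum_{k=1}^{K} C_{1k}'x_t x_t'C_{1k} + \sum_{k=1}^{K}\sum_{i=1}^{q} A_{ik}'\varepsilon_{t-i}\varepsilon_{t-i}'A_{ik} + \sum_{k=1}^{K}\sum_{i=1}^{p} G_{ik}'H_{t-i}G_{ik}\\
&= C_0^* + \sum_{k=1}^{K} C_{1k}'x_t x_t'C_{1k} + \sum_{k=1}^{K}\sum_{i=1}^{q} A_{ik}'\varepsilon_{t-i}\varepsilon_{t-i}'A_{ik} + \sum_{k=1}^{K}\sum_{i=1}^{p} G_{ik}'H_{t-i}G_{ik}, \qquad (31)
\end{aligned}$$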
where $C_0$, $A_{ik}$ and $G_{ik}$ are $n \times n$ parameter matrices with $C_0$ triangular, $C_{1k}$ are $J \times n$ parameter matrices, and the summation limit $K$ determines the generality of the process. Because $C_0$ is triangular, $C_0^* = C_0'C_0 = [c_{ij}]$ is an $n \times n$ symmetric, positive semidefinite matrix. It is clear that (31) will be positive definite under very weak conditions. In addition, this representation is sufficiently general, because it includes all positive definite diagonal representations and nearly all positive definite vec representations. For example, for $K = 1$ and no exogenous influences, the conditional variance of the GARCH(p,q) model is
$$H_t = C_0^* + \sum_{i=1}^{q} A_{i1}'\varepsilon_{t-i}\varepsilon_{t-i}'A_{i1} + \sum_{i=1}^{p} G_{i1}'H_{t-i}G_{i1}.$$
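A minimal sketch of this recursion with $p = q = 1$, written for general $n$ with plain lists (the function names are hypothetical). Each of the three terms has the form $M'SM$ with $S$ symmetric, which is why the sum is symmetric, and it is positive definite whenever $H_{t-1}$ is positive definite and $G_{11}$ is nonsingular.

```python
def transpose(m):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Matrix product of two list-of-lists matrices."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def bekk11_step(c0, a, g, eps_prev, h_prev):
    """H_t = C0'C0 + A' e e' A + G' H_{t-1} G for BEKK with K = 1, p = q = 1."""
    n = len(eps_prev)
    e = [[x] for x in eps_prev]                          # n x 1 column vector
    const = matmul(transpose(c0), c0)                    # C0* = C0'C0
    arch = matmul(matmul(transpose(a), matmul(e, transpose(e))), a)
    garch = matmul(matmul(transpose(g), h_prev), g)
    return [[const[i][j] + arch[i][j] + garch[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical bivariate parameters (C0 upper triangular).
C0 = [[0.3, 0.1], [0.0, 0.2]]
A = [[0.3, 0.05], [0.02, 0.25]]
G = [[0.9, 0.01], [0.02, 0.93]]
print(bekk11_step(C0, A, G, [0.5, -1.0], [[1.0, 0.3], [0.3, 1.5]]))
```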
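Moreover, for the bivariate GARCH(1,1) model with $K = 1$ and no exogenous influences, the conditional variance-covariance matrix $H_t$ is

$$\begin{aligned}
H_t &= C_0^* + A_{11}'\varepsilon_{t-1}\varepsilon_{t-1}'A_{11} + G_{11}'H_{t-1}G_{11}\\
&= \begin{pmatrix} c_{11} & c_{12}\\ c_{21} & c_{22}\end{pmatrix}
+ \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}'
\begin{pmatrix} \varepsilon^2_{1,t-1} & \varepsilon_{1,t-1}\varepsilon_{2,t-1}\\ \varepsilon_{2,t-1}\varepsilon_{1,t-1} & \varepsilon^2_{2,t-1}\end{pmatrix}
\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\\
&\quad + \begin{pmatrix} g_{11} & g_{12}\\ g_{21} & g_{22}\end{pmatrix}' H_{t-1}
\begin{pmatrix} g_{11} & g_{12}\\ g_{21} & g_{22}\end{pmatrix}.
\end{aligned}$$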
Comparing this with the vec representation and excluding constants, each coefficient matrix contains $(n(n+1)/2)^2$ parameters in the vec representation but only $n^2$ parameters in the BEKK representation.
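The difference grows quickly with $n$. The sketch below counts the free parameters of the GARCH(p,q) versions of the three specifications, including the constant terms ($n(n+1)/2$ in each case), and prints them for the bivariate, four-variate and eight-variate GARCH(1,1) cases mentioned in Section 8.2; the function names are hypothetical.

```python
def n_vech(n):
    """Number of distinct elements of a symmetric n x n matrix."""
    return n * (n + 1) // 2


def vec_params(n, p=1, q=1):
    """(p+q) full N x N matrices A_i, G_i plus the N x 1 constant C0, N = n(n+1)/2."""
    return (p + q) * n_vech(n) ** 2 + n_vech(n)


def diagonal_params(n, p=1, q=1):
    """(p+q) diagonal N x N matrices plus the constant C0."""
    return (p + q) * n_vech(n) + n_vech(n)


def bekk_params(n, p=1, q=1):
    """K = 1: (p+q) full n x n matrices plus the triangular C0."""
    return (p + q) * n * n + n_vech(n)


for n in (2, 4, 8):
    print(n, vec_params(n), diagonal_params(n), bekk_params(n))
```

For $n = 8$ the vec model already has 2628 parameters, against 164 for BEKK and 108 for the diagonal model.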
8.1 Impulse Response Function for Multivariate GARCH Models
As mentioned in Lin (1997), the impulse response function for conditional volatility is defined as the impact of a small perturbation of the $i$th innovation on the future predicted volatility. Because the classes of multivariate GARCH models can be written as functions of $\varepsilon_t\varepsilon_t'$, the response of future volatility to a one-unit shock in $\varepsilon_t$ depends on the impact filtered through $\varepsilon_t\varepsilon_t'$. Hence, the impulse response function is the derivative of the conditional variance with respect to $\mathrm{dg}(\varepsilon_t\varepsilon_t')$, the $n \times 1$ vector containing the diagonal elements of $\varepsilon_t\varepsilon_t'$. For example, the impulse response function for the vec representation of $H_t$ is defined as

$$R_{s,n} = \frac{\partial\,\mathrm{vech}(H_{t+s|t})}{\partial\,\mathrm{dg}(\varepsilon_t\varepsilon_t')'},$$

where $R_{s,n}$ is an $N \times n$ matrix and $N = n(n+1)/2$. As mentioned previously, there are various possible formulations when $H_t$ is parameterized as a function of past information. The various forms of $H_t$ and their definitions of the impulse response function are summarized as follows.
In the vec representation of the multivariate GARCH(p,q) model,

$$\mathrm{vech}(H_t) = C_0 + \sum_{i=1}^{q} A_i\,\mathrm{vech}(\varepsilon_{t-i}\varepsilon_{t-i}') + \sum_{i=1}^{p} G_i\,\mathrm{vech}(H_{t-i}),$$

where $A_i$ and $G_i$ are $n(n+1)/2 \times n(n+1)/2$ parameter matrices and $C_0$ is an $n(n+1)/2 \times 1$ parameter vector. The impulse response function becomes

$$R_{s,n} = \frac{\partial\,\mathrm{vech}(H_{t+s|t})}{\partial\,\mathrm{dg}(\varepsilon_t\varepsilon_t')'},$$

and the number of parameters equals $[n(n+1)]^2(p+q)/4 + n(n+1)/2$. In the BEKK representation of the generalized multivariate GARCH(p,q) model,

$$H_t = C_0'C_0 + \sum_{i=1}^{q} A_i'\varepsilon_{t-i}\varepsilon_{t-i}'A_i + \sum_{i=1}^{p} G_i'H_{t-i}G_i,$$

where $A_i$ and $G_i$ are $n \times n$ parameter matrices for all $i$, and $C_0$ is an $n \times n$ triangular parameter matrix. The corresponding impulse response function is

$$R_{s,n} = \frac{\partial\,\mathrm{vech}(H_{t+s|t})}{\partial\,\mathrm{dg}(\varepsilon_t\varepsilon_t')'},$$

and the number of parameters is $(p+q)n^2 + n(n+1)/2$. In the constant correlation multivariate GARCH(p,q) model,
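$$h_t = C + \sum_{i=1}^{q} A_i u_{t-i} + \sum_{i=1}^{p} G_i h_{t-i}, \qquad h_{ij,t} = \rho_{ij}\left(h_{ii,t}h_{jj,t}\right)^{1/2},$$

where $H_t = [h_{ij,t}]$, $h_t = (h_{11,t}, \ldots, h_{nn,t})'$, $u_t = \mathrm{dg}(\varepsilon_t\varepsilon_t')$, $A_i$ and $G_i$ are $n \times n$ parameter matrices for all $i$, and $C$ is an $n \times 1$ parameter vector. The corresponding impulse response function is

$$R_{s,n} = \frac{\partial h_{t+s|t}}{\partial u_t'},$$

and the number of parameters is $(p+q)n^2 + n(n+1)/2$.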
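For the vec GARCH(1,1) case the derivative has a closed form. Assuming the forecasts are formed with $E_t[\mathrm{vech}(\varepsilon_{t+k}\varepsilon_{t+k}')] = \mathrm{vech}(H_{t+k|t})$ for $k \ge 1$, we have $\mathrm{vech}(H_{t+s|t}) = C_0 + (A_1 + G_1)\,\mathrm{vech}(H_{t+s-1|t})$ for $s \ge 2$, so that $\partial\,\mathrm{vech}(H_{t+s|t})/\partial\,\mathrm{vech}(\varepsilon_t\varepsilon_t')' = (A_1 + G_1)^{s-1}A_1$, and $R_{s,n}$ keeps the $n$ columns corresponding to the diagonal elements. The sketch below, with hypothetical names, implements this. Since $E_t(u_{t+k}) = h_{t+k|t}$, the same recursion without the column selection gives $\partial h_{t+s|t}/\partial u_t'$ in the constant correlation model.

```python
def matmul(x, y):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def diag_positions(n):
    """Indices of h_11, h_22, ..., h_nn inside vech(H) (column-wise lower triangle)."""
    return [j * n - j * (j - 1) // 2 for j in range(n)]


def vec_irf(a1, g1, s, n):
    """R_{s,n} = (A1 + G1)^(s-1) A1, restricted to the columns of dg(e_t e_t')."""
    if s < 1:
        raise ValueError("s must be at least 1")
    size = len(a1)
    persistence = [[a1[i][j] + g1[i][j] for j in range(size)] for i in range(size)]
    m = [row[:] for row in a1]
    for _ in range(s - 1):
        m = matmul(persistence, m)
    cols = diag_positions(n)
    return [[row[c] for c in cols] for row in m]
```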
8.2 Testing Spillover Effects
Suppose the multivariate GARCH(1,1) regression model without exogenous variables is estimated. The model is

$$y_t = c + \varepsilon_t, \qquad \varepsilon_t \mid \Omega_{t-1} \sim N(0, H_t),$$

where $y_t$, $c$ and $\varepsilon_t$ are $n \times 1$ vectors, $n$ is the number of random variables, $H_t$ is an $n \times n$ matrix, and $\Omega_{t-1}$ is the information set available at time $t-1$. In this section, the bivariate, four-variate and eight-variate GARCH(1,1) models are considered.
Let the stock returns in Taiwan be $\varepsilon_{1,t}$ and the stock returns in New York be $\varepsilon_{2,t}$, with $\varepsilon_t = (\varepsilon_{1,t}, \varepsilon_{2,t})'$ and $H_t = [h_{ij,t}]$ for $i, j = 1, 2$, where $H_t$ is a symmetric matrix. The BEKK bivariate GARCH(1,1) model for the Taiwan and New York returns is
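$$\varepsilon_t \mid \Omega_{t-1} \sim N(0, H_t),$$

$$\begin{aligned}
h_{11,t} &= c_{11} + a_{11}^2\varepsilon^2_{1,t-1} + 2a_{11}a_{21}\varepsilon_{1,t-1}\varepsilon_{2,t-1} + a_{21}^2\varepsilon^2_{2,t-1}\\
&\quad + g_{11}^2h_{11,t-1} + 2g_{11}g_{21}h_{12,t-1} + g_{21}^2h_{22,t-1},\\
h_{12,t} = h_{21,t} &= c_{12} + a_{11}a_{12}\varepsilon^2_{1,t-1} + (a_{21}a_{12} + a_{11}a_{22})\varepsilon_{1,t-1}\varepsilon_{2,t-1} + a_{21}a_{22}\varepsilon^2_{2,t-1}\\
&\quad + g_{11}g_{12}h_{11,t-1} + g_{21}g_{12}h_{21,t-1} + g_{11}g_{22}h_{12,t-1} + g_{21}g_{22}h_{22,t-1},\\
h_{22,t} &= c_{22} + a_{12}^2\varepsilon^2_{1,t-1} + 2a_{12}a_{22}\varepsilon_{1,t-1}\varepsilon_{2,t-1} + a_{22}^2\varepsilon^2_{2,t-1}\\
&\quad + g_{12}^2h_{11,t-1} + 2g_{12}g_{22}h_{12,t-1} + g_{22}^2h_{22,t-1},
\end{aligned}$$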
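where $\Omega_{t-1}$ is the information set available at time $t-1$. The above model is estimated by the maximum likelihood method, and the likelihood ratio test statistic is used to test for volatility spillover effects between Taiwan and New York. The null hypothesis of no volatility spillovers from New York to Taiwan is

$$H_0: a_{21} = g_{21} = 0,$$

and the null hypothesis of no volatility spillovers from Taiwan to New York is

$$H_0: a_{12} = g_{12} = 0.$$

Under the first null, $h_{11,t}$ reduces to $c_{11} + a_{11}^2\varepsilon^2_{1,t-1} + g_{11}^2h_{11,t-1}$, so the Taiwan conditional variance no longer depends on lagged New York shocks or volatility; under the second null, $h_{22,t}$ similarly reduces to $c_{22} + a_{22}^2\varepsilon^2_{2,t-1} + g_{22}^2h_{22,t-1}$.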
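Each null hypothesis imposes two zero restrictions, so under standard regularity conditions the likelihood ratio statistic $LR = 2(\ell_U - \ell_R)$, where $\ell_U$ and $\ell_R$ are the maximized log-likelihoods of the unrestricted and restricted BEKK models, is asymptotically $\chi^2(2)$. A minimal sketch of this final step follows; the function name is hypothetical, the log-likelihood values are made up for illustration, and the estimation itself is not shown. For two degrees of freedom the $\chi^2$ survival function is exactly $\exp(-x/2)$, so no statistical library is needed.

```python
import math


def lr_test_two_restrictions(loglik_unrestricted, loglik_restricted):
    """Return (LR, p-value) for H0 imposing two zero restrictions, using chi^2(2)."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    if lr < 0.0:
        raise ValueError("the unrestricted log-likelihood should not be smaller")
    p_value = math.exp(-lr / 2.0)  # P(chi^2_2 > lr)
    return lr, p_value


# Illustrative (made-up) values: reject H0 at the 5% level if p < 0.05,
# equivalently if LR > -2 ln(0.05), about 5.99.
lr, p = lr_test_two_restrictions(-1250.0, -1253.0)
print(lr, round(p, 4))  # 6.0 0.0498
```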
References
Andersen, T. and T. Bollerslev (1998), Answering the skeptics: yes, standard volatility models do provide accurate forecasts, International Economic Review, 39, 885-906.
Anderson, H.M., K. Nam and F. Vahid (1999), Asymmetric nonlinear smooth transition GARCH models, in P. Rothman (ed.), Nonlinear Time Series Analysis of Economic and Financial Data, Boston: Kluwer, 191-207.
Baillie, R.T. and T. Bollerslev (1992), Prediction in dynamic models with time-dependent conditional variances, Journal of Econometrics, 52, 91-113.
Bera, A.K. and M.L. Higgins (1997), ARCH and bilinearity as competing models for nonlinear dependence, Journal of Business and Economic Statistics, 15, 43-50.
Bera, A.K., M.L. Higgins and S. Lee (1992), Interaction between autocorrelation and conditional heteroskedasticity: a random coefficient approach, Journal of Business and Economic Statistics, 10, 133-142.
Black, F. (1976), The pricing of commodity contracts, Journal of Financial Economics, 3, 653-665.
Bollerslev, T. (1986), Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics, 31, 307-327.
Bollerslev, T. (1987), A conditional heteroskedastic time series model for speculative prices and rates of return, Review of Economics and Statistics, 69, 542-547.
Cai, J. (1994), A Markov model of switching-regime ARCH, Journal of Business and Economic Statistics, 12, 309-316.
Chen, C. and L.-M. Liu (1993), Joint estimation of model parameters and outlier effects in time series, Journal of the American Statistical Association, 88, 284-297.
Christodoulakis, G.A. and S.E. Satchell (1998), Hashing GARCH: a re-assessment of volatility forecasting performance, in J. Knight and S.E. Satchell (eds.), Forecasting Volatility in the Financial Markets, New York: Butterworth-Heinemann.
Chu, C.-S.J. (1995), Detecting parameter shifts in GARCH models, Econometric Reviews, 14, 241-266.
Dueker, M.J. (1997), Markov switching in GARCH processes and mean reverting stock-market volatility, Journal of Business and Economic Statistics, 15, 26-34.
Engle, R.F. (1982), Autoregressive conditional heteroskedasticity with estimates of the variance of US inflation, Econometrica, 50, 987-1007.
Engle, R.F. and T. Bollerslev (1986), Modelling the persistence of conditional variances, Econometric Reviews, 5, 1-50 (with discussion).
Engle, R.F. and V.K. Ng (1993), Measuring and testing the impact of news on volatility, Journal of Finance, 48, 1749-1778.
Engle, R.F., D.F. Hendry and D. Trumble (1985), Small-sample properties of ARCH estimators and tests, Canadian Journal of Economics, 18, 66-93.
Engle, R.F., D.M. Lilien and R.P. Robins (1987), Estimating time varying risk premia in the term structure: the ARCH-M model, Econometrica, 55, 391-407.
Fornari, F. and A. Mele (1996), Modeling the changing asymmetry of conditional variances, Economics Letters, 50, 197-203.
Fornari, F. and A. Mele (1996), Sign- and volatility-switching ARCH models: theory and applications to international stock markets, Journal of Applied Econometrics, 12, 49-65.
Franses, P.H. and H. Ghijsels (1999), Additive outliers, GARCH and forecasting volatility, International Journal of Forecasting, 15, 1-9.
Franses, P.H. and D. van Dijk (2000), Nonlinear Time Series Models in Empirical Finance, Cambridge University Press.
Giles, D.E.A., J.A. Giles and J.W. Wong (1993), Testing for ARCH-GARCH errors in a misspecified regression, Computational Statistics, 8, 109-126.
Glosten, L.R., R. Jagannathan and D.E. Runkle (1993), On the relation between the expected value and the volatility of the nominal excess return on stocks, Journal of Finance, 48, 1779-1801.
Gonzalez-Rivera, G. (1998), Smooth transition GARCH, Studies in Nonlinear Dynamics and Econometrics, 3, 61-78.
Hagerud, G.E. (1997), A new non-linear GARCH model, PhD thesis, IFE, Stockholm School of Economics.
Hamilton, J.D. and G. Lin (1996), Stock market volatility and the business cycle, Journal of Applied Econometrics, 11, 573-593.
Hamilton, J.D. and R. Susmel (1994), Autoregressive conditional heteroskedasticity and changes in regimes, Journal of Econometrics, 64, 307-333.
Hotta, L.K. and R.S. Tsay (1998), Outliers in GARCH processes, Graduate School of Business, University of Chicago, unpublished manuscript.
Klaassen, F. (1999), Improving GARCH volatility forecasts, Tilburg University, unpublished manuscript.
Lee, J.H.H. (1991), A Lagrange multiplier test for GARCH models, Economics Letters, 37, 265-271.
Lucas, A. (1996), Outlier Robust Unit Root Analysis, PhD thesis, Rotterdam: Tinbergen Institute.
Lumsdaine, R.L. and S. Ng (1999), Testing for ARCH in the presence of a possibly misspecified conditional mean, Journal of Econometrics, 93, 257-279.
Lundbergh, R. and T. Terasvirta (1998), Modeling economic high-frequency time series with STAR-GARCH models, Working Paper in Economics and Finance 292, Stockholm School of Economics.
Lundbergh, R. and T. Terasvirta (2002), Evaluating GARCH models, Journal of Econometrics, 110, 417-435.
Nelson, D.B. (1991), Conditional heteroskedasticity in asset returns: a new approach, Econometrica, 59, 347-370.
Pagan, A.R. and G.W. Schwert (1990), Alternative models for conditional stock volatility, Journal of Econometrics, 45, 267-290.
Rabemananjara, R. and J.M. Zakoïan (1993), Threshold ARCH models and asymmetries in volatility, Journal of Applied Econometrics, 8, 31-49.
Sakata, S. and H. White (1998), High breakdown point conditional dispersion estimation with application to S&P 500 daily returns volatility, Econometrica, 66, 529-567.
Sentana, E. (1995), Quadratic ARCH models, Review of Economic Studies, 62, 639-661.
Simpson, D.G., D. Ruppert and R.J. Carroll (1992), On one-step GM estimates and stability of inferences in linear regression, Journal of the American Statistical Association, 87, 439-450.
Sullivan, M.J. and D.E.A. Giles (1995), The robustness of ARCH/GARCH tests to first-order autocorrelation, Journal of Quantitative Economics, 11, 35-61.
Taylor, J.W. (1999), Evaluating volatility and interval forecasts, Journal of Forecasting, 18, 111-128.