Wong and Li

A NOTE ON THE CORRECTED AKAIKE INFORMATION CRITERION
FOR THRESHOLD AUTOREGRESSIVE MODELS
BY C. S. WONG AND W. K. LI
The University of Hong Kong
First version received November 1995
Abstract. A bias-corrected Akaike information criterion AICC is derived for self-exciting threshold autoregressive (SETAR) models. The small sample properties of the Akaike information criteria (AIC, AICC) and the Bayesian information criterion (BIC) are studied using simulation experiments. It is suggested that AICC performs much better than AIC and BIC in small samples and should be put in routine usage.
Keywords. Corrected Akaike information criterion; Kullback-Leibler information; threshold time series model.
1. INTRODUCTION
Model selection is an important problem in statistical modelling. The Akaike information criterion (AIC), introduced by Akaike (1973), is one of the leading selection methods. The properties of AIC for regression models and linear time series models have been studied by many researchers (see, for example, Shibata, 1976). Hurvich and Tsai (1989) (see also Brockwell and Davis, 1991) introduced a bias correction of AIC (namely AICC) which can be used in non-linear regression models and linear time series models.
Although AIC and AICC are both designed to be approximately unbiased estimates of the Kullback-Leibler information of the fitted model relative to the true model, they behave differently in small samples. Hurvich and Tsai (1989) show that AICC retains the asymptotic properties of AIC but behaves better than AIC in small samples. For example, in the case of linear time series models, they show that the minimum AIC procedure tends to overfit severely when the maximum allowable dimension is comparable with the number of observations. They also show that AICC has less tendency to overfit, as the bias in estimating the Kullback-Leibler information is reduced. However, whether these properties of AIC and AICC carry over to the non-linear time series context is an open question.
The self-exciting threshold autoregressive (SETAR) models, introduced by Tong (1978), have been popular non-linear models for many years. However, the problem of model selection for them has rarely been addressed. Tong (1983, 1990) and Li (1988) suggested using AIC to solve the problem, where AIC for a model is defined as minus twice the maximized log-likelihood plus twice the number of independently adjusted parameters. However, no theoretical justification for the procedure has been given. The empirical studies by Li (1988) suggest that the minimum AIC procedure can be effective, but they did not take the small sample situation into account. A modified version of AICC for SETAR models was incorporated into the computer package STAR (developed by Tong) for fitting non-linear time series models, but neither theoretical nor empirical justification for it has been given in the literature.
0143-9782/98/01 113-124 JOURNAL OF TIME SERIES ANALYSIS Vol. 19, No. 1
© 1998 Blackwell Publishers Ltd., 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA.
In this paper, this gap is filled for AICC in SETAR model selection. This cannot be done without knowledge of the asymptotic properties of the conditional least squares estimator (CLSE) for the SETAR model (Chan, 1993). We will show that AICC is an asymptotically unbiased estimator of the Kullback-Leibler information
$$\Delta(\psi \mid \theta) = E_\theta\{-2 \ln f(X; \psi)\},$$
which can be interpreted as a measure of how close the two densities $f(\cdot;\psi)$ and $f(\cdot;\theta)$ are. The small sample properties of AIC, AICC and the Bayesian information criterion BIC (Schwarz, 1978) will be investigated by simulation studies. The results suggest that AICC performs much better than AIC and BIC in small samples.
The organization of the paper is as follows. In Section 2 we discuss the
asymptotic properties of the CLSE and the proof will be given in the
Appendix. Section 3 develops AICC for SETAR models. Section 4 gives the
minimum AICC procedure in SETAR model selection. The small sample
properties of AIC, AICC as well as BIC are demonstrated via simulation
experiments in Section 5. An example is also given in that section. The
concluding remarks are given in Section 6.
2. ASYMPTOTIC PROPERTIES OF THE CLSE
We consider the following SETAR model:
$$X_t = \begin{cases} \phi_{10} + \phi_{11}X_{t-1} + \cdots + \phi_{1p_1}X_{t-p_1} + a_{1t} & \text{if } X_{t-d} \le r,\\ \phi_{20} + \phi_{21}X_{t-1} + \cdots + \phi_{2p_2}X_{t-p_2} + a_{2t} & \text{if } X_{t-d} > r, \end{cases} \tag{2.1}$$
where $a_{1t} \sim N(0, \sigma_1^2)$ and $a_{2t} \sim N(0, \sigma_2^2)$, $d$ is the delay parameter and $r \in \mathbb{R}$ is the threshold parameter. Let $p = \max(p_1, p_2, d)$. For the moment, $d$ is assumed to be known; the case where $d$ is unknown is discussed in Section 4. The process $\{X_t\}$ is assumed to be stationary and ergodic with finite second moments, and the stationary distribution of $(X_1, \ldots, X_n)'$ admits a density that is positive everywhere.
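To make model (2.1) concrete, here is a minimal Python sketch (standard library only) that simulates a two-regime SETAR process. The function name `simulate_setar`, the zero start-up values and the default burn-in length are illustrative assumptions of ours, not part of the paper.

```python
import random


def simulate_setar(n, phi1, phi2, sigma1, sigma2, r, d, burn_in=200, seed=None):
    """Simulate n values from the two-regime SETAR model (2.1).

    phi1 = [phi_10, phi_11, ..., phi_1p1] and phi2 = [phi_20, ..., phi_2p2] hold the
    intercept followed by the AR coefficients of each regime. Regime 1 is used when
    X_{t-d} <= r and regime 2 when X_{t-d} > r.
    """
    rng = random.Random(seed)
    p = max(len(phi1) - 1, len(phi2) - 1, d)
    x = [0.0] * p  # start-up values (arbitrary; intended to be washed out by the burn-in)
    for _ in range(burn_in + n):
        t = len(x)
        if x[t - d] <= r:
            coef, sigma = phi1, sigma1
        else:
            coef, sigma = phi2, sigma2
        mean = coef[0] + sum(coef[j] * x[t - j] for j in range(1, len(coef)))
        x.append(mean + rng.gauss(0.0, sigma))
    return x[-n:]  # drop the start-up values and the burn-in
```

For instance, model (1) of Section 5 (equation (5.1) with $\phi_{11} = 0.5$, $\phi_{21} = -0.5$, $d = 1$, $r = 0$) could be generated with `simulate_setar(50, [0.0, 0.5], [0.0, -0.5], 1.0, 1.0, 0.0, 1, seed=1)`.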
Here is some notation: $\phi_1 = (\phi_{10}, \phi_{11}, \ldots, \phi_{1p_1})'$ and $\phi_2 = (\phi_{20}, \phi_{21}, \ldots, \phi_{2p_2})'$, where the prime denotes the transpose of a vector or matrix; $\hat\phi_1(r)$, $\hat\phi_2(r)$, $\hat\sigma_1^2(r)$ and $\hat\sigma_2^2(r)$ are the CLSEs based on $r$. The sum of squares error function is defined as
$$S_X(\phi_1, \phi_2, r) = S_{X1}(\phi_1, r) + S_{X2}(\phi_2, r),$$
where
$$S_{X1}(\phi_1, r) = \sum (X_t - \phi_{10} - \phi_{11}X_{t-1} - \cdots - \phi_{1p_1}X_{t-p_1})^2\, I(X_{t-d} \le r),$$
$$S_{X2}(\phi_2, r) = \sum (X_t - \phi_{20} - \phi_{21}X_{t-1} - \cdots - \phi_{2p_2}X_{t-p_2})^2\, I(X_{t-d} > r),$$
and $I(\cdot)$ is the indicator function. All the summations, unless otherwise stated, are from $t = p+1$ to $n$, and $N = n - p$ is the effective number of observations. The CLSEs $\hat\phi_1(\hat r)$, $\hat\phi_2(\hat r)$ and $\hat r$ are the values which globally minimize $S_X(\phi_1, \phi_2, r)$. The CLSEs $\hat\sigma_1^2(\hat r)$ and $\hat\sigma_2^2(\hat r)$ are defined as
$$\hat\sigma_i^2(\hat r) = \frac{S_{Xi}\{\hat\phi_i(\hat r), \hat r\}}{n_i(\hat r)}, \qquad i = 1, 2,$$
where $n_1(\hat r) = \sum I(X_{t-d} \le \hat r)$ and $n_2(\hat r) = \sum I(X_{t-d} > \hat r)$. The following theorem is required in the derivation of the corrected Akaike information criterion. The proof is given in the Appendix.
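For a fixed threshold $r$ the two sums of squares decouple, so the CLSEs can be computed by ordinary least squares within each regime. The sketch below illustrates this; the helper names `solve_linear` and `clse_given_r` are hypothetical, the normal equations are solved directly (a QR-based solve would be numerically more robust), and each regime is assumed to contain enough observations for its normal equations to be non-singular. Minimizing over $r$ as well is the grid search described in Section 4.

```python
def solve_linear(a, b):
    """Solve a x = b (a square, non-singular) by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, m):
            f = aug[i][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[i][j] -= f * aug[col][j]
    sol = [0.0] * m
    for i in reversed(range(m)):
        sol[i] = (aug[i][m] - sum(aug[i][j] * sol[j] for j in range(i + 1, m))) / aug[i][i]
    return sol


def clse_given_r(x, p1, p2, d, r):
    """Regime-wise CLSEs for a fixed threshold r.

    Returns [(phi_1, sigma2_1, n_1), (phi_2, sigma2_2, n_2)], where phi_i (intercept
    first) minimizes S_Xi(phi_i, r) and sigma2_i = S_Xi(phi_i_hat, r) / n_i.
    With 0-based indexing, t = p, ..., n-1 corresponds to the paper's t = p+1, ..., n.
    """
    p = max(p1, p2, d)
    out = []
    for order, in_regime in ((p1, lambda v: v <= r), (p2, lambda v: v > r)):
        rows, ys = [], []
        for t in range(p, len(x)):
            if in_regime(x[t - d]):
                rows.append([1.0] + [x[t - j] for j in range(1, order + 1)])
                ys.append(x[t])
        k = order + 1
        # Normal equations (sum of U_t U_t') phi = sum of U_t X_t for this regime.
        xtx = [[sum(u[i] * u[j] for u in rows) for j in range(k)] for i in range(k)]
        xty = [sum(u[i] * y for u, y in zip(rows, ys)) for i in range(k)]
        phi = solve_linear(xtx, xty)
        sse = sum((y - sum(c * v for c, v in zip(phi, u))) ** 2 for u, y in zip(rows, ys))
        out.append((phi, sse / len(ys), len(ys)))
    return out
```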
THEOREM. Under the four conditions stated in Chan (1993, pp. 522-23),
(T1) $\hat r = r + O_p(1/N)$;
(T2) $\hat\phi_1(\hat r)$, $\hat\phi_2(\hat r)$, $\hat\sigma_1^2(\hat r)$ and $\hat\sigma_2^2(\hat r)$ are asymptotically independent of $\hat r$, and their asymptotic distributions are the same as those for the case when $r$ is known;
(T3) given $r$,
$$N^{1/2}\begin{pmatrix}\hat\phi_1(r) - \phi_1(r)\\ \hat\phi_2(r) - \phi_2(r)\end{pmatrix} \xrightarrow{d} N_{p_1+p_2+2}\left\{0, \begin{pmatrix}\sigma_1^2 V_1^{-1}(r) & 0\\ 0 & \sigma_2^2 V_2^{-1}(r)\end{pmatrix}\right\},$$
where $N_{p_1+p_2+2}(\cdot, \cdot)$ denotes a $(p_1+p_2+2)$-dimensional normal distribution and $V_1(r) = E\{U_t(r)U_t(r)'\}$, $V_2(r) = E\{W_t(r)W_t(r)'\}$. Here, $U_t(r) = I(X_{t-d} \le r)(1, X_{t-1}, \ldots, X_{t-p_1})'$ and $W_t(r) = I(X_{t-d} > r)(1, X_{t-1}, \ldots, X_{t-p_2})'$;
(T4) given $r$, $n_i\hat\sigma_i^2(r) \sim \sigma_i^2(r)\chi^2_{n_i-(p_i+1)}$, $i = 1, 2$, where $\chi^2_h$ denotes the $\chi^2$ distribution with $h$ degrees of freedom, $n_1 = \sum I(X_{t-d} \le r)$ and $n_2 = N - n_1$.
Note that Chan (1993) has shown the asymptotic independence between $\hat\phi_i(\hat r)$ and $\hat r$, but not the more general result (T2).
3. THE CORRECTED AKAIKE INFORMATION CRITERION
Let $X_1, \ldots, X_n$ be observations from a threshold autoregressive process of the form (2.1) and assume that the true order is $(p_1, p_2)$. Let $Y_1, \ldots, Y_n$ be an independent realization of the true process. Then, apart from an additive constant,
$$-2\ln L_Y\{\hat\phi_1(\hat r), \hat\phi_2(\hat r), \hat\sigma_1^2(\hat r), \hat\sigma_2^2(\hat r), \hat r\} = \sum_{i=1}^{2} n_i(\hat r)\ln\hat\sigma_i^2(\hat r) + \sum_{i=1}^{2}\hat\sigma_i^{-2}(\hat r)\, S_{Yi}\{\hat\phi_i(\hat r), \hat r\},$$
where $L_Y$ denotes the likelihood for observations $\{Y_t\}$. Taking expectations with respect to the true parameters,
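$$\begin{aligned}
E_\theta\{\Delta(\hat\theta \mid \theta)\} &= E_{\phi_1,\phi_2,\sigma_1^2,\sigma_2^2,r}[-2\ln L_Y\{\hat\phi_1(\hat r), \hat\phi_2(\hat r), \hat\sigma_1^2(\hat r), \hat\sigma_2^2(\hat r), \hat r\}]\\
&= E_{\phi_1,\phi_2,\sigma_1^2,\sigma_2^2,r}\{n_1(\hat r)\ln\hat\sigma_1^2(\hat r) + n_2(\hat r)\ln\hat\sigma_2^2(\hat r)\}\\
&\quad + E_{\phi_1,\phi_2,\sigma_1^2,\sigma_2^2,r}[\hat\sigma_1^{-2}(\hat r)S_{Y1}\{\hat\phi_1(\hat r), \hat r\} + \hat\sigma_2^{-2}(\hat r)S_{Y2}\{\hat\phi_2(\hat r), \hat r\}].
\end{aligned} \tag{3.1}$$
The first term of (3.1) is estimated consistently by $n_1(\hat r)\ln\hat\sigma_1^2(\hat r) + n_2(\hat r)\ln\hat\sigma_2^2(\hat r)$. For the second term,
$$\begin{aligned}
&E_{\phi_1,\phi_2,\sigma_1^2,\sigma_2^2,r}[\hat\sigma_1^{-2}(\hat r)S_{Y1}\{\hat\phi_1(\hat r), \hat r\} + \hat\sigma_2^{-2}(\hat r)S_{Y2}\{\hat\phi_2(\hat r), \hat r\}]\\
&\quad = E_r\big(E_{\phi_1,\phi_2,\sigma_1^2,\sigma_2^2}[\hat\sigma_1^{-2}(\hat r)S_{Y1}\{\hat\phi_1(\hat r), \hat r\} + \hat\sigma_2^{-2}(\hat r)S_{Y2}\{\hat\phi_2(\hat r), \hat r\} \mid r = \hat r]\big)\\
&\quad = E_r\big(E_{\phi_1,\sigma_1^2}[\hat\sigma_1^{-2}(\hat r)S_{Y1}\{\hat\phi_1(\hat r), \hat r\} \mid r = \hat r] + E_{\phi_2,\sigma_2^2}[\hat\sigma_2^{-2}(\hat r)S_{Y2}\{\hat\phi_2(\hat r), \hat r\} \mid r = \hat r]\big).
\end{aligned}$$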
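Expanding $S_{Y1}\{\hat\phi_1(\hat r), \hat r\}$ about the 'true' parameter $\phi_1(\hat r)$, assuming that $\hat r$ is the true value of $r$ and abusing the notation, we have
$$\begin{aligned}
S_{Y1}(\hat\phi_1) &= S_{Y1}(\phi_1) + (\hat\phi_1 - \phi_1)'\frac{\partial S_{Y1}(\phi_1)}{\partial\phi_1} + \frac12(\hat\phi_1 - \phi_1)'\frac{\partial^2 S_{Y1}(\phi_1)}{\partial\phi_1\,\partial\phi_1'}(\hat\phi_1 - \phi_1)\\
&= S_{Y1}(\phi_1) - (\hat\phi_1 - \phi_1)'\Big\{2\sum U_t(\hat r)a_t(\phi_1)\Big\} + (\hat\phi_1 - \phi_1)'\Big\{\sum U_t(\hat r)U_t(\hat r)'\Big\}(\hat\phi_1 - \phi_1),
\end{aligned}$$
where $a_t(\phi_1) = Y_t - U_t(\hat r)'\phi_1$ on regime 1, so that the gradient is $-2\sum U_t(\hat r)a_t(\phi_1)$. It is easy to see that $N^{-1}\sum U_t(\hat r)U_t(\hat r)' \to V_1(\hat r)$ and that $\sum U_t(\hat r)a_t(\phi_1)$ has mean zero. Replacing $\sum U_t(\hat r)U_t(\hat r)'$ by $NV_1(\hat r)$, we obtain
$$\begin{aligned}
E_{\phi_1,\sigma_1^2}\{S_{Y1}(\hat\phi_1) \mid r = \hat r\} &= E_{\phi_1,\sigma_1^2}\{S_{Y1}(\phi_1) \mid r = \hat r\} + \sigma_1^2 E_{\phi_1,\sigma_1^2}\left\{N(\hat\phi_1 - \phi_1)'\frac{V_1(\hat r)}{\sigma_1^2}(\hat\phi_1 - \phi_1)\,\middle|\, r = \hat r\right\}\\
&= n_1(\hat r)\sigma_1^2(\hat r) + (p_1 + 1)\sigma_1^2(\hat r),
\end{aligned}$$
as $\sum U_t(\hat r)a_t(\phi_1)$ is independent of $\hat\phi_1 - \phi_1$, and by (T3) the quadratic form inside the braces is asymptotically $\chi^2_{p_1+1}$, with mean $p_1 + 1$.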

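The independence of $\{X_t\}$ and $\{Y_t\}$ implies that $\hat\sigma_1^2(\hat r)$ is asymptotically independent of $S_{Y1}$. So
$$\begin{aligned}
E_{\phi_1,\sigma_1^2}\left[\frac{S_{Y1}\{\hat\phi_1(\hat r), \hat r\}}{\hat\sigma_1^2(\hat r)}\,\middle|\, r = \hat r\right] &= \{n_1(\hat r) + p_1 + 1\}\sigma_1^2(\hat r)\, E_{\phi_1,\sigma_1^2(\hat r)}\{\hat\sigma_1^{-2}(\hat r) \mid r = \hat r\}\\
&= \{n_1(\hat r) + p_1 + 1\}\sigma_1^2(\hat r)\left\{\frac{n_1(\hat r) - (p_1 + 1) - 2}{n_1(\hat r)}\sigma_1^2(\hat r)\right\}^{-1}\\
&= \frac{n_1(\hat r)\{n_1(\hat r) + p_1 + 1\}}{n_1(\hat r) - p_1 - 3},
\end{aligned}$$
and hence
$$E_r\left(E_{\phi_1,\sigma_1^2}\left[\frac{S_{Y1}\{\hat\phi_1(\hat r), \hat r\}}{\hat\sigma_1^2(\hat r)}\,\middle|\, r = \hat r\right]\right)$$
can be consistently estimated by
$$\frac{n_1(\hat r)\{n_1(\hat r) + p_1 + 1\}}{n_1(\hat r) - p_1 - 3}.$$
As $\hat r$ is an $N$-consistent estimator of $r$, we should have $n_1(\hat r) \xrightarrow{P} n_1(r)$. Similarly, we obtain
$$E_r\left(E_{\phi_2,\sigma_2^2}\left[\frac{S_{Y2}\{\hat\phi_2(\hat r), \hat r\}}{\hat\sigma_2^2(\hat r)}\,\middle|\, r = \hat r\right]\right) \approx \frac{n_2(\hat r)\{n_2(\hat r) + p_2 + 1\}}{n_2(\hat r) - p_2 - 3}.$$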
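The step from (T4) to the correction term uses only the mean of an inverse chi-square variable. Writing $m_1 = n_1(\hat r) - (p_1 + 1)$, abbreviating $n_1(\hat r)$ as $n_1$ and treating it as fixed given $r = \hat r$, a worked version of that step (our own expansion of the argument above) is:

```latex
\begin{aligned}
&\frac{n_1\hat\sigma_1^2}{\sigma_1^2} \sim \chi^2_{m_1}, \qquad
 E\!\left(\frac{1}{\chi^2_{m}}\right) = \frac{1}{m-2} \quad (m > 2),\\
&E\!\left(\hat\sigma_1^{-2}\right) = \frac{n_1}{\sigma_1^2}\cdot\frac{1}{m_1 - 2}
 = \frac{n_1}{\sigma_1^2\,(n_1 - p_1 - 3)},\\
&E\!\left[\frac{S_{Y1}(\hat\phi_1)}{\hat\sigma_1^2}\right]
 \approx (n_1 + p_1 + 1)\,\sigma_1^2 \cdot \frac{n_1}{\sigma_1^2\,(n_1 - p_1 - 3)}
 = \frac{n_1(n_1 + p_1 + 1)}{n_1 - p_1 - 3}.
\end{aligned}
```

The requirement $m_1 > 2$, i.e. $n_1(\hat r) > p_1 + 3$, is what keeps the correction finite, which is one concrete reason why each regime must contain enough observations.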
Thus we can obtain an approximately unbiased estimate of the expected Kullback-Leibler information $E_\theta\{\Delta(\hat\theta \mid \theta)\}$. As the above computation was based on the true order $(p_1, p_2)$, we select the values of $p_1$ and $p_2$ for the fitted SETAR model to be those which minimize the corrected Akaike information criterion AICC:
$$\mathrm{AICC}(p_1, p_2) = n_1(\hat r)\ln\hat\sigma_1^2(\hat r) + n_2(\hat r)\ln\hat\sigma_2^2(\hat r) + \frac{n_1(\hat r)\{n_1(\hat r) + p_1 + 1\}}{n_1(\hat r) - p_1 - 3} + \frac{n_2(\hat r)\{n_2(\hat r) + p_2 + 1\}}{n_2(\hat r) - p_2 - 3}.$$
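As a small illustration, the criterion can be evaluated from the regime counts and residual variances alone. In this sketch the name `aicc_setar` is hypothetical, and the inputs are assumed to come from a CLSE fit at the chosen $\hat r$.

```python
import math


def aicc_setar(n1, n2, sigma2_1, sigma2_2, p1, p2):
    """Corrected AIC for a two-regime SETAR(2; p1, p2) fit, following the formula above.

    n1, n2 are the regime counts n_i(r_hat); sigma2_1, sigma2_2 are the CLSE variances.
    Each regime needs n_i > p_i + 3 for the correction term to be finite and positive.
    """
    for n_i, p_i in ((n1, p1), (n2, p2)):
        if n_i <= p_i + 3:
            raise ValueError("AICC needs n_i > p_i + 3 in every regime")
    fit = n1 * math.log(sigma2_1) + n2 * math.log(sigma2_2)
    correction = n1 * (n1 + p1 + 1) / (n1 - p1 - 3) + n2 * (n2 + p2 + 1) / (n2 - p2 - 3)
    return fit + correction
```

For example, with $n_1 = n_2 = 25$, $\hat\sigma_1^2 = \hat\sigma_2^2 = 1$ and $p_1 = p_2 = 1$, the value is $2 \times 25 \times 27/21 \approx 64.29$.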

4. THE MINIMUM AICC PROCEDURE
4.1. Minimum AICC procedure when r is unknown and d is known

In the above discussion, we have assumed that $r \in \mathbb{R}$. In practice, if we know the value of the threshold parameter $r$, then the procedure of selecting the orders of a SETAR model by minimizing AICC is quite straightforward.
However, if we do not have any prior information about $r$, we suggest the following procedure.
(1) Fix the maximum orders $\{\bar p_1, \bar p_2\}$ from which the orders of the SETAR model are selected.
(2) Assume $r \in \tilde R = [l, u] \subset \mathbb{R}$. For example, take $l$ and $u$ to be the $a \times 100\%$ and the $b \times 100\%$ percentiles of $X_t$ respectively. Either $a = 0.25$, $b = 0.75$ or $a = 0.1$, $b = 0.9$ may be an appropriate choice. However, care should be taken to guarantee that there are enough observations in each regime.
(3) Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the order statistics of $X_t$.
(4) Let $I_r$ be the index set $\{[an], [an]+1, \ldots, [bn]\}$, where $[u]$ denotes the largest integer not exceeding $u$. Set $r = X_{(i)}$, $i \in I_r$, and calculate
$$\min_{1 \le k_1 \le \bar p_1,\ 1 \le k_2 \le \bar p_2} \mathrm{AICC}(k_1, k_2).$$
Denote the minimum as $\mathrm{AICC}(X_{(i)})$ and the corresponding orders giving this minimum as $k_1(X_{(i)})$, $k_2(X_{(i)})$. Note that in the calculation the first $\max(d, \bar p_1, \bar p_2)$ observations should be discarded to make the comparison meaningful.
(5) Find the minimum of $\mathrm{AICC}(X_{(i)})$, $i \in I_r$. Denote the value of $X_{(i)}$ giving this minimum as $X_{(i^*)}$.
(6) The orders selected by minimizing AICC are $k_1(X_{(i^*)})$, $k_2(X_{(i^*)})$, and the estimate of the threshold is $\hat r = X_{(i^*)}$.
Tong (1990, p. 379) and Li (1988) suggested a similar procedure for selecting the orders by minimizing AIC.
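Steps (1) to (6) can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the STAR package or the authors' code: the helper names are hypothetical, the regime fits use the normal equations, $[u]$ is taken as the integer part, and any regime with $n_i \le k_i + 3$ is skipped because the AICC correction is then undefined. A degenerate (singular) regime design would raise a `ZeroDivisionError`.

```python
import math


def _solve(a, b):
    """Gaussian elimination with partial pivoting for the small normal equations."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, m):
            f = aug[i][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[i][j] -= f * aug[col][j]
    sol = [0.0] * m
    for i in reversed(range(m)):
        sol[i] = (aug[i][m] - sum(aug[i][j] * sol[j] for j in range(i + 1, m))) / aug[i][i]
    return sol


def _regime_fit(x, d, r, order, upper, start):
    """Return (n_i, sigma2_i) for one regime, or None if n_i <= order + 3.

    upper=False selects X_{t-d} <= r (regime 1); upper=True selects X_{t-d} > r.
    """
    rows, ys = [], []
    for t in range(start, len(x)):
        if (x[t - d] > r) == upper:
            rows.append([1.0] + [x[t - j] for j in range(1, order + 1)])
            ys.append(x[t])
    if len(ys) <= order + 3:  # AICC correction undefined
        return None
    k = order + 1
    xtx = [[sum(u[i] * u[j] for u in rows) for j in range(k)] for i in range(k)]
    xty = [sum(u[i] * y for u, y in zip(rows, ys)) for i in range(k)]
    phi = _solve(xtx, xty)
    sse = sum((y - sum(c * v for c, v in zip(phi, u))) ** 2 for u, y in zip(rows, ys))
    return len(ys), sse / len(ys)


def min_aicc_setar(x, d, pbar1, pbar2, a=0.25, b=0.75):
    """Steps (1)-(6) of Section 4.1 (d known). Returns (aicc, r_hat, k1, k2) or None."""
    n = len(x)
    start = max(d, pbar1, pbar2)  # discard the first max(d, pbar1, pbar2) values
    order_stats = sorted(x)       # X_(1) <= ... <= X_(n)
    best = None
    for i in range(max(1, int(a * n)), int(b * n) + 1):  # I_r, 1-based ranks
        r = order_stats[i - 1]
        # Each regime fit depends only on its own order, so fit each once per r.
        fits1 = {k: _regime_fit(x, d, r, k, False, start) for k in range(1, pbar1 + 1)}
        fits2 = {k: _regime_fit(x, d, r, k, True, start) for k in range(1, pbar2 + 1)}
        for k1, f1 in fits1.items():
            for k2, f2 in fits2.items():
                if f1 is None or f2 is None or f1[1] <= 0 or f2[1] <= 0:
                    continue
                (n1, s1), (n2, s2) = f1, f2
                crit = (n1 * math.log(s1) + n2 * math.log(s2)
                        + n1 * (n1 + k1 + 1) / (n1 - k1 - 3)
                        + n2 * (n2 + k2 + 1) / (n2 - k2 - 3))
                if best is None or crit < best[0]:
                    best = (crit, r, k1, k2)
    return best
```

Because the regime-1 fit depends only on $k_1$ and the regime-2 fit only on $k_2$, each is computed once per candidate threshold, and the $\bar p_1 \times \bar p_2$ order combinations are then scored cheaply.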
4.2. Minimum AICC procedure when both r and d are unknown

The assumption that the delay parameter $d$ is known may not be reasonable. If we have to estimate the delay parameter, the procedure above can easily be modified to include its selection. The justification for doing this comes from the consistency of the conditional least squares estimator of $d$ given in Chan (1993, Theorem 1). The modification is as follows: first, replace steps (1) and (6) by steps (1') and (6') below respectively; second, fix $d$ before performing steps (4) and (5), and replace $X_{(i)}$, $X_{(i^*)}$ by $X_{(i)}(d)$, $X_{(i^*)}(d)$ respectively; and third, add step (5a) below after step (5).
(1') Fix the maximum orders $\{\bar p_1, \bar p_2\}$ and the maximum delay $\bar d$ from which the orders and the delay of the SETAR model are selected.
(5a) Find the minimum of $\mathrm{AICC}\{X_{(i^*)}(d)\}$, $1 \le d \le \bar d$. Denote the value of $d$ giving this minimum as $\hat d$.
(6') The selected delay parameter is $\hat d$, the selected orders for the conditional mean are $k_1\{X_{(i^*)}(\hat d)\}$, $k_2\{X_{(i^*)}(\hat d)\}$, and the estimate of the threshold is $\hat r = X_{(i^*)}(\hat d)$.
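Step (5a) amounts to an outer loop over the delay. The sketch below assumes a hypothetical callable `fit_given_d(d)` that runs the Section 4.1 search for a fixed delay and returns `(aicc, r_hat, k1, k2)`, or `None` if no admissible model exists. To keep AICC values comparable across delays, such a callable would presumably have to discard the same leading observations for every $d$ (for instance the first $\max(\bar d, \bar p_1, \bar p_2)$); that is our assumption, extrapolated from the note in step (4).

```python
def select_delay(fit_given_d, dbar):
    """Steps (5a) and (6'): repeat the d-known search for d = 1, ..., dbar, keep the best.

    fit_given_d is a hypothetical callable: fit_given_d(d) returns
    (aicc, r_hat, k1, k2) for delay d, or None if no admissible model exists.
    """
    best_d, best = None, None
    for d in range(1, dbar + 1):
        res = fit_given_d(d)
        if res is not None and (best is None or res[0] < best[0]):
            best_d, best = d, res
    if best is None:
        return None
    aicc, r_hat, k1, k2 = best
    return {"d": best_d, "r": r_hat, "k1": k1, "k2": k2, "aicc": aicc}
```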
5. SIMULATION STUDY AND EXAMPLE
In order to compare the performance of AIC, AICC and BIC in the selection of SETAR models, we performed seven simulation experiments. The procedures for order selection by AIC and BIC are similar to those described in Section 4, with the criterion AICC replaced by AIC or BIC. The definitions of AIC and BIC used for SETAR models are
$$\mathrm{AIC}(p_1, p_2) = n_1(\hat r)\ln\hat\sigma_1^2(\hat r) + n_2(\hat r)\ln\hat\sigma_2^2(\hat r) + 2(p_1 + p_2 + 2),$$
$$\mathrm{BIC}(p_1, p_2) = n_1(\hat r)\ln\hat\sigma_1^2(\hat r) + n_2(\hat r)\ln\hat\sigma_2^2(\hat r) + (p_1 + 1)\ln n_1(\hat r) + (p_2 + 1)\ln n_2(\hat r).$$
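To see why the criteria behave so differently when the maximum order is large relative to the sample, it helps to compare only their penalty terms, since all three share the term $n_1(\hat r)\ln\hat\sigma_1^2(\hat r) + n_2(\hat r)\ln\hat\sigma_2^2(\hat r)$. In the sketch below (hypothetical function name), the constant $n_1 + n_2$, which does not depend on the orders, is removed from the AICC correction, leaving $\sum_i n_i(2p_i + 4)/(n_i - p_i - 3)$.

```python
import math


def criterion_penalties(n1, n2, p1, p2):
    """Penalty terms that AIC, AICC and BIC (Section 5) add to n1 ln s1^2 + n2 ln s2^2.

    For AICC the order-free constant n1 + n2 is subtracted, so its penalty equals
    the sum over regimes of n_i (2 p_i + 4) / (n_i - p_i - 3).
    """
    aic = 2 * (p1 + p2 + 2)
    aicc = sum(n * (n + p + 1) / (n - p - 3) - n for n, p in ((n1, p1), (n2, p2)))
    bic = (p1 + 1) * math.log(n1) + (p2 + 1) * math.log(n2)
    return {"AIC": aic, "AICC": aicc, "BIC": bic}
```

With illustrative regime sizes $n_1 = n_2 = 25$, the AIC, AICC and BIC penalties are 8, about 14.3 and about 12.9 when $p_1 = p_2 = 1$, but 44, 100 and about 70.8 when $p_1 = p_2 = 10$. The AICC penalty grows much faster as the order approaches the regime size, which is consistent with its lower overfitting rates in Tables I and II. As $n_i \to \infty$ it tends to $2(p_1 + p_2 + 4)$, which differs from the AIC penalty only by a constant that does not affect order selection.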
For the case where $r$ is unknown but $d$ is known, we carried out four simulation experiments to compare the performance of AIC, AICC and BIC. The models simulated are of the form
$$X_t = \begin{cases}\phi_{11}X_{t-1} + a_t & \text{if } X_{t-1} \le 0.0,\\ \phi_{21}X_{t-1} + a_t & \text{if } X_{t-1} > 0.0,\end{cases} \tag{5.1}$$
where $a_t \sim N(0, 1)$. The number of replications was 100 in all the experiments below. For model (1), $\phi_{11} = 0.5$, $\phi_{21} = -0.5$, and for model (2), $\phi_{11} = -0.8$, $\phi_{21} = -0.2$. We assume that $d$ is known to be unity and we use $a = 0.25$, $b = 0.75$ in the studies. In Table I, we use sample size $n = 50$ and $\bar p_1 = \bar p_2 = 5$. In Table II, $n = 75$ and $\bar p_1 = \bar p_2 = 10$.
We get similar results from Tables I and II in spite of the different settings of the simulation experiments. The performance of AIC is unacceptable: the orders are identified correctly (i.e. both $k_1$ and $k_2$ are correct) in only about 23% and 11% of the replications when $n = 50$ and $n = 75$ respectively. The problem of overfitting is quite severe. If we define overfitting as at least one of the selected orders being greater than 3, then about 48% (for $n = 50$) and 83% (for $n = 75$) of the replications are overfitted by AIC. The performance of AICC is much better, in terms of both a higher rate of correct identification (about 65% for $n = 50$ and 50% for $n = 75$) and less overfitting (about 6% for $n = 50$ and 18% for $n = 75$). Although BIC is designed to overcome the inconsistency of AIC, its small sample performance is not as good as that of AICC. As Tables I and II show, its rates of correct identification are lower (about 43% for $n = 50$ and 42% for $n = 75$) and its tendency to overfit is higher (about 32% for $n = 50$ and 45% for $n = 75$) than those of AICC. As the maximum allowable number of parameters is proportional to the number of observations, overfitting with AIC and BIC is expected, as in the case of linear time series. However, it is clearly not a great problem with AICC. Hence AICC is a better criterion than AIC and BIC for order selection in SETAR models in the small sample situation.

TABLE I
SIMULATION RESULTS WHEN r IS UNKNOWN AND d IS KNOWN
k1 \ k2    1             2           3           4           5
Model (1): $\phi_{11} = 0.5$, $\phi_{21} = -0.5$
1          60, 21, 40    9, 7, 6     6, 8, 6     1, 6, 8     0, 9, 13
2          16, 8, 7      0, 1, 0     2, 2, 2     1, 4, 3     0, 2, 0
3          0, 5, 5       1, 0, 1     0, 1, 0     0, 0, 0     0, 3, 0
4          3, 4, 3       1, 1, 0     0, 0, 0     0, 1, 0     0, 1, 0
5          0, 8, 4       0, 5, 2     0, 1, 0     0, 2, 0     0, 0, 0
Model (2): $\phi_{11} = -0.8$, $\phi_{21} = -0.2$
1          69, 24, 45    4, 5, 6     3, 4, 5     1, 7, 7     0, 6, 8
2          13, 7, 8      0, 2, 0     1, 1, 0     1, 1, 1     0, 6, 1
3          4, 5, 5       0, 1, 0     0, 2, 0     0, 0, 0     0, 0, 0
4          0, 5, 5       1, 1, 0     0, 3, 1     0, 1, 0     0, 1, 0
5          2, 13, 7      0, 1, 0     1, 1, 1     0, 2, 0     0, 1, 0
Notes: $n = 50$, $\bar p_1 = \bar p_2 = 5$; first value AICC, second value AIC, third value BIC; entries are frequencies of selecting $k_1$ (rows) and $k_2$ (columns).

TABLE II
SIMULATION RESULTS WHEN r IS UNKNOWN AND d IS KNOWN
k1 \ k2    1             2           3           4-6         7-10
Model (1): $\phi_{11} = 0.5$, $\phi_{21} = -0.5$
1          48, 13, 42    6, 0, 0     6, 3, 2     5, 4, 3     1, 26, 22
2          15, 2, 6      2, 0, 1     1, 1, 0     1, 0, 2     1, 9, 4
3          4, 1, 2       2, 0, 0     0, 0, 0     0, 1, 0     0, 3, 1
4-6        6, 7, 4       0, 0, 0     0, 0, 0     0, 1, 0     0, 1, 0
7-10       2, 18, 11     0, 2, 0     0, 0, 0     0, 2, 0     0, 6, 0
Model (2): $\phi_{11} = -0.8$, $\phi_{21} = -0.2$
1          52, 8, 42     9, 3, 7     3, 1, 0     6, 7, 2     4, 28, 20
2          10, 2, 5      1, 0, 1     1, 0, 0     1, 0, 0     0, 3, 2
3          3, 1, 2       1, 0, 0     0, 0, 0     0, 0, 0     0, 3, 1
4-6        5, 3, 9       1, 2, 0     1, 2, 0     2, 2, 0     0, 5, 0
7-10       0, 16, 8      0, 3, 0     0, 0, 1     0, 6, 0     0, 5, 0
Notes: $n = 75$, $\bar p_1 = \bar p_2 = 10$; first value AICC, second value AIC, third value BIC; entries are frequencies of selecting $k_1$ (rows) and $k_2$ (columns).
For the case where both $r$ and $d$ are unknown, we carried out three simulation experiments. Model (1) above was simulated in these experiments. Tables III and IV present the results for the small sample situation and Table V presents the results for a larger sample size. In Table III, we use a sample size of 50 with $\bar p_1 = \bar p_2 = 5$ and $\bar d = 6$. The sample sizes in Tables IV and V are 75 and 150 respectively. In both tables, $\bar p_1 = \bar p_2 = 10$ and $\bar d = 6$.
From Tables III and IV, it can be observed that AICC performs slightly better than AIC and BIC in identifying the delay parameter. However, the rates of correct identification of $d$ (less than 50%) are not satisfactory. The performance of AICC in identifying the orders ($p_1$ and $p_2$) is clearly much better than those of AIC and BIC, in terms of higher rates of correct identification (59% for $n = 50$ and 39% for $n = 75$) and lower rates of overfitting (4% for $n = 50$ and 28% for $n = 75$). Again, AICC should be used instead of AIC and BIC when the sample size is small.

TABLE III
SIMULATION RESULTS WHEN BOTH r AND d ARE UNKNOWN
Correct identification of d: 40, 32, 34
k1 \ k2    1             2           3           4           5
1          59, 6, 21     8, 2, 3     6, 8, 6     1, 5, 10    0, 21, 24
2          19, 4, 5      1, 1, 0     1, 0, 0     0, 2, 2     0, 3, 1
3          1, 5, 4       1, 2, 1     0, 0, 0     0, 1, 0     0, 2, 0
4          2, 5, 6       0, 1, 0     0, 0, 0     0, 0, 0     0, 3, 0
5          0, 16, 12     0, 4, 4     0, 1, 0     0, 6, 0     1, 2, 1
Notes: Model (1), $\phi_{11} = 0.5$, $\phi_{21} = -0.5$, $n = 50$, $\bar p_1 = \bar p_2 = 5$, $\bar d = 6$; first value AICC, second value AIC, third value BIC; entries are frequencies of selecting $k_1$ (rows) and $k_2$ (columns).

TABLE IV
k1 \ k2    1             2           3           4-6         7-10
1          39, 0, 20     7, 1, 1     6, 0, 1     8, 2, 1     5, 38, 42
2          9, 1, 3       4, 0, 1     1, 1, 0     1, 0, 1     1, 5, 4
3          5, 0, 0       1, 0, 0     0, 0, 0     1, 0, 0     0, 6, 1
4-6        3, 3, 2       2, 0, 0     0, 1, 0     2, 1, 0     0, 4, 0
7-10       4, 21, 22     0, 3, 1     1, 1, 0     0, 4, 0     0, 8, 0
Notes: Model (1), $\phi_{11} = 0.5$, $\phi_{21} = -0.5$, $n = 75$, $\bar p_1 = \bar p_2 = 10$, $\bar d = 6$; first value AICC, second value AIC, third value BIC; entries are frequencies of selecting $k_1$ (rows) and $k_2$ (columns).

TABLE V
k1 \ k2    1             2           3           4-6         7-10
1          39, 15, 79    10, 7, 6    4, 2, 0     3, 5, 1     7, 18, 1
2          9, 8, 8       1, 1, 1     0, 0, 0     0, 1, 0     1, 2, 1
3          5, 3, 2       2, 2, 0     1, 0, 0     1, 3, 0     1, 1, 0
4-6        12, 12, 1     1, 1, 0     0, 0, 0     1, 2, 0     0, 1, 0
7-10       1, 11, 0      1, 1, 0     0, 1, 0     0, 0, 0     0, 3, 0
Notes: Model (1), $\phi_{11} = 0.5$, $\phi_{21} = -0.5$, $n = 150$, $\bar p_1 = \bar p_2 = 10$, $\bar d = 6$; first value AICC, second value AIC, third value BIC; entries are frequencies of selecting $k_1$ (rows) and $k_2$ (columns).
For a moderate sample size ($n = 150$), BIC clearly outperforms AIC and AICC. Although the rates of correct identification of the delay parameter are about the same for BIC and AICC, BIC gives better order selection results for SETAR models, as shown in Table V. However, AICC still outperforms AIC.
As an example, the minimum AICC procedure is applied to the order selection problem for the Canadian Lynx data (1821-1934), a series of length 114 described in Tong (1990). Tong developed a least squares search based on the minimum AIC procedure. For comparison, we also use the minimum BIC procedure. The original data are transformed by taking logarithms to the base 10. The maximum allowable dimensions are $\bar p_1 = \bar p_2 = 10$, and a grid search is used to determine $d$ and $r$, with $\bar d = 6$. As the length of the series is just over 100, we would expect AIC to tend to overfit, so AICC should be more reliable. It was found that, for each value of $d$ and $r$, the model selected by AIC may be overfitted, while both AICC and BIC select a more parsimonious model. The model selected by minimizing AIC is (A) $\hat d = 3$, $\hat r = 2.9464$, $k_1 = 5$ and $k_2 = 9$, and the model selected by minimizing AICC or BIC is (B) $\hat d = 3$, $\hat r = 2.9464$, $k_1 = 4$ and $k_2 = 2$. Models (A) and (B) have the same estimated delay and threshold parameters. No obvious lack of fit is detected for either model from the analysis of the normalized residual series $\{\hat a_{it}/\hat\sigma_i\}$. Although the value of AIC for model (A) is a global minimum, Tong (1990, p. 386) discards this model from further study because of the possibility of overfitting (i.e. $k_2 \ge 8$). Model (B) therefore appears to be a better choice than model (A) and deserves further investigation.
6. CONCLUSION
From the above discussion, it appears essential that at least 150 observations be available for the CLSE of the delay parameter to be reliable. Also, it is well known that AIC and AICC are not consistent model order selection criteria for linear time series models (Shibata, 1976). The above simulation experiments suggest that AIC and AICC are likewise inconsistent selection criteria in SETAR model selection, while BIC is consistent. However, the asymptotic efficiency of AIC, and hence of AICC, in the linear model setting (Shibata, 1981, p. 53) should carry over to the SETAR model, as the SETAR model is just a piecewise linear model. Since a true model rarely exists, asymptotic efficiency appears to be the more desirable property.
In conclusion, while AICC retains large sample properties similar to those of AIC, it performs much better than AIC and even BIC in small samples. We suggest that AICC rather than AIC should be put into routine use for the order selection of SETAR models.
APPENDIX
The proof of (T1) and of part of (T2), i.e. the asymptotic independence of $\hat\phi_1(\hat r)$, $\hat\phi_2(\hat r)$ and $\hat r$, was given in Theorem 2 of Chan (1993). For (T3), the result follows by applying Theorems 5.4 and 5.5 in Tong (1990). For (T4), note that model (2.1) is piecewise linear, so the result can be deduced directly from the theory of least squares estimation for the general linear model.
For the second part of (T2), for simplicity of exposition, we give the proof only for the case $\sigma_1^2 = \sigma_2^2 = \sigma^2$; it can easily be extended to the general case. First, we have to prove that
$$\sup_{|z - r| \le K/N} |\hat\sigma^2(z) - \hat\sigma^2(r)| = O_p\!\left(\frac{1}{N}\right). \tag{A1}$$
The method of proof is adapted from Chan (1988, pp. 24-27). (A1) and (T1) imply the asymptotic independence of $\hat\sigma^2(\hat r)$ and $\hat r$. Let $X = (X_{p+1}, \ldots, X_n)'$ and
$$H(z) = \begin{pmatrix} U_{p+1}(z) & U_{p+2}(z) & \cdots & U_n(z)\\ W_{p+1}(z) & W_{p+2}(z) & \cdots & W_n(z)\end{pmatrix}'.$$
Then $\hat\sigma^2(z) = X'[I - H(z)\{H(z)'H(z)\}^{-1}H(z)']X/N$.
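Let $\Gamma(z) = H(z)'H(z)/N$. Since $U_t(z)$ and $W_t(z)$ are never simultaneously non-zero, $U_t(z)W_t(z)' = 0$ for every $t$, so $\Gamma(z)$ can be written as
$$\Gamma(z) = \begin{pmatrix}\Gamma_1(z) & 0\\ 0 & \Gamma_2(z)\end{pmatrix}.$$
For $N$ large, $\Gamma_i(z)$, $i = 1, 2$, will be of full rank and the inverse exists. By the ergodicity of $X_t$, $\Gamma_i(z) \to V_i(z)$ almost surely, $i = 1, 2$. Hence
$$\Gamma(z) \to \begin{pmatrix} V_1(z) & 0\\ 0 & V_2(z)\end{pmatrix} \quad \text{a.s.}$$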
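Now,
$$\begin{aligned}
|\hat\sigma^2(z) - \hat\sigma^2(r)| &= \left|\frac1N X'[H(r)\{H(r)'H(r)\}^{-1}H(r)']X - \frac1N X'[H(z)\{H(z)'H(z)\}^{-1}H(z)']X\right|\\
&= \left|\frac1N X'H(r)\,\Gamma^{-1}(r)\,\frac1N H(r)'X - \frac1N X'H(z)\,\Gamma^{-1}(z)\,\frac1N H(z)'X\right|\\
&= \left|\frac1N X'\{H(r) - H(z)\}\,\Gamma^{-1}(r)\,\frac1N H(r)'X + \frac1N X'H(z)\,\{\Gamma^{-1}(r) - \Gamma^{-1}(z)\}\,\frac1N H(r)'X + \frac1N X'H(z)\,\Gamma^{-1}(z)\,\frac1N\{H(r) - H(z)\}'X\right|\\
&\le \left|\frac1N X'\{H(r) - H(z)\}\right|\,|\Gamma^{-1}(r)|\,\left|\frac1N H(r)'X\right| + \left|\frac1N X'H(z)\right|\,|\Gamma^{-1}(r) - \Gamma^{-1}(z)|\,\left|\frac1N H(r)'X\right|\\
&\quad + \left|\frac1N X'H(z)\right|\,|\Gamma^{-1}(z)|\,\left|\frac1N\{H(r) - H(z)\}'X\right|.
\end{aligned}$$
Then, by the argument given in Chan (1988, pp. 25-26),
$$\sup_{|z - r| \le K/N}\left|\frac1N\{H(z) - H(r)\}'X\right| = O_p\!\left(\frac1N\right), \qquad \sup_{|z - r| \le K/N}|\Gamma^{-1}(r) - \Gamma^{-1}(z)| = O_p\!\left(\frac1N\right),$$
and all the other terms are of order $O_p(1)$. Hence (A1) follows.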
ACKNOWLEDGEMENT
The authors would like to thank the referee for comments on a previous version of this
paper. W. K. Li's research is supported by the Hong Kong Research Grant Council.
REFERENCES
AKAIKE, H. (1973) Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory (eds. B. N. Petrov and F. Csaki). Budapest: Akademia Kiado, pp. 267-81.
BROCKWELL, P. J. and DAVIS, R. A. (1991) Time Series: Theory and Methods. New York: Springer.
CHAN, K. S. (1988) Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model. Technical Report 245, Department of Statistics, University of Chicago.
CHAN, K. S. (1993) Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model. Ann. Stat. 21, 520-33.
HURVICH, C. M. and TSAI, C. L. (1989) Regression and time series model selection in small samples. Biometrika 76, 297-307.
LI, W. K. (1988) The Akaike information criterion in threshold modelling: some empirical evidences. In Lecture Notes in Control and Information Sciences (eds. M. Thoma and A. Wyner). Heidelberg: Springer, pp. 88-96.
SCHWARZ, G. (1978) Estimating the dimension of a model. Ann. Stat. 6, 461-64.
SHIBATA, R. (1976) Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63, 117-26.
SHIBATA, R. (1981) An optimal selection of regression variables. Biometrika 68, 45-54.
TONG, H. (1978) On a threshold model. In Pattern Recognition and Signal Processing (ed. C. H. Chen). Amsterdam: Sijthoff and Noordhoff, pp. 575-86.
TONG, H. (1983) Threshold models in non-linear time series analysis. Springer Lecture Notes in Statistics, Vol. 21. New York: Springer.
TONG, H. (1990) Nonlinear Time Series: A Dynamical System Approach. Oxford: Oxford University Press.
