Wong and Li
BY C. S. WONG AND W. K. LI
The University of Hong Kong
First version received November 1995
1. INTRODUCTION
(1988) suggested using AIC to solve the problem, where AIC for a model is
defined as minus twice the maximized log-likelihood plus twice the number of
independently adjusted parameters. However, no theoretical justification for the
procedure has been given. The empirical studies done by Li (1988) suggest that
the minimum AIC procedure can be effective, but they have not taken into
account the small sample situation. A modified version of AICC for SETAR
models was incorporated into the computer package STAR (developed by Tong)
for fitting non-linear time series models, but neither theoretical nor empirical
justification has been made in the literature.
In this paper, this gap is filled for AICC in SETAR model selection. This
cannot be done without knowledge of the asymptotic properties of the
conditional least squares estimator (CLSE) for the SETAR model (Chan, 1993).
We will show that AICC is an asymptotically unbiased estimator for the
Kullback–Leibler information
$$\Delta(\psi \mid \xi) = E_\xi\{-2 \ln f(X; \psi)\}$$
which can be interpreted as a measure of how close the two densities $f(\cdot; \psi)$
and $f(\cdot; \xi)$ are. The small sample properties of AIC, AICC and the Bayesian
information criterion BIC (Schwarz, 1978) will be investigated by some
simulation studies. The results suggest that the performance of AICC is much
better than those of AIC and BIC in the small sample situation.
The organization of the paper is as follows. In Section 2 we discuss the
asymptotic properties of the CLSE and the proof will be given in the
Appendix. Section 3 develops AICC for SETAR models. Section 4 gives the
minimum AICC procedure in SETAR model selection. The small sample
properties of AIC, AICC as well as BIC are demonstrated via simulation
experiments in Section 5. An example is also given in that section. The
concluding remarks are given in Section 6.
is defined as
$$S_X(\phi_1, \phi_2, r) = S_{X1}(\phi_1, r) + S_{X2}(\phi_2, r)$$
where
$$S_{X1}(\phi_1, r) = \sum (X_t - \phi_{10} - \phi_{11} X_{t-1} - \cdots - \phi_{1p_1} X_{t-p_1})^2 \, I(X_{t-d} \le r)$$
$$S_{X2}(\phi_2, r) = \sum (X_t - \phi_{20} - \phi_{21} X_{t-1} - \cdots - \phi_{2p_2} X_{t-p_2})^2 \, I(X_{t-d} > r)$$
and $I(\cdot)$ is the indicator function. All the summations, unless otherwise stated,
are from $p + 1$ to $n$ and $N = n - p$ is the effective number of observations. The
CLSEs $\hat\phi_1(\hat r)$, $\hat\phi_2(\hat r)$ and $\hat r$ are the values which globally minimize
$S_X(\phi_1, \phi_2, r)$. The CLSEs $\hat\sigma_1^2(\hat r)$ and $\hat\sigma_2^2(\hat r)$ are defined as
$$\hat\sigma_i^2(\hat r) = \frac{S_{Xi}\{\hat\phi_i(\hat r), \hat r\}}{n_i(\hat r)}, \qquad i = 1, 2$$
where $n_1(\hat r) = \sum I(X_{t-d} \le \hat r)$ and $n_2(\hat r) = \sum I(X_{t-d} > \hat r)$. The following
theorem is required in the derivation of the corrected Akaike information
criterion. The proof is given in the Appendix.
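As an illustration of the definitions above, the following Python sketch fits the two regimes by least squares for one fixed candidate threshold; repeating it over candidate thresholds and keeping the smallest total residual sum of squares would approximate the CLSE. The function name `setar_clse`, its arguments and the use of NumPy are assumptions made for illustration and are not part of the paper. The sketch also assumes $p = \max(p_1, p_2, d)$ and that each regime contains more observations than parameters.

```python
import numpy as np

def setar_clse(x, p1, p2, d, r):
    """Least squares fit of a two-regime SETAR model for a fixed threshold r.

    Returns one dict per regime with the coefficient vector (intercept first),
    the residual sum of squares S_Xi, the regime size n_i and sigma_i^2 = S_Xi / n_i.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    p = max(p1, p2, d)                  # assumed definition of p
    y = x[p:]                           # responses X_t, t = p+1, ..., n
    z = x[p - d:n - d]                  # threshold variable X_{t-d}
    fits = []
    for order, mask in ((p1, z <= r), (p2, z > r)):
        # design matrix: intercept plus lags 1..order, restricted to the regime
        cols = [np.ones(n - p)] + [x[p - j:n - j] for j in range(1, order + 1)]
        U = np.column_stack(cols)[mask]
        phi, *_ = np.linalg.lstsq(U, y[mask], rcond=None)
        rss = float(np.sum((y[mask] - U @ phi) ** 2))
        n_i = int(mask.sum())
        fits.append({"phi": phi, "rss": rss, "n": n_i, "sigma2": rss / n_i})
    return fits
```

The sum of the two `rss` values corresponds to $S_X(\phi_1, \phi_2, r)$ evaluated at the least squares coefficients for that threshold.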
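THEOREM. Under the four conditions stated in Chan (1993, pp. 522–23),
(T1) $\hat r = r + O_p(1/N)$;
(T2) $\hat\phi_1(\hat r)$, $\hat\phi_2(\hat r)$, $\hat\sigma_1^2(\hat r)$ and $\hat\sigma_2^2(\hat r)$ are asymptotically independent of $\hat r$ and
their asymptotic distributions are the same as those for the case when $r$ is
known;
(T3) given $r$,
$$N^{1/2} \begin{pmatrix} \hat\phi_1(r) - \phi_1(r) \\ \hat\phi_2(r) - \phi_2(r) \end{pmatrix} \xrightarrow{\,d\,} N_{p_1 + p_2 + 2}\left( 0, \begin{pmatrix} \sigma_1^2 V_1^{-1}(r) & 0 \\ 0 & \sigma_2^2 V_2^{-1}(r) \end{pmatrix} \right)$$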
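$$-2 \ln L_Y\{\hat\phi_1(\hat r), \hat\phi_2(\hat r), \hat\sigma_1^2(\hat r), \hat\sigma_2^2(\hat r), \hat r\} = \sum_{i=1}^{2} n_i(\hat r) \ln \hat\sigma_i^2(\hat r) + \sum_{i=1}^{2} \hat\sigma_i^{-2}(\hat r)\, S_{Yi}\{\hat\phi_i(\hat r), \hat r\}$$
$$E_\xi\{\Delta(\hat\xi \mid \xi)\} = E_{\phi_1, \phi_2, \sigma_1^2, \sigma_2^2, r}\left[-2 \ln L_Y\{\hat\phi_1(\hat r), \hat\phi_2(\hat r), \hat\sigma_1^2(\hat r), \hat\sigma_2^2(\hat r), \hat r\}\right]$$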
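The first term of (3.1) is estimated consistently by $n_1(\hat r) \ln \hat\sigma_1^2(\hat r) + n_2(\hat r) \ln \hat\sigma_2^2(\hat r)$.
For the second term
$$\begin{aligned}
& E_r\left(E_{\phi_1, \phi_2, \sigma_1^2, \sigma_2^2}\left[\hat\sigma_1^{-2}(\hat r)\, S_{Y1}\{\hat\phi_1(\hat r), \hat r\} + \hat\sigma_2^{-2}(\hat r)\, S_{Y2}\{\hat\phi_2(\hat r), \hat r\} \mid r = \hat r\right]\right) \\
&\quad = E_r\left(E_{\phi_1, \sigma_1^2}\left[\hat\sigma_1^{-2}(\hat r)\, S_{Y1}\{\hat\phi_1(\hat r), \hat r\} \mid r = \hat r\right] + E_{\phi_2, \sigma_2^2}\left[\hat\sigma_2^{-2}(\hat r)\, S_{Y2}\{\hat\phi_2(\hat r), \hat r\} \mid r = \hat r\right]\right).
\end{aligned}$$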
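Expanding $S_{Y1}\{\hat\phi_1(\hat r), \hat r\}$ at the `true' parameter $\phi_1(\hat r)$ assuming that $\hat r$ is the true
value of $r$, by abusing the notation, we have
$$\begin{aligned}
S_{Y1}(\hat\phi_1) &= S_{Y1}(\phi_1) + (\hat\phi_1 - \phi_1)' \frac{\partial S_{Y1}(\phi_1)}{\partial \phi_1} + \frac{1}{2} (\hat\phi_1 - \phi_1)' \frac{\partial^2 S_{Y1}(\phi_1)}{\partial \phi_1 \, \partial \phi_1'} (\hat\phi_1 - \phi_1) \\
&= S_{Y1}(\phi_1) + (\hat\phi_1 - \phi_1)' \left\{ 2 \sum U_t(\hat r)\, a_t(\phi_1) \right\} + (\hat\phi_1 - \phi_1)' \left\{ \sum U_t(\hat r)\, U_t(\hat r)' \right\} (\hat\phi_1 - \phi_1).
\end{aligned}$$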
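It is easy to see that $N^{-1} \sum U_t(\hat r)\, U_t(\hat r)' \to V_1(\hat r)$ and $\sum U_t(\hat r)\, a_t(\phi_1)$ has mean zero.
Replacing $\sum U_t(\hat r)\, U_t(\hat r)'$ by $N V_1(\hat r)$, we obtain
$$\{n_1(\hat r) + p_1 + 1\}\, \sigma_1^2(\hat r) \left[ \frac{n_1(\hat r) - (p_1 + 1) - 2}{n_1(\hat r)}\, \sigma_1^2(\hat r) \right]^{-1} = \frac{n_1(\hat r)\{n_1(\hat r) + p_1 + 1\}}{n_1(\hat r) - p_1 - 3}$$
and hence
$$E_r\left( E_{\phi_1, \sigma_1^2}\left[ \frac{S_{Y1}\{\hat\phi_1(\hat r), \hat r\}}{\hat\sigma_1^2(\hat r)} \,\Big|\, r = \hat r \right] \right)$$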
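The bracketed inverse in the expression obtained above can be read as an inverse chi-squared moment. The short derivation below is a sketch under an assumption that is not stated in this part of the text: that, given $\hat r$, the quantity $n_1 \hat\sigma_1^2 / \sigma_1^2$ behaves like a $\chi^2_\nu$ variable with $\nu = n_1 - (p_1 + 1)$ degrees of freedom. It uses the standard fact $E(1/\chi^2_\nu) = 1/(\nu - 2)$ for $\nu > 2$.

```latex
E\left[\frac{1}{\hat\sigma_1^2}\right]
  = \frac{n_1}{\sigma_1^2}\, E\left[\frac{1}{\chi^2_\nu}\right]
  = \frac{n_1}{\sigma_1^2\,(\nu - 2)}
  = \frac{n_1}{\sigma_1^2\,(n_1 - p_1 - 3)},
\qquad
(n_1 + p_1 + 1)\,\sigma_1^2 \cdot \frac{n_1}{\sigma_1^2\,(n_1 - p_1 - 3)}
  = \frac{n_1 (n_1 + p_1 + 1)}{n_1 - p_1 - 3}.
```

Under that assumption the product reproduces the ratio $n_1(n_1 + p_1 + 1)/(n_1 - p_1 - 3)$, which requires $n_1 > p_1 + 3$.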
Denote the minimum as AICC$(X_{(i)})$ and the corresponding orders giving this
minimum as $k_1(X_{(i)})$, $k_2(X_{(i)})$. Note that in the calculation the first $\max(d, p_1, p_2)$
observations should be discarded to make the comparison meaningful.
(5) Find the minimum of AICC$(X_{(i)})$, $i \in I_r$, and take the value of $X_{(i)}$ giving
this minimum.
(6) The orders selected by minimizing AICC are $k_1(X_{(i)})$ and $k_2(X_{(i)})$ evaluated at
this minimizing $X_{(i)}$, which is also the estimate $\hat r$.
Tong (1990, p. 379) and Li (1988) suggested a similar procedure for
selecting the orders by minimizing AIC.
(5a) Find the minimum of AICC$\{X_{(i)}(d)\}$ over $d$ from 1 up to the largest delay
considered. Denote the value of $d$ giving this minimum as $\hat d$.
(6') The selected delay parameter is $\hat d$, the selected orders for the conditional
mean are $k_1\{X_{(i)}(\hat d)\}$, $k_2\{X_{(i)}(\hat d)\}$, and the estimate $\hat r = X_{(i)}(\hat d)$.
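The search described in the steps above can be sketched in Python as a grid search over the delay, the candidate thresholds and the two orders. Several details are assumptions, because the earlier steps of the procedure are not available here: candidate thresholds are taken to be the observed values between the sample quantiles `a` and `b`, orders run from 1 to the given maxima, and the per-regime AICC term is taken to be $n_i \ln \hat\sigma_i^2 + n_i(n_i + p_i + 1)/(n_i - p_i - 3)$, following the ratio derived in Section 3. The names `regime_term` and `select_setar` are hypothetical.

```python
import numpy as np

def regime_term(y, U):
    """n_i * ln(sigma_i^2) plus the assumed AICC penalty for one regime."""
    n, k = U.shape                      # k = order + 1 (intercept included)
    if n - k - 2 <= 0:                  # penalty needs n_i - p_i - 3 > 0
        return np.inf
    phi, *_ = np.linalg.lstsq(U, y, rcond=None)
    sigma2 = np.sum((y - U @ phi) ** 2) / n
    return n * np.log(sigma2) + n * (n + k) / (n - k - 2)

def select_setar(x, max_p1, max_p2, max_d, a=0.25, b=0.75):
    """Grid search over delay, threshold and orders; returns (min AICC, choice)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = max(max_d, max_p1, max_p2)      # discard the first m observations
    y = x[m:]
    max_p = max(max_p1, max_p2)
    lags = np.column_stack([np.ones(n - m)] +
                           [x[m - j:n - j] for j in range(1, max_p + 1)])
    lo, hi = np.quantile(x, [a, b])     # assumed meaning of a and b
    candidates = np.unique(x[(x >= lo) & (x <= hi)])
    best_value, best_choice = np.inf, None
    for d in range(1, max_d + 1):
        z = x[m - d:n - d]              # threshold variable X_{t-d}
        for r in candidates:
            low = z <= r
            # the criterion is a sum over regimes, so each order is minimized separately
            t1 = [regime_term(y[low], lags[low, :k + 1]) for k in range(1, max_p1 + 1)]
            t2 = [regime_term(y[~low], lags[~low, :k + 1]) for k in range(1, max_p2 + 1)]
            value = min(t1) + min(t2)
            if value < best_value:
                best_value = value
                best_choice = {"d": d, "r": float(r),
                               "k1": int(np.argmin(t1)) + 1,
                               "k2": int(np.argmin(t2)) + 1}
    return best_value, best_choice
```

Using the same discarded prefix for every candidate model keeps the effective sample identical across comparisons, which is the point of the note on discarding observations.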
In order to compare the performance of AIC, AICC and BIC in the selection of
SETAR models, we performed seven simulation experiments. The procedures for
order selection by AIC and BIC are similar to those mentioned in Section 4,
with the criterion AICC replaced by AIC or BIC. The definitions of AIC and BIC
used for SETAR models are
$$\mathrm{AIC}(p_1, p_2) = n_1(\hat r) \ln \hat\sigma_1^2(\hat r) + n_2(\hat r) \ln \hat\sigma_2^2(\hat r) + 2(p_1 + p_2 + 2)$$
$$\mathrm{BIC}(p_1, p_2) = n_1(\hat r) \ln \hat\sigma_1^2(\hat r) + n_2(\hat r) \ln \hat\sigma_2^2(\hat r) + (p_1 + 1) \ln n_1(\hat r) + (p_2 + 1) \ln n_2(\hat r).$$
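For a quick numerical comparison of the three criteria, a minimal Python sketch follows. The AIC and BIC lines transcribe the two definitions above. The AICC line is an assumption: it applies the ratio $n_i(n_i + p_i + 1)/(n_i - p_i - 3)$ from the Section 3 derivation to each regime, and it requires $n_i > p_i + 3$. The function name `setar_criteria` and its argument names are hypothetical.

```python
import math

def setar_criteria(n1, n2, s1, s2, p1, p2):
    """Return (AIC, AICC, BIC) for a two-regime SETAR fit.

    n1, n2: regime sample sizes; s1, s2: regime variance estimates;
    p1, p2: autoregressive orders of the two regimes.
    """
    fit = n1 * math.log(s1) + n2 * math.log(s2)
    aic = fit + 2 * (p1 + p2 + 2)
    bic = fit + (p1 + 1) * math.log(n1) + (p2 + 1) * math.log(n2)
    # assumed AICC penalty, one term per regime
    aicc = fit + sum(n * (n + p + 1) / (n - p - 3)
                     for n, p in ((n1, p1), (n2, p2)))
    return aic, aicc, bic
```

Each assumed AICC term exceeds $n_i$ by $2 n_i (p_i + 2)/(n_i - p_i - 3)$, so the penalty for an extra lag grows sharply as $p_i$ approaches $n_i - 3$, whereas the AIC penalty grows by a constant 2 per parameter.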
For the case where $r$ is unknown but $d$ is known, we carried out four
simulation experiments to compare the performance of AIC, AICC and BIC.
The models simulated are of the form
$$X_t = \begin{cases} \phi_{11} X_{t-1} + a_t & \text{if } X_{t-1} \le 0.0 \\ \phi_{21} X_{t-1} + a_t & \text{if } X_{t-1} > 0.0 \end{cases} \qquad (5.1)$$
where $a_t \sim N(0, 1)$. The number of replications was 100 in all the experiments
below. For model (1), $\phi_{11} = 0.5$, $\phi_{21} = -0.5$, and for model (2), $\phi_{11} = -0.8$,
$\phi_{21} = -0.2$. We assume that $d$ is known to be unity and we use $a = 0.25$,
$b = 0.75$ in the studies. In Table I, we use sample size $n = 50$ and
$p_1 = p_2 = 5$. In Table II, $n = 75$ and $p_1 = p_2 = 10$.
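A series from model (5.1) can be generated with a few lines of Python. This is only a sketch: the burn-in length, the zero starting value and the seeding are assumptions, since the text does not say how the simulated series were initialized, and the function name `simulate_setar` is hypothetical.

```python
import numpy as np

def simulate_setar(n, phi11, phi21, burn_in=200, seed=None):
    """Simulate n observations from model (5.1) with N(0, 1) innovations."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    a = rng.standard_normal(total)
    x = np.zeros(total)                 # assumed starting value X_0 = 0
    for t in range(1, total):
        # lower regime when X_{t-1} <= 0, upper regime otherwise
        phi = phi11 if x[t - 1] <= 0.0 else phi21
        x[t] = phi * x[t - 1] + a[t]
    return x[burn_in:]                  # drop the burn-in segment

# model (1) with the sample size used for Table I
series = simulate_setar(50, 0.5, -0.5, seed=1)
```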
We get similar results from Tables I and II in spite of the different settings
of the simulation experiments. The performance of AIC is unacceptable. The
orders are identified correctly (i.e. both $k_1$ and $k_2$ are correct) in only
about 23% and 11% of the replications when $n = 50$ and $n = 75$ respectively.
The problem of overfitting is quite severe. If we define overfitting as either one
of the orders being greater than 3, then about 48% (in the case $n = 50$) and 83% (in
the case $n = 75$) of the replications are overfitted by AIC. The performance of
AICC is much better in terms of both a higher rate of correct identification
(about 65% for $n = 50$ and 50% for $n = 75$) and less overfitting (about 6% for
$n = 50$ and 18% for $n = 75$). Although BIC is designed to overcome the
inconsistency of AIC, its small sample performance is not as good as that of
AICC. The rates of correct identification are lower (about 43% for $n = 50$ and
42% for $n = 75$) and the tendency to overfit is higher (about 32% for $n = 50$
and 45% for $n = 75$) than for AICC, as observed in Tables I and II. As the
maximum allowable number of parameters is proportional to the number of
observations, overfitting with AIC and BIC is expected, as in the case of linear
time series. However, it is clearly not a great problem with AICC. Hence AICC
TABLE I
SIMULATION RESULTS WHEN r IS UNKNOWN AND d IS KNOWN

Model (1): $\phi_{11} = 0.5$, $\phi_{21} = -0.5$
k1 \ k2    1             2            3            4            5
1          60, 21, 40    9, 7, 6      6, 8, 6      1, 6, 8      0, 9, 13
2          16, 8, 7      0, 1, 0      2, 2, 2      1, 4, 3      0, 2, 0
3          0, 5, 5       1, 0, 1      0, 1, 0      0, 0, 0      0, 3, 0
4          3, 4, 3       1, 1, 0      0, 0, 0      0, 1, 0      0, 1, 0
5          0, 8, 4       0, 5, 2      0, 1, 0      0, 2, 0      0, 0, 0

Model (2): $\phi_{11} = -0.8$, $\phi_{21} = -0.2$
k1 \ k2    1             2            3            4            5
1          69, 24, 45    4, 5, 6      3, 4, 5      1, 7, 7      0, 6, 8
2          13, 7, 8      0, 2, 0      1, 1, 0      1, 1, 1      0, 6, 1
3          4, 5, 5       0, 1, 0      0, 2, 0      0, 0, 0      0, 0, 0
4          0, 5, 5       1, 1, 0      0, 3, 1      0, 1, 0      0, 1, 0
5          2, 13, 7      0, 1, 0      1, 1, 1      0, 2, 0      0, 1, 0

Notes: $n = 50$, $p_1 = p_2 = 5$; first value AICC, second value AIC, third value BIC; entries are
frequencies of selecting $k_1$ (rows) and $k_2$ (columns).
TABLE II
SIMULATION RESULTS WHEN r IS UNKNOWN AND d IS KNOWN

Model (1): $\phi_{11} = 0.5$, $\phi_{21} = -0.5$
k1 \ k2    1             2            3            4–6          7–10
1          48, 13, 42    6, 0, 0      6, 3, 2      5, 4, 3      1, 26, 22
2          15, 2, 6      2, 0, 1      1, 1, 0      1, 0, 2      1, 9, 4
3          4, 1, 2       2, 0, 0      0, 0, 0      0, 1, 0      0, 3, 1
4–6        6, 7, 4       0, 0, 0      0, 0, 0      0, 1, 0      0, 1, 0
7–10       2, 18, 11     0, 2, 0      0, 0, 0      0, 2, 0      0, 6, 0

Model (2): $\phi_{11} = -0.8$, $\phi_{21} = -0.2$
k1 \ k2    1             2            3            4–6          7–10
1          52, 8, 42     9, 3, 7      3, 1, 0      6, 7, 2      4, 28, 20
2          10, 2, 5      1, 0, 1      1, 0, 0      1, 0, 0      0, 3, 2
3          3, 1, 2       1, 0, 0      0, 0, 0      0, 0, 0      0, 3, 1
4–6        5, 3, 9       1, 2, 0      1, 2, 0      2, 2, 0      0, 5, 0
7–10       0, 16, 8      0, 3, 0      0, 0, 1      0, 6, 0      0, 5, 0

Notes: $n = 75$, $p_1 = p_2 = 10$; first value AICC, second value AIC, third value BIC; entries are
frequencies of selecting $k_1$ (rows) and $k_2$ (columns).
is a better criterion than AIC and BIC in order selection for SETAR models in
the small sample situation.
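The identification and overfitting rates quoted for $n = 50$ can be recomputed directly from the Table I frequencies. The Python sketch below is only a bookkeeping aid: the array layout and the function name `rates` are illustrative choices, the counts are those printed in Table I, the true orders are $k_1 = k_2 = 1$, and overfitting means that either selected order exceeds 3.

```python
import numpy as np

# Table I frequencies: rows k1 = 1..5, columns k2 = 1..5,
# last axis ordered as (AICC, AIC, BIC).
model1 = np.array([
    [[60, 21, 40], [9, 7, 6], [6, 8, 6], [1, 6, 8], [0, 9, 13]],
    [[16, 8, 7], [0, 1, 0], [2, 2, 2], [1, 4, 3], [0, 2, 0]],
    [[0, 5, 5], [1, 0, 1], [0, 1, 0], [0, 0, 0], [0, 3, 0]],
    [[3, 4, 3], [1, 1, 0], [0, 0, 0], [0, 1, 0], [0, 1, 0]],
    [[0, 8, 4], [0, 5, 2], [0, 1, 0], [0, 2, 0], [0, 0, 0]],
])
model2 = np.array([
    [[69, 24, 45], [4, 5, 6], [3, 4, 5], [1, 7, 7], [0, 6, 8]],
    [[13, 7, 8], [0, 2, 0], [1, 1, 0], [1, 1, 1], [0, 6, 1]],
    [[4, 5, 5], [0, 1, 0], [0, 2, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 5, 5], [1, 1, 0], [0, 3, 1], [0, 1, 0], [0, 1, 0]],
    [[2, 13, 7], [0, 1, 0], [1, 1, 1], [0, 2, 0], [0, 1, 0]],
])

def rates(table):
    """Return (correct, overfitted) counts per criterion out of 100 replications."""
    total = table.sum(axis=(0, 1))
    correct = table[0, 0]                        # k1 = k2 = 1
    within = table[:3, :3].sum(axis=(0, 1))      # both orders at most 3
    return correct, total - within

c1, o1 = rates(model1)   # correct [60, 21, 40], overfitted [6, 47, 33]
c2, o2 = rates(model2)   # correct [69, 24, 45], overfitted [6, 49, 31]
print((c1 + c2) / 2)     # [64.5 22.5 42.5]
print((o1 + o2) / 2)     # [ 6. 48. 32.]
```

Averaging the two models gives correct identification in 64.5%, 22.5% and 42.5% of replications and overfitting in 6%, 48% and 32% for AICC, AIC and BIC respectively, in line with the approximate percentages reported for $n = 50$.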
For the case where both $r$ and $d$ are unknown, we carried out three
simulation experiments. Model (1) above was simulated in these experiments.
Tables III and IV present the results for the small sample situation and Table V
presents the results for a larger sample size. In Table III, we use a sample size
of 50 with $p_1 = p_2 = 5$ and $d = 6$. The sample sizes in Tables IV and V are
75 and 150 respectively. In both tables, $p_1 = p_2 = 10$ and $d = 6$.
From Tables III and IV, it can be observed that AICC performs slightly
better in identifying the delay parameter than AIC and BIC. However, the rates
TABLE III
SIMULATION RESULTS WHEN BOTH r AND d ARE UNKNOWN

Correct identification of d: 40, 32, 34
k1 \ k2    1             2            3            4            5
1          59, 6, 21     8, 2, 3      6, 8, 6      1, 5, 10     0, 21, 24
2          19, 4, 5      1, 1, 0      1, 0, 0      0, 2, 2      0, 3, 1
3          1, 5, 4       1, 2, 1      0, 0, 0      0, 1, 0      0, 2, 0
4          2, 5, 6       0, 1, 0      0, 0, 0      0, 0, 0      0, 3, 0
5          0, 16, 12     0, 4, 4      0, 1, 0      0, 6, 0      1, 2, 1

Notes: Model (1), $\phi_{11} = 0.5$, $\phi_{21} = -0.5$, $n = 50$, $p_1 = p_2 = 5$, $d = 6$; first value AICC, second
value AIC, third value BIC; entries are frequencies of selecting $k_1$ (rows) and $k_2$ (columns).
TABLE IV
SIMULATION RESULTS WHEN BOTH r AND d ARE UNKNOWN

TABLE V
SIMULATION RESULTS WHEN BOTH r AND d ARE UNKNOWN
6. CONCLUSION
From the above discussion, it is essential that the series contain at least
150 observations for the CLSE of the delay parameter to be reliable. Also, it is well
known that AIC and AICC are not consistent model order selection criteria for
linear time series models (Shibata, 1976). From the above simulation
experiments, we can conclude that AIC and AICC are both inconsistent selection
criteria while BIC is consistent in SETAR model selection. However, the
asymptotic efficiency of AIC and hence AICC in the linear model setting
(Shibata, 1981, p. 53) should be carried over to the SETAR model, as the
SETAR model is just a piecewise linear model. Since a true model rarely exists,
the property of asymptotic efficiency appears to be more desirable.
In conclusion, while AICC retains similar large sample properties to AIC,
it performs much better than AIC and even BIC in small sample situations. We
suggest that AICC rather than AIC should be put into routine usage for the
order selection of SETAR models.
APPENDIX
The proof of (T1) and part of (T2), i.e. the asymptotic independence of $\hat\phi_1(\hat r)$, $\hat\phi_2(\hat r)$ with
$\hat r$, was given in Theorem 2 of Chan (1993). For (T3), the result follows by applying
Theorems 5.4 and 5.5 in Tong (1990). For (T4), note that model (2.1) is piecewise linear
and the result can be deduced directly from the theory of least squares estimation for the
general linear model.
For the second part of (T2), for simplicity of exposition, we just give the proof for the
simpler case when $\sigma_1^2 = \sigma_2^2 = \sigma^2$; it can easily be extended to the general case. First,
we have to prove that
$$\sup_{|z - r| \le K/N} |\hat\sigma^2(z) - \hat\sigma^2(r)| = O_p\left(\frac{1}{N}\right). \qquad (A1)$$
The method of the proof is adapted from Chan (1988, pp. 24–27). (A1) and (T1) imply
the asymptotic independence of $\hat\sigma^2(\hat r)$ with $\hat r$. Let
$$X = (X_{p+1}, \ldots, X_n)'$$
$$H(z) = \begin{pmatrix} U_{p+1}(z) & U_{p+2}(z) & \cdots & U_n(z) \\ W_{p+1}(z) & W_{p+2}(z) & \cdots & W_n(z) \end{pmatrix}'.$$
Then $\hat\sigma^2(z) = X'[I - H(z)\{H(z)'H(z)\}^{-1}H(z)']X / N$.
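The projection form of $\hat\sigma^2(z)$ can be checked numerically. The Python sketch below builds $H(z)$ and evaluates the quadratic form, under an assumption about notation that is not spelled out in this part of the text: $U_t(z)$ is taken to be $(1, X_{t-1}, \ldots, X_{t-p_1})' \, I(X_{t-d} \le z)$ and $W_t(z)$ the analogous vector of length $p_2 + 1$ multiplied by $I(X_{t-d} > z)$, with $p = \max(p_1, p_2, d)$. The function name `sigma2_projection` is hypothetical.

```python
import numpy as np

def sigma2_projection(x, p1, p2, d, z):
    """Evaluate X'[I - H (H'H)^{-1} H'] X / N for threshold z."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    p = max(p1, p2, d)                  # assumed definition of p
    y = x[p:]                           # the vector (X_{p+1}, ..., X_n)'
    low = (x[p - d:n - d] <= z)[:, None]
    ones = np.ones(n - p)
    # regime design blocks, zeroed outside their own regime
    U = np.column_stack([ones] + [x[p - j:n - j] for j in range(1, p1 + 1)]) * low
    W = np.column_stack([ones] + [x[p - j:n - j] for j in range(1, p2 + 1)]) * ~low
    H = np.hstack([U, W])
    P = H @ np.linalg.solve(H.T @ H, H.T)        # projection onto the columns of H
    return float(y @ (np.eye(n - p) - P) @ y) / (n - p)
```

Because the two blocks of $H(z)$ have disjoint support, $H(z)'H(z)$ is block diagonal, and the value returned equals the pooled residual sum of squares of the two regime-wise regressions divided by $N$.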
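Let $\Gamma(z) = H(z)'H(z)/N$. It is easy to see that $\Gamma(z)$ can be written as
$$\begin{pmatrix} \Gamma_1(z) & 0 \\ 0 & \Gamma_2(z) \end{pmatrix}.$$
For $N$ large, $\Gamma_i(z)$, $i = 1, 2$, will be of full rank and the inverse operation exists. By the
ergodicity of $X_t$, $\Gamma_i(z) \to V_i(z)$ almost surely, $i = 1, 2$. Hence
$$\Gamma(z) \to \begin{pmatrix} V_1(z) & 0 \\ 0 & V_2(z) \end{pmatrix} \quad \text{a.s.}$$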
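Now,
$$\begin{aligned}
|\hat\sigma^2(z) - \hat\sigma^2(r)| &= \left| \frac{1}{N} X'[H(r)\{H(r)'H(r)\}^{-1}H(r)']X - \frac{1}{N} X'[H(z)\{H(z)'H(z)\}^{-1}H(z)']X \right| \\
&= \left| \frac{1}{N} X'H(r)\, \Gamma^{-1}(r)\, \frac{1}{N} H(r)'X - \frac{1}{N} X'H(z)\, \Gamma^{-1}(z)\, \frac{1}{N} H(z)'X \right| \\
&\le \left| \frac{1}{N} X'\{H(r) - H(z)\}\, \Gamma^{-1}(r)\, \frac{1}{N} H(r)'X \right| + \left| \frac{1}{N} X'H(z)\, [\Gamma^{-1}(r) - \Gamma^{-1}(z)]\, \frac{1}{N} H(r)'X \right| + \cdots
\end{aligned}$$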
ACKNOWLEDGEMENT
The authors would like to thank the referee for comments on a previous version of this
paper. W. K. Li's research is supported by the Hong Kong Research Grant Council.
REFERENCES
AKAIKE, H. (1973) Information theory and an extension of the maximum likelihood principle. In
2nd International Symposium on Information Theory (eds. B. N. Petrov and F. Csaki). Budapest:
Akademia Kiado, pp. 267–81.
BROCKWELL, P. J. and DAVIS, R. A. (1991) Time Series: Theory and Methods. New York: Springer.
CHAN, K. S. (1988) Consistency and limiting distribution of the least squares estimator of a
threshold autoregressive model. Technical Report 245, Department of Statistics, University of
Chicago.
CHAN, K. S. (1993) Consistency and limiting distribution of the least squares estimator of a threshold
autoregressive model. Ann. Stat. 21, 520–33.
HURVICH, C. M. and TSAI, C. L. (1989) Regression and time series model selection in small
samples. Biometrika 76, 297–307.
LI, W. K. (1988) The Akaike information criterion in threshold modelling: some empirical
evidences. In Lecture Notes in Control and Information Sciences (eds. M. Thoma and A.
Wyner). Heidelberg: Springer, pp. 88–96.
SCHWARZ, G. (1978) Estimating the dimension of a model. Ann. Stat. 6, 461–64.
SHIBATA, R. (1976) Selection of the order of an autoregressive model by Akaike's information
criterion. Biometrika 63, 117–26.
SHIBATA, R. (1981) An optimal selection of regression variables. Biometrika 68, 45–54.
TONG, H. (1978) On a threshold model. In Pattern Recognition and Signal Processing (ed. C. H.
Chen). Amsterdam: Sijthoff and Noordhoff, pp. 575–86.
TONG, H. (1983) Threshold models in non-linear time series analysis. Springer Lecture Notes in
Statistics, Vol. 21. New York: Springer.
TONG, H. (1990) Nonlinear Time Series: A Dynamical System Approach. Oxford: Oxford University
Press.