
MIT OpenCourseWare

http://ocw.mit.edu

14.384 Time Series Analysis


Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

14.384 Time Series Analysis, Fall 2007

Professor Anna Mikusheva

Paul Schrimpf, scribe

September 20, 2007

Lecture 5

Spectrum Estimation and Information Criteria


Spectrum Estimation
Same setup as last time. We have a stationary series $\{z_t\}$ with covariances $\gamma_j$ and spectrum $S(\omega) = \sum_{j=-\infty}^{\infty} \gamma_j e^{-i\omega j}$. We want to estimate $S(\omega)$.

Naive approach

We cannot estimate all the covariances from a finite sample. Let's just estimate all the covariances that we can:
$$\hat\gamma_k = \frac{1}{T} \sum_{j=k+1}^{T} z_j z_{j-k}$$

and use them to form
$$\hat S(\omega) = \sum_{j=-(T-1)}^{T-1} \hat\gamma_j e^{-i\omega j}$$

This estimator is not consistent: it converges to a distribution instead of a point. To see this, let $y_\omega = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} e^{i\omega t} z_t$, so that
$$\hat S(\omega) = y_\omega \bar y_\omega .$$
If $\omega \neq 0$, then $\frac{2 \hat S(\omega)}{S(\omega)} \Rightarrow \chi^2(2)$.
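To see this concretely, here is a small Monte Carlo sketch (my own illustration, not part of the original notes; the white-noise setup and function names are assumptions). It checks that the raw periodogram at a fixed frequency has mean roughly $S(\omega)$ but a spread that does not shrink as $T$ grows, consistent with the $\chi^2(2)$ limit.

```python
import numpy as np

# Sketch (not from the lecture): the raw periodogram at a fixed frequency
# does not concentrate around S(omega) as T grows.
rng = np.random.default_rng(0)

def periodogram_at(z, omega):
    """S_hat(omega) = |T^{-1/2} sum_t e^{i omega t} z_t|^2."""
    t = np.arange(1, len(z) + 1)
    y = np.sum(np.exp(1j * omega * t) * z) / np.sqrt(len(z))
    return np.abs(y) ** 2

omega = np.pi / 4
for T in [100, 1_000, 10_000]:
    draws = [periodogram_at(rng.standard_normal(T), omega) for _ in range(2_000)]
    # For Gaussian white noise S(omega) = 1, so the mean should stay near 1
    # while the standard deviation also stays near 1 for every T.
    print(T, round(np.mean(draws), 3), round(np.std(draws), 3))
```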

Kernel Estimator

$$\hat S(\omega) = \sum_{j=-S_T}^{S_T} \left(1 - \frac{|j|}{S_T}\right) \hat\gamma_j e^{-i\omega j}$$

Under appropriate conditions on $S_T$ ($S_T \to \infty$, but more slowly than $T$), this estimator is consistent in a uniform sense, i.e. $P\left(\sup_{\omega \in [-\pi,\pi]} |\hat S(\omega) - S(\omega)| > \epsilon\right) \to 0$. This can be shown in a way similar to the way we showed the Newey-West estimator is consistent.
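The estimator itself is easy to compute. Below is a minimal sketch (my own code, not the course's; the AR(1) example and the choice $S_T = T^{1/3}$ are illustrative assumptions).

```python
import numpy as np

# Sketch of the truncated (Bartlett-weighted) kernel spectrum estimator above.
def sample_autocov(z, j):
    """gamma_hat_j = (1/T) * sum_{t=j+1}^{T} z_t z_{t-j}, for a mean-zero series."""
    T = len(z)
    return np.dot(z[j:], z[: T - j]) / T

def kernel_spectrum(z, omega, S_T):
    """S_hat(omega) = sum_{|j| <= S_T} (1 - |j|/S_T) gamma_hat_j e^{-i omega j}."""
    S_T = min(S_T, len(z) - 1)
    val = sample_autocov(z, 0)  # j = 0 term has weight 1
    for j in range(1, S_T + 1):
        w = 1.0 - j / S_T
        # gamma_hat_{-j} = gamma_hat_j for a scalar series, so the +/- j terms combine
        val += 2.0 * w * sample_autocov(z, j) * np.cos(omega * j)
    return val

# Example: AR(1) with coefficient 0.5; bandwidth growing slowly with T.
rng = np.random.default_rng(0)
T = 5_000
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
print(kernel_spectrum(z, omega=0.0, S_T=int(T ** (1 / 3))))  # true S(0) = 1/(1-0.5)^2 = 4
```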

Information Criteria
Suppose you want to estimate an AR(p), but you don't know the right $p$. As a first case, let's say you know an upper bound, $p \le \bar p$. Then we'll consider the case where $\bar p$ is not known or $p = \infty$. We'll begin by assuming $\bar p$ is known.


A bad way: testing down. We test $H_0$: AR($p-1$) vs. $H_A$: AR($p$). If we accept, we then test $p-2$ vs. $p-1$, and continue until rejecting. This procedure is bad because we are doing multiple testing, so its size is not the usual one: the individual significance level of each test differs from the probability of a type I error for the entire procedure. The rejection of $H_0^i$ automatically means the rejection of $H_0^{i+1}, \ldots, H_0^{\bar p}$. Also, this procedure won't be consistent, because if we choose a significance level $\alpha$, we will incorrectly reject a fraction $\alpha$ of the time no matter how large our sample is.

Using information criteria

Let $S_p = \ln \hat\sigma_p^2 + p\,g(T)$, where $\hat\sigma_p^2 = \frac{1}{T}\sum_t \hat e_t^2$ and $\hat e_t$ are the residuals from the model with $p$ terms (in the multivariate case we'd use $\det|\hat\Sigma|$ in place of $\hat\sigma^2$). The term $p\,g(T)$ is a penalty for having more parameters, where $g(T)$ is a non-random function. We choose
$$\hat p = \arg\min_{p \le \bar p} S_p .$$

Different forms of $g(T)$ have special names.

Definition 1.
AIC (Akaike): $g(T) = \frac{2}{T}$
BIC (Bayesian/Schwarz): $g(T) = \frac{\ln T}{T}$
HQIC (Hannan-Quinn): $g(T) = \frac{2 \ln \ln T}{T}$
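As a quick illustration of how these criteria are used (a sketch of my own, not the course's code; the AR(2) example and helper names are assumptions), one fits an AR(p) by OLS for each $p \le \bar p$ and minimizes $S_p$ under each penalty.

```python
import numpy as np

# Sketch: fit AR(p) by OLS for p = 1..p_bar and select p_hat by AIC/BIC/HQIC.
def ar_sigma2(z, p):
    """Mean squared OLS residual from regressing z_t on its first p lags."""
    T = len(z)
    X = np.column_stack([z[p - j - 1 : T - j - 1] for j in range(p)])
    y = z[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2)

def select_order(z, p_bar):
    T = len(z)
    penalties = {"AIC": 2 / T, "BIC": np.log(T) / T, "HQIC": 2 * np.log(np.log(T)) / T}
    out = {}
    for name, g in penalties.items():
        scores = [np.log(ar_sigma2(z, p)) + p * g for p in range(1, p_bar + 1)]
        out[name] = 1 + int(np.argmin(scores))  # arg min over p = 1..p_bar
    return out

# Example: an AR(2) process; BIC/HQIC should pick 2 with high probability.
rng = np.random.default_rng(1)
T = 2_000
z = np.zeros(T)
for t in range(2, T):
    z[t] = 0.5 * z[t - 1] - 0.3 * z[t - 2] + rng.standard_normal()
print(select_order(z, p_bar=8))
```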
Question: which of these are consistent?
Definition 2. $\hat p$ is weakly consistent if $\lim P(\hat p = p_0) = 1$ (when we just say consistent, we mean weakly consistent).

Definition 3. $\hat p$ is strongly consistent if $P(\lim \hat p = p_0) = 1$.

Theorem 4. If $g(T) \to 0$ and $T g(T) \to \infty$, then $\hat p \xrightarrow{p} p_0$.


Proof. As usual, we will only sketch the proof.

Lemma 5. If $g(T) \to 0$, then $P(\hat p \ge p_0) \to 1$.

Proof. Note that
$$P(\hat p \ge p_0) = P(S_{p_0} < \min\{S_0, S_1, \ldots, S_{p_0-1}\}).$$
This happens when $S_{p_0} < S_j$ for each $j < p_0$, which means
$$\log \hat\sigma_{p_0}^2 + p_0 g(T) < \log \hat\sigma_j^2 + j g(T),$$
i.e.
$$\log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} - (p_0 - j) g(T) > 0 .$$
Then since $p_0 > j$, it must be that $\operatorname{plim} \hat\sigma_{p_0}^2 < \operatorname{plim} \hat\sigma_j^2$, so $\operatorname{plim} \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} = b_j > 1$. Also, since $g(T) \to 0$, we must have
$$P\left( \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} - (p_0 - j) g(T) > 0 \right) \to 1$$
for each $j$, and since there are only finitely many $j < p_0$, we have
$$P(S_{p_0} < \min\{S_0, S_1, \ldots, S_{p_0-1}\}) \to 1 .$$


Lemma 6. If $T g(T) \to \infty$, then $P(\hat p \le p_0) \to 1$.

Proof. As above, note that
$$P(\hat p \le p_0) = P(S_{p_0} < \min\{S_{p_0+1}, S_{p_0+2}, \ldots, S_{\bar p}\}) = P\left( \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} + (j - p_0) g(T) > 0, \; j = p_0+1, \ldots, \bar p \right).$$
We can multiply both sides of the inequality by $T$ without changing the probability:
$$P(\hat p \le p_0) = P\left( T \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} + (j - p_0) T g(T) > 0, \; j = p_0+1, \ldots, \bar p \right).$$
Now $T \log \frac{\hat\sigma_{p_0}^2}{\hat\sigma_j^2} \Rightarrow \chi^2_{j-p_0}$, since it is like a likelihood ratio (in particular, it would be the likelihood ratio statistic for testing $a_{p_0+1} = \cdots = a_j = 0$, which is true), so each $T \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2}$ is bounded in probability. We have a finite number of these, and $T g(T) \to \infty$, so
$$T \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} + (j - p_0) T g(T) \to \infty ,$$
and the probability converges to 1.

The proof so far has been for weak consistency. If we additionally have $\frac{T g(T)}{2 \ln \ln T} > 1$, then $\hat p \to p_0$ a.s., so $\hat p$ is strongly consistent.

Corollary 7. BIC is strongly consistent. HQIC is consistent. AIC is not; it has a positive probability of overestimating the number of lags asymptotically.

Remark 8. When $\hat p$ is consistent, estimating an AR(p) model in this 2-step way has the same asymptotic distribution as if we knew $p$. However, in finite samples, not knowing $p$ can make a difference, and the finite-sample distribution of our estimates is influenced by model selection.

AIC and forecasting. Despite its inconsistency for picking the number of lags, the AIC is not useless: it minimizes the MSE of one-step-ahead forecasting. Suppose $y_t = \sum_j \beta_j x_{t-j} + e_t$. We estimate $\{\beta_j\}$ and form $\hat y_t = \sum_j \hat\beta_j x_{t-j}$. The MSE is
$$E[(y_t - \hat y_t)^2] = E\left[\left(e_t + \sum_j (\beta_j - \hat\beta_j) x_{t-j}\right)^2\right] = E e_t^2 + E\left[\left(\sum_j (\beta_j - \hat\beta_j) x_{t-j}\right)^2\right].$$
Some asymptotics tell us
$$E[(y_t - \hat y_t)^2] = \sigma_y^2 = \sigma_e^2 + \frac{p+1}{T}\sigma_e^2 = \frac{T+p+1}{T}\sigma_e^2 .$$

We want to minimize this by choosing $p$. Replacing $\sigma_e^2$ by the estimate $\frac{T}{T-(p+1)} \hat\sigma^2(p)$ and taking the log (a monotonic transformation),
$$\arg\min_p \frac{T+p+1}{T-(p+1)} \hat\sigma^2(p) = \arg\min_p \left[ \log \hat\sigma_p^2 + \log \frac{T+p+1}{T-(p+1)} \right].$$
Note that
$$\frac{T+p+1}{T-(p+1)} = 1 + \frac{2(p+1)}{T} + O\left(\frac{1}{T^2}\right),$$
so $\log \frac{T+p+1}{T-(p+1)} = \frac{2(p+1)}{T} + O\left(\frac{1}{T^2}\right)$. Therefore,
$$\log \hat\sigma_p^2 + \log \frac{T+p+1}{T-(p+1)} = \log \hat\sigma_p^2 + \frac{2(p+1)}{T} + O\left(\frac{1}{T^2}\right) = \text{AIC} + O\left(\frac{1}{T^2}\right).$$
Thus, the AIC minimizes the variance of the one-step-ahead forecast error.

Consider an alternative approach, predictive least squares:
$$y_t = x_t' \beta + u_t ,$$
$$\hat\beta_{t-1} = (X_{t-1}' X_{t-1})^{-1} X_{t-1}' y_{t-1} ,$$
$$\hat y_{t|t-1} = \hat\beta_{t-1}' x_t ,$$
$$PLS(p) = \sum_{t=m+1}^{T} (y_t - \hat y_{t|t-1})^2 .$$
Choose $p$ to minimize $PLS(p)$.

Wei (1992) shows that for $m$ fixed,
$$\frac{1}{T-m} PLS(p) = S_p^{BIC} + o\left(\frac{1}{T}\right), \qquad \hat p_{PLS} = \hat p_{BIC} + o_p(1).$$
Inoue and Kilian (2002) show that if $m = cT$,
$$\frac{1}{T-m} PLS(p) = S_p^{AIC} + o\left(\frac{1}{T^2}\right).$$
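A rough sketch of the PLS computation for an AR(p) follows (my own illustration, not the course's code; the burn-in $m = 50$ and the AR(1) example are assumptions). Each one-step forecast uses coefficients estimated only from data up to $t-1$.

```python
import numpy as np

# Sketch: predictive least squares for an AR(p). For each t > m, fit the AR(p)
# by OLS on observations up to t-1, forecast y_t, and sum squared forecast errors.
def pls(z, p, m):
    total = 0.0
    for t in range(m, len(z)):
        X = np.column_stack([z[p - j - 1 : t - j - 1] for j in range(p)])
        y = z[p:t]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        x_t = z[t - p : t][::-1]            # (z_{t-1}, ..., z_{t-p})
        total += (z[t] - x_t @ beta) ** 2
    return total

rng = np.random.default_rng(2)
T = 500
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.6 * z[t - 1] + rng.standard_normal()
m = 50  # burn-in so the first regression has enough observations
scores = {p: pls(z, p, m) for p in range(1, 6)}
print(min(scores, key=scores.get), scores)
```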
Remark 9. A larger point is that the best model selection criterion depends on what your goal is.

Example 10. Suppose we want to estimate the spectrum by fitting an AR($\infty$) model,
$$a(L) y_t = e_t , \qquad a(L) = 1 + \sum_{j=1}^{\infty} a_j L^j .$$
The spectrum is
$$S(\omega) = \frac{\sigma_e^2}{\left| 1 + \sum_{j=1}^{\infty} a_j e^{-i\omega j} \right|^2} .$$

From this, we see that what we really want to estimate is the infinite sum in the denominator. There are two approaches.

1. Estimate an AR($p_T$) and let $p_T \to \infty$, but not too fast, say $\frac{p_T^3}{T} \to 0$. This procedure will give us a consistent estimate of the spectrum. To see this, consider
$$\left| \sum_{j=1}^{\infty} a_j e^{-i\omega j} - \sum_{j=1}^{p_T} \hat a_j e^{-i\omega j} \right| \le \sum_{j=1}^{\infty} |a_j - \hat a_j| ,$$
where we set $\hat a_j = 0$ for $j > p_T$. By an argument similar to the one we used to show the consistency of the HAC estimator, we could show that $\sum_{j=1}^{\infty} |a_j - \hat a_j| \to 0$.

2. Same $p_T$, but choose $\hat p = \arg\min_{p \le p_T} BIC(p)$, then estimate an AR($\hat p$) and compute the spectral density. If $p_T = (\ln T)^a$ with $0 < a < \infty$, then (i) if $p_0 < \infty$, $P(\hat p = p_0) \to 1$; and (ii) if $p_0 = \infty$, then $\frac{\hat p}{\ln p_T} \to 1$ and $\sup_\omega |\hat S(\omega) - S(\omega)| \xrightarrow{p} 0$.
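A small sketch of approach 2 (my own illustration, not the course's code; the AR(1) example, the helper names, and taking $a = 2$ in $p_T = (\ln T)^a$ are assumptions):

```python
import numpy as np

# Sketch: autoregressive spectral estimate. Choose p_hat by BIC over p <= p_T,
# then plug the estimated coefficients into S(omega) = sigma_e^2 / |1 + sum_j a_j e^{-i omega j}|^2.
# Note: with a(L) y_t = e_t, the OLS lag coefficients enter with a minus sign.
def fit_ar(z, p):
    T = len(z)
    X = np.column_stack([z[p - j - 1 : T - j - 1] for j in range(p)])
    y = z[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, np.mean((y - X @ beta) ** 2)

def ar_spectrum(z, omega, p_T):
    T = len(z)
    bics = [np.log(fit_ar(z, p)[1]) + p * np.log(T) / T for p in range(1, p_T + 1)]
    p_hat = 1 + int(np.argmin(bics))
    beta, sigma2 = fit_ar(z, p_hat)
    a = -beta                                   # a_j in a(L) = 1 + sum_j a_j L^j
    j = np.arange(1, p_hat + 1)
    return sigma2 / np.abs(1 + np.sum(a * np.exp(-1j * omega * j))) ** 2

rng = np.random.default_rng(3)
T = 4_000
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
print(ar_spectrum(z, omega=0.0, p_T=int(np.log(T) ** 2)))  # true S(0) = 4
```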


Remark 11. Of course, the condition that $p_T = (\ln T)^a$ is a bit difficult to interpret in practice. For example:

    T        T^{1/3}    (ln T)^2 / 4
    50          4             4
    250         7             8
    2500       15            15
