
MIT OpenCourseWare

http://ocw.mit.edu

14.384 Time Series Analysis


Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

14.384 Time Series Analysis, Fall 2007

Professor Anna Mikusheva

Paul Schrimpf, scribe

September 20, 2007

Lecture 5

Spectrum Estimation and Information Criteria


Spectrum Estimation
Same setup as last time. We have a stationary series $\{z_t\}$ with covariances $\gamma_j$ and spectrum $S(\omega) = \sum_{j=-\infty}^{\infty} \gamma_j e^{-i\omega j}$. We want to estimate $S(\omega)$.

Naive approach

We cannot estimate all the covariances from a finite sample. Let's just estimate all the covariances that we can:
$$\hat\gamma_k = \frac{1}{T} \sum_{j=k+1}^{T} z_j z_{j-k}$$

and use them to form
$$\hat S(\omega) = \sum_{j=-(T-1)}^{T-1} \hat\gamma_j e^{-i\omega j}$$

This estimator is not consistent: it converges to a distribution instead of a point. To see this, let $y_\omega = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} e^{i\omega t} z_t$, so that
$$\hat S(\omega) = y_\omega \bar y_\omega .$$
If $\omega \neq 0$, then $\frac{2 \hat S(\omega)}{S(\omega)} \Rightarrow \chi^2(2)$.
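To see this concretely, here is a small Monte Carlo sketch (my own illustration, not part of the original notes; the white-noise setup and function names are assumptions). It checks that the raw periodogram at a fixed frequency has mean roughly $S(\omega)$ but a spread that does not shrink as $T$ grows, consistent with the $\chi^2(2)$ limit.

```python
import numpy as np

# Sketch (not from the lecture): the raw periodogram at a fixed frequency
# does not concentrate around S(omega) as T grows.
rng = np.random.default_rng(0)

def periodogram_at(z, omega):
    """S_hat(omega) = |T^{-1/2} sum_t e^{i omega t} z_t|^2."""
    t = np.arange(1, len(z) + 1)
    y = np.sum(np.exp(1j * omega * t) * z) / np.sqrt(len(z))
    return np.abs(y) ** 2

omega = np.pi / 4
for T in [100, 1_000, 10_000]:
    draws = [periodogram_at(rng.standard_normal(T), omega) for _ in range(2_000)]
    # For Gaussian white noise S(omega) = 1, so the mean should stay near 1
    # while the standard deviation also stays near 1 for every T.
    print(T, round(np.mean(draws), 3), round(np.std(draws), 3))
```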

Kernel Estimator

$$\hat S(\omega) = \sum_{j=-S_T}^{S_T} \left(1 - \frac{|j|}{S_T}\right) \hat\gamma_j e^{-i\omega j}$$

Under appropriate conditions on $S_T$ ($S_T \to \infty$, but more slowly than $T$), this estimator is consistent in a uniform sense, i.e. $P\left(\sup_{\omega \in [-\pi,\pi]} |\hat S(\omega) - S(\omega)| > \epsilon\right) \to 0$. This can be shown in a way similar to the way we showed the Newey-West estimator is consistent.
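The estimator itself is easy to compute. Below is a minimal sketch (my own code, not the course's; the AR(1) example and the choice $S_T = T^{1/3}$ are illustrative assumptions).

```python
import numpy as np

# Sketch of the truncated (Bartlett-weighted) kernel spectrum estimator above.
def sample_autocov(z, j):
    """gamma_hat_j = (1/T) * sum_{t=j+1}^{T} z_t z_{t-j}, for a mean-zero series."""
    T = len(z)
    return np.dot(z[j:], z[: T - j]) / T

def kernel_spectrum(z, omega, S_T):
    """S_hat(omega) = sum_{|j| <= S_T} (1 - |j|/S_T) gamma_hat_j e^{-i omega j}."""
    S_T = min(S_T, len(z) - 1)
    val = sample_autocov(z, 0)  # j = 0 term has weight 1
    for j in range(1, S_T + 1):
        w = 1.0 - j / S_T
        # gamma_hat_{-j} = gamma_hat_j for a scalar series, so the +/- j terms combine
        val += 2.0 * w * sample_autocov(z, j) * np.cos(omega * j)
    return val

# Example: AR(1) with coefficient 0.5; bandwidth growing slowly with T.
rng = np.random.default_rng(0)
T = 5_000
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
print(kernel_spectrum(z, omega=0.0, S_T=int(T ** (1 / 3))))  # true S(0) = 1/(1-0.5)^2 = 4
```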

Information Criteria
Suppose you want to estimate an AR(p), but you don't know the right $p$. As a first case, let's say you know an upper bound, $p \le \bar p$. Then we'll consider the case where $\bar p$ is not known or $p = \infty$. We'll begin by assuming $\bar p$ is known.


A bad way: testing down. We test $H_0$: AR($p-1$) vs. $H_A$: AR($p$). If we accept, we then test $p-2$ vs. $p-1$, and continue until rejecting. This procedure is bad because we are doing multiple testing, so its size is not the usual one: the individual significance level of each test differs from the probability of a type I error for the entire procedure. The rejection of $H_0^i$ automatically means the rejection of $H_0^{i+1}, \ldots, H_0^{\bar p}$. Also, this procedure won't be consistent, because if we choose a significance level $\alpha$, we will incorrectly reject a fraction $\alpha$ of the time no matter how large our sample is.

Using information criteria

Let $S_p = \ln \hat\sigma_p^2 + p\,g(T)$, where $\hat\sigma_p^2 = \frac{1}{T}\sum_t \hat e_t^2$ and $\hat e_t$ are the residuals from the model with $p$ terms (in the multivariate case we'd use $\det|\hat\Sigma|$ in place of $\hat\sigma^2$). The term $p\,g(T)$ is a penalty for having more parameters, where $g(T)$ is a non-random function. We choose
$$\hat p = \arg\min_{p \le \bar p} S_p .$$

Different forms of $g(T)$ have special names.

Definition 1.
AIC (Akaike): $g(T) = \frac{2}{T}$
BIC (Bayesian/Schwarz): $g(T) = \frac{\ln T}{T}$
HQIC (Hannan-Quinn): $g(T) = \frac{2 \ln \ln T}{T}$
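As a quick illustration of how these criteria are used (a sketch of my own, not the course's code; the AR(2) example and helper names are assumptions), one fits an AR(p) by OLS for each $p \le \bar p$ and minimizes $S_p$ under each penalty.

```python
import numpy as np

# Sketch: fit AR(p) by OLS for p = 1..p_bar and select p_hat by AIC/BIC/HQIC.
def ar_sigma2(z, p):
    """Mean squared OLS residual from regressing z_t on its first p lags."""
    T = len(z)
    X = np.column_stack([z[p - j - 1 : T - j - 1] for j in range(p)])
    y = z[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2)

def select_order(z, p_bar):
    T = len(z)
    penalties = {"AIC": 2 / T, "BIC": np.log(T) / T, "HQIC": 2 * np.log(np.log(T)) / T}
    out = {}
    for name, g in penalties.items():
        scores = [np.log(ar_sigma2(z, p)) + p * g for p in range(1, p_bar + 1)]
        out[name] = 1 + int(np.argmin(scores))  # arg min over p = 1..p_bar
    return out

# Example: an AR(2) process; BIC/HQIC should pick 2 with high probability.
rng = np.random.default_rng(1)
T = 2_000
z = np.zeros(T)
for t in range(2, T):
    z[t] = 0.5 * z[t - 1] - 0.3 * z[t - 2] + rng.standard_normal()
print(select_order(z, p_bar=8))
```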
Question: which of these are consistent?
Definition 2. $\hat p$ is weakly consistent if $\lim P(\hat p = p_0) = 1$ (when we just say consistent, we mean weakly consistent).

Definition 3. $\hat p$ is strongly consistent if $P(\lim \hat p = p_0) = 1$.

Theorem 4. If $g(T) \to 0$ and $T g(T) \to \infty$, then $\hat p \xrightarrow{p} p_0$.


Proof. As usual, we will only sketch the proof.

Lemma 5. If $g(T) \to 0$, then $P(\hat p \ge p_0) \to 1$.

Proof. Note that
$$P(\hat p \ge p_0) = P(S_{p_0} < \min\{S_0, S_1, \ldots, S_{p_0-1}\}).$$
This happens when $S_{p_0} < S_j$ for each $j < p_0$, which means
$$\log \hat\sigma_{p_0}^2 + p_0 g(T) < \log \hat\sigma_j^2 + j g(T),$$
i.e.
$$\log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} - (p_0 - j) g(T) > 0 .$$
Then since $p_0 > j$, it must be that $\operatorname{plim} \hat\sigma_{p_0}^2 < \operatorname{plim} \hat\sigma_j^2$, so $\operatorname{plim} \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} = b_j > 1$. Also, since $g(T) \to 0$, we must have
$$P\left( \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} - (p_0 - j) g(T) > 0 \right) \to 1$$
for each $j$, and since there are only finitely many $j < p_0$, we have
$$P(S_{p_0} < \min\{S_0, S_1, \ldots, S_{p_0-1}\}) \to 1 .$$


Lemma 6. If $T g(T) \to \infty$, then $P(\hat p \le p_0) \to 1$.

Proof. As above, note that
$$P(\hat p \le p_0) = P(S_{p_0} < \min\{S_{p_0+1}, S_{p_0+2}, \ldots, S_{\bar p}\}) = P\left( \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} + (j - p_0) g(T) > 0, \; j = p_0+1, \ldots, \bar p \right).$$
We can multiply both sides of the inequality by $T$ without changing the probability:
$$P(\hat p \le p_0) = P\left( T \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} + (j - p_0) T g(T) > 0, \; j = p_0+1, \ldots, \bar p \right).$$
Now $T \log \frac{\hat\sigma_{p_0}^2}{\hat\sigma_j^2} \Rightarrow \chi^2_{j-p_0}$, since it is like a likelihood ratio (in particular, it would be the likelihood ratio statistic for testing $a_{p_0+1} = \cdots = a_j = 0$, which is true), so each $T \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2}$ is bounded in probability. We have a finite number of these, and $T g(T) \to \infty$, so
$$T \log \frac{\hat\sigma_j^2}{\hat\sigma_{p_0}^2} + (j - p_0) T g(T) \to \infty ,$$
and the probability converges to 1.

The proof so far has been for weak consistency. If we additionally have $\frac{T g(T)}{2 \ln \ln T} > 1$, then $\hat p \to p_0$ a.s., so $\hat p$ is strongly consistent.

Corollary 7. BIC is strongly consistent. HQIC is consistent. AIC is not; it has a positive probability of overestimating the number of lags asymptotically.

Remark 8. When $\hat p$ is consistent, estimating an AR(p) model in this 2-step way has the same asymptotic distribution as if we knew $p$. However, in finite samples, not knowing $p$ can make a difference, and the finite-sample distribution of our estimates is influenced by model selection.

AIC and forecasting. Despite its inconsistency for picking the number of lags, the AIC is not useless: it minimizes the MSE of one-step-ahead forecasting. Suppose $y_t = \sum_j \beta_j x_{t-j} + e_t$. We estimate $\{\beta_j\}$ and form $\hat y_t = \sum_j \hat\beta_j x_{t-j}$. The MSE is
$$E[(y_t - \hat y_t)^2] = E\left[\left(e_t + \sum_j (\beta_j - \hat\beta_j) x_{t-j}\right)^2\right] = E e_t^2 + E\left[\left(\sum_j (\beta_j - \hat\beta_j) x_{t-j}\right)^2\right].$$
Some asymptotics tell us
$$E[(y_t - \hat y_t)^2] = \sigma_y^2 = \sigma_e^2 + \frac{p+1}{T}\sigma_e^2 = \frac{T+p+1}{T}\sigma_e^2 .$$

We want to minimize this by choosing $p$. Replacing $\sigma_e^2$ by the estimate $\frac{T}{T-(p+1)} \hat\sigma^2(p)$ and taking the log (a monotonic transformation),
$$\arg\min_p \frac{T+p+1}{T-(p+1)} \hat\sigma^2(p) = \arg\min_p \left[ \log \hat\sigma_p^2 + \log \frac{T+p+1}{T-(p+1)} \right].$$
Note that
$$\frac{T+p+1}{T-(p+1)} = 1 + \frac{2(p+1)}{T} + O\left(\frac{1}{T^2}\right),$$
so $\log \frac{T+p+1}{T-(p+1)} = \frac{2(p+1)}{T} + O\left(\frac{1}{T^2}\right)$. Therefore,
$$\log \hat\sigma_p^2 + \log \frac{T+p+1}{T-(p+1)} = \log \hat\sigma_p^2 + \frac{2(p+1)}{T} + O\left(\frac{1}{T^2}\right) = \text{AIC} + O\left(\frac{1}{T^2}\right).$$
Thus, the AIC minimizes the variance of the one-step-ahead forecast error.

Consider an alternative approach, predictive least squares:
$$y_t = x_t' \beta + u_t ,$$
$$\hat\beta_{t-1} = (X_{t-1}' X_{t-1})^{-1} X_{t-1}' y_{t-1} ,$$
$$\hat y_{t|t-1} = \hat\beta_{t-1}' x_t ,$$
$$PLS(p) = \sum_{t=m+1}^{T} (y_t - \hat y_{t|t-1})^2 .$$
Choose $p$ to minimize $PLS(p)$.

Wei (1992) shows that for $m$ fixed,
$$\frac{1}{T-m} PLS(p) = S_p^{BIC} + o\left(\frac{1}{T}\right), \qquad \hat p_{PLS} = \hat p_{BIC} + o_p(1).$$
Inoue and Kilian (2002) show that if $m = cT$,
$$\frac{1}{T-m} PLS(p) = S_p^{AIC} + o\left(\frac{1}{T^2}\right).$$
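A rough sketch of the PLS computation for an AR(p) follows (my own illustration, not the course's code; the burn-in $m = 50$ and the AR(1) example are assumptions). Each one-step forecast uses coefficients estimated only from data up to $t-1$.

```python
import numpy as np

# Sketch: predictive least squares for an AR(p). For each t > m, fit the AR(p)
# by OLS on observations up to t-1, forecast y_t, and sum squared forecast errors.
def pls(z, p, m):
    total = 0.0
    for t in range(m, len(z)):
        X = np.column_stack([z[p - j - 1 : t - j - 1] for j in range(p)])
        y = z[p:t]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        x_t = z[t - p : t][::-1]            # (z_{t-1}, ..., z_{t-p})
        total += (z[t] - x_t @ beta) ** 2
    return total

rng = np.random.default_rng(2)
T = 500
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.6 * z[t - 1] + rng.standard_normal()
m = 50  # burn-in so the first regression has enough observations
scores = {p: pls(z, p, m) for p in range(1, 6)}
print(min(scores, key=scores.get), scores)
```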
Remark 9. A larger point is that the best model selection criterion depends on what your goal is.

Example 10. Suppose we want to estimate the spectrum by fitting an AR($\infty$) model,
$$a(L) y_t = e_t , \qquad a(L) = 1 + \sum_{j=1}^{\infty} a_j L^j .$$
The spectrum is
$$S(\omega) = \frac{\sigma_e^2}{\left| 1 + \sum_{j=1}^{\infty} a_j e^{-i\omega j} \right|^2} .$$

From this, we see that what we really want to estimate is the infinite sum in the denominator. There are two approaches.

1. Estimate an AR($p_T$) and let $p_T \to \infty$, but not too fast, say $\frac{p_T^3}{T} \to 0$. This procedure will give us a consistent estimate of the spectrum. To see this, consider
$$\left| \sum_{j=1}^{\infty} a_j e^{-i\omega j} - \sum_{j=1}^{p_T} \hat a_j e^{-i\omega j} \right| \le \sum_{j=1}^{\infty} |a_j - \hat a_j| ,$$
where we set $\hat a_j = 0$ for $j > p_T$. By an argument similar to the one we used to show the consistency of the HAC estimator, we could show that $\sum_{j=1}^{\infty} |a_j - \hat a_j| \to 0$.

2. Same $p_T$, but choose $\hat p = \arg\min_{p \le p_T} BIC(p)$, then estimate an AR($\hat p$) and compute the spectral density. If $p_T = (\ln T)^a$ with $0 < a < \infty$, then (i) if $p_0 < \infty$, $P(\hat p = p_0) \to 1$; and (ii) if $p_0 = \infty$, then $\frac{\hat p}{\ln p_T} \to 1$ and $\sup_\omega |\hat S(\omega) - S(\omega)| \xrightarrow{p} 0$.
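A small sketch of approach 2 (my own illustration, not the course's code; the AR(1) example, the helper names, and taking $a = 2$ in $p_T = (\ln T)^a$ are assumptions):

```python
import numpy as np

# Sketch: autoregressive spectral estimate. Choose p_hat by BIC over p <= p_T,
# then plug the estimated coefficients into S(omega) = sigma_e^2 / |1 + sum_j a_j e^{-i omega j}|^2.
# Note: with a(L) y_t = e_t, the OLS lag coefficients enter with a minus sign.
def fit_ar(z, p):
    T = len(z)
    X = np.column_stack([z[p - j - 1 : T - j - 1] for j in range(p)])
    y = z[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, np.mean((y - X @ beta) ** 2)

def ar_spectrum(z, omega, p_T):
    T = len(z)
    bics = [np.log(fit_ar(z, p)[1]) + p * np.log(T) / T for p in range(1, p_T + 1)]
    p_hat = 1 + int(np.argmin(bics))
    beta, sigma2 = fit_ar(z, p_hat)
    a = -beta                                   # a_j in a(L) = 1 + sum_j a_j L^j
    j = np.arange(1, p_hat + 1)
    return sigma2 / np.abs(1 + np.sum(a * np.exp(-1j * omega * j))) ** 2

rng = np.random.default_rng(3)
T = 4_000
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
print(ar_spectrum(z, omega=0.0, p_T=int(np.log(T) ** 2)))  # true S(0) = 4
```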


Remark 11. Of course, the condition that $p_T = (\ln T)^a$ is a bit difficult to interpret in practice. For example:

    T        T^{1/3}    (ln T)^2 / 4
    50          4             4
    250         7             8
    2500       15            15
