Professional Documents
Culture Documents
14.384 Time Series Analysis: Mit Opencourseware
14.384 Time Series Analysis: Mit Opencourseware
14.384 Time Series Analysis: Mit Opencourseware
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Spectrum Estimation
1
14.384 Time Series Analysis, Fall 2007
Lecture 5
j=
j e
Nave approach
We cannot estimate all the covariances from a nite sample. Lets just estimate all the covariances that we
can
j =
T
1
zj zjk
T
j=k+1
T
1
j eij
j=(T 1)
Thisestimator is not consistent. It convergers to a distribution instead of a point. To see this, let y =
T
it
1
zt , so that
t=1 e
T
S() = y y
If =
0
2S() S()2 (2)
Kernel Estimator
S() =
ST
|j|
1
j eij
ST
j=ST
Under appropriate conditions on ST (ST , but more slowly than T ), this estimator is consistent1 This
can be shown in a way similar to the way we showed the Newey-West estimator is consistent.
Information Criteria
Suppose you want to estimate an AR(p), but you dont know the right p. As a rst case, lets say you know
an upper bound, p p. Then, well consider the case where p is not known or p = .
Well begin by assuming p is known.
1 In
2
(1)
(1)
A bad way: testing down We test H0 : AR(p 1) vs HA : AR(p) If accept, then test p 2 vs p 1.
Continue until rejecting. This test is bad because were doing multiple testing, so the size of this procedure is
not the usual one. In other words, the individual signicance level of each test diers from the probability of
type I errors of the entire procedure. The rejection of H0i automatically means the rejection of H0i+1 , ..., H0p.
Also, this procedure wont be consistent because if we choose a signicance level of , well incorrectly reject
portion of the time no matter how large our sample.
Let Sp = ln
2 + pg(T ), where
p2 = T1 t e2t and et are the residuals from the model with p terms (in the
| in place of
multivariate case wed use det |
2 ). pg(T ) is a penalty term for having more parameters, where
g(T ) is a non-random function. We choose
p = arg min Sp
pp
j2
(p0 j)g(T ) >0
p20
j2
p20
j2
log 2 (p0 j)g(T ) > 0 1
p0
for each j, and since there are only nitely many j < p0 , we have
P (Sp0 < min{S0 , S1 , ..., Sp0 1 }) 1
j2
(p0 j)g(T ) > 0 , j = p0 + 1, .., p
=P
p20
We can multiply both sides of the inequality by T without changing the probability:
j2
P (
p p0 ) =P T
2 (p0 j)T g(T ) > 0 , j = p0 + 1, .., p
p0
T 2j 2jp0 since it is like a likelihood ratio (in particular, it would be the likelihood ratio for testing
p0
j = 0, ..., 1p = 0, which is true). We have a nite number of these, and T g(T ) , so the whole
The proof so far has been for weak consistency. If we additionally have
p is strongly consistent.
g(T )T
2 ln ln T
Corollary 7. BIC is strongly consistent. HQIC is consistent. AIC is not; it has a positive probability of
overestimating the number of lags asymptotically.
Remark 8. When p is consistent, estimating an AR(p) model in this 2-step way has the same asymptotic
distribution as if we knew p. However, in nite samples, not knowing p can make a dierence, and the nite
sample distribution of our estimates is inuenced by model selection.
AIC and forecasting Despite its inconsistency for picking the
number of lags, the AIC is not useless. It
minimizes MSE for one-step ahead forecasting. Suppose yt = j j xtj + et . We estimate {j } and form
E[(yt yt )2 ] =E[(et +
p+1
e
T
T +p+1
e
=
T
E[(yt yt )2 ] = 2y =e +
We want to minimize this by choosing p. First, we can take the log since its a monotonic transformation
arg min
p
T +p+1
T
T +p+1
Note that
2(p + 1)
1
T +p+1
=1+
+ O( 2 )
T (p + 1)
T
T
T +p+1
T (p+1)
2(p+1)
T
+ O( T12 ). Therefore,
2(p + 1)
1
+ O( 2 )
T
T
1
AIC + O( 2 )
T
Thus, the AIC minimizes the variance of the one step ahead forecast error.
Consider an alternative approach:
yt = xt + ut
(yt yt|t1 )2
t=m+1
aj Lj
j=1
The spectrum is
S() =
|1 + j=1 aj eij |2
From this, we see that what we really want to estimate is the innite sum in the denominator. There are
two approaches
1. Estimate AR(pT ), let pT , but not too fast, say
estimate of the spectrum. To see this, consider:
|
j=1
ij
aj e
pT
j=1
p3T
T
a
j eij |
|aj a
j |
j=1
By a similar argument to what we used to show the consistency of HAC, we could show that
a
j | 0
j=1 |aj
Remark 11. Of course, the condition that pT = (ln T )a is a bit dicult to interprete in practice. For
T 1/3 1
2
T
4
50
4 (ln T )
4
4
example, 50
250
7
8
2500
15
15