Topic1 Version 1
1
1.1
Time series econometrics is concerned with the analysis of variables that are observed in sequence over
time. Some examples of economic time series variables are GNP, inflation, interest rate, unemployment
rate, and so on. The next figure shows U.S. real GNP per capita observed annually from 1909 to 2004:
[Figure: US Real GNP per capita, base year 2000; vertical axis: Real GNP/Capita, $0 to $40,000; horizontal axis: Year]
The series is viewed as a realisation of a stochastic process
Y = {y_t}_{t=−∞}^{+∞} = {. . . , y_{−1}, y_0, y_1, . . . , y_t, y_{t+1}, . . .}.
A realisation of T observations of such a process is denoted by {y_t}_{t=1}^{T} = {y_1, . . . , y_T}. Stochastic means
that the exact point where future realisations will lie is uncertain. Suppose we know that the process of
interest can be described by the following simple time series model:
y_t = μ + ε_t + θ ε_{t−1},
(1)
where μ and θ are two non-random parameters, and {ε_t}_{t=1}^{T} is a series of independent random variables with zero mean and constant variance. A set of realisations of ε_t, for t = 1, . . . , T, then defines a set of realisations of y_t, for t = 1, . . . , T. N different realisations of ε_t then define N different realisations of y_t. This in turn implies that (1) defines a joint distribution for the random variables {y_1, . . . , y_T}. The knowledge of the properties of such a distribution is crucial for predicting or forecasting future realisations of y_t. In practice, we often aim at learning about the first two moments only: the mean, the autocovariances, and the autocorrelations of the process.
Figure 2 depicts two different draws of 200 i.i.d. shocks, {ε_t}_{t=1}^{200}, from a normal law with mean zero and constant variance 1: the blue draw and the pink draw.
[Figure: two draws, T = 200, of normally distributed shocks, N(0, 1); vertical axis: shock (−4 to 4); horizontal axis: time; series N001 and N002]
Figure 2
Figure 3 depicts the corresponding values taken by the previous process (1) for μ = 0.1 and θ = 0.99.
[Figure: two draws of the process (1) with μ = 0.1 and θ = 0.99; vertical axis: −5 to 6; horizontal axis: time; series Y001 and Y002]
Figure 3
Note that we have a problem for y_0. For instance, for the blue process, we have ε_0 = −0.078 and ε_1 = −0.97, which yields y_1 = μ + ε_1 + θ ε_0 = 0.1 − 0.97 − 0.99 × 0.078 ≈ −0.95. Likewise, y_0 = μ + ε_0 + θ ε_{−1}, which means that we need the value of ε_{−1} to calculate y_0. A standard practice for this initialisation problem is to set ε_{−1} to its expected value 0, in which case y_0 = 0.1 − 0.078 = 0.022.
Suppose that, unknown to us, the true value of ε_{−1} for that realisation was different from 0. Then clearly we would not have found the correct value for y_0. Would our picture in Figure 3 be much affected?
The properties of a time series process can be examined in the time domain or in the frequency domain.
In the frequency domain, time series processes are represented as the sum of periodic functions and the
focus is on the contributions made by such periodic components to the series. Here, we will examine the
time domain properties of time series processes only, where the focus is on the properties of the joint
distribution of {y_1, . . . , y_T} at time t and at time t + τ (for further reference on the frequency domain, see
Harvey, ch. 6, and Hamilton, ch. 6).
1.2
1.2.1 Mean
Consider an (absolutely continuous) random process Y_t, of which we observe a realisation over time t = 1, . . . , T, {y_1, . . . , y_T}. The mean of the process at time t is defined as
μ_t ≡ E(Y_t),
(2)
where E(Y_t) = ∫ y_t f_{Y_t}(y_t) dy_t is the expected value of Y_t, and f_{Y_t} is the probability density of Y_t. Note that the expectation may change with time: the mean can be represented by the (deterministic with respect to the information set under consideration) function:
μ : t ↦ μ(t) ≡ E(Y_t).
We will study many processes which have the property of mean-stationarity, that is, the function μ(t) is constant over time: it is time-invariant. In this case, there exists a real number μ such that
∀t, μ(t) = μ.
For our process (1), the unconditional expected value at time t is calculated as
E[y_t] = E[μ + ε_t + θ ε_{t−1}]
= E[μ] + E[ε_t] + θ E[ε_{t−1}]
= μ + 0 + 0
= μ.
Hence, this process is, in fact, mean stationary. Note that this unconditional expectation is independent of the realisation of the process: both the series generated by the blue shocks and the series generated by the pink shocks have the same unconditional expectation μ = 0.1.
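This mean-stationarity result is easy to check by simulation: averaging y_t across many independent realisations, at each fixed t, should give a value close to μ = 0.1 (the number of realisations, the horizon and the seed below are illustrative choices):

```python
import numpy as np

# N independent realisations of process (1); sizes and seed are illustrative
rng = np.random.default_rng(1)
mu, theta, T, N = 0.1, 0.99, 50, 100_000

eps = rng.standard_normal((N, T + 1))        # one extra column for the lagged shock
y = mu + eps[:, 1:] + theta * eps[:, :-1]    # y_1, ..., y_T for each realisation

# averaging across realisations at each fixed t estimates E[y_t]
means_over_draws = y.mean(axis=0)
```

Every entry of `means_over_draws` should be close to 0.1, whatever the value of t.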
There is another concept of expectation that does come into play: conditional expectation. For illustration, on the blue series of shocks, we had ε_9 = 1.16 and ε_10 = −1.35, which meant that y_10 = −0.10. Does the knowledge of this information affect the expected value of, say, y_11? The answer is "it does". Let I_10 = {ε_9 = 1.16, ε_10 = −1.35} be the information at time t = 10. Then
E[y_11 | I_10] = E[μ + ε_11 + θ ε_10 | I_10]
= μ + E[ε_11 | I_10] + θ ε_10
= 0.1 + 0 + 0.99 × (−1.35)
≈ −1.24.
(3)
Note, first, that the knowledge of ε_9 did not matter for the calculation of E[y_{t+1} | I_t], and, second, that the conditional expectation of y_{t+1} depends on the realisation. For the pink process, we had in fact ε_10 = −2.49, which meant that, in the world of the pink process, E[y_11 | I_10] = 0.1 + 0.99 × (−2.49) ≈ −2.36 ≠ −1.24.
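The calculation reduces to a one-line function: conditional on I_t, only the most recent shock ε_t matters (the helper name is an illustrative choice; the blue shock value is the one quoted above):

```python
mu, theta = 0.1, 0.99

def cond_exp_next(eps_t):
    # E[y_{t+1} | I_t] = mu + theta * eps_t: earlier shocks drop out
    return mu + theta * eps_t

blue = cond_exp_next(-1.35)   # approx -1.24, the blue example above
```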
It is easy enough to read the value of ε_10 when one has just constructed the process. In practice, one has to draw a distinction between what is observable, typically the values of y_t, and what is unobservable, the underlying shocks {ε_t}. The observable information set I_10^obs will then consist of the observed values I_10^obs = {y_1, . . . , y_10}, and the quantity E[ε_10 | I_10^obs], needed to calculate the conditional expected value E[y_11 | I_10^obs], will have to be inferred from this information. Typically, the conditional expectation E[y_11 | I_10^obs] ≠ E[y_11 | I_10]: the information set matters when calculating the conditional expectation.
Variance The variance is defined as:
Var(Y_t) ≡ E[(Y_t − μ_t)²].
(4)
Note the variance is never negative. Again, the variance can be thought of as a (deterministic with respect to the information set under consideration) function of time
Var : t ↦ Var(Y_t) ≡ E[(Y_t − μ_t)²].
Again, one has to distinguish between conditional and unconditional variances, where the conditional variance at some future time t + τ is predicated on the information available at time t. Furthermore, we will also be interested in processes for which the variance is constant over time, i.e. there exists a positive number V such that ∀t, Var(Y_t) = V.
For our process (1), recall that ε_t ~ i.i.d. N(0, σ²) where σ = 1. The unconditional variance at time t is given by
Var(y_t) = E[(y_t − μ_t)²]
= E[((μ + ε_t + θ ε_{t−1}) − μ)²]
= E[(ε_t + θ ε_{t−1})²]
= E[ε_t² + 2θ ε_t ε_{t−1} + θ² ε_{t−1}²]
= E[ε_t²] + 2θ E[ε_t ε_{t−1}] + θ² E[ε_{t−1}²]
= E[ε_t²] + 2θ E[ε_t] E[ε_{t−1}] + θ² E[ε_{t−1}²]
= σ² + 0 + θ² σ²
= (1 + θ²) σ²,
where we substituted μ_t = μ (mean stationarity) and then used the independence of ε_t and ε_{t−1} to factor E[ε_t ε_{t−1}] = E[ε_t] E[ε_{t−1}] = 0. Both conditional and unconditional variances are time stationary. Also note that the conditional variance, given our information set, is independent of the realisation of ε_t: in contrast with the conditional expectation of the process, which depended on the realisation of the shock, the conditional variance under consideration is completely deterministic: the blue process, the pink process and any other process we could draw would have the same conditional variance Var(y_{t+1} | I_t) given I_t = {ε_t}.
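The unconditional variance (1 + θ²) σ² can likewise be verified by simulation (the sample size and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, theta, sigma, N = 0.1, 0.99, 1.0, 200_000

# one observation of y_t per replication, built from two consecutive shocks
eps_lag = sigma * rng.standard_normal(N)
eps = sigma * rng.standard_normal(N)
y = mu + eps + theta * eps_lag

sample_var = y.var()
theory_var = (1 + theta**2) * sigma**2   # = 1.9801 here
```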
Autocovariance The (unconditional) autocovariance at lag τ, γ_{τ,t}, is given by:
γ_{τ,t} ≡ E[(Y_{t+τ} − μ_{t+τ})(Y_t − μ_t)].
(6)
The series {γ_{τ,t}}_{τ=−∞}^{+∞} = {. . . , γ_{−1,t}, γ_{0,t}, γ_{1,t}, . . .} is called the autocovariance function. It associates to each lag τ the autocovariance γ_{τ,t} at this lag. As before, we can write this relationship as the function:
γ_t : τ ↦ γ_t(τ) ≡ E[(Y_{t+τ} − μ_{t+τ})(Y_t − μ_t)].
It is easily verified that
γ_t(τ) = γ_t(−τ),
which means that, in practice, we are only interested in positive lags τ ≥ 0. Also note that the autocovariance function is also a function of time t. Again, a particular case of interest will be when this function does not depend on time, in which case we write γ_t(τ) = γ(τ). One can also define the concept of conditional autocovariance.
For our process (1), the unconditional autocovariance at time t is given by
γ_{τ,t} = E[(y_{t+τ} − μ_{t+τ})(y_t − μ_t)]
= E[(ε_{t+τ} + θ ε_{t+τ−1})(ε_t + θ ε_{t−1})]
= E[ε_{t+τ} ε_t] + θ (E[ε_{t+τ−1} ε_t] + E[ε_{t+τ} ε_{t−1}]) + θ² E[ε_{t+τ−1} ε_{t−1}].
(7)
We simplify our task by recalling that (i) we only need to calculate γ_{τ,t} for positive values of τ, and (ii) the ε's are mutually independent, so that E[ε_s ε_r] = σ² if s = r and 0 otherwise. At lag τ = 0, (7) recovers the variance (1 + θ²) σ² computed above; at lag τ = 1, it gives
γ_{1,t} = 0 + θ σ² + 0 + θ² · 0
= θ σ²,
and γ_{τ,t} = 0 for τ = 2 and beyond. Hence, the autocovariance function of this process is time-invariant. Its graph is depicted in Figure 4.
[Figure 4: Autocovariance function of process (1); vertical axis: autocovariance (0 to 2.5); horizontal axis: lag (−5 to 5). The function peaks at (1 + θ²) σ² ≈ 1.98 at lag 0, equals θ σ² = 0.99 at lags ±1, and is 0 elsewhere.]
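The whole autocovariance function can be estimated from one long simulated draw and compared with the values just derived (the sample size and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, theta, T = 0.1, 0.99, 500_000

eps = rng.standard_normal(T + 1)
y = mu + eps[1:] + theta * eps[:-1]

def sample_autocov(y, tau):
    # gamma_hat(tau) = average of (y_{t+tau} - ybar)(y_t - ybar)
    yc = y - y.mean()
    return np.mean(yc[tau:] * yc[:len(yc) - tau])

gammas = [sample_autocov(y, tau) for tau in range(4)]
# expected: about 1.98, 0.99, 0, 0
```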
Autocorrelation The (unconditional) autocorrelation at lag τ, ρ_{τ,t}, is defined as
ρ_{τ,t} ≡ γ_{τ,t} / √(Var(Y_{t+τ}) Var(Y_t)).
(8)
As for the autocovariances, which defined the (possibly time varying) autocovariance function, these autocorrelations define a (possibly time varying) autocorrelation function
ρ_t : τ ↦ ρ_t(τ) ≡ E[(Y_{t+τ} − μ_{t+τ})(Y_t − μ_t)] / √(E[(Y_{t+τ} − μ_{t+τ})²] E[(Y_t − μ_t)²]).
At lag τ = 0, the autocorrelation function is always 1. At other lags, the autocorrelation is bounded between −1 and 1: −1 ≤ ρ_{τ,t} ≤ 1 (Hölder inequality). Hence, the autocorrelation function will always have a peak of ρ = 1 at lag τ = 0.
If the variance of the process is constant over time and equal to some constant γ_0 ≥ 0, ∀t, Var(Y_t) = γ_{0,t} = γ_0, then the autocorrelation can be written more simply as
ρ_{τ,t} = γ_{τ,t} / γ_0.
In this case, the symmetry of the autocovariance function γ_{τ,t} transfers to the autocorrelation function:
ρ_{τ,t} = ρ_{−τ,t}.
If, in addition, the autocovariance function is also time-invariant, the autocorrelation function is time-invariant as well and can be written more simply as
ρ_τ = γ_τ / γ_0.
For our process (1), this gives
ρ_τ = 1, for τ = 0,
ρ_τ = θ / (1 + θ²), for τ = 1 or −1,
ρ_τ = 0, for all other τ,
whose graph is depicted in Figure 5.
[Figure 5: Autocorrelation function of process (1); vertical axis: autocorrelation (0 to 1); horizontal axis: lag (−5 to 5). The function equals 1 at lag 0, ≈ 0.5 at lags ±1, and 0 elsewhere.]
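The time-invariant autocorrelation function of (1) is simple enough to encode directly; note that for θ = 0.99 the lag-1 value θ / (1 + θ²) is almost exactly 0.5 (the function name is an illustrative choice):

```python
theta = 0.99

def ma1_autocorr(tau):
    # rho_0 = 1, rho_{+-1} = theta / (1 + theta**2), rho_tau = 0 otherwise
    tau = abs(tau)
    if tau == 0:
        return 1.0
    if tau == 1:
        return theta / (1 + theta**2)
    return 0.0

rho1 = ma1_autocorr(1)   # 0.99 / 1.9801, approx 0.49997
```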
For t = 1 Consider, first, only the paths for which y_0 fell into the first bin [−0.05, 0.05). Use the same collection B of bins to construct the frequencies of y_1 falling within each interval [−0.05, 0.05), [0.05, 0.15), . . .: this provides an estimate of the conditional probability density f_{y_1 | y_0 = 0}. Then, move on to the next bin [0.05, 0.15) to estimate, in the same way, the conditional probability density f_{y_1 | y_0 = 0.1}, and so on.
For further t's Iterate the previous procedure to calculate the conditional probability densities, for example f_{y_t | y_{t−1} = 0.1, y_{t−2} = 1.2, . . .}.
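The binning scheme just described can be sketched as follows, assuming the paths come from process (1) with ε_{−1} = 0 (the bin width of 0.1, the number of paths, and the seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, theta, n_paths = 0.1, 0.99, 100_000

# simulate (y_0, y_1) for many paths of (1), with eps_{-1} set to 0
eps0 = rng.standard_normal(n_paths)
eps1 = rng.standard_normal(n_paths)
y0 = mu + eps0                      # + theta * eps_{-1}, which is 0
y1 = mu + eps1 + theta * eps0

# the collection B of bins: ..., [-0.05, 0.05), [0.05, 0.15), ...
edges = np.arange(-6.05, 6.1, 0.1)

# keep the paths whose y0 fell into the first bin, histogram their y1:
# this estimates the conditional density f_{y1 | y0 = 0}
in_first_bin = (y0 >= -0.05) & (y0 < 0.05)
freqs, _ = np.histogram(y1[in_first_bin], bins=edges, density=True)
```

Repeating this for the bin [0.05, 0.15) estimates f_{y1 | y0 = 0.1}, and so on.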
It is clear that, even if we start with a very large, but finite, number of observed paths, we are likely to have very few observations per bin as t becomes large. An alternative strategy would be to try to guess the functional form of the process generating these paths y (remember, we do not know they have been generated by (1)). Let's suppose we guess something like
y_t = μ + ε_t, with ε_t ~ i.i.d. N(0, σ²).
(9)
We would have to elaborate a strategy to (i) estimate the parameters {μ, σ}, and (ii) test for the congruence of our modelling to the data: if the y's were really generated by the process (9), how likely would we be to observe all of these paths? Here (9) implies that there should be no autocorrelation at lag 1 (since the variability in y_{t+1} depends only on ε_{t+1} and the variability in y_t only on ε_t, and ε_{t+1} and ε_t are independent). From all the paths we have, we can easily estimate ρ_{τ,t} for many τ's and t's. If we find, say, that ρ̂_{1,1} = 0.5, then we may suspect our model does not fit the data very well. We can try to derive the probability distribution of the ρ̂_{1,1} associated with the process (9), either by algebra or by simulating it on a computer for the number of paths we have at our disposal. If we were to find, say, that Prob(|ρ̂_{1,1}| ≤ 0.1) = 95%, we would conclude that ρ̂_{1,1} = 0.5 is deep into the rejection region, and we would have to conclude that our guess was quite likely not a good one. We would then have to try to postulate another functional form and repeat the process of (i) inference and (ii) testing. Should we be satisfied with our testing, we would still be interested in the precision of our inference: how far are our estimates likely to be from their actual values? This would be crucial for the next step, which is likely to be forecasting.