
Advanced Econometric Methods I:

On Stationarity, Ergodicity and the Ergodic theorem

October 22, 2020

This note somewhat formalizes the concepts of stationarity, ergodicity and the ergodic
theorem. Note that this is not exam material and is merely intended for those interested in a
more complete understanding of these concepts. But please do all take a look at footnote 5,
and note that the last two theorems are also in Hayashi, hence they are included. A more
detailed treatment can be found in the book of Patrick Billingsley, Probability and
Measure, notably Section 24. I hope you enjoy.

First, remember why we are doing this. The "standard" law of large numbers by Kol-
mogorov is obviously brilliant, but the assumption of independence is inappropriate for
economic time series, which often are quite dependent. Just think of inflation or unem-
ployment series: it is impossible to argue that there is no dependence in such series. To be
able to consider these types of series in our large sample regression framework, we need laws
of large numbers that allow the random variables to be dependent. To be precise about the
kinds of dependence allowed, we need to discuss explicitly some probability theory that is
only floating around in the background in Hayashi.

1 Some basics
Let Ω denote the sample space, which is the space of all possible outcomes. The following
definition states when a collection of subsets of Ω forms a σ-field.

Definition 1. A family F of subsets of a set Ω is a σ-field provided

(i) ∅ and Ω belong to F;¹

(ii) if F belongs to F, then F^c (complement of F in Ω) belongs to F;

(iii) if {Fi} is a sequence of sets in F, then ∪_{i=1}^∞ Fi belongs to F.

¹ ∅ denotes the empty set.

The pair (Ω, F) is called a measurable space when F is a σ-field of subsets of Ω. The sets
in F are the events: the sets to which we can meaningfully assign probabilities, which is
formalized in the following definition.

Definition 2. Let (Ω, F) be a measurable space. A mapping P : F → [0, 1] is a probability
measure on (Ω, F) provided that

(i) P(∅) = 0;

(ii) For any F ∈ F, P(F^c) = 1 − P(F);

(iii) For any disjoint sequence {Fi} of sets in F: P(∪_{i=1}^∞ Fi) = Σ_{i=1}^∞ P(Fi).

When P is a probability measure on the measurable space (Ω, F) we call the triple
(Ω, F, P) a probability space. Thus a probability measure assigns a probability between zero
and one to every event F ∈ F.
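
To make Definitions 1 and 2 concrete, here is a minimal Python sketch (entirely our own
illustration: the finite sample space, the power set as σ-field and the uniform measure are
assumptions chosen for simplicity, not anything prescribed in the note) that checks the
axioms on a toy probability space.

from itertools import chain, combinations

omega = {1, 2, 3, 4}                        # toy sample space Ω (an assumption)

# The power set of a finite Ω is always a σ-field: it contains ∅ and Ω and is
# closed under complements and unions.
F = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

P = {A: len(A) / len(omega) for A in F}     # uniform probability measure

A, B = frozenset({1}), frozenset({2, 3})    # two disjoint events
assert P[frozenset()] == 0                  # Definition 2 (i):  P(∅) = 0
assert P[frozenset(omega - A)] == 1 - P[A]  # Definition 2 (ii): P(A^c) = 1 − P(A)
assert P[A | B] == P[A] + P[B]              # Definition 2 (iii), finite case

On a finite Ω the power set works fine as F; the subtleties that motivate the Borel σ-field
below only arise on uncountable spaces such as R.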
For econometrics, the most relevant σ-field is the Borel σ-field, which is defined as follows.

Definition 3. The Borel σ-field B is the smallest collection of sets (called Borel sets) that
includes

(i) All open sets² of R;

(ii) The complement B^c of any set B ∈ B;

(iii) The union ∪_{i=1}^∞ Bi of any sequence {Bi} of sets belonging to B.

We can thus think of the Borel σ-field as consisting of all events on the real line to which
we can assign a probability.³ We can extend the definition of Borel σ-fields to q-dimensional
random variables.

Definition 4. The Borel σ-field B^q (q < ∞) is the smallest collection of sets (called Borel
sets) that includes

(i) All open sets of R^q;

(ii) The complement B^c of any set B ∈ B^q;

(iii) The union ∪_{i=1}^∞ Bi of any sequence {Bi} of sets belonging to B^q.

² Recall that an open set is a set containing only interior points, where a point x in the set B is interior
provided that all points in a sufficiently small neighborhood of x ({y : |y − x| < ε} for some ε > 0) are also in
B. Thus (a, b) is open, while (a, b] is not.
³ There do exist subsets of the real line that are not in B, for which probabilities are not defined. This is
not so important for us.

In chapter 2 of Hayashi we are interested in the behavior of Wi = (yi, x′i)′, which is
q = (1 + K)-dimensional.⁴ More specifically, we are interested in infinite sequences of these
vectors (n → ∞).
Corresponding to these are the Borel sets of R^{q∞}, defined as the Cartesian product of a
countable infinity of copies of R^q, R^{q∞} = R^q × R^q × · · · . In what follows we can think of ω as
taking its values in Ω = R^{q∞}. The events in which we are interested are the Borel sets of R^{q∞},
which we define as follows.
Definition 5. The Borel sets of R^{q∞}, denoted B^q_∞, are the smallest collection of sets that
includes

(i) all sets of the form ×_{i=1}^∞ Bi, where each Bi ∈ B^q and Bi = R^q except for finitely many i;

(ii) the complement F^c of any set F in B^q_∞;

(iii) the union ∪_{i=1}^∞ Fi of any sequence {Fi} in B^q_∞.

A set of the form specified by (i) is called a measurable finite-dimensional product cylinder,
so B^q_∞ is the Borel σ-field generated by all the measurable finite-dimensional product
cylinders. When (R^{q∞}, B^q_∞) is the measurable space, a probability measure P on
(R^{q∞}, B^q_∞) will govern the behavior of events involving infinite sequences of finite-dimensional
vectors, just as we require. In particular, when q = 1, the elements Wi(·) of the
sequence {Wi} can be thought of as functions from Ω = R^{1∞} to the real line R that simply
pick off the i-th coordinate of ω ∈ Ω; with ω = {wi}, Wi(ω) = wi. When q > 1, Wi(·) maps
Ω = R^{q∞} into R^q.
The following definition formally defines when a function is measurable.
Definition 6. A function g on Ω to R is F-measurable if for every real number a the set
[ω : g(ω) ≤ a] ∈ F.
When a function is F-measurable, it means that we can express the probability of an
event, say, [Wi ≤ a], in terms of the probability of an event in F, say, [ω : Wi (ω) ≤ a]. In
fact, a random variable is precisely an F-measurable function from Ω to R.
When the σ-field is taken to be B^q_∞, the Borel sets of R^{q∞}, we shall drop explicit reference
to B^q_∞ and simply say that the function g is measurable. Otherwise, the relevant σ-field will
be explicitly identified.
Proposition 1. Let f and g be F-measurable real-valued functions, and let c be a real
number. Then the functions cf , f + g, f g, and |f | are also F-measurable.
Proof. See Theorem 13.3 in Billingsley.
A function from Ω to R^q is measurable if and only if each component of the vector-valued
function is measurable.
⁴ The notation Wi is generic in this note. It corresponds to Hayashi Chapter 3, where Wi is defined as the
unique elements of the vector (yi, x′i, z′i)′, which includes instrumental variables as well. A special case of this
is where all variables are exogenous.

2 Measure-preserving transformations and stationarity
The notion of measurability extends to transformations from Ω to Ω in the following way.

Definition 7. Let (Ω, F) be a measurable space. A one-to-one transformation T : Ω → Ω
is measurable provided that T^{-1}(F) ⊂ F.

In other words, the transformation T is measurable provided that any set taken by the
transformation (or its inverse) into F is itself a set in F. This ensures that sets that are not
events cannot be transformed into events, nor can events be transformed into sets that are
not events.

Example 1. For any ω = (. . . , w_{t−2}, w_{t−1}, w_t, w_{t+1}, w_{t+2}, . . .) define the transformation ω′ =
T ω = (. . . , w_{t−1}, w_t, w_{t+1}, w_{t+2}, w_{t+3}, . . .), so that T transforms ω by shifting each of its
coordinates back one location. Then T is measurable since T(F) is in F and T^{-1}(F) is in
F, for all F ∈ F.

The transformation of this example is often called the shift, or the backshift operator.
By using such transformations, it is possible to define a corresponding transformation of
a random variable. For example, set W1(ω) = W(ω), where W is a measurable function
from Ω to R (see Definition 6); then we can define the random variables W2(ω) = W(Tω),
W3(ω) = W(T²ω), and so on, provided that T is a measurable transformation. The random
variables constructed in this way are said to be random variables induced by a measurable
transformation.
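
The following Python sketch (our own illustration; ω is truncated to a finite list purely for
display, whereas formally ω lives in Ω = R^{q∞}) shows the shift and the random variables
it induces.

def T(omega):
    """Backshift: the t-th coordinate of Tω is the (t+1)-th coordinate of ω."""
    return omega[1:]

def W(omega):
    """A measurable function of ω: here simply the first coordinate."""
    return omega[0]

omega = [0.3, -1.2, 0.7, 2.1, -0.4]   # a (truncated) point of the sample space

W1 = W(omega)         # W1(ω) = W(ω)    = 0.3
W2 = W(T(omega))      # W2(ω) = W(Tω)   = -1.2
W3 = W(T(T(omega)))   # W3(ω) = W(T²ω)  = 0.7

The induced sequence W1, W2, W3, . . . thus simply reads off successive coordinates of ω,
exactly like the coordinate functions discussed below Definition 5.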

Definition 8. Let (Ω, F, P) be a probability space. The transformation T : Ω → Ω is
measure preserving if it is measurable and if P(T^{-1}F) = P(F) for all F in F.

The random variables induced by measure-preserving transformations then have the prop-
erty that P[W1 ≤ a] = P[ω : W(ω) ≤ a] = P[ω : W(Tω) ≤ a] = P[W2 ≤ a]; that is, they
are identically distributed. In fact, such random variables have an even stronger property:
they are stationary, which is defined as follows.

Definition 9. Let G1 be the joint distribution function of the sequence {W1, W2, . . .}, where
Wi is a q × 1 vector, and let G_{r+1} be the joint distribution function of the sequence {W_{r+1}, W_{r+2}, . . .}.
The sequence {Wi} is stationary if G1 = G_{r+1} for each r ≥ 1.

In other words, a sequence is stationary if the joint distribution of the variables in the
sequence is identical, regardless of the date of the first observation.
The following two propositions link measure-preserving transformations (Definition 8)
and stationarity (Definition 9).

Proposition 2. Let W be a random variable (i.e., W is a measurable function) and T be
a measure-preserving transformation. Let W1(ω) = W(ω), W2(ω) = W(Tω), . . . , Wn(ω) =
W(T^{n−1}ω) for each ω in Ω. Then {Wi} is a stationary sequence.

Proof. See Stout page 169.

Proposition 2 implies that we can view stationary sequences as defined in Definition 9 as
those resulting from measure-preserving transformations. The converse is also true.

Proposition 3. Let {Wi} be a stationary sequence. Then there exists a measure-preserving
transformation T defined on (Ω, F, P) such that W1(ω) = W(ω), W2(ω) = W(Tω), . . . ,
Wn(ω) = W(T^{n−1}ω) for all ω in Ω.

Proof. See Stout page 170.

That is, if {Wi} is a stationary sequence (as in Definition 9), then there exists a
measure-preserving transformation that could have generated the sequence.

Remark 1. It is useful to compare the notion of stationarity to the more familiar i.i.d.
assumption. First, note that stationarity is a strengthening of the identical distribution
assumption, since it applies to joint and not simply marginal distributions. On the other
hand, stationarity is weaker than the i.i.d. assumption: i.i.d. sequences are stationary,
but stationary sequences do not have to be independent.
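
A short simulation makes the remark concrete. The sketch below is our own (the Gaussian
AR(1) model and the value ρ = 0.8 are arbitrary illustrative choices): drawn from its
stationary distribution, the process is stationary, yet neighboring observations are strongly
correlated, so it is far from i.i.d.

import numpy as np

rng = np.random.default_rng(0)
n, rho = 10_000, 0.8

# Gaussian AR(1): W_i = rho * W_{i-1} + e_i with e_i ~ N(0, 1). Starting from
# the stationary N(0, 1/(1 - rho^2)) distribution makes the whole sequence
# stationary, while corr(W_i, W_{i+r}) = rho^r is clearly nonzero.
W = np.empty(n)
W[0] = rng.normal(scale=np.sqrt(1 / (1 - rho**2)))
for i in range(1, n):
    W[i] = rho * W[i - 1] + rng.normal()

print(np.corrcoef(W[:-1], W[1:])[0, 1])   # ≈ 0.8: identically distributed, not independent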

3 Ergodicity
Does a version of the law of large numbers hold if the i.i.d. assumption in Kolmogorov's
strong law is simply replaced by the stationarity assumption? The answer is no, unless
additional restrictions are imposed. To illustrate the problem, consider the following example.

Example 2. Let {Ui} be a sequence of i.i.d. random variables uniformly distributed on [0, 1]
and let Z ∼ N(0, 1), independent of Ui, i = 1, 2, . . . . Define Yi = Z + Ui. Then {Yi} is
stationary, but (1/n) Σ_{i=1}^n Yi does not converge to E(Yi) = 1/2. Instead,
(1/n) Σ_{i=1}^n Yi − Z → 1/2 almost surely.

In this example, Ȳn = (1/n) Σ_{i=1}^n Yi converges to a random variable, Z + 1/2, rather than
to a constant. The problem is that there is too much dependence in the sequence {Yi}.
No matter how far into the future we take an observation on Yi, the initial value Y1 still
determines to some extent what Yi will be, as a result of the common component Z. In
fact, the correlation between Y1 and Yi is positive for every value of i. To obtain a law
of large numbers, we have to impose a restriction on the dependence or "memory" of the
sequence. One such restriction is the concept of ergodicity.
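
A quick simulation of Example 2 (our own sketch; seed and sample size are arbitrary) shows
the failure directly: each independent realization of Z drags the sample mean to a different
limit, Z + 1/2, rather than to E(Yi) = 1/2.

import numpy as np

rng = np.random.default_rng(42)
n = 100_000

for _ in range(3):                      # three independent realizations of the sequence
    Z = rng.standard_normal()           # the common component, drawn once per realization
    U = rng.uniform(0.0, 1.0, size=n)   # U_i i.i.d. uniform on [0, 1]
    Y = Z + U
    # The sample mean settles at Z + 1/2, not at E(Y_i) = 1/2.
    print(f"Z + 1/2 = {Z + 0.5:+.3f},   mean(Y) = {Y.mean():+.3f}")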

Definition 10. Let (Ω, F, P) be a probability space. Let {Wi} be a stationary sequence and
let T be the measure-preserving transformation of Proposition 3. Then {Wi} is ergodic if

lim_{n→∞} n^{-1} Σ_{i=1}^n P(F ∩ T^i G) = P(F)P(G)

for all events F, G ∈ F.

If F and G were independent, then we would have P(F ∩ G) = P(F)P(G). We can think
of T^i G as being the event G shifted i periods into the future, and since P(T^i G) = P(G)
when T is measure preserving, this definition says that an ergodic process (sequence) is one
such that, for any events F and G, F and T^i G are independent on average in the limit. Thus
ergodicity can be thought of as a form of "average asymptotic independence".⁵
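
To see how the definition relates to independence, it may help to check (a standard argument,
sketched here on our own initiative) that i.i.d. sequences are ergodic: if {Wi} is i.i.d. and F
and G are finite-dimensional cylinder events, then F and T^i G depend on disjoint blocks of
coordinates once i is large enough, so P(F ∩ T^i G) = P(F)P(G) for all sufficiently large i,
and the Cesàro average in Definition 10 converges to P(F)P(G); an approximation argument
extends this to all F, G ∈ F. Ergodicity thus genuinely weakens independence to independence
"on average in the limit".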

Theorem 1. Let {Wi} be a stationary ergodic scalar sequence with E|Wi| < ∞. Then

(1/n) Σ_{i=1}^n Wi → E(Wi) almost surely.

Proof. See Billingsley Theorem 24.1.

This result is known as the ergodic theorem and generalizes Kolmogorov’s strong law
of large numbers to allow for dependence. Note that the restriction to scalar sequences is
inconsequential as almost sure convergence applies component-wise.
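
As an illustration (our own sketch: the AR(1) below is a fixed measurable function of i.i.d.
innovations and is stationary and ergodic, so Theorem 1 applies; the parameter values are
arbitrary), the sample mean now does converge to the population mean despite the dependence.

import numpy as np

rng = np.random.default_rng(1)
n, rho, mu = 200_000, 0.8, 2.0

# Stationary ergodic AR(1) around mean mu: W_i - mu = rho * (W_{i-1} - mu) + e_i.
W = np.empty(n)
W[0] = mu + rng.normal(scale=np.sqrt(1 / (1 - rho**2)))
for i in range(1, n):
    W[i] = mu + rho * (W[i - 1] - mu) + rng.normal()

# Ergodic theorem: (1/n) Σ W_i → E(W_i) = mu almost surely.
print(W.mean())   # ≈ 2.0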
To apply this result in practice, we make use of the following result.

Theorem 2. Let g be an F-measurable function into R^k and define Yi = g(. . . , W_{i−1}, Wi, W_{i+1}, . . .),
where Wi is q × 1. (i) If {Wi} is stationary, then {Yi} is stationary. (ii) If {Wi} is stationary
and ergodic, then {Yi} is stationary and ergodic.

Proof. See Stout pp. 170, 182.

This result together with Proposition 1 implies that if {(yi, x′i)′} is a stationary ergodic
sequence, then {xi x′i}, {xi yi}, and {xi(yi − x′i β)} are stationary ergodic sequences.
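
As a final illustration of why this matters for the large sample regression framework (simulated
data of our own design, with a single regressor for simplicity), the sample moments behind the
OLS estimator then converge almost surely, which is the key step in the consistency argument
of Hayashi Chapter 2.

import numpy as np

rng = np.random.default_rng(7)
n, beta, rho = 100_000, 1.5, 0.5

# Stationary ergodic regressor x_i (an AR(1)) and y_i = x_i * beta + eps_i.
x = np.empty(n)
x[0] = rng.normal(scale=np.sqrt(1 / (1 - rho**2)))
for i in range(1, n):
    x[i] = rho * x[i - 1] + rng.normal()
y = x * beta + rng.standard_normal(n)

# By Theorem 2, {x_i^2} and {x_i y_i} are stationary ergodic, so their sample
# means converge a.s., and b = mean(x y) / mean(x^2) → beta.
b = (x * y).mean() / (x**2).mean()
print(b)   # ≈ 1.5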

4 References
• Billingsley, P. (1995). Probability and Measure, 3rd edition.

• Karlin, S., and H. M. Taylor (1975). A First Course in Stochastic Processes, 2nd edition.

• Stout, W. F. (1974). Almost Sure Convergence.

⁵ The definition that you find in Hayashi is an equivalent characterization of ergodicity (see Karlin &
Taylor, Theorem 5.6), but it contains several typos. It should be stated as (in Hayashi's notation):
A stationary process {zi} is said to be ergodic if, for any two bounded functions f : R^{k+1} → R and
g : R^{l+1} → R,

lim_{n→∞} |E[f(zi, . . . , z_{i+k}) g(z_{i+n}, . . . , z_{i+n+l})]| = |E[f(zi, . . . , z_{i+k})]| · |E[g(zi, . . . , z_{i+l})]|.

Hence there are two typos: (1) the functions f (resp. g) are from R^{k+1} (resp. R^{l+1}) instead of R^k (resp. R^l),
and (2) there should be no n on the right-hand side.
