Download as pdf or txt
Download as pdf or txt
You are on page 1of 49

Characteristic functions

1/13/12

Literature
Rick Durrett, Probability: Theory and Examples, Duxbury 2010.
Olav Kallenberg, Foundations of Modern Probability, Springer 2001,
Eugene Lukacs, Characteristic Functions, Hafner 1970.
Robert G. Bartle, The Elements of Real Analysis, Wiley 1976.
Michel Loève, Probability Theory, Van Nostrand 1963.

Contents
1 Definition and basic properties 2

2 Continuity 4

3 Inversion 7

4 CLT 9
4.1 The basic CLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2 Lindeberg-Feller Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3 Poisson convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

5 Poisson random measure 19


5.1 Poisson measure and integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.2 About stochastic processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.3 Classical Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.4 Transformations of Poisson process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.4.1 Nonhomogeneous Poisson process . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.4.2 Reward or compound Poisson process . . . . . . . . . . . . . . . . . . . . . . 27
5.5 A few constructions of Poisson random measure . . . . . . . . . . . . . . . . . . . . . 29
5.5.1 adding new atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.5.2 gluing the pieces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.5.3 using a density of a random element . . . . . . . . . . . . . . . . . . . . . . . 31
5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.7 Non-positive awards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.8 SSα - symmetric α-stable processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

1
5.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6 Infinitely divisible distributions 40


6.1 Preliminaria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.2 A few theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.3 A side trip: decomposable distributions . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.4 ID of Poisson type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.5 Lévy-Khinchin formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

1 Definition and basic properties


Let µ be a probability distribution of a random variable X. The characteristic function, a.k.a.
Fourier transform, is the complex valued one-parameter function

µ̂(t) = φ(t) = eıtx µ(dx) = E eitX .
R

Similarly we define the ch.f. of a probability distribution µ = L(X) on Rd or in a Hilbert space


⟨ ⟩
where tx = t, x is the inner product. The definition applies also to finite measures, even to signed
measures of bounded variation. The term “characteristic function” is restricted to probability
measures.

b(t) = E eitX has the properties:


Proposition 1.1 Every ch.f. φ(t) = µ

1. φ(0) = 1;

2. |φ| ≤ 1;

3. φ is uniformly continuous on R.

4. φ is semi-positive definite, i.e.,


∑∑
φ(tj − tk ) zj zk ≥ 0, for every finite sets { tj } ⊂ R, { zj } ⊂ C
j k

Proof.
isX i(s−t)X

(3): |φ(s) − φ(t)| ≤ E e −eitX
≤ E 1 − e ≤ E 1 ∧ |s − t|X.
∑ 2 ∑ ∑

(4): 0 ≤ E zj E eitj X = φ(tj − tk )zj zk .
j j k

A probabilist should know ch.fs. of basic probability distributions by heart and how they behave
under simple transformations. To wit:

2
Proposition 1.2 1. φaX (t) = φX (at), hence φ−X = φ.

2. A convex combination of ch.fs. is a ch.f.

3. Hence, given a ch.f. φ, ℜφ = (φ + φ)/2 is a ch.f.

4. The finite product of ch.fs. is a ch.f. Namely,

φX1 · · · φXn = φX1′ +···+Xn′ ,

where Xk ’s are independent copies of Xk . In other words,

µˆ1 · · · µˆn = (µ1 ⊗ · · · ⊗ µn )ˆ.

5. Hence, given a ch.f. φ, |φ| and the natural powers φn and |φ|n are ch.fs.
D
6. a ch.f. is real if and only if1 X is symmetric, i.e. X = −X.

Notice that We will present examples as needed.

Example 1.3 A “duality”.


2(1 − cos t) 1 − cos x
The triangular density (1 − |x|)+ has the ch.f. 2
. The Polya density has the
t πx2
ch.f. (1 − |t|)+ .
1
The symmetrized exponential distribution with the density e−|x| /2 has the ch.f. . The ch.f.
1 + t2
1
of the Cauchy density equals e−|t| .
π(1 + x2 )

Using the idea from the proof of (3) of Proposition 1.1, for a family µα = L(Xα ) we obtain the
upper estimate that involves the standard L0 -metric:

sup |φα (s) − φα (t)| ≤ sup ∥(s − t)Xα ∥0


α α

Corollary 1.4 If the family { µα } is tight (i.e., { Xα } is bounded in L0 ), then { φα } is uniformly


equi-continuous.

The opposite implication is also true.


1
only the “if” part is obvious now

3
Lemma 1.5 Consider µ = L(X), φ = µ̂. Then, for r > 0,

r 2/r
(1) P(|X| ≥ r) = µ[−r, r]c ≤ (1 − φ(t)) dt,
2 −2/r
∫ 1/r
(2) P(|X| ≤ r) = µ[−r, r] ≤ 2r |φ(t)| dt.
−1/r

Proof.
W.l.o.g. we may and do assume that r = 1 (just consider X/r and change the variable in the right
hand side integrals).

(1): By Fubini’s theorem the right hand side equals


∫ ( )
1 2( ) sin 2X
E 1−e itX
dt = E 2 1 − ≥ E 1I{|X|≥1} = P(|X| ≥ 1).
2 −2 2X

(2): In virtue of Fubini’s theorem the left hand side is estimated as follows, using the formula for
the ch.f. of the triangular density:
∫ ∫ ∫
1 2(1 − cos X)
E 1I{|X|≤1} ≤ E =E (1 − |t|)+ e
itX
dt = (1 − |t|)+ φ(t) dt ≤ |ϕ(t)| dt.
2 X2 R R R

Corollary 1.6 If a family { φα } of ch.fs. is equicontinuous at 0, then µα is tight.

Proof. Let ϵ > 0 and δ > 0 be such that supα |1 − φα (t)| < ϵ/2 whenever |t| < δ. Let r0 = 2/δ.
Then (1) in the Lemma entails

sup P(|Xα | ≥ r) ≤ ϵ for r > r0


α

2 Continuity
Theorem 2.1 (Lévy Continuity Theorem)
cn and φ0 = µ
For ch.fs. φn = µ c0 the following are equivalent:

1. φn → φ0 pointwise;
w
2. µn → µ0 ;

3. φn → φ0 uniformly on every interval.

Proof. (2) ⇒ (1) follows by the definition of weak convergence and (3) ⇒ (1) is obvious.

The remaining nontrivial implications (1) ⇒ (2) and (2) ⇒ (3) would be much easier to prove
if the measures would have the common bounded support, i.e., underlying random variables were

4
bounded. However, each of the assumptions implies that the family { µn } is tight, i.e. they are
almost supported by a compact set.

(1) ⇒ (2): Assume the point convergence of ch.fs., which means that µn et → µ0 et for special
functions et (x) = eitx , and thus for their finite linear combinations, forming an algebra A. We
must show that µn f → µ0 f for every continuous bounded function on R.

We infer that { µn } is tight. Indeed, let ϵ > 0 and choose r > 0 such that |1 − φ(t)| < ϵ/4 for every
t ∈ [−r, r]. By Lemma 1.5.1 and the Dominated Convergence Theorem
∫ ∫
r 2/r r 2/r
lim sup µn [−r, r] ≤ lim sup
c
(1 − φn (t)) dt = (1 − φ(t)) dt < ϵ/2
n n 2 −2/r 2 −2/r
Then there is n0 such that
sup µn [−r, r]c < ϵ.
n>n0

At the same time there is r′ > 0 such that

sup µn [−r′ , r′ ]c < ϵ


n≤n0

Taking R = r ∨ r′ ,
sup µn [−R, R]c < ϵ.
n
For any continuous bounded complex function h on R
∫ ∫


h µn − h µ ≤ 2||h||∞ ϵ. (2.1)
[−R,R]c [−R,R]c

The Stone-Weierstrass Theorem (cf., e.g., Bartle, Theorem 26.2) says:

If K ⊂ Rd is compact, and A is a separating algebra with unit that consists of complex functions
on K, then every continuous function on K can be uniformly approximated by members of A.

Take K = [−R, R] and the algebra A, defined above. In virtue of the Stone-Weierstrass Theorem
there is g ∈ A such that ||f − g||K < ϵ. Hence, using (2.1) for h = f and for h = g,


(µn − µ0 )f ≤ (µn − µ0 )f 1IK c + ≤ (µn − µ0 )f 1IK


≤ 2||f ||∞ ϵ + (µn − µ0 )(f − g)1IK + (µn − µ0 )g1IK c + (µn − µ0 )g
( )

≤ 2||f ||∞ + 2 + 2||g||∞ ϵ + (µn − µ0 )g

Let n → ∞ and then ϵ → 0.

(2) ⇒ (3): Assume the weak convergence, consider an interval [−T, T ], and let ϵ > 0. So,
{ µn : n ≥ 0 } is tight, hence there is r > 0 such that


ϵ
sup eitx µn (dx) ≤ sup µn [−r, r]c ≤ . (2.2)

n≥0 [−r,r]c n 4

5
Then
∫ ∫ ∫
r

|φn (t) − φ(t)| ≤ itx
e µn (dx) + itx
e µ0 (dx) + e (µn − µ0 )(dx) = A + B + C
itx
[−r,r]c [−r,r]c −r

By (2.2), the sum A + B of the two first terms is bounded by ϵ/2.

To estimate the third term, consider a partition (xk ) of [−r, r] of mesh < ϵ/(8T ), chosen from the
continuity set of µ0 . In particular, we may enlarge the interval [−r, r], so −r, the first point of the
partition, and r, the last point of the partition, are also continuity points. In short,
∫ r ∑ ∫ xk
=
−r k xk−1

Adding and subtracting the term eitxk on each interval (xk−1 , xk ], C is bounded from above by the
following expression:

∑ ∫ xk ( ) ∑ ∫ xk ( ) ∑

eitx − eitxk µn (dx) + eitx − eitxk µ0 (dx) + µn (xk−1 , xk ] − µ0 (xk−1 , xk ]
xk−1 xk−1
k k k

Since eitx − eitxk ≤ t|x − xk |, hence

∑ ∫ xk ( ) ϵ ∑ ∫ xk

ϵ

e −e
itx itxk
µn (dx) ≤ µn (dx) ≤ ,
xk−1 8 xk−1 8
k k

so the sum of the first two terms in the latter estimate is less than ϵ/4. For the third term, choose
n0 such that, for every n ≥ n0 ,
∑ ϵ

µn (xk−1 , xk ] − µ(xk−1 , xk ] < .
4
k

So, for every ϵ > 0 and every interval [−T, T ] there is n0 such that for every t ∈ [−T, T ] there holds
|φn (t) − φ(t)| < ϵ, which completes the proof of (2) ⇒ (3).

Corollaries and remarks

1. The Lévy Continuity Theorem easily extends to Rd . In the language of random vectors
D D
Xn → X ⇐⇒ a · Xn → a · X, a ∈ Rd

2. Lévy Uniqueness Theorem φ = φ0 ⇐⇒ µ = µ0 .

3. The assumption that the limit φ0 is a ch.f. can be relaxed.

It suffices to assume that φ0 is continuous at 0, then it will be a ch.f. of some measure µ0 .

6
This follows by Prokhorov’s theorem. The continuity at 0 is necessary. For example, the ch.f.
sin nt
of the uniform distribution on [−n, n] is nt , which converges to 1I{0} (t). This sequence of
measures is not tight. It generates a continuous functional on Cb (R), a sort of generalized
integral akin to Césaro sum: ∫ n
1
Λf = lim f (x) dx
n 2n −n
which does not correspond to a measure.

4. The Lévy Continuity Theorem extends to bounded measures, also to signed measures with
bounded variation, after the weak convergence is augmented by the condition limn µn R = µR.

3 Inversion
Define the signum function as
σ = sign = 1I(0,∞) − 1I(−∞,0) .

The following formula involves an improper integral of a function that is not integrable:
∫ ∞
sin ux
dx = πσ(u).
−∞ x

Its value follows from the Cauchy theorem that states that an analytic function f (z) on an open
simply connected domain in the complex plane entails the curve integral vanishing over a rectifiable
eiz
simple closed curve. So, choosing u ̸= 0 and then u = 1, f (z) = z is analytic on the complement
of any closed disk centered at the origin. Denote by S(r) the semidisk |z| ≤ r, ℑz ≥ 0, and let
C = C(ϵ, r) be the boundary of S(r) \ S(ϵ), oriented counterclockwise. Then, using the standard
parametrization of four fragments - two segments and two semicircles, we have
I ∫ r ix ∫ π ∫ −ϵ ix ∫ 0 ∫ ∞
e ireiθ e iϵeiθ sin x
0= f (z) dz = dx + i e dθ + dx + i e dθ → i − iπ
C ϵ x 0 −r x π −∞ x

as ϵ → 0, r → ∞. This approach allows to define the integral


∫ ∞ iut ∫
e def eiut
dt = lim dt = πσ(u)
−∞ it ϵ→0 |t|>ϵ it

Alternatively, we may consider a more standard option


∫ ∞ iut
e − eivt
dt = π (σ(u) − σ(v))
−∞ it

which entails ∫ ∞
ei(x−a)t − ei(x−b)t
dt = 2π1I(a,b) (x) + π1I{a,b} (x) (3.1)
−∞ it

7
As noted previously, the ch.f. of the uniform distribution on [−T, T ]
sin T t
→ 1I{0} (t) as T → ∞.
υT (t) =
Tt
∑ ∑
Hence, for a bounded discrete measure µ = n an δxn , i.e. for which an ≥ 0 and n an < ∞,
∫ ∑
1 T
sin T (xn − xk )
lim µ̂(t)e−itxk = lim an = ak .
T →∞ 2T T T →∞
n
T (xn − xk )

On the other hand, ∫


1 ∞
e−itv ∑
µ̂(t) dt = an σ(xn − v),
π ∞ it n

whence, for u < v,


∫ ∞ ∑ σ(xn − u) − σ(xn − v)
1 e−itu − e−itv µ { u, v }
µ̂(t) dt = an = µ(u, v) + . (3.2)
2π ∞ it n
2 2

Since µ = µd + µc , with the pure discrete and pure continuous part, we may consider only the
latter.

Theorem 3.1 (Inversion Theorem) Let µ be an atomless bounded measure. Then, for every
a < b, ∫ ∞
1 e−iat − e−itb
µ(a, b) = µ̂(t) dt
2π −∞ it
Proof. By Fubini’s theorem for improper integrals (Exercise 2) we rewrite the right hand side and
use (3.1):
∫ ∞ ∫ ∞ ∫ ∞
1 eit(x−a) − eit(x−b)
dt = 1I(a,b) (x) µ(dx) = µ(a, b).
−∞ 2π −∞ it −∞

Corollary 3.2 If φ = µ̂ is integrable, then µ is absolutely continuous and its density f (x) is
continuous: ∫ ∞
1
f (x) = φ(t) e−itx dt.
2π −∞
Proof. Choose a = x, b = x + h in the theorem, divide by h, and let h → 0. The proof of continuity
is left as an exercise.

8
4 CLT

4.1 The basic CLT

A ch.f. can be perceived as a path in the unit disk of the complex plane. The ch.f. eita of a
point mass δa is a periodic circular path, visiting the point 1 infinitely often. So does the ch.f. of
a discrete measure with finitely many co-rational atoms xn (that is, for some t, txn is an integer
for every n). Periodicity will result from a lattice distribution of atoms (i.e., when they form an
arithmetic sequence). Otherwise, only lim sup |φ(t)| = 1 is certain (Lukacs, 2.2).
t→∞

On the other hand, lim |φ(t)| = 0 indicates an absolutely continuous (atomless) measure (ibid.)
t→∞

It follows immediately that the existence of the k-th moment of µ entails the existence of the k-th
derivative of µ̂. The inverse implication is not quite simple and we will not go there (but see
Exercise 3b).

For a complex valued function g ∈ C n+1 (R), Taylor’s theorem states that

n
g (k) (0)
g(x) = xk + Rn (x).
k!
k=0

Among various versions of the remainder, we choose the integral form:



1 x (n+1)
Rn (x) = g (t) (x − t)n dt
n! 0
The formula is true under weaker assumptions. For example, it suffices to assume that the n-th
derivative is absolutely continuous. Then the (n + 1)-the derivative exists in the Radon-Nikodym
sense and is integrable on every bounded interval. However, we are interested only in the smooth
function g(x) = eix , with a simple remainder that can be further refined by integrating by parts:
∫ ( n ∫ x )
in+1 x it x 1
Rn (x) = e (x − t) dt = i −
n n
+ e (x − t)
it n−1
dt .
n! 0 n! (n − 1)! 0
Hence we obtain two upper estimates that we merge to one:
|x|n+1 2|x|n
|Rn (x)| ≤ ∧
(n + 1)! n!
which is bounded by the second term for |x| > 2(n + 1).

Corollary 4.1 Let E |X|n < ∞, φ denote the ch.f. of X, and mk = E X k , k = 0, ..., n. Then

n
ik mk |tX|n+1 2|tX|n
φ(t) = tk + Rk (t), where |Rk (t)| ≤ E ∧
k! (n + 1)! n!
k=1

E.g., for n = 2,
t2
φ(t) = 1 + it E X − E X 2 + R2 (t)
2

9
where
|tX|3
|R2 (t)| ≤ E ∧ |tX|2 .
6

We hardly ever need this “precision” with 16 . Let’s make it cruder and simpler:

|R2 (t)| ≤ E |tX|2 (|tX| ∧ 1) ≤ cT E X 2 (|X| ∧ 1), (4.1)

where |t| ≤ T and cT = T 2 (T ∨ 1).

Note the immediate application that involves the standard Gaussian distribution N (0, 1):

1
with the density √ e−x /2 and the ch.f. e−t
2 2 /2
. (4.2)

The Central Limit Theorem


Let Xn ∈ L2 be i.i.d. with ch.f. φ. W.l.o.g. we may and do assume that E Xn = 0 and E Xn2 = 1.
Then
X1 + · · · + Xn D
Yn = √ → N (0, 1)
n
Proof. The ch.f. of Yn can be estimated as follows:
( )
√ t2 √ n
pn = φ(t/ n)n = 1 − + R2 (t/ n ,
2n

That is, we can rewrite it a


( an )n
pn = 1 − , where an → a = t2 /2
n
Given ϵ ∈ (0, a), we find n0 such that |an − a| < ϵ for every n > n0 , so
( ) ( )
a+ϵ n a−ϵ n
1− ≤ pn ≤ 1 −
n n

That is,
e−a−ϵ ≤ lim inf pn ≤ lim sup pn ≤ e−a+ϵ
n n

Hence limn pn = e −t2 /2 .

10
4.2 Lindeberg-Feller Condition

The CLT is one of the basic examples of a limit theorem that establishes a limit distribution of a
sequence of random variables (Sn ), subject to affine transformations:
S n − bn
,
an
where an , bn are scalar sequences, and Sn may depend on observed data, expressed as random
variables, and be their function (a.k.a. a statistic), e.g., the sum, maximum, minimum, etc.
For example, under moment assumptions, the centering scalar bn could be the mean while the
scaling scalar an could be the standard deviation of the transformed variable. The independence
assumption may be relaxed, the moment assumptions may be dropped, so the centering and scaling
constants might not be related to moments at all.

Presently we consider a random array [ξnk : n ∈ N, k ≤ n] and denote

Sn = ξn1 + · · · + ξnn .

We assume that

(1) For every n, ξn1 , . . . , ξnn are independent;


(2) For every n and k ≤ n, E ξnk = 0, σnk
2 = E |ξ |2 < ∞;
nk
(4.3)

n
(3) s2n = Var(Sn ) = E |Sn |2 = 2
σnk = 1.
k=1

Introduce also the Lindeberg-Feller condition



n
[ 2 ]
∀ ϵ > 0 lim ℓn (ϵ) = 0, where ℓn (ϵ) = E ξnj ; |ξnj | > ϵ (4.4)
n
j=1

Theorem 4.2 Let a triangular random matrix [ξnk ] satisfy (4.3). Then,
2
(a) max σnk → 0,
k
(b) L(Sn ) → N (0, 1).

if and only if the Linderberg-Feller condition (4.4) is satisfied.

Proof. Assume (4.4). Then (a) follows since

2
E ξnk ≤ ϵ + ℓn (ϵ).

Consider a Gaussian random matrix [ζnk ] with all characteristics (4.3), and denote Zn = k ζnk .
Clearly, Zn ∼ N (0, 1). Let |t| ≤ T .

itS ∏ itξ ∏ ∑
itξnk

itζnk
E e n − E e n = E e nk −
itZ
E e nk ≤
itζ
E e − E e ,

k k k

11
where the last inequality for products of complex numbers from the unit disk follows by induction.
Continuing, the latter term is bounded by
∑ 2 2

itζ 2 2

E eitξnk − 1 + t σnk + E e nk − 1 + t σnk .
2 2
k k

Using the error estimate (4.1), the first of the above terms is bounded by

cT E |ξnk |2 (1 ∧ |ξnk |) ≤ ϵ cT s2n + cT ℓn (ϵ)
k

Denoting the p’th absolute moment of a N (0, 1) Gaussian r.v. by mp , the p-th moment of N (0, σ 2 )
Gaussian r.v. ζ equals
E |ζ|p = σ p E |ζ/σ|p = mp σ p

Hence, the second term is bounded by


∑ ∑ ∑ 3/2 1/2
cT E |ζnk |2 (1 ∧ |ζnk |) ≤ cT E |ζnk |3 = cT m3 E σnk ≤ cT m3 max σnk
k
k k k

Now, let n → ∞, and then ϵ → 0.

To show that Lindeberg’s condition is necessary, assume (a) and (b), and fix ϵ > 0 and t > 0.
Assuming (b), the weak convergence of laws Sn to the symmetric normal law, we infer that
∑ t2
ln E cos tξnk = ln ℜE eitSn → − .
2
k

We claim that
∑ ( ) t2
E 1 − cos tξnk → . (4.5)
2
k
Indeed, let’s write
∑ ∑ ( )
bn = ln E cos tξnk + E 1 − cos tξnk .
k k

Applying the inequality | ln z + 1 − z| ≤ |1 − z|2 to z = E cos tξnk , and next using the estimate
1 − cos u ≤ u2 /2 (with u = tξnk ), we infer that
∑ 2 ∑ ( t2 ξ 2 )2 t4 max σ 2 ∑
k nk
|bn | ≤ E (1 − cos tξnk ) ≤ E nk
≤ 2
σnk → 0,
2 4
k k k

which proves (4.5), by virtue of the assumed condition (a). On the left hand side of (4.5), consider
a single term with ξ = ξnk . Then, since 1 − cos u ≤ u2 /2 and 1 − cos u ≤ 2,

E (1 − cos tξ) = E [1 − cos tξ; |ξ| ≤ ϵ] + E [1 − cos tξ; |ξ| > ϵ]


t2
≤ E [ξ 2 ; |ξ| ≤ ϵ] + 2P(|ξ| > ϵ)
2
t2 2
≤ E [ξ 2 ; |ξ| ≤ ϵ] + 2 E ξ 2 ,
2 ϵ

12
where the second term was estimated with the help of Chebyshev’s inequality. In other words,
2 4
E [ξ 2 ; |ξ| ≤ ϵ] ≥ 2
E (1 − cos tξ) − 2 2 E ξ 2
t t ϵ 2)
2( t 4
= 1 + 2 E (1 − cos tξ) − − 2 2 E ξ2
t 2 t ϵ
Return to ξnk , sum up along j, and let n → ∞, Then, (4.5) and the normalizing condition s2n → 1
imply
∑ 4
lim inf 2
E [ξnk ; |ξnk | ≤ ϵ] ≥ 1 − .
n t 2 ϵ2
k
Although t was fixed, it is arbitrary. Now, let t → ∞,

Corollary 4.3 (Lyapunov) Let ξnk fulfill assumptions of Lindeberg’s theorem, and let δ > 0.
Then
[ ∑ ] [ ]
d
λn (δ) = E |ξnk |2+δ → 0 ⇒ Sn → ζ
k

Proof. Indeed, ℓn ≤ λn (δ)/ϵδ , so Lyapunov’s condition implies Lindeberg’s.

We return to the sequence of independent random variables (ξk ). As before, assume E ξ = 0, σk2 =
E ξ 2 < ∞, Sn = ξ1 + · · · + ξn , sn = σ12 + · · · + σn2 . Let Fn = P(Sn /sn ≤ x) and Φ(x) = P(ζ ≤ x). If
w
Fn → Φ, it is desirable to know how fast the convergence occur. That is, estimates of

dist (Fn , Φ)

are of great practical and theoretical importance, where “dist” - preferably a metric - measures the
convergence. Although the Lévy-Prokhorov metric seems to be the most natural choice since it
metrizes the weak convergence of measures, its specific definition makes it difficult to examine. The
uniform metric is stronger but more appropriate for applications. Of course, to obtain a stronger
mode of convergence, a stronger assumption is needed. Let F (x) = P(ξ ≤ x), G(x) = P(η ≤ x) be
probability distribution functions. Consider

dist(F, G) = ∥F − G∥∞ = sup |F (x) − G(x)|.


x

Theorem 4.4 (Berry (1941), Esseen (1945), Van Beek 1972) Assume E ξ = 0, E ξ 2 = 1,
and E |ξ|2+δ < ∞, for some δ > 0. Let (ξk ) be independent copies of ξ. Then

E |ξ|2+δ
∥Fn − Φ∥∞ ≤ c √ ,
n
where c is some universal constant, independent of n and of distribution of ξ, although it may
depend on δ.

No proof will be presented here (see Durrett).

13
4.3 Poisson convergence

Let [ξnk ] be a triangular array of random variables:

(1) values are whole numbers 0, 1, 2, ...;


(2) for every n, ξnk are independent; (4.6)
(3) max ∥ξnk ∥0 → 0 as n → ∞, where ∥ · ∥0 is any L0 -metric.
k

We are free to choose ∥X∥0 = E (1 ∧ |X|), or ∥X∥0 = E (1 − e−tX ) for a fixed t > 0.

As before, denote Sn = ξnk . We will discuss the weak convergence of its distribution to the
k
Poisson distribution on Z+ :
λn −t )
µ { n } = P(ξ = n) = e−λ
it −1)
, µ̂(t) = eλ(e , µ̃(t) = eλ(1−e .
n!
Lemma 4.5 Assume (4.6). Then the following conditions are equivalent:
D
1. Sn → ξ

2. − ln ψnk (t) → λ(1 − e−t ), t > 0.
k
∑( )
3. Cn = 1 − ψnk (t) → λ(1 − e−t ), t > 0.
k

Proof. The equivalence of the first two conditions follows by the Lévy Continuity Theorem for
Laplace transforms ψ(t) = E e−tX , after applying the logarithm.

For u ∈ [0, 1/2] we have the identity

− ln(1 − u) = u + r(u), where 0 ≤ r(u) ≤ u2 . (4.7)

To show the equivalence of the second and third condition, we apply it with u = unk = 1 − ψnk ,
which is arbitrarily small by the third assumption in (4.6). Then we sum up along k. That is,
either we assume (3), so Cn ≤ C, or we assume (2) which yields
∑ ∑
Cn = (1 − ψnk ) ≤ − ln ψnk ≤ C.
k k

Hence the remainder is bounded by


∑ ∑
rnk ≤ (1 − ψnk )2 ≤ C max ∥ξnk ∥0 → 0,
k
k k

where we choose ∥X∥0 = E (1 − e−tX ).

14
Theorem 4.6 Let a random triangular array [ξnk ] satisfy (4.6) and ξ be Poisson(λ). Then
∑ 
(a) P(ξnk > 1) → 0  
 ∑ D
∑k ⇐⇒ Sn = ξnk → ξ
(b) P(ξnk = 1) → λ  
 k
k

D
Proof. The necessity. Suppose that Sn → ξ ∼ Poisson(λ), and look at the third condition in
Lemma 4.5. For simplicity, denote s = e−t . So, Cn → λ(1 − s). For a single random variable X
with values in Z+ we have

E (1 − sX ) = E [1 − sX ; X > 0] = E [1 − s + s − sX ; X > 0] = (1 − s)P(X > 0) + R(s)

where

R(s) = E [s − sX ; X > 0] = E [s − sX : X > 1], hence(s − s2 )P(X > 1) ≤ R(s) ≤ sP(X > 1).

(because X > 1 ⇐⇒ X ≥ 2). Since


∑ ∑
Cn = (1 − s) P(Xnk > 0) + Rnk (s), where Rnk (0) = 0,
k k

thus
∑ ∑ ∑
P(Xnk > 0) → λ and Rnk (s) → 0, hence P (ξnk > 1) → 0
k k k
so (b) and (a) hold true.

The sufficiency. First, we reduce the range of r.vs. to the mere { 0, 1 }. Then (a) is trivially
′ =
true. Suppose that [*] “(b) is sufficient for the Poisson convergence for 0-1 r.vs.” Denote ξnk
1I{ξnk = 1}.
∑ D
′ . So, by [*] S ′ = ′
Assume (b), which is the same for both ξnk and ξnk n ξnk → ξ. But
k

Sn = Sn′ + Rn , where Rn = ξnk 1I{ξnk >1} .
k
∑ ∑
By (a), using the subadditivity 1 ∧ k ck ≤ k (1 ∧ ck ),
∑ ∑
E 1 ∧ Rn ≤ E 1 ∧ ξnk 1I{ξnk >1} = P(ξnk > 1) → 0,
k k

P D
i.e., Rn → 0, so Sn → ξ.

Now, assume (b) for an 0-1 array, and compute the logarithm of the Laplace transform E exp { −tSn } ,
with the notation unk = 1 − E exp { −tξnk } = (1 − e−t )P(ξnk = 1), using (4.7):
∑ ∑
− ln(1 − unk ) = (1 − e−t ) P(ξnk = 1) + Rn , where Rn → 0
k k

15
since

Rn ≤ u2nk ≤ C max ∥ξnk ∥0 .
k
k

Thus, we obtain the Laplace transform of Poisson(λ) in the limit.

Remark 4.7 In elementary probability courses the special case of i.i.d. Bernoulli ξnk ’s is known
as the Poisson approximation of the Binomial. Indeed, in this case Sn is binomial, where it
is also assumed that pn = P(Xn1 = 1) → 0, and then (b) means that npn → λ.

Exercise. What condition imposed on pn does ensure (or, is necessary and sufficient for) the
Lindeberg-Feller condition. i.e. Gaussian rather than Poisson convergence? Note that the each
entry ξnk needs to be standardized to fulfill the standing assumptions for the CLT for random
arrays.

16
4.4 Exercises

1. Verify the relations “density vs. ch.f.” in Example 1.3 and formula (4.2).

2. Let (S, S, µ) be a bounded measure space and f (t, s) a measurable real or complex function
on R × S. Assume that

• f (·, s) is locally (i.e., on every interval) integrable functions on R for almost every s ∈ S;
• f (t, ·) is µ-integrable for almost every t ∈ R;
∫ ∫ T
def
• The improper integral g(s) = f (t, s) dt = lim f (t, s) dt exists for almost every
R T →∞ −T
s ∈ S;

T
• sup f (t, s)dt is µ-integrable;
T −T

Then ∫ (∫ ) ∫
f (t, s) µ(ds) = g(s) µ(ds).
R S S

3. A discrete version of the above theorem [Abel’s convergence criterion for infinite series].
Prove:

∑N ∑

If an ↘ 0 and sup bn < ∞, then an bn converges.
N
n=1 n
Hint: write dn = an−1 − an and Bn = b1 + · · · + bn , and split the sum (discovering and proving
Abel’s “summation by parts” formula):


N ∑
N −1
an bn = aN BN − d n Bn .
n=0 n=0

∑ sin an ∑ cos an
(a) Let p > 0. Show that converges for every real a, and converges for
n
np n
np
a∈/ 2πZ.
∑ C
(b) Let µ = δ , where C makes probabilities of the sequence (n2 ln n)−1 . The first
2 ln n n
n
n
moment does not exist but µ̂ is differentiable for t ̸= 0. Prove also the same statement
when atoms { n } oscillate, i.e., replace δn by δ(−1)n n .

4. Show that the density in Corollary 3.2 is continuous.


∏ ∏ ∑
5. On the unit disk of the complex plane, | k zk − k wk | ≤ k |zk − wk |.

6. Find the arbitrary n-th absolute moment E |ζ|n of a standard N (0, 1) r.v. ζ. Hint: in the
first semester we evaluated even moments (while studying Marcinkiewicz-Zygmund-Paley
inequalities).

17
7. What condition imposed on pn in the triangular matrix of Bernoulli r.vs. (i.i.d. in each
row) will ensure (or be implied by) the Lindeberg-Feller condition. i.e. Gaussian rather than
Poisson convergence?

18
5 Poisson random measure
In science and beyond the most typical activity is counting. Scientists and beyondists count every-
thing, stars in sky sectors, pollutant particles in water or air, bird nests per area of Alabama (or
Alaska), coins in collections, gold nuggets in mines, Burmese pythons in Everglades, Occupieds per
city, customers in burger joints, votes in the GOP primary per Florida county, etc.

Typically, the count involves the number of items per region, may vary from 0 through all natural
numbers, and there is no reason to assume that there is a definite upper bound. All the listed -
and unlisted - examples involve measurable regions - linear, planar, spatial, etc. It stands to reason
to suppose that the count depend more on the measurement (length, area, volume) than on other
aspects like geometry or topology. Also, the counts in separate regions should be independent.

Of course, both assumptions are ideal but so are all human made models.

The randomness is entailed by the random distribution of items Yn - wether arrival moments on
the temporal line, or scatter points in the plane or surface, or in the space or 3D-manifold, or just
in an abstract set S. So, the count is

N (A) = 1IA (Yn ). (5.1)
n

The formula can be viewed from the measure-theoretic point of view. Denoting by δa the atomic
measure at a point a, we may write

N= δYn , (5.2)
n
and then for f ≥ 0, ∫ ∑
Nf = f dN = f (Yn ).
S n

Let (S, S, λ) be a σ-finite continuous measure space and (Ω, F, P) be a probability space, both
entailing the L0 -spaces of measurable function. While L0 (Ω) with convergence in P is metrizable

by the traditional metric E (1∧|X|), the analogous metric S (1∧|f |) dλ yields a topology essentially
stronger than the convergence in measure although weaker than L1 . On L∞ (S) it is L1 , though.

A sequence of random elements Yn in S (i.e., measurable mappings Ω 7→ S) entails a random


counting measure on S, a so called point process. It is not immediately clear whether the
converse is true, that is, a random counting measure requires random points to be counted.

The counting random measure is just one example with the concept of a random measure, i.e. a
mapping X : S → L0 (Ω, F, P) such that
∪ ∑
X An = XAn , An ∈ S are disjoint. (5.3)
n n

19
The series on the right should converge in probability and, a fortiori, the convergence must be
unconditional, i.e., independent of permutations of the indices. The range might be a narrower
subspace of L0 such as L1 or L2 . Thus, a random measure is factually a vector measure which
extends the classical concept of a nonnegative countably additive set function. For example, a
signed measure is an R-valued vector measure.

A deterministic control measure is a very convenient tool:

XAn → 0 ⇐⇒ λAn → 0.

Then it would suffice to introduce the random measure on a generator S0 . For example, when
S = Rd , and S consists of Borel sets, with a control measure it suffices to define X on simple
figures such as intervals.

20
5.1 Poisson measure and integral

The function x 7→ 1 ∧ x on the positive half-line can be replaced by another more convenient
function. Below we shall use ψ(x) = 1 − e−x for reasons that will soon become clear. So, for
random variables, the L0 -metric is
∥X∥0 = E ψ(|X|).

The mapping ξ : S → L0 (Ω, F, P) is called a Poisson random measure (PRM) if

1. ξA is Poisson(λA), for every A ∈ S of finite measure λ;

2. ξA and ξB are independent if A ∩ B = ∅.

We call λ the control measure of ξ. At this moment the issue of existence is not yet resolved but
properties can be easily derived.

Proposition 5.1 Let ξ be a PRM with a control measure λ. Let A1 , . . . , An be disjoint measurable
sets of finite measure. Then ξA1 , · · · , ξAn are independent, and their joint Laplace transform is
{ } { }
∑ ∑( )
E exp − tk ξAk = exp − 1 − e−itk λAk .
k k

Proof. The Laplace transform formula follows by induction and utilizes the property of Poisson
distribution: the sum of two independent Poisson random variables is again Poisson, and the
parameters add up.

First, we note that ξ is factually a countably additive function (in the sense to be explained) on

the δ-ring of S0 subsets of S of finite measure. Let A ∈ S0 and A = k Ak , where Ak ∈ S0 are
disjoint. Then
∑ ∑ ∑ ( ∑ )
E ξA = λA = λAk = E ξAk = E ξAk , or E ξ(A) − ξAk = 0
k k k k

(the r.v. in parentheses is nonnegative2 ). That is, ξ is countably additive as a mapping with values
in L1 (Ω, F, P).

Corollary 5.2 From the measure-theoretic point of view the Laplace transform formula appears
as the assignments
∑ ∑ ∫
def
f= tk 1IAk 7→ ξf = tk ξ(Ak ) = f (s) ξ(ds).
k k S
{ ∫ ( ) }
−f (s)
E exp { −ξf } = exp − 1−e λ(ds)
S
2
why?

21
The quantity entails a complete metric vector subspace of measurable functions on S. Recall
ψ(x) = 1 − e−x , x ≥ 0.

∥f ∥0 = ψ(|f |) dλ
S
{ }
L = f ∈ L0 (S) : ∥f ∥0 < ∞ , d(f, g) = ∥f − g∥0 ,

where simple functions form a dense subset. Write λF = S F dλ. Then the last formula in the
Corollary can be rewritten as
E ψ(ξf ) = ψ(λψ(f )).

In other words,
∥ξf ∥0 = ψ(∥f ∥0 ) (5.4)

which establishes a homeomorphism between the space of simple f ’s and their Poisson integrals.

Proposition 5.3 For a Poisson random measure ξ consider the positive cone L+ = { f ∈ L : f ≥ 0 }.
Then the mapping ξf , defined originally for simple functions, extends to a continuous positive-linear
mapping from L+ into L0+ (Ω, F, P), and (5.4) continues to hold

Then ξ extends to a continuous linear mapping on L, defined as ξf = ξf+ − ξf− .

Proof. Let f ∈ L+ and fn ≥ 0 be increasing simple measurable functions such that f = limn fn =
supn fn and ∥f − fn ∥0 → 0. Clearly, the well defined ξfn increase a.s. and ξfn is a Cauchy sequence
in L0 . Indeed, for n ≥ m, by (5.4)

∥ξfn − ξfm ∥0 = ψ(∥fn − fm ∥0 ) → 0

So ξf = limn ξfn exists in probability and hence a.s. (since the sequence increases). Further, (5.4)
is preserved in the limit, which ensures the other listed properties,

5.2 About stochastic processes

The definition of PRM contains a family of finite dimensional distributions

{ µA1 ,...,An : disjoint Ak ∈ S } .

Although it is easy to create a random vector (ξ1 , . . . , ξk ) with independent Poisson (λAk ) compo-
nents, the existence of a robust mapping ξ : S → L0 is not immediately obvious. It is a special
case of a more general problem.

22
Let T be a nonempty set and X = (Xt : t ∈ T ) be a family of real random variables, a.k.a.
stochastic process. By finite dimensional distributions (FDD) of X we understand the
Borel probability measures
( )
µt1 ,...,tn = L Xt1 , . . . , Xtn on B(Rn ), n ∈ N, t1 , . . . , tn ∈ T,

µτ in short, where τ = { t1 , . . . , tn }. So, more precisely, µτ is a Borel measure on Rτ . We notice


the obvious relation for m > n
( ) ( )
P Xt1 ∈ A1 , . . . , Xtn ∈ An , Xtn+1 ∈ R, . . . , Xtm ∈ R = P Xt1 ∈ A1 , . . . , Xtn ∈ An , Ak ∈ B(R)

In terms of probability measures, we say that their family is consistent. That is, for finite τ ′ ⊃ τ
( ′
)
µτ ′ A × Rτ \τ = µτ (A), A ∈ B(Rτ ).

So, the passage from the family of random variables (Xt ) to the consistent family of multidimen-
sional probability distribution (µτ ) is immediate but the inverse implication is highly nontrivial,
and is known as Kolmogorov Extension Theorem (cf., e.g. Theorem 6.16 in Kallenberg, or
the special case in Appendix A7 in Durrett).

Even the existence of an infinite sequence of independent random variables belongs to this category.
However, we introduced a countable product measure in the first semester. That is, if (Ωk , Fk , Pk )
is an infinite sequence of probability spaces, then there is a product measure P = P1 ⊗ P2 ⊗ · · · on

(⊗, F) = ( Ωk , F1 ⊗ F2 ⊗ · · ·).
k

In particular, if µk are Borel probability measures on R, then the well defined product measure

µ = µ1 ⊗ µ2 ⊗ · · · on (RN , B(RN ))

entails independent random variables

Xn (ω) = ωn , ω = (ωn ) ∈ RN .

Therefore, any constructive and intuitive approach should be appreciated.

23
5.3 Classical Poisson Process

There is one-to-one correspondence between increasing sequences yn on [0, ∞) and nondecreasing


piecewise constant right continuous functions n(t) with unit jumps (CF - for “counting functions”)
on [0, ∞):

given yn ↗ put n(t) = 1I[0,t] (yn );
n
given a CF n(t) put yn = nth jump of n(t);
In other words, for every t ≥ 0 and n = 0, 1, 2, ...

n(t) ≥ n ⇐⇒ yn ≤ t, or n(t) = sup { k : yk ≤ t } . (5.5)

For any nonnegative Borel function f on [0, ∞), the Lebesgue-Stieltjes integral is well defined
although it could be infinite ∫ ∞ ∑
nf = f (t) dn(t) = f (yn ).
0 n
Hence any increasing random sequence Yn entails the CF Nt and a random counting measure N A
(5.1), and then the integral of a nonnegative function. Conversely, a counting random measure
defines the random CF Nt , and its discontinuities define Yn ’s. We shall call them signals.

If (Vn ) are i.i.d. and Yn = V1 + . . . + Vn , then the CF Nt defined by (5.5) is called a renewal
process. The most important case involves the exponential distribution of the summands Vk .
Denote the parameter, also called the intensity, by λ. We will show that Nt induces a Poisson
random measure with the scaled Lebesgue measure as a counting measure.

Proposition The r.v. Nt has the Poisson(λt) distribution.

Proof: Yn has the Gamma distribution with the density


λn
fn (x) = xn−1 e−λx
(n − 1)!
Hence, by conditioning and since Yn+1 = Yn + Vn+1

P(Nt = n) = P(Nt ≥ n, Nt < n + 1) = P(Yn ≤ t, Yn + Vn+1 > t)


∫ t
= P(Vn+1 > t − x)fn (x) dx
0
∫ t
λn (λt)n −λt
= e−λ(t−x) xn−1 e−λx dx = e
(n − 1)! 0 n!

We call Nt the Poisson process. Note that the name does not and should not apply to the
sequence Yn , although it determines Nt . For that reason the terminology is often abused, and the
sequence is improperly called “the Poisson process”.

24
Define the age time from the given moment to the last signal that precedes it, and the excess
time from the moment t to the next signal:

At = t − YNt , Wt = YNt +1 − t

In what follows the crucial role is played by the “lack of memory” of an exponential distribution:

P(U1 > t + s|U1 > t) = P(U1 > s).

Proposition. The excess time Wt is independent of Nt and L(Wt ) = L(U1 ).


Proof: Compute
∫ t
P(Yn+1 ≥ t + s, Yn ≤ t) = P(Un+1 ≥ t + s − x) fn (x) dx
0
∫ t
λn (λt)n −λ(t+s)
= e−λ(t+s−x) xn e−λx dx = e
(n − 1)! 0 n!

Since { Nt = n } = { Yn ≤ t, Yn+1 ≥ t }, this yields

P(Yn+1 ≥ t + s, Yn ≤ t)
P(Wt ≥ s|Nt = n) = P(YNt +1 ≥ t + s|Nt = n) = = e−λs
P(Nt = n)

and so

P(Wt ≥ s) = P(Wt ≥ s|Nt = n)P(Nt = n) = e−λs ,
n

which also entails the independence: P(Wt ≥ s|Nt = n) = P(Wt ≥ s).

Corollary 5.4

1. The Poisson process starts afresh and independently after any time t.

More precisely, given t > 0, define i.i.d. exp(λ) r.vs:

U1′ = Wt = YNt +1 − t, U2′ = YNt +2 − YNt +1 , ...Uk′ = YNt +k − YNt +k−1 , ...

and also
Yn′ = U1′ + · · · + Un′ .

Then, by Proposition, U1′ is independent of Nt , so it is independent of (Uk′ , k ≥ 2). Then, for


m ≥ 2,

P(Uk′ ≤ uk , k = 2, . . . , m) = P(Yn+k − Yn+k−1 ≤ uk , k = 2, . . . , m|Nt = n)P(Nt = n)
n

= P(Uk ≤ uk , k = 2, . . . , m) = P(U2 ≤ u2 ) · · · P(Um ≤ um )

25
2. N (t, t + s] = Nt+s − Nt is independent of Nt and is distributed as Ns :
∑ ∑
N (t, t + s] = 1I(t,t+s] (Yn ) = 1I(t,t+s] (Yn′ ).
n n

In other words, the distribution of increments depends only on their durations not on their
locations, and we often say that the process is stationary.

By induction, the independence holds true for any finite number of disjoint of increments.

3. Nt entails a Poisson measure that starts with N (a, b] = Nb − Na .

4. The age time At and excess time Wt have the same exp(λ) distribution.

Hence we encounter a paradox: if at time t > 0 the previous and the next signals are observed,
then the epoch - the time distance between them - has the expectation twice as long than the
average epoch between two arbitrary signals:
( ) ( )
E UNt +1 = E YNt +1 − YNt = E At + Wt = 2 E U1

Is it really a paradox?

5.4 Transformations of Poisson process

5.4.1 Nonhomogeneous Poisson process

Let ϕ : (0, ∞) → (0, ∞) be a strictly monotonic function with the inverse Λ = ϕ−1 . We assume
the strict monotonicity for the sake of clarity of presentation. Otherwise, for function that may be
piecewise constant we would have to use the generalized inverse.

Given a Poisson process Nt with unit intensity, transform its Gamma-distributed signals Yn into
Zn = ϕ(Yn ), and denote the new counting process by Mt or the counting measure by M A. That is
∑ ∑ ∑
MA = 1IA (Zn ) = 1IA (ϕ(Yn )) = 1IΛ(A) (Yn ) = N Λ(A).
n n n

Hence M A1 , . . . , M An are independent when A1 , . . . , An are disjoint, and M (A) is Poisson with
parameter Λ(A). Notice that ΛA is a measure, e.g.

Λ(a, b] = |Λ(b) − Λ(b)|.

Thus M is a Poisson random measure on the range ϕ(0, ∞). If the measure Λ is absolutely
continuous with respect to the Lebesgue measure, then denoting its density by λ(t), also called the
intensity function we obtain ∫
ΛA = λ(t) dt.
A
Examples.

26
1. Poisson process often serves as a model of customer service. However, its original setup would
require the 24/7 servicing, in contrast to the usual piecewise service periods as in banking
hours 9-5 for example. So, we can use two-valued { 0, λ } intensity function, with hours as
time units:


λ(t) = λ 1I(9+24n,17+24n] (t).
n=0

The above “square wave” is just one example of a periodic intensity function.

2. Say, ϕ(t) = t2 , t > 0. Then signals are Y12 , Y22 , .... Then Λ(t) = t and its intensity is
1
λ(t) = Λ′ (t) = √ .
2 t

For a general power ϕ(t) = tp , λ(t) = t1/p−1 /|p|. E.g., the transformation ϕ(t) = 1/t entails
the intensity λ(t) = 1/t2 , so with probability 1 the number of signals Zn in every half-line
[a, ∞) with a < 0 is finite.

5.4.2 Reward or compound Poisson process

Write the Poisson process, Poisson measure or Poisson integral again:


∑ ∑ ∑
Nt = 1I[0,t] (Yn ), NA = 1IA (Yn ), Nf = f (Yn ).
n n n

Let Rn be i.i.d. r.vs. (“rewards”), independent of N that replace unit size jumps by Rn ’s. Define
ad rewrite
∑ ∑ ∑ ∑
Nt
Mt = Rn 1I[0,t] (Yn ) = Rn 1I{Yn ≤t} = Rn 1I{Nt ≥n} = Rn (5.6)
n n n n=1
∑0
with the convention n=1 = 0. We may also write

Mf = Rn f (Yn ).
n

Let us compute the Laplace transform using Fubini’s Theorem (subscripts at the expectations
indicate the suitable integrals) and abbreviating R1 = R:
{ }
∑ ∏
−M f
Ee = E N E R exp − Rn f (Yn ) = E N E R exp { −Rf (Yn ) } .
n n

Introducing the function


g(x) = − ln E R exp { −Rf (x) }

we obtain the formula (no need to use the subscript anymore)


{ } { ∫ }
∏ ∑ ∞( )
−M f −N g −g(x)
Ee =E exp { −g(Yn ) } = E exp − g(Yn ) = E e = exp − 1−e dx
n n 0

27
Now, removing the function g we arrive at the identity
{ ∫ ∞ ( ) }
−M f −Rf (x)
Ee = exp − E 1−e dx . (5.7)
0

One more time, denote by µ the probability distribution of R, supported by [0, ∞), and let S =
[0, ∞)2 with Borel sets and the product measure λ = µ⊗Leb (“Leb” of course denotes the Lebesgue
measure). Also, define the positively linear operator

[0, ∞)2 ∋ (u, x) = s 7→ Lf (s) = u f (x).

Thus, finally we see that the “reward Poisson process” is factually identical (in regard to its FDD)
D
with a Poisson random measure ξ on the product space, M f = ξT f :
{ ∫ ( ) }
−M f −Lf (s)
Ee = exp − 1−e λ(ds) = E e−ξ Lf .
S

Example 5.5 Let us examine one more time formula (5.6)


Nt
Mt = Rn
n=1

An alternative name for Mt is a “compound Poisson process”.

1. Let Rn be i.i.d. Bernoulli with P(Rn = 1) = p. That is, with probability p a signal is
recorded (or taken, or colored) while with probability 1 − p the signal is neglected (or left
out, or whitened out). Then Mt is a Poisson process with intensity pλ, a “thinned” Poisson
process.

In other words, if X is a binomial r.v. with parameters n and p, bin(n, p) and then n is
“randomized” by a Poisson random variable N independent of X, then b(N, p) is Poisson.

2. The remaining process with “rewards” 1 − Rn is also Poisson with intensity (1 − p)λ. Further,
both processes are independent.

This property can be generalized to a finite decomposition of the unit (as in 1 = Rn +(1−Rn )).

To wit, let R = dj=1 Rj , where Rj Rk = 0 for j ̸= k, and Rj is Bernoulli with parameter
pj . We may think of a wheel-of-fortune like spinner, with slices marked by numbers or colors
j = 1, ..., d. Let (Rnj ) be independent copies of (Rj ). When a signal Yn of a Poisson process
is recorded, the spinner is spun and the signal is marked by the outcome shown, one between
1 and d. We claim that the resulting process

Mj f = Rnj f (Yn )
n

are independent Poisson with parameters pj λ.

28
5.5 A few constructions of Poisson random measure

5.5.1 adding new atoms

Using the setting of formula (5.2), we look at the reward Poisson process in Example 5.5 as an
extension of an already defined Poisson random measure on (S, S, λ) to a product space S × T ,
where (T, T , µ) is a probability space, and τn are i.i.d. random elements in T with probability
distribution µ:

M= δ(Yn ,τn ) .
n
We may think of τn ’s as “marks”, that are not necessarily numbers. That is why this Poisson
random measure (as we will see) is often called a marked Poisson process.

In the integral form, for a function F (t, y) = α(t)g(y) with separable variables

MF = α(τn ) f (Yn ).
n

So, the reward Poisson measure is just the special case of the marked Poisson measure, Rn = α(τn ).
For a general F ,

MF = F (τn , Yn ).
n
It remains to verify that M is a Poisson random measure.

E e−M F = E N E τ e−F (τ,Yn ) .
n

Denote
g(y) = − ln E F (τ, y).

So
{ } { ∫ ( }
∑ )
−g(y)
E e−M F = E exp − g(Yn ) = E e−N g = exp − 1−e λ(dy)
n S
{ ∫ ∫ ( ) }
= exp − 1 − e−F (t,y) µ(dt) λ(dy) .
S T

Hence M is a Poisson random measure on S × T with intensity λ ⊗ µ.

Example: A Poisson random measure in Rd . We shall use spherical coordinates (when d = 2


they are called polar coordinates) (r, t) where r ≥ 0 and t is a point from the (d − 1)-sphere
T = Sd−1 (e.g., the unit circle when d = 2, the two-dimensional unit sphere when d = 3, etc.). Let
Yn be signals of a unit intensity Poisson process on [0, ∞) and let independent τ n , also independent
of N , be uniformly distributed on Sd−1 . For a cone C described by r ≤ a, t ∈ B, where B is a
Borel subset of Sd−1 ,
1IC (r, t) = 1I[0,a] (r) 1IB (t),

29
the Poisson random variable M C has the expectation a · |B| = Lebd (C) (where |B| denotes the
normalized Lebesgue measure on the sphere). So, the intensity of M is the Lebesgue measure in
Rd .

5.5.2 gluing the pieces

The last case of Example 5.5 can be generalized (and simplified at the same time) as follows. Let
(S, S, µ) be a probability space and let X : Ω → S be a random element with distribution µ. That
is, P(X ∈ A) = µA for A ∈ S. Let Xn be its independent copies and let N be a unit intensity
Poisson process with signals (Yn ), independent of (Xn ). Define


N1 ∑
N1 ∞
∑ ∞

MA = 1IA (Xn ) or M f = f (Xn ) = f (Xn )1I{N1 ≥n} = f (Xn )1I[0,1] (Yn ),
n=1 n=1 n=1 n=1

where A ∈ S or f ≥ 0 is a Borel measurable function on S. For simplicity, denote I = 1I[0,1] . Then



E e−M f = E N E e−f (Xn )I(Yn ) .
n

Wit the help of the function


g(y) = − ln E e−f (X)I(y) .

the latter formula reads


∏ ∑
{ ∫ ∞( ) }
−M f −g(Yn ) − −N g −g(y)
Ee =E e = Ee n g(Yn ) = Ee = exp − 1−e dy .
n 0

Removing g and bringing up I = 1I[0,1] , the last expression


{ ∫ ∞( ) } { ∫ 1( ) }
−f (X)I(y) −f (X)
= exp − 1 − Ee dy = exp − 1 − Ee dy
0 0
{ ∫ ( ) }
−f (s)
= exp − 1−e µ(ds) .
S

In other words, M is a Poison measure on (S, S, µ).

Now let (S, S, λ) be an infinite but σ-finite measure space. Assume that is continuous (atomless).

Let S = Sk , where Sk ∈ S are probability spaces. Create independent Poisson meaures Mk on
(Sk , Sk , λk ), where Sk = S ∩ Sk and λk = λ|Sk according to the previous construction. Finally,
there comes the Poisson random measure with intensity λ:

MA = Mk (A ∩ Sk ).
k

30
5.5.3 using a density of a random element

Let (S, S, λ) be an atomless infinite σ-finite measure space and τ be a random element in S whose
distribution is absolutely continuous with respect to λ and its density p(s) is strictly positive. Let τn
be independent copies of τ . Let Nt be a unit intensity Poisson process with signals Yn , independent
of (τn ). Finally, let A be a Borel set on [0, ∞) with Lebesgue measure 1. Put α = 1IA and define
the integral process for f ∈ L0+ (S) by the formula

ξf = α (Yn p(τn )) f (τn ).
n

Theorem 5.6 ξ is a Poisson measure on S with intensity λ.

Proof. Let us compute the Laplace transform



E e−ξf = E N E τ e−α(Yn p(τ )f (τ ) .
n

With the help of the function


g(y) = − ln E e−α(y p(τ )) f (τ )

and the identity 1 − e−αc = (1 − e−c )α, where α ∈ { 0, 1 }, we rewrite the latter expression as

∏ { ∫ ∞( ) }
−g(Yn ) −N g −g(y)
E e = Ee = exp − 1−e dy
n 0
{ ∫ ∞ ( ) }
−α(y p(τ ))f (τ )
= exp − E 1−e dy
0
{ ∫ ∞∫ ( ) }
−f (s)
= exp − 1−e α(y p(s)) p(s) λ(ds) dy
0 S

Using Fubini’s Theorem, in the “dy-integral” we substitute x = y p(s), so dx = p(s) dy, and since
∫∞
0 α(x) dx = |A| = 1, the latter quantity becomes
{ ∫ ( ) }
−f (s)
exp − 1−e λ(ds) .
S

That is, ξ is a Poisson measure with intensity λ.

Example. Let us construct a planar Poisson measure, for which we need a strictly positive densitiy.
E.g., we may pick the Gaussian density, for u = (u1 , u2 ),
1 −(u21 +u22 )/2 1
= 2π e||u|| /2
2
p(u) = e , q(u) =
2π p(u)

31
so τn = (γn1 , γn2 ), where γnk are independent N(0,1) random variables. Also, we choose A = [0, 1].
Let (Yn ) form a unit intensity Poisson process Nt , independent of (τn ). We observe that Vn =
∥τn ∥2 /2 are exponential r.vs. with unit intensity. So we obtain

ξf = 1I{Yn ≤q(τn )} f (τn ).
n

5.6 Exercises

1. A Poisson random measure ξ is countably additive in every Lp , 0 < p < ∞

2. Show that the only solution of the functional Cauchy equation,

f (x + y) = f (x) + f (y), x, y ∈ R,

in the class of real continuous functions on R is the linear function f (x) = ax. Equivalently,
within this class, the only solution of the functional equation

g(s + t) = g(s)g(t), s, t ≥ 0

is the exponential g(s) = eas . Hence the only continuous distribution that enjoys the lack of
memory property is the exponential distribution.

3. Why do the age time At and the excess time Wt have the same distribution? Is this a property
of Poisson process or any renewal process?

Find the probability distribution of UNt +1 = At + Wt for the Poisson process.

4. A Poisson process Nt on the positive half-line entails immediately a finite additive set function
N (a, b] = Nb − Na on the field spanned by the intervals (a, b]. Since its control measure is the
Lebesgue measure times λ, show in few lines how this additive set function extends to a true
random measure on Bore sets. Note: it is easier to construct the Poisson integral N f first!
Then the random measure is simply N 1IA . Clean details need to be written down.

5. In Corollary 5.4.4 a “paradox” is shown. Say, Auburn Transit buses arrive at a bus stop
according to a Poisson distribution, say, with the average interarrival time 20 min. You come
to the bus stop, there is no bus yet so you wait. How long, in average? 10 minutes, 15, 20?
Yes, 20 is the answer. Also, the time between the moment of departure of the last bus before
your arrival and the moment of your forthcoming ride would be... yes, 40 minutes, in average.

It’s a paradox, isn’t it? Or, perhaps not...

Similarly, if there are two lines to a service, say to a cash register or a ticket booth at a rock
concert, and you choose one line, the other will move faster. So you’ll change the line. But

32
then the line that you just left will be mowing faster. That’s the fact and it has a logical
explanation (the same phenomenon as in waiting for a bus).

Explain!

6. Show that the split Poisson processes Mj in Example 5.5.2 are independent Poisson with
parameters pj λ. Hint: for a fixed f show that, for Mj = Mj f ,
∑ ∏
E e− j cj M j = E e−cj Mj
j

Then, for finitely many fk with disjoint supports (so N fk are independent):
∑ ∑ ∏∏
E e− j k cj Mj fk = E e−cj Mj fk .
j k

Argue that these relations prove the statement.

7. Let (S, S, λ) be an atomless (continuous) space. Let a ≤ λ(S). Then there exists A ∈ S such
that λA = c. In particular, an infinite σ-finite measure space enjoys a partition into the union
of probability spaces.

8. Let (S, S, µ) be a probability space. Consider the standard probability space (Ω, F, P) as the
unit interval with Borel sets and the Lebesgue measure. Argue that there exists a measurable
mapping X : Ω → S such that P(X ∈ A) = µA for every A ∈ S.

33
5.7 Non-positive awards

Let (ζn ) be i.i.d. and copies of a ζ with distribution µ, independent of a Poisson process Nt with
signals (Yn ) and intensity λ. The integral

Xf = ζn f (Yn )
n

is well defined, e.g., when f has a bounded support, e.g., for f = 1I(a,b] and linear combinations of
such functions, say,


n ∑
f = a0 1I{ 0 } + ak 1I[tk−1 ,tk ] = afk , 0 = t0 < t 1 · · · < tn = t
k=1 k

Then its ch.f. equals


{ ∫ ∞ ( ) } { ∫ ∞∫ ( ) }
Ee iXf
= exp −λ E 1−e iζf (t)
dt = exp −λ 1−e ixf (t)
µ(dx) dt .
0 0 R

For the specific simple function listed above, it equals


{ ∫ ( }
∑ ) ∏
exp −λ (tk − tk−1 ) 1 − eiak f (t) µ(dx) = E eiak fk ,
k R k

which shows that X is an independently scattered random measures with stationary increments.
Therefore, its FDD are fully described by one dimensional distributions, for f = 1I[0,t]
{ ∫ ( ) }
Ee iaXt
= exp −λt 1−e iax
µ(dx)
R

By the “gluing technique”, the introduced concept of a random measure can be extended even to
infinite but σ-finite measure µ on R, restricted by the existence of the integral that appears in the
characteristic function. Clearly, a sufficient condition is
∫ ∫
2 c
x µ(dx) + µ([−1, 1] ) = 1 ∧ x2 µ(dx) < ∞.
|x|≤1 R

The finiteness of the first term on the left is obviously necessary. It can be shown that the second
term must be finite necessarily but it requires some tedious reasoning, and we will not show it.

However, we will show details in the symmetric case.

Let’s begin with the simplest case of symmetric ±1-valued rewards. Let ξ be a Poisson random
measure counting random points Yn in (S, S, λ), and let εn be a Rademacher sequence independent
of ξ (and of (Yn )). Define

˜ =
ξf εn f (Yn ).
n

34
By Fubini’ theorem and properties of Rademacher series:
∑ ∑
εn an converges ⇐⇒ a2n < ∞,
n n

the series converges in probability, or, equivalently a.s., if and only if



ξf 2 = |f (Yn )|2 < ∞
n

and this happens if and only if


∫ ( )
1 − e−f
2 (s)
λ(ds) < ∞.
S

Observe that we do not need to restrict ourselves to nonnegative functions (or differences of such).
Instead of Laplace transforms we rather use the characteristic functions. Because of symmetry, the
ch.f. is real and equals { ∫ ( }
)
E eiξ̃f = exp − 1 − cos f (s) λ(ds) .
S

We shall call ξ˜ a symmetrized Poisson random measure (SPRM) with intensity λ. The above
existence condition can be replaced by a more elegant condition:

1 ∧ f 2 (s) λ(ds) < ∞. (5.8)
S
Now, we will examine some of previously discussed variants in this new context.
D
Symmetric rewards. Let Rn be independent copies of a symmetric r.v. R, i.e. R = −R.
D
Therefore, R = ε R, where ε and R are independent. Assume also that (Rn ) is independent of the
Poisson measure ξ. As before, put

Mf = Rn f (Yn ),
n
where the series converges if and only if

E 1 ∧ R2 f 2 (s) ds < ∞.
S

The ch.f. E eiM f equals


{ ∫ ( ) } { ∫ ∫ ( ) }
exp − E 1 − cos R f (s) λ(ds) = exp − 1 − cos x f (s) µ(dx) λ(ds) (5.9)
S S R

So M is a SPRM on S × R with intensity λ ⊗ µ, where µ = L(R). In fact


D
M f = ξLf, where L : S × R → R, L(s, x) = x f (s). (5.10)

We observe that the restriction to a probability or even finite measure µ is not necessary. A potential
extension is controlled by condition (5.8). Consider a standard Poisson process on S = R+ with
unit intensity. Let µ be a measure on R whose properties need to be found and let ξ be a PRM on
R × [0, ∞) with intensity µ ⊗ Leb.

35
Lemma 5.7 The inner integral in (5.9) is finite over the class of functions f that contains indi-
cators iff ∫
1 ∧ x2 µ(dx) < ∞ (5.11)
R

Proof. The statement follows from (5.8) and the inequalities

(a2 ∧ 1) (1 ∧ x2 ) ≤ 1 ∧ (ax)2 ≤ (a2 ∨ 1) (1 ∧ x2 )

Definition. A Borel measure µ on R is called a Lévy measure if (5.11) holds.

Thus, a PRM ξ on R × R+ with intensity dµ ⊗ dt, where µ is a Lévy measure, entails a process M f
by (5.10) with the ch.f.
{ ∫ ∞∫ ( ) }
E eiM f = exp − 1 − cos x f (s) µ(dx) ds
0 R

In particular, for functions f1 , . . . , fn with disjoint supports, M f1 , . . . , M fn are independent. Hence,


if fk = 1I(a+tk−1 ,a+tk ] , k = 1, . . . , n, where a ≥ 0 and t0 = 0 < t1 · · · < tn ,
{ } { ∫ }
∑ ∏
E exp −i ck M fk = exp −(tk − tk−1 ) (1 − cos x) µ(dx) .
k k R

In other words, the stochastic process Mt = M 1I[0,t] has independent and stationary increments.

36
5.8 SSα - symmetric α-stable processes

For α > 0 define the symmetric3 measure µα by the formula


1
µα [x, ∞) = , x>0

Equivalently, µα has the density
α
gα (x) = , x ̸= 0.
|x|α+1
Lemma 5.8 µα is a Lévy measure if and only if α < 2.

Put S = R \ { 0 }. Consider the Poisson measure M on S × R+ . That is,


{ ∫ ∞∫ ( ) α }
Ee iM f
= exp − 1 − cos xf (t) dx dt
0 S |x|α+1

By symmetry we may consider the integral for x > 0, and then change4 the variable x|f (t)| 7→ x,
so the ch.f. equals
{ ∫ ∞ } ∫ ∞
dx
exp −cα |f (t)| dt α
, where cα = 2α (1 − cos x) .
0 0 xα+1
−1/α
In particular, taking f = cα 1I(a,b] ,

E eit(Mb −Ma ) = e−(b−a)|t| .


α

Definition A random variable X with the ch.f.

E eitX = e−a|t|
α

is called symmetric α-stable, or SSα in short. M f , M A, Mt are called then SSα integral,
measure, process - respectively.

−1/α
Also, taking fk = cα 1IAk , where Ak are disjoint of unit Lebesgue measure
( )1/α
∑ D

Xk = M fk , k ∈ N ⇒ fk are i.i.d. and ak fk = |ak |
α
X1 .
k k

Example 5.9 (Le Page representation) Let Yn be Poisson points with unit intensity, τn be
i.i.d. uniform on [0, 1], εn be Rademacher r.vs., and the three sequences be independent. Let
α ∈ (0, 2). Then
∑ f (τn )
Mf = εn 1/α
n Yn
3
µ(A) = µ(−A)
4
the cosine is an even function

37
is a SSα process/integral/measure.

Indeed, even in a more general case


∑ ∑
Mf = εn f (τn ) ϕ(Yn ) converges iff f 2 (τn ) ϕ2 (Yn ) < ∞ a.s.
n n

and the necessary and sufficient condition is


∫ ∞∫ 1
1 ∧ f 2 (t) ϕ2 (y) dt dy < ∞.
0 0

The ch.f. equals { ∫ }


∞∫ 1( )
Ee iM f
= exp − 1 − cos f (t) ϕ(y) dt dy .
0 0

Returning to the original function ϕ(y) = y^{−1/α}, using Fubini's theorem and the substitution y = |f(t)|^α x^{−α}, we obtain

E e^{iM f} = exp{ −∫_0^1 ∫_0^∞ ( 1 − cos(f(t) y^{−1/α}) ) dy dt } = exp{ −c_α ∫_0^1 |f(t)|^α dt }

with

c_α = α ∫_0^∞ (1 − cos x)/x^{1+α} dx.
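The Le Page series is easy to simulate. The following sketch (my own, with f ≡ 1 on [0, 1] and α = 1 chosen so that the truncation error is negligible) compares the empirical ch.f. of the truncated series with exp{ −c_α |t|^α }.

    import numpy as np

    # Truncated Le Page series with f = 1 on [0,1] and alpha = 1; the empirical ch.f.
    # should match exp(-c_alpha * |t|^alpha).

    rng = np.random.default_rng(0)
    alpha, nterms, reps = 1.0, 500, 4000

    Y = np.cumsum(rng.exponential(size=(reps, nterms)), axis=1)   # Poisson arrivals, unit intensity
    eps = rng.choice([-1.0, 1.0], size=(reps, nterms))            # Rademacher signs
    X = np.sum(eps * Y ** (-1.0 / alpha), axis=1)                 # truncated series (f(tau_n) = 1)

    du = 1e-4
    u = np.arange(du / 2, 200.0, du)                              # midpoint grid for c_alpha
    c_alpha = alpha * np.sum((1 - np.cos(u)) / u ** (1 + alpha)) * du   # approx pi/2 for alpha = 1

    for t in (0.5, 1.0, 2.0):
        print(t, np.mean(np.cos(t * X)), np.exp(-c_alpha * abs(t) ** alpha))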

5.9 Exercises

1. Let Nt be a standard Poisson process with arrivals Yn and A be a bounded Borel set. Show
that P(Yn ∈ A for infinitely many n) = 0. Deduce then that f (Yn ) = 0 eventually with
probability 1 when f has a bounded support.

2. Verify that the symmetric measure µ on R \ 0 with the tail µ(x, ∞) = x−α is a Lévy measure
iff α ∈ (0, 2).

3. Show that the p-th moment E|X|^p of an SSα r.v. is finite iff p < α.

4. Let Xk be i.i.d. SSα. Let p ∈ [0, α). Show that


∑_k a_k X_k converges in L_0 and a.s.  ⇐⇒  ∑_k |a_k|^α < ∞.

In particular, for p ∈ (0, α),

∥ ∑_k a_k X_k ∥_p = ∥X_1∥_p ∥a∥_α,

i.e., every F-space (for α < 1) or Banach space (for α ∈ (1, 2)) contains a subspace isometric to ℓ^α (normalize X_1 so that ∥X_1∥_p = 1). That this is true also for α = 2 was proved previously (in lieu of stable r.vs. we can use Rademacher or Gaussian i.i.d. r.vs.).

5. (added here although it belongs to the previous topic). Consider the paraboloid of revolution
given by the equation z = x^2 + y^2. Project the disk of radius r that lies on the xy-plane
to the paraboloid’s surface, obtaining a set A. Let Yn be Poisson points on the paraboloid,
controlled by the surface area. Find the probability that A has no Poisson points.

More difficult: Construct Poisson points on the paraboloid.

More difficult: Let S be a smooth connected unbounded surface, say, given by a parametric
equation r = r(u, v), where (u, v) ∈ D, D an open domain in R^2, and r ∈ C^1.
Construct Poisson points on S.

(Hint: Show that w.l.o.g D = R2 . Construct Poisson points on the plane. Carry them by
some mapping into S. The Jacobian will be involved.).
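A possible sketch of the hinted construction for the paraboloid (my own, for a bounded patch over the disk of radius r): generate a dominating homogeneous Poisson process on the parameter disk and thin it according to the surface-area density √(1 + 4(u^2 + v^2)), then lift the surviving points to the surface.

    import numpy as np

    # Sketch (mine): Poisson points on the patch of the paraboloid z = x^2 + y^2 lying over the
    # disk u^2 + v^2 <= r^2.  The surface-area element is sqrt(1 + 4(u^2 + v^2)) du dv, so we
    # generate a homogeneous Poisson process on the disk with a dominating constant intensity
    # and thin it, keeping a point with probability sqrt(1 + 4 rho^2) / sqrt(1 + 4 r^2).

    rng = np.random.default_rng(1)
    r = 2.0
    lam_max = np.sqrt(1 + 4 * r ** 2)                     # bound on the area density
    n = rng.poisson(lam_max * np.pi * r ** 2)             # dominating homogeneous count
    rho = r * np.sqrt(rng.uniform(size=n))                # uniform points on the disk
    theta = rng.uniform(0, 2 * np.pi, size=n)
    u, v = rho * np.cos(theta), rho * np.sin(theta)
    keep = rng.uniform(size=n) < np.sqrt(1 + 4 * (u ** 2 + v ** 2)) / lam_max
    x, y = u[keep], v[keep]
    points = np.column_stack([x, y, x ** 2 + y ** 2])     # Poisson points on the surface
    print(points.shape)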

6 Infinitely divisible distributions

6.1 Preliminaria

Recall that the ch.funs. φ1 , . . . , φn of independent r.vs X1 , . . . , Xn satisfy the formula


E exp{ i ∑_k t_k X_k } = φ_1(t_1) · · · φ_n(t_n)

(which is also sufficient for independence). For just two independent r.vs. X_1, X_2 whose sum is X = X_1 + X_2 we have, in terms of their probability laws µ_1, µ_2, and µ:

µ(A) = ∫_R µ_1(A − x) µ_2(dx) = ∫_R µ_2(A − x) µ_1(dx),

which can be equivalently stated (Exercise: Prove it) in terms of the integrals
µF = ∫_R F(x) µ(dx) = E F(X) = ∫_R µ_1 F(· + y) µ_2(dy) = ∫_R µ_2 F(· + x) µ_1(dx).

The "dot" inside indicates integration along a hidden variable, e.g.

µ_1 F(· + y) = ∫_R F(x + y) µ_1(dx).
When both measures are absolutely continuous, with densities f_1, f_2, the density of the sum equals

f_{X_1+X_2}(z) = ∫_R f_1(z − y) f_2(y) dy = ∫_R f_2(z − x) f_1(x) dx.
The operation produces a new measure or a new density which is called the convolution of
measures or densities, and denoted by µ1 ∗ µ2 or f1 ∗ f2 . The extension to any finite number of
terms follows immediately. If µ1 = · · · = µn = µ we may write µ∗n = µ1 ∗ · · · ∗ µn . In the language
of random variables the convolution n-th power is the probability law of the sum X1 + · · · + Xn of
i.i.d. r.vs. with L(X1 ) = µ.
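A quick numerical illustration (mine): for two independent U[0, 1] variables the convolution of the densities should give the triangular density of the sum, which we can check on a grid.

    import numpy as np

    # For X1, X2 independent uniform on [0,1] the density of X1 + X2 is triangular:
    # f(z) = z on [0,1] and 2 - z on [1,2].  We approximate the convolution integral on a grid.

    h = 0.001
    grid = np.arange(0.0, 1.0, h)
    f1 = np.ones_like(grid)                # density of U[0,1] sampled on the grid
    f2 = np.ones_like(grid)
    conv = np.convolve(f1, f2) * h         # Riemann-sum approximation of (f1 * f2)(z)
    for z in (0.5, 1.0, 1.5):
        i = int(round(z / h))
        exact = z if z <= 1 else 2 - z
        print(z, conv[i], exact)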

Now, let us look at this pattern from the opposite point of view. Let L(X) = ν and ν̂ = ψ. Is it possible to write

X = X_1 + · · · + X_n, where X_k are i.i.d. r.vs.?

Equivalently, does there exist a ch.f. ϕ such that ψ = ϕ^n? In other words, is ψ^{1/n} a ch.f.? (It does not matter which of the n complex roots we consider; for simplicity we choose the principal root.) If this is possible for every n ∈ N, the distribution, its ch.f., and X itself are called infinitely divisible, ID in short. If the measure is supported by the positive halfline, we may use the Laplace transform in lieu of the ch.f. So, if L is the Laplace transform of a probability measure, is L^{1/n} such, too?

First, we will find a counterexample. Suppose that ψ is ID. Then so is its conjugate ψ̄, and consequently |ψ|^2 is ID. The limit of ch.funs.

ϕ = lim_n |ψ|^{2/n}

takes only two values, 0 and 1. Since |ψ| > 0 on a neighborhood of 0, ϕ = 1 on that neighborhood; being a limit of ch.funs. that is continuous at 0, ϕ is itself a ch.fun. by the continuity theorem, hence continuous, and so must equal 1 everywhere. Hence ψ ≠ 0 everywhere.
Let us repeat:

An ID ch.f. never vanishes.

Thus, as an example of a non-ID distribution it suffices to take one with a ch.f. vanishing at some
point. For example, consider the uniform r.v. on [0, 1]. Its ch.f.

(e^{it} − 1)/(it)

vanishes for t = 2nπ, n ∈ Z, n ≠ 0.

Also the "tent function" (1 − |t|)_+, which is a ch.f. by Pólya's criterion, is not ID.

6.2 A few theorems

We note that the class of ID distributions is closed under convolution (Cf. an exercise).

Theorem 6.1 The class of ID probability distributions is closed under weak limits.

Proof. Let φk be ID and φk → φ0 . Let n ∈ N. Then |φk |2 are real ch.funs. for k = 0, 1, 2, .. and
ID for k = 1, 2, .... So, in the latter case |φk |2/n are ch.funs. But

|φ_0|^{2/n} = lim_k |φ_k|^{2/n},

and is continuous at 0, so by the continuity theorem |φ0 |2/n is a ch.fun. for every n ∈ N. That is,
|φ0 |2 is ID. As such, it has no zeros, but then φ0 has no zeros, and we can thus define its n-th root
as well as the n-th roots of φk ’s:
φ_0^{1/n}(t) = exp{ (1/n) ln φ_0(t) } = lim_k exp{ (1/n) ln φ_k(t) } = lim_k φ_k^{1/n}(t),   n ∈ N.

That is, φ_0^{1/n} is continuous at 0 and is the limit of ch.funs., so it is itself a ch.fun.

Corollary 6.2 Let ϕ be an ID ch.f. and α > 0. Then ϕ^α is ID.

Proof. For rational α the statement is obvious. For arbitrary α > 0 we pass to the limit along rationals, using Theorem 6.1.

If α is irrational, then the latter property is hard, if not impossible, to express in the language of random variables or probability distributions.

6.3 A side trip: decomposable distributions

A probability distribution µ is called decomposable if there are nontrivial probability distributions µ_1, µ_2 such that µ = µ_1 ∗ µ_2. In the language of random variables, X is decomposable if there exist independent X_1 and X_2, none degenerate, such that

X = X_1 + X_2.

We exclude degenerate r.vs. from the class of decomposable ones to avoid the triviality:

X = (X − a) + a.

Note that the decomposability may have a finite depth, that is, some of the summands may be
non-decomposable. Even if the law can be split into an arbitrary finite number of parts, these
might not be identical.

Call a r.v. X self-similar or c-decomposable, if

X =D cX + R,

where X and R are independent, and R is non-degenerate. It follows by iteration that X can be decomposed into a sum of any length consisting of independent summands:

X =D cX + R =D c(cX + R′) + R =D c^2 X + cR′ + R =D · · · =D c^n X + ∑_{k=1}^{n} c^{k−1} R_k,

where all r.vs. on the right side are independent and the R_k's are copies of each other. In particular, if X is c-decomposable, then it is c^n-decomposable for every n ∈ N. We observe that this is an attribute of the probability distribution or its transform rather than of the random variable itself. That is, the property reads, for µ = L(X), with µ_c = L(cX) and φ = µ̂:

∃ ν :  µ = µ_c ∗ ν,   or equivalently,   φ(t)/φ(ct) is a ch.fun.
We note that SSα and Gaussian distributions are c-decomposable for every c ∈ (−1, 1).

The uniform random variable V on [−1, 1] is 1/k-decomposable, for every natural number k. Indeed,
since its Fourier transform is sin t/t, then
sin(kt)/(k sin t) = (1 − 1/k) · ( sin((k−1)t)/((k−1) sin t) ) · cos t + (1/k) cos((k−1)t),

i.e.,

V =D (1/k) V + R_k,   R_k =D D_{1−1/k} ( R_{k−1} + ε_{k−1} ) + ( 1 − D_{1/k} ) (k − 1) ε_k,
where D1/k are (1/k)-Bernoulli, εk are Rademacher variables (i.e., (1 + εk )/2 are 1/2-Bernoulli),
and all sequences are independent. Therefore, all uniform random variables are 1/k-decomposable,

being affine transformations of each other: U_{[a,b]} = ((b−a)/2) U_{[−1,1]} + (b+a)/2. In general, if a variable with possible negative values (or bounded away from 0) has the property Y =D cY + R, where the residue R has an atom at its minimum, P(R = m(R)) > 0, then Y − m(Y) is c-decomposable for some c.
In the uniform case, there is a simpler direct argument.

Proposition 6.3 A uniform random variable U on [0, 1] belongs to the class S(c) if and only if
c = p = m−1 , for some natural number m. In this case
U =D (1/m) U + D_m · Z_m,

where D_m denotes a (1 − 1/m)-Bernoulli r.v. and Z_m has the discrete uniform distribution

(1/(m−1)) ∑_{k=1}^{m−1} δ_{k/m}

on { k/m : k = 1, . . . , m − 1 }. The three variables U, Dm , Zm are independent.

Proof. Consider the binary series representation of U :



U = ∑_{n=0}^{∞} D_n/2^{n+1},   i.e.   U =D (1/2) U + (1/2) D,

where D_n are i.i.d. (1/2)-Bernoulli. Other admissible parameters c come from the equation

M(s) = L(s)/L(cs) = c (1 − e^{−s})/(1 − e^{−cs}) = p + (1 − p) H(s),

where L(s) = (1 − e^{−s})/s is the Laplace transform of U. Clearly, c = p (let s → ∞). Hence, the sought-for Laplace transform H(s) would be equal to

(p/(1 − p)) · (e^{−ps} − e^{−s})/(1 − e^{−ps}).
Then, denoting the Dirac delta measure at point c by δ_c, we have

e^{−ps}/(1 − e^{−ps}) = L( ∑_{k=1}^{∞} δ_{pk} ),    e^{−s}/(1 − e^{−ps}) = L( ∑_{k=0}^{∞} δ_{pk+1} ).

Thus, the signed measure whose Laplace transform is H(s),

L^{−1}(H) = (p/(1 − p)) ( ∑_{k=1}^{∞} δ_{pk} − ∑_{k=0}^{∞} δ_{pk+1} ),

is nonnegative if and only if p = 1/m for some m = 1, 2, . . .. In that case H(s) is the Laplace transform of the uniform discrete probability on { k/m : k = 1, . . . , m − 1 }.
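A quick simulation check of Proposition 6.3 (my sketch, with m = 4): the law of U/m + D_m Z_m should again be uniform on [0, 1].

    import numpy as np

    # Sanity check of Proposition 6.3 (m = 4): U/m + D*Z, with U uniform on [0,1],
    # D Bernoulli(1 - 1/m) and Z uniform on {1/m, ..., (m-1)/m}, all independent, should be
    # uniform on [0,1] again.  We report the Kolmogorov-Smirnov distance to the U[0,1] CDF.

    rng = np.random.default_rng(2)
    m, n = 4, 200_000
    U = rng.uniform(size=n)
    D = rng.uniform(size=n) < 1 - 1 / m                    # Bernoulli(1 - 1/m)
    Z = rng.integers(1, m, size=n) / m                     # uniform on {1/m, ..., (m-1)/m}
    X = U / m + D * Z

    Xs = np.sort(X)
    ecdf = np.arange(1, n + 1) / n
    print(np.max(np.abs(ecdf - Xs)))                       # should be of order n**(-1/2)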

Example 6.4 The replacement of the parameter c = 1/2 by 1/3 yields a c-decomposable variable
with the singular Cantor-Lebesgue distribution:

V = ∑_{n=0}^{∞} 2D_n/3^{n+1},   i.e.   V =D (1/3) V + (2/3) D.

Whether its decomposability semigroup extends beyond { 1/3^n } is yet to be determined.

6.4 ID of Poisson type

The inspiration: Section 5.4 in Lukacs’ book.


Let N_λ be a Poisson r.v. with intensity λ and a > 0. Then the ch.fun. of aN_λ is exp{ λ(e^{ita} − 1) }. A Poisson integral of a simple function is said to be of Poisson type in the literature. In other words, a r.v. is of Poisson type if, for a finite choice of parameters a_k, λ_k > 0 and independent Poisson r.vs. N_{λ_k},

X = ∑_k a_k N_{λ_k} = N g = ∫_S g dN,

where g = ∑_k a_k 1I_{A_k} and A_k are disjoint with λ(A_k) = λ_k. We carry the name to probability distributions and ch.funs. So, a ch.fun. ψ is of Poisson type iff

ψ(t) = ∏_k exp{ λ_k (e^{ita_k} − 1) } = exp{ p ∑_k p_k (e^{ita_k} − 1) }     (6.1)

where p = ∑_k λ_k and p_k = λ_k/p make a discrete probability distribution µ = ∑_k p_k δ_{a_k}. In other
words, a Poisson type ch.fun. ψ, obtained from φ = µ̂ by the formula

ψ = exp { p (φ − 1) } , (6.2)

is an ID ch.fun.

Lemma 6.5 Every ch.fun. of the form (6.2) is ID.

Proof. Let φ be a ch.fun., and let p > 0 and n > p. Then the power of the convex combination

ψ_n = ( (1 − p/n) + (p/n) φ )^n

is a ch.fun., and so is its limit as n → ∞, since the limit equals the right-hand side of (6.2), which is continuous at 0 (continuity theorem). Let us record this as (*): (6.2) is a ch.fun. for every p > 0 and every ch.fun. φ.

We must see that ψ^{1/m} given by (6.2) is a ch.fun. for every m ∈ N. But

ψ^{1/m} = lim_{k→∞} ψ_{km}^{1/m} = lim_{k→∞} ( (1 − p/(km)) + (p/(km)) φ )^k = exp{ (p/m) (φ − 1) },

which is a ch.fun. by (*).
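The convergence used in the proof is easy to observe numerically. In the sketch below (mine), φ(t) = cos t is the ch.f. of a Rademacher variable and p = 3; the powers ((1 − p/n) + (p/n)φ(t))^n approach exp{ p(φ(t) − 1) } as n grows.

    import numpy as np

    # phi(t) = cos t is the ch.f. of a Rademacher variable; with p = 3 the powers of the
    # convex combination approach the Poisson type ch.f. exp(p*(phi(t) - 1)).

    p = 3.0
    t = np.array([0.5, 1.0, 2.0])
    phi = np.cos(t)
    target = np.exp(p * (phi - 1))
    for n in (10, 100, 1000, 10000):
        psi_n = ((1 - p / n) + (p / n) * phi) ** n
        print(n, np.max(np.abs(psi_n - target)))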

Proposition 6.6 (De Finetti's Theorem) A ch.fun. is ID iff it has the form

ψ(t) = lim_m exp{ p_m (φ_m(t) − 1) }     (6.3)

for some constants p_m > 0 and ch.funs. φ_m.

Proof. The sufficiency follows by the continuity theorem. To prove the necessity, let ψ be ID. Then ψ^α is ID for every α > 0. Hence

exp{ (1/α) (ψ^α − 1) }

is an ID ch.fun. by the preceding argument. So is ψ, obtained as the limit for α → 0. Now, choose α = 1/m, p_m = m, and φ_m = ψ^{1/m}. That is, ψ can be represented as the desired limit.

Now, we will see that it suffices to consider only Poisson types among the above φ_m's.

Theorem 6.7 A ch.fun. ψ is ID iff it is a limit of Poisson type ch.funs. (6.1).

Proof. The sufficiency follows from the continuity (or De Finetti's) theorem. Let ψ be an ID ch.fun. and consider its form (6.3), ensured by De Finetti's Theorem, with φ_m = µ̂_m. We choose discrete µ_mk →w µ_m as k → ∞. That is, φ_mk = µ̂_mk → φ_m.

Then the statement follows by the diagonal argument.

6.5 Lévy-Khinchin formula

This is inspired by the presentation in Loève's book, Section 22.1. However, the original approach, which used analysis, Riemann-Stieltjes integrals, etc., in great detail and with great care, has been "translated" into the language of Poisson integrals with the help of our sufficient background in measure theory.

Recall that a Lévy measure M on R is defined by the condition



∫_R 1 ∧ |x|^2 M(dx) < ∞.

In this condition we may replace the function 1 ∧ |x|^2 by any bounded monotonic continuous function that behaves like |x|^2 near 0. So, as the defining condition we may prefer

∫_R x^2/(1 + x^2) M(dx) < ∞,

or in other words, that the measure x^2/(1 + x^2) M(dx) is finite, i.e. a probability up to a positive scalar multiplier. This probability µ satisfies

c · ((1 + x^2)/x^2) µ(dx) = M(dx),     (6.4)
for some c > 0. For a fixed t let us examine the behavior of the function
f(x) = e^{itx} − 1 − itx/(1 + x^2).

We see that f(x) is bounded for x away from 0, and

f(x) ≈ itx − t^2 x^2/2 − itx/(1 + x^2) ≈ −t^2 x^2/2,   |x| → 0.
Hence, for every Lévy measure M the integral

∫_R f(x) M(dx)
exists, and thus the continuous function
ψ(t) = exp{ iat + ∫_R ( e^{itx} − 1 − itx/(1 + x^2) ) M(dx) }     (6.5)
is well defined, and ψ(0) = 1. Using (6.4), we rewrite it:
ψ(t) = exp{ iat + c ∫_R ( e^{itx} − 1 − itx/(1 + x^2) ) ((1 + x^2)/x^2) µ(dx) }.

Consider discrete µ_m = ∑_k p_mk δ_{x_mk}, with finite supports not containing 0, that converge to µ, and let ψ_m denote the above expression with µ_m in place of µ. Then

ln ψ_m(t) = iat + c ∑_k p_mk ( e^{itx_mk} − 1 − itx_mk/(1 + x_mk^2) ) (1 + x_mk^2)/x_mk^2
          = c ∑_k p_mk ((1 + x_mk^2)/x_mk^2) ( e^{itx_mk} − 1 ) − itc ∑_k p_mk/x_mk + iat
          = ln φ_m(t) − it a_m,

where φ_m are of Poisson type (6.1) and a_m = c ∑_k p_mk/x_mk − a. Thus the ψ_m are ID ch.funs., and so is ψ.

Let us denote the ch.f. given by (6.5) by (a, M ).
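For instance (a quick check, not written out in the notes): taking M = λδ_1 and a = λ/2, formula (6.5) gives

ψ(t) = exp{ iλt/2 + λ( e^{it} − 1 − it/2 ) } = exp{ λ(e^{it} − 1) },

so (λ/2, λδ_1) is the ch.fun. of the Poisson distribution with intensity λ.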

Proposition 6.8 ((a,M)-uniqueness Theorem) The pair (a, M) is unique, i.e., if the ch.funs. (a, M) and (a′, M′) coincide, then a = a′ and M = M′.

Proof. Notice that the functions iat and the one defined by the integral are linearly independent.
Hence (a, M ) = (a′ , M ′ ) implies that a = a′ . Assume that a = 0 and ln ψ has two integral
representations with M and M ′ , or, equivalently with two corresponding probability measures µ
and µ′ .

We need to show that the function


ϕ(t) = ∫_R ( e^{itx} − 1 − itx/(1 + x^2) ) M(dx)
uniquely determines the measure M (equivalently, the probability µ). To this end, scale the variable, integrate, use Fubini's theorem, and exponentiate:

exp{ (u/2) ∫_{−1}^{1} ϕ(ut) dt } = exp{ ∫_0^u ∫_R ( cos(vx) − 1 ) M(dx) dv } = exp{ ∫_R ∫_0^u ( cos(vx) − 1 ) dv M(dx) }.

This entails the distribution of a Poisson random measure on R+ × R with intensity Leb ⊗ M. It remains to see that the distribution of a Poisson random measure determines its intensity M (cf. Exercise 11).

Proposition 6.9 ((a,M)-convergence Theorem)
(an , Mn ) → (a, M ) iff an → a and Mn → M weakly (which can be expressed in terms of the
convergence of the corresponding measures µn ).

Further, if (an , Mn ) → ψ continuous at the origin, then ψ = (a, M ) for some real a and some Lévy
measure M .

Proof. The sufficiency is obvious. The necessity will follow by the previous approach. We infer
that ϕn → ϕ implies that the distributions of the Poisson measures with mean Leb ⊗ Mn converge
weakly to the distribution of a Poisson measure with mean Leb ⊗ M . Hence Mn → M .

Proposition 6.10 (Lévy-Khinchin) Every ID ch.fun. has the unique representation (6.5).

Proof. As we have seen, (6.5) entails ID. Now, let ψ be an ID ch.fun.; then so is ψ^{1/n} for every n. Let the latter be the ch.fun. of some probability µ_n. Thus

ψ(t) = lim_n exp{ n ( ψ^{1/n}(t) − 1 ) } = lim_n exp{ ∫_R ( e^{itx} − 1 ) n µ_n(dx) }.

Rewrite the integral in the exponent:

it ∫_R ( nx/(1 + x^2) ) µ_n(dx) + ∫_R ( e^{itx} − 1 − itx/(1 + x^2) ) · ((1 + x^2)/x^2) · ( x^2/(1 + x^2) ) n µ_n(dx) = ln (a_n, M_n)(t),

where a_n = n ∫_R x/(1 + x^2) µ_n(dx) and M_n(dx) = n µ_n(dx).

By the second part of the convergence theorem, a_n → a and M_n → M for some real a and some Lévy measure M. So, ψ = (a, M).
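As a numerical illustration (my own, using the symmetric Lévy measure µ_α of Section 5.8 as a test case), the exponent in (6.5) with a = 0 and M(dx) = α|x|^{−1−α} dx reduces by symmetry to −c_α|t|^α, recovering the SSα ch.fun.:

    import numpy as np

    # Evaluate the Levy-Khinchin exponent in (6.5) with a = 0 and M(dx) = alpha*|x|^(-1-alpha) dx.
    # The compensator itx/(1+x^2) integrates to 0 by symmetry, so only the cosine part remains,
    # and the exponent should equal -c_alpha*|t|^alpha with c_alpha = 2*alpha*int_0^inf (1-cos u)/u^(1+alpha) du.

    alpha = 1.5
    du = 1e-4
    u = np.arange(du / 2, 200.0, du)                             # midpoint grid on (0, 200]
    dens = alpha * u ** (-1 - alpha)
    c_alpha = 2 * alpha * np.sum((1 - np.cos(u)) / u ** (1 + alpha)) * du

    for t in (0.5, 1.0, 2.0):
        exponent = 2 * np.sum((np.cos(t * u) - 1) * dens) * du   # both half-lines by symmetry
        print(t, exponent, -c_alpha * abs(t) ** alpha)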

6.6 Exercises

1. Find an example of a r.v. or probability distribution that is not ID although its ch.f. never vanishes. Hint: try simple discrete (even two-valued) r.vs.

2. Examples of ID laws: Normal, Poisson, stable, exponential, gamma. Prove (or just observe): if a one-parameter family of ch.funs. is of the form φ(t) = φ_θ(t) = exp{ −θ p(t) }, where θ may vary through R or [0, ∞), then every φ_θ is ID.

3. Convolutions of ID are ID. If φ is ID so is |φ|.

4. Let V be exponential, so it is ID. Is V^α (the so-called Weibull r.v.) ID?

5. Show that a nondegenerate distribution supported on a finite set is not ID. It might not even be decomposable! Find an example (e.g., the binomial distribution is decomposable but...).

6. Let X be c-decomposable. Prove that necessarily |c| < 1. Infer from this that every c-
decomposable r.v. can be written as


X =D ∑_{n=0}^{∞} R_n c^n,

where Rn are i.i.d., and so this provides a large class of examples.



7. Let X = ∑_k a_k N_{λ_k}, where a_k, λ_k > 0, N_{λ_k} are independent Poisson, and the infinite series
converges in distribution. Show that it converges a.s.

8. In the proof of Theorem 6.7 the "diagonal argument" was used. Write precisely all of its details (beginning with "Let ϵ > 0 . . ."). Hint: Let (x_mk) be a matrix of elements of a metric space such that x_m = lim_k x_mk exists for every m, and also x = lim_m x_m exists. Then there is a sequence (k_m) such that lim_m x_{m k_m} = x. Although the statement in the theorem involves a pointwise convergence of functions, which is not metrizable in general, the metric convergence must be used somehow.

9. Prove that the function defined in (6.5) is continuous.

10. Prove that the functions iat and the one defined by the integral in (6.5) are linearly indepen-
dent.

11. At the end of the uniqueness and convergence theorems for (a, M ) there were three statements.

(a) First, w.l.o.g., we may assume that an atomless Lévy measure M is a probability measure. In fact, M = ∑_k M_k, where M_k are probabilities. Examine the details in both statements.
(b) Let ξ, ξ′ be Poisson measures on R+ × R with intensities Leb ⊗ M and Leb ⊗ M′. If ξ =D ξ′, then M = M′. Prove it.
(c) If the distributions of the Poisson measures with mean Leb ⊗ Mn converge weakly to the
distribution of a Poisson measure with mean Leb ⊗ M , then Mn → M . Prove it.
(d) Let the distributions of the Poisson measures with mean Leb ⊗ Mn converge weakly to some distribution. Then it must be the distribution of some Poisson random measure with mean Leb ⊗ M.
