
MATH3029/MATH6109 LECTURE 15

We move on to a new chapter, and we begin by studying the expectation of
random variables. The setup for today is as follows:
• expectation for discrete random variables;
• expectation for general random variables;
• properties of expectation.

1. Expectation for discrete random variables


One of the most basic questions about a random variable X : Ω → R on a
probability space (Ω, F, P) that you could ask is: what is the average value of X?
The answer is given by the expectation.
Let X be a random variable on a probability space (Ω, F, P) with countable
range X(Ω) ⊆ R.
Definition 1.1. If
\[
\sum_{x \in X(\Omega)} |x| \, P(X = x) = \sum_{x \in X(\Omega)} |x| \, P_X(\{x\}) < \infty, \tag{1.1}
\]
then we say that X has an expectation, and we write X ∈ L^1(Ω, F, P) or simply
X ∈ L^1(Ω). Then
\[
E(X) = E_P(X) := \sum_{x \in X(\Omega)} x \, P(X = x) = \sum_{x \in X(\Omega)} x \, P_X(\{x\})
\]
is the expectation of X.
There are several things to note about this definition:
• E(X) depends only on the distribution PX of X.
• In general it need not be the case that E(X) ∈ X(Ω).
• One can trivially write E(X) in terms of the discrete density function
ρ : R → [0, 1] of P_X, given by ρ(x) = P_X({x}) for x ∈ X(Ω), as
\[
E(X) = \sum_{x \in X(\Omega)} x \, \rho(x).
\]

• If
\[
\sum_{x \in X(\Omega)} |x| \, P(\{X = x\}) = \infty,
\]
then the expression
\[
\sum_{x \in X(\Omega)} x \, P(\{X = x\})
\]
does not make sense in general (because we might end up subtracting
infinity from infinity, which is not well-defined), and we say that X
does not have an expectation, denoted X ∉ L^1(Ω). This is not merely a
technicality: if you try to calculate the expectation in such a case, you
can encounter expressions which don't make sense.

• If X ∉ L^1(Ω) but X(Ω) ⊆ [0, ∞), then one can still make sense of the
expectation as
\[
\sum_{x \in X(\Omega)} x \, P(\{X = x\}) = \infty. \tag{1.2}
\]
We say that E(X) = ∞ when X ≥ 0 and (1.2) holds.


• Since |X| ≥ 0, one can rephrase (1.1) as saying that X ∈ L^1(Ω) if and only
if E(|X|) < ∞, and we will do so frequently.
• If Ω is countable and F = P(Ω) is the power set of Ω, then each X : Ω → R
is discrete. In this case one can simply view the set {X = x} as a countable
union of singletons, and write
\[
E(|X|) = \sum_{\omega \in \Omega} |X(\omega)| \, P(\{\omega\}).
\]
Moreover, if the latter quantity is finite, i.e. if X ∈ L^1(Ω), then
\[
E(X) = \sum_{\omega \in \Omega} X(\omega) \, P(\{\omega\}).
\]
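The defining sum can be evaluated directly for a random variable with finite range. The following is a minimal sketch, assuming the distribution is given as a dictionary mapping each value x to P(X = x); the function name `expectation` and the fair-die example are illustrative choices, not part of the lecture. It also illustrates the remark that E(X) need not lie in X(Ω).

```python
from fractions import Fraction

def expectation(pmf):
    """Return E(X) = sum of x * P(X = x) for a finite pmf given as {x: P(X = x)}."""
    # For a finite range the sum of |x| * P(X = x) is automatically finite,
    # so X is in L^1 and the sum below is well defined.
    return sum(x * p for x, p in pmf.items())

# Illustrative example (an assumption of this sketch): a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = expectation(die)
print(mean)          # 7/2
print(mean in die)   # False: the expectation is not a value the die can take
```

Exact fractions are used so that the result is not blurred by floating-point rounding.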

Example 1.2. Let (Ω, F, P) be a probability space and A ∈ F. Then 1_A ∈ L^1(Ω)
and
\[
E(\mathbf{1}_A) = 0 \cdot P(\{\mathbf{1}_A = 0\}) + 1 \cdot P(\{\mathbf{1}_A = 1\}) = 0 \cdot P(A^c) + 1 \cdot P(A) = P(A).
\]
In words, the expectation of the indicator function of an event is the probability of
that event.
In particular, if X : Ω → {0, 1} is a Bernoulli random variable with success
probability p ∈ [0, 1], then E(X) = p.

You can easily calculate the expectation of a binomial random variable in a
similar way, but we will derive it as a simple corollary later on.

Example 1.3. Let (X_k)_{k=1}^∞ be a Bernoulli sequence with success probability p ∈ [0, 1]
on (Ω, F, P), i.e. (X_k)_{k=1}^∞ is i.i.d. and each X_k has the Bernoulli distribution with
parameter p. Let T := inf{k ∈ N | X_k = 1} : Ω → N. If (X_k)_{k=1}^∞ models an infinite
sequence of coin tosses, then T is the waiting time for the first tails (or heads, whichever
one you choose to correspond to 1). What is the expectation of T?
If p = 1 then T = 1 constantly, so E(T) = 1. If p = 0 then T = ∞ (with the convention
inf ∅ = ∞), so T ∉ L^1(Ω). If p ∈ (0, 1) then we have seen in Chapter 2 that T − 1 has the
geometric distribution G_p. Since T − 1 ≥ 0 we can consider E(T) ∈ [0, ∞], and with
q = 1 − p we obtain
\[
\begin{aligned}
E(T) &= \sum_{i=1}^{\infty} i \, P(T = i) = \sum_{i=1}^{\infty} i \, P(T - 1 = i - 1) = \sum_{k=0}^{\infty} (k + 1) \, G_p(\{k\}) \\
&= \sum_{k=0}^{\infty} (k + 1) p q^{k} = \sum_{i=1}^{\infty} i p q^{i-1} = p \left( \frac{d}{ds} \sum_{i=0}^{\infty} s^{i} \right)\bigg|_{s=q} = p \left( \frac{d}{ds} \frac{1}{1 - s} \right)\bigg|_{s=q} \\
&= p \, \frac{1}{(1 - q)^{2}} = \frac{1}{p},
\end{aligned}
\]
where we used that both series on the second line converge because q ∈ (0, 1), and that a
power series may be differentiated term by term inside its radius of convergence.
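The value E(T) = 1/p can be sanity-checked numerically by summing finitely many terms of the series. This is only a sketch: the function name `partial_expectation` and the choice p = 0.25 are assumptions made for illustration, and floating-point partial sums are of course not a proof.

```python
def partial_expectation(p, n_terms):
    """Partial sum of i * p * q**(i - 1) for i = 1, ..., n_terms, where q = 1 - p."""
    q = 1.0 - p
    return sum(i * p * q ** (i - 1) for i in range(1, n_terms + 1))

p = 0.25  # illustrative success probability (an assumption of this sketch)
for n in (10, 100, 1000):
    print(n, partial_expectation(p, n))
print("1/p =", 1 / p)
```

The partial sums increase with the number of terms and approach 1/p, as the closed-form computation predicts.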

2. General random variables


Let X : Ω → R be a random variable, where R is equipped with the Borel σ-
algebra. (More generally, one could replace R by Rn .) To define the expectation of
X one uses the theory of Lebesgue integration. We don’t discuss that theory in this
course, but I’ll give a very brief summary of the parts which we use. These ideas
are the same as those used to define the Lebesgue integral on Rn . We proceed in
three steps.
Step 1 We have already seen how to define the expectation of X if X(Ω) ⊆ R is
finite. Such an X is usually called a simple random variable.
Step 2 For a nonnegative random variable X : Ω → [0, ∞], one can find a sequence
(X_n)_{n∈N} of simple random variables X_n : Ω → [0, ∞] such that
X_n(ω) ↑ X(ω) for all ω ∈ Ω. We set
\[
E(X) := \lim_{n \to \infty} E(X_n) \in [0, \infty],
\]
and say that X ∈ L^1(Ω, F, P) if E(X) < ∞. (One can show that this limit does
not depend on the choice of the approximating sequence.)
Step 3 For a general random variable X : Ω → [−∞, ∞], write X ∈ L^1(Ω, F, P)
or X ∈ L^1(Ω) if |X| ∈ L^1(Ω, F, P). Since |X| ≥ 0 is a random variable,
this makes sense by the previous step. Then, if X ∈ L^1(Ω), we write
X = X_+ − X_−, where X_+, X_− : Ω → [0, ∞] are given by X_+(ω) := max(X(ω), 0)
and X_−(ω) := max(−X(ω), 0) for ω ∈ Ω. Then
\[
E(X) := E(X_+) - E(X_-). \tag{2.1}
\]
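Step 2 can be made concrete in a simple case. The sketch below assumes Ω = [0, 1] with the uniform (Lebesgue) probability measure and X(ω) = ω, and uses the dyadic approximations X_n = floor(2^n X) / 2^n; this specific space, the approximating sequence, and the function name are illustrative assumptions, not taken from the lecture.

```python
from fractions import Fraction

def expectation_of_dyadic_approximation(n):
    """E(X_n) for X(w) = w on [0, 1] with the uniform measure, where
    X_n = floor(2**n * X) / 2**n is a simple random variable."""
    step = Fraction(1, 2 ** n)
    # X_n equals k * step on [k * step, (k + 1) * step), which has probability step.
    # The single point w = 1 has probability zero and does not contribute.
    return sum(k * step * step for k in range(2 ** n))

for n in range(1, 6):
    print(n, expectation_of_dyadic_approximation(n))
```

The values equal 1/2 − 2^{−(n+1)}, so they increase to 1/2, which is the value that Step 2 assigns to E(X) in this example.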
There are again several things to note.
• This generalizes the definition of the expectation for discrete random vari-
ables.
• We allow X : Ω → [−∞, ∞] to assume the values ±∞, which is
occasionally useful. But for the expectation to be finite one must have
P(|X| = ∞) = 0.
• If E(|X|) = ∞ then one may have E(X+ ) = E(X− ) = ∞, since |X| =
X+ + X− , and in this case the expression in (2.1) does not make sense.
That is why we require that E(|X|) < ∞.
• The expectation is the integral over Ω of X with respect to P, and the steps
above show how to define this integral properly. Hence one often writes
\[
E(X) = \int_{\Omega} X(\omega) \, dP(\omega).
\]
Since we're dealing with an abstract random variable on an abstract
probability space here, you need the machinery above to make sense of this
integral to begin with.
• This abstract process doesn't really let us calculate any expectations in
examples just yet. For that we need some properties of expectations.
Also, it is useful to compare the expectation to the notion of a median.
Definition 2.1. Let X : Ω → R be a random variable on (Ω, F, P). A median of X
is a µ ∈ R such that P(X ≥ µ) ≥ 1/2 and P(X ≤ µ) ≥ 1/2.
At least one median exists, by the right-continuity of the cumulative distribution
function (draw a picture to convince yourself, by considering separately the case
where F_X is continuous at the point where its values cross 1/2, and the case where
it is not). But the median and expectation of a random variable are in general not
the same. Each gives you a simple way to say something about the average of a
real-valued random variable.
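To see how far apart a median and the expectation can be, one can check Definition 2.1 directly on a small discrete distribution. The distribution below, with one large outlier, and the helper `is_median` are hypothetical choices made for this sketch.

```python
from fractions import Fraction

def is_median(pmf, m):
    """Check Definition 2.1: P(X >= m) >= 1/2 and P(X <= m) >= 1/2."""
    upper = sum(p for x, p in pmf.items() if x >= m)
    lower = sum(p for x, p in pmf.items() if x <= m)
    return upper >= Fraction(1, 2) and lower >= Fraction(1, 2)

# Hypothetical distribution with one large outlier.
pmf = {0: Fraction(1, 2), 1: Fraction(2, 5), 100: Fraction(1, 10)}
mean = sum(x * p for x, p in pmf.items())
print(mean)                             # 52/5
print(is_median(pmf, 0))                # True
print(is_median(pmf, Fraction(1, 2)))   # True: a median need not be unique
print(is_median(pmf, mean))             # False
```

In this example every point of [0, 1] is a median, while the expectation 52/5 is pulled far to the right by the outlier.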

3. Properties of expectations
There are several basic properties which are not difficult to show (trying to prove
them is a useful exercise, but not incredibly important).
Theorem 3.1. Let X, Y, X_1, X_2, … ∈ L^1(Ω, F, P). Then the following properties
hold.
(1) If X ≤ Y then E(X) ≤ E(Y).
(2) For all c ∈ R one has cX, X + Y ∈ L^1(Ω) and E(cX) = cE(X),
E(X + Y) = E(X) + E(Y).
(3) If X_k ↑ X as k → ∞, then E(X) = lim_{k→∞} E(X_k).
(4) If X_k ≥ 0 for all k ∈ N and X = Σ_{k=1}^∞ X_k, then E(X) = Σ_{k=1}^∞ E(X_k).
(5) If X and Y are independent, then E(XY) = E(X)E(Y).
(6) If X and Y are identically distributed then E(X) = E(Y).
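Properties (2) and (5) can be checked by hand on a small finite probability space. The following sketch assumes two fair dice with all 36 outcomes equally likely; the names `omega`, `prob`, `E`, `X` and `Y` are illustrative and not from the lecture.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: two fair dice, 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(omega))

def E(f):
    """Expectation of the random variable f on omega: sum of f(w) * P({w})."""
    return sum(f(w) * prob for w in omega)

X = lambda w: w[0]   # value of the first die
Y = lambda w: w[1]   # value of the second die, independent of X

print(E(lambda w: X(w) + Y(w)) == E(X) + E(Y))   # True: linearity
print(E(lambda w: X(w) * Y(w)) == E(X) * E(Y))   # True: X and Y are independent
print(E(lambda w: X(w) * X(w)) == E(X) * E(X))   # False: X is not independent of itself
```

The last line shows that the product rule in (5) genuinely needs independence, whereas linearity in (2) does not.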
Some remarks:
• The fifth property might seem surprising when you realize that an expec-
tation is just an integral, because it says that the integral of a product is
the product of the integrals. However, it is easy to see that this is true by
first looking at simple functions and then approximating.
• The final property is another instance of a motto which we’ve encountered
before: it is generally not the underlying probability space (Ω, F, P) that is
relevant, only the distribution on R of the random variable X.
The contents of the final remark are reinforced even more by the following propo-
sition.
Proposition 3.2. Let X : Ω → R be a real-valued random variable with probability
density function ρ : R → [0, ∞) on a probability space (Ω, F, P). Then
X ∈ L^1(Ω, F, P) if and only if
\[
\int_{\mathbb{R}} |x| \, \rho(x) \, dx < \infty,
\]
in which case
\[
E(X) = \int_{\mathbb{R}} x \, \rho(x) \, dx.
\]
Furthermore, if Y : Ω → R^n is a continuous random variable with density function
f_Y : R^n → [0, ∞), and g : R^n → R is another random variable, then
g(Y) = g ∘ Y ∈ L^1(Ω, F, P) if and only if
\[
\int_{\mathbb{R}^n} |g(x)| \, f_Y(x) \, dx < \infty,
\]
in which case
\[
E(g(Y)) = \int_{\mathbb{R}^n} g(x) \, f_Y(x) \, dx.
\]

The proof of this proposition uses approximation by simple random variables.
Due to time restrictions we won't do it here, but you can have a look at Corollary
4.13 in the book if you're interested. The first part of the proposition tells you how
to compute the expectation of a continuous random variable. The second part is
even more powerful, since it allows you to compute the expectation of any function
of a continuous random variable. We will use the second part of this proposition a
lot. For example, it is useful in the case where n = 1 and g(x) = x^2, to compute
E(Y^2).
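As a numerical illustration of both parts of Proposition 3.2, the sketch below approximates E(Y) and E(Y^2) for an exponentially distributed Y with density α e^{−αx} on [0, ∞). The rate α = 2, the truncation of the integral at 50, and the midpoint rule helper `integrate` are assumptions made for this example only.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

alpha = 2.0  # hypothetical rate parameter
density = lambda x: alpha * math.exp(-alpha * x)   # exponential density on [0, inf)

# Truncate the integrals at 50: the neglected tail is of order exp(-100).
first_moment = integrate(lambda x: x * density(x), 0.0, 50.0)
second_moment = integrate(lambda x: x ** 2 * density(x), 0.0, 50.0)
print(first_moment, 1 / alpha)          # numerical value next to the exact 1/alpha
print(second_moment, 2 / alpha ** 2)    # numerical value next to the exact 2/alpha^2
```

The second integral is the case g(x) = x^2 of the proposition; no knowledge of the distribution of Y^2 itself is needed.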
Even when X is not continuous, in fact when X is a general random variable,
one can still calculate its expectation using the cumulative distribution function.
Proposition 3.3. Let X : Ω → R be a random variable with cumulative distribution
function F_X : R → [0, 1] on a probability space (Ω, F, P). Then X ∈ L^1(Ω, F, P) if
and only if
\[
\int_0^{\infty} (1 - F_X(c)) \, dc + \int_{-\infty}^{0} F_X(c) \, dc < \infty,
\]
in which case
\[
E(X) = \int_0^{\infty} (1 - F_X(c)) \, dc - \int_{-\infty}^{0} F_X(c) \, dc.
\]

Proof. I’ll indicate how to prove this statement using the so-called standard ma-
chine, a method of proof for statements on Lebesgue integration where one reduces
the proof for general random variables to a statement for indicator functions. It
is a good exercise in measure and integration theory to fill in some of the details
which I am leaving out.
First of all, by writing X = X+ − X− as in Step 3 of the definition of the
expectation, one can check that it suffices to prove the statement for nonnegative
X. (Afterwards, one can apply that statement to the nonnegative random variables
X+ and X− and use the linearity of the expectation.)
Moreover, by using approximations by simple random variables as in Step 2 of
the definition of the expectation, it suffices to assume that X is a simple random
variable, i.e. that X(Ω) ⊆ R is finite.
Finally, by linearity of the expectation, we may assume that X = 1_A is an
indicator function of an event A ∈ F. In this case one trivially has X ∈ L^1(Ω, F, P)
and
\[
\int_0^{\infty} (1 - F_X(c)) \, dc + \int_{-\infty}^{0} F_X(c) \, dc < \infty,
\]
and in fact, since X(Ω) ⊆ {0, 1},
\[
\begin{aligned}
\int_0^{\infty} (1 - F_X(c)) \, dc + \int_{-\infty}^{0} F_X(c) \, dc
&= \int_0^{\infty} P(X > c) \, dc = \int_0^{1} P(X > c) \, dc \\
&= \int_0^{1} P(X = 1) \, dc = P(\mathbf{1}_A = 1) = P(A) = E(X),
\end{aligned}
\]
by what we have already proved in Example 1.2 about the expectation of an
indicator function. □
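For a random variable with finitely many values the cumulative distribution function is a step function, so the two integrals in Proposition 3.3 can be evaluated exactly and compared with the defining sum. The sketch below does this; the function name `expectation_via_cdf` and the three-point distribution are hypothetical choices for illustration.

```python
from fractions import Fraction

def expectation_via_cdf(pmf):
    """Evaluate int_0^inf (1 - F(c)) dc - int_{-inf}^0 F(c) dc exactly
    for a finite pmf {x: P(X = x)}, using that F is a step function."""
    cdf = lambda c: sum(p for x, p in pmf.items() if x <= c)
    # Breakpoints of F, together with 0 so that no interval straddles the origin.
    points = sorted(set(pmf) | {0})
    positive_part = 0
    negative_part = 0
    for a, b in zip(points, points[1:]):
        # F is constant, equal to F(a), on the interval [a, b).
        if a >= 0:
            positive_part += (b - a) * (1 - cdf(a))
        else:
            negative_part += (b - a) * cdf(a)
    # Below the smallest point F = 0 and above the largest point 1 - F = 0,
    # so the unbounded parts of both integrals contribute nothing.
    return positive_part - negative_part

pmf = {-2: Fraction(1, 4), 0: Fraction(1, 4), 3: Fraction(1, 2)}
print(expectation_via_cdf(pmf))             # 1
print(sum(x * p for x, p in pmf.items()))   # 1
```

Both computations give the same value, as the proposition asserts.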

4. More examples
Example 4.1. Let X_1, …, X_n be a finite Bernoulli sequence on (Ω, F, P). That is,
the X_i are i.i.d. Bernoulli random variables, with success probability p ∈ [0, 1]. Let
S = Σ_{i=1}^n X_i. It then follows from our derivation of the multinomial and binomial
distributions (see Theorem 2.9 in the book) that P_S = B_{n,p}. Moreover, by Example
1.2 and the linearity of the expectation one has
\[
E(S) = \sum_{i=1}^{n} E(X_i) = np.
\]
Linearity of the expectation does not actually require independence, so the above
holds even without the assumption of independence (although then S may not be
binomially distributed).
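The identity E(S) = np can also be confirmed directly from the binomial probabilities k ↦ C(n, k) p^k (1 − p)^{n−k}. The sketch below does so with exact fractions; the parameters n = 10 and p = 3/10 and the function name `binomial_mean` are assumptions for illustration. It also evaluates the fully dependent case X_1 = … = X_n mentioned in the remark above.

```python
from fractions import Fraction
from math import comb

def binomial_mean(n, p):
    """E(S) computed directly from the binomial probabilities of B_{n,p}."""
    return sum(k * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1))

n, p = 10, Fraction(3, 10)   # hypothetical parameters
print(binomial_mean(n, p), n * p)   # 3 3

# Fully dependent case: X_1 = ... = X_n, so S takes only the values 0 and n
# and is not binomially distributed, yet E(S) is still n * p.
dependent_mean = 0 * (1 - p) + n * p
print(dependent_mean)               # 3
```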
Example 4.2. Let X : Ω → (0, ∞) be a random variable on (Ω, F, P) with the
gamma distribution Γ_{α,r} for α, r > 0. We can calculate E(X) using the density
function and a change of variables:
\[
\begin{aligned}
E(X) &= \int_0^{\infty} x \, \gamma_{\alpha,r}(x) \, dx = \frac{\alpha^{r}}{\Gamma(r)} \int_0^{\infty} x^{r} e^{-\alpha x} \, dx \\
&= \frac{1}{\alpha \Gamma(r)} \int_0^{\infty} x^{r} e^{-x} \, dx = \frac{\Gamma(r + 1)}{\alpha \Gamma(r)} = \frac{r}{\alpha},
\end{aligned}
\]
where we used the definition of the gamma function and, in the last step, the
identity Γ(r + 1)/Γ(r) = r for r > 0.
In particular, if X has the exponential distribution E_α then E(X) = 1/α.
Example 4.3. Let X : Ω → R be a random variable on (Ω, F, P) with the Cauchy
distribution with location parameter 0 and scale parameter α > 0. That is, the
distribution P_X of X has probability density function
\[
\rho(x) = \frac{\alpha}{\pi(\alpha^{2} + x^{2})} \qquad (x \in \mathbb{R}).
\]
Since ρ is symmetric around zero, one might be tempted to think that E(X) = 0.
However, a change of variables shows that
\[
F_X(c) = \int_{-\infty}^{c} \rho(x) \, dx = \frac{1}{\pi} \int_{-\infty}^{c} \frac{1}{1 + \frac{x^{2}}{\alpha^{2}}} \, \frac{dx}{\alpha} = \frac{1}{\pi} \int_{-\infty}^{c/\alpha} \frac{1}{1 + x^{2}} \, dx = \frac{1}{\pi} \left( \arctan\frac{c}{\alpha} + \frac{\pi}{2} \right)
\]
for c < 0. Basic asymptotics of the arctan function show that
F_X(c) ∼ α/(π|c|) as c → −∞, which is not integrable at −∞. So
\[
\int_{-\infty}^{0} F_X(c) \, dc = \infty,
\]
and X ∉ L^1(Ω, F, P) by Proposition 3.3.
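The divergence can be made visible by evaluating the truncated integral of F_X over [−M, 0] for growing M. The sketch below uses the antiderivative t arctan(t/α) − (α/2) log(1 + t²/α²) of arctan(t/α) to get a closed form; the function name `integral_of_cdf` and the default α = 1 are assumptions for illustration.

```python
import math

def integral_of_cdf(M, alpha=1.0):
    """Closed form of the integral of F_X(c) over [-M, 0] for the Cauchy CDF
    F_X(c) = 1/2 + arctan(c / alpha) / pi."""
    antiderivative = M * math.atan(M / alpha) - (alpha / 2) * math.log1p((M / alpha) ** 2)
    return M / 2 - antiderivative / math.pi

for M in (1e1, 1e2, 1e3, 1e4):
    print(M, integral_of_cdf(M))
```

The printed values keep growing, roughly like (α/π) log M, so the integral over (−∞, 0] cannot be finite.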
