LECTURE 15
is the expectation of X.
There are several things to note about this definition:
• E(X) depends only on the distribution PX of X.
• In general it need not be the case that E(X) ∈ X(Ω).
• One can trivially write E(X) in terms of the discrete density function ρ : R → [0, 1] of PX, given by ρ(x) = PX({x}) for x ∈ X(Ω), as
\[ E(X) = \sum_{x \in X(\Omega)} x\,\rho(x). \]
• If
\[ \sum_{x \in X(\Omega)} |x|\,P(\{X = x\}) = \infty, \]
then the expression
\[ \sum_{x \in X(\Omega)} x\,P(\{X = x\}) \]
does not make sense in general (because we might end up subtracting infinity from infinity, and the latter is not well-defined), and we say that X does not have an expectation, denoted X ∉ L1(Ω). This is not merely a technicality: if you try to calculate the expectation in such a case, you can encounter expressions which do not make sense.
• If X ∉ L1(Ω) but X(Ω) ⊆ [0, ∞), then one can still make sense of the expectation as
\[ \sum_{x \in X(\Omega)} x\,P(\{X = x\}) = \infty. \tag{1.2} \]
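For instance, here is a minimal Python sketch (not part of the lecture) of the formula E(X) = ∑ x ρ(x) for a random variable with finite support, where the absolute-convergence condition is automatic:

```python
# Sketch: expectation of a discrete random variable from its density rho,
# using E(X) = sum over x in X(Omega) of x * rho(x).

def expectation(rho):
    """rho: dict mapping each value x to P({X = x}) (finite support)."""
    # For a finite support the absolute-convergence condition holds
    # automatically, so the sum below is always well-defined.
    return sum(x * p for x, p in rho.items())

# A fair six-sided die: X(Omega) = {1, ..., 6}, rho(x) = 1/6.
die = {x: 1 / 6 for x in range(1, 7)}
print(expectation(die))  # 3.5 -- note E(X) need not lie in X(Omega)
```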
Example 1.3. Let (X_k)_{k=1}^∞ be a Bernoulli sequence with success probability p ∈ [0, 1] on (Ω, F, P), i.e. (X_k)_{k=1}^∞ is i.i.d. and has the Bernoulli distribution with parameter p. Let T := inf{k ∈ N | X_k = 1} : Ω → N ∪ {∞}, with the convention inf ∅ = ∞. If (X_k)_{k=1}^∞ models an infinite coin toss, then T is the waiting time for the first tails (or heads, whichever one you choose to correspond to 1). What is the expectation of T?
If p = 1 then T = 1 constantly, so E(T) = 1. If p = 0 then T = ∞, so T ∉ L1(Ω). If p ∈ (0, 1) then we have seen in Chapter 2 that T − 1 has the geometric distribution Gp. Since T − 1 ≥ 0 we can consider E(T) ∈ [0, ∞], and with q = 1 − p we obtain
\[
\begin{aligned}
E(T) &= \sum_{i=1}^{\infty} i\,P(T = i) = \sum_{i=1}^{\infty} i\,P(T - 1 = i - 1) = \sum_{k=0}^{\infty} (k+1)\,G_p(\{k\}) \\
&= \sum_{k=0}^{\infty} (k+1)\,p q^{k} = \sum_{i=1}^{\infty} i\,p q^{i-1} = p \left.\frac{d}{ds} \sum_{i=0}^{\infty} s^{i}\right|_{s=q} = p \left.\frac{d}{ds} \frac{1}{1-s}\right|_{s=q} \\
&= p\,\frac{1}{(1-q)^{2}} = \frac{1}{p},
\end{aligned}
\]
where we used that the geometric series and its term-by-term derivative both converge because q ∈ (0, 1).
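As a numerical sanity check of E(T) = 1/p, one can simulate the waiting time; the following is a minimal Python sketch (numpy assumed, not part of the lecture):

```python
# Sketch: Monte Carlo check that the waiting time for the first success
# in a Bernoulli sequence with p in (0, 1) has expectation 1/p.
import numpy as np

rng = np.random.default_rng(0)
p = 0.3

def waiting_time():
    """T = inf{k : X_k = 1} for an i.i.d. Bernoulli(p) sequence."""
    k = 1
    while rng.random() >= p:  # failure with probability 1 - p
        k += 1
    return k

samples = [waiting_time() for _ in range(100_000)]
print(np.mean(samples), 1 / p)  # both close to 3.333...
```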
it is not). But the median and expectation of a random variable are in general not
the same. Each gives you a simple way to say something about the average of a
real-valued random variable.
3. Properties of expectations
There are several basic properties which are not difficult to show (trying to prove
them is a useful exercise, but not incredibly important).
Theorem 3.1. Let X, Y, X1 , X2 , . . . ∈ L1 (Ω, F, P). Then the following properties
hold.
(1) If X ≤ Y then E(X) ≤ E(Y ).
(2) For all c ∈ R one has cX, X +Y ∈ L1 (Ω) and E(cX) = cE(X), E(X +Y ) =
E(X) + E(Y ).
(3) If Xk ↑ X as k → ∞, then E(X) = lim_{k→∞} E(Xk).
(4) If Xk ≥ 0 for all k ∈ N and X = ∑_{k=1}^∞ Xk, then E(X) = ∑_{k=1}^∞ E(Xk).
(5) If X and Y are independent, then E(XY ) = E(X)E(Y ).
(6) If X and Y are identically distributed then E(X) = E(Y ).
Some remarks:
• The fifth property might seem surprising when you realize that an expec-
tation is just an integral, because it says that the integral of a product is
the product of the integrals. However, it is easy to see that this is true by
first looking at simple functions and then approximating.
• The final property is another instance of a motto which we’ve encountered
before: it is generally not the underlying probability space (Ω, F, P) that is
relevant, only the distribution on R of the random variable X.
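As a quick numerical illustration of the fifth property, here is a minimal Python sketch (numpy assumed; the particular distributions are arbitrary choices, not from the lecture):

```python
# Sketch: for independent X and Y, the sample mean of XY should match
# the product of the sample means (property (5) of Theorem 3.1).
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
X = rng.exponential(scale=2.0, size=n)  # E(X) = 2
Y = rng.uniform(0.0, 1.0, size=n)       # E(Y) = 1/2, independent of X

print(np.mean(X * Y))                   # approx 1.0
print(np.mean(X) * np.mean(Y))          # approx 1.0
```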
The final remark is reinforced further by the following proposition.
Proposition 3.2. Let X : Ω → R be a real-valued random variable with probability density function ρ : R → [0, ∞) on a probability space (Ω, F, P). Then X ∈ L1(Ω, F, P) if and only if
\[ \int_{\mathbb{R}} |x|\,\rho(x)\,dx < \infty, \]
in which case
\[ E(X) = \int_{\mathbb{R}} x\,\rho(x)\,dx. \]
Furthermore, if Y : Ω → R^n is a continuous random variable with density function fY : R^n → [0, ∞), and g : R^n → R is a measurable function, then g(Y) = g ∘ Y ∈ L1(Ω, F, P) if and only if
\[ \int_{\mathbb{R}^n} |g(x)|\,f_Y(x)\,dx < \infty, \]
in which case
\[ E(g(Y)) = \int_{\mathbb{R}^n} g(x)\,f_Y(x)\,dx. \]
Proof. I’ll indicate how to prove this statement using the so-called standard ma-
chine, a method of proof for statements on Lebesgue integration where one reduces
the proof for general random variables to a statement for indicator functions. It
is a good exercise in measure and integration theory to fill in some of the details
which I am leaving out.
First of all, by writing X = X+ − X− as in Step 3 of the definition of the
expectation, one can check that it suffices to prove the statement for nonnegative
X. (Afterwards, one can apply that statement to the nonnegative random variables
X+ and X− and use the linearity of the expectation.)
Moreover, by using approximations by simple random variables as in Step 2 of
the definition of the expectation, it suffices to assume that X is a simple random
variable, i.e. that X(Ω) ⊆ R is finite.
Finally, by linearity of the expectation, we may assume that X = 1A is an
indicator function of an event A ∈ F. In this case one trivially has X ∈ L1 (Ω, F, P)
and
\[ \int_{0}^{\infty} (1 - F_X(c))\,dc + \int_{-\infty}^{0} F_X(c)\,dc < \infty, \]
and in fact, since X(Ω) ⊆ {0, 1} (so that F_X(c) = 0 for c < 0 and P(X > c) = 0 for c ≥ 1),
\[ \int_{0}^{\infty} (1 - F_X(c))\,dc + \int_{-\infty}^{0} F_X(c)\,dc = \int_{0}^{\infty} P(X > c)\,dc = \int_{0}^{1} P(X > c)\,dc = \int_{0}^{1} P(X = 1)\,dc = P(1_A = 1) = P(A) = E(X),
\]
by what we have already proved in Example 1.3 about the expectation of an indicator function. □
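To see Proposition 3.2 in action numerically, here is a minimal Python sketch (numpy and scipy assumed; not part of the lecture) checking the integrability criterion and the expectation formula for the exponential density ρ(x) = αe^{−αx}:

```python
# Sketch: check X in L1 and E(X) = integral of x * rho(x) dx for the
# exponential density rho(x) = alpha * exp(-alpha * x) on [0, infinity).
import numpy as np
from scipy.integrate import quad

alpha = 2.0
rho = lambda x: alpha * np.exp(-alpha * x)

# Integrability check: integral of |x| * rho(x) dx < infinity.
abs_moment, _ = quad(lambda x: abs(x) * rho(x), 0, np.inf)
print(abs_moment)  # finite, so X is in L1

# Expectation: integral of x * rho(x) dx, which should equal 1/alpha.
mean, _ = quad(lambda x: x * rho(x), 0, np.inf)
print(mean, 1 / alpha)  # both approx 0.5
```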
4. More examples
Example 4.1. Let X1, . . . , Xn be a finite Bernoulli sequence on (Ω, F, P). That is, the Xi are i.i.d. Bernoulli random variables with success probability p ∈ [0, 1]. Let S = ∑_{i=1}^n Xi. It then follows from our derivation of the multinomial and binomial distributions (see Theorem 2.9 in the book) that PS = Bn,p. Moreover, by Example 1.3 and the linearity of the expectation one has
\[ E(S) = \sum_{i=1}^{n} E(X_i) = np. \]
Linearity of the expectation does not actually require independence, so E(S) = np holds even without the assumption of independence (although then S need not be binomially distributed).
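A simulation sketch of E(S) = np in Python (numpy assumed; the parameters are arbitrary):

```python
# Sketch: E(S) = n * p for S the number of successes among n i.i.d.
# Bernoulli(p) trials.
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 0.25
trials = rng.random((100_000, n)) < p  # i.i.d. Bernoulli(p), shape (runs, n)
S = trials.sum(axis=1)                 # S = X_1 + ... + X_n for each run
print(S.mean(), n * p)                 # both approx 5.0
```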
Example 4.2. Let X : Ω → (0, ∞) be a random variable on (Ω, F, P) with the
gamma distribution Γα,r for α, r > 0. We can calculate E(X) using the density
function and a change of variables:
\[
E(X) = \int_{0}^{\infty} x\,\gamma_{\alpha,r}(x)\,dx = \frac{\alpha^{r}}{\Gamma(r)} \int_{0}^{\infty} x^{r} e^{-\alpha x}\,dx = \frac{1}{\alpha\Gamma(r)} \int_{0}^{\infty} x^{r} e^{-x}\,dx = \frac{\Gamma(r+1)}{\alpha\Gamma(r)} = \frac{r}{\alpha},
\]
where we used the definition of the gamma function and, in the last step, the
identity Γ(r + 1)/Γ(r) = r for r > 0.
In particular, if X has the exponential distribution Eα then E(X) = 1/α.
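A quick simulation check of E(X) = r/α (a Python sketch, numpy assumed; note that numpy parametrises the gamma distribution by shape r and scale 1/α):

```python
# Sketch: sample mean of Gamma(alpha, r) draws against r / alpha.
import numpy as np

rng = np.random.default_rng(3)
alpha, r = 2.0, 3.5
# numpy's gamma uses shape r and scale theta; theta = 1/alpha gives
# the density alpha^r * x^(r-1) * exp(-alpha * x) / Gamma(r).
samples = rng.gamma(shape=r, scale=1 / alpha, size=1_000_000)
print(samples.mean(), r / alpha)  # both approx 1.75
```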
Example 4.3. Let X : Ω → R be a random variable on (Ω, F, P) with the Cauchy
distribution with location parameter 0 and scale parameter α > 0. That is, the
distribution PX of X has probability density function
\[ \rho(x) = \frac{\alpha}{\pi(\alpha^{2} + x^{2})} \qquad (x \in \mathbb{R}). \]
Since ρ is symmetric around zero, one might be tempted to think that E(X) = 0.
However, a change of variables shows that
\[
F_X(c) = \int_{-\infty}^{c} \rho(x)\,dx = \frac{1}{\pi} \int_{-\infty}^{c} \frac{1}{\alpha}\,\frac{dx}{1 + x^{2}/\alpha^{2}} = \frac{1}{\pi} \int_{-\infty}^{c/\alpha} \frac{dx}{1 + x^{2}} = \frac{1}{\pi} \arctan\frac{c}{\alpha} + \frac{1}{2}
\]
for c < 0. And basic asymptotics of the arctan function show that F_X(c) ∼ α/(π|c|) as c → −∞. So
\[ \int_{-\infty}^{0} F_X(c)\,dc = \infty, \]
and X ∉ L1(Ω, F, P) by Proposition 3.3.
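One can see this failure of integrability in simulation: running means of Cauchy samples never settle down, since the law of large numbers does not apply. A minimal Python sketch (numpy assumed; not part of the lecture):

```python
# Sketch: running means of Cauchy samples keep jumping around,
# reflecting the fact that X is not in L1.
import numpy as np

rng = np.random.default_rng(4)
alpha = 1.0
x = alpha * rng.standard_cauchy(1_000_000)  # scale-alpha Cauchy samples
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)
# Unlike for an L1 variable, these snapshots do not converge:
print(running_mean[[999, 9_999, 99_999, 999_999]])
```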