Chapter 3
Conditional expectation
We will define conditional expectation E(X|A), where X is a r.v. and A is a sub-σ-algebra.
This will be used to define martingales later.
Given (Ω, F, P), let A, B ∈ F with P(B) > 0, and let X, Y be two r.v.'s.

• The "best" guess of X is
  – the population mean µ = EX (for not so heavy-tailed distributions), since µ = arg min_C E{(C − X)²};
  – the population median m = F⁻¹(1/2) (for heavy-tailed distributions), since m = arg min_C E|C − X|.

• Given some extra information B, the updated "best" guess of X is
  – the sub-population (or sectional) mean µ_B = E(X|B) (for not so heavy-tailed distributions), since
      µ_B = arg min_C E{(C − X)² | B}.

e.g. The salary of a fresh university graduate is $15,000 if (s)he works in the fin-tech industry.

Define
      E(X|B) = E(X I_B) / P(B).
An example
Example 3.1 Let X ∼ U(0, 1] with pdf f_X(x) = I_(0,1](x). Let
      A_i = {X ∈ ((i−1)/n, i/n]},  i = 1, 2, ..., n.
Clearly, Ω = ∪_{i=1}^{n} A_i and P(A_i) = 1/n. From the above definition, we get
      E(X|A_i) = ∫_{A_i} x f_X(x) dx / P(A_i) = n ∫_{(i−1)/n}^{i/n} x dx = (n/2) x² |_{(i−1)/n}^{i/n} = (2i−1)/(2n).
The "best" guess for X, given the A_i's (or σ(A_1, ..., A_n)), is E(X|A_i) = (2i−1)/(2n), i.e.,
      X_n = E(X|A_1) = 1/(2n)        if A_1 occurs (with prob. n⁻¹),
          = E(X|A_2) = 3/(2n)        if A_2 occurs (with prob. n⁻¹),
            ..........
          = E(X|A_n) = (2n−1)/(2n)   if A_n occurs (with prob. n⁻¹).
These are the updated expectations on the new space σ(A_1, ..., A_n). For example, when n = 5,
X_n is a r.v. taking values 0.1, 0.3, 0.5, 0.7, 0.9 with equal probabilities 1/5.
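To make Example 3.1 concrete, here is a small Monte Carlo sketch (not part of the original notes; the sample size, seed and variable names are arbitrary) that estimates E(X|A_i) by averaging the uniform draws falling in each cell and compares the estimates with the exact value (2i−1)/(2n).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                       # number of cells A_1, ..., A_n in the partition
x = rng.uniform(0.0, 1.0, size=200_000)     # draws from U(0, 1]

for i in range(1, n + 1):
    in_cell = (x > (i - 1) / n) & (x <= i / n)   # event A_i = {X in ((i-1)/n, i/n]}
    estimate = x[in_cell].mean()                 # Monte Carlo estimate of E(X | A_i)
    exact = (2 * i - 1) / (2 * n)                # value derived in Example 3.1
    print(f"A_{i}: P_hat = {in_cell.mean():.3f}, "
          f"E_hat(X|A_i) = {estimate:.4f}, exact = {exact:.4f}")
```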
Theorem 3.1 Let Y be a discrete r.v. taking values {y_1, ..., y_m} (m could be ∞), let A_i = {Y = y_i}, and let h(y_i) := E(X|A_i). Then Z := h(Y) satisfies:
• {Z ≤ t} = {h(Y) ≤ t} = ∪_{k: h(y_k) ≤ t} {Y = y_k} ∈ σ(Y). So h(Y) is σ(Y)-measurable.
• For each A_i = {Y = y_i}, we have
      E(h(Y) I_{A_i}) = E(h(y_i) I_{A_i}) = h(y_i) P(A_i) = E(X|A_i) P(A_i) = E(X I_{A_i}).   (3.1)
We write E(X|σ(Y)) =: E(X|Y).
Alternatively, the two properties in Theorem 3.1 can be used to define E(X|σ(Y)) for all r.v.'s Y (discrete or otherwise).
Definition 3.2 If E|X| < ∞, we define E(X|σ(Y)) to be any r.v. Z such that:
• Z is σ(Y)-measurable, i.e. {Z ≤ t} ∈ σ(Y), ∀t ∈ R;
• E(Z I_A) = E(X I_A) for all A ∈ σ(Y).
Definition 3.3 Given (Ω, F, P) and a sub-σ-algebra A ⊂ F, if E|X| < ∞, we define E(X|A) to be any r.v. Z such that
(a) Z is A-measurable;
(b) E_A Z = E_A X, i.e. E(Z I_A) = E(X I_A), for all A ∈ A.
Let
      µ(A) := P(A) and ν(A) := E_A X.
Then µ and ν are two σ-finite measures on (Ω, A), and E_A X = E_A Z can be rewritten as
      ν(A) = ∫_A Z dµ.
Theorem 3.2 If E|X| < ∞, then there exists a r.v. Z, unique up to a.s. equivalence, such that
(a) Z is A-measurable;
(b) ν(A) = E_A X = ∫_A Z dP = E_A Z, ∀A ∈ A.
Define L²(Ω, G, P) = {r.v.'s W on (Ω, G, P) : EW² < ∞}. Then E(X|A) has a very nice interpretation: for any Y ∈ L²(Ω, A, P),
      E(X − Y)² = E(X − E(X|A))² + E(E(X|A) − Y)² + 2C ≥ E(X − E(X|A))²,
and the equality is attained at Y = E(X|A) ∈ A. It remains to show C = 0:
      C = E[(X − E(X|A))(E(X|A) − Y)]
        = E{E[(X − E(X|A))(E(X|A) − Y) | A]}
        = E{[E(X|A) − Y] E[(X − E(X|A)) | A]}
        = E{(E(X|A) − Y) × 0}
        = 0.
Hence, E(X|A) is the projection of X onto L²(Ω, A, P), i.e., the closest point in this subspace to X.
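Assuming the partition σ-algebra A = σ(A_1, ..., A_n) from Example 3.1, the following Python sketch illustrates the projection property numerically: among A-measurable candidates Y (functions constant on each cell), the cell-mean function E(X|A) gives the smallest E(X − Y)². The competing candidates and names below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
x = rng.uniform(0.0, 1.0, size=200_000)          # X ~ U(0, 1]
cell = np.minimum((x * n).astype(int), n - 1)    # index i-1 of the cell A_i containing X

# E(X|A): the A-measurable r.v. equal to the cell mean on each A_i
cell_means = np.array([x[cell == k].mean() for k in range(n)])
z = cell_means[cell]

def mse(y):
    """Estimate E(X - Y)^2 for an A-measurable candidate Y."""
    return np.mean((x - y) ** 2)

# Competing A-measurable candidates: left endpoints of the cells, and the overall mean
left_endpoints = (np.arange(n) / n)[cell]
overall_mean = np.full_like(x, x.mean())

print("E(X - E(X|A))^2        =", mse(z))              # smallest of the three
print("E(X - left endpoint)^2 =", mse(left_endpoints))
print("E(X - EX)^2            =", mse(overall_mean))
```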
In fact, the projection point of view also gives a construction of E(X|A).
1. First consider everything in L²(Ω, A, P). For Z ∈ L²(Ω, A, P),
      E_A X = E_A Z, ∀A ∈ A
      ⟺ E[(X − Z) I_A] = 0, ∀A ∈ A
      ⟺ (X − Z) ⊥ I_A, ∀A ∈ A
(Here ⊥ means perpendicular, not independent.)
      ⟺ (X − Z) ⊥ Y, ∀Y ∈ L²(Ω, A, P)
      ⟺ Z = arg min_{Y ∈ L²(Ω,A,P)} E(X − Y)².
2. Next consider everything in L¹(Ω, A, P). Define
      Z_n = E(X⁺ ∧ n | A) − E(X⁻ ∧ n | A),
and Z = lim_n Z_n.
Recall that, in order to show that E(W|A) = Z, it suffices to show that
(a) Z ∈ A;
(b) E_A W = E_A Z, for any A ∈ A.
Properties of E(X|A) are very much like their unconditional counterparts. Here are some
examples.
(Jensen's inequality) If φ is convex (and E|X|, E|φ(X)| < ∞), then φ(E(X|A)) ≤ E(φ(X)|A).
Proof. Take µ_A = E(X|A) ∈ A. Since φ is convex, there exists a support line through (µ_A, φ(µ_A)), say with slope λ_{µ_A}, such that φ(x) ≥ φ(µ_A) + λ_{µ_A}(x − µ_A) for all x. Taking x = X and then E(·|A) on both sides gives E(φ(X)|A) ≥ φ(µ_A) + λ_{µ_A}(E(X|A) − µ_A) = φ(µ_A) = φ(E(X|A)).
(a) Z ∈ A by assumption;
(b) for all A ∈ A, we have E_A(Z) = E_A(X).
5. (Partial information: taking out what is known.) If X ∈ A, and E|Y| < ∞ and
E|XY| < ∞, then E(XY|A) = X E(Y|A).
Proof. Let Z = X E(Y|A).
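As a quick numerical sanity check of property 5 (not a proof), the sketch below uses a toy σ-algebra A = σ(D) generated by a die roll D; X = D² is A-measurable ("known" given A), while Y has extra randomness. The construction is an illustration, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 300_000
d = rng.integers(1, 7, size=m)        # D: a fair die roll; take A = sigma(D)
x = d ** 2                            # X = D^2 is A-measurable
y = d + rng.normal(0.0, 1.0, size=m)  # Y depends on D plus independent noise

for value in range(1, 7):
    sel = d == value
    lhs = np.mean(x[sel] * y[sel])        # estimate of E(XY | D = value)
    rhs = value ** 2 * np.mean(y[sel])    # X(value) * estimate of E(Y | D = value)
    print(f"D = {value}:  E(XY|D) ~ {lhs:8.4f}   X*E(Y|D) ~ {rhs:8.4f}")
```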
Proof of (b). We prove this result in the most typical manner.
(6.3) holds since
(a) Clearly EX ∈ A. (Any constant C is A-measurable as {C ≤ t} is either ∅ or Ω.)
(b) For all A ∈ A, we have
Appendix: An introduction to probability
Sets
• ω ∈ A : ω is an element of A.
      A^c = {ω : ω ∉ A}   (complement)
      A ∪ B = {ω : ω ∈ A or ω ∈ B}   (union)
      A ∩ B = {ω : ω ∈ A and ω ∈ B}   (intersection)
      ∪_{n=1}^{∞} A_n = {ω : ω ∈ A_n for some n}
      ∩_{n=1}^{∞} A_n = {ω : ω ∈ A_n for all n}.
A probability is a (set) function from a σ-algebra to [0, 1]. It is defined on σ-algebras, since the
power set P(Ω) can be too large unless Ω is finite or countable.
A σ-algebra F is a collection of subsets of Ω such that
• Ω ∈ F,
• A^c ∈ F whenever A ∈ F,
• ∪_{n=1}^{∞} A_n ∈ F whenever A_n ∈ F, n ≥ 1.
A probability P on (Ω, F) satisfies
• 0 ≤ P(A) ≤ 1 for all A ∈ F,
• P(Ω) = 1,
• P(∪_{n=1}^{∞} A_n) = Σ_{n=1}^{∞} P(A_n) whenever the A_n ∈ F are disjoint.
A random variable X on (Ω, F, P) is a function X : Ω → R such that, for every r ∈ R,
      {ω : X(ω) ≤ r} ∈ F.
e.g. Toss a coin once and let X = I(the toss is a head). Then
      {X = 0} = {ω : X(ω) = 0} = {T} ∈ F,  {X = 1} = {ω : X(ω) = 1} = {H} ∈ F.
Expectation
• For a simple r.v. X = Σ_{i=1}^{n} a_i I_{A_i}, where the A_i ∈ A are disjoint with ∪_{i=1}^{n} A_i = Ω, define
      EX = Σ_{i=1}^{n} a_i P(A_i).
• For X ≥ 0, there exist simple nonnegative r.v.'s X_n(ω) ↑ X(ω) for every ω (illustrated numerically below). We define
      EX = lim_{n→∞} EX_n ≤ ∞.
• In general (with X⁺ = max(X, 0) and X⁻ = −min(X, 0)), provided EX⁺ and EX⁻ are not both infinite, define
      EX = EX⁺ − EX⁻.
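The notes do not spell out the approximating sequence; a standard choice is the dyadic staircase X_k = min(⌊2^k X⌋/2^k, k), which is simple, nonnegative and increases to X. The following sketch (with an exponential r.v. chosen purely for illustration) shows EX_k increasing towards EX = 1.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=500_000)   # a nonnegative r.v. with EX = 1

# Dyadic staircase approximation: X_k = min(floor(2^k X) / 2^k, k), simple and increasing to X
for k in [1, 2, 4, 8, 16]:
    xk = np.minimum(np.floor((2.0 ** k) * x) / (2.0 ** k), k)
    print(f"k = {k:2d}:  E[X_k] ~ {xk.mean():.4f}   (EX = 1)")
```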
Properties of expectations
• (Dominated Convergence Theorem). If X_n → X a.s., |X_n| ≤ Y a.s. for all n, and
EY < ∞, then lim_n EX_n = EX = E lim_n X_n.
• (Integration term by term). If Σ_{n=1}^{∞} E|X_n| < ∞, then Σ_{n=1}^{∞} |X_n| < ∞ a.s., so that
Σ_{n=1}^{∞} X_n converges a.s., and E(Σ_{n=1}^{∞} X_n) = Σ_{n=1}^{∞} EX_n.
Convergence Concepts
1. X_n → X almost surely (a.s.) if P(lim_{n→∞} X_n = X) = 1.
2. X_n → X in probability if, for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.
3. X_n → X in L^r (r > 0) if lim_{n→∞} E|X_n − X|^r = 0.
4. X_n → X in distribution if lim_{n→∞} F_{X_n}(x) = F_X(x) for all continuity points of F_X(x).
Conceptually, the mode of almost sure convergence keeps track of the values of the random variables
at the same sample point as n increases. It requires the convergence of X_n(ω) at almost all sample
points. The mode of convergence in probability requires that the event on which X_n and X
differ by more than a fixed amount shrinks in probability. This event can be different when n = 10
and when n = 11, and so on. Because of this, convergence in probability does not imply
convergence of X_n(ω) for any ω ∈ Ω. The following classical example is a vivid way to illustrate
this point.
Example. Let Ω = [0, 1], let F be the classical Borel σ-algebra on [0, 1] and let P be the uniform
probability measure. Define X_n = I_{A_n}, where A_1 = [0, 1], A_2 = [0, 1/2], A_3 = [1/2, 1],
A_4 = [0, 1/4], A_5 = [1/4, 1/2], ..., i.e., the A_n cycle through the dyadic subintervals of [0, 1],
with lengths shrinking to 0. It is seen that
      P(|X_n| > 0) ≤ 2^{−m}, where m = ⌊log n / log 2⌋ − 1.
Hence as n → ∞, P(|X_n| > 0) → 0. This implies X_n → 0 in probability. However, every fixed
ω ∈ [0, 1] lies in one interval of each dyadic level, so X_n(ω) = 1 for infinitely many n and X_n(ω)
does not converge to 0 for any ω; in particular, X_n does not converge to 0 a.s.
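Under the indexing of the dyadic intervals used above (one concrete version of this classical example; the exact indexing in the original notes is garbled, so treat it as an assumption), the sketch below illustrates both facts: the interval lengths, and hence P(|X_n| > 0), shrink to 0, while any fixed ω keeps landing in an interval at every dyadic level, so X_n(ω) = 1 infinitely often.

```python
import numpy as np

def interval(n):
    """n-th dyadic interval in the sliding sequence (assumed indexing):
    n = 2^m + k with 0 <= k < 2^m maps to [k/2^m, (k+1)/2^m]."""
    m = int(np.floor(np.log2(n)))
    k = n - 2 ** m
    return k / 2 ** m, (k + 1) / 2 ** m

omega = 0.3                      # a fixed sample point in [0, 1]
hits = []
for n in range(1, 2 ** 12):
    a, b = interval(n)
    if a <= omega <= b:          # X_n(omega) = 1 exactly when omega lies in the n-th interval
        hits.append(n)

a, b = interval(2 ** 12 - 1)
print("length of the last interval (= P(|X_n| > 0)):", b - a)       # tends to 0
print("number of n <= 4095 with X_n(omega) = 1:", len(hits))        # one hit per dyadic level
```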
Theorem 3.4
1. X_n → X a.s. or in L^r (r ≥ 1) ⟹ X_n →_p X ⟹ X_n →_d X.
2. If r > s > 0, then X_n → X in L^r ⟹ X_n → X in L^s.
3. No other implications hold in general.
We also have some partial converses to the above results, i.e., converse results with some
additional assumptions. Here are a few examples.
Theorem 3.7 (Dominated convergence a.s. implies convergence in mean) If X_n →
X a.s., P(|X_n| ≤ Y) = 1 for all n, and EY^r < ∞ for some r > 0, then X_n → X in L^r.
Theorem 3.11 (Vitali's Theorem) Suppose that X_n →_p X, and E|X_n|^r < ∞ for all n (i.e.
X_n ∈ L^r). Then the following three statements are equivalent:
(a) {|X_n|^r} is uniformly integrable;
(b) X_n → X in L^r;
(c) E|X_n|^r → E|X|^r < ∞.
Theorem 3.12 (Continuous mapping theorem) Let X_1, X_2, ... and X be k-dim random
vectors and g : R^k → R be continuous. Then X_n → X a.s., in probability, or in distribution
implies g(X_n) → g(X) in the same mode.
(Slutsky's Theorem) If X_n →_d X and Y_n →_p C for a constant C, then
(a) X_n + Y_n →_d X + C;
(b) X_n Y_n →_d CX;
(c) X_n/Y_n →_d X/C if C ≠ 0.
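A hedged simulation of part (c), with a studentized mean (the distributions, sample sizes and names are illustrative only): X_n is the CLT-standardized sample mean, which converges in distribution to N(0, 1), and Y_n is the sample standard deviation, which converges in probability to 1, so their ratio should again be approximately N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(5)
reps, n = 20_000, 400
samples = rng.exponential(1.0, size=(reps, n))    # i.i.d. Exp(1): mean 1, sd 1

xbar = samples.mean(axis=1)
x_n = np.sqrt(n) * (xbar - 1.0)                   # X_n -> N(0, 1) in distribution (CLT)
y_n = samples.std(axis=1, ddof=1)                 # Y_n = sample sd -> 1 in probability

ratio = x_n / y_n                                 # Slutsky (c): approximately N(0, 1)
print("mean, sd of X_n / Y_n:", ratio.mean().round(3), ratio.std().round(3))
print("P(X_n / Y_n <= 1.645) ~", (ratio <= 1.645).mean().round(3), "(N(0,1) gives 0.95)")
```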
[Diagram: relationships among the modes of convergence. a.s. ⟹ in probability ⟹ in distribution, and L^r ⟹ in probability. Partial converses: in probability implies a.s. along a subsequence (or when the convergence is fast enough); in distribution implies in probability when the limit X = C is a constant; in probability plus uniform integrability gives L^r convergence; Skorokhod's representation theorem connects convergence in distribution with a.s. convergence of equivalent versions.]
Radon-Nikodym theorem
Definition 3.10 Given two measures µ and ν on (Ω, F), we say that ν is absolutely continuous
w.r.t. µ, written as ν ≪ µ (i.e., ν is dominated by µ), if µ(A) = 0 implies ν(A) = 0 for every A ∈ F.
Remark 3.1 If µ is σ-finite, then ν ≪ µ ⟺ ∃ f ≥ 0 : ν(A) = ∫_A f dµ for all A ∈ F.
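On a finite space the Radon-Nikodym derivative is simply the ratio of point masses wherever µ puts positive mass. The following small sketch (an illustration, not part of the notes; the two measures are hand-picked) builds f = dν/dµ for discrete measures with ν ≪ µ and checks ν(A) = ∫_A f dµ on a few sets.

```python
from fractions import Fraction

# Two measures on the finite space {0, 1, 2, 3}, given by their point masses.
omega = [0, 1, 2, 3]
mu = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(0)}
nu = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(1, 2), 3: Fraction(0)}

# nu << mu holds here: the only point with mu = 0 (w = 3) also has nu = 0.
f = {w: (nu[w] / mu[w] if mu[w] > 0 else Fraction(0)) for w in omega}
print("f = dnu/dmu:", f)

# Check nu(A) = sum_{w in A} f(w) mu({w}) for a few sets A.
for A in [{0}, {0, 1}, {1, 2, 3}, set(omega)]:
    lhs = sum(nu[w] for w in A)
    rhs = sum(f[w] * mu[w] for w in A)
    print(sorted(A), lhs == rhs)
```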
Example 3.3 Toss a coin twice. Then we can construct a probability space (Ω, F, P), where
      Ω = {HH, HT, TH, TT},
      F = {all possible subsets of Ω}
        = {∅, {HH}, {HT}, {TH}, {TT}, {HH, HT}, ..., Ω},  |F| = 2⁴ = 16,
      P(HH) = P(HT) = P(TH) = P(TT) = 1/4.
Let X = I(the first toss is a head) and Y = I(the second toss is a head). Then
      {X = 0} = {ω : X(ω) = 0} = {TH, TT} ∈ F,
      {X = 1} = {ω : X(ω) = 1} = {HH, HT} ∈ F.
Now
      σ(X) = σ(X = 0, X = 1) = σ({X = 1}^c, {X = 1}) = σ({X = 1})
           = σ({HH, HT}) = {∅, Ω, {HH, HT}, {TH, TT}},
      σ(Y) = σ(Y = 0, Y = 1) = σ({Y = 1}^c, {Y = 1}) = σ({Y = 1})
           = σ({HH, TH}) = {∅, Ω, {HH, TH}, {HT, TT}}.
Similarly, it can also be shown that σ(X, Y) = F.
σ(X) and σ(Y) contain information about possible outcomes for the first and second tosses,
respectively. Knowing both σ(X) and σ(Y) (i.e., information about both tosses), given by
σ(X, Y), we get F.
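The two σ-algebras above can be enumerated by brute force on the four-point space. The sketch below is illustrative only; the helper sigma() simply takes all unions of the level sets of a r.v., which suffices on a finite space, and it reproduces σ(X) and σ(Y) as listed.

```python
from itertools import chain, combinations

omega = {"HH", "HT", "TH", "TT"}
X = {"HH": 1, "HT": 1, "TH": 0, "TT": 0}   # X = I(first toss is a head)
Y = {"HH": 1, "HT": 0, "TH": 1, "TT": 0}   # Y = I(second toss is a head)

def sigma(rv):
    """sigma(rv): all unions of the level sets {rv = value} (a finite-space shortcut)."""
    levels = [frozenset(w for w in omega if rv[w] == v) for v in set(rv.values())]
    unions = chain.from_iterable(combinations(levels, r) for r in range(len(levels) + 1))
    return {frozenset().union(*combo) if combo else frozenset() for combo in unions}

print(sorted(map(sorted, sigma(X))))   # the events: {}, {TH,TT}, {HH,HT}, Omega
print(sorted(map(sorted, sigma(Y))))   # the events: {}, {HT,TT}, {HH,TH}, Omega
```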
Theorem 3.15 Let X be a discrete r.v. taking distinct values {x_i, 1 ≤ i ≤ n} (where n could
be ∞) and let A_i = {ω : X(ω) = x_i}. Then σ(X) = σ(A_i, 1 ≤ i ≤ n) = {all unions of the A_i's, together with ∅}.
3.7 Exercises
1. Let Var(X|A) = E(X²|A) − (E(X|A))². Show that
      Var(X) = E{Var(X|A)} + Var{E(X|A)}.
(A numerical check of this identity is given at the end of the section.)
Remark 3.2 From the exercise, we get Var(X) ≥ Var(E(X|A)). That is, smoothing
by local averaging (i.e. E(X|A)) reduces variance.
2. Show that if X and Y are r.v.'s with E(Y|A) = X and EX² = EY² < ∞, then X = Y
a.s. (i.e. P(X = Y) = 1). [Hint: Work out E(X − Y)².]
Solution (Exercise 2). Since E(Y|A) = X, X is A-measurable, i.e. X ∈ A. Then
      E(X − Y)² = EX² − 2E(XY) + EY².
By the tower property and "taking out what is known" (property 5),
      E(XY) = E{E(XY|A)} = E{X E(Y|A)} = E(X²) = EX².
Hence
      E(X − Y)² = EX² − 2EX² + EY² = EY² − EX² = 0,
as EX² = EY². Therefore X − Y = 0 a.s., i.e. P(X = Y) = 1.
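As referenced in Exercise 1, here is a hedged numerical check of the identity Var(X) = E{Var(X|A)} + Var{E(X|A)}, using the partition σ-algebra from Example 3.1; it also shows Var{E(X|A)} ≤ Var(X), in line with Remark 3.2. Sample sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
x = rng.uniform(0.0, 1.0, size=400_000)
cell = np.minimum((x * n).astype(int), n - 1)                         # which A_i contains X

cond_mean = np.array([x[cell == k].mean() for k in range(n)])[cell]   # E(X|A)
cond_var = np.array([x[cell == k].var() for k in range(n)])[cell]     # Var(X|A)

print("Var(X)                    :", x.var())
print("E{Var(X|A)} + Var{E(X|A)} :", cond_var.mean() + cond_mean.var())
print("Var{E(X|A)} alone         :", cond_mean.var())   # smaller than Var(X), as in Remark 3.2
```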