
Introduction to Probability Theory

K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay

August 5, 2017

LECTURES 5 - 6

Theorem 0.1 If X, Y are random variables on (Ω, F) and c ∈ R, then
X + c, cX, X + Y, X² and XY are random variables with respect to F.

Proof: The proofs of the first two are simple exercises, but do not forget to write
down the solutions.
For a ∈ R,

{X + Y < a} = ∪_{r rational} ({X < r} ∩ {Y < a − r}) ∈ F.

The proof of the above identity uses only the fact that between any two real numbers
there is always a rational. Now

{X + Y ≤ a} = ∩_{n=1}^∞ {X + Y < a + 1/n}.
Hence {X + Y ≤ a} ∈ F for all a ∈ R. Therefore X + Y is a random
variable.

For a ∈ R,

{X² ≤ a} = ∅ ∈ F  if a < 0,
{X² ≤ a} = {X = 0} = X⁻¹({0}) ∈ F  if a = 0,
{X² ≤ a} = {−√a ≤ X ≤ √a} = X⁻¹([−√a, √a]) ∈ F  if a > 0.

(In the above, {0} and [−√a, √a] are closed sets and hence Borel sets.)
Hence, X² is a random variable.
Note that

XY = (1/2)[(X + Y)² − X² − Y²].

Since (1/2)(X + Y)², −(1/2)X², −(1/2)Y² are random variables and XY is their sum,
XY is a random variable.

Theorem 0.2 (i) If X, Y are random variables with respect to F, so are
min{X, Y} and max{X, Y}.
(ii) If {Xn} is a sequence of random variables on (Ω, F) which is bounded
from above¹ and

X(ω) = sup_n Xn(ω), for all ω ∈ Ω.
1
Any set A from R which is bounded from above has a least upperbound. This is the
least upper bound property of R, it is usually taken as an axiom in the definition of real
number system.

Then X is a random variable with respect to F. An analogous statement
is true for the infimum.

Proof:
(i) Set Z = min{X, Y }. For a ∈ R,

{Z ≤ a} = {X ≤ a} ∪ {Y ≤ a} ∈ F .

Therefore Z is a random variable.


The proof that max{X, Y} is a random variable is similar.

The proof of (ii) follows from

{X ≤ a} = ∩_{n=1}^∞ {Xn ≤ a}.

Theorem 0.3 Let {Xn} be a sequence of random variables on (Ω, F) such that

lim_{n→∞} Xn(ω) exists for all ω ∈ Ω.

Then X : Ω → R defined by

X(ω) = lim_{n→∞} Xn(ω), ω ∈ Ω,

is a random variable.

Proof. For a ∈ R,

ω ∈ {X ≤ a} ⇒ for each m ≥ 1, there exists n such that
              Xk(ω) ≤ X(ω) + 1/m ≤ a + 1/m for all k ≥ n
            ⇒ ω ∈ ∪_{n=1}^∞ ∩_{k=n}^∞ {Xk ≤ a + 1/m}, m ≥ 1
            ⇒ ω ∈ ∩_{m=1}^∞ ∪_{n=1}^∞ ∩_{k=n}^∞ {Xk ≤ a + 1/m}.

Hence

{X ≤ a} ⊆ ∩_{m=1}^∞ ∪_{n=1}^∞ ∩_{k=n}^∞ {Xk ≤ a + 1/m}.

Now suppose

ω ∈ ∩_{m=1}^∞ ∪_{n=1}^∞ ∩_{k=n}^∞ {Xk ≤ a + 1/m}.

If ω ∉ {X ≤ a}, then there exist m₀ and n₀ such that

Xk(ω) > a + 1/m₀ for all k ≥ n₀.    (0.1)

(This follows since Xn(ω) → X(ω) as n → ∞.)

Also

ω ∈ ∩_{m=1}^∞ ∪_{n=1}^∞ ∩_{k=n}^∞ {Xk ≤ a + 1/m} ⇒ ω ∈ ∪_{n=1}^∞ ∩_{k=n}^∞ {Xk ≤ a + 1/m₀}
                                                 ⇒ Xk(ω) ≤ a + 1/m₀ for all k ≥ n₁, for some n₁.

This contradicts (0.1). Therefore

ω ∈ {X ≤ a} .

Hence we have

{X ≤ a} = ∩_{m=1}^∞ ∪_{n=1}^∞ ∩_{k=n}^∞ {Xk ≤ a + 1/m}.    (0.2)

Since {Xk ≤ a + 1/m} ∈ F and F is a σ-field, using (0.2) it follows from the
definition of a σ-field that

{X ≤ a} ∈ F.
Therefore X is a random variable.

Definition 0.1 A function f : R → R is said to be a Borel (measurable)
function if f⁻¹(B) ∈ BR for all B ∈ BR.

Example 0.1 If f : R → R is continuous, then it is a Borel function. This
follows from the fact that f⁻¹(O) is open if O is open in R.

Example 0.2 If f : R → R is a monotone function (i.e. either increasing²
or decreasing), then f is a Borel function.
Let f be increasing. For a ∈ R, consider the set Da := {f > a}. Note
that in general Da need not be bounded from below; see for example f(x) = e^x,
x ∈ R, for which {f > 0} = R itself. If Da is not bounded from below, then
Da = (−∞, ∞) ∈ BR.
If Da is bounded from below (and Da ≠ ∅; if Da = ∅ then trivially Da ∈ BR),
set c = inf{x | f(x) > a}. There are two cases: c ∈ Da or c ∉ Da.
When c ∈ Da:

x ≥ c ⇒ f (x) ≥ f (c) > a ⇒ x ∈ Da .


² f is increasing if whenever x < y, then f(x) ≤ f(y).

Hence [c, ∞) ⊆ Da. From the definition of Da and c, it follows that Da ⊆
[c, ∞). Hence Da = [c, ∞) ∈ BR.
When c ∉ Da:

x > c ⇒ there exists y ∈ Da such that c < y < x ⇒ f (x) ≥ f (y) > a ⇒ x ∈ Da .

Hence (c, ∞) ⊆ Da. Also it follows that Da ⊆ (c, ∞). Therefore Da =
(c, ∞) ∈ BR.
Thus we have

{f > a} = R       if Da is not bounded from below,
        = [c, ∞)  if c = inf Da ∈ Da,
        = (c, ∞)  if c = inf Da ∉ Da,

i.e., {f > a} is an interval. Hence {f > a} ∈ BR for all a ∈ R. From this
we can see that {f ≤ a} ∈ BR for all a ∈ R. Therefore f is a Borel function.

Now we have a result which gives us plenty of random variables.

Lemma 0.1 Let f : R → R be a Borel function and let X : Ω → R be a random
variable with respect to F. Then f ◦ X is a random variable with respect to
F.

Proof. For B ∈ BR ,

(f ◦ X)⁻¹(B) = X⁻¹(f⁻¹(B)) ∈ F,

since f⁻¹(B) ∈ BR and X is a random variable. Hence f ◦ X is a random variable.

Example 0.3 The natural logarithm ln : (0, ∞) → R is defined by

ln x = ∫_1^x (1/t) dt.

Note that ln is a strictly increasing and continuous function which is also
a bijective map. We define the exponential function e : R → (0, ∞) as
the inverse of ln. Then both are Borel measurable, and hence ln X (when X
takes values in (0, ∞)) and e^X are random variables if X is a random variable.

Chapter 3 : Conditional Probability and Independence


Key words: Conditional probability, law of total probability, Bayes theorem,
independent events, independent σ-fields, independent random variables,
lim sup and lim inf of events, Borel-Cantelli lemma.

In this chapter, we introduce the concepts of conditional probability and
independence.
"Conditioning" is an important tool for computing probabilities. Condi-
tioning is about writing down the probability of an event in terms of proba-
bilities of the event given that some other events have already occurred. To
use this method, one needs to understand these 'new' probabilities.
Let us take an example to illustrate this. Suppose person I is inside a room,
throws a die and secretly tells his friend II that an even number turned
up. Now if a third person asks person II how probable the event {1, 2, 3}
was, person II will rule out the outcomes 1 and 3 and rescale using the
even numbers to arrive at the new probability 1/3, which is different from 1/2.
That is, information changes the probabilities; the updated probabilities are
called conditional probabilities. Formalizing the above 'slicing' and 'scaling'
leads to the following definition of conditional probability.

Definition 3.1 Let (Ω, F, P) be a probability space and A ∈ F be such
that P(A) > 0. The conditional probability of an event B ∈ F given A,
denoted by P(B|A), is defined as

P(B|A) = P(AB)/P(A).

Define PA on F as follows:

PA(B) = P(B|A), B ∈ F.

Then (Ω, F, PA ) is a probability space satisfying

PA (A) = 1, PA (B) = 0 if B ⊆ Ac .

(exercise)
Remark 0.1 The above probability space may look useless, but it is useful
for processing thought experiments of the following type. Note that
(Ω, F, PA) corresponds to a random experiment which comes from adding the
information that the event A has occurred. Hence understanding this random
experiment directly gives the probabilities PA(B) without using the formula
for PA(B). We will see an illustration of this in a moment.

Example 0.4 Let Ω = {1, 2, 3, 4, 5, 6}, F = P(Ω) and P({i}) = 1/6
for all i. Suppose we got the information that an 'even number' has occurred;
then what are the new 'probabilities', i.e., the conditional probabilities of the
events?
Set A = {2, 4, 6}. We need to compute PA. The values are given by

PA({i}) = 1/3 for i = 2, 4, 6, and PA({i}) = 0 for i = 1, 3, 5.
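
To see the 'slicing' and 'scaling' concretely, here is a minimal Python sketch (an
illustration added here, not part of the notes; the function name conditional_pmf is
just a convenient choice) that computes PA on this finite sample space.

    from fractions import Fraction

    def conditional_pmf(pmf, A):
        # Slice the pmf to the conditioning event A and rescale by P(A).
        p_A = sum(pmf[w] for w in A)
        if p_A == 0:
            raise ValueError("conditioning event must have positive probability")
        return {w: (pmf[w] / p_A if w in A else Fraction(0)) for w in pmf}

    # Fair die: P({i}) = 1/6 for i = 1, ..., 6
    pmf = {i: Fraction(1, 6) for i in range(1, 7)}
    A = {2, 4, 6}                              # information: an even number occurred
    P_A = conditional_pmf(pmf, A)
    print(P_A)                                 # 1/3 on 2, 4, 6 and 0 on 1, 3, 5
    print(sum(P_A[w] for w in {1, 2, 3}))      # P({1,2,3} | A) = 1/3, as in the text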
Example 0.5 (Bridge hand) A pack of 52 cards is distributed among four
players; this is called a bridge hand. Find the probability of a balanced bridge
hand of aces, i.e. each player gets an ace.
Define the following events: A♣ denotes the event that a player gets the
ace of clubs; A♣,♦, the event that two distinct players get the ace of clubs and
the ace of diamonds; A♣,♦,♥, the event that three distinct players get the aces of
clubs, diamonds and hearts; and A♣,♦,♥,♠, the event that all four players get an ace.
Then

A♣,♦,♥,♠ ⊂ A♣,♦,♥ ⊂ A♣,♦ ⊂ A♣ .


and hence, using A♣,♦,♥,♠ = A♣,♦,♥,♠ ∩ A♣,♦,♥ ∩ A♣,♦ ∩ A♣, we get (exercise)

P (A♣,♦,♥,♠ ) = P (A♣,♦,♥,♠ |A♣,♦,♥ )P (A♣,♦,♥ |A♣,♦ )P (A♣,♦ |A♣ )P (A♣ ).

Now P(A♣) = 1. To compute³ P(A♣,♦|A♣), observe that given A♣, distributing
the diamond ace to one of the remaining three players is equivalent to
placing the ace of diamonds in one of 39 positions out of a total of 51 positions.
Hence P(A♣,♦|A♣) = 39/51. Similarly

P(A♣,♦,♥|A♣,♦) = 26/50,   P(A♣,♦,♥,♠|A♣,♦,♥) = 13/49.

Hence

P(A♣,♦,♥,♠) = (39 × 26 × 13)/(51 × 50 × 49) ≈ 0.105.
³ To compute the conditional probabilities, we will not use the definition of conditional
probability; instead we understand an underlying random experiment which gives rise
to the probability space of conditional probabilities, as told in Remark 0.1. Note that,
given the information that the club ace has been distributed to one player, we can think of
the random experiment as an urn problem with 51 urns numbered 1 to 51 and a ball
(identified with the diamond ace), and our event is distributing the ball into one of the first
39 urns.
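
As a numerical sanity check, the following Python sketch (again an added illustration,
with hypothetical helper names) evaluates the product of conditional probabilities exactly
and also estimates the same probability by dealing random hands.

    import random
    from fractions import Fraction

    # Exact value from the chain of conditional probabilities above
    exact = Fraction(39, 51) * Fraction(26, 50) * Fraction(13, 49)
    print(exact, float(exact))                 # ≈ 0.105

    def balanced_aces_sim(trials=100_000, seed=0):
        # Estimate P(each player gets exactly one ace) by dealing random hands.
        rng = random.Random(seed)
        deck = list(range(52))
        aces = {0, 13, 26, 39}                 # the card codes we take to be the four aces
        hits = 0
        for _ in range(trials):
            rng.shuffle(deck)
            # player p receives deck[13*p : 13*(p+1)]
            counts = [sum(1 for c in deck[13 * p:13 * (p + 1)] if c in aces)
                      for p in range(4)]
            hits += all(c == 1 for c in counts)
        return hits / trials

    print(balanced_aces_sim())                 # should be close to 0.105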

The above example illustrates that it is sometimes easier to compute (or more
natural to specify) conditional probabilities and use them to determine the
underlying probabilities, a reverse procedure!

Example 0.6 (Probability in the game show "Let's make a deal") Here we
look at a version of the game show "Let's make a deal", which made its debut
on the NBC television network on December 30, 1963. The description of the game
is the following. A prize is placed behind one of three doors, and all the doors are
closed. The contestants of the show are aware of this. A contestant is asked to
select a door (but it is not opened at that moment). Once the choice is
made, the moderator of the show (Monty Hall) opens one of the remaining
doors and displays what is behind it (he will only open a door which has no prize).
The contestant is now given a chance to change the earlier choice. The question is:
should the contestant stay with the earlier choice or not?
Without any loss of generality, assume that the contestant chose door
no. 1 (label the chosen door as no. 1).
Here take the sample space as

Ω = {♦ab, ♦ba, a♦b, b♦a, ab♦, ba♦}.

(Here ♦ab denotes the prize ♦ behind the first door and the 'worthless' objects a and
b behind doors 2 and 3 respectively. Other sample points are similarly
interpreted.)
The question can be answered if we know the probability of '♦ behind door
1' given the additional information of the object behind one of the doors 2
or 3.
Let us denote the event '♦ behind door 1' by '♦ ∈ 1'. Also, Monty Hall
revealing the object behind door 2 means that the object behind door 2 is either a or b.
Hence the occurrence of the event A, i.e. revealing door 2, means the occurrence of
{♦ab, ♦ba, ab♦, ba♦}, i.e.

A = {♦ab, ♦ba, ab♦, ba♦}.

Similarly the event of 'revealing door 3' is given by the occurrence of

B := {♦ab, ♦ba, a♦b, b♦a}.

Hence we need to find P(♦ ∈ 1|A ∪ B). Since A ∪ B = Ω, we get

P(♦ ∈ 1|A ∪ B) = P(♦ ∈ 1) = 1/3.

Hence it is better to change the option, since you have a 2/3 chance of
winning the prize by switching doors.
WARNING! Sometimes one makes the mistake of calculating

P(♦ ∈ 1|A) = P(♦ ∈ 1|B) = 1/2

and gives the 'wrong' advice.
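
A quick way to convince yourself of the 2/3 advantage of switching is simulation. The
Python sketch below (an addition for illustration; the rules are encoded as described
above, and the function names are hypothetical) plays the game many times with and
without switching.

    import random

    def play(switch, rng):
        doors = [0, 1, 2]
        prize = rng.choice(doors)
        choice = 0                             # contestant picks door no. 1 (index 0)
        # Monty opens a door that is neither the chosen one nor the prize door
        opened = rng.choice([d for d in doors if d != choice and d != prize])
        if switch:
            choice = next(d for d in doors if d != choice and d != opened)
        return choice == prize

    def estimate(switch, trials=100_000, seed=0):
        rng = random.Random(seed)
        return sum(play(switch, rng) for _ in range(trials)) / trials

    print("stay  :", estimate(switch=False))   # ≈ 1/3
    print("switch:", estimate(switch=True))    # ≈ 2/3

Running it prints roughly 0.33 for staying and 0.67 for switching, matching the
analysis above.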

We have already seen 'finite' partitions of the sample space. A collection
{A1, A2, . . . , AN} of events is said to be a partition of Ω if

(i) the Ai's are pairwise disjoint,

(ii) ∪_{i=1}^N Ai = Ω.

Here N may be ∞. If N < ∞, the partition is said to be a finite partition,
and if N = ∞, it is called a countable partition.

Theorem 0.4 (Law of total probability, discrete form) Let (Ω, F, P) be a
probability space and {A1, A2, . . . , AN} ⊆ F be a finite or countable partition of Ω
such that P(Ai) > 0 for all i. Then for B ∈ F,

P(B) = Σ_{i=1}^N P(B|Ai) P(Ai).

Proof.

Σ_{i=1}^N P(B|Ai) P(Ai) = Σ_{i=1}^N [P(BAi)/P(Ai)] P(Ai)
                        = Σ_{i=1}^N P(BAi)
                        = P(B(∪_{i=1}^N Ai)) = P(B).

The second-to-last equality uses the countable additivity (for N = ∞) of prob-
ability to get convergence of the series.
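
As an illustration (not part of the notes), the short Python sketch below checks the law
of total probability on the die example, with B = {1, 2, 3} and the partition {even, odd}.

    from fractions import Fraction

    P = {w: Fraction(1, 6) for w in range(1, 7)}    # fair die

    def prob(E):
        return sum(P[w] for w in E)

    def cond(B, A):
        return prob(B & A) / prob(A)                # P(B|A) = P(BA)/P(A)

    B = {1, 2, 3}
    partition = [{2, 4, 6}, {1, 3, 5}]              # A1 = even, A2 = odd

    lhs = prob(B)
    rhs = sum(cond(B, A) * prob(A) for A in partition)
    print(lhs, rhs, lhs == rhs)                     # 1/2 1/2 True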

Theorem 0.5 (Bayes Theorem) Let (Ω, F, P) be a probability space and
A, B ∈ F be such that P(A), P(B) > 0 and P(A) < 1. Then

P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Ac)P(Ac)].

Proof.

P(A|B) = P(BA)/P(B) = [P(BA)/P(A)] · [P(A)/P(B)] = P(B|A)P(A)/P(B).

Now use the law of total probability with the partition {A, Ac} to complete the proof.
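
For a concrete use of the theorem, here is a brief Python sketch added for illustration;
the numbers are hypothetical, chosen only to exercise the formula, with A the event of
interest and B the observed evidence.

    from fractions import Fraction

    # Hypothetical inputs (not from the notes)
    P_A = Fraction(1, 100)            # P(A)
    P_B_given_A = Fraction(95, 100)   # P(B|A)
    P_B_given_Ac = Fraction(5, 100)   # P(B|A^c)

    P_Ac = 1 - P_A
    # Bayes theorem with the law of total probability in the denominator
    P_A_given_B = (P_B_given_A * P_A) / (P_B_given_A * P_A + P_B_given_Ac * P_Ac)
    print(P_A_given_B, float(P_A_given_B))   # 19/118 ≈ 0.161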

Remark 0.2 The law of total probability is another formula that makes use of a
conditioning argument.
