Lesson 1. Probability Spaces


ENSC 802 – Stochastic Systems – 2002 1


In this unit, we study probability spaces and review some basic definitions. Most of the
material is presented in the form of examples, since it is largely review.

Probability Spaces

Our goal is to find a probabilistic model of a physical phenomenon and to do this we need to
carefully define
 The sample space, Ω
 The event space, F
 The probability measure, P

The result is a probability space: (Ω, F, P)

Sloppiness here can get us into trouble, since important things can be overlooked.

Example: (Problem 1.6 in Stark)

An experiment consists of measuring the height of the parents of randomly chosen


Engineering students. Find the sample space and describe the event F that the mother is
taller than the father (is this different from the general population?).

Clearly, if the father’s and mother’s heights are given by hM and hW respectively, then

 Ω = { (hM, hW) : hM > 0, hW > 0 }
 F = { (hM, hW) : hM > 0, hW > 0, hM < hW }

Note that we could also put a limit on the heights in the sample space.

Example:

Consider the following example (Leon-Garcia 1994):

The key parameters are:


 Data packets consist of 10 ms chunks of quantized speech and all of the lines are
synchronized.
 There are N users to support, but only M outgoing lines.
 Some of the 10 ms time slots contain silence (no packet generated).
 A is the number of active (speaking) users.
 If A > M at the start of a time slot, then A − M packets are randomly selected and
discarded.

We want to answer questions such as: what fraction of packets is discarded? To do this, we
need to know the probability of there being k active calls.

Let’s define the system as follows (making it appropriate to the experiment).

We can think of this system as a random experiment that is run every 10 ms. We have
 Ω equals all combinations of the 48 callers
o each is either active or inactive
 Fi ∈ F is the set that consists of all elements of Ω such that there are i
active callers
o note that F must be made up of elements from Ω
o F cannot be completely arbitrary
 we need to be able to talk about the event “3 or 4” active callers and make
sense out of it
 each member set of F must be assigned a probability (there are rules, as we shall see)

Note that the definition of the event space is not unique – it may depend on what we are
interested in.

For assigning probabilities (the last step), let

 A(j) be the number of active calls in the jth time slot (assumed independent of time)
o Event Fk ≜ { A = k }
 Nk(n) be the number of slots with k active calls in n runs of the experiment

Intuitively, we might estimate the probabilities as:

Probability mass function: pA(k) = Pr[Fk] = lim_{n→∞} Nk(n) / n

however, we must be careful, since this assumes the existence of the limit! This is known as
the relative frequency approach.

We could also use a-priori data, such as the probability of “silent packets” and then
combinatorics to estimate the probabilities.
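As a quick sketch of the relative-frequency approach, the simulation below estimates pA(k) by counting slots with k active callers. It assumes, purely for illustration, that each of the 48 callers is independently active with probability 0.3 — these model parameters are not from the notes:

```python
import random
from collections import Counter

# Hypothetical model (illustrative assumption, not from the notes):
# each of the N callers is independently active with probability p.
N, p = 48, 0.3
n_slots = 50_000
random.seed(0)

counts = Counter()
for _ in range(n_slots):
    a = sum(random.random() < p for _ in range(N))  # active callers this slot
    counts[a] += 1

# Relative-frequency estimate p_A(k) = N_k(n) / n
p_hat = {k: counts[k] / n_slots for k in sorted(counts)}
mean_est = sum(k * pk for k, pk in p_hat.items())
print(f"estimated E(A) = {mean_est:.2f}  (model mean N*p = {N * p:.2f})")
```

With 50,000 simulated slots the estimate sits close to the model mean, but note the caveat above: the relative-frequency numbers wobble from run to run, especially for the rare values of k.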

The average number of active calls in an interval is just

    Ān = (1/n) Σ_k k Nk(n)

and, as n → ∞, this sample average converges to the expected value:

    lim_{n→∞} Ān = Σ_k k pA(k) = E(A)

The fraction of packets discarded will be, on average:

    E(# of active calls above M) / E(# of active calls)
        = Σ_{k=M+1}^{48} (k − M) pA(k) / Σ_{k=0}^{48} k pA(k)
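If we take the a-priori route mentioned above and model the number of active callers as binomial, the discard fraction can be evaluated directly. The parameter values below (activity probability p = 0.3, M = 20 lines) are illustrative assumptions, not values from the notes:

```python
from math import comb

# Illustrative parameters (assumptions, not from the notes):
# N = 48 callers, each active with probability p, M = 20 outgoing lines.
N, p, M = 48, 0.3, 20

def p_A(k):
    # Binomial pmf: probability of exactly k active callers
    return comb(N, k) * p**k * (1 - p) ** (N - k)

# Numerator: expected number of packets above the M-line capacity
discarded = sum((k - M) * p_A(k) for k in range(M + 1, N + 1))
# Denominator: expected number of packets offered (equals N*p for a binomial)
offered = sum(k * p_A(k) for k in range(N + 1))
print(f"fraction of packets discarded = {discarded / offered:.4e}")
```

Even though the mean load (N·p = 14.4) is well below M = 20, the discard fraction is nonzero — exactly the kind of tail question the pmf is needed for.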

ASK: Can we simply define probability as the relative frequency of the corresponding event?
ANS: No!
 We would get a somewhat different answer with each set of trials (especially for low
prob. events)
 we would also be in trouble if the limit failed to exist!

The Axiomatic Approach

Although probability theory has been in existence for centuries, the axiomatic approach was
not formulated until 1933 by Andrei Kolmogorov.

The axiomatic approach resolved many difficulties with limiting operations that had been
plaguing theorists and is now the accepted foundation of the theory.

ASK: do we care about mathematical niceties?


ANS: Yes!
 Although many problems are esoteric, things like limits of sets and convergence (e.g.,
the central limit theorem) are critical to modern engineering
 We need to understand the language so as to get at the pieces that we do need.

Axioms are statements that are accepted as true from which other more interesting properties
can be proven. As the “hard rock” foundation, there should be as few axioms as possible (so
that very little is assumed).

The Axioms of Probability

The probability law for an experiment is simply a way of assigning a number to each of
the possible events. Given a sample space, Ω, and an event space, F, that is a σ-field
(this is a requirement that we shall define better soon), we require that P satisfy

Axiom 1. P(F) ≥ 0, for all F ∈ F

Axiom 2. P(Ω) = 1 - the “certain event” must happen

Axiom 3. If Fi, i = 1, 2, ..., n are disjoint sets (pairwise empty intersections), then

    P(∪_{i=1}^{n} Fi) = Σ_{i=1}^{n} P(Fi)

Axiom 4. If Fi, i = 1, 2, ... are disjoint sets (pairwise empty intersections), then

    P(∪_{i=1}^{∞} Fi) = Σ_{i=1}^{∞} P(Fi)
note that these are easily converted to intersections via De Morgan’s law:
 (A ∪ B ∪ ⋯)^c = A^c ∩ B^c ∩ ⋯
 in product/sum notation: (A + B + ⋯)^c = A^c B^c ⋯

We need the “infinite” case above to deal with limiting operations.

Note also that the axioms do not tell us how to specify the probabilities; however,
whatever method we choose (relative frequency, deduction), the result must be consistent
with the axioms.

The axioms can also be used to prove many other relationships.

Example

What do we do if the sets specified in Axiom #3 are not disjoint?

    A ∪ B = A ∪ (A^c ∩ B) - a disjoint union
    P(A ∪ B) = P(A) + P(A^c ∩ B)

but, using distributivity, B also splits into two disjoint sets:

    B = (A ∪ A^c) ∩ B = (A ∩ B) ∪ (A^c ∩ B)
    P(B) = P(A ∩ B) + P(A^c ∩ B)

and, combining the two expansions,

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
             ≤ P(A) + P(B)

This is known as the UNION BOUND, which is often used in Communications to put
analytical bounds on the error probability for a given scheme. The bound tends to be tight
for high signal to noise ratios.
 If we are sending one of M equally likely signals in noise, si ∈ {s1, s2, ..., sM}, then
an error occurs if the received signal r is closer to at least one signal sk, k ≠ i.
 Let Fi^j be the event that r is closer to sj than to si when si is transmitted. Then

    Pr[error | si] = Pr[ ∪_{j=1, j≠i}^{M} Fi^j ] ≤ Σ_{j=1, j≠i}^{M} Pr[Fi^j]

The left-hand side (the union) is very hard to formulate directly; the terms on the
right are easy.
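To see the bound in action, here is a small Monte Carlo sketch for an assumed M-ary amplitude constellation in Gaussian noise. The spacing, the noise variance, and the use of the Q-function for the pairwise error probabilities are illustrative choices, not details from the notes:

```python
import math
import random

def Q(x):
    # Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)
    return 0.5 * math.erfc(x / math.sqrt(2))

# Illustrative setup (assumption): M amplitude levels spaced d apart,
# unit-variance Gaussian noise, nearest-neighbour (ML) decisions.
M, d = 4, 2.0
pts = [i * d for i in range(M)]
i = 1                                   # transmit an interior point (worst case)
random.seed(1)
trials = 100_000
errors = 0
for _ in range(trials):
    r = pts[i] + random.gauss(0, 1)     # received signal = point + noise
    if min(range(M), key=lambda j: abs(r - pts[j])) != i:
        errors += 1
sim = errors / trials

# Union bound: sum the pairwise error probabilities Q(|s_j - s_i| / 2)
bound = sum(Q(abs(pts[j] - pts[i]) / 2) for j in range(M) if j != i)
print(f"simulated Pr[error | s_i] = {sim:.3f}, union bound = {bound:.3f}")
```

The bound sits just above the simulated error rate here (roughly 0.34 versus 0.32), and the gap shrinks as the spacing-to-noise ratio grows, which is why the bound is described as tight at high SNR.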

Sigma and Borel Fields

Can we let all subsets of Ω be events? No!! For example, if Ω is the real line, then there are
too many subsets to assign probabilities to.

As mentioned in the axiom statement (and earlier), we cannot simply pick any set of
events and expect to have a workable probability model.
 We must be able to deal with things like unions and intersections of events
o the existence and probabilities of these quantities must be clearly defined
 What is needed is that F be a σ-field - but what is this?

Given a universal set, Ω, a field, M, is a collection of subsets satisfying

1. ∅ ∈ M, Ω ∈ M
2. A, B ∈ M ⇒ A ∪ B ∈ M (De Morgan means intersections work too)
3. A ∈ M ⇒ A^c ∈ M

A field is called a “σ-field” if we add a condition on infinite (but countable) unions:

4. A1, A2, ... ∈ M ⇒ ∪_{n=1}^{∞} An ∈ M

Note that this last point isn’t a problem when there is a finite number of events; however,
complications arise in continuous problems
 for example, we may be interested in defining events as intervals on the real line
o the voltage across a resistor
o as a random variable abstraction
 ASK: why intervals and not points on the real line?
 ANS: zero probability (usually)

To deal with the real-line (ℝ) case, we need to look at σ-fields made up of intervals
(Gray and Davisson 1999):
 Given a set G, we define the σ-field generated by G, σ(G), to be the “smallest”
σ-field containing all the subsets of G. If F is any σ-field that contains G, then it
must also contain σ(G).
 Given the real line, the Borel field, B(ℝ), is the σ-field generated by all the
open intervals of the form (a, b).
o Since B(ℝ) contains all open intervals, it must contain limit sets such as

    (−∞, b) = lim_{n→∞} (−n, b)
    (a, ∞) = lim_{n→∞} (a, n)
    {a} = lim_{n→∞} (a − 1/n, a + 1/n)

 Note that the first two limits can be viewed as taking the unions of bigger
and bigger sets
 We need a σ-field to guarantee that these limits are in the set

o Since B(ℝ) is a σ-field, it must contain differences (which can be formed
via an intersection and a complement):

    (−∞, b] = (−∞, ∞) − (b, ∞)

o Since B(ℝ) is a σ-field, it must contain unions:

    (a, b] = (a, b) ∪ {b}
    [a, b) = (a, b) ∪ {a}

 In sum, B(ℝ) consists of every real-line subset that can be described by
countable combinations of intervals.
 B(ℝ) is, however, not equal to the power set of the real line (the set of all
subsets). Those subsets not in B(ℝ) are not of engineering interest
(Gray and Davisson 1999) and are difficult even to describe.

Independence
Two events A and B are said to be independent if

    P(A ∩ B) = P(AB) = P(A) P(B)

Note that this does not mean A ∩ B = ∅, which would imply that A and B never occur at the
same time (i.e., that they are mutually exclusive).
 For example, we might look at all kittens born in 1999 and consider two events
o a particular colour of fur
o a particular sex
 It is likely that these are not “correlated”
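A quick sketch of this idea: if we generate the two attributes independently, the empirical P(A ∩ B) should match P(A)·P(B). The 0.25 and 0.5 probabilities below are made-up values for the sketch, not data from the notes:

```python
import random

# Illustrative assumption: fur colour and sex are drawn independently,
# with made-up probabilities P(black fur) = 0.25 and P(female) = 0.5.
random.seed(2)
n = 100_000
trials = [(random.random() < 0.25, random.random() < 0.5) for _ in range(n)]

pA = sum(a for a, b in trials) / n          # relative frequency of A (black fur)
pB = sum(b for a, b in trials) / n          # relative frequency of B (female)
pAB = sum(a and b for a, b in trials) / n   # relative frequency of A and B
print(f"P(A)P(B) = {pA * pB:.4f}, P(A and B) = {pAB:.4f}")
```

Note that both events have substantial probability, so A ∩ B = ∅ clearly fails even though the product rule holds — independence and mutual exclusivity are very different things.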

Equality
Two events A and B are
 Equal if they consist of the same elements of W
 Equal with probability one if the set consisting of outcomes that are in A (or B), but
not in A ∩ B, has zero probability

o Isolated points on the real line often have zero probability
o Consider a voltage reading of noise across a resistor:

    A ≜ (a, b)
    B ≜ (a, b]

these events may have exactly the same probability of occurring if we
integrate the pdf of the noise voltage

Product Spaces
Given two probability spaces, (Ω1, F1, P1) and (Ω2, F2, P2), the sample space of the
experiments considered jointly is a 2-dimensional vector, formed as the Cartesian
product (you will need this in the homework):

    Ω1 × Ω2 = {(ω1, ω2) : ω1 ∈ Ω1, ω2 ∈ Ω2}

Unfortunately, the σ-field is trickier, since F1 × F2 may not be a σ-field; however, we
can use the σ-field generated by F1 × F2, σ(F1 × F2), provided that a suitable probability
measure is applied.

In general, we can have k experiments to consider jointly and the product space consists
of k-dimensional vectors, with each dimension being the outcome of an experiment. For
example, if every experiment has the same sample space A, then

    ×_{i=0}^{k−1} A = A^k

Conditional Probability

We need to consider how to approach probability when certain events are known to have
happened already
 For example, what is the probability of our tires failing given that Firestone
manufactured them?
o Does the condition add any information?
 We need to make sure that the resulting probability space is valid.
o the probability of many events (when tires don’t fail) is now zero.

More specifically, suppose that we start with a probability space (Ω, F, P). But then we
are told that event G has already occurred. We then want to compute

    P(F | G) = P(F ∩ G | G)

 The trick is to come up with a good probability assignment!

In defining conditional probability, we make use of our intuition that conditioning on G


will not change the relative probabilities of events contained in G. Thus, we expect that
the following ratios will be equal:

    P(F ∩ G | G) / P(H ∩ G | G) = P(F ∩ G) / P(H ∩ G)

However, if we let H = Ω, then we get

 P(H ∩ G | G) = P(G | G) = 1
 P(H ∩ G) = P(G)

and

    P(F | G) = P(F ∩ G | G) = P(F ∩ G | G) / P(H ∩ G | G)
             = P(F ∩ G) / P(H ∩ G) = P(F ∩ G) / P(G)

Thus, we can write:

    P(F ∩ G) = P(F | G) P(G) = P(G | F) P(F)

and we can generate Bayes’s rule, which also works for pdfs (this final result is often
used as a definition):

    P(F | G) = P(G | F) P(F) / P(G)

Note that for independent events, P(F | G) = P(F) P(G) / P(G) = P(F).
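As a sanity check on the definition, a short simulation comparing the conditional relative frequency with the ratio P(F ∩ G)/P(G). The events F = “die roll is even” and G = “roll > 3” are illustrative choices:

```python
import random

# Illustrative events on a fair die: F = "even", G = "roll > 3".
# Exact answer: P(F|G) = P(F and G)/P(G) = (2/6)/(3/6) = 2/3.
random.seed(3)
rolls = [random.randint(1, 6) for _ in range(120_000)]

n_G = sum(r > 3 for r in rolls)
n_FG = sum(r > 3 and r % 2 == 0 for r in rolls)
cond_freq = n_FG / n_G    # frequency of F among only those trials where G occurred
ratio = (n_FG / len(rolls)) / (n_G / len(rolls))  # Pr(F and G) / Pr(G)
print(f"P(F|G) estimated two ways: {cond_freq:.3f} and {ratio:.3f} (exact 2/3)")
```

The two computations agree identically by construction — restricting attention to the trials where G occurred is exactly the same as dividing the joint relative frequency by that of G, which is the intuition behind the definition.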

Example (matching problem)



N people with identical suitcases check them at the airport and board the same plane. On
arrival, each person “randomly” picks a suitcase. What is the probability that exactly k
people get the correct cases?

First off, we assume that


 “randomly” means that each available suitcase is selected with equal
probability (this is the “worst” case).
 The probability that any given person picks in a specific position is 1/N.

1. Start by finding the prob. that a specific group of k people selects the right cases.
2. Use combinatorics to get the final answer

Notation
 Let N_i be the event of no matches with N − i + 1 people (and suitcases)
 Let G1 be the event that the 1st person gets their suitcase
 Let Ci^j be the event that the case of the ith person is chosen on the jth pick

First Step

Let’s find the probability that any specific group of k people pick their own cases. Start
with the probability that one specific person, A, gets their case. Using total probability
to expand over when A picks:

    Pr[A] = Pr[person A picks correctly]
          = Pr(CA^1 | A picks first) Pr(A picks first)
          + Pr(CA^2 | A picks 2nd) Pr(A picks 2nd)
          + ⋯

but, since the probability of person A picking in any given position is 1/N, we can write

    Pr(A) = (1/N) [ Pr(CA^1 | A picks first) + Pr(CA^2 | A picks 2nd) + ⋯ ]

The first term is Pr(CA^1 | A picks first) = 1/N, and the second is

    Pr(CA^2 | A picks 2nd) = (1 − 1/N) · 1/(N − 1) = ((N − 1)/N) · 1/(N − 1) = 1/N

In fact, all of these terms have the same value, and we end up with Pr(A) = 1/N.
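The result Pr(A) = 1/N is easy to check by simulation; N = 10 and the seed below are arbitrary choices:

```python
import random

# Check by simulation that one specific person gets their own case
# with probability 1/N, regardless of picking order.
random.seed(4)
N, trials = 10, 100_000
hits = 0
for _ in range(trials):
    cases = list(range(N))
    random.shuffle(cases)    # a uniformly random assignment of cases to people
    if cases[0] == 0:        # person 0 got case 0
        hits += 1
print(f"estimated Pr(A) = {hits / trials:.3f}, 1/N = {1 / N:.3f}")
```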

We now find the probability that 2 specific people get their own cases:

    Pr(both A and B) = Pr(B gets their case | A gets their case) Pr(A gets their case)
                     = (1/(N − 1)) · (1/N)

since, if A gets their case, then there are N − 1 cases left and we have the same
problem as with A alone.

Similarly, we have that

    Pr(k specific people get their cases) = (1/N) · (1/(N − 1)) ⋯ (1/(N − k + 1))

and

    Pr(exactly k people get their cases)
    = Pr(k people get their cases AND the rest don’t)
    = Pr(the rest don’t | k get their cases) Pr(k get their cases)
    = Pr(N_{k+1}) · (1/N)(1/(N − 1)) ⋯ (1/(N − k + 1)) = Pr(N_{k+1}) (N − k)!/N!

where Pr(N_{k+1}) is the probability of no matches with N − k people.

Second Step

Now, there are C(N, k) = N!/((N − k)! k!) ways of selecting k people from a group of N.
Thus, the probability of getting exactly k successful matches is just

    Pr(N_{k+1}) · ((N − k)!/N!) · (N!/((N − k)! k!)) = Pr(N_{k+1}) / k!

We now just need to find Pr(N_{k+1}) - this is the tricky part!



Third Step

(in future years we should stop here and ask them to do the rest on their own)

Let’s start with the probability that there are no matches with N people (the same
problem as with N − k, but simpler notation).

G1 and G1^c form a partition, so

    Pr(N_1) = Pr(N_1 ∩ G1) + Pr(N_1 ∩ G1^c)
            = Pr(N_1 | G1) P(G1) + Pr(N_1 | G1^c) P(G1^c)

The first term is zero (no matches with N people, but the first person matched), and
P(G1^c) = 1 − 1/N = (N − 1)/N, so

    Pr(N_1) = Pr(N_1 | G1^c) · (N − 1)/N
    Pr(N_1 | G1^c) = (N/(N − 1)) Pr(N_1)

Let’s focus on this term more closely. We already know that the first person to pick, say A,
has not picked their own case; it is given. There are two mutually exclusive ways in which
the event {N_1 | G1^c} can happen:
 No matches, and the person whose case was chosen first, B, does not select A’s case.
This problem is exactly the same as the one where we have N − 1 people and
nobody gets their case (assign A’s case to B); thus, the probability is just
o Pr(N_2)
 No matches, and B selects A’s case, which can be written using the multiplication
rule as:
o Pr(N − 2 people don’t get their cases | B gets A’s case) Pr(B gets A’s case)
   = Pr(N_3) · 1/(N − 1)

since, if B gets A’s case, then there are only N − 2 cases and people remaining,
none of whom get their cases, and the probability that a specific person, B, out of
N − 1 gets a specific case is just 1/(N − 1) (from before).

Since these two events are mutually exclusive, we have:

    Pr(N_1 | G1^c) = (N/(N − 1)) Pr(N_1) = Pr(N_2) + (1/(N − 1)) Pr(N_3)

    Pr(N_1) = ((N − 1)/N) Pr(N_2) + (1/N) Pr(N_3)

    Pr(N_1) − Pr(N_2) = −(1/N) [ Pr(N_2) − Pr(N_3) ]

Expressing this in words, we have

    Pr(no matches for N people) − Pr(no matches for N − 1 people) =
        −(1/N) [ Pr(no matches for N − 1 people) − Pr(no matches for N − 2 people) ]
This relation must be true for all values of N, so plugging in specific values gives
 no matches with 1 person (impossible) ⇒ Pr(N_N) = 0
 no matches with 2 people (first choice is correct with prob. ½) ⇒ Pr(N_{N−1}) = ½
 for N = 3 we get

    Pr(no matches for 3 people) − Pr(no matches for 2 people) =
        −(1/3) [ Pr(no matches for 2 people) − Pr(no matches for 1 person) ]

Or, in terms of our symbols, and for different values of N:

    N = 2:  Pr[N_{N−1}] − Pr[N_N] = 1/2
    N = 3:  Pr[N_{N−2}] − Pr[N_{N−1}] = −(1/3)(Pr[N_{N−1}] − Pr[N_N]) = −1/3!
    N = 4:  Pr[N_{N−3}] − Pr[N_{N−2}] = −(1/4)(Pr[N_{N−2}] − Pr[N_{N−1}]) = (1/4)(1/3!) = 1/4!
    ⋮
    N = N:  Pr[N_1] − Pr[N_2] = −(1/N)(Pr[N_2] − Pr[N_3]) = (−1)^N/N!

 note that N_{N−4} denotes the same event in different equations (where the values of N
are different), since the N cancels out in the definition of N_i
o both are the prob. that there are no matches for 5 people

Adding up all of these equations (including Pr[N_{N−1}] = 1/2) results in everything on the
LHS canceling except for Pr[N_N] = 0 and Pr[N_1]. We thus have

    Pr[N_1] = 1/2! − 1/3! + ⋯ + (−1)^N/N!

which is the probability that no one gets their case with N people.

We thus have

    Pr(N_{k+1}) = 1/2! − 1/3! + ⋯ + (−1)^{N−k}/(N − k)!

and

    Pr[exactly k matches] = (1/k!) [ 1/2! − 1/3! + ⋯ + (−1)^{N−k}/(N − k)! ]

but, we know that

    e^x = 1 + x + x^2/2! + x^3/3! + ⋯

and we thus have that

    lim_{N→∞} Pr(k matches) = e^{−1}/k!

which just happens to be a Poisson random variable with unity mean! Strange how these
things happen (or is it?). The Poisson distribution is
 also used for the arrivals of customers in queues
 calls in telephone networks
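The whole derivation can be checked numerically: the sketch below compares simulated match counts against the exact formula and the Poisson limit. N = 8, the trial count, and the seed are arbitrary choices:

```python
import math
import random
from collections import Counter

def exact_prob(N, k):
    # Pr(exactly k matches) = (1/k!) * sum_{i=0}^{N-k} (-1)^i / i!
    # (the series 1/2! - 1/3! + ... equals this sum: the i = 0, 1 terms cancel)
    return sum((-1) ** i / math.factorial(i) for i in range(N - k + 1)) / math.factorial(k)

N, trials = 8, 100_000
random.seed(5)
counts = Counter()
for _ in range(trials):
    perm = list(range(N))
    random.shuffle(perm)                           # a random assignment of cases
    counts[sum(p == i for i, p in enumerate(perm))] += 1  # number of matches

for k in range(4):
    print(f"k={k}: simulated {counts[k] / trials:.4f}, "
          f"exact {exact_prob(N, k):.4f}, "
          f"Poisson e^-1/k! = {math.exp(-1) / math.factorial(k):.4f}")
```

Already at N = 8 the exact probabilities are indistinguishable from the Poisson values to four decimal places — the limit converges very quickly.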

Example – Communications (Leon-Garcia 1994)

The “binary symmetric channel” is a common model for a communications system:

    [Diagram: binary symmetric channel. Input “0” (prob. p) maps to output “0” with
    probability 1 − e and to output “1” with probability e; input “1” (prob. 1 − p)
    maps to output “1” with probability 1 − e and to output “0” with probability e.]

What input is most probable given that the receiver spits out a “1”? Note that this is a
very common type of problem. For example, on a CD-ROM, the disk drive needs to
figure out if a “0” or a “1” was written given the amplitude of a reflected laser.

Let Ai and Bi be the respective events that the input and the output is “i”. Now, since the
Ai form a partition,

    P(B1) = P(B1 | A0) P(A0) + P(B1 | A1) P(A1)
          = ep + (1 − e)(1 − p)

Applying Bayes’ rule gives:

    P(A0 | B1) = P(B1 | A0) P(A0) / P(B1) = ep / (ep + (1 − e)(1 − p))
    P(A1 | B1) = P(B1 | A1) P(A1) / P(B1) = (1 − e)(1 − p) / (ep + (1 − e)(1 − p))

As expected, the decision depends upon the probabilities of transmission, and of error.
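Plugging in illustrative numbers (e = 0.1 and p = 0.4 below are made-up values, not from the notes) shows how comparing the posteriors decides the input:

```python
# Binary symmetric channel posteriors for illustrative values of e and p
eps, p = 0.1, 0.4        # crossover probability, P(input "0")

pB1 = eps * p + (1 - eps) * (1 - p)     # total probability of output "1"
pA0_B1 = eps * p / pB1                  # Bayes: P(A0 | B1)
pA1_B1 = (1 - eps) * (1 - p) / pB1      # Bayes: P(A1 | B1)

print(f"P(A0|B1) = {pA0_B1:.3f}, P(A1|B1) = {pA1_B1:.3f}")
print("decide input was:", "1" if pA1_B1 > pA0_B1 else "0")
```

For these values the posterior on “1” dominates, so the receiver decides a “1” was sent; sweeping p toward 1 would eventually flip the decision even though the channel output is unchanged.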

References

Gray, Robert M., and Lee D. Davisson. 1999. An Introduction to Statistical Signal Processing.
Available at www-isl.stanford.edu/~gray/sp.html.
Leon-Garcia, Alberto. 1994. Probability and Random Processes for Electrical Engineering.
Addison-Wesley.
