Lesson 1. Probability Spaces
In this unit, we study probability spaces and review some basic definitions. Most of the material is presented in the form of examples, since it is largely review.
Probability Spaces
Our goal is to find a probabilistic model of a physical phenomenon, and to do this we need to carefully define:
The sample space, $\Omega$
The event space, $F$
The probability measure, $P$
Sloppiness here can get us into trouble, since important things can be overlooked.
Note that we could also put a limit on the heights in the sample space.
ENSC 802 – Stochastic Systems – 2002
Example:
We want to answer questions such as: what fraction of packets is discarded? To do this, we need to know the probability of there being k active calls.
We can think of this system as a random experiment that is run every 10 ms. We have
$\Omega$ = all combinations of 48 callers
o each caller is either active or inactive
$F_i \in F$ is the set consisting of all elements of $\Omega$ for which there are i active callers
o note that events in $F$ must be made up of elements of $\Omega$
o $F$ cannot be completely arbitrary
o we need to be able to talk about the event "3 or 4 active callers" and make sense of it
each member set of $F$ must be assigned a probability (there are rules, as we shall see)
Note that the definition of the event space is not unique – it may depend on what we are interested in.
Probability mass function:

$p_A(k) = \Pr[F_k] = \lim_{n\to\infty} \frac{N_k(n)}{n}$

However, we must be careful, since this assumes the existence of the limit! This is known as the relative frequency approach.
We could also use a priori data, such as the probability of "silent packets", and then combinatorics to estimate the probabilities.
$A_n = \frac{1}{n}\sum_k k\,N_k(n)$

Expected value:

$\lim_{n\to\infty} A_n = \sum_k k\,p_A(k) = E(A)$
$\frac{E(\#\text{ of active calls above }M)}{E(\#\text{ of active calls})} = \frac{\sum_{k=M+1}^{48}(k-M)\,p_A(k)}{\sum_{k=0}^{48} k\,p_A(k)}$
ASK: Can we simply define probability as the relative frequency of the corresponding event?
ANS: No!
We would get a somewhat different answer with each set of trials (especially for low-prob. events)
We would also be in trouble if the limit failed to exist!
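The instability of relative-frequency estimates is easy to see numerically. The sketch below (with an assumed event probability of 0.001, chosen only for illustration) estimates the same low-probability event over several independent sets of trials; each set gives a somewhat different answer.

```python
import random

def relative_frequency(p, n, seed):
    """Estimate Pr(event) as (# occurrences) / (# trials)."""
    rng = random.Random(seed)
    count = sum(1 for _ in range(n) if rng.random() < p)
    return count / n

# A low-probability event: each set of 10,000 trials
# gives a somewhat different relative frequency.
estimates = [relative_frequency(0.001, 10_000, seed) for seed in range(5)]
print(estimates)
```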
Although probability theory has been in existence for centuries, the axiomatic approach was
not formulated until 1933 by Andrei Kolmogorov.
The axiomatic approach resolved many difficulties with limiting operations that had been
plaguing theorists and is now the accepted foundation of the theory.
Axioms are statements that are accepted as true from which other more interesting properties
can be proven. As the “hard rock” foundation, there should be as few axioms as possible (so
that very little is assumed).
The probability law for an experiment is simply a way of assigning a number to each of the possible events. Given a sample space $\Omega$ and an event space $F$ that is a $\sigma$-field (this is a requirement that we shall define better soon), we require that $P$ satisfy:
1. $P(A) \ge 0$ for every $A \in F$
2. $P(\Omega) = 1$
3. If $A_1, A_2, \ldots \in F$ are disjoint, then $P\big(\bigcup_i A_i\big) = \sum_i P(A_i)$
Note also that the axioms do not tell us how to specify the probabilities; however, whatever method we choose (relative frequency, deduction), the result must be consistent with the axioms.
Example
$A \cup B = A \cup (A^c \cap B)$ – disjoint
$P(A \cup B) = P(A) + P(A^c \cap B)$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$\le P(A) + P(B)$
This is known as the UNION BOUND, which is often used in communications to put analytical bounds on the error probability of a given scheme. The bound tends to be tight at high signal-to-noise ratios.
If we are sending one of M equally likely signals in noise, $s_i \in \{s_1, s_2, \ldots, s_M\}$, then an error occurs if the received signal r is closer to at least one signal $s_k$, $k \ne i$.
Let $F_i^k$ be the event that r is closer to $s_k$ than $s_i$ when $s_i$ is transmitted.

$\Pr[\text{error} \mid s_i] = \Pr\Big[\bigcup_{\substack{j=1 \\ j \ne i}}^{M} F_i^j\Big] \le \sum_{\substack{j=1 \\ j \ne i}}^{M} \Pr\big[F_i^j\big]$
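The union bound (and the inclusion–exclusion identity it comes from) can be checked exactly on a toy finite space. The fair-die events below are an illustrative assumption, not an example from the notes.

```python
from fractions import Fraction

# Fair die: six equally likely outcomes.
omega = {1, 2, 3, 4, 5, 6}
P = lambda event: Fraction(len(event), len(omega))

A = {1, 2, 3}
B = {3, 4}

print(P(A | B))     # exact probability of the union
print(P(A) + P(B))  # union bound: loose because A and B overlap
assert P(A | B) == P(A) + P(B) - P(A & B)  # inclusion-exclusion
assert P(A | B) <= P(A) + P(B)             # union bound
```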
Can we let all subsets of $\Omega$ be events? No!! For example, if $\Omega$ is the real line, then there are too many subsets to assign probabilities to.
As mentioned in the axiom statement (and earlier), we cannot simply pick any set of events and expect to have a workable probability model.
We must be able to deal with things like unions and intersections of events
o The existence and probabilities of these quantities must be clearly defined
What is needed is that $F$ be a $\sigma$-field – but what is this?
Note that this last point isn’t a problem when there is a finite number of events; however,
complications arise in continuous problems
for example, we may be interested in defining events as intervals on the real line
the voltage across a resistor
as a random variable abstraction
To deal with the real-line ($\mathbb{R}$) case, we need to look at $\sigma$-fields made up of intervals (Gray and Davisson 1999):
Given a set G, we define the $\sigma$-field generated by G, $\sigma(G)$, to be the "smallest" $\sigma$-field containing all the subsets of G. If $F$ is any $\sigma$-field that contains G, then it must also contain $\sigma(G)$.
Given the real line, the Borel field, $B(\mathbb{R})$, is the $\sigma$-field generated by all the open intervals of the form $(a, b)$.
o Since $B(\mathbb{R})$ contains all open intervals, it must contain limit sets such as
$(-\infty, b) = \lim_{n\to\infty} (-n, b)$
$\{a\} = \lim_{n\to\infty} \big(a - \tfrac{1}{n},\, a + \tfrac{1}{n}\big)$
Note that these limits can be viewed as taking unions of bigger and bigger sets (or, for the singleton, intersections of smaller and smaller ones)
We need a $\sigma$-field to guarantee that these limits are in the set
$(-\infty, b] = (-\infty, \infty) - (b, \infty)$
$(a, b] = (a, b) \cup \{b\}$
$[a, b) = (a, b) \cup \{a\}$
Independence
Two events A and B are said to be independent if
$P(A \cap B) = P(AB) = P(A)P(B)$
Note that this does not mean $A \cap B = \varnothing$, which would imply that A and B never occur at the same time (i.e., that they are mutually exclusive).
For example, we might look at all kittens born in 1999 and have two events
o Colour of fur
o Sex
It is likely that these are not “correlated”
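The distinction between independent and mutually exclusive events can be verified exactly on a small finite space. The die events below are illustrative assumptions, not examples from the notes.

```python
from fractions import Fraction

# Fair die, uniform probability measure.
omega = {1, 2, 3, 4, 5, 6}
P = lambda event: Fraction(len(event), len(omega))

A = {2, 4, 6}     # "even"
B = {1, 2, 3, 4}  # "at most four"
# Independent: the product rule holds.
assert P(A & B) == P(A) * P(B)

C, D = {1}, {2}   # mutually exclusive, but NOT independent
assert C & D == set()
assert P(C & D) != P(C) * P(D)
```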
Equality
Two events A and B are
Equal if they consist of the same elements of $\Omega$
Equal with probability one if the set consisting of outcomes that are in A (or B) but not in $AB$ ($A \cap B$) has zero probability
Product Spaces
Given two probability spaces, $(\Omega_1, F_1, P_1)$ and $(\Omega_2, F_2, P_2)$, the sample space of the experiments considered jointly consists of 2-dimensional vectors, formed as the Cartesian product:

$\Omega_1 \times \Omega_2 = \{(\omega_1, \omega_2) : \omega_1 \in \Omega_1,\ \omega_2 \in \Omega_2\}$  (you will need this in the homework)

Unfortunately, the $\sigma$-field is trickier, since $F_1 \times F_2$ may not be a $\sigma$-field; however, we can use the $\sigma$-field generated by $F_1 \times F_2$, $\sigma(F_1 \times F_2)$, provided that a suitable probability measure is applied.
In general, we can have k experiments to consider jointly and the product space consists
of k-dimensional vectors, with each dimension being the outcome of an experiment.
$\times_{i=0}^{k-1} A_i = A^k$  (when each $A_i = A$)
Conditional Probability
We need to consider how to approach probability when certain events are known to have
happened already
For example, what is the probability of our tires failing given that Firestone
manufactured them?
o Does the condition add any information?
We need to make sure that the resulting probability space is valid.
o the probability of many events (when tires don’t fail) is now zero.
$P(F \mid G) = P(F \cap G \mid G)$
The trick is to come up with a good probability assignment! We require that the new measure preserve the relative probabilities of events within G,

$\frac{P(F \cap G \mid G)}{P(H \cap G \mid G)} = \frac{P(F \cap G)}{P(H \cap G)}$

and that $P(G \mid G) = 1$. Together these give

$P(F \mid G) = P(F \cap G \mid G) = \frac{P(F \cap G)}{P(G)} = \frac{P(FG)}{P(G)}$
Thus, we can write:

$P(FG) = P(F \mid G)P(G) = P(G \mid F)P(F)$

and we can generate Bayes's rule, which also works for pdfs (this final result is often used as a definition):

$P(F \mid G) = \frac{P(G \mid F)P(F)}{P(G)}$

Note that for independent events, $P(F \mid G) = \frac{P(F)P(G)}{P(G)} = P(F)$.
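The identities $P(FG) = P(F \mid G)P(G) = P(G \mid F)P(F)$ and Bayes's rule can be verified exactly on a small finite space. The twelve-outcome space and the events F and G below are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical space: twelve equally likely outcomes.
omega = set(range(1, 13))
P = lambda event: Fraction(len(event), len(omega))

def cond(F, G):
    """P(F | G) = P(F & G) / P(G)."""
    return P(F & G) / P(G)

F = {2, 4, 6, 8, 10, 12}  # "even"
G = {1, 2, 3, 4, 5, 6}    # "at most six"

# Both factorizations give P(FG):
assert cond(F, G) * P(G) == cond(G, F) * P(F) == P(F & G)
# Bayes's rule recovers P(F | G) from the reversed conditional:
assert cond(F, G) == cond(G, F) * P(F) / P(G)
```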
Example
N people with identical suitcases check them at the airport and board the same plane. On arrival, each person "randomly" picks a suitcase. What is the probability that exactly k people get the correct cases?
1. Start by finding the prob. that a specific group of k people selects the right cases.
2. Use combinatorics to get the final answer
Notation
Let $N_i$ be the event of no matches with $N - i + 1$ people (and suitcases)
Let $G_1$ be the event that the 1st person gets their suitcase
Let $C_i^j$ be the event that the case of the ith person is chosen on the jth pick
First Step
Let's find the probability that any specific group of k people pick their own cases. Start with the prob. that 1 specific person, A, gets their case. Expand using the total probability theorem:

$\Pr[A] = \Pr[\text{person A picks correctly}]$
$= \Pr(C_A^1 \mid \text{A picks first})\Pr(\text{A picks first}) + \Pr(C_A^2 \mid \text{A picks 2nd})\Pr(\text{A picks 2nd}) + \cdots$

but, since the probability of person A picking in any given position is 1/N, we can write

$\Pr(A) = \frac{1}{N}\big[\Pr(C_A^1 \mid \text{A picks first}) + \Pr(C_A^2 \mid \text{A picks 2nd}) + \cdots\big]$

The first term is $\Pr(C_A^1 \mid \text{A picks first}) = \frac{1}{N}$, and the second is

$\Pr(C_A^2 \mid \text{A picks 2nd}) = \Big(1 - \frac{1}{N}\Big)\frac{1}{N-1} = \frac{1}{N}$

In fact, all of these terms have the same value and we end up with $\Pr(A) = \frac{1}{N}$.
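A quick Monte Carlo check of $\Pr(A) = 1/N$: the sketch below (N = 8 and the trial count are arbitrary illustrative choices) shuffles the suitcases repeatedly and counts how often one fixed person draws their own.

```python
import random

def prob_A_gets_own_case(N, trials=200_000, seed=1):
    """Monte Carlo estimate of the probability that one specific
    person (index 0) draws their own suitcase from a random permutation."""
    rng = random.Random(seed)
    cases = list(range(N))
    hits = 0
    for _ in range(trials):
        rng.shuffle(cases)
        if cases[0] == 0:  # person 0 got case 0
            hits += 1
    return hits / trials

est = prob_A_gets_own_case(8)
print(est)  # should be close to 1/8
```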
We now find the probability that 2 specific people get their own cases:
$\Pr(\text{both A and B}) = \Pr(\text{B gets their case} \mid \text{A gets their case})\Pr(\text{A gets their case})$

But if A gets their case, then there are N−1 cases left and we have the same problem as with A alone:

$= \frac{1}{N-1}\cdot\frac{1}{N}$

$\Pr(k \text{ specific people get their cases}) = \frac{1}{N}\cdot\frac{1}{N-1}\cdots\frac{1}{N-k+1}$
Second Step
Now, there are $\binom{N}{k} = \frac{N!}{(N-k)!\,k!}$ ways of selecting k people from a group of N. Thus, the probability of getting exactly k successful matches is just

$\Pr(N_{k+1})\cdot\frac{(N-k)!}{N!}\cdot\frac{N!}{(N-k)!\,k!} = \frac{\Pr(N_{k+1})}{k!}$
Third Step
(in future years we should stop here and ask them to do the rest on their own)
Let's start by considering the probability that there are no matches with N people (the same problem as with $N - k$ people, but simpler notation)
$\Pr(N_1) = \Pr(N_1 \mid \bar G_1)\Pr(\bar G_1) + \underbrace{\Pr(N_1 \mid G_1)}_{=0}\Pr(G_1)$

("no matches with N people, but the first person matched" is impossible), and $\Pr(\bar G_1) = 1 - \frac{1}{N} = \frac{N-1}{N}$, so

$\Pr(N_1) = \Pr(N_1 \mid \bar G_1)\,\frac{N-1}{N}$
$\Pr(N_1 \mid \bar G_1) = \frac{N}{N-1}\Pr(N_1)$
We already know that the first person to pick, say A, has not picked their own case – it is given.
There are two mutually exclusive ways in which the event $\{N_1 \mid \bar G_1\}$ can happen:
No matches, and the person whose case was chosen first, B, does not select A's case. This problem is exactly the same as the one where we have $N - 1$ people and nobody gets their case (assign A's case to B); thus, the probability is just
o $\Pr(N_2)$
No matches, and B selects A's case, which can be written using the chain rule as:
o $\Pr(N-2 \text{ people don't get their cases} \mid \text{B gets A's case})\Pr(\text{B gets A's case}) = \frac{1}{N-1}P(N_3)$
since if B gets A's case, then there are only $N - 2$ cases and people remaining, none of whom get their cases, and the probability that a specific person, B, out of $N - 1$ gets a specific case is just $\frac{1}{N-1}$ (from before).
Combining, $\Pr(N_1 \mid \bar G_1) = \Pr(N_2) + \frac{1}{N-1}\Pr(N_3)$, so $\Pr(N_1) = \frac{N-1}{N}\Pr(N_2) + \frac{1}{N}\Pr(N_3)$; applying this for successive numbers of people and rearranging gives the difference equations below.
$N = 2:\ \Pr[N_{N-1}] - \Pr[N_N] = \frac{1}{2}$
$N = 3:\ \Pr[N_{N-2}] - \Pr[N_{N-1}] = -\frac{1}{3}\big(\Pr[N_{N-1}] - \Pr[N_N]\big) = -\frac{1}{3!}$
$N = 4:\ \Pr[N_{N-3}] - \Pr[N_{N-2}] = -\frac{1}{4}\big(\Pr[N_{N-2}] - \Pr[N_{N-1}]\big) = \frac{1}{4}\cdot\frac{1}{3!} = \frac{1}{4!}$
$\vdots$
$N = N:\ \Pr[N_1] - \Pr[N_2] = -\frac{1}{N}\big(\Pr[N_2] - \Pr[N_3]\big) = \frac{(-1)^N}{N!}$

Adding up all of these equations (including $\Pr[N_{N-1}] = \frac{1}{2}$) results in everything on the LHS canceling except for $\Pr[N_N] = 0$ and $\Pr[N_1]$. We thus have

$\Pr[N_1] = \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^N}{N!}$  (prob that no one gets their case, with N people)
We thus have

$\Pr(N_{k+1}) = \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{N-k}}{(N-k)!}$

and

$\Pr[\text{exactly } k \text{ matches}] = \frac{1}{k!}\left[\frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{N-k}}{(N-k)!}\right]$

Recalling the series

$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$

and setting $x = -1$ (the first two terms cancel), we thus have that

$\lim_{N\to\infty} \Pr(k \text{ matches}) = \frac{e^{-1}}{k!}$
which just happens to be a Poisson random variable with unit mean! Strange how these things happen (or is it?)
also used for the arrivals of customers in queues
calls in telephone networks
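The Poisson limit can be checked by simulation. In the sketch below, N = 10 and the trial count are arbitrary illustrative choices; the estimated match distribution is compared against $e^{-1}/k!$.

```python
import math
import random

def match_distribution(N, trials=100_000, seed=2):
    """Estimate Pr(exactly k people get their own suitcase)
    by shuffling N suitcases many times."""
    rng = random.Random(seed)
    counts = [0] * (N + 1)
    cases = list(range(N))
    for _ in range(trials):
        rng.shuffle(cases)
        k = sum(1 for person, case in enumerate(cases) if person == case)
        counts[k] += 1
    return [c / trials for c in counts]

pmf = match_distribution(10)
for k in range(4):
    poisson = math.exp(-1) / math.factorial(k)
    print(k, round(pmf[k], 3), round(poisson, 3))
```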
[Figure: binary symmetric channel – inputs "0" and "1" with probabilities p and 1 − p; each bit is received correctly with probability 1 − e and flipped with crossover probability e.]
What input is most probable given that the receiver spits out a “1”? Note that this is a
very common type of problem. For example, on a CD-ROM, the disk drive needs to
figure out if a “0” or a “1” was written given the amplitude of a reflected laser.
Let $A_i$ and $B_i$ be the respective events that the input and output is "i". Now,

$P(B_1) = P(B_1 \mid A_0)P(A_0) + P(B_1 \mid A_1)P(A_1)$  (the $A_i$ form a partition)
$= ep + (1-e)(1-p)$

$P(A_0 \mid B_1) = \frac{P(B_1 \mid A_0)P(A_0)}{P(B_1)} = \frac{ep}{ep + (1-e)(1-p)}$

$P(A_1 \mid B_1) = \frac{P(B_1 \mid A_1)P(A_1)}{P(B_1)} = \frac{(1-e)(1-p)}{ep + (1-e)(1-p)}$
As expected, the decision depends upon the probabilities of transmission, and of error.
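The posterior calculation above is easy to package as a small function; the parameter values below (p = 0.5, e = 0.1) are illustrative assumptions.

```python
def posteriors(p, e):
    """Posterior input probabilities given the channel outputs a '1'.
    p = Pr(input is 0), e = crossover probability."""
    pB1 = e * p + (1 - e) * (1 - p)   # total probability
    pA0_given_B1 = e * p / pB1        # Bayes's rule
    pA1_given_B1 = (1 - e) * (1 - p) / pB1
    return pA0_given_B1, pA1_given_B1

p0, p1 = posteriors(p=0.5, e=0.1)
print(p0, p1)  # with equally likely inputs, a received '1' favors input '1'
```

With equally likely inputs the priors cancel, so the decision is driven entirely by the crossover probability; for unequal priors, both terms matter.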
References
Gray, Robert M., and Lee D. Davisson. 1999. An Introduction to Statistical Signal Processing. Available at www-isl.stanford.edu/~gray/sp.html.
Leon-Garcia, Alberto. 1994. Probability and Random Processes for Electrical Engineering. Addison-Wesley.