Lec-1 - Introduction To Probability Theory


INFORMATION AND COMMUNICATION THEORY

By
Dr. Hem Dutt Joshi
Dr. A K Singh

Dept of ECE, TIET, Patiala


INFORMATION AND COMMUNICATION THEORY

• Probability Theory
• Stochastic Processes
• Estimation & Hypothesis Testing
• Information Theory
• Statistical Modeling of Noise

PROBABILITY THEORY

Lecture – 1 Introduction to Probability Theory


Lecture – 2 Bernoulli Trials and Bernoulli’s Theorem
Lecture – 3 Random Variables
Lecture – 4 Conditional Distribution and Bayes' theorem
Lecture – 5 Function of a Random Variable
Lecture – 6 Mean and Variance of random variable
Lecture – 7 Moments and Characteristic Function
Lecture – 8 Function of Two Random Variables
Lecture – 9 Joint Moments and Joint Characteristic Functions
Lecture – 10 Conditional Distributions, Conditional Expected Values
Lecture – 11 Central Limit Theorem

INTRODUCTION
The term experiment describes any process that generates a set of data. Examples:
• Tossing a coin: in this experiment, there are only two possible outcomes, heads or tails.
• Launching a missile and observing its velocity at specified times.

We are interested in the observations obtained by repeating the experiment several times.

The outcomes depend on chance and, therefore, cannot be predicted with certainty.

RANDOMNESS
Probability theory deals with the study of random phenomena, which under repeated experiments yield different outcomes.

The set of all possible outcomes of a statistical experiment is called the sample space and is represented by the symbol Ω.

Each outcome in a sample space is called an element or a sample point.

Thus, the sample space of possible outcomes when two coins are tossed may be written as
Ω = {ξ₁, ξ₂, ξ₃, ξ₄},
and the elements/sample points are
ξ₁ = (H, H), ξ₂ = (H, T), ξ₃ = (T, H), ξ₄ = (T, T),
where H and T correspond to heads and tails, respectively.
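For instance, the two-coin sample space above can be enumerated programmatically; the following is a small illustrative sketch, not part of the original slides:

```python
from itertools import product

# Enumerate the sample space for tossing two coins
omega = list(product("HT", repeat=2))
print(omega)        # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(len(omega))   # 4 sample points
```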
When an experiment is performed under these conditions, certain elementary events ξᵢ occur in different but completely uncertain ways. We can assign a nonnegative number P(ξᵢ) as the probability of the event ξᵢ.

Laplace's Classical Definition: The probability of an event A is defined a priori, without actual experimentation, as

P(A) = (number of outcomes favorable to A) / (total number of possible outcomes),

provided all these outcomes are equally likely.
Example-1.1:
Consider a box with n white and m red balls. In this case, there are two elementary outcomes: white ball or red ball. The probability of selecting a white ball is

P(white ball) = n / (n + m).

Example-1.2:
If p is a prime number, then every pth number (starting with p) is divisible by p. Thus among p consecutive integers there is one favorable outcome, and hence

P{a given number is divisible by a prime p} = 1/p.
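As a quick check of Example-1.1, the classical value n/(n + m) can be compared against a Monte Carlo estimate; the sketch below is illustrative, with the box contents n = 6, m = 4 chosen arbitrarily:

```python
import random

# Hypothetical box for Example-1.1: n white and m red balls (illustrative values)
n, m = 6, 4
box = ["white"] * n + ["red"] * m

trials = 100_000
white = sum(random.choice(box) == "white" for _ in range(trials))

print(f"empirical : {white / trials:.4f}")   # ~0.6
print(f"classical : {n / (n + m):.4f}")      # 0.6000
```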

The totality of all outcomes ξᵢ, known a priori, constitutes a set Ω, the set of all experimental outcomes:

Ω = {ξ₁, ξ₂, …, ξₖ, …}

Ω has subsets A, B, C, …. Recall that if A is a subset of Ω, then ξ ∈ A implies ξ ∈ Ω. From A and B, we can generate other related subsets A ∪ B, A ∩ B, Ā, B̄, etc.:

A ∪ B = {ξ | ξ ∈ A or ξ ∈ B},
A ∩ B = {ξ | ξ ∈ A and ξ ∈ B},
and
Ā = {ξ | ξ ∉ A}.
Assuming that the probabilities pᵢ = P(ξᵢ) of the elementary outcomes ξᵢ of Ω are defined a priori, how does one assign probabilities to more 'complicated' events such as A, B, AB, etc., above?

The three axioms of probability defined below can be used to achieve that goal.
Axioms of Probability

For any event A, a number P(A) is assigned, called the probability of the event A. This number satisfies the following three conditions, which act as the axioms of probability:

(i) P(A) ≥ 0 (probability is a nonnegative number)
(ii) P(Ω) = 1 (probability of the whole set is unity)
(iii) If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).

(Note that (iii) states that if A and B are mutually exclusive (M.E.) events, the probability of their union is the sum of their probabilities.)
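To make the assignment question concrete, here is a minimal sketch, with a fair die as an assumed example, of a finite probability space in which every compound event gets its probability by summing the a priori probabilities of its elementary outcomes; this assignment satisfies all three axioms by construction:

```python
from fractions import Fraction

# A finite probability space for a fair die (illustrative sketch).
# Each elementary outcome gets p_i = 1/6; event probabilities are sums of p_i.
omega = {1, 2, 3, 4, 5, 6}
p = {xi: Fraction(1, 6) for xi in omega}

def prob(event):
    """P(A) = sum of p_i over the elementary outcomes in A."""
    return sum(p[xi] for xi in event)

A = {2, 4, 6}   # "outcome is even"
B = {1, 3}      # "outcome is 1 or 3" -- mutually exclusive with A

print(prob(A))                            # 1/2, nonnegative: axiom (i)
print(prob(omega))                        # 1: axiom (ii)
print(prob(A | B) == prob(A) + prob(B))   # True: axiom (iii) for M.E. events
```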
a. Since A ∪ Ā = Ω, using (ii) we have
P(A ∪ Ā) = P(Ω) = 1.
But A ∩ Ā = ∅, and using (iii),
P(A ∪ Ā) = P(A) + P(Ā) = 1, or P(Ā) = 1 − P(A).
b. Similarly, for any A, A ∪ ∅ = A and A ∩ ∅ = ∅.
Hence it follows that P(A ∪ ∅) = P(A) + P(∅).
But A ∪ ∅ = A, and thus P(∅) = 0.
c. Suppose A and B are not mutually exclusive (M.E.).
How does one compute P(A ∪ B)?
To compute the above probability, we should re-express A ∪ B in terms of M.E. sets so that we can make use of the probability axioms:
A ∪ B = A ∪ ĀB,
where A and ĀB are clearly M.E. events.
Thus, using axiom (iii),
P(A ∪ B) = P(A ∪ ĀB) = P(A) + P(ĀB). (1.1)
To compute P(ĀB), we can express B as
B = B ∩ Ω = B ∩ (A ∪ Ā) = (B ∩ A) ∪ (B ∩ Ā) = BA ∪ BĀ.
Thus
P(B) = P(BA) + P(BĀ), (1.2)
since BA = AB and BĀ = ĀB are M.E. events.
From (1.2),
P(ĀB) = P(B) − P(AB), (1.3)
and using (1.3) in (1.1),
P(A ∪ B) = P(A) + P(B) − P(AB).
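The identity is easy to verify by brute-force counting on a small equally-likely sample space; the following sketch uses one fair-die toss as an assumed example:

```python
from fractions import Fraction

# Equally likely outcomes of one fair-die toss (illustrative example)
omega = {1, 2, 3, 4, 5, 6}
prob = lambda event: Fraction(len(event), len(omega))

A = {2, 4, 6}   # outcome is even
B = {1, 2, 3}   # outcome is at most 3 -- not M.E. with A, since 2 is in both

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs, lhs == rhs)   # 5/6 5/6 True
```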
Conditional Probability and Independence
In N independent trials, suppose N_A, N_B, and N_AB denote the number of times the events A, B, and AB occur, respectively. Then
P(A) = N_A / N,  P(B) = N_B / N,  P(AB) = N_AB / N.
Among the N_A occurrences of A, only N_AB of them are also found among the N_B occurrences of B. Thus the ratio
N_AB / N_B = (N_AB / N) / (N_B / N) = P(AB) / P(B).

P(A | B) = probability of "the event A given that B has occurred":

P(A | B) = P(AB) / P(B), provided P(B) ≠ 0.
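The relative-frequency interpretation above can be simulated directly; the sketch below (an illustration, not from the slides) counts N_AB among the N_B trials in which B occurred, for one fair-die toss per trial:

```python
import random

N = 100_000
n_B = n_AB = 0

for _ in range(N):
    roll = random.randint(1, 6)
    in_A = roll % 2 == 0   # event A: outcome is even
    in_B = roll <= 3       # event B: outcome is at most 3
    if in_B:
        n_B += 1
        if in_A:
            n_AB += 1

# N_AB / N_B should approach P(AB)/P(B) = (1/6)/(1/2) = 1/3
print(f"N_AB / N_B : {n_AB / n_B:.4f}")
print(f"P(AB)/P(B) : {(1/6) / (1/2):.4f}")
```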
Properties of Conditional Probability:
a. If B ⊂ A, then AB = B, and
P(A | B) = P(AB) / P(B) = P(B) / P(B) = 1,
since if B ⊂ A, the occurrence of B implies the automatic occurrence of the event A. Example: in a die-tossing experiment, let
A = {outcome is even}, B = {outcome is 2}.
Then B ⊂ A, and P(A | B) = 1.
b. If A ⊂ B, then AB = A, and
P(A | B) = P(AB) / P(B) = P(A) / P(B) > P(A).
Independence: A and B are said to be independent events if
P(AB) = P(A) ⋅ P(B).

Notice that the above definition is a probabilistic statement, not a set-theoretic notion such as mutual exclusiveness.
Suppose A and B are independent. Then
P(A | B) = P(AB) / P(B) = P(A) P(B) / P(B) = P(A).
Thus if A and B are independent, the knowledge that B has occurred sheds no further light on the event A.

Example: A box contains 6 white and 4 black balls. Remove two balls at random without replacement. What is the probability that the first one is white and the second one is black?
Let W₁ = "first ball removed is white"
and B₂ = "second ball removed is black".
We need P(W₁ ∩ B₂). We have W₁ ∩ B₂ = W₁B₂ = B₂W₁.
Using the conditional probability rule,
P(W₁B₂) = P(B₂W₁) = P(B₂ | W₁) P(W₁).
But
P(W₁) = 6 / (6 + 4) = 6/10 = 3/5,
and
P(B₂ | W₁) = 4 / (5 + 4) = 4/9,
and hence
P(W₁B₂) = (3/5) × (4/9) = 12/45 = 4/15 ≈ 0.27.
Are the events W₁ and B₂ independent? Our common sense says no: the fate of the second ball very much depends on that of the first. To verify this, we need to compute P(B₂). The first ball has two options: W₁ = "first ball is white" or B₁ = "first ball is black". Note that W₁ ∩ B₁ = ∅ and W₁ ∪ B₁ = Ω, hence W₁ and B₁ form a partition. Thus
P(B₂) = P(B₂ | W₁) P(W₁) + P(B₂ | B₁) P(B₁)
      = (4/9) × (3/5) + (3/9) × (2/5) = 4/15 + 2/15 = 2/5,
and
P(B₂) P(W₁) = (2/5) × (3/5) = 6/25 ≠ 4/15 = P(B₂W₁).
As expected, the events W₁ and B₂ are dependent.
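A quick simulation (illustrative, not part of the slides) confirms both values, P(W₁B₂) = 4/15 ≈ 0.267 and P(B₂) = 2/5, and hence the dependence:

```python
import random

balls = ["W"] * 6 + ["B"] * 4     # 6 white, 4 black
trials = 200_000
n_w1b2 = n_b2 = 0

for _ in range(trials):
    first, second = random.sample(balls, 2)   # draw two without replacement
    if second == "B":
        n_b2 += 1
        if first == "W":
            n_w1b2 += 1

print(f"P(W1B2) ~ {n_w1b2 / trials:.3f}   (exact 4/15 = {4/15:.3f})")
print(f"P(B2)   ~ {n_b2 / trials:.3f}   (exact 2/5 = 0.400)")
print(f"P(B2)P(W1) = {(2/5) * (3/5):.3f} != P(W1B2)  -> dependent")
```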
From
P(A | B) = P(AB) / P(B),
we get
P(AB) = P(A | B) P(B).
Similarly,
P(B | A) = P(BA) / P(A) = P(AB) / P(A),
or
P(AB) = P(B | A) P(A).
Hence
P(A | B) P(B) = P(B | A) P(A),
or
P(A | B) = [P(B | A) / P(B)] ⋅ P(A). (1.4)
Equation (1.4) is known as Bayes' theorem.
P(A | B) = [P(B | A) / P(B)] ⋅ P(A)

Bayes' theorem has an interesting interpretation: P(A) represents the a priori probability of the event A. Suppose B has occurred, and assume that A and B are not independent. How can this new information be used to update our knowledge about A? Bayes' rule takes the new information ("B has occurred") into account and yields the a posteriori probability of A given B.

A more general version of Bayes' theorem, for events A₁, …, Aₙ that form a partition, is

P(Aᵢ | B) = P(B | Aᵢ) P(Aᵢ) / P(B) = P(B | Aᵢ) P(Aᵢ) / Σⱼ P(B | Aⱼ) P(Aⱼ),

where the sum in the denominator runs over j = 1, …, n.
Example: Two boxes B₁ and B₂ contain 100 and 200 light bulbs, respectively. The first box (B₁) has 15 defective bulbs and the second has 5. Suppose a box is selected at random and one bulb is picked out.

(a) What is the probability that it is defective?

Solution: Note that box B₁ has 85 good and 15 defective bulbs, and box B₂ has 195 good and 5 defective bulbs. Let D = "defective bulb is picked out". Then
P(D | B₁) = 15/100 = 0.15,  P(D | B₂) = 5/200 = 0.025.
Since a box is selected at random, the boxes are equally likely:
P(B₁) = P(B₂) = 1/2.
Thus B₁ and B₂ form a partition, and
P(D) = P(D | B₁) P(B₁) + P(D | B₂) P(B₂)
     = 0.15 × 1/2 + 0.025 × 1/2 = 0.0875.
Thus, there is an 8.75% probability that a bulb picked at random is defective.
(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1, i.e., P(B₁ | D)?

P(B₁ | D) = P(D | B₁) P(B₁) / P(D) = (0.15 × 1/2) / 0.0875 ≈ 0.8571. (1.5)

Notice that initially P(B₁) = 0.5; then we picked a box at random and tested a bulb that turned out to be defective. Can this information shed some light on which box we picked? From (1.5), P(B₁ | D) ≈ 0.857 > 0.5, so at this point it is much more likely that we chose box 1 rather than box 2. (Recall that box 1's proportion of defective bulbs, 0.15, is six times that of box 2, 0.025.)
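The values in (1.5) can be cross-checked both directly and by simulation; the sketch below is illustrative and not part of the original slides:

```python
import random

# Exact computation via Bayes' theorem
p_d_b1, p_d_b2 = 15 / 100, 5 / 200             # P(D|B1), P(D|B2)
p_b1 = p_b2 = 0.5                              # boxes equally likely
p_d = p_d_b1 * p_b1 + p_d_b2 * p_b2
print(f"P(D)    = {p_d:.4f}")                  # 0.0875
print(f"P(B1|D) = {p_d_b1 * p_b1 / p_d:.4f}")  # 0.8571

# Monte Carlo cross-check
trials = 500_000
n_defective = n_from_b1 = 0
for _ in range(trials):
    box1 = random.random() < 0.5               # pick a box at random
    defective = random.random() < (0.15 if box1 else 0.025)
    if defective:
        n_defective += 1
        n_from_b1 += box1
print(f"simulated P(D)    ~ {n_defective / trials:.4f}")
print(f"simulated P(B1|D) ~ {n_from_b1 / n_defective:.4f}")
```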
Reference
• Athanasios Papoulis, "Probability, Random Variables, and Stochastic Processes," 3rd edition, McGraw-Hill.
