Introduction To Probability Theory


Probability Models

Dr. Rahul J. Pandya,


Department of Electrical Engineering,
Indian Institute of Technology (IIT), Dharwad
Email: rpandya@iitdh.ac.in
Reference books

 Introduction to Probability by Dimitri P. Bertsekas and John N. Tsitsiklis


 https://www.vfu.bg/en/e-Learning/Math--
Bertsekas_Tsitsiklis_Introduction_to_probability.pdf

 Introduction to probability models by Sheldon Ross


 http://mitran-lab.amath.unc.edu/courses/MATH768/biblio/introduction-
to-prob-models-11th-edition.PDF

 Stochastic Processes by Sheldon Ross

 Stochastic Processes and Models by David Stirzaker

RJEs: Remote job entry points


Introduction to Probability theory

 Introduction to Probability theory:

 Review of sample space, events, axioms of probability

 Ref. Book - Introduction to Probability by Dimitri P. Bertsekas and


John N. Tsitsiklis

 https://www.vfu.bg/en/e-Learning/Math--
Bertsekas_Tsitsiklis_Introduction_to_probability.pdf

Sets
 A set is a collection of objects, which are the elements of the set.

 If S is a set and x is an element of S, we write x ∈ S

 If x is not an element of S, we write x ∉ S

 A set can have no elements, in which case it is called the empty set,
denoted by Ø
Sets
 If a set S contains a finite number of elements, say x1, x2, . . . , xn,
we write it as a list of the elements, in braces:

S = {x1, x2, . . . , xn}

 For example, the set of possible outcomes of a die roll is

S = {1, 2, 3, 4, 5, 6}

 The set of possible outcomes of a coin toss is

S = {H, T}

where H stands for “heads” and T stands for “tails.”
Sets
 If S contains infinitely many elements x1, x2, . . . , we write

S = {x1, x2, . . .}

 The set of all x that have a certain property P is denoted by

{x | x satisfies P}

 The symbol “|” is to be read as “such that.”

 E.g., the set of all scalars x in the interval [0, 1] can be written as

{x | 0 ≤ x ≤ 1}
Sets
 If every element of a set S is also an element of a set T, we say that S
is a subset of T, and we write S ⊂ T

 We introduce a universal set, denoted by Ω, which contains all objects
of interest in a particular context

 The complement of a set S, with respect to the universe Ω, is the set

S^c = {x ∈ Ω | x ∉ S}

 S^c contains all elements of Ω that do not belong to S
Set operations
 The complement of the universal set is the empty set: Ω^c = Ø

 The union of two sets S and T is the set of all elements
that belong to S or T (or both), and is denoted by S ∪ T

 The intersection of two sets S and T is the set of all
elements that belong to both S and T, and is denoted by S ∩ T

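As a quick sketch, these operations can be mirrored with Python's built-in set type (the particular sets S, T and the universe below are illustrative choices, not from the slides):

```python
# Illustrative sets; the universe is kept small for readability.
S = {1, 2, 3}
T = {3, 4, 5}
omega = {1, 2, 3, 4, 5, 6}  # universal set

union = S | T              # S ∪ T: elements in S or T (or both)
intersection = S & T       # S ∩ T: elements in both S and T
S_complement = omega - S   # S^c with respect to the universe

print(union)          # {1, 2, 3, 4, 5}
print(intersection)   # {3}
print(S_complement)   # {4, 5, 6}
```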
Set operations
 Consider the union or the intersection of several sets

 If for every positive integer n we are given a set Sn, then

S1 ∪ S2 ∪ · · · = ∪n Sn and S1 ∩ S2 ∩ · · · = ∩n Sn

 Two sets are said to be disjoint if their intersection is empty. More generally,
several sets are said to be disjoint if no two of them have a common element.

 A collection of sets is said to be a partition of a set S if the sets in the collection
are disjoint and their union is S.
The Algebra of Sets

 Set operations satisfy the following identities:

S ∪ T = T ∪ S,  S ∪ (T ∪ U) = (S ∪ T) ∪ U
S ∩ (T ∪ U) = (S ∩ T) ∪ (S ∩ U),  S ∪ (T ∩ U) = (S ∪ T) ∩ (S ∪ U)
(S^c)^c = S,  S ∩ S^c = Ø
S ∪ Ω = Ω,  S ∩ Ω = S
de Morgan’s laws

 For any collection of sets Sn,

(∪n Sn)^c = ∩n Sn^c,  (∩n Sn)^c = ∪n Sn^c

 In particular, for two sets,

(S ∪ T)^c = S^c ∩ T^c,  (S ∩ T)^c = S^c ∪ T^c
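A brute-force sanity check of de Morgan's laws on a few sample subsets (the universe and sample sets are illustrative):

```python
# Check de Morgan's laws on a few illustrative subsets of a small universe.
omega = set(range(8))
samples = [({1, 2}, {2, 3}), ({0, 5}, set()), (omega, {4})]

for S, T in samples:
    Sc, Tc = omega - S, omega - T
    assert omega - (S | T) == Sc & Tc  # (S ∪ T)^c = S^c ∩ T^c
    assert omega - (S & T) == Sc | Tc  # (S ∩ T)^c = S^c ∪ T^c

print("de Morgan's laws verified on all samples")
```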
Probabilistic Models

A probabilistic model is a mathematical description of an uncertain situation

Elements of a Probabilistic Model


• The sample space Ω, which is the set of all possible outcomes of an experiment.
• The probability law, which assigns to a set A of possible outcomes (also called
an event) a nonnegative number P(A), called the probability of A.
Sample Spaces and Events
Sample Space:

 Every probabilistic model involves an underlying process, called the experiment,


that will produce exactly one out of several possible outcomes.

 The set of all possible outcomes is called the sample space of the experiment,
and is denoted by Ω.

 The sample space of an experiment may consist of a finite or an infinite number of


possible outcomes.

Event

 A subset of the sample space, that is, a collection of possible outcomes, is called
an event.
Sample Spaces and Events
Examples:
1. If the experiment consists of the flipping of a coin, then

S = {H, T}

where H means that the outcome of the toss is a head and T that it is a tail.

2. If the experiment consists of rolling a die, then the sample space is

S = {1, 2, 3, 4, 5, 6}

3. If the experiment consists of flipping two coins, then the sample space consists
of the following four points:

S = {(H, H), (H, T), (T, H), (T, T)}

The outcome will be (H, H) if both coins come up heads; it will be (H, T) if the first coin comes up heads
and the second comes up tails; it will be (T, H) if the first comes up tails and the second heads; and it
will be (T, T) if both coins come up tails.
Home work
Example 1.1. Consider two alternative games, both involving ten successive coin
tosses:

Game 1: We receive $1 each time a head comes up.

Game 2: We receive $1 for every coin toss, up to and including the first time a head
comes up. Then, we receive $2 for every coin toss, up to the second time a head
comes up. More generally, the dollar amount per toss is doubled each time a head
comes up.

Sequential Models

 Many experiments have an inherently sequential character

 E.g., tossing a coin three times, or observing a stock price
on five successive days

 We can describe the experiment and the associated sample space
by means of a tree-based sequential description

(Figure: sample space of an experiment involving two rolls of a 4-sided die)
Example
If the experiment consists of rolling two dice, then the sample space consists of the
following 36 points:

S = {(i, j) | i, j = 1, 2, . . . , 6}

where the outcome (i, j) is said to occur if i appears on the first die and j on the
second die.
Probability Axioms
1. (Non-negativity) P(A) ≥ 0, for every event A.

2. (Additivity) If A and B are two disjoint events, then the probability of their union
satisfies
P(A ∪ B) = P(A) + P(B)

Furthermore, if the sample space has an infinite number of elements and A1, A2, . . .
is a sequence of disjoint events, then the probability of their union satisfies

P(A1 ∪ A2 ∪ · · ·) = P(A1) + P(A2) + · · ·

3. (Normalization) The probability of the entire sample space Ω is equal to 1,

P(Ω) = 1
Probability Axioms
Example: Coin tosses. Consider an experiment involving a single coin toss. There
are two possible outcomes, heads (H) and tails (T). The sample space is Ω = {H, T},
and the events are

{H, T}, {H}, {T}, Ø

 If the coin is fair, i.e., if we believe that heads and tails are “equally likely,” we
should assign equal probabilities to the two possible outcomes and specify that

P({H}) = P({T}) = 0.5

 The additivity axiom implies that

P({H, T}) = P({H}) + P({T}) = 1

 which is consistent with the normalization axiom. Thus, the probability law is given
by

P({H, T}) = 1, P({H}) = P({T}) = 0.5, P(Ø) = 0
Exercise
 Consider another experiment involving three coin tosses. The outcome will now
be a 3-long string of heads or tails. The sample space is

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

 Assume that each possible outcome has the same probability of 1/8.

 Consider the event A = {exactly 2 heads occur} = {HHT, HTH, THH}.

 Using additivity, the probability of A is the sum of the probabilities of its
elements:

P(A) = P({HHT}) + P({HTH}) + P({THH}) = 3/8
Discrete Probability Law
 If the sample space consists of a finite number of possible outcomes, then the
probability law is specified by the probabilities of the events that consist of a single
element. In particular, the probability of any event {s1, s2, . . . , sn} is the sum of
the probabilities of its elements:

P({s1, s2, . . . , sn}) = P({s1}) + P({s2}) + · · · + P({sn})

Discrete Uniform Probability Law

 If the sample space consists of n possible outcomes which are equally likely (i.e.,
all single-element events have the same probability), then the probability of any
event A is given by

P(A) = (number of elements of A) / n
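The discrete uniform law can be sketched directly for a fair six-sided die (the events chosen below are illustrative):

```python
from fractions import Fraction

# Discrete uniform law on a fair six-sided die: P(A) = |A| / n.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
print(prob(even))           # 1/2
print(prob({6}))            # 1/6
print(prob(sample_space))   # 1  (normalization axiom)
```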
Exercise
Consider the experiment of rolling a pair of 4-sided dice. We assume the dice are
fair, and we interpret this assumption to mean that each of the sixteen possible
outcomes (ordered pairs (i, j), with i, j ∈ {1, 2, 3, 4}) has the same probability of
1/16. To calculate the probability of an event, we count the number of elements of
the event and divide by 16 (the total number of possible outcomes).

Sample space:

(1,1) (2,1) (3,1) (4,1)
(1,2) (2,2) (3,2) (4,2)
(1,3) (2,3) (3,3) (4,3)
(1,4) (2,4) (3,4) (4,4)

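A small enumeration sketch for this experiment; the three events computed below are illustrative choices, not necessarily the ones on the original slide:

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes (i, j) of a pair of fair 4-sided dice.
outcomes = list(product(range(1, 5), repeat=2))

def prob(pred):
    """P(event) = (number of outcomes satisfying pred) / 16."""
    favorable = [o for o in outcomes if pred(o)]
    return Fraction(len(favorable), len(outcomes))

print(prob(lambda o: o[0] == o[1]))      # P(doubles) = 1/4
print(prob(lambda o: sum(o) % 2 == 0))   # P(sum is even) = 1/2
print(prob(lambda o: min(o) == 2))       # P(min = 2) = 5/16
```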
Continuous Models
 Probabilistic models with continuous sample spaces differ from their discrete counterparts in that the
probabilities of the single-element events may not be sufficient to characterize the probability law.

 This is illustrated in the following examples, which also illustrate how to generalize the uniform
probability law to the case of a continuous sample space.

Properties of Probability Laws

 Consider a probability law, and let A, B, and C be events. The following
properties follow from the axioms:

(a) If A ⊂ B, then P(A) ≤ P(B)

(b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

(c) P(A ∪ B) ≤ P(A) + P(B)

(d) P(A ∪ B ∪ C) = P(A) + P(A^c ∩ B) + P(A^c ∩ B^c ∩ C)
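Property (b), the inclusion-exclusion formula, can be verified by brute force on a discrete uniform model (the two events on a pair of six-sided dice are illustrative):

```python
from fractions import Fraction
from itertools import product

# Brute-force check of P(A ∪ B) = P(A) + P(B) − P(A ∩ B) on two fair dice.
outcomes = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == 6}     # first die shows 6
B = {o for o in outcomes if sum(o) == 7}   # sum equals 7

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)   # 11/36 11/36
```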
Conditional Probability
 Conditional probability provides us with a way to reason about the
outcome of an experiment, based on partial information

 Conditional probability of A given B, denoted by P(A|B), is defined, assuming
P(B) > 0, as

P(A|B) = P(A ∩ B) / P(B)
Conditional Probability
 Conditional probability provides us with a way to reason about the outcome of an
experiment, based on partial information

 Conditional probability of A given B is P(A|B)

 Example: All six possible outcomes of a fair die roll are equally likely. If we are
told that the outcome is even, we are left with only three possible outcomes,
namely, 2, 4, and 6. These three outcomes were equally likely to start with, and so
they should remain equally likely given the additional knowledge that the outcome
was even. Thus, each of the outcomes 2, 4, and 6 has conditional probability 1/3.

Probability Law

 For a fixed event B with P(B) > 0, the conditional probabilities P(A|B) form a
legitimate probability law: they satisfy the nonnegativity, additivity, and
normalization axioms.
Additive properties

 To verify the additivity axiom, consider two disjoint events A1 and A2:

P(A1 ∪ A2 | B) = P((A1 ∪ A2) ∩ B) / P(B)
             = P((A1 ∩ B) ∪ (A2 ∩ B)) / P(B)
             = (P(A1 ∩ B) + P(A2 ∩ B)) / P(B)
             = P(A1|B) + P(A2|B)
Exercise
 We toss a fair coin three successive times. We wish to find the conditional probability P(A|B) when
A and B are the events defined as follows:

A = {more heads than tails come up}, B = {1st toss is a head}

 The sample space consists of eight equally likely sequences:

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

 The event B consists of the four elements HHH, HHT, HTH, HTT, so its probability is

P(B) = 4/8

 The event A ∩ B consists of the three outcomes HHH, HHT, HTH, so

P(A ∩ B) = 3/8

 Thus, the conditional probability is

P(A|B) = P(A ∩ B) / P(B) = (3/8) / (4/8) = 3/4
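The same answer can be checked by enumerating the eight sequences:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely outcomes of three fair coin tosses.
outcomes = [''.join(t) for t in product('HT', repeat=3)]

A = {o for o in outcomes if o.count('H') > o.count('T')}  # more heads than tails
B = {o for o in outcomes if o[0] == 'H'}                  # 1st toss is a head

p_B = Fraction(len(B), len(outcomes))
p_AB = Fraction(len(A & B), len(outcomes))
print(p_AB / p_B)   # P(A|B) = 3/4
```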
Exercise
 A fair 4-sided die is rolled twice and we assume that all sixteen
possible outcomes are equally likely. Let X and Y be the result of the
1st and the 2nd roll, respectively. We wish to determine the conditional
probability P(A|B) where

A = {max(X, Y) = m},  B = {min(X, Y) = 2}

and m takes each of the values 1, 2, 3, 4

Exercise
 The conditioning event B = {min(X, Y) = 2} consists of the 5-element set

B = {(2,2), (2,3), (2,4), (3,2), (4,2)}

 The set A = {max(X, Y) = m}, where m takes each of the values 1, 2, 3, 4, is

A = {(1,1)} for m = 1
A = {(2,1), (1,2), (2,2)} for m = 2
A = {(3,1), (3,2), (3,3), (1,3), (2,3)} for m = 3
A = {(4,1), (4,2), (4,3), (4,4), (1,4), (2,4), (3,4)} for m = 4

 The set A = {max(X, Y) = m} shares with B two elements if m = 3 or m = 4, one
element if m = 2, and no element if m = 1. Thus,

P(A|B) = 0 for m = 1, 1/5 for m = 2, 2/5 for m = 3, 2/5 for m = 4
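A short enumeration confirms the element counts and the conditional law:

```python
from fractions import Fraction
from itertools import product

# Conditional law P(max = m | min = 2) for two rolls of a fair 4-sided die.
outcomes = list(product(range(1, 5), repeat=2))
B = [o for o in outcomes if min(o) == 2]       # conditioning event, 5 outcomes

for m in range(1, 5):
    A_and_B = [o for o in B if max(o) == m]
    print(m, Fraction(len(A_and_B), len(B)))   # 0, 1/5, 2/5, 2/5
```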
Exercise

 A conservative design team, call it C, and an innovative design team, call it N,
are asked to separately design a new product within a month.
From past experience we know that:

 (a) The probability that team C is successful is 2/3.

 (b) The probability that team N is successful is 1/2.

 (c) The probability that at least one team is successful is 3/4.

 If both teams are successful, the design of team N is adopted.

 Assuming that exactly one successful design is produced, what is the
probability that it was designed by team N?
Exercise
 There are four possible outcomes here, corresponding to the four combinations of
success and failure of the two teams:

 SS: both succeed

 FF: both fail

 SF: C succeeds, N fails

 FS: C fails, N succeeds

Exercise
(a) The probability that team C is successful is 2/3: P(SS) + P(SF) = 2/3

(b) The probability that team N is successful is 1/2: P(SS) + P(FS) = 1/2

(c) The probability that at least one team is successful is 3/4:
P(SS) + P(SF) + P(FS) = 3/4

Combining these with the normalization axiom

P(SS) + P(SF) + P(FS) + P(FF) = 1

gives

P(SS) = 5/12, P(SF) = 1/4, P(FS) = 1/12, P(FF) = 1/4
Exercise

 Assuming that exactly one successful design is produced, what is the probability
that it was designed by team N?

 Exactly one successful design corresponds to the event {SF, FS}; the design is
by team N only for the outcome FS. Thus, the desired conditional probability is

P(FS | {SF, FS}) = P(FS) / (P(SF) + P(FS)) = (1/12) / (1/4 + 1/12) = 1/4
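The arithmetic above can be reproduced with exact fractions:

```python
from fractions import Fraction

# Given facts about the two teams.
p_C, p_N, p_atleast1 = Fraction(2, 3), Fraction(1, 2), Fraction(3, 4)

p_FF = 1 - p_atleast1            # normalization: P(FF) = 1 − P(at least one)
p_SS = p_C + p_N - p_atleast1    # inclusion-exclusion: P(both succeed)
p_SF = p_C - p_SS                # C succeeds, N fails
p_FS = p_N - p_SS                # N succeeds, C fails

# Exactly one success is {SF, FS}; the adopted design is N's only for FS.
answer = p_FS / (p_SF + p_FS)
print(p_SS, p_SF, p_FS, p_FF)   # 5/12 1/4 1/12 1/4
print(answer)                   # 1/4
```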
Multiplication Rule
 Assuming that all of the conditioning events have positive probability,
we have

P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) · · · P(An|A1 ∩ · · · ∩ An−1)

 The multiplication rule can be verified by writing

P(A1 ∩ · · · ∩ An) = P(A1) · P(A1 ∩ A2)/P(A1) · P(A1 ∩ A2 ∩ A3)/P(A1 ∩ A2) · · ·
P(A1 ∩ · · · ∩ An)/P(A1 ∩ · · · ∩ An−1)

 and by using the definition of conditional probability to rewrite each factor on the
right-hand side as a conditional probability
Visualization of the total probability theorem

Total Probability Theorem
 Let A1, . . . , An be disjoint events that form a partition of the sample space (each
possible outcome is included in one and only one of the events A1, . . . , An) and
assume that P(Ai) > 0, for all i = 1, . . . , n.

 Then, for any event B, we have

P(B) = P(A1 ∩ B) + · · · + P(An ∩ B)
     = P(A1)P(B|A1) + · · · + P(An)P(B|An)
Homework
 Three cards are drawn from an ordinary 52-card deck without replacement (drawn
cards are not placed back in the deck). We wish to find the probability that none of
the three cards is a heart. We assume that at each step, each one of the
remaining cards is equally likely to be picked. By symmetry, this implies that every
triplet of cards is equally likely to be drawn. A cumbersome approach, that we will
not use, is to count the number of all card triplets that do not include a heart, and
divide it with the number of all possible card triplets. Instead, we use a sequential
description of the sample space in conjunction with the multiplication rule.

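As a sketch of the sequential approach via the multiplication rule, the probability that none of the three drawn cards is a heart is the product of the three conditional "not a heart" probabilities:

```python
from fractions import Fraction

# Multiplication rule: P(no heart in 3 draws without replacement) =
#   P(1st not heart) · P(2nd not heart | 1st not) · P(3rd not heart | first two not)
p = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50)
print(p)          # 703/1700
print(float(p))   # ≈ 0.4135
```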
Exercise
 You enter a chess tournament where your probability of winning a game is 0.3 against half the players
(call them type 1), 0.4 against a quarter of the players (call them type 2), and 0.5 against the
remaining quarter of the players (call them type 3). You play a game against a randomly chosen
opponent.
 What is the probability of winning? Let Ai be the event of playing with an
opponent of type i. We have

P(A1) = 0.5, P(A2) = 0.25, P(A3) = 0.25

 Let B be the event of winning. We have

P(B|A1) = 0.3, P(B|A2) = 0.4, P(B|A3) = 0.5
Exercise
 Using the additivity axiom, it follows that

P(B) = P(A1 ∩ B) + P(A2 ∩ B) + P(A3 ∩ B)

 Since, by the definition of conditional probability, we have

P(Ai ∩ B) = P(Ai) P(B|Ai)

 the preceding equality yields

P(B) = P(A1)P(B|A1) + P(A2)P(B|A2) + P(A3)P(B|A3)

 Thus, by the total probability theorem, the probability of winning is

P(B) = 0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5 = 0.375
Homework exercise
We roll a fair four-sided die. If the result is 1 or 2, we roll once more but otherwise,
we stop. What is the probability that the sum total of our rolls is at least 4?

Homework exercise
Alice is taking a probability class and at the end of each week she can be either up-
to-date or she may have fallen behind. If she is up-to-date in a given week, the
probability that she will be up-to-date (or behind) in the next week is 0.8 (or 0.2,
respectively). If she is behind in a given week, the probability that she will be up-to-
date (or behind) in the next week is 0.6 (or 0.4, respectively). Alice is (by default) up-
to-date when she starts the class. What is the probability that she is up-to-date after
three weeks?

Bayes’ Rule
 Let A1, A2, . . . , An be disjoint events that form a partition of the sample space, and
assume that P(Ai) > 0, for all i. Then, for any event B such that P(B) > 0, we have

P(Ai|B) = P(Ai) P(B|Ai) / P(B)

 Applying the total probability theorem to P(B) in the denominator,

P(Ai|B) = P(Ai) P(B|Ai) / (P(A1)P(B|A1) + · · · + P(An)P(B|An))
Bayes’ Rule
 An example of the inference context that is implicit in Bayes’ rule. We observe a shade in a person’s X-ray (this
is event B, the “effect”) and we want to estimate the likelihood of three mutually exclusive and collectively
exhaustive potential causes: cause 1 (event A1) is that there is a malignant tumor, cause 2 (event A2) is that
there is a non-malignant tumor, and cause 3 (event A3) corresponds to reasons other than a tumor. We assume
that we know the probabilities P(Ai) and P(B|Ai), i = 1, 2, 3. Given that we see a shade (event B occurs),
Bayes’ rule gives the conditional probabilities of the various causes as

P(Ai|B) = P(Ai)P(B|Ai) / (P(A1)P(B|A1) + P(A2)P(B|A2) + P(A3)P(B|A3)), i = 1, 2, 3

Exercise
 Let us return to the earlier chess exercise.

 Ai is the event of getting an opponent of type i

 B is the event of winning

 Suppose that you win. What is the probability P(A1|B) that you had an opponent
of type 1?

 Using Bayes’ rule, we have

P(A1|B) = P(A1)P(B|A1) / (P(A1)P(B|A1) + P(A2)P(B|A2) + P(A3)P(B|A3))
        = (0.5 × 0.3) / 0.375
        = 0.4
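The total probability and Bayes computations for the chess example can be reproduced with exact fractions:

```python
from fractions import Fraction

# Chess example: total probability theorem, then Bayes' rule.
priors = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}        # P(Ai)
win_given = {1: Fraction(3, 10), 2: Fraction(4, 10), 3: Fraction(5, 10)}  # P(B|Ai)

# Total probability theorem: P(B) = Σ P(Ai) P(B|Ai)
p_win = sum(priors[i] * win_given[i] for i in priors)
print(p_win)   # 3/8  (= 0.375)

# Bayes' rule: P(A1|B) = P(A1) P(B|A1) / P(B)
posterior_1 = priors[1] * win_given[1] / p_win
print(posterior_1)   # 2/5  (= 0.4)
```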
Independent Events Vs Mutually Exclusive Events

 Two events A and B are mutually exclusive (disjoint) if they cannot occur
together: P(A ∩ B) = 0

 Two events A and B are independent if the occurrence of one does not change
the probability of the other: P(A ∩ B) = P(A)P(B)

 Mutually exclusive events with positive probabilities are never independent
Independence
 What are Independent Events?

 Independent events are those events whose occurrence is not dependent on
any other event.

 For example, flip a coin and get heads; flip it again and get tails. The outcome
of the first flip has no effect on the outcome of the second: the two events are
independent of each other.

 If the probability of occurrence of an event A is not affected by the occurrence of
another event B, then A and B are said to be independent events.

Exercise
 Consider an example of rolling a die. If A is the event ‘the number appearing is
odd’ and B is the event ‘the number appearing is a multiple of 3’, then prove that
A and B are independent events and compute P(A ∩ B) and P(A|B).

 A is the event ‘the number appearing is odd’: A = {1, 3, 5}

 P(A) = 3/6 = 1/2

 B is the event ‘the number appearing is a multiple of 3’: B = {3, 6}

 P(B) = 2/6 = 1/3

 Also, A ∩ B is the event ‘the number appearing is odd and a multiple of 3’, so that
A ∩ B = {3} and P(A ∩ B) = 1/6

P(A)P(B) = (3/6)(2/6) = 1/6 = P(A ∩ B)
Therefore A and B are independent

 P(A|B) = P(A ∩ B) / P(B) = (1/6) / (2/6) = 0.5 = P(A)
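The same check, done by enumeration:

```python
from fractions import Fraction

# Verify independence of A = {odd} and B = {multiple of 3} for a fair die.
omega = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {3, 6}

def prob(event):
    return Fraction(len(event), len(omega))

print(prob(A & B))             # 1/6
print(prob(A) * prob(B))       # 1/6  → A and B are independent
print(prob(A & B) / prob(B))   # P(A|B) = 1/2 = P(A)
```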
Independence and Multiplication rule of probability

 A is independent of B if

P(A ∩ B) = P(A) P(B)

 If in addition P(B) > 0, this is equivalent to the condition P(A|B) = P(A):
conditioning on B does not change the probability of A
Exercise
 Consider an experiment involving two successive rolls of a 4-sided die in which all
16 possible outcomes are equally likely and have probability 1/16.

Conditional Independence
 Given an event C with P(C) > 0, the events A and B are called conditionally
independent if

P(A ∩ B | C) = P(A|C) P(B|C)
Summary - Independence
 Two events A and B are said to be independent if

P(A ∩ B) = P(A) P(B)

 If, in addition, P(B) > 0, independence is equivalent to the condition

P(A|B) = P(A)

 If A and B are independent, so are A and B^c.

 Two events A and B are said to be conditionally independent, given another event
C with P(C) > 0, if

P(A ∩ B | C) = P(A|C) P(B|C)

 If, in addition, P(B ∩ C) > 0, conditional independence is equivalent to the condition

P(A|B ∩ C) = P(A|C)

 Independence does not imply conditional independence, and vice versa
Independence of a Collection of Events
 The events A1, A2, . . . , An are independent if

P(∩i∈S Ai) = ∏i∈S P(Ai), for every subset S of {1, 2, . . . , n}

 For the case of three events, A1, A2, A3, independence amounts to the conditions

P(A1 ∩ A2) = P(A1)P(A2)
P(A1 ∩ A3) = P(A1)P(A3)
P(A2 ∩ A3) = P(A2)P(A3)
P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3)

 The first three conditions simply assert that any two events are independent, a
property known as pairwise independence. But the fourth condition is also
important and does not follow from the first three. Conversely, the fourth condition
does not imply the first three.
Exercise
 Prove that pairwise independence does not imply independence.

 Consider two independent fair coin tosses, and the following events:
 H1 = {1st toss is a head}
 H2 = {2nd toss is a head}
 D = {the two tosses have different results} = {HT, TH}

 The events H1 and H2 are independent: P(H1 ∩ H2) = 1/4 = P(H1)P(H2)

 To see that H1 and D are independent, note that
P(H1 ∩ D) = P({HT}) = 1/4 = P(H1)P(D)

 Similarly, H2 and D are independent.

 However, H1, H2, and D are not independent:
P(H1 ∩ H2 ∩ D) = 0 ≠ 1/8 = P(H1)P(H2)P(D)
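The counterexample can be verified by enumeration:

```python
from fractions import Fraction
from itertools import product

# Pairwise independence without joint independence, on two fair coin tosses.
outcomes = [''.join(t) for t in product('HT', repeat=2)]  # HH, HT, TH, TT

H1 = {o for o in outcomes if o[0] == 'H'}   # 1st toss is a head
H2 = {o for o in outcomes if o[1] == 'H'}   # 2nd toss is a head
D = {o for o in outcomes if o[0] != o[1]}   # different results: {HT, TH}

def prob(event):
    return Fraction(len(event), len(outcomes))

# Each pair multiplies out: pairwise independent.
assert prob(H1 & H2) == prob(H1) * prob(H2)
assert prob(H1 & D) == prob(H1) * prob(D)
assert prob(H2 & D) == prob(H2) * prob(D)

# But the triple intersection is empty, so the three are not independent.
print(prob(H1 & H2 & D))               # 0
print(prob(H1) * prob(H2) * prob(D))   # 1/8
```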
Series Connection
 Let a subsystem consist of components 1, 2, . . . , m, and let pi be the probability
that component i is up (“succeeds”).

 A series subsystem succeeds if all of its components are up.

 Its probability of success is the product of the probabilities of success of the
corresponding components:

P(series subsystem succeeds) = p1 p2 · · · pm
Parallel Connection
 A parallel subsystem succeeds if any one of its components succeeds, so its
probability of failure is the product of the probabilities of failure of the
corresponding components:

P(parallel subsystem succeeds) = 1 − (1 − p1)(1 − p2) · · · (1 − pm)

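A minimal reliability sketch for both formulas; the component success probabilities below are illustrative values, not from the slides:

```python
from math import prod

# Illustrative component success probabilities p_i (independent components).
p = [0.9, 0.8, 0.95]

series_success = prod(p)                          # all components must be up
parallel_success = 1 - prod(1 - pi for pi in p)   # fails only if all components fail

print(round(series_success, 4))     # 0.684
print(round(parallel_success, 4))   # 0.999
```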
Exercise
 For a given network, calculate the probability of success for a path from A to B

Independent Trials and the Binomial Probabilities
 If an experiment involves a sequence of independent but identical
stages, we say that we have a sequence of independent trials.

 In the special case where there are only two possible results at each
stage, we say that we have a sequence of independent Bernoulli
trials.

 E.g., “it rains” or “it doesn’t rain,” or a coin toss resulting in “heads” (H)
or “tails” (T).

 We wish to compute the probability that k heads come up in an n-toss sequence.

Independent Bernoulli trials – Sequential description
 Sequential description of the sample space of an experiment involving three
independent tosses of a biased coin: each outcome is an n = 3 long sequence of
heads and tails that involves k heads and (3 − k) tails.

 More generally, we consider an n-long sequence that contains k heads
and n − k tails.

Exercise
 Compute the probability that k heads come up in an n-toss sequence.

 If the probability of heads is p, the probability of any given sequence
that contains k heads is

p^k (1 − p)^(n−k)

 Since there are (n choose k) such sequences, the probability of k heads is

p(k) = (n choose k) p^k (1 − p)^(n−k)

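The binomial probabilities can be sketched directly; the example uses a fair coin and n = 3 tosses as illustrative values:

```python
from math import comb

# Binomial probability: P(k heads in n tosses) = C(n, k) p^k (1 − p)^(n−k)
def binomial_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fair coin, n = 3 tosses: probabilities of k = 0, 1, 2, 3 heads.
probs = [binomial_pmf(3, k, 0.5) for k in range(4)]
print(probs)        # [0.125, 0.375, 0.375, 0.125]
print(sum(probs))   # 1.0 (normalization)
```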
Properties

 The numbers (n choose k), read “n choose k,” are known as the binomial
coefficients and are given by

(n choose k) = n! / (k! (n − k)!), k = 0, 1, . . . , n

 The probabilities p(k) are known as the binomial probabilities

 where for any positive integer i we have

i! = 1 × 2 × · · · × (i − 1) × i

 and, by convention, 0! = 1
The Counting Principle

 Consider a process that consists of r stages.

 (a) There are n1 possible results for the first stage.

 (b) For every possible result of the first stage, there are n2 possible results at the
second stage.

 (c) More generally, for all possible results of the first i − 1 stages, there are ni
possible results at the ith stage.

 Then, the total number of possible results of the r-stage process is

n1 n2 · · · nr
Permutation and Combination
 If the order of selection matters, the selection is called a permutation.
The 2-permutations of the letters A, B, C, and D are

AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC

 If the order of selection does not matter, the selection is called a
combination. The combinations of two out of these four letters are

AB, AC, AD, BC, BD, CD

k-permutations
 Start with n distinct objects, and let k be some positive integer, with k ≤ n. Count the number of different
ways that we can pick k out of these n objects and arrange them in a sequence,

 We can choose any of the n objects to be the first one. Having chosen the first, there are only n−1
possible choices for the second; given the choice of the first two, there only remain n − 2 available objects
for the third stage, etc.

 When we are ready to select the last (the kth) object, we have already chosen k − 1 objects, which leaves us
with n − (k − 1) choices for the last one. By the Counting Principle, the number of possible sequences, called
k-permutations, is

n(n − 1) · · · (n − k + 1) = n! / (n − k)!

 In the special case where k = n, the number of possible sequences, simply called permutations, is

n(n − 1)(n − 2) · · · 2 · 1 = n!
Exercise - k-permutations

 Count the number of words that consist of four distinct letters.

 This is the problem of counting the number of 4-permutations of the 26 letters in the alphabet.

 The desired number is 26!/(26 − 4)! = 26 · 25 · 24 · 23 = 358,800
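The count can be verified with the standard library (a sketch; `k_permutations` is an illustrative helper, while `math.perm` computes n!/(n − k)! directly):

```python
import math

def k_permutations(n, k):
    """Number of ways to pick k of n distinct objects and arrange them: n!/(n-k)!."""
    return math.factorial(n) // math.factorial(n - k)

# The 4-letter-word exercise: 26 * 25 * 24 * 23.
print(k_permutations(26, 4))  # 358800
assert k_permutations(26, 4) == math.perm(26, 4)
```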
Combinations

 In a combination there is no ordering of the selected elements.

 The 2-permutations of the letters A, B, C, and D are

 AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC,

 while the combinations of two out of four of these letters are

 AB, AC, AD, BC, BD, CD

 Note that specifying an n-toss sequence with k heads is the same as picking k elements (those that correspond to heads) out of the n-element set of tosses. Thus, the number of combinations is the same as the binomial coefficient

C(n, k) = n! / (k! (n − k)!)
Combinations - Exercise

 Count the number of combinations of two out of the four letters A, B, C, and D.

 Let n = 4 and k = 2. The number of combinations is 4! / (2! 2!) = 6.
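A quick check, including the link between permutations and combinations (each combination of k elements accounts for k! orderings; `combinations` is an illustrative helper):

```python
import math

def combinations(n, k):
    """Binomial coefficient: n! / (k! (n-k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Two out of the four letters A, B, C, D: AB, AC, AD, BC, BD, CD.
print(combinations(4, 2))  # 6
# Dividing the 2-permutations by 2! removes the ordering.
assert combinations(4, 2) == math.perm(4, 2) // math.factorial(2)
```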
Random variable

 A random variable is a variable which represents the outcome of a trial, an experiment or event.

 It is a specific number, which may be different each time the trial, experiment or event is repeated.

 E.g. Throwing 2 dice

 Let X be the random variable equal to the sum of the outcomes of the two rolls.

 Possible values of X = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

 P(X = 12) = 1/36
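The dice example can be enumerated exactly (a sketch; all 36 ordered pairs of outcomes are equally likely):

```python
from fractions import Fraction
from itertools import product

# Build the PMF of X = sum of two fair dice by enumerating all 36
# equally likely ordered pairs of outcomes.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

print(pmf[12])  # 1/36
```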
Concepts Related to Random Variables

 A random variable is a real-valued function of the outcome of the experiment.

 A function of a random variable defines another random variable.

 We can associate with each random variable certain “averages” of interest, such as the mean and the variance.

 A random variable can be conditioned on an event or on another random variable.

 There is a notion of independence of a random variable from an event or from another random variable.
Discrete Random Variable

 A random variable is called discrete if its range (the set of values that it can take) is finite or at most countably infinite.

 E.g. the sum of two die rolls, whose possible values are {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
Continuous Random Variable

 Continuous random variables describe outcomes in probabilistic situations where the possible values some quantity can take form a continuum, which is often (but not always) the entire set of real numbers R.

 E.g. choosing a point a from the interval [−4, 4]

 E.g. the possible values of the temperature outside on any given day

Ref.: https://brilliant.org/wiki/continuous-random-variables-definition/
Concepts Related to Discrete Random Variables

• A discrete random variable is a real-valued function of the outcome of the experiment that can take a finite or countably infinite number of values.

• A (discrete) random variable has an associated probability mass function (PMF), which gives the probability of each numerical value that the random variable can take.

• A function of a random variable defines another random variable, whose PMF can be obtained from the PMF of the original random variable.
PROBABILITY MASS FUNCTION (PMF)

 For a discrete random variable X, these are captured by the probability mass function (PMF) of X, denoted pX. In particular, if x is any possible value of X, the probability mass of x, denoted pX(x), is the probability of the event {X = x} consisting of all outcomes that give rise to a value of X equal to x:

pX(x) = P({X = x})

 For example, let the experiment consist of two independent tosses of a fair coin, and let X be the number of heads obtained. The possible outcomes are {HH, HT, TH, TT}.

 Then the PMF of X is

pX(x) = 1/4 if x = 0 or x = 2, pX(x) = 1/2 if x = 1, and pX(x) = 0 otherwise.
Calculation of the PMF of a Random Variable X

 Random variable X = maximum roll in two independent rolls of a fair 4-sided die.

 x = 1:  (1, 1)  pX(1) = 1/16
 x = 2:  (1, 2), (2, 2), (2, 1)  pX(2) = 3/16
 x = 3:  (1, 3), (2, 3), (3, 3), (3, 1), (3, 2)  pX(3) = 5/16
 x = 4:  (1, 4), (2, 4), (3, 4), (4, 4), (4, 1), (4, 2), (4, 3)  pX(4) = 7/16
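The same PMF can be obtained programmatically by collecting the outcomes behind each event {X = x} (a sketch using exact rational arithmetic):

```python
from fractions import Fraction
from itertools import product

# PMF of X = maximum of two independent rolls of a fair 4-sided die:
# each of the 16 ordered pairs has probability 1/16.
pmf = {x: Fraction(0) for x in range(1, 5)}
for a, b in product(range(1, 5), repeat=2):
    pmf[max(a, b)] += Fraction(1, 16)

print([str(pmf[x]) for x in range(1, 5)])  # ['1/16', '3/16', '5/16', '7/16']
```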
Calculation of the PMF of a Random Variable X

 For each possible value x of X:

 Collect all the possible outcomes that give rise to the event {X = x}.

 Add their probabilities to obtain pX(x).

The Bernoulli Random Variable

 Consider the toss of a biased coin, which comes up a head with probability p, and a tail with probability 1 − p.

 The Bernoulli random variable takes the two values 1 and 0, with PMF

pX(k) = p if k = 1, and pX(k) = 1 − p if k = 0.

 The Bernoulli random variable is used to model probabilistic situations with just two outcomes, e.g.:
 (a) The state of a telephone at a given time that can be either free or busy.
 (b) A person who can be either healthy or sick with a certain disease.

 Furthermore, by combining multiple Bernoulli random variables, one can construct more complicated random variables.
The Binomial Random Variable

 A biased coin is tossed n times. At each toss, the coin comes up a head with probability p, and a tail with probability 1 − p, independently of prior tosses. Let X be the number of heads in the n-toss sequence. We refer to X as a binomial random variable with parameters n and p. The PMF of X consists of the binomial probabilities that were calculated earlier:

pX(k) = P(X = k) = C(n, k) p^k (1 − p)^(n−k),   k = 0, 1, . . . , n

 Applying the additivity and normalization properties,

Σ_{k=0}^{n} C(n, k) p^k (1 − p)^(n−k) = 1
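The PMF and its normalization property can be checked numerically (a sketch; `binomial_pmf` is an illustrative helper):

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Normalization property: the binomial probabilities sum to 1.
total = sum(binomial_pmf(k, 9, 0.5) for k in range(10))
print(abs(total - 1.0) < 1e-12)  # True
```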
Exercise
 Plot the Binomial PMF for the given cases:

 Case-1: If n=9 and p = 1/2, the PMF is symmetric around n/2

Exercise

 Case-2: the PMF is skewed towards 0 if p < 1/2

 Case-3: the PMF is skewed towards n if p > 1/2
Geometric Random Variable

 The geometric random variable is used when repeated independent trials are performed until the first “success.”

 Each trial has probability of success p, and the number of trials until (and including) the first success is modelled by the geometric random variable X, with PMF

pX(k) = (1 − p)^(k−1) p,   k = 1, 2, 3, . . .

 It decreases as a geometric progression with parameter 1 − p.
The Poisson Random Variable

 A Poisson random variable takes nonnegative integer values. Its PMF is given by

pX(k) = e^(−λ) λ^k / k!,   k = 0, 1, 2, . . .

 where λ is a positive parameter characterizing the PMF.

 To picture a Poisson random variable, think of a binomial random variable with very small p and very large n.

 E.g. the number of cars involved in accidents in a city on a given day.
The Poisson Random Variable

 Probability Mass Function (PMF)

 If λ < 1, then the PMF is monotonically decreasing.

 While if λ > 1, the PMF first increases and then decreases as the value of k increases.
Exercise

 The Poisson PMF with parameter λ is a good approximation for a binomial PMF with parameters n and p, provided λ = np, n is very large, and p is very small, i.e.,

e^(−λ) λ^k / k! ≈ C(n, k) p^k (1 − p)^(n−k),   k = 0, 1, . . . , n

 Using the Poisson PMF may result in simpler models and calculations. For example, let n = 100 and p = 0.01. Then the probability of k = 5 successes in n = 100 trials is calculated using the binomial PMF as

C(100, 5) (0.01)^5 (0.99)^95 ≈ 0.00290

 Using the Poisson PMF with λ = np = 100 × 0.01 = 1, this probability is approximated by

e^(−1) / 5! ≈ 0.00306
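Both numbers can be reproduced directly with the standard library (a sketch):

```python
import math

n, p, k = 100, 0.01, 5
lam = n * p  # lambda = np = 1

binom = math.comb(n, k) * p**k * (1 - p)**(n - k)      # exact binomial PMF at k
poisson = math.exp(-lam) * lam**k / math.factorial(k)  # Poisson approximation

print(round(binom, 5), round(poisson, 5))  # 0.0029 0.00307
```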
FUNCTIONS OF RANDOM VARIABLES

 Consider a probability model of today’s weather, let the random variable X be the temperature in degrees Celsius, and consider the transformation Y = 1.8X + 32, which gives the temperature in degrees Fahrenheit. In this example, Y is a linear function of X, of the form

Y = g(X) = aX + b

 Nonlinear functions of the general form Y = g(X) are also possible.
Properties

 If X is discrete with PMF pX, then Y is also discrete, and its PMF pY can be calculated using the PMF of X. In particular, to obtain pY(y) for any y, we add the probabilities of all values of x such that g(x) = y:

pY(y) = Σ_{x : g(x) = y} pX(x)
Exercise

 Let Y = |X| and let us apply the preceding formula for the PMF to compute pY(y) in the case where

pX(x) = 1/9 if x is an integer in the range [−4, 4], and pX(x) = 0 otherwise.

Y = |X|
Exercise

 Let Y = |X| and let us apply the preceding formula for the PMF to compute pY(y), where pX(x) = 1/9 for integer x in [−4, 4].

 The possible values of Y are y = 0, 1, 2, 3, 4. To compute pY(y) for some given value y from this range, we must add pX(x) over all values x such that |x| = y. In particular, there is only one value of X that corresponds to y = 0, namely x = 0, so pY(0) = pX(0) = 1/9, while for y = 1, 2, 3, 4,

pY(y) = pX(y) + pX(−y) = 2/9
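The computation can be automated for any function g (a sketch; the uniform PMF over the integers −4, . . . , 4 is the one assumed in the example):

```python
from fractions import Fraction

# X uniform over the integers -4..4: pX(x) = 1/9.
pX = {x: Fraction(1, 9) for x in range(-4, 5)}

# pY(y) = sum of pX(x) over all x with |x| = y.
pY = {}
for x, px in pX.items():
    pY[abs(x)] = pY.get(abs(x), Fraction(0)) + px

print(pY[0], pY[3])  # 1/9 2/9
```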
The PMFs of X and Y = |X|
Homework exercise
 Try the previous exercise for Z = X2

Expectation or Mean of a Random Variable

 Suppose you spin a wheel of fortune many times. At each spin, one of the numbers m1, m2, . . . , mn comes up with corresponding probability p1, p2, . . . , pn, and this is your monetary reward from that spin. What is the amount of money that you “expect” to get “per spin”?

 Suppose that you spin the wheel k times, and that ki is the number of times that the outcome is mi. Then, the total amount received is m1 k1 + m2 k2 + · · · + mn kn.

 The amount received per spin is

M = (m1 k1 + m2 k2 + · · · + mn kn) / k,   e.g. M = (2(5) + 3(10) + (1)(20)) / 5 = 12

 If the number of spins k is very large, and if we are willing to interpret probabilities as relative frequencies, it is reasonable to anticipate that mi comes up a fraction of times that is roughly equal to pi:

ki / k ≈ pi
Expectation or Mean of a Random Variable

 Thus, the amount of money per spin that you “expect” to receive is

E[X] = m1 p1 + m2 p2 + · · · + mn pn = Σ_x x pX(x)
Exercise - Compute mean

 Consider two independent coin tosses, each with a 3/4 probability of a head, and let X be the number of heads obtained. This is a binomial random variable with parameters n = 2 and p = 3/4 (3/4 probability of a head, 1/4 probability of a tail). Its PMF is

pX(0) = 1/16, pX(1) = 6/16, pX(2) = 9/16

 Compute the mean.
Exercise - Compute mean

 E[X] = 0 · (1/16) + 1 · (6/16) + 2 · (9/16) = 24/16 = 3/2 = np
nth moment of X

 The 1st moment of X is just the mean.

 The 2nd moment of the random variable X is the expected value of the random variable X^2, i.e., E[X^2].

 More generally, we define the nth moment as E[X^n], the expected value of the random variable X^n.
Variance and Standard Deviation

 The variance of X is defined as var(X) = E[(X − E[X])^2].

 The variance is always nonnegative.

 The variance provides a measure of dispersion of X around its mean.

 Another measure of dispersion is the standard deviation of X, which is defined as the square root of the variance and is denoted by σX:

σX = sqrt(var(X))

 The standard deviation is often easier to interpret, because it has the same units as X. For example, if X measures length in meters, the units of variance are square meters, while the units of the standard deviation are meters.
Exercise – Compute Mean

 Consider the random variable X, which has the PMF

pX(x) = 1/9 if x is an integer in the range [−4, 4], and pX(x) = 0 otherwise.

 The mean E[X] is equal to 0. This can be seen from the symmetry of the PMF of X around 0, and can also be verified from the definition:

E[X] = Σ_{x=−4}^{4} x · (1/9) = 0
Exercise – Compute Variance

 Consider the random variable X, which has the PMF

pX(x) = 1/9 if x is an integer in the range [−4, 4], and pX(x) = 0 otherwise.

 Since E[X] = 0, we have var(X) = E[X^2] = Σ_{x=−4}^{4} x^2 · (1/9) = 2(1 + 4 + 9 + 16)/9 = 60/9
Expected Value Rule for Functions of Random Variables

 Let X be a random variable with PMF pX, and let g(X) be a function of X. Then, the expected value of the random variable g(X) is given by

E[g(X)] = Σ_x g(x) pX(x)
Expected Value Rule for Functions of Random Variables

 Using the expected value rule, we can write the variance of X as

var(X) = E[(X − E[X])^2] = Σ_x (x − E[X])^2 pX(x)

 Similarly, the nth moment is given by

E[X^n] = Σ_x x^n pX(x)

 There is no need to calculate the PMF of X^n.
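The rule can be exercised on the running example (a sketch; X uniform over the integers −4, . . . , 4, as assumed earlier, and note that the PMF of X^2 is never constructed):

```python
from fractions import Fraction

# Example PMF: X uniform over the integers -4..4, pX(x) = 1/9.
pmf = {x: Fraction(1, 9) for x in range(-4, 5)}

mean = sum(x * p for x, p in pmf.items())               # E[X]
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # expected value rule
second = sum(x**2 * p for x, p in pmf.items())          # E[X^2], via the same rule

print(mean, var)  # 0 20/3
assert var == second - mean**2
```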
Exercise
 For the random variable X with PMF

Summary - Variance

 var(X) = E[(X − E[X])^2] = Σ_x (x − E[X])^2 pX(x) = E[X^2] − (E[X])^2
Expected value rule for functions

 Let Y be a random variable that is a linear function of another random variable X:

Y = aX + b

 where a and b are given scalars. Let us derive the mean and the variance of the linear function Y:

E[Y] = aE[X] + b,   var(Y) = a^2 var(X)
Property of Mean and Variance

 For a linear function Y = aX + b: E[aX + b] = aE[X] + b and var(aX + b) = a^2 var(X)
Variance in Terms of Moments Expression

 var(X) = E[X^2] − (E[X])^2
Exercise - Mean and Variance of the Bernoulli

 Consider the experiment of tossing a biased coin, which comes up a head with probability p and a tail with probability 1 − p, and the Bernoulli random variable X with PMF

pX(k) = p if k = 1, and pX(k) = 1 − p if k = 0.

 Compute the mean, second moment, and variance:

Mean: E[X] = 1 · p + 0 · (1 − p) = p
Second moment: E[X^2] = 1^2 · p + 0^2 · (1 − p) = p
Variance: var(X) = E[X^2] − (E[X])^2 = p − p^2 = p(1 − p)
Exercise

 What is the mean and variance of the roll of a fair six-sided die? If we view the result of the roll as a random variable X, its PMF is

pX(k) = 1/6,   k = 1, 2, 3, 4, 5, 6

 E[X] = 3.5, and var(X) = E[X^2] − (E[X])^2 = 91/6 − (3.5)^2 = 35/12
The Mean of the Poisson

 The mean of the Poisson PMF can be calculated as follows:

E[X] = Σ_{k=0}^{∞} k e^(−λ) λ^k / k! = λ Σ_{k=1}^{∞} e^(−λ) λ^(k−1) / (k − 1)! = λ
Joint PMFs of Multiple Random Variables

 Consider two discrete random variables X and Y associated with the same experiment.

 The joint PMF of X and Y is defined by

pX,Y(x, y) = P(X = x, Y = y)
Joint PMFs of Multiple Random Variables

 The joint PMF determines the probability of any event that can be specified in terms of the random variables X and Y. For example, if A is the set of all pairs (x, y) that have a certain property, then

P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} pX,Y(x, y)

 We can calculate the PMFs of X and Y by using the formulas

pX(x) = Σ_y pX,Y(x, y),   pY(y) = Σ_x pX,Y(x, y)

 where the second equality follows by noting that the event {X = x} is the union of the disjoint events {X = x, Y = y} as y ranges over all the different values of Y. The formula for pY(y) is verified similarly. We sometimes refer to pX and pY as the marginal PMFs, to distinguish them from the joint PMF.
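Marginalization is a one-loop computation once the joint PMF is stored as a table (a sketch; the joint PMF below is a small made-up example, not from the slides):

```python
from fractions import Fraction

# A small joint PMF stored as {(x, y): probability}.
pXY = {(1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4), (2, 1): Fraction(1, 2)}

pX, pY = {}, {}
for (x, y), p in pXY.items():
    pX[x] = pX.get(x, Fraction(0)) + p  # pX(x) = sum over y
    pY[y] = pY.get(y, Fraction(0)) + p  # pY(y) = sum over x

print(pX, pY)
```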
Functions of Multiple Random Variables

 A function Z = g(X, Y) of the random variables X and Y defines another random variable Z. Its PMF can be calculated from the joint PMF pX,Y according to

pZ(z) = Σ_{(x,y) : g(x,y) = z} pX,Y(x, y)

 In the special case where g is linear and of the form aX + bY + c, where a, b, and c are given scalars, we have

E[aX + bY + c] = aE[X] + bE[Y] + c
Illustration of the tabular method for calculating marginal PMFs
from joint PMFs

More than two Random Variables

 The joint PMF of three random variables X, Y, and Z is defined by

pX,Y,Z(x, y, z) = P(X = x, Y = y, Z = z)

 for all possible triplets of numerical values (x, y, z). The corresponding marginal PMFs are

pX,Y(x, y) = Σ_z pX,Y,Z(x, y, z),   pX(x) = Σ_y Σ_z pX,Y,Z(x, y, z)

 The expected value rule for functions takes the form

E[g(X, Y, Z)] = Σ_x Σ_y Σ_z g(x, y, z) pX,Y,Z(x, y, z)

 and if g is linear and of the form aX + bY + cZ + d, then

E[aX + bY + cZ + d] = aE[X] + bE[Y] + cE[Z] + d
Exercise - Mean of the Binomial

 A class has 300 students and each student has probability 1/3 of getting an A, independently of any other student. What is the mean of X, the number of students that get an A?

 Let Xi be the Bernoulli random variable that equals 1 if the ith student gets an A, and 0 otherwise. Then X1, X2, . . . , Xn are Bernoulli random variables with common mean p = 1/3 and variance p(1 − p) = (1/3)(2/3) = 2/9. Their sum X = X1 + X2 + · · · + Xn is the number of students that get an A, so

E[X] = E[X1] + · · · + E[Xn] = np = 300 · (1/3) = 100

 If we repeat this calculation for a general number of students n and probability of A equal to p, we obtain E[X] = np.
Summary of Facts About Joint PMFs

 Let X and Y be random variables associated with the same experiment.

 The joint PMF of X and Y is defined by pX,Y(x, y) = P(X = x, Y = y)

 The marginal PMFs of X and Y can be obtained from the joint PMF, using the formulas

pX(x) = Σ_y pX,Y(x, y),   pY(y) = Σ_x pX,Y(x, y)

 A function g(X, Y) of X and Y defines another random variable, and

E[g(X, Y)] = Σ_x Σ_y g(x, y) pX,Y(x, y)

 If g is linear, of the form aX + bY + c, we have E[aX + bY + c] = aE[X] + bE[Y] + c
CONDITIONING

 PMF pX|A(x): for each x, we add the probabilities of the outcomes in the intersection {X = x} ∩ A and normalize by dividing with P(A).

 The conditional PMF of a random variable X, conditioned on a particular event A with P(A) > 0, is defined by

pX|A(x) = P(X = x | A) = P({X = x} ∩ A) / P(A)

 Note that the events {X = x} ∩ A are disjoint for different values of x, their union is A, and, therefore,

P(A) = Σ_x P({X = x} ∩ A)

 Combining the above two formulas, we see that

Σ_x pX|A(x) = 1
CONDITIONING - Exercise

 Let X be the roll of a die and let A be the event that the roll is an even number. Compute pX|A(x).

pX|A(x) = P(X = x | roll is even) = 1/3 if x = 2, 4, 6, and 0 otherwise.
Conditioning one Random Variable on Another

 Let X and Y be two random variables; the conditional PMF pX|Y of X given Y is

pX|Y(x | y) = P(X = x | Y = y)

 Using the definition of conditional probabilities,

pX|Y(x | y) = pX,Y(x, y) / pY(y)

 Normalization property: Σ_x pX|Y(x | y) = 1, for any fixed y with pY(y) > 0.

 Joint PMF, using a sequential approach: pX,Y(x, y) = pY(y) pX|Y(x | y)
Exercise

 Consider a transmitter that is sending messages over a computer network. Let us define the following two random variables:

 X: the travel time of a given message, Y: the length of the given message.

 We know the PMF of the travel time of a message that has a given length, and we know the PMF of the message length. We want to find the (unconditional) PMF of the travel time of a message.

 Assume that the travel time X of the message depends on its length Y and the congestion level of the network at the time of transmission. In particular, the length of a message can take two possible values:

 y = 10^2 bytes with probability 5/6
 y = 10^4 bytes with probability 1/6

 The travel time is (10^-4 Y) secs with probability 1/2
 (10^-3 Y) secs with probability 1/3
 (10^-2 Y) secs with probability 1/6
Exercise - Compute the PMF of X

 To compute the PMF of X, we use the total probability formula

pX(x) = Σ_y pY(y) pX|Y(x | y)
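The total probability computation for this exercise can be carried out exactly (a sketch; the travel time is modelled as X = c · Y seconds, where the factor c takes the three values given on the previous slide):

```python
from fractions import Fraction

# PMF of the message length Y (bytes).
pY = {10**2: Fraction(5, 6), 10**4: Fraction(1, 6)}
# Distribution of the factor c, where X = c * Y seconds.
pC = {Fraction(1, 10**4): Fraction(1, 2),
      Fraction(1, 10**3): Fraction(1, 3),
      Fraction(1, 10**2): Fraction(1, 6)}

# Total probability formula: pX(x) = sum_y pY(y) * pX|Y(x | y).
pX = {}
for y, py in pY.items():
    for c, pc in pC.items():
        x = c * y
        pX[x] = pX.get(x, Fraction(0)) + py * pc

print(sorted(pX.items()))
```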
Conditional Expectation

 Let X and Y be random variables associated with the same experiment.

 The conditional expectation of X given an event A with P(A) > 0 is defined by

E[X | A] = Σ_x x pX|A(x)

 For a function g(X), it is given by

E[g(X) | A] = Σ_x g(x) pX|A(x)

 The conditional expectation of X given a value y of Y is defined by

E[X | Y = y] = Σ_x x pX|Y(x | y)
Total expectation theorem

 Let A1, . . . , An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i. Then,

E[X] = Σ_{i=1}^{n} P(Ai) E[X | Ai]

 The total expectation theorem basically says that “the unconditional average can be obtained by averaging the conditional averages.”
Homework – Exercise – 2.12 (Page-29)

 Consider four independent rolls of a 6-sided die. Let X be the number of 1’s and let Y be the number of 2’s obtained. What is the joint PMF of X and Y? The marginal PMF pY is given by the binomial formula

pY(y) = C(4, y) (1/6)^y (5/6)^(4−y),   y = 0, 1, . . . , 4
Independence of a Random Variable from an Event

 The independence of a random variable from an event is similar to the independence of two events. The idea is that knowing the occurrence of the conditioning event tells us nothing about the value of the random variable. We can say that the random variable X is independent of the event A if

P(X = x and A) = P(X = x) P(A) = pX(x) P(A),   for all x,

 which is the same as requiring that the two events {X = x} and A be independent, for any choice x. As long as P(A) > 0, and using the definition pX|A(x) = P(X = x and A)/P(A) of the conditional PMF, we see that independence is the same as the condition

pX|A(x) = pX(x),   for all x.
Independence of Random Variables

 The notion of independence of two random variables is similar. We say that two random variables X and Y are independent if

pX,Y(x, y) = pX(x) pY(y),   for all x, y.

 X and Y are said to be conditionally independent, given a positive probability event A, if

pX,Y|A(x, y) = pX|A(x) pY|A(y),   for all x, y.
Independence of Random Variables

 If X and Y are independent random variables, then E[XY] = E[X] E[Y].

 Proof:

E[XY] = Σ_x Σ_y x y pX,Y(x, y) = Σ_x Σ_y x y pX(x) pY(y) = Σ_x x pX(x) Σ_y y pY(y) = E[X] E[Y]
Variance

 Consider now the sum Z = X + Y of two independent random variables X and Y, and let us calculate the variance of Z. We have, using the relation E[X + Y] = E[X] + E[Y],

var(X + Y) = var(X) + var(Y)
Independence of Several Random Variables

 For example, three random variables X, Y, and Z are said to be independent if

pX,Y,Z(x, y, z) = pX(x) pY(y) pZ(z),   for all x, y, z.

 If X, Y, and Z are independent random variables, then any three random variables of the form f(X), g(Y), and h(Z) are also independent.

 Similarly, any two random variables of the form g(X, Y) and h(Z) are independent.

 On the other hand, two random variables of the form g(X, Y) and h(Y, Z) are usually not independent, because they are both affected by Y.

 If X1, X2, . . . , Xn are independent random variables, then

var(X1 + X2 + · · · + Xn) = var(X1) + var(X2) + · · · + var(Xn)
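The additivity of variances under independence can be verified exactly for two independent fair-die rolls (a sketch; `mean` and `var` are illustrative helpers):

```python
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}  # fair six-sided die

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

# PMF of Z = X + Y for two independent rolls (the joint PMF factorizes).
pZ = {}
for x, px in die.items():
    for y, py in die.items():
        pZ[x + y] = pZ.get(x + y, Fraction(0)) + px * py

print(var(pZ) == 2 * var(die))  # True: var(X + Y) = var(X) + var(Y)
```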
Continuous Random Variables and PDFs

 A random variable X is called continuous if its probability law can be described in terms of a nonnegative function fX, called the probability density function of X, or PDF for short, which satisfies

P(X ∈ B) = ∫_B fX(x) dx,   for every subset B of the real line.

 The probability that the value of X falls within an interval is

P(a ≤ X ≤ b) = ∫_a^b fX(x) dx

 and can be interpreted as the area under the graph of the PDF.
Continuous Random Variables and PDFs

 The probability that the value of X falls within an interval is

P(a ≤ X ≤ b) = ∫_a^b fX(x) dx

 and can be interpreted as the area under the graph of the PDF. The entire area under the graph of the PDF must be equal to 1:

∫_{−∞}^{∞} fX(x) dx = 1
Probability mass per unit length

 To interpret the PDF, note that for an interval [x, x + δ] with very small length δ, we have

P(x ≤ X ≤ x + δ) = ∫_x^{x+δ} fX(t) dt ≈ fX(x) · δ

 If δ is very small, the probability that X takes a value in the interval [x, x + δ] is the shaded area in the figure, which is approximately equal to fX(x) · δ.
Continuous Uniform Random Variable

 A gambler spins a wheel of fortune, continuously calibrated between 0 and 1, and observes the resulting number. Assuming that all subintervals of [0, 1] of the same length are equally likely, this experiment can be modelled in terms of a random variable X with PDF

fX(x) = c if 0 ≤ x ≤ 1, and fX(x) = 0 otherwise,

 for some constant c. This constant can be determined by using the normalization property:

1 = ∫_{−∞}^{∞} fX(x) dx = ∫_0^1 c dx = c,   so c = 1.
The PDF of a uniform random variable
Piecewise Constant PDF

 Alvin’s driving time to work is between 15 and 20 minutes if the day is sunny, and between 20 and 25 minutes if the day is rainy, with all times being equally likely in each case. Assume that a day is sunny with probability 2/3 and rainy with probability 1/3. What is the PDF of the driving time, viewed as a random variable X?

 “All times are equally likely” in the sunny and the rainy cases means that the PDF of X is constant in each of the intervals [15, 20] and [20, 25].
Piecewise Constant PDF

 fX(x) = (2/3) · (1/5) = 2/15 for 15 ≤ x < 20, fX(x) = (1/3) · (1/5) = 1/15 for 20 ≤ x ≤ 25, and fX(x) = 0 otherwise.
Piecewise Constant PDF

 More generally, a piecewise constant PDF has the form

fX(x) = ci if ai ≤ x < ai+1, for i = 1, 2, . . . , n − 1, and fX(x) = 0 otherwise,

 where a1, a2, . . . , an are some scalars with ai < ai+1 for all i, and c1, c2, . . . , cn are some nonnegative constants.
PDF Properties

 fX(x) ≥ 0 for all x; ∫_{−∞}^{∞} fX(x) dx = 1; and if δ is very small, then P(x ≤ X ≤ x + δ) ≈ fX(x) · δ.
Expectation

 The expected value or mean of a continuous random variable X is defined by

E[X] = ∫_{−∞}^{∞} x fX(x) dx

 Y = g(X) may be a continuous or a discrete random variable. In either case, the mean of g(X) satisfies the expected value rule

E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx
Expectation of a Continuous Random Variable and its Properties

E[X] = ∫_{−∞}^{∞} x fX(x) dx,   var(X) = E[(X − E[X])^2] = ∫_{−∞}^{∞} (x − E[X])^2 fX(x) dx ≥ 0
Exercise

 Consider the case of a uniform PDF over an interval [a, b]: fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise. Compute the mean and variance of the uniform random variable:

E[X] = ∫_a^b x / (b − a) dx = (a + b)/2,   var(X) = (b − a)^2 / 12
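The closed-form answers can be checked numerically with a simple midpoint Riemann sum (a sketch with an arbitrary interval [2, 5]; the sum only approximates the integrals):

```python
# Numerical check of E[X] = (a+b)/2 and var(X) = (b-a)^2/12 for a
# uniform PDF on [a, b], via a midpoint Riemann sum.
a, b, n = 2.0, 5.0, 100_000
dx = (b - a) / n
xs = [a + (i + 0.5) * dx for i in range(n)]
f = 1.0 / (b - a)  # the uniform PDF on [a, b]

mean = sum(x * f * dx for x in xs)
var = sum((x - mean) ** 2 * f * dx for x in xs)

print(round(mean, 6), round(var, 6))  # ~3.5 and ~0.75 = (b-a)^2/12
```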
Annexure-I

Ref.: https://softmathblog.weebly.com/blog/learn-algebra-equation-and-its-formulas-and-properties
Exponential Random Variable

 An exponential random variable has a PDF of the form

fX(x) = λ e^(−λx) if x ≥ 0, and fX(x) = 0 otherwise,

 where λ is a positive parameter characterizing the PDF.

 Note that the probability that X exceeds a certain value falls exponentially. Indeed, for any a ≥ 0, we have

P(X ≥ a) = ∫_a^{∞} λ e^(−λx) dx = e^(−λa)
Exponential Random Variable

 For an exponential random variable with parameter λ, the mean and the variance can be calculated to be

E[X] = 1/λ,   var(X) = 1/λ^2

 An exponential random variable can be a very good model for the amount of time until a piece of equipment breaks down, until a light bulb burns out, or until an accident occurs.
Mean of an Exponential Random Variable

 Using integration by parts,

E[X] = ∫_0^{∞} x λ e^(−λx) dx = [−x e^(−λx)]_0^{∞} + ∫_0^{∞} e^(−λx) dx = 0 + 1/λ = 1/λ
Second moment of an Exponential Random Variable
 Similarly, integration by parts gives the second moment:
E[X²] = ∫_0^∞ x² λe^(−λx) dx = [−x² e^(−λx)]_0^∞ + ∫_0^∞ 2x e^(−λx) dx = (2/λ)E[X] = 2/λ².
Variance of an Exponential Random Variable
 Using the mean and the second moment computed above,
var(X) = E[X²] − (E[X])² = 2/λ² − 1/λ² = 1/λ².
Exercise
 The time until a small meteorite first lands anywhere on the earth is modelled as
an exponential random variable with a mean of 10 days. The time is currently
midnight. What is the probability that a meteorite first lands some time between
6am and 6pm of the first day?

 Let X be the time elapsed until the event of interest, measured in days. Then, X is
exponential, with mean 1/λ = 10, which yields λ = 1/10.

 The desired probability is
P(1/4 ≤ X ≤ 3/4) = P(X ≥ 1/4) − P(X ≥ 3/4) = e^(−1/40) − e^(−3/40) ≈ 0.0476.
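The arithmetic of this exercise can be reproduced in a few lines. A sketch assuming, as in the exercise, λ = 1/10 per day, with 6am and 6pm measured as 0.25 and 0.75 days from midnight:

```python
import math

lam = 1 / 10                     # arrival rate: mean time is 1/lam = 10 days

# 6am and 6pm of the first day, measured in days from midnight.
a, b = 0.25, 0.75

# P(a <= X <= b) = P(X >= a) - P(X >= b) = e^(-lam*a) - e^(-lam*b)
p = math.exp(-lam * a) - math.exp(-lam * b)
print(round(p, 4))               # 0.0476
```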
Cumulative Distribution Functions - CDF
 The CDF of a random variable X is denoted by F_X and provides the probability P(X ≤ x):
F_X(x) = P(X ≤ x) = Σ_(k ≤ x) p_X(k) if X is discrete, and ∫_(−∞)^x f_X(t) dt if X is continuous.
 The CDF F_X(x) “accumulates” probability “up to” the value x.
Properties of a CDF
 F_X is monotonically nondecreasing: if x ≤ y, then F_X(x) ≤ F_X(y).
 F_X(x) tends to 0 as x → −∞, and to 1 as x → ∞.
 If X is discrete, then F_X is a piecewise constant function of x.
 If X is continuous, then F_X is a continuous function of x.
Properties of a CDF
 If X is discrete and takes integer values, then p_X(k) = F_X(k) − F_X(k − 1) for all integers k.
 If X is continuous, then f_X(x) = (dF_X/dx)(x), for those x at which the CDF is differentiable.
The Geometric and Exponential CDFs
 Let X be a geometric random variable with parameter p; that is, X is the number of
trials to obtain the first success in a sequence of independent Bernoulli trials, where
the probability of success is p.
 The CDF of X is
F_X(n) = Σ_(k=1)^n p(1 − p)^(k−1) = 1 − (1 − p)^n, for n = 1, 2, . . .
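The closed form 1 − (1 − p)^n can be confirmed by summing the geometric PMF directly; the value p = 0.3 below is an arbitrary illustrative choice:

```python
p = 0.3                          # illustrative success probability

# Geometric PMF: p_X(k) = p * (1 - p)**(k - 1), for k = 1, 2, ...
def geometric_pmf(k):
    return p * (1 - p) ** (k - 1)

# The CDF by direct summation matches the closed form 1 - (1 - p)**n.
for n in range(1, 25):
    cdf_sum = sum(geometric_pmf(k) for k in range(1, n + 1))
    cdf_closed = 1 - (1 - p) ** n
    assert abs(cdf_sum - cdf_closed) < 1e-12
```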
CDFs of some continuous random variables.
 For example, if X is uniform on [a, b], then F_X(x) = 0 for x < a, (x − a)/(b − a) for a ≤ x ≤ b, and 1 for x > b.
Relation of the geometric and the exponential CDFs
 If the slot length δ is small and p = 1 − e^(−λδ), then the geometric CDF after n slots agrees with the exponential CDF at time nδ: 1 − (1 − p)^n = 1 − e^(−λnδ).
Normal Random Variables
 A continuous random variable X is said to be normal or Gaussian if it has a PDF of the form
f_X(x) = (1/(√(2π) σ)) e^(−(x − μ)²/(2σ²)),
 where μ and σ are two scalar parameters characterizing the PDF, with σ assumed positive. It can be verified that the normalization property
(1/(√(2π) σ)) ∫_(−∞)^∞ e^(−(x − μ)²/(2σ²)) dx = 1
holds.
Normal Random Variables

 A normal PDF and CDF, with μ = 1 and σ² = 1. We observe that the PDF is
symmetric around its mean μ, and has a characteristic bell shape. As x gets
further from μ, the term e^(−(x−μ)²/(2σ²)) decreases very rapidly. In this figure, the PDF is
very close to zero outside the interval [−1, 3].
Normal Random Variables
 The mean and the variance can be calculated to be
E[X] = μ, var(X) = σ².
 To see this, note that the PDF is symmetric around μ, so its mean must be μ.
Furthermore, the variance is given by
var(X) = (1/(√(2π) σ)) ∫_(−∞)^∞ (x − μ)² e^(−(x−μ)²/(2σ²)) dx.
 Using the change of variables y = (x − μ)/σ and integration by parts, we have
var(X) = (σ²/√(2π)) ∫_(−∞)^∞ y² e^(−y²/2) dy.
Normal Random Variables
 Using the change of variables y = (x − μ)/σ and integration by parts, we have
var(X) = (σ²/√(2π)) ∫_(−∞)^∞ y² e^(−y²/2) dy
= (σ²/√(2π)) [−y e^(−y²/2)]_(−∞)^∞ + (σ²/√(2π)) ∫_(−∞)^∞ e^(−y²/2) dy
= σ².
Standard Normal Random Variables
 The last equality above is obtained by using the fact
(1/√(2π)) ∫_(−∞)^∞ e^(−y²/2) dy = 1,
which is the normalization property of the standard normal PDF.
Normality is Preserved by Linear Transformations
 If X is a normal random variable with mean μ and variance σ², and if a ≠ 0, b are scalars, then the random variable Y = aX + b is also normal, with mean and variance
E[Y] = aμ + b, var(Y) = a²σ².
The Standard Normal Random Variable
 A normal random variable Y with zero mean and unit variance is said to be a
standard normal. Its CDF is denoted by Φ:
Φ(y) = P(Y ≤ y) = (1/√(2π)) ∫_(−∞)^y e^(−t²/2) dt.
 Let X be a normal random variable with mean μ and variance σ². We “standardize”
X by defining a new random variable Y given by
Y = (X − μ)/σ.
 Since Y is a linear transformation of X, it is normal. Furthermore,
E[Y] = (E[X] − μ)/σ = 0, var(Y) = var(X)/σ² = 1.
The Standard Normal Random Variable
 Thus, probabilities for X can be expressed in terms of Φ:
P(X ≤ x) = P((X − μ)/σ ≤ (x − μ)/σ) = Φ((x − μ)/σ).
CDF Calculation of the Normal Random Variable
 To calculate P(X ≤ x) for a normal random variable X with mean μ and variance σ²: standardize to obtain (x − μ)/σ, then read the value Φ((x − μ)/σ) from the standard normal table.
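Φ has no elementary closed form, but it can be evaluated with the error function, since Φ(y) = (1 + erf(y/√2))/2. A sketch of this (the parameters μ = 1, σ = 2 in the standardization check are arbitrary illustrative choices):

```python
import math

def Phi(y):
    # Standard normal CDF via the error function: Phi(y) = (1 + erf(y/sqrt(2)))/2.
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

# Standard normal table value used in the signal-detection exercise.
print(round(Phi(1.0), 4))        # 0.8413

# Standardization: if X ~ Normal(mu, sigma^2), then P(X <= x) = Phi((x - mu)/sigma).
mu, sigma = 1.0, 2.0             # illustrative parameters
x = 3.0
p = Phi((x - mu) / sigma)        # equals Phi(1.0)
assert abs(p - Phi(1.0)) < 1e-15
```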
Exercise - Signal Detection
 A binary message is transmitted as a signal S that is either −1 or +1. The communication channel corrupts the
transmission with additive normal noise N with mean μ = 0 and variance σ². The receiver concludes that the
signal −1 (or +1) was transmitted if the value received is < 0 (or ≥ 0, respectively). What is the probability of
error?

 The area of the shaded region gives the probability of error in the two cases where −1 and +1 is transmitted.
Exercise - Signal Detection
 An error occurs whenever −1 is transmitted and the noise N is at least 1, so that N + S = N − 1 ≥ 0, or whenever +1 is transmitted and the noise N is smaller than −1, so that N + S = N + 1 < 0.

 In the former case, the probability of error is
P(N − 1 ≥ 0) = P(N ≥ 1) = 1 − Φ((1 − 0)/σ) = 1 − Φ(1/σ);
by symmetry, the probability of error is the same in the latter case.
 For σ = 1, we have Φ(1/σ) = Φ(1) = 0.8413, and the probability of error is 1 − 0.8413 = 0.1587.
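The error probability 1 − Φ(1/σ) can be reproduced and cross-checked by simulation. A sketch for σ = 1 (the seed and the sample size are arbitrary choices):

```python
import math
import random

def Phi(y):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

sigma = 1.0
p_error = 1.0 - Phi(1.0 / sigma)           # analytic: 1 - Phi(1/sigma)
print(round(p_error, 4))                    # 0.1587

# Monte Carlo check: transmit S = -1; the receiver errs when S + N >= 0.
rng = random.Random(0)
n = 200_000
p_mc = sum(1 for _ in range(n) if -1.0 + rng.gauss(0.0, sigma) >= 0.0) / n
assert abs(p_mc - p_error) < 0.005
```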
CONDITIONING ON AN EVENT
 The conditional PDF of a continuous random variable X, conditioned on a
particular event A with P(A) > 0, is a function fX|A that satisfies
P(X ∈ B | A) = ∫_B f_(X|A)(x) dx, for any subset B of the real line.
CONDITIONING ON AN EVENT
 The conditional PDF of a continuous random variable X, conditioned on a
particular event A with P(A) > 0, is a function fX|A that satisfies
 In particular, if A is a subset of the real line with P(X ∈ A) > 0, then
f_(X|A)(x) = f_X(x)/P(X ∈ A) if x ∈ A, and 0 otherwise.
 The unconditional PDF fX and the conditional PDF fX|A, where A is the
interval [a, b]. Note that within the conditioning event A, fX|A retains the
same shape as fX, except that it is scaled along the vertical axis.
Conditional PDF and Expectation Given an Event
 The conditional expectation of X given an event A is defined by
E[X | A] = ∫ x f_(X|A)(x) dx.
 The expected value rule remains valid: E[g(X) | A] = ∫ g(x) f_(X|A)(x) dx.
Conditional PDF and Expectation Given an Event
 If A_1, . . . , A_n are disjoint events with P(A_i) > 0 that form a partition of the sample space, then
f_X(x) = Σ_(i=1)^n P(A_i) f_(X|A_i)(x), and E[X] = Σ_(i=1)^n P(A_i) E[X | A_i] (total expectation theorem).
Multiple Continuous Random Variables
 We say that two continuous random variables associated with a common experiment are jointly
continuous and can be described in terms of a joint PDF f_(X,Y), if f_(X,Y) is a nonnegative function that
satisfies
P((X, Y) ∈ B) = ∫∫_((x,y)∈B) f_(X,Y)(x, y) dx dy
 for every subset B of the two-dimensional plane. The notation above means that the integration is
carried out over the set B. In the particular case where B is a rectangle of the form B = [a, b] × [c, d], we
have
P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b f_(X,Y)(x, y) dx dy.
 Normalization property:
∫_(−∞)^∞ ∫_(−∞)^∞ f_(X,Y)(x, y) dx dy = 1.
Multiple Continuous Random Variables
 To interpret the PDF, we let δ be very small and consider the probability of a small
rectangle. We have
P(a ≤ X ≤ a + δ, c ≤ Y ≤ c + δ) = ∫_c^(c+δ) ∫_a^(a+δ) f_(X,Y)(x, y) dx dy ≈ f_(X,Y)(a, c) · δ²,
so f_(X,Y)(a, c) is the probability per unit area in the vicinity of (a, c).
Marginal PDF
 The joint PDF contains all conceivable probabilistic information on the random
variables X and Y, as well as their dependencies. It allows us to calculate the
probability of any event that can be defined in terms of these two random variables.
 As a special case, it can be used to calculate the probability of an event involving
only one of them. For example, let A be a subset of the real line and consider the
event {X ∈ A}. We have
P(X ∈ A) = P(X ∈ A, Y ∈ (−∞, ∞)) = ∫_A ∫_(−∞)^∞ f_(X,Y)(x, y) dy dx,
so the marginal PDF of X is f_X(x) = ∫_(−∞)^∞ f_(X,Y)(x, y) dy.
Expectation
 If X and Y are jointly continuous random variables, and g is some function, then
Z = g(X, Y) is also a random variable. We will see in Section 3.6 methods for
computing the PDF of Z, if it has one. For now, let us note that the expected
value rule is still applicable:
E[g(X, Y)] = ∫_(−∞)^∞ ∫_(−∞)^∞ g(x, y) f_(X,Y)(x, y) dx dy.
 As an important special case, for any scalars a, b, we have
E[aX + bY] = aE[X] + bE[Y].
Conditioning one Random Variable on Another
 Let X and Y be continuous random variables with joint PDF f_(X,Y). For any y with f_Y(y) > 0, the conditional PDF of X, given that Y = y, is defined by
f_(X|Y)(x | y) = f_(X,Y)(x, y)/f_Y(y).
Circular Uniform PDF
 We throw a dart at a circular target of radius r. We assume that we always hit the
target, and that all points of impact (x, y) are equally likely, so that the joint PDF
of the random variables X and Y is uniform. Since the area of the circle is πr²,
we have
f_(X,Y)(x, y) = 1/(πr²) if x² + y² ≤ r², and 0 otherwise.
Circular Uniform PDF
 To calculate the conditional PDF f_(X|Y)(x|y), let us first calculate the marginal PDF
f_Y(y). For |y| > r, it is zero. For |y| ≤ r, it can be calculated as follows:
f_Y(y) = ∫ f_(X,Y)(x, y) dx = (1/(πr²)) ∫_(−√(r²−y²))^(√(r²−y²)) dx = 2√(r² − y²)/(πr²).
 Note that the marginal f_Y(y) is not a uniform PDF.
Circular Uniform PDF
 The conditional PDF is
f_(X|Y)(x | y) = f_(X,Y)(x, y)/f_Y(y) = 1/(2√(r² − y²)) for x² + y² ≤ r².
 Thus, given that Y = y, X is uniformly distributed on the interval [−√(r² − y²), √(r² − y²)].
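The non-uniform marginal can be seen in simulation. A sketch (r = 1, the seed, and the sample sizes are arbitrary choices) that draws uniform points on the disk by rejection sampling and compares P(|Y| ≤ r/2) against a numerical integral of f_Y(y) = 2√(r² − y²)/(πr²):

```python
import math
import random

r = 1.0
rng = random.Random(0)

# Rejection sampling: draw uniform points in the square until one lands in the disk.
def sample_point():
    while True:
        x, y = rng.uniform(-r, r), rng.uniform(-r, r)
        if x * x + y * y <= r * r:
            return x, y

n = 100_000
ys = [sample_point()[1] for _ in range(n)]

# The marginal f_Y(y) = 2*sqrt(r^2 - y^2)/(pi*r^2) is not uniform:
# P(|Y| <= r/2) exceeds the 0.5 that a uniform marginal would give.
p_mc = sum(1 for y in ys if abs(y) <= r / 2) / n

# Numerical integration of the marginal over [-r/2, r/2].
steps = 100_000
dy = r / steps
p_num = sum(2 * math.sqrt(r * r - (i * dy) ** 2) / (math.pi * r * r) * dy
            for i in range(-steps // 2, steps // 2))

assert p_num > 0.55
assert abs(p_mc - p_num) < 0.01
```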
Independence of Continuous Random Variables
 Two continuous random variables X and Y are independent if their joint PDF is the product of the marginal PDFs:
f_(X,Y)(x, y) = f_X(x) f_Y(y), for all x, y.
Joint CDFs
 The joint CDF of X and Y is defined by F_(X,Y)(x, y) = P(X ≤ x, Y ≤ y). If X and Y have a joint PDF, then
f_(X,Y)(x, y) = (∂²F_(X,Y)/∂x∂y)(x, y), wherever the derivative exists.
Expectation or mean
 We have seen that the mean of a function Y = g(X) of a continuous random
variable X can be calculated using the expected value rule:
E[g(X)] = ∫_(−∞)^∞ g(x) f_X(x) dx.
DERIVED DISTRIBUTIONS
 The PDF of a function Y = g(X) of a continuous random variable X is found in two steps:
 Calculate the CDF F_Y of Y using the formula
F_Y(y) = P(g(X) ≤ y) = ∫_({x | g(x) ≤ y}) f_X(x) dx.
 Differentiate to obtain the PDF f_Y(y) = (dF_Y/dy)(y).
Homework Exercise
 John Slow is driving from Boston to the New York area, a distance of 180 miles. His average speed
is uniformly distributed between 30 and 60 miles per hour. What is the PDF of the duration of the
trip?

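Before solving analytically, the answer can be sanity-checked by simulation. A sketch (the seed and the sample size are arbitrary) that samples the speed, forms T = 180/V, and checks one point of the CDF using P(T ≤ t) = P(V ≥ 180/t):

```python
import random

rng = random.Random(0)
distance = 180.0                  # miles
n = 100_000

# Speed V is uniform on [30, 60] mph; the trip duration is T = distance / V.
durations = [distance / rng.uniform(30.0, 60.0) for _ in range(n)]

# T is confined to [180/60, 180/30] = [3, 6] hours.
assert 3.0 <= min(durations) <= max(durations) <= 6.0

# Empirical check of the CDF at one point: P(T <= t) = P(V >= 180/t).
t = 4.0
p_mc = sum(1 for d in durations if d <= t) / n
p_exact = (60.0 - distance / t) / 30.0     # = P(V >= 45) = 0.5
assert abs(p_mc - p_exact) < 0.01
```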
Stochastic process
 A stochastic process is a mathematical model of a probabilistic experiment that
evolves in time and generates a sequence of numerical values. For example,
a stochastic process can be used to model:

 (a) the sequence of daily prices of a stock;

 (b) the sequence of scores in a football game;

 (c) the sequence of failure times of a machine;

 (d) the sequence of hourly traffic loads at a node of a communication network;

 (e) the sequence of radar measurements of the position of an airplane.

Two major categories of stochastic processes
 Arrival-Type Processes:

 Here, we are interested in occurrences that have the character of an “arrival,”
such as message receptions at a receiver, job completions in a manufacturing
cell, customer purchases at a store, etc.
 Inter-arrival times (the times between successive arrivals) are independent
random variables.
 If the inter-arrival times are geometrically distributed, the process is a Bernoulli process.
 If the inter-arrival times are exponentially distributed, the process is a Poisson process.
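The Bernoulli case can be simulated directly. A sketch (p = 0.2, the seed, and the number of slots are arbitrary choices) that records inter-arrival times and checks that their average matches the geometric mean 1/p:

```python
import random

p = 0.2                           # illustrative probability of an arrival per time slot
rng = random.Random(0)

# Simulate a Bernoulli process: in each slot an arrival occurs with probability p.
inter_arrivals = []
gap = 0
for _ in range(500_000):
    gap += 1
    if rng.random() < p:          # arrival in this slot
        inter_arrivals.append(gap)
        gap = 0

# Inter-arrival times of a Bernoulli process are geometric with mean 1/p.
mean_mc = sum(inter_arrivals) / len(inter_arrivals)
assert abs(mean_mc - 1 / p) < 0.1
```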
Two major categories of stochastic processes

 Markov Processes

 Experiments that evolve in time and in which the future evolution exhibits a
probabilistic dependence on the past.

 As an example, the future daily prices of a stock are typically dependent on past
prices.
