Introduction To Probability Theory

Probability Models

Dr. Rahul J. Pandya,

Department of Electrical Engineering,
Indian Institute of Technology (IIT), Dharwad
Reference books

 Introduction to Probability by Dimitri P. Bertsekas and John N. Tsitsiklis


 Introduction to probability models by Sheldon Ross


 Stochastic process by Sheldon Ross

 Stochastic process and models by David Stirzaker

Introduction to Probability theory

 Introduction to Probability theory:

 Review of sample space, events, axioms of probability

 Ref. Book - Introduction to Probability by Dimitri P. Bertsekas and

John N. Tsitsiklis


• A set is a collection of objects, which are the elements of the set.

• If S is a set and x is an element of S, we write x ∈ S.

• If x is not an element of S, we write x ∉ S.

• A set can have no elements, in which case it is called the empty set, denoted by ∅.

• If a set S contains a finite number of elements, say x1, x2, . . . , xn, we write it as a list of the elements, in braces:
S = {x1, x2, . . . , xn}

 For example, the set of possible outcomes of a die roll is

S = {1, 2, 3, 4, 5, 6}

• The set of possible outcomes of a coin toss is {H, T}, where H stands for “heads” and T stands for “tails”:
S = {H, T}
• If S contains infinitely many elements x1, x2, . . . that can be enumerated in a list, we write
S = {x1, x2, . . .}

• Alternatively, we can consider the set of all x that have a certain property P, and denote it by
{x | x satisfies P}

• The symbol “|” is to be read as “such that.”

• E.g., the set of all scalars x in the interval [0, 1] can be written as
{x | 0 ≤ x ≤ 1}

• If every element of a set S is also an element of a set T, we say that S is a subset of T, and we write S ⊂ T or T ⊃ S.

• It is convenient to introduce a universal set, denoted by Ω, which contains all objects of interest in a particular context.

• The complement of a set S, with respect to the universe Ω, is
S^c = {x ∈ Ω | x ∉ S}
• S^c is the set of all elements of Ω that do not belong to S.
Set operations
• The complement of the universal set is the empty set: Ω^c = ∅.

Venn diagrams
• The union of two sets S and T is the set of all elements that belong to S or T (or both), and is denoted by S ∪ T:
S ∪ T = {x | x ∈ S or x ∈ T}

• The intersection of two sets S and T is the set of all elements that belong to both S and T, and is denoted by S ∩ T:
S ∩ T = {x | x ∈ S and x ∈ T}

• In some cases we need to consider the union or the intersection of several sets.

• If for every positive integer n we are given a set Sn, then
S1 ∪ S2 ∪ · · · = {x | x ∈ Sn for some n}
S1 ∩ S2 ∩ · · · = {x | x ∈ Sn for all n}

• Two sets are said to be disjoint if their intersection is empty. More generally, several sets are said to be disjoint if no two of them have a common element.

• A collection of sets is said to be a partition of a set S if the sets in the collection are disjoint and their union is S.

The Algebra of Sets

de Morgan’s laws
(S1 ∪ S2 ∪ · · ·)^c = S1^c ∩ S2^c ∩ · · ·
(S1 ∩ S2 ∩ · · ·)^c = S1^c ∪ S2^c ∪ · · ·

Probabilistic Models

A probabilistic model is a mathematical description of an uncertain situation

Elements of a Probabilistic Model

• The sample space Ω, which is the set of all possible outcomes of an experiment.
• The probability law, which assigns to a set A of possible outcomes (also called
an event) a nonnegative number P(A) (called the probability of A)
Sample Spaces and Events
Sample Space:

 Every probabilistic model involves an underlying process, called the experiment,

that will produce exactly one out of several possible outcomes.

 The set of all possible outcomes is called the sample space of the experiment,
and is denoted by Ω.

 The sample space of an experiment may consist of a finite or an infinite number of

possible outcomes.


 A subset of the sample space, that is, a collection of possible outcomes, is called
an event.
Sample Spaces and Events
1. If the experiment consists of flipping a coin, then the sample space is
Ω = {H, T}
where H means that the outcome of the toss is a head and T that it is a tail.

2. If the experiment consists of rolling a die, then the sample space is
Ω = {1, 2, 3, 4, 5, 6}

3. If the experiment consists of flipping two coins, then the sample space consists of the following four points:
Ω = {(H, H), (H, T), (T, H), (T, T)}
The outcome will be (H, H) if both coins come up heads; it will be (H, T) if the first coin comes up heads and the second comes up tails; it will be (T, H) if the first comes up tails and the second heads; and it will be (T, T) if both coins come up tails.
Homework
Example 1.1. Consider two alternative games, both involving ten successive coin tosses:

Game 1: We receive $1 each time a head comes up.

Game 2: We receive $1 for every coin toss, up to and including the first time a head
comes up. Then, we receive $2 for every coin toss, up to the second time a head
comes up. More generally, the dollar amount per toss is doubled each time a head
comes up.

Sequential Models

• Many experiments have an inherently sequential character, e.g.:
  - Tossing a coin three times
  - Observing a stock price on five successive days

• Such an experiment and its associated sample space can be described by means of a tree-based sequential description.

[Figure: Sample space of an experiment involving two rolls of a 4-sided die]

If the experiment consists of rolling two dice, then the sample space consists of the following 36 points:
Ω = {(i, j) | i, j = 1, 2, . . . , 6}
where the outcome (i, j) is said to occur if i appears on the first die and j on the second die.
Probability Axioms
1. (Non-negativity) P(A) ≥ 0, for every event A.

2. (Additivity) If A and B are two disjoint events, then the probability of their union satisfies
P(A ∪ B) = P(A) + P(B)

Furthermore, if the sample space has an infinite number of elements and A1, A2, . . . is a sequence of disjoint events, then the probability of their union satisfies

P(A1 ∪ A2 ∪ · · ·) = P(A1) + P(A2) + · · ·

3. (Normalization) The probability of the entire sample space Ω is equal to 1,

P(Ω) = 1
Probability Axioms
Example: Coin tosses. Consider an experiment involving a single coin toss. There are two possible outcomes, heads (H) and tails (T). The sample space is Ω = {H, T}, and the events are
{H, T}, {H}, {T}, ∅

• If the coin is fair, i.e., if we believe that heads and tails are “equally likely,” we should assign equal probabilities to the two possible outcomes and specify that
P({H}) = P({T}) = 0.5

• The additivity axiom implies that
P({H, T}) = P({H}) + P({T}) = 1

• This is consistent with the normalization axiom. Thus, the probability law is given by
P({H, T}) = 1, P({H}) = 0.5, P({T}) = 0.5, P(∅) = 0

• Consider another experiment involving three coin tosses. The outcome will now be a 3-long string of heads or tails. The sample space is
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

• Assume that each possible outcome has the same probability of 1/8.

• Using additivity, the probability of an event A is the sum of the probabilities of its elements. For example, if A = {exactly 2 heads occur} = {HHT, HTH, THH}, then
P(A) = P({HHT}) + P({HTH}) + P({THH}) = 1/8 + 1/8 + 1/8 = 3/8

Discrete Probability Law
• If the sample space consists of a finite number of possible outcomes, then the probability law is specified by the probabilities of the events that consist of a single element. In particular, the probability of any event {s1, s2, . . . , sn} is the sum of the probabilities of its elements:
P({s1, s2, . . . , sn}) = P(s1) + P(s2) + · · · + P(sn)

Discrete Uniform Probability Law

• If the sample space consists of n possible outcomes which are equally likely (i.e., all single-element events have the same probability), then the probability of any event A is given by
P(A) = (number of elements of A) / n

Consider the experiment of rolling a pair of 4-sided dice. We assume the dice are fair, and we interpret this assumption to mean that each of the sixteen possible outcomes [ordered pairs (i, j), with i, j = 1, 2, 3, 4] has the same probability of 1/16. To calculate the probability of an event, we must count the number of elements of the event and divide by 16 (the total number of possible outcomes).

Sample space:

1, 1; 2, 1; 3, 1; 4, 1;
1, 2; 2, 2; 3, 2; 4, 2;
1, 3; 2, 3; 3, 3; 4, 3;
1, 4; 2, 4; 3, 4; 4, 4;

Here are some event probabilities calculated in this way:
P({the sum of the rolls is even}) = 8/16 = 1/2
P({the sum of the rolls is odd}) = 8/16 = 1/2
P({the first roll is equal to the second}) = 4/16 = 1/4
P({the first roll is larger than the second}) = 6/16 = 3/8
P({at least one roll is equal to 4}) = 7/16
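
The counting recipe for the discrete uniform law can be checked by brute-force enumeration. The following is only a minimal sketch: the helper name `prob` and the particular events are chosen here for illustration and do not come from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 16 ordered pairs (i, j) from two rolls of a 4-sided die.
omega = list(product(range(1, 5), repeat=2))

def prob(event):
    """Discrete uniform law: (number of outcomes in the event) / 16."""
    count = sum(1 for outcome in omega if event(outcome))
    return Fraction(count, len(omega))

# Illustrative events (assumed for this sketch).
p_sum_even = prob(lambda o: (o[0] + o[1]) % 2 == 0)
p_equal = prob(lambda o: o[0] == o[1])
p_first_larger = prob(lambda o: o[0] > o[1])
p_some_four = prob(lambda o: 4 in o)

print(p_sum_even, p_equal, p_first_larger, p_some_four)
```

Exact fractions are used instead of floats so that the results can be compared directly with hand-counted values such as 7/16.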

Continuous Models
 Probabilistic models with continuous sample spaces differ from their discrete counterparts in that the
probabilities of the single-element events may not be sufficient to characterize the probability law.

 This is illustrated in the following examples, which also illustrate how to generalize the uniform
probability law to the case of a continuous sample space.

Properties of Probability Laws


Conditional Probability
 Conditional probability provides us with a way to reason about the
outcome of an experiment, based on partial information

• The conditional probability of A given B, denoted by P(A|B), is defined, assuming P(B) > 0, as
P(A|B) = P(A ∩ B) / P(B)

 Example: All six possible outcomes of a fair die roll are equally likely. If we are
told that the outcome is even, we are left with only three possible outcomes,
namely, 2, 4, and 6. These three outcomes were equally likely to start with, and so
they should remain equally likely given the additional knowledge that the outcome
was even.

Probability Law

Additive properties

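• Conditional probabilities satisfy the probability axioms. To verify the additivity axiom, consider two disjoint events A1 and A2:
P(A1 ∪ A2 | B) = P((A1 ∪ A2) ∩ B) / P(B)
= P((A1 ∩ B) ∪ (A2 ∩ B)) / P(B)
= (P(A1 ∩ B) + P(A2 ∩ B)) / P(B)
= P(A1|B) + P(A2|B)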

• We toss a fair coin three successive times. We wish to find the conditional probability P(A|B) when A and B are the events defined as follows:

A = {more heads than tails come up}, B = {1st toss is a head}

• The sample space consists of eight equally likely sequences:
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

• The event B consists of the four elements HHH, HHT, HTH, HTT, so its probability is
P(B) = 4/8

• The event A ∩ B consists of the three outcomes HHH, HHT, HTH, so its probability is
P(A ∩ B) = 3/8

• Thus, the conditional probability is
P(A|B) = P(A ∩ B) / P(B) = (3/8) / (4/8) = 3/4

• A fair 4-sided die is rolled twice and we assume that all sixteen possible outcomes are equally likely. Let X and Y be the result of the 1st and the 2nd roll, respectively. We wish to determine the conditional probability P(A|B), where
A = {max(X, Y) = m}, B = {min(X, Y) = 2}
and m takes each of the values 1, 2, 3, 4.

• The conditioning event B = {min(X, Y) = 2} consists of the 5-element set
B = {(2,2), (2,3), (2,4), (3,2), (4,2)}

• The set A = {max(X, Y) = m}, for each of the values m = 1, 2, 3, 4, is
m = 1: A = {(1,1)}
m = 2: A = {(2,1), (1,2), (2,2)}
m = 3: A = {(3,1), (3,2), (3,3), (1,3), (2,3)}
m = 4: A = {(4,1), (4,2), (4,3), (4,4), (1,4), (2,4), (3,4)}

• The set A = {max(X, Y) = m} shares with B two elements if m = 3 or m = 4, one element if m = 2, and no element if m = 1. Thus,
P(A|B) = 2/5 if m = 3 or m = 4
P(A|B) = 1/5 if m = 2
P(A|B) = 0 if m = 1
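
When all outcomes are equally likely, P(A|B) reduces to counting the elements of A ∩ B and of B. The sketch below applies this to the max/min example; the function names `cond_prob`, `is_b` and `is_a` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# (X, Y) pairs for two rolls of a fair 4-sided die, all equally likely.
omega = list(product(range(1, 5), repeat=2))

def cond_prob(a, b):
    """P(A|B) = |A ∩ B| / |B| under the uniform law (B must be non-empty)."""
    in_b = [o for o in omega if b(o)]
    in_both = [o for o in in_b if a(o)]
    return Fraction(len(in_both), len(in_b))

def is_b(o):
    return min(o) == 2          # conditioning event B = {min(X, Y) = 2}

def is_a(m):
    return lambda o: max(o) == m  # event A = {max(X, Y) = m}

result = {m: cond_prob(is_a(m), is_b) for m in range(1, 5)}
print(result)
```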

 A conservative design team, call it C, and an innovative design team,

call it N, are asked to separately design a new product within a month.
From past experience we know that:

 (a) The probability that team C is successful is 2/3

 (b) The probability that team N is successful is 1/2.
 (c) The probability that at least one team is successful is 3/4.

 If both teams are successful, the design of team N is adopted.

 Assuming that exactly one successful design is produced, what is the

probability that it was designed by team N?
• There are four possible outcomes here, corresponding to the four combinations of success and failure of the two teams:
  - SS: both succeed
  - FF: both fail
  - SF: C succeeds, N fails
  - FS: C fails, N succeeds

• The given information translates into equations for the probabilities of these outcomes:
(a) The probability that team C is successful is 2/3: P(SS) + P(SF) = 2/3
(b) The probability that team N is successful is 1/2: P(SS) + P(FS) = 1/2
(c) The probability that at least one team is successful is 3/4: P(SS) + P(SF) + P(FS) = 3/4

• Together with the normalization equation
P(SS) + P(SF) + P(FS) + P(FF) = 1
these can be solved to give
P(SS) = 5/12, P(SF) = 1/4, P(FS) = 1/12, P(FF) = 1/4

• Assuming that exactly one successful design is produced, the probability that it was designed by team N is
P(FS | {SF, FS}) = P(FS) / (P(SF) + P(FS)) = (1/12) / (1/4 + 1/12) = 1/4
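
The design-team example amounts to solving a small linear system and then conditioning. A minimal sketch with exact fractions follows; the variable names are chosen here for illustration.

```python
from fractions import Fraction as F

p_c = F(2, 3)    # P(SS) + P(SF): team C succeeds
p_n = F(1, 2)    # P(SS) + P(FS): team N succeeds
p_any = F(3, 4)  # P(SS) + P(SF) + P(FS): at least one team succeeds

p_ss = p_c + p_n - p_any  # P(C and N) = P(C) + P(N) - P(C or N)
p_sf = p_c - p_ss
p_fs = p_n - p_ss
p_ff = 1 - p_any

# Condition on "exactly one success" = {SF, FS}.
p_n_given_exactly_one = p_fs / (p_sf + p_fs)
print(p_ss, p_sf, p_fs, p_ff, p_n_given_exactly_one)
```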

Multiplication Rule
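• Assuming that all of the conditioning events have positive probability, we have
P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) · · · P(An|A1 ∩ · · · ∩ An−1)

• The multiplication rule can be verified by writing
P(A1 ∩ · · · ∩ An) = P(A1) · [P(A1 ∩ A2) / P(A1)] · [P(A1 ∩ A2 ∩ A3) / P(A1 ∩ A2)] · · · [P(A1 ∩ · · · ∩ An) / P(A1 ∩ · · · ∩ An−1)]

• and by using the definition of conditional probability to rewrite each factor on the right-hand side above.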

Visualization of the total probability theorem

Total Probability Theorem
 Let A1, . . . , An be disjoint events that form a partition of the sample space (each
possible outcome is included in one and only one of the events A1, . . . , An) and
assume that P(Ai) > 0, for all i = 1, . . . , n.

• Then, for any event B, we have
P(B) = P(A1 ∩ B) + · · · + P(An ∩ B)
= P(A1) P(B|A1) + · · · + P(An) P(B|An)

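• Three cards are drawn from an ordinary 52-card deck without replacement (drawn cards are not placed back in the deck). We wish to find the probability that none of the three cards is a heart. We assume that at each step, each one of the remaining cards is equally likely to be picked. By symmetry, this implies that every triplet of cards is equally likely to be drawn. A cumbersome approach, which we will not use, is to count the number of all card triplets that do not include a heart, and divide it by the number of all possible card triplets. Instead, we use a sequential description of the sample space in conjunction with the multiplication rule.

• Define the events Ai = {the ith card is not a heart}, i = 1, 2, 3. Since 39 of the 52 cards are not hearts, and each non-heart drawn leaves one fewer non-heart and one fewer card in the deck,
P(A1) = 39/52, P(A2|A1) = 38/51, P(A3|A1 ∩ A2) = 37/50
so that, by the multiplication rule,
P(A1 ∩ A2 ∩ A3) = (39/52) · (38/51) · (37/50)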
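
The sequential calculation can be cross-checked against the "cumbersome" counting approach. The sketch below does both; the function name `prob_no_hearts` and its default arguments are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def prob_no_hearts(draws, deck=52, hearts=13):
    """Multiplication rule: P(A1) P(A2|A1) ..., Ai = {ith card is not a heart}."""
    p = Fraction(1)
    for i in range(draws):
        # After i non-hearts are drawn, (deck - hearts - i) non-hearts remain
        # among (deck - i) cards.
        p *= Fraction(deck - hearts - i, deck - i)
    return p

p_sequential = prob_no_hearts(3)
# Counting approach: heart-free triplets over all triplets.
p_counting = Fraction(comb(39, 3), comb(52, 3))
print(p_sequential, float(p_sequential))
```

Both routes give the same exact fraction, which is the point of the symmetry remark in the example.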

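• You enter a chess tournament where your probability of winning a game is 0.3 against half the players (call them type 1), 0.4 against a quarter of the players (call them type 2), and 0.5 against the remaining quarter of the players (call them type 3). You play a game against a randomly chosen opponent.
• What is the probability of winning? Let Ai be the event of playing with an opponent of type i, and let B be the event of winning. We have
P(A1) = 0.5, P(A2) = 0.25, P(A3) = 0.25

• The total probability theorem follows from the fact that B is the union of the disjoint events A1 ∩ B, . . . , An ∩ B. Using the additivity axiom, it follows that
P(B) = P(A1 ∩ B) + · · · + P(An ∩ B)

• Since, by the definition of conditional probability, we have
P(Ai ∩ B) = P(Ai) P(B|Ai)
the preceding equality yields
P(B) = P(A1) P(B|A1) + · · · + P(An) P(B|An)

• For the event B of winning, we have
P(B|A1) = 0.3, P(B|A2) = 0.4, P(B|A3) = 0.5

• Thus, by the total probability theorem, the probability of winning is
P(B) = P(A1) P(B|A1) + P(A2) P(B|A2) + P(A3) P(B|A3)
= 0.5 · 0.3 + 0.25 · 0.4 + 0.25 · 0.5 = 0.375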
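
The total probability theorem is a weighted sum over the partition, which is easy to express in code. A minimal sketch for the chess example, with dictionary names chosen here for illustration:

```python
from fractions import Fraction as F

# P(Ai): probability of facing an opponent of type i.
prior = {1: F(1, 2), 2: F(1, 4), 3: F(1, 4)}
# P(B|Ai): probability of winning against an opponent of type i.
win_given = {1: F(3, 10), 2: F(2, 5), 3: F(1, 2)}

# Total probability theorem: P(B) = sum over i of P(Ai) P(B|Ai).
p_win = sum(prior[i] * win_given[i] for i in prior)
print(p_win, float(p_win))
```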

Homework exercise
We roll a fair four-sided die. If the result is 1 or 2, we roll once more but otherwise,
we stop. What is the probability that the sum total of our rolls is at least 4?

Homework exercise
Alice is taking a probability class and at the end of each week she can be either up-to-date or she may have fallen behind. If she is up-to-date in a given week, the probability that she will be up-to-date (or behind) in the next week is 0.8 (or 0.2, respectively). If she is behind in a given week, the probability that she will be up-to-date (or behind) in the next week is 0.6 (or 0.4, respectively). Alice is (by default) up-to-date when she starts the class. What is the probability that she is up-to-date after three weeks?

Bayes’ Rule
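• Let A1, A2, . . . , An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0, for all i. Then, for any event B such that P(B) > 0, we have
P(Ai|B) = P(Ai) P(B|Ai) / P(B)

• Applying the total probability theorem to P(B),
P(Ai|B) = P(Ai) P(B|Ai) / (P(A1) P(B|A1) + · · · + P(An) P(B|An))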

Bayes’ Rule
• An example of the inference context that is implicit in Bayes’ rule. We observe a shade in a person’s X-ray (this is event B, the “effect”) and we want to estimate the likelihood of three mutually exclusive and collectively exhaustive potential causes: cause 1 (event A1) is that there is a malignant tumor, cause 2 (event A2) is that there is a non-malignant tumor, and cause 3 (event A3) corresponds to reasons other than a tumor. We assume that we know the probabilities P(Ai) and P(B|Ai), i = 1, 2, 3. Given that we see a shade (event B occurs), Bayes’ rule gives the conditional probabilities of the various causes as
P(Ai|B) = P(Ai) P(B|Ai) / (P(A1) P(B|A1) + P(A2) P(B|A2) + P(A3) P(B|A3)), i = 1, 2, 3

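• Let us return to the chess tournament problem considered earlier.

• Ai is the event of getting an opponent of type i, with P(A1) = 0.5, P(A2) = 0.25, P(A3) = 0.25.

• B is the event of winning, with P(B|A1) = 0.3, P(B|A2) = 0.4, P(B|A3) = 0.5.

• Suppose that you win. What is the probability P(A1|B) that you had an opponent of type 1?

• Using Bayes’ rule, we have
P(A1|B) = P(A1) P(B|A1) / (P(A1) P(B|A1) + P(A2) P(B|A2) + P(A3) P(B|A3))
= (0.5 · 0.3) / (0.5 · 0.3 + 0.25 · 0.4 + 0.25 · 0.5)
= 0.15 / 0.375 = 0.4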
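
Bayes' rule can be sketched as a small function that normalizes the products P(Ai) P(B|Ai) by their sum. The function name `posterior` and the dictionary layout are assumptions for illustration, applied here to the chess data.

```python
from fractions import Fraction as F

prior = {1: F(1, 2), 2: F(1, 4), 3: F(1, 4)}        # P(Ai)
likelihood = {1: F(3, 10), 2: F(2, 5), 3: F(1, 2)}  # P(B|Ai)

def posterior(prior, likelihood):
    """Bayes' rule: P(Ai|B) = P(Ai) P(B|Ai) / sum_j P(Aj) P(B|Aj)."""
    evidence = sum(prior[i] * likelihood[i] for i in prior)  # P(B)
    return {i: prior[i] * likelihood[i] / evidence for i in prior}

post = posterior(prior, likelihood)
print(post)
```

The posterior probabilities sum to 1, which is a quick sanity check on any Bayes computation.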

Independent Events Vs Mutually Exclusive Events

• What are independent events?

• Independent events are events for which the occurrence of one does not affect the probability of the other.

• For example, suppose we flip a coin and get a head, and then flip it again and get a tail. The result of the first flip has no effect on the result of the second, so the two events are independent of each other.

• If the probability of occurrence of an event A is not affected by the occurrence of another event B, then A and B are said to be independent events.

• Consider an example of rolling a die. If A is the event ‘the number appearing is odd’ and B is the event ‘the number appearing is a multiple of 3’, then prove that A and B are independent events and compute P(A ∩ B) and P(A|B).

• A is the event ‘the number appearing is odd’: A = {1, 3, 5}

• P(A) = 3/6 = 1/2

• B is the event ‘the number appearing is a multiple of 3’: B = {3, 6}

• P(B) = 2/6 = 1/3

• Also, A ∩ B is the event ‘the number appearing is odd and a multiple of 3’, so that A ∩ B = {3} and P(A ∩ B) = 1/6.
• Since P(A) P(B) = (3/6)(2/6) = 1/6 = P(A ∩ B), the events A and B are independent.
• P(A|B) = P(A ∩ B) / P(B) = (1/6) / (2/6) = 0.5, which equals P(A), as expected.
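
The independence check for the die example can be reproduced with Python sets. This is only a sketch; the helper `P` assumes equally likely outcomes.

```python
from fractions import Fraction

omega = set(range(1, 7))                  # outcomes of one die roll
A = {w for w in omega if w % 2 == 1}      # the number appearing is odd
B = {w for w in omega if w % 3 == 0}      # the number is a multiple of 3

def P(event):
    """Uniform probability of an event given as a subset of omega."""
    return Fraction(len(event), len(omega))

independent = P(A & B) == P(A) * P(B)     # definition of independence
p_a_given_b = P(A & B) / P(B)
print(independent, p_a_given_b)
```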
Independence and Multiplication rule of probability
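• A is independent of B if the occurrence of B does not change the probability of A:
P(A|B) = P(A)

• Equivalently, by the definition of conditional probability,
P(A ∩ B) = P(A) P(B)

• Consider an experiment involving two successive rolls of a 4-sided die in which all 16 possible outcomes are equally likely and have probability 1/16.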


Conditional Independence
• Given an event C, the events A and B are called conditionally independent if
P(A ∩ B|C) = P(A|C) P(B|C)


Summary - Independence
• Two events A and B are said to be independent if
P(A ∩ B) = P(A) P(B)

• If, in addition, P(B) > 0, independence is equivalent to the condition
P(A|B) = P(A)

• If A and B are independent, so are A and B^c.

• Two events A and B are said to be conditionally independent, given another event C with P(C) > 0, if
P(A ∩ B|C) = P(A|C) P(B|C)

• If, in addition, P(B ∩ C) > 0, conditional independence is equivalent to the condition
P(A|B ∩ C) = P(A|C)

Independence of a Collection of Events
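• The events A1, A2, . . . , An are independent if, for every subset S of {1, 2, . . . , n}, the probability of the intersection of the events Ai with i in S equals the product of their probabilities P(Ai).

• For the case of three events A1, A2, and A3, independence amounts to satisfying the four conditions
P(A1 ∩ A2) = P(A1) P(A2)
P(A1 ∩ A3) = P(A1) P(A3)
P(A2 ∩ A3) = P(A2) P(A3)
P(A1 ∩ A2 ∩ A3) = P(A1) P(A2) P(A3)

• The first three conditions simply assert that any two events are independent, a property known as pairwise independence. But the fourth condition is also important and does not follow from the first three. Conversely, the fourth condition does not imply the first three.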
• Prove that pairwise independence does not imply independence.

• Consider two independent fair coin tosses, with sample space {HH, HT, TH, TT}, and the following events:
  - H1 = {1st toss is a head} = {HH, HT}
  - H2 = {2nd toss is a head} = {HH, TH}
  - D = {the two tosses have different results} = {HT, TH}

• The events H1 and H2 are independent, since the tosses are independent.

• To see that H1 and D are independent, note that
P(D|H1) = P(H1 ∩ D) / P(H1) = (1/4) / (1/2) = 1/2 = P(D)

• Similarly, H2 and D are independent.

• However, H1, H2, and D are not independent, since
P(H1 ∩ H2 ∩ D) = 0 ≠ 1/8 = P(H1) P(H2) P(D)
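
The counterexample can be verified mechanically by enumerating the four outcomes. A minimal sketch, with the helper `P` assuming equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=2)]  # HH, HT, TH, TT

def P(event):
    """Uniform probability of an event given as a set of outcomes."""
    return Fraction(len(event), len(omega))

H1 = {w for w in omega if w[0] == "H"}   # 1st toss is a head
H2 = {w for w in omega if w[1] == "H"}   # 2nd toss is a head
D = {w for w in omega if w[0] != w[1]}   # the two tosses differ

# Pairwise independence: every pair satisfies the product rule.
pairwise = all(P(a & b) == P(a) * P(b) for a, b in [(H1, H2), (H1, D), (H2, D)])
# The triple product rule fails, so the three events are not independent.
triple_rule = P(H1 & H2 & D) == P(H1) * P(H2) * P(D)
print(pairwise, triple_rule)
```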

Series Connection
• Let a subsystem consist of components 1, 2, . . . , m, and let pi be the probability that component i is up (“succeeds”). We assume that the components fail independently of each other.

• A series subsystem succeeds if all of its components are up.

• Its probability of success is the product of the probabilities of success of the corresponding components:
P(series subsystem succeeds) = p1 p2 · · · pm

Parallel Connection
• A parallel subsystem succeeds if any one of its components succeeds, so its probability of failure is the product of the probabilities of failure of the corresponding components:
P(parallel subsystem succeeds) = 1 − P(parallel subsystem fails)
= 1 − (1 − p1)(1 − p2) · · · (1 − pm)

• For a given network, calculate the probability of success for a path from A to B.
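
The series and parallel formulas compose naturally, so a network can be evaluated by nesting them. The sketch below assumes independent components; the example network and its probabilities (0.9, 0.8, 0.7) are hypothetical and are not the network from the slides.

```python
from math import prod

def series(ps):
    """A series subsystem is up only if every component is up."""
    return prod(ps)

def parallel(ps):
    """A parallel subsystem is down only if every component is down."""
    return 1 - prod(1 - p for p in ps)

# Hypothetical network: one link (0.9) in series with two parallel links (0.8, 0.7).
p_path = series([0.9, parallel([0.8, 0.7])])
print(round(p_path, 3))
```

More complex networks can be handled the same way, as long as they decompose into nested series and parallel blocks.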

Independent Trials and the Binomial Probabilities
• If an experiment involves a sequence of independent but identical stages, we say that we have a sequence of independent trials.

• In the special case where there are only two possible results at each stage, we say that we have a sequence of independent Bernoulli trials.

• E.g., “it rains” or “it doesn’t rain,” or coin tosses with results “heads” (H) and “tails” (T).

• Compute the probability that k heads come up in an n-toss sequence, where each toss comes up a head with probability p.

Independent Bernoulli trials – Sequential description
• [Figure: Sequential description of the sample space of an experiment involving three independent tosses of a biased coin]

• Any n = 3 long sequence of heads and tails that involves k heads and 3 − k tails has probability
p^k (1 − p)^(3−k)

• More generally, the probability of any given n-long sequence that contains k heads and n − k tails is
p^k (1 − p)^(n−k)

• The probability that k heads come up in an n-toss sequence is therefore
p(k) = C(n, k) p^k (1 − p)^(n−k)
where C(n, k) is the number of distinct n-toss sequences that contain k heads.

• The numbers C(n, k) (called “n choose k”) are known as the binomial coefficients.

• The probabilities p(k) are known as the binomial probabilities.

• The binomial coefficients are given by
C(n, k) = n! / (k! (n − k)!), k = 0, 1, . . . , n
where for any positive integer i we have
i! = 1 · 2 · · · (i − 1) · i
and, by convention, 0! = 1.
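
The binomial probabilities can be computed directly from the formula using Python's `math.comb`. In this sketch the function name `binomial_pmf` and the choice p = 1/3 are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """p(k) = C(n, k) p^k (1 - p)^(n - k): k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = Fraction(1, 3)  # illustrative probability of a head
pmf = [binomial_pmf(k, 3, p) for k in range(4)]
print(pmf, sum(pmf))
```

Summing over k = 0, . . . , n gives 1, in agreement with the normalization axiom.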
The Counting Principle

• Consider a process that consists of r stages. Suppose that:

  (a) There are n1 possible results for the first stage.
  (b) For every possible result of the first stage, there are n2 possible results at the second stage.
  (c) More generally, for all possible results of the first i − 1 stages, there are ni possible results at the ith stage.
• Then, the total number of possible results of the r-stage process is
n1 n2 · · · nr
Permutation and Combination
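• If the order of selection does not matter, the selection is called a combination. E.g., the combinations of two out of the four letters A, B, C, D are
  - AB, AC, AD, BC, BD, CD

• If the order of selection matters, the selection is called a permutation. E.g., the 2-permutations of the same letters are
  - AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC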

• Start with n distinct objects, and let k be some positive integer, with k ≤ n. We wish to count the number of different ways that we can pick k out of these n objects and arrange them in a sequence.

• We can choose any of the n objects to be the first one. Having chosen the first, there are only n − 1 possible choices for the second; given the choice of the first two, there only remain n − 2 available objects for the third stage, etc.

• When we are ready to select the last (the kth) object, we have already chosen k − 1 objects, which leaves us with n − (k − 1) choices for the last one. By the Counting Principle, the number of possible sequences, called k-permutations, is
n(n − 1) · · · (n − k + 1) = n! / (n − k)!

• In the special case where k = n, the number of possible sequences, simply called permutations, is
n(n − 1)(n − 2) · · · 2 · 1 = n!

Exercise - k-permutations
• Count the number of words that consist of four distinct letters.

• This is the problem of counting the number of 4-permutations of the 26 letters in the alphabet.

• The desired number is
26! / 22! = 26 · 25 · 24 · 23 = 358,800

• In a combination there is no ordering of the selected elements.

• The 2-permutations of the letters A, B, C, and D are
  - AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC

• The combinations of two out of four of these letters are
  - AB, AC, AD, BC, BD, CD

• Note that specifying an n-toss sequence with k heads is the same as picking k elements (those that correspond to heads) out of the n-element set of tosses. Thus, the number of combinations is the same as the binomial coefficient
C(n, k) = n! / (k! (n − k)!)

Combinations - Exercise
• Count the number of combinations of two out of the four letters A, B, C, and D.

• Let n = 4 and k = 2. Then
C(4, 2) = 4! / (2! 2!) = 6
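
The counts in these exercises can be confirmed with the standard library, both by formula (`math.perm`, `math.comb`, available from Python 3.8) and by explicit enumeration with `itertools`. The variable names below are illustrative.

```python
from itertools import combinations, permutations
from math import comb, perm

letters = "ABCD"
# Ordered selections of 2 letters (order matters).
two_perms = ["".join(t) for t in permutations(letters, 2)]
# Unordered selections of 2 letters (order does not matter).
two_combs = ["".join(t) for t in combinations(letters, 2)]

four_letter_words = perm(26, 4)  # 4-permutations of the 26 letters
print(len(two_perms), len(two_combs), four_letter_words)
```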

Random variable
• A random variable is a variable which represents the outcome of a trial, an experiment or an event.

• It takes a specific numerical value, which may be different each time the trial, experiment or event is repeated.

• E.g., throwing 2 dice

• Let X be the random variable equal to the sum of the outcomes of the two rolls.

• Possible values of X = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

• P(X = 12) = 1/36, since only the outcome (6, 6) gives a sum of 12.

Concepts Related to Random Variables
• A random variable is a real-valued function of the outcome of the experiment.

• A function of a random variable defines another random variable.

• We can associate with each random variable certain “averages” of interest, such as the mean and the variance.

• A random variable can be conditioned on an event or on another random variable.

• There is a notion of independence of a random variable from an event or from another random variable.
Discrete Random Variable
 A random variable is called discrete if its range (the set of values
that it can take) is finite or at most countably infinite

• E.g., the sum of the two rolls of a die, which takes values in the finite set {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

Continuous Random Variable
 Continuous random variables describe outcomes in probabilistic situations
where the possible values some quantity can take form a continuum, which is
often (but not always) the entire set of real numbers R

 Choosing a point a from the interval [−4, 4]

 The possible values of the temperature outside on any given day

Concepts Related to Discrete Random Variables
• A discrete random variable is a real-valued function of the outcome of the experiment that can take a finite or countably infinite number of values.

• A (discrete) random variable has an associated probability mass function (PMF), which gives the probability of each numerical value that the random variable can take.

• A function of a random variable defines another random variable, whose PMF can be obtained from the PMF of the original random variable.

• For a discrete random variable X, the probabilities of the values that it can take are captured by the probability mass function (PMF) of X, denoted pX. In particular, if x is any possible value of X, the probability mass of x, denoted pX(x), is the probability of the event {X = x} consisting of all outcomes that give rise to a value of X equal to x:
pX(x) = P({X = x})

• For example, let the experiment consist of two independent tosses of a fair coin, with sample space {HH, HT, TH, TT}, and let X be the number of heads obtained.

• Then the PMF of X is
pX(x) = 1/4 if x = 0 or x = 2
pX(x) = 1/2 if x = 1
pX(x) = 0 otherwise

Calculation of the PMF of a Random Variable X
• Random variable X = maximum roll in two independent rolls of a fair 4-sided die.

  - x = 1: (1, 1), so pX(1) = 1/16
  - x = 2: (1, 2), (2, 2), (2, 1), so pX(2) = 3/16
  - x = 3: (1, 3), (2, 3), (3, 3), (3, 1), (3, 2), so pX(3) = 5/16
  - x = 4: (1, 4), (2, 4), (3, 4), (4, 4), (4, 1), (4, 2), (4, 3), so pX(4) = 7/16

Calculation of the PMF of a Random Variable X

 For each possible value x of X:

 Collect all the possible outcomes that give rise to the event {X = x}.

 Add their probabilities to obtain pX(x).
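
The two-step recipe (collect the outcomes for each value, then add their probabilities) translates into a short accumulation loop. A minimal sketch for the maximum of two rolls of a fair 4-sided die; the function name `pmf_of` is an assumption for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def pmf_of(outcome_probs, value):
    """Add up the probabilities of all outcomes that map to each value x."""
    pmf = defaultdict(Fraction)          # missing keys start at Fraction(0)
    for outcome, p in outcome_probs.items():
        pmf[value(outcome)] += p
    return dict(pmf)

# 16 equally likely outcomes (i, j) of two rolls of a fair 4-sided die.
outcome_probs = {o: Fraction(1, 16) for o in product(range(1, 5), repeat=2)}
p_max = pmf_of(outcome_probs, max)       # X = maximum of the two rolls
print(p_max)
```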

The Bernoulli Random Variable
• Consider the toss of a biased coin, which comes up a head with probability p, and a tail with probability 1 − p.
• The Bernoulli random variable takes the two values 1 and 0, depending on whether the outcome is a head or a tail:
X = 1 if a head, X = 0 if a tail
• Its PMF is
pX(k) = p if k = 1
pX(k) = 1 − p if k = 0

• The Bernoulli random variable is used to model probabilistic situations with just two outcomes, such as:
  (a) The state of a telephone at a given time that can be either free or busy.
  (b) A person who can be either healthy or sick with a certain disease.

• Furthermore, by combining multiple Bernoulli random variables, one can construct more complicated random variables.
The Binomial Random Variable
• A biased coin is tossed n times. At each toss, the coin comes up a head with probability p, and a tail with probability 1 − p, independently of prior tosses. Let X be the number of heads in the n-toss sequence. We refer to X as a binomial random variable with parameters n and p. The PMF of X consists of the binomial probabilities that were calculated earlier:
pX(k) = P(X = k) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, . . . , n

• Applying the additivity and normalization properties, the PMF sums to 1:
C(n, 0) p^0 (1 − p)^n + C(n, 1) p^1 (1 − p)^(n−1) + · · · + C(n, n) p^n (1 − p)^0 = 1

• Plot the binomial PMF for the given cases:

  - Case 1: if n = 9 and p = 1/2, the PMF is symmetric around n/2
  - Case 2: the PMF is skewed towards 0 if p < 1/2
  - Case 3: the PMF is skewed towards n if p > 1/2

Geometric Random Variable
 The geometric random variable is used when repeated independent trials are
performed until the first “success.”

 Each trial has probability of success p and the number of trials until the first
success is modelled by the geometric random variable.

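• The PMF of a geometric random variable is
pX(k) = (1 − p)^(k−1) p, k = 1, 2, . . .
since (1 − p)^(k−1) p is the probability of k − 1 failures followed by a success.

• It decreases as a geometric progression with parameter 1 − p.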

The Poisson Random Variable
• A Poisson random variable takes nonnegative integer values. Its PMF is given by
pX(k) = e^(−λ) λ^k / k!, k = 0, 1, 2, . . .

• where λ is a positive parameter characterizing the PMF.

• To get a feel for the Poisson random variable, think of a binomial random variable with very small p and very large n.

• E.g., the number of cars involved in accidents in a city on a given day

• If λ < 1, then the PMF is monotonically decreasing.

• If λ > 1, the PMF first increases and then decreases as the value of k increases.
• The Poisson PMF with parameter λ is a good approximation for a binomial PMF with parameters n and p, provided λ = np, n is very large, and p is very small, i.e.,
e^(−λ) λ^k / k! ≈ (n! / ((n − k)! k!)) p^k (1 − p)^(n−k), k = 0, 1, . . . , n

• Using the Poisson PMF may result in simpler models and calculations. For example, let n = 100 and p = 0.01. Then the probability of k = 5 successes in n = 100 trials is calculated using the binomial PMF as
(100! / (95! 5!)) · 0.01^5 · (1 − 0.01)^95 ≈ 0.00290

• Using the Poisson PMF with λ = np = 100 · 0.01 = 1, this probability is approximated by
e^(−1) · 1/5! ≈ 0.00306
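
The quality of the Poisson approximation for n = 100, p = 0.01, k = 5 can be checked numerically. This is only a sketch; the function names `binomial_pmf` and `poisson_pmf` are chosen here for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability e^(-lam) lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

n, p, k = 100, 0.01, 5
exact = binomial_pmf(k, n, p)
approx = poisson_pmf(k, n * p)   # lambda = np = 1
print(f"binomial: {exact:.5f}, Poisson: {approx:.5f}")
```

The two values agree to about two significant digits, which is typical when n is large and p is small.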
• Consider a probability model of today’s weather, let the random variable X be the temperature in degrees Celsius, and consider the transformation Y = 1.8X + 32, which gives the temperature in degrees Fahrenheit. In this example, Y is a linear function of X, of the form
Y = g(X) = aX + b
where a and b are scalars.

• We may also consider nonlinear functions of the general form
Y = g(X)

• If X is discrete with PMF pX, then Y is also discrete, and its PMF pY can be calculated using the PMF of X. In particular, to obtain pY(y) for any y, we add the probabilities of all values of x such that g(x) = y:
pY(y) = sum of pX(x) over all x with g(x) = y

• Let Y = |X| and let us apply the preceding formula to compute pY(y) in the case where
pX(x) = 1/9 if x is an integer in the range [−4, 4]
pX(x) = 0 otherwise

• The possible values of Y are y = 0, 1, 2, 3, 4. To compute pY(y) for some given value y from this range, we must add pX(x) over all values x such that |x| = y. In particular, there is only one value of X that corresponds to y = 0, namely x = 0. Thus,
pY(0) = pX(0) = 1/9

• There are two values of X that correspond to each y = 1, 2, 3, 4, so, for example,
pY(1) = pX(−1) + pX(1) = 2/9

• Thus, the PMF of Y is
pY(y) = 2/9 if y = 1, 2, 3, 4
pY(y) = 1/9 if y = 0
pY(y) = 0 otherwise

[Figure: The PMFs of X and Y = |X|]
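
The PMF of a function Y = g(X) is obtained by grouping the values of X by g(x) and adding their probabilities. The sketch below assumes X is uniform over the integers from −4 to 4 (consistent with the possible values of Y listed above); the function name `pmf_of_function` is illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed PMF of X: uniform over the nine integers -4, ..., 4.
p_x = {x: Fraction(1, 9) for x in range(-4, 5)}

def pmf_of_function(p_x, g):
    """pY(y) = sum of pX(x) over all x with g(x) = y."""
    p_y = defaultdict(Fraction)
    for x, p in p_x.items():
        p_y[g(x)] += p
    return dict(p_y)

p_y = pmf_of_function(p_x, abs)   # Y = |X|
print(p_y)
```

Passing a different function `g` to the same helper yields the PMF of any other function of X.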

Homework exercise
• Try the previous exercise for Z = X^2.

Expectation or Mean of a Random Variable
 Suppose you spin a wheel of fortune many times. At each spin, one of the numbers
m1, m2, . . . , mn comes up with corresponding probability p1, p2, . . . , pn, and this is
your monetary reward from that spin. What is the amount of money that you
“expect” to get “per spin”?
• Suppose that you spin the wheel k times, and that ki is the number of times that the outcome is mi. Then, the total amount received is m1k1 + m2k2 + · · · + mnkn.

• The amount received per spin is
M = (m1k1 + m2k2 + · · · + mnkn) / k

2 5 + 3 10 + (1)(20)
𝑀= = 12

• If the number of spins k is very large, and if we are willing to interpret probabilities as relative frequencies, it is reasonable to anticipate that mi comes up a fraction of times that is roughly equal to pi:
ki / k ≈ pi, i = 1, . . . , n

• Thus, the amount of money per spin that you “expect” to receive is
M = (m1k1 + m2k2 + · · · + mnkn) / k ≈ m1p1 + m2p2 + · · · + mnpn

Exercise - Compute mean
• Consider two independent coin tosses, each with a 3/4 probability of a head (and a 1/4 probability of a tail), and let X be the number of heads obtained. This is a binomial random variable with parameters n = 2 and p = 3/4. Its PMF is given below. Compute the mean.
pX(k) = (1/4)^2 if k = 0
pX(k) = 2 · (1/4) · (3/4) if k = 1
pX(k) = (3/4)^2 if k = 2

• The mean is
E[X] = 0 · (1/4)^2 + 1 · (2 · (1/4) · (3/4)) + 2 · (3/4)^2 = 24/16 = 3/2
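
The mean of a discrete random variable is the probability-weighted sum of its values. A minimal sketch for the binomial exercise with n = 2 and p = 3/4, using exact fractions:

```python
from fractions import Fraction
from math import comb

n, p = 2, Fraction(3, 4)

# Binomial PMF: pX(k) = C(n, k) p^k (1 - p)^(n - k).
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# E[X] = sum over k of k * pX(k).
mean = sum(k * pk for k, pk in pmf.items())
print(pmf, mean)
```

The result equals np, the general formula for the mean of a binomial random variable.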

nth moment of X
• The 1st moment of X is just the mean, E[X].

• The 2nd moment of the random variable X is the expected value of the random variable X^2, i.e., E[X^2].

• More generally, the nth moment is E[X^n], the expected value of the random variable X^n.

Variance and Standard Deviation

 The variance is always nonnegative

 The variance provides a measure of dispersion of X around its mean.

 Another measure of dispersion is the standard deviation of X, which is defined
as the square root of the variance and is denoted by σX:
σX = √var(X)

 The standard deviation is often easier to interpret, because it has the same
units as X. For example, if X measures length in meters, the units of variance
are square meters, while the units of the standard deviation are meters.
Exercise – Compute Mean
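 Consider the random variable X, which has the PMF
pX(x) = 1/9 if x is an integer in the range [−4, 4], and pX(x) = 0 otherwise

 The mean E[X] is equal to 0. This can be seen from the symmetry of the PMF of X
around 0, and can also be verified from the definition:
E[X] = Σ_x x pX(x) = (1/9) (−4 − 3 − 2 − 1 + 0 + 1 + 2 + 3 + 4) = 0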

Exercise – Compute Variance
 Consider the random variable X, which has the PMF
pX(x) = 1/9 if x is an integer in the range [−4, 4], and pX(x) = 0 otherwise

Expected Value Rule for Functions of Random Variables
 Let X be a random variable with PMF pX, and let g(X) be a function of X. Then, the
expected value of the random variable g(X) is given by
E[g(X)] = Σ_x g(x) pX(x)

 Using the expected value rule, we can write the variance of X as
var(X) = E[(X − E[X])^2] = Σ_x (x − E[X])^2 pX(x)

 Similarly, the nth moment is given by
E[X^n] = Σ_x x^n pX(x)

 There is no need to calculate the PMF of X^n

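 For the random variable X with PMF pX(x) = 1/9 if x is an integer in the range [−4, 4], and pX(x) = 0 otherwise, we have E[X] = 0 and
var(X) = Σ_x (x − E[X])^2 pX(x) = (1/9) (16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16) = 60/9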
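
The expected value rule translates directly into code. The sketch below assumes the uniform PMF pX(x) = 1/9 on the integers −4, ..., 4; the function name expected_value is hypothetical. It computes the mean, the variance, and the second moment without ever forming the PMF of g(X).

```python
from fractions import Fraction

def expected_value(p_x, g=lambda x: x):
    """Expected value rule: E[g(X)] = sum over x of g(x) * p_X(x)."""
    return sum(g(x) * prob for x, prob in p_x.items())

# Assumed PMF: uniform over the nine integers -4, ..., 4.
p_x = {x: Fraction(1, 9) for x in range(-4, 5)}

mean = expected_value(p_x)
variance = expected_value(p_x, lambda x: (x - mean) ** 2)
second_moment = expected_value(p_x, lambda x: x ** 2)
print(mean, variance, second_moment)
```

Because the mean is zero here, the variance and the second moment coincide.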

Summary - Variance

Expected value rule for functions
 Consider a random variable Y that is a linear function of another random variable X:
Y = aX + b

 where a and b are given scalars. Let us derive the mean and the variance of the
linear function Y.

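Property of Mean and Variance
 For a random variable X and given scalars a and b,
E[aX + b] = a E[X] + b,   var(aX + b) = a^2 var(X)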

Variance in Terms of Moments Expression
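
The heading above refers to the identity var(X) = E[X^2] − (E[X])^2. A short derivation, written in LaTeX and using only the definition of the variance, the expected value rule, and the normalization Σ_x pX(x) = 1, might run as follows:

```latex
\begin{align*}
\operatorname{var}(X) &= \sum_x \bigl(x - E[X]\bigr)^2 p_X(x) \\
&= \sum_x \bigl(x^2 - 2x\,E[X] + (E[X])^2\bigr)\, p_X(x) \\
&= \sum_x x^2 p_X(x) \;-\; 2E[X] \sum_x x\, p_X(x) \;+\; (E[X])^2 \sum_x p_X(x) \\
&= E[X^2] - 2(E[X])^2 + (E[X])^2 \\
&= E[X^2] - (E[X])^2 .
\end{align*}
```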

Exercise - Mean and Variance of the Bernoulli
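 Consider the experiment of tossing a biased coin, which comes up a head with
probability p and a tail with probability 1 − p, and the Bernoulli random variable X
with PMF
pX(k) = p if k = 1, and pX(k) = 1 − p if k = 0

 Compute the mean, second moment, and variance.

Mean: E[X] = 1 · p + 0 · (1 − p) = p
Second moment: E[X^2] = 1^2 · p + 0^2 · (1 − p) = p
Variance: var(X) = E[X^2] − (E[X])^2 = p − p^2 = p(1 − p)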

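 What are the mean and variance of the roll of a fair six-sided die? If we view the
result of the roll as a random variable X, its PMF is
pX(k) = 1/6 if k = 1, 2, 3, 4, 5, 6, and pX(k) = 0 otherwise

E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5
var(X) = E[X^2] − (E[X])^2 = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 − (3.5)^2 = 35/12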

The Mean of the Poisson

 The mean of the Poisson PMF can be calculated as follows:
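
A sketch of the calculation in LaTeX, assuming the standard Poisson PMF pX(k) = e^{−λ} λ^k / k! for k = 0, 1, 2, . . . (the PMF itself is not reproduced on this slide):

```latex
\begin{align*}
E[X] &= \sum_{k=0}^{\infty} k\, e^{-\lambda} \frac{\lambda^{k}}{k!} \\
&= \sum_{k=1}^{\infty} k\, e^{-\lambda} \frac{\lambda^{k}}{k!}
   && \text{(the $k = 0$ term is zero)} \\
&= \lambda \sum_{k=1}^{\infty} e^{-\lambda} \frac{\lambda^{k-1}}{(k-1)!} \\
&= \lambda \sum_{m=0}^{\infty} e^{-\lambda} \frac{\lambda^{m}}{m!}
   && \text{(let $m = k - 1$)} \\
&= \lambda .
\end{align*}
```

The last step uses the normalization property of the Poisson PMF.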

Joint PMFs of Multiple Random Variables
 Consider two discrete random variables X and Y associated with the same experiment.

 The joint PMF of X and Y is defined by
pX,Y(x, y) = P(X = x, Y = y)

 The joint PMF determines the probability of any event that can be specified in terms of the
random variables X and Y. For example, if A is the set of all pairs (x, y) that have a certain
property, then
P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} pX,Y(x, y)

 We can calculate the PMFs of X and Y by using the formulas
pX(x) = Σ_y pX,Y(x, y),   pY(y) = Σ_x pX,Y(x, y)

 The formula for pX(x) can be verified using
pX(x) = P(X = x) = Σ_y P(X = x, Y = y) = Σ_y pX,Y(x, y),
where the second equality follows by noting that the event {X = x} is the union of the
disjoint events {X = x, Y = y} as y ranges over all the different values of Y. The
formula for pY(y) is verified similarly. We sometimes refer to pX and pY as the
marginal PMFs, to distinguish them from the joint PMF.
Functions of Multiple Random Variables
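 A function Z = g(X, Y) of the random variables X and Y defines another random
variable Z. Its PMF can be calculated from the joint PMF pX,Y according to
pZ(z) = Σ_{(x,y): g(x,y) = z} pX,Y(x, y)

 The expected value rule for functions extends naturally and takes the form
E[g(X, Y)] = Σ_x Σ_y g(x, y) pX,Y(x, y)

 In the special case where g is linear and of the form aX + bY + c, where a, b, and c
are given scalars, we have
E[aX + bY + c] = a E[X] + b E[Y] + c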

Illustration of the tabular method for calculating marginal PMFs
from joint PMFs
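
The tabular method amounts to adding the entries of the joint PMF table along each row and each column. A minimal sketch, using a hypothetical joint PMF whose values were chosen only for illustration:

```python
from fractions import Fraction

# Hypothetical joint PMF table (illustrative values only):
# keys are pairs (x, y) with x in {1, 2, 3} and y in {1, 2}.
joint = {
    (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
    (2, 1): Fraction(3, 10), (2, 2): Fraction(1, 10),
    (3, 1): Fraction(1, 10), (3, 2): Fraction(2, 10),
}

def marginals(joint_pmf):
    """Return (p_X, p_Y): p_X(x) sums the joint PMF over y, p_Y(y) sums it over x."""
    p_x, p_y = {}, {}
    for (x, y), prob in joint_pmf.items():
        p_x[x] = p_x.get(x, 0) + prob
        p_y[y] = p_y.get(y, 0) + prob
    return p_x, p_y

p_x, p_y = marginals(joint)

# Linearity check: E[X + Y] from the joint PMF equals E[X] + E[Y] from the marginals.
e_sum = sum((x + y) * prob for (x, y), prob in joint.items())
e_x = sum(x * prob for x, prob in p_x.items())
e_y = sum(y * prob for y, prob in p_y.items())
print(p_x, p_y, e_sum == e_x + e_y)
```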

More than two Random Variables
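 The joint PMF of three random variables X, Y, and Z is defined by
pX,Y,Z(x, y, z) = P(X = x, Y = y, Z = z)

 for all possible triplets of numerical values (x, y, z). The corresponding marginal PMFs
are obtained by summation, for example
pX,Y(x, y) = Σ_z pX,Y,Z(x, y, z),   pX(x) = Σ_y Σ_z pX,Y,Z(x, y, z)

 The expected value rule for functions takes the form
E[g(X, Y, Z)] = Σ_x Σ_y Σ_z g(x, y, z) pX,Y,Z(x, y, z)

 If g is linear and of the form aX + bY + cZ + d, then
E[aX + bY + cZ + d] = a E[X] + b E[Y] + c E[Z] + d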

Exercise - Mean of the Binomial
 A class has 300 students and each student has probability 1/3 of getting an A, independently of
any other student. What is the mean of X, the number of students that get an A?

 Let Xi = 1 if the ith student gets an A, and Xi = 0 otherwise. Thus X1, X2, . . . , Xn are
Bernoulli random variables with common mean p = 1/3 and
variance p(1 − p) = (1/3)(2/3) = 2/9. Their sum X = X1 + X2 + ... + Xn is the number of students that get an A.
Since the mean of a sum is the sum of the means, E[X] = E[X1] + ... + E[X300] = 300 · (1/3) = 100.

 If we repeat this calculation for a general number of students n and probability of A equal to p, we
obtain E[X] = np.
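
As a cross-check, the mean can also be computed directly from the binomial PMF and compared with np. This is a small sketch with exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Return the binomial PMF as a list indexed by k = 0, ..., n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def mean(pmf):
    """E[X] = sum over k of k * p_X(k)."""
    return sum(k * prob for k, prob in enumerate(pmf))

n, p = 300, Fraction(1, 3)
pmf = binomial_pmf(n, p)
print(mean(pmf), n * p)   # the two values should agree
```

The same two functions with n = 2 and p = 3/4 reproduce the two-coin-toss exercise.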

Summary of Facts About Joint PMFs
 Let X and Y be random variables associated with the same experiment.
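 The joint PMF of X and Y is defined by
pX,Y(x, y) = P(X = x, Y = y)

 The marginal PMFs of X and Y can be obtained from the joint PMF, using the formulas
pX(x) = Σ_y pX,Y(x, y),   pY(y) = Σ_x pX,Y(x, y)

 A function g(X, Y) of X and Y defines another random variable, and
E[g(X, Y)] = Σ_x Σ_y g(x, y) pX,Y(x, y)

 If g is linear, of the form aX + bY + c, we have
E[aX + bY + c] = a E[X] + b E[Y] + c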

 Conditional PMF pX|A(x): for each x, we add the probabilities of the outcomes in the intersection {X = x} ∩ A
and normalize by dividing by P(A).

 The conditional PMF of a random variable X, conditioned on a particular event A with P(A) > 0, is
defined by
pX|A(x) = P(X = x | A) = P({X = x} ∩ A) / P(A)

 Note that the events {X = x} ∩ A are disjoint for
different values of x, their union is A, and, therefore,
P(A) = Σ_x P({X = x} ∩ A)

 Combining the above two formulas, we see that
Σ_x pX|A(x) = 1,
so pX|A is a legitimate PMF.

 Let X be the roll of a fair die and let A be the event that the roll is an even number.
Compute pX|A(x).
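
One way to carry out this computation is sketched below. It assumes the die is fair and represents the event A as a set of values of X; the function name conditional_pmf is hypothetical.

```python
from fractions import Fraction

def conditional_pmf(p_x, event):
    """p_{X|A}(x) = P({X = x} and A) / P(A), for an event A given as a set of values of X."""
    p_a = sum(prob for x, prob in p_x.items() if x in event)
    if p_a == 0:
        raise ValueError("the conditioning event must have positive probability")
    return {x: (prob / p_a if x in event else Fraction(0)) for x, prob in p_x.items()}

p_x = {x: Fraction(1, 6) for x in range(1, 7)}    # fair die (assumed)
p_x_given_even = conditional_pmf(p_x, {2, 4, 6})
for x in sorted(p_x_given_even):
    print(x, p_x_given_even[x])
```

The even outcomes each receive conditional probability 1/3 and the odd outcomes receive 0, so the conditional PMF sums to 1.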

Conditioning one Random Variable on Another
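 Let X and Y be two random variables. The conditional PMF pX|Y of X given Y is
pX|Y(x | y) = P(X = x | Y = y)

 Using the definition of conditional probabilities
pX|Y(x | y) = P(X = x, Y = y) / P(Y = y) = pX,Y(x, y) / pY(y)

 Normalization property
Σ_x pX|Y(x | y) = 1

 Joint PMF, using a sequential approach
pX,Y(x, y) = pY(y) pX|Y(x | y) = pX(x) pY|X(y | x)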

 Consider a transmitter that is sending messages over a computer network. Let us define the
following two random variables:
 X : the travel time of a given message, Y : the length of the given message.
 We know the PMF of the travel time of a message that has a given length, and we know the PMF of
the message length. We want to find the (unconditional) PMF of the travel time of a message.

 Assume that the travel time X of the message depends on its length Y and the congestion level of
the network at the time of transmission. In particular, the length of a message can take two possible values:
 y = 10^2 bytes with probability 5/6
 y = 10^4 bytes with probability 1/6

 The travel time is
 (10^{−4} Y) secs with probability 1/2
 (10^{−3} Y) secs with probability 1/3
 (10^{−2} Y) secs with probability 1/6

Exercise - Compute the PMF of X
 To compute the PMF of X, we use the total probability formula
pX(x) = Σ_y pY(y) pX|Y(x | y)
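
The total probability calculation for this example can be sketched as follows, reading the message lengths as 10^2 and 10^4 bytes and the travel-time factors as 10^{−4}, 10^{−3}, and 10^{−2}. The variable names are illustrative.

```python
from fractions import Fraction

# PMF of the message length Y (bytes), as given in the example.
p_y = {10**2: Fraction(5, 6), 10**4: Fraction(1, 6)}
# The travel time is (factor * Y) seconds; each factor occurs with the stated probability.
factors = {Fraction(1, 10**4): Fraction(1, 2),
           Fraction(1, 10**3): Fraction(1, 3),
           Fraction(1, 10**2): Fraction(1, 6)}

# Total probability formula: p_X(x) = sum over y of p_Y(y) * p_{X|Y}(x | y).
p_x = {}
for y, prob_y in p_y.items():
    for factor, prob_factor in factors.items():
        x = factor * y                       # travel time in seconds
        p_x[x] = p_x.get(x, 0) + prob_y * prob_factor

for x in sorted(p_x):
    print(float(x), p_x[x])
```

Note that a travel time of 1 second can arise from either message length, so its probability is the sum of two terms.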

Conditional Expectation
 Let X and Y be random variables associated with the same experiment.

 The conditional expectation of X given an event A with P(A) > 0, is defined by
E[X | A] = Σ_x x pX|A(x)

 For a function g(X), it is given by
E[g(X) | A] = Σ_x g(x) pX|A(x)

 The conditional expectation of X given a value y of Y is defined by
E[X | Y = y] = Σ_x x pX|Y(x | y)

Total expectation theorem
 Let A1, . . . , An be disjoint events that form a partition of the sample space, and
assume that P(Ai) > 0 for all i. Then,
E[X] = Σ_{i=1}^{n} P(Ai) E[X | Ai]

 The total expectation theorem basically says that “the unconditional average
can be obtained by averaging the conditional averages.”

Homework – Exercise – 2.12 (Page-29)
 Consider four independent rolls of a 6-sided die. Let X be the number of 1’s and let
Y be the number of 2’s obtained. What is the joint PMF of X and Y? The marginal
PMF pY is given by the binomial formula
pY(y) = (4 choose y) (1/6)^y (5/6)^{4−y},   y = 0, 1, . . . , 4

Independence of a Random Variable from an Event
 The independence of a random variable from an event is similar to the
independence of two events. The idea is that knowing the occurrence of the
conditioning event tells us nothing about the value of the random variable. We
say that the random variable X is independent of the event A if
P(X = x and A) = P(X = x) P(A) = pX(x) P(A), for all x,

 which is the same as requiring that the two events {X = x} and A be
independent, for any choice x. As long as P(A) > 0, and using the definition
pX|A(x) = P(X = x and A)/P(A) of the conditional PMF, we see that independence is
the same as the condition
pX|A(x) = pX(x), for all x.

Independence of Random Variables
 The notion of independence of two random variables is similar. We say that
two random variables X and Y are independent if
pX,Y(x, y) = pX(x) pY(y), for all x, y

 X and Y are said to be conditionally independent, given a positive probability
event A, if
P(X = x, Y = y | A) = P(X = x | A) P(Y = y | A), for all x and y

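 If X and Y are independent random variables, then
E[XY] = E[X] E[Y]

 Proof:
E[XY] = Σ_x Σ_y x y pX,Y(x, y) = Σ_x Σ_y x y pX(x) pY(y) = (Σ_x x pX(x)) (Σ_y y pY(y)) = E[X] E[Y]

 Consider now the sum Z = X + Y of two independent random variables X and Y,
and let us calculate the variance of Z. We have, using the relation
E[X + Y] = E[X] + E[Y],
var(Z) = E[(X + Y − E[X + Y])^2] = E[((X − E[X]) + (Y − E[Y]))^2]
       = var(X) + var(Y) + 2 E[(X − E[X])(Y − E[Y])] = var(X) + var(Y),
since X − E[X] and Y − E[Y] are independent and each has zero mean.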
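
The additivity of variances for independent random variables can be checked numerically. The sketch below builds the PMF of Z = X + Y from the product of the marginals, using two illustrative distributions (a fair die and a Bernoulli variable with parameter 1/3); these choices and the function names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

def pmf_of_sum(p_x, p_y):
    """PMF of Z = X + Y for independent X and Y, where p_{X,Y}(x, y) = p_X(x) * p_Y(y)."""
    p_z = {}
    for (x, px), (y, py) in product(p_x.items(), p_y.items()):
        p_z[x + y] = p_z.get(x + y, 0) + px * py
    return p_z

# Illustrative choice: a fair die X and a Bernoulli(1/3) variable Y.
p_x = {k: Fraction(1, 6) for k in range(1, 7)}
p_y = {0: Fraction(2, 3), 1: Fraction(1, 3)}
p_z = pmf_of_sum(p_x, p_y)
print(variance(p_z), variance(p_x) + variance(p_y))
```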

Independence of Several Random Variables
 For example, three random variables X, Y, and Z are said to be independent if
pX,Y,Z(x, y, z) = pX(x) pY(y) pZ(z), for all x, y, z

 If X, Y, and Z are independent random variables, then any three random variables
of the form f(X), g(Y), and h(Z), are also independent

 Similarly, any two random variables of the form g(X, Y) and h(Z) are independent.

 On the other hand, two random variables of the form g(X, Y) and h(Y, Z) are
usually not independent, because they are both affected by Y .

 If X1, X2, . . . , Xn are independent random variables, then
var(X1 + X2 + ... + Xn) = var(X1) + var(X2) + ... + var(Xn)

Continuous Random Variables and PDFs
 A random variable X is called continuous if its probability law can be described in
terms of a nonnegative function fX, called the probability density function of X,
or PDF for short, which satisfies
P(X ∈ B) = ∫_B fX(x) dx
for every subset B of the real line.

 The probability that the value of X falls within an interval [a, b] is
P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
and can be interpreted as the area under the graph of the PDF

 Normalization property: the entire area under the graph of the PDF must
be equal to 1,
∫_{−∞}^{∞} fX(x) dx = P(−∞ < X < ∞) = 1

Probability mass per unit length
 To interpret the PDF, note that for an interval [x, x + δ] with very small length δ,
we have
P([x, x + δ]) = ∫_x^{x+δ} fX(t) dt ≈ fX(x) · δ,
so fX(x) can be viewed as the “probability mass per unit length” near x.

 If δ is very small, the probability that X takes value in the interval [x, x + δ] is
the shaded area in the figure, which is approximately equal to fX(x) · δ.

Continuous Uniform Random Variable
 A gambler spins a wheel of fortune, continuously calibrated between 0 and 1,
and observes the resulting number. Assuming that all subintervals of [0,1] of the
same length are equally likely, this experiment can be modelled in terms of a
random variable X with PDF
fX(x) = c if 0 ≤ x ≤ 1, and fX(x) = 0 otherwise,

 for some constant c. This constant can be determined by using the normalization property
1 = ∫_{−∞}^{∞} fX(x) dx = ∫_0^1 c dx = c, so that c = 1

The PDF of a uniform random variable

Piecewise Constant PDF
 Alvin’s driving time to work is between 15 and 20 minutes if the day is sunny,
and between 20 and 25 minutes if the day is rainy, with all times being equally
likely in each case. Assume that a day is sunny with probability 2/3 and rainy
with probability 1/3. What is the PDF of the driving time, viewed as a random
variable X?

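 We interpret “all times are equally likely” in the sunny and the rainy cases to mean that the
PDF of X is constant in each of the intervals [15, 20] and [20, 25]:
fX(x) = c1 if 15 ≤ x < 20, fX(x) = c2 if 20 ≤ x ≤ 25, and fX(x) = 0 otherwise

 The constants are determined from the given probabilities of a sunny and of a rainy day:
2/3 = P(sunny day) = ∫_15^20 c1 dx = 5 c1,   1/3 = P(rainy day) = ∫_20^25 c2 dx = 5 c2,
so that c1 = 2/15 and c2 = 1/15.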
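
The same bookkeeping works for any piecewise constant PDF: on each interval, the constant is the probability of that interval divided by its length. A minimal sketch for the driving-time example follows; the data layout and function name are choices made for this illustration.

```python
from fractions import Fraction

# Each piece: (left endpoint, right endpoint, probability that X falls in the piece).
pieces = [(15, 20, Fraction(2, 3)),   # sunny day
          (20, 25, Fraction(1, 3))]   # rainy day

# On each piece the PDF is a constant c_i, and c_i * (length of the piece)
# must equal the probability of that piece.
constants = [prob / (b - a) for a, b, prob in pieces]

def pdf(x):
    """Piecewise constant PDF of the driving time X (minutes): c_i on [a_i, b_i), closed at the far right."""
    last_right = pieces[-1][1]
    for (a, b, _), c in zip(pieces, constants):
        if a <= x < b or (b == last_right and x == b):
            return c
    return Fraction(0)

total_area = sum(c * (b - a) for (a, b, _), c in zip(pieces, constants))
print(constants, total_area)
```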

Piecewise Constant PDF
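 More generally, a piecewise constant PDF has the form
fX(x) = ci if ai ≤ x < ai+1, i = 1, 2, . . . , n − 1, and fX(x) = 0 otherwise,

 where a1, a2, . . . , an are some scalars with ai < ai+1 for all i, and c1, c2, . . . , cn−1 are
some nonnegative constants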

PDF Properties

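 The expected value or mean of a continuous random variable X is
defined by
E[X] = ∫_{−∞}^{∞} x fX(x) dx

 A function Y = g(X) of a continuous random variable X may itself be continuous or
discrete. In either case, the mean of g(X)
satisfies the expected value rule
E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx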

Expectation of a Continuous Random Variable and its Properties

 Consider the case of a uniform PDF over an interval [a, b], and
compute the mean and variance of the uniform random variable
E[X] = ∫_a^b x · 1/(b − a) dx = (a + b)/2
E[X^2] = ∫_a^b x^2 · 1/(b − a) dx = (a^2 + ab + b^2)/3
var(X) = E[X^2] − (E[X])^2 = (b − a)^2/12

Exponential Random Variable
 An exponential random variable has a PDF of the form
fX(x) = λ e^{−λx} if x ≥ 0, and fX(x) = 0 otherwise,

 where λ is a positive parameter characterizing the PDF

 Note that the probability that X exceeds a certain value falls exponentially.
Indeed, for any a ≥ 0, we have
P(X ≥ a) = ∫_a^∞ λ e^{−λx} dx = [−e^{−λx}]_a^∞ = e^{−λa}

 The mean and the variance can be calculated to be
E[X] = 1/λ,   var(X) = 1/λ^2

 An exponential random variable can be a very good model for the amount of
time until a piece of equipment breaks down, until a light bulb burns out, or until
an accident occurs.
Mean of an Exponential Random Variable
 Using integration by parts,
E[X] = ∫_0^∞ x λ e^{−λx} dx = [−x e^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = 0 + 1/λ = 1/λ

Second moment of an Exponential Random Variable
 Using integration by parts again,
E[X^2] = ∫_0^∞ x^2 λ e^{−λx} dx = [−x^2 e^{−λx}]_0^∞ + ∫_0^∞ 2x e^{−λx} dx = 0 + (2/λ) E[X] = 2/λ^2

Variance of an Exponential Random Variable
var(X) = E[X^2] − (E[X])^2 = 2/λ^2 − 1/λ^2 = 1/λ^2

 The time until a small meteorite first lands anywhere on the earth is modelled as
an exponential random variable with a mean of 10 days. The time is currently
midnight. What is the probability that a meteorite first lands some time between
6am and 6pm of the first day?

 Let X be the time elapsed until the event of interest, measured in days. Then, X is
exponential, with mean 1/λ = 10, which yields λ = 1/10.

 The desired probability is
P(1/4 ≤ X ≤ 3/4) = P(X ≥ 1/4) − P(X > 3/4) = e^{−1/40} − e^{−3/40} = 0.0476
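
The probability can be evaluated numerically from the tail formula P(X ≥ a) = e^{−λa}. A short sketch, with time measured in days after midnight (the function name is hypothetical):

```python
import math

def exponential_tail(lam, a):
    """P(X >= a) = exp(-lam * a) for an exponential random variable with parameter lam, a >= 0."""
    return math.exp(-lam * a)

lam = 1 / 10                      # mean of 10 days, so lam = 1/10 per day
start, end = 6 / 24, 18 / 24      # 6am and 6pm of the first day, in days after midnight
prob = exponential_tail(lam, start) - exponential_tail(lam, end)
print(round(prob, 4))
```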

Cumulative Distribution Functions - CDF
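 The CDF of a random variable X is denoted by FX and provides the probability
P(X ≤ x). In particular, for every x,
FX(x) = P(X ≤ x) = Σ_{k ≤ x} pX(k) if X is discrete, and FX(x) = ∫_{−∞}^{x} fX(t) dt if X is continuous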

 The CDF FX(x) “accumulates” probability “up to” the value x.

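Properties of a CDF
 FX is monotonically nondecreasing: if x ≤ y, then FX(x) ≤ FX(y)
 FX(x) tends to 0 as x → −∞, and to 1 as x → ∞
 If X is discrete, then FX is a piecewise constant function of x
 If X is continuous, then FX is a continuous function of x
 If X is discrete and takes integer values, the PMF and the CDF can be obtained from each other by summing or differencing:
FX(k) = Σ_{i ≤ k} pX(i),   pX(k) = FX(k) − FX(k − 1)
 If X is continuous, the PDF and the CDF can be obtained from each other by integration or differentiation:
FX(x) = ∫_{−∞}^{x} fX(t) dt,   fX(x) = dFX(x)/dx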

The Geometric and Exponential CDFs
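 Let X be a geometric random variable with parameter p; that is, X is the number of
trials to obtain the first success in a sequence of independent Bernoulli trials, where
the probability of success is p. Thus, P(X = k) = p(1 − p)^{k−1} for k = 1, 2, . . ., and the CDF is
FX(n) = Σ_{k=1}^{n} p(1 − p)^{k−1} = 1 − (1 − p)^n,   n = 1, 2, . . .

 For an exponential random variable with parameter λ > 0, the CDF is
FX(x) = 1 − e^{−λx} for x ≥ 0, and FX(x) = 0 for x < 0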

CDFs of some continuous random variables.

Relation of the geometric and the exponential CDFs
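
The relation between the two CDFs can be illustrated numerically. If δ is chosen so that e^{−λδ} = 1 − p, the exponential CDF evaluated at the points nδ coincides with the geometric CDF at n. The parameter values below are arbitrary choices for the illustration.

```python
import math

def geometric_cdf(p, n):
    """P(X <= n) = 1 - (1 - p)**n for a geometric random variable, n = 1, 2, ..."""
    return 1 - (1 - p) ** n

def exponential_cdf(lam, x):
    """P(X <= x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

p, lam = 0.2, 1.5                       # illustrative parameter values
delta = -math.log(1 - p) / lam          # chosen so that exp(-lam * delta) = 1 - p
gaps = [abs(exponential_cdf(lam, n * delta) - geometric_cdf(p, n)) for n in range(1, 11)]
print(max(gaps))                        # differences are at the level of rounding error
```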

Normal Random Variables
 A continuous random variable X is said to be normal or Gaussian if it has a PDF of
the form
fX(x) = (1/(√(2π) σ)) e^{−(x−μ)^2/(2σ^2)}

 where μ and σ are two scalar parameters characterizing the PDF, with σ
assumed positive. It can be verified that the normalization property
(1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(x−μ)^2/(2σ^2)} dx = 1
holds.

Normal Random Variables

 A normal PDF and CDF, with μ = 1 and σ^2 = 1. We observe that the PDF is
symmetric around its mean μ, and has a characteristic bell-shape. As x gets
further from μ, the term e^{−(x−μ)^2/(2σ^2)} decreases very rapidly. In this figure, the PDF is
very close to zero outside the interval [−1, 3].
 The mean and the variance can be calculated to be
E[X] = μ,   var(X) = σ^2

 To see this, note that the PDF is symmetric around μ, so its mean must be μ.
Furthermore, the variance is given by
var(X) = (1/(√(2π) σ)) ∫_{−∞}^{∞} (x − μ)^2 e^{−(x−μ)^2/(2σ^2)} dx

 Using the change of variables y = (x − μ)/σ and integration by parts, we have
var(X) = (σ^2/√(2π)) ∫_{−∞}^{∞} y^2 e^{−y^2/2} dy
       = (σ^2/√(2π)) [−y e^{−y^2/2}]_{−∞}^{∞} + (σ^2/√(2π)) ∫_{−∞}^{∞} e^{−y^2/2} dy
       = σ^2

Standard Normal Random Variables
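 The last equality above is obtained by using the fact
(1/√(2π)) ∫_{−∞}^{∞} e^{−y^2/2} dy = 1,
which is the normalization property of the normal PDF for the case μ = 0 and σ = 1.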

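Normality is Preserved by Linear Transformations
 If X is a normal random variable with mean μ and variance σ^2, and if a ≠ 0 and b are
scalars, then the random variable Y = aX + b is also normal, with mean and variance
E[Y] = aμ + b,   var(Y) = a^2 σ^2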

The Standard Normal Random Variable
 A normal random variable Y with zero mean and unit variance is said to be a
standard normal. Its CDF is denoted by Φ,
Φ(y) = P(Y ≤ y) = (1/√(2π)) ∫_{−∞}^{y} e^{−t^2/2} dt

 Let X be a normal random variable with mean μ and variance σ^2. We “standardize”
X by defining a new random variable Y given by
Y = (X − μ)/σ

 Since Y is a linear transformation of X, it is normal. Furthermore,
E[Y] = (E[X] − μ)/σ = 0,   var(Y) = var(X)/σ^2 = 1

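CDF Calculation of the Normal Random Variable
 For a normal random variable X with mean μ and variance σ^2, standardizing X gives
P(X ≤ x) = P((X − μ)/σ ≤ (x − μ)/σ) = P(Y ≤ (x − μ)/σ) = Φ((x − μ)/σ),
where Y is a standard normal random variable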

Exercise - Signal Detection
 A binary message is transmitted as a signal that is either −1 or +1. The communication channel corrupts the
transmission with additive normal noise with mean μ = 0 and variance σ^2. The receiver concludes that the
signal −1 (or +1) was transmitted if the value received is < 0 (or ≥ 0, respectively). What is the probability of
error?

 The area of the shaded region gives the probability of error in the two cases where −1 and +1 is transmitted.
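 An error occurs whenever −1 is transmitted and the noise N is at least 1,
so that N + S = N − 1 ≥ 0,

 or whenever +1 is transmitted and the noise N is smaller than −1,
so that N + S = N + 1 < 0.

 In the former case, the probability of error is
P(N ≥ 1) = 1 − P(N < 1) = 1 − P((N − μ)/σ < (1 − μ)/σ) = 1 − Φ((1 − μ)/σ) = 1 − Φ(1/σ)
By the symmetry of the normal PDF, the probability of error in the latter case is the same.

 For σ = 1, we have Φ(1/σ) = Φ(1) = 0.8413, and
the probability of the error is 0.1587.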
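
Instead of reading Φ from a table, it can be evaluated with the error function from Python's standard library, using the identity Φ(y) = (1 + erf(y/√2))/2. The sketch below computes the error probability 1 − Φ(1/σ) for a few illustrative noise levels; the function names are hypothetical.

```python
import math

def standard_normal_cdf(y):
    """Phi(y), written in terms of the error function: Phi(y) = (1 + erf(y / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def error_probability(sigma):
    """P(N >= 1) = 1 - Phi(1 / sigma) for zero-mean normal noise with standard deviation sigma."""
    return 1.0 - standard_normal_cdf(1.0 / sigma)

for sigma in (0.5, 1.0, 2.0):
    print(sigma, round(error_probability(sigma), 4))
```

As expected, the error probability grows as the noise standard deviation σ increases.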

 The conditional PDF of a continuous random variable X, conditioned on a
particular event A with P(A) > 0, is a function fX|A that satisfies
P(X ∈ B | A) = ∫_B fX|A(x) dx
for any subset B of the real line.

 In the important special case where we condition on an event of the form {X ∈ A}, with P(X ∈ A) > 0,
fX|A(x) = fX(x) / P(X ∈ A) if x ∈ A, and fX|A(x) = 0 otherwise

 The unconditional PDF fX and the conditional PDF fX|A, where A is the
interval [a, b]. Note that within the conditioning event A, fX|A retains the
same shape as fX, except that it is scaled along the vertical axis.
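Conditional PDF and Expectation Given an Event
 The conditional expectation of X given the event A is defined by
E[X | A] = ∫_{−∞}^{∞} x fX|A(x) dx

 The expected value rule remains valid:
E[g(X) | A] = ∫_{−∞}^{∞} g(x) fX|A(x) dx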

Multiple Continuous Random Variables
 We say that two continuous random variables associated with a common experiment are jointly
continuous and can be described in terms of a joint PDF fX,Y, if fX,Y is a nonnegative function that
satisfies
P((X, Y) ∈ B) = ∫∫_{(x,y) ∈ B} fX,Y(x, y) dx dy

 for every subset B of the two-dimensional plane. The notation above means that the integration is
carried over the set B. In the particular case where B is a rectangle of the form B = [a, b] × [c, d], we
have
P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b fX,Y(x, y) dx dy

 Normalization property
∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1

Multiple Continuous Random Variables
 To interpret the PDF, we let δ be very small and consider the probability of a small
rectangle. We have
P(a ≤ X ≤ a + δ, c ≤ Y ≤ c + δ) = ∫_c^{c+δ} ∫_a^{a+δ} fX,Y(x, y) dx dy ≈ fX,Y(a, c) · δ^2

Marginal PDF
 The joint PDF contains all conceivable probabilistic information on the random
variables X and Y , as well as their dependencies. It allows us to calculate the
probability of any event that can be defined in terms of these two random variables.
As a special case, it can be used to calculate the probability of an event involving
only one of them. For example, let A be a subset of the real line and consider the
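event {X ∈ A}. We have
P(X ∈ A) = P(X ∈ A and Y ∈ (−∞, ∞)) = ∫_A ∫_{−∞}^{∞} fX,Y(x, y) dy dx

 Comparing with the formula P(X ∈ A) = ∫_A fX(x) dx, we see that the marginal PDF of X is
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy, and similarly fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx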

 If X and Y are jointly continuous random variables, and g is some function, then
Z = g(X, Y) is also a random variable. We will see in Section 3.6 methods for
computing the PDF of Z, if it has one. For now, let us note that the expected
value rule is still applicable and
E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dx dy

 As an important special case, for any scalars a, b, we have
E[aX + bY] = a E[X] + b E[Y]

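Conditioning one Random Variable on Another
 Let X and Y be continuous random variables with joint PDF fX,Y. For any fixed y with
fY(y) > 0, the conditional PDF of X given that Y = y is defined by
fX|Y(x | y) = fX,Y(x, y) / fY(y)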

Circular Uniform PDF
 We throw a dart at a circular target of radius r. We assume that we always hit the
target, and that all points of impact (x, y) are equally likely, so that the joint PDF
of the random variables X and Y is uniform. Since the area of the circle is πr^2,
we have
fX,Y(x, y) = 1/(πr^2) if x^2 + y^2 ≤ r^2, and fX,Y(x, y) = 0 otherwise

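 To calculate the conditional PDF fX|Y(x|y), let us first calculate the marginal PDF
fY(y). For |y| > r, it is zero. For |y| ≤ r, it can be calculated as
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx = (1/(πr^2)) ∫_{−√(r^2−y^2)}^{√(r^2−y^2)} dx = (2/(πr^2)) √(r^2 − y^2)

 Note that the marginal fY(y) is not a uniform PDF.

 The conditional PDF is
fX|Y(x | y) = fX,Y(x, y) / fY(y) = 1/(2 √(r^2 − y^2)) if x^2 + y^2 ≤ r^2,
so for each fixed value of y, the conditional PDF fX|Y is uniform.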
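
A quick numerical sanity check of the dart example: the marginal fY(y) = 2√(r^2 − y^2)/(πr^2) should integrate to 1 over [−r, r], and the conditional PDF of X given Y = y should equal 1/(2√(r^2 − y^2)) on the chord at height y. The radius and the simple midpoint-rule integrator below are choices made only for this sketch.

```python
import math

def marginal_pdf_y(y, r):
    """f_Y(y) = 2 * sqrt(r^2 - y^2) / (pi * r^2) for |y| <= r, and 0 otherwise."""
    if abs(y) > r:
        return 0.0
    return 2.0 * math.sqrt(r * r - y * y) / (math.pi * r * r)

def conditional_pdf_x_given_y(x, y, r):
    """f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y): uniform over the chord at height y, for |y| < r."""
    if abs(y) >= r or x * x + y * y > r * r:
        return 0.0
    joint = 1.0 / (math.pi * r * r)
    return joint / marginal_pdf_y(y, r)

def midpoint_integral(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

r = 2.0   # illustrative radius
area = midpoint_integral(lambda y: marginal_pdf_y(y, r), -r, r)
print(area)   # should be close to 1
```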

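Independence of Continuous Random Variables
 Two continuous random variables X and Y are independent if their joint PDF is the
product of the marginal PDFs:
fX,Y(x, y) = fX(x) fY(y), for all x, y

Joint CDFs
 If X and Y are two random variables associated with the same experiment, their joint CDF is defined by
FX,Y(x, y) = P(X ≤ x, Y ≤ y)

Expectation or mean
 We have seen that the mean of a function Y = g(X) of a continuous random
variable X can be calculated using the expected value rule
E[Y] = ∫_{−∞}^{∞} g(x) fX(x) dx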

Home-work - Exercise
 John Slow is driving from Boston to the New York area, a distance of 180 miles. His average speed
is uniformly distributed between 30 and 60 miles per hour. What is the PDF of the duration of the
trip?

Stochastic process
 A stochastic process is a mathematical model of a probabilistic experiment that
evolves in time and generates a sequence of numerical values. For example,
a stochastic process can be used to model:

 (a) the sequence of daily prices of a stock;

 (b) the sequence of scores in a football game;

 (c) the sequence of failure times of a machine;

 (d) the sequence of hourly traffic loads at a node of a communication network;

 (e) the sequence of radar measurements of the position of an airplane.

Two major categories of stochastic processes
 Arrival-Type Processes:

 Here, we are interested in occurrences that have the character of an “arrival,”
such as message receptions at a receiver, job completions in a manufacturing
cell, customer purchases at a store, etc.

 Inter-arrival times (the times between successive arrivals) are modelled as independent
random variables.

 When the inter-arrival times are geometrically distributed, we have a Bernoulli process.

 When the inter-arrival times are exponentially distributed, we have a Poisson process.
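
A small simulation can make the Bernoulli process concrete. The sketch below assumes discrete time slots with an independent arrival in each slot with probability p, and measures the gaps between successive arrivals; the seed, the value of p, and the function names are arbitrary choices for the illustration.

```python
import random

def bernoulli_process(p, num_slots, rng):
    """Return a list of 0/1 values: 1 marks an arrival in that time slot (probability p each)."""
    return [1 if rng.random() < p else 0 for _ in range(num_slots)]

def interarrival_times(slots):
    """Number of slots from one arrival to the next (the first gap is counted from slot 1)."""
    times, last = [], 0
    for t, arrival in enumerate(slots, start=1):
        if arrival:
            times.append(t - last)
            last = t
    return times

rng = random.Random(0)            # fixed seed so the run is repeatable
p = 0.25                          # illustrative arrival probability per slot
gaps = interarrival_times(bernoulli_process(p, 200_000, rng))
sample_mean = sum(gaps) / len(gaps)
print(sample_mean)                # a geometric random variable with parameter p has mean 1/p
```

With many slots, the average gap should be close to 1/p, the mean of a geometric random variable with parameter p.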

Two major categories of stochastic processes

 Markov Processes

 Experiments that evolve in time and in which the future evolution exhibits a
probabilistic dependence on the past.

 As an example, the future daily prices of a stock are typically dependent on past

