
PROBABILITY & RANDOM PROCESSES (ECE 226) January 18, 2018

Rutgers, Spring 2018 Syllabus

PROBABILITY AND RANDOM PROCESSES

Probability theory studies random phenomena in a formal mathematical way. It is essential for all
engineering and scientific disciplines dealing with models that depend on chance. Probability plays
a central role in e.g., telecommunications and finance systems. Telecommunications systems strive to
provide reliable and secure transmission and storage of information under the uncertainties coming
from various types of random noise and adversarial behavior. Finance systems strive to maximize
profits in spite of the uncertainties coming from natural and man-made events. The students will
learn the fundamentals of probability that are necessary for several ECE courses and related fields.

Class time and place: Mon & Thr, 10:20 AM - 11:40 AM, HLL-114.

Instructor: Emina Soljanin emina.soljanin@rutgers.edu, CoRE 511, 848-445-5256

Instructor’s office hours: by appointment only


Office hours will be primarily to address personal issues for which discussion in a public forum
would be inappropriate. For technical questions about the course material, we will use TAs’
office hours and Piazza, which will be monitored by the instructor and the TAs.

TAs’ office hours:


The TAs will be in EE 224 at the hours listed below, and otherwise by appointment.

Thursday 4:00 – 5:00 pm: Amir Behrouzi Far, amir.behrouzifar@rutgers.edu, Sections 2 & 4
Monday 9:00 – 10:00am: Fatemeh Koochaki, fatmakoochaki@gmail.com, Sections 1 & 5
Monday 3:30 – 4:30pm: Chrysanthi Koumpouzi, chrys.koumpouzi@rutgers.edu, Class TA
Friday 4:00 – 5:00pm Poojankuma Oza, poojan.oza@rutgers.edu, Sections 3 & 6

Please direct your questions about the quizzes and homework to the class TA Chrys Koumpouzi,
and feel free to contact any TA for technical questions about the course material.

Prerequisites: calculus

Grading: quizzes & homework 20%, 2 midterm exams 20% each, final exam 40%.
The midterm exams will be in class on February 15 and March 29, closed books and notes.

Text: Two textbooks available online (click on the book title below):
1. Introduction to Probability by Grinstead and Snell
2. Introduction to Probability, Statistics, and Random Processes by Pishro-Nik

Course notes: given per week in separate documents on the class Sakai page.
BASIC SET THEORY

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, January 18, 2018

1/9
(Random) Experiments

2/9
An Experiment & its Set of Outcomes

The set of outcomes for the experiment of tossing this 20-faced coin

is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}.
I A set is a collection of some items (elements).
I Sample space is the set of all possible outcomes of an experiment.
I An empty set is the set with no elements.

3/9
An Experiment & its Set of Outcomes

The set of outcomes for the experiment of tossing this 20-faced coin
is “in the eyes of the beholder” (or a measuring apparatus).

What if I cannot identify the digits, but can tell if there are two or one?

4/9
Membership and Inclusion

Elements belong to sets (or not) and sets contain elements (or not).
Sets are subsets of other sets (or not).
The set A is a subset of B if every element of A is also an element of B.
What is B to A?

Notation:
I We use upper case letters for sets (and lower case for set elements).
I We use Ω for the sample space, and ∅ for the empty set.
I ∈ means “belongs to”. How about ∋ and ∉?
Example: If Ω = {H, T}, then H ∈ Ω and Ω ∋ T and a ∉ Ω.
I ⊂ means “is subset of” as in A ⊂ B. Is B ⊃ A?

What can we say if ⌦ = {1, 2, 3, 4, 5, 6}, A = {1, 2, 3}, and B = {2, 4, 6}?

5/9
Set Operations
Let A and B be two sets.

I The union of A and B is the set

A ∪ B = {x | x ∈ A or x ∈ B} .

I The intersection of A and B is the set

A ∩ B = {x | x ∈ A and x ∈ B} .

I The difference of A and B is the set

A - B = A \ B = {x | x ∈ A and x ∉ B} .

I The complement of A is the set

Ā = Ã = Aᶜ = {x | x ∈ Ω and x ∉ A} .

6/9
Set Operations – Venn Diagrams

[Venn diagrams illustrating A ∩ B, A ∪ B, the difference A \ B, and the complement Aᶜ.]
7/9
How do Sets Relate to Each Other?

Example: If A = {1, 3, 5}, and B = {2, 4, 6}, what is A ∩ B?


I If A ∩ B = ∅, we say that A and B are disjoint or mutually exclusive.

Example: If Ω = {1, 2, 3, 4, 5, 6},


what do A1 = {1, 2}, A2 = {3, 4}, A3 = {5, 6} together define?
I Sets Ai, i = 1, . . . , n define a partition of Ω if
1. they are pairwise disjoint, that is, Ai ∩ Aj = ∅ for i ≠ j, and
2. their union is Ω, that is A1 ∪ A2 ∪ · · · ∪ An = Ω.

8/9
Reading Material

For this week:


1.0, 1.1, 1.2 in Introduction to Probability, Statistics, ...

For near future (only if you wish, proceed with caution):


1.3 in Introduction to Probability, Statistics, ...
1.2 in Introduction to Probability

9/9
BASIC PROBABILITY

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, January 22, 2018

1 / 13
An Experiment, Its Outcomes & Their Probabilities

The set of outcomes for the die-rolling experiment is {1, 2, 3, 4, 5, 6}.

Sample space ⌦ is the set of all possible outcomes of an experiment.


Elements of ⌦ are called outcomes, and its subsets are called events.
Example:
“An even number of dots turned up” describes an outcome or an event?

We shall study experiments from a probabilistic point of view.


We start with experiments whose sample spaces are finite sets.

2 / 13
Exercise

I A, B, C are three events in a sample space ⌦.


I An experiment is made and the following is learned:
The outcome is in event A, but neither in B nor in C.
Express the resulting event in terms of the events A, B, and C
using only the complement, union, and intersection operations.

A ∩ (B ∪ C)ᶜ

(There may be several ways to write the same statement.)

3 / 13
Exercise continued
The outcome is
1. in at most one of the events A or B.
(A ∩ B)ᶜ

2. not in any of the events A, B or C.

(A ∪ B ∪ C)ᶜ

3. both in event A and in event B, but not in event C.


A ∩ B ∩ Cᶜ

4. either in event A or, if it is not, then it is not in event B either.


A ∪ (Aᶜ ∩ Bᶜ)

4 / 13
An Experiment, Its Outcomes & Their Probabilities

The size of the sample space (any set) Ω is denoted by |Ω|.


We assign probabilities to the possible outcomes of an experiment by
assigning to outcome j a nonnegative number µj in such a way that

∑_{j=1}^{|Ω|} µj = 1 .

Example: A die is rolled once:


I the sample space for this experiment is the 6-element set

Ω = {1, 2, 3, 4, 5, 6} ,

I outcomes correspond to the 6 faces of the die,


I probability µj = 1/6 for each outcome j ∈ Ω if the die is unbiased.

5 / 13
An Experiment & Its Associated Random Variable(s)

The outcome of a random experiment is called a random variable, RV .


Random variables are denoted by capital letters.
Example: A die is rolled once.
I ⌦ = {1, 2, 3, 4, 5, 6} is the sample space of this experiment.
I We let random variable X denote the outcome of this experiment.
I “Outcome j happens with probability µj .”
,
”X takes value j with probability µj .”
I The event E = {2, 4, 6} can be described by saying
“The result of the roll is an even number.”
,
“X is even.”

6 / 13
An Experiment & Its Associated Random Variable(s)

Can we associate more than one RV to an experiment?


The set of outcomes for the experiment of tossing this 20-faced coin
is “in the eyes of the beholder” (or a measuring apparatus).

We can have RVs X and Y with the respective sample spaces


Ω_X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
Ω_Y = {“single-digit number”, “double-digit number”}

7 / 13
Random Variable & its Probability Distribution
For an experiment, we have
X: RV denoting the value of its outcomes
⌦: the sample space (i.e., the set of all possible values of X)

A distribution function for X is defined as a real-valued function µ


whose domain is Ω and which satisfies:
1. µ(ω) ≥ 0 for all outcomes ω ∈ Ω
&
2. ∑_{ω∈Ω} µ(ω) = 1 .

We define the probability of an event E (E ⊂ Ω) to be the number P(E)


P(E) = ∑_{ω∈E} µ(ω) .

Note that the probability P(E) is determined from the distribution µ.


8 / 13
Properties of Probability

We claim the following:

1. P(E) ≥ 0 for every E ⊂ Ω


2. P(Ω) = 1
3. If E ⊂ F ⊂ Ω then P(E) ≤ P(F)
4. If A and B are disjoint subsets of Ω
then P(A ∪ B) = P(A) + P(B)
5. P(Ā) = 1 - P(A) for every A ⊂ Ω

9 / 13
A Claim and a Proof

Claim: If E ⊂ F ⊂ Ω, then P(E) ≤ P(F).

Proof:
I By the definition of the probability of an event, we have

P(E) = ∑_{ω∈E} µ(ω) and P(F) = ∑_{ω∈F} µ(ω)

I We note that

P(F) = ∑_{ω∈F} µ(ω) = ∑_{ω∈E} µ(ω) + ∑_{ω∈F\E} µ(ω) ≥ P(E)

since the first sum equals P(E) and the second sum is nonnegative.

10 / 13
A Claim and a Proof

Claim:
Let A1, . . . , An be a partition of Ω, and let E be any event. Then

P(E) = ∑_{i=1}^{n} P(E ∩ Ai)

Proof reasoning:
I What is a partition? The Ai are pairwise disjoint and
A1 ∪ · · · ∪ An = Ω.
I What about the sum? Have we seen a sum of event probabilities earlier?
If A and B are disjoint then P(A ∪ B) = P(A) + P(B)
I The claim should hold for any n; how about n = 2?
I What is the union of the sets E ∩ Ai? What is their intersection?

11 / 13
What can we say about the sets E ∩ Ai?

When A1, . . . , An form a partition of Ω, then for any E ⊂ Ω,


E = ∪_{i=1}^{n} (E ∩ Ai)


[Figure: the event E overlapping the partition cells A1, A2, A3.]
12 / 13
Reading Material

For this class:


1.3 in Introduction to Probability, Statistics, ...
1.2 in Introduction to Probability

For next class (only if you wish, proceed with caution):


1.4 in Introduction to Probability, Statistics, ...
4.1 in Introduction to Probability

13 / 13
CONDITIONAL PROBABILITY

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, January 25, 2018

1 / 18
An Experiment, Its Outcomes & Events

Sample space ⌦ is the set of all possible outcomes of an experiment.


Elements of ⌦ are called outcomes, and its subsets are called events.

Example:
The sample space for the die-rolling experiment is ⌦ = {1, 2, 3, 4, 5, 6}.
“The number of dots that turned up is divisible by 3” is an event.

How can we use the knowledge that an event occurred?

2 / 18
Re-Assessing Beliefs

I If an unbiased die is rolled, we believe each number is equally likely.


I Suppose we learn that "the number that turned up is divisible by 3".
Q: How should we change our beliefs about which face turned up?

If µ is the a priori distribution function over Ω = {1, 2, 3, 4, 5, 6},


what is the distribution µ' after the "divisible by 3" event?

face:  1    2    3    4    5    6
µ      1/6  1/6  1/6  1/6  1/6  1/6
µ'     0    0    1/2  0    0    1/2

3 / 18
Conditional Probability

Let µ be a distribution function assigned to Ω.


Recall that
I µ(ω) ≥ 0 for all ω ∈ Ω and ∑_{ω∈Ω} µ(ω) = 1
I the event probability is determined by µ as P(F) = ∑_{ω∈F} µ(ω).

Suppose we learn that an event E has occurred.


How should we change (update) µ? How about the event probability?

We shall call the new probability for an event F


the conditional probability of F given E and denote it by P(F|E).

4 / 18
Conditional Probability Examples

For the "rolling an unbiased die" experiment,


I µ is the a priori distribution function
I µ' is the distribution function after the "divisible by 3" event E.

face:  1    2    3    4    5    6
µ      1/6  1/6  1/6  1/6  1/6  1/6
µ'     0    0    1/2  0    0    1/2
If we know that E has taken place, what can we say about other events?

What is P(F) and P(F|E) for the following events F ?


"number strictly smaller than 3" and "even number"

5 / 18
Conditional Probability Computing

Before the experiment:

E E\F F

6 / 18
Conditional Probability Computing

After the experiment which resulted in event E:

E is what remains of Ω

E ∩ F is what remains of F

What is then P(F|E)?


P(F|E) = P(E ∩ F) / P(E)

7 / 18
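A minimal Python sketch (not from the notes, assuming the fair-die example used in these slides) that estimates P(F|E) by simulation for E = "divisible by 3" and F = "even", and compares it with P(E ∩ F)/P(E):

    import random

    # Fair-die simulation: estimate P(F|E) and compare with P(E ∩ F)/P(E).
    # E = "number divisible by 3", F = "number is even".
    random.seed(0)
    trials = 100_000
    count_E = 0        # rolls where E occurred
    count_E_and_F = 0  # rolls where both E and F occurred

    for _ in range(trials):
        roll = random.randint(1, 6)
        if roll % 3 == 0:
            count_E += 1
            if roll % 2 == 0:
                count_E_and_F += 1

    print("estimated P(F|E):", count_E_and_F / count_E)  # ≈ 1/2 (only 6 is even in {3, 6})
    print("P(E ∩ F)/P(E)   :", (1/6) / (2/6))            # exactly 1/2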
An Example – Revising Belief

I If an unbiased die is rolled, we believe each number is equally likely.


I Suppose we learn that "the number that turned up is divisible by 3".
Q: How should we change our belief that 6 turned up? What is P(F|E)?

A: We need to answer the following questions:


I What is event E and P(E)?
I What is event F and P(F)?
I What is event E \ F and P(E \ F)?
I What is P(F|E)?

8 / 18
An Example – Is New Knowledge Always Helpful?

I If an unbiased die is rolled, we believe each number is equally likely.


I Suppose we learn that "the number that turned up is even".
Q: What is then the probability that it is divisible by 3? What is P(F|E)?

A: We need to answer the following questions:


I What is event E and P(E)?
I What is event F and P(F)?
I What is event E \ F and P(E \ F)?
I What is P(F|E)?

9 / 18
The Bayes Rule

We have seen that P(F|E) = P(E ∩ F) / P(E). What is P(E|F)?

P(E|F) = P(F ∩ E) / P(F)

Note that P(E ∩ F) = P(E)P(F|E) = P(E|F)P(F)

We get the Bayes Rule:

P(F|E) = P(E ∩ F) / P(E) = P(E|F)P(F) / P(E)

10 / 18
A Reminder

When A1, . . . , An form a partition of Ω, then for any E ⊂ Ω,


E = ∪_{i=1}^{n} (E ∩ Ai)


[Figure: the event E overlapping the partition cells A1, A2, A3.]

11 / 18
Total Probability

Two facts we know:


1. When A1, . . . , An form a partition of Ω, then for any E ⊂ Ω,

P(E) = ∑_{i=1}^{n} P(E ∩ Ai)

2. For any E, F ⊂ Ω,

P(F|E) = P(E ∩ F) / P(E)

From 1 and 2, we get the total probability formula:

P(E) = ∑_{i=1}^{n} P(E | Ai)P(Ai)

12 / 18
An Example
One of the 3 biased dice is picked uniformly at random and rolled.
Each die is equally likely to be picked as any other.


face:  1    2    3    4    5    6
µW     1/6  1/6  1/6  1/6  1/6  1/6
µB     0    0    1/2  0    0    1/2
µR     1/3  0    1/3  0    1/3  0

What is the probability of getting ? How about ?


If comes up, what is the probability the white die was picked?
13 / 18
An Example(continued)

I Note that the sample space of this experiment is


{W1, W2, W3, W4, W5, W6, B1, B2, B3, B4, B5, B6, R1, R2, R3, R4, R5, R6}
I How do we partition this space?
I How do we use the formula of total probability?

14 / 18
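A small Python sketch applying total probability and the Bayes rule to this three-dice example. The specific die faces asked about on the slide did not survive extraction, so, purely as an illustration, it computes P(roll = 3) and P(white die | roll = 3):

    # Total probability and Bayes' rule for the three-dice example.
    mu = {
        "W": [1/6, 1/6, 1/6, 1/6, 1/6, 1/6],   # white die PMF over faces 1..6
        "B": [0,   0,   1/2, 0,   0,   1/2],   # black die PMF
        "R": [1/3, 0,   1/3, 0,   1/3, 0],     # red die PMF
    }
    p_die = {"W": 1/3, "B": 1/3, "R": 1/3}      # each die equally likely to be picked

    face = 3
    # Total probability: P(face) = sum_d P(face | die d) P(die d)
    p_face = sum(mu[d][face - 1] * p_die[d] for d in mu)
    # Bayes: P(white | face) = P(face | white) P(white) / P(face)
    p_white_given_face = mu["W"][face - 1] * p_die["W"] / p_face

    print("P(roll = 3)           =", p_face)               # (1/6 + 1/2 + 1/3)/3 = 1/3
    print("P(white die | roll 3) =", p_white_given_face)   # (1/18)/(1/3) = 1/6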
Independent Events – An Example

Experiment: One of the 3 dice is picked at random and rolled.


Each die is unbiased and equally likely to be picked as any other.

The sample space of this experiment is


{W1, W2, W3, W4, W5, W6, B1, B2, B3, B4, B5, B6, R1, R2, R3, R4, R5, R6}

If an even number turned up, what is the probability that the die is red?
If the red die is picked, what is the probability that the number is even?

15 / 18
Independent Events

If two events E and F have positive probabilities and

P(E|F) = P(E) and P(F|E) = P(F)

we say that E and F are independent.

Claim:
Two events E and F are independent if and only if

P(E ∩ F) = P(E)P(F) .

Proof :
P(E ∩ F) = P(E|F)P(F) = P(E)P(F)

16 / 18
Independence is Tricky

In the "unbiased die rolling experiment",


are "number divisible by 3" and "even number" independent events?

In the "three dice rolling" experiment, if each die is differently biased,


are "red color" and "even number" independent events?

17 / 18
Reading Material

For this class:


1.4 in Introduction to Probability, Statistics, ...
4.1 in Introduction to Probability

For next class (only if you wish, proceed with caution):


2 in Introduction to Probability, Statistics, ...
3 in Introduction to Probability

18 / 18
COUNTING METHODS

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, January 29, 2018

1 / 14
A Two-Stage Experiment

Experiment: One of the 3 dice is 1) picked at random and 2) rolled.

The sample space of this experiment is:


{W1, W2, W3, W4, W5, W6, B1, B2, B3, B4, B5, B6, R1, R2, R3, R4, R5, R6}
I The sub-experiment of picking a die has 3 outcomes.
I The sub-experiment of rolling the picked die has 6 outcomes.

Claim: An experiment that consists of two sub-experiments,


one with n and the other k outcomes, has n · k outcomes.

2 / 14
Sampling With and Without Replacement

An urn with n balls numbered from 1 to n.

I Sampling with replacement :


A ball is drawn, its number is recorded, then the ball is returned.
I Sampling without replacement :
A ball is drawn, its number is recorded, then the ball is removed.

Besides, we may (or not) care about the order of the recorded numbers.

3 / 14
Sampling With and Without Replacement

Example:

An urn with 52 balls numbered from 1 to 52.

Pick and record 7 cards from a regular deck of 52 cards w/o replacement.
Q: How many different ordered sequences of cards are possible?
A: 52 · 51 · 50 · 49 · 48 · 47 · 46

If we drew cards w. replacement, there would be 527 different sequences.

4 / 14
k-permutations of n

An urn with n balls numbered from 1 to n.

Experiment: k balls are drawn w/o replacement and their numbers


recorded. An outcome is the ordered sequence of k numbers. ⇒
The size of the sample set is

n · (n - 1) · (n - 2) · · · (n - k + 1) = n! / (n - k)!
        (k factors)

where the factorial function ℓ! of ℓ is defined as the product ℓ! = ∏_{j=1}^{ℓ} j
and 0! = 1. Note that ℓ! = ℓ · (ℓ - 1)!.

5 / 14
n-permutations of n

An urn with n balls numbered from 1 to n.

Experiment: k = n balls are drawn w/o replacement and their numbers


recorded. An outcome is the ordered sequence of n numbers. ⇒
The size of the sample set is

n · (n - 1) · (n - 2) · · · (n - k + 1) = n!
        (k = n factors)

⇒ n different objects can be ordered in n! ways.

6 / 14
k-combinations of n

An urn with n balls numbered from 1 to n.

Experiment: k balls are drawn w/o replacement and their numbers


recorded. An outcome is the un-ordered sequence of k numbers. ⇒
The size of the sample set is

n · (n - 1) · (n - 2) · · · (n - k + 1) · (1/k!) = n! / ((n - k)! k!)
        (k factors)

n! / ((n - k)! k!) is known as the binomial coefficient and denoted (n choose k).

7 / 14
Drawing with Replacement

An urn with n balls numbered from 1 to n.

Experiment: k balls are drawn with replacement and their numbers


recorded. An outcome is the ordered sequence of k numbers. ⇒
The size of the sample set is

n · n · n · · · n = n^k
        (k factors)

8 / 14
The Birthday Problem

Suppose you have 23 friends.

1. Would you bet that two (any two) have the same birthday?
Yes, if the probability that this happens is higher than the probability that it does not.

Probability that no two people have the same birthday is

365 · (365 - 1) · (365 - 2) · · · (365 - 23 + 1) / 365^23   ≷  1/2 ?

2. Would you bet that at least one has the same birthday as yours?
Probability that no friend has the same birthday is

(364/365)^23   ≷  1/2 ?

9 / 14
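A short Python sketch that evaluates the two probabilities above exactly (assuming 365 equally likely birthdays and no leap years):

    # Birthday problem: the two probabilities on this slide.
    n = 23

    # 1) P(no two of the 23 people share a birthday)
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (365 - i) / 365
    print("P(no shared birthday among 23) :", round(p_no_match, 4))        # ≈ 0.4927 < 1/2

    # 2) P(no friend shares *your* birthday)
    p_none_match_mine = (364 / 365) ** n
    print("P(no friend shares my birthday):", round(p_none_match_mine, 4)) # ≈ 0.9388 > 1/2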
Tree Diagrams
Tree diagrams are used to study experiments that take place in stages,
e.g., ordering food in a restaurant (appetizer, main dish, dessert):

[Tree diagram: appetizer (soup or juice) → main dish (meat, fish, or vegetable)
→ dessert (ice cream or cake).]

How many possible choices are there for the complete meal?
10 / 14
Tree Diagrams – Total Probability and Bayes Rule
[Tree diagram: first stage P(soup) = .8, P(juice) = .2; second stage given soup:
meat .5, fish .3, vegetable .2; given juice: meat .3, fish .4, vegetable .3.
Leaf probabilities m(ω): ω1 = .4, ω2 = .24, ω3 = .16, ω4 = .06, ω5 = .08, ω6 = .06.]

P(meat) = P(meat|soup)P(soup) + P(meat|juice)P(juice)      (two paths)


P(soup|meat) = P(meat|soup)P(soup)/P(meat)
11 / 14
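A numeric check of the two formulas above in Python, using the branch probabilities as I read them off the tree (soup .8 / juice .2; meat given soup .5, meat given juice .3):

    # Total probability and Bayes rule on the restaurant tree.
    p_soup, p_juice = 0.8, 0.2
    p_meat_given_soup, p_meat_given_juice = 0.5, 0.3

    p_meat = p_meat_given_soup * p_soup + p_meat_given_juice * p_juice   # total probability
    p_soup_given_meat = p_meat_given_soup * p_soup / p_meat              # Bayes rule

    print("P(meat)      =", p_meat)                          # 0.4 + 0.06 = 0.46
    print("P(soup|meat) =", round(p_soup_given_meat, 4))     # 0.4 / 0.46 ≈ 0.8696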
Repeated Coin Tosses

Find the probability of getting exactly 2 heads in 4 tosses of a biased coin.

Observations:
(We denote a head by 0, a tail by 1, and P(0) = p, P(1) = 1 - p.)
1. The sample space consists of all 4-bit binary strings.
(Each bit corresponds to a coin toss sub-experiment.)
2. The event of interest is E = {1100, 1010, 1001, 0101, 0110, 0011}.
Note that there are (4 choose 2) outcomes in E.
3. The probability of each outcome in E is p^2 (1 - p)^2.
⇒
P(E) = (4 choose 2) p^2 (1 - p)^2

12 / 14
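A brute-force Python check of this result by enumerating all 4-bit strings (an illustrative sketch; the value p = 0.3 is an arbitrary choice, with head = 0 as on the slide):

    from itertools import product
    from math import comb

    # P(exactly 2 heads in 4 tosses) by enumeration vs. C(4,2) p^2 (1-p)^2.
    p = 0.3
    prob = 0.0
    for bits in product("01", repeat=4):       # all 4-bit strings
        if bits.count("0") == 2:               # exactly two heads
            prob += p**2 * (1 - p)**2          # each such string has this probability

    print("enumeration:", prob)
    print("formula    :", comb(4, 2) * p**2 * (1 - p)**2)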
Tree Diagrams for Coin Tosses

[Binary tree of depth 4: each level branches to H with probability p and to T with
probability 1 - p; the 16 leaves correspond to the outcomes labeled 0–15.]

13 / 14
Reading Material

For this class:


2 in Introduction to Probability, Statistics, ...
3 in Introduction to Probability

For next class (only if you wish, proceed with caution):


3.1 in Introduction to Probability, Statistics, ...
5.1 in Introduction to Probability

14 / 14
DISCRETE RANDOM VARIABLES

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, February 1, 2018

1 / 16
An Experiment & Its Associated Random Variable(s)

A random variable assigns a number to each outcome of an experiment.


Random variables are denoted by capital letters.
Example: A die is rolled once.
I ⌦ = {1, 2, 3, 4, 5, 6} is the sample space of this experiment.
I We can associate RV X with the outcome of this experiment.
“Outcome j happens with probability µj .”
, ” RV X takes value j with probability µj .”
I The event E = {2, 4, 6} can be described by saying
“The result of the roll is an even number.”
, “X is even.”
The set of values that an RV can take is its range (sample space).

2 / 16
An Experiment & Its Associated Random Variable(s)

Can we associate more than one RV to an experiment?


The set of outcomes for the experiment of tossing this 20-faced coin
is “in the eyes of the beholder” (or a measuring apparatus).

We can have RVs X and Y with their respective ranges


R_X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
RY = {“single-digit number”, “double-digit number”}

3 / 16
Discrete Random Variables

An RV is discrete if its range is either finite or countably infinite.

A set is countably infinite if its elements can be counted, i.e.,


can be put in one-to-one correspondence with the positive integers.

Many properties of finite sets generalize to countably infinite sets, e.g.,


For any countable collection E1, E2, . . . of mutually exclusive events,

P(E1 ∪ E2 ∪ . . . ) = P(E1) + P(E2) + . . .

4 / 16
Random Variable & its Probability Distribution

For an experiment, we have


X: RV denoting the value of its outcomes
⌦X : the sample space (i.e., the set of all possible values of X)

A distribution function for X is defined as a real-valued function µ


whose domain is Ω_X and which satisfies:
1. µ(ω) ≥ 0 for all outcomes ω ∈ Ω_X
&
2. ∑_{ω∈Ω_X} µ(ω) = 1 .

For discrete RVs, µ is also called the probability mass function (PMF).

5 / 16
Bernoulli Random Variable

A Bernoulli RV is associated with any experiment with two outcomes,


such as a coin toss with outcomes H and T, which we call 1 and 0.

X ∼ Bernoulli(p) means P(X = 1) = p and P(X = 0) = 1 - p

The indicator RV for an event E, defined as

1_E := 1 if E occurs, and 0 otherwise,

is Bernoulli(P(E)).

6 / 16
k-combinations of n

An urn with n balls numbered from 1 to n.

Experiment: k balls are drawn w/o replacement and their numbers


recorded. An outcome is the un-ordered sequence of k numbers. ⇒
The size of the sample set is

n · (n - 1) · (n - 2) · · · (n - k + 1) · (1/k!) = n! / ((n - k)! k!)
        (k factors)

n! / ((n - k)! k!) is known as the binomial coefficient and denoted (n choose k).

7 / 16
Binomial Coefficients

Consider all strings of n binary digits 0 and 1.


I How many such sequences do we have? 2^n
I How many of them have exactly k zeros? (n choose k)
I ∑_{k=0}^{n} (n choose k) = 2^n

8 / 16
Repeated Coin Tosses

Find the probability of getting exactly 2 heads in 4 tosses of a biased coin.

Observations:
(We denote a head by 1, a tail by 0, and P(1) = p, P(0) = 1 - p.)
1. The sample space consists of all 4-bit binary strings.
(Each bit corresponds to a coin toss sub-experiment.)
2. The event of interest is E = {1100, 1010, 1001, 0101, 0110, 0011}.
Note that there are (4 choose 2) outcomes in E.
3. The probability of each outcome in E is p^2 (1 - p)^2.
⇒
P(E) = (4 choose 2) p^2 (1 - p)^2

9 / 16
Tree Diagrams for Coin Tosses

[Binary tree of depth 4: each level branches to H with probability p and to T with
probability 1 - p; the 16 leaves correspond to the outcomes labeled 0–15.]

10 / 16
Bernoulli Trials

A Bernoulli trials process is a sequence of chance experiments such that

1. Each experiment has two possible outcomes, which we may call


H & T , 1 & 0 , or in general, success & failure .
2. The outcome of each experiment is independent of other outcomes.
)
The probability of success p is
I the same for each experiment
I not affected by any knowledge of previous outcomes
We often use q to denote the probability of failure q = 1 - p.

11 / 16
RVs Associated with Bernoulli Trials

X - the number of heads observed in n coin tosses (P(H) = p)

Q: Why/When do we care about X?


A: e.g. When we store data over failing disks.

Q: Which values can X take? A: X ∈ {0, 1, 2, . . . , n}

Q: What is the probability of X taking a certain value?


A: The probability of getting exactly k heads in n tosses is

P(X = k) = (n choose k) p^k (1 - p)^(n-k)

X ⇠ B(n, p) means X is a Binomial RV with parameters n and p.

12 / 16
RVs Associated with Bernoulli Trials

Y - total number of coin tosses until the first head appears

Q: Why/When do we care about Y?


A: e.g. When we send data packets over lossy networks.

Q: Which values can Y take? A: Y = 1, 2, . . .

Q: What is the probability of Y taking a certain value?


A: The probability that the first head is seen at the k-th toss is

P(Y = k) = (1 - p)^(k-1) · p

Y ∼ Geometric(p) means Y is a Geometric RV with parameter p.

13 / 16
RVs Associated with Bernoulli Trials

Z - total number of coin tosses until the ℓ-th head appears

Q: Why/When do we care about Z?


A: e.g. When we stream coded data packets over lossy networks.

Q: Which values can Z take? A: Z = `, ` + 1, . . .

Q: What is the probability of Z taking a certain value?


A: The probability that the ℓ-th head is seen at the k-th toss is

P(Z = k) = (k-1 choose ℓ-1) p^(ℓ-1) (1 - p)^(k-ℓ) · p

Z ∼ NB(ℓ, p) means Z is a Negative Binomial RV with parameters ℓ and p.

14 / 16
Two More Discrete RVs

X ∼ Poisson(λ) means X is a Poisson RV with parameter λ:

P(X = k) = λ^k e^(-λ) / k!   for k = 0, 1, 2, . . .

An RV U is uniform on a sample space Ω_U of size n if

P(U = ω) = 1/n   for each ω ∈ Ω_U

15 / 16
Reading Material

For this class:


3.1 in Introduction to Probability, Statistics, ...
5.1 in Introduction to Probability

For next class (only if you wish, proceed with caution):


3.2 in Introduction to Probability, Statistics, ...
6.1, 6.2 in Introduction to Probability

16 / 16
DISCRETE RANDOM VARIABLES

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, February 5, 2018

1 / 18
A Gambling Game

An unbiased die is rolled.


I If an odd number turns up, you win that number of dollars.
I If an even number turns up, you lose that number of dollars.
Would you play this game? Do you expect to win or to lose money?

The RV associated with the financial transaction takes the values in


{+1, -2, +3, -4, +5, -6}, each with probability 1/6. ⇒
The expected win/loss is

1 · (1/6) - 2 · (1/6) + 3 · (1/6) - 4 · (1/6) + 5 · (1/6) - 6 · (1/6) = -0.5 .

2 / 18
The Expected Value

X is a numerically-valued discrete RV with the range RX = {x1 , x2 , . . . }


The expected value (aka the mean) E(X) of X is defined as

E(X) = ∑_j xj P(X = xj)

provided this sum converges (otherwise, X does not have a mean).

Example: X ∼ Bernoulli(p) ⇒

E(X) = 1 · P(X = 1) + 0 · P(X = 0) = p

3 / 18
Some Properties of the Binomial Coefficients

(n choose k) = n! / (k! (n - k)!)   for 0 ≤ k ≤ n

The following holds:


1. (n choose k) = (n choose n-k)   for 0 ≤ k ≤ n,

2. (n choose k) = (n/k) (n-1 choose k-1)

3. (x + y)^n = ∑_{k=0}^{n} (n choose k) x^(n-k) y^k      (the binomial formula)

4 / 18
The Expectation of the Binomial RV

X ∼ B(n, p) ⇒ P(X = k) = (n choose k) p^k (1 - p)^(n-k) for k = 0, 1, . . . , n ⇒

E[X] = ∑_{k=0}^{n} k · (n choose k) p^k (1 - p)^(n-k)

     = ∑_{k=1}^{n} k · (n/k) (n-1 choose k-1) p^k (1 - p)^(n-k)

     = np ∑_{k=1}^{n} (n-1 choose k-1) p^(k-1) (1 - p)^((n-1)-(k-1))

     = np ∑_{ℓ=0}^{m} (m choose ℓ) p^ℓ (1 - p)^(m-ℓ)      (with m = n - 1, ℓ = k - 1)

     = np

5 / 18
The Sum of Two Random Variables
Claim: Let X and Y be RVs with finite expected values. Then

E(X + Y) = E(X) + E(Y)

Proof: Suppose that R_X = {x1, x2, . . .} and R_Y = {y1, y2, . . .}. Then


E(X + Y) = ∑_j ∑_k (xj + yk) P(X = xj, Y = yk)

         = ∑_j ∑_k xj P(X = xj, Y = yk) + ∑_j ∑_k yk P(X = xj, Y = yk)

         = ∑_j xj ∑_k P(X = xj, Y = yk) + ∑_k yk ∑_j P(X = xj, Y = yk)

         = ∑_j xj P(X = xj) + ∑_k yk P(Y = yk) = E(X) + E(Y)

6 / 18
The Expectation of the Binomial RV

X ∼ B(n, p) ⇒ X = Y1 + Y2 + · · · + Yn

where Yi ∼ Bernoulli(p) for i ∈ {1, 2, . . . , n} ⇒ E[Yi] = p

⇒
E[X] = E[Y1 + Y2 + · · · + Yn]
     = E[Y1] + E[Y2] + · · · + E[Yn]
     = np

7 / 18
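A Monte Carlo sketch in Python (not from the notes; n and p are arbitrary choices) that builds a Binomial RV as a sum of Bernoulli trials and checks that its empirical mean is close to np:

    import random

    random.seed(1)
    n, p, trials = 20, 0.3, 100_000

    total = 0
    for _ in range(trials):
        x = sum(1 for _ in range(n) if random.random() < p)   # X = Y1 + ... + Yn
        total += x

    print("empirical mean:", total / trials)   # ≈ 6.0
    print("n * p         :", n * p)            # 6.0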
The Expectation of the Poisson RV

X ∼ Poisson(λ) ⇒ P(X = k) = λ^k e^(-λ) / k! for k = 0, 1, 2, . . . ⇒

E[X] = ∑_{k=0}^{∞} k · λ^k e^(-λ) / k!

     = e^(-λ) ∑_{k=1}^{∞} λ^k / (k - 1)!

     = λ e^(-λ) ∑_{k=1}^{∞} λ^(k-1) / (k - 1)!

     = λ e^(-λ) · e^λ = λ

8 / 18
The Expectation of the Geometric RV

X ∼ Geometric(p) ⇒ P(X = k) = (1 - p)^(k-1) p for k = 1, 2, . . .      (write q = 1 - p)

E[X] = ∑_{k=1}^{∞} k · q^(k-1) p = p ∑_{k=1}^{∞} k q^(k-1)

     = p ∑_{k=1}^{∞} d/dq [q^k] = p d/dq [∑_{k=1}^{∞} q^k]

     = p d/dq [q/(1 - q)] = p · 1/(1 - q)^2

     = 1/p

9 / 18
The Expectation of the Negative Binomial (Pascal) RV

X ∼ NB(ℓ, p) ⇒ X = Y1 + Y2 + · · · + Yℓ

where Yi ∼ Geometric(p) for i ∈ {1, 2, . . . , ℓ} ⇒ E[Yi] = 1/p

⇒
E[X] = E[Y1 + Y2 + · · · + Yℓ]
     = E[Y1] + E[Y2] + · · · + E[Yℓ]
     = ℓ/p

10 / 18
The Variance

How useful is the mean in predicting the outcome of an experiment?


i.e., How useful is E(X) in predicting the value that X will take?
That depends on how likely X is to deviate from E(X).
We measure the deviation by the variance of X, denoted by V(X):

V(X) = E[(X - E(X))^2]

for a numerically valued random variable X.


The standard deviation of X is defined as D(X) = √V(X).

We often use µ_X and σ_X^2 to denote the mean and variance of RV X.

11 / 18
Example # 1
A fair die is rolled once; let X be the number that turns up. Find V(X).
I To find V(X), we must first find the expected value of X:

E(X) = 1 · (1/6) + 2 · (1/6) + 3 · (1/6) + 4 · (1/6) + 5 · (1/6) + 6 · (1/6) = 7/2

I To find the variance V(X),
we form the RV (X - E(X))^2 and find its expectation:

X              1     2    3    4    5    6
PMF            1/6   1/6  1/6  1/6  1/6  1/6
(X - E(X))^2   25/4  9/4  1/4  1/4  9/4  25/4

From this table, we find that E((X - E(X))^2) is


V(X) = (1/6) (25/4 + 9/4 + 1/4 + 1/4 + 9/4 + 25/4) = 35/12 ,

and the standard deviation D(X) = √(35/12) ≈ 1.707 .
12 / 18
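A short Python sketch that reproduces the fair-die computation above exactly (using fractions so the results 7/2 and 35/12 come out exactly):

    from fractions import Fraction as F
    from math import sqrt

    faces = [1, 2, 3, 4, 5, 6]
    pmf = {x: F(1, 6) for x in faces}

    mean = sum(x * pmf[x] for x in faces)                # 7/2
    var = sum((x - mean) ** 2 * pmf[x] for x in faces)   # 35/12
    print("E(X) =", mean, " V(X) =", var, " D(X) ≈", round(sqrt(var), 3))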
Example # 2

A biased die is rolled once; let X be the number that turns up.
Find V(X) if P(3) = P(4) = 1/2.

I E(X) = 7/2
I To find the variance V(X),
we form the RV (X - E(X))^2 and find its expectation:

X              1     2    3    4    5    6
PMF            0     0    1/2  1/2  0    0
(X - E(X))^2   25/4  9/4  1/4  1/4  9/4  25/4

From this table, we find E((X - E(X))^2) = 1/4 and D(X) = 1/2 .

13 / 18
Example # 3

A biased die is rolled once; let X be the number that turns up.
Find V(X) if P(1) = P(6) = 1/2.

I E(X) = 7/2
I To find the variance V(X),
we form the RV (X - E(X))^2 and find its expectation:

X              1     2    3    4    5    6
PMF            1/2   0    0    0    0    1/2
(X - E(X))^2   25/4  9/4  1/4  1/4  9/4  25/4

From this table, we find E((X - E(X))^2) = 25/4 and D(X) = 5/2 .

14 / 18
Point Processes

Consider RU students arriving to the Busch Center over a 2 hour period.


We know (assume):
I the average number of arrivals over a period of time, e.g.,
I 600 between 11:30am and 1:30pm, i.e., 5 arrivals per minute
I 120 between 2:30pm and 4:30pm, i.e., 1 arrival per minute
I an arrival can happen at any time
I the number of arrivals in non-overlapping intervals are independent

We can use the Binomial distribution to model such arrival scenarios.

15 / 18
Point Processes

We can use the Binomial distribution to model such arrival scenarios:


I λ is the arrival rate (the number of arrivals in a unit time interval)
I The unit time interval is broken up into n subintervals of equal
length s.t. either 0 or 1 occurrences can happen in a subinterval.

[Timeline: arrivals marked over the n = 60 seconds in 1 minute.]

⇒ the number of occurrences in the 1 minute interval is a B(n, p).


What is p?
I We know that
I the expected number of arrivals is λ · 1
I the mean of X ∼ B(n, p) is n · p
⇒ p = λ/n. (Note that p → 0 as n → ∞)

16 / 18
Poisson Arrivals

Let X ∼ B(n, p) be a Binomial RV with parameters n and p, s.t. p = λ/n.

Then, for large n, we have


P(X = k) = (n choose k) p^k (1 - p)^(n-k)

         = [n(n - 1)(n - 2) . . . (n - k + 1) / k!] · (λ^k / n^k) · (1 - λ/n)^(n-k)

where the first factor ∼ n^k / k! and the last factor ∼ e^(-λ(n-k)/n), so

P(X = k) → λ^k e^(-λ) / k!   as n → ∞

17 / 18
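A Python sketch illustrating this limit numerically (an illustration only; λ = 5 and n = 1000 are arbitrary choices): it compares the B(n, λ/n) PMF with the Poisson(λ) PMF for a few values of k.

    from math import comb, exp, factorial

    lam, n = 5.0, 1000
    p = lam / n

    for k in range(8):
        binom = comb(n, k) * p**k * (1 - p)**(n - k)      # B(n, λ/n) PMF
        poisson = lam**k * exp(-lam) / factorial(k)       # Poisson(λ) PMF
        print(f"k={k}: Binomial={binom:.5f}  Poisson={poisson:.5f}")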
Reading Material

For this class:


3.2 in Introduction to Probability, Statistics, ...
6.1, 6.2 in Introduction to Probability

For Midterm 1:
1, 2, 3 in Introduction to Probability, Statistics, ...
1.2, 3.1, 3.2, 4.1, 5, 6.1, 6.2 in Introduction to Probability

18 / 18
MULTIPLE DISCRETE RANDOM VARIABLES

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, February 19, 2018

1 / 18
The Coupon Collector’s Problem

An urn with n balls numbered from 1 to n.

Sampling with replacement :


A ball is drawn, its number is recorded, then the ball is returned.

How many draws does it take on average to record all numbers?

2 / 18
Time to Collect All Numbers (Coupons)

I The first draw brings a new coupon for sure.


I Suppose we have collected r different coupons. Then
I a draw brings a new coupon with probability pr = (n - r)/n,
I the number of draws to get a new coupon is Nr ∼ Geometric(pr),
I the average number of draws to get a new coupon is

E[Nr] = ∑_{ℓ=1}^{∞} ℓ · pr (1 - pr)^(ℓ-1) = 1/pr = n/(n - r) .

I The average number of draws to get all coupons is


∑_{r=0}^{n-1} E[Nr] = ∑_{r=0}^{n-1} n/(n - r) = n (1/n + 1/(n-1) + · · · + 1) = nHn

Hn = log n + γ + O(n^(-1)) is the harmonic number.


γ = 0.5772156649 . . . is Euler’s constant.
3 / 18
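A simulation sketch in Python (illustrative; n = 50 and the number of runs are arbitrary choices) comparing the simulated average collection time with n·Hn and with the n(log n + γ) approximation:

    import random
    from math import log

    random.seed(2)
    n, runs = 50, 2000
    total_draws = 0

    for _ in range(runs):
        seen, draws = set(), 0
        while len(seen) < n:               # keep drawing with replacement
            seen.add(random.randint(1, n))
            draws += 1
        total_draws += draws

    harmonic = sum(1 / k for k in range(1, n + 1))
    print("simulated average:", total_draws / runs)          # ≈ 225
    print("n * Hn           :", n * harmonic)                # ≈ 224.96
    print("n(log n + γ)     :", n * (log(n) + 0.5772156649))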
Average Time to Collect Some Coupons

The next homework assignment will be about that.

4 / 18
A Gambling Game

An unbiased die is rolled.


I If an odd number turns up, you win that number of dollars.
I If an even number turns up, you lose that number of dollars.
Let X be the RV associated with the number that turns up.
⇒ X takes values in Ω_X = {1, 2, 3, 4, 5, 6}, each with probability 1/6.

The RV Y associated with the financial transaction is a function of X:

Y = g(X) = X if X is odd, and -X if X is even.

⇒ Y takes values in Ω_Y = {+1, -2, +3, -4, +5, -6},


each with probability 1/6.

5 / 18
Functions of Random Variables

If X is a random variable and Y = g(X), then Y is a random variable.


We have the following:
I The range of Y is Ω_Y = {g(x) | x ∈ Ω_X}.
I The PMF of Y is given by

P(Y = y) = P(g(X) = y) = ∑_{x: g(x)=y} P(X = x)

I The mean of Y is given by


E[Y] = ∑_{y∈Ω_Y} y P(Y = y) = ∑_{y∈Ω_Y} y ∑_{x: g(x)=y} P(X = x)

     = ∑_{x∈Ω_X} g(x) P(X = x)

6 / 18
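A small Python sketch, based on the gambling-game example above, that computes E[Y] both ways (over the range of Y, and as ∑ g(x) P(X = x)) and confirms they agree:

    from fractions import Fraction as F

    pmf_X = {x: F(1, 6) for x in range(1, 7)}
    g = lambda x: x if x % 2 == 1 else -x

    # 1) E[Y] = sum over y of y * P(Y = y)
    pmf_Y = {}
    for x, px in pmf_X.items():
        pmf_Y[g(x)] = pmf_Y.get(g(x), F(0)) + px
    e1 = sum(y * py for y, py in pmf_Y.items())

    # 2) E[Y] = sum over x of g(x) * P(X = x)
    e2 = sum(g(x) * px for x, px in pmf_X.items())

    print(e1, e2)   # both equal -1/2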
Functions of Random Variables

Example:
In a die rolling experiment, RV X corresponds to the number
on the face that turns up, and RV Y = X2 is a function of X.

Ω_X = {+1, -1, -2, +2, -4, +4}

Ω_{X^2} = {1, 4, 16}

P(X^2 = 1) = P(X = 1) + P(X = -1)

Compute and compare 1) ∑_{x∈Ω_X} x^2 P(X = x) and 2) ∑_{y∈Ω_Y} y P(Y = y).

7 / 18
Two Discrete Random Variables - Example

One of the two dice is picked and rolled.


Black die is picked with probability 2/3 and red with probability 1/3.


µB 0 0 1/2 0 0 1/2
µR 1/3 0 1/3 0 1/3 0

Let RV X be the color of the picked die and Y the number that turns up.
Find the joint probability of X and Y.

8 / 18
Joint and Marginal Probabilities of Two RVs

I Consider two RVs X and Y associated with the same experiment.


I The random pair (X, Y) has the joint PMF pX,Y (·, ·) of X and Y.

If x is a possible value of X and y is a possible value of Y, then

pX,Y(x, y) = Pr({X = x} ∩ {Y = y}) = Pr(X = x, Y = y),

that is, the probability that {X = x} AND {Y = y} both occur.

We can compute the marginal PMFs of X and Y from the joint PMF

pX(x) = ∑_{y∈Ω_Y} pX,Y(x, y)      pY(y) = ∑_{x∈Ω_X} pX,Y(x, y)

9 / 18
Conditioning on Random Variables

I Consider two RVs X and Y associated with the same experiment.


I The random pair (X, Y) has the joint PMF pX,Y (·, ·) of X and Y.

The conditional PMF pY|X(·|·) of Y given X = x is defined as

pY|X(y|x) = Pr(Y = y|X = x) = Pr({Y = y} ∩ {X = x}) / Pr(X = x) = pX,Y(x, y) / pX(x)

provided that pX(x) ≠ 0.

10 / 18
Independent Random Variables

RVs X with Ω_X = {x1, x2, . . .} and Y with Ω_Y = {y1, y2, . . .}


are independent iff

P(X = xj, Y = yk) = P(X = xj)P(Y = yk) for all xj ∈ Ω_X, yk ∈ Ω_Y

Example: Die X is rolled on Busch and die Y on Livingston campus.

Ω_X = {1, 2, . . . , 20}      Ω_Y = {1, 2, . . . , 6}

e.g. P(X = 12, Y = 6) = P(X = 12) · P(Y = 6).
11 / 18
The Sum of Two Random Variables

Claim: Let X and Y be RVs with finite expected values. Then

E(X + Y) = E(X) + E(Y)

Proof: Suppose that Ω_X = {x1, x2, . . .} and Ω_Y = {y1, y2, . . .}. Then


E(X + Y) = ∑_j ∑_k (xj + yk) P(X = xj, Y = yk)

         = ∑_j ∑_k xj P(X = xj, Y = yk) + ∑_j ∑_k yk P(X = xj, Y = yk)

         = ∑_j xj ∑_k P(X = xj, Y = yk) + ∑_k yk ∑_j P(X = xj, Y = yk)

         = ∑_j xj P(X = xj) + ∑_k yk P(Y = yk) = E(X) + E(Y)

12 / 18
The Product of Two Independent Random Variables

Claim: Let X and Y be two independent RVs with finite expected


values. Then
E(X · Y) = E(X) · E(Y)

Proof: Suppose that Ω_X = {x1, x2, . . .} and Ω_Y = {y1, y2, . . .}. Then


E(XY) = ∑_j ∑_k xj yk P(X = xj, Y = yk)

      = ∑_j ∑_k xj P(X = xj) · yk P(Y = yk)      (by independence)

      = (∑_j xj P(X = xj)) · (∑_k yk P(Y = yk)) = E(X) · E(Y)
13 / 18
Conditional Expectation
Definition: Let F be an event and X an RV with Ω_X = {x1, x2, . . .}.
The conditional expectation of X given F is

E(X|F) = ∑_j xj P(X = xj |F) .

Claim: If events F1, F2, . . . , Fr form a partition of the sample space Ω, then


E(X) = ∑_k E(X|Fk)P(Fk) .

Proof:  ∑_k E(X|Fk)P(Fk) = ∑_k (∑_j xj P(X = xj |Fk)) P(Fk)

                         = ∑_k ∑_j xj P(X = xj and Fk occurs)

                         = ∑_j xj ∑_k P(X = xj and Fk occurs)

                         = ∑_j xj P(X = xj) = E(X)

14 / 18
Joint (Vector) Random Variables

I When several RVs X1, X2, . . . , Xn correspond to an experiment


we often consider them jointly as a vector RV X̄ = (X1, X2, . . . , Xn)
I If Xi has range Ωi, then the range Ω of X̄ is the Cartesian product

Ω = Ω1 × Ω2 × · · · × Ωn

I The joint probability P(X1 = x1, X2 = x2, . . . , Xn = xn) equals

P(X1 = x1) · P(X2 = x2 |X1 = x1) · . . . · P(Xn = xn |X1 = x1, . . . , Xn-1 = xn-1)

Example – independent RVs:


Numbers on dice rolled on Busch, College Av., Cook, and Livingston.
Example – dependent RVs:
Temperatures measured on Busch College Av., Cook, and Livingston.

15 / 18
(In)dependence of Random Variables X1 , X2 , . . . Xn , . . .

Independent RVs:
P(Xn = xn | X1 = x1, . . . , Xn-1 = xn-1) = P(Xn = xn)

⇒ P(X1 = x1, . . . , Xn = xn) = ∏_{i=1}^{n} P(Xi = xi)

Markov Chain:
P(Xn = xn | X1 = x1, . . . , Xn-1 = xn-1) = P(Xn = xn | Xn-1 = xn-1)

⇒ P(X1 = x1, . . . , Xn = xn) = P(X1 = x1) ∏_{i=2}^{n} P(Xi = xi | Xi-1 = xi-1)

Martingale: E[|Xn|] < ∞ and E[Xn | X1, . . . , Xn-1] = Xn-1

16 / 18
Bernoulli Trials and Gambler’s Fortune

A gambler is betting on outcomes of a sequence of coin tosses:


She wins $1 if the coin comes up heads and loses $1 otherwise.

Let
I p be the probability that head turns up.
I Xn be the RV associated with the gain/loss of the n-th toss.
I Sn be the RV associated with the gambler’s fortune after n tosses

What can we say about the sequences X1 , X2 , . . . ?


Do we have independent RVs, a Markov chain, a martingale?

How about S1 , S2 , . . . ?
Note that, when p = 1/2, the gambler’s expected fortune after the next trial,
given the history, is equal to her present fortune. Therefore, Sn is a martingale.

17 / 18
Reading Material

For this class:


3.2, 5.1, 11.2.1 in Introduction to Probability, Statistics, ...
6.1, 6.2 in Introduction to Probability

For next class (only if you wish, proceed with caution):


3.2.4, 6.1.1 in Introduction to Probability, Statistics, ...
6.1, 6.2 in Introduction to Probability

18 / 18
(IN)DEPENDENCE OF RVs

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, February 22, 2018

1 / 18
Joint (Vector) Random Variables
I When several RVs X1, X2, . . . , Xn correspond to an experiment
we often consider them jointly as a vector RV X̄ = (X1, X2, . . . , Xn)
I If Xi has range Ωi, then the range Ω of X̄ is the Cartesian product
Ω = Ω1 × Ω2 × · · · × Ωn

I The joint probability is given by (for xi ∈ Ωi)


P(X1 = x1, . . . , Xn = xn) = P(X1 = x1) ·
P(X2 = x2 | X1 = x1) ·
P(X3 = x3 | X2 = x2, X1 = x1) ·
...
P(Xn = xn | Xn-1 = xn-1, . . . , X2 = x2, X1 = x1)

Example – independent RVs:


Numbers on dice rolled on Busch, College Av., Cook, and Livingston.
Example – dependent RVs:
Temperatures measured on Busch College Av., Cook, and Livingston.
2 / 18
(In)dependence of Random Variables X1 , X2 , . . . Xn , . . .

Independent RVs:
P(Xn = xn | X1 = x1, . . . , Xn-1 = xn-1) = P(Xn = xn)

⇒ P(X1 = x1, . . . , Xn = xn) = ∏_{i=1}^{n} P(Xi = xi)

Markov Chain:
P(Xn = xn | X1 = x1, . . . , Xn-1 = xn-1) = P(Xn = xn | Xn-1 = xn-1)

⇒ P(X1 = x1, . . . , Xn = xn) = P(X1 = x1) ∏_{i=2}^{n} P(Xi = xi | Xi-1 = xi-1)

Martingale: E[|Xn|] < ∞ and E[Xn | X1, . . . , Xn-1] = Xn-1

3 / 18
Joint and Marginal PMFs of Two RVs

The joint PMF of two RVs is often given by the following table:


(Table entries can be found empirically, e.g., smoking vs. cancer.)

            y1                  y2                  ...   ym                  (marginal of X)
x1          P(X = x1, Y = y1)   P(X = x1, Y = y2)   ...   P(X = x1, Y = ym)   P(X = x1)
x2          P(X = x2, Y = y1)   P(X = x2, Y = y2)   ...   P(X = x2, Y = ym)   P(X = x2)
...         ...                 ...                 ...   ...                 ...
xℓ          P(X = xℓ, Y = y1)   P(X = xℓ, Y = y2)   ...   P(X = xℓ, Y = ym)   P(X = xℓ)
(marg. Y)   P(Y = y1)           P(Y = y2)           ...   P(Y = ym)

The PMFs of the individual RVs are referred to as the marginal PMFs.
4 / 18
Joint and Marginal PMFs of Two RVs

Example #1: A red die and a blue die are rolled, and the joint
probability for each pair of faces is given in the following table:

⌦B

1/18 0 1/18 0 1/18 0


1/18 0 1/18 0 1/18 0
1/18 0 1/18 0 1/18 0
⌦R

1/18 0 1/18 0 1/18 0


1/18 0 1/18 0 1/18 0
1/18 0 1/18 0 1/18 0

1. Are the dice unbiased?


2. Are the rolls independent?

5 / 18
Joint and Marginal PMFs of Two RVs

Example #2: A red die and a blue die are rolled, and the joint
probability for each pair of faces is given in the following table:

⌦B

1/36 1/36 1/36 1/36 1/36 1/36


1/36 3/72 1/36 1/36 1/36 1/72
1/36 1/36 1/36 1/36 1/36 1/36
⌦R

1/36 1/36 1/36 1/36 1/36 1/36


1/36 1/72 1/36 1/36 1/36 3/72
1/36 1/36 1/36 1/36 1/36 1/36

1. Are the dice unbiased?


2. Are the rolls independent?

6 / 18
The Sum of Two Random Variables

Claim: Let X and Y be RVs with finite expected values. Then

E(X + Y) = E(X) + E(Y)

Proof: Suppose that RX = {x1 , x2 , . . .} and RY = {y1 , y2 , . . .}. Then


XX
E(X + Y) = (xj + yk )P(X = xj , Y = yk )
j k
XX XX
= xj P(X = xj , Y = yk ) + yk P(X = xj , Y = yk )
j k j k
X X X X
= xj P(X = xj , Y = yk ) + yk P(X = xj , Y = yk )
j k k j
| {z } | {z }
P(X=xj ) P(Y=yk )

7 / 18
The Product of Two Independent Random Variables

Claim: Let X and Y be two independent RVs with finite expected


values. Then
E(X · Y) = E(X) · E(Y)

Proof: Suppose that RX = {x1 , x2 , . . .} and RY = {y1 , y2 , . . .}. Then


XX
E(XY) = xj yk P(X = xj , Y = yk )
| {z }
j k
P(X=xj )P(Y=yk )
XX
= xj P(X = xj ) · yk P(Y = yk )
j k
⇣X ⌘ ⇣X ⌘
= xj P(X = xj ) · yk P(Y = yk )
j k

8 / 18
Calculation of the Variance

Claim: If X is any random variable with E(X) = µ, then

V(X) = E(X^2) - µ^2 .

Proof: We have

V(X) = E[(X - µ)^2]
     = E(X^2 - 2µX + µ^2)
     = E(X^2) - 2µE(X) + µ^2
     = E(X^2) - µ^2

9 / 18
Calculation of the Expectation of a Linear Function of an RV

Claim: If X is an RV with finite mean and variance, and


Y is a function of X given by Y = aX + b (a & b are constants)
then the expectation of Y is equal to E[Y] = aE[X] + b.

Proof: We have
E[aX + b] = ∑_{x∈Ω_X} (ax + b) · pX(x)

          = a ∑_{x∈Ω_X} x · pX(x) + b ∑_{x∈Ω_X} pX(x)

          = aE[X] + b.

10 / 18
Calculation of the Variance of a Linear Function of an RV

Claim: If X is an RV with finite mean and variance, and


Y is a function of X given by Y = aX + b (a & b are constants)
then the variance of Y is equal to V[Y] = a2 V[X].

Proof: We have
V[Y] = ∑_{x∈Ω_X} (ax + b - E[aX + b])^2 · pX(x)

     = ∑_{x∈Ω_X} (ax + b - aE[X] - b)^2 · pX(x)

     = a^2 ∑_{x∈Ω_X} (x - E[X])^2 · pX(x) = a^2 V[X]

11 / 18
The Variance of the Sum Two Independent RVs

Claim: Let X and Y be two independent RVs with finite expectations.


Then
V(X + Y) = V(X) + V(Y)

Proof: Let E(X) = µ_X and E(Y) = µ_Y. Then


V(X + Y) = E[(X + Y)^2] - [E(X + Y)]^2
         = E(X^2 + 2XY + Y^2) - (µ_X + µ_Y)^2
         = E(X^2) + 2E(XY) + E(Y^2) - µ_X^2 - 2µ_X µ_Y - µ_Y^2 .

Since X and Y are independent, E(XY) = E(X)E(Y) = µ_X µ_Y. ⇒

V(X + Y) = E(X^2) - µ_X^2 + E(Y^2) - µ_Y^2 = V(X) + V(Y)

12 / 18
Covariance and Correlation

We consider two RVs X and Y with finite expectations and variances.

Definition:

1. The covariance of RVs X and Y is


cov(X, Y) = E[(X - E[X])(Y - E[Y])] ,

and the correlation coefficient is ρ_{X,Y} = cov(X, Y) / √(V(X)V(Y))

2. We say that X and Y are uncorrelated RVs when cov(X, Y) = 0.

13 / 18
Some Properties of Covariance

How much can one RV tell about the other? Covariance is an indicator.
cov(X, Y) = E[(X - E[X])(Y - E[Y])]

1. cov(X, X) = V(X)
2. cov(X, Y) = cov(Y, X)
3. -1 ≤ ρ(X, Y) ≤ 1
4. V(X + Y) = V(X) + V(Y) + 2 cov(X, Y)
5. cov(X, Y) = E[X · Y] - E[X] · E[Y]
⇒ If two RVs are independent, they are also uncorrelated.

TRUE OR FALSE?
1. If X and Y are uncorrelated, then V(X + Y) = V(X) + V(Y) T
2. If X and Y are uncorrelated, then they are independent. F

14 / 18
Correlation vs. (In)dependence

Example:
The joint probability of RVs X and Y is given in the following table:

          X = -1   X = 0   X = 1
Y = -1    0        1/4     0
Y = 0     1/4      0       1/4
Y = 1     0        1/4     0

1. Are X and Y independent?


2. Are X and Y uncorrelated?

15 / 18
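A Python sketch that checks the table above numerically: X and Y come out uncorrelated (cov = 0) yet not independent.

    joint = {(-1, -1): 0,   (0, -1): 1/4, (1, -1): 0,
             (-1,  0): 1/4, (0,  0): 0,   (1,  0): 1/4,
             (-1,  1): 0,   (0,  1): 1/4, (1,  1): 0}

    px = {x: sum(p for (xx, yy), p in joint.items() if xx == x) for x in (-1, 0, 1)}
    py = {y: sum(p for (xx, yy), p in joint.items() if yy == y) for y in (-1, 0, 1)}

    ex = sum(x * p for x, p in px.items())
    ey = sum(y * p for y, p in py.items())
    exy = sum(x * y * p for (x, y), p in joint.items())

    print("cov(X, Y) =", exy - ex * ey)                     # 0.0 -> uncorrelated
    print("P(X=0, Y=0) =", joint[(0, 0)],
          "but P(X=0)P(Y=0) =", px[0] * py[0])              # 0 vs 1/4 -> dependent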
Mutual Information

Another useful measure of (in)dependence of two RVs:


I(X; Y) = ∑_{y∈Ω_Y} ∑_{x∈Ω_X} P(X = x, Y = y) log [ P(X = x, Y = y) / (P(X = x)P(Y = y)) ]

It tells how much the uncertainty about X is reduced once Y is known.

E.g., it is used to determine the capacity of a communications channel.

16 / 18
Solve a problem before you go ...

An urn with n balls. X are white and the rest blue.

X is an RV with a probability distribution on the integers 0, 1, 2, . . . , n.

Is knowing just E(X) enough to calculate the probability that a ball


drawn at random from the urn will be white? What is this probability?

17 / 18
Reading Material

For this class:


3.2.4, 6.1.1 in Introduction to Probability, Statistics, ...
6.1, 6.2 in Introduction to Probability

For next class:


7.1 in Introduction to Probability, Statistics, ...
3.2.1, 7.1, 8.1 in Introduction to Probability

18 / 18
SUMS OF RVs & LAWS OF LARGE NUMBERS

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, Feb 26, 2018

1 / 17
The Expectation of the Sum of Two Random Variables

Claim: Let X and Y be RVs with finite expected values. Then

E(X + Y) = E(X) + E(Y)

Proof: Suppose that RX = {x1 , x2 , . . .} and RY = {y1 , y2 , . . .}. Then


XX
E(X + Y) = (xj + yk )P(X = xj , Y = yk )
j k
XX XX
= xj P(X = xj , Y = yk ) + yk P(X = xj , Y = yk )
j k j k
X X X X
= xj P(X = xj , Y = yk ) + yk P(X = xj , Y = yk )
j k k j
| {z } | {z }
P(X=xj ) P(Y=yk )

2 / 17
The Variance of the Sum of Two Random Variables
Claim: Let X and Y be two RVs with finite expectations. Then

V(X + Y) = V(X) + V(Y) + 2 cov(X, Y)

Proof: Let E(X) = µX and E(Y) = µY . Then


⇥ ⇤ ⇥ ⇤2
V(X + Y) = E (X + Y)2 - E(X + Y)
= E(X2 + 2XY + Y 2 ) - (µX + µY )2
= E(X2 ) + 2E(XY) + E(Y 2 ) - µ2X - 2µX µY - µ2Y
= E(X2 ) - µ2X + E(Y 2 ) - µ2Y +2 E(XY) - µX µY .
| {z } | {z } | {z }
V(X) V(Y) cov(X,Y)

If X and Y are uncorrelated, then E(XY) = E(X)E(Y) = µX µY . )

V(X + Y) = V(X) + V(Y)

3 / 17
Distribution of the Sum of Two Independent RVs
A die is rolled twice. Let X1 and X2 be the outcomes and S2 = X1 + X2
If X1 & X2 are iid (independent identically distributed) with PMF m


m 1/6 1/6 1/6 1/6 1/6 1/6

what can we say about S2 ?


P(S2 = 2) = m(1)m(1) = (1/6)·(1/6) = 1/36
P(S2 = 3) = m(1)m(2) + m(2)m(1) = (1/6)·(1/6) + (1/6)·(1/6) = 2/36
P(S2 = 4) = m(1)m(3) + m(2)m(2) + m(3)m(1) = (1/6)·(1/6) + (1/6)·(1/6) + (1/6)·(1/6) = 3/36

P(S2 = 5) = 4/36 P(S2 = 6) = 5/36 P(S2 = 7) = 6/36 P(S2 = 8) = 5/36


P(S2 = 9) = 4/36 P(S2 = 10) = 3/36 P(S2 = 11) = 2/36 P(S2 = 12) = 1/36

4 / 17
The Distribution of the Sum of Two Independent RVs

I Let X and Y be two independent integer-valued RVs and Z = X + Y


If the PMFs of X & Y are pX & pY, we can find pZ, the PMF of Z
I If X = k, then Z = z iff Y = z - k. So the event Z = z is the union

∪_k [(X = k) and (Y = z - k)]

I Since these events are pairwise disjoint, we have


P(Z = z) = ∑_k P((X = k) and (Y = z - k))

I Since X and Y are independent, we have


P(Z = z) = ∑_k P(X = k) · P(Y = z - k)      (convolution)

5 / 17
Distribution of the Sum of Two Independent RVs

The price of a stock on a trading day changes for some random amount
X with PMF PX
X : -1 0 1 2
pX : 1/4 1/2 1/8 1/8

Find the distribution for the change in stock price after two consecutive
and independent trading days.

6 / 17
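A Python sketch solving the stock-price exercise above by the convolution formula: it computes the PMF of the two-day change Z = X1 + X2 for i.i.d. X1, X2 with the given PMF.

    pX = {-1: 1/4, 0: 1/2, 1: 1/8, 2: 1/8}

    pZ = {}
    for x1, p1 in pX.items():          # P(Z = z) = sum_k P(X = k) P(Y = z - k)
        for x2, p2 in pX.items():
            pZ[x1 + x2] = pZ.get(x1 + x2, 0) + p1 * p2

    for z in sorted(pZ):
        print(f"P(Z = {z:+d}) = {pZ[z]:.4f}")
    print("total:", sum(pZ.values()))   # sanity check: sums to 1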
The Weak Law of Large Numbers

Example:
I Consider n rolls of a die, and let Xj be the outcome of the jth roll.
This is an independent trials process with E(Xj ) = 7/2.
I What can we say about Sn = X1 + X2 + · · · + Xn ?

The Weak Law of Large Numbers says that, for any ε > 0,

P( |Sn/n - 7/2| ≥ ε ) → 0 as n → ∞.

An equivalent way to state this is that, for any ε > 0,


P( |Sn/n - 7/2| < ε ) → 1 as n → ∞.

7 / 17
Distribution of Sn /n for Die Rolling Trials, n=1, n=2, n=3

[Plot: PMF of Sn/n over the sample space 1–6, for n = 1, 2, 3.]

8 / 17
Law of Large Numbers

Let X1, X2, . . . , Xn be an independent trials process


with E(Xj) = µ < ∞ and V(Xj) < ∞ (finite mean and variance).
Let Sn = X1 + X2 + · · · + Xn.

The Weak Law of Large Numbers says that, for any ε > 0,

P( |Sn/n - µ| ≥ ε ) → 0 as n → ∞

An equivalent way to state this is that, for any ε > 0,


P( |Sn/n - µ| < ε ) → 1 as n → ∞.

9 / 17
Sample Mean and Variance

I Let X be an RV with finite E(X) = µ and V(X) = σ^2.


(X corresponds to outcomes of a random experiment.)
I How can we estimate µ?

We could repeat the experiment n times and use the outcomes


x1, x2, . . . , xn to estimate µ by the sample mean

x̄ = (1/n) ∑_{i=1}^{n} xi

Is this a good estimator?

10 / 17
Sample Mean

I Are the mean and the variance of an RV constant or RVs?


I Are the sample mean and variance constant or RVs?
The sample mean and variance are RVs with their own PMFs.
What can we say about the mean of the sample mean E(X̄n)?

X̄n = (1/n) ∑_{i=1}^{n} Xi

where X1, X2, . . . , Xn is an independent trials process


with E(Xj) = µ < ∞ and V(Xj) < ∞

⇒ E(X̄n) = µ      (unbiased estimator).

11 / 17
The Sample Mean Estimator
What else is important for an estimator (besides the bias)?

12 / 17
Sample Mean and Variance

How about precision?


X̄n = (1/n) ∑_{i=1}^{n} Xi

where X1, X2, . . . , Xn is an independent trials process


with E(Xj) = µ < ∞ and V(Xj) = σ^2 < ∞

How close is the RV X̄n to its mean?

13 / 17
Calculation of the Expectation of a Linear Function of an RV

Claim: If X is an RV with finite mean and variance, and


Y is a function of X given by Y = aX + b (a & b are constants)
then the expectation of Y is equal to E[Y] = aE[X] + b.

Proof: We have
E[aX + b] = ∑_{x∈Ω_X} (ax + b) · pX(x)

          = a ∑_{x∈Ω_X} x · pX(x) + b ∑_{x∈Ω_X} pX(x)

          = aE[X] + b.

14 / 17
Calculation of the Variance of a Linear Function of an RV

Claim: If X is an RV with finite mean and variance, and


Y is a function of X given by Y = aX + b (a & b are constants)
then the variance of Y is equal to V[Y] = a2 V[X].

Proof: We have
V[Y] = ∑_{x∈Ω_X} (ax + b - E[aX + b])^2 · pX(x)

     = ∑_{x∈Ω_X} (ax + b - aE[X] - b)^2 · pX(x)

     = a^2 ∑_{x∈Ω_X} (x - E[X])^2 · pX(x) = a^2 V[X]

15 / 17
Sample Mean and Variance

How about precision?


X̄n = (1/n) ∑_{i=1}^{n} Xi

where X1, X2, . . . , Xn is an independent trials process


with E(Xj) = µ < ∞ and V(Xj) = σ^2 < ∞

How close is the RV X̄n to its mean?

1. The Weak Law of Large Numbers tells us that, for any ε > 0,
P( |X̄n - µ| ≥ ε ) → 0 as n → ∞
2. The variance of the sample mean is V(X̄n) = σ^2/n.

It is very unlikely that the sample mean gets very far from its mean.

16 / 17
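A simulation sketch in Python (assuming fair-die trials, for which σ² = 35/12 ≈ 2.9167) showing that the empirical variance of the sample mean shrinks like σ²/n:

    import random

    random.seed(3)
    runs = 5000
    for n in (10, 100, 1000):
        means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(runs)]
        m = sum(means) / runs
        v = sum((x - m) ** 2 for x in means) / runs
        print(f"n={n:5d}  empirical V(sample mean)={v:.5f}   sigma^2/n={35/12/n:.5f}")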
Reading Material

For this class:


7.1 in Introduction to Probability, Statistics, ...
7.1, 8.1 in Introduction to Probability

For next class:


4 in Introduction to Probability, Statistics, ...
2.2, 4.2 in Introduction to Probability

17 / 17
TAILS, LIMITS, AND CONTINUITY

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, March 1, 2018

1 / 16
Fraction of Heads in Coin Tossing
Problem: A fair coin is tossed n times (e.g. 50, 100, 200)
1. What is the expected fraction of heads?
2. How likely is it that the fraction of heads deviates
from the expected by more than 0.1?
[Plots: PMF of the fraction of heads for n = 50, 100, 200; the distribution concentrates around 0.5 as n grows.]

2 / 16
Number of Heads in n = 50 Fair-Coin Tosses

I The number of heads is Xn ∼ B(n, 0.5) ⇒ E(X50) = 25.


I How likely is X50 to deviate by more than 5 from 25?

Two tails of the PMF:

∑_{k=0}^{19} (50 choose k) (1/2)^k (1 - 1/2)^(50-k)  +  ∑_{k=31}^{50} (50 choose k) (1/2)^k (1 - 1/2)^(50-k)

[Plot: PMF of the number of heads in 50 tosses, with the two tails (k ≤ 19 and k ≥ 31) highlighted.]

3 / 16
Measuring Deviation from the Mean
Claim: For any RV X and any positive real number ε > 0, we have a
bound on the probability that X differs from E(X) by ε or more:

P(|X - E(X)| ≥ ε) ≤ V(X)/ε^2      (Chebyshev Inequality)

Proof: Let m(x) denote the PMF of X and µ = E(X).


V(X) = ∑_x (x - µ)^2 m(x)                  variance definition

     ≥ ∑_{|x-µ|≥ε} (x - µ)^2 m(x)          dropping nonnegative terms

     ≥ ∑_{|x-µ|≥ε} ε^2 m(x)                since (x - µ)^2 ≥ ε^2 on this region

     = ε^2 ∑_{|x-µ|≥ε} m(x)

     = ε^2 P(|X - E(X)| ≥ ε)               event probability definition

4 / 16
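A Python sketch comparing the Chebyshev bound with the exact tail probability for a fair die roll X (E(X) = 3.5, V(X) = 35/12), with ε = 2 chosen arbitrarily as an illustration:

    mean, var = 3.5, 35 / 12
    eps = 2.0

    exact = sum(1 / 6 for x in range(1, 7) if abs(x - mean) >= eps)   # faces 1 and 6
    bound = var / eps**2

    print("exact P(|X - 3.5| >= 2):", exact)    # 2/6 ≈ 0.333
    print("Chebyshev bound        :", bound)    # ≈ 0.729 (valid, but loose)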
Law of Large Numbers - Statement

Let X1, X2, . . . , Xn be an independent trials process


with E(Xj) = µ < ∞ and V(Xj) = σ^2 < ∞ (finite mean and variance).
Let Sn = X1 + X2 + · · · + Xn.

The Weak Law of Large Numbers says that, for any ε > 0,

P( |Sn/n - µ| ≥ ε ) → 0 as n → ∞

5 / 16
Law of Large Numbers - Proof

Let X1, X2, . . . , Xn be an independent trials process


with E(Xj) = µ < ∞ and V(Xj) = σ^2 < ∞ (finite mean and variance).
Let Sn = X1 + X2 + · · · + Xn. Then
1. E(Sn) = nµ          expectation of the sum of RVs
2. V(Sn) = nσ^2        variance of the sum of independent RVs

Let Yn = Sn/n. Then


1. E(Yn) = E(Sn)/n = µ.
2. V(Yn) = V(Sn)/n^2 = σ^2/n
3. P(|Yn - E(Yn)| ≥ ε) ≤ V(Yn)/ε^2      Chebyshev inequality

⇒  P( |Sn/n - µ| ≥ ε ) ≤ σ^2/(n ε^2) → 0 as n → ∞

6 / 16
Fraction of Heads in Coin Tossing

Problem: A fair coin is tossed 200 times.


1. What is the expected fraction of heads?
2. Find an upper bound on the probability that the fraction
of heads deviates from the expected by 0.1 or more.
Solution:
I Which RV X corresponds to a single coin toss outcome?
I Which RV Sn corresponds to the number of heads in n tosses?
I What is Yn = Sn /n?
I Can we apply the Chebyshev inequality to Yn ?

7 / 16
Bernoulli Trials

I Let Xj, j = 1, 2, . . . , n be a Bernoulli trials process.


Xj = 1 if the j-th outcome is a success and 0 if it is a failure.
I If p is the probability of a success, and q = 1 - p, then
E(Xj) = 0 · q + 1 · p = p and E(Xj^2) = 0^2 · q + 1^2 · p = p

⇒ V(Xj) = E(Xj^2) - (E(Xj))^2 = p - p^2 = pq


Note that
I Sn = X1 + X2 + · · · + Xn is the number of successes in n trials.
E(Sn) = np and V(Sn) = npq
I Yn = Sn/n is the fraction of successes in n trials.
E(Yn) = p and V(Yn) = pq/n
The Chebyshev inequality P(|Yn - E(Yn)| ≥ ε) ≤ V(Yn)/ε^2
is an upper bound on the probability that an RV deviates from its mean.

8 / 16
Fraction of Heads in Coin Tossing

Problem: A fair coin is tossed 200 times.


1. What is the expected fraction of heads?
2. Find an upper bound on the probability that the fraction
of heads deviates from the expected by 0.1 or more.
Solution:
I Which RV X corresponds to a single coin toss outcome?
I Which RV Sn corresponds to the number of heads in n tosses?
I What is Yn = Sn /n?
I Can we apply the Chebyshev inequality to Yn ?
We have shown that P( |Sn/n - p| ≥ ε ) ≤ p(1 - p)/(n ε^2)

In our problem, p = 0.5, n = 200, and ε = 0.1.

9 / 16
Fraction of Heads Yn in n Coin Tosses
[Plots: PMF of the fraction of heads Yn for n = 50, 100, 200.]

As n increases, the concentration of Yn around its mean increases.

The possible values 0, 1/n, 2/n, . . . , 1 of Yn become closer to each other.
10 / 16
Fraction of Heads Yn in n Coin Tosses
[Plots: PMF of the fraction of heads Yn for n = 50, 100, 200.]

As n increases, the concentration of Yn around its mean increases.

The possible values 0, 1/n, 2/n, . . . , 1 of Yn become closer to each other.
11 / 16
Continuous Random Variables

I For a discrete RV X, we talk about P(X = x) (X taking values).


I For a continuous RV X, we talk about P(a < X ≤ b)
(X being in an interval)

[Number line: -∞ . . . t . . . x . . . x+δ . . . ∞]

For example: P(-∞ < X ≤ t) = PX((-∞, t]) or


P(x < X ≤ x + δ) = PX((x, x + δ]).

12 / 16
Cumulative Distribution Function (CDF)

The cumulative distribution function of a continuous real-valued RV X is

FX(x) = P(X ≤ x)

Note that
PX((x, x + δ]) = F(x + δ) - F(x)

13 / 16
Probability Density Function (PDF)
The probability density function of a continuous real-valued RV X is

f(x) = lim_{δ→0} P(x < X ≤ x + δ)/δ

-∞        t        x   x+δ        +∞

Therefore,
f(x) = lim_{δ→0} PX ((x, x + δ])/δ = lim_{δ→0} (F(x + δ) - F(x))/δ = dF(x)/dx = F′(x)

14 / 16
Probability Density Function (PDF) - Properties

I F(x) = ∫_{-∞}^{x} f(t) dt
I fX (x) is not the probability that X takes the value x
I P((a, b]) = ∫_{a}^{b} f(x) dx
I ∫_{-∞}^{+∞} f(x) dx = 1

15 / 16
Reading Material

For this class:


4.1.0, 4.1.1 in Introduction to Probability, Statistics, ...
2.2, 4.2 in Introduction to Probability

For next class:


4.1, 4.2 in Introduction to Probability, Statistics, ...
5.2, 6.3 in Introduction to Probability

16 / 16
CONTINUOUS RANDOM VARIABLES

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, March 5, 2018

1 / 18
Continuous Random Variables

I For a discrete RV X, we talk about P(X = x) (X taking values).


I For a continuous RV X, we talk about P(a < X ≤ b)
(X being in an interval)

-∞        t        x   x+δ        +∞

For example: P(-∞ < X ≤ t) = PX ((-∞, t]) or


P(x < X ≤ x + δ) = PX ((x, x + δ]).

2 / 18
Cumulative Distribution Function (CDF)

The cumulative distribution function of a continuous real-valued RV X is

FX (x) = P(X ≤ x)

-∞        t        x   x+δ        +∞

Note that
PX ((x, x + δ]) = F(x + δ) - F(x)

3 / 18
Probability Density Function (PDF)
The probability density function of a continuous real-valued RV X is

f(x) = lim_{δ→0} P(x < X ≤ x + δ)/δ

-∞        t        x   x+δ        +∞

Therefore,
f(x) = lim_{δ→0} PX ((x, x + δ])/δ = lim_{δ→0} (F(x + δ) - F(x))/δ = dF(x)/dx = F′(x)
4 / 18
Probability Density Function (PDF) - Properties

I F(x) = ∫_{-∞}^{x} f(t) dt
I fX (x) is not the probability that X takes the value x
I P((a, b]) = ∫_{a}^{b} f(x) dx
I ∫_{-∞}^{+∞} f(x) dx = 1

5 / 18
Expected Value and Variance

The expected value µ = E(X) of a real-valued RV X is defined by


µ = E(X) = ∫_{-∞}^{+∞} x f(x) dx

The variance σ² = V(X) of a real-valued RV X is defined by

σ² = E[(X - µ)²] = ∫_{-∞}^{+∞} (x - µ)² f(x) dx
   = ∫_{-∞}^{+∞} (x² - 2µx + µ²) f(x) dx
   = ∫_{-∞}^{+∞} x² f(x) dx - 2µ ∫_{-∞}^{+∞} x f(x) dx + µ² ∫_{-∞}^{+∞} f(x) dx
   = E(X²) - 2µ · E(X) + µ² · 1
   = E(X²) - µ²

6 / 18
Uniform Random Variable X ⇠ Uniform(a, b)
The PDF:

f(x) = 1/(b - a) for a ≤ x ≤ b,   and   f(x) = 0 for x < a or x > b

[Figure: fX (x) is the constant 1/(b - a) on [a, b] and 0 elsewhere]

The mean and the variance:

E(X) = (a + b)/2          V(X) = (b - a)²/12

CDF:

F(x) = 0 for x < a,   F(x) = (x - a)/(b - a) for a ≤ x ≤ b,   F(x) = 1 for x > b

Models e.g., lack of knowledge or indifference within a range of values.

7 / 18
A Spinner - An Example of a Uniform RV

The experiment consists of A unit-circumference circle:


1. spinning the pointer x
&
2. recording the label x

The PDF of X:

f(x) = 1 for 0 ≤ x ≤ 1,   and   f(x) = 0 otherwise

Point x is at distance x from 0 along the circumference.

A spinner is a continuous counterpart to a die.

8 / 18
An Example of a Uniform RV

I Michael comes to campus riding the A bus, which passes every


30min by Michael’s bus stop: 8:00, 8:30, 9:00, 9:30, . . .
I Michael arrives at any time equally likely between 9:00 and 9:30.

How long will Michael have to wait for the bus on average?
What is the probability that he waits less than five minutes?

9 / 18
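A small simulation sketch for the bus question above (Python standard library only); under the stated assumptions the wait until the 9:30 bus is Uniform(0, 30) minutes, so the estimates should approach 15 minutes and 1/6:

import random

trials = 100_000
arrivals = [random.uniform(0, 30) for _ in range(trials)]   # minutes after 9:00
waits = [30 - a for a in arrivals]                          # wait for the 9:30 bus
print(sum(waits) / trials)                                  # close to 15 minutes
print(sum(w < 5 for w in waits) / trials)                   # close to 1/6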
Gaussian (Normal) Random Variable X ∼ N(µ, σ²)

PDF:

f(x; µ, σ²) = 1/√(2σ²π) · e^{-(x-µ)²/(2σ²)}

E(X) = µ
V(X) = σ²

[Figure: f(x; µ, σ²) for µ = 0, σ = 0.4 and for µ = 1, σ = 0.8]

Models e.g., additive noise in communications, exam grade distribution.

10 / 18
Exponential Random Variable X ∼ Expo(λ)

PDF:

f(x) = λ e^{-λx} for x ≥ 0,   and   f(x) = 0 for x < 0.

CDF:

F(x) = 1 - e^{-λx} for x ≥ 0,   and   F(x) = 0 for x < 0.

[Figure: f(x; λ) for λ = 1.5 and λ = 0.75]

Models e.g., device lifetime, service request inter-arrival and service time.
“How long do we have to wait until something happens?”

11 / 18
The Mean and the Variance of X ∼ Expo(λ)

E(X) = ∫_0^∞ x fX (x) dx = ∫_0^∞ λ x e^{-λx} dx
     = [-x e^{-λx}]_0^∞ + ∫_0^∞ e^{-λx} dx
     = 0 + [e^{-λx}/(-λ)]_0^∞ = 1/λ

V(X) = E(X²) - E²(X) = ∫_0^∞ x² fX (x) dx - 1/λ²
     = ∫_0^∞ λ x² e^{-λx} dx - 1/λ² = [-x² e^{-λx}]_0^∞ + 2 ∫_0^∞ x e^{-λx} dx - 1/λ²
     = [-x² e^{-λx} - (2x/λ) e^{-λx} - (2/λ²) e^{-λx}]_0^∞ - 1/λ² = 2/λ² - 1/λ² = 1/λ² .

12 / 18
The Memoryless Property of Exponential RVs

Let T ∼ Expo(λ). Then, for r, s > 0,

P(T > r + s | T > r) = P(T > s) .

Proof:
P(T > r + s | T > r) = P(T > r + s ∩ T > r)/P(T > r)
                     = P(T > r + s)/P(T > r) = (1 - F(r + s))/(1 - F(r))
                     = e^{-λ(r+s)}/e^{-λr} = e^{-λs}
                     = 1 - F(s) = P(T > s)

13 / 18
What is Half-Life?

Which questions can we answer if we know the half-life?

14 / 18
Half Life Example
I Half-life is often used to specify exponential decay.
I E.g., “hard drives have a half-life of two years” means
1
Pr(T > 2) = .
2
where T is the time it takes for a new disk to fail (T is an RV).
What is the probability that a disk needs repair within its first year?
Solution:
I We know that T is an exponential RV but are not given λ.
I We can find λ from the half-life, or
I We can use the memoryless property to solve this problem:

Pr(T > 2) = Pr(T > 1) Pr(T > 1 + 1 | T > 1) = Pr(T > 1) Pr(T > 1) = (Pr(T > 1))² .

⇒ Pr(T > 1) = √Pr(T > 2) = 1/√2.
⇒ Pr(T < 1) = 1 - Pr(T > 1) ≈ 0.293.
15 / 18
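A quick check of the half-life computation above (Python standard library only): recover λ from Pr(T > 2) = 1/2 and recompute Pr(T < 1).

import math

lam = math.log(2) / 2                    # solves e^(-2*lam) = 1/2
p_repair_first_year = 1 - math.exp(-lam)
print(lam, p_repair_first_year)          # approx 0.347 and approx 0.293 = 1 - 1/sqrt(2)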
Failure Rates

For an RV T modeling, e.g., a lifetime of a device, depending on

P(T > r + s | T > r) vs. P(T > s)

we can have
I increasing failure rate if P(T > r + s | T > r) > P(T > s)
I decreasing failure rate if P(T > r + s | T > r) < P(T > s)
I constant failure rate if P(T > r + s | T > r) = P(T > s)

16 / 18
Solve a problem before you go ...

A game is played as follows:


I A random number X is chosen uniformly from [0, 1].
I Then random numbers Y1 , Y2 , . . . are chosen independently and
uniformly from [0, 1], until Yi > X.
I You are then paid (i - 1) dollars.
What is a fair entrance fee for this game?

17 / 18
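A Monte Carlo sketch of the game above (Python standard library only). Given X = x, the number of draws until some Yi > x is geometric with success probability 1 - x, so the conditional expected payout is x/(1 - x); since the integral of x/(1 - x) over [0, 1] diverges, the running average below keeps creeping up as the number of plays grows.

import random

def payout():
    x = random.random()
    paid = 0
    while True:
        if random.random() > x:      # draw Y's until one exceeds x
            return paid              # you are paid (i - 1) dollars
        paid += 1

for plays in (10**3, 10**4, 10**5):
    print(plays, sum(payout() for _ in range(plays)) / plays)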
Reading Material

For this class:


4.1, 4.2 in Introduction to Probability, Statistics, ...
5.2, 6.3 in Introduction to Probability

For next class:


Enjoy your Spring Break!

18 / 18
MULTIPLE CONTINUOUS RVs

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, April 2, 2018

1 / 27
Uniform Random Variable X ⇠ Uniform(a, b)
The PDF:

f(x) = 1/(b - a) for a ≤ x ≤ b,   and   f(x) = 0 for x < a or x > b

[Figure: fX (x) is the constant 1/(b - a) on [a, b] and 0 elsewhere]

The mean and the variance:

E(X) = (a + b)/2          V(X) = (b - a)²/12

CDF:

F(x) = 0 for x < a,   F(x) = (x - a)/(b - a) for a ≤ x ≤ b,   F(x) = 1 for x > b

Models e.g., lack of knowledge or indifference within a range of values.

2 / 27
A Spinner - An Example of a Uniform RV

The experiment consists of A unit-circumference circle:


1. spinning the pointer x
&
2. recording the label x

The PDF of X:

f(x) = 1 for 0 ≤ x ≤ 1,   and   f(x) = 0 otherwise

Point x is at distance x from 0 along the circumference.

A spinner is a continuous counterpart to a die.

3 / 27
Recall Rolling a Die
Recall:
The range ⌦ is the set of all possible outcomes of an experiment.
Elements of ⌦ are called outcomes, and its subsets are called events.

Example:
The range for the die-rolling experiment is ⌦ = {1, 2, 3, 4, 5, 6}. “The
number of dots that turned up is smaller than 4” is an event.

How can we use the knowledge that an event occurred?

4 / 27
Re-Assessing Beliefs

I If an unbiased die is rolled, we believe each number is equally likely.


I Suppose we learn that "the number that turned up is smaller than 4".
How should we change our beliefs about which face turned up?

If µ is the a priori distribution function over ⌦ = {1, 2, 3, 4, 5, 6},


what is the distribution µ0 after the "smaller than 4" event?


µ 1/6 1/6 1/6 1/6 1/6 1/6
µ0 1/3 1/3 1/3 0 0 0

5 / 27
Continuous Conditional Probability

If X is a continuous RV with PDF f(x) and E is an event with P(E) > 0,


we define a conditional density function by the formula
f(x|E) = f(x)/P(E) if x ∈ E,   and   f(x|E) = 0 if x ∉ E.

Is the conditional density a PDF?

For any event F, we have P(F|E), the conditional probability of F given E,

P(F|E) = ∫_F f(x|E) dx = ∫_{E∩F} f(x)/P(E) dx = P(E ∩ F)/P(E) .

6 / 27
Continuous Conditional Probability – Example
Suppose we know the pointer is in the upper half of the circle – event E

What is then the probability of event F that 1/6 6 x 6 1/3 ?

E = [0, 1/2], F = [1/6, 1/3], and F ∩ E = F.

Therefore,

P(F|E) = P(F ∩ E)/P(E) = (1/6)/(1/2) = 1/3

The conditional density is Uniform(0, 1/2):

f(x|E) = f(x)/P(E) if x ∈ E, and 0 if x ∉ E;   that is, f(x|E) = 2 for 0 ≤ x < 1/2, and 0 for 1/2 ≤ x < 1.

7 / 27
Joint PDFs and CDFs of Multiple Continuous RVs

I Let X1 , X2 , . . . , Xn be continuous RVs and X̄ = (X1 , X2 , . . . , Xn ).


Then the joint cumulative distribution function of X̄ is defined by

F(x1 , x2 , . . . , xn ) = P(X1 6 x1 , X2 6 x2 , . . . , Xn 6 xn )

I The joint density function of X̄ satisfies the following:


I F(x1 , x2 , . . . , xn ) = ∫_{-∞}^{x1} ∫_{-∞}^{x2} · · · ∫_{-∞}^{xn} f(t1 , t2 , . . . , tn ) dtn dtn-1 . . . dt1

I f(x1 , x2 , . . . , xn ) = ∂ⁿF(x1 , x2 , . . . , xn ) / (∂x1 ∂x2 · · · ∂xn )

8 / 27
Joint and Marginal PDFs of Two Continuous RVs

If continuous RVs X and Y have joint PDF f(x, y),


then the marginal PDFs fX (x) and fY (y) are given by
fX (x) = ∫_{-∞}^{∞} f(x, y) dy   and   fY (y) = ∫_{-∞}^{∞} f(x, y) dx .

9 / 27
Independent Continuous RVs

I Continuous RVs X1 , X2 , . . . , Xn with CDFs F1 (x), F2 (x), . . . , Fn (x)


are mutually independent iff
F(x1 , x2 , . . . , xn ) = F1 (x1 )F2 (x2 ) · · · Fn (xn )
for any choice of x1 , x2 , . . . , xn .
(For two RVs, we say that they are independent.)

or equivalently
I Continuous RVs X1 , X2 , . . . , Xn with PDFs f1 (x), f2 (x), . . . , fn (x)
are mutually independent iff
f(x1 , x2 , . . . , xn ) = f1 (x1 )f2 (x2 ) · · · fn (xn )
for any choice of x1 , x2 , . . . , xn .

10 / 27
Expectation of Sums and Products
I If X and Y are real-valued RVs and c is any constant, then

E(X + Y) = E(X) + E(Y) ,


E(cX) = cE(X) .

I If X and Y are independent real-valued RVs, then

E(XY) = E(X)E(Y)

Proof: Let ⌦X be the range of X and ⌦Y be the range of Y. Then,


E(XY) = ∫_{ΩX} ∫_{ΩY} x y f(x, y) dy dx
      = ∫_{ΩX} ∫_{ΩY} x y fX (x) fY (y) dy dx
      = ∫_{ΩX} x fX (x) dx · ∫_{ΩY} y fY (y) dy = E(X)E(Y)

11 / 27
Variance of Sums

I If X is a real-valued RV and c is any constant, then

V(cX) = c² V(X) ,
V(X + c) = V(X) .

I If X and Y are independent real-valued RVs, then

V(X + Y) = V(X) + V(Y) .

12 / 27
Covariance and Correlation

We consider two RVs X and Y with finite expectations and variances.

Definition:

1. The covariance of RVs X and Y is


cov(X, Y) = E[(X - E[X])(Y - E[Y])] ,

and the correlation coefficient is ρX,Y = cov(X, Y)/√(V(X)V(Y))

2. We say that X and Y are uncorrelated RVs when cov(X, Y) = 0.

13 / 27
Some Properties of Covariance

How much can one RV tell about the other? Covariance is an indicator.
cov(X, Y) = E[(X - E[X])(Y - E[Y])]

1. cov(X, X) = V(X)
2. cov(X, Y) = cov(Y, X)
3. -1 ≤ ρ(X, Y) ≤ 1
4. V(X + Y) = V(X) + V(Y) + 2 cov(X, Y)
5. cov(X, Y) = E[X · Y] - E[X] · E[Y]
⇒ If two RVs are independent, they are also uncorrelated.

TRUE OR FALSE?
1. If X and Y are uncorrelated, then V(X + Y) = V(X) + V(Y) T
2. If X and Y are uncorrelated, then they are independent. F

14 / 27
PMF of the Sum of Two Independent RVs
A die is rolled twice. Let X1 and X2 be the outcomes, and S2 = X1 + X2 .
Then X1 and X2 iid (independent identically distributed) with PMF m:


m 1/6 1/6 1/6 1/6 1/6 1/6

The PMF of S2 is then the convolution of this PMF with itself:


P(S2 = 2) = m(1)m(1) = 1/6 · 1/6 = 1/36
P(S2 = 3) = m(1)m(2) + m(2)m(1) = 1/6 · 1/6 + 1/6 · 1/6 = 2/36
P(S2 = 4) = m(1)m(3) + m(2)m(2) + m(3)m(1) = 1/6 · 1/6 + 1/6 · 1/6 + 1/6 · 1/6 = 3/36

P(S2 = 5) = 4/36 P(S2 = 6) = 5/36 P(S2 = 7) = 6/36 P(S2 = 8) = 5/36


P(S2 = 9) = 4/36 P(S2 = 10) = 3/36 P(S2 = 11) = 2/36 P(S2 = 12) = 1/36

15 / 27
PMF of the Sum of Two Independent RVs

I Let X and Y be two independent integer-valued RVs and Z = X + Y


If the PMFs of X & Y are pX & pY , we can find pZ , the PMF of Z
I If X = k, then Z = z iff Y = z - k. So the event Z = z is the union

∪_k [(X = k) and (Y = z - k)]

I Since these events are pairwise disjoint, we have

P(Z = z) = Σ_k P((X = k) and (Y = z - k))

I Since X and Y are independent, we have

P(Z = z) = Σ_k P(X = k) · P(Y = z - k)          convolution

16 / 27
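A sketch of the convolution formula above for two fair dice (numpy is assumed to be available):

import numpy as np

m = np.full(6, 1 / 6)                   # PMF of one die on the values 1..6
pmf_s2 = np.convolve(m, m)              # PMF of S2 on the values 2..12
for s, prob in zip(range(2, 13), pmf_s2):
    print(s, round(prob * 36))          # 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1 (in 36ths)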
PDF of the Sum of Two Independent RVs

Let X and Y be two independent continuous RVs and Z = X + Y


If the PDFs of X & Y are fX & fY , we can find fZ , the PDF of Z :
fZ (z) = (fX ∗ fY )(z) = ∫_{-∞}^{+∞} fX (z - y) fY (y) dy
                       = ∫_{-∞}^{+∞} fY (z - x) fX (x) dx

17 / 27
Dirac’s Delta Function

I not a function, something like


δ(x) = +∞ for x = 0,   and   δ(x) = 0 for x ≠ 0

I samples function f at point x:

∫_{-∞}^{∞} f(t) δ(x - t) dt = f(x)

18 / 27
Mathematical Convenience and/or Physical Reality?

How precisely can we locate a particle? Nice video if you like physics.

Is there a limit to the divisibility of space? ∼ 10⁻³⁵ m Planck length.

19 / 27
The Step and the Impulse Function

Dirac’s delta function is in ECE known as the impulse.


It comes from the derivative of the step function.

unit step: H(x)          unit impulse: δ(x) = dH(x)/dx

[Figure: the unit step H(x) and the unit impulse δ(x)]

20 / 27
Continuous Representation of Discrete RVs
Rolling a die with PMF m:


m 1/6 1/6 1/6 1/6 1/6 1/6

PDF with impulses and the corresponding CDF:

f(x) = Σ_{i=1}^{6} (1/6) · δ(x - i)          F(x) = Σ_{i=1}^{6} (1/6) · H(x - i)

[Figure: the PDF as a train of impulses at 1, . . . , 6 and the corresponding staircase CDF]

21 / 27
Mixed Random Variables

X is a mixed RV iff its PDF has both impulses and nonzero, finite values.

Example:
Observe someone dialing a phone and record the duration of the call.
Your observation tells you the following:
I 1/3 of the calls are not answered (and thus last 0 minutes),
I the duration of answered calls is U(0, 3) in minutes.
Let X denote the call duration. Find the PDF, CDF, and the mean of X.

22 / 27
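A simulation sketch for the call-duration example above (Python standard library only); with probability 1/3 the duration is 0 and otherwise it is Uniform(0, 3), so the sample mean should approach (2/3)·(3/2) = 1 minute.

import random

trials = 100_000
calls = [0.0 if random.random() < 1/3 else random.uniform(0, 3)
         for _ in range(trials)]
print(sum(calls) / trials)                    # close to E(X) = 1 minute
print(sum(c <= 1 for c in calls) / trials)    # close to F(1) = 1/3 + (2/3)(1/3) = 5/9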
Random Processes
A random process is a collection of RVs Xt , t 2 T that
I have a common sample space ⌦
I are usually indexed by time t, t 2 T
Examples:
I stock price over some period of time:

I Bernoulli trials (e.g., coin tossing) is a discrete-time random process.


23 / 27
Joint PDFs and CDFs for Random Processes

I Let Xt , t 2 T be a random process and X̄ = (Xt1 , . . . , Xtn ), ti 2 T .


Then the joint cumulative distribution function of X̄ is defined by

F(xt1 , . . . , xtn ) = P(Xt1 6 xt1 , . . . , Xtn 6 xtn )

I The joint density function of X̄ satisfies the following:


I F(xt1 , . . . , xtn ) = ∫_{-∞}^{xt1} · · · ∫_{-∞}^{xtn} f(τt1 , . . . , τtn ) dτtn . . . dτt1

I f(xt1 , . . . , xtn ) = ∂ⁿF(xt1 , . . . , xtn ) / (∂xt1 · · · ∂xtn )

I The mean µt and the variance σt² are also functions of time.

24 / 27
Stationarity

I Random process Xt , t ∈ R is stationary if

f(xt1 , . . . , xtn ) = f(xt1+τ , . . . , xtn+τ ) for all τ ∈ R

i.e., the joint PDF (and thus CDF) does not change by shifts τ in time.

I Random process Xt , t ∈ R is wide-sense stationary (WSS) if

1. its mean is constant in time:

µ = E[Xt ] = E[Xt+τ ] for all τ ∈ R

2. its covariance does not change by shifts in time:

cov(Xt , Xt+τ ) = cov(Xt+s , Xt+s+τ ) = C(τ)

25 / 27
Autocorrelation and Cross-correlation

I Let Xt , t 2 T be a random process.


The autocorrelation is the correlation between Xt and Xs , t, s ∈ T :

R(s, t) = E[(Xt - µt )(Xs - µs )]/(σt σs )

I Let Xt and Yt , t ∈ T be two random processes.

The cross-correlation is the correlation between Xt and Ys , t, s ∈ T :

R(s, t) = E[(Xt - µt )(Ys - µs )]/(σt σs )

26 / 27
Reading Material

For this class:


4, 5.2, 6.1.2, 10.1 in Introduction to Probability, Statistics, ...
4.2, 6.3, 7.2 in Introduction to Probability

For next class:


4.2.3 in Introduction to Probability, Statistics, ...
5.2, 7.2 in Introduction to Probability

27 / 27
(IN)DEPENDENCE OF CONTINUOUS RVs

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, March 22, 2018

1 / 18
Continuous Random Variables

I For a discrete RV X, we talk about P(X = x) (X taking values).


I For a continuous RV X, we talk about P(a < X 6 b)
(X being in an interval)

-1 t x x+ 1

For example: P(-1 < X 6 t) = PX ((-1, t]) or


P(x < X 6 x + ) = PX ((x, x + ]).

2 / 18
Cumulative Distribution Function (CDF)

The cumulative distribution function of a continuous real-valued RV X is

FX (t) = P(X 6 t)

-1 t x x+ 1

Note that
PX ((x, x + ]) = F(x + ) - F(x)

3 / 18
Probability Density Function (PDF)
The probability density function of a continuous real-valued RV X is

P(x < X < x + )


f(x) = lim
!0

-1 t x x+ 1

Therefore,
PX ((x, x + ]) F(x + ) - F(x) d
f(x) = lim = lim = F(x) = F0 (x)
!0 !0 dx
4 / 18
Probability Density Function (PDF) - Properties

I f(x) = dF(x)/dx, and is not the probability that X takes the value x
I F(x) = ∫_{-∞}^{x} f(t) dt
I ∫_{-∞}^{∞} f(t) dt = 1
I P((a, b]) = ∫_{a}^{b} f(x) dx
I ∫_{E} f(x) dx is the probability of event E

5 / 18
Expected Value and Variance

The expected value µ = E(X) of a real-valued RV X is defined by


Z +1
µ = E(X) = xf(x)dx
-1

2
The variance = V(X) of a real-valued RV X is defined by
Z +1
2
= E (X - µ)2 = (x - µ)2 f(x)dx
-1
Z +1
= (x2 - 2µx + µ2 )f(x)dx
-1
Z +1 Z +1 Z +1
= x2 f(x)dx - 2µ xf(x)dx + µ2 f(x)dx
-1 -1 -1
2 2
= E(X ) - 2µ · E(X) + µ · 1
= E(X2 ) - µ2

6 / 18
Uniform Random Variable X ⇠ Uniform(a, b)
The PDF:
8 fX (x)
> 1
< for a 6 x 6 b,
f(x) = b - a
>
:0 for x < a or x > b 1
b-a

The mean and the variance:


a+b (b - a)2
E(X) = V(X) =
2 12
CDF: 8
>
> 0 for x < a
>
<x - a
F(x) = for a 6 x 6 b
>
> b-a
>
:1 for x > b a b x

Models e.g., lack of knowledge or indifference within a range of values.

7 / 18
A Spinner - An Example of a Uniform RV

The experiment consists of A unit-circumference circle:


1. spinning the pointer x
&
2. recording the label x

The PDF of X: 0
8
>
<1 for 0 6 x 6 1
f(x) =
>
:0 otherwise
Point x is distance x from 0.

A spinner is a continuous counterpart to a die.

8 / 18
Recall Rolling a Die
Recall:
The range ⌦ is the set of all possible outcomes of an experiment.
Elements of ⌦ are called outcomes, and its subsets are called events.

Example:
The range for the die-rolling experiment is ⌦ = {1, 2, 3, 4, 5, 6}. “The
number of dots that turned up is smaller than 4” is an event.

How can we use the knowledge that an event occurred?

9 / 18
Re-Assessing Beliefs

I If an unbiased die is rolled, we believe each number is equally likely.


I Suppose we learn that "the number that turned up is smaller than 4".
How should we change our beliefs about which face turned up?

If µ is the a priori distribution function over ⌦ = {1, 2, 3, 4, 5, 6},


what is the distribution µ0 after the "smaller than 4" event?


µ 1/6 1/6 1/6 1/6 1/6 1/6
µ0 1/3 1/3 1/3 0 0 0

10 / 18
Continuous Conditional Probability

If X is a continuous RV with PDF f(x) and E is an event with P(E) > 0,


we define a conditional density function by the formula
8
<f(x)/P(E), if x 2 E,
f(x|E) =
:0, if x 62 E.

Is the conditional density a PDF?

For any event F, we have P(F|E), the conditional probability of F given E,


Z Z
f(x) P(E \ F)
P(F|E) = f(x|E) dx = dx = .
F E\F P(E) P(E)

11 / 18
Continuous Conditional Probability – Example
Suppose we know the pointer is in the upper half of the circle – event E
What is then the probability of event F that 1/6 6 x 6 1/3?

x
E = [0, 1/2], F = [1/6, 1/3], and F \ E = F.

Therefore,
0
P(F \ E) 1/6 1
P(F|E) = = =
P(E) 1/2 3

The conditional density is Uniform(0, 1/2):


8 8
<f(x)/P(E), if x 2 E, <2, if 0 6 x < 1/2,
f(x|E) = =
:0, if x 62 E. :0, if 1/2 6 x < 1.

12 / 18
Joint PDFs and CDFs of Multiple Continuous RVs

I Let X1 , X2 , . . . , Xn be continuous RVs and X̄ = (X1 , X2 , . . . , Xn ).


Then the joint cumulative distribution function of X̄ is defined by

F(x1 , x2 , . . . , xn ) = P(X1 6 x1 , X2 6 x2 , . . . , Xn 6 xn )

I The joint density function of X̄ satisfies the following:


Z x1 Z x2 Z xn
I F(x1 , x2 , . . . , xn ) = ··· f(t1 , t2 , . . . tn ) dtn dtn-1 . . . dt1
-1 -1 -1

@n F(x1 , x2 , . . . , xn )
I f(x1 , x2 , . . . , xn ) =
@x1 @x2 · · · @xn

13 / 18
Joint and Marginal PDFs of Two Continuous RVs

If continuous RVs X and Y have joint PDF f(x, y),


then the marginal PDFs fX (x) and fY (y) are given by
Z1 Z1
fX (x) = f(x, y)dy and fY (y) = f(x, y)dx .
-1 -1

14 / 18
Independent Continuous RVs

I Continuous RVs X1 , X2 , . . . , Xn with CDFs F1 (x), F2 (x), . . . , Fn (x)


are mutually independent iff
F(x1 , x2 , . . . , xn ) = F1 (x1 )F2 (x2 ) · · · Fn (xn )
for any choice of x1 , x2 , . . . , xn .
(For two RVs, we say that they are independent.)

or equivalently
I Continuous RVs X1 , X2 , . . . , Xn with PDFs f1 (x), f2 (x), . . . , fn (x)
are mutually independent iff
f(x1 , x2 , . . . , xn ) = f1 (x1 )f2 (x2 ) · · · fn (xn )
for any choice of x1 , x2 , . . . , xn .

15 / 18
Covariance and Correlation

We consider two RVs X and Y with finite expectations and variances.

Definition:

1. The covariance of RVs X and Y is


cov(X, Y) = E[(X - E[X])(Y - E[Y])] ,

and the correlation coefficient is ρX,Y = cov(X, Y)/√(V(X)V(Y))

2. We say that X and Y are uncorrelated RVs when cov(X, Y) = 0.

16 / 18
Some Properties of Covariance

How much can one RV tell about the other? Covariance is an indicator.
⇥ ⇤
cov(X, Y) = E (X - E[X])(Y - E[Y])

1. cov(X, X) = V(X)
2. cov(X, Y) = cov(Y, X)
3. -1 6 ⇢(X, Y) 6 1
4. V(X + Y) = V(X) + V(Y) + 2 cov(X, Y)
5. cov(X, Y) = E [X · Y] - E [X] · E [Y]
) If two RVs are independent, they are also uncorrelated.

TRUE OR FALSE?
1. If X and Y are uncorrelated, then V(X + Y) = V(X) + V(Y) T
2. If X and Y are uncorrelated, then they are independent. F

17 / 18
Reading Material

For this class:


5.2 in Introduction to Probability, Statistics, ...
4.2, 6.3 in Introduction to Probability

For next class:


4.3.1, 4.3.2, 6.1.2 in Introduction to Probability, Statistics, ...
6.3, 7.2 in Introduction to Probability

18 / 18
SUMS & MIXES OF RVs, PROCESSES

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, April 5, 2018

1 / 18
Expectation of Sums and Products
I If X and Y are real-valued RVs and c is any constant, then

E(X + Y) = E(X) + E(Y) ,


E(cX) = cE(X) .

I If X and Y are independent real-valued RVs, then

E(XY) = E(X)E(Y)

Proof: Let ⌦X be the range of X and ⌦Y be the range of Y. Then,


Z Z
E(XY) = xy f(x, y) dy dx
⌦X ⌦Y
Z Z
= xy fX (x)fY (y) dy dx
⌦X ⌦Y
Z Z
= xfX (x)dx · yfY (y)dy
⌦X ⌦Y

2 / 18
Variance of Sums

I If X is a real-valued RV and c is any constant, then

V(cX) = c2 V(X) ,
V(X + c) = V(X) .

I If X and Y are independent real-valued RVs, then

V(X + Y) = V(X) + V(Y) .

3 / 18
PMF of of the Sum of Two Independent RVs
A die is rolled twice. Let X1 and X2 be the outcomes, and S2 = X1 + X2 .
Then X1 and X2 iid (independent identically distributed) with PMF m:


m 1/6 1/6 1/6 1/6 1/6 1/6

The PMF of S2 is then the convolution of this PMF with itself:


1 1 1
P(S2 = 2) = m(1)m(1) = · =
6 6 36
1 1 1 1 2
P(S2 = 3) = m(1)m(2) + m(2)m(1) = · + · =
6 6 6 6 36
1 1 1 1 1 1 3
P(S2 = 4) = m(1)m(3) + m(2)m(2) + m(3)m(1) = · + · + · =
6 6 6 6 6 6 36

P(S2 = 5) = 4/36 P(S2 = 6) = 5/36 P(S2 = 7) = 6/36 P(S2 = 8) = 5/36


P(S2 = 9) = 4/36 P(S2 = 10) = 3/36 P(S2 = 11) = 2/36 P(S2 = 12) = 1/36

4 / 18
PMF of the Sum of Two Independent RVs

I Let X and Y be two independent integer-valued RVs and Z = X + Y


If the PMFs of X & Y are pX & pY , we can find pZ , the PMF of Z
I If X = k, then Z = z iff Y = z - k. So the event Z = z is the union
[
(X = k) and (Y = z - k)
k

I Since these events are pairwise disjoint, we have


X
P(Z = z) = P (X = k) and (Y = z - k)
k

I Since X and Y are independent, we have


X
P(Z = z) = P(X = k) · P(Y = z - k) convolution
k

5 / 18
PDF of the Sum of Two Independent RVs

Let X and Y be two independent continuous RVs and Z = X + Y


If the PDFs of X & Y are fX & fY , we can find fZ , the PDF of Z :
Z +1
fZ = (fX ⇤ fY )(z) = fX (z - y)fY (y)dy
-1
Z +1
= fY (z - x)fX (x)dx
-1

6 / 18
Sum of Two Independent Normal Random Variables

Let X ∼ (1/√(2π)) e^{-x²/2} and Y ∼ (1/√(2π)) e^{-y²/2} be independent RVs.
What can we say about Z = X + Y?

I Recall that (1/√(2σ²π)) e^{-(x-µ)²/(2σ²)} is the PDF
of a Gaussian RV with the mean µ and variance σ².

I Note that X and Y are zero-mean, unit-variance Gaussian.


Therefore, E(Z) = E(X) + E(Y) = 0 and V(Z) = V(X) + V(Y) = 2.

What is the PDF of Z?

7 / 18
Sum of Two Independent Normal Random Variables
Let X ∼ (1/√(2π)) e^{-x²/2} and Y ∼ (1/√(2π)) e^{-y²/2} be independent RVs.
Find the PDF of Z = X + Y.

We have
fZ (z) = (fX ∗ fY )(z)
       = (1/2π) ∫_{-∞}^{+∞} e^{-(z-y)²/2} e^{-y²/2} dy
       = (1/2π) e^{-z²/4} ∫_{-∞}^{+∞} e^{-(y-z/2)²} dy
       = (1/2π) e^{-z²/4} √π · [ (1/√π) ∫_{-∞}^{+∞} e^{-(y-z/2)²} dy ]     (the bracketed factor equals 1)
       = (1/√(4π)) e^{-z²/4} = (1/√(2·2·π)) e^{-z²/(2·2)}

8 / 18
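A numerical check of the convolution above (numpy is assumed): the convolution of two standard normal PDFs evaluated on a grid should match the N(0, 2) density.

import numpy as np

z = np.linspace(-10, 10, 2001)                    # symmetric grid, step 0.01
dz = z[1] - z[0]
phi = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)      # standard normal PDF
conv = np.convolve(phi, phi, mode="same") * dz    # numerical (fX * fY)(z)
target = np.exp(-z**2 / 4) / np.sqrt(4 * np.pi)   # N(0, 2) PDF
print(np.max(np.abs(conv - target)))              # tiny discretization error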
Dirac’s Delta Function

I not a function, something like


8
<+1, x = 0
(x) =
:0, x 6= 0

I samples function f at point x:


Z1
f(t) (x - t) dt = f(x)
-1

9 / 18
Mathematical Convenience and/or Physical Reality?

How precisely can we locate a particle? Nice video if you like physics.

Is there a limit to the divisibility of space? ∼ 10⁻³⁵ m Planck length.

10 / 18
The Step and the Impulse Function

Dirac’s delta function is in ECE known as the impulse.


It comes from the derivative of the step function.

unit step: unit impulse:


H(x) d
(x) = dx H(x)

1 1

0.5 0.5

-4 -2 2 4 x x

11 / 18
Continuous Representation of Discrete RVs
Rolling a die with PMF m:


m 1/6 1/6 1/6 1/6 1/6 1/6

PDF with impulses and the corresponding CDF:

6
X 6
X
1 1
f(x) = · (x - i) F(x) = · H(x - i)
6 6
i=1 i=1
1 1

0.5 0.5

1 2 3 4 5 6 x 2 4 6 x

12 / 18
Mixed Random Variables

X is a mixed RV iff its PDF has both impulses and nonzero, finite values.

Example:
Observe someone dialing a phone and record the duration of the call.
Your observation tells you the following:
I 1/3 of the calls are not answered (and thus last 0 minutes),
I the duration of answered calls is U(0, 3) in minutes.
Let X denote the call duration. Find the PDF, CDF, and the mean of X.

13 / 18
Random Processes
A random process is a collection of RVs Xt , t 2 T that
I have a common sample space ⌦
I are usually indexed by time t, t 2 T
Examples:
I stock price over some period of time:

I Bernoulli trials (e.g., coin tossing) is a discrete-time random process.


14 / 18
Joint PDFs and CDFs for Random Processes

I Let Xt , t 2 T be a random process and X̄ = (Xt1 , . . . , Xtn ), ti 2 T .


Then the joint cumulative distribution function of X̄ is defined by

F(xt1 , . . . , xtn ) = P(Xt1 6 xt1 , . . . , Xtn 6 xtn )

I The joint density function of X̄ satisfies the following:


Z xt Z xtn
1
I F(xt1 , . . . , xtn ) = ··· f(⌧t1 , . . . ⌧tn ) d⌧tn . . . d⌧t1
-1 -1

@n F(xt1 , . . . , xtn )
I f(xt1 , . . . , xtn ) =
@xt1 · · · @xtn

I The mean µt and the variance t are also functions of time.

15 / 18
Stationarity

I Random process Xt , t ∈ R is stationary if

f(xt1 , . . . , xtn ) = f(xt1+s , . . . , xtn+s ) for all s ∈ R

i.e., the joint PDF (and thus CDF) does not change by shifts s in time.

I Random process Xt , t ∈ R is wide-sense stationary (WSS) if

1. its mean is constant in time:

µ = E[Xt ] = E[Xt+s ] for all s ∈ R

2. its covariance does not change by shifts in time:

cov(Xt , Xt+τ ) = E[(Xt - µ)(Xt+τ - µ)] = cov(Xt+s , Xt+s+τ ) = C(τ)

16 / 18
Autocorrelation and Cross-correlation

I Let Xt , t 2 T be a random process.


The autocorrelation is the correlation between Xt and Xs , t, s 2 T :

E[(Xt - µt )(Xs - µs )]
R(s, t) =
t s

I Let Xt and Yt , t 2 T be two random process.


The cross-correlation is the correlation between Xt and Ys , t, s 2 T :

E[(Xt - µt )(Ys - µs )]
R(s, t) =
t s

17 / 18
Reading Material

For this class:


4.3.1, 4.3.2, 6.1.2, 10 in Introduction to Probability, Statistics, ...
6.3, 7.2 in Introduction to Probability

For next class:


4.2.3 in Introduction to Probability, Statistics, ...
5.2, 7.2 in Introduction to Probability

18 / 18
LIMIT DISTRIBUTIONS & TAIL INEQUALITIES

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, April 12, 2018

1 / 17
Fraction of Heads in Coin Tossing
Problem: A fair coin is tossed n times (e.g. 50, 100, 200)
1. What is the expected fraction of heads?
2. How likely is it that the fraction of heads deviates
from the expected by 0.1 or more?
[Figure: PMF of the fraction of heads for n = 50, 100, 200; x-axis: fraction of heads, y-axis: probability]

2 / 17
Number of Heads in n = 50 Fair-Coin Tosses

I The number of heads is Xn ⇠ B(n, 0.5) ) E(X50 ) = 25.


I How likely is X50 to deviate by more than 5 from 25?

Two tails of the PMF:

Σ_{k=0}^{20} C(50, k) (1/2)^k (1 - 1/2)^{n-k}   +   Σ_{k=30}^{50} C(50, k) (1/2)^k (1 - 1/2)^{n-k}

[Figure: PMF of the number of heads in 50 tosses, with the two tails shaded; x-axis: number of heads, y-axis: probability]

3 / 17
A Tail Inequality

Markov’s inequality:
If X is a nonnegative random variable and a > 0, we have

P(X ≥ a) ≤ E(X)/a

Example:
I The number of heads in n tosses is Xn ∼ B(n, 0.5) ⇒ E(X50 ) = 25.
I By Markov’s inequality, we have

P(X50 ≥ 30) ≤ 25/30 = 5/6
Not a very impressive bound!

Markov’s inequality is used to derive tighter bounds on tail probabilities.

4 / 17
Measuring Deviation from the Mean
Claim: For any RV X and any positive real number ε > 0, we have a
bound on the probability that X differs from E(X) by ε or more:

P(|X - E(X)| ≥ ε) ≤ V(X)/ε²          Chebyshev Inequality

Proof: Let m(x) denote the PMF of X and µ = E(X).

V(X) = Σ_x (x - µ)² m(x)                         variance definition
     ≥ Σ_{|x-µ|≥ε} (x - µ)² m(x)                 dropping positive terms
     ≥ Σ_{|x-µ|≥ε} ε² m(x)                       region of |x - µ| ≥ ε
     = ε² Σ_{|x-µ|≥ε} m(x)
     = ε² P(|X - E(X)| ≥ ε)                      event probability definition


5 / 17
Another Tail(s) Inequality

Chebyshev inequality:
For any RV X and any positive real number ε > 0, we have

P(|X - E(X)| ≥ ε) ≤ V(X)/ε²

Example:
I The number of heads in n tosses is Xn ∼ B(n, 0.5) ⇒
E(X50 ) = 25 and V(X50 ) = 12.5
I By the Chebyshev inequality, we have

P(|X50 - 25| ≥ 5) ≤ 12.5/5² = 0.5

6 / 17
Law of Large Numbers - Statement

Let X1 , X2 , . . . , Xn be an independent trials process

with E(Xj ) = µ < ∞ and V(Xj ) = σ² < ∞ (finite mean and variance).
Let Sn = X1 + X2 + · · · + Xn .

The Weak Law of Large Numbers says that, for any ε > 0,

P(|Sn /n - µ| ≥ ε) → 0 as n → ∞

7 / 17
Law of Large Numbers - Proof

Let X1 , X2 , . . . , Xn be an independent trials process

with E(Xj ) = µ < ∞ and V(Xj ) = σ² < ∞ (finite mean and variance).
Let Sn = X1 + X2 + · · · + Xn . Then
1. E(Sn ) = nµ          expectation of the sum of RVs
2. V(Sn ) = nσ²         variance of the sum of independent RVs

Let Yn = Sn /n. Then

1. E(Yn ) = E(Sn )/n = µ.
2. V(Yn ) = V(Sn )/n² = σ²/n
3. P(|Yn - E(Yn )| ≥ ε) ≤ V(Yn )/ε²          Chebyshev inequality

P(|Sn /n - µ| ≥ ε) ≤ σ²/(nε²) → 0 as n → ∞

8 / 17
The Central Limit Theorem (CLT) – Statement

Let X1 , X2 , . . . , Xn be an independent trials process

with E(Xj ) = µ < ∞ and V(Xj ) = σ² < ∞ (finite mean and variance).
Let Sn = X1 + X2 + · · · + Xn .

Consider the RVs (1/√n)(Sn - nµ) for n = 1, 2, . . . ,
and note that their mean is 0 and their variance is σ².

CLT: The RVs (1/√n)(Sn - nµ) "converge in distribution" to N(0, σ²),
⇔
the RVs (1/√(nσ²))(Sn - nµ) "converge in distribution" to the standard normal:

(Sn - nµ)/√(nσ²) →d N(0, 1) .

9 / 17
The Standardized Sum for Bernoulli Trials

I Let X1 , X2 , . . . , Xn be an independent Bernoulli(p) with q = 1 - p.


I Let Sn = X1 + X2 + · · · + Xn . Recall that Sn ⇠ B(n, p).
Then the standardized sum of Xj is given by

S*n = (Sn - np)/√(npq) .

Note that S*n has expected value 0 and variance 1.

By the CLT, S*n converges to the standard normal RV in distribution, i.e.,

P(S*n ≤ x) → Φ(x) as n → ∞

10 / 17
Binomial Distribution Approximations

Let Sn ⇠ B(n, p) be a Binomial RV with parameters n and p.


✓ ◆
n k
P(Sn = k) = p (1 - p)n-k
k

We have just seen that, when p is fixed and n is large,

(Sn - np)/√(npq) resembles the standard normal RV.

What does Sn look like when n → ∞ and p → 0 s.t. pn → λ?


Why do we care?

11 / 17
Point Processes

Certain kind of occurrence can happen at random over a period of time:


I phone calls to a police precinct from 6pm until midnight
I NYC commuters arriving to the NB train station between 6 & 10am
I RU students arriving to the Busch Center
I between 11:30am and 2:30pm
I between 2:30pm and 4:30pm

We want to model such scenarios to consider events such as, e.g.,


"more than 10 phone calls occurring in a 5-minute time interval"

12 / 17
Point Processes

Consider RU students arriving to the Busch Center over a 2 hour period.


We know (assume):
I the average number of arrivals over a period of time, e.g.,
I 600 between 11:30am and 2:30pm, i.e., 5 arrivals per minute
I 120 between 2:30pm and 4:30pm, i.e., 1 arrival per minute
I an arrival can happen at any time
I the number of arrivals in non-overlapping intervals are independent

We can use the Binomial distribution to model such arrival scenarios.

13 / 17
Point Processes

We can use the Binomial distribution to model such arrival scenarios:


I λ is the arrival rate (the number of arrivals in a unit time interval)
I The unit time interval is broken up into n subintervals of equal
length s.t. either 0 or 1 occurrences can happen in the subinterval.

Arrivals over n = 60 seconds in 1 minute

0 10 20 30 40 50 60

) the number of occurrences in the 1 minute interval is a B(n, p).


What is p?
I We know that
I the expected number of arrivals is λ · 1
I the mean of X ∼ B(n, p) is n · p
⇒ p = λ/n. (Note that p → 0 as n → ∞)

14 / 17
Poisson Arrivals

Let X ∼ B(n, p) be a Binomial RV with parameters n and p s.t. p = λ/n.

Then, for large n, we have

P(X = k) = C(n, k) p^k (1 - p)^{n-k}
         = [n(n - 1)(n - 2) . . . (n - k + 1)/k!] · (λ/n)^k · (1 - λ/n)^{n-k}
           (the first factor ∼ n^k/k!, the last factor ∼ e^{-λ(n-k)/n})
         → λ^k e^{-λ}/k!   as n → ∞

15 / 17
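A sketch of the limit above (Python standard library only): B(n, λ/n) approaches Poisson(λ) as n grows; λ = 5 and n = 1000 are arbitrary example values.

import math

lam, n = 5.0, 1000
p = lam / n
for k in range(10):
    binom_pmf = math.comb(n, k) * p**k * (1 - p)**(n - k)
    poisson_pmf = lam**k * math.exp(-lam) / math.factorial(k)
    print(k, round(binom_pmf, 5), round(poisson_pmf, 5))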
Inter-Arrivals Times

The probability that the time between two arrivals is k · (1/n) is

(1 - λ/n)^k · (λ/n) ∼ λ e^{-λ·k·Δt} · Δt

where Δt = 1/n.

Note that the inter-arrival times are geometric ∼ exponential.

16 / 17
Reading Material

For this class:


6.2.2, 7 in Introduction to Probability, Statistics, ...
9, 5.1 in Introduction to Probability

17 / 17
PROBABILITY IN ECE: TELECOMMUNICATIONS

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, April 16, 2018

1 / 13
Telecommunications

What did Alice say?

[Block diagram: transmitter → X → channel → Y → receiver → X̂]

I Bob’s lack of knowledge makes Alice’s message a Bernoulli RV.


I Channel output RV Y is a noisy version of its input RV X.
The conditional distribution W(Y|X) characterizes the channel.
I Based on the received y, an estimate x̂ of the true x is made.
I The error rate Pr(x ≠ x̂) is minimized if x̂ is chosen
so that it maximizes Pr(x | y) among all x ∈ Ωx .

2 / 13
Communications Channel

[Block diagram: transmitter → X → channel → Y → receiver → X̂]

A communications channel involves at least 2 RVs:


I X – the input; its range ⌦X is called the input alphabet
I Y – the output; its range ⌦Y is called the output alphabet

The relation between the input and the output is described by, e.g.,
I the conditional PDF of the output given the input W(Y | X)
(we call W the transition probability)
I a noise RV Z added to the input s.t. Y = X + Z.

We have some freedom in choosing the input (alphabet, PMF, strings).

3 / 13
The Binary Symmetric Channel BSC(p)

Binary input and output:
ΩX = ΩY = {0, 1}

W(0 | 0) = W(1 | 1) = 1 - p
W(1 | 0) = W(0 | 1) = p

[Channel diagram: 0→0 and 1→1 w.p. 1 - p; 0→1 and 1→0 w.p. p]

We can instead say

Y = X + Z mod 2 binary addition, XOR

where Z ⇠ Bernoulli(p)

4 / 13
Binary Erasure Channel BEC(✏)

Binary input and ternary output:
ΩX = {0, 1}, ΩY = {0, 1, -}

W(1 | 0) = W(0 | 1) = 0
W(0 | 0) = W(1 | 1) = 1 - ε
W(- | 0) = W(- | 1) = ε

[Channel diagram: each input is received correctly w.p. 1 - ε and erased to "-" w.p. ε]

I Is this channel additive?


I Is there a Bernoulli process here?

5 / 13
Binary Input Additive Gaussian Noise Channel

-1 or 1 input and real-valued output: ΩX = {-1, 1}, ΩY = R

W(y | -1) = 1/√(2σ²π) · e^{-(y+1)²/(2σ²)}

W(y | 1) = 1/√(2σ²π) · e^{-(y-1)²/(2σ²)}

[Figure: the two conditional densities W(y | x), centered at -1 (x = -1) and at +1 (x = 1)]

We can instead say Y = X + Z where Z ∼ N(0, σ²).
6 / 13
The Optimal Detector and its Error Rate

I We assume equal priors: P(-1) = P(1) = 1/2.

I The optimal detector: decide x̂ = -1 if y < 0 and x̂ = 1 if y > 0.

[Figure: the two conditional densities W(y | x) with the decision threshold at y = 0]

Probability of error Pe is given by Pe = (1/2) P(e|1) + (1/2) P(e| - 1).
2 2
Because of symmetry, we have P(e|1) = P(e| - 1).

7 / 13
The Optimal Detector and its Error Rate

Suppose Alice sends -1. Then

I Bob receives Y ∼ N(-1, σ²)
I Probability of error is P(e| - 1) = P(Y > 0| - 1).

We have
P(Y > 0| - 1) = ∫_0^∞ 1/√(2πσ²) · e^{-(x+1)²/(2σ²)} dx          set y = (x + 1)/σ
              = ∫_{1/σ}^∞ 1/√(2π) · e^{-y²/2} dy
              = Q(1/σ)

where
Q(x) = (1/2) erfc(x/√2) = 1/√(2π) ∫_x^∞ e^{-y²/2} dy

8 / 13
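A sketch of the error-rate formula above, Pe = Q(1/σ), using the standard library's erfc; σ = 0.5 is just an example noise level.

import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

sigma = 0.5
print(Q(1 / sigma))      # with equal priors, Pe = P(e | -1) = P(e | 1) = Q(1/sigma)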
Introducing Redundancy (Dependence)

We protect messages from errors/erasures by transmitting redundant


information, namely, error/erasure correcting coding, e.g.,

I repetition coding
message transmit
0 000
1 111
I parity-check coding
message transmit
00 000
01 011
10 101
11 110

9 / 13
Random Variables in a Repetition Code on the BSC

I M - RV associated with the message, ⌦M = {0, 1}.


We first assume that P(0) = P(1) = 1/2, i.e., equal priors.
I X̄ - vector RV associated with the channel input, ⌦X̄ = {000, 111}.
I Ȳ - associated with the channel output,
⌦Ȳ = {000, 001, 010, 100, 111, 110, 101, 011}.

Given the channel output ȳ, we look for the most probable input,
namely, the one that maximizes P(X̄ = x̄ | Ȳ = ȳ):

P(X̄ = x̄ | Ȳ = ȳ) = P(Ȳ = ȳ | X̄ = x̄) · P(X̄ = x̄) / P(Ȳ = ȳ)
                  ∝ P(Ȳ = ȳ | X̄ = x̄) · P(X̄ = x̄)

10 / 13
BSC(p) with a Repetition Code
Transitions from 000 input: Transitions from 111 input:

X̄ = 000:  Ȳ    P(Ȳ | X̄)          X̄ = 111:  Ȳ    P(Ȳ | X̄)
          000  (1 - p)³                     000  p³
          100  (1 - p)²p                    100  p²(1 - p)
          010  (1 - p)²p                    010  p²(1 - p)
          001  (1 - p)²p                    001  p²(1 - p)
          110  (1 - p)p²                    110  (1 - p)²p
          011  (1 - p)p²                    011  (1 - p)²p
          101  (1 - p)p²                    101  (1 - p)²p
          111  p³                           111  (1 - p)³

Q: What should the optimal receiver do when we have equal priors?


A: On a memoryless BSC, the receiver should implement majority logic.
What if we did not have a memoryless channel and equal priors?

11 / 13
BSC(p) with a Majority Vote Repetition Code
Transitions from 000 input:
X̄ = 000:  Ȳ    P(Ȳ | X̄)
          000  (1 - p)³
          100  (1 - p)²p
          010  (1 - p)²p
          001  (1 - p)²p
          110  (1 - p)p²
          011  (1 - p)p²
          101  (1 - p)p²
          111  p³

[Figure: Pe vs. p for the uncoded and the coded system]

I Without coding, the probability of error is Pe = p.
I With coding, the probability of error is Pe = p³ + 3p²(1 - p)

12 / 13
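A quick comparison of the uncoded and 3-repetition error probabilities above for a few BSC crossover values p (plain Python, no packages):

for p in (0.01, 0.05, 0.1, 0.2):
    coded = p**3 + 3 * p**2 * (1 - p)    # majority vote fails when 2 or 3 bits flip
    print(p, coded)                      # coded error probability is well below p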
Parity-Check Coding on BEC(✏)


Parity-check coding:

message transmit
00 000
01 011
10 101
11 110

I Without coding, the probability of erasure is Px = ε.


I With coding, the probability of erasure is Px = ε [ε² + 2ε(1 - ε)]

13 / 13
PROBABILITY IN ECE: DETECTION

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, April 19, 2018

1/9
Binary Hypothesis Testing – Example

Example 3.11 in Introduction to Probability


I Ordinary aspirin is effective against headaches 60% of the time.
I A drug company claims that its new aspirin is more effective.
How can we test this claim?

There are two possibilities (the claim is either true or not):


I H0 - the new aspirin IS NOT better than old. null hypothesis
I H1 - the new aspirin IS better than old. alternative hypothesis

A test:
I We give the aspirin to n people to take when they have a headache.
I We accept H1 if at least m people are cured.
How should we determine this critical value m? How does n matter?

2/9
Binary Hypothesis Testing – Example
Consider 50 trials with the rate of cure 60% under H0 and 80% under H1 :

What should m be?

[Figure: PMFs of the number of cured people under H0 and under H1 ; x-axis: # of cured people, y-axis: probability]

3/9
Binary Hypothesis Testing – Errors

Let p0 be the success rate under H0 and p1 under H1 .

After n trials, we can make an error in two ways:


1. the true hypothesis is H0 and we decide H1 ,          type 1

w.p. Σ_{k=m}^{n} C(n, k) p0^k (1 - p0)^{n-k}

2. the true hypothesis is H1 and we decide H0 .          type 2

w.p. Σ_{k=0}^{m-1} C(n, k) p1^k (1 - p1)^{n-k}

We determine the critical value m based on which error is less desirable.

4/9
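A sketch of the two error probabilities above as a function of the critical value m, for n = 50, p0 = 0.6, p1 = 0.8 (scipy is assumed; the range of m values is just an example):

from scipy.stats import binom

n, p0, p1 = 50, 0.6, 0.8
for m in range(33, 39):
    type1 = binom.sf(m - 1, n, p0)       # P(at least m cured | H0)
    type2 = binom.cdf(m - 1, n, p1)      # P(fewer than m cured | H1)
    print(m, round(type1, 3), round(type2, 3))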
Binary Hypothesis Testing – Errors
An error occurs if
1. the true hypothesis is H0 and we decide H1 , false alarm
or
2. the true hypothesis is H1 and we decide H0 . missed detection

We define the following disjoint events:

U0 = {decide H0 } and U1 = {decide H1 }

What can we say about events {U0 ∩ H1 } and {U1 ∩ H0 } ?

Then, the probability of detection error is

P[error] = P[{U1 ∩ H0 } ∪ {U0 ∩ H1 }]
         = P[U1 ∩ H0 ] + P[U0 ∩ H1 ]
         = P[U1 |H0 ]P[H0 ] + P[U0 |H1 ]P[H1 ] .

5/9
Denial-of-Service (DoS) Cyber Attack
I Goal: Make a network resource unavailable to its intended users.
I DoS is typically accomplished by overloading the targeted resource
(e.g., cloud computing server) with superfluous requests.

I Possible symptoms:
1) seeming unavailability of a web site
2) extremely slow file download

We have to decide: is this an attack, or just normal traffic fluctuations?


6/9
Binary Hypothesis Testing – Ingredients

1. Two hypotheses: H0 and H1 .


In our example:
I H0 – the traffic is normal
I H1 – a DoS attack is happening
2. A probabilistic model of the system under each hypothesis.
In our example,
RV X – number of incoming packets in a time window of length T .
I Under normal traffic, X is Poisson(α).
I Under a DoS attack, X is Poisson(λ).
3. We can observe an event.
In our example,
we can count packets arriving in a time window of length T .

7/9
Binary Hypothesis Testing – Detection

1. We observe that the number of incoming packets is x.


2. If PH1 |X (x) > PH0 |X (x), then H1 “better explains the data”.
3. We know the PMFs of X under each hypothesis:
PX|H0 (x) = α^x e^{-α}/x! for x = 0, 1, . . . (and 0 otherwise)
PX|H1 (x) = λ^x e^{-λ}/x! for x = 0, 1, . . . (and 0 otherwise)

Therefore, we can find PH1|X (x) and PH0|X (x) if we know the priors:

PH1|X (x) / PH0|X (x) = [PX|H1 (x) · P(H1 )] / [PX|H0 (x) · P(H0 )]
                      = e^{-λ+α} (λ/α)^x · P(H1 )/P(H0 )

where P(H0 ) and P(H1 ) are the priors.


8/9
Binary Hypothesis Testing – Decision Rule
Maximize the a-posteriori probability of the hypothesis (MAP):
Declare H1 : PH1|X (x) > PH0|X (x)
Declare H0 : PH1|X (x) ≤ PH0|X (x).

Since PH1|X (x) / PH0|X (x) = e^{-λ+α} (λ/α)^x · P(H1 )/P(H0 ),
we have the following threshold-based decision rule:

Declare H1 : x > [λ - α + ln(P[H0 ]/P[H1 ])] / ln(λ/α)

Declare H0 : x ≤ [λ - α + ln(P[H0 ]/P[H1 ])] / ln(λ/α)

9/9
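A sketch of the MAP threshold above (Python standard library only); the rates α, λ and the priors below are example values, not taken from the notes:

import math

alpha, lam = 100.0, 150.0       # packet rates under normal traffic and under attack
p_h0, p_h1 = 0.9, 0.1           # priors
threshold = (lam - alpha + math.log(p_h0 / p_h1)) / math.log(lam / alpha)
print(threshold)                # declare H1 when the observed count x exceeds this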
PROBABILITY IN ECE: MACHINE LEARNING

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, April 23, 2018

1 / 13
Bernoulli and Binomial RVs

I Let X1 , X2 , . . . , Xn be independent Bernoulli(p) with q = 1 - p.


I Let Sn = X1 + X2 + · · · + Xn .
We know that
I E(Xi ) = p and V(Xi ) = pq.
I E(Sn ) = np and V(Sn ) = npq.
I Sn ⇠ B(n, p)
I P(Sn ≥ a) = Σ_{k=a}^{n} C(n, k) p^k q^{n-k} , where each term is P(Sn = k)

2 / 13
Markov’s Inequality

If X is a nonnegative random variable and a > 0, we have

P(X ≥ a) ≤ E(X)/a

Example: Tossing a coin with P(H) = 0.25 n times.
I The number of heads in n tosses is Sn ∼ B(n, 0.25)
⇒ E(S400 ) = 100.
I For a = 150, by Markov’s inequality, we have

P(S400 ≥ 150) ≤ 100/150 = 2/3

Not an impressive bound! In fact, P(S400 ≥ 150) = 2.18 · 10⁻⁸ .

Markov’s inequality is used to derive other bounds on tail probabilities.

3 / 13
Chebyshev Inequality
For a RV X, we can apply Markov’s inequality to the RV Y = (X - E(X))² :

P(Y ≥ a) ≤ E(Y)/a  ⇔  P((X - E(X))² ≥ a) ≤ V(X)/a
                    ⇔  P(|X - E(X)| ≥ √a) ≤ V(X)/a

By setting ε = √a, we get the Chebyshev Inequality:
For any RV X and any positive real number ε > 0, we have

P(|X - E(X)| ≥ ε) ≤ V(X)/ε²

Example: Tossing a coin with P(H) = 0.25 n times.
I The number of heads in n tosses is Sn ∼ B(n, 0.25) ⇒
E(S400 ) = 100 and V(S400 ) = 75
I By the Chebyshev inequality, we have

P(|S400 - 100| ≥ 50) ≤ 75/50² = 0.03
4 / 13
The Standardized Sum for Bernoulli Trials

I Let X1 , X2 , . . . , Xn be an independent Bernoulli(p) with q = 1 - p.


I Let Sn = X1 + X2 + · · · + Xn . Recall that Sn ⇠ B(n, p).
Then the standardized sum of Xj is given by

S*n = (Sn - np)/√(npq) .

Note that S*n has expected value 0 and variance 1.

By the CLT, S*n converges to the standard normal RV in distribution, i.e.,

P(S*n ≤ x) → Φ(x) as n → ∞

In our example, we have p = 0.25, and are interested in P(S400 > 150)

5 / 13
Estimating the Sn Tail Probability by CLT

Note that

S*n = (Sn - np)/√(npq)  ⇒  Sn ≥ a ⇔ S*n ≥ (a - np)/√(npq)

In our example, we have p = 0.25 and a = 150, and thus have

P(Sn ≥ 150) = P(S*n ≥ 50/√75) ≈ 1 - Φ(50/√75) = (1/2) erfc(50/(√75 · √2))

We have (1/2) erfc(50/(√75 · √2)) ≈ 4 · 10⁻⁹
2

6 / 13
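A comparison sketch of the bounds above against the exact tail for S400 ∼ B(400, 0.25) and the event S400 ≥ 150 (scipy is assumed):

import math
from scipy.stats import binom

n, p, a = 400, 0.25, 150
mean, var = n * p, n * p * (1 - p)               # 100 and 75
markov = mean / a                                # 2/3
chebyshev = var / (a - mean)**2                  # two-sided bound, = 0.03
clt = 0.5 * math.erfc((a - mean) / math.sqrt(2 * var))   # approx 4e-9
exact = binom.sf(a - 1, n, p)                    # approx 2.18e-8
print(markov, chebyshev, clt, exact)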
Review Tree Diagrams – Jan. 29 Lecture
Tree diagrams are used to study experiments that take place in stages,
e.g., ordering food in restaurants (appetizer, main dish, dessert):

[Tree diagram: (start) → appetizer (soup or juice) → main dish (meat, fish, or vegetable) → dessert (ice cream or cake)]

How many possible choices are there for the complete meal?
7 / 13
Tree Diagrams – Total Probability and Bayes Rule
[Tree diagram: (start) → soup (.8) or juice (.2) → main dish; given soup: meat .5, fish .3, vegetable .2; given juice: meat .3, fish .4, vegetable .3]

ω     outcome             m(ω)
ω1    soup,  meat          .40
ω2    soup,  fish          .24
ω3    soup,  vegetable     .16
ω4    juice, meat          .06
ω5    juice, fish          .08
ω6    juice, vegetable     .06

P(meat) = P(meat|soup)P(soup) + P(meat|juice)P(juice)          two paths

P(soup|meat) = P(meat|soup)P(soup)/P(meat)
8 / 13
The Multi-Armed Bandit Problem

Suppose you can go to your current favorite restaurant or try a new one.
What would you do each evening for dinner over a month?

There are multiple choices, each providing known rewards, e.g.,


But you do not know the probabilities of rewards given the choice.

At each time point, you can


I exploit one choice
I explore other options

9 / 13
An m-Coin Bandit Problem

Actions, Rewards, and Uncertainty


I You can make T coin tosses, and each head earns you a dollar.
I For each toss s, s 2 {1, . . . , T }, you can pick any of the m coins.
I You don’t know pi , the probability of head for coin i, i 2 {1, . . . , m}.

What should you do to maximize your cumulative reward?

10 / 13
The m-Armed Bandit Problem

Ingredients:
I A – known set of m possible actions (e.g., select & toss a coin)
I R – known set of possible rewards (e.g., get a dollar or not)
I P[r|a] – unknown PDFs of rewards r 2 R given an action a 2 A.
Dynamics:
At each step s
I the agent selects an action as 2 A
I the environment generates a reward rs 2 R w.p. P[rs |as ]
The goal is to maximize cumulative reward.

11 / 13
A 2-Coin Bandit Problem

Actions, Rewards, and Uncertainty


I You can make T coin tosses, and each head earns you a dollar.
I For each toss s, s 2 {1, . . . , T }, you can pick one of the two coins.
I You don’t know p1 and p2 the head probabilities for the coins.

What should you do to maximize your cumulative reward?


Let p* = max{p1 , p2 }. We assume p1 > p2 , wlog.
Then the highest expected reward is V* = T · p* = T p1 .

For a sequence α of actions with the expected reward Vα ,


we say that V* - Vα is the regret.
The goal is to find α with the minimum regret.

12 / 13
A 2-Coin Bandit Problem

What could you do to maximize your cumulative reward.

No Exploration Algorithm:
Pick a coin at random and toss T times. The expected reward is

V1 = (1/2) T p1 + (1/2) T p2 .

The regret is (1/2) T (p1 - p2 ), and is linear in T .

Explore-First Algorithm:
1. Exploration phase: toss each coin n < T times.
2. Select the coin with the highest average reward.
3. Exploitation phase: toss the selected coin in all remaining rounds.

What else?
13 / 13
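A Monte Carlo sketch comparing the two strategies above for a 2-coin bandit (plain Python; p1, p2, T, and the exploration length n are example values):

import random

def toss(p):
    return 1 if random.random() < p else 0

def no_exploration(p1, p2, T):
    p = random.choice((p1, p2))               # commit to a randomly picked coin
    return sum(toss(p) for _ in range(T))

def explore_first(p1, p2, T, n):
    r1 = sum(toss(p1) for _ in range(n))      # explore each coin n times
    r2 = sum(toss(p2) for _ in range(n))
    best = p1 if r1 >= r2 else p2             # then exploit the better-looking coin
    return r1 + r2 + sum(toss(best) for _ in range(T - 2 * n))

p1, p2, T, n, runs = 0.7, 0.4, 1000, 50, 2000
best_reward = T * max(p1, p2)
for name, play in (("no exploration", lambda: no_exploration(p1, p2, T)),
                   ("explore first ", lambda: explore_first(p1, p2, T, n))):
    avg = sum(play() for _ in range(runs)) / runs
    print(name, "average regret:", best_reward - avg)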
BAYES RULE & TOTAL PROBABILITY

PROBABILITY AND RANDOM PROCESSES (ECE 226)


Rutgers, Spring 2018

Lecture Notes, April 26, 2018

1/7
Tree Diagrams – Jan. 29 Lecture
Tree diagrams are used to study experiments that take place in stages,
e.g., ordering food in restaurants (appetizer, main dish, dessert):

[Tree diagram: (start) → appetizer (soup or juice) → main dish (meat, fish, or vegetable) → dessert (ice cream or cake)]

How many possible choices are there for the complete meal?
2/7
Tree Diagrams – Total Probability and Bayes Rule
[Tree diagram: (start) → soup (.8) or juice (.2) → main dish; given soup: meat .5, fish .3, vegetable .2; given juice: meat .3, fish .4, vegetable .3]

ω     outcome             m(ω)
ω1    soup,  meat          .40
ω2    soup,  fish          .24
ω3    soup,  vegetable     .16
ω4    juice, meat          .06
ω5    juice, fish          .08
ω6    juice, vegetable     .06

P(meat) = P(meat|soup)P(soup) + P(meat|juice)P(juice)          two paths

P(soup|meat) = P(meat|soup)P(soup)/P(meat)
3/7
An Experiment, Its Outcomes & Events

I Three types of users (Smartphone, Tablet, or Computer)


can be seen accessing an online store.
I The probability of seeing a Smartphone (S) is 1/2,
of a Tablet (T ) is 1/4, and of a Computer (C) is 1/4.

I Each user requests either an Audio (A) or a Video (V) file.


I Audio is requested by a Smartphone with probability (w.p.) 1/2,
by a Tablet w.p. 2/5, and by a Computer w.p. 3/10.

4/7
Some Example Questions

1. Find the probability of a user requesting an Audio file?


2. Find the probability of a user requesting a Video file?
3. Find the probability that a user requesting a Video is a Tablet?
4. Find the probability that a user requesting an Audio is a Smartphone.
5. Find the probability that a user is a Computer & requests Video?

5/7
Some Example Questions
1. Find the probability of a user requesting an Audio file?
We can use the Total Probability Theorem across user types:

P[A] = P[A|S]P[S] + P[A|T ]P[T ] + P[A|C]P[C]
     = 1/2 · 1/2 + 2/5 · 1/4 + 3/10 · 1/4 = 17/40 .

2. Find the probability of a user requesting a Video file?


Since users request either Audio or Video files, we have

P[V] = P[Aᶜ] = 1 - P[A] = 23/40 .

3. Find the probability that a user requesting a Video is a Tablet?


We use Bayes’ Theorem: P[T |V] = P[V|T ]P[T ]/P[V] = (3/5 · 1/4)/(23/40) = 6/23 .
6/7
Some Example Questions (continued)

4. Find the probability that a user requesting an Audio is a Smartphone.


We again use Bayes’ Theorem:

P[S|A] = P[A|S]P[S]/P[A] = (1/2 · 1/2)/(17/40) = 10/17 .

5. Find the probability that a user is a Computer & requests Video?


Note that this is an intersection of events:

P[C ∩ V] = P[V|C]P[C] = 7/10 · 1/4 = 7/40

7/7
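A quick check of the numbers above by summing over the joint PMF of (user type, file type), using plain Python with exact fractions:

from fractions import Fraction as F

prior = {"S": F(1, 2), "T": F(1, 4), "C": F(1, 4)}
p_audio = {"S": F(1, 2), "T": F(2, 5), "C": F(3, 10)}

P_A = sum(prior[u] * p_audio[u] for u in prior)          # 17/40
P_V = 1 - P_A                                            # 23/40
P_T_given_V = prior["T"] * (1 - p_audio["T"]) / P_V      # 6/23
P_S_given_A = prior["S"] * p_audio["S"] / P_A            # 10/17
P_C_and_V = prior["C"] * (1 - p_audio["C"])              # 7/40
print(P_A, P_V, P_T_given_V, P_S_given_A, P_C_and_V)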
