
4. Distributions

Dave Goldsman
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology

8/5/20

ISYE 6739 — Goldsman 8/5/20 1 / 108


Outline
1 Bernoulli and Binomial Distributions
2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff
ISYE 6739 — Goldsman 8/5/20 2 / 108
Bernoulli and Binomial Distributions

Lesson 4.1 — Bernoulli and Binomial Distributions

Goal: We’ll discuss lots of interesting distributions in this module.

The module will be a compendium of results, some of which we’ll prove, and
some of which we’ve already seen previously.

Special emphasis will be placed on the Normal distribution, because it’s so
important and has so many implications, including the Central Limit
Theorem.

In the next few lessons we’ll discuss some important discrete distributions:
Bernoulli and Binomial Distributions
Hypergeometric Distribution
Geometric and Negative Binomial Distributions
Poisson Distribution

ISYE 6739 — Goldsman 8/5/20 3 / 108


Bernoulli and Binomial Distributions

Definition: The Bernoulli distribution with parameter p is given by


X = 1  w.p. p  (“success”),    X = 0  w.p. q = 1 − p  (“failure”).

Recall: E[X] = p, Var(X) = pq, and MX(t) = pe^t + q.

Definition: The Binomial distribution with parameters n and p is given by

P(Y = k) = (n choose k) p^k q^(n−k),   k = 0, 1, . . . , n.

ISYE 6739 — Goldsman 8/5/20 4 / 108


Bernoulli and Binomial Distributions

Example: Toss 2 dice and take the sum; repeat 5 times. Let Y be the number
of 7’s you see. Y ∼ Bin(5, 1/6). Then, e.g.,

P(Y = 4) = (5 choose 4) (1/6)^4 (5/6)^(5−4).   □

Theorem: If X1, . . . , Xn are iid Bern(p), then Y ≡ Σ_{i=1}^n Xi ∼ Bin(n, p).

Proof: This kind of result can easily be proved by a moment generating
function uniqueness argument, as we mentioned in previous modules.   □

Think of the Binomial as the number of successes from n Bern(p) trials.

ISYE 6739 — Goldsman 8/5/20 5 / 108
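A quick numerical check of the dice example (a sketch only, assuming Python with numpy is available; the sample size and seed are just illustrative choices):

```python
import math
import numpy as np

n, p = 5, 1/6            # 5 repetitions; P(sum of two dice = 7) = 6/36 = 1/6
q = 1 - p

# Exact Bin(5, 1/6) probability of exactly four 7's.
exact = math.comb(n, 4) * p**4 * q**(n - 4)

# Monte Carlo: toss two dice 5 times per replication, count the 7's.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=(100_000, n, 2)).sum(axis=2)  # sums of two dice
Y = (rolls == 7).sum(axis=1)                                  # number of 7's in each batch of 5

print(exact, (Y == 4).mean())   # both should be near 0.003
```

The simulated relative frequency of {Y = 4} should hover around the exact pmf value of about 0.0032.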


Bernoulli and Binomial Distributions

Theorem: Y ∼ Bin(n, p) implies


E[Y] = E[ Σ_{i=1}^n Xi ] = Σ_{i=1}^n E[Xi] = np.

Similarly,
Var(Y ) = npq.
We’ve already seen that

MY(t) = (pe^t + q)^n.

Theorem: Certain Binomials add up: If Y1, . . . , Yk are independent and
Yi ∼ Bin(ni, p), then

Σ_{i=1}^k Yi ∼ Bin( Σ_{i=1}^k ni, p ).

ISYE 6739 — Goldsman 8/5/20 6 / 108
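A small sketch (Python with numpy assumed; p, n1, n2, and the sample size are arbitrary illustrative values) checking E[Y] = np, Var(Y) = npq, and the additivity theorem by simulation:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
p, q = 0.3, 0.7
n1, n2 = 4, 6

Y1 = rng.binomial(n1, p, size=200_000)
Y2 = rng.binomial(n2, p, size=200_000)
S = Y1 + Y2                        # should behave like Bin(n1 + n2, p)

print(Y1.mean(), n1 * p)           # both near 1.2
print(Y1.var(), n1 * p * q)        # both near 0.84
for k in range(3):                 # compare a few probabilities with the Bin(10, 0.3) pmf
    pmf = math.comb(n1 + n2, k) * p**k * q**(n1 + n2 - k)
    print(k, (S == k).mean(), pmf)
```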


Hypergeometric Distribution

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 7 / 108


Hypergeometric Distribution

Lesson 4.2 — Hypergeometric Distribution

Definition: You have a objects of type 1 and b objects of type 2.

Select n objects without replacement from the a + b.

Let X be the number of type 1’s selected. Then X has the Hypergeometric
distribution with pmf
P(X = k) = (a choose k)(b choose n−k) / (a+b choose n),   k = max(0, n − b), . . . , min(a, n).

Example: 25 sox in a box. 15 red, 10 blue. Pick 7 w/o replacement.


P(exactly 3 reds are picked) = (15 choose 3)(10 choose 4) / (25 choose 7).   □

ISYE 6739 — Goldsman 8/5/20 8 / 108
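The sox-in-a-box probability, computed straight from the pmf (a plain-Python sketch):

```python
import math

a, b, n = 15, 10, 7   # 15 red, 10 blue, pick 7 without replacement
k = 3
prob = math.comb(a, k) * math.comb(b, n - k) / math.comb(a + b, n)
print(prob)           # C(15,3) C(10,4) / C(25,7) ≈ 0.199
```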


Hypergeometric Distribution

Theorem: After some algebra, it turns out that


 
E[X] = n (a/(a+b)),   and

Var(X) = n (a/(a+b)) (1 − a/(a+b)) ((a+b−n)/(a+b−1)).

Remark: Here, a/(a+b) plays the role of p in the Binomial distribution.

And then the corresponding Y ∼ Bin(n, p) results would be


 
E[Y] = n (a/(a+b)),   and

Var(Y) = n (a/(a+b)) (1 − a/(a+b)).
So same mean as the Hypergeometric, but slightly larger variance.
ISYE 6739 — Goldsman 8/5/20 9 / 108
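A tiny plain-Python sketch comparing the two sets of moments for the sox example (the numbers are just the example's values), showing the equal means and the finite-population "shrink" factor on the variance:

```python
a, b, n = 15, 10, 7
p = a / (a + b)

mean_hyper = n * p
var_hyper = n * p * (1 - p) * (a + b - n) / (a + b - 1)
mean_bin = n * p
var_bin = n * p * (1 - p)

print(mean_hyper, mean_bin)   # 4.2 and 4.2: same mean
print(var_hyper, var_bin)     # about 1.26 vs 1.68: Binomial variance is larger
```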
Geometric and Negative Binomial Distributions

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 10 / 108


Geometric and Negative Binomial Distributions

Lesson 4.3 — Geometric and Negative Binomial Distributions

Definition: Suppose we consider an infinite sequence of independent Bern(p) trials.

Let Z equal the number of trials until the first success is obtained. The event
Z = k corresponds to k − 1 failures, and then a success. Thus,

P(Z = k) = q^(k−1) p,   k = 1, 2, . . . ,

and we say that Z has the Geometric distribution with parameter p.

Notation: Z ∼ Geom(p).

We’ll get the mean and variance of the Geometric via the mgf. . . .

ISYE 6739 — Goldsman 8/5/20 11 / 108


Geometric and Negative Binomial Distributions

Theorem: The mgf of the Geom(p) is

MZ(t) = pe^t / (1 − qe^t),   for t < ln(1/q).

Proof:

MZ(t) = E[e^{tZ}]
      = Σ_{k=1}^∞ e^{tk} q^(k−1) p
      = pe^t Σ_{k=0}^∞ (qe^t)^k
      = pe^t / (1 − qe^t),   for qe^t < 1.   □

ISYE 6739 — Goldsman 8/5/20 12 / 108
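A sanity check of the mgf (plain-Python sketch; p and t are arbitrary values with t < ln(1/q)), comparing the closed form against a truncated version of the defining sum:

```python
import math

p, t = 0.25, 0.1    # here ln(1/q) = ln(1/0.75) ≈ 0.288, so t = 0.1 is allowed
q = 1 - p

closed_form = p * math.exp(t) / (1 - q * math.exp(t))
partial_sum = sum(math.exp(t * k) * q**(k - 1) * p for k in range(1, 500))
print(closed_form, partial_sum)   # the two values agree to many decimal places
```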


Geometric and Negative Binomial Distributions

Corollary: E[Z] = 1/p.

Proof:
E[Z] = (d/dt) MZ(t) |_{t=0}
     = [(1 − qe^t)(pe^t) − (−qe^t)(pe^t)] / (1 − qe^t)^2 |_{t=0}
     = pe^t / (1 − qe^t)^2 |_{t=0}
     = p / (1 − q)^2 = 1/p.   □

Remark: We could also have proven this directly from the definition of
expected value.

ISYE 6739 — Goldsman 8/5/20 13 / 108


Geometric and Negative Binomial Distributions

Similarly, after a lot of algebra,

E[Z^2] = (d^2/dt^2) MZ(t) |_{t=0} = (2 − p)/p^2,

and then

Var(Z) = E[Z^2] − (E[Z])^2 = q/p^2.   □

Example: Toss a die repeatedly. What’s the probability that we observe a 3
for the first time on the 8th toss?

Answer: The number of tosses we need is Z ∼ Geom(1/6).


P(Z = 8) = q^(k−1) p = (5/6)^7 (1/6).   □

How many tosses would we expect to take?

Answer: E[Z] = 1/p = 6 tosses.   □

ISYE 6739 — Goldsman 8/5/20 14 / 108
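The die example worked numerically (a plain-Python sketch):

```python
p = 1/6
q = 1 - p

print(q**7 * p)   # P(Z = 8) = (5/6)^7 (1/6) ≈ 0.0465
print(1 / p)      # E[Z] = 6 tosses
```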


Geometric and Negative Binomial Distributions

Memoryless Property of Geometric

Theorem: Suppose Z ∼ Geom(p). Then for positive integers s, t, we have

P (Z > s + t|Z > s) = P (Z > t).

Why is it called the Memoryless Property? If an event hasn’t occurred by time
s, the probability that it will occur after an additional t time units is the same
as the (unconditional) probability that it will occur after time t — it forgot that
it made it past time s!

ISYE 6739 — Goldsman 8/5/20 15 / 108


Geometric and Negative Binomial Distributions

Proof: First of all, for any t = 0, 1, 2, . . ., the tail probability is

P(Z > t) = P(t Bern(p) failures in a row) = q^t.   (∗)

Then

P(Z > s + t | Z > s) = P(Z > s + t ∩ Z > s) / P(Z > s)
                     = P(Z > s + t) / P(Z > s)
                     = q^(s+t) / q^s   (by (∗))
                     = q^t
                     = P(Z > t).   □

ISYE 6739 — Goldsman 8/5/20 16 / 108


Geometric and Negative Binomial Distributions

Example: Let’s toss a die until a 5 appears for the first time. Suppose that
we’ve already made 4 tosses without success. What’s the probability that
we’ll need more than 2 additional tosses before we observe a 5?

Let Z be the number of tosses required. By the Memoryless Property (with
s = 4 and t = 2) and (∗), we want

P(Z > 6 | Z > 4) = P(Z > 2) = (5/6)^2.   □

Fun Fact: The Geom(p) is the only discrete distribution with the
memoryless property.

Not-as-Fun Fact: Some books define the Geom(p) as the number of
Bern(p) failures until you observe a success. # failures = # trials − 1. You
should be aware of this inconsistency, but don’t worry about it now.

ISYE 6739 — Goldsman 8/5/20 17 / 108
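A simulation illustrating the Memoryless Property for this example, with s = 4 and t = 2 (a sketch assuming Python with numpy; note numpy's geometric sampler counts trials, matching the convention on these slides):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 1/6
Z = rng.geometric(p, size=500_000)       # number of tosses until the first 5

conditional = (Z > 6)[Z > 4].mean()      # estimate of P(Z > 6 | Z > 4)
unconditional = (Z > 2).mean()           # estimate of P(Z > 2)
print(conditional, unconditional, (5/6)**2)   # all three near 0.694
```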


Geometric and Negative Binomial Distributions

Definition: Suppose we consider an infinite sequence of independent Bern(p) trials.

Now let W equal the number of trials until the rth success is obtained.
W = r, r + 1, . . .. The event W = k corresponds to exactly r − 1 successes
by time k − 1, and then the rth success at time k.

We say that W has the Negative Binomial distribution (aka the Pascal
distribution) with parameters r and p.

Example: ‘FFFFSFS’ corresponds to W = 7 trials until the r = 2nd success.

Notation: W ∼ NegBin(r, p).

Remark: As with the Geom(p), the exact definition of the NegBin depends
on what book you’re reading.

ISYE 6739 — Goldsman 8/5/20 18 / 108


Geometric and Negative Binomial Distributions

Theorem: If Z1, . . . , Zr are iid Geom(p), then W = Σ_{i=1}^r Zi ∼ NegBin(r, p).
In other words, Geometrics add up to a NegBin.

Proof: Won’t do it here, but you can use the mgf technique.   □

Anyhow, it makes sense if you think of Zi as the number of trials after the
(i − 1)st success up to and including the ith success.

Since the Zi ’s are i.i.d., the above theorem gives:

E[W] = rE[Zi] = r/p,

Var(W) = rVar(Zi) = rq/p^2,

MW(t) = [MZi(t)]^r = (pe^t / (1 − qe^t))^r.

ISYE 6739 — Goldsman 8/5/20 19 / 108


Geometric and Negative Binomial Distributions

Just to be complete, let’s get the pmf of W . Note that W = k iff you get
exactly r − 1 successes by time k − 1, and then the rth success at time k.
So. . .
    
P(W = k) = (k−1 choose r−1) p^(r−1) q^(k−r) p = (k−1 choose r−1) p^r q^(k−r),   k = r, r + 1, . . .

Example: Toss a die until a 5 appears for the third time. What’s the
probability that we’ll need exactly 7 tosses?

Let W be the number of tosses required. Clearly, W ∼ NegBin(3, 1/6).


 
P(W = 7) = (7−1 choose 3−1) (1/6)^3 (5/6)^(7−3).

ISYE 6739 — Goldsman 8/5/20 20 / 108
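The NegBin example computed straight from the pmf (a plain-Python sketch):

```python
import math

r, p = 3, 1/6
q = 1 - p
k = 7
prob = math.comb(k - 1, r - 1) * p**r * q**(k - r)
print(prob)   # C(6,2) (1/6)^3 (5/6)^4 ≈ 0.0335
```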


Geometric and Negative Binomial Distributions

How are the Binomial and NegBin Related?

X1, . . . , Xn iid ∼ Bern(p) ⇒ Y ≡ Σ_{i=1}^n Xi ∼ Bin(n, p).

Z1, . . . , Zr iid ∼ Geom(p) ⇒ W ≡ Σ_{i=1}^r Zi ∼ NegBin(r, p).

E[Y] = np, Var(Y) = npq.

E[W] = r/p, Var(W) = rq/p^2.

ISYE 6739 — Goldsman 8/5/20 21 / 108


Poisson Distribution

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 22 / 108


Poisson Distribution

Lesson 4.4 — Poisson Distribution

We’ll first talk about Poisson processes.

Let N(t) be a counting process. That is, N(t) is the number of
occurrences (or arrivals, or events) of some process over the time interval
[0, t]. N(t) looks like a step function.

Examples: N (t) could be any of the following.


(a) Cars entering a shopping center (by time t).
(b) Defects on a wire (of length t).
(c) Raisins in cookie dough (of volume t).

Let λ > 0 be the average number of occurrences per unit time (or length or
volume).

In the above examples, we might have:


(a) λ = 10/min. (b) λ = 0.5/ft. (c) λ = 4/in^3.
ISYE 6739 — Goldsman 8/5/20 23 / 108
Poisson Distribution

A Poisson process is a specific counting process. . . .

First, some notation: o(h) is a generic function that goes to zero faster than h
goes to zero.

Definition: A Poisson process (PP) is one that satisfies the following:

(i) There is a short enough interval of time h, such that, for all t,

P (N (t + h) − N (t) = 0) = 1 − λh + o(h)
P (N (t + h) − N (t) = 1) = λh + o(h)
P (N (t + h) − N (t) ≥ 2) = o(h)

(ii) The distribution of the “increment” N(t + h) − N(t) only depends on
the length h.

(iii) If a < b < c < d, then the two “increments” N (d) − N (c) and
N (b) − N (a) are independent RVs.
ISYE 6739 — Goldsman 8/5/20 24 / 108
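A sketch of what these conditions buy you (Python with numpy assumed; λ, the number of subintervals, and the replication count are arbitrary illustrative values): chop [0, 1] into many tiny intervals of length h, put one independent Bern(λh) arrival indicator in each, and the total count N(1) already looks Pois(λ) — which is exactly where the next slides end up.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
lam, m = 4.0, 10_000                          # rate λ, and m tiny intervals of length h = 1/m
h = 1 / m

# Total arrivals in [0,1] under the Bernoulli-increment model: Bin(m, λh) ≈ Pois(λ).
N1 = rng.binomial(m, lam * h, size=50_000)

for k in range(4):
    print(k, (N1 == k).mean(), math.exp(-lam) * lam**k / math.factorial(k))
```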
Poisson Distribution

English translation of Poisson process assumptions.

(i) Arrivals basically occur one-at-a-time, and then at rate λ/unit time. (We
must make sure that λ doesn’t change over time.)

(ii) The arrival pattern is stationary — it doesn’t change over time.

(iii) The numbers of arrivals in two disjoint time intervals are independent.

Poisson Process Example: Neutrinos hit a detector. Occurrences are rare
enough so that they really do happen one-at-a-time. You never get arrivals of
groups of neutrinos. Further, the rate doesn’t vary over time, and all arrivals
are independent of each other.   □

Anti-Example: Customers arrive at a restaurant. They show up in groups,
not one-at-a-time. The rate varies over the day (more at dinnertime). Arrivals
may not be independent. This ain’t a Poisson process.   □

ISYE 6739 — Goldsman 8/5/20 25 / 108


Poisson Distribution

Definition: Let X be the number of occurrences in a Poisson(λ) process in a
unit interval of time. Then X has the Poisson distribution with parameter λ.

Notation: X ∼ Pois(λ).

Theorem/Definition: X ∼ Pois(λ) ⇒ P(X = k) = e^(−λ) λ^k / k!,   k = 0, 1, 2, . . ..

Proof: The proof follows from the PP assumptions and involves some simple
differential equations.
To begin with, let’s define P_x(t) ≡ P(N(t) = x), i.e., the probability of
exactly x arrivals by time t.
Note that the probability that there haven’t been any arrivals by time t + h can
be written in terms of the probability that there haven’t been any arrivals by
time t. . . .

ISYE 6739 — Goldsman 8/5/20 26 / 108


Poisson Distribution

P_0(t + h) = P(N(t + h) = 0)
           = P(no arrivals by time t and then no arrivals by time t + h)
           = P({N(t) = 0} ∩ {N(t + h) − N(t) = 0})
           = P(N(t) = 0) P(N(t + h) − N(t) = 0)   (by indep increments (iii))
           ≈ P(N(t) = 0)(1 − λh)   (by (i) and a little bit of (ii))
           = P_0(t)(1 − λh).

Thus,

[P_0(t + h) − P_0(t)] / h ≈ −λ P_0(t).

Taking the limit as h → 0, we have

P_0′(t) = −λ P_0(t).   (1)

ISYE 6739 — Goldsman 8/5/20 27 / 108


Poisson Distribution

Similarly, for x > 0, we have

P_x(t + h) = P(N(t + h) = x)
           = P(N(t + h) = x and no arrivals during [t, t + h])
             + P(N(t + h) = x and ≥ 1 arrival during [t, t + h])   (Law of Total Probability)
           ≈ P({N(t) = x} ∩ {N(t + h) − N(t) = 0})
             + P({N(t) = x − 1} ∩ {N(t + h) − N(t) = 1})
             (by (i), only consider case of one arrival in [t, t + h])
           = P(N(t) = x) P(N(t + h) − N(t) = 0)
             + P(N(t) = x − 1) P(N(t + h) − N(t) = 1)   (by independent increments (iii))
           ≈ P_x(t)(1 − λh) + P_{x−1}(t) λh.

ISYE 6739 — Goldsman 8/5/20 28 / 108


Poisson Distribution

Taking the limits as h → 0, we obtain

Px0 (t) = λ Px−1 (t) − Px (t) ,


 
x = 1, 2, . . . . (2)

The solution of differential equations (1) and (2) is easily shown to be

(λt)x e−λt
Px (t) = , x = 0, 1, 2, . . . .
x!
(If you don’t believe me, just plug in and see for yourself!)

Noting that t = 1 for the Pois(λ) finally completes the proof. 2

Remark: λ can be changed simply by changing the units of time.

Examples:
X = # calls to a switchboard in 1 minute ∼ Pois(3 / min)
Y = # calls to a switchboard in 5 minutes ∼ Pois(15 / 5 min)
Z = # calls to a switchboard in 10 sec ∼ Pois(0.5 / 10 sec)
ISYE 6739 — Goldsman 8/5/20 29 / 108
Poisson Distribution

Theorem: X ∼ Pois(λ) ⇒ the mgf is M_X(t) = e^{λ(e^t − 1)}.

Proof:

M_X(t) = E[e^{tX}] = Σ_{k=0}^∞ e^{tk} e^{−λ} λ^k / k!
       = e^{−λ} Σ_{k=0}^∞ (λe^t)^k / k!
       = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}. □

Theorem: X ∼ Pois(λ) ⇒ E[X] = Var(X) = λ.

Proof (using mgf):

E[X] = (d/dt) M_X(t) |_{t=0} = (d/dt) e^{λ(e^t − 1)} |_{t=0} = λe^t M_X(t) |_{t=0} = λ.

ISYE 6739 — Goldsman 8/5/20 30 / 108


Poisson Distribution

Similarly,

E[X²] = (d²/dt²) M_X(t) |_{t=0}
      = (d/dt) [(d/dt) M_X(t)] |_{t=0}
      = (d/dt) [λe^t M_X(t)] |_{t=0}
      = λ [e^t (d/dt) M_X(t) + e^t M_X(t)] |_{t=0}
      = [λe^t · λe^t M_X(t) + λe^t M_X(t)] |_{t=0}
      = λ(1 + λ).

Thus, Var(X) = E[X²] − (E[X])² = λ(1 + λ) − λ² = λ. □

ISYE 6739 — Goldsman 8/5/20 31 / 108
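
Computer sketch (assuming Python with SciPy available; any package with a Poisson pmf would do): a quick numerical check that the Pois(λ) mean and variance both equal λ and that the pmf sums to 1.

# Check E[X] = Var(X) = lambda for X ~ Pois(lambda), and that the pmf sums to 1.
from scipy.stats import poisson

lam = 3.0
mean, var = poisson.stats(lam, moments="mv")
print(mean, var)                                      # both 3.0
print(sum(poisson.pmf(k, lam) for k in range(100)))   # ~ 1.0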


Poisson Distribution

Example: Calls to a switchboard arrive as a Poisson process with rate 3 calls/min.

Let X = number of calls in 1 minute.

So X ∼ Pois(3), E[X] = Var(X) = 3, and P(X ≤ 4) = Σ_{k=0}^4 e^{−3} 3^k / k!.

Let Y = number of calls in 40 sec.

So Y ∼ Pois(2), E[Y] = Var(Y) = 2, and P(Y ≤ 4) = Σ_{k=0}^4 e^{−2} 2^k / k!. □

ISYE 6739 — Goldsman 8/5/20 32 / 108
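
Computer sketch (assuming SciPy): the two truncated sums in this example evaluate to roughly 0.815 and 0.947.

# P(X <= 4) for X ~ Pois(3), and P(Y <= 4) for Y ~ Pois(2)
from scipy.stats import poisson

print(poisson.cdf(4, 3))   # ~ 0.815
print(poisson.cdf(4, 2))   # ~ 0.947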


Poisson Distribution

Theorem (Additive Property of Poissons): Suppose X_1, . . . , X_n are
independent with X_i ∼ Pois(λ_i), i = 1, . . . , n. Then

Y ≡ Σ_{i=1}^n X_i ∼ Pois(Σ_{i=1}^n λ_i).

Proof: Since the X_i’s are independent, we have

M_Y(t) = Π_{i=1}^n M_{X_i}(t) = Π_{i=1}^n e^{λ_i(e^t − 1)} = e^{(Σ_{i=1}^n λ_i)(e^t − 1)},

which is the mgf of the Pois(Σ_{i=1}^n λ_i) distribution. □

ISYE 6739 — Goldsman 8/5/20 33 / 108


Poisson Distribution

Example: Cars driven by males [females] arrive at a parking lot according to
a Poisson process with a rate of λ_1 = 3 / hr [λ_2 = 5 / hr]. All arrivals are
independent.

What’s the probability of exactly 2 arrivals in the next 30 minutes?

The total number of arrivals is Pois(λ_1 + λ_2 = 8 / hr), and so the total in the
next 30 minutes is X ∼ Pois(4). So P(X = 2) = e^{−4} 4² / 2!. □

ISYE 6739 — Goldsman 8/5/20 34 / 108
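
Computer sketch (assuming NumPy and SciPy): the exact answer e^{−4} 4²/2! ≈ 0.147, plus a simulation of the two merged arrival streams as a cross-check of the additive property.

# X ~ Pois(4) arrivals in 30 minutes; P(X = 2) = e^{-4} 4^2 / 2!
from scipy.stats import poisson
import numpy as np

print(poisson.pmf(2, 4))                        # ~ 0.1465

rng = np.random.default_rng(0)
males   = rng.poisson(3 * 0.5, size=100_000)    # rate 3/hr over 0.5 hr
females = rng.poisson(5 * 0.5, size=100_000)    # rate 5/hr over 0.5 hr
print(np.mean(males + females == 2))            # ~ 0.1465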


Uniform, Exponential, and Friends

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 35 / 108


Uniform, Exponential, and Friends

Lesson 4.5 — Uniform, Exponential, and Friends

Definition: The RV X has the Uniform distribution if it has pdf and cdf

f(x) = 1/(b − a)  for a ≤ x ≤ b   (and 0 otherwise)

and

F(x) = 0 for x < a,   (x − a)/(b − a) for a ≤ x ≤ b,   1 for x ≥ b.

Notation: X ∼ Unif(a, b).

Previous work showed that

E[X] = (a + b)/2  and  Var(X) = (a − b)²/12.

We can also derive the mgf,

M_X(t) = E[e^{tX}] = ∫_a^b e^{tx}/(b − a) dx = (e^{tb} − e^{ta}) / (t(b − a)),   t ≠ 0.

ISYE 6739 — Goldsman 8/5/20 36 / 108
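
Computer sketch (assuming NumPy): a simulation check of the Unif(a, b) mean and variance formulas.

# For X ~ Unif(a, b): E[X] = (a + b)/2, Var(X) = (b - a)^2 / 12
import numpy as np

a, b = 2.0, 5.0
rng = np.random.default_rng(1)
x = rng.uniform(a, b, size=1_000_000)
print(x.mean(), (a + b) / 2)          # ~ 3.5 vs 3.5
print(x.var(),  (b - a) ** 2 / 12)    # ~ 0.75 vs 0.75
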
Uniform, Exponential, and Friends

Definition: The Exponential(λ) distribution has pdf

f(x) = λe^{−λx} for x > 0   (and 0 otherwise).

Previous work showed that the cdf F(x) = 1 − e^{−λx},
E[X] = 1/λ, and Var(X) = 1/λ².

We also derived the mgf,

M_X(t) = E[e^{tX}] = ∫_0^∞ e^{tx} f(x) dx = λ/(λ − t),   t < λ.

ISYE 6739 — Goldsman 8/5/20 37 / 108


Uniform, Exponential, and Friends

Memoryless Property of Exponential

Theorem: Suppose that X ∼ Exp(λ). Then for positive s, t, we have

P(X > s + t | X > s) = P(X > t).

Similar to the discrete Geometric distribution, the probability that X will
survive an additional t time units is the (unconditional) probability that it will
survive at least t — it forgot that it made it past time s! It’s always “like new”!

Proof:

P(X > s + t | X > s) = P(X > s + t ∩ X > s) / P(X > s)
                     = P(X > s + t) / P(X > s)
                     = e^{−λ(s+t)} / e^{−λs}
                     = e^{−λt} = P(X > t). □

ISYE 6739 — Goldsman 8/5/20 38 / 108


Uniform, Exponential, and Friends

Example: Suppose that the life of a lightbulb is exponential with a mean of
10 months. If the light survives 20 months, what’s the probability that it’ll
survive another 10?

P(X > 30 | X > 20) = P(X > 10) = e^{−λ(10)} = e^{−(1/10)(10)} = e^{−1}. □

Example: If the time to the next bus is exponentially distributed with a mean
of 10 minutes, and you’ve already been waiting 20 minutes, you can expect to
wait 10 more. □

Remark: The exponential is the only continuous distribution with the
Memoryless Property.

Remark: Look at E[X] and Var(X) for the Geometric distribution and see
how they’re similar to those for the exponential. (Not a coincidence.)

ISYE 6739 — Goldsman 8/5/20 39 / 108
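
Computer sketch (assuming NumPy) of the lightbulb example: among simulated exponential lifetimes that make it past 20 months, the fraction surviving another 10 matches the unconditional P(X > 10) = e^{−1} ≈ 0.368.

# Memoryless check for X ~ Exp(1/10), i.e., mean 10 months
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=10, size=1_000_000)
survivors = x[x > 20]
print(np.mean(survivors > 30))   # P(X > 30 | X > 20) ~ 0.368
print(np.mean(x > 10))           # P(X > 10) ~ 0.368 as well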


Uniform, Exponential, and Friends

Definition: If X is a cts RV with pdf f(x) and cdf F(x), then its failure
rate function is

S(t) ≡ f(t) / P(X > t) = f(t) / (1 − F(t)),

which can loosely be regarded as X’s instantaneous rate of death, given that it
has so far survived to time t.

Example: If X ∼ Exp(λ), then S(t) = λe^{−λt} / e^{−λt} = λ. So if X is the
exponential lifetime of a lightbulb, then its instantaneous burn-out rate is
always λ — always good as new! This is clearly a result of the Memoryless
Property. □

ISYE 6739 — Goldsman 8/5/20 40 / 108


Uniform, Exponential, and Friends

The Exponential is also related to the Poisson!

Theorem: Let X be the amount of time until the first arrival in a Poisson
process with rate λ. Then X ∼ Exp(λ).

Proof:

F(x) = P(X ≤ x) = 1 − P(no arrivals in [0, x])
     = 1 − e^{−λx}(λx)^0 / 0!   (since # arrivals in [0, x] is Pois(λx))
     = 1 − e^{−λx}. □

Theorem: Amazingly, it can be shown (after a lot of work) that the
interarrival times of a PP are all iid Exp(λ)! See for yourself when you take a
stochastic processes course.

ISYE 6739 — Goldsman 8/5/20 41 / 108


Uniform, Exponential, and Friends

Example: Suppose that arrivals to a shopping center are from a PP with rate
λ = 20/hr. What’s the probability that the time between the 13th and 14th
customers will be at least 4 minutes?

Let the time between customers 13 and 14 be X. Since we have a PP, the
interarrivals are iid Exp(λ = 20/hr), so

P(X > 4 min) = P(X > 1/15 hr) = e^{−λt} = e^{−20/15}. □

ISYE 6739 — Goldsman 8/5/20 42 / 108
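
Computer sketch (assuming NumPy): the exact value e^{−20/15} ≈ 0.264, plus a check by simulating exponential interarrival gaps.

# P(X > 4 min) for X ~ Exp(20/hr): 4 min = 1/15 hr
import math
import numpy as np

print(math.exp(-20 / 15))                              # ~ 0.2636

rng = np.random.default_rng(3)
gaps = rng.exponential(scale=1 / 20, size=1_000_000)   # gaps in hours
print(np.mean(gaps > 1 / 15))                          # ~ 0.2636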


Uniform, Exponential, and Friends

Definition/Theorem: Suppose X_1, . . . , X_k are iid Exp(λ), and let
S = Σ_{i=1}^k X_i. Then S has the Erlang distribution with parameters k and
λ, denoted S ∼ Erlang_k(λ).

The Erlang is simply the sum of iid exponentials.

Special Case: Erlang_1(λ) ∼ Exp(λ).

The pdf and cdf of the Erlang are

f(s) = λ^k e^{−λs} s^{k−1} / (k − 1)!,  s ≥ 0,  and  F(s) = 1 − Σ_{i=0}^{k−1} e^{−λs} (λs)^i / i!.

Notice that the cdf is the sum of a bunch of Poisson probabilities. (Won’t do it
here, but this observation helps in the derivation of the cdf.)

ISYE 6739 — Goldsman 8/5/20 43 / 108


Uniform, Exponential, and Friends

Expected value, variance, and mgf:

E[S] = E[Σ_{i=1}^k X_i] = Σ_{i=1}^k E[X_i] = k/λ

Var(S) = k/λ²

M_S(t) = (λ/(λ − t))^k.

Example: Suppose X and Y are iid Exp(2). Find P(X + Y < 1).

P(X + Y < 1) = 1 − Σ_{i=0}^{k−1} e^{−λs} (λs)^i / i!
             = 1 − Σ_{i=0}^{2−1} e^{−(2·1)} (2·1)^i / i! = 0.594. □

ISYE 6739 — Goldsman 8/5/20 44 / 108
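
Computer sketch (assuming SciPy): Erlang_k(λ) is a Gamma with integer shape k and scale 1/λ, so the 0.594 above can be checked two ways.

# P(X + Y < 1), X, Y iid Exp(2), so S = X + Y ~ Erlang_2(2)
import math
from scipy.stats import gamma

k, lam, s = 2, 2.0, 1.0
print(gamma.cdf(s, a=k, scale=1 / lam))   # ~ 0.594

# Hand formula: 1 - sum_{i=0}^{k-1} e^{-lam s} (lam s)^i / i!
print(1 - sum(math.exp(-lam * s) * (lam * s) ** i / math.factorial(i)
              for i in range(k)))         # ~ 0.594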


Uniform, Exponential, and Friends

Definition: X has the Gamma distribution with parameters α > 0 and
λ > 0 if it has pdf

f(x) = λ^α e^{−λx} x^{α−1} / Γ(α),  x ≥ 0,

where

Γ(α) ≡ ∫_0^∞ t^{α−1} e^{−t} dt

is the gamma function.

Remark: The Gamma distribution generalizes the Erlang distribution (where
α has to be a positive integer). It has the same expected value and variance as
the Erlang, with α in place of k.

Remark: If α is a positive integer, then Γ(α) = (α − 1)!. Party trick:
Γ(1/2) = √π.

ISYE 6739 — Goldsman 8/5/20 45 / 108
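
Computer sketch (Python's math module): the two remarks about the gamma function are easy to check.

import math

print(math.gamma(5), math.factorial(4))      # 24.0 and 24, since Gamma(5) = 4!
print(math.gamma(0.5), math.sqrt(math.pi))   # both ~ 1.77245, i.e., Gamma(1/2) = sqrt(pi)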


Other Continuous Distributions

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 46 / 108


Other Continuous Distributions

Lesson 4.6 — Other Continuous Distributions

Triangular(a, b, c) Distribution — good for modeling RVs on the basis of
limited data (minimum, mode, maximum).

f(x) = 2(x − a) / ((b − a)(c − a))  for a < x ≤ b,
       2(c − x) / ((c − b)(c − a))  for b < x < c,
       0                            otherwise.

E[X] = (a + b + c)/3  and  Var(X) = mess

ISYE 6739 — Goldsman 8/5/20 47 / 108


Other Continuous Distributions

Beta(a, b) Distribution — good for modeling RVs that are restricted to an
interval.

f(x) = [Γ(a + b) / (Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1},  0 < x < 1.

E[X] = a/(a + b)  and  Var(X) = ab / ((a + b)²(a + b + 1)).

This distribution gets its name from the beta function, which is defined as

β(a, b) ≡ Γ(a)Γ(b) / Γ(a + b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx.

ISYE 6739 — Goldsman 8/5/20 48 / 108


Other Continuous Distributions

The Beta distribution is very flexible. Here are a few family portraits.

Remark: Certain versions of the Uniform (a = b = 1) and Triangular
distributions are special cases of the Beta.

ISYE 6739 — Goldsman 8/5/20 49 / 108
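
The family portraits themselves don't survive in this text version; here is a plotting sketch (assuming SciPy and Matplotlib) that draws a few representative Beta(a, b) pdfs.

# A few Beta(a, b) pdfs on (0, 1)
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

x = np.linspace(0.001, 0.999, 400)
for a, b in [(1, 1), (0.5, 0.5), (2, 5), (5, 2), (3, 3)]:
    plt.plot(x, beta.pdf(x, a, b), label=f"Beta({a}, {b})")
plt.legend()
plt.title("Beta(a, b) family portraits")
plt.show()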


Other Continuous Distributions

Weibull(a, b) Distribution — good for reliability modeling. a is
the “scale” parameter, and b is the “shape” parameter.

f(x) = ab(ax)^{b−1} e^{−(ax)^b},  x > 0.

F(x) = 1 − exp[−(ax)^b],  x > 0.

E[X] = (1/a)Γ(1 + (1/b))  and  Var(X) = slight mess

Remark: The Exponential is a special case of the Weibull.

Example: Time-to-failure T for a transmitter has a Weibull distribution with
parameters a = 1/(200 hrs) and b = 1/3. Then

E[T] = 200 Γ(1 + 3) = 1200 hrs.

The probability that it fails before 2000 hrs is

F(2000) = 1 − exp[−(2000/200)^{1/3}] = 0.884. □

ISYE 6739 — Goldsman 8/5/20 50 / 108
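
Computer sketch (Python's math module) reproducing the two numbers in the transmitter example.

# Weibull with a = 1/200 (per hr) and b = 1/3
import math

a, b = 1 / 200, 1 / 3
print((1 / a) * math.gamma(1 + 1 / b))    # E[T] = 200 * Gamma(4) = 1200 hrs
print(1 - math.exp(-(a * 2000) ** b))     # F(2000) ~ 0.884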


Other Continuous Distributions

Cauchy Distribution — A “fat-tailed” distribution good for disproving
things!

f(x) = 1 / (π(1 + x²))  and  F(x) = 1/2 + arctan(x)/π,  x ∈ R.

Theorem: The Cauchy distribution has an undefined mean and infinite
variance!

Weird Fact: X_1, . . . , X_n iid Cauchy ⇒ Σ_{i=1}^n X_i / n ∼ Cauchy. Even if you
take the average of a bunch of Cauchys, you’re right back where you started!

ISYE 6739 — Goldsman 8/5/20 51 / 108
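
Computer sketch (assuming NumPy) of the weird fact: running sample means of Cauchy data keep jumping around instead of settling down.

# Sample means of iid Cauchy draws are still Cauchy (no convergence)
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_cauchy(size=1_000_000)
for n in (100, 10_000, 1_000_000):
    print(n, x[:n].mean())   # no stabilization as n grows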


Other Continuous Distributions

Alphabet Soup of Other Distributions

χ² distribution — coming up when we talk Statistics

t distribution — coming up

F distribution — coming up

Pareto, Laplace, Rayleigh, Gumbel, Johnson distributions

Etc. . . .

ISYE 6739 — Goldsman 8/5/20 52 / 108


Normal Distribution: Basics

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 53 / 108


Normal Distribution: Basics

Lesson 4.7 — Normal Distribution: Basics

The Normal Distribution is so important that we’re giving it an entire section.

Definition: X ∼ Nor(µ, σ²) if it has pdf

f(x) = (1/√(2πσ²)) exp[−(x − µ)² / (2σ²)],  ∀x ∈ R.

Remark: The Normal distribution is also called the Gaussian distribution.

Examples: Heights, weights, SAT scores, crop yields, and averages of
things tend to be normal.

ISYE 6739 — Goldsman 8/5/20 54 / 108


Normal Distribution: Basics

The pdf f(x) is “bell-shaped” and symmetric around x = µ, with tails falling
off quickly as you move away from µ.

Small σ² corresponds to a “tall, skinny” bell curve; large σ² gives a “short,
fat” bell curve.

ISYE 6739 — Goldsman 8/5/20 55 / 108


Normal Distribution: Basics

Fun Fact (1): ∫_R f(x) dx = 1.

Proof: Transform to polar coordinates. Good luck. □

Fun Fact (2): The cdf is

F(x) = ∫_{−∞}^x (1/√(2πσ²)) exp[−(t − µ)² / (2σ²)] dt = ??

Remark: No closed-form solution for this. Stay tuned.

Fun Facts (3) and (4): E[X] = µ and Var(X) = σ².

Proof: Integration by parts or mgf (below).

ISYE 6739 — Goldsman 8/5/20 56 / 108


Normal Distribution: Basics

Fun Fact (5): If X is any Normal RV, then

P(µ − σ < X < µ + σ) = 0.6827
P(µ − 2σ < X < µ + 2σ) = 0.9545
P(µ − 3σ < X < µ + 3σ) = 0.9973.

So almost all of the probability is contained within 3 standard deviations of
the mean. (This is sort of what Toyota is referring to when it brags about
“six-sigma” quality.)

Fun Fact (6): The mgf is M_X(t) = exp(µt + ½σ²t²).

Proof: Calculus (or look it up in a table of integrals).

ISYE 6739 — Goldsman 8/5/20 57 / 108


Normal Distribution: Basics

Theorem (Additive Property of Normals): If X_1, . . . , X_n are independent
with X_i ∼ Nor(µ_i, σ_i²), i = 1, . . . , n, then

Y ≡ Σ_{i=1}^n a_i X_i + b ∼ Nor(Σ_{i=1}^n a_i µ_i + b, Σ_{i=1}^n a_i² σ_i²).

So a linear combination of independent normals is itself normal.

ISYE 6739 — Goldsman 8/5/20 58 / 108


Normal Distribution: Basics

Proof: Since Y is a linear function,

M_Y(t) = M_{Σ_i a_i X_i + b}(t) = e^{tb} M_{Σ_i a_i X_i}(t)
       = e^{tb} Π_{i=1}^n M_{a_i X_i}(t)   (X_i’s independent)
       = e^{tb} Π_{i=1}^n M_{X_i}(a_i t)   (mgf of linear function)
       = e^{tb} Π_{i=1}^n exp[µ_i(a_i t) + ½σ_i²(a_i t)²]   (normal mgf)
       = exp[(Σ_{i=1}^n µ_i a_i + b) t + ½ (Σ_{i=1}^n a_i² σ_i²) t²],

and we are done by mgf uniqueness. □

ISYE 6739 — Goldsman 8/5/20 59 / 108


Normal Distribution: Basics

Remark: A normal distribution is completely characterized by its mean and
variance.

By the above, we know that a linear combination of independent normals is
still normal. Therefore, when we add up independent normals, all we have to
do is figure out the mean and variance — the normality of the sum comes for
free.

Example: X ∼ Nor(3, 4), Y ∼ Nor(4, 6), and X, Y are independent. Find
the distribution of 2X − 3Y.

Solution: This is normal with

E[2X − 3Y] = 2E[X] − 3E[Y] = 2(3) − 3(4) = −6

and

Var(2X − 3Y) = 4Var(X) + 9Var(Y) = 70.

Thus, 2X − 3Y ∼ Nor(−6, 70). □

ISYE 6739 — Goldsman 8/5/20 60 / 108
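
Computer sketch (assuming NumPy): a simulation check that 2X − 3Y has mean −6 and variance 70. (NumPy's normal() takes the standard deviation, not the variance.)

# X ~ Nor(3, 4), Y ~ Nor(4, 6) independent; W = 2X - 3Y ~ Nor(-6, 70)
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(3, np.sqrt(4), size=1_000_000)
y = rng.normal(4, np.sqrt(6), size=1_000_000)
w = 2 * x - 3 * y
print(w.mean(), w.var())   # ~ -6 and ~ 70
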
Normal Distribution: Basics

Corollary (of Additive Property Theorem):

X ∼ Nor(µ, σ²) ⇒ aX + b ∼ Nor(aµ + b, a²σ²).

Proof: Immediate from the Additive Property after noting that
E[aX + b] = aµ + b and Var(aX + b) = a²σ². □

Corollary (of Corollary):

X ∼ Nor(µ, σ²) ⇒ Z ≡ (X − µ)/σ ∼ Nor(0, 1).

Proof: Use above Corollary with a = 1/σ and b = −µ/σ. □

The manipulation described in this corollary is referred to as
standardization.

ISYE 6739 — Goldsman 8/5/20 61 / 108
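
Computer sketch (assuming SciPy): standardization in action. Any Nor(µ, σ²) probability reduces to a Φ evaluation, e.g., P(X ≤ 5) for X ∼ Nor(3, 4).

# P(X <= 5) for X ~ Nor(3, 4): standardize to Z = (X - mu)/sigma
from scipy.stats import norm

mu, sigma = 3.0, 2.0                       # sigma^2 = 4
z = (5.0 - mu) / sigma
print(norm.cdf(z))                         # Phi(1) ~ 0.8413
print(norm.cdf(5.0, loc=mu, scale=sigma))  # same number without standardizing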


Standard Normal Distribution

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 62 / 108


Standard Normal Distribution

Lesson 4.8 — Standard Normal Distribution

ISYE 6739 — Goldsman 8/5/20 63 / 108


Standard Normal Distribution

Lesson 4.8 — Standard Normal Distribution

Definition: The Nor(0, 1) is called the standard normal distribution,


and is often denoted by Z.

ISYE 6739 — Goldsman 8/5/20 63 / 108


Standard Normal Distribution

Lesson 4.8 — Standard Normal Distribution

Definition: The Nor(0, 1) is called the standard normal distribution,


and is often denoted by Z.

The Nor(0, 1) is nice because there are tables available for its cdf.

ISYE 6739 — Goldsman 8/5/20 63 / 108


Standard Normal Distribution

Lesson 4.8 — Standard Normal Distribution

Definition: The Nor(0, 1) is called the standard normal distribution,


and is often denoted by Z.

The Nor(0, 1) is nice because there are tables available for its cdf.

You can standardize any normal RV X into a standard normal by applying the
transformation Z = (X − µ)/σ. Then you can use the cdf tables.

ISYE 6739 — Goldsman 8/5/20 63 / 108


Standard Normal Distribution

Lesson 4.8 — Standard Normal Distribution

Definition: The Nor(0, 1) is called the standard normal distribution,


and is often denoted by Z.

The Nor(0, 1) is nice because there are tables available for its cdf.

You can standardize any normal RV X into a standard normal by applying the
transformation Z = (X − µ)/σ. Then you can use the cdf tables.

The pdf of the Nor(0, 1) is


1 2
φ(z) ≡ √ e−z /2 , z ∈ R.

ISYE 6739 — Goldsman 8/5/20 63 / 108


Standard Normal Distribution

Lesson 4.8 — Standard Normal Distribution

Definition: The Nor(0, 1) is called the standard normal distribution,


and is often denoted by Z.

The Nor(0, 1) is nice because there are tables available for its cdf.

You can standardize any normal RV X into a standard normal by applying the
transformation Z = (X − µ)/σ. Then you can use the cdf tables.

The pdf of the Nor(0, 1) is


φ(z) ≡ (1/√(2π)) e^(−z²/2) , z ∈ R.

The cdf is
Φ(z) ≡ ∫_{−∞}^{z} φ(t) dt, z ∈ R.

ISYE 6739 — Goldsman 8/5/20 63 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)
P (Z ≥ b) = 1 − Φ(b)

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)
P (Z ≥ b) = 1 − Φ(b)
P (a ≤ Z ≤ b) = Φ(b) − Φ(a)

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)
P (Z ≥ b) = 1 − Φ(b)
P (a ≤ Z ≤ b) = Φ(b) − Φ(a)

Φ(0) = 1/2

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)
P (Z ≥ b) = 1 − Φ(b)
P (a ≤ Z ≤ b) = Φ(b) − Φ(a)

Φ(0) = 1/2
Φ(−b) = P (Z ≤ −b) = P (Z ≥ b) = 1 − Φ(b)

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)
P (Z ≥ b) = 1 − Φ(b)
P (a ≤ Z ≤ b) = Φ(b) − Φ(a)

Φ(0) = 1/2
Φ(−b) = P (Z ≤ −b) = P (Z ≥ b) = 1 − Φ(b)
P (−b ≤ Z ≤ b) = Φ(b) − Φ(−b) = 2Φ(b) − 1

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)
P (Z ≥ b) = 1 − Φ(b)
P (a ≤ Z ≤ b) = Φ(b) − Φ(a)

Φ(0) = 1/2
Φ(−b) = P (Z ≤ −b) = P (Z ≥ b) = 1 − Φ(b)
P (−b ≤ Z ≤ b) = Φ(b) − Φ(−b) = 2Φ(b) − 1

Then

P (µ − kσ ≤ X ≤ µ + kσ)

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)
P (Z ≥ b) = 1 − Φ(b)
P (a ≤ Z ≤ b) = Φ(b) − Φ(a)

Φ(0) = 1/2
Φ(−b) = P (Z ≤ −b) = P (Z ≥ b) = 1 − Φ(b)
P (−b ≤ Z ≤ b) = Φ(b) − Φ(−b) = 2Φ(b) − 1

Then

P (µ − kσ ≤ X ≤ µ + kσ) = P (−k ≤ Z ≤ k)

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)
P (Z ≥ b) = 1 − Φ(b)
P (a ≤ Z ≤ b) = Φ(b) − Φ(a)

Φ(0) = 1/2
Φ(−b) = P (Z ≤ −b) = P (Z ≥ b) = 1 − Φ(b)
P (−b ≤ Z ≤ b) = Φ(b) − Φ(−b) = 2Φ(b) − 1

Then

P (µ − kσ ≤ X ≤ µ + kσ) = P (−k ≤ Z ≤ k) = 2Φ(k) − 1.

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Remarks: The following results are easy to derive, usually via symmetry
arguments.

P (Z ≤ a) = Φ(a)
P (Z ≥ b) = 1 − Φ(b)
P (a ≤ Z ≤ b) = Φ(b) − Φ(a)

Φ(0) = 1/2
Φ(−b) = P (Z ≤ −b) = P (Z ≥ b) = 1 − Φ(b)
P (−b ≤ Z ≤ b) = Φ(b) − Φ(−b) = 2Φ(b) − 1

Then

P (µ − kσ ≤ X ≤ µ + kσ) = P (−k ≤ Z ≤ k) = 2Φ(k) − 1.

So the probability that any normal RV is within k standard deviations of its


mean doesn’t depend on the mean or variance.

ISYE 6739 — Goldsman 8/5/20 64 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these.

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

z Φ(z) = P (Z ≤ z)

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

z Φ(z) = P (Z ≤ z)
0.00 0.5000

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

z Φ(z) = P (Z ≤ z)
0.00 0.5000
1.00 0.8413

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

z Φ(z) = P (Z ≤ z)
0.00 0.5000
1.00 0.8413
1.28 0.8997 ≈ 0.90

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

z Φ(z) = P (Z ≤ z)
0.00 0.5000
1.00 0.8413
1.28 0.8997 ≈ 0.90
1.645 0.9500

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

z Φ(z) = P (Z ≤ z)
0.00 0.5000
1.00 0.8413
1.28 0.8997 ≈ 0.90
1.645 0.9500
1.96 0.9750

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

z Φ(z) = P (Z ≤ z)
0.00 0.5000
1.00 0.8413
1.28 0.8997 ≈ 0.90
1.645 0.9500
1.96 0.9750
2.33 0.9901 ≈ 0.99

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

z Φ(z) = P (Z ≤ z)
0.00 0.5000
1.00 0.8413
1.28 0.8997 ≈ 0.90
1.645 0.9500
1.96 0.9750
2.33 0.9901 ≈ 0.99
3.00 0.9987

ISYE 6739 — Goldsman 8/5/20 65 / 108


Standard Normal Distribution

Famous Nor(0, 1) table values. You can memorize these. Or you can use
software calls, like NORMDIST in Excel (which calculates the cdf for any
normal distribution.)

z Φ(z) = P (Z ≤ z)
0.00 0.5000
1.00 0.8413
1.28 0.8997 ≈ 0.90
1.645 0.9500
1.96 0.9750
2.33 0.9901 ≈ 0.99
3.00 0.9987
4.00 ≈ 1.0000

ISYE 6739 — Goldsman 8/5/20 65 / 108
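If you'd rather not memorize the table, a couple of lines of Python reproduce these values (this is just an illustrative sketch using SciPy, playing the role of Excel's NORMDIST; it is not part of the course software):

from scipy.stats import norm

for z in [0.0, 1.0, 1.28, 1.645, 1.96, 2.33, 3.0, 4.0]:
    # norm.cdf(z) returns Phi(z) = P(Z <= z) for the standard normal
    print(z, round(norm.cdf(z), 4))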


Standard Normal Distribution

By the earlier “Fun Facts” and then the discussion on the last two pages, the
probability that any normal RV is within k standard deviations of its mean is

ISYE 6739 — Goldsman 8/5/20 66 / 108


Standard Normal Distribution

By the earlier “Fun Facts” and then the discussion on the last two pages, the
probability that any normal RV is within k standard deviations of its mean is

P (µ − kσ ≤ X ≤ µ + kσ) = 2Φ(k) − 1.

ISYE 6739 — Goldsman 8/5/20 66 / 108


Standard Normal Distribution

By the earlier “Fun Facts” and then the discussion on the last two pages, the
probability that any normal RV is within k standard deviations of its mean is

P (µ − kσ ≤ X ≤ µ + kσ) = 2Φ(k) − 1.

For k = 1, this probability is 2(0.8413) − 1 = 0.6827.

ISYE 6739 — Goldsman 8/5/20 66 / 108


Standard Normal Distribution

By the earlier “Fun Facts” and then the discussion on the last two pages, the
probability that any normal RV is within k standard deviations of its mean is

P (µ − kσ ≤ X ≤ µ + kσ) = 2Φ(k) − 1.

For k = 1, this probability is 2(0.8413) − 1 = 0.6827.

There is a 95% chance that a normal observation will be within 2 s.d.’s of its
mean.

ISYE 6739 — Goldsman 8/5/20 66 / 108


Standard Normal Distribution

By the earlier “Fun Facts” and then the discussion on the last two pages, the
probability that any normal RV is within k standard deviations of its mean is

P (µ − kσ ≤ X ≤ µ + kσ) = 2Φ(k) − 1.

For k = 1, this probability is 2(0.8413) − 1 = 0.6827.

There is a 95% chance that a normal observation will be within 2 s.d.’s of its
mean.

99.7% of all observations are within 3 standard deviations of the mean!

ISYE 6739 — Goldsman 8/5/20 66 / 108


Standard Normal Distribution

By the earlier “Fun Facts” and then the discussion on the last two pages, the
probability that any normal RV is within k standard deviations of its mean is

P (µ − kσ ≤ X ≤ µ + kσ) = 2Φ(k) − 1.

For k = 1, this probability is 2(0.8413) − 1 = 0.6827.

There is a 95% chance that a normal observation will be within 2 s.d.’s of its
mean.

99.7% of all observations are within 3 standard deviations of the mean!


Finally, note that the famous “six-sigma” quality standard corresponds to 2Φ(6) − 1 ≈ 1.0000.

ISYE 6739 — Goldsman 8/5/20 66 / 108
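As a quick numerical check of the “within k standard deviations” numbers above, here's a short Python sketch (SciPy assumed available):

from scipy.stats import norm

for k in range(1, 7):
    # P(mu - k*sigma <= X <= mu + k*sigma) = 2*Phi(k) - 1 for any normal RV
    print(k, round(2 * norm.cdf(k) - 1, 6))
# k = 1, 2, 3 give roughly 0.6827, 0.9545, 0.9973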


Standard Normal Distribution

Famous Inverse Nor(0, 1) table values. Can also use software, such as
Excel’s NORMINV function, which actually calculates inverses for any normal
distribution, not just standard normal.

ISYE 6739 — Goldsman 8/5/20 67 / 108


Standard Normal Distribution

Famous Inverse Nor(0, 1) table values. Can also use software, such as
Excel’s NORMINV function, which actually calculates inverses for any normal
distribution, not just standard normal.

Φ−1 (p) is the value of z such that Φ(z) = p. Φ−1 (p) is called the pth
quantile of Z.

ISYE 6739 — Goldsman 8/5/20 67 / 108


Standard Normal Distribution

Famous Inverse Nor(0, 1) table values. Can also use software, such as
Excel’s NORMINV function, which actually calculates inverses for any normal
distribution, not just standard normal.

Φ−1 (p) is the value of z such that Φ(z) = p. Φ−1 (p) is called the pth
quantile of Z.

p Φ−1 (p)

ISYE 6739 — Goldsman 8/5/20 67 / 108


Standard Normal Distribution

Famous Inverse Nor(0, 1) table values. Can also use software, such as
Excel’s NORMINV function, which actually calculates inverses for any normal
distribution, not just standard normal.

Φ−1 (p) is the value of z such that Φ(z) = p. Φ−1 (p) is called the pth
quantile of Z.

p Φ−1 (p)
0.90 1.28

ISYE 6739 — Goldsman 8/5/20 67 / 108


Standard Normal Distribution

Famous Inverse Nor(0, 1) table values. Can also use software, such as
Excel’s NORMINV function, which actually calculates inverses for any normal
distribution, not just standard normal.

Φ−1 (p) is the value of z such that Φ(z) = p. Φ−1 (p) is called the pth
quantile of Z.

p Φ−1 (p)
0.90 1.28
0.95 1.645

ISYE 6739 — Goldsman 8/5/20 67 / 108


Standard Normal Distribution

Famous Inverse Nor(0, 1) table values. Can also use software, such as
Excel’s NORMINV function, which actually calculates inverses for any normal
distribution, not just standard normal.

Φ−1 (p) is the value of z such that Φ(z) = p. Φ−1 (p) is called the pth
quantile of Z.

p Φ−1 (p)
0.90 1.28
0.95 1.645
0.975 1.96

ISYE 6739 — Goldsman 8/5/20 67 / 108


Standard Normal Distribution

Famous Inverse Nor(0, 1) table values. Can also use software, such as
Excel’s NORMINV function, which actually calculates inverses for any normal
distribution, not just standard normal.

Φ−1 (p) is the value of z such that Φ(z) = p. Φ−1 (p) is called the pth
quantile of Z.

p Φ−1 (p)
0.90 1.28
0.95 1.645
0.975 1.96
0.99 2.33

ISYE 6739 — Goldsman 8/5/20 67 / 108


Standard Normal Distribution

Famous Inverse Nor(0, 1) table values. Can also use software, such as
Excel’s NORMINV function, which actually calculates inverses for any normal
distribution, not just standard normal.

Φ−1 (p) is the value of z such that Φ(z) = p. Φ−1 (p) is called the pth
quantile of Z.

p Φ−1 (p)
0.90 1.28
0.95 1.645
0.975 1.96
0.99 2.33
0.995 2.58

ISYE 6739 — Goldsman 8/5/20 67 / 108
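The same quantiles can be pulled from Python's SciPy, where norm.ppf plays the role of Excel's NORMINV for the standard normal; a minimal sketch:

from scipy.stats import norm

for p in [0.90, 0.95, 0.975, 0.99, 0.995]:
    # norm.ppf(p) returns the pth quantile, i.e., Phi^{-1}(p)
    print(p, round(norm.ppf(p), 3))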


Standard Normal Distribution

Example: X ∼ Nor(21, 4). Find P (19 < X < 22.5). Standardizing, we


get

ISYE 6739 — Goldsman 8/5/20 68 / 108


Standard Normal Distribution

Example: X ∼ Nor(21, 4). Find P (19 < X < 22.5). Standardizing, we


get

P (19 < X < 22.5)


 
= P((19 − µ)/σ < (X − µ)/σ < (22.5 − µ)/σ)

ISYE 6739 — Goldsman 8/5/20 68 / 108


Standard Normal Distribution

Example: X ∼ Nor(21, 4). Find P (19 < X < 22.5). Standardizing, we


get

P (19 < X < 22.5)


 
= P((19 − µ)/σ < (X − µ)/σ < (22.5 − µ)/σ)
= P((19 − 21)/2 < Z < (22.5 − 21)/2)

ISYE 6739 — Goldsman 8/5/20 68 / 108


Standard Normal Distribution

Example: X ∼ Nor(21, 4). Find P (19 < X < 22.5). Standardizing, we


get

P (19 < X < 22.5)


 
= P((19 − µ)/σ < (X − µ)/σ < (22.5 − µ)/σ)
= P((19 − 21)/2 < Z < (22.5 − 21)/2)
= P (−1 < Z < 0.75)

ISYE 6739 — Goldsman 8/5/20 68 / 108


Standard Normal Distribution

Example: X ∼ Nor(21, 4). Find P (19 < X < 22.5). Standardizing, we


get

P (19 < X < 22.5)


 
= P((19 − µ)/σ < (X − µ)/σ < (22.5 − µ)/σ)
= P((19 − 21)/2 < Z < (22.5 − 21)/2)
= P (−1 < Z < 0.75)
= Φ(0.75) − Φ(−1)

ISYE 6739 — Goldsman 8/5/20 68 / 108


Standard Normal Distribution

Example: X ∼ Nor(21, 4). Find P (19 < X < 22.5). Standardizing, we


get

P (19 < X < 22.5)


 
= P((19 − µ)/σ < (X − µ)/σ < (22.5 − µ)/σ)
= P((19 − 21)/2 < Z < (22.5 − 21)/2)
= P (−1 < Z < 0.75)
= Φ(0.75) − Φ(−1)
= Φ(0.75) − [1 − Φ(1)]

ISYE 6739 — Goldsman 8/5/20 68 / 108


Standard Normal Distribution

Example: X ∼ Nor(21, 4). Find P (19 < X < 22.5). Standardizing, we


get

P (19 < X < 22.5)


 
= P((19 − µ)/σ < (X − µ)/σ < (22.5 − µ)/σ)
= P((19 − 21)/2 < Z < (22.5 − 21)/2)
= P (−1 < Z < 0.75)
= Φ(0.75) − Φ(−1)
= Φ(0.75) − [1 − Φ(1)]
= 0.7734 − [1 − 0.8413] = 0.6147. 2

ISYE 6739 — Goldsman 8/5/20 68 / 108
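If you want to double-check this example numerically, one way (a sketch assuming SciPy) is to let norm.cdf do the standardization for you, with loc = µ = 21 and scale = σ = 2:

from scipy.stats import norm

# P(19 < X < 22.5) for X ~ Nor(21, 4), i.e., mean 21 and standard deviation 2
p = norm.cdf(22.5, loc=21, scale=2) - norm.cdf(19, loc=21, scale=2)
print(round(p, 4))  # roughly 0.6147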


Standard Normal Distribution

Example: Suppose that heights of men are M ∼ Nor(68, 4) and heights of


women are W ∼ Nor(65, 1).

ISYE 6739 — Goldsman 8/5/20 69 / 108


Standard Normal Distribution

Example: Suppose that heights of men are M ∼ Nor(68, 4) and heights of


women are W ∼ Nor(65, 1).

Select a man and woman independently at random.

ISYE 6739 — Goldsman 8/5/20 69 / 108


Standard Normal Distribution

Example: Suppose that heights of men are M ∼ Nor(68, 4) and heights of


women are W ∼ Nor(65, 1).

Select a man and woman independently at random.

Find the probability that the woman is taller than the man.

ISYE 6739 — Goldsman 8/5/20 69 / 108


Standard Normal Distribution

Example: Suppose that heights of men are M ∼ Nor(68, 4) and heights of


women are W ∼ Nor(65, 1).

Select a man and woman independently at random.

Find the probability that the woman is taller than the man.

Answer: Note that


W −M ∼ Nor(E[W − M ], Var(W − M ))

ISYE 6739 — Goldsman 8/5/20 69 / 108


Standard Normal Distribution

Example: Suppose that heights of men are M ∼ Nor(68, 4) and heights of


women are W ∼ Nor(65, 1).

Select a man and woman independently at random.

Find the probability that the woman is taller than the man.

Answer: Note that


W −M ∼ Nor(E[W − M ], Var(W − M ))
∼ Nor(65 − 68, 1 + 4) ∼ Nor(−3, 5).

ISYE 6739 — Goldsman 8/5/20 69 / 108


Standard Normal Distribution

Example: Suppose that heights of men are M ∼ Nor(68, 4) and heights of


women are W ∼ Nor(65, 1).

Select a man and woman independently at random.

Find the probability that the woman is taller than the man.

Answer: Note that


W −M ∼ Nor(E[W − M ], Var(W − M ))
∼ Nor(65 − 68, 1 + 4) ∼ Nor(−3, 5).
Then
P (W > M ) = P (W − M > 0)

ISYE 6739 — Goldsman 8/5/20 69 / 108


Standard Normal Distribution

Example: Suppose that heights of men are M ∼ Nor(68, 4) and heights of


women are W ∼ Nor(65, 1).

Select a man and woman independently at random.

Find the probability that the woman is taller than the man.

Answer: Note that


W −M ∼ Nor(E[W − M ], Var(W − M ))
∼ Nor(65 − 68, 1 + 4) ∼ Nor(−3, 5).
Then
P (W > M ) = P (W − M > 0)
 
= P(Z > (0 + 3)/√5)

ISYE 6739 — Goldsman 8/5/20 69 / 108


Standard Normal Distribution

Example: Suppose that heights of men are M ∼ Nor(68, 4) and heights of


women are W ∼ Nor(65, 1).

Select a man and woman independently at random.

Find the probability that the woman is taller than the man.

Answer: Note that


W −M ∼ Nor(E[W − M ], Var(W − M ))
∼ Nor(65 − 68, 1 + 4) ∼ Nor(−3, 5).
Then
P (W > M ) = P (W − M > 0)
 
= P(Z > (0 + 3)/√5)
= 1 − Φ(3/√5)

ISYE 6739 — Goldsman 8/5/20 69 / 108


Standard Normal Distribution

Example: Suppose that heights of men are M ∼ Nor(68, 4) and heights of


women are W ∼ Nor(65, 1).

Select a man and woman independently at random.

Find the probability that the woman is taller than the man.

Answer: Note that


W −M ∼ Nor(E[W − M ], Var(W − M ))
∼ Nor(65 − 68, 1 + 4) ∼ Nor(−3, 5).
Then
P (W > M ) = P (W − M > 0)
 
= P(Z > (0 + 3)/√5)
= 1 − Φ(3/√5)
≈ 1 − 0.910 = 0.090. 2
ISYE 6739 — Goldsman 8/5/20 69 / 108
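A quick sanity check of this answer in Python (just a sketch, with SciPy assumed):

from math import sqrt
from scipy.stats import norm

# W - M ~ Nor(-3, 5), so P(W > M) = P(W - M > 0)
p = 1 - norm.cdf(0, loc=-3, scale=sqrt(5))
print(round(p, 3))  # roughly 0.090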
Sample Mean of Normals

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 70 / 108


Sample Mean of Normals

Lesson 4.9 — Sample Mean of Normals

ISYE 6739 — Goldsman 8/5/20 71 / 108


Sample Mean of Normals

Lesson 4.9 — Sample Mean of Normals


The sample mean of X1 , . . . , Xn is X̄ ≡ Σ_{i=1}^n Xi /n.

ISYE 6739 — Goldsman 8/5/20 71 / 108


Sample Mean of Normals

Lesson 4.9 — Sample Mean of Normals


The sample mean of X1 , . . . , Xn is X̄ ≡ Σ_{i=1}^n Xi /n.

Corollary (of old theorem):


iid
X1 , . . . , Xn ∼ Nor(µ, σ 2 ) ⇒ X̄ ∼ Nor(µ, σ 2 /n).

ISYE 6739 — Goldsman 8/5/20 71 / 108


Sample Mean of Normals

Lesson 4.9 — Sample Mean of Normals


The sample mean of X1 , . . . , Xn is X̄ ≡ Σ_{i=1}^n Xi /n.

Corollary (of old theorem):


iid
X1 , . . . , Xn ∼ Nor(µ, σ 2 ) ⇒ X̄ ∼ Nor(µ, σ 2 /n).

Proof: By previous work, as long as X1 , . . . , Xn are iid something, we have

ISYE 6739 — Goldsman 8/5/20 71 / 108


Sample Mean of Normals

Lesson 4.9 — Sample Mean of Normals


The sample mean of X1 , . . . , Xn is X̄ ≡ Σ_{i=1}^n Xi /n.

Corollary (of old theorem):


iid
X1 , . . . , Xn ∼ Nor(µ, σ 2 ) ⇒ X̄ ∼ Nor(µ, σ 2 /n).

Proof: By previous work, as long as X1 , . . . , Xn are iid something, we have


E[X̄] = µ and Var(X̄) = σ 2 /n.

ISYE 6739 — Goldsman 8/5/20 71 / 108


Sample Mean of Normals

Lesson 4.9 — Sample Mean of Normals


The sample mean of X1 , . . . , Xn is X̄ ≡ Σ_{i=1}^n Xi /n.

Corollary (of old theorem):


iid
X1 , . . . , Xn ∼ Nor(µ, σ 2 ) ⇒ X̄ ∼ Nor(µ, σ 2 /n).

Proof: By previous work, as long as X1 , . . . , Xn are iid something, we have


E[X̄] = µ and Var(X̄) = σ 2 /n. Since X̄ is a linear combination of
independent normals, it’s also normal. Done. 2

ISYE 6739 — Goldsman 8/5/20 71 / 108


Sample Mean of Normals

Lesson 4.9 — Sample Mean of Normals


The sample mean of X1 , . . . , Xn is X̄ ≡ Σ_{i=1}^n Xi /n.

Corollary (of old theorem):


iid
X1 , . . . , Xn ∼ Nor(µ, σ 2 ) ⇒ X̄ ∼ Nor(µ, σ 2 /n).

Proof: By previous work, as long as X1 , . . . , Xn are iid something, we have


E[X̄] = µ and Var(X̄) = σ 2 /n. Since X̄ is a linear combination of
independent normals, it’s also normal. Done. 2

Remark: This result is very significant! As the number of observations


increases, Var(X̄) gets smaller (while E[X̄] stays fixed at µ), so X̄ concentrates
more and more tightly around µ. This concentration is the essence of the Law of Large Numbers.

ISYE 6739 — Goldsman 8/5/20 71 / 108


Sample Mean of Normals

Lesson 4.9 — Sample Mean of Normals


The sample mean of X1 , . . . , Xn is X̄ ≡ Σ_{i=1}^n Xi /n.

Corollary (of old theorem):


iid
X1 , . . . , Xn ∼ Nor(µ, σ 2 ) ⇒ X̄ ∼ Nor(µ, σ 2 /n).

Proof: By previous work, as long as X1 , . . . , Xn are iid something, we have


E[X̄] = µ and Var(X̄) = σ 2 /n. Since X̄ is a linear combination of
independent normals, it’s also normal. Done. 2

Remark: This result is very significant! As the number of observations


increases, Var(X̄) gets smaller (while E[X̄] stays fixed at µ), so X̄ concentrates
more and more tightly around µ. This concentration is the essence of the Law of Large Numbers.

In the upcoming statistics portion of the course, we’ll learn that this makes X̄
an excellent estimator for the mean µ, which is typically unknown in
practice.
ISYE 6739 — Goldsman 8/5/20 71 / 108
Sample Mean of Normals

iid
Example: Suppose that X1 , . . . , Xn ∼ Nor(µ, 16). Find the sample size n
such that
P (|X̄ − µ| ≤ 1) ≥ 0.95.

ISYE 6739 — Goldsman 8/5/20 72 / 108


Sample Mean of Normals

iid
Example: Suppose that X1 , . . . , Xn ∼ Nor(µ, 16). Find the sample size n
such that
P (|X̄ − µ| ≤ 1) ≥ 0.95.
How many observations should you take so that X̄ will have a good chance of
being close to µ?

ISYE 6739 — Goldsman 8/5/20 72 / 108


Sample Mean of Normals

iid
Example: Suppose that X1 , . . . , Xn ∼ Nor(µ, 16). Find the sample size n
such that
P (|X̄ − µ| ≤ 1) ≥ 0.95.
How many observations should you take so that X̄ will have a good chance of
being close to µ?

Solution: Note that X̄ ∼ Nor(µ, 16/n). Then

ISYE 6739 — Goldsman 8/5/20 72 / 108


Sample Mean of Normals

iid
Example: Suppose that X1 , . . . , Xn ∼ Nor(µ, 16). Find the sample size n
such that
P (|X̄ − µ| ≤ 1) ≥ 0.95.
How many observations should you take so that X̄ will have a good chance of
being close to µ?

Solution: Note that X̄ ∼ Nor(µ, 16/n). Then

P (|X̄ − µ| ≤ 1) = P (−1 ≤ X̄ − µ ≤ 1)

ISYE 6739 — Goldsman 8/5/20 72 / 108


Sample Mean of Normals

iid
Example: Suppose that X1 , . . . , Xn ∼ Nor(µ, 16). Find the sample size n
such that
P (|X̄ − µ| ≤ 1) ≥ 0.95.
How many observations should you take so that X̄ will have a good chance of
being close to µ?

Solution: Note that X̄ ∼ Nor(µ, 16/n). Then

P (|X̄ − µ| ≤ 1) = P (−1 ≤ X̄ − µ ≤ 1)
 
= P(−1/(4/√n) ≤ (X̄ − µ)/(4/√n) ≤ 1/(4/√n))

ISYE 6739 — Goldsman 8/5/20 72 / 108


Sample Mean of Normals

iid
Example: Suppose that X1 , . . . , Xn ∼ Nor(µ, 16). Find the sample size n
such that
P (|X̄ − µ| ≤ 1) ≥ 0.95.
How many observations should you take so that X̄ will have a good chance of
being close to µ?

Solution: Note that X̄ ∼ Nor(µ, 16/n). Then

P (|X̄ − µ| ≤ 1) = P (−1 ≤ X̄ − µ ≤ 1)
 
= P(−1/(4/√n) ≤ (X̄ − µ)/(4/√n) ≤ 1/(4/√n))
= P(−√n/4 ≤ Z ≤ √n/4)

ISYE 6739 — Goldsman 8/5/20 72 / 108


Sample Mean of Normals

iid
Example: Suppose that X1 , . . . , Xn ∼ Nor(µ, 16). Find the sample size n
such that
P (|X̄ − µ| ≤ 1) ≥ 0.95.
How many observations should you take so that X̄ will have a good chance of
being close to µ?

Solution: Note that X̄ ∼ Nor(µ, 16/n). Then

P (|X̄ − µ| ≤ 1) = P (−1 ≤ X̄ − µ ≤ 1)
 
= P(−1/(4/√n) ≤ (X̄ − µ)/(4/√n) ≤ 1/(4/√n))
= P(−√n/4 ≤ Z ≤ √n/4)
= 2Φ(√n/4) − 1.

ISYE 6739 — Goldsman 8/5/20 72 / 108


Sample Mean of Normals

Now we have to find n such that this probability is at least 0.95. . . .

ISYE 6739 — Goldsman 8/5/20 73 / 108


Sample Mean of Normals

Now we have to find n such that this probability is at least 0.95. . . .



2Φ(√n/4) − 1 ≥ 0.95 iff

ISYE 6739 — Goldsman 8/5/20 73 / 108


Sample Mean of Normals

Now we have to find n such that this probability is at least 0.95. . . .



2Φ(√n/4) − 1 ≥ 0.95 iff

Φ(√n/4) ≥ 0.975 iff

ISYE 6739 — Goldsman 8/5/20 73 / 108


Sample Mean of Normals

Now we have to find n such that this probability is at least 0.95. . . .



2Φ(√n/4) − 1 ≥ 0.95 iff

Φ(√n/4) ≥ 0.975 iff

√n/4 ≥ Φ⁻¹(0.975) = 1.96

ISYE 6739 — Goldsman 8/5/20 73 / 108


Sample Mean of Normals

Now we have to find n such that this probability is at least 0.95. . . .



2Φ(√n/4) − 1 ≥ 0.95 iff

Φ(√n/4) ≥ 0.975 iff

√n/4 ≥ Φ⁻¹(0.975) = 1.96

iff n ≥ (4 × 1.96)² ≈ 61.47, i.e., n = 62.

ISYE 6739 — Goldsman 8/5/20 73 / 108


Sample Mean of Normals

Now we have to find n such that this probability is at least 0.95. . . .



2Φ(√n/4) − 1 ≥ 0.95 iff

Φ(√n/4) ≥ 0.975 iff

√n/4 ≥ Φ⁻¹(0.975) = 1.96

iff n ≥ (4 × 1.96)² ≈ 61.47, i.e., n = 62.

So if you take the average of 62 observations, then X̄ has a 95% chance of


being within 1 of µ. 2

ISYE 6739 — Goldsman 8/5/20 73 / 108
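Here's a small Python sketch of the same sample-size calculation (SciPy assumed); it solves 2Φ(√n/4) − 1 ≥ 0.95 directly:

from math import ceil
from scipy.stats import norm

z = norm.ppf(0.975)      # 1.96
n = ceil((4 * z) ** 2)   # smallest integer n with sqrt(n)/4 >= z
print(n)                 # 62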


The Central Limit Theorem + Proof

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 74 / 108


The Central Limit Theorem + Proof

Lesson 4.10 — The Central Limit Theorem + Proof

ISYE 6739 — Goldsman 8/5/20 75 / 108


The Central Limit Theorem + Proof

Lesson 4.10 — The Central Limit Theorem + Proof

The Central Limit Theorem is the most-important theorem in probability and


statistics!

ISYE 6739 — Goldsman 8/5/20 75 / 108


The Central Limit Theorem + Proof

Lesson 4.10 — The Central Limit Theorem + Proof

The Central Limit Theorem is the most-important theorem in probability and


statistics!

CLT: Suppose X1 , . . . , Xn are iid with E[Xi ] = µ and Var(Xi ) = σ 2 . Then


as n → ∞,

ISYE 6739 — Goldsman 8/5/20 75 / 108


The Central Limit Theorem + Proof

Lesson 4.10 — The Central Limit Theorem + Proof

The Central Limit Theorem is the most-important theorem in probability and


statistics!

CLT: Suppose X1 , . . . , Xn are iid with E[Xi ] = µ and Var(Xi ) = σ 2 . Then


as n → ∞,
Zn = (Σ_{i=1}^n Xi − nµ)/(σ√n) = (X̄ − µ)/(σ/√n)

ISYE 6739 — Goldsman 8/5/20 75 / 108


The Central Limit Theorem + Proof

Lesson 4.10 — The Central Limit Theorem + Proof

The Central Limit Theorem is the most-important theorem in probability and


statistics!

CLT: Suppose X1 , . . . , Xn are iid with E[Xi ] = µ and Var(Xi ) = σ 2 . Then


as n → ∞,
Zn = (Σ_{i=1}^n Xi − nµ)/(σ√n) = (X̄ − µ)/(σ/√n) = (X̄ − E[X̄])/√Var(X̄)

ISYE 6739 — Goldsman 8/5/20 75 / 108


The Central Limit Theorem + Proof

Lesson 4.10 — The Central Limit Theorem + Proof

The Central Limit Theorem is the most-important theorem in probability and


statistics!

CLT: Suppose X1 , . . . , Xn are iid with E[Xi ] = µ and Var(Xi ) = σ 2 . Then


as n → ∞,
Zn = (Σ_{i=1}^n Xi − nµ)/(σ√n) = (X̄ − µ)/(σ/√n) = (X̄ − E[X̄])/√Var(X̄) →d Nor(0, 1),

ISYE 6739 — Goldsman 8/5/20 75 / 108


The Central Limit Theorem + Proof

Lesson 4.10 — The Central Limit Theorem + Proof

The Central Limit Theorem is the most-important theorem in probability and


statistics!

CLT: Suppose X1 , . . . , Xn are iid with E[Xi ] = µ and Var(Xi ) = σ 2 . Then


as n → ∞,
Zn = (Σ_{i=1}^n Xi − nµ)/(σ√n) = (X̄ − µ)/(σ/√n) = (X̄ − E[X̄])/√Var(X̄) →d Nor(0, 1),

where “→d” means that the cdf of Zn converges to the Nor(0, 1) cdf.

ISYE 6739 — Goldsman 8/5/20 75 / 108


The Central Limit Theorem + Proof

Slightly Honors Informal Proof: Suppose that the mgf MX (t) of the Xi ’s
exists and satisfies certain technical conditions that you don’t need to know
about. (OK, MX (t) has to exist around t = 0, among other things.)

ISYE 6739 — Goldsman 8/5/20 76 / 108


The Central Limit Theorem + Proof

Slightly Honors Informal Proof: Suppose that the mgf MX (t) of the Xi ’s
exists and satisfies certain technical conditions that you don’t need to know
about. (OK, MX (t) has to exist around t = 0, among other things.)

Moreover, without loss of generality (since we’re standardizing anyway) and


for notational convenience, we’ll assume that µ = 0 and σ 2 = 1.

ISYE 6739 — Goldsman 8/5/20 76 / 108


The Central Limit Theorem + Proof

Slightly Honors Informal Proof: Suppose that the mgf MX (t) of the Xi ’s
exists and satisfies certain technical conditions that you don’t need to know
about. (OK, MX (t) has to exist around t = 0, among other things.)

Moreover, without loss of generality (since we’re standardizing anyway) and


for notational convenience, we’ll assume that µ = 0 and σ 2 = 1.

We will be done if we can show that the mgf of Zn converges to the mgf of
Z ∼ Nor(0, 1), i.e., we need to show that M_{Zn}(t) → e^(t²/2) as n → ∞.

ISYE 6739 — Goldsman 8/5/20 76 / 108


The Central Limit Theorem + Proof

Slightly Honors Informal Proof: Suppose that the mgf MX (t) of the Xi ’s
exists and satisfies certain technical conditions that you don’t need to know
about. (OK, MX (t) has to exist around t = 0, among other things.)

Moreover, without loss of generality (since we’re standardizing anyway) and


for notational convenience, we’ll assume that µ = 0 and σ 2 = 1.

We will be done if we can show that the mgf of Zn converges to the mgf of
Z ∼ Nor(0, 1), i.e., we need to show that M_{Zn}(t) → e^(t²/2) as n → ∞.

To get things going, the mgf of Zn is

M_{Zn}(t) = M_{Σ_{i=1}^n Xi/√n}(t)

ISYE 6739 — Goldsman 8/5/20 76 / 108


The Central Limit Theorem + Proof

Slightly Honors Informal Proof: Suppose that the mgf MX (t) of the Xi ’s
exists and satisfies certain technical conditions that you don’t need to know
about. (OK, MX (t) has to exist around t = 0, among other things.)

Moreover, without loss of generality (since we’re standardizing anyway) and


for notational convenience, we’ll assume that µ = 0 and σ 2 = 1.

We will be done if we can show that the mgf of Zn converges to the mgf of
Z ∼ Nor(0, 1), i.e., we need to show that M_{Zn}(t) → e^(t²/2) as n → ∞.

To get things going, the mgf of Zn is

M_{Zn}(t) = M_{Σ_{i=1}^n Xi/√n}(t)
= M_{Σ_{i=1}^n Xi}(t/√n) (mgf of a linear function of a RV)

ISYE 6739 — Goldsman 8/5/20 76 / 108


The Central Limit Theorem + Proof

Slightly Honors Informal Proof: Suppose that the mgf MX (t) of the Xi ’s
exists and satisfies certain technical conditions that you don’t need to know
about. (OK, MX (t) has to exist around t = 0, among other things.)

Moreover, without loss of generality (since we’re standardizing anyway) and


for notational convenience, we’ll assume that µ = 0 and σ 2 = 1.

We will be done if we can show that the mgf of Zn converges to the mgf of
Z ∼ Nor(0, 1), i.e., we need to show that M_{Zn}(t) → e^(t²/2) as n → ∞.

To get things going, the mgf of Zn is

M_{Zn}(t) = M_{Σ_{i=1}^n Xi/√n}(t)
= M_{Σ_{i=1}^n Xi}(t/√n) (mgf of a linear function of a RV)
= [MX(t/√n)]^n (Xi ’s are iid).

ISYE 6739 — Goldsman 8/5/20 76 / 108


The Central Limit Theorem + Proof

Thus, taking logs, our goal is to show that

ISYE 6739 — Goldsman 8/5/20 77 / 108


The Central Limit Theorem + Proof

Thus, taking logs, our goal is to show that

lim_{n→∞} n ℓn MX(t/√n) = t²/2.

ISYE 6739 — Goldsman 8/5/20 77 / 108


The Central Limit Theorem + Proof

Thus, taking logs, our goal is to show that


√ 
lim n `n MX (t/ n) = t2 /2.
n→∞

If we let y = 1/ n, our revised goal is to show that

`n MX (ty)
lim = t2 /2.
y→0 y2

ISYE 6739 — Goldsman 8/5/20 77 / 108


The Central Limit Theorem + Proof

Thus, taking logs, our goal is to show that


√ 
lim n `n MX (t/ n) = t2 /2.
n→∞

If we let y = 1/ n, our revised goal is to show that

`n MX (ty)
lim = t2 /2.
y→0 y2

Before proceeding further, note that


 
lim `n MX (ty) = `n MX (0) = `n(1) = 0 (3)
y→0

ISYE 6739 — Goldsman 8/5/20 77 / 108


The Central Limit Theorem + Proof

Thus, taking logs, our goal is to show that


√ 
lim n `n MX (t/ n) = t2 /2.
n→∞

If we let y = 1/ n, our revised goal is to show that

`n MX (ty)
lim = t2 /2.
y→0 y2

Before proceeding further, note that


 
lim `n MX (ty) = `n MX (0) = `n(1) = 0 (3)
y→0

and
0 0
lim MX (ty) = MX (0) = E[X] = µ = 0, (4)
y→0

ISYE 6739 — Goldsman 8/5/20 77 / 108


The Central Limit Theorem + Proof

Thus, taking logs, our goal is to show that

lim_{n→∞} n ℓn MX(t/√n) = t²/2.

If we let y = 1/√n, our revised goal is to show that

lim_{y→0} ℓn MX(ty) / y² = t²/2.

Before proceeding further, note that

lim_{y→0} ℓn MX(ty) = ℓn MX(0) = ℓn(1) = 0    (3)

and

lim_{y→0} M′X(ty) = M′X(0) = E[X] = µ = 0,    (4)

where the last equality is from our standardization assumption.


ISYE 6739 — Goldsman 8/5/20 77 / 108
The Central Limit Theorem + Proof

So after all of this build-up, we have



`n MX (ty)
lim
y→0 y2
t MX0 (ty)
= lim (by (3) et L’Hôspital to deal with 0 / 0)
y→0 2y MX (ty)

ISYE 6739 — Goldsman 8/5/20 78 / 108


The Central Limit Theorem + Proof

So after all of this build-up, we have



`n MX (ty)
lim
y→0 y2
t MX0 (ty)
= lim (by (3) et L’Hôspital to deal with 0 / 0)
y→0 2y MX (ty)
00 (ty)
t 2 MX
= lim 0 (ty) (by (4) et L’Hôspital encore)
y→0 2 MX (ty) + 2yt MX

ISYE 6739 — Goldsman 8/5/20 78 / 108


The Central Limit Theorem + Proof

So after all of this build-up, we have



`n MX (ty)
lim
y→0 y2
t MX0 (ty)
= lim (by (3) et L’Hôspital to deal with 0 / 0)
y→0 2y MX (ty)
00 (ty)
t 2 MX
= lim 0 (ty) (by (4) et L’Hôspital encore)
y→0 2 MX (ty) + 2yt MX
00 (0)
t2 MX
=
2 MX (0) + 0

ISYE 6739 — Goldsman 8/5/20 78 / 108


The Central Limit Theorem + Proof

So after all of this build-up, we have



`n MX (ty)
lim
y→0 y2
t MX0 (ty)
= lim (by (3) et L’Hôspital to deal with 0 / 0)
y→0 2y MX (ty)
00 (ty)
t 2 MX
= lim 0 (ty) (by (4) et L’Hôspital encore)
y→0 2 MX (ty) + 2yt MX
00 (0)
t2 MX t2 E[X 2 ]
= =
2 MX (0) + 0 2

ISYE 6739 — Goldsman 8/5/20 78 / 108


The Central Limit Theorem + Proof

So after all of this build-up, we have



lim_{y→0} ℓn MX(ty) / y²
= lim_{y→0} t M′X(ty) / [2y MX(ty)]    (by (3) et L’Hôpital to deal with 0/0)
= lim_{y→0} t² M″X(ty) / [2 MX(ty) + 2yt M′X(ty)]    (by (4) et L’Hôpital encore)
= t² M″X(0) / [2 MX(0) + 0]
= t² E[X²] / 2 = t²/2    (since E[X²] = σ² + µ² = 1).

ISYE 6739 — Goldsman 8/5/20 78 / 108


The Central Limit Theorem + Proof

Remarks: We have a lot to say about such an important theorem.

ISYE 6739 — Goldsman 8/5/20 79 / 108


The Central Limit Theorem + Proof

Remarks: We have a lot to say about such an important theorem.

(1) If n is large, then X̄ ≈ Nor(µ, σ 2 /n).

ISYE 6739 — Goldsman 8/5/20 79 / 108


The Central Limit Theorem + Proof

Remarks: We have a lot to say about such an important theorem.

(1) If n is large, then X̄ ≈ Nor(µ, σ 2 /n).

(2) The Xi ’s don’t have to be normal for the CLT to work! It even works
on discrete distributions!

ISYE 6739 — Goldsman 8/5/20 79 / 108


The Central Limit Theorem + Proof

Remarks: We have a lot to say about such an important theorem.

(1) If n is large, then X̄ ≈ Nor(µ, σ 2 /n).

(2) The Xi ’s don’t have to be normal for the CLT to work! It even works
on discrete distributions!

(3) You usually need n ≥ 30 observations for the approximation to work well.
(Need fewer observations if the Xi ’s come from a symmetric distribution.)

ISYE 6739 — Goldsman 8/5/20 79 / 108


The Central Limit Theorem + Proof

Remarks: We have a lot to say about such an important theorem.

(1) If n is large, then X̄ ≈ Nor(µ, σ 2 /n).

(2) The Xi ’s don’t have to be normal for the CLT to work! It even works
on discrete distributions!

(3) You usually need n ≥ 30 observations for the approximation to work well.
(Need fewer observations if the Xi ’s come from a symmetric distribution.)

(4) You can almost always use the CLT if the observations are iid.

ISYE 6739 — Goldsman 8/5/20 79 / 108


The Central Limit Theorem + Proof

Remarks: We have a lot to say about such an important theorem.

(1) If n is large, then X̄ ≈ Nor(µ, σ 2 /n).

(2) The Xi ’s don’t have to be normal for the CLT to work! It even works
on discrete distributions!

(3) You usually need n ≥ 30 observations for the approximation to work well.
(Need fewer observations if the Xi ’s come from a symmetric distribution.)

(4) You can almost always use the CLT if the observations are iid.

(5) In fact, there are versions of the CLT that are actually a lot more general
than the theorem presented here!

ISYE 6739 — Goldsman 8/5/20 79 / 108


Central Limit Theorem Examples

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 80 / 108


Central Limit Theorem Examples

Lesson 4.11 — Central Limit Theorem Examples

ISYE 6739 — Goldsman 8/5/20 81 / 108


Central Limit Theorem Examples

Lesson 4.11 — Central Limit Theorem Examples

Example: To show that the Central Limit Theorem really works, let’s add up
just n = 12 iid Unif(0,1)’s, U1 , . . . , Un . Let Sn = Σ_{i=1}^n Ui .

ISYE 6739 — Goldsman 8/5/20 81 / 108


Central Limit Theorem Examples

Lesson 4.11 — Central Limit Theorem Examples

Example: To show that the Central Limit Theorem really works, let’s add up
just n = 12 iid Unif(0,1)’s, U1 , . . . , Un . Let Sn = Σ_{i=1}^n Ui . Note that
E[Sn ] = nE[Ui ] = n/2, and Var(Sn ) = nVar(Ui ) = n/12. Therefore,

ISYE 6739 — Goldsman 8/5/20 81 / 108


Central Limit Theorem Examples

Lesson 4.11 — Central Limit Theorem Examples

Example: To show that the Central Limit Theorem really works, let’s add up
just n = 12 iid Unif(0,1)’s, U1 , . . . , Un . Let Sn = Σ_{i=1}^n Ui . Note that
E[Sn ] = nE[Ui ] = n/2, and Var(Sn ) = nVar(Ui ) = n/12. Therefore,
Zn ≡ (Sn − n/2)/√(n/12) = Sn − 6 ≈ Nor(0, 1).

ISYE 6739 — Goldsman 8/5/20 81 / 108


Central Limit Theorem Examples

Lesson 4.11 — Central Limit Theorem Examples

Example: To show that the Central Limit Theorem really works, let’s add up
just n = 12 iid Unif(0,1)’s, U1 , . . . , Un . Let Sn = Σ_{i=1}^n Ui . Note that
E[Sn ] = nE[Ui ] = n/2, and Var(Sn ) = nVar(Ui ) = n/12. Therefore,
Zn ≡ (Sn − n/2)/√(n/12) = Sn − 6 ≈ Nor(0, 1).
The histogram was compiled using 100,000 simulations of Z12 . It works! 2

ISYE 6739 — Goldsman 8/5/20 81 / 108
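If you'd like to reproduce that histogram experiment yourself, here's a minimal NumPy sketch (the 100,000-replication figure itself isn't reproduced in these notes):

import numpy as np

rng = np.random.default_rng()
# 100,000 replications of Z_12 = (sum of 12 iid Unif(0,1)'s) - 6
z = rng.uniform(size=(100_000, 12)).sum(axis=1) - 6
print(round(z.mean(), 3), round(z.var(), 3))  # roughly 0 and 1
# A histogram of z (e.g., via matplotlib) looks very close to the Nor(0,1) pdf.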


Central Limit Theorem Examples

iid
Example: Suppose X1 , . . . , X100 ∼ Exp(1/1000).

ISYE 6739 — Goldsman 8/5/20 82 / 108


Central Limit Theorem Examples

iid
Example: Suppose X1 , . . . , X100 ∼ Exp(1/1000).
Find P (950 ≤ X̄ ≤ 1050).

ISYE 6739 — Goldsman 8/5/20 82 / 108


Central Limit Theorem Examples

iid
Example: Suppose X1 , . . . , X100 ∼ Exp(1/1000).
Find P (950 ≤ X̄ ≤ 1050).

Solution: Recall that if Xi ∼ Exp(λ), then E[Xi ] = 1/λ and


Var(Xi ) = 1/λ2 .

ISYE 6739 — Goldsman 8/5/20 82 / 108


Central Limit Theorem Examples

iid
Example: Suppose X1 , . . . , X100 ∼ Exp(1/1000).
Find P (950 ≤ X̄ ≤ 1050).

Solution: Recall that if Xi ∼ Exp(λ), then E[Xi ] = 1/λ and


Var(Xi ) = 1/λ2 .

Further, if X̄ is the sample mean based on n observations, then

ISYE 6739 — Goldsman 8/5/20 82 / 108


Central Limit Theorem Examples

iid
Example: Suppose X1 , . . . , X100 ∼ Exp(1/1000).
Find P (950 ≤ X̄ ≤ 1050).

Solution: Recall that if Xi ∼ Exp(λ), then E[Xi ] = 1/λ and


Var(Xi ) = 1/λ2 .

Further, if X̄ is the sample mean based on n observations, then

E[X̄] = E[Xi ] = 1/λ and

ISYE 6739 — Goldsman 8/5/20 82 / 108


Central Limit Theorem Examples

iid
Example: Suppose X1 , . . . , X100 ∼ Exp(1/1000).
Find P (950 ≤ X̄ ≤ 1050).

Solution: Recall that if Xi ∼ Exp(λ), then E[Xi ] = 1/λ and


Var(Xi ) = 1/λ2 .

Further, if X̄ is the sample mean based on n observations, then

E[X̄] = E[Xi ] = 1/λ and

Var(X̄) = Var(Xi )/n = 1/(nλ2 ).

ISYE 6739 — Goldsman 8/5/20 82 / 108


Central Limit Theorem Examples

iid
Example: Suppose X1 , . . . , X100 ∼ Exp(1/1000).
Find P (950 ≤ X̄ ≤ 1050).

Solution: Recall that if Xi ∼ Exp(λ), then E[Xi ] = 1/λ and


Var(Xi ) = 1/λ2 .

Further, if X̄ is the sample mean based on n observations, then

E[X̄] = E[Xi ] = 1/λ and

Var(X̄) = Var(Xi )/n = 1/(nλ2 ).

For our problem, λ = 1/1000 and n = 100, so that E[X̄] = 1000 and
Var(X̄) = 10000.

ISYE 6739 — Goldsman 8/5/20 82 / 108


Central Limit Theorem Examples

So by the CLT,

P (950 ≤ X̄ ≤ 1050)

ISYE 6739 — Goldsman 8/5/20 83 / 108


Central Limit Theorem Examples

So by the CLT,

P (950 ≤ X̄ ≤ 1050)
 
= P((950 − E[X̄])/√Var(X̄) ≤ (X̄ − E[X̄])/√Var(X̄) ≤ (1050 − E[X̄])/√Var(X̄))

ISYE 6739 — Goldsman 8/5/20 83 / 108


Central Limit Theorem Examples

So by the CLT,

P (950 ≤ X̄ ≤ 1050)
 
= P((950 − E[X̄])/√Var(X̄) ≤ (X̄ − E[X̄])/√Var(X̄) ≤ (1050 − E[X̄])/√Var(X̄))
≈ P((950 − 1000)/100 ≤ Z ≤ (1050 − 1000)/100)

ISYE 6739 — Goldsman 8/5/20 83 / 108


Central Limit Theorem Examples

So by the CLT,

P (950 ≤ X̄ ≤ 1050)
 
= P((950 − E[X̄])/√Var(X̄) ≤ (X̄ − E[X̄])/√Var(X̄) ≤ (1050 − E[X̄])/√Var(X̄))
≈ P((950 − 1000)/100 ≤ Z ≤ (1050 − 1000)/100)
= P(−1/2 ≤ Z ≤ 1/2) = 2Φ(1/2) − 1 = 0.383. 2

ISYE 6739 — Goldsman 8/5/20 83 / 108


Central Limit Theorem Examples

So by the CLT,

P (950 ≤ X̄ ≤ 1050)
 
= P((950 − E[X̄])/√Var(X̄) ≤ (X̄ − E[X̄])/√Var(X̄) ≤ (1050 − E[X̄])/√Var(X̄))
≈ P((950 − 1000)/100 ≤ Z ≤ (1050 − 1000)/100)
= P(−1/2 ≤ Z ≤ 1/2) = 2Φ(1/2) − 1 = 0.383. 2

Remark: This problem can be solved exactly if we have access to the Excel
Erlang cdf function GAMMADIST. And what do you know, you end up with
exactly the same answer of 0.383!

ISYE 6739 — Goldsman 8/5/20 83 / 108
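For those working in Python rather than Excel, here is a sketch of both the CLT approximation and the exact Erlang (Gamma) calculation mentioned above (SciPy assumed):

from scipy.stats import norm, gamma

# CLT approximation: Xbar is approximately Nor(1000, 100^2)
approx = norm.cdf(1050, loc=1000, scale=100) - norm.cdf(950, loc=1000, scale=100)

# Exact: the sum of 100 iid Exp(1/1000)'s is Erlang, i.e., Gamma(shape=100, scale=1000),
# and {950 <= Xbar <= 1050} is the same event as {95000 <= sum <= 105000}
exact = gamma.cdf(105_000, a=100, scale=1000) - gamma.cdf(95_000, a=100, scale=1000)

print(round(approx, 3), round(exact, 3))  # both roughly 0.383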


Central Limit Theorem Examples

Example: Suppose X1 , . . . , X100 are iid from some distribution with mean
1000 and standard deviation 1000.

ISYE 6739 — Goldsman 8/5/20 84 / 108


Central Limit Theorem Examples

Example: Suppose X1 , . . . , X100 are iid from some distribution with mean
1000 and standard deviation 1000. Find P (950 ≤ X̄ ≤ 1050).

ISYE 6739 — Goldsman 8/5/20 84 / 108


Central Limit Theorem Examples

Example: Suppose X1 , . . . , X100 are iid from some distribution with mean
1000 and standard deviation 1000. Find P (950 ≤ X̄ ≤ 1050).

Solution: By exactly the same manipulations as in the previous example, the


.
answer = 0.383.

ISYE 6739 — Goldsman 8/5/20 84 / 108


Central Limit Theorem Examples

Example: Suppose X1 , . . . , X100 are iid from some distribution with mean
1000 and standard deviation 1000. Find P (950 ≤ X̄ ≤ 1050).

Solution: By exactly the same manipulations as in the previous example, the


.
answer = 0.383.

Notice that we didn’t care whether or not the data came from an exponential
distribution. We just needed the mean and variance. 2

ISYE 6739 — Goldsman 8/5/20 84 / 108


Central Limit Theorem Examples

Normal Approximation to the Binomial

ISYE 6739 — Goldsman 8/5/20 85 / 108


Central Limit Theorem Examples

Normal Approximation to the Binomial

Suppose Y ∼ Bin(n, p), where n is very large. In such cases, we usually


approximate the Binomial via an appropriate Normal distribution.

ISYE 6739 — Goldsman 8/5/20 85 / 108


Central Limit Theorem Examples

Normal Approximation to the Binomial

Suppose Y ∼ Bin(n, p), where n is very large. In such cases, we usually


approximate the Binomial via an appropriate Normal distribution.
The CLT applies since Y = Σ_{i=1}^n Xi , where the Xi ’s are iid Bern(p).

ISYE 6739 — Goldsman 8/5/20 85 / 108


Central Limit Theorem Examples

Normal Approximation to the Binomial

Suppose Y ∼ Bin(n, p), where n is very large. In such cases, we usually


approximate the Binomial via an appropriate Normal distribution.
The CLT applies since Y = Σ_{i=1}^n Xi , where the Xi ’s are iid Bern(p).

Then
(Y − E[Y ])/√Var(Y ) = (Y − np)/√(npq) ≈ Nor(0, 1).

ISYE 6739 — Goldsman 8/5/20 85 / 108


Central Limit Theorem Examples

Normal Approximation to the Binomial

Suppose Y ∼ Bin(n, p), where n is very large. In such cases, we usually


approximate the Binomial via an appropriate Normal distribution.
The CLT applies since Y = Σ_{i=1}^n Xi , where the Xi ’s are iid Bern(p).

Then
(Y − E[Y ])/√Var(Y ) = (Y − np)/√(npq) ≈ Nor(0, 1).

The usual rule of thumb for the Normal approximation to the Binomial is that
it works pretty well as long as np ≥ 5 and nq ≥ 5.

ISYE 6739 — Goldsman 8/5/20 85 / 108


Central Limit Theorem Examples

Why do we need such an approximation?

ISYE 6739 — Goldsman 8/5/20 86 / 108


Central Limit Theorem Examples

Why do we need such an approximation?

Example: Suppose Y ∼ Bin(100, 0.8) and we want


P (Y ≥ 84) = Σ_{i=84}^{100} (100 choose i) (0.8)^i (0.2)^{100−i} .

ISYE 6739 — Goldsman 8/5/20 86 / 108


Central Limit Theorem Examples

Why do we need such an approximation?

Example: Suppose Y ∼ Bin(100, 0.8) and we want


P (Y ≥ 84) = Σ_{i=84}^{100} (100 choose i) (0.8)^i (0.2)^{100−i} .

Good luck with the binomial coefficients (they’re too big) and the number of
terms to sum up (it’s going to get tedious). I’ll come back to visit you in an
hour.

ISYE 6739 — Goldsman 8/5/20 86 / 108


Central Limit Theorem Examples

Why do we need such an approximation?

Example: Suppose Y ∼ Bin(100, 0.8) and we want


P (Y ≥ 84) = Σ_{i=84}^{100} (100 choose i) (0.8)^i (0.2)^{100−i} .

Good luck with the binomial coefficients (they’re too big) and the number of
terms to sum up (it’s going to get tedious). I’ll come back to visit you in an
hour.

The next example shows how to use the approximation.

ISYE 6739 — Goldsman 8/5/20 86 / 108


Central Limit Theorem Examples

Why do we need such an approximation?

Example: Suppose Y ∼ Bin(100, 0.8) and we want


P (Y ≥ 84) = Σ_{i=84}^{100} (100 choose i) (0.8)^i (0.2)^{100−i} .

Good luck with the binomial coefficients (they’re too big) and the number of
terms to sum up (it’s going to get tedious). I’ll come back to visit you in an
hour.

The next example shows how to use the approximation.

Note that it incorporates a “continuity correction” to account for the fact


that the Binomial is discrete while the Normal is continuous. If you don’t
want to use it, don’t worry too much.

ISYE 6739 — Goldsman 8/5/20 86 / 108


Central Limit Theorem Examples

Example: The Braves play 100 independent baseball games, each of which
they have probability 0.8 of winning. What’s the probability that they win ≥
84?

ISYE 6739 — Goldsman 8/5/20 87 / 108


Central Limit Theorem Examples

Example: The Braves play 100 independent baseball games, each of which
they have probability 0.8 of winning. What’s the probability that they win ≥
84?

Y ∼ Bin(100, 0.8) and we want P (Y ≥ 84) (as in the last example). . .

ISYE 6739 — Goldsman 8/5/20 87 / 108


Central Limit Theorem Examples

Example: The Braves play 100 independent baseball games, each of which
they have probability 0.8 of winning. What’s the probability that they win ≥
84?

Y ∼ Bin(100, 0.8) and we want P (Y ≥ 84) (as in the last example). . .

P (Y ≥ 84) = P (Y ≥ 83.5) (“continuity correction”)

ISYE 6739 — Goldsman 8/5/20 87 / 108


Central Limit Theorem Examples

Example: The Braves play 100 independent baseball games, each of which
they have probability 0.8 of winning. What’s the probability that they win ≥
84?

Y ∼ Bin(100, 0.8) and we want P (Y ≥ 84) (as in the last example). . .

P (Y ≥ 84) = P (Y ≥ 83.5) (“continuity correction”)


 
. 83.5 − np
= P Z≥ √ (CLT)
npq

ISYE 6739 — Goldsman 8/5/20 87 / 108


Central Limit Theorem Examples

Example: The Braves play 100 independent baseball games, each of which
they have probability 0.8 of winning. What’s the probability that they win ≥
84?

Y ∼ Bin(100, 0.8) and we want P (Y ≥ 84) (as in the last example). . .

P (Y ≥ 84) = P (Y ≥ 83.5) (“continuity correction”)


 
≈ P(Z ≥ (83.5 − np)/√(npq))    (CLT)
 
83.5 − 80
= P Z≥ √ = P (Z ≥ 0.875) = 0.1908.
16

ISYE 6739 — Goldsman 8/5/20 87 / 108


Central Limit Theorem Examples

Example: The Braves play 100 independent baseball games, each of which
they have probability 0.8 of winning. What’s the probability that they win ≥
84?

Y ∼ Bin(100, 0.8) and we want P (Y ≥ 84) (as in the last example). . .

P (Y ≥ 84) = P (Y ≥ 83.5) (“continuity correction”)


 
≈ P(Z ≥ (83.5 − np)/√(npq))    (CLT)
= P(Z ≥ (83.5 − 80)/√16) = P (Z ≥ 0.875) = 0.1908.

The actual answer (using the true Bin(100,0.8) distribution) turns out to be
0.1923 — pretty close! 2

ISYE 6739 — Goldsman 8/5/20 87 / 108
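A Python sketch comparing the exact Binomial answer with the continuity-corrected normal approximation (SciPy assumed):

from math import sqrt
from scipy.stats import binom, norm

n, p = 100, 0.8
exact = binom.sf(83, n, p)                                 # P(Y >= 84) = P(Y > 83)
approx = norm.sf((83.5 - n * p) / sqrt(n * p * (1 - p)))   # continuity correction
print(round(exact, 4), round(approx, 4))                   # roughly 0.1923 and 0.1908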


Extensions — Multivariate Normal Distribution

1 Bernoulli and Binomial Distributions


2 Hypergeometric Distribution
3 Geometric and Negative Binomial Distributions
4 Poisson Distribution
5 Uniform, Exponential, and Friends
6 Other Continuous Distributions
7 Normal Distribution: Basics
8 Standard Normal Distribution
9 Sample Mean of Normals
10 The Central Limit Theorem + Proof
11 Central Limit Theorem Examples
12 Extensions — Multivariate Normal Distribution
13 Extensions — Lognormal Distribution
14 Computer Stuff

ISYE 6739 — Goldsman 8/5/20 88 / 108


Extensions — Multivariate Normal Distribution

Lesson 4.12 — Extensions — Multivariate Normal Distribution

ISYE 6739 — Goldsman 8/5/20 89 / 108


Extensions — Multivariate Normal Distribution

Lesson 4.12 — Extensions — Multivariate Normal Distribution

Definition: (X, Y ) has the Bivariate Normal distribution if it has pdf


 h i
 − zX2 (x) − 2ρz (x)z (y) + z 2 (y) 
X Y Y
f (x, y) = C exp 2
,
 2(1 − ρ ) 

ISYE 6739 — Goldsman 8/5/20 89 / 108


Extensions — Multivariate Normal Distribution

Lesson 4.12 — Extensions — Multivariate Normal Distribution

Definition: (X, Y ) has the Bivariate Normal distribution if it has pdf


 h i
 − zX2 (x) − 2ρz (x)z (y) + z 2 (y) 
X Y Y
f (x, y) = C exp 2
,
 2(1 − ρ ) 

where
1
ρ ≡ Corr(X, Y ), C ≡ p ,
2πσX σY 1 − ρ2

ISYE 6739 — Goldsman 8/5/20 89 / 108


Extensions — Multivariate Normal Distribution

Lesson 4.12 — Extensions — Multivariate Normal Distribution

Definition: (X, Y ) has the Bivariate Normal distribution if it has pdf

f (x, y) = C exp{ −[zX(x)² − 2ρ zX(x) zY(y) + zY(y)²] / [2(1 − ρ²)] },

where
ρ ≡ Corr(X, Y ),  C ≡ 1/(2π σX σY √(1 − ρ²)),
zX(x) ≡ (x − µX)/σX , and zY(y) ≡ (y − µY)/σY .

ISYE 6739 — Goldsman 8/5/20 89 / 108


Extensions — Multivariate Normal Distribution

Pretty nasty joint pdf, eh?


In fact, X ∼ Nor(µX , σ²X) and Y ∼ Nor(µY , σ²Y).

ISYE 6739 — Goldsman 8/5/20 90 / 108


Extensions — Multivariate Normal Distribution

Pretty nasty joint pdf, eh?


In fact, X ∼ Nor(µX , σ²X) and Y ∼ Nor(µY , σ²Y).

Example: (X, Y ) could be a person’s (height,weight). The two quantities


are marginally normal, but positively correlated.

ISYE 6739 — Goldsman 8/5/20 90 / 108


Extensions — Multivariate Normal Distribution

Pretty nasty joint pdf, eh?


In fact, X ∼ Nor(µX , σ²X) and Y ∼ Nor(µY , σ²Y).

Example: (X, Y ) could be a person’s (height,weight). The two quantities


are marginally normal, but positively correlated.

If you want to calculate bivariate normal probabilities, you’ll need to evaluate


quantities like

ISYE 6739 — Goldsman 8/5/20 90 / 108


Extensions — Multivariate Normal Distribution

Pretty nasty joint pdf, eh?


In fact, X ∼ Nor(µX , σ²X) and Y ∼ Nor(µY , σ²Y).

Example: (X, Y ) could be a person’s (height,weight). The two quantities


are marginally normal, but positively correlated.

If you want to calculate bivariate normal probabilities, you’ll need to evaluate


quantities like
P (a < X < b, c < Y < d) = ∫_c^d ∫_a^b f (x, y) dx dy,

ISYE 6739 — Goldsman 8/5/20 90 / 108


Extensions — Multivariate Normal Distribution

Pretty nasty joint pdf, eh?


In fact, X ∼ Nor(µX , σ²X) and Y ∼ Nor(µY , σ²Y).

Example: (X, Y ) could be a person’s (height,weight). The two quantities


are marginally normal, but positively correlated.

If you want to calculate bivariate normal probabilities, you’ll need to evaluate


quantities like
P (a < X < b, c < Y < d) = ∫_c^d ∫_a^b f (x, y) dx dy,

which will probably require numerical integration techniques.

ISYE 6739 — Goldsman 8/5/20 90 / 108


Extensions — Multivariate Normal Distribution

Fun Fact (which will come up later when we discuss regression): The
conditional distribution of Y given that X = x is also normal. In particular,

ISYE 6739 — Goldsman 8/5/20 91 / 108


Extensions — Multivariate Normal Distribution

Fun Fact (which will come up later when we discuss regression): The
conditional distribution of Y given that X = x is also normal. In particular,
 
Y |X = x ∼ Nor( µY + ρ(σY /σX )(x − µX ), σ²Y (1 − ρ²) ).

ISYE 6739 — Goldsman 8/5/20 91 / 108


Extensions — Multivariate Normal Distribution

Fun Fact (which will come up later when we discuss regression): The
conditional distribution of Y given that X = x is also normal. In particular,
 
Y |X = x ∼ Nor( µY + ρ(σY /σX )(x − µX ), σ²Y (1 − ρ²) ).

Information about X helps to update the distribution of Y .

ISYE 6739 — Goldsman 8/5/20 91 / 108


Extensions — Multivariate Normal Distribution

Fun Fact (which will come up later when we discuss regression): The
conditional distribution of Y given that X = x is also normal. In particular,
 
Y |X = x ∼ Nor( µY + ρ(σY /σX )(x − µX ), σ²Y (1 − ρ²) ).

Information about X helps to update the distribution of Y .

Example: Consider students at a university. Let X be their combined SAT


scores (Math and Verbal), and Y their freshman GPA (out of 4).

ISYE 6739 — Goldsman 8/5/20 91 / 108


Extensions — Multivariate Normal Distribution

Fun Fact (which will come up later when we discuss regression): The
conditional distribution of Y given that X = x is also normal. In particular,
 
Y |X = x ∼ Nor( µY + ρ(σY /σX )(x − µX ), σ²Y (1 − ρ²) ).

Information about X helps to update the distribution of Y .

Example: Consider students at a university. Let X be their combined SAT


scores (Math and Verbal), and Y their freshman GPA (out of 4).
Suppose a study reveals that

µX = 1300, µY = 2.3,

ISYE 6739 — Goldsman 8/5/20 91 / 108


Extensions — Multivariate Normal Distribution

Fun Fact (which will come up later when we discuss regression): The
conditional distribution of Y given that X = x is also normal. In particular,
 
Y |X = x ∼ Nor( µY + ρ(σY /σX )(x − µX ), σ²Y (1 − ρ²) ).

Information about X helps to update the distribution of Y .

Example: Consider students at a university. Let X be their combined SAT


scores (Math and Verbal), and Y their freshman GPA (out of 4).
Suppose a study reveals that

µX = 1300, µY = 2.3,
σ²X = 6400, σ²Y = 0.25, ρ = 0.6.

ISYE 6739 — Goldsman 8/5/20 91 / 108


Extensions — Multivariate Normal Distribution

Fun Fact (which will come up later when we discuss regression): The
conditional distribution of Y given that X = x is also normal. In particular,
 
Y |X = x ∼ Nor( µY + ρ(σY /σX )(x − µX ), σ²Y (1 − ρ²) ).

Information about X helps to update the distribution of Y .

Example: Consider students at a university. Let X be their combined SAT


scores (Math and Verbal), and Y their freshman GPA (out of 4).
Suppose a study reveals that

µX = 1300, µY = 2.3,
σ²X = 6400, σ²Y = 0.25, ρ = 0.6.

Find P (Y ≥ 2|X = 900).

ISYE 6739 — Goldsman 8/5/20 91 / 108


Extensions — Multivariate Normal Distribution

First,

E[Y |X = 900] = µY + ρ(σY /σX )(x − µX )

ISYE 6739 — Goldsman 8/5/20 92 / 108


Extensions — Multivariate Normal Distribution

First,

E[Y |X = 900] = µY + ρ(σY /σX )(x − µX )


= 2.3 + (0.6)(√(0.25/6400))(900 − 1300) = 0.8,

ISYE 6739 — Goldsman 8/5/20 92 / 108


Extensions — Multivariate Normal Distribution

First,

E[Y |X = 900] = µY + ρ(σY /σX )(x − µX )


= 2.3 + (0.6)(√(0.25/6400))(900 − 1300) = 0.8,

indicating that the expected GPA of a kid with 900 SAT’s will be 0.8.

ISYE 6739 — Goldsman 8/5/20 92 / 108


Extensions — Multivariate Normal Distribution

First,

E[Y |X = 900] = µY + ρ(σY /σX )(x − µX )


= 2.3 + (0.6)(√(0.25/6400))(900 − 1300) = 0.8,

indicating that the expected GPA of a kid with 900 SAT’s will be 0.8.

Second,
Var(Y |X = 900) = σY2 (1 − ρ2 ) = 0.16.

ISYE 6739 — Goldsman 8/5/20 92 / 108


Extensions — Multivariate Normal Distribution

First,

E[Y |X = 900] = µY + ρ(σY /σX )(x − µX )


= 2.3 + (0.6)(√(0.25/6400))(900 − 1300) = 0.8,

indicating that the expected GPA of a kid with 900 SAT’s will be 0.8.

Second,
Var(Y |X = 900) = σY2 (1 − ρ2 ) = 0.16.
Thus,
Y |X = 900 ∼ Nor(0.8, 0.16).

ISYE 6739 — Goldsman 8/5/20 92 / 108


Extensions — Multivariate Normal Distribution

First,

E[Y |X = 900] = µY + ρ(σY /σX )(x − µX )


= 2.3 + (0.6)(√(0.25/6400))(900 − 1300) = 0.8,

indicating that the expected GPA of a kid with 900 SAT’s will be 0.8.

Second,
Var(Y |X = 900) = σY2 (1 − ρ2 ) = 0.16.
Thus,
Y |X = 900 ∼ Nor(0.8, 0.16).
Now we can calculate
P (Y ≥ 2|X = 900) = P(Z ≥ (2 − 0.8)/√0.16)

ISYE 6739 — Goldsman 8/5/20 92 / 108


Extensions — Multivariate Normal Distribution

First,

E[Y |X = 900] = µY + ρ(σY /σX )(x − µX )


= 2.3 + (0.6)(√(0.25/6400))(900 − 1300) = 0.8,

indicating that the expected GPA of a kid with 900 SAT’s will be 0.8.

Second,
Var(Y |X = 900) = σY2 (1 − ρ2 ) = 0.16.
Thus,
Y |X = 900 ∼ Nor(0.8, 0.16).
Now we can calculate
P (Y ≥ 2|X = 900) = P(Z ≥ (2 − 0.8)/√0.16) = 1 − Φ(3) = 0.0013.

ISYE 6739 — Goldsman 8/5/20 92 / 108


Extensions — Multivariate Normal Distribution

First,

E[Y |X = 900] = µY + ρ(σY /σX )(x − µX )


= 2.3 + (0.6)(√(0.25/6400))(900 − 1300) = 0.8,

indicating that the expected GPA of a kid with 900 SAT’s will be 0.8.

Second,
Var(Y |X = 900) = σY2 (1 − ρ2 ) = 0.16.
Thus,
Y |X = 900 ∼ Nor(0.8, 0.16).
Now we can calculate
P (Y ≥ 2|X = 900) = P(Z ≥ (2 − 0.8)/√0.16) = 1 − Φ(3) = 0.0013.
This guy doesn’t have much chance of having a good GPA. 2

ISYE 6739 — Goldsman 8/5/20 92 / 108
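The same conditional-normal calculation takes only a few lines of Python (a sketch using SciPy; the numbers are the ones assumed in the example above):

from math import sqrt
from scipy.stats import norm

mu_x, mu_y = 1300, 2.3
var_x, var_y, rho = 6400, 0.25, 0.6

x = 900
cond_mean = mu_y + rho * sqrt(var_y / var_x) * (x - mu_x)   # conditional mean, 0.8
cond_var = var_y * (1 - rho ** 2)                           # conditional variance, 0.16
p = 1 - norm.cdf(2, loc=cond_mean, scale=sqrt(cond_var))    # P(Y >= 2 | X = 900)
print(round(cond_mean, 4), round(cond_var, 4), round(p, 4)) # 0.8, 0.16, about 0.0013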


The bivariate normal distribution is easily generalized to the multivariate case.

Honors Definition: The random vector X = (X1, . . . , Xk)^T has the
multivariate normal distribution with mean vector µ = (µ1, . . . , µk)^T
and k × k covariance matrix Σ = (σij) if it has multivariate pdf

f(x) = (1 / ((2π)^{k/2} |Σ|^{1/2})) exp( −(x − µ)^T Σ^{−1} (x − µ) / 2 ),   x ∈ R^k,

where |Σ| and Σ^{−1} are the determinant and inverse of Σ, respectively.

It turns out that

E[Xi] = µi,   Var(Xi) = σii,   Cov(Xi, Xj) = σij.

Notation: X ∼ Nor_k(µ, Σ).
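As a quick illustration of this notation, the sketch below (with made-up numbers for µ and Σ, using numpy, which the slides do not require) samples from a bivariate case and confirms that the sample mean vector and covariance matrix come out close to the inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a Nor_2(mu, Sigma) example
mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Draw many observations of X ~ Nor_2(mu, Sigma)
X = rng.multivariate_normal(mu, Sigma, size=100_000)

print(X.mean(axis=0))           # approximately mu
print(np.cov(X, rowvar=False))  # approximately Sigma
```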




Extensions — Lognormal Distribution

Lesson 4.13 — Extensions — Lognormal Distribution

Definition: If Y ∼ Nor(µY, σY²), then X ≡ e^Y has the Lognormal
distribution with parameters (µY, σY²). This distribution has tremendous
uses, e.g., in the pricing of certain stock options.

Turns Out: The pdf and moments of the lognormal are

f(x) = (1/(xσY√(2π))) exp( −[ln(x) − µY]² / (2σY²) ),   x > 0,

E[X] = exp( µY + σY²/2 ),

Var(X) = exp( 2µY + σY² ) [ exp(σY²) − 1 ].


Example: Suppose Y ∼ Nor(10, 4) and let X = e^Y. Then

P(X ≤ 1000) = P( Y ≤ ln(1000) )
            = P( Z ≤ (ln(1000) − 10)/2 )
            = Φ(−1.55) = 0.061. □

Honors Example (How to Win a Nobel Prize): It is well-known that
stock prices are closely related to the lognormal distribution. In fact, it's
common to use the following model for a stock price at a fixed time t,

S(t) = S(0) exp( (µ − σ²/2) t + σ√t Z ),   t ≥ 0,

where µ is related to the "drift" of the stock price (i.e., the natural rate of
increase), σ is its "volatility" (how much the stock bounces around), S(0) is
the initial price, and Z is a standard normal RV.
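A one-line numerical check of this lognormal probability, written as a small Python sketch (it simply re-expresses the event on the log scale):

```python
from math import log, sqrt
from scipy.stats import norm

mu_Y, var_Y = 10, 4   # Y ~ Nor(10, 4), X = e^Y

# P(X <= 1000) = P(Y <= ln 1000) = Phi((ln 1000 - 10)/2)
z = (log(1000) - mu_Y) / sqrt(var_Y)
print(z, norm.cdf(z))   # about -1.55 and 0.061
```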


An active area of finance is to estimate option prices. For example, a
so-called European call option C permits its owner, who pays an up-front
fee for the privilege, to purchase the stock at a pre-agreed strike price k, at a
pre-determined expiry date T.

For instance, suppose IBM is currently selling for $100 a share. If I think that
the stock will go up in value, I may want to pay $3/share now for the right to
buy IBM at $105 three months from now.

If IBM is worth $120 three months from now, I'll be able to buy it for
only $105, and will have made a profit of $120 − $105 − $3 = $12.
If IBM is selling for $107 three months hence, I can still buy it for $105,
and will lose $1 (recouping $2 from my original option purchase).
If IBM is selling for $95, then I won't exercise my option, and will walk
away with my tail between my legs having lost my original $3.


So what is the option worth (and what should I pay for it)? Its expected value
is given by

E[C] = e^{−rT} E[ (S(T) − k)^+ ],

where
x^+ ≡ max{0, x}.
r is the "risk-free" interest rate (e.g., what you can get from a U.S.
Treasury bond). This is used instead of the drift µ.
The term e^{−rT} denotes the time-value of money, i.e., a depreciation term
corresponding to the interest I could've made had I used my money to
buy a Treasury note.

Black and Scholes won a Nobel Prize for calculating E[C]. We'll get the
same answer via a different method — but, alas, no Nobel Prize.


E[C] = e^{−rT} E[ ( S(0) exp( (r − σ²/2)T + σ√T Z ) − k )^+ ]

     = e^{−rT} ∫_{−∞}^{∞} ( S(0) exp( (r − σ²/2)T + σ√T z ) − k )^+ φ(z) dz
       (via the standard conditioning argument)

     = S(0) Φ(b + σ√T) − k e^{−rT} Φ(b)   (after lots of algebra),

where φ(·) and Φ(·) are the Nor(0,1) pdf and cdf, respectively, and

b ≡ [ rT − σ²T/2 − ln(k/S(0)) ] / (σ√T).

There are many generalizations of this problem that are used in practical
finance problems, but this is the starting point. Meanwhile, get your tickets to
Norway or Sweden or wherever they give out the Nobel!
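Here is a small sketch (not from the slides, with made-up numerical inputs) that codes the closed-form call price above and compares it with a Monte Carlo estimate of e^{−rT} E[(S(T) − k)^+] obtained by simulating S(T) directly.

```python
from math import exp, log, sqrt
import numpy as np
from scipy.stats import norm

def call_price(S0, k, r, sigma, T):
    """European call: S(0)*Phi(b + sigma*sqrt(T)) - k*exp(-r*T)*Phi(b)."""
    b = (r * T - 0.5 * sigma**2 * T - log(k / S0)) / (sigma * sqrt(T))
    return S0 * norm.cdf(b + sigma * sqrt(T)) - k * exp(-r * T) * norm.cdf(b)

# Hypothetical inputs: initial price, strike, risk-free rate, volatility, expiry
S0, k, r, sigma, T = 100.0, 105.0, 0.05, 0.20, 0.25

# Monte Carlo check: S(T) = S(0) exp((r - sigma^2/2)T + sigma*sqrt(T)*Z)
rng = np.random.default_rng(1)
Z = rng.standard_normal(1_000_000)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * sqrt(T) * Z)
mc = exp(-r * T) * np.maximum(ST - k, 0).mean()

print(call_price(S0, k, r, sigma, T), mc)   # the two estimates should be close
```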




Computer Stuff

Lesson 4.14 — Computer Stuff

Evaluating pmf's / pdf's and cdf's

We can use various computer packages such as Excel, Minitab, R, SAS, etc.,
to calculate pmf's / pdf's and cdf's for a variety of common distributions. For
instance, in Excel, we find the functions:

BINOMDIST = Binomial distribution
EXPONDIST = Exponential
NEGBINOMDIST = Negative Binomial
NORMDIST and NORMSDIST = Normal and Standard Normal
POISSON = Poisson

Functions such as NORMSINV and TINV can calculate the inverses of the
standard normal and t distributions, respectively.
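If you are working in Python rather than Excel, rough scipy.stats counterparts of these functions look like the sketch below; the mapping is only approximate (for instance, Excel's TINV is two-tailed) and the numerical arguments are arbitrary.

```python
from scipy.stats import binom, expon, nbinom, norm, poisson, t

print(binom.cdf(3, n=10, p=0.25))               # ~ BINOMDIST (cumulative)
print(expon.cdf(2.0, scale=1 / 0.5))            # ~ EXPONDIST with rate lambda = 0.5
print(nbinom.pmf(4, n=3, p=0.4))                # ~ NEGBINOMDIST
print(norm.cdf(1.96))                           # ~ NORMSDIST
print(norm.cdf(110, loc=100, scale=15))         # ~ NORMDIST
print(poisson.pmf(2, mu=3.5))                   # ~ POISSON
print(norm.ppf(0.975), t.ppf(0.975, df=10))     # one-tailed inverses (cf. NORMSINV, TINV)
```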


Simulating Random Variables

Motivation: Simulations are used to evaluate a variety of real-world
processes that contain inherent randomness, e.g., queueing systems, inventory
systems, manufacturing systems, etc. In order to run simulations, you need to
generate various RVs, e.g., arrival times, service times, failure times, etc.

Examples: There are numerous ways to simulate RVs.

The Excel function RAND simulates a Unif(0,1) RV.
If U1, U2 are iid Unif(0,1), then U1 + U2 is Triangular(0,1,2). This is simply
RAND()+RAND() in Excel.
The Inverse Transform Theorem gives (−1/λ) ln(U) ∼ Exp(λ).
If U1, U2, . . . , Uk are iid Unif(0,1), then Σ_{i=1}^{k} (−1/λ) ln(Ui) is Erlang_k(λ),
because it is the sum of iid Exp(λ)'s.
It can be shown that X = ⌈ln(U)/ln(1 − p)⌉ ∼ Geom(p), where ⌈·⌉ is
the "ceiling" (integer round-up) function.
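A minimal Python sketch of these uniform-based generators (the rate λ, success probability p, and number of Erlang stages k are made up for the demo; this is an illustration rather than the course's prescribed tool):

```python
import math
import random

random.seed(42)
lam, p, k = 2.0, 0.3, 5   # hypothetical rate, success probability, Erlang stages

U = random.random()                        # Unif(0,1), like RAND()
tri = random.random() + random.random()    # Triangular(0,1,2)
expo = -math.log(random.random()) / lam    # Exp(lambda) via inverse transform
erlang = sum(-math.log(random.random()) / lam for _ in range(k))  # Erlang_k(lambda)
geom = math.ceil(math.log(random.random()) / math.log(1 - p))     # Geom(p)

print(U, tri, expo, erlang, geom)
```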
The simulation of RVs is actually the topic of another course, but we will end
this module with a remarkable method for generating normal RVs.

Theorem (Box and Muller): If U1, U2 are iid Unif(0, 1), then

Z1 = √(−2 ln(U1)) cos(2πU2)   and   Z2 = √(−2 ln(U1)) sin(2πU2)

are iid Nor(0,1).

Example: Suppose that U1 = 0.3 and U2 = 0.8 are realizations of two iid
Unif(0,1)'s. Box–Muller gives the following two iid standard normals.

Z1 = √(−2 ln(U1)) cos(2πU2) = 0.480
Z2 = √(−2 ln(U1)) sin(2πU2) = −1.476. □
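A short Python sketch of the Box–Muller transform that reproduces the example values; note that the trig functions are evaluated in radians, as required.

```python
import math

def box_muller(u1, u2):
    """Map two iid Unif(0,1) realizations to two iid Nor(0,1) realizations."""
    r = math.sqrt(-2 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

z1, z2 = box_muller(0.3, 0.8)
print(round(z1, 3), round(z2, 3))   # 0.48 and -1.476
```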


Remarks:
There are many other ways to generate Nor(0,1)'s, but this is perhaps the
easiest.
It is essential that the cosine and sine be calculated in radians, not
degrees.
To get X ∼ Nor(µ, σ²) from Z ∼ Nor(0, 1), just take X = µ + σZ.
Amazingly, it's "Muller", not "Müller".
See https://www.youtube.com/watch?v=nntGTK2Fhb0


Honors Proof: We follow the method from Module 3 to calculate the joint
pdf of a function of two RVs. Namely, if we can express U1 = k1(Z1, Z2) and
U2 = k2(Z1, Z2) for some functions k1(·,·) and k2(·,·), then the joint pdf of
(Z1, Z2) is given by

g(z1, z2) = fU1(k1(z1, z2)) fU2(k2(z1, z2)) | ∂u1/∂z1 · ∂u2/∂z2 − ∂u2/∂z1 · ∂u1/∂z2 |

          = | ∂u1/∂z1 · ∂u2/∂z2 − ∂u2/∂z1 · ∂u1/∂z2 |   (U1 and U2 are iid Unif(0,1)).

In order to obtain the functions k1(Z1, Z2) and k2(Z1, Z2), note that

Z1² + Z2² = −2 ln(U1) [ cos²(2πU2) + sin²(2πU2) ] = −2 ln(U1),

so that

U1 = e^{−(Z1² + Z2²)/2}.


This immediately implies that

Z1² = −2 ln(U1) cos²(2πU2)
    = −2 ln( e^{−(Z1² + Z2²)/2} ) cos²(2πU2)
    = (Z1² + Z2²) cos²(2πU2),

so that

U2 = (1/(2π)) arccos( ±√( Z1²/(Z1² + Z2²) ) ) = (1/(2π)) arccos( √( Z1²/(Z1² + Z2²) ) ),

where we (non-rigorously) get rid of the "±" to balance off the fact that the
range of y = arccos(x) is only regarded to be 0 ≤ y ≤ π (not 0 ≤ y ≤ 2π).


Now some derivative fun:

∂u1/∂zi = ∂/∂zi e^{−(z1² + z2²)/2} = −zi e^{−(z1² + z2²)/2},   i = 1, 2,

∂u2/∂z1 = ∂/∂z1 [ (1/(2π)) arccos( √( z1²/(z1² + z2²) ) ) ]

        = (1/(2π)) · ( −1 / √( 1 − z1²/(z1² + z2²) ) ) · ∂/∂z1 √( z1²/(z1² + z2²) )   (chain rule)

        = (1/(2π)) · ( −1 / √( z2²/(z1² + z2²) ) ) · (1/2) ( z1²/(z1² + z2²) )^{−1/2} · ∂/∂z1 ( z1²/(z1² + z2²) )   (chain rule again)

        = ( −(z1² + z2²) / (4π z1 z2) ) · ( 2 z1 z2² / (z1² + z2²)² )

        = −z2 / (2π(z1² + z2²)),   and

∂u2/∂z2 = z1 / (2π(z1² + z2²))   (after similar algebra).
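If you would rather not grind through the chain rule by hand, these derivatives can be checked symbolically; here is a hedged sketch using sympy (not part of the slides), where the positive=True assumption plays the role of dropping the "±" discussed earlier.

```python
import sympy as sp

z1, z2 = sp.symbols('z1 z2', positive=True)

u1 = sp.exp(-(z1**2 + z2**2) / 2)
u2 = sp.acos(sp.sqrt(z1**2 / (z1**2 + z2**2))) / (2 * sp.pi)

# The two partials computed by hand above:
print(sp.simplify(sp.diff(u2, z1)))   # expect -z2/(2*pi*(z1**2 + z2**2))
print(sp.simplify(sp.diff(u2, z2)))   # expect  z1/(2*pi*(z1**2 + z2**2))

# The Jacobian expression; expect -exp(-(z1**2 + z2**2)/2)/(2*pi), whose
# absolute value is the product of two Nor(0,1) pdfs.
J = sp.diff(u1, z1) * sp.diff(u2, z2) - sp.diff(u2, z1) * sp.diff(u1, z2)
print(sp.simplify(J))
```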


Then we finally have

g(z1, z2) = | ∂u1/∂z1 · ∂u2/∂z2 − ∂u2/∂z1 · ∂u1/∂z2 |

          = | −z1 e^{−(z1² + z2²)/2} · z1/(2π(z1² + z2²)) − z2 e^{−(z1² + z2²)/2} · z2/(2π(z1² + z2²)) |

          = (1/(2π)) e^{−(z1² + z2²)/2} = ( (1/√(2π)) e^{−z1²/2} ) ( (1/√(2π)) e^{−z2²/2} ),

which is the product of two iid Nor(0,1) pdf's, so we are done!
