Distributions
Dave Goldsman
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
8/5/20
The module will be a compendium of results, some of which we'll prove, and
some of which we've already seen previously.
In the next few lessons we’ll discuss some important discrete distributions:
Bernoulli and Binomial Distributions
Hypergeometric Distribution
Geometric and Negative Binomial Distributions
Poisson Distribution
Example: Toss 2 dice and take the sum; repeat 5 times. Let Y be the number
of 7's you see. Y ∼ Bin(5, 1/6). Then, e.g.,

P(Y = 4) = \binom{5}{4} (1/6)^4 (5/6)^{5−4}. □

Theorem: If X_1, . . . , X_n are iid Bern(p), then Y ≡ \sum_{i=1}^n X_i ∼ Bin(n, p).
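As a quick sanity check, the binomial probability above can be computed with nothing beyond the Python standard library (`math.comb` is the binomial coefficient); this is a minimal sketch, not part of the lecture:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(Y = 4) for Y ~ Bin(5, 1/6): five repetitions, success probability 1/6
p_four_sevens = binom_pmf(4, 5, 1/6)
print(round(p_four_sevens, 6))  # ≈ 0.003215
```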
We've already seen that E[Y] = np. Similarly,

Var(Y) = npq.
Let X be the number of type 1's selected. Then X has the Hypergeometric
distribution with pmf

P(X = k) = \binom{a}{k}\binom{b}{n−k} / \binom{a+b}{n}, k = max(0, n − b), . . . , min(a, n).
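A small sketch (the parameter values a = 7, b = 5, n = 6 are illustrative choices, not from the lecture) verifying that this pmf sums to 1 over its support:

```python
from math import comb

def hyper_pmf(k, a, b, n):
    """P(X = k): k type-1 items when drawing n from a type-1's and b type-2's."""
    return comb(a, k) * comb(b, n - k) / comb(a + b, n)

# Illustrative numbers: a = 7 type-1 items, b = 5 type-2 items, draw n = 6.
a, b, n = 7, 5, 6
support = range(max(0, n - b), min(a, n) + 1)
total = sum(hyper_pmf(k, a, b, n) for k in support)
print(total)  # the pmf sums to 1 over the support
```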
Let Z equal the number of trials until the first success is obtained. The event
Z = k corresponds to k − 1 failures, and then a success. Thus,

P(Z = k) = q^{k−1} p, k = 1, 2, . . . ,

Notation: Z ∼ Geom(p).

We'll get the mean and variance of the Geometric via the mgf. . . .
M_Z(t) = pe^t / (1 − qe^t), for t < ln(1/q).

Proof:

M_Z(t) = E[e^{tZ}]
       = \sum_{k=1}^∞ e^{tk} q^{k−1} p
       = pe^t \sum_{k=0}^∞ (qe^t)^k
       = pe^t / (1 − qe^t), for qe^t < 1. □
Proof:

E[Z] = (d/dt) M_Z(t) |_{t=0}
     = pe^t / (1 − qe^t)^2 |_{t=0}
     = p / (1 − q)^2 = 1/p. □

Remark: We could also have proven this directly from the definition of
expected value.
E[Z^2] = (d^2/dt^2) M_Z(t) |_{t=0} = (2 − p)/p^2,

and then

Var(Z) = E[Z^2] − (E[Z])^2 = q/p^2. □
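These moment formulas can be checked numerically by truncating the defining infinite sums of the pmf (p = 0.3 is an arbitrary choice for the check):

```python
p = 0.3
q = 1 - p

# Truncate the infinite sums; q^k dies off so fast that 10,000 terms suffice.
mean = sum(k * q**(k - 1) * p for k in range(1, 10_001))
second_moment = sum(k**2 * q**(k - 1) * p for k in range(1, 10_001))
var = second_moment - mean**2

print(mean, 1 / p)    # both ≈ 3.3333
print(var, q / p**2)  # both ≈ 7.7778
```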
Then

P(Z > s + t | Z > s) = P(Z > s + t ∩ Z > s) / P(Z > s)
                     = P(Z > s + t) / P(Z > s)
                     = q^{s+t} / q^s   (by (∗))
                     = q^t
                     = P(Z > t). □
Example: Let's toss a die until a 5 appears for the first time. Suppose that
we've already made 4 tosses without success. What's the probability that
we'll need more than 2 additional tosses before we observe a 5? By the
memoryless property, this is just P(Z > 2) = q^2 = (5/6)^2. □

Fun Fact: The Geom(p) is the only discrete distribution with the
memoryless property.
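The memoryless property for the die example is easy to confirm numerically using the tail formula P(Z > k) = q^k from the lesson:

```python
q = 5 / 6  # probability of "no 5" on one toss of a fair die

# P(Z > k) = q^k for the Geometric (the tail formula (*) from the lesson)
tail = lambda k: q**k

cond = tail(4 + 2) / tail(4)  # P(Z > 6 | Z > 4)
print(cond, tail(2))          # both equal (5/6)^2 ≈ 0.6944
```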
Now let W equal the number of trials until the rth success is obtained.
W = r, r + 1, . . .. The event W = k corresponds to exactly r − 1 successes
by time k − 1, and then the rth success at time k.

We say that W has the Negative Binomial distribution (aka the Pascal
distribution) with parameters r and p.

Remark: As with the Geom(p), the exact definition of the NegBin depends
on what book you're reading.
Theorem: If Z_1, . . . , Z_r are iid Geom(p), then W = \sum_{i=1}^r Z_i ∼ NegBin(r, p).

In other words, Geometrics add up to a NegBin.

Proof: Won't do it here, but you can use the mgf technique. □

Anyhow, it makes sense if you think of Z_i as the number of trials after the
(i − 1)st success up to and including the ith success.
Just to be complete, let's get the pmf of W. Note that W = k iff you get
exactly r − 1 successes by time k − 1, and then the rth success at time k.
So. . .

P(W = k) = \binom{k−1}{r−1} p^{r−1} q^{k−r} p = \binom{k−1}{r−1} p^r q^{k−r}, k = r, r + 1, . . .

Example: Toss a die until a 5 appears for the third time. What's the
probability that we'll need exactly 7 tosses? Here r = 3 and p = 1/6, so
P(W = 7) = \binom{6}{2} (1/6)^3 (5/6)^4. □
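The NegBin pmf above, applied to this die example (r = 3, p = 1/6, k = 7), works out as follows in a short stdlib-only sketch:

```python
from math import comb

def negbin_pmf(k, r, p):
    """P(W = k): the rth success occurs on trial k."""
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

prob = negbin_pmf(7, 3, 1/6)
print(round(prob, 4))  # ≈ 0.0335
```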
X_1, . . . , X_n iid Bern(p) ⇒ Y ≡ \sum_{i=1}^n X_i ∼ Bin(n, p).

Z_1, . . . , Z_r iid Geom(p) ⇒ W ≡ \sum_{i=1}^r Z_i ∼ NegBin(r, p).
Let λ > 0 be the average number of occurrences per unit time (or length or
volume).
First, some notation: o(h) is a generic function that goes to zero faster than h
goes to zero.

(i) There is a short enough interval of time h, such that, for all t,

P(N(t + h) − N(t) = 0) = 1 − λh + o(h)
P(N(t + h) − N(t) = 1) = λh + o(h)
P(N(t + h) − N(t) ≥ 2) = o(h)

(iii) If a < b < c < d, then the two "increments" N(d) − N(c) and
N(b) − N(a) are independent RVs.
ISYE 6739 — Goldsman 8/5/20 24 / 108
Poisson Distribution
(i) Arrivals basically occur one-at-a-time, at rate λ per unit time. (We
must make sure that λ doesn't change over time.)

(iii) The numbers of arrivals in two disjoint time intervals are independent.
Notation: X ∼ Pois(λ).

Proof: The proof follows from the PP assumptions and involves some simple
differential equations.

To begin with, let's define P_x(t) ≡ P(N(t) = x), i.e., the probability of
exactly x arrivals by time t.

Note that the probability that there haven't been any arrivals by time t + h can
be written in terms of the probability that there haven't been any arrivals by
time t. . . .
P_0(t + h) = P(N(t + h) = 0)
 = P(no arrivals by time t, and then no arrivals between times t and t + h)
 = P({N(t) = 0} ∩ {N(t + h) − N(t) = 0})
 = P(N(t) = 0) P(N(t + h) − N(t) = 0)   (by indep increments (iii))
 ≈ P(N(t) = 0)(1 − λh)   (by (i) and a little bit of (ii))
 = P_0(t)(1 − λh).

Thus,

[P_0(t + h) − P_0(t)]/h ≈ −λP_0(t).

Taking the limit as h → 0, we have P_0′(t) = −λP_0(t), so that
P_0(t) = e^{−λt} (since P_0(0) = 1).
P_x(t + h) = P(N(t + h) = x)
 = P(N(t + h) = x and no arrivals during [t, t + h])
   + P(N(t + h) = x and ≥ 1 arrival during [t, t + h])   (Law of Total Probability)
 ≈ P({N(t) = x} ∩ {N(t + h) − N(t) = 0})
   + P({N(t) = x − 1} ∩ {N(t + h) − N(t) = 1})   (by (i), only consider the case of one arrival in [t, t + h])
 = P(N(t) = x) P(N(t + h) − N(t) = 0)
   + P(N(t) = x − 1) P(N(t + h) − N(t) = 1)   (by independent increments (iii))
 ≈ P_x(t)(1 − λh) + P_{x−1}(t)λh.
P_x(t) = (λt)^x e^{−λt} / x!, x = 0, 1, 2, . . . .

(If you don't believe me, just plug in and see for yourself!)

Examples:

X = # calls to a switchboard in 1 minute ∼ Pois(3 / min)
Y = # calls to a switchboard in 5 minutes ∼ Pois(15 / 5 min)
Z = # calls to a switchboard in 10 sec ∼ Pois(0.5 / 10 sec)
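To "plug in and see for yourself" numerically, here is a minimal sketch of the Poisson pmf applied to the three switchboard rates above; each pmf sums to 1 and has mean equal to its λ:

```python
from math import exp, factorial

def pois_pmf(x, lam):
    """P(N = x) for N ~ Pois(lam)."""
    return lam**x * exp(-lam) / factorial(x)

# lambda = 3 calls/min, scaled to 1 min, 5 min, and 10 sec intervals
for lam in (3, 15, 0.5):
    total = sum(pois_pmf(x, lam) for x in range(60))
    mean = sum(x * pois_pmf(x, lam) for x in range(60))
    print(lam, total, mean)  # total ≈ 1, mean ≈ lam
```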
Theorem: X ∼ Pois(λ) ⇒ mgf is M_X(t) = e^{λ(e^t − 1)}.

Proof:

M_X(t) = E[e^{tX}] = \sum_{k=0}^∞ e^{tk} e^{−λ}λ^k / k! = e^{−λ} \sum_{k=0}^∞ (λe^t)^k / k! = e^{−λ} e^{λe^t}. □

Then

E[X] = (d/dt) M_X(t) |_{t=0} = (d/dt) e^{λ(e^t − 1)} |_{t=0} = λe^t M_X(t) |_{t=0} = λ.
Similarly,

E[X^2] = (d^2/dt^2) M_X(t) |_{t=0}
 = (d/dt)[(d/dt) M_X(t)] |_{t=0}
 = (d/dt)[λe^t M_X(t)] |_{t=0}
 = λ[e^t M_X(t) + e^t (d/dt) M_X(t)] |_{t=0}
 = λ[e^t M_X(t) + λe^{2t} M_X(t)] |_{t=0}
 = λ(1 + λ),

and then Var(X) = E[X^2] − (E[X])^2 = λ(1 + λ) − λ^2 = λ. □
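The two mgf derivatives can be sanity-checked with central finite differences rather than symbolic calculus (λ = 2 is an arbitrary choice for the check):

```python
from math import exp

lam = 2.0                              # arbitrary rate for the check
M = lambda t: exp(lam * (exp(t) - 1))  # Poisson mgf

h = 1e-4
EX  = (M(h) - M(-h)) / (2 * h)          # ≈ M'(0)  = E[X]
EX2 = (M(h) - 2 * M(0) + M(-h)) / h**2  # ≈ M''(0) = E[X^2]

print(EX, EX2 - EX**2)  # ≈ lam and ≈ lam (mean and variance are both lambda)
```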
The total number of arrivals is Pois(λ_1 + λ_2 = 8/hr), and so the total in the
next 30 minutes is X ∼ Pois(4). So P(X = 2) = e^{−4} 4^2 / 2!. □
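The arithmetic for that last probability, as a one-liner sketch:

```python
from math import exp, factorial

lam = 4  # 8 arrivals/hr over half an hour
p_two = lam**2 * exp(-lam) / factorial(2)
print(round(p_two, 4))  # ≈ 0.1465
```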
Definition: The RV X has the Uniform distribution if it has pdf and cdf

f(x) = { 1/(b − a),  a ≤ x ≤ b
       { 0,          otherwise

and

F(x) = { 0,                x < a
       { (x − a)/(b − a),  a ≤ x ≤ b
       { 1,                x ≥ b.
Proof:

P(X > s + t | X > s) = P(X > s + t ∩ X > s) / P(X > s)
 = P(X > s + t) / P(X > s)
 = e^{−λ(s+t)} / e^{−λs}
 = e^{−λt} = P(X > t). □
Example: If the time to the next bus is exponentially distributed with a mean
of 10 minutes, and you've already been waiting 20 minutes, you can expect to
wait 10 more. □

Remark: Look at E[X] and Var(X) for the Geometric distribution and see
how they're similar to those for the exponential. (Not a coincidence.)
Definition: If X is a cts RV with pdf f(x) and cdf F(x), then its failure
rate function is

S(t) ≡ f(t)/P(X > t) = f(t)/(1 − F(t)),

which can loosely be regarded as X's instantaneous rate of death, given that it
has so far survived to time t.
Theorem: Let X be the amount of time until the first arrival in a Poisson
process with rate λ. Then X ∼ Exp(λ).

Proof: P(X > t) = P(no arrivals by time t) = P_0(t) = e^{−λt}, which is
exactly the Exp(λ) survival function. □
Example: Suppose that arrivals to a shopping center are from a PP with rate
λ = 20/hr. What's the probability that the time between the 13th and 14th
customers will be at least 4 minutes?

Let the time between customers 13 and 14 be X. Since we have a PP, the
interarrivals are iid Exp(λ = 20/hr), so

P(X > 4 min) = P(X > 1/15 hr) = e^{−20/15} = e^{−4/3} ≈ 0.2636. □
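The only trap here is the unit conversion (4 minutes = 1/15 hour), so a short numerical sketch is worth having:

```python
from math import exp

lam = 20           # customers per hour
t = 4 / 60         # 4 minutes, expressed in hours
p = exp(-lam * t)  # P(X > t) for X ~ Exp(lam)
print(round(p, 4))  # ≈ 0.2636
```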
Definition/Theorem: Suppose X_1, . . . , X_k are iid Exp(λ), and let
S = \sum_{i=1}^k X_i. Then S has the Erlang distribution with parameters k and
λ, denoted S ∼ Erlang_k(λ).

f(s) = λ^k e^{−λs} s^{k−1} / (k − 1)!, s ≥ 0, and

F(s) = 1 − \sum_{j=0}^{k−1} e^{−λs} (λs)^j / j!, s ≥ 0.

Notice that the cdf is the sum of a bunch of Poisson probabilities. (Won't do it
here, but this observation helps in the derivation of the cdf.)
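The "sum of Poisson probabilities" form of the cdf can be checked against a brute-force integration of the pdf; the parameters k = 3, λ = 2, s = 1.5 are arbitrary choices for the check:

```python
from math import exp, factorial

k, lam, s = 3, 2.0, 1.5  # arbitrary parameters for the check

# cdf expressed as a sum of Poisson probabilities
F = 1 - sum(exp(-lam * s) * (lam * s)**j / factorial(j) for j in range(k))

# brute-force integral of the Erlang pdf over [0, s] (midpoint rule)
pdf = lambda x: lam**k * exp(-lam * x) * x**(k - 1) / factorial(k - 1)
n = 200_000
dx = s / n
riemann = sum(pdf((i + 0.5) * dx) for i in range(n)) * dx

print(F, riemann)  # the two agree
```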
E[S] = k/λ, Var(S) = k/λ^2, and

M_S(t) = (λ/(λ − t))^k, for t < λ.
f(x) = λ^α e^{−λx} x^{α−1} / Γ(α), x ≥ 0,

where

Γ(α) ≡ \int_0^∞ t^{α−1} e^{−t} dt

is the gamma function.
E[X] = (a + b + c)/3 and Var(X) = mess.
f(x) = [Γ(a + b)/(Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1}, 0 < x < 1.

E[X] = a/(a + b) and Var(X) = ab/[(a + b)^2 (a + b + 1)].

This distribution gets its name from the beta function, which is defined as

β(a, b) ≡ Γ(a)Γ(b)/Γ(a + b) = \int_0^1 x^{a−1} (1 − x)^{b−1} dx.
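The beta-function identity is easy to confirm numerically with `math.gamma` and a midpoint Riemann sum (a = 2.5, b = 4 are arbitrary shape parameters for the check):

```python
from math import gamma

a, b = 2.5, 4.0  # arbitrary shape parameters

beta_exact = gamma(a) * gamma(b) / gamma(a + b)

# midpoint Riemann sum of x^(a-1) (1-x)^(b-1) on (0, 1)
n = 200_000
dx = 1 / n
beta_num = sum(((i + 0.5) * dx)**(a - 1) * (1 - (i + 0.5) * dx)**(b - 1)
               for i in range(n)) * dx

print(beta_exact, beta_num)  # the two agree
```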
The Beta distribution is very flexible. Here are a few family portraits.
f(x) = 1/[π(1 + x^2)] and F(x) = 1/2 + arctan(x)/π, x ∈ R.
t distribution — coming up

F distribution — coming up

Etc. . . .
f(x) = (1/\sqrt{2πσ^2}) exp[−(x − µ)^2 / (2σ^2)], ∀x ∈ R.
The pdf f(x) is "bell-shaped" and symmetric around x = µ, with tails falling
off quickly as you move away from µ.
Fun Fact (1): \int_R f(x) dx = 1.
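Fun Fact (1) can be verified numerically; the parameters µ = 1, σ = 2 are arbitrary, and integrating over ±10σ captures essentially all of the mass:

```python
from math import exp, pi, sqrt

mu, sigma = 1.0, 2.0  # arbitrary parameters for the check
f = lambda x: exp(-(x - mu)**2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

# midpoint rule over [mu - 10*sigma, mu + 10*sigma]; tails beyond are negligible
lo, hi, n = mu - 10 * sigma, mu + 10 * sigma, 200_000
dx = (hi - lo) / n
total = sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx
print(total)  # ≈ 1
```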
Let Y ≡ \sum_{i=1}^n a_i X_i + b. Then

M_Y(t) = e^{tb} \prod_{i=1}^n M_{a_i X_i}(t)   (X_i's independent)
 = e^{tb} \prod_{i=1}^n M_{X_i}(a_i t)   (mgf of linear function)
 = e^{tb} \prod_{i=1}^n exp[µ_i(a_i t) + (1/2)σ_i^2 (a_i t)^2]   (normal mgf)
 = exp[(\sum_{i=1}^n µ_i a_i + b)t + (1/2)(\sum_{i=1}^n a_i^2 σ_i^2)t^2],

which is the mgf of the Nor(\sum_i a_i µ_i + b, \sum_i a_i^2 σ_i^2) distribution.
and

Var(2X − 3Y) = 4Var(X) + 9Var(Y) = 70.

Thus, 2X − 3Y ∼ Nor(−6, 70). 2
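This slide only shows the conclusion, so for an illustrative Monte Carlo check the parameters below are assumptions consistent with the stated answer: X ∼ Nor(3, 4) and Y ∼ Nor(4, 6), independent, which indeed give 2(3) − 3(4) = −6 and 4(4) + 9(6) = 70.

```python
import random

random.seed(0)
N = 200_000
# assumed parameters: X ~ Nor(3, 4), Y ~ Nor(4, 6) independent, so 2X - 3Y ~ Nor(-6, 70);
# note random.gauss takes the standard deviation, not the variance
vals = [2 * random.gauss(3, 2) - 3 * random.gauss(4, 6 ** 0.5) for _ in range(N)]
m = sum(vals) / N
v = sum((x - m) ** 2 for x in vals) / N
print(round(m, 2), round(v, 2))  # should land near -6 and 70
```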
ISYE 6739 — Goldsman 8/5/20 60 / 108
Normal Distribution: Basics
X ∼ Nor(µ, σ 2 ) ⇒ aX + b ∼ Nor(aµ + b, a2 σ 2 ).
The Nor(0, 1) is nice because there are tables available for its cdf.

You can standardize any normal RV X into a standard normal by applying the transformation Z = (X − µ)/σ. Then you can use the cdf tables.
Remarks: The following results are easy to derive, usually via symmetry arguments.

P(Z ≤ a) = Φ(a)
P(Z ≥ b) = 1 − Φ(b)
P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
Φ(0) = 1/2
Φ(−b) = P(Z ≤ −b) = P(Z ≥ b) = 1 − Φ(b)
P(−b ≤ Z ≤ b) = Φ(b) − Φ(−b) = 2Φ(b) − 1

Then

P(µ − kσ ≤ X ≤ µ + kσ) = P(−k ≤ Z ≤ k)
Famous Nor(0, 1) table values. You can memorize these. Or you can use software calls, like NORMDIST in Excel (which calculates the cdf for any normal distribution).

z       Φ(z) = P(Z ≤ z)
0.00    0.5000
1.00    0.8413
1.28    0.8997 ≈ 0.90
1.645   0.9500
1.96    0.9750
2.33    0.9901 ≈ 0.99
3.00    0.9987
4.00    ≈ 1.0000
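You don't actually need a printed table: the standard normal cdf can be computed from the error function via Φ(z) = (1 + erf(z/√2))/2. A short sketch reproducing the table above:

```python
from math import erf, sqrt

def Phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

for z in (0.00, 1.00, 1.28, 1.645, 1.96, 2.33, 3.00, 4.00):
    print(z, round(Phi(z), 4))
```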
By the earlier "Fun Facts" and then the discussion on the last two pages, the probability that any normal RV is within k standard deviations of its mean is

P(µ − kσ ≤ X ≤ µ + kσ) = 2Φ(k) − 1.

There is about a 95% chance that a normal observation will be within 2 s.d.'s of its mean (2Φ(2) − 1 = 0.9545).
Famous Inverse Nor(0, 1) table values. Can also use software, such as Excel's NORMINV function, which actually calculates inverses for any normal distribution, not just standard normal.

Φ⁻¹(p) is the value of z such that Φ(z) = p. Φ⁻¹(p) is called the pth quantile of Z.

p       Φ⁻¹(p)
0.90    1.28
0.95    1.645
0.975   1.96
0.99    2.33
0.995   2.58
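If no inverse table (or NORMINV) is handy, Φ⁻¹ can be recovered by bisection, since Φ is strictly increasing; a minimal sketch:

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def Phi_inv(p):
    # bisection for the z with Phi(z) = p, 0 < p < 1
    lo, hi = -10.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for p in (0.90, 0.95, 0.975, 0.99, 0.995):
    print(p, round(Phi_inv(p), 3))
```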
Find the probability that the woman is taller than the man.
In the upcoming statistics portion of the course, we’ll learn that this makes X̄
an excellent estimator for the mean µ, which is typically unknown in
practice.
Sample Mean of Normals
Example: Suppose that X1, . . . , Xn are iid Nor(µ, 16). Find the sample size n such that

P(|X̄ − µ| ≤ 1) ≥ 0.95.

How many observations should you take so that X̄ will have a good chance of being close to µ?

P(|X̄ − µ| ≤ 1) = P(−1 ≤ X̄ − µ ≤ 1)

= P(−1/(4/√n) ≤ (X̄ − µ)/(4/√n) ≤ 1/(4/√n))

= P(−√n/4 ≤ Z ≤ √n/4)

= 2Φ(√n/4) − 1.
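To finish the example numerically (a sketch, not part of the original slides): we need 2Φ(√n/4) − 1 ≥ 0.95, i.e., √n/4 ≥ Φ⁻¹(0.975) ≈ 1.96, so n ≥ (4 × 1.96)² ≈ 61.5, giving n = 62. A direct search confirms this:

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# smallest n with P(|Xbar - mu| <= 1) = 2*Phi(sqrt(n)/4) - 1 >= 0.95
n = 1
while 2 * Phi(sqrt(n) / 4) - 1 < 0.95:
    n += 1
print(n)  # 62
```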
Slightly Honors Informal Proof: Suppose that the mgf M_X(t) of the Xi's exists and satisfies certain technical conditions that you don't need to know about. (OK, M_X(t) has to exist around t = 0, among other things.)

We will be done if we can show that the mgf of Zn converges to the mgf of Z ∼ Nor(0, 1), i.e., we need to show that M_{Zn}(t) → e^{t²/2} as n → ∞.
and

lim_{y→0} M′_X(ty) = M′_X(0) = E[X] = µ = 0,    (4)
(2) The Xi's don't have to be normal for the CLT to work! It even works on discrete distributions!

(3) You usually need n ≥ 30 observations for the approximation to work well. (Need fewer observations if the Xi's come from a symmetric distribution.)

(4) You can almost always use the CLT if the observations are iid.

(5) In fact, there are versions of the CLT that are actually a lot more general than the theorem presented here!
Example: To show that the Central Limit Theorem really works, let's add up just n = 12 iid Unif(0,1)'s, U1, . . . , Un. Let Sn = ∑_{i=1}^n Ui. Note that E[Sn] = nE[Ui] = n/2, and Var(Sn) = nVar(Ui) = n/12. Therefore,

Zn ≡ (Sn − n/2)/√(n/12) = Sn − 6 ≈ Nor(0, 1).

The histogram was compiled using 100,000 simulations of Z12. It works! 2
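The histogram itself can't be reproduced here, but the simulation behind it is a few lines; a sketch (the seed is an arbitrary choice):

```python
import random

random.seed(42)
N = 100_000
# Z = (sum of 12 iid Unif(0,1)'s) - 6 is approximately Nor(0, 1)
zs = [sum(random.random() for _ in range(12)) - 6 for _ in range(N)]
m = sum(zs) / N
v = sum((z - m) ** 2 for z in zs) / N
print(round(m, 3), round(v, 3))  # sample mean and variance, near 0 and 1
```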
Example: Suppose X1, . . . , X100 are iid Exp(1/1000). Find P(950 ≤ X̄ ≤ 1050).

For our problem, λ = 1/1000 and n = 100, so that E[X̄] = 1000 and Var(X̄) = 10000.
So by the CLT,

P(950 ≤ X̄ ≤ 1050)

= P((950 − E[X̄])/√Var(X̄) ≤ (X̄ − E[X̄])/√Var(X̄) ≤ (1050 − E[X̄])/√Var(X̄))

≈ P((950 − 1000)/100 ≤ Z ≤ (1050 − 1000)/100)

= P(−1/2 ≤ Z ≤ 1/2) = 2Φ(1/2) − 1 = 0.383. 2
Remark: This problem can be solved exactly if we have access to the Excel
Erlang cdf function GAMMADIST. And what do you know, you end up with
exactly the same answer of 0.383!
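We can replicate that exact check without Excel: the sum S of 100 iid Exp(1/1000)'s is Erlang, whose cdf has the closed form P(S ≤ x) = 1 − ∑_{k=0}^{n−1} e^{−λx}(λx)^k/k!, and X̄ ∈ [950, 1050] iff S ∈ [95000, 105000]. A sketch (summed in log space to avoid overflow):

```python
from math import exp, log, lgamma

def erlang_cdf(x, n, lam):
    # P(S <= x) for S ~ Erlang(n, lam): 1 - sum_{k=0}^{n-1} e^{-lam x} (lam x)^k / k!
    lx = lam * x
    tail = sum(exp(-lx + k * log(lx) - lgamma(k + 1)) for k in range(n))
    return 1 - tail

p = erlang_cdf(105_000, 100, 1 / 1000) - erlang_cdf(95_000, 100, 1 / 1000)
print(round(p, 3))  # 0.383, matching the CLT approximation (and GAMMADIST)
```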
Example: Suppose X1, . . . , X100 are iid from some distribution with mean 1000 and standard deviation 1000. Find P(950 ≤ X̄ ≤ 1050).

Notice that we didn't care whether or not the data came from an exponential distribution. We just needed the mean and variance. 2
Then

(Y − E[Y])/√Var(Y) = (Y − np)/√(npq) ≈ Nor(0, 1).

The usual rule of thumb for the Normal approximation to the Binomial is that it works pretty well as long as np ≥ 5 and nq ≥ 5.
Good luck with the binomial coefficients (they’re too big) and the number of
terms to sum up (it’s going to get tedious). I’ll come back to visit you in an
hour.
Example: The Braves play 100 independent baseball games, each of which
they have probability 0.8 of winning. What’s the probability that they win ≥
84?
The actual answer (using the true Bin(100,0.8) distribution) turns out to be
0.1923 — pretty close! 2
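That exact answer is easy to verify directly, and we can compare it with the normal approximation; the continuity correction of 0.5 below is an add-on not shown on the slide:

```python
from math import comb, erf, sqrt

n, p = 100, 0.8
# exact tail: P(Y >= 84) for Y ~ Bin(100, 0.8)
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(84, n + 1))

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# normal approximation with continuity correction: P(Y >= 84) ~ P(Z >= (83.5 - np)/sqrt(npq))
approx = 1 - Phi((83.5 - n * p) / sqrt(n * p * (1 - p)))
print(round(exact, 4), round(approx, 4))
```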
where

ρ ≡ Corr(X, Y), C ≡ 1/(2πσ_X σ_Y √(1 − ρ²)),

z_X(x) ≡ (x − µ_X)/σ_X, and z_Y(y) ≡ (y − µ_Y)/σ_Y.
Fun Fact (which will come up later when we discuss regression): The conditional distribution of Y given that X = x is also normal. In particular,

Y | X = x ∼ Nor(µ_Y + ρ(σ_Y/σ_X)(x − µ_X), σ_Y²(1 − ρ²)).

µ_X = 1300, µ_Y = 2.3, σ_X² = 6400, σ_Y² = 0.25, ρ = 0.6.
First,

E[Y | X = 900] = µ_Y + ρ(σ_Y/σ_X)(900 − µ_X) = 2.3 + 0.6(0.5/80)(900 − 1300) = 0.8,

indicating that the expected GPA of a kid with 900 SAT's will be 0.8.

Second,

Var(Y | X = 900) = σ_Y²(1 − ρ²) = 0.16.

Thus,

Y | X = 900 ∼ Nor(0.8, 0.16).

Now we can calculate

P(Y ≥ 2 | X = 900) = P(Z ≥ (2 − 0.8)/√0.16) = 1 − Φ(3) = 0.0013.

This guy doesn't have much chance of having a good GPA. 2
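The whole conditional calculation fits in a few lines; a sketch using the numbers above:

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# bivariate normal parameters from the SAT/GPA example
muX, muY = 1300, 2.3
sdX, sdY = sqrt(6400), sqrt(0.25)
rho = 0.6

x = 900
cond_mean = muY + rho * (sdY / sdX) * (x - muX)  # expected GPA given SAT = 900
cond_var = sdY ** 2 * (1 - rho ** 2)             # conditional variance
prob = 1 - Phi((2 - cond_mean) / sqrt(cond_var)) # P(Y >= 2 | X = 900)
print(round(cond_mean, 4), round(cond_var, 4), round(prob, 4))  # 0.8 0.16 0.0013
```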
f(x) = 1/((2π)^{k/2} |Σ|^{1/2}) exp[−(x − µ)ᵀ Σ⁻¹ (x − µ)/2], x ∈ R^k,

where |Σ| and Σ⁻¹ are the determinant and inverse of Σ, respectively.
E[X] = exp(µ_Y + σ_Y²/2).

S(t) = S(0) exp[(µ − σ²/2)t + σ√t Z], t ≥ 0,

where µ is related to the "drift" of the stock price (i.e., the natural rate of increase), σ is its "volatility" (how much the stock bounces around), S(0) is the initial price, and Z is a standard normal RV.
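A sketch of simulating terminal prices from this model, with made-up parameter values for illustration: since E[e^{σ√t Z}] = e^{σ²t/2}, the simulated average should land near S(0)e^{µt}.

```python
import random
from math import exp, sqrt

random.seed(1)
# illustrative (made-up) parameters: 5%/yr drift, 20%/yr volatility, 3 months
S0, mu, sigma, T = 100.0, 0.05, 0.20, 0.25
N = 100_000
prices = [S0 * exp((mu - sigma ** 2 / 2) * T + sigma * sqrt(T) * random.gauss(0, 1))
          for _ in range(N)]
avg = sum(prices) / N
print(round(avg, 2))  # near S0 * exp(mu * T), about 101.26
```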
For instance, suppose IBM is currently selling for $100 a share. If I think that
the stock will go up in value, I may want to pay $3/share now for the right to
buy IBM at $105 three months from now.
If IBM is worth $120 three months from now, I'll be able to buy it for only $105, and will have made a profit of $120 − $105 − $3 = $12.

If IBM is selling for $107 three months hence, I can still buy it for $105, and will lose $1 (recouping $2 from my original option purchase).

If IBM is selling for $95, then I won't exercise my option, and will walk away with my tail between my legs having lost my original $3.
So what is the option worth (and what should I pay for it)? Its expected value is given by

E[C] = e^{−rT} E[(S(T) − k)⁺],

where

x⁺ ≡ max{0, x}.

r, the "risk-free" interest rate (e.g., what you can get from a U.S. Treasury bond). This is used instead of the drift µ.

The term e^{−rT} denotes the time-value of money, i.e., a depreciation term corresponding to the interest I could've made had I used my money to buy a Treasury note.

Black and Scholes won a Nobel Prize for calculating E[C]. We'll get the same answer via a different method — but, alas, no Nobel Prize.
E[C] = e^{−rT} E[(S(0) exp[(r − σ²/2)T + σ√T Z] − k)⁺]

= e^{−rT} ∫_{−∞}^{∞} (S(0) exp[(r − σ²/2)T + σ√T z] − k)⁺ φ(z) dz
(via the standard conditioning argument)

= S(0)Φ(b + σ√T) − k e^{−rT} Φ(b)    (after lots of algebra),

where φ(·) and Φ(·) are the Nor(0,1) pdf and cdf, respectively, and

b ≡ [rT − σ²T/2 − ln(k/S(0))] / (σ√T).
There are many generalizations of this problem that are used in practical
finance problems, but this is the starting point. Meanwhile, get your tickets to
Norway or Sweden or wherever they give out the Nobel!
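The final formula is easy to code up. A sketch in the slide's notation, with hypothetical numbers loosely matching the earlier IBM story ($100 now, strike $105, three months); the rate r = 5% and volatility σ = 30% are assumptions for illustration:

```python
from math import erf, exp, log, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def call_value(S0, k, r, sigma, T):
    # E[C] = S(0) Phi(b + sigma sqrt(T)) - k e^{-rT} Phi(b), with b as defined above
    b = (r * T - sigma ** 2 * T / 2 - log(k / S0)) / (sigma * sqrt(T))
    return S0 * Phi(b + sigma * sqrt(T)) - k * exp(-r * T) * Phi(b)

print(round(call_value(100, 105, 0.05, 0.30, 0.25), 2))  # about 4.42 under these assumptions
```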
We can use various computer packages such as Excel, Minitab, R, SAS, etc.,
to calculate pmf’s / pdf’s and cdf’s for a variety of common distributions. For
instance, in Excel, we find the functions:
Functions such as NORMSINV and TINV can calculate the inverses of the
standard normal and t distributions, respectively.
The simulation of RVs is actually the topic of another course, but we will end
this module with a remarkable method for generating normal RVs.
The simulation of RVs is actually the topic of another course, but we will end
this module with a remarkable method for generating normal RVs.
iid
Theorem (Box and Muller): If U1 , U2 ∼ Unif(0, 1), then
The simulation of RVs is actually the topic of another course, but we will end
this module with a remarkable method for generating normal RVs.
iid
Theorem (Box and Muller): If U1 , U2 ∼ Unif(0, 1), then
p
Z1 = −2`n(U1 ) cos(2πU2 ) and
The simulation of RVs is actually the topic of another course, but we will end
this module with a remarkable method for generating normal RVs.
iid
Theorem (Box and Muller): If U1 , U2 ∼ Unif(0, 1), then
p p
Z1 = −2`n(U1 ) cos(2πU2 ) and Z2 = −2`n(U1 ) sin(2πU2 )
The simulation of RVs is actually the topic of another course, but we will end
this module with a remarkable method for generating normal RVs.
iid
Theorem (Box and Muller): If U1 , U2 ∼ Unif(0, 1), then
p p
Z1 = −2`n(U1 ) cos(2πU2 ) and Z2 = −2`n(U1 ) sin(2πU2 )
The simulation of RVs is actually the topic of another course, but we will end
this module with a remarkable method for generating normal RVs.
iid
Theorem (Box and Muller): If U1 , U2 ∼ Unif(0, 1), then
p p
Z1 = −2`n(U1 ) cos(2πU2 ) and Z2 = −2`n(U1 ) sin(2πU2 )
Example: Suppose that U1 = 0.3 and U2 = 0.8 are realizations of two iid
Unif(0,1)'s. Box–Muller gives the following two iid standard normals.
\[
Z_1 = \sqrt{-2\ln(U_1)}\,\cos(2\pi U_2) = 0.480
\]
\[
Z_2 = \sqrt{-2\ln(U_1)}\,\sin(2\pi U_2) = -1.476. \quad\Box
\]
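The transformation is straightforward to code. Here is a minimal Python sketch (not from the lecture; the function name `box_muller` is my own) that reproduces the example's numbers:

```python
import math

def box_muller(u1, u2):
    """Transform two iid Unif(0,1) realizations into two iid Nor(0,1) realizations."""
    r = math.sqrt(-2.0 * math.log(u1))   # common radius term sqrt(-2 ln U1)
    z1 = r * math.cos(2.0 * math.pi * u2)  # angle 2*pi*U2 must be in radians
    z2 = r * math.sin(2.0 * math.pi * u2)
    return z1, z2

z1, z2 = box_muller(0.3, 0.8)
print(round(z1, 3), round(z2, 3))   # 0.48 -1.476
```

In practice the pair (U1, U2) would come from a pseudo-random number generator; plugging in the deterministic values above simply checks the arithmetic of the example.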
Remarks:
There are many other ways to generate Nor(0,1)'s, but this is perhaps the easiest.
It is essential that the cosine and sine be calculated in radians, not degrees.
To get X ∼ Nor(µ, σ²) from Z ∼ Nor(0, 1), just take X = µ + σZ.
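The last remark is how general normals are produced in practice. A small sketch (the target values µ = 10, σ = 2 are my own, chosen only for illustration):

```python
import math

def box_muller(u1, u2):
    """Two iid Unif(0,1) realizations in, two iid Nor(0,1) realizations out."""
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

# Standard normal from the earlier example, then shift and scale:
z, _ = box_muller(0.3, 0.8)   # Z ~ Nor(0,1); here z is about 0.480
mu, sigma = 10.0, 2.0         # illustrative target: X ~ Nor(10, 4)
x = mu + sigma * z            # X = mu + sigma*Z ~ Nor(mu, sigma^2)
print(round(x, 3))            # 10.959
```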
Amazingly, it’s “Muller”, not “Müller”.
See https://www.youtube.com/watch?v=nntGTK2Fhb0
Honors Proof: We follow the method from Module 3 to calculate the joint
pdf of a function of two RVs. Namely, if we can express U1 = k1(Z1, Z2) and
U2 = k2(Z1, Z2) for some functions k1(·, ·) and k2(·, ·), then the joint pdf of
(Z1, Z2) is given by
\[
g(z_1, z_2) \;=\; f_{U_1}\big(k_1(z_1, z_2)\big)\, f_{U_2}\big(k_2(z_1, z_2)\big)
\left| \frac{\partial u_1}{\partial z_1}\frac{\partial u_2}{\partial z_2}
     - \frac{\partial u_2}{\partial z_1}\frac{\partial u_1}{\partial z_2} \right|
\;=\; \left| \frac{\partial u_1}{\partial z_1}\frac{\partial u_2}{\partial z_2}
     - \frac{\partial u_2}{\partial z_1}\frac{\partial u_1}{\partial z_2} \right|,
\]
since U1 and U2 are iid Unif(0,1), so that each pdf equals 1 on the unit interval.
Squaring and adding the definitions of Z1 and Z2 gives Z1² + Z2² = −2 ln(U1), so that
\[
U_1 \;=\; e^{-(Z_1^2 + Z_2^2)/2}.
\]
Similarly,
\[
U_2 \;=\; \frac{1}{2\pi}\arccos\left(\pm\sqrt{\frac{Z_1^2}{Z_1^2 + Z_2^2}}\right)
\;=\; \frac{1}{2\pi}\arccos\left(\sqrt{\frac{Z_1^2}{Z_1^2 + Z_2^2}}\right),
\]
where we (non-rigorously) get rid of the “±” to balance off the fact that the
range of y = arccos(x) is only regarded to be 0 ≤ y ≤ π (not 0 ≤ y ≤ 2π).
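The inversion and the Jacobian can be sanity-checked numerically. The sketch below (my own; the evaluation point simply reuses the example's U1 = 0.3, U2 = 0.8) verifies that k1(Z1, Z2) recovers U1 exactly, and that the absolute Jacobian |J| equals the joint Nor(0,1) pdf, (1/2π)e^{-(z1²+z2²)/2}, using central finite differences:

```python
import math

# The u's as functions of the z's, as in the proof.
def k1(z1, z2):
    # U1 = exp(-(Z1^2 + Z2^2)/2)
    return math.exp(-(z1 * z1 + z2 * z2) / 2.0)

def k2(z1, z2):
    # U2 = (1/2pi) * arccos( Z1 / sqrt(Z1^2 + Z2^2) )  (the "+" branch)
    return math.acos(z1 / math.hypot(z1, z2)) / (2.0 * math.pi)

# Generate (z1, z2) from (u1, u2) via Box-Muller, then map back.
u1, u2 = 0.3, 0.8
r = math.sqrt(-2.0 * math.log(u1))
z1 = r * math.cos(2.0 * math.pi * u2)
z2 = r * math.sin(2.0 * math.pi * u2)
assert abs(k1(z1, z2) - u1) < 1e-12   # the U1 inversion holds

# |J| via central finite differences at (z1, z2).
h = 1e-6
d11 = (k1(z1 + h, z2) - k1(z1 - h, z2)) / (2 * h)   # du1/dz1
d12 = (k1(z1, z2 + h) - k1(z1, z2 - h)) / (2 * h)   # du1/dz2
d21 = (k2(z1 + h, z2) - k2(z1 - h, z2)) / (2 * h)   # du2/dz1
d22 = (k2(z1, z2 + h) - k2(z1, z2 - h)) / (2 * h)   # du2/dz2
J = d11 * d22 - d21 * d12
target = math.exp(-(z1 * z1 + z2 * z2) / 2.0) / (2.0 * math.pi)
print(abs(abs(J) - target) < 1e-6)   # True: g(z1,z2) is the product of two Nor(0,1) pdf's
```

Since g(z1, z2) factors into (1/√(2π))e^{-z1²/2} times (1/√(2π))e^{-z2²/2}, this confirms numerically that Z1 and Z2 are iid Nor(0,1).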