
STAT272 Probability

Topic 2
Discrete Distributions and the Poisson Process



Random Variables
• Technical definition: a random variable is a function whose
domain is the sample space and whose range is the real line;
i.e. X is a random variable if

X : Ω → R

• The function must satisfy other conditions as well.



• e.g. Let X be the number of heads in four tosses of a coin. The
sample space Ω is the collection of all outcomes {O1, O2, O3, O4},
where Oi is one of H or T. X is then 0, 1, 2, 3 or 4, depending on
how many of the Oi’s equal H.
• Mostly the sample space is suppressed and we just think of X as
taking on certain values with certain probabilities.
• e.g. We say that X is a Poisson random variable if for some
λ > 0,
P(X = x) = e^{−λ} λ^x / x! ;  x = 0, 1, 2, . . .

• Note that in the above we do not need to specify Ω.



Probability Distributions
• There are two main types of random variables: discrete and
continuous.
• A random variable X is said to be discrete if it can assume only
a finite (or countably infinite) number of distinct values. A set is
countably infinite if its elements can be put into 1:1
correspondence with the positive integers (which are, of course,
infinite). The rational numbers are countably infinite, while the
reals are not.
• Notation: P (X = x) = fX (x) . We call fX the probability
function (of X). It must satisfy the following conditions, in order
that the axioms of probability are satisfied
1. f_X(x) ≥ 0, ∀x;
2. Σ_x f_X(x) = 1.



• Note that we use X for a random variable and x for a value of X.
You will be penalised in assignments and the exam if you do not
distinguish between upper and lower case. Thus Y will be a
random variable and y a value of Y.
• Examples:
1. The function g for which

   g(x) = 0.3 ; x = 1
          0.6 ; x = 2
          0.1 ; x = 3
          0   ; otherwise

can be the probability function of some rv (random variable);



2. The function g(x) for which

   g(x) = k/x ; x = 1, 2, 3, . . .
          0   ; otherwise

does not represent a probability function, as Σ_{x=1}^∞ 1/x = ∞.
3. Probabilities based on the zeta function, where

   ζ(α) = Σ_{x=1}^∞ 1/x^α

converges if α > 1, and diverges if α ≤ 1. A zeta probability
function is given by

   f_X(x) = k/x^α ; x = 1, 2, 3, . . .
            0     ; otherwise

where α > 1 and k = 1/ζ(α).
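As a quick numerical sanity check of examples 1 and 3 (a minimal Python sketch; it assumes SciPy is available for evaluating ζ(α)):

```python
from scipy.special import zeta

# Example 1: verify that g satisfies both pf conditions.
g = {1: 0.3, 2: 0.6, 3: 0.1}
assert all(p >= 0 for p in g.values())        # 1. g(x) >= 0 for all x
assert abs(sum(g.values()) - 1.0) < 1e-12     # 2. probabilities sum to 1

# Example 3: zeta pf with alpha = 2, so k = 1/zeta(2) = 6/pi^2.
alpha = 2.0
k = 1.0 / zeta(alpha)
# The truncated sum of k/x^alpha approaches 1 as the cutoff grows.
print(k, sum(k / x**alpha for x in range(1, 100_000)))
```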

Discrete Distributions
The Bernoulli distribution

• Consider an experiment in which there are only two possible
outcomes: ‘success’ and ‘failure’ (e.g. tossing a coin and
‘success’ ≡ ‘head’).
• We may construct a rv X by assigning the value 0 to ‘failure’
and 1 to ‘success’. The reason for doing this will become
apparent later on.
• We call such a rv X a Bernoulli rv and such an experiment a
Bernoulli trial.
• X has the probability function (pf), letting p be the probability
of ‘success’,

   f_X(x) = 1 − p ; x = 0
            p     ; x = 1
            0     ; otherwise.

p is called the “parameter” of the distribution. Some texts use π
instead of p.
• We can also write the pf as

   f_X(x) = (1 − p)^{1−x} p^x ; x = 0, 1
            0                 ; otherwise

• f_X(x) clearly satisfies the conditions for a pf if 0 ≤ p ≤ 1.
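A one-line numerical check (a sketch only) that the compact form reproduces the piecewise pf:

```python
# Verify (1-p)^(1-x) * p^x reproduces the piecewise Bernoulli pf.
p = 0.3
for x, fx in ((0, 1 - p), (1, p)):
    assert abs((1 - p) ** (1 - x) * p**x - fx) < 1e-12
```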



The Binomial Distribution

• A binomial random variable is the sum of n independent
Bernoulli random variables (i.e. a binomial experiment consists
of n independent Bernoulli trials).
• Thus a binomial random variable is the number of 1’s, i.e. the
number of successes in n Bernoulli trials.
• Let X be distributed binomially, with parameters n and p (n is
the number of trials and p is the probability of ‘success’). We
write X ∼ Bin(n, p) . What is the probability function of X?
• The event X = x is the event that there are x successes and
(n − x) failures in the n trials.
• The successes can occur in any subset of size x of the n trials.
There are \binom{n}{x} such different subsets (combinations), as there are
x places out of n in which we can slot the successes – the
remaining (n − x) places will contain the failures.
• Each such combination has the same probability (where there are
x p’s and (n − x) (1 − p)’s below):

   p × p × . . . × p × (1 − p) × (1 − p) × . . . × (1 − p) = p^x (1 − p)^{n−x}.

• Thus

   f_X(x) = \binom{n}{x} p^x (1 − p)^{n−x} ; x = 0, 1, . . . , n
            0                              ; otherwise.

• We should check that this is a pf:

1. f_X(x) ≥ 0, ∀x;
2. By the binomial theorem,

   Σ_{x=0}^n \binom{n}{x} p^x (1 − p)^{n−x} = {p + (1 − p)}^n = 1.
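The same check can be carried out numerically; a minimal sketch using Python’s math.comb:

```python
from math import comb

# Numerically verify that the Bin(n, p) probabilities form a pf.
n, p = 10, 0.37
f = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
assert all(v >= 0 for v in f)        # condition 1
assert abs(sum(f) - 1.0) < 1e-12     # condition 2 (the binomial theorem)
```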

The Geometric Distribution

• Let X be the number of trials up to and including the first
success in a sequence of independent Bernoulli trials with
constant success probability p.
• We say that X is geometrically distributed with parameter p.
• What is the pf of X? The event X = x (x = 1, 2, . . .) occurs if
the first (x − 1) trials result in failure and the xth is a success.



The probability that this occurs is (where there are (x − 1)
factors of (1 − p) below)

   (1 − p) × · · · × (1 − p) × p = (1 − p)^{x−1} p.

• Thus

   f_X(x) = (1 − p)^{x−1} p ; x = 1, 2, . . .
            0               ; otherwise.
• Checks:
1. fX (x) ≥ 0, ∀x;
2. As long as 0 < p < 1,

   Σ_{x=1}^∞ p (1 − p)^{x−1} = p · 1/{1 − (1 − p)} = p/p = 1.



The formula also holds if p = 1, but not if p = 0, obviously.
• Notation: Some books use q = 1 − p.
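A small simulation sketch (parameter values are arbitrary) comparing empirical frequencies of the number of trials to the first success against the geometric pf:

```python
import random
from collections import Counter

random.seed(1)
p, reps = 0.4, 100_000

def trials_to_first_success(p):
    x = 1
    while random.random() >= p:   # failure occurs with probability 1 - p
        x += 1
    return x

counts = Counter(trials_to_first_success(p) for _ in range(reps))
for x in range(1, 6):
    # empirical frequency vs (1-p)^(x-1) * p
    print(x, counts[x] / reps, (1 - p) ** (x - 1) * p)
```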

Negative Binomial Distribution

• Let X be the number of trials up to and including the kth
success in a sequence of independent Bernoulli trials with
constant success probability p.
• We say that X has the negative binomial distribution with
parameters k and p. (The reason for ‘negative binomial’ will
become apparent later.)
• What is the pf of X? The event X = x occurs when there are
exactly (k − 1) successes in the first (x − 1) trials, followed by a
success in the xth trial. Each combination has the same



probability. Hence, for x = k, k + 1, . . .

   f_X(x) = \binom{x−1}{k−1} p^{k−1} (1 − p)^{x−1−(k−1)} · p
          = \binom{x−1}{k−1} p^k (1 − p)^{x−k}

and f_X(x) = 0 otherwise.
• Checks:
1. f_X(x) ≥ 0, ∀x;
2. Before evaluating Σ_{x=k}^∞ f_X(x), we need

Newton’s generalized binomial theorem:

   (a + b)^r = Σ_{i=0}^∞ \binom{r}{i} a^{r−i} b^i   (1)



where

   \binom{r}{i} = r(r − 1) · · · (r − i + 1) / i!   for r any real number.

So we can write

   p^{−k} = (1 − q)^{−k}   where q = 1 − p
          = Σ_{y=0}^∞ \binom{−k}{y} 1^{−k−y} (−q)^y   [(1) with r = −k, a = 1, b = −q]
          = Σ_{y=0}^∞ \binom{−k}{y} (−1)^y q^y
          = Σ_{y=0}^∞ \binom{−k}{y} (−1)^y (1 − p)^y.   (2)



As long as 0 < p ≤ 1,

   Σ_{x=k}^∞ f_X(x) = Σ_{x=k}^∞ \binom{x−1}{k−1} p^k (1 − p)^{x−k}
                    = p^k Σ_{y=0}^∞ \binom{y+k−1}{k−1} (1 − p)^y   [putting y = x − k]
                    = p^k Σ_{y=0}^∞ (y + k − 1)! / {(k − 1)! y!} · (1 − p)^y
                    = p^k Σ_{y=0}^∞ {(y + k − 1) · · · k / y!} (1 − p)^y

Consider the numerator (y + k − 1) · · · k. There are
y + k − 1 − (k − 1) = y terms in the product. If we multiply
each term by −1, multiplying by (−1)^y preserves the identity.



So

   Σ_{x=k}^∞ f_X(x) = p^k Σ_{y=0}^∞ {(−1)^y (−k)(−k − 1) · · · (−k − y + 1) / y!} (1 − p)^y
                    = p^k Σ_{y=0}^∞ \binom{−k}{y} (−1)^y (1 − p)^y
                    = p^k · p^{−k}   [from (2)]
                    = 1.

We have thus seen the reason for calling the distribution the
negative binomial distribution: the probabilities involve
coefficients in binomial expansions with negative index.
• Note: The negative binomial distribution is a generalisation of
the geometric distribution, i.e. the geometric is the same as the
negative binomial with k = 1.
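A truncated numerical check of the pf (a sketch; the values of k and p are arbitrary). Note that some libraries, e.g. scipy.stats.nbinom, parameterise by the number of failures X − k rather than the number of trials X:

```python
from math import comb

# X counts trials up to and including the k-th success.
k, p = 3, 0.3
total = sum(comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)
            for x in range(k, 500))
print(total)   # ~1.0 once the truncation point is large enough
```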



The Hypergeometric Distribution

• It is better to consider an example first: a box containing N
balls has k white ones and (N − k) black. Draw n balls at
random without replacement. Let X be the number of white
balls drawn.
• What is the pf of X? Firstly, there are \binom{N}{n} different
combinations of n balls from N. The event X = x occurs when x
of the n balls are white and therefore (n − x) are black. The
number of combinations with x white balls is therefore \binom{k}{x} \binom{N−k}{n−x},
since we must choose x white balls from k and (n − x) black
balls from (N − k). Hence,

   f_X(x) = \binom{k}{x} \binom{N−k}{n−x} / \binom{N}{n} .



• This is not totally correct – we must specify the values of x for
which it holds. We must have 0 ≤ x ≤ k. We must also
have 0 ≤ n − x ≤ N − k, or

   n − N + k ≤ x ≤ n.

Thus

   f_X(x) = \binom{k}{x} \binom{N−k}{n−x} / \binom{N}{n} ; max(0, n − N + k) ≤ x ≤ min(k, n)
            0                                            ; otherwise


• Checks:
1. f_X(x) ≥ 0, ∀x;
2. Σ_x f_X(x) = 1 (not obvious arithmetically).
• Example: From a group of 20 graduate Actuaries, 9 are selected
randomly for employment. Assuming random selection, what is
the probability that the 9 selected include all the 5 best
Actuaries in the group of 20? Here

N = 20, k = 5, N − k = 15, n = 9.



• Let X be the number of best Actuaries chosen. Then

   P(X = 5) = \binom{5}{5} \binom{15}{4} / \binom{20}{9}
            = 1 × 15!/(4! 11!) × 9! 11!/20!
            = 21/2584
            ≃ 0.008127.
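The example can be reproduced exactly with integer arithmetic; a minimal Python sketch:

```python
from fractions import Fraction
from math import comb

# Exact computation of the actuary example: N=20, k=5, n=9, x=5.
N, k, n, x = 20, 5, 9, 5
prob = Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))
print(prob, float(prob))   # 21/2584, ~0.008127
```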



The Poisson Distribution

• “Poisson” is French for fish, but the distribution is named after


the famous 19th century French mathematician Siméon-Denis
Poisson.
• The Poisson is a discrete distribution defined on the non-negative
integers 0, 1, 2, . . ., and is used as a model for counts occurring in
a fixed time period or space, e.g.
– number of accidents at the Herring Road-Waterloo Road
intersection in a week
– number of SMS messages arriving on your mobile phone in a
day
– number of claims on an insurance policy in a year
– number of sharks near Bondi Beach at a particular time



• The Poisson distribution can be derived in many ways. We shall
look at two.

1. Binomial limit

Suppose X ∼ Bin(n, p). Thus, for x = 0, 1, . . . , n,

   f_X(x) = \binom{n}{x} p^x (1 − p)^{n−x}.

Suppose that n is large and p is small, but that np is moderate,
e.g. ≤ 10. (We shall look at the case where np is large later.) Let
θ = np and consider

   g(x) = log f_X(x).

Then

   g(x) = log(n!) − log(x!) − log{(n − x)!} + x log(θ/n) + (n − x) log(1 − θ/n).



Recall: Stirling’s formula gives an approximation for log(n!):

   log(n!) = (n + 1/2) log n − n + (1/2) log(2π) + a(n)/(12n),   0 < a(n) < 1.   (3)

Thus we also have, for x fixed as n → ∞,

   log{(n − x)!} = (n − x + 1/2) log(n − x) − (n − x) + (1/2) log(2π) + a(n − x)/{12(n − x)}.   (4)
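A quick numerical look at how sharp (3) is (a sketch; math.lgamma(n + 1) returns log(n!) exactly, up to floating point):

```python
from math import lgamma, log, pi

# The gap between log(n!) and Stirling's approximation is a(n)/(12n) < 1/(12n).
for n in (5, 10, 50):
    exact = lgamma(n + 1)
    stirling = (n + 0.5) * log(n) - n + 0.5 * log(2 * pi)
    print(n, exact - stirling, 1 / (12 * n))
```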

Recall: the Taylor series expansion of log(1 + x):

   log(1 + x) = x − x²/2 + x³/3 − x⁴/4 + · · · ,   −1 < x ≤ 1,

which gives

   log(1 − x) = −x − x²/2 − x³/3 − x⁴/4 − · · · ,   −1 ≤ x < 1.
Now, for fixed θ, as n → ∞,

   log(1 − θ/n) = −θ/n − θ²/(2n²) − · · ·

and hence

   (n − x) log(1 − θ/n) = −(n − x)θ/n − (n − x)θ²/(2n²) − · · ·   (5)



and for fixed x as n → ∞,

   log(n − x) = log{n(1 − x/n)}
              = log n + log(1 − x/n)
              = log n − x/n − x²/(2n²) − · · · .

So we can write the first term of (4) as

   (n − x + 1/2) log(n − x) = (n − x + 1/2){log n − x/n − x²/(2n²) − · · ·}
                            = (n − x + 1/2) log n − (n − x + 1/2){x/n + x²/(2n²) + · · ·}   (6)



Thus

   g(x) = [(n + 1/2) log n − n + (1/2) log(2π) + a(n)/(12n)]   [from (3)]
        − log(x!)
        − [(n − x + 1/2) log n − (n − x + 1/2){x/n + x²/(2n²) + · · ·} − (n − x) + (1/2) log(2π) + a(n − x)/{12(n − x)}]   [from (6)]
        + x log θ − x log n − (n − x)θ/n − (n − x)θ²/(2n²) − · · ·   [from (5)]

        = (n + 1/2 − n + x − 1/2 − x) log n − n + x + n − x − log(x!) − θ + x log θ,



plus other terms which converge to 0 as n → ∞. Hence, as n → ∞

g (x) → − log (x!) − θ + x log θ

and so

   f_X(x) → e^{−θ} θ^x / x! .

This function is a pf since

   Σ_{x=0}^∞ e^{−θ} θ^x / x! = e^{−θ} Σ_{x=0}^∞ θ^x / x! = e^{−θ} e^θ = e^{−θ+θ} = e^0 = 1.



Hence we have proved that the limiting form of the binomial
distribution, as n → ∞ with np = θ held constant, is

   f_X(x) = e^{−θ} θ^x / x! ,   x = 0, 1, 2, . . . .

This is the Poisson distribution.
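A numerical illustration of this limit (a sketch with θ = 2, chosen arbitrarily): the Bin(n, θ/n) probabilities approach the Poisson(θ) probabilities as n grows.

```python
from math import comb, exp, factorial

theta = 2.0
for n in (10, 100, 10_000):
    p = theta / n
    for x in range(4):
        binom = comb(n, x) * p**x * (1 - p) ** (n - x)   # Bin(n, theta/n)
        poisson = exp(-theta) * theta**x / factorial(x)  # Poisson(theta)
        print(n, x, round(binom, 6), round(poisson, 6))
```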



2. Poisson Process

Do not get mixed up here – it is very easy to do so: a Poisson
process is not a rv and is not a distribution. Firstly, a process, or
more correctly a stochastic process, {X(t)}, is a sequence of random
variables for which each element X(t) is a rv. Note that we use {} to
denote the process, or collection of random variables making up the
process. Changes in the index t are often thought of as changes in
time. The Poisson process is a model for the number of ‘events’
which occur. If {X(t)} is a Poisson process, then the rv X(t) is the
number of events which have occurred in the time interval (0, t).



As t increases, therefore, so will X(t). Now, each X(t) is a rv and has
a probability function. An example is where X(t) is the number of
α-particles emitted in the interval (0, t) . Together, the sequence
{X(t)} is a (stochastic) process, but at some time t, X(t) represents
the number of α-particles which have been emitted in (0, t).

“Little-oh” notation:

• We say that g(δt) = o(δt) if g(δt)/δt → 0 as δt → 0.
• It helps to think of o(δt) as something which is much smaller in
magnitude than δt (e.g. (δt)²) when |δt| is small.



Consider the following. Suppose events occur randomly in time,
subject to the conditions that
• times between events are independent of each other,
• the probability of one event occurring in (t, t + δt) is

λδt + o (δt)

in the limit as δt → 0, and


• the probability of more than one event occurring in (t, t + δt) is
o (δt).
Let X (t) be the number of events which have occurred in the interval
(0, t) . Then we say that {X(t)} is a Poisson process. We wish to
calculate the pf of the rv X(t).



For x = 0, 1, . . ., put

f_{X(t)}(x) = P(X(t) = x) = p_x(t).

Now, p_x(t + δt) is the probability that x events have occurred in the
interval (0, t + δt). It may be that
• x events occurred in (0, t) and none in [t, t + δt),
• or that x − 1 events occurred in (0, t) and 1 in [t, t + δt),
• or that x − 2 events occurred in (0, t) and 2 in [t, t + δt), etc.



However, the probability that 2 events occur in [t, t + δt) is o(δt).
We thus have, for x ≥ 1:

   p_x(t + δt) = P(x events in (0, t) and none in [t, t + δt))
               + P(x − 1 events in (0, t) and 1 in [t, t + δt))
               + o(δt)
               = p_x(t) P(no events in [t, t + δt))
               + p_{x−1}(t) P(1 event in [t, t + δt)) + o(δt)
               = p_x(t){1 − λδt + o(δt)} + p_{x−1}(t){λδt + o(δt)} + o(δt)
               = p_x(t) + λ{p_{x−1}(t) − p_x(t)}δt + o(δt).



Also,

   p_0(t + δt) = P(0 events in (0, t) and none in [t, t + δt))
               = p_0(t){1 − λδt + o(δt)}
               = p_0(t) − λ p_0(t) δt + o(δt).

Thus, for x = 0, 1, . . .

   {p_x(t + δt) − p_x(t)}/δt = λ{p_{x−1}(t) − p_x(t)} + o(δt)/δt ; x = 1, 2, . . .
                               −λ p_0(t) + o(δt)/δt              ; x = 0

and so, letting δt → 0,

   (d/dt) p_x(t) = λ{p_{x−1}(t) − p_x(t)} ; x = 1, 2, . . .
                   −λ p_0(t)              ; x = 0



The unique solution to these differential equations for which

   Σ_{x=0}^∞ p_x(t) = 1, ∀t

can be shown to be

   p_x(t) = (λt)^x e^{−λt} / x! .

You can derive this sequentially, or just verify the solution by
induction on x.
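One way to convince yourself numerically (a sketch, not a proof): integrate the differential equations forward with small Euler steps and compare with (λt)^x e^{−λt}/x!.

```python
from math import exp, factorial

lam, T, steps, xmax = 1.5, 2.0, 200_000, 6
dt = T / steps
p = [1.0] + [0.0] * xmax                 # at t = 0, P(X(0) = 0) = 1
for _ in range(steps):
    new = p[:]
    new[0] += dt * (-lam * p[0])         # dp0/dt = -lam * p0
    for x in range(1, xmax + 1):
        new[x] += dt * lam * (p[x - 1] - p[x])   # dpx/dt = lam*(p_{x-1} - p_x)
    p = new
for x in range(xmax + 1):
    exact = (lam * T) ** x * exp(-lam * T) / factorial(x)
    print(x, round(p[x], 6), round(exact, 6))
```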



A random variable Y with probability function

   f_Y(y) = e^{−θ} θ^y / y! ; y = 0, 1, . . .
            0               ; otherwise

is said to have the Poisson distribution with parameter θ. A Poisson


process {X (t)} with parameter λ therefore has the property that for
each t the rv X (t) has the Poisson distribution with parameter λt.
The following conditions must be met for a process to be a Poisson
process:
1. the arrival rate must be constant;
2. the times between events must be independent;
3. it must not be possible for two or more events to occur at the
same time.
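A simulation sketch built directly from the defining properties: slice (0, t) into many short intervals, let each contain an event independently with probability λ δt, and compare the distribution of the total count with Poisson(λt) (parameter values are arbitrary):

```python
import random
from collections import Counter
from math import exp, factorial

random.seed(2)
lam, t, slices, reps = 2.0, 1.0, 1000, 20_000
dt = t / slices
counts = Counter(sum(random.random() < lam * dt for _ in range(slices))
                 for _ in range(reps))
for x in range(6):
    exact = (lam * t) ** x * exp(-lam * t) / factorial(x)
    print(x, counts[x] / reps, round(exact, 4))
```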



Consider some practical situations. Could they be modelled using
Poisson processes?
1. The number of accidents in the Macquarie University car parks;
2. The number of people arriving at a bank for service.

Comments on these situations:


• Non-constant rate depending on time of day/day of week/week of
year/weather conditions. Non-independence – a driver could
smash into an accident which has already happened. Two cars
can collide simultaneously.
• Non-independence of events – people in pairs or arriving on the
same bus. Non-constant arrival rate λ due to lunch-hour etc.



The Multinomial Distribution
• This is the natural extension of the binomial distribution –
instead of there being two mutually exclusive and exhaustive
outcomes at each trial, there are k.
• The Multinomial distribution has k parameters: n, the number
of trials, and p1 , . . . , pk−1 , the probabilities of each of the first
(k − 1) outcomes.
• Note that given the (k − 1) probabilities p_1, . . . , p_{k−1}, the kth
probability p_k is obtained from

   p_k = 1 − Σ_{i=1}^{k−1} p_i.
• Definition: If p_i > 0 for i = 1, . . . , k and Σ_{i=1}^k p_i = 1, the random
variables Y_1, . . . , Y_k are distributed multinomially with
parameters (n, p_1, . . . , p_{k−1}) if

   P(Y_1 = y_1, . . . , Y_k = y_k) = n!/(y_1! · · · y_k!) · p_1^{y_1} · · · p_k^{y_k}
                                   = \binom{n}{y_1 y_2 · · · y_k} Π_{i=1}^k p_i^{y_i}

(the multinomial coefficient), where Σ_{i=1}^k y_i = n and 0 ≤ y_i ≤ n, ∀i.
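A direct implementation of the definition (a sketch; the fair-die example is illustrative only):

```python
from math import factorial

def multinomial_pf(ys, ps):
    """P(Y_1 = y_1, ..., Y_k = y_k) = n!/(y_1!...y_k!) * p_1^y_1 ... p_k^y_k."""
    n = sum(ys)
    coef = factorial(n)
    for y in ys:
        coef //= factorial(y)           # multinomial coefficient
    prob = float(coef)
    for y, p in zip(ys, ps):
        prob *= p ** y
    return prob

# A fair die rolled n = 12 times, landing exactly twice on each face.
print(multinomial_pf([2] * 6, [1 / 6] * 6))   # ~0.003438
```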

