
STAT272 Probability

Topic 2
Discrete Distributions and the Poisson Process



Random Variables
• Technical definition: a random variable is a function whose
domain is the sample space and whose range is the real line;
i.e. X is a random variable if

X : Ω → R

• The function must satisfy other conditions as well.



• e.g. Let X be the number of heads in four tosses of a coin. The
sample space Ω is the collection of all outcomes {O1, O2, O3, O4},
where Oi is one of H or T. X is then 0, 1, 2, 3 or 4, depending on
how many of the Oi’s equal H.
• Mostly the sample space is suppressed and we just think of X as
taking on certain values with certain probabilities.
• e.g. We say that X is a Poisson random variable if for some
λ > 0,
P(X = x) = e^{−λ} λ^x / x! ;  x = 0, 1, 2, . . .

• Note that in the above we do not need to specify Ω.



Probability Distributions
• There are two main types of random variables: discrete and
continuous.
• A random variable X is said to be discrete if it can assume only
a finite (or countably infinite) number of distinct values. A set is
countably infinite if its elements can be put into 1:1
correspondence with the positive integers (which are, of course,
infinite). The rational numbers are countably infinite, while the
reals are not.
• Notation: P (X = x) = fX (x) . We call fX the probability
function (of X). It must satisfy the following conditions, in order
that the axioms of probability are satisfied
1. f_X(x) ≥ 0, ∀x;
2. Σ_x f_X(x) = 1.



• Note that we use X for a random variable and x for a value of X.
You will be penalised in assignments and the exam if you do not
distinguish between upper and lower case. Thus Y will be a
random variable and y a value of Y.
• Examples:
1. The function g for which

   g(x) = 0.3 ; x = 1
          0.6 ; x = 2
          0.1 ; x = 3
          0   ; otherwise

can be the probability function of some rv (random variable);



2. The function g(x) for which

   g(x) = k/x ; x = 1, 2, 3, . . .
          0   ; otherwise

does not represent a probability function, as Σ_{x=1}^∞ 1/x = ∞.
3. Probabilities based on the zeta function, where

   ζ(α) = Σ_{x=1}^∞ 1/x^α

converges if α > 1, and diverges if α ≤ 1. A zeta probability
function is given by

   f_X(x) = k/x^α ; x = 1, 2, 3, . . .
            0     ; otherwise

where α > 1 and k = 1/ζ(α).
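As a quick numerical sanity check of examples 1 and 3 (a minimal Python sketch; it assumes SciPy is available for evaluating ζ(α)):

```python
from scipy.special import zeta

# Example 1: verify that g satisfies both pf conditions.
g = {1: 0.3, 2: 0.6, 3: 0.1}
assert all(p >= 0 for p in g.values())        # 1. g(x) >= 0 for all x
assert abs(sum(g.values()) - 1.0) < 1e-12     # 2. probabilities sum to 1

# Example 3: zeta pf with alpha = 2, so k = 1/zeta(2) = 6/pi^2.
alpha = 2.0
k = 1.0 / zeta(alpha)
# The truncated sum of k/x^alpha approaches 1 as the cutoff grows.
print(k, sum(k / x**alpha for x in range(1, 100_000)))
```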

Discrete Distributions
The Bernoulli distribution

• Consider an experiment in which there are only two possible
outcomes: ‘success’ and ‘failure’ (e.g. tossing a coin and
‘success’ ≡ ‘head’).
• We may construct a rv X by assigning the value 0 to ‘failure’
and 1 to ‘success’. The reason for doing this will become
apparent later on.
• We call such a rv X a Bernoulli rv and such an experiment a
Bernoulli trial.
• X has the probability function (pf), letting p be the probability
of ‘success’,

   f_X(x) = 1 − p ; x = 0
            p     ; x = 1
            0     ; otherwise.

p is called the “parameter” of the distribution. Some texts use π
instead of p.
• We can also write the pf as

   f_X(x) = (1 − p)^{1−x} p^x ; x = 0, 1
            0                 ; otherwise

• f_X(x) clearly satisfies the conditions for a pf if 0 ≤ p ≤ 1.
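A one-line numerical check (a sketch only) that the compact form reproduces the piecewise pf:

```python
# Verify (1-p)^(1-x) * p^x reproduces the piecewise Bernoulli pf.
p = 0.3
for x, fx in ((0, 1 - p), (1, p)):
    assert abs((1 - p) ** (1 - x) * p**x - fx) < 1e-12
```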



The Binomial Distribution

• A binomial random variable is the sum of n independent
Bernoulli random variables (i.e. a binomial experiment consists
of n independent Bernoulli trials).
• Thus a binomial random variable is the number of 1’s, i.e. the
number of successes in n Bernoulli trials.
• Let X be distributed binomially, with parameters n and p (n is
the number of trials and p is the probability of ‘success’). We
write X ∼ Bin(n, p) . What is the probability function of X?
• The event X = x is the event that there are x successes and
(n − x) failures in the n trials.
• The successes can occur in any subset of size x of the n trials.
There are \binom{n}{x} such different subsets (combinations), as there are
x places out of n in which we can slot the successes – the
remaining (n − x) places will contain the failures.
• Each such combination has the same probability (where there are
x p’s and (n − x) (1 − p)’s below):

   p × p × . . . × p × (1 − p) × (1 − p) × . . . × (1 − p) = p^x (1 − p)^{n−x}.

• Thus

   f_X(x) = \binom{n}{x} p^x (1 − p)^{n−x} ; x = 0, 1, . . . , n
            0                              ; otherwise.

• We should check that this is a pf:

1. f_X(x) ≥ 0, ∀x;
2. By the binomial theorem,

   Σ_{x=0}^n \binom{n}{x} p^x (1 − p)^{n−x} = {p + (1 − p)}^n = 1.
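The same check can be carried out numerically; a minimal sketch using Python’s math.comb:

```python
from math import comb

# Numerically verify that the Bin(n, p) probabilities form a pf.
n, p = 10, 0.37
f = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
assert all(v >= 0 for v in f)        # condition 1
assert abs(sum(f) - 1.0) < 1e-12     # condition 2 (the binomial theorem)
```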

The Geometric Distribution

• Let X be the number of trials up to and including the first
success in a sequence of independent Bernoulli trials with
constant success probability p.
• We say that X is geometrically distributed with parameter p.
• What is the pf of X? The event X = x (x = 1, 2, . . .) occurs if
the first (x − 1) trials result in failure and the xth is a success.



The probability that this occurs is (where there are (x − 1)
factors of (1 − p) below)

   (1 − p) × · · · × (1 − p) × p = (1 − p)^{x−1} p.

• Thus

   f_X(x) = (1 − p)^{x−1} p ; x = 1, 2, . . .
            0               ; otherwise.
• Checks:
1. fX (x) ≥ 0, ∀x;
2. As long as 0 < p < 1,

   Σ_{x=1}^∞ p (1 − p)^{x−1} = p · 1/{1 − (1 − p)} = p/p = 1.



The formula also holds if p = 1, but not if p = 0, obviously.
• Notation: Some books use q = 1 − p.
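A small simulation sketch (parameter values are arbitrary) comparing empirical frequencies of the number of trials to the first success against the geometric pf:

```python
import random
from collections import Counter

random.seed(1)
p, reps = 0.4, 100_000

def trials_to_first_success(p):
    x = 1
    while random.random() >= p:   # failure occurs with probability 1 - p
        x += 1
    return x

counts = Counter(trials_to_first_success(p) for _ in range(reps))
for x in range(1, 6):
    # empirical frequency vs (1-p)^(x-1) * p
    print(x, counts[x] / reps, (1 - p) ** (x - 1) * p)
```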

Negative Binomial Distribution

• Let X be the number of trials up to and including the kth
success in a sequence of independent Bernoulli trials with
constant success probability p.
• We say that X has the negative binomial distribution with
parameters k and p. (The reason for ‘negative binomial’ will
become apparent later.)
• What is the pf of X? The event X = x occurs when there are
exactly (k − 1) successes in the first (x − 1) trials, followed by a
success in the xth trial. Each combination has the same



probability. Hence, for x = k, k + 1, . . .

   f_X(x) = \binom{x−1}{k−1} p^{k−1} (1 − p)^{x−1−(k−1)} · p
          = \binom{x−1}{k−1} p^k (1 − p)^{x−k}

and f_X(x) = 0 otherwise.
• Checks:
1. f_X(x) ≥ 0, ∀x;
2. Before evaluating Σ_{x=k}^∞ f_X(x), we need

Newton’s generalized binomial theorem:

   (a + b)^r = Σ_{i=0}^∞ \binom{r}{i} a^{r−i} b^i   (1)



where

   \binom{r}{i} = r(r − 1) · · · (r − i + 1) / i!   for r any real number.

So we can write

   p^{−k} = (1 − q)^{−k}   where q = 1 − p
          = Σ_{y=0}^∞ \binom{−k}{y} 1^{−k−y} (−q)^y   [(1) with r = −k, a = 1, b = −q]
          = Σ_{y=0}^∞ \binom{−k}{y} (−1)^y q^y
          = Σ_{y=0}^∞ \binom{−k}{y} (−1)^y (1 − p)^y.   (2)



As long as 0 < p ≤ 1,

   Σ_{x=k}^∞ f_X(x) = Σ_{x=k}^∞ \binom{x−1}{k−1} p^k (1 − p)^{x−k}
                    = p^k Σ_{y=0}^∞ \binom{y+k−1}{k−1} (1 − p)^y   [putting y = x − k]
                    = p^k Σ_{y=0}^∞ (y + k − 1)! / {(k − 1)! y!} · (1 − p)^y
                    = p^k Σ_{y=0}^∞ {(y + k − 1) · · · k / y!} (1 − p)^y

Consider the numerator (y + k − 1) · · · k. There are
y + k − 1 − (k − 1) = y terms in the product. If we multiply
each term by −1, multiplying by (−1)^y preserves the identity.



So

   Σ_{x=k}^∞ f_X(x) = p^k Σ_{y=0}^∞ {(−1)^y (−k)(−k − 1) · · · (−k − y + 1) / y!} (1 − p)^y
                    = p^k Σ_{y=0}^∞ \binom{−k}{y} (−1)^y (1 − p)^y
                    = p^k · p^{−k}   [from (2)]
                    = 1.

We have thus seen the reason for calling the distribution the
negative binomial distribution: the probabilities involve
coefficients in binomial expansions with negative index.
• Note: The negative binomial distribution is a generalisation of
the geometric distribution, i.e. the geometric is the same as the
negative binomial with k = 1.
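A truncated numerical check of the pf (a sketch; the values of k and p are arbitrary). Note that some libraries, e.g. scipy.stats.nbinom, parameterise by the number of failures X − k rather than the number of trials X:

```python
from math import comb

# X counts trials up to and including the k-th success.
k, p = 3, 0.3
total = sum(comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)
            for x in range(k, 500))
print(total)   # ~1.0 once the truncation point is large enough
```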



The Hypergeometric Distribution

• It is better to consider an example first: a box containing N
balls has k white ones and (N − k) black. Draw n balls at
random without replacement. Let X be the number of white
balls drawn.
• What is the pf of X? Firstly, there are \binom{N}{n} different
combinations of n balls from N. The event X = x occurs when x
of the n balls are white and therefore (n − x) are black. The
number of combinations with x white balls is therefore \binom{k}{x} \binom{N−k}{n−x},
since we must choose x white balls from k and (n − x) black
balls from (N − k). Hence,

   f_X(x) = \binom{k}{x} \binom{N−k}{n−x} / \binom{N}{n} .



• This is not totally correct – we must specify the values of x for
which it holds. We must have 0 ≤ x ≤ k. We must also
have 0 ≤ n − x ≤ N − k, or

   n − N + k ≤ x ≤ n.

Thus

   f_X(x) = \binom{k}{x} \binom{N−k}{n−x} / \binom{N}{n} ; max(0, n − N + k) ≤ x ≤ min(k, n)
            0                                            ; otherwise


• Checks:
1. f_X(x) ≥ 0, ∀x;
2. Σ_x f_X(x) = 1 (not obvious arithmetically).
• Example: From a group of 20 graduate Actuaries, 9 are selected
randomly for employment. Assuming random selection, what is
the probability that the 9 selected include all the 5 best
Actuaries in the group of 20? Here

N = 20, k = 5, N − k = 15, n = 9.



• Let X be the number of best Actuaries chosen. Then

   P(X = 5) = \binom{5}{5} \binom{15}{4} / \binom{20}{9}
            = 1 × 15!/(4! 11!) × 9! 11!/20!
            = 21/2584
            ≃ 0.008127.
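The example can be reproduced exactly with integer arithmetic; a minimal Python sketch:

```python
from fractions import Fraction
from math import comb

# Exact computation of the actuary example: N=20, k=5, n=9, x=5.
N, k, n, x = 20, 5, 9, 5
prob = Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))
print(prob, float(prob))   # 21/2584, ~0.008127
```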



The Poisson Distribution

• “Poisson” is French for fish, but the distribution is named after


the famous 19th century French mathematician Siméon-Denis
Poisson.
• The Poisson is a discrete distribution defined on the non-negative
integers 0, 1, 2, . . ., and is used as a model for counts occurring in
a fixed time period or space, e.g.
– number of accidents at the Herring Road-Waterloo Road
intersection in a week
– number of SMS messages arriving on your mobile phone in a
day
– number of claims on an insurance policy in a year
– number of sharks near Bondi Beach at a particular time



• The Poisson distribution can be derived in many ways. We shall
look at two.

1. Binomial limit

Suppose X ∼ Bin(n, p). Thus, for x = 0, 1, . . . , n,

   f_X(x) = \binom{n}{x} p^x (1 − p)^{n−x}.

Suppose that n is large and p is small, but that np is moderate,
e.g. ≤ 10. (We shall look at the case where np is large later.) Let
θ = np and consider

   g(x) = log f_X(x).

Then

   g(x) = log(n!) − log(x!) − log{(n − x)!} + x log(θ/n) + (n − x) log(1 − θ/n).



Recall: Stirling’s formula gives an approximation for log(n!):

   log(n!) = (n + 1/2) log n − n + (1/2) log(2π) + a(n)/(12n),   0 < a(n) < 1.   (3)

Thus we also have, for x fixed as n → ∞,

   log{(n − x)!} = (n − x + 1/2) log(n − x) − (n − x) + (1/2) log(2π) + a(n − x)/{12(n − x)}.   (4)
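A quick numerical look at how sharp (3) is (a sketch; math.lgamma(n + 1) returns log(n!) exactly, up to floating point):

```python
from math import lgamma, log, pi

# The gap between log(n!) and Stirling's approximation is a(n)/(12n) < 1/(12n).
for n in (5, 10, 50):
    exact = lgamma(n + 1)
    stirling = (n + 0.5) * log(n) - n + 0.5 * log(2 * pi)
    print(n, exact - stirling, 1 / (12 * n))
```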

Recall: the Taylor series expansion of log(1 + x):

   log(1 + x) = x − x²/2 + x³/3 − x⁴/4 + · · · ,   −1 < x ≤ 1,

which gives

   log(1 − x) = −x − x²/2 − x³/3 − x⁴/4 − · · · ,   −1 ≤ x < 1.
Now, for fixed θ, as n → ∞,

   log(1 − θ/n) = −θ/n − θ²/(2n²) − · · ·

and hence

   (n − x) log(1 − θ/n) = −(n − x)θ/n − (n − x)θ²/(2n²) − · · ·   (5)



and for fixed x as n → ∞,

   log(n − x) = log{n(1 − x/n)}
              = log n + log(1 − x/n)
              = log n − x/n − x²/(2n²) − · · · .

So we can write the first term of (4) as

   (n − x + 1/2) log(n − x) = (n − x + 1/2){log n − x/n − x²/(2n²) − · · ·}
                            = (n − x + 1/2) log n − (n − x + 1/2){x/n + x²/(2n²) + · · ·}   (6)



Thus

   g(x) = [(n + 1/2) log n − n + (1/2) log(2π) + a(n)/(12n)]   [from (3)]
        − log(x!)
        − [(n − x + 1/2) log n − (n − x + 1/2){x/n + x²/(2n²) + · · ·} − (n − x) + (1/2) log(2π) + a(n − x)/{12(n − x)}]   [from (6)]
        + x log θ − x log n − (n − x)θ/n − (n − x)θ²/(2n²) − · · ·   [from (5)]

        = (n + 1/2 − n + x − 1/2 − x) log n − n + x + n − x − log(x!) − θ + x log θ,



plus other terms which converge to 0 as n → ∞. Hence, as n → ∞

g (x) → − log (x!) − θ + x log θ

and so

   f_X(x) → e^{−θ} θ^x / x! .

This function is a pf since

   Σ_{x=0}^∞ e^{−θ} θ^x / x! = e^{−θ} Σ_{x=0}^∞ θ^x / x! = e^{−θ} e^θ = e^{−θ+θ} = e^0 = 1.



Hence we have proved that the limiting form of the binomial
distribution, as n → ∞ with np = θ held constant, is

   f_X(x) = e^{−θ} θ^x / x! ,   x = 0, 1, 2, . . . .

This is the Poisson distribution.
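A numerical illustration of this limit (a sketch with θ = 2, chosen arbitrarily): the Bin(n, θ/n) probabilities approach the Poisson(θ) probabilities as n grows.

```python
from math import comb, exp, factorial

theta = 2.0
for n in (10, 100, 10_000):
    p = theta / n
    for x in range(4):
        binom = comb(n, x) * p**x * (1 - p) ** (n - x)   # Bin(n, theta/n)
        poisson = exp(-theta) * theta**x / factorial(x)  # Poisson(theta)
        print(n, x, round(binom, 6), round(poisson, 6))
```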



2. Poisson Process

Do not get mixed up here – it is very easy to do so: a Poisson
process is not a rv and is not a distribution. Firstly, a process, or
more correctly a stochastic process, {X(t)}, is a sequence of random
variables for which each element X(t) is a rv. Note that we use {} to
denote the process, or collection of random variables making up the
process. Changes in the index t are often thought of as changes in
time. The Poisson process is a model for the number of ‘events’
which occur. If {X(t)} is a Poisson process, then the rv X(t) is the
number of events which have occurred in the time interval (0, t).



As t increases, therefore, so will X(t). Now, each X(t) is a rv and has
a probability function. An example is where X(t) is the number of
α-particles emitted in the interval (0, t) . Together, the sequence
{X(t)} is a (stochastic) process, but at some time t, X(t) represents
the number of α-particles which have been emitted in (0, t).

“Little-oh” notation:

• We say that g(δt) = o(δt) if g(δt)/δt → 0 as δt → 0.
• It helps to think of o(δt) as something which is much smaller in
magnitude than δt (e.g. (δt)²) when |δt| is small.



Consider the following. Suppose events occur randomly in time,
subject to the conditions that
• times between events are independent of each other,
• the probability of one event occurring in (t, t + δt) is

λδt + o (δt)

in the limit as δt → 0, and


• the probability of more than one event occurring in (t, t + δt) is
o (δt).
Let X (t) be the number of events which have occurred in the interval
(0, t) . Then we say that {X(t)} is a Poisson process. We wish to
calculate the pf of the rv X(t).



For x = 0, 1, . . ., put

f_{X(t)}(x) = P(X(t) = x) = p_x(t).

Now, p_x(t + δt) is the probability that x events have occurred in the
interval (0, t + δt). It may be that
• x events occurred in (0, t) and none in [t, t + δt),
• or that x − 1 events occurred in (0, t) and 1 in [t, t + δt),
• or that x − 2 events occurred in (0, t) and 2 in [t, t + δt), etc.



However, the probability that 2 events occur in [t, t + δt) is o(δt).
We thus have, for x ≥ 1:

   p_x(t + δt) = P(x events in (0, t) and none in [t, t + δt))
               + P(x − 1 events in (0, t) and 1 in [t, t + δt))
               + o(δt)
               = p_x(t) P(no events in [t, t + δt))
               + p_{x−1}(t) P(1 event in [t, t + δt)) + o(δt)
               = p_x(t){1 − λδt + o(δt)} + p_{x−1}(t){λδt + o(δt)} + o(δt)
               = p_x(t) + λ{p_{x−1}(t) − p_x(t)}δt + o(δt).



Also,

   p_0(t + δt) = P(0 events in (0, t) and none in [t, t + δt))
               = p_0(t){1 − λδt + o(δt)}
               = p_0(t) − λ p_0(t) δt + o(δt).

Thus, for x = 0, 1, . . .

   {p_x(t + δt) − p_x(t)}/δt = λ{p_{x−1}(t) − p_x(t)} + o(δt)/δt ; x = 1, 2, . . .
                               −λ p_0(t) + o(δt)/δt              ; x = 0

and so, letting δt → 0,

   (d/dt) p_x(t) = λ{p_{x−1}(t) − p_x(t)} ; x = 1, 2, . . .
                   −λ p_0(t)              ; x = 0



The unique solution to these differential equations for which

   Σ_{x=0}^∞ p_x(t) = 1, ∀t

can be shown to be

   p_x(t) = (λt)^x e^{−λt} / x! .

You can derive this sequentially, or just verify the solution by
induction on x.
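One way to convince yourself numerically (a sketch, not a proof): integrate the differential equations forward with small Euler steps and compare with (λt)^x e^{−λt}/x!.

```python
from math import exp, factorial

lam, T, steps, xmax = 1.5, 2.0, 200_000, 6
dt = T / steps
p = [1.0] + [0.0] * xmax                 # at t = 0, P(X(0) = 0) = 1
for _ in range(steps):
    new = p[:]
    new[0] += dt * (-lam * p[0])         # dp0/dt = -lam * p0
    for x in range(1, xmax + 1):
        new[x] += dt * lam * (p[x - 1] - p[x])   # dpx/dt = lam*(p_{x-1} - p_x)
    p = new
for x in range(xmax + 1):
    exact = (lam * T) ** x * exp(-lam * T) / factorial(x)
    print(x, round(p[x], 6), round(exact, 6))
```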



A random variable Y with probability function

   f_Y(y) = e^{−θ} θ^y / y! ; y = 0, 1, . . .
            0               ; otherwise

is said to have the Poisson distribution with parameter θ. A Poisson


process {X (t)} with parameter λ therefore has the property that for
each t the rv X (t) has the Poisson distribution with parameter λt.
The following conditions must be met for a process to be a Poisson
process:
1. the arrival rate must be constant;
2. the times between events must be independent;
3. it must not be possible for two or more events to occur at the
same time.
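A simulation sketch built directly from the defining properties: slice (0, t) into many short intervals, let each contain an event independently with probability λ δt, and compare the distribution of the total count with Poisson(λt) (parameter values are arbitrary):

```python
import random
from collections import Counter
from math import exp, factorial

random.seed(2)
lam, t, slices, reps = 2.0, 1.0, 1000, 20_000
dt = t / slices
counts = Counter(sum(random.random() < lam * dt for _ in range(slices))
                 for _ in range(reps))
for x in range(6):
    exact = (lam * t) ** x * exp(-lam * t) / factorial(x)
    print(x, counts[x] / reps, round(exact, 4))
```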



Consider some practical situations. Could they be modelled using
Poisson processes?
1. The number of accidents in the Macquarie University car parks;
2. The number of people arriving at a bank for service.

Comments on these situations:


• Non-constant rate depending on time of day/day of week/week of
year/weather conditions. Non-independence – a driver could
smash into an accident which has already happened. Two cars
can collide simultaneously.
• Non-independence of events – people in pairs or arriving on the
same bus. Non-constant arrival rate λ due to lunch-hour etc.



The Multinomial Distribution
• This is the natural extension of the binomial distribution –
instead of there being two mutually exclusive and exhaustive
outcomes at each trial, there are k.
• The Multinomial distribution has k parameters: n, the number
of trials, and p1 , . . . , pk−1 , the probabilities of each of the first
(k − 1) outcomes.
• Note that given the (k − 1) probabilities p_1, . . . , p_{k−1}, the kth
probability p_k is obtained from

   p_k = 1 − Σ_{i=1}^{k−1} p_i.
• Definition: If p_i > 0 for i = 1, . . . , k and Σ_{i=1}^k p_i = 1, the random
variables Y_1, . . . , Y_k are distributed multinomially with
parameters (n, p_1, . . . , p_{k−1}) if

   P(Y_1 = y_1, . . . , Y_k = y_k) = n!/(y_1! · · · y_k!) · p_1^{y_1} · · · p_k^{y_k}
                                   = \binom{n}{y_1 y_2 · · · y_k} Π_{i=1}^k p_i^{y_i}

(the multinomial coefficient), where Σ_{i=1}^k y_i = n and 0 ≤ y_i ≤ n, ∀i.
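A direct implementation of the definition (a sketch; the fair-die example is illustrative only):

```python
from math import factorial

def multinomial_pf(ys, ps):
    """P(Y_1 = y_1, ..., Y_k = y_k) = n!/(y_1!...y_k!) * p_1^y_1 ... p_k^y_k."""
    n = sum(ys)
    coef = factorial(n)
    for y in ys:
        coef //= factorial(y)           # multinomial coefficient
    prob = float(coef)
    for y, p in zip(ys, ps):
        prob *= p ** y
    return prob

# A fair die rolled n = 12 times, landing exactly twice on each face.
print(multinomial_pf([2] * 6, [1 / 6] * 6))   # ~0.003438
```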

