
Probability & Statistics 2

AS2110 / MA3666

Dr. Russell Gerrard

The Business School


City, University of London

Autumn Term 2021/2022

Outline

1 DISCRETE DISTRIBUTIONS
2 CONTINUOUS DISTRIBUTIONS
3 MOMENTS
4 GENERATING FUNCTIONS
5 INDEPENDENT RANDOM VARIABLES
6 BIVARIATE DISTRIBUTIONS
7 CONDITIONAL DISTRIBUTIONS
8 COVARIANCE AND CORRELATION
9 COMPOUND RANDOM VARIABLES

Section 1

DISCRETE DISTRIBUTIONS

Accidents happen

Prussian horsekick death data

• Famous data set published in 1898 by Ladislaus Josephovich Bortkiewicz

• Deaths from horse kicks, by year and by cavalry corps.

• n = 280 observations

Prussian horsekick death data

[Bar chart: relative frequency (0.0 to 0.5) of the nr of deaths per corps-year, for 0 to 4 deaths.]

Bortkiewicz’s task was to come up with a distribution which could provide a good model for the horsekick observations. The one he came up with is the one we now call the Poisson distribution.
1.1 Examples of discrete quantities

• number of horsekick deaths,
• number of claims on an insurance policy,
• number of ships arriving in a port,
• number of students attending a class face-to-face,
• number of lottery tickets you buy before winning something.

1.2 Things to remember

The likely behaviour of a discrete random variable X is determined by its probability function p_X and (cumulative) distribution function F_X:

p_X(x) = P(X = x),   F_X(x) = P(X ≤ x) = Σ_{y ≤ x} p_X(y).

The expectation of X, and of a function of X, is

E[X] = Σ_x x p_X(x),   E[h(X)] = Σ_x h(x) p_X(x).

The variance of X is

Var[X] = E[X²] − (E[X])² = E[(X − E[X])²] ≥ 0.
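
These formulas are easy to check numerically. A minimal sketch in R (the fair-die example is ours, not from the slides):

x <- 1:6; p <- rep(1/6, 6)     # probability function of a fair die
sum(x * p)                     # E[X] = 3.5
sum(x^2 * p) - sum(x * p)^2    # Var[X] = E[X²] − (E[X])² = 35/12 ≈ 2.92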

1.3 Discrete Modelling Distributions

Although there are infinitely many probability functions, the ones most often used for practical purposes fall into a number of families. We will be looking at:
• the Bernoulli distribution
• the binomial distribution
• the discrete uniform distribution
• the geometric distribution
• the negative binomial distribution
• the hypergeometric distribution
• the Poisson distribution

1.3.1 Bernoulli Distribution

A r.v. X has the Bernoulli distribution with parameter θ if X can only take the values 0 or 1, with
p(1) = θ,   p(0) = 1 − θ.

Used for modelling yes/no situations:
• has the policyholder made a claim?
• has the coin come up Heads?
• is it sunny today?

E(X) = θ, Var(X) = θ(1 − θ)
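
In R (which these slides later recommend for coursework simulations), a Bernoulli variable is simply a binomial with size 1; a brief sketch:

rbinom(10, size = 1, prob = 0.3)   # ten Bernoulli(0.3) draws: a string of 0s and 1s
dbinom(1, size = 1, prob = 0.3)    # p(1) = θ = 0.3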

1.3.2 Binomial Distribution

We carry out
• a fixed number, k,
• of independent and identical Bernoulli trials
• each with ‘success’ probability θ
and count the total number of successes, X.
Then X has a binomial distribution with parameters k and θ.
The probability function is
p(x) = C(k, x) θ^x (1 − θ)^(k−x),   (x = 0, 1, . . . , k),
where C(k, x) = k!/(x! (k − x)!) denotes the binomial coefficient.

For this distribution,
E(X) = kθ,   Var(X) = kθ(1 − θ).

Examples of binomial

Examples of use:
• Each applicant for a job has a 20% chance of being invited for
interview. If there are 10 applicants, how many are invited for
interview?
• Each passenger booked on a 400-seat plane has 5% chance of not
turning up. What is the probability of at least 10 empty seats?
• An exam has 20 multiple-choice questions, each with 5 answers.
If I just guess each answer at random, how likely am I to pass the
exam?
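
All three questions map directly onto R's binomial functions; in the last one, the pass mark of 10 out of 20 is our assumption, as the slide does not state it:

dbinom(0:10, size = 10, prob = 0.2)      # pf of the number invited to interview
1 - pbinom(9, size = 400, prob = 0.05)   # P(≥ 10 no-shows, i.e. ≥ 10 empty seats)
1 - pbinom(9, size = 20, prob = 1/5)     # P(≥ 10 correct guesses): pass mark assumed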

1.3.3 Discrete Uniform Distribution

A r.v. X has the discrete uniform distribution over the range {a, . . . , b} if every value from a to b is equally likely, i.e. if
p(x) = 1/(b − a + 1)   for a ≤ x ≤ b.
Examples of use:
• The roll of a die
• the spin of a roulette wheel
• the number of the winning lottery ticket

E(X) = (a + b)/2,   Var(X) = (b − a)(b − a + 2)/12.
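
In R, sample draws from this distribution directly; a quick sketch:

sample(1:6, size = 20, replace = TRUE)    # 20 rolls of a fair die: uniform on {1, ..., 6}
mean(sample(1:6, 1e5, replace = TRUE))    # ≈ (1 + 6)/2 = 3.5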

1.3.4 Geometric Distribution

We carry out
• a variable number, X,
• of independent and identical Bernoulli trials,
• each with ‘success’ probability θ,
• stopping as soon as a success is observed.
Then X has a geometric distribution with parameter θ.
p(x) = θ(1 − θ)^(x−1),   (x = 1, 2, . . .).
Examples of use:
• I am determined to buy a £2 lottery ticket every week until I win
a prize. How much do I end up spending?
• I mark a pile of exam scripts. How many do I have to get through
before I find one which has scored full marks on Question 1?

E(X) = 1/θ,   Var(X) = (1 − θ)/θ².
Type II Geometric Distribution

There is also a type II geometric distribution: instead of counting the total number of trials until the first success, this counts the total number of failures before the first success. For the type II geometric we have
p(x) = θ(1 − θ)^x,   (x = 0, 1, . . .),
E(X) = 1/θ − 1,   Var(X) = (1 − θ)/θ².
If X is a standard geometric r.v. then X − 1 is a type II geometric r.v.
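
Worth knowing: R's geometric functions use the type II (failure-counting) convention, so the standard geometric needs a shift of 1. A sketch:

dgeom(3, prob = 0.3)      # type II: P(3 failures before the first success) = 0.3 × 0.7³
dgeom(7 - 1, prob = 0.3)  # standard geometric: P(first success on trial 7)
rgeom(5, prob = 0.3) + 1  # adding 1 converts draws to the standard trial count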

1.3.5 Negative Binomial Distribution

Again we carry out
• a variable number, X,
• of independent and identical Bernoulli trials,
• each with ‘success’ probability θ,
but this time
• we stop as soon as k successes have been observed.
Then X, the number of trials required until we stop, has a negative
binomial distribution with parameters k and θ.
We are in effect adding together k independent geometric random
variables in order to obtain X.
 
p(x) = C(x − 1, k − 1) θ^k (1 − θ)^(x−k),   (x = k, k + 1, . . .),
E(X) = k/θ,   Var(X) = k(1 − θ)/θ².
Negative binomial examples

Examples of use:
• An insurance company wants to analyse the contracts of 20
policyholders who have not made a claim for 2 years. How many
policies do they have to sort through until they get their sample?

Type II Negative binomial

As with the geometric, there is also a type II negative binomial distribution, which counts the total number of failures before the kth success.
For the type II negative binomial we have
p(x) = C(x + k − 1, k − 1) θ^k (1 − θ)^x,   (x = 0, 1, . . .),
E(X) = k(1/θ − 1),   Var(X) = k(1 − θ)/θ².
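
R's negative binomial functions likewise use the type II (failure-counting) convention; a sketch:

dnbinom(3, size = 2, prob = 0.3)       # type II: P(3 failures before the 2nd success)
rnbinom(5, size = 2, prob = 0.3) + 2   # adding k = 2 recovers the trial-counting version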

1.3.6 Hypergeometric Distribution

A bag contains N objects, of which K are considered "successes". Suppose that n objects are taken out of the bag and not replaced; X is the number of "successes" in this sample.
If the objects were replaced each time, X would just be a binomial random variable with parameters n and K/N.
The probability function is
p(x) = C(K, x) C(N − K, n − x) / C(N, n),   (max{0, n − N + K} ≤ x ≤ min{K, n}).

Examples of use:
• An owner of a fishing lake adds 100 carp to the 600 fish already
in the lake. On the next day 25 fish are caught. How many of
them are the new carp?
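
The carp question maps directly onto R's hypergeometric functions (here N = 700, K = 100, n = 25):

dhyper(0:25, m = 100, n = 600, k = 25)    # pf of the number of new carp in the catch
sum(0:25 * dhyper(0:25, 100, 600, 25))    # expectation = 25 × 100/700 ≈ 3.57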

Hypergeometric mean and variance

The expectation and variance are given by
E[X] = nθ,   Var[X] = nθ(1 − θ) × (N − n)/(N − 1),
where we have written θ = K/N to assist comparison with the binomial.
Because items are not replaced, it is less likely that X will be observed
to take very large or very small values, and this results in a reduced
variance compared with the binomial.

1.3.7 Poisson Processes

Suppose that
• events can occur at any time (time being a continuous quantity),
• the probability of an event in time interval (t, t + dt) is λ dt,
• independently of anything that has happened before time t.
• Here λ is assumed to be a constant, the mean number of events
per unit time.
Examples might include buses arriving at a bus stop or SMS messages
being received on a mobile phone or goals being scored in a football
match.
If we define N(t) to be the number of events which have occurred by
time t, then N(t) is called a Poisson process.
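
One standard way to simulate such a process uses the fact, stated here without proof, that the times between successive events are exponential with rate λ. A sketch in R:

lambda <- 1.5
inter <- rexp(50, rate = lambda)     # waiting times between successive events
times <- cumsum(inter)               # the event times of the process
N <- function(t) sum(times <= t)     # N(t) = number of events by time t
N(10)                                # on average about lambda * 10 = 15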

Trajectory of a Poisson Process

[Figure: a sample path of a Poisson process, N(t) (0 to 9) plotted as a step function against t (0 to 10).]
1.3.8 The Poisson distribution

If we observe a Poisson process for time T, the total number of occurrences during this time has a Poisson distribution with parameter µ = λT.
p(x) = e^(−µ) µ^x / x!,   (x = 0, 1, 2, . . .).
Examples of use:
• If Harlequins score an average of 3 tries per game, what is the
probability that they score 2 tries in the first half of their next
game?
• An Accident and Emergency department in a hospital expects an
average of one patient every 5 minutes. What is the probability
that 10 minutes pass without the arrival of a new patient?
• An insurance company normally gets 100 flood damage claims a
month, but suddenly receives 10 claims in one day. How likely is
this?
The expectation and variance are given by
E(X) = µ, Var(X) = µ
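
The first two questions, for instance, are one-line calculations in R (for the tries we assume they arrive at a uniform rate, so µ = 3 × 1/2 for a half):

dpois(2, lambda = 1.5)   # P(2 tries in the first half), µ = 1.5
dpois(0, lambda = 2)     # P(no A&E arrivals in 10 minutes), µ = 2: equals e^(−2) ≈ 0.135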
A Poisson model for horsekick death data

[Bar chart: relative frequency of the nr of deaths (0 to 4), with the Poisson(0.7) probability function overlaid.]
1.3.9 Poisson approximation to binomial

Both Poisson and binomial distributions arise from counting the number of events which have taken place in a fixed interval of time.
The difference is that
• time is discrete for the binomial,
• time is continuous for the Poisson distribution.
If a large number, k, of Bernoulli trials is carried out in the Binomial
model and each has a small success probability θ, the resulting
probability function is similar to that of a Poisson distribution with
mean µ = kθ.
The approximation is acceptable if k ≥ 20 and θ ≤ 0.05.
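
A quick numerical comparison in R shows how close the two probability functions are in this regime:

k <- 50; theta <- 0.02               # k ≥ 20 and θ ≤ 0.05
round(dbinom(0:5, k, theta), 4)      # exact binomial probabilities
round(dpois(0:5, k * theta), 4)      # Poisson approximation with µ = kθ = 1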

1.4 Simulation of discrete random variables

All simulation activities start from the assumption that you have a
supply of independent pseudo-random variables uniformly distributed
on (0, 1).
The most efficient way to generate a set of observations from a distribution with known probability function p(x) is
• to calculate the distribution function F(x) = Σ_{y ≤ x} p(y),
• to generate a sequence U_1, U_2, . . . , U_n of U(0, 1) random variables,
• and to use the formula
X_i = smallest value of x such that F(x) ≥ U_i.

When you need to simulate random variables as part of your coursework, you will find that R has appropriate functions built in.
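
A minimal R sketch of the inverse-transform recipe above (the function name and the truncated Poisson example are illustrative only):

simulate_discrete <- function(n, vals, probs) {
  F <- cumsum(probs)                 # distribution function F(x) at each value
  U <- runif(n)                      # U(0, 1) pseudo-random numbers
  idx <- findInterval(U, F) + 1      # smallest index with F(x) ≥ U_i
  vals[pmin(idx, length(vals))]      # guard against rounding error at the top
}

simulate_discrete(1000, 0:10, dpois(0:10, 0.7))   # ≈ Poisson(0.7), truncated at 10

In practice the built-in samplers (rpois, rbinom, rgeom, ...) or sample(vals, n, replace = TRUE, prob = probs) do the same job.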

1.5 Sums of independent random variables

We shall see later in this module that a primary focus of mathematical statistics is on the distribution of sample statistics such as the sample mean, sample proportions and so on.
In chapter 4 we will encounter the idea of a generating function,
which will enable us to work out the distribution of a sample mean
given the distribution of the original observations – but only in certain
cases. In the present section we will consider the distribution of a sum
of random variables from the binomial, Poisson or geometric
distributions.

1.5.1 The concept of i.i.d. random variables

An important statistical idea is that a data set consists of a collection of observed values, all taken from the same distribution and all independent of each other. In that case we call the observations independent and identically distributed, or i.i.d.

1.5.2 Sum of i.i.d. Binomial variables

Suppose X_1, . . . , X_n are i.i.d. Binomial(k, θ) random variables. Then
• X_1 is the number of successes when k independent trials are carried out, each with success probability θ.
• The same applies to X_2 and to X_3, and so on.
Then Σ_{i=1}^n X_i is the total number of successes when nk independent trials are carried out.
We deduce that
Σ_{i=1}^n X_i ∼ Bin(nk, θ).
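
This is easy to confirm empirically in R (a simulation check of ours, not part of the slides):

sums <- replicate(1e4, sum(rbinom(5, size = 3, prob = 0.4)))   # n = 5, k = 3, θ = 0.4
c(mean(sums), var(sums))    # ≈ 6 and 3.6, matching Bin(15, 0.4)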

1.5.3 Sum of i.i.d. Geometric variables

Suppose X_1, . . . , X_n are i.i.d. Geometric(θ) random variables. We interpret this in the context of Bernoulli trials.
• X_1 is the number of trials required to achieve the first success;
• if we restart the sequence of trials immediately, X_2 is the number of trials required to achieve the next success;
• adding them together, X_1 + X_2 is the number of trials required to achieve 2 successes.
This can be continued indefinitely: Σ_{i=1}^n X_i is the number of trials required to achieve n successes. Therefore
Σ_{i=1}^n X_i ∼ NegBin(n, θ).
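
The same kind of check works here, remembering R's failure-counting convention:

sums <- replicate(1e4, sum(rgeom(4, prob = 0.3) + 1))   # sum of n = 4 standard geometrics
mean(sums)                                              # ≈ n/θ = 4/0.3 ≈ 13.3
mean(rnbinom(1e4, size = 4, prob = 0.3) + 4)            # type II negbin + n: same mean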

1.5.4 Sum of i.i.d. Poisson variables

Suppose X_1, . . . , X_n are i.i.d. Poisson(λ) random variables.
We can suppose that a rate λ Poisson process is running and that
• X_1 is the number of events which take place in the time interval (0, 1],
• X_2 the number in the interval (1, 2], . . .,
• X_n the number of events in (n − 1, n].
If we add all the X_i together we are just counting the total number of events which took place between time 0 and time n. We therefore have
Σ_{i=1}^n X_i ∼ Poisson(nλ).
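
And the Poisson case, checked the same way:

s <- replicate(1e4, sum(rpois(6, 1.2)))   # n = 6, λ = 1.2
c(mean(s), var(s))                        # both ≈ nλ = 7.2, as for Poisson(7.2)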

1.6 Summary

In this chapter we have defined the concept of a discrete random variable and have looked at several discrete distributions: the discrete uniform, the Bernoulli, the binomial, the Poisson, the geometric, the negative binomial and the hypergeometric.
We have also considered the distribution of a sum of independent, identically distributed random variables taken from standard discrete distributions.
