L5 Notes
Jonathan Marchini
November 10, 2008
Introduction
Figure 2: Histogram of birth times per hour of the babies in the Babyboom dataset
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, 3, 4, \ldots$$
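The formula can be typed in directly and checked against R's built-in Poisson probability function `dpois()`. This is a minimal sketch: the helper name `pois_pmf` is hypothetical and not part of the notes, and the rate 1.8 is borrowed from the first example below.

```r
# Hypothetical helper: P(X = x) = exp(-lambda) * lambda^x / x!
pois_pmf <- function(x, lambda) {
  exp(-lambda) * lambda^x / factorial(x)
}

x <- 0:10
lambda <- 1.8

# Hand-coded formula versus R's built-in Poisson probability function
manual  <- pois_pmf(x, lambda)
builtin <- dpois(x, lambda)
print(round(cbind(x, manual, builtin), 5))
```

The two columns agree, and the probabilities over all x = 0, 1, 2, ... sum to 1.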
2.1 Examples
Births in a hospital occur randomly at an average rate of 1.8 births per hour.
What is the probability of observing 4 births in a given hour at the hospital?
Let $X$ = No. of births in a given hour.
(i) Events occur randomly
(ii) Mean rate $\lambda = 1.8$
so $X \sim \mathrm{Po}(1.8)$.
We can now use the formula to calculate the probability of observing exactly 4
births in a given hour
$$P(X = 4) = \frac{e^{-1.8}\,1.8^{4}}{4!} = 0.0723$$
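The same number can be obtained in R, either by typing the formula or with `dpois()`. A short sketch; the variable names are illustrative only.

```r
lambda <- 1.8   # mean number of births per hour

# P(X = 4) from the formula and from the built-in function
p4_formula <- exp(-lambda) * lambda^4 / factorial(4)
p4_dpois   <- dpois(4, lambda)

# Both values round to 0.0723
print(round(c(formula = p4_formula, dpois = p4_dpois), 4))
```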
What about the probability of observing more than or equal to 2 births in a given
hour at the hospital?
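We want $P(X \geq 2) = P(X = 2) + P(X = 3) + \cdots$, which has infinitely many terms, so it is easier to work with the complement:
$$\begin{aligned}
P(X \geq 2) &= 1 - P(X < 2) \\
&= 1 - [P(X = 0) + P(X = 1)] \\
&= 1 - e^{-1.8}\left(\frac{1.8^{0}}{0!} + \frac{1.8^{1}}{1!}\right) \\
&= 1 - e^{-1.8}(1 + 1.8) \\
&\approx 0.5372
\end{aligned}$$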
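The complement trick can be checked in R. The sketch below assumes only the base `stats` functions `dpois()` and `ppois()`, where `ppois(q, lambda)` returns the cumulative probability P(X ≤ q).

```r
lambda <- 1.8

# Complement rule: P(X >= 2) = 1 - [P(X = 0) + P(X = 1)]
p_ge2_sum   <- 1 - sum(dpois(0:1, lambda))
# Same thing via the cumulative probability P(X <= 1)
p_ge2_ppois <- 1 - ppois(1, lambda)
# lower.tail = FALSE returns the upper tail P(X > 1) directly
p_ge2_upper <- ppois(1, lambda, lower.tail = FALSE)

print(round(c(p_ge2_sum, p_ge2_ppois, p_ge2_upper), 4))
```

The `lower.tail = FALSE` form avoids the subtraction and is the more accurate choice when the upper tail is very small.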
# Plot the Po(3), Po(5) and Po(10) probabilities side by side
par(mfrow = c(1, 3))
plot(0:20, dpois(0:20, 3), type = "h", ylim = c(0, 0.25), xlab = "X", main = "Po(3)",
     ylab = "P(X)", lwd = 3, cex.lab = 1.5, cex.axis = 2, cex.main = 2)
plot(0:20, dpois(0:20, 5), type = "h", ylim = c(0, 0.25), xlab = "X", main = "Po(5)",
     ylab = "P(X)", lwd = 3, cex.lab = 1.5, cex.axis = 2, cex.main = 2)
plot(0:20, dpois(0:20, 10), type = "h", ylim = c(0, 0.25), xlab = "X", main = "Po(10)",
     ylab = "P(X)", lwd = 3, cex.lab = 1.5, cex.axis = 2, cex.main = 2)

Using the formula we can calculate the probabilities for a specific Poisson distribution
and plot them to observe the shape of the distribution. For example,
Figure 5 shows 3 different Poisson distributions. We observe that the distributions
are
(i) unimodal
(ii) positively skewed (with a skew that decreases as $\lambda$ increases)
(iii) centered roughly on $\lambda$
(iv) more spread out as $\lambda$ increases (the variance increases with $\lambda$)
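Observation (ii) can be checked numerically from the same probabilities used in the plots. This sketch assumes that truncating the support at x = 100 is harmless for these values of λ (the omitted tail mass is negligible), and the helper `pois_skew` is hypothetical. For reference, the skewness of a Po(λ) distribution is known to equal 1/√λ.

```r
x <- 0:100                 # truncated support; P(X > 100) is negligible here
lambdas <- c(3, 5, 10)

# Hypothetical helper: third standardised moment computed from dpois()
pois_skew <- function(lambda) {
  p <- dpois(x, lambda)
  m <- sum(x * p)                     # mean
  s <- sqrt(sum((x - m)^2 * p))       # standard deviation
  sum(((x - m) / s)^3 * p)            # skewness
}

skews <- sapply(lambdas, pois_skew)
print(round(rbind(lambda = lambdas, skewness = skews, inv_sqrt = 1 / sqrt(lambdas)), 3))
```

The skewness is positive for every λ and shrinks as λ grows, matching the shapes in the three panels.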
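In general, there is a formula for the mean, $\mu$, of a Poisson distribution. There is also
a formula for the standard deviation, $\sigma$, and variance, $\sigma^2$.
If $X \sim \mathrm{Po}(\lambda)$ then
$$\mu = \lambda, \qquad \sigma = \sqrt{\lambda}, \qquad \sigma^2 = \lambda$$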
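These formulas can be confirmed by computing the mean and variance directly from the probabilities. A sketch, again assuming the support can be truncated at x = 100 without visible error; the set of λ values is an arbitrary choice.

```r
x <- 0:100                     # truncated support; the tail beyond 100 is negligible here
lambdas <- c(1.8, 3, 5, 10)

# Mean and variance computed directly from the Poisson probabilities
moments <- sapply(lambdas, function(lambda) {
  p  <- dpois(x, lambda)
  mu <- sum(x * p)
  c(lambda = lambda, mean = mu, variance = sum((x - mu)^2 * p))
})

print(round(t(moments), 4))
```

Each row shows mean and variance both equal to λ, so the standard deviation is √λ.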
$$P(Y = 5) = \frac{e^{-3.6}\,3.6^{5}}{5!} = 0.13768$$
This example illustrates the following rule:
If $X \sim \mathrm{Po}(\lambda)$ on 1 unit interval,
then $Y \sim \mathrm{Po}(k\lambda)$ on $k$ unit intervals.
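One way to see the rule is by convolution: the count over k = 2 independent one-hour intervals is the sum of two independent Po(λ) counts. This sketch assumes, as 3.6 = 2 × 1.8 suggests, that the P(Y = 5) example counts births over 2 hours at the hospital rate of 1.8 per hour; the helper `conv_prob` is hypothetical.

```r
lambda <- 1.8   # births per hour
k <- 2          # number of one-hour intervals (assumed for the P(Y = 5) example)

# Hypothetical helper: P(X1 + X2 = y) for independent X1, X2 ~ Po(lambda),
# obtained by summing over every possible value j of X1
conv_prob <- function(y, lambda) {
  j <- 0:y
  sum(dpois(j, lambda) * dpois(y - j, lambda))
}

p_conv <- conv_prob(5, lambda)     # built up from two one-hour counts
p_rule <- dpois(5, k * lambda)     # from Y ~ Po(k * lambda)

print(round(c(convolution = p_conv, rule = p_rule), 5))
```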
Now suppose we know that in hospital A births occur randomly at an average rate
of 2.3 births per hour and in hospital B births occur randomly at an average rate
of 3.1 births per hour.
What is the probability that we observe 7 births in total from the two hospitals in a given hour?
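Let $X$ and $Y$ be the numbers of births in a given hour at hospitals A and B, so $X \sim \mathrm{Po}(2.3)$ and $Y \sim \mathrm{Po}(3.1)$. Assuming births at the two hospitals occur independently, $X + Y \sim \mathrm{Po}(2.3 + 3.1) = \mathrm{Po}(5.4)$, and
$$P(X + Y = 7) = \frac{e^{-5.4}\,5.4^{7}}{7!} = 0.11999$$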
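The sum rule can be checked by adding up every way of splitting 7 births between the two hospitals. A sketch under the same independence assumption; the variable names are illustrative.

```r
lambda_A <- 2.3   # hospital A, births per hour
lambda_B <- 3.1   # hospital B, births per hour

# Sum rule for independent Poisson counts: X + Y ~ Po(lambda_A + lambda_B)
p_sum_rule <- dpois(7, lambda_A + lambda_B)

# Direct check: sum P(X = j) * P(Y = 7 - j) over all splits of the 7 births
j <- 0:7
p_direct <- sum(dpois(j, lambda_A) * dpois(7 - j, lambda_B))

print(round(c(sum_rule = p_sum_rule, direct = p_direct), 5))
```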
The Binomial and Poisson distributions are both discrete probability distributions.
In some circumstances the distributions are very similar. For example, consider
the Bin(100, 0.02) and Po(2) distributions shown in Figure 6. Visually these distributions are almost indistinguishable.
In general,
If $n$ is large (say $n > 50$) and $p$ is small (say $p < 0.1$) then a $\mathrm{Bin}(n, p)$
can be approximated with a $\mathrm{Po}(\lambda)$ where $\lambda = np$.
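The closeness of the two distributions in Figure 6 can be quantified by tabulating both sets of probabilities. A sketch; the range x = 0, ..., 10 is an arbitrary choice that covers almost all of the probability.

```r
x <- 0:10
binom <- dbinom(x, size = 100, prob = 0.02)   # exact Bin(100, 0.02)
pois  <- dpois(x, lambda = 100 * 0.02)        # approximating Po(np) = Po(2)

print(round(cbind(x, binom, pois, diff = binom - pois), 4))
cat("largest absolute difference:", signif(max(abs(binom - pois)), 3), "\n")
```

No probability differs by more than about 0.003, which is why the two plots look the same.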
The idea of using one distribution to approximate another is widespread in statistics, and one we will meet again. In many situations it is extremely difficult to use the exact distribution, so approximations are very useful.
Example
Given that 5% of a population are left-handed, use the Poisson distribution to estimate the probability that a random sample of 100 people contains 2 or more
left-handed people.
$X$ = No. of left-handed people in a sample of 100
$X \sim \mathrm{Bin}(100, 0.05)$
Poisson approximation: $X \sim \mathrm{Po}(\lambda)$ with $\lambda = 100 \times 0.05 = 5$
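We want $P(X \geq 2)$:
$$\begin{aligned}
P(X \geq 2) &= 1 - P(X < 2) \\
&= 1 - [P(X = 0) + P(X = 1)] \\
&\approx 1 - \left(e^{-5}\frac{5^{0}}{0!} + e^{-5}\frac{5^{1}}{1!}\right) \\
&\approx 1 - 0.040428 \\
&\approx 0.959572
\end{aligned}$$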
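Because the exact binomial answer is easy to get in R, the quality of the approximation can be checked here. A sketch; the variable names are illustrative.

```r
n <- 100
p <- 0.05

# Exact binomial answer: P(X >= 2) = 1 - P(X <= 1)
exact <- 1 - pbinom(1, size = n, prob = p)
# Poisson approximation with lambda = n * p = 5
approx_pois <- 1 - ppois(1, lambda = n * p)

print(round(c(exact = exact, poisson = approx_pois), 6))
```

The approximation 0.959572 differs from the exact binomial probability (about 0.9629) only in the third decimal place.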
Consider the two sequences of birth times we saw in Section 1. Both of these
examples consisted of a total of 44 births over 24 one-hour intervals.
Therefore the mean birth rate for both sequences is
$$\frac{44}{24} = 1.8333 \text{ births per hour}$$
What would be the expected counts if birth times were really random, i.e. what is
the expected histogram for a Poisson random variable with mean rate $\lambda = 1.8333$?
Using the Poisson formula we can calculate the probabilities of obtaining each
possible value:
x          0        1        2        3        4        5        ≥ 6
P(X = x)   0.15989  0.29312  0.26869  0.16419  0.07525  0.02759  0.01127
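The probabilities can be reproduced in R. The final category collects all values of 6 or more, so its probability is 1 minus the sum of the others; this sketch uses the rounded rate 1.8333 so that the output matches the notes to five decimal places.

```r
lambda <- 1.8333   # mean rate 44 / 24, rounded as in the notes
x <- 0:5

p <- dpois(x, lambda)                              # P(X = 0), ..., P(X = 5)
p_tail <- ppois(5, lambda, lower.tail = FALSE)     # P(X >= 6) = 1 - P(X <= 5)

probs <- c(p, p_tail)
names(probs) <- c(as.character(x), ">=6")
print(round(probs, 5))
```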
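Multiplying each probability by 24, the number of one-hour intervals, gives the expected counts:
x          0      1      2      3      4      5      ≥ 6
Expected   3.837  7.035  6.448  3.941  1.806  0.662  0.271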
x          4      5      ≥ 6
Expected   1.806  0.662  0.271
Observed   3      0      0
When we compare the expected frequencies to those observed from the nonrandom clustered sequence in Section 1 we see that there is much less agreement.
x          0      1      2      3      4      5      ≥ 6
Expected   3.837  7.035  6.448  3.941  1.806  0.662  0.271
Observed   12     3      0      2      2      4      1
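The comparison for the clustered sequence can be laid out in R by computing the expected counts and subtracting them from the observed counts in the table. A sketch that uses the rounded rate 1.8333 from the notes; it only tabulates the differences and does not perform the formal test mentioned below.

```r
lambda <- 1.8333          # mean rate 44 / 24, as in the notes
n_intervals <- 24         # number of one-hour intervals

# Poisson probabilities for x = 0, ..., 5 and the tail x >= 6
probs <- c(dpois(0:5, lambda), ppois(5, lambda, lower.tail = FALSE))
expected <- n_intervals * probs

# Observed counts for the clustered sequence, from the table
observed <- c(12, 3, 0, 2, 2, 4, 1)

comparison <- rbind(expected   = round(expected, 3),
                    observed   = observed,
                    difference = round(observed - expected, 3))
colnames(comparison) <- c(0:5, ">=6")
print(comparison)
```

The large excess at x = 0 and x = 5, and the shortfall at x = 1 and x = 2, are what "much less agreement" refers to.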
In Lecture 7 we will see how we can formally test for a difference between the
expected and observed counts. For now it is enough just to know how to fit a
distribution.