geometric distribution cheat sheet

A quick guide to the geometric distribution

Here's a quick summary of everything you could possibly need to know about the geometric distribution.

When do I use it?

Use the geometric distribution if you're running independent trials, each one can have a success or failure, the probability of success is the same for each trial, and you're interested in how many trials are needed to get the first successful outcome.

How do I calculate probabilities?

Use the following handy formulae. $p$ is the probability of success in a trial, $q = 1 - p$, and $X$ is the number of
trials needed in order to get the first successful outcome. We say $X \sim \mathrm{Geo}(p)$.

$P(X = r) = pq^{r-1}$: the probability of the first success being in the $r$th trial.
$P(X > r) = q^r$: the probability that you'll need more than $r$ trials to get your first success.
$P(X \le r) = 1 - q^r$: the probability that you'll need $r$ trials or fewer to get your first success.
What about the expectation and variance?
Just use the following:
$E(X) = 1/p$, $\mathrm{Var}(X) = q/p^2$
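The three probability formulas and the two moments can be sketched in Python as below. This is a minimal illustration rather than part of the original sheet: the function names and the value $p = 1/5$ are assumptions, and exact Fraction arithmetic is used so that $P(X \le r) = 1 - q^r$ can be checked against a sum of individual probabilities without rounding error.

```python
from fractions import Fraction

def geo_pmf(r, p):
    """P(X = r) = p * q**(r - 1): the first success happens on trial r."""
    q = 1 - p
    return p * q ** (r - 1)

def geo_more_than(r, p):
    """P(X > r) = q**r: the first r trials are all failures."""
    return (1 - p) ** r

def geo_at_most(r, p):
    """P(X <= r) = 1 - q**r."""
    return 1 - (1 - p) ** r

def geo_mean_var(p):
    """E(X) = 1/p and Var(X) = q/p**2."""
    q = 1 - p
    return 1 / p, q / p ** 2

p = Fraction(1, 5)  # hypothetical probability of success in a trial

# Summing P(X = r) for r = 1..4 matches the closed form for P(X <= 4)
assert sum(geo_pmf(r, p) for r in range(1, 5)) == geo_at_most(4, p)

print(geo_pmf(3, p))        # 16/125
print(geo_more_than(3, p))  # 64/125
print(geo_mean_var(p))
```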
Your quick guide to the binomial distribution
Here's a quick summary of everything you could possibly need to know about the binomial distribution.

When do I use it?

Use the binomial distribution if you're running a fixed number of independent trials, each one can have a success
or failure, the probability of success is the same for each trial, and you're interested in the number of successes or failures.

How do I calculate probabilities?

$P(X = r) = {}^nC_r \, p^r q^{n-r}$
${}^nC_r = \dfrac{n!}{r!\,(n - r)!}$

where $p$ is the probability of success in a trial, $q = 1 - p$, $n$ is the number of trials, and $X$ is the number of
successes in the $n$ trials.

What about the expectation and variance?

$E(X) = np$, $\mathrm{Var}(X) = npq$
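A short Python sketch of the binomial formula, assuming a hypothetical $n = 10$ and $p = 0.3$. It builds the whole distribution with math.comb (which computes ${}^nC_r$) and checks that the mean and variance computed directly from the probabilities agree with $np$ and $npq$.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) = nCr * p**r * q**(n - r)."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)

def binom_mean_var(n, p):
    """E(X) = np and Var(X) = npq."""
    return n * p, n * p * (1 - p)

n, p = 10, 0.3  # hypothetical: 10 trials, success probability 0.3
pmf = [binom_pmf(r, n, p) for r in range(n + 1)]

# Moments computed directly from the distribution, for comparison
mean = sum(r * pr for r, pr in enumerate(pmf))
var = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))
print(round(mean, 10), round(var, 10))  # 3.0 2.1
```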
Your quick guide to the Poisson distribution
Here's a quick summary of everything you could possibly need to know about the Poisson distribution.

When do I use it?

Use the Poisson distribution if you have independent events, such as malfunctions, occurring in a given interval,
and you know $\lambda$, the mean number of occurrences in a given interval. You're interested in the number of
occurrences in one particular interval.

How do I calculate probabilities, and the expectation and variance?

$P(X = r) = \dfrac{e^{-\lambda}\lambda^r}{r!}$, $E(X) = \lambda$, $\mathrm{Var}(X) = \lambda$
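As a rough check on these formulas, the sketch below (with a hypothetical $\lambda = 2.5$) computes the Poisson probabilities directly and confirms that the mean and the variance both come out as $\lambda$. Truncating the infinite sum at $r = 59$ is an assumption of the sketch; the remaining tail is negligibly small for this $\lambda$.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) = e**(-lam) * lam**r / r!."""
    return exp(-lam) * lam ** r / factorial(r)

lam = 2.5  # hypothetical mean number of occurrences per interval

# Truncate at r = 59; the probability beyond that is negligible here
pmf = [poisson_pmf(r, lam) for r in range(60)]
mean = sum(r * pr for r, pr in enumerate(pmf))
var = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))

# Both the mean and the variance come out equal to lambda
print(round(mean, 6), round(var, 6))  # 2.5 2.5
```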

How do I combine independent random variables?

If $X \sim \mathrm{Po}(\lambda_x)$ and $Y \sim \mathrm{Po}(\lambda_y)$ are independent, then $X + Y \sim \mathrm{Po}(\lambda_x + \lambda_y)$.
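The rule that independent Poisson variables add, $X + Y \sim \mathrm{Po}(\lambda_x + \lambda_y)$, can be checked numerically by summing $P(X = k)P(Y = r - k)$ over every way of splitting $r$ between the two variables. A minimal Python sketch, with hypothetical rates $\lambda_x = 1.5$ and $\lambda_y = 2$:

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam ** r / factorial(r)

def pmf_of_sum(r, lam_x, lam_y):
    """P(X + Y = r) for independent X, Y: sum P(X = k) * P(Y = r - k)."""
    return sum(poisson_pmf(k, lam_x) * poisson_pmf(r - k, lam_y)
               for k in range(r + 1))

lam_x, lam_y = 1.5, 2.0  # hypothetical rates

for r in range(6):
    # The convolution matches Po(lam_x + lam_y) term by term
    assert abs(pmf_of_sum(r, lam_x, lam_y) - poisson_pmf(r, lam_x + lam_y)) < 1e-12
```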

What connection does it have to the binomial distribution?

If $X \sim B(n, p)$, where $n$ is large and $p$ is small, then $X$ can be approximated using

$X \sim \mathrm{Po}(np)$
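To see the approximation at work, this sketch compares exact $B(n, p)$ probabilities with $\mathrm{Po}(np)$ probabilities for a hypothetical $n = 500$ and $p = 0.004$, so that $np = 2$. Each printed pair should be close; making $p$ larger for the same $np$ widens the gap.

```python
from math import comb, exp, factorial

def binom_pmf(r, n, p):
    """Exact binomial probability P(X = r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(r, lam):
    """Poisson probability P(X = r)."""
    return exp(-lam) * lam ** r / factorial(r)

n, p = 500, 0.004  # hypothetical: n large, p small, np = 2

for r in range(5):
    exact = binom_pmf(r, n, p)
    approx = poisson_pmf(r, n * p)  # X ~ Po(np)
    print(r, round(exact, 4), round(approx, 4))
```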

■ The geometric distribution applies when you run a series of independent trials, there can be either a success or failure for each trial, the probability of success is the same for each trial, and the main thing you're interested in is how many trials are needed in order to get your first success.
■ If the conditions are met for the geometric distribution, $X$ is the number of trials needed to get the first successful outcome, and $p$ is the probability of success in a trial, then $X \sim \mathrm{Geo}(p)$.
■ The following probabilities apply if $X \sim \mathrm{Geo}(p)$:
  $P(X = r) = pq^{r-1}$
  $P(X > r) = q^r$
  $P(X \le r) = 1 - q^r$
■ If $X \sim \mathrm{Geo}(p)$ then $E(X) = 1/p$ and $\mathrm{Var}(X) = q/p^2$.

■ The binomial distribution applies when you run a fixed, finite number of independent trials, there can be either a success or failure for each trial, the probability of success is the same for each trial, and the main thing you're interested in is the number of successes in the $n$ independent trials.
■ If the conditions are met for the binomial distribution, $X$ is the number of successful outcomes out of $n$ trials, and $p$ is the probability of success in a trial, then $X \sim B(n, p)$.
■ If $X \sim B(n, p)$, you can calculate probabilities using $P(X = r) = {}^nC_r \, p^r q^{n-r}$, where ${}^nC_r = \dfrac{n!}{r!\,(n - r)!}$.
■ If $X \sim B(n, p)$, then $E(X) = np$ and $\mathrm{Var}(X) = npq$.

■ The Poisson distribution applies when individual events occur at random and independently in a given interval, you know the mean number of occurrences in the interval or the rate of occurrences and this is finite, and you want to know the number of occurrences in a given interval.
■ If the conditions are met for the Poisson distribution, $X$ is the number of occurrences in a particular interval, and $\lambda$ is the rate of occurrences, then $X \sim \mathrm{Po}(\lambda)$.
■ If $X \sim \mathrm{Po}(\lambda)$ then $P(X = r) = \dfrac{e^{-\lambda}\lambda^r}{r!}$, $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$.
■ If $X \sim \mathrm{Po}(\lambda_x)$, $Y \sim \mathrm{Po}(\lambda_y)$ and $X$ and $Y$ are independent, then $X + Y \sim \mathrm{Po}(\lambda_x + \lambda_y)$.
■ If $X \sim B(n, p)$ where $n$ is large and $p$ is small, you can approximate it with $X \sim \mathrm{Po}(np)$.
■ The normal distribution forms the shape of a symmetrical bell curve. It's defined using $N(\mu, \sigma^2)$.
■ To find normal probabilities, start by identifying the probability range you need. Then find the standard score for the limit of this range using
  $z = \dfrac{x - \mu}{\sigma}$, where $Z \sim N(0, 1)$.
■ You find normal probabilities by looking up your standard score in probability tables. Probability tables give you the probability of getting this value or lower.
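Probability tables are lookups of the standard normal cumulative distribution function $\Phi(z) = P(Z \le z)$. A minimal Python stand-in uses the standard identity $\Phi(z) = \tfrac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$ with math.erf; the function names and the example $X \sim N(100, 15^2)$ are hypothetical.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF P(Z <= z): the value a probability table gives."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob_below(x, mu, sigma2):
    """P(X < x) for X ~ N(mu, sigma2): standardise, then look up phi."""
    z = (x - mu) / sqrt(sigma2)
    return phi(z)

# Hypothetical example: X ~ N(100, 225), so sigma = 15; find P(X < 130)
print(round(normal_prob_below(130, 100, 225), 4))  # 0.9772

# A range such as P(85 < X < 130) is the difference of two lookups
print(round(normal_prob_below(130, 100, 225) - normal_prob_below(85, 100, 225), 4))
```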


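■ If $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$, and $X$ and $Y$ are independent, then

  $X - Y \sim N(\mu_x - \mu_y, \sigma_x^2 + \sigma_y^2)$

■ If $X \sim N(\mu, \sigma^2)$ and $a$ and $b$ are numbers, then

  $aX + b \sim N(a\mu + b, a^2\sigma^2)$

■ If $X_1, X_2, \ldots, X_n$ are independent observations of $X$ where $X \sim N(\mu, \sigma^2)$, then

  $X_1 + X_2 + \cdots + X_n \sim N(n\mu, n\sigma^2)$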
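These combination rules only change the mean and the variance, so a sketch can track each normal variable as a (mean, variance) pair. The helper names and the example $X \sim N(50, 9)$, $Y \sim N(45, 16)$ are hypothetical; the last step uses the rule for $X - Y$ to find $P(X > Y) = P(X - Y > 0)$.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Each normal variable is stored as a (mean, variance) pair
def difference(x, y):
    """X - Y for independent X, Y: the means subtract but the variances ADD."""
    return x[0] - y[0], x[1] + y[1]

def linear(x, a, b):
    """aX + b: mean a*mu + b, variance a**2 * sigma**2 (b doesn't affect spread)."""
    return a * x[0] + b, a ** 2 * x[1]

def sum_of_observations(x, n):
    """X1 + ... + Xn for n independent observations of X: N(n*mu, n*sigma**2)."""
    return n * x[0], n * x[1]

X, Y = (50, 9), (45, 16)  # hypothetical: X ~ N(50, 9), Y ~ N(45, 16)
d = difference(X, Y)      # X - Y ~ N(5, 25)

# P(X > Y) = P(X - Y > 0) = 1 - P(X - Y <= 0)
prob = 1 - phi((0 - d[0]) / sqrt(d[1]))
print(d, round(prob, 4))  # (5, 25) 0.8413
```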
It's time to test your statistical knowledge. Complete the table below, saying what normal
distribution suits each situation, and what conditions there are.

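Situation | Distribution | Condition
$X + Y$, where $X \sim N(\mu_x, \sigma_x^2)$, $Y \sim N(\mu_y, \sigma_y^2)$ | $X + Y \sim N(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)$ | $X$, $Y$ are independent
$X - Y$, where $X \sim N(\mu_x, \sigma_x^2)$, $Y \sim N(\mu_y, \sigma_y^2)$ | $X - Y \sim N(\mu_x - \mu_y, \sigma_x^2 + \sigma_y^2)$ | $X$, $Y$ are independent
$aX + b$, where $X \sim N(\mu, \sigma^2)$ | $aX + b \sim N(a\mu + b, a^2\sigma^2)$ | $a$, $b$ are constant values
$X_1 + X_2 + \cdots + X_n$, where $X \sim N(\mu, \sigma^2)$ | $X_1 + X_2 + \cdots + X_n \sim N(n\mu, n\sigma^2)$ | $X_1, X_2, \ldots, X_n$ are independent observations of $X$
Normal approximation of $X$, where $X \sim B(n, p)$ | $X \sim N(np, npq)$ | $np > 5$, $nq > 5$; continuity correction required
Normal approximation of $X$, where $X \sim \mathrm{Po}(\lambda)$ | $X \sim N(\lambda, \lambda)$ | $\lambda > 15$; continuity correction required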
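The continuity correction in the binomial row can be illustrated numerically. This Python sketch, with hypothetical $n = 40$ and $p = 0.5$ (so $np = nq = 20$), compares the exact $P(X \le 15)$ with the normal approximation, treating the discrete value 15 as the interval up to 15.5.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 40, 0.5  # hypothetical; np = 20 and nq = 20, both > 5
q = 1 - p

# Exact binomial P(X <= 15)
exact = sum(comb(n, r) * p ** r * q ** (n - r) for r in range(16))

# X ~ N(np, npq); the value 15 covers [14.5, 15.5], so P(X <= 15) -> P(X < 15.5)
approx = phi((15.5 - n * p) / sqrt(n * p * q))
print(round(exact, 4), round(approx, 4))
```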

bullet points and no dumb questions


■ In particular circumstances you can use the normal distribution to approximate the Poisson.
■ If $X \sim \mathrm{Po}(\lambda)$ and $\lambda > 15$ then you can approximate $X$ using $X \sim N(\lambda, \lambda)$.
■ If you're approximating the Poisson distribution with the normal distribution, then you need to apply a continuity correction to make sure your results are accurate.
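For the Poisson case, a hypothetical $\lambda = 25$ gives a feel for why the correction matters. The sketch compares the exact $P(X \le 20)$ with the normal approximation $N(\lambda, \lambda)$, once with the continuity correction (boundary 20.5) and once without it.

```python
from math import erf, exp, factorial, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

lam = 25  # hypothetical; lam > 15, so X ~ N(25, 25) is reasonable

# Exact Poisson P(X <= 20)
exact = sum(exp(-lam) * lam ** r / factorial(r) for r in range(21))

approx = phi((20.5 - lam) / sqrt(lam))     # with continuity correction
uncorrected = phi((20 - lam) / sqrt(lam))  # without it
print(round(exact, 4), round(approx, 4), round(uncorrected, 4))
```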
■ A population is the entire collection of things you are studying.
■ A sample is a relatively small selection taken from the population that you can use to draw conclusions about the population itself.
■ To take a sample, start off by defining your target population, the population you want to study. Then decide on your sampling units, the sorts of things you need to sample. Once you've done that, draw up a sampling frame, a list of all the sampling units in your target population.
■ A sample is biased if it isn't representative of your target population.
■ Simple random sampling is where you choose sampling units at random to form your sample. This can be with or without replacement. You can perform simple random sampling by drawing lots or using random number generators.
■ Stratified sampling is where you divide the population into groups of similar units or strata. Each stratum is as different from the others as possible. Once you've done this, you perform simple random sampling within each stratum.
■ Cluster sampling is where you divide the population into clusters where each cluster is as similar to the others as possible. You use simple random sampling to choose a selection of clusters. You then sample every unit in these clusters.
■ Systematic sampling is where you choose a number, $k$, and sample every $k$th unit.

■ A point estimator is an estimate for the value of a population parameter, derived from sample data.
■ The ^ symbol is added to the population parameter when you're talking about its point estimator. As an example, the point estimator for $\mu$ is $\hat{\mu}$.
■ The mean of a sample is represented as $\bar{x}$. To find the mean of the sample, use the formula

  $\bar{x} = \dfrac{\sum x}{n}$

  where $x$ represents the values in the sample, and $n$ is the sample size.
■ The point estimator for the population mean is found by calculating $\bar{x}$. In other words, $\hat{\mu} = \bar{x}$. This means that if you want a good estimate for the true value of the population mean, you can use the mean of the sample.

■ The point estimator for the population variance is given by $\hat{\sigma}^2 = s^2$, where $s^2$ is given by

  $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

■ The population proportion is represented using $p$. It's the proportion of successes within the population.
■ The point estimator for $p$ is given by $p_s$, where $p_s$ is the proportion of successes in the sample.
■ You calculate $p_s$ by dividing the number of successes in the sample by the size of the sample:

  $p_s = \dfrac{\text{number of successes}}{\text{number in sample}}$
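The point estimators translate directly into code. In this sketch the sample values and the success/failure outcomes are hypothetical; it also cross-checks against Python's statistics module, whose variance function uses the same $n - 1$ divisor as $s^2$.

```python
from statistics import mean, variance

sample = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3]  # hypothetical sample data
n = len(sample)

mu_hat = sum(sample) / n                               # x-bar = sum(x) / n
s2 = sum((x - mu_hat) ** 2 for x in sample) / (n - 1)  # divide by n - 1, not n

# The standard library agrees; statistics.variance also divides by n - 1
assert abs(mu_hat - mean(sample)) < 1e-12
assert abs(s2 - variance(sample)) < 1e-12

outcomes = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical: 1 = success, 0 = failure
p_s = sum(outcomes) / len(outcomes)  # number of successes / number in sample
print(round(mu_hat, 4), round(s2, 4), p_s)  # 12.05 0.075 0.625
```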

■ The sampling distribution of proportions is what you get if you consider all possible samples of size $n$ taken from the same population and form a distribution out of their proportions. We use $P_s$ to represent the sample proportion random variable.
■ The expectation and variance of $P_s$ are defined as

  $E(P_s) = p$, $\mathrm{Var}(P_s) = \dfrac{pq}{n}$

  where $p$ is the population proportion.
■ The standard error of proportion is the standard deviation of this distribution. It's given by

  $\sqrt{\mathrm{Var}(P_s)} = \sqrt{\dfrac{pq}{n}}$

■ If $n > 30$, then $P_s$ approximately follows a normal distribution, so

  $P_s \sim N\left(p, \dfrac{pq}{n}\right)$

  for large $n$. When working with this, you need to apply a continuity correction of $\pm\dfrac{1}{2n}$.
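Because the number of successes in a sample of size $n$ follows $B(n, p)$, the sampling distribution of $P_s = X/n$ can be built exactly rather than by simulation. The sketch below does that for a hypothetical $p = 0.3$ and $n = 50$, checks $E(P_s) = p$ and $\mathrm{Var}(P_s) = pq/n$, and applies a $\frac{1}{2n}$ continuity correction when approximating $P(P_s \le 0.2)$ with $N(p, pq/n)$.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p, n = 0.3, 50  # hypothetical population proportion and sample size
q = 1 - p

# P_s = X / n, where X ~ B(n, p) is the number of successes in the sample
dist = {x / n: comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)}

e_ps = sum(v * pr for v, pr in dist.items())
var_ps = sum((v - e_ps) ** 2 * pr for v, pr in dist.items())
assert abs(e_ps - p) < 1e-9            # E(P_s) = p
assert abs(var_ps - p * q / n) < 1e-9  # Var(P_s) = pq/n

se = sqrt(p * q / n)  # standard error of proportion
print(round(se, 4))   # 0.0648

# P(P_s <= 0.2) means X <= 10; P_s moves in steps of 1/n, so under the
# normal approximation the boundary becomes 0.2 + 1/(2n)
exact = sum(comb(n, x) * p ** x * q ** (n - x) for x in range(11))
approx = phi((0.2 + 1 / (2 * n) - p) / se)
print(round(exact, 4), round(approx, 4))
```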
■ The sampling distribution of means is what you get if you consider all possible samples of size $n$ taken from the same population and form a distribution out of their means. We use $\bar{X}$ to represent the sample mean random variable.
■ The expectation and variance of $\bar{X}$ are defined as

  $E(\bar{X}) = \mu$, $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$

  where $\mu$ and $\sigma^2$ are the mean and variance of the population.
■ The standard error of the mean is the standard deviation of this distribution. It's given by

  $\sqrt{\mathrm{Var}(\bar{X})} = \dfrac{\sigma}{\sqrt{n}}$

■ If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
■ The central limit theorem says that if $n$ is large and $X$ doesn't follow a normal distribution, then, approximately,

  $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
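For a small population every possible sample can be listed, which makes the expectation and variance results easy to verify. This sketch assumes sampling with replacement (so the observations are independent) from a hypothetical four-item population with $n = 3$. It checks the mean and variance of $\bar{X}$, but not the approximately normal shape that the central limit theorem describes for large $n$.

```python
from itertools import product
from math import sqrt
from statistics import pvariance

population = [2, 4, 6, 9]  # hypothetical population
n = 3                      # sample size; samples drawn with replacement
mu = sum(population) / len(population)
sigma2 = pvariance(population)  # population variance (divides by N, not N - 1)

# All ordered samples of size n are equally likely when drawing with replacement
means = [sum(s) / n for s in product(population, repeat=n)]
e_xbar = sum(means) / len(means)
var_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)

assert abs(e_xbar - mu) < 1e-9            # E(X-bar) = mu
assert abs(var_xbar - sigma2 / n) < 1e-9  # Var(X-bar) = sigma^2 / n

standard_error = sqrt(sigma2) / sqrt(n)   # sigma / sqrt(n)
print(e_xbar, var_xbar, standard_error)
```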
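Population statistic | Population distribution | Conditions | Confidence interval
$\mu$ | Normal | You know what $\sigma^2$ is; $n$ is large or small; $\bar{x}$ is the sample mean | $\left(\bar{x} - c\frac{\sigma}{\sqrt{n}},\ \bar{x} + c\frac{\sigma}{\sqrt{n}}\right)$
$\mu$ | Non-normal | You know what $\sigma^2$ is; $n$ is large (at least 30); $\bar{x}$ is the sample mean | $\left(\bar{x} - c\frac{\sigma}{\sqrt{n}},\ \bar{x} + c\frac{\sigma}{\sqrt{n}}\right)$
$\mu$ | Normal or non-normal | You don't know what $\sigma^2$ is; $n$ is large (at least 30); $\bar{x}$ is the sample mean; $s^2$ is the sample variance | $\left(\bar{x} - c\frac{s}{\sqrt{n}},\ \bar{x} + c\frac{s}{\sqrt{n}}\right)$
$p$ | Binomial | $n$ is large; $p_s$ is the sample proportion; $q_s$ is $1 - p_s$ | $\left(p_s - c\sqrt{\frac{p_s q_s}{n}},\ p_s + c\sqrt{\frac{p_s q_s}{n}}\right)$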
What's the interval in general?
In general, the confidence interval is given by

  (statistic - margin of error, statistic + margin of error)

The margin of error is given by the value of $c$ multiplied by the standard deviation of the test statistic:

  margin of error = $c \times$ (standard deviation of statistic)

Level of confidence | Value of c
90% | 1.64
95% | 1.96
99% | 2.58
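The general rule (statistic plus or minus $c$ times its standard deviation) can be written once and reused for each kind of interval in the sheet. The example figures below are hypothetical; the values of $c$ are the ones listed for 90%, 95% and 99% confidence.

```python
from math import sqrt

# Values of c for each level of confidence, as listed in the sheet
C = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}

def interval(statistic, sd_of_statistic, level):
    """Return (statistic - margin of error, statistic + margin of error)."""
    margin = C[level] * sd_of_statistic  # margin of error = c x sd of statistic
    return statistic - margin, statistic + margin

# Mean, sigma known: hypothetical x-bar = 62.1, sigma = 5, n = 40
print(interval(62.1, 5 / sqrt(40), 0.95))

# Mean, sigma unknown but n >= 30: s replaces sigma (hypothetical s = 4.8)
print(interval(62.1, 4.8 / sqrt(40), 0.95))

# Proportion: hypothetical p_s = 0.42 from n = 200, so q_s = 1 - p_s
p_s, n = 0.42, 200
print(interval(p_s, sqrt(p_s * (1 - p_s) / n), 0.99))
```

A higher level of confidence uses a larger $c$, so the interval gets wider.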

■ In a hypothesis test, you take a claim and test it against statistical evidence.
■ The claim that you're testing is called the null hypothesis. It's represented as $H_0$, and it's the claim that's accepted unless there's strong statistical evidence against it.
■ The alternate hypothesis is the claim we'll accept if there's strong enough evidence against $H_0$. It's represented by $H_1$.
■ The test statistic is the statistic you use to test your hypothesis. It's the statistic that's most relevant to the test. You find its distribution by assuming that $H_0$ is true.
■ The significance level is represented by $\alpha$. It's a way of saying how unlikely you want your results to be before you'll reject $H_0$.
■ The critical region is the set of values that presents the most extreme evidence against the null hypothesis. You choose your critical region by considering the significance level and how many tails you need to use.
■ A one-tailed test is when your critical region lies in either the upper or the lower tail of the data. A two-tailed test is when it's split over both ends. You choose your tail by looking at your alternate hypothesis.
■ A p-value is the probability of getting the result of your sample, or a result more extreme in the direction of your critical region.
■ If your result lies in the critical region (for a one-tailed test, this happens exactly when the p-value is at most $\alpha$), you have sufficient reason to reject your null hypothesis. If it lies outside your critical region, you have insufficient evidence.
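As an illustration, here is a one-tailed test whose test statistic is binomial. The setting is hypothetical ($H_0: p = 0.5$, $H_1: p < 0.5$, $n = 20$, $\alpha = 0.05$, 4 observed successes); the sketch finds the critical region in the lower tail, computes the p-value, and decides whether to reject $H_0$.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k + 1))

# Hypothetical test: H0: p = 0.5, H1: p < 0.5 (lower tail), alpha = 0.05
n, p0, alpha = 20, 0.5, 0.05

# Critical region {0, ..., c}: the largest c with P(X <= c) <= alpha under H0
c = max(k for k in range(n + 1) if binom_cdf(k, n, p0) <= alpha)

observed = 4                          # hypothetical number of successes
p_value = binom_cdf(observed, n, p0)  # P(this result or more extreme, toward H1)
reject = observed <= c                # is the result in the critical region?
print(c, round(p_value, 4), reject)   # 5 0.0059 True
```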
■ A Type I error is when you reject the null hypothesis when it's actually
correct. The probability of getting a Type I error is $\alpha$, the significance level of
the test.
■ A Type II error is when you accept the null hypothesis when it's wrong. The
probability of getting a Type II error is represented by $\beta$.
■ To find $\beta$, your alternate hypothesis must have a specific value. You then
find the range of values outside the critical region of your test, and then find
the probability of getting this range of values under $H_1$.
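Using a hypothetical binomial test ($H_0: p = 0.5$ against $H_1: p < 0.5$, $n = 20$, $\alpha = 0.05$) with a specific alternative value assumed to be $p = 0.3$, the steps above give $\beta$: take the values outside the critical region $\{0, \ldots, c\}$ and find their probability under $H_1$. The sketch also reports the actual Type I error probability, which for a discrete test statistic can come out below $\alpha$.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k + 1))

n, p0, alpha = 20, 0.5, 0.05  # hypothetical test of H0: p = 0.5 vs H1: p < 0.5
p1 = 0.3                      # hypothetical specific value for H1

# Critical region {0, ..., c} under H0
c = max(k for k in range(n + 1) if binom_cdf(k, n, p0) <= alpha)

type1 = binom_cdf(c, n, p0)     # actual P(Type I error); at most alpha here
beta = 1 - binom_cdf(c, n, p1)  # P(X outside the critical region | p = p1)
print(c, round(type1, 4), round(beta, 4))
```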

Chapter 13

