Head First Stat
where p is the probability of success in a trial, q = 1 - p, n is the number of trials, and X is the number of
successes in the n trials.
$X \sim \text{Po}(np)$
BULLET POINTS
■ The geometric distribution applies when you run a series of independent trials, there can be either a success or failure for each trial, the probability of success is the same for each trial, and the main thing you're interested in is how many trials are needed in order to get your first success.
■ If the conditions are met for the geometric distribution, X is the number of trials needed to get the first successful outcome, and p is the probability of success in a trial, then
$X \sim \text{Geo}(p)$
■ The following probabilities apply if $X \sim \text{Geo}(p)$:
$P(X = r) = p q^{r-1}$
$P(X > r) = q^r$
$P(X \le r) = 1 - q^r$
■ If $X \sim \text{Geo}(p)$ then
$E(X) = 1/p$
$\text{Var}(X) = q/p^2$

■ The binomial distribution applies when you run a fixed, finite number of independent trials, there can be either a success or failure for each trial, the probability of success is the same for each trial, and the main thing you're interested in is the number of successes in the n independent trials.
■ If the conditions are met for the binomial distribution, X is the number of successful outcomes out of n trials, and p is the probability of success in a trial, then
$X \sim B(n, p)$
■ If $X \sim B(n, p)$, you can calculate probabilities using
$P(X = r) = {}^nC_r \, p^r q^{n-r}$
where
${}^nC_r = \frac{n!}{r!\,(n - r)!}$
■ If $X \sim B(n, p)$, then
$E(X) = np$
$\text{Var}(X) = npq$

■ The Poisson distribution applies when individual events occur at random and independently in a given interval, you know the mean number of occurrences in the interval or the rate of occurrences and this is finite, and you want to know the number of occurrences in a given interval.
■ If the conditions are met for the Poisson distribution, X is the number of occurrences in a particular interval, and $\lambda$ is the rate of occurrences, then
$X \sim \text{Po}(\lambda)$
■ If $X \sim \text{Po}(\lambda)$ then
$P(X = r) = \frac{e^{-\lambda} \lambda^r}{r!}$
$E(X) = \lambda$
$\text{Var}(X) = \lambda$
■ If $X \sim \text{Po}(\lambda_x)$, $Y \sim \text{Po}(\lambda_y)$ and X and Y are independent,
$X + Y \sim \text{Po}(\lambda_x + \lambda_y)$
■ If $X \sim B(n, p)$ where n is large and p is small, you can approximate it with $X \sim \text{Po}(np)$.
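The probability formulas for the three distributions can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample values (p = 0.2, n = 100 and so on) are assumptions made for illustration and do not come from the book.

```python
from math import comb, exp, factorial


def geometric_pmf(r, p):
    """P(X = r) for X ~ Geo(p): the first success arrives on trial r."""
    q = 1 - p
    return p * q ** (r - 1)


def geometric_more_than(r, p):
    """P(X > r) for X ~ Geo(p): the first r trials all fail."""
    return (1 - p) ** r


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam ** r / factorial(r)


# Hypothetical values, chosen only to exercise the formulas.
print(geometric_pmf(3, 0.2))      # p * q^2
print(binomial_pmf(2, 5, 0.5))    # 5C2 * 0.5^2 * 0.5^3

# Poisson approximation: n is large and p is small, so B(n, p) is close to Po(np).
n, p = 100, 0.02
print(binomial_pmf(1, n, p), poisson_pmf(1, n * p))
```

The last line prints two values that should agree to several decimal places, which is what the approximation $X \sim \text{Po}(np)$ relies on.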
BULLET POINTS
■ The normal distribution forms the shape of a symmetrical bell curve. It's defined using $N(\mu, \sigma^2)$.
■ To find normal probabilities, start by identifying the probability range you need. Then find the standard score for the limit of this range using
$z = \frac{x - \mu}{\sigma}$
■ You find normal probabilities by looking up your standard score in probability tables. Probability tables give you the probability of getting this value or lower.
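As a rough illustration of the lookup procedure, the sketch below computes a standard score and then uses `statistics.NormalDist` from the Python standard library in place of printed probability tables. The helper names and the example mean and standard deviation are assumptions for illustration.

```python
from statistics import NormalDist


def standard_score(x, mu, sigma):
    """Standard score z = (x - mu) / sigma."""
    return (x - mu) / sigma


def prob_below(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the standard normal CDF.

    This plays the role of the probability table: it returns the
    probability of getting this value or lower.
    """
    z = standard_score(x, mu, sigma)
    return NormalDist().cdf(z)


def prob_between(a, b, mu, sigma):
    """P(a <= X <= b), found as the difference of two table lookups."""
    return prob_below(b, mu, sigma) - prob_below(a, mu, sigma)


# Hypothetical example: X ~ N(70, 10^2).
print(standard_score(75, 70, 10))
print(prob_below(75, 70, 10))
print(prob_between(60, 80, 70, 10))
```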
X + Y
$X + Y \sim N(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)$ where X, Y are independent
$X \sim N(\mu_x, \sigma_x^2)$, $Y \sim N(\mu_y, \sigma_y^2)$
X - Y
$X - Y \sim N(\mu_x - \mu_y, \sigma_x^2 + \sigma_y^2)$ where X, Y are independent
$X \sim N(\mu_x, \sigma_x^2)$, $Y \sim N(\mu_y, \sigma_y^2)$
aX + b
$aX + b \sim N(a\mu + b, a^2\sigma^2)$ where a, b are constant values
$X \sim N(\mu, \sigma^2)$
Normal approximation of X
$X \sim N(np, npq)$ when np > 5, nq > 5
$X \sim B(n, p)$; continuity correction required
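The combination rules and the normal approximation can be tried out with `statistics.NormalDist`, which supports adding and subtracting independent normal variables and scaling by constants. This is a sketch under assumed example values, not a worked example from the book; note that `NormalDist` takes the standard deviation, not the variance.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical independent variables X ~ N(10, 3^2) and Y ~ N(20, 4^2).
X = NormalDist(10, 3)
Y = NormalDist(20, 4)

total = X + Y        # N(10 + 20, 3^2 + 4^2)
difference = X - Y   # N(10 - 20, 3^2 + 4^2): the variances still add
scaled = 2 * X + 1   # N(2 * 10 + 1, 2^2 * 3^2)
print(total, difference, scaled)


def binomial_cdf(r, n, p):
    """Exact P(X <= r) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r + 1))


def normal_approx_cdf(r, n, p):
    """Approximate P(X <= r) using N(np, npq) with a continuity correction."""
    q = 1 - p
    return NormalDist(n * p, sqrt(n * p * q)).cdf(r + 0.5)


# Assumed example: n = 100, p = 0.5, so np and nq are both well above 5.
print(binomial_cdf(45, 100, 0.5), normal_approx_cdf(45, 100, 0.5))
```

The continuity correction appears as the `r + 0.5` term: the discrete value 45 is treated as the interval up to 45.5 on the continuous scale.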
■ A point estimator is an estimate for the value of a population parameter, derived from sample data.
■ The point estimator for the population mean is found by calculating $\bar{x}$. In other words,
$\bar{x} = \frac{\sum x}{n}$
■ The point estimator for the population variance is given by
$\hat{\sigma}^2 = s^2$
where $s^2$ is given by
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
■ The point estimator for p is given by $p_s$, where $p_s$ is the proportion of successes in the sample.
■ You calculate $p_s$ by dividing the number of successes in the sample by the size of the sample.
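The three point estimators translate directly into a few lines of Python. The sketch below uses made-up sample data and hypothetical function names; the key detail is the division by n - 1 in the variance estimator.

```python
def sample_mean(xs):
    """Point estimator for the population mean: sum of x divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Point estimator for the population variance, dividing by n - 1."""
    x_bar = sample_mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


def sample_proportion(successes, n):
    """Point estimator for p: successes in the sample over sample size."""
    return successes / n


# Hypothetical sample, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_mean(data))
print(sample_variance(data))
print(sample_proportion(30, 40))
```

The standard library's `statistics.mean` and `statistics.variance` compute the same two quantities, so they can be used to cross-check the hand-written versions.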
■ The sampling distribution of proportions is what you get if you consider all possible samples of size n taken from the same population and form a distribution out of their proportions. We use $P_s$ to represent the sample proportion random variable.
■ The expectation and variance of $P_s$ are defined as
$E(P_s) = p$
$\text{Var}(P_s) = pq/n$
where p is the population proportion.
■ The standard error of proportion is the standard deviation of this distribution. It's given by
$\sqrt{\text{Var}(P_s)}$
■ If n > 30, then $P_s$ follows a normal distribution, so
$P_s \sim N(p, pq/n)$
for large n. When working with this, you need to apply a continuity correction of
$\pm \frac{1}{2n}$
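To make the standard error and the continuity correction concrete, here is a small sketch. The function names and the values p = 0.5 and n = 100 are assumptions for illustration; the correction is added for a "less than or equal" probability and would be subtracted for a "greater than or equal" one.

```python
from math import sqrt
from statistics import NormalDist


def proportion_standard_error(p, n):
    """Standard error of proportion: sqrt(Var(Ps)) = sqrt(pq / n)."""
    return sqrt(p * (1 - p) / n)


def prob_proportion_at_most(value, p, n):
    """Approximate P(Ps <= value) using Ps ~ N(p, pq/n).

    Applies the continuity correction of 1/(2n) in the upward
    direction because the range includes `value` itself.
    """
    se = proportion_standard_error(p, n)
    return NormalDist(p, se).cdf(value + 1 / (2 * n))


# Hypothetical population proportion and sample size (n > 30).
print(proportion_standard_error(0.5, 100))
print(prob_proportion_at_most(0.45, 0.5, 100))
```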
BULLET POINTS
■ The sampling distribution of means is what you get if you consider all possible samples of size n taken from the same population and form a distribution out of their means. We use $\bar{X}$ to represent the sample mean random variable.
■ The expectation and variance of $\bar{X}$ are defined as
$E(\bar{X}) = \mu$
$\text{Var}(\bar{X}) = \sigma^2/n$
where $\mu$ and $\sigma^2$ are the mean and variance of the population.
■ The standard error of the mean is the standard deviation of this distribution. It's given by
$\sqrt{\text{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}$
■ If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$.
■ The central limit theorem says that if n is large and X doesn't follow a normal distribution, then, approximately,
$\bar{X} \sim N(\mu, \sigma^2/n)$
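The phrase "consider all possible samples of size n" can be taken literally for a tiny population. The sketch below enumerates every sample of size 2 drawn with replacement from a hypothetical six-value population (a fair die, assumed here purely for illustration) and checks the expectation and variance of the sample mean.

```python
from itertools import product
from statistics import fmean, pvariance

# Hypothetical population: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
n = 2  # sample size

mu = fmean(population)          # population mean
sigma2 = pvariance(population)  # population variance

# Every equally likely sample of size n, drawn with replacement.
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mean_of_means = fmean(sample_means)          # should equal mu
variance_of_means = pvariance(sample_means)  # should equal sigma2 / n

print(mu, mean_of_means)
print(sigma2 / n, variance_of_means)
```

Because every possible sample is listed, the two printed pairs match up to floating-point rounding, with no simulation noise.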
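Population statistic: $\mu$
Population distribution: Normal
Conditions: You know what $\sigma^2$ is; n is large or small; $\bar{x}$ is the sample mean
Confidence interval: $\left(\bar{x} - c\frac{\sigma}{\sqrt{n}},\ \bar{x} + c\frac{\sigma}{\sqrt{n}}\right)$

Population statistic: $\mu$
Population distribution: Non-normal
Conditions: You know what $\sigma^2$ is; n is large (at least 30); $\bar{x}$ is the sample mean
Confidence interval: $\left(\bar{x} - c\frac{\sigma}{\sqrt{n}},\ \bar{x} + c\frac{\sigma}{\sqrt{n}}\right)$

Population statistic: $\mu$
Population distribution: Normal or non-normal
Conditions: You don't know what $\sigma^2$ is; n is large (at least 30); $\bar{x}$ is the sample mean; $s^2$ is the sample variance
Confidence interval: $\left(\bar{x} - c\frac{s}{\sqrt{n}},\ \bar{x} + c\frac{s}{\sqrt{n}}\right)$

Population statistic: p
Population distribution: Binomial
Conditions: n is large; $p_s$ is the sample proportion; $q_s$ is $1 - p_s$
Confidence interval: $\left(p_s - c\sqrt{\frac{p_s q_s}{n}},\ p_s + c\sqrt{\frac{p_s q_s}{n}}\right)$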
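The interval formulas in the table share one shape, which the following sketch makes explicit. The function names and the sample figures are hypothetical; `sd` stands for $\sigma$ when the population variance is known and for s when it has to be estimated from a large sample.

```python
from math import sqrt


def mean_confidence_interval(x_bar, sd, n, c):
    """Interval for the population mean: x_bar -/+ c * sd / sqrt(n)."""
    margin = c * sd / sqrt(n)
    return (x_bar - margin, x_bar + margin)


def proportion_confidence_interval(p_s, n, c):
    """Interval for the population proportion: p_s -/+ c * sqrt(p_s * q_s / n)."""
    q_s = 1 - p_s
    margin = c * sqrt(p_s * q_s / n)
    return (p_s - margin, p_s + margin)


# Hypothetical samples, using c = 1.96 for a 95% level of confidence.
print(mean_confidence_interval(50, 10, 100, 1.96))
print(proportion_confidence_interval(0.5, 100, 1.96))
```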
What's the interval in general?
In general, the confidence interval is given by
statistic ± (margin of error)
The margin of error is given by the value of c multiplied by the standard deviation of the test statistic.
margin of error = c × (standard deviation of statistic)
Level of confidence    Value of c
90%    1.64
95%    1.96
99%    2.58
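The values of c in the table can be reproduced from the standard normal distribution, assuming a two-sided interval that leaves equal probability in each tail. This is a sketch with a hypothetical helper name, using `NormalDist.inv_cdf` from the Python standard library.

```python
from statistics import NormalDist


def critical_value(level_of_confidence):
    """Value of c such that P(-c < Z < c) equals the level of confidence.

    Assumes Z ~ N(0, 1) and a symmetric, two-sided interval, so the
    upper limit sits at the (1 + level) / 2 quantile.
    """
    return NormalDist().inv_cdf((1 + level_of_confidence) / 2)


for level in (0.90, 0.95, 0.99):
    print(level, round(critical_value(level), 2))
```

Rounded to two decimal places, the results agree with the tabulated 1.64, 1.96 and 2.58.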
BULLET POINTS
■ In a hypothesis test, you take a claim and test it against statistical evidence.
■ The claim that you're testing is called the null hypothesis. It's represented as $H_0$, and it's the claim that's accepted unless there's strong statistical evidence against it.
■ The alternate hypothesis is the claim we'll accept if there's strong enough evidence against $H_0$. It's represented by $H_1$.
■ The test statistic is the statistic you use to test your hypothesis. It's the statistic that's most relevant to the test. You choose the test statistic by assuming that $H_0$ is true.
■ The significance level is represented by $\alpha$. It's a way of saying how unlikely you want your results to be before you'll reject $H_0$.
■ The critical region is the set of values that presents the most extreme evidence against the null hypothesis. You choose your critical region by considering the significance level and how many tails you need to use.
■ A one-tailed test is when your critical region lies in either the upper or the lower tail of the data. A two-tailed test is when it's split over both ends. You choose your tail by looking at your alternate hypothesis.
■ A p-value is the probability of getting the result of your sample, or a result more extreme in the direction of your critical region.
■ If the p-value is less than the significance level, your sample result lies in the critical region and you have sufficient reason to reject your null hypothesis. If the p-value is greater than the significance level, the result lies outside the critical region and you have insufficient evidence.
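The steps of a test can be sketched in code. The example below is hypothetical: it assumes a one-tailed test of $H_0$: p = 0.5 against $H_1$: p > 0.5 at the 5% significance level, with 15 successes observed in 20 trials, and uses the number of successes as the test statistic.

```python
from math import comb


def upper_tail_p_value(observed, n, p0):
    """P(X >= observed) for X ~ B(n, p0), calculated assuming H0 is true."""
    return sum(
        comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        for k in range(observed, n + 1)
    )


alpha = 0.05             # significance level
n_trials = 20            # hypothetical sample size
observed_successes = 15  # hypothetical sample result

p_value = upper_tail_p_value(observed_successes, n_trials, 0.5)

# The result lies in the critical region exactly when p_value < alpha.
reject_h0 = p_value < alpha
print(p_value, reject_h0)
```

The upper tail is used because the alternate hypothesis says p is greater than 0.5; a lower-tailed or two-tailed alternative would change which results count as "more extreme".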
BULLET POINTS
■ A Type I error is when you reject the null hypothesis when it's actually correct. The probability of getting a Type I error is $\alpha$, the significance level of the test.
■ A Type II error is when you accept the null hypothesis when it's wrong. The probability of getting a Type II error is represented by $\beta$.
■ To find $\beta$, your alternate hypothesis must have a specific value. You then find the range of values outside the critical region of your test, and then find the probability of getting this range of values under $H_1$.
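The recipe for finding $\beta$ can be followed step by step in code. The sketch below assumes a hypothetical upper-tailed test on a mean, with $H_0$: $\mu$ = 100, $H_1$: $\mu$ = 105, a known population standard deviation of 10, a sample size of 25 and a 5% significance level; none of these numbers come from the book.

```python
from statistics import NormalDist

# Hypothetical test setup.
mu0, mu1 = 100.0, 105.0  # means under H0 and under the specific H1
sigma, n = 10.0, 25      # known population standard deviation, sample size
alpha = 0.05             # significance level, upper-tailed test

standard_error = sigma / n ** 0.5
under_h0 = NormalDist(mu0, standard_error)  # distribution of the sample mean if H0 is true
under_h1 = NormalDist(mu1, standard_error)  # distribution of the sample mean if H1 is true

# The critical region is every sample mean above this value.
critical_value = under_h0.inv_cdf(1 - alpha)

# Type I error: landing in the critical region although H0 is true.
type_i_probability = 1 - under_h0.cdf(critical_value)

# Type II error: landing outside the critical region although H1 is true.
beta = under_h1.cdf(critical_value)

print(critical_value, type_i_probability, beta)
```

By construction the Type I probability comes out equal to $\alpha$, while $\beta$ depends on how far the specific value in $H_1$ sits from the value in $H_0$.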