Theoretical Distributions & Hypothesis Testing
what is a distribution?
describes the shape of a batch of numbers
the characteristics of a distribution can
sometimes be defined using a small number
of numeric descriptors called parameters
why are theoretical distributions useful?
can serve as a basis for standardized
comparison of empirical distributions
can help us estimate confidence intervals
for inferential statistics
form a basis for more advanced statistical
methods
fit between observed distributions and certain
theoretical distributions is an assumption of
many statistical procedures
Normal (Gaussian) distribution
continuous distribution
tails stretch infinitely in both directions
symmetric around the mean (μ)
maximum height at μ
the points of inflection lie one standard deviation (σ) either side of the mean, at μ ± σ
[Figure: normal curve with the mean μ and one standard deviation σ on either side marked]
a single normal curve exists for any combination of μ and σ
these are the parameters of the distribution and define it completely
a family of bell-shaped curves can be defined for the same combination of μ and σ, but only one of them is the normal curve
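To make "μ and σ define it completely" concrete, here is a minimal Python sketch of the normal density. The function name `normal_pdf` and the parameter values are illustrative assumptions, not from the text. It also checks the symmetry about μ and shows that the height one σ from the mean is always e^(-1/2) of the peak.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))


# Hypothetical parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0

peak = normal_pdf(mu, mu, sigma)           # maximum height, at the mean
left = normal_pdf(mu - sigma, mu, sigma)   # one sd below the mean
right = normal_pdf(mu + sigma, mu, sigma)  # one sd above the mean

# Symmetry: heights at mu - sigma and mu + sigma are equal, and both are
# exp(-1/2) times the peak, whatever values mu and sigma take.
print(peak, left, right, left / peak)
```

Changing `mu` only slides the curve along the axis; changing `sigma` only stretches or squeezes it.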
binomial distribution with p = q
approximates a normal distribution of probabilities
$p + q = 1$, so $p = q = .5$
$\mu = np = .5n$
recall that the mean number of successes in a binomial distribution is np; substitute .5 for p
$\sigma = \sqrt{npq} = \sqrt{np^2} = \sqrt{.25n} = .5\sqrt{n}$
because q = p, npq simplifies to np² = .25n
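The shortcut formulas can be checked numerically. This sketch assumes n = 10 (matching the P(10, k, .5) example) and uses Python's `math.comb` (Python 3.8+); `binomial_pmf` is a helper name introduced here for illustration. The mean and standard deviation are computed directly from the probabilities and compared with .5n and .5√n.

```python
import math


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5
probs = [binomial_pmf(n, k, p) for k in range(n + 1)]

# Mean and standard deviation computed directly from the probabilities...
mean = sum(k * pk for k, pk in enumerate(probs))
sd = math.sqrt(sum((k - mean) ** 2 * pk for k, pk in enumerate(probs)))

# ...agree with the shortcut formulas for p = q = .5.
print(mean, 0.5 * n)
print(sd, 0.5 * math.sqrt(n))
```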
[Figure: binomial probabilities P(10, k, .5) plotted against k = 0 to 10]
lots of natural phenomena in the real world approximate normal distributions near enough that we can use the normal curve as a model
e.g. height
phenomena that emerge from a large
number of uncorrelated, random events will
usually approximate a normal distribution
standard probability intervals (proportions
under the curve) are defined by multiples of
the standard deviation around the mean
this is true of all normal curves, no matter what μ or σ happens to be
$P(\mu - \sigma \le x \le \mu + \sigma) = .683$
$\mu \pm 1\sigma$: .683
$\mu \pm 2\sigma$: .954
$\mu \pm 3\sigma$: .997
50%: $\mu \pm 0.67\sigma$
95%: $\mu \pm 1.96\sigma$
99%: $\mu \pm 2.58\sigma$
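These proportions do not have to be memorized or looked up: for any normal distribution, the share lying within k standard deviations of the mean equals erf(k/√2). A minimal sketch using Python's `math.erf`; the function name `prob_within` is an illustrative assumption.

```python
import math


def prob_within(k: float) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal X, any mu and sigma."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3, 0.67, 1.96, 2.58):
    print(f"mu +/- {k} sigma: {prob_within(k):.4f}")
```

Because μ and σ cancel out of the calculation, the same function serves every normal curve.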
[Figure: normal curve with the μ ± σ interval marked]
the logic also works backwards
if the proportion of a batch lying within μ ± σ is markedly different from .68, the distribution is not normal
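The backwards check can be sketched as a rough screen, not a formal test. The function name and both batches below are hypothetical, and how far from .68 counts as "markedly different" is left as a judgment call. The sketch compares a skewed batch with one drawn from a normal distribution.

```python
import random
import statistics


def fraction_within_one_sd(data):
    """Share of a batch lying within mean +/- one (sample) standard deviation."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(1 for x in data if m - s <= x <= m + s) / len(data)


# Hypothetical skewed batch: one large value, many small ones.
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 20]

# Hypothetical batch drawn from a normal distribution (seeded for repeatability).
random.seed(0)
normalish = [random.gauss(50, 10) for _ in range(2000)]

print(fraction_within_one_sd(skewed))     # far from .68
print(fraction_within_one_sd(normalish))  # close to .68
```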
z-scores
standardizing values by re-expressing them
in units of the standard deviation
measured away from the mean (where the
mean is adjusted to equal 0)
$Z_i = \dfrac{x_i - \bar{x}}{s}$
z-scores = standard normal deviates
converting number sets from a normal
distribution to z-scores:
presents data in a standard form that can be
easily compared to other distributions
mean = 0
standard deviation = 1
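A minimal sketch of the z-score conversion, using Python's `statistics` module. The batch values and the name `z_scores` are hypothetical. It follows the formula above with s as the sample standard deviation, and confirms that the converted values have mean 0 and standard deviation 1.

```python
import statistics


def z_scores(data):
    """Re-express each value as (x - mean) / s, s being the sample standard deviation."""
    m = statistics.mean(data)
    s = statistics.stdev(data)  # n - 1 in the denominator
    return [(x - m) / s for x in data]


batch = [158.0, 161.5, 163.0, 165.2, 170.8]  # hypothetical statures, cm
z = z_scores(batch)

print([round(v, 2) for v in z])
print(statistics.mean(z), statistics.stdev(z))  # approximately 0 and 1
```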
z-scores are often summarized in table form as a CDF (cumulative distribution function)
Shennan, Table C (note errors!)
can use in various ways, including
determining how different proportions of a
batch are distributed under the curve
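Assuming Table C tabulates areas under the standard normal curve, the same areas can be computed from the error function, which avoids lookup errors in a printed table. A sketch; the helper names `std_normal_cdf`, `upper_tail`, and `between` are hypothetical.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def upper_tail(z: float) -> float:
    """Area to the right of z."""
    return 1.0 - std_normal_cdf(z)


def between(z_lo: float, z_hi: float) -> float:
    """Proportion of a normal batch lying between two z-scores."""
    return std_normal_cdf(z_hi) - std_normal_cdf(z_lo)


print(std_normal_cdf(1.96))  # left-tail area
print(upper_tail(1.0))       # right-tail area
print(between(-1.0, 1.0))    # central area
```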
Neanderthal stature
population of Neanderthal skeletons
stature estimates appear to follow an
approximately normal distribution
mean = 163.7 cm
sd = 5.79 cm
Quest. 1: what proportion of the population is >165 cm?
z-score = (165 - 163.7)/5.79 = .22 (+)
using Table C-2 (area to the right of z)
Table C-2 at z = .22 gives .41294
about 41% of the population is taller than 165 cm
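Quest. 1 can also be computed without a table, which removes the need to round z to two decimals first. A sketch using `statistics.NormalDist` from Python's standard library (3.8+), modelling stature with the mean and sd given above.

```python
from statistics import NormalDist

# Neanderthal stature model from the text: mean 163.7 cm, sd 5.79 cm.
stature = NormalDist(mu=163.7, sigma=5.79)

z = (165 - stature.mean) / stature.stdev  # unrounded z-score
p_taller = 1 - stature.cdf(165)           # area to the right of 165 cm

print(f"z = {z:.4f}, proportion taller than 165 cm = {p_taller:.4f}")
```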
Quest. 2: 98% of the population
fall below what height?
cdf(z) = .98
can use either table
Table C-1; look for .98
Table C-2; look for .02
both give a value of 2.05 for z
solve the z-score formula for x:
$x_i = Z_i s + \bar{x}$
x = 2.05 × 5.79 + 163.7 = 175.6 cm
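The same two steps (find z for a cumulative area of .98, then solve the z-score formula for x) can be sketched with `NormalDist.inv_cdf` from Python's `statistics` module. The one-call version at the end should agree with the two-step version.

```python
from statistics import NormalDist

stature = NormalDist(mu=163.7, sigma=5.79)

# Step 1: z such that 98% of the standard normal lies below it.
z98 = NormalDist().inv_cdf(0.98)

# Step 2: solve the z-score formula for x.
height = stature.mean + z98 * stature.stdev

# Same answer in one call.
height_direct = stature.inv_cdf(0.98)

print(f"z = {z98:.3f}, height = {height:.1f} cm")
```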
sampling distribution of the mean
we often don't know the shape of the distribution of an underlying population
it may not be normal
we can still make use of some properties of the normal distribution
envision the distribution of means associated with a large number of samples
the distribution of means derived from sets of random samples taken from any population (with finite variance) will tend toward normality
conformity to a normal distribution increases with the size of the samples
these means will be distributed around the mean of the population:
$\mu_{\bar{x}} = \mu$
central limit theorem
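The central limit theorem can be illustrated by simulation. This sketch assumes a strongly right-skewed hypothetical population (exponential with mean 1), draws many samples of size 30, and checks that the sample means centre on the population mean and that roughly 68% of them fall within one standard deviation of their own mean, as a normal distribution would predict.

```python
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers


def draw() -> float:
    """One value from a skewed hypothetical population (exponential, mean 1)."""
    return random.expovariate(1.0)


def sample_means(n: int, reps: int) -> list:
    """Means of `reps` independent random samples, each of size n."""
    return [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]


means = sample_means(n=30, reps=2000)
grand_mean = statistics.fmean(means)
sd_means = statistics.stdev(means)

# Share of sample means within one sd of their own mean; roughly .68 if the
# means are close to normally distributed, even though the population is not.
within = sum(abs(m - grand_mean) <= sd_means for m in means) / len(means)
print(round(grand_mean, 3), round(within, 3))
```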
we usually have only one of these samples
we can't know where it falls relative to the population mean, but we can estimate the odds of how far from it the sample mean is likely to be
this depends on
sample size
an estimate of the population variance
the smaller the sample and the more dispersed the population, the more likely it is that our sample mean is far from the population mean
this is reflected in the equation used to calculate the variance of sample means:
$\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$
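The formula σ²/n can be checked by simulation. This sketch assumes a hypothetical non-normal population (uniform on 0 to 12, whose variance is 12²/12 = 12) and compares the simulated variance of sample means with σ²/n for three sample sizes; the function name is illustrative.

```python
import random
import statistics

random.seed(2)

# Hypothetical population: uniform on [0, 12], so sigma^2 = 12**2 / 12 = 12.
POP_VARIANCE = 12.0


def variance_of_sample_means(n: int, reps: int = 4000) -> float:
    """Variance of the means of `reps` random samples of size n."""
    means = [statistics.fmean(random.uniform(0, 12) for _ in range(n)) for _ in range(reps)]
    return statistics.variance(means)


results = {}
for n in (5, 20, 80):
    results[n] = (variance_of_sample_means(n), POP_VARIANCE / n)
    print(n, results[n])  # (simulated variance, sigma^2 / n)
```

Larger samples give means that cluster more tightly around the population mean, in proportion to 1/n.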
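the standard deviation of sample means is the standard error of the estimate of the mean:
$s_e = \sigma_{\bar{x}} = \sqrt{\dfrac{\sigma^2}{n}} = \dfrac{\sigma}{\sqrt{n}}$
in practice σ is unknown and is estimated by the sample standard deviation s, giving $s_e \approx s/\sqrt{n}$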
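A minimal sketch of the standard error, estimated as s/√n. The function names and the sample values are hypothetical; the first print reuses sd = 5.79 cm with hypothetical sample sizes to show that quadrupling n halves the standard error.

```python
import math
import statistics


def standard_error(sample) -> float:
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def standard_error_from_sd(s: float, n: int) -> float:
    """Standard error when s and n are already known."""
    return s / math.sqrt(n)


# Quadrupling the sample size halves the standard error.
print(standard_error_from_sd(5.79, 25), standard_error_from_sd(5.79, 100))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(standard_error(sample))
```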
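you can use the standard error to calculate a range that contains the population mean with a particular probability, based on a specific sample:
$\bar{x} - Z\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + Z\dfrac{s}{\sqrt{n}}$
where Z is the z-score for the chosen probability (e.g. 1.96 for 95%)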
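The range can be sketched in a few lines, following the Z-based formula above; the function name and sample values are hypothetical. Z is taken from `NormalDist().inv_cdf`, so that 95% gives about 1.96. For small samples like this one, a Student's t multiplier is commonly substituted for Z; this sketch keeps Z to match the text.

```python
import math
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(sample, confidence: float = 0.95):
    """Range xbar +/- Z * s / sqrt(n), expected to contain mu with the given probability."""
    n = len(sample)
    xbar = fmean(sample)
    se = stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return xbar - z * se, xbar + z * se


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(mean_confidence_interval(sample, 0.95))
print(mean_confidence_interval(sample, 0.99))  # wider range for higher probability
```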