
Week 13

This week:
• Tuesday: Bayesian methods
• Thursday: No class (Thanksgiving)

Next week:
• Tuesday: No class (work on assignment)
• Thursday: Review session, help with assignment
• Assignment 3 due Thursday November 30
Schools of statistical thought

Frequentist
- P(D|H), the probability of the data given the hypothesis
- Data are an estimate of some “truth” about the world

Bayesian
- P(H|D), the probability of the hypothesis given the data
- Data are “true”, what is the probability of the model?
Data analysis in a Bayesian framework is related to likelihood
methods.

With likelihood, the data are considered an estimate of some
truth, and we vary the parameter to find the value for which
the probability of the data is highest.

Bayesian methods treat the parameter (or hypothesis) as
some estimate of the truth, by considering it a random
variable, and seeking the parameter value having the highest
probability, given the data.

Calculating a posterior probability requires that we have a
prior probability for the parameter value, or hypothesis.

To talk about Bayesian methods, we need to remember what
likelihood is, and therefore what probability is…

So, what is probability?

A frequentist would say:

The probability of an event is the proportion of time that the event
would occur if we repeated a random trial over and over again
under the same conditions.

A probability distribution is a list of all mutually exclusive outcomes
of a random trial and their probabilities of occurrence.

Probability statements that make sense under a frequentist
definition…

• If we toss a fair coin, what is the probability of 10 heads in a row?

• If we assign treatments randomly to subjects, what is the
probability that the observed difference between treatment
means is what we’d get if the null hypothesis were true?

• What is the probability of a result at least as extreme as that
observed if the null hypothesis is true?

• Under a process of genetic drift in a finite population, what is the
probability of fixation of a rare allele?
Here, sampling error is the source of uncertainty.
Probability statements that do not make sense under a
frequentist definition…
Why?
• What is the probability that Iran is building nuclear weapons?
Either Iran is or isn’t – there is no random trial
• What is the probability that hippos are the sister group to the
whales?
Either they are or they’re not – there is no random trial
• What is the probability that polar bears will be extinct in the wild
in 40 years?
Difficult to cast as a frequency of occurrence

Here, there is no random trial, so no sampling error

The information is the source of uncertainty, not sampling error!


An alternative (Bayesian) definition of probability

Probability is the measure of a degree of belief associated with the
occurrence of an event.

A probability distribution is a list of all mutually exclusive events
and the degree of belief associated with their occurrence.

Bayesian statistics applies the mathematics of probability to model
uncertainty as a subjective degree of belief.
Bayesian methods are increasingly used in ecology and evolution

“Ecologists should be aware that Bayesian methods constitute a radically different way of
doing science. Bayesian statistics is not just another tool to be added into the ecologists’
repertoire of statistical methods. Instead, Bayesians categorically reject various tenets of
statistics and the scientific method that are currently widely accepted in ecology and other
sciences.” B. Dennis, 1996, Ecology

“Bayesian statistics is like handing guns to children” J. Clark


Bayes Theorem (H = hypothesis, D = data)

P(H|D) = P(D|H) P(H) / P(D)

where P(H|D) is the ‘posterior’ probability, P(D|H) is the likelihood,
P(H) is the ‘prior’ probability, and P(D) is the probability of the data.
Bayes Theorem: example

Down Syndrome (DS) occurs in about 1 in 1000 pregnancies. We use
a test to know the probability that a particular baby has it. The test
is low risk, but also low accuracy.

[Tree diagram: each fetus either has DS or not, and the test result is
either positive or negative, giving four outcomes whose probabilities
are filled in on the next slide.]
Bayes Theorem: example
• Conditional probability is the probability of an event occurring given that a
condition is met
• The probability of a positive test result is 0.60, given that a fetus has DS.
The probability of a positive result is 0.05, given that a fetus does not have DS.

Fetus has DS?      Test result   Probability
Yes (0.001)        +ve (0.60)    0.0006
Yes (0.001)        −ve (0.40)    0.0004
No (0.999)         +ve (0.05)    0.04995
No (0.999)         −ve (0.95)    0.94905
                   Total:        1.00000
Bayes Theorem: example
• What is the probability that a fetus has DS if the test is positive?

P(H|D) = P(DS|positive) = P(positive|DS) P(DS) / P(positive)

Fetus has DS?      Test result   Probability
Yes (0.001)        +ve (0.60)    0.0006
Yes (0.001)        −ve (0.40)    0.0004
No (0.999)         +ve (0.05)    0.04995
No (0.999)         −ve (0.95)    0.94905
                   Total:        1.00000

P(H|D) = P(DS|positive) = (0.60 × 0.001) / (0.0006 + 0.04995) = 0.012

• Just 1.2%
Bayes Theorem (H = hypothesis, D = data)

The ‘prior’ probability of having DS is 0.001. The ‘posterior’
probability of having DS, given a positive test, is:

P(H|D) = P(DS|positive) = (0.60 × 0.001) / (0.0006 + 0.04995) = 0.012
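The arithmetic above can be checked directly; a minimal Python sketch using the slide’s numbers:

```python
# Down Syndrome example from the slides:
# prior P(DS) = 0.001, P(+|DS) = 0.60, P(+|no DS) = 0.05.
p_ds = 0.001            # prior probability a fetus has DS
p_pos_given_ds = 0.60   # P(positive | DS)
p_pos_given_no = 0.05   # P(positive | no DS)

# Law of total probability: P(+) = P(+|DS) P(DS) + P(+|no DS) P(no DS)
p_pos = p_pos_given_ds * p_ds + p_pos_given_no * (1 - p_ds)

# Bayes theorem: P(DS | +) = P(+|DS) P(DS) / P(+)
posterior = p_pos_given_ds * p_ds / p_pos
print(round(posterior, 3))  # 0.012
```

Despite the positive test, the posterior stays low because the 0.001 prior is so small relative to the false-positive rate.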
Bayesian inference with data

[Tree diagram: hypothesis H1 has ‘prior’ probability p and hypothesis
H2 has prior 1 – p; under each hypothesis, either the data collected
or some other data could have been observed.]

The ‘posterior’ probability of H1 is:

P(H1|D) = P(D|H1) P(H1) / [ P(D|H1) P(H1) + P(D|H2) P(H2) ]
The dangers of Bayes Theorem
• The prior probability is subjective and can have a large influence
Example: Forensic evidence.
What is the probability of guilt given a positive DNA match?

Defendant guilty?    DNA match?         Probability
Yes (prior p)        Yes (1)            p
Yes (prior p)        No (0)             0
No (prior 1 – p)     Yes (10⁻⁶)         (1 – p)10⁻⁶
No (prior 1 – p)     No (1 – 10⁻⁶)      (1 – p)(1 – 10⁻⁶)

P(H|D) = P(guilt|match) = 1 · p / [ 1 · p + 10⁻⁶ (1 – p) ]
The dangers of Bayes Theorem

P(H|D) = P(guilt|match) = 1 · p / [ 1 · p + 10⁻⁶ (1 – p) ]

If p = 10⁻⁶, then P(guilt|match) = 0.5

If p = 0.5, then P(guilt|match) ≈ 0.999999

So… is the defendant guilty or not guilty??
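The sensitivity to the prior can be sketched with a few lines of Python (10⁻⁶ is the slide’s random-match probability for an innocent defendant):

```python
# Posterior probability of guilt given a DNA match, per the slide:
# P(guilt | match) = 1 * p / (1 * p + 1e-6 * (1 - p))
def p_guilt_given_match(p, false_match_rate=1e-6):
    return p / (p + false_match_rate * (1 - p))

print(p_guilt_given_match(1e-6))  # about 0.5: a match barely helps with a tiny prior
print(p_guilt_given_match(0.5))   # about 0.999999: near-certain with an even prior
```

The same DNA evidence yields radically different posteriors depending entirely on the subjective prior probability of guilt.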


The dangers of Bayes Theorem: example
Study of the sex ratio of a communal-living bee (Paxton and
Tengo, 1996, J. Insect Behav.)

What is the proportion of males in the reproductive adults
emerging from colonies?

To begin, we need to come up with a prior probability distribution
for the proportion.

A “non-informative” prior (sometimes called a flat prior) is like an
expression of ignorance (which can be a good thing!)

An “informative” prior captures previous information based on
theory or previous data.
The dangers of Bayes Theorem: example

Case 1: A “non-informative” prior (sometimes called a flat prior)


The dangers of Bayes Theorem: example
Case 2: An “informative” prior based on sex-ratio theory (that
predicts a 50:50 ratio)
The dangers of Bayes Theorem: example

Case 3: An “informative” prior based on different sex-ratio theory


(that predicts female biased sex ratios)
The dangers of Bayes Theorem: example

[Figure: posterior distributions under the three priors, compared with
the maximum likelihood estimate p = 0.39. Posterior estimates:
p = 0.39 (Case 1, flat prior), p = 0.40 (Case 2, 50:50 prior),
p = 0.36 (Case 3, female-biased prior).]
The dangers of Bayes Theorem: example
With lots of data, the choice of prior has little effect
The dangers of Bayes Theorem: example

• The estimated proportion based on the Bayesian posterior
probability distribution depends on the prior probability
distribution

• A source of controversy is that the prior is partly subjective.
Different researchers may use different priors and get different
results

• Non-informative priors may get around the subjectivity, but
they could also be considered a form of subjectivity (the same
way we assume normality in a frequentist framework)

• Non-informative priors prevent us from incorporating prior
information, which is regarded as one of the strengths of the
Bayesian approach
Bayesian estimates of uncertainty
95% credible interval

Frequentist:

95% Confidence Interval (which is based on the likelihood) means:
in many random samples taken from the same population, the
true population mean will fall somewhere within 95% of the
confidence intervals.

Bayesian:

95% Credible Interval (which is based on Bayes Theorem) means:
there is a 95% chance that the population mean lies within that
interval.
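A minimal sketch of computing a 95% credible interval, assuming a Beta posterior for a proportion (the Beta(40, 62) parameters are hypothetical, chosen to centre near p = 0.39) and using simulation from the standard library:

```python
import random

# Hypothetical Beta(40, 62) posterior for a proportion.
# The central 95% of the posterior is the 95% credible interval.
random.seed(1)
draws = sorted(random.betavariate(40, 62) for _ in range(100_000))
lo, hi = draws[2_500], draws[97_500]  # 2.5th and 97.5th percentiles
print(round(lo, 2), round(hi, 2))
```

Unlike a confidence interval, this interval supports the direct statement: there is a 95% posterior probability that the proportion lies between `lo` and `hi`.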
Bayesian model selection

BIC = Bayesian Information Criterion
DIC = Deviance Information Criterion

Derived from a very different theory, but yields a formula similar to
that for AIC:

AIC = −2 ln L(model|data) + 2k

BIC = −2 ln L(model|data) + k ln(n)

DIC = −2 ln L(model|data) + k

k = number of parameters estimated in the model
n = sample size
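The AIC and BIC formulas above translate directly into code (the log-likelihoods, k, and n below are made-up numbers for illustration):

```python
import math

# Information criteria from a model's maximized log-likelihood lnL,
# number of estimated parameters k, and sample size n.
def aic(lnL, k):
    return -2 * lnL + 2 * k

def bic(lnL, k, n):
    return -2 * lnL + k * math.log(n)

# Hypothetical fits: model 2 fits better but uses more parameters.
print(aic(-120.0, 3), bic(-120.0, 3, 50))   # simpler model
print(aic(-117.5, 5), bic(-117.5, 5, 50))   # more complex model
```

With these numbers AIC slightly favours the complex model while BIC favours the simple one: BIC’s ln(n) penalty exceeds AIC’s 2 whenever n > e² ≈ 7.4.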
Bayesian inference is different from what you usually do
The approach requires a prior probability.
The prior probability represents the investigator’s strength of belief
about the hypothesis, or the parameter’s value.

The influence of the prior declines with more data.

The posterior probability expresses how the investigator’s beliefs
have been ‘updated’ by the data.

Bayes theorem is used to estimate parameters and test hypotheses
using the posterior distribution.

The interpretation of interval estimates differs from the frequentist
definition.
