
A Marginalisation Paradox Example

Dennis Prangle

28th October 2009


Overview

Bayesian inference recap


Example of error due to a marginalisation paradox
(Very) rough overview of general issues
Part I

Bayesian Inference
Bayesian Inference

Prior distribution on parameters θ: p(θ)


Model for the data X : f (X |θ)
Posterior distribution is (using Bayes’ theorem):

$$f(\theta \mid X) = \frac{p(\theta)\, f(X \mid \theta)}{\int p(\theta)\, f(X \mid \theta)\, d\theta}$$

n.b. p(θ) only needed up to proportionality


Bayesian inference performed using computational Monte
Carlo methods (e.g. MCMC)
Typically also don’t need normalisation constant for p(θ) as
ratios used
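A minimal sketch of why only ratios matter: a random-walk Metropolis sampler that works with an unnormalised log posterior. The toy model (normal prior, normal likelihood with unit variance), the data values, and all function names are assumptions made for illustration; they are not taken from the talk.

```python
import math
import random

def log_unnorm_posterior(theta, data, prior_mean=0.0, prior_sd=10.0):
    """log p(theta) + log f(X | theta), with all additive constants dropped."""
    log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * (x - theta) ** 2 for x in data)
    return log_prior + log_lik

def metropolis(data, n_iter=20000, step=0.5, seed=1):
    rng = random.Random(seed)
    theta = 0.0
    current = log_unnorm_posterior(theta, data)
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_unnorm_posterior(proposal, data)
        # Acceptance depends only on the ratio of posterior densities, so the
        # normalising integral in Bayes' theorem is never computed.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples

data = [1.2, 0.8, 1.5, 0.9, 1.1]   # hypothetical observations
samples = metropolis(data)
burned = samples[5000:]            # discard burn-in
post_mean = sum(burned) / len(burned)
```

Multiplying the prior or the likelihood by any positive constant adds a constant to both `candidate` and `current`, which cancels in the acceptance step.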
Improper Prior

A probability density p(θ) (roughly speaking!) satisfies:

1. $p(\theta) \geq 0$
2. $\int p(\theta)\, d\theta = 1$
An improper prior doesn’t require condition 2
Instead can have $\int p(\theta)\, d\theta = \infty$
Example: p(θ) = 1 “improper uniform”
Sometimes used to represent prior ignorance
Resulting posterior often a proper distribution
⇒ meaningful conclusions (. . . or are they?!)
Part II

Example: Tuberculosis in San Francisco


Background: Tuberculosis

Tuberculosis is an infectious disease spread by bacteria


Epidemiological interest lies in estimating rates of
transmission and recovery
Conjectured that data on bacteria mutation provides
information → more accurate inference
Background: Paper

Tanaka et al (2006) investigated a Tuberculosis outbreak in San Francisco in 1991/2
473 samples of Tuberculosis bacteria taken at a particular date
Genotyped according to a particular genetic marker
Samples split into clusters which share the same genotype

Cluster size          1   2   3   4   5   8  10  15  23  30
Number of clusters  282  20  13   4   2   1   1   1   1   1
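As a quick consistency check on the table, the cluster sizes weighted by their counts should recover the 473 samples. The variable names below are illustrative only.

```python
cluster_sizes = [1, 2, 3, 4, 5, 8, 10, 15, 23, 30]
n_clusters = [282, 20, 13, 4, 2, 1, 1, 1, 1, 1]

# Each cluster of size s contributes s samples.
total_samples = sum(s * n for s, n in zip(cluster_sizes, n_clusters))
# Number of distinct genotypes observed.
total_clusters = sum(n_clusters)
```

Here `total_samples` is 473, matching the sample size, and `total_clusters` is 326.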
Model: Underlying disease process

Assume initially there is one case


3 event types: birth, death, mutation (→ new genotype)
Suppose there are N cases at some time
Rate of births: αN
Rate of deaths: δN
Rate of mutations θN
Defines a continuous time Markov process model
We don’t care about times (no data) so can reduce to discrete
time Markov process
Model: Producing data

Run the disease process until there are 10,000 cases


(If the disease dies out, rerun)
Take a simple random sample of 473 cases
Convert to data on genotype frequencies
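The data-generating steps above can be sketched as a simulation of the discrete-time chain. This is an illustrative reading of the slides, not the code used by Tanaka et al: the function name, the parameter values in the usage line, and the bookkeeping (one list entry per case) are all assumptions.

```python
import random
from collections import Counter

def simulate_clusters(alpha, delta, theta, n_target=10000, n_sample=473, seed=0):
    """Return {cluster size: number of clusters} for one simulated data set.

    Assumes alpha > delta, so that runs which do not die out occur with
    positive probability, and n_sample <= n_target.
    """
    rng = random.Random(seed)
    total = alpha + delta + theta
    while True:                        # rerun whenever the disease dies out
        cases = [0]                    # cases[i] is the genotype label of case i
        next_genotype = 1
        while 0 < len(cases) < n_target:
            i = rng.randrange(len(cases))   # every case is equally likely to act
            u = rng.random() * total
            if u < alpha:                   # birth: new case with same genotype
                cases.append(cases[i])
            elif u < alpha + delta:         # death: remove case i (swap and pop)
                cases[i] = cases[-1]
                cases.pop()
            else:                           # mutation: case i gets a new genotype
                cases[i] = next_genotype
                next_genotype += 1
        if cases:
            break
    sample = rng.sample(cases, n_sample)        # simple random sample of cases
    genotype_counts = Counter(sample)           # genotype -> cases in the sample
    return Counter(genotype_counts.values())    # cluster size -> number of clusters

# Hypothetical parameter values, chosen only so that alpha > delta.
clusters = simulate_clusters(alpha=0.7, delta=0.5, theta=0.2)
```

Event times never appear: only the relative sizes of the three rates decide which event happens next.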
Prior

Some information on θ from previous studies


Prior distribution $N(0.198,\ 0.06735^2)$ chosen
Corresponding density denoted p(θ)
Ignorance for other parameters
Proposed (improper) overall prior:

$$p(\alpha, \delta, \theta) = \begin{cases} p(\theta) & \text{if } 0 < \delta < \alpha \\ 0 & \text{otherwise} \end{cases}$$
Motivation:
Marginal for θ is p(θ)
Marginal for (α, δ) is improper uniform:

$$\begin{cases} 1 & \text{if } 0 < \delta < \alpha \\ 0 & \text{otherwise} \end{cases}$$
Restriction α > δ ⇒ zero prior probability on parameters
where epidemic usually dies out
Results

See Tanaka et al paper


Note change from prior
Parameter Redundancy

All parameters are proportional to rates


Multiplying all by a constant affects only rate of events
But this is irrelevant to our model
Model is over-parameterised:
(α, δ, θ) and (kα, kδ, kθ) give same likelihood
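A small sketch of the redundancy, assuming the discrete-time reduction in which the next event is a birth, death, or mutation with probability proportional to its rate. The function name and the numerical values are illustrative.

```python
def event_probabilities(alpha, delta, theta):
    """Probabilities that the next event is a birth, death, or mutation."""
    total = alpha + delta + theta
    return (alpha / total, delta / total, theta / total)

base = event_probabilities(0.7, 0.5, 0.2)
scaled = event_probabilities(7.0, 5.0, 2.0)   # every rate multiplied by k = 10
```

`base` and `scaled` agree up to rounding, so the two parameter sets generate the same distribution over event sequences and hence the same likelihood.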
Reparameterisation

Reparameterise to:

a = α/(α + δ + θ)
d = δ/(α + δ + θ)
θ=θ

Motivation: keep θ as have prior info for it


a and d tell us everything about relative rates
Only θ has info on absolute rates. . .
. . . and θ has info on absolute rates only
Parameter constraints:
α, δ, θ > 0 ⇒ a, d, θ ≥ 0
and also a + d ≤ 1
Requirement α > δ in prior ⇒ a > d
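The mapping and its inverse can be sketched as follows. The inverse comes from noting that 1 − a − d = θ/(α + δ + θ); the function names and test values are assumptions for illustration.

```python
def to_new(alpha, delta, theta):
    """(alpha, delta, theta) -> (a, d, theta)."""
    total = alpha + delta + theta
    return alpha / total, delta / total, theta

def to_old(a, d, theta_new):
    """(a, d, theta) -> (alpha, delta, theta)."""
    rest = 1.0 - a - d            # equals theta / (alpha + delta + theta)
    return a * theta_new / rest, d * theta_new / rest, theta_new

a, d, th = to_new(0.7, 0.5, 0.2)  # hypothetical rates with alpha > delta
back = to_old(a, d, th)           # recovers the original rates up to rounding
```

For these values a = 0.5 and d is roughly 0.357, so the constraints d < a and a + d < 1 both hold.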
Paradox (intuitive)

In new parameterisation, θ equiv to absolute rate info


But data has no information on absolute rates
So (marginal) θ posterior should equal prior?????
Analytic Results 1: Jacobian

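Recall:
$a = \alpha/(\alpha + \delta + \theta)$
$d = \delta/(\alpha + \delta + \theta)$
$\theta_{new} = \theta$
Solve to give:
$\alpha = a\,\theta_{new}/(1 - a - d)$
$\delta = d\,\theta_{new}/(1 - a - d)$
$\theta = \theta_{new}$
Differentiate for Jacobian:

$$J = \frac{\partial(\alpha, \delta, \theta)}{\partial(a, d, \theta_{new})} = (1-a-d)^{-2} \begin{pmatrix} \theta_{new}(1-d) & a\,\theta_{new} & a(1-a-d) \\ d\,\theta_{new} & \theta_{new}(1-a) & d(1-a-d) \\ 0 & 0 & (1-a-d)^{2} \end{pmatrix}$$

$$|J| = \theta_{new}^{2}\,(1-a-d)^{-3}$$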
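The determinant can be checked numerically with central finite differences. This is only a sanity check under an arbitrarily chosen test point; the function names and step size are assumptions.

```python
def to_old(a, d, theta_new):
    """Inverse map (a, d, theta_new) -> (alpha, delta, theta)."""
    rest = 1.0 - a - d
    return (a * theta_new / rest, d * theta_new / rest, theta_new)

def numeric_jacobian_det(a, d, theta_new, h=1e-6):
    point = [a, d, theta_new]
    m = []
    for k in range(3):
        up = list(point)
        dn = list(point)
        up[k] += h
        dn[k] -= h
        f_up, f_dn = to_old(*up), to_old(*dn)
        # m[k][i] approximates d(old_i)/d(new_k); this is the transpose of J,
        # which has the same determinant.
        m.append([(f_up[i] - f_dn[i]) / (2 * h) for i in range(3)])
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def analytic_det(a, d, theta_new):
    return theta_new ** 2 * (1.0 - a - d) ** -3

a, d, theta_new = 0.5, 0.3, 0.2          # arbitrary test point with d < a, a + d < 1
numeric = numeric_jacobian_det(a, d, theta_new)
exact = analytic_det(a, d, theta_new)    # 0.04 / 0.008 = 5
```

The two values agree to well within the finite-difference error.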
Analytic Results 2: Reparameterised prior

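Recall $p(\alpha, \delta, \theta) = p(\theta)\, I[0 < \delta < \alpha]$

(where p(θ) is a normal pdf)
Then:

$$\begin{aligned} p(a, d, \theta_{new}) &= p(\theta)\, I[0 < \delta < \alpha]\, |J| \\ &= \theta_{new}^{2}\, p(\theta_{new})\, I[0 < d < a]\, (1-a-d)^{-3} \end{aligned}$$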
Analytic Results 3: Posterior

Recall likelihood depends on a, d only


i.e. f (X |λ) = f (a, d)
So posterior is:
$$\pi(a, d, \theta_{new}) \propto \theta_{new}^{2}\, p(\theta_{new})\, I[0 < d < a]\, (1-a-d)^{-3}\, f(a, d)$$

If this is proper, then posterior marginal for θ is:

$$\pi(\theta_{new}) \propto \theta_{new}^{2}\, p(\theta_{new})$$

Matches results graph
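To see how large the shift is, the mean of the density proportional to θ² p(θ) can be compared with the prior mean by numerical integration. The integration range [0, 1] and the grid size are assumptions made for the sketch; the prior mean and standard deviation are those quoted on the Prior slide.

```python
import math

MU, SD = 0.198, 0.06735   # prior mean and sd for theta

def prior_pdf(theta):
    z = (theta - MU) / SD
    return math.exp(-0.5 * z * z) / (SD * math.sqrt(2.0 * math.pi))

def mean_on_grid(density, lo=0.0, hi=1.0, n=20000):
    """Mean of an unnormalised density on [lo, hi] using the midpoint rule."""
    h = (hi - lo) / n
    xs = [lo + (k + 0.5) * h for k in range(n)]
    ws = [density(x) for x in xs]
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

prior_mean = mean_on_grid(prior_pdf)
shifted_mean = mean_on_grid(lambda t: t * t * prior_pdf(t))
```

The θ² factor moves the mean from about 0.198 to about 0.24, a change that comes entirely from the prior construction and not from the data.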


Paradox and explanation

The prior was constructed to have marginal p(θ)


The model contains no data on θ
But we have shown that the posterior acts like $\propto \theta^{2} p(\theta)$
(easy to falsely conclude that change is due to data)
PARADOX
The problem is that marginal distributions are not well defined for improper priors
i.e. $\int\!\!\int p(\alpha, \delta, \theta)\, d\alpha\, d\delta$ is not a pdf (integral not 1)
Attempting to normalise gives ∞/∞ problems
Prior didn’t really have claimed marginal
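A short worked calculation, using only the prior and the reparameterised prior defined earlier, makes the failure explicit. Integrating out (α, δ) in the original parameterisation gives

$$\int\!\!\int p(\alpha, \delta, \theta)\, d\alpha\, d\delta = p(\theta) \int_0^\infty \!\! \int_0^{\alpha} d\delta\, d\alpha = p(\theta) \int_0^\infty \alpha\, d\alpha = \infty,$$

while integrating out (a, d) in the new parameterisation gives

$$\int\!\!\int p(a, d, \theta_{new})\, da\, dd = \theta_{new}^{2}\, p(\theta_{new}) \iint_{0 < d < a,\ a + d < 1} (1-a-d)^{-3}\, da\, dd = \infty,$$

since the integrand behaves like $s^{-3}$ as $s = 1 - a - d \to 0$. Both "marginals" are an infinite constant times a function of θ, but the functions differ ($p(\theta)$ versus $\theta^{2} p(\theta)$), so neither can be singled out as the prior marginal for θ.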
Practical resolution

Prior aimed to combine ignorance on α, δ with prior knowledge on θ
In (a, d, θ) reparameterisation, range of (a, d) is finite
Combine p(θ) with a uniform marginal on (a, d) using
independence
This parameterisation does give a proper prior
So priors are well defined
(side issue: is uniform best representation of ignorance?)
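As a sketch of this construction (the constant 4 is derived here, not quoted from the talk): the region $T = \{(a, d) : 0 < d < a,\ a + d < 1\}$ is a triangle with vertices (0, 0), (1, 0), (1/2, 1/2) and area 1/4, so the uniform density on it is 4 and

$$p(a, d, \theta) = 4\, p(\theta)\, I[(a, d) \in T], \qquad \iint p(a, d, \theta)\, da\, dd = p(\theta).$$

Because the likelihood depends on (a, d) only, the posterior marginal is then

$$\pi(\theta) \propto p(\theta) \iint_T 4\, f(a, d)\, da\, dd \propto p(\theta),$$

which agrees with the intuition that the data carry no information on θ.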
Part III

Marginalisation Paradoxes: theory


Subjective Bayes viewpoint

Priors should represent prior beliefs


Only a probability distribution represents beliefs coherently
Therefore don’t use improper priors
(this is the resolution used earlier)
Objective Bayes viewpoint

Conclusions shouldn’t depend on subjective beliefs


(c.f. frequentist analysis)
Instead use objective reference priors
Lots of theory for choosing these
Will often be improper (e.g. Jeffreys prior)
So marginalisation paradoxes a real issue
The marginalisation paradox

Well-known Bayesian inference paradox


From Dawid, Stone, Zidek (RSS B 1973; read paper)
For models with a particular structure. . .
. . . there are two marginalisation approaches to Bayesian
inference
For improper priors, these typically do not agree
Large literature; claims of resolution but not fully
acknowledged
Is my example a special case of this?
Part IV

Conclusion
Conclusion

Be wary of marginalisation issues for improper priors!


Bibliography

A. P. Dawid, M. Stone, and J. V. Zidek. Marginalization paradoxes in Bayesian and structural inference. JRSS(B), 35:189–233, 1973.
Mark M. Tanaka, Andrew R. Francis, Fabio Luciani, and S. A.
Sisson. Using Approximate Bayesian Computation to Estimate
Tuberculosis Transmission Parameters from Genotype Data.
Genetics, 173:1511–1520, 2006.
