Bayesian Data Analysis
An Introduction
By Martin Roa Villescas
What is Bayesian inference?
● Bayesian inference is reallocation of credibility across possibilities.
[Figure: panels showing credibility allocated across possibilities A, B, C, and D; the top row ("Prior") shows the initial allocation, and the row below shows the reallocated credibility. Axes: Possibilities vs. Credibility (0.0 to 1.0).]
● This reallocation of credibility is not only intuitive; it is also what the mathematics of Bayesian inference prescribe.
Foundational ideas
● Bayesian data analysis rests on two foundational ideas:
– Bayesian probability: the mathematics that govern the reallocation of credibility.
– Probabilistic models: mathematical descriptions of the uncertain situation that generates the data.
Bayesian probability
● The mathematics that govern the reallocation of credibility.
● It boils down to one simple equation: Bayes' rule (stated below).
● Bayes' rule is derived from:
– The conditional probability law
– The total probability theorem
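For a parameter (or hypothesis) θ and observed data D, the "one simple equation" reads

$$
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}.
$$

The next slides show where the numerator and the denominator come from.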
Thomas Bayes
● Thomas Bayes (1702-1761) was an English mathematician and Presbyterian minister.
● His theorem was published posthumously in 1763 by Richard Price.
● An alternative approach to statistical inference, known as frequentist inference, emerged later, in the 20th century, with Ronald Fisher (1890-1962) as its main figure.
● Although the Fisherian approach was dominant in the 20th century, it is curious and reassuring that the older Bayesian approach of the 18th century is taking over in the 21st century.
Probabilistic models
● A probabilistic model is a mathematical description of an uncertain situation.
● It has a sample space Ω, which is the set of all possible outcomes of an experiment.
● And a probability law, which assigns to a set A of possible outcomes a nonnegative number P(A) that encodes our belief about the collective "likelihood" of the elements of A.
Probabilistic models
● Elements of the sample space must be distinct and mutually exclusive.
● The sample space must be collectively exhaustive.
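As a concrete illustration (a fair six-sided die, chosen here only as an example): the sample space is Ω = {1, 2, 3, 4, 5, 6}, its elements are distinct, mutually exclusive, and collectively exhaustive, and a probability law assigns, e.g., P({2, 4, 6}) = 1/2 to the event "the roll is even".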
Conditional probability law
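In standard form: for two events A and B with P(B) > 0, the conditional probability of A given B is

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)},
\qquad\text{equivalently}\qquad
P(A \cap B) = P(A \mid B)\, P(B).
$$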
Total probability theorem
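In standard form: if the events A1, …, An partition the sample space (they are disjoint and collectively exhaustive) and each P(Ai) > 0, then for any event B

$$
P(B) = \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i).
$$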
Bayes' rule
● Conditional probability law (product rule)
● Total probability theorem
● Bayes' rule (combining the two, as shown below)
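Putting the two ingredients together gives Bayes' rule for events: for a partition A1, …, An and an observed event B,

$$
P(A_i \mid B)
= \frac{P(B \mid A_i)\, P(A_i)}{P(B)}
= \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j} P(B \mid A_j)\, P(A_j)}.
$$

The numerator uses the product rule; the denominator expands P(B) with the total probability theorem.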
Bayes' rule
● Bayes' rule is merely the mathematical relation between the prior allocation of credibility and the posterior reallocation of credibility conditional on data.
● What is inference?
– There are a number of "causes" θi that may result in an "effect" D.
– We observe the effect D, and we wish to infer the cause θi (see the form of Bayes' rule below).
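In this cause-and-effect notation, with a discrete set of candidate causes θ1, …, θn, Bayes' rule reads

$$
p(\theta_i \mid D) = \frac{p(D \mid \theta_i)\, p(\theta_i)}{\sum_{j} p(D \mid \theta_j)\, p(\theta_j)}.
$$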
Bayes' rule
● Each factor in Bayes' rule has a name (labeled in the equation below):
– Likelihood function
– Prior distribution
– Evidence
– Posterior distribution
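In the θ/D notation used above:

$$
\underbrace{p(\theta \mid D)}_{\text{posterior}}
= \frac{\overbrace{p(D \mid \theta)}^{\text{likelihood}}\;\overbrace{p(\theta)}^{\text{prior}}}{\underbrace{p(D)}_{\text{evidence}}}
$$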
Two-way discrete table
● Example of Bayes' rule in action: joint and marginal proportions of eye color and hair color.

                        Hair color
Eye color               Black   Brunette   Red    Blond   Marginal (eye color)
Brown                   0.11    0.20       0.04   0.01    0.37
Blue                    0.03    0.14       0.03   0.16    0.36
Hazel                   0.03    0.09       0.02   0.02    0.16
Green                   0.01    0.05       0.02   0.03    0.11
Marginal (hair color)   0.18    0.48       0.12   0.21    1.0
Two-way discrete table
● Conditioning on blue eyes: dividing the "Blue" row of the joint table above by its marginal, p(blue) = 0.36, gives p(hair color | blue eyes). A short numerical check in code follows the table.

                        Hair color
Eye color   Black       Brunette    Red         Blond       Marginal (eye color)
Blue        0.03/0.36   0.14/0.36   0.03/0.36   0.16/0.36   0.36/0.36 = 1.0
            = 0.08      = 0.39      = 0.08      = 0.45
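A minimal sketch in plain Python (variable names are illustrative) that reproduces the conditioning step from the table above:

```python
# Joint proportions p(eye color, hair color) from the two-way table.
joint = {
    "brown": {"black": 0.11, "brunette": 0.20, "red": 0.04, "blond": 0.01},
    "blue":  {"black": 0.03, "brunette": 0.14, "red": 0.03, "blond": 0.16},
    "hazel": {"black": 0.03, "brunette": 0.09, "red": 0.02, "blond": 0.02},
    "green": {"black": 0.01, "brunette": 0.05, "red": 0.02, "blond": 0.03},
}

# Marginal p(eye = blue) is the sum of the "blue" row.
p_blue = sum(joint["blue"].values())                      # 0.36

# Conditioning: p(hair | eye = blue) = p(eye = blue, hair) / p(eye = blue).
p_hair_given_blue = {h: p / p_blue for h, p in joint["blue"].items()}

print(p_blue)             # 0.36
print(p_hair_given_blue)  # black ~0.08, brunette ~0.39, red ~0.08, blond ~0.44
```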
Bayes' rule
● Bayes' rule in the context of continuous variables:
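$$
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
\qquad
p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta,
$$

where the sum over discrete causes is replaced by an integral over the continuous parameter θ.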
Bayes' rule difficulty
● For complex models, the integral in the denominator of Bayes' rule, p(D) = ∫ p(D|θ) p(θ) dθ, is impossible to solve analytically!
● How has this difficulty been addressed?
– Analytically:
  ● Restricting to relatively simple likelihood functions with conjugate priors.
  ● Variational approximation: approximating functions with others that are easier to work with.
– Numerically:
  ● Exhaustive summation over a grid of points covering the parameter space (see the sketch after this list).
  ● Markov chain Monte Carlo (MCMC) methods: randomly sampling a large number of representative combinations of parameter values from the posterior distribution.
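A minimal sketch of the grid (exhaustive summation) approach for a single parameter; the coin-flip model, the data (z = 17 heads in N = 20 flips), and the uniform prior are assumptions chosen to match the running example later in the deck:

```python
import numpy as np

z, N = 17, 20                          # observed heads and total flips (assumed data)
theta = np.linspace(0, 1, 1001)        # grid of candidate parameter values

prior = np.ones_like(theta)            # uniform prior over the grid
prior /= prior.sum()

likelihood = theta**z * (1 - theta)**(N - z)    # Bernoulli/binomial kernel

unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()   # grid sum replaces the intractable integral

print(theta[np.argmax(posterior)])     # posterior mode, approximately 0.85
```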
Example
Inferring a binomial probability using pure analytical mathematics, without any approximations.
Step 1: Data type
● The first step is to identify the type of data being described.
● In this example we will try to estimate the bias of a coin, i.e. the data can take one of two values: heads (1) or tails (0).
Step 2: Descriptive model
● The next step is to create a descriptive model with meaningful parameters.
● This means coming up with a likelihood function.
● In this example we will use the Bernoulli likelihood function (written out below):
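For a single flip y ∈ {0, 1} with bias θ, and for z heads in N independent flips:

$$
p(y \mid \theta) = \theta^{y} (1 - \theta)^{1 - y},
\qquad
p(z, N \mid \theta) = \theta^{z} (1 - \theta)^{N - z}.
$$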
Step 3: Prior distribution
● The next step is to establish a prior distribution over the parameter values.
● Two desiderata for mathematical tractability:
– The product of p(z, N | θ) and p(θ) results in a function of the same form as p(θ).
– The denominator of Bayes' rule, p(z, N) = ∫ p(z, N | θ) p(θ) dθ, can be solved analytically.
Step 3: Prior distribution
● Notice that both desiderata are satisfied if the prior is of the form θ^(a−1) (1 − θ)^(b−1).
● A probability density of that form is called a beta distribution, and it is defined as

$$
p(\theta \mid a, b) = \text{beta}(\theta \mid a, b) = \frac{\theta^{a-1} (1 - \theta)^{b-1}}{B(a, b)},
$$

where B(a, b) is a normalizing constant that makes the density integrate to 1.
Step 3: Prior distribution
[Figure: a 4 × 4 grid of beta densities p(θ|a, b) for a ∈ {0.1, 1, 2, 3} (varying across columns) and b ∈ {0.1, 1, 2, 3} (varying across rows); each panel plots θ from 0 to 1 against the density, with the y-axis running from 0 to 3.]
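A minimal sketch, assuming scipy and matplotlib are available, that draws a grid of beta densities like the figure above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

a_values = [0.1, 1, 2, 3]          # shape parameter a, varied across columns
b_values = [0.1, 1, 2, 3]          # shape parameter b, varied across rows
theta = np.linspace(0.001, 0.999, 500)

fig, axes = plt.subplots(len(b_values), len(a_values),
                         figsize=(10, 10), sharex=True, sharey=True)
for i, b_val in enumerate(b_values):
    for j, a_val in enumerate(a_values):
        ax = axes[i, j]
        ax.plot(theta, beta.pdf(theta, a_val, b_val))  # beta density on this panel
        ax.set_title(f"a = {a_val}, b = {b_val}", fontsize=8)
        ax.set_ylim(0, 3)

fig.supxlabel("θ")
fig.supylabel("p(θ|a, b)")
plt.tight_layout()
plt.show()
```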
Step 4: Bayesian inference
● The next steps are collecting the data and applying Bayes' rule to reallocate credibility across the possible parameter values.

$$
p(\theta \mid z, N) = \frac{p(z, N \mid \theta)\, p(\theta)}{p(z, N)}
\qquad \text{(Bayes' rule)}
$$

$$
= \theta^{z} (1 - \theta)^{N - z}\,
  \frac{\theta^{a-1} (1 - \theta)^{b-1}}{B(a, b)\, p(z, N)}
\qquad \text{(by the definitions of the Bernoulli and beta distributions)}
$$

$$
= \frac{\theta^{(z + a) - 1} (1 - \theta)^{(N - z + b) - 1}}{B(a, b)\, p(z, N)}
\qquad \text{(collecting powers of } \theta \text{ and } 1 - \theta\text{)}
$$
Step 4: Bayesian inference
● If the prior distribution is beta(θ | a, b), then the posterior distribution is again a beta distribution: beta(θ | z + a, N − z + b). The beta prior is therefore a conjugate prior for the Bernoulli likelihood.
Step 4: Bayesian inference
[Figure: three prior-likelihood-posterior columns for the same data, z = 17 heads in N = 20 flips. Priors (top row): dbeta(θ|18.25, 6.75) with mode 0.75 and 95% HDI 0.558-0.892, dbeta(θ|100, 100) with mode 0.5 and 95% HDI 0.431-0.569, and the uniform dbeta(θ|1, 1). The Bernoulli likelihood p(D|θ) (middle row) peaks at θ = 0.85 in every column. Posteriors (bottom row): dbeta(θ|35.25, 9.75), dbeta(θ|117, 103), and dbeta(θ|18, 4).]
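A minimal sketch, assuming scipy is available, of the conjugate update and the 95% HDI for the three priors in the figure (the HDI search below is a simple numerical scan chosen for illustration, not the routine used to produce the original figure):

```python
import numpy as np
from scipy.stats import beta

def hdi(a, b, mass=0.95, grid_size=10_000):
    """Approximate the highest-density interval of a beta(a, b) distribution."""
    # Scan lower-tail probabilities; the HDI of a unimodal density is the
    # narrowest interval containing the requested probability mass.
    lower_tails = np.linspace(0, 1 - mass, grid_size)
    lo = beta.ppf(lower_tails, a, b)
    hi = beta.ppf(lower_tails + mass, a, b)
    narrowest = np.argmin(hi - lo)
    return lo[narrowest], hi[narrowest]

z, N = 17, 20                                  # observed data assumed from the figure
priors = [(18.25, 6.75), (100, 100), (1, 1)]   # beta prior parameters (a, b)

for a, b in priors:
    a_post, b_post = a + z, b + N - z          # conjugate update: beta(a + z, b + N - z)
    print(f"prior beta({a}, {b}) -> posterior beta({a_post}, {b_post}), "
          f"95% HDI of prior: {hdi(a, b)}")
```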
Step 5: Posterior predictive check
● The final step is to check that the posterior predictions of the model mimic the observed data with reasonable accuracy.
References
● Doing Bayesian Data Analysis: A Tutorial with R and BUGS, by J. K. Kruschke
● Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis