Bayesian Data Analysis


Bayesian Data Analysis

An Introduction
By Martin Roa Villescas
What is Bayesian inference?

Bayesian inference is reallocation of credibility across possibilities.
[Figure: three pairs of bar charts of credibility (y-axis, 0.0 to 1.0) over possibilities A, B, C, D (x-axis). Each pair shows a prior and the posterior after learning, in turn, that A is impossible, that B is impossible, and that C is impossible; the credibility of each eliminated possibility is reallocated to the possibilities that remain.]
This reallocation of credibility is not only intuitive, it is also what the exact mathematics of Bayesian inference prescribe!
Foundational ideas

Bayesian data analysis has two foundational ideas:

1) Bayesian inference is reallocation of credibility across possibilities.

2) The possibilities, over which we allocate credibility, are parameter values in meaningful mathematical models.

You can think of parameters as control knobs on mathematical devices that simulate data generation (see the sketch below).
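As an illustration of the control-knob metaphor (a minimal sketch of my own, not from the slides), here is a tiny Python "device" whose knob θ sets the probability of heads in simulated coin flips:

import random

def simulate_flips(theta, n, seed=0):
    # theta is the "control knob": the underlying probability of heads.
    # Each flip is 1 (heads) with probability theta, else 0 (tails).
    rng = random.Random(seed)
    return [1 if rng.random() < theta else 0 for _ in range(n)]

# Turning the knob changes the kind of data the device generates.
print(sum(simulate_flips(0.5, 1000)))  # roughly 500 heads
print(sum(simulate_flips(0.9, 1000)))  # roughly 900 heads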
Bayesian probability

The mathematics that govern the reallocation of credibility boil down to one simple equation: Bayes’ rule!

Bayes’ rule is derived from:
– The conditional probability law
– The total probability theorem
Thomas Bayes

Thomas Bayes (1702-1761) was an English mathematician and Presbyterian minister.

His theorem was published posthumously in 1763 by Richard Price.

An alternative approach to statistical inference, known as frequentist inference, emerged in the 20th century, with Ronald Fisher (1890-1962) as its main figure.

Although the Fisherian approach was dominant in the 20th century, it is curious and reassuring that the older Bayesian approach of the 18th century is taking over in the 21st century.
Probabilistic models

A probabilistic model is a mathematical description of an uncertain situation.

It has a sample space Ω, which is the set of all possible outcomes of an experiment.

And it has a probability law, which assigns to a set A of possible outcomes a nonnegative number P(A) that encodes our belief about the collective “likelihood” of the elements of A.
Probabilistic models

Elements of the sample space must be distinct and mutually exclusive.

The sample space must be collectively exhaustive.

A small example of these ingredients is sketched below.
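Here is a minimal probabilistic model of a fair six-sided die in Python (my example, not from the slides): the sample space is a set of distinct, mutually exclusive outcomes, and the probability law assigns a nonnegative number to any event A ⊆ Ω:

from fractions import Fraction

# Sample space: distinct, mutually exclusive, collectively exhaustive outcomes.
omega = {1, 2, 3, 4, 5, 6}

# Probability law: a nonnegative number per outcome; the numbers sum to 1.
law = {outcome: Fraction(1, 6) for outcome in omega}

def P(event):
    # Probability of an event A, a subset of the sample space.
    assert event <= omega, "events must be subsets of the sample space"
    return sum(law[outcome] for outcome in event)

print(P({2, 4, 6}))  # probability of rolling an even number: 1/2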
Conditional probability law

For events A and B with p(B) > 0, the conditional probability of A given B is

p(A|B) = p(A, B) / p(B),

where p(A, B) is the probability that both A and B occur. Rearranged, this is the product rule: p(A, B) = p(A|B) p(B).
Total probability theorem

If A1, …, An are disjoint events that form a partition of the sample space, then for any event B,

p(B) = Σi p(B|Ai) p(Ai).
Bayes’ rule

Combining the two results yields Bayes’ rule:

Conditional probability law (product rule): p(Ai, B) = p(B|Ai) p(Ai)

Total probability theorem: p(B) = Σj p(B|Aj) p(Aj)

Bayes’ rule: p(Ai|B) = p(B|Ai) p(Ai) / Σj p(B|Aj) p(Aj)
Bayes’ rule

Bayes’ rule is merely the mathematical relation between the prior allocation of credibility and the posterior reallocation of credibility conditional on data.

What is inference?
– There are a number of “causes” θi that may result in an “effect” D.
– We observe the effect D, and we wish to infer the cause θi.
Bayes’ rule

p(θi|D) = p(D|θi) p(θi) / p(D)

– Likelihood function, p(D|θi): the probability that the data D could be generated by the model with parameter value θi. Although it specifies a probability at each value of θi, the likelihood function is not a probability distribution.

– Prior distribution, p(θi): the probability distribution that describes the credibility of the parameter values θi before the data D are taken into account.

– Evidence, p(D): the overall probability of the data according to the model, determined by averaging across all possible parameter values, weighted by the strength of belief in those parameter values.

– Posterior distribution, p(θi|D): the probability distribution that describes the credibility of the parameter values θi with the data D taken into account.

A small numerical sketch of these four pieces follows.
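To make the four pieces concrete, here is a minimal Python sketch of Bayes’ rule for a discrete set of candidate causes θi; the prior and likelihood numbers are made up for illustration:

# Candidate "causes" theta_i and our prior credibility in each.
prior = {"theta_1": 0.3, "theta_2": 0.5, "theta_3": 0.2}

# Likelihood p(D | theta_i): probability of the observed effect D
# under each candidate cause (illustrative numbers).
likelihood = {"theta_1": 0.10, "theta_2": 0.40, "theta_3": 0.80}

# Evidence p(D): the likelihood averaged over the prior.
evidence = sum(likelihood[t] * prior[t] for t in prior)

# Posterior p(theta_i | D): Bayes' rule.
posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}

print(evidence)   # 0.39
print(posterior)  # theta_1: 0.077, theta_2: 0.513, theta_3: 0.410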
Two-way discrete table

Example of Bayes’ rule in action. Joint and marginal probabilities of eye color and hair color:

                       Hair color
Eye color              Black   Brunette   Red    Blond   Marginal (eye color)
Brown                  0.11    0.20       0.04   0.01    0.37
Blue                   0.03    0.14       0.03   0.16    0.36
Hazel                  0.03    0.09       0.02   0.02    0.16
Green                  0.01    0.05       0.02   0.03    0.11
Marginal (hair color)  0.18    0.48       0.12   0.21    1.0
Two-way discrete table

Conditioning on blue eyes, i.e., dividing the “Blue” row of the table above by its marginal probability 0.36, reallocates credibility across hair colors:

Eye color   Black              Brunette           Red                Blond              Marginal
Blue        0.03/0.36 = 0.08   0.14/0.36 = 0.39   0.03/0.36 = 0.08   0.16/0.36 = 0.45   0.36/0.36 = 1.0
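The same conditioning in a short Python sketch (my code), using the joint table above:

# Joint probabilities p(eye color, hair color) from the table above.
joint = {
    "Brown": {"Black": 0.11, "Brunette": 0.20, "Red": 0.04, "Blond": 0.01},
    "Blue":  {"Black": 0.03, "Brunette": 0.14, "Red": 0.03, "Blond": 0.16},
    "Hazel": {"Black": 0.03, "Brunette": 0.09, "Red": 0.02, "Blond": 0.02},
    "Green": {"Black": 0.01, "Brunette": 0.05, "Red": 0.02, "Blond": 0.03},
}

# Marginal p(eye = Blue): sum across the Blue row.
p_blue = sum(joint["Blue"].values())
print(p_blue)  # approximately 0.36 (up to float rounding)

# Conditional p(hair | eye = Blue): divide the row by its marginal.
p_hair_given_blue = {hair: p / p_blue for hair, p in joint["Blue"].items()}
print(p_hair_given_blue)
# Black 0.083, Brunette 0.389, Red 0.083, Blond 0.444 (rounded on the slide)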
Bayes’ rule

Bayes’ rule in the context of continuous variables:

p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ′) p(θ′) dθ′

The sum over discrete possibilities in the evidence becomes an integral over the continuous parameter.
Bayes’ rule difficulty

For complex models, the integral in the denominator of Bayes’ rule (the evidence) is impossible to solve analytically!

How has this difficulty been addressed?
– Analytically:
  Restricting attention to relatively simple likelihood functions with conjugate priors.
  Variational approximation: approximating functions with others that are easier to work with.
– Numerically:
  Exhaustive summation over a grid of points covering the parameter space.
  Markov chain Monte Carlo (MCMC) methods: randomly sampling a large number of representative combinations of parameter values from the posterior distribution.

A minimal grid-approximation sketch follows this list.
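As an illustration of the grid approach (my sketch, not from the slides), the following Python code approximates the posterior for a coin’s bias by exhaustive summation over a grid of θ values:

import numpy as np

# Grid of candidate parameter values covering the space [0, 1].
theta = np.linspace(0, 1, 1001)

# Uniform prior over the grid.
prior = np.full_like(theta, 1.0 / len(theta))

# Bernoulli likelihood for z heads in N flips.
z, N = 17, 20
likelihood = theta**z * (1 - theta) ** (N - z)

# Evidence: sum of likelihood * prior over the grid; posterior by Bayes' rule.
evidence = np.sum(likelihood * prior)
posterior = likelihood * prior / evidence

print(theta[np.argmax(posterior)])  # posterior mode: 0.85 under this flat prior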
Example

Inferring a binomial probability using pure analytical mathematics, without any approximations.
Step 1: Data type

The first step is to identify the type of data being described.

In this example we will try to estimate the bias of a coin, i.e., the data can take one of two values: heads (1) or tails (0).
Step 2: Descriptive model

The next step is to create a descriptive model with meaningful parameters. This means coming up with a likelihood function.

In this example we will use the Bernoulli likelihood function:

p(D|θ) = θ^z (1 − θ)^(N − z),

where z is the number of heads and N − z is the number of tails.

In this function, θ represents the underlying probability of heads, and therefore it can only take values from 0 to 1.
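Expressed as a short Python function (my sketch), with a check that θ stays in [0, 1]:

def bernoulli_likelihood(theta, z, N):
    # Bernoulli likelihood p(D | theta) for z heads in N flips.
    if not 0 <= theta <= 1:
        raise ValueError("theta is a probability; it must lie in [0, 1]")
    return theta**z * (1 - theta) ** (N - z)

print(bernoulli_likelihood(0.85, 17, 20))  # maximal at theta = z/N = 0.85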
Step 3: Prior distribution

The next step is to establish a prior distribution over the parameter values.

Two desiderata for mathematical tractability:
– The product of the likelihood p(D|θ) and the prior p(θ) results in a function of the same form as p(θ).
– The denominator of Bayes’ rule, p(D) = ∫ p(D|θ) p(θ) dθ, can be solved analytically.

A prior with the first property is said to be conjugate to the likelihood.
Step 3: Prior distribution

Notice that if the prior is of the form

p(θ) ∝ θ^(a−1) (1 − θ)^(b−1),

then when multiplied with the likelihood function, the resulting function is of the same form, namely

θ^(z+a−1) (1 − θ)^(N−z+b−1).

A probability density of that form is called a beta distribution, and it is defined as

p(θ|a, b) = θ^(a−1) (1 − θ)^(b−1) / B(a, b),

where B(a, b) is a normalizing constant that ensures that the area under the beta density integrates to 1, namely

B(a, b) = ∫₀¹ θ^(a−1) (1 − θ)^(b−1) dθ.
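A quick numerical check of this conjugacy (my sketch, using scipy.stats.beta, whose pdf matches the definition above): the ratio of the unnormalized posterior to the beta(z + a, N − z + b) density is the same at every θ, confirming the two have the same form.

from scipy.stats import beta

a, b = 2.0, 2.0  # prior beta(theta | a, b); illustrative choice
z, N = 17, 20    # data: z heads in N flips

for theta in (0.3, 0.6, 0.9):
    # Unnormalized posterior: Bernoulli likelihood times beta prior.
    unnorm = theta**z * (1 - theta) ** (N - z) * beta.pdf(theta, a, b)
    # Ratio to the conjectured posterior density beta(z + a, N - z + b):
    # constant in theta, equal to B(z + a, N - z + b) / B(a, b).
    print(unnorm / beta.pdf(theta, z + a, N - z + b))  # same value each time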
Step 3: Prior distribution

3.0

3.0

3.0

3.0
a = 0.1, b = 0.1 a = 1, b = 0.1 a = 2, b = 0.1 a = 3, b = 0.1

p(θ|a, b)

p(θ|a, b)

p(θ|a, b)

p(θ|a, b)
2.0

2.0

2.0

2.0
1.0

1.0

1.0

1.0
0.0

0.0

0.0

0.0
0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8
θ θ θ θ
3.0

3.0

3.0

3.0
a = 0.1, b = 1 a = 1, b = 1 a = 2, b = 1 a = 3, b = 1

p(θ|a, b)

p(θ|a, b)

p(θ|a, b)
p(θ|a, b)
2.0

2.0

2.0

2.0
1.0

1.0

1.0

1.0
0.0

0.0

0.0

0.0
0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8
θ θ θ θ
3.0

3.0

3.0

3.0
a = 0.1, b = 2 a = 1, b = 2 a = 2, b = 2 a = 3, b = 2
p(θ|a, b)

p(θ|a, b)
p(θ|a, b)
p(θ|a, b)
2.0

2.0

2.0

2.0
1.0

1.0

1.0

1.0
0.0

0.0

0.0

0.0
0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8
θ θ θ θ
3.0

3.0

3.0

3.0
a = 0.1, b = 3 a = 1, b = 3 a = 2, b = 3 a = 3, b = 3
p(θ|a, b)

p(θ|a, b)

p(θ|a, b)
p(θ|a, b)
2.0

2.0

2.0

2.0
1.0

1.0

1.0

1.0
0.0

0.0

0.0

0.0

0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8
θ θ θ θ
34
Step 4: Bayesian inference

The next steps are collecting the data and applying Bayes’ rule to reallocate credibility across the possible parameter values:

p(θ|z, N) = p(z, N|θ) p(θ) / p(z, N)   [Bayes’ rule]

          = [θ^z (1 − θ)^(N−z)] · [θ^(a−1) (1 − θ)^(b−1) / B(a, b)] / p(z, N)   [by the definitions of the Bernoulli and beta distributions]

          = θ^z (1 − θ)^(N−z) θ^(a−1) (1 − θ)^(b−1) / [B(a, b) p(z, N)]   [by rearranging factors]

          = θ^(z+a−1) (1 − θ)^(N−z+b−1) / [B(a, b) p(z, N)]   [by collecting powers]

          = θ^(z+a−1) (1 − θ)^(N−z+b−1) / B(z + a, N − z + b)

The last step follows because the posterior must integrate to 1, which forces B(a, b) p(z, N) = B(z + a, N − z + b).
Step 4: Bayesian inference

If the prior distribution is beta(θ|a, b) and the data have z heads in N flips, then the posterior distribution is beta(θ|z + a, N − z + b).
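In Python (my sketch), the analytical update is just parameter arithmetic, and posterior summaries can be read off the resulting beta distribution. Note that scipy’s interval() gives an equal-tailed credible interval, not the HDI shown in the figure below:

from scipy.stats import beta

a, b = 2, 2    # prior beta(theta | a, b); illustrative choice
z, N = 17, 20  # observed data: z heads in N flips

# Conjugate update: posterior is beta(theta | z + a, N - z + b).
post = beta(z + a, N - z + b)

# Mode of a beta density, valid for shape parameters greater than 1.
mode = (z + a - 1) / (N + a + b - 2)
print(mode)                 # 0.818...
print(post.interval(0.95))  # equal-tailed 95% credible interval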
Step 4: Bayesian inference

[Figure: prior, likelihood, and posterior for the same data, z = 17 heads in N = 20 flips, under three different priors. The Bernoulli likelihood p(D|θ) is maximal at θ = 0.85 in every case. A strongly informed prior dbeta(θ|100, 100) (mode 0.5, 95% HDI [0.431, 0.569]) yields posterior dbeta(θ|117, 103) (mode 0.532, 95% HDI [0.466, 0.597]); a moderately informed prior dbeta(θ|18.25, 6.75) (mode 0.75, 95% HDI [0.558, 0.892]) yields posterior dbeta(θ|35.25, 9.75) (mode 0.797, 95% HDI [0.663, 0.897]); a uniform prior dbeta(θ|1, 1) yields posterior dbeta(θ|18, 4) (mode 0.85, 95% HDI [0.66, 0.959]).]
Step 5: Posterior predictive check

The final step is to check that the posterior predictions mimic the data with reasonable accuracy.
References

Doing Bayesian Data Analysis: A Tutorial with R and BUGS, by J. K. Kruschke

Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis
