
Chapter 4 Bayesian Machinery

4.1 Bayes’ Rule


$$P(\theta \mid y) = \frac{P(y \mid \theta) \times P(\theta)}{P(y)}$$

where

1. P(θ|y) = posterior distribution
2. P(y|θ) = likelihood function
3. P(θ) = prior distribution
4. P(y) = normalizing constant
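
As a minimal worked example (the numbers here are illustrative, not from the text): suppose a coin is either fair ($\theta = 0.5$) or heads-biased ($\theta = 0.9$), each with prior probability 0.5, and we observe a single head. Bayes' Rule gives

$$P(\theta = 0.9 \mid \text{head}) = \frac{0.9 \times 0.5}{0.9 \times 0.5 + 0.5 \times 0.5} = \frac{0.45}{0.70} \approx 0.64,$$

so the data reallocate credibility toward the biased coin.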

4.1.1 Posterior Distribution: p(θ|y)

The posterior distribution (often abbreviated as the posterior) is simply the result of computing Bayes' Rule for a set of data and parameters. Because we don't get point estimates for answers, we correctly call it a distribution, and we add the term posterior because this is the distribution produced at the end. You can think of the posterior as a statement about the probability of the parameter value given the data you observed.

“Reallocation of credibilities across possibilities.” - John Kruschke


4.1.2 Likelihood Function: p(y|θ)

- Skip the math
- Consider it similar to other likelihood functions
- In fact, it will give you the same answer as a maximum likelihood (ML) estimator; only the interpretation differs (see the sketch below)
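
A quick sketch of that last point with hypothetical binomial data (not from the text): under a flat prior, the posterior mode lands exactly on the ML estimate, even though we interpret the two differently.

```python
import numpy as np

# Hypothetical data: 7 successes in 10 binomial trials.
y, n = 7, 10
theta = np.linspace(0.001, 0.999, 999)  # grid of candidate parameter values

# Binomial log-likelihood, up to an additive constant.
log_lik = y * np.log(theta) + (n - y) * np.log(1 - theta)

# Flat prior: the posterior is then proportional to the likelihood.
posterior = np.exp(log_lik)
posterior /= posterior.sum()

print(theta[np.argmax(log_lik)])    # ML estimate: ~0.7
print(theta[np.argmax(posterior)])  # posterior mode under a flat prior: also ~0.7
```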

4.2 Priors: p(θ)

Figure 4.1: Prior information can be useful.

The prior is the distribution we give to a parameter before computation.

- WARNING: This is historically a big deal among statisticians, and subjectivity is a main concern cited by Frequentists
- Priors can have very little, if any, influence (e.g., diffuse, vague, non-informative, unspecified, etc.), yet all priors are technically informative
- Much of ecology uses diffuse priors, so in practice there is often little concern
- But priors can be practical if you really do know information (e.g., even basic information, like populations can't take negative values)


- Simple models may not need informative priors; complex models may require them

Figure 4.2: Example of prior, likelihood, and posterior distributions.

Figure 4.3: Example of prior influence based on prior parameters.
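
To see prior influence numerically, here is a minimal conjugate beta-binomial sketch (all numbers hypothetical): a Beta(a, b) prior combined with y successes in n trials yields a Beta(a + y, b + n − y) posterior, so a stronger prior pulls the posterior mean toward the prior's center.

```python
# Hypothetical data: 7 successes in 10 trials.
y, n = 7, 10

# Beta(a, b) prior + binomial likelihood -> Beta(a + y, b + n - y) posterior,
# whose mean is (a + y) / (a + b + n).
for a, b, label in [(1, 1, "flat Beta(1, 1)"),
                    (5, 5, "weak Beta(5, 5)"),
                    (50, 50, "strong Beta(50, 50)")]:
    post_mean = (a + y) / (a + b + n)
    print(f"{label:>20s} prior -> posterior mean = {post_mean:.3f}")
```

The flat prior leaves the estimate at the data's 0.667; the strong prior centered at 0.5 pulls it down toward 0.5.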

You may not use informative priors when you first start modeling. Regardless, always think about your priors, explore how they behave, and be prepared to defend them to reviewers and other peers.

“So far there are only few articles in ecological journals that have actually used this
asset of Bayesian statistics.” - Marc Kery (2010)


4.3 Normalizing Constant: p(y)

The normalizing constant scales the posterior so that the area under the curve equals 1. While this may seem technical (and it is), this is what allows us to interpret Bayesian output probabilistically. The normalizing constant is a high-dimensional integral that in most cases cannot be solved analytically. But we need it, so we have to approximate it through simulation. To do this, we use Markov chain Monte Carlo (MCMC).
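
Concretely, the normalizing constant is the marginal likelihood of the data: the numerator of Bayes' Rule integrated over all possible parameter values,

$$P(y) = \int P(y \mid \theta)\,P(\theta)\,d\theta.$$

With many parameters this becomes a high-dimensional integral, which is why we turn to simulation.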

4.3.1 MCMC Background

Stan Ulam: Manhattan Project scientist

- The solitaire problem: How do you know the chance of winning?
- Can't really solve it analytically… too much complexity
- But we can automate a bunch of games and monitor the results; basically, if we do something enough times, we can assume the simulations are approximating the real thing.
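
In that spirit, here is a miniature Monte Carlo sketch (a toy stand-in for solitaire, since simulating full games takes far more code): estimate a card probability by brute-force simulation and compare it with the exact answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo in miniature: estimate the chance that the top card of a
# shuffled deck is an ace by simulating many "games" instead of solving it.
deck = np.arange(52)                 # cards 0-3 represent the four aces
hits = sum(rng.permutation(deck)[0] < 4 for _ in range(100_000))
print(hits / 100_000)                # close to the exact 4/52 ≈ 0.0769
```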

Fun Fact: There are 80,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 solitaire combinations!

Markov Chain: transitions from one state to another (dependency)
Monte Carlo: chance applied to the transition (randomness)

MCMC is a group of functions, governed by specific algorithms

- Metropolis-Hastings algorithm: one of the first algorithms
- Gibbs algorithm: splits multidimensional θ into separate blocks, reducing dimensionality
- Consider MCMC a black box, if that's easier


Figure 4.4: MCMC samplers are designed to sample parameter space with a
combination of dependency and randomness.

4.3.2 MCMC Example

A politician on a chain of islands wants to spend time on each island proportional to each island's population.

1. After visiting one island, she needs to decide whether to:
   - stay on the current island
   - move to the island to the west
   - move to the island to the east

2. But she doesn't know the overall population; she can only ask the current islanders about their island's population and the populations of the adjacent islands.

3. She flips a coin to decide between the east and west islands:
   - if the selected island has a larger population, she moves there
   - if the selected island has a smaller population, she moves there probabilistically, with probability equal to the selected island's population divided by the current island's population (see the sketch at the end of this section)

MCMC is a set of techniques to simulate draws from the posterior distribution p(θ|x), given a model, a likelihood p(x|θ), and data x, using dependent sequences of random variables. That is, MCMC yields a sample from the posterior distribution of a parameter.
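
The politician's decision rule is a Metropolis sampler. Below is a minimal sketch (the island populations are hypothetical): over many steps, the fraction of time spent on each island approaches each island's share of the total population, even though she only ever compares neighboring islands.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical island chain: relative populations = the target distribution.
pop = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
current = 3                                  # start on a middle island
visits = np.zeros(len(pop))

for _ in range(100_000):
    proposal = current + rng.choice([-1, 1])     # coin flip: west or east
    if 0 <= proposal < len(pop):
        # Move for sure if the proposed island is larger; otherwise move
        # with probability pop[proposal] / pop[current] (the Metropolis rule).
        if rng.random() < pop[proposal] / pop[current]:
            current = proposal
    visits[current] += 1

print(visits / visits.sum())   # approaches pop / pop.sum()
```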

4.3.3 Gibbs Sampling


One iteration includes as many random draws as there are parameters in the
model; in other words, the chain for each parameter is updated by using the last
value sampled for each of the other parameters, which is referred to as full
conditional sampling.

Figure 4.5: Visualizing parameter sampling.
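
To make full conditional sampling concrete, here is a minimal Gibbs sketch for a toy target (a standard bivariate normal with correlation 0.8, chosen purely for illustration), where each full conditional is itself a normal distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8                        # toy target: bivariate normal, correlation 0.8
cond_sd = np.sqrt(1 - rho**2)    # sd of each full conditional

x, y = 0.0, 0.0                  # initial values
samples = np.empty((10_000, 2))
for t in range(len(samples)):
    # One iteration = one draw per parameter, each conditioned on the
    # most recent value of the other (full conditional sampling).
    x = rng.normal(rho * y, cond_sd)
    y = rng.normal(rho * x, cond_sd)
    samples[t] = x, y

print(np.corrcoef(samples.T)[0, 1])   # roughly 0.8
```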

Although the nuts and bolts of MCMC can get very detailed and may go beyond the operational knowledge you need to run models, there are some practical issues that you will need to be comfortable handling, including initial values, burn-in, convergence, and thinning.

4.3.4 Burn-in

- Chains start with an initial value that you specify or randomize
- The initial value may not be close to the true value
- This is OK, but the chain needs time to find the correct parameter space
- If you start in the high-probability region, then you may have burned in already
- Visual assessment can confirm burn-in


Figure 4.6: Burn-in is the initial MCMC sampling that may take place outside of the
highest probability region for a parameter.
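
A synthetic illustration (a toy chain, not a real sampler) of why burn-in is discarded: the chain below starts far from its high-probability region, so the early iterations are biased by the initial value.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic chain that wanders toward a high-probability region around 5.
chain = np.empty(5_000)
chain[0] = -20.0                       # deliberately poor initial value
for t in range(1, len(chain)):
    chain[t] = chain[t - 1] + 0.1 * (5 - chain[t - 1]) + rng.normal(0, 0.5)

burn_in = 1_000
kept = chain[burn_in:]                 # discard the burn-in iterations
print(chain[:100].mean())              # biased by the starting value
print(kept.mean())                     # close to 5
```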

4.3.5 Convergence

- We run multiple independent chains for stronger evidence of the correct parameter space
- When chains converge on the same space, that is strong evidence for convergence
- But how do we know or measure convergence?
- Averages of the functions may converge (the chains themselves don't technically converge)

Figure 4.7: Clean non-convergence for 3 chains.


Figure 4.8: Non-convergence is not always obvious. These chains are not converging despite overlapping.

Convergence Diagnostics

1. Visual convergence of iterations (the "hairy caterpillar" or the "grass")
2. Visual convergence of histograms
3. Brooks-Gelman-Rubin statistic, $\hat{R}$
4. Others

Figure 4.9: Histograms and density plots are a good way to visualize convergence.
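
Here is a sketch of the basic (non-split) Brooks-Gelman-Rubin computation on simulated chains; modern software typically uses a split, rank-normalized refinement, but the idea is the same: compare between-chain and within-chain variance, and look for values near 1.

```python
import numpy as np

def rhat(chains):
    """Basic (non-split) Brooks-Gelman-Rubin statistic for an (m, n) array of m chains."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(4)
good = rng.normal(0, 1, size=(3, 1000))         # 3 chains sampling the same space
bad = good + np.array([[0.0], [2.0], [4.0]])    # 3 chains stuck in different spaces
print(rhat(good))   # near 1.0  -> consistent with convergence
print(rhat(bad))    # well above 1.1 -> not converged
```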

4.3.6 Thinning

MCMC chains are autocorrelated, so $\hat{\theta}_t \sim f(\hat{\theta}_{t-1})$. It is common practice to thin by 2, 3, or 4 (keeping every 2nd, 3rd, or 4th draw) to reduce autocorrelation. However, there are also arguments against thinning.
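
A small sketch with a synthetic autocorrelated chain (an AR(1) process standing in for MCMC output): thinning by 4 keeps every 4th draw and reduces, but does not remove, the lag-1 autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic AR(1) chain with strong autocorrelation (coefficient 0.9).
chain = np.empty(20_000)
chain[0] = 0.0
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

def lag1_autocorr(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

thinned = chain[::4]                 # thin by 4: keep every 4th draw
print(lag1_autocorr(chain))          # about 0.9
print(lag1_autocorr(thinned))        # about 0.9**4 ≈ 0.66
```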

4.3.7 MCMC Summary

There is some artistry in MCMC, or at least some decisions that need to be made by the modeler. Your final number of samples in your posterior is often much less than your total iterations, because in handling the MCMC iterations you will need to eliminate some samples (e.g., burn-in and thinning). Many MCMC adjustments you make will not result in major changes, and this is typically a good thing because it means you are in the parameter space you need to be in. Other times, you will
have a model issue and some MCMC adjustment will make a (good) difference. Because computation is cheap, especially for simple models, it is common to overdo the iterations a little. This is OK.

Here is a nice overview video about MCMC: https://www.youtube.com/watch?v=OTO1DygELpY

And here is a great simulator to play with to evaluate how changes in MCMC
settings play out visually. https://chi-feng.github.io/mcmc-demo/app.html
