STATS 200 (Stanford University, Summer 2015)

Lecture 12:

Bayesian Credible Sets

The last lecture introduced the idea of using data $X$ to estimate $\theta$ by reporting a random
set $C(X)$ that we think is likely to contain the true value of $\theta$. We now consider a Bayesian
approach for constructing such a set, which is called a credible set.
Posterior Probabilities
Bayesian inference is based on the posterior distribution of the unknown parameter $\theta$ given
the observed data $X = x$, which we typically write as $\pi(\theta \mid x)$. A credible set for $\theta$ with
posterior probability $\gamma$ is a set $C(x)$ such that $P[\theta \in C(x) \mid X = x] = \gamma$. The posterior
probability is typically reported as a percentage, and the credible set is identified by this
percentage (e.g., a 95% credible set for $\theta$ when $\gamma = 0.95$). As with confidence intervals, it is
common to write this posterior probability as $1 - \alpha$ instead of $\gamma$.
Computing Credible Intervals
The posterior probability in the definition above is simply
$$P[\theta \in C(x) \mid X = x] = \int_{C(x)} \pi(\theta \mid x) \, d\theta$$
or
$$P[\theta \in C(x) \mid X = x] = \sum_{\theta \in C(x)} \pi(\theta \mid x)$$
according to whether $\theta$ is continuous or discrete. Note that for any particular value of $1 - \alpha$
and any particular observed data $X = x$, there will typically be many sets $C(x)$ such that
$P[\theta \in C(x) \mid X = x] = 1 - \alpha$. There are two commonly used approaches for choosing a
particular set $C(x)$.
Equal-Tailed Credible Intervals
One approach is to choose the set $C(x)$ to be an interval $\{\theta : C_1(x) < \theta < C_2(x)\}$
such that $P[\theta \le C_1(x) \mid X = x] = P[\theta \ge C_2(x) \mid X = x] = \alpha/2$, i.e.,
$$\int_{-\infty}^{C_1(x)} \pi(\theta \mid x) \, d\theta = \int_{C_2(x)}^{\infty} \pi(\theta \mid x) \, d\theta = \frac{\alpha}{2}$$
or
$$\sum_{\theta \le C_1(x)} \pi(\theta \mid x) = \sum_{\theta \ge C_2(x)} \pi(\theta \mid x) = \frac{\alpha}{2}$$
according to whether $\theta$ is continuous or discrete. Such an interval is called an equal-tailed
credible interval for $\theta$ with posterior probability $1 - \alpha$.
Example 12.0.1: Let $X_1, \ldots, X_n \sim \text{iid Bin}(1, \theta)$, where $0 < \theta < 1$, and suppose we observe
$X = x$. Now take our prior distribution for $\theta$ to be $\text{Beta}(a, b)$ for some $a > 0$ and $b > 0$. In
Example 5.2.1 of Lecture 5, we found that the posterior distribution was
$$\theta \mid X = x \sim \text{Beta}\!\left(a + \sum_{i=1}^{n} x_i, \; b + n - \sum_{i=1}^{n} x_i\right).$$
Since we cannot integrate the pdf of the beta distribution in closed form, there is no direct
formula for an equal-tailed credible interval with probability $1 - \alpha$. However, for any particular
values of $a$, $b$, $n$, and $x$, it is trivial to compute such an interval using statistical software
or tables of the beta distribution. For example, if $a = b = 1$, $n = 25$, and $\sum_{i=1}^{n} x_i = 24$, then a
95% equal-tailed credible interval for $\theta$ is approximately $(0.803, 0.991)$.
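As a concrete illustration, the interval above can be computed with standard statistical software. The sketch below uses Python with SciPy (an assumption; any package with beta quantile functions would do):

```python
# Equal-tailed 95% credible interval for theta when the posterior is
# Beta(25, 2), i.e., a = b = 1, n = 25, sum of the x_i = 24, alpha = 0.05.
from scipy.stats import beta

a, b, n, sum_x = 1, 1, 25, 24
alpha = 0.05
posterior = beta(a + sum_x, b + n - sum_x)  # Beta(25, 2)

# Each endpoint cuts off alpha/2 posterior probability in its tail.
lower = posterior.ppf(alpha / 2)
upper = posterior.ppf(1 - alpha / 2)
print(lower, upper)  # approximately (0.803, 0.991)
```

This is exactly the quantile computation the lecture describes: the endpoints are the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior distribution.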


Highest Posterior Density Credible Sets


A different approach is to choose $C(x)$ to be the set where the posterior pdf or pmf $\pi(\theta \mid x)$
exceeds some value, i.e.,
$$C(x) = \{\theta : \pi(\theta \mid x) > k\},$$
where $k > 0$ is chosen so that $P[\theta \in C(x) \mid X = x] = 1 - \alpha$. Such a set (which may or may not
be an interval) is called a highest posterior density (or HPD) credible set for $\theta$ with posterior
probability $1 - \alpha$.
Note: This terminology may seem as though it should only apply when $\theta$ is continuous,
since otherwise it does not have a pdf (probability density function). However, in more
sophisticated treatments of probability, both pdfs and pmfs can be defined as special
cases of a more general notion of density. (Specifically, a pdf is a density with respect to
Lebesgue measure, while a pmf is a density with respect to counting measure.) Thus,
there is no problem with referring to these sets as highest posterior density credible
sets even when $\theta$ is discrete.

Example 12.0.2: For the posterior in Example 12.0.1, a highest posterior density credible
interval is not as easy to obtain numerically as an equal-tailed credible interval. Since the
beta pdf is unimodal, the problem reduces to finding two endpoints at which (i) the beta pdf
takes the same value and (ii) the integral of the beta pdf between the two endpoints is $1 - \alpha$.
(If the beta pdf were not unimodal, the task would be more difficult.) This problem can
still be solved numerically, and we find that a 95% HPD credible interval is approximately
$(0.829, 0.998)$.
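One way to carry out this numerical computation is sketched below (an illustrative implementation, not the lecture's own code): among all intervals with posterior probability $0.95$, the HPD interval is the shortest, so we can search over the lower endpoint and minimize the interval's length.

```python
# Numerical HPD interval for the Beta(25, 2) posterior of Example 12.0.1.
# Any interval (l, u) with posterior probability 0.95 has u determined by l;
# the HPD interval is the (l, u) pair of minimal length.
from scipy.stats import beta
from scipy.optimize import minimize_scalar

posterior = beta(25, 2)
coverage = 0.95

def width(l):
    # Upper endpoint making the interval's posterior probability 0.95
    # (clipped so floating-point error cannot push the cdf above 1).
    u = posterior.ppf(min(posterior.cdf(l) + coverage, 1.0))
    return u - l

# l can range up to the 5% posterior quantile (else coverage is infeasible).
res = minimize_scalar(width, bounds=(0.5, posterior.ppf(1 - coverage)),
                      method="bounded")
l = res.x
u = posterior.ppf(posterior.cdf(l) + coverage)
print(l, u)  # approximately (0.829, 0.998)
```

At the minimizing interval, the beta pdf takes (approximately) the same value at both endpoints, matching condition (i) of the example.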

The following theorem provides an intuitively sensible property of highest posterior density
credible sets. To state the theorem concisely, we first define some notation. For any subset $H$
of the parameter space $\Theta$, let $A(H)$ denote the total length of the set (if $\theta$ is continuous)
or the total number of points in the set (if $\theta$ is discrete), i.e., $A(H) = \int_H d\theta$ or $A(H) = \sum_{\theta \in H} 1$.
Note: If you are familiar with the notions of Lebesgue measure and counting measure,
then you will recognize that we are simply taking $A(H)$ to be the measure of the set $H$
under whichever of these measures is appropriate.

Theorem 12.0.3. Let $C(x)$ be a highest posterior density credible set for $\theta$ with posterior
probability $1 - \alpha$, and let $\tilde{C}(x)$ be some other credible set for $\theta$ with posterior probability $1 - \alpha$.
Then $A[C(x)] \le A[\tilde{C}(x)]$.


Proof. Since $C(x)$ is a highest posterior density credible set, there exists $k > 0$ such that
$\pi(\theta \mid x) > k$ for all $\theta \in C(x)$ and $\pi(\theta \mid x) \le k$ for all $\theta \notin C(x)$. Now observe that
$$P[\theta \in C(x) \setminus \tilde{C}(x) \mid X = x] = P[\theta \in C(x) \mid X = x] - P[\theta \in C(x) \cap \tilde{C}(x) \mid X = x]$$
$$= P[\theta \in \tilde{C}(x) \mid X = x] - P[\theta \in C(x) \cap \tilde{C}(x) \mid X = x]$$
$$= P[\theta \in \tilde{C}(x) \setminus C(x) \mid X = x],$$
where the second equality holds because $C(x)$ and $\tilde{C}(x)$ both have posterior probability $1 - \alpha$.
Next, note that
$$P[\theta \in C(x) \setminus \tilde{C}(x) \mid X = x] \ge k \, A[C(x) \setminus \tilde{C}(x)]$$
since $\pi(\theta \mid x) > k$ for all $\theta \in C(x)$, and
$$P[\theta \in \tilde{C}(x) \setminus C(x) \mid X = x] \le k \, A[\tilde{C}(x) \setminus C(x)]$$
since $\pi(\theta \mid x) \le k$ for all $\theta \notin C(x)$. Then
$$A[C(x) \setminus \tilde{C}(x)] \le A[\tilde{C}(x) \setminus C(x)].$$
The result then follows from the fact that $A[C(x)] = A[C(x) \cap \tilde{C}(x)] + A[C(x) \setminus \tilde{C}(x)]$
and $A[\tilde{C}(x)] = A[C(x) \cap \tilde{C}(x)] + A[\tilde{C}(x) \setminus C(x)]$.
The basic idea of Theorem 12.0.3 is simpler than it may appear from the proof. Essentially,
to capture a specified posterior probability in the smallest possible set, we should choose
a set on which the posterior pdf or pmf is as large as possible, and such a set is precisely
a highest posterior density credible set.
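The theorem can be checked against Examples 12.0.1 and 12.0.2: both intervals have posterior probability 0.95, so the HPD interval should be no longer than the equal-tailed one. A quick sketch using the intervals reported above:

```python
# Theorem 12.0.3 in action for the Beta(25, 2) posterior: both intervals
# below have posterior probability 0.95, and the HPD interval is shorter.
equal_tailed = (0.803, 0.991)  # from Example 12.0.1
hpd = (0.829, 0.998)           # from Example 12.0.2

width_et = equal_tailed[1] - equal_tailed[0]   # about 0.188
width_hpd = hpd[1] - hpd[0]                    # about 0.169
print(width_hpd, width_et)
```

Here $A$ is just interval length, and $A[C(x)] \approx 0.169 \le 0.188 \approx A[\tilde{C}(x)]$, as the theorem guarantees.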
Numerical or Approximate Calculations
As we saw in Example 12.0.1 and Example 12.0.2, simple formulas for credible intervals often
cannot be found since the posterior distribution is often a pdf that cannot be integrated in
closed form. Thus, we typically must rely on either numerical solutions (as in Example 12.0.1
and Example 12.0.2) or approximations.
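For instance, when draws from the posterior are available (e.g., from Monte Carlo sampling), an approximate equal-tailed interval is given by empirical quantiles of the draws. This sketch (an illustration, not from the lecture) checks the interval of Example 12.0.1 by simulation:

```python
# Approximate the 95% equal-tailed credible interval of Example 12.0.1
# from posterior draws: sample from the Beta(25, 2) posterior and take
# the empirical 2.5% and 97.5% quantiles of the samples.
import numpy as np

rng = np.random.default_rng(0)
draws = rng.beta(25, 2, size=100_000)  # posterior samples of theta
lower, upper = np.percentile(draws, [2.5, 97.5])
print(lower, upper)  # close to the exact interval (0.803, 0.991)
```

The same idea applies when the posterior can only be sampled (e.g., via Markov chain Monte Carlo) rather than evaluated in closed form.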
