The Annals of Applied Statistics 10.1214/08-AOAS205 Institute of Mathematical Statistics
1. The lung cancer data. The data are male lung cancer mortality frequencies and population sizes for the period 1972–1981 in N = 84 Missouri cities (see Table 1). The variables, given in Tsutakawa (1985) and reproduced below, are the number r of men aged 45–54 dying from lung cancer in each city over the period 1972–1981 and the city size n.
Most of the “cities” are small, though three are large. The mortality rates
are poorly defined in small cities; four cities with populations less than 200
have no deaths at all, so the observed rate is zero. Our aim is to estimate
the mortality rates in each city, using the information from other cities in
the most appropriate way.
Table 1
Male lung cancer mortality frequency and city size 1972–1981 in 84 Missouri cities
City  n  r    City  n  r    City  n  r    City  n  r
we use for the second term the deviance at the MLEs rather than at the posterior mean, since the latter is in general not a grid point and would require additional computation. Here the second term is 362.51, giving $p_D = 1.81$ and $\mathrm{DIC}_1 = 366.13$.
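As a check on the arithmetic, the DIC can be computed directly from a posterior grid. This is a minimal sketch, assuming the posterior is held as grid points with their posterior probabilities; the function name `dic_from_grid` is illustrative and not from the paper. With the values reported above, the posterior mean deviance is 362.51 + 1.81 = 364.32, and DIC = 364.32 + 1.81 = 366.13.

```python
import numpy as np


def dic_from_grid(deviances, probs, deviance_at_mle):
    """Return (mean deviance, p_D, DIC) from a grid of posterior deviances.

    deviances       -- deviance at each grid point (mu[g], sigma[g])
    probs           -- posterior probability of each grid point
    deviance_at_mle -- deviance at the MLEs, used instead of the deviance at
                       the posterior mean, as described in the text
    """
    deviances = np.asarray(deviances, dtype=float)
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # renormalise in case grid weights do not sum exactly to 1
    mean_deviance = float(np.dot(probs, deviances))  # posterior mean deviance
    p_d = mean_deviance - deviance_at_mle            # effective number of parameters
    dic = mean_deviance + p_d                        # DIC = mean deviance + p_D
    return mean_deviance, p_d, dic
```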
However, Figure 3 provides much more information, since it gives the exact (up to the grid resolution) posterior distribution of the deviance without any Monte Carlo simulation. Sorting the deviances
$$D_2(\mu^{[g]}, \sigma^{[g]}) = -2\ell_2(\mu^{[g]}, \sigma^{[g]})$$
into increasing order with their corresponding posterior probabilities, and cumulating the latter, we obtain the cdf of the deviance, as shown in Figure 4.
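The sort-and-cumulate step can be sketched as follows. This is an illustrative sketch, assuming the grid deviances and their posterior probabilities are available as arrays; `deviance_cdf` and `deviance_quantile` are hypothetical helper names. Quantiles of the deviance (for example the median) are then read off as the smallest grid deviance whose cumulative probability reaches the required level.

```python
import numpy as np


def deviance_cdf(deviances, probs):
    """Posterior cdf of the deviance from grid points and their probabilities."""
    deviances = np.asarray(deviances, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(deviances, kind="stable")  # sort grid deviances into increasing order
    sorted_dev = deviances[order]
    cdf = np.cumsum(probs[order])                 # cumulate the matching posterior probabilities
    cdf = cdf / cdf[-1]                           # force the final value to 1
    return sorted_dev, cdf


def deviance_quantile(sorted_dev, cdf, q):
    """Smallest grid deviance whose cumulative probability reaches q."""
    idx = int(np.searchsorted(cdf, q, side="left"))
    return float(sorted_dev[min(idx, len(sorted_dev) - 1)])
```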
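Conditional on the ith area data $r_i, n_i$ and the parameters $\mu, \sigma$, the posterior density of $\theta_i = \log\{p_i/(1 - p_i)\}$ has the form
$$\pi(\theta_i \mid \mu, \sigma, r_i) = c(\mu, \sigma) \binom{n_i}{r_i} p_i^{r_i} (1 - p_i)^{n_i - r_i} \exp\left\{-\frac{1}{2}\frac{(\theta_i - \mu)^2}{\sigma^2}\right\}.$$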
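Because $c(\mu, \sigma)$ is only a normalizing constant, the conditional density can be evaluated on a fine grid of $\theta_i$ values and normalized numerically. The sketch below assumes $\theta_i$ is the logit of $p_i$ with a normal $N(\mu, \sigma^2)$ upper-level model; the function name and the trapezoidal normalization are illustrative choices, not taken from the paper. Working on the log scale keeps the computation stable for cities with large $n_i$ or with $r_i = 0$.

```python
import math
import numpy as np


def theta_posterior_on_grid(r, n, mu, sigma, theta_grid):
    """Normalised conditional posterior density of theta_i on a grid.

    Assumes theta_i = logit(p_i) with a N(mu, sigma^2) upper-level model;
    the constant c(mu, sigma) is replaced by numerical normalisation.
    """
    theta = np.asarray(theta_grid, dtype=float)
    log_p = -np.logaddexp(0.0, -theta)    # log p_i = -log(1 + exp(-theta))
    log_1mp = -np.logaddexp(0.0, theta)   # log(1 - p_i) = -log(1 + exp(theta))
    log_choose = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    log_kernel = (log_choose + r * log_p + (n - r) * log_1mp
                  - 0.5 * ((theta - mu) / sigma) ** 2)
    w = np.exp(log_kernel - log_kernel.max())  # rescale before exponentiating
    area = np.sum(np.diff(theta) * (w[1:] + w[:-1]) / 2.0)  # trapezoidal rule
    return w / area
```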
10 M. AITKIN, C. C. LIU AND T. CHADWICK
4.2. Beta-binomial analysis. At the upper level, the city proportions are modeled by a conjugate beta distribution:
$$P_i \mid a, b \sim \mathrm{Beta}(a, b),$$
with density function
$$f(p) = p^{a-1}(1 - p)^{b-1}/B(a, b), \qquad a, b > 0,$$
where $B(a, b)$ is the complete beta function
$$B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b).$$
The beta-binomial likelihood is denoted by
$$L_2(a, b) = \prod_{i=1}^{m} \binom{n_i}{r_i} B(r_i + a, n_i - r_i + b)/B(a, b).$$
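For city sizes in the thousands the gamma functions overflow, so $\log L_2(a, b)$ is best computed through log-gamma functions. A minimal sketch follows; the helper names are illustrative, and evaluating the deviance as a function of $(\mu_\beta, \sigma_\beta)$ would additionally require converting those parameters to $(a, b)$, which is not shown here.

```python
import math


def log_beta_fn(x, y):
    """log B(x, y) = log Gamma(x) + log Gamma(y) - log Gamma(x + y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_choose(n, r):
    """Log binomial coefficient log C(n, r)."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)


def beta_binomial_loglik(a, b, r, n):
    """log L_2(a, b): sum over cities of log[C(n_i, r_i) B(r_i + a, n_i - r_i + b) / B(a, b)]."""
    return sum(log_choose(ni, ri) + log_beta_fn(ri + a, ni - ri + b) - log_beta_fn(a, b)
               for ri, ni in zip(r, n))


def beta_binomial_deviance(a, b, r, n):
    """Deviance -2 log L_2(a, b)."""
    return -2.0 * beta_binomial_loglik(a, b, r, n)
```

With $a = b = 1$ the beta-binomial reduces to a discrete uniform distribution on $0, \dots, n_i$, which gives a quick sanity check on the implementation.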
BAYESIAN MODEL COMPARISON 11
The full posterior distribution of the beta deviance $D_2(\mu_\beta, \sigma_\beta) = -2\log L_2(\mu_\beta, \sigma_\beta)$, computed as for the normal model, is shown (dotted curve) in Figure 7, together with that of the normal Model 1 (solid curve). We discuss the comparison of these models in the next section.
Conditional on the ith area data $r_i, n_i$ and the parameters $a, b$, the posterior distribution of $p_i$ is again beta:
$$\pi(p_i \mid a, b, r_i) = p_i^{r_i + a - 1}(1 - p_i)^{n_i - r_i + b - 1}/B(r_i + a, n_i - r_i + b).$$
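A standard consequence of this conjugate form, not spelled out in the text, is that the conditional posterior mean of $p_i$ is a weighted average of the observed rate $r_i/n_i$ and the prior mean $a/(a + b)$, with the weight on the observed rate growing with city size. This makes explicit why the small cities are shrunk most strongly:

```latex
E(p_i \mid a, b, r_i) = \frac{r_i + a}{n_i + a + b}
  = w_i \,\frac{r_i}{n_i} + (1 - w_i)\,\frac{a}{a + b},
  \qquad w_i = \frac{n_i}{n_i + a + b}.
```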
The posterior densities for the area random effects $p_i$ are again computed by Gaussian kernel smoothing of $T = 10{,}000$ random values $p_i^{[t]}$. These are generated from $T$ random draws $(\mu_\beta^{[t]}, \sigma_\beta^{[t]})$ from the posterior distribution of $(\mu_\beta, \sigma_\beta)$, which are converted to $T$ random values $(a^{[t]}, b^{[t]})$ of $(a, b)$. We finally draw the $T$ random values $p_i^{[t]}$ of $p_i$, one from each of the $T$ beta distributions with indices $r_i + a^{[t]}$ and $n_i - r_i + b^{[t]}$ for the given $i$.
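The three simulation steps can be sketched as below. This is a hedged sketch under stated assumptions: it assumes $\mu_\beta$ and $\sigma_\beta$ are the mean and standard deviation of the Beta$(a, b)$ distribution, so that the conversion is the method-of-moments one, $a + b = \mu_\beta(1 - \mu_\beta)/\sigma_\beta^2 - 1$; the paper's exact conversion is not shown in this section. All function names are hypothetical.

```python
import numpy as np


def beta_params_from_mean_sd(mean, sd):
    """Method-of-moments (a, b) for a beta with the given mean and sd.

    Assumes mu_beta and sigma_beta are the mean and standard deviation of the
    Beta(a, b) distribution; requires sd**2 < mean * (1 - mean).
    """
    mean = np.asarray(mean, dtype=float)
    sd = np.asarray(sd, dtype=float)
    total = mean * (1.0 - mean) / sd**2 - 1.0  # a + b, from var = m(1 - m)/(a + b + 1)
    if np.any(total <= 0.0):
        raise ValueError("sd too large for a beta distribution with this mean")
    return mean * total, (1.0 - mean) * total


def draw_grid_points(probs, T, rng):
    """Indices of T grid points drawn with their posterior probabilities."""
    probs = np.asarray(probs, dtype=float)
    return rng.choice(probs.size, size=T, p=probs / probs.sum())


def draw_city_rates(r_i, n_i, mean_draws, sd_draws, rng):
    """One draw of p_i from Beta(r_i + a[t], n_i - r_i + b[t]) for each t."""
    a, b = beta_params_from_mean_sd(mean_draws, sd_draws)
    return rng.beta(r_i + a, n_i - r_i + b)
```

In use, one would draw `idx = draw_grid_points(grid_probs, 10_000, rng)` from hypothetical grid arrays `grid_mean`, `grid_sd`, `grid_probs`, and then call `draw_city_rates(r_i, n_i, grid_mean[idx], grid_sd[idx], rng)`. NumPy's `Generator.beta` broadcasts over the arrays of indices, so the $T$ draws come from $T$ different beta distributions in one call.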
The T values of pi are transformed to the logit scale for ease of inspection
and consistency with Tsutakawa’s analysis; posterior densities for individual
cities are then computed using Gaussian kernel densities with bandwidths
chosen to give smooth densities. Figure 8 shows the five beta posteriors
(dotted curves) together with the normal posteriors from Figure 6 (solid
curves). The city numbers are placed at the intersection of the two densities.
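The logit transformation and kernel smoothing might look like the following sketch. The bandwidth default is Silverman's rule of thumb, which is an assumption: the text says only that bandwidths were chosen to give smooth densities. The function names are illustrative.

```python
import numpy as np


def logit(p):
    """Log-odds transform log(p / (1 - p))."""
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log1p(-p)


def gaussian_kde_on_grid(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated at `grid`.

    If no bandwidth is given, Silverman's rule of thumb is used; this is an
    assumption, since the text chooses bandwidths to give smooth densities.
    """
    x = np.asarray(samples, dtype=float)
    g = np.asarray(grid, dtype=float)
    if bandwidth is None:
        bandwidth = 1.06 * x.std(ddof=1) * x.size ** (-0.2)
    z = (g[:, None] - x[None, :]) / bandwidth  # grid points x samples
    return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * bandwidth * np.sqrt(2.0 * np.pi))
```

Applied to the logits of the $T$ simulated values $p_i^{[t]}$ for one city, this gives a curve comparable to one panel of Figure 8. Passing an explicit `bandwidth` reproduces the "chosen by eye" approach.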
The beta posteriors are slightly less concentrated than the normal posteriors, except for city 84, and show slightly more shrinkage toward the mean. Since the posterior conclusions from the beta distribution differ somewhat from those from the normal, we need to decide whether the data support one model over the other.
Fig. 8. Beta (dotted) and normal (solid) posterior densities for five cities.
10,000 (unordered) values for each model. The cdf of the deviance difference
distribution is shown in Figure 9.
The distribution has its median at 0.505, and the 95% credible interval for the true difference, computed from the 250th and 9750th ordered differences [Congdon (2005)], is (−5.125, 6.378). The estimated probability that the normal deviance is smaller than the beta deviance is 0.6332, so we cannot confidently choose between these models.
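These summaries can be computed from two vectors of posterior deviance draws as sketched below. This is an illustrative sketch; the function name and dictionary keys are hypothetical, and which model is taken as "first" is a labeling choice. With the normal deviances first, `prob_first_smaller` corresponds to the probability reported above.

```python
import numpy as np


def deviance_difference_summary(dev_first, dev_second):
    """Summaries of D_first - D_second from independent posterior draws.

    The two deviance vectors are paired in the (arbitrary) order in which
    they were generated, as for the unordered values described in the text.
    """
    diff = np.sort(np.asarray(dev_first, dtype=float) - np.asarray(dev_second, dtype=float))
    T = diff.size
    k_lo = max(int(round(0.025 * T)), 1)   # 250th ordered value when T = 10,000
    k_hi = int(round(0.975 * T))           # 9750th ordered value when T = 10,000
    return {
        "median": float(np.median(diff)),
        "interval_95": (float(diff[k_lo - 1]), float(diff[k_hi - 1])),
        "prob_first_smaller": float(np.mean(diff < 0.0)),
    }
```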
The deviance constructions for Models 3 and 4 are different from those for Models 1 and 2. The null Model 3 likelihood given $p$ is
$$L_3(p) = \prod_{i=1}^{m} \binom{n_i}{r_i} p^{r_i}(1 - p)^{n_i - r_i} = \left[\prod_{i=1}^{m} \binom{n_i}{r_i}\right] p^{R}(1 - p)^{N - R},$$
where $R = \sum_i r_i$ and $N = \sum_i n_i$.
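Since the null likelihood depends on the data only through $R$ and $N$ (apart from the constant product of binomial coefficients), posterior draws of its deviance are cheap. The sketch below assumes a conjugate Beta prior for $p$, uniform by default, so that the posterior is Beta$(R + 1, N - R + 1)$; this prior is an assumption for illustration and is not stated in this section. Function names are hypothetical.

```python
import math
import numpy as np


def null_loglik(p, r, n):
    """log L_3(p) using the pooled form with R = sum r_i and N = sum n_i."""
    R, N = sum(r), sum(n)
    log_const = sum(math.lgamma(ni + 1) - math.lgamma(ri + 1) - math.lgamma(ni - ri + 1)
                    for ri, ni in zip(r, n))
    p = np.asarray(p, dtype=float)
    return log_const + R * np.log(p) + (N - R) * np.log1p(-p)


def null_deviance_draws(r, n, T, rng, prior_a=1.0, prior_b=1.0):
    """T posterior draws of D_3(p) = -2 log L_3(p).

    Assumes a conjugate Beta(prior_a, prior_b) prior for p (uniform by
    default), so the posterior is Beta(R + prior_a, N - R + prior_b).
    """
    R, N = sum(r), sum(n)
    p = rng.beta(R + prior_a, N - R + prior_b, size=T)
    return -2.0 * null_loglik(p, r, n)
```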
Fig. 10. Deviances for null (dashed), normal (solid), beta (dotted) and saturated (dot-dashed) models.
The cdfs of all four deviance distributions are shown in Figure 10. Model 3 is nearly 40 deviance units to the right of the normal and beta models, so it is immediately clear that the null model is untenable. We exclude it from
The averaged density $p_{\mathrm{ave},i}$ for $p_i$ is then defined through the $T$ simulated values $p_{\mathrm{ave},i}^{[t]}$ produced by the algorithm: for each $i$ and each $m$:
that neither the normal nor the beta provides a convincing representation of
the random effect distribution, and so conclusions about individual random
effects need to be based substantially on the local area rate.
Fig. 13. Averaged (solid) and local (dot-dashed) posteriors, five cities.
Fig. 14. Averaged (solid) and normal (dot-dashed) posteriors, five cities.
(0.00742 and 0.00867), which are very similar and near the median observed rate (though nearly the smallest of the normal posterior mean rates), while city 84 has almost the highest observed rate (0.01484) and the highest normal posterior mean rate.
The small cities with limited data cannot shrink effectively toward either
support point, since their rates are not well identified with these points,
and the generally higher likelihood for the saturated model accentuates this
effect. The model-averaged rate distributions remain diffuse, though less so than the single-city rates.
What has been missing in the use of these models is a direct and straightforward assessment of the appropriateness, or goodness of fit, of the upper-level model. This has been done in Bayesian analysis mostly through Bayes factors, with their attendant prior sensitivity difficulties, or through the DIC, with its definitional “focus” difficulties and ambiguous penalty.
The comparison of models through their posterior likelihood or deviance
distributions provides a straightforward way, not only to compare different
REFERENCES
Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood (with discussion). Statist. Comput. 7 253–272.
Aitkin, M. (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55 117–128. MR1705676
Aitkin, M., Boys, R. J. and Chadwick, T. (2005). Bayesian point null hypothesis testing via the posterior likelihood ratio. Statist. Comput. 15 217–230. MR2147554
Carlin, B. P. and Louis, T. A. (1996). Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall, London. MR1427749
Celeux, G., Forbes, F., Robert, C. P. and Titterington, D. M. (2006). Deviance information criteria for missing data models (with discussion). Bayesian Anal. 1 651–706. MR2282197
Congdon, P. (2005). Bayesian predictive model comparison via parallel sampling. Comput. Statist. Data Anal. 48 735–753. MR2133574
Congdon, P. (2006). Bayesian model comparison via parallel model output. J. Statist. Comput. Simul. 76 149–165. MR2224357
Dempster, A. P. (1974). The direct use of likelihood in significance testing. In Proc. Conf. Foundational Questions in Statistical Inference (O. Barndorff-Nielsen, P. Blaesild and G. Sihon, eds.) 335–352. Kluwer, Hingham, MA. MR0408052
Dempster, A. P. (1997). The direct use of likelihood in significance testing. Statist. Comput. 7 247–252.
Fox, J.-P. (2005). Multilevel IRT using dichotomous and polytomous response data. Brit. J. Math. Statist. Psych. 58 145–172. MR2196136
Hoeting, J. A., Madigan, D., Raftery, A. and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statist. Sci. 14 382–417. MR1765176
Ridall, P. G., Pettitt, A. N., Friel, N., Henderson, R. and McCombe, P. (2007). Motor unit number estimation using reversible jump Markov chain Monte Carlo methods (with discussion). J. Roy. Statist. Soc. Ser. C 56 235–269. MR2370990
Roeder, K. (1990). Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. J. Amer. Statist. Assoc. 85 617–624.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). J. Roy. Statist. Soc. Ser. B 64 583–639. MR1979380
Trevisani, M. and Gelfand, A. E. (2003). Inequalities between expected marginal log likelihoods with implications for likelihood-based model comparison. Canadian J. Statist. 31 239–250. MR2030122
Tsutakawa, R. K. (1985). Estimation of cancer mortality rates: A Bayesian analysis of small frequencies. Biometrics 41 69–79. MR0793434
M. Aitkin
C. C. Liu
School of Behavioural Science
University of Melbourne
Australia
E-mail: murray.aitkin@ms.unimelb.edu.au
charles.liu@muarc.monash.edu.au

T. Chadwick
Institute of Health and Society
Newcastle University
United Kingdom
E-mail: t.j.chadwick@newcastle.ac.uk