Practical Guidance For Bayesian Inference in Astronomy

RASTI 000, 1–10 (0000) Preprint 10 February 2023 Compiled using rasti LATEX style file v3.

Gwendolyn M. Eadie1,2★ , Joshua S. Speagle,1,2,3 Jessi Cisewski-Kehe,4 Daniel Foreman-Mackey,5
Daniela Huppenkothen,6 David E. Jones,7 Aaron Springford8 and Hyungsuk Tak9,10,11
1 University of Toronto, David A. Dunlap Department of Astronomy & Astrophysics, Toronto, M5S 3H4, Canada
2 University of Toronto, Department of Statistical Sciences, Toronto, M5S 3G3, Canada
3 University of Toronto, Dunlap Institute for Astronomy & Astrophysics, Toronto, M5S 3H4, Canada
4 University of Wisconsin-Madison, Department of Statistics, Madison, WI, 53706, USA
5 Center for Computational Astrophysics, Flatiron Institute, 160 5th Ave, New York, NY 10010, USA
6 SRON Netherlands Institute for Space Research, Niels Bohrlaan 4, 2333 CA Leiden, Netherlands
arXiv:2302.04703v1 [astro-ph.IM] 9 Feb 2023
7 Texas A&M University, Department of Statistics, College Station, TX 77843, USA

8 Cytel, Toronto, Ontario, Canada
9 Pennsylvania State University, Department of Statistics, University Park, PA 16802, USA
10 Pennsylvania State University, Department of Astronomy & Astrophysics, University Park, PA 16802, USA
11 Pennsylvania State University, Institute for Computational and Data Sciences, University Park, PA 16802, USA
10 February 2023
ABSTRACT
In the last two decades, Bayesian inference has become commonplace in astronomy. At the same time, the choice of algorithms,
terminology, notation, and interpretation of Bayesian inference varies from one sub-field of astronomy to the next, which can
lead to confusion for both those learning and those familiar with Bayesian statistics. Moreover, the choice varies between the
astronomy and statistics literature, too. In this paper, our goal is two-fold: (1) provide a reference that consolidates and clarifies
terminology and notation across disciplines, and (2) outline practical guidance for Bayesian inference in astronomy. Highlighting
both the astronomy and statistics literature, we cover topics such as notation, specification of the likelihood and prior distributions,
inference using the posterior distribution, and posterior predictive checking. It is not our intention to introduce the entire field of
Bayesian data analysis – rather, we present a series of useful practices for astronomers who already have an understanding of the
Bayesian "nuts and bolts" and wish to increase their expertise and extend their knowledge. Moreover, as the fields of astrostatistics
and astroinformatics continue to grow, we hope this paper will serve as both a helpful reference and a jumping-off point for
deeper dives into the statistics and astrostatistics literature.
Key words: astrostatistics – computational methods – parallax
1 INTRODUCTION

Over the past two decades, Bayesian inference has become increasingly popular in astronomy. On NASA’s Astrophysics Data System (ADS), a search using “keyword:statistical” and “abs:bayesian” yields 2377 refereed papers, and shows exponential growth since the year 2000, with over 237 papers in 2021.

Bayesian analyses have become popular in astronomy due to several key advantages over traditional methods. First, an estimate of the posterior distribution of model parameters provides a more complete picture of parameter uncertainty, joint parameter uncertainty, and parameter relationships given the model, data, and prior assumptions than traditional methods. Second, the interpretation of Bayesian probability intervals is often closer to what scientists desire, and is an appealing alternative to point estimates with confidence intervals, which often rely on the sampling distribution of the estimator. Third, Bayesian analysis easily allows for marginalization over nuisance parameters, incorporation of measurement uncertainties through measurement error models, and inclusion of incomplete data such as missing and censored data. Fourth, astronomers often have prior knowledge about allowable and realistic ranges of parameter values (e.g., through physical theories and previous observations/experiments) which can naturally be included in prior distributions and thereby improve the final inference.

Importantly, in addition to the aforementioned advantages of the Bayesian approach, efficient and increased computing power, along with easy-to-use or out-of-the-box algorithms, have brought Bayesian methodology to astronomers in convenient practical forms (e.g., emcee (Foreman-Mackey et al. 2013), Rstan (Stan Development Team 2020), PyStan (Riddell et al. 2017), PyMC3 (Salvatier et al. 2016), BUGS (Lunn et al. 2000), NIMBLE (de Valpine et al. 2017), and JAGS (Plummer et al. 2003)).

Interestingly, the surge in popularity of Bayesian statistics comes in spite of the fact that Bayesian methods are rarely taught in undergraduate astronomy and physics programs, and have only recently been introduced at a basic level in astronomy graduate courses (Eadie et al. 2019b,a). Some challenges faced by both new and seasoned users of Bayesian inference are the varied notation, terminology, interpretation, and choice of algorithms available in the astronomy and statistics literature.

★ E-mail: gwen.eadie@utoronto.ca
© 0000 The Authors

Being well-versed in best practices and common pitfalls associated with the Bayesian framework is important if these methods are to be used to advance the field of astronomy. Here, users of Bayesian inference in astronomy face challenges too, since undergraduate and graduate program training is still catching up to the state-of-the-art Bayesian inference methods.

The goal of this paper is two-fold. Our first goal is to provide a “translation” between terminology and notation used for Bayesian inference across the fields of astronomy and statistics. Our second goal is to illustrate useful practices for the Bayesian inference process which we hope will be a valuable contribution to astronomers who are familiar with and/or use Bayesian statistics in research. To achieve these goals, we deal with the following topics in the main body of the paper: notation (Section 2.1), interpreting and determining the likelihood (Section 2.2), choosing and assessing prior distributions (Section 2.3), evaluating and making inference from the posterior distribution (Section 2.4), and performing posterior predictive checks (Section 2.5).

This work is not meant to be a comprehensive introduction to Bayesian inference, but rather an unveiling of Bayesian statistics as both an extensive topic and an active research area. We focus our efforts on identifying common mistakes and misunderstandings related to Bayesian inference, and use these as jumping-off points for highlighting important topics for further study. Indeed, many valuable topics and subtopics arise which we do not cover, but we make a point of providing key references. For example, throughout the paper we touch on areas such as Bayesian design, posterior predictive checking, hierarchical modeling, and Bayesian computations (Craiu & Rosenthal 2014; Robert 2014), and also recommend books on Bayesian data analysis from statistics and astrostatistics (Gelman et al. 2013; Carlin & Louis 2008; Hilbe et al. 2017).

To help the narrative, we use a running example at the end of each section: inferring the distance to a star through its parallax measurement. The specific problem of distance estimation from parallax in the Bayesian context is explored closely in other studies (Bailer-Jones 2015; Astraatmadja & Bailer-Jones 2016a,b) and applied to the Gaia second data release (Gaia DR2) (Lindegren et al. 2018; Bailer-Jones et al. 2018; Schönrich et al. 2019), and we refer the reader to these papers for a deep exploration of this topic. Here, we employ this example because of its generality, because it provides some interesting challenges and potential pitfalls, and because it provides a nice framework to illustrate sound practices in Bayesian analysis. Along the way, we also identify how our advice applies to other areas in astronomy.

2 SPECIFYING A BAYESIAN MODEL

2.1 Notation & Bayes Theorem

A number of different notation practices for Bayesian inference are used in both the astronomy and statistics literature. This section is meant to clarify some of these differences, while also providing a “translation” so that astronomers can more easily follow statistics papers (e.g., recognize notation for random variables, probability distribution functions, etc.).

We use 𝒚 (vectorized form) to represent the observed data of a random variable 𝑌, and 𝜽 to represent the parameter(s) of interest. The posterior distribution is defined by Bayes’ theorem as

$$ p(\boldsymbol{\theta} \mid \boldsymbol{y}) = \frac{p(\boldsymbol{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\boldsymbol{y})} \tag{1} $$

where 𝑝(𝒚|𝜽) is the sampling distribution for 𝒚 given 𝜽 (Section 2.2), 𝑝(𝜽) is the prior density (Section 2.3), and 𝑝(𝒚) is the prior predictive density (Schervish 1995). With the data 𝒚 in hand, 𝑝(𝒚|𝜽) is often viewed as a function of 𝜽 called the likelihood function (which is not a probability density), and 𝑝(𝒚) is a normalizing constant that does not depend on 𝜽 (which is often referred to in astronomy as the model evidence). In the probability and Bayesian computation literature, the posterior probability distribution of interest is usually referred to as the target distribution and denoted 𝜋. There are also differences in notation for the likelihood across disciplines, which we discuss in Section 2.2.

For smooth translation between sub-disciplines of astronomy and statistics, it is beneficial to use explicit statements about model choices and parameter definitions. For example, a list of all model parameters, notation, and their associated prior probability distributions in the form of a table is very useful to the reader. Moreover, we stress the importance of fully specifying any Bayesian model in papers to increase reproducibility (e.g., via a detailed appendix, open code). In this spirit, we provide the full Bayesian model for our running example in Table 1, explicitly define notation next, and provide our open source code¹.

¹ https://github.com/joshspeagle/nrp_astrobayes

2.1.1 Parallax Example: Defining Notation

For the running example in this paper, we infer the distance to a star from a parallax measurement. The true but unknown distance 𝑑 in kiloparsecs (kpc) is related to the true but unknown parallax 𝜛 in milliarcseconds (mas) through

$$ d\,[\mathrm{kpc}] = \frac{1}{\varpi\,[\mathrm{mas}]}. \tag{2} $$

Our data 𝑦 is a measurement of the parallax, and has some fixed uncertainty 𝜎 that we treat as known. Thus, in the Bayesian framework, we wish to infer the parameter 𝑑 given the data 𝑦, and we seek to find the posterior distribution,

$$ p(d \mid y) \propto p(y \mid d)\, p(d). \tag{3} $$

In Section 2.4.2, we extend this example to infer the distance to a star cluster from the parallax measurements of many stars within the cluster. The true but unknown distance to the cluster, 𝑑cluster, is related to the true but unknown parallax of the cluster through Equation 2, too. In this case, we express the corresponding posterior density as follows:

$$ p(d_{\mathrm{cluster}} \mid \boldsymbol{y}) \propto p(\boldsymbol{y} \mid d_{\mathrm{cluster}})\, p(d_{\mathrm{cluster}}), \tag{4} $$

where 𝒚 represents a vector of parallax measurements of its stars. Table 1 summarizes this model specification.

2.2 Likelihood function

Differences in notation and language between statistics and astronomy can lead to confusion regarding the likelihood. In Bayesian statistics, the capital letters 𝑌 and Θ often denote random variables. For example, both Gelman et al. (2013) in their applied statistics text and Schervish (1995) in his statistics theory text first write down a joint probability density 𝑝(Θ, 𝑌), and then specify the likelihood function as 𝑝(𝑌 = 𝒚 | 𝜽) (sometimes written L(𝜽) elsewhere), where 𝒚 is the fixed, observed value of 𝑌 (i.e., the data), and 𝜽 is the argument of the likelihood function. That is, the likelihood is a function of the parameters 𝜽, given the data 𝒚 (Gelman et al. 2013; Schervish 1995;
Table 1. Bayesian models for inferring (1) a star’s distance parameter 𝑑 from its parallax measurement 𝑦, assuming a Gaussian distribution for its true parallax 𝜛 (top panel), and (2) a star cluster’s distance parameter 𝑑cluster from the parallax measurement of 𝑛 stars 𝒚 (bottom panel).

Inferring a single star’s distance parameter 𝑑
  Sampling density / likelihood (parallax): $p(y \mid d) = N(y \mid 1/d, \sigma^2)$, where $\varpi = 1/d$
  Prior (distance): $p(d) \propto d^2 e^{-d/L}$ if $d_{\min} < d < d_{\max}$ (with a constant $L$), and $0$ otherwise
  Posterior on distance: $p(d \mid y, \sigma_\varpi) \propto d^2 \exp\left[-\frac{d}{L} - \frac{(y - 1/d)^2}{2\sigma^2}\right]$

Inferring a star cluster’s distance parameter 𝑑cluster
  Sampling density / likelihood (𝑛 parallax measurements): $p(\boldsymbol{y} \mid d_{\mathrm{cluster}}) = \prod_{i=1}^{n} N(y_i \mid 1/d_{\mathrm{cluster}}, \sigma_i^2)$
  Prior (on parallax of cluster): $p(\varpi_{\mathrm{cluster}}) = N(\varpi_{\mathrm{cluster}} \mid \mu_\varpi, \sigma_\varpi)$
  Posterior on distance to cluster: $p(d_{\mathrm{cluster}} \mid \boldsymbol{y}) \propto p(d_{\mathrm{cluster}}) \exp\left[-\frac{(y_{\mathrm{eff}} - 1/d_{\mathrm{cluster}})^2}{2\tau^2}\right]$

Note. In both the upper and lower box, 𝜎 and 𝜎𝑖 (𝑖 = 1, 2, . . . , 𝑛) are assumed to be known. Throughout, 𝑁(𝑦 | 𝑎, 𝑏) denotes the Gaussian density function of 𝑦 with mean 𝑎 and variance 𝑏.
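As a rough illustration of how the single-star model in the top panel of Table 1 could be written down in code, the following minimal sketch evaluates the log-likelihood, the log-prior, and the unnormalized log-posterior. The function names and all numerical values (the measurement, its uncertainty, the length scale, and the truncation bounds) are hypothetical choices made for this example only; they are not taken from the paper's repository.

```python
import numpy as np

def log_likelihood(d, y, sigma):
    """Log of N(y | 1/d, sigma^2), viewed as a function of the distance d [kpc]."""
    d = np.asarray(d, dtype=float)
    return -0.5 * ((y - 1.0 / d) / sigma) ** 2 - 0.5 * np.log(2.0 * np.pi * sigma**2)

def log_prior(d, L, d_min, d_max):
    """Log of the unnormalized prior d^2 exp(-d/L) on (d_min, d_max); -inf outside."""
    d = np.asarray(d, dtype=float)
    inside = (d > d_min) & (d < d_max)
    safe_d = np.where(inside, d, 1.0)  # placeholder value avoids invalid logs outside the support
    return np.where(inside, 2.0 * np.log(safe_d) - safe_d / L, -np.inf)

def log_posterior(d, y, sigma, L, d_min, d_max):
    """Unnormalized log-posterior: log-likelihood plus log-prior."""
    d = np.asarray(d, dtype=float)
    inside = (d > d_min) & (d < d_max)
    safe_d = np.where(inside, d, 1.0)
    return log_prior(d, L, d_min, d_max) + log_likelihood(safe_d, y, sigma)

# Illustrative (assumed) inputs: y = 0.5 mas, sigma = 0.1 mas, L = 1.35 kpc
d_grid = np.linspace(0.5, 10.0, 5)
print(log_posterior(d_grid, y=0.5, sigma=0.1, L=1.35, d_min=0.01, d_max=20.0))
```

Working with logarithms keeps the evaluation numerically stable, and returning negative infinity outside the prior support lets the same function be passed to a grid evaluation or to a sampler.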
Carlin & Louis 2008; Berger & Wolpert 1988; Casella & Berger 2002). However, in astronomy, it is not unusual to see phrases such as “the likelihood function of the data given the model parameters”, which might be misconstrued as treating the likelihood function as a function of data. Finally, we note that the likelihood function of 𝜽 is not a probability density function of 𝜽.

In Table 2, we summarize common notation for the likelihood found in both the statistics and astronomy literature, which ranges from being very explicit (e.g., 𝑓𝑌(𝒚|𝜽)) to quite simplified (e.g., 𝑃(𝒚|𝑀)). We note that a subtlety sometimes missed in astronomy is the difference between 𝑝 and 𝑃. In statistics, 𝑓 and 𝑝 are often used for probability density functions (pdf) of continuous random variables, and 𝑃 or 𝑃𝑟 are used to denote probabilities of discrete events (probability mass functions (pmf) are an exception, and are often denoted using 𝑓, 𝑝, or 𝑃). A capital 𝐹 is usually reserved for the cumulative distribution function (cdf).

Determining what the likelihood function should be in a given astronomy problem can be challenging, and care must be taken to choose an appropriate sampling distribution. The likelihood is often taken to be a product of independent and identically distributed (i.i.d.) Gaussian random variables with known variance. While this choice is sometimes plausible, there are also many cases in which it is inappropriate, and has a material effect on inference. For instance, when describing the brightness of a high energy source, a discrete distribution such as the Poisson distribution is usually more appropriate than a Gaussian. In other cases, uncertainty in the variance of the data might lead us to use a likelihood function based on the 𝑡-distribution or another non-Gaussian parametric family. A further consideration is whether the data being modeled are collected as a function of space or time, in which case the assumption of exchangeability (that data can be reordered without affecting the likelihood) is generally unwarranted. In these cases, an expanded model that includes correlation among observations should be considered.

Best practice includes all non-negligible contributors to the measurement process in the likelihood function. For example, it is important to account for substantial truncation and censoring issues when present, because these can strongly influence parameter inference in some cases (Rubin 1976; Eadie et al. 2021). Other common issues to check for and address are measurement uncertainty, correlated errors, measurement bias, sampling bias, and missing data. There are a number of valuable references in the statistics literature on these topics (Rubin 1976; Little & Rubin 2019). We recommend writing down enough mathematical details to uniquely determine the likelihood by defining (algebraically) not only the physical process of interest but also the sampling/measurement process that generated the data.

2.2.1 Parallax Example: the likelihood function

The Gaia spacecraft has measured parallaxes for over a billion stars (Gaia et al. 2018; Collaboration et al. 2018). These parallaxes have been shown empirically, through simulations (Holl et al. 2012; Lindegren et al. 2012), to follow a normal distribution with mean equal to the true underlying parallax 𝜛 so that

$$ p(y \mid \varpi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y - \varpi)^2}{2\sigma^2}\right] \quad \text{or equivalently,} \tag{5} $$

$$ y \mid \varpi, \sigma \sim N(\varpi, \sigma^2) \tag{6} $$

where 𝑦 is the measured parallax and 𝜎 is the associated (assumed known) measurement uncertainty (Hogg 2018). The parameter of interest is the distance 𝑑, so we rewrite Equation 6 as

$$ p(y \mid d) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y - 1/d)^2}{2\sigma^2}\right] \quad \text{or equivalently,} \tag{7} $$

$$ y \mid d, \sigma \sim N(1/d, \sigma^2). \tag{8} $$

We note that a similar Gaussian model assumption is widely applicable to various sub-fields in observational astronomy such as detecting exoplanets by RV (Danby 1988; Mayor et al. 2011; Pepe et al. 2011; Fischer et al. 2013; Butler et al. 2017) or by transit (Konacki et al. 2003; Alonso et al. 2004; Dragomir et al. 2019), inferring the true brightness of a source (Tak et al. 2017), or estimating the Hubble constant (Hubble 1929). This is because the statistical details are analogous: the observation is measured with Gaussian measurement error, the estimated measurement error uncertainty 𝜎 is treated as a known constant, and the mean model can be written as a deterministic function of other parameters, e.g., 𝜛 = 1/𝑑 in Equation 6. On the other hand, as mentioned already, in each new setting it is important to carefully consider which model is most appropriate;
Table 2. Likelihood notation found in different contexts.

Notation | Description | Context
𝑓𝑌(𝒚|𝜽) | distribution function of the random variable 𝑌, but viewed as a function of 𝜽, with 𝒚 fixed | statistics
𝑝(𝒚|𝜽) | format used in this paper (common) | statistics and astronomy
L(𝜽; 𝒚) or L(𝜽) | explicit notation for the likelihood with argument 𝜽 | specific statistics topics, e.g., maximum likelihood estimation
𝑃(𝑫|𝜽) | 𝑫 represents the data | astronomy
𝑃(𝒚|𝑀) | 𝑀 represents the model assumption, implicitly suggests parameters | astronomy, model selection
𝑃𝑟(𝒚|𝜽, 𝐻) | 𝐻 represents a particular proposed model | astronomy, model selection
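The distinction in the first row of Table 2, between a density in 𝒚 and the same expression viewed as a function of 𝜽 with 𝒚 fixed, can be checked numerically. The sketch below uses the Gaussian parallax model of Equation 7 with assumed illustrative values (𝜎 = 0.1 mas, 𝑦 = 0.5 mas, 𝑑 = 2 kpc, and an arbitrary integration range in 𝑑); the helper names are hypothetical.

```python
import numpy as np

sigma = 0.1  # assumed known measurement uncertainty [mas] (illustrative value)

def sampling_density(y, d):
    """Gaussian density N(y | 1/d, sigma^2) from the running parallax example."""
    return np.exp(-0.5 * ((y - 1.0 / d) / sigma) ** 2) / np.sqrt(2.0 * np.pi * sigma**2)

def trapezoid(values, grid):
    """Trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

# Parameter fixed at d = 2 kpc, integrating over possible data y:
# this is a proper probability density in y.
y_grid = np.linspace(-0.5, 1.5, 20001)
area_over_data = trapezoid(sampling_density(y_grid, 2.0), y_grid)

# Data fixed at y = 0.5 mas, integrating over the parameter d:
# the likelihood function is not normalized in d.
d_grid = np.linspace(0.1, 20.0, 200001)
area_over_parameter = trapezoid(sampling_density(0.5, d_grid), d_grid)

print(area_over_data, area_over_parameter)
```

The first integral is one to numerical precision, while the second depends on the chosen range in 𝑑 and is not one, which is why the likelihood function should not be read as a probability density for the parameter.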
Gaussian-based models are (i) sometimes misused, (ii) overused, or (iii) sometimes inappropriate.

2.3 Prior Distributions

The prior probability distribution, or the prior, captures our initial knowledge about the model parameters before we have seen the data. Priors may assign higher probability (or density) to some values of the model parameters over others. Priors are often categorized into two classes: informative and non-informative. The former type summarizes knowledge gained from previous studies, theoretical predictions, and/or scientific intuition. The latter type attempts to include as little information as possible about the model parameters. Informative priors can be conjugate (Diaconis & Ylvisaker 1979), mixtures of conjugate priors (Dalal & Hall 1983), scientifically motivated (Tak et al. 2018; Lemoine 2019), based on previous data, or in the case of empirical Bayes, based on the data at hand (often called data-driven priors) (Carlin & Louis 2000; Maritz 2018). Non-informative priors can be improper or “flat”, weakly-informative, Jeffreys priors (Tuyl et al. 2008), or other reference distributions. Conjugate priors are sometimes defined to be non-informative.

One popular choice for a non-informative prior is an improper prior, i.e., a prior that is not a probability distribution and in particular does not integrate to one. Good introductions to improper priors are available in the statistics literature (Gelman et al. 2013, 2017). An example of an improper prior is a flat prior on an unbounded range, e.g., Unif(0, ∞) or Unif(−∞, ∞). When an improper prior has been adopted, it is imperative to check whether the resulting posterior is a proper probability distribution before making any inference. Without posterior propriety the analysis has no probability interpretation. Empirical checks may not be sufficient; posterior samples may not reveal any evidence of posterior impropriety, forming a seemingly reasonable distribution even when the posterior is actually improper (Hobert & Casella 1996; Tak et al. 2018).

Research on quantifying prior impact is active (e.g., effective prior sample size; Clarke 1996; Reimherr et al. 2014; Jones et al. 2020), as is the discussion on choosing a prior in the context of the likelihood (Reimherr et al. 2014; Gelman et al. 2017; Jones et al. 2020).

In astronomy, there is a tendency for scientists to adopt non-informative prior distributions, perhaps because informative priors are perceived as too subjective or because there is a lack of easily quantifiable information about the parameters in question. However, all priors provide some information about the likely values of the model parameter(s), even a “flat” prior. Notably, a flat prior is non-flat after a transformation. For instance, in our example (Section 2.3.1) a “non-informative” uniform prior distribution on the parallax of a star is actually quite informative in terms of distance (third panel, Figure 1). Thus, we recommend carefully considering what direct or indirect information is available about the value of a parameter before resorting to default or non-informative priors; in astronomy, we usually have at least a little information about the range of allowed or physically reasonable values. Even when a non-informative prior does seem appropriate, checking that the chosen distribution is consistent with known physical constraints is essential.

Complete descriptions and mathematical forms of prior distributions, including the values of hyperparameters defining these distributions, help promote reproducibility and open science. Unfortunately, a recent meta-analysis of the astronomical literature showed that prior definitions are often incomplete or unstated (Tak et al. 2018), making it difficult for others to interpret results.

To summarize, good practices in the context of priors are: (1) choosing informative priors when existing knowledge is available, (2) choosing priors with caution if there is no prior knowledge, (3) testing the influence of alternative priors (see the discussion of sensitivity analyses in Section 2.4), and (4) explicitly specifying the chosen prior distributions for clarity and reproducibility.

2.3.1 Parallax Example: choosing a prior

A naive choice of prior on the true parallax 𝜛 is 𝑝(𝜛) ∝ constant, an improper prior that assigns equal density to all values of 𝜛 in (0, +∞). A straightforward way to be more informative and proper is to instead define a truncated uniform prior, where 𝜛 is uniformly distributed on the interval (𝜛min, 𝜛max) so that

$$ p(\varpi) \propto \begin{cases} \text{constant} & \varpi_{\min} < \varpi < \varpi_{\max} \\ 0 & \text{otherwise} \end{cases}, \tag{9} $$

or equivalently,

$$ \varpi \sim \mathrm{Unif}(\varpi_{\min}, \varpi_{\max}) \tag{10} $$

Here, 𝜛min and 𝜛max are hyperparameters set by the scientist (e.g., using some physically-motivated cutoff for 𝜛min and the minimum realistic distance to the star for 𝜛max). Thus the prior in Equation 9 can be regarded as weakly-informative because some physical knowledge is reflected in the bounds. Similarly, we could instead define a

uniform prior on distance:

$$ p(d) \propto \begin{cases} \text{constant} & d_{\min} < d < d_{\max} \\ 0 & \text{otherwise} \end{cases}, \tag{11} $$

or equivalently,

$$ d \sim \mathrm{Unif}(d_{\min}, d_{\max}) \tag{12} $$

where 𝑑min = 1/𝜛max and 𝑑max = 1/𝜛min. Like Equation 9, this prior can also be regarded as weakly-informative. However, both display drastically different behavior as a function of 𝑑 (see Figure 1), which highlights how the interpretation of non-informative (or weakly-informative) priors may change depending on the choice of parameterization.

While the prior in Equation 11 may appear non-informative (in the sense that it is uniform), it actually encodes a strong assumption about the number density of stars 𝜌 as a function of distance. The prior implies that we are just as likely to observe stars at large distances as we are at smaller distances. However, as we look out into space, the area of the solid angle defined by the distance 𝑑 increases, and this in turn implies that the stellar number density is decreasing with distance. Thus, Equation 11, which says that all distances are equally likely, implies that there are fewer stars per volume at large distances than stars per volume at small distances.

Bailer-Jones et al. (2018) introduced a better prior for the parallax inference problem, which we outline briefly and reproduce here. The physical volume d𝑉 probed by an infinitesimal solid angle on the sky dΩ at a given distance 𝑑 scales as the size of a shell so that d𝑉 ∝ 𝑑². This means that, assuming a constant stellar number density 𝜌 everywhere, a prior behaving as 𝑝(𝑑) ∝ 𝑑² is more appropriate. However, we can go one step further: we know that our Sun sits in the disk of the Galaxy, and that the actual stellar density 𝜌 as we go radially outward in the disk should decrease as a function of distance. Assuming we are looking outward, and that the stellar density decreases exponentially with a length scale 𝐿 (so that for a given distance we have 𝑝(𝜌|𝑑) ∝ 𝑒^(−𝑑/𝐿)), the prior on distance is

$$ p(d) \propto \begin{cases} d^2 e^{-d/L} & d_{\min} < d < d_{\max} \\ 0 & \text{otherwise} \end{cases}, \tag{13} $$

which is the density function of a truncated Gamma(3, 𝐿) distribution. The scientist using Equation 13 would need to choose and define the three hyperparameters 𝑑min, 𝑑max, and 𝐿. Equation 13 is the exponentially decreasing space density prior previously presented in Bailer-Jones et al. (2018). Figure 1 illustrates all three priors discussed here.

2.4 Posterior distributions

The posterior distribution of Equation 1 is the focus of Bayesian inference. Once the prior distribution(s) and the likelihood function are specified, the posterior distribution is uniquely determined. Often, the denominator quantity 𝑝(𝒚) is not available analytically. In this case, 𝑝(𝒚) can be estimated by numerical integration. Samples drawn from 𝑝(𝜽 | 𝒚) can be used to estimate properties of the posterior. A popular approach for obtaining samples is to construct a Markov chain whose stationary distribution is designed to match the target distribution 𝑝(𝜽 | 𝒚), which is known as Markov chain Monte Carlo (MCMC). The canonical example is the Metropolis-Hastings algorithm (Metropolis & Ulam 1949; Metropolis et al. 1953; Hastings 1970; Gelman et al. 2013), but there are many variations, some of which are designed to address specific challenges such as sampling high-dimensional or multi-modal target distributions; see Brooks et al. (2011) for details.

The posterior distribution enables inference of model parameters or of quantities that can be derived from model parameters. For example, the posterior mean 𝐸(𝜃 | 𝑦) is a point summary for 𝜃. The posterior distribution can also be used to define credible intervals for parameters that provide a range of probable values. We stress that credible intervals are not confidence intervals; a 95% credible interval suggests that there is a 95% probability that the parameter lies within the specified range given our prior beliefs, the model, and the data, whereas a 95% confidence interval suggests that if similar intervals are properly constructed for multiple datasets then 95% of them are expected to contain the true (fixed) parameter value.

Depending on the characteristics of the posterior distribution, we emphasize that point summaries and intervals may not provide a complete description of uncertainty (e.g., for multi-modal posteriors). Here, visualizations of the posterior can provide a more comprehensive picture (see Figure 2). Recommendations and open source software packages containing visualization tools for Bayesian analysis can be found in the statistics literature (Gabry et al. 2019; Gabry & Mahr 2019; Kumar et al. 2019; Vehtari et al. 2020). Projections of the joint posterior distribution into two parameter dimensions, colloquially referred to as a corner plot in the astronomy literature, are the most common visualization tool. Drawing credible regions or contours on these types of visualizations is also helpful, although defaulting to a “1-sigma” credible region is not always appropriate (i.e., when the distributions are non-Gaussian).

The posterior distribution also provides a useful way to obtain estimates and credible intervals for other quantities of physical interest. For example, if a model has parameters 𝜽 = (𝛼, 𝛽, 𝜅) and there is some physical quantity described by e.g., 𝛾 = 𝛽²/𝛼𝑒^𝜅, then for every sample of 𝜽, a sample of 𝛾 can be calculated. Thus, a distribution of the physically interesting quantity 𝛾 is obtained, which can also be used to obtain point estimates and credible intervals. In other words, in the Bayesian paradigm, uncertainties in each model parameter are naturally propagated to uncertainties in derived physical quantities in a coherent way.

Posterior distributions can be complicated in shape (e.g., asymmetric, with multiple modes). This can create computational challenges in cases where the posterior cannot be derived in closed form. Fortunately, many algorithms have been developed for approximating the posterior distribution. Different algorithms perform well for different characteristics of the posterior, and therefore prior knowledge of what we might expect the posterior to look like, as well as preliminary explorations, are often valuable in practice. In addition to the MCMC sampling algorithms already mentioned, a number of other techniques have been developed. For example, integrated nested Laplace approximations (Rue et al. 2009) approximate posterior distributions, while variational Bayes methods (Jordan et al. 1999; Blei et al. 2003; Hoffman et al. 2013) and approximate Bayesian computation (Beaumont et al. 2009; Marin et al. 2012; Weyant et al. 2013; Akeret et al. 2015; Ishida et al. 2015; Beaumont 2019) are possible alternatives when the likelihood functions are too complicated or expensive to be evaluated.

2.4.1 Parallax Example: inferring the distance to a star

In our running example, we are interested in inferring the parameter for the distance 𝑑 = 1/𝜛 given the measured parallax 𝑦 and its associated measurement uncertainty 𝜎 (which we treat as known).
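To make the MCMC idea from Section 2.4 concrete for this running example, the sketch below draws samples of 𝑑 with a random-walk Metropolis sampler, the symmetric-proposal special case of Metropolis-Hastings. It targets the posterior built from the Gaussian likelihood and the exponentially decreasing space density prior of Equation 13. This is only a minimal illustration under assumed inputs: the measurement, uncertainty, length scale, bounds, step size, chain length, and burn-in are all hypothetical, and in practice an established package such as those cited in the Introduction would usually be preferred.

```python
import numpy as np

def log_post(d, y=0.5, sigma=0.1, L=1.35, d_min=0.01, d_max=20.0):
    """Unnormalized log-posterior for the distance d [kpc] (assumed example values)."""
    if not (d_min < d < d_max):
        return -np.inf
    return 2.0 * np.log(d) - d / L - 0.5 * ((y - 1.0 / d) / sigma) ** 2

def metropolis(log_target, start, step, n_samples, rng):
    """Random-walk Metropolis with a Gaussian proposal of standard deviation `step`."""
    chain = np.empty(n_samples)
    current, current_lp = start, log_target(start)
    for i in range(n_samples):
        proposal = current + step * rng.normal()
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current))
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

rng = np.random.default_rng(0)
chain = metropolis(log_post, start=2.0, step=0.5, n_samples=20000, rng=rng)
samples = chain[2000:]  # discard an (arbitrary) burn-in segment

print(np.median(samples), np.percentile(samples, [5, 95]))
```

Because only differences of log-posterior values enter the acceptance step, the normalizing constant 𝑝(𝒚) is never needed. The retained samples can then be summarized directly, or transformed to obtain samples of any derived quantity.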
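[Figure 1 graphic: four panels combined as Likelihood × Prior ∝ Posterior. Panels: likelihood in observed units (𝜛), Gaussian, horizontal axis Parallax [mas]; likelihood in physical units (𝑑 = 1/𝜛), non-Gaussian, horizontal axis Distance [kpc]; prior (uniform in parallax, physically-motivated, uniform in distance), horizontal axis Distance [kpc]; posterior, horizontal axis Distance [kpc].]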
Figure 1. Bayesian inference of the distance 𝑑 = 1/𝜛 to a star based on the measured parallax 𝑦. Far left: The likelihood for parallax 𝜛 is normal with variance
assumed known. Center left: A transformation of parameters from 𝜛 to 𝑑 gives a non-normal PDF. Note that a non-negativity constraint was applied to the
distribution of 𝑑. Center right: We highlight three possible priors 𝑝 (𝑑) over the distance: uniform in parallax 𝜛 = 1/𝑑 (blue), uniform in distance 𝑑 (red), and
a physically-motivated prior (Bailer-Jones et al. 2018) (orange). Far right: The posteriors that correspond to each of the three priors.
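The comparison shown in Figure 1 can be approximated on a grid. One step worth making explicit is the change of variables for the uniform-in-parallax prior: with 𝜛 = 1/𝑑, the density in distance is 𝑝(𝑑) = 𝑝(𝜛) |d𝜛/d𝑑| ∝ 1/𝑑², which is why a prior that is flat in 𝜛 is far from flat in 𝑑. The sketch below evaluates the three priors (Equations 9, 11, and 13, all expressed in 𝑑) and the corresponding normalized posteriors. The measurement, uncertainty, length scale, and grid limits are assumed values for illustration, with the grid limits playing the role of 𝑑min and 𝑑max.

```python
import numpy as np

y, sigma = 0.5, 0.1   # assumed parallax measurement and uncertainty [mas]
L = 1.35              # assumed length scale [kpc]
d = np.linspace(0.5, 20.0, 4000)  # grid limits act as (d_min, d_max)

# Gaussian log-likelihood of Equation 7, up to an additive constant
log_like = -0.5 * ((y - 1.0 / d) / sigma) ** 2

# Unnormalized log-priors, all written as densities in distance d
log_priors = {
    "uniform in parallax": -2.0 * np.log(d),         # Jacobian of varpi = 1/d
    "uniform in distance": np.zeros_like(d),
    "physically motivated": 2.0 * np.log(d) - d / L,
}

def normalize(log_density, grid):
    """Exponentiate stably and normalize with the trapezoidal rule."""
    dens = np.exp(log_density - log_density.max())
    area = np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid))
    return dens / area

posteriors = {name: normalize(log_like + lp, d) for name, lp in log_priors.items()}
modes = {name: float(d[np.argmax(p)]) for name, p in posteriors.items()}
print(modes)
```

For a one-dimensional problem like this one, grid evaluation is a simple alternative to sampling and makes the sensitivity of the posterior to the prior easy to inspect.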
[Figure 2 graphic: the three posterior curves with the mode, median, and 90% credible interval marked; horizontal axis Distance [kpc], vertical axis Probability [normalized].]
Figure 2. The three posterior distributions corresponding to each prior distribution shown in Figure 1: uniform in parallax prior (blue curve), uniform in distance
prior (red curve), and physically-motivated prior (orange curve). Also shown is one summary statistic for each posterior: the mode (blue dotted line), the median
(red dotted-dashed line), and the 90% credible interval (orange dashed lines and shaded region).
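The summaries marked in Figure 2 can be computed from a gridded posterior. The sketch below does so for the posterior under the physically-motivated prior, using the same assumed illustrative inputs as before (𝑦 = 0.5 mas, 𝜎 = 0.1 mas, 𝐿 = 1.35 kpc, and a grid from 0.5 to 20 kpc). The 90% interval here is taken to be equal-tailed, which is an assumption of this example; a highest-density interval is another common choice and would generally differ for a skewed posterior.

```python
import numpy as np

y, sigma, L = 0.5, 0.1, 1.35          # assumed example values
d = np.linspace(0.5, 20.0, 4000)      # distance grid [kpc]

# Unnormalized posterior under the physically-motivated prior
log_post = 2.0 * np.log(d) - d / L - 0.5 * ((y - 1.0 / d) / sigma) ** 2
dens = np.exp(log_post - log_post.max())

# Cumulative distribution via the cumulative trapezoidal rule
increments = 0.5 * (dens[1:] + dens[:-1]) * np.diff(d)
cdf = np.concatenate([[0.0], np.cumsum(increments)])
cdf /= cdf[-1]

def quantile(q):
    """Invert the gridded CDF by linear interpolation."""
    return float(np.interp(q, cdf, d))

mode = float(d[np.argmax(dens)])
median = quantile(0.5)
lower, upper = quantile(0.05), quantile(0.95)  # equal-tailed 90% credible interval

print(mode, median, (lower, upper))
```

Reporting more than one summary, together with a plot of the full posterior, guards against over-interpreting any single number when the distribution is skewed.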
From Bayes’ theorem, the posterior is

$$ p(d \mid y) \propto p(y \mid d)\, p(d). \tag{14} $$

For the three priors discussed previously, this corresponds to the following posteriors:

$$ \text{Equation 9} \Rightarrow p(d \mid y) \propto \frac{1}{d^2} \exp\left[-\frac{(y - 1/d)^2}{2\sigma^2}\right] \tag{15} $$

$$ \text{Equation 11} \Rightarrow p(d \mid y) \propto \exp\left[-\frac{(y - 1/d)^2}{2\sigma^2}\right] \tag{16} $$

$$ \text{Equation 13} \Rightarrow p(d \mid y) \propto d^2 \exp\left[-\frac{d}{L} - \frac{(y - 1/d)^2}{2\sigma^2}\right], \tag{17} $$

for 𝑑min < 𝑑 < 𝑑max (and 0 otherwise). While none of these have analytic solutions for point estimates or credible intervals, they can be computed numerically. Approximations to these three posterior distributions are shown in Figures 1 and 2. In this illustration, though each resulting posterior distribution is right-skewed, the shape is notably different for each considered prior distribution.

2.4.2 Extended Example: inferring the distance to a cluster of stars

We now extend our example to infer the distance to a cluster of stars, based on the collection of parallax measurements of each individual star. Assuming that there are 𝑛 stars located at approximately the same distance 𝑑cluster and that the measured parallaxes
𝒚 = {𝑦1, 𝑦2, . . . , 𝑦𝑛} to each star are independent given 𝑑cluster, our combined likelihood is
the product of the individual likelihoods

$p(y_1, y_2, \ldots, y_n \mid 1/d_{\mathrm{cluster}}) = \prod_{i=1}^{n} p(y_i \mid 1/d_{\mathrm{cluster}})$, (18)

where the individual likelihoods are defined following Equation 6. We assume that the measurement
uncertainties 𝜎𝑖 are known constants. Our posterior is

$p(d_{\mathrm{cluster}} \mid \boldsymbol{y}) \propto p(d_{\mathrm{cluster}}) \prod_{i=1}^{n} p(y_i \mid 1/d_{\mathrm{cluster}})$. (19)

The product of 𝑛 independent Gaussian densities with known variances is proportional to a Gaussian
density with precision parameter $\tau^{-2} = \sum_{i=1}^{n} \sigma_i^{-2}$ and mean parameter
1/𝑑cluster. The observed parallaxes can be combined to obtain an effective parallax
$y_{\mathrm{eff}} = \tau^{2} \sum_{i=1}^{n} y_i/\sigma_i^{2}$, and thus,

$p(d_{\mathrm{cluster}} \mid \boldsymbol{y}) \propto p(d_{\mathrm{cluster}}) \exp\left[-\frac{(y_{\mathrm{eff}} - 1/d_{\mathrm{cluster}})^{2}}{2\tau^{2}}\right]$. (20)

The estimated posterior distribution over 𝜛cluster = 1/𝑑cluster for a nearby cluster of stars (M67)
using data from Gaia DR2, and using a conjugate Gaussian prior for 𝜛cluster, is shown in Figure 3.
The top panel of Figure 3 shows the individual parallax measurements of stars in M67, sorted by
their signal-to-noise values. The bottom panel shows the assumed prior distribution (narrow left
panel) and the (estimated) posterior distribution for the cluster’s parallax, as more and better
data are added to the analysis.

Note: A prior 𝑝(𝑑cluster), which governs the distribution of clusters of stars, is not the same as
a prior 𝑝(𝑑), which governs the distribution of individual stars. While it might be reasonable to
assume these are similar, they are not interchangeable quantities and may indeed follow different
distributions. Recognizing the differences in priors between various scenarios such as these is key
to building good models and subsequently making good inferences.

2.5 Posterior Predictive Checking

After obtaining the posterior distribution, it is recommended to assess the adequacy of the model
using posterior predictive checks (Gelman et al. 1996), which compare the empirical distribution of
the data to the distribution described by the Bayesian model. The posterior predictive distribution
is the posterior distribution of hypothetical future data (𝒚˜) under the chosen model and given the
previously collected data:

$p(\tilde{\boldsymbol{y}} \mid \boldsymbol{y}) = \int p(\tilde{\boldsymbol{y}}, \boldsymbol{\theta} \mid \boldsymbol{y})\, \mathrm{d}\boldsymbol{\theta}$. (21)

We find that posterior predictive checking is underused in astronomy but can be very useful.
Posterior predictive checks not only assess the adequacy of the model but also simultaneously check
any approximations to the posterior. They are a valuable tool for diagnosing issues with
computational sampling methods.

In most cases, the posterior predictive distribution is not available in closed form. However, it
is possible to generate simulated observations from the posterior predictive distribution and
compare these to the original data. For example, for each of the posterior samples of 𝜽, draw a
random sample of 𝒚˜ and compare these samples to the real data. Significant or systematic
differences between the distributions of the real and simulated data may suggest a problem with the
model.

It is good practice to perform quantitative and/or graphical comparison between the simulated data
and the real data. For graphical comparison, an overlaid density plot could be used (top panel of
Figure 4), but in general this is a poor choice because it is difficult to judge differences
between the overlaid densities visually. For graphical comparison, we recommend instead using a
quantile-quantile (Q-Q) plot to characterize any differences (bottom panel of Figure 4). To
construct a Q-Q plot, it suffices to compute the quantiles from the original data and from the
posterior predictive distribution (or from data simulated from the posterior predictive
distribution), and to plot the pairs one against the other (bottom panel, Figure 4). If the
empirical distribution and the posterior predictive distribution match, then their quantiles should
lie along a 1:1 line. Functions to display Q-Q plots are common in statistical computing software
languages.

In our parallax example, parallax values in the tails of the simulated and real data distributions
show some disagreement (Figure 4). The Q-Q plot shows this more explicitly than the density plot, as
the quantiles do not follow the 1:1 line in the tails of the distribution (below ∼10th percentile
and above ∼70th percentile). Differences in either end of a Q-Q plot can be due to chance, but
strong deviations from the 1:1 line are usually worth investigating.

There are also other valuable approaches for checking a Bayesian model and the quality of
approximations to the posterior distribution. For example, one may set a portion of the data aside,
or obtain additional data, and then compare the resulting inference to that from the original data.
One may also use multiple methods for approximating the posterior distribution, and compare results.
This can help diagnose situations where one or more sampling algorithms did not explore the full
parameter space, and consequently fail to include high-probability regions in the posterior samples.

In addition to model and posterior checking, it is important to consider the influence of the prior
distribution(s). The rightmost panel of Figure 1 shows three posteriors: each one used one of the
three priors discussed in Section 2.3.1. While the posteriors are vaguely similar in shape (e.g.,
right-skewed), the inferred summary statistics can be quite different (Figure 2).

More generally, investigating how the analysis compares for several different prior distributions is
an important technique, often referred to as a sensitivity analysis. A sensitivity analysis directly
assesses the impact of the prior distribution on the posterior, and for this reason we recommend it,
particularly when the information available to construct a prior is limited. On the other hand,
sensitivity analyses can be somewhat ad hoc (e.g., which priors are tried, how they are compared),
making it difficult to summarize and compare the prior impact across multiple analyses, instruments,
and models. More principled approaches may therefore be preferred or complementary in some
scenarios. One such method is to quantify the effective prior sample size (EPSS), i.e., the number
of data points that the information provided by the prior distribution corresponds to.

The EPSS is simple to compute for conjugate models. For example, if we have the data
$y_i \overset{\mathrm{iid}}{\sim} N(\mu, \sigma^{2})$, for 𝑖 = 1, . . . , 𝑛, and the conjugate prior
distribution $\mu \sim N(\mu_0, \sigma^{2}/m)$, with known 𝜎², then the posterior distribution of 𝜇
has variance $\sigma^{2}/(n + m)$. Thus, the effect of the prior is equivalent to that of 𝑚 samples,
and we say that the EPSS is 𝑚. The statistics literature includes proposals of several methods for
extending this idea beyond conjugate models (Clarke 1996; Morita et al. 2008; Reimherr et al. 2014;
Jones et al. 2020) and how to additionally account for location discrepancies, e.g., the value of
|𝑦¯ − 𝜇0| in the preceding example. Clarke (1996) and Morita et al. (2008) use EPSS to quantify the
information in the prior in isolation from the data, while Reimherr et al. (2014) and Jones et al.
(2020) concentrate on the impact of the prior on the specific analysis
[Figure 3 panels. Top: Individual Measurements, with axis Measured Parallax [mas]. Bottom: Inferred
Cluster Parallax [mas] against Number of Objects, with labels Prior, Likelihood, Posterior, Combine,
and More information.]
Figure 3. Extended Example for Open Cluster M67. An extension of the example shown in Figure 1 illustrating how to infer the distance to an open cluster
(M67) based on parallax measurements of many stars. Top: Parallax measurements (gray) for likely cluster members (based on proper motions), sorted by their
observed signal-to-noise ratio 𝜛obs /𝜎 𝜛 . Bottom: The joint likelihood (gray) and posterior (blue) for the cluster parallax 𝜛cluster = 1/𝑑cluster as more and more
stars are added to our analysis. The (Gaussian) prior distribution on the cluster’s parallax is illustrated in the narrow left panel. When there is only a small number
of stars, the location of the prior has a substantial impact on the posterior. However, as more stars are added, the information from the data dominates.
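
The combination of measurements illustrated in Figure 3, together with the posterior predictive check described in Section 2.5, can be sketched with simulated data. The example below is an illustration under stated assumptions: it uses synthetic parallaxes rather than the M67 Gaia DR2 data, and a hypothetical Gaussian prior on 𝜛cluster with mean `mu0` and standard deviation `s0`. It applies the inverse-variance-weighted effective parallax and the standard conjugate Gaussian update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): n stars sharing one true parallax.
n, true_plx = 200, 1.15                    # true cluster parallax [mas]
sig = rng.uniform(0.05, 0.5, size=n)       # known per-star uncertainties [mas]
y = rng.normal(true_plx, sig)              # measured parallaxes [mas]

# Combine the measurements: tau^-2 = sum(sig_i^-2), y_eff = tau^2 * sum(y_i / sig_i^2).
tau2 = 1.0 / np.sum(sig**-2)
y_eff = tau2 * np.sum(y / sig**2)

# Conjugate Gaussian prior on the cluster parallax (hypothetical values).
mu0, s0 = 0.5, 1.0
post_var = 1.0 / (1.0 / s0**2 + 1.0 / tau2)
post_mean = post_var * (mu0 / s0**2 + y_eff / tau2)

# Posterior predictive simulation: draw a cluster parallax from the posterior,
# then one replicated measurement per star using that star's uncertainty.
n_rep = 500
plx_draws = rng.normal(post_mean, np.sqrt(post_var), size=n_rep)
y_rep = rng.normal(plx_draws[:, None], sig[None, :])   # shape (n_rep, n)

# Q-Q comparison: matched quantiles of observed and simulated parallaxes.
probs = np.linspace(0.01, 0.99, 99)
q_obs = np.quantile(y, probs)
q_rep = np.quantile(y_rep.ravel(), probs)
```

Plotting `q_obs` against `q_rep` together with a 1:1 line gives the Q-Q plot. Because the synthetic data here are generated from the assumed model, the points should fall close to that line; with real data, departures in the tails would indicate model misfit of the kind discussed in the text.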
performed. The latter is typically more relevant in science and more closely coincides with
sensitivity analyses.

Good practices outlined in this section can be summarized as (1) using multiple ways to summarize
the posterior inference, (2) quantitatively and graphically checking the posterior distribution
(e.g., using posterior predictive checks, Q-Q plots), and (3) providing evidence that diagnostic
checks were completed.

2.5.1 Extended Example: posterior predictive checks

We investigate the validity of our model for the distance to a cluster of stars by computing the
posterior predictive distribution for the observed stellar parallaxes. While in this case the
posterior predictive can be written in closed form (since it is a Gaussian distribution), we also
approximate it by simulating values of 𝑑cluster from the posterior and then subsequently simulating
values for the predicted parallax measurements 𝜛pred,𝑖 given 𝑑cluster.

In Figure 4, we compare both the distribution and quantiles estimated for the simulated dataset and
the observed dataset via a density plot and a Q-Q plot, respectively. While there are differences,
especially in the tails of the distribution, overall the cluster model reproduces most of the
observed properties of the data. It would be worth investigating whether these differences persist
under different models: for example, a model in which the distance to each star is not assumed to
be identical, or a model in which measurement uncertainty is not assumed to be known exactly.

2.6 Conclusion

We hope that this article has identified, clarified, and illuminated fundamental Bayesian inference
notation and techniques from the statistics literature, and in particular, has made a case for
fully specifying the model, posterior predictive checking, and the use of underused aids such as
the Q-Q plot. In summary, we highlight sound practices for conducting Bayesian inference in
astronomy as follows:

• Be explicit about notation, and use appropriate terminology for the interpretation of concepts
such as the likelihood and credible intervals, which will help interdisciplinary collaboration and
reproducibility.
• Describe the likelihood as a function of the parameters, given the data.
• Use informative priors whenever possible and justified. Carefully consider what direct or
indirect information is available about the parameters.
• Use non-informative priors carefully, and assess their properties under parameter
transformations.
• Test the sensitivity of the posterior distribution to different prior distributions.
• Fully specify the Bayesian model in terms of the likelihood, prior, and posterior, and provide
open-source code whenever possible.
• Perform posterior predictive checks of the model, using visualizations such as Q-Q plots where
appropriate.
• Strive to include all non-negligible contributors to the measurement process.

We hope that there is a continued growth of interdisciplinary collaborations between astronomers
and statisticians in the future. Data
from cutting-edge telescopes such as the Vera Rubin Observatory, the James Webb Space Telescope,
and many others, have the potential to drive the field of astronomy, but this new information is
best understood in the context of existing knowledge and careful statistical inference. Bayesian
inference provides a framework in which this type of analysis and discovery can occur. Areas of
astronomy where prior information and non-Gaussian based likelihoods are common can especially
benefit from Bayesian methods, for example X-ray and gamma-ray astronomy.

Bayesian inference is a broad topic, and many subtopics were not covered in this article.
Ultimately, we hope that this article not only serves as a useful resource, but will also be the
inception for a series of more specific papers on Bayesian methods and techniques in astronomy and
physics.

[Figure 4 panels. Labels: Posterior, Posterior Predictive, Data. Axes: Probability [normalized],
Measured Parallax [mas], Predicted Parallax [mas]. Percentile markers: 1%, 5%, 20%, 50%, 80%, 95%, 99%.]
Figure 4. Quantile-Quantile (Q-Q) Plot. This figure demonstrates a way to perform posterior
predictive checking for the model shown in Figure 3. Top: The distribution of parallax measurements
from the data (gray) and simulated values from the posterior predictive (light blue). The posterior
mean is indicated using the dashed dark blue line. The distributions appear relatively consistent
with each other by eye, but a quantile-quantile (Q-Q) plot is more informative and suggests
otherwise. Bottom: The Q-Q plot of the quantiles from the posterior predictive simulated parallax
data (𝑥-axis) and of the observed parallaxes (𝑦-axis). If the real and simulated data followed the
same distribution, then the quantiles would lie on the one-to-one line. However, strong
discrepancies are apparent below ∼10th percentile and above ∼70th percentile.

ACKNOWLEDGEMENTS

GME acknowledges the support of a Discovery Grant from the Natural Sciences and Engineering Research
Council of Canada (NSERC, RGPIN-2020-04554). JCK gratefully acknowledges support from NSF under
Grant Numbers AST 2009528 and DMS 2038556. DH is supported by the Women In Science Excel (WISE)
programme of the Netherlands Organisation for Scientific Research (NWO).

DATA AVAILABILITY

Data used in the running example is provided with permission and courtesy of Phill Cargile (Center
for Astrophysics | Harvard & Smithsonian).

REFERENCES

Akeret J., Refregier A., Amara A., Seehars S., Hasner C., 2015, Journal of Cosmology and Astroparticle Physics, 2015, 043
Alonso R., et al., 2004, The Astrophysical Journal Letters, 613, L153
Astraatmadja T. L., Bailer-Jones C. A. L., 2016a, ApJ, 832, 137
Astraatmadja T. L., Bailer-Jones C. A. L., 2016b, ApJ, 833, 119
Bailer-Jones C. A. L., 2015, PASP, 127, 994
Bailer-Jones C. A. L., Rybizki J., Fouesneau M., Mantelet G., Andrae R., 2018, AJ, 156, 58
Beaumont M. A., 2019, Annual review of statistics and its application, 6, 379
Beaumont M. A., Cornuet J.-M., Marin J.-M., Robert C. P., 2009, Biometrika, 96, 983
Berger J. O., Wolpert R. L., 1988, The likelihood principle. IMS Lecture Notes-Monograph Series Vol. 6, Institute of Mathematical Statistics
Blei D. M., Ng A. Y., Jordan M. I., 2003, Journal of machine Learning research, 3, 993
Brooks S., Gelman A., Jones G., Meng X.-L., 2011, Handbook of Markov chain Monte carlo. CRC press
Butler R. P., et al., 2017, The Astronomical Journal, 153, 208
Carlin B. P., Louis T. A., 2000, Bayes and empirical Bayes methods for data analysis. Texts in Statistical Science Vol. 88, Chapman & Hall/CRC Boca Raton
Carlin B. P., Louis T. A., 2008, Bayesian methods for data analysis. CRC Press
Casella G., Berger R. L., 2002, Statistical inference, second edn. Duxbury Pacific Grove, CA
Clarke B., 1996, Journal of the American Statistical Association, 91, 173
Collaboration G., et al., 2018, yCat, pp I–345
Craiu R. V., Rosenthal J. S., 2014, Annual Review of Statistics and Its Application, 1, 179
Dalal S., Hall W., 1983, Journal of the Royal Statistical Society: Series B (Methodological), 45, 278
Danby J., 1988, Willmann-Bell, 1988. 2nd ed., rev. & enl.
Diaconis P., Ylvisaker D., 1979, The Annals of statistics, pp 269–281
Dragomir D., et al., 2019, The Astrophysical Journal Letters, 875, L7
Eadie G., et al., 2019a, in Bulletin of the American Astronomical Society. p. 233 (arXiv:1909.11714)
Eadie G., et al., 2019b, in Canadian Long Range Plan for Astronomy and Astrophysics White Papers. p. 10 (arXiv:1910.08857), doi:10.5281/zenodo.3756019
Eadie G. M., Webb J. J., Rosenthal J. S., 2021, arXiv e-prints, p. arXiv:2108.13491
Fischer D. A., Marcy G. W., Spronck J. F., 2013, The Astrophysical Journal Supplement Series, 210, 5
Foreman-Mackey D., Hogg D. W., Lang D., Goodman J., 2013, Publications of the Astronomical Society of the Pacific, 125, 306
Gabry J., Mahr T., 2019, bayesplot: Plotting for Bayesian Models, https://mc-stan.org/bayesplot
Gabry J., Simpson D., Vehtari A., Betancourt M., Gelman A., 2019, J. R. Stat. Soc. A, 182, 389
Gaia C., et al., 2018, Astronomy & Astrophysics, 616
Gelman A., Meng X.-L., Stern H., 1996, Statistica Sinica, 6, 733
Gelman A., Carlin J. B., Stern H. S., Dunson D. B., Vehtari A., Rubin D. B., 2013, Bayesian data analysis. CRC press
Gelman A., Simpson D., Betancourt M., 2017, Entropy, 19, 555
Hastings W. K., 1970, Biometrika, 57, 97
Hilbe J. M., De Souza R. S., Ishida E. E., 2017, Bayesian models for astrophysical data: using R, JAGS, Python, and Stan. Cambridge University Press
Hobert J. P., Casella G., 1996, Journal of the American Statistical Association, 91, 1461
Hoffman M. D., Blei D. M., Wang C., Paisley J., 2013, The Journal of Machine Learning Research, 14, 1303
Hogg D. W., 2018, A likelihood function for the Gaia Data (arXiv:1804.07766)
Holl B., Lindegren L., Hobbs D., 2012, A&A, 543, A15
Hubble E., 1929, Proceedings of the National Academy of Science, 15, 168
Ishida E. E., et al., 2015, Astronomy and Computing, 13, 1
Jones D. E., Trangucci R. N., Chen Y., 2020, arXiv preprint arXiv:2001.10664
Jordan M. I., Ghahramani Z., Jaakkola T. S., Saul L. K., 1999, Machine learning, 37, 183
Konacki M., Torres G., Jha S., Sasselov D. D., 2003, Nature, 421, 507
Kumar R., Carroll C., Hartikainen A., Martin O. A., 2019, The Journal of Open Source Software
Lemoine N. P., 2019, Oikos, 128, 912
Lindegren L., et al., 2018, A&A, 616, A2
Lindegren L., Lammers U., Hobbs D., O’Mullane W., Bastian U., Hernández J., 2012, A&A, 538, A78
Little R. J., Rubin D. B., 2019, Statistical analysis with missing data, third edn. John Wiley & Sons
Lunn D. J., Thomas A., Best N., Spiegelhalter D., 2000, Statistics and computing, 10, 325
Marin J.-M., Pudlo P., Robert C. P., Ryder R. J., 2012, Statistics and Computing, 22, 1167
Maritz J. S., 2018, Empirical Bayes methods with applications. CRC Press
Mayor M., et al., 2011, arXiv preprint arXiv:1109.2497
Metropolis N., Ulam S., 1949, Journal of the American statistical association, 44, 335
Metropolis N., Rosenbluth A. W., Rosenbluth M. N., Teller A. H., Teller E., 1953, The journal of chemical physics, 21, 1087
Morita S., Thall P. F., Müller P., 2008, Biometrics, 64, 595
Pepe F., et al., 2011, Astronomy & Astrophysics, 534, A58
Plummer M., et al., 2003, in Proceedings of the 3rd international workshop on distributed statistical computing. pp 1–10
Reimherr M., Meng X.-L., Nicolae D. L., 2014, arXiv preprint arXiv:1406.5958
Riddell A., et al., 2017, stan-dev/pystan: v2.17.0.0, doi:10.5281/zenodo.1003176, https://doi.org/10.5281/zenodo.1003176
Robert C. P., 2014, Annual Review of Statistics and Its Application, 1, 153
Rubin D. B., 1976, Biometrika, 63, 581
Rue H., Martino S., Chopin N., 2009, Journal of the royal statistical society: Series b (statistical methodology), 71, 319
Salvatier J., Wiecki T. V., Fonnesbeck C., 2016, PeerJ Computer Science, 2, e55
Schervish M. J., 1995, Theory of statistics. Springer Series in Statistics, Springer
Schönrich R., McMillan P., Eyer L., 2019, Monthly Notices of the Royal Astronomical Society, 487, 3568
Stan Development Team 2020, RStan: the R interface to Stan, http://mc-stan.org/
Tak H., Mandel K., van Dyk D. A., Kashyap V. L., Meng X.-L., Siemiginowska A., 2017, The Annals of Applied Statistics, 11, 1309
Tak H., Ghosh S. K., Ellis J. A., 2018, Monthly Notices of the Royal Astronomical Society, 481, 277
Tuyl F., Gerlach R., Mengersen K., 2008, The American Statistician, 62, 40
Vehtari A., Gelman A., Simpson D., Carpenter B., Bürkner P.-C., 2020, Bayesian Analysis, 16, 667
Weyant A., Schafer C., Wood-Vasey W. M., 2013, The Astrophysical Journal, 764, 116
de Valpine P., Turek D., Paciorek C. J., Anderson-Bergman C., Lang D. T., Bodik R., 2017, Journal of Computational and Graphical Statistics, 26, 403

This paper has been typeset from a TEX/LATEX file prepared by the author.
