Professional Documents
Culture Documents
Toubiana 2401.06845
Toubiana 2401.06845
Toubiana 2401.06845
ABSTRACT
In these notes, we comment on the standard indistinguishability criterion often
used in the gravitational wave (GW) community to set accuracy requirements on
waveforms (Flanagan & Hughes 1998; Lindblom et al. 2008; McWilliams et al. 2010;
Chatziioannou et al. 2017; Pürrer & Haster 2020). Revisiting the hypotheses under
which it is derived, we propose a correction to it. Moreover, we outline how the ap-
proach described in Toubiana et al. (2023) in the context of tests of general relativity
can be used for this same purpose.
1. DEFINITIONS
In the following, we use θ to denote the parameters of a GW signal model, np the
total number of parameters and h(θ) the family of templates we use to analyse the
GW data. We assume that we have a set of nd independent observations, which could
be the strain in different detectors or the time-delay-interferometry (TDI) variables
A, E and T (Tinto & Dhurandhar 2005) in the case of the Laser Interferometer Space
Antenna (LISA) (Amaro-Seoane et al. 2017). In general, the strain d is the sum of
a GW signal s0 and noise. Here, we work in the zero-noise approximation, taking
d = s0 . The “true” GW signal s0 need not belong to the family of model templates,
h(θ), but we assume it depends on the same set of parameters, θ. Denoting θ0 the
parameters such that s0 = s(θ0 ), we wish to derive a simple criterion to determine if
when analysing the observed data with our family of templates h, we would obtain a
biased estimate of θ0 due to systematic effects.
We define the noise-weighted inner product between two data streams, d1 and d2 ,
as:
Z +∞
d1 (f )d∗2 (f )
(d1 |d2 ) = 4Re df , (1)
0 Sn (f )
1
We assume this to hold in the region of the parameter space we would identify when performing
parameter estimation.
3
Next, we use the linear signal approximation (Finn 1992) to write the likelihood in a
Gaussian form. We denote by θ̂ the maximum likelihood point, and decompose the
GW signal into a component within the template manifold and a component orthog-
onal to it (in the sense of the inner product defined in Eq. 1) as: s0,i = hi (θ̂) + δhi ,
such that ni=1 (δhi |∂j hi (θ̂)) = 0 ∀ j, where ∂j h = ∂h/∂θj as usual. This yields:
P d
" nd
! #
1 X
L = L̂ exp − (θ − θ̂)t F i (θ − θ̂) , (11)
2 i
where the maximum likelihood, L̂, and the Fisher matrices, F i , are given by:
" nd
#
1X
L̂ = exp − (δhi |δhi ) , (12)
2 i
Fjik = (∂j hi |∂k hi )|θ̂ . (13)
The Gaussian approximation is expected to hold for high enough SNRs. Note that
its validity is independent of the overall quality of the fit, which is measured by L̂.
2. INDISTINGUISHABILITY CRITERION
2.1. Standard criterion
The standard indistinguishability criterion is obtained by:
1. assuming L̂ = 1,
2. requiring that the log-likelihood at the true point θ0 is larger than exp[−np /2],
the log-likelihood 1 − σ away in all directions from θ̂.
Although very handy, this criterion tends to be too conservative and biases are not
necessarily found when it is violated (Pürrer & Haster 2020). It can be readily
improved upon by modifying the two hypotheses under which it was derived.
2
This can readily be seen by transforming Eq. 11 into the basis of eigenvectors of the total covariance
matrix, which then clearly shows that 2(ln L̂ − ln L) is the sum of np normalised Gaussian random
variables, and so follows a χ2 (np ) distribution.
3
The 0.5 limit comes from the fact that as np → ∞, the χ2 distribution approaches a Gaussian
distribution with mean np and variance 2np . However, this convergence is very slow.
5
q
X
4
If X follows a χ2 (np ) distribution, then 3
np converges to a normal distribution with mean 1 − 9n2 p
and variance 9n2 p . Remarkably, this convergence is much faster than the one of χ2 to a normal
distribution.
5
In that work the authors let the factor in the numerator of the right-hand side of Eq. 14 be free
instead of setting it to np , and estimate it by setting an equality between the left and right-hand
sides of Eq. 14 at the SNR where they find, from full from parameter estimation, systematic biases
to equal statistical errors. In one case, they find this factor to be as large as 104 for an SNR of 250,
which could be obtained for a fitting factor of 0.92. This is plausible considering that they analyse
numerical relativity waveforms with full harmonic content using templates with the (2,2) harmonic
only.
6
informs us if the true point lies inside the np − dimensional 90% confidence region.
Often, we are interested in estimating if biases are expected in a specific subset of
parameters, e.g., the intrinsic parameters. We can denote θ = (θ1 , θ2 ), where θ1 are
the parameters we are interested in and θ2 those in which we are not.
3. MODEL SELECTION
In Toubiana et al. (2023), we introduced the Akaike information criterion (AIC)
(Akaike 1974) of a model, defined as:
where k is a constant, that was originally taken to be k = 2, but other choices are
occasionally made in the statistics literature. The Bayes’ factor between two models
can be approximated by:
1
ln B = − (AIC1 − AIC2 ). (28)
2
This can be justified by considering the case of an np −dimensional Gaussian likelihood
with covariance eigenvalues σi , and taking a flat prior that extendsa width√
2∆i in each
n
eigenvector direction, θ⃗i . The log-evidence for this model is: ln 2πσ
Q p
i 2∆i
i
+ ln L̂.
√
Eq. 28 is recovered by choosing ∆i = e 22π σi ≃ 3.4σi . In this way, the chosen prior
domain contains 0.9993np of the likelihood probability weight, which is usually enough
for the values of np we are typically interested in. For spinning binary GW sources,
np can be as large as 15, in which case this probability weight is ∼ 99%. Increasing
∆i to 4.07σi (which would mean that the prior now contains 99.93% of the likelihood
probability weight when np = 15) would correspond to setting k = 2.36 in the AIC.
These choices of ∆i are somewhat ad-hoc, but the procedure can be interpreted as
trying to maximize the evidence for a model with flat priors, without excluding any
region of the parameter space consistent with the observed data. It is interesting
that the AIC can be related to the evidence in this way, since the AIC is derived
using information theory, by minimizing the information loss relative to the true data
generating process by considering the Kullback-Liebler divergence between different
models. Moreover, note that the AIC is a measure of how well a model fits the data:
the smaller it is, the better the fit. In this sense, interpreting the difference in AICs
as a log-Bayes’ factor through Eq. 28 is a way of a introducing a scale to quantify
how much a model is preferred to another.
Now, we discuss how this approach can be applied to assess if biases are expected.
From the Bayesian perspective, we expect not to have biases in a subset of param-
eters θ1 , if the model where those parameters are fixed to their true value θ01 is not
disfavoured over the model where we vary them. This is the same reasoning we used
in Toubiana et al. (2023) to estimate the SNR at which we would expect apparent
deviations from GR to arise due to systematic effects. The log-Bayes’ factor between
the two models reads:
In that equation, ln L̂(θ1 = θ01 ) is the maximum log-likelihood when fixing θ1 to θ01
and letting only θ2 vary, and ln L̂ is the maximum log-likelihood when varying all the
9
where, as above, θ̌2 is such that the total overlap at θ01 is maximised.
The value of ln B from which we expect to observe biases is not unequivocally
defined. Adopting the Kass-Raftery scale Kass & Raftery (1995), we expect to have
strong evidence for a bias if ln B is in the range 3-5, and very strong evidence for
ln B > 5. In Toubiana et al. (2023), we found these scales to be in good agreement
with our parameter estimation runs.
4. PRACTICAL CONSIDERATIONS
We are often interested in determining the SNR threshold at which we expect bi-
ases to begin to appear. From Eq. 5, when varying the SNR of the source just
by varying the distance of the signal s0 , the log-likelihood can be readily rescaled:
since only the (s0,i |s0,i ) term depends on the distance, each term of the sum scales
as SNR2i , and each of these scales in the same way with distance. Moreover, the
parameters that maximise the log-likelihood do not change, apart from the distance,
which scales in the same way as the distance of s0 . Thanks to this observation, the
Bayes’ factor-based criterion, as well as the full and maximised versions of the re-
visited indistinguishability criterion (given by Eqs. 15, 18 and 21, not the one where
we perform marginalisation over a subset of parameters presented in Sec. 2.3.2) can
be easily evaluated at different SNRs, using minimisation routines or Markov-Chain-
Monte-Carlo (MCMC) (as in Toubiana et al. (2023)) to estimate θ̂2 and ln L̂ for a
given SNR and applying the proper rescaling. Alternatively, using the linear signal
approximation, we can compute the Fisher matrix (see Eq. 13), assuming that θ̂ ∼ θ0
(the difference in the Fisher matrix at those points should yield higher order correc-
tions to the linear signal approximation), use it to draw samples of θ and compute the
log-likelihood at those points. A drawback of this method and of the MCMC-based
one, is that the maximum log-likelihood lies in the higher-end tail of the log-likelihood
distribution, which might not be well sampled (recall that errors in the estimation of
the log-likelihood will scale with SNR2 in the criteria). We can once again exploit the
fact that the log-likelihood follows a χ2 (np ) distribution and estimate its maximum
value from the mean, < ln L >, whose estimate through sampling is more stable,
using the relation:
np
ln L̂ =< ln L > + . (31)
2
The fact that the marginalised version of the revisited indistinguishability criterion
(Eq. 24) does not follow the SNR scaling makes it even more inconvenient to work
with. Thus, we believe the Bayes’ factor-based criterion we propose offers a powerful
alternative for the study of systematic effects.
10
5. ACKNOWLEDGEMENTS
We are thankful to Lorenzo Pompili and Alessandra Buonanno for fruitful discus-
sions that led to the elaboration of these notes.
REFERENCES
Akaike, H. 1974, IEEE Transactions on Lindblom, L., Owen, B. J., & Brown,
Automatic Control, 19, 716, D. A. 2008, Phys. Rev. D, 78, 124020,
doi: 10.1109/TAC.1974.1100705 doi: 10.1103/PhysRevD.78.124020
Amaro-Seoane, P., et al. 2017. McWilliams, S. T., Kelly, B. J., & Baker,
https://arxiv.org/abs/1702.00786 J. G. 2010, Phys. Rev. D, 82, 024014,
Baird, E., Fairhurst, S., Hannam, M., & doi: 10.1103/PhysRevD.82.024014
Murphy, P. 2013, Phys. Rev. D, 87, Pürrer, M., & Haster, C.-J. 2020, Phys.
024035, Rev. Res., 2, 023151,
doi: 10.1103/PhysRevD.87.024035 doi: 10.1103/PhysRevResearch.2.023151
Chatziioannou, K., Klein, A., Yunes, N., Tinto, M., & Dhurandhar, S. V. 2005,
& Cornish, N. 2017, Phys. Rev. D, 95, Living Rev. Rel., 8, 4,
104004, doi: 10.12942/lrr-2005-4
doi: 10.1103/PhysRevD.95.104004
Toubiana, A., Pompili, L., Buonanno, A.,
Finn, L. S. 1992, Phys. Rev. D, 46, 5236,
Gair, J. R., & Katz, M. L. 2023.
doi: 10.1103/PhysRevD.46.5236
https://arxiv.org/abs/2307.15086
Flanagan, E. E., & Hughes, S. A. 1998,
Phys. Rev. D, 57, 4566, Wilson, E. B., & Hilferty, M. M. 1931,
doi: 10.1103/PhysRevD.57.4566 Proceedings of the National Academy
Kass, R. E., & Raftery, A. E. 1995, of Sciences of the United States of
Journal of the American Statistical America, 17, 684.
Association, 90, 773, http://www.jstor.org/stable/86022
doi: 10.1080/01621459.1995.10476572