Toubiana 2401.06845


Draft version January 19, 2024

Typeset using LATEX modern style in AASTeX631

Indistinguishability criterion and estimating the presence of biases


Alexandre Toubiana¹ and Jonathan R. Gair¹
¹ Max Planck Institute for Gravitationsphysik (Albert Einstein Institute)
Am Mühlenberg 1
14476 Potsdam, Germany
arXiv:2401.06845v2 [gr-qc] 18 Jan 2024

ABSTRACT
In these notes, we comment on the standard indistinguishability criterion often
used in the gravitational wave (GW) community to set accuracy requirements on
waveforms (Flanagan & Hughes 1998; Lindblom et al. 2008; McWilliams et al. 2010;
Chatziioannou et al. 2017; Pürrer & Haster 2020). Revisiting the hypotheses under
which it is derived, we propose a correction to it. Moreover, we outline how the
approach described in Toubiana et al. (2023) in the context of tests of general relativity
can be used for this same purpose.

1. DEFINITIONS
In the following, we use θ to denote the parameters of a GW signal model, np the
total number of parameters and h(θ) the family of templates we use to analyse the
GW data. We assume that we have a set of nd independent observations, which could
be the strain in different detectors or the time-delay-interferometry (TDI) variables
A, E and T (Tinto & Dhurandhar 2005) in the case of the Laser Interferometer Space
Antenna (LISA) (Amaro-Seoane et al. 2017). In general, the strain d is the sum of
a GW signal s0 and noise. Here, we work in the zero-noise approximation, taking
d = s0 . The “true” GW signal s0 need not belong to the family of model templates,
h(θ), but we assume it depends on the same set of parameters, θ. Denoting by θ0 the
parameters such that s0 = s(θ0), we wish to derive a simple criterion to determine
whether, when analysing the observed data with our family of templates h, we would
obtain a biased estimate of θ0 due to systematic effects.
We define the noise-weighted inner product between two data streams, d1 and d2 ,
as:
$$ (d_1|d_2) = 4\,\mathrm{Re} \int_0^{+\infty} \frac{d_1(f)\, d_2^*(f)}{S_n(f)}\, df, \tag{1} $$

where Sn(f) is the power spectral density (PSD). For a given choice of PSD, the SNR
of a signal s is defined as SNR = $\sqrt{(s|s)}$. The posterior distribution of the source
parameters, θ, given datasets d1, ..., dnd is given by Bayes' theorem:

$$ p(\theta|d_1,\dots,d_{n_d}) = \frac{p(d_1,\dots,d_{n_d}|\theta)\,p(\theta)}{p(d_1,\dots,d_{n_d})}, \tag{2} $$
where p(d1 , ..., dnd |θ) is the likelihood, which we denote by L(θ) in the following, p(θ)
is the prior and p(d1 , ..., dnd ) is the evidence. Throughout these notes, we assume
the prior on θ to be flat and that its support contains the support of the likelihood.
Thus, up to normalisation constants, which do not affect its shape, the posterior
on θ is equal to the likelihood. Assuming the noise to be stationary, Gaussian and
independent for each observed dataset, the likelihood reads:
$$ L = p(d_1,\dots,d_{n_d}|\theta) = \prod_i^{n_d} \exp\left[-\frac{1}{2}\left(s_{0,i}-h_i(\theta)\,|\,s_{0,i}-h_i(\theta)\right)\right]. \tag{3} $$
The log-likelihood can then be written:
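\begin{align}
\ln L &= \sum_i^{n_d} \left[ (s_{0,i}|h_i(\theta)) - \frac{1}{2}(s_{0,i}|s_{0,i}) - \frac{1}{2}(h_i(\theta)|h_i(\theta)) \right], \tag{4} \\
&= \sum_i^{n_d} (s_{0,i}|s_{0,i}) \sqrt{\frac{(h_i(\theta)|h_i(\theta))}{(s_{0,i}|s_{0,i})}} \left( \frac{(s_{0,i}|h_i(\theta))}{\sqrt{(s_{0,i}|s_{0,i})(h_i(\theta)|h_i(\theta))}} - \frac{1}{2}\sqrt{\frac{(s_{0,i}|s_{0,i})}{(h_i(\theta)|h_i(\theta))}} - \frac{1}{2}\sqrt{\frac{(h_i(\theta)|h_i(\theta))}{(s_{0,i}|s_{0,i})}} \right). \tag{5}
\end{align}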
We define the overlap between two data streams as:
$$ O(d_1, d_2) = \frac{(d_1|d_2)}{\sqrt{(d_1|d_1)\,(d_2|d_2)}}. \tag{6} $$
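The inner product of Eq. 1, the SNR and the overlap of Eq. 6 can be sketched numerically as follows. This is a minimal illustration, assuming frequency-domain data sampled on a uniform grid and a simple Riemann sum for the integral; the function names, the toy PSD and the toy waveforms are hypothetical and are not taken from the paper.

```python
import cmath
import math


def inner_product(d1, d2, psd, df):
    """Riemann-sum approximation of (d1|d2) = 4 Re int d1(f) d2*(f) / Sn(f) df."""
    integrand = sum((a * b.conjugate()).real / s for a, b, s in zip(d1, d2, psd))
    return 4.0 * df * integrand


def snr(s, psd, df):
    """SNR = sqrt((s|s))."""
    return math.sqrt(inner_product(s, s, psd, df))


def overlap(d1, d2, psd, df):
    """Normalised inner product of Eq. 6."""
    norm = math.sqrt(inner_product(d1, d1, psd, df) * inner_product(d2, d2, psd, df))
    return inner_product(d1, d2, psd, df) / norm


# Hypothetical toy data (not from the paper): a 1/f amplitude with a linear phase,
# a smooth made-up PSD, and a template offset by a constant phase of 0.2 rad.
df = 1.0
freqs = [10.0 + k * df for k in range(200)]
psd = [1.0 + (f / 50.0) ** 2 for f in freqs]
signal = [cmath.exp(1j * 0.1 * f) / f for f in freqs]
template = [cmath.exp(1j * (0.1 * f + 0.2)) / f for f in freqs]

print(snr(signal, psd, df), overlap(signal, template, psd, df))
```

For a template that differs from the signal only by a constant phase φ, the overlap reduces to cos φ, which gives a quick sanity check of the implementation.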
Then, assuming that the true signal and our recovered templates have similar loudness,
i.e., (s0,i|s0,i) ≃ (hi(θ)|hi(θ)) [footnote 1], from Eq. 5 we can write the log-likelihood as:

\begin{align}
\ln L(\theta) &= -\sum_i^{n_d} \mathrm{SNR}_i^2 \left(1 - O_i(s_0, h(\theta))\right), \tag{7} \\
&= -\mathrm{SNR}_T^2 \left(1 - O_T(s_0, h(\theta))\right). \tag{8}
\end{align}


In the above equation, we have introduced the total SNR, SNRT , and the total overlap,
OT , defined as:
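\begin{align}
\mathrm{SNR}_T^2 &= \sum_i^{n_d} \mathrm{SNR}_i^2, \tag{9} \\
O_T(s_0, h(\theta)) &= \sum_i^{n_d} \frac{\mathrm{SNR}_i^2}{\mathrm{SNR}_T^2}\, O_i(s_0, h(\theta)). \tag{10}
\end{align}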
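As a small worked example of Eqs. 7 to 10, the following sketch combines per-detector SNRs and overlaps into the total quantities and checks that both forms of the log-likelihood agree. The two-detector numbers are made up for illustration.

```python
import math


def total_snr(snrs):
    """Eq. 9: SNR_T = sqrt(sum_i SNR_i^2)."""
    return math.sqrt(sum(r ** 2 for r in snrs))


def total_overlap(snrs, overlaps):
    """Eq. 10: SNR^2-weighted average of the individual overlaps."""
    snr_t_sq = sum(r ** 2 for r in snrs)
    return sum(r ** 2 * o for r, o in zip(snrs, overlaps)) / snr_t_sq


def log_likelihood(snrs, overlaps):
    """Eq. 7: ln L = -sum_i SNR_i^2 (1 - O_i), valid for similar loudness."""
    return -sum(r ** 2 * (1.0 - o) for r, o in zip(snrs, overlaps))


# Hypothetical two-detector example.
snrs = [30.0, 40.0]
overlaps = [0.99, 0.995]

snr_t = total_snr(snrs)
o_t = total_overlap(snrs, overlaps)
print(snr_t, o_t, log_likelihood(snrs, overlaps), -snr_t ** 2 * (1.0 - o_t))
```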

[Footnote 1] We assume this to hold in the region of the parameter space we would
identify when performing parameter estimation.

Next, we use the linear signal approximation (Finn 1992) to write the likelihood in a
Gaussian form. We denote by θ̂ the maximum likelihood point, and decompose the
GW signal into a component within the template manifold and a component orthogonal
to it (in the sense of the inner product defined in Eq. 1) as: s0,i = hi(θ̂) + δhi,
such that $\sum_{i=1}^{n_d} (\delta h_i|\partial_j h_i(\hat\theta)) = 0\ \forall\, j$, where ∂j h = ∂h/∂θj as usual. This yields:

$$ L = \hat{L} \exp\left[ -\frac{1}{2} (\theta-\hat\theta)^t \left( \sum_i^{n_d} F^i \right) (\theta-\hat\theta) \right], \tag{11} $$

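where the maximum likelihood, L̂, and the Fisher matrices, F^i, are given by:

\begin{align}
\hat{L} &= \exp\left[ -\frac{1}{2} \sum_i^{n_d} (\delta h_i|\delta h_i) \right], \tag{12} \\
F^i_{jk} &= (\partial_j h_i|\partial_k h_i)\big|_{\hat\theta}. \tag{13}
\end{align}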

The Gaussian approximation is expected to hold for high enough SNRs. Note that
its validity is independent of the overall quality of the fit, which is measured by L̂.
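The step from the likelihood of Eq. 3 to the Gaussian form of Eq. 11 can be spelled out as follows. This is a sketch under the linear signal approximation; the shorthand Δθ = θ − θ̂ and the summation convention over repeated parameter indices j, k are introduced here for convenience.

```latex
\begin{align}
h_i(\theta) &\simeq h_i(\hat\theta) + \Delta\theta^j\, \partial_j h_i(\hat\theta),
  \qquad \Delta\theta = \theta - \hat\theta, \\
s_{0,i} - h_i(\theta) &\simeq \delta h_i - \Delta\theta^j\, \partial_j h_i(\hat\theta), \\
\sum_i^{n_d} \left(s_{0,i} - h_i(\theta)\,|\,s_{0,i} - h_i(\theta)\right)
  &\simeq \sum_i^{n_d} (\delta h_i|\delta h_i)
  - 2\,\Delta\theta^j \sum_i^{n_d} (\delta h_i|\partial_j h_i(\hat\theta))
  + \Delta\theta^j \Delta\theta^k \sum_i^{n_d} F^i_{jk}.
\end{align}
```

The middle term vanishes by the orthogonality condition on δhi, the first term gives −2 ln L̂ (Eq. 12), and the last term is the quadratic form appearing in Eq. 11.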

2. INDISTINGUISHABILITY CRITERION
2.1. Standard criterion
The standard indistinguishability criterion is obtained by:

1. assuming L̂ = 1,

2. requiring that the likelihood at the true point θ0 is larger than exp[−np/2],
the likelihood 1σ away in all directions from θ̂ (equivalently, ln L(θ0) ≥ −np/2).

From Eqs. 8 and 11 these conditions translate into:

$$ 1 - O_T(s_0, h(\theta_0)) \leq \frac{n_p}{2\,\mathrm{SNR}_T^2}. \tag{14} $$

Although very handy, this criterion tends to be too conservative and biases are not
necessarily found when it is violated (Pürrer & Haster 2020). It can be readily
improved upon by modifying the two hypotheses under which it was derived.
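For reference, the standard criterion of Eq. 14 can be evaluated directly, and inverted to give the SNR above which it is violated for a given overlap. The sketch below uses hypothetical function names and made-up input values.

```python
import math


def standard_criterion_satisfied(overlap_true, snr_total, n_p):
    """Eq. 14: 1 - O_T(s0, h(theta0)) <= n_p / (2 SNR_T^2)."""
    return 1.0 - overlap_true <= n_p / (2.0 * snr_total ** 2)


def standard_max_snr(overlap_true, n_p):
    """Largest SNR_T for which Eq. 14 still holds at the given overlap."""
    return math.sqrt(n_p / (2.0 * (1.0 - overlap_true)))


# Hypothetical example: 8 parameters and an overlap of 0.99 at the true point.
print(standard_max_snr(0.99, 8))
print(standard_criterion_satisfied(0.99, 15.0, 8))
print(standard_criterion_satisfied(0.99, 25.0, 8))
```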

2.2. Revisiting the criterion


The first hypothesis requires that the component of the GW signal orthogonal to the
template family h is null, see Eq. 12, i.e., there exists some set of parameters for
which our template can perfectly reproduce the signal. However, this is usually not
the case. For instance, when recovering NR injections with SEOBNRv5HM templates
including all harmonics in Toubiana et al. (2023), our maximum log-likelihoods were
always significantly below 0 (see Fig. 19 of that paper), and since they scale with
SNR², this becomes even more relevant as the SNR increases. This problem can
be corrected by adding the actual maximum log-likelihood. We will discuss ways to
estimate this in section 4.

Figure 1. Fraction of the probability weight of an np-dimensional Gaussian distribution
for the part of the parameter space above a given iso-likelihood slice as a function of np. On
the one hand, taking the mean value of the log-likelihood, this fraction varies substantially
with np. On the other hand, the expression defined in Eq. 16 is very stable and gives a
good approximation to the 90% quantile.

The second hypothesis is motivated by the 1σ rule for Gaussian distributions.
Using the fact that 2(ln L̂ − ln L) is distributed as a χ²(np) distribution [footnote 2], we can
reinterpret the second hypothesis as requiring that the log-likelihood at the true
point is larger than the mean log-likelihood. However, as shown in Fig. 1, for an
np-dimensional Gaussian distribution, the fraction of the probability weight contained
in the region with log-probability larger than the mean log-probability depends
significantly on np. It starts from 0.68 for np = 1, as expected for the 1σ contour
of a one-dimensional Gaussian, and tends slowly to 0.5 [footnote 3]. Better motivated lower
limits for the log-likelihood at the true point are the quantiles of the log-likelihood
distribution, as also proposed in Baird et al. (2013). Schematically, we want to take
an iso-likelihood slice of the np-dimensional parameter space such that a specified
fraction, f, of the probability weight is contained above that slice. If the true point
lies above that slice, it is contained in the np-dimensional 100f% confidence region.
Denoting by Qf(np) the 100f% quantile of the χ²(np) distribution, the revisited
criterion then reads:

$$ \ln L(\theta_0) \geq \ln \hat{L} - \frac{Q_f(n_p)}{2}. \tag{15} $$

[Footnote 2] This can readily be seen by transforming Eq. 11 into the basis of eigenvectors
of the total covariance matrix, which then clearly shows that 2(ln L̂ − ln L) is the sum of
the squares of np normalised Gaussian random variables, and so follows a χ²(np) distribution.

[Footnote 3] The 0.5 limit comes from the fact that as np → ∞, the χ² distribution
approaches a Gaussian distribution with mean np and variance 2np. However, this
convergence is very slow.
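The quantities discussed above can be checked numerically. The sketch below evaluates the χ²(np) cumulative distribution with the standard series for the regularised lower incomplete gamma function, so that it needs only the Python standard library, and uses bisection for the quantile Qf(np). The function names are hypothetical; a statistics library would normally be used instead.

```python
import math


def chi2_cdf(x, n_p):
    """CDF of chi^2(n_p), via the series for the regularised lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * n_p, 0.5 * x
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(f, n_p):
    """Q_f(n_p): value below which a fraction f of the chi^2(n_p) weight lies."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n_p) < f:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n_p) < f:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weight_above_mean(n_p):
    """Fraction of weight with log-likelihood above its mean, i.e. P(chi^2 <= n_p)."""
    return chi2_cdf(n_p, n_p)


for n_p in (1, 2, 8, 15):
    print(n_p, weight_above_mean(n_p), chi2_quantile(0.9, n_p))
```

The first column of numbers illustrates how the weight enclosed by the mean log-likelihood drifts with np, while the second gives the threshold 2(ln L̂ − ln L(θ0)) ≤ Q0.9(np) used in Eq. 15 for f = 0.9.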
To obtain a handy expression, we fix f = 0.9 and use the Wilson–Hilferty transformation
(Wilson & Hilferty 1931) [footnote 4], and approximate ln L0.9 as:

$$ \ln L_{0.9} = \ln \hat{L} - \frac{n_p \left( 1 - \frac{2}{9 n_p} + 1.3 \sqrt{\frac{2}{9 n_p}} \right)^3}{2}. \tag{16} $$
The 1.3 factor is ad-hoc such that this approximates the 90% quantile of the
log-likelihood distribution, as can be seen in Fig. 1. Finally, defining the total fitting
factor FFT as the maximised overlap over θ, so that:

$$ \ln \hat{L} = -\mathrm{SNR}_T^2 \left(1 - \mathrm{FF}_T(s_0, h)\right), \tag{17} $$

we can write the revised indistinguishability criterion as:

$$ 1 - O_T(s_0, h(\theta_0)) \leq \frac{n_p \left( 1 - \frac{2}{9 n_p} + 1.3 \sqrt{\frac{2}{9 n_p}} \right)^3}{2\,\mathrm{SNR}_T^2} + 1 - \mathrm{FF}_T(s_0, h). \tag{18} $$
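The revised criterion of Eq. 18 can be evaluated in the same way as the standard one. The sketch below implements the Wilson–Hilferty expression with the 1.3 factor of Eq. 16 and inverts Eq. 18 for the SNR above which a bias would be flagged. Function names and input values are hypothetical.

```python
import math


def q90_wilson_hilferty(n_p):
    """Approximate 90% quantile of chi^2(n_p), the numerator of Eqs. 16 and 18."""
    c = 2.0 / (9.0 * n_p)
    return n_p * (1.0 - c + 1.3 * math.sqrt(c)) ** 3


def revised_criterion_satisfied(overlap_true, fitting_factor, snr_total, n_p):
    """Eq. 18, with the overlap at the true point and the total fitting factor."""
    lhs = 1.0 - overlap_true
    rhs = q90_wilson_hilferty(n_p) / (2.0 * snr_total ** 2) + 1.0 - fitting_factor
    return lhs <= rhs


def revised_max_snr(overlap_true, fitting_factor, n_p):
    """Largest SNR_T for which Eq. 18 holds; infinite if the true point is the best fit."""
    gap = fitting_factor - overlap_true
    if gap <= 0.0:
        return math.inf
    return math.sqrt(q90_wilson_hilferty(n_p) / (2.0 * gap))


# Hypothetical example: 8 parameters, overlap 0.99 at the true point, fitting factor 0.995.
print(q90_wilson_hilferty(8))
print(revised_max_snr(0.99, 0.995, 8))
```

With these made-up numbers the standard criterion of Eq. 14 would already be violated at SNR 20, whereas the revised one holds up to a noticeably higher SNR, because only the gap between the fitting factor and the overlap at the true point enters.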
In Fig. 2, we illustrate why the standard indistinguishability criterion could erroneously
indicate that we expect a bias, when our revisited criterion would not. The
standard criterion makes reference to the idealised likelihood distribution of the true
model (where ln L̂ = 0), while the new criterion makes reference to the measured
likelihood distribution. Moreover, the former requires that the log-likelihood at the
true point is larger than the mean log-likelihood, which, in addition to not being
statistically sound, tends to be too stringent. This illustration shows a hypothetical
scenario in which the value of the log-likelihood at the true point lies well within
the bulk of the distribution, above the 90% quantile, indicating that the true point
should lie in the 90% confidence region, while the whole support of the log-likelihood
distribution lies well below the idealised likelihood distribution referenced by the
standard criterion. Accounting for the corrections we propose is expected to decrease
the discrepancies between the prediction of the indistinguishability criterion and full
parameter estimation found in other studies, including Pürrer & Haster (2020) [footnote 5].
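The mock case of Fig. 2 (np = 8, ln L(θ0) = −26, ln L̂ = −20) can be reproduced with a few lines. The sketch below compares the lower limit on ln L(θ0) implied by the standard criterion, which assumes ln L̂ = 0 and uses the mean log-likelihood, with the one implied by Eq. 16. Function names are hypothetical.

```python
import math


def standard_threshold(n_p):
    """Lower limit on ln L(theta0) in the standard criterion (assumes ln L_hat = 0)."""
    return -0.5 * n_p


def revised_threshold(ln_l_hat, n_p):
    """Lower limit on ln L(theta0) from Eq. 16 (approximate 90% quantile)."""
    c = 2.0 / (9.0 * n_p)
    return ln_l_hat - 0.5 * n_p * (1.0 - c + 1.3 * math.sqrt(c)) ** 3


# Values quoted in the caption of Fig. 2.
n_p, ln_l_true, ln_l_hat = 8, -26.0, -20.0

bias_flagged_standard = ln_l_true < standard_threshold(n_p)
bias_flagged_revised = ln_l_true < revised_threshold(ln_l_hat, n_p)
print(standard_threshold(n_p), revised_threshold(ln_l_hat, n_p))
print(bias_flagged_standard, bias_flagged_revised)
```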
2.3. Looking at a subset of parameters
In the expression we have derived, np is the number of free parameters, i.e., the
number of parameters that are varied when performing MCMC, and our criterion
informs us if the true point lies inside the np-dimensional 90% confidence region.
Often, we are interested in estimating if biases are expected in a specific subset of
parameters, e.g., the intrinsic parameters. We can denote θ = (θ1, θ2), where θ1 are
the parameters we are interested in and θ2 those in which we are not.

[Footnote 4] If X follows a χ²(np) distribution, then $\sqrt[3]{X/n_p}$ converges to a normal
distribution with mean $1 - \frac{2}{9 n_p}$ and variance $\frac{2}{9 n_p}$. Remarkably, this
convergence is much faster than the one of χ² to a normal distribution.

[Footnote 5] In that work the authors let the factor in the numerator of the right-hand side
of Eq. 14 be free instead of setting it to np, and estimate it by setting an equality between
the left and right-hand sides of Eq. 14 at the SNR where they find, from full parameter
estimation, systematic biases to equal statistical errors. In one case, they find this factor
to be as large as 10^4 for an SNR of 250, which could be obtained for a fitting factor of 0.92.
This is plausible considering that they analyse numerical relativity waveforms with full
harmonic content using templates with the (2,2) harmonic only.

Figure 2. Probability density function of log-likelihood values in a mock case with np = 8
parameters, ln L(θ0) = −26 and ln L̂ = −20. The dotted curve assumes ln L̂ = 0,
as in the standard version of the indistinguishability criterion. The vertical dotted lines
show the mean log-likelihood value, for both the true and the wrong value of L̂. In the
standard formulation of the indistinguishability criterion, it is the lower limit for the
log-likelihood at the true point to decide on the presence of biases. The dashed line shows
the 90% quantile of the true log-likelihood distribution. This plot illustrates why, using
the standard indistinguishability criterion, one would erroneously conclude that biases are
expected, whereas our revisited criterion (Eq. 18) corrects for this.

2.3.1. Maximising approach


A common approach is to maximise the overlap at the subset θ01 of the true point
over the parameters θ2 . We denote by θ̌2 the point that maximises OT (s0 , h(θ01 , θ2 )).
Through Eq. 8, this can be interpreted as maximising the likelihood over θ2 while
fixing θ1 to θ01 . With this procedure, we define a posterior distribution on θ1 that
corresponds to a slice of the full posterior on the θ2 = θ̌2 hypersurface:

$$ p(\theta^1, \theta^2 = \check\theta^2 | d_1,\dots,d_{n_d}) = \frac{p(\theta^1, \check\theta^2 | d_1,\dots,d_{n_d})}{\int p(\theta^1, \check\theta^2 | d_1,\dots,d_{n_d})\, d\theta^1}. \tag{19} $$

It can be rewritten as:

$$ p(\theta^1, \theta^2 = \check\theta^2 | d_1,\dots,d_{n_d}) = N\, L(\theta^1, \check\theta^2), \tag{20} $$

where N is a normalisation constant. If L(θ1, θ2) follows the Gaussian approximation,
L(θ1, θ̌2) also does, and we can follow the same steps as before. We obtain the
criterion:

$$ 1 - O_T(s_0, h(\theta_0^1, \check\theta^2)) \leq \frac{n_p^1 \left( 1 - \frac{2}{9 n_p^1} + 1.3 \sqrt{\frac{2}{9 n_p^1}} \right)^3}{2\,\mathrm{SNR}_T^2} + 1 - \mathrm{FF}_T(s_0, h(\check\theta^2)), \tag{21} $$

where n1p is the number of parameters in the subset θ1, and FFT(s0, h(θ̌2)) is the
total fitting factor under the constraint θ2 = θ̌2 . The quantity OT (s0 , h(θ01 , θ̌2 )) is
often called the faithfulness of the template. However, requiring θ01 to be in the 90%
confidence region of the distribution defined by fixing the parameters θ2 to the value
θ̌2 in the posterior is not a well-motivated requirement. In particular, it is not the
same as performing parameter estimation for all parameters θ and looking at the
marginal posterior on θ1 . Within the Gaussian approximation, this is true only if the
sets of parameters θ1 and θ2 are uncorrelated, but in this case any value of θ2 could
be used to evaluate the likelihood.
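The difference between a slice and a marginal of a Gaussian posterior can be made concrete with a two-parameter toy case. The sketch below assumes a Gaussian likelihood with a hypothetical 2×2 Fisher matrix (entries f11, f12, f22, with θ1 the parameter of interest) and compares the width of the posterior on θ1 when θ2 is held fixed with its width after marginalising over θ2.

```python
import math


def slice_and_marginal_sigma(f11, f12, f22):
    """Standard deviation of theta1 on a fixed-theta2 slice and after marginalisation."""
    det = f11 * f22 - f12 ** 2
    sigma_slice = 1.0 / math.sqrt(f11)      # theta2 fixed: inverse of the (1,1) Fisher entry
    sigma_marginal = math.sqrt(f22 / det)   # theta2 marginalised: (1,1) entry of the inverse
    return sigma_slice, sigma_marginal


# Hypothetical Fisher matrices: one with correlated parameters, one without.
print(slice_and_marginal_sigma(4.0, 3.0, 4.0))
print(slice_and_marginal_sigma(4.0, 0.0, 4.0))
```

When the off-diagonal entry is non-zero the slice is narrower than the marginal, so a confidence region built from the slice is not the one that full parameter estimation would report for θ1.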
2.3.2. Marginalising approach
From the Bayesian perspective, it is better motivated to look at the marginalised
posterior on θ1:

$$ p(\theta^1 | d_1,\dots,d_{n_d}) = \frac{\int L(\theta^1, \theta^2)\, p(\theta^1, \theta^2)\, d\theta^2}{\int L(\theta^1, \theta^2)\, p(\theta^1, \theta^2)\, d\theta^2\, d\theta^1}. \tag{22} $$

In the case of flat priors, the prior term is constant, and the posterior on θ1 can be
written as:

$$ p(\theta^1 | d_1,\dots,d_{n_d}) = N\, L_{\theta^2}(\theta^1), \tag{23} $$

where we have introduced Lθ2 as the average likelihood over the prior range on θ2.
If the Gaussian approximation applies to the full likelihood, it still applies to the
marginalised one. Thus, following the same reasoning as we did before, we obtain the
indistinguishability criterion for θ1:

$$ \ln L_{\theta^2}(\theta_0^1) \geq \ln \hat{L}_{\theta^2} - \frac{Q_f(n_p^1)}{2}. \tag{24} $$

For the sake of comparison, we introduce the “averaged” overlap through:

$$ \ln L_{\theta^2}(\theta^1) = -\mathrm{SNR}_T^2 \left(1 - O_{T,\theta^2}(\theta^1)\right), \tag{25} $$

and define the “averaged” fitting factor FFT,θ2 as the maximum of the “averaged”
overlap over θ1. The criterion then reads:

$$ 1 - O_{T,\theta^2}(\theta_0^1) \leq \frac{n_p^1 \left( 1 - \frac{2}{9 n_p^1} + 1.3 \sqrt{\frac{2}{9 n_p^1}} \right)^3}{2\,\mathrm{SNR}_T^2} + 1 - \mathrm{FF}_{T,\theta^2}. \tag{26} $$
Unfortunately, because of the averaging over θ2 , this version of the indistinguishability
criterion is not very convenient to work with. We now propose a different criterion
to address the presence of biases within a subset of parameters.

3. MODEL SELECTION
In Toubiana et al. (2023), we introduced the Akaike information criterion (AIC)
(Akaike 1974) of a model, defined as:

$$ \mathrm{AIC} = k\, n_p - 2 \ln \hat{L}, \tag{27} $$

where k is a constant that was originally taken to be k = 2, although other choices are
occasionally made in the statistics literature. The Bayes' factor between two models
can be approximated by:

$$ \ln B = -\frac{1}{2} \left( \mathrm{AIC}_1 - \mathrm{AIC}_2 \right). \tag{28} $$
This can be justified by considering the case of an np-dimensional Gaussian likelihood
with covariance eigenvalues σi², and taking a flat prior that extends a width 2∆i in each
eigenvector direction, θ⃗i. The log-evidence for this model is:
$\ln \prod_i^{n_p} \frac{\sqrt{2\pi}\,\sigma_i}{2\Delta_i} + \ln \hat{L}$.
Eq. 28 is recovered by choosing $\Delta_i = \frac{e\sqrt{2\pi}}{2}\,\sigma_i \simeq 3.4\,\sigma_i$. In this way, the chosen prior
domain contains $0.9993^{n_p}$ of the likelihood probability weight, which is usually enough
for the values of np we are typically interested in. For spinning binary GW sources,
np can be as large as 15, in which case this probability weight is ∼ 99%. Increasing
∆i to 4.07σi (which would mean that the prior now contains 99.93% of the likelihood
probability weight when np = 15) would correspond to setting k = 2.36 in the AIC.
These choices of ∆i are somewhat ad-hoc, but the procedure can be interpreted as
trying to maximize the evidence for a model with flat priors, without excluding any
region of the parameter space consistent with the observed data. It is interesting
that the AIC can be related to the evidence in this way, since the AIC is derived
using information theory, by minimizing the information loss relative to the true
data-generating process by considering the Kullback-Leibler divergence between different
models. Moreover, note that the AIC is a measure of how well a model fits the data:
the smaller it is, the better the fit. In this sense, interpreting the difference in AICs
as a log-Bayes' factor through Eq. 28 is a way of introducing a scale to quantify
how much a model is preferred to another.
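The numbers quoted in this argument follow from requiring that each eigen-direction contributes −k/2 to the log-evidence, i.e. ln[√(2π) σi / (2∆i)] = −k/2. The sketch below solves this for ∆i/σi, inverts it for k, and computes the fraction of a Gaussian contained within ±∆i in all np directions. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def prior_halfwidth_for_k(k):
    """Delta/sigma such that ln(sqrt(2 pi) sigma / (2 Delta)) = -k/2."""
    return 0.5 * math.sqrt(2.0 * math.pi) * math.exp(0.5 * k)


def k_for_prior_halfwidth(delta_over_sigma):
    """AIC constant k corresponding to a prior half-width Delta = delta_over_sigma * sigma."""
    return 2.0 * math.log(2.0 * delta_over_sigma / math.sqrt(2.0 * math.pi))


def enclosed_weight(delta_over_sigma, n_p):
    """Fraction of an n_p-dimensional Gaussian within +/- Delta along every eigen-direction."""
    one_dim = 2.0 * NormalDist().cdf(delta_over_sigma) - 1.0
    return one_dim ** n_p


width_k2 = prior_halfwidth_for_k(2.0)
print(width_k2, enclosed_weight(width_k2, 1), enclosed_weight(width_k2, 15))
print(k_for_prior_halfwidth(4.07), enclosed_weight(4.07, 15))
```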
Now, we discuss how this approach can be applied to assess if biases are expected.
From the Bayesian perspective, we expect not to have biases in a subset of param-
eters θ1 , if the model where those parameters are fixed to their true value θ01 is not
disfavoured over the model where we vary them. This is the same reasoning we used
in Toubiana et al. (2023) to estimate the SNR at which we would expect apparent
deviations from GR to arise due to systematic effects. The log-Bayes’ factor between
the two models reads:

$$ \ln B = \ln \hat{L}(\theta^1 = \theta_0^1) - \ln \hat{L} + n_p^1. \tag{29} $$

In that equation, ln L̂(θ1 = θ01) is the maximum log-likelihood when fixing θ1 to θ01
and letting only θ2 vary, and ln L̂ is the maximum log-likelihood when varying all the
parameters. Using Eq. 8, the log-Bayes' factor can be written:

$$ \ln B = \left[ O_T(s_0, h(\theta_0^1, \check\theta^2)) - \mathrm{FF}_T(s_0, h) \right] \mathrm{SNR}_T^2 + n_p^1, \tag{30} $$

where, as above, θ̌2 is such that the total overlap at θ01 is maximised.
The value of ln B from which we expect to observe biases is not unequivocally
defined. With the convention of Eq. 29, a bias corresponds to the model with free θ1
being favoured, i.e., to negative ln B. Adopting the Kass-Raftery scale (Kass & Raftery
1995), we expect to have strong evidence for a bias if |ln B| is in the range 3-5, and
very strong evidence for |ln B| > 5. In Toubiana et al. (2023), we found these scales
to be in good agreement with our parameter estimation runs.
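Eq. 30 can be evaluated and inverted for the SNR at which a chosen threshold is reached. In the sketch below, B compares the model with θ1 fixed to its true value to the model with θ1 free, as in Eq. 29, so ln B decreases as the SNR grows whenever the fitting factor exceeds the overlap at θ01. Reading the Kass-Raftery thresholds as ln B = −3 and ln B = −5 is an assumption made here about the sign convention, and the function names and input values are hypothetical.

```python
import math


def log_bayes_factor(overlap_fixed, fitting_factor, snr_total, n1_p):
    """Eq. 30: ln B of the fixed-theta1 model relative to the free model."""
    return (overlap_fixed - fitting_factor) * snr_total ** 2 + n1_p


def snr_at_log_bayes(overlap_fixed, fitting_factor, n1_p, ln_b_threshold=-3.0):
    """SNR_T at which ln B drops to ln_b_threshold (assumed negative for a bias)."""
    gap = fitting_factor - overlap_fixed
    if gap <= 0.0:
        return math.inf
    return math.sqrt((n1_p - ln_b_threshold) / gap)


# Hypothetical example: 4 parameters of interest, overlap 0.99, fitting factor 0.995.
print(log_bayes_factor(0.99, 0.995, 20.0, 4))
print(snr_at_log_bayes(0.99, 0.995, 4, -3.0), snr_at_log_bayes(0.99, 0.995, 4, -5.0))
```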

4. PRACTICAL CONSIDERATIONS
We are often interested in determining the SNR threshold at which we expect biases
to begin to appear. From Eq. 5, when varying the SNR of the source just
by varying the distance of the signal s0, the log-likelihood can be readily rescaled:
since only the (s0,i|s0,i) term depends on the distance, each term of the sum scales
as SNRi², and each of these scales in the same way with distance. Moreover, the
parameters that maximise the log-likelihood do not change, apart from the distance,
which scales in the same way as the distance of s0. Thanks to this observation, the
Bayes' factor-based criterion, as well as the full and maximised versions of the
revisited indistinguishability criterion (given by Eqs. 15, 18 and 21, not the one where
we perform marginalisation over a subset of parameters presented in Sec. 2.3.2) can
be easily evaluated at different SNRs, using minimisation routines or
Markov-Chain-Monte-Carlo (MCMC) (as in Toubiana et al. (2023)) to estimate θ̂2 and ln L̂ for a
given SNR and applying the proper rescaling. Alternatively, using the linear signal
approximation, we can compute the Fisher matrix (see Eq. 13), assuming that θ̂ ∼ θ0
(the difference in the Fisher matrix at those points should yield higher order
corrections to the linear signal approximation), use it to draw samples of θ and compute the
log-likelihood at those points. A drawback of this method and of the MCMC-based
one is that the maximum log-likelihood lies in the higher-end tail of the log-likelihood
distribution, which might not be well sampled (recall that errors in the estimation of
the log-likelihood will scale with SNR² in the criteria). We can once again exploit the
fact that 2(ln L̂ − ln L) follows a χ²(np) distribution and estimate the maximum
log-likelihood from the mean, ⟨ln L⟩, whose estimate through sampling is more stable,
using the relation:

$$ \ln \hat{L} = \langle \ln L \rangle + \frac{n_p}{2}. \tag{31} $$
The fact that the marginalised version of the revisited indistinguishability criterion
(Eq. 24) does not follow the SNR scaling makes it even more inconvenient to work
with. Thus, we believe the Bayes’ factor-based criterion we propose offers a powerful
alternative for the study of systematic effects.
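The estimate of Eq. 31 can be illustrated with mock posterior samples. The sketch below assumes the Gaussian approximation holds exactly, so that each sampled log-likelihood is ln L̂ − χ²/2 with χ² drawn from a χ²(np) distribution, and compares the mean-based estimate of ln L̂ with the largest sampled value. The function names, the seed and the chosen values are hypothetical.

```python
import random


def draw_log_likelihoods(ln_l_hat, n_p, n_samples, rng):
    """Mock posterior samples of ln L under the Gaussian approximation."""
    samples = []
    for _ in range(n_samples):
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_p))
        samples.append(ln_l_hat - 0.5 * chi2)
    return samples


def max_from_mean(samples, n_p):
    """Eq. 31: ln L_hat = <ln L> + n_p / 2."""
    return sum(samples) / len(samples) + 0.5 * n_p


rng = random.Random(0)
true_ln_l_hat, n_p = -20.0, 8
samples = draw_log_likelihoods(true_ln_l_hat, n_p, 2000, rng)

print(max_from_mean(samples, n_p), max(samples))
```

The largest sampled value can only underestimate ln L̂, since samples rarely land exactly on the peak in several dimensions, whereas the mean-based estimate scatters around the true value with an error that shrinks as the number of samples grows.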

5. ACKNOWLEDGEMENTS
We are thankful to Lorenzo Pompili and Alessandra Buonanno for fruitful discussions
that led to the elaboration of these notes.

REFERENCES
Akaike, H. 1974, IEEE Transactions on Automatic Control, 19, 716,
doi: 10.1109/TAC.1974.1100705
Amaro-Seoane, P., et al. 2017. https://arxiv.org/abs/1702.00786
Baird, E., Fairhurst, S., Hannam, M., & Murphy, P. 2013, Phys. Rev. D, 87, 024035,
doi: 10.1103/PhysRevD.87.024035
Chatziioannou, K., Klein, A., Yunes, N., & Cornish, N. 2017, Phys. Rev. D, 95, 104004,
doi: 10.1103/PhysRevD.95.104004
Finn, L. S. 1992, Phys. Rev. D, 46, 5236, doi: 10.1103/PhysRevD.46.5236
Flanagan, E. E., & Hughes, S. A. 1998, Phys. Rev. D, 57, 4566,
doi: 10.1103/PhysRevD.57.4566
Kass, R. E., & Raftery, A. E. 1995, Journal of the American Statistical Association, 90, 773,
doi: 10.1080/01621459.1995.10476572
Lindblom, L., Owen, B. J., & Brown, D. A. 2008, Phys. Rev. D, 78, 124020,
doi: 10.1103/PhysRevD.78.124020
McWilliams, S. T., Kelly, B. J., & Baker, J. G. 2010, Phys. Rev. D, 82, 024014,
doi: 10.1103/PhysRevD.82.024014
Pürrer, M., & Haster, C.-J. 2020, Phys. Rev. Res., 2, 023151,
doi: 10.1103/PhysRevResearch.2.023151
Tinto, M., & Dhurandhar, S. V. 2005, Living Rev. Rel., 8, 4, doi: 10.12942/lrr-2005-4
Toubiana, A., Pompili, L., Buonanno, A., Gair, J. R., & Katz, M. L. 2023.
https://arxiv.org/abs/2307.15086
Wilson, E. B., & Hilferty, M. M. 1931, Proceedings of the National Academy of Sciences
of the United States of America, 17, 684. http://www.jstor.org/stable/86022
