Download as pdf or txt
Download as pdf or txt
You are on page 1of 24

Teaching to extract spectral densities from lattice correlators

to a broad audience of learning-machines


Michele Buzzicotti,1, ∗ Alessandro De Santis,1, † and Nazario Tantalo1, ‡
1
University and INFN of Roma Tor Vergata, Via della Ricerca Scientifica 1, I-00133, Rome, Italy
(Dated: July 4, 2023)
We present a new supervised deep-learning approach to the problem of the extraction of smeared
spectral densities from Euclidean lattice correlators. A distinctive feature of our method is a model-
independent training strategy that we implement by parametrizing the training sets over a functional
space spanned by Chebyshev polynomials. The other distinctive feature is a reliable estimate of the
systematic uncertainties that we achieve by introducing several ensembles of machines, the broad
audience of the title. By training an ensemble of machines with the same number of neurons over
training sets of fixed dimensions and complexity, we manage to provide a reliable estimate of the
systematic errors by studying numerically the asymptotic limits of infinitely large networks and
training sets. The method has been validated on a very large set of random mock data and also
arXiv:2307.00808v1 [hep-lat] 3 Jul 2023

in the case of lattice QCD data. We extracted the strange-strange connected contribution to the
smeared R-ratio from a lattice QCD correlator produced by the ETM Collaboration and compared
the results of the new method with the ones previously obtained with the HLT method by finding
a remarkably good agreement between the two totally unrelated approaches.

I. INTRODUCTION Nb = 4 Nb = 6 Nb = ∞ Nb = ∞
Nρ = 12 Nρ = ∞ Nρ = 30 Nρ = ∞
The problem of the extraction of hadronic spectral den-
sities from Euclidean correlators, computed from numer-
ical lattice QCD simulations, has attracted a lot of at-
tention since many years (see Refs. [1–30], the works on
the subject of which we are aware of, and Refs. [31, 32]
for recent reviews). At zero temperature, the theoretical
and phenomenological importance of hadronic spectral
FIG. 1. By introducing a discrete functional-basis, with
densities, strongly emphasized in the context of lattice elements Bn (E), that is dense in the space of square-
field theory in Refs. [1, 11, 13, 14, 17–19], is associated integrable functions f (E) in the interval [E0 , ∞) with E0 > 0,
with the fact that from their knowledge it is possible to any
P∞ such function can exactly be represented as f (E) =
extract all the information needed to study the scattering n=0 cn Bn (E). With an infinite number of basis functions
of hadrons and, more generally, their interactions. (Nb = ∞) and by randomly selecting an infinite number
From the mathematical perspective, the problem of the (Nρ = ∞) of coefficient vectors c = (c0 , · · · , cNb ), one can
extraction of spectral densities from lattice correlators is get any possible spectral density. This is the situation repre-
equivalent to that of an inverse Laplace-transform oper- sented by the filled blue disk. If the number of basis functions
ation, to be performed numerically by starting from a Nb and the number of randomly extracted spectral densities
Nρ are both finite one has a training set that is finite and
discrete and finite set of noisy input data. This is a no-
that also depends on Nb . This is the situation represented in
toriously ill-posed numerical problem that, in the case the first disk on the left. The other two disks schematically
of lattice field theory correlators, gets even more compli- represent the situations in which either Nb or Nρ is infinite.
cated because lattice simulations have necessarily to be
performed on finite volumes where the spectral densities
are badly-behaving distributions.
In Ref. [14], together with M. Hansen and A. Lupo, one
limit is mathematically well defined. The method of
of the authors of the present paper proposed a method to
Ref. [14] (HLT method in short) has been further re-
cope with the problem of the extraction of spectral den-
fined in Ref. [21] where it has been validated by perform-
sities from lattice correlators that allows to take into ac-
ing very stringent tests within the two-dimensional O(3)
count the fact that distributions have to be smeared with
non-linear σ-model.
sufficiently well-behaved test functions. Once smeared,
finite volume spectral densities become numerically man- In this paper we present a new method for the extrac-
ageable and the problem of taking their infinite volume tion of smeared spectral densities from lattice correlators
that is based on a supervised deep-learning approach.
The idea of using machine-learning techniques to ad-
dress the problem of the extraction of spectral densities
∗ michele.buzzicotti@roma2.infn.it from lattice correlators is certainly not original (see e.g.
† alessandro.desantis@roma2.infn.it Ref. [15, 16, 22–29]). The great potential of properly-
‡ nazario.tantalo@roma2.infn.it trained deep neural networks in addressing this problem
2

Tσ (Nb , Nρ )
1
Cc (t) N ρ̂pred (E, N , c, r)
SUPERVISED TRAINING
σ
TRAINED
Neural Networks 3
2
r=1

N

pred 2
C(t) ρ̂σ (E, N , r) − ρ̂true
σ (E)

3 ∆latt
σ (E, N ) ∆net
σ (E, N )
2
r=1

ρ̂pred
σ (E, N ) ± ∆stat
σ (E, N ) +

FIG. 2. We generate several training sets Tσ (Nb , Nρ ) built


c r
by considering randomly chosen spectral densities. These are 3
obtained by choosing Nρ random coefficients vectors with Nb
entries, see FIG. 1. For each spectral density ρ(E) we build
the associated correlator C(t) and smeared spectral density + +
ρ̂σ (E), where σ is the smearing parameter. We then dis-
tort the correlator C(t), by using the information provided 0

∞ 0
Nn
∞ 0
Nb

by the statistical variance of the lattice correlator (the one
we are going to analyse at the end of the trainings), and ob-
tain the input-output pair (Cnoisy (t), ρ̂σ (E)) that we add to
1 then implement different pred ρ̂pred (E) ± ∆tot
σ (E)
Tσ (NCb ,(t)
Nρ ). We N neural networks
ρ̂σ (E, N , c, r)
σ
c
with Nn neurons and Neural atTRAINED
fixed
Networks
N = (N n , Nb , Nρ ) we intro-
3
duce an ensemble of machines r=1
with
2 Nr replicas. Each ma- FIG. 3. Flowchart illustrating the three-step procedure that,
chine r = 1, · · · , Nr belonging to the ensemble has the same after the training, we use to extract the final result. Here
architecture and, before the training, differs from the other C(t) represents the input lattice correlator
1 that, coming from
replicas for the initialization parameters. All the replicas are2 a Monte Carlo simulation, is affected by statistical noise.
net (E, N )
then trained over Tσ (Nb , Nρ ) and,∆latt at (E,
σ theNend
) ∆
ofσ the training We call Cc (t) the c-th bootstrap sample (or jackknife bin)
process, each replica will give a different answer depending of the lattice correlator with c = 1, · · · , Nc . In the first
upon Nρ̂.pred
σ (E, N ) ± ∆stat (E, N ) + step, Cc (t) is fed to all the trained neural networks belonging
σ
to the ensemble at fixed N and the corresponding answers
c r ρ̂pred
σ (E, N , c, r) are collected. In the second step, by com-
3 bining in quadrature the widths of the distributions of the
is pretty evident from the previous works on the subject. answers as a function of the index c (∆lattσ (E, N )) and of the

These findings strongly motivated us to develop an ap- index r (∆net stat


σ (E, N )) we estimate the error ∆σ (E, N )) at
+ fixed N . At the end of this step we are left with a collection
proach that can be+used to obtain trustworthy theoretical
of results ρ̂pred
σ (E, N ) ± ∆stat
σ (E, N ). In the third and last
predictions.
0 To this
∞ end
0 we had
∞ to address
0 the
∞ following step, the limits N 7→ ∞ are studied numerically and an un-
two pivotalNquestions
ρ Nn Nb
biased estimate of ρ̂pred
σ (E) and of its error ∆tot
σ (E), with the
latter also taking into account the unavoidable systematics
1. is it possible to devise a model independent training associated with these limits, is finally obtained.
strategy? ρ̂pred
σ (E) ± ∆tot
σ (E)

2. if such a strategy is found, is it then possible to


quantify reliably, together with the statistical er- 1. the introduction of a functional-basis to
rors, also the unavoidable systematic
1 uncertainties? parametrize the correlators and the smeared
spectral densities of the training sets in a model
The importance of the first question can hardly be un- independent way;
derestimated. Under the working assumption, supported
by the so-called universal reconstruction theorems (see 2. the introduction of the ensemble of machines, the
Refs. [33–35]), that a sufficiently large neural network broad audience mentioned in the title, to estimate
can perform any task, limiting either the size of the net- the systematic errors1 .
work or the information to which it is exposed during A bird-eye view of the proposed strategy is given in
the training process means, in fact, limiting its ability FIG. 1, FIG. 2 and in FIG. 3. The method is validated
to solve the problem in full generality. Addressing the by using both mock and true lattice QCD data. In the
second question makes the difference between providing
a possibly efficient but qualitative solution to the prob-
lem and providing a scientific numerical tool to be used
in order to derive theoretical predictions for phenomeno- 1 The idea of using ensembles of machines has already been ex-
logical analyses. plored within the machine-learning literature in other contexts,
In order to address these two questions, the method see e.g. Ref. [36] where the so-called “ensembling” technique to
that we propose in this paper has been built on two pillars estimate the errors is discussed from a Bayesian perspective.
3

case of mock data the exact result is known and the val- (see e.g. Refs. [37, 38] for textbooks discussing the sub-
idation test is quite stringent. In the case of true lattice ject from a machine-learning perspective). The problem,
QCD data the results obtained with the new method are as we are now going to discuss, is particularly challenging
validated by comparison with the results obtained with from the numerical point of view and becomes even more
the HLT method. challenging and delicate in our Quantum Field Theory
The plan of the paper is as follows. In Section II we (QFT) context because, according to Wightman’s ax-
introduce and discuss the main aspects of the problem. ioms, QFT spectral densities live in the space of tem-
In Section III we illustrate the numerical setup used to pered distributions and, therefore, cannot in general be
obtain the results presented in the following sections. In considered smooth and well behaved functions.
Section IV we describe the construction of the training Before discussing though the aspects of the problem
sets and the proposed model-independent training strat- that are peculiar to our QFT context, it is instructive to
egy. In Section V we illustrate the procedure that we first review the general aspects of the numerical inverse
use to extract predictions from our ensembles of trained Laplace-transform problem that, in fact, is ill-posed in
machines and present the results of a large number of the sense of Hadamard. To this end, we start by consid-
validation tests performed with random mock data. Fur- ering the correlator in the infinite L and T limits,
ther validation tests, performed by using mock data gen- Z ∞
erated by starting from physically inspired models, are C(aτ ) = lim CT L (aτ ) = dE e−τ aE ρ(E) , (4)
L,T 7→∞ E0
presented in Section VI. In Section VII we present our
results in the case of lattice QCD data and the compar- and we assume that our knowledge of C(t) is limited to
ison with the HLT method. We draw our conclusions in the discrete and finite set of times t = aτ . Moreover
Section VIII and discuss some technical details in the two we assume that the input data are affected by numerical
appendixes. and/or statistical errors that we call ∆(aτ ).
The main point to be noticed is that, in general, the
spectral density ρ(E) contains an infinite amount of in-
II. THEORETICAL FRAMEWORK formation that cannot be extracted from the limited and
noisy information contained into the input data C(aτ ).
The problem that we want to solve is the extraction As a consequence, in any numerical approach to the ex-
of the infinite-volume spectral density ρ(E) from the Eu- traction of ρ(E) a discretization of Eq. (4) has to be
clidean correlators introduced. Once a strategy to discretize the problem
Z ∞ has been implemented, the resulting spectral density has
then to be interpreted as a “filtered” or (as is more natu-
CLT (t) = dE bT (t, E) ρLT (E) , (1)
E0 ral to call it in our context) smeared version of the exact
spectral density,
computed numerically in lattice simulations. These are Z ∞
performed by introducing a finite spatial volume L3 , a ρ̂(E) = dω K(E, ω) ρ(ω) , (5)
finite Euclidean temporal direction T and by discretizing E0
space-time, where K(E, ω) is the so-called smearing kernel, explic-
itly or implicitly introduced within the chosen numerical
T L approach.
x = a(τ, n) , 0 ≤ τ < NT = , 0 ≤ ni < NL = ,
a a There are two main strategies (and many variants of
(2) them) to discretize Eq. (4). The one that we will adopt in
this paper has been introduced and pioneered by Backus
where a is the so-called lattice spacing and (τ, n) are and Gilbert [39] and is based on the introduction of a
the integer space-time coordinates in units of a. In or- smearing kernel in the first place. We will call this the
der to obtain the infinite-volume spectral density one has “smearing” discretization approach. The other approach,
to perform different lattice simulations, by progressively that is more frequently used in the literature and that
increasing L and/or T , and then study numerically the we will therefore call the “standard” one, is built on
L 7→ ∞ and T 7→ ∞ limits. In the T 7→ ∞ limit the basis the assumption that spectral densities are smooth and
functions bT (t, E) become decaying exponentials and the well-behaved functions. Before discussing the smearing
correlator is given by approach we briefly review the standard one, by putting
Z ∞ the emphasis on the fact that also in this case a smearing
CL (t) = lim CT L (t) = dE e−tE ρL (E) , (3) kernel is implicitly introduced.
T 7→∞ E0 In the standard discretization approach the infinite-
volume correlator is approximated as a Riemann sum,
where ρL (E) = 0 for E < E0 and the problem is that of
performing a numerical inverse Laplace-transform oper- E −1
NX

ation, by starting from a discrete and finite set of noisy Ĉ(aτ ) = σ e−τ aEm ρ̂(Em ) , Em = E0 + mσ ,
input data. This is a classical numerical problem, arising m=0
in many research fields, and has been thoroughly studied (6)
4

under the assumption that the infinite-volume spectral 2.0

density is sufficiently regular to have En = 5.0


1.5
σ = 0.1
C(aτ ) − Ĉ(aτ ) ≪ ∆(aτ ) (7)

σ = 0.2

σK(En, ω)
1.0
σ = 0.4
for NE sufficiently large and σ sufficiently small. By 0.5
introducing the “veilbein matrix”
0.0
Êτ m ≡ e−τ aEm , (8)
−0.5
and the associated “metric matrix” in energy space,
h i −1.0
4.00 4.25 4.50 4.75 5.00 5.25 5.50 5.75 6.00
Ĝnm ≡ Ê T Ê ω
nm

1090 En = 5.0
T −1
NX −a(En +Em ) −T (En +Em )
−τ a(En +Em ) e −e 1079 σ = 0.1
= e = , σ = 0.2
τ =1
1− e−a(En +Em ) 10 68

|gτ (En)|
σ = 0.4
1057
(9)
1046
Eq. (6) is then solved, 1035

T −1
NX 1024

ρ̂(En ) = gτ (En ) Ĉ(aτ ) , 1013

τ =1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
τ
E −1
NX
1
gτ (En ) = Ĝ−1
nm Êτ m . (10)
σ m=0
FIG. 4. Top panel : Smearing kernel of Eq. 12 at En = 5
for three values of σ. The reconstruction is performed by
By using the previous expressions we can now explain setting E0 = 1 and Emax = 10 as UV cutoff in Eq. 5 and by
why the problem is particularly challenging and in which working in lattice units with a = 1. The kernel is smooth for
sense it is numerically ill-posed. ω ≥ En while it presents huge oscillations for ω < En . In the
On the numerical side, the metric matrix Ĝ is very case σ = 0.1 these oscillations range from −10126 to +10121 .
badly conditioned in the limit of large NE and small Bottom panel : Corresponding coefficients gτ (En ) (see Eq. 10)
σ. Consequently, the coefficients gτ (En ) become huge in absolute value for the first 20 discrete times. Extended-
precision arithmetic is mandatory to invert the matrix Ĝnm .
in magnitude and oscillating in sign in this limit and
This test, in which det(Ĝ) = O 10−15700 for σ = 0.1, has
even a tiny distortion of the input data gets enormously been performed by using 600-digits arithmetic
amplified,
T −1
NX
σ7→0
gτ (En ) ∆(aτ ) 7−→ ∞. (11) Consistency would require that K(En , ω) = δ(En − ω)
τ =1 but this cannot happen at finite T and/or NE . In fact
K(En , ω) can be considered a numerical approximation
In this sense the numerical solution becomes unstable
of δ(En −ω) that, as can easily be understood by noticing
and the problem ill-posed.
that
Another important observation concerning Eqs. (10),
usually left implicit, concerns the interpretation of ρ̂(En ) δnm
as a smeared spectral density. By introducing the smear- K(En , Em ) = , (14)
σ
ing kernel
has an intrinsic energy-resolution proportional to the dis-
T −1
NX
−aτ ω cretization interval σ of the Riemann’s sum. A numerical
K(En , ω) = gτ (En ) e , (12)
study of K(En , ω) at fixed En reveals that for ω ≥ En
τ =1
the kernel behaves smoothly while it oscillates wildly for
and by noticing that, as a matter of fact, the spectral ω < En and small values of σ. A numerical example is
density has to be obtained by using the correlator C(aτ ) provided in FIG. 4.
(and not its approximation Ĉ(aτ ), to be considered just Once the fact that a smearing operation is unavoid-
as a theoretical device introduced in order to formalize able in any numerical approach to the inverse Laplace-
the problem), we have transform problem has been recognized, the smearing
Z ∞ discretization approach that we are now going to dis-
ρ̂(En ) = dω K(En , ω) ρ(ω) . (13) cuss appears more natural. Indeed, the starting point of
E0 the smearing approach is precisely the introduction of a
5

smearing kernel and this allows to cope with the prob- and by considering the limits
lem also in the case, relevant for our QFT applications,
in which spectral densities are distributions. lim lim ρ̂LT (E) = ρ(E) , (22)
σ7→0 L,T 7→∞
By reversing the logic of the standard approach, that
led us to Eq. (12), in the smearing approach the prob- in the specified order. When σ is very small, the co-
lem is discretized by representing the possible smearing efficients gτ (E) determined by using Eq. (17) become
kernels as functionals of the coefficients gτ according to gigantic in magnitude and oscillating in sign, as in the
T −1
NX case of the coefficients obtained by using Eq (10). In fact
K(g, ω) = gτ bT (aτ, ω) . (15) the numerical problem is ill-posed independently of the
τ =1 approach used to discretize it.
We now come to the aspects of the problem that are
Notice that we are now considering the correlator peculiar to our QFT context. Hadronic spectral densities
CLT (aτ ) at finite T and L and this allows to analyse
the results of a single lattice simulation. The main ob- ρ(p1 , · · · pn−1 )
servation is that in the T 7→ ∞ limit any target kernel
Ktarget (E, ω) such that
Z ∞ = ⟨0|Ô1 (2π)3 δ 4 (P̂ − p1 )Ô2 · · · (2π)3 δ 4 (P̂ − pn−1 )Ôn |0⟩ ,
2
∥Ktarget (E)∥ =
2
dω |Ktarget (E, ω)| < ∞, (16) (23)
E0
are the Fourier transforms of Wightman’s functions in
can exactly be represented as a linear combination of Minkowski space,
the b∞ (aτ, ω) = exp(−aωτ ) basis functions (that are in-
deed dense in the functional-space L2 [E0 , ∞] of square- W (x1 , · · · xn−1 ) = ⟨0|Ô1 e−iP̂ ·x1 Ô2 · · · e−iP̂ ·xn−1 Ôn |0⟩ ,
integrable functions). With a finite number of lattice (24)
times, the smearing kernel is defined as the best possible
approximation of Ktarget (E, ω), i.e. the coefficients gτ (E)
where P̂ = (Ĥ, P̂ ) is the QCD four-momentum operator
are determined by the condition
and the Ôi ’s are generic hadronic operators. According to
2 Wightman’s axioms, vacuum expectation values of field
∂ ∥K(g) − Ktarget (E)∥
=0. (17) operators are tempered distributions. This implies that
∂gτ


gτ =gτ (E) also their Fourier transforms, the spectral densities, are
Once the coefficients are given, the smeared spectral den- tempered distributions in energy-momentum space. It
sity is readily computed by relying on the linearity of the is therefore impossible, in general, to compute numeri-
problem, cally a spectral density. On the other hand, it is always
possible to compute smeared spectral densities,
T −1
NX
ρ̂LT (E) = gτ (E) CLT (aτ ) Z (Y )
4
τ =1 ρ[K1 , · · · Kn−1 ] = d pi Ki (pi ) ρ(p1 , · · · pn−1 ) ,
i
Z ∞
(25)
= dω K(E, ω) ρLT (ω) , (18)
E0 where the kernels Ki (p) are Schwartz functions.
In this paper, to simplify the discussion and the no-
where now tation, we focus on the dependence upon a single space-
K(E, ω) ≡ K(g(E), ω) . (19) time variable,

In the smearing approach one has W (x) = ⟨0|ÔF e−iP̂ ·x ÔI |0⟩ , (26)
Z ∞
lim ρ̂LT (E) = dω Ktarget (E, ω) ρL (E) . (20) where the operators ÔI and ÔF might also depend upon
T 7→∞ E0
other coordinates. The spectral density associated with
Moreover, provided that ρ(E) exists, it can be obtained W (x) is given by
by choosing as the target kernel a smooth approximation
1
Z
to δ(E − ω) depending upon a resolution parameter σ, ρ(E) = d4 x eip·x W (x)
e.g. 2π
1 (E−ω)2
Ktarget (E, ω) = √ e− 2σ2 ,    
2πσ = ⟨0|ÔF δ Ĥ − E (2π)3 δ 3 P̂ − p ÔI |0⟩ , (27)

lim Ktarget (E, ω) = δ(E − ω) , (21) and, to further simplify the notation, we don’t show ex-
σ7→0 plicitly the dependence of ρ(E) w.r.t. the fixed spatial
6

momentum p. The very same spectral density appears III. NUMERICAL SETUP
in the two-point Euclidean correlator
Z In order to implement the strategy sketched in FIG. 1,
C(t) = 3
d xe −ip·x
⟨0|ÔF e −tĤ+iP̂ ·x
ÔI |0⟩ FIG. 2, FIG. 3 and discussed in details in the following
sections, the numerical setup needs to be specified. In
this section we describe the data layout that we used to
Z ∞ represent the input correlators and the output smeared
= dE e−tE ρ(E) (28) spectral densities and then provide the details of the ar-
E0 chitectures of the neural networks that we used in our
study.
for positive Euclidean time t. In the last line of the pre-
vious equation we have used the fact that, because of
the presence of the energy Dirac-delta function appear- A. Data layout
ing in Eq. (27), the spectral density vanishes for E < E0
where E0 is the smallest energy that a state propagating In this work we have considered both mock and real
between the two hadronic operators ÔI and ÔF can have. lattice QCD data and have chosen the algorithmic pa-
On a finite volume the spectrum of the Hamiltonian rameters in order to be able to extract the hadronic spec-
is discrete and, consequently, the finite-volume spectral tral density from the lattice QCD correlator described in
density ρL (E) becomes a particularly wild distribution, Section VII. Since in the case of the lattice QCD cor-
relator the basis function bT (t, E) (see Eq. (1)) is given
by
X
ρL (E) = wn (L) δ(En (L) − E) , (29)
n
ω 2 h −tω −(T −t)ω
i
bT (t, ω) = e + e , (31)
the sum of isolated Dirac-delta peaks in correspondence 12π 2
of the eigenvalues En (L) of the finite volume Hamilto- with T = 64a, we considered the same setup also in the
nian. By using the previous expression one has that case of mock data. These are built by starting from a
model unsmeared spectral density ρ(E) and by comput-
ing the associated correlator according to
X
CL (t) = wn (L) e−En (L)t , (30)
n Z ∞
ω 2 h −tω i
C(t) = dω 2
e + e−(T −t)ω ρ(ω) , (32)
and this explains why we have discussed the standard E0 12π
approach to the discretization of the inverse Laplace-
and the associated smeared spectral density according to
transform problem by first taking the infinite-volume
Z ∞
limit. Indeed if the Riemann’s discretization of Eq. (6)
is applied to CL (aτ ) and all the Em ’s are different from ρ̂σ (E) = dω Kσ (E, ω)ρ(ω) . (33)
E0
all the En (L)’s one gets ĈL (aτ ) = 0!
On the one hand, also in infinite volume, spectral den- Since in the case of the lattice QCD spectral density,
sities have to be handled with the care required to cope in order to compare the results obtained with the pro-
with tempered distributions and this excludes the option posed new method with the ones previously obtained in
of using the standard approach to discretize the problem Ref. [40], we considered a Gaussian smearing kernel, also
in the first place. On the other hand, in some specific in the case of mock data we made the choice
cases it might be conceivable to assume that the infinite 1 (E−ω)2
volume spectral density is sufficiently smooth to attempt Kσ (E, ω) = √ e− 2σ2 . (34)
2πσ
using the standard discretization approach. This requires
that the infinite-volume limit of the correlator has been We stress that there is no algorithmic reason behind the
numerically taken and that the associated systematic un- choice of the Gaussian as the smearing kernel and that
certainty has been properly quantified (a non-trivial nu- any other kernel can easily be implemented within the
merical task at small Euclidean times where the errors proposed strategy.
on the lattice correlators are tiny). Among the many computational paradigms available
The smearing discretization approach can be used ei- within the machine-learning framework, we opted for the
ther in finite- and infinite-volume and the associated sys- most direct one and represented both the correlator and
tematic errors can reliably be quantified by studying nu- the smeared spectral density as finite dimensional vec-
merically the limits of Eq. (22). For this reason, as in tors, that we used respectively as input and output of our
the case of the HLT method of Ref. [14], the target of neural networks. More precisely, the dimension of the in-
the machine-learning method that we are now going to put correlator vector has been fixed to NT = 64, coincid-
discuss in details is the extraction of smeared spectral ing with the available number of Euclidean lattice times
densities. in the case of the lattice QCD correlator. The inputs of
7

the neural networks are thus the 64-components vectors Type Maps Size Kernel size Stride Activation
C = {C(a), C(2a), · · · , C(64a)}. The output vectors are Input 64
instead given by ρ̂σ = {ρ̂σ (Emin ), · · · , ρ̂σ (Emax )}. As in Conv1D 2 32x2 3 2 LeakyReLu
Ref. [40], we have chosen to measure energies in units of Conv1D 4 16x4 3 2 LeakyReLu
the muon mass, mµ = 0.10566 GeV, and set Emin = mµ Conv1D 8 8x8 3 2 LeakyReLu
and Emax = 24mµ . The interval [Emin , Emax ] has been Flatten 384
discretized in steps of size mµ /2. With these choices Fully conn. 256 LeakyReLu
our output smeared spectral densities are the vectors ρ̂σ Fully conn. 256 LeakyReLu
Output 47
with NE = 47 elements corresponding to energies rang-
Parameters 94651
ing from about 100 MeV to 2.5 GeV.
The noise-to-signal ratio in (generic) lattice QCD cor-
relators increases exponentially at large Euclidean times. TABLE I. arcS : the smallest neural network architecture used
For this reason it might be numerically convenient to in this work. The architecture is of the type feed-forward and
choose NT < T /a and discard the correlator at large the structure can be read from top to bottom of the table.
times where the noise-to-signal ratio is bigger than one. It consist of three 1D convolutional layers with an increasing
According to our experience with the HLT method, that number of maps followed by two fully connected layers. The
two blocks are intermediated by one flatten layer. The column
inherits the original Backus-Gilbert regularization mech-
denoted by “Size” reports the shape of the signal produced by
anism, it is numerically inconvenient to discard part the corresponding layer. The stride of the filters is set to 2 in
of the information available on the correlator provided such a way that the dimension of the signal is halved at each
that the information contained in the noise is used to 1D convolutional layer thus favouring the neural network to
conveniently regularize the numerical problem. Also in learn a more abstract, and possibly more effective, represen-
this new method, as we are going to explain in subsec- tation of the input data. As activation functions we use the
tion IV B, we use the information contained in the noise LeakyReLu with negative slope coefficient set to −0.2. The
of the lattice correlator during the training process and neurons with activation functions are also provided with bi-
this, as shown in Appendix B, allows us to conveniently ases. The output is devoid of activation function in order not
use all the available information on the lattice correlator, to limit the output range. The bottom line reports the total
number of trainable parameters.
i.e. to set NT = T /a, in order to extract the smeared
spectral density.
We treat σ, the width of the smearing Gaussian, as a
fixed parameter by including in the corresponding train- and it is common practice to make the choice by taking
ing sets only spectral functions that are smeared with the into account the details of the problem at hand. For our
chosen value of σ. This is a rather demanding numerical analysis, after having performed a comparative study at
strategy because in order to change σ the neural net- (almost) fixed number of parameters of the so-called Mul-
works have to be trained anew, by replacing the smeared tilayer perceptron and convolutional neural networks, we
spectral densities in the training sets with those corre- used feed-forward convolutional neural networks based
sponding to the new value of σ. Architectures that give on the LeNet architecture introduced in Ref. [41].
the possibility to take into account a variable input pa- We do instead study in details the dependence of the
rameter, and a corresponding variable output at fixed in- output of the neural networks upon their size Nn and,
put vector, have been extensively studied in the machine to this end, we implemented three architectures that
learning literature and we leave a numerical investiga- we called arcS, arcM and arcL. These architectures, de-
tion of this option to future work on the subject. In this scribed in full details in TABLEs I, II and III, differ only
work we considered two different values, σ = 0.44 GeV for the number of maps in the convolutional layers. The
and σ = 0.63 GeV, that correspond respectively to the number of maps are chosen so that the number of param-
smallest and largest values used in Ref. [40]. eters of arcS:arcM:arcL are approximately in the propor-
tion 1 : 2 : 3. For the implementation and the training
we employed both Keras [42] and TensorFlow [43].
B. Architectures

By reading the discussion on the data layout presented IV. MODEL INDEPENDENT TRAINING
in the previous subsection from the machine-learning
point of view, we are in fact implementing neural net- In the supervised deep-learning framework a neural
works to solve a RNT 7→ RNE regression problem with network is trained over a training set which is representa-
NT = 64 and NE = 47 which, from now on, fix the tive of the problem that has to be solved. In our case the
dimension of the input and output layers of the neural inputs to each neural network are the correlators C and
networks. the target outputs are the associated smeared spectral
There are no general rules to prefer a given network densities ρ̂σ . As discussed in the Introduction, our main
architecture among the different possibilities that have goal is to devise a model-independent training strategy.
been considered within the machine-learning literature To this end, the challenge is that of building a training set
8

Type Maps Size Kernel size Stride Activation the answer of a network cannot be exact, and there-
Input 64 fore has to be associated with an error, is taken into
Conv1D 12 32x12 3 2 LeakyReLu account by introducing an ensemble of machines,
Conv1D 24 16x24 3 2 LeakyReLu with Nr replicas, and by estimating this error by
Conv1D 48 8x48 3 2 LeakyReLu studying the distribution of the different Nr an-
Flatten 384 swers in the Nr 7→ ∞ limit (see FIG. 2);
Fully conn. 256 LeakyReLu
Fully conn. 256 LeakyReLu • once the network (and statistical) errors at fixed
Output 47 N are given, we can study numerically the N 7→
Parameters 180871 ∞ limits and also quantify, reliably, the additional
systematic errors associated with these unavoidable
TABLE II. arcM : the medium-size architecture used in this extrapolations (see FIG. 3).
work. See TABLE I for the description.
We are now going to provide the details concerning the
choice of the functional basis that we used to parametrize
Type Maps Size Kernel size Stride Activation the spectral densities and to build our training sets.
Input 64
Conv1D 32 32x32 3 2 LeakyReLu
Conv1D 64 16x64 3 2 LeakyReLu A. The functional-basis
Conv1D 128 8x128 3 2 LeakyReLu
Flatten 1024
Fully conn. 256 LeakyReLu In our strategy we envisage studying numerically the
Fully conn. 256 LeakyReLu limit Nb 7→ ∞ and, therefore, provided that the system-
Output 47 atic errors associated with this extrapolation are properly
Parameters 371311 taken into account, there is no reason to prefer a partic-
ular basis w.r.t. any other. For our numerical study we
used the Chebyshev polynomials of the first kind as basis
TABLE III. arcL: the largest architecture used in this work. functions (see for example Ref. [44]).
See TABLE I for the description.
The Chebyshev polynomials Tn (x) are defined for x ∈
[−1, 1] and satisfy the orthogonality relations
which contains enough variability so that, once trained, 
the network is able to provide the correct answer, within Z +1 0 n ̸= m,
Tn (x)Tm (x)  π
the quoted errors, for any possible input correlator. dx √ = 2 n = m ̸= 0 . (36)
As a matter of fact, the situation in which the neural −1 1 − x2 
π n = m = 0
network can exactly reconstruct any possible function is
merely ideal. That would be possible only in absence of In order to use them as a basis for the spectral densities
errors on the input data and with a neural network with that live in the energy domain E ∈ [E0 , ∞), we intro-
an infinite number of neurons, trained on an infinitely duced the exponential map
large and complex training set. This is obviously impos-
sible and in fact our goal is the realistic task of getting an x(E) = 1 − 2e−E (37)
output for the smeared spectral density as close as pos-
sible to the exact one by trading the unavoidable limited and set
abilities of the neural network with a reliable estimate of
the systematic error. In order to face this challenge we Bn (E) = Tn (x(E)) − Tn (x(E0 )) . (38)
used the algorithmic strategy described in FIG. 1, FIG. 2
Notice that the subtraction of the constant term
and in FIG. 3. In our strategy,
Tn (x(E0 )) has been introduced in order to be able to
• the fact that the network cannot be infinitely large cope with the fact that hadronic spectral densities vanish
is parametrized by the fact that Nn (the number of below a threshold energy E0 ≥ 0 that we consider an un-
neurons) is finite; known of the problem. With this choice, the unsmeared
spectral densities that we use to build our training sets
• the fact that during the training a network can- are written as
not be exposed to any possible spectral density is
parametrized by the fact that Nb (the number of Nb
X
basis functions) and Nρ (the number of spectral ρ(E; Nb ) = θ(E − E0 ) cn [Tn (x(E)) − Tn (x(E0 ))] ,
densities to which a network is exposed during the n=0
training) are finite (see FIG. 1); (39)

• the fact that at fixed and vanish identically for E ≤ E0 . Once E0 and
the coefficients cn that define ρ(E; Nb ) are given, the
N = (Nn , Nb , Nρ ) (35) correlator and the smeared spectral density associated
9

1.00
Nb = 16

0.75
0.4 2 2 ln 2 σ = 0.10 GeV
Nb = 32
Nb = 128 Σσ,∆E (E)
0.50 0.2
Nb = 512

ρ(E)
0.25
0.0
ρ(E)

0.00
−0.2
−0.25

−0.50 −0.4

−0.75

0.0 0.5 1.0 1.5 2.0 2.5 √


E [GeV] 0.4 2 2 ln 2 σ = 0.50 GeV
Σσ,∆E (E)
0.2
FIG. 5. Examples of unsmeared spectral densities generated

ρ(E)
according to Eq. 39 using the Chebyshev polynomial as basis 0.0
functions for different Nb . For all the examples we set E0 =
0.8 GeV. As it can be seen, by generating the cn coefficients −0.2
according to Eq. (40), the larger the number of terms of the
Chebyshev series the richer is the behaviour of the spectral
−0.4
density in terms of local structures.

0.0 0.5 1.0 1.5 2.0

with ρ(E; Nb ) can be calculated by using Eq. (32) and E [GeV]


Eq. (33)2 .
Each ρ(E; Nb ) that we used in order to populate our
FIG. 6. The black dashed curve is a generic continuous un-
training sets has been obtained by choosing E0 randomly smeared spectral density ρ(E) while the vertical dashed lines
in the interval [0.2, 1.3] GeV with a uniform distribution indicate the energy sampling defining the support of ρδ (ω) ac-
and by inserting in Eq. (39) the coefficients cording to Eq. (42). The blue and red curves are the smeared
rn version of respectively ρ(ω) and ρδ (ω). The green curve rep-
c0 = r0 ; cn = , n>0, (40) resents Σσ,∆E (E) = ρ̂σ (E) − ρ̂δ,σ (E). The spacing is set to
n1+ε ∆E = 0.1 GeV. The smearing kernel is a normalized Gaussian
where the rn ’s are Nb uniformly distributed random function of width σ and as√resolution width we refer to the full
numbers in the interval [−1, 1] and ε is a non-negative pa- width at half maximum 2 2 ln 2 σ . Top-panel : the resolution
rameter that we set to 10−7 . Notice that with this choice width is equal to ∆E and it is such that√the single peaks are
of the coefficients cn the Chebyshev series of Eq. (39) is resolved. Bottom-panel : in this case 2 2 ln 2 σ ≫ ∆E and
the two smeared functions are almost undistinguishable.
convergent in the Nb 7→ ∞ limit and that the resulting
spectral densities can be negative and in general change
sign several times in the interval E ∈ [0, ∞). Notice also
that up to a normalization constant, that it will turn to
be irrelevant in our training strategy in which the input
data are standardized as explained in subsection IV C,
the choice of the interval [−1, 1] for the rn random num- polynomials as basis functions is the fact that we are
bers is general. thus sampling the space of regular functions while, as we
A few examples of unsmeared spectral densities gener- pointed out in section II, on a finite volume the lattice
ated according to Eq. 39 are shown in FIG. 5. As it can QCD spectral density is expected to be a discrete sum
be seen, with the choice of the coefficients cn of Eq. (40), of Dirac-delta peaks (see Eq. (29)). Here we are rely-
the larger is the number of terms in the Chebyshev series ing on the fact that the inputs of our networks will be
of Eq. (39) the richer is the behaviour of the resulting Euclidean correlators, that in fact are smeared spectral
unsmeared spectral densities in terms of local structures. densities, and will also be asked to provide as output
This is an important observation and a desired feature. the smeared spectral densities ρ̂σ (E). This allows us to
Indeed, a natural concern about the choice of Chebyshev assume that, provided that the energy smearing param-
eter σ and Nb are sufficiently large, the networks will
be exposed to sufficiently general training sets to be able
2 By representing also the smearing kernels and the basis functions to extract the correct smeared spectral densities within
on a Chebyshev basis, the orthogonality relations of Eq. (36) can the quoted errors. To illustrate this point we consider
conveniently be exploited to speedup this step of the numerical in FIG. 6 a continuous spectral density ρ(E) and ap-
calculations. proximate its smeared version through a Riemann’s sum
10

according to
Z ∞
40
ρ̂σ (E) = dω Kσ (E, ω)ρ(ω)
E0

(%)

X 30
= ∆E Kσ (E, ωn )ρ(ωn ) + Σσ,∆E (E)

∆Clatt (t)
n=0

Clatt (t)
20
Z ∞
= dω Kσ (E, ω)ρδ (ω) + Σσ,∆E (E), (41)
E0 10

where
0

X 0 10 20 30 40 50 60
ρδ (ω) ≡ ∆E ρ(ω)δ(ω − ωn ) , ωn = E0 + n∆E . t
a
n=0
(42)
FIG. 7. Noise-to-signal ratio of the lattice correlator Clatt (t),
The previous few lines of algebra show that the smear- discussed in Section VII, whose covariance matrix is used to
ing of a continuous function ρ(ω) can be written as the inject the statistical noise in the training sets.
smearing of the distribution defined in Eq. (42), a proto-
type of the finite-volume distributions of Eq. (29), plus
the approximation error Σσ,∆E (E) = ρ̂σ (E) − ρ̂δ,σ (E). that we obtain, as we are now going to explain, by distort-
The quantity Σσ,∆E (E), depending upon the spacing ∆E ing the correlator C using the information provided by
of the Dirac-delta peaks and the smearing parameter σ, the noise of the lattice correlator Clatt (see Section VII).
is expected to be sizeable when σ ≤ ∆E and to become In order to cope with the presence of noise in the in-
irrelevant in the limit σ ≫ ∆E. A quantitative exam- put data that have to be processed at the end of the
ple is provided in FIG. 6 where Σσ,∆E (E) is shown at training, it is extremely useful (if not necessary) to teach
fixed ∆E for two values of σ. In light of this observation, to the networks during the training to distinguish the
corroborated by the extensive numerical analysis that we physical content of the input data from noisy fluctua-
have performed at the end of the training sessions to val- tions. This is particularly important when dealing with
idate our method (see subsection V B and, in particular, lattice QCD correlators for which, as discussed in subsec-
FIG. 16), we consider justified the choice of Chebyshev tion III A, the noise-to-signal ratio grows exponentially
polynomials as basis functions provided that, as is the for increasing Euclidean times (see FIG. 7). A strategy
case in this work, the energy smearing parameter σ is to cope with this problem, commonly employed in the
chosen sufficiently large to not be able to resolve the dis- neural network literature, is to add Gaussian noise to the
crete structure of the finite-volume spectrum. input data used in the training. There are several exam-
ples in the literature where neural networks are shown
to be able to learn rather efficiently by employing this
B. Building the training sets strategy of data corruption (see the already cited text-
books Refs. [37, 38]). According to our experience, it is
Having provided all the details concerning the func- crucially important that the structure of the noise used
tional basis that we use to parametrize the unsmeared to distort the training input data resembles as much as
spectral densities, we can now explain in details the pro- possible that of the true input data. In fact, it is rather
cedure that we used to build our training sets. difficult to model the noise structure generated in Monte
A training set Tσ (Nb , Nρ ) contains Nρ input-output Carlo simulations and, in particular, the off-diagonal el-
pairs. Each pair is obtained by starting from a random ements of the covariance matrices of lattice correlators.
unsmeared spectral density, parametrized at fixed Nb ac- In the light of these observations we decided to use the
cording to Eq. (39) and generated as explained in the pre- covariance matrix Σ̂latt of the lattice correlator Clatt that
vious subsection. Given the unsmeared spectral density we are going to analyse in Section VII to obtain Cnoisy
ρ(E; Nb ), we then compute the corresponding correlator from C. More precisely, given a correlator C, we gener-
vector C (by using Eq. (32)) and, for the two values of σ ate Cnoisy by extracting a random correlator vector from
that we used in this study, the smeared spectral densities the multivariate Gaussian distribution
vectors ρ̂σ (by using Eq. (33)). From each pair (C, ρ̂σ ) "  2 #
we then generate an element (Cnoisy , ρ̂σ ) of the training C(a)
G C, Σ̂latt (44)
set at fixed Nb and σ, Clatt (a)

ρi (E; Nb ) 7→ (C, ρ̂σ )i 7→ (Cnoisy , ρ̂σ )i ∈ Tσ (Nb , Nρ ) , having C as mean vector and as covariance the matrix
 2
Σ̂latt normalized by the factor CC(a)
latt (a)
.
i = 1, · · · , Nρ , (43) In order to be able to perform a numerical study
11

of the limits N 7→ ∞, we have generated with the D. Training an ensemble of machines


procedure just described several training sets, cor-
responding to Nb = {16, 32, 128, 512} and Nρ = Given a machine with Nn neurons we train it over
{50, 100, 200, 400, 800} × 103 . At fixed Nb , each train- the training set Tσ (Nb , Nρ ) by using as loss function the
ing set includes the smaller ones, i.e. the training set Mean Absolute Error (MAE)
Tσ (Nb , 100 × 103 ) includes the training set Tσ (Nb , 50 ×
103 ) enlarged with 50 × 103 new samples. Nρ
1 X pred,i
(w) − ρ̂iσ ,

ℓ(w) = ρ̂σ (48)
Nρ i=1
C. Data pre-processing
where |ρσ | is the Cartesian norm of the NE = 47 com-
ponents vector ρσ , while ρ̂pred,i
σ (w) is the output of the
A major impact in the machine-learning performance neural network in correspondence of the input correlator
is played by the way the data are presented to the neural i
Cnoisy . In Eq. (48) we have explicitly shown the depen-
network. The learning turns to be more difficult when
dence of the predicted spectral density ρ̂σpred,i (w) upon
the input variables have different scales and this is ex-
the weights w of the network.
actly the case of Euclidean lattice correlators since the
At the beginning of each training session each weight
components of the input vectors decrease exponentially
wn (with n = 1, · · · , Nn ) is extracted from a Gaussian
fast. The risk in this case is that the components small in
distribution with zero mean value and variance 0.05. To
magnitude are given less importance during the training
end the training procedure we rely on the early stopping
or that some of the weights of the network become very
criterion: the training set is split into two subsets con-
large in magnitude, thus generating instabilities in the
taining respectively the 80% and 20% of the entries of
training process. We found that a considerable improve-
the training set Tσ (Nb , Nρ ). The larger subset is used
ment in the validation loss minimization is obtained by
to update the weights of the neural network with the
implementing the standardization of the input correlators
gradient descent algorithm. The smaller subset is the so-
(see next subsection and the central panel of FIG. 8) and
called validation set and we use it to monitor the training
therefore used this procedure.
process. At the end of each epoch we calculate the loss
The standardization procedure of the input consists function of Eq. (48) for the validation set, the so-called
in rescaling the data in such a way that all the compo- validation loss, and stop the training when the drop in
nents of the input correlator vectors in the training set the validation loss is less than 10−5 for 15 consecutive
are distributed with average 0 and variance 1. For a given epochs. In the trainings performed in our analysis this
training set Tσ (Nb , Nρ ) we calculate the NT -component occurs typically between epoch 150 and 200. The early
vectors µ and γ whose components are stopping criterion provides an automatic way to prevent

the neural network from overfitting the input data.
1 X We implement a Mini-Batch Gradient Descent algo-
µ(τ ) = Ci (aτ ) (45)
Nρ i=1 rithm, with Batch Size (BS) set to 32, by using the Adam
optimizer [45] combined with a learning rate decaying ac-
and cording to
sP
Nρ 2 θ(e − 25)η(e − 1)
(Ci (aτ ) − µ(τ )) η(e) = , (49)
γ(τ ) = i=1
. (46) 1 + e · 4 × 10−4

where e ∈ N is the epoch and η(0) = 2 × 10−4 is the
Each correlator in the training is then replaced by starting value. The step-function is included so that the
learning rate is unchanged during the first 25 epochs.
′ Cnoisy (aτ ) − µ(τ ) Although a learning rate scheduler is not strictly manda-
Cnoisy (aτ ) 7→ Cnoisy (aτ ) = . (47)
γ(τ ) tory, since the Adam optimizer already includes adaptive
learning rates, we found that it provides an improvement
where Cnoisy is the distorted version of C discussed in in the convergence with less noisy fluctuations in the val-
the previous subsection. Notice that the vectors µ and γ idation loss (see the top panel of FIG. 8). Concerning the
are determined from the training set before including the BS, we tested the neural network performance by start-
noise. Although the pre-processed correlators look quite ing from BS=512 and by halving it up to BS=16 (see
different from the original ones, since the components are bottom panel of FIG. 9). Although the performance im-
no longer exponential decaying, the statistical correlation proves as BS decreases we set BS= 32 in order to cope
in time is preserved by the standardization procedure. with the unavoidable slowing down of the training for
At the end of the training, the correlators fed into the smaller values of BS.
neural network for validation or prediction have also to As we already stressed several times, at fixed N the
be pre-processed by using the same vectors µ and γ used answer of a neural network cannot be exact. In order
in the training. to be able to study numerically the N 7→ ∞ limits, the
12

2 × 10−2
with l.r. scheduler σ = 0.44 GeV
without l.r. scheduler Nb = 512 arcS
Nb = 128 arcM
Nb = 32 arcL
Validation loss

Validation loss
Nb = 16
6 × 10−3
10−2

4 × 10−3
−3
6 × 10

3 × 10−3

data pre-processed σ = 0.63 GeV


data not pre-processed Nb = 512 arcS
10−1
4 × 10−3 Nb = 128 arcM
Validation loss

Nb = 32 arcL

Validation loss
Nb = 16
3 × 10−3

2 × 10−3
10−2

100000 200000 300000 400000 500000 600000 700000 800000


BS = 16

BS = 32
BS = 64
Validation loss

BS = 128
FIG. 9. Validation losses at the end of the trainings as func-
BS = 512
tions of the training set dimension Nρ . The top-plot cor-
responds to σ = 0.44 GeV and the bottom-plot to σ =
0.63 GeV. The different colors correspond to the different
10−2
architectures and different markers to the different number
of basis functions Nb considered in this study. The error on
each point correspond to the standard deviation of the dis-
tribution of the validation loss in an ensemble of Nr = 20
0 25 50 75 100 125 150 175 machines. As expected, the validation loss decreases by in-
epochs creasing Nρ (more general training) and/or Nn (larger and
therefore smarter networks). Moreover, at fixed Nn and Nρ
the neural network performs better at smaller values of Nb
FIG. 8. Numerical tests to optimize the training perfor- as consequence of the fact that the training set exhibits less
mances. The validation loss functions of the trainings of the complex features and learning them is easier.
Nr = 20 replica machines belonging to the same ensemble are
plotted with the same color. All the trainings refer to arcL,
Nρ = 50 × 103 , Nb = 512 and σ = 0.44 GeV. If not otherwise
specified the data are pre-processed, BS=32 and the learn-
obtained by starting from the same unsmeared spectral
ing rate scheduler is implemented. Top panel : Comparison densities, and therefore from the same pairs (C, ρ̂σ ) (see
of the validation losses with and without the learning sched- Eq. (43)), but with different noisy input correlator vec-
uler rate defined in Eq. (49). Central panel : Comparison of tors Cnoisy .
the validation losses by implementing or not the input data In FIG. 9 we show the validation loss as a function of
pre-processing procedure described in subsection IV C. Bot- Nρ for the different values of Nb , the three different archi-
tom panel : Comparison of the validations losses at different tectures and the two values of σ considered in this study.
values of the BS parameter. Each point in the figure has been obtained by studying
the distribution of the validation loss at the end of the
training within the corresponding ensemble of machines
error associated with the limited abilities of the networks
and by using respectively the mean and the standard de-
at finite N has to be quantified. To do this we introduce
viation of each distribution as central value and error.
the ensemble of machines by considering at fixed N
The figure provides numerical evidences concerning the
Nr = 20 (50) facts that

machines with the same architecture and trained by using • networks with a finite number Nn of neurons, ini-
the same strategy. More precisely, each machine of the tialized with different weights and exposed to train-
ensemble is trained by using a training set Tσ (Nb , Nρ ) ing sets containing a finite amount of information,
13

provide different answers and this is a source of er- • by computing the standard deviation over the en-
rors that have to be quantified; semble of machines of ρ̂pred
σ (E, N , r), we obtain an
estimate of the error, that we call ∆net
σ (E, N ), as-
• the validation loss decreases for larger Nn because sociated with the fact that at fixed N the answer
larger networks are able to assimilate a larger of a network cannot be exact;
amount of information;
• both ∆latt net
σ (E, N ) and ∆σ (E, N ) have a statistical
• the validation loss decreases for larger Nρ since, if latt
origin. The former, ∆σ (E, N ), comes from the
Nn is sufficiently large, the networks learn more limited statistics of the lattice Monte Carlo simula-
from larger and more general training sets; tion while the latter, ∆netσ (E, N ), comes from the
• at fixed Nn and Nρ the networks perform better statistical procedure that we used to populate our
at smaller values of Nb because the training sets training sets Tσ (Nb , Nρ ) and to train our ensembles
exhibit less complex features and learning them is of machines at fixed Nb . We sum them in quadra-
easier. ture and obtain an estimate of the statistical error
at fixed N ,
In order to populate the plots in FIG. 9 we considered q
several values of Nρ and Nb and this is rather demanding ∆stat [∆latt
2 net 2
σ (E, N ) = σ (E, N )] + [∆σ (E, N )] ; (52)
from the computational point of view. The training sets
that we found to be strictly necessary to quote our final
• we then study numerically the N 7→ ∞ limits by
results are listed in TABLE IV.
using a data-driven procedure. We quote as central
value and statistical error of our final result
V. ESTIMATING THE TOTAL ERROR AND
VALIDATION TESTS
ρ̂pred
σ (E) ≡ ρpred
σ (E, N max ) ,

Having explained in the previous section the procedure ∆stat


σ (E) ≡ ∆σ (E, N
max
), (53)
that we used to train our ensembles of machines, we can
now explain in details the procedure that we use to obtain where N max , given in Table IV, is the vector with
our results. We start by discussing the procedure that the largest components among the vectors N con-
we use to estimate the total error, taking into account sidered in this study;
both statistical and systematics uncertainties, and then
• in order to check the numerical convergence of the
illustrate the validation tests that we have performed in
N 7→ ∞ limits and to estimate the associated sys-
order to assess the reliability of this procedure.
tematic uncertainties ∆Xσ (E), where

X = {ρ, n, b} , (54)
A. Procedure to estimate the total error
ref
we define the reference vectors NX listed in Ta-
The procedure that we use to quote our results for ble IV. We then define the pull variables
smeared spectral densities by using the trained ensembles pred
of machines, illustrated in FIG. 3, is the following ρ̂σ (E) − ρ̂pred
σ (E, NX ref

X
Pσ (E) = q 2   , (55)
• given an ensemble of Nr machines trained at ∆stat (E) + ∆ stat (E, N ref ) 2
σ σ X
fixed N , we feed in each machine r belonging to
the ensemble all the different bootstrap samples and then
we weight the absolute value of the differ-
(or jackknife bins) Cc (aτ ) of the input correlator ence ρ̂pred
σ (E) − ρ̂pred
σ (E, NX ref
) with the Gaussian
(c = 1, · · · , Nc ) and obtain a collection of results probability that this is not due to a statistical fluc-
ρ̂pred
σ (E, N , c, r) for the smeared spectral density tuation,
that depend upon N , c and r;  X
|Pσ (E)|

∆X
pred pred ref

• at fixed N and r we compute the bootstrap (or σ (E) = ρ̂ σ (E) − ρ̂ σ (E, N X ) erf √ ;
2
jackknife) central values ρ̂pred
σ (E, N , r) and statis-
(56)
tical errors ∆latt
σ (E, N , r) and average them over
the ensemble of machines,
• the total error that we associate to our final result
Nr ρ̂pred
σ (E) is finally obtained according to
1 X
ρ̂pred
σ (E, N ) = ρ̂pred
σ (E, N , r) ,
Nr ∆tot
r=1 σ (E)
Nr
q
1 X 2 ρ 2 2 2
∆latt ∆latt (E, N , r) ; = [∆stat b n
σ (E)] + [∆σ (E)] + [∆σ (E)] + [∆σ (E)] .
σ (E, N ) = (51)
Nr r=1 σ (57)
14

Label      Nn     Nb    Nρ
N^max      arcL   512   800 × 10³
N_b^ref    arcL   16    800 × 10³
N_n^ref    arcS   512   800 × 10³
N_ρ^ref    arcL   512   50 × 10³

TABLE IV. List of the vectors N used to estimate the systematic errors associated with the N → ∞ limits.

FIG. 10. Each row corresponds to a different energy (E = 0.16, 0.79, 1.69 and 2.54 GeV), as written on the top of each plot, and in all cases we set σ = 0.44 GeV and N = N^max = (arcL, 512, 800 × 10³). Left panels: the x-axes correspond to the replica index r = 1, · · · , Nr = 20 running over the entries of the ensemble of machines. The plotted points correspond to the results ρ̂σ^pred(E, N, r) ± ∆σ^latt(E, N, r) obtained by computing the bootstrap averages and errors of the c = 1, · · · , Nc = 800 results ρ̂σ^pred(E, N, c, r). Right panels: distributions of the central values ρ̂σ^pred(E, N, r). The means of these distributions correspond to the results ρ̂σ^pred(E, N) that are shown in the left panels as solid blue lines. The widths correspond to the network errors ∆σ^net(E, N) that are represented in the left panels with two dashed blue lines. The blue bands in the left panels correspond to ∆σ^stat(E, N), defined in Eq. (52).

Some remarks are in order here. At first sight this procedure might appear too complicated and/or too conservative. In fact, as we are now going to illustrate by considering an explicit example, this has to be viewed as just one of the possible ways to perform quantitative plateaux-analyses of the N → ∞ limits. Moreover, the stringent validation tests that we have performed with mock data, and that we are going to discuss below, provide quantitative evidence that the results obtained by implementing this procedure can reliably be used in phenomenological applications.

B. Validation

In order to illustrate how the method described in the previous subsection can be applied in practice, we now consider an unsmeared spectral density ρ(E) that has not been used in any of the trainings and that has been obtained as described in subsection IV A, i.e. by starting from Eq. (39), but this time with Nb = 1024, i.e. a number of basis functions which is twice as large as the largest Nb employed in the training sessions.

From this ρ(E) we then calculate the associated correlator C and the smeared spectral density corresponding to σ = 0.44 GeV. We refer to the true smeared spectral density as ρ̂σ^true(E), while we call ρ̂σ^pred(E) the final predicted result. Both the unsmeared spectral density ρ(E) (grey dotted curve, partially visible) and the expected exact result ρ̂σ^true(E) (solid black curve) are shown in the top panel of FIG. 12.

Starting from the exact correlator C corresponding to ρ(E) we have then generated Nc = 800 bootstrap samples from the distribution of Eq. (44), thus simulating the outcome of a lattice Monte Carlo simulation. The Nc samples have been fed into each trained neural network and we collected the answers ρ̂σ^pred(E, N, c, r) that, as shown in FIG. 10 for a selection of the considered values of E, we have then analysed to obtain ∆σ^latt(E, N) and ∆σ^net(E, N).

The next step is now the numerical study of the limits N → ∞ that we illustrate in FIG. 11 for the same values of E considered in FIG. 10. The limit Nρ → ∞ (left panels) is done at fixed Nb = 512 and Nn = arcL. The limit Nn → ∞ (central panels) is done at fixed Nb = 512 and Nρ = 800 × 10³. The limit Nb → ∞ (right panels) is done at fixed Nn = arcL and Nρ = 800 × 10³. As can be seen, the blue points, corresponding to our results ρ̂σ^pred(E) ± ∆σ^stat(E), are always statistically compatible with the first grey point on the left of the blue one in each plot. The red points correspond to the results ρ̂σ^pred(E, N_X^ref) ± ∆σ^stat(E, N_X^ref) that we use to estimate the systematic uncertainties ∆σ^X(E). The blue band corresponds to our estimate of the total error ∆σ^tot(E).
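The bootstrap-feeding step described above can be summarised in a few lines. The sketch below is schematic: it assumes that the distribution of Eq. (44) is a multivariate Gaussian centred on the exact correlator with the lattice covariance matrix, and that each trained machine is available as a callable mapping a correlator vector to the smeared spectral density at the energies of interest; both are assumptions made here only for illustration.

```python
import numpy as np

def make_bootstrap_samples(exact_corr, cov_latt, n_samples=800, seed=0):
    """Draw Nc noisy copies of the exact correlator, mimicking the distribution of
    Eq. (44) (assumed here to be a multivariate Gaussian with covariance cov_latt)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(exact_corr, cov_latt, size=n_samples)

def feed_ensemble(networks, samples):
    """networks: list of Nr trained machines, each a callable C(a*tau) -> smeared density.
    samples: array of shape (Nc, NT) with the bootstrap correlators.
    Returns pred[c, r, e]: one prediction per bootstrap sample, machine and energy."""
    return np.array([[net(corr) for net in networks] for corr in samples])
```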

FIG. 11. Numerical study of the limits N → ∞. Each row corresponds to a different energy, as written on the top of each plot, and in all cases we set σ = 0.44 GeV. Left panels: study of the limit Nρ → ∞ at fixed Nb = 512 and Nn = arcL. Central panels: study of the limit Nn → ∞ at fixed Nb = 512 and Nρ = 800 × 10³. Right panels: study of the limit Nb → ∞ at fixed Nn = arcL and Nρ = 800 × 10³. All panels: the horizontal blue line is the final central value, corresponding to N = N^max = (arcL, 512, 800 × 10³) (blue points), and it is common to all the panels within the same row. The blue band represents instead ∆σ^tot(E), calculated by the combination in quadrature of all the errors (see Eq. 57). The red points correspond to N_X^ref and the grey ones to the other trainings, see TABLE IV.

FIG. 12. Final results for σ = 0.44 GeV. Top panel: the final results predicted by the neural networks (blue points) are compared with the true result ρ̂σ^true(E) (solid black). The dotted grey curve in the background represents the unsmeared spectral density ρ(E). Central panel: relative error budget as a function of the energy, i.e. ∆σ^tot(E)/|ρ̂σ^pred(E)|, ∆σ^stat(E)/|ρ̂σ^pred(E)|, ∆σ^ρ(E)/|ρ̂σ^pred(E)|, ∆σ^n(E)/|ρ̂σ^pred(E)| and ∆σ^b(E)/|ρ̂σ^pred(E)|. Bottom panel: deviation pσ(E) of the predicted result from the true one in units of the standard deviation, see Eq. (58).

In the top panel of FIG. 12 we show the comparison of our final results ρ̂σ^pred(E) ± ∆σ^tot(E) (blue points) with the true smeared spectral density ρ̂σ^true(E) (black curve) that in this case is exactly known. The central panel in FIG. 12 shows the relative error budget as a function of the energy. As can be seen, the systematic errors represent a sizeable and important fraction of the total error, particularly at large energies. The bottom panel of FIG. 12 shows the pull variable

    pσ(E) = [ρ̂σ^pred(E) − ρ̂σ^true(E)] / ∆σ^tot(E)    (58)

and, as can be seen, by using the proposed procedure to estimate the final results and their errors, no significant deviations from the true result have been observed in this particular case.
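The pull of Eq. (58) is also the quantity that we histogram in the large-scale validation discussed next (FIG. 13 and FIG. 14). A minimal sketch of that bookkeeping, assuming the predictions, total errors and exact results for a batch of validation samples are already stored in arrays of shape (n_samples, n_energies), is the following.

```python
import numpy as np

def pull_statistics(pred, err_tot, true):
    """Returns the pulls of Eq. (58) and the fractions within 1, 2 and 3 standard deviations."""
    pulls = (pred - true) / err_tot
    abs_pulls = np.abs(pulls).ravel()
    fractions = {k: np.mean(abs_pulls <= k) for k in (1, 2, 3)}
    return pulls, fractions

# toy check: for correctly estimated Gaussian errors one expects roughly
# 68%, 95% and 99.7% of the pulls within 1, 2 and 3 standard deviations
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true = rng.normal(size=(2000, 50))
    err = np.full_like(true, 0.1)
    pred = true + rng.normal(scale=0.1, size=true.shape)
    _, frac = pull_statistics(pred, err, true)
    print(frac)
```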

FIG. 13. Left panels: normalized density distribution of the significance defined in Eq. (58) as a function of the energy, for σ = 0.44 GeV (top) and σ = 0.63 GeV (bottom). The distribution is calculated over 2000 validation samples generated on the Chebyshev functional space. Right panels: cumulative histogram of pσ(E) and related fractions of values that fall within 1, 2 or 3 standard deviations. The in-figure legends report 80.1%, 96.5% and 99.4% of the values within 1, 2 and 3 standard deviations for σ = 0.44 GeV, and 85.8%, 98.4% and 99.8% for σ = 0.63 GeV.

FIG. 14. The same as FIG. 13 but for 2000 samples of unsmeared spectral densities generated according to Eq. (59) and thus totally unrelated to the data used during the training. Here the fractions within 1, 2 and 3 standard deviations are 79.1%, 95.6% and 99.1% for σ = 0.44 GeV, and 82.0%, 96.4% and 99.3% for σ = 0.63 GeV.

In order to validate our method we repeated the test just described 2000 times. We have generated random spectral densities, not used in the trainings, by starting again from Eq. (39) and by selecting random values for E0 and Nb respectively in the intervals [0.3, 1.3] GeV and [8, 1024]. In FIG. 13 we show the distributions of the resulting pull variables pσ(E) (see Eq. 58) for σ = 0.44 GeV (top panel) and σ = 0.63 GeV (bottom panel). The plots show that, given the procedure that we use to quote the total error, we observe deviations ρ̂σ^pred(E) − ρ̂σ^true(E) smaller than 2 standard deviations in 95% of the cases and smaller than 3 standard deviations in 99% of the cases. The fact that deviations smaller than 1 standard deviation occur in ∼80% of the cases for σ = 0.44 GeV and ∼85% of the cases for σ = 0.63 GeV (to be compared with the expected 68% in the case of a standard normal distribution) is an indication of the fact that our estimate of the total error is slightly conservative. Moreover, from FIG. 13 it is also evident that our ensembles of neural networks are able to generalize very efficiently outside the training set.

In order to perform another stringent validation test of the proposed method, we also considered unsmeared spectral densities that cannot be written on the Chebyshev basis. We repeated the analysis described in the previous paragraph for a set of spectral densities generated according to

    ρ(E) = Σ_{n=1}^{Npeaks} cn δ(E − En) ,    (59)

where 0 < E0 ≤ E1 ≤ E2 ≤ · · · ≤ E_{Npeaks}. By plugging Eq. (59) into Eq. (32) and Eq. (33) we have calculated the correlator and smeared spectral density associated to each unsmeared ρ(E). We have generated 2000 unsmeared spectral densities with Npeaks = 5000. The position En of each peak has been set by drawing independent random numbers uniformly distributed in the interval [E0, 15 GeV] while E0 has been randomly chosen in the interval [0.3, 1.3] GeV. The coefficients cn have also been generated randomly in the interval [−0.01, 0.01]. The 2000 trains of isolated Dirac-delta peaks are representative of unsmeared spectral densities that might arise in the study of finite volume lattice correlators. These are rather wild and irregular objects that our neural networks have not seen during the trainings. FIG. 14 shows the plots equivalent to those of FIG. 13 for this new validation set. As can be seen, the distributions of pσ(E) are basically unchanged, an additional reassuring evidence of the ability of our ensembles of neural networks to generalize very efficiently outside the training set and, more importantly, of the robustness of the procedure that we use to estimate the errors.
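A sketch of how such a delta-peak validation sample can be generated and turned into a correlator and an exactly known smeared density is given below. Since Eqs. (32) and (33) are not reproduced here, the sketch assumes the simple forms C(t) = Σn cn exp(−t En) and a normalized Gaussian smearing kernel of width σ; both forms are assumptions made for illustration, not a verbatim transcription of our implementation.

```python
import numpy as np

def random_delta_train(rng, n_peaks=5000, e0_range=(0.3, 1.3), e_max=15.0, c_range=(-0.01, 0.01)):
    """Random train of Dirac-delta peaks as in Eq. (59): positions E_n and coefficients c_n."""
    e0 = rng.uniform(*e0_range)
    energies = np.sort(rng.uniform(e0, e_max, size=n_peaks))
    coeffs = rng.uniform(*c_range, size=n_peaks)
    return energies, coeffs

def correlator(energies, coeffs, times):
    """Euclidean correlator of the delta train, assuming C(t) = sum_n c_n exp(-t E_n)."""
    return np.exp(-np.outer(times, energies)) @ coeffs

def smeared_density(energies, coeffs, E, sigma):
    """Exact smeared density, assuming a normalized Gaussian smearing kernel of width sigma."""
    kernel = np.exp(-0.5 * ((E[:, None] - energies[None, :]) / sigma) ** 2)
    kernel /= sigma * np.sqrt(2.0 * np.pi)
    return kernel @ coeffs

rng = np.random.default_rng(7)
En, cn = random_delta_train(rng)
C = correlator(En, cn, times=np.linspace(0.1, 6.4, 64))   # Euclidean times in GeV^-1, illustrative
rho_true = smeared_density(En, cn, E=np.linspace(0.0, 2.5, 250), sigma=0.44)
```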

VI. RESULTS FOR MOCK DATA INSPIRED BY PHYSICAL MODELS

In the light of the results of the previous section, providing solid quantitative evidence of the robustness and reliability of the proposed method, we investigate in this section the performance of our trained ensembles of machines in the case of mock spectral densities coming from physical models. A few more validation tests, unrelated to physical models, are discussed in Appendix A.

For each test discussed in the following subsections we define a model spectral density ρ(E) and then use Eq. (32) and Eq. (33) to calculate the associated exact correlator and true smeared spectral density. We then generate Nc = 800 bootstrap samples Cc(t), by sampling the distribution of Eq. (44), and quote our final results by using the procedure described in detail in the previous section.

A. A resonance and a multi-particle plateau

The first class of physically motivated models that we investigate is that of unsmeared spectral densities exhibiting the structures corresponding to an isolated resonance and a multi-particle plateau. To build mock spectral densities belonging to this class we used the same Gaussian mixture model used in Ref. [24] and given by

    ρ^GM(E) = θ̂(E, δ1, ζ1) C e^{−[(E−M)/Γ]²} [1 − θ̂(E, δ2, ζ2)] + C0 θ̂(E, δ1, ζ1) θ̂(E, δ2, ζ2) ,    (60)

where

    θ̂(E, δi, ζi) = 1 / {1 + exp[−(E − δi)/ζi]} .    (61)

The sigmoid function θ̂(E, δ1, ζ1) with δ1 = 0.1 GeV and ζ1 = 0.01 GeV damps ρ^GM(E) at low energies while the other sigmoid, θ̂(E, δ2, ζ2) with ζ2 = 0.1 GeV, connects the resonance to the continuum part whose threshold is δ2. The parameter C0 = 1 regulates the height of the continuum plateau, which also coincides with the asymptotic behaviour of the spectral density, i.e. ρ^GM(E) → C0 for E → ∞. We have generated three different spectral densities with this model that, in the following, we call ρ^GM_1(E), ρ^GM_2(E) and ρ^GM_3(E) and whose parameters are given in TABLE V.

Model        C    M [GeV]   Γ [GeV]   δ2 [GeV]
ρ^GM_1(E)    10   1         0.1       1.5
ρ^GM_2(E)    5    1         0.1       1.5
ρ^GM_3(E)    5    0.8       0.05      1.2

TABLE V. Additional parameters used to generate the mock unsmeared spectral densities based on a Gaussian mixture model, Eq. (60). The corresponding functions are plotted in FIG. 15 (dashed grey).

FIG. 15. Results for three different unsmeared spectral densities ρ(E) (dashed grey lines) generated from the Gaussian mixture model of Eq. (60). The solid lines correspond to ρ̂σ^true(E). The points, with error bars, refer to the predicted spectral densities ρ̂σ^pred(E) for σ = 0.44 GeV and σ = 0.63 GeV.

The predicted smeared spectral densities for smearing widths σ = 0.44 GeV and σ = 0.63 GeV are compared to the true ones in FIG. 15. In all cases the predicted result agrees with the true one, within the quoted errors, for all the explored values of E. The quality of the reconstruction of the smeared spectral densities is excellent for E < 1.5 GeV while, at higher energies, the quoted error is sufficiently large to account for the deviation of ρ̂σ^pred(E) from ρ̂σ^true(E). This is particularly evident in the case of ρ^GM_1(E) shown in the top panel.
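For reference, the Gaussian mixture model of Eqs. (60) and (61) is straightforward to code. The sketch below uses the parameters of TABLE V together with C0 = 1, δ1 = 0.1 GeV, ζ1 = 0.01 GeV and ζ2 = 0.1 GeV as quoted above; it is intended only as an illustration of the model, not of our training code.

```python
import numpy as np

def sigmoid_hat(E, delta, zeta):
    """Eq. (61)."""
    return 1.0 / (1.0 + np.exp(-(E - delta) / zeta))

def rho_gm(E, C, M, Gamma, delta2, C0=1.0, delta1=0.1, zeta1=0.01, zeta2=0.1):
    """Gaussian mixture model of Eq. (60): isolated resonance plus continuum plateau."""
    low = sigmoid_hat(E, delta1, zeta1)        # damps the density at low energies
    cont = sigmoid_hat(E, delta2, zeta2)       # switches on the continuum above delta2
    peak = C * np.exp(-(((E - M) / Gamma) ** 2))
    return low * peak * (1.0 - cont) + C0 * low * cont

E = np.linspace(0.0, 2.5, 500)
rho1 = rho_gm(E, C=10, M=1.0, Gamma=0.1, delta2=1.5)   # parameters of rho^GM_1 in TABLE V
```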

The same class of physical models can also be investigated by starting from unsmeared spectral densities that might arise in finite volume calculations by considering

    ρ^peak(E) = C_peak δ(E − E_peak) + Σ_{n=1}^{Npeaks} cn δ(E − En) .    (62)

The parameter E_peak parametrizes the position of an isolated peak while the multi-particle part is introduced with the other peaks located at E_peak < E1 ≤ · · · ≤ E_{Npeaks}. We have generated two unsmeared spectral densities, ρ^peak_1(E) and ρ^peak_2(E), by using this model. In both cases we set Npeaks = 10000, selected random values for the En's up to 50 GeV and random values for the cn's in the range [0, 0.01]. In the case of ρ^peak_1(E) we set E_peak = 0.8 GeV, C_peak = 1 and E1 = 1.2 GeV. In the case of ρ^peak_2(E) we set instead E_peak = 0.7 GeV, C_peak = 1 and E1 = 1.5 GeV.

FIG. 16. Reconstructed smeared spectral densities (points with errors) compared to the true ones (solid lines) for the two unsmeared spectral densities ρ^peak_1(E) (top panel) and ρ^peak_2(E) (bottom panel) simulating a finite volume distribution, see Eq. (62). The unsmeared ρ(E) is represented by vertical lines located in correspondence of E = En with heights proportional to cn.

The predicted smeared spectral densities are compared with the true ones in FIG. 16. In both cases ρ̂σ^pred(E) is in excellent agreement with ρ̂σ^true(E).

It is worth emphasizing once again that the model in Eq. (62) cannot be represented by using the Chebyshev basis of Eq. (39) and, therefore, it is totally unrelated to the smooth unsmeared spectral densities that we used to populate the training sets.

B. O(3) non-linear σ-model

In this subsection we consider a model for the unsmeared spectral density that has already been investigated by using the HLT method in Ref. [21]. More precisely, we consider the two-particle contribution to the vector-vector spectral density in the 1+1 dimensional O(3) non-linear σ-model (see Ref. [21] for more details), given by

    ρ^O(3)(E) = θ(E − Eth) [3π³/(4θ²)] [(θ² + π²)/(θ² + 4π²)] tanh(θ/2) |_{θ = 2 cosh^{−1}(E/Eth)} ,    (63)

where Eth is the two-particle threshold.
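A direct transcription of Eq. (63), useful for generating this class of mock data, is given below; the normalization convention is the one written above, and the snippet is only a sketch.

```python
import numpy as np

def rho_o3(E, E_th):
    """Two-particle vector-vector spectral density of the 1+1d O(3) model, Eq. (63)."""
    E = np.asarray(E, dtype=float)
    rho = np.zeros_like(E)
    above = E > E_th                              # theta(E - E_th), threshold step
    theta = 2.0 * np.arccosh(E[above] / E_th)     # rapidity variable of Eq. (63)
    rho[above] = (3.0 * np.pi**3 / (4.0 * theta**2)
                  * (theta**2 + np.pi**2) / (theta**2 + 4.0 * np.pi**2)
                  * np.tanh(theta / 2.0))
    return rho

E = np.linspace(0.0, 2.5, 500)
rho = rho_o3(E, E_th=0.5)   # threshold of rho^O(3)_1
```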

We considered three mock unsmeared spectral densities, that we call ρ^O(3)_1(E), ρ^O(3)_2(E) and ρ^O(3)_3(E), differing in the position of the multi-particle threshold, which has been set respectively to Eth = 0.5 GeV, 1 GeV and 1.5 GeV. The reconstructed smeared spectral densities for σ = 0.44 GeV and σ = 0.63 GeV are compared with the exact ones in FIG. 17.

FIG. 17. Predicted smeared spectral densities for the two-particle contribution to the vector-vector correlator in the 1+1 dimensional O(3) non-linear σ-model for different values of the two-particle threshold: 0.5 GeV (top panel), 1 GeV (central panel) and 1.5 GeV (third panel). The solid lines and the dashed curve correspond respectively to ρ̂σ^pred(E) and ρ^O(3)(E).

The predicted smeared spectral densities are in remarkably good agreement with the true ones in the full energy range and in all cases. This result can be read as an indication of the fact that the smoothness of the underlying unsmeared spectral density plays a crucial rôle in the precision that one can get on ρ̂σ^pred(E), also when the problem is approached with neural networks. Indeed, this fact had already been observed and exploited in Ref. [21], where the authors managed to perform the σ → 0 limit of ρ̂σ^O(3)(E) with controlled systematic errors by using the HLT method.

C. The R-ratio with mock data

The last case that we consider is that of a model spectral density coming from a parametrization of the experimental data of the R-ratio. The R-ratio, denoted by R(E), is defined as the ratio between the inclusive cross section of e+e− → hadrons and e+e− → µ+µ− and plays a crucial rôle in particle physics phenomenology (see e.g. Refs. [40, 47]).

In the next section we will present results for a contribution to the smeared R-ratio Rσ(E) that we obtained by feeding into our ensembles of trained neural networks a lattice QCD correlator that has already been used in Ref. [40] to calculate the same quantity with the HLT method.

Before doing that, however, we wanted also to perform a test with mock data generated by starting from the parametrization of R(E) given in Ref. [46]. This parametrization, that we use as model unsmeared spectral density by setting ρ(E) ≡ R(E), is meant to reproduce the experimental measurements of R(E) for energies E < 1.1 GeV, i.e. in the low-energy region where there are two dominant structures associated with the mixed ρ and ω resonances, a rather broad peak at E ≃ 0.7 GeV, and the narrow resonance ϕ(1020). Given this rich structure, shown as the dashed grey curve in FIG. 18, this is a much more challenging validation test w.r.t. the one of the O(3) model discussed in the previous subsection.

FIG. 18. Results for the R-ratio using the parametrization of Ref. [46]. The gray dashed line is R(E) while the solid lines correspond to R̂σ^true(E) for σ = 0.44 GeV and σ = 0.63 GeV.

In FIG. 18, R̂σ^pred(E) is compared with R̂σ^true(E) for both σ = 0.44 GeV (orange points and solid curve) and σ = 0.63 GeV (blue points and solid curve). In both cases the difference R̂σ^pred(E) − R̂σ^true(E) doesn't exceed the quoted error for all the considered values of E.

VII. THE R-RATIO WITH LATTICE QCD DATA

In this section we use our trained ensembles of neural networks to extract the smeared spectral density from a real lattice QCD correlator.

ID     L³ × T       a [fm]         aL [fm]   mπ [GeV]
B64    64³ × 128    0.07957(13)    5.09      0.1352(2)

TABLE VI. ETMC gauge ensemble used in this work. See Refs. [40, 48] for more details.

We have considered a lattice correlator, measured by the ETMC on the ensemble described in TABLE VI, that has already been used in Ref. [40] to extract the so-called strange-strange connected contribution to the smeared R-ratio by using the HLT method of Ref. [14]. The choice is motivated by the fact that in this case the exact answer for the smeared spectral density is not known and we are going to compare our results R̂σ^pred(E) with the ones, that we call R̂σ^HLT(E), obtained in Ref. [40].

In QCD the R-ratio can be extracted from the lattice correlator

    C_latt(t) = −(1/3) Σ_{i=1}^{3} ∫ d³x T⟨0|J_i(x) J_i(0)|0⟩ = ∫_0^∞ dω [ω²/(12π²)] e^{−tω} R(ω) ,    (64)

where Jµ(x) is the hadronic electromagnetic current. The previous formula (in which we have neglected the term proportional to e^{−(T−t)ω} that vanishes in the T → ∞ limit) explains the choice that we made in Eq. (32) to represent all the correlators that we used as inputs to our neural networks. In Ref. [40] all the connected and disconnected contributions to C_latt(t), coming from the fact that Jµ = Σ_f q_f ψ̄_f γµ ψ_f with f = {u, d, s, c, b, t}, q_{u,c,t} = 2/3 and q_{d,s,b} = −1/3, have been taken into account. Here we consider only the connected strange-strange contribution, i.e. we set Jµ = −(1/3) s̄γµs and neglect the contribution associated with the fermionic-disconnected Wick contraction. From C_latt(t) we have calculated the covariance matrix Σ̂_latt that we used to inject noise in all the training sets that we have built and used in this work (see Section IV B).

Although, as already stressed, in this case we don't know the exact smeared spectral density, we do expect from phenomenology to see a sizeable contribution to the unsmeared spectral density coming from the ϕ(1020) resonance. Therefore, the shape of the smeared spectral density is expected to be similar to those shown in FIG. 16, for which the results obtained from our ensembles of neural networks turned out to be reliable. The comparison of R̂σ^pred(E) and R̂σ^HLT(E) is shown in FIG. 19 and, as can be seen, the agreement between the two determinations is remarkably good.

FIG. 19. The results obtained with our ensembles of neural networks in the case of a real lattice QCD correlator are compared with those obtained with the HLT method (grey, Refs. [14, 40]). The top panel shows the two determinations of the strange-strange connected contribution to the smeared R-ratio for σ = 0.44 GeV while the case σ = 0.63 GeV is shown in the bottom panel.

Some remarks are in order here. The HLT method and the new method that we propose here are radically different and the fact that the results obtained with the two procedures are in such good agreement, especially for E < 1.5 GeV where both methods provide very small errors, is at the same time reassuring and encouraging in view of future applications of spectral reconstruction techniques.

Concerning the errors, there is no significant evidence that one method performs better than the other. This can be better appreciated in FIG. 20 where the relative error is shown in both cases as a function of the energy.

FIG. 20. Relative error ∆σ^tot(E)/R̂σ^pred(E), in percent, on the smeared spectral density obtained from our ensembles of neural networks (colored points) and the HLT method (grey points) for σ = 0.44 GeV (top panel) and σ = 0.63 GeV (bottom panel).

We want to stress that in the new proposed method, as in the HLT case, no prior information on the expected spectral density has to be provided and, therefore, the results for R̂σ^pred(E) shown in FIG. 19 have to be considered as first-principles model-independent determinations of the smeared strange-strange contribution to the R-ratio that we managed to obtain by using supervised deep-learning techniques.

We leave to future work on the subject the task of analysing all the contributions to R(E), as done in Ref. [40], in order to obtain a machine-learning determination of R̂σ(E) to be compared with experiments.
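As an aside, the spectral representation in Eq. (64) is also how mock correlators can be built from a model R(E). A minimal numerical sketch of that integral, using a simple trapezoidal quadrature over a finite energy grid (an assumption made here purely for illustration), reads:

```python
import numpy as np

def correlator_from_R(R, omega, times):
    """Numerically evaluate Eq. (64): C(t) = int dw w^2/(12 pi^2) exp(-t w) R(w).
    R: callable returning R(omega); omega: increasing energy grid in GeV;
    times: Euclidean times in GeV^-1."""
    w = np.asarray(omega)
    weight = w**2 / (12.0 * np.pi**2) * R(w)
    # trapezoidal rule, one integral per Euclidean time
    return np.array([np.trapz(weight * np.exp(-t * w), w) for t in times])

# toy usage with a flat R(E) above a 1 GeV threshold (purely illustrative)
C = correlator_from_R(lambda w: (w > 1.0).astype(float),
                      omega=np.linspace(0.0, 15.0, 3000),
                      times=np.linspace(0.4, 25.0, 64))
```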

VIII. CONCLUSIONS

In this work we have proposed a new method to extract smeared hadronic spectral densities from lattice correlators. The method has been built by using supervised deep-learning techniques and is characterized by two distinctive features: a model-independent training strategy and a reliable estimate of both the statistical and systematic uncertainties.

We managed to implement a model-independent training strategy by introducing a basis in the functional space from which we extract the spectral densities that we use to populate our training sets. In order to obtain a reliable estimate of the systematic errors, we introduced the ensembles of machines.

We have shown that, by studying the distribution of the answers of the different machines belonging to an ensemble, at fixed and finite number of neurons, dimension and complexity of the training set, it is possible to quantify reliably the systematic errors associated with the proposed method. To do that, we presented a large number of stringent validation tests, performed with mock data, providing quantitative evidence of the reliability of the procedure that we use to estimate the total error on our final results (see FIG. 13 and FIG. 14).

In addition to mock data, we have applied the newly proposed method also in the case of a lattice QCD correlator obtained from a simulation performed by the ETM Collaboration. We have extracted the strange-strange connected contribution to the smeared R-ratio and compared the predictions obtained by using our ensembles of trained machines with the ones previously obtained by using the HLT method [14, 40]. We found a remarkably good agreement between the results obtained by using the two totally unrelated methods, which provide total errors of the same order of magnitude.

The proposed method requires the training of many machines with different architectures and dimensions of the training sets. Admittedly, although this is a task that can be completed in a few days on a modern desktop computer, the procedure might end up being computationally demanding. On the one hand, we don't exclude the possibility that the proposed method can be simplified in order to speed up the required computations. At the same time, on the basis of our experience, we are firmly convinced that a careful study of the different sources of systematic uncertainties is mandatory when dealing with machine-learning techniques and when the aim is to compare theoretical predictions with experiments. In fact, the computational cost of the proposed method is the price that we had to pay for the reliability of the results.

As a final remark we want to stress that in this paper we have taught a lesson to a broad audience of learning-machines. The subject of the lesson, the extraction of smeared spectral densities from lattice correlators, is just a particular topic. The idea of teaching systematically to a broad audience of machines is much more general and can be used to estimate reliably the systematic errors in many other applications.

ACKNOWLEDGMENTS

We warmly thank our colleagues of the ETMC for very useful discussions and for providing us the data of the lattice QCD correlator used in this study. We also thank L. Del Debbio, A. Lupo, M. Panero and M. Sbragaglia for enlightening discussions and B. Lucini for his extremely valuable comments on a preliminary version of the manuscript.

Appendix A: Results in exceptional cases

In this appendix we illustrate some additional validation tests that we performed by using mock data designed to challenge our ensembles of trained machines in extreme situations, i.e. in the case of unsmeared spectral densities that are very different from the ones used in the trainings. To this end, we considered the following unsmeared spectral densities

    ρ1(E) = 0.03 · δ(E − 0.6) + 0.05 · δ(E − 1.2) ,
    ρ2(E) = 0.03 · δ(E − 0.6) + 0.06 · δ(E − 1.8) ,
    ρ3(E) = 0.01 · δ(E − 0.6) − 0.01 · δ(E − 1.3) + 0.02 · δ(E − 2.2) ,    (A1)

where the arguments of the Dirac-delta functions are in GeV units. The above spectral densities correspond to either two or three very well separated Dirac-delta peaks and one of the peaks of ρ3(E) has a negative coefficient. The final predicted results, compared with the exact ones, are shown in FIG. 21 for σ = 0.44 GeV and σ = 0.63 GeV. The agreement with ρ̂σ^true(E) is excellent in all cases. As can be seen, a difference w.r.t. the other models considered in Sections V and VI is the size of the errors, which are much larger in these cases for σ = 0.44 GeV. This had to be expected and is a desired feature. Indeed, the inverse problem becomes harder when one tries to reconstruct sharp peaks with high resolution. The presence of statistical noise in the input data prevents any method from providing a very precise result, especially at high energies. Our neural network strategy, in these exceptional situations, provides a large error and this is a further reassuring evidence of the robustness of the proposed procedure.

FIG. 21. Predicted smeared spectral densities for the three models of unsmeared spectral densities given in Eq. (A1). The solid lines correspond to ρ̂σ^true(E). The vertical grey lines are in correspondence of the isolated Dirac-delta peaks and their height is proportional to the associated weights.

Appendix B: Dependence on the number of input time slices

In Section III we emphasized that the noise-to-signal ratio of generic lattice correlators grows exponentially with the Euclidean time (see FIG. 7) and that for this reason it might be convenient to reduce the number NT of components of the input vector C.

The approaches based on the Backus-Gilbert regularization method have a built-in mechanism to suppress noisy input data and by using these methods one can safely feed the whole information contained in the correlator into the algorithm. In general, this cannot be expected to be true when supervised deep-learning approaches are employed. Even though neural networks have been proven to be robust in handling noisy data, too much noise can have a bad impact on the performance, since a big fraction of the effort in the minimization algorithm is put into the suppression of the noise and into learning the distinction between outliers and effective information, rather than into learning general features of the problem.

In the following we investigate the dependence of the training performance, obtained by injecting the noise of the lattice correlator in the training sets as explained in subsection IV B, upon NT. In FIG. 22 we show the validation loss as a function of NT. As can be seen, the performance of the networks improves as NT increases but the validation loss reaches a saturation point around NT^sat = 40. On the one hand, NT^sat can be considered as the maximum number of time slices of the correlator from which meaningful physical information can be extracted. On the other hand, despite the corruption of the input data for NT > NT^sat, including more time slices does not compromise the quality of the trainings and we could safely set NT = T/a = 64.

FIG. 22. Validation losses at the end of the training for different numbers of time slices NT in the input data and for different training set sizes Nρ = 50 × 10³, 100 × 10³ and 200 × 10³. This test is performed with Nb = 512 and σ = 0.44 GeV. Each point with the associated error bar comes from the average of Nr = 20 trainings (see Section IV).
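The NT scan of FIG. 22 can be reproduced with a short Keras-style setup. The small fully connected architecture below is only a stand-in for the arcS/arcL architectures described in Section IV, and the training data X, Y are assumed to be prepared elsewhere; the sketch is meant to illustrate the truncation of the input correlator to its first NT time slices, not to reproduce our actual training code.

```python
import numpy as np
import tensorflow as tf

def validation_loss_vs_NT(X, Y, nt_values, epochs=50):
    """X: (n_samples, 64) noisy correlators; Y: (n_samples, n_energies) smeared densities.
    For each NT, train a small dense network on the first NT time slices and
    record the final validation loss (cf. FIG. 22)."""
    losses = {}
    for nt in nt_values:
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(nt,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(Y.shape[1]),
        ])
        model.compile(optimizer="adam", loss="mse")
        history = model.fit(X[:, :nt], Y, validation_split=0.2,
                            epochs=epochs, batch_size=256, verbose=0)
        losses[nt] = history.history["val_loss"][-1]
    return losses
```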

[1] J. C. A. Barata and K. Fredenhagen, Particle scattering in Euclidean lattice field theories, Commun. Math. Phys. 138, 507 (1991).
[2] M. Jarrell and J. E. Gubernatis, Bayesian inference and the analytic continuation of imaginary-time quantum Monte Carlo data, Phys. Rept. 269, 133 (1996).
[3] Y. Nakahara, M. Asakawa, and T. Hatsuda, Hadronic spectral functions in lattice QCD, Phys. Rev. D 60, 091503 (1999), arXiv:hep-lat/9905034.
[4] M. Asakawa, T. Hatsuda, and Y. Nakahara, Maximum entropy analysis of the spectral functions in lattice QCD, Prog. Part. Nucl. Phys. 46, 459 (2001), arXiv:hep-lat/0011040.
[5] G. Aarts and J. M. Martinez Resco, Transport coefficients, spectral functions and the lattice, JHEP 04, 053, arXiv:hep-ph/0203177.
[6] G. Aarts and J. M. Martinez Resco, Continuum and lattice meson spectral functions at nonzero momentum and high temperature, Nucl. Phys. B 726, 93 (2005), arXiv:hep-lat/0507004.
[7] M. Laine, How to compute the thermal quarkonium spectral function from first principles?, Nucl. Phys. A 820, 25C (2009), arXiv:0810.1112 [hep-ph].
[8] H. B. Meyer, Energy-momentum tensor correlators and spectral functions, JHEP 08, 031, arXiv:0806.3914 [hep-lat].
[9] Y. Burnier and A. Rothkopf, Bayesian Approach to Spectral Function Reconstruction for Euclidean Quantum Field Theories, Phys. Rev. Lett. 111, 182003 (2013), arXiv:1307.6106 [hep-lat].
[10] H. B. Meyer, Transport Properties of the Quark-Gluon Plasma: A Lattice QCD Perspective, Eur. Phys. J. A 47, 86 (2011), arXiv:1104.3708 [hep-lat].
[11] M. T. Hansen, H. B. Meyer, and D. Robaina, From deep inelastic scattering to heavy-flavor semileptonic decays: Total rates into multihadron final states from lattice QCD, Phys. Rev. D 96, 094513 (2017), arXiv:1704.08993 [hep-lat].
[12] R.-A. Tripolt, P. Gubler, M. Ulybyshev, and L. Von Smekal, Numerical analytic continuation of Euclidean data, Comput. Phys. Commun. 237, 129 (2019), arXiv:1801.10348 [hep-ph].
[13] J. Bulava and M. T. Hansen, Scattering amplitudes from finite-volume spectral functions, Phys. Rev. D 100, 034521 (2019), arXiv:1903.11735 [hep-lat].
[14] M. Hansen, A. Lupo, and N. Tantalo, Extraction of spectral densities from lattice correlators, Phys. Rev. D 99, 094508 (2019), arXiv:1903.06476 [hep-lat].
[15] L. Kades, J. M. Pawlowski, A. Rothkopf, M. Scherzer, J. M. Urban, S. J. Wetzel, N. Wink, and F. P. G. Ziegler, Spectral Reconstruction with Deep Neural Networks, Phys. Rev. D 102, 096001 (2020), arXiv:1905.04305 [physics.comp-ph].
[16] J. Karpie, K. Orginos, A. Rothkopf, and S. Zafeiropoulos, Reconstructing parton distribution functions from Ioffe time data: from Bayesian methods to Neural Networks, JHEP 04, 057, arXiv:1901.05408 [hep-lat].
[17] G. Bailas, S. Hashimoto, and T. Ishikawa, Reconstruction of smeared spectral function from Euclidean correlation functions, PTEP 2020, 043B07 (2020), arXiv:2001.11779 [hep-lat].
[18] P. Gambino and S. Hashimoto, Inclusive Semileptonic Decays from Lattice QCD, Phys. Rev. Lett. 125, 032001 (2020), arXiv:2005.13730 [hep-lat].
[19] M. Bruno and M. T. Hansen, Variations on the Maiani-Testa approach and the inverse problem, JHEP 06, 043, arXiv:2012.11488 [hep-lat].
[20] J. Horak, J. M. Pawlowski, J. Rodríguez-Quintero, J. Turnwald, J. M. Urban, N. Wink, and S. Zafeiropoulos, Reconstructing QCD spectral functions with Gaussian processes, Phys. Rev. D 105, 036014 (2022), arXiv:2107.13464 [hep-ph].
[21] J. Bulava, M. T. Hansen, M. W. Hansen, A. Patella, and N. Tantalo, Inclusive rates from smeared spectral densities in the two-dimensional O(3) non-linear σ-model, JHEP 07, 034, arXiv:2111.12774 [hep-lat].
[22] L. Wang, S. Shi, and K. Zhou, Automatic differentiation approach for reconstructing spectral functions with neural networks, in 35th Conference on Neural Information Processing Systems (2021) arXiv:2112.06206 [physics.comp-ph].
[23] S.-Y. Chen, H.-T. Ding, F.-Y. Liu, G. Papp, and C.-B. Yang, Machine learning Hadron Spectral Functions in Lattice QCD, PoS LATTICE2021, 148 (2022), arXiv:2112.00460 [hep-lat].
[24] S. Y. Chen, H. T. Ding, F. Y. Liu, G. Papp, and C. B. Yang, Machine learning spectral functions in lattice QCD, (2021), arXiv:2110.13521 [hep-lat].
[25] L. Wang, S. Shi, and K. Zhou, Reconstructing spectral functions via automatic differentiation, Phys. Rev. D 106, L051502 (2022), arXiv:2111.14760 [hep-ph].
[26] M. Zhou, F. Gao, J. Chao, Y.-X. Liu, and H. Song, Application of radial basis functions neutral networks in spectral functions, Phys. Rev. D 104, 076011 (2021), arXiv:2106.08168 [hep-ph].
[27] T. Lechien and D. Dudal, Neural network approach to reconstructing spectral functions and complex poles of confined particles, SciPost Phys. 13, 097 (2022), arXiv:2203.03293 [hep-lat].
[28] D. Boyda et al., Applications of Machine Learning to Lattice Quantum Field Theory, in 2022 Snowmass Summer Study (2022) arXiv:2202.05838 [hep-lat].
[29] S. Shi, L. Wang, and K. Zhou, Rethinking the ill-posedness of the spectral function reconstruction — Why is it fundamentally hard and how Artificial Neural Networks can help, Comput. Phys. Commun. 282, 108547 (2023), arXiv:2201.02564 [hep-ph].
[30] T. Bergamaschi, W. I. Jay, and P. R. Oare, Hadronic Structure, Conformal Maps, and Analytic Continuation, (2023), arXiv:2305.16190 [hep-lat].
[31] A. Rothkopf, Inverse problems, real-time dynamics and lattice simulations, EPJ Web Conf. 274, 01004 (2022), arXiv:2211.10680 [hep-lat].
[32] J. Bulava, The spectral reconstruction of inclusive rates, PoS LATTICE2022, 231 (2023), arXiv:2301.04072 [hep-lat].
[33] K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2, 359 (1989).
[34] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016), http://www.deeplearningbook.org.
[35] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2, 303 (1989).
[36] T. Pearce, F. Leibfried, and A. Brintrup, Uncertainty in neural networks: Approximately bayesian ensembling, in Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 108, edited by S. Chiappa and R. Calandra (PMLR, 2020) pp. 234–244.
[37] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning, Adaptive computation and machine learning (MIT Press, 2006) pp. I–XVIII, 1–248.
[38] K. P. Murphy, Machine learning: a probabilistic perspective (MIT Press, Cambridge, Mass., 2013).
[39] G. Backus and F. Gilbert, The resolving power of gross earth data, Geophys. J. Int. 16, 169 (1968).
[40] C. Alexandrou et al. (Extended Twisted Mass Collaboration (ETMC)), Probing the Energy-Smeared R Ratio Using Lattice QCD, Phys. Rev. Lett. 130, 241901 (2023), arXiv:2212.08467 [hep-lat].
[41] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86, 2278 (1998).
[42] F. Chollet et al., Keras, https://keras.io (2015).
[43] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems (2015), software available from tensorflow.org.
[44] J. P. Boyd, Chebyshev and Fourier spectral methods (Courier Corporation, 2001).
[45] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization (2017), arXiv:1412.6980 [cs.LG].
[46] D. Bernecker and H. B. Meyer, Vector Correlators in Lattice QCD: Methods and applications, Eur. Phys. J. A 47, 148 (2011), arXiv:1107.4388 [hep-lat].
[47] T. Aoyama et al., The anomalous magnetic moment of the muon in the Standard Model, Phys. Rept. 887, 1 (2020), arXiv:2006.04822 [hep-ph].
[48] C. Alexandrou et al., Lattice calculation of the short and intermediate time-distance hadronic vacuum polarization contributions to the muon magnetic moment using twisted-mass fermions, (2022), arXiv:2206.15084 [hep-lat].
