Hayou - 2023 - On The Connection Between Riemann Hypothesis and Neural Networks
Abstract
The Riemann hypothesis (RH) is a long-standing open problem in mathematics. It conjectures that
non-trivial zeros of the zeta function all lie on the line Re(z) = 1/2. The extent of the consequences
of RH is far-reaching and touches a wide spectrum of topics including the distribution of prime
numbers, the growth of arithmetic functions, the growth of Euler’s totient, etc. In this note, we
revisit and extend an old analytic criterion of the RH known as the Nyman-Beurling criterion
which connects the RH to a minimization problem that involves a special class of neural networks.
This note is intended for an audience unfamiliar with RH. A gentle introduction to RH is provided.
1. Introduction
The Riemann hypothesis conjectures that the non-trivial zeros of the Riemann zeta function are
located on the line $\mathrm{Re}(z) = \frac{1}{2}$ in the complex plane $\mathbb{C}$. This is a long-standing open problem in
number theory first formulated by Riemann (1859). The Riemann zeta function was first defined
for complex numbers $z$ with a real part greater than 1 by
$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}$, $z \in \mathbb{C}$, $\mathrm{Re}(z) > 1$.
However, it is the extension of the zeta function ζ to the whole complex plane C that is considered
in the statement of RH. This extension is called the analytic continuation of the zeta function
(details are provided in Appendix A).
There is strong empirical evidence that RH holds. Recent numerical verification by Platt and
Trudgian (2021) showed that RH is at least true in the region
$\{z = a + ib \in \mathbb{C} : a \in (0, 1), b \in (0, \gamma]\}$ where $\gamma = 3 \cdot 10^{12}$, meaning that all zeros of
the zeta function with imaginary parts in $(0, \gamma]$ have a real part equal to $\frac{1}{2}$. Several other
theoretical insights seem to support RH; we invite the reader to check Appendix A for a short
summary of relevant results and insights. In this note, we are interested in a specific criterion
for RH, i.e. an equivalent statement of RH. This criterion is known as the Nyman-Beurling
criterion (Nyman, 1950; Beurling, 1955), which states that RH holds if and only if a special class
of functions is dense in $L^2(0, 1)$. This class of functions can be seen as a special kind of neural
network with one-dimensional input. In this note, we show that the sufficient condition can be
easily extended to $L^2((0, 1)^d)$. Specifically, we introduce a new class of neural networks and show
that RH implies the density of this class in $L^2((0, 1)^d)$ for any $d \geq 2$. The necessary condition
in general dimension $d \geq 2$ remains an open question.
2. Riemann Hypothesis
The Riemann zeta function was originally defined for complex numbers z with a real part greater
than 1 by
$$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}, \quad z \in \mathbb{C},\ \mathrm{Re}(z) > 1. \tag{1}$$
H AYOU
The above definition of the Riemann zeta function excludes the region of interest
$\{z \in \mathbb{C} : \mathrm{Re}(z) = \frac{1}{2}\}$ since the series in Eq. (1) diverges when $\mathrm{Re}(z) \leq 1$. Indeed, RH is stated
for an extension of the zeta function to the whole complex plane $\mathbb{C}$. This extension is called the
analytic continuation, and
it is unique by the Identity theorem (Walz, 2017). To give the reader some intuition of how such
extension is defined, let us show how we can extend ζ to the region {z ∈ C : Re(z) > 0}. Observe
that the function ζ satisfies the following identity
$$(1 - 2^{1-z})\,\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z} - 2 \sum_{n=1}^{\infty} \frac{1}{(2n)^z} = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^z},$$
where the right hand side is defined for any complex number z such that Re(z) > 0. Using similar
techniques, we can show that for any z ∈ C such that Re(z) ∈ (0, 1),
$$\zeta(z) = 2^z \pi^{z-1} \sin\left(\frac{\pi z}{2}\right) \Gamma(1 - z)\, \zeta(1 - z), \tag{2}$$
which helps extend ζ to complex numbers with negative real part. A step by step explanation of the
analytic continuation of the ζ function is provided in Appendix A.
Zeros of the ζ function. From Eq. (2), we have ζ(−2k) = 0 for any integer k ≥ 1. The negative
even integers {−2k}k≥1 are thus called trivial zeros of the Riemann zeta function since the result
follows from the simple fact that sin (−πk) = 0 for all integers k ≥ 1. The other zeros of ζ are
called non-trivial zeros, and their properties remain poorly understood. The RH conjectures that
they all lie on the line $\mathrm{Re}(z) = \frac{1}{2}$.
Riemann Hypothesis (RH). All non-trivial zeros of ζ have a real part equal to $\frac{1}{2}$.
Whether RH holds is still an open question. The consequences of the Riemann hypothesis are
various (see Appendix A) and numerous equivalent results exist in the literature. In the next section,
we revisit an old analytic criterion of RH that involves a special type of functions that can be seen
as single layer neural networks.
where µ is the Lebesgue measure on $\mathbb{R}^d$. We denote by $\|\cdot\|_p$ the standard Lebesgue norm defined
by $\|f\|_p = \left( \int_S |f|^p \, d\mu \right)^{1/p}$ for $f \in L^p(S)$.
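For some $k \geq 1$, let $I_k \overset{\mathrm{def}}{=} (0, 1)^k = (0, 1) \times \cdots \times (0, 1)$, where the product contains $k$ terms. Let
ρ denote the fractional part function given by $\rho(x) = x - \lfloor x \rfloor$, for $x \in \mathbb{R}$. Consider the following
class of functions defined on the interval $I_1$:
$$\mathcal{N} = \Big\{ f(x) = \sum_{i=1}^{m} c_i \, \rho\Big(\frac{\beta_i}{x}\Big),\ x \in I_1 \ :\ m \geq 1,\ c \in \mathbb{R}^m,\ \beta \in I_m,\ c^T \beta = 0 \Big\}.$$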
R IEMANN H YPOTHESIS AND N EURAL N ETWORKS
Figure 1: (Left) The curves of the functions $x \mapsto \rho(\alpha/x)$ for $\alpha \in \{0.1, 0.5\}$. (Right) The graph of the
function $f(x) = \rho(0.7/x) - \rho(0.3/x) - 4\rho(0.1/x)$, which belongs to the class $\mathcal{N}$.
neuron values for different choices of $\beta_i$. The graphs show fluctuations when $x$ is close to 0, which
should be expected since the function $x \mapsto \rho(\beta_i/x)$ fluctuates indefinitely between 0 and 1 as $x$
goes to zero, whenever $\beta_i \neq 0$. In Fig. 1 (right), we show an example of a function from the
class $\mathcal{N}$ given by $f(x) = \rho(0.7/x) - \rho(0.3/x) - 4\rho(0.1/x)$. We observe that $f$ is a step function,
which might be surprising at first glance. However, it is easy to see that $\mathcal{N}$ consists only of step
functions. This is due to the constraint on the parameters $c, \beta$, and the fact that $\rho(x) = x - \lfloor x \rfloor$:
the terms $c_i \beta_i / x$ sum to $(c^T \beta)/x = 0$, so $f(x) = -\sum_{i=1}^{m} c_i \lfloor \beta_i / x \rfloor$, which is piecewise constant.
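The step-function property can be checked numerically. Since $\rho(x) = x - \lfloor x \rfloor$, the $1/x$ terms cancel whenever $c^T \beta = 0$, leaving only the floor terms. The Python sketch below evaluates the example of Fig. 1 (right) both ways. The helper names `rho`, `f_in_N` and `f_as_floors` are illustrative choices, not taken from the paper.

```python
import math

def rho(x):
    """Fractional part rho(x) = x - floor(x)."""
    return x - math.floor(x)

def f_in_N(x, c, beta):
    """Evaluate f(x) = sum_i c_i * rho(beta_i / x) for x in (0, 1)."""
    return sum(ci * rho(bi / x) for ci, bi in zip(c, beta))

def f_as_floors(x, c, beta):
    """Same function written as -sum_i c_i * floor(beta_i / x).

    This form is only valid under the constraint c^T beta = 0.
    """
    return -sum(ci * math.floor(bi / x) for ci, bi in zip(c, beta))

# Example from Fig. 1 (right): f(x) = rho(0.7/x) - rho(0.3/x) - 4 rho(0.1/x).
c = [1.0, -1.0, -4.0]
beta = [0.7, 0.3, 0.1]

# The constraint c^T beta = 0 holds up to floating-point rounding.
constraint = sum(ci * bi for ci, bi in zip(c, beta))

# Compare both expressions on a grid that stays away from x = 0.
grid = [0.01 + 0.001 * k for k in range(990)]
max_gap = max(abs(f_in_N(x, c, beta) - f_as_floors(x, c, beta)) for x in grid)
print(constraint, max_gap)
```

Both quantities printed at the end should be at the level of floating-point rounding, which is consistent with $f$ taking only finitely many values on any interval bounded away from 0.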
Now, we are ready to state the main results that draw an interesting connection between RH and
the class N .
Beurling (1955) later extended this result by showing that for any $p > 1$, the ζ function has no
zeros in the set $\{z \in \mathbb{C} : \mathrm{Re}(z) > 1/p\}$ if and only if the set $\mathcal{N}$ is dense in $L^p(I_1)$.
Theorem 2 (Beurling (1955)) The Riemann zeta function is free from zeros in the half plane
$\mathrm{Re}(z) > \frac{1}{p}$, $1 < p < \infty$, if and only if $\mathcal{N}$ is dense in $L^p(I_1)$.
The intuition behind this connection is rather simple. The number of fluctuations of the function
x → ρ(β/x) near 0 is closely related to the ζ function. To understand the machinery of the proofs
of Theorem 1 and Theorem 2, we provide a sketch of the proof by Beurling (1955) for the sufficient
condition in Appendix B. Using the same techniques, we derive the following result on zero-free
regions of the zeta function.
The condition that $\mathcal{N}$ should be dense in $L^2(I_1)$ can be replaced by the following weaker condition:
the constant function 1 on $I_1$ can be approximated up to arbitrary accuracy with functions from
$\mathcal{N}$. This is because from the constant function 1, one can construct an approximation of any
stepwise function, which in turn can approximate any function in $L^2(I_1)$.
A discussion on the empirical implications of Theorem 1 is provided in Appendix B. In the next
section, we show that the sufficient condition of Theorem 2 can be easily generalized to networks
with multi-dimensional inputs, i.e. the case d ≥ 1.
$$\mathcal{N}_d = \Big\{ f(x) = \sum_{j=1}^{d} \sum_{i=1}^{m} c_{i,j} \, \rho\Big(\frac{\beta_{i,j}}{x_j}\Big),\ x \in I_d \ :\ m \geq 1,\ c \in \mathbb{R}^{d \times m},\ \beta \in I_{d \times m},\ c^T \beta = 0 \Big\},$$
where $c = (c_{1,1}, c_{2,1}, \ldots, c_{m,1}, \ldots, c_{m,d})^\top \in \mathbb{R}^{md}$ is the flattened vector of $(c_{\cdot,j})_{1 \leq j \leq d}$. Notice
that we recover the Nyman-Beurling class $\mathcal{N}$ when $d = 1$. Using this class, we can generalize the
zero-free region result given by Theorem 1 to a multi-dimensional setting in the case $p = 2$.
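To make the indexing in $\mathcal{N}_d$ concrete, here is a minimal Python sketch that evaluates a member of the class and checks that $d = 1$ recovers the one-dimensional example of Fig. 1. The nested-list layout (`c[j][i]` paired with `beta[j][i]`) and the function names are assumptions made for illustration only.

```python
import math

def rho(x):
    """Fractional part rho(x) = x - floor(x)."""
    return x - math.floor(x)

def f_in_Nd(x, c, beta):
    """Evaluate f(x) = sum_j sum_i c[j][i] * rho(beta[j][i] / x[j]).

    x is a sequence of d coordinates in (0, 1); c and beta are lists of d
    rows, one row of m entries per input coordinate.
    """
    return sum(
        c[j][i] * rho(beta[j][i] / x[j])
        for j in range(len(x))
        for i in range(len(c[j]))
    )

def constraint_value(c, beta):
    """Flattened inner product c^T beta, which must vanish for f to be in N_d."""
    return sum(cj_i * bj_i for cj, bj in zip(c, beta) for cj_i, bj_i in zip(cj, bj))

# A hypothetical d = 2, m = 2 parameter choice satisfying c^T beta = 0.
c2 = [[1.0, -1.0], [2.0, -1.0]]
beta2 = [[0.6, 0.2], [0.1, 0.6]]
print(constraint_value(c2, beta2), f_in_Nd([0.5, 0.25], c2, beta2))

# With d = 1 the same code evaluates the one-dimensional class N.
c1 = [[1.0, -1.0, -4.0]]
beta1 = [[0.7, 0.3, 0.1]]
print(f_in_Nd([0.5], c1, beta1))
```

Note that the constraint couples all $d \times m$ parameters at once. It does not require the per-coordinate sums $\sum_i c_{i,j} \beta_{i,j}$ to vanish individually.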
The result of Theorem 4 has an important implication for the choice of the sample size. Indeed, to
have the coefficient $\delta_N^{1/d}$ of order $\epsilon$ with high probability, a necessary condition is that $N = O(\epsilon^{-2d})$.
When is the multi-dimensional variant better than the one-dimensional criterion? For some
$d \geq 2$, it is straightforward that the multi-dimensional criterion given in Lemma 2 is better than the
one given in Lemma 1 only if $\inf_{f \in \mathcal{N}_d} \|1 - f\|_2^{1/d} < \inf_{f \in \mathcal{N}} \|1 - f\|_2$. Under this condition, the
zero-free region is larger with $d \geq 2$. For empirical verification of the RH and for the same probability
threshold α, Lemma 4 implies that the multi-dimensional setting is better than the one-dimensional
counterpart whenever $\inf_{f \in \mathcal{N}_d} \Delta_N(f)^{1/d} < \inf_{f \in \mathcal{N}} \Delta_N(f)$. We discuss the feasibility of such
conditions in the next paragraph.
What does it take to improve upon existing numerical verifications of RH? The high-probability
zero-free regions from Lemma 4 are of the form $\{\mathrm{Re}(z) > \frac{1}{2}(1 + \Delta |z|^2)\}$ for some constant
$\Delta > 0$.
Using a different analytical criterion of the RH, Platt and Trudgian (2021) showed that the region
$\{a + ib : a \in (0, 1), b \in (0, \gamma], \gamma \approx 3 \cdot 10^{12}\}$ is free of zeros of ζ. Hence, using Lemma 4 to
improve this result requires that the region $R_N \cap \{a + ib : a \in (0, 1), b \in (0, \gamma]\}$ contains complex
numbers $z$ with imaginary part larger than order $10^{12}$. Let $z = a + ib \in \mathbb{C}$. Having $z \in R_N$ implies
that $b^2 < -a^2 + \Delta_N(f)^{-1/d} (2a - 1)$. For the region of interest where $a \in (0, 1)$, and assuming that
$\Delta_N(f)$ is small enough, the right-hand side is of order $\Delta_N(f)^{-1/d} (2a - 1)$, which is maximized for
$a = 1$ and equal to $\Delta_N(f)^{-1/d}$. Thus, to improve upon existing work (Platt and Trudgian, 2021)
(at least with some high-probability certificate), we need $\Delta_N(f)^{-1/d}$ to be of order $10^{12}$, which
means that $\Delta_N(f)^{1/d}$ should be at most of order $10^{-12}$. This requires minimizing the
empirical risk $N^{-1} \sum_{i=1}^{N} (1 - f(x_i))^2$ with a minimum sample size of order $10^{24}$, which is infeasible
with current compute resources.
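The two ingredients of this discussion can be sketched in a few lines of Python: the empirical risk $N^{-1} \sum_i (1 - f(x_i))^2$ for a function $f \in \mathcal{N}$, and a membership test for a region of the form $\{\mathrm{Re}(z) > \frac{1}{2}(1 + \Delta^{1/d} |z|^2)\}$. The exact definition of $\Delta_N(f)$ is not reproduced in this excerpt, so the sketch simply takes $\Delta$ as a user-supplied number. The uniform sampling scheme, the seed, and all function names are assumptions for illustration.

```python
import math
import random

def rho(x):
    """Fractional part rho(x) = x - floor(x)."""
    return x - math.floor(x)

def f_in_N(x, c, beta):
    """Evaluate f(x) = sum_i c_i * rho(beta_i / x)."""
    return sum(ci * rho(bi / x) for ci, bi in zip(c, beta))

def empirical_risk(c, beta, xs):
    """Empirical risk N^{-1} * sum_i (1 - f(x_i))^2 over the sample xs."""
    return sum((1.0 - f_in_N(x, c, beta)) ** 2 for x in xs) / len(xs)

def in_region(z, delta, d=1):
    """True if Re(z) > (1 + delta^(1/d) * |z|^2) / 2.

    delta stands in for the constant of the zero-free region; how it is
    obtained from data is outside the scope of this sketch.
    """
    z = complex(z)
    return z.real > 0.5 * (1.0 + delta ** (1.0 / d) * abs(z) ** 2)

# Assumed sampling scheme: i.i.d. uniform points on (0, 1].
rng = random.Random(0)
xs = [1.0 - rng.random() for _ in range(10_000)]

# Empirical risk of the example function from Fig. 1 (right).
risk = empirical_risk([1.0, -1.0, -4.0], [0.7, 0.3, 0.1], xs)
print(risk)

# The region shrinks towards the real axis as |z| grows for fixed delta.
print(in_region(1.0, 0.01), in_region(1 + 20j, 0.01))
```

The membership test makes the trade-off visible: for a fixed $\Delta$, points with large imaginary part fall outside the region, so reaching large heights requires a very small $\Delta$.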
References
A. Beurling. A closure problem related to the Riemann Zeta-function. Proceedings of the National
Academy of Sciences of the United States of America, 41 (5):312–314, 1955.
P. Borwein. An efficient algorithm for the Riemann Zeta function. CECM-95-043, 1995.
A.W. Dudek. On the Riemann hypothesis and the difference between primes. International Journal
of Number Theory, 11(03):771–778, 2015.
H.M. Edwards. Riemann’s Zeta Function. Pure and Applied Mathematics, A Series of Monographs
and Textbooks, 1974.
G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers. Oxford University Press,
1938.
B. Nyman. On the one-Dimensional translation group and semi-group in certain function spaces.
1950.
D. Platt and T. Trudgian. The Riemann hypothesis is true up to 3·10^12. Bulletin of the London
Mathematical Society, 53(3):792–797, 2021.
B. Riemann. Ueber die Anzahl der Primzahlen unter einer gegebenen Grosse. Gesammelte math.
Werke und wissensch, 2:145–155, 1859.
J. Sondow. Zeros of the Alternating Zeta Function on the Line R(s) = 1. The American Mathematical
Monthly, 110(5):435–437, 2003.
E.C. Titchmarsh. The theory of the Riemann Zeta-function (2nd ed.). The Clarendon Press Oxford
University, 1986.
There is strong empirical evidence that RH holds. Recent numerical verification by Platt and
Trudgian (2021) showed that RH is at least true in the region
$\{z = a + ib \in \mathbb{C} : a \in (0, 1), b \in (0, \gamma]\}$ where $\gamma = 3 \cdot 10^{12}$, meaning that all zeros of
the zeta function with imaginary parts in $(0, \gamma]$ have a real part equal to $\frac{1}{2}$. Other theoretical
insights seem to support RH. For instance, French mathematician A. Denjoy gave the following
probabilistic argument for RH (mentioned in Edwards (1974)): if $(\mu(k))_{k \geq 0}$ is a sequence of
independent Bernoulli random variables with values +1 or −1 (each with probability 1/2), then for
any $\varepsilon > 0$, we have $\sum_{k \leq x} \mu(k) = O_{x \to \infty}(x^{1/2 + \varepsilon})$ with probability 1. This statement is closely
related to RH, as shown by British mathematician J.E. Littlewood in 1912 (mentioned in Titchmarsh
(1986)). Indeed, Littlewood showed that RH is equivalent to having
$\sum_{k \leq x} \mu(k) = O_{x \to \infty}(x^{1/2 + \varepsilon})$ for all $\varepsilon > 0$, where µ is the Möbius function, a function
with values in {−1, 0, 1} that gives the parity of the number of prime factors in the prime
decomposition of integers (see e.g. Hardy and Wright (1938) for details about the Möbius function).
Hence, with respect to Denjoy's argument, RH (informally) suggests that the sequence of Möbius
function values µ(k) behaves like a random walk. This provides a probabilistic interpretation of RH
but it does not constitute a proof. There are many consequences of RH, and probably the most
important of these concerns the distribution of prime numbers. Recent work by Dudek (2015) showed
that RH implies that for any real number $x \geq 2$, there exists a prime number $p$ such that
$x - \frac{4}{\pi} \sqrt{x} \log(x) \leq p \leq x$.
Another consequence of RH concerns the growth rate of the Mertens function defined
by $M(x) = \sum_{k \leq x} \mu(k)$, where µ is the Möbius function; RH implies that the Mertens function
satisfies $M(x) = O_{x \to \infty}(x^{1/2 + \varepsilon})$ for all $\varepsilon > 0$. Such functions are ubiquitous in number theory and
quantifying their growth rate has several applications. Another intriguing consequence of RH is
given by the Nyman-Beurling criterion (Nyman, 1950; Beurling, 1955), which states that RH holds
if and only if a special class of functions is dense in $L^2(0, 1)$. This class consists of neural networks
with one-dimensional input and has a special parameterization.
The Riemann hypothesis conjectures that the non-trivial zeros of the Riemann zeta function are
complex numbers with a real part equal to $\frac{1}{2}$.² It is a long-standing open problem in number theory first
formulated by Riemann (1859). The Riemann zeta function is defined for complex numbers $z$ with
a real part greater than 1 by
$$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}, \quad z \in \mathbb{C},\ \mathrm{Re}(z) > 1. \tag{3}$$
The above definition of the Riemann zeta function excludes the region of interest
$\{z \in \mathbb{C} : \mathrm{Re}(z) = \frac{1}{2}\}$ since the series in Eq. (3) diverges when $\mathrm{Re}(z) \leq 1$. Indeed, RH is stated
for the analytic continuation
of the zeta function which is an extension of the zeta function on a larger set. As shown by Riemann,
the function ζ extends to the whole complex plane C while preserving some desirable properties
such as analyticity. This extension is called the analytic continuation, and it is unique by the Identity
theorem (Walz, 2017). Let us construct this analytic continuation of ζ step by step.
2. Trivial zeros are negative even numbers, see below for more details.
1. Extension to {z ∈ C : Re(z) > 0}. Observe that the function ζ satisfies the following
identity
$$(1 - 2^{1-z})\,\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z} - 2 \sum_{n=1}^{\infty} \frac{1}{(2n)^z} = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^z},$$
where the right hand side is defined for any complex number $z$ such that $\mathrm{Re}(z) > 0$. The
series $\eta(z) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^z}$, known as the Dirichlet eta function, is defined for complex
numbers $z$ satisfying $\mathrm{Re}(z) > 0$. Using this expression, we can extend the definition of ζ to
the set $\{z \in \mathbb{C} : \mathrm{Re}(z) > 0,\ 2^{1-z} \neq 1\}$ by
$$\zeta(z) = \frac{1}{1 - 2^{1-z}} \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^z}. \tag{4}$$
This definition extends the original domain of definition of ζ to all complex numbers such that
$\mathrm{Re}(z) > 0$ except for those satisfying $2^{1-z} = 1$, which are all of the form $z_n = 1 + i \frac{2\pi n}{\log(2)}$,
where $n$ is a non-zero integer. Using classical properties of the Dirichlet eta function η
(Borwein, 1995), also known as the alternating zeta function, the zeta function
ζ can be analytically continued to include the set $\{z_n : n \neq 0\}$ (Widder, 1941; Sondow, 2003).
2. Extension to $\{z \in \mathbb{C} : \mathrm{Re}(z) \leq 0\} \setminus \{0\}$. The Dirichlet eta function η satisfies the following
functional equation (Borwein, 1995)
$$\eta(z) = 2\, \frac{2^{z-1} - 1}{2^{z} - 1}\, \pi^{z-1}\, z \sin\left(\frac{\pi z}{2}\right) \Gamma(-z)\, \eta(1 - z), \tag{5}$$
where Γ is the Gamma function. Using Eq. (5), Eq. (4), and the property of the Gamma
function $\Gamma(z + 1) = z\Gamma(z)$ for all complex numbers $z \in \mathbb{C}$ (Γ is the analytic continuation of
the original Gamma function defined on $\{z \in \mathbb{C}, \mathrm{Re}(z) > 0\}$ by $\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} \, dt$), we
obtain that for any $z \in \mathbb{C}$ such that $\mathrm{Re}(z) \in (0, 1)$,
$$\zeta(z) = 2^z \pi^{z-1} \sin\left(\frac{\pi z}{2}\right) \Gamma(1 - z)\, \zeta(1 - z). \tag{6}$$
Using the functional equation Eq. (6), we can extend the definition of ζ to all the remaining
complex numbers $z$ such that $\mathrm{Re}(z) \leq 0$ and $z \neq 0$. It can further be shown that ζ can be
continued to 0 with $\zeta(0) = -1/2$ (Borwein, 1995).
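The two extension steps above can be sketched numerically in Python. Step 1 evaluates ζ through the eta series of Eq. (4), and step 2 applies the reflection formula of Eq. (6) at real negative arguments. The truncation length and the averaging of two consecutive partial sums (a standard way to speed up an alternating series) are choices made for this sketch, and the function names are illustrative.

```python
import math

def eta(z, terms=100_000):
    """Dirichlet eta function via its alternating series, for Re(z) > 0.

    The last step averages two consecutive partial sums, which reduces the
    truncation error of the alternating series.
    """
    total = 0.0
    sign = 1.0
    for n in range(1, terms + 1):
        total += sign * n ** (-z)
        sign = -sign
    return total + 0.5 * sign * (terms + 1) ** (-z)

def zeta_right(z, terms=100_000):
    """Step 1: zeta(z) = eta(z) / (1 - 2^(1-z)) for Re(z) > 0, 2^(1-z) != 1."""
    denom = 1 - 2 ** (1 - z)
    if abs(denom) < 1e-12:
        raise ValueError("2^(1-z) = 1: Eq. (4) does not apply at this point")
    return eta(z, terms) / denom

def zeta_left(z):
    """Step 2: reflection formula of Eq. (6), restricted here to real z < 0."""
    return (2 ** z * math.pi ** (z - 1) * math.sin(math.pi * z / 2)
            * math.gamma(1 - z) * zeta_right(1 - z))

print(zeta_right(2))    # compare with pi^2 / 6
print(zeta_right(0.5))  # a point inside the critical strip
print(zeta_left(-1))    # compare with -1/12
print(zeta_left(-2))    # a trivial zero, up to rounding
```

At $z = -2$ the factor $\sin(\pi z / 2)$ vanishes (up to rounding), which is how the trivial zeros arise from the functional equation.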
Zeros of the ζ function. From Eq. (6), the zeta function satisfies $\zeta(-2k) = 0$ for any integer
$k \geq 1$. The negative even integers $\{-2k\}_{k \geq 1}$ are thus called trivial zeros of the Riemann zeta
function since the result follows from the simple fact that $\sin(-\pi k) = 0$ for all integers $k \geq 1$. The
other zeros of ζ are called non-trivial zeros, and their properties remain poorly understood. The
RH conjectures that they all lie on the line $\mathrm{Re}(z) = \frac{1}{2}$.
Riemann Hypothesis (RH). All non-trivial zeros of the Riemann zeta function ζ have a real part
equal to $\frac{1}{2}$.
Whether RH holds is still an open question. However, there have been a number of attempts to
prove or disprove RH in the literature. In the next section, we re-visit an old result that provides
an analytic point of view of RH.
Beurling (1955) later extended this result by showing that for any $p > 1$, the ζ function has no
zeros in the set $\{z \in \mathbb{C} : \mathrm{Re}(z) > 1/p\}$ if and only if the set $\mathcal{N}$ is dense in $L^p(I_1)$.
Theorem 2 (Beurling (1955)) The Riemann zeta function is free from zeros in the half plane
$\mathrm{Re}(z) > \frac{1}{p}$, $1 < p < \infty$, if and only if $\mathcal{N}$ is dense in $L^p(I_1)$.
The following is a sketch of the proof by Beurling (1955). It helps understand the machinery of
the proofs of Theorem 1 and Theorem 2 for the sufficient condition.
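Sketch of the proof. The connection between RH and the class $\mathcal{N}$ is due to the following
identity that relates the zeta function ζ to the fractional part function ρ:³
$$\int_0^1 \rho\Big(\frac{\theta}{x}\Big) x^{z-1} \, dx = \frac{\theta}{z - 1} - \frac{\theta^z \zeta(z)}{z}, \quad \forall z \in \mathbb{C},\ \mathrm{Re}(z) > 0,\ \theta \in I. \tag{7}$$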
0 x z−1 z
ζ(z) m z−1
Z 1 P
z−1 i=1 ci βi
f (x)x dx = − , ∀z ∈ C, Re(z) > 0. (8)
0 z
Now fix $p > 1$ and assume that the class $\mathcal{N}$ is dense in $L^p(I)$. Therefore, given $\varepsilon > 0$, there exists a
function $f \in \mathcal{N}$ such that $\|1 - f\|_p < \varepsilon$, where 1 is the constant function on $I$ equal to 1 everywhere.
Using Eq. (8), we obtain
$$\int_0^1 (1 - f(x))\, x^{z-1} \, dx = \frac{1}{z} \Big( 1 + \zeta(z) \sum_{i=1}^{m} c_i \beta_i^{z} \Big), \quad \forall z \in \mathbb{C},\ \mathrm{Re}(z) > 0.$$
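Using Hölder's inequality with $q$ the conjugate exponent of $p$ (i.e. $1/p + 1/q = 1$), we have for any $z$
such that $\mathrm{Re}(z) > 1/p$
$$\Big| \int_0^1 (1 - f(x))\, x^{z-1} \, dx \Big| \leq \|1 - f\|_p \, \|x^{z-1}\|_q < \frac{\varepsilon}{\left( q (\mathrm{Re}(z) - 1/p) \right)^{1/q}}.$$
If $\zeta(z) = 0$, the left-hand side equals $1/|z|$, which forces $\mathrm{Re}(z) < 1/p + \varepsilon^q |z|^q / q$.
Hence, in the region $\{z \in \mathbb{C}, \mathrm{Re}(z) > 1/p + \varepsilon^q |z|^q / q\}$, $\zeta(z)$ cannot be equal to zero. Since ε is
arbitrarily chosen, we conclude that $\zeta(z) \neq 0$ if $\mathrm{Re}(z) > 1/p$.
For the necessary condition, we invite the reader to check Beurling (1955).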
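Identity (7), on which the sketch above rests, can be sanity-checked numerically for a real exponent. The Python sketch below compares a midpoint-rule approximation of the left-hand side with the closed form on the right-hand side at $z = 2$, where $\zeta(2) = \pi^2/6$ is known exactly. The quadrature rule, the grid size, and the function names are choices made for this illustration. The integrand has infinitely many jumps near 0, so only moderate accuracy should be expected.

```python
import math

def rho(x):
    """Fractional part rho(x) = x - floor(x)."""
    return x - math.floor(x)

def lhs_midpoint(theta, z, n=200_000):
    """Midpoint-rule approximation of int_0^1 rho(theta / x) * x^(z-1) dx."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += rho(theta / x) * x ** (z - 1)
    return h * total

def rhs_closed_form(theta, z, zeta_z):
    """Right-hand side of Eq. (7): theta / (z - 1) - theta^z * zeta(z) / z."""
    return theta / (z - 1) - theta ** z * zeta_z / z

zeta_2 = math.pi ** 2 / 6  # known value of zeta at z = 2
for theta in (0.3, 0.5):
    approx = lhs_midpoint(theta, 2.0)
    exact = rhs_closed_form(theta, 2.0, zeta_2)
    print(theta, approx, exact)
```

The two columns should agree to several digits. The factor $x^{z-1}$ damps the oscillations of $\rho(\theta/x)$ near 0, which is why a plain quadrature rule is adequate at $z = 2$.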
3. After multiple attempts, we could not find the original paper where this identity first appeared. However, it has
been mentioned in different works, e.g. (Nyman, 1950; Beurling, 1955).
The identity (7) is the nub of the proof above. It provides an integral representation of the zeta
function ζ in terms of the fractional part function ρ. Hence, one would expect that some properties
of the zeta function should in principle be reflected in some function classes involving the function ρ.
This is precisely the idea behind Theorems 1 and 2. More importantly, from the analysis above, we
have the following analytic criterion for zero-free regions of the zeta function.
Lemma 1 (Nyman-Beurling zero-free regions) Let $f \in \mathcal{N}$ and $\delta = \|1 - f\|_2$ be the distance
between the constant function 1 on $I$ and $f$. Then, the region $\{z \in \mathbb{C}, \mathrm{Re}(z) > \frac{1}{2}(1 + \delta^2 |z|^2)\}$ is
$$\mathcal{N}_d = \Big\{ f(x) = \sum_{j=1}^{d} \sum_{i=1}^{m} c_{i,j} \, \rho\Big(\frac{\beta_{i,j}}{x_j}\Big),\ x \in I_d \ :\ m \geq 1,\ c \in \mathbb{R}^{d \times m},\ \beta \in I_{d \times m},\ c^T \beta = 0 \Big\},$$
where $c = (c_{1,1}, c_{2,1}, \ldots, c_{m,1}, \ldots, c_{m,d}) \in \mathbb{R}^{md}$ is the flattened vector of $(c_{\cdot,j})_{1 \leq j \leq d}$. Notice that
we recover the Nyman-Beurling class when $d = 1$. Using this class, we can generalize the zero-free
region result given by Theorem 1 to a multi-dimensional setting.
function ζ.
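Proof Let $d \geq 1$ and $f \in \mathcal{N}_d$. Using Eq. (7), namely
$\int_0^1 \rho\big(\frac{\alpha}{x}\big) x^{z-1} \, dx = \frac{\alpha}{z-1} - \frac{\alpha^z \zeta(z)}{z}$ for
$\alpha \in (0, 1)$, $\mathrm{Re}(z) > 1/2$, we have that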
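$$\int_{I_d} f(x) \prod_{j=1}^{d} x_j^{z-1} \, dx = z^{-d+1} \sum_{j=1}^{d} \sum_{i=1}^{m} c_{i,j} \Big( \frac{\beta_{i,j}}{z - 1} - \frac{\beta_{i,j}^{z} \zeta(z)}{z} \Big) = -z^{-d} \sum_{j=1}^{d} \sum_{i=1}^{m} c_{i,j} \beta_{i,j}^{z} \, \zeta(z),$$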
Therefore, we have
$$\int_{I_d} (1 - f(x)) \prod_{j=1}^{d} x_j^{z-1} \, dx = z^{-d} \Big( 1 + \zeta(z) \sum_{j=1}^{d} \sum_{i=1}^{m} c_{i,j} \beta_{i,j}^{z} \Big). \tag{9}$$
where we have used the fact that $\big\| \prod_{j=1}^{d} x_j^{z-1} \big\|_2^2 = (2\mathrm{Re}(z) - 1)^{-d}$. Therefore, for all complex
numbers $z$ satisfying $\mathrm{Re}(z) > \frac{1}{2} + \frac{\delta^{2/d} |z|^2}{2}$, we have $\zeta(z) \neq 0$. This is true since we can choose $f$
such that $\|1 - f\|$ is arbitrarily close to δ.
Notice that if δ can be chosen arbitrarily small, then the zero-free region in Theorem 2 can be
extended to the whole half-plane {Re(z) > 1/2}. This is a generalization of the sufficient condition
of Theorem 2 in the multi-dimensional case.