Hayou - 2023 - On The Connection Between Riemann Hypothesis and Neural Networks


On the Connection Between Riemann Hypothesis and a Special Class of Neural Networks


Soufiane Hayou hayou@berkeley.edu
Simons Institute
UC Berkeley
arXiv:2309.09171v1 [stat.ML] 17 Sep 2023

Abstract
The Riemann hypothesis (RH) is a long-standing open problem in mathematics. It conjectures that the non-trivial zeros of the zeta function all lie on the line Re(z) = 1/2. The consequences of RH are far-reaching and touch a wide spectrum of topics, including the distribution of prime numbers, the growth of arithmetic functions, and the growth of Euler's totient function. In this note, we revisit and extend an old analytic criterion of RH, known as the Nyman-Beurling criterion, which connects RH to a minimization problem involving a special class of neural networks. This note is intended for an audience unfamiliar with RH; a gentle introduction to RH is provided.

1. Introduction
The Riemann hypothesis conjectures that the non-trivial zeros of the Riemann zeta function are located on the line Re(z) = 1/2 in the complex plane C. This is a long-standing open problem in number theory, first formulated by Riemann (1859). The Riemann zeta function was first defined for complex numbers z with real part greater than 1 by ζ(z) = Σ_{n=1}^∞ 1/n^z, z ∈ C, Re(z) > 1. However, it is the extension of ζ to the whole complex plane C that is considered in the statement of RH. This extension is called the analytic continuation of the zeta function (details are provided in Appendix A).
There is strong empirical evidence that RH holds. A recent numerical verification by Platt and Trudgian (2021) showed that RH is true at least in the region {z = a + ib ∈ C : a ∈ (0, 1), b ∈ (0, γ]} with γ = 3·10^12, meaning that all zeros of the zeta function with imaginary part in (0, γ] have real part equal to 1/2. Several other theoretical insights seem to support RH; we invite the reader to check Appendix A for a short summary of relevant results and insights. In this note, we are interested in a specific criterion for RH, i.e., a statement equivalent to RH. This criterion is known as the Nyman-Beurling criterion (Nyman, 1950; Beurling, 1955), which states that RH holds if and only if a special class of functions is dense in L²(0, 1). This class of functions can be seen as a special kind of neural network with one-dimensional input. In this note, we show that the sufficient condition can be easily extended to L²((0, 1)^d). Specifically, we introduce a new class of neural networks and show that, for any d ≥ 2, the density of this class in L²((0, 1)^d) implies RH. The necessary condition in general dimension d ≥ 2 remains an open question.

2. Riemann Hypothesis
The Riemann zeta function was originally defined for complex numbers z with real part greater than 1 by

$$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}, \quad z \in \mathbb{C},\ \mathrm{Re}(z) > 1. \tag{1}$$
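To make Eq. (1) concrete, here is a small Python sketch (our own illustration, not code from this note) that sums the series directly for Re(z) > 1. The function name `zeta_dirichlet`, the default number of terms, and the Euler-Maclaurin tail correction are assumptions chosen for accuracy; the closed forms ζ(2) = π²/6 and ζ(4) = π⁴/90 serve as references.

```python
import math


def zeta_dirichlet(z: complex, n_terms: int = 10_000) -> complex:
    """Approximate zeta(z) for Re(z) > 1 with the series of Eq. (1).

    A first-order Euler-Maclaurin estimate of the tail sum_{n > N} n^{-z}
    is added, so a few thousand terms already give many correct digits.
    """
    if z.real <= 1:
        raise ValueError("the series in Eq. (1) only converges for Re(z) > 1")
    partial = sum(n ** (-z) for n in range(1, n_terms + 1))
    tail = n_terms ** (1 - z) / (z - 1) - n_terms ** (-z) / 2
    return partial + tail


print(zeta_dirichlet(2), math.pi ** 2 / 6)
print(zeta_dirichlet(4), math.pi ** 4 / 90)
print(zeta_dirichlet(3 + 2j))
```

The guard on Re(z) mirrors the restriction in Eq. (1): for Re(z) ≤ 1 the series diverges and a different representation is needed.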


The above definition of the Riemann zeta function excludes the region of interest {z ∈ C : Re(z) = 1/2}, since the series in Eq. (1) diverges whenever Re(z) ≤ 1. Indeed, RH is stated for an extension of the zeta function to the whole complex plane C. This extension is called the analytic continuation, and it is unique by the identity theorem (Walz, 2017). To give the reader some intuition of how such an extension is defined, let us show how ζ can be extended to the region {z ∈ C : Re(z) > 0}. Observe that the function ζ satisfies the following identity
$$(1 - 2^{1-z})\,\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z} - 2\sum_{n=1}^{\infty} \frac{1}{(2n)^z} = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^z},$$

where the right-hand side is defined for any complex number z such that Re(z) > 0. Using similar techniques, we can show that for any z ∈ C such that Re(z) ∈ (0, 1),

$$\zeta(z) = 2^{z}\pi^{z-1}\sin\Big(\frac{\pi z}{2}\Big)\Gamma(1-z)\,\zeta(1-z), \tag{2}$$

which helps extend ζ to complex numbers with negative real part. A step-by-step explanation of the analytic continuation of ζ is provided in Appendix A.
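The continuation steps above can be checked numerically for real z ∈ (0, 1). The sketch below is an illustration under our own assumptions: it evaluates ζ through the alternating series on the right-hand side of the identity above (averaging two consecutive partial sums to speed up convergence) and compares ζ(s) with the right-hand side of Eq. (2). Function names and term counts are hypothetical choices, and only the standard-library `math.gamma` is used, so complex arguments are not covered.

```python
import math


def eta(s: float, n_terms: int = 200_000) -> float:
    """Dirichlet eta function sum (-1)^(n+1) / n^s for real s > 0.

    The alternating partial sums oscillate around the limit, so averaging two
    consecutive partial sums removes most of the truncation error.
    """
    partial, prev = 0.0, 0.0
    for n in range(1, n_terms + 1):
        prev = partial
        term = n ** -s
        partial += term if n % 2 == 1 else -term
    return 0.5 * (partial + prev)


def zeta_via_eta(s: float) -> float:
    """zeta(s) = eta(s) / (1 - 2^(1-s)) for real s > 0, s != 1."""
    return eta(s) / (1.0 - 2.0 ** (1.0 - s))


def functional_equation_rhs(s: float) -> float:
    """Right-hand side of Eq. (2): 2^s pi^(s-1) sin(pi s / 2) Gamma(1-s) zeta(1-s)."""
    return (2.0 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2)
            * math.gamma(1.0 - s) * zeta_via_eta(1.0 - s))


for s in (0.3, 0.5, 0.8):
    print(s, zeta_via_eta(s), functional_equation_rhs(s))
```

Both sides agree to several digits, which is what Eq. (2) predicts; at s = 1/2 the equation is symmetric and the check is trivial, so the points s = 0.3 and s = 0.8 are the informative ones.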
Zeros of the ζ function. From Eq. (2), we have ζ(−2k) = 0 for any integer k ≥ 1. The negative even integers {−2k}_{k≥1} are thus called the trivial zeros of the Riemann zeta function, since this follows from the simple fact that sin(−πk) = 0 for all integers k ≥ 1. The other zeros of ζ are called non-trivial zeros, and their properties remain poorly understood. RH conjectures that they all lie on the line Re(z) = 1/2.
Riemann Hypothesis (RH). All non-trivial zeros of ζ have real part equal to 1/2.
Whether RH holds is still an open question. The consequences of RH are numerous (see Appendix A), and many equivalent formulations exist in the literature. In the next section, we revisit an old analytic criterion of RH that involves a special type of functions that can be seen as single-layer neural networks.

2.1. A Neural Network Criterion for RH


For p > 1, d ∈ N\{0}, and some set S ⊂ R^d, let L^p(S) denote the set of real-valued functions f defined on S such that |f|^p is Lebesgue integrable, i.e., L^p(S) = {f : S → R : ∫_S |f|^p dµ < ∞}, where µ is the Lebesgue measure on R^d. We denote by ∥·∥_p the standard Lebesgue norm defined by ∥f∥_p = (∫_S |f|^p dµ)^{1/p} for f ∈ L^p(S).
For k ≥ 1, let I_k := (0, 1)^k = (0, 1) × ··· × (0, 1), where the product contains k factors. Let ρ denote the fractional part function given by ρ(x) = x − ⌊x⌋ for x ∈ R. Consider the following class of functions defined on the interval I_1:

$$\mathcal{N} = \Big\{ f(x) = \sum_{i=1}^{m} c_i\, \rho\Big(\frac{\beta_i}{x}\Big),\ x \in I_1 \;:\; m \ge 1,\ c \in \mathbb{R}^m,\ \beta \in I_m,\ c^\top \beta = 0 \Big\}.$$

In machine learning nomenclature, N consists of single-layer neural networks with a constrained parameter space and a specific non-linearity (or activation function) given by the fractional part ρ. The parameters (c, β) belong to the set {c ∈ R^m, β ∈ (0, 1)^m, c^⊤β = 0}, and the values (ρ(β_i/x))_{1≤i≤m} act as the neurons (post-activations) of the network. In Fig. 1 (left), we depict neuron values for different choices of β_i. The graphs fluctuate when x is close to 0, which should be expected, since the function x ↦ ρ(β_i/x) oscillates indefinitely between 0 and 1 as x goes to zero whenever β_i ≠ 0. In Fig. 1 (right), we show an example of a function from the class N given by f(x) = ρ(0.7/x) − ρ(0.3/x) − 4ρ(0.1/x). We observe that f is a step function, which might be surprising at first glance. However, every function in N is a step function. This is due to the constraint on the parameters (c, β) and the fact that ρ(t) = t − ⌊t⌋: for f ∈ N, we have f(x) = (Σ_i c_iβ_i)/x − Σ_i c_i⌊β_i/x⌋ = −Σ_i c_i⌊β_i/x⌋, which is piecewise constant in x.

[Figure 1: panels plotting ρ(0.1/x), ρ(0.5/x), and f(x) against x ∈ (0, 1).]
Figure 1: (Left) The curve of the function x ↦ ρ(α/x) for α ∈ {0.1, 0.5}. (Right) The graph of the function f(x) = ρ(0.7/x) − ρ(0.3/x) − 4ρ(0.1/x), which belongs to the class N.
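The class N and the step-function identity above can be sketched in a few lines of Python. This is an illustrative implementation, not the author's code; `nb_network` and `step_form` are hypothetical names, and the admissibility checks simply encode the constraints β ∈ (0, 1)^m and c^⊤β = 0 from the definition of N.

```python
import math
from typing import Sequence


def rho(t: float) -> float:
    """Fractional part rho(t) = t - floor(t), the activation of the class N."""
    return t - math.floor(t)


def nb_network(x: float, c: Sequence[float], beta: Sequence[float],
               tol: float = 1e-12) -> float:
    """Evaluate f(x) = sum_i c_i * rho(beta_i / x) for x in (0, 1).

    Raises ValueError when (c, beta) leaves the admissible parameter set
    {beta in (0, 1)^m, c^T beta = 0} that defines the class N.
    """
    if len(c) != len(beta):
        raise ValueError("c and beta must both have length m")
    if not all(0.0 < b < 1.0 for b in beta):
        raise ValueError("every beta_i must lie in (0, 1)")
    if abs(sum(ci * bi for ci, bi in zip(c, beta))) > tol:
        raise ValueError("the constraint c^T beta = 0 is violated")
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie in (0, 1)")
    return sum(ci * rho(bi / x) for ci, bi in zip(c, beta))


def step_form(x: float, c: Sequence[float], beta: Sequence[float]) -> float:
    """Equivalent piecewise-constant form -sum_i c_i * floor(beta_i / x)."""
    return -sum(ci * math.floor(bi / x) for ci, bi in zip(c, beta))


# Example of Fig. 1 (right): f(x) = rho(0.7/x) - rho(0.3/x) - 4 rho(0.1/x).
c, beta = [1.0, -1.0, -4.0], [0.7, 0.3, 0.1]
for x in (0.2, 0.5, 0.9):
    print(x, nb_network(x, c, beta), step_form(x, c, beta))
```

Because both functions call `math.floor` on the same quotients β_i/x, their difference is the floating-point value of (Σ_i c_iβ_i)/x, which vanishes up to rounding when c^⊤β = 0.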
Now, we are ready to state the main results that draw an interesting connection between RH and
the class N .

Theorem 1 (Nyman (1950)) The RH is true if and only if N is dense in L2 (I1 ).

Beurling (1955) later extended this result by showing that, for any p > 1, the ζ function has no zeros in the set {z ∈ C : Re(z) > 1/p} if and only if the set N is dense in L^p(I_1).

Theorem 2 (Beurling (1955)) The Riemann zeta function is free from zeros in the half-plane Re(z) > 1/p, 1 < p < ∞, if and only if N is dense in L^p(I_1).

The intuition behind this connection is the following: the Mellin-type integral ∫_0^1 ρ(β/x) x^{z−1} dx of each neuron x ↦ ρ(β/x) can be written explicitly in terms of ζ(z) (see Eq. (7) in Appendix B), so the fluctuations of ρ(β/x) near 0 encode information about the ζ function. To illustrate the machinery of the proofs of Theorem 1 and Theorem 2, we provide a sketch of the proof by Beurling (1955) for the sufficient condition in Appendix B. Using the same techniques, we derive the following result on zero-free regions of the zeta function.

Lemma 1 (Nyman-Beurling zero-free regions) Let f ∈ N and let δ = ∥1 − f∥_2 be the distance between the constant function 1 on I_1 and f. Then the region {z ∈ C : Re(z) > ½(1 + δ²|z|²)} is free of zeros of the Riemann zeta function ζ.
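As an illustration of Lemma 1, the following sketch estimates δ = ∥1 − f∥_2 for the function of Fig. 1 with a midpoint rule and tests whether a point lies in the resulting region. The quadrature, its resolution, and the helper names are our own assumptions; the estimate is approximate, not a certified bound.

```python
import math


def rho(t: float) -> float:
    """Fractional part rho(t) = t - floor(t)."""
    return t - math.floor(t)


def f_example(x: float) -> float:
    """The member of N shown in Fig. 1: rho(0.7/x) - rho(0.3/x) - 4 rho(0.1/x)."""
    return rho(0.7 / x) - rho(0.3 / x) - 4.0 * rho(0.1 / x)


def l2_distance_to_one(f, n_cells: int = 200_000) -> float:
    """Midpoint-rule estimate of delta = ||1 - f||_2 on I_1 = (0, 1)."""
    h = 1.0 / n_cells
    total = sum((1.0 - f((k + 0.5) * h)) ** 2 for k in range(n_cells))
    return math.sqrt(h * total)


def in_lemma1_region(z: complex, delta: float) -> bool:
    """True if Re(z) > (1 + delta^2 |z|^2) / 2, the zero-free region of Lemma 1."""
    return z.real > 0.5 * (1.0 + delta ** 2 * abs(z) ** 2)


delta = l2_distance_to_one(f_example)
print("delta =", delta)
print(in_lemma1_region(complex(0.9, 0.0), delta))  # no certificate from this f
print(in_lemma1_region(complex(0.9, 0.0), 0.1))    # a hypothetical, much smaller delta
```

For this particular f the certified region is empty. Indeed, f vanishes on (0.7, 1) and equals −1 on (0.5, 0.7], so ∥1 − f∥_2² ≥ 0.3 + 0.8 = 1.1, and whenever δ ≥ 1 we have ½(1 + δ²|z|²) ≥ ½(1 + |z|²) ≥ |z| ≥ Re(z). Useful regions require approximations of 1 with δ well below 1.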

The condition that N be dense in L²(I_1) can be replaced by the following weaker condition: the constant function 1 on I_1 can be approximated to arbitrary accuracy by functions from N. This is because, from approximations of the constant function 1, one can construct approximations of any step function, which in turn can approximate any function in L²(I_1).
A discussion of the empirical implications of Theorem 1 is provided in Appendix B. In the next section, we show that the sufficient condition of Theorem 2 can be easily generalized to networks with multi-dimensional inputs, i.e., the case d ≥ 2.


3. A sufficient condition in the multi-dimensional case


Let d ≥ 1 and consider the following class of neural networks with inputs in I_d:

$$\mathcal{N}_d = \Big\{ f(x) = \sum_{j=1}^{d}\sum_{i=1}^{m} c_{i,j}\, \rho\Big(\frac{\beta_{i,j}}{x_j}\Big),\ x \in I_d \;:\; m \ge 1,\ c \in \mathbb{R}^{d \times m},\ \beta \in I_{d \times m},\ c^\top \beta = 0 \Big\},$$

where c = (c_{1,1}, c_{2,1}, …, c_{m,1}, …, c_{m,d})^⊤ ∈ R^{md} is the flattened vector of (c_{·,j})_{1≤j≤d}, and β is flattened in the same way. Notice that we recover the Nyman-Beurling class N when d = 1. Using this class, we can generalize the zero-free region result given by Lemma 1 to a multi-dimensional setting in the case p = 2.¹
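The additive structure of N_d can be made explicit with a short sketch, written under our own conventions: `c` and `beta` are stored as m-by-d nested lists, the constraint c^⊤β = 0 is the inner product over all m·d entries, and the names `nd_network` and `coordinate_part` are hypothetical. The example parameters are made up so that the constraint holds jointly while neither column satisfies it on its own.

```python
import math
from typing import Sequence


def rho(t: float) -> float:
    """Fractional part rho(t) = t - floor(t)."""
    return t - math.floor(t)


def nd_network(x: Sequence[float], c: Sequence[Sequence[float]],
               beta: Sequence[Sequence[float]], tol: float = 1e-12) -> float:
    """Evaluate f(x) = sum_j sum_i c[i][j] * rho(beta[i][j] / x[j]) on I_d.

    c and beta are m-by-d nested lists: row i indexes neurons, column j indexes
    input coordinates. The constraint c^T beta = 0 is taken over all m*d
    entries (flattened inner product), so it couples the coordinates.
    """
    m, d = len(c), len(x)
    inner = sum(c[i][j] * beta[i][j] for i in range(m) for j in range(d))
    if abs(inner) > tol:
        raise ValueError("the constraint c^T beta = 0 is violated")
    return sum(c[i][j] * rho(beta[i][j] / x[j]) for i in range(m) for j in range(d))


def coordinate_part(xj: float, j: int, c, beta) -> float:
    """The one-dimensional summand f_j(x_j) = sum_i c[i][j] * rho(beta[i][j] / x_j)."""
    return sum(c[i][j] * rho(beta[i][j] / xj) for i in range(len(c)))


# Hypothetical d = 2, m = 2 example: 0.6 + 0.3 - 2 * 0.25 - 0.4 = 0, while the
# column-wise inner products are 0.1 and -0.1.
c = [[1.0, 1.0], [-2.0, -1.0]]
beta = [[0.6, 0.3], [0.25, 0.4]]
x = [0.35, 0.8]
print(nd_network(x, c, beta))
print(coordinate_part(x[0], 0, c, beta) + coordinate_part(x[1], 1, c, beta))
```

Each network in N_d is therefore a sum of d one-dimensional functions Σ_i c_{i,j}ρ(β_{i,j}/x_j), but only the joint constraint is imposed, so the individual summands need not belong to N.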

Lemma 2 (zero-free regions for general d ≥ 1) Let d ≥ 1 and f ∈ N_d. Let δ = ∥1 − f∥_2 be the L² distance between the constant function 1 on I_d and f. Then the region {z ∈ C : Re(z) > ½(1 + δ^{2/d}|z|²)} is free of zeros of the Riemann zeta function ζ.
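A minimal sketch of the region in Lemma 2, assuming a given value of δ: the helper `max_certified_height` (a hypothetical name) solves Re(z) > ½(1 + δ^{2/d}|z|²) for the imaginary part along a vertical line Re(z) = a, which gives b² < (2a − 1)/δ^{2/d} − a².

```python
import math


def in_lemma2_region(z: complex, delta: float, d: int) -> bool:
    """Zero-free region of Lemma 2: Re(z) > (1 + delta^(2/d) |z|^2) / 2."""
    return z.real > 0.5 * (1.0 + delta ** (2.0 / d) * abs(z) ** 2)


def max_certified_height(delta: float, d: int, a: float) -> float:
    """Supremum of b such that a + ib lies in the Lemma 2 region (0.0 if none).

    Re(z) > (1 + k |z|^2) / 2 with k = delta^(2/d) is equivalent to
    b^2 < (2a - 1) / k - a^2.
    """
    k = delta ** (2.0 / d)
    b_squared = (2.0 * a - 1.0) / k - a * a
    return math.sqrt(b_squared) if b_squared > 0.0 else 0.0


# Same approximation error delta, different input dimensions d.
for d in (1, 2, 3):
    print(d, max_certified_height(1e-4, d, a=0.99))
```

With δ fixed, increasing d shrinks the certified height because δ^{2/d} > δ² whenever 0 < δ < 1; the multi-dimensional case only pays off if N_d approximates 1 sufficiently better than N.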
d

In Fig. 2, we depict the zero-free regions from Lemma 2. The smaller the constant δ, the larger the region. The multi-dimensional input case (d ≥ 2) can therefore be interesting if we can better approximate the constant function 1 with functions from N_d. More precisely, the result of Lemma 2 is relevant if, for some d ≥ 2, we can find f ∈ N_d whose error δ satisfies δ^{1/d} < δ_1 (equivalently δ^{2/d} < δ_1²), where δ_1 is the approximation error in the one-dimensional case d = 1. In this case, the zero-free region obtained with d ≥ 2 is larger than the one obtained with d = 1. We refer the reader to Section 4 for a more in-depth discussion of the empirical implications of the multi-dimensional case. Notice that if δ can be chosen arbitrarily small, then the zero-free region in Lemma 2 can be extended to the whole half-plane {Re(z) > 1/2}. This is a generalization of the sufficient condition of Theorem 2 to the multi-dimensional case.

[Figure 2: zero-free regions in the (Re(z), Im(z)) plane, with Re(z) from 0.4 to 1.0 and Im(z) from 0 to 100, for Δ ∈ {1e-4, 1e-3, 1e-2, 1e-1}, together with the critical line Re(z) = 1/2.]
Figure 2: Zero-free regions of the form {Re(z) > ½(1 + Δ|z|²)}, as stated in Lemma 1, Lemma 2, and Lemma 4.

Corollary 3 (Sufficient condition for d ≥ 1) Let d ≥ 1 and assume that the class N_d is dense in L²(I_d). Then the region {Re(z) > 1/2} is free of zeros of the Riemann zeta function ζ.

3.1. Open problem: The necessary condition for d ≥ 2


By considering the class N_d, we generalized the sufficient condition of Beurling's criterion to the multi-dimensional input case d ≥ 2. However, it is unclear whether a similar necessary condition holds. Proving that RH implies the density of N_d in L²(I_d) is challenging. A function f ∈ N_d can be expressed as f(x) = Σ_{i=1}^d f_i(x_i) for x = (x_1, …, x_d)^⊤ ∈ I_d, where the f_i are functions of one-dimensional inputs. This special additive form of functions from N_d makes it harder to use arguments similar to those of the one-dimensional case (Theorem 2) to prove density results.
1. The choice of p = 2 is arbitrary; a result similar to that of Theorem 2 can be obtained for any p > 1.


4. Discussion on the Implications and Limitations


In this section, we discuss some empirical implications of Lemma 1 and Lemma 2.
Probabilistic zero-free regions. Notice that Lemmas 1 and 2 require access to the distance ∥1 − f∥_2, which is generally intractable. However, we can approximate this quantity using Monte Carlo samples and obtain high-probability bounds for this norm. Hence, the best we can do with such a criterion is to certify the absence of zeros of ζ in some region with high probability. Indeed, using Hoeffding's inequality, we have the following result.

Lemma 4 Let d ≥ 1, N ≥ 1, and let X_1, X_2, …, X_N be iid random variables uniformly distributed on I_d. Let f ∈ N_d (where for d = 1 we write N_1 = N) be such that f(x) = Σ_{j=1}^d Σ_{i=1}^m c_{i,j} ρ(β_{i,j}/x_j) for all x ∈ I_d, for some m ≥ 1, β ∈ I_{m×d}, and c ∈ R^{m×d}. Then, for any α ∈ (0, 1), with probability at least 1 − α, the region R_N := {Re(z) > ½(1 + Δ_N(f)^{1/d}|z|²)} is free of zeros of ζ, where

$$\Delta_N(f) = \frac{1}{N}\sum_{i=1}^{N}\big(1 - f(X_i)\big)^2 + (1 + \|c\|_1)^2\sqrt{\frac{2\log(2/\alpha)}{N}}, \qquad \|c\|_1 = \sum_{j=1}^{d}\sum_{i=1}^{m}|c_{i,j}|.$$

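Proof The proof follows from a simple application of Hoeffding's concentration inequality to control the deviations of the empirical risk N^{−1} Σ_{i=1}^N (1 − f(X_i))² from its mean ∥1 − f∥_2². Hoeffding's inequality requires the random variables (1 − f(X_i))² to be bounded, which is straightforward: since ρ takes values in [0, 1), we have |f(X_i)| ≤ ∥c∥_1, hence (1 − f(X_i))² ≤ 2(1 + f(X_i)²) ≤ 2(1 + ∥c∥_1²) almost surely. On the resulting event of probability at least 1 − α, we have ∥1 − f∥_2² ≤ Δ_N(f), so R_N is contained in the zero-free region of Lemma 2.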
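The quantity Δ_N(f) of Lemma 4 is straightforward to estimate. The sketch below is our own illustration: it draws uniform samples with Python's `random.Random` (seeded for reproducibility), evaluates a network from N_d given as m-by-d nested lists, and adds the deviation term (1 + ∥c∥_1)² √(2 log(2/α)/N). The names and the default sample size are assumptions, and the constraint c^⊤β = 0 is assumed to hold rather than checked.

```python
import math
import random
from typing import List


def rho(t: float) -> float:
    """Fractional part rho(t) = t - floor(t)."""
    return t - math.floor(t)


def network(x: List[float], c: List[List[float]], beta: List[List[float]]) -> float:
    """f(x) = sum_{i,j} c[i][j] * rho(beta[i][j] / x[j]); c^T beta = 0 is assumed."""
    return sum(c[i][j] * rho(beta[i][j] / x[j])
               for i in range(len(c)) for j in range(len(x)))


def delta_n(c, beta, d: int, n_samples: int, alpha: float, seed: int = 0) -> float:
    """Delta_N(f) of Lemma 4: empirical risk plus the Hoeffding deviation term."""
    rng = random.Random(seed)
    risk = 0.0
    for _ in range(n_samples):
        # 1 - U with U uniform on [0, 1) lies in (0, 1], which avoids x_j = 0.
        x = [1.0 - rng.random() for _ in range(d)]
        risk += (1.0 - network(x, c, beta)) ** 2
    risk /= n_samples
    c_l1 = sum(abs(v) for row in c for v in row)
    slack = (1.0 + c_l1) ** 2 * math.sqrt(2.0 * math.log(2.0 / alpha) / n_samples)
    return risk + slack


def in_certified_region(z: complex, delta_n_value: float, d: int) -> bool:
    """Membership in R_N = {Re(z) > (1 + Delta_N(f)^(1/d) |z|^2) / 2}."""
    return z.real > 0.5 * (1.0 + delta_n_value ** (1.0 / d) * abs(z) ** 2)


# d = 1, the network of Fig. 1: f(x) = rho(0.7/x) - rho(0.3/x) - 4 rho(0.1/x).
c1, beta1 = [[1.0], [-1.0], [-4.0]], [[0.7], [0.3], [0.1]]
value = delta_n(c1, beta1, d=1, n_samples=50_000, alpha=0.05)
print(value, in_certified_region(complex(0.9, 0.0), value, d=1))
```

For the Fig. 1 network, Δ_N(f) exceeds 1, so R_N is empty (as for Lemma 1, a coefficient Δ_N(f)^{1/d} ≥ 1 rules out every z); the code only shows how the certificate would be computed.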

The result of Lemma 4 has an important implication for the choice of the sample size. Indeed, the deviation term in Δ_N(f) decays only as N^{−1/2}, so to have the coefficient Δ_N(f)^{1/d} of order ε with high probability, a necessary condition is that N be at least of order ε^{−2d}.
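The sample-size requirement can be quantified with a one-line formula. Considering only the deviation term of Δ_N(f) (the empirical risk can only add to it), Δ_N(f)^{1/d} ≤ ε forces N ≥ 2 log(2/α)(1 + ∥c∥_1)⁴/ε^{2d}. The sketch below (function name hypothetical) returns log₁₀ of this bound to avoid floating-point overflow.

```python
import math


def log10_min_samples(eps: float, d: int, c_l1: float, alpha: float) -> float:
    """log10 of the smallest N for which the Hoeffding term alone is <= eps^d.

    Delta_N(f) >= (1 + ||c||_1)^2 * sqrt(2 log(2/alpha) / N), so
    Delta_N(f)^(1/d) <= eps requires
    N >= 2 log(2/alpha) * (1 + ||c||_1)^4 / eps^(2d).
    Working in log10 avoids overflow and underflow for large d.
    """
    return (math.log10(2.0 * math.log(2.0 / alpha))
            + 4.0 * math.log10(1.0 + c_l1)
            - 2.0 * d * math.log10(eps))


for d in (1, 2, 4):
    print(d, log10_min_samples(eps=1e-3, d=d, c_l1=6.0, alpha=0.05))
```

The exponent grows linearly in d for a fixed target ε, which is the ε^{−2d} scaling stated above.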
When is the multi-dimensional variant better than the one-dimensional criterion? For some d ≥ 2, it follows directly that the multi-dimensional criterion given in Lemma 2 is better than the one given in Lemma 1 only if inf_{f∈N_d} ∥1 − f∥_2^{1/d} < inf_{f∈N} ∥1 − f∥_2. Under this condition, the zero-free region is larger with d ≥ 2. For empirical verification of RH and for the same probability threshold α, Lemma 4 implies that the multi-dimensional setting is better than its one-dimensional counterpart whenever inf_{f∈N_d} Δ_N(f)^{1/d} < inf_{f∈N} Δ_N(f). We discuss the feasibility of such conditions in the next paragraph.
What does it take to improve upon existing numerical verifications of RH? The high-probability zero-free regions from Lemma 4 are of the form {Re(z) > ½(1 + Δ|z|²)} for some constant Δ > 0.
Using a different analytical criterion of RH, Platt and Trudgian (2021) showed that all zeros of ζ in the region {a + ib : a ∈ (0, 1), b ∈ (0, γ], γ ≈ 3·10^12} lie on the line Re(z) = 1/2. Hence, using Lemma 4 to improve this result requires the region R_N to contain complex numbers z = a + ib with a ∈ (0, 1) and imaginary part b larger than γ. Let z = a + ib ∈ C. Having z ∈ R_N implies that b² < −a² + Δ_N(f)^{−1/d}(2a − 1). For the region of interest where a ∈ (0, 1), and assuming that Δ_N(f) is small enough, the right-hand side is of order Δ_N(f)^{−1/d}(2a − 1), which is maximized as a → 1, where it is approximately Δ_N(f)^{−1/d}. Since this bound constrains b² rather than b, improving upon existing work (Platt and Trudgian, 2021), even with a high-probability certificate, requires Δ_N(f)^{−1/d} to be at least of order γ² ≈ 10^25, i.e., Δ_N(f)^{1/d} must be at most of order 10^{−25}. By the sample-size requirement above, this involves minimizing the empirical risk N^{−1} Σ_{i=1}^N (1 − f(X_i))² with a sample size of order at least 10^50 for d = 1 (and larger for d ≥ 2), which is infeasible with current compute resources.
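The arithmetic above can be reproduced with a short sketch. Here `delta_pow` stands for Δ_N(f)^{1/d}, `certified_height` (a hypothetical name) is the supremum of b over R_N with a ∈ (0, 1), and γ = 3·10^12 is the height from Platt and Trudgian (2021).

```python
import math

GAMMA = 3e12  # height verified by Platt and Trudgian (2021)


def certified_height(delta_pow: float) -> float:
    """Supremum of Im(z) over R_N with Re(z) in (0, 1).

    delta_pow stands for Delta_N(f)^(1/d). From b^2 < (2a - 1) / delta_pow - a^2,
    the supremum is approached as a -> 1, where b^2 tends to 1 / delta_pow - 1.
    """
    return math.sqrt(max(1.0 / delta_pow - 1.0, 0.0))


# Smallest Delta_N(f)^(1/d) whose certified region reaches height GAMMA.
needed = 1.0 / (GAMMA ** 2 + 1.0)
print(f"Delta_N(f)^(1/d) must be about {needed:.1e}")
print(f"height reached: {certified_height(needed):.3e}")
# With d = 1 the Hoeffding term forces N of order needed^(-2).
print(f"N must be of order 10^{2 * math.log10(1.0 / needed):.0f}")
```

For comparison, Δ_N(f)^{1/d} = 10^{−12} only certifies heights up to about 10^6, because the bound constrains b² rather than b.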


References
A. Beurling. A closure problem related to the Riemann Zeta-function. Proceedings of the National
Academy of Sciences of the United States of America, 41 (5):312–314, 1955.

P. Borwein. An efficient algorithm for the Riemann Zeta function. CECM-95-043, 1995.

A.W. Dudek. On the Riemann hypothesis and the difference between primes. International Journal
of Number Theory, 11(03):771–778, 2015.

H.M. Edwards. Riemann’s Zeta Function. Pure and Applied Mathematics, A Series of Monographs
and Textbooks, 1974.

G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers. Oxford University Press,
1938.

B. Nyman. On the one-dimensional translation group and semi-group in certain function spaces.
1950.

D. Platt and T. Trudgian. The Riemann hypothesis is true up to 3·10^12. Bulletin of the London
Mathematical Society, 53(3):792–797, 2021.

B. Riemann. Ueber die Anzahl der Primzahlen unter einer gegebenen Grosse. Gesammelte math.
Werke und wissensch, 2:145–155, 1859.

J. Sondow. Zeros of the Alternating Zeta Function on the Line Re(s) = 1. The American Mathematical
Monthly, 110(5):435–437, 2003.

E.C. Titchmarsh. The theory of the Riemann Zeta-function (2nd ed.). The Clarendon Press Oxford
University, 1986.

G. Walz. Lexikon der Mathematik. Springer Spektrum Verlag, 2017.

D.V. Widder. Laplace Transform. Princeton Mathematical Series, 1941.


Appendix A. RH step by step

There is strong empirical evidence that RH holds. A recent numerical verification by Platt and Trudgian (2021) showed that RH is true at least in the region {z = a + ib ∈ C : a ∈ (0, 1), b ∈ (0, γ]} with γ = 3·10^12, meaning that all zeros of the zeta function with imaginary part in (0, γ] have real part equal to 1/2. Other theoretical insights seem to support RH. For instance, the French mathematician A. Denjoy gave the following probabilistic argument for RH (mentioned in Edwards (1974)): if (µ(k))_{k≥0} is a sequence of independent random variables taking the values +1 or −1, each with probability 1/2, then for any ε > 0 we have Σ_{k≤x} µ(k) = O_{x→∞}(x^{1/2+ε}) with probability 1. This statement is closely related to RH, as shown by the British mathematician J.E. Littlewood in 1912 (mentioned in Titchmarsh (1986)). Indeed, Littlewood showed that RH is equivalent to having Σ_{k≤x} µ(k) = O_{x→∞}(x^{1/2+ε}) for all ε > 0, where µ is the Möbius function, a function with values in {−1, 0, 1}: µ(k) = 0 if k is divisible by the square of a prime, and otherwise µ(k) = (−1)^r, where r is the number of prime factors of k, so that µ encodes the parity of r (see, e.g., Hardy and Wright (1938) for details about the Möbius function). Hence, in view of Denjoy's argument, RH (informally) suggests that the Möbius function µ behaves like a random walk. This provides a probabilistic interpretation of RH, but it does not constitute a proof. There are many consequences of RH, probably the most important of which concerns the distribution of prime numbers. Recent work by Dudek (2015) showed that RH implies that for any real number x ≥ 2, there exists a prime number p such that x − (4/π)√x log(x) ≤ p ≤ x. Another consequence of RH concerns the growth rate of the Mertens function, defined by M(x) = Σ_{k≤x} µ(k), where µ is the Möbius function: RH implies that M(x) = O_{x→∞}(x^{1/2+ε}) for all ε > 0. Such functions are ubiquitous in number theory, and quantifying their growth rate has several applications. Another intriguing consequence of RH is given by the Nyman-Beurling criterion (Nyman, 1950; Beurling, 1955), which states that RH holds if and only if a special class of functions is dense in L²(0, 1). This class consists of neural networks with one-dimensional input and a special parameterization.
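Several objects in this paragraph are easy to explore numerically. The following sketch, written as an illustration with hypothetical helper names, computes the Möbius function with a linear sieve, the Mertens function M(x), and a seeded ±1 random walk for comparison, and checks the interval from Dudek (2015) at integer points x ≤ 10^5. Such finite computations are consistent with RH but, of course, prove nothing about it.

```python
import math
import random
from typing import List, Tuple


def mobius_and_primes(n: int) -> Tuple[List[int], List[int]]:
    """Möbius values mu[0..n] (mu[0] unused) and the primes <= n, via a linear sieve."""
    mu = [0] * (n + 1)
    mu[1] = 1
    composite = [False] * (n + 1)
    primes: List[int] = []
    for i in range(2, n + 1):
        if not composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > n:
                break
            composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0  # p^2 divides i * p
                break
            mu[i * p] = -mu[i]  # one more distinct prime factor flips the sign
    return mu, primes


def mertens(mu: List[int]) -> List[int]:
    """M[x] = sum_{k <= x} mu[k] for 0 <= x < len(mu)."""
    out, running = [0] * len(mu), 0
    for k in range(1, len(mu)):
        running += mu[k]
        out[k] = running
    return out


def dudek_interval_holds(n: int, primes: List[int]) -> bool:
    """For every integer 2 <= x <= n, check that some prime p satisfies
    x - (4/pi) sqrt(x) log(x) <= p <= x."""
    idx, largest = 0, None
    for x in range(2, n + 1):
        while idx < len(primes) and primes[idx] <= x:
            largest = primes[idx]
            idx += 1
        if largest is None or largest < x - (4 / math.pi) * math.sqrt(x) * math.log(x):
            return False
    return True


n = 100_000
mu, primes = mobius_and_primes(n)
M = mertens(mu)
rng = random.Random(0)
walk, max_ratio_walk = 0, 0.0
for k in range(1, n + 1):
    walk += rng.choice((-1, 1))
    max_ratio_walk = max(max_ratio_walk, abs(walk) / math.sqrt(k))
max_ratio_mertens = max(abs(M[x]) / math.sqrt(x) for x in range(2, n + 1))
print("max |M(x)|/sqrt(x), 2 <= x <= n:", max_ratio_mertens)
print("max |random walk|/sqrt(x):", max_ratio_walk)
print("Dudek interval holds up to n:", dudek_interval_holds(n, primes))
```

Both ratios stay of order 1 over this range, which is the informal "random walk" behaviour described above; the Mertens values involve no randomness, while the walk depends on the seed.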

The Riemann hypothesis conjectures that the non-trivial zeros of the Riemann zeta function are complex numbers with real part 1/2.² It is a long-standing open problem in number theory, first formulated by Riemann (1859). The Riemann zeta function is defined for complex numbers z with real part greater than 1 by

$$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}, \quad z \in \mathbb{C},\ \mathrm{Re}(z) > 1. \tag{3}$$

The above definition of the Riemann zeta function excludes the region of interest {z ∈ C : Re(z) = 1/2}, since the series in Eq. (3) diverges whenever Re(z) ≤ 1. Indeed, RH is stated for the analytic continuation of the zeta function, which is an extension of the zeta function to a larger set. As shown by Riemann, the function ζ extends to the whole complex plane C (except for a simple pole at z = 1) while preserving some desirable properties such as analyticity. This extension is called the analytic continuation, and it is unique by the identity theorem (Walz, 2017). Let us construct this analytic continuation of ζ step by step.

2. Trivial zeros are the negative even integers; see below for more details.


1. Extension to {z ∈ C : Re(z) > 0}. Observe that the function ζ satisfies the following identity

$$(1 - 2^{1-z})\,\zeta(z) = \sum_{n=1}^{\infty}\frac{1}{n^z} - 2\sum_{n=1}^{\infty}\frac{1}{(2n)^z} = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^z},$$

where the right-hand side is defined for any complex number z such that Re(z) > 0. The series η(z) = Σ_{n=1}^∞ (−1)^{n+1}/n^z, known as the Dirichlet eta function, is defined for complex numbers z satisfying Re(z) > 0. Using this expression, we can extend the definition of ζ to the set {z ∈ C : Re(z) > 0, 2^{1−z} ≠ 1} by

$$\zeta(z) = \frac{1}{1 - 2^{1-z}} \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^z}. \tag{4}$$

This definition extends the original domain of definition of ζ to all complex numbers z with Re(z) > 0, except those satisfying 2^{1−z} = 1, which are all of the form z_n = 1 + 2πin/log(2) with n an integer. The point z_0 = 1 is the pole of ζ. For n ≠ 0, using classical properties of the Dirichlet eta function η (Borwein, 1995), also known as the alternating zeta function, the zeta function ζ can be analytically continued to include the points {z_n : n ≠ 0} (Widder, 1941; Sondow, 2003).
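Eq. (4) already allows evaluating ζ off the real axis inside the critical strip. The sketch below is our own illustration (names and term count are assumptions): it sums the eta series with Python's complex arithmetic, averages two consecutive partial sums, and evaluates ζ near the first non-trivial zero, whose imaginary part is approximately 14.134725141734693.

```python
import math


def zeta_via_eta(z: complex, n_terms: int = 200_000) -> complex:
    """zeta(z) via Eq. (4) for Re(z) > 0 and 2^(1-z) != 1.

    The eta series converges slowly; averaging two consecutive partial sums
    (a simple acceleration for alternating series) removes the leading error.
    """
    if z.real <= 0:
        raise ValueError("Eq. (4) requires Re(z) > 0")
    partial, prev = 0j, 0j
    for n in range(1, n_terms + 1):
        prev = partial
        term = n ** (-z)
        partial += term if n % 2 == 1 else -term
    eta = 0.5 * (partial + prev)
    return eta / (1 - 2 ** (1 - z))


first_zero = complex(0.5, 14.134725141734693)  # numerical value of the first zero
print(abs(zeta_via_eta(first_zero)))
print(zeta_via_eta(complex(0.5, 0.0)))
print(zeta_via_eta(complex(2.0, 0.0)), math.pi ** 2 / 6)
```

The modulus at the first zero is tiny compared with the value ζ(1/2) ≈ −1.46 on the real axis, as expected from a genuine zero on the critical line.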
2. Extension to {z ∈ C : Re(z) ≤ 0} \ {0}. The Dirichlet eta function η satisfies the following functional equation (Borwein, 1995)

$$\eta(z) = 2\,\frac{2^{z-1} - 1}{2^{z} - 1}\,\pi^{z-1}\, z\,\sin\Big(\frac{\pi z}{2}\Big)\Gamma(-z)\,\eta(1-z), \tag{5}$$

where Γ is the Gamma function. Here Γ denotes the analytic continuation, to C \ {0, −1, −2, …}, of the function defined on {z ∈ C : Re(z) > 0} by Γ(z) = ∫_0^∞ t^{z−1}e^{−t} dt, and it satisfies Γ(z + 1) = zΓ(z). Using Eq. (5), Eq. (4), and this property of the Gamma function, we obtain that for any z ∈ C such that Re(z) ∈ (0, 1),

$$\zeta(z) = 2^{z}\pi^{z-1}\sin\Big(\frac{\pi z}{2}\Big)\Gamma(1-z)\,\zeta(1-z). \tag{6}$$

Using the functional equation (6), we can extend the definition of ζ to all the remaining complex numbers z such that Re(z) ≤ 0 and z ≠ 0. It can further be shown that ζ can be continued to 0 with ζ(0) = −1/2 (Borwein, 1995).
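Eq. (6) can be used numerically to the left of the critical strip. The sketch below, an illustration under our own assumptions (helper names and the Euler-Maclaurin tail estimate are ours), evaluates ζ(s) for real s < 0 by combining Eq. (6) with the series (3), since Re(1 − s) > 1 there; it recovers the trivial zeros and the classical values ζ(−1) = −1/12 and ζ(−3) = 1/120.

```python
import math


def zeta_real_gt1(s: float, n_terms: int = 1000) -> float:
    """zeta(s) for real s > 1: partial sum of Eq. (3) plus an Euler-Maclaurin tail estimate."""
    partial = sum(n ** -s for n in range(1, n_terms + 1))
    return partial + n_terms ** (1 - s) / (s - 1) - n_terms ** -s / 2


def zeta_left_half_line(s: float) -> float:
    """zeta(s) for real s < 0 via Eq. (6); here 1 - s > 1, so zeta(1 - s) uses Eq. (3)."""
    if s >= 0:
        raise ValueError("this sketch only handles real s < 0")
    return (2 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2)
            * math.gamma(1 - s) * zeta_real_gt1(1 - s))


for s in (-1.0, -2.0, -3.0, -0.5):
    print(s, zeta_left_half_line(s))
```

At s = −2k the factor sin(πs/2) = sin(−πk) vanishes, which is exactly how the trivial zeros arise; the printed values there are zero up to floating-point rounding.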

Zeros of the ζ function. From Eq. (6), the zeta function satisfies ζ(−2k) = 0 for any integer k ≥ 1. The negative even integers {−2k}_{k≥1} are thus called the trivial zeros of the Riemann zeta function, since this follows from the simple fact that sin(−πk) = 0 for all integers k ≥ 1. The other zeros of ζ are called non-trivial zeros, and their properties remain poorly understood. RH conjectures that they all lie on the line Re(z) = 1/2.
Riemann Hypothesis (RH). All non-trivial zeros of the Riemann zeta function ζ have real part equal to 1/2.

Whether RH holds is still an open question, although there have been a number of attempts to prove or disprove it in the literature. In the next appendix, we revisit an old result that provides an analytic point of view on RH.


Appendix B. The Nyman-Beurling Criterion


Theorem 1 (Nyman (1950)). The RH is true if and only if N is dense in L2 (I1 ).

Beurling (1955) later extended this result by showing that, for any p > 1, the ζ function has no zeros in the set {z ∈ C : Re(z) > 1/p} if and only if the set N is dense in L^p(I_1).

Theorem 2 (Beurling (1955)) The Riemann zeta function is free from zeros in the half-plane Re(z) > 1/p, 1 < p < ∞, if and only if N is dense in L^p(I_1).

The following is a sketch of the proof by Beurling (1955). It illustrates the machinery behind the sufficient direction of Theorem 1 and Theorem 2.
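Sketch of the proof. The connection between RH and the class N is due to the following identity, which relates the zeta function ζ to the fractional part function ρ:³

$$\int_0^1 \rho\Big(\frac{\theta}{x}\Big)\, x^{z-1}\,dx = \frac{\theta}{z-1} - \frac{\theta^{z}\,\zeta(z)}{z}, \qquad \forall z \in \mathbb{C},\ \mathrm{Re}(z) > 0,\ \theta \in I_1. \tag{7}$$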

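Therefore, given a function f ∈ N, summing Eq. (7) over the neurons and using c^⊤β = 0 to cancel the terms in 1/(z − 1), we obtain

$$\int_0^1 f(x)\, x^{z-1}\,dx = -\frac{\zeta(z)}{z}\sum_{i=1}^{m} c_i\,\beta_i^{z}, \qquad \forall z \in \mathbb{C},\ \mathrm{Re}(z) > 0. \tag{8}$$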
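Identities (7) and (8) can be sanity-checked by quadrature. The sketch below is our own illustration: it uses a midpoint rule at the real point z = 2, where ζ(2) = π²/6 is known exactly, for a single neuron with θ = 0.5 and for the member of N from Fig. 1. The grid size is an assumption and the agreement is only up to discretization error; the identities themselves hold for complex z with Re(z) > 0.

```python
import math


def rho(t: float) -> float:
    """Fractional part rho(t) = t - floor(t)."""
    return t - math.floor(t)


def midpoint_integral(g, n_cells: int = 200_000) -> float:
    """Midpoint rule for the integral of g over (0, 1)."""
    h = 1.0 / n_cells
    return h * sum(g((k + 0.5) * h) for k in range(n_cells))


def f_example(x: float) -> float:
    """The member of N from Fig. 1 (c^T beta = 0.7 - 0.3 - 0.4 = 0)."""
    return rho(0.7 / x) - rho(0.3 / x) - 4.0 * rho(0.1 / x)


z = 2.0                      # real test point, so that zeta(2) = pi^2 / 6 is known exactly
zeta_z = math.pi ** 2 / 6

# Eq. (7) for a single neuron with theta = 0.5.
theta = 0.5
lhs7 = midpoint_integral(lambda x: rho(theta / x) * x ** (z - 1))
rhs7 = theta / (z - 1) - theta ** z * zeta_z / z

# Eq. (8) for f_example, with c = (1, -1, -4) and beta = (0.7, 0.3, 0.1).
c, beta = [1.0, -1.0, -4.0], [0.7, 0.3, 0.1]
lhs8 = midpoint_integral(lambda x: f_example(x) * x ** (z - 1))
rhs8 = -zeta_z / z * sum(ci * bi ** z for ci, bi in zip(c, beta))

print(lhs7, rhs7)
print(lhs8, rhs8)
```

The factor x^{z−1} = x damps the wild oscillations of ρ(β/x) near 0, which is why a plain midpoint rule suffices at this test point.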

Now fix p > 1 and assume that the class N is dense in L^p(I_1). Then, given ε > 0, there exists a function f ∈ N such that ∥1 − f∥_p < ε, where 1 is the constant function on I_1 equal to 1 everywhere. Since ∫_0^1 x^{z−1} dx = 1/z, Eq. (8) yields

$$\int_0^1 \big(1 - f(x)\big)\, x^{z-1}\,dx = \frac{1}{z}\Big(1 + \zeta(z)\sum_{i=1}^{m} c_i\,\beta_i^{z}\Big), \qquad \forall z \in \mathbb{C},\ \mathrm{Re}(z) > 0.$$

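Using Hölder's inequality, we have for any z such that Re(z) > 1/p

$$\Big|\int_0^1 \big(1 - f(x)\big)\, x^{z-1}\,dx\Big| \le \|1 - f\|_p\,\|x^{z-1}\|_q < \frac{\varepsilon}{\big(q(\mathrm{Re}(z) - 1/p)\big)^{1/q}},$$

where q > 1 satisfies 1/p + 1/q = 1, and where we used ∥x^{z−1}∥_q^q = ∫_0^1 x^{q(Re(z)−1)} dx = 1/(q(Re(z) − 1/p)). This yields

$$\Big|1 + \zeta(z)\sum_{i=1}^{m} c_i\,\beta_i^{z}\Big|^q < \frac{\varepsilon^q |z|^q}{q(\mathrm{Re}(z) - 1/p)}.$$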

If ζ(z) = 0, the left-hand side equals 1, which forces Re(z) < 1/p + ε^q|z|^q/q. Hence, in the region {z ∈ C : Re(z) > 1/p + ε^q|z|^q/q}, ζ(z) cannot be equal to zero. Since ε is arbitrary, we conclude that ζ(z) ≠ 0 whenever Re(z) > 1/p.
For the necessary condition, we refer the reader to Beurling (1955).
The identity (7) is the crux of the proof above. It provides an integral representation of the zeta function ζ in terms of the fractional part function ρ. Hence, one would expect that some properties of the zeta function should, in principle, be reflected in function classes involving ρ. This is precisely the idea behind Theorems 1 and 2. More importantly, from the analysis above, we have the following analytic criterion for zero-free regions of the zeta function.

3. Despite multiple attempts, we could not find the original paper in which this identity first appeared. However, it has been mentioned in different works, e.g. (Nyman, 1950; Beurling, 1955).
Lemma 1 (Nyman-Beurling zero-free regions) Let f ∈ N and let δ = ∥1 − f∥_2 be the distance between the constant function 1 on I_1 and f. Then the region {z ∈ C : Re(z) > ½(1 + δ²|z|²)} is free of zeros of the Riemann zeta function ζ.


Existing empirical verification studies of RH use different analytic criteria to locate the zeros. To the best of our knowledge, the most recent verification study was conducted by Platt and Trudgian (2021), who found that RH is satisfied in the region {a + ib : a ∈ (0, 1), b ∈ (0, γ], γ ≈ 3·10^12}, meaning that all the zeros of ζ in this region are on the line Re(z) = 1/2. Is it possible to use Lemma 1 to beat this record? First, notice that unless the function f is simple, the norm ∥1 − f∥_2 is generally intractable and can only be approximated with Monte Carlo sampling. See Section 4 for more details.
The result of Lemma 1 (and that of Theorem 2) is stated for the function class N which con-
sists of a special neural architecture with one-dimensional inputs. Can we generalize this to multi-
dimensional inputs? In the next section, we answer this question positively by introducing a gener-
alized class of neural networks with multi-dimensional inputs.

Appendix C. A sufficient condition in the multi-dimensional case


Let d ≥ 1 and consider the following class of neural networks with inputs in I_d:

$$\mathcal{N}_d = \Big\{ f(x) = \sum_{j=1}^{d}\sum_{i=1}^{m} c_{i,j}\, \rho\Big(\frac{\beta_{i,j}}{x_j}\Big),\ x \in I_d \;:\; m \ge 1,\ c \in \mathbb{R}^{d \times m},\ \beta \in I_{d \times m},\ c^\top \beta = 0 \Big\},$$

where c = (c_{1,1}, c_{2,1}, …, c_{m,1}, …, c_{m,d})^⊤ ∈ R^{md} is the flattened vector of (c_{·,j})_{1≤j≤d}, and β is flattened in the same way. Notice that we recover the Nyman-Beurling class when d = 1. Using this class, we can generalize the zero-free region result given by Lemma 1 to a multi-dimensional setting.

Lemma 2 (zero-free regions for general d ≥ 1) Let d ≥ 1 and f ∈ N_d. Let δ = ∥1 − f∥_2 be the distance between the constant function 1 on I_d and f. Then the region {z ∈ C : Re(z) > ½(1 + δ^{2/d}|z|²)} is free of zeros of the Riemann zeta function ζ.
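Proof Let d ≥ 1 and f ∈ N_d. Using Eq. (7), namely ∫_0^1 ρ(α/x) x^{z−1} dx = α/(z − 1) − α^z ζ(z)/z for α ∈ (0, 1) and Re(z) > 1/2, together with ∫_0^1 x^{z−1} dx = 1/z for each of the d − 1 remaining coordinates, we have

$$\int_{I_d} f(x)\prod_{j=1}^{d} x_j^{z-1}\,dx = z^{-d+1}\sum_{j=1}^{d}\sum_{i=1}^{m} c_{i,j}\Big(\frac{\beta_{i,j}}{z-1} - \frac{\beta_{i,j}^{z}\,\zeta(z)}{z}\Big) = -z^{-d}\sum_{j=1}^{d}\sum_{i=1}^{m} c_{i,j}\,\beta_{i,j}^{z}\,\zeta(z),$$

where we have used the fact that c^⊤β = 0.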


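Therefore, since ∫_{I_d} ∏_{j=1}^d x_j^{z−1} dx = z^{−d}, we have

$$\int_{I_d} \big(1 - f(x)\big)\prod_{j=1}^{d} x_j^{z-1}\,dx = z^{-d}\Big(1 + \zeta(z)\sum_{j=1}^{d}\sum_{i=1}^{m} c_{i,j}\,\beta_{i,j}^{z}\Big). \tag{9}$$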

Using the Cauchy-Schwarz inequality, we obtain

$$\Big|1 + \zeta(z)\sum_{j=1}^{d}\sum_{i=1}^{m} c_{i,j}\,\beta_{i,j}^{z}\Big|^2 \le \frac{|z|^{2d}}{(2\mathrm{Re}(z) - 1)^d}\,\|1 - f\|_2^2,$$

where we have used the fact that ∥∏_{j=1}^d x_j^{z−1}∥_2² = (2Re(z) − 1)^{−d}. If ζ(z) = 0, the left-hand side equals 1, which forces 2Re(z) − 1 ≤ δ^{2/d}|z|². Therefore, for all complex numbers z satisfying Re(z) > 1/2 + δ^{2/d}|z|²/2, we have ζ(z) ≠ 0.

Notice that if δ can be chosen arbitrarily small, then the zero-free region in Lemma 2 can be extended to the whole half-plane {Re(z) > 1/2}. This is a generalization of the sufficient condition of Theorem 2 to the multi-dimensional case.
