Professional Documents
Culture Documents
(Discrete Mathematics and Its Applications) Moreno, Carlos J. - Wagstaff, Jr. Samuel S - Sums of Squares of Integers-CRC Press (2005)
(Discrete Mathematics and Its Applications) Moreno, Carlos J. - Wagstaff, Jr. Samuel S - Sums of Squares of Integers-CRC Press (2005)
MATHEMATICS
and
ITS APPLICATIONS Series Editor
Kenneth H. Rosen, Ph.D.
Sums of Squares
of Integers
Carlos J. Moreno
Samuel S. Wagstaff, Jr.
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit-
ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.
com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
This work has its origins in courses taught by the authors many years ago,
when we were both in the same math department with Joe Doob, hiked with
him every Saturday afternoon and learned his system of writing mathematics.
This book is one we wished we could have read when we were graduate
students. It is an introduction to a vast and exciting area of number theory
where research continues today.
The first chapter is a general introduction, which gives the contents of the
other chapters in greater detail. It also lists the prerequisites assumed for the
reader.
The second chapter treats elementary methods having their origins in a
series of eighteen papers by Liouville. Formulas are proved for the number of
representations of an integer as a sum of two, three or four squares.
Chapter 3 develops useful properties of the Bernoulli numbers. Many are
interesting in their own right to number theorists. Some properties and values
are needed for Chapter 6.
Chapter 4 discusses the central themes of the modern theory of modular
forms. It does this through several large examples from the literature.
v
vi Sums of Squares of Integers
2 Elementary Methods 11
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Some Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Two Fundamental Identities . . . . . . . . . . . . . . . . . . . 14
2.4 Euler’s Recurrence for σ(n) . . . . . . . . . . . . . . . . . . . 18
2.5 More Identities . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 Sums of Two Squares . . . . . . . . . . . . . . . . . . . . . . 26
2.7 Sums of Four Squares . . . . . . . . . . . . . . . . . . . . . . 29
2.8 Still More Identities . . . . . . . . . . . . . . . . . . . . . . . 34
2.9 Sums of Three Squares . . . . . . . . . . . . . . . . . . . . . . 38
2.10 An Alternate Method . . . . . . . . . . . . . . . . . . . . . . 43
2.11 Sums of Polygonal Numbers . . . . . . . . . . . . . . . . . . . 54
2.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3 Bernoulli Numbers 61
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Definition of the Bernoulli Numbers . . . . . . . . . . . . . . 62
3.3 The Euler-MacLaurin Sum Formula . . . . . . . . . . . . . . 65
3.4 The Riemann Zeta Function . . . . . . . . . . . . . . . . . . . 73
3.4.1 Functional Equation for the Zeta Function . . . . . . . 77
3.4.2 Functional Equation for the Dirichlet L-functions . . . 83
3.4.3 Generalized Bernoulli Numbers . . . . . . . . . . . . . 89
3.5 Signs of Bernoulli Numbers Alternate . . . . . . . . . . . . . 90
3.6 The von Staudt-Clausen Theorem . . . . . . . . . . . . . . . 92
ix
x Contents
8 Applications 317
8.1 Factoring Integers . . . . . . . . . . . . . . . . . . . . . . . . 317
8.2 Computing Sums of Two Squares . . . . . . . . . . . . . . . . 320
8.3 Computing Sums of Three Squares . . . . . . . . . . . . . . . 325
8.4 Computing Sums of Four Squares . . . . . . . . . . . . . . . . 327
8.5 Computing rs (n) . . . . . . . . . . . . . . . . . . . . . . . . . 329
8.6 Resonant Cavities . . . . . . . . . . . . . . . . . . . . . . . . 330
8.7 Diamond Cutting . . . . . . . . . . . . . . . . . . . . . . . . . 334
8.8 Cryptanalysis of a Signature Scheme . . . . . . . . . . . . . . 337
8.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
References 343
Index 350
Chapter 1
Introduction
The purpose of this introductory chapter is twofold: to list the prerequisites
the reader needs to understand the book and to tell what material is in the
rest of the book.
1.1 Prerequisites
It is assumed that the reader is familiar with the basic properties of divisibil-
ity, congruences, prime numbers and multiplicative functions. After Chapter
2, we assume the reader is familiar with the Legendre-Jacobi symbol (a/p).
Occasionally we will mention related results and give alternate proofs that re-
quire a little algebra or analysis, but these may be omitted without loss by the
reader having only minimal prerequisites. The books by Hardy and Wright
[52], Niven, Zuckerman and Montgomery [79] and Uspensky and Heaslet [113]
are good general references for the entire book.
We use these conventions throughout the book. A “mod” in parentheses
denotes the congruence relation: a ≡ b (mod m) means that the integer m
exactly divides b − a. A “mod” not in parentheses is the binary operation
that gives the least nonnegative remainder: a mod m = a − a/mm is the
integer b in the interval 0 ≤ b < m for which a ≡ b (mod m). And x, which
we just used, is the greatest integer ≤ x. We write gcd(m, n) for the greatest
def
common divisor of m and n. When we write “A = B,” we are defining A
to be B. This implies that the reader should already know the meaning of
B. We use (z) and (z) for the real and imaginary parts, respectively, of
the complex number z. We use N, Z, Q, R and C to denote the natural
numbers, the rational integers, the rational numbers, the real numbers and
the complex numbers, respectively. We write Fq for the Galois (finite) field
with q elements. The notation |A| means the cardinality of A if A is a set,
and the absolute value of A if A is a number. The meaning will always be
clear from the context. In Chapter 6 we use the notation ordp (n), where p is
a prime and n is a nonzero integer, to mean the largest integer e for which pe
1
2 Sums of Squares of Integers
divides n.
Chapters 3-6 require complex analysis at the level of Ahlfors [2]. In some
parts of the book, the reader is assumed to know a bit of group theory or
linear algebra, mostly about the ring SL(2, R) of 2 × 2 matrices.
Here and there throughout the book we mention advanced results, like the
Riemann-Roch theorem and the Riemann hypothesis for function fields, which
are not proved, and which are not used in our proofs. Such references indicate
possible directions for further study. We give many references to the literature
as suggestions to the reader about what to read after this book. But the reader
can understand virtually all of this book with only the limited background
just described.
For the algorithms in Chapter 8, the reader is also assumed to know about
the Euclidean algorithm and simple continued fractions. For complete under-
standing of the applications at the end of Chapter 8, the reader needs a first
course in physics and a first course in cryptography.
Each chapter concludes with a collection of exercises to test the reader’s un-
derstanding of the material. A few exercises, especially in Chapter 8, require
a computer for their solution.
1.2.1 Chapter 2
Chapter 2 is the simplest chapter of the book. In a sense, it assumes nothing
more than high school mathematics. If you understand even and odd numbers
and proof by mathematical induction, then you can understand this chapter.
However, parts of it are a very sophisticated application of even and odd
numbers, and it really is too hard for high school students.
In Chapter 2 we prove formulas for the number of ways of writing a positive
integer as a sum of two, of three, and of four squares of integers in Theorems
2.5, 2.7 and 2.6, respectively. The main result of the chapter is Gauss’ formula
for the number r3 (n) of representations of an integer n as the sum of three
squares of integers. From this formula it is clear that this number is positive
precisely when n ≥ 0 and n is not of the form 4a (8b + 7).
The triangular numbers are the numbers
1, 1 + 2, 1 + 2 + 3, 1 + 2 + 3 + 4, 1 + 2 + 3 + 4 + 5, . . . ,
Introduction 3
that is, numbers of the form k(k + 1)/2. A corollary of Gauss’ theorem, which
we prove, is that every positive integer is the sum of at most three triangular
numbers. More generally, we will define polygonal numbers and show that
every positive integer is the sum of r r-gonal numbers, for each r ≥ 3. Our
proof of the polygonal number theorem is based on Bachmann [5].
Much of this chapter is based on Chapter XIII of Uspensky and Heaslet
[113], which in turn goes back to Liouville’s series of eighteen papers, “Sur
quelques formules générales qui peuvent être utiles dans la théorie des nom-
bres,” Journal de Mathematiques, series (2), 1858–1865. See Dickson [24],
Volume 2, Chapter XI for a summary of these papers. Bachmann [6], Part
II, Chapter 8, deals with the same material. Part of the connection between
this work and elliptic functions is given in Smith [104]. See Hardy [50] and
Rademacher [87] for the standard applications of elliptic functions to sums of
squares. Grosswald [42] is another excellent reference for the theory of sums
of squares.
Joseph Liouville derived his results on the number of representations of an
integer as the sum of a certain number of squares, as well as recurrence formu-
las for the sum of divisors function σ(n), from two “Fundamental Identities,”
which we prove as Theorems 2.1 and 2.2. He was led to these completely el-
ementary formulas from deep investigations in the theory of elliptic modular
forms. We explain their origin in Jacobi’s work in Section 6.7. The exer-
cises at the end of Chapter 2 give many more examples of consequences of
Liouville’s fundamental identities.
1.2.2 Chapter 3
Bernoulli numbers are certain rational numbers that arise in many places in
number theory, analysis and combinatorics. In Chapter 3 we prove a selec-
tion of important theorems about them, with emphasis on number theory.
For example, we prove the von Staudt-Clausen theorem that determines the
denominators of the Bernoulli numbers. We use this theorem also to investi-
gate the distribution of the fractional parts of the Bernoulli numbers, which
is quite unusual. The graph of the distribution function appears on the cover
of this book. Other consequences of the von Staudt-Clausen theorem include
J. C. Adams’ theorem, the congruences of Voronoi and Kummer, and other
congruences which may be used to compute the Bernoulli numbers modulo a
prime.
Fermat’s Last Theorem is the statement that the equation
xn + y n = z n
1.2.3 Chapter 4
Chapter 4 is an introduction to three of the central themes of the modern
theory of modular forms:
(i) A reciprocity law is an equality between two objects: on one side there
is the solution to a diophantine problem, e.g., counting the number of
solutions of polynomial equations modulo primes, while on the other
side there appears a formula containing spectral information about cer-
tain types of arithmetic operators related to the operation of raising a
quantity to a p-th power, e.g., the so-called Hecke operators.
(iii) The functional equations satisfied by certain Dirichlet series, when viewed
from the point of view of their Mellin transforms, serve to characterize
the modular behavior of the latter.
A = {x ∈ Fp : x4 − 2x2 + 2 = 0}
The second theme, that of the connection between the Fourier coefficients of
modular forms and the eigenvalues of certain Hecke operators is developed
along the lines pioneered by Mordell in his seminal paper on the proof of the
multiplicative property of the Ramanujan tau function τ (n). The main result
is Theorem 4.5, the derivation of which will take us on a tour of the basic
theory of modular forms, and will touch on the elementary theory of Hecke
operators, the modular group, and fundamental domains.
Our discussion of the last theme in Chapter 4 is carried out by the develop-
ment of the important theorem of Hamburger, in which the uniqueness of the
Riemann zeta function ζ(s) is determined, under some natural conditions, by
the type of functional equation it satisfies as a function of the complex vari-
able s. The principal result is Theorem 4.9. The method used in the proof of
this theorem can be thought of as the starting point of the theory of modular
forms as developed by Hecke.
1.2.4 Chapter 5
Chapter 5 is an introduction to the basic theory of modular forms on the mod-
ular group Γ and its associated subgroup Γ0 (N ). We build on the elementary
methods already discussed in Chapter 4. The main themes are:
(i) The finite dimensionality of the space of holomorphic modular forms of
weight k on the modular group Γ and on Γ0 (N ).
(ii) The normality of the family of Hecke operators, a property which is es-
sential in the proof of the existence of an orthonormal basis with respect
to the Petersson inner product.
6 Sums of Squares of Integers
(iii) The relation between the eigenfunction property of a modular form and
the existence of an Euler product representation for the Dirichlet series
constructed with the Fourier coefficients of a modular form.
(iv) The elementary theory of Poincaré series, a theory which provides a
constructive way of building classical cuspidal modular forms.
itself a collection of orbits of points in the upper half plane under the action
of the group of linear fractional transformations Γ0 (N ), as a finite topological
covering of the Riemann sphere P1 (C).
In developing the topic (ii), the essential property of the Hecke operators
that comes into play is their commutativity, a property that results read-
ily from the definition of these operators as modular correspondences whose
structure becomes quite explicit when expressed in terms of the Fourier ex-
pansions of modular forms. Of course, the central idea here is the existence
of the Petersson inner product, which among other things establishes that
the finite dimensional vector space of cusp forms is a complete Hilbert space
endowed with a commuting family of arithmetically defined operators.
As already indicated in Chapter 4, using the Dirichlet series associated
to the Ramanujan modular form, the Fourier coefficients of a simultaneous
cuspidal eigenform of the Hecke operators lead to Dirichlet series possessing an
Euler product whose local factors are quadratic polynomials in the expression
T = p−s . When these Euler products are completed with suitable gamma
factors, the local factor corresponding to the ordinary absolute value on Q,
the resulting function of the complex variable s is holomorphic and satisfies a
functional equation that relates its value at s with that at k − s.
The treatment we give of the topic in (iv) takes place in the context of the
elementary theory of Poincaré series with respect to the full modular group
Γ. The main and most significant result established provides the Fourier
Introduction 7
an object with deep roots in the applications of harmonic analysis to the study
of the additive as well as the multiplicative properties of the integers. Some
of the important properties of these sums are developed in the exercises.
1.2.5 Chapter 6
The crowning achievement of the modern theory of modular forms as it re-
gards the problem of representing integers as sums of squares is the clear
understanding of the role played by the theta series
2
ϑ(z) = q n , where q = e2πiz ,
n∈Z
where ρs (n) is the so-called singular series, an Euler product constructed with
local data, and whose value can be made explicit as a quotient of two classical
Dirichlet L-functions. The latter can in turn be evaluated using generalized
Bernoulli numbers.
For s > 8, ϑ(z)s is no longer an Eisenstein series. In fact, the difference
ϑ(z)s − Es/2 (z) is a cusp form of weight s/2. When s is even, the classical
theory provides a representation for the difference ϑ(z)s − Es/2 (z) as a sum
of a finite number of cusp forms, all eigenfunctions of the Hecke operators.
The comparison of the Fourier coefficients leads to an explicit formula for
rs (n) − ρs (n) as a sum of multiplicative arithmetic functions closely related
to the eigenvalues of the Hecke operators. We make this connection explicit
in this chapter and provide many examples.
The case when s is odd is remarkably more difficult and at the same time
more interesting. In this case ϑ(z)s is a modular form of half-integral weight,
and the modular transformation law
s
az + b ab
ϑ = Js (σ, z)ϑ(z)s , where σ= ,
cz + d cd
rs (tn2 ) − ρs (tn2 )
1.2.6 Chapter 7
Chapter 7 contains a selection of results from additive number theory, some
of which are related to sums of squares, the main subject of this book. In
1973, E. Szemerédi [109] proved a long-standing conjecture which says that if
the length of the arithmetic progressions in a sequence of positive integers is
bounded above, then the sequence must be very thin (have zero density). The
original motivation for this conjecture was to understand better the theorem
of van der Waerden, which says that if the set of positive integers is partitioned
Introduction 9
into a finite number of subsets, then one subset contains arbitrarily long arith-
metic progressions. The thinking was that clearly one of the subsets in the
partition must be not too thin (i.e., must have positive density), and there-
fore that one probably would have arbitrarily long arithmetic progressions.
Roth [92] made the first dent in the conjecture when he showed that every
sequence with positive density must contain infinitely many arithmetic pro-
gressions of three terms. We prove Roth’s theorem, Szemerédi’s theorem and
some other results about arithmetic progressions such as van der Waerden’s
theorem. The latter is Khinchin’s [62] first pearl. The other theorems about
arithmetic progressions have not appeared in books, but some of them are
quoted in Ostmann [81]. Halberstam and Roth [45] is another good general
reference for additive number theory.
In the final section of Chapter 7, we determine all arithmetic progressions
of three squares. This is equivalent to finding all ways of expressing twice a
square as the sum of two squares. An exercise treats the case of four squares
in arithmetic progression.
1.2.7 Chapter 8
In Chapter 8 we present a more practical side to complement the theory of
the earlier chapters. We tell how to find a representation of n as a sum of
s squares when such a representation exists. Finding the representations of
n as the sum of two squares (when there are such representations) turns out
to be equivalent to factoring n, and we show that equivalence in the first
two sections of this chapter. Perhaps surprisingly, it is easy to compute the
representations of a large positive integer as the sum of three or more squares
(when such representations exist) without knowing its prime factorization,
and we tell how this may be done in the next two sections. We consider when
one may represent a positive integer as the sum of a certain number of positive
squares. (In the rest of the book, we allow 0 as a square summand.) Then
we tell how to compute the number rs (n) of representations of n as a sum of
s squares for any s and n for which this is possible with reasonable effort.
We end this chapter and book with three little-known applications having
no apparent connection to sums of squares or even to number theory. We show
that the number of ways to write a positive integer as the sum of three positive
squares determines the number of eigenfrequencies for microwave radiation in
a cube-shaped resonant cavity. We use the structure of crystals to explain
why the number of facets on a round brilliant-cut diamond might be related
to the total number of ways of writing the integers 1, 2, 3, 4 and 5 as the sum
of three squares. Finally, we tell how to compromise one variation of the RSA
signature scheme, which is widely used in Internet commerce, by constructing
a bogus valid signature for a message not actually signed by the alleged signer.
The attacker does this by writing a certain number as the sum of two squares
in two different ways.
Chapter 2
Elementary Methods
2.1 Introduction
For positive integers k and n, let rk (n) denote the number of ways n can be
expressed as the sum of exactly k squares of integers, counting all permuta-
tions and changes of sign as different representations. Let rk (0) = 1. We have
r3 (6) = 24, for example, because
For k = 2, 3 and 4 we will find simple formulas for rk (n). For k = 4 we will
11
12 Sums of Squares of Integers
The σk are multiplicative functions, that is, σk (mn) = σk (m)σk (n) whenever
m and n are relatively prime. For k = 2 and 4 we will prove that the functions
fk (n) = rk (n)/rk (1) are multiplicative. This function is multiplicative also
for k = 1 (obvious) and k = 8 (not obvious) and for no other positive integer
k.
Finally, we shall prove a theorem about representing a positive integer as a
sum of polygonal numbers. The partial sums of the arithmetic progressions
1 2 3 4 ... n ...
1 3 5 7 ... 2n − 1 ...
1 4 7 10 ... 3n − 1 ...
1 5 9 13 ... 4n − 1 ...
...
are called the triangular, square, pentagonal, hexagonal, etc., numbers. In
(r)
general, the n-th r-gonal number pn is the sum of the first n terms of the
arithmetic progression with first term 1 and common difference r − 2. Thus,
n n(n − 1)
p(r)
n = (2 + (n − 1)(r − 2)) = n + (r − 2) .
2 2
Since we make the convention that the empty sum is zero, it is natural to
(r)
define p0 = 0 for each r > 2. Figure 2.1 illustrates the reason for this
geometric terminology.
We will prove that, for each r ≥ 3, every positive integer is the sum of r
r-gonal numbers. In case r ≥ 5, all but four of the r-gonal numbers can be 0
Elementary Methods 13
LEMMA 2.1
If n is an odd positive integer, then n is a square if and only if σ(n) is odd.
LEMMA 2.2
Every square is congruent to 0 or 1 modulo 4. Every sum of two squares
is congruent to 0, 1 or 2 modulo 4. Every square is congruent to 0, 1 or 4
modulo 8. Every sum of two squares is congruent to 0, 1, 2, 4 or 5 modulo 8.
Every sum of three squares is congruent to 0, 1, 2, 3, 4, 5 or 6 modulo 8.
LEMMA 2.3
For every nonnegative integer n, we have r3 (4n) = r3 (n) and r2 (2n) = r2 (n).
14 Sums of Squares of Integers
PROOF If 4n = x21 + x22 + x23 , then the integers xi must all be even
by the proof of Lemma 2.2. Thus, the equations xi = 2yi give a one-to-
one correspondence between solutions of 4n = x21 + x22 + x23 and solutions of
n = y12 + y22 + y32 .
If 2n = x21 + x22 , then the integers x1 and x2 must have the same parity
(both are even or both are odd). Hence, the equations
x1 + x2 x1 − x2
y1 = y2 =
2 2
x1 = y1 + y2 x2 = y1 − y2
LEMMA 2.4
If n = 4a (8b + 7) for some nonnegative integers a and b, then n is not the
sum of three squares of integers.
integers while the unsquared variables arepositive integers. When the condi-
tions are more complicated we will write (a) f (d) and list the conditions as
“(a) n = i2 + dδ, d and δ odd” on a separate line, for example.
The two fundamental identities are formulas relating sums of values of a
function F (x, y, z) over complicated sets of argument values. Let n be a
positive integer and λ and µ be two real numbers, such that λ > |µ|. Consider
partitions of the following types:
(a) n = λi2 + µi + (λδ + µ)d
(b) n = λi2 − µi + (λδ − µ)d
(c) n = λh2 + µh + λ∆∆
2
∆ + ∆ ∆ − ∆
(d) n=λ +µ
2 2
(e) n = λs2 + µs.
In all five partitions, n, λ and µ are fixed and the other letters are variables.
In (a) and (b), i is an integer and d and δ are positive integers. In (c), h is
an integer and ∆ and ∆ are positive integers. In (d), ∆ and ∆ are positive
integers of the same parity. In (e), s is an integer. The conditions on λ and
µ insure that the number of sets of parameter values is finite for each n. At
most two integers satisfy (e) because it is a quadratic equation in s. Also,
s = 0 because n is a positive integer.
For any function F of three variables and fixed n, λ and µ, let
S= F (δ − 2i, d + i, 2d + 2i − δ)
(a)
S = F (δ − 2i, d + i, 2d + 2i − δ)
(b)
W = F (∆ + ∆ , h, ∆ − ∆ )
(c)
∆ − ∆
U = F ∆ + ∆ , , ∆ − ∆
2
(d)
2|s|−1
T = F (2|s| − j, |s|, 2|s| − j).
(e) j=1
in terms of the primed variables: i = δ −i , d = 2i +d −δ and δ = δ . The six
equations relating i, d, δ and i , d , δ determine a one-to-one correspondence
between solutions (i, d, δ) and (i , d , δ ) of (a) with 2i + d − δ > 0. Hence S1
is unchanged when (i, d, δ) is replaced by (i , d , δ ). Thus,
S1 = F (δ − 2i, d + i, 2d + 2i − δ)
= F (δ − 2i , d + i , 2d + 2i − δ )
= F (δ − 2(δ − i), (2i + d − δ) + (δ − i), 2(2i + d − δ) + 2(δ − i) − δ)
= F (−(δ − 2i), d + i, 2d + 2i − δ)
= −S1
S2 = F (δ − 2i, d + i, 2d + 2i − δ)
(a)
2i+d−δ=0
2s−1
= F (2s − j, s, 2s − j).
(e) j=1
s>0
have
2s−1
S2 = F (2s − j, s, 2s − j)
n=λs2 −µs j=1
s>0
2|s|−1
= F (2|s| − j, |s|, 2|s| − j).
(e) j=1
s<0
S3 = F (δ − 2i, d + i, 2d + 2i − δ)
(a)
2i+d−δ<0
= F (∆ + ∆ , h, ∆ − ∆ ).
(c)
∆ −∆+2h>0
Similarly,
S3 = F (∆ + ∆ , h, ∆ − ∆ ).
n=λh2 −µh+λ∆∆
∆ −∆+2h>0
S3 = F (∆ + ∆ , −h, ∆ − ∆)
(c)
∆ −∆+2h<0
= F (∆ + ∆ , h, ∆ − ∆ )
(c)
∆ −∆+2h<0
and
2s−1
T2 = F (2s − j, s, 2s − j).
n=λs −µs
2 j=1
s is a positive integer
PROOF Split S and S into three parts as in the proof of Theorem 2.1.
Since F (x, y, z) is an odd function of x we have S1 = S1 = 0. Also, S2 = T1
and S2 = T2 . Since F (x, y, z) is an odd function of y, z, it follows that S3 − S3
differs from W only by terms in which ∆ − ∆ + 2h = 0, that is, by U .
s(s − 1) 3s2 − s
s + (5 − 2) =
2 2
for nonnegative integers s. In the present section only, it will be convenient
to allow s to be negative and say that the pentagonal numbers are all the
numbers (3s2 + s)/2 for any integer s. Note that distinct integers s produce
distinct pentagonal numbers.
Most of the identities in this chapter follow from the First Fundamental
Identity. We will make exactly one application of the Second Fundamental
Identity. The function
0 if x or z is even
F (x, y, z) =
(−1)(x+z)/2+y if x and z are both odd
Thus,
0 if n is not pentagonal
T1 − T2 =
(−1)s−1 s if n = (3s2 + s)/2 for some integer s.
3i2 + i 3δ + 1
(a) n= + d δ odd
2 2
3i2 − i 3δ − 1
(b) n= + d δ odd
2 2
2
3h + h 3
(c) n = + ∆∆ ∆ ≡ ∆ (mod 2).
2 2
The sum on (c) vanishes because the terms (−1)h+∆ and (−1)h+∆ cancel,
since ∆ and ∆ have opposite parity. They must have opposite parity since
F (x, y, z) = 0 when x or z is even, and here x = ∆ + ∆ and z = ∆ − ∆ . We
have proved
0 if n is not pentagonal
(−1)i − (−1)i = (2.1)
(−1)s−1 s if n = (3s2 + s)/2.
(a) (b)
20 Sums of Squares of Integers
ω(m) = 1− 1.
d|m d|m
d≡1 (mod 3) d≡2 (mod 3)
3i2 + i
(a1 ) n= + (3k − 1)d k > 0, d > 0
2
3i2 + i
(b1 ) n= + (3k + 1)d k ≥ 0, d > 0
2
Recall that n is a fixed positive integer. For each fixed i, the sum (a1 ) 1
counts
the number of divisors of n − (3i 2
+ i)/2 with the form 3k − 1 and the
sum (b ) 1 counts the number of divisors of n − (3i2 + i)/2 with the form
1
3k + 1. Hence, for fixed n and i,
We now head towards a recursion formula for σ(n). In the First Funda-
mental Identity, let n be a positive integer, λ = 3/2, µ = 1/2 and
0 if x or z is even
F (x, y, z) =
(−1)(x+z)/2+y (2y − z) if x and z are odd
Elementary Methods 21
Adding 3/2 times Equation (2.2) to 1/2 times Equation (2.1) gives
3δ + 1 3δ − 1 3
(−1)i + (−1)i = (−1)h+∆ (∆ − ∆) + V, (2.3)
2 2 2
(a) (b) (c)
where
0 if n is not pentagonal
V =
(−1)s−1 n if n = (3s2 + s)/2 for some integer s.
3i2 + i
(a) n= + (3k − 1)d k > 0, d > 0
2
2
3i + i
(b) n= + (3k + 1)d k ≥ 0, d > 0
2
For j = 0, 1, 2, let σ(j) (m) = d, where the sum is taken over all positive
integer divisors d of m with d ≡ j (mod 3).
Thus, σ(0) (m) + σ(1) (m) + σ(2) (m) = σ(m),
3δ + 1 3i2 + i
(−1)i = (−1)i σ(2) n −
2 i
2
(a)
and
3δ − 1 3i2 + i
(−1)i = (−1)i σ(1) n − ,
2 i
2
(a)
22 Sums of Squares of Integers
where the sums on the right sides run over all integers i for which (3i2 +i)/2 <
n. We will prove that
def 3 3h2 + h
W = (−1)h+∆
(∆ − ∆) = − (−1) σ(0) n −
h
,
2 2
(c) h
where the sum on the right side runs over all integers h for which (3h2 +h)/2 <
n. Substituting the last three equations into Equation (2.3) gives Euler’s
recurrence formula for σ(m):
PROOF We have
3
W = (−1)h (−1)∆ (∆ − ∆).
2 2 ∆,∆ >0,∆≡∆ (mod 2)
(3h +h)/2<n
n−(3h2 +h)/2=(3/2)∆∆
For fixed h, if n−(3h2 +h)/2 is not divisible by 3, then the inner sum is zero be-
cause there are no appropriate ∆, ∆ . Otherwise, write (2/3) n − (3h2 + h)/2 =
∆∆ = 2α+1 M , where M is odd and α ≥ 0. (At least one 2 must divide ∆∆
because ∆ and ∆ have opposite parity.) The inner sum becomes
(−1)∆ (∆ − ∆) = 2 (∆ − ∆)
∆∆ =2α+1 M, ∆,∆ >0 ∆∆ =2α+1 M, ∆,∆ >0
∆≡∆ (mod 2) ∆ even, ∆ odd
=2 ∆ − 2 2α+1 β (∆ = 2α+1 β)
∆ |M β|M
= −2 2α+1 − 1 σ(M )
= −2σ(2α M )
2
3h + h
= −2σ n− 3
2
2 3h2 + h
= − σ(0) n − .
3 2
Therefore,
3h2 + h
W =− (−1) σ(0) n −
h
.
2
(3h2 +h)/2<n
Elementary Methods 23
or
where the term σ(n − n), if it appears, has the value n. There are approxi-
mately 2 2n 3 terms in this formula, which makes it useful in preparing tables
of σ(n) for n up to a few thousand.
(a) n = i2 + δd
(b) n = i2 + δd
(c) n = h2 + ∆∆ ,
2 F (δ − 2i, d + i, 2d + 2i − δ) = (2.4)
(a)
F (∆ + ∆ , h, ∆ − ∆ ) + T − U,
(c)
and
2s−1
U= F (2s, j − s, 2j − 2s).
j=1
Since F is an odd function of y, the terms for (h, ∆, ∆ ) and (−h, ∆, ∆ ) cancel
in the sum on (c); hence that sum vanishes. Also, U = 0 because x = 2s is
even, and T = 0 unless n = s2 for some positive integer s, in which case
2s−1
T =2 (−1)j+s f (s) = 2(−1)s−1 sf (s).
j=1
j odd
we get
∆ + ∆
2 f (δ − i) = f + T − U.
2
(e) (f)
Elementary Methods 25
The five identities (2.4), (2.5), (2.6), (2.7) and (2.8) will be used to prove
formulas for the number of ways of representing n as a sum of two, three and
four squares.
26 Sums of Squares of Integers
The function χ(d) is totally multiplicative, that is, χ(de) = χ(d)χ(e) for all
integers d and e.
Define T (0) = 1/4 and T (n) = d|n χ(d), where the sum is over the positive
divisors d of n. Then T is multiplicative, that is, T (mn) = T (m)T (n) for all
relatively prime positive integers m and n. We have
T (n) = (−1)(d−1)/2 = 1− 1.
odd d|n d|n d|n
d≡1 (mod 4) d≡3 (mod 4)
where the pi ≡ 1 (mod 4), the qj ≡ 3 (mod 4) and the exponents are nonneg-
ative integers.
Elementary Methods 27
R= r2 (n − 4h2 ) and S = r2 (n − i2 ).
4h2 <n i2 ≤n, i odd
2 r2 (n − i2 ) = r2 (n − 4h2 ).
i2 ≤n, i odd 4h2 <n
A simple induction on n shows that r2 (n) = 4T (n) for all n ≥ 0, and we have
proved the following theorem, due to Gauss. Much earlier, Fermat had proved
Corollary 2.1 and the fact that n is not the sum of two squares if any prime
q ≡ 3 (mod 4) divides n to an odd power.
and ⎛ ⎞
⎜ ⎟
r2 (n) = 4 ⎜
⎝ 1 − 1⎟
⎠.
d|n,d>0 d|n,d>0
d≡1 (mod 4) d≡3 (mod 4)
28 Sums of Squares of Integers
COROLLARY 2.1
If p ≡ 1 (mod 4) is prime, then there are unique positive integers x and y,
such that x is odd, y is even and p = x2 + y 2 .
where the ± signs are independent, give all eight ways to write p as the sum
of two squares, so x and y are unique.
COROLLARY 2.2
The number of divisors d congruent to 1 modulo 4 of a positive integer n is
at least as great as the number of divisors d congruent to 3 modulo 4.
PROOF This is clear from the second displayed formula in the theorem
and the fact r2 (n) ≥ 0.
LEMMA 2.5
If n is a positive integer, then
3r4 (n) if n is odd
r4 (2n) =
r4 (n) if n is even.
2n = x2 + y 2 + z 2 + w2 (2.10)
the numbers x, y, z, w are either all even or all odd. Therefore, the integers
x+y x−y z+w z−w
a= , b= , c= , d=
2 2 2 2
satisfy
n = a2 + b2 + c2 + d2 . (2.11)
x = a + b, y = a − b, z = c + d, w = c − d, (2.12)
the two even numbers in (2.10), we have r4 (2n) = 6P . Let Q be the number
of solutions of (2.11), in which a and b have the same parity and c and d
have opposite parity. In every solution of Equation (2.11), either a and b have
the same parity or c and d have the same parity (and not both). Therefore,
r4 (n) = 2Q. But Equation (2.12) gives a one-to-one correspondence between
solutions of Equation (2.10) counted in P and solutions of Equation (2.11)
counted in Q. Hence, P = Q and r4 (2n) = 3r4 (n) when n is odd.
n = x2 + y 2 + z 2 + v 2 + w2 (2.13)
has solutions, then in any solution three of x, y, z, v, w are odd and two are
even. Let
R= r4 (n − 4h2 )
4h2 ≤n
S= r4 (n − i2 )
i2 <n, i odd
It follows from Lemma 2.5 that r4 (4m) = 3r4 (m) when m is odd and
r4 (4m) = r4 (m) when m is even. Hence,
3r4 (j/4) if j ≡ 4 (mod 8)
r4 (j) =
r4 (j/4) if j ≡ 0 (mod 8).
∆ + ∆
=2 (δ + i) + {ss},
2
(a) (b)
2
σ(n − 4h2 ) = σ(n − i2 ) (n ≡ 3 (mod 4).)
3
4h2 ≤n i2 <n, i odd
Therefore, we have
n − i2
σ(n − 4h ) = 4
2
σ (n ≡ 5 (mod 8).)
4
4h2 ≤n i2 <n
with partition (d) n = i2 + 4dδ. Since s must be odd, the expression in braces
becomes
(−1)δ+i i = − (−1)δ i = 0
(d) (d)
Also,
(−1)d+δ d = (−1)d+δ δ,
(d) (d)
so
2 (−1)δ δ (−1)d + 1 = {−n + 1}.
(d)
The terms in which d is odd vanish. Change d to 2d and we have
for positive integers n. Notice that χ(n) = σ(n) when n is odd and χ(n) =
3σ(n) when n is even. In case n ≡ 2 (mod 4), χ(n) = 3σ(n) = 3σ(n/2) = σ(n).
We have
n − i2
χ(n − 4h ) = 4
2
χ (n ≡ 1 (mod 4).)
4
4h ≤n
2 2 i ≤n, i odd
These are the analogues of Equations (2.15) and (2.14). The formula
3χ(n) if n is odd
χ(2n) =
χ(n) if n is even.
n = 4m = x2 + y 2 + z 2 + w2 ,
either all of x, y, z, w are even or else all are odd. The number of solutions
having all four of these numbers even is r4 (m) = 8σ(m). Hence, the number
of representations of 4m as the sum of four odd squares is 16σ(m), and the
number of ways of expressing 4m as the sum of the squares of four positive
odd numbers is σ(m).
34 Sums of Squares of Integers
(−1)i (f (d + i − d ) − f (d + i + d )) = {(−1)s−1 s(f (s − d ) − f (s + d ))},
(c1 )
where the partition (c1 ) is n − d δ = i2 + d δ , δ odd, and the right side
vanishes unless n − d δ = s2 for some positive integer s. Summing this
equation over all possible d , δ , we find (see below)
with partitions
while i = −s gives
and
(−1)(x−1)/2−(z+1)/2+y yf (y) if both x and z are odd
F1 (x, y, z) =
0 if either x or z is even.
Elementary Methods 35
The reader should check that each of F and F1 is an odd function of x and
an even function of y, z. Applying Equation (2.4) to F and then to F1 yields
the two identities
2 (−1)i δf (d + i) = (−1)∆ −1+h (2h + ∆ − ∆)f (h) + 2{(−1)s−1 s2 f (s)}
(e) (f)
and
2 (−1)i (d + i)f (d + i) = (−1)∆ −1+h hf (h) + 2{(−1)s−1 s2 f (s)},
(e) (f)
and
(−1)∆ −1+h (∆ − ∆)f (h) = −2 (−1)h (d − δ)f (i).
(f) (e)
If we subtract the two identities we got from Equation (2.4), we have
valid for even functions f (x). If f (x) is an even function and i is a fixed
integer, then
ψi (x) = f (x + i) + f (x − i)
is an even function of x. Let
ω0 (n) + 2 (−1)i ωi (n − i2 ) = 0.
i2 <n, i>0
One shows easily by induction on n that ω0 (n) = 0 for every n, that is,
When d and d are even in (g) replace them by 2d and 2d , respectively.
Equation (2.20) becomes
(Here, d |k, d |(2m − k) and d|m.) But, for k odd, σ(k) is the number of
representations of 4k as a sum of the squares of four positive odd integers.
The equation shows that σ3 (m) is the number of representations of 8m as a
sum of the squares of eight positive odd integers.
As another easy application of (2.21), one can show that every prime p ≡
1 (mod 4) can be expressed as the sum of two squares of integers by letting
f (x) = cos(πx/2) and m = p in (2.21). Note that
(−1)x/2 if x is even
f (x) = cos(πx/2) =
0 if x is odd.
Equation (2.21) becomes
(−1)(d −d )/2 − (−1)(d +d )/2 =
def
S = d 1 − (−1)d .
( ) (k)
As (k) in this case is, p = dδ with δ odd, the right side is just 1(1 − (−1)1 ) +
p(1 − (−1)p ) = 2(p + 1) whenever p is an odd prime. If also p ≡ 1 (mod 4),
then the right side must be ≡ 4 (mod 8).
On the left side ( ) is 2p = d δ + d δ , where d , δ , d and δ are odd
positive integers. It is easy to see (by trying ±1 (mod 4) for for d and d )
that (d − d )/2 and (d + d )/2 always have different parity so that
T = (−1)(d −d
def )/2
− (−1)(d +d )/2
= ±2.
If d = δ in ( ), then there are four solutions with the same numbers:
2p = d δ + d δ = δ d + d δ = d δ + d δ = d δ + δ d .
The first and third solutions give the same T = ±2 and so do the second and
fourth ones. Hence, the contribution to S from these four solutions is either
0 or 4(±2), a multiple of 8 in any case. The same result holds when d = δ .
The sum of multiples of 8 must be a multiple of 8. But we have seen that
when p is prime and p ≡ 1 (mod 4), then the right side must be ≡ 4 (mod 8),
not a multiple of 8. This can happen only if there is a solution to ( ) with
d = δ and d = δ , that is, 2p = (d )2 + (d )2 , and so p is the sum of two
squares by Lemma 2.3.
Now we repeat the steps in the above derivation of (2.21) from (2.5) with
different functions. Let F (x) be an odd function. Let n be a positive integer
and let d and δ be odd positive integers with d δ < n. In (2.5) replace n
by n − d δ and use the odd function ψ(x) = F (x − d ) + F (x + d ). Multiply
by (−1)(δ −1)/2 and sum over all possible d and δ . We get
−1)/2
(−1)i+(δ (F (d − d + i) − F (d + d + i)) = (2.22)
(d)
− (−1)i+(δ−1)/2 i F (i + d),
(e)
38 Sums of Squares of Integers
which is similar to (2.19). Repeat the same steps from (2.19) to (2.20) and
find
−1)/2
(−1)(δ (F (d − d ) + F (d + d )) =
(g)
1 + (−1)(δ−1)/2 δ
(−1)(δ−1)/2 dF (d) − F (d).
2
(h) (h)
Sum each of these two equations over all possible d and δ . We get the same
equations as above, but with (a) and (b) replaced by
respectively.
Since i and −i run through the same set of integers, we have
2 (f (d − d + i) − f (d + d + i)) = (ψi (d − d ) − ψi (d + d )),
(d) (d)
Now apply (2.23) in the same manner to the identity we obtain from (2.7).
This yields
d+δ d+δ
h
(−1) f −d −f +d = (2.25)
2 2
(c)
2 (−1)(i−1)/2+(δ−1)/2 d f (i − 2d).
(e)
We will use these two equations with f (x) = 1 if |x| = 1 and f (x) = 0 for
other x. On the left side of (2.24) and (2.25) the terms f d+δ
2 +d always
vanish because 2 + d ≥ 2, and f 2 − d = 0 unless d = 2 ± 1.
d+δ d+δ d+δ
Since dis odd this implies that d + δ ≡ 0 (mod 4). Hence nonzero terms
arise in (c) only for solutions of one of the equations
4n + 1 = 4h2 + dδ + (d + δ − 2)δ
4n + 1 = 4h2 + dδ + (d + δ + 2)δ ,
where d, δ and δ are positive odd integers, such that d + δ ≡ 0 (mod 4).
For positive integers n, let T (n) denote the total number of solutions to the
two equations
4n + 1 = dδ + (d + δ − 2)δ
4n + 1 = dδ + (d + δ + 2)δ ,
with the same restrictions on d, δ and δ . With this notation the left sides of
(2.24) and (2.25) are
T (n − h2 ) and (−1)h T (n − h2 ),
h2 <n h2 <n
respectively.
We will need to evaluate the right sides of (2.24) and (2.25) only when
n = 2m and n = 4m, where m is an odd positive integer. In (2.24), f (i) = 0
unless i2 = 1, and we have
8σ(m) if n = 2m
2 d f (i) = 4 d=
16σ(m) if n = 4m.
(e) n=dδ, δ odd
Elementary Methods 41
The only terms in −2 (e) d f (i − 2d) that do not vanish are those with
i = 2d ± 1. For this i, (e) is n = d(d + δ ± 1). But δ is odd, so the two factors
d, (d + δ ± 1) have the same parity. This is impossible when n = 2m with m
odd, so the sum vanishes in this case. Similarly,
2 (−1)(i−1)/2+(δ−1)/2 d f (i − 2d) = 0
(e)
T (2m − h2 ) = 8σ(m)
h2 <2m
and
(−1)h T (2m − h2 ) = 0.
h2 <2m
−2 d f (i − 2d) = −4 ∆.
(e) m=∆(∆+(δ±1)/2)
2 (−1)(i−1)/2+(δ−1)/2 d f (i − 2d)
(e)
has the same value. These sums cancel when we subtract (2.25) from (2.24).
The result is
Next we show that r3 (n) satisfies identities like those just proved for T (n),
and thereby establish relations between the two functions. Let m be an odd
positive integer. By Remark 2.1 at the end of Section 2.7, the number of
solutions of
4m = x2 + y 2 + z 2 + w2
in odd numbers is 16σ(m). Therefore,
Comparing this equation with (2.28) gives r3 (4m − 1) = 2T (4m − 1) for all
odd m, that is,
r3 (8n + 3) = 2T (8n + 3)
for every nonnegative integer n.
For odd m there are exactly 6σ(m) solutions to
2m = x2 + y 2 + z 2 + w2 (2.29)
with positive odd x, by Jacobi’s theorem (Theorem 2.6). (That theorem says
that r4 (2m) = 24σ(m) when m is odd. Since two of x, y, z, w are even and two
are odd, x is odd in half of the solutions, and x is positive in half of those.)
Hence,
6σ(m) = r3 (2m − 12 ) + r3 (2m − 32 ) + · · · .
Comparison with (2.27) yields r3 (2m − 1) = 3T (2m − 1) for odd m, or
r3 (4n + 1) = 3T (4n + 1)
for every nonnegative integer n.
By Jacobi’s theorem there are exactly 12σ(m) solutions to (2.29) in which
x is even. Thus,
12σ(m) = r3 (2m) + 2r3 (2m − 22 ) + 2r3 (2m − 42 ) + · · · .
From Equation (2.26) it follows that r3 (2m) = 3T (2m) for odd m, that is,
r3 (4n + 2) = 3T (4n + 2)
for every nonnegative integer n.
Combining these formulas and recalling Lemmas 2.3 and 2.4, we find
⎧
⎪
⎪ 3T (n) if n ≡ 1, 2, 5 or 6 (mod 8)
⎨
2T (n) if n ≡ 3 (mod 8)
r3 (n) =
⎪
⎪ 0 if n ≡ 7 (mod 8)
⎩
r3 (n/4) if n ≡ 0 or 4 (mod 8),
and the proof of the three squares theorem is nearly complete.
PROOF Following the above formula for r3 (n) in terms of T (n) it suffices
to show that T (n) is always positive. Now T (n) is the total number of solutions
of the two equations
4n + 1 = dδ + (d + δ − 2)δ
4n + 1 = dδ + (d + δ + 2)δ
Elementary Methods 43
COROLLARY 2.3
Every positive integer is the sum of three triangular numbers.
x2 + x y 2 + y z 2 + z
n= + +
2 2 2
REMARK 2.2 It may be shown (Smith [14], p. 323) that T (n) is four
times the number of odd classes of binary quadratic forms ax2 + 2bxy + cy 2
of discriminant b2 − ac = −n, where classes of ax2 + ay 2 are counted as 1/2.
(“Odd” means that at least one of a, c is odd; if one form in a class is odd,
then so is every form in that class.) A related result is shown in the next
section.
n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
T (n) 2 4 4 4 8 8 4 8 10 8 12 8 8 16 8
r3 (n) 6 12 8 6 24 24 0 12 30 24 24 8 24 48 0
r2 (m) = 4 χ(d).
d|m, d>0
d odd
When m ≡ 2 (mod 4) and m > 0, the only way to express m as the sum of two
squares is with odd squares. If we require also that the squares be squares of
positive integers, the factor of 4 is not needed, and we can write this formula
in the form
= χ(a)χ(c)
m=2ab+2cd
a−c
= (−1) 2 ,
m=2ab+2cd
a = x + y, c = x − y, b = z − t, d = z + t,
Elementary Methods 45
we obtain
N0 = e. (2.33)
e|m, e>0
e odd
Here we have used the fact that d = t+z has solutions t = −d+2i, z = 2(d−i),
for i = 0, . . ., d − 1.
If x, y, z, t satisfy m = 4(xz − yt), and we replace y by −y and t by −t, but
leave x and z unchanged, then the new values still satisfy the equation. This
change of variables gives a one-to-one correspondence between terms (−1)y
summed in N+ and those summed in N− , and shows that N+ = N− .
Consider now a solution (x, y, z, t) of Equation (2.32) with y > 0. We then
have
x
>1 and x ≡ y (mod 2),
y
which implies that there is an integer u, such that
x
2u − 1 < < 2u + 1.
y
We put
x = 2uz − t
y = z
z = y
t = 2uy − x.
46 Sums of Squares of Integers
CLAIM 2.1
Let a, b and n be positive integers. Let f (a, b, n) denote the number of integer
solutions of
Then we have
f (a, b, n) = f (b, a, n).
PROOF We now prove the claim. Note that f (b, a, n) is the number of
integer solutions of the equation
n = ax + by ≥ a · 1 + b(a + 1) = a + b + ab,
and n is a multiple of gcd(a, b). If n were not a multiple of gcd(a, b), then
there would be no integer solutions of System (2.34) because one side would
be divisible by gcd(a, b) while the other side would not be. Also note that
these conditions are symmetric in a and b.
Now let (x, y) be an integer solution of System (2.34). Since y ≡ 0 (mod a),
there is a unique positive integer, namely, u = y/a, such that
y
u< < u + 1,
a
or equivalently,
au < y < au + a,
that is,
0 < y − au < a.
Put x = y − ua.
x
Since 0 < x < b, we have 0 < b < 1 and hence
x
u< + u < 1 + u,
b
which implies
x + ub
u< < u + 1.
b
48 Sums of Squares of Integers
y > b.
We are now ready to state the main lemma. Its statement and proof are
similar to those of Claim 2.1, but we restrict the parity of x and y.
LEMMA 2.6
Let a and b be positive integers. Let m be an integer. Let α, β ∈ {0, 1}. Let
φ(a, b, α, β, m) denote the number of integer solutions of
y ≡ a (mod 2a)
Denote x = y − 2au.
From the inequality |x| < b, that is, −b < x < b, with the same integer u
as above, we get
−b + 2bu < x + 2bu < b + 2bu.
This yields
−2b + 2bu < x + 2bu − b < 2bu,
which is the same as
x + 2bu − b
u−1< < u.
2b
Therefore x + 2bu ≡ b (mod 2b). Denote y = x + 2bu.
Observe that
Also note that since −b < x < b, we have −b + 2bu < x + 2bu, and this implies
b ≤ b(2u − 1) < y . Therefore y > b.
Similarly x ≡ α (mod 2) implies x + 2bu ≡ α (mod 2), that is, y ≡
α (mod 2). Using y ≡ β (mod 2), we get y − 2au ≡ β (mod 2) and hence
x ≡ β (mod 2).
We have thus shown that there is a bijection between the solutions of
Systems (2.36) and (2.37) given by (x, y) → (x , y ). We should note that
φ(a, b, α, β, m) = 0 except if m is a multiple of gcd(a, b), m ≥ a + b and
m ≡ aα + bβ (mod 2). These conditions for φ(a, b, α, β, m) = 0 are symmetric
for a, b and α, β. This completes the proof of the lemma.
We now come to the derivation of the formula for r3 (m). The elementary
method of Kronecker which we present requires that we know a priori the
shape of the formula to be proved, a fact that can be taken as a weak point
in the derivation, as compared to Gauss’ original proof.
in integers a, b, c.
Note that H(m) = 0 unless −m < 0 and −m ≡ 1 (mod 4). This follows
from the fact that b ≡ 1 (mod 2) implies b2 ≡ 1 (mod 4), and this in turn
yields m = 4ac − b2 ≡ −1 (mod 4).
If a ≤ c, then (2.38) gives 0 < b < 2a and 0 < b ≤ 2a − 1, which implies
b2 ≤ 4a2 − 4a + 1, or −b2 ≥ 4a − 1 − 4a2 . Therefore,
Example 2.1
Let m = 3. Then −m < 0 and −m = −3 ≡ 1 (mod 4). Let us compute
H(3). We will count the reduced binary quadratic forms ax2 + bxy + cy 2 with
discriminant b2 − 4ac = −3. The formula displayed just before this example
gives
4 = m + 1 = 4a(c − a + 1)
or a(c − a + 1) = 1. From (2.38), a and c must be positive integers. The only
possibility is a = c = 1. From −3 = b2 − 4ac = b2 − 4 we find that b = 1. Hence
x2 + xy + y 2 is the only reduced binary quadratic form with discriminant −3,
and so H(3) = 1.
Note that when k ≡ 3 (mod 8), the only way to write k as a sum of three
squares uses odd squares, and so ρ3 (k) = r3 (k)/8. We can now state the main
result.
ρ4 (m) = r3 (m − x2 ),
0<x odd
where the summation is now taken over all odd integers x, positive and neg-
ative.
Elementary Methods 51
m − x2 = 4n − x2 = 4ac − b2 ,
then
x2 − b2
n=a·c+ .
4
Therefore, we can express Xn in the equivalent form Xn =
1 x2 − b2
.
(a, b, c, x) : n = ac + , b > 0, b < 2a, b < 2c, b ≡ x ≡ 1 (mod 2)
2 4
|a − y| < c − z.
{(a, c, y, z) : n = ac−yz, 0 < y+z < 2a, y+z < 2c, |a−y| < c−z, y ≡ z (mod 2)}.
Then A ⊆ B, where B =
{(a, c, y, z) : n = ac−yz, 0 < y+z < 2a, y+z > 2c, |a−y| < c−z, y ≡ z (mod 2)}.
and these imply 2c < y + z < c + a, from which we obtain 2c < c + a and
0 < y + z < c + a. Therefore, −a < c < a, and hence
as claimed.
Note that the conditions that define B also imply that both a and c are
odd, and that y and z have opposite parity.
On the set B we perform the following change of variables:
y = a − u, z = u + w, c = u + v + w.
Thus we have
v = u + v + w − u − w = c − z > |a − y| = |a − a + u| = |u|.
Note that |B| = |D|. Since the conditions defining D are equivalent to those
defining B, we have that a is odd and therefore w is even.
Let us now consider the quadruples in D for which u = 0. In this case we
have
n = av, |w| < a, v > 0.
Thus a can be any positive divisor of n. For each such a, w can be any even
number with |w| < a, and there are a possible values for w. Therefore the
total number of quadruples with u = 0 is
d = ρ4 (4n).
d|n, d>0
d odd
The conditions defining E imply that v and u have opposite parity. To see
this, observe that the equality
n − u2 = av + uw,
where the sum runs over all pairs (u, a) for which u > 0, a > 0 and a is odd.
The same argument we used in the two paragraphs after (2.33) to show that
N+ = N− = 0 also shows that all the terms in the sum u,a vanish except
those for which
n − u2 ≥ u + a.
(There is no (−1)y here because we are just trying to show that there are no
solutions.) This implies in particular that the set B is finite.
As we have already shown that in the set C the inequality y + z < 2a is
implied by the other inequalities, we will omit it from here on.
We make a further change of variables in C,
a = u + v + w, y = u + w, z = c − u,
or equivalently,
n − u2 = vc + uw.
The inequality y + z > 2c implies u + w + c − u > 2c, and hence
w > |c|.
u > |v|.
Again with φ as in Lemma 2.6, for a given pair (c, u), the number of pairs
(v, w) for which (c, u, v, w) ∈ F is φ(c, u, u + 1, 0, n − u2 ) and hence we see
that
|F | = φ(c, u, u + 1, 0, n − u2 ),
c,u
where the sum runs over all pairs (c, u) for which c > 0, u > 0 and c is odd.
We now apply Lemma 2.6 to obtain
|E| = |F |.
k = t2 + u2 + v 2 + w2 and s = t + u + v + w. (2.41)
LEMMA 2.7
Let k and s be odd positive integers, such that s2 < 4k and 3k < s2 + 2s + 4.
Then there exist nonnegative integers t, u, v and w, such that (2.41) holds.
so that
4k − s2 = x2 + y 2 + z 2 . (2.42)
These three integers must be odd. By reordering them and changing their
signs, we may assume that x ≥ y ≥ z > 0. Choose the sign of ±z to
make s + x + y ± z ≡ 0 (mod 4). Define integers t = (s + x + y ± z)/4,
u = (s + x − y ∓ z)/4, v = (s − x + y ∓ z)/4, w = (s − x − y ± z)/4. They satisfy
(2.41) and t ≥ u ≥ v ≥ w. Using the Cauchy-Schwartz inequality, (2.42) and
3k < s2 + 2s + 4 we find
PROOF One verifies easily that for each odd positive integer k ≤ 9 there
is an odd positive integer s, such that s2 < 4k√ and 3k
2
√ < s + 2s + 4. The
same statement holds for k > 9 because√ then 4k − ( 3k − 2 − 1) > 2 (and
3k < s2 + 2s + 4 is equivalent to s ≥ 3k − 2 − 1).
Now let k be a fixed odd positive
√ integer. Let s1 be
√ the smallest and s2 be
the largest odd integers between 3k − 2 − 1 and 4k. By the lemma just
proved, the theorem holds for all numbers of the form
m
(k − s) + s + r, (2.43)
2
where s is odd, s1 ≤ s ≤ s2 and 0 ≤ r ≤ m − 2. Since m > 2, the smallest of
these numbers is
m
Ck = (k − s2 ) + s2
2
and the largest is
m
Dk = (k − s1 ) + s1 + m − 2.
2
Now
m m
(k − s) + s + m − 2 = (k − (s − 2)) + (s − 2),
2 2
56 Sums of Squares of Integers
and √
4(k + 2) − 4k < 2
for all positive integers k. Therefore, s1 = s1 or s1 + 2, and s2 = s2 or
s2 + 2. Hence Dk+2 ≥ Dk + 2 and limk→∞, k odd Dk = ∞. Note also C1 =
2 (1 − 1) + 1 = 1. We will show that the closed intervals [Ck , Dk ] overlap
m
This inequality shows that there is no integer between the interval [Ck , Dk ]
and [Ck+2 , Dk+2 ] and the theorem holds for all n in Ck ≤ n ≤ Dk+2 .
Otherwise we have s2 − s1 = 0, which implies Ck+2 = Dk +√2, and also that
there is at most one odd number (namely
√ s1 = s2 ) between 3k − 2 − 1 and
4(k + 2). Hence 4(k + 2) − 3k − 2 − 1 < 4, which holds only when
k ≤ 105. We have represented every positive integer in the form (2.43) except
those numbers
√ Dk + 1 with k = 1, 3, 5, . . . , 105, for which there
√ is only one odd
s between 3k − 2 − 1 and 4(k + 2). But 4(k + 2) − 3k − 2 − 1 > 2
for every positive integer k, so there is at least one odd s between these limits.
One verifies, perhaps by computer calculation, that for 1 ≤ k ≤ 105, k odd,
if there is only one such odd s, then there exist nonnegative integers t, u, v
and w, such that either
or
k + 1 = t2 + u2 + v 2 + w2 and s + 1 = t + u + v + w.
Elementary Methods 57
Now
(k + 1) − (s − 1)
Ck + 1 = m + (s − 1)
2
(k + 1) − (s + 1)
=m + (s + 1) + m − 2.
2
Hence these numbers are representable in the form (2.40), too, and we are
done.
Legendre [65] proved that four or five polygonal numbers are enough for all
sufficiently large n. See Nathanson [75] for a modern proof of the following
version of Legendre’s theorem:
2.12 Exercises
1. Show that distinct integers s give distinct pentagonal numbers (3s2 +
s)/2 in the sense of Section 2.4.
2. Make a table of the “pentagonal” numbers (3s2 +s)/2 < 100. Beginning
with ω(1) = σ(1) = 1, use the recursion formulas in Theorems 2.3 and
2.4 to make tables of ω(n) and σ(n) for all n ≤ 100.
Prove that χ is totally multiplicative, that is, χ(de) = χ(d)χ(e) for all
integers d, e. Prove that for odd positive integers n we have
5. Let σ(n) denote the sum of the odd positive divisors of n. Prove that
σ is multiplicative and that σ(n) = σ(m), where m is the greatest odd
58 Sums of Squares of Integers
9. Prove that for odd positive integers m, σ2 (m) = min(d , d ), where
the sum extends over all odd positive integers d , d , δ , δ , such that
2m = d δ + d δ .
13. Let r(n) denote the number of triples (x, y, z) of integers with n =
x2 + y 2 + 2z 2 . Prove or disprove the following statements:
15. Find all positive integers d, such that every positive integer is repre-
sentable in the form x2 + y 2 + z 2 + dw2 . Repeat for x2 + y 2 + 2z 2 + dw2
with d ≥ 2, for x2 +2y 2 +2z 2 +dw2 with d ≥ 2 and for x2 +y 2 +4z 2 +dw2
with d ≥ 4.
Elementary Methods 59
16. Find the smallest positive integer k, such that every positive integer is
the sum of k or fewer odd squares.
17. Find the smallest positive integer k1 , such that every sufficiently large
positive integer is the sum of k1 or fewer odd squares.
18. Prove that if a positive integer is the sum of three squares of rational
numbers, then it is the sum of three squares of integers. (The converse
is obvious.)
19. For positive real numbers x, let N (x) denote the number of positive
integers n ≤ x that are not the sum of three squares, that is, for which
r3 (n) = 0. Prove that for all x ≥ 1,
x 7 log x x log x
− − 1 < N (x) < + .
6 8 log 4 6 8 log 4
22. Prove that if p and q are prime numbers with p2 +1 = 2q, then p2 +1 = 2q
is not the sum of the squares of two primes. (Note that 1 is not a prime.)
25. Extend the table of T (n) and r3 (n) at the end of Section 2.9 to n = 50.
27. Show by direct construction that the class number H(k) in Theorem 2.8
is positive when k ≡ 3 (mod 8).
28. Verify the final step of the proof of Theorem 2.9. Perhaps by computer
calculation, show that for each√odd k between 1 and 105 for which
there is only one odd s between 3k − 2 − 1 and 4(k + 2), there exist
nonnegative integers t, u, v and w, such that either
k + 1 = t2 + u 2 + v 2 + w 2 and s−1=t+u+v+w
or
k + 1 = t2 + u2 + v 2 + w2 and s + 1 = t + u + v + w.
Chapter 3
Bernoulli Numbers
Some of the material in this chapter is used in later chapters. Specifically,
we will use the definitions of the Bernoulli and Euler numbers and polynomi-
als, the functional equations for the Riemann zeta function and the Dirichlet
L-functions, their values at negative integers, and the definition and basic
properties of the generalized Bernoulli numbers.
The rest of the material in this chapter is beautiful mathematics for you to
enjoy. We love this material, we hope you do, too, and we believe we have
presented it with a novel slant.
3.1 Overview
The Bernoulli numbers have important uses in many parts of number theory
and, indeed, throughout mathematics. We will need some of their properties
in later chapters. In this chapter, we define Bernoulli numbers and prove some
of their important and interesting properties.
We will define the Bernoulli numbers Bn as coefficients of a power series
∞
t tn
= Bn .
et − 1 n=0 n!
We will show that they are rational numbers and that Bn = 0 when n is > 1
and odd. We define the Bernoulli polynomials Bn (x) in a similar fashion and
show that
m
Bn+1 (m + 1) − Bn+1
in = .
i=1
n+1
For the case n = 2 of squares, the subject of this book, the formula says
m
m(m + 1)(2m + 1)
i2 = 1 + 4 + 9 + 16 + 25 + · · · + m2 = .
i=1
6
61
62 Sums of Squares of Integers
We prove that the derivative of Bn+1 (x) is (n+ 1)Bn (x) and use integration
by parts to obtain the Euler-MacLaurin sum formula (Theorem 3.5), which is
useful for approximating a sum by an integral. We give some applications of
this formula, including Stirling’s formula for n! and the functional equation
of the Riemann zeta function ζ(s). We prove formulas for ζ(2n) such as
∞
1 1 1 1 π2
n−2 = 1 + + + + + ··· = .
n=1
4 9 16 25 6
valid for |t| < 2π. The coefficients Bn are called the Bernoulli numbers. It is
easy to find a recursion formula
n for the Bn . In the theorem the symbol ni is
the binomial coefficient i = n!/(i!(n − i)!).
Bernoulli Numbers 63
n−1
n
1 if n = 1
Bi =
i 0 1.
if n =
i=0
This recursion formula shows that the Bernoulli numbers are rational num-
bers and enables us to compute them.
Example 3.1
1 · B0 = 1, so B0 = 1,
1
B0 + 2B1 = 0, so B1 = − ,
2
1
B0 + 3B1 + 3B2 = 0, so B2 = ,
6
B0 + 4B1 + 6B2 + 4B3 = 0, so B3 = 0,
1
B0 + 5B1 + 10B2 + 10B3 + 5B4 = 0, so B4 = − , etc.
30
n 0 1 2 4 6 8 10 12 14 16 18 20
Bn 1 − 21 1
6 − 30
1 1
42 − 30
1 5
66 − 2370
691 7
6 − 3617
510
43867
798 − 174611
330
yields
n
n
Bn (x) = Bi xn−i ,
i=0
i
which is a polynomial of degree n (since B0 = 1) with rational coefficients. It
is called the n-th Bernoulli polynomial. We find using Theorem 3.1,
n
n 1 + B1 = 12 if n = 1
Bn (1) = Bi = (3.1)
i Bn if n = 1.
i=0
Example 3.2
By the method of Example 3.1, the first few Bernoulli polynomials are
1 1
B0 (x) = 1, B1 (x) = x − B2 (x) = x2 − x + ,
,
2 „ « 6
3 2 1 1
B3 (x) = x − x + x = x(x − 1) x −
3
,
2 2 2
1
B4 (x) = x4 − 2x3 + x2 − ,
30
5 5 1
B5 (x) = x5 − x4 + x3 − x.
2 3 6
See Abramowitz and Stegun [1] for extensive tables of Bernoulli numbers
and the coefficients of Bernoulli polynomials.
Bernoulli polynomials were first introduced (by Bernoulli himself) to eval-
uate the sum
k−1
k−1
Sn (k) = in = in .
i=1 i=0
Add this equation for x = 1, 2, . . . , k. The sum on the left side telescopes and
we obtain the theorem.
Example 3.3
X
k−1
k2 k X
k−1
k3 k2 k
S1 (k) = i= − , S2 (k) = i2 = − + ,
i=1
2 2 i=1
3 2 6
X
k−1
k 4
k k 3 2 X
k−1
k5 k4 k3 k
S3 (k) = i3 = − + , S4 (k) = i4 = − + − .
i=1
4 2 4 i=1
5 2 3 30
COROLLARY 3.1
For all real a ≤ b and nonnegative integers n,
b
1
Bn (t) dt = (Bn+1 (b) − Bn+1 (a)) .
a n+1
COROLLARY
1 3.2 1
We have 0 B0 (t) dt = 1 and 0 Bn (t) dt = 0 for all integers n > 0.
where b
(−1)n−1
Rn = Bn (x − x)f (n) (x) dx.
n! a
1 1
PROOF Write 0 f (x) dx = 0 B0 (x)f (x) dx and integrate by parts n
times. Using Equation (3.1) we get
1 1
Bk (x) (k−1) 1
n
Bn (x) (n)
f (x) dx = (−1)k−1 f (x) + (−1)n f (x) dx
0 k! 0 0 n!
k=1
Bk (k−1)
n
= (−1)k−1 f (1) − f (k−1) (0) + f (1)
k!
k=1
1
Bn (x) (n)
+ (−1)n f (x) dx.
0 n!
Replace f (x) by f (j − 1 + x) and find
1 Bk (k−1)
n
f (j) = f (j − 1 + x) dx + (−1)k f (j) − f (k−1) (j − 1)
0 k!
k=1
1
Bn (x) (n)
+ (−1)n−1 f (j − 1 + x) dx.
0 n!
The theorem follows when we add this equation from j = a + 1 to j = b.
where
1
(n)
aj =2 Bn (x) cos 2πjx dx
0
1
(n)
bj =2 Bn (x) sin 2πjx dx.
0
Bernoulli Numbers 67
(1)
for integers n > 1 and aj = 0 for positive integers j. For integers n > 1 we
have
1 1
cos 2πjx 2
B (x) cos 2πjx dx
(n)
bj = −2Bn (x) +
2πj 2πj 0 n
0
1
2n n (n−1)
= Bn−1 (x) cos 2πjx dx = a ,
2πj 0 2πj j
while for n = 1,
1 1
(1) cos 2πjx 2 2
bj = −2B1 (x) + cos 2πjx dx = − .
2πj 2πj 0 2πj
0
for all positive integers m and j. Hence we have proved Theorem 3.6.
68 Sums of Squares of Integers
for positive integers m, which, since the sum clearly positive, proves Theorem
3.7.
In Section 3.5, we will give a proof of Theorem 3.7 that does not use Fourier
series.
For complex numbers s with real part > 1, the Riemann zeta function is
defined by
∞
ζ(s) = n−s = (1 − p−s )−1 .
n=1 p prime
Thus, the equation just before Theorem 3.7 may be rewritten as Theorem 3.8.
(2π)2m (2π)2m
ζ(2m) = (−1)m−1 B2m = |B2m |.
2(2m)! 2(2m)!
Example 3.4
With m = 1, 2, 3 and 4 we have
X∞
1 π2 X∞
1 π4 X∞
1 π6 X∞
1 π8
= , = , = , = .
n=1
n2 6 n=1
n4 90 n=1
n6 945 n=1
n8 9450
Bernoulli Numbers 69
It is easy to see that ζ(s) is monotonically decreasing for real s > 1 and
lims→∞ ζ(s) = 1. This gives the estimates
(2m)! (2m)! (2m)!
2 < |B2m | ≤ 2 ζ(2) = ,
(2π)2m (2π)2m 12(2π)2m−2
valid for positive integers m, which shows that |B2m | increases monotonically
after its minimum at B6 = 1/42, and
(2m)!
|B2m | ∼ 2 . (3.3)
(2π)2m
(The notation f (x) ∼ g(x) means limx→∞ f (x)/g(x) = 1.) We may obtain
estimates for Bn (t − t), too. From Theorem 3.6, we have for n ≥ 2,
∞
2 n! 1 2 n!
|Bn (t − t)| ≤ = ζ(n) (3.4)
(2π)n j=1 j n (2π)n
2 n! n!
≤ ζ(2) = ,
(2π)n 12(2π)n−2
and for even n we have by Theorem 3.8,
where ∞
1
Ω2m (N ) = B2m (t − t)t−2m dt
2m N
and
1 ∞
m
B2j
Km = B2m (t − t)t−2m dt − .
2m 1 j=1
(2j − 1)2j
70 Sums of Squares of Integers
N
√ √ N
N! ∼ 2πe−N N N +1/2 = 2πN . (3.6)
e
and
1 · 3 · 5 · · · (2n − 1) √ 1
lim n= √ . (3.9)
n→∞ 2 · 4 · 6 · · · (2n) π
PROOF In the proof we will use the notation m!! to denote the product
of every second integer from m down to 1 or 2. That is, (2n)!! = (2n)(2n −
2) · · · 4 · 2 and (2n − 1)!! = (2n − 1)(2n − 3) · · · 3 · 1. With this notation, (3.8)
and (3.9) become
2
(2n)!! 1 π (2n − 1)!! √ 1
lim = and lim n= √ .
n→∞ (2n − 1)!! 2n + 1 2 n→∞ (2n)!! π
π/2
For nonnegative integers m let Im = 0 sinm x dx. Integrating by parts
we find for m ≥ 2,
π/2 π/2
Im = −(sinm−1 x) cos x0 + (m − 1) (sinm−2 x) cos2 x dx
0
π/2
= (m − 1) sinm−2 x(1 − sin2 x) dx
0
= (m − 1)(Im−2 − Im ) = (m − 1)Im−2 − (m − 1)Im ,
π/2 π/2
or Im = m−1
m Im−2 . Now I0 = 0 1 dx = π/2 and I1 = 0 sin x dx = 1. A
simple induction argument yields
⎧
⎨ (m − 1)!! π if m is even
Im = m!! 2
⎩ (m − 1)!!
if m is odd.
m!!
For every real x in 0 ≤ x ≤ π/2 and for every positive integer n we have
or
2 2
1 (2n)!! π 1 (2n)!!
≤ ≤ .
2n + 1 (2n − 1)!! 2 2n (2n − 1)!!
72 Sums of Squares of Integers
Let A denote the left side and B denote the right side of this inequality, so
that it says A ≤ π/2 ≤ B. Now
2
1 1 (2n)!!
B−A = −
2n 2n + 1 (2n − 1)!!
2
1 (2n)!!
=
(2n)(2n + 1) (2n − 1)!!
1 π π
≤ = ,
2n 2 4n
since A ≤ π/2. Therefore, limn→∞ (B − A) = 0 and Formula (3.8) follows. If
we take the reciprocal of the square root of Formula (3.8), we find
1 · 3 · 5 · · · (2n − 1) 1 1
lim n+ = √ ,
n→∞ 2 · 4 · 6 · · · (2n) 2 π
N !eN
c = eK = lim .
N →∞ N (N +1/2)
or N ! ∼ ce−N N (N +1/2) .
If we multiply numerator and denominator in (3.9) by 2 · 4 · 6 · · · (2n), we
find
(2n)! √ 1
lim n= √ .
n→∞ (2n n!)2 π
Apply N ! ∼ ce−N N (N +1/2) with N = 2n and with N = n to obtain
√
ce−2n (2n)(2n+1/2) n 1
lim 2 = √ .
n→∞ n −n
2 ce n (n+1/2) π
√ √ √
This formula simplifies to limn→∞ 2/c = 1/ π, or c = 2π, that is, K =
√
log 2π.
Bernoulli Numbers 73
The limit
N
1
γ = lim − log N
N →∞ n
n=1
exists by the integral test and is called Euler’s constant. Its definition provides
a poor method for computing it because the expression in parentheses is only
within a constant times N −1 of γ. Thus, approximately 1,000,000,000 terms
in the sum would be required to compute γ to 9 decimal places. Furthermore,
if this calculation were performed by standard floating point arithmetic on
a computer, the round-off error would be enormous and in the last step two
nearly equal large numbers would be subtracted, resulting in the catastrophic
loss of significant figures. The Euler-MacLaurin sum formula provides a better
way to compute γ.
Apply Theorem 3.5 with f (x) = 1/x, n = m, a = 1 and b = N . The details
are like those of the derivation of Stirling’s formula. We find
N
1 1
m
B2j −2j
= log N + γ + − N + Ψ2m (N ),
n=1
n 2N j=1 2j
where
|B2m | −2j
|Ψ2m (N )| ≤ N .
2m
Example 3.5
With N = 6 and m = 3 we have
1 1 1 1 1 1 B2 −2 B4 −4 B6 −6
2.45 = 1 + + + + + = log 6 + γ + − 6 − 6 − 6 + Ψ6 (6).
2 3 4 5 6 12 2 4 6
We have B2 = 16 , B4 = − 30
1
and B6 = 1
42
, so with an error not greater than
˛ ˛
˛ B6 ˛ −6
˛
|Ψ6 (6)| ≤ ˛ ˛ 6 = 7−1 6−8 = 0.000000085
6 ˛
we find
1
γ = 2.45 − log 6 − + 2−1 6−3 − 20−1 6−5 + 7−1 6−8 ≈ 0.577215667.
12
With larger values of m and N one obtains the more precise estimate γ ≈
0.5772156649015328606. It is not known whether γ is rational or irrational.
which equals π 2m−1 times a rational number. (See also Exercise 12 at the end
of this chapter.)
Example 3.6
With B1 (t) = t − 1/2 and B3 (t) = t(t − 1)(t − 1/2) we find
X∞
(−1)n 1 1 π X∞
(−1)n 1 1 π3
= 1 − + − · · · = and 3
= 1 − 3 + 3 − ··· = .
n=0
2n + 1 3 5 4 n=0
(2n + 1) 3 5 32
We can evaluate a few more sums of this type. For s > 1 we have
∞
1 ∞ 1 ∞
1
= (1 − 2−s )ζ(s)
def
λ(s) = s
= s
− s
n=0
(2n + 1) k (2k)
k=1 k=1
and
∞ ∞
∞
def (−1)n−1 1 1
η(s) = s
= s
−2 s
= (1 − 21−s )ζ(s).
n=1
n k (2k)
k=1 k=1
Example 3.7
By L’Hôpital’s rule (and ζ (1) = 1), η(1) = log 2. Theorem 3.8 allows us to
evaluate λ(s) and η(s) for even positive integers s. Thus,
1 1 π2 1 1 π4
λ(2) = 1 + 2
+ 2 + ··· = , λ(4) = 1 + 4
+ 4 +··· = ,
3 5 8 3 5 96
1 1 π2 1 1 7π 4
η(2) = 1 − + − · · · = and η(4) = 1 − + − · · · = .
22 32 12 24 34 720
Let (s) denote the real part of the complex number s. The Riemann
ζ-function has been defined for (s) > 1. We will extend this domain by
analytic continuation and, at the same time, give a convenient formula for
computing ζ(s) for any complex number s = 1. Apply the Euler-MacLaurin
sum formula with f (x) = x−s , a = 1 and b = N . Let n be a positive
Bernoulli Numbers 75
N
n
Bj
ζ(s) − 1 = lim x−s dx − s(s + 1) · · · (s + j − 2)(N −s+1−j − 1)
N →∞ 1 j!
j=1
N
1
− s(s + 1) · · · (s + n − 1) Bn (t − t)t−s−n dt
n! 1
N 1−s − 1 1 −s
= lim + (N − 1)
N →∞ 1−s 2
n
Bj
− s(s + 1) · · · (s + j − 2)(N −s−j+1 − 1)
j=2
j!
N
1 −s−n
− s(s + 1) · · · (s + n − 1) Bn (t − t)t dt .
n! 1
1 Bj
n
1
ζ(s) = + + s(s + 1) · · · (s + j − 2) (3.11)
s − 1 2 j=2 j!
∞
1
− s(s + 1) · · · (s + n − 1) Bn (t − t)t−s−n dt.
n! 1
But the right side converges in (s) > 1 − n, except at s = 1. Hence ζ(s) is
continued analytically to a meromorphic function in the whole complex plane
with its only singularity a simple pole with residue 1 at s = 1.
Formula (3.11) provides an excellent way to compute ζ(s) for any complex
number s = 1. Take n to be a positive integer with n > 1 − (s). For any
such n, Bn (t − t) is a polynomial in t for t between consecutive integers.
Thus we can compute
N
Bn (t − t)t−s−n dt
1
explicitly for any reasonable positive integers N and n. The remainder may
be estimated by (3.4).
With s = 0 in (3.11) we have ζ(0) = −1/2. We can also use (3.11) to find
ζ(−k) for positive integers k. When n = k + 1 and s = −k, Equation (3.11)
76 Sums of Squares of Integers
becomes
1 Bj
k+1
1
ζ(−k) = + + (−k)(−k + 1) · · · (−k + j − 2)
−k − 1 2 j=2 j!
⎛ ⎞
k + 1
k+1
−1 ⎝ 1
= 1+ − (k + 1) + Bj ⎠
k+1 2 j=2
j
k+1
−1 k + 1 Bk+1
= Bj = − ,
k + 1 j=0 j k+1
by Theorem 3.1. Thus ζ(−2m) = 0 and ζ(1 − 2m) = −B2m /2m for positive
integers m. The zeros at the negative even integers are called the “trivial
zeros” of the Riemann ζ-function.
See H. M. Edwards, [26], pp. 114–118, for more examples of the use of
Equation (3.11).
We will now sketch a proof of the functional equation for the ζ-function.
This equation connects ζ(s) and ζ(1 − s). For more details of the proof, see
Rademacher [87], p. 83. Let n = 3 in Equation (3.11). Then
1 1 B2 s(s + 1)(s + 2) ∞
ζ(s) = + + s− B3 (t − t)t−s−3 dt
s−1 2 2 6 1
converges for (s) < −1. Integrating by parts three times and using Theorem
3.4, we find that the value of this integral is
B2 1 1
− s− − .
2 2 s−1
Hence, ∞
s(s + 1)(s + 2)
ζ(s) = − B3 (t − t)t−s−3 dt
6 0
for −2 < (s) < −1. Replace B3 (t − t) by its Fourier series (Theorem 3.6).
One can justify interchanging the integral and sum and we have
∞ ∞
s(s + 1)(s + 2) sin 2πnt −s−3
ζ(s) = − 12 t dt
6 n=1 0 (2πn)3
∞ ∞
1
= −2s(s + 1)(s + 2) y −s−3 sin y dy,
n=1
(2πn)1−s 0
Bernoulli Numbers 77
This form of the functional equation shows that the function Λ(s) on the left
side is symmetric about the critical line (s) = 1/2. It is known that ζ(s)
does not vanish for (s) ≥ 1, that it has only the trivial zeros for (s) ≤ 0
(these correspond to poles of Γ(s/2)), and that it has infinitely many zeros
in the critical strip 0 < (s) < 1. The famous Riemann hypothesis is the
conjecture that every zero of ζ(s) in the critical strip lies on the critical line.
The functional equation (3.12) shows that if s is a zero in the critical strip,
then so is 1 − s. Since by (3.11) ζ(s) is real for real s, if s is a zero, then so
is its complex conjugate. Thus the zeros on the critical line occur in complex
conjugate pairs, and (if the Riemann hypothesis is false) those in the critical
strip off the critical line occur in quadruples as the vertices of rectangles in
the complex plane.
REMARK 3.1 The zeros of the ζ-function are connected with the dis-
tribution of prime numbers.
x Let π(x) denote the number of primes ≤ x
and define li(x) = 0 (1/ log t)dt. The prime number theorem states that
π(x) ∼ li(x). If Θ is the least upper bound of the real parts of the zeros of
ζ(s), so that 1/2 ≤ Θ ≤ 1, then |π(x) − li(x)| < cxΘ log x, where c is a certain
positive constant. The Riemann hypothesis is the statement Θ = 1/2.
as a function of the complex variable s, defined originally for (s) > 1, has a
meromorphic continuation to the whole s plane which is regular everywhere
except at s = 1 and 0, where it has simple poles with residue 1, and it satisfies
the relation
Λ(s) = Λ(1 − s). (3.14)
namely,
1 z
ϑ − = ϑ(z);
z i
the latter being a consequence of the Poisson summation formula.
Riemann used the third proof to obtain the approximate location of a few
zeros of ζ(s) on the line (s) = 12 . The same line of ideas was subsequently
developed by Siegel into a powerful technique (the Riemann-Siegel formula)
capable of yielding the true location of the complex zeros of ζ(s) high up on
the critical line (s) = 12 .
The argument we present here is that of the second proof. Its main merit is
that it quickly provides the relation between the values ζ(1 − 2k), for positive
integers k, and the Bernoulli numbers B2k . It has the added feature that
it generalizes to give the functional equation for the Dirichlet L-functions
L(s, χ), as well as to the relation between the values of L(s, χ) at the negative
integers and the generalized Bernoulli numbers, which we will treat in the
next section.
PROOF We begin with the well-known integral formula for the Gamma
function,
∞ ∞
dx dy
e−x xs e−ny (ny)s
def
Γ(z) = = , (x → ny),
0 x 0 y
Bernoulli Numbers 79
valid for (s) > 0, and which we can write in the form
∞
1 dy
Γ(s) · s = e−ny y s . (3.15)
n 0 y
Using the absolute convergence of ζ(s) for (s) > 1 and the formula for the
sum of a geometric series, we obtain
∞
∞ ∞ ∞ s−1
1 −ny s−1 y
Γ(s) · = e y dy = y −1
dy .
n=1
ns n=1 0 0 e
Note that the inversion of the order of summation and integration is valid
because for s = σ + it we have
∞ ∞
−ny s−1
e y dy = Γ(σ)ζ(σ) ,
n=1 0
where c is a real number satisfying 0 < c < 2π. The path Cc is oriented so
that the circular arc Cc0 about the origin is traversed in the counterclockwise
direction. The profile of the path has the shape of a lollipop with an infinitely
long stick.
Imaginary axis
'$
Cc−
Real axis
Cc+
&%
Cc0
removed and with log z real-valued on the positive real axis. With this con-
vention we have for s = σ + it and z = reiϑ ,
and hence
|z s | = rσ · e−tϑ .
We note that if the path integral F (s) is convergent, which we show below,
then Cauchy’s residue theorem implies that F (s) is independent of the path
Cc chosen and is in fact a function of s alone.
By decreasing the distance between Cc+ and Cc− down to zero while keeping
them connected to Cc0 , we arrive at a new path C = C − ∪Cc0 ∪C + , in which both
C − and C + can be identified with the negative real axis, but with opposite
orientation. Using the new path C and employing the inequality
s−1
z xσ−1 e±πt x∆−1 e∆π
−x/2
e−z − 1 = ex − 1 = ex − 1 < e ,
valid for s = σ + it inside the disk |s| < ∆ and for x > x0 = x0 (∆), we
conclude that the convergence is uniform inside |s| < ∆.
Using the parametrizations
C + : z = xe−iπ , for x = −∞ to x = c
Cc0 : z = ceiϑ , for ϑ = − 2π 2π
3 to 3 ,
−
C : z = xe , for x = c to x = −∞,
iπ
and setting
1
g(z) = ,
e−z − 1
we get
∞
2πiF (s) = − xs−1 e−sπi g(−x) dx
c
π
+ cs esϑi g(ceiϑ )i dϑ
−π
∞
+ xs−1 esπi g(−x) dx,
c
or equivalently,
∞ π
s−1 cs
πF (s) = sin sπ x g(−x) dx + esϑi g(ceiϑ ) dϑ.
c 2 −π
Let F1 (s, c) and F2 (s, c) denote the first and second terms in this sum, so that
πF (s) = F1 (s, c) + F2 (s, c).
Bernoulli Numbers 81
Since zg(z) is regular in |z| < 2π, we have |zg(z)| < A1 for |z| < π and a
constant A1 > 0. Therefore, if c < π, then
cσ π −tϑ A1
|F2 (s, c)| ≤ e dϑ ≤ πA1 e|t|π rσ−1 . (3.16)
2 −π c
is bounded. Thus we can bound the total contribution to FN (s) along the
boundary of C(N ) by an expression which is
Rσ−1 e|t|π
≤ (2πR) · A2 ≤ Rσ e|t|π A2 ,
2π
N
FN (s) = (2πin)s−1 + (−2πin)s−1
n=1
πs s−1
N
= 2(2π)s−1 sin n .
2 n=1
We thus obtain
sπ
F (s) = lim FN (s) = 2(2π)s−1 sin ζ(1 − s) ,
N →∞ 2
(1 − s) = 1 − σ > 1.
Hence,
sin 2πs ζ(1 − s)
ζ(s) = (2π)s 2
· .
sin πs Γ(s)
This then yields the functional equation for ζ(s) in its standard form
s
−s/2 −(1−s)/2 1−s
π Γ ζ(s) = π Γ ζ(1 − s),
2 2
χ : Z −→ C
L(s, χ) = 0
Example 3.8
The principal character χ0 is of course associated to the Riemann zeta function,
X∞
1
L(s, χ0 ) = s
= ζ(s).
n=1
n
Example 3.9
The Gauss character χg is defined by
j
(−1)(a−1)/2 if a is odd,
χg (a) =
0 if a is even.
def √
Z[i] = {a + bi : a, b ∈ Z} where i = −1.
X∞
χg (n) X∞
(−1)m
L(s, χg ) = = .
n=1
ns m=0
(2m + 1)s
Bernoulli Numbers 85
We shall see in the next section that there is a vast generalization of the Leibnitz
evaluation which applies to all Dirichlet L-functions. This development will lead
to the generalized Bernoulli numbers.
Example 3.10
The Eisenstein character χe is defined by
8
<1 if a ≡ 1 (mod 3),
χe (a) = −1 if a ≡ 2 (mod 3),
:
0 if a ≡ 0 (mod 3).
This character has conductor fχe = 3 and satisfies χe (−a) = −χe (a). The
associated Dirichlet L-function is given by
X∞
χe (n)
L(s, χe ) = , (s) > 1.
n=1
ns
Example 3.11
def
The product character χge = χg ·χe is an even function of conductor fχge = 12.
The associated Dirichlet L-function,
X∞
χge (n)
L(s, χge ) = , (s) > 1,
n=1
ns
completes the set of four L-functions related to the characters of the group
(Z/12Z)× .
1
f
W (χ) = i−δ χ(a)e2πia/f ,
fχ a=1
and has absolute value 1.
PROOF Begin with the integral formula (3.15). Multiply both sides by
χ(n) and sum over n = 1, 2, 3, . . .. From the periodicity of χ, we obtain a
new formula ∞
Γ(s)L(s, χ) = gχ (y)y s−1 dy ,
0
with
f
e−ay
gχ (y) = χ(a) · , where y > 0.
a=1
1 − e−f y
Letting
f
χ(a)zeaz
fχ (z) = ,
a=1
ef z − 1
we obtain fχ (−y) = ygχ (y). Both fχ and gχ are meromorphic functions of z
with poles at the points 2πim/f for m ∈ Z.
Using the same conventions as in the proof of Theorem 3.12 concerning the
branch of log z under consideration and the corresponding value for z s , we are
lead to consider the path integral
dz
Fχ (s) = fχ (z)z s−1 ,
Cc z
Bernoulli Numbers 87
where the path Cc is the same one as in the proof of Theorem 3.12, but with the
proviso that 0 < c < 2π/f . Exactly the same deformation and convergence
arguments as before lead to the integral representations
∞
2πiFχ (s) = − xs−1 e−sπi gχ (−x) dx
c
π
+ cs esϑi gχ (ceiϑ )i dϑ
−π
∞
+ xs−1 esπi gχ (−x) dx,
c
or equivalently,
∞ π
cs
πFχ (s) = sin sπ xs−1 gχ (−x) dx + esϑi gχ (ceiϑ ) dϑ. (3.17)
c 2 −π
Since zgχ (z) is regular in |z| < 2π/f and since |zgχ (z)| < A1 (χ), a constant
depending only on χ, for |z| < 2π/f , the same limiting procedure as was used
in the case of ζ(s) applied to the second term on the right side of (3.17) yields
the integral representation
∞
dy
πFχ (s) = − sin sπ fχ (y)y s−1 = − sin sπΓ(s)L(s, χ).
0 y
If
sf is a positive integer, s > 1 and χ is not the principal character χ0 , so that
a=1 χ(a) = 0, we see that the expression
χ(a)e−az z s−1
gχ (z) =
1 − e−f z
is single-valued and regular at z = 0 and therefore Fχ (1) = Fχ (2) = Fχ (3) =
· · · = 0. This proves that when χ is not the principal character, L(s, χ) has a
holomorphic extension to the whole s plane.
To obtain the functional equation, we suppose that (s) = σ < 0 and
consider the integral
1
Fχ,N = gχ (z)z s−1 dz ,
2πi Cχ (N )
i−δ · Gχ
W (χ) = √
f
and use the Gamma identity
(2π)s π −(1−s−δ)/2 Γ 1−s−δ
= 2
,
2Γ(s) cos π(s − δ)/2 π −(s+δ)/2 Γ s+δ
2
a result that can be derived from π/sin sπ = Γ(s)Γ(1 − s) and the duplication
formula
2s−1 1
2 Γ(s)Γ s + = π 1/2 Γ(2s),
2
we arrive at the standard form of the functional equation: If
s+δ
Λ(s, χ) = π −(s+δ)/2 Γ L(s, χ),
2
then
Λ(s, χ) = W (χ) · fχ(1/2)−s · Λ(1 − s, χ−1 ).
This completes the proof of the functional equation for L(s, χ).
Bn,χ
L(1 − n, χ) = − .
n
∞
Bn,χ 1 zm
fχ (z)z −n−1 = · + Bn,χ · .
n! z m=0
m
m=n
Thus we have that the residue of fχ (z)z −n−1 at z = 0 is Bn,χ /n!. Since
Γ(n) = (n − 1)!, the formula at the end of the previous section gives the result
claimed.
It is a pleasant exercise to prove that with χ(−1) = (−1)δ , where δ ∈ {0, 1},
we have
k−1
Sn (k) = in
i=1
and use it to derive a recursion formula for B2m . This formula leads to a
simple proof of Theorem 3.7 and the fact that |B2n | increases for n ≥ 4. This
section is based on Uspensky and Heaslet [113], pp. 254 ff.
We will add the numbers in the following k −1 by k −1 table in two different
ways.
Bernoulli Numbers 91
1n 2n 3n ··· 1n (k − 1)n
2n 4n 6n ··· 2n (k − 1)n
3n 6n 9n ··· 3n (k − 1)n
.. .. .. .. ..
. . . . .
The sum of the numbers in row i is in Sn (k). Adding all the rows gives the
grand total (Sn (k))2 . The sum of the numbers in the i-th L-shaped region is
Therefore,
k−1
(Sn (k))2 = 2in Sn (i) + i2n
i=1
⎛ ⎛ ⎞ ⎞
k−1 n n
⎝ 2i ⎝ n + 1
= Bj in+1−j ⎠ + i2n ⎠
i=1
n + 1 j=0
j
⎛ ⎞
2i2n+1
k−1
1 n + 1
n−1
= ⎝ + 2B1 i2n + i2n + Bj+1 i2n−j ⎠
i=1
n + 1 n + 1 j=1
j + 1
2 n Bj+1
n−1
= S2n+1 (k) + 2 S2n−j (k),
n+1 j=1
j j+1
where we have used Theorem 3.3 and B1 = −1/2. Both sides are polynomials
of degree 2n + 2 in k. Since they agree at every positive integer k, they must
be the same polynomial. Equate the coefficients of k 2 in each polynomial.
The coefficient of k 2 in (Sn (k))2 is
2
1 n+1
Bn = Bn2 .
n+1 n
2
The coefficient of k 2 in n+1 S2n+1 (k) is
2 1 2n + 2 2n + 1
B2n = B2n .
n + 1 2n + 2 2n n+1
92 Sums of Squares of Integers
2n + 1 n 2n − j
n−2
B2n + Bj+1 B2n−j−1 = −nBn2 .
n+1 j=1
j j + 1
for odd n ≥ 3.
We know that B2 = 16 > 0 and B4 = − 30 1
< 0. Use complete induction
on n. Suppose that (−1) B2s > 0 for 1 ≤ s < n. Then the two equations
s−1
displayed above show that B2n > 0 for odd n and B2n < 0 for even n. Hence
(−1)n−1 B2n > 0 for every positive integer n. The equations also show that
2n − 1 n(n + 1)
|B2n | ≥ |B2n−2 |
2n + 1 12
for n ≥ 2, so that |B2n | > |B2n−2 | for n ≥ 4, as was proved in Section 3.3.
which we write as
m−1
1 m + 1 m−j
pBm = Sm (p) − p pBj . (3.18)
j=0
m+1 j
Therefore, pm−j /(m−j+1)! and also (3.19) are p-integers and are ≡ 0 (mod p).
Thus, (3.18) becomes
p−1
p−1
Sm (p) = nm ≡ 1 = p − 1 ≡ −1 (mod p).
n=1 n=1
p−2
g (p−1)m − 1
Sm (p) ≡ g rm = ≡ 0 (mod p)
r=0
gm − 1
because g m ≡ 1 (mod p), while g p−1 ≡ 1 (mod p). This proves (3.21).
The congruences (3.20) and (3.21) show that if (p − 1) m, then pBm ≡
0 (mod p), so that Bm is a p-integer, and if (p − 1)|m, then pBm is a p-integer
satisfying pBm ≡ −1 (mod p).
COROLLARY 3.3
If m is an even
" positive integer, the denominator Qm of Bm is square-free.
In fact, Qm = (p−1)|m, p prime p.
COROLLARY 3.4
If m is an even positive integer, then there is an integer Im , such that
1
Bm = Im − .
p
(p−1)|m, p prime
If (p − 1)|m, then Bm + 1
p = (pBm + 1)/p is a p-integer for that prime p. It
follows that Im is a p-integer for every prime p. Hence, Im is an integer.
REMARK 3.2 The sum in Corollary 3.4 is less than a constant times
log log m, so for large even m, the integer Im is about as large as Bm . Corollary
3.4 determines the fractional part Bm − Bm of Bm for even m. It is easy
for large m to compute ζ(m) to many decimal places. If we use this value and
Theorem 3.8 to compute Bm with an error less than 12 , then we can find Bm
exactly because we know the fractional part from Corollary 3.4. This method
has been used to produce some tables of Bernoulli numbers.
COROLLARY 3.5
If p is an odd prime and m is an even integer satisfying 1 < m ≤ p − 1, then
or
n−1 # sa $
(am − 1)Sm (n) ≡ mnam−1 sm−1 (mod n2 ). (3.25)
s=1
n
n−1 # sa $
nPm (am − 1) ≡ Qm (am − 1)Sm (n) ≡ mnam−1 Qm sm−1 (mod n2 ).
s=1
n
REMARK 3.3 Closer examination of the proofs of (3.23) and (3.25) when
n = p is prime, (p − 1) m, and pe is the highest power of p that divides m,
shows that in this situation the congruences hold modulo pe+2 , so that we
may divide by m in Voronoi’s theorem to obtain
p−1 !
Bm m−1 sa
(a − 1)
m
≡a m−1
s (mod p). (3.26)
m s=1
p
It is this version of Voronoi’s theorem that is used in Corollaries 3.8 and 3.9.
COROLLARY 3.6
If p is prime, m is even, pe divides m and p does not divide Qm , that is, p − 1
does not divide m, then pe divides Pm .
COROLLARY 3.7
If a is an integer and k is a positive integer, then
ak+1 a2k − 1 B2k
2k
is an integer.
2k
ak+1 a − 1 P2k ak+1 a2k − 1 B2k
· =
m 2k
is an integer.
We return briefly to analysis for an application of Corollary 3.7. For |z| < 2π
we have
∞ ∞
z ez/2 + e−z/2 z z z zn z 2n
· z/2 = + = + Bn = B2n .
2 e − e−z/2 2 ez − 1 2 n=0 n! n=0
(2n)!
where Tn = (−1)n−1 22n (22n − 1)B2n /2n. The “tangent coefficients” Tn are
positive by Theorem 3.7 and integers by Corollary 3.7 with a = 2.
The next corollary is due to Vandiver [115].
COROLLARY 3.8
If p is an odd prime, k and a are positive integers, a ≥ 2, (p − 1) 2k and p a,
then
B2k jp/a
a−1
1 − a2k ≡ a2k−1 s2k−1 (mod p).
2k j=1 s=1
jp/a
a−1
≡ a2k−1 s2k−1 (mod p).
j=1 s=1
100 Sums of Squares of Integers
COROLLARY 3.9
If (p − 1) 2k, then
B2k
3p−2k + 4p−2k − 6p−2k − 1 ≡ s2k−1 (mod p).
4k
p/6<s<p/4
B2k jp/a
a−1
ap−2k 1 − a2k ≡ s2k−1 (mod p).
2k j=1 s=1
Write
B2k B2k
f (a) = ap−2k 1 − a2k ≡ ap−2k − a (mod p)
2k 2k
2k−1
and Sn = s , where the sum is over all integers s between np/12 and
(n + 1)p/12. With a = 3, 4 and 6, we find these congruence modulo p:
Adding the first two congruences and subtracting the third one yields
B2k
3p−2k + 4p−2k − 6p−2k − 1 ≡ S2 − S9 (mod p).
2k
But
S 2 − S9 ≡ s2k−1 − s2k−1 (mod p)
p/6<s<p/4 3p/4<s<5p/6
≡ s2k−1 − (p − s)2k−1 (mod p)
p/6<s<p/4
≡2 s2k−1 (mod p),
p/6<s<p/4
interval the sum on the right side is evaluated modulo p. If it does not vanish
(as usually happens), then p does not divide P2k . Otherwise, one computes
the coefficient of B2k /4k on the left side. If this factor is not a multiple of p,
then p is irregular and p|P2k . Another test (a slower one) must be used if p
divides the coefficient. Some such alternative tests are provided by variations
on Corollary 3.9 derived from Corollary 3.8 by choosing different values of
a. See Exercise 20 for some examples. See also page 268 of [113]. Also see
Buhler et al. [13] for even faster ways to find irregular primes.
COROLLARY 3.10
If p ≡ 3 (mod 4) is prime, then
(p−1)/2
2 B(p+1)/2 s
2− ≡− (mod p), (3.27)
p (p + 1)/2 s=1
p
REMARK 3.6 The sum on the right side of (3.27) is not divisible by
p since it is the sum of an odd number of terms ±1 and has absolute value
less than p. Hence, p does not divide P(p+1)/2 when p ≡ 3 (mod 4). This is
the only known case of p P2k with 2 ≤ 2k ≤ p − 3, k = k(p), that holds for
infinitely many primes p.
pn+e
−1 !
at
(a − 1) Pm ≡ mQm a
m m−1
t m−1
(mod pn+e ).
t=1
pn+e
B2k+s(p−1) −1 ta !
pn+e
a 2k
−1 ≡a2k−1
t2k+s(p−1)−1 (mod pn ).
2k + s(p − 1) t=1
p n+e
n
Multiply by (−1)s s and add from s = 0 to s = n. We get
n
n B2k+s(p−1)
a 2k
−1 (−1) s
s=0
s 2k + s(p − 1)
pn+e
−1 !
n
ta n (p−1)s
≡ a 2k−1
t 2k−1
(−1) s
t
t=1
pn+e s=0
s
pn+e
−1 ! n
ta
≡ a 2k−1
t2k−1 1 − t(p−1) (mod pn ).
t=1
pn+e
B2k B2k+p−1
≡ (mod p), (3.28)
2k 2k + p − 1
REMARK 3.7 The close examination of the terms (3.19) and (3.24) in
the proofs of the von Staudt-Clausen theorem and Voronoi’s congruence may
be avoided by the use of p-adic numbers. Wells Johnson [61] gave p-adic
proofs of Theorems 3.2, 3.15, 3.16 and 3.17 as well as most of the corollaries
of Theorems 3.15 and 3.16.
LEMMA 3.1
An odd prime is irregular if and only if it is a proper divisor of some Bernoulli
number.
B2k B2k
≡ ≡ 0 (mod ).
2k 2k
|Btm | |Ntm |
= > 1.
tm Dtm
Now we prove Jensen’s theorem, which he proved before Carlitz gave the
short proof of Theorem 3.18.
B2q B2 1
≡ = (mod pi ).
2q 2 12
Hence P2q ≡ 3 (mod 4) and there must be a prime ≡ 3 (mod 4), such that
|P2q . Clearly, q, and = pi because pi |Q2q , so |N2q and is irregular.
LEMMA 3.2
If p is an odd prime and k ≡ 1 (mod p(p − 1)), then
p
S2k (p) ≡ (mod p2 ).
6
Bernoulli Numbers 105
d1 , d2 , ..., dr ,
2d1 , 2d2 , ..., 2dr ,
(3.31)
d1 η, d2 η, ..., dr η,
2d1 η, 2d2 η, ..., 2dr η.
When 1 is added to each member of (3.31), the primes in the first two rows are
precisely those dividing Q2µ . The di η + 1 are even and 2di η + 1 ≡ 0 (mod 2i )
by (3.30). Therefore Q2q = Q2µ , and hence Q2q /6 ≡ (mod p).
106 Sums of Squares of Integers
COROLLARY 3.11
If t > 2 is an integer, then there are infinitely many irregular primes not of
the form mt + 1.
PROOF The result follows from Theorem 3.20 if t has an odd prime factor
and from Theorem 3.19 if t is a power of 2.
Example 3.12
For t = 3 and t = 6 we have φ(t) = 2 and Corollary 3.11 tells us that there are
infinitely many irregular primes of each of the forms 3m + 2 and 6m + 5. Of
course, there are also infinitely many irregular primes of the form 4m + 3 by
Theorem 3.19.
Example 3.13
The even integers have asymptotic density 1/2, as do the odd integers. The
prime numbers have asymptotic density 0. Exercise 19 of Chapter 2 shows that
the asymptotic density of the set of integers representable as a sum of three
squares is 5/6.
For positive integers j let d2j be the asymptotic density (if the limit exists)
of the set of positive integers k for which B2k has the same fractional part as
B2j . It turns out that the limits do exist and that d2j is approximately the
height of the vertical jump of the graph at z = the fractional part of B2j . For
example, the graph shows that about one-sixth of the Bernoulli numbers B2j ,
j ≥ 1, have fractional part 1/6.
108 Sums of Squares of Integers
Fx (z)
0
0 1 z 1
6
The graph appears to be horizontal in places, for example from 1/6 to about
0.3. This observation suggests that some intervals contain the fractional part
of no Bernoulli number. We will prove now that this is not so. We will show
that the fractional parts of the Bernoulli numbers are dense in the unit interval
(0, 1).
For positive integers j, let S2j = p−1|2j 1/p. By Corollary 3.3, the de-
nominator of B2j is greater than 1 when j is a positive integer. Is follows from
this fact and the remarks above that the fractional part of B2j is never 0, and
it equals 1− the fractional part of S2j when j is a positive integer. Hence it
suffices to show that the fractional parts of the sums S2j , j ≥ 1, are dense in
(0, 1). This we will show in Theorem 3.21 below. The proof of this theorem
requires three lemmas, which are interesting in their own right.
The first lemma counts the number of primes p, for which p − 1 is square-
free, lying in an arithmetic progression. In it, µ(n) is the Möbius function.
It is multiplicative, µ(1) = 1, µ(p) = −1 if p is prime, and µ(k) = 0 if k is
divisible by a square > 1.
Bernoulli Numbers 109
LEMMA 3.3
For all A > 1, if k and b are positive integers with gcd(b, k) = gcd(b−1, k) = 1
and k ≤ (log x)A , then
x
µ (p − 1) = c(k) li(x) + O
2
,
(log x)A
p≤x
p≡b (mod k)
where
1 1
c(k) = 1− . (3.33)
φ(k) p(p − 1)
pk
PROOF Note that µ2 (a) = d2 |a µ(a). To see this, observe that we may
always write a = st2 with s square-free. Thus a is square-free if and only if
t = 1, so we have µ2 (a) = d|t µ(d), which is 1 if t = 1 and 0 if t > 1. Since
d|t if and only if d2 |a, the identity follows.
Using this identity with a = p − 1, we have
µ2 (p − 1) = µ(d) = µ(d) 1.
√
p≤x p≤x d2 |(p−1) d≤ x p≤x, p≡b (mod k)
p≡b (mod k) p≡b (mod k) p≡1 (mod d2 )
The next lemma gives the principle of “partial summation,” also called
Abel’s summation formula. The lemma and its proof are from Hardy and
Wright [52] (Theorem 421 in Section 22.5).
LEMMA 3.4
Suppose c1 , c2 , . . . is a sequence of real numbers. For real x ≥ 1, let C(x) =
1≤i≤x ci . Suppose the real-valued function f (t) has a continuous derivative
for t ≥ 1. Then for x ≥ 1,
x
cn f (n) = C(x)f (x) − C(t)f (t) dt.
1≤n≤x 1
N
cn f (n) = cn f (n)
1≤n≤x n=1
N
= C(1)f (1) + (C(n) − C(n − 1))f (n)
n=2
N −1
= C(n)(f (n) − f (n + 1)) + C(N )f (N ).
n=1
Finally, x
C(N )f (N ) = C(x)f (x) − C(t)f (t) dt,
N
Bernoulli Numbers 111
LEMMA 3.5
For all integers k > 2 and b satisfying 1 ≤ b < k, gcd(k, b) = 1 and gcd(k, b −
1) = 1, we have
1 1
µ2 (p − 1) = c(k) log log x + K + O , (3.34)
p log x
p≤x
p≡b (mod k)
PROOF Lemma 3.5 follows from Lemma 3.3 by partial summation. Let k
and b be fixed. We will apply Lemma 3.4 with f (x) = 1/x and ci the sequence
defined by ci = 1 if i is prime, i ≡ b (mod k) and i − 1 is square-free,
and
ci = 0 if i does not satisfy thesethree conditions. Write C(x) = 1≤i≤x ci .
Since c1 = 0, we have C(x) = 2≤i≤x ci . Note that Lemma 3.3 estimates
C(x). Lemma 3.4 gives
x
1 C(x) C(t)
µ (p − 1) =
2
cn f (n) = + dt.
p x 2 t2
p≤x 1≤n≤x
p≡b (mod k)
The O term here is O((log x)−A+1 ), which is O((log x)−1 ). If we write li(t) =
t/ log t + O(t/(log t)2 ), we find that the first integral is
x x x
c(k) li(t) dt dt
dt = c(k) + c(k) O .
2 t2 2 t log t 2 t(log t)
2
The first integral on the right side gives c(k) log log x + K1 . The integral in
the O term may be evaluated as
∞ ∞
dt 1 1
− = +O .
2 x t(log t)2 log 2 log x
1 1
k
1
S(As ) − S(As−1 ) = = =
p p i=1
1 + qdi
p−1|As , but p−1=qdi
p−1As−1 for some i 1+qdi is prime
1 1 1 di
k k
σ(As−1 )
≤ = = .
q i=1 di q i=1 As−1 qAs−1
By Theorem 323 of [52], σ(As−1 )/As−1 < c1 log log As−1 for some positive
constant c1 . By Theorem 414 of [52],
COROLLARY 3.12
The fractional parts of the Bernoulli numbers (with even subscripts) are dense
in the unit interval (0, 1).
Erdős and Wagstaff [29] proved the following theorem, which explains many
properties of the graph above.
Bernoulli Numbers 113
3.10 Exercises
The first thirteen exercises deal with the Euler numbers En and Euler poly-
nomials En (x), which are defined by
∞
2ext tn n 1
= En (x) and En = 2 En .
et + 1 n=0 n! 2
5. For all real numbers x and positive integers n, prove En (x) = nEn−1 (x).
m
An (m) = (−1)m−k k n = mn − (m − 1)n + · · · + (−1)m−1 1n .
k=1
7. Prove these Fourier series expansions for the Euler polynomials. Show
that for every positive integer m and for all real noninteger t we have
∞
cos(2j + 1)πt
E2m−1 (t − t) = (−1)m 4(2m − 1)! ,
j=0
((2j + 1)π)2m
∞
sin(2j + 1)πt
E2m (t − t) = (−1)m 4(2m)! .
j=0
((2j + 1)π)2m+1
18. If m > 1, show that 2B2m ≡ 1 (mod 4), and if m > 2, then 2B2m ≡
3 − (−1)m 2 (mod 8).
19. Prove that if m, a and b are positive integers and gcd(a, b) = 1, then
(a2m − 1)(b2m − 1)B2m /2m is an integer.
20. Let p > 3 be a prime and k be a positive integer, such that (p − 1) 2k.
Prove that
p−2k B2k
2 + 3p−2k − 4p−2k − 1 ≡ s2k−1 (mod p),
4k
p/4<s<p/3
21. When p and k are fixed, x and y are real numbers, and a, b, c are integers,
let S(x, y) = xp<s<yp s2k−1 and C(a, b, c) = ap−2k + bp−2k − cp−2k − 1.
Then the congruences of Exercise 20 become
and
These congruences have about p/12 and p/10 total terms in their sums,
respectively. Prove that
Exploit the fact that the endpoints of the second summation interval
are exactly twice those of the first summation interval and show that
C(2, 5, 6)B2k /4k ≡ (1 + 22k−1 )S(1/6, 1/5) − 22k−1 S(3/10, 1/3) (mod p),
with only about p/15 total terms in the sums. Can you find similar
congruences for B2k with fewer terms in the sums?
22. In case p is prime, 12 ≤ 2k < p − 2 and 2p−2k ≡ 3p−2k ≡ 5p−2k ≡
1 (mod p), the coefficients C(a, b, c) of all four congruences mentioned
in the previous exercises vanish, and they cannot be used to compute
B2k mod p. In this situation, show that the coefficient of B2k in the
congruence
(22k−1 + 32k−1 + 62k−1 − 1)B2k /4k ≡ (p − 6s)2k−1 (mod p2 )
0<s<p/6
24. Prove this result of Carlitz: If pe (p − 1)|2k for some e > 1, then pe
divides the numerator of B2k + 1p − 1.
25. Without using Theorem 3.22, prove that infinitely many Bernoulli num-
bers have denominator Qn = 30.
26. Show that if p and q are distinct odd primes, then Qp−1 = Qq−1 .
27. Let p be an irregular prime and p|Pm , where m is even and 2 ≤ m ≤
p − 3. Then p|Nm+(p−1)s for every nonnegative integer s. Define at
by Nm+(p−1)t ≡ pat (mod p2 ) and 0 ≤ at < p. Let n = m + (p − 1)t
for some nonnegative integer t. Prove that p2 |Nn if and only if −a0 ≡
(a1 − a0 )t (mod p). Prove that when a0 = 0 and a1 = a0 , there are
infinitely many even positive integers n, such that p n and p2 |Pn .
28. Let Q(x) denote the number of square-free positive integers ≤ x. It is
well known (see Theorem 333 of [52]) that
6x √
Q(x) = 2
+ O( x).
π
Prove that 1 6
= 2 log x + O(1).
n π
1≤n≤x
n square-free
Chapter 4
Examples of Modular
Forms
4.1 Introduction
The main theme of this chapter is to exhibit the close relation between the
higher reciprocity laws of number theory and the classical theory of modular
forms. The simplest such relation is perhaps noticed in connection with the
quadratic reciprocity law: let f (x) = ax2 + bx + c be a quadratic polynomial
with integer coefficients and negative discriminant d = b2 − 4ac, let χ(n) =
(d/n) be the Jacobi symbol, and let Fp be the p-element field. Then for a
prime p, which is relatively prime to d, we have the equality
It should be observed that on the left hand side we have a number that
arises from purely diophantine data; as opposed to this, we have on the right
hand side a number that can be thought of as arising from purely analytical
considerations. In fact, the number 1 + χ(p) is an eigenvalue of the operator
Tp acting on the function
⎛ ⎞
∞
1 ⎝
E(q) = L(0, χ) + χ(d)⎠ q n .
2 n=1 d|n
The Hecke operator Tp is the linear map defined on formal power series, such
as
∞
E(q) = a(n)q n ,
n=0
by
∞
∞
def n
(E|Tp )(q) = a(np)q + χ(p) a(n)q pn .
n=0 n=0
117
118 Sums of Squares of Integers
In this chapter we will discuss four classical examples that are intended to
serve as an indication of the basic questions treated in the theory of modular
forms and their connection with number theory.
General references for this chapter include Hardy [49], Samuel [95] and Serre
[97].
THEOREM 4.1
(a) We have the following identities:
∞
def
φ(q) = q 1 − q 8n 1 − q 16n
n=1
2
+8n2
= (−1)n q (4m+1)
m,n∈Z
2
+16n2
= (−1)m+n q (4m+1) .
m,n∈Z
PROOF We will prove Part (a) here, and postpone the proof of Part (b)
until later.
The well-known Jacobi triple product identity
∞
1 + q 2m−1 ν −1
2
ν m qm = 1 − q 2m 1 + q 2m−1 ν
m∈Z m=1
and
∞
m 2m2 +m n 2n2
(−1) q (−1) q = (1 − q n ) 1 − q 2n .
m∈Z n∈Z n=1
Now replace q by q 8 and then multiply both sides of the last two identities by
q to obtain
∞
2 2
q 1 − q 8n 1 − q 16n = (−1)n q (4m+1) +8n
n=1 m,n∈Z
2
+16n2
= (−1)m+n q (4m+1) .
m,n∈Z
the identities in the statement of Theorem 4.1(a) and Gauss’ two criteria for
the biquadratic character of 2.
X 4 − 2 ≡ 0 (mod p)
has a solution and hence four solutions if and only if a ≡ 0 (mod 2).
X 4 − 2 ≡ 0 (mod p)
has a solution and hence four solutions if and only if β ≡ 0 (mod 2).
We now raise both sides to the power (p − 1)/4 and observe that, modulo p,
2
(−1)(p−1)/4 ≡ 1, 4(p−1)/4 = 2(p−1)/2 ≡ =1
p
b
k
b 2 p
(b2 )(p−1)/4 = b(p−1)/2 ≡ ≡ = = 1,
p p p b
4a + 1 p 8b2 2
(4a+1)2·(p−1)/4 ≡ = = = = (−1)a .
p 4a + 1 4a + 1 4a + 1
Collecting these congruences we have
Gauss’ First Criterion now readily follows from the simple arithmetical fact
that the congruence
X n ≡ a (mod p)
has s = gcd(n, p − 1) solutions if a(p−1)/s ≡ 1 (mod p) and has no solutions
otherwise.
Examples of Modular Forms 121
with δ 2 |4(a + β) + 1 and δ02 |4(a − β) + 1. Now from Equation (4.6) we obtain
4(a + β) + 1 4(a − β) + 1
= x2 − 8y 2
δ2 δ02
with x odd and gcd(x, y) = 1. From the congruence
4(a + β) + 1
x2 − 2(2y)2 ≡ 0 (mod D), D= ,
δ2
we get that
2 2
−1)/8
1= = (−1)(D ,
D
or equivalently, D2 ≡ 1 (mod 16). But since δ 4 ≡ 1 (mod 16), this reduces to
(4(a + β) + 1)2 ≡ 1 (mod 16) or 8(a + β) ≡ 0 (mod 16). Therefore,
Thus we get that β ≡ 0 (mod 2) if and only if a ≡ 0 (mod 2). This proves
Gauss’ Second Criterion.
here we have used the simple fact that b2 ≡ b (mod 2). This last congruence
together with Congruence (4.7) give for a prime p ≡ 1 (mod 8),
where b, α and β are the numbers appearing in Gauss’ two criteria. It then
appears that the identities in Theorem 4.1(a) are a generalization of the equiv-
alence of Gauss’ two criteria for the biquadratic character of 2.
122 Sums of Squares of Integers
2. (Euler).
(ax2 + cy 2 )(z 2 + acw2 ) = aX 2 + cY 2 , (4.11)
where X = xz − cyw, Y = axw + zy.
Since the counting function for the number of representations of n by the form
x2 + 8y 2 with x ≡ 1 (mod 4) is multiplicative, we see that all representations
of n are obtained by composition from all the representations of n and n .
Now, since x ≡ x ≡ 1 (mod 4), we get from (4.12) that b(n) = x b + x b ≡
b + b (mod 2), that is
where a , b (respectively a , b ) run over all solutions of n = (4a + 1)2 + 8b
2
(respectively n = (4a + 1)2 + 8b ), and use (4.12) and (4.13), we get
2
(−1)b · (−1)b = (−1)b ,
a , b a , b a, b
where the last sum runs over all representations of n = (4a + 1)2 + 8b2 . But
this last sum is precisely a(n). This proves that a(n) is multiplicative.
p2v = ((−1)v pv )2 + 8 · 02 .
p2v = ((−1)v pv )2 + 16 · 02 .
p2v = ((−1)v pv )2 + 8 · 02 ,
and hence the sum a,b (−1)b , p2v = (4a + 1)2 + 8b2 , contains only one term
which is 1.
There remains the first case. Here pv is representable by each form
−1 2
= =v+1
d d
d|pv d|pv
4(a + β) + 1
x2 − 2(2y)2 ≡ 0 (mod D), D= .
δ2
p p
ε ≡ (mod p),
b 4a + 1
where b = 2k b , b odd. Now raising both sides to the v-th power we get
pv pv
εv ≡ (mod p)
b 4a + 1
126 Sums of Squares of Integers
and
2
εv ≡ ≡ (−1)a (mod p). (4.16)
4a + 1
Therefore we obtain
v
−1)/8
(−1)b = εv (−1)(p
and
v
−1)/8
a(pv ) = εv (−1)(p (v + 1), p ≡ 1 (mod 8).
This completes the evaluation of a(n).
In summary, we have proved this corollary.
COROLLARY 4.1
(i) a(n) = 0 if n ≡ 1 (mod 8) or if n is not of the form
n= pα(p) q 2α(q) r2α(r) s2α(s) ,
(1) (3) (5) (7)
where the product (a) runs over the prime divisors of n that are ≡ a (mod 8).
v
(ii) a(pv ) = εv (−1)(p −1)/8 (v + 1) if p ≡ 1 (mod 8) and ε ≡ 2(p−1)/4 (mod p).
(iii) a(p2v ) = (−1)v if p ≡ 3 (mod 8).
(iv) a(p2v ) = 1 if p ≡ 5 or 7 (mod 8).
Our next goal is to exhibit, within the framework of this example, the true
connection between the coefficients in the power series expansion
∞
∞
q 1 − q 8n 1 − q 16n = a(n)q n
n=1 n=1
and the higher reciprocity laws. This is accomplished in the next theorem,
which is stated without using the language of class field theory.
Then
−1
|{x ∈ Fp : x4 − 2x2 + 2 = 0}| = 1 + + a(p). (4.17)
p
Examples of Modular Forms 127
Φp : a −→ ap ,
and on the other hand we have an expression involving the eigenvalues of the
Hecke operators.
Now, since (1 + i)(1 − i) = 2 is a square in Fp , either both f1 (x) and f−1 (x)
split into two linear factors or both remain irreducible. But the polynomial
x2 − (1 + i) splits into two linear factors over Fp if and only if (1 + i) is a
square in Fp , that is, if and only if
But recall from Corollary 4.1(ii) that, for a prime p ≡ 1 (mod 8),
a4 − 4a2 − 4 = 0.
COROLLARY 4.2
Over the finite field Fp the polynomial f (x) = x4 − 2x2 + 2 is
(i) the product of four linear factors if p ≡ 1 (mod 8) and a(p) = 2,
(ii) the product of two irreducible quadratic factors if p ≡ 1 (mod 8) and
a(p) = −2,
(iii) an irreducible quartic if p ≡ 3 (mod 8),
(iv) the product of two linear factors and an irreducible quadratic factor if
p ≡ 5 (mod 8).
(v) the product of two irreducible quadratic factors if p ≡ 7 (mod 8).
Example 4.1
If A is the ring of integers of a number field, then we consider a point of Spec (A)
of degree e as lying on the intersection of e “sheets.” If we think of Spec Z[α],
where α4 − 2α2 + 2 = 0, as a “covering” of Spec Z, then the following picture
shows that part of the “covering” that lies above the primes ≤ 47.
r
r r r r r r r r r r
Spec Z[α] r r r r r r r r r r
r r r r r r r r r r
Spec Z r r r r r r r r r r r r r r r
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47
For example, above the point 11 there is a point in Spec Z[α] of degree 4; above
the point 41 there are four points in Spec Z[α] each of degree 1. From Theorem
4.4 and Corollary 4.2 we can read easily the values of a(p), in particular, a(11) =
0 and a(41) = 2.
lead him to conjecture, among other things, that the arithmetical function
τ (n) is multiplicative. This fact was subsequently established by Mordell in
1917 by a method based on the analytic and automorphic properties of ∆(q).
Mordell’s basic idea consisted in using certain linear operators, now called
Hecke operators, that had been studied extensively by Hurwitz in his theory
of modular correspondences. In retrospect, this is not very surprising and in
fact in his work on modular correspondences on the Riemann surface, which
admits the simple group of order 168 as a group of automorphisms, Hurwitz
130 Sums of Squares of Integers
PROPOSITION 4.1
The discriminant function
∞
∆(z) = q (1 − q n )24 q = exp(2πiz),
n=1
az + b
∆ = (cz + d)12 ∆(z),
cz + d
with a, b, c and d ∈ Z and ad − bc = 1.
−1
∆(z + 1) = ∆(z) and ∆ = z 12 ∆(z).
z
This is because of the following slightly more general statement.
LEMMA 4.1
The unimodular group
ab
Γ= : a, b, c, d ∈ Z, ad − bc = 1
cd
Examples of Modular Forms 131
11 0 −1
T = and S = .
01 1 0
ab
PROOF We need to show that any element σ = in Γ can be
cd
written as a word in T and S. We use induction on m = |a|. For m = 0 we
see that the condition ad − bc = 1 implies −bc = 1 and hence in this case we
may suppose without loss of generality that b = 1 and c = −1. The claim in
this case follows trivially from the matrix identity
0 1 0 −1 1 −d −1 0
= = ST −dS 2 .
−1 d 1 0 0 1 0 −1
Let us suppose then that the claim holds true for all matrices with 0 ≤ |a| ≤
m − 1 and prove it for a matrix with |a| = m > 0. We may assume also that
c > 0. For if c = 0, then a = d = ±1 and clearly either
ab ab 1b
= = = Tb
cd 0a 01
or
ab −1 b 1 −b
= = S 2 = T −b S 2 .
cd 0 −1 0 1
If c < 0, then
ab −a −b
σ= = S2
cd −c −d
and clearly −c > 0. We may suppose further that |a| ≥ c. For if |a| < c,
ab 0 −1 c d c d
σ= = =S
cd 1 0 −a −b −a −b
and then σ can be replaced by a matrix with |a| ≥ c. Now let h be such that
a = ch + r with |r| < c ≤ |a|. Then
ab a − ch b − dh r b − dh
T −h σ = T −h = = ,
cd c d − ch c d − ch
LEMMA 4.2
The Dedekind function
∞
η(z) = q 1/24 (1 − q n ) , q = exp(2πiz),
n=1
−1 z 1
2
η = η(z).
z i
which satisfies the functional equation Λ(s) = Λ(1 − s). These properties in
turn imply the following functional equation for
Φ(s) = Λ(s)Λ(s + 1)
1
= GR (s)GR (s + 1)
p prime
1 − (1 + p−1 )p−s + p−1−2s
∞
∞ −1
m
= (2π)−s Γ(s) (4.19)
m=1 n=1
(mn)s
= Φ(−s),
∞
1
at s = 0 : Φ(s) = 2
+ A0 (n)sn
2s n=0
∞
π 1
at s = 1 : Φ(s) = · + A1 (n)(s − 1)n
12 s − 1 n=0
∞
π 1
at s = −1 : Φ(s) = − · + A−1 (n)(s + 1)n .
12 s + 1 n=0
The Dirichlet series in Equation (4.19) has the same coefficients as the q-
expansion
∞
∞
F (z) = m−1 q mn
m=1 n=1
∞
∞
−1 n m
= m (q )
n=1 m=1
∞
=− log(1 − q n ).
n=1
valid for σ > 1 and (z) > 0. In the latter formula the integral is taken
along the line (s) = σ. Hecke’s method consists in moving the contour of
integration to the line (s) = −σ and making use of the bounded behavior
of Φ(s) on vertical strips away from the real axis. In doing this the poles of
the integrand as s = 0, ±1 give rise to residue contributions that are easily
calculated. Thus we obtain
−σ+i∞
2πi 1 1 z 1 z −s
F (z) = z+ − log + Φ(s) ds.
24 z 2 i 2πi −σ−i∞ i
134 Sums of Squares of Integers
Now, using the functional equation Φ(s) = Φ(−s), the last integral is seen to
be equal to F (−1/z). Therefore we obtain
1 1 z
log η − = log η(z) + log .
z 2 i
The exponentiation of this formula leads to the result stated in Lemma 4.2.
D
i
q
ρ q q
− 21 1
2
Real axis
The two main properties that a fundamental domain must satisfy are:
1. For every z ∈ H, there is a σ ∈ Γ such that σ(z) ∈ D, and
2. If z ∈ D and σ ∈ Γ are such that σ(z) ∈ D and σ is not the identity,
then either (z) = ± 12 and σ(z) = z ± 1 or |z| = 1 and σ(z) = −1/z.
A more precise definition of fundamental domain for arbitrary discrete sub-
groups of Γ will be given in the next chapter. Here we simply establish that
these two conditions are indeed satisfied by D. To verify Property 1 we ob-
ab
serve that for σ = ∈ Γ and z = x + iy ∈ H, we have
cd
y y
(σ(z)) = = . (4.20)
|cz + d|2 (cx + d)2 + (cy)2
Since c and d are integers, there are at most a finite number of matrices
∗∗
∈ Γ for which (cx + d)2 + (cy)2 is bounded from above by a given
cd
positive number. Hence for each z ∈ H, a σ ∈ Γ exists, which makes σ(z)
a maximum. Now choose an integer, such that T n σ(n) has real part between
−1/2 and 1/2. We claim that the element z = T n σ(z) belongs to D, that
Examples of Modular Forms 135
LEMMA 4.3
Suppose that f (z) is a bounded holomorphic function defined on the upper
half plane H and suppose f (z) is Γ-invariant, that is,
1
f (z + 1) = f (z) and f − = f (z), for all z ∈ H.
z
Observe that if det σ = 1 and f (σ(z)) = (cz + d)k f (z), then f ◦ [σ]k (z) = f (z).
In particular we have
ab
∆ ◦ [σ]12 (z) = ∆(z), σ= ∈ Γ.
cd
1
p−1
z+v
Tp f (z) = pk−1 f (pz) + f .
p v=0 p
then
∞
1
p−1
Tp f (z) = pk−1 f (pz) + a(n) exp(2πinz/p) exp(2πinv/p) .
n=1
p v=0
But
1
p−1
1 if p|n
exp(2πinv/p) =
p v=0 0 otherwise,
Examples of Modular Forms 137
and hence
∞
∞
Tp f (z) = a(np)q n + pk−1 a(n)q np . (4.23)
n=1 n=1
From this it follows easily that Tp f (z) is periodic of period 1, that is,
11
Tp f ◦ (z) = Tp f (z + 1)
01 k
= Tp f (z).
Suppose f ◦ [σ]k (z) = f (z) for all σ ∈ Γ, write Equation (4.22) in the form
p−1
1 z 1v
Tp f (z) = p k−1
f (pz) + f +p k/2−1
f◦ (z),
p p 0p k
v=1
and consider the effect of the transformation S : z −→ −1/z on the function
Tp f (z)
1 z
Tp ◦ [S]k f (z) = pk−1 f (pz) ◦ [S]k + f ◦ [S]k + (4.24)
p p
p−1
1v
p k/2−1
f◦ ◦ [S](z).
0p k
v=1
k
Since f (−1/z) = z f (z), we see that the first two terms in the right-hand side
of Equation (4.24) simply get permuted, that is,
1 z 1 z
pk−1 f (pz) ◦ [S]k + f ◦ [S]k = f + pk−1 f (pz).
p p p p
The rest of the sum in Equation (4.24) also remains invariant. To see this we
observe that
1v 1v
f◦ [S](z) = f ◦ · S (z).
0p k 0p k
But if v = 0, we can find a w = 0 with up − vw = 1 and
1v 1w
· S = SA−1
0p 0 p
The observations applied to the function f (z) = ∆(z) with k = 12 give that
the function g(z) = Tp ∆(z)/∆(z) satisfies
g(σ(z)) = g(z), σ ∈ Γ,
is nonzero and bounded in the finite part of the plane, we can apply Lemma
4.3 to the function
τ (p)q + τ (2p)q 2 + · · ·
g(z) =
q + τ (2)q 2 + · · ·
= τ (p) + (∗)q + · · · .
Tp ∆(z) = τ (p)∆(z).
and
Let
Uλ = τ (pλ n) − τ (pλ )τ (n)
and observe that Equation (4.25) gives the difference equation
(1 − τ (p)T + p11 T 2 )f (T ) = 1
with ∞
f (T ) = τ (pv )T v .
v=0
Then
(a) τ (n) ≡ b(n) (mod 23), and
(b) |{x ∈ Fp : x3 − x − 1 = 0}| = 1 + b(p) for prime p = 23.
REMARK 4.6 It will be seen below that if the Legendre symbol (−23/p) =
−1, then x3 − x − 1 = 0 has one and only one solution in the finite field Fp .
From Part (b) of the theorem we then recover the classical statement that
τ (p) ≡ 0 (mod 23) for
p ≡ 5, 7, 10, 11, 14, 15, 17, 19, 20, 21, 22 (mod 23).
The proof of Theorem 4.6 will proceed in several steps. First we will eval-
uate explicitly the coefficients b(p) in terms of the behavior of the prime p in
140 Sums of Squares of Integers
√
the quadratic extension Q( −23). We shall then examine the factorization of
the polynomial f (x) = x3 − x − 1 = 0. Finally we relate these factorizations
to the computation of the Frobenius conjugacy classes in the splitting exten-
sion of f (x). The end result will be the identification of the Dirichlet series
∞ −s
n=1 b(n)n with the Artin L-function L(s, ρ) associated with the unique
two-dimensional representation of the Galois group of the splitting extension
of f (x).
Let us first observe that Theorem 4.6(a), which is equivalent to the formal
congruence
∞
∞
q (1 − q n )24 ≡ q (1 − q n ) 1 − q 23n (mod 23),
n=1 n=1
LEMMA 4.4
We have
∞
∞
(1 − q n ) = a(n)q n , (4.26)
n=1 n=0
where ⎧
⎨1 if n = 0
a(n) = (−1)m if n > 0 and n = m(3m ± 1)/2
⎩
0 otherwise.
When the values a = 3/2 and b = 1/2 are substituted into this equality there
results Euler’s well-known identity
∞
(1 − q n ) = (−1)n q n(3n+1)/2
n=1 n∈Z
∞
= a(n)q n .
n=0
Examples of Modular Forms 141
where
b(n) = a(k)a(m),
and the sum runs over all solutions of n = 1 + k + 23m. Hence we have
b(n) = a(m)a(n − 1 − 23m) (4.27)
0≤m≤h
= a(n − 1) + a(1)a(n − 24) + · · · + a(h)a(n − 1 − 23h),
where h = (n − 1)/23. Now if the sum (4.27) is nonempty, then from the
definition of the a(n), there must exist integers v and w, such that
v w
n = 1 + (3v ± 1) + (3w ± 1),
2 2
or equivalently,
24n = (6v)2 ± 2(6v) + 1 + 23 (6w)2 ± 2(6w) + 1
= (6v ± 1)2 + 23(6w + 1)2 ,
which implies that the Legendre symbol 24n
23 = 1, that is,
where the sum runs over those values of m and n for which
m n
p − 1 − 23 (3m ± 1) = (3n ± 1),
2 2
that is,
24p = (6n ± 1)2 + 23(6m ± 1)2 .
To evaluate b(p) we must make a study of the character of the integers u
and v that satisfy the equation
and obtain that if p is a quadratic residue of 23, then it splits into the product
of two prime ideals
1 1
(p) = π · π , π, π = p, s − · 23 ∓ · α ,
2 2
where s is an integer that satisfies the congruence x2 + 23x + 138 ≡ 0 (mod p).
The class group CL(k) of k = Q(α) is of order 3: the principal class which
contains the principal ideals and the two classes represented by the ideal
factors of 2,
1 1 1 1
π2 = 2, + α , π2 = 2, − α .
2 2 2 2
Examples of Modular Forms 143
In fact the class group is isomorphic to the group of ideal classes of binary
quadratic forms of discriminant −23, of which a representative system is given
by the three binary quadratic forms,
x2 + xy + 6y 2 , 2x2 + xy + 3y 2 , 2x2 − xy + 3y 2 .
In the following argument we shall also need the prime ideal factors of 3,
1 1 1 1
π3 = 3, − α , π3 = 3, + α ,
2 2 2 2
where the notation has been chosen so that π2 and π3 belong to the same
ideal class.
The value of b(p) for a prime p with (−23/p) = 1 depends on whether the
prime ideal factors of p in k = Q(α) are in the principal class or not.
Case 1. (p) = π · π and π is a principal ideal, that is,
1 1
(π) = a + bα , a − b even.
2 2
In this case we have
4p = 4π · π = a2 + 23b2 ,
so that
a2 − b2 = 4(p − 6b2 ),
from which it follows that a and b must both be even, for otherwise a2 − b2 ≡
0 (mod 8). We therefore have
p = h2 + 23k 2 . (4.31)
Also, since the prime ideal decomposition (p) = π · π is unique there is only
one solution in positive integers of Equation (4.31) and
must hold. Hence in the equation 24p = (6n ± 1)2 + 23(6m ± 1)2 , m and n
are either both even or both odd. For otherwise, 6(m ± n) ≡ 0 (mod 24).
Therefore the sum in Equation (4.29) contains only two terms and for both
144 Sums of Squares of Integers
of these m + n is even. Hence b(p) = (−1)m+n = 2. We have thus shown
that if p splits in Q(α) into two principal prime ideals, then b(p) = 2.
Case 2. (p) = π · π and π is a nonprincipal ideal. In this case we choose a
root r of x2 + 23x + 138 ≡ 0 (mod p) so that the prime ideal factor of p,
1 1
π= p, 2 + · 23 + α ,
2 2
is in the same class with π2 and π3 , and also π is in the same class with π2
and π3 . Hence π2 π3 π is a principal ideal and 24p is expressible in the form
u2 + 23v 2 in only one way:
1 1
π2 π3 π = (3 + 3α), 1 − α p, r + (23 + α) (4.32)
2 2
1 1
= p · (3 + 3α), p(1 − α)(1 − α) r + (23 + α) ,
2 2
1 1
(3 + 3α) r + (23 + α)
2 2
1 1
= u + vα ,
2 2
where u and v are solutions of 24p = u2 + 23v 2 . It follows that there must
exist integers h, k, h and k , such that
1 1 1 1 1
p · (3 + 3α) = u+ vα · h + kα
2 2 2 2 2
1 1 1 1
p · (1 − α) = u+ vα · h + kα .
2 2 2 2
The second of these equalities gives
Similarly,
p(1 + α)(u − vα) = 4p(h + kα),
from which we deduce that
u = 6n + 1, v = 6m − 1 or u = 6n − 1, v = 6m + 1
Examples of Modular Forms 145
and
3n − 3m ± 1 = 2k.
But this implies that m + n is odd; now since the prime ideal factorization
(4.32) is unique, the sum in Equation (4.29) contains only one term and hence
b(p) = −1. We collect these results in the following statement.
PROPOSITION 4.2
The coefficients b(n) in the expansion
∞
∞
q (1 − q n ) 1 − q 23n = b(n)q n
n=1 n=1
In order to complete the proof of Theorem 4.6 we need to recall two basic
propositions that describe the decomposition of rational primes in algebraic
extensions. Here we simply recall their statement and basic definitions.
The group G acts as a permutation group on the set S = {Hσ1 , . . . , Hσr } via
right multiplication: for g ∈ G and Hσ ∈ S we have g : Hσ −→ Hσg. By a
cycle of length t for g ∈ G we mean a sequence
S = C1 ∪ C2 ∪ · · · ∪ Cs , length Ci = ti , 1 ≤ i ≤ s.
146 Sums of Squares of Integers
Fix a prime p which does not divide the discriminant and let χ0 be the triv-
ial character, sgn the signature character and χρ the character of the two-
dimensional representation. The first three rows in the table below give the
character table of S3 .
χ0 1 1 1
sgn 1 1 −1
χρ 2 −1 0
σp C1 C2 C3
(p) Q1 · Q2 · Q3 Q Q1 · Q2
deg Qi 1, 1, 1 3 1, 2
f (x) mod p (x − α)(x − β)(x − γ) f (x) irreducible (x − α)(x2 + ax + b)
Going up from the last row to the row of conjugacy classes is accomplished
by first applying Theorem 4.8 and then Theorem 4.7. From this table we can
read easily the following result.
COROLLARY 4.3
Let the notation be as above. For a prime p that does not divide D(f ) we
have
|{x ∈ Fp : f (x) = 0}| = 1 + χρ (σp ).
We are now ready to complete the proof of Theorem 4.6. We apply the
above table to the example f (x) = x3 − x − 1 with roots α, β, γ and
discriminant D(f ) = −23, which is also the
√ discriminant of the splitting
field Kf = Q(α, β, γ). The field k = Q( −23) is contained in Kf and
148 Sums of Squares of Integers
that satisfy similar functional equations and have Euler product expansions.
Because of the many examples that were available and perhaps because of the
then recent publication of Hamburger’s argument characterizing the Riemann
zeta function ζ(s) as the unique function that satisfies the functional equation
1
Λ(s) = π −s/2 Γ(s/2)
def
= Λ(1 − s)
p
1 − p−s
and also because of Mordell’s argument (see Theorem 4.5, which establishes
the Euler product expansion
∞
1
τ (n)n−s = ),
n=1 p
1− τ (p)p−s + p11−2s
Hecke was lead in the late 1920s to consider the problem of characterizing
all functions φ(s) of the complex variable s, which are of signature {N, γ, k}.
Here N > 0 and k > 0 are real numbers and γ is a complex number. Recall
that a function φ(s) has signature {N, γ, k} if it has these three properties:
1. We have
Hecke was very successful in solving the problem for signature {1, γ, k}, k
integral, and showing, as we will do later, that the vector space of functions
{1, γ, k} is finite dimensional and has a basis consisting of Dirichlet series with
Euler product expansions. In fact the dimension of {1, γ, k} is given by
∞
1
= dimC {1, γ, k}T k .
(1 − T )(1 − T )
4 6
k=0
and
∞
1
φE (s) = σ11 (n)n−s = ,
n=1 p
1 − σ11 (p)p−s + p11−2s
150 Sums of Squares of Integers
where σ11 (n) = d|n d11 .
The last example we want to consider is Hamburger’s theorem, which gives a
characterization of Riemann’s zeta function ζ(s). We can think of this theorem
as describing completely the modified space of functions {1, γ, 1}0, where the
subscript 0 indicates that in Property 1 above we define Λ(s) = GR (s)φ(s).
where ∞
g(1 − s) = b(n)n−(1−s) ,
n=1
the series being absolutely convergent for σ < −α, for some constant α < 0.
Then f (s) = cζ(s), where c is a constant.
LEMMA 4.5
For any real a > 0 we have
a+i∞
1
e−x = Γ(s)x−s ds.
2πi a−i∞
This last expression exhibits clearly the location of the poles and the cor-
responding residues of the function Γ(s). To complete the proof of Mellin’s
formula the line of integration is moved to the left so as to lie half way between
two poles of Γ(s). One then uses Cauchy’s formula to obtain
a+i∞
k−1 2 −k+i∞
1
1 −s (−1)n n 1
Γ(s)x ds = x + Γ(s)x−s ds.
2πi a−i∞ n=0
n! 2πi 2 −k−i∞
1
e−π|t|/2
|Γ(s)x−s | .
xk−1/2 (1 + |t|)k
The integral that defines φ(x) is now replaced by its equivalent form (4.33)
to get
2+i∞
1
φ(x) = π −(1−s)/2 Γ((1 − s)/2)g(1 − s)x−s/2 ds.
2πi 2−i∞
We want to move the line of integration from σ = 2 to σ = −1 − α. We
observe first that f (s) is bounded on the line σ = 2 and g(1 − s) is bounded
on σ = −1 − α. Now, by (4.33) and (4.34) we have
Γ(s/2)
|g(1 − s)| = π −s+1/2 f (s) |t|3/2 , for (s) = 2.
Γ((1 − s)/2)
152 Sums of Squares of Integers
The Phragmén-Lindelöf principle and Cauchy’s residue theorem give the new
formula
−α−1+i∞ m
1
φ(x) = π −(1−s)/2 Γ((1 − s)/2)g(1 − s)x−s/2 ds + Rv ,
2πi −α−1−i∞ v=1
we see that
m
m
Rv = x−sv /2 Qv (log x) = Q(x),
v=1 v=1
where the Qv (log x) are polynomials in log x. Again by using Mellin’s formula
we have
∞ −α−1+i∞
1
φ(x) = √ b(n) Γ((1 − s)/2)(πn2 x−1 )−(1−s)/2 ds + Q(x)
x n=1 −α−1−i∞
∞
2
b(n)e−πn /x + Q(x).
2
= √
x n=1
We multiply both sides by e−πt x , t > 0, and integrate over (0, ∞). We apply
2
∞ ∞
a(n) 1 e−2πnt 1 ∞
Q(x)e−πt x dx,
2
· 2 2
= b(n) · +
n=1
π t + n n=1
t 2 0
0
Examples of Modular Forms 153
Since the b’s are integers and (a) > −1, integration by parts shows that this
integral is a sum of terms of the form tα (log x)β . Hence
∞
∞
1 1
a(n) + − πtH(t) = 2π b(n)e−πnt ,
n=1
t + in t − in n=1
where H(t) is a sum of terms of the form tα (log x)β . Now, the series on
the left is a meremorphic function with poles at ±in. But the function on
the right is periodic with period i and, by analytic continuation, so is the
function on the left. Therefore the residues at ki and (k + 1)i are equal, that
is, a(k + 1) = a(k) = · · · = a(1) and
∞
f (s) = a1 n−s = a1 ζ(s).
n=1
From the functional equation (4.33) we get g(1 − s) = a1 ζ(1 − s). This
completes the proof of Hamburger’s theorem.
4.6 Exercises
1. Prove that (1 − x)24 ≡ (1 − x8 )(1 − x16 ) (mod 4) and hence obtain that
τ (n) ≡ a(n) (mod 4), where τ is the Ramanujan arithmetical function
defined by
∞
∞
q (1 − q n )24 = τ (n)q n .
n=1 n=1
1+i 1+d
= ,
p p
then
∞
1
a(pv )T v = .
v=0 1 − a(p)T + −8
p T
2
6. Show that φ(z) = η(8z)η(16z) and use the functional equation for η(z)
given in Lemma 4.2 to show that
√
−1 128
φ = zφ(z).
128z i
9. Show
⎧ ⎫
1⎨ ⎬
∞
2
+mn+6n2 2 2
q (1 − q n ) 1 − q 23n = qm − q 2m +mn+3n
.
n=1
2⎩ ⎭
m,n∈Z m,n∈Z
10. Prove that if f (x) is a monic polynomial of degree 3 with integer coeffi-
cients, and if α, β, and γ are its roots over the complex numbers, then
its discriminant
Df = (α − β)2 (β − γ)2 (γ − α)2
is a rational number, which is an integer only when the three roots α,
β, and γ are rational numbers.
Examples of Modular Forms 155
11. Prove that if the polynomial f (x) of degree 3 with integer coefficients
splits into 3 linear factors in the finite field Fp , then the discriminant
Df is a quadratic residue modulo p.
K = Fp [x]/(f (x)),
where (f (x)) is the ideal generated in Fp [x] by all the multiples of f (x),
is a cyclic extension of degree 3 of the finite field Fp . Deduce from
this that the polynomial x2 − Df in Fp [x] is reducible, that is, Df is a
quadratic residue modulo p.
Hecke’s Theory of
Modular Forms
5.1 Introduction
In this chapter we present an outline of the elementary theory of modular
forms. From the beginning we emphasize those aspects of the theory which
are closely connected with combinatorial number theory: additive vs. multi-
plicative problems. This outline is intended to serve the student as a guide to
the more advanced books on the subject, such as Shimura’s text [101]. Most
of the material presented is quite standard and can be derived with only a
good knowledge of elementary number theory and the rudiments of the theory
of functions of one complex variable. The exceptions are Sections 5.10–5.13,
where we have included material that is not readily available and can only be
found scattered in the literature. In Section 5.10 we compute the Fourier coef-
ficients of Poincaré series in terms of Bessel functions and Kloosterman sums.
In Section 5.11 we use the results of Section 5.10 and the well-known Weil
estimate for Kloosterman sums to obtain an upper bound for the Ramanujan
τ -function by a method that is applicable to the Fourier coefficients of mod-
ular forms of nonintegral weight, a case not covered by Deligne’s bound. The
Eichler-Selberg trace formula, an exact formula for the trace of Hecke opera-
tors, is described in Section 5.12. Finally, Section 5.13 is a brief description of
the connection between -adic Galois representations attached to eigenforms
of Hecke operators and the Ramanujan conjecture.
In the following, C, R, Q and Z denote respectively the field of complex
numbers, the field of real numbers, the field of rational numbers and the ring
of rational integers. Recall that (z) and (z) denote the real and imaginary
parts of the complex number z, respectively. When R is a commutative ring
with unity, we let SL(2, R) denote the set of 2 × 2 matrices with coefficients
in R and determinant 1. Two of these matrices are the identity matrix 12
(or simply 1) and its negative −12 (or simply −1). Thus SL(2, R)/{1, −1}
or SL(2, R)/{±1} denotes the set of 2 × 2 matrices with real coefficients and
157
158 Sums of Squares of Integers
is the quotient of (E4 (z))3 , the third power of the Eisenstein series E4 (z) of
weight 4 (see Section 5.4), by the Ramanujan modular form ∆(z), normalized
so that the residue at ∞ is 1. From the point of view of algebraic geometry,
more interesting objects arise when we consider arithmetically defined sub-
groups of Γ that are also of finite index in Γ. One such class of groups is the
Hecke congruence subgroups, denoted by Γ0 (N ), for each positive integer N .
The elements of Γ0 (N ) are defined by
az + b ab
z −→ σ(z) = , σ= ∈ Γ, c ≡ 0 (mod N ).
cz + d cd
The index of Γ0 (N ) in Γ is
1
µ=N 1+ .
p
p|N
1
D = {z ∈ H ; |(z)| ≤ , |z| ≥ 1}.
2
D
i
q
ρ q q
− 21 1
2
Real axis
The elliptic points of Γ are i = (−1)1/2 and ρ = (−1 + (−3)1/2 )/2. Strictly
speaking, we should only take half of the boundary of the fundamental domain,
but we avoid this here so as not to complicate the notation.
A fundamental domain D(N ) for Γ0 (N ) is easily obtained from that of Γ
by taking a left coset decomposition of Γ modulo Γ0 (N ), say
Γ = σ1 Γ0 (N ) ∪ · · · ∪ σµ Γ0 (N ),
and putting
D(N ) = σ1 D ∪ · · · ∪ σµ D,
Example 5.1
The Riemann surface X0 (11) has genus 1 and is the first example of an X0 (N )
with positive genus. The unique differential Ω of the first kind on X0 (11) is
Y
∞
` ´2 X
∞
Ω= (1 − q n )2 1 − q 11n dq = an q n−1 dq
n=1 n=1
(see Shimura [101]). Affine models for X0 (11) had been known for some time;
the following simple one was found by Tate:
y 2 − y = x3 − x2 .
where the ∗ indicates that the summation is over all integers m and n not
both zero. The Eisenstein series of even weight are the first nontrivial ex-
amples of integral modular forms for Γ. The functions of main interest from
an arithmetical point of view are the seminormalized Eisenstein series with
Fourier expansions
∞
1
Ek (z) = ζ(1 − k) + σk−1 (n)q n , q = exp(2πiz),
2 n=1
and ζ(1 − k) is the value of the Riemann zeta function at the point 1 − k.
Recall that ζ(1 − k) = 0 for k even and hence Ek (z) is not a cusp form. For
odd k ≥ 3, Gk (z) is trivially equal to zero. At present it is of considerable
interest to understand the nature of the series
∞
1 qn
Ek† (z) = − ζ (1 − k) + nk−1 log n
2 n=1
1 − qn
for k odd. The general problem of understanding the arithmetic nature of the
coefficients in the Taylor expansion of the Riemann zeta function ζ(s) about
the negative odd integers remains one of the most interesting and difficult
questions concerning special values of zeta and L-functions, with important
implications for the orders of the higher K-groups of the rings of integers of
number fields.
If we consider the evaluation map δ∞ whose value at a function f (z) ∈
Mk (Γ) is δ∞ (f ) = f (∞), we get the exact sequence
∞ δ
0 −→ Mk0 (Γ) −→ Mk (Γ) −→ C −→ 0,
This diophantine equation has no solutions for k < 0 or k odd. It has a trivial
solution for k = 0 and no solution for k = 2. For k = 4, 6, 8 and 10 it has
only one solution, as shown in this table:
k n n∞ ni nρ
4 0 0 0 1
6 0 0 1 0
8 0 0 0 2
10 0 0 1 1
From the existence of Eisenstein series for even k ≥ 4 and the vanishing of
E4 (z) at z = ρ and E6 (z) at z = i we get
For k = 12 the diophantine equation has four solutions corresponding to: the
Eisenstein series E12 (z), which is nonzero at i∞, i and ρ;
∞
∞
24
∆(z) = q (1 − q n ) = τ (n)q n , where q = exp(2πiz),
n=1 n=1
which vanishes at the cusp i∞ and which is nonzero otherwise in the upper
half plane; (E6 (z))2 , with a double zero at z = i; and (E4 (z))3 , with a triple
zero at z = ρ. These four functions span a two-dimensional vector space, a fact
which gives rise to several identities among divisor functions and coefficients
of cusp forms. Examples are given below.
The fact that ∆(z) vanishes only at infinity implies that for k > 12, multi-
plication by ∆(z) gives an isomorphism
∼
Mk−12 (Γ) −→ Mk0 (Γ)
between the space of integral modular forms of weight k − 12 and the space
of cusp forms of weight k. A simple induction argument using the above
isomorphism and the dimension of Mk (Γ) for k = 2, 4, 6, 8, 10 and 12 gives
k
+1 if k ≡ 2 (mod 12)
dimC Mk (Γ) = 12
k
12 if k ≡ 2 (mod 12).
Example 5.2
The graded ring of integral modular forms for Γ is generated by the forms
E4 (z) and E6 (z):
∞
MΓ = Q [E4 (z), E6 (z)] = Mn (Γ),
n=0
691
n
65 691
τ (n) = σ11 (n) + σ5 (n) − σ5 (m)σ5 (n − m).
756 756 3 m=1
its domain to the ring of integers in the usual way and then to Γ0 (N ) by
putting
ab
χ(γ) = χ(d) if γ = .
cd
1. f (z) is holomorphic on H,
az + b k ab
2. f cz + d = χ(d)(cz + d) f (z), for γ = in Γ0 (N ), and
cd
The space of modular forms of nebentype (k, χ) will be denoted by Mk (Γ0 (N ), χ).
When χ is trivial we get a the space of functions called by Hecke of “principal
type.” The cusp forms in Mk (Γ0 (N ), χ) are defined again as the kernel of the
evaluation map
weight 1 and nebentype χ which are the Mellin transforms of Artin L-functions
of two-dimensional representations of Galois groups. The latter representa-
tions are classified into two categories, the first corresponding to induced
representations and the second to noninduced ones. It is modular forms of
the second category, the “noninduced” ones, that are difficult to count.
r ei
2ei −1 2 −1 k(ei − 1) (k − 1)(ei − 1)
+ 1−t (1 − t ) − t2k .
i=1
e i e i
k=1
The proof of this theorem is based on Shimura’s formula for the dimension
of Mk0 (Γ), as in [101], p. 46, and Exercise 2 at the end of this chapter.
We proved these results in Section 4.3. The basic result in the classical theory
of modular forms on Γ = SL(2, Z)/{±12 } is the following theorem:
Mk = CEk ⊕ Mk0 .
The action of Tk (n) on a modular form f whose expansion about the cusp at
infinity is
∞
f (z) = a(m)q m , q = exp(2πiz),
m=0
is given by
∞
mn
Tk (n)f (z) = qn dk−1 a .
m=0
d2
d| gcd(m,n)
Hamburger [46] proved in 1921 that the Riemann zeta function is characterized
by its functional equation. One of Hecke’s main contributions to the theory
of modular forms was his generalization of Hamburger’s theorem.
Λf (s) = ik Λf (k − s).
where D is a fundamental domain for the quotient space H/Γ. The theorem
of Hecke and Petersson, Theorem 5.2, is an immediate consequence of the
identity
< Tk (n)f, g > = < f, Tk (n)g >,
which implies that the Hecke operators Tk (n) are Hermitian. The princi-
pal axis theorem applied to the Hilbert space consisting of cusp forms Mk0
metrized by the Petersson inner product < , > implies that we can select
a basis for Mk0 consisting of simultaneous eigenfunctions for the Hecke op-
erators. Henceforth we assume Mk0 is equipped with a basis made up of all
normalized eigenfunctions {fj (z) : j = 1, 2, . . . , µ}, where µ = dimC Mk0 ,
whose Fourier expansions have the form
∞
fj (z) = an (fj )q n , a1 (fj ) = 1.
n=1
The details of the proof of this important result can be found in any of the
standard expositions of Hecke’s work (see Shimura [101]). Special cases of the
above theorem had been given by Mordell, who proved that
∞
−1
φ∆ (s) = τ (n)n−s = 1 − τ (p)p−s + p11−2s ,
n=1 p
We first prove the existence of such series. For this we need the following
result.
LEMMA 5.1
The series
|mz + n|− ,
m,n∈Z
where the prime means we have omitted the term with m = n = 0, converges
uniformly on compact subsets of H whenever > 2.
PROOF For z in a compact region R of the upper half plane we define the
lattice Λ = {mz + n : m, n ∈ Z}. For each r = 0, 1, 2, . . . define a partition
of the complex plane by annular regions
Dr = {z ∈ C : r ≤ |z| < r + 1}.
We clearly have
Card (Λ ∩ Dr ) ≤ Ar,
where the constant A depends only on the compact set R. Therefore we have
∞
|Ω|− = |Ω|−
Ω∈Λ, Ω=0 r=0 Ω∈Dr ∩Λ, Ω=0
∞
≤ |Ω|− + r− Card(Dr ∩ Λ)
Ω∈D0 ∩Λ, Ω=0 r=1
∞
≤ A0 (R) + A r1− < ∞
r=1
172 Sums of Squares of Integers
if > 2. Here A0 (R) is a constant that depends only on the compact set R.
This proves the lemma.
The starred entries are occupied by just one pair of integers (a, b) correspond-
ing to a solution of the diophantine equation ad − bc = 1.
A typical term in the Poincaré series corresponding to the transformation
L(z) = az+b
cz+d can be bounded by
exp(2πimL(z))(cz + d)−k = exp − 2πm(z) |cz + d|−k
|cz + d|2
≤ |cz + d|−k
for any m ≥ 0. Using the above coset representatives for Γ∞ \Γ we see that
the Poincaré series φm (z; k) is bounded by
|φm (z; k)| ≤ |mz + n|−k .
m,n∈Z
Hence Lemma 5.1 shows that φm (z; k) exists and defines a holomorphic func-
tion in the upper half plane. The delicate case of k = 2 can be handled by
introducing a convergence factor |JL (z)|S and showing that the series
exp(2πimL(z))JL (z)k/2 |JL (z)|s
L∈Γ∞ \Γ
converges for (s) > 0 and has a limit as s → 0. These considerations are not
relevant here since M20 (Γ) = 0.
Using the uniform convergence of the Poincaré series on compact subsets
of H we can check immediately the formal identity
az + b
φm ; k = (cz + d)k φm (z; k).
cz + d
To establish that the Poincaré series φm (z; k) are integral modular forms
of weight k > 2 for the modular group Γ, it remains to show that they are
holomorphic at the cusp at i∞. When m = 0 we get that φ0 (z; k) is essentially
a multiple of the Eisenstein series. Thus we consider the case m ≥ 1. Since
Γ∞ \Γ contains only one translation we can write
φm (z; k) = e2πimz + exp(2πimL(z))(cz + d)−k ,
L∈Γ∞ \Γ, c>0
Hecke’s Theory of Modular Forms 173
where the sum runs over representatives of Γ∞ \Γ with c > 0. Using the
annular regions defined above, we have
∞
e 2πimL(z) −k
(cz + d) ≤ |Ω|−k + |Ω|−k .
L∈Γ∞ \Γ, c>0 Ω∈D0 ∩Λ, Ω=0 r=1 Ω∈Dr ∩Λ
9r
Card(Dr ∩ Λ) ≤ .
(z)
lim φm (z; k) = 0.
(z)→∞
and thus lim (z)→∞ φ0 (z; k) = 1. Hence we have proved the following result.
be its Fourier expansion about the cusp at infinity. In the next chapter an
important role will be played by the Petersson inner product formula which
we state formally as
174 Sums of Squares of Integers
= Γ(k − 1)(4πm)1−k am ,
where am is the m-th coefficient in the Fourier expansion of f (z) and φm (z; k)
is the corresponding Poincaré series of index m. (Here, Γ(k − 1) = (k − 2)!.)
Remark. The formula given in Theorem 5.6 for the Petersson inner prod-
uct < f, φm > is known in the literature as “the fundamental formula.”
From Theorem 5.6 the following important result, also due to Petersson, is
obtained.
PROOF Since dimC Mk0 (Γ) < ∞, the Poincaré series φm (z; k), m ≥ 1,
span a finite dimensional closed subspace M of Mk0 (Γ). Using the Petersson
inner product we get the following orthogonal decomposition
Mk0 (Γ) = M ⊕ M ⊥ .
Now, the fundamental formula shows that a cusp form f is in the orthogonal
complement M ⊥ of M if and only if all its Fourier coefficients vanish, that is
to say, if f is identically zero. Hence the theorem follows.
∞
Γ∞ \Γ = {12 } + LU q .
1∈R, c>0 q=−∞
ab
where L = . We now use the identity
cd
az + aq + b a 1
= − 2
cz + cq + d c c (z + q + d/c)
to obtain
φm (z; k) = e2πimz +
∞
2πim
e2πima/c (cz + cq + d)−k exp − 2
.
q=−∞
c (z + q + d/c)
L∈R, c>0
Hecke’s Theory of Modular Forms 177
= e2πimz +
∞ w ∞ −w−k
−2πim 1 d
e2πima/c c−k z + q + .
w=0
c2 w! q=−∞ c
L∈R, c>0
Thus we have
∞ w
2πima/c −k −2πim 1
φm (z; k) = e 2πimz
+ e c 2
×
w=0
c w!
L∈R, c>0
∞
(−2πi)w+k−1 w+k−1 2πin(z+d/c)
n e ,
(w + k − 1)! n=1
which we can also write in the simpler form
e2πimz + 2πik m(1−k)/2 e2πima/c c−1 ×
L∈R, c>0
∞ ∞ √
(−1)w 4π nm
2w+k−1
(1−k)/2 2πin(d/c)+2πinz e
n e
n=1 w=0
w!(w + k − 1)!
∞
= e2πimz + 2πik m(1−k)/2 e2πi(ma+nd)/c c−1 n(1−k)/2 ×
n=1 L∈R, c>0
√
4π nm 2πinz
Jk−1 e ,
c
where Jk−1 (z) is the Bessel function. Using the representation of R given
above we can now write
∞ ∞
Sc (m, n)
φm (z; k) = e 2πimz
+ k (1−k)/2 (k−1)/2
2πi m n ×
n=1 c=1
c
√
4π nm
Jk−1 e2πinz ,
c
178 Sums of Squares of Integers
where
c
Sc (m, n) = e2πi(ma+nd)/c
d=1
gcd(d,c)=1, ad≡1 (mod c)
is the Kloosterman sum. We can also write the last expression for φm (z; k) in
the closed form
√
4π nm n (k−1)/2 2πinz
∞ ∞
k Sc (m, n)
φm (z; k) = δm,n + 2πi Jk−1 e ,
n=1 c=1
c c m
where <∆, ∆> is the Petersson inner product of ∆(z) with itself.
PROOF 0
Since dimC M12 (Γ) = 1, we must have for any m ≥ 1
φm (z; k) = c(m)∆(z).
Hence,
11! τ (m)
c(m) = ,
(4πm)11 <∆, ∆>
or equivalently,
11! τ (m)∆(z)
= φm (z; k).
(4πm)11 <∆, ∆>
By comparing the Fourier expansions of both sides we then obtain the equality
claimed in the theorem.
Hecke’s Theory of Modular Forms 179
Sq (1, m) q 1/2+ε .
If we now partition the infinite sum in the Petersson explicit formula in two
parts, one containing the terms with q ≤ p1/2 and the other sum containing
terms with q > p1/2 and using the estimates in 2 and 3 above, we get
∞
√
Sq (1, p)J11 (4π p/q)/q p1/4+ε .
q=1
We thus obtain
Clearly, the same sort of estimation can be carried out for the coefficients
of other cusp forms.
We shall give the formula only for the case Γ = SL(2, Z)/{±12 }. It reads as
follows
hA
tr Tk (n) = − F (k−2) (ρ, ρ) − σk−1 (n)
wA
(ρ,ρ) A
√ k − 1 k/2−1
+ δ( n) n − n(k−1)/2 ,
12
where F (k−2) (ρ, ρ) = ρk−1 − ρk−1 /(ρ−ρ), (ρ, ρ) runs over all pairs of mutu-
ally conjugate irrational quadratic integers with norm n, A runs over all orders
of imaginary quadratic fields, such that ρ ∈ A, hA is the class
number of A, wA
is the number of roots of unity contained in A, σk−1 (n) = d|n, d<√n dk−1 ,
√ √
and δ( n) = 1 or 0 according as n is an integer or not. When k = 2 we
must add σ(d) = d|n d to the right hand side of the above formula. Since
dimC Mk0 (Γ) = 0 for k = 2, 4, 6, 8, 10, we have tr Tk (n) = 0 in these cases.
The first nontrivial value of the trace of the Hecke operator occurs for k = 12,
where we obtain this simple expression for the value of τ (p) when p is prime,
h(4p − m2 )
τ (p) = − Mp,m − 1 + p,
√ w∆
|m|< 4p
where the class number h(4p−m2 ) of the imaginary quadratic field Q m2 − 4p
is to be weighted by the appropriate number of roots of one, that is,
⎧
⎨ 6 if ∆ = −3,
w∆ = 4 if ∆ = −4,
⎩
2 if ∆ < −4, and
to the Weil conjectures concerning the zeros of the zeta functions of varieties.
In Ihara’s approach there remained three main problems: the first had to do
with the desingularization of the fiber variety obtained by Ihara, the second
had to do with the proper contribution to the Hecke polynomial coming from
the “middle” cohomology groups and the third, and perhaps most crucial,
was the actual proof of the Weil conjectures. All these three problems were
successfully dealt with by Deligne. We now state these results for the case of
the Ramanujan arithmetical function.
For each prime , let K denote the maximal extension of the rationals
unramified outside and for a prime p = , let Fp be the conjugacy class in
Gal(K /Q) which is the inverse of the class containing the Frobenius element
φp corresponding to the prime p. The results of Deligne are:
5.14 Exercises
1. Derive the following expression for the Euler-Poincaré polynomial for Γ:
∞
1
PΓ (t) = (dimC Mk (Γ)) tk = .
(1 − t4 )(1 − t6 )
k=0
182 Sums of Squares of Integers
Let w(−3) = 3, w(−4) = 2 and w(D) = 1 for D < −4. Set h (D) =
h(D)/w(D), where h(D) is the number of classes of binary quadratic
forms Q = ax2 + bxy + cy 2 with b2 − 4ac = D, if D ≡ 0, 1 (mod 4).
Otherwise, put h(D) = 0. Define the Hurwitz class number of a positive
integer N by
N
H(N ) = h − 2 .
d
2 d |N
where
Representation of
Numbers as Sums of
Squares
6.1 Introduction
The problem of representing a positive integer as a sum of s squares has a
long and interesting history, partly recounted in Dickson’s monumental his-
tory of number theory [24], volume II, pages 225–339. The two best-known
results of elementary number theory which have played a significant role in
the development of various approaches to the problem are Fermat’s theorem
that any prime of the form p = 4m + 1 is representable in a unique way as a
sum of of two squares and Lagrange’s theorem that every positive integer is
the sum of at most four squares of integers. As we have seen in Chapter 2,
each of these results comes equipped with an explicit formula for the number
of possible representations. It has been the goal of many number theorists to
obtain similar exact formulas for the number of representations of an integer
as the sum of s squares for s > 4. We shall see in this chapter, using the
Hecke theory of modular forms, to what extent such a goal is possible.
In this chapter we present an outline of those aspects of the problem con-
nected with the theory of modular forms. The central result is the derivation
of the so-called singular series as an Euler product consisting of the quotient
of two classical Dirichlet L-functions, whose values are expressible in terms of
generalized Bernoulli numbers (see Theorems 6.3 and 6.4). We precede our
discussion by recalling some of the well-known formulas related to the prob-
lem for s = 2, 3, 4, 5, 6, 7 and 8. The chapter ends with a brief description
(in Section 6.7) of the relation between Liouville’s methods and the theory of
elliptic modular forms.
In this development, the theory of modular forms of half integral weight,
including the representation theoretic interpretation of the Jacobi theta form
185
186 Sums of Squares of Integers
ϑ(z)s , plays a major role, particularly in the case of odd s. This part of the
theory is lucidly presented in Gelbart’s monograph [36].
For more about modular forms and the arithmetical properties of the co-
efficients appearing in their q-series expansions, the reader can consult the
monograph of Ono [80].
where the second sum in the last formula runs over all integral quaternions
a + ib + jc + kd whose norm is n.
16 128 n
r24 (n) = (−1)n+d d11 + (−1)n−1 259τ (n) − 512τ , (6.7)
691 691 2
d|n
Representation of Numbers as Sums of Squares 187
where ∞ ∞
24
∆(z) = q (1 − q n ) = τ (n)q n
n=1 n=1
Much later Minkowski and Smith gave arithmetical proofs of the cases s = 5
and s = 7:
√ ∞
Bn n n 1
r5 (n) = 2
, (6.12)
π m=1
m m2
√ ∞
Cn2 n −n 1
r7 (n) = 3
. (6.13)
π m=1
m m3
There are corresponding formulas for even n which depend on the highest
power of 2 dividing n. Examples of these formulas will be made explicit when
we develop the theory of the singular series. The impatient reader can skip
ahead to Theorem 6.3.
In his brief note ([47], page 192) Hardy gave the following three power series
identities from which the formulas (6.11), (6.12) and (6.13) could be derived.
⎧
⎪
⎨∞ ∞
(−1)(n−1)/2
ϑ(z)3 = 1 + 8 (mn + j)1/2 q mn+j (6.14)
⎪
⎩ n
n=1 j m=0
n odd
⎫
∞ ∞ ⎪
⎬
1
+ (−1)m+v (mn + j)1/2 q mn+j ,
n j m=0 ⎪
⎭
n=2
n even
⎧
⎪ ∞
32 ⎨ 1
∞
ϑ(z)5 = 1 + (mn + j)3/2 q mn+j (6.15)
5 ⎪⎩ n=1 n 2
j m=0
n odd
⎫
∞
∞ ⎪
⎬
1
− (−1)m+v (mn + j)3/2 q mn+j ,
n2 j m=0 ⎪
⎭
n=2
n even
⎧
⎪ ∞
256 ⎨ (−1)(n−1)/2
∞
ϑ(z)7 = 1 + (mn + j)5/2 q mn+j (6.16)
15 ⎪⎩ n=1 n 3
j m=0
n odd
⎫
∞
∞ ⎪
⎬
1
− (−1)m+v
(mn + j)5/2 mn+j
q .
n3 j m=0 ⎪
⎭
n=2
n even
∞ s
ns/2−1 π s/2
m
Sh,m
ρs (n) = e−2πinh/m , (6.17)
Γ(s/2) m=1 m
h=1
gcd(h,m)=1
Representation of Numbers as Sums of Squares 189
where
m
2
Sh,m = e2πihj /m
.
j=1
We record here for future reference some for the basic properties of the Gaus-
sian sum Sh,m :
m = 2, Sh,m = 0,
m = 22ν+1 , ν>0 Sh,m = 2ν+1 eπih/4 , √
m = 22ν , ν >0 2ν (1 + ih ),
Sh,m = i= −1,
2 √
m = p, Sh,m = hp i(p−1) /4 p,
m = p2ν+1 , Sh,m = pν Sh,p ,
m = p2ν , Sh,m = pν ,
The basic idea of the circle method as applied by Hardy in the proof of
this theorem consists in observing that the essential part of the generating
function ϑ(z)s is to some degree determined by its behavior at all the rational
points on the real axis. This latter behavior is in turn derived from the
theta transformation formulas. In one respect the solution given by Hardy is
somewhat unsatisfactory in that it leaves open the all too important question
of the arithmetical nature of rs (n). For example, it does not explain why
rs (n) fails to be multiplicative in general. As we shall see later, and this is the
main point we want to make in this chapter, this difficulty arises because the
generating function ϑ(z)s belongs to three quite different spaces depending
on the residue class to which s belongs modulo 4. We also note that from
190 Sums of Squares of Integers
and that
ϑ(zf ) = ϑ(w(zf )) = χϑ (w) czf + d ϑ(zf ), (6.19)
we get, on canceling ϑ(zf ) from both sides,
a + d 2
a+d
1 = χϑ (w) + − 1,
2 2
an expression that gives the value of χϑ (w) readily. Unfortunately, the fixed
point zf often lies on the real axis and in the last step we have to cancel the
indefinite quantity ϑ(zf ) from both sides. To remedy this situation we inves-
tigate instead the asymptotic behavior of the theta function as we approach
a rational number on the real axis from above. Knowledge of this asymptotic
behavior, together with relation (6.19), will prove the theorem.
More precisely, given w(z) as in (6.18), with c > 0, we put z = − dc + iy, so
that cz + d = icy and w(z) = ac + c2iy . From (6.19), with z instead of zf , we
get, as y → 0+,
∞
πim2 z
ϑ(w(z)) = 1 + o(1) = χϑ (w) icy · 1 + 2 e (6.20)
m=1
∞
√
πim2 ρ−πm2 y
= ζ4 · χϑ (w) cy 1+2 e ,
m=1
m = j + cν, j = 1, 2, . . . , c, ν = 0, 1, 2, . . . ,
192 Sums of Squares of Integers
1
= √ + O(1).
2c y
We note that the O-term on the right side comes from the fact that the
higher-order derivatives of the (monotonic) function
are
∞ all−t
polynomials
√ in y. The value of the integral uses the well-known formula
2
−∞
e dt = π. (A direct argument using Riemann sums would also suffice
to prove (6.21).)
Substituting the expression (6.21) in Equation (6.20) yields the relation
1
1 = ζ4 · χϑ (w) · √ · G(ρ),
c
where
n
m n
def
G = exp πi j 2
m j=1
m
This fact is established using arguments like those we will give in Section
6.4.1. It is an interesting exercise to verify that the resulting expression for
χϑ (w) in terms of the extended Jacobi-Legendre symbol agrees with the value
found above.
It contains two inequivalent cusps: one at infinity and the other at z = −1.
More generally, for s a positive integer we can consider ϑ(z)s as a modular
form of type {Γϑ , s/2, χsϑ }, that is, f (z) = ϑ(z)s is regular in D(Γϑ ) and
satisfies the functional equation
az + b
f (w(z)) = χϑ (w)s (cz + d)s/2 f (z), with w(z) = ∈ Γϑ .
cz + d
The Petersson-Riemann-Roch formula can be used to show that this latter
space of functions, which is finite dimensional, contains no functions that
vanish at both cusps z = −1 and z = ∞ when 1 ≤ s ≤ 8. Of the two possible
Eisenstein series of type {Γϑ , s/2, χsϑ}, for s ≥ 5, one vanishes at z = i∞ and
the other is given by
1
Es (z) = χϑ (T )−s (cz + d)−s/2 ,
2 0 1
∗ ∗A
T =@
cd
is the n-th coefficient of the q-expansion of the cusp form f (z). One then uses
the fact that f (z) has weight s/2 to deduce that
cs (n) = O(ns/4 ).
type {Γϑ , s/2, χsϑ }, with s even. For each basis element Petersson could then
calculate explicitly the q-expansion. This leads then in principle to exact
formulas for rs (n). Rather than stating the most general result we give an
example of the type of formula that results from Petersson’s analysis.
Example 6.1
For odd n we have
„ √ «
222 · 37 · π 12 <∆, ∆>n11/2 X Sc (n)
∞
4π n
r24 (n) = ρ24 (n) + J11 ,
34 · 52 · 691 c=1
c c
where <∆, ∆> is the Petersson inner product of ∆(z) with itself, J11 (z) is the
Bessel function and
Xc „ «
2πi
Sc (n) = exp (h + h n) , where hh ≡ 1 (mod c),
h=1
c
gcd(h,c)=1
is a Kloosterman sum.
Expansions similar to that given above are now possible for any real s (see
[77], page 135). The advantage in using Petersson’s method of Poincaré series
over that of Hardy consists in applying estimates for Sc (n) to get correspond-
ing estimates for rs (n) − ρs (n). For example, one can deduce from Weil’s
estimate of Kloosterman sums the result
for even s and ε > 0. A slightly weaker result also follows for odd s.
The rest of the chapter is divided into five sections. In Sections 6.3 and 6.4
we consider the problem of computing explicitly in finite terms the singular
series. In Section 6.5 we develop, using the modern theory of modular forms,
explicit formulas for rs (n) when s is even and n is a power of a prime, and for s
odd when n is a fixed multiple of the square of a prime power (see Theorems
6.8 and 6.9, respectively). In Section 6.6 we discuss some open problems
and give several examples that seem to suggest where to search for a more
satisfactory answer to the problem than that presented here. In Section 6.7 we
outline the connection between Liouville’s elementary method and the theory
of elliptic modular forms.
and
0 if s − 23−i n ≡ 2 or 6 (mod 8),
Θi = 3−i
(−1)(s−2 n )/4 if s − 23−i n ≡ 0 or 4 (mod 8),
If e = 1, we put L2 (0, ρs ) = 1.
∞
zez zk
= Bk · .
e −1
z k!
k=0
The character χ in Theorem 6.4 depends on the parity of (s − 1)/2 and will
be defined later (as χn∗ in Equation (6.42)).
REMARK 6.1 These two theorems will be proved in Section 6.4 where
we will make a complete evaluation of the singular series for arbitrary n.
REMARK 6.2 The Bernoulli numbers used in Theorem 6.4 are the same
as those studied in Chapter 3, except that B1 = + 21 here.
REMARK 6.4 If (−1)(s−1)/2 n ≡ 1 (mod 4), then the formula for ρs (n)
has to be modified by multiplying it by the factor (1/4)(s/2)−1 .
REMARK 6.5 The explicit formulas of Theorem 6.3 can be written more
simply as follows. For s = 2k, with odd k, we have
4
k−1
2n
ρs (n) = ρ2k (n) = χ(d) + χ(k)dk−1 , (6.22)
|Ek−1 | d
d|n
where χ(a) = (−1/a) (the Jacobi symbol) and Ek−1 is the Euler number
defined at the end of Chapter 3. For s = 2k with even k we have
2k
ρs (n) = ρ2k (n) = dk−1 (−1)(d+k/2)(n/d−1) , (6.23)
(2k − 1)|Bk |
d|n
Similarly we have
⎛ ⎞
∞ k−1 n ∞
χ(n)n q ⎝
= χ(d)dk−1 ⎠ q n .
n=1
1 − q n
n=1 d|n
When these two formulas are substituted into the Mordell expansion we get
⎛ ⎞
∞ k−1
1 1 ⎝ 2n ⎠ qn
|Ek−1 |E2k (z) = |Ek−1 | + 2 χ(d)
2 2 n=1
d
d|n
⎛ ⎞
∞
+2χ(k) ⎝ χ(d)dk−1 ⎠ q n .
n=1 d|n
We now divide through by 12 |Ek−1 | and compare the n-th coefficients in the
q-expansion above with those in E2k (z) to obtain
! "
4
k−1
2n
ρ2k (n) = χ(d) + χ(k)dk−1 .
|Ek−1 | d
d|n
To deal with the case of even k we use Mordell’s q-Lambert expansion ([74],
page 96):
∞
nk−1 q n
(2k − 1)|Bk |E2k (z) = (2k − 1)|Bk | + 2k .
n=1
1 − (−1)k/2+n q n
We then divide through by (2k − 1)|Bk | and equate the n-th coefficients of
the corresponding q-expansion to obtain
2k
ρ2k (n) = dk−1 (−1)(d+k/2)(n/d−1) .
(2k − 1)|Bk |
d|n
Theorem 6.3 clearly accounts for the simple form taken by rs (n) in the
examples given in the previous section. They are obtained by using the well-
known values for the Euler and Bernoulli numbers. (Note that E2 = −1 and
E4 = 5. The Bernoulli numbers may be found in a table following Theorem
3.2.)
REMARK 6.6 Theorems 6.3 and 6.4 open the way for a closer arithmetic
study of the singular series ρs (n). In particular we can now study the p-adic
analytic continuation of ρs (n) in the variable s in the sense of Iwasawa and
Serre.
REMARK 6.7 We are hopeful that the methods of this section apply
to the more general singular series studied by Hardy and Littlewood. In
particular one may try to evaluate the singular series that arises in the cubic
Waring problem (see Hardy and Littlewood [51])
n = x31 + x32 + · · · + x3s
REMARK 6.8 The results of Theorems 6.3 and 6.4 clearly show to what
extent the singular series is made up of multiplicative functions.
In particular we have
N 1
f (0) + f (1)
= lim f (η)e2πinη dη.
2 N →∞ 0 n=−N
def
n
2
S = e2πij /n
,
j=1
Note first that applying the Fourier series theorem to fs (ξ) gives
n−1
fs (0) + fs (1)
S=
s=0
2
n−1 N 1
2
= lim e2πi{(s+η) /n+hη} dη
N →∞ 0
s=0 h=−N
N s+1
n−1
2
= lim e2πi{η /n+hη}
dη
N →∞ s
h=−N s=0
N n
2
= lim e2πi{η /n+hη}
dη, η = nx
N →∞ 0
h=−N
N 1
2πin{x2 +hx}
= n lim e dx .
N →∞ 0
h=−N
200 Sums of Squares of Integers
we obtain
∞ h+1
γ 2
(i) √ = e2πiny dy
n
h=−∞ h
∞ 1
2
= e2πin(y+h) dy
h=−∞ 0
∞ 1
2
= e2πin(y +2hy)
dy.
h=−∞ 0
Similarly,
∞ h+1/2
γ 2
(ii) √ = e2πiny dy
n
h=−∞ h−1/2
∞ 1
2
= e2πin(y+h−1/2) dy
h=−∞ 0
∞ 1
2
= e2πin(y +(2h−1)y n
i dy.
h=−∞ 0
Adding (i) to i−n times (ii) and comparing with the previously obtained
formula for S we get
γ γi−n √
S=n √ + √ = n(1 + i−n )γ.
n n
1 + i−n √
S= n,
1 + i−1
k
2
S(h, k) = e2πihj /k
j=1
for integers k > 0 and h with gcd(h, k) = 1. We will prove that S(h, k) enjoys
a useful multiplicativity property.
THEOREM 6.6
If k1 and k2 are relatively prime positive integers and h is an integer relatively
prime to both k1 and k2 , then
LEMMA 6.1
Let k and k be relatively prime positive integers. Let j run through a
complete set of residues modulo k and j run through a complete set of residues
modulo k . Then kj + k j runs through a complete set of residues modulo
kk .
have
kj + k j ≡ ki + k i (mod kk ).
This implies k j ≡ k i (mod k), which gives j ≡ i (mod k) because gcd(k , k) =
1. Similarly, j ≡ i (mod k ). Hence the kk numbers are all incongruent and
must form a complete set of residues modulo kk .
where j and j run over complete sets of residues modulo k1 and k2 , respec-
tively. The expression in braces in the exponent satisfies
hk1 j h(k2 j + k1 j )2
2
hk2 j 2
+ ≡ (mod 1).
k1 k2 k1 k2
202 Sums of Squares of Integers
Letting J = k2 j + k1 j and using Lemma 6.1, we see that the above sum is
2
e2πihJ /k1 k2 ,
J mod k1 k2
Clearly, if k = 1, $
we have S(h, 1) = 1.
If k > 1 and k = p|k pp is the factorization of k into a product of powers
of distinct primes, then repeated application of (6.24) yields
k p
S(1, k) = S , p .
pp
p|k
We will now show that if ≥ 2 and p does not divide 2h, then
S(h, p ) = pS(h, p−2 ). (6.25)
The integers J in the set {0, 1, . . . , p −1} can be parametrized by J = p−1 j +
j , with j in the set {0, 1, . . . , p− 1} and j in the set {0, 1, . . . , p−1 − 1}. Since
h(p−1 j + j )2 2hjj hj
2
hJ 2
=
≡ + (mod 1),
p p p p
the sum S(h, p ) can be represented as a double sum,
p−1
−1
p−1
2
/p
e2πihj e2πij(2j h/p)
.
j =0 j=0
Here the inner sum is p if 2j h ≡ 0 (mod p), that is, if p|j ; otherwise it is 0.
Hence the double sum is
p−2
−1 2
/p−2
e2πihν · p = pS(h, p−2 ).
ν=0
If we let ζ = e2πi/p , and let j and j run through the quadratic residues
(QR) and quadratic nonresidues (QNR) modulo p, respectively, in the set
{1, 2, . . . , p − 1}, then with gcd(h, p) = 1, we have
def
p−1
J
σ = e2πihJ/p = ζ jh − ζ j h.
p
J=1 j QR j QNR
Representation of Numbers as Sums of Squares 203
Thus
p−1
ζ −hj = 2 ζ −hj = ζ −hj − ζ −hj − 1,
2
j=1 j QR j QR j QNR
or
p−1
p−1
j
ζ −hj = ζ −hj
2
S(h, p) = 1 +
j=1 j=1
p
p−1
h j h 1 + ip √
= ζ −j = · p
p j=1 p p 1+i
p−1
−hj 2 h 1 + ip √
ζ = p
j=1
p 1+i
and
S(h, p) h 1 + ip
√ = .
p p 1+i
Note that for odd p,
1 + ip
= i−(p−1) /4 .
2
1+i
In the case p = 2 we clearly have S(h, 2) = 0. The values of S(h, k) for
k = 2u are easy to calculate. For example, we can verify directly that
⎧
⎨0 if k = 2,
S(h, k) = 2(1 + ih ) if k = 22 ,
⎩ πih/4
4e if k = 23 .
Since
a
u 2 2u √ (1 + i)(1 + i−a )
e2πi2 j /a
= a
j=1
a 2
and u
2 a
√ u
(1 + i)(1 + i−a2 )
2
/2u a
e2πij = 2u a ,
j=1
2
we have u
S(h, 2u ) 2
√ = · 1 + ih . (6.29)
2u h
A direct calculation for u = 2 shows that (6.29) also holds in that case.
then we have
∞
σ1−s (n) cq (n)
= .
ζ(s) q=1
qs
It can be proved directly from the definition of cq (n) (see Section 16.6 of [52])
that cq (n) is multiplicative in q and
φ(q) q
cq (n) = µ(q ), q = .
φ(q ) gcd(n, q)
If we put T = 1/ps , then the equation above can be written more simply as
∞
1+ cpu (n)T u = (1 − T ) pu T u .
u=1 0≤u≤ordp (n)
206 Sums of Squares of Integers
This is equivalent to
∞
1 − (apT )ordp (n)+1
1+ c (n)a T = (1 − aT )
pu
u u
.
u=1
1 − (apT )
These identities will play an important role in the evaluation of the Fourier
transform of the quadratic Gaussian sum in the next section.
k s
S(h, k)
Ak = · e−2πinh/k .
k
h=1
gcd(h,k)=1
In this section we will evaluate Apu when p is prime and u is a positive integer.
There are two cases, according as p is odd or p = 2.
6.4.3.1 The Case p odd
LEMMA 6.2
Let p be an odd prime and u be a positive integer. Then
⎧
⎪ (−1)s/2
⎪
⎨ cpu (n) if pu is not square and s is even
p
(pu )s/2 Apu = −cpu (n)√p (−1)(s−1)/2 ν if pu is not square and s is odd
⎪
⎪ p
⎩
cpu (n) if pu is square,
where n = pu−1 · ν.
p−1
h n √ 1 + ip
e−2πinh/p = p· , (6.30)
p p 1+i
h=1
Representation of Numbers as Sums of Squares 207
we have
p−1
−s
Ap = p (S(h, p))s e−2πinh/p
h=1
& h s
s p−1
= p−s (−1)(p−1)/2 p e−2πinh/p .
p
h=1
The inner sum vanishes unless pu−1 |n. But if n = pu−1 · ν = p2v · ν, then
Apu = p−(v+1)s pu−1 (S(m, p)) e−2πiνm/p
s
m
p−1
= p−vs+u−1 p−s
s −2πiνm/p
(S(m, p)) e .
m=1
p &
−2πimj 2 /p m
S(m, p) = e = (−1)(p−1)/2 p.
j=1
p
Hence,
p−1 &
s
−vs+u−1 −s m
Apu =p p (−1)(p−1)/2 p e−2πiνm/p
m=1
p
& s
p−1 s
−vs+u−1−(s/2) m
=p (−1)(p−1)/2 e−2πiνm/p .
m=1
p
208 Sums of Squares of Integers
In fact we have
2v
p
s
2v −s
S(h, p2v ) · e−2πinh/p .
2v
Ap2v = p
h=1
gcd(h,p2v )=1
2v
p
2v −s/2
e−2πinh/p = p−vs cp2v (n).
2v
Ap2v = p
h=1
gcd(h,p2v )=1
LEMMA 6.3
We have A2 = 0. Let u > 1 be an integer. Let ζ8 = e2πi/8 . Then
⎧
⎪
⎪ 0 if u is even and 2u+1 8n
⎪
⎪ (s/2)+u−2 s−ν
⎪
⎪ 2 ζ8 + ζ8
−(s−ν)
if u is even and 2u+1 |8n
√ s ⎨
2u An = 0 if u is odd and 2u 8n
⎪
⎪
⎪
⎪0 if u is odd and s ≡ ν (mod 4)
⎪
⎪
⎩ 2(s/2)+u−2 ζ s−ν + ζ +(s−ν) if u is odd and s ≡ ν (mod 4),
8 8
where 8n = 2u · ν.
1
2
1
A2 = (S(h, 2))s e−2πinh/2 = (S(1, 2))s e−πin
2s 2s
h=1
gcd(h,2)=1
⎛ ⎞s
1 ⎝ 2πij 2 /2 ⎠ −πin 1
2 s
= s e e = s eπi/1 + 1 e−πin = 0.
2 j=1
2
S(h, 2u ) S(h, 22 )
√ = √ .
2u 22
S(m, 22 ) s 2u−2
−1
−2πinm/2u u−2
= e e−2πinz/2 .
m
2 z=0
The inner sum vanishes if 2u−2 n, and it equals 2u−2 if 2u−2 |n. Recall that
8n = 2u · ν. Then
√ s s
s S(1, 4) S(3, 4)
u u−2 −2πin/2u −2πin·3/2u
2 A2u = 2 e + e
2 2
u u
= 2u−2 2−s (S(1, 4))s e−2πin/2 + (S(3, 4))s e−2πin·3/2 .
210 Sums of Squares of Integers
Since
S(1, 4) S(3, 4)
√ = 21/2 ζ8 and √ = 21/2 ζ8−1 ,
22 22
we have
√ s
−(ν−s)
A2u = 2−s+(3s/2)+u−2 ζ8
(ν−s)
2u + ζ8
(ν−s) −(ν−s)
= 2u−2+(s/2) ζ8 + ζ8 .
The inner sum vanishes unless 2u−3 |n, and in that case it equals 2u−3 and we
have
√ s √ s u
2u A2u = 2u−3 2 e(πia/4)−(2πina/2 )
a
u−3+(s/2)
=2 e(2πia/8){s−ν}
a
{s−ν}a
u−3+(s/2)
=2 ζ8 .
a
{s−ν}
3
{s−ν}2b
= 2u−3+(s/2) ζ8 ζ8 .
b=0
The inner sum vanishes unless s − ν ≡ 0 (mod 4), in which case it equals 4.
Hence, when s ≡ ν (mod 4) we get
√ s
{s−ν}
2u A2u = 2u−1+(s/2) ζ8 .
The goal of this section is to derive an explicit expression for this local factor.
Suppose p is an odd prime and s is a positive odd integer. From the defining
relation for the Ramanujan function,
∞
ordp (n)
1+ cpu (n)T = (1 − T )
u
pu · T u ,
u=1 u=0
we get that, if p does not divide n, that is, ordp (n) = 0, then cpu = 0 for
u ≥ 2 and hence
Apu = 0 for u ≥ 2.
Therefore,
(−1)(s−1)/2 · n
Lp (w, ρs ) = 1 + p−(s−1)/2−w . (6.31)
p
and hence cpu = 0 for u ≥ 3, and cp2 = −p. Since Ap = 0 in this case, we
have
Hence we have
1 p(1−(s/2)−w)(e−1) − 1
Lp (w, ρs ) = 1 + 1 − p2(1−(s/2)−w) 2(1−(s/2)−w) − pe−(e+1)((s/2)+w)
p p −1
1 p(2−s−2w)(e−1)/2 − 1
= 1+ 1− p2−s−2w − p−1−((e+1)/2)(s+w−2) .
p p(2−s−2w) − 1
Letting V = p2−s−2w , we get
(e−1)/2
1 V −1
Lp (w, ρs ) = 1 + 1 − ·V · − p−1 V (e+1)/2
p V −1
1− 1
p V − V (e+1)/2 + 1
p V 1+(e+1)/2
=
1−V
' (
1− 1
p ·V 1 − V (e+1)/2
= .
1−V
Substituting back the value V = p2−s−2w into the last expression, we get
1 − p1−s−2w 1 − p(2−s−2w)(e+1)/2
Lp (w, ρs ) = .
1 − p2−s−2w
This finishes the case of e odd.
Representation of Numbers as Sums of Squares 213
we get
e−2
−(s/2)−uw
Lp (w, ρs ) = 1 + cpu (n) (pu ) − pe−1−es/2−ew
u=2
u even
ν
+ pe−es/2−ew 1 + p−(s−1)/2−w ,
p
214 Sums of Squares of Integers
we can write
(2−s−2w)(e−2)/2
1 p −1
Lp (w, ρs ) = 1 + 1 − p 2−s−2w
− pe−1−es/2−ew
p p2−s−2w − 1
e−es/2−ew ν −(s−1)/2−w
+p 1+ p .
p
= .
1−T
Thus we get
1 − p1 p2−s−2w 1 − p(2−s−2w)(e/2)
Lp (w, ρs ) =
1 − p2−s−2w
e−es/2−ew ν −(s−1)/2−w
+p 1+ p ,
p
Let us combine the cases of even and odd e into a single formula. If we put
e = 2e + δ with δ ∈ {0, 1}, we can finally write
1 − p1 p2−s−2w 1 − p(2−s−2w)(e+δ)/2
Lp (w, ρs ) =
1 − p2−s−2w
ν
+ (1 − δ)p(2−s−2w)e/2 1 + p−(s−1)/2−w .
p
This completes the evaluation of the local factor Lp (w, ρs ) when p is odd and
s is odd.
Representation of Numbers as Sums of Squares 215
For later use we note that the above resultcan also be written in the
following form. Let χn∗ (p) = (−1)(s−1)/2 · n/p and ordp (n) = e = 2e + δ
with δ ∈ {0, 1}, we then have
Lp (0, ρs ) = 1 − p1−s ×
2−s (e+δ)/2
1 (e+δ)/2−1 p
1 + p2−s + · · · + p2−s + (1 − δ) .
1 − χn∗ (p)p−(s−1)/2
This formula will facilitate the calculation of the singular series in the general
case when n is not necessarily square-free.
∞
A2k
L2 (w, ρs ) = 1 + .
(2k )w
k=1
∞
Ak
L(w, ρs ) = Lp (w, ρs ) =
p
kw
k=1
u = 0 : A1 = 1,
u = 1 : A2 = 0,
u = 2 : A22 = 2−(s/2)+1 Θ− ,
0 if n ≡ s (mod 4),
u = 3 : A23 =
2(−(s/2+1))2 Θ+ if n ≡ s (mod 4),
u ≥ 4 : A2u = 0.
T − T2
L2 (w, ρs ) = 1 + 0 + 2w
Θ + 3w Θ+ .
2 2
Here and in the following we put
T = 2−(s/2)+1 .
(B) We suppose n = 2n with n odd. Then, if 2u+1 8n, we have A2u = 0
for even u ≥ 4 and, if 2u 8n, we have A2u = 0 for odd u ≥ 5 If 8n = 23 · ν,
then ν is even and s ≡ ν (mod 4), and hence A23 = 0. We get in this case
T −
Ls (w, ρs ) = 1 + Θ ,
22w
unless n ≡ s (mod 4), in which case
T − T2
Ls (w, ρs ) = 1 + 2w
Θ + 3w Θ+ .
2 2
We consider now the general case
n = 2 a · n , n odd, a ≥ 2.
A2u = T u−1 Θ− .
Representation of Numbers as Sums of Squares 217
a+3
A2u A2u
L2 (w, ρs ) = = 1 + + S,
u=0
(2u )w (2u )w
2≤u≤a
u even
where
A2u+1 A2u+2 A2u+3
S= + u+2 + u+3 .
(2u+1 )w (2 )w (2 )w
For the middle sum we have
A2u
= Θ− T 2j−1 · 2−2jw
(2u )w j=1
2≤u≤a
u even
1 2 −2w j
= Θ− T ·2
T j=1
−1
2 −2w 1 − T 2 2−2w
=Θ T 2 .
T 1 − (T 2 2−2w )
As for the sum S, we note that by (A), (B), (C), (D) and (E) we have
S = Θ− T a 2−(a+1)w
Otherwise we have
LEMMA 6.4
Let s be an odd positive integer and set T = 2/2s/2 . Let n be a positive
integer and set n = 2a · n with n odd. Put a = 2 + δ with δ ∈ {0, 1}. Then
L2 (0, ρs ) is given by this polynomial in T :
1 s 1 − T a−δ a+(s−n )(1−δ)/2 a+1−δ
1+0 + √ T· + (−1) T
2 2 1 − T2
+ (1 − δ)(−1)(s−n )/4 T a+2−δ ,
and hence
1 s 1
Θ− = − √ = − √ (−1)(s −1)/8 .
2
2 2 2
Similarly for Θ+ we have, if s ≡ ν (mod 4), with ν = 8n/2a+3 = n odd, a + 3
odd,
(−1)(s−n )/4 if s − n ≡ 0 (mod 4),
Θ+ = ζ8s−n =
0 otherwise.
The last two terms in the statement of the lemma come from the two possible
terms that can appear in the sum S evaluated earlier, with the understanding
that when s ≡ n (mod 4), then (−1)(s−n )/2 = −1. We note also that
s
s − 2n
= (−1)(n −1)/2 .
2 2
If we put
(−1)s/2
χs (p) = and χs (pu ) = (χs (p))u ,
p
then we have
(pu )s/2 Apu = cpu (n)χs (pu ).
If we write n = pe · n , where p n , and T = χs (p)/ps/2 , then we have
u
χs (p)
1+ Apu (n) = cpu (n)
ps/2
k
χs (p) χs (p)
= 1− p k
ps/2 ps/2
0≤k≤ordp (n)
1 − (pT )ordp (n)+1
= (1 − T ) .
1 − pT
LEMMA 6.5
Suppose s is an even integer, n = 2e · n , with n odd, and T = 2/2s/2 . We
then have
L2 (0, ρs ) = 1 + 0 + Θ0 T k−1 + Θ1 T e + Θ2 T e+1 ,
2≤k≤e
PROOF Recall that Lemma 6.3 gives the following values for A2u : A1 = 1
and A2 = 0. Let n = 2e · n with n odd. Let ν = 8n/2u . Let
1 (s−ν) ±(s−ν)
Θ± = ζ + ζ8 .
2 8
For u even we have
A2u 0 if 2u+1 8n,
2 u−1 = Θ− if 2u+1 |8n, (6.36)
2s/2
while for u odd we have
A2u 0 if 2u 8n or if s ≡ ν (mod 4),
2 u−1 =
Θ+ if 2u |8n and s ≡ ν (mod 4).
(6.37)
2s/2
Throughout the discussion we assume s is even. We note first that A2u = 0
(a) if u is even and u > 2 + e, and (b) if u is odd and u > 3 + e. Hence,
A2u = 0 for all u > 3 + e. Now note that if s is even and u = 3 + e is odd,
then
s ≡ ν ≡ n (mod 4)
cannot hold because n is odd. Hence, A2u = 0 if u ≥ 3 + e. Thus we have
L2 (0, ρs ) = 1 + 0 + A2k + A2e+1 + A2e+2 .
2≤k≤e
This is equivalent to
A2k (−1)(s−ν)/4 if s − ν ≡ 0 or 4 (mod 8),
2 k−1 = 0 if s − ν ≡ 2 or 6 (mod 8).
2s/2
Since
1 2πi(s−ν)/8
Θ+ = e + e+2πi(s−ν)/8 = (−1)(s−ν)/4 ,
2
A2u 0 if s − ν ≡ 2 or 6 (mod 8),
2 u−1 = (−1)(s−ν)/4 if s − ν ≡ 0 or 4 (mod 8).
2s/2
Example 6.2
The following table gives the values of the Thetas for s ∈ {2, 4, 6, 8}. Recall
that n = 2e · n with n odd. (For s = 6, we have Θ2 = (−1)(3−n )/2 =
−(−1)(1−n )/2 .)
s Θ0 Θ1 Θ2
2 0 0 +(−1)(1−n )/2
4 −1 1 0
6 0 0 −(−1)(1−n )/2
8 1 −1 0
6.4.7 Examples
This section contains four extended examples to illustrate the use of the theory
we have just developed.
6.4.7.1 Sum of Two Squares
As we have seen in earlier sections, the formula for the number r2 (n) of rep-
resentations of an integer n as a sum of two squares has a relatively simple
structure and involves the divisors of n. Jacobi’s combinatorial formula (6.8),
Representation of Numbers as Sums of Squares 223
which we write as
! ∞
"2 ∞
2 q 2m−1
n
q =1+4 (−1)m−1 ,
n=−∞ m=1
1 − q 2m−1
has on the left side the generating function for r2 (n) and on the right side,
after expanding the Lambert series as a power series in q, the arithmetic
function expressing r2 (n) in terms of the divisors of n. More precisely, we
have ⎧ ⎫
⎪
⎪ ⎪
⎪
∞ ∞
⎨ ⎬
n (d−1)/2
1+ r2 (n)q = 1 + 4 (−1) qm .
⎪ ⎪
n=1 m=1 ⎪
⎩ d|m ⎪
⎭
d odd
The discussion in this section is aimed at a proof of the identity
2r2 (n) = ρ2 (n),
first noted by Ramanujan.
We recall that the singular series is defined by
n(s/2)−1
ρs (n) = · S,
π −s/2 Γ(s/2)
%∞
where S = k=1 Ak , and
k s
S(h, k)
A1 = 1, Ak = e−2πinh/k ,
k
h=1
gcd(h,k)=1
k
2
S(h, k) = e2πij h/k
.
j=1
Since the Ak have been shown to be multiplicative in k, we have
S= Lp (0, ρs ),
p
where ∞
Lp (0, ρs ) = 1 + Apk .
k=1
Using the results obtained earlier for Lp (0, ρs ), when s is even, we get
n(s/2)−1
ρs (n) = · L2 (0, ρs ) Lp (0, ρs )
π −s/2 Γ(s/2) p
p odd
⎧ e+1
⎫
⎪
⎨ 1 − p(s/2)+1
ψs (p) ⎪
⎬
n(s/2)−1 ψs (p)
= L2 (0, ρs ) 1 − s/2 ,
π −s/2 Γ(s/2) p ⎪
⎩ 1− ψs (p) ⎪
⎭
p pe n p(s/2)+1
p odd p odd
224 Sums of Squares of Integers
Here the symbol ψs (p) = ((−1)s/2 /p) is the Kronecker character arising from
the first supplement to the law of quadratic reciprocity. Its value is given by
p−1 s
ψs (p) = (−1) 2 ·2 .
Its conductor is f = fψs = 4. We thus have
⎧ ⎫
⎪
⎪ ⎪
⎪
1 ⎨ 1 ⎬
ρs (n) = · L2 (0, ρs ) · n(s/2)−1 ψs (d) ,
π −s/2 Γ(s/2)L(s/2, ψs) ⎪
⎪ d(s/2)−1 ⎪
⎪
⎩ d|n ⎭
d odd
(6.38)
where L(s/2, ψs ) is the Dirichlet L-function associated to ψs .
We recall the well-known functional equation for Dirichlet L-functions in
a format suitable for calculation. If ψ is a Dirichlet character of conductor
f = fψ and if we define δ ∈ {0, 1} so that ψ(−1) = (−1)δ , then
2s−1 L(1 − s, χ)
π −s Γ(s)L(s, χ) = Wχ · f(1/2)−s · .
cos(π(s − δ)/2)
The root number
f
τ (χ)
Wχ = √ δ , where τ (χ) = χ(a)e2πia/f ,
fi a=1
has absolute value 1. If χ is real, its value is actually equal to 1, a result which
is equivalent to the main statement of the law of quadratic reciprocity. (See
Theorem 3.13 or page 104 of the Appendix of Iwasawa [56].)
In our case, since ψs is real-valued and of conductor f = 4, we obtain
L(1 − s/2, ψs )
π −s/2 Γ(s/2)L(s/2, ψs) = 2−s/2 .
cos(π(s/2 − δ)/2)
Combining this formula with (6.38) gives
⎧ ⎫
⎪
⎪ ⎪
⎪
2−s/2 L2 (0, ρs ) cos(π(s/2 − δ)/2)n(s/2)−1 ⎨ ⎬
ρs (n) = ψs (d)d−(s/2)+1 .
L(1 − s/2, ψs ) ⎪
⎪ ⎪
⎪
⎩ d|n ⎭
d odd
(6.39)
From the theory of generalized Bernoulli numbers we also need the following
formula (see Theorem 3.14 or Iwasawa [56], Theorem 1, page 11):
Bs/2,ψs
L(1 − (s/2), ψs ) = − ,
s/2
where
f
∞
def teat tn
Fψs (t) = ψs (a) · ft = Bn,ψs · .
a=1
e − 1 n=0 n!
Representation of Numbers as Sums of Squares 225
s/2
where T = 2/2 . For the case s = 2 we have from Example 6.2,
Θ0 = 0, Θ1 = 0 and Θ2 = (−1)(n −1)/2 ,
where n = 2ord2 (n)·n , with n odd. Hence
L2 (0, ρs ) = 1 + (−1)(n −1)/2 .
Since ψs (d) = (−1)(d−1)/2 for d odd we get
⎧ ⎫
⎪
⎪ ⎪
⎪
1 + (−1)(n −1)/2 ⎨ ⎬
ρ2 (n) = −2 (−1)(d−1)/2
. (6.41)
B1,ψ2 ⎪
⎪ ⎪
⎪
⎩ d|n ⎭
d odd
It only remains to find the value of B1,ψ2 . To do this we note that since the
conductor of ψ2 is 4 and ψ2 (3) = −1, we have
tet ψ2 (3)te3t
Fψ2 (t) = + 4t
−1
e4t e −1
tet (e2t + 1)
= 2t
(e + 1)(e2t − 1)
∞
tet tn
= 2t = Bn,ψs · .
e − 1 n=0 n!
where γ(a/c) is an eighth root of unity which describes the behavior of the
%
theta function ϑ(τ ) = ∞n=−∞ e
πiτ n2
as τ approaches the rational cusp a/c,
that is to say,
(cτ − a)1/2 ϑ(τ ) −→ γ(a/c), c > 0, gcd(a, c) = 1.
To extend the validity of the identity above to values of s < 4, one normally
introduces a convergence factor, an idea originally due to Hecke, and defines
for z a complex variable
Φs (τ, z) = 1 + γ(a/c)s (cτ − a)−s/2 |cτ − a|−z ,
a/c
where a/c runs over all rational numbers and (z) > 2 − (s/2). Determin-
ing ρs (n) is equivalent to calculating the Fourier expansion of the function
Φs (τ, z). The analytic theory of automorphic forms can be used to prove that
so that
π4 L2 (0, ρs ) · 16 90 1
ρ8 (n) = · n3 · · 4 1+ 3 .
Γ(4) 15 π p
p|n
or
ord
p (n) k
ε ε
Lp (0, ρs ) = 1 − s/2 .
p k=0
p(s/2)−1
In particular, when s = 8, we have ε = 1 and
ord
p (n) k
ε 1
Lp (0, ρs ) = 1 − 4 .
p p3
k=0
where T = 1/8, with the understanding that L2 (0, ρs ) = 1 when ord2 (n) = 0.
With the values for Lp (0, ρs ) obtained earlier, we have
2 ord2 (n)−1 ord2 (n)
1 1 1 1
ρ8 (n) = 16n 3
1+0+ + + ···+ − ×
8 8 8 8
! ordp (n) "
1 1
1 + 3 + ···+ 3
,
p p p
p odd
with the understanding that the expression in curly braces is 1 when ord2 (n) =
0. This expression agrees with the earlier expression obtained for n square-
free. Thus in all cases
⎛ ⎞
⎜ 3 3⎟
ρ8 (n) = 16 ⎜
⎝ d − d ⎟
⎠.
d|n d |n
d even d odd
where
s−1 1
L , χn∗ =
2 1 − χn∗ (p)p−(s−1)/2
pn
p odd
∞
(−1)(s−1)/2 · n 1
= .
m=1
m m(s−1)/2
gcd(m,2n)=1
is required in order to obtain explicit formulas for the singular series ρs (n).
We now review the basic facts about the Kronecker symbol needed to solve
this problem.
Let d ≡
0 or 1 (mod 4) and let d not be a perfect square. We denote by
χd (n) = nd , the Kronecker symbol, defined as follows:
d
= 1
1
d
= 0 if p is prime and p|d,
p
d 1 if d ≡ 1 (mod 8),
=
2 −1 if d ≡ 5 (mod 8),
d d
= Legendre’s symbol , if p > 2, p prime, p d,
p p
ν ν
d d
= for m = pr with pr prime.
m r=1
pr r=1
Note that d2 in the definition is Jacobi’s symbol |d| 2
for odd d. See Landau
[64] Definition 20, Chapter VI, page 70, for this definition.
The basic property of the Kronecker symbol is that it is a character modulo
d, that is, χd (a) = 0 if gcd(a, d) = 1, χd (1) = 1, χd (a)χd (b) = χd (ab) for all a
and b, and χd (a) = χd (b) whenever a ≡ b (mod d).
We recall that a number d ≡ 0 or 1 (mod 4), and not a perfect square, is
called a fundamental discriminant if it is not divisible by the square of any
odd prime and is either odd or ≡ 8 or 12 (mod 16).
The following basic result appears on page 219 of Landau [64].
230 Sums of Squares of Integers
LEMMA 6.6
Let s be an odd integer greater than 1. Let χ be a real primitive character
of conductor f . We then have
L s−1
2 ,χ π −s/2 Γ 2s B(s−1)/2,χ
· 2(s−1)/2 · (−1)(s −1)/8 ·
2
= ,
ζ(s − 1) f (s/2)−1 Bs−1
where
f ∞
χ(a)teat tn
= Bn,χ ·
a=1
ef t − 1 n=0
n!
and
∞
tet tn
= Bn · .
et − 1 n=0 n!
PROOF We shall use the functional equations for the Riemann zeta func-
tion and for Dirichlet L-functions in the following well-known form (see The-
Representation of Numbers as Sums of Squares 231
ζ(1 − s)
ζ(s) = ,
2(2π)−s Γ(s) cos(πs/2)
Using the expression obtained earlier for ρs (n), together with Lemma 6.6, we
find that, when s is odd, we have ρs (n) =
L2 (0, ρs ) (−1)(s−1)/2 n 1 B(s−1)/2,χ
·2(s−1)/2 ·(−1)(s −1)/8 ·
2
1 · 1− .
1 − 2s−1 2 2 (s−1)/2 Bs−1
Thus, ⎧7
⎪
⎨8 if n ≡ 5 (mod 8),
L2 (0, ρ5 ) = 5
if n ≡ 1 (mod 8),
⎪
⎩
8
5
4 if n ≡ 5 (mod 4).
If n∗ = (−1)(s−1)/2 n > 0 is a square-free fundamental discriminant, then
ρs (n) =
∗
L2 (0, ρs ) n 1 B(s−1)/2,χ
· 2(s−1)/2 · (−1)(s −1)/8 ·
2
· 1− .
1 − 2s−1
1 2 2 (s−1)/2 Bs−1
Representation of Numbers as Sums of Squares 233
When n ≡ 5 (mod 4), the above expression has to be modified by the factor
n (s/2)−1 1
=
4n 2s−2
n∗
to account for the fact that the conductor of χ(a) = a is 4n.
For m ≡ 1 (mod 4) and square-free and s = 5, we have, using B4 = −1/30,
that ∗
L2 (0, ρ5 ) n 1
ρ5 (n) = 15 · 1 − · 4 · 30 · B2,χ .
16
2 4
If n ≡ 1 (mod 8), then
5
1
ρ5 (n) = 8
15· 1− · 4 · 30 · B2,χ
16
4
1
n n 2
= 60 · χ(a) a − .
n a=1 2
5 3/2
1
ρ5 (n) = 4
15 · 22 · · 30 · B2,χ
16
4
1
4n
2
= 20 · χ(a) (a − 2n) .
4n a=1
with quadratic forms but rather because the level of the associated diagonal
form is a multiple of 4. In the following the symbol p will always denote an
odd prime. Now, due to the almost multiplicative nature of rs (n), it is suf-
ficient, and this will become clear later on, to deal with the values of rs (p ),
and these values can be expressed as linear combinations of eigenvalues of the
Hecke operators whose structure is well known. More precisely we prove the
following result.
µ
πv − σ πv
rs p −1
= ρs p −1
+ Av · , (6.43)
v=1
πv − σ πv
πv = p(k−1)/2+iφp , σ
πv = p(k−1)/2+iψp , (6.44)
π
and φp and ψp are real and φp + ψp ∈ Z . (6.45)
log p
REMARK 6.11 In the above statement, and in the rest of this chapter,
the notation σ πv is taken to suggest that σ πv is a Galois conjugate of the
algebraic integer πv .
In the case when s is odd we no longer can make the assertion that, “rs (n)
is essentially a linear combination of multiplicative functions.” Nevertheless
we can appeal to the theory of Shimura [102] concerning quadratic forms in
an odd number of variables to get an approximation to Theorem 6.8 above.
More precisely we prove this result.
πv = ps/2−1+iφp , σ
πv = ps/2−1−iφp , (6.47)
As an easy consequence of Theorems 6.8 and 6.9 we get the following im-
provement to Hardy’s result, Theorem 6.1.
COROLLARY 6.1
With the same notation as Theorems 6.8 and 6.9 we have for s = 2k,
rs (p ) = ρs (p ) + O p(s/4−1/2) , (6.49)
REMARK 6.12 Parts (6.43) and (6.44) of Theorem 6.8 were known
before for a few values of s. In fact Hardy and Ramanujan had obtained (in
a different notation)
where πp + σ πp = τ (p). (See Hardy [50], pages 155 and 165.) By a more
delicate analysis than that of Hardy and Ramanujan, Rankin proved ([90],
page 405) that
8 p9 − 1 16 π1 − σ π1 π − σ π
r20 (p−1 ) = · + + 76 · 2 σ 2 ,
31 p9 − 1 93 π1 − π1
σ π2 − π2
where π1 and π2 and their σ-conjugates are complex numbers. Other examples
will be given in Section 6.6.
REMARK 6.14 The basic idea in the proof of Theorem 6.8 consists in
showing, as we have already observed, that the cusp form ϑ(z)s − Es (z) is
a linear combination of eigenfunctions of the Hecke operators associated to
odd prime powers. A comparison of the coefficients in the q-expansions shows
that rs (n) is a linear combination of functions that are multiplicative on the
set of odd integers. The difficulty with the prime 2 arises because we are
essentially dealing with the Hecke group Γ0 (4). Finally, to get the formula of
the theorem we identify the Fourier coefficients of eigenfunctions with p-adic
spherical functions whose structure is well known and in this way we get parts
(6.43) and (6.44). To get part (6.45) we must apply Deligne’s proof of the
Petersson-Ramanujan conjecture ([23], page 302).
REMARK 6.15 In [50], page 158, Section 9.3, Hardy remarked, “In order
that c2s (n) = 0 for all n, it is necessary and sufficient that 2s ≤ 8. In these
cases only r2s (n) is a ‘divisor function,’ represented by the ‘singular series’ of
Section 9.8. The ‘reasons’ for this have been shown in a new light recently
by Siegel, Annals of Math. (2), 36 (1935), 527–606.” Although not entirely
incorrect, his remark is somewhat misleading, as Theorems 6.8 and 6.9 clearly
show.
REMARK 6.16 Theorems 6.8 and 6.9 and a simple application of Kro-
necker’s theorem on diophantine approximation make precise Ramanujan’s
empirical observation ([88], page 159, # 142),
rs (n) − ρs (n) = o(n(s/4−1/2) ).
This shows in fact that Corollary 6.1 is best possible.
REMARK 6.17 The formulas of Theorems 6.8 and 6.9 contain crude
versions of the interesting recursion relations for rs (n) obtained by Newman
([76], page 778).
PROOF No new idea is needed in the proof beyond basic results that can
be found in the literature, though not all in one place. For this reason and for
the sake of the reader, we assemble here the necessary results. When s = 2k,
the generating function is a modular function of type M (Γ0 (4), k, χ), where
ab
Γ0 (4) = ∈ SL(2, Z) : c ≡ 0 (mod 4)
cd
and the multiplier system χ is defined by one of the two quadratic characters
χ : Γ0 (4) −→ T = {z ∈ C : |z| = 1},
Representation of Numbers as Sums of Squares 237
ab
−→ χ(d),
cd
associated with the Gaussian field Q((−1)1/2 ). More precisely,
−1
d if k is odd,
χ(d) = −1 2
d if k is even.
For a more precise definition of the quadratic residue symbol (c/d) see Shimura
[102], page 442. The dependence of χ on k should be kept in mind in the
following since it indicates that ϑ(z)2k has two different multiplier systems
depending on the parity of k. Recall that a modular form f (z) is of type
M (Γ0 (4), k, χ) if f (z) is regular in the upper half plane, has a suitable q-
expansion about each cusp of Γ0 (4) and satisfies
az + b ab
f = χ(d)(cz + d)k f (z), ∈ Γ0 (4).
cz + d cd
The dimension of the space of such functions is finite and can be computed
by means of the Petersson-Riemann-Roch theorem. For example, when k is
even we have
k
dim M (Γ0 (4), k, χ) = + 1,
2
while when k is odd, the dimension of the space of cusp forms is
1
µ= (2k − 5 − χ(P1 ) − χ(P2 ) − χ(P3 )),
4
where P1 , P2 and P3 are three nonequivalent parabolic elements of Γ0 (4).
Hardy’s estimate rs (n)−ρs (n) = O(ns/2 ) indicates that the function f (z) =
ϑ(z)s − Es (z) is a cusp form on Γ0 (4). The theory of Hecke operators acting
on the space of cusp forms of type M 0 (Γ0 (N ), k, χ) is fairly well understood,
having been worked out in great detail by Atkin-Lehner in [4] for the case of
χ trivial and then by Winnie Li [67] for general χ. In the following we shall
make essential use of several results from the paper of Winnie Li. Let us first
recall some basic definitions. Let M − (Γ0 (4), k, χ) denote the space generated
by
1. cusp forms f (z) of type (Γ0 (1), k.χ),
is defined by
∞
(f |Tp )(z) = χ(p)pk−1 a(n/p) + a(np) q n ,
n=1
where a(x) = 0 if x is not an integer. The basic fact that we shall use from
Winnie Li ([67], page 295) is that a new form is uniquely determined by its
eigenvalues. In fact if f (z) is normalized and f |Tp = a(p)f , f |U2 = a(2)f ,
where U2 is T2 truncated so that only the terms with a(2n)q n in the series
appear, then
∞
f (z) = a(n)q n .
n=1
Now the space of old forms is left invariant under the action of the Hecke
operators and so a basis can be selected consisting of eigenfunctions. Nev-
ertheless, these are no longer uniquely determined by their eigenvalues. Let
f1 , . . . , fα be a basis of M + (Γ0 (1), k, χ) consisting of normalized new forms,
g1 , . . . , gβ be a basis of M + (Γ0 (2), k, χ) consisting of normalized new forms
and h1 , . . . , hγ be a basis of M + (Γ0 (4), k, χ) consisting of normalized new
forms. Now any cusp form in M 0 (Γ0 (4), k, χ) is a linear combination of the
above basis elements and the translates g1 |B2 , . . . , gβ |B2 ([67], page 295). In
particular we have the linear combination
α
β
γ
ϑ(z) 2k
− E2k (z) = av fz (z) + (bv gv (z) + bv gv |B2 (z)) + cv hv (z).
v=1 v=1 v=1
Representation of Numbers as Sums of Squares 239
where each av (n) is a function which is multiplicative on the set of odd in-
tegers, that is, av (mn) = av (m)av (n) when m and n are odd and relatively
prime. In particular we have
µ
r2k (p ) = ρ2k (p ) + Av av (p ).
v=1
At this point we could invoke the theory of p-adic spherical functions which
gives the structure of the functions av (p ). We prefer to derive our results by
elementary considerations. By Theorem 3 of Winnie Li ([67], page 295) we
have for each av (p ) the “local factor” identity
∞ ∞
T w w+1 πv − σ πv
= av (p )T = T ,
1 − av (p)T + χ(p)pk−1 T 2 w=0 πv − σ πv
=1
PROOF The situation here is more complicated than what we have con-
sidered above, and the corresponding results are less complete. The proof of
240 Sums of Squares of Integers
the theorem relies heavily on the theory of Shimura concerning modular forms
of half integral weight [102]. The definitions and results we now describe are
taken almost entirely from Shimura’s paper.
Let GL+ 2 (R) be the group of all real 2×2 matrices with positive determinant,
let T = {z ∈ C : |z| = 1} denote the circle group, andlet H be the upper half
ab
plane. As usual the action of an element α = on an element z ∈ H
cd
is denoted by α(z) = (az + b)/(cz + d). We let G̃ 0 be
the group consisting
ab
of all pairs (α, φ(z)) formed by an element α = of GL+ 2 (R) and a
cd
holomorphic function φ(z) on H, such that
We let
P : G̃0 −→ GL+
2 (R)
We recall that if ϑ(z) is the theta function and γ ∈ Γ0 (4), then the Mordell
automorphy factor ([73], page 363),
ϑ(γ(z))
j(γ, z) =
ϑ(z)
satisfies the relation
c
ab
j ,z = ε−1
d (cz + d)1/2 ,
cd d
then
∞
f |Tκ,χ
N
(p2 ) (z) = b(n)q n ,
n=0
where
2 n λ−1 2 κ−2 n
b(n) = a(p n) + χ1 (p) p a(n) + χ(p )p a ,
p p2
with λ = (κ − 1)/2, and χ1 is a character modulo N defined by χ1 (n) =
χ(n)(−1/n)λ . Here we understand that a(x) = 0 if x is not an integer and
χ1 (p)(n/p) = 0 if p = 2.
N
We recall that the algebra generated by the operators Tκ,χ (p2 ) for all primes
p is commutative and therefore we can find µ (= dim Gκ (N, χ)) basis elements
f1 , . . . , fµ of Gκ (N, χ), such that
fv |Tκ,χ
N
(p2 ) = ωv (p)fv , v = 1, . . . , µ
for all rational primes p, that is, a basis consisting of common eigenfunctions
N
of the Tκ,χ (p2 ) for all primes p. It is shown in ([102], Remark on page 457)
that ϑ(z) is an element of G1 (4, χ0 ), with χ0 trivial. One can then easily see
that if s is odd, ϑ(z)s is an element of Gs (4, χ0 ), and furthermore Hardy’s
estimate rs (n) − ρs (n) = O(ns/4 ) implies that ϑ(z)s − Es (z) is an element in
Ss (4, χ0 ). We can therefore write it as a linear combination of eigenfunctions
4
for the Hecke operators Ts,χ 0
(p2 ), that is,
µ
ϑ(z)s − Es (z) = Cv fv (z), µ = dim Ss (4, χ0 ). (6.51)
v=1
where the product is taken over all primes p, and λ and χ1 are defined as in
the definition of the Hecke operators given above.
∞
! ∞
"! ∞
"
−s λ−1−s 2 −s
At (n)n = χt (m)m a(tm )m .
n=1 m=1 m=1
N
Suppose that f is a common eigenfunction of Tκ,χ (p2 ) for all prime factors p of
N not dividing the conductor of χt . Then Ft belongs to M (Γ0 (Nt ), κ − 1, χ2 )
for a certain positive integer Nt . Moreover, if κ ≥ 5, then Ft is a cusp form.
Furthermore, suppose that f |Tκ,χ
N
(p2 ) = ω(p)f and define a function F on H
by
∞
F (z) = An q n ,
n=1
∞
−1
An n−s = 1 − ω(p)p−s + χ(p)2 pκ−2−2s ,
n=1 p
where the product is taken over all primes p. Let N0 be the greatest common
divisor of the integers Nt for all square-free t, such that a(t) = 0. Then F
belongs to M (Γ0 (N0 ), κ−1, χ2) if κ ≥ 3 and to M 0 (Γ0 (N0 ), κ−1, χ2 ) if κ ≥ 5.
244 Sums of Squares of Integers
We now observe that Theorem 6.9 is true for s ≤ 7 by the results of Hardy
in the following we assume s ≥ 9. Let us then consider a cusp form
[48]; so %
∞
f (z) = n=1 a(n)q n in Gs (Γ0 (4), χ0 ) which is an eigenfunction of the Hecke
4
operators Ts,χ 0
(p2 ) and suppose f |Ts,χ
4
0
(p2 ) = ω(p)f . Then from the identity
∞
t
a(tn2 )n−s = a(t) 1 − χ1 (p) pλ−1−s ×
n=1 p
p
−1
1 − ω(p)p−s + χ0 (p)pλ−2−2s
∞
a(t)
= An n−s ,
L(s + 1 − λ, χt ) n=1
then
π +1 − σ π +1 π − σ π
2
a(tp ) = a(t) − p λ−1
χ t (p) · .
π − σπ π − σπ
Again by Deligne’s proof of the Ramanujan-Petersson conjecture ([23], page
302) we have that
π = p(λ−2)/2+iφp with φp real.
If t is a positive square-free odd integer, then we can apply the above consid-
erations to obtain that
µ
πv+1 − σ πv+1 πv − σ πv (s−3)/2
rs (tp2 ) = ρs (tp2 )+ Cv av (t) − · p χ t (p) ,
v=1
πv − σ πv πv − σ πv
where
πv = ps/2−1+iφp with φp real.
Even though the main subject of this chapter is the study of the coefficients
of modular forms on Γ such as an (Q) above, we give below an example of a
modular form on Γ0 (11) arising from a quadratic form in four variables and
of discriminant different from 1.
Let Q be defined by
n = Q(x1 , x2 , x3 , x4 ).
246 Sums of Squares of Integers
The functions Eχ0 (z) and f (z) are respectively an Eisenstein series and a cusp
form of weight 2 associated with the subgroup Γ0 (11). The Dirichlet series
associated with Eχ0 (z) has the relatively simple form
The Dirichlet series associated with the cusp form f (z) has the Euler product
∞
φf (s) = an n−s = (1 − 11−s ) (1 − ap p−s + p1−2s )−1 .
n=1 p 11
y 2 − y = x3 − x2 .
12 p − 1 18 π − σ π
ap−1 (Q) = · + · ,
5 p−1 5 π − σπ
y 2 − y = x3 − x2
over the finite field Fp of p elements. This shows how a reciprocity law can
be used to completely determine the arithmetical function ap (Q).
A similar decomposition for the space M2 (Γ0 (N )) can also be given in the
cases
N = 14, 15, 17, 19, 20, 21, 24, 27, 32, 36 and 49.
(See Shimura [100], page 182.) Andrianov [3] has given a detailed exposition
of the cases N = 27, 32, 36, and M4 (Γ0 (9)), where the relevant elliptic curves
have complex multiplication, thus making it possible to express the traces of
Frobenius in terms of Hecke grössen characters.
π − σ π
r12 (p−1 ) = ρ12 (p−1 ) + 16 · ,
π − σπ
where
1 − a(p)T + p5 T 2 = (1 − πT )(1 − σ πT )
and
∞
4
a(n)q n = q 1/4 − 3q 9/4 + 5q 25/4 − 7q 49/4 + . . .
n=1
is an eigenform of weight 6.
These examples were suggested by computations in Mordell ([74], pages 100
and 103). Other examples can be derived from the tables of Glaisher ([38],
page 480):
1456 π − σ π
r14 (p−1 ) = ρ14 (p−1 ) + · ,
61 π − σπ
512 π − σ π
r16 (p−1 ) = ρ16 (p−1 ) + · .
17 π − σπ
248 Sums of Squares of Integers
In the last two examples the π can also be given as in the early cases, but
the generating functions do not throw any light on their meaning.
In the odd cases we have the example of s = 9:
+1 σ +1
64 π − π 3 π − π
σ
2 2
r9 (p ) = ρ9 (p ) + · −p · ,
17 π − σπ π − σπ
where
1 − (a(p2 ) + p3 )T + p7 T 2 = (1 − πT )(1 − σ πT )
and ∞
η(2z)12
f (z) = a(n)q n =
n=1
ϑ(z)3
is the unique cusp form of weight 9/2 with respect to ∆0 (4) ([102], page 477).
In this example η(z) is the distinguished 24-th root of the Ramanujan modular
form ∆(z). Examples similar to this one can be given for s = 11, 13, and 15.
In all of the above examples we have assumed that p is an odd prime. We
leave open the problem of how to modify Theorems 6.8 and 6.9 so as to include
the case p = 2.
In the main theorems and in the examples we have written
π = p(s−2)/4+iφp , π = p(s−2)/2+iφp ,
in the even and odd cases, respectively, but we have said nothing
% about the
nature of the angles φp . If we decompose ϑ(z)s − Es (z) = Av fv (z) as
a linear combinations of eigenfunctions fv (z) of the Hecke operators, then
it may happen that all the fv ’s have “complex multiplication” and in this
case the angles φp have a regular distribution with respect to the standard
Lebesgue measure dϑ on the unit circle. In fact the numbers πv can be given an
arithmetical interpretation with the framework of abelian class field theory.
This seems to happen in the case of 10, 12 and 16 squares, and it would
appear that the cases s = 9, 11 and 13 also fall under this category. The
Ramanujan example for s = 24 is clearly one in which the associated modular
form ϑ(z)24 −E24 (z) does not have “complex multiplication.” Actually, in this
case one expects that the angles φp are distributed according to the Sato-Tate
measure (2/π) sin2 φ dφ. In still other cases it may turn out that both types
of modular forms are present. A possible example is ϑ(z)18 − E18 (z).
relation, even if the discussion were confined only to historical aspects, would
take us too far afield and would probably require a separate monograph in
its own right. It is our intention here to discuss a few important examples to
help the reader amplify his awareness of the deep relation between Liouville’s
elementary methods and the original ideas of Jacobi, as first exposed in his
Fundamenta Nova. We encourage the reader to consult the classical sources
including some of the articles by Jacobi. Uspensky [112] refers to earlier
papers in which he explains the connection between Liouville’s elementary
methods and elliptic functions. An excellent modern source is Chapter 4 of
Fine’s monograph [31]. This book contains some of the most beautiful results
of the theory.
Example 6.3
The principal character χ0 . The uniqueness of the theta function
X
∞
2
ϑ(z) = eπin z
n=−∞
Example 6.4
The Gauss character χg . The uniqueness of Jacobi’s identity
Y
∞ X
∞
(1 − q n )3 = (−1)m (2m + 1)q m(m+1)/2 ,
n=1 m=0
X
∞
2
η(z)3 = χg (n)n · q n /8
, q = e2πiz ,
n=1
Example 6.5
The product character χge . The uniqueness of Euler’s identity
Y
∞ X
∞
(1 − q n ) = (−1)n q n(3n+1)/2 , q = e2πiz ,
n=1 n=−∞
X
∞
2
η(z) = χge (n)q n /24
, (6.52)
n=1
Representation of Numbers as Sums of Squares 251
also follows, using the same line of argument as in Hecke [53], from the integral
representation for the Euler product
Y 1
Λ(s, χge ) = π −s Γ(s)(12)s
p
1 − χ ge (p)p
−2s
Z ∞“ ” dy
= y s + y (1/2)−s η(iy) .
1 y
Example 6.6
The Eisenstein character χe . Finally, we note that this character is associated
with the modular form
η(z)4 X∞
2
= χe (n) · n · q n /6 .
ϑ(z) n=1
Z ∞“ ” η(iy)4 dy
= y s + y (3/2)−s · · .
1 ϑ(iy) y
REMARK 6.20 For a deeper study of the modular forms η(z), η(z)3 ,
4
ϑ(z), η(z) /ϑ(z), and other related theta functions, the reader can consult
the encyclopedic monograph of Petersson [85].
with k ≥ 0 and h ≥ 1.
∞
η(z) = q 1/24 (1 − q n ) , q = e2πiz ,
n=1
and denoting by η (z) the derivative of η(z) with respect to z, we obtain this
formula for the logarithmic derivative
∞
η (z) 1 qn
= 2πi − n·
η(z) 24 n=1 1 − qn
∞
1
= 2πi − σ(n)q n
.
24 n=1
Multiplying both sides by η(z), dividing by 2πi and using the formula (6.52)
for η(z), we find
∞
∞
∞
n2 n2 /24 1 2
χge (n) · ·q = − σ(m)q m χge (h)q h /24
.
n=1
24 24 m=1
h=1
Multiply the two q-series on the right side formally and equate the coefficients
of equal powers of q on both sides to obtain the equation claimed in the
theorem statement.
Another example that can be proved along the same lines is the following
identity due to Gauss:
∞
η(2z)2 2
= q (n+(1/2)) /2 .
η(z) n=0
In its most elementary form this identity reflects the fact that the following
two partition problems have the same answer.
Problem 1. In how many ways may a given positive integer N be expressed
as a monotonically decreasing series of odd positive integers?
Representation of Numbers as Sums of Squares 253
Example 6.7
7 = 1+1+1+1+1+1+1
= 3+1+1+1+1
= 3+3+1
= 5+1+1
=7
Example 6.8
7 = 1+2+2+2
= 1+2+4
= 1+6
= 3+2+2
= 3+4
If M denotes the answer to both problems, then its values are given in this
table:
N 1 2 3 4 5 6 7 8 9 10 11 12 13 ...
M 1 1 2 2 3 4 5 6 8 10 12 14 17 ...
This example is taken from the lucid article [110] by Jacques Tits on the
nature of symmetry. There it is revealed that neither an understanding of the
significance of the Gauss identity nor knowledge of the logic of an elementary
proof of the dual nature of Problem 1 and Problem 2 are sufficient to penetrate
the deep symmetries embodied in this identity, a fact that requires that it be
viewed as an analytic expression for the “trace” of an infinite dimensional
representation of a Kac-Moody Lie algebra of type A1 , a subject with many
connections in common with the theory of modular forms presented in this
book.
ϑ1 = πϑ2 · ϑ3 · ϑ4 ,
where
∞
2
ϑ1 (v, z) = −i (−1)n q (n+(1/2)) e(2n+1)πiv ,
n=−∞
∞
(n+(1/2))2 (2n+1)πiv
ϑ2 (v, z) = q e ,
n=−∞
∞
2
ϑ3 (v, z) = q n e2πinv ,
n=−∞
∞
2
ϑ4 (v, z) = (−1)n q n e2πinv ,
n=−∞
and
d +
ϑ1 = ϑ1 (v, z)+v=0 , ϑλ = ϑλ (0, z) for λ = 2, 3, 4.
dv
Jacobi’s alternative proof [57] of his identity was closer to the elementary
ideas already used by Euler and Gauss in proofs of similar results.
The starting point is, mirabile dictu, the identity we wish to prove:
∞
3 ∞
n2 /24 2
χge (n)q = χg (m)m · q m /8
.
n=1 m=1
2 2
By equating each coefficient of q 3m +n to zero, we obtain that for all integers
N ≥ 1, we have
' (
χge (n)χg (m)m m2 − n2 = 0, (6.54)
m≥1 n≥1
Representation of Numbers as Sums of Squares 255
N = 3m2 + n2
Now when n ≡ 1 (mod 6) we have χge (n) = (−1)(n−1)/2 . Also, for odd m we
have χg (m) = (−1)(m−1)/2 . Therefore, Equation (6.54) is equivalent to
' (
(−1)(m−n)/2 m m2 − n2 = 0, (6.55)
m n
with m and n running through all integers satisfying N = 3m2 +n2 , m odd, m
positive and n ≡ 1 (mod 6). Note that n is allowed to be positive or negative.
Example 6.9
If N = 52, the solutions to N = 3m2 + n2 satisfying these conditions are
(m, n) = (1, 7) and (3, −5), so Equation (6.55) becomes
Jacobi then notes that since all the steps are reversible, a proof of (6.55) for
all N will result in the proof of the original q-series identity (6.53). Jacobi’s
goal is then to prove (6.55) by elementary means. From the modern point of
view his key idea is to use the fact that the integers of the form N = 3m2 + n2
are precisely the norms of the numbers in the ring
√ def √
Z[ −3] = {A + B −3 : A, B ∈ Z}.
√
Since Z[ −3] = Z + 2 · Z[ρ], where ρ is a cube root of unity, Jacobi’s idea is
to use the hexagonal symmetry of the ring Z[ρ] of Eisenstein integers in order
to obtain a partition of the solutions of the equation N = 3m2 + n2 into two
equinumerous families {(m, n)} and {(m , n )}, such that
m(−1)(m−n)/2 m2 − n2 = −m (−1)(m −n )/2 m − n .
2 2
It is not unlikely that these ideas of Jacobi, together with the elementary
combinatorial arguments of Euler and Gauss concerning the partition func-
tion, served as catalytic elements in Liouville’s development of his methods.
256 Sums of Squares of Integers
It is nevertheless interesting to note that Liouville does not once mention the
name of Jacobi in any of his first twelve essays regarding the proof of the
fundamental identities.
In this section we show how to derive a Liouville-type identity from an
elliptic modular form identity.
Formal manipulation of the Jacobi triple product identity,
∞ ∞
' −1 2n−1
( 2
(1 − q )(1 − zq
2n 2n−1
)(1 − z q ) = z n · qn ,
n=1 n=−∞
with q = e2πiz , provides the Fourier expansion of the function ϑ1 (0, z)/ϑ1 (u, z):
(1 − q n )2 2N
= 1 − sin u q N
sin − d u,
1 − 2q n cos 2u + q 2n d
n≥1 N ≥1 d|N
where we have put z = eiu and u is a real variable. This can also be written
⎧ ⎫−1
∞ ⎨ sin(2n + 1)u 2
⎬
n 3
(1 − q ) · (−1)n · q (n +n)/2 =
⎩ sin u ⎭
n=1 n≥0
2N
1 − 4 sin u qN sin − d u.
d
N ≥1 d|N
n=1
sin u
(For a detailed derivation, starting from Jacobi’s triple product formula, the
reader may consult the book by Fine [31].)
By using cross-multiplication we can rewrite the last identity in the form
∞
2
χg (n) {sin nu − n sin u} q n
n=1
∞
∞
m2 2N
= 4 sin u χg (m) sin mu · q · q 8N
sin −d u
m=1
d
N =1 d|N
∞
= q M · CM (u),
M=1
Representation of Numbers as Sums of Squares 257
where
2N
CM (u) = 4 χg (m) sin u · sin mu · sin − d u,
d
m, d
We thus have
∞
2
χg (n)q n {sin nu − n sin u} = q M · CM (u).
n=1 M≥1
M≡1 (mod 8)
(M) (M)
Since A−j = Aj for all M and j, we may in fact replace sin ju by an
arbitrary odd function of the integer variable j. We thus have proved the
following Liouville-type result.
m2 + 8dδ = M,
258 Sums of Squares of Integers
where m and d are odd positive integers and δ is a positive integer. Then
0 if M is not the square of an integer
SM =
χg (r){f (r) − rf (1)} if M = r2 .
6.8 Exercises
1. By direct calculation of the first 100 terms in the expression ϑ(q)5 =
{1+2q +2q 4 +2q 9 +2q 16 +2q 25 +2q 36 +2q 49 +2q 64 +2q 81 +2q 100 +· · · }5 ,
verify the following table of values of r5 (10a + b).
b
a 0 1 2 3 4 5 6 7 8 9
0 1 10 40 80 90 112 240 320 200 250
1 560 560 400 560 800 960 730 480 1240 1520
2 752 1120 1840 1600 1200 1210 2000 2240 1600 1680
3 2720 3200 1480 1440 3680 3040 2250 2800 3280 4160
4 2800 1920 4320 5040 2800 3472 5920 4480 2960 3370
5 5240 6240 3760 3920 6720 7360 4000 3360 7920 6800
6 4800 6160 6720 8000 5850 3840 8960 9840 4320 6720
7 10720 9280 6200 5280 9840 10480 7600 6720 11040 13440
8 5872 6730 12960 10320 7520 10080 12400 12480 9200 6240
9 14000 16480 8000 10080 16960 13760 8880 8160 13480 17360
3. Let q be a prime ≡ 3 (mod 4). Use Theorem 6.4 and the class number
formula
q−1
√ a a
h(−q) = class number of Q( −q) = −
a=1
q q
Representation of Numbers as Sums of Squares 259
(This exercise and the next one use the convention that σi (x) = 0 if x
is not a positive integer.)
8. Show that if odd(m) means the largest odd divisor of the positive integer
m, then for all integers n ≥ 1 we have
32
r16 (n) = σ7 (n) − 2σ7 (n/2) + 256 σ7 (n/4)
17
+ (−1)n−1 16 23 ord2 (n) σ3 (odd(n))
n−1
+ 16 (−1)j j 3 23 ord2 (n−jk) σ3 (odd(n − jk)) .
j=1 k≥1
n−jk>0
def
9. Use Equation (6.4) to prove that f8 (n) = r8 (n)/r8 (1) is a multiplicative
function.
Chapter 7
Arithmetic Progressions
Except for the last section, this chapter has little to do with sums of squares
of integers. It proves a famous theorem of Szemerédi that says that every
subset of the positive integers with positive (upper) asymptotic density con-
tains arbitrarily long arithmetic progressions. Three squares x2 , y 2 , z 2 form
an arithmetic progression if and only if y 2 − x2 = z 2 − y 2 , that is, if and
only if x2 + z 2 = 2y 2 . Therefore, an arithmetic progression of three squares is
equivalent to the representation of twice a square as the sum of two squares.
Exercises at the end of this chapter make additional connections between
arithmetic progressions and sums of squares. A further connection between
this chapter and the rest of the book is a proof technique (in Section 7.3)
which was also used in Section 6.2.
7.1 Introduction
In this chapter the notation |A| means the cardinality of A if A is a set and
the absolute value of A if A is a real number. The meaning will always be
clear from the context.
We repeat Definition 3.5 from Chapter 3. For integers a and b, let [a, b] =
{n ∈ Z : a ≤ n ≤ b}. Likewise, let [a, b) = {n ∈ Z : a ≤ n < b}.
Define (a, b] and (a, b) similarly. For a set A of integers and a positive integer
n, let A(n) = |A ∩ [1, n]|, the number of positive integers ≤ n in A. The
upper and lower asymptotic densities of A are d(A) = lim supn→∞ A(n)/n
and d(A) = lim inf n→∞ A(n)/n. Thus, 0 ≤ d(A) ≤ d(A) ≤ 1. If d(A) = d(A),
we say that A has (asymptotic) density d(A) = d(A).
Example 7.1
The odd integers have asymptotic density 1/2, as do the even integers. Both
the set of primes and the set of squares have asymptotic density 0. Exercise
19 of Chapter 2 shows that the asymptotic density of the set A of integers
representable as a sum of three squares is 5/6, since A(n) = n − N (n).
261
262 Sums of Squares of Integers
Example 7.2
If = 5, then (3, 5, 5, 0) and (3, 5, 5, 3) are 5-equivalent, while (4, 5, 5, 3) and
(0, 5, 5, 3) are not 5-equivalent,
For any positive integer r there exists a positive integer N (, m, r),
such that for any function f : [1, N (, m, r)] −→ [1, r] there ex-
ist positive integers a, d1 , . . . , dm , such that f (a + mi=1 xi di ) is
constant on each -equivalence class of [0, ]m .
We now give the proof of Graham and Rothschild [40] for Theorem 7.1.
Note that Statements (b), (c) and (d) give an inductive proof that S(, m)
is true for all positive integers and m.
We now prove the four statements.
PROOF The 1-equivalence classes of [0, 1] are {(0)} and {(1)}. Any
function is constant on a singleton set.
(c) For all positive integers and m, S(, m) implies S(, m + 1).
PROOF Assume S(, m). For fixed r, let M = N (, m, r), M = N (, 1, rM )
and N (, m + 1, r) = M M . Suppose f : [1, M M ] −→ [1, r] is given. Define
g : [1, M ] −→ [1, rM ] by
M−1
g(k) = 1 + (f (kM − j) − 1)rj ,
j=0
(d) For each positive integer , if S(, m) holds for every positive integer m,
then S( + 1, 1) holds.
PROOF Let N ( + 1, 1, r) = 2N (, r, r). For fixed r, let f : [1, 2N (, r, r)]
−→ [1, r] be given. By definition of N (, r, r) there exist positive integers
a, d1 , . . . , dr , such that
r
a+ xi di ∈ [1, N (, r, r)]
i=1
Arithmetic Progressions 265
r
for xi ∈ [0, ] and f a + i=1 xi di is constant on -equivalence classes of
[0, ]r . By the box principle there exist integers 0 ≤ u < v ≤ r, such that
u v
f a+ di = f a + di . (7.1)
i=1 i=1
Therefore
u
v
h(x) = f a+ di +x di
i=1 i=u+1
is constant for x ∈ [0, ]: We have h(x) = h(0) for x ∈ [0, − 1] because
is one equivalence class, and h() = h(0) by Equation (7.1). This proves
S( + 1, 1).
COROLLARY 7.1
If the set of all positive integers is partitioned into a finite number of sets,
then (at least) one set contains arbitrarily long arithmetic progressions.
PROOF Suppose the corollary were false. Then there would exist a
positive integer r and a partition of the positive integers into disjoint sets
sets A1 , . . . , Ar , such that there is a uniform bound, say − 1, on the length
of the arithmetic progressions contained in any Ai . Let n = n0 (, r) and
Bi = Ai ∩ [1, n]. By Theorem 7.1, some Bi contains an -progression. But
this progression is too long to be contained in Ai and this contradiction proves
the corollary.
Clearly C1 and C2 are true, and Ck+1 implies Ck for any positive integer k.
266 Sums of Squares of Integers
The standard technique for proving Statement Ck for k ≥ 3 uses the func-
tion rk (n).
DEFINITION 7.2 In this chapter only, rk (n) denotes the size of the
largest subset of [1, n] that contains no k-progression.
PROPOSITION 7.1
If f (n) is a nonnegative real-valued function defined on positive integers n
and satisfying f (m+n) ≤ f (m)+f (n), then limn→∞ f (n)/n exists and equals
inf n≥1 f (n)/n.
PROOF Since f (n) ≤ nf (1), which is immediate from the triangle in-
equality, we see that f (n)/n is bounded. As f (0) does not appear in the
statement of the proposition, we may define f (0) = 0. Let m and n be pos-
itive integers. Then 0 ≤ m − nm/n < n. Using f (m + n) ≤ f (m) + f (n)
and induction on m we find
m m
f (m) ≤ f (n) + f m − n
n n
m
≤ f (n) + nf (1).
n
Arithmetic Progressions 267
Therefore,
f (m) f (n) n
≤ + f (1).
m n m
Letting m go to infinity we find
f (m) f (n)
lim sup ≤ (7.2)
m→∞ m n
f (m) f (n)
lim sup ≤ lim inf .
m→∞ m n→∞ n
Hence the limit exists, and by Inequality (7.2), it equals inf n≥1 f (n)/n.
r
= the number of solutions of j=1 aj = 0 with aj ∈ Aj .
and
r3 (m)
N
S ∗ (α) = e(nα).
m n=1
Let E(α) = S ∗ (α) − S(α). We will show that E(α) is small compared to N .
Let dn = r3 (m)/m − k(n) so that |dn | ≤ 1 and E(α) = N n=1 dn e(nα). Let
m−1
F (α) = ν=0 e(−να), so that F (0) = m. Let q be a positive integer and for
n = 1, 2, . . ., N − mq define
m−1
σ(n) = σ(m, q, n) = dn+νq .
ν=0
LEMMA 7.1
N −mq
F (qα)E(α) = n=1 σ(n)e(nα) + R(α), where |R(α)| ≤ 2m2 q. In particu-
N −mq
lar, with α = 0, we have mE(0) = n=1 σ(n) + R, where |R| ≤ 2m2 q.
PROOF We have
m−1
N
F (qα)E(α) = e(−νqα) dn e(nα)
ν=0 n=1
m−1 N
= dn e((−νq + n)α)
ν=0 n=1
N
m−1 −νq
= dn+νq e(nα)
ν=0 n=1−νq
−mq
N
m−1
= dn+νq e(nα) + R(α)
n=1 ν=0
−mq
N
= σ(n)e(nα) + R(α),
n=1
LEMMA 7.2
For every positive integer n, σ(n) ≥ 0.
PROOF By definition,
r3 (m)
m−1
σ(n) = − k(n + νq)
ν=0
m
m−1
= r3 (m) − k(n + νq)
ν=0
= r3 (m) − |{n + νq : ν ∈ [0, m − 1], n + νq ∈ A}| ≥ 0
because the set is a subset of an m-progression and contains no 3-progression,
and can have no more than r3 (m) elements.
LEMMA 7.3
Let α be a real number and M be a positive integer. Then there exist integers
p and q, such that |p − qα| < 1/M and 1 ≤ q ≤ M .
LEMMA 7.4
For every real number α there is a positive integer q, such that 1 ≤ q ≤ 2m
and |F (qα)| ≥ 2m/π.
PROOF Apply Lemma 7.3 with α = α and M = 2m. There exist integers
p and q, such that 1 ≤ q ≤ 2m and |p − qα| < 1/(2m). Let β = p − αq. Then
|mβ| < 1/2 and
m−1
1 − e(mβ)
|F (qα)| = |F (−β)| = e(νβ) =
ν=0
1 − e(β)
sin πmβ
= = m · sin πmβ · πβ ≥ m · 2 · 1.
sin πβ πmβ sin πβ π
270 Sums of Squares of Integers
LEMMA 7.5
For all real α we have
π r3 (m) r
|E(α)| < N − + 4πm2 .
2 m N
r3 (m) r2 τ τ 3N 2
I∗ ≥ · ≥ (τ N )2 = .
m 2 2 2
We may assume that N is so large that r/N < 2τ . Then
τ 3N 2 τ 3N 2
|I ∗ − I| ≥ −r > − 2τ N.
2 2
Arithmetic Progressions 271
REMARK 7.2 A more careful analysis of the proof shows that r3 (n) <
cn/ log log n for some constant c > 0.
LEMMA 7.6
If is a positive integer and 0 < δ < 1, then there is a number p(δ, ), such
that if n ≥ p(δ, ), C ⊆ [0, n) and |C| ≥ δn, then there is a set S ⊆ C of the
form
S = {y} + {0, x1 } + · · · + {0, x },
By the box principle there exists a positive integer x1 , such that D(x1 ) ≥
δ 2 n/4. Let C1 be the set of all a ∈ C for which a + x1 ∈ C. It follows that
C1 + {0, x1 } ⊆ C and |C1 | = D(x1 ) ≥ δ 2 n/4 ≥ 2 by Inequality (7.3) with
k = 1.
Repeat this process with C replaced by C1 . We find C2 and x2 , such that C2
is the set of all a ∈ C1 for which a + x2 ∈ C1 . We have C2 + {0, x2} ⊆ C1 and
2
|C2 | ≥ 4(δ 2 /4)2 n ≥ 2 by Inequality (7.3) with k = 2. Continue this process.
Finally we reach C and x , such that C + {0, x } ⊆ C−1 and |C | ≥ 2. Let
y ∈ C . Then {y} + {0, x } ⊆ C−1 and S ⊆ C.
DEFINITION 7.3 For ε > 0 let n3 (ε) be the least positive integer,
such that n ≥ n3 (ε) implies r3 (n)/n < τ3 + ε. This number exists because
limn→∞ r3 (n)/n = τ3 .
We have
S0 ⊆ S 1 ⊆ · · · ⊆ S ⊆ C
D0 ⊆ D1 ⊆ · · · ⊆ D ⊆ [0, n).
Now |S0 | = 1, so |D0 | = |B| ≥ τ n/8. Thus, for i ∈ [0, ] we have τ n/8 ≤
|Di | ≤ n. By the box
principle there is a j ∈ [1, ], such that |Dj | − |Dj−1 | ≤
n/. Write Dj−1 = rσ=1 Pσ , where the Pσ are disjoint arithmetic progressions
with common difference 2xj and with maximum length in Dj−1 . Since Pσ is
maximal we have Pσ + {2xj } ⊆ Dj−1 . But of course Pσ + {2xj } ⊆ Dj
by Equation (7.4). Hence to each Pσ corresponds a unique element of Dj
not in Dj−1 . Therefore, r = |Dj | − |Dj−1 | ≤ n/. Let Pσ be Pσ with m
elements
r dropped from each end. (If |Pσ | ≤ 2m, then Pσ is the null set.) Let
E = σ=1 Pσ ⊆ Dj−1 . Then
|E| ≥ |Dj−1 | − 2mr ≥ τ n/8 − 2mn/ = (τ /8 − 2m/)n ≥ τ n/16
t
by our choice of . By the construction of E we can write [0, n)−E = σ=1 Qσ ,
where the Qσ are disjoint arithmetic progressions with difference 2xj , such
that |Qσ | ≥ m. Therefore,
t
|E ∩ A| = |A| − |A ∩ Qσ |
σ=1
t
≥ τn − (τ + ε)|Qσ | (since m = n3 (ε))
σ=1
= τ n − (τ + ε)(n − |E|)
2
τ τ 2n
≥ τ |E| − εn ≥ −ε n≥
16 32
since ε < τ 2 /32. Hence |E ∩ A| > 0 and there exists an integer d ∈ E ∩ A.
Therefore d ∈ Dj−1 , so d = 2c−b for some b ∈ B and c ∈ Sj−1 ⊆ C. Thus, the
integers b, c, d ∈ A form an arithmetic progression, and this result contradicts
the choice of A. Therefore, τ = τ3 = 0.
REMARK 7.3 Some of the ideas of this proof will appear in the proof (in
the following sections) that all τk = 0. In particular, the set S is an example
of a configuration of order , which we will define in Section 7.6.
Note that 0 ≤ β(X, Y ) ≤ 1. Note also that kI (u) is the set of all vertices
directly connected to u by an edge in I.
The main result of this section is Theorem 7.4, which says roughly that any
large bipartite graph can be decomposed into nearly regular subgraphs. We
also prove several lemmas in this section.
LEMMA 7.7
For all real numbers ε1 , ε2 and δ strictly between 0 and 1, and for all nonempty
bipartite graphs I, there exist subsets X ⊆ A, Y ⊆ B and a positive integer
r, such that:
(a1 ) r ≤ 1/δ,
(b1 ) |X| > εr1 |A| and |Y | > εr2 |B|, and
(c1 ) for all S ⊆ X and T ⊆ Y with |S| > ε1 |X| and |T | > ε2 |Y |, we have
β(S, T ) > β(X, Y ) − δ.
Also,
Thus t ≤ 1/δ. If for all s ≤ 1/δ, Xs and Ys are defined and do not satisfy (c1 ),
then Xs+1 and Ys+1 are also defined, and this contradicts t ≤ 1/δ. Hence, for
some r ≤ 1/δ, X = Xr and Y = Yr are defined and satisfy (c1 ).
LEMMA 7.8
For all real numbers ε1 , ε2 and δ strictly between 0 and 1, and with ε2 < 1/2,
there exist positive integers M and N , such that for all bipartite graphs I
with |A| = m > M and |B| = n > M , there exist subsets X ⊆ A, Y ⊆ B and
a positive integer r, such that
(a2 ) r ≤ 1/δ,
(b2 ) |X| > (1/2)εr1 |A| and |Y | > εr2 |B|,
(c2 ) for all S ⊆ X and T ⊆ Y with |S| > 2ε1 |X| and |T | > ε2 |Y |, we have
β(S, T ) > β(X, Y ) − δ, and
(d2 ) for all x ∈ X, |k(x) ∩ Y | ≤ (β(X, Y ) + 2δ)|Y |.
We claim |Z| ≤ |X|/2. If not, then there exists Z ⊆ Z, such that |X|/2 <
|Z | ≤ |X|/2 + 1. Now
k(X, Y ) = k(Z , Y ) + k(X − Z , Y ),
so that β = αβ(Z , Y ) + (1 − α)β(X − Z , Y ), where α = |Z |/|X|. By
definition of Z, k(Z , Y ) > (β + δ)|Z | |Y |, and so β(Z , Y ) > β + δ. Also,
since |Z | ≤ (1/2)|X| + 1, we have |X − Z | ≥ (1/2)|X| − 1 > ε1 |X| provided
((1/2) − ε1 )|X| > 1, that is, provided |X| > ((1/2) − ε1 )−1 . This condition
−1
. Hence |X − Z | > ε1 |X| and, by (c1 ),
1/δ
holds when M > ε1 ((1/2) − ε1 )
we have β(X − Z , Y ) > β − δ. Hence β ≥ α(β + δ) + (1 − α)(β − δ) =
β + (2α − 1)δ. But by hypothesis, α = |Z |/|X| > 1/2, so β > β, which is
impossible. Hence |Z| ≤ |X|/2, as claimed.
Let X = X − Z and Y = Y . Thus |X| > (1/2)εr1 |A| and (a2 ) and (b2 )
are satisfied. Finally, (d2 ) follows from the definition of Z, while (c1 ) implies
(c2 ).
(b) for all i < m0 , j < n0 , S ⊆ Ci and T ⊆ Ci,j with |S| > ε1 |Ci | and
|T | > ε2 |Ci,j |, we have β(S, T ) ≥ β(Ci , Ci,j ) − δ; and
which is the first part of (a). Lemma 7.9 gives the other conclusions of The-
orem 7.4.
LEMMA 7.9
In the situation of the proof of Theorem 7.4 where this lemma is needed,
there exist integers M and N , such that if m > M , n > N and A ⊆ A
with |A | > ρm, then there exist C ⊆ A and C j ⊆ B for j < n0 , such that
|C| > ε(n0 + 1)|A | and C and the C j satisfy the conclusions of Theorem 7.4,
except the first part of (a), if we choose Ci = C and Ci,j = C j .
Also,
(b3 ) we have
r r
1 1 ε(n0 − j)
|Zj | > ε(n0 − j) |Zj−1 | > |Zj−1 |,
2 2 4
r r
|C j | > ε2 B − C ν > ε2 B − C ν ,
ν<j ν<j
(c3 ) for all S ⊆ Zj and T ⊆ C j with |S| > ε(n0 −j)|Zj | and |T | > ε2 |C j |,
we have
δ
β(S, T ) > β(Zj , C j ) − ,
2
(d3 ) for all x ∈ Zj , we have
δ
|k(x) ∩ C j | ≤ β(Zj , C j ) + |C j |.
2
Once we use (i) to define Zj and C j , we always use it until Zn0 and C n0
are defined. Let j0 be the largest index j < n0 for which (ii) is used. By (b3 )
we have
If j0 = n0 − 1, then
B − C ν < (1 − εr2 )n0 |B| < σn.
ν<n
0
COROLLARY 7.2
For all real numbers ε1 , ε2 , δ, ρ and σ strictly between 0 and 1, there exist
integers m0 , n0 , M , N , such that for all bipartite graphs I with |A| = m > M
and |B| = n > N , there exist disjoint C i ⊆ A, i < m0 , and for each i < m0 ,
disjoint C i,j ⊆ B, j < n0 , such that:
Arithmetic Progressions 279
(a) |A − i<m0 C i | < ρm and |B − j<n0 C i,j | < σn for each i < m0 ;
(b) for all i < m0 , j < n0 , S ⊆ C i and T ⊆ C i,j with |S| > ε1 |C i | and
|T | > ε2 |C i,j |, we have β(S, T ) < β(C i , C i,j ) + δ; and
7.6 Configurations
In this section we define configurations, which generalize arithmetic progres-
sions, and prove some facts about them. For positive integers 1 , . . ., m
(where possibly m = 0) we define a set B(1 , . . . , m ) of sets of positive inte-
gers called configurations as follows.
Example 7.3
The elements of B(1 ) are just the 1 -progressions of positive integers. The ele-
ments of B(1 , 2 ) are the sets of 2 equally spaced nonoverlapping 1 -progressions
with the same common difference.
Example 7.4
For X ∈ B(1 , . . . , m ), the sets Xi , for 0 ≤ i < m , in the definition of configu-
ration are congruent configurations in B(1 , . . . , m−1 ).
Now assume that we are given a set R of positive integers with positive
upper asymptotic density: d(R) > 0. Eventually we will show that R contains
arbitrarily long arithmetic progressions. At present, however, we will define
a sequence t1 , t2 , . . . of positive integers, certain sets
m, respectively. Let
S(∅) = B(∅) = {{n} : n is a positive integer},
P (∅) = {{n} : n ∈ R}.
If X ∈ B(t1 , . . . , tm−1 , ) and S(t1 , . . . , tm−1 ) and P (t1 , . . . , tm−1 ) have been
defined, we let
sm (X) = |{i < : Xi ∈ S(t1 , . . . , tm−1 )}|,
pm (X) = |{i < : Xi ∈ P (t1 , . . . , tm−1 )}|.
Let f1 () = max{|X ∩ R| : X ∈ B()} = max{p1 (X) : X ∈ B()}. Clearly,
f1 ( + ) ≤ f1 () + f1 ( ), so by Proposition 7.1,
f1 ()
α1 = lim
→∞
exists. Also, α1 > 0 because d(R) > 0. Let ε1 () = |α1 − f1 ()/|.
Let K be a positive integer. We will prove that R contains an arithmetic
progression of K − 1 terms, and this will prove Szemerédi’s theorem because
K is arbitrary. Define a positive integer t1 to be sufficiently large depending
on K so that ε1 (t1 ) is small. We shall explain later precisely what this means.
We have now defined several sets and functions for the case m = 1. Next
we define these objects and several others for m = 2. Later we will define
them recursively for general m. Let
S(t1 ) = {X ∈ B(t1 ) : p1 (X) > (α1 − ε1 (t1 )1/4 )t1 },
g2 () = max{s2 (X) : X ∈ B(t1 , )},
g2 ()
β2 = lim sup , and
→∞
g2 ()
µ2 () = β2 − .
The corresponding quantities for m = 1 are degenerate: g1 () = , β1 = 1,
µ1 () = 0. Lemma 7.11 will show that µ2 () → 0 and Proposition 7.2 will
show that β2 is close to 1.
and α2 = lim sup→∞ f 2 ()/. Hence there is a sequence {n }, such that
limn→∞ f 2 (n )/n = α2 . Therefore there exist X (n) ∈ B(t1 , n ) for which
!
s2 X (n) ≥ β2 − µ2 (n ) n
and #
lim p2 X (n) n = α2 .
n→∞
Now for infinitely
many n, the same R-equivalence class occurs in the defi-
nition of p2 X (n) . Choose one such class and denote it by P (t1 ). Clearly,
P (t1 ) ⊆ S(t1 ). Let
! "
f2 () = max p2 (X) : X ∈ B(t1 , ) and s2 (X) ≥ β2 − µ2 ()
and ε2 () = |α2 − f2 ()/|. Lemmas 7.11 and 7.13 will show
lim g2 ()/ = β2 and lim f2 ()/ = α2 .
→∞ →∞
LEMMA 7.10
With the notation above we have
LEMMA 7.11
For every positive integer m, βm+1 = lim→∞ gm+1 ()/ exists and βm+1 ≤
gm+1 ()/ for every positive integer .
Let µm+1 () = gm+1 ()/ − βm+1 , so that µm+1 () ≥ 0 for every .
LEMMA 7.12
We may assume that µm () > 0 for m ≥ 2 and all positive integers . In
other words, if m ≥ 2, is a positive integer and µm () = 0, then R contains
arbitrarily long arithmetic progressions.
REMARK 7.5 Note that Lemma 7.12 implies that we may assume βm < 1
for m ≥ 2.
Note that pm+1 (X) ≥ sm+1 (X)2−t1 ···tm for every X ∈ B(t1 , . . . , tm , ). Take
Y ∈ B(t1 , . . . , tm , ) with sm+1 (Y ) = gm+1 (). Then sm+1 (Y ) ≥ βm+1 by
Lemma 7.11, so that
LEMMA 7.13
For every positive integer m we have αm+1 = lim→∞ fm+1 ()/.
and let |T | = αν (so 0 ≤ α ≤ 1). We have sm+1 (Yj ) ≤ (βm+1 + µm+1 ()) by
definition of µm+1 . Hence
sm+1 (X ) = sm+1 (Yj ) + sm+1 (Yj )
j∈T j∈T
!
≤ βm+1 − µm+1 () αν + (βm+1 + µm+1 ()) (1 − α)ν.
!
But sm+1 (X ) ≥ sm+1 (X) − ≥ βm+1 − µm+1 (L) L − . Hence
! 1 !
βm+1 − µm+1 (L) − ≤ α βm+1 − µm+1 () + (1 − α) (βm+1 + µm+1 ())
ν !
= βm+1 + µm+1 () − α µm+1 () + µm+1 () ,
so that
! 1 !
α µm+1 () + µm+1 () ≤ µm+1 () + + µm+1 (). (7.5)
ν
By Lemma 7.12, we may assume that µm+1 () > 0, so for sufficiently large L,
Inequality (7.5) becomes
!
α µm+1 () + µm+1 () < 2µm+1 (),
so that
!
α ≤ 2 µm+1 (). (7.6)
or
pm+1 (Yj ) ≤ (αm+1 − 2ε) .
The first inequality holds for only αν indices j (those in T ). Hence at least
(1 − α)ν indices j satisfy the second inequality. But j ∈ T implies
!
pm+1 (Yj ) < βm+1 − µm+1 ()
But
pm+1 (Yj ) = pm+1 (X ) ≥ pm+1 (X) − > (αm+1 − ε)L − .
j<ν
Arithmetic Progressions 285
Therefore
!
(αm+1 − ε)L − < αν βm+1 − µm+1 () + (1 − α)ν (αm+1 − 2ε) .
and
1 !
αm+1 − ε − < α βm+1 − µm+1 () + (1 − α) (αm+1 − 2ε) . (7.7)
ν
!
The right side of Inequality (7.7) is a convex combination of βm+1 − µm+1 ()
and αm+1 − 2ε. By Inequality ! (7.6), 1 − α is arbitrarily close to 1 for suffi-
ciently large. Since βm+1 − µm+1 () is bounded, we obtain a contradiction
from Inequality (7.7) by choosing and L sufficiently large.
Hence our assumption was false and there does exist an index j < ν, such
that both !
sm+1 (Yj ) ≥ βm+1 − µm+1 ()
and
pm+1 (Yj ) > (αm+1 − 2ε) .
Therefore, fm+1 ()/ > αm+1 − 2ε for all sufficiently large . Since ε > 0 was
arbitrary, we have proved Lemma 7.13.
PROPOSITION 7.2
For all m ≥ 1, we have
$!
! !
βm+1 ≥ 1 − 2 µm (tm ) + εm (tm ) + µm (tm ) . (7.8)
PROOF Let φ(tm ) denote the right side of Inequality (7.8). By the
definitions of εm and αm , for all positive integers
! exists an X ∈
there
B(tt , . . . , tm−1 , tm ), such that s (X) ≥ βm − µm (tm ) tm and pm (X) ≥
m
Therefore,
!
sm (Yj ) ≤ βm − µm (tm ) tm a + (βm + µm (tm )) tm (1 − a).
j<
!
But j< sm (Yj ) = sm (X) ≥ βm − µm (tm ) tm , so
! !
βm − µm (tm ) ≤ βm − a µm (tm ) + (1 − a)µm (tm )
and
! !
a µm (tm ) + µm (tm ) ≤ µm (tm ) + µm (tm ). (7.9)
If m = 1, then µ1 ( ) = 0 for all , s1 (Yj ) = t1 for all j < , and hence
T1 = ∅. Therefore a = 0 and
!
a ≤ 2 µm (tm ). (7.10)
Now suppose m > 1. By Lemma 7.12 we may assume µm (tm ) > 0. Then
for sufficiently large, we have from Inequality (7.9)
2µm (tm ) !
a≤ ! < 2 µm (tm ).
µm (tm ) + µm (tm )
and |T2 | = b (so 0 ≤ b ≤ 1). For j ∈ T1 we have pm (Yj ) ≤ (αm + εm (tm ))tm .
Therefore,
pm (Yj ) = pm (Yj ) + pm (Yj ) + pm (Yj )
j< j∈T1 j∈T2 j∈T1 ∪T2
$!
!
≤ atm + αm − εm (tm ) + µm (tm ) tm b
Thus,
!
αm − εm (tm ) ≤ 2 µm (tm ) + αm + εm (tm )
$
! !
−b εm (tm ) + µm (tm ) + εm (tm ) ,
and so
$
! !
b εm (tm ) + µm (tm ) + εm (tm ) ≤ (7.11)
!
2 µm (tm ) + εm (tm ) + εm (tm ).
We may assume that ε1 (t1 ) > 0. (The proof is like that! of Lemma ! 7.12.) For
m > 1 we have µm (tm ) > 0 by Lemma 7.12. Hence, εm (tm )+ µm (tm ) > 0
for every positive integer m. If εm ( ) = 0 for all sufficiently large , then we
may choose so large that
!
εm (tm ) ≤ εm (tm ) ≤ εm (tm ).
Otherwise, we may choose a large tm so that εm (tm ) > 0, and then the
preceding inequalities also $!hold for sufficiently large . Hence by Inequal-
!
ity (7.11) we have b ≤ 2 εm (tm ) + µm (tm ). But for j ∈ T1 ∪ T2 , we
have Yj ∈ S(t1 , . . . , tm ). Therefore sm+1 (Y ) ≥ (1 − a − b) ≥ φ(tm ) and
Proposition 7.2 is proved.
Define
LEMMA 7.14
If βm+1 > 1 − 1/, then C(t1 , . . . , tm , ) = ∅.
so that gm+1 () = . Thus there exists an X ∈ B(t1 , . . . , tm ) with sm+1 (X) =
so that X ∈ C(t1 , . . . , tm , ) = ∅.
By Proposition 7.2 and the fact that αm+1 ≥ βm+1 2−t1 ···tm , we have that
S(t1 , . . . , tm ) and P (t1 , . . . , tm ) are both nonempty for all m (provided tm is
suitably chosen).
For X = i< Xi ∈ B(t1 , . . . , tm , ) and C ⊆ [0, ), define
PROPOSITION 7.3
For all δ and τ there exists a positive integer , such that if X = i< Xi ∈
C(t1 , . . . , tm , ), C ⊆ [0, tm ), |C| ≥ τ tm , and tm is sufficiently large depending
of , then there exists an integer i < , such that pm (Xi , C) > (αm − δ)|C|
and an integer i < , such that pm (Xi , C) < (αm + δ)|C|.
Hence
! !
βm − µm (tm ) tm ≤ βm − µm (tm ) tm a + (βm + µm ())(1 − a)tm
and ! !
a µm () + µm () ≤ µm () + µm (tm ).
!
Since µm () > 0 by Lemma 7.12, we have a ≤ 2 µm () provided tm is
sufficiently large depending on .
!
Let T2 = j < tm : sm (Yj ) ≥ βm − µm () and pm (Yj ) < αm −
$! ! "
εm () + µm () , and |T2 | = btm . For j ∈ T1 we have by definition of
εm ,
pm (Yj ) ≤ fm () ≤ (αm + εm ()).
Therefore, since T1 ∩ T2 = ∅,
pm (Yj ) = pm (Yj ) + pm (Yj ) + pm (Yj )
j<tm j∈T1 j∈T2 j∈T1 ∪T2
$!
!
≤ atm + αm − εm () + µm () btm
It follows that
$
! !
b εm () + µm () + εm () ≤
! $! !
2 µm () + εm () + εm (tm ) + µm (tm ).
290 Sums of Squares of Integers
≥ (1 − a − b)tm
$!
! !
≥ 1−2 µm () + 2 εm () + µm () tm .
Thus
δ τδ
αm − |C| − tm ≤ (αm − δ)|C|,
2 4
Arithmetic Progressions 291
and so
δ δ τδ τδ
|C| ≤ αm − tm < tm ,
2 2 4 4
which contradicts the assumption that |C| ≥ tm . This proves the first asser-
tion of the proposition.
Suppose pm (Xi , C) ≥ (αm + δ)|C| for all i < . Then
pm (Xi , C) = pm (Yj ) ≥ (αm + δ)|C|.
i< j∈C
Thus
pm (Yj ) = pm (Yj ) + pm (Yj )
j∈C j∈C∩T1 j∈C−T1
!
≤ 2 µm ()tm + (αm + εm ())|C|.
Comparing the two inequalities for j∈C pm (Yj ), we have
!
(αm + δ)|C| ≤ 2 µm ()tm + (αm + εm ())|C|,
and !
2 µm ()
|C| ≤ tm ,
δ − εm ()
which is impossible for sufficiently large . This proves the second assertion
and the proof of the proposition is complete.
LEMMA 7.15
Let X = i<K Xi ∈ B(t1 , . . . , tm , K) and write Xi = j<tm Xi,j as usual.
Then (j0 , . . . , jK−1 ) ∈ E(tm , K) if and only if i<K Xi,ji ∈ B(t1 , . . . , tm−1 , K).
LEMMA 7.16
We have
e(t, K, j, i) ≤ t, (7.14)
and for t/4 < j < 3t/4, K ≥ 2, and t ≥ 4,
t
e(t, K, j, i) ≥ . (7.15)
K2
PROOF If (j0 , . . . , jK−1 ) ∈ E(t, K, j, i), then all js < t and ji = j. Thus,
if K > 1, then the choice of ji+1 (or ji−1 if i = K − 1) determines the rest
of the progression. But there are at most t choices for ji+1 (or ji−1 ), so that
e(t, K, j, i) ≤ t. If K = 1, then e(t, K, j, i) = 1 ≤ t. This proves Inequality
(7.14).
To prove Inequality (7.15), we note that since we allow both increasing and
decreasing arithmetic progressions, the worst case occurs when j = t/2 and
i = K − 1. Clearly
(a, a + d, . . . , a + (K − 1)d) ∈ E(t, K, t/2, K − 1)
Arithmetic Progressions 293
LEMMA 7.17
If > 1, t ≥ 2 − and L ⊆ [0, t) with |L| > (1 − 1/)t, then L contains an
-progression.
for prime . When > 1 is composite, Lemma 7.17 holds with 2 − replaced
by a smaller number (but not much smaller).
We will use induction to prove the existence of K-tuples of type (m, i).
Note that by Lemma 7.16,
G0 (t1 , . . . , tm , K) = C(t1 , . . . , tm , K).
Now let X ∈ B(t1 , . . . , tm , K) and let 0 ≤ i ≤ s < K. We shall define
a bipartite graph I(X, i, s) which will eventually connect Theorem 7.4 with
the rest of the proof. Although the vertex sets A and B are supposed to be
disjoint, we shall use the set [0, tm ) for both A and B. This will cause no
confusion. The pair {j, j } ∈ [A, B] is an element of the graph I(X, i, s) if
(j0 , . . . , jK−1 ) ∈ F (X, j, i, s) and ji = j .
The graph I(X, i, s) is crucial to the remainder of the proof. We may
picture I(X, i, s) as follows. Write X ∈ B(t1 , . . . , tm , K) as X = i<K Xi ,
with Xi ∈ B(t1 , . . . , tm ) and Xi = j<tm Xi,j for i < K as usual. Display
the Xi,j in the following way:
B A
X0,0 ··· Xi,0 ··· Xs,0 ··· XK−1,0
X0,1 ··· Xi,1 ··· Xs,1 ··· XK−1,1
.. .. .. ..
. . . .
.. .. ..
. . Xs,j .
.. .. .. ..
. . . .
.. .. ..
. Xi,j . .
.. .. .. ..
. . . .
X0,tm −1 · · · Xi,tm −1 · · · Xs,tm −1 · · · XK−1,tm −1
Arithmetic Progressions 295
The set A corresponds to the set of indices j of the Xs,j with j < tm . The
set B corresponds to the set of indices j of the Xi,j with j < tm . The
pair {j, j } is in the bipartite graph I(X, i, s) if and only if the arithmetic
progression j0 , . . . , jK−1 defined by letting ji = j and js = j has all its terms
in [0, tm ) and Xi ,ji ∈ P (t1 , . . . , tm−1 ) for all i < i.
3. Let τ (m) = σ (m) /n0 and let m be a large number for which the
(m)
1
2K 3 22K m < !
1 − βm+1
296 Sums of Squares of Integers
(E) Let tm > 4(2m−1 − m−1 ) for m ≥ 2. (This insures that Lemma
7.17 holds with = m−1 and t ≥ tm /4.)
!
(F) Let tm be so large that µm (tm ) < min(βm , 1 − βm ), which is
possible since we may assume that 0 < βm < 1.
DEFINITION 7.8 We shall say that X is well saturated if for all s with
i + 1 ≤ s < K we have
m
p (Xi , Cµ,ν (X, s)) − αm |Cµ,ν (X, s)| ≤ δ (m) |Cµ,ν (X, s)|
and m
p (Xi , C µ,ν (X, s)) − αm |C µ,ν (X, s)| ≤ δ (m) |C µ,ν (X, s)|
whenever
|Cµ,ν (X, s)| ≥ τ (m) tm and C µ,ν (X, s) ≥ τ (m) tm ,
(m) (m)
where µ < m0 and ν < n0 .
PROPOSITION 7.4
If i + 1 < K and X ∈ Gi (t1 , . . . , tm , K) is well saturated, then X ∈
Gi+1 (t1 , . . . , tm , K).
Define Z ⊆ A(m) by
tm 3tm
Z= j : <j< and j ∈ Cµ and
4 4 (m)
µ<m0
α i+1 t α i t "
m m m m
f (X, j, i + 1, s) ≤ and f (X, j, i, s) ≥ .
2 K2 2 K2
Case (i). Assume |Z| ≤ αK m (1 − βm )tm . By Theorem 7.4, which applies
by (A) in Section 7.8, we have
(m)
A − Cµ ≤ ρ(m) |A(m) | ≤ ρ(m) tm = αK
m (1 − βm )tm .
(m)
µ<m0
298 Sums of Squares of Integers
m (1 − βm )tm
Also, by definition of Gi (t1 , . . . , tm , K), there are at most 2iαK
indices j with f (X, j, i, s) < (αm /2)i tm /K 2 . Thus
i+1 t %
j : tm < j < 3tm and f (X, j, i + 1, s) ≤ αm m
4 4 2 K2
≤ αK
m (1 − βm )tm + αm (1 − βm )tm + 2iαm (1 − βm )tm
K K
m (1 − βm )tm ,
= 2(i + 1)αK
(m)
which is a contradiction. Thus, for some µ < m0 ,
(m)
|Z ∩ Cµ | > αK
m (1 − βm )|Cµ | = ε1 |Cµ |.
Let Z = Z ∩ Cµ . We (m)
first show that k(Z , B ) is not very large.
Let X = B (m)
− ν<n(m) Cµ,ν = [0, tm ) − ν<n(m) Cµ,ν . By Theorem 7.4,
0 0
|X| < σ (m) tm . Also by Theorem 7.4, for x ∈ Cµ we have
|k(x) ∩ Cµ,ν | ≤ β(Cµ , Cµ,ν ) + δ (m) |Cµ,ν |,
and so k(Z , Cµ,ν ) ≤ β(Cµ , Cµ,ν ) + δ (m) |Cµ,ν | |Z |. Therefore,
k(Z , B (m) ) = k(Z , Cµ,ν ) + k(Z , X)
(m)
ν<n0
≤ β(Cµ , Cµ,ν ) + δ (m) |Cµ,ν | |Z | + |Z | |X| (7.17)
(m)
ν<n0
≤ β(Cµ , Cµ,ν )|Cµ,ν | |Z | + δ (m) + σ (m) |Z |tm .
(m)
ν<n0
Next let L = {j < tm : Xi,j ∈ P (t1 , . . . , tm−1 )}. Thus for all j ∈ A(m) ,
But we have already seen that |Z | > ε1 |Cµ |. Thus we may apply Theorem
(m)
Now break the sum into two parts according as |Cµ,ν | ≥ τ (m) tm or |Cµ,ν | <
τ (m) tm . By Inequality (7.19) we find
k(Z , L) ≥ β(Cµ , Cµ,ν ) − δ (m) |L ∩ Cµ,ν | |Z |
(m)
ν<n0
− β Z , L ∩ Cµ,ν |L ∩ Cµ,ν | |Z |
(m)
ν<n0
|Cµ,ν |<τ (m) tm
≥ β(Cµ , Cµ,ν )|L ∩ Cµ,ν | |Z |
(m)
ν<n0
−δ (m) |Z | |L ∩ Cµ,ν |
(m)
ν<n0
− 1 · |L ∩ Cµ,ν | |Z |
(m)
ν<n0
|Cµ,ν |<τ (m) tm
≥ β(Cµ , Cµ,ν )|L ∩ Cµ,ν | |Z |
(m)
ν<n0
1 αm i+1
|Z |t m ≥ f (X, j, i + 1, s) = k(Z , L)
K2 2 j∈Z
Therefore,
1 αm i+1
≤ 6δ (m) ,
K2 2
1 1 αm K 1 1 αm i+1
δ (m) = · 2 < · 2
10 K 2 6 K 2
(m)
|Z ∩ C µ | > αK
m (1 − βm )|C µ | = ε1 |C µ |.
(m)
We let Z = Z ∩ C µ , so that |Z | > ε1 |C µ |. By Corollary 7.2, for each
x ∈ Z we have
k(x) ∩ C µ,ν ≥ β(C µ , C µ,ν ) − δ (m) |C µ,ν |.
Therefore,
k(Z , C µ,ν ) = |k(x) ∩ C µ,ν | ≥ β(C µ , C µ,ν ) − δ (m) |C µ,ν | |Z |,
x∈Z
302 Sums of Squares of Integers
and so
(m)
k(Z , B )≥ k(Z , C µ,ν )
(m)
ν<n0
≥ β(C µ , C µ,ν ) − δ (m) |C µ,ν | |Z | (7.21)
(m)
ν<n0
≥ β(C µ , C µ,ν )|C µ,ν | |Z | − δ (m) |Z |tm .
(m)
ν<n0
Now break the sum into two parts according as |C µ,ν | ≥ τ (m) tm or |C µ,ν | <
τ (m) tm . By hypothesis, X is well saturated, so for |C µ,ν | ≥ τ (m) tm we have
PROPOSITION 7.5
Suppose, for some i in 0 ≤ i < K − 1 that X (ξ) ∈ Gi (t1 , . . . , tm , K) for ξ < ,
−K
where ≥ 2m . Assume that
(ξ)
X= Xi ∈ C(t1 , . . . , tm , )
ξ<
(ξ) (η)
and that for each j < i, Xj and Xj are R-equivalent for all ξ, η < . Then
there exists a π < , such that
X (π) ∈ Gi+1 (t1 , . . . , tm , K).
(ξ) (η)
PROOF The R-equivalence of Xj and Xj implies
or
(ii) |C µ,ν (s)| ≥ τ (m) tm and
m (ξ)
p (Xi , C µ,ν (s)) − αm |C µ,ν (s)| > δ (m) |C µ,ν (s)|.
There are actually four possibilities here, depending on which way the in-
equalities go when the absolute value signs are removed. But by the choice of
and m (see 7.8.4), we have
−K
> w(m , 4Km0 n0 ).
(m) (m)
≥ 2m
(m) (m)
Since there are at most m0 choices for µ, at most n0 choices for ν, and at
most K choices for s, then by definition of the van der Waerden function w
(see 7.8.4), there exists an arithmetic progression of length m in [0, ), such
that (i) and (ii) involve the same µ, ν and s and so that the inequalities go the
same way. Let ξ0 , . . ., ξm −1 denote this arithmetic progression and suppose
to be definite that
(ξj )
pm (Xi , Cµ,ν (s)) − αm |Cµ,ν (s)| > δ (m) |Cµ,ν (s)| (7.23)
which contradicts Inequality (7.23). The other three cases are treated in
exactly the same way and Proposition 7.5 is proved.
PROPOSITION 7.6
Let > 0, m ≥ 0 and i with 0 ≤ i < K be fixed integers and assume that
(ξ)
X (ξ) = Xi ∈ Gi (t1 , . . . , tm+1 , K)
i <K
Arithmetic Progressions 305
(ξ) (η)
for ξ < ≤ m . Suppose that for all j < i and ξ, η < , Xj and Xj are
R-equivalent. Write, as usual,
(ξ)
(ξ)
Xi = Xi ,j
j <tm+1
F (X (ξ) , j, i, s) = F (X (η) , j, i, s)
for all ξ, η < , i ≤ s < k and j < tm+1 . To be concise we write F (X (ξ) , j, i, s) =
F (j, i, s) and f (X (ξ) , j, i, s) = f (j, i, s). By definition of Gi (t1 , . . . , tm+1 , K),
there exist for each s in i ≤ s < K sets Zs , Z s ⊆ [0, tm+1 ), such that
|Zs | ≤ 2KαK m+1 (1 − βm+1 )tm+1 , |Z s | ≤ 2Kαm+1 (1 − βm+1 )tm+1 , and
K
(b) f (j, i, s) ≥ (αm+1 /2)i K −2 tm+1 if j ∈ Z s , tm+1 /4 < j < 3tm+1 /4.
Define Z by Z = {j : tm+1 < j < 3tm+1 /4 and there is no (j0 , . . . , jK−1 ) ∈
F (j, i, i), such that
(ξ)
Xi ,ji ∈ Di (t1 , . . . , tm , K) (7.24)
i <K
holds for all ξ < }. Set P = j∈Z F (j, i, i) and p = |P |. In other words,
Z is the set of all indices j with tm+1 /4 < j < 3tm+1 /4, such that for each
(j0 , . . . , jK−1 ) ∈ F (j, i, i), there exists at least one ξ < so that (7.24) fails.
Note that by the definition of F (j, i, i) we have
(ξ)
Xi ,ji ∈ P (t1 , . . . , tm ) ⊆ S(t1 , . . . , tm )
for some i with i ≤ i < K. But by (b) with s = i we see that for each
j ∈ Z − Z i,
f (j, i, i) = |F (j, i, i)| ≥ (αm+1 /2)i K −2 tm+1 .
306 Sums of Squares of Integers
|Z i | ≤ 2KαK
m+1 (1 − βm+1 )tm+1 ,
we have
−2 αm+1 i
p = |P | ≥ |Z| − 2K(1 − βm+1 )αK
m+1 tm+1 K tm+1 . (7.25)
2
On the other hand, it is clear from the definition of P that
P ⊆ F (j, i, s) ∪ F (j, i, s) ,
i≤s<K ξ< j<tm+1 ,j∈Zs j<tm+1 ,j∈Zs
(ξ) (ξ)
Xs,j ∈S(t1 ,...,tm ) Xs,j ∈S(t1 ,...,tm )
so that
p = |P | ≤ K · · 2KαK
m+1 (1 − βm+1 )tm+1 · tm+1
(ξ)
+ K · · 2i αim+1 tm+1 max |{j : Xs,j ∈ S(t1 , . . . , tm ) and j ∈ Zs }|
s,ξ
≤ K 2KαK
m+1 (1 − βm+1 )tm+1
2
(7.26)
(ξ)
m+1 tm+1 max |{j :
+ K 2K αK Xs,j ∈ S(t1 , . . . , tm )}|.
s,ξ
so that
!
(ξ)
max |{j : Xs,j ∈ S(t1 , . . . , tm )}| ≤ 1 − βm+1 + µm+1 (tm+1 ) tm+1
s,ξ
≤ 2(1 − βm+1 )tm+1
by the choice of tm+1 (see (F) in Section 7.8). Hence by Inequality (7.26),
provided
1
K 3 22K m ≤ (1 − βm+1 )−1/2 .
2
But this inequality holds by the choice of tm+1 (see 7.8.4).
(0)
Now Xi ∈ S(t1 , . . . , tm+1 ) because
X
(0)
∈ C(t1 , . . . , tm+1 , K). Thus
(0) !
s m+1
(Xi ) ≥ βm+1 − µm+1 (tm+1 ) tm+1 and so
(0) tm+1 3tm+1
m+1
s Xi , , −Z
4 4
! tm+1 !
≥ βm+1 − µm+1 (tm+1 ) tm+1 − − 1 − βm+1 tm+1
2
1 !
≥ βm+1 − − 2 1 − βm+1 tm+1 by choice of tm+1 (see (F))
2
1 1
> 1− by choice of tm+1 (see (C)).
2 m
By (E) in the choice of tm+1 and by Lemma 7.17, 14 tm+1 , 34 tm+1 −Z contains
an arithmetic progression j (0) , . . ., j (m −1) of length m . Therefore, by the
(ζ)
definition of Z, for each ζ < m , there exists some arithmetic progression j0 ,
(ζ) (ξ)
. . ., jK−1 ∈ F (j , i, i), such that i <K X (ζ) ∈ D (t1 , . . . , tm , K) for all
(ζ) i
i ,ji
ξ < . This completes the proof of Proposition 7.6.
LEMMA 7.18
For every positive integer m and every integer i with 0 ≤ i < K, there exists
a number h(m, i), such that if m ≥ h(m, i) and
−i
X (ξ) ∈ Di (t1 , . . . , tm , K) for ξ < ≤ 2m ,
then there exist
Y (ξ) ∈ Gi (t1 , . . . , tm , K) for ξ < ,
308 Sums of Squares of Integers
such that Y (ξ) X (ξ) and for all ξ, η < , the position of Y (ξ) in X (ξ) is the
same as the position of Y (η) in X (η) .
D0 (t1 , . . . , tm , K) = C(t1 , . . . , tm , K)
and G0 (t1 , . . . , tm , K) = C(t1 , . . . , tm , K).
(ξ) (ξ)
Since Zi = X (0) , the position of each Z (ξ) in X (ξ) is the same, and Z (ξ)
i ,ji
X (ξ) . Now apply the induction hypothesis to Z (ξ) ∈ C(t1 , . . . , tm −1 , K) for
ξ < . By induction, there exist Y (ξ) ∈ C(t1 , . . . , tm , K), for ξ < , such that
Y (ξ) Z (ξ) and for each ξ < , the position of Y (ξ) in Z (ξ) is the same. Thus
Y (ξ) X (ξ) and for each ξ < , the position of Y (ξ) in X (ξ) is the same. This
completes the case i = 0.
Case i > 0. Assume for some i ≥ 0 that Lemma 7.18 holds. We now
prove it for i + 1. Let m∗ = h(m, i) + 1 and h(m, i + 1) = h(m∗ , i). Suppose
m ≥ h(m, i + 1) = h(m∗ , i) and
−(i+1)
X (ξ) ∈ Di+1 (t1 , . . . , tm , K) for ξ < ≤ 2m .
−(i+1) −i
Thus X (ξ) ∈ Di (t1 , . . . , tm , K). Since 2m ≤ 2m , we can apply the
induction hypothesis to obtain configurations Y (ξ)
∈ Gi (t1 , . . . , tm∗ , K), for
ξ < , such that Y (ξ)
X (ξ)
and the position of each Y (ξ) in X (ξ) is the
(ξ)
same. Since X (ξ)
∈ D (t1 , . . . , tm , K), we see that Xj ∈ P (t1 , . . . , tm )
i+1
for all j ≤ i. By the definition of P (t1 , . . . , tm ), all the X (ξ) , for j ≤ i, are
(ξ)
R-equivalent. Since the position of each Y (ξ) in X (ξ) is the same, all the Yj ,
for j ≤ i, are R-equivalent. Since h(m, i) ≥ m, we have m = h(m.i) and we
can apply Proposition 7.6 to the Y (ξ) for ξ ≤ . The conclusion of Proposition
7.6 asserts that there exists a sequence of arithmetic progressions
(ζ) (ζ)
(j0 , . . . , jK−1 ) ∈ E(th(m,i)+1 , K), ζ < h(m,i) ,
Arithmetic Progressions 309
such that for every ξ < and every ζ < h(m,i) we have
(ξ)
Y (ξ, ζ) = Y (ζ) ∈ Di (t1 , . . . , th(m,i) , K)
i ,ji
i <K
(0) ( −1)
and such that ji , . . ., ji h(m,i) forms an arithmetic progression of length
−(i+1)
h(m,i) . Since h(m, i) ≥ m, Y (ξ, ζ) is defined for all ξ < and ζ < 2m .
Thus we have
−(i+1)
Y (ξ, ζ) ∈ Di (t1 , . . . , th(m,i) , K) for ξ < , ζ < 2m .
such Y (ξ, ζ). Thus we may apply the induction hypothesis to these Y (ξ, ζ)
and obtain a sequence
Z(ξ, ζ) ∈ Gi (t1 , . . . , tm , K)
−(i+1)
with Z(ξ, ζ) Y (ξ, ζ) for ξ < , ζ < 2m . By the way in which the
Y (ξ, ζ) are defined, for fixed ζ, the position of Y (ξ, ζ) in X (ξ) is the same
as the position of Y (η, ξ) in X (η) for ξ, η < . Since the position of each
Z(ξ, ζ) in Y (ξ, ζ) is the same (by the induction hypothesis), then for fixed
ζ, the position of Z(ξ, ζ) in Y (ξ, ζ) and X (ξ) is the same as the position of
−(i+1)
Z(η, ξ) in Y (η, ξ) and X (η) for ξ, η < and ζ < 2m . Keep in mind that
(ξ)
for j ≤ i, all the Xj are R-equivalent. Thus for j ≤ i, all the Z(ξ, ζ)j with
−(i+1) −(i+1)
ξ < , ζ < 2m , are R-equivalent. Since ≤ 2m , we see that Z(ξ, ζ) is
defined for ξ, ζ < . Let
Z= Z(ξ, ζ)i , ξ < .
ζ<
(0) ( −1)
As part of the conclusion of Proposition 7.6, we know that ji , . . ., ji h(m,i)
(0) (−1)
forms an arithmetic progression of length h(m,i) . Hence ji , . . ., ji forms
an arithmetic progression of length . Also by the induction hypothesis, the
position of each Z(ξ, ζ) in Y (ξ, ζ) is the same. Therefore Z ∈ B(t1 , . . . , tm , ).
But each Z(ξ, ζ) ∈ Gi (t1 , . . . , tm , K), so each Z(ξ, ζ)i ∈ S(t1 , . . . , tm ). Hence
Z ∈ C(t1 , . . . , tm , K).
Finally, since each Y (ξ, ζ) ∈ Di (t1 , . . . , th(m,i) , K), we have that for j < i,
−(i+1)
Y (ξ, ζ)j ∈ P (t1 , . . . , th(m,i) ). Thus for all j < i and all ζ < 2m , the
−(i+1)
Y (0, ζ)j are R-equivalent. But for ζ < 2m , the position of each Z(0, ζ) in
−(i+1)
Y (0, ζ) is the same. Therefore, for all j < and all ζ < 2m , the Z(0, ζ)j
−K −(i+1)
are R-equivalent. Hence, since 2m ≤ 2m , we may apply Proposition 7.5
310 Sums of Squares of Integers
−K
to the sequence Z(0, ζ), ζ < 2m . The conclusion of Proposition 7.5 asserts
−K
that for some ζ ∗ < 2m ,
But we have already seen that for all ξ < , the position of Z(ξ, ζ ∗ ) in X (ξ) is
the same as the position of Z(0, ζ ∗ ) in X (0) . Since all the Xj with j ≤ i and
(ξ)
ξ < are R-equivalent (they are all in P (t1 , . . . , tm )) we see that, for each
j ≤ i, all the Z(ξ, ζ ∗ )j for ξ < are R-equivalent to Z(0, ζ ∗ )j . Therefore, for
ξ < ,
Z(ξ, ζ ∗ ) ∈ Gi+1 (t1 , . . . , tm , K).
Lastly, since
Z(ξ, ζ ∗ ) Y (ξ, ζ ∗ ) X (ξ) , ξ < ,
the inductive step is complete. This proves Lemma 7.18.
Di (t1 , . . . , tm , K) = ∅.
The conclusion of Proposition 7.6 asserts that there exists a sequence of arith-
metic progressions
(ζ) (ζ)
(j0 , . . . , jK−1 ) ∈ E(tm +1 , K) for ζ < m ,
Arithmetic Progressions 311
such that
Y (ζ) = Yi ,j (ζ) ∈ Di (t1 , . . . , tm , K) for ζ < m ,
i
i <K
(0) ( −1)
and such that ji , . . ., ji m forms an arithmetic progression of length m .
−i
Since m + 1 ≤ m , we have 2m+1 ≤ m , and so we may apply the definition of
−i
h(m+1, i) = m to the sequence Y (ζ) , ζ < 2m+1 . Hence there exists a sequence
−i
Z (ζ) ∈ Gi (t1 , . . . , tm+1 , K), ζ < 2m+1 , such that Z (ζ) Y (ζ) and the position
of each Z (ζ) in Y (ζ) is the same. Since each Z (ζ) ∈ Gi (t1 , . . . , tm+1 , K), we
(ζ) −(i+1)
have Zi ∈ S(t1 , . . . , tm+1 ), for i < K. Thus, letting = 2m+1 , we have
(ζ)
Zi ∈ C(t1 , . . . , tm+1 , )
ζ<
(0) ()
(because ji , . . ., ji forms an arithmetic progression). Since we have Y (ζ) ∈
(ζ)
Di (t1 , . . . , tm , K), for ξ < m , we see that each Yj , for j < i, belongs
(ζ)
to P (t1 , . . . , tm ) and consequently all the Yj , for j < i and ζ < m , are
R-equivalent. Since the position of each Z (ζ) in Y (ζ) is the same, for each
(ζ )
for ζ, ζ < m . But i + 1 < K, so
(ζ)
j < i, Zj is R-equivalent to Zj
−K
that ≥ 2m+1 . Thus the hypotheses of Proposition 7.5 are satisfied for the
sequence Z (ζ) , ζ < . Proposition 7.5 asserts that there exists a ξ ∗ < , such
∗
that Z (ξ ) ∈ Gi+1 (t1 , . . . , tm+1 , K). Finally, we apply Proposition 7.6 to the
∗
single configuration Z (ξ ) (in this case the R-equivalence condition is trivial).
The conclusion of Proposition 7.6 implies that
Di+1 (t1 , . . . , tm , K) = ∅.
It remains to verify that Theorem 7.5 does in fact show that R contains
arbitrarily long arithmetic progressions. According to the theorem there exists
an X ∈ DK−1 (K). By definition of DK−1 (K), this just means that X is
an arithmetic progression in which the first K − 1 terms belong to P (∅) =
R. Since K is arbitrary, R does indeed contain arbitrarily long arithmetic
progressions.
The following corollary shows that all τk = 0.
COROLLARY 7.3
For all ε > 0 and positive integers k, there exists a number nk (ε), such that
if n > nk (ε) and R is a subset of [0, n) with |R| ≥ εn, then R contains a
k-progression.
312 Sums of Squares of Integers
PROOF Suppose the corollary is false. Then there exist integers n1 <
n2 < · · · and Ri ⊆ [0, ni ) with |Ri | > εni , for i ≥ 1, such that Ri contains no
k-progression. Letn1 < n2 < · · · be asubsequence
of ni satisfying ni+1 ≥ 3ni
for all i. Let di = j<i nj and R = i>0 Rni + {di } . Thus
By Theorem 7.5, R must contain a 3k-progression, say A = {a + di : i ∈
[0, 3k)}. Let satisfy d ≤ a + (3k − 1)d < d+1 . Since Rn + {d } can contain
at most k − 1 terms of A, we have a + 2kd < d . But if follows from the
definition of ni and di that d ≥ 3d−1 . Thus
1 1
a + kd ≥ (a + (3k − 1)d) > d ≥ d−1 .
3 3
However, this implies
A ∩ Rn−1 + {d−1 } > k,
x = r 2 − s2 ,
y = 2rs,
z = r 2 + s2 ,
where r > s are positive integers. It is surprising that all Pythagorean triples
arise this way.
Arithmetic Progressions 313
LEMMA 7.19
If m and n are relatively prime positive integers whose product is an exact
square, then each of m, n is an exact square.
It is clear that if d is a positive integer with d|x and d|y, then d2 |x2 +y 2 = z 2 ,
and so d|z. Similarly, if d divides any two of x, y, z, then d divides the third
variable. Thus, if x, y, z is a primitive solution to x2 + y 2 = z 2 , then
Note that each fraction in parentheses is an integer because x and z are odd,
while y is even. If a positive integer d divides both factors on the left side,
then it divides their sum, z, and their difference, x. But gcd(z, x) = 1, so d
must be 1. Therefore, the two factors on the left side are relatively prime.
Since their product is a square, Lemma 7.19 tells us that each is a square. It
follows that there are positive integers r and s so that
z+x z−x y
= r2 , = s2 , = rs.
2 2 2
We have gcd(r, s) = 1 because (z + x)/2 and (z − x)/2 are relatively prime.
Clearly r > s. Also, r and s have opposite parity because r + s = z, which is
odd. When we solve for x, y and z in terms of r and s we get the equations
in the statement of the theorem.
Furthermore, if either of x, z were even, then the other would have to be even
also because of the “2” in x2 + z 2 = 2y 2 . Therefore, both x and z are odd. It
follows from considerations modulo 4 that y must be odd, too.
x = |r2 − s2 − 2rs|,
y = r 2 + s2 ,
z = r2 − s2 + 2rs.
COROLLARY 7.4
All arithmetic progressions x2 < y 2 < z 2 of three squares of integers are
given by
x = k|r2 − s2 − 2rs|,
y = k(r2 + s2 ),
z = k(r2 − s2 + 2rs),
where r > s > 0 are any relatively prime integers of opposite parity, and k is
any positive integer.
7.12 Exercises
1. Show that for any integer r > 1, one can partition the set of positive
integers into r sets, none of which contains an infinite arithmetic pro-
gression.
316 Sums of Squares of Integers
4. Prove that r3 (n) < cn/ log log n for some constant c > 0.
5. Let α and β be real numbers, such that 0 ≤ α ≤ β ≤ 1. Exhibit a set A
of positive integers with lower asymptotic density α and upper density
β that contains no infinite arithmetic progression.
Explain why this statement does not follow immediately from the fact
that the set of all squares has density 0.
11. Let S be the set of all positive integers that are the sum of two squares.
Prove that d(S) = 0. Does S contain arbitrarily long arithmetic pro-
gressions?
Chapter 8
Applications
The notation rs (n) now resumes its meaning as the number of ways of writing
n as the sum of s squares of integers. Likewise, σ(n) again denotes the sum
of the positive divisors of n.
This chapter gives several applications of the theory developed in this book.
We explore the connection between factoring an integer and finding all its
representations as the sum of two squares. We show that it is not necessary
to know the factorization of an integer to express it as the sum of s squares
when s ≥ 3. We discover for what values of s and n one can determine rs (n)
easily. Finally, we give three applications of sums of squares of integers to
problems far removed from number theory, including microwave radiation,
diamond cutting and cryptanalysis.
317
318 Sums of Squares of Integers
n = a2 + b 2 , n = c2 + d2 ,
PROOF If n were prime, then, by Corollary 2.1, n would have only one
representation as the sum of two squares. Therefore, n is composite and has
a proper factor.
With no loss of generality, we may assume that a, b, c and d are all nonneg-
√
ative. If any of a, b, c and d were 0, then n would be a square and so n is a
proper factor. Thus, we may assume all of a, b, c and d are positive integers.
We may also assume that gcd(a, b) = gcd(c, d) = 1 because if either gcd
exceeded 1, then it (and also its square) would be a proper factor of n.
After these preliminaries, we claim that both of the numbers gcd(ad+bc, n),
gcd(ad − bc, n) are proper factors of n. It is well known that we can compute
these gcd’s in polynomial time in the size of n.
Note that
which shows that n divides (ad + bc)(ad − bc). Either n divides one of these
two factors or else both gcd(ad + bc, n) and gcd(ad − bc, n) are proper factors
of n. Clearly, ad + bc > ad − bc because b and c are positive integers. The
equation
n2 = (a2 + b2 )(c2 + d2 ) = (ac − bd)2 + (ad + bc)2
shows that ad+bc ≤ n, and that ad+bc < n unless ac−bd = 0. If ad+bc = n,
then ac = bd. In that case, we have a|bd. But gcd(a, b) = 1, so a|d. Similarly,
d|a and so a = d. Then b = c, and the representation n = c2 + d2 is the same
as n = a2 + b2 , but with a and b swapped. In other words, n = c2 + d2 is
not distinct from n = a2 + b2 . By assumption, this cannot be so. Therefore,
ad + bc < n and so 1 < gcd(ad + bc, n) < n.
Now ad−bc < ad+bc < n. Likewise, bc−ad < ad+bc < n, so ad−bc > −n.
If ad − bc = 0, then a|bc, so a|c; similarly, c|a and c = a, so the representations
are not distinct, again contrary to hypothesis.
Therefore, both gcd(ad + bc) < n and gcd(ad − bc) < n, so both are proper
factors of n.
REMARK 8.1 The same trick works for quadratic forms other than
a2 + b2 . If we know two distinct representations of n by the same quadratic
form a2 + Db2 , then we can factor n easily. See Mathews [69].
Applications 319
Example 8.1
If we are given 50 = 52 + 52 and 50 = 72 + 12 , then gcd(5, 5) = 5 > 1 is a proper
factor of 50. If we ignored this, and found ad + bc = 5 · 7 + 5 · 1 = 40, then we
would have gcd(40, 50) = 10 as a proper factor of 50.
Example 8.2
Suppose we are given 377 = 112 +162 = 192 +42 . Then ad+bc = 11·4+16·19 =
348 and gcd(348, 377) = 29. Also, ad − bc = 11 · 4 − 16 · 19 = −260 and
gcd(−260, 377) = 13.
Example 8.3
Suppose we wish to factor
n = 2128 + 1 = 340282366920938463463374607431768211457,
This gives only one representation of n as the sum of two squares. However, we
get another such representation for free because of the special form of n:
` ´2
n = 2128 + 1 = 264 + 1 = 184467440737095516162 + 1.
ad + bc = 302201021862543689419343173647321450568
and
ad − bc = 302201021862543689402384285931448645560.
One checks easily that these two gcd’s are the prime factors of n.
The first question is easy to answer using Theorem 2.5. If any prime q ≡
3 (mod 4) divides n to an odd power, then r2 (n) = 0; otherwise, r2 (n) > 0.
Now assume that no prime q ≡ 3 (mod 4) divides n to an odd power, so
that r2 (n) > 0. Let us find a representation of n as the sum of two squares.
If n is the sum of two squares, then the prime factors of n are of three types:
1. 2 = 12 + 12 ,
In the first two cases, p = 2 or p ≡ 1 (mod 4), we can use the identity,
Example 8.4
Let us find some representations of powers of 5 as the sum of two squares,
beginning from 5 = 22 + 12 . We have
5 = 22 + 12 = 12 + 22 ,
so
52 = (2 · 1 + 1 · 2)2 + (2 · 2 − 1 · 1)2 = 42 + 32
52 = (2 · 2 + 1 · 1)2 + (2 · 1 − 1 · 2)2 = 52 + 02 .
We are left with the problem of representing a prime p ≡ 1 (mod 4), to the
first power, as the sum of two squares. There are several ways to do this.
√
If p is small enough, one can just try all 1 ≤ a < p to see whether p − a2
is a square. If it equals b2 , then we have p = a2 + b2 . This algorithm has time
√
complexity O( p).
Even slower algorithms use the formulas in these two beautiful theorems.
In each theorem, p = 4k + 1 is prime, and p = a2 + b2 , where a ≡ 1 (mod 4).
Both of these algorithms have time complexity O(p).
Example 8.5
Let p = 13 = 4 · 3 + 1. Then
! !
1 2·3 1 6
a≡ = = 10 ≡ −3 (mod 13).
2 3 2 3
Example 8.6
Let p = 13. Then (p − 1)/2 = 6 and
X6 „ «„ 2 «
j j +1
a=−
j=1
13 13
= −(1 · (−1) + (−1) · (−1) + 1 · 1 + 1 · 1 + (−1) · 0 + (−1) · (−1))
= −3.
The Euclidean algorithm used in the second step is the same one used to
compute gcd(x, p) and is well known to run in polynomial time. We will say
more later about the complexity of finding x in the first step.
Here is why the algorithm works. See pages 32–34 of Perron [83] for the
needed facts about continued fractions. The theory of continued fractions tells
us that
x Ak+1 ϑ
− = , with |ϑ| < 1.
p Bk+1 Bk+1 Bk+2
Multiply by pBk+1 and square to get
2 ϑ2 p2
xBk+1 − pAk+1 = 2 < p
Bk+2
√
2
because p < Bk+2 . Add Bk+1 to get
2 2 2
0 < xBk+1 − pAk+1 + Bk+1 < p + Bk+1 < 2p,
√
since Bk+1 < p. The left side of this inequality is a multiple of p because
x ≡ −1 (mod p). Therefore it must equal p, and we have expressed p as the
2
The last equations are the same as those for the remainders in the Euclidean
algorithm for p/x. Hence, A2k−1 = r1 , A2k−2 = r2 , . . ., Ak+1 = rk−1 ,
Ak = rk , Ak−1 = rk+1 , . . .. From the third property, we have p = rk2 + rk+1 2
.
√
Since k > 1, the first property gives rk−1 = Ak+1 = Bk+2 , which is > p by
√
Hermite’s version of the algorithm. Therefore, rk is the first remainder < p.
If r1 = 1, then p = q0 x + 1 and the continued fraction is [q0 , q0 ]. Therefore,
q0 = x and p = x2 + 1.
Since both versions of the algorithm consist of just the Euclidean algorithm,
both have polynomial time complexity after we find x.
We now turn to the question of finding a square root x of −1 modulo p. It
is here that the algorithm may become probabilistic. First, there is an x with
x2 ≡ −1 (mod p) because p ≡ 1 (mod 4). If we knew a quadratic nonresidue
q modulo p, then we could let x ≡ ±q (p−1)/4 (mod p) with 0 < x < p/2. In
some cases we can exhibit a small quadratic nonresidue modulo p directly. For
example, if p ≡ 5 (mod 8), then q = 2 is a quadratic nonresidue. Likewise, if
p ≡ 17 (mod 24), then q = 3 will work. In the remaining case, p ≡ 1 (mod 24),
we can use this algorithm:
Try successive primes q = 5, 7, 11, . . ., until one is found with q (p−1)/4 ≡ 1
or −1 (mod p). Then x ≡ ±q (p−1)/4 (mod p) with 0 < x < p/2.
This algorithm works because of Euler’s criterion. If q were a quadratic
residue modulo p, then q (p−1)/2 ≡ +1 (mod p) and so q (p−1)/4 ≡ +1 or
−1 (mod p). Hence, if q (p−1)/4 ≡ 1 or −1 (mod p), then q (p−1)/2 ≡ −1 (mod p),
q is a quadratic nonresidue modulo p, and x ≡ ±q (p−1)/4 (mod p) with
0 < x < p/2 is the square root of 1 that we need.
How many q’s might we have to try to get a quadratic nonresidue? Burgess
[14] has shown that for every ε > 0 there is a p0 (ε) so that the least posi-
tive quadratic
√ nonresidue modulo p is < pc+ε for all primes p > p0 (ε), where
c = 1/(4 e) ≈ 0.1516. A similar bound holds for the number of consecutive
integers we must try, beginning at any number, until we find the first quadratic
nonresidue. Even these results seem far from the truth. Vinogradov conjec-
tured that for every ε > 0 there is a p0 (ε) so that the least positive quadratic
nonresidue modulo p is < pε for all primes p > p0 (ε). Although one cannot
prove a good upper bound for the least positive quadratic nonresidue, the
average number of positive integers which must be tried to find a quadratic
nonresidue modulo p is 2.
Applications 325
Example 8.7
If p = 17, then x = 4. The Euclidean algorithm has just one step,
17 = 4 · 4 + 1,
so q0 = x and 17 = 42 + 1.
Example 8.8
Let p = 1217. Since p ≡ 17 (mod 24), we can use q = 3 as a quadratic
nonresidue. This gives x = q (p−1)/4 = 3304 ≡ 78 (mod p). Since 0 < x = 78 <
1217/2, we do not have to replace x by p−x. Note that x2 = 782 ≡ −1 (mod p).
The first steps of the Euclidean algorithm for 1217/78 are
1217 = 78 · 15 + 47
78 = 47 · 1 + 31
47 = 31 · 1 + 16.
√ √
We have 31 < p < 47, so 31 is the first remainder < p. Hence, p = 1217 =
312 + 162 .
We remark that 50 is the smallest integer which can be written as the sum
of two positive squares in two different ways, 50 = 12 + 72 = 52 + 52 , and 65 is
the smallest integer which can be written as the sum of two different positive
squares in two different ways, 65 = 12 + 82 = 42 + 72 .
Finally, it follows easily from Theorem 2.5 that a positive integer n is the
sum of two positive squares if and only if n = 2e n1 n22 , where e is a nonnegative
integer, n1 is an integer > 1 all of whose prime factors are ≡ 1 (mod 4) and
n2 is the product of zero or more primes ≡ 3 (mod 4).
Example 8.9
Express n = 139335 as the sum of three triangular numbers.
Our first task is to write 8n + 3 = 1114683 as the sum of three squares. Since
8n + 3 ≡ 3 (mod 8), we subtract odd squares from it until we reach twice a
(probable) prime. We soon find 8n+3−112 = 2p, where p = 557281 is probably
prime.
Now we have to write p = 557281 as the sum of two squares. Since p ≡
1 (mod 24) there is no obvious quadratic nonresidue. The first two primes, 2,
3, don’t work, and the next two primes, 5, 7, are quadratic residues, too. But
the Legendre symbol (11/p) = −1 so q = 11 is a quadratic nonresidue modulo
p.
We compute x ≡ 11(p−1)/4 = 11139320 ≡ 252875 (mod p), with 0 < 252875 <
p/2, so x = 252875 is the solution to x2 ≡ −1 (mod p) that we can use.
Applications 327
The Euclidean algorithm for p/x (or for gcd(p, x)) begins this way.
say. Finally, we use the formula from the proof of Corollary 2.3,
that every prime is the sum of four squares, and then appeals to Euler’s
identity,
(a2 + b2 + c2 + d2 )(e2 + f 2 + g 2 + h2 ) = w2 + x2 + y 2 + z 2 ,
where
w = ae + bf + cg + dh, x = af − be ± ch ∓ dg,
y = ag ∓ bh − ce ± df, z = ah ± bg ∓ cf − de,
which shows that the product of two numbers, each of which is the sum of
four squares, is also the sum of four squares.
One might surmise from this proof that it is necessary to know the prime
factorization of n in order to find a representation of n as the sum of four
squares. However, we will show that there is a probabilistic polynomial time
algorithm to find such a representation, without knowing the factorization of
n.
First remove any factors of 4 from n. We assume this has been done and
so n ≡ 1, 2 or 3 (mod 4).
If n ≡ 1 or 2 (mod 4), then one of the four squares can be taken to be 0.
Use results of the previous section to write n as the sum of three squares.
In case n ≡ 2 (mod 4), an alternative approach is to write n = p + q, where
p and q are primes with p ≡ q ≡ 1 (mod 4). Write p and q each as the sum of
two squares by Brillhart’s variation of Hermite’s algorithm.
If n ≡ 3 (mod 4), then n − 12 ≡ 2 (mod 4) is the sum of three squares, so
use results of the previous section to get n − 1 = x2 + y 2 + z 2 .
The situation for sums of four or more positive squares was settled by Pall.
Example 8.10
Find r2 (n) for n = 23 p21 p32 p43 q 6 , where q ≡ 3 (mod 4) and the pi ≡ 1 (mod 4)
are distinct primes.
The power of 2 is irrelevant. Since q 6 exactly divides n, it contributes 1,
rather than 0, as a factor of r2 (n). The pi ’s contribute factors of 2 + 1, 3 + 1
and 4 + 1 to r2 (n). Therefore, r2 (n) = 4T (n) = 4 · 3 · 4 · 5 = 240.
def
pk(α+1) − 1
if n = pα , then σk (n) = dk = .
p prime p prime
pk − 1
d|n
330 Sums of Squares of Integers
Example 8.11
Find r4 (n) when
r4 (n) = 8σ(n) = 8σ1 (n) = 8(1 + 2381 + 23812 + 23813 )(1 + 87881)(1 + 354047)
= 3361341898424587272192.
Example 8.12
Find r8 (n) when n = 2381 · 87881 = 209244661.
Since n and all of its divisors d are odd, Formula (6.4) tells us that
without matter), has perfectly reflecting walls and F is smooth, then the free
electromagnetic oscillations E, H in G obey the Helmholtz equations
2
∇ + K 2 E = 0 and ∇2 + K 2 H = 0,
∇ · E = 0, ∇·H= 0
n × E = 0, n·H=0
where
where
Q−1
1
Fs (Q) = rs (n) + rs (Q).
n=0
2
F2 (Q) = πQ + O(Qϑ+ε )
8.3. Exercise 11 of this chapter shows that when these numbers are discarded,
the asymptotic density is unchanged at 5/6, so that (8.3) remains correct.
Essentially the same formulas arise in a problem in nonrelativistic quan-
tum statistical mechanics of an ideal gas at low temperature. The quan-
tum statistician is interested in the asymptotic form of N (3) (Q) with more
precision than the first term πQ3/2 /6, and in particular with the difference
∆W = (6/5)/2W = 3/5W . Thus, (8.3) is important in statistical mechanics
as well as in electromagnetism. See [7] for more about how the theorem of
Grosswald, Calloway and Calloway [43] is used in statistical mechanics.
Example 8.13
The Miller indices (1, 0, 0), (0, −1, 0), (0, 0, 2) and (1, −2, 1) represent the planes
x = 1, y = −1, 2z = 1 and x − 2y + z = 1, respectively.
±h x ± k y ± z = 1
Example 8.14
The notation {1; 0; 0} represents the six planes x = ±1, y = ±1, z = ±1, with
independent ± signs.
The notation {1; 1; 0} represents the twelve planes ±x ± y = 1, ±x ± z = 1,
±y ± z = 1, with independent ± signs.
because relatively few chemical bonds span this boundary and so the cleaving
requires the least energy. For a diamond, with its face-centered cube crystal
structure, the eight {1; 1; 1} planes have the densest packing of carbon atoms
of any lattice plane. When naturally occurring (uncut) diamonds are exam-
ined under a microscope, the {1; 1; 1} planes are the most common surfaces
observed. Usually, only one cleavage plane is prominent. It is easily identified
by its transparency and smoothness.
The next reason the guess is wrong comes from the way a diamond reflects
and refracts light. A typical light beam enters the diamond through the table
or an upper facet, where it is bent and its colors are separated. Then it is
reflected totally inside the diamond by one or two lower facets. Finally, it
exits the diamond through the table or an upper facet, where it is refracted
once more, and we see the diamond “sparkle.” The angles between the facets
and the plane through the table must lie in certain limited ranges in order for
this refraction and total reflection to work properly. As shown in Figure 8.3,
if a diamond is too tall (2.) or too short (3.), light will exit through a lower
facet and the gem will not sparkle. Part 1. of Figure 8.3 shows the correct
light path. It turns out that a diamond cut on lattice planes with small Miller
indices would not have facets with the correct angles to make it sparkle. In
1919, Marcel Tolkowsky published a book [111] in which he analyzed which
angles between facets would make a round brilliant-cut diamond sparkle the
most. The angles he recommended were based on the reflection and refraction
properties of diamond and were chosen to maximize the brilliancy, fire and
color of the crystal. These angles simply are not the angles between lattice
planes with small Miller indices.
Finally, when crystals, including diamonds, are cleaved on lattice planes
with small Miller indices, these surfaces are fragile and material tends to flake
off from them. When a diamond is cut into a gem stone, its facets are polished
(with diamond dust) rather than being cleaved, and the facets are chosen to
be not lattice planes. Even though diamond is the hardest mineral, its lattice
planes would still flake when it is handled, giving it a poor appearance.
ed ≡ 1 (mod φ(n)). She keeps p, q and d secret, but makes n and e public,
for example, on her web page or in a directory of RSA users.
When Bob wants to send a secret message to Alice he breaks it into pieces
0 ≤ m < n and enciphers each piece by raising it to the power e modulo n.
He knows e and n because they are public.
When Alice receives the enciphered message c = me mod n, she deciphers it
by raising it to the d power modulo n. This works because of Euler’s theorem,
mφ(n) ≡ 1 (mod n), so that
d
cd ≡ (me ) ≡ mde ≡ m1 = m (mod n),
Given signatures for messages m1 and m2 , Eve could still construct a signature
for some number x = µ(m√ 1 )µ(m2 ) mod n in 0 < x < n, but it is unlikely√(the
probability is about 1/ n) that x = µ(m3 ) for any message 0 < m3 < n.
The ISO Standard 9796-1 used Hamming codes in the definition of µ. The
Standard worked well for eight years, but in 1999 it was partially broken
and, shortly thereafter, completely broken. Coppersmith, Halevi and Jutla √
[21] found a way to construct triples of messages 0 < m1 , m2 , m3 < n
with µ(m1 )µ(m2 ) ≡ µ(m3 ) (mod n). In the same paper they suggested five
alternate redundancy functions to replace the µ they had broken.
The next year Girault and Misarsky [37] found attacks for all five of these
countermeasures. The attack on µ5 from [21] interests us because it uses the
representation of integers as the sum of two squares. √
The redundancy function µ5 was defined [21] for 0 ≤ m < n by µ5 (m) =
(m2 + δ) mod n, where δ is a (public) constant modulo n. Assume that δ is
about as big as n/2.
Girault and Misarsky [37] attacked this countermeasure by letting A =
2(n − δ), so that A is about as large as n, and assuming that A has at least
two distinct representations as the sum of two squares, say,
√
A = w2 + x2 = y 2 + z 2 with 0 < w, x, y, z < n.
√
PROOF We have µ5 (m) = (m2 + δ) mod n when 0 ≤ m < n. Thus
COROLLARY 8.1
If signatures using µ5 are known for any three of the four messages√w, x, y,
z, where A = 2(n − δ) = w2 + x2 = y 2 + z 2 with 0 < w, x, y, z < n, then
one can compute the signature for the fourth one.
PROOF If three of the four numbers µ5 (w), µ5 (x), µ5 (y), µ5 (z) are known,
then the fourth one can be computed from the congruence in Theorem 8.5.
Example 8.15
Suppose Alice uses n = 733549 = 827 · 887 as her RSA modulus. Suppose
δ = 325256. Then
A = 2 · (n − δ) = 2 · (733549 − 325256) = 816586 = 2 · 281 · 1453.
Since the odd prime factors of A are ≡ 1 (mod 4), we can apply Hermite’s
algorithm of Section 8.2 and find
281 = 52 + 162 , 1453 = 32 + 382 .
The identity (8.1) gives us
A
= (16 · 3 + 5 · 38)2 + (16 · 38 − 3 · 5)2 = 2382 + 5932 , and
2
A
= (16 · 38 + 5 · 3)2 + (16 · 3 − 5 · 38)2 = 6232 + 1422 .
2
We use 2 = 12 + 12 and another application of (8.1) to obtain
A = (593 + 238)2 + (593 − 238)2 = 8312 + 3552 and
A = (623 + 142)2 + (623 − 142)2 = 7652 + 4812 .
Then Theorem 8.5 gives
µ5 (831)µ5 (765) ≡ µ5 (355)µ5 (481) (mod n).
Girault and Misarsky [37] gave a similar example with a 512-bit modulus
n. Hermite’s algorithm works fine even for numbers of that size.
8.9 Exercises
1. The Fibonacci number u101 = 573147844013817084101 has exactly two
prime factors. Use the information that u101 = a2 + b2 , where a =
9844590449 and b = 21822737750, to find them.
Applications 341
343
344 References
1944.
[54] C. Hermite. Note au sujet de l’article précédent. J. Math. Pures Appl.,
page 15, 1848. The title refers to Serret [99].
[55] Y. Ihara. Hecke polynomials as congruence zeta functions in the elliptic
modular case. Ann. Math., Ser. 2, 85:267–295, 1967.
[56] K. Iwasawa. Lectures on p-adic L-functions. Annals of Math. Studies
No. 74. Princeton Univ. Press, Princeton, New Jersey, 1971.
[57] C. G. J. Jacobi. Démonstration élémentaire d’une formule analytique
remarquable. J. de Mathématiques, 7:85–109, 1842. Originally published
in German in J. Math., 21:13–32, 1840.
[58] E. Jacobsthal. Anwendungen einer Formel aus der Theorie der
quadratischen Reste. J. Math., 132:238–245, 1907.
[59] G. J. Janusz. Algebraic Number Fields. Academic Press, New York,
1973.
[60] K. L. Jensen. Om talteoretiske Egenskaber ved de Bernoulliske Tal.
Nyt Tidsskrift Für Mathematik, Afdeling B, 26:73–83, 1915.
[61] W. Johnson. p-adic proof of congruences for the Bernoulli numbers. J.
Number Theory, 7:251–265, 1975.
[62] A. Khinchin. Three Pearls of Number Theory. Graylock, 1952.
[63] M. Knopp. Modular Functions in Analytic Number Theory. Markham,
Chicago, 1970.
[64] E. Landau. Elementary Number Theory. Reprinted by Chelsea, New
York, 1955.
[65] A. M. Legendre. Théorie des Nombres, Tome I. Gauthiers-Villars,
Paris, France, 1830.
[66] D. H. Lehmer. Computer technology applied to the theory of numbers,
volume 6 of MAA Studies in Mathematics, pages 117–151. Math. Assn.
Am., 1969.
[67] W. Li. New forms and functional equations. Math. Ann., 212:285–315,
1975.
[68] J. Liu and T. Zhan. Distribution of integers that are sums of three
squares of primes. Acta Arith., 98:207–228, 2001.
[69] G. B. Mathews. Theory of Numbers, Part I. Deighton, Bell & Co.,
Cambridge, 1892.
[70] F. I. Mautner. Spherical functions over p-adic fields, I. Am. J. Math.,
80:441–457, 1958.
[71] T. Metsänkylä. Distribution of irregular primes. J. Reine Angew. Math.,
282:126–130, 1976.
[72] H. L. Montgomery. Distribution of irregular primes. Ill. J. Math.,
9:553–558, 1965.
[73] L. J. Mordell. On the representation of a number as a sum of an odd
number of squares. Trans. Cambridge Phil. Soc., 23:361–372, 1919.
[74] L. J. Mordell. On the representation of numbers as a sum of 2r squares.
Quart. J. of Math., 48:93–104, 1920.
[75] M. B. Nathanson. Additive Number Theory: The Classical Bases.
References 347