GROUP THEORY: AN INTRODUCTION AND AN APPLICATION
NATHAN HATCH
Abstract. The ultimate goal of this paper is to prove that a prime p can
be expressed as a sum of two squares if and only if p = 2 or p ≡ 1 (mod 4).
More time is spent on developing the tools to answer this question than on
the question itself. To this end, the first section is entirely an introduction to
basic group theory, and the second section extends this theory to prove that the
group Z_p^× is cyclic. The next section presents the Legendre symbol and proves
one of the supplementary theorems of quadratic reciprocity. Finally, this last
theorem provides the crucial first step in establishing the desired congruence
conditions for primes of the form x^2 + y^2.
Contents
1. Introduction
2. Basic Group Theory
3. Cyclicity and Z_p^×
4. The Legendre Symbol
5. A Congruence Condition for Primes of the Form x^2 + y^2
Acknowledgments
References
1. Introduction
A prime number p can be expressed in the form x^2 + y^2 for integral x and y if
and only if p = 2 or p ≡ 1 (mod 4). This observation was first made by Albert
Girard in 1632, and the first person to claim to have a proof was Pierre de Fermat
in 1640. For this reason, the statement is often called “Fermat’s theorem on sums
of two squares,” or “Fermat’s two-squares theorem” for short. The earliest written
proof is (unsurprisingly) due to Leonhard Euler, who used the method of infinite
descent to prove it in 1749, after a great deal of hard work [6].
Fermat’s two-squares theorem provided the impetus for study of the general
question of when a prime has the form x^2 + ny^2, where n is also an integer. This
problem is by now a solved one; however, it took a lot of time—and many develop-
ments in number theory and algebra—to get there. The solution, which is discussed
in David A. Cox’s 1997 book Primes of the Form x^2 + ny^2, involves genus theory,
reciprocity laws, and class field theory [5].
That said, this paper concerns itself merely with proving the original theorem
about primes with the form x^2 + y^2. Our proof will not follow that of Euler; instead,
our aim is to use more recent developments in number theory (which occurred
partially because of studies of the aforementioned general problem) to offer a simpler
explanation than the one Euler gave. The main topic is group theory, which is the
subject of the first two sections. No background knowledge of groups is required; the
first section begins with the basics and ends by proving Lagrange’s theorem. In the
second section we prove that Z_p^× is cyclic. (Note that some familiarity with fields and
polynomials will make several of the lemmas in this section more understandable.)
In section three, we introduce and discuss the Legendre symbol, using the cyclicity
of Z_p^× to prove the first supplementary law of quadratic reciprocity. With this
knowledge, we finally have the necessary background for our proof of Fermat’s
two-squares theorem in section four.
(i) T ⊆ K, and
(ii) for all subgroups H ≤ G, T ⊆ H implies K ⊆ H.
If T consists of a single element g, the notation ⟨{g}⟩ is simplified to ⟨g⟩.
Definition 2.4 requires some justification. We need to show that such a subgroup
always exists, and that it is unique. To prove this, we can use the fact that an
arbitrary intersection of subgroups is itself a subgroup.
Lemma 2.5. Let G be a group and {H_i}_{i∈I} be an arbitrary set of subgroups. Let
J = ⋂_{i∈I} H_i. Then J is also a subgroup.
Proof. First note that J is nonempty, because all subgroups contain the identity e.
Now, pick any g, h ∈ J (not necessarily distinct); we wish to show that gh ∈ J and
g −1 ∈ J. For any i ∈ I, we have g ∈ Hi and h ∈ Hi . Then since Hi is closed under
multiplication and inverses, both gh and g −1 must be elements of Hi as well. But
since this is true for all i ∈ I, they must also be elements of the intersection J.
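This closure argument is easy to check computationally. The sketch below (Python; the helper name is ours, not the paper’s) verifies the subgroup axioms inside Z_p^× and confirms that the intersection of two subgroups is again a subgroup, as Lemma 2.5 asserts:

```python
def is_subgroup(S, p):
    """Check that S is a subgroup of Z_p^x (p prime): it contains the
    identity and is closed under multiplication mod p and inverses.
    Inverses use a^(p-2) = a^(-1) mod p (Fermat's little theorem)."""
    return (1 in S
            and all((a * b) % p in S for a in S for b in S)
            and all(pow(a, p - 2, p) in S for a in S))

# Two subgroups of Z_13^x and their intersection (Lemma 2.5).
H1 = {1, 3, 9}        # generated by 3, since 3^3 = 27 = 1 mod 13
H2 = {1, 5, 8, 12}    # generated by 5, since 5^4 = 625 = 1 mod 13
```

Here the intersection H1 ∩ H2 is the trivial subgroup {1}, which is indeed a subgroup.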
With this fact in mind, we can now show that Definition 2.4 is valid.
Proposition 2.6. If G is a group and T ⊆ G, then ⟨T⟩ exists and is unique.
Proof. Define the set Γ = {H ≤ G : T ⊆ H}. Note that Γ is not empty, because G
itself is necessarily a subgroup containing T . Let
K = ⋂_{H∈Γ} H.
Proof. First of all, let S = {g^k : k ∈ Z}. We claim S = ⟨g⟩. It is easy to see that S
is closed under multiplication and inverses; therefore, S is a subgroup of G which
contains g. By definition, this means ⟨g⟩ ⊆ S. To see that S ⊆ ⟨g⟩, consider any
g^k ∈ S. If k = 0, it is clear that g^k ∈ ⟨g⟩, since g^0 = e and all subgroups contain
the identity. If k > 0, again we must have g^k ∈ ⟨g⟩, since ⟨g⟩ is closed under
multiplication and g^k = g ∗ g ∗ . . . ∗ g (k times). Finally, k < 0 means g^k ∈ ⟨g⟩, since by the
previous case g^{−k} ∈ ⟨g⟩ and ⟨g⟩ is closed under inverses. This covers all cases, so
the desired equality has been established.
It remains to show that |S| = ord(g). We first show this for the case when g has
infinite order. Assume that for all n ∈ N, g^n ≠ e. If this is true, then we claim that
for all i, j ∈ Z, i ≠ j implies g^i ≠ g^j. Say, without loss of generality, that i ≥ j.
If g^i = g^j, then g^{i−j} = g^i g^{−j} = g^j (g^j)^{−1} = e, which is impossible if i − j ∈ N.
Therefore i − j = 0, and i = j. This is enough to show that the set S is countably
infinite.
Now for the finite case. Suppose there is an n ∈ N such that ord(g) = n. Our
proof will be complete if we can show that the set S is the same as another set
S′ = {e, g, g^2, . . . , g^{n−1}}, where all elements of S′ are distinct. Suppose there exist
k, l ∈ {0, . . . , n − 1} such that g^k = g^l. Then, as above, let k ≥ l, and note that
0 ≤ k − l < n. As before, we see that g^{k−l} = e. But n is the least positive integer
satisfying g^n = e, and therefore k − l = 0. This proves the elements of S′ are
distinct. Now, pick any g^a ∈ S, a ∈ Z. By the Division Theorem, a = qn + r for
some q ∈ Z, r ∈ {0, . . . , n − 1}. Then g^a = g^{qn+r} = (g^n)^q g^r = g^r, which is in S′, as
desired. Since clearly also S′ ⊆ S, we conclude S′ = S.
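The description of ⟨g⟩ in this proof can be verified directly in a small finite group such as Z_p^×. The following sketch (Python; function names ours, not the paper’s) collects successive powers of g until they repeat:

```python
def cyclic_subgroup(g, p):
    """Return <g> as a subset of Z_p^x by collecting successive powers of g.
    Because the group is finite, the powers eventually cycle back to 1."""
    powers = set()
    x = 1
    while True:
        x = (x * g) % p
        if x in powers:
            return powers
        powers.add(x)

def order(g, p):
    """ord(g) = |<g>|, matching the proposition above."""
    return len(cyclic_subgroup(g, p))
```

For example, in Z_7^× the element 2 generates {1, 2, 4}, while 3 generates the whole group.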
We will return to the concept of subgroup generation in the discussion of cyclic
groups in the next section. For now, we are interested in figuring out if there are
any other special properties that a subgroup must satisfy. In particular, if a group
G is finite, is the order of a subgroup related in some way to the order of the whole
group? In fact, it is: the order of any subgroup must divide the order of the whole
group. This important relationship is called Lagrange’s Theorem. In order to prove
it, we introduce a special type of subset, called a coset, that can be created from a
subgroup.
Definition 2.10. Let H ≤ G. For any element g ∈ G, the left and right cosets
of H in G are the sets gH := {gh : h ∈ H} and Hg := {hg : h ∈ H}, respectively.
One very important property of the left cosets (and also of the right) is that they
create a partition of the group G. That is, any two cosets are either disjoint or
equal, and the union of all cosets is the entire group. The latter statement is very
easy to see, since for all g ∈ G, g ∈ gH. Now, to establish the former assertion, it
is helpful to prove first the following lemma.
Lemma 2.11. Let H ≤ G and g1 , g2 ∈ G. Then g1 H = g2 H if and only if there is
an h ∈ H such that g1 = g2 h.
Proof. We first prove the forward direction. Since e ∈ H, we know g1 ∈ g1H, which
means g1 ∈ g2H since by assumption the two cosets are equal. Thus, by definition,
there is an h ∈ H such that g2h = g1. To prove the other direction, we must show
g1H ⊆ g2H. An identical argument will show g2H ⊆ g1H, proving equality. So,
pick any g1′ ∈ g1H. Then there is an h′ ∈ H such that g1′ = g1h′. We must show
We claim that each coset gH ∈ S has the same number of elements as H itself. If
true, this would imply
∑_{gH∈S} |gH| = km,
and therefore km = n. Thus, once we prove this claim, we will have shown m | n,
and our proof will be complete.
Pick any g ∈ G, and define the function µ_g : H → gH where for all h ∈ H,
µ_g(h) = gh. Clearly, for any gh′ ∈ gH, µ_g(h′) = gh′, so µ_g is surjective. Also,
µ_g(h1) = µ_g(h2) ⇒ gh1 = gh2 ⇒ h1 = h2,
since we can multiply by g^{−1} on the left. This means µ_g is also injective. Then since
µ_g is bijective for any g, all cosets have the same cardinality as H, as desired.
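The partition property, and the equal coset sizes used in this proof, can be observed concretely. This sketch (Python; the helper is ours) lists the distinct left cosets of a subgroup of Z_7^×:

```python
def left_cosets(H, G, mul):
    """Distinct left cosets gH = {g*h : h in H} for g in G, as frozensets."""
    return {frozenset(mul(g, h) for h in H) for g in G}

p = 7
G = set(range(1, p))   # Z_7^x = {1, ..., 6}
H = {1, 2, 4}          # the subgroup generated by 2
cosets = left_cosets(H, G, lambda a, b: (a * b) % p)
```

Here there are exactly two cosets, {1, 2, 4} and {3, 5, 6}; they are disjoint, each has |H| = 3 elements, and their union is all of G, illustrating Lagrange’s Theorem (3 divides 6).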
We can use an earlier result (Proposition 2.9) to prove an interesting corollary
of Lagrange’s Theorem, which will be the last result proven in this section.
Corollary 2.14. If G is a finite group, then for all g ∈ G, g^{|G|} = e.
Proof. From Proposition 2.9, ord(g) = |⟨g⟩| =: n, and from Lagrange’s Theorem,
there exists m ∈ N such that |G| = nm. Then
g^{|G|} = (g^n)^m = e^m = e.
This introduction barely scratches the surface of group theory. However, the
rest of this paper hopes to show that even this relatively small set of results has at
least one interesting application already. We will need only one additional concept:
cyclic groups. They are explained in the next section.
3. Cyclicity and Z_p^×
Henceforth, when discussing elements of Z_p^×, we will usually discard the brackets.
For example, unless otherwise noted, we take 1 to mean [1].
As mentioned above, there are only p − 1 elements in Z_p^×, since we took out the
zero element. Then, by Lagrange’s Theorem (more specifically, by Corollary 2.14),
for any a ∈ Z_p^×, a^{p−1} = 1. This result is more commonly known as Fermat’s Little
Theorem.
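Fermat’s Little Theorem in this form is easy to spot-check. A minimal sketch (Python; ours, not from the paper) verifying a^{p−1} ≡ 1 (mod p) for every a in Z_p^×:

```python
def fermat_little_holds(p):
    """Check a^(p-1) = 1 in Z_p^x for every a in {1, ..., p-1},
    using Python's built-in modular exponentiation pow(a, e, m)."""
    return all(pow(a, p - 1, p) == 1 for a in range(1, p))
```

Note the hypothesis that p is prime matters: for the composite modulus 15, for instance, 2^14 is not congruent to 1 mod 15.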
Our ultimate goal in this section is to prove that Z_p^× is cyclic. Of course, we
must first explain what is meant by this.
Definition 3.3. An element g of a group G is called primitive if hgi = G. A group
G is called cyclic if it contains a primitive element (also called a generator).
Our proof of the cyclicity of Z_p^× follows that of Keith Conrad, in his eponymous
article on the subject [4]. We will need five lemmas. To begin, let us introduce a
useful notation for the number of elements of Z_p^× with a given order.
In other words, N_p(d) returns the number of elements of Z_p^× that have order d.
Recall from our discussion of Lagrange’s Theorem that the order of any element of
Z_p^× is a divisor of p − 1, the order of Z_p^×. This means that the sum of N_p(d) over
all divisors of p − 1 gives precisely the number of elements in Z_p^×:
∑_{d | (p−1)} N_p(d) = p − 1.    (3.5)
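Both the definition of N_p(d) and the counting identity (3.5) can be checked by brute force. A sketch (Python; helper names ours):

```python
def elem_order(a, p):
    """Order of a in Z_p^x: the least k >= 1 with a^k = 1 mod p."""
    x, k = a, 1
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def N(p, d):
    """N_p(d): the number of elements of Z_p^x of order exactly d."""
    return sum(1 for a in range(1, p) if elem_order(a, p) == d)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]
```

For p = 7, the orders 1, 2, 3, 6 are realized by 1, 1, 2, 2 elements respectively, and these counts sum to p − 1 = 6 as (3.5) predicts.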
Lemma 3.6. Let F be a field and let f ∈ F[x] have degree d such that d ≥ 1. If
there is an α ∈ F such that f(α) = 0, then there is a polynomial g ∈ F[x] of
degree d − 1 such that f(x) = (x − α)g(x).
Proof. Write f as
f(x) = a_d x^d + . . . + a_1 x + a_0,    (3.7)
where for all i, a_i ∈ F, and a_d ≠ 0. Now we plug α into this expression to obtain
0 = a_d α^d + . . . + a_1 α + a_0.    (3.8)
Now we subtract Eq. (3.8) from Eq. (3.7). The constant term cancels, and we are
left with
f(x) = a_d(x^d − α^d) + . . . + a_1(x − α).    (3.9)
For each i ∈ {1, . . . , d}, we note that we can factor x^i − α^i as
x^i − α^i = (x − α)(x^{i−1} + x^{i−2}α + . . . + xα^{i−2} + α^{i−1}).
Hence, we can factor out (x − α) from every term in Eq. (3.9), leaving us with an
expression of the desired form f(x) = (x − α)g(x). Note that the coefficients of g
are sums of coefficients of f, and the coefficient of x^{d−1} in g is a_d, which is nonzero.
Thus, g is a polynomial of degree d − 1 with coefficients in F, as required.
With this knowledge, proving the inductive step of the following lemma becomes
much easier.
Lemma 3.10. Let F be a field and let f ∈ F [x] have degree d such that d ≥ 1.
Then f has at most d roots in F .
Proof. We induct on d.
If d = 1, then f has the form f(x) = ax + b, where a ≠ 0. In this case, f(α) = 0
implies α = −a^{−1}b, so f has exactly one root. In particular, we can say it has at
most d roots, and this establishes the base case.
For the inductive step, fix d ∈ N and assume that all polynomials of degree d
over F have at most d roots. Now pick some f ∈ F [x] of degree d + 1. If there is no
β ∈ F satisfying f (β) = 0, then f has no roots in F , and we are done. Otherwise,
f has a root α ∈ F , so by Lemma 3.6, there is a g ∈ F [x] of degree d such that
f (x) = (x − α)g(x). Then, since the field F has no zero divisors, any root of f must
be a root of either (x − α) or g. By inductive hypothesis, g has at most d roots,
and the only root of (x − α) is α itself. Thus, f has at most d + 1 roots over F .
Induction is complete.
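For the field Z_p, the bound in Lemma 3.10 is easy to test numerically. This sketch (Python; ours) counts the roots of x^d − 1 in Z_p, anticipating the corollary below:

```python
def count_roots(p, d):
    """Number of a in Z_p with a^d = 1 (p prime, d >= 1).
    Lemma 3.10 bounds this count by d."""
    return sum(1 for a in range(p) if pow(a, d, p) == 1)
```

For instance, x^3 − 1 has exactly three roots in Z_7 (namely 1, 2, 4), hitting the bound, while x^4 − 1 has only two roots there (1 and 6), staying below it.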
This lemma dealt with general polynomials over a general field. Now, by applying
it to a very specific polynomial over a very specific field, we will be able to learn
something interesting about our group, Z_p^×.
Corollary 3.11. For any d ∈ N, there are at most d distinct elements a in the
group Z_p^× satisfying a^d = 1.
Proof. Consider the polynomial f(a) = a^d − 1 over the field Z_p. From Lemma 3.10,
we know that there are at most d roots of f in Z_p. In particular, then, there are at
most d roots in the set Z_p \ {0}. Restated, this means there are at most d distinct
numbers in Z_p^× satisfying a^d = 1, which makes perfect sense even when we view Z_p^×
It will turn out to be helpful to treat separately the case where a is a multiple of
p; that is why in the definition we specify that a and p are relatively prime. Also,
it is clear that if a, b ∈ N and a ≡ b (mod p), then a is a quadratic residue mod
p if and only if b is also a quadratic residue mod p. Therefore, we can completely
describe the quadratic residues of any prime p by listing the quadratic residues in
the group Z_p^× = {1, . . . , p − 1}. Here is an example of how to identify these numbers.
Example 4.2. Let us consider the prime 5. To figure out which numbers are
quadratic residues mod 5, we simply square a representative of each of the five
equivalence classes mod 5 and see what we get.
x:   0  1  2  3  4
x^2: 0  1  4  4  1
Thus, the quadratic residues mod 5 are 1 and 4. (Remember 0 is not counted.)
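The squaring table in Example 4.2 generalizes directly to any prime. A sketch (Python; the function name is ours) listing the nonzero quadratic residues mod p:

```python
def quadratic_residues(p):
    """The quadratic residues in Z_p^x: the distinct values of x^2 mod p
    as x ranges over the nonzero classes (0 is excluded by convention)."""
    return sorted({(x * x) % p for x in range(1, p)})
```

This reproduces the example above, quadratic_residues(5) giving [1, 4], and for an odd prime it always returns exactly (p − 1)/2 residues.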
Proof. By definition, the quadratic residues mod p are exactly the elements in
{k^2 : k ∈ Z_p^×}. Now, pick any i, j ∈ Z_p^×. Then
The next result is very helpful in proving properties of the Legendre symbol,
and it is important enough that it has its own name. Unfortunately, it is also a
bit tricky to prove. To crack it, we will need to use the main result of the previous
section: the cyclicity of Z_p^×.
Now, we reach the last proposition of this section. For reasons which we will not
delve into here, it is often called the first supplementary law to quadratic reciprocity.
This result is the key to the first step in our proof of Fermat’s two-squares theorem,
as we will see quite soon.
Proposition 4.7. Let p be an odd prime. Then
(−1/p) = 1 if p ≡ 1 (mod 4), and (−1/p) = −1 if p ≡ 3 (mod 4).
Proof. We apply Euler’s lemma:
(−1/p) ≡ (−1)^{(p−1)/2} (mod p).
But, again, both sides can only take on the values -1 or 1, and p ≥ 3, so they are
congruent mod p if and only if they are equal. To conclude, we simply note that the
exponent (p − 1)/2 is even when p ≡ 1 (mod 4), and odd when p ≡ 3 (mod 4).
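Proposition 4.7 can be confirmed numerically using Euler’s lemma exactly as in the proof. A sketch (Python; ours):

```python
def legendre_minus_one(p):
    """(-1/p) for an odd prime p, via Euler's lemma:
    (-1/p) = (-1)^((p-1)/2) mod p, where -1 is represented by p - 1.
    Since both sides are +-1 and p >= 3, congruence mod p is equality."""
    return 1 if pow(p - 1, (p - 1) // 2, p) == 1 else -1
```

Running this over small odd primes reproduces the congruence condition: the value is 1 exactly when p ≡ 1 (mod 4).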
Now we write
(lm)(mp) = (a^2 + b^2)(u^2 + v^2) = (au + bv)^2 + (av − bu)^2,    (5.2)
which we obtained by rearranging the terms in the product (a^2 + b^2)(u^2 + v^2). Now
we notice that au + bv ≡ a^2 + b^2 ≡ 0 (mod m) and av − bu ≡ ab − ba ≡ 0 (mod m).
This means (au + bv)/m ∈ Z and (av − bu)/m ∈ Z, so we can divide Eq. (5.2) by m^2 to obtain
lp = ((au + bv)/m)^2 + ((av − bu)/m)^2,
which expresses lp as a sum of square numbers. But since 0 ≤ l < m and m is
the minimum positive multiple of p which can be expressed as a sum of square
numbers, it must be that l = 0. This in turn implies a = b = 0, which means m | u
and m | v. Then we can rewrite the equation mp = u^2 + v^2 as p = m[(u/m)^2 + (v/m)^2],
which means m | p. This can only be true if m = 1 or m = p, but we already know
that m < p. Thus, m = 1, as desired.
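The theorem just proved is easy to test for small primes. A brute-force sketch (Python; this is a direct search, not the descent argument above):

```python
import math

def sum_of_two_squares(p):
    """Return (x, y) with p = x^2 + y^2 if such a pair exists, else None.
    Searches x up to sqrt(p) and checks whether p - x^2 is a square."""
    for x in range(math.isqrt(p) + 1):
        y2 = p - x * x
        y = math.isqrt(y2)
        if y * y == y2:
            return (x, y)
    return None
```

Over the primes up to 41, a decomposition exists exactly for p = 2 and the primes congruent to 1 mod 4, in agreement with the theorem.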
It might seem like the amount of background preparation in this paper was
disproportionate to the goal, especially given that the only previous result that we
use in this proof is Proposition 4.7, and its role may seem rather minor. On the
contrary, though, it established a crucial first step and provided the bound that
allowed us to finish the proof. It is one thing to know that we can find a k such
that p | (k^2 + 1); it is another thing entirely to know why such a k can be found,
and this is what the background theory explains.
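The existence claim discussed here, that a k with p | (k^2 + 1) can be found whenever p = 2 or p ≡ 1 (mod 4), is what Proposition 4.7 guarantees; a direct search (Python; ours) confirms it for small primes:

```python
def find_k(p):
    """Smallest k in {1, ..., p-1} with p | (k^2 + 1), or None if no such
    k exists. By Proposition 4.7 it exists iff p = 2 or p = 1 mod 4."""
    for k in range(1, p):
        if (k * k + 1) % p == 0:
            return k
    return None
```

For example, find_k(13) returns 5, since 5^2 + 1 = 26 = 2 · 13, while find_k(7) returns None.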
By finding an expression for (−2/p) in terms of congruence conditions for p, our
method could be generalized slightly to find which primes can be expressed in
the form x^2 + 2y^2. However, there are limitations to the strategy we used. As
mentioned in the introduction, the general case of primes of the form x^2 + ny^2 has
been solved, and the solution lies in the realm of class field theory. As a matter
of fact, using the theory of Gaussian integers, one can devise a rather shorter and
cleaner proof of Theorem 5.1 than the one we used above. The article “Sums of
Two Squares” by Pete Clark provides one example [3]. For more information on the
generalizations of this problem, the interested reader is advised to obtain a copy of
the aforementioned book Primes of the Form x^2 + ny^2, by D. A. Cox [5].
References
[1] Babai, Laszlo. “Apprentice Program: Linear Algebra and Combinatorics.” Lecture series, Univ. Chicago Math Dept. VIGRE REU, June–July 2011. <http://people.cs.uchicago.edu/~laci/REU11/appr.html>.
[2] Bennett, Michael. “Homework 2 Solutions.” Univ. British Columbia, Math 313, Section 201, Spring 2009. <http://www.math.ubc.ca/~bennett/Math313/HW2Sol.pdf>.
[3] Clark, Pete L. “Sums of Two Squares.” Course notes, Univ. Georgia, Math 4400/6400, 2009. <http://www.math.uga.edu/~pete/4400twosquares.pdf>.