
MATH 537 Class Notes

Ed Belk

Fall, 2014

1 Week One

1.1 Lecture One

Instructor: Greg Martin, Office Math 212


Text: Niven, Zuckerman & Montgomery
Conventions: N will denote the set of positive integers, and N0 the set of nonnegative integers. Unless
otherwise stated, all variables are assumed to be elements of N.
§1.2 – Divisibility
Definition: Let a, b ∈ Z with a ≠ 0. Then a is said to divide b, denoted a|b, if there exists some c ∈ Z such
that ac = b. If in addition a ∈ N, then a is called a divisor of b.
Properties of Divisibility: For all a, b, c ∈ Z with a ≠ 0, one has:
• If a|b then ±a|±b
• 1|b, b|b, a|0
• If a|b and b|a then a = ±b
• If a|b and a|c, then a|(bx + cy) for any x, y ∈ Z
If we assume that a and b are positive, we also have
• If a|b then a ≤ b
The Division Algorithm: Let a, b ∈ N. Then there exist unique q, r ∈ N0 such that:
1. b = aq + r, and
2. 0 ≤ r < a
Proof : We prove existence first; consider the set

R = {b − an : n ∈ N0 } ∩ N0 .

By the well-ordering axiom, R has a least element r (note that R is nonempty, since b ∈ R), and we define q to be the nonnegative integer such that
b − aq = r. Then b = aq + r and r ≥ 0; moreover, if r ≥ a then one has

0 ≤ r − a = (b − aq) − a = b − a(q + 1) < b − aq = r,

so that r − a ∈ R, contradicting the minimality of r ∈ R, and we are done.

Now, suppose q′ and r′ are such that we have

b = aq + r = aq′ + r′, with 0 ≤ r, r′ < a.

Without loss of generality we may assume that r ≥ r′. Then

r − r′ = (b − aq) − (b − aq′) = a(q′ − q) ⇒ a|(r − r′);

but 0 ≤ r − r′ ≤ r < a, and so the above equation is a contradiction unless r − r′ = 0; then q = q′ as well, and the result is
immediate.

Greatest Common Divisor: Given any two integers a and b not both equal to zero, we define their greatest
common divisor (commonly abbreviated gcd) to be the largest d ∈ N such that d|a and d|b; we write d = (a, b).
Note that because a and b each have only finitely many divisors, the gcd is always well-defined.
Theorem 1.1.1 Let a, b ∈ Z, not both equal to zero. Then:
1. (a, b) = min S, where S = ({ax + by : x, y ∈ Z} ∩ N), and
2. For any c ∈ Z such that c|a and c|b, we have c|(a, b).
The existence of integers x, y so that ax + by = (a, b) as in part (1) is known as Bézout’s identity.
Proof : 1. Let m = min S, with u and v such that m = au + bv, and let g = (a, b); note that m ≤ a. Since g|a
and g|b, we know from the properties of divisibility that g|m and so g ≤ m. Now, if m ∤ a then by the division
algorithm we may write a = mq + r with 0 < r < m, and thus

r = a − mq = a − q(au + bv) = a(1 − qu) + b(−qv) ∈ S,

and we deduce that r ≥ m = min S, a contradiction; thus m|a. In the same fashion we show m|b, and so by
definition m ≤ (a, b) = g, and we are done.
2. If c|a and c|b, then we know c|(ax + by) for every x, y ∈ Z, and in particular for those u, v such that
(a, b) = au + bv, whose existence is guaranteed by part 1.


1.2 Lecture Two

Recall: Bézout’s identity states that (a, b) is the smallest positive integer that may be written ax + by, where
x, y ∈ Z.
Proposition 1.2.1 For a, b ∈ N, one has (ma, mb) = m(a, b).
 
Corollary 1: If d|a and d|b, then (a/d, b/d) = (1/d)(a, b); in particular, (a/(a, b), b/(a, b)) = 1.

Proof : Set g = (a, b), so that we may write


ax + by = g,
for some x, y ∈ Z. Then
mg = (ma)x + (mb)y, thus mg ≥ (ma, mb).
Furthermore, g|a and so mg|ma; similarly mg|mb, thus mg ≤ (ma, mb), and we are done.

Definition: Two integers a and b are called relatively prime (or coprime) if (a, b) = 1.
nb. We observe that (a, b) = 1 if and only if there exist x, y such that ax+by = 1. The corresponding statement
with (a, b) = k > 1 is not, in general, true, however it is the case that

ax + by = k ⇒ (a, b)|k.

Proposition 1.2.2 If (a, n) = (b, n) = 1, then (ab, n) = 1.


Proof : Suppose we have u, v, x, y so that au + nv = bx + ny = 1; then we have

1 = 1 · 1 = (au + nv)(bx + ny) = ab(ux) + n(auy + bvx + nvy),

and the result is immediate.



[Aside: Compare with the analogous result in commutative algebra. If R is a commutative, unital ring and
I, J, K ⊂ R are ideals such that I + K = J + K = R, then IJ + K = R.]
Proposition 1.2.3 If a|c, b|c, and (a, b) = 1, then ab|c. (Note that this is not, in general, true for (a, b) > 1,
e.g. a = b = c = 2.)
Proof : Choose m, n, x, y so that c = am = bn and ax + by = 1. Then

c = cax + cby = (bn)ax + (am)by = ab(nx + my),

and we deduce that ab|c.



Theorem 1.2.4 (Theorem 1.10, Niven) If d|ab and (b, d) = 1, then d|a.
Proof : Exercise.
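nb. If d|a and d|b, then d|(b + ax) for any x ∈ Z. Conversely, if d|a and d|(b + ax), then d|b, since b = (b + ax) − x(a). Hence a and b have the same common divisors as a and b + ax, and in particular (a, b) = (a, b + ax).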
The Euclidean Algorithm: How can we find the gcd of two integers, for example 537 and 105?
By the division algorithm, we have 537 = 5 · 105 + 12, and so by the above note we know (537, 105) = (105, 12).
Repeating this process, we see
105 = 8 · 12 + 9 ⇒ (105, 12) = (12, 9);
12 = 1 · 9 + 3 ⇒ (12, 9) = (9, 3);

9 = 3 · 3 + 0 ⇒ (9, 3) = (3, 0) = 3.
Thus (537, 105) = 3.
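The repeated-division procedure above can be sketched in Python. This is a minimal illustration rather than part of the original notes; the function name euclid_gcd is an arbitrary choice.

```python
def euclid_gcd(a, b):
    """Return (a, b) by repeatedly replacing the pair (a, b) with (b, a mod b)."""
    a, b = abs(a), abs(b)
    while b != 0:
        # If a = b*q + r, then (a, b) = (b, r); this is the note on d | b + ax.
        a, b = b, a % b
    return a


print(euclid_gcd(537, 105))  # 3
```

The loop visits exactly the pairs (537, 105), (105, 12), (12, 9), (9, 3), (3, 0) from the worked example.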
Notation: The least common multiple of a and b is denoted lcm(a, b) or, more commonly, [a, b].
Exercise: Show that (a, b)[a, b] = ab.
§1.3 – Primes
Definition: A natural number n is called prime if it has exactly two divisors. n is called composite if there
exists some d with 1 < d < n such that d|n. The integer n = 1 is neither prime nor composite.
Notation: Unless otherwise stated, p will denote a prime number.
Lemma 1.2.5 (Euclid’s lemma) If p|ab, then p|a or p|b.
Proof : Suppose p ∤ b. Then (p, b) = 1, and so by theorem 1.2.4 we know that p|a.

Theorem 1.2.6 (The Fundamental Theorem of Arithmetic) Every n ∈ N, n ≥ 2 may be written as the product
of primes; moreover this expression is unique up to reordering of the factors.
Proof : (existence) We use strong induction. The case n = 2 is trivial from the definition of a prime, therefore
suppose n > 2. If n is prime we have the trivial factorization n = n, otherwise we may write n = ab, with
1 < a < n and 1 < b < n. By the inductive hypothesis we may write a = p1 p2 · · · pk , b = q1 q2 · · · ql , with each
pi , qj prime, and the result is immediate.
(uniqueness) Let n ∈ N and suppose we have

n = p1 p2 · · · pk = q1 q2 · · · ql , each pi , qj prime.

Since p1 |q1 q2 · · · ql we have by lemma 1.2.5 that p1 |q1 or p1 |q2 · · · ql . Repeating this process as many times as
necessary, we find qt such that p1 |qt , and by relabelling the qj if necessary we will assume t = 1. Since p1 ≠ 1
this implies that p1 = q1 , as q1 has no other factors. We then cancel p1 = q1 on both sides of the equation and
we have
p2 p3 · · · pk = q2 q3 · · · ql .
We apply the same argument to this expression to obtain p2 = q2 , p3 = q3 , and so on; it follows that k = l, and
we are done.


2 Week Two

2.1 Lecture Three

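Doing a linear algebra problem backwards. Consider the augmented matrix

\[ \left(\begin{array}{cc|c} 1 & 0 & 537 \\ 0 & 1 & 105 \end{array}\right); \]

this system clearly has solution (x, y) = (537, 105). Moreover, from basic linear algebra we know that the application
of elementary row operations to this augmented system will not change the solution; therefore, with R1 , R2
respectively denoting the first and second row of the matrix, we observe that (x, y) = (537, 105) is also a solution
to the augmented matrices

\[ \left(\begin{array}{cc|c} 1 & -5 & 12 \\ 0 & 1 & 105 \end{array}\right) \quad (R_1 \to R_1 - 5R_2), \]

\[ \left(\begin{array}{cc|c} 1 & -5 & 12 \\ -8 & 41 & 9 \end{array}\right) \quad (R_2 \to R_2 - 8R_1), \]

\[ \left(\begin{array}{cc|c} 9 & -46 & 3 \\ -8 & 41 & 9 \end{array}\right) \quad (R_1 \to R_1 - R_2), \]

\[ \left(\begin{array}{cc|c} 9 & -46 & 3 \\ -35 & 179 & 0 \end{array}\right) \quad (R_2 \to R_2 - 3R_1). \]

Thus we have the matrix equation

\[ \begin{pmatrix} 9 & -46 \\ -35 & 179 \end{pmatrix} \begin{pmatrix} 537 \\ 105 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix}. \]

The first entry of this equation indicates that 9(537) + (−46)(105) = 3 = (537, 105), while the entries in the
second row of the matrix are −35 = −105/(537, 105) and 179 = 537/(537, 105). This operation is known as the extended
Euclidean algorithm.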
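The row reduction can be mirrored in code. The following Python sketch is an illustration, not taken from the notes; the name extended_gcd and the representation of each row as a triple (x, y, r) satisfying a·x + b·y = r are assumptions made for this example.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == (a, b), for a, b >= 0 not both zero."""
    # Each row (x, y, r) satisfies the invariant a*x + b*y == r.
    row0 = (1, 0, a)
    row1 = (0, 1, b)
    while row1[2] != 0:
        q = row0[2] // row1[2]
        # Row operation row0 -> row0 - q*row1, after which the two rows swap roles.
        row0, row1 = row1, tuple(u - q * v for u, v in zip(row0, row1))
    x, y, g = row0
    return g, x, y


print(extended_gcd(537, 105))  # (3, 9, -46)
```

When the loop ends, the discarded row is (−35, 179, 0), matching the second row of the final matrix.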
Lemma 2.1.1 Let a, b ∈ N and use the division algorithm to write b = aq + r with 0 ≤ r < a. Then a|b if and
only if r = 0.
Proof : If r = 0 then b = aq and we are done. Conversely, if a|b then a|(b − ax) for every x, and in particular a|r, where r = b − aq; since 0 ≤ r < a,
we must have r = 0.

Theorem 2.1.2 (Euclid’s theorem) There are infinitely many prime numbers.
Proof : It suffices to show that every finite list of primes excludes at least one prime number. Let {p1 , p2 , . . . , pk }
be a set of finitely many primes and let
N = p1 p2 · · · pk + 1.
Then N ≥ 2 and so by the fundamental theorem of arithmetic N is the product of primes, so there exists some
prime p such that p|N . Applying the division algorithm with N and any pj yields

N = pj (p1 · · · pj−1 pj+1 · · · pk ) + 1,

which (since 1 < pj ) by lemma 2.1.1 implies that pj ∤ N for any j. Thus we deduce that p ≠ pj for any
j = 1, 2, . . . , k, and therefore that the set of primes {p1 , p2 , . . . , pk } is not exhaustive.


§2.1 – Congruences
Definition: Let m ∈ Z, m ≠ 0. Given a, b ∈ Z, we say that a is congruent to b modulo m, written
a ≡ b mod m, if m|(b − a). For example, we have

53 ≡ 7 mod 23, but 5 ≢ 37 mod 23.

Lemma 2.1.3 For fixed m ≠ 0, “congruence modulo m” is an equivalence relation.


Proof : Clearly a ≡ a mod m because m|0 = a − a, which proves reflexivity. Symmetry is an immediate
consequence of the fact that m|(b − a) ⇔ m|(a − b), and to prove transitivity we observe that

a ≡ b mod m, b ≡ c mod m ⇒ m|(b − a), m|(c − b) ⇒ m|(c − b) + (b − a) = (c − a),

and we are done.



Thus in particular, congruence modulo m (as every equivalence relation) partitions Z into equivalence classes,
called residue classes modulo m. For example, one residue class modulo 23 is the set

{. . . , −39, −16, 7, 30, 53, . . .}.

In general, a residue class modulo m is of the form {a + km : k ∈ Z}. Note in particular that a ≡ b mod m if
and only if a and b have the same remainder when dividing by m.
Lemma 2.1.4 Suppose a ≡ b mod m, c ≡ d mod m. Then:
1. If n|m then a ≡ b mod n,
2. a + c ≡ b + d mod m,
3. ac ≡ bd mod m.
Proof : We prove only (3), as the others are clear from the definitions: since m|(b − a) and m|(d − c), we must have
that m divides c(b − a) + b(d − c) = bd − ac, and the result follows.

The last two parts of lemma 2.1.4 imply further that a − c ≡ b − d mod m, and more generally, if f (X) ∈
Z[X], then f (a) ≡ f (b) mod m whenever a ≡ b mod m. In particular, we have that a^k ≡ b^k mod m for any
k ∈ N.
Question: If j ≡ k mod m, do we have a^j ≡ a^k mod m?
In general, no: some counterexamples include a = 2, m = 3 or a = 2, m = 4. For instance, 1 ≡ 4 mod 3, but 2^1 = 2 while 2^4 = 16 ≡ 1 mod 3.
We have seen that the operations of addition, subtraction, and multiplication behave well with respect to
congruence modulo m; does division? Again, in general the answer is no:

18 ≡ 28 mod 10, but 9 ≢ 14 mod 10,

as we might expect if we were allowed to “divide by 2.”


Theorem 2.1.5 (Theorem 2.3, Niven) We have ax ≡ ay mod m if and only if x ≡ y mod m/(a, m). In particular,
if (a, m) = 1 then
ax ≡ ay mod m ⇔ x ≡ y mod m.

Proof : Suppose ax ≡ ay mod m so that m|a(y − x); then we have m/(a, m) | (a/(a, m))(y − x), and since (m/(a, m), a/(a, m)) = 1
we know that m/(a, m) | (y − x), hence x ≡ y mod m/(a, m). Now, suppose x ≡ y mod m/(a, m), so that m/(a, m) | (y − x). Then
we certainly have a · m/(a, m) | a(y − x), hence (a/(a, m)) · m | a(y − x) and so in particular m|a(y − x), and we are done.

Definition: Given m ∈ Z, m ≠ 0, a complete residue system modulo m is a set containing exactly one
element from each residue class modulo m. For example, with m = 5 we may take any of the sets

{0, 1, 2, 3, 4}, {1, 2, 3, 4, 5}, {−2, −1, 0, 1, 2}, or {−17, 60, 101, 12, −111}.

A reduced residue system is a set of representatives from all residue classes relatively prime to m; continuing
in the same example, we may take

{1, 2, 3, 4} or {537, −7, 1, 99999929}.

2.2 Lecture Four

Recall: A reduced residue system modulo m is a set consisting of exactly one element from each
residue class modulo m whose elements are relatively prime to m; these are called reduced residue classes.
Equivalently, we may take any complete residue system modulo m, and discard all elements d such that
(d, m) > 1.
Example: If m = 10, a complete residue system is given by {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; by discarding all elements
not relatively prime to 10, we obtain the reduced residue system {1, 3, 7, 9}. If m is prime, a reduced residue
system is given by {1, 2, . . . , m − 1}.
Definition: The Euler φ-function (or Euler totient function) is the function which assigns to m ∈ N the
cardinality of a reduced residue system modulo m; that is,

φ(m) = #{1 ≤ i ≤ m : (i, m) = 1}.

For example, φ(10) = 4, and φ(p) = p − 1 for any prime p.
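The counting definition of φ translates directly into code. This Python sketch is only an illustration of the definition (it is a brute-force count, not an efficient method); the name phi is chosen for this example.

```python
from math import gcd


def phi(m):
    """Count the integers 1 <= i <= m with (i, m) = 1."""
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)


print(phi(10))                           # 4
print([phi(p) for p in (2, 3, 5, 7)])    # [1, 2, 4, 6]
```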


Lemma 2.2.1 Let {r1 , r2 , . . . , rφ(m) } be a reduced residue system modulo m and let a ∈ Z with (a, m) = 1.
Then
{ar1 , ar2 , . . . , arφ(m) }
is also a reduced residue system modulo m.
For example, with m = 10, a = 13, we see that {13, 39, 91, 117} = {13 · 1, 13 · 3, 13 · 7, 13 · 9} is a reduced residue
system modulo 10.
Proof : By assumption a and each rj are relatively prime to m, and so each arj is also relatively prime to m.
Moreover, if ari , arj lie in the same residue class, then one has

ari ≡ arj mod m.

By theorem 2.1.5, we may cancel a (which is relatively prime to the modulus) to yield the congruence

ri ≡ rj mod m,

and hence (since we began with a reduced residue system) we know that i = j, and the result is immediate.

Theorem 2.2.2 (Euler’s theorem) If (a, m) = 1, then a^{φ(m)} ≡ 1 mod m.
Proof : Let {r1 , r2 , . . . , rφ(m) } be a reduced residue system modulo m. Then by lemma 2.2.1, the elements
ar1 , ar2 , . . . , arφ(m) are congruent (in some order) to the elements r1 , r2 , . . . , rφ(m) , and therefore

r1 r2 · · · rφ(m) ≡ (ar1 )(ar2 ) · · · (arφ(m) ) mod m
≡ a^{φ(m)} r1 r2 · · · rφ(m) mod m.

Since (r1 r2 · · · rφ(m) , m) = 1, we may cancel it, and the result follows.

Corollary 1: (Fermat’s little theorem) If p is prime and p ∤ a, then a^{p−1} ≡ 1 mod p, and for all a ∈ Z one has
a^p ≡ a mod p.
Corollary 2: Let (a, m) = 1. If e and f satisfy e ≡ f mod φ(m), then a^e ≡ a^f mod m.
For example, 537 ≡ 1 mod 4, and since 4 = φ(10) we have that 3^{537} ≡ 3^1 mod 10.

Proof : Suppose without loss of generality that f ≥ e and write f = e + kφ(m). We have

a^f = a^{e+kφ(m)} = a^e (a^{φ(m)})^k ≡ a^e (1)^k mod m ≡ a^e mod m,

as claimed.
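Euler's theorem and the exponent-reduction corollary can be checked numerically. The sketch below is illustrative only; phi is a brute-force helper and power_mod_reduced is a hypothetical name, while pow with three arguments is Python's built-in modular exponentiation.

```python
from math import gcd


def phi(m):
    """Brute-force Euler totient: count 1 <= i <= m with (i, m) = 1."""
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)


def power_mod_reduced(a, e, m):
    """Compute a^e mod m for (a, m) = 1 by first reducing e modulo phi(m)."""
    if gcd(a, m) != 1:
        raise ValueError("the corollary requires (a, m) = 1")
    return pow(a, e % phi(m), m)


print(pow(3, phi(10), 10))            # 1
print(power_mod_reduced(3, 537, 10))  # 3
print(pow(3, 537, 10))                # 3
```

The last two lines agree, as the corollary predicts for the example with 537 ≡ 1 mod 4.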

Definition: Given a, m ∈ Z with m ≠ 0, we call x ∈ Z a (multiplicative) inverse of a modulo m if
ax ≡ 1 mod m.
Theorem 2.2.3 (Theorem 2.9, Niven) If (a, m) > 1, then a has no inverse modulo m. If (a, m) = 1, then
there exists a unique reduced residue class modulo m which contains all inverses of a. We denote any such
inverse as ā or a^{−1}.
Note that the notation a^{−1} is justified, as for example if we define a^{−k} to be (a^{−1})^k mod m, then we indeed
have (a^k)^{−1} = (a^{−1})^k.
Proof : Let g = (a, m); note that if ax ≡ 1 mod m then ax ≡ 1 mod g, and since g|a this congruence becomes
0x ≡ 1 mod g, a contradiction unless g = 1. Thus with the assumption that g = 1, we first prove uniqueness:
if
ax ≡ 1 mod m and ay ≡ 1 mod m,
then ax ≡ ay mod m, hence (since (a, m) = 1) x ≡ y mod m, as claimed. To show existence, we give two short
proofs:
(1) By Euler’s theorem, we have 1 ≡ a^{φ(m)} mod m ≡ a · a^{φ(m)−1} mod m, so we may take a^{−1} = a^{φ(m)−1}.
(2) Since (a, m) = 1, there exist integers u, v such that au + mv = 1. Taking this equation modulo m yields the
congruence au ≡ 1 mod m, and so we may take a^{−1} = u.


2.3 Lecture Five

Calculating inverses: Suppose we want to calculate the (multiplicative) inverse of 9 modulo 20; note that
this calculation is well-defined, as (9, 20) = 1. We perform the Euclidean algorithm:

20 = 9 · 2 + 2; 9 = 2 · 4 + 1

⇒ 1 = 9 − 4 · 2 = 9 − 4 · (20 − 2 · 9) = 9 · 9 − 4 · 20.
Taking this last equation modulo 20, we see that 9^2 ≡ 1 mod 20, so 9^{−1} ≡ 9 mod 20. The same equation also
tells us that 20^{−1} ≡ −4 ≡ 5 mod 9. One clearly has

20^{−1} ≡ 1 mod 19, 19^{−1} ≡ −1 mod 20,

19^{−1} ≡ 1 mod 9, 9^{−1} ≡ −2 mod 19.
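The back-substitution used to find inverses can be automated. This Python sketch is an illustration under the assumption m ∈ N; the name mod_inverse is chosen for this example. It tracks a coefficient x with x·a ≡ r mod m alongside each remainder r of the Euclidean algorithm.

```python
def mod_inverse(a, m):
    """Return the inverse of a modulo m in the range 0..m-1, assuming (a, m) = 1."""
    # Invariants: old_x * a ≡ old_r (mod m) and x * a ≡ r (mod m).
    old_r, r = a % m, m
    old_x, x = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        raise ValueError("a has no inverse modulo m because (a, m) > 1")
    return old_x % m


print(mod_inverse(9, 20))   # 9
print(mod_inverse(20, 9))   # 5
print(mod_inverse(9, 19))   # 17
```

The outputs are the least nonnegative representatives, so 17 corresponds to the class of −2 modulo 19.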


Definition: Integers m1 , m2 , . . . , mr are called pairwise coprime (or pairwise relatively
prime) if (mi , mj ) = 1 for all i ≠ j. Note that this is stronger than the statement that (m1 , m2 , . . . , mr ) = 1.
For example, (6, 10, 15) = 1, but (6, 10) = 2, (6, 15) = 3, (10, 15) = 5.
Theorem 2.3.1 (Theorem 2.18, Niven; the Chinese remainder theorem) Let m1 , m2 , . . . , mr be pairwise
coprime, and let {a1 , a2 , . . . , ar } be any set of integers. Then there exists a solution x to the system of congruences

x ≡ a1 mod m1 ,

x ≡ a2 mod m2 ,
⋮
x ≡ ar mod mr ,
and moreover the set of all solutions is exactly the residue class of x modulo M = m1 m2 · · · mr .
Proof : For j = 1, 2, . . . , r, let Nj = (m1 m2 · · · mr )/mj , and note that (mj , Nj ) = 1. Therefore we may define bj to be
the inverse of Nj modulo mj , so Nj bj ≡ 1 mod mj . Set

x0 = Σ_{j=1}^{r} Nj bj aj ;

we claim that x0 solves our system. Indeed, each Ni with i ≠ j is congruent to 0 modulo mj , and
so x0 ≡ (Nj bj )aj mod mj ≡ aj mod mj , as claimed. Now, if x ≡ x0 mod M , then in particular for each j we
have
x ≡ x0 mod mj ≡ aj mod mj ,
so x is also a solution. Finally, if y is any solution to our system, then y ≡ aj mod mj ≡ x0 mod mj for every j,
so mj |(y − x0 ). Since the mi are pairwise coprime, we have m1 m2 |(y − x0 ), m1 m2 m3 |(y − x0 ), and so on, until
we obtain M |(y − x0 ), and we are done.
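The construction in the proof is effective, and a direct transcription might look as follows. This is a sketch, not the notes' own code: the name crt is an assumption, the moduli are assumed pairwise coprime and greater than 1, and pow(N, -1, m) (available in Python 3.8 and later) is used for the inverse of N modulo m.

```python
def crt(residues, moduli):
    """Solve x ≡ residues[j] mod moduli[j] for pairwise coprime moduli.

    Returns (x0, M) with 0 <= x0 < M, where M is the product of the moduli.
    """
    M = 1
    for m_j in moduli:
        M *= m_j
    x0 = 0
    for a_j, m_j in zip(residues, moduli):
        N_j = M // m_j
        b_j = pow(N_j, -1, m_j)  # inverse of N_j modulo m_j (Python 3.8+)
        x0 += N_j * b_j * a_j
    return x0 % M, M


print(crt([2, 3, 2], [3, 5, 7]))  # (23, 105)
```

The sample system x ≡ 2 mod 3, x ≡ 3 mod 5, x ≡ 2 mod 7 is an arbitrary illustration; its solutions are exactly the class of 23 modulo 105.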

Remark: If m1 , m2 , . . . , mr are not pairwise coprime, then there may be no solution, or there may be one
residue class of solutions modulo [m1 , m2 , . . . , mr ]. For example, the system

x ≡ 0 mod 6,

x ≡ 1 mod 4,

has no solution, while
x ≡ 0 mod 6,
x ≡ 2 mod 4,
has as its solution the residue class of 6 modulo 12.
Example: Greg steals B boxes of 20 Timbits each. There are an equal number of each of the 9 flavours, and
one extra to fill the last box. In class, he divides the Timbits equally among the 19 students, with 4 leftover
for himself. What is the smallest possible value of B?
Solution: Let t be the total number of Timbits; we have

t ≡ 0 mod 20,
t ≡ 1 mod 9,
t ≡ 4 mod 19.

Set m1 = 20, m2 = 9, m3 = 19; then

N1 = 171, N2 = 380, N3 = 180.

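We need b1 ≡ N1^{−1} mod m1 ≡ (9 · 19)^{−1} mod 20 ≡ (9)^{−1}(19)^{−1} mod 20 ≡ 11 mod 20, from our previous work.
Similarly, b2 ≡ 5 mod 9, b3 ≡ −2 mod 19. Hence

x0 = N1 b1 a1 + N2 b2 a2 + N3 b3 a3 = (171)(11)(0) + (380)(5)(1) + (180)(−2)(4) = 460.

Since the solutions are exactly the integers t ≡ 460 mod 3420, the smallest positive solution is t = 460, and so B = 460/20 = 23.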
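As an independent sanity check on the arithmetic, one can search the multiples of 20 directly. This brute-force Python sketch is only a verification aid; the function name is made up for this example.

```python
def smallest_timbit_count():
    """Search multiples of 20 up to M = 20*9*19 for t ≡ 1 mod 9 and t ≡ 4 mod 19."""
    M = 20 * 9 * 19
    for t in range(20, M + 1, 20):
        if t % 9 == 1 and t % 19 == 4:
            return t
    return None


t = smallest_timbit_count()
print(t, t // 20)  # 460 23
```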

Structural comments: Let Zm = Z/mZ be the set of residue classes modulo m. If d|m, then there is a
well-defined projection map πd : Zm → Zd given by

πd (a mod m) = a mod d.

Note that this map is not well-defined if d ∤ m. Now, let m1 , m2 , . . . , mr be pairwise coprime. We have a
map
π : Zm1 m2 ···mr −→ Zm1 × Zm2 × · · · × Zmr ,
given in each component Zmi by πmi . The Chinese remainder theorem gives a map

ρ : Zm1 × Zm2 × · · · × Zmr −→ Zm1 m2 ···mr

such that π ◦ ρ = id. Since each set is finite, we know that π and ρ are bijections. One can check that:
1. π and ρ respect coprimality, and
2. π and ρ respect multiplication and addition.
Hence, π and ρ are ring isomorphisms. In particular, if Z_m^× is the set of reduced residue classes modulo m,
then
π : (Z_{m1 m2 ···mr})^× −→ Z_{m1}^× × Z_{m2}^× × · · · × Z_{mr}^×

is an isomorphism of multiplicative groups. It follows from this, and the formula for the Euler φ-function,
that
φ(m1 m2 · · · mr ) = φ(m1 )φ(m2 ) · · · φ(mr ).

3 Week Three

3.1 Lecture Six

Suppose n ∈ N has prime factorization


n = p1^{α1} p2^{α2} · · · pr^{αr} ,
with αi > 0 and pi ≠ pj for all i ≠ j. Then, writing mi = pi^{αi} , as discussed last time we have maps

π : Zm1 m2 ···mr −→ Zm1 × Zm2 × · · · × Zmr ,

ρ : Zm1 × Zm2 × · · · × Zmr −→ Zm1 m2 ···mr ,

where π = π_{p1^{α1}} × π_{p2^{α2}} × · · · × π_{pr^{αr}} and ρ is the map given by the Chinese remainder theorem. These maps are
mutual inverses, and moreover are ring isomorphisms.
In particular, these maps respect coprimality, and so their restrictions to their respective multiplicative groups
of units yield mutually inverse group isomorphisms

π̃ : (Z_{m1 m2 ···mr})^× −→ Z_{m1}^× × Z_{m2}^× × · · · × Z_{mr}^× ,

ρ̃ : Z_{m1}^× × Z_{m2}^× × · · · × Z_{mr}^× −→ (Z_{m1 m2 ···mr})^× .

By definition, (Zn )× has cardinality φ(n), and so it follows that

φ(m1 m2 · · · mr ) = φ(m1 )φ(m2 ) · · · φ(mr ).

Thus we are led to compute φ(p^α) for prime p; but any 1 ≤ k ≤ p^α with (p^α , k) > 1 must have
p|k, and so we deduce that exactly the multiples of p are not relatively prime to p^α , hence φ(p^α) = p^α − p^{α−1} =
p^α (1 − 1/p). It follows that

φ(n) = n ∏_{p|n} (1 − 1/p),

with the product running over all prime divisors p of n.
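The product formula gives a way to compute φ(n) from the prime divisors of n. The sketch below is illustrative; it finds the primes by trial division (a choice made for this example, adequate only for small n) and applies the factor (1 − 1/p) in integer arithmetic.

```python
def phi(n):
    """Euler's totient via phi(n) = n * prod over primes p | n of (1 - 1/p)."""
    result = n
    remaining = n
    p = 2
    while p * p <= remaining:
        if remaining % p == 0:
            while remaining % p == 0:
                remaining //= p
            result -= result // p  # multiply result by (1 - 1/p)
        p += 1
    if remaining > 1:
        # What is left is a single prime factor larger than sqrt of the original part.
        result -= result // remaining
    return result


print(phi(10), phi(36), phi(3420))  # 4 12 864
```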


Lemma 3.1.1 Fix m ∈ N, and consider the following statements:
1. x^2 ≡ 1 mod m
2. x^{−1} ≡ x mod m
3. x ≡ ±1 mod m
For any m, one has (1) if and only if (2), and that (3) implies (1). If m is prime, then all three are equivalent.
Proof : The first statement is clear, as is the statement that (3) implies (1). Thus we will assume m is prime;
then one has (1) if and only if m|x^2 − 1 = (x + 1)(x − 1). Thus by Euclid’s lemma we have m|x + 1 or m|x − 1,
and the result is immediate.

We saw in the last lecture that 9^{−1} ≡ 9 mod 20, but clearly 9 ≢ ±1 mod 20. The same is true for 11 ≡
−9 mod 20.
Theorem 3.1.2 (Wilson’s theorem) If p is prime, then (p − 1)! ≡ −1 mod p.

Proof : The cases p = 2, p = 3 are clear by computation. For p > 3, we pair off the numbers {2, 3, . . . , p − 2}
as {a1 , b1 , a2 , b2 , . . . , ak , bk }, where k = (p − 3)/2 and ai bi ≡ 1 mod p. We know that this is well-defined by lemma
3.1.1, and the fact that inverses modulo p are unique. One then has
(p − 1)! = 1 · 2 · · · (p − 1) = 1 · (p − 1) · a1 b1 · · · ak bk
≡ 1 · (p − 1) · 1 · 1 · · · 1 mod p ≡ −1 mod p,
as claimed.
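Wilson's theorem is easy to test numerically for small moduli. The following Python sketch is a check, not a proof; satisfies_wilson is a name invented for this example.

```python
from math import factorial


def satisfies_wilson(n):
    """Return True when (n - 1)! ≡ -1 mod n."""
    return factorial(n - 1) % n == (n - 1) % n


print([n for n in range(2, 20) if satisfies_wilson(n)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```

Every prime below 20 passes, as the theorem asserts. That no composite in this range passes reflects the converse of Wilson's theorem, which is a standard fact but is not proved in these notes.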

§2.2 – Solutions of congruences
How many solutions has
X^4 + 2X^3 + X + 1 ≡ 0 mod 5?
As integers, we have solutions
x ∈ {· · · , −14, −13, −9, −8, −4, −3, 1, 2, 6, 7, 11, 12, · · · }.
As residue classes modulo 5, we have only
x ≡ 1 mod 5 and x ≡ 2 mod 5;
we say that our congruence has only 2 solutions modulo 5.
Definition: Given a polynomial f (X) ∈ Z[X], the number of solutions of f (X) ≡ 0 mod m, denoted σf (m),
is the number of residue classes modulo m which satisfy the congruence; equivalently,
σf (m) = #{1 ≤ x ≤ m : f (x) ≡ 0 mod m}.

Example: Let f (X) = X^2 − 1. We saw that σf (20) ≥ 4, while by lemma 3.1.1 we know that if p is an odd
prime then σf (p) = 2, while σf (2) = 1.
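The quantity σf (m) can be computed by testing one representative of each residue class. This Python sketch is illustrative; the name sigma_f and the convention of listing coefficients from the highest degree down are assumptions of the example.

```python
def sigma_f(coeffs, m):
    """Count residue classes x modulo m with f(x) ≡ 0 mod m.

    coeffs lists the integer coefficients of f from the highest degree down.
    """
    def f_mod(x):
        value = 0
        for c in coeffs:
            value = (value * x + c) % m  # Horner's rule, reduced modulo m
        return value

    return sum(1 for x in range(m) if f_mod(x) == 0)


print(sigma_f([1, 2, 0, 1, 1], 5))  # 2  (X^4 + 2X^3 + X + 1 modulo 5)
print(sigma_f([1, 0, -1], 20))      # 4  (X^2 - 1 modulo 20)
print(sigma_f([1, 0, -1], 7))       # 2
print(sigma_f([1, 0, -1], 2))       # 1
```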
We begin our investigation by studying linear congruences of the form ax ≡ b mod m.
Theorem 3.1.3 (Theorem 2.17, Niven) Let m ∈ N and set f (X) = aX − b, a, b ∈ Z. Set g = (a, m). Then
σf (m) = 0 unless g|b, in which case σf (m) = g.
Proof : If ax ≡ b mod m, then ax ≡ b mod g, i.e. 0x ≡ b mod g, since g|a, and hence we must have g|b. Now,
suppose g|b and write a = αg, b = βg, m = µg. Then
ax ≡ b mod m ⇔ αx ≡ β mod µ,
by theorem 2.1.5. But (α, µ) = 1 by construction, so α^{−1} modulo µ exists, and we have the unique solution
given by x ≡ α^{−1}β mod µ. This yields g = m/µ solutions modulo m, as claimed.

Example: Let m = 100 and g = 5, so that µ = 20. Then x ≡ 14 mod 20 if and only if x ≡ 14, 34, 54, 74, or 94
modulo 100.
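The proof of the theorem is constructive, and a small solver following it might look like this. It is a sketch under the assumption m ∈ N; the function name is hypothetical, and pow(alpha, -1, mu) requires Python 3.8 or later. The congruence 5x ≡ 70 mod 100 is chosen here because it reduces to x ≡ 14 mod 20, the case in the example above.

```python
from math import gcd


def solve_linear_congruence(a, b, m):
    """Return all solutions of a*x ≡ b mod m among 0, 1, ..., m - 1."""
    g = gcd(a, m)
    if b % g != 0:
        return []  # no solutions unless g | b
    alpha, beta, mu = a // g, b // g, m // g
    # Unique solution modulo mu; when mu == 1 every integer is a solution.
    x0 = (pow(alpha, -1, mu) * beta) % mu if mu > 1 else 0
    # The class of x0 modulo mu splits into g classes modulo m.
    return [x0 + k * mu for k in range(g)]


print(solve_linear_congruence(5, 70, 100))  # [14, 34, 54, 74, 94]
print(solve_linear_congruence(2, 1, 4))     # []
```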
Let m have prime factorization m = p1^{e1} p2^{e2} · · · pr^{er} . By the Chinese remainder theorem, the congruence f (x) ≡
0 mod m is equivalent to the system of congruences
f (x) ≡ 0 mod p1^{e1} ,
f (x) ≡ 0 mod p2^{e2} ,
⋮
f (x) ≡ 0 mod pr^{er} .

In particular, this implies that

σf (m) = ∏_{i=1}^{r} σf (pi^{ei}),

and thus it suffices to study polynomial congruences modulo prime powers; this will be the focus of our next
lecture.

3.2 Lecture Seven

Exercise: Prove that the product of any k consecutive integers is a multiple of k!.
Solution: The pigeonhole principle implies that among any k consecutive integers there must be a multiple of 1, of
2, and so on up to k, but this is not quite enough, since these numbers need not be pairwise coprime.
Instead, we may prove it one prime at a time, from which the general case follows. On the other hand, we may
simply use the identity

j(j − 1) · · · (j − k + 1)/k! = j!/(k!(j − k)!) = C(j, k) ∈ Z,

where C(j, k) denotes the binomial coefficient, from which the fact is apparent; granted, the last method is a deus ex machina.
§2.6 – Prime power moduli
Lemma 3.2.1 Let f (X) ∈ C[X] have degree d. Then for any a ∈ C, we have

f (a + h) = f (a) + h f′(a) + h^2 f″(a)/2! + · · · + h^d f^{(d)}(a)/d!.

Proof : Fix a; both expressions above are polynomials in h of degree d, and their zeroth derivatives agree at
h = 0, as do their first derivatives, second, and so on up to the dth derivatives. Thus their difference, which is a
polynomial in h of degree at most d, is divisible by h^{d+1} , which implies that they must, in fact, be equal.

nb. With the notion of a derivative not defined here, we instead will use the formal derivative of a polynomial
or power series, i.e.

if f (X) = Σ_{n=0}^{m} an X^n , then f′(X) = Σ_{n=0}^{m} n an X^{n−1} , m ∈ N0 ∪ {∞}.

Lemma 3.2.2 If f (X) ∈ Z[X], then for any a ∈ Z, k ∈ N, we have that f^{(k)}(a)/k! is an integer.
Proof : Write f (X) = Σ_{n=0}^{d} an X^n , an ∈ Z. Then

f^{(k)}(a)/k! = Σ_{n=0}^{d} an · [n(n − 1) · · · (n − k + 1)/k!] · a^{n−k} ,

where the terms with n < k vanish, and by the exercise we know that n(n − 1) · · · (n − k + 1)/k! ∈ Z.

Theorem 3.2.3 (Hensel’s lemma) Let f (X) ∈ Z[X] and let p^j be a prime power. Suppose there exists a ∈ Z
so that
f (a) ≡ 0 mod p^j and f′(a) ≢ 0 mod p.
Then there exists a unique integer t, 0 ≤ t < p such that f (a + tp^j) ≡ 0 mod p^{j+1} .
Example: Take f (X) = X^2 − 2, a = 4, p^j = 7^1 . Then

f (4) = 16 − 2 ≡ 0 mod 7, f′(4) = 2(4) ≢ 0 mod 7.

It follows that exactly one element of {4, 11, 18, 25, 32, 39, 46} is a root of f (X) modulo 7^2 ; it turns out to be
39.

Note that the residue class a modulo p^j is the union of the p residue classes a + tp^j modulo p^{j+1} , 0 ≤ t < p. The one which
is a root modulo p^{j+1} is called a lift of a.
Proof of Hensel’s lemma: By lemma 3.2.1, we may write

f (a + tp^j) = f (a) + tp^j f′(a) + (tp^j)^2 f″(a)/2! + · · · + (tp^j)^d f^{(d)}(a)/d!.

By lemma 3.2.2 each f^{(k)}(a)/k! is an integer, and p^{j+1} | (p^j)^k for every k ≥ 2, so taking this expression modulo p^{j+1} yields

f (a + tp^j) ≡ f (a) + tp^j f′(a) mod p^{j+1} .

Since f (a) ≡ 0 mod p^j , the right-hand side is congruent to 0 modulo p^{j+1} if and only if

f (a)/p^j ≡ −t f′(a) mod p.

Since f′(a) ≢ 0 mod p, we have that f′(a) is a unit modulo p, and so we find the unique class t to be given
by
t ≡ −(f′(a))^{−1} · f (a)/p^j mod p,
as can be easily verified.

Example: Using the same example from before, we calculate f (a)/p^j = 14/7 = 2, f′(a) = 8 ≡ 1 mod 7, so we ought
to take t = −(1)^{−1}(2) ≡ 5 mod 7, and indeed

f (4 + 5 · 7) = f (39) = 1519 ≡ 0 mod 7^2 .
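The lifting step in the proof can be carried out mechanically. The Python sketch below is an illustration under stated assumptions: polynomials are lists of integer coefficients from the highest degree down, the helper names are invented for this example, and pow(dfa, -1, p) (Python 3.8 and later) supplies the inverse of f′(a) modulo p.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients from highest degree down) at x."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def poly_derivative(coeffs):
    """Formal derivative, using the same coefficient convention."""
    d = len(coeffs) - 1
    return [c * (d - i) for i, c in enumerate(coeffs[:-1])]


def hensel_lift(coeffs, a, p, j):
    """Lift a root a of f modulo p^j to the root a + t*p^j modulo p^(j+1)."""
    fa = poly_eval(coeffs, a)
    dfa = poly_eval(poly_derivative(coeffs), a)
    if fa % p**j != 0 or dfa % p == 0:
        raise ValueError("need f(a) ≡ 0 mod p^j and f'(a) ≢ 0 mod p")
    # t ≡ -(f'(a))^(-1) * f(a)/p^j mod p, as in the proof.
    t = (-(fa // p**j) * pow(dfa, -1, p)) % p
    return (a + t * p**j) % p**(j + 1)


f = [1, 0, -2]                  # X^2 - 2
a2 = hensel_lift(f, 4, 7, 1)    # root modulo 7^2
a3 = hensel_lift(f, a2, 7, 2)   # root modulo 7^3
print(a2, a3)                   # 39 235
```

Repeating the call lifts the root one power of p at a time; here 235^2 − 2 = 55223 = 343 · 161.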

Corollary 1: Given f (X) ∈ Z[X], a prime p, and a ∈ Z with f (a) ≡ 0 mod p and f′(a) ≢ 0 mod p, then for
every j ≥ 2 there exists a unique lift of a to a root of f modulo p^j ; that is, a unique residue class aj mod p^j
such that
f (aj ) ≡ 0 mod p^j and aj ≡ a mod p.

Proof : Exercise. (hint: use induction and Hensel’s lemma)


Remark: The aj of the corollary are given recursively by a1 = a and, for j ≥ 1,

a_{j+1} ≡ aj − f′(aj)^{−1} f (aj) mod p^{j+1} ,

where f′(aj)^{−1} denotes an inverse of f′(aj) modulo p.
nb. The condition f′(a) ≢ 0 mod p is the condition that a is a nonsingular root of f (X) modulo p. As
written, this formula fails for singular roots: consider f (X) = X^2 . Then a = 0 is a root modulo p, and every lift
of a is a root of f modulo p^2 . Similarly, for g(X) = X^2 − p, a = 0 is a root modulo p, but no lifts of a are roots
modulo p^2 . There is a more general version of Hensel’s lemma (theorem 2.24 of Niven) which accommodates
such roots.
Fact: There exist polynomials, such as

(X^2 − 2)(X^2 − 17)(X^2 − 34), or 3X^3 + 4Y^3 + 5Z^3 ,

which have roots modulo m for every m ∈ N, but have no roots over the rationals.

3.3 Lecture Eight

§2.7 – Prime modulus


Definition: Let f (X) = Σ aj X^j , g(X) = Σ bj X^j ∈ Z[X]. We will say that f (X) is congruent to g(X)
modulo m, written f (X) ≡ g(X) mod m, if aj ≡ bj mod m for every j. In other words, f (X) ≡ g(X) mod m
if and only if f (X) and g(X) have the same image in (Z[X])/(m) ≅ (Z/mZ)[X].
Example: Suppose f (X) = 15X^2 + 3X + 8 ∈ Z[X]. We note that deg f = 2 over Z, but deg f = 1 over Z5 ,
and deg f = 0 over Z3 .
Lemma 3.3.1 Let p be prime, a an integer, and f (X) ∈ Z[X]. If f (a) ≡ 0 mod p, then there exists g(X) ∈
Z[X] with deg g = deg f − 1 such that

f (X) ≡ (X − a)g(X) mod p.

Proof : We saw in our last lecture that (with d = deg f )

f (a + h) = f (a) + h f′(a) + h^2 f″(a)/2! + · · · + h^d f^{(d)}(a)/d!.

We set

g(X) = Σ_{j=1}^{d} (f^{(j)}(a)/j!) (X − a)^{j−1} ,

which has integer coefficients by lemma 3.2.2, and we have that

f (X) = f (a) + (X − a)g(X) ≡ (X − a)g(X) mod p.

Note that the leading coefficient of f (X) is f^{(d)}(a)/d! and that deg g = d − 1.

Observe that the primality condition is necessary; indeed, if f (X) = X^2 − 1, then f has roots ±1, but modulo a composite modulus such as 24 we may also
factor f (X) ≡ (X − 5)(X + 5), so that ±5 are roots as well.
Theorem 3.3.2 (Theorem 2.26, Niven) Let f (X) ∈ Z[X], deg f = d modulo p, with p prime. Then f has at
most d roots modulo p.
Proof : We induct on deg f . For deg f = 0 the result is clear, so suppose deg f = d > 0. If f has no roots
modulo p we are done; otherwise, write

f (X) ≡ (X − a)g(X) mod p,

where f (a) ≡ 0 mod p and deg g = d − 1, as guaranteed by lemma 3.3.1. Since p is prime, any root of f (X) modulo p
is a root of X − a or g(X). By the inductive hypothesis, g has at most d − 1 roots modulo p, and X − a has a
single root modulo p, from which we deduce the result.

Example: Consider f (X) = X^p − X with p prime. By Fermat’s little theorem, every residue class modulo p
is a root of f , and by lemma 3.3.1 it follows that

f (X) ≡ X(X − 1)(X − 2) · · · (X − p + 1) mod p.

Comparing coefficients yields some interesting congruences, among which we have in the coefficient of X^{p−1}

0 + 1 + 2 + · · · + (p − 1) ≡ 0 mod p, p > 2,

and in the coefficient of X^{p−2}

Σ_{0≤j<k≤p−1} jk ≡ 0 mod p, p > 3.

Finally, from the coefficient of X we may deduce Wilson’s theorem

(p − 1)! ≡ −1 mod p.

Remark: This example implies that if f (X), g(X) ∈ Z[X] are such that f (a) ≡ g(a) mod p for every a ∈ Z,
then
f (X) − g(X) ≡ h(X)(X^p − X) mod p
for some h(X) ∈ Z[X]. In fact, this condition is also sufficient.
Proposition 3.3.3 Let F (X) be any function (i.e. set map) from Zp to Zp . Then there exists a unique
polynomial g(X) modulo p of degree at most p − 1 such that

F (a) ≡ g(a) mod p for every a ∈ Z.

Proof : We show uniqueness first. If g(X), h(X) both satisfy the condition, then from our remark above we
have that
g(X) − h(X) ≡ q(X)(X^p − X) mod p, some q(X) ∈ Z[X].
Comparing degrees, we see that we must have g ≡ h mod p. For existence, we give two proofs. First of all, if we
set

g(X) = Σ_{a=0}^{p−1} (1 − (X − a)^{p−1}) F (a),

then by Fermat’s little theorem we see that g(a0) ≡ (1 − 0)F (a0) mod p ≡ F (a0) mod p for each a0 ∈ Z.
Alternatively, we observe that there are exactly p^p functions Zp → Zp , and there are exactly p^p polynomials
over Zp of degree at most p − 1. No two of these polynomials give the same function, and it follows that the
two sets must coincide.
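The first existence proof can be tried out numerically. The sketch below evaluates the interpolating sum pointwise rather than expanding it into coefficients, which is a simplification made for this example; the function name and the sample table of values are arbitrary.

```python
def interpolating_polynomial(F, p):
    """Return x -> g(x) mod p, where g(X) = sum over a of (1 - (X - a)^(p-1)) F(a)."""
    def g(x):
        total = 0
        for a in range(p):
            # By Fermat's little theorem, (x - a)^(p-1) is 0 if x ≡ a and 1 otherwise.
            total += (1 - pow(x - a, p - 1, p)) * F(a)
        return total % p
    return g


table = [3, 1, 4, 1, 0]  # an arbitrary function Z_5 -> Z_5
g = interpolating_polynomial(lambda a: table[a % 5], 5)
print([g(x) for x in range(5)])  # [3, 1, 4, 1, 0]
```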

Corollary 1: (Corollary 2.30, Niven) Let p be prime and suppose that d|(p − 1). Then X^d − 1 has exactly d
roots modulo p.
Proof : By theorem 3.3.2 there are at most d roots, so we need only show there are at least d roots. Note
that
X^{p−1} − 1 ≡ (X − 1)(X − 2) · · · (X − p + 1) mod p
has exactly p − 1 roots modulo p. Since d|(p − 1), we have

X^{p−1} − 1 = (X^d − 1)(X^{p−1−d} + X^{p−1−2d} + · · · + X^{2d} + X^d + 1).

The second factor has at most p − 1 − d roots modulo p, and so by the pigeonhole principle X^d − 1 must have
at least d roots modulo p, as claimed.

§2.8 – Primitive roots and power residues
Consider the congruence X^n ≡ 1 mod m; note that any solution a must satisfy (a, m) = 1.
Definition: Given a with (a, m) = 1, the multiplicative order of a modulo m (often called simply the
order of a) is the least positive integer k such that a^k ≡ 1 mod m. One sometimes says that a belongs to the
exponent k modulo m.

Example: Let m = 11, a = 3. We have

3^1 ≡ 3 mod 11, 3^2 ≡ 9 mod 11, 3^3 ≡ 5 mod 11, 3^4 ≡ 4 mod 11, 3^5 ≡ 1 mod 11,

and we see that the order of 3 modulo 11 is 5.


Fact: The order of a modulo m always divides φ(m).
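The definition of multiplicative order suggests a direct search over successive powers. This Python sketch is a naive illustration assuming m ∈ N; the names multiplicative_order and phi are chosen for this example.

```python
from math import gcd


def multiplicative_order(a, m):
    """Return the least k >= 1 with a^k ≡ 1 mod m, assuming (a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("the order is only defined when (a, m) = 1")
    one = 1 % m  # equals 0 when m == 1, so that case terminates too
    k, power = 1, a % m
    while power != one:
        power = (power * a) % m
        k += 1
    return k


def phi(m):
    """Brute-force Euler totient."""
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)


print(multiplicative_order(3, 11))              # 5
print(phi(11) % multiplicative_order(3, 11))    # 0, so the order divides phi(11)
```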

4 Week Four

4.1 Lecture Nine

Lemma 4.1.1 (Lemma 2.31, Niven) a^k ≡ 1 mod m if and only if the order of a modulo m divides k.
Proof : Let h be the order of a modulo m. If h|k, we have k = hq for some q, hence

a^k = a^{hq} = (a^h)^q ≡ 1^q mod m ≡ 1 mod m.

Conversely, if a^k ≡ 1 mod m, we may use the division algorithm to write k = hq + r, 0 ≤ r < h. One then
has
1 ≡ a^k mod m ≡ (a^h)^q a^r mod m ≡ a^r mod m.
Since h is the minimal positive integer such that a^h ≡ 1 mod m, it follows that r = 0, and we are done.

If $(a, m) = 1$, then the order of $a$ modulo $m$ divides $\varphi(m)$.
Lemma 4.1.2 (Lemma 2.33, Niven) If $a$ has order $h$ modulo $m$, then $a^k$ has order $\frac{h}{(h,k)}$ modulo $m$.
For example, the order of $a^2$ modulo $m$ is $\frac{h}{2}$ if $h$ is even, and $h$ if $h$ is odd.
Proof : The following statements about positive integers $j$ are equivalent:
1. $(a^k)^j \equiv 1 \bmod m$
2. $h \mid kj$
3. $\frac{h}{(h,k)} \mid \frac{k}{(h,k)}\, j$
4. $\frac{h}{(h,k)} \mid j$
It follows that the least positive $j$ satisfying (4), and hence (1), is exactly $j = \frac{h}{(h,k)}$.

Remark: The subgroup of $\mathbb{Z}_m^\times$ generated by $a$ is a cyclic group of order $h$. The same proof shows that the smallest positive integer $y$ such that $ky \equiv 0 \bmod h$ is $y = \frac{h}{(h,k)}$.
Lemma 4.1.3 Let $a$ have order $r$ modulo $m$, and let $b$ have order $s$ modulo $m$. Then the order of $ab$ modulo $m$ divides $\frac{rs}{(r,s)} = [r, s]$, and moreover is a multiple of $\frac{rs}{(r,s)^2} = \frac{[r,s]}{(r,s)}$.

In particular (Lemma 2.34, Niven), if $(r, s) = 1$, then the order of $ab$ modulo $m$ is exactly $rs$.
Proof : Let $t$ be the order of $ab$ modulo $m$. Then
$$(ab)^{rs/(r,s)} = (a^r)^{s/(r,s)} (b^s)^{r/(r,s)} \equiv (1)(1) \equiv 1 \bmod m,$$
and it follows that $t \mid \frac{rs}{(r,s)}$. We also have
$$a^{st} \equiv a^{st} (b^s)^t \equiv ((ab)^t)^s \equiv 1 \bmod m,$$
hence $r \mid st$, so $\frac{r}{(r,s)} \mid \frac{s}{(r,s)}\, t$ and therefore $\frac{r}{(r,s)} \mid t$. By a symmetric argument we may show that $\frac{s}{(r,s)} \mid t$, and since $\left(\frac{r}{(r,s)}, \frac{s}{(r,s)}\right) = 1$ it follows that $\frac{rs}{(r,s)^2} \mid t$.

Definition: An integer $a$ is called a primitive root modulo $m$ if it has order $\varphi(m)$ modulo $m$. In this case, $\mathbb{Z}_m^\times$ is the cyclic group of order $\varphi(m)$.
Proposition 4.1.4 If $m$ has a primitive root, then it has exactly $\varphi(\varphi(m))$ primitive roots.
Proof : Let $g$ be a primitive root modulo $m$. Then we have a reduced residue system modulo $m$ given by $\{g, g^2, \ldots, g^{\varphi(m)}\}$. By lemma 4.1.2, the order of $g^j$ modulo $m$ is exactly $\frac{\varphi(m)}{(j, \varphi(m))}$, which equals $\varphi(m)$ exactly when $(j, \varphi(m)) = 1$. There are exactly $\varphi(\varphi(m))$ such residue classes, and we are done.

Lemma 4.1.5 (Lemma 2.35, Niven) Let $p, q$ be primes and let $r \in \mathbb{N}$ be such that $q^r \mid (p - 1)$. Then there are $q^r - q^{r-1}$ residue classes of order $q^r$ modulo $p$.
Proof : The order of $a$ modulo $p$ divides $q^r$ if and only if $a^{q^r} \equiv 1 \bmod p$. This congruence has exactly $q^r$ solutions by corollary 1 of proposition 3.3.3. The order of $a$ modulo $p$ divides $q^{r-1}$ if and only if $a^{q^{r-1}} \equiv 1 \bmod p$, which has exactly $q^{r-1}$ solutions. The result is now immediate.

Theorem 4.1.6 (Theorem 2.36, Niven) Every prime $p$ has a primitive root.
Proof : If $p = 2$ the result is immediate, so assume $p$ is odd and write $p - 1$ in its prime factorization
$$p - 1 = q_1^{r_1} q_2^{r_2} \cdots q_k^{r_k}.$$
For each $1 \le j \le k$, let $a_j$ be some integer of order $q_j^{r_j}$ modulo $p$, whose existence is guaranteed by lemma 4.1.5. Since $(q_i^{r_i}, q_j^{r_j}) = 1$ for all $i \ne j$, we have by lemma 2.34 of Niven that $a_1 a_2$ has order $q_1^{r_1} q_2^{r_2}$ modulo $p$, that $a_1 a_2 a_3$ has order $q_1^{r_1} q_2^{r_2} q_3^{r_3}$ modulo $p$, and continuing in this fashion, we eventually see that $a_1 a_2 \cdots a_k$ has order $p - 1$ modulo $p$, as claimed.


4.2 Lecture Ten

Example: Modulo 5, the reduced residue classes are 1, 2, 3, and 4, with respective orders 1, 4, 4, and 2; we see that 2 and 3 are the $\varphi(\varphi(5)) = 2$ primitive roots modulo 5. What are the primitive roots modulo 25? Exactly
$$\{2, 3, 8, 12, 13, 17, 22, 23\}.$$
Note that there are $8 = \varphi(\varphi(25))$ of them, and that all are also primitive roots modulo 5. In fact, we may lift any primitive root modulo $p$ to $p - 1$ primitive roots modulo $p^2$, and for $j \ge 2$, any primitive root modulo $p^j$ lifts to exactly $p$ primitive roots modulo $p^{j+1}$.
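The lists in this example can be reproduced by brute force. The sketch below is only an illustration for small moduli; the helper names `order`, `phi` and `primitive_roots` are our own.

```python
from math import gcd

def order(a: int, m: int) -> int:
    """Multiplicative order of a modulo m (assumes m >= 2, gcd(a, m) == 1)."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def phi(m: int) -> int:
    """Euler's phi function by direct counting."""
    return sum(1 for x in range(1, m + 1) if gcd(x, m) == 1)

def primitive_roots(m: int) -> list[int]:
    """All residues in [1, m) whose order modulo m equals phi(m)."""
    target = phi(m)
    return [a for a in range(1, m) if gcd(a, m) == 1 and order(a, m) == target]

print(primitive_roots(5))
print(primitive_roots(25))
print(len(primitive_roots(25)) == phi(phi(25)))
```

An empty list, as for `primitive_roots(8)`, means the modulus has no primitive root.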
Proposition 4.2.1 For $n \ge 1$, we have
$$\sum_{d \mid n} \varphi(d) = n.$$
Proof : The fractions $\{\frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n}\}$ are not all in lowest terms; when we reduce them, we may consider their denominators. For every divisor $d$ of $n$, exactly $\varphi(d)$ of these fractions have denominator $d$; indeed, these fractions are exactly
$$\left\{ \frac{k(n/d)}{n} : 1 \le k \le d,\ (k, d) = 1 \right\}.$$
Since there are exactly $n$ fractions in our original set, the result follows.
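The counting argument in this proof can be watched directly for a small $n$. The following sketch (function names are ours) reduces the fractions $k/n$ with Python's `fractions.Fraction` and tallies the denominators; each divisor $d$ should appear exactly $\varphi(d)$ times.

```python
from collections import Counter
from fractions import Fraction
from math import gcd

def phi(d: int) -> int:
    """Euler's phi function by direct counting."""
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)

def denominator_counts(n: int) -> Counter:
    """Tally the reduced denominators of 1/n, 2/n, ..., n/n."""
    return Counter(Fraction(k, n).denominator for k in range(1, n + 1))

counts = denominator_counts(12)
for d in sorted(counts):
    print(d, counts[d], phi(d))  # divisor, number of fractions, phi(d)
```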

Alternative proof of the existence of primitive roots modulo p: We use strong induction to find the number of elements of order $k$ modulo $p$, namely $\varphi(k)$ if $k \mid (p - 1)$, and 0 if $k \nmid (p - 1)$. The case $k = 1$ is trivial. For $k > 1$, $k \mid (p - 1)$, we first note that
$$\varphi(k) + \sum_{d \mid k,\ d < k} \varphi(d) = \sum_{d \mid k} \varphi(d) = k.$$
Since $p$ is prime, there are exactly $k$ solutions to the congruence $x^k \equiv 1 \bmod p$, which are exactly those $x$ modulo $p$ with order dividing $k$. This, again, is exactly the sum
$$\#\{x : \mathrm{ord}_p(x) = k\} + \sum_{d \mid k,\ d < k} \#\{x : \mathrm{ord}_p(x) = d\},$$
where $\mathrm{ord}_p(x)$ denotes the order of $x$ modulo $p$; the result is now immediate.

Lemma 4.2.2 If $d \mid n$, then for any $a$ with $(a, n) = 1$, the order of $a$ modulo $d$ divides the order of $a$ modulo $n$.
Proof : If $\mathrm{ord}_n(a) = h$, then $a^h \equiv 1 \bmod n$, so $a^h \equiv 1 \bmod d$, and the claim follows from lemma 4.1.1.

Proposition 4.2.3 If $g$ is a primitive root modulo $p^r$ with $r \ge 2$, then
$$g^{p^{r-2}(p-1)} \not\equiv 1 \bmod p^r.$$
Moreover, the converse holds if $g$ is a primitive root modulo $p^{r-1}$.

Proof : If $g$ is a primitive root modulo $p^r$, then
$$\mathrm{ord}_{p^r}(g) = \varphi(p^r) = p^{r-1}(p - 1) > p^{r-2}(p - 1),$$
from which it follows that
$$g^{p^{r-2}(p-1)} \not\equiv 1 \bmod p^r.$$
Now, suppose that $g$ is a primitive root modulo $p^{r-1}$ and that
$$g^{p^{r-2}(p-1)} \not\equiv 1 \bmod p^r.$$
The order of $g$ modulo $p^r$ divides $\varphi(p^r) = p^{r-1}(p - 1)$, and by lemma 4.2.2 must be a multiple of $\varphi(p^{r-1}) = p^{r-2}(p - 1)$. Since $\mathrm{ord}_{p^r}(g) \ne p^{r-2}(p - 1)$ by assumption, we deduce the result.

Theorem 4.2.4 Primitive roots exist modulo $p^2$ for any prime $p$.
Proof : Let $g$ be a primitive root modulo $p$ and consider the lifts $g + tp$ modulo $p^2$, $0 \le t \le p - 1$. We claim that all but one of these lifts are primitive roots modulo $p^2$.
Indeed, by proposition 4.2.3 it suffices to show that exactly one lift satisfies
$$(g + tp)^{p-1} \equiv 1 \bmod p^2.$$
Let $f(X) = X^{p-1} - 1$. Then $g$ is a root of $f(X)$ modulo $p$, and
$$f'(g) = (p - 1)g^{p-2} \not\equiv 0 \bmod p.$$
Thus $g$ is a nonsingular root of $f$ modulo $p$, and so by Hensel's lemma exactly one lift of $g$ is a root of $f$ modulo $p^2$; every other such lift must then yield a primitive root.
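As a small numerical illustration of this proof, the sketch below lists, for a primitive root $g$ modulo $p$, the lifts $g + tp$ that satisfy $(g + tp)^{p-1} \equiv 1 \bmod p^2$. The function name is our own; Hensel's lemma predicts a list of length one.

```python
def non_primitive_lifts(g: int, p: int) -> list[int]:
    """Lifts g + t*p (0 <= t < p) with (g + t*p)**(p-1) ≡ 1 (mod p**2)."""
    modulus = p * p
    return [g + t * p for t in range(p) if pow(g + t * p, p - 1, modulus) == 1]

# 2 and 3 are the primitive roots modulo 5; each has exactly one "bad" lift modulo 25.
print(non_primitive_lifts(2, 5))
print(non_primitive_lifts(3, 5))
```

The two excluded lifts are exactly the residues missing from the list of primitive roots modulo 25 given earlier in this lecture.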

Lemma 4.2.5 If $g$ is a primitive root modulo $p^2$, then it is also a primitive root modulo $p$.
Proof : If $a^k \equiv 1 \bmod p$, then
$$a^{pk} - 1 = (a^k - 1)\bigl((a^k)^{p-1} + (a^k)^{p-2} + \cdots + a^k + 1\bigr).$$
Both factors are multiples of $p$, so it follows that $a^{pk} \equiv 1 \bmod p^2$. In particular, if $g$ is a primitive root modulo $p^2$, then $g^{pk} \not\equiv 1 \bmod p^2$ for $k = 1, 2, \ldots, p - 2$. Hence $g^k \not\equiv 1 \bmod p$ for $1 \le k \le p - 2$, and it follows that the order of $g$ modulo $p$ is $p - 1$.

Next, we will consider primitive roots modulo $p^r$ for $r \ge 3$. No more degenerate cases arise here, except when $p = 2$. In this case, there are no primitive roots modulo $2^r$ for any $r \ge 3$.
4.3 Lecture Eleven

Theorem 4.3.1 Let $p$ be an odd prime and let $r \ge 2$. Then any primitive root modulo $p^2$ is a primitive root modulo $p^r$.
Proof : We induct on $r$. The case $r = 2$ is trivial, so for $r \ge 2$ assume $g$ is a primitive root modulo $p^r$; we will show that $g$ is a primitive root modulo $p^{r+1}$.
Indeed, by proposition 4.2.3 we have that
$$g^{p^{r-2}(p-1)} \not\equiv 1 \bmod p^r,$$
and so by the same proposition it suffices to show that $g^{p^{r-1}(p-1)} \not\equiv 1 \bmod p^{r+1}$. By Euler's theorem we have that
$$g^{p^{r-2}(p-1)} \equiv 1 \bmod p^{r-1},$$
so we can write $g^{p^{r-2}(p-1)} = 1 + np^{r-1}$ for some $n \not\equiv 0 \bmod p$. By the binomial theorem we have that
$$g^{p^{r-1}(p-1)} = (1 + np^{r-1})^p = \sum_{k=0}^{p} \binom{p}{k} (np^{r-1})^k,$$
and since $p \mid \binom{p}{k}$ for $2 \le k \le p - 1$, we see that $p^{r+1} \mid \binom{p}{k}(np^{r-1})^k$. In fact we also have this divisibility when $k = p$, and so
$$g^{p^{r-1}(p-1)} \equiv 1 + np^r \not\equiv 1 \bmod p^{r+1},$$
and we are done.

nb. We only use the fact that $p$ is odd in the cancellation of $\binom{p}{2} n^2 p^{2r-2}$.

Lemma 4.3.2 If $r \ge 3$, then the order of every odd integer modulo $2^r$ divides $2^{r-2} = \frac{1}{2}\varphi(2^r)$. In particular, there are no primitive roots modulo $2^r$.
Proof : Again we induct on $r$. We did the case $r = 3$ in the last lecture, and so assuming the claim is true for some $r$ with $r \ge 3$, we have
$$a^{2^{r-2}} \equiv 1 \bmod 2^r$$
for every odd $a$. Then $2^r \mid (a^{2^{r-2}} - 1)$ and $2 \mid (a^{2^{r-2}} + 1)$ by parity, hence
$$2^{r+1} \mid (a^{2^{r-2}} - 1)(a^{2^{r-2}} + 1) = a^{2^{r-1}} - 1,$$
whence $a^{2^{r-1}} \equiv 1 \bmod 2^{r+1}$, as claimed.

nb. The same proof shows that if $a \equiv 5 \bmod 8$, then $2^{\alpha+2} \,\|\, (a^{2^\alpha} - 1)$, where $p^k \,\|\, n$ if and only if $p^k \mid n$ and $p^{k+1} \nmid n$.
Theorem 4.3.3 (Theorem 2.43, Niven) Let $r \ge 3$; then the set $\{\pm 5, \pm 5^2, \ldots, \pm 5^{2^{r-2}}\}$ is a reduced residue system modulo $2^r$. In particular, 5 has order $2^{r-2}$ modulo $2^r$, and the abelian group homomorphism
$$f : \mathbb{Z}_{2^{r-2}} \times \mathbb{Z}_2 \longrightarrow \mathbb{Z}_{2^r}^\times$$
given by $f(x, y) = 5^x (-1)^y$ is an isomorphism.

By way of comparison, note that if $p$ is odd, the map
$$f : \mathbb{Z}_{p^{r-1}(p-1)} \longrightarrow \mathbb{Z}_{p^r}^\times$$
given by $f(x) = g^x$ is an isomorphism for any primitive root $g$ modulo $p^r$.


Proof : The order of 5 modulo $2^r$ divides $2^{r-2}$ by lemma 4.3.2, and so if $2^{r-2}$ is not the order, then the order divides $2^{r-3}$, hence
$$5^{2^{r-3}} \equiv 1 \bmod 2^r.$$
But then $2^r \mid (5^{2^{r-3}} - 1)$, contradicting our previous remark with $\alpha = r - 3$, which gives $2^{r-1} \,\|\, (5^{2^{r-3}} - 1)$. Thus 5 has order $2^{r-2}$ modulo $2^r$, and so the residue classes
$$\{5, 5^2, \ldots, 5^{2^{r-2}}\}$$
are distinct modulo $2^r$, as are the residue classes
$$\{-5, -5^2, \ldots, -5^{2^{r-2}}\}.$$
Finally, $5^k \equiv 1 \bmod 4$, while $-5^k \equiv 3 \bmod 4$, so the two sets above are disjoint and together contain $2^{r-1} = \varphi(2^r)$ classes, and we are done.
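Theorem 4.3.3 is easy to test for small $r$. The sketch below (the function name is ours) collects the residues $\pm 5^k \bmod 2^r$ for $1 \le k \le 2^{r-2}$ and compares them with the odd residues modulo $2^r$.

```python
def signed_powers_of_five(r: int) -> list[int]:
    """Sorted residues of ±5**k modulo 2**r for 1 <= k <= 2**(r-2); assumes r >= 3."""
    m = 2 ** r
    residues = set()
    for k in range(1, 2 ** (r - 2) + 1):
        power = pow(5, k, m)
        residues.add(power)
        residues.add(-power % m)  # Python's % returns a representative in [0, m)
    return sorted(residues)

for r in range(3, 7):
    odd_residues = list(range(1, 2 ** r, 2))
    print(r, signed_powers_of_five(r) == odd_residues)
```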

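We now know the group structure of $\mathbb{Z}_n^\times$ for every $n$. If $n$ has prime factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, then by the Chinese remainder theorem
$$\mathbb{Z}_n^\times \cong \mathbb{Z}_{p_1^{e_1}}^\times \times \mathbb{Z}_{p_2^{e_2}}^\times \times \cdots \times \mathbb{Z}_{p_r^{e_r}}^\times.$$
If $p_i$ is odd, then
$$\mathbb{Z}_{p_i^{e_i}}^\times \cong \mathbb{Z}_{p_i^{e_i - 1}(p_i - 1)},$$
and for $p = 2$ we have
$$\mathbb{Z}_{2^r}^\times \cong \begin{cases} \mathbb{Z}_1 & \text{if } r = 1, \\ \mathbb{Z}_2 & \text{if } r = 2, \text{ and} \\ \mathbb{Z}_{2^{r-2}} \times \mathbb{Z}_2 & \text{if } r \ge 3. \end{cases}$$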

Primitive roots modulo non-prime powers


Note that $\varphi(n)$ is even for every $n \ge 3$. If we can write $n = cd$ with $(c, d) = 1$ and $c, d \ge 3$, then the order of any $a$ modulo $n$ must divide $\frac{1}{2}\varphi(n) = \frac{1}{2}\varphi(c)\varphi(d)$, as we have
$$a^{\varphi(n)/2} = (a^{\varphi(c)})^{\varphi(d)/2} \equiv 1^{\varphi(d)/2} \equiv 1 \bmod c,$$
and similarly
$$a^{\varphi(n)/2} = (a^{\varphi(d)})^{\varphi(c)/2} \equiv 1^{\varphi(c)/2} \equiv 1 \bmod d,$$
since by our assumption $2 \mid \varphi(c)$, $2 \mid \varphi(d)$. Our claim then follows by the Chinese remainder theorem.
The only integers $n$ which do not have such a factorization are powers of 2, or are of the form $n = p^r$ or $n = 2p^r$, where $p$ is an odd prime and $r \ge 1$. Numbers of this form are the only ones which could possibly have primitive roots.
Theorem 4.3.4 (Theorem 2.41, Niven) The moduli that have primitive roots are exactly 1, 2, 4, $p^r$, and $2p^r$, where $p$ is an odd prime and $r \ge 1$.
Proof : Next lecture.
5 Week Five

5.1 Lecture Twelve

Fun fact! If $S(x)$ denotes the set of squarefree numbers $s$ with $s \le x$, then one has
$$\lim_{x \to \infty} \frac{\#S(x)}{x} = \frac{6}{\pi^2}.$$

Recall theorem 4.3.4 from last lecture, and let P R denote the set of moduli which have primitive roots. For
example, modulo 18, we have φ(18) = 6, and indeed a reduced residue system is given by {1, 5, 7, 11, 13, 17},
which have respective order 1, 6, 3, 6, 3, and 2. Thus 5 and 11 are primitive roots modulo 18, and as expected
we find there are 2 = φ(φ(18)) of them.
Similarly, modulo 9 a reduced residue system is given by {1, 2, 4, 5, 7, 8} with respective orders 1, 6, 3, 6, 3, and 2 (note the similarity with $\mathbb{Z}_{18}^\times$), and we have the same result with the primitive roots 2 and 5.

Proof : (of theorem 4.3.4) We need only check that $m = 2p^r$ has primitive roots, the other claims having already been proven. If $\{a_1, a_2, \ldots, a_{\varphi(p^r)}\}$ is a reduced residue system modulo $p^r$, then we claim that
$$\{a_j : 2 \nmid a_j\} \cup \{a_j + p^r : 2 \mid a_j\}$$
is a reduced residue system modulo $2p^r$. Indeed, we see that we have exactly $\varphi(2p^r) = \varphi(2)\varphi(p^r) = \varphi(p^r)$ residue classes, that all are distinct, and since $(a_j, p) = 1$ we have $u, v$ so that $a_j u + pv = 1$; thus writing $x = u$ and $y = v - p^{r-1}u$, we have
$$1 = a_j x + p(y + p^{r-1}x) = (a_j + p^r)x + py \Rightarrow (a_j + p^r, p) = 1,$$
and hence (since $p$ is assumed odd) $a_j + p^r$ is indeed a unit modulo $2p^r$, by the Chinese remainder theorem.
Furthermore, the orders of the elements of the latter set (the lifts of the even $a_j$) do not change, as for $0 < k < \mathrm{ord}_{p^r}(a_j)$ we have
$$(a_j + p^r)^k = \sum_{n=0}^{k} \binom{k}{n} a_j^n p^{r(k-n)} \equiv a_j^k \bmod p^r,$$
which is not congruent to 1 by assumption, thus $(a_j + p^r)^k \not\equiv 1 \bmod 2p^r$. The same argument holds for the odd $a_j$, and we see that one of the elements in our reduced residue system must have order $\varphi(p^r) = \varphi(2p^r)$, which completes the proof.

Remark: When $m$ is odd, we have an isomorphism of groups $\pi : \mathbb{Z}_m^\times \xrightarrow{\ \sim\ } \mathbb{Z}_{2m}^\times$.

Corollary 1: (Corollary 2.42, Niven) Let $m \in PR$ and let $(a, m) = 1$. The congruence $x^n \equiv a \bmod m$ has $d$ solutions if $a^{\varphi(m)/d} \equiv 1 \bmod m$, where $d = (n, \varphi(m))$, and zero solutions otherwise.
Remark: The analogue for $m = 2^r$, $r \ge 3$, is corollary 2.44 in Niven.
Proof : Let $g$ be a primitive root modulo $m$. Choose $j$, $1 \le j \le \varphi(m)$, so that $g^j \equiv a \bmod m$, and note that if $x^n \equiv a \bmod m$ then one must have $(x, m) = 1$. For every such $x$, there exists $k$ so that $g^k \equiv x \bmod m$, and thus it suffices to solve the congruence
$$(g^k)^n \equiv g^j \bmod m$$
for $k$. Since the order of $g$ is $\varphi(m)$, this congruence holds if and only if $kn \equiv j \bmod \varphi(m)$. For fixed $j$, theorem 3.1.3 tells us that there are $d = (n, \varphi(m))$ solutions if $d \mid j$, and none otherwise. But $d \mid j$ if and only if $j = dl$ for some $1 \le l \le \varphi(m)/d$, if and only if $a \equiv g^{dl} \bmod m$.
Finally, this is equivalent to the statement that $a^{\varphi(m)/d} \equiv 1 \bmod m$: if $j = dl$ then $a^{\varphi(m)/d} \equiv g^{\varphi(m)l} \equiv 1 \bmod m$, and conversely $g^{j\varphi(m)/d} \equiv 1 \bmod m$ forces $\varphi(m) \mid j\varphi(m)/d$, that is, $d \mid j$. This completes the proof.
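The solution count in Corollary 1 can be compared against a brute-force count. The sketch below is only a check for small moduli that have primitive roots; the function names are our own.

```python
from math import gcd

def phi(m: int) -> int:
    """Euler's phi function by direct counting."""
    return sum(1 for x in range(1, m + 1) if gcd(x, m) == 1)

def count_solutions(n: int, a: int, m: int) -> int:
    """Number of x modulo m with x**n ≡ a (mod m), by exhaustive search."""
    return sum(1 for x in range(m) if pow(x, n, m) == a % m)

def predicted_count(n: int, a: int, m: int) -> int:
    """Count predicted by the corollary (m must have a primitive root, gcd(a, m) == 1)."""
    d = gcd(n, phi(m))
    return d if pow(a, phi(m) // d, m) == 1 else 0

m = 18  # 18 = 2 * 3**2 has primitive roots
for a in (1, 5, 7, 11, 13, 17):
    print(a, count_solutions(3, a, m), predicted_count(3, a, m))
```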

Corollary 2: (Corollary 2.38, Niven; Euler's criterion): Let $p$ be an odd prime. If $p \nmid a$, the congruence $X^2 \equiv a \bmod p$ has two solutions if $a^{\frac{p-1}{2}} \equiv 1 \bmod p$, and no solutions otherwise. There is one solution if $p \mid a$.
Definition: The Carmichael lambda function, denoted $\lambda(m)$, is the smallest exponent $e \in \mathbb{N}$ such that $a^e \equiv 1 \bmod m$ for every $(a, m) = 1$.
Remark: We know $\lambda(m) \mid \varphi(m)$, and $\lambda(m) = \varphi(m)$ if and only if $m \in PR$. Moreover, as seen last week, if $m \notin PR$ then $\lambda(m) \le \frac{\varphi(m)}{2}$. By the Chinese remainder theorem,
$$\lambda(p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}) = [\lambda(p_1^{e_1}), \lambda(p_2^{e_2}), \ldots, \lambda(p_r^{e_r})].$$
For odd primes, we have $\lambda(p^r) = p^{r-1}(p - 1)$, which also holds for $p = 2$ and $r \le 2$. For $r \ge 3$, one has instead $\lambda(2^r) = 2^{r-2}$. Group theoretically, $\lambda(m)$ is the exponent of the group $\mathbb{Z}_m^\times$.
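For small moduli, $\lambda(m)$ can be computed straight from the definition and compared with the formulas in the remark. The following is a brute-force sketch (the function name is ours); it is not an efficient method, and `math.lcm` assumes Python 3.9 or later.

```python
from math import gcd, lcm

def carmichael_lambda(m: int) -> int:
    """Smallest e >= 1 with a**e ≡ 1 (mod m) for every unit a; assumes m >= 2."""
    units = [a for a in range(1, m) if gcd(a, m) == 1]
    e = 1
    while any(pow(a, e, m) != 1 for a in units):
        e += 1
    return e

print(carmichael_lambda(8), carmichael_lambda(16), carmichael_lambda(32))
print(carmichael_lambda(15), lcm(carmichael_lambda(3), carmichael_lambda(5)))
```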

Definition: A base-$b$ pseudoprime is a composite number $m$ such that $b^{m-1} \equiv 1 \bmod m$.

For example, we may take $b = 2$, $m = 341 = 11 \cdot 31$; then
$$2^{10} = 1024 = 3 \cdot 341 + 1,$$
and so $2^{341-1} = (2^{10})^{34} \equiv 1^{34} \equiv 1 \bmod 341$. Thus 341 is a base-2 pseudoprime. This notion gives rise to the Fermat test for primality: if $b^{m-1} \not\equiv 1 \bmod m$, then $m$ is composite. For example, with $m = 341$, $b = 3$, we have
$$3^{341-1} \equiv 56 \not\equiv 1 \bmod 341,$$
and it follows that 341 is not prime.
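The Fermat test is a one-line computation with modular exponentiation. The sketch below uses Python's built-in three-argument `pow`; the function name `fermat_test` is our own.

```python
def fermat_test(m: int, b: int) -> bool:
    """True if b**(m-1) ≡ 1 (mod m), i.e. base b does NOT prove m composite."""
    return pow(b, m - 1, m) == 1

print(fermat_test(341, 2))  # base 2 is fooled by 341
print(pow(3, 340, 341))     # residue of 3**340 modulo 341
print(fermat_test(341, 3))  # base 3 detects that 341 is composite
```

A result of `True` is inconclusive: it is consistent with $m$ being prime or a base-$b$ pseudoprime.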

5.2 Lecture Thirteen

Recall: Fermat’s test for primality.


Definition: Let $m$ be composite. Then $m$ is called a Carmichael number if $b^{m-1} \equiv 1 \bmod m$ for all $(b, m) = 1$.
For example, we might take $m = 561 = 3 \cdot 11 \cdot 17$. If $(b, m) = 1$, then we have by Euler's theorem
$$b^{561-1} \equiv \begin{cases} (b^2)^{280} \equiv 1 \bmod 3, \\ (b^{10})^{56} \equiv 1 \bmod 11, \\ (b^{16})^{35} \equiv 1 \bmod 17. \end{cases}$$
The Chinese remainder theorem then implies that $b^{560} \equiv 1 \bmod m$.


In 1994, Alford, Granville, and Pomerance showed that there are infinitely many Carmichael numbers, in a paper titled "There are infinitely many Carmichael numbers".
In fact, if 6k + 1, 12k + 1, and 18k + 1 are all prime for some k ∈ N, then their product is a Carmichael number.
For example with k = 1 we get that 1729 is a Carmichael number.
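The definition can be tested exhaustively for small $m$. The sketch below (helper names are ours) checks every base coprime to $m$, so it is only practical for small inputs.

```python
from math import gcd, isqrt

def is_prime(m: int) -> bool:
    """Trial-division primality check, adequate for small m."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))

def is_carmichael(m: int) -> bool:
    """Composite m with b**(m-1) ≡ 1 (mod m) for every b coprime to m."""
    if m < 4 or is_prime(m):
        return False
    return all(pow(b, m - 1, m) == 1 for b in range(2, m) if gcd(b, m) == 1)

k = 1
candidate = (6 * k + 1) * (12 * k + 1) * (18 * k + 1)
print(candidate, is_carmichael(candidate))
print(is_carmichael(561), is_carmichael(341))
```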
§3.1 – Quadratic residues
Most generally, we will investigate congruences of the form $aX^2 + bX + c \equiv 0 \bmod p$, where $p$ is an odd prime and $p \nmid a$.
Completing the square gives
$$4a^2X^2 + 4abX + 4ac \equiv 0 \bmod p \Rightarrow (2aX + b)^2 \equiv b^2 - 4ac \bmod p.$$
Thus we are led to ask when $y^2 \equiv \Delta \bmod p$ (where $\Delta = b^2 - 4ac$ is the discriminant of our polynomial) has a solution. If so, then
$$2aX + b \equiv y \bmod p \Leftrightarrow X \equiv (y - b)(2a)^{-1} \bmod p.$$
We note the obvious analogue of the quadratic formula. Thus it suffices to investigate when $X^2 \equiv a \bmod p$ can be solved. By Euler's criterion, this occurs exactly when
$$a^{\frac{p-1}{2}} \equiv 1 \bmod p, \quad \text{if } p \nmid a.$$

Example: We investigate such congruences modulo 7, where $\frac{p-1}{2} = 3$.

a    ord_7(a)    a^3 mod 7    Solutions of x^2 ≡ a mod 7
0    –           0            x ≡ 0 mod 7
1    1           1            x ≡ 1, 6 mod 7
2    3           1            x ≡ 3, 4 mod 7
3    6           −1           none
4    3           1            x ≡ 2, 5 mod 7
5    6           −1           none
6    2           −1           none
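The table can be regenerated mechanically. The sketch below (function names are ours) evaluates $a^{(p-1)/2} \bmod p$, reported as $-1$, $0$ or $1$, and lists the square roots of $a$ by exhaustive search.

```python
def euler_symbol(a: int, p: int) -> int:
    """a**((p-1)/2) mod p, reported as -1, 0 or 1; assumes p is an odd prime."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def square_roots(a: int, p: int) -> list[int]:
    """All x in [0, p) with x*x ≡ a (mod p)."""
    return [x for x in range(p) if (x * x - a) % p == 0]

p = 7
for a in range(p):
    print(a, euler_symbol(a, p), square_roots(a, p))
```

In every row the number of square roots is one more than the value of the symbol.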

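Definition: If $(a, m) = 1$, then $a$ is called a quadratic residue modulo $m$ if $X^2 \equiv a \bmod m$ has a solution, and a quadratic nonresidue otherwise.
Definition: If $p$ is an odd prime, define the Legendre symbol $\left(\frac{a}{p}\right)$ via
$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \text{ is a quadratic residue modulo } p, \\ -1 & \text{if } a \text{ is a quadratic nonresidue modulo } p, \\ 0 & \text{if } p \mid a. \end{cases}$$
Remark: If $a \equiv b \bmod p$, then $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$. Moreover, the number of solutions of $X^2 \equiv a \bmod p$ is exactly $\left(\frac{a}{p}\right) + 1$.

Theorem 5.2.1 (Theorem 3.1, Niven) If $p$ is an odd prime and $(a, p) = 1$, then $\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \bmod p$.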
Proof : We give two proofs. In the first, we simply use Euler's criterion (this is left as an exercise).
For the second, we observe that if $a$ is a quadratic residue modulo $p$, then we can choose some $z$ such that $z^2 \equiv (-z)^2 \equiv a \bmod p$. We then pair the reduced residue classes modulo $p$ apart from $\pm z$ as $(x_i, y_i)$, with $x_i y_i \equiv a \bmod p$. There are $\frac{p-3}{2}$ such pairs, and by Wilson's theorem
$$-1 \equiv (p - 1)! \equiv z(-z) \prod_{i=1}^{\frac{p-3}{2}} x_i y_i \equiv -a \cdot a^{\frac{p-3}{2}} \equiv -a^{\frac{p-1}{2}} \bmod p,$$
and the result follows. If $a$ is a nonresidue, we repeat the above construction, this time pairing all residue classes $x_i y_i \equiv a \bmod p$, $i = 1, 2, \ldots, \frac{p-1}{2}$, and we are done.

Corollary 1: For any integers $a, b$, we have $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$; in particular, if $(a, p) = 1$ we have $\left(\frac{a^2}{p}\right) = 1$.

In other words, the product of two quadratic residues is a quadratic residue, as is the product of two quadratic
nonresidues. The product of a residue and a nonresidue is a nonresidue – compare this behaviour with that of
the positive and negative integers.

5.3 Lecture Fourteen

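Recall: The Legendre symbol for $p \nmid a$ is defined
$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } x^2 \equiv a \bmod p \text{ has a solution}, \\ -1 & \text{otherwise}. \end{cases}$$
By Euler's criterion, we showed that $a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \bmod p$.
Example: When $a = -1$ and $p$ is odd, we have that
$$\left(\frac{-1}{p}\right) \equiv (-1)^{\frac{p-1}{2}} \bmod p, \quad \text{so} \quad \left(\frac{-1}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1 \bmod 4, \\ -1 & \text{if } p \equiv 3 \bmod 4. \end{cases}$$
So $X^2 \equiv -1 \bmod p$ has two solutions if $p \equiv 1 \bmod 4$, and no solutions if $p \equiv 3 \bmod 4$.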


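nb. For odd primes $p$, we have
$$\prod_{i=\frac{p+1}{2}}^{p-1} i \equiv (-1)^{\frac{p-1}{2}} \prod_{j=1}^{\frac{p-1}{2}} j \equiv (-1)^{\frac{p-1}{2}} \left(\frac{p-1}{2}\right)! \bmod p. \tag{1}$$
In particular, if $p \equiv 1 \bmod 4$ we get
$$\left(\left(\frac{p-1}{2}\right)!\right)^2 \equiv \left(\frac{p-1}{2}\right)! \, (-1)^{\frac{p-1}{2}} \prod_{i=\frac{p+1}{2}}^{p-1} i \equiv (p - 1)! \equiv -1 \bmod p,$$
and hence $x = \left(\frac{p-1}{2}\right)!$ solves $x^2 \equiv -1 \bmod p$.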

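Theorem 5.3.1 (The Law of Quadratic Reciprocity) Let $p \ne q$ be odd primes; then
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$
In other words, $\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right)$ if $p$ or $q \equiv 1 \bmod 4$, and $\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right)$ if $p \equiv q \equiv 3 \bmod 4$. Knowing whether or not $X^2 \equiv p \bmod q$ has solutions is the same as knowing whether or not $X^2 \equiv q \bmod p$ has solutions.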
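Proof : (due to Rousseau, 1991) First, some background. Let $\alpha = \frac{p-1}{2}$, $\beta = \frac{q-1}{2}$. Let
$$F = \left\{ 1 \le k < \frac{pq}{2} : (k, pq) = 1 \right\}$$
be the “first half” of $\mathbb{Z}_{pq}^\times$ and let
$$L = \left\{ (i, j) \in \mathbb{Z}_p^\times \times \mathbb{Z}_q^\times : 1 \le i \le p - 1,\ 1 \le j < \frac{q}{2} \right\}$$
be the “left half” of $\mathbb{Z}_p^\times \times \mathbb{Z}_q^\times$, and let $\pi : \mathbb{Z}_{pq}^\times \to \mathbb{Z}_p^\times \times \mathbb{Z}_q^\times$ be the map given by the Chinese remainder theorem.
One can see that for every $k \in \mathbb{Z}_{pq}^\times$, one has $\pi(k) \in L$ or $-\pi(k) \in L$ (we will write $\pi(k) \in -L$). For each such $k$, choose $\epsilon_k \in \{\pm 1\}$, $i_k \in \{1, 2, \ldots, p - 1\}$, $j_k \in \{1, 2, \ldots, \beta\}$ such that
$$\pi(k) = \epsilon_k (i_k, j_k).$$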

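In particular, if $k \ne k' \in F$, then $\pi(k) \ne \pi(k')$ and $\pi(k) \ne -\pi(k')$. Thus each ordered pair $(i_k, j_k)$ is distinct, and we obtain
$$\prod_{k \in F} (k, k) \equiv \prod_{k \in F} \pi(k) \equiv \prod_{k \in F} \epsilon_k (i_k, j_k) \equiv \left( \prod_{k \in F} \epsilon_k \right) \left( \prod_{(i,j) \in L} (i, j) \right), \tag{2}$$
the calculation taking place in $\mathbb{Z}_p^\times \times \mathbb{Z}_q^\times$ and the congruences taken $(\bmod\ p, \bmod\ q)$.
Now, consider the right-hand side of (2): we have (with the same notation convention)
$$\prod_{(i,j) \in L} (i, j) \equiv \prod_{i=1}^{p-1} \prod_{j=1}^{\beta} (i, j) \equiv \bigl( ((p - 1)!)^\beta, (\beta!)^{p-1} \bigr).$$
From (1), we have that
$$\prod_{i=\beta+1}^{q-1} i \equiv (-1)^\beta \beta! \bmod q,$$
hence $(\bmod\ p, \bmod\ q)$ we have
$$\prod_{(i,j) \in L} (i, j) \equiv \left( ((p - 1)!)^\beta, \left( \beta! \cdot (-1)^\beta \prod_{i=\beta+1}^{q-1} i \right)^{\alpha} \right) \equiv \bigl( ((p - 1)!)^\beta, (-1)^{\alpha\beta} ((q - 1)!)^\alpha \bigr),$$
and finally by Wilson's theorem we obtain
$$\prod_{(i,j) \in L} (i, j) \equiv \bigl( (-1)^\beta, (-1)^{\alpha\beta} (-1)^\alpha \bigr).$$
Thus with $\epsilon = \prod_{k \in F} \epsilon_k$, the right-hand side of (2) becomes
$$\epsilon \bigl( (-1)^\beta, (-1)^{\alpha\beta} (-1)^\alpha \bigr).$$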


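Now, on the left-hand side, we look at the first co-ordinate modulo $p$:
$$\prod_{k \in F} k \equiv \prod_{\substack{1 \le k < \frac{pq}{2}, \\ (pq, k) = 1}} k \equiv \left( \prod_{\substack{1 \le k < \frac{pq}{2}, \\ p \nmid k}} k \right) \left( \prod_{\substack{1 \le k < \frac{pq}{2}, \\ q \mid k}} k \right)^{-1}. \tag{3}$$
The first factor in (3) splits into intervals of length $p - 1$, with one exception, namely the interval ending at $\left\lfloor \frac{pq}{2} \right\rfloor$. Thus modulo $p$ we see
$$\prod_{\substack{1 \le k < \frac{pq}{2}, \\ p \nmid k}} k = \left( \prod_{1 \le k \le p-1} k \right) \left( \prod_{p+1 \le k \le 2p-1} k \right) \cdots \left( \prod_{(\beta-1)p+1 \le k \le \beta p - 1} k \right) \left( \prod_{\beta p + 1 \le k \le \beta p + \alpha} k \right);$$
but $\beta p + \alpha = \left\lfloor \frac{pq}{2} \right\rfloor$, so we see that
$$\prod_{\substack{1 \le k < \frac{pq}{2}, \\ p \nmid k}} k \equiv ((p - 1)!)^\beta \alpha! \bmod p.$$
The second factor of (3) is the inverse of
$$\prod_{\substack{1 \le k < \frac{pq}{2}, \\ q \mid k}} k \equiv q \cdot 2q \cdots \alpha q \equiv q^\alpha \alpha! \equiv \left(\frac{q}{p}\right) \alpha! \bmod p,$$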

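with the last congruence following by Euler's criterion. Thus (3) becomes
$$\prod_{k \in F} k \equiv ((p - 1)!)^\beta \alpha! \left( \left(\frac{q}{p}\right) \alpha! \right)^{-1} \bmod p,$$
which by Wilson's theorem is congruent modulo $p$ to $(-1)^\beta \left(\frac{q}{p}\right)$. The same proof shows
$$\prod_{k \in F} k \equiv (-1)^\alpha \left(\frac{p}{q}\right) \bmod q,$$
and so (2) becomes, with $\epsilon$ the product of the signs as above,
$$\left( (-1)^\beta \left(\frac{q}{p}\right), (-1)^\alpha \left(\frac{p}{q}\right) \right) \equiv \epsilon \bigl( (-1)^\beta, (-1)^{\alpha\beta} (-1)^\alpha \bigr) \quad (\bmod\ p, \bmod\ q).$$
The first co-ordinate tells us that $\left(\frac{q}{p}\right) \equiv \epsilon \bmod p$, and the second that $\left(\frac{p}{q}\right) = (-1)^{\alpha\beta} \epsilon = (-1)^{\alpha\beta} \left(\frac{q}{p}\right)$ (where we have equality rather than congruence, as both sides lie in $\{\pm 1\}$ and $p$, $q$ are odd), hence
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\alpha\beta},$$
as claimed.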


6 Week Six

6.1 Lecture Fifteen


Recall: Last week, we saw that Euler's criterion implies that $\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}}$ for any odd prime $p$. In other words, $x^2 \equiv -1 \bmod p$ has 2 solutions if $p \equiv 1 \bmod 4$, and no solutions if $p \equiv 3 \bmod 4$. There is a single solution if $p = 2$.
Consequently, we see that, for every integer $x$, all of the prime factors of $x^2 + 1$ (other than 2) must be congruent to 1 modulo 4. Similarly, for any $x, k \in \mathbb{Z}$ we have that all prime factors $p$ of $x^2 + k^2$ satisfy
$$p \mid 2k \quad \text{or} \quad p \equiv 1 \bmod 4,$$
since if $p \nmid k$ then $x^2 + k^2 \equiv 0 \bmod p$ implies that $x^2 \equiv -k^2 \bmod p$, hence $(xk^{-1})^2 \equiv -1 \bmod p$ and so $p = 2$ or $p \equiv 1 \bmod 4$. Note that if $p \mid k$, we must have $p \mid x$ as well, so $(x, k) > 1$.
Example: We use quadratic reciprocity to answer the question: Does $x^2 \equiv 55 \bmod 367$ have a solution? Note that 367 is a prime congruent to 3 modulo 4.
To answer this question we compute the Legendre symbol $\left(\frac{55}{367}\right)$: by multiplicativity we have
$$\left(\frac{55}{367}\right) = \left(\frac{5}{367}\right)\left(\frac{11}{367}\right).$$
The law of quadratic reciprocity then implies that
$$\left(\frac{5}{367}\right) = \left(\frac{367}{5}\right) = \left(\frac{2}{5}\right) = -1,$$
since the quadratic residues modulo 5 are 1 and 4, and similarly
$$\left(\frac{11}{367}\right) = -\left(\frac{367}{11}\right) = -\left(\frac{4}{11}\right) = -\left(\frac{2}{11}\right)^2 = -1.$$
Thus $\left(\frac{55}{367}\right) = (-1)(-1) = 1$, and we see that 55 is a quadratic residue modulo 367. The theorem is non-constructive, but one may check that $(\pm 34)^2 \equiv 55 \bmod 367$.
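We see from this example that one algorithm for calculating $\left(\frac{a}{p}\right)$ is given by:
1. Factor $a$ completely, $a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$.
2. Use multiplicativity and periodicity:
$$\left(\frac{a}{p}\right) = \left(\frac{p_1}{p}\right)^{e_1} \left(\frac{p_2}{p}\right)^{e_2} \cdots \left(\frac{p_k}{p}\right)^{e_k}.$$
3. Use the law of quadratic reciprocity.
4. If not finished, return to 1.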
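Theorem 6.1.1 (Theorem 3.3, Niven) If $p$ is an odd prime, then
$$\left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}};$$
that is,
$$\left(\frac{2}{p}\right) = \begin{cases} 1 & \text{if } p \equiv \pm 1 \bmod 8, \\ -1 & \text{if } p \equiv \pm 3 \bmod 8. \end{cases}$$
The proof is not given here.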
§3.3 – The Jacobi symbol
Let $p_1, p_2, \ldots, p_k$ be odd primes (not necessarily distinct), and let $Q$ be their product. The Jacobi symbol $\left(\frac{a}{Q}\right)$ is defined
$$\left(\frac{a}{Q}\right) = \prod_{j=1}^{k} \left(\frac{a}{p_j}\right),$$
where the symbols on the right are Legendre symbols.


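Example: We compute the Jacobi symbol $\left(\frac{8}{15}\right)$. We have
$$\left(\frac{8}{15}\right) = \left(\frac{8}{3}\right)\left(\frac{8}{5}\right) = \left(\frac{2}{3}\right)\left(\frac{3}{5}\right) = (-1)(-1) = 1.$$
Note that although the Jacobi symbol $\left(\frac{8}{15}\right)$ is 1, the congruence $x^2 \equiv 8 \bmod 15$ has no solution, as $x^2 \equiv 2 \bmod 3$ hasn't any. However, we can say that, if $\left(\frac{a}{Q}\right) = -1$, then $x^2 \equiv a \bmod Q$ has no solutions.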
Our example shows that the converse is false; why, then, define the Jacobi symbol at all? There are several
reasons, chief among which are
1. It agrees with the Legendre symbol when Q is prime, and
2. It is easy to compute without factoring any integers.
The first of these assertions is clear, but the second is not yet.
Properties of the Jacobi symbol
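• It is totally multiplicative in both arguments; that is, if $Q$ and $R$ are odd positive integers, then for any $a, b$ we have
$$\left(\frac{ab}{Q}\right) = \left(\frac{a}{Q}\right)\left(\frac{b}{Q}\right), \qquad \left(\frac{a}{QR}\right) = \left(\frac{a}{Q}\right)\left(\frac{a}{R}\right).$$
• It is periodic in the top argument with period $Q$, i.e. if $a \equiv b \bmod Q$ then $\left(\frac{a}{Q}\right) = \left(\frac{b}{Q}\right)$.
The second property is immediate if $Q$ is squarefree, and if not then we write $Q = Q'S$ with $Q'$ squarefree and $S$ a perfect square, and we have that
$$\left(\frac{a}{Q}\right) = \left(\frac{a}{Q'}\right)\left(\frac{a}{S}\right) = \left(\frac{a}{Q'}\right)\left(\frac{a}{\sqrt{S}}\right)^2 = \left(\frac{a}{Q'}\right),$$
the last equality holding when $(a, S) = 1$.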

Before proceeding, we first record the following


Lemma 6.1.2 If $b_1, b_2, \ldots, b_k$ are odd, then
$$\sum_{j=1}^{k} \frac{b_j - 1}{2} \equiv \frac{b_1 b_2 \cdots b_k - 1}{2} \bmod 2.$$
Proof : If $k = 2$, then
$$\frac{b_1 b_2 - 1}{2} - \left( \frac{b_1 - 1}{2} + \frac{b_2 - 1}{2} \right) = \frac{(b_1 - 1)(b_2 - 1)}{2} \equiv 0 \bmod 2,$$
and the general case follows by induction (exercise).


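Theorem 6.1.3 (Theorem 3.7, Niven) If $Q > 0$ is odd, then the Jacobi symbol $\left(\frac{-1}{Q}\right)$ equals
$$(-1)^{\frac{Q-1}{2}} = \begin{cases} 1 & \text{if } Q \equiv 1 \bmod 4, \\ -1 & \text{if } Q \equiv 3 \bmod 4. \end{cases}$$
Proof : Since square factors of $Q$ do not affect the Jacobi symbol (as illustrated above), we may assume without loss of generality that $Q = p_1 p_2 \cdots p_k$ is squarefree. Then by lemma 6.1.2 we have that
$$\frac{Q - 1}{2} \equiv \frac{p_1 - 1}{2} + \frac{p_2 - 1}{2} + \cdots + \frac{p_k - 1}{2} \bmod 2,$$
hence
$$\left(\frac{-1}{Q}\right) = \left(\frac{-1}{p_1}\right)\left(\frac{-1}{p_2}\right) \cdots \left(\frac{-1}{p_k}\right) = (-1)^{\frac{p_1-1}{2}} (-1)^{\frac{p_2-1}{2}} \cdots (-1)^{\frac{p_k-1}{2}} = (-1)^{\frac{Q-1}{2}},$$
as claimed.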


6.2 Lecture Sixteen

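Theorem 6.2.1 (Theorem 3.8, Niven; the law of quadratic reciprocity for Jacobi symbols) Let $P, Q \in \mathbb{N}$ be odd with $(P, Q) = 1$. Then
$$\left(\frac{P}{Q}\right)\left(\frac{Q}{P}\right) = (-1)^{\frac{P-1}{2} \cdot \frac{Q-1}{2}} = \begin{cases} -1 & \text{if } P \equiv Q \equiv 3 \bmod 4, \\ 1 & \text{otherwise}. \end{cases}$$
Note that if $(P, Q) > 1$, we must have $\left(\frac{P}{Q}\right) = 0$.
Proof : Write $P = p_1 p_2 \cdots p_k$, $Q = q_1 q_2 \cdots q_l$, where the $p_i$ and $q_j$ are odd (not necessarily distinct) primes. By multiplicativity, we have
$$\left(\frac{P}{Q}\right) = \prod_{i=1}^{k} \left(\frac{p_i}{Q}\right) = \prod_{i=1}^{k} \prod_{j=1}^{l} \left(\frac{p_i}{q_j}\right),$$
where the factors in the last product are Legendre symbols. The law of quadratic reciprocity (for Legendre symbols) then implies that
$$\left(\frac{P}{Q}\right) = \prod_{i=1}^{k} \prod_{j=1}^{l} \left(\frac{q_j}{p_i}\right) (-1)^{\frac{p_i-1}{2} \cdot \frac{q_j-1}{2}} = \left(\frac{Q}{P}\right) (-1)^{\sum_{i=1}^{k} \sum_{j=1}^{l} \frac{p_i-1}{2} \cdot \frac{q_j-1}{2}}.$$
By lemma 6.1.2 from our last lecture, the exponent of $-1$ satisfies
$$\sum_{i=1}^{k} \sum_{j=1}^{l} \frac{p_i - 1}{2} \cdot \frac{q_j - 1}{2} \equiv \frac{P - 1}{2} \cdot \frac{Q - 1}{2} \bmod 2,$$
hence
$$\left(\frac{P}{Q}\right) = (-1)^{\frac{P-1}{2} \cdot \frac{Q-1}{2}} \left(\frac{Q}{P}\right),$$
as claimed.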

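Application: We calculate the Legendre symbol $\left(\frac{2}{p}\right)$, where $p$ is an odd prime; rather, we will show that the Jacobi symbol $\left(\frac{2}{Q}\right)$ obeys the formula from last lecture, namely
$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q^2-1}{8}} = \begin{cases} 1 & \text{if } Q \equiv \pm 1 \bmod 8, \\ -1 & \text{if } Q \equiv \pm 3 \bmod 8, \end{cases}$$
from which the special case of the Legendre symbol follows. By periodicity in the top argument, we have that
$$\left(\frac{2}{Q}\right) = \left(\frac{2 - Q}{Q}\right) = \left(\frac{-1}{Q}\right)\left(\frac{Q - 2}{Q}\right) = (-1)^{\frac{Q-1}{2}} \left(\frac{Q - 2}{Q}\right).$$
Since $Q$ is odd and positive, we must have that $(Q, Q - 2) = 1$, and so by quadratic reciprocity we see that
$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q-1}{2}} \left(\frac{Q}{Q - 2}\right) (-1)^{\frac{Q-1}{2} \cdot \frac{Q-3}{2}};$$
again, since one of $Q - 1$ and $Q - 3$ must be divisible by 4, we cancel the last factor and obtain
$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q-1}{2}} \left(\frac{Q}{Q - 2}\right) = (-1)^{\frac{Q-1}{2}} \left(\frac{2}{Q - 2}\right).$$
By descent, we obtain
$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q-1}{2}} (-1)^{\frac{Q-3}{2}} \cdots (-1)^3 (-1)^2 \left(\frac{2}{3}\right),$$
and finally since 2 is a quadratic nonresidue modulo 3 we have
$$\left(\frac{2}{Q}\right) = (-1)^{1 + 2 + \cdots + \frac{Q-1}{2}} = (-1)^{\frac{1}{2} \cdot \frac{Q-1}{2} \cdot \frac{Q+1}{2}} = (-1)^{\frac{Q^2-1}{8}},$$
and we are done.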

We can turn this into a general algorithm for computing the Jacobi symbol. Indeed, to compute $\left(\frac{a}{Q}\right)$, we may apply the following steps:
1. Factor $-1$ and any powers of 2 from $a$, leaving $\left(\frac{P}{Q}\right)$ with $P$ an odd positive number.
2. Use quadratic reciprocity and periodicity.
3. If not finished, return to 1.
Note, in particular, that this algorithm doesn't require us to factor any integers (beyond removing powers of 2).
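Example: 53681 is prime and congruent to 1 modulo 4. Is 1311 a quadratic residue modulo 53681?
It suffices to compute the Jacobi symbol, which in the case that $Q$ is an odd prime is exactly the Legendre symbol. Using the algorithm outlined above, we find
$$\left(\frac{1311}{53681}\right) = \left(\frac{53681}{1311}\right) = \left(\frac{-70}{1311}\right) = \left(\frac{-1}{1311}\right)\left(\frac{2}{1311}\right)\left(\frac{35}{1311}\right)$$
$$= (-1)(1)\left(\frac{35}{1311}\right) = -\left(\frac{1311}{35}\right)(-1) = \left(\frac{16}{35}\right) = \left(\frac{4}{35}\right)^2 = 1.$$
So 1311 is indeed a square modulo 53681.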
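The steps above can be sketched in code. The version below is one common way to organise the computation and is an assumption on our part, not the notes' exact procedure: instead of factoring out $-1$, it reduces the top argument to a nonnegative residue, removes factors of 2 using the rule for $\left(\frac{2}{Q}\right)$, and then applies reciprocity. The function name `jacobi` is ours.

```python
def jacobi(a: int, q: int) -> int:
    """Jacobi symbol (a/q) for odd q > 0, via periodicity, the rule for 2, and reciprocity."""
    if q <= 0 or q % 2 == 0:
        raise ValueError("q must be a positive odd integer")
    a %= q                      # periodicity in the top argument
    result = 1
    while a != 0:
        while a % 2 == 0:       # pull out factors of 2: (2/q) = -1 iff q ≡ ±3 (mod 8)
            a //= 2
            if q % 8 in (3, 5):
                result = -result
        a, q = q, a             # reciprocity: sign flips iff both are ≡ 3 (mod 4)
        if a % 4 == 3 and q % 4 == 3:
            result = -result
        a %= q
    return result if q == 1 else 0  # q > 1 here means gcd(a, q) > 1

print(jacobi(1311, 53681))
print(jacobi(55, 367))
print(jacobi(8, 15))
```

As in the notes, no factoring is needed apart from removing powers of 2.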
Here we will give an outline of a more “traditional” proof of the law of quadratic reciprocity, nearer to the proof
given in Niven. We start with a preliminary result.
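Lemma 6.2.2 (Gauss's lemma) Let $p$ be an odd prime and let
$$F = \left\{ 1, 2, \ldots, \frac{p-1}{2} \right\}, \qquad -F = \left\{ \frac{p+1}{2}, \frac{p+3}{2}, \ldots, p - 1 \right\}.$$
Given $a$ with $(a, p) = 1$, let $n = \#\{k \in F : ak \bmod p \in -F\}$. Then $\left(\frac{a}{p}\right) = (-1)^n$.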
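Gauss's lemma can be checked against Euler's criterion for small primes. In the sketch below (function names are ours), `gauss_count` computes the integer $n$ of the lemma directly from its definition.

```python
def gauss_count(a: int, p: int) -> int:
    """Number of k in {1, ..., (p-1)/2} whose multiple a*k reduces into the upper half -F."""
    half = (p - 1) // 2
    return sum(1 for k in range(1, half + 1) if (a * k) % p > half)

def legendre_by_gauss(a: int, p: int) -> int:
    return (-1) ** gauss_count(a, p)

def legendre_by_euler(a: int, p: int) -> int:
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

p = 11
print([legendre_by_gauss(a, p) for a in range(1, p)])
print([legendre_by_euler(a, p) for a in range(1, p)])
```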




 
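Note that from this we can immediately compute $\left(\frac{2}{p}\right)$, since in this case $n = \#\{k : \frac{p}{4} < k < \frac{p}{2}\}$. Next, we show that for odd $a$
$$n \equiv \sum_{j=1}^{\frac{p-1}{2}} \left\lfloor \frac{aj}{p} \right\rfloor \bmod 2,$$
and we also use the fact that, for distinct odd primes $p$ and $q$,
$$\sum_{j=1}^{\frac{p-1}{2}} \left\lfloor \frac{qj}{p} \right\rfloor + \sum_{k=1}^{\frac{q-1}{2}} \left\lfloor \frac{kp}{q} \right\rfloor = \frac{p-1}{2} \cdot \frac{q-1}{2}.$$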

One proof of this fact counts lattice points in the rectangle R in the first quadrant, whose vertices are at
(0, 0), (0, q), (p, 0) and (p, q); specifically, those lying above and below the line segment joining the origin to
(p, q) — but this is all the detail we give here.

With this machinery, we can show that there are infinitely many primes congruent to 1 modulo 4. Indeed, if $p_1, p_2, \ldots, p_k$ is any finite list of such primes, let
$$N = (2p_1 p_2 \cdots p_k)^2 + 1.$$
Then $p_i \nmid N$ for $i = 1, 2, \ldots, k$. But since $N$ is one more than a square and odd, we know that all of its prime factors must be congruent to 1 modulo 4; in particular, there must be such a prime which is not on the list.
6.3 Lecture Seventeen

Final exam date: Friday, December 8, at noon.


Definition: A degree-$d$ form (or homogeneous polynomial) is a polynomial, each of whose monomials has degree $d$. For example, $X^3 + 2Y^3 + 3Y^2Z - 4XYZ$ is a degree-3 form. A binary form is a form in two variables, and a quadratic form is a degree-2 form. We will focus on binary quadratic forms.
Example: One binary quadratic form is $f(X, Y) = X^2 + Y^2$; another is $g(X, Y) = 53X^2 + 152XY + 109Y^2$.
Among the questions we might ask about binary quadratic forms $f(X, Y)$, two important ones are:
1. Which $m \in \mathbb{Z}$ are represented by $f$? That is, for which $m \in \mathbb{Z}$ do we have $x, y \in \mathbb{Z}$ with $f(x, y) = m$?
2. Which $m \in \mathbb{Z}$ can be properly represented by $f$? That is, when is $m = f(x, y)$ with $(x, y) = 1$?
One motivation for the second question is the observation that for any binary quadratic form $f$, we have $f(dx, dy) = d^2 f(x, y)$. We first investigate the form $f(X, Y) = X^2 + Y^2$, and investigate when $f$ represents a prime $p$. We observe that $2 = 1^2 + 1^2$, and from now on will restrict our attention to odd primes $p$.
Lemma 6.3.1 If $p \equiv 3 \bmod 4$ and $p \mid (x^2 + y^2)$, then $p \mid x$ and $p \mid y$.
Proof : Since $p \mid (x^2 + y^2)$, we have that $x^2 \equiv -y^2 \bmod p$. If $p \nmid y$, then $y$ is a unit modulo $p$ and we have the equivalent congruence $(xy^{-1})^2 \equiv -1 \bmod p$, or $p \mid ((xy^{-1})^2 + 1)$, contradicting our result from the end of the last lecture that $p \mid ((2n)^2 + 1)$ implies $p \equiv 1 \bmod 4$. Thus $p \mid y$, from which we immediately see $p \mid x$.

In particular, if p ≡ 3 mod 4, then there is no way to express p as the sum of two squares.
Proposition 6.3.2 If $p \equiv 1 \bmod 4$, then there exist $x, y \in \mathbb{Z}$ such that $x^2 + y^2 = p$ and $(x, y) = 1$.
Proof : Fix some $z$ so that $z^2 \equiv -1 \bmod p$, and consider the set
$$S = \{u + zv : 0 \le u < \sqrt{p},\ 0 \le v < \sqrt{p}\}.$$
It is not difficult to see that $\#S = (1 + \lfloor \sqrt{p} \rfloor)^2$ (counting pairs $(u, v)$), and that
$$(1 + \lfloor \sqrt{p} \rfloor)^2 \ge \lceil \sqrt{p} \rceil^2 > p,$$
where $\lceil x \rceil$ denotes the ceiling function. Thus by the pigeonhole principle there must be two distinct elements $u + zv$, $u' + zv'$ (i.e. with not both $u = u'$ and $v = v'$) which are congruent modulo $p$. Define
$$x = u - u', \qquad y = v' - v.$$
Then since $u - u' \equiv z(v' - v) \bmod p$, we see that $x^2 \equiv -y^2 \bmod p$, and so $p \mid (x^2 + y^2)$. Moreover, we see that
$$|x^2 + y^2| \le |x|^2 + |y|^2 < 2p,$$
and since we do not have $x = y = 0$ by our earlier remarks, it follows that $x^2 + y^2 = p$. Furthermore, if $d = (x, y)$, then it follows that $d^2 \mid p$ and hence $d = 1$.
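The pigeonhole argument is constructive enough to run. The sketch below follows the proof literally for a prime $p \equiv 1 \bmod 4$: it finds $z$ by search, then looks for two pairs $(u, v)$ with the same value of $u + zv \bmod p$. The function name is ours, and the code assumes (without checking) that $p$ is prime.

```python
from math import isqrt

def two_squares(p: int) -> tuple[int, int]:
    """Return (x, y) with x*x + y*y == p, for a prime p ≡ 1 (mod 4)."""
    if p % 4 != 1:
        raise ValueError("p must be congruent to 1 modulo 4")
    # A square root of -1 modulo p; one exists because p is a prime ≡ 1 (mod 4).
    z = next(t for t in range(1, p) if (t * t + 1) % p == 0)
    bound = isqrt(p)            # u and v range over 0..floor(sqrt(p))
    seen = {}
    for u in range(bound + 1):
        for v in range(bound + 1):
            key = (u + z * v) % p
            if key in seen:     # collision guaranteed by the pigeonhole principle
                u2, v2 = seen[key]
                return abs(u - u2), abs(v - v2)
            seen[key] = (u, v)
    raise AssertionError("unreachable for a prime p ≡ 1 (mod 4)")

for p in (5, 13, 17, 29, 37, 41):
    x, y = two_squares(p)
    print(p, x, y, x * x + y * y == p)
```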

Theorem 6.3.3 (due to Fermat) An integer $n$ is properly represented by $X^2 + Y^2$ if and only if $4 \nmid n$ and no prime $p \equiv 3 \bmod 4$ has $p \mid n$.

Proof : Suppose first that $n = x^2 + y^2$ with $(x, y) = 1$, and let $p \equiv 3 \bmod 4$ be prime. If $p \mid (x^2 + y^2)$, then by lemma 6.3.1 $p \mid x$ and $p \mid y$, thus $(x, y) > 1$, a contradiction.
Conversely suppose that no prime factor $p$ of $n$ has $p \equiv 3 \bmod 4$. Since we know each prime factor is properly represented, it suffices to prove that the product $mn$ of any numbers $m, n$ properly represented by $X^2 + Y^2$ is itself properly represented.
Write $m = w^2 + z^2$ and $n = x^2 + y^2$ with $(w, z) = (x, y) = 1$. Then
$$mn = (wx)^2 + (wy)^2 + (xz)^2 + (yz)^2 = (wx - yz)^2 + (wy + xz)^2,$$
and it suffices to check coprimality.


[Here we encounter an error in the proof, the rest of which has been omitted.]
In the next lecture, we will prove the following, also due to Fermat.
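Theorem 6.3.4 Given $n \in \mathbb{N}$, write n in its prime factorization as
\[ n = 2^{\alpha} \prod_{i=1}^{k} p_i^{\beta_i} \prod_{j=1}^{l} q_j^{\gamma_j}, \]
where every $p_i$ has $p_i \equiv 1 \bmod 4$ and every $q_j$ has $q_j \equiv 3 \bmod 4$. Then n is represented by $X^2 + Y^2$ if and only
if every $\gamma_j$ is even; in other words, if and only if we can write $n = ab^2$, where, for primes p,
\[ p \mid a \Rightarrow p \not\equiv 3 \bmod 4 \quad\text{and}\quad p \mid b \Rightarrow p \equiv 3 \bmod 4. \]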

7 Week Seven

7.1 Lecture Eighteen

Recall: Theorem 6.3.4.


Proof : Lemma 6.3.1 showed that if $q \mid (x^2 + y^2)$ and $q \equiv 3 \bmod 4$ is prime, then $q \mid x$ and $q \mid y$, thus $q^2 \mid (x^2 + y^2)$;
dividing through by $q^2$ and repeating shows that every $\gamma_j$ is even.
Conversely, proposition 6.3.2 showed that every prime $p \equiv 1 \bmod 4$ is represented, while $2 = 1^2 + 1^2$ and
$q^2 = q^2 + 0^2$ for $q \equiv 3 \bmod 4$, and since
\[ (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2 \]
we see that representability by $X^2 + Y^2$ is multiplicative, which completes the proof.

Fact: A positive integer n can be properly represented by $X^2 + Y^2$ if and only if $4 \nmid n$ and each $\gamma_j = 0$; that is, if and
only if $4 \nmid n$ and no prime congruent to 3 modulo 4 divides n. The proof of one implication was attempted at the end of
the last lecture; today, we develop machinery to prove more general statements.
[Aside: Lagrange's Four-Square theorem asserts that any nonnegative integer can be written as the sum of at
most four squares. One proves this first for primes, then by showing multiplicative closure of representability
by $W^2 + X^2 + Y^2 + Z^2$. We may draw an analogy between the corresponding observation in the proof of
theorem 6.3.4 and multiplicativity of the complex norm $|a + ib|^2 = a^2 + b^2$, and that of the norm in the ring of
quaternions,
\[ |a + ib + jc + kd|^2 = a^2 + b^2 + c^2 + d^2. \]
Moreover let $f(X_1, X_2, \ldots, X_n)$ be any positive definite quadratic form with integer matrix. If f represents every integer in the set $\{1, 2, \ldots, 15\}$,
then f represents every positive integer. This is known as the Fifteen Theorem.]
§3.4 – Binary quadratic forms
Notation: For the remainder of this lecture, $f(X, Y) = aX^2 + bXY + cY^2$ will denote an arbitrary quadratic
form of discriminant $d = b^2 - 4ac$.
When does $f(x, y) = 0$ for x, y not both 0? Suppose d is a perfect square. If $a \neq 0$ then we may factor f over
$\mathbb{Q}$ via
\[ f(x, y) = a\left(x + \frac{b - \sqrt{d}}{2a}\, y\right)\left(x + \frac{b + \sqrt{d}}{2a}\, y\right), \]
and so by proposition 6.2.2 we see that f also factors over $\mathbb{Z}$. In this case, there are many ways to represent
0, as we need only make one of the factors equal zero. If $a = 0$ then $f(X, Y) = Y(bX + cY)$ and we have the
same observation.
In the case d = 0, we can write f (X, Y ) = e(gX + hY )2 for some integers e, g, h. If e > 0 then f is positive
semidefinite; that is, f (x, y) ≥ 0 for any x, y ∈ Z. Similarly if e < 0 then f (x, y) ≤ 0 for all x, y ∈ Z, and f
is said to be negative semidefinite. If furthermore f (x, y) = 0 implies that x = y = 0, then f is said to be
positive definite (resp. negative definite).
Now, suppose d is not a perfect square; then f is irreducible over $\mathbb{Q}$. In particular, $ac \neq 0$, else $d = b^2$ which is
not the case.
Theorem 7.1.1 (Theorem 3.10, Niven) Suppose that a binary quadratic form f (X, Y ) has discriminant d < 0;
then f is definite (i.e. positive definite or negative definite).
Proof : Suppose $f(m, n) = 0$ and suppose $n \neq 0$. The identity
\[ 4af(x, y) = (2ax + by)^2 - dy^2 \]
implies that
\[ (2am + bn)^2 - dn^2 = 0 \iff dn^2 = (2am + bn)^2 \iff d = \left(2a\,\frac{m}{n} + b\right)^2, \]
so $d < 0$ is the square of a rational number, which is a contradiction. A symmetric argument with the assumption
$m \neq 0$ completes the proof.

We might ask: when is f positive? negative?
Theorem 7.1.2 (Theorem 3.11, Niven) Let f be a binary quadratic form of discriminant d. If d > 0 then f
is indefinite, that is, f represents both positive and negative values. If d < 0 and a > 0, then f is positive
definite. If d < 0 and a < 0, then f is negative definite.
Proof : Suppose $d > 0$. Then if $a \neq 0$ we have that $f(1, 0) = a$ and $f(b, -2a) = -ad$, and since $d > 0$ we
know that a and $-ad$ have opposite signs, so f is indefinite. The same argument works if we assume $c \neq 0$,
using $f(0, 1) = c$, $f(-2c, b) = -cd$. Finally if $a = c = 0$ then $f(1, 1) = b$, $f(-1, 1) = -b$, and since $d = b^2 > 0$ forces $b \neq 0$,
this exhausts all cases.
Suppose now that $d < 0$ so that in particular d is not a perfect square. Then we know $a \neq 0$ and so by our
identity we have that
\[ 4af(x, y) = (2ax + by)^2 + |d|y^2 \ge 0, \]
from which it follows that a must have the same sign as $f(x, y)$ whenever $f(x, y) \neq 0$. The same equation shows that if $f(x, y) = 0$
then $y = 0$, thus $x = 0$, and we are done.


7.2 Lecture Nineteen

Theorem 7.2.1 (Theorem 3.12, Niven) Let d ∈ Z; then there exists a binary quadratic form of discriminant
d if and only if d ≡ 0 or 1 mod 4.
Proof : Suppose $f(X, Y) = aX^2 + bXY + cY^2$ has discriminant d; then
\[ d = b^2 - 4ac \equiv b^2 \bmod 4, \]
and since the squares modulo 4 are 0 and 1 the result is clear. Conversely, if $d \equiv 0 \bmod 4$ we may take
$f(X, Y) = X^2 - \frac{d}{4} Y^2$ which has discriminant d, and if $d \equiv 1 \bmod 4$ we instead take $f(X, Y) = X^2 + XY - \frac{d-1}{4} Y^2$
with the same result.



Theorem 7.2.2 (Theorem 3.13, Niven) Let $d, n \in \mathbb{Z}$ with $n \neq 0$. There exists a binary quadratic form of
discriminant d that properly represents n if and only if the congruence $x^2 \equiv d \bmod 4n$ has a solution.
Remark: This theorem guarantees the existence of some binary quadratic form of discriminant d, but representability
by a specific form is a much harder question.
Example: Take $n = -3$. There is a binary quadratic form of discriminant d properly representing $-3$ if and only if
$x^2 \equiv d \bmod 12$ has a solution (a congruence modulo $-12$ is the same as one modulo 12). The squares modulo 12 are 0, 1, 4, and 9, and so we see that the only binary
quadratic forms properly representing $-3$ have discriminant d lying in one of these residue classes modulo 12.
Proof : Suppose $u^2 \equiv d \bmod 4n$, and write $u^2 - d = 4nv$ for some integer v. Then with
\[ f(X, Y) = nX^2 + uXY + vY^2, \]
we see that the discriminant of f is $u^2 - 4nv = d$ and that $f(1, 0) = n$. Conversely, suppose that $as^2 + bst + ct^2 = n$
with $(s, t) = 1$ and $b^2 - 4ac = d$. Choose $m_1, m_2 \in \mathbb{Z}$ such that $(m_1, m_2) = 1$, $m_1 m_2 = 4n$, and also $(m_1, t) =
(m_2, s) = 1$. Note that we can always choose such $m_1, m_2$: for example,
\[ m_1 = \prod_{p \mid s} p^{\mathrm{ord}_p(4n)}, \qquad m_2 = \frac{4n}{m_1}. \]
Recall from last lecture the identity $4af(x, y) = (2ax + by)^2 - dy^2$. Since $f(s, t) = n$ and $m_1 \mid 4n$, the left-hand side vanishes modulo $m_1$ at $(x, y) = (s, t)$, hence
\[ (2as + bt)^2 - dt^2 \equiv 0 \bmod m_1 \iff d \equiv (2ast^{-1} + b)^2 \bmod m_1, \]
since $(t, m_1) = 1$. A symmetric argument shows that $d \equiv (2cts^{-1} + b)^2 \bmod m_2$, and since $(m_1, m_2) = 1$ the
Chinese remainder theorem implies that we have a solution to the congruence $x^2 \equiv d \bmod m_1 m_2$, that is, modulo $4n$,
and we are done.
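The first half of this proof is constructive, and can be sketched in Python as follows. The function name `form_representing` is hypothetical; it searches for a square root u of d modulo 4n by brute force and returns the coefficients (n, u, v) of the form $nX^2 + uXY + vY^2$ used in the proof, or `None` when the congruence has no solution.

```python
def form_representing(d, n):
    """Return (a, b, c) = (n, u, v) with u*u - 4*n*v == d, or None if
    x^2 = d (mod 4n) has no solution. Assumes n != 0."""
    m = 4 * abs(n)
    for u in range(m):
        if (u * u - d) % m == 0:
            v = (u * u - d) // (4 * n)   # exact division, since 4n divides u^2 - d
            return (n, u, v)
    return None

print(form_representing(-4, 5))   # a form of discriminant -4 with f(1, 0) = 5
print(form_representing(-4, 3))   # None: -4 is not a square modulo 12
```

The returned form takes the value n at (1, 0), which is a proper representation.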

Corollary 1: Let $d \equiv 0$ or $1 \bmod 4$, and let p be an odd prime. There exists a binary quadratic form of
discriminant d representing p if and only if $\left(\frac{d}{p}\right) = 0$ or 1.

Proof : By Theorem 7.2.2 it suffices to show that $x^2 \equiv d \bmod 4p$ has a solution if and only if $\left(\frac{d}{p}\right) = 0$ or 1.
Suppose $x^2 \equiv d \bmod 4p$ so that $x^2 \equiv d \bmod p$; it follows that $\left(\frac{d}{p}\right) = 0$ or 1.
Conversely, if $\left(\frac{d}{p}\right) = 0$ or 1, then we may write $x^2 \equiv d \bmod p$, and since d is a square modulo 4 by assumption
we have $y^2 \equiv d \bmod 4$, and the Chinese remainder theorem completes the proof.

Thus we are led to investigate the set of all binary quadratic forms of a given discriminant.

Example: Determine all integers represented by $f(X, Y) = 53X^2 + 152XY + 109Y^2$.
If we set $y = 2u - 7v$, $x = -3u + 10v$, then a calculation shows that $f(x, y) = u^2 + v^2$, and thus if n is
represented by f, it is also represented by $X^2 + Y^2$. Conversely if n is represented by this latter form, then
$n = u^2 + v^2 = f(-3u + 10v, 2u - 7v)$, and we see that both forms represent exactly the same set of integers.
We can associate to any binary quadratic form $f(X, Y) = aX^2 + bXY + cY^2$ the $2 \times 2$ symmetric matrix
$F = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}$, which has the property that
\[ \vec{x}^T F \vec{x} = f(x, y), \qquad \vec{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \]
where $A^T$ denotes the matrix transpose. In our above example, $F = \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix}$ is associated to $f(X, Y) =
53X^2 + 152XY + 109Y^2$, and $G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is associated to $g(X, Y) = X^2 + Y^2$.
With this in mind, we write our change of variables from our example above as
\[ \vec{x} = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -3 & 10 \\ 2 & -7 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} =: M\vec{u}, \]
hence
\[ f(x, y) = \vec{x}^T F \vec{x} = (M\vec{u})^T F (M\vec{u}) = \vec{u}^T (M^T F M) \vec{u}, \]
and indeed, $M^T F M = G$.
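The matrix identity $M^T F M = G$ for this example can be checked mechanically. The sketch below uses plain nested lists rather than a linear algebra library; `matmul` and `transpose` are small helper functions written for this illustration.

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*A)]

F = [[53, 76], [76, 109]]   # matrix of 53X^2 + 152XY + 109Y^2
M = [[-3, 10], [2, -7]]     # change of variables from the example
G = matmul(matmul(transpose(M), F), M)
print(G)  # [[1, 0], [0, 1]]
```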

8 Week Eight

8.1 Lecture Twenty

Recall from last lecture the binary quadratic forms

f (X, Y ) = 53X 2 + 152XY + 109Y 2 , g(X, Y ) = X 2 + Y 2 ,

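with their associated matrices
\[ F = \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix} \quad\text{and}\quad G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]
respectively. We saw that $M^T F M = G$, where $M = \begin{pmatrix} -3 & 10 \\ 2 & -7 \end{pmatrix}$. Recall that if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then
\[ A^{-1} = \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]
In our case, $\det M = 1$ and so $M^{-1} = \begin{pmatrix} -7 & -10 \\ -2 & -3 \end{pmatrix}$; however, we observe that if $M \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$, then
\[ \begin{pmatrix} u \\ v \end{pmatrix} = M^{-1} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -7x - 10y \\ -2x - 3y \end{pmatrix}. \]
Since $f(-u, -v) = f(u, v)$ for any binary quadratic form, the negative signs in this matrix are of no concern.
Thus we obtain $F = (M^{-1})^T G M^{-1}$, which combined with our previous relation $G = M^T F M$ implies that f
and g represent exactly the same integers.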
Definition: The modular group Γ is the set of all 2 × 2 matrices over Z with determinant 1, with the group
operation being multiplication.
Also used to denote Γ are SL2 (Z) and SL(2, Z). Since Γ is a group we have that M ∈ Γ ⇔ M −1 ∈ Γ.
Definition: Two binary quadratic forms f and g are called equivalent, denoted $f \sim g$, if there exists some
$M \in \Gamma$ such that $M^T F M = G$, where F and G are the associated matrices of f and g, respectively.
It is easy to see that if $f \sim g$ with $M^T F M = G$, $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then $f(ax + by, cx + dy) = g(x, y)$. In our
previous example, we showed that $53X^2 + 152XY + 109Y^2 \sim X^2 + Y^2$.
Remark: If $M^T F M = G$, then $(-M)^T F (-M) = G$. Thus we may take M or $-M$ as we see fit, or equivalently
choose a representative from $PSL_2(\mathbb{Z}) = \Gamma/\{\pm I\}$.
Theorem 8.1.1 (Theorem 3.16, Niven) $\sim$ is an equivalence relation.
Proof : Reflexivity is clear, as $F = I^T F I$, as is symmetry by our remarks above, so it suffices to prove
transitivity. Suppose $f \sim g$, $g \sim h$, and let $M, N \in \Gamma$ be such that $M^T F M = G$, $N^T G N = H$. Then $MN \in \Gamma$
and $(MN)^T F (MN) = N^T (M^T F M) N = H$, so $f \sim h$, and we are done.

Note that if $f(X, Y) = aX^2 + bXY + cY^2$ has associated matrix F, then $\det F = ac - \frac{b^2}{4} = -\frac{d}{4}$, where d is
the discriminant of f. In particular, since $\det(M^T F M) = \det F$ for $M \in \Gamma$, this means that if $f \sim g$ then their discriminants are equal. Indeed, in
our perennial example $f(X, Y) = 53X^2 + 152XY + 109Y^2$, $g(X, Y) = X^2 + Y^2$, it is not difficult to see that the discriminant of f is $152^2 - 4 \cdot 53 \cdot 109 = -4$, as is the
discriminant of g.
Theorem 8.1.2 (Theorem 3.17, Niven) Let $f \sim g$ be binary quadratic forms, and let $n \in \mathbb{Z}$. Then:
1. The representations of n by f are in one-to-one correspondence with the representations of n by g.
2. The proper representations of n by f are in one-to-one correspondence with the proper representations of
n by g.
Proof :
1. If $f(x, y) = n$, then $\vec{x}^T F \vec{x} = (n)$, and so with $M^T F M = G$ we have $(M^{-1}\vec{x})^T G (M^{-1}\vec{x}) = (n)$. This process is
invertible, whence we deduce the result.
2. In the calculation in the proof of the first statement, if $m \mid x$ and $m \mid y$ then m divides both entries of $M^{-1}\vec{x}$,
and conversely; hence the greatest common divisor of the two entries is preserved.

We seek to understand the structure of the equivalence classes into which $\sim$ partitions the binary quadratic forms
of discriminant d. We begin by showing that every equivalence class contains
a "nice" form; that is, roughly speaking, one in which b is the smallest coefficient in absolute value and c the
largest.
Definition: Let $f(X, Y) = aX^2 + bXY + cY^2$ be a binary quadratic form. Then f is said to be reduced if
one of the following conditions holds:
1. −|a| < b ≤ |a| < |c|.
2. 0 ≤ b ≤ |a| = |c|.

8.2 Lecture Twenty-One

Recall from last time the notion of a reduced binary quadratic form; there is an algorithm for converting any
given binary quadratic form f into an equivalent, reduced binary quadratic form.
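Example: We will reduce $f = f_0(X, Y) = 53X^2 + 152XY + 109Y^2$, which corresponds to the matrix $F_0 =
\begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix}$. For $n \in \mathbb{Z}$, let
\[ T_n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}, \qquad S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \]
We first define $F_1$ via
\[ F_1 = T_{-1}^T F_0 T_{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 53 & 23 \\ 23 & 10 \end{pmatrix}, \]
which corresponds to the form $f_1(X, Y) = 53X^2 + 46XY + 10Y^2$. Next, we set
\[ F_2 = S^T F_1 S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}^T \begin{pmatrix} 53 & 23 \\ 23 & 10 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 10 & -23 \\ -23 & 53 \end{pmatrix}, \]
so that $f_2(X, Y) = 10X^2 - 46XY + 53Y^2$. Continuing in this way, we set
\[ F_3 = T_2^T F_2 T_2 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} 10 & -23 \\ -23 & 53 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 10 & -3 \\ -3 & 1 \end{pmatrix}, \]
\[ F_4 = S^T F_3 S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}^T \begin{pmatrix} 10 & -3 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 3 & 10 \end{pmatrix}, \]
\[ F_5 = T_{-3}^T F_4 T_{-3} = \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} 1 & 3 \\ 3 & 10 \end{pmatrix} \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]
We see that $f_0 \sim f_5$ and that $f_5(X, Y) = X^2 + Y^2$ is reduced. Thus, if $M = T_{-1} S T_2 S T_{-3} = \begin{pmatrix} -3 & 10 \\ 2 & -7 \end{pmatrix}$, then
we have that $M^T F_0 M = F_5$.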
Theorem 8.2.1 (Theorem 3.18, Niven) Let $d \equiv 0$ or $1 \bmod 4$, with d not a perfect square. Then every
equivalence class of binary quadratic forms of discriminant d contains a reduced form.
Proof : Let $f_0(X, Y) = a_0 X^2 + b_0 XY + c_0 Y^2$ have discriminant d, and for $s \ge 0$ let $F_s = \begin{pmatrix} a_s & b_s/2 \\ b_s/2 & c_s \end{pmatrix}$, with $T_n$
and S as above. Define an algorithm via:
(A) If $|c_s| < |a_s|$, set $F_{s+1} = S^T F_s S$ so that $a_{s+1} = c_s$, $c_{s+1} = a_s$, $b_{s+1} = -b_s$.
(B) If $|a_s| \le |c_s|$ but $b_s \notin (-|a_s|, |a_s|]$, then choose $n \in \mathbb{Z}$ so that $2a_s n + b_s \in (-|a_s|, |a_s|]$. Indeed, this choice
is unique by the division algorithm, writing
\[ |a_s| - b_s = 2|a_s| q + r \quad\text{with } 0 \le r < 2|a_s|; \]
set $n = q$ if $a_s > 0$ and $n = -q$ if $a_s < 0$, so that $2a_s n + b_s = |a_s| - r$. Then set $F_{s+1} = T_n^T F_s T_n$, so that
\[ a_{s+1} = a_s, \qquad b_{s+1} = 2a_s n + b_s, \qquad c_{s+1} = a_s n^2 + b_s n + c_s = f_s(n, 1). \]
(C) If $|a_s| = |c_s|$ but $b_s < 0$, then set $F_{s+1} = S^T F_s S$.

We observe that if a binary quadratic form does not satisfy the premises of (A), (B), or (C), then it is reduced;
thus it suffices to show that the algorithm terminates.
Since d is assumed not to be a perfect square we know that $a_s \neq 0$ for any s. We see that (A) is never followed
by (A), nor (B) by (B), nor (C) by (C), and moreover since the output of (C) is reduced by construction it
remains only to show that we cannot have an infinite loop (A) followed by (B) followed by (A), and so on. But
this is clear, since every time we apply step (A), |as | decreases, and so the well-ordering axiom implies that the
algorithm terminates.
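The three steps above can be sketched in Python, working directly with the coefficient triple (a, b, c) instead of matrices. This is a minimal illustration under the assumption that the discriminant is not a perfect square (so a is never 0); the function name `reduce_form` is hypothetical, and step (B) picks n from the floor quotient so that the new middle coefficient lies in $(-|a|, |a|]$.

```python
def reduce_form(a, b, c):
    """Reduce aX^2 + bXY + cY^2 using steps (A), (B), (C).
    Assumes b*b - 4*a*c is not a perfect square, so that a != 0 throughout."""
    while True:
        if abs(c) < abs(a):
            # Step (A): apply S, which swaps a and c and negates b.
            a, b, c = c, -b, a
        elif not (-abs(a) < b <= abs(a)):
            # Step (B): apply T_n with 2*a*n + b in (-|a|, |a|].
            q = (abs(a) - b) // (2 * abs(a))
            n = q if a > 0 else -q
            a, b, c = a, 2 * a * n + b, a * n * n + b * n + c
        elif abs(a) == abs(c) and b < 0:
            # Step (C): apply S once more to make b nonnegative.
            a, b, c = c, -b, a
        else:
            return (a, b, c)

print(reduce_form(53, 152, 109))  # (1, 0, 1)
```

On the running example the intermediate triples are (53, 46, 10), (10, -46, 53), (10, -6, 1), (1, 6, 10) and (1, 0, 1), matching the matrices $F_1, \ldots, F_5$.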

Note that if d is a perfect square, then applying the above algorithm may obtain $a_s = 0$, meaning that none of
the steps (A), (B), or (C) is triggered unless $a_s = b_s = c_s = 0$.
Theorem 8.2.2 (Theorem 3.19, Niven) Let $d \in \mathbb{Z}$ with d not a perfect square, and let $f(X, Y) = aX^2 + bXY +
cY^2$ be a reduced binary quadratic form of discriminant d. Then:
1. If $d > 0$ then $ac < 0$ and $0 < |a| < \sqrt{d/2}$.
2. If $d < 0$ then $ac > 0$ and $0 < |a| \le \sqrt{|d|/3}$.

It is an immediate consequence of this theorem that there are only finitely many equivalence classes of binary
quadratic forms of discriminant d, as there are only finitely many such reduced forms: indeed, we must
have
\[ 0 \le |b| \le |a| \le \sqrt{|d|}, \qquad c = \frac{b^2 - d}{4a}. \]

The proof will be given in the next lecture; today, we end with the following definition.
Definition: Let d ∈ Z with d not a perfect square. The number of equivalence classes of binary quadratic
forms of discriminant d is called the class number of d and is denoted H(d).

8.3 Lecture Twenty-Two

Recall theorem 8.2.2 from last time. Today, we prove the second assertion of the theorem.
Proof : (of Theorem 8.2.2, part (2)) Since $d < 0$ we know that $ac > 0$, as $b^2 - 4ac < 0$, so in particular $|a| > 0$.
Then
\[ |d| = -d = 4ac - b^2 = 4|ac| - b^2. \]
Since f is reduced, we have that $|b| \le |a| \le |c|$, and so
\[ 4|ac| - b^2 \ge 4a^2 - a^2 = 3a^2, \]
and we have that $|a| \le \sqrt{|d|/3}$, as claimed.

Recall also the definition of the class number H(d) of d.
Example: We compute $H(-7)$. We proceed by listing all reduced binary quadratic forms of discriminant $-7$
and then checking whether any are equivalent. Theorem 8.2.2 shows that if $f(X, Y) = aX^2 + bXY + cY^2$ is
reduced of discriminant $-7$, then $0 < |a| \le \sqrt{7/3} < 2$, hence $a = \pm 1$.
If $|a| < |c|$ then we have $-1 < b \le 1$, and if $|a| = |c| = 1$ we have $0 \le b \le 1$; that is, in both cases $b \in \{0, 1\}$.
Calculating the possibilities for $c = \frac{b^2 - d}{4a}$ yields the following table:

 a    b    c      valid?
 1    0    7/4    no
 1    1    2      yes
−1    0   −7/4    no
−1    1   −2      yes

(where the last column indicates whether or not $aX^2 + bXY + cY^2$ is a valid binary quadratic form). It follows
from this that $H(-7) \le 2$. Since the discriminant is negative, it follows that both of the binary quadratic
forms
\[ f(X, Y) = X^2 + XY + 2Y^2, \qquad g(X, Y) = -X^2 + XY - 2Y^2 \]
are (positive or negative) definite, and a calculation shows that $f(1, 1) = 4 > 0$, $g(1, 1) = -2 < 0$. Thus f is positive
definite, g is negative definite, and so in particular $f \not\sim g$ and we have that $H(-7) = 2$.
Note that for any binary quadratic form of discriminant d, we have that $d = b^2 - 4ac \equiv b^2 \bmod 2$, so b must
have the same parity as d.
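The enumeration carried out by hand above can be automated. The Python sketch below lists the reduced forms of a given non-square discriminant d by looping over the finite range $|b| \le |a| \le \sqrt{|d|}$ and testing the two conditions in the definition of a reduced form; the function name `reduced_forms` is hypothetical. For $d < 0$ the length of the returned list is the class number $H(d)$ as defined in these notes; for $d > 0$ distinct reduced forms may be equivalent, so the count is only an upper bound.

```python
from math import isqrt

def reduced_forms(d):
    """List all reduced forms (a, b, c) of non-square discriminant d."""
    forms = []
    bound = isqrt(abs(d))                      # crude bound |a| <= sqrt(|d|)
    for a in range(-bound, bound + 1):
        if a == 0:
            continue
        for b in range(-abs(a) + 1, abs(a) + 1):   # -|a| < b <= |a|
            if (b * b - d) % (4 * a) != 0:
                continue                       # c = (b^2 - d) / (4a) must be an integer
            c = (b * b - d) // (4 * a)
            if abs(a) < abs(c) or (abs(a) == abs(c) and b >= 0):
                forms.append((a, b, c))
    return forms

print(reduced_forms(-7))   # [(-1, 1, -2), (1, 1, 2)]
```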
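Example: Which primes are represented by the reduced form f found in our example above?
By theorem 7.2.2 we have that n is properly represented by some binary quadratic form of discriminant $-7$
if and only if there exists a solution to the congruence $x^2 \equiv -7 \bmod 4|n|$. If $n > 0$, then a solution of $x^2 \equiv -7 \bmod 4n$
implies that n is properly represented by f: a form of discriminant $-7$ representing a positive integer is positive definite, hence equivalent to f, since f is the only positive definite reduced binary quadratic form
of discriminant $-7$. Furthermore, if $n = p$ is prime, then every representation of p is proper.
For $p = 2$, take $(x, y) = (0, 1)$ so that $f(x, y) = 2$. For odd p, we see that f represents p if and only if
$x^2 \equiv -7 \bmod p$ has a solution, by the Chinese remainder theorem (note that $-7 \equiv 1 \bmod 4$ is a square modulo 4). If $p = 7$ this is clear; otherwise, by quadratic reciprocity,
• If $p \equiv 1 \bmod 4$ then $\left(\frac{-7}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{7}{p}\right) = \left(\frac{7}{p}\right) = \left(\frac{p}{7}\right)$.
• If $p \equiv 3 \bmod 4$ then $\left(\frac{-7}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{7}{p}\right) = -\left(\frac{7}{p}\right) = \left(\frac{p}{7}\right)$.

The quadratic residues modulo 7 are 1, 2, and 4; thus p is represented by f if and only if $p \equiv 0, 1, 2$ or
$4 \bmod 7$.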
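This congruence criterion is easy to test by brute force. The following Python sketch searches a box of integer pairs for a solution of $x^2 + xy + 2y^2 = p$; the helper names `is_represented` and `is_prime` are hypothetical, and the search bound is a deliberately generous choice based on the identity $4f(x, y) = (2x + y)^2 + 7y^2$.

```python
from math import isqrt

def is_prime(n):
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    return all(n % q for q in range(2, isqrt(n) + 1))

def is_represented(n):
    """True if x^2 + x*y + 2*y^2 == n for some integers x, y."""
    B = 2 * isqrt(n) + 2   # generous bound: |y| <= sqrt(4n/7), |x| <= sqrt(n) + |y|/2
    return any(x * x + x * y + 2 * y * y == n
               for x in range(-B, B + 1) for y in range(-B, B + 1))

primes = [p for p in range(2, 100) if is_prime(p)]
print([p for p in primes if is_represented(p)])
print([p for p in primes if p % 7 in (0, 1, 2, 4)])
```

The two printed lists should coincide.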

Theorem 8.3.1 (Theorem 3.25, Niven) Let $f(X, Y) = aX^2 + bXY + cY^2$, $g(X, Y) = a'X^2 + b'XY + c'Y^2$ be
reduced, positive definite binary quadratic forms. If $f \sim g$, then $f = g$.
Proof : Exercise.
Consequently, if $d < 0$ then $H(d)$ equals the number of reduced binary quadratic forms of discriminant d, which
is twice the number of such positive definite forms.
[Aside: there is also the notion of the class number of a number field; when $d < 0$, the class number of $\mathbb{Q}(\sqrt{-|d|})$
equals $\frac{1}{2} H(d)$.]

9 Week Nine

9.1 Lecture Twenty-Three

Recall: Theorem 8.3.1


Can we "compose" two binary quadratic forms? We can generalize the multiplication formula
\[ (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2. \]
Note that if $z = a + ib$, $w = c + id$ are complex numbers, then the above formula states exactly that $|z|^2 |w|^2 =
|zw|^2$. Thus, the binary quadratic form $f(X, Y) = X^2 + Y^2$ has a "composition law" given by
\[ f(a, b) f(c, d) = f(ac - bd, ad + bc); \]
in particular, this implies that the set of numbers represented by f is multiplicatively closed. Can we generalize
this idea to arbitrary binary quadratic forms?
Example: Let $d = -7$. We saw last week that the single equivalence class of positive definite binary quadratic
forms of discriminant $-7$ is represented by the reduced form $f(X, Y) = X^2 + XY + 2Y^2$. We factor over the
complex numbers, using the quadratic formula:
\[ f(a, b) = \left(a + \frac{1 + i\sqrt{7}}{2}\, b\right)\left(a + \frac{1 - i\sqrt{7}}{2}\, b\right). \]
Thus we are led to compute
\[ \left(a + \frac{1 + i\sqrt{7}}{2}\, b\right)\left(c + \frac{1 + i\sqrt{7}}{2}\, d\right) = (ac - 2bd) + \frac{1 + i\sqrt{7}}{2}\,(ad + bc + bd), \]
which implies
\[ f(a, b) f(c, d) = f(ac - 2bd, ad + bc + bd), \]
and again we see that the set of represented values is multiplicatively closed.
Example: Suppose $d = -20$. In assignment 4, we verify that there are exactly two positive definite reduced
binary quadratic forms of discriminant $-20$, namely
\[ f_+(X, Y) = X^2 + 5Y^2, \quad\text{and}\quad f_-(X, Y) = 2X^2 + 2XY + 3Y^2. \]
Observe that the set of values represented by $f_-$ is not multiplicatively closed, as indeed
\[ f_-(1, 0) = 2, \quad f_-(0, 1) = 3, \quad\text{but } f_-(x, y) \neq 6 \text{ for any } x, y \in \mathbb{Z}. \]
Indeed, we have the identity $4af_-(x, y) = (2ax + by)^2 - dy^2$, hence
\[ 8f_-(x, y) = (4x + 2y)^2 + 20y^2 \iff 2f_-(x, y) = (2x + y)^2 + 5y^2, \]
and thus $f_-(x, y) = 6$ implies that $(2x + y)^2 + 5y^2 = 12$, which is never satisfied, as can easily be verified
by checking the possible values $y \in \{0, \pm 1\}$. In particular, this means that there is no multiplicative formula (or
"composition law") for $f_-$ as there was for our previous examples.
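Does such a formula exist for $f_+$? The identity
\[ (a + i\sqrt{5}\, b)(c + i\sqrt{5}\, d) = (ac - 5bd) + i\sqrt{5}\,(ad + bc) \]
implies
\[ f_+(a, b) f_+(c, d) = f_+(ac - 5bd, ad + bc). \]
We see that if we factor $f_-$ using the quadratic formula, we obtain
\[ f_-(a, b) = 2\left(a + \frac{1 + i\sqrt{5}}{2}\, b\right)\left(a + \frac{1 - i\sqrt{5}}{2}\, b\right) = \left(\sqrt{2}\, a + \frac{1 + i\sqrt{5}}{\sqrt{2}}\, b\right)\left(\sqrt{2}\, a + \frac{1 - i\sqrt{5}}{\sqrt{2}}\, b\right). \]
Calculating as before, we obtain
\[ \left(\sqrt{2}\, a + \frac{1 + i\sqrt{5}}{\sqrt{2}}\, b\right)\left(\sqrt{2}\, c + \frac{1 + i\sqrt{5}}{\sqrt{2}}\, d\right) = (2ac + ad + bc - 2bd) + i\sqrt{5}\,(ad + bc + bd), \]
which implies
\[ f_-(a, b) f_-(c, d) = f_+(2ac + ad + bc - 2bd, ad + bc + bd). \]
What happens if we consider the product $f_+(a, b) f_-(c, d)$? The relevant calculation is
\[ \left(a + i\sqrt{5}\, b\right)\left(\sqrt{2}\, c + \frac{1 + i\sqrt{5}}{\sqrt{2}}\, d\right) = \sqrt{2}\,(ac - bc - 3bd) + \frac{1 + i\sqrt{5}}{\sqrt{2}}\,(ad + 2bc + bd), \]
hence
\[ f_+(a, b) f_-(c, d) = f_-(ac - bc - 3bd, ad + 2bc + bd). \]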
Thus we have obtained the following "multiplication table":

        f+    f−
 f+     f+    f−
 f−     f−    f+

The entries are understood to mean, for example, that the product of two numbers represented by $f_+$ may also
be represented by $f_+$. In fact, this relation holds on the level of equivalence classes; that is, if $f \sim f_+$, $g \sim f_-$,
then $f(a, b) g(c, d) = h(x, y)$ for some x, y which are bilinear expressions in $(a, b)$ and $(c, d)$, and $h \sim f_-$.
In general, the set of equivalence classes of positive definite binary quadratic forms of a given negative discriminant is
a group under the operation of "multiplication" alluded to above. This is known as the class group.
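The three composition identities for discriminant $-20$ can be sanity-checked numerically. In the Python sketch below the arguments on the right-hand sides were obtained by expanding the corresponding products of complex numbers; the function names `f_plus`, `f_minus` and `check_table` are hypothetical, and a finite grid check is of course only evidence, not a proof.

```python
from itertools import product

def f_plus(x, y):
    return x * x + 5 * y * y

def f_minus(x, y):
    return 2 * x * x + 2 * x * y + 3 * y * y

def check_table(bound=3):
    """Check the three composition identities on a grid of integer inputs."""
    R = range(-bound, bound + 1)
    for a, b, c, d in product(R, repeat=4):
        if f_plus(a, b) * f_plus(c, d) != f_plus(a*c - 5*b*d, a*d + b*c):
            return False
        if f_minus(a, b) * f_minus(c, d) != f_plus(2*a*c + a*d + b*c - 2*b*d, a*d + b*c + b*d):
            return False
        if f_plus(a, b) * f_minus(c, d) != f_minus(a*c - b*c - 3*b*d, a*d + 2*b*c + b*d):
            return False
    return True

print(check_table())
# f_minus takes the values 2 and 3 but, on this grid, never their product 6.
print(6 in {f_minus(x, y) for x in range(-5, 6) for y in range(-5, 6)})
```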
This ends our discussion of binary quadratic forms; next, we will discuss arithmetic functions; that is,
complex-valued functions whose domain is N.

9.2 Lecture Twenty-Four

§4.2 – Arithmetic functions


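Notation: Let $\tau(n)$ denote the number of positive divisors of n (also used is the notation $d(n)$).
Lemma 9.2.1 Let n have prime factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$. A positive integer d divides n if and only if $d =
p_1^{s_1} p_2^{s_2} \cdots p_k^{s_k}$, with $0 \le s_j \le e_j$ for every j.
Proof : Clearly, with n and d as above we see that $n = d\,(p_1^{e_1 - s_1} p_2^{e_2 - s_2} \cdots p_k^{e_k - s_k})$. Conversely, suppose $d \mid n$. If
p is prime with $p \mid d$ and $p \neq p_j$ for every j, then $p \nmid n$, yet $p \mid d \mid n$, a contradiction; hence $d = p_1^{s_1} \cdots p_k^{s_k}$ for some $s_j \ge 0$. Finally if $s_j > e_j$ for some j, then $p_j \mid \frac{d}{p_j^{e_j}}$; but $p_j \nmid \frac{n}{p_j^{e_j}}$,
hence $\frac{d}{p_j^{e_j}} \nmid \frac{n}{p_j^{e_j}}$, which holds if and only if $d \nmid n$, a contradiction, and we are done.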

One consequence of this lemma is that if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, then
\[ \tau(n) = \#\{(s_1, s_2, \ldots, s_k) : 0 \le s_j \le e_j\} = (1 + e_1)(1 + e_2) \cdots (1 + e_k), \]
or more succinctly written,
\[ \tau(n) = \prod_{p^{\alpha} \| n} (\alpha + 1). \]
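The product formula for $\tau(n)$ translates directly into code. The Python sketch below factors n by trial division and multiplies the factors $(\alpha + 1)$; the function names `factorize` and `tau` are chosen for this illustration.

```python
def factorize(n):
    """Return the prime factorization of n >= 1 as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def tau(n):
    """Number of positive divisors of n, via the product of (exponent + 1)."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result

print(tau(720))  # 720 = 2^4 * 3^2 * 5, so tau = 5 * 3 * 2 = 30
```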

Proposition 9.2.2 If $(m, n) = 1$, then $\tau(mn) = \tau(m)\tau(n)$.

This statement is false if $(m, n) > 1$; for example, $\tau(8) = 4 \neq 6 = \tau(2)\tau(4)$.
Proof : We give two sketches, left as exercises.
1. The assertion follows from the multiplicative formula found above.
2. If $(m, n) = 1$, then divisors d of mn are in one-to-one correspondence with pairs of integers $(b, c)$ where $b \mid m$, $c \mid n$, and $bc = d$.

Definition: An arithmetic function $f : \mathbb{N} \to \mathbb{C}$ which is not identically zero is called multiplicative if,
whenever $(m, n) = 1$, we have $f(mn) = f(m)f(n)$.
Proposition 9.2.2 shows that $\tau(n)$ is multiplicative, and from previous work we know that $\phi(n)$ is also multiplicative.
Indeed, we used this property to prove the formula
\[ \phi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right). \]
A similar example is given by the function
\[ \sigma_f(n) = \#\{x \bmod n : f(x) \equiv 0 \bmod n\}, \]
where $f(X) \in \mathbb{Z}[X]$. The Chinese remainder theorem tells us that $\sigma_f(n)$ is multiplicative, and indeed we observe
that
\[ \phi(n) = \sigma_{X^{\phi(n)} - 1}(n). \]
Properties of multiplicative functions: Suppose f is a multiplicative function.
• For every n, we have the formula
\[ f(n) = \prod_{p^{\alpha} \| n} f(p^{\alpha}). \]
In particular, f is determined by its values on prime powers. Conversely, any set map
\[ f : \{p^k : p \text{ prime}, k \in \mathbb{N}\} \to \mathbb{C} \]
induces a multiplicative function (with $f(1) = 1$).
• $f(1) = 1$. Indeed, since there must be some n with $f(n) \neq 0$, we have $f(n) = f(1 \cdot n) = f(1)f(n)$.
Definition: If an arithmetic function f, not identically zero, satisfies $f(mn) = f(m)f(n)$ for every pair of
numbers m, n, then f is said to be totally multiplicative (or completely multiplicative).
Clearly, any totally multiplicative function is also multiplicative.
Example: For any $\lambda \in \mathbb{R}$, the function $f_\lambda(n) = n^\lambda$ is totally multiplicative. In particular, when $\lambda = 0$ we have
$f_\lambda(n) = 1$ for all n, and for $\lambda = 1$ we have $f_\lambda(n) = \mathrm{id}(n) = n$ for every n.
Example: The iota function $\iota(n)$, defined
\[ \iota(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n \neq 1, \end{cases} \]
is totally multiplicative.
Example: Let $f(n) = (-1)^{n-1}$, so that $f(n) = 1$ if n is odd and $-1$ if n is even. Then f is not totally
multiplicative, as for example
\[ f(8) = -1 \neq 1 = f(2)f(4); \]
however, $f(n)$ is multiplicative, and indeed f is induced by the map
\[ f(p^{\alpha}) = \begin{cases} 1 & \text{if } p \text{ is odd}, \\ -1 & \text{if } p = 2. \end{cases} \]
Example: The function $f(n) = (-1)^n$ is not multiplicative (indeed $f(1) = -1 \neq 1$), and so in particular is not totally multiplicative.
Theorem 9.2.3 (Theorem 4.4, Niven) Let $f(n)$ be a multiplicative function and let
\[ F(n) = \sum_{d \mid n} f(d). \]
Then $F(n)$ is also multiplicative.

Proof : As alluded to in the proof of proposition 9.2.2, if $(m, n) = 1$ then divisors d of mn are in one-to-one correspondence with
ordered pairs $(b, c)$, with $bc = d$, $b \mid m$, $c \mid n$. Thus, if $(m, n) = 1$, we have
\[ F(mn) = \sum_{d \mid mn} f(d) = \sum_{b \mid m} \sum_{c \mid n} f(bc) = \sum_{b \mid m} \sum_{c \mid n} f(b)f(c) = \left(\sum_{b \mid m} f(b)\right)\left(\sum_{c \mid n} f(c)\right) = F(m)F(n), \]
and we are done.



Example: Let $f(n) = n^0 = 1$. Then
\[ F(n) = \sum_{d \mid n} f(d) = \sum_{d \mid n} 1 = \tau(n), \]
giving another proof of the fact that $\tau$ is multiplicative. Note that f is totally multiplicative, while $F(n)$ is
not.

9.3 Lecture Twenty-Five

Recall: Theorem 9.2.3.


Motivating questions:
• Is the converse of theorem 9.2.3 true? That is, if $F(n) = \sum_{d \mid n} f(d)$ is multiplicative, must $f(n)$ also be
multiplicative?
• Given $F(n)$, how can we get information about $f(n)$?
Remark: Given any arithmetic function F, there is exactly one function f so that $F(n) = \sum_{d \mid n} f(d)$. Indeed,
we set $f(1) = F(1)$ and recursively define the other values via
\[ f(n) = F(n) - \sum_{\substack{d \mid n \\ d < n}} f(d). \]
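The recursion in this remark can be run directly. The Python sketch below recovers f from F on the range 1..N by subtracting the values of f at proper divisors; the function name `invert_divisor_sum` is hypothetical. Note that the recursion gives $f(1) = F(1)$, since 1 has no proper divisors.

```python
def invert_divisor_sum(F, N):
    """Return a dict f with F(n) == sum of f[d] over divisors d of n, for 1 <= n <= N."""
    f = {}
    for n in range(1, N + 1):
        # Subtract the contributions of the proper divisors d < n.
        f[n] = F(n) - sum(f[d] for d in range(1, n) if n % d == 0)
    return f

# Taking F = iota (1 at n = 1, 0 elsewhere) recovers the values computed by hand below.
f = invert_divisor_sum(lambda n: 1 if n == 1 else 0, 30)
print([f[n] for n in range(1, 13)])
```

Taking instead $F(n) = n$ should recover Euler's function, in line with the identity $n = \sum_{d \mid n} \phi(d)$.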

Example: We find the function $f(n)$ satisfying
\[ \sum_{d \mid n} f(d) = \iota(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases} \]
Write $F = \iota$. We calculate the first couple of values:
\[ f(1) = 1, \qquad f(2) = F(2) - f(1) = 0 - 1 = -1. \]
Clearly, for any prime p we have
\[ f(p) = F(p) - f(1) = -1, \qquad f(p^2) = F(p^2) - f(p) - f(1) = 0, \]
and indeed $f(p^k) = 0$ for $k > 1$. For composite numbers of the form pq where p, q are distinct primes, we
have
\[ f(pq) = F(pq) - f(p) - f(q) - f(1) = 0 - (-1) - (-1) - 1 = 1 = f(p)f(q), \]
while for $n = p^2 q$ we have
\[ f(p^2 q) = F(p^2 q) - f(p) - f(p^2) - f(q) - f(pq) - f(1) = 0 = f(p^2)f(q). \]
The above calculations suggest that f is multiplicative, which motivates the following definition.
Definition: The Möbius function $\mu(n)$ is the multiplicative function satisfying, for every prime p,
\[ \mu(p^{\alpha}) = \begin{cases} -1 & \text{if } \alpha = 1, \\ 0 & \text{if } \alpha > 1. \end{cases} \]
Equivalently: if n is not squarefree, then $\mu(n) = 0$. Otherwise, writing $n = p_1 p_2 \cdots p_k$ with $p_j$ distinct primes,
one has $\mu(n) = (-1)^k$.
Notation: Denote by $\omega(n)$ the number of distinct prime divisors of n, and by $\Omega(n)$ the number of prime factors
of n counted with multiplicity.
For example, with $n = 720 = 2^4 \cdot 3^2 \cdot 5$, we have $\omega(n) = 3$, $\Omega(n) = 4 + 2 + 1 = 7$. With this notation, we may
define
\[ \mu(n) = \begin{cases} (-1)^{\omega(n)} & \text{if } n \text{ is squarefree}, \\ 0 & \text{otherwise}. \end{cases} \]

Theorem 9.3.1 (Theorem 4.7, Niven) One has
\[ \sum_{d \mid n} \mu(d) = \iota(n). \]
This theorem is much more widely invoked than is the definition of $\mu(n)$.
Proof : We give two proofs.
1. Both sides of the equation are multiplicative by theorem 9.2.3, and we already know that both sides agree
when n is a prime power, from which we deduce the result.
2. By definition,
\[ \sum_{d \mid n} \mu(d) = \sum_{\substack{d \mid n \\ d \text{ squarefree}}} (-1)^{\omega(d)}, \]
and so if $\omega(n) = k$ then there are exactly $\binom{k}{j}$ squarefree divisors d of n with $\omega(d) = j$. Thus
\[ \sum_{d \mid n} \mu(d) = \sum_{j=0}^{k} \binom{k}{j} (-1)^j = (1 - 1)^k = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1, \end{cases} \]
and we are done.
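The definition of $\mu$ and the identity of theorem 9.3.1 can both be checked numerically. The following Python sketch computes $\mu(n)$ by trial division (the function name `mobius` is chosen for this illustration) and then sums it over the divisors of n.

```python
def mobius(n):
    """Moebius function: 0 if n is not squarefree, else (-1)^(number of prime factors)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result          # one remaining prime factor
    return result

def divisor_sum_of_mobius(n):
    """Sum of mobius(d) over the positive divisors d of n."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)

print([mobius(n) for n in range(1, 13)])
print([divisor_sum_of_mobius(n) for n in range(1, 13)])  # 1 followed by zeros
```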



Theorem 9.3.2 (Theorem 4.8, Niven; the Möbius inversion formula) Let $f(n)$ be an arithmetic function and
let $F(n) = \sum_{d \mid n} f(d)$. Then
\[ f(n) = \sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right). \]
For example, for any arithmetic function $f(n)$, we have $f(12) = F(12) - F(6) - F(4) + F(2)$.
Proof : The right-hand side of the equation is
\[ \sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \sum_{\delta \mid \frac{n}{d}} f(\delta) = \sum_{d\delta \mid n} \mu(d) f(\delta) = \sum_{\delta \mid n} f(\delta) \sum_{d \mid \frac{n}{\delta}} \mu(d) = \sum_{\delta \mid n} f(\delta)\, \iota\!\left(\frac{n}{\delta}\right) = f(n), \]
where we have used the result of theorem 9.3.1, and the result follows.


10 Week Ten

10.1 Lecture Twenty-Six

Recall: The Möbius inversion formula.


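Example: We have proven the identity
\[ n = \mathrm{id}(n) = \sum_{d \mid n} \phi(d), \]
and so Möbius inversion implies that
\[ \phi(n) = \sum_{d \mid n} \mu(d)\, \mathrm{id}\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \frac{\mu(d)\, n}{d}; \]
that is,
\[ \frac{\phi(n)}{n} = \sum_{d \mid n} \frac{\mu(d)}{d}. \]
Note that $\frac{\mu(d)}{d}$ is multiplicative, thus by theorem 9.2.3 we know that $\frac{\phi(n)}{n}$ is multiplicative. Indeed, checking on
prime powers, we see for $\alpha \ge 1$ that
\[ \frac{\phi(p^{\alpha})}{p^{\alpha}} = \frac{p^{\alpha - 1}(p - 1)}{p^{\alpha}} = \frac{p - 1}{p} = 1 - \frac{1}{p}, \]
and similarly
\[ \sum_{d \mid p^{\alpha}} \frac{\mu(d)}{d} = \frac{\mu(1)}{1} + \frac{\mu(p)}{p} + \frac{\mu(p^2)}{p^2} + \cdots + \frac{\mu(p^{\alpha})}{p^{\alpha}} = 1 + \frac{(-1)}{p} + 0 + \cdots + 0 = 1 - \frac{1}{p}. \]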

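Theorem 10.1.1 (Theorem 4.9, Niven) Let $F(n)$ be an arithmetic function and define
\[ f(n) = \sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right). \]
Then
\[ F(n) = \sum_{d \mid n} f(d). \]
Proof : We have
\[ \sum_{d \mid n} f(d) = \sum_{d \mid n} \left( \sum_{\delta \mid d} \mu(\delta) F\!\left(\frac{d}{\delta}\right) \right). \]
With d fixed, as $\delta$ ranges over the divisors of d, so does $\frac{d}{\delta}$. Thus
\[ \sum_{d \mid n} f(d) = \sum_{d \mid n} \sum_{\delta \mid d} \mu\!\left(\frac{d}{\delta}\right) F(\delta) = \sum_{\delta \mid n} \sum_{\substack{d \mid n \\ \delta \mid d}} \mu\!\left(\frac{d}{\delta}\right) F(\delta). \]
Writing $d = \delta \cdot \frac{d}{\delta}$, we have
\[ \sum_{d \mid n} f(d) = \sum_{\delta \mid n} F(\delta) \sum_{\frac{d}{\delta} \mid \frac{n}{\delta}} \mu\!\left(\frac{d}{\delta}\right) = \sum_{\delta \mid n} F(\delta)\, \iota\!\left(\frac{n}{\delta}\right) = F(n), \]
and we are done.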

Definition: Let $f(n)$, $g(n)$ be two arithmetic functions. Their Dirichlet convolution, denoted $f * g$, is
defined
\[ (f * g)(n) = \sum_{d \mid n} f(d)\, g\!\left(\frac{n}{d}\right). \]
Note that Dirichlet convolution is commutative, as
\[ (g * f)(n) = \sum_{d \mid n} g(d)\, f\!\left(\frac{n}{d}\right) = \sum_{d \mid n} g\!\left(\frac{n}{d}\right) f(d) = (f * g)(n). \]
Example: If $g(n) = 1$ for every n, then
\[ (f * g)(n) = \sum_{d \mid n} f(d). \]
(The function g is sometimes written $\mathbf{1}$.) In particular, this means that $\mathrm{id} = \phi * \mathbf{1}$, $\iota = \mu * \mathbf{1}$, and $\tau = \mathbf{1} * \mathbf{1}$.
With this notation, we may restate the Möbius inversion formula as: $F = f * \mathbf{1}$ if and only if $f = F * \mu$.
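Theorem 10.1.2 If f and g are multiplicative functions, then $f * g$ is multiplicative.
Note that this theorem is a generalization of theorem 9.2.3.
Proof : If $(m, n) = 1$, then
\[ (f * g)(mn) = \sum_{d \mid mn} f(d)\, g\!\left(\frac{mn}{d}\right). \]
For each divisor d of mn, we may uniquely factor $d = d_1 d_2$ with $d_1 \mid m$ and $d_2 \mid n$. Thus
\[ (f * g)(mn) = \sum_{d_1 \mid m} \sum_{d_2 \mid n} f(d_1 d_2)\, g\!\left(\frac{mn}{d_1 d_2}\right) = \sum_{d_1 \mid m} \sum_{d_2 \mid n} f(d_1)\, g\!\left(\frac{m}{d_1}\right) f(d_2)\, g\!\left(\frac{n}{d_2}\right) \]
\[ = \left( \sum_{d_1 \mid m} f(d_1)\, g\!\left(\frac{m}{d_1}\right) \right) \left( \sum_{d_2 \mid n} f(d_2)\, g\!\left(\frac{n}{d_2}\right) \right) = (f * g)(m)\,(f * g)(n), \]
as claimed.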
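Dirichlet convolution is straightforward to implement, which gives a quick way to test the identities stated above. The Python sketch below represents arithmetic functions as ordinary callables; the names `conv`, `one`, `ident`, `iota`, `phi` and `mu` are choices made for this illustration, and `mu` is obtained from the recursion $\sum_{d \mid n} \mu(d) = \iota(n)$ rather than from a factorization.

```python
from math import gcd
from functools import lru_cache

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def conv(f, g):
    """Dirichlet convolution of two arithmetic functions."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

one = lambda n: 1
ident = lambda n: n
iota = lambda n: 1 if n == 1 else 0
phi = lambda n: sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

@lru_cache(maxsize=None)
def mu(n):
    """Moebius function, defined by the requirement mu * one = iota."""
    return iota(n) - sum(mu(d) for d in divisors(n) if d < n)

for n in (1, 6, 12, 30):
    print(n, conv(phi, one)(n), conv(mu, one)(n), conv(one, one)(n), conv(ident, mu)(n))
```

Each printed row shows n, then $(\phi * \mathbf{1})(n) = n$, $(\mu * \mathbf{1})(n) = \iota(n)$, $(\mathbf{1} * \mathbf{1})(n) = \tau(n)$ and $(\mathrm{id} * \mu)(n) = \phi(n)$.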

[Structural remarks: Let $A = \{f : \mathbb{N} \to \mathbb{C}\}$ be the set of arithmetic functions and let $A^{\times} = \{f \in A : f(1) \neq 0\}$;
then $(A^{\times}, *)$ forms an abelian group. In this group, $\iota$ is the identity and $\mathbf{1}^{-1} = \mu$, which yields yet another
statement of the Möbius inversion formula:
\[ F = f * \mathbf{1} \iff \mu * F = \mu * (f * \mathbf{1}) = f * (\mu * \mathbf{1}) = f * \iota = f. \]
Moreover, by theorem 10.1.2, the set of multiplicative functions forms a subgroup.]


Example: Let
\[ s(n) = \begin{cases} 1 & \text{if } n \text{ is a perfect square}, \\ 0 & \text{otherwise}; \end{cases} \]
we will identify $s * (\mu^2)$.
Note that s is multiplicative, and is characterized by
\[ s(p^{\alpha}) = \begin{cases} 1 & \text{if } 2 \mid \alpha, \\ 0 & \text{if } 2 \nmid \alpha. \end{cases} \]
Moreover, $\mu^2$ is multiplicative, as the product of two multiplicative functions; hence $f = s * (\mu^2)$ is also
multiplicative. We compute:
\[ f(p^{\alpha}) = \sum_{d \mid p^{\alpha}} s\!\left(\frac{p^{\alpha}}{d}\right) \mu^2(d) = s(p^{\alpha})\mu^2(1) + s(p^{\alpha - 1})\mu^2(p) + \cdots + s(1)\mu^2(p^{\alpha}) = s(p^{\alpha}) + s(p^{\alpha - 1}) = 1. \]
So $f(p^{\alpha}) = 1$ for every $\alpha \ge 1$, and it follows that $s * (\mu^2) = \mathbf{1}$.

Note that $\mu^2$ is the characteristic function of squarefree numbers, and indeed we see
\[ (s * \mu^2)(n) = \sum_{ab = n} s(a)\mu^2(b) = \#\{(a, b) \in \mathbb{N}^2 : ab = n,\ a = m^2 \text{ for some } m,\ b \text{ squarefree}\} = 1. \]
Thus there is a unique way to factor any $n \in \mathbb{N}$ as $n = n_0 s^2$ where $n_0$ is squarefree. For example, if $n = 2 \cdot 3^2 \cdot 5^3 \cdot 7^4$,
we have $n = (2 \cdot 5)(3 \cdot 5 \cdot 7^2)^2$.
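The unique factorization $n = n_0 s^2$ with $n_0$ squarefree can be computed by trial division, sending each prime power $p^e$ to $p^{e \bmod 2}$ in $n_0$ and $p^{\lfloor e/2 \rfloor}$ in s. The Python function name `squarefree_decomposition` below is hypothetical.

```python
def squarefree_decomposition(n):
    """Return (n0, s) with n == n0 * s * s and n0 squarefree."""
    n0, s, p = 1, 1, 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        n0 *= p ** (e % 2)    # odd exponents leave one factor of p in n0
        s *= p ** (e // 2)    # half of the exponent goes into s
        p += 1
    n0 *= n                   # any leftover factor is a prime to the first power (or 1)
    return n0, s

n = 2 * 3**2 * 5**3 * 7**4
print(squarefree_decomposition(n))  # (10, 735), and 735 = 3 * 5 * 7^2
```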

10.2 Lecture Twenty-Seven

Properties of Möbius inversion:


• We do not assume multiplicativity of the functions; that is, the inversion formula holds for any arithmetic
functions.
• If $F(n) = \sum_{d \mid n} f(d)$ and $F(n)$ is multiplicative, then so is $f(n)$, as $f = F * \mu$.

Recall: Dirichlet convolution.


When $n = p^\alpha$ is a prime power, then
\[ (f * g)(p^\alpha) = \sum_{d \mid p^\alpha} f(d)\, g\!\left(\frac{p^\alpha}{d}\right) = f(1)g(p^\alpha) + f(p)g(p^{\alpha-1}) + \cdots + f(p^\alpha)g(1). \]

Let us assign names to these values, so that $f(1) = a_0$, $f(p) = a_1$, $f(p^2) = a_2, \ldots$, and similarly $g(1) = b_0$, $g(p) = b_1$, $g(p^2) = b_2, \ldots$ We obtain the following table:

α    f(p^α)   g(p^α)   (f ∗ g)(p^α)
0    a_0      b_0      a_0 b_0
1    a_1      b_1      a_0 b_1 + a_1 b_0
2    a_2      b_2      a_0 b_2 + a_1 b_1 + a_2 b_0
3    a_3      b_3      a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0

We observe the similarity with the coefficients of the product of power series:
\[ \left( \sum_{\alpha=0}^{\infty} f(p^\alpha) X^\alpha \right) \left( \sum_{\alpha=0}^{\infty} g(p^\alpha) X^\alpha \right) = \sum_{\alpha=0}^{\infty} (f * g)(p^\alpha) X^\alpha. \]

Example: Find an arithmetic function f such that
\[ \frac{\varphi(n)}{n} = \sum_{d \mid n} f(d), \]
forgetting that we found it in the previous lecture.


Let $F(n) = \varphi(n)/n$, so that F = f ∗ 1. By Möbius inversion we know that f = F ∗ µ and that f is multiplicative,
since F is. Thus we have a table as before:

α    F(p^α)     µ(p^α)   f(p^α)
0    1          1        1
1    1 − 1/p    −1       −1/p
2    1 − 1/p    0        0
3    1 − 1/p    0        0

We see that f is the multiplicative function generated by
\[ f(p^\alpha) = \begin{cases} -\dfrac{1}{p} & \text{if } \alpha = 1, \\ 0 & \text{if } \alpha > 1. \end{cases} \]
That is, $f(n) = \mu(n)/n$, as before.

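Example: Define a multiplicative function r via
\[ r(p^\alpha) = \begin{cases} 2 & \text{if } p \equiv 1 \bmod 4, \\ 0 & \text{if } p \equiv 3 \bmod 4, \\ 1 & \text{if } p = 2 \text{ and } \alpha = 1, \\ 0 & \text{if } p = 2 \text{ and } \alpha > 1. \end{cases} \]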

Now, define R = r ∗ s, where s is the indicator function of the perfect squares from lecture twenty-six; note that
R is multiplicative. Determine the values of $R(p^\alpha)$.
[Aside: Theorem 3.2.2 of Niven tells us that the number of proper representations of n by the binary quadratic
form $X^2 + Y^2$ equals 4r(n). In the statement of theorem 6.3.3 originally given, there was an error, in that we
forgot the necessary condition that 4 ∤ n.
Note also that any representation $x^2 + y^2 = n$ corresponds to a proper representation $(x/d)^2 + (y/d)^2 = n/d^2$, where
d = (x, y). Thus if $S_n$ denotes the set of representations of n by $X^2 + Y^2$, and $S_n^p \subset S_n$ denotes the subset of
proper representations, then
\[ \#S_n = \sum_{g^2 \mid n} \#S^p_{n/g^2} = 4 \sum_{g^2 \mid n} r\!\left(\frac{n}{g^2}\right) = 4 \sum_{d \mid n} r\!\left(\frac{n}{d}\right) s(d) = 4(r * s)(n) = 4R(n). \]

Note in particular that Niven’s functions R and r correspond to our 4R and 4r, respectively.]
First, we assume that p ≡ 1 mod 4. We get the table

α r(pα ) s(pα ) R(pα )


0 1 1 1
1 2 0 2
2 2 1 3
3 2 0 4
4 2 1 5
5 2 0 6

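In fact, we can prove that $R(p^\alpha) = \alpha + 1$ for any p ≡ 1 mod 4: if α is even then $s(p^{\alpha-j}) = 1$ exactly when j is even, so
\[ R(p^\alpha) = \sum_{j=0}^{\alpha} r(p^j) s(p^{\alpha-j}) = r(1)s(p^\alpha) + \sum_{\substack{1 \le j \le \alpha \\ j \text{ even}}} r(p^j) = 1 + \sum_{\substack{1 \le j \le \alpha \\ j \text{ even}}} 2 = 1 + 2 \cdot \frac{\alpha}{2} = \alpha + 1. \]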

A similar proof works for α odd, and is left as an exercise. Now, suppose p ≡ 3 mod 4; we obtain

α r(pα ) s(pα ) R(pα )


0 1 1 1
1 0 0 0
2 0 1 1
3 0 0 0
4 0 1 1
5 0 0 0

On these primes, r acts like the identity ι (since $r(p^\alpha) = 0$ for every α ≥ 1), so the restriction of r ∗ s to powers of the primes congruent to 3 modulo 4 is simply s.
Finally, suppose p = 2; the table this time is

α r(pα ) s(pα ) R(pα )


0 1 1 1
1 1 0 1
2 0 1 1
3 0 0 1
4 0 1 1
5 0 0 1

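On these prime powers, r acts like $\mu^2$, so R acts like $\mu^2 * s = 1$. Thus we conclude that R is the multiplicative
function generated by
\[ R(p^\alpha) = \begin{cases} \alpha + 1 & \text{if } p \equiv 1 \bmod 4, \\ 1 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is even}, \\ 0 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is odd}, \\ 1 & \text{if } p = 2. \end{cases} \]
One consequence of this fact is that either R(n) = 0 (when some prime p ≡ 3 mod 4 divides n to an odd power), or

R(n) = #{d : d|n and p|d ⇒ p ≡ 1 mod 4}.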

10.3 Lecture Twenty-Eight

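Example: Let R(n) be the multiplicative function from the last lecture, generated by
\[ R(p^\alpha) = \begin{cases} \alpha + 1 & \text{if } p \equiv 1 \bmod 4, \\ 1 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is even}, \\ 0 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is odd}, \\ 1 & \text{if } p = 2. \end{cases} \]
Find a function g such that $R(n) = \sum_{d \mid n} g(d)$.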

nb. We defined
\[ R(n) = \sum_{g^2 \mid n} r\!\left(\frac{n}{g^2}\right) = \sum_{d \mid n} r\!\left(\frac{n}{d}\right) s(d). \]

Note that, since R = g ∗ 1, the Möbius inversion formula implies that g = R ∗ µ, and since R and µ are both
multiplicative, we know that g is as well. We observe that
\[ g(p^\alpha) = \sum_{d \mid p^\alpha} R\!\left(\frac{p^\alpha}{d}\right) \mu(d) = R(p^\alpha)\mu(1) + R(p^{\alpha-1})\mu(p) + \cdots + R(1)\mu(p^\alpha) = R(p^\alpha) - R(p^{\alpha-1}). \]

Thus:
• If p ≡ 1 mod 4 then $g(p^\alpha) = (\alpha + 1) - \alpha = 1$.
• If p ≡ 3 mod 4 then $g(p^\alpha) = 1 - 0 = 1$ if α is even, and $g(p^\alpha) = 0 - 1 = -1$ if α is odd.
• If p = 2 then $g(p^\alpha) = 1 - 1 = 0$.
Remarks:
• Since $g(p^\alpha) = g(p)^\alpha$ for every prime p and positive integer α, it follows that g is totally multiplicative.
• On odd primes, g(p) equals the Legendre symbol $\left(\frac{-1}{p}\right)$, and hence on odd n, g(n) equals the Jacobi symbol
$\left(\frac{-1}{n}\right)$. Thus, for odd n, $g(n) = (-1)^{(n-1)/2}$.

Consequently,
\[ R(n) = \sum_{d \mid n} g(d) = \#\{d \mid n : d \equiv 1 \bmod 4\} - \#\{d \mid n : d \equiv 3 \bmod 4\}. \]
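
As a sanity check on the divisor-counting formula, one can compare 4R(n) with a brute-force count of the integer solutions of x² + y² = n. This is only an illustrative sketch; the function names `R` and `representations` are chosen here and do not come from the notes.

```python
import math

def R(n):
    """#{d | n : d = 1 mod 4} - #{d | n : d = 3 mod 4}."""
    divs = [d for d in range(1, n + 1) if n % d == 0]
    return sum(1 for d in divs if d % 4 == 1) - sum(1 for d in divs if d % 4 == 3)

def representations(n):
    """Count ordered integer pairs (x, y), signs included, with x^2 + y^2 = n."""
    count = 0
    bound = math.isqrt(n)
    for x in range(-bound, bound + 1):
        y_squared = n - x * x
        y = math.isqrt(y_squared)
        if y * y == y_squared:
            count += 1 if y == 0 else 2  # y and -y are distinct when y != 0
    return count

for n in (1, 2, 5, 9, 21, 25, 65):
    print(n, representations(n), 4 * R(n))
```

The two columns printed for each n should agree, matching the relation #S_n = 4R(n) from the aside in the previous lecture.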

Some miscellany: Recall that $\sigma(n) = \sum_{d \mid n} d = 1 * \mathrm{id}$. The Greeks defined a perfect number to be a number
n whose proper divisors sum to n itself; that is, a number satisfying

n = σ(n) − n ⇔ σ(n) = 2n.

For example, 6 is perfect, as 6 = 1 + 2 + 3, as is 28 = 1 + 2 + 4 + 7 + 14. The next perfect number is 496, then
8128. Note that σ(n) is multiplicative, and that

\[ \sigma(p^\alpha) = 1 + p + p^2 + \cdots + p^\alpha = \frac{p^{\alpha+1} - 1}{p - 1}. \]

We see equivalently that n is a perfect number if and only if
\[ 2 = \frac{\sigma(n)}{n} = \prod_{p^\alpha \,\|\, n} \frac{p^{\alpha+1} - 1}{p^\alpha (p - 1)}. \]

Let us factor the first three perfect numbers:

\[ 6 = 2 \cdot 3 = 2^1(2^2 - 1), \qquad 28 = 2^2 \cdot 7 = 2^2(2^3 - 1), \qquad 496 = 2^4 \cdot 31 = 2^4(2^5 - 1). \]

This motivates our next result.


Theorem 10.3.1 If $q = 2^p - 1$ is prime, then $n = 2^{p-1} q$ is a perfect number.
Recall from a homework problem that if $2^k - 1$ is prime, then k must be prime, although this is not a sufficient
condition as e.g. $2^{11} - 1 = 2047 = 23 \cdot 89$.
Proof : We give two.
(1) By multiplicativity,
\[ \sigma(2^{p-1} q) = \sigma(2^{p-1})\sigma(q) = (2^p - 1)(q + 1) = 2^p(2^p - 1) = 2(2^{p-1})(2^p - 1) = 2(2^{p-1} q), \]
and we are done.

(2) We simply verify that the divisors of $2^{p-1} q$, namely
\[ 1, 2, 2^2, \ldots, 2^{p-1}, q, 2q, 2^2 q, \ldots, 2^{p-1} q, \]
sum to $2(2^{p-1} q)$.



As of the writing of these notes (Fall 2014), exactly 48 numbers of this form are known; note that all such numbers are even by construction. The
following theorem gives the converse statement.
Theorem 10.3.2 If n is an even perfect number, then $n = 2^{p-1}(2^p - 1)$, where both p and $2^p - 1$ are prime.
Proof : Write $n = 2^{k-1} m$ where k ≥ 2 and m is odd. If n is perfect, then
\[ 2^k m = 2n = \sigma(n) = \sigma(2^{k-1})\sigma(m) = (2^k - 1)\sigma(m). \]
Hence $(2^k - 1) \mid 2^k m$, and since $(2^k - 1, 2^k) = 1$, Euclid’s lemma gives $(2^k - 1) \mid m$. Writing $m = (2^k - 1)l$, we have $2^k l = \sigma(m)$;
but l and m are both divisors of m, so
\[ \sigma(m) \ge m + l = (2^k - 1)l + l = 2^k l. \]
Thus we have the equality
\[ \sigma(m) = \frac{2^k m}{2^k - 1} = 2^k l = (2^k - 1)l + l = m + l, \]
so m has exactly two divisors m and l, which are distinct because k ≥ 2, and we must have l = 1. It follows
that $m = 2^k - 1$ is prime, and hence (by the homework problem recalled above) that k is prime.
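
The two theorems can be illustrated computationally. The sketch below is an illustration under simple assumptions (trial-division primality, hypothetical helper names); it builds even perfect numbers from Mersenne primes and compares them with a brute-force search for n with σ(n) = 2n.

```python
import math

def sigma(n):
    """Sum of the positive divisors of n, pairing d with n // d."""
    total = 0
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    return total

def is_prime(n):
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def even_perfect_numbers(max_exponent):
    """Numbers 2^(p-1) * (2^p - 1) with 2^p - 1 prime and p <= max_exponent."""
    return [2 ** (p - 1) * (2 ** p - 1)
            for p in range(2, max_exponent + 1)
            if is_prime(2 ** p - 1)]

from_mersenne = even_perfect_numbers(7)
brute_force = [n for n in range(2, 10000) if sigma(n) == 2 * n]
print(from_mersenne, brute_force)
```

Both lists should coincide below 10000, which is consistent with (though of course does not prove) the absence of small odd perfect numbers.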

Some open conjectures:
1. There are infinitely many Mersenne primes (that is, primes of the form $2^p - 1$ with p prime), and hence
infinitely many even perfect numbers.
2. There are no odd perfect numbers.

11 Week Eleven

11.1 Lecture Twenty-Nine

Diophantine approximation is the technique of finding rational numbers near given real numbers. One
fundamental fact of Diophantine approximation that we will use frequently is that, if n ∈ Z and n ≠ 0,
then |n| ≥ 1.
Example: Define
\[ e = \sum_{n=0}^{\infty} \frac{1}{n!}; \]
we will prove that e is irrational. Indeed, assume not, and choose a, b ∈ Z, b > 0 such that $e = \frac{a}{b}$. Then be ∈ Z
and so in particular b!e ∈ Z. Thus we define
\[ m = b!\,e - \sum_{n=0}^{b} \frac{b!}{n!} = b! \sum_{n=b+1}^{\infty} \frac{1}{n!} \in \mathbb{Z}. \]
Clearly m > 0, and moreover in the last sum we see that every term is at most half the previous term, thus
\[ m = b! \sum_{n=b+1}^{\infty} \frac{1}{n!} < b! \sum_{n=b+1}^{\infty} \frac{1}{(b+1)!} \cdot \frac{1}{2^{\,n-(b+1)}} = \frac{2\,b!}{(b+1)!} = \frac{2}{b+1} \le 1. \]
That is,
m ∈ Z and 0 < m < 1,
which is a contradiction. Thus e ∉ Q.

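Lemma 11.1.1 If $\frac{a}{b}$, $\frac{c}{d}$ are distinct rational numbers, then $\left| \frac{a}{b} - \frac{c}{d} \right| \ge \frac{1}{|bd|}$.

Proof : This follows from the basic rules of arithmetic: since ad − bc is a nonzero integer,
\[ \left| \frac{a}{b} - \frac{c}{d} \right| = \frac{|ad - bc|}{|bd|} \ge \frac{1}{|bd|}. \]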


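Theorem 11.1.2 (Theorem 6.8, Niven; Dirichlet’s theorem on Diophantine approximation) Let x ∈ R, n ∈ N.
Then there exists $\frac{a}{b} \in \mathbb{Q}$ with 1 ≤ b ≤ n and $\left| x - \frac{a}{b} \right| \le \frac{1}{b(n+1)}$.

nb. It is slightly easier to prove the bound $\left| x - \frac{a}{b} \right| < \frac{1}{bn}$ or $\frac{1}{b(n-1)}$, but the inequality in the theorem statement
is the best possible result; indeed, we attain equality with $x = \frac{c}{n+1}$, (c, n + 1) = 1.
Proof : Define the fractional part of y to be {y} = y − ⌊y⌋ ∈ [0, 1). Consider the n real numbers
{x}, {2x}, . . . , {nx} and the n + 1 subintervals
\[ \left[0, \frac{1}{n+1}\right), \left[\frac{1}{n+1}, \frac{2}{n+1}\right), \ldots, \left[\frac{n}{n+1}, 1\right), \]
whose disjoint union is [0, 1). If some $\{jx\} \in [0, \frac{1}{n+1})$, then let $\frac{a}{b} = \frac{\lfloor jx \rfloor}{j}$; we have
\[ \left| x - \frac{a}{b} \right| = \left| \frac{jx}{j} - \frac{\lfloor jx \rfloor}{j} \right| = \frac{\{jx\}}{j} < \frac{1}{j(n+1)} = \frac{1}{b(n+1)}. \]
Similarly, if some $\{jx\} \in [\frac{n}{n+1}, 1)$ then we may take $\frac{a}{b} = \frac{\lfloor jx \rfloor + 1}{j}$, and we have
\[ \left| \frac{a}{b} - x \right| = \left| \frac{\lfloor jx \rfloor + 1}{j} - \frac{jx}{j} \right| = \frac{1 - \{jx\}}{j} \le \frac{1/(n+1)}{j} = \frac{1}{b(n+1)}. \]
Finally, if neither of these cases occurs, then the n numbers {jx} lie in the remaining n − 1 subintervals, so by the pigeonhole principle there exists some subinterval containing
{jx} and {kx} with j < k (say), so that $|\{jx\} - \{kx\}| < \frac{1}{n+1}$. Then, with a = ⌊kx⌋ − ⌊jx⌋, b = k − j, we
have
\[ \left| x - \frac{a}{b} \right| = \left| \frac{(k - j)x}{b} - \frac{\lfloor kx \rfloor - \lfloor jx \rfloor}{b} \right| = \frac{|\{kx\} - \{jx\}|}{b} < \frac{1/(n+1)}{b} = \frac{1}{b(n+1)}, \]
and we are done.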
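
For concreteness, here is a small brute-force sketch of the theorem, not the pigeonhole construction of the proof. It assumes x is supplied as an exact `Fraction` (an irrational x would first have to be replaced by a sufficiently good rational approximation), and the function name is hypothetical.

```python
from fractions import Fraction

def dirichlet_approximation(x, n):
    """Search denominators 1..n for a/b with |x - a/b| <= 1/(b(n+1)).

    x must be a Fraction so that all comparisons are exact.
    """
    for b in range(1, n + 1):
        a = round(x * b)  # nearest integer to b*x
        if abs(x - Fraction(a, b)) <= Fraction(1, b * (n + 1)):
            return a, b
    raise AssertionError("unreachable: Dirichlet's theorem guarantees a solution")

x = Fraction(314159, 100000)
for n in (1, 5, 10, 50):
    a, b = dirichlet_approximation(x, n)
    print(n, f"{a}/{b}")
```

The search returns the smallest admissible denominator, which need not be the one produced by the proof.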

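Corollary 1: If x ∈ R \ Q, then there exist infinitely many $\frac{a}{b} \in \mathbb{Q}$ such that $\left| x - \frac{a}{b} \right| < \frac{1}{b^2}$.
Proof : Theorem 11.1.2 gives, for every n ∈ N, a rational number $\frac{a_n}{b_n}$ with $1 \le b_n \le n$ and
\[ 0 < \left| x - \frac{a_n}{b_n} \right| \le \frac{1}{b_n(n+1)} < \frac{1}{b_n^2}. \]
Since x ∉ Q, we know that $\left| x - \frac{a_n}{b_n} \right| \neq 0$, so any given $\frac{a}{b}$ can equal only finitely many of the terms $\frac{a_n}{b_n}$, since
\[ \lim_{n \to \infty} \left| x - \frac{a_n}{b_n} \right| = 0. \]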

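We may generalize lemma 11.1.1 as follows:
Lemma 11.1.3 Let p(X) ∈ Z[X] have degree d and let $\frac{a}{b} \in \mathbb{Q}$. If $p\!\left(\frac{a}{b}\right) \neq 0$, then $\left| p\!\left(\frac{a}{b}\right) \right| \ge \frac{1}{b^d}$.
Proof : If $p(X) = c_d X^d + c_{d-1} X^{d-1} + \cdots + c_1 X + c_0$, where $c_i \in \mathbb{Z}$, $c_d \neq 0$, then
\[ b^d\, p\!\left(\frac{a}{b}\right) = c_d a^d + c_{d-1} a^{d-1} b + \cdots + c_1 a b^{d-1} + c_0 b^d \in \mathbb{Z}. \]
Hence if $p\!\left(\frac{a}{b}\right) \neq 0$, then $\left| b^d\, p\!\left(\frac{a}{b}\right) \right| \ge 1$, and the result is immediate.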
 


Definition: Let α ∈ R. We say that α is algebraic of degree d if there exists an irreducible polynomial
p(X) ∈ Z[X] of degree d such that p(α) = 0. If α is not algebraic, then α is said to be transcendental.

For example, $\sqrt{2}$ is algebraic of degree 2, as it is a root of $X^2 - 2$. Furthermore, α is algebraic of degree 1 if
and only if α ∈ Q.
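Theorem 11.1.4 (Liouville’s theorem on Diophantine approximation) Let α be algebraic of degree d. Then
there exists some constant C = C(α) > 0 such that, for any $\frac{a}{b} \in \mathbb{Q}$, $\frac{a}{b} \neq \alpha$, we have
\[ \left| \alpha - \frac{a}{b} \right| \ge \frac{C(\alpha)}{b^d}. \]

Proof : By taking C(α) ≤ 1 we may assume that $\frac{a}{b}$ satisfies $\left| \alpha - \frac{a}{b} \right| \le 1$. Choose p(X) ∈ Z[X] to be irreducible
of degree d and such that p(α) = 0. Then we must have $p\!\left(\frac{a}{b}\right) \neq 0$ and so by lemma 11.1.3 that $\left| p\!\left(\frac{a}{b}\right) \right| \ge \frac{1}{b^d}$.
But
\[ \left| p\!\left(\frac{a}{b}\right) \right| = \left| p\!\left(\frac{a}{b}\right) - p(\alpha) \right| = \left| \frac{a}{b} - \alpha \right| \cdot |p'(t)|, \]
for some t between α and $\frac{a}{b}$, by the mean value theorem. Thus, taking
\[ C(\alpha) = \min\left\{ 1,\ \frac{1}{\max\{ |p'(t)| : t \in [\alpha - 1, \alpha + 1] \}} \right\}, \]
we obtain
\[ \frac{1}{b^d} \le \left| p\!\left(\frac{a}{b}\right) \right| = \left| \frac{a}{b} - \alpha \right| \cdot |p'(t)| \le \left| \frac{a}{b} - \alpha \right| \cdot \frac{1}{C(\alpha)}, \]
and we are done.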

It was using this theorem that Liouville first demonstrated (1844) the existence of transcendental numbers.
This work preceded by several decades Cantor’s investigation of uncountable sets, which yields a simpler albeit
non-constructive proof of the existence of transcendental numbers.

11.2 Lecture Thirty

Recall: Theorem 11.1.4 (Liouville’s theorem).

It is a straightforward consequence of this theorem that the number
\[ \alpha = \sum_{n=1}^{\infty} 10^{-n!} = 0.11000100\ldots \]
is transcendental. Indeed, define


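\[ \frac{a_k}{b_k} = \sum_{n=1}^{k} 10^{-n!}, \]
so that $b_k = 10^{k!}$ and thus
\[ \left| \alpha - \frac{a_k}{b_k} \right| = \sum_{n=k+1}^{\infty} 10^{-n!}. \]
We note that each summand is at most half the previous one, thus
\[ \left| \alpha - \frac{a_k}{b_k} \right| = \sum_{n=k+1}^{\infty} 10^{-n!} \le \sum_{n=k+1}^{\infty} 10^{-(k+1)!} \cdot 2^{-(n-(k+1))} = \frac{2}{10^{(k+1)!}} = \frac{2}{b_k^{k+1}}. \]
If α were algebraic of degree d, then for some constant C(α) > 0 we would have
\[ \frac{C(\alpha)}{b_k^d} \le \left| \alpha - \frac{a_k}{b_k} \right| \le \frac{2}{b_k^{k+1}}, \]
and thus $b_k^{k+1-d} \le \frac{2}{C(\alpha)}$. Taking k → ∞ yields a contradiction, and so we see that α cannot be algebraic.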
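Recall: Last lecture we showed that for all α ∈ R \ Q there are infinitely many $\frac{a}{b} \in \mathbb{Q}$ such that $\left| \alpha - \frac{a}{b} \right| < \frac{1}{b^2}$.
Theorem 11.2.1 (Roth’s theorem) If α is algebraic, then for any ε > 0 there exists some constant C = C(α, ε) > 0
such that
\[ \left| \alpha - \frac{a}{b} \right| \ge \frac{C(\alpha, \varepsilon)}{b^{2+\varepsilon}} \quad \text{for all } \frac{a}{b} \in \mathbb{Q} \text{ with } \frac{a}{b} \neq \alpha. \]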

§6.1 – Farey sequences


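Given n ∈ N, the Farey fractions of order n are those $\frac{a}{b} \in \mathbb{Q}$ such that 1 ≤ b ≤ n and 0 ≤ a ≤ b; that
is,
\[ F_n = \left\{ \frac{a}{b} : 1 \le b \le n,\ 0 \le a \le b \right\} \subset \mathbb{Q} \cap [0, 1]. \]
Usually the set is thought of as being totally-ordered. For example,
\[ F_5 = \left\{ \frac{0}{1}, \frac{1}{5}, \frac{1}{4}, \frac{1}{3}, \frac{2}{5}, \frac{1}{2}, \frac{3}{5}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, 1 \right\}. \]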

If we know the first few elements of Fn , how can we compute the next?
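Proposition 11.2.2 Let $\frac{a}{b} \in F_n$ be in lowest terms with a ≠ b. The next element of $F_n$ after $\frac{a}{b}$ is $\frac{x}{y}$, where $y \equiv -a^{-1} \bmod b$, n − b < y ≤ n, and $x = \frac{ay+1}{b}$.

Proof : Since $ay + 1 \equiv a(-a^{-1}) + 1 \equiv 0 \bmod b$, we know that x ∈ Z. Moreover since y ≤ n and 1 ≤ y(b − a),
we know
\[ \frac{x}{y} = \frac{ay + 1}{by} \le \frac{by}{by} = 1, \]
and thus $\frac{x}{y} \in F_n$. Now, suppose $\frac{c}{d} \in F_n$ with $\frac{a}{b} < \frac{c}{d} < \frac{x}{y}$. Then
\[ \left( \frac{x}{y} - \frac{c}{d} \right) + \left( \frac{c}{d} - \frac{a}{b} \right) = \frac{bx - ay}{yb} = \frac{1}{yb}. \]
But by lemma 11.1.1, together with y + b ≥ n + 1 and d ≤ n, we know that
\[ \left( \frac{x}{y} - \frac{c}{d} \right) + \left( \frac{c}{d} - \frac{a}{b} \right) \ge \frac{1}{yd} + \frac{1}{db} = \frac{y + b}{ybd} \ge \frac{n+1}{ybd} \ge \frac{1}{yb} \cdot \frac{n+1}{n} > \frac{1}{yb}, \]
which is a contradiction, and we are done.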



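Corollary 1: If $\frac{a}{b} < \frac{x}{y}$ are consecutive Farey fractions (for any fixed n), then xb − ay = 1.
Corollary 2: If $\frac{a}{b} < \frac{c}{d} < \frac{x}{y}$ are consecutive Farey fractions, then $\frac{c}{d} = \frac{a+x}{b+y}$.

For example,
\[ F_4 = \left\{ \frac{0}{1}, \frac{1}{4}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, 1 \right\}. \]
The fractions of $F_5 \setminus F_4$ are exactly
\[ \frac{1}{5} = \frac{0+1}{1+4}, \quad \frac{2}{5} = \frac{1+1}{3+2}, \quad \frac{3}{5} = \frac{1+2}{2+3}, \quad \frac{4}{5} = \frac{3+1}{4+1}, \]
which are seen to lie in the respective intervals
\[ \left( \frac{0}{1}, \frac{1}{4} \right), \quad \left( \frac{1}{3}, \frac{1}{2} \right), \quad \left( \frac{1}{2}, \frac{2}{3} \right), \quad \left( \frac{3}{4}, \frac{1}{1} \right). \]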
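
Proposition 11.2.2 gives a way to walk through F_n one fraction at a time. The following sketch (function names are illustrative only) finds y by searching the window n − b < y ≤ n directly rather than computing a modular inverse.

```python
def next_farey(a, b, n):
    """Successor of a/b (lowest terms, a < b) in the Farey sequence F_n."""
    for y in range(n - b + 1, n + 1):      # the window n - b < y <= n
        if (a * y + 1) % b == 0:           # equivalent to y = -a^{-1} mod b
            return (a * y + 1) // b, y
    raise ValueError("a/b must be in lowest terms with a < b <= n")

def farey(n):
    """List F_n as (numerator, denominator) pairs, starting from 0/1."""
    sequence = [(0, 1)]
    while sequence[-1] != (1, 1):
        a, b = sequence[-1]
        sequence.append(next_farey(a, b, n))
    return sequence

print(farey(5))
```

For n = 5 this should reproduce the list of F_5 given above, and consecutive entries a/b < x/y satisfy xb − ay = 1 as in Corollary 1.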

Next lecture, we will use the Farey fractions to give an alternate proof of Dirichlet’s theorem.

11.3 Lecture Thirty-One
In the Farey fractions $F_n$ of order n, we have that if $\frac{b}{r} < \frac{c}{s}$ are consecutive, then
\[ rc - sb = 1 \quad \text{and} \quad \frac{b}{r} < \frac{b+c}{r+s} < \frac{c}{s} \quad \text{with } r + s \ge n + 1. \]
Indeed, the condition r + s ≥ n + 1 is necessary for our second result, otherwise the middle fraction is itself a
Farey fraction, a contradiction.
Recall: Dirichlet’s theorem on Diophantine approximation (theorem 11.1.2), which states that if x ∈ R, n ∈ N,
then there exists $\frac{a}{q} \in \mathbb{Q}$ with 1 ≤ q ≤ n and $\left| x - \frac{a}{q} \right| \le \frac{1}{q(n+1)}$.
Proof : Write α for x. If $\alpha \in F_n$, then take $\frac{a}{q} = \alpha$. Otherwise, choose $\frac{b}{r} < \frac{c}{s}$ to be consecutive in $F_n$ such that
\[ \frac{b}{r} < \alpha < \frac{c}{s}, \]
by replacing α with {α} if necessary. We now have two cases.
1. Suppose
\[ \frac{b}{r} < \alpha \le \frac{b+c}{r+s}, \]
and take $\frac{a}{q} = \frac{b}{r}$. We have
\[ \left| \alpha - \frac{b}{r} \right| \le \frac{b+c}{r+s} - \frac{b}{r} = \frac{cr - bs}{r(r+s)} = \frac{1}{r(r+s)} \le \frac{1}{r(n+1)}, \]
and by assumption 1 ≤ r ≤ n.
2. If instead we have
\[ \frac{b+c}{r+s} \le \alpha < \frac{c}{s}, \]
we instead take $\frac{a}{q} = \frac{c}{s}$, and the proof unfolds in the same way.

§7.1 – The Euclidean algorithm
We can think of continued fractions as a consequence of the Euclidean algorithm.
Example: We find (73, 26). Simple calculation shows

73 = 2 · 26 + 21,
26 = 1 · 21 + 5,
21 = 4 · 5 + 1,
5 = 5 · 1 + 0.

Note also that
\[ \frac{73}{26} = 2 + \frac{21}{26} = 2 + \frac{1}{26/21} = 2 + \cfrac{1}{1 + \cfrac{5}{21}}. \]
Continuing in this fashion, we have
\[ \frac{73}{26} = 2 + \cfrac{1}{1 + \cfrac{5}{21}} = 2 + \cfrac{1}{1 + \cfrac{1}{4 + \cfrac{1}{5}}}. \]

This is an example of the type of expression we will now study.
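Definition: A continued fraction is an expression of the form
\[ x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{\ddots + \cfrac{1}{x_j}}}}, \]
where $x_i \in \mathbb{R}$ and $x_0, x_1, \ldots, x_j > 0$; we will mostly be interested in the situation when $x_i \in \mathbb{Z}$ for every i. We
have the shorthand notation $\langle x_0; x_1, x_2, \ldots, x_j \rangle$. For example,
\[ \frac{73}{26} = \left\langle 2; \frac{26}{21} \right\rangle = \left\langle 2; 1, \frac{21}{5} \right\rangle = \langle 2; 1, 4, 5 \rangle. \]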

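Example: Find a simple expression for $\langle 1; 3, 1, 5, x \rangle$ as a function of x > 0. We have
\[ \langle 1; 3, 1, 5, x \rangle = 1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{1}{5 + \cfrac{1}{x}}}} = 1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{x}{5x+1}}} = 1 + \cfrac{1}{3 + \cfrac{5x+1}{6x+1}} = 1 + \frac{6x+1}{23x+4} = \frac{29x+5}{23x+4}. \]
We may write the above calculation more compactly as
\[ \langle 1; 3, 1, 5, x \rangle = \left\langle 1; 3, 1, \frac{5x+1}{x} \right\rangle = \left\langle 1; 3, \frac{6x+1}{5x+1} \right\rangle = \left\langle 1; \frac{23x+4}{6x+1} \right\rangle = \frac{29x+5}{23x+4}. \]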

Some useful identities:


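• $\langle x_0; x_1, x_2, \ldots, x_j \rangle = x_0 + \dfrac{1}{\langle x_1; x_2, x_3, \ldots, x_j \rangle}$.
• $\langle x_0; x_1, x_2, \ldots, x_j \rangle = \left\langle x_0; x_1, x_2, \ldots, x_{j-2}, x_{j-1} + \dfrac{1}{x_j} \right\rangle$.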

Example: We find a fraction between $\frac{14}{5} = 2.8$ and $\frac{73}{26} = 2.8076923\ldots$ with minimal denominator. Note that
$\frac{14}{5} = \langle 2; 1, 4 \rangle$ and $\frac{73}{26} = \langle 2; 1, 4, 5 \rangle$. The function $f(x) = \langle 2; 1, 4, x \rangle$ for x > 0 is a decreasing function of x and
satisfies
\[ f(5) = \frac{73}{26}, \qquad \lim_{x \to \infty} f(x) = \frac{14}{5}. \]
Thus taking x = 6 we have
\[ f(6) = \langle 2; 1, 4, 6 \rangle = \frac{87}{31} = 2.8064\ldots \]
It is no coincidence that this is the Farey mediant $\frac{14+73}{5+26}$ of $\frac{14}{5}$ and $\frac{73}{26}$ in $F_{31}$.
It is not difficult to see that
\[ f(x_0, x_1, \ldots, x_k) = \langle x_0; x_1, x_2, \ldots, x_k \rangle \]
is an increasing function of $x_j$ for every even j and a decreasing function of $x_j$ for every odd j. Thus if $a_i, b_i \in \mathbb{Z}$,
we have that
\[ \langle a_0; a_1, a_2, \ldots, a_k \rangle < \langle b_0; b_1, b_2, \ldots, b_k \rangle \]
if and only if
• $a_0 < b_0$, or
• $a_0 = b_0$ and $a_1 > b_1$, or
• $a_0 = b_0$ and $a_1 = b_1$ and $a_2 < b_2$, or . . .
Thus we have an alternating lexicographic ordering on the integral continued fractions. To compare
$\langle a_0; a_1, a_2, \ldots, a_k \rangle$ to $\langle a_0; a_1, a_2, \ldots, a_l \rangle$ with k < l, we write, formally,
\[ \langle a_0; a_1, a_2, \ldots, a_k \rangle = \langle a_0; a_1, a_2, \ldots, a_k, \infty \rangle. \]
Finally since we may always write, for example,
\[ 4 = 3 + \frac{1}{1} \ \Rightarrow\ \langle 2; 1, 4 \rangle = \langle 2; 1, 3, 1 \rangle, \]
we remark on the special case
\[ \langle a_0; a_1, a_2, \ldots, a_k \rangle = \langle a_0; a_1, a_2, \ldots, a_k - 1, 1 \rangle. \]

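Notation: For the Euclidean algorithm applied to the pair $(u_0, u_1)$, we write
\begin{align*}
u_0 &= u_1 a_0 + u_2, & 0 < u_2 &< u_1, \\
u_1 &= u_2 a_1 + u_3, & 0 < u_3 &< u_2, \\
&\ \,\vdots \\
u_{k-1} &= u_k a_{k-1} + u_{k+1}, & 0 < u_{k+1} &< u_k, \\
u_k &= u_{k+1} a_k + u_{k+2}, & 0 = u_{k+2} &< u_{k+1}.
\end{align*}
We call the coefficients $a_i$ partial quotients. We have equivalently
\begin{align*}
\frac{u_0}{u_1} &= a_0 + \frac{1}{u_1/u_2}, & a_0 &= \left\lfloor \frac{u_0}{u_1} \right\rfloor, \\
\frac{u_1}{u_2} &= a_1 + \frac{1}{u_2/u_3}, & a_1 &= \left\lfloor \frac{u_1}{u_2} \right\rfloor, \\
&\ \,\vdots \\
\frac{u_k}{u_{k+1}} &= a_k, & a_k &= \left\lfloor \frac{u_k}{u_{k+1}} \right\rfloor = \frac{u_k}{u_{k+1}}.
\end{align*}
Similarly, we have for example
\[ \frac{u_1}{u_2} = \frac{1}{\{u_0/u_1\}} = \frac{1}{\frac{u_0}{u_1} - a_0}. \]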
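
The notation above translates directly into a loop. The following is a small sketch (with hypothetical function names) that reads the partial quotients of a rational number off the Euclidean algorithm and rebuilds the fraction from them.

```python
from fractions import Fraction

def partial_quotients_rational(u0, u1):
    """Partial quotients a_0, a_1, ..., a_k of u0/u1 via the Euclidean algorithm."""
    quotients = []
    while u1 != 0:
        a, remainder = divmod(u0, u1)   # u0 = u1 * a + remainder
        quotients.append(a)
        u0, u1 = u1, remainder
    return quotients

def evaluate(quotients):
    """Evaluate <a_0; a_1, ..., a_k> exactly, working from the innermost term."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value

print(partial_quotients_rational(73, 26))   # the example from the text
print(evaluate([2, 1, 4, 6]))                # the mediant example
```

The first line should print the quotients 2, 1, 4, 5 found by hand above, and the second should print 87/31.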

12 Week Twelve

12.1 Lecture Thirty-Two

The Process: Given ξ ∈ R, define $\xi_0 = \xi$ and set
\begin{align*}
a_0 &= \lfloor \xi_0 \rfloor, & \xi_1 &= \frac{1}{\xi_0 - a_0} = \frac{1}{\{\xi_0\}}, \\
a_1 &= \lfloor \xi_1 \rfloor, & \xi_2 &= \frac{1}{\xi_1 - a_1} = \frac{1}{\{\xi_1\}},
\end{align*}
and so on. We saw in our last lecture that if $\xi = \frac{m}{n}$, then The Process is exactly the Euclidean algorithm
applied to find (m, n); in particular, The Process eventually terminates. Conversely, if ξ ∈ R \ Q, then The
Process never terminates. Furthermore, we see that
\[ \xi = \langle \xi \rangle = \langle a_0; \xi_1 \rangle = \langle a_0; a_1, \xi_2 \rangle = \cdots \]
The numbers $a_j$ are called the partial quotients of ξ.



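Example: Let $\xi = \sqrt[3]{2} = 1.25992\ldots$ We have $\xi_0 = \xi$, and
\begin{align*}
a_0 &= \lfloor \sqrt[3]{2} \rfloor = 1, & \xi_1 &= \frac{1}{\xi_0 - 1} = 3.84732\ldots \\
a_1 &= \lfloor \xi_1 \rfloor = 3, & \xi_2 &= \frac{1}{\xi_1 - 3} = 1.18019\ldots \\
a_2 &= \lfloor \xi_2 \rfloor = 1, & \xi_3 &= \frac{1}{\xi_2 - 1} = 5.54974\ldots \\
a_3 &= \lfloor \xi_3 \rfloor = 5, & \xi_4 &= \frac{1}{\xi_3 - 5} = 1.81905\ldots
\end{align*}
We have that
\[ \sqrt[3]{2} = \langle 1; 3, 1, 5, \xi_4 \rangle = \frac{29\xi_4 + 5}{23\xi_4 + 4}; \]
solving this expression for $\xi_4$, we obtain
\[ \xi_4 = \frac{4\sqrt[3]{2} - 5}{-23\sqrt[3]{2} + 29}. \]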

Definition: Given $a_0 \in \mathbb{Z}$, $a_1, a_2, \ldots \in \mathbb{N}$, define recursively the sequences
\begin{align*}
h_{-2} &= 0, & h_{-1} &= 1, & h_j &= a_j h_{j-1} + h_{j-2} \text{ for } j \ge 0, \\
k_{-2} &= 1, & k_{-1} &= 0, & k_j &= a_j k_{j-1} + k_{j-2} \text{ for } j \ge 0.
\end{align*}
Furthermore for j ≥ 0 define $r_j = \frac{h_j}{k_j}$; if the coefficients $a_j$ are those found in The Process applied to ξ ∈ R,
then $r_j$ is called the jth convergent to ξ. Continuing from our last example, the partial quotients of $\sqrt[3]{2}$ are
1, 3, 1, 5, . . . We have the following table:

j     a_j   h_j   k_j   r_j
−2          0     1     0
−1          1     0     ∞
0     1     1     1     1
1     3     4     3     4/3
2     1     5     4     5/4
3     5     29    23    29/23

Note that $r_0 = 1$, $r_1 = 1.3333\ldots$, $r_2 = 1.25$, $r_3 = 1.26087\ldots$, so that the convergents are indeed good rational
approximations to $\sqrt[3]{2} = 1.25992\ldots$
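
Both The Process and the recurrences for h_j and k_j are easy to carry out mechanically. The sketch below is an illustration only: it runs The Process in floating point, so only the first several partial quotients can be trusted, and the function names are not from the notes.

```python
import math

def partial_quotients(xi, count):
    """First `count` partial quotients of xi via The Process (floating point)."""
    quotients = []
    for _ in range(count):
        a = math.floor(xi)
        quotients.append(a)
        fractional_part = xi - a
        if fractional_part == 0:        # xi was (numerically) an integer: stop
            break
        xi = 1 / fractional_part
    return quotients

def convergents(quotients):
    """Pairs (h_j, k_j) for j >= 0 from h_j = a_j h_{j-1} + h_{j-2}, likewise k_j."""
    h_prev, h = 0, 1   # h_{-2}, h_{-1}
    k_prev, k = 1, 0   # k_{-2}, k_{-1}
    pairs = []
    for a in quotients:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        pairs.append((h, k))
    return pairs

quotients = partial_quotients(2 ** (1 / 3), 4)
print(quotients, convergents(quotients))
```

For the cube root of 2 this should reproduce the quotients 1, 3, 1, 5 and the convergents 1/1, 4/3, 5/4, 29/23 from the table.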
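Theorem 12.1.1 (Theorem 7.3, Niven) For any x > 0, we have that
\[ \langle a_0; a_1, a_2, \ldots, a_{j-1}, x \rangle = \frac{x h_{j-1} + h_{j-2}}{x k_{j-1} + k_{j-2}}. \]
In particular,
\[ \langle a_0; a_1, a_2, \ldots, a_{j-1}, a_j \rangle = \frac{a_j h_{j-1} + h_{j-2}}{a_j k_{j-1} + k_{j-2}} = \frac{h_j}{k_j}. \]

Proof : We use induction. In the j = 0 case we have that $\langle x \rangle = \frac{x \cdot 1 + 0}{x \cdot 0 + 1}$ which is clearly so, and thus we may
assume the claim holds up to j. We have
\begin{align*}
\langle a_0; a_1, a_2, \ldots, a_j, x \rangle &= \left\langle a_0; a_1, a_2, \ldots, a_{j-1}, a_j + \frac{1}{x} \right\rangle = \frac{(a_j + \frac{1}{x}) h_{j-1} + h_{j-2}}{(a_j + \frac{1}{x}) k_{j-1} + k_{j-2}} \\
&= \frac{(a_j h_{j-1} + h_{j-2})x + h_{j-1}}{(a_j k_{j-1} + k_{j-2})x + k_{j-1}} = \frac{x h_j + h_{j-1}}{x k_j + k_{j-1}}.
\end{align*}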

Example: Suppose $a_j = 1$ for all j ≥ 0. Then $h_j = F_{j+2}$, $k_j = F_{j+1}$, where $F_n$ are the Fibonacci numbers
$F_n = F_{n-1} + F_{n-2}$ normalized so that $F_0 = 0$, $F_1 = 1$. In particular,
\[ \underbrace{\langle 1; 1, 1, \ldots, 1 \rangle}_{j \text{ copies}} = \frac{F_{j+1}}{F_j} \xrightarrow{\ j \to \infty\ } \varphi, \]
where $\varphi = \frac{1 + \sqrt{5}}{2} = 1.618033\ldots$ is the golden ratio.
Theorem 12.1.2 (Theorem 7.5, Niven) For j ≥ −1 one has $h_j k_{j-1} - k_j h_{j-1} = (-1)^{j-1}$. In particular, this
means that $(h_j, k_j) = 1$ for every j and that
\[ r_j - r_{j-1} = \frac{(-1)^{j-1}}{k_j k_{j-1}}. \]

Proof : Exercise. (hint: use induction)

From the last equation, we know that $r_j > r_{j-1}$ if and only if j is odd.
Theorem 12.1.3 (Convergence of convergents) Let ξ ∈ R and let $a_0, a_1, a_2, \ldots$ be its partial quotients, with
$\xi_j, h_j, k_j, r_j$ defined as above. Then
\[ \xi - r_j = \frac{(-1)^j}{k_j(\xi_{j+1} k_j + k_{j-1})}, \]
and in particular $\lim_{j \to \infty} r_j = \xi$.

Proof : We apply theorems 12.1.1 and 12.1.2 to obtain
\begin{align*}
\xi - r_j &= \langle a_0; a_1, a_2, \ldots, a_j, \xi_{j+1} \rangle - r_j = \frac{\xi_{j+1} h_j + h_{j-1}}{\xi_{j+1} k_j + k_{j-1}} - \frac{h_j}{k_j} \\
&= \frac{h_{j-1} k_j - h_j k_{j-1}}{k_j(\xi_{j+1} k_j + k_{j-1})} = \frac{(-1)^j}{k_j(\xi_{j+1} k_j + k_{j-1})},
\end{align*}
and we are done.

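Note that $a_{j+1} \le \xi_{j+1} < a_{j+1} + 1$. Given n ∈ N, choose j so that $k_j \le n < k_{j+1}$; since $\xi_{j+1} k_j + k_{j-1} \ge a_{j+1} k_j + k_{j-1} = k_{j+1} \ge n + 1$, we can show
that
\[ \left| \xi - \frac{h_j}{k_j} \right| \le \frac{1}{k_j(n+1)}. \]
Thus every convergent $r_j$ confirms Dirichlet’s theorem on Diophantine approximation. We may also restate the
theorem thus:
\[ \left| \xi - \frac{h_j}{k_j} \right| = \frac{1}{k_j^2} \cdot \frac{1}{\xi_{j+1} + k_{j-1}/k_j}, \quad \text{where } a_{j+1} \le \xi_{j+1} + \frac{k_{j-1}}{k_j} \le a_{j+1} + 2. \]
Hence, the greater $a_{j+1}$, the better the approximation $r_j = \langle a_0; a_1, a_2, \ldots, a_j \rangle$ is to ξ.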

12.2 Lecture Thirty-Three

Recall: Theorem 12.1.1 tells us that
\[ \xi = \frac{\xi_j h_{j-1} + h_{j-2}}{\xi_j k_{j-1} + k_{j-2}}, \]
from which it follows that
\[ \xi_j = \frac{\xi k_{j-2} - h_{j-2}}{-\xi k_{j-1} + h_{j-1}}. \]

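Example: Let $\xi = \xi_0 = \sqrt{41} = 6.40312\ldots$ We see that
\begin{align*}
a_0 &= \lfloor \sqrt{41} \rfloor = 6, & \xi_1 &= \frac{1}{\xi_0 - 6} = 2.48062\ldots \\
a_1 &= \lfloor \xi_1 \rfloor = 2, & \xi_2 &= \frac{1}{\xi_1 - 2} = 2.08062\ldots \\
a_2 &= \lfloor \xi_2 \rfloor = 2, & \xi_3 &= \frac{1}{\xi_2 - 2} = 12.40312\ldots
\end{align*}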
We have the table:
j aj hj kj
−2 0 1
−1 1 0
0 6 6 1
1 2 13 2
2 2 32 5
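Thus
\begin{align*}
\xi_1 &= \frac{\xi k_{-1} - h_{-1}}{-\xi k_0 + h_0} = \frac{1}{\sqrt{41} - 6}, \\
\xi_2 &= \frac{\xi k_0 - h_0}{-\xi k_1 + h_1} = \frac{\sqrt{41} - 6}{-2\sqrt{41} + 13}, \\
\xi_3 &= \frac{\xi k_1 - h_1}{-\xi k_2 + h_2} = \frac{2\sqrt{41} - 13}{-5\sqrt{41} + 32}.
\end{align*}
Rationalizing denominators, we obtain
\begin{align*}
\xi_1 &= \frac{1}{\sqrt{41} - 6} \cdot \frac{\sqrt{41} + 6}{\sqrt{41} + 6} = \frac{\sqrt{41} + 6}{5}, \\
\xi_2 &= \frac{\sqrt{41} - 6}{-2\sqrt{41} + 13} \cdot \frac{2\sqrt{41} + 13}{2\sqrt{41} + 13} = \frac{4 + \sqrt{41}}{5}, \\
\xi_3 &= \frac{2\sqrt{41} - 13}{-5\sqrt{41} + 32} \cdot \frac{5\sqrt{41} + 32}{5\sqrt{41} + 32} = 6 + \sqrt{41}.
\end{align*}
We see that $\sqrt{41} = \langle 6; 2, 2, 6 + \sqrt{41} \rangle$, hence
\[ 6 + \sqrt{41} = \langle 12; 2, 2, 6 + \sqrt{41} \rangle = \langle 12; 2, 2, 12, 2, 2, 6 + \sqrt{41} \rangle = \cdots \]
Thus $\sqrt{41} = \langle 6; \overline{2, 2, 12} \rangle$; that is, $\sqrt{41}$ has a periodic continued fraction.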
Lemma 12.2.1 If the continued fraction of ξ ∈ R is eventually periodic, then ξ is a quadratic irrational,
i.e. it is the root of some quadratic polynomial with integer coefficients.

Proof : For simplicity we will assume that the continued fraction is purely periodic, although the stronger claim
is true; that is, assume
\[ \xi = \langle \overline{a_0; a_1, a_2, \ldots, a_{j-1}} \rangle. \]
Then
\[ \xi = \langle a_0; a_1, a_2, \ldots, a_{j-1}, \xi \rangle = \frac{\xi h_{j-1} + h_{j-2}}{\xi k_{j-1} + k_{j-2}}, \]
hence $\xi(\xi k_{j-1} + k_{j-2}) = \xi h_{j-1} + h_{j-2}$, and so
\[ k_{j-1} \xi^2 + (k_{j-2} - h_{j-1})\xi - h_{j-2} = 0. \]



Lemma 12.2.2 Every real quadratic irrational $r + s\sqrt{c}$, where r, s ∈ Q and c ∈ N is not a perfect square
(written $c \in \mathbb{N} \setminus \mathbb{N}^2$), can be written $\frac{m + \sqrt{d}}{q}$, where m, q ∈ Z, $d \in \mathbb{N} \setminus \mathbb{N}^2$, and $q \mid (d - m^2)$.

Proof : Taking a common denominator for r and s, we may write
\[ r + s\sqrt{c} = \frac{a + b\sqrt{c}}{e} = \frac{a + \sqrt{cb^2}}{e} = \frac{ae + \sqrt{cb^2e^2}}{e^2}, \]
and the claim is now immediate.


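The Quadratic Irrational Process: Let $\xi = \xi_0 = \frac{m_0 + \sqrt{d}}{q_0}$, where d, $m_0$, and $q_0$ satisfy the conditions of
lemma 12.2.2. For j ≥ 0, define
\[ a_j = \lfloor \xi_j \rfloor, \qquad m_{j+1} = a_j q_j - m_j, \qquad q_{j+1} = \frac{d - m_{j+1}^2}{q_j}, \qquad \xi_{j+1} = \frac{m_{j+1} + \sqrt{d}}{q_{j+1}}. \]
The $a_j$ and $\xi_j$ so produced are the same as those produced in The Process.

Example: $\xi = \xi_0 = \sqrt{41}$, so that $m_0 = 0$, d = 41, $q_0 = 1$.
\begin{align*}
j = 0:&\quad a_0 = \lfloor \sqrt{41} \rfloor = 6, & m_1 &= 6 \cdot 1 - 0 = 6, & q_1 &= \frac{41 - 6^2}{1} = 5, & \xi_1 &= \frac{6 + \sqrt{41}}{5}. \\
j = 1:&\quad a_1 = \left\lfloor \frac{6 + \sqrt{41}}{5} \right\rfloor = 2, & m_2 &= 2 \cdot 5 - 6 = 4, & q_2 &= \frac{41 - 4^2}{5} = 5, & \xi_2 &= \frac{4 + \sqrt{41}}{5}. \\
j = 2:&\quad a_2 = \left\lfloor \frac{4 + \sqrt{41}}{5} \right\rfloor = 2, & m_3 &= 2 \cdot 5 - 4 = 6, & q_3 &= \frac{41 - 6^2}{5} = 1, & \xi_3 &= \frac{6 + \sqrt{41}}{1}.
\end{align*}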
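
Because The Quadratic Irrational Process only involves integers, it can be run exactly. The sketch below handles the special case ξ₀ = √d (so m₀ = 0, q₀ = 1) and relies on the standard fact, assumed here rather than proved in the notes, that in this case every q_j is positive, so that ⌊(m + √d)/q⌋ = ⌊(m + ⌊√d⌋)/q⌋. The function name is hypothetical.

```python
import math

def sqrt_process(d, steps):
    """Rows (a_j, m_j, q_j) of The Quadratic Irrational Process for sqrt(d)."""
    root = math.isqrt(d)
    if root * root == d:
        raise ValueError("d must not be a perfect square")
    m, q = 0, 1                      # xi_0 = (0 + sqrt(d)) / 1
    rows = []
    for _ in range(steps):
        a = (m + root) // q          # floor of (m + sqrt(d)) / q, valid since q > 0
        rows.append((a, m, q))
        m = a * q - m                # m_{j+1} = a_j q_j - m_j
        q = (d - m * m) // q         # q_{j+1} = (d - m_{j+1}^2) / q_j, an exact division
    return rows

for row in sqrt_process(41, 4):
    print(row)
```

For d = 41 the rows should match the worked example, with the fourth row (12, 6, 1) marking the start of the next period.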

Theorem 12.2.3 (Theorem 7.19, Niven) Given a quadratic irrational ξ0 , we have:


1. The qj from The Quadratic Irrational Process are integers which are eventually positive.
2. The qj and the mj are bounded.
3. The continued fraction for ξ0 is eventually periodic.

Example: The quadratic irrational − 12 − 43 5 has continued fraction h−3; 1, 4, 4, 1, 1, 1, 5, 3, 5i.
Proof : (sketch) (1) ⇒ (2): Since $q_j > 0$ for all j sufficiently large, and $q_j q_{j+1} + m_{j+1}^2 = d$, we see that there
are only finitely many choices for the $q_j$, $m_j$.
(2) ⇒ (3) There are only finitely many pairs $(m_j, q_j)$, and so by the pigeonhole principle there must eventually
occur a duplicate. The pair $(m_j, q_j)$ determines the values for the next step of The Quadratic Irrational
Process.
(3) ⇒ (1) Highly nontrivial, and omitted.


Theorem 12.2.4 (Theorem 7.21, Niven) Let $d \in \mathbb{N} \setminus \mathbb{N}^2$ and set $c = \sqrt{d}$. Then ⌊c⌋ + c has a purely periodic
continued fraction $\langle \overline{a_0; a_1, a_2, \ldots, a_{r-1}} \rangle$ with $a_0 = 2\lfloor c \rfloor$. Hence $c = \langle \lfloor c \rfloor; \overline{a_1, a_2, \ldots, a_r} \rangle$ where $a_r = 2\lfloor c \rfloor$.

We refer to our earlier example, where we found that $6 + \sqrt{41}$ has a purely periodic continued fraction.
Proof : (omitted)

Facts: If $\xi = \sqrt{d}$ and $q_j$ are defined as above, then:
• For every j we have $q_j \neq -1$.
• If r is the period of the continued fraction of ξ, then $q_j = 1$ if and only if r | j.

12.3 Lecture Thirty-Four

Notation: Throughout this lecture, d denotes a positive integer that is not a perfect square. The symbols
$a_j, h_j, k_j$ denote the terms from The Process applied to $\sqrt{d}$, and similarly for $m_j, q_j$.
Pell’s equation: We are interested in integer solutions to the equation $x^2 - dy^2 = N$ for some fixed N ∈ Z; in
particular, we seek solutions where both x and y are positive.

Theorem 12.3.1 (Theorem 7.24, Niven) If $|N| < \sqrt{d}$, then for any positive solution (x, y) to Pell’s equation
we must have that $\frac{x}{y}$ is a convergent to $\sqrt{d}$. In particular, if (x, y) = 1 then we must have that $x = h_j$ and
$y = k_j$ for some j.
Proof : (omitted)

Example: Every solution of $x^2 - 41y^2 = -1$ must come from a convergent of $\sqrt{41}$. We saw in our last lecture
that in this case $h_2 = 32$, $k_2 = 5$, and indeed
\[ (32)^2 - 41(5)^2 = 1024 - 1025 = -1. \]

Theorem 7.22 of Niven gives us the following key identity: for j ≥ −1, one has $h_j^2 - dk_j^2 = (-1)^{j+1} q_{j+1}$. At
the end of our last lecture we saw that $q_j = 1$ if and only if r | j, where r is the period of the continued fraction
of $\sqrt{d}$. It is a corollary (Corollary 7.23) that, for every l ≥ 0, we have
\[ h_{lr-1}^2 - d\,k_{lr-1}^2 = (-1)^{lr}. \]

Example: We solve Pell’s equation for d = 45. We have
\[ \sqrt{45} = \langle 6; \overline{1, 2, 2, 2, 1, 12} \rangle, \]
so r = 6. Then with l = 1, we have by corollary 7.23 that
\[ h_5 = 161, \quad k_5 = 24, \quad \text{hence } 161^2 - 45(24)^2 = q_6 = 1. \]
So a solution to $x^2 - 45y^2 = 1$ is x = 161, y = 24. Note that
\[ \frac{h_5}{k_5} = r_5 = \langle 6; 1, 2, 2, 2, 1 \rangle. \]
Another solution is given by l = 2; we have
\[ \frac{h_{11}}{k_{11}} = r_{11} = \langle 6; 1, 2, 2, 2, 1, 12, 1, 2, 2, 2, 1 \rangle = \frac{51841}{7728}, \]
and indeed we have that $51841^2 - 45(7728)^2 = 1$.
Theorem 12.3.2 (Theorem 7.25, Niven) All solutions to $x^2 - dy^2 = \pm 1$ are of the form $x = h_{lr-1}$, $y = k_{lr-1}$,
where l ≥ 0 and r is the period of the continued fraction of $\sqrt{d}$. Furthermore if r is even then there are no
positive solutions to $x^2 - dy^2 = -1$, and the positive solutions to $x^2 - dy^2 = 1$ are exactly $x = h_{lr-1}$, $y = k_{lr-1}$
with l ≥ 1; if r is odd, then the positive solutions to $x^2 - dy^2 = -1$ are exactly $x = h_{lr-1}$, $y = k_{lr-1}$ where l is
odd and positive, and the positive solutions to $x^2 - dy^2 = 1$ are exactly $x = h_{lr-1}$, $y = k_{lr-1}$ where l is even and
positive. In every case, y = y(l) is a strictly increasing function of l.
This is the main result of our foregoing work.
Remark: Suppose $s^2 - dt^2 = A$, $u^2 - dv^2 = B$. Factoring over the reals gives
\[ A = (s - t\sqrt{d})(s + t\sqrt{d}), \qquad B = (u - v\sqrt{d})(u + v\sqrt{d}), \]
from which it follows that
\[ AB = \left( (su + dtv) - \sqrt{d}(sv + tu) \right)\left( (su + dtv) + \sqrt{d}(sv + tu) \right) = (su + dtv)^2 - d(sv + tu)^2. \]

In particular, if A = 1, then we get new solutions to the equation $x^2 - dy^2 = A$ by considering $(s + t\sqrt{d})^l$ with
l ≥ 2.
Example: Suppose d = 45. Set s = 161, t = 24 so that $s^2 - dt^2 = 1$. We have
\[ (161 + 24\sqrt{45})^2 = 51841 + 7728\sqrt{45}, \qquad (161 + 24\sqrt{45})^3 = 16{,}692{,}641 + 2{,}488{,}392\sqrt{45}, \]
and indeed
\[ 16{,}692{,}641^2 - 45 \cdot 2{,}488{,}392^2 = 1, \qquad h_{17} = 16{,}692{,}641, \quad k_{17} = 2{,}488{,}392. \]

Proof : (omitted)
Theorem 12.3.3 (Theorem 7.26, Niven) Set $x_1 = h_{r-1}$, $y_1 = k_{r-1}$, where r is the period of the continued
fraction of $\sqrt{d}$. Define $x_l, y_l$ recursively via
\[ x_l + y_l\sqrt{d} = (x_1 + y_1\sqrt{d})^l. \]
Then $x_l = h_{lr-1}$ and $y_l = k_{lr-1}$.

Proof : (omitted)
Theorems 12.3.2 and 12.3.3 together tell us that the smallest (in terms of y) solution to $x^2 - dy^2 = \pm 1$ is given
by $x_1 = h_{r-1}$, $y_1 = k_{r-1}$, and moreover that all solutions may be found by taking powers of $x_1 + y_1\sqrt{d}$.
Example: Suppose d = 41; then the smallest positive solution to $x^2 - 41y^2 = -1$ is $x_1 = h_2 = 32$, $y_1 = k_2 = 5$.
Thus
\[ x_2 + y_2\sqrt{41} = (32 + 5\sqrt{41})^2 = 2049 + 320\sqrt{41}. \]
By theorem 12.3.3, (2049, 320) is the smallest positive solution to $x^2 - 41y^2 = 1$.
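
The recipe of theorems 12.3.2 and 12.3.3 can be sketched in a few lines: run The Quadratic Irrational Process on √d, update the convergents alongside it, and record (h_j, k_j) whenever q_{j+1} = 1. This is an illustrative sketch with a hypothetical function name; it assumes, as a standard fact not proved in the notes, that all q_j are positive when starting from √d.

```python
import math

def pell_solutions(d, count):
    """First `count` triples (h, k, h^2 - d k^2) with (h, k) = (h_{lr-1}, k_{lr-1}), l >= 1."""
    root = math.isqrt(d)
    if root * root == d:
        raise ValueError("d must not be a perfect square")
    m, q = 0, 1              # state of The Quadratic Irrational Process
    h_prev, h = 0, 1         # h_{-2}, h_{-1}
    k_prev, k = 1, 0         # k_{-2}, k_{-1}
    solutions = []
    while len(solutions) < count:
        a = (m + root) // q
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        m = a * q - m
        q = (d - m * m) // q
        if q == 1:           # q_{j+1} = 1, i.e. j + 1 is a multiple of the period r
            solutions.append((h, k, h * h - d * k * k))
    return solutions

print(pell_solutions(41, 2))
print(pell_solutions(45, 2))
```

For d = 41 (odd period) the right-hand sides alternate between −1 and +1; for d = 45 (even period) they are always +1, in line with theorem 12.3.2.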

13 Week Thirteen

13.1 Lecture Thirty-Five

Miscellany about continued fractions: Given an arbitrary continued fraction, must it correspond to a real
number? Let $a_0 \in \mathbb{Z}$, $a_1, a_2, \ldots \in \mathbb{N}$, and define
\[ L = \langle a_0; a_1, a_2, \ldots \rangle = \lim_{n \to \infty} \langle a_0; a_1, a_2, \ldots, a_n \rangle. \]

Theorem 13.1.1 The limit L always exists and is irrational. Moreover, the partial quotients of L are exactly
$a_0, a_1, a_2, \ldots$
Recall: If $r_n$ denotes the nth convergent of L, we have $r_n = \langle a_0; a_1, \ldots, a_n \rangle$ and moreover
\[ r_n - r_{n-1} = \frac{(-1)^{n-1}}{k_n k_{n-1}}. \]
This implies that the convergents oscillate around L. Indeed, define $\alpha_n = \frac{1}{k_n k_{n-1}}$ so that
\[ r_n = a_0 + \sum_{j=1}^{n} (-1)^{j-1} \alpha_j; \]
as an alternating series whose terms decrease to zero, we know that this series converges and thus that the convergents also
converge.
Example: Define $x = \langle 1; 1, 1, \ldots \rangle$ so that $x = 1 + \frac{1}{x}$. This yields the quadratic equation $x^2 - x - 1 = 0$ and
since x > 0 we deduce that $x = \frac{1 + \sqrt{5}}{2} = \varphi$, as introduced in lecture thirty-two. With the Fibonacci numbers as
defined there, we have
\[ F_n = \frac{1}{\sqrt{5}} \left( \varphi^n - (-\varphi)^{-n} \right), \quad \text{and} \quad m \mid n \Rightarrow F_m \mid F_n. \]

Definition: A real number is called simply normal in base-10 if, for every i ∈ {0, 1, . . . , 9}, the asymptotic frequency
of the digit i in its decimal expansion is 1/10 (informally, the probability of randomly selecting an i in its decimal expansion is 0.1).
There is an analogous definition for simple normality in base-b. A real number is normal base-b if it is simply
normal base-b, base-$b^2$, base-$b^3$, and so on. For example, $0.\overline{0123456789}$ is simply normal base-10, but not
normal.
Theorem 13.1.2 Almost all real numbers are normal base-10.
Champernowne’s number: Let c = 0.12345678910111213 . . . D.G. Champernowne showed in 1933 that c is
normal base-10.
It is conjectured that the following numbers are normal: π, e, log 2, and any algebraic number $q \in \overline{\mathbb{Q}}$ of degree at least 3.
It is a consequence of theorem 13.1.2, applied in each base b ≥ 2 (a countable union of null sets is null), that almost all real numbers are normal in every base simultaneously.
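Back to continued fractions: given ξ ∈ R, define
\[ \delta_k(\xi) = \lim_{x \to \infty} \frac{\#\{n \le x : a_n = k\}}{x}. \]
Aleksandr Khinchin showed that, for almost all ξ ∈ R, $\delta_k(\xi)$ exists and equals $\log_2\!\left(1 + \frac{1}{k(k+2)}\right)$, thus
\[ \delta_1 \approx 0.415, \quad \delta_2 \approx 0.170, \quad \delta_3 \approx 0.093, \ldots \]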

One number which fails this test is
\[ e = \langle 2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, \ldots \rangle. \]
Furthermore, any number of the form $\frac{me + n}{re + s}$ also fails Khinchin’s theorem. It is conjectured that the following
numbers satisfy Khinchin’s theorem: π, log 2, and any algebraic number $q \in \overline{\mathbb{Q}}$ of degree at least 3. Khinchin also proved (1934)
that, for almost all ξ ∈ R, one has
\[ \lim_{n \to \infty} (a_1 a_2 \cdots a_n)^{1/n} = \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k(k+2)} \right)^{\log_2 k} = 2.6854520010\ldots \]

Theorem 13.1.3 (Theorem 7.17, Niven) For all ξ ∈ R \ Q, there exist infinitely many $\frac{a}{b} \in \mathbb{Q}$ such that
$\left| \xi - \frac{a}{b} \right| < \frac{1}{\sqrt{5}\, b^2}$, and moreover $\sqrt{5}$ is the best possible such bound.

By discarding the (countable) set of real numbers ξ for which the bound $\sqrt{5}$ is necessary, we may improve the
bound to $\sqrt{8}$; repeating this process we obtain bounds of $\frac{\sqrt{221}}{5}, \frac{\sqrt{1517}}{13}, \ldots$ These numbers arise naturally in the
study of the Markov spectrum.
Theorem 13.1.4 (Theorem 7.14, Niven) If $\left| \xi - \frac{a}{b} \right| < \frac{1}{2b^2}$, then $\frac{a}{b}$ is a convergent to ξ.

13.2 Lecture Thirty-Six

Numerical examples of continued fractions


Let y = 365.242199 . . . be the number of solar days in a year; it has been a challenge for centuries to construct
a calendar which takes into account this lack of integrality. Numa Pompilius devised a calendar (ca. 713 BCE)
in which occasional and irregular leap months would be added into the middle of February. Julius Caesar (48
BCE) devised the Julian calendar, in which every year has 365 days, except for every fourth year which has
366.
While divergence from the true count is slow in the Julian calendar (amounting to about 11 days over 1800
years) it is noticeable; in 1582, Pope Gregory XIII introduced the Gregorian calendar as a replacement. In this
calendar, every year divisible by 4 is a leap year, except years divisible by 100 and not 400. This is the most
widely-used calendar in contemporary Western society; it averages 365.2425 days per year, and so diverges by
about 3 days every 10,000 years.
The continued fraction of $y$ is $\langle 365; 4, 7, 1, 3, 5, 20, \ldots \rangle$, and the convergents to $y - 365$ are
\[ \frac{1}{4}, \quad \frac{7}{29}, \quad \frac{8}{33}, \quad \frac{31}{128}, \quad \frac{163}{673}, \quad \ldots \]
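The expansion of $y$ can be reproduced with exact rational arithmetic. This sketch assumes the truncated decimal 365.242199 stands in for $y$, so only the first few partial quotients are meaningful; the function names are hypothetical.

```python
from fractions import Fraction

def continued_fraction(x: Fraction, terms: int) -> list:
    """Partial quotients of x: repeatedly take the floor, then invert the rest."""
    quotients = []
    for _ in range(terms):
        a = x.numerator // x.denominator   # floor of x
        quotients.append(a)
        x -= a
        if x == 0:
            break
        x = 1 / x
    return quotients

def evaluate(quotients: list) -> Fraction:
    """Value of a finite continued fraction, folded up from the last term."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value

year = Fraction("365.242199")              # assumption: truncated value of y
cf = continued_fraction(year, 7)
print(cf)

# Convergents of y - 365: truncate after 1, 2, ..., 5 partial quotients.
fractional_convergents = [evaluate([0] + cf[1:n]) for n in range(2, 7)]
print(fractional_convergents)
```

Each convergent is obtained by cutting the expansion short and evaluating the resulting finite fraction from the innermost term outward.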
To get a good rational approximation, we need to truncate before a large partial quotient. Using the convergent $\frac{31}{128}$, we might say that we have a leap year every year which is divisible by 4, except years that are divisible by 128. In hexadecimal: a year is a leap year if it ends in 0, 4, 8, or C, unless it ends in 00 or 80. This diverges by about one day every 87,000 years, and we have
\[ 365\tfrac{31}{128} = 365.2421875. \]
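The 128-year rule is easy to test numerically. The sketch below counts leap years over one full cycle and estimates the drift against the value $y = 365.242199$ quoted in the notes; `is_leap_128` is a hypothetical name for the rule described above.

```python
def is_leap_128(year: int) -> bool:
    """Leap-year rule from the convergent 31/128:
    divisible by 4, except when divisible by 128."""
    return year % 4 == 0 and year % 128 != 0

# One full cycle of 128 consecutive years contains 32 - 1 = 31 leap years.
leap_years = sum(is_leap_128(year) for year in range(1, 129))
average_length = 365 + leap_years / 128

tropical_year = 365.242199                      # value of y used in the notes
years_per_day_of_drift = 1 / abs(average_length - tropical_year)

print(leap_years, average_length)
print(round(years_per_day_of_drift))            # on the order of 87,000 years
```

The same computation with the Gregorian average of 365.2425 gives a drift of one day in roughly 3,300 years, consistent with about 3 days per 10,000 years.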
Now, let $m = 29.53059\ldots$ be the number of days in a lunar month (that is, from one new moon to the next), so that we have $\frac{y}{m} = 12.3683\ldots$ Taking the continued fraction,
\[ \frac{y}{m} = \langle 12; 2, 1, 2, 1, 1, 17, \ldots \rangle, \]
and the convergents of $\frac{y}{m} - 12$ are
\[ \frac{1}{2}, \quad \frac{1}{3}, \quad \frac{3}{8}, \quad \frac{4}{11}, \quad \frac{7}{19}, \quad \ldots \]
Modern lunisolar calendars have 7 leap months every 19 years, diverging by one month every 6800 years.
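The 19-year cycle can be recovered from the partial quotients with the standard recurrence $p_n = a_n p_{n-1} + p_{n-2}$, $q_n = a_n q_{n-1} + q_{n-2}$. This is a minimal sketch; the generator name `convergents` is hypothetical and the partial quotients are simply copied from the expansion above.

```python
from fractions import Fraction

def convergents(quotients):
    """Yield the convergents p_n/q_n of <a_0; a_1, a_2, ...> via the recurrence
    p_n = a_n p_{n-1} + p_{n-2},  q_n = a_n q_{n-1} + q_{n-2}."""
    p_prev, q_prev = 1, 0
    p, q = quotients[0], 1
    yield Fraction(p, q)
    for a in quotients[1:]:
        p_prev, q_prev, p, q = p, q, a * p + p_prev, a * q + q_prev
        yield Fraction(p, q)

# Truncate before the large partial quotient 17.
months_per_year = list(convergents([12, 2, 1, 2, 1, 1]))
print(months_per_year)

best = months_per_year[-1]
leap_months = best.numerator - 12 * best.denominator
print(best.numerator, "months in", best.denominator, "years;", leap_months, "leap months")
```

The last convergent says that 235 lunar months fit almost exactly into 19 years, which is 12 months per year plus 7 extra months per cycle.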
In modern western music, the A above middle C is assigned the frequency 440 Hz. By doubling this frequency, we obtain a note one octave higher; tripling it, we obtain a perfect fifth between 880 Hz and 1320 Hz. Unfortunately, much like the alignment of months and years, the alignment of octaves and fifths is out of sync; indeed,
\[ \frac{(3/2)^{12}}{2^7} \approx 1.014. \]
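The mismatch between twelve fifths and seven octaves can be computed exactly as a ratio of integers. A short sketch follows; the variable name `comma` is an illustrative choice.

```python
from fractions import Fraction

# Twelve perfect fifths (ratio 3/2 each) against seven octaves (ratio 2 each).
comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7

print(comma)          # exact ratio 3^12 / 2^19
print(float(comma))   # about 1.0136
```

If fifths and octaves lined up perfectly this ratio would be exactly 1; the excess of about 1.4% is what any tuning system has to absorb.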
However, an equally-tempered tuning divides each octave into 12 equal segments, so each semitone is an increase by a factor of $2^{1/12}$; in this case we see $2^{7/12} \approx 1.498$. We take the continued fraction:

\[ \log_2(3/2) = \frac{\log(3/2)}{\log 2} = 0.58496\ldots = \langle 0; 1, 1, 2, 2, 3, 1, 5, \ldots \rangle, \]
with convergents
\[ 1, \quad \frac{1}{2}, \quad \frac{3}{5}, \quad \frac{7}{12}, \quad \frac{24}{41}, \quad \ldots \]

So if we wanted to divide the octave into $x$ notes so that an interval of $y$ of them makes a perfect fifth, we
would do better to take $x = 41$, $y = 24$.
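One way to see which octave divisions are good is to search for denominators $q$ at which $|q \log_2(3/2) - p|$ sets a new record. This is an exploratory sketch under floating-point arithmetic; the function name `record_approximations` and the search limit of 60 are arbitrary choices.

```python
import math

def record_approximations(alpha: float, max_q: int) -> list:
    """Pairs (p, q), q <= max_q, at which |q*alpha - p| is smaller than for
    every smaller denominator; p is the nearest integer to q*alpha."""
    records = []
    best_error = float("inf")
    for q in range(1, max_q + 1):
        p = round(alpha * q)
        error = abs(alpha * q - p)
        if error < best_error:
            best_error = error
            records.append((p, q))
    return records

fifth = math.log2(3 / 2)
records = record_approximations(fifth, 60)
print(records)
```

The record-setting pairs found this way are the convergents listed above, $\frac{7}{12}$ and $\frac{24}{41}$ among them, together with the next one, $\frac{31}{53}$.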
Pythagorean triplets: What are all positive integer solutions to the equation $x^2 + y^2 = z^2$? A primitive
triplet is a solution to this equation in which $(x, y) = 1$.
Theorem 13.2.1 (Theorem 5.5, Niven) The positive, primitive Pythagorean triplets (with $y$ even) are parameterized by:
\[ x = r^2 - s^2, \quad y = 2rs, \quad z = r^2 + s^2, \]
where $r > s > 0$, $(r, s) = 1$, and $r$ and $s$ have opposite parity.
nb. For any primitive (x, y, z), exactly one of x and y is even.
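The parameterization of Theorem 13.2.1 translates directly into a generator of primitive triplets. A minimal sketch, with the hypothetical name `primitive_triples` and an arbitrary bound on $r$:

```python
from math import gcd

def primitive_triples(max_r: int):
    """Yield (x, y, z) = (r^2 - s^2, 2rs, r^2 + s^2) for r > s > 0,
    gcd(r, s) = 1, and r, s of opposite parity, with r <= max_r."""
    for r in range(2, max_r + 1):
        for s in range(1, r):
            if gcd(r, s) == 1 and (r - s) % 2 == 1:
                yield (r * r - s * s, 2 * r * s, r * r + s * s)

triples = list(primitive_triples(4))
print(triples)

# Each triple satisfies the equation, is primitive, and has y even.
for x, y, z in triples:
    assert x * x + y * y == z * z
    assert gcd(x, y) == 1 and y % 2 == 0
```

Dropping either side condition breaks primitivity: for example $r = 3$, $s = 1$ (same parity) gives $(8, 6, 10)$, a multiple of $(4, 3, 5)$.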
Proof : We give two sketches.
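1. We may factor $y^2 = (z - x)(z + x)$, hence
\[ \left(\frac{y}{2}\right)^2 = \frac{z+x}{2} \cdot \frac{z-x}{2}, \quad \text{with } \left(\frac{z+x}{2}, \frac{z-x}{2}\right) = 1. \]
By Euclid's lemma, we must have that $\frac{z+x}{2} = r^2$, $\frac{z-x}{2} = s^2$.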

2. We have $\left(\frac{x}{z}\right)^2 + \left(\frac{y}{z}\right)^2 = 1$, and so we seek to find the rational points $q$ of the unit circle. The line joining
any rational point $q$ to $(-1, 0)$ has rational slope; conversely, any line through $(-1, 0)$ with rational slope
intersects the circle in a rational point:

\[ y = m(x + 1),\ m \in \mathbb{Q} \ \Rightarrow\ x^2 + (m(x + 1))^2 = 1 \ \Leftrightarrow\ (x + 1)\left((m^2 + 1)x + (m^2 - 1)\right) = 0. \]

So, all rational points on the circle other than $(-1, 0)$ have the form

\[ \left( \frac{1 - m^2}{1 + m^2},\ \frac{2m}{1 + m^2} \right), \quad m \in \mathbb{Q}. \]
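The slope parameterization can be verified with exact rational arithmetic. In this sketch the function name `circle_point` is hypothetical; it returns the second intersection of the line of slope $m$ through $(-1, 0)$ with the unit circle.

```python
from fractions import Fraction

def circle_point(m: Fraction) -> tuple:
    """Rational point ((1 - m^2)/(1 + m^2), 2m/(1 + m^2)) on the unit circle."""
    denominator = 1 + m * m
    return ((1 - m * m) / denominator, 2 * m / denominator)

x, y = circle_point(Fraction(1, 2))
print(x, y)                      # 3/5 4/5, i.e. the triplet (3, 4, 5)

# The identity x^2 + y^2 = 1 holds exactly for every rational slope tried.
for slope in (Fraction(1, 3), Fraction(2, 3), Fraction(-5, 7), Fraction(4)):
    px, py = circle_point(slope)
    assert px * px + py * py == 1
```

Writing $m = s/r$ and clearing denominators recovers $x = r^2 - s^2$, $y = 2rs$, $z = r^2 + s^2$, linking the two sketches.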


The approach of proof (2) generalizes to arbitrary conic sections.

13.3 Lecture Thirty-Seven

Final exam review


At least half of the problems on the final will be taken from homework problems. No calculators are permitted.
Below is a brief overview of the important topics covered.
Chapter One – Divisibility
• The Euclidean algorithm: calculating the gcd, Bézout’s identity, calculating inverses modulo m.
• The Fundamental theorem of arithmetic.
• Euclid’s theorem
Chapter Two – Congruences
• The Chinese remainder theorem.
• Euler’s theorem; Fermat’s little theorem.
• The Euler φ-function.
• Primitive roots; the structure of $\mathbb{Z}_n^\times$.
• Hensel’s lemma.
• Solving linear congruences ax ≡ b mod m.
• The number of solutions of $x^n \equiv a \bmod p$.
Example problems: Find all $n \in \mathbb{Z}$ such that $3^n \equiv n \bmod 7$. Show that $a^{\varphi(n)} \equiv a^{2\varphi(n)} \bmod n$ for all $a \in \mathbb{Z}$, $n \in \mathbb{N}$.
Prove that a composite squarefree integer $n$ is a Carmichael number if and only if $(p - 1) | (n - 1)$ for every prime $p | n$.
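The Carmichael criterion in the last example problem can be sanity-checked numerically before attempting a proof. This sketch assumes the usual definition (a composite $n$ with $a^{n-1} \equiv 1 \bmod n$ for every $a$ coprime to $n$); all function names are hypothetical.

```python
from math import gcd

def prime_factors(n: int) -> list:
    """Prime factors of n with multiplicity, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def is_carmichael(n: int) -> bool:
    """Assumed definition: composite, and a^(n-1) = 1 mod n for all a coprime to n."""
    if len(prime_factors(n)) < 2:
        return False
    return all(pow(a, n - 1, n) == 1 for a in range(2, n) if gcd(a, n) == 1)

def satisfies_criterion(n: int) -> bool:
    """Composite, squarefree, and (p - 1) | (n - 1) for every prime p | n."""
    factors = prime_factors(n)
    composite = len(factors) >= 2
    squarefree = len(set(factors)) == len(factors)
    return composite and squarefree and all((n - 1) % (p - 1) == 0 for p in factors)

by_criterion = [n for n in range(2, 1200) if satisfies_criterion(n)]
by_definition = [n for n in range(2, 1200) if is_carmichael(n)]
print(by_criterion, by_definition)
```

Agreement on a finite range is evidence, not a proof; the exercise still asks for the argument via the structure of $\mathbb{Z}_p^\times$.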
Chapter Three – Quadratic Reciprocity and Quadratic Forms
• Sums of two squares.
• The law of quadratic reciprocity.
• Jacobi symbols, Legendre symbols; special known values of the same.
• Quadratic residues and nonresidues.
• Euler’s criterion.
• Binary quadratic forms
Example problem: In $\mathbb{Z}_n^\times$, prove that at most half of the elements are quadratic residues, and that exactly half
of them are quadratic residues if and only if $n$ has a primitive root.
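The claim in this example problem can be explored by brute force for small moduli. The sketch below restricts to $n \ge 3$ (an assumption, to avoid the trivial groups) and uses hypothetical helper names.

```python
from math import gcd

def units(n: int) -> list:
    """Elements of the multiplicative group modulo n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def square_count(n: int) -> int:
    """Number of distinct squares among the units modulo n."""
    return len({a * a % n for a in units(n)})

def has_primitive_root(n: int) -> bool:
    """True if some unit has multiplicative order equal to the group size."""
    group = units(n)
    for g in group:
        x, order = g, 1
        while x != 1:
            x = x * g % n
            order += 1
        if order == len(group):
            return True
    return False

for n in range(3, 61):
    half = 2 * square_count(n) == len(units(n))
    assert 2 * square_count(n) <= len(units(n))
    assert half == has_primitive_root(n)
print("checked n = 3..60")
```

For instance, modulo 8 the only square among the four units is 1, and there is no primitive root; modulo 5 exactly two of the four units are squares.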
Chapter Four – Some Functions of Number Theory
• Multiplicative functions, totally multiplicative functions.
• Dirichlet convolution.
• Möbius inversion.
Chapters Six and Seven – Farey Fractions and Irrational Numbers; Simple Continued Fractions
• Dirichlet’s theorem on Diophantine approximation.

• Farey fractions.
• Diophantine approximations to rational and algebraic numbers.
• Continued fractions.
• Pell’s equation.
