Diophantine Geometry Course
1. Diophantine Equations
Objective. Solve polynomial equations using integers (or rationals).
Example. Linear equation: Solve ax + by = c where a, b, c ∈ Z. It is a classical fact
from elementary number theory that this equation has a solution (for x and y) if and only
if gcd(a, b) | c.
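The criterion above can be made constructive. The following is a minimal sketch, assuming the standard extended Euclidean algorithm; the function names `extended_gcd` and `solve_linear` are illustrative and do not come from the notes.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g and g == gcd(a, b) >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    # Invariant: old_r == a*old_x + b*old_y and r == a*x + b*y.
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def solve_linear(a, b, c):
    """Return one integer solution (x, y) of a*x + b*y == c, or None if none exists."""
    g, x, y = extended_gcd(a, b)
    if g == 0:
        # a == b == 0: solvable only when c == 0.
        return (0, 0) if c == 0 else None
    if c % g != 0:
        # gcd(a, b) does not divide c, so there is no solution.
        return None
    k = c // g
    return x * k, y * k
```

For instance, `solve_linear(6, 10, 8)` returns a pair `(x, y)` with `6*x + 10*y == 8`, while `solve_linear(6, 10, 7)` returns `None` because gcd(6, 10) = 2 does not divide 7.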
Example. Quadratic equation: $x^2 + y^2 = z^2$. We are interested in non-zero solutions $(x, y, z) \in \mathbb{Z}^3$, i.e. $(x, y, z) \neq (0, 0, 0)$. Since the equation is homogeneous, it is enough to understand the solutions of $X^2 + Y^2 = 1$ where $X, Y \in \mathbb{Q}$ (points on the unit circle). In any case, the complete solution is known for this problem. WLOG $\gcd(x, y, z) = 1$, $x$ is odd and $y$ is even. Then the solutions are given by $x = s^2 - t^2$, $y = 2st$, and $z = s^2 + t^2$ for suitable coprime integers $s, t$ of opposite parity. Analogously, the solutions $(X, Y) \in \mathbb{Q}^2$ of $X^2 + Y^2 = 1$ are parametrized by
\[ X = \frac{s^2 - t^2}{s^2 + t^2} \quad \text{and} \quad Y = \frac{2st}{s^2 + t^2}. \]
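As a quick sanity check of the parametrization, here is a small sketch that enumerates primitive triples with $x$ odd and $y$ even from coprime $s > t \geq 1$ of opposite parity. The function name `primitive_triples` and the cutoff parameter `bound` are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from math import gcd


def primitive_triples(bound):
    """List primitive triples (x, y, z) with x odd, y even and z <= bound,
    produced by x = s^2 - t^2, y = 2st, z = s^2 + t^2."""
    triples = []
    s = 2
    while s * s + 1 <= bound:
        for t in range(1, s):
            # s, t coprime and of opposite parity give a primitive triple.
            if (s - t) % 2 == 1 and gcd(s, t) == 1:
                x, y, z = s * s - t * t, 2 * s * t, s * s + t * t
                if z <= bound:
                    triples.append((x, y, z))
        s += 1
    return sorted(triples, key=lambda triple: triple[2])


def to_circle_point(triple):
    """Map an integer triple (x, y, z) to the rational point (x/z, y/z) on the unit circle."""
    x, y, z = triple
    return Fraction(x, z), Fraction(y, z)
```

Each triple gives a rational point on the unit circle; for example `to_circle_point((3, 4, 5))` is `(Fraction(3, 5), Fraction(4, 5))`.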
Example. Consider the equation $y^2 = x^3 - 2$. It turns out that $(3, \pm 5)$ are the only solutions in $\mathbb{Z}^2$, while there are infinitely many solutions $(x, y) \in \mathbb{Q}^2$.
Goal. Let $f_1, f_2, \ldots, f_k \in \mathbb{Z}[X_1, \ldots, X_n]$ be given, and let $R$ be $\mathbb{Z}$, $\mathbb{Q}$, or any other ring. Let
\[ V(R) = \{(x_1, x_2, \ldots, x_n) \in R^n : f_i(x_1, \ldots, x_n) = 0 \text{ for all } i\}. \]
Describe $V(R)$. Two questions naturally arise. 1) Is $V(R) = \emptyset$? This is undecidable for $R = \mathbb{Z}$. 2) Is $V(R)$ finite?
2 variables, 1 equation. Let $C$ be a plane curve given by $f(x, y) = 0$ where $f(x, y) \in \mathbb{Z}[x, y]$. The goal would be to describe the solutions $(x, y)$ in $\mathbb{Z}^2$, $\mathbb{Q}^2$, $\mathbb{R}^2$ or $\mathbb{C}^2$. As the ring gets bigger and bigger, the task progressively becomes easier. In other words, we are concerned with the solution set $C(R) = \{(x, y) \in R^2 : f(x, y) = 0\}$.
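Example. $C : x^3 + y^3 = 1$.
\[ C(\mathbb{Q}) \overset{\text{THM}}{=} C(\mathbb{Z}) = \{(1, 0), (0, 1)\} \]
Example. $C : x^n + y^n = 1$ with $n \geq 3$.
\[ C(\mathbb{Q}) \overset{\text{FLT}}{\subseteq} \{(\pm 1, 0), (0, \pm 1)\} \]
where FLT stands for Fermat's Last Theorem (proved by Wiles).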
Idea. If $\deg f(x, y)$ is big, does that necessarily mean fewer solutions? Not necessarily, e.g. $y = x^d$ still has plenty of solutions.
Guiding Principle. Geometry (solutions to polynomial equations over an algebraically
closed field) determines the arithmetic (number theory, i.e. solutions over integers or non-
closed fields).
Consider the plane curve $C : f(x, y) = 0$. There are some extra points "at infinity". Let $\overline{C} = C \cup \{\text{points at infinity}\}$. Sometimes, $\overline{C}$ is a nice curve (smooth). Not so nice curves are the ones with singularities (think of a cuspidal or nodal cubic curve). We can blow up these curves at their singular points to make them smooth.
Assuming $\overline{C}$ is nice, the set $\overline{C}(\mathbb{C})$ is a nonsingular compact 1-dimensional complex manifold, i.e. a Riemann surface of genus $g$. Intuitively, $g$ counts the number of holes. So $g = 0$ corresponds to the 2-sphere, while $g = 1$ corresponds to the usual torus, etc. So here, the genus $g$ is the "geometry" side (see the Guiding Principle above).
Theorem. Consider the plane curve C : f (x, y) = 0 for f ∈ Q[x, y]. Suppose there are
no singularities, so C(C) = g-holed torus, where g is the genus of C. There are three cases
to consider.
Case 1. $g = 0$. Then $C(\mathbb{Q}) = \emptyset$ or $C(\mathbb{Q}) = \mathbb{Q} \cup \{\infty\}$. There exists an algorithm to determine which of the two alternatives occurs.
Case 2. $g = 1$. Then $C(\mathbb{Q}) = \emptyset$ or $C(\mathbb{Q})$ is a finitely-generated abelian group. This is the Mordell-Weil Theorem. In the latter case, we know that any finitely-generated abelian group is of the form
\[ \underbrace{\text{finite abelian group}}_{=\,\text{torsion part}} \times \mathbb{Z}^r. \]
The non-negative integer $r$ is called the rank. It is a theorem of Mazur that the torsion part has order at most 16. Furthermore, there exists an algorithm to determine the torsion part. It is not known whether the rank $r$ can be unbounded. The record example cited here, due to Elkies, has rank $r \geq 28$. There is no known algorithm to determine the rank in general.
Case 3. g ≥ 2. Then C(Q) is finite. This is a theorem of Faltings (this result was
previously known as Mordell’s Conjecture). There is no algorithm in general to find the
solution set.
Goals for the class. We will prove Mordell-Weil Theorem and Faltings’ Theorem (but
not Faltings’ original proof).
Key Tools.
(1) Diophantine Approximation: how closely can a rational quantity approximate an
irrational quantity? We will learn about results of Roth, Baker and others.
(2) Height Functions: measuring complexity of objects.
2. Diophantine Approximation
Let us say a few words about Diophantine Approximation. First, since $\mathbb{Q}$ is dense in $\mathbb{R}$, it is true that
\[ \inf_{a/b \in \mathbb{Q}} \left| \frac{a}{b} - \sqrt{2} \right| = 0. \]
However, we are interested in approximating $\sqrt{2}$ with rationals whose denominators are not so large (relatively speaking). For example, here are two facts that are easy to prove:
(1) There are infinitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt{2} \right| < \frac{1}{b^2}$.
(2) There are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt{2} \right| < \frac{1}{b^3}$.
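The two facts can be explored numerically. The sketch below is an illustration only: it assumes the standard recurrence $(a, b) \mapsto (a + 2b, a + b)$ for the continued-fraction convergents of $\sqrt{2}$ (not stated in the notes), and it tests the inequalities exactly by comparing squares of rationals, so no floating point is involved. The names `within` and `sqrt2_convergents` are hypothetical.

```python
from fractions import Fraction


def within(a, b, exponent):
    """Exactly test |a/b - sqrt(2)| < 1 / b**exponent for positive integers a, b."""
    eps = Fraction(1, b ** exponent)
    lo = Fraction(a, b) - eps
    hi = Fraction(a, b) + eps
    # The inequality is equivalent to lo < sqrt(2) < hi; hi > 0 since a, b > 0.
    return (lo < 0 or lo * lo < 2) and hi * hi > 2


def sqrt2_convergents(count):
    """First `count` convergents a/b of sqrt(2), starting from 1/1 (assumed recurrence)."""
    a, b = 1, 1
    out = []
    for _ in range(count):
        out.append((a, b))
        a, b = a + 2 * b, a + b
    return out


convergents = sqrt2_convergents(12)
quadratic_ok = [c for c in convergents if within(c[0], c[1], 2)]
cubic_ok = [c for c in convergents if within(c[0], c[1], 3)]
```

Every convergent passes the $1/b^2$ test, while among these twelve only $1/1$ and $3/2$ pass the $1/b^3$ test, in line with facts (1) and (2).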
In fact, let’s prove a more general result that implies the second statement.
Theorem. (Liouville) Let $\epsilon > 0$. If $x \in \mathbb{R}$ satisfies a degree $n$ polynomial with coefficients in $\mathbb{Q}$, then
\[ \left| \frac{a}{b} - x \right| < \frac{1}{b^{n+\epsilon}} \]
has only finitely many solutions for $\frac{a}{b}$ with $\gcd(a, b) = 1$.
Remark. The following proof was communicated to me by Ming Hao Quek.
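Proof. Consider the set $S$ defined by
\[ S := \left\{ \frac{a}{b} \in \mathbb{Q} : \left| \frac{a}{b} - x \right| < \frac{1}{b^{n+\epsilon}} \right\} \]
Assume, to the contrary, that $S$ is infinite. Say $x$ satisfies a monic polynomial $P(X) \in \mathbb{Q}[X]$ of degree $n$. Let $P(X) = \prod_{i=1}^{n} (X - x_i)$ where $x_i \in \mathbb{C}$ and $x_1 = x$. Given $\frac{a}{b} \in S$, $P\left(\frac{a}{b}\right)$ is a rational number with denominator at most $D b^n$ for some fixed $D > 0$. Since $P(X)$ has only finitely many roots, and $S$ is infinite, the subset
\[ S' := \left\{ \frac{a}{b} \in S : P\left(\frac{a}{b}\right) \neq 0 \right\} \]
must be infinite as well. For all $\frac{a}{b} \in S'$, we have
\[ \left| P\left(\frac{a}{b}\right) \right| \geq \frac{1}{D b^n} \]
On the other hand, for all $\frac{a}{b} \in S'$,
\begin{align*}
\left| P\left(\frac{a}{b}\right) \right| &= \left| \frac{a}{b} - x \right| \left| \frac{a}{b} - x_2 \right| \cdots \left| \frac{a}{b} - x_n \right| \\
&\leq \left| \frac{a}{b} - x \right| \prod_{i=2}^{n} \Big( \underbrace{\left| \frac{a}{b} - x \right|}_{\leq 1} + \underbrace{|x - x_i|}_{\leq \delta} \Big) \leq \left| \frac{a}{b} - x \right| (1 + \delta)^{n-1} = \Delta \left| \frac{a}{b} - x \right| \leq \frac{\Delta}{b^{n+\epsilon}}
\end{align*}
where $\delta$ is any upper bound for the differences $|x - x_i|$ of the roots of $P(X)$, and $\Delta := (1 + \delta)^{n-1}$ only depends on $x$. Combining the upper and lower bounds, we get
\[ \frac{1}{D b^n} \leq \frac{\Delta}{b^{n+\epsilon}} \;\Rightarrow\; b^{n+\epsilon} \leq D \Delta b^n \]
for all $\frac{a}{b} \in S'$. Since $S'$ is infinite, we can choose $\frac{a}{b} \in S'$ where $b$ is arbitrarily large, and this leads to a contradiction.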
Using the above theorem, we easily see that there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt{2} \right| < \frac{1}{b^3}$. Similarly, for any $\epsilon > 0$, there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt[3]{2} \right| < \frac{1}{b^{3+\epsilon}}$.
Fact. It is also true, but much harder to prove, that there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt[3]{2} \right| < \frac{1}{b^{2.5}}$. This would instantly follow from Roth's celebrated theorem.
Theorem. (Roth) Let $\epsilon > 0$. If $x$ is an irrational algebraic number, then there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying
\[ \left| \frac{a}{b} - x \right| < \frac{1}{b^{2+\epsilon}} \]
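Let $K$ be a number field, and fix an algebraic closure $\overline{K}$. The Galois group $G_K = \mathrm{Gal}(\overline{K}/K)$ acts on $\mathbb{A}^n(\overline{K})$ by $\sigma(P) = (\sigma(x_1), \ldots, \sigma(x_n))$ for $P = (x_1, \ldots, x_n)$.
Then $\mathbb{A}^n(\overline{K})^{G_K} = $ fixed points of $\mathbb{A}^n(\overline{K})$ $= \mathbb{A}^n(K)$. Likewise, $G_K$ acts on $\mathbb{P}^n(\overline{K})$ by $\sigma(P) = [\sigma(x_0) : \cdots : \sigma(x_n)]$ for $P = [x_0 : \cdots : x_n]$.
Proposition. $\mathbb{P}^n(\overline{K})^{G_K} = \mathbb{P}^n(K)$.
Proof. This is an application of Hilbert's Theorem 90.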
Definition. We say that $f(x_0, x_1, \ldots, x_n)$ is homogeneous of degree $d$ if
\[ f = \sum_{\substack{I = (i_0, \ldots, i_n) \\ i_0 + \cdots + i_n = d}} a_I \, x_0^{i_0} \cdots x_n^{i_n} \]
or equivalently, $f(\lambda x_0, \ldots, \lambda x_n) = \lambda^d f(x_0, \ldots, x_n)$ in the ring $K[\lambda, x_0, \ldots, x_n]$.
For $P \in \mathbb{P}^n(K)$, $f(P)$ is not well-defined but $\{P : f(P) = 0\}$ is well-defined.
Definition. A rational map $f : \mathbb{P}^n \to \mathbb{P}^m$ is $f = [f_0, \ldots, f_m]$ with $f_0, \ldots, f_m \in K[x_0, \ldots, x_n]$ homogeneous of degree $d$. To be more pedantic, we could have written $f : \mathbb{P}^n(\overline{K}) \to \mathbb{P}^m(\overline{K})$.
Then $f(P)$ is (almost) well-defined: If $P = [a_0, \ldots, a_n]$, then $f(P) = [f_0(a_0, \ldots, a_n), \ldots, f_m(a_0, \ldots, a_n)]$ is well-defined when $f_i(a_0, \ldots, a_n) \neq 0$ for some $i$.
Since $K[x_0, \ldots, x_n]$ is a UFD, we can assume that $f_0, f_1, \ldots, f_m$ have no common irreducible factor, that is, $\gcd(f_0, \ldots, f_m) = 1$, in which case we can define the degree of $f$ as $d$.
For a rational map $f : \mathbb{P}^n \to \mathbb{P}^m$, its indeterminacy locus is defined by:
\[ I_f = \{P \in \mathbb{P}^n : f_0(P) = \cdots = f_m(P) = 0\} \]
So $f$ gives a function $f : \mathbb{P}^n(\overline{K}) \setminus I_f \to \mathbb{P}^m(\overline{K})$.
Definition. A rational map $f : \mathbb{P}^n \to \mathbb{P}^m$ is called a morphism if $I_f = \emptyset$.
Example. Consider the rational map $f : \mathbb{P}^2 \to \mathbb{P}^2$ given by $[x, y, z] \mapsto [x^2, xy, z^2]$. Then $I_f = \{[0, 1, 0]\}$ and so $f$ is not a morphism.
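To make the example concrete, here is a small sketch that evaluates $[x, y, z] \mapsto [x^2, xy, z^2]$ on integer representatives of points of $\mathbb{P}^2(\mathbb{Q})$ and reports when a point lies in the indeterminacy locus. The helpers `normalize` and `apply_map` are hypothetical names introduced for illustration.

```python
from functools import reduce
from math import gcd


def normalize(coords):
    """Scale integer projective coordinates (not all zero) to be coprime,
    with the first non-zero coordinate positive."""
    g = reduce(gcd, coords)
    coords = [c // g for c in coords]
    first = next(c for c in coords if c != 0)
    if first < 0:
        coords = [-c for c in coords]
    return tuple(coords)


def apply_map(polys, point):
    """Evaluate f = [f_0, ..., f_m] at a point; return None on the indeterminacy locus."""
    values = [p(*point) for p in polys]
    if all(v == 0 for v in values):
        return None
    return normalize(values)


# The map from the example: [x, y, z] -> [x^2, xy, z^2].
f = [
    lambda x, y, z: x * x,
    lambda x, y, z: x * y,
    lambda x, y, z: z * z,
]
```

Because the components are homogeneous of the same degree, different representatives of the same point give the same image: `apply_map(f, (1, 2, 3))` and `apply_map(f, (2, 4, 6))` agree, while `apply_map(f, (0, 1, 0))` returns `None`.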
Nullstellensatz. Given $F_1, F_2, \ldots, F_N \in K[x_0, \ldots, x_n]$ homogeneous polynomials (not necessarily of the same degree), let
\[ V(F) := V(F_1, \ldots, F_N) := \{P \in \mathbb{P}^n(\overline{K}) : F_1(P) = \cdots = F_N(P) = 0\} \]
Suppose $V(F) = \emptyset$. Then Nullstellensatz says that the radical $\sqrt{\langle F_1, \ldots, F_N \rangle} = (x_0, \ldots, x_n)$. In other words, for each $0 \leq i \leq n$, there exist a $k_i \in \mathbb{N}$ and polynomials $G_{ij}$ such that $x_i^{k_i} = \sum_{j=1}^{N} G_{ij} F_j$.
4. Height Functions
Moral: “height(object) = complexity”.
Given $P = [x_0, \ldots, x_n] \in \mathbb{P}^n(\mathbb{Q})$, assume WLOG $x_i \in \mathbb{Z}$. Since $\mathbb{Z}$ is a UFD, we can assume WLOG $\gcd(x_0, \ldots, x_n) = 1$, in which case we say the coordinates are normalized (normalization in this case is unique up to $\pm 1$).
To describe $P$ takes roughly $\sum_{i=0}^{n} (\log_2 |x_i| + 1)$ bits.
Definition. Given $P = \underbrace{[x_0, \ldots, x_n]}_{\text{normalized}} \in \mathbb{P}^n(\mathbb{Q})$,
(1) The (logarithmic) height of $P$ is $h(P) = \log \max_{0 \leq i \leq n} |x_i|$.
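The definition translates directly into a few lines of code. This is a minimal sketch for points given by integer coordinates; the name `log_height` is illustrative, and the normalization step divides out the gcd so that any integer representative may be passed in.

```python
import math
from functools import reduce
from math import gcd


def log_height(coords):
    """Logarithmic height of [x_0 : ... : x_n] in P^n(Q), given integer coordinates."""
    g = reduce(gcd, coords)
    if g == 0:
        raise ValueError("all coordinates are zero: not a point of projective space")
    # After dividing by the gcd the coordinates are normalized (coprime integers).
    return math.log(max(abs(c) // g for c in coords))
```

For example, `log_height([2, 4, 6])` equals `math.log(3)` because $[2 : 4 : 6] = [1 : 2 : 3]$, and `log_height([1, 0])` is `0.0`.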
5. Absolute Values
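Recall that
\[ M_{\mathbb{Q}} := \{\text{(normalized) absolute values on } \mathbb{Q}\} = \{|\cdot|_\infty\} \cup \{|\cdot|_p \text{ where } p \in \mathbb{Z} \text{ is prime}\} \]
For a given $a \in \mathbb{Q}$, we have $|a|_\infty = \max(a, -a)$, and $|a|_p = p^{-\mathrm{ord}_p(a)}$ where $\mathrm{ord}_p(a) = r$ is defined to be the unique integer such that $a = p^r \frac{b}{c}$ with $p \nmid b$ and $p \nmid c$.
Product Formula. For any $a \in \mathbb{Q} \setminus \{0\}$, we have
\[ \prod_{v \in M_{\mathbb{Q}}} |a|_v = 1 \]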
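The product formula can be checked exactly for any given rational number. The sketch below uses exact `Fraction` arithmetic and naive trial division; the function names (`ord_p`, `abs_p`, `prime_factors`, `product_of_abs`) are illustrative.

```python
from fractions import Fraction


def ord_p(a, p):
    """The exponent r with a = p^r * b/c and p dividing neither b nor c (a != 0)."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("ord_p(0) is not defined here")
    num, den, r = a.numerator, a.denominator, 0
    while num % p == 0:
        num //= p
        r += 1
    while den % p == 0:
        den //= p
        r -= 1
    return r


def abs_p(a, p):
    """The normalized p-adic absolute value |a|_p = p^(-ord_p(a))."""
    return Fraction(p) ** (-ord_p(a, p))


def prime_factors(n):
    """Set of primes dividing the positive integer n (trial division)."""
    out, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            out.add(p)
            n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out


def product_of_abs(a):
    """Product of |a|_v over all places v of Q; only finitely many factors differ from 1."""
    a = Fraction(a)
    primes = prime_factors(abs(a.numerator)) | prime_factors(a.denominator)
    prod = abs(a)  # the archimedean factor |a|_infinity
    for p in primes:
        prod *= abs_p(a, p)
    return prod
```

With $a = -360/49 = -2^3 \cdot 3^2 \cdot 5 \cdot 7^{-2}$, the local factors are $|a|_2 = 1/8$, $|a|_3 = 1/9$, $|a|_5 = 1/5$, $|a|_7 = 49$ and $|a|_\infty = 360/49$, whose product is 1.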
Recall that $R_K$ denotes the ring of integers in a number field $K$. We can interpret the ring of integers in terms of places:
\[ R_K = \{a \in K : |a|_v \leq 1 \text{ for all } v \in M_K^0\} = \{a \in K : |a|_v \leq 1 \text{ for all } v \notin M_K^\infty\} \]
More generally, let $S$ be a finite set such that $M_K^\infty \subseteq S \subset M_K$.
Definition. The ring of $S$-integers is defined by
\[ R_S := \{a \in K : |a|_v \leq 1 \text{ for all } v \notin S\} \]
In other words, we allow the finite set $S$ of primes to occur in the denominator.
Theorem. (Dirichlet's Unit Theorem).
\[ R_S^* \cong \mu_K \times \mathbb{Z}^{\#S - 1} = \mu_K \times \mathbb{Z}^{r_1 + r_2 + \#(S \cap M_K^0) - 1} \]
where $\mu_K$ denotes the roots of unity in $K$, and $r_1, r_2$ denote the number of real and complex (counted in pairs, each with its complex conjugate) embeddings of $K$, i.e. $[K : \mathbb{Q}] = r_1 + 2 r_2$.
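where $\alpha = \alpha_1, \alpha_2, \ldots, \alpha_d$ are the conjugates of $\alpha$. In the analysis that follows below, we will use the notation
\[ N_v = \begin{cases} N & \text{if } v \text{ is archimedean} \\ 1 & \text{if } v \text{ is non-archimedean} \end{cases} \]
for any positive integer $N$. For $v \in M_K$,
\begin{align*}
|\sigma_j(\alpha_1, \ldots, \alpha_d)|_v &= \Big| \sum_{1 \leq i_1 < \cdots < i_j \leq d} \alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_j} \Big|_v \\
&\leq \binom{d}{j}_v \cdot \max_{1 \leq i_1 < \cdots < i_j \leq d} |\alpha_{i_1} \cdots \alpha_{i_j}|_v \\
&\leq 2_v^{d} \cdot \max_{1 \leq i \leq d} |\alpha_i|_v^{j} \\
&\leq 2_v^{d} \cdot \prod_{i=1}^{d} \max(1, |\alpha_i|_v)^d
\end{align*}
Applying the usual procedure of raising the expression to the $n_v$-th power, multiplying over all $v \in M_K$ and taking $[K : \mathbb{Q}]$-th roots, one obtains
\[ H([\sigma_0(\alpha), \ldots, \sigma_d(\alpha)]) \leq \Big( \prod_{v \in M_K} 2_v^{d n_v} \Big)^{1/[K:\mathbb{Q}]} \prod_{i=1}^{d} H(\alpha_i)^d = 2^d H(\alpha)^{d^2} \]
where we applied the product formula at the end to the term $2^d$. We have a map
\[ \{\alpha \in \overline{\mathbb{Q}} : H(\alpha) \leq B,\ D(\alpha) = d\} \xrightarrow{\text{take min poly}} \{X^d + a_1 X^{d-1} + \cdots + a_d : a_i \in \mathbb{Q},\ H([1, a_1, \ldots, a_d]) \leq \underbrace{2^d B^{d^2}}_{= C}\} \]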
By enlarging the field, if necessary, we can assume that there is a number field $K$ such that $x_i \in K$ and $a_{i, \vec{j}} \in K$ for all $i$ and $\vec{j} = (j_0, \ldots, j_n)$.
Let $v \in M_K$ be a place. Then
\begin{align*}
|f_i(P)|_v &= \Big| \sum_{j_0 + \cdots + j_n = d} a_{i, \vec{j}} \, x_0^{j_0} \cdots x_n^{j_n} \Big|_v \\
&\leq \binom{n+d}{d}_v \cdot \max_{\vec{j}} |a_{i, \vec{j}}|_v \cdot \max_{\vec{j}} |x_0^{j_0} \cdots x_n^{j_n}|_v \\
&\leq 2_v^{n+d} \cdot |f_i|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^d
\end{align*}
In the last line, we simply defined $|f_i|_v$ to be $\max_{\vec{j}} |a_{i, \vec{j}}|_v$, i.e. it measures the largest absolute value among the coefficients of $f_i$. Since $f = [f_0, \ldots, f_m]$, we can define $|f|_v = \max_{0 \leq i \leq m} |f_i|_v$. We get
\[ |f_i(P)|_v \leq 2_v^{n+d} \cdot |f|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^d \]
Now we do the usual procedure: we raise the inequality to the $n_v$-th power (local powers), multiply over all $v \in M_K$, and then take the $[K : \mathbb{Q}]$-th root to get the absolute heights. Finally, we will take the log of both sides to obtain the absolute logarithmic heights. It is worth mentioning that for any integer $N \in \mathbb{N}$, we have
\[ \prod_{v \in M_K} N_v^{n_v} = \prod_{v \in M_K^\infty} N_v^{n_v} = \mathrm{Norm}_{K/\mathbb{Q}}(N) = N^{[K:\mathbb{Q}]}. \]
After taking $[K : \mathbb{Q}]$-th roots, $\big( \prod_{v \in M_K} N_v^{n_v} \big)^{1/[K:\mathbb{Q}]} = N$. Combining these observations, the previous displayed equation translates to
\[ H(f(P)) \leq 2^{n+d} H(f) H(P)^d \]
where $H(f)$ stands for $\big( \prod_{v \in M_K} |f|_v^{n_v} \big)^{1/[K:\mathbb{Q}]}$. There is a way to interpret $H(f)$ as an actual height. Indeed, arrange all the coefficients of all the component functions $f_i$ of $f$ into some big vector and view the large string as an element of some big projective space. Then $H(f)$ is precisely the absolute height of this vector. Similarly, we define $h(f) := \log H(f)$.
Taking logs, we arrive at
\[ h(f(P)) \leq d \cdot h(P) + h(f) + (n + d) \log(2) \]
We can take $c_1(f) := h(f) + (n + d) \log(2)$, and this completes the proof of part (a).
(b) The lower bound is more subtle, and requires Nullstellensatz. We are assuming that $f$ is a morphism, i.e. $I_f = \emptyset$. This means $f_0, \ldots, f_m$ have no common roots except for $(0, \ldots, 0)$ which is not in the projective space. The projective version of Nullstellensatz says that for each $0 \leq i \leq n$, there is an exponent $e_i$ such that $x_i^{e_i} \in \langle f_0, \ldots, f_m \rangle$. By taking the largest of the $e_i$'s, there is a single exponent $e$ (independent of $i$) such that $x_i^{e} \in \langle f_0, \ldots, f_m \rangle$.
For each $i$, there are polynomials $g_{ij}(x_0, \ldots, x_n)$ in $\overline{K}[x_0, \ldots, x_n]$ such that
\[ x_i^{e} = \sum_{j=0}^{m} g_{ij}(x_0, \ldots, x_n) f_j(x_0, \ldots, x_n) \]
By enlarging the field again, we may as well assume that $g_{ij}(x_0, \ldots, x_n) \in K[x_0, \ldots, x_n]$. Without loss of generality, we can assume that the $g_{ij}$ are homogeneous of degree $e - d$. Thus,
\begin{align*}
|x_i|_v^{e} &\leq (m+1)_v \cdot \max_{0 \leq j \leq m} |g_{ij}(\vec{x})|_v \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v \\
&\leq (m+1)_v \cdot \max_{0 \leq j \leq m} \Big( 2_v^{n+e-d} \cdot |g_{ij}|_v \cdot \max_{k} |x_k|_v^{e-d} \Big) \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v
\end{align*}
Here we applied the result of part (a) to the functions $g_{ij}$. Now, taking a maximum over all $0 \leq i \leq n$, we get
\[ \max_{0 \leq i \leq n} |x_i|_v^{e} \leq (m+1)_v \, 2_v^{n+e-d} \cdot \max_{i,j} |g_{ij}|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^{e-d} \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v \]
Now we do the usual procedure (see part (a) for details) to get
\[ H(P)^d \leq (m+1) \, 2^{n+e-d} H(g) H(f(P)) \]
where $H(g)$ stands for the height of a point obtained by stringing together all the coefficients of all the $g_{ij}$. After taking logs, and rearranging the inequality, we have
\[ h(f(P)) \geq d \cdot h(P) - h(g) - \log(m+1) - (n + e - d) \log(2) \]
We can take $c_2(f) := h(g) + \log(m+1) + (n + e - d) \log(2)$, and this completes the proof of part (b).
8. Application: Northcott’s Theorem for preperiodic points
Suppose $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^n$ is a rational map (so here $m = n$). We can iterate $f$ by composing $f$ with itself. So we consider $f^2 := f \circ f$, $f^3 := f \circ f \circ f$, and in general $f^r := \underbrace{f \circ f \circ \cdots \circ f}_{r \text{ times}}$.
Given a point $P \in \mathbb{P}^n$, its $f$-orbit is defined to be the set $O_f(P) := \{f^r(P) : r \geq 0\}$.
Definition. A point $P$ is called preperiodic if $O_f(P)$ is finite. A point $P$ is called periodic if $f^m(P) = P$ for some $m \geq 1$.
It is clear that any periodic point is preperiodic. Similarly, some iterate of a preperiodic point must be periodic. In $\mathbb{P}^n(\mathbb{C})$, the set $\mathrm{PrePer}(f)$ is big. Indeed, for any pair of integers $k > j$, the polynomial equation $f^k(P) = f^j(P)$ will always have solutions in $\mathbb{C}$, leading to an abundance of preperiodic points. Of course, the same argument applies to any algebraically closed field.
Theorem. (Northcott) Suppose $f : \mathbb{P}^n \to \mathbb{P}^n$ is a morphism defined over $\overline{\mathbb{Q}}$ with degree $d \geq 2$. The set
\[ \{P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : P \text{ is preperiodic for } f\} \]
is a set of bounded height.
Since there are only finitely many points with bounded height and bounded degree, we immediately deduce the following corollary.
Corollary. Suppose $f : \mathbb{P}^n \to \mathbb{P}^n$ is a morphism defined over $\overline{\mathbb{Q}}$ with degree $d \geq 2$. For any number field $K$, the set
\[ \{P \in \mathbb{P}^n(K) : P \text{ is preperiodic for } f\} \]
is finite.
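The finiteness input behind the corollary is easy to see in the simplest case $K = \mathbb{Q}$, $n = 1$. The following sketch enumerates the points of $\mathbb{P}^1(\mathbb{Q})$ whose multiplicative height $\max(|a|, |b|)$ is at most a bound `B`; the function name and the choice of normalization ($b > 0$, or the point $[1 : 0]$) are assumptions made for this illustration.

```python
from math import gcd


def points_of_height_at_most(B):
    """Points [a : b] of P^1(Q) with coprime integer coordinates and max(|a|, |b|) <= B.
    Each point is listed once, normalized so that b > 0, plus the point [1 : 0]."""
    points = [(1, 0)]  # the point at infinity
    for b in range(1, B + 1):
        for a in range(-B, B + 1):
            if gcd(a, b) == 1:
                points.append((a, b))
    return points
```

The list is finite for every `B`; for instance there are 4 points of height at most 1, namely $\infty, -1, 0, 1$.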
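Proof of Northcott's Theorem. We are interested in the set of periodic and preperiodic points for $f$.
\[ \mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}})) := \{P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : f^i(P) = P \text{ for some } i \geq 1\} \]
\[ \mathrm{PrePer}(f, \mathbb{P}^n(\overline{\mathbb{Q}})) := \{P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : f^i(P) = f^j(P) \text{ for some } i > j \geq 0\} \]
We first show that $\mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}}))$ is a set of bounded height. In a previous section, we have proved that for a morphism $f$ of degree $d$,
\[ h(f(Q)) \geq d \cdot h(Q) - C \]
holds for all points $Q$. Here $C = C(f)$ is a constant that only depends on $f$. Suppose that $P \in \mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}}))$, i.e. $f^k(P) = P$ for some $k \geq 1$. Then
\begin{align*}
h(P) = h(f^k(P)) &= h(f(f^{k-1}(P))) \\
&\geq d \cdot h(f^{k-1}(P)) - C \\
&\geq d (d \cdot h(f^{k-2}(P)) - C) - C \\
&= d^2 \cdot h(f^{k-2}(P)) - (d + 1) C \\
&\geq \cdots \\
&\geq d^k h(P) - (d^{k-1} + d^{k-2} + \cdots + d + 1) C \\
&= d^k h(P) - \left( \frac{d^k - 1}{d - 1} \right) C
\end{align*}
We have shown that
\[ h(P) \geq d^k h(P) - \left( \frac{d^k - 1}{d - 1} \right) C \]
Rearranging this inequality, one obtains
\[ \left( \frac{d^k - 1}{d - 1} \right) C \geq (d^k - 1) h(P) \;\Rightarrow\; h(P) \leq \frac{C}{d - 1} \]
Therefore,
\[ \mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}})) \subseteq \left\{ P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : h(P) \leq \frac{C}{d - 1} \right\} \]
which proves Northcott's theorem in the periodic case.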
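Now suppose that $P \in \mathrm{PrePer}(f, \mathbb{P}^n(\overline{\mathbb{Q}}))$, i.e. $f^{k+i}(P) = f^i(P)$ for some $k \geq 1$, $i \geq 0$. Note that we can take $i = 0$ if and only if $P$ is periodic for $f$.
Since $f^k(f^i(P)) = f^i(P)$, it follows that $f^i(P) \in \mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}}))$. By the previous computation,
\[ h(f^i(P)) \leq \frac{C}{d - 1} \]
Using the same computation as in the periodic case, we have
\[ h(f^i(P)) \geq d^i \cdot h(P) - \left( \frac{d^i - 1}{d - 1} \right) C \]
Combining the previous two inequalities, we get
\[ d^i \cdot h(P) - \left( \frac{d^i - 1}{d - 1} \right) C \leq \frac{C}{d - 1} \]
Rearranging this inequality, we have
\[ d^i \cdot h(P) \leq \left( \frac{1}{d - 1} + \frac{d^i - 1}{d - 1} \right) C = \left( \frac{d^i}{d - 1} \right) C \]
Cancelling out $d^i$ from both sides, we arrive at
\[ h(P) \leq \frac{C}{d - 1} \]
which is amazingly the exact same bound as in the periodic case. We conclude that
\[ \mathrm{PrePer}(f, \mathbb{P}^n(\overline{\mathbb{Q}})) \subseteq \left\{ P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : h(P) \leq \frac{C}{d - 1} \right\} \]
This finishes the proof of Northcott's theorem.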
As a corollary of Northcott's theorem, $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ must be finite for any number field $K$. So the natural question is, how big can the set $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ be? It is clear that the size of $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ can tend to infinity as $[K : \mathbb{Q}] \to \infty$, or $n \to \infty$ or $\deg(f) \to \infty$. The following conjecture predicts that these 3 factors together govern how big $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ can get.
Uniform Boundedness Conjecture.
\[ \# \mathrm{PrePer}(f, \mathbb{P}^n(K)) \leq C([K : \mathbb{Q}], \deg(f), n) \]
where the constant $C$ only depends on $[K : \mathbb{Q}]$, $\deg(f)$ and $n$.
The simplest example is the case of a quadratic polynomial, i.e. $f_c(x) = x^2 + c$ where $c \in \mathbb{Q}$. By Northcott's theorem, the set of periodic points $\mathrm{Per}(f_c, \mathbb{Q})$ is finite. For $c = 0$, the point $x = 0$ has period 1. For $c = -1$, $x = 0$ is a point of period 2: indeed, $f_{-1}(x) = x^2 - 1$ and $f_{-1}(0) = -1$ and $f_{-1}(-1) = 0$. There is a specific value $c_0$ such that $f_{c_0}$ has a point of period 3. However, Morton showed that there is no value of $c$ such that $f_c$ has a point of period 4. The same conclusion holds for points of period 5. Conditional on the Birch-Swinnerton-Dyer conjecture, Stoll showed that there is no value of $c \in \mathbb{Q}$ such that $f_c$ has a point of period 6. Nothing is currently known for period 7 or higher.
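The quadratic family is easy to experiment with. The sketch below searches for rational preperiodic points of $f_c(x) = x^2 + c$ by iterating exactly with `Fraction` and stopping when the orbit repeats. It is a heuristic search rather than a proof: the parameters `max_steps` and `height_cap`, and the function names, are assumptions chosen for illustration, and an orbit is abandoned once its height exceeds the cap. The value $c = -29/16$ used below is one parameter with a rational 3-cycle; the code itself verifies the cycle.

```python
from fractions import Fraction


def orbit_type(c, x, max_steps=50, height_cap=10**6):
    """Iterate x -> x^2 + c exactly. Return (tail_length, period) if the orbit
    repeats within the limits, otherwise None (heuristically: not preperiodic)."""
    seen = {}
    for step in range(max_steps):
        if x in seen:
            return seen[x], step - seen[x]
        if max(abs(x.numerator), x.denominator) > height_cap:
            return None
        seen[x] = step
        x = x * x + c
    return None


def rational_preperiodic(c, B):
    """Rational numbers a/b with |a|, b <= B whose orbit under x^2 + c repeats."""
    c = Fraction(c)
    candidates = {Fraction(a, b) for b in range(1, B + 1) for a in range(-B, B + 1)}
    return sorted(x for x in candidates if orbit_type(c, x) is not None)
```

For $c = -1$ the search with `B = 10` finds exactly $-1, 0, 1$, matching the 2-cycle $0 \to -1 \to 0$ and its preimages $\pm 1$. For $c = -29/16$, `orbit_type(Fraction(-29, 16), Fraction(-1, 4))` reports a cycle of length 3, namely $-1/4 \to -7/4 \to 5/4 \to -1/4$.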
9. Heights on Varieties
Recall that an algebraic set $V \subseteq \mathbb{P}^n$ is defined by a collection of polynomial equations $f_1 = \cdots = f_r = 0$ where each $f_i$ is a homogeneous polynomial. In other words, $V = \{P \in \mathbb{P}^n : f_1(P) = \cdots = f_r(P) = 0\}$. Given an algebraic set $V$, we can consider $I(V)$ which is the ideal generated by the set of homogeneous polynomials vanishing on $V$. In symbols, $I(V)$ is generated by $\{\text{homogeneous } f \text{ such that } f(P) = 0 \text{ for all } P \in V\}$. We say that an algebraic set $V$ is a variety if $I(V)$ is a prime ideal. Geometrically, a variety is an algebraic set which cannot be written as a union of two proper algebraic subsets. What we call "variety" here is sometimes called an "irreducible variety" in some textbooks.
Given a variety V , an irreducible divisor of V is a subvariety W ⊆ V of codimension 1.
Definition. The group of divisors of $V$ is
\[ \mathrm{Div}(V) = \Big\{ \sum_{W \text{ irred. divisor}} n_W W : n_W \in \mathbb{Z}, \text{ all but finitely many of the } n_W \text{ are zero} \Big\} \]
More formally, $\mathrm{Div}(V)$ is the free abelian group generated by the symbols $W$ where $W$ ranges over all irreducible divisors of $V$.
Example. Suppose that $V = C$ is a curve. Then an irreducible divisor is just a point on the curve. So, any divisor $D$ on $C$ is of the form $\sum_{P \in C} n_P P$. We can use the concept of a divisor to keep track of zeros and poles of functions on curves.
Example. Let $V = \mathbb{P}^n$. It turns out that if $W \subseteq V$ is an irreducible divisor, then $W = \{f = 0\}$ for some homogeneous irreducible polynomial $f$ in $k[x_0, \ldots, x_n]$.
Given a variety V over a field k, the field k(V ) consisting of all rational functions on V is
called the function field of V . Notice that k(V ) is the fraction field of the coordinate ring
k[V ] = k[x0 , ..., xn ]/I(V ) which is an integral domain precisely because I(V ) is a prime ideal
(by definition of a variety). Given $f \in k(V)$, we can view $f : V \dashrightarrow \mathbb{P}^1$. It is customary to associate $0$ with $[0 : 1]$ and $\infty$ with $[1 : 0]$ on $\mathbb{P}^1$. This correspondence is explained via $[a, b] \longleftrightarrow a/b$. Let's assume from now on that $V$ is smooth. Given any irreducible divisor $W \subset V$, it is possible to define an integer $\mathrm{ord}_W(f)$ which is the order of vanishing of $f$ along $W$. This quantity $\mathrm{ord}_W(f)$ will be negative if $f$ has a pole along $W$. It turns out that
\[ \mathrm{ord}_W : k(V)^* \to \mathbb{Z} \]
is a valuation. For a given $f \in k(V)^*$, we define
\[ \mathrm{div}(f) := \sum_{W \text{ irred. divisor}} \mathrm{ord}_W(f) \, W \]
Note that $\mathrm{ord}_W(f) > 0$ if and only if $W \subseteq f^{-1}([0 : 1])$ and $\mathrm{ord}_W(f) < 0$ if and only if $W \subseteq f^{-1}([1 : 0])$.
The reason that all of this works rests on the fact that the local ring OV,W is a DVR
(because it is a dimension 1 regular local ring).
Definition. We say that D1 ∼ D2 if D2 − D1 = div(f ) for some f ∈ k(V )∗ . This is
indeed an equivalence relation.
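Definition. The Picard group of $V$ is the quotient group
\[ \mathrm{Pic}(V) = \frac{\mathrm{Div}(V)}{\sim} = \frac{\mathrm{Div}(V)}{\mathrm{div}(k(V)^*)} \]
Example. Let's prove that $\mathrm{Pic}(\mathbb{P}^n) = \mathbb{Z}$. Suppose $D \in \mathrm{Div}(\mathbb{P}^n)$. Then
\[ D = \sum_{W} n_W W = \sum_{W} n_W \{f_W = 0\} \]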
This looks more promising. However, it is still not clear what happens when we let d → ∞.
Just because the error term is bounded, it does not mean that it converges as d → ∞.
Theorem. Let $V$ be a non-singular variety, and $f : V \to V$ a morphism. Assume that $D \in \mathrm{Div}(V)$ is such that $f^*(D) \sim \lambda D$ for some $\lambda > 1$. Then
(a) The limit
\[ \hat{h}_{f,D}(P) := \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(P)) \]
exists.
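\[ h_D(f(P)) = h_{f^* D}(P) + O(1) \]
Since $f^* D \sim \lambda D$,
\begin{align*}
h_D(f(P)) &= h_{\lambda D}(P) + O(1) \\
&= \lambda \cdot h_D(P) + O(1)
\end{align*}
by linearity in the last step. We will use the identity $h_D(f(P)) = \lambda \cdot h_D(P) + O(1)$ to show that the sequence $\frac{1}{\lambda^n} h_D(f^n(P))$ is Cauchy, hence converges. Replacing $P$ by $f^{i-1}(P)$, we have
\[ h_D(f^i(P)) - \lambda h_D(f^{i-1}(P)) = O(1) \]
for each $i$. Thus, for $n > m$,
\begin{align*}
\left| \frac{1}{\lambda^n} h_D(f^n(P)) - \frac{1}{\lambda^m} h_D(f^m(P)) \right|
&= \left| \sum_{i=m+1}^{n} \left( \frac{1}{\lambda^i} h_D(f^i(P)) - \frac{1}{\lambda^{i-1}} h_D(f^{i-1}(P)) \right) \right| \\
&= \left| \sum_{i=m+1}^{n} \frac{1}{\lambda^i} \left( h_D(f^i(P)) - \lambda \cdot h_D(f^{i-1}(P)) \right) \right| \\
&\leq \sum_{i=m+1}^{n} \frac{1}{\lambda^i} O(1) \leq \sum_{i=m+1}^{\infty} \frac{1}{\lambda^i} O(1) = \frac{1/\lambda^{m+1}}{1 - 1/\lambda} O(1) \\
&= \frac{1}{\lambda^m} \cdot \frac{1}{\lambda - 1} \cdot O(1) \longrightarrow 0 \quad \text{as } n \geq m \to \infty
\end{align*}
Here the first equality is the telescoping sum of the consecutive differences for $i = m+1, \ldots, n$.
(b) Taking $m = 0$ in the estimate above gives
\[ \left| \frac{1}{\lambda^n} h_D(f^n(P)) - h_D(P) \right| \leq \frac{1}{\lambda - 1} \cdot O(1) \]
Letting $n \to \infty$, we get $\left| \hat{h}_{f,D}(P) - h_D(P) \right| \leq \frac{1}{\lambda - 1} \cdot O(1)$, so in particular, $\hat{h}_{f,D}(P) = h_D(P) + O(1)$.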
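(c) By definition,
\begin{align*}
\hat{h}_{f,D}(f(P)) &= \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(f(P))) \\
&= \lim_{n \to \infty} \frac{\lambda}{\lambda^{n+1}} h_D(f^{n+1}(P)) \\
&= \lambda \lim_{n \to \infty} \frac{1}{\lambda^{n+1}} h_D(f^{n+1}(P)) = \lambda \cdot \hat{h}_{f,D}(P)
\end{align*}
(d) Let $\hat{h}'_{f,D}$ be a function satisfying (b) and (c). Define $g(P) := \hat{h}_{f,D}(P) - \hat{h}'_{f,D}(P)$. Part (b) implies that $g = O(1)$. Part (c) says that
\[ g(f(P)) = \lambda \cdot g(P) \;\Rightarrow\; \underbrace{g(f^n(P))}_{\text{bounded}} = \underbrace{\lambda^n}_{\to \infty} g(P) \;\Rightarrow\; g(P) = 0 \]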
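The convergence argument can be watched numerically in the simplest setting. The sketch below is an illustration under assumptions not made in the notes: it takes $V = \mathbb{P}^1$ over $\mathbb{Q}$, the map $x \mapsto x^2 + c$ (so $\lambda = 2$), and the usual height $h(a/b) = \log \max(|a|, |b|)$ in place of $h_D$. The function names are hypothetical.

```python
import math
from fractions import Fraction


def h(x):
    """Logarithmic height of a rational number x = a/b in lowest terms."""
    return math.log(max(abs(x.numerator), x.denominator))


def canonical_height_approximations(c, x, steps):
    """Return the list [h(f^n(x)) / 2^n for n = 0..steps] for f(x) = x^2 + c."""
    values = []
    for n in range(steps + 1):
        values.append(h(x) / 2 ** n)
        if n < steps:
            x = x * x + c
    return values


approx = canonical_height_approximations(Fraction(-1), Fraction(3), 12)
preperiodic = canonical_height_approximations(Fraction(-1), Fraction(1), 12)
```

For $c = -1$ and $x = 3$ the orbit is $3, 8, 63, 3968, \ldots$ and the quotients settle quickly near $1.0357$, with consecutive terms differing by at most $\log(2) / 2^n$. For the preperiodic point $x = 1$ every term is $0$, consistent with preperiodic points having canonical height zero.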
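Let $K(A[m])$ be the field generated by all the points $P \in A[m]$. Consider the subgroup $H = \mathrm{Gal}(\overline{K}/K(A[m]))$ of $G_K = \mathrm{Gal}(\overline{K}/K)$. For each $\sigma \in H$, by definition $\sigma(P) = P$ for every $P \in A[m]$, so $\rho_m(\sigma)$ is the identity map in $\mathrm{Aut}(A[m])$, i.e. $\sigma$ is in the kernel of $\rho_m$. Thus, $H \subseteq \ker(\rho_m)$, meaning that $\rho_m$ factors through $G_K \to G_K / H$. By the Fundamental Theorem of Galois Theory, we have an isomorphism $G_K / H \cong \mathrm{Gal}(K(A[m])/K)$. The situation can be described by a commutative diagram
\[ \begin{array}{ccc} G_K & \xrightarrow{\quad \rho_m \quad} & \mathrm{GL}_{2g}(\mathbb{Z}/m\mathbb{Z}) \\ & \searrow \qquad \nearrow & \\ & \mathrm{Gal}(K(A[m])/K) & \end{array} \]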
A(K)/mA(K) → A(L)/mA(L)
is finite.
Proof. Let $\Phi$ be the kernel of this map, so
\[ \Phi = \frac{A(K) \cap mA(L)}{mA(K)} \]
Let $\overline{P} = P \pmod{mA(K)} \in \Phi$ with $P \in A(K)$. Then $P = m Q_P$ for some $Q_P \in A(L)$. By replacing $L$ by its Galois closure, we can assume that $L/K$ is Galois. Define a map
\begin{align*}
f_P : G_{L/K} &\to A(L) \\
\sigma &\mapsto \sigma(Q_P) - Q_P
\end{align*}
where we use $G_{L/K}$ as a shorthand for $\mathrm{Gal}(L/K)$. Note that $\sigma(Q_P) - Q_P$ is an $m$-torsion point. Indeed, $[m](\sigma(Q_P) - Q_P) = [m]\sigma(Q_P) - [m]Q_P = \sigma([m]Q_P) - [m]Q_P = \sigma(P) - P = 0$ as $P$ is defined over $K$. Thus, the target of $f_P$ can be replaced with $A[m]$, so we can view $f_P$ as a set map $G_{L/K} \to A[m]$. This association leads to a map
\begin{align*}
A(K) \cap mA(L) &\to \underbrace{\mathrm{Hom}_{\mathrm{Set}}(G_{L/K}, A[m])}_{\text{finite, as } G_{L/K} \text{ and } A[m] \text{ are finite}} \\
P &\mapsto f_P
\end{align*}
Claim. If $f_P = f_{P'}$, then $P - P' \in mA(K)$.
Note that the claim implies the result, since it gives an injective map
\[ \frac{A(K) \cap mA(L)}{mA(K)} \hookrightarrow \text{finite set} \]
Proof of the Claim. If $f_P = f_{P'}$, then $\sigma(Q_P) - Q_P = \sigma(Q_{P'}) - Q_{P'}$ for every $\sigma \in G_{L/K}$. So, $\sigma(Q_P - Q_{P'}) = Q_P - Q_{P'}$ for every $\sigma \in G_{L/K}$, i.e. $Q_P - Q_{P'} \in A(K)$. Then $P - P' = m(Q_P - Q_{P'}) \in mA(K)$.
This completes the proof of the lemma.
Therefore, to prove weak Mordell-Weil, we may assume that $A[m] \subset A(K)$ and $\mu_m \subseteq K$.
Let's recall the Kummer sequence as an analogy:
\[ 1 \longrightarrow \mu_m \longrightarrow \overline{K}^* \xrightarrow{\ x \mapsto x^m\ } \overline{K}^* \longrightarrow 1 \]
Taking group cohomology, we get
\[ 1 \longrightarrow \mu_m \cap K \longrightarrow K^* \longrightarrow K^* \longrightarrow H^1(G_K, \mu_m) \longrightarrow H^1(G_K, \overline{K}^*) = 0 \]
where the last term is zero by Hilbert's Theorem 90. Thus, we have an isomorphism $K^*/(K^*)^m \cong H^1(G_K, \mu_m)$. This isomorphism is achieved by sending $a \mapsto \left( \sigma \mapsto \frac{\sigma(\alpha)}{\alpha} \right)$ where $\alpha$ is an $m$-th root of $a$. The point is that often $\alpha \notin K$, so the cocycle $\left( \sigma \mapsto \frac{\sigma(\alpha)}{\alpha} \right)$ is not a coboundary in general. If $K$ contained an $m$-th root of each of its elements, then each cocycle would be a coboundary, in which case $H^1(G_K, \mu_m)$ would be zero.
Similarly, we consider
\[ 0 \to A[m] \longrightarrow A(\overline{K}) \xrightarrow{\ m\ } A(\overline{K}) \longrightarrow 0 \]
Taking group cohomology, we obtain
\[ 0 \to A[m] \cap A(K) \longrightarrow A(K) \xrightarrow{\ m\ } A(K) \longrightarrow H^1(G_K, A[m]) \longrightarrow H^1(G_K, A(\overline{K})) \longrightarrow \cdots \]
Unfortunately, we don't have a version of Hilbert Theorem 90 for abelian varieties, so $H^1(G_K, A(\overline{K}))$ is not necessarily zero. Since we are assuming $A[m] \subset A(K)$, the connecting homomorphism $A(K) \longrightarrow H^1(G_K, A[m])$ induces an injection
\[ \delta : A(K)/mA(K) \hookrightarrow H^1(G_K, A[m]) = \mathrm{Hom}(G_K, A[m]) \]
The reason for the last equality is that $A[m] \subset A(K)$, and so $G_K$ acts trivially on $A[m]$. It is well-known that if $G$ acts trivially on $A$, then the first non-trivial group cohomology is just the set of homomorphisms, i.e. we get usual homomorphisms instead of twisted ones,
\begin{align*}
H^1(G, A) &:= \{f : G \to A : f(gh) = g f(h) + f(g) \ \forall\, g, h \in G\} \\
&= \{f : G \to A : f(gh) = f(h) + f(g) \ \forall\, g, h \in G\} = \mathrm{Hom}(G, A)
\end{align*}
whenever $G$ acts trivially on $A$. Let's write down the map $\delta$ explicitly. It is analogous to the Kummer sequence.
\begin{align*}
\delta : A(K)/mA(K) &\hookrightarrow \mathrm{Hom}(G_K, A[m]) \\
(P \bmod mA(K)) &\mapsto (\sigma \mapsto \sigma(Q_P) - Q_P \text{ where } [m]Q_P = P)
\end{align*}
Now, the issue is that $\mathrm{Hom}(G_K, A[m])$ is not finite in general. The trick is to pass to an appropriate subextension $K \subseteq L \subseteq \overline{K}$, namely
\[ L := K([m]^{-1} A(K)) = \prod_{\substack{Q \in A(\overline{K}) \\ mQ \in A(K)}} K(Q) \]
where the product symbol is used to indicate that we are taking the compositum over all such fields $K(Q)$. The point is that $\delta(P)(\sigma) = \sigma(Q_P) - Q_P$ where $Q_P \in [m]^{-1} P$ ("$m$-th roots" of $P$). If $\sigma$ fixes all of $L$, then $\delta(P)(\sigma) = 0$. In other words, $\mathrm{Gal}(\overline{K}/L)$ is contained in the kernel of $\delta(P) : G_K \to A[m]$. Since $G_K / \mathrm{Gal}(\overline{K}/L) = \mathrm{Gal}(\overline{K}/K) / \mathrm{Gal}(\overline{K}/L) \cong \mathrm{Gal}(L/K)$, we see that $\delta(P)$ factors through $G_{L/K} := \mathrm{Gal}(L/K)$, i.e. $\delta(P) : G_{L/K} \to A[m]$. Consequently, we obtain an injection
\[ A(K)/mA(K) \hookrightarrow \mathrm{Hom}(G_{L/K}, A[m]) \]
The advantage is that $\mathrm{Hom}(G_{L/K}, A[m])$ is a lot smaller than $\mathrm{Hom}(G_K, A[m])$. Our goal is to show that $\mathrm{Hom}(G_{L/K}, A[m])$ is finite. It suffices to establish that $L/K$ is a finite extension.
Goal. We want to prove that $[L : K] < \infty$, in which case $\# G_{L/K} < \infty$ so that $\# \mathrm{Hom}(G_{L/K}, A[m]) < \infty$, which would imply that $A(K)/mA(K)$ is finite, thus proving the Weak Mordell-Weil Theorem.
We will proceed by showing the following steps.
Step 1. L/K is abelian of exponent m.
Step 2. There are only finitely many primes $\wp$ of $K$ which ramify in $L$.
Step 3. Step 1 + Step 2 ⇒ L/K is finite.
13.1. Step 1. We have a pairing:
\begin{align*}
A(K) \times G_K &\to A[m] \\
(P, \sigma) &\mapsto \sigma(Q_P) - Q_P
\end{align*}
where $Q_P \in [m]^{-1} P$. We first need to check the map is well-defined, i.e. does not depend on the choice of $Q_P$. Indeed, suppose that $Q_P$ and $Q'_P$ are both elements of $[m]^{-1} P$, i.e. $[m]Q_P = P$ and $[m]Q'_P = P$. We need to show that $\sigma(Q_P) - Q_P = \sigma(Q'_P) - Q'_P$. Note that $[m](Q_P - Q'_P) = 0$ so that $Q_P - Q'_P \in A[m] \subset A(K)$ by assumption. We have
\[ \sigma(Q_P) - Q_P - (\sigma(Q'_P) - Q'_P) = \sigma(Q_P - Q'_P) - (Q_P - Q'_P) = 0 \]
as desired. Note that the assignment given here is "bilinear" (a group homomorphism once you fix either of the entries). Given $P, P'$, we have $[m](Q_P + Q_{P'}) = P + P'$, so that we may take $Q_{P+P'} = Q_P + Q_{P'}$. Thus,
\[ (P + P', \sigma) \mapsto \sigma(Q_{P+P'}) - Q_{P+P'} = \sigma(Q_P) + \sigma(Q_{P'}) - Q_P - Q_{P'}, \]
which is the sum of the images of $(P, \sigma)$ and $(P', \sigma)$. On the other hand, given $\sigma, \tau \in G_K$, we have
\begin{align*}
(P, \sigma \circ \tau) \mapsto \sigma(\tau(Q_P)) - Q_P &= \sigma(\tau(Q_P)) - \tau(Q_P) + \tau(Q_P) - Q_P \\
&= \sigma(Q_P) - Q_P + \tau(Q_P) - Q_P,
\end{align*}
which is the sum of the images of $(P, \sigma)$ and $(P, \tau)$.
We should justify the last equality: $\sigma(\tau(Q_P)) - \tau(Q_P) = \sigma(Q_P) - Q_P$. This is equivalent to showing that $\sigma(\tau(Q_P)) - \sigma(Q_P) = \tau(Q_P) - Q_P$, which is true because $\tau(Q_P) - Q_P \in A[m] \subset A(K)$, so $\sigma$ must fix this element, i.e. $\sigma(\tau(Q_P) - Q_P) = \tau(Q_P) - Q_P$.
Let's try to understand what we need to quotient out so that the map $A(K) \times G_K \to A[m]$ becomes non-degenerate. First, let's look for the points $P \in A(K)$ such that $\sigma \mapsto \sigma(Q_P) - Q_P$ is the zero map,
\begin{align*}
\{P \in A(K) : \sigma(Q_P) = Q_P \ \forall \sigma \in G_K \text{ where } mQ_P = P\} &= \{P : Q_P \in A(K) \text{ where } mQ_P = P\} \\
&= mA(K)
\end{align*}
So, to get non-degeneracy in the first component, we need to replace $A(K)$ with $A(K)/mA(K)$. Similarly, let's look for the elements $\sigma \in G_K$ such that $P \mapsto \sigma(Q_P) - Q_P$ is the zero map,
\[ \{\sigma \in G_K : \sigma(Q_P) = Q_P \ \forall P \in A(K)\} = \{\sigma \in G_K : \sigma \text{ fixes } L\} = \mathrm{Gal}(\overline{K}/L) \]
because $L$ was defined precisely to be the field generated by elements $Q_P$ from $[m]^{-1}(P)$ as $P$ varies in $A(K)$. To get non-degeneracy in the second component, we need to replace $G_K = \mathrm{Gal}(\overline{K}/K)$ with the quotient $\mathrm{Gal}(\overline{K}/K) / \mathrm{Gal}(\overline{K}/L) \cong \mathrm{Gal}(L/K)$. We conclude that the pairing
\[ A(K)/mA(K) \times \mathrm{Gal}(L/K) \to A[m] \]
is non-degenerate. In particular,
\[ \mathrm{Gal}(L/K) \hookrightarrow \mathrm{Hom}(A(K)/mA(K), A[m]) \]
Since $A[m]$ is an abelian group, and has exponent $m$, it follows that $\mathrm{Gal}(L/K)$ is also abelian of exponent $m$, completing Step 1.
13.2. Step 2. Given a number field $K$, and a prime ideal $\wp$ inside the ring of integers $O_K$, we can consider the reduction map
\begin{align*}
\mathbb{P}^N(K) &\xrightarrow{\ \bmod \wp\ } \mathbb{P}^N(\mathbb{F}_\wp) \\
[a_0 : a_1 : \cdots : a_N] &\mapsto [\tilde{a}_0 : \tilde{a}_1 : \cdots : \tilde{a}_N]
\end{align*}
where $\tilde{a}_i = a_i \bmod \wp$. For this map to be well-defined (i.e. to avoid all the entries being 0 mod $\wp$), we can choose projective coordinates so that
(1) every $a_i$ is $\wp$-integral, i.e. $\mathrm{ord}_\wp(a_i) \geq 0$.
(2) some $\mathrm{ord}_\wp(a_j) = 0$.
For an abelian variety $A \subseteq \mathbb{P}^N$, one can form $A \bmod \wp = \widetilde{A}_\wp \subset \mathbb{P}^N_{\mathbb{F}_\wp}$. This is called the reduction of $A$ modulo $\wp$. We have a natural map $A(K) \to \widetilde{A}_\wp(\mathbb{F}_\wp)$.
Example. Let $E$ be an elliptic curve given by $y^2 = x^3 + Ax + B$ where $A, B \in \mathbb{Z}$ such that $4A^3 + 27B^2 \neq 0$. Then $\widetilde{E}_p$ is non-singular if and only if $p \nmid 2(4A^3 + 27B^2)$. One needs the extra factor of 2 in front of the discriminant because the equation is singular over $\mathbb{F}_2$.
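The criterion in the example is easy to apply by machine. The sketch below lists the primes dividing $2(4A^3 + 27B^2)$ for a given equation $y^2 = x^3 + Ax + B$, i.e. the primes at which this particular equation reduces to a singular curve according to the criterion above. The function names are illustrative, and trial division is only suitable for small inputs.

```python
def prime_divisors(n):
    """Set of primes dividing the non-zero integer n (trial division)."""
    n = abs(n)
    out, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            out.add(p)
            n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out


def primes_of_singular_reduction(A, B):
    """Primes p dividing 2 * (4A^3 + 27B^2) for the equation y^2 = x^3 + Ax + B."""
    disc = 4 * A ** 3 + 27 * B ** 2
    if disc == 0:
        raise ValueError("4A^3 + 27B^2 = 0: the curve is singular over Q")
    return sorted(prime_divisors(2 * disc))
```

For the curve $y^2 = x^3 - 2$ from Section 1 this gives `[2, 3]`, and for $y^2 = x^3 + x + 1$ it gives `[2, 31]`.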
Theorem. A e℘ is non-singular for all but finitely many primes ℘, and in fact it is an
abelian variety (whenever it is non-singular).
The second part of the statement deserves a remark. Just because A e℘ is non-singular, it is
not a priori clear that it is an abelian variety. Indeed, it is not clear that when one reduces
the multiplication map µ : A × A → A mod ℘, the resulting map µ℘ : A℘ × A℘ → A℘ stays
a morphism (in general it could be a rational map).
29
When A ewp is non-singular, we say that ℘ is a prime of good reduction for A. Otherwise, ℘
is a prime of bad reduction.
Thus, we get a homomorphism of abelian varieties,
A(K) → A
e℘ (F℘ )
A(K)[m] ,→ A
e℘ (F℘ )
In general, the kernel of A(K) → A e℘ (F℘ ) can be quite big. For example, if A(K) is infinite,
then the kernel is necessarily infinite as the target A e℘ (F℘ ) is finite. Nevertheless, the key
fact says that the m-torsion points will not be in the kernel.
Analog for the multiplicative group. If p ∤ m, then µm ↪ $\overline{\mathbb{F}}_p^*$. In this case, µm =
Gm[m]. Note that the multiplicative group Gm = {xy − 1 = 0} is always non-singular when
reduced mod p. The injection µm ↪ $\overline{\mathbb{F}}_p^*$ says that different m-th roots of unity do not coincide
after reduction mod p, provided that p ∤ m.
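A small computation illustrates the dichotomy. This is only a sketch with hypothetical helper names. It works inside the prime field F_p, where the number of m-th roots of unity is gcd(m, p − 1), so to see all m of them one has to pick p ≡ 1 (mod m). When p divides m the roots collapse, because x^p − 1 = (x − 1)^p in characteristic p.

```python
def roots_of_unity_mod_p(m, p):
    """All x in F_p^* with x^m = 1, listed in increasing order."""
    return [x for x in range(1, p) if pow(x, m, p) == 1]


# 13 does not divide 4 and 13 = 1 (mod 4): four distinct 4th roots of unity.
roots_good = roots_of_unity_mod_p(4, 13)

# p = m = 5: x^5 - 1 = (x - 1)^5 over F_5, so only the root 1 survives.
roots_collapsed = roots_of_unity_mod_p(5, 5)

print(roots_good, roots_collapsed)
```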
Proposition. The Key Fact ⇒ L/K is unramified for ℘ ∉ SA/K,m, where SA/K,m denotes the
finite set consisting of the primes of bad reduction for A together with the primes dividing m.
Proof. Let ℘ ∉ SA/K,m. We want to show the unramifiedness. Let I℘ ⊂ GL/K be the inertia
group at ℘. Since GL/K is abelian, the inertia group is “well-behaved”. Since e = #I℘ is
equal to the ramification index at ℘, it suffices to prove that I℘ is the trivial group (being
unramified means that e = 1). Let p be a prime in L lying above ℘, i.e. p | ℘. Recall the
definition of the inertia group:
I℘ = ker(GL/K → Gp/℘)
Let σ ∈ I℘ . So σ fixes “L-things” mod p. Let Q ∈ A(L) with [m]Q ∈ A(K). Then
(1) σ(Q) ≡ Q mod p.
(2) [m](σ(Q) − Q) = σ([m]Q) − [m]Q = 0 as [m]Q ∈ A(K).
So σ(Q) − Q ∈ A[m]. Thus, σ(Q) − Q ∈ ker(A(L)[m] → Ãp(Fp)) = 0 by the key fact, so
σ(Q) = Q. Therefore, σ ∈ I℘ fixes every Q with [m]Q ∈ A(K), which are the elements used
to generate L as a field extension over K. This implies that σ fixes L pointwise, i.e. σ = id
in GL/K . Since σ ∈ I℘ was arbitrary, we conclude that the inertia group I℘ is trivial, and
that ℘ is unramified in L.
This completes Step 2, assuming the key fact above (which will be proved later).
13.3. Step 3. We want to prove the following general result about field extensions.
Proposition. Suppose that L/K is an abelian extension of exponent m and unramified
at all ℘ ∉ S for some finite set S of primes. Then L/K is a finite extension.
Proof. Without loss of generality, we can make K bigger, and so we may assume that µm ⊂ K.
Indeed, replacing K with a finite extension K′ will replace Gal(L/K) with its subgroup
Gal(L/K′), which is still abelian of exponent m, and L/K is finite if and only if L/K′ is
finite. By Kummer theory, there is an isomorphism
\[ K^*/(K^*)^m \cong \mathrm{Hom}(\mathrm{Gal}(\overline{K}/K), \mu_m), \qquad b \mapsto \left(\sigma \mapsto \frac{\sigma(\beta)}{\beta}\right), \quad \text{where } \beta^m = b. \]
This isomorphism comes from applying group cohomology to the Kummer exact sequence, and using
Hilbert's Theorem 90, which says that $H^1(\mathrm{Gal}(\overline{K}/K), \overline{K}^*) = 0$, together with
$H^1(\mathrm{Gal}(\overline{K}/K), \mu_m) = \mathrm{Hom}(\mathrm{Gal}(\overline{K}/K), \mu_m)$, which holds because $\mathrm{Gal}(\overline{K}/K)$ acts trivially on
µm ⊂ K, so the twisted homomorphisms are just the usual homomorphisms.
Next, note that L is generated by finite extensions K′ of K. Since Gal(L/K) ↠ Gal(K′/K),
it follows that Gal(K′/K) is a finite abelian group of exponent m, so we can further decompose
K′ into smaller extensions of K whose Galois groups are subgroups of Z/mZ. Thus, L is
generated by those finite subextensions K′ of K such that Gal(K′/K) ⊆ Z/mZ, i.e.
\[ L = \prod_{\substack{K'/K,\ K' \subset L \\ G_{K'/K} \subseteq \mathbb{Z}/m\mathbb{Z}}} K' \]
where each such K′ is obtained as $K' = K(\sqrt[m]{b})$ for some b ∈ K*. Thus,
\[ L = K\left(\sqrt[m]{b_i} : i \in I\right) \]
where the bi are chosen from equivalence classes in K*/(K*)^m. By enlarging the finite set S,
if necessary, we may assume that S contains {℘ : ℘ | m}. The discriminant of x^m − b is
±m^m b^(m−1). So, for all primes ℘ ∉ S, it follows that $K(\sqrt[m]{b})/K$ is unramified at ℘ if and only
if ord℘(b) ≡ 0 (mod m). We have an exact sequence
\[ 0 \to B \to \langle b_i \in K^*/(K^*)^m : i \in I \rangle \to \prod_{\wp \in S} \mathbb{Z}/m\mathbb{Z} \]
There are three different approaches: 1) Using formal groups – see the book [HS00], 2)
Chevalley-Weil Theorem regarding unramified morphisms V → W – see exercise C.7 in
[HS00], and 3) Hensel’s Lemma, plus some facts from algebraic geometry.
We will follow approach 3) for our proof.
Proof. We will need the following theorem, which we will assume without a proof:
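Theorem. Let k be an algebraically closed field, and A/k be an abelian variety, and
g := dim(A). Then
\[ A(k)[p^t] \cong \begin{cases} (\mathbb{Z}/p^t\mathbb{Z})^{2g} & \text{if } p \neq 0 \text{ in } k, \\ (\mathbb{Z}/p^t\mathbb{Z})^{i} & \text{if } p = 0 \text{ in } k, \text{ where } 0 \le i \le g, \end{cases} \]
for each t ≥ 1.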
We also need Hensel’s Lemma, which we will state in two versions. The first version is
the classical version, while the second one is a geometric version for varieties. Note that the
uniqueness part in version 1 is missing in version 2.
Hensel’s Lemma. Let K℘ be the completion of K at a prime ℘, let R℘ denote its
ring of integers, and write ℘ = ℘R℘ for the corresponding maximal ideal. Then the
following statements hold.
Version 1. Let f(x) ∈ R℘[x] and α ∈ R℘ such that f(α) ≡ 0 (mod ℘) and f′(α) ≢
0 (mod ℘). Then there exists a unique β ∈ R℘ such that β ≡ α (mod ℘) and f(β) = 0.
Version 2. Let V/K℘ ⊆ $\mathbb{P}^n_{K_\wp}$ be a variety. Reduce mod ℘ to get Ṽ℘/F℘ ⊂ $\mathbb{P}^n_{\mathbb{F}_\wp}$. Let
Q̃ ∈ Ṽ℘(F℘). Assume that Q̃ is a nonsingular point of Ṽ℘. Then there exists some Q ∈ V(K℘)
such that Q ≡ Q̃ (mod ℘).
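Version 1 is constructive: the root β can be approximated modulo successive powers of the prime by a Newton-type iteration. Below is a minimal sketch for the special case K = Q and ℘ = (p), which is an assumption made here for illustration; the function name and the coefficient convention (highest degree first) are likewise hypothetical. Each step corrects the current approximation by f(x) times the inverse of f′(α) mod p, gaining one more p-adic digit.

```python
def hensel_lift(coeffs, root, p, k):
    """Lift a simple root of f modulo p to a root modulo p**k.

    coeffs: integer coefficients of f, highest degree first.
    root:   an integer with f(root) = 0 (mod p) and f'(root) != 0 (mod p).
    """
    def f(x, modulus):
        value = 0
        for c in coeffs:                     # Horner evaluation
            value = (value * x + c) % modulus
        return value

    def df(x, modulus):
        degree = len(coeffs) - 1
        value = 0
        for i, c in enumerate(coeffs[:-1]):  # derivative via Horner
            value = (value * x + c * (degree - i)) % modulus
        return value

    if f(root, p) != 0 or df(root, p) == 0:
        raise ValueError("need f(root) = 0 and f'(root) != 0 modulo p")

    inverse = pow(df(root, p), -1, p)        # f'(alpha)^(-1) mod p (Python 3.8+)
    x, modulus = root % p, p
    for _ in range(k - 1):
        modulus *= p
        x = (x - f(x, modulus) * inverse) % modulus
    return x


# Example: a square root of 2 in Z_7, starting from 3^2 = 9 = 2 (mod 7).
beta = hensel_lift([1, 0, -2], 3, 7, 3)
print(beta, (beta * beta - 2) % 7**3)
```

The uniqueness statement of Version 1 corresponds to the fact that the lifted residue modulo p^k is determined by the starting residue modulo p.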
Recall the definition of a non-singular point. We can choose local coordinates x1, ..., xn
such that V is locally given by {f1 = ... = fs = 0}, where
we can choose fi ∈ R℘[x1, ..., xn].
A point Q̃ is a nonsingular point of Ṽ℘ if
\[ \operatorname{rank}\left(\frac{\partial \tilde f_i}{\partial x_j}(\tilde Q)\right) = n - \dim(V). \]
We can now prove the key fact above. Suppose that ℘ is a prime such that A has good
reduction at ℘ and ℘ ∤ m. By Version 2 of Hensel’s Lemma, the natural map
A(K℘) ↠ Ã℘(F℘)
is surjective. By enlarging the field K if necessary, we can assume that A[m] ⊆ A(K). Let’s
view A[m] as a scheme (rather than a set of closed points). Since A ⊂ PN is defined by
some polynomial equations, A = {F1 = · · · = Fr = 0}, the same holds true for A[m], i.e.
A[m] = {F1 = · · · = Fr = G1 = · · · = Gs = 0} where the polynomials G1 , G2 , ...., Gs arise
from analyzing the equation [m]P = 0.
Claim. The variety A[m] is non-singular.
Proof. It suffices to show that the multiplication map [m] : A → A is unramified, in
fact étale. This will imply the desired result as A[m] is the fiber of this map above 0. This
can be checked by passing to C. Indeed, [m] : A(C) → A(C) gets identified with the map
Cg /L → Cg /L given by z 7→ mz, i.e. multiplying each coordinate with m. This latter map is
étale, so the former map must be étale as well. Thus, A[m] is non-singular in characteristic
0, and so it is non-singular over F℘ for all but finitely many primes ℘.
Hence, for all but finitely many primes ℘, we can apply Hensel’s lemma to A[m] to get a
surjective map
A[m](K℘) ↠ Ã℘[m](F℘)
Since both sides are isomorphic to (Z/mZ)^(2g), they both have the same cardinality m^(2g), and
thus, the map A[m](K℘) ↠ Ã℘[m](F℘) is also injective, as desired.
14.2. More height inequalities. Fix a point Q ∈ A(K). Define a translation map
TQ : A → A
P 7→ P + Q
Let’s use the theorem of the cube for a symmetric divisor D,
(f + g + h)∗ D − (f + g)∗ D − (f + h)∗ D − (g + h)∗ D + f ∗ D + g ∗ D + h∗ D ∼ 0
with f = TQ , g = T−Q and h = [−1]. We first compute all the relevant maps:
(f + g + h)(P ) = P + Q + P − Q − P = P
(f + g)(P ) = P + Q + P − Q = 2P
(f + h)(P ) = P + Q − P = Q
(g + h)(P ) = P − Q − P = −Q
Thus, f + g + h = [1], f + g = [2], and f + h = Q and g + h = −Q are the constant maps
with values Q and −Q, respectively. Going back to the theorem of the cube, and using the
fact that the pullback of D under a constant map is 0, we obtain
\[ [1]^*D - [2]^*D - 0 - 0 + T_Q^*D + T_{-Q}^*D + [-1]^*D \sim 0. \]
Now [1]*D = D and [2]*D ∼ 4D. Also [−1]*D = D as D is assumed to be symmetric. Rearranging
the terms,
\[ T_Q^*D + T_{-Q}^*D \sim 2D \]
for every fixed Q ∈ A(K). We will convert this statement about linear equivalence of divisors
(geometry) to an assertion about the height machine (arithmetic):
\[ h_{T_Q^*D}(P) + h_{T_{-Q}^*D}(P) = h_{2D}(P) + O(1). \]
Using functoriality, $h_{T_Q^*D}(P) = h_D(T_Q(P)) + O(1) = h_D(P + Q) + O(1)$. Similarly, $h_{T_{-Q}^*D}(P) = h_D(P - Q) + O(1)$.
Using linearity, h2D(P) = 2hD(P) + O(1). Thus,
hD (P + Q) + hD (P − Q) = 2hD (P ) + OA,D,Q (1)
where the subscript on the O(1) is written to emphasize which variables it depends on. We
will shortly see a more precise version of the bound (using the theorem of the square) which
will explain how the bound depends on Q. In particular, if D is ample, then hD (P + Q) ≥
O(1). Consequently,
2hD (P ) + O(1) = hD (P + Q) + hD (P − Q) ≥ O(1) + hD (P − Q)
or equivalently,
hD (P − Q) ≤ 2hD (P ) + O(1)
This is the key height inequality that is used in the descent argument in the proof of Mordell-
Weil. We will refer to this inequality as the descent inequality.
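The relation hD(P + Q) + hD(P − Q) = 2hD(P) + O(1) for fixed Q can be observed numerically. The sketch below is an illustration under assumptions that are not in the notes: it uses the elliptic curve y^2 = x^3 − 2 with the rational point (3, 5), and it takes as height the naive logarithmic height of the x-coordinate (which corresponds to a symmetric ample divisor on the curve). With Q = (3, 5) and P = nQ, the defect h(P + Q) + h(P − Q) − 2h(P) stays bounded while h(P) itself grows quickly.

```python
from fractions import Fraction
from math import log


def add(P, Q):
    """Chord-and-tangent addition on y^2 = x^3 - 2 (None is the point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y1 + y2 == 0:
            return None                      # P + (-P) = O
        slope = 3 * x1 * x1 / (2 * y1)       # tangent line (coefficient A = 0)
    else:
        slope = (y2 - y1) / (x2 - x1)        # chord through P and Q
    x3 = slope * slope - x1 - x2
    y3 = slope * (x1 - x3) - y1
    return (x3, y3)


def height(P):
    """Naive logarithmic height of the x-coordinate (0 for the point at infinity)."""
    if P is None:
        return 0.0
    x = P[0]
    return log(max(abs(x.numerator), abs(x.denominator)))


P0 = (Fraction(3), Fraction(5))
multiples = [None, P0]                       # multiples[n] = n * P0
for _ in range(8):
    multiples.append(add(multiples[-1], P0))

# Defect of the relation with Q = P0 and P = n * P0, for n = 2, ..., 8.
defects = [
    height(multiples[n + 1]) + height(multiples[n - 1]) - 2 * height(multiples[n])
    for n in range(2, 9)
]
for n, d in zip(range(2, 9), defects):
    print(n, round(height(multiples[n]), 2), round(d, 2))
```

The middle column grows roughly quadratically in n, while the last column stays bounded; the bound depends on the fixed point Q, in line with the subscript on the O(1).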
14.3. Descent. We are finally ready to prove the Mordell-Weil theorem. We will assume
the Weak Mordell-Weil, which we have already proved.
Theorem. (Mordell-Weil) For an abelian variety A over a number field K, the abelian
group A(K) is finitely-generated.
Proof. Fix m ≥ 2. By the Weak Mordell-Weil Theorem, the quotient A(K)/mA(K) is
finite. Choose coset representatives
{Q1 , Q2 , ..., Qr } ←→ A(K)/mA(K)
Fix an ample symmetric divisor D. This can be obtained by taking an ample divisor H,
and considering D = H + [−1]∗ H which is both ample and symmetric. For each point P ,
we have
(3) hD(mP) ≥ m^2 hD(P) − C1(A, D, m)
(4) hD(P − Q) ≤ 2hD(P) + C2(A, D, Q)
for some constants C1, C2. The first inequality relies on ĥD(mP) = m^2 ĥD(P), while the
second inequality is the descent inequality. Applying (4) for each coset representative Qi,
we can get a single constant C2(A, D, Q1, ..., Qr) satisfying
(5) hD(P − Qi) ≤ 2hD(P) + C2(A, D, Q1, ..., Qr) for every 1 ≤ i ≤ r.
Take any P0 ∈ A(K). There exists some i1 such that
P0 ≡ Qi1 (mod mA(K))
that is, P0 = mP1 + Qi1 for some P1 ∈ A(K). Similarly, there is some i2 such that P1 =
mP2 + Qi2 for some P2 ∈ A(K). Continuing in this way n times, we have constructed a
sequence of points P0 , P1 , ..., Pn ∈ A(K) such that
\[ \begin{aligned} P_0 &= mP_1 + Q_{i_1} \\ P_1 &= mP_2 + Q_{i_2} \\ &\ \ \vdots \\ P_{n-1} &= mP_n + Q_{i_n} \end{aligned} \]
The idea of the descent is to show that the heights of the points Pi must be getting smaller
as i increases. The system of equations implies that
P0 = m^n Pn + Z-linear combination of Q1, ..., Qr
For each 1 ≤ j ≤ n, we obtain P_{j−1} − Q_{i_j} = mP_j, so hD(P_{j−1} − Q_{i_j}) = hD(mP_j). Applying
(3) and (5),
\[ 2h_D(P_{j-1}) + C_2 \ge h_D(P_{j-1} - Q_{i_j}) = h_D(mP_j) \ge m^2 h_D(P_j) - C_1. \]
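Thus,
\[ m^2 h_D(P_j) - C_1 \le 2h_D(P_{j-1}) + C_2 \]
or equivalently,
\[ h_D(P_j) \le \frac{2}{m^2}\, h_D(P_{j-1}) + \frac{C_1 + C_2}{m^2}. \]
Apply this repeatedly to get
\[ h_D(P_n) \le \left(\frac{2}{m^2}\right)^{n} h_D(P_0) + \frac{C_1 + C_2}{m^2}\left(1 + \frac{2}{m^2} + \left(\frac{2}{m^2}\right)^{2} + \cdots + \left(\frac{2}{m^2}\right)^{n-1}\right). \]
Note that
\[ 1 + \frac{2}{m^2} + \left(\frac{2}{m^2}\right)^{2} + \cdots + \left(\frac{2}{m^2}\right)^{n-1} \le \sum_{i=0}^{\infty} \left(\frac{2}{m^2}\right)^{i} = \frac{1}{1 - \frac{2}{m^2}} = \frac{m^2}{m^2 - 2}. \]
Substituting this bound into the previous one, we obtain
\[ h_D(P_n) \le \left(\frac{2}{m^2}\right)^{n} h_D(P_0) + \frac{C_1 + C_2}{m^2 - 2}. \]
As m ≥ 2, we can get an even rougher upper bound by replacing each m with 2,
\[ h_D(P_n) \le \left(\frac{1}{2}\right)^{n} h_D(P_0) + \frac{C_1 + C_2}{2}. \]
This explains why the height of Pn decreases as n increases. In particular, we can find n
(which depends only on the initial point P0) such that $\left(\frac{1}{2}\right)^n h_D(P_0) \le 1$, so that
\[ h_D(P_n) \le 1 + \frac{C_1 + C_2}{2}. \]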
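The geometric decay in the descent can be sanity-checked numerically. The sketch below iterates the worst case of the one-step inequality, in which hD(Pj) equals (2/m^2) hD(Pj−1) + (C1 + C2)/m^2, and compares the outcome with the rough bound (1/2)^n hD(P0) + (C1 + C2)/2. The starting height and the constants are hypothetical values chosen only for illustration.

```python
def worst_case_height(h0, m, c1, c2, n):
    """Iterate h_j = (2 / m^2) * h_{j-1} + (c1 + c2) / m^2 for n steps."""
    h = h0
    for _ in range(n):
        h = 2 * h / m**2 + (c1 + c2) / m**2
    return h


def rough_bound(h0, c1, c2, n):
    """The bound (1/2)^n * h0 + (c1 + c2) / 2 obtained by replacing m with 2."""
    return h0 / 2**n + (c1 + c2) / 2


def steps_until_small(h0):
    """Smallest n with (1/2)^n * h0 <= 1."""
    n = 0
    while h0 / 2**n > 1:
        n += 1
    return n


H0, C1, C2 = 1000.0, 7.0, 11.0      # hypothetical starting height and constants
for m in (2, 3, 5):
    print(m, [round(worst_case_height(H0, m, C1, C2, n), 3) for n in (0, 5, 10, 20)])
print(steps_until_small(H0))
```

After steps_until_small(H0) steps the height is at most 1 + (C1 + C2)/2, a threshold that does not depend on the starting point.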
Recall that
P0 = m^n Pn + Z-linear combination of Q1, ..., Qr
Since P0 was an arbitrary point in A(K), we conclude that
\[ A(K) \subset \mathrm{Span}_{\mathbb{Z}}\left( \{Q_1, \ldots, Q_r\} \cup \left\{ P \in A(K) : h_D(P) \le 1 + \frac{C_1 + C_2}{2} \right\} \right). \]
The first set {Q1 , ..., Qr } is finite (because of Weak Mordell-Weil), and the second set is also
finite because D is ample. This finally completes the proof of the Mordell-Weil Theorem.
This implies that the set {P ∈ A(K) : ĥD (P ) ≤ 1} is infinite, which is a contradiction as D
is ample.
Remark. It is worth emphasizing that part (4) did not formally follow from part (3),
and we had to use special properties of the height function, namely boundedness. Here is an
example of a lattice L with a quadratic form q such that q is positive definite on L, but q does
not induce a positive definite quadratic form on L ⊗Z R. Consider L = Z + Z√2 as a subgroup
of R, and let q : L → R be given by q(x) = |x|^2. More explicitly, q(a + b√2) = a^2 + 2b^2 + 2ab√2.
Then q is positive definite on L because √2 is irrational. On the other hand, L ⊗Z R ≅ R ⊕ R
and q(a + b√2) = 0 for (a, b) = (√2, −1), so q is not positive definite on L ⊗Z R. The problem
apparently arises from the fact that the image set q(L) has an accumulation point in R. By
contrast, the boundedness property for ample divisors is precisely the statement that the set
{hD(P)}P∈A(K) is a discrete subset of R, so everything works out in this case.
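The accumulation point of q(L) can be made explicit: the powers (√2 − 1)^n lie in L = Z + Z√2 and tend to 0, so q takes arbitrarily small positive values on non-zero lattice vectors. The following sketch (function names are illustrative) computes the integer coordinates of these powers exactly and evaluates q in a numerically stable way, using the identity (a + b√2)(a − b√2) = a^2 − 2b^2 to avoid cancellation.

```python
from math import sqrt


def power_coordinates(n):
    """Integers (a, b) with a + b*sqrt(2) = (sqrt(2) - 1)**n."""
    a, b = 1, 0
    for _ in range(n):
        # (a + b*sqrt2) * (sqrt2 - 1) = (2b - a) + (a - b)*sqrt2
        a, b = 2 * b - a, a - b
    return a, b


def q(a, b):
    """q(a + b*sqrt(2)) = (a + b*sqrt(2))**2, computed via the norm a^2 - 2 b^2."""
    norm = a * a - 2 * b * b
    conjugate = a - b * sqrt(2)      # no cancellation when a and b have opposite signs
    return (norm / conjugate) ** 2


vectors = [power_coordinates(n) for n in range(1, 21)]
values = [q(a, b) for (a, b) in vectors]
for (a, b), v in list(zip(vectors, values))[:6]:
    print(a, b, v)
```

All of these vectors are non-zero, yet the values of q decrease to 0, which is exactly the failure of discreteness described in the remark.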
18.4. More on Pic0 (C). Let C/K be a genus g ≥ 1 curve, and P0 ∈ C(K). We have defined
Pic^0(C) as
\[ \mathrm{Pic}^0(C) = \mathrm{Div}^0(C)/\sim \]
where ∼ stands for linear equivalence. Note that Pic0 (C), as defined, is just an abelian group.
We will soon see that Pic^0(C) is also a variety. There is a natural map j : C → Pic^0(C) given
by P ↦ [(P) − (P0)]. As we briefly saw before, we can use j to map multiple copies of C into
Pic^0(C). Indeed, j : C^n → Pic^0(C) can be defined by (P1, ..., Pn) ↦ [(P1) + ... + (Pn) − n(P0)].
Clearly, the image does not depend on the order of the points, as the addition in Div(C) is
commutative. Thus, we get a well-defined map C^(n) → Pic^0(C) where C^(n) := C^n/Sn.
It turns out that j(C^(g)) = Pic^0(C), which will be proved by Riemann-Roch. In other
words, every divisor of degree 0 on C is linearly equivalent to one of the form (P1) + ... + (Pg) − g(P0)
for some points P1, ..., Pg. Since C^(g) is naturally a variety, the equality j(C^(g)) = Pic^0(C)
allows one to equip Pic^0(C) with the structure of a variety. For this last assertion to make
perfect sense, we also need some sort of injectivity of the map j. This is in fact true: there
exists an open set U ⊂ C^(g) such that j|U : U ↪ Pic^0(C). This turns “most of” Pic^0(C) into
an algebraic variety, and then there is a general theorem of Weil that allows one to extend
to get an abelian variety structure on all of Pic^0(C). Let’s summarize our observations in a
theorem.
Theorem. (Three key facts about Pic^0).
(1) j(C^(g)) = Pic^0(C).
(2) There is a non-empty open set U ⊂ C^(g) such that j|U : U ↪ Pic^0(C).
(3) There exists an abelian variety J = JC = Jac(C) such that
\[ C^{(g)} \xrightarrow[\text{birational}]{\ \sim\ } J \xrightarrow{\ \sim\ } \mathrm{Pic}^0(C). \]
Proof. (1) Let [D] ∈ Pic^0(C). Consider the divisor g(P0) + D. By Riemann-Roch,
\[ \ell(g(P_0) + D) \ge \deg(g(P_0) + D) - g + 1 = g - g + 1 = 1. \]
Therefore, there is some f ∈ K(C) such that div(f) + g(P0) + D ≥ 0. Let E = div(f) +
g(P0) + D. Note that D ∼ E − g(P0), and deg(E) = g. Since E is effective of degree g, we can
write E = (P1) + ... + (Pg), and so [D] = j(P1, ..., Pg). See the book [HS00] for the proofs
of parts (2) and (3).
Upshot. There exists an abelian variety J such that j : C^g/Sg → J is a birational isomorphism.
Let Θ := j(C^(g−1)) ∈ Div(J) be the theta divisor. This is an irreducible divisor, as C is
irreducible.
Theorem. (Properties of the Theta divisor).
(1) [−1]*Θ ∼ Θκ where Θκ = Θ − κ, or more formally, Θκ = Tκ*Θ. In fact, we will see in
the proof that κ = j(KC). We will also use the notation Θ^− := [−1]*Θ.
(2) j*Θ = Θ|C ∼ g(P0) + κ, and $j^*(\Theta^-_z) \sim g(P_0) - z$ for z ∈ J. Moral: Θ|C has degree g.
(3) Recall that σ : J × J → J is the map (P, Q) ↦ P + Q, while π1 : J × J → J,
π2 : J × J → J are the usual projections. Then
\[ (j \times j)^*\underbrace{[\sigma^*\Theta - \pi_1^*\Theta - \pi_2^*\Theta]}_{\in \mathrm{Div}(J \times J)} \sim \underbrace{-\Delta + \underbrace{\{P_0\} \times C}_{p_1^*(P_0)} + \underbrace{C \times \{P_0\}}_{p_2^*(P_0)}}_{\in \mathrm{Div}(C \times C)} \]
Next, $\Theta_z = \Theta^-_{z - j(K_C)}$ by (1), and so
\[ j^*(\Theta_z) = j^*(\Theta^-_{z - j(K_C)}) \sim g(P_0) - z + j(K_C) = g(P_0) - z + \kappa. \]
Then
#{x ∈ S : ||x|| ≤ R} ≤ c log R
for some constant c > 0.
Compare this with:
#{x ∈ L : ||x|| ≤ R} ∼ cL · R^r
where r = dim(V ) and cL depends on the fundamental domain of the lattice L.
The idea behind the proof is to count the points in S carefully by considering points on
the annulus.
Proof. For u ≤ v, let
S(u, v) = {x ∈ S : u < ||x|| ≤ v} = S ∩ (B0 (v) \ B0 (u))
where 0 is the zero vector of V . Here we are employing the notation:
Bx0 (R) = {x ∈ V : ||x − x0 || ≤ R}
for a closed ball of radius R centered at x0. If x, y ∈ S(u, v) and x ≠ y, then ||x − y||^2 ≥
α(||x||^2 + ||y||^2) ≥ 2αu^2, which implies ||x − y|| ≥ √(2α) · u. As a result,
\[ B_x(\beta u) \cap B_y(\beta u) = \emptyset \]
where $\beta = \frac{1}{2}\sqrt{2\alpha} = \sqrt{\alpha/2}$. Using x ∈ B0(v), we get Bx(βu) ⊆ B0(v + βu). Thus,
\[ B_0(v + \beta u) \supseteq \bigcup_{x \in S(u,v)} B_x(\beta u) \]
where the union on the right is in fact a disjoint union because Bx(βu) ∩ By(βu) = ∅ for any
x, y ∈ S(u, v) with x ≠ y. By taking the volume of both sides,
\[ \mathrm{vol}(B_0(v + \beta u)) \ge \sum_{x \in S(u,v)} \mathrm{vol}(B_x(\beta u)). \]
We deduce that
\[ \#S(u, v) \le \left(\frac{v + \beta u}{\beta u}\right)^{r} = \left(1 + \frac{v}{\beta u}\right)^{r}. \]
This is a very strong condition. The bound depends only on the ratio of the radii in the
annulus. We now use a dyadic trick:
\[ \#\{x \in S : \|x\| \le R\} \le \sum_{k=0}^{\lfloor \log_2 R \rfloor} \#S(2^k, 2^{k+1}) + \#\{x \in S : \|x\| \le 1\}. \]
Note that the second piece is at most #{x ∈ L : ||x|| ≤ 1}, which is finite as L is a lattice,
and so it is discrete. We will focus on the first piece:
\[ \sum_{k=0}^{\lfloor \log_2 R \rfloor} \#S(2^k, 2^{k+1}) \le \sum_{k=0}^{\lfloor \log_2 R \rfloor} \left(1 + \frac{2^{k+1}}{\beta 2^k}\right)^{r} \le (1 + \log_2 R) \cdot \left(1 + \frac{2}{\beta}\right)^{r}. \]
As $\beta = \sqrt{\alpha/2}$ and r are both constants, this finishes the proof of the proposition.
Let us address a few steps where we were not accurate, and how to fix them.
Fudge. 1) Rather than ||x − y||2 ≥ α(||x||2 + ||y||2 ), one should use ||x − y||2 ≥ α(||x||2 +
||y||2 ) + O(1). This is not a huge problem, since we can just replace α with something else,
i.e. we can instead use ||x − y||2 ≥ α̃(||x||2 + ||y||2 ) for a suitably chosen α̃ < α.
2) In reality Θ ≠ Θ^−, so Θ is not a symmetric divisor. So we should instead apply the
argument with the divisor D = Θ + Θ^− = Θ + Θκ, so the correct inequality should be
\[ \|x - y\|^2 \ge \left(1 - \frac{1}{g}\right)\left(\|x\|^2 + \|y\|^2\right) - C_1\left(\|x\| + \|y\| + 1\right). \]
In our analysis we did not have the linear term ||x|| + ||y|| + 1. Using ||x + k||^2 = ||x||^2 +
2⟨x, k⟩ + ||k||^2, and completing the square, we can convert this into a version ||x − y||^2 ≥
α′(||x||^2 + ||y||^2) for some α′ > 0.
We have finally proven the theorem due to Mumford:
Theorem (Mumford). If C is a smooth curve of genus g ≥ 2 over a number field K,
then there is a constant c > 0 such that:
#{P ∈ C(K) : hD(P) ≤ R} ≤ c log R
where D is any ample divisor on C.
How do different height functions on a curve relate to each other?
Proposition. Let D1, D2 ∈ Div(C) where D2 is ample. Then
\[ \lim_{h_{D_2}(P) \to \infty} \frac{h_{D_1}(P)}{h_{D_2}(P)} = \frac{\deg(D_1)}{\deg(D_2)} \]
for any P ∈ C(K).
Proof. First, we will prove the special case when E = D1 satisfies deg(E) = 0, writing D = D2. By
Riemann-Roch, if D ∈ Div(C) satisfies deg(D) ≥ 2g + 1, then D is very ample. It follows
that D ∈ Div(C) is ample if and only if deg(D) ≥ 1. We also know that
D is ample ⇒ hD (P ) ≥ O(1)
Take any integer n ≥ 1. Then
deg(D + nE) = deg(D) ≥ 1 and deg(D − nE) = deg(D) ≥ 1
So D + nE and D − nE are ample, and therefore,
hD+nE (P ) ≥ O(1) and hD−nE (P ) ≥ O(1)
Using functoriality, we obtain
hD (P ) ≥ −nhE (P ) + O(1)
hD (P ) ≥ nhE (P ) + O(1)
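Dividing both inequalities by n hD(P), which is positive once hD(P) is large, and rearranging,
we get
\[ -\frac{1}{n} \le \frac{h_E(P)}{h_D(P)} + O\left(\frac{1}{h_D(P)}\right), \qquad \frac{1}{n} \ge \frac{h_E(P)}{h_D(P)} + O\left(\frac{1}{h_D(P)}\right). \]
We have shown that:
\[ -\frac{1}{n} \le \frac{h_E(P)}{h_D(P)} + O\left(\frac{1}{h_D(P)}\right) \le \frac{1}{n}. \]
Letting hD(P) → ∞,
\[ -\frac{1}{n} \le \liminf_{h_D(P) \to \infty} \frac{h_E(P)}{h_D(P)} \le \limsup_{h_D(P) \to \infty} \frac{h_E(P)}{h_D(P)} \le \frac{1}{n}. \]
Since n ≥ 1 was arbitrary, we deduce that
\[ \lim_{h_D(P) \to \infty} \frac{h_E(P)}{h_D(P)} = 0. \]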
It follows that
\[ |x - \sqrt[3]{2}\,y| = \frac{A}{|x^2 + \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2|} \le \frac{A}{C y^2}. \]
Consequently,
\[ \left|\frac{x}{y} - \sqrt[3]{2}\right| \le \frac{A}{C|y|^3}. \]
This last inequality says that x/y ∈ Q is very close to $\sqrt[3]{2}$. According to the theorem of
Axel Thue, there are only finitely many solutions x/y ∈ Q with gcd(x, y) = 1 satisfying
this inequality. From now on, when we write x/y ∈ Q we will implicitly assume that
gcd(x, y) = 1.
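The approximation inequality for solutions of x^3 − 2y^3 = A can be tested on small examples. The sketch below is illustrative only: the search routine and its names are hypothetical, and the value C = (3/4)·4^(1/3) is an assumption obtained by completing the square, x^2 + 2^(1/3) xy + 4^(1/3) y^2 = (x + 2^(1/3) y/2)^2 + (3/4)·4^(1/3) y^2 ≥ (3/4)·4^(1/3) y^2. It enumerates integer solutions with 0 < |y| ≤ y_bound and checks that each one satisfies |x/y − 2^(1/3)| ≤ |A| / (C |y|^3).

```python
CBRT2 = 2 ** (1 / 3)
C = 0.75 * 4 ** (1 / 3)      # assumed admissible constant, see the lead-in


def integer_solutions(A, y_bound):
    """All integer pairs (x, y) with x^3 - 2 y^3 = A and 0 < |y| <= y_bound."""
    found = []
    for y in range(-y_bound, y_bound + 1):
        if y == 0:
            continue
        target = A + 2 * y**3                    # x^3 must equal this
        guess = round(abs(target) ** (1 / 3))    # floating-point cube root
        if target < 0:
            guess = -guess
        for x in (guess - 1, guess, guess + 1):  # guard against rounding errors
            if x**3 == target:
                found.append((x, y))
    return found


def satisfies_bound(x, y, A):
    """Check |x/y - 2^(1/3)| <= |A| / (C |y|^3), with a small float tolerance."""
    return abs(x / y - CBRT2) <= abs(A) / (C * abs(y) ** 3) + 1e-12


for A in (1, 6, -1, 25):
    sols = integer_solutions(A, 200)
    print(A, sols, all(satisfies_bound(x, y, A) for (x, y) in sols))
```

The search is finite only because a bound on |y| is imposed by hand; turning the inequality into a provable bound on |y| is what requires an effective result.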
More generally, Thue was able to show that if α is an irrational algebraic number with
degree [Q(α) : Q] = d, then for any ε > 0, the inequality
\[ \left|\frac{x}{y} - \alpha\right| \le \frac{1}{y^{d/2 + 1 + \varepsilon}} \]
is satisfied for only finitely many rational numbers x/y ∈ Q with y > 0. In our application
above, d = 3, and so d/2 + 1 + ε = 2.5 + ε.
Diophantine approximation is the theory of how close rational quantities approximate
irrational quantities.
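Theorem. (Dirichlet) Let α ∈ R \ Q. There are infinitely many rational numbers x/y ∈ Q
with
\[ \left|\frac{x}{y} - \alpha\right| \le \frac{1}{y^2}. \]
Proof. We will apply the pigeonhole principle. Let {t} := t − ⌊t⌋. Choose B ∈ N. Look at
{0}, {α}, {2α}, ..., {Bα}. These will be the pigeons. Since there are B + 1 pigeons, we need
B pigeonholes. Look at the intervals:
\[ \left[0, \tfrac{1}{B}\right), \left[\tfrac{1}{B}, \tfrac{2}{B}\right), \ldots, \left[\tfrac{B-1}{B}, 1\right). \]
These are the pigeonholes. By the pigeonhole principle, there exist 0 ≤ x1 < x2 ≤ B, and
0 ≤ i < B such that
\[ \{x_1\alpha\}, \{x_2\alpha\} \in \left[\tfrac{i}{B}, \tfrac{i+1}{B}\right). \]
Consequently,
\[ |\{x_2\alpha\} - \{x_1\alpha\}| \le \frac{1}{B} \]
which translates into:
\[ |x_2\alpha - \lfloor x_2\alpha \rfloor - x_1\alpha + \lfloor x_1\alpha \rfloor| \le \frac{1}{B} \]
\[ |(x_2 - x_1)\alpha - (\lfloor x_2\alpha \rfloor - \lfloor x_1\alpha \rfloor)| \le \frac{1}{B} \]
Letting y = x2 − x1 and x = ⌊x2α⌋ − ⌊x1α⌋, we get |yα − x| ≤ 1/B. Dividing both sides by
|y|, we obtain
\[ \left|\alpha - \frac{x}{y}\right| \le \frac{1}{B|y|} \le \frac{1}{y^2} \]
since |y| = |x2 − x1| ≤ B. Finally, since α is irrational we have |yα − x| > 0, so repeating the
argument with any B > 1/|yα − x| produces a fraction different from x/y; hence there are infinitely many such x/y.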
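The pigeonhole argument is effectively an algorithm. Here is a minimal sketch for the special case α = √d with d a positive non-square integer; this restriction is an assumption made so that all floors can be computed exactly with integer square roots, and the function name is hypothetical. It sorts the fractional parts {kα}, 0 ≤ k ≤ B, into B boxes and returns the approximation produced by the first collision.

```python
from math import isqrt


def dirichlet_approximation(d, B):
    """Return (x, y) with 1 <= y <= B and |y*sqrt(d) - x| <= 1/B via pigeonhole."""
    seen = {}                                    # box index -> (k, floor(k * alpha))
    for k in range(B + 1):
        floor_k = isqrt(d * k * k)               # floor(k * sqrt(d)), exact
        # floor(B * {k alpha}) = floor(B k alpha) - B * floor(k alpha)
        box = isqrt(d * (B * k) ** 2) - B * floor_k
        if box in seen:
            k1, floor_k1 = seen[box]
            return floor_k - floor_k1, k - k1    # x and y from the proof
        seen[box] = (k, floor_k)
    raise AssertionError("unreachable: B + 1 pigeons in B boxes must collide")


for B in (10, 100, 1000):
    x, y = dirichlet_approximation(2, B)
    print(B, x, y, abs(x / y - 2 ** 0.5))
```

With d = 2 and B = 10 the first collision is between k = 0 and k = 5, which gives the approximation 7/5.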
Remark. One of the improvements of Dirichlet’s theorem is a result due to Hurwitz,
which states that for α ∈ R \ Q, there are infinitely many solutions to:
\[ \left|\frac{x}{y} - \alpha\right| \le \frac{1}{\sqrt{5}\,y^2}. \]
The constant √5 cannot be replaced by any bigger constant, so the result is optimal in a
certain sense. This √5 comes from attempting to approximate the golden ratio $\alpha = \frac{1+\sqrt{5}}{2}$. In
some sense, the golden ratio is the real number that is hardest to approximate by a rational
number.
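Theorem. (Liouville) Let $\alpha \in \overline{\mathbb{Q}} \setminus \mathbb{Q}$ and d = [Q(α) : Q]. There exists a constant C(α) > 0,
which depends on α, such that for all x/y ∈ Q,
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha)}{|y|^d}. \]
A consequence is the statement that there are only finitely many rational numbers
x/y ∈ Q satisfying |x/y − α| < 1/|y|^(d+ε).
Proof. We may assume that $\alpha \in \overline{\mathbb{Q}} \cap \mathbb{R}$. Let f(T) = a0 T^d + a1 T^(d−1) + · · · + ad ∈ Z[T]
be the minimal polynomial of α. Note that f(α) = 0 but f′(α) ≠ 0. We can write f(T) =
(T − α)g(T) where g(T) ∈ R[T] and g(α) ≠ 0. For any x/y ∈ Q,
\[ f\left(\frac{x}{y}\right) = \left(\frac{x}{y} - \alpha\right) \cdot g\left(\frac{x}{y}\right). \]
Since f is irreducible of degree d ≥ 2, it has no rational roots, so we have
\[ 0 \neq \left|f\left(\frac{x}{y}\right)\right| = \frac{|a_0 x^d + a_1 x^{d-1} y + \cdots + a_d y^d|}{|y|^d} \ge \frac{1}{|y|^d}. \]
Consequently,
\[ \left|\frac{x}{y} - \alpha\right| \cdot \left|g\left(\frac{x}{y}\right)\right| \ge \frac{1}{|y|^d} \quad \Rightarrow \quad \left|\frac{x}{y} - \alpha\right| \ge \frac{1}{|y|^d} \cdot \frac{1}{|g(x/y)|}. \]
So we just need an upper bound for |g(x/y)|. Either |x/y − α| > 1, in which case we are done
(provided we choose C(α) ≤ 1).
So, we may assume that |x/y − α| ≤ 1, and
\[ \left|g\left(\frac{x}{y}\right)\right| \le \sup_{|t - \alpha| \le 1} |g(t)| = C(\alpha)^{-1} \]
for some C(α) > 0. Here we used the fact that g(α) ≠ 0, so the supremum is positive, and the fact that g is a
continuous function, and so |g| achieves a maximum on the compact set {t ∈ R : |t − α| ≤ 1}.
Combining the inequalities, we obtain
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{1}{|y|^d} \cdot \frac{1}{|g(x/y)|} \ge \frac{C(\alpha)}{|y|^d} \]
as desired.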
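For a concrete α the constant in Liouville's theorem can be written down and tested. The sketch below takes α = √2 as an illustrative assumption: here f(T) = T^2 − 2 and g(T) = T + √2, so the supremum of |g| over |t − α| ≤ 1 equals 2√2 + 1 and one may take C(α) = 1/(2√2 + 1). The code scans denominators up to a chosen bound and records the smallest value of |x/y − √2| · y^2, which should never fall below C(α).

```python
from math import sqrt, isqrt, inf

ALPHA = sqrt(2)
C_ALPHA = 1 / (2 * sqrt(2) + 1)      # 1 / sup |g| on [alpha - 1, alpha + 1]


def smallest_scaled_error(max_y):
    """Minimum of |x/y - sqrt(2)| * y^2 over 1 <= y <= max_y and the two nearest x."""
    best = inf
    for y in range(1, max_y + 1):
        below = isqrt(2 * y * y)     # floor(y * sqrt(2)), exact
        for x in (below, below + 1):
            best = min(best, abs(x / y - ALPHA) * y * y)
    return best


worst = smallest_scaled_error(2000)
print(worst, C_ALPHA, worst >= C_ALPHA)
```

In this range the minimum is attained at x/y = 3/2, and it stays above C(α) with room to spare, as expected since the constant from the proof is not optimized.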
Corollary. The number $\beta = \sum_{n=0}^{\infty} \frac{1}{10^{n!}}$ lies in $\mathbb{R} \setminus \overline{\mathbb{Q}}$, i.e. it is a transcendental number.
Proof sketch. Let $\frac{x_N}{y_N} = \sum_{n=0}^{N} \frac{1}{10^{n!}}$ be the partial sums. If β were algebraic with degree
d = [Q(β) : Q], it would satisfy Liouville’s theorem. So,
\[ \frac{C(\beta)}{y_N^{d}} \le \left|\frac{x_N}{y_N} - \beta\right| \le \frac{\text{constant}}{y_N^{f(N)}} \]
where f is some function that satisfies f(N) → ∞ as N → ∞. This is a contradiction
for values of N that are much bigger than d.
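The upper bound in the proof sketch can be made explicit with exact rational arithmetic. In the sketch below the names are hypothetical, yN is taken to be 10^(N!), and a longer partial sum stands in for β (an assumption that is harmless here because the terms are positive and decay extremely fast). It checks that the gap between consecutive partial sums is below 2 / yN^(N+1), so one can take f(N) = N + 1.

```python
from fractions import Fraction
from math import factorial


def partial_sum(N):
    """Exact value of sum_{n=0}^{N} 10^(-n!)."""
    return sum(Fraction(1, 10 ** factorial(n)) for n in range(N + 1))


def tail_is_small(N, extra_terms=2):
    """Check partial_sum(N + extra_terms) - partial_sum(N) < 2 / y_N^(N+1)."""
    y_N = 10 ** factorial(N)
    gap = partial_sum(N + extra_terms) - partial_sum(N)
    return 0 < gap < Fraction(2, y_N ** (N + 1))


for N in range(1, 5):
    print(N, tail_is_small(N))
```

Since the exponent N + 1 eventually exceeds any fixed degree d, these approximations are too good to be compatible with Liouville's lower bound.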
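As we mentioned above, Thue improved Liouville’s theorem by showing that for every
$\alpha \in \overline{\mathbb{Q}} \setminus \mathbb{Q}$ with degree d = [Q(α) : Q], and ε > 0, there exists a constant C(α, ε) > 0
depending only on α and ε such that
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{d/2 + 1 + \varepsilon}} \]
for all rational numbers x/y ∈ Q. The bound was later improved by Siegel to
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{2\sqrt{d} + 1 + \varepsilon}}. \]
Dyson and Gelfond independently proved a slightly stronger statement:
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{\sqrt{2d} + 1 + \varepsilon}}. \]
In his paper, Dyson suggested that maybe $\sqrt{2d}$ can be replaced with $\sqrt[k]{kd}$ for any k ≥ 1.
By letting k → ∞, we would get $\sqrt[k]{kd} \to 1$. This goal was achieved by Roth. Thus, Roth’s
theorem is the inequality:
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{2 + \varepsilon}}. \]
The exponent 2 + ε is the best possible in view of Dirichlet’s theorem.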
All the results above are not effective, i.e. the proofs do not furnish any estimate for the
constant C(α, ε). An effective version of Liouville’s theorem was proved by Baker, who
showed that there are constants δ(α) > 0 and C(α) > 0 such that:
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha)}{|y|^{d - \delta(\alpha)}} \]
where both δ(α) and C(α) are effective. We can apply Baker’s theorem to the Diophantine
equation x^3 − 2y^3 = A. As we mentioned above, any solution (x, y) ∈ Z^2 with y ≠ 0 must
satisfy:
\[ \frac{C(\sqrt[3]{2})}{|y|^{3 - \delta(\sqrt[3]{2})}} \le \left|\frac{x}{y} - \sqrt[3]{2}\right| \le \frac{4A}{3\sqrt[3]{4}\,|y|^3}. \]
The left-hand side comes from Baker’s theorem. Rearranging the inequality, we get $|y|^{\delta(\sqrt[3]{2})} \le$
constant. Since $\delta(\sqrt[3]{2}) > 0$ is effective, there is a finite search space for the potential values of
y. In theory, depending on how large the constant $\delta(\sqrt[3]{2}) > 0$ is, this allows us to enumerate
all integral solutions to x^3 − 2y^3 = A.
References
[Dob79] E. Dobrowolski, On a question of Lehmer and the number of irreducible factors of a polynomial,
Acta Arith. 34 (1979), no. 4, 391–401, DOI 10.4064/aa-34-4-391-401.
[HS00] Marc Hindry and Joseph H. Silverman, Diophantine geometry: An introduction, Graduate Texts in
Mathematics, vol. 201, Springer (2000), xiv+558, DOI 10.1007/978-1-4612-1210-2.