
DIOPHANTINE GEOMETRY – TAUGHT BY JOE SILVERMAN

NOTES BY SHAMIL ASGARLI

1. Diophantine Equations
Objective. Solve polynomial equations using integers (or rationals).
Example. Linear equation: Solve ax + by = c where a, b, c ∈ Z. It is a classical fact
from elementary number theory that this equation has a solution (for x and y) if and only
if gcd(a, b) | c.
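The solvability criterion can be made constructive with the extended Euclidean algorithm. The following is a minimal sketch, not part of the lecture; the function names `extended_gcd` and `solve_linear` are chosen here for illustration.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:  # normalize the sign of the gcd
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def solve_linear(a, b, c):
    """Return one integer solution (x, y) of a*x + b*y == c, or None."""
    g, x, y = extended_gcd(a, b)
    if g == 0:  # a == b == 0: solvable only when c == 0
        return (0, 0) if c == 0 else None
    if c % g != 0:  # gcd(a, b) does not divide c: no solution
        return None
    k = c // g
    return x * k, y * k
```

For instance, `solve_linear(6, 10, 8)` returns a pair `(x, y)` with `6*x + 10*y == 8`, while `solve_linear(6, 10, 7)` returns `None` because gcd(6, 10) = 2 does not divide 7.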
Example. Quadratic equation: $x^2 + y^2 = z^2$. We are interested in non-zero solutions $(x, y, z) \in \mathbb{Z}^3$, i.e. $(x, y, z) \neq (0, 0, 0)$. Since the equation is homogeneous, it is enough to understand the solutions of $X^2 + Y^2 = 1$ where $X, Y \in \mathbb{Q}$ (points on the unit circle). The complete solution to this problem is known. WLOG $\gcd(x, y, z) = 1$, $x$ is odd and $y$ is even. Then the solutions are given by $x = s^2 - t^2$, $y = 2st$, and $z = s^2 + t^2$. Analogously, the solutions $(X, Y) \in \mathbb{Q}^2$ of $X^2 + Y^2 = 1$ are parametrized by
\[ X = \frac{s^2 - t^2}{s^2 + t^2} \quad \text{and} \quad Y = \frac{2st}{s^2 + t^2}. \]
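The parametrization is easy to experiment with. The sketch below enumerates primitive triples; the side conditions on $s, t$ (coprime, of opposite parity, $s > t \geq 1$) are the standard ones and are an addition here, since the notes do not spell them out. The name `primitive_triples` is hypothetical.

```python
from math import gcd


def primitive_triples(bound):
    """Yield primitive triples (x, y, z) with x odd, y even and z <= bound,
    using x = s^2 - t^2, y = 2st, z = s^2 + t^2."""
    s = 2
    while s * s + 1 <= bound:  # smallest z for this s is s^2 + 1
        for t in range(1, s):
            # opposite parity and coprime give a primitive triple
            if (s - t) % 2 == 1 and gcd(s, t) == 1:
                x, y, z = s * s - t * t, 2 * s * t, s * s + t * t
                if z <= bound:
                    yield x, y, z
        s += 1
```

Dividing a triple by $z$ gives a rational point $(x/z, y/z)$ on the unit circle, which matches the displayed formulas for $X$ and $Y$.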
Example. Consider the equation $y^2 = x^3 - 2$. It turns out that $(3, \pm 5)$ are the only solutions in $\mathbb{Z}^2$, while there are infinitely many solutions $(x, y) \in \mathbb{Q}^2$.
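A brute-force search is of course no proof, but it is a useful sanity check on the claim about integer points. The sketch below searches a finite range and also verifies one non-integral rational point, $(129/100, 383/1000)$, which is supplied here as an illustration and does not appear in the notes. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def integer_points(bound):
    """Integer points (x, y) on y^2 = x^3 - 2 with 2 <= x <= bound."""
    points = []
    for x in range(2, bound + 1):  # x <= 1 makes x^3 - 2 negative
        rhs = x ** 3 - 2
        y = isqrt(rhs)
        if y * y == rhs:
            points.extend([(x, y), (x, -y)])
    return points


def on_curve(x, y):
    """Exact test of y^2 == x^3 - 2 for rational x, y."""
    x, y = Fraction(x), Fraction(y)
    return y * y == x ** 3 - 2
```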
Goal. Let $f_1, f_2, \dots, f_k \in \mathbb{Z}[X_1, \dots, X_n]$, and let $R = \mathbb{Z}$, $\mathbb{Q}$, or any other ring. Define
\[ V(R) = \{(x_1, x_2, \dots, x_n) \in R^n : f_i(x_1, \dots, x_n) = 0 \text{ for all } i\}. \]
Describe $V(R)$. Two questions naturally arise. 1) Is $V(R) = \emptyset$? This is undecidable for $R = \mathbb{Z}$. 2) Is $V(R)$ finite?
2 variables, 1 equation. Let $C$ be a plane curve given by $f(x, y) = 0$ where $f(x, y) \in \mathbb{Z}[x, y]$. The goal would be to describe the solutions $(x, y)$ in $\mathbb{Z}^2$, $\mathbb{Q}^2$, $\mathbb{R}^2$ or $\mathbb{C}^2$. As the ring gets bigger and bigger, the task progressively becomes easier. In other words, for a ring $R$ we are concerned with the solution set $C(R) = \{(x, y) \in R^2 : f(x, y) = 0\}$.
Example. $C : x^3 + y^3 = 1$. By a theorem,
\[ C(\mathbb{Q}) = C(\mathbb{Z}) = \{(1, 0), (0, 1)\}. \]
Example. $C : x^n + y^n = 1$ with $n \geq 3$. By FLT,
\[ C(\mathbb{Q}) \subseteq \{(\pm 1, 0), (0, \pm 1)\}, \]
where FLT stands for Fermat's Last Theorem (proved by Wiles).
Idea. If $\deg f(x, y)$ is big, does that necessarily mean fewer solutions? Not necessarily, e.g. $y = x^d$ still has plenty of solutions.
Guiding Principle. Geometry (solutions to polynomial equations over an algebraically
closed field) determines the arithmetic (number theory, i.e. solutions over integers or non-
closed fields).
Consider the plane curve $C : f(x, y) = 0$. There are some extra points "at infinity". Let $\overline{C} = C \cup \{\text{points at infinity}\}$. Sometimes, $\overline{C}$ is a nice curve (smooth). Not so nice curves are the ones with singularities (think of a cuspidal or nodal cubic curve). We can blow up these curves at their singular points to make them smooth.
Assuming $\overline{C}$ is nice, the set $\overline{C}(\mathbb{C})$ is a nonsingular compact 1-dimensional complex manifold, i.e. a Riemann surface of genus $g$. Intuitively, $g$ counts the number of holes. So $g = 0$ corresponds to the 2-sphere, while $g = 1$ corresponds to the usual torus, etc. So here, the genus $g$ is the "geometry" side (see the Guiding Principle above).
Theorem. Consider the plane curve $C : f(x, y) = 0$ for $f \in \mathbb{Q}[x, y]$. Suppose there are no singularities, so $C(\mathbb{C})$ is a $g$-holed torus, where $g$ is the genus of $C$. There are three cases to consider.
Case 1. $g = 0$. Then $C(\mathbb{Q}) = \emptyset$ or $C(\mathbb{Q}) \cong \mathbb{Q} \cup \{\infty\}$. There exists an algorithm to determine which of the two holds.
Case 2. $g = 1$. Then $C(\mathbb{Q}) = \emptyset$ or $C(\mathbb{Q})$ is a finitely generated abelian group. This is the Mordell-Weil Theorem. In the latter case, we know that any finitely generated abelian group is of the form
\[ \underbrace{\text{finite abelian group}}_{=\,\text{torsion part}} \times\ \mathbb{Z}^r. \]
The non-negative integer $r$ is called the rank. It is a theorem of Mazur that the torsion
part has order at most 16. Furthermore, there exists an algorithm to determine the torsion
part. It is not known if the rank r can be unbounded or not. Current record for an example
with high rank is r = 28 due to Elkies. There is no known algorithm to determine the rank
in general.
Case 3. g ≥ 2. Then C(Q) is finite. This is a theorem of Faltings (this result was
previously known as Mordell’s Conjecture). There is no algorithm in general to find the
solution set.
Goals for the class. We will prove Mordell-Weil Theorem and Faltings’ Theorem (but
not Faltings’ original proof).
Key Tools.
(1) Diophantine Approximation: how closely can a rational quantity approximate an
irrational quantity? We will learn about results of Roth, Baker and others.
(2) Height Functions: measuring complexity of objects.

2. Diophantine Approximation
Let us say a few words about Diophantine Approximation. First, since $\mathbb{Q}$ is dense in $\mathbb{R}$, it is true that
\[ \inf_{a/b \in \mathbb{Q}} \left| \frac{a}{b} - \sqrt{2} \right| = 0. \]
However, we are interested in approximating $\sqrt{2}$ with rationals whose denominators are not so large (relatively speaking). For example, here are two facts that are easy to prove:
(1) There are infinitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt{2} \right| < \frac{1}{b^2}$.
(2) There are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt{2} \right| < \frac{1}{b^3}$.
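The two facts can be illustrated numerically. The sketch below is not from the notes: it generates fractions $a/b$ with $a^2 - 2b^2 = \pm 1$ via the recurrence $(a, b) \mapsto (a + 2b, a + b)$ (a standard construction, assumed here), and tests the inequalities exactly by squaring rational bounds instead of using floating point. The names `close_to_sqrt2` and `pell_fractions` are hypothetical.

```python
from fractions import Fraction


def close_to_sqrt2(a, b, exponent):
    """Exact test of |a/b - sqrt(2)| < 1/b**exponent for integers a, b >= 1."""
    r = Fraction(a, b)
    eps = Fraction(1, b ** exponent)
    lo, hi = r - eps, r + eps
    # lo < sqrt(2) < hi, decided by comparing squares (hi > 0 since a, b >= 1)
    return (lo <= 0 or lo * lo < 2) and hi * hi > 2


def pell_fractions(count):
    """Yield (a, b) with a^2 - 2 b^2 = +-1, starting from 1/1."""
    a, b = 1, 1
    for _ in range(count):
        yield a, b
        a, b = a + 2 * b, a + b
```

Every fraction produced this way satisfies the bound with exponent 2, since $|a/b - \sqrt{2}| = 1 / (b^2 (a/b + \sqrt{2}))$. Among them, only $1/1$ and $3/2$ satisfy the bound with exponent 3, in line with fact (2).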
In fact, let’s prove a more general result that implies the second statement.
Theorem. (Liouville) Let $\varepsilon > 0$. If $x \in \mathbb{R}$ satisfies a degree $n$ polynomial with coefficients in $\mathbb{Q}$, then
\[ \left| \frac{a}{b} - x \right| < \frac{1}{b^{n+\varepsilon}} \]
has only finitely many solutions $\frac{a}{b}$ with $\gcd(a, b) = 1$.
Remark. The following proof was communicated to me by Ming Hao Quek.
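Proof. Consider the set $S$ defined by
\[ S := \left\{ \frac{a}{b} \in \mathbb{Q} : \left| \frac{a}{b} - x \right| < \frac{1}{b^{n+\varepsilon}} \right\}. \]
Assume, to the contrary, that $S$ is infinite. Say $x$ satisfies a monic polynomial $P(X) \in \mathbb{Q}[X]$ of degree $n$. Let $P(X) = \prod_{i=1}^{n} (X - x_i)$ where $x_i \in \mathbb{C}$ and $x_1 = x$. Given $\frac{a}{b} \in S$, $P\left(\frac{a}{b}\right)$ is a rational number with denominator at most $D b^n$ for some fixed $D > 0$. Since $P(X)$ has only finitely many roots, and $S$ is infinite, the subset
\[ S' := \left\{ \frac{a}{b} \in S : P\left(\frac{a}{b}\right) \neq 0 \right\} \]
must be infinite as well. For all $\frac{a}{b} \in S'$, we have
\[ \left| P\left(\frac{a}{b}\right) \right| \geq \frac{1}{D b^n}. \]
On the other hand, for all $\frac{a}{b} \in S'$,
\begin{align*}
\left| P\left(\frac{a}{b}\right) \right| &= \left| \frac{a}{b} - x \right| \left| \frac{a}{b} - x_2 \right| \cdots \left| \frac{a}{b} - x_n \right| \\
&\leq \left| \frac{a}{b} - x \right| \prod_{i=2}^{n} \Big( \underbrace{\left| \frac{a}{b} - x \right|}_{\leq 1} + \underbrace{|x - x_i|}_{\leq \delta} \Big) \leq \left| \frac{a}{b} - x \right| (1 + \delta)^{n-1} = \Delta \left| \frac{a}{b} - x \right| \leq \frac{\Delta}{b^{n+\varepsilon}},
\end{align*}
where $\delta$ is any upper bound for the differences of the roots of $P(X)$, and $\Delta := (1 + \delta)^{n-1}$ only depends on $x$. Combining the upper and lower bounds, we get
\[ \frac{1}{D b^n} \leq \frac{\Delta}{b^{n+\varepsilon}} \ \Rightarrow\ b^{n+\varepsilon} \leq D \Delta b^n \]
for all $\frac{a}{b} \in S'$, that is, $b^{\varepsilon} \leq D\Delta$. Since $S'$ is infinite, and each denominator $b$ occurs for only finitely many $a$, we can choose $\frac{a}{b} \in S'$ where $b$ is arbitrarily large, and this leads to a contradiction. $\square$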
Using the above theorem, we easily see that there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt{2} \right| < \frac{1}{b^3}$. Similarly, for any $\varepsilon > 0$, there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt[3]{2} \right| < \frac{1}{b^{3+\varepsilon}}$.
Fact. It is also true, but much harder to prove, that there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt[3]{2} \right| < \frac{1}{b^{2.5}}$. This would instantly follow from Roth's celebrated theorem.
Theorem. (Roth) Let $\varepsilon > 0$. If $x$ is an irrational algebraic number, then there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying
\[ \left| \frac{a}{b} - x \right| < \frac{1}{b^{2+\varepsilon}}. \]

3. Algebraic Geometry Background


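We have the standard definitions of affine and projective spaces.
\[ \mathbb{A}^n(K) = \{(x_1, \dots, x_n) : x_i \in K\} \]
\[ \mathbb{P}^n(K) = (\mathbb{A}^{n+1}(K) \setminus \{0\}) / K^* = \{[x_0 : \cdots : x_n] = [\lambda x_0 : \cdots : \lambda x_n] \text{ for } \lambda \in K^*\} \]

Let $K$ be a number field, and fix an algebraic closure $\overline{K}$. The Galois group $G_K = \mathrm{Gal}(\overline{K}/K)$ acts on $\mathbb{A}^n(\overline{K})$ by $\sigma(P) = (\sigma(x_1), \dots, \sigma(x_n))$ for $P = (x_1, \dots, x_n)$.
Then $\mathbb{A}^n(\overline{K})^{G_K}$, the set of fixed points of $\mathbb{A}^n(\overline{K})$, equals $\mathbb{A}^n(K)$. Likewise, $G_K$ acts on $\mathbb{P}^n(\overline{K})$ by $\sigma(P) = [\sigma(x_0) : \cdots : \sigma(x_n)]$ for $P = [x_0 : \cdots : x_n]$.
Proposition. $\mathbb{P}^n(\overline{K})^{G_K} = \mathbb{P}^n(K)$.
Proof. This is an application of Hilbert's Theorem 90.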
Definition. We say that $f(x_0, x_1, \dots, x_n)$ is homogeneous of degree $d$ if
\[ f = \sum_{\substack{I = (i_0, \dots, i_n) \\ i_0 + \cdots + i_n = d}} a_I\, x_0^{i_0} \cdots x_n^{i_n} \]
or equivalently, $f(\lambda x_0, \dots, \lambda x_n) = \lambda^d f(x_0, \dots, x_n)$ in the ring $K[\lambda, x_0, \dots, x_n]$.
For $P \in \mathbb{P}^n(K)$, the value $f(P)$ is not well-defined, but the set $\{P : f(P) = 0\}$ is well-defined.
Definition. A rational map $f : \mathbb{P}^n \to \mathbb{P}^m$ is $f = [f_0, \dots, f_m]$ with $f_0, \dots, f_m \in K[x_0, \dots, x_n]$ homogeneous of degree $d$. To be more pedantic, we could have written $f : \mathbb{P}^n(\overline{K}) \to \mathbb{P}^m(\overline{K})$.
Then $f(P)$ is (almost) well-defined: If $P = [a_0, \dots, a_n]$, then $f(P) = [f_0(a_0, \dots, a_n), \dots, f_m(a_0, \dots, a_n)]$ is well-defined when $f_i(a_0, \dots, a_n) \neq 0$ for some $i$.
Since $K[x_0, \dots, x_n]$ is a UFD, we can assume that $f_0, f_1, \dots, f_m$ have no common irreducible factor, that is, $\gcd(f_0, \dots, f_m) = 1$, in which case we can define the degree of $f$ as $d$.
For a rational map $f : \mathbb{P}^n \to \mathbb{P}^m$, its indeterminacy locus is defined by:
\[ I_f = \{P \in \mathbb{P}^n : f_0(P) = \cdots = f_m(P) = 0\} \]
So $f$ gives a function $f : \mathbb{P}^n(\overline{K}) \setminus I_f \to \mathbb{P}^m(\overline{K})$.
Definition. A rational map $f : \mathbb{P}^n \to \mathbb{P}^m$ is called a morphism if $I_f = \emptyset$.
Example. Consider the rational map $f : \mathbb{P}^2 \to \mathbb{P}^2$ given by $[x, y, z] \mapsto [x^2, xy, z^2]$. Then $I_f = \{[0, 1, 0]\}$ and so $f$ is not a morphism.
Nullstellensatz. Given homogeneous polynomials $F_1, F_2, \dots, F_N \in K[x_0, \dots, x_n]$ (not necessarily of the same degree), let
\[ V(F) := V(F_1, \dots, F_N) := \{P \in \mathbb{P}^n(\overline{K}) : F_1(P) = \cdots = F_N(P) = 0\}. \]
Suppose $V(F) = \emptyset$. Then the Nullstellensatz says that the radical $\sqrt{\langle F_1, \dots, F_N \rangle} = (x_0, \dots, x_n)$. In other words, for each $0 \leq i \leq n$, there exist a $k_i \in \mathbb{N}$ and polynomials $G_1, \dots, G_N$ (depending on $i$) such that $x_i^{k_i} = \sum_{j=1}^{N} G_j F_j$.
4. Height Functions
Moral: “height(object) = complexity”.
Given $P = [x_0, \dots, x_n] \in \mathbb{P}^n(\mathbb{Q})$, assume WLOG $x_i \in \mathbb{Z}$. Since $\mathbb{Z}$ is a UFD, we can assume WLOG $\gcd(x_0, \dots, x_n) = 1$, in which case we say the coordinates are normalized (normalization in this case is unique up to $\pm 1$).
To describe $P$ takes roughly $\sum_{i=0}^{n} (\log_2 |x_i| + 1)$ bits.
Definition. Given $P = \underbrace{[x_0, \dots, x_n]}_{\text{normalized}} \in \mathbb{P}^n(\mathbb{Q})$,
(1) The (logarithmic) height of $P$ is $h(P) = \log \left( \max_{0 \leq i \leq n} |x_i| \right)$.
(2) The (multiplicative) height of $P$ is $H(P) = \max_{0 \leq i \leq n} |x_i|$.
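As a concrete illustration of the definition, the following sketch normalizes rational homogeneous coordinates and reads off both heights. It is only an illustration; the helper names `normalize`, `mult_height` and `log_height` are not from the notes.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, log


def _lcm(a, b):
    return a * b // gcd(a, b)


def normalize(coords):
    """Scale rational homogeneous coordinates to coprime integers."""
    fracs = [Fraction(c) for c in coords]
    if all(f == 0 for f in fracs):
        raise ValueError("[0 : ... : 0] is not a point of projective space")
    denom = reduce(_lcm, (f.denominator for f in fracs), 1)
    ints = [int(f * denom) for f in fracs]  # clear denominators
    g = reduce(gcd, ints)                   # remove the common factor
    return [x // g for x in ints]


def mult_height(coords):
    """Multiplicative height H(P): largest |x_i| of the normalized point."""
    return max(abs(x) for x in normalize(coords))


def log_height(coords):
    """Logarithmic height h(P) = log H(P)."""
    return log(mult_height(coords))
```

For example, `mult_height(["1/2", "1/3", "1/6"])` normalizes the point to $[3 : 2 : 1]$ and returns 3.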
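Theorem. For any fixed $B > 0$ and $n \in \mathbb{N}$,
\[ \#\{P \in \mathbb{P}^n(\mathbb{Q}) : h(P) \leq B\} < \infty. \]
Proof. Write $P = [x_0, \dots, x_n] \in \mathbb{P}^n(\mathbb{Q})$ in a normalized form. Note that $H(P) = \max_{0 \leq i \leq n} |x_i| \leq B'$ where $B' = e^B$. The number of choices for each $x_i$ is at most $2B' + 1$, namely the number of integers in the interval $[-B', B']$. Thus,
\[ \#\{P \in \mathbb{P}^n(\mathbb{Q}) : h(P) \leq B\} = \#\{P \in \mathbb{P}^n(\mathbb{Q}) : H(P) \leq B'\} \leq (2B' + 1)^{n+1} < \infty \]
as desired. $\square$
In fact, the size of $\{P \in \mathbb{P}^n(\mathbb{Q}) : H(P) \leq B'\}$ is asymptotic to $k_n (B')^{n+1}$ where $k_n = \frac{2^n}{\zeta(n+1)}$. In symbols,
\[ \lim_{B \to \infty} \frac{\#\{P \in \mathbb{P}^n(\mathbb{Q}) : H(P) \leq B\}}{k_n B^{n+1}} = 1. \]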
It would be an instructive exercise to check this for n = 1.
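A numerical experiment can accompany the exercise for $n = 1$, where $k_1 = 2/\zeta(2) = 12/\pi^2$. The sketch below counts points of $\mathbb{P}^1(\mathbb{Q})$ by brute force; it is a sanity check under the stated asymptotic, not a proof, and the name `count_p1` is hypothetical.

```python
from math import gcd, pi


def count_p1(B):
    """Number of points [a : b] in P^1(Q) with H([a : b]) <= B.

    Each point has a unique normalized representative with b > 0,
    except [1 : 0], which is counted separately."""
    count = 1  # the point [1 : 0]
    for b in range(1, B + 1):
        for a in range(-B, B + 1):
            if gcd(a, b) == 1:
                count += 1
    return count


def ratio(B):
    """Compare the count with the predicted main term (12 / pi^2) * B^2."""
    return count_p1(B) / (12 / pi ** 2 * B ** 2)
```

For small bounds the count can be checked by hand: height at most 1 gives the four points $[1:0], [0:1], [1:1], [-1:1]$. As $B$ grows, `ratio(B)` should approach 1.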

5. Absolute Values
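Recall that
\[ M_{\mathbb{Q}} := \{\text{(normalized) absolute values on } \mathbb{Q}\} = \{|\cdot|_\infty\} \cup \{|\cdot|_p \text{ where } p \in \mathbb{Z} \text{ is prime}\}. \]
For a given nonzero $a \in \mathbb{Q}$, we have $|a|_\infty = \max(a, -a)$, and $|a|_p = p^{-\mathrm{ord}_p(a)}$ where $\mathrm{ord}_p(a) = r$ is defined to be the unique integer such that $a = p^r \frac{c}{b}$ with $p \nmid b$ and $p \nmid c$.
Product Formula. For any $a \in \mathbb{Q} \setminus \{0\}$, we have
\[ \prod_{v \in M_{\mathbb{Q}}} |a|_v = 1. \]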
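The product formula over $\mathbb{Q}$ is easy to verify by machine for any particular rational number, since only the primes dividing the numerator or denominator contribute a factor different from 1. The following sketch uses exact rational arithmetic; the function names are hypothetical and trial division is used only for simplicity.

```python
from fractions import Fraction


def ord_p(a, p):
    """p-adic valuation of a nonzero rational number a."""
    a = Fraction(a)
    n, d, r = a.numerator, a.denominator, 0
    while n % p == 0:
        n //= p
        r += 1
    while d % p == 0:
        d //= p
        r -= 1
    return r


def prime_factors(n):
    """Set of primes dividing the nonzero integer n (trial division)."""
    n, p, primes = abs(n), 2, set()
    while p * p <= n:
        while n % p == 0:
            primes.add(p)
            n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes


def product_of_absolute_values(a):
    """Product of |a|_v over all places v of Q, for nonzero rational a."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("the product formula requires a != 0")
    product = abs(a)  # archimedean factor |a|_infinity
    for p in prime_factors(a.numerator) | prime_factors(a.denominator):
        product *= Fraction(p) ** (-ord_p(a, p))  # |a|_p = p^(-ord_p a)
    return product
```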

In general, for any number field K, we have


\[ M_K := \{\text{absolute values on } K \text{ extending those in } M_{\mathbb{Q}}\}. \]
We usually decompose $M_K = M_K^\infty \cup M_K^\circ$ where $M_K^\infty$ consists of archimedean places, and $M_K^\circ$ consists of non-archimedean places. Given a tower of field extensions $L/K/\mathbb{Q}$, and places $w \in M_L$, $v \in M_K$, we say that $w \mid v$ if $|\cdot|_w = |\cdot|_v$ on $K$.
Notation. We denote by $K_v$ the completion of $K$ at $v$.
If $v \in M_K^\infty$, then $K_v = \mathbb{R}$ or $\mathbb{C}$.
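Let $n_v$ be the local degree of $v$, which is defined to be $n_v := [K_v : \mathbb{Q}_v]$.
If $v \in M_K^\infty$, then $n_v = 1$ or $2$.
If $v \in M_K^\circ$, where $v$ corresponds to a prime $\mathfrak{p}$ lying over a prime $p \in \mathbb{Z}$, then
\[ n_v = e(\mathfrak{p}/p)\, f(\mathfrak{p}/p) \]
where $e$ and $f$ denote the ramification index and inertia degree, respectively.
Definition. $\|a\|_v = |a|_v^{n_v}$ (a slightly different normalization).
Product Formula. For $a \in K^*$, we have
\[ \prod_{v \in M_K} \|a\|_v = 1. \]

The proof of this result uses
\[ \prod_{v \in M_K^\infty} \|a\|_v = |N_{K/\mathbb{Q}}\, a| \quad \text{and} \quad \prod_{v \in M_K^\circ} \|a\|_v = \prod_{\mathfrak{p} \mid (a)} N\mathfrak{p}^{-\mathrm{ord}_{\mathfrak{p}}(a)} = |N_{K/\mathbb{Q}}\, a|^{-1}. \]
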

Recall that $R_K$ denotes the ring of integers in a number field $K$. We can interpret the ring of integers in terms of places:
\[ R_K = \{a \in K : |a|_v \leq 1 \text{ for all } v \in M_K^\circ\} = \{a \in K : |a|_v \leq 1 \text{ for all } v \notin M_K^\infty\}. \]
More generally, let $S$ be a finite set such that $M_K^\infty \subseteq S \subset M_K$.
Definition. The ring of $S$-integers is defined by
\[ R_S := \{a \in K : |a|_v \leq 1 \text{ for all } v \notin S\}. \]
In other words, we allow the finitely many primes in $S$ to occur in the denominator.
Theorem. (Dirichlet's Unit Theorem).
\[ R_S^* \cong \mu_K \times \mathbb{Z}^{\#S - 1} = \mu_K \times \mathbb{Z}^{r_1 + r_2 + \#(S \cap M_K^\circ) - 1} \]
where $\mu_K$ denotes the roots of unity in $K$, and $r_1, r_2$ denote the number of real and complex (counted in conjugate pairs) embeddings of $K$, i.e. $[K : \mathbb{Q}] = r_1 + 2 r_2$.

6. Height Functions on Number Fields


Let $K/\mathbb{Q}$ be a number field. Given a point $P = [x_0, \dots, x_n] \in \mathbb{P}^n(K)$, the (relative multiplicative) height of $P$ is
\[ H_K(P) = \prod_{v \in M_K} \max_{0 \leq i \leq n} \|x_i\|_v. \]
The (relative logarithmic) height of $P$ is $h_K(P) := \log H_K(P)$. The word "relative" here means relative to $K$.
Proposition.
(1) $H_K(P)$ is well-defined.
(2) $H_K(P) \geq 1$.
(3) For a field extension $L/K$, $H_L(P) = H_K(P)^{[L:K]}$.
Proof. See the book [HS00]. For 2), a different way to proceed is to note that $x_j \neq 0$ for some $j$, so
\[ H_K(P) = \prod_{v \in M_K} \max_{0 \leq i \leq n} \|x_i\|_v \geq \prod_{v \in M_K} \|x_j\|_v = 1 \]
where the last equality follows from the product formula.
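Definition. (Northcott) The (absolute multiplicative) height of $P = [x_0, \dots, x_n] \in \mathbb{P}^n(\overline{\mathbb{Q}})$ is defined as follows:
• Let $K/\mathbb{Q}$ be a number field such that $P \in \mathbb{P}^n(K)$.
• Define $H(P) := H_K(P)^{1/[K:\mathbb{Q}]} \geq 1$.
Part (3) of the proposition implies that $H(P)$ is independent of the choice of $K/\mathbb{Q}$ such that $P \in \mathbb{P}^n(K)$. Indeed, suppose $L/K/\mathbb{Q}$ is such that $P \in \mathbb{P}^n(K)$, so that $P \in \mathbb{P}^n(L)$ as well. To show independence of $H(P)$ on $K$ or $L$, observe that
\[ H_L(P)^{1/[L:\mathbb{Q}]} = (H_K(P)^{[L:K]})^{1/[L:\mathbb{Q}]} = H_K(P)^{[L:K]/[L:\mathbb{Q}]} = H_K(P)^{1/[K:\mathbb{Q}]}. \]
The (absolute logarithmic) height of $P$ is
\[ h(P) = \log H(P) = \frac{1}{[K : \mathbb{Q}]}\, h_K(P) \]
where $K/\mathbb{Q}$ is any field extension such that $P \in \mathbb{P}^n(K)$.
Notation. For $a \in \overline{\mathbb{Q}}$, we can consider $[a, 1] \in \mathbb{P}^1(\overline{\mathbb{Q}})$. We set
\[ H(a) := H(\underbrace{[a, 1]}_{\in \mathbb{P}^1(\overline{\mathbb{Q}})}) = \Big( \prod_{v \in M_K} \max(\|a\|_v, 1) \Big)^{1/[K:\mathbb{Q}]} \]
for any number field $K$ containing $a$, and $h(a) := h([a, 1])$.
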
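Proposition. Given $P \in \mathbb{P}^n(\overline{\mathbb{Q}})$ and $\sigma \in G_{\mathbb{Q}} = \mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$, we have $H(\sigma(P)) = H(P)$.
Proof. See the book [HS00].
Theorem. Let $K$ be a number field and $B > 0$. The set
\[ \{P \in \mathbb{P}^n(K) : H_K(P) \leq B\} \]
is finite.
Definition. The field of definition of $P = [x_0, \dots, x_n] \in \mathbb{P}^n(\overline{\mathbb{Q}})$ is
\[ \mathbb{Q}(P) = \mathbb{Q}\left( \frac{x_0}{x_j}, \frac{x_1}{x_j}, \dots, \frac{x_n}{x_j} \right) \]
for some $j$ such that $x_j \neq 0$. Alternatively, let $G_P = \{\sigma \in G_{\mathbb{Q}} : \sigma(P) = P\}$. Then $\mathbb{Q}(P) = \overline{\mathbb{Q}}^{G_P}$ is called the field of moduli for $P$.
Theorem. For any fixed $B, D > 0$, the set
\[ \{P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : H(P) \leq B \text{ and } [\mathbb{Q}(P) : \mathbb{Q}] \leq D\} \]
is finite.
Notation. The set above is denoted $\mathbb{P}^n(\overline{\mathbb{Q}})_{B,D}$, and we put $D(P) := [\mathbb{Q}(P) : \mathbb{Q}]$.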
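Proof. Let $P = [x_0, \dots, x_n] \in \mathbb{P}^n(\overline{\mathbb{Q}})_{B,D}$, and let $K = \mathbb{Q}(P)$. WLOG assume that some $x_j = 1$. So, for any $k$ and any $v \in M_K$,
\[ \max_{0 \leq i \leq n} \|x_i\|_v \geq \max(\|x_k\|_v, 1). \]
So
\[ \underbrace{\Big( \prod_{v \in M_K} \max_{0 \leq i \leq n} \|x_i\|_v \Big)^{1/D(P)}}_{= H(P) \leq B} \geq \Big( \prod_{v \in M_K} \max(\|x_k\|_v, 1) \Big)^{1/D(P)} = H(x_k) \tag{1} \]
for each $k$. Also, $\mathbb{Q}(x_k) \subseteq \mathbb{Q}(x_0, \dots, x_n) = K$ which gives
\[ D(x_k) \leq D(P). \tag{2} \]
Combine inequalities (1) and (2) to get
\[ \#\mathbb{P}^n(\overline{\mathbb{Q}})_{B,D} \leq \big( \#\mathbb{P}^1(\overline{\mathbb{Q}})_{B,D} \big)^{n+1}. \]
In view of the last inequality, it is enough to show that
\[ \#\{\alpha \in \overline{\mathbb{Q}} : H(\alpha) \leq B,\ D(\alpha) = d\} < \infty \]
for all $1 \leq d \leq D$.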
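Let $F_\alpha(X)$ be the minimal polynomial of $\alpha$ over $\mathbb{Q}$. We can write
\[ F_\alpha(X) = \prod_{i=1}^{d} (X - \alpha_i) = \sum_{j=0}^{d} (-1)^j \sigma_j(\alpha_1, \dots, \alpha_d) X^{d-j} \]
where $\alpha = \alpha_1, \alpha_2, \dots, \alpha_d$ are the conjugates of $\alpha$. In the analysis that follows below, we will use the notation
\[ N_v = \begin{cases} N & \text{if } v \text{ is archimedean} \\ 1 & \text{if } v \text{ is non-archimedean} \end{cases} \]
for any positive integer $N$. Let $K$ be a number field containing all the $\alpha_i$. For $v \in M_K$,
\begin{align*}
|\sigma_j(\alpha_1, \dots, \alpha_d)|_v &= \Big| \sum_{1 \leq i_1 < \dots < i_j \leq d} \alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_j} \Big|_v \\
&\leq \binom{d}{j}_v \cdot \max_{1 \leq i_1 < \dots < i_j \leq d} |\alpha_{i_1} \cdots \alpha_{i_j}|_v \\
&\leq (2^d)_v \cdot \max_{1 \leq i \leq d} |\alpha_i|_v^j \\
&\leq (2^d)_v \cdot \prod_{i=1}^{d} \max(1, |\alpha_i|_v)^d.
\end{align*}
Applying the usual procedure of raising the expression to the $n_v$-th power, multiplying over all $v \in M_K$ and taking $[K : \mathbb{Q}]$-th roots, one obtains
\[ H([\sigma_0(\alpha), \dots, \sigma_d(\alpha)]) \leq \Big( \prod_{v \in M_K} (2^d)_v^{n_v} \Big)^{1/[K:\mathbb{Q}]} \prod_{i=1}^{d} H(\alpha_i)^d = 2^d H(\alpha)^{d^2} \]
where for the term $2^d$ we used $\sum_{v \in M_K^\infty} n_v = [K : \mathbb{Q}]$, and $H(\alpha_i) = H(\alpha)$ because the height is invariant under the Galois action. We have a map
\[ \{\alpha \in \overline{\mathbb{Q}} : H(\alpha) \leq B,\ D(\alpha) = d\} \xrightarrow{\ \text{take min poly}\ } \{X^d + a_1 X^{d-1} + \cdots + a_d : a_i \in \mathbb{Q},\ H([1, a_1, \dots, a_d]) \leq \underbrace{2^d B^{d^2}}_{= C}\} \]
which is $d$-to-1. But we saw earlier that
\[ \#\{X^d + a_1 X^{d-1} + \cdots + a_d : a_i \in \mathbb{Q},\ H([1, a_1, \dots, a_d]) \leq C\} \leq \#\{P \in \mathbb{P}^d(\mathbb{Q}) : H(P) \leq C\} < \infty \]
which completes the proof. $\square$
Exercise. Show that $\#\mathbb{P}^n(\overline{\mathbb{Q}})_{B,D} \leq c_1(n, D) \cdot B^{c_2(n,D)}$.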
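Corollary. (Kronecker's Theorem) Let $\alpha \in \overline{\mathbb{Q}}^*$. Then $h(\alpha) = 0$ if and only if $\alpha$ is a root of unity.
Proof. First, we need a preliminary result, namely that $h(\alpha^n) = n h(\alpha)$ or equivalently, $H_K(\alpha^n) = H_K(\alpha)^n$. Indeed,
\[ H_K(\alpha^n) = \prod_{v \in M_K} \max(\|\alpha^n\|_v, 1) = \Big( \prod_{v \in M_K} \max(\|\alpha\|_v, 1) \Big)^n = H_K(\alpha)^n. \]
We can proceed to the proof of the corollary.

($\Leftarrow$) This direction is clear: if $\alpha^n = 1$, then $n h(\alpha) = h(\alpha^n) = h(1) = 0$ because $H(1) = 1$. We get $h(\alpha) = 0$.
($\Rightarrow$) Suppose that $h(\alpha) = 0$. Then $h(\alpha^n) = n h(\alpha) = 0$ for all $n \geq 1$. Hence,
\[ \{1, \alpha, \alpha^2, \alpha^3, \dots\} \subseteq \underbrace{\{\beta \in \overline{\mathbb{Q}} : H(\beta) \leq 1 \text{ and } D(\beta) \leq D(\alpha)\}}_{\text{finite}}. \]
Thus, $\alpha^m = \alpha^n$ for some $m > n$, which gives $\alpha^{m-n} = 1$, i.e. $\alpha$ is a root of unity. $\square$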

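Lehmer's Conjecture. There exists $c > 0$ such that for all nonzero $\alpha \in \overline{\mathbb{Q}}$ with $\alpha$ not a root of unity,
\[ h(\alpha) \geq \frac{c}{D(\alpha)}. \]
Theorem. (Dobrowolski) [Dob79]. Let $\varepsilon > 0$. There exists a constant $c > 0$ such that for all nonzero $\alpha \in \overline{\mathbb{Q}}$ with $\alpha$ not a root of unity,
\[ h(\alpha) \geq \frac{c}{D(\alpha)} \left( \frac{\log \log D(\alpha)}{\log D(\alpha)} \right)^{6+\varepsilon}. \]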
7. Comparing heights of P and f (P )
Recall that we have defined the absolute height function $h : \mathbb{P}^n(\overline{\mathbb{Q}}) \to [0, \infty)$. Let $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^m$ be a rational map of degree $d \geq 1$. In other words, $f$ is given by $[f_0, \dots, f_m]$ where the $f_i \in \overline{\mathbb{Q}}[x_0, \dots, x_n]$ are homogeneous of degree $d$. Intuitively, we would expect that $H(f(P)) \approx H(P)^d$ so that $h(f(P)) \approx d \cdot h(P)$.
Example. Consider the map $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^n$ given by $f[x_0, \dots, x_n] = [x_0^d, x_1^d, \dots, x_n^d]$. In this case, $H(f(P)) = H(P)^d$ holds exactly.
The next theorem explains how the heights of P and f (P ) compare.
Theorem. (a) There is a constant $c_1(f)$ so that for all $P \in \mathbb{P}^n(\overline{\mathbb{Q}})$ at which $f$ is defined,
\[ h(f(P)) \leq d \cdot h(P) + c_1(f). \]
(b) Assume that $f$ is a morphism (i.e. the indeterminacy locus $I_f = \emptyset$). There is a constant $c_2(f)$ so that for all $P \in \mathbb{P}^n(\overline{\mathbb{Q}})$,
\[ h(f(P)) \geq d \cdot h(P) - c_2(f). \]
Remark. The hypothesis that $f$ is a morphism is necessary for the conclusion in part (b) to hold. Indeed, consider the rational map $f : \mathbb{P}^2 \dashrightarrow \mathbb{P}^2$ given by $f([x, y, z]) = [x^2, xy, z^2]$. Note that $f([a, b, a]) = [a^2, ab, a^2] = [a, b, a]$ for every $[a, b] \in \mathbb{P}^1(\mathbb{Q})$ with $a \neq 0$. Thus, $h(f([a, b, a])) = h([a, b, a])$, and since $h([a, b, a])$ can be arbitrarily large, the lower bound $h(f(P)) \geq 2 \cdot h(P) - c_2(f)$ cannot possibly hold for any constant $c_2(f)$. The problem is that $f$ is not a morphism. In fact, $I_f = \{[0, 1, 0]\}$.
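Both the power-map example and the counterexample in the remark can be checked on explicit points of $\mathbb{P}^2(\mathbb{Q})$. The sketch below is illustrative only; it works with integer coordinates, and the names `height`, `power_map` and `remark_map` are hypothetical.

```python
from functools import reduce
from math import gcd


def height(coords):
    """Multiplicative height of an integer point [x0 : ... : xn] in P^n(Q).

    Assumes the coordinates are integers, not all zero."""
    g = reduce(gcd, coords)
    return max(abs(x) // g for x in coords)


def power_map(P, d):
    """The morphism [x0, ..., xn] -> [x0^d, ..., xn^d]."""
    return [x ** d for x in P]


def remark_map(P):
    """The rational map [x, y, z] -> [x^2, xy, z^2] from the remark."""
    x, y, z = P
    return [x * x, x * y, z * z]
```

With $P = [3, 1000, 3]$ one finds `height(remark_map(P)) == height(P) == 1000`, whereas a bound of the form $H(f(P)) \geq H(P)^2 / c$ would force the image to have height of order $10^6$ once $b$ is large. The power map, in contrast, satisfies `height(power_map(P, d)) == height(P) ** d`.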
Proof. (a) Let $P = [x_0, \dots, x_n] \in \mathbb{P}^n(\overline{\mathbb{Q}})$ and $f = [f_0, \dots, f_m]$. We can write each $f_i$ as
\[ f_i = \sum_{j_0 + \cdots + j_n = d} a_{i, j_0, \dots, j_n}\, x_0^{j_0} x_1^{j_1} \cdots x_n^{j_n}. \]
By enlarging the field, if necessary, we can assume that there is a number field $K$ such that $x_i \in K$ and $a_{i, \vec{j}} \in K$ for all $i$ and $\vec{j} = (j_0, \dots, j_n)$.
Let $v \in M_K$ be a place. Then
\begin{align*}
|f_i(P)|_v &= \Big| \sum_{j_0 + \cdots + j_n = d} a_{i, \vec{j}}\, x_0^{j_0} \cdots x_n^{j_n} \Big|_v \\
&\leq \binom{n+d}{d}_v \cdot \max_{\vec{j}} |a_{i, \vec{j}}|_v \cdot \max_{\vec{j}} |x_0^{j_0} \cdots x_n^{j_n}|_v \\
&\leq (2^{n+d})_v \cdot |f_i|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^d.
\end{align*}
In the last line, we simply defined $|f_i|_v$ to be $\max_{\vec{j}} |a_{i, \vec{j}}|_v$, i.e. it measures the largest absolute value among the coefficients of $f_i$. Since $f = [f_0, \dots, f_m]$, we can define $|f|_v = \max_{0 \leq i \leq m} |f_i|_v$. We get
\[ |f_i(P)|_v \leq (2^{n+d})_v \cdot |f|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^d \]
for each $0 \leq i \leq m$. The right hand side is independent of $i$, so
\[ \max_{0 \leq i \leq m} |f_i(P)|_v \leq (2^{n+d})_v \cdot |f|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^d. \]
Now we do the usual procedure: we raise the inequality to the $n_v$-th power (local powers), multiply over all $v \in M_K$, and then take the $[K : \mathbb{Q}]$-th root to get the absolute heights. Finally, we will take the log of both sides to obtain the absolute logarithmic heights. It is worth mentioning that for any integer $N \in \mathbb{N}$, we have
\[ \prod_{v \in M_K} N_v^{n_v} = \prod_{v \in M_K^\infty} N^{n_v} = \mathrm{Norm}_{K/\mathbb{Q}}(N) = N^{[K:\mathbb{Q}]}. \]
After taking $[K : \mathbb{Q}]$-th roots, $\big( \prod_{v \in M_K} N_v^{n_v} \big)^{1/[K:\mathbb{Q}]} = N$. Combining these observations, the previous displayed inequality translates to
\[ H(f(P)) \leq 2^{n+d} H(f) H(P)^d \]
where $H(f)$ stands for $\big( \prod_{v \in M_K} |f|_v^{n_v} \big)^{1/[K:\mathbb{Q}]}$. There is a way to interpret $H(f)$ as an actual height. Indeed, arrange all the coefficients of all the component functions $f_i$ of $f$ into some big vector and view the large string as an element of some big projective space. Then $H(f)$ is precisely the absolute height of this vector. Similarly, we define $h(f) := \log H(f)$.
Taking logs, we arrive at
\[ h(f(P)) \leq d \cdot h(P) + h(f) + (n + d) \log(2). \]
We can take $c_1(f) := h(f) + (n + d) \log(2)$, and this completes the proof of part (a).
(b) The lower bound is more subtle, and requires the Nullstellensatz. We are assuming that $f$ is a morphism, i.e. $I_f = \emptyset$. This means $f_0, \dots, f_m$ have no common zeros except for $(0, \dots, 0)$, which is not in the projective space. The projective version of the Nullstellensatz says that for each $0 \leq i \leq n$, there is an exponent $e_i$ such that $x_i^{e_i} \in \langle f_0, \dots, f_m \rangle$. By taking the largest of the $e_i$'s, there is a single exponent $e$ (independent of $i$) such that $x_i^e \in \langle f_0, \dots, f_m \rangle$.
For each $i$, there are polynomials $g_{ij}(x_0, \dots, x_n)$ in $\overline{\mathbb{Q}}[x_0, \dots, x_n]$ such that
\[ x_i^e = \sum_{j=0}^{m} g_{ij}(x_0, \dots, x_n) f_j(x_0, \dots, x_n). \]
By enlarging the field again, we may as well assume that $g_{ij}(x_0, \dots, x_n) \in K[x_0, \dots, x_n]$. Without loss of generality, we can assume that the $g_{ij}$ are homogeneous of degree $e - d$. Thus, writing $\vec{x} = (x_0, \dots, x_n)$,
\begin{align*}
|x_i|_v^e &\leq (m+1)_v \cdot \max_{0 \leq j \leq m} |g_{ij}(\vec{x})|_v \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v \\
&\leq (m+1)_v \cdot \max_{0 \leq j \leq m} \Big( (2^{n+e-d})_v \cdot |g_{ij}|_v \cdot \max_{k} |x_k|_v^{e-d} \Big) \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v.
\end{align*}
Here we applied the result of part (a) to the functions $g_{ij}$. Now, taking a maximum over all $0 \leq i \leq n$, we get
\[ \max_{0 \leq i \leq n} |x_i|_v^e \leq (m+1)_v (2^{n+e-d})_v \cdot \max_{i,j} |g_{ij}|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^{e-d} \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v. \]
Moving the term $\max_{0 \leq k \leq n} |x_k|_v^{e-d}$ to the left hand side, we obtain
\[ \max_{0 \leq i \leq n} |x_i|_v^d \leq (m+1)_v (2^{n+e-d})_v \cdot \max_{i,j} |g_{ij}|_v \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v. \]
Now we do the usual procedure (see part (a) for details) to get
\[ H(P)^d \leq (m+1)\, 2^{n+e-d} H(g) H(f(P)) \]
where $H(g)$ stands for the height of the point obtained by stringing together all the coefficients of all the $g_{ij}$. After taking logs, and rearranging the inequality, we have
\[ h(f(P)) \geq d \cdot h(P) - h(g) - \log(m+1) - (n + e - d) \log(2). \]
We can take $c_2(f) := h(g) + \log(m+1) + (n + e - d) \log(2)$, and this completes the proof of part (b). $\square$
8. Application: Northcott’s Theorem for preperiodic points
Suppose $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^n$ is a rational map (so here $m = n$). We can iterate $f$ by composing $f$ with itself. So we consider $f^2 := f \circ f$, $f^3 := f \circ f \circ f$, and in general $f^r := \underbrace{f \circ f \circ \cdots \circ f}_{r \text{ times}}$.
Given a point $P \in \mathbb{P}^n$, its $f$-orbit is defined to be the set $O_f(P) := \{f^r(P) : r \geq 0\}$.
Definition. A point $P$ is called preperiodic if $O_f(P)$ is finite. A point $P$ is called periodic if $f^m(P) = P$ for some $m \geq 1$.
It is clear that any periodic point is preperiodic. Similarly, some iterate of a preperiodic point must be periodic. In $\mathbb{P}^n(\mathbb{C})$, the set $\mathrm{PrePer}(f)$ is big. Indeed, for any pair of integers $k > j$, the polynomial equation $f^k(P) = f^j(P)$ will always have solutions over $\mathbb{C}$, leading to an abundance of preperiodic points. Of course, the same argument applies to any algebraically closed field.
Theorem. (Northcott) Suppose $f : \mathbb{P}^n \to \mathbb{P}^n$ is a morphism defined over $\overline{\mathbb{Q}}$ with degree $d \geq 2$. The set
\[ \{P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : P \text{ is preperiodic for } f\} \]
is a set of bounded height.
Since there are only finitely many points with bounded height and bounded degree, we immediately deduce the following corollary.
Corollary. Suppose $f : \mathbb{P}^n \to \mathbb{P}^n$ is a morphism defined over $\overline{\mathbb{Q}}$ with degree $d \geq 2$. For any number field $K$, the set
\[ \{P \in \mathbb{P}^n(K) : P \text{ is preperiodic for } f\} \]
is finite.
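Proof of Northcott's Theorem. We are interested in the set of periodic and preperiodic points for $f$.
\[ \mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}})) := \{P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : f^i(P) = P \text{ for some } i \geq 1\} \]
\[ \mathrm{PrePer}(f, \mathbb{P}^n(\overline{\mathbb{Q}})) := \{P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : f^i(P) = f^j(P) \text{ for some } i > j \geq 0\} \]
We first show that $\mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}}))$ is a set of bounded height. In a previous section, we have proved that for a morphism $f$ of degree $d$,
\[ h(f(Q)) \geq d \cdot h(Q) - C \]
holds for all points $Q$. Here $C = C(f)$ is a constant that only depends on $f$. Suppose that $P \in \mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}}))$, i.e. $f^k(P) = P$ for some $k \geq 1$. Then
\begin{align*}
h(P) = h(f^k(P)) &= h(f(f^{k-1}(P))) \\
&\geq d \cdot h(f^{k-1}(P)) - C \\
&\geq d (d \cdot h(f^{k-2}(P)) - C) - C \\
&= d^2 \cdot h(f^{k-2}(P)) - (d + 1) C \\
&\geq \cdots \\
&\geq d^k h(P) - (d^{k-1} + d^{k-2} + \cdots + d + 1) C \\
&= d^k h(P) - \left( \frac{d^k - 1}{d - 1} \right) C.
\end{align*}
We have shown that
\[ h(P) \geq d^k h(P) - \left( \frac{d^k - 1}{d - 1} \right) C. \]
Rearranging this inequality, one obtains
\[ \left( \frac{d^k - 1}{d - 1} \right) C \geq (d^k - 1) h(P) \ \Rightarrow\ h(P) \leq \frac{C}{d - 1}. \]
Therefore,
\[ \mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}})) \subseteq \left\{ P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : h(P) \leq \frac{C}{d - 1} \right\} \]
which proves Northcott's theorem in the periodic case.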
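Now suppose that $P \in \mathrm{PrePer}(f, \mathbb{P}^n(\overline{\mathbb{Q}}))$, i.e. $f^{k+i}(P) = f^i(P)$ for some $k \geq 1$, $i \geq 0$. Note that we can take $i = 0$ if and only if $P$ is periodic for $f$.
Since $f^k(f^i(P)) = f^i(P)$, it follows that $f^i(P) \in \mathrm{Per}(f, \mathbb{P}^n(\overline{\mathbb{Q}}))$. By the previous computation,
\[ h(f^i(P)) \leq \frac{C}{d - 1}. \]
Using the same computation as in the periodic case, we have
\[ h(f^i(P)) \geq d^i \cdot h(P) - \left( \frac{d^i - 1}{d - 1} \right) C. \]
Combining the previous two inequalities, we get
\[ d^i \cdot h(P) - \left( \frac{d^i - 1}{d - 1} \right) C \leq \frac{C}{d - 1}. \]
Rearranging this inequality, we have
\[ d^i \cdot h(P) \leq \left( \frac{1}{d - 1} + \frac{d^i - 1}{d - 1} \right) C = \left( \frac{d^i}{d - 1} \right) C. \]
Cancelling out $d^i$ from both sides, we arrive at
\[ h(P) \leq \frac{C}{d - 1} \]
which is amazingly the exact same bound as in the periodic case. We conclude that
\[ \mathrm{PrePer}(f, \mathbb{P}^n(\overline{\mathbb{Q}})) \subseteq \left\{ P \in \mathbb{P}^n(\overline{\mathbb{Q}}) : h(P) \leq \frac{C}{d - 1} \right\}. \]
This finishes the proof of Northcott's theorem. $\square$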
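As a quick worked check of the bound (an illustration added here, not part of the lecture), take the power map on $\mathbb{P}^1$, for which the earlier example gives $H(f(P)) = H(P)^d$ exactly, so that one may take $C = 0$. The bound then forces every preperiodic point to have height zero, and Kronecker's theorem identifies those points:

```latex
f([x : y]) = [x^d : y^d], \qquad h(f(P)) = d \cdot h(P)
  \ \Longrightarrow\ C = 0
  \ \Longrightarrow\ h(P) \le \frac{C}{d-1} = 0
  \quad \text{for every preperiodic } P .
% Hence h(P) = 0, and by Kronecker's theorem:
\mathrm{PrePer}(f, \mathbb{P}^1(\overline{\mathbb{Q}}))
  = \{[0 : 1],\ [1 : 0]\} \cup \{[\zeta : 1] : \zeta \text{ a root of unity}\}.
```

Conversely, each of these points is visibly preperiodic, since $\zeta \mapsto \zeta^d$ permutes or collapses the finite set of roots of unity of a given order.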
As a corollary of Northcott's theorem, $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ must be finite for any number field $K$. So the natural question is, how big can the set $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ be? It is clear that the size of $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ can tend to infinity as $[K : \mathbb{Q}] \to \infty$, or $n \to \infty$ or $\deg(f) \to \infty$. The following conjecture predicts that these 3 factors together govern how big $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ can get.
Uniform Boundedness Conjecture.
\[ \#\mathrm{PrePer}(f, \mathbb{P}^n(K)) \leq C([K : \mathbb{Q}], \deg(f), n) \]
where the constant $C$ only depends on $[K : \mathbb{Q}]$, $\deg(f)$ and $n$.
The simplest example is the case of a quadratic polynomial, i.e. $f_c(x) = x^2 + c$ where $c \in \mathbb{Q}$. By Northcott's theorem, the set of periodic points $\mathrm{Per}(f_c, \mathbb{Q})$ is finite. For $c = 0$, the point $x = 0$ has period 1. For $c = -1$, $x = 0$ is a point of period 2; indeed, $f_{-1}(x) = x^2 - 1$ and $f_{-1}(0) = -1$ and $f_{-1}(-1) = 0$. There is a specific value $c_0$ such that $f_{c_0}$ has a point of period 3. However, Morton showed that there is no value of $c \in \mathbb{Q}$ such that $f_c$ has a rational point of period 4. The same conclusion holds for points of period 5. Conditional on the Birch–Swinnerton-Dyer conjecture, Stoll showed that there is no value of $c \in \mathbb{Q}$ such that $f_c$ has a rational point of period 6.
Nothing is currently known for period 7 or higher.
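Small cases like these are easy to explore with exact rational arithmetic. The sketch below follows the orbit of a rational point under $x \mapsto x^2 + c$ and reports the tail length and period once a value repeats. The function name `orbit_type` and the cutoff `max_steps` are choices made here, and the value $c = -29/16$ with starting point $-1/4$ is supplied as one example of a rational 3-cycle that can be verified by hand; the notes do not say which $c_0$ was meant.

```python
from fractions import Fraction


def orbit_type(c, x, max_steps=12):
    """Follow x under x -> x^2 + c with exact rationals.

    Returns (tail_length, period) if a value repeats within max_steps
    iterations, otherwise None. The cutoff is kept small because the
    number of digits roughly doubles at each step for wandering points."""
    c, x = Fraction(c), Fraction(x)
    seen = {}  # value -> step at which it first appeared
    for step in range(max_steps):
        if x in seen:
            return seen[x], step - seen[x]
        seen[x] = step
        x = x * x + c
    return None
```

A return value `(0, k)` means the point is periodic of period `k`, while `(t, k)` with `t > 0` means it is strictly preperiodic, entering a cycle of length `k` after `t` steps.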

9. Heights on Varieties
Recall that an algebraic set $V \subseteq \mathbb{P}^n$ is defined by a collection of polynomial equations $f_1 = \dots = f_r = 0$ where each $f_i$ is a homogeneous polynomial. In other words, $V = \{P \in \mathbb{P}^n : f_1(P) = \dots = f_r(P) = 0\}$. Given an algebraic set $V$, we can consider $I(V)$, which is the ideal generated by the set of homogeneous polynomials vanishing on $V$. In symbols, $I(V)$ is generated by $\{\text{homogeneous } f \text{ such that } f(P) = 0 \text{ for all } P \in V\}$. We say that an algebraic set $V$ is a variety if $I(V)$ is a prime ideal. Geometrically, a variety is an algebraic set which cannot be written as a union of two proper algebraic subsets. What we call "variety" here is sometimes called an "irreducible variety" in some textbooks.
Given a variety $V$, an irreducible divisor of $V$ is a subvariety $W \subseteq V$ of codimension 1.
Definition. The group of divisors of $V$ is
\[ \mathrm{Div}(V) = \left\{ \sum_{W \text{ irred. divisor}} n_W W : n_W \in \mathbb{Z} \text{ and all but finitely many of the } n_W \text{ are zero} \right\}. \]
More formally, $\mathrm{Div}(V)$ is the free abelian group generated by the symbols $W$ where $W$ ranges over all irreducible divisors of $V$.
Example. Suppose that $V = C$ is a curve. Then an irreducible divisor is just a point on the curve. So, any divisor $D$ on $C$ is of the form $\sum_{P \in C} n_P P$. We can use the concept of a divisor to keep track of zeros and poles of functions on curves.
Example. Let $V = \mathbb{P}^n$. It turns out that if $W \subseteq V$ is an irreducible divisor, then $W = \{f = 0\}$ for some homogeneous irreducible polynomial $f$ in $k[x_0, \dots, x_n]$.
Given a variety V over a field k, the field k(V ) consisting of all rational functions on V is
called the function field of V . Notice that k(V ) is the fraction field of the coordinate ring
k[V ] = k[x0 , ..., xn ]/I(V ) which is an integral domain precisely because I(V ) is a prime ideal
(by definition of a variety). Given $f \in k(V)$, we can view $f : V \dashrightarrow \mathbb{P}^1$. It is customary to associate $0$ with $[0 : 1]$ and $\infty$ with $[1 : 0]$ on $\mathbb{P}^1$. This correspondence is explained via $[a, b] \longleftrightarrow a/b$. Let's assume from now on that $V$ is smooth. Given any irreducible divisor $W \subset V$, it is possible to define an integer $\mathrm{ord}_W(f)$ which is the order of vanishing of $f$ along $W$. This quantity $\mathrm{ord}_W(f)$ will be negative if $f$ has a pole along $W$. It turns out that
\[ \mathrm{ord}_W : k(V)^* \to \mathbb{Z} \]
is a valuation. For a given $f \in k(V)^*$, we define
\[ \mathrm{div}(f) := \sum_{W \text{ irred. divisor}} \mathrm{ord}_W(f)\, W. \]
Note that $\mathrm{ord}_W(f) > 0$ if and only if $W \subseteq f^{-1}([0 : 1])$ and $\mathrm{ord}_W(f) < 0$ if and only if $W \subseteq f^{-1}([1 : 0])$.
The reason that all of this works rests on the fact that the local ring OV,W is a DVR
(because it is a dimension 1 regular local ring).
Definition. We say that $D_1 \sim D_2$ if $D_2 - D_1 = \mathrm{div}(f)$ for some $f \in k(V)^*$. This is indeed an equivalence relation.
Definition. The Picard group of $V$ is the quotient group
\[ \mathrm{Pic}(V) = \frac{\mathrm{Div}(V)}{\sim} = \frac{\mathrm{Div}(V)}{\mathrm{div}(k(V)^*)}. \]
Example. Let's prove that $\mathrm{Pic}(\mathbb{P}^n) = \mathbb{Z}$. Suppose $D \in \mathrm{Div}(\mathbb{P}^n)$. Then
\[ D = \sum_W n_W W = \sum_W n_W \{f_W = 0\} \]
where $f_W$ is a homogeneous polynomial of degree $d_W$. The rational function
\[ F := \Big( \prod_W f_W^{n_W} \Big) \cdot x_0^{-\sum_W n_W d_W} \]
has degree 0, so $F$ is an element of $k(\mathbb{P}^n)$. Note that
\[ \mathrm{div}(F) = \sum_W n_W \{f_W = 0\} - \underbrace{\Big( \sum_W n_W d_W \Big)}_{\in \mathbb{Z}} \{x_0 = 0\}. \]
Thus, $D \sim \big( \sum_W n_W d_W \big) \{x_0 = 0\}$. So we have shown that any divisor $D$ is linearly equivalent to $mH$ where $m$ is an integer and $H = \{x_0 = 0\}$ is a fixed hyperplane.
In fact, we have constructed a degree map $\deg : \mathrm{Pic}(\mathbb{P}^n) \to \mathbb{Z}$ defined by $D \mapsto \sum_W n_W d_W$. The map is well-defined because $\deg(\mathrm{div}(f)) = 0$ for any $f \in k(\mathbb{P}^n)^*$. Since $\{x_0 = 0\} \mapsto 1$, the degree map is surjective. The map $\deg : \mathrm{Pic}(\mathbb{P}^n) \to \mathbb{Z}$ is also injective: if $\deg(D) = 0$, and $D = \sum_W n_W W$, then $D$ is defined by the equation $f := \prod_W f_W^{n_W}$ which is a rational function of degree 0 as $\sum_W n_W d_W = \deg(D) = 0$. So $f \in k(\mathbb{P}^n)^*$ and $D = \mathrm{div}(f)$, which is the zero element in $\mathrm{Pic}(\mathbb{P}^n)$.
Theorem. Let $V$ be a smooth variety. There is an exact sequence of abelian groups:
\[ 0 \longrightarrow \mathrm{Pic}^0(V) \longrightarrow \mathrm{Pic}(V) \longrightarrow \mathrm{NS}(V) \longrightarrow 0 \]
where $\mathrm{NS}(V)$ is the Néron-Severi group. Moreover, $\mathrm{NS}(V)$ is a finitely generated abelian group.
It turns out that $\mathrm{Pic}^0(V)$ is naturally a projective algebraic group variety, also known as an abelian variety. An abelian variety is an abelian group object in the category of projective varieties. One interesting feature is that abelian varieties are always smooth. This is because an abelian variety has at least one smooth point, and the group law allows one to translate a smooth point everywhere else.
In general, $\mathrm{Pic}(V)$ has an alternative description via 1) line bundles or 2) invertible sheaves. In this course, we will mostly use the divisor language.
If $f \in k(V)^*$, then $\mathrm{div}(f) = \text{zeros} - \text{poles}$. We can turn this around: given prescribed zeros and poles, we can try to find a function $f$ that fits this profile.
Example. Find functions on $\mathbb{P}^1$ that vanish to order at least 2 at $[1 : 3]$, and have a pole of order at most 3 at $[5 : 2]$, and no other poles.
For any $[a : b] \in \mathbb{P}^1$, we can consider the following function:
\[ \frac{(3x - y)^2 (ax + by)}{(2x - 5y)^3} \]
where $x, y$ are the homogeneous coordinates on $\mathbb{P}^1$. It is clear that this function does the job, and conversely, any nonzero function on $\mathbb{P}^1$ that vanishes to order at least 2 at $[1 : 3]$, and has a pole of order at most 3 at $[5 : 2]$ and no other poles, must be, up to a nonzero scalar multiple, of the form above for some choice of $[a : b] \in \mathbb{P}^1$.
There is a convenient way to compare divisors. We say that $D \geq 0$ if $D = \sum_W n_W W$ where each $n_W \geq 0$. For any two divisors $D_1$ and $D_2$, we say that $D_1 \geq D_2$ if $D_1 - D_2 \geq 0$.
Given $D \in \mathrm{Div}(V)$, we define the space
\[ L(D) = \{f \in k(V)^* : \mathrm{div}(f) \geq -D\} \cup \{0\}. \]
It turns out that $L(D)$ is a finite-dimensional vector space. To see that $L(D)$ is indeed a vector space, one needs to use the fact that $\mathrm{ord}_W : k(V)^* \to \mathbb{Z}$ is a valuation. The finite-dimensionality part of the assertion is much subtler.
Given V ⊂ Pn , how shall we define a height on V ? A naive approach is restrict the height
h : Pn (Q) → [0, ∞) to points of v to get hV : V (Q) → [0, ∞).
15
Problem / Issue. There are lots of ways to embed V inside a projective space, and the
above definition of a height would depend on such an embedding.
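To see the dependence on the embedding concretely, here is a small sketch over Q, where the height of a point is log max |x_i| once the coordinates are scaled to coprime integers. It compares the identity embedding of P1 with the degree 2 embedding [x : y] ↦ [x² : xy : y²] into P2. The function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, log

def primitive(coords):
    """Scale rational homogeneous coordinates to coprime integers."""
    fracs = [Fraction(c) for c in coords]
    common_den = reduce(lambda a, b: a * b // gcd(a, b),
                        (f.denominator for f in fracs), 1)
    ints = [int(f * common_den) for f in fracs]
    g = reduce(gcd, ints, 0)             # gcd(0, n) = |n|, so g > 0 for a valid point
    return [n // g for n in ints]

def height(coords):
    """Logarithmic height of a point of P^n(Q)."""
    return log(max(abs(n) for n in primitive(coords)))

def veronese(point):
    """The embedding P^1 -> P^2, [x : y] -> [x^2 : xy : y^2]."""
    x, y = point
    return [x * x, x * y, y * y]

P = [Fraction(3, 4), Fraction(5, 6)]
print(height(P))             # height with respect to the identity embedding
print(height(veronese(P)))   # twice as large with respect to the degree 2 embedding
```

The two embeddings correspond to the divisors H and 2H on P1, so the factor of 2 is consistent with the additivity of heights discussed in these notes.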
So we need to study how to map V into a projective space. Given a divisor D, we can
choose a basis $f_0, \dots, f_r$ for L(D), and we can use this to define a rational map $[f_0, \dots, f_r] : V \dashrightarrow \mathbb{P}^r$. We
will get different height functions, one for each divisor D. We will study the variation in
heights obtained this way.
We denote $\ell(D) := \dim L(D)$. Given a basis $f_0, \dots, f_r$ for L(D) where $r = \ell(D) - 1$, we
get a map $\varphi_D = [f_0, \dots, f_r] : V \dashrightarrow \mathbb{P}^r$. Assuming that $\varphi_D$ is an embedding (in particular, a
morphism), we can define the height of a point $P \in V$ by $h_D(P) = h(\varphi_D(P))$.
Definition. A divisor D is said to be very ample if φD is an embedding.
Definition. A divisor D is said to be ample if there exists m ≥ 1 such that mD is very
ample.
Example. On a smooth curve C, D is very ample if deg(D) ≥ 2g + 1 where g is the genus
of C. This is an application of the Riemann-Roch theorem.
Example. On a smooth curve C, D is ample if deg(D) ≥ 1.
Theorem. (Serre) Every divisor D ∈ Div(V ) can be written as D = D1 − D2 where D1
and D2 are very ample.
Idea behind the Proof. Let H be very ample (say a hyperplane section of V ). Then
we can set D1 = mH + D and D2 = mH. For large enough m ∈ N, both D1 and D2 are
very ample, and clearly D = D1 − D2 .
For every $D \in \mathrm{Div}(V)$, choose very ample divisors $D_1$ and $D_2$ such that $D = D_1 - D_2$.
Choose bases for $L(D_1)$ and $L(D_2)$ to get embeddings $\varphi_{D_1} : V \hookrightarrow \mathbb{P}^{\ell(D_1)-1}$ and $\varphi_{D_2} : V \hookrightarrow \mathbb{P}^{\ell(D_2)-1}$.
We can “define” $h_D(P) := h_{D_1}(P) - h_{D_2}(P) = h(\varphi_{D_1}(P)) - h(\varphi_{D_2}(P))$. This depends on
a lot of choices: it depends on the way we decomposed D as $D = D_1 - D_2$, and it depends
on the bases chosen for $L(D_1)$ and $L(D_2)$. We will soon see that these choices can only change the height
function by a bounded amount.
Proposition. a) Suppose that D is very ample. Pick a basis $f_0, f_1, \dots, f_r$ for L(D),
where $r = \ell(D) - 1$. Now pick another basis $f_0', f_1', \dots, f_r'$ for L(D). Let $\varphi_D$ and $\varphi_D'$ be the
corresponding embeddings $V \hookrightarrow \mathbb{P}^r$. Then
\[ h(\varphi_D(P)) = h(\varphi_D'(P)) + O(1) \]
for all $P \in V(\bar{\mathbb{Q}})$.
b) Suppose that D and D′ are very ample. Then D + D′ is very ample, and $h_{D+D'} = h_D + h_{D'} + O(1)$.
Proof. a) We have $\varphi_D = [f_0, \dots, f_r] : V \hookrightarrow \mathbb{P}^r$ and $\varphi_D' = [f_0', \dots, f_r'] : V \hookrightarrow \mathbb{P}^r$. There
exists a change of basis matrix $A = (a_{ij}) \in GL_{r+1}(\bar{\mathbb{Q}})$ such that $f_i = \sum_j a_{ij} f_j'$. Note that
$A : \mathbb{P}^r \to \mathbb{P}^r$ defines a morphism and deg(A) = 1. Since $\varphi_D = A \circ \varphi_D'$, we obtain
\[ h(\varphi_D(P)) = h(A(\varphi_D'(P))) = h(\varphi_D'(P)) + O(1) \]
where O(1) is a constant that does not depend on the point P.


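b) Fix bases $f_0, \dots, f_r$ for L(D) and $f_0', \dots, f_s'$ for L(D′). We get $\varphi_D = [f_0, \dots, f_r] : V \hookrightarrow \mathbb{P}^r$
and $\varphi_{D'} = [f_0', \dots, f_s'] : V \hookrightarrow \mathbb{P}^s$. The products $f_i f_j'$ ($0 \le i \le r$, $0 \le j \le s$) lie in L(D + D′), and we use them as coordinate functions: $\varphi_{D+D'} : V \to \mathbb{P}^{(r+1)(s+1)-1} = \mathbb{P}^{rs+r+s}$ is given by $[\dots, f_i f_j', \dots]$. (These products need not form a basis of L(D + D′) in general; passing to an actual basis changes the height only by O(1).) The following
commutative diagram summarizes all the relevant maps:
\[ \varphi_{D+D'} : \; V \xrightarrow{\;\varphi_D \times \varphi_{D'}\;} \mathbb{P}^r \times \mathbb{P}^s \xrightarrow{\;\text{Segre embedding}\;} \mathbb{P}^{rs+r+s} \]
As a composition of embeddings, $\varphi_{D+D'} : V \hookrightarrow \mathbb{P}^{rs+r+s}$ is also an embedding, i.e. D + D′ is
very ample. Then
\[ h(\varphi_{D+D'}(P)) = h([\dots, f_i(P) f_j'(P), \dots]) \]
Once we fix a number field K, note that
\begin{align*}
H_K([\dots, f_i(P) f_j'(P), \dots]) &= \prod_{v \in M_K} \max_{i,j} \|f_i(P) f_j'(P)\|_v \\
&= \prod_{v \in M_K} \Big( \max_i \|f_i(P)\|_v \cdot \max_j \|f_j'(P)\|_v \Big) \\
&= H_K([\dots, f_i(P), \dots]) \cdot H_K([\dots, f_j'(P), \dots])
\end{align*}
Taking logarithms, we get $h_{D+D'}(P) = h_D(P) + h_{D'}(P) + O(1)$.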
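The multiplicativity of the height under the Segre embedding can be checked numerically over Q, where the multiplicative height of a point is max |x_i| after scaling to coprime integers. The sketch below is only an illustration under that assumption; `primitive`, `H` and `segre` are hypothetical helper names.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def primitive(coords):
    """Scale rational homogeneous coordinates to coprime integers."""
    fracs = [Fraction(c) for c in coords]
    common_den = reduce(lambda a, b: a * b // gcd(a, b),
                        (f.denominator for f in fracs), 1)
    ints = [int(f * common_den) for f in fracs]
    g = reduce(gcd, ints, 0)
    return [n // g for n in ints]

def H(coords):
    """Multiplicative height of a point of P^n(Q)."""
    return max(abs(n) for n in primitive(coords))

def segre(P, Q):
    """Segre embedding P^r x P^s -> P^(rs+r+s): all pairwise products x_i * y_j."""
    return [Fraction(x) * Fraction(y) for x in P for y in Q]

P = [2, -3, 5]
Q = [Fraction(1, 2), Fraction(7, 3)]
print(H(P), H(Q), H(segre(P, Q)))   # the last value is the product of the first two
```

Over Q the identity is exact because the pairwise products of two coprime integer vectors are again coprime, which mirrors the place-by-place factorization of the maximum in the proof.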
Corollary. hD is well-defined up to O(1).
Proof. Suppose $D = D_1 - D_2 = D_1' - D_2'$ are two different decompositions of D where
$D_1, D_2, D_1', D_2'$ are all very ample. We have $D_1 + D_2' = D_1' + D_2$. Thus,
\[ h_{D_1 + D_2'} = h_{D_1' + D_2} + O(1) \]
Since $h_{D_1 + D_2'} = h_{D_1} + h_{D_2'} + O(1)$ and $h_{D_1' + D_2} = h_{D_1'} + h_{D_2} + O(1)$, we get
\[ h_{D_1} + h_{D_2'} = h_{D_1'} + h_{D_2} + O(1) \]
or equivalently,
\[ h_{D_1} - h_{D_2} = h_{D_1'} - h_{D_2'} + O(1) \]
as desired.
Consequence. We have constructed a well-defined function
\[ h : \mathrm{Div}(V) \to \frac{\{\text{functions } V(\bar{\mathbb{Q}}) \to \mathbb{R}\}}{\{\text{bounded functions}\}} \]
that sends $D \mapsto (h_D : V(\bar{\mathbb{Q}}) \to \mathbb{R})$. In fact, this is a group homomorphism, because
$h_{D_1 + D_2} = h_{D_1} + h_{D_2} + O(1)$ as we proved above.

10. Weil Height Machine


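Let V be any non-singular variety. We want to further investigate the map
\[ h : \mathrm{Div}(V) \to \frac{\{\text{functions } V(\bar{\mathbb{Q}}) \to \mathbb{R}\}}{\{\text{bounded functions}\}}, \qquad D \mapsto h_{V,D} \]
where $h_{V,D}$ is a shorthand for the function $h_D : V(\bar{\mathbb{Q}}) \to \mathbb{R}$ constructed in the previous section.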
Theorem. (Weil Height Machine)
(a) Normalization. Let $\varphi : V \hookrightarrow \mathbb{P}^n$ be an embedding, and $H \subseteq \mathbb{P}^n$ be a hyperplane.
Then
\[ h_{V,\varphi^*H}(P) = h(\varphi(P)) + O(1) \]
Here $\varphi^*(H)$ is $H \cap V$, and the height on the right hand side is the usual height on $\mathbb{P}^n$.
(b) Functoriality. Suppose $\varphi : V \to W$ is a morphism, and let $D \in \mathrm{Div}(W)$. Then
\[ h_{V,\varphi^*D}(P) = h_{W,D}(\varphi(P)) + O(1) \]
(c) Additivity. If $D, E \in \mathrm{Div}(V)$, then
\[ h_{V,D+E} = h_{V,D} + h_{V,E} + O(1) \]
(d) Linear Equivalence. Let $D, E \in \mathrm{Div}(V)$ such that $D \sim E$ (linearly equivalent
divisors). Then
\[ h_{V,D} = h_{V,E} + O(1) \]
(e) Positivity. Assume $D \ge 0$, i.e. $D = \sum n_i W_i$ where each $W_i$ is an irreducible subvariety of
V, and $n_i \ge 0$. Then
\[ h_{V,D}(P) \ge O(1) \]
for all P not in the base locus of D. In other words, there exists a constant $C \ge 0$ such that
$h_{V,D}(P) \ge -C$ for all such points P.
(f) Boundedness for Ample Divisors. If D is ample, then for any fixed $B_1, B_2 > 0$,
the set
\[ \{P \in V(\bar{\mathbb{Q}}) : h_{V,D}(P) \le B_1 \text{ and } [\mathbb{Q}(P) : \mathbb{Q}] \le B_2\} \]
is finite.
Base Point Freeness. Before we delve into the proof, we will say a few words about the
base locus of a linear system, as it will come up in the proof a few times (for example, already
in the proof of part (b) below). If D is very ample, then by definition, $[f_0, \dots, f_n] : V \hookrightarrow \mathbb{P}^n$ is
an embedding, where $f_0, \dots, f_n$ is a basis for L(D). For a general divisor D, if we pick a basis
$f_0, \dots, f_r$ for L(D), the associated rational map $\varphi_D = [f_0, \dots, f_r] : V \dashrightarrow \mathbb{P}^r$ might fail to be an
embedding for several reasons. First of all, there might not even be enough sections to begin
with, e.g. nothing prevents the case L(D) = {0}. Secondly, the associated rational map
is not necessarily a morphism, i.e. $f_0, \dots, f_r$ may vanish at a common point. This second
reason for failure is captured by the notion of a base locus. Given a divisor D, the
base locus of D is the indeterminacy locus $I_{\varphi_D}$. Another definition is as follows.
We first define
\[ |D| = \{D' \in \mathrm{Div}(V) : D' \sim D \text{ and } D' \ge 0\} \]
Then the base locus of D is
\[ B_D = \bigcap_{D' \in |D|} \mathrm{Support}(D') \]
Here, $\mathrm{Support}(D') = \bigcup W_i$ where $D' = \sum n_i W_i$ with $n_i \ne 0$. We say that D is base point
free if the base locus $B_D$ is empty, i.e. the associated map $\varphi_D = [f_0, \dots, f_r] : V \to \mathbb{P}^r$ is a
morphism. It turns out that one can define the associated height function for any base point
free divisor by taking a basis for its global sections (it doesn’t have to be very ample!). See
more details on page 186 of the book [HS00].
Proof. (a) Write $H = \{a_0 x_0 + \dots + a_n x_n = 0\}$. Let’s denote $A(\vec{x}) := a_0 x_0 + \dots + a_n x_n$.
A basis for L(H) is given by
\[ \frac{x_0}{A(\vec{x})}, \; \frac{x_1}{A(\vec{x})}, \; \dots, \; \frac{x_n}{A(\vec{x})} \]
These elements restricted to V give a basis for $L(\varphi^*H) = L(V \cap H)$. Thus,
\[ h_{V,\varphi^*H}(P) = h\left(\left[\frac{x_0}{A(\vec{x})}(P), \frac{x_1}{A(\vec{x})}(P), \dots, \frac{x_n}{A(\vec{x})}(P)\right]\right) \]
We cancel out $A(\vec{x})$ in the projective coordinates, which gives
\[ h_{V,\varphi^*H}(P) = h([x_0(P), \dots, x_n(P)]) = h(\varphi(P)) + O(1) \]
(b) We write $D = D_1 - D_2$ where $D_1, D_2$ are very ample. If we can show the same claim
for $D_1, D_2$, i.e. if we can show that
\begin{align*}
h_{V,\varphi^*D_1}(P) &= h_{W,D_1}(\varphi(P)) + O(1) \\
h_{V,\varphi^*D_2}(P) &= h_{W,D_2}(\varphi(P)) + O(1)
\end{align*}
then we would be done. Indeed, after subtracting the second equation from the first, one
obtains
\[ h_{V,\varphi^*D_1}(P) - h_{V,\varphi^*D_2}(P) = h_{W,D_1}(\varphi(P)) - h_{W,D_2}(\varphi(P)) + O(1) \]
or equivalently,
\[ h_{V,\varphi^*D_1 - \varphi^*D_2}(P) = h_{W,D_1 - D_2}(\varphi(P)) + O(1) \]
Since $D = D_1 - D_2$, and $\varphi^*(D_1) - \varphi^*(D_2) = \varphi^*(D_1 - D_2)$, the last displayed equation
becomes
\[ h_{V,\varphi^*D}(P) = h_{W,D}(\varphi(P)) + O(1) \]
as desired. This reduction shows that the conclusion would follow if the claim for part (b) is
demonstrated for $D_1$ and $D_2$. The only difference between D and $D_i$ (for i = 1, 2) is that the $D_i$
are very ample. Therefore, we can assume, without loss of generality, that D is very ample.
Assume D is very ample, i.e. it induces an embedding $\varphi_D : W \hookrightarrow \mathbb{P}^r$. It is not necessarily
true that $\varphi^*(D)$ is very ample, but it is at least base point free. The divisor $\varphi^*(D)$ induces
the map $\varphi_D \circ \varphi : V \to \mathbb{P}^r$. Using part (a) twice, we have
\begin{align*}
h_{V,\varphi^*(D)}(P) &= h(\varphi_D \circ \varphi(P)) + O(1) \\
&= h(\varphi_D(\varphi(P))) + O(1) \\
&= h_{W,D}(\varphi(P)) + O(1)
\end{align*}
(c) We proved this property last time. The key was to use the Segre embedding.
(d) Suppose D ∼ E. Write D = D1 − D2 and E = E1 − E2 where D1 , D2 , E1 , E2
are very ample. The hypothesis D1 − D2 ∼ E1 − E2 implies that D1 + E2 ∼ E1 + D2 .
Notice that D1 + E2 and E1 + D2 are both very ample (as each is a sum of two very ample
divisors). If we can show the same claim for D1 + E2 ∼ E1 + D2 , i.e. if we can show that
hD1 +E2 = hE1 +D2 + O(1), then we would be done, as the last equation can be rearranged to
hD1 −D2 = hE1 −E2 + O(1). So we may assume, without loss of generality, that D and E are
both very ample divisors.
Let f0 , ..., fn be a basis for L(D), so that
hD (P ) = h([f0 (P ), ..., fn (P )])
By definition of D ∼ E, we have D = E + div(g) for some g ∈ k(V )∗ . Note the following
chain of equivalences:
f ∈ L(D) ⇐⇒ div(f ) + D ≥ 0
⇐⇒ div(f ) + div(g) + E ≥ 0
⇐⇒ div(f g) + E ≥ 0
⇐⇒ f g ∈ L(E)
Since f0 , ..., fn is a basis for L(D), the elements f0 g, ..., fn g (multiplication here, not compo-
sition) form a basis for L(E). The last assertion can be justified by observing that one can
multiply a linear dependence relation formally by g −1 , since g ∈ k(V )∗ . Thus,
hE (P ) = h ([f0 g(P ), ..., fn g(P )]) + O(1)
= h ([f0 (P )g(P ), ..., fn (P )g(P )]) + O(1)
= h ([f0 (P ), ..., fn (P )]) + O(1)
= hD (P ) + O(1)
Note that a problematic part could be at the points P where g vanishes or g has poles. In this
case, we can use another g 0 to cover these problematic points. And to cover the problematic
points of g 0 , we can use another g 00 , and so on. The idea is somehow to use different charts
to cover all of V .
(e) Write D = D1 − D2 where D1 , D2 are very ample. Given D ≥ 0, let f0 , ..., fn be a
basis for L(D2 ). Then
hD2 (P ) = h([f0 (P ), ..., fn (P )])
We know D1 − D2 ≥ 0. Also, div(fi ) + D2 ≥ 0 for each i. Adding these two inequalities,
div(fi ) + D1 ≥ 0. So fi ∈ L(D1 ) for each i = 0, 1, ..., n. Note that {f0 , ..., fn } is still linearly
independent when viewed in L(D1 ) but not necessarily a basis for L(D1 ). In any case, we can
further extend it to a basis of L(D1 ) by adding extra elements fn+1 , ..., fm , i.e. {f0 , f1 , ..., fm }
is a basis of L(D1 ). By definition,
hD1 (P ) = h ([f0 (P ), ..., fm (P )])
So
hD (P ) = hD1 (P ) − hD2 (P )
= h([f0 (P ), ..., fm (P )]) − h([f0 (P ), ..., fn (P )]) + O(1)
Since the entries of [f0 (P ), ..., fm (P )] include the entries of [f0 (P ), ..., fn (P )], it is clear that
h([f0 (P ), ..., fm (P )]) ≥ h([f0 (P ), ..., fn (P )]), and so the preceding equation implies that
hD (P ) ≥ O(1)
To cover all points P of $V \setminus B_D$, we repeat the argument for each $D' \in |D|$.
Indeed, given any point $P \in V \setminus B_D$, there exists some $D' \in |D|$ such that P is not in the
support of D′. When we write $D' = D_1 - D_2$, we can also assume that P is not in the
support of $D_1$ and $D_2$. Consequently, the basis elements $f_0, \dots, f_n \in L(D_1)$ do not all vanish
at P, because if they all did, then all elements of $L(D_1)$ would vanish at P, which would
mean that P is in the support of D.
(f) Given that D is ample, mD is very ample for some $m \in \mathbb{N}$. After choosing a basis
$f_0, \dots, f_n$ for L(mD), we get an embedding
\[ \varphi_{mD} = [f_0, \dots, f_n] : V \hookrightarrow \mathbb{P}^n \]
Since $mD \sim \varphi_{mD}^* H$ for a hyperplane $H \subseteq \mathbb{P}^n$, applying parts (d) and (a),
\begin{align*}
h_{V,mD}(P) &= h_{V,\varphi_{mD}^* H}(P) + O(1) \\
&= h(\varphi_{mD}(P)) + O(1)
\end{align*}
On the other hand $h_{V,mD}(P) = m\, h_{V,D}(P) + O(1)$ by additivity, so combining these two equations,
\[ m\, h_{V,D}(P) + O(1) = h(\varphi_{mD}(P)) + O(1) \]
Thus, looking at the set of points P with bounded degree and a bounded value of $h_{V,D}(P)$
is the same as looking at the set of points P with bounded degree and a bounded value of
$h(\varphi_{mD}(P))$. The latter set was shown to be finite earlier. In essence, we used
functoriality to reduce the problem from V to the case of projective space, which has
been dealt with before.

11. Canonical Height


Suppose that $f : \mathbb{P}^N \to \mathbb{P}^N$ is a morphism of degree d. Then
\[ h(f(P)) = d \cdot h(P) + O_f(1) \]
The error term $O_f(1)$ can be annoying sometimes, so we would like to get rid of it somehow.
If we iterate f, and repeatedly use the above formula, we arrive at
\[ h(f^n(P)) = d^n \cdot h(P) + O_{f^n}(1) \]
At this point, one is tempted to divide both sides by $d^n$, and let $n \to \infty$. The problem is
that $O_{f^n}(1)$ can grow as a function of d and n as well. However, if one carefully carries
out the iteration process, one can get
\[ h(f^n(P)) = d^n \cdot h(P) + O_f(d^n) \]
If we try the same trick of dividing both sides by $d^n$, we would get
\[ \frac{h(f^n(P))}{d^n} = h(P) + \underbrace{\frac{O_f(d^n)}{d^n}}_{\text{bounded}} \]
This looks more promising. However, it is still not clear what happens when we let $n \to \infty$.
Just because the error term is bounded, it does not mean that it converges as $n \to \infty$.
Theorem. Let V be a non-singular variety, and $f : V \to V$ a morphism. Assume that
$D \in \mathrm{Div}(V)$ satisfies $f^*(D) \sim \lambda D$ for some $\lambda > 1$. Then
(a) The limit
\[ \hat{h}_{f,D}(P) := \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(P)) \]
exists. We say that $\hat{h}_{f,D}(P)$ is the canonical height associated to f and D.
(b) $\hat{h}_{f,D} = h_D + O(1)$
(c) $\hat{h}_{f,D}(f(P)) = \lambda \cdot \hat{h}_{f,D}(P)$
(d) Properties (b) and (c) determine $\hat{h}_{f,D}$ uniquely.
Proof. (a) Since f is a morphism, by the functoriality part of the Weil Height Machine,
\[ h_D(f(P)) = h_{f^*D}(P) + O(1) \]
Since $f^*D \sim \lambda D$,
\begin{align*}
h_D(f(P)) &= h_{\lambda D}(P) + O(1) \\
&= \lambda \cdot h_D(P) + O(1)
\end{align*}
by linearity in the last step. We will use the identity $h_D(f(P)) = \lambda \cdot h_D(P) + O(1)$ to show
that the sequence $\frac{1}{\lambda^n} h_D(f^n(P))$ is Cauchy, hence converges. Replacing P by $f^{i-1}(P)$, we
have
\[ h_D(f^i(P)) - \lambda h_D(f^{i-1}(P)) = O(1) \]
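for each i. Thus, for n > m, writing the difference as a telescoping sum,
\begin{align*}
\left| \frac{1}{\lambda^n} h_D(f^n(P)) - \frac{1}{\lambda^m} h_D(f^m(P)) \right|
&= \left| \sum_{i=m+1}^{n} \left( \frac{1}{\lambda^i} h_D(f^i(P)) - \frac{1}{\lambda^{i-1}} h_D(f^{i-1}(P)) \right) \right| \\
&= \left| \sum_{i=m+1}^{n} \frac{1}{\lambda^i} \Big( h_D(f^i(P)) - \lambda \cdot h_D(f^{i-1}(P)) \Big) \right| \\
&\le \sum_{i=m+1}^{n} \frac{1}{\lambda^i} O(1) \le \sum_{i=m+1}^{\infty} \frac{1}{\lambda^i} O(1) = \frac{1/\lambda^{m+1}}{1 - 1/\lambda} O(1) \\
&= \frac{1}{\lambda^m} \cdot \frac{1}{\lambda - 1} \cdot O(1) \longrightarrow 0 \quad \text{as } n \ge m \to \infty
\end{align*}
We conclude that $\hat{h}_{f,D}(P)$ is well-defined.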


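(b) Take m = 0 in the above computation. Then
\[ \left| \frac{1}{\lambda^n} h_D(f^n(P)) - h_D(P) \right| \le \frac{1}{\lambda - 1} \cdot O(1) \]
Letting $n \to \infty$, we get $\left| \hat{h}_{f,D}(P) - h_D(P) \right| \le \frac{1}{\lambda - 1} \cdot O(1)$, so in particular, $\hat{h}_{f,D}(P) = h_D(P) + O(1)$.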
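(c) By definition,
\begin{align*}
\hat{h}_{f,D}(f(P)) &= \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(f(P))) \\
&= \lim_{n \to \infty} \frac{\lambda}{\lambda^{n+1}} h_D(f^{n+1}(P)) \\
&= \lambda \lim_{n \to \infty} \frac{1}{\lambda^{n+1}} h_D(f^{n+1}(P)) = \lambda \cdot \hat{h}_{f,D}(P)
\end{align*}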

(d) Let $\hat{h}'_{f,D}$ be a function satisfying (b) and (c). Define $g(P) := \hat{h}_{f,D}(P) - \hat{h}'_{f,D}(P)$. Part
(b) implies that g = O(1). Part (c) says that
\[ g(f(P)) = \lambda \cdot g(P) \;\Rightarrow\; \underbrace{g(f^n(P))}_{\text{bounded}} = \underbrace{\lambda^n}_{\to \infty} g(P) \;\Rightarrow\; g(P) = 0 \]
So g = 0, and $\hat{h}'_{f,D} = \hat{h}_{f,D}$, which proves the desired uniqueness.


Example. Consider a morphism $f : \mathbb{P}^n \to \mathbb{P}^n$ of degree d = deg(f) ≥ 2. Then $f^*H \sim dH$ for a hyperplane H,
and we can associate the canonical height $\hat{h}_{f,H}$.
In particular, consider the case of the d-th power map, that is, $f : \mathbb{P}^n \to \mathbb{P}^n$ is given by
$f([x_0, \dots, x_n]) = [x_0^d, \dots, x_n^d]$. In this case, the canonical height $\hat{h}_{f,H}$ ends up agreeing with the
usual Weil height:
\[ \hat{h}_{f,H}(P) = h(P) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K} \log \max_i \|x_i\|_v \]
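The limit defining the canonical height can be approximated numerically. The sketch below works on P1 over Q with the maps f(t) = t² + c (degree 2, so λ = 2), an assumption made only for this illustration; `canonical_height_approx` is a hypothetical helper name. For c = 0 the approximations equal the Weil height at every stage, in line with the example above, while for c = −1 they settle down quickly.

```python
from fractions import Fraction
from math import log

def height(x):
    """Logarithmic height of a rational number x = a/b in lowest terms."""
    return log(max(abs(x.numerator), abs(x.denominator)))

def canonical_height_approx(x, c, n):
    """The n-th approximation h(f^n(x)) / 2^n for f(t) = t^2 + c."""
    x = Fraction(x)
    for _ in range(n):
        x = x * x + c
    return height(x) / 2 ** n

# Squaring map: every approximation equals h(3/2) = log 3.
print([canonical_height_approx(Fraction(3, 2), 0, n) for n in range(4)])

# f(t) = t^2 - 1 starting from 2: the sequence stabilizes after a few steps.
print([canonical_height_approx(2, -1, n) for n in range(1, 9)])

# 0 is periodic for t^2 - 1 (0 -> -1 -> 0), so its canonical height is 0.
print(canonical_height_approx(0, -1, 9))
```

Exact rational arithmetic is used for the iterates, so the only floating point step is the final logarithm; the number of digits of the iterates doubles at each step, which limits n in practice.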

Recall the following conjecture:



Lehmer’s Conjecture. There exists an absolute constant C > 0 such that if $\alpha \in \bar{\mathbb{Q}}^*$ is
not a root of unity, then
\[ h(\alpha) \ge \frac{C}{[\mathbb{Q}(\alpha) : \mathbb{Q}]} \]
As context for the conjecture, recall that if α is a root of unity, then h(α) = 0. In
fact, the converse is also true for nonzero α, the proof of which has appeared earlier in the notes. We will
now prove a generalization of this result (letting f be the d-th power map $x \mapsto x^d$ in the
statement will recover the results about roots of unity):
Proposition. Let $f : V \to V$ be a morphism and $D \in \mathrm{Div}(V)$ such that $f^*D \sim \lambda D$
where $\lambda > 1$. Suppose that D is ample. Then $\hat{h}_{f,D}(P) = 0 \Leftrightarrow$ P is preperiodic for f.
Proof. (⇐) Suppose P is preperiodic for f. Then the set $\{f^n(P)\}_{n \in \mathbb{N}}$ is finite, and so
$\{h_D(f^n(P))\}_{n \in \mathbb{N}}$ is finite, hence bounded, which tells us that $\hat{h}_{f,D}(P) = \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(P)) = 0$.
(⇒) If $\hat{h}_{f,D}(P) = 0$, then the property $\hat{h}_{f,D}(f(P)) = \lambda \hat{h}_{f,D}(P)$ implies that $\hat{h}_{f,D}(f^n(P)) = 0$
for each $n \in \mathbb{N}$. Since $\hat{h}_{f,D} = h_D + O(1)$, the set $\{h_D(f^n(P))\}_{n \in \mathbb{N}}$ is bounded. Now, D
is ample and the points $f^n(P)$ all live in some fixed number field. By the boundedness property of the
Weil Height Machine, $\{f^n(P)\}_{n \in \mathbb{N}}$ is finite. Therefore, P is preperiodic for f.
Note that Northcott’s theorem on preperiodic points is a corollary of the proposition above in the special case
of a morphism $f : \mathbb{P}^n \to \mathbb{P}^n$ of degree d ≥ 2. Indeed, for a number field K, the preperiodic
points $P \in \mathbb{P}^n(K)$ of f are exactly the points with $\hat{h}_{f,H}(P) = 0$, so their heights
h(P) are bounded. But there are only finitely many points in $\mathbb{P}^n(K)$ of bounded height, so
there are only finitely many preperiodic points of f defined over K.
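As an illustration of this finiteness, one can search for rational preperiodic points of f(t) = t² + c among rationals of small height. The sketch below is a heuristic under stated assumptions: an orbit is declared non-preperiodic once the numerator or denominator exceeds a cutoff (`bound`), and any orbit still undecided after `max_steps` iterations is also treated as non-preperiodic. Both parameters and the function names are hypothetical choices made for this example.

```python
from fractions import Fraction
from math import gcd

def is_preperiodic(x, c, max_steps=50, bound=10**6):
    """Heuristic test: does the orbit of x under t -> t^2 + c revisit a point?"""
    x = Fraction(x)
    seen = set()
    for _ in range(max_steps):
        if x in seen:
            return True                  # the orbit has entered a cycle
        seen.add(x)
        if max(abs(x.numerator), x.denominator) > bound:
            return False                 # height too large: treated as escaping
        x = x * x + c
    return False                         # undecided orbits are treated as not preperiodic

def rational_preperiodic_points(c, B):
    """Preperiodic candidates a/b with |a| <= B and 1 <= b <= B."""
    return {Fraction(a, b)
            for a in range(-B, B + 1)
            for b in range(1, B + 1)
            if gcd(a, b) == 1 and is_preperiodic(Fraction(a, b), c)}

print(sorted(rational_preperiodic_points(-1, 10)))   # the points -1, 0, 1
```

For c = −1 the search returns −1, 0 and 1: indeed 1 ↦ 0 ↦ −1 ↦ 0, while any non-integer or any integer of absolute value at least 2 has iterates whose height grows without bound.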
12. Abelian Varieties
Definition. An abelian variety is a group in the category of projective varieties (where
the morphisms are regular morphisms, as opposed to just rational maps).
In other words, A ⊆ Pn is an abelian variety if there exist morphisms µ : A × A → A,
ι : A → A and an identity point e ∈ A satisfying the group axioms, i.e.
µ(e, x) = µ(x, e) = x
µ(x, ι(x)) = µ(ι(x), x) = e
µ(µ(x, y), z) = µ(x, µ(y, z))
for all x, y, z ∈ A. It turns out that the group structure is always abelian. We will see the
proof of this result soon.
Example. An elliptic curve (a smooth genus 1 curve) is an example of an abelian variety
of dimension 1. Assuming char(k) ≠ 2, 3, one can realize the elliptic curve as a plane
curve $A \subseteq \mathbb{P}^2$ given by the equation $y^2 z = x^3 + A x z^2 + B z^3$ for some A, B ∈ k satisfying
$4A^3 + 27B^2 \ne 0$.
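For a concrete feel of the group law in dimension 1, here is a minimal sketch of the standard chord-and-tangent addition on the affine model y² = x³ + Ax + B over Q, with `None` standing for the point at infinity (the identity). The function names are hypothetical; the sample curve y² = x³ − 2 and the point (3, 5) are the ones mentioned at the start of these notes.

```python
from fractions import Fraction

O = None  # the point at infinity, i.e. the identity element

def on_curve(P, A, B):
    """Check that P lies on y^2 = x^3 + A x + B."""
    if P is O:
        return True
    x, y = P
    return y * y == x ** 3 + A * x + B

def add(P, Q, A):
    """Chord-and-tangent addition of two points on y^2 = x^3 + A x + B."""
    if P is O:
        return Q
    if Q is O:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 == -y2:
        return O                                  # Q = -P (includes 2-torsion, y = 0)
    if P == Q:
        slope = (3 * x1 * x1 + A) / (2 * y1)      # tangent line at P
    else:
        slope = (y2 - y1) / (x2 - x1)             # chord through P and Q
    x3 = slope * slope - x1 - x2
    y3 = slope * (x1 - x3) - y1
    return (x3, y3)

A, B = 0, -2
P = (Fraction(3), Fraction(5))
two_P = add(P, P, A)
print(two_P, on_curve(two_P, A, B))
print(add(P, two_P, A) == add(two_P, P, A))       # the group law is commutative
```

Exact rational arithmetic makes identities such as commutativity and associativity checkable on examples, although of course this does not replace the rigidity argument given in these notes.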
Remark. It turns out every abelian variety A of dimension 1 must be an elliptic curve.
Indeed, if dim(A) = 1 then A is a smooth curve of genus g ≥ 0. For g ≥ 2, it is a non-
trivial result that the curve A has only finitely many automorphisms, and hence cannot
be an abelian variety; indeed, in an abelian variety A, each point P ∈ A gives rise to a
translation-by-P map A → A given by Q 7→ P + Q which is clearly an automorphism. As
we are working with algebraically closed fields, there are infinitely many k-points P on A,
which implies that every abelian variety admits infinitely many automorphisms. To rule out
the case g = 0, observe that in this case A = P1 and suppose that µ : P1 × P1 → P1 is the
multiplication map of some group structure. One can show that µ must necessarily factor
through one of the projections, i.e. say µ = f ◦ π1 where π1 : P1 × P1 → P1 is projection
onto the first coordinate, and f : P1 → P1 is some morphism. But then for each point P ,
we have P = µ(e, P ) = f (e) which is a contradiction, as f (e) is a single point, while P can
be chosen to be any point. Thus, g = 1 is the only permissible value of the genus for an abelian
variety of dimension 1.
The key result that is needed to prove that abelian varieties are abelian groups is the
so-called rigidity lemma.
Rigidity Lemma. Let X, Y, Z be varieties such that X is projective. Let f : X × Y → Z
be a morphism. Suppose that there exists some y0 ∈ Y such that f (X × {y0 }) is a single
point. Then f (X × {y}) is a single point for all y ∈ Y .
Proof. Since X is projective, X is proper, so the projection map $p : X \times Y \to Y$ is
a closed map. By hypothesis, $f(X \times \{y_0\}) = \{z_0\}$ for some $z_0 \in Z$. Let $U \subseteq Z$ be
an open affine neighborhood of $z_0$. Look at $W = p(f^{-1}(Z \setminus U))$, which is a closed subset
of Y. We claim that $y_0 \notin W$ (so that $W \ne Y$). Indeed, if $y_0 \in W$, then $(x, y_0) \in f^{-1}(Z \setminus U)$
for some $x \in X$, so that $f(x, y_0) \in Z \setminus U$, which contradicts $f(x, y_0) = z_0 \in U$.
Thus, $Y \setminus W$ is a non-empty open, hence dense, subset of Y. Note that for each $y \in Y \setminus W$, we must have
$f(X \times \{y\}) \subseteq U$. Indeed, if $f(x, y) \in Z \setminus U$ for some $x \in X$, then $(x, y) \in f^{-1}(Z \setminus U)$ so that
$y \in p(f^{-1}(Z \setminus U)) = W$, contradicting the choice of y. We conclude that $f(X \times \{y\}) \subseteq U$
holds for each $y \in Y \setminus W$. But U is affine, while $f(X \times \{y\})$ is projective (because $X \times \{y\}$
is projective, and f is a morphism). This forces $f(X \times \{y\})$ to be a single point for each
$y \in Y \setminus W$.
Now fix $x_0 \in X$ and define the map
\[ g : X \times Y \to Z, \qquad (x, y) \mapsto f(x_0, y) \]
As $f(X \times \{y\})$ is a single point for each $y \in Y \setminus W$, it follows that $f(x, y) = f(x_0, y) = g(x, y)$
for each $x \in X$ and $y \in Y \setminus W$. So f and g agree on $X \times (Y \setminus W)$. Since $X \times (Y \setminus W)$
is an open dense subset of $X \times Y$, and Z is separated, it follows that f and g agree
on $X \times Y$. Thus, for each $y \in Y$, we have $f(x, y) = f(x_0, y)$ for each $x \in X$, so that
$f(X \times \{y\}) = \{f(x_0, y)\}$ is a single point.
Note: The rigidity lemma is true for all characteristics.
Corollary.
1) An abelian variety is an abelian group.
2) A morphism of abelian varieties is a composition of a homomorphism and a translation.
Proof. 1) Let ∗ denote the group operation. Look at $f : A \times A \to A$ given by $f(x, y) = x * y * x^{-1} * y^{-1}$. For each $x \in A$, we have $f(x, e) = x * e * x^{-1} * e^{-1} = e$. So, we have
$f(A \times \{e\}) = \{e\}$, which is a single point. By the rigidity lemma, $f(A \times \{y\})$ is a single point
for each $y \in A$. Since $f(e, y) = e$, this single point must be e again, so $f(A \times \{y\}) = \{e\}$ for
each $y \in A$, i.e. for each $x, y \in A$, we have $x * y * x^{-1} * y^{-1} = e$, or equivalently, $x * y = y * x$.
2) The proof is similar, and also uses the rigidity lemma. See the book for a proof.
Theorem. Let A be an abelian variety over $\mathbb{C}$, and denote g = dim(A). There is a
holomorphic surjective homomorphism $\mathbb{C}^g \twoheadrightarrow A(\mathbb{C})$ whose kernel is a lattice $L \subset \mathbb{C}^g$; in particular,
$L \cong \mathbb{Z}^{2g}$ as a group.
The space $\mathbb{C}^g$ is the universal covering space. The theorem above produces a complex
analytic uniformization $\varphi : \mathbb{C}^g / L \xrightarrow{\;\sim\;} A(\mathbb{C})$.
Note. Not every complex torus $\mathbb{C}^g / L$ is an abelian variety (but every one is for g = 1, i.e. elliptic curves).
Notation. Let A be an abelian variety defined over K. For each $m \in \mathbb{N}$, we define
the map $[m] : A \to A$ given by $P \mapsto mP$. This is a homomorphism. We also define
$A[m] := \ker[m] = \{P \in A(\bar{K}) : [m]P = 0\}$.
Proposition. $G_K = \mathrm{Gal}(\bar{K}/K)$ acts on A[m].
Proof. Since A is defined over K, we have $\sigma(P_1 + P_2) = \sigma(P_1) + \sigma(P_2)$, and $\sigma(-Q) = -\sigma(Q)$ for every $\sigma \in G_K$. By induction, $\sigma([m]P) = [m]\sigma(P)$. So if $\sigma \in G_K$ and $P \in A[m]$,
then $[m]\sigma(P) = \sigma([m]P) = \sigma(0) = 0$, so $\sigma(P) \in A[m]$.
Analogy. It is useful to consider various analogies between an abelian variety (a projective
algebraic group) and $\mathbb{G}_m$ (an affine algebraic group). The analog of A[m] is $\mathbb{G}_m[m] = \{\alpha \in \bar{K} : \alpha^m = 1\} = \mu_m$, i.e. the m-th roots of 1. This object is well-studied. For example, the
extension $\mathbb{Q}(\mu_m)/\mathbb{Q}$ is abelian, has degree φ(m), and is unramified at every prime p ∤ m.
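Theorem. Assume char(K) = 0 and g = dim(A). Then $A[m] \cong (\mathbb{Z}/m\mathbb{Z})^{2g}$ as abstract
groups.
Proof. (for K ⊆ C). We have $A(\bar{K}) \subseteq A(\mathbb{C})$. So
\begin{align*}
A[m] &= \{P \in A(\mathbb{C}) : [m]P = 0\} = \text{points of order dividing } m \text{ in } \mathbb{C}^g / L \\
&\cong (\mathbb{R}^{2g} / \mathbb{Z}^{2g})[m] = \left( \tfrac{1}{m}\mathbb{Z} / \mathbb{Z} \right)^{2g} \cong (\mathbb{Z}/m\mathbb{Z})^{2g}
\end{align*}
Remark. If char(k) = p > 0, then $A[p] \cong (\mathbb{Z}/p\mathbb{Z})^r$ for some 0 ≤ r ≤ g, with r = g being
the most common case.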
Representation of a Galois group. After choosing a basis, we get a non-canonical
isomorphism $A[m] \cong (\mathbb{Z}/m\mathbb{Z})^{2g}$. The action of $G_K$ on A[m] gives a representation
\[ \rho_m : G_K \to \mathrm{Aut}(A[m]) \cong GL_{2g}(\mathbb{Z}/m\mathbb{Z}) \]
Let K(A[m]) be the field generated over K by the coordinates of all the points P ∈ A[m]. Consider the subgroup
$H = \mathrm{Gal}(\bar{K}/K(A[m]))$ of $G_K = \mathrm{Gal}(\bar{K}/K)$. For each σ ∈ H, by definition σ(P) = P for
every P ∈ A[m], so $\rho_m(\sigma)$ is the identity map in Aut(A[m]), i.e. σ is in the kernel of $\rho_m$. Thus,
$H \subseteq \ker(\rho_m)$, meaning that $\rho_m$ factors through $G_K \to G_K/H$. By the Fundamental Theorem
of Galois Theory, we have an isomorphism $G_K/H \cong \mathrm{Gal}(K(A[m])/K)$. The situation can
be described by a commutative diagram, in which $\rho_m$ is the composite
\[ \rho_m : \; G_K \longrightarrow \mathrm{Gal}(K(A[m])/K) \longrightarrow GL_{2g}(\mathbb{Z}/m\mathbb{Z}) \]

13. Mordell-Weil Theorem


Let K be a number field, and let A be an abelian variety defined over K. We can consider the set A(K)
which consists of all the K-points of A. This is naturally a group.
Mordell-Weil Theorem. A(K) is a finitely generated abelian group.
The theorem is proved by first proving a weaker version.
Weak Mordell-Weil Theorem. For each m ≥ 2, the quotient A(K)/mA(K) is finite.
Note that weak Mordell-Weil does not immediately imply the full Mordell-Weil. For
example, $G = \mathbb{Q}/\mathbb{Z}$ is an example of an abelian group such that G/mG = 0 for each m ≥ 2,
but G is not finitely generated. The following lemma shows that it is enough to prove the Weak
Mordell-Weil theorem after a finite extension of fields.
Lemma. Let L/K be a finite extension. Then the kernel of
\[ A(K)/mA(K) \to A(L)/mA(L) \]
is finite.
Proof. Let Φ be the kernel of this map, so
\[ \Phi = \frac{A(K) \cap mA(L)}{mA(K)} \]
Let $\bar{P} = P \pmod{mA(K)} \in \Phi$ with $P \in A(K)$. Then $P = mQ_P$ for some $Q_P \in A(L)$. By
replacing L by its Galois closure, we can assume that L/K is Galois. Define a map
\[ f_P : G_{L/K} \to A(L), \qquad \sigma \mapsto \sigma(Q_P) - Q_P \]
where we use $G_{L/K}$ as a shorthand for Gal(L/K). Note that $\sigma(Q_P) - Q_P$ is killed by m.
Indeed, $[m](\sigma(Q_P) - Q_P) = [m]\sigma(Q_P) - [m]Q_P = \sigma([m]Q_P) - [m]Q_P = \sigma(P) - P = 0$ as P
is defined over K. Thus, the target of $f_P$ can be replaced with A[m], so we can view $f_P$ as
a set map $G_{L/K} \to A[m]$. This association leads to a map
\[ A(K) \cap mA(L) \to \underbrace{\mathrm{Hom}_{\mathrm{Set}}(G_{L/K}, A[m])}_{\text{finite, as } G_{L/K} \text{ and } A[m] \text{ are finite}}, \qquad P \mapsto f_P \]
Claim. If $f_P = f_{P'}$, then $P - P' \in mA(K)$.
Note that the claim implies the result, since it gives an injective map
\[ \frac{A(K) \cap mA(L)}{mA(K)} \hookrightarrow \text{finite set} \]
Proof of the Claim. If $f_P = f_{P'}$, then $\sigma(Q_P) - Q_P = \sigma(Q_{P'}) - Q_{P'}$ for every $\sigma \in G_{L/K}$.
So, $\sigma(Q_P - Q_{P'}) = Q_P - Q_{P'}$ for every $\sigma \in G_{L/K}$, i.e. $Q_P - Q_{P'} \in A(K)$. Then $P - P' = m(Q_P - Q_{P'}) \in mA(K)$.
This completes the proof of the lemma.
Therefore, to prove weak Mordell-Weil, we may assume that $A[m] \subset A(K)$ and $\mu_m \subseteq K$.
Let’s recall the Kummer sequence as an analogy:
\[ 1 \longrightarrow \mu_m \longrightarrow \bar{K}^* \xrightarrow{\;x \mapsto x^m\;} \bar{K}^* \longrightarrow 1 \]
Taking group cohomology, we get
\[ 1 \longrightarrow \mu_m \cap K \longrightarrow K^* \xrightarrow{\;x \mapsto x^m\;} K^* \longrightarrow H^1(G_K, \mu_m) \longrightarrow H^1(G_K, \bar{K}^*) = 0 \]
where the last term is zero by Hilbert’s Theorem 90. Thus, we have an isomorphism
$K^*/(K^*)^m \cong H^1(G_K, \mu_m)$. This isomorphism is achieved by sending $a \mapsto \left(\sigma \mapsto \frac{\sigma(\alpha)}{\alpha}\right)$ where
$\alpha = \sqrt[m]{a}$. The point is that often $\alpha \notin K$, so the cocycle $\left(\sigma \mapsto \frac{\sigma(\alpha)}{\alpha}\right)$ is not a coboundary in
general. If K contained an m-th root of each of its elements, then each cocycle would be a coboundary, in
which case $H^1(G_K, \mu_m)$ would be zero.
.
Similarly, we consider
\[ 0 \to A[m] \longrightarrow A(\bar{K}) \xrightarrow{\;m\;} A(\bar{K}) \longrightarrow 0 \]
Taking group cohomology, we obtain
\[ 0 \to A[m] \cap A(K) \longrightarrow A(K) \xrightarrow{\;m\;} A(K) \longrightarrow H^1(G_K, A[m]) \longrightarrow H^1(G_K, A(\bar{K})) \longrightarrow \cdots \]
Unfortunately, we don’t have a version of Hilbert Theorem 90 for abelian varieties, so
$H^1(G_K, A(\bar{K}))$ is not necessarily zero. Since we are assuming $A[m] \subset A(K)$, the connecting
homomorphism $A(K) \to H^1(G_K, A[m])$ induces an injection
\[ \delta : A(K)/mA(K) \hookrightarrow H^1(G_K, A[m]) = \mathrm{Hom}(G_K, A[m]) \]
The reason for the last equality is that $A[m] \subset A(K)$, and so $G_K$ acts trivially on A[m]. It
is well-known that if G acts trivially on A, then the first non-trivial group cohomology is
just the set of homomorphisms, i.e. we get usual homomorphisms instead of twisted ones
(the coboundaries $g \mapsto ga - a$ all vanish),
\begin{align*}
H^1(G, A) &:= \{f : G \to A : f(gh) = g f(h) + f(g) \;\; \forall g, h \in G\} \\
&= \{f : G \to A : f(gh) = f(h) + f(g) \;\; \forall g, h \in G\} = \mathrm{Hom}(G, A)
\end{align*}
whenever G acts trivially on A. Let’s write down the map δ explicitly. It is analogous to the
Kummer sequence.
\[ \delta : A(K)/mA(K) \hookrightarrow \mathrm{Hom}(G_K, A[m]), \qquad (P \bmod mA(K)) \mapsto (\sigma \mapsto \sigma(Q_P) - Q_P \text{ where } [m]Q_P = P) \]
Now, the issue is that $\mathrm{Hom}(G_K, A[m])$ is not finite in general. The trick is to pass to an
appropriate subextension $K \subseteq L \subseteq \bar{K}$, namely
\[ L := K([m]^{-1}A(K)) = \prod_{\substack{Q \in A(\bar{K}) \\ mQ \in A(K)}} K(Q) \]
where the product symbol is used to indicate that we are taking the compositum over all such
fields K(Q). The point is that $\delta(P)(\sigma) = \sigma(Q_P) - Q_P$ where $Q_P \in [m]^{-1}P$ (the “m-th roots” of
P). If σ fixes all of L, then $\delta(P)(\sigma) = 0$. In other words, $\mathrm{Gal}(\bar{K}/L)$ is contained in the kernel
of $\delta(P) : G_K \to A[m]$. Since $G_K / \mathrm{Gal}(\bar{K}/L) = \mathrm{Gal}(\bar{K}/K) / \mathrm{Gal}(\bar{K}/L) \cong \mathrm{Gal}(L/K)$, we see
that δ(P) factors through $G_{L/K} := \mathrm{Gal}(L/K)$, i.e. $\delta(P) : G_{L/K} \to A[m]$. Consequently, we
obtain an injection
\[ A(K)/mA(K) \hookrightarrow \mathrm{Hom}(G_{L/K}, A[m]) \]
The advantage is that $\mathrm{Hom}(G_{L/K}, A[m])$ is a lot smaller than $\mathrm{Hom}(G_K, A[m])$. Our goal is to
show that $\mathrm{Hom}(G_{L/K}, A[m])$ is finite. It suffices to establish that L/K is a finite extension.
Goal. We want to prove that [L : K] < ∞, in which case $\#G_{L/K} < \infty$ so that
$\#\mathrm{Hom}(G_{L/K}, A[m]) < \infty$, which would imply that A(K)/mA(K) is finite, thus proving the
Weak Mordell-Weil Theorem.
We will proceed by showing the following steps.
Step 1. L/K is abelian of exponent m.
Step 2. There are only finitely many primes p of K which ramify in L.
Step 3. Step 1 + Step 2 ⇒ L/K is finite.
13.1. Step 1. We have a pairing:
\[ A(K) \times G_K \to A[m], \qquad (P, \sigma) \mapsto \sigma(Q_P) - Q_P \]
where $Q_P \in [m]^{-1}P$. We first need to check the map is well-defined, i.e. does not depend
on the choice of $Q_P$. Indeed, suppose that $Q_P$ and $Q_P'$ are both elements of $[m]^{-1}P$, i.e.
$[m]Q_P = P$ and $[m]Q_P' = P$. We need to show that $\sigma(Q_P) - Q_P = \sigma(Q_P') - Q_P'$. Note that
$[m](Q_P - Q_P') = 0$ so that $Q_P - Q_P' \in A[m] \subset A(K)$ by assumption. We have
\[ \sigma(Q_P) - Q_P - (\sigma(Q_P') - Q_P') = \sigma(Q_P - Q_P') - (Q_P - Q_P') = 0 \]
as desired. Note that the assignment given here is “bilinear” (a group homomorphism once
you fix either of the entries). Given P, P′, we have $[m](Q_P + Q_{P'}) = P + P'$, so that we may take
$Q_{P+P'} = Q_P + Q_{P'}$. Thus,
\[ (P + P', \sigma) \mapsto \sigma(Q_{P+P'}) - Q_{P+P'} = \sigma(Q_P) + \sigma(Q_{P'}) - Q_P - Q_{P'} \]
which is the sum of the images of (P, σ) and (P′, σ). On the other hand, given $\sigma, \tau \in G_K$, we have
\begin{align*}
(P, \sigma \circ \tau) \mapsto \sigma(\tau(Q_P)) - Q_P &= \sigma(\tau(Q_P)) - \tau(Q_P) + \tau(Q_P) - Q_P \\
&= \sigma(Q_P) - Q_P + \tau(Q_P) - Q_P
\end{align*}
which is the sum of the images of (P, σ) and (P, τ).
We should justify the last equality: σ(τ (QP )) − τ (QP ) = σ(QP ) − QP . This is equivalent to
showing that σ(τ (QP )) − σ(QP ) = τ (QP ) − QP , which is true because τ (QP ) − QP ∈ A[m] ⊂
A(K), so σ must fix this element, i.e. σ(τ (QP ) − QP ) = τ (QP ) − QP .
Let’s try to understand what we need to quotient out so that the map $A(K) \times G_K \to A[m]$
becomes non-degenerate. First, let’s look for the points $P \in A(K)$ such that $\sigma \mapsto \sigma(Q_P) - Q_P$
is the zero map,
\begin{align*}
\{P \in A(K) : \sigma(Q_P) = Q_P \;\forall \sigma \in G_K \text{ where } mQ_P = P\} &= \{P : Q_P \in A(K) \text{ where } mQ_P = P\} \\
&= mA(K)
\end{align*}
So, to get non-degeneracy in the first component, we need to replace A(K) with A(K)/mA(K).
Similarly, let’s look for the elements $\sigma \in G_K$ such that $P \mapsto \sigma(Q_P) - Q_P$ is the zero map,
\[ \{\sigma \in G_K : \sigma(Q_P) = Q_P \;\forall P \in A(K)\} = \{\sigma \in G_K : \sigma \text{ fixes } L\} = \mathrm{Gal}(\bar{K}/L) \]
because L was defined precisely to be the field generated by the elements $Q_P$ from $[m]^{-1}(P)$
as P varies in A(K). To get non-degeneracy in the second component, we need to replace
$G_K = \mathrm{Gal}(\bar{K}/K)$ with the quotient $\mathrm{Gal}(\bar{K}/K)/\mathrm{Gal}(\bar{K}/L) \cong \mathrm{Gal}(L/K)$. We conclude that
the pairing
\[ A(K)/mA(K) \times \mathrm{Gal}(L/K) \to A[m] \]
is non-degenerate. In particular,
\[ \mathrm{Gal}(L/K) \hookrightarrow \mathrm{Hom}(A(K)/mA(K), A[m]) \]
Since A[m] is an abelian group, and has exponent m, it follows that Gal(L/K) is also abelian
of exponent m, completing Step 1.

13.2. Step 2. Given a number field K, and a prime ideal ℘ inside the ring of integers $O_K$,
we can consider the reduction map
\[ \mathbb{P}^N(K) \xrightarrow{\;\bmod \wp\;} \mathbb{P}^N(\mathbb{F}_\wp), \qquad [a_0 : a_1 : \cdots : a_N] \mapsto [\tilde{a}_0 : \tilde{a}_1 : \cdots : \tilde{a}_N] \]
where $\tilde{a}_i = a_i \bmod \wp$. For this map to be well-defined (i.e. to avoid all the entries being 0
mod ℘), we can choose projective coordinates so that
(1) every $a_i$ is ℘-integral, i.e. $\mathrm{ord}_\wp(a_i) \ge 0$.
(2) some $\mathrm{ord}_\wp(a_j) = 0$.
For an abelian variety $A \subseteq \mathbb{P}^N$, one can form $A \bmod \wp = \tilde{A}_\wp \subset \mathbb{P}^N_{\mathbb{F}_\wp}$. This is called the
reduction of A modulo ℘. We have a natural map $A(K) \to \tilde{A}_\wp(\mathbb{F}_\wp)$.
Example. Let E be an elliptic curve given by $y^2 = x^3 + Ax + B$ where $A, B \in \mathbb{Z}$ such
that $4A^3 + 27B^2 \ne 0$. Then $\tilde{E}_p$ is non-singular if and only if $p \nmid 2(4A^3 + 27B^2)$. One needs
the extra factor of 2 in front of the discriminant because the equation is singular over $\mathbb{F}_2$.
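The criterion in this example can be compared against a brute-force search for singular points. The sketch below is an illustration only: it looks for affine points over F_p where the equation F = y² − x³ − Ax − B and both partial derivatives vanish, and compares the outcome with the divisibility test. The function names are hypothetical. For curves of this shape the point at infinity is smooth and a singular point, if one exists, is already defined over F_p, so searching over F_p is enough here.

```python
def has_good_reduction(A, B, p):
    """Divisibility criterion from the example: p does not divide 2(4A^3 + 27B^2)."""
    return (2 * (4 * A ** 3 + 27 * B ** 2)) % p != 0

def affine_singular_points(A, B, p):
    """All (x, y) in F_p^2 where F, dF/dx and dF/dy vanish, F = y^2 - x^3 - Ax - B."""
    points = []
    for x in range(p):
        for y in range(p):
            F = (y * y - x ** 3 - A * x - B) % p
            Fx = (-3 * x * x - A) % p
            Fy = (2 * y) % p
            if F == 0 and Fx == 0 and Fy == 0:
                points.append((x, y))
    return points

# The curve y^2 = x^3 - 2 has 2(4A^3 + 27B^2) = 216 = 2^3 * 3^3.
for p in [2, 3, 5, 7, 11]:
    print(p, has_good_reduction(0, -2, p), affine_singular_points(0, -2, p))
```

For y² = x³ − 2 the only primes of bad reduction for this equation are 2 and 3; modulo 3 the cubic becomes (x + 1)³, which produces the singular point (2, 0).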
Theorem. $\tilde{A}_\wp$ is non-singular for all but finitely many primes ℘, and in fact it is an
abelian variety (whenever it is non-singular).
The second part of the statement deserves a remark. Just because $\tilde{A}_\wp$ is non-singular, it is
not a priori clear that it is an abelian variety. Indeed, it is not clear that when one reduces
the multiplication map $\mu : A \times A \to A$ mod ℘, the resulting map $\mu_\wp : \tilde{A}_\wp \times \tilde{A}_\wp \to \tilde{A}_\wp$ stays
a morphism (in general it could be a rational map).
When $\tilde{A}_\wp$ is non-singular, we say that ℘ is a prime of good reduction for A. Otherwise, ℘
is a prime of bad reduction.
Thus, we get a group homomorphism
\[ A(K) \to \tilde{A}_\wp(\mathbb{F}_\wp) \]
for all primes $\wp \notin S_{A/K}$, where $S_{A/K}$ is the finite set of primes of bad reduction.
The following is a key input which goes into the proof of the Weak Mordell-Weil theorem.
Key Fact. If A has good reduction at ℘ and ℘ ∤ m, then
\[ A(K)[m] \hookrightarrow \tilde{A}_\wp(\mathbb{F}_\wp) \]
In general, the kernel of $A(K) \to \tilde{A}_\wp(\mathbb{F}_\wp)$ can be quite big. For example, if A(K) is infinite,
then the kernel is necessarily infinite as the target $\tilde{A}_\wp(\mathbb{F}_\wp)$ is finite. Nevertheless, the key
fact says that the non-zero m-torsion points will not be in the kernel.
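Analog for the multiplicative group. If p ∤ m, then $\mu_m \hookrightarrow \overline{\mathbb{F}}_p^*$. In this case, $\mu_m = \mathbb{G}_m[m]$. Note that the multiplicative group $\mathbb{G}_m = \{xy - 1 = 0\}$ is always non-singular when
reduced mod p. The injection $\mu_m \hookrightarrow \overline{\mathbb{F}}_p^*$ says that different m-th roots of unity do not coincide in $\overline{\mathbb{F}}_p$
provided that p ∤ m.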
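The multiplicative analog can be checked numerically. The sketch below is an illustration under a simplifying assumption: it only looks at roots of unity lying in $\mathbb{F}_p$ itself (so it sees all m of them only when m divides p − 1), and the function name is a hypothetical helper.

```python
def mth_roots_of_unity_mod_p(m: int, p: int) -> list:
    """All x in F_p^* with x^m = 1, found by brute force."""
    return [x for x in range(1, p) if pow(x, m, p) == 1]


# p = 13 does not divide m = 4, and 4 | 13 - 1, so all four 4th roots
# of unity survive as distinct elements of F_13.
roots_good = mth_roots_of_unity_mod_p(4, 13)

# p = 5 divides m = 5: x^5 = x in F_5, so the five 5th roots of unity
# all collapse to the single element 1.
roots_bad = mth_roots_of_unity_mod_p(5, 5)
```

The second case shows why the hypothesis p ∤ m cannot be dropped: the polynomial $x^m - 1$ becomes inseparable mod p.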
Proposition. The Key Fact ⇒ L/K is unramified at every $\wp \notin S_{A/K,m}$, where
\[ S_{A/K,m} = \{\wp : A \text{ has bad reduction at } \wp\} \cup \{\wp : \wp \mid m\} \]
Recall that the field L is defined by
\[ L := K([m]^{-1}A(K)) = \prod_{\substack{Q \in A(\overline{K}) \\ mQ \in A(K)}} K(Q) \]

Proof. Let $\wp \notin S_{A/K,m}$. We want to show that ℘ is unramified in L. Let $I_\wp \subset G_{L/K}$ be the inertia
group at ℘. Since $G_{L/K}$ is abelian, the inertia group is “well-behaved”: it does not depend on the choice of a prime of L above ℘. Since $e = \#I_\wp$ is
equal to the ramification index at ℘, it suffices to prove that $I_\wp$ is the trivial group (being
unramified means that e = 1). Let $\mathfrak{p}$ be a prime in L lying above ℘, i.e. $\mathfrak{p} \mid \wp$. Recall the
definition of the inertia group:
\[ I_\wp = \ker\left(G_{L/K} \to G_{\mathfrak{p}/\wp}\right) \]
Let $\sigma \in I_\wp$. So σ fixes “L-things” mod $\mathfrak{p}$. Let Q ∈ A(L) with [m]Q ∈ A(K). Then
(1) σ(Q) ≡ Q mod $\mathfrak{p}$.
(2) [m](σ(Q) − Q) = σ([m]Q) − [m]Q = 0 as [m]Q ∈ A(K).
So σ(Q) − Q ∈ A[m]. Thus, $\sigma(Q) - Q \in \ker\big(A(L)[m] \to \tilde{A}_{\mathfrak{p}}(\mathbb{F}_{\mathfrak{p}})\big) = 0$ by the key fact, so
σ(Q) = Q. Therefore, $\sigma \in I_\wp$ fixes every Q with [m]Q ∈ A(K), which are the elements used
to generate L as a field extension over K. This implies that σ fixes L pointwise, i.e. σ = id
in $G_{L/K}$. Since $\sigma \in I_\wp$ was arbitrary, we conclude that the inertia group $I_\wp$ is trivial, and
that ℘ is unramified in L. □
This completes Step 2, assuming the key fact above (which will be proved later).
13.3. Step 3. We want to prove the following general result about field extensions.
Proposition. Suppose that L/K is an abelian extension of exponent m and unramified
at all ℘ ∉ S for some finite set S of primes. Then L/K is a finite extension.
Proof. Without loss of generality, we can make K bigger, and so we may assume that $\mu_m \subset K$.
Indeed, replacing K with a finite extension K′ will replace Gal(L/K) with its subgroup
Gal(L/K′) which is still abelian of exponent m, and L/K is finite if and only if L/K′ is
finite. By Kummer theory, there is an isomorphism $K^*/(K^*)^m \cong \mathrm{Hom}(\mathrm{Gal}(\overline{K}/K), \mu_m)$ given
by $b \mapsto (\sigma \mapsto \sigma(\beta)/\beta)$, where $\beta^m = b$. This isomorphism comes from applying group cohomology
to the Kummer exact sequence, and using Hilbert's Theorem 90, which says that $H^1(\mathrm{Gal}(\overline{K}/K), \overline{K}^*) = 0$,
together with the identification $H^1(\mathrm{Gal}(\overline{K}/K), \mu_m) = \mathrm{Hom}(\mathrm{Gal}(\overline{K}/K), \mu_m)$, which holds because $\mathrm{Gal}(\overline{K}/K)$ acts trivially on
$\mu_m \subset K$, so the twisted homomorphisms are just the usual homomorphisms.
Next, note that L is generated by finite extensions K′ of K. Since $\mathrm{Gal}(L/K) \twoheadrightarrow \mathrm{Gal}(K'/K)$,
it follows that Gal(K′/K) is a finite abelian group of exponent m, so we can further decompose K′ into smaller extensions of K whose Galois groups are subgroups of Z/mZ. Thus, L is
generated by those finite subextensions K′ of K such that Gal(K′/K) ⊆ Z/mZ, i.e.
\[ L = \prod_{\substack{K'/K,\ K' \subset L \\ G_{K'/K} \subseteq \mathbb{Z}/m\mathbb{Z}}} K' \]
where each such K′ is obtained as $K' = K(\sqrt[m]{b})$ for some $b \in K^*$. Thus,
\[ L = K\left(\sqrt[m]{b_i} : i \in I\right) \]
where the $b_i$ are chosen from equivalence classes in $K^*/(K^*)^m$. By enlarging the finite set S,
if necessary, we may assume that S contains {℘ : ℘ | m}. The discriminant of $x^m - b$ is
$\pm m^m b^{m-1}$. So, for all primes ℘ ∉ S, it follows that $K(\sqrt[m]{b})/K$ is unramified at ℘ if and only
if $\mathrm{ord}_\wp(b) \equiv 0 \pmod{m}$. We have an exact sequence
\[ 0 \to B \longrightarrow \langle b_i \in K^*/(K^*)^m : i \in I \rangle \longrightarrow \prod_{\wp \in S} \mathbb{Z}/m\mathbb{Z}, \qquad b \mapsto (\mathrm{ord}_\wp(b) \bmod m)_{\wp \in S} \]
where B is defined to be the kernel of the given map. We claim that B is finite. Note that
\[ B \subset \{ b \in K^*/(K^*)^m : \mathrm{ord}_\wp(b) \equiv 0 \pmod{m} \ \ \forall \wp \} \]
For each b ∈ B, we can factorize the ideal $(b) = (\wp_1^{e_1} \wp_2^{e_2} \cdots \wp_r^{e_r})^m$. By making S bigger, we
can assume that the ring of S-integers $R_S$ is a PID, where $R = O_K$ is the ring of integers
of the base field. Indeed, consider the class group $H_K = \{a_1, ..., a_h\}$ of K, with each class represented by an ideal $a_i$. Add to the set S all
the primes ℘ with $\mathrm{ord}_\wp(a_i) \neq 0$. Then $a_1, a_2, ..., a_h$ generate the unit ideal in $R_S$. Since $R_S$ is a PID, in
the decomposition $(\wp_1^{e_1} \wp_2^{e_2} \cdots \wp_r^{e_r})^m$, we can write $\wp_i = (\pi_i)$ so $(b) = (\beta^m)$, which means that
$b = u_b \beta^m$ for some unit $u_b$ in $R_S^*$. By Dirichlet’s unit theorem, the set $R_S^*/(R_S^*)^m$ is finite, so
we just need to pick finitely many representatives $u_1, u_2, ..., u_t$, which shows that B is finite.
Combining the fact that B is finite and $\prod_{\wp \in S} \mathbb{Z}/m\mathbb{Z}$ is finite (because the set S is finite, and
we see how this was a key hypothesis), we conclude from the above displayed exact sequence
that $\langle b_i \in K^*/(K^*)^m : i \in I \rangle$ is finite. As this is a generating set for L over K, it follows
that L is a finite extension of K. □
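To see the finiteness concretely, take the simplest case K = Q and m = 2, where $\mu_2 = \{\pm 1\}$ already lies in K and Z is a PID. The sketch below is only an illustration under those assumptions: it lists squarefree representatives b of the classes in $\mathbb{Q}^*/(\mathbb{Q}^*)^2$ with $\mathrm{ord}_p(b)$ even for every prime p outside a given finite set S (which should contain 2). The function name `kummer_generators` is hypothetical.

```python
from itertools import product


def kummer_generators(S):
    """Squarefree integers b = +/- prod_{p in S} p^e with e in {0, 1}.

    These represent the classes in Q^*/(Q^*)^2 whose valuation is even
    at every prime outside S, so Q(sqrt(b)) can ramify only inside S.
    """
    reps = []
    for sign in (1, -1):
        for exps in product((0, 1), repeat=len(S)):
            b = sign
            for p, e in zip(S, exps):
                b *= p**e
            reps.append(b)
    return sorted(reps)


gens = kummer_generators([2, 3])
```

There are $2^{\#S + 1}$ such classes (one factor of 2 for the unit −1, one for each prime in S), so the compositum of the fields $\mathbb{Q}(\sqrt{b})$ is a finite extension of Q, in line with the proposition.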
13.4. The proof of the Key Fact. We have proved the Weak Mordell-Weil Theorem, that
is finiteness of A(K)/mA(K) for every m ≥ 2, modulo a key fact stated before,
Key Fact. If A has good reduction at ℘ and ℘ ∤ m, then
\[ A(K)[m] \hookrightarrow \tilde{A}_\wp(\mathbb{F}_\wp) \]

There are three different approaches: 1) Using formal groups – see the book [HS00], 2)
Chevalley-Weil Theorem regarding unramified morphisms V → W – see exercise C.7 in
[HS00], and 3) Hensel’s Lemma, plus some facts from algebraic geometry.
We will follow approach 3) for our proof.
Proof. We will need the following theorem, which we will assume without a proof:
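Theorem. Let k be an algebraically closed field, and A/k be an abelian variety, and
g := dim(A). Then
\[ A(k)[p^t] \cong \begin{cases} (\mathbb{Z}/p^t\mathbb{Z})^{2g} & \text{if } p \neq 0 \text{ in } k \\ (\mathbb{Z}/p^t\mathbb{Z})^{i} & \text{if } p = 0 \text{ in } k, \text{ where } 0 \leq i \leq g \end{cases} \]
for each t ≥ 1.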
We also need Hensel’s Lemma, which we will state in two versions. The first version is
the classical version, while the second one is a geometric version for varieties. Note that the
uniqueness part in version 1 is missing in version 2.
Hensel’s Lemma. Let K℘ be the completion of K at a prime ℘, and let R℘ denote the
ring of integers, and denote by ℘ = ℘R℘ for the corresponding maximal ideal. Then the
following statements hold.
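Version 1. Let $f(x) \in R_\wp[x]$ and $\alpha \in R_\wp$ be such that $f(\alpha) \equiv 0 \pmod{\wp}$ and $f'(\alpha) \not\equiv 0 \pmod{\wp}$. Then there exists a unique $\beta \in R_\wp$ such that $\beta \equiv \alpha \pmod{\wp}$ and $f(\beta) = 0$.
Version 2. Let $V/K_\wp \subseteq \mathbb{P}^n_{K_\wp}$ be a variety. Reduce mod ℘ to get $\tilde{V}_\wp/\mathbb{F}_\wp \subset \mathbb{P}^n_{\mathbb{F}_\wp}$. Let
$\tilde{Q} \in \tilde{V}_\wp(\mathbb{F}_\wp)$. Assume that $\tilde{Q}$ is a nonsingular point of $\tilde{V}_\wp$. Then there exists some $Q \in V(K_\wp)$
such that $Q \equiv \tilde{Q} \pmod{\wp}$.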
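Version 1 is constructive: a simple root mod p can be lifted one power of p at a time by a Newton step. The following is a minimal sketch over $\mathbb{Z}_p$ (so K = Q and ℘ = p), computing the unique lift modulo $p^k$; the function name `hensel_lift` and the coefficient-list convention are assumptions made for this illustration.

```python
def hensel_lift(coeffs, a, p, k):
    """Lift a simple root a of f mod p to the unique root of f mod p^k.

    coeffs = [c0, c1, ..., cd] are the integer coefficients of
    f(x) = c0 + c1*x + ... + cd*x^d.
    """
    def f(x, mod):
        return sum(c * pow(x, i, mod) for i, c in enumerate(coeffs)) % mod

    def df(x, mod):
        return sum(i * c * pow(x, i - 1, mod)
                   for i, c in enumerate(coeffs) if i > 0) % mod

    if f(a, p) != 0 or df(a, p) == 0:
        raise ValueError("a must be a simple root of f modulo p")

    x, mod = a % p, p
    for _ in range(k - 1):
        mod *= p
        inv = pow(df(x, p), -1, p)       # f'(x) is a unit mod p
        x = (x - f(x, mod) * inv) % mod  # Newton step, one more digit
    return x


# Square root of 2 in Z_7, starting from 3^2 = 9 = 2 (mod 7).
root = hensel_lift([-2, 0, 1], 3, 7, 3)
```

Here `root` is the 7-adic expansion $3 + 1 \cdot 7 + 2 \cdot 7^2$ of a square root of 2, truncated modulo $7^3$. The uniqueness in Version 1 corresponds to the fact that each Newton step has exactly one solution.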
Recall the definition of a non-singular point. We can choose local coordinates $x_1, ..., x_n$
such that V is locally given by $\{f_1 = \cdots = f_s = 0\}$, where we can choose $f_i \in R_\wp[x_1, ..., x_n]$.
A point $\tilde{Q}$ is a nonsingular point of $\tilde{V}_\wp$ if
\[ \operatorname{rank}\left(\frac{\partial f_i}{\partial x_j}(\tilde{Q})\right) = n - \dim(V). \]
We can now prove the key fact above. Suppose that ℘ is a prime such that A has good
reduction at ℘ and ℘ ∤ m. By Version 2 of Hensel’s Lemma, the natural map
\[ A(K_\wp) \twoheadrightarrow \tilde{A}_\wp(\mathbb{F}_\wp) \]
is surjective. By enlarging the field K if necessary, we can assume that A[m] ⊆ A(K). Let’s
view A[m] as a scheme (rather than a set of closed points). Since A ⊂ PN is defined by
some polynomial equations, A = {F1 = · · · = Fr = 0}, the same holds true for A[m], i.e.
A[m] = {F1 = · · · = Fr = G1 = · · · = Gs = 0} where the polynomials G1 , G2 , ...., Gs arise
from analyzing the equation [m]P = 0.
Claim. The variety A[m] is non-singular.
Proof. It suffices to show that the multiplication map [m] : A → A is unramified, in
fact étale. This will imply the desired result as A[m] is the fiber of this map above 0. This
can be checked by passing to C. Indeed, [m] : A(C) → A(C) gets identified with the map
$\mathbb{C}^g/L \to \mathbb{C}^g/L$ given by z ↦ mz, i.e. multiplying each coordinate by m. This latter map is
étale, so the former map must be étale as well. Thus, A[m] is non-singular in characteristic
0, and so it is non-singular over F℘ for all but finitely many primes ℘. 
Hence, for all but finitely many primes ℘, we can apply Hensel’s lemma to A[m] to get a
surjective map
\[ A[m](K_\wp) \twoheadrightarrow \tilde{A}_\wp[m](\mathbb{F}_\wp) \]
Since both sides are isomorphic to $(\mathbb{Z}/m\mathbb{Z})^{2g}$, they both have the same cardinality $m^{2g}$, and
thus, the map $A[m](K_\wp) \twoheadrightarrow \tilde{A}_\wp[m](\mathbb{F}_\wp)$ is also injective. □

14. Mordell-Weil Theorem


Our goal now is to prove the Mordell-Weil Theorem: A(K) is a finitely-generated abelian
group for an abelian variety A and a number field K. We have already proved the Weak
Mordell-Weil which states that A(K)/mA(K) is finite for each m ≥ 2. To obtain the full
Mordell-Weil from the weak Mordell-Weil, we need to study the height machine on abelian
varieties.
14.1. Divisors on Abelian Varieties. For an abelian variety A, consider the set
End(A) = {algebraic homomorphisms A → A}
This is called the endomorphism ring of A. The ring structure on End(A) is the following: for
f, g ∈ End(A), and a point P ∈ A, define (f +g)(P ) = f (P )+g(P ), and (f ·g)(P ) = f (g(P )).
There are some conditions to be checked here, such as distributivity.
Theorem. (Theorem of the Cube). Given f, g, h ∈ End(A), and D ∈ Div(A),
(f + g + h)∗ D − (f + g)∗ D − (f + h)∗ D − (g + h)∗ D + f ∗ D + g ∗ D + h∗ D ∼ 0
Remark. It is not true in general that (f + g)∗ D = f ∗ D + g ∗ D. If this were true, then the
theorem of the cube would be trivial.
Let’s apply the theorem of the cube in the particular case when f = [m], g = [1], and
h = [−1]. So f(P) = [m]P, g(P) = P and h(P) = −P for each P ∈ A. We get
[m]∗ D − [m + 1]∗ D − [m − 1]∗ D − [0]∗ D + [m]∗ D + [1]∗ D + [−1]∗ D ∼ 0
Note that [0]∗ D = 0 and [1]∗ D = D. After rearranging, we get
[m + 1]∗ D + [m − 1]∗ D ∼ 2[m]∗ D + D + [−1]∗ D
This gives a 2-step recurrence relation which can be solved, the two base cases being [0]∗ D = 0
and [1]∗ D = D. We state the final result.
Corollary. For D ∈ Div(A) and m ≥ 1, we have
\[ [m]^*D \sim \left(\frac{m^2 + m}{2}\right) D + \left(\frac{m^2 - m}{2}\right) [-1]^*D \]
Proof. Now that we know the answer, the proof can be obtained by a simple induction. 
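The induction can be sanity-checked numerically. The sketch below tracks the pair of integer coefficients (a, b) in $[m]^*D \sim aD + b[-1]^*D$, runs the 2-step recurrence $[m+1]^*D \sim 2[m]^*D - [m-1]^*D + D + [-1]^*D$ from the base cases, and compares with the closed form. The function name `pullback_coeffs` is a hypothetical helper.

```python
def pullback_coeffs(m: int):
    """Coefficients (a, b) with [m]^*D ~ a*D + b*[-1]^*D, via the recurrence."""
    prev, cur = (0, 0), (1, 0)  # [0]^*D = 0 and [1]^*D = D
    if m == 0:
        return prev
    for _ in range(m - 1):
        # [k+1]^*D ~ 2[k]^*D - [k-1]^*D + D + [-1]^*D
        nxt = (2 * cur[0] - prev[0] + 1, 2 * cur[1] - prev[1] + 1)
        prev, cur = cur, nxt
    return cur


def closed_form(m: int):
    """The coefficients ((m^2 + m)/2, (m^2 - m)/2) from the corollary."""
    return ((m * m + m) // 2, (m * m - m) // 2)
```

Note that a + b = m² and a − b = m, which are exactly the multipliers appearing for symmetric and anti-symmetric divisors.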
Definition. We say that a divisor D ∈ Div(A) is symmetric if [−1]∗ D ∼ D. Similarly, we
say that D is anti-symmetric if [−1]∗ D ∼ −D. The above corollary simplifies in the special
case of a symmetric or anti-symmetric divisor.
Corollary. Let D ∈ Div(A) and m ≥ 1. If D is symmetric, then $[m]^*D \sim m^2 D$. If D is
anti-symmetric, then $[m]^*D \sim mD$.
Recall the context of canonical heights. If f is a morphism A → A such that $f^*D \sim \lambda D$
for λ > 1, then we constructed a height function $\hat{h}_{f,D}$ which satisfies some nice properties.
Theorem. Let K/Q be a number field, and A an abelian variety over K. Let D ∈ Div(A)
be a symmetric divisor, so that $[-1]^*D \sim D$. Then
(1) $\hat{h}_D(P) = \lim_{n \to \infty} \frac{1}{4^n} h_D([2^n]P)$ exists and satisfies $\hat{h}_D(P) = h_D(P) + O(1)$.
(2) For each m ≥ 2, we have $\hat{h}_D([m]P) = m^2 \hat{h}_D(P)$.
Proof. (1) By definition, $\hat{h}_{f,D}(P) = \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(P))$. In this case, f = [2] and $[2]^*D \sim 4D$, so λ = 4. Thus, $\hat{h}_D := \hat{h}_{[2],D} = \lim_{n \to \infty} \frac{1}{4^n} h_D([2^n]P)$. We have previously proved the
property $\hat{h}_D(P) = h_D(P) + O(1)$ in general.
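(2) The key is that the multiplication maps [2] and [m] commute. For simplicity, we will
denote [m]P by mP in the computation below.
\begin{align*}
\hat{h}_D(mP) &= \lim_{n \to \infty} 4^{-n} h_D(2^n(mP)) \\
&= \lim_{n \to \infty} 4^{-n} h_D(m(2^n P)) \\
&= \lim_{n \to \infty} 4^{-n} \left( h_{[m]^*D}(2^n P) + O(1) \right) \\
&= \lim_{n \to \infty} 4^{-n} \left( h_{m^2 D}(2^n P) + O(1) \right) \\
&= \lim_{n \to \infty} 4^{-n} \left( m^2 h_D(2^n P) + O(1) \right) \\
&= \lim_{n \to \infty} 4^{-n} m^2 h_D(2^n P) \\
&= m^2 \lim_{n \to \infty} 4^{-n} h_D(2^n P) = m^2 \hat{h}_D(P)
\end{align*}
as desired. □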
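The limit defining the canonical height converges quickly in practice. The sketch below illustrates this on the elliptic curve $y^2 = x^3 - 2$ with the point P = (3, 5) from Section 1, under the assumption that we take $h_D$ to be the naive logarithmic height of the x-coordinate (which corresponds to a symmetric divisor of degree 2 supported at the origin). Only x-coordinates are needed, via the standard duplication formula; all function names are hypothetical helpers.

```python
from fractions import Fraction
from math import log


def double_x(x: Fraction, A: int, B: int) -> Fraction:
    """x-coordinate of 2P on y^2 = x^3 + A x + B, given x = x(P)."""
    num = x**4 - 2 * A * x**2 - 8 * B * x + A * A
    den = 4 * (x**3 + A * x + B)
    return num / den


def naive_height(x: Fraction) -> float:
    """Logarithmic height of a rational number in lowest terms."""
    return log(max(abs(x.numerator), abs(x.denominator)))


def height_sequence(x0, A: int, B: int, n_max: int) -> list:
    """The values 4^{-n} h(x([2^n]P)) for n = 0, ..., n_max."""
    x = Fraction(x0)
    seq = []
    for n in range(n_max + 1):
        seq.append(naive_height(x) / 4**n)
        if n < n_max:
            x = double_x(x, A, B)
    return seq


seq = height_sequence(3, 0, -2, 5)
```

The successive terms of `seq` differ by roughly a factor of 4 less at each step, reflecting the estimate $4^{-n} h_D([2^n]P) = \hat{h}_D(P) + O(4^{-n})$. The numerators and denominators involved roughly quadruple in digit length with each doubling, so only a handful of terms is practical.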

14.2. More height inequalities. Fix a point Q ∈ A(K). Define a translation map
TQ : A → A
P 7→ P + Q
Let’s use the theorem of the cube for a symmetric divisor D,
(f + g + h)∗ D − (f + g)∗ D − (f + h)∗ D − (g + h)∗ D + f ∗ D + g ∗ D + h∗ D ∼ 0
with f = TQ , g = T−Q and h = [−1]. We first compute all the relevant maps:
(f + g + h)(P ) = P + Q + P − Q − P = P
(f + g)(P ) = P + Q + P − Q = 2P
(f + h)(P ) = P + Q − P = Q
(g + h)(P ) = P − Q − P = −Q
Thus, f + g + h = [1], f + g = [2], and f + h = Q and g + h = −Q are the constant maps
with values Q and −Q, respectively. Going back to the theorem of the cube, and using the
fact that the pullback of D under a constant map is 0, we obtain
\[ [1]^*D - [2]^*D - 0 - 0 + T_Q^*D + T_{-Q}^*D + [-1]^*D \sim 0 \]
Now $[1]^*D = D$ and $[2]^*D \sim 4D$. Also $[-1]^*D \sim D$ as D is assumed to be symmetric. Rearranging
the terms,
\[ T_Q^*D + T_{-Q}^*D \sim 2D \]
for every fixed Q ∈ A(K). We will convert this statement about linear equivalence of divisors
(geometry) to an assertion about the height machine (arithmetic):
\[ h_{T_Q^*D}(P) + h_{T_{-Q}^*D}(P) = h_{2D}(P) + O(1) \]
Using functoriality, $h_{T_Q^*D}(P) = h_D(T_Q(P)) + O(1) = h_D(P + Q) + O(1)$. Similarly, $h_{T_{-Q}^*D}(P) = h_D(P - Q) + O(1)$.
Using linearity, $h_{2D}(P) = 2h_D(P) + O(1)$. Thus,
\[ h_D(P + Q) + h_D(P - Q) = 2h_D(P) + O_{A,D,Q}(1) \]
where the subscript on the O(1) is written to emphasize which variables it depends on. We
will shortly see a more precise version of the bound (using the theorem of the square) which
will explain how the bound depends on Q. In particular, if D is ample, then hD (P + Q) ≥
O(1). Consequently,
2hD (P ) + O(1) = hD (P + Q) + hD (P − Q) ≥ O(1) + hD (P − Q)
or equivalently,
hD (P − Q) ≤ 2hD (P ) + O(1)
This is the key height inequality that is used in the descent argument in the proof of Mordell-
Weil. We will refer to this inequality as the descent inequality.

14.3. Descent. We are finally ready to prove the Mordell-Weil theorem. We will assume
the Weak Mordell-Weil, which we have already proved.
Theorem. (Mordell-Weil) For an abelian variety A over a number field K, the abelian
group A(K) is finitely-generated.
Proof. Fix m ≥ 2. By the Weak Mordell-Weil Theorem, the quotient A(K)/mA(K) is
finite. Choose coset representatives
{Q1 , Q2 , ..., Qr } ←→ A(K)/mA(K)
Fix an ample symmetric divisor D. This can be obtained by taking an ample divisor H,
and considering D = H + [−1]∗ H which is both ample and symmetric. For each point P ,
we have
(3) $h_D(mP) \geq m^2 h_D(P) - C_1(A, D, m)$
(4) $h_D(P - Q) \leq 2h_D(P) + C_2(A, D, Q)$

for some constants $C_1, C_2$. The first inequality relies on $\hat{h}_D(mP) = m^2 \hat{h}_D(P)$ together with $\hat{h}_D = h_D + O(1)$, while the
second inequality is the descent inequality. Applying (4) for each coset representative $Q_i$ and taking the largest of the resulting constants,
we can get a single constant $C_2(A, D, Q_1, ..., Q_r)$ satisfying
(5) $h_D(P - Q_i) \leq 2h_D(P) + C_2(A, D, Q_1, ..., Q_r)$ for all 1 ≤ i ≤ r
Take any P0 ∈ A(K). There exists some i1 such that
P0 ≡ Qi1 (mod mA(K))
that is, $P_0 = mP_1 + Q_{i_1}$ for some $P_1 \in A(K)$. Similarly, there is some $i_2$ such that $P_1 = mP_2 + Q_{i_2}$ for some $P_2 \in A(K)$. Continuing in this way n times, we construct a
sequence of points $P_0, P_1, ..., P_n \in A(K)$ such that
\begin{align*}
P_0 &= mP_1 + Q_{i_1} \\
P_1 &= mP_2 + Q_{i_2} \\
&\ \ \vdots \\
P_{n-1} &= mP_n + Q_{i_n}
\end{align*}
The idea of the descent is to show that the heights of the points $P_i$ must be getting smaller
as i increases. The system of equations implies that
\[ P_0 = m^n P_n + (\mathbb{Z}\text{-linear combination of } Q_1, ..., Q_r) \]
For each 1 ≤ j ≤ n, we obtain $P_{j-1} - Q_{i_j} = mP_j$, so $h_D(P_{j-1} - Q_{i_j}) = h_D(mP_j)$. Applying
(5) for the upper bound and (3) for the lower bound,
\[ 2h_D(P_{j-1}) + C_2 \geq h_D(P_{j-1} - Q_{i_j}) = h_D(mP_j) \geq m^2 h_D(P_j) - C_1 \]
Thus,
\[ m^2 h_D(P_j) - C_1 \leq 2h_D(P_{j-1}) + C_2 \]
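or equivalently,
\[ h_D(P_j) \leq \frac{2}{m^2} h_D(P_{j-1}) + \frac{C_1 + C_2}{m^2} \]
Apply this repeatedly to get
\[ h_D(P_n) \leq \left(\frac{2}{m^2}\right)^n h_D(P_0) + \frac{C_1 + C_2}{m^2} \left( 1 + \frac{2}{m^2} + \left(\frac{2}{m^2}\right)^2 + \cdots + \left(\frac{2}{m^2}\right)^{n-1} \right) \]
Note that
\[ 1 + \frac{2}{m^2} + \left(\frac{2}{m^2}\right)^2 + \cdots + \left(\frac{2}{m^2}\right)^{n-1} \leq \sum_{i=0}^{\infty} \left(\frac{2}{m^2}\right)^i = \frac{1}{1 - \frac{2}{m^2}} = \frac{m^2}{m^2 - 2} \]
Substituting this bound into the previous one, we obtain
\[ h_D(P_n) \leq \left(\frac{2}{m^2}\right)^n h_D(P_0) + \frac{C_1 + C_2}{m^2 - 2} \]
As m ≥ 2, we can get an even rougher upper bound by replacing each m with 2,
\[ h_D(P_n) \leq \left(\frac{1}{2}\right)^n h_D(P_0) + \frac{C_1 + C_2}{2} \]
This explains why the height of $P_n$ decreases as n increases. In particular, we can find n
(which depends only on the initial point $P_0$) such that $\left(\frac{1}{2}\right)^n h_D(P_0) \leq 1$, so that
\[ h_D(P_n) \leq 1 + \frac{C_1 + C_2}{2} \]
Recall that
\[ P_0 = m^n P_n + (\mathbb{Z}\text{-linear combination of } Q_1, ..., Q_r) \]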
Since $P_0$ was an arbitrary point in A(K), we conclude that
\[ A(K) \subset \mathrm{Span}_{\mathbb{Z}} \left( \{Q_1, ..., Q_r\} \cup \left\{ P \in A(K) : h_D(P) \leq 1 + \frac{C_1 + C_2}{2} \right\} \right) \]
The first set $\{Q_1, ..., Q_r\}$ is finite (because of Weak Mordell-Weil), and the second set is also
finite because D is ample. This finally completes the proof of the Mordell-Weil Theorem. □
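The arithmetic of the descent can be simulated directly. The sketch below does not involve any abelian variety: it simply iterates the worst case allowed by the recursive inequality, with made-up values for $h_D(P_0)$, $C_1$ and $C_2$, and compares the result with the closed-form bound. Both function names are hypothetical.

```python
def descent_heights(h0: float, m: int, C1: float, C2: float, n: int) -> list:
    """Worst-case heights allowed by h_j <= (2/m^2) h_{j-1} + (C1 + C2)/m^2."""
    hs = [h0]
    for _ in range(n):
        hs.append(2 * hs[-1] / m**2 + (C1 + C2) / m**2)
    return hs


def closed_form_bound(h0: float, m: int, C1: float, C2: float, n: int) -> float:
    """The bound (2/m^2)^n h0 + (C1 + C2)/(m^2 - 2) after summing the series."""
    return (2 / m**2) ** n * h0 + (C1 + C2) / (m**2 - 2)


# Illustrative numbers only: a starting height of 1000 with C1 = 3, C2 = 5.
hs = descent_heights(1000.0, 2, 3.0, 5.0, 20)
```

With m = 2 the heights roughly halve at each step until they settle below the threshold $1 + (C_1 + C_2)/2$, which is independent of the starting point. That independence is what makes the final generating set finite.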

15. Theorem of the Square


Recall the canonical heights for abelian varieties. Let D be a symmetric divisor on an
abelian variety A, i.e. D = [−1]∗ D. We get a canonical height ĥD : A(K) → R satisfying
(1) ĥD (mP ) = m2 ĥD (P ).
(2) ĥD (P ) = hD (P ) + O(1).
Theorem. (Theorem of the Square). Consider the following four maps σ, δ, π1 , π2 :
A × A → A defined by
σ(P, Q) = P + Q, δ(P, Q) = P − Q
π1 (P, Q) = P, π2 (P, Q) = Q
Then
\[ \sigma^*D + \delta^*D \sim 2\pi_1^*D + 2\pi_2^*D \]
Remark. There is an equivalent version of the theorem of the square which states that
the map
\[ \varphi_D : A \to \mathrm{Pic}(A), \qquad P \mapsto [T_{-P}^*D - D] \]
is a homomorphism of abelian groups. Here $T_{-P}$ is the translation by −P. Another fact:
$\ker(\varphi_D)$ is finite if and only if D is ample.
As usual, we can convert linear equivalence between divisors into a statement about the
height machine.
Theorem. For each P, Q ∈ A, we have
hD (P + Q) + hD (P − Q) = 2hD (P ) + 2hD (Q) + OA,D (1)
This is a more precise statement than the descent inequality used in the proof of the Mordell-Weil theorem. By replacing P with $2^n P$, and Q with $2^n Q$ in the theorem above, multiplying
both sides by $\frac{1}{4^n}$ and letting n → ∞, we obtain
(6) ĥD (P + Q) + ĥD (P − Q) = 2ĥD (P ) + 2ĥD (Q)
The advantage of using canonical heights is that there is no more O(1) term. The equation
(6) is called the parallelogram formula.
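Exercise. Use the parallelogram formula to deduce that
\[ A(K) \times A(K) \to \mathbb{R}, \qquad (P, Q) \mapsto \langle P, Q \rangle_D := \frac{1}{2} \left( \hat{h}_D(P + Q) - \hat{h}_D(P) - \hat{h}_D(Q) \right) \]
is a bilinear form. Note that the factor of $\frac{1}{2}$ is needed to ensure that $\langle P, P \rangle_D = \hat{h}_D(P)$.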
16. Quadratic form on A(K)/A(K)tors
Using the theorem of the square, we have proved the parallelogram formula,
ĥD (P + Q) + ĥD (P − Q) = 2ĥD (P ) + 2ĥD (Q)
for any symmetric divisor D ∈ Div(A) on an abelian variety A. We claim that this implies ĥD is
a quadratic form. The proof is a formal consequence of the parallelogram law, so we record
the most general version. The proof given below appears in [HS00, Lemma B.5.2].
Lemma. Let A be an abelian group, and f : A → R be any function satisfying the
parallelogram law,
f (P + Q) + f (P − Q) = 2f (P ) + 2f (Q)
for every P, Q ∈ A. Then f is a quadratic form on A.
Proof. When P = Q = 0, the parallelogram law yields f (0) = 0. Letting P = 0, and
keeping Q as a variable, we get f (Q) + f (−Q) = 2f (Q), so that f (−Q) = f (Q). Thus, f is
an even function with no constant term. Next, we apply the parallelogram law four times,
namely for the pairs (P + R, Q), (P + Q, R), (P, R − Q), and (R, Q).
(7) f (P + R + Q) + f (P + R − Q) − 2f (P + R) − 2f (Q) = 0
(8) f (P + Q + R) + f (P + Q − R) − 2f (P + Q) − 2f (R) = 0
(9) f (P + R − Q) + f (P − R + Q) − 2f (P ) − 2f (R − Q) = 0
(10) 2f (R + Q) + 2f (R − Q) − 4f (R) − 4f (Q) = 0
Adding (7), (8), and subtracting (9), (10), and dividing both sides by 2, we obtain
(11) f (P + Q + R) + f (P ) + f (Q) + f (R) = f (P + Q) + f (Q + R) + f (R + P )
or equivalently,
f (P + Q + R) − f (P ) − f (Q + R) = [f (P + Q) − f (P ) − f (Q)] + [f (P + R) − f (P ) − f (R)]
Define $\langle P, Q \rangle = \frac{1}{2}\left(f(P + Q) - f(P) - f(Q)\right)$. So (11) is equivalent to
\[ \langle P, Q + R \rangle = \langle P, Q \rangle + \langle P, R \rangle \]
Thus, $\langle P, Q \rangle$ is a bilinear form, and f is a quadratic form on A. □
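As a quick check of the lemma in a concrete case, the sketch below takes the abelian group $\mathbb{Z}^2$ with an arbitrarily chosen integral quadratic form f (an assumption made purely for illustration), and verifies by brute force on a small box of points that f satisfies the parallelogram law and that the associated pairing is additive in the second variable.

```python
from fractions import Fraction
from itertools import product


def f(v):
    """A sample quadratic form on Z^2 (a hypothetical choice)."""
    x, y = v
    return 2 * x * x + 3 * x * y + 5 * y * y


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])


def pairing(p, q):
    """<P, Q> = (f(P + Q) - f(P) - f(Q)) / 2, kept exact with Fraction."""
    return Fraction(f(add(p, q)) - f(p) - f(q), 2)


points = list(product(range(-2, 3), repeat=2))

parallelogram_ok = all(
    f(add(p, q)) + f(sub(p, q)) == 2 * f(p) + 2 * f(q)
    for p in points for q in points
)
additive_ok = all(
    pairing(p, add(q, r)) == pairing(p, q) + pairing(p, r)
    for p in points for q in points for r in points
)
```

The pairing can take half-integer values (here because of the odd cross term 3xy), which is why the factor $\frac{1}{2}$ is handled with exact fractions rather than integer division.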
Theorem. Let D be an ample symmetric divisor on an abelian variety A. Then:
(1) ĥD is a quadratic form.
(2) ĥD(P) = 0 ⇔ P ∈ Ators.
(3) ĥD is positive definite on $A(K)/A(K)_{\mathrm{tors}} \cong \mathbb{Z}^{\mathrm{rank}\, A(K)}$.
(4) ĥD induces a positive definite quadratic form on $A(K) \otimes_{\mathbb{Z}} \mathbb{R} \cong \mathbb{R}^{\mathrm{rank}\, A(K)}$.
Proof. We have already proved (1) above. We do not need ampleness here.
(2) (⇐) If P ∈ Ators, then mP = 0 for some m ≥ 1, so $m^2 \hat{h}_D(P) = \hat{h}_D(mP) = \hat{h}_D(0) = 0$, which implies
that ĥD(P) = 0.
(2) (⇒) If ĥD(P) = 0, then $\hat{h}_D(mP) = m^2 \hat{h}_D(P) = 0$ for each m ≥ 1. Since $\hat{h}_D = h_D + O(1)$, we conclude that $\{h_D(mP)\}_{m \geq 1}$ is a bounded set. Since D is ample, the set
$\{mP\}_{m \geq 1}$ must be finite, i.e. nP = kP for some n > k, so that (n − k)P = 0 and
$P \in A(K)_{\mathrm{tors}}$.
(3) First, we should explain why ĥD induces a well-defined function on A(K)/A(K)tors .
We have already seen that ĥD (T ) = 0 if and only if T ∈ A(K)tors . In particular, if P ∈ A(K)
and T ∈ A(K)tors with mT = 0, then
\[ \hat{h}_D(P + T) = \frac{1}{m^2} \hat{h}_D(m(P + T)) = \frac{1}{m^2} \hat{h}_D(mP + mT) = \frac{1}{m^2} \hat{h}_D(mP) = \hat{h}_D(P) \]
Since ĥD (P + T ) = ĥD (P ) for each P ∈ A(K) and T ∈ A(K)tors , the function ĥD is well-
defined on A(K)/A(K)tors . Since D is ample, ĥD : A(K)/A(K)tors → [0, ∞). Using (2)
again, we know that ĥD (P ) = 0 if and only if P = 0 for P ∈ A(K)/A(K)tors . Thus, ĥD is
positive definite on A(K)/A(K)tors .
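(4) So far we have a lattice $L \cong A(K)/A(K)_{\mathrm{tors}} \cong \mathbb{Z}^r$ where r = rank A(K), and a certain
quadratic form q : L → [0, ∞). We should first explain how to define q on $L \otimes_{\mathbb{Z}} \mathbb{R}$. Given
$v \in L \otimes_{\mathbb{Z}} \mathbb{R}$, we can write $v = \sum_{i=1}^{n} \alpha_i \otimes v_i$ for some $\alpha_i \in \mathbb{R}$ and $v_i \in L$. The key is to use the bilinear property. We
have $\langle \cdot, \cdot \rangle : L \times L \to \mathbb{R}$ defined by $\langle a, b \rangle = \frac{1}{2}(q(a + b) - q(a) - q(b))$ so that $q(v) = \langle v, v \rangle$ for
v ∈ L. We can define q(v) as follows:
\begin{align*}
q(v) = q\left( \sum_{i=1}^{n} \alpha_i v_i \right) &= \left\langle \sum_{i=1}^{n} \alpha_i v_i, \sum_{j=1}^{n} \alpha_j v_j \right\rangle \\
&= \sum_{i,j} \langle \alpha_i v_i, \alpha_j v_j \rangle = \sum_{i,j} \alpha_i \alpha_j \langle v_i, v_j \rangle = \sum_{i,j} \alpha_i \alpha_j \, \frac{q(v_i + v_j) - q(v_i) - q(v_j)}{2}
\end{align*}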

For q = ĥD and $L \cong A(K)/A(K)_{\mathrm{tors}} \cong \mathbb{Z}^r$, first note that L = image(A(K) → A(K) ⊗ R)
where the map A(K) → A(K) ⊗ R is v ↦ v ⊗ 1. Let $V = A(K) \otimes \mathbb{R} \cong \mathbb{R}^r$, and so q induces
a quadratic form on $\mathbb{R}^r$. By a standard result in linear algebra, q can be diagonalized on V
with its associated matrix having a certain number of 1s, −1s, and 0s on the diagonal:
\[ q \longleftrightarrow \begin{pmatrix} 1_s & & \\ & -1_t & \\ & & 0_{r-s-t} \end{pmatrix} \]
Let $\{v_1, ..., v_r\}$ be a basis for $\mathbb{R}^r$ where the quadratic form has this diagonal representation:
\[ q(v) = q\left( \sum_{i=1}^{r} c_i v_i \right) = \sum_{i=1}^{s} c_i^2 - \sum_{i=s+1}^{s+t} c_i^2 \]

With this notation, q is positive definite on V ⇔ s = r and t = 0. Assume, to the contrary,
that q = ĥD is not positive definite so that s < r. For all ε > 0 and B > 0, look at the set
\[ C_{\varepsilon,B} = \left\{ \sum_{i=1}^{r} c_i v_i : \sum_{i=1}^{s} c_i^2 \leq \varepsilon \ \text{ and } \ \sum_{i=1}^{r} c_i^2 \leq B \right\} \]
Observe that $C_{\varepsilon,B}$ is compact, convex, and symmetric, that is, if $x \in C_{\varepsilon,B}$ then $-x \in C_{\varepsilon,B}$.
For a given ε > 0,
\[ \lim_{B \to \infty} \mathrm{volume}(C_{\varepsilon,B}) = \infty \]
This is where the hypothesis s < r was crucially used. Indeed, if s = r, then the second
condition in the definition of Cε,B is not present. Recall that Minkowski’s Theorem guaran-
tees that a compact, convex and symmetric region E ⊂ Rr must contain a non-zero point
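of a lattice L provided that $\mathrm{Vol}(E) > 2^r \, \mathrm{Vol}(\mathbb{R}^r/L)$. In particular, for any given ε > 0, we
can choose the corresponding $B_\varepsilon > 0$ to be large enough so that there is a point $0 \neq P_\varepsilon \in L \cap C_{\varepsilon,B_\varepsilon}$, where
$L = A(K)/A(K)_{\mathrm{tors}}$. By definition,
\[ \hat{h}_D(P_\varepsilon) = q(P_\varepsilon) = q\left( \sum_{i=1}^{r} c_i(P_\varepsilon) v_i \right) = \sum_{i=1}^{s} c_i(P_\varepsilon)^2 - \sum_{i=s+1}^{s+t} c_i(P_\varepsilon)^2 \leq \varepsilon \]
Since $P_\varepsilon \neq 0$ in L, part (3) gives $\hat{h}_D(P_\varepsilon) > 0$, so letting ε → 0 produces infinitely many distinct points.
This implies that the set {P ∈ A(K) : ĥD(P) ≤ 1} is infinite, which is a contradiction as D
is ample. □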
Remark. It is worth emphasizing that part (4) did not formally follow from part (3),
and we had to use special properties of the height function, namely boundedness. Here is an
example of a lattice L with a quadratic form q such that q is positive definite on L, but q does
not induce a positive definite quadratic form on $L \otimes_{\mathbb{Z}} \mathbb{R}$. Consider $L = \mathbb{Z} + \mathbb{Z}\sqrt{2}$ as a subgroup
of R, and let q : L → R be given by $q(x) = |x|^2$. More explicitly, $q(a + b\sqrt{2}) = a^2 + 2b^2 + 2ab\sqrt{2}$.
Then q is positive definite on L because $\sqrt{2}$ is irrational. On the other hand, $L \otimes_{\mathbb{Z}} \mathbb{R} \cong \mathbb{R} \oplus \mathbb{R}$
and $q(a + b\sqrt{2}) = 0$ for $(a, b) = (\sqrt{2}, -1)$, so q is not positive definite on $L \otimes_{\mathbb{Z}} \mathbb{R}$. The problem
apparently arises from the fact that the image set q(L) has an accumulation point in R. By
contrast, the boundedness property for ample divisors is precisely the statement that the set
$\{h_D(P)\}_{P \in A(K)}$ is a discrete subset of R, so everything works out in this case.
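The accumulation point in this example can be made visible numerically. The sketch below, which uses floating-point arithmetic and is meant only as an illustration, evaluates q at the lattice points $a - b\sqrt{2}$ where (a, b) runs through solutions of $a^2 - 2b^2 = \pm 1$ generated by the recursion (a, b) ↦ (a + 2b, a + b). The function names are hypothetical.

```python
from math import sqrt


def q(a: int, b: int) -> float:
    """q(a + b*sqrt(2)) = (a + b*sqrt(2))^2 for the lattice Z + Z*sqrt(2)."""
    return (a + b * sqrt(2)) ** 2


def small_values(n_terms: int) -> list:
    """Pairs ((a, -b), q(a - b*sqrt(2))) with a^2 - 2b^2 = +-1."""
    a, b = 1, 1
    out = []
    for _ in range(n_terms):
        out.append(((a, -b), q(a, -b)))
        a, b = a + 2 * b, a + b
    return out


values = small_values(5)
```

Since $|a^2 - 2b^2| = 1$, we have $q(a - b\sqrt{2}) = 1/(a + b\sqrt{2})^2$, which is strictly positive but tends to 0. So q(L) accumulates at 0 even though q never vanishes on non-zero lattice points.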

17. Path towards Faltings: Mumford’s Theorem.


17.1. Abelian Regulator. Let A be an abelian variety over a number field K. Fix a
symmetric ample divisor D ∈ Div(A). Let $P_1, P_2, ..., P_r \in A(K)$ generate $A(K)/A(K)_{\mathrm{tors}} \cong \mathbb{Z}^r$, where r = rank A(K). We define the abelian regulator of A(K) with respect to D as follows:
\begin{align*}
\mathrm{Reg}_D(A(K)) &= \text{covolume of } A(K)/A(K)_{\mathrm{tors}} \text{ in } A(K) \otimes \mathbb{R} \text{ relative to } \hat{h}_D \\
&= \det\left( \langle P_i, P_j \rangle \right)_{1 \leq i, j \leq r} \neq 0
\end{align*}
Note that the determinant is not zero because ĥD is positive definite.
17.2. Counting Functions. Given a variety V over a number field K with a height function
hD , and B > 0, consider the following function
N (V (K), hD , B) := #{P ∈ V (K) : hD (P ) ≤ B}
It can be shown that
\[ N(\mathbb{P}^N(K), h, B) \sim C_{N,K}^{\,B} \]
for some constant $C_{N,K}$ depending on N and K.
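For a feel of the counting function, here is a brute-force count in the simplest case $\mathbb{P}^1(\mathbb{Q})$. The sketch assumes the multiplicative height $H([a : b]) = \max(|a|, |b|)$ for coprime integers a, b (so $h = \log H$, and a bound B on h corresponds to a bound $e^B$ on H); the function name is a hypothetical helper.

```python
from math import gcd


def count_P1_Q(H_bound: int) -> int:
    """Number of points [a : b] in P^1(Q) with max(|a|, |b|) <= H_bound.

    Each point is normalized to coprime (a, b) with b > 0, plus the
    single point at infinity [1 : 0].
    """
    count = 1  # the point [1 : 0]
    for b in range(1, H_bound + 1):
        for a in range(-H_bound, H_bound + 1):
            if gcd(a, b) == 1:
                count += 1
    return count
```

The count grows quadratically in the bound on H, hence exponentially in the bound on the logarithmic height h, which is the kind of growth the displayed asymptotic describes.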


Theorem. (Néron) If A is an abelian variety, then
\[ N(A(K), \hat{h}_D, B) \sim \#A(K)_{\mathrm{tors}} \cdot \left( \frac{\text{volume of } r\text{-dimensional ball}}{\mathrm{Reg}_D(A(K))} \right) \cdot B^{\mathrm{rank}\, A(K)} \]
Let C be a smooth curve of genus g ≥ 2.
Question. How big is N (C(K), hD , B)?
Faltings’s Theorem asserts that $N(C(K), h_D, B) = O(1)$, so there are finitely many K-rational points on C. Before Faltings’s celebrated result, Mumford had already shown (in the
1960s) the following weaker result
\[ N(C(K), h_D, B) \ll \log B \]
We will work towards understanding Mumford’s proof.

18. The Jacobian of a curve


Our eventual goal is to prove Faltings’s theorem, which states that if C is a smooth curve
of genus g ≥ 2 defined over a number field K, then C(K) is finite. We will soon define
the Jacobian variety J associated to C, which will come with an embedding C ↪ J. The
Jacobian variety is an abelian variety, and so $C(K) \hookrightarrow J(K) \cong \text{finite group} \times \mathbb{Z}^r$ by the
Mordell-Weil theorem. Thus, Faltings’s theorem would follow if one can show that C ∩ J(K)
is a finite set. This latter statement is generalized by the following.
Lang-Mordell Conjecture (proven by Faltings). Let A/C be an abelian variety, and
Γ ⊂ A(C) be a finitely generated subgroup. Let X ⊂ A be a subvariety. Assume that X
does not contain a translate of an abelian subvariety. Then X ∩ Γ is finite.
Let C be a smooth projective curve of genus g, so C(C) is a Riemann surface. Topologically, C(C) is a g-holed torus. We have $H_1(C(\mathbb{C}), \mathbb{Z}) \cong \mathbb{Z}^{2g}$. By the definition of the
geometric genus, $H^0(C(\mathbb{C}), \Omega^1_C) \cong \mathbb{C}^g$.
Example. Let $C_f$ be a curve defined by $y^2 = x^{2g+1} + a_1 x^{2g} + \cdots + a_{2g} = f(x)$ where
disc(f) ≠ 0. Then $g(C_f) = g$. Indeed, one can show that
\[ H^0(\Omega^1_{C_f}) = \mathrm{span}\left\{ \frac{x^i \, dx}{y} : 0 \leq i < g \right\}. \]
Note that $2y \, dy = f'(x) \, dx \Rightarrow \frac{x^i \, dx}{y} = \frac{2x^i \, dy}{f'(x)}$, so these proposed spanning elements are holomorphic on the (x, y)-frame.
18.1. Jacobian of curves over C. Let P0 ∈ C(C), and consider
H 0 (C, Ω1 ) = span{ω1 , ω2 , ..., ωg }
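Given $\omega \in H^0(C, \Omega^1)$, consider the map
\[ C(\mathbb{C}) \to \mathbb{C}, \qquad P \mapsto \int_{P_0}^{P} \omega \]
As written, this is not a well-defined map because the output depends on the path between
$P_0$ and P. However, the ambiguity is precisely captured by the span of $\left\{ \int_\gamma \omega : \gamma \text{ a closed loop} \right\}$.
This last set can be identified with $\left\{ \int_\gamma \omega : \gamma \in H_1(C(\mathbb{C}), \mathbb{Z}) \right\}$.
This motivates us to consider the set
\[ L_C = \left\{ \left( \int_\gamma \omega_1, \int_\gamma \omega_2, ..., \int_\gamma \omega_g \right) : \gamma \in H_1(C(\mathbb{C}), \mathbb{Z}) \right\} \]
and the map
\[ \varphi_C : C(\mathbb{C}) \to \mathbb{C}^g/L_C, \qquad P \mapsto \left( \int_{P_0}^{P} \omega_1, \int_{P_0}^{P} \omega_2, ..., \int_{P_0}^{P} \omega_g \right) \]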
Abel-Jacobi Theorem.
(1) LC is a lattice.
(2) ϕC is an embedding.
(3) Cg /LC is an abelian variety.
In particular, (3) tells us that there exists a holomorphic map $\mathbb{C}^g/L_C \hookrightarrow \mathbb{P}^N_{\mathbb{C}}$ for some N.
Definition. The abelian variety Cg /LC is called the Jacobian of C, and often denoted by
JC or JacC .

18.2. Algebraic construction of JC . We have an exact sequence


\[ 0 \longrightarrow \mathrm{Div}^0(C) \longrightarrow \mathrm{Div}(C) \xrightarrow{\ \deg\ } \mathbb{Z} \longrightarrow 0 \]
We can define $\mathrm{Pic}^0(C) = \mathrm{Div}^0(C)/(\text{linear equivalence})$. Without loss of generality, assume that C(K) ≠ ∅,
and pick $P_0 \in C(K)$. Then $\mathrm{Pic}^0(C)$ will be the Jacobian of the curve C. We have a map
$j : C \to \mathrm{Pic}^0(C)$ given by $j(P) = [(P) - (P_0)]$, where the square brackets mean modulo linear
equivalence. We have that the map j is injective if and only if g ≥ 1 (or else there exists a
global meromorphic function with exactly one zero and one pole, which gives an isomorphism
between C and $\mathbb{P}^1$).
We can extend the map j to n-tuples of points on C, namely
\[ j : C^n \to \mathrm{Pic}^0(C), \qquad j(P_1, ..., P_n) = j(P_1) + \cdots + j(P_n) = \left[ \sum_{i=1}^{n} (P_i) - n(P_0) \right] \]
Fact. $n \geq g \Rightarrow j(C^n) = \mathrm{Pic}^0(C)$. This fact uses Riemann-Roch.

18.3. Application of Riemann-Roch. Let C be a smooth projective curve of genus g,


and let D ∈ Div(C). We define the space
L(D) = {f ∈ k(C) : div(f ) + D ≥ 0}
and let ℓ(D) = dim L(D). The Riemann-Roch Theorem tells us when we can find a meromorphic function with prescribed zeros and poles.
Weak version. If deg(D) ≥ 2g + 1, then ℓ(D) = deg(D) − g + 1.
Strong Version. Let $K_C$ be a canonical divisor on C. Then
\[ \ell(D) - \ell(K_C - D) = \deg(D) - g + 1 \]
Let’s recall the definition of the canonical divisor. Take any non-zero differential 1-form ω on
C. At each point P ∈ C, let $x_P$ be a uniformizer, i.e. $\mathrm{ord}_P(x_P) = 1$. Write $\omega = f_P \, dx_P$ for some
$f_P \in k(C)$. This is possible as {meromorphic 1-forms} is 1-dimensional over k(C). Now define
$\mathrm{ord}_P(\omega) = \mathrm{ord}_P(f_P)$, and set $K_C := \mathrm{div}(\omega) = \sum_{P \in C} \mathrm{ord}_P(\omega) P$. Note that given another non-zero differential ω′, we have ω′ = gω for some g ∈ k(C). So div(ω′) = div(g) + div(ω), i.e.
$[K_C] = [\mathrm{div}(\omega)]$ is well-defined.
Corollary 1. When D = 0, Riemann-Roch gives $\ell(0) - \ell(K_C) = 0 - g + 1$. Since ℓ(0) = 1
(constant functions), this gives $\ell(K_C) = g$. So, dim({holomorphic 1-forms}) = g.
Corollary 2. When $D = K_C$, Riemann-Roch gives $\ell(K_C) - \ell(0) = \deg(K_C) - g + 1 \Rightarrow g - 1 = \deg(K_C) - g + 1 \Rightarrow \deg(K_C) = 2g - 2$.

18.4. More on Pic0 (C). Let C/K be a genus g ≥ 1 curve, and P0 ∈ C(K). We have defined
$\mathrm{Pic}^0(C)$ as
\[ \mathrm{Pic}^0(C) = \mathrm{Div}^0(C)/\sim \]
where ∼ stands for linear equivalence. Note that $\mathrm{Pic}^0(C)$, as defined, is just an abelian group.
We will soon see that Pic0 (C) is also a variety. There is a natural map j : C → Pic0 (C) given
by P 7→ [(P ) − (P0 )]. As we briefly saw before, we can use j to map multiple copies of C into
Pic0 (C). Indeed, j : C n → Pic0 (C) can be defined by (P1 , ..., Pn ) 7→ [(P1 )+...+(Pn )−n(P0 )].
Clearly, the image does not depend on the order of the points, as the addition in Div(C) is
commutative. Thus, we get a well-defined map C (n) → Pic0 (C) where C (n) := C n /Sn .
It turns out that $j(C^{(g)}) = \mathrm{Pic}^0(C)$, which will be proved by Riemann-Roch. In other
words, every divisor of degree 0 on C is linearly equivalent to one of the form $(P_1) + \cdots + (P_g) - g(P_0)$
for some points $P_1, ..., P_g$. Since $C^{(g)}$ is naturally a variety, the equality $j(C^{(g)}) = \mathrm{Pic}^0(C)$
allows one to equip $\mathrm{Pic}^0(C)$ with the structure of a variety. For this last assertion to make
perfect sense, we also need some sort of injectivity of the map j. This is in fact true: there
exists an open set $U \subset C^{(g)}$ such that $j|_U : U \hookrightarrow \mathrm{Pic}^0(C)$. This turns “most of” $\mathrm{Pic}^0(C)$ into
an algebraic variety, and then there is a general theorem of Weil that allows one to extend
to get an abelian variety structure on all of $\mathrm{Pic}^0(C)$. Let’s summarize our observations in a
theorem.
Theorem. (Three key facts about Pic0 ).
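(1) $j(C^{(g)}) = \mathrm{Pic}^0(C)$.
(2) There is a non-empty open set $U \subset C^{(g)}$ such that $j|_U : U \hookrightarrow \mathrm{Pic}^0(C)$.
(3) There exists an abelian variety $J = J_C = \mathrm{Jac}(C)$ such that
\[ C^{(g)} \xrightarrow[\text{birational}]{\ \sim\ } J \xrightarrow{\ \sim\ } \mathrm{Pic}^0(C). \]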

Proof. (1) Let [D] ∈ Pic0 (C). Consider the divisor g(P0 ) + D. By Riemann-Roch,
\[ \ell(g(P_0) + D) \geq \deg(g(P_0) + D) - g + 1 = g - g + 1 = 1 \]
Therefore, there is some f ∈ K(C) such that div(f ) + g(P0 ) + D ≥ 0. Let E = div(f ) +
g(P0 ) + D. Note that D ∼ E − g(P0 ), and deg(E) = g. See the book [HS00] for the proofs
of parts (2) and (3). 

Upshot. There exists an abelian variety J such that $j : C^g/S_g \to J$ is birational.
Let Θ := j(C g−1 ) ∈ Div(J) be the theta divisor. This is an irreducible divisor, as C is
irreducible.
Theorem. (Properties of the Theta divisor).
(1) [−1]^*Θ ∼ Θ_κ where Θ_κ = Θ − κ, or more formally, Θ_κ = T_κ^*Θ. In fact, we will see in
the proof that κ = j(K_C). We will also use the notation Θ^- := [−1]^*Θ.
(2) j^*Θ = Θ|_C ∼ g(P0) + κ, and j^*(Θ^-_z) ∼ g(P0) − z for z ∈ J. Moral: Θ|_C has degree g.
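(3) Recall that σ : J × J → J is the map (P, Q) ↦ P + Q, while π1 : J × J → J,
π2 : J × J → J are the usual projections. Then
\[ (j \times j)^* \underbrace{[\sigma^*\Theta - \pi_1^*\Theta - \pi_2^*\Theta]}_{\in \operatorname{Div}(J \times J)} \sim \underbrace{-\Delta + \underbrace{\{P_0\} \times C}_{p_1^*(P_0)} + \underbrace{C \times \{P_0\}}_{p_2^*(P_0)}}_{\in \operatorname{Div}(C \times C)} \]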

where p1 : C × C → C and p2 : C × C → C are the usual projections.


Remark. We should explain the notation Θ − κ. Note that κ = j(KC ) in our application.
Assuming g ≥ 1, we have ℓ(K_C) = g ≥ 1, so there exists an effective divisor which is
linearly equivalent to K_C. In other words, K_C ∼ \sum_{i=1}^{2g−2} (P_i) for some (not necessarily distinct) points P_1, ..., P_{2g−2} on C.
So we may define j(K_C) as an image under the map C^{2g−2} → J. Now that we have viewed
κ = j(KC ) as a point in J, we can rigorously define Θ − κ as the translation of Θ pointwise
by κ. In other words, Θ − κ = {Q − κ : Q ∈ Θ}.
Proof. (1) By definition,
Θ = j(C g−1 ) = {[(P1 ) + ... + (Pg−1 ) − (g − 1)(P0 )]}
Take some D ≥ 0, where deg(D) = g − 1. By Riemann-Roch, `(D) − `(KC − D) =
deg(D) − g + 1, so
`(KC − D) = `(D) − deg(D) + g − 1
Note that `(D) ≥ 1 as D ≥ 0, and − deg(D) + g − 1 = 0 by assumption, so we obtain
`(KC − D) ≥ 1, i.e. there exists f ∈ K(C) such that div(f ) + KC − D ≥ 0. Let E =
div(f ) + KC − D. Then E ≥ 0, and deg(E) = (2g − 2) − (g − 1) = g − 1. By definition,
−D ∼ E − KC
As a result,
\[ \underbrace{-(D - (g-1)(P_0))}_{\in \Theta^-} \sim \underbrace{\underbrace{(E - (g-1)(P_0))}_{\in \Theta} - (K_C - (2g-2)(P_0))}_{\in \Theta \text{ translated by } -j(K_C)} \]

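Since E is effective and deg(E) = g − 1, the divisor E is a sum of g − 1 points. Thus, the
class of E − (g − 1)(P0) lies in j(C^{g−1}) = Θ, which shows Θ^- ⊆ Θ − κ with κ = j(K_C). The
reverse inclusion follows by exchanging the roles of D and E.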
(2) Sketch: We have a surjection C^(g) ↠ J = Pic^0(C). There exists an open set U ⊆ C^(g),
U = {(P1, ..., Pg) : the Pi are distinct and ℓ((P1) + (P2) + ... + (Pg)) = 1}
It is an exercise to show that the condition ℓ((P1) + (P2) + · · · + (Pg)) = 1 ensures that
j is injective on U.
Consider the special case −z ∈ j(U), where −z = [(Q1) + · · · + (Qg) − g(P0)]. Note
that P ∈ Support(j^*(Θ^-_z)) if and only if j(P) ∈ Θ^-_z if and only if

(P ) − (P0 ) ∼ [(g − 1)(P0 ) − ((P1 ) + ... + (Pg−1 ))] − z


for some P1 , ...., Pg−1 , i.e.
(P ) + (P1 ) + ... + (Pg−1 ) ∼ (Q1 ) + ... + (Qg )
Since −z ∈ j(U), the points Q1, ..., Qg are all distinct and ℓ((Q1) + ... + (Qg)) = 1, which forces
(P) + (P1) + ... + (P_{g−1}) = (Q1) + ... + (Qg)
and so P ∈ {Q1, ..., Qg}. So Support(j^*(Θ^-_z)) ⊆ {Q1, ..., Qg}. Conversely, each (Qj)
appears on the left-hand side as some (Pi). Interchanging the roles of P and Pi,
and running the argument backwards, we get that Qj = Pi ∈ Support(j^*(Θ^-_z)). Thus,
Support(j^*(Θ^-_z)) = {Q1, ..., Qg}. Then
j^*(Θ^-_z) = (Q1) + ... + (Qg) ∼ g(P0) − z

Next, Θ_z = Θ^-_{z−j(K_C)} by (1), and so

j^*(Θ_z) = j^*(Θ^-_{z−j(K_C)}) ∼ g(P0) − z + j(K_C) = g(P0) − z + κ

for all z ∈ j(U ). For general z ∈ J, see the book [HS00].


19. The Seesaw Principle
Let C be a smooth curve of genus g ≥ 1, and let P0 ∈ C(k) be a point. We have talked about
the map j : C ↪ J ≅ Pic^0(C) given by P ↦ [(P) − (P0)]. We have also defined the theta
divisor:
\[ \Theta := \underbrace{j(C) + j(C) + \cdots + j(C)}_{g-1 \text{ times}} \in \operatorname{Div}(J) \]

The goal of this section is to prove the following linear equivalence:


\[ (j \times j)^*(\sigma^*\Theta - \pi_1^*\Theta - \pi_2^*\Theta) \sim -\Delta + (C \times \{P_0\}) + (\{P_0\} \times C) \tag{12} \]
Here σ : J × J → J is the addition map, and j × j : C × C → J × J is the natural map
(P, Q) ↦ (j(P), j(Q)).
The proof needs the following result from the theory of abelian varieties.
Seesaw Principle. Let D ∈ Div(X × Y ) be a divisor. Suppose that:
• For every x1 ∈ X, D|_{{x1}×Y} ∼ 0 in Div(Y).
• There exists y1 ∈ Y such that D|_{X×{y1}} ∼ 0 in Div(X).
Then D ∼ 0 in Div(X × Y).
Proof idea. For every x1 ∈ X, there exists some function f_{x1} ∈ k(Y) such that div(f_{x1}) =
D|_{{x1}×Y}. Look at the map
X × Y → P^1
(x, y) ↦ f_x(y)
By letting f(x, y) := f_x(y), we have f(x, y) ∈ k(X × Y). Here is a fact:
div_{X×Y}(f(x, y)) = D + E × Y
for some E ∈ Div(X). Restricting this identity to the slice X × {y1},
div_{X×Y}(f(x, y))|_{X×{y1}} = D|_{X×{y1}} + (E × Y)|_{X×{y1}}
By hypothesis, D|_{X×{y1}} ∼ 0 and so we get
div_{X×Y}(f(x, y))|_{X×{y1}} ∼ (E × Y)|_{X×{y1}}
or equivalently,
div_X(f(x, y1)) ∼ E
This means that E ∼ 0, and so E = div(g(x)) for some g(x) ∈ k(X). Substituting this back,
we obtain
div_{X×Y}(f(x, y)) ∼ D + div_{X×Y}(g(x))
Thus, D ∼ div_{X×Y}(f(x, y)g(x)^{−1}) ∼ 0.
We proceed to prove the key linear equivalence (12). We will show that the difference
of the two sides is ∼ 0 when restricted to C × {P1 } for every P1 ∈ C. By symmetry, the
difference will also be ∼ 0 when restricted to {P1 } × C. The Seesaw Principle will then
complete the proof.
Fix a point P1 on the curve C. Define ι : C → C × C by P ↦ (P, P1). Note that for any
divisor D ∈ Div(C × C), we have ι^*D ∼ D|_{C×{P1}}. So, we want to check
ι^*(j × j)^*σ^*Θ − ι^*(j × j)^*π1^*Θ − ι^*(j × j)^*π2^*Θ ∼ −ι^*∆ + ι^*(C × {P0}) + ι^*({P0} × C)
Using functoriality of the pullback, this is equivalent to checking:
(σ◦(j×j)◦ι)^*Θ − (π1◦(j×j)◦ι)^*Θ − (π2◦(j×j)◦ι)^*Θ ∼ −ι^*∆ + ι^*(C × {P0}) + ι^*({P0} × C)
We will first focus on the left hand side. Note that
σ ◦ (j × j) ◦ ι(P ) = σ ◦ (j × j)(P, P1 ) = j(P ) + j(P1 ) = Tj(P1 ) ◦ j(P )
π1 ◦ (j × j) ◦ ι(P ) = π1 ◦ (j × j)(P, P1 ) = j(P )
π2 ◦ (j × j) ◦ ι(P ) = π2 ◦ (j × j)(P, P1 ) = j(P1 )
So the left hand side becomes
\begin{align*}
j^* T_{j(P_1)}^*(\Theta) - j^*(\Theta) - 0 &\sim j^*(\Theta - j(P_1)) - j^*(\Theta) \\
&\sim g(P_0) - j(P_1) + \kappa - (g(P_0) + \kappa) \\
&\sim -j(P_1) = -[(P_1) - (P_0)] = [(P_0) - (P_1)]
\end{align*}
The right hand side is the same:
−ι∗ ∆ + ι∗ (C × {P0 }) + ι∗ ({P0 } × C) ∼ −(P1 ) + 0 + (P0 ) = [(P0 ) − (P1 )]
as desired.

20. Mumford’s Theorem


The goal is to prove the following result due to Mumford:
#{P ∈ C(K) : h(P ) ≤ B} ≤ c log(B)
for some constant c > 0. It is worthwhile to compare this with the corresponding result for
the Jacobian variety:
\[ \#\{P \in J(K) : h(P) \le B\} \approx B^{\frac{1}{2}\operatorname{rank}(J(K))} \]
We have the canonical height function associated to the divisor Θ, namely ĥJ,Θ : J(K) → R.
Theorem. The theta divisor Θ is ample.
For z ∈ J(K), let ||z|| := \sqrt{ĥ_{J,Θ}(z)}. We also have the inner product:
\[ \langle z_1, z_2 \rangle := \tfrac{1}{2}\left( \|z_1 + z_2\|^2 - \|z_1\|^2 - \|z_2\|^2 \right) \]
Here we really ought to be thinking of z ∈ J(K) ⊗Z R.
During the course of the proof, we will make a simplification, which is actually false (i.e.
almost never happens). The advantage is that the analysis becomes cleaner, and one can
then go back and fix the argument in a straightforward manner. Basically, we would rather
see the big picture (under the false, but morally right, assumption) rather than get lost in
the technical details from the very beginning.
Simplification. Θ− ∼ Θ (false)
The key linear equivalence (12) converts into a statement about height functions:
ĥJ,Θ (j(P1 ) + j(P2 )) − ĥJ,Θ (j(P1 )) − ĥJ,Θ (j(P2 )) = −h∆ (P1 , P2 ) + h(P0 ) (P1 ) + h(P0 ) (P2 ) + O(1)
If P1 ≠ P2, then h_∆(P1, P2) ≥ O(1), because ∆ is an effective divisor and (P1, P2) does not lie
on it (heights attached to effective divisors are bounded below off the base locus). This bound
can fail if P1 = P2. Using our simplifying assumption, j^*(Θ) = j^*(Θ^-) ∼ g(P0), we know that
g·h_{(P0)}(P) = ĥ_{J,Θ}(j(P)) + O(1). Thus,
\[ \|j(P_1) + j(P_2)\|^2 - \|j(P_1)\|^2 - \|j(P_2)\|^2 \le \frac{1}{g}\|j(P_1)\|^2 + \frac{1}{g}\|j(P_2)\|^2 + O(1) \]

For simplicity, we will use ||P|| := ||j(P)||. Let V := J(K) ⊗ R ≅ R^r where r = rank J(K).
Note that ker(J(K) → V ) = J(K)tors . Let S be the image of C(K) in V . We might as well
focus on bounding the set S.
We have shown that for P, Q ∈ S with P ≠ Q, we have
\[ \|P + Q\|^2 - \|P\|^2 - \|Q\|^2 \le \frac{1}{g}\left( \|P\|^2 + \|Q\|^2 \right) \]
where we are ignoring the constant O(1) at the end for simplicity. Using the parallelogram
law, we get an equivalent formulation:
\[ \|P\|^2 + \|Q\|^2 - \|P - Q\|^2 \le \frac{1}{g}\left( \|P\|^2 + \|Q\|^2 \right) \]
Consequently,
\[ \|P - Q\|^2 \ge \left( 1 - \frac{1}{g} \right)\left( \|P\|^2 + \|Q\|^2 \right) \]
This inequality is only useful when the points P and Q have similar heights. In this
case, it forces the points P and Q to be far apart. We will prove a more general result that
holds for any lattice in a finite-dimensional vector space. Our application will be the special
case when L is the image of J(K) in V and S is the image of C(K).
Proposition. Let (V, || · ||) be a finite-dimensional real vector space equipped with the
Euclidean norm || · ||. Let S ⊂ L ⊂ V where L is a lattice in V . Suppose there exists α > 0
such that
\[ \forall x, y \in S \text{ with } x \ne y, \quad \|x - y\|^2 \ge \alpha\left( \|x\|^2 + \|y\|^2 \right) \]
Then
#{x ∈ S : ||x|| ≤ R} ≤ c log R
for some constant c > 0.
Compare this with:
\[ \#\{x \in L : \|x\| \le R\} \sim c_L \cdot R^r \]
where r = dim(V ) and cL depends on the fundamental domain of the lattice L.
The idea behind the proof is to count the points in S carefully by considering points on
the annulus.
Proof. For u ≤ v, let
S(u, v) = {x ∈ S : u < ||x|| ≤ v} = S ∩ (B0 (v) \ B0 (u))
where 0 is the zero vector of V . Here we are employing the notation:
Bx0 (R) = {x ∈ V : ||x − x0 || ≤ R}
for a closed ball of radius R centered at x0. If x, y ∈ S(u, v) and x ≠ y, then ||x − y||² ≥
α(||x||² + ||y||²) > 2αu², which implies ||x − y|| > √(2α)·u. As a result,
B_x(βu) ∩ B_y(βu) = ∅
where β = (1/2)√(2α) = √(α/2). Using x ∈ B_0(v), we get B_x(βu) ⊆ B_0(v + βu). Thus,
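\[ B_0(v + \beta u) \supseteq \bigcup_{x \in S(u,v)} B_x(\beta u) \]
where the union on the right is in fact a disjoint union because B_x(βu) ∩ B_y(βu) = ∅ for any
x, y ∈ S(u, v) with x ≠ y. By taking the volume of both sides,
\begin{align*}
\operatorname{vol}(B_0(v + \beta u)) &\ge \sum_{x \in S(u,v)} \operatorname{vol}(B_x(\beta u)) \\
\Rightarrow\ \operatorname{vol}(B_0(1))\,(v + \beta u)^r &\ge \#S(u,v) \cdot \operatorname{vol}(B_0(1))\,(\beta u)^r \\
\Rightarrow\ (v + \beta u)^r &\ge \#S(u,v)\,(\beta u)^r
\end{align*}
We deduce that
\[ \#S(u,v) \le \left( \frac{v + \beta u}{\beta u} \right)^r = \left( 1 + \frac{v}{\beta u} \right)^r \]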
This is a very strong condition. The bound depends only on the ratio of the radii in the
annulus. We now use a dyadic trick:
 
\[ \#\{x \in S : \|x\| \le R\} \le \left( \sum_{k=0}^{\lfloor \log_2 R \rfloor} \#S(2^k, 2^{k+1}) \right) + \#\{x \in S : \|x\| \le 1\} \]

Note that the second piece is at most #{x ∈ L : ||x|| ≤ 1}, which is finite as L is a lattice,
and so it is discrete. We will focus on the first piece:
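\[ \sum_{k=0}^{\lfloor \log_2 R \rfloor} \#S(2^k, 2^{k+1}) \le \sum_{k=0}^{\lfloor \log_2 R \rfloor} \left( 1 + \frac{2^{k+1}}{\beta 2^k} \right)^r \le (\log_2 R + 1) \cdot \left( 1 + \frac{2}{\beta} \right)^r \]
As β = √(α/2) and r are both constants, this finishes the proof of the proposition.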
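The packing bound #S(u, v) ≤ (1 + v/(βu))^r can be checked numerically on a toy example. The sketch below is only an illustration under assumptions that are not in the notes: we take V = R² and L = Z², fix α = 1/2, and build a set S by a greedy procedure (the helper names `greedy_gap_set` and `annulus_count` are hypothetical). It then counts the points of S in the dyadic annuli S(2^k, 2^{k+1}) and compares with (1 + 2/β)^r.

```python
from math import sqrt

# Illustrative parameters (assumptions of this sketch, not from the notes).
ALPHA_NUM, ALPHA_DEN = 1, 2   # alpha = 1/2, kept as a ratio for exact integer tests
RADIUS = 32                   # only lattice points with ||x|| <= RADIUS are considered
r = 2                         # dimension of V = R^2


def greedy_gap_set(radius):
    """Greedily pick points of Z^2 that pairwise satisfy
    ||x - y||^2 >= alpha * (||x||^2 + ||y||^2), scanning by increasing norm."""
    points = [(a, b)
              for a in range(-radius, radius + 1)
              for b in range(-radius, radius + 1)
              if a * a + b * b <= radius * radius]
    points.sort(key=lambda p: (p[0] ** 2 + p[1] ** 2, p))
    chosen = []
    for (a, b) in points:
        if all(ALPHA_DEN * ((a - c) ** 2 + (b - d) ** 2)
               >= ALPHA_NUM * (a * a + b * b + c * c + d * d)
               for (c, d) in chosen):
            chosen.append((a, b))
    return chosen


def annulus_count(points, u, v):
    """Number of points x with u < ||x|| <= v, compared via squared norms."""
    return sum(1 for (a, b) in points if u * u < a * a + b * b <= v * v)


beta = sqrt(ALPHA_NUM / (2 * ALPHA_DEN))   # beta = sqrt(alpha / 2)
bound = (1 + 2 / beta) ** r                # (1 + v/(beta*u))^r with v = 2u

S = greedy_gap_set(RADIUS)
counts = [annulus_count(S, 2 ** k, 2 ** (k + 1)) for k in range(5)]
print("points in dyadic annuli:", counts, "bound per annulus:", bound)
```

Each annulus count stays below the same constant, so the total number of points of norm at most R grows at most like a constant times log R, which is the content of the proposition.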
Let us address a few steps where we were not accurate, and how to fix them.
Fudge. 1) Rather than ||x − y||2 ≥ α(||x||2 + ||y||2 ), one should use ||x − y||2 ≥ α(||x||2 +
||y||2 ) + O(1). This is not a huge problem, since we can just replace α with something else,
i.e. we can instead use ||x − y||2 ≥ α̃(||x||2 + ||y||2 ) for a suitably chosen α̃ < α.
2) In reality Θ^- is not linearly equivalent to Θ in general, so Θ is not a symmetric divisor. So we
should instead apply the argument with the divisor D = Θ + Θ^- ∼ Θ + Θ_κ, so the correct inequality should be
\[ \|x - y\|^2 \ge \left( 1 - \frac{1}{g} \right)\left( \|x\|^2 + \|y\|^2 \right) - C_1\left( \|x\| + \|y\| + 1 \right) \]
In our analysis we did not have the linear term ||x|| + ||y|| + 1. Using ||x + κ||² = ||x||² +
2⟨x, κ⟩ + ||κ||², and completing the square, we can convert this into a version ||x − y||² ≥
α′(||x||² + ||y||²) for some α′ > 0.
We have finally proven the theorem due to Mumford:
Theorem (Mumford). If C is a smooth curve of genus g ≥ 2 over a number field K,
then there is a constant c > 0 such that:
#{P ∈ C(K) : h_D(P) ≤ R} ≤ c log R
where D is any ample divisor on C.
How do different height functions on a curve relate to each other?
Proposition. Let D1 , D2 ∈ Div(C) where D2 is ample. Then
\[ \lim_{h_{D_2}(P) \to \infty} \frac{h_{D_1}(P)}{h_{D_2}(P)} = \frac{\deg(D_1)}{\deg(D_2)} \]
where the limit is taken over points P ∈ C(K̄).
Proof. First, we will prove the special case when E = D1 satisfies deg(E) = 0, writing D = D2. By
Riemann-Roch, if D ∈ Div(C) satisfies deg(D) ≥ 2g + 1, then D is very ample. It follows
that D ∈ Div(C) is ample if and only if deg(D) ≥ 1. We also know that
D is ample ⇒ hD (P ) ≥ O(1)
Take any integer n ≥ 1. Then
deg(D + nE) = deg(D) ≥ 1 and deg(D − nE) = deg(D) ≥ 1
So D + nE and D − nE are ample, and therefore,
hD+nE (P ) ≥ O(1) and hD−nE (P ) ≥ O(1)
Using functoriality, we obtain
hD (P ) ≥ −nhE (P ) + O(1)
hD (P ) ≥ nhE (P ) + O(1)
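Dividing both inequalities by n·h_D(P), which is positive once h_D(P) is large, we get
\[ -\frac{1}{n} \le \frac{h_E(P)}{h_D(P)} + O\!\left( \frac{1}{h_D(P)} \right) \]
\[ \frac{1}{n} \ge \frac{h_E(P)}{h_D(P)} + O\!\left( \frac{1}{h_D(P)} \right) \]
We have shown that:
\[ -\frac{1}{n} \le \frac{h_E(P)}{h_D(P)} + O\!\left( \frac{1}{h_D(P)} \right) \le \frac{1}{n} \]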
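Letting h_D(P) → ∞,
\[ -\frac{1}{n} \le \liminf_{h_D(P) \to \infty} \frac{h_E(P)}{h_D(P)} \le \limsup_{h_D(P) \to \infty} \frac{h_E(P)}{h_D(P)} \le \frac{1}{n} \]
Since n ≥ 1 was arbitrary, we deduce that
\[ \lim_{h_D(P) \to \infty} \frac{h_E(P)}{h_D(P)} = 0 \]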

which proves the special case when deg(E) = 0.


General Case. Let d1 = deg(D1 ), and d2 = deg(D2 ). We will apply the special case
with E = d2 D1 − d1 D2 and D = D2 . Note that deg(E) = 0. By using the conclusion of the
special case,
\[ \lim_{h_{D_2}(P) \to \infty} \frac{d_2 h_{D_1}(P) - d_1 h_{D_2}(P)}{h_{D_2}(P)} = 0 \]
This translates into:
\[ \lim_{h_{D_2}(P) \to \infty} \left( d_2 \frac{h_{D_1}(P)}{h_{D_2}(P)} - d_1 \right) = 0 \ \Rightarrow\ \lim_{h_{D_2}(P) \to \infty} \frac{h_{D_1}(P)}{h_{D_2}(P)} = \frac{d_1}{d_2} \]
and we are done.
In fact, a stronger statement is true. One can apparently show that h_E(P) ≤ c·\sqrt{h_D(P)} when
E is a degree 0 divisor, and D is an ample divisor.

21. Diophantine Approximation


Motivation. Let’s say that we want to solve x^n − y^n = A where n ≥ 2 and A ∈ Z with
A ≠ 0. So, we want to find all pairs (x, y) ∈ Z² which satisfy this equation. One idea is to
use factorization:
(x − y)(x^{n−1} + x^{n−2}y + · · · + y^{n−1}) = A
Let B = x − y and C = x^{n−1} + · · · + y^{n−1} so that A = BC. The set {(B, C) : A = BC} is
finite. For each potential pair (B, C), one can substitute x = y + B into x^{n−1} + · · · + y^{n−1} = C to
get (y + B)^{n−1} + · · · + y^{n−1} = C, a polynomial equation of degree n − 1 in y, which has at most
n − 1 solutions. So, in this example the solution set can be checked in finite time (in theory).
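The factorization method can be made concrete in the case n = 3. The following is a minimal sketch under that assumption (the notes treat general n): for each divisor B of A we set C = A/B and solve the quadratic 3y² + 3By + (B² − C) = 0 obtained from x = y + B. The function names are hypothetical.

```python
from math import isqrt


def divisors(a):
    """Positive divisors of the nonzero integer a."""
    a = abs(a)
    small = [d for d in range(1, isqrt(a) + 1) if a % d == 0]
    return sorted(set(small + [a // d for d in small]))


def cubic_difference_solutions(A):
    """All integer pairs (x, y) with x^3 - y^3 = A, for A != 0."""
    solutions = set()
    for d in divisors(A):
        for B in (d, -d):
            C = A // B                      # exact, since B divides A
            # With x = y + B: x^2 + x*y + y^2 = 3y^2 + 3By + B^2 = C.
            disc = 12 * C - 3 * B * B       # discriminant of 3y^2 + 3By + (B^2 - C)
            if disc < 0:
                continue
            s = isqrt(disc)
            if s * s != disc:
                continue
            for numerator in (-3 * B + s, -3 * B - s):
                if numerator % 6 == 0:      # y = numerator / 6 must be an integer
                    y = numerator // 6
                    solutions.add((y + B, y))
    return solutions


print(sorted(cubic_difference_solutions(19)))
```

For A = 19 the only divisor pair that yields integer roots is (B, C) = (1, 19), giving the two solutions (3, 2) and (−2, −3).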
In contrast, x² − 2y² = A often has infinitely many solutions.
Perhaps more surprisingly, it turns out that x³ − 2y³ = A has finitely many solutions.
This example explains some of the motivation for diophantine approximation. Here is the
idea. Let’s assume A > 0 for simplicity. We can still factorize:
\[ (x - \sqrt[3]{2}\,y)(x^2 + \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2) = A \]

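The second factor can be completed to a square:
\[ x^2 + \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2 = \left( x + \tfrac{1}{2}\sqrt[3]{2}\,y \right)^2 + \tfrac{3}{4}\sqrt[3]{4}\,y^2 \]
Let C := \tfrac{3}{4}\sqrt[3]{4}. It follows that
\[ x^2 + \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2 \ge C y^2 \]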

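It follows that
\[ |x - \sqrt[3]{2}\,y| = \frac{A}{|x^2 + \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2|} \le \frac{A}{C y^2} \]
Consequently, for y ≠ 0,
\[ \left| \frac{x}{y} - \sqrt[3]{2} \right| \le \frac{A}{C |y|^3} \]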

This last inequality says that x/y ∈ Q is very close to ∛2. According to the theorem of
Axel Thue, there are only finitely many solutions x/y ∈ Q with gcd(x, y) = 1 satisfying
this inequality. From now on, when we write x/y ∈ Q we will implicitly assume that
gcd(x, y) = 1.
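As a sanity check on the inequality |x/y − ∛2| ≤ A/(C|y|³), here is a small brute-force sketch. The search box `bound` and the function names are assumptions made only for illustration; the code lists integer solutions of x³ − 2y³ = A inside the box and compares both sides in floating point.

```python
CBRT2 = 2 ** (1 / 3)
C = 0.75 * 4 ** (1 / 3)   # C = (3/4) * cbrt(4), as in the text


def solutions_in_box(A, bound):
    """Integer solutions of x^3 - 2*y^3 = A with |x|, |y| <= bound (brute force)."""
    return [(x, y)
            for y in range(-bound, bound + 1)
            for x in range(-bound, bound + 1)
            if x ** 3 - 2 * y ** 3 == A]


def approximation_error(x, y):
    """Left-hand side |x/y - cbrt(2)| for y != 0."""
    return abs(x / y - CBRT2)


def upper_bound(A, y):
    """Right-hand side A / (C * |y|^3)."""
    return A / (C * abs(y) ** 3)


for A in (1, 6, 25):
    for (x, y) in solutions_in_box(A, 40):
        if y != 0:
            print(A, (x, y), approximation_error(x, y), "<=", upper_bound(A, y))
```

The solutions with larger |y| are the interesting ones: the right-hand side decays like |y|^{-3}, which is faster than the exponent 2.5 + ε allowed by Thue's theorem for d = 3.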
More generally, Thue was able to show that if α is an irrational algebraic number with
degree [Q(α) : Q] = d, then for any ε > 0, the inequality
\[ \left| \frac{x}{y} - \alpha \right| \le \frac{1}{y^{d/2 + 1 + \varepsilon}} \]
is satisfied for only finitely many rational numbers x/y ∈ Q with y > 0. In our application
above, d = 3, and so d/2 + 1 + ε = 2.5 + ε.
Diophantine approximation is the theory of how closely rational numbers can approximate
irrational numbers.
Theorem. (Dirichlet) Let α ∈ R \ Q. There are infinitely many rational numbers x/y ∈ Q
with
\[ \left| \frac{x}{y} - \alpha \right| \le \frac{1}{y^2} \]
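Proof. We will apply the pigeonhole principle. Let {t} := t − ⌊t⌋. Choose B ∈ N. Look at
{0}, {α}, {2α}, ..., {Bα}. These will be the pigeons. Since there are B + 1 pigeons, we need
B pigeonholes. Look at the intervals:
\[ \left[ 0, \tfrac{1}{B} \right),\ \left[ \tfrac{1}{B}, \tfrac{2}{B} \right),\ \ldots,\ \left[ \tfrac{B-1}{B}, 1 \right) \]
These are the pigeonholes. By the pigeonhole principle, there exist 0 ≤ x1 < x2 ≤ B, and
0 ≤ i < B such that
\[ \{x_1\alpha\},\ \{x_2\alpha\} \in \left[ \tfrac{i}{B}, \tfrac{i+1}{B} \right) \]
Consequently,
\[ |\{x_2\alpha\} - \{x_1\alpha\}| \le \frac{1}{B} \]
which translates into:
\[ |x_2\alpha - \lfloor x_2\alpha \rfloor - x_1\alpha + \lfloor x_1\alpha \rfloor| \le \frac{1}{B} \]
\[ |(x_2 - x_1)\alpha - (\lfloor x_2\alpha \rfloor - \lfloor x_1\alpha \rfloor)| \le \frac{1}{B} \]
Letting y = x2 − x1 and x = ⌊x2α⌋ − ⌊x1α⌋, we get |yα − x| ≤ 1/B. Dividing both sides by
|y|, we obtain
\[ \left| \alpha - \frac{x}{y} \right| \le \frac{1}{B|y|} \le \frac{1}{y^2} \]
since |y| = |x2 − x1| ≤ B. Since α is irrational, |yα − x| > 0, so letting B → ∞ produces
infinitely many distinct fractions x/y.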
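The pigeonhole argument is constructive, and can be sketched in a few lines. This is a floating-point illustration only (so α is a float approximation and B should stay moderate); the function name `dirichlet_approximation` is hypothetical.

```python
from math import floor, sqrt


def dirichlet_approximation(alpha, B):
    """Return (x, y) with 1 <= y <= B and |y*alpha - x| <= 1/B,
    found by placing the fractional parts {k*alpha}, k = 0..B, into B holes."""
    holes = {}                                   # hole index -> first k that landed there
    for k in range(B + 1):
        frac = k * alpha - floor(k * alpha)      # the pigeon {k*alpha}
        i = min(int(frac * B), B - 1)            # hole [i/B, (i+1)/B); guard rounding
        if i in holes:
            x1, x2 = holes[i], k                 # two pigeons share a hole
            y = x2 - x1
            x = floor(x2 * alpha) - floor(x1 * alpha)
            return x, y
        holes[i] = k
    raise AssertionError("unreachable: B + 1 pigeons cannot fit in B holes")


alpha = sqrt(2)
for B in (1, 10, 100, 1000):
    x, y = dirichlet_approximation(alpha, B)
    print(B, (x, y), abs(alpha - x / y), "<=", 1 / y ** 2)
```

Increasing B forces |yα − x| ≤ 1/B to shrink, which is how the proof produces infinitely many distinct fractions.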
Remark. One of the improvements of Dirichlet’s theorem is a result due to Hurwitz,
which states that for α ∈ R \ Q, there are infinitely many solutions to:
\[ \left| \frac{x}{y} - \alpha \right| \le \frac{1}{\sqrt{5}\,y^2} \]
The constant √5 cannot be replaced by any bigger constant, so the result is optimal in a
certain sense. This √5 comes from attempting to approximate the golden ratio α = (1 + √5)/2. In
some sense, the golden ratio is the real number that is hardest to approximate by rational
numbers.
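To see where √5 comes from, one can look at the ratios of consecutive Fibonacci numbers, which are the natural rational approximations to the golden ratio. The sketch below (an illustration, with hypothetical helper names and an arbitrary working precision of 60 digits) computes y²·|α − x/y| for x/y = F_{n+1}/F_n and observes that it approaches 1/√5.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60              # working precision, an arbitrary choice for this sketch
SQRT5 = Decimal(5).sqrt()
PHI = (1 + SQRT5) / 2               # the golden ratio


def fibonacci_pairs(count):
    """Yield (F_{n+1}, F_n) for n = 1, ..., count, with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(count):
        yield b, a
        a, b = b, a + b


def scaled_error(x, y):
    """y^2 * |phi - x/y|, the quantity bounded by 1/sqrt(5) in Hurwitz's theorem."""
    return y * y * abs(PHI - Decimal(x) / Decimal(y))


errors = [scaled_error(x, y) for (x, y) in fibonacci_pairs(30)]
print("last scaled error:", errors[-1])
print("1/sqrt(5)        :", 1 / SQRT5)
```

The scaled errors oscillate around 1/√5 ≈ 0.4472 and converge to it, which is consistent with the claim that the constant √5 cannot be improved for the golden ratio.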
Theorem. (Liouville) Let α ∈ Q̄ \ Q and d = [Q(α) : Q]. There exists a constant C(α) > 0,
which depends on α, such that for all x/y ∈ Q,
\[ \left| \frac{x}{y} - \alpha \right| \ge \frac{C(\alpha)}{|y|^d} \]
In particular, for every ε > 0 there are only finitely many rational numbers
x/y ∈ Q satisfying |x/y − α| < 1/|y|^{d+ε}.
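Proof. We may assume that α ∈ Q̄ ∩ R. Let f(T) = a0 T^d + a1 T^{d−1} + · · · + ad ∈ Z[T]
be the minimal polynomial of α. Note that f(α) = 0 but f′(α) ≠ 0. We can write f(T) =
(T − α)g(T) where g(T) ∈ R[T] and g(α) ≠ 0. For any x/y ∈ Q,
\[ f\!\left( \frac{x}{y} \right) = \left( \frac{x}{y} - \alpha \right) \cdot g\!\left( \frac{x}{y} \right) \]
Since f is irreducible of degree d ≥ 2, it has no rational roots, so the numerator below is a
nonzero integer and we have
\[ 0 \ne \left| f\!\left( \frac{x}{y} \right) \right| = \frac{|a_0 x^d + a_1 x^{d-1} y + \cdots + a_d y^d|}{|y|^d} \ge \frac{1}{|y|^d} \]
Consequently,
\[ \left| \frac{x}{y} - \alpha \right| \cdot \left| g\!\left( \frac{x}{y} \right) \right| \ge \frac{1}{|y|^d} \ \Rightarrow\ \left| \frac{x}{y} - \alpha \right| \ge \frac{1}{|y|^d} \cdot \frac{1}{|g(x/y)|} \]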
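So we just need an upper bound for |g(x/y)|. Either |x/y − α| > 1, in which case we are done
(any C(α) ≤ 1 works). So, we may assume that |x/y − α| ≤ 1, and
\[ \left| g\!\left( \frac{x}{y} \right) \right| \le \sup_{|t - \alpha| \le 1} |g(t)| = C(\alpha)^{-1} \]
for some C(α) > 0, which we shrink if necessary so that C(α) ≤ 1. Here we used the fact that
g(α) ≠ 0, so the supremum is positive, and the fact that g is a continuous function, so |g|
achieves a maximum on the compact set {t ∈ R : |t − α| ≤ 1}.
Combining the inequalities, we obtain
\[ \left| \frac{x}{y} - \alpha \right| \ge \frac{1}{|y|^d} \cdot \frac{1}{|g(x/y)|} \ge \frac{C(\alpha)}{|y|^d} \]
as desired.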
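The constant in the proof can be made explicit in a simple case. As an illustration (our choice, not from the notes), take α = √2, so f(T) = T² − 2, d = 2 and g(T) = T + √2. On |t − α| ≤ 1 the supremum of |g| is 2√2 + 1, giving C(α) = 1/(2√2 + 1) ≈ 0.26. The sketch below checks the resulting bound numerically for small denominators; the function name is hypothetical.

```python
from math import sqrt

ALPHA = sqrt(2)                    # root of f(T) = T^2 - 2, so d = 2
# g(T) = T + sqrt(2); on |t - alpha| <= 1 its maximum is attained at t = alpha + 1.
C_ALPHA = 1 / (2 * ALPHA + 1)


def liouville_margin(y_max):
    """Smallest value of y^2 * |x/y - sqrt(2)| over fractions with 1 <= y <= y_max."""
    best = float("inf")
    for y in range(1, y_max + 1):
        x = round(y * ALPHA)       # the nearest numerator minimises the distance for this y
        best = min(best, y * y * abs(x / y - ALPHA))
    return best


print("C(alpha)        =", C_ALPHA)
print("observed minimum =", liouville_margin(2000))
```

The observed minimum stays above C(α), as Liouville's inequality |x/y − α| ≥ C(α)/y² predicts for d = 2.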
Corollary. The number β = \sum_{n=0}^{∞} 1/10^{n!} is in R \ Q̄, i.e. it is a transcendental number.
Proof sketch. Let x_N/y_N = \sum_{n=0}^{N} 1/10^{n!} be the partial sums. If β were algebraic with degree
d = [Q(β) : Q], it would satisfy Liouville’s theorem. So,
\[ \frac{C(\beta)}{y_N^d} \le \left| \frac{x_N}{y_N} - \beta \right| \le \frac{\text{constant}}{y_N^{f(N)}} \]
where f is some function that satisfies f(N) → ∞ as N → ∞. This contradicts the inequality
above for values of N that are much bigger than d.
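The contradiction in the proof sketch can be observed with exact rational arithmetic. In the sketch below, β is replaced by its truncation after M = 7 terms (an assumption made so that everything is a finite computation), y_N = 10^{N!}, and we watch y_N^d · |x_N/y_N − β| collapse to 0 for a fixed d. The names `partial_sum` and `scaled_tail` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def partial_sum(N):
    """x_N / y_N = sum of 1/10^(n!) for n = 0..N, as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(n)) for n in range(N + 1))


M = 7                               # truncation level standing in for beta (assumption)
BETA_PROXY = partial_sum(M)


def scaled_tail(N, d):
    """y_N^d * |x_N/y_N - beta| with y_N = 10^(N!) and beta replaced by BETA_PROXY."""
    y_N = 10 ** factorial(N)
    return y_N ** d * abs(BETA_PROXY - partial_sum(N))


d = 3                               # a hypothetical degree for beta
for N in range(2, 6):
    print(N, float(scaled_tail(N, d)))
```

If β were algebraic of degree d, Liouville's theorem would keep this quantity above the fixed positive constant C(β), so its rapid decay is exactly the contradiction used in the proof sketch.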
As we mentioned above, Thue improved Liouville’s theorem by showing that for every
α ∈ Q̄ \ Q with degree d = [Q(α) : Q], and ε > 0, there exists a constant C(α, ε) > 0
depending only on α and ε such that
\[ \left| \frac{x}{y} - \alpha \right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{d/2 + 1 + \varepsilon}} \]
for all rational numbers x/y ∈ Q. The bound was later improved by Siegel to
\[ \left| \frac{x}{y} - \alpha \right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{2\sqrt{d} + 1 + \varepsilon}} \]
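Dyson and Gelfond independently proved a slightly stronger statement:
\[ \left| \frac{x}{y} - \alpha \right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{\sqrt{2d} + 1 + \varepsilon}} \]
In his paper, Dyson suggested that maybe \sqrt{2d} can be replaced with \sqrt[k]{kd} for any k ≥ 1.
By letting k → ∞, we would get \sqrt[k]{kd} → 1. This goal was achieved by Roth. Thus, Roth’s
theorem is the inequality:
\[ \left| \frac{x}{y} - \alpha \right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{2 + \varepsilon}} \]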
The exponent 2 + ε is the best possible in view of Dirichlet’s theorem.
The improvements of Liouville’s theorem above are not effective, i.e. the proofs do not furnish
any explicit value for C(α, ε). An effective improvement of Liouville’s theorem was proved by Baker, who
showed that there are constants δ(α) > 0 and C(α) > 0 such that:
\[ \left| \frac{x}{y} - \alpha \right| \ge \frac{C(\alpha)}{|y|^{d - \delta(\alpha)}} \]
where both δ(α) and C(α) are effective. We can apply Baker’s theorem to the Diophantine
equation x³ − 2y³ = A. As we mentioned above, any solution (x, y) ∈ Z² with y ≠ 0 must
satisfy:
\[ \frac{C(\sqrt[3]{2})}{|y|^{3 - \delta(\sqrt[3]{2})}} \le \left| \frac{x}{y} - \sqrt[3]{2} \right| \le \frac{4A}{3\sqrt[3]{4}\,|y|^3} \]
The left-hand side comes from Baker’s theorem. Rearranging the inequality, we get |y|^{δ(∛2)} ≤
constant. Since δ(∛2) > 0 is effective, there is a finite search space for the potential values of
y. In theory, depending on how large the constant δ(∛2) > 0 is, this allows us to enumerate
all integral solutions to x³ − 2y³ = A.
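Once an explicit bound on |y| is available, the enumeration is a finite loop. The sketch below assumes such a bound is given as the parameter `y_bound`; this is a hypothetical input, since the notes do not compute the actual constant coming from Baker's theorem. For each y in range we test whether A + 2y³ is a perfect cube.

```python
def integer_cube_root(n):
    """Return the integer r with r**3 == n, or None if n is not a perfect cube."""
    sign = -1 if n < 0 else 1
    m = abs(n)
    r = round(m ** (1 / 3))                 # floating-point guess, corrected below
    for candidate in (r - 1, r, r + 1):
        if candidate >= 0 and candidate ** 3 == m:
            return sign * candidate
    return None


def solve_x3_minus_2y3(A, y_bound):
    """All integer (x, y) with x^3 - 2*y^3 = A and |y| <= y_bound.
    y_bound is a hypothetical stand-in for the effective bound on |y|."""
    solutions = []
    for y in range(-y_bound, y_bound + 1):
        x = integer_cube_root(A + 2 * y ** 3)
        if x is not None:
            solutions.append((x, y))
    return solutions


print(solve_x3_minus_2y3(1, 100))
print(solve_x3_minus_2y3(6, 100))
```

The floating-point cube root is only used as a first guess and is verified with exact integer arithmetic, so the search itself involves no rounding error for bounds of this size.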
References
[Dob79] E. Dobrowolski, On a question of Lehmer and the number of irreducible factors of a polynomial,
Acta Arith. 34 (1979), no. 4, 391–401, DOI 10.4064/aa-34-4-391-401.
[HS00] Marc Hindry and Joseph H. Silverman, Diophantine geometry: An introduction, Graduate Texts in
Mathematics, vol. 201, Springer-Verlag, New York, 2000, xiv+558 pp., DOI 10.1007/978-1-4612-1210-2.
