Professional Documents
Culture Documents
Finite Fields and Symmetric Cryptography Lecture Notes
Finite Fields and Symmetric Cryptography Lecture Notes
Symmetric Cryptography
Andrea Caranti
Dipartimento di Matematica, Università degli Studi di Trento,
via Sommarive 14, I-38050 Povo (Trento), Italy
E-mail address: caranti@science.unitn.it
URL: http://science.unitn.it/~caranti/
Contents
Introduction 5
Why these notes are not in Italian 5
Goal 5
Notes from the author to the author 5
Chapter 1. Finite Fields 7
1.1. Background 7
1.2. Simple extension 7
1.3. Galois theory 8
1.4. Irreducible polynomials 9
1.5. Linear Functions 9
Chapter 2. The Moebius function 13
2.1. Definition 13
2.2. Moebius inversion 13
2.3. Irreducible polynomials 14
Chapter 4. AES 21
4.1. Rijndael and AES 21
4.2. Generalities 21
4.3. S-boxes 22
Chapter 5. Nonlinearity 23
5.1. Correlation 23
5.2. Distance from affine functions 23
5.3. A scalar product 24
5.4. Parities 24
5.5. Parseval’s identity 26
5.6. Inversion in a finite field 26
5
CHAPTER 1
Finite Fields
1.1. Background
A finite field E has order pn , for a suitable prime p. Here p is the characteristic
of E (and thus E contains the field F = Fp with p elements), and |E : F | = n.
Write q = pn . The elements of E are roots of f = xq − x ∈ F [x].
Given p and n, a field of order q can be constructed as the splitting field of f
over F . This is unique up to (F -)isomorphisms. Write GF(q) for it.
nk = |L : F | = |L : E| · |E : F | = |E[α] : E| · n.
Proof. Suppose m
P
6 0 (so say a1 = 1), and
i=1 ai ϕi = 0. May assume all ai =
there is no nontrivial relation with some of the ai = 0. Since ϕ1 6= ϕ2 , let x0 be
10 1. FINITE FIELDS
where the last one is obtained multiplying by ϕ1 (x0 ) the first one. Subtracting
the last two, we get
m
X
ai (ϕ1 (x0 ) − ϕi (x0 ))ϕi (x) = 0.
i=2
The first coefficient is zero, but the second one is not, as ai (ϕ1 (x0 ) − ϕ2 (x0 )) 6= 0.
This contradicts our initial assumption.
Thus the ϕi are independent, and thus their linear combinations
n−1
n−1
X
ai ϕi : x 7→ a0 x + a1 xp + · · · + an−1 xp ,
i=0
with ai ∈ E are distinct linear maps. And there are |E|n = pn of them, so they
2
for all x ∈ E we have tr(ax) = tr(bx), that is, tr((a − b)x) =, and so a − b = 0.
But these are precisely |E| functions, and thus all of them.
CHAPTER 2
2.1. Definition
Let N? be the set of positive integers. Define a function µ : N? → Z via
(
X 1 if n = 1,
(2.1.1) µ(d) =
d|n
0 if n > 1.
• µ is uniquely determined by the previous formula, that is, there is a unique
function µ that satisfies the formula for all n.
Clearly µ(1) = 1. Proceed by induction on n. If the values µ(d) are
known, for d < n, then the value of µ(n) is uniquely determined by (2.1.1).
• µ is multiplicative, that is µ(nm) = µ(n) · µ(m) if (n, m) = 1.
Let (n, m) = 1. We may clearly assume n, m > 1. Assume by induc-
tion that µ(x) is multiplicative for x < nm. Now every divisor d of nm
can be written as d = ef , where e | n e f | m, thus (e, f ) = 1, so that we
have
X
−µ(nm) = µ(d)
d|nm,d6=nm
X
= µ(e)µ(f )
e,f,e|n,f |m,ef 6=nm
X X
= µ(e) µ(f ) − µ(n)µ(m)
e,e|n f,f |m
= −µ(n)µ(m)
as n, m > 1.
• Show that
1
if n = 1,
µ(n) = 0 if n is divisible by the square of a prime,
(−1)k
if n is the product of k distinct primes.
Proceeding by induction on k, one shows the formula to hold for n =
pk , where p is a prime. Then use the previous item
µ is called the Moebius function.
Then X
f (n) = µ(n/d)g(d).
d|n
We compute
X X X X
µ(n/d)g(d) = µ(e)g(d) = µ(e) f (k).
d|n d,e,de=n d,e,de=n k|d
What’s the coefficient on f (k) in the previous formula, for a given k? It’s
(
X X 1 if k = n,
µ(e) = µ(e) =
d,e,de=n,k|d e,e|n/k
0 if k 6= n.
For example,
6
(xp − x) · (xp − x)
P (6) = p2 .
(x − x) · (xp3 − x)
6
The meaning is that we first take out from xp − x the irreducible polynomials of
degree dividing 2 and 3, but then we have removed those of degree 1 twice, so we
need to compensate.
As for the number of polynomials, write Q(n) for the number of irreducible
polynomials in F [x] of degree d. Then (2.3.1) says
Y
pn = Q(d),
d|n
attempts we have obtained a basis with probability at least (1 − ε)n > 1 − nε.
Note that ! !
n
X 1
γ = lim − log(n)
n→∞
k=1
k
is the Euler-Mascheroni constant ≈ 0.5772.
3.5. Cryptanalysis of affine transformations
Here f = fa,b .
In a chosen plaintext situation, we first compute 0fa,b = b, and then we are in
the case of a linear function.
In the given plaintext situation, we start with a differential cryptanalysis ap-
proach. That is, given u, v ∈ V , we compute uf − vf = (u − v)fa,0 , where now
fa,0 is linear. So once we have enough vectors so that their pairwise differences
form a basis for V , we are have found fa,0 , and thus we find easily b too.
There is only one thing to be noticed. Even if we have n + 1 vectors vi , such
that every subset of n vectors is a basis, it might be that their differences is not a
basis. This happens when there is a relation of the form
n+1
X n+1
X
ai vi , with ai = 0.
i=1 i=1
To be added: to be polished
CHAPTER 4
AES
I have just started writing this chapter, so it is very tentative, and contains
notes to myself for further development. Blanket reference is [DR02].
4.2. Generalities
AES is a symmetric (or private key) cryptosystem. That is, the two parties
who use it to communicate have to share a secret (private) key beforehand. A
public key cryptosystem like RSA might be used to exchange the private key;
from then on, AES is used.
AES is a block cryptosystem. A message is split up in blocks of 128 bits, and
every block is processed separately.
AES must run also in situations in which memory and processing power are
limited, such as on a smart card. To minimize the length of the code, AES is
an iterated crytposystem. That is, a simple function (which would not offer any
security by itself, but which has a short code) is iterated several times, until
security is achieved. (Think of the way we shuffle a deck of cards: we repeat
several times a simple shuffling operation, until the cards are well mixed [Dia98].)
This simple function is called a round.
Thus, when we submit a plaintext block to AES, it undergoes several transfor-
mations, until we get the final ciphertext. Each intermediate state in the process
is called - ehm - the state. The plaintext, the ciphertext and all intermediate states
are all elements of a vector space V = V (2, 128) of dimension 128 over the field
F = F2 with two elements.
To be added: Simple examples to illustrate?
In turn, every round is composed of several simpler functions. One of them
involves the round key. Starting from the master key, one different key for each
round is calculated. In round i, the round key ki is simply used by addition, that
is, the state x is mapped to x + ki .
21
22 4. AES
4.3. S-boxes
All components of AES but one are affine functions. (In cryptography, one
often says linear, when a mathematician would say affine.) This one has to be a
permutation γ of V , a set with 2128 elements. How does one write down such a
beast? The idea is to split
16
M
V = Vi ,
i=1
where each Vi has dimension 8. The permutation γ is then defined as
16
X
vγ = vi γi ,
i=1
P16
where v = i=1 vi , with vi ∈ Vi , and γi is a permutation of Vi . Each Vi is taken
to be athe additive group of GF(28 ), and then one takes
(
0 if u = 0,
uγi = −1
u otherwise.
This function is chosen (see [Nyb94]) because it is very much non-linear, see
Chapter 5.
To be added: To be continued
CHAPTER 5
Nonlinearity
5.1. Correlation
A rielaboration of [DR02, Chap. 7] in terms of group theory.
Let V = Fn2 . Suppose you have two binary Boolean functions f, g : V → F2 .
Their Hamming distance d(f, g) is simply the number of points on which they
differ. On average, you will expect them to coincide on about half of the points.
That is, if you take two random Boolean functions, the bets are they will have the
same value on half of the points, that is, the probability
d(f, g)
1−
2n
that they coincide is likely to be 1/2.
So you measure their correlation as
d(f, g) d(f, g)
C(f, g) = 2 · (1 − ) − 1 = 1 − .
2n 2n−1
(We have used the linear function λ : R → R given by λ(x) = 2x − 1, which
maps the interval [0, 1] onto [−1, 1].) So correlation 0 means they behave as your
favourite pair of average random Boolean functions. Correlation 1 means f = g.
Correlation −1 means that g(x) = 1 + f (x) for all x, or g evaluates to 0 on the
points where f evaluates to 1 and viceversa. Any correlation different from 0
means that the knowledge of f tells you some information about g as well.
We can reformulate the above as
d(f, g) = 2n−1 (1 − C(f, g)).
23
24 5. NONLINEARITY
Since min C(f, g) = − max(−C(f, g)), we get that the distance of f from affine
functions is
(5.2.1) 2n−1 (1 − max { |C(f, g))| : g linear }).
In other words D E
C(f, g) = fˆ, ĝ .
D E
In particular the norm fˆ, fˆ = 1 for Boolean functions f .
5.4. Parities
The dual space V ? of V is the vector space of all linear maps V 7→ F2 . A
choice of a basis vi on V gives a dual basis vi? ∈ V ? (where vi? (vj ) = δ(i, j)). We
will write elements og V and V ? as vectors in Fn2 , with respect to such a pair of
bases.
If w ∈ V ? (regarded, as we said, as an element of Fn2 ), as a linear function
V → F2 acts as x 7→ x · w (where “·” represent the row-by-row product of x and
5.4. PARITIES 25
w, where x ∈ V is also regarded as an element of Fn2 Their hats are the functions
called parities
πw : V → { 1, −1 }
x 7→ (−1)x·w .
Note that πx (y) = πy (x).
We claim that every binary Boolean function (in the hat form), that is, every
function V → C, can be written as a linear combination of these. Note that the
space of functions V → C is a vector space over C of dimension 2n over C.
This is a special case of a rather more general fact from the theory of group
representation theory and discrete Fourier transforms. Let G be a finite abelian
(commutative) group. Then its (linear) characters are the group morphisms from
G into the multiplicative group C? of the nonzero complex numbers.
If G is our V above, such a morphism ϕ satisfies
ϕ(x + y) = ϕ(x)ϕ(y),
thus we have for all v ∈ V
1 = ϕ(0) = ϕ(2v) = ϕ(v + v) = ϕ(v)ϕ(v) = ϕ(v)2 ,
so that ϕ(v) ∈ { 1, −1 }. And in fact these characters are exactly the parities
above. This is because a parity is a character, and conversely, if ϕ is a character,
then we can write ϕ(x) = (−1)ψ(x) for a unique ψ : V → F2 , and ψ is linear, as
(−1)ψ(x+y) = ϕ(x + y) = ϕ(x)ϕ(y) = (−1)ψ(x) (−1)ψ(y) = (−1)ψ(x)+ψ(y) .
Note now that two distinct parities are orthogonal, as we have
1 X
hπv , πw i = n (−1)x·v (−1)x·w
2 x∈V
1 X
= n (−1)x·(v+w)
2 x∈V
= δ(v, w).
In fact, if v = w all the summands are 1. If u = v + w 6= 0, the sum is zero instead.
This is because x 7→ x · u is a linear map V → F2 . It has value 0 on its kernal,
which is a subspace of V of codimension one, and thus with 2n /2 elements, and
value 1 on the remaining 2n /2 elements.
If these parities are orthogonal, they must be linearly independent, and thus
they are a basis of the space of C-valued functions on V . Let us find how to write
any (hat) function in terms of the parities.
The coefficients will be of course just the scalar products of a fˆ with the
parities, that is, the coefficient of fˆ with respect to the parity πw will be
D E 1 X
F (w) = C(f, πw ) = fˆ, πw = n (−1)f (x)+x·w .
2 x∈V
This function F : V ? → C is called the Walsh-Hadamard transform of f , or
equivalently fˆ. It is a special case of a Fourier transform.
26 5. NONLINEARITY
Reciprocally, we have
1 X
C(F, πy ) = hF, πy i = F (w)πy (w)
2n w∈V
1 X
= F (w)(−1)y·w
2n w∈V
!
1 X X
= 2n (−1)y·w (−1)f (x)+x·w
2 w∈V x∈V
!
1 X X
= 2n (−1)f (x) (−1)(x+y)·w
2 x∈V w∈V
1 X
(−1)f (x) hπx , πy i
= n
2 x∈V
1 X 1
(−1)f (x) δ(x, y) = n fˆ(y).
= n
2 x∈V 2
So indeed X
fˆ(y) = F (w)πy (w)
w∈V
Consider the case when f (x) is a component of inversion, that is, f (x) =
tr(bx−1 ) for some b ∈ E ? . Rescaling, we have to evaluate a Kloosterman sum
−1
X
(−1)tr(x+bx )
x∈E ?
Warning! This part is taken from [CDVSV04], and needs some adjustments.
6.3. AES
We note immediately
Lemma 6.3.1. AES satisfies the hypotheses of Theorem 6.2.2.
Proof of Lemma 6.3.1. Condition (1) is clearly satisfied.
So is (3), by the construction of the mixing layer. In fact, suppose U 6= { 0 } is
a subspace of V which is invariant under λ. Suppose, without loss of generality,
that U ⊇ V1 . Because of MixColumns [DR02, 3.4.3], U contains the whole first
column of the state. Now the action of ShiftRows [DR02, 3.4.2] and MixColumns
on the first column shows that U contains four whole columns, and considering
(if the state has more than four columns) once more the action of ShiftRows and
MixColumns one sees that U = V .
The first part of Condition (2) is also well-known to be satisfied, with r = 1
(see [Nyb94] but also [DR06]). We recall the short proof for convenience. For
a 6= 0, the map GF(28 ) → GF(28 ), which maps x 7→ (x + a)−1 + x−1 , has image
of size 27 − 1. In fact, if b 6= a−1 , the equation
(6.3.1) (x + a)−1 + x−1 = b
6.4. PROOF 31
has at most two solutions. Clearly x = 0, a are not solutions, so we can multiply
by x(x + a) obtaining the equation
(6.3.2) x2 + ax + ab−1 = 0,
which has at most two solutions. If b = a−1 , equation (6.3.1) has four solutions.
Two of them are x = 0, a. Two more come from (6.3.2), which becomes
x2 + ax + a2 = a2 · (x/a)2 + x/a + 1 = 0.
Codes
35
Bibliography
[CDVSV04] A Caranti, F. Dalla Volta, M. Sala, and F. Villani, Primitivity for groups generated
by the round functions of iterated block cryptosystems, Working paper, 2004.
[CU57] L. Carlitz and S. Uchiyama, Bounds for exponential sums, Duke Math. J. 24 (1957),
37–41. MR MR0082517 (18,563c)
[Dia98] Persi Diaconis, From shuffling cards to walking around the building: an introduction
to modern Markov chain theory, Proceedings of the International Congress of Math-
ematicians, Vol. I (Berlin, 1998), no. Extra Vol. I, 1998, pp. 187–204 (electronic).
MR MR1648031 (99e:60154)
[DR02] Joan Daemen and Vincent Rijmen, The design of Rijndael, Information Security
and Cryptography, Springer-Verlag, Berlin, 2002, AES—the advanced encryption
standard. MR MR1986943 (2006b:94025)
[DR06] , Two-round AES differentials, IACR e-print
eprint.iacr.org/2006/039.pdf, 2006.
[GAP05] The GAP Group, GAP – Groups, Algorithms, and Programming, Version 4.4, 2005,
(http://www.gap-system.org).
[GGSZ04] D. Goldstein, R. Guralnick, L. Small, and E. Zelmanov, Inversion invariant additive
subgroups of division rings, Pacific J. Math. (2004), to appear.
[Mat05] Sandro Mattarei, Inversion invariant additive subgroups of division rings, Israel J.
Math. (2005), to appear.
[MR02] Sean Murphy and Matthew J.B. Robshaw, Essential algebraic structure within the
AES, Advances in Cryptology - CRYPTO 2002 (M. Yung, ed.), Lecture Notes in
Computer Science, vol. 2442, Springer, Berlin/Heidelberg, 2002, pp. 1–16.
[Nyb94] Kaisa Nyberg, Differentially uniform mappings for cryptography, Advances in Cryp-
tology — EUROCRYPT ’93 (Lofthus, 1993), Lecture Notes in Comput. Sci., vol.
765, Springer, Berlin, 1994, pp. 55–64. MR MR1290329 (95e:94039)
[Ser73] J-P Serre, A course in arithmetic, Graduate Texts in Math., vol. 7, Springer, New
York, 1973.
37