A Design for an FPGA Implementation of Rijndael
ICGST-PDCS Journal, Volume 9, Issue 1, October 2009

Keywords: Rijndael cipher, FPGA, memory block, encryption, decryption.

1. Introduction
For a long time, the Data Encryption Standard (DES) was the standard for symmetric-key encryption. DES has a key length of 56 bits. Today, this key length is considered too short and can easily be broken. For this reason, the National Institute of Standards and Technology (NIST) announced, as the result of a competition among 15 algorithms, that the Rijndael cipher would replace the DES cipher and become the new Advanced Encryption Standard (AES). The Rijndael cipher has three possible block and key lengths: 128, 192, or 256 bits. Therefore, the problem of breaking the key becomes more difficult [1-2].

In general, hardware implementations of encryption algorithms and their associated key schedules are physically secure, as they cannot easily be modified by an outside attacker. Also, the high speed Rijndael

A team from the University of California, Los Angeles demonstrated in [3] an implementation of the Rijndael cipher on a Xilinx Virtex-II XC2V1000-4 device, with an experimental frequency of 75 MHz and a throughput of 739 Mbit/s for 128 bits.

Máire McLoone and John V. McCanny described in [4] their FPGA implementation, which employs a fully pipelined single-chip Rijndael design and runs at a data rate of 7 Gbits/sec on a Xilinx Virtex-E XCV812E-8-BG560 FPGA device. The decryptor design also achieves a fast throughput of 6.4 Gbits/sec.

Viktor Fischer and Milos Drutarovsky in [5] implemented the Rijndael cipher with 128-bit keys in Altera ACEX reconfigurable hardware, which was found to be an excellent solution for cost-sensitive encryption applications.

Kenneth Stevens and Otmane Ait Mohamed demonstrated in [6] a single-chip FPGA design of the Rijndael encryption algorithm. The design employed two 32-bit DSP cores to reduce the latency issues
while a block cipher was implemented using the Electronic Code Book mode of operation. The design focuses on a memory-based, byte-sized arithmetic pipeline structure that processes one round at a time. The design provides a throughput rate of 0.1 Gbps per data channel.

The UCL Crypto Group, Laboratoire de Microélectronique, gave in [7] a good solution for applications that need a small implementation area, such as embedded systems, where the data transfer rate is not the main concern. Such implementations use FPGA families such as Xilinx Virtex-II and Xilinx Spartan-3 (XC3S50).

M. C. Liberatori and J. D. Bonadero proposed in [8] a low-cost solution that deals with the sub-keys computed during the decryption step while merging the key schedule and the data path parts, which makes it practical for cryptographic embedded applications.

In this paper, we propose a modified implementation of the Rijndael cipher using FPGA. Based on the fact that any FPGA includes built-in memory blocks, we store all the results of the fixed operations within the memory modules. To achieve this goal, we first implement the algorithm in the C language. This step guides us through the FPGA design and helps us in the optimization of the proposed design and in obtaining the results of the fixed operations.

The present paper is organized as follows: in Section 2, a description of the Rijndael cipher is given. Then, in Section 3, the proposed design for the Very High Speed Integrated Circuits Hardware Description Language (VHDL) implementation of the Rijndael encryption function is detailed. In Section 4, the proposed design for the VHDL implementation of the Rijndael decryption function is detailed. Then, in Section 5, the proposed design is compared with other FPGA designs. Finally, the paper concludes in Section 6.

2. Description of Rijndael Cipher
Rijndael is a block cipher that operates on different key and block lengths: 128 bits, 192 bits, or 256 bits. The input to each round consists of a block of message, called the state, and the round key. Note that the round key changes in every round. The state can be represented as a rectangular array of bytes. This array has four rows; the number of columns is denoted by Nb and is equal to the block length divided by 32. The same applies to the cipher key. The number of columns of the cipher key is denoted by Nk and is equal to the key length divided by 32. The cipher consists of a number of rounds, denoted by Nr, which depends on both block and key lengths. Each round of the Rijndael encryption function consists mainly of four different transformations: ByteSub, ShiftRow, Mix-Column and key addition. On the other hand, each round of the Rijndael decryption function consists mainly of four different transformations: InvByteSub, InvShiftRow, InvMix-Column, and key addition. For a detailed description of each round transformation, the reader may refer to [1-2]. The output of the above transformations is called the 'state'. The state has the same byte length as each block of the message. The four transformations of the Rijndael cipher and their inverses are described below.

ByteSub Transformation: The ByteSub transformation is a non-linear byte substitution, operating on each of the state bytes independently. It is performed using a precomputed substitution table called the S-box. The S-box contains 256 entries (for input values 0 to 255) and their corresponding output values. For details on how the S-box is calculated, refer to [1-2].

InvByteSub Transformation: The InvByteSub transformation is performed using a precomputed substitution table called the InvS-box. The InvS-box contains 256 entries (for input values 0 to 255) and their corresponding output values.

ShiftRow Transformation: In the ShiftRow transformation, the rows of the state are cyclically shifted left over different offsets. Row 0 is not shifted; row 1 is shifted over one byte; row 2 is shifted over two bytes and row 3 is shifted over three bytes.

InvShiftRow Transformation: In the InvShiftRow transformation, the rows of the state are cyclically shifted right over different offsets. Row 0 is not shifted; row 1 is shifted over one byte; row 2 is shifted over two bytes and row 3 is shifted over three bytes.

Mix-Column Transformation: In Mix-Column, the columns of the state are considered as polynomials multiplied by a fixed polynomial c(X), given by: c(X) = '03' X^3 + '01' X^2 + '01' X + '02'. The Mix-Column transformation can be written as a matrix multiplication as follows (equation 1):

\[
\begin{bmatrix} d_{0,j} \\ d_{1,j} \\ d_{2,j} \\ d_{3,j} \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} c_{0,j} \\ c_{1,j} \\ c_{2,j} \\ c_{3,j} \end{bmatrix}
\tag{1}
\]

where the column 'c' represents the input to the Mix-Column transformation and the column 'd' represents its output. Both 'c' and 'd' have a length of 32 bits.

InvMix-Column Transformation: In InvMix-Column, the columns of the state are considered as polynomials
multiplied by a fixed polynomial d(x), given by: d(x) = '0B' x^3 + '0D' x^2 + '09' x + '0E'. The InvMix-Column transformation can be written in matrix multiplication form (equation 2):

\[
\begin{bmatrix} b_{0,j} \\ b_{1,j} \\ b_{2,j} \\ b_{3,j} \end{bmatrix}
=
\begin{bmatrix}
0E & 0B & 0D & 09 \\
09 & 0E & 0B & 0D \\
0D & 09 & 0E & 0B \\
0B & 0D & 09 & 0E
\end{bmatrix}
\begin{bmatrix} a_{0,j} \\ a_{1,j} \\ a_{2,j} \\ a_{3,j} \end{bmatrix}
\tag{2}
\]

3.1. The controller
This block controls the sequencing of the rounds. It generates the round constant associated with each round. It also generates the control signals at the appropriate times. Those control signals are:
key_reg_mux_sel: the key schedule input selector, which selects the input key before the first round and then reads the output key as the next input.
load_key_reg: the key schedule output enable.
load_data_reg: the input register enable.
data_reg_mux_sel: the data register selector, which selects the input register data from the plaintext input, the round0 output (the XOR of key zero and the plaintext input), or the round layer outputs.
last_mux_sel: goes to the round layer to indicate the last round.
round_constant: goes to the key schedule layer to indicate which round is currently running, and applies a constant value according to the running round.

3.3. The round layers
This block represents the rounds as shown in Figure 3.

It contains the row shifter followed by two blocks running in parallel. The first one is a Mix-Column block combined with an S-Box block, and the second one is an S-Box block. The first block uses four look-up tables to calculate its output. The second block is the normal S-Box block. The final output is taken from the combined S-Box and Mix-Column block, except in the final round, where the output is taken directly from the S-Box block using the last_mux_sel signal. Then, an XOR function is applied with the round key.
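
The combined S-Box and Mix-Column block described above can be illustrated in software. The following is a minimal Python sketch, not the authors' VHDL or C code: it assumes the standard Rijndael Mix-Column matrix (last row 03 01 01 02) and one possible table organization in which each of the four look-up tables stores one matrix column scaled by the S-box output. The names `T`, `round_column_lookup` and `round_column_direct` are hypothetical and chosen only for this illustration.

```python
# Sketch only: table layout and all names are assumptions, not the paper's design.

def xtime(b):
    """Multiply a byte by '02' in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b

def gmul(a, b):
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def rotl8(b, n):
    """Rotate a byte left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF

def build_sbox():
    """S-box: multiplicative inverse in GF(2^8), then the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):          # x^254 is the inverse; 0 maps to 0
            inv = gmul(inv, x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox

# Standard Mix-Column matrix of equation (1).
MIX = [
    [0x02, 0x03, 0x01, 0x01],
    [0x01, 0x02, 0x03, 0x01],
    [0x01, 0x01, 0x02, 0x03],
    [0x03, 0x01, 0x01, 0x02],
]

SBOX = build_sbox()

# T[k][x] = column k of MIX scaled by SBOX[x], stored as one 4-byte entry.
# These four tables are the "fixed operation" results kept in memory.
T = [[tuple(gmul(MIX[r][k], SBOX[x]) for r in range(4)) for x in range(256)]
     for k in range(4)]

def mix_column(col):
    """Matrix product of equation (1) for one 4-byte column."""
    return [gmul(MIX[r][0], col[0]) ^ gmul(MIX[r][1], col[1])
            ^ gmul(MIX[r][2], col[2]) ^ gmul(MIX[r][3], col[3])
            for r in range(4)]

def round_column_direct(col):
    """Reference path: S-box on each byte, then Mix-Column."""
    return mix_column([SBOX[b] for b in col])

def round_column_lookup(col):
    """Table path: four look-ups XORed together, no field arithmetic."""
    out = [0, 0, 0, 0]
    for k, byte in enumerate(col):
        for r in range(4):
            out[r] ^= T[k][byte][r]
    return out

example = [0x00, 0x53, 0xA7, 0xFF]    # an already row-shifted column
same = round_column_lookup(example) == round_column_direct(example)
```

Under these assumptions the table path and the direct path agree for every input column, which is why the field multiplications can be replaced by memory reads. In the last round, where Mix-Column is skipped, only the plain S-box table would be used, matching the role of last_mux_sel.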
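
Since the decryption module relies on the inverse transformations of Section 2, it may help to check that they really undo the forward ones. The sketch below is an illustration in Python, assuming a 128-bit block (Nb = 4) and the standard Rijndael matrices, where the Mix-Column matrix has last row 03 01 01 02. All function names are hypothetical.

```python
# Sketch only: names are illustrative and not taken from the paper.

def xtime(b):
    """Multiply a byte by '02' in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b

def gmul(a, b):
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

# Matrix of equation (1), Mix-Column.
MIX = [
    [0x02, 0x03, 0x01, 0x01],
    [0x01, 0x02, 0x03, 0x01],
    [0x01, 0x01, 0x02, 0x03],
    [0x03, 0x01, 0x01, 0x02],
]

# Matrix of equation (2), InvMix-Column.
INV_MIX = [
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E],
]

def apply_matrix(matrix, column):
    """Multiply a 4x4 byte matrix by a 4-byte column over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gmul(coeff, byte)
        out.append(acc)
    return out

def shift_rows(state):
    """ShiftRow: rotate row r of the state left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def inv_shift_rows(state):
    """InvShiftRow: rotate row r of the state right by r bytes."""
    return [row[len(row) - r:] + row[:len(row) - r]
            for r, row in enumerate(state)]

column = [0xDB, 0x13, 0x53, 0x45]
mixed = apply_matrix(MIX, column)         # forward Mix-Column
restored = apply_matrix(INV_MIX, mixed)   # InvMix-Column recovers the input

state = [[4 * r + c for c in range(4)] for r in range(4)]  # 4 rows, Nb = 4
shifted = shift_rows(state)
unshifted = inv_shift_rows(shifted)       # InvShiftRow recovers the state
```

Here `restored` equals `column` and `unshifted` equals `state`, so each inverse transformation cancels its forward counterpart.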
Figure 4: Block Diagram of decryption Module

write_key_reg which is used to enable the key schedule - which is the same as the one used in the

5. Synthesis Results
To ensure that our modifications give better results in terms of area and speed than the original design [9], we compare the two encryption designs (original and modified). The comparison considers two criteria: chip speed and area utilization. The design was implemented on an ALTERA EP20K300EBC652 device. Note that no decryption algorithm was proposed in [9]. Both designs were synthesized using the Leonardo Spectrum tool and simulated using the ModelSim tool. As shown in Table 1, an increase in speed of about 25% was achieved in our design and a reduction in area of about 11% was
achieved in our design. On the other hand, since we use look-up tables to perform the Mix-Column transformation, the new encryption design needs more memory (as shown in Table 1), but this increase is acceptable because it uses the memory resources inside the FPGA efficiently. To compare the design with another Altera-based design, the design in [10] is included in the table. The comparison shows that the proposed design has a higher throughput than the design in [10].

Table 1. Comparison between the Altera Based Designs of the Encryption Module

Design                     Resource      Used     Available   Utilization
Original design [9]        IOs           387      408         94.85%
                           LCs           897      11520       7.79%
                           Memory Bits   40960    147456      27.78%
                           Speed         81.2 MHz
Modified proposed design   IOs           387      408         94.85%
                           LCs           797      11520       6.92%
                           Memory Bits   114688   147456      77.78%
                           Speed         99.1 MHz
Piotr [10]                 IOs           387      408         94.85%
                           LCs           1032     11520       8.96%
                           Memory Bits   40960    147456      27.78%
                           Speed         44 MHz

Then, we compare our design with different designs. A good comparison of the different designs implemented in Xilinx FPGA chips can be found in [10]. To compare our work with those in [10], we implemented our design in the Xilinx XC2V2000 family, which has sufficient memory for our design. The maximum core frequency was 80.5 MHz. Since each round is finished in one clock cycle, plus two clock cycles for registering the inputs, 13 clock cycles are needed in a 128-bit key design. As a result, the throughput of the Rijndael core is 793.3 Mbit/sec, which could be improved to 937.6 Mbit/sec if the two registering cycles are performed in parallel. Throughput reported by other researchers is listed below in Table 2.

Table 2. Comparison of 128-bit Rijndael Encryption Implementations (IL: iterative looping, P: pipelined)

Design          Type   Device              Area (CLB)   Throughput (Mbit/sec)
Gaj [11]        IL     XCV1000BG560-6      2902         331.5
Dandalis [12]   IL     XCV1000             5673         353.0
Tong [13]       IL     XCV1000HQ240-6      702          755.1
Our design      IL     XC2V2000BF957-6     402          793.3
Elbirt [14]     P      XCV1000BG560-4      10992        1937.9
McLoone [15]    P      XCV3200EBG560-8     7576         3239.0
McLoone [15]    P      XCV812EBG560-8      2222         6956.0

The last step is to compare our decryption design with other design approaches. In [10] a design of a decryption module based on Altera FPGA is proposed. As shown in Table 3, our design outperforms the design in [10] by an 84.21% reduction in area and a 97.12% increase in speed, at the price of increased memory utilization.

Table 3. Comparison Between the Altera Based Designs of the Decryption Module

Design       Resource      Used     Available   Utilization
Our design   IOs           387      408         94.85%
             LCs           1494     16640       8.98%
             Memory Bits   169984   212992      79.81%
             Speed         72.9 MHz
Piotr [10]   IOs           387      408         94.85%
             LCs           2885     11520       25.04%
             Memory Bits   40960    147456      27.78%
             Speed         40.69 MHz

Note that the implementation in [10] needs 21 clock cycles to give a key compared to 13 clock cycles in our design. The FPGAs used are EP20K300EBC652 in our design and EPF10K250AGC599-1 in [10].

Note also that recent trends in AES implementation employ VLSI (ASIC) and reconfigurable architectures to improve system efficiency in terms of speed, power and size. Encryptors implemented with such techniques may achieve speeds of up to 17.8 Gbps and more than ten times the power efficiency of field-programmable gate array implementations. Therefore, our modified algorithm could achieve higher throughput using such techniques.

6. Conclusions
In this paper we proposed a modified implementation of the Rijndael AES encryption standard based on the fact that any FPGA includes built-in memory blocks; we therefore store all the results of the fixed operations within the memory modules. The modification gives an 11% reduction in area and a 25% increase in speed (throughput) compared with the original design. Our design gives the highest throughput and the best area utilization among the iterative-looping FPGA implementations compared. The decryption algorithm is also implemented and gives better results than the design in [10].
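
The throughput figures quoted for the Xilinx implementation in Section 5 follow from the block size, the clock frequency and the number of clock cycles per block. The short Python sketch below reproduces that arithmetic. The function name is hypothetical, and the 11-cycle case assumes that "performing the two registering cycles in parallel" removes both of them from the 13-cycle count.

```python
# Sketch only: reproduces the Section 5 arithmetic under stated assumptions.

def throughput_mbit_per_s(block_bits, clock_mhz, cycles_per_block):
    """Throughput of an iterative core that emits one block every N cycles."""
    return block_bits * clock_mhz / cycles_per_block

BLOCK_BITS = 128     # 128-bit block
CLOCK_MHZ = 80.5     # maximum core frequency as quoted (rounded)

# 13 cycles: one per round plus two for registering the inputs.
iterative = throughput_mbit_per_s(BLOCK_BITS, CLOCK_MHZ, 13)

# 11 cycles: assumes the two registering cycles are fully overlapped.
overlapped = throughput_mbit_per_s(BLOCK_BITS, CLOCK_MHZ, 11)

print(f"13 cycles: {iterative:.1f} Mbit/sec")
print(f"11 cycles: {overlapped:.1f} Mbit/sec")
```

With the rounded 80.5 MHz figure this gives about 792.6 and 936.7 Mbit/sec, slightly below the reported 793.3 and 937.6 Mbit/sec. The small gap is presumably due to rounding of the quoted frequency.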
Biographies:

Mohamed Bakr Abdelhalim received his BSc, MSc and PhD degrees in Electronics Engineering from Cairo University in 1999, 2003 and 2008 respectively. He was a researcher at the Department of Electrical and Communications Engineering, Cairo University. He was also a teaching and lab assistant at the Electronics Engineering Department, American University in Cairo. Currently he is an assistant professor at the College of Computing and Information Technology - Arab Academy for Science, Technology and Maritime Transport. His research interests are FPGA-based design, system-level design and description languages, CAD tools, and soft computing techniques.

Dr. Heba K. Aslan is an Associate Professor at the Electronics Research Institute, Cairo, Egypt. She received her B.Sc., M.Sc. and Ph.D. from the Faculty of Engineering, Cairo University in 1990, 1994 and 1998 respectively. Her research interests include key distribution protocols, authentication protocols, logical analysis of protocols and intrusion detection protocols.

Dr. A. Tobal is an Associate Professor at the Electronics Research Institute, Cairo, Egypt. He received his B.Sc., M.Sc. and Ph.D. from the Faculty of Engineering, Cairo University in 1990, 1994 and 1999 respectively. His fields of research include embedded systems implementation, digital signal processing, biological neural networks, pattern recognition and artificial intelligence.

Dr. Farouk is an Assistant Professor. He joined the Electronics Research Institute, Egypt, in 1993. His fields of research are signal processing, image compression, video processing, video compression, video indexing and retrieval, video on demand, pattern recognition and machine vision. Dr. Farouk received his Ph.D. in 2001 and his M.Sc. in 1996, both from the Electronics & Communications Dept., Faculty of Engineering, Cairo University.