Cryptographic Accelerator in Reconfigurable Hardware
Abstract
This article presents a general overview of cryptography, highlighting its importance in the protection of
information. The main objective of this work is the implementation of the asymmetric algorithm RSA, using the
Montgomery algorithm for modular multiplication, with keys of 4096 bits. Increasingly, companies and organizations use
advanced technology to facilitate and expedite the exchange of information. Privacy of information becomes
necessary to ensure the safety of the transmitted data, and cryptography is one of the resources used. Modular
multiplication is a central operation in many application areas, including public key cryptography. There is
therefore a need for algorithms that are fast on the one hand and area and power efficient on the other.
To accelerate the encryption process, the modular exponentiation has been implemented in
reconfigurable hardware (FPGA).
The Montgomery multiplication algorithm is a very [...] base and another one for the exponent.
Montgomery Multiplication (MM)
The Montgomery multiplication operation computes
(x_in * y_in * R^-1) mod m, where m is the modulus and R = 2^n
(n is the number of bits, and is the same throughout the whole
calculation), as well as a multiplication of the operand x_in by R,
(x_in * R) mod m, which carries out the change of domain to the
Montgomery domain. The result is the value of one or the other
multiplication, depending on the value of the signal selMM.
[...] because of hardware area usage limitations. Together
with a fast exponentiation algorithm implementation,
this component can allow cryptography with keys of [...]
[...] following conditions: X = 23, Y = 20, m = 27, R = 32
and R^-1 = 11.
Number 23 is changed into its image in the Montgomery domain:
X' = MM(X, R^2) = MM(23, 32^2) = 23 * 32^2 * 11 mod 27 =
23 * 32 mod 27 = 7
Through the Montgomery multiplication between (X')^5
and 1 it is possible to obtain x^5 as an integer number:
x^5 mod 27 = MM((X')^5, 1) = MM(10, 1) = 10 * 1 * 11 mod 27
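The worked example with X = 23, m = 27 and R = 32 can be reproduced in software. The following is a minimal Python sketch, not the hardware datapath described in this article: the function name montgomery_multiply is hypothetical, and the inverse of R is obtained with Python's built-in pow(r, -1, m) (available from Python 3.8) instead of the iterative reduction that a hardware multiplier would use.

```python
def montgomery_multiply(x: int, y: int, m: int, r: int) -> int:
    """Software model of MM(x, y) = x * y * R^-1 mod m (hypothetical helper)."""
    r_inv = pow(r, -1, m)  # requires gcd(R, m) == 1 and Python 3.8+
    return (x * y * r_inv) % m


m, r = 27, 32  # modulus and R from the example (R^-1 mod 27 = 11)
x = 23

# Change of domain: X' = MM(X, R^2) = X * R mod m
x_mont = montgomery_multiply(x, r * r, m, r)  # 7, as in the example

# Fifth power computed entirely inside the Montgomery domain
power_mont = montgomery_multiply(1, r * r, m, r)  # image of 1, i.e. R mod m
for _ in range(5):
    power_mont = montgomery_multiply(power_mont, x_mont, m, r)
# power_mont is now 10, the value the example feeds into MM(10, 1)

# Back to an ordinary integer: MM((X')^5, 1)
result = montgomery_multiply(power_mont, 1, m, r)

print(x_mont, power_mont, result)
```

The last value printed is x^5 mod 27 and agrees with a direct computation using pow(23, 5, 27), which is a convenient check when testing other moduli or operand sizes.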
[...] 16 5,593

Memories
For the cryptographic calculation with base and exponent values
of 4096 bits, it becomes necessary to store these values in
blocks of RAM to help the processing. The memories used are of
type RAMB16_S36_S36. These memories allow the intended 4096 bits
to be stored with fast reading. They allow access to 64 bits of
data in each reading. If using 32 bits (as in the developed
case), only one memory access is needed every two iterations.

Control Unit
The management of the signals and registers through all the
cryptographic calculations is done by a state machine. This unit
carries out, in a general way, the control of registers and
signals on the basis of the [...] well the accesses to the
memories. [...] operands, between 10 states, each one of them
having different functions. The states are: test, comeca,
inicio, um, dois, tres, fimIter, montToint, intTomont and [...]

Exponent
This component updates the exponent's value throughout the whole
calculation. The value of the [...] get the final result. In
this way, the exponent functions [...] exponent will have the
value zero.
This component has one operand, exp_in, and a select [...]

Step  result        x       Exponent
0     1             5       15
1     5             5       14
2     5             25      7
3     125           25      6
4     125           625     3
5     78125         625     2
6     78125         390625  1
7     30517578125   390625  0

The result, 5^15 = 30517578125, is obtained faster, as intended.
The multiplication to carry out depends on the parity of the
exponent. When it is even, x is replaced with x^2 and the
exponent is halved. When it is odd, result is replaced with
result * x and the exponent is decremented by one.
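The parity rule behind the trace of 5^15 can be sketched in a few lines of Python. This is an illustrative software model only: the function name square_and_multiply and the trace list are assumptions introduced here, and plain integers are used in place of the Montgomery-domain operands that the hardware works with.

```python
def square_and_multiply(base: int, exponent: int):
    """Return base**exponent and the list of (result, x, exponent) states."""
    result, x = 1, base
    trace = [(result, x, exponent)]       # step 0: initial state
    while exponent > 0:
        if exponent % 2 == 1:
            result *= x                   # odd exponent: result = result * x
            exponent -= 1
        else:
            x *= x                        # even exponent: x = x^2
            exponent //= 2
        trace.append((result, x, exponent))
    return result, trace


result, trace = square_and_multiply(5, 15)
for step, (res, x, exp) in enumerate(trace):
    print(step, res, x, exp)
```

Each printed row has the form step, result, x, exponent, so the output can be compared line by line with the trace of 5^15 given in the text. The loop stops only when the whole exponent reaches zero.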
[Figure: block diagram; labels: MEM EXP, Exp, X, x_inter, Result]

The calculation finishes when the whole exponent becomes zero,
and not when each group of n bits read from the exponent becomes
zero. When the bits of a reading are all zero, it is still
necessary to continue updating the register x_inter for the next
iterations.

Example: (n=2)
2^9 (exponent=1001b, base=10b)

Step  result  x_inter  Exponent  contabit
0     1       2        1->(01)   0
1     2       2        0         1
2     2       4        0         2
3     2       16       2->(10)   0
4     2       256      1         1
5     512     256      0         2

Example:
515970^50 mod 610391 = 413136
Decomposing the base into factors:
515970 = 81 * 65 * 98
This gives:
515970^50 mod 610391 = (81 * 65 * 98)^50 mod 610391
= (81^50 * 65^50 * 98^50) mod 610391
= ((81^50 mod 610391) * (65^50 mod 610391) * (98^50 mod 610391)) mod 610391
= 413136

With these factors, the sequence of calculations is the
following:
1. Calculate 81^50 mod 610391 = 607196
2. Calculate 65^50 mod 610391 = 400590
3. Calculate 98^50 mod 610391 = 87270
4. (607196 * 400590) mod 610391 = 104877
5. (104877 * 87270) mod 610391 = 413136

Separating the base into three factors gives the same result.
Instead of carrying out the exponentiation once, it is necessary
to carry out one exponentiation for each factor and, after that,
n multiplications between the results of the factors (with
n = (number of factors) - 1). This allows a calculation with
operands of fewer bits each, improving the processing.

The time of execution of each test (number of cycles * period)
is:

64 bits  189 ns
32 bits  85 ns
16 bits  39 ns

Using the values from above, a graph can be constructed relating
the two variables, Occupation Space vs. Execution Time.
[Figure: Occupation vs. Time. Horizontal axis: Occupation (0 to 12000); vertical axis: Time (0.09 to 0.104); data points labelled 16 bits, 32 bits and 64 bits.]
From the graph it is possible to verify that as the total
execution time grows, the occupation space also increases. This
confirms that using iterations with fewer bits decreases the
necessary space. Although the number of clock cycles needed to
obtain the final result increases, the execution time is smaller
because of the shorter clock period.
This algorithm carries out the multiplication operations
iteratively, functioning almost sequentially. Using more or
fewer bits per operand therefore brings no advantage in
execution time. With operands of fewer bits, each multiplication
is performed more quickly, but more operations are required,
and vice versa.
Since the time needed to obtain the result is almost the same in
each of the tests, the occupation area plays the more important
role in choosing the best option. Thus performing the operation
with fewer bits is the most viable solution.

Conclusions
With the increasing technological evolution, computers have
their processing capacity increased periodically, and so it
becomes necessary to constantly improve cryptographic systems.
Proof of this is the fact that security in computer networks is
one of the areas where the continuous search for newer
alternatives drives greater development in the technological
realm.
Another factor that adds value to asymmetric cryptography is
the continuous growth of electronic business, which would be
impracticable without a cryptography able to provide total
security for the users. The cryptography system currently used
is extremely safe: specialists estimate that somebody who tries
to break it by trial and error would take about 100,000 years
using a common PC.
This work proposed a reconfigurable hardware architecture to
perform the main operations of cryptography using the RSA
algorithm. This architecture performs the encryption and
decryption of data of 4096 bits, using iterations of 32 bits,
although it has been implemented in such a way that the size of
the operands for the iterations can be easily modified. The
target technology of this project was reconfigurable logic
devices (FPGA). The circuit was designed with the aim of
reducing the amount of resources used, which makes it possible
to implement it in a low-cost device. The circuit was specified
in VHDL (Very High Speed Integrated Circuit Hardware Description
Language), assuring the portability of the architecture to other
technologies.

The architecture developed in this project can be modified as
necessary. As a suggestion for the continuation of the work,
some of the following modifications can be tried:
Implementation using other key sizes. This can be done with the
introduction of more memories and adjustment of the circuit to
perform more iterations to process more data.
Introduction of pipeline stages. The introduction of a pipelined
system can increase the performance of the circuit, mainly in
the component that carries out the Montgomery multiplication.
Better management of the base factors. The management of the
base in this architecture can require a high computation time.
Better processing will improve the performance of the circuit.

References
[1] - Alfred J. Menezes, Paul C. van Oorschot, Scott A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1st edition, December 1996
[2] - Francisco Rodríguez-Henríquez, N. A. Saqib, A. Díaz-Pérez, Çetin Kaya Koç, Cryptographic Algorithms on Reconfigurable Hardware (Signals and Communication Technology), Springer, 1st edition, November 2006
[3] - E. Savaş, Ç. K. Koç, The Montgomery Modular Inverse - Revisited, IEEE Computer Society, Vol. 49, no. 7, July 2000
[4] - F. Bernard, Scalable hardware implementing high-radix Montgomery multiplication algorithm, Elsevier North-Holland, Vol. 53, no. 2-3, February 2007
[5] - Jean-Pierre Deschamps, Géry Bioul, Gustavo Sutter, Synthesis of Arithmetic Circuits, Wiley-Interscience, March 2006
[6] - A. Daly, L. Marnane and E. Popovici, Fast Modular Inversion in the Montgomery Domain on Reconfigurable Logic, Irish Signals and Systems Conference (ISSC 2003), Limerick, July 2003
[7] - Guerric Meurice de Dormale, Philippe Bulens and Jean-Jacques Quisquater, An Improved Montgomery Modular Inversion Targeted for Efficient Implementation on FPGA, International Conference on Field-Programmable Technology, 2004