Efficient and flexible hardware structures of the 128 bit CLEFIA block cipher

IET Computers & Digital Techniques
Research Article
ISSN 1751-8601
Received on 15th June 2019
Revised 14th September 2019
Accepted on 3rd October 2019
E-First on 9th January 2020
doi: 10.1049/iet-cdt.2019.0157
www.ietdl.org

Bahram Rashidi1
1Department of Electrical Engineering, Ayatollah Boroujerdi University, Boroujerd 69199-69411, Iran
E-mail: b.rashidi@abru.ac.ir
Abstract: In this study, high-throughput and flexible hardware implementations of the CLEFIA lightweight block cipher are presented. A unified processing element is designed and shared to implement the generalised Feistel network, which computes the round keys and the encryption process at two separate times. The most complex blocks in the CLEFIA algorithm are the substitution boxes (S0 and S1). The S0 S-box is implemented with area-optimised combinational logic circuits. In the proposed S-box structure, the number of logic gates and the critical path delay are reduced by simplifying the computation terms. The S-box S1 consists of three steps: a field inversion over $F_{2^8}$ and two affine transformations over $F_2$. The inversion is implemented over the composite field $F_{(2^4)^2}$ instead of directly over $F_{2^8}$, which is an important factor in reducing area consumption. In addition, a flexible structure is proposed that can perform various configurations of CLEFIA to support variable key sizes: 128, 192 and 256 bit. Implementation results of the proposed architectures in 180 nm complementary metal–oxide–semiconductor technology are reported for the different key sizes. The results show improvements in terms of execution time, throughput and throughput/area compared with other related works.
1 Introduction

In the past few years, several lightweight block ciphers for hardware implementation have been proposed. Block ciphers are used for data protection in cryptographic applications, and these cryptographic primitives have been an important area of cryptographic research [1–3]. Many lightweight block ciphers have been proposed to reduce the hardware cost relative to the advanced encryption standard (AES), which is one of the most important and widely applied block ciphers but can be too expensive for hardware implementation [4]. CLEFIA [5, 6] is a block cipher with a 128 bit data block size and 128, 192 and 256 bit key lengths. It is suitable for hardware implementation, for example in embedded crypto-processors used in cryptographic application systems. This block cipher was developed by SONY corporation to provide flexibility and security in cryptographic applications [5]. It improves the security of the encryption process through techniques such as the diffusion switch mechanism, multiple diffusion matrices and two different non-linear S-boxes [6]. The CLEFIA block cipher was standardised by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) (ISO/IEC 29192-2) [7]. CLEFIA has also been proposed to the Application for Cryptographic Techniques towards the Revision of the e-Government Recommended Ciphers List and to the Internet Engineering Task Force for use in the Internet and the e-Government.

Several application-specific integrated circuit (ASIC) implementations of the CLEFIA algorithm have been reported in [6–10]. Akishita and Hiwatari [8] proposed very compact hardware implementations of CLEFIA with 128 bit keys based on 8 bit shift registers. The implementations are based on serialised architectures in the data processing (encryption process) part. In [9], five high-performance hardware architectures for the 128 bit CLEFIA block cipher are presented. In [10], both fault diagnosis schemes and original structures for CLEFIA are proposed.

In recent years, several field-programmable gate array-based implementations of the CLEFIA algorithm have been reported in the literature [11–17]. For example, in [12], a compact dual-cipher hardware architecture is proposed to support the CLEFIA and the AES algorithms. The architecture is implemented by adequately scheduling and merging the processing structures. In [13], a pipelined implementation of CLEFIA is presented. In [16], a compact and high-throughput hardware structure for the CLEFIA cipher using addressable shift registers and RAM blocks is proposed.

The focus of this paper is the design and implementation of efficient very-large-scale integration (VLSI) structures for the CLEFIA block cipher. The throughput, processing speed and flexibility of this block cipher are important factors for hardware implementations. Therefore, the proposed structures are built from efficient components for high-throughput and versatile applications. Implementation results of the proposed architectures in 180 nm complementary metal–oxide–semiconductor (CMOS) technology are reported for different key sizes. The results show that the proposed structures have high throughput and low computation time with acceptable hardware consumption compared with other related works. The contributions of this paper are as follows:

• A unified processing element is designed and shared to implement the generalised Feistel network (GFN), which computes the intermediate key and the encryption process at two separate times.
• The most complex blocks in the CLEFIA algorithm are the substitution boxes S0 and S1. The S0 S-box is implemented with area-optimised combinational logic circuits. In the proposed S-box structure, the number of logic gates and the critical path delay (CPD) are reduced by simplifying the computation terms.
• The S-box S1 consists of three steps: a field inversion over $F_{2^8}$ and two affine transformations over $F_2$. The inversion is implemented with an efficient structure over the composite field $F_{(2^4)^2}$ instead of the field $F_{2^8}$. This is an important factor in reducing area consumption.
• We propose a flexible structure that can perform various configurations of CLEFIA to support variable key sizes of 128, 192 and 256 bit. Therefore, this architecture provides versatile implementations that enable an adaptive security level using a variable key size.
IET Comput. Digit. Tech., 2020, Vol. 14 Iss. 2, pp. 69-79 69

© The Institution of Engineering and Technology 2019
• The flexible structure is a suitable candidate for a broad range of applications with multiple levels of security.

The rest of this paper is organised as follows. Section 2 recalls the CLEFIA block cipher. The proposed hardware structures are described in Sections 3 and 4. Section 5 compares our work with other related works. Finally, the paper is concluded in Section 6.
2 Description of the CLEFIA block cipher

The CLEFIA cryptographic algorithm is a block cipher that processes 128 bit data blocks using 128, 192 or 256 bit secret keys, so the key size is flexible. This block cipher was developed by SONY corporation [5] and standardised in ISO/IEC 29192-2 [7]. The number of rounds of CLEFIA is 18, 22 and 26 for 128, 192 and 256 bit keys, respectively. The CLEFIA algorithm is suitable for efficient hardware implementation in cryptographic applications. CLEFIA consists of two main parts: a data processing part and a key scheduling part. The fundamental structure of CLEFIA is the d-branch r-round GFN, denoted $GFN_{d,r}$, which is used to define both the data processing part and the key scheduling part. CLEFIA uses a 4-branch and an 8-branch GFN. The function $GFN_{d,r}$ uses two different 32 bit F-functions $F_0$ and $F_1$. For d 32 bit inputs $X_i$ and outputs $Y_i$ ($0 \le i < d$), and dr/2 32 bit round keys $RK_i$ ($0 \le i < dr/2$), $GFN_{4,r}$ and $GFN_{8,r}$ are defined as Algorithms 1 and 2 (see Figs. 1 and 2), respectively.

Fig. 1 Algorithm 1: $GFN_{4,r}$ computation

Fig. 2 Algorithm 2: $GFN_{8,r}$ computation
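Algorithm 1 is available here only as a figure reference. As a rough illustration, the 4-branch network can be sketched in Python following the round structure of the public CLEFIA specification (RFC 6114); this is an assumption about what Fig. 1 contains, not a transcription of it. The F-functions are passed in as callables, and the name `gfn4` is hypothetical.

```python
from typing import Callable, Sequence, Tuple

Word = int  # a 32 bit word held in a Python int
FFunc = Callable[[Word, Word], Word]  # F(round_key, x) -> 32 bit word


def gfn4(x: Sequence[Word], rk: Sequence[Word], r: int,
         f0: FFunc, f1: FFunc) -> Tuple[Word, Word, Word, Word]:
    """4-branch, r-round generalised Feistel network (sketch).

    x  : four 32 bit input words X0..X3
    rk : 2*r round keys RK0..RK(2r-1)
    """
    if len(x) != 4 or len(rk) != 2 * r:
        raise ValueError("expected 4 input words and 2*r round keys")
    t0, t1, t2, t3 = x
    for i in range(r):
        # Two F-functions act in parallel on branches 0 and 2.
        t1 ^= f0(rk[2 * i], t0)
        t3 ^= f1(rk[2 * i + 1], t2)
        # Rotate the four branches by one word position.
        t0, t1, t2, t3 = t1, t2, t3, t0
    # The rotation performed in the last round is undone at the output.
    return (t3, t0, t1, t2)
```

With a toy F-function such as `lambda rk, v: rk ^ v` (purely illustrative, not the CLEFIA F-function), one round maps inputs (1, 2, 4, 8) with round keys (16, 32) to (1, 19, 4, 44): only the second and fourth branches change in a round.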
2.1 F-functions

Two F-functions $F_0$ and $F_1$ are used in $GFN_{d,r}$. Algorithms 3 and 4 (see Figs. 3 and 4) show the computations of $F_0$ and $F_1$, respectively. As seen from these algorithms, $F_0$ and $F_1$ are constructed from two non-linear 8 bit S-boxes $S_0$ and $S_1$ and two 4 × 4 matrices $M_0$ and $M_1$. In each F-function, the two S-boxes are used in a different order and a different matrix is used ($M_0$ for $F_0$ and $M_1$ for $F_1$). The two matrices are defined as follows:

$$M_0 = \begin{pmatrix} \mathtt{0x1} & \mathtt{0x2} & \mathtt{0x4} & \mathtt{0x6} \\ \mathtt{0x2} & \mathtt{0x1} & \mathtt{0x6} & \mathtt{0x4} \\ \mathtt{0x4} & \mathtt{0x6} & \mathtt{0x1} & \mathtt{0x2} \\ \mathtt{0x6} & \mathtt{0x4} & \mathtt{0x2} & \mathtt{0x1} \end{pmatrix}, \quad M_1 = \begin{pmatrix} \mathtt{0x1} & \mathtt{0x8} & \mathtt{0x2} & \mathtt{0xa} \\ \mathtt{0x8} & \mathtt{0x1} & \mathtt{0xa} & \mathtt{0x2} \\ \mathtt{0x2} & \mathtt{0xa} & \mathtt{0x1} & \mathtt{0x8} \\ \mathtt{0xa} & \mathtt{0x2} & \mathtt{0x8} & \mathtt{0x1} \end{pmatrix}.$$

The multiplications of these matrices by the vector T are performed in the binary finite field $F_{2^8}$ constructed by the irreducible primitive polynomial $z^8 + z^4 + z^3 + z^2 + 1$.

Fig. 3 Algorithm 3: $F_0$ function computation

Fig. 4 Algorithm 4: $F_1$ function computation

2.1.1 S-box S0: This block is constructed from four 4 bit S-boxes SS0, SS1, SS2 and SS3 by the following computations:

$$t_0 \leftarrow SS_0(x_0), \quad t_1 \leftarrow SS_1(x_1),$$
$$u_0 \leftarrow t_0 \oplus \mathtt{0x2} \times t_1, \quad u_1 \leftarrow t_1 \oplus \mathtt{0x2} \times t_0,$$
$$y_0 \leftarrow SS_2(u_0), \quad y_1 \leftarrow SS_3(u_1)$$

where $x = x_1 x_0$ and $y = y_1 y_0$ are the input and output of S-box S0, respectively ($x_i, y_i \in F_{2^4}$). The values of these 4 bit S-boxes are presented in Table 1. The multiplication by 0x2 in the terms $\mathtt{0x2} \times t_0$ and $\mathtt{0x2} \times t_1$ is performed in $F_{2^4}$ constructed by the primitive polynomial $f_2(z) = z^4 + z + 1$.

2.1.2 S-box S1: The 8 bit S-box S1, for input x and output y, is defined as follows:

$$y = \begin{cases} g(f(x)^{-1}) & \text{if } f(x) \ne 0 \\ g(0) & \text{if } f(x) = 0 \end{cases}$$

The S-box S1 consists of three steps: two affine transformations f and g over $F_2$ and a field inversion over $F_{2^8}$ [5]. The inversion is performed in $F_{2^8}$ defined by the primitive polynomial $f_1(z) = z^8 + z^4 + z^3 + z^2 + 1$.

2.2 Key schedules of the CLEFIA block cipher

The CLEFIA key scheduling supports 128, 192 and 256 bit keys and produces the whitening keys $WK_i$ ($0 \le i < 4$) and the round keys $RK_j$ ($0 \le j < 2r$) for the data processing (encryption process) part of the CLEFIA cipher. The key schedule consists of two steps. In the first step, an intermediate key L is generated from the main key K. In the second step, K and L are expanded to generate the round keys $RK_j$. To generate L from K, the key schedule uses a 128 bit $GFN_{4,12}$ for a 128 bit key and a 256 bit $GFN_{8,10}$ for 192/256 bit keys. For the 128 bit key scheduling, 24 32 bit constant values $CON_i^{(128)}$ ($0 \le i < 24$) as round keys and $K = K_3 K_2 K_1 K_0$ as an input are applied to $GFN_{4,12}$ to generate the intermediate key L. In the next step, K and L are used to generate the round keys $RK_j$ ($0 \le j < 36$) by using 36 32 bit constant values $CON_i^{(128)}$ ($24 \le i < 60$). Algorithm 5 (see Fig. 5) shows the 128 bit key scheduling.

Table 1 Values of four 4 bit S-boxes SS0, SS1, SS2 and SS3
x 0 1 2 3 4 5 6 7 8 9 a b c d e f
SS0(x) e 6 c a 8 7 2 f b 1 4 0 5 9 d 3
SS1(x) 6 4 0 d 2 b a 3 9 c e f 8 7 5 1
SS2(x) b 8 5 e a 6 4 c f 7 2 3 1 0 d 9
SS3(x) a 2 6 d 3 4 5 e 0 7 8 9 b f c 1
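The S-box S0 construction of Section 2.1.1 can be sketched from the values in Table 1 as follows. This is a minimal illustration, not the hardware structure proposed later. The nibble order is an assumption: $x_0$ and $y_0$ are taken as the upper 4 bits, which is the convention of the public CLEFIA specification, and the function names are hypothetical.

```python
# 4 bit S-boxes, copied from Table 1.
SS0 = [0xE, 0x6, 0xC, 0xA, 0x8, 0x7, 0x2, 0xF, 0xB, 0x1, 0x4, 0x0, 0x5, 0x9, 0xD, 0x3]
SS1 = [0x6, 0x4, 0x0, 0xD, 0x2, 0xB, 0xA, 0x3, 0x9, 0xC, 0xE, 0xF, 0x8, 0x7, 0x5, 0x1]
SS2 = [0xB, 0x8, 0x5, 0xE, 0xA, 0x6, 0x4, 0xC, 0xF, 0x7, 0x2, 0x3, 0x1, 0x0, 0xD, 0x9]
SS3 = [0xA, 0x2, 0x6, 0xD, 0x3, 0x4, 0x5, 0xE, 0x0, 0x7, 0x8, 0x9, 0xB, 0xF, 0xC, 0x1]


def mul2_gf16(a: int) -> int:
    """Multiply a 4 bit value by 0x2 (i.e. by z) modulo z^4 + z + 1."""
    a <<= 1
    if a & 0x10:
        a ^= 0x13  # reduce with f2(z) = z^4 + z + 1
    return a


def s0(x: int) -> int:
    """8 bit S-box S0 built from the four 4 bit S-boxes."""
    x0, x1 = x >> 4, x & 0xF      # assumption: x0 is the upper nibble
    t0, t1 = SS0[x0], SS1[x1]
    u0 = t0 ^ mul2_gf16(t1)
    u1 = t1 ^ mul2_gf16(t0)
    y0, y1 = SS2[u0], SS3[u1]
    return (y0 << 4) | y1         # assumption: y0 is the upper nibble
```

Under this nibble convention, `s0(0x00)` evaluates to `0x57`, and the 256 outputs form a permutation of the byte values.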
In the key scheduling for 192 bit keys, the 128 bit values $K_L$ and $K_R$ are generated from the 192 bit main key $K = K_5 K_4 K_3 K_2 K_1 K_0$, where the $K_i$ ($0 \le i \le 5$) are 32 bit words. In this case, $K_L = K_5 K_4 K_3 K_2$ and $K_R = K_1 K_0 \bar{K}_5 \bar{K}_4$, where $\bar{K}_i$ is the bitwise complement of $K_i$. The constant values $CON_i^{(192)}$ ($0 \le i < 40$) as round keys and $K_L K_R$ as a 256 bit input are applied to $GFN_{8,10}$ to generate the two 128 bit values $L_L$ and $L_R$. The key scheduling for a 256 bit key is similar to that for a 192 bit key. However, for a 256 bit key, the constant values, the number of $RK_i$ and the initialisation of $K_R$ differ from the 192 bit case. Algorithm 6 (see Fig. 6) shows the key scheduling for 192 and 256 bit keys.

Fig. 5 Algorithm 5: key scheduling for 128 bit keys
In the key scheduling algorithms, the DoubleSwap function $\Sigma(X)$ is defined as follows:

$$\Sigma(X) = Y = X[120:64] \parallel X[6:0] \parallel X[127:121] \parallel X[63:7]$$

where X and Y are 128 bit numbers.
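A minimal sketch of the DoubleSwap function on a Python integer is given below. It assumes that bit 127 is the most significant bit and that the four concatenated slices are X[120:64], X[6:0], X[127:121] and X[63:7] (57, 7, 7 and 57 bits), as in the public CLEFIA specification; the helper names are hypothetical.

```python
def _bits(x: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive, bit 0 = least significant)."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def double_swap(x: int) -> int:
    """DoubleSwap of a 128 bit value held in a Python int."""
    return ((_bits(x, 120, 64) << 71)    # 57 bits -> output bits 127..71
            | (_bits(x, 6, 0) << 64)     # 7 bits  -> output bits 70..64
            | (_bits(x, 127, 121) << 57) # 7 bits  -> output bits 63..57
            | _bits(x, 63, 7))           # 57 bits -> output bits 56..0
```

The function only moves bits, so each of the 128 single-bit inputs maps to a distinct single-bit output; in hardware this corresponds to pure wiring.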

As seen from Algorithms 5 and 6, 60, 84 and 92 32 bit constant values $CON_i^{(k)}$ must be computed for 128, 192 and 256 bit keys, respectively. The constant values $CON_i^{(k)}$ for k = 128, 192 and 256 are generated by Algorithm 7 (see Fig. 7). In step 5 of Algorithm 7, the field multiplication by $(\mathtt{0x0002})^{-1}$ is performed in the finite field $F_{2^{16}}$ constructed by the primitive polynomial $f_3(z) = z^{16} + z^{15} + z^{13} + z^{11} + z^5 + z^4 + 1$.
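The multiplication by $(\mathtt{0x0002})^{-1}$ and the constant generation can be sketched as follows. Only the polynomial $f_3(z)$ and the number of constants come from the text. The values of `P`, `Q` and the initial values `IV`, and the way each pair of constants is formed, are assumptions taken from the public CLEFIA specification, because Algorithm 7 is available here only as a figure reference.

```python
POLY16 = 0x1A831  # f3(z) = z^16 + z^15 + z^13 + z^11 + z^5 + z^4 + 1

# Assumed values from the public CLEFIA specification (not stated in this text).
P, Q = 0xB7E1, 0x243F
IV = {128: 0x428A, 192: 0x7137, 256: 0xB5C0}
NUM_CON = {128: 60, 192: 84, 256: 92}  # counts given in the text


def mul_inv2(a: int) -> int:
    """Return a * (0x0002)^-1 in F_{2^16}, i.e. a divided by z."""
    if a & 1:
        a ^= POLY16  # clear the constant term without changing a mod f3
    return a >> 1


def rotl16(v: int, n: int) -> int:
    """Rotate a 16 bit value left by n bits."""
    return ((v << n) | (v >> (16 - n))) & 0xFFFF


def generate_constants(k: int) -> list:
    """Generate the 32 bit constants for a k bit key (sketch)."""
    t = IV[k]
    con = []
    for _ in range(NUM_CON[k] // 2):
        nt = t ^ 0xFFFF  # bitwise complement of the 16 bit state
        con.append(((t ^ P) << 16) | rotl16(nt, 1))
        con.append(((nt ^ Q) << 16) | rotl16(t, 8))
        t = mul_inv2(t)
    return con
```

Under these assumptions, the first constant for a 128 bit key evaluates to `0xF56B7AEB`. The division by z costs one conditional XOR with the polynomial followed by a shift.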
3 Proposed hardware structures for 128, 192 and 256 bit keys
In this section, we present hardware structures for implementation
of the CLEFIA block cipher.
3.1 Hardware structure of the F-functions

Fig. 6 Algorithm 6: key scheduling for 192 and 256 bit keys

The multiplications of the $M_0$ and $M_1$ matrices by the vector T are performed in the binary finite field $F_{2^8}$ constructed by the irreducible primitive polynomial $z^8 + z^4 + z^3 + z^2 + 1$. In the following, we present the proposed method for implementing these matrix multiplications in more detail. The numbers 0x2, 0x4, 0x6, 0x8 and 0xA correspond to the field elements $z$, $z^2$, $z^2 + z$, $z^3$ and $z^3 + z$, respectively, in $F_{2^8}$. Therefore, the matrix multiplication $Y \leftarrow M_0 \times T$ is computed as follows:

$$Y_3 = T_3 + zT_2 + z^2T_1 + (z^2 + z)T_0$$
$$Y_2 = zT_3 + T_2 + (z^2 + z)T_1 + z^2T_0$$
$$Y_1 = z^2T_3 + (z^2 + z)T_2 + T_1 + zT_0$$
$$Y_0 = (z^2 + z)T_3 + z^2T_2 + zT_1 + T_0$$
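As a software illustration of these products, the sketch below multiplies a 4-byte vector by either matrix using shift-and-reduce arithmetic modulo $z^8 + z^4 + z^3 + z^2 + 1$. The matrices are those of Section 2.1. The vector is assumed to be ordered $[T_3, T_2, T_1, T_0]$ to match the equations, and the function names are hypothetical.

```python
POLY8 = 0x11D  # z^8 + z^4 + z^3 + z^2 + 1

M0 = [[0x1, 0x2, 0x4, 0x6], [0x2, 0x1, 0x6, 0x4],
      [0x4, 0x6, 0x1, 0x2], [0x6, 0x4, 0x2, 0x1]]
M1 = [[0x1, 0x8, 0x2, 0xA], [0x8, 0x1, 0xA, 0x2],
      [0x2, 0xA, 0x1, 0x8], [0xA, 0x2, 0x8, 0x1]]


def xtime(a: int) -> int:
    """Multiply a field element by z (i.e. by 0x2)."""
    a <<= 1
    if a & 0x100:
        a ^= POLY8
    return a


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of F_{2^8} by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def mat_mul(m: list, t: list) -> list:
    """Return [Y3, Y2, Y1, Y0] = m x [T3, T2, T1, T0] over F_{2^8}."""
    out = []
    for row in m:
        acc = 0
        for coeff, value in zip(row, t):
            acc ^= gf_mul(coeff, value)
        out.append(acc)
    return out
```

For example, `mat_mul(M0, [1, 0, 0, 0])` returns the first column `[1, 2, 4, 6]`. Because every coefficient is a sum of at most two of $z$, $z^2$ and $z^3$, a hardware version needs only repeated multiplications by $z$ and XORs.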
Also, for $Y \leftarrow M_1 \times T$ we have

$$Y_3 = T_3 + z^3T_2 + zT_1 + (z^3 + z)T_0$$
$$Y_2 = z^3T_3 + T_2 + (z^3 + z)T_1 + zT_0$$
$$Y_1 = zT_3 + (z^3 + z)T_2 + T_1 + z^3T_0$$
$$Y_0 = (z^3 + z)T_3 + zT_2 + z^3T_1 + T_0$$

The computations of $Y_3$, $Y_2$, $Y_1$ and $Y_0$ are performed based on multiplication by the terms $z$, $z^2$ and $z^3$ modulo the irreducible polynomial $f_1(z) = z^8 + z^4 + z^3 + z^2 + 1$. Let $A = \sum_{i=0}^{7} a_i z^i$ be an element in $F_{2^8}$; in vector representation, A is equal to the vector $[a_7, a_6, a_5, a_4, a_3, a_2, a_1, a_0]$, where $a_i \in F_2 = \{0, 1\}$. The multiplications by $z$, $z^2$ and $z^3$ in $F_{2^8}$ are equal to

$$zA = [a_6, a_5, a_4, a_3 \oplus a_7, a_2 \oplus a_7, a_1 \oplus a_7, a_0, a_7],$$
$$z^2A = [a_5, a_4, a_3 \oplus a_7, a_2 \oplus a_7 \oplus a_6, a_1 \oplus a_7 \oplus a_6, a_0 \oplus a_6, a_7, a_6],$$
$$z^3A = [a_4, a_3 \oplus a_7, a_2 \oplus a_7 \oplus a_6, a_1 \oplus a_7 \oplus a_6 \oplus a_5, a_0 \oplus a_6 \oplus a_5, a_7 \oplus a_5, a_6, a_5],$$

respectively. Figs. 8a–c show the structures for multiplication by $z$, $z^2$ and $z^3$, respectively, over $F_{2^8}$ with irreducible polynomial $f_1(z) = z^8 + z^4 + z^3 + z^2 + 1$.

Fig. 7 Algorithm 7: constant values generation

Fig. 8 Hardware structures for multiplication by
(a) $z$, (b) $z^2$, (c) $z^3$ over $F_{2^8}$ with irreducible polynomial $f_1(z) = z^8 + z^4 + z^3 + z^2 + 1$

3.1.1 Hardware structure of the S-box S0: The 4 bit S-boxes (SS0, SS1, SS2 and SS3) in the structure of S-box S0 are important blocks in the VLSI implementation. Here, we use Karnaugh mapping to simplify the expressions for each of the four S-box outputs. Then, the expressions are simplified further algebraically. The input bits and the output bits are denoted by $x_3, x_2, x_1, x_0$ and $S_{i,3}, S_{i,2}, S_{i,1}, S_{i,0}$, respectively, where $0 \le i \le 3$. For further optimisation, after simplification by Karnaugh mapping, we factorise the similar algebraic terms to reduce the number of logic gates. The proposed simplifications of the four S-boxes SS0, SS1, SS2 and SS3 are presented in Table 2.

For example, in the structure of the 4-input S-box SS0, the hardware consumption of the original expressions after simplification by Karnaugh mapping is 32 2-input AND gates, 13 2-input OR gates and 4 NOT gates. After factorisation, the hardware consumption is reduced to 17 2-input AND gates, 10 2-input OR gates, 2 2-input exclusive OR (XOR) gates, 1 2-input exclusive NOR (XNOR) gate and 4 NOT gates. Therefore, the hardware resources of the S-boxes are reduced by simplifying the algebraic expressions. Figs. 9a–d show the proposed structures of S-boxes SS0, SS1, SS2 and SS3, respectively. As seen in these figures, the S-boxes are constructed using only logic gates with an
Table 2 Proposed simplifications of four S-boxes SS0, SS1, SS2 and SS3

| Outputs | After simplification by Karnaugh mapping | Further simplification of algebraic terms |
| --- | --- | --- |
| SS0,3 | x3′x1′x0′ + x3′x1x0 + x2′x1′x0′ + x3x2x1′x0 + x3x2x1x0′ + x3′x2′x0′ | x3x2(x1 ⊕ x0) + x3′(x1 ⊙ x0) + x2′x0′(x3′ + x1′) |
| SS0,2 | x3′x2x0 + x3x2x0′ + x3′x2′x1′ + x2′x1x0′ | x2(x3 ⊕ x0) + x2′(x3′x1′ + x1x0′) |
| SS0,1 | x3′x0 + x3′x2x1 + x2′x1′x0′ + x2x1x0 | x3′(x0 + x2x1) + x2′x1′x0′ + x2x1x0 |
| SS0,0 | x2x0 + x3x1′ + x3x2 | x2(x0 + x3) + x3x1′ |
| SS1,3 | x2′x1x0 + x3′x2x1′x0 + x3′x2x1x0′ + x3x2′ + x3x1′x0′ | x3′x2(x1 ⊕ x0) + x2′(x1x0 + x3) + x3x1′x0′ |
| SS1,2 | x3′x2′x1′ + x2′x0 + x3x1′x0 + x3x1x0′ | x3(x1 ⊕ x0) + x2′(x3′x1′ + x0) |
| SS1,1 | x3′x1′x0′ + x3′x2 + x3x2′x1 + x2x1′x0 | x3′(x1′x0′ + x2) + x3x2′x1 + x2x1′x0 |
| SS1,0 | x1x0 + x2x0 + x3x2′x1′x0′ + x3x2x1 | x0(x1 + x2) + x3(x2′x1′x0′ + x2x1) |
| SS2,3 | x3′x1′x0′ + x2′x1′x0′ + x3x2x1 + x3′x2′x0 + x3′x1x0 | x3′(x1 ⊙ x0) + x2′(x1′x0′ + x3′x0) + x3x2x1 |
| SS2,2 | x3′x1 + x3′x2x0 + x3x2′x1′ + x2x1x0′ | x3′(x1 + x2x0) + x3x2′x1′ + x2x1x0′ |
| SS2,1 | x2′x1x0 + x3′x2x1′ + x3x2′ + x3′x1′x0′ | x2′(x1x0 + x3) + x3′x1′(x2 + x0′) |
| SS2,0 | x3′x2′x0′ + x3x2′x1′ + x3x1x0 + x3x2x0′ | x3(x2′x1′ + x1x0 + x2x0′) + x3′x2′x0′ |
| SS3,3 | x3′x2′x1′x0′ + x3′x1x0 + x3x2x1′ + x3x2′x1 + x3x1x0′ | x3′(x2′x1′x0′ + x1x0) + x3(x2 ⊕ x1 + x1x0′) |
| SS3,2 | x3′x1 + x3x1′x0 + x2x1x0′ + x3′x2x0 | x0(x3′x2 + x3x1′) + x1(x3′ + x0′x2) |
| SS3,1 | x3′x2′x0′ + x3′x2x1x0 + x2′x1′x0 + x2x1′x0′ + x3x1′x0 | x1′x0(x2′ + x3) + x2(x3′x1x0 + x1′x0′) + x3′x2′x0′ |
| SS3,0 | x2′x1x0 + x3′x2x0′ + x3x0 + x2x1′x0′ | x0′x2(x3′ + x1′) + x0(x3 + x2′x1) |

optimised structure. The CPD of S-boxes SS0, SS1, SS2 and SS3 in the proposed structures is reduced to $T_X + T_A + 2T_O$, $3T_X + 2T_O$, $2T_A + 3T_O$ and $T_X + T_A + 2T_O$, respectively, where $T_X$, $T_O$ and $T_A$ are the time delays of the 2-input XOR gate, the 2-input OR gate and the 2-input AND gate, respectively.

Fig. 9 Proposed structures of 4 bit S-boxes
(a) SS0, (b) SS1, (c) SS2, (d) SS3

The area and CPD of S-box S0 when directly synthesising the equations using the Synopsys Design Compiler tool based on the library of standard cells with 180 nm CMOS technology are equal to 209 gate equivalent (GE) and 0.964 ns, respectively.

3.1.2 Hardware structure of the S-box S1: As mentioned before, the S-box S1 consists of three steps: two affine transformations f and g over $F_2$ and a field inversion over $F_{2^8}$ [5]. Implementing the inversion over a composite field instead of directly over $F_{2^8}$ is an important factor in reducing area consumption [18]. Here, for the proposed implementation, we use the composite field $F_{(2^4)^2}$ defined by the following irreducible polynomials [5]:

$$F_{2^4}: \; f_2(z) = z^4 + z + 1$$
$$F_{(2^4)^2}: \; q(z) = z^2 + z + \lambda \quad (\lambda = \omega^3)$$

where $\omega$ is a root of $f_2(z)$. We define an isomorphic mapping $\varphi$ from $F_{2^8}$ to $F_{(2^4)^2}$ and its inverse isomorphic mapping $\varphi^{-1}$ from $F_{(2^4)^2}$ to $F_{2^8}$ as follows:

$$\varphi: \; a_7\alpha^7 + a_6\alpha^6 + a_5\alpha^5 + a_4\alpha^4 + a_3\alpha^3 + a_2\alpha^2 + a_1\alpha + a_0 \to (b_7\omega^3 + b_6\omega^2 + b_5\omega + b_4)\beta + (b_3\omega^3 + b_2\omega^2 + b_1\omega + b_0)$$
$$\varphi^{-1}: \; (a_7\omega^3 + a_6\omega^2 + a_5\omega + a_4)\beta + (a_3\omega^3 + a_2\omega^2 + a_1\omega + a_0) \to b_7\alpha^7 + b_6\alpha^6 + b_5\alpha^5 + b_4\alpha^4 + b_3\alpha^3 + b_2\alpha^2 + b_1\alpha + b_0$$

where $a_i, b_i \in F_2$.

The affine transformation f and the isomorphic mapping $\varphi$ are merged into a single matrix operation $\varphi \circ f$:

$$\begin{pmatrix} b_7 \\ b_6 \\ b_5 \\ b_4 \\ b_3 \\ b_2 \\ b_1 \\ b_0 \end{pmatrix} = \begin{pmatrix} 0&0&0&0&0&0&0&1 \\ 0&0&0&0&0&0&1&0 \\ 0&0&0&0&1&1&0&0 \\ 0&0&0&1&0&0&1&0 \\ 0&0&1&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&0&0&0&1&0&1 \\ 1&0&0&1&0&0&0&0 \end{pmatrix} \times \begin{pmatrix} a_7 \\ a_6 \\ a_5 \\ a_4 \\ a_3 \\ a_2 \\ a_1 \\ a_0 \end{pmatrix} \oplus \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 1 \\ 0 \\ 1 \end{pmatrix}$$

The inverse isomorphic mapping $\varphi^{-1}$ and the affine transformation g are also merged into a single matrix operation $g \circ \varphi^{-1}$:

$$\begin{pmatrix} b_7 \\ b_6 \\ b_5 \\ b_4 \\ b_3 \\ b_2 \\ b_1 \\ b_0 \end{pmatrix} = \begin{pmatrix} 0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&1&0&1 \\ 0&0&0&1&1&0&0&0 \\ 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&1&0 \\ 0&0&0&0&1&1&0&0 \\ 0&0&1&1&0&0&0&0 \\ 1&0&1&0&0&0&0&0 \end{pmatrix} \times \begin{pmatrix} a_7 \\ a_6 \\ a_5 \\ a_4 \\ a_3 \\ a_2 \\ a_1 \\ a_0 \end{pmatrix} \oplus \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$

These matrix representations can be expressed as the following computations for hardware implementation:

$$\varphi \circ f: \; b_7 = a_0, \; b_6 = a_1', \; b_5 = (a_3 \oplus a_2)', \; b_4 = (a_4 \oplus a_1), \; b_3 = a_5, \; b_2 = a_6', \; b_1 = (a_2 \oplus a_0), \; b_0 = (a_7 \oplus a_4)'$$
$$g \circ \varphi^{-1}: \; b_7 = a_1, \; b_6 = (a_2 \oplus a_0)', \; b_5 = (a_4 \oplus a_3)', \; b_4 = a_7, \; b_3 = (a_6 \oplus a_1)', \; b_2 = (a_3 \oplus a_2), \; b_1 = (a_5 \oplus a_4), \; b_0 = (a_7 \oplus a_5)' \quad (1)$$

The hardware resources for implementation of the merged mappings $\varphi \circ f$ and $g \circ \varphi^{-1}$ are (two 2-input XOR gates, two 2-input XNOR gates and two NOT gates) and (two 2-input XOR gates and four 2-input XNOR gates), respectively.

In the following, we present the proposed hardware structure of the field inversion over the composite field $F_{(2^4)^2}$. Let $n_1\beta + n_0$ be an arbitrary element in $F_{(2^4)^2}$, where $n_1, n_0 \in F_{2^4}$. The inverse $m_1\beta + m_0 = (n_1\beta + n_0)^{-1}$ ($m_1, m_0 \in F_{2^4}$) can be calculated as follows:

$$m_1 = n_1\Delta^{-1}, \quad m_0 = (n_1 + n_0)\Delta^{-1}, \quad (2)$$

where $\Delta = (n_1 + n_0)n_0 + \lambda n_1^2$ and the vector representation of $\lambda = \omega^3$ is (1, 0, 0, 0) [5].

Fig. 10a shows the structure of the field inversion operation over the composite field $F_{(2^4)^2}$. This operation is constructed based on four sub-blocks that consist of multiplication by constant $\lambda$, four field multiplications, one field inversion, one field squaring and two field additions. The multiplication, inversion and squaring sub-blocks are defined over $F_{2^4}$ with the primitive polynomial $f_2(z) = z^4 + z + 1$. The multiplication, inversion and squaring operations over $F_{2^4}$ are implemented based on [19, 20]. However, in the proposed implementation, we applied further optimisation in the structures of multiplication and inversion over $F_{2^4}$ for reducing CPD and area consumption.
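The three steps of S-box S1 can be sketched in software by combining the merged mappings of Equation (1) with the composite-field inversion of Equation (2). This is a behavioural sketch, not the gate-level structure of Fig. 10. The $F_{2^4}$ arithmetic is done by generic shift-and-reduce instead of the optimised expressions, $\lambda = \omega^3$ is encoded as `0x8`, the upper nibble is taken as the coefficient of $\beta$, and all function names are hypothetical.

```python
LAMBDA = 0x8  # omega^3 in the basis (omega^3, omega^2, omega, 1)


def gf16_mul(a: int, b: int) -> int:
    """Multiply two elements of F_{2^4} modulo f2(z) = z^4 + z + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf16_inv(a: int) -> int:
    """a^14 equals a^-1 for a != 0 and maps 0 to 0."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r


def composite_mul(x: int, y: int) -> int:
    """Multiply in F_{(2^4)^2} modulo q(z) = z^2 + z + lambda."""
    a1, a0, b1, b0 = x >> 4, x & 0xF, y >> 4, y & 0xF
    p = gf16_mul(a1, b1)
    hi = p ^ gf16_mul(a1, b0) ^ gf16_mul(a0, b1)
    lo = gf16_mul(a0, b0) ^ gf16_mul(LAMBDA, p)
    return (hi << 4) | lo


def composite_inv(x: int) -> int:
    """Invert n1*beta + n0 following Equation (2); x = (n1 << 4) | n0."""
    n1, n0 = x >> 4, x & 0xF
    delta = gf16_mul(n1 ^ n0, n0) ^ gf16_mul(LAMBDA, gf16_mul(n1, n1))
    d_inv = gf16_inv(delta)
    return (gf16_mul(n1, d_inv) << 4) | gf16_mul(n1 ^ n0, d_inv)


def _bits(x: int) -> list:
    return [(x >> i) & 1 for i in range(8)]  # index i holds bit a_i


def _pack(b: list) -> int:
    return sum(v << i for i, v in enumerate(b))  # index i holds bit b_i


def phi_f(x: int) -> int:
    """Merged mapping phi o f, listed from b0 up to b7."""
    a = _bits(x)
    return _pack([a[7] ^ a[4] ^ 1, a[2] ^ a[0], a[6] ^ 1, a[5],
                  a[4] ^ a[1], a[3] ^ a[2] ^ 1, a[1] ^ 1, a[0]])


def g_phi_inv(x: int) -> int:
    """Merged mapping g o phi^-1, listed from b0 up to b7."""
    a = _bits(x)
    return _pack([a[7] ^ a[5] ^ 1, a[5] ^ a[4], a[3] ^ a[2], a[6] ^ a[1] ^ 1,
                  a[7], a[4] ^ a[3] ^ 1, a[2] ^ a[0] ^ 1, a[1]])


def s1(x: int) -> int:
    """8 bit S-box S1: affine map, composite-field inversion, affine map."""
    return g_phi_inv(composite_inv(phi_f(x)))
```

With these conventions, `s1(0x00)` evaluates to `0x6C` and `s1(0x01)` to `0xDA`. The case $f(x) = 0$ needs no special handling because the inversion sketch maps 0 to 0.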
Let $A = a_3z^3 + a_2z^2 + a_1z + a_0$ and $B = b_3z^3 + b_2z^2 + b_1z + b_0$ be two field elements in $F_{2^4}$, where $a_i, b_i \in F_2$. The product of these two field elements, $C = A \times B = c_3z^3 + c_2z^2 + c_1z + c_0$, is computed as follows [19]:

$$c_0 = a_0b_0 \oplus a_3b_1 \oplus a_2b_2 \oplus a_1b_3 \quad (3)$$
$$c_1 = a_1b_0 \oplus (a_0 \oplus a_3)b_1 \oplus (a_2 \oplus a_3)b_2 \oplus (a_1 \oplus a_2)b_3 \quad (4)$$
$$c_2 = a_2b_0 \oplus a_1b_1 \oplus (a_0 \oplus a_3)b_2 \oplus (a_2 \oplus a_3)b_3 \quad (5)$$
$$c_3 = a_3b_0 \oplus a_2b_1 \oplus a_1b_2 \oplus (a_0 \oplus a_3)b_3 \quad (6)$$

The hardware structures of field squaring, field multiplication and multiplication by constant $\lambda$ over $F_{2^4}$ are shown in Fig. 10b–d, respectively. As seen from Fig. 10c, the common terms $(a_0 \oplus a_3)$ and $(a_2 \oplus a_3)$ are implemented by resource sharing of XOR gates. For implementation of the field inversion over $F_{2^4}$, i.e. $I = A^{-1} \bmod f_2(z) = i_3z^3 + i_2z^2 + i_1z + i_0$, we have [19]

$$i_0 = a_0 \oplus a_1 \oplus a_2 \oplus a_0a_2 \oplus a_1a_2 \oplus a_0a_1a_2 \oplus a_3 \oplus a_1a_2a_3 \quad (7)$$
$$i_1 = a_0a_1 \oplus a_0a_2 \oplus a_1a_2 \oplus a_3 \oplus a_1a_3 \oplus a_0a_1a_3 \quad (8)$$
$$i_2 = a_0a_1 \oplus a_2 \oplus a_0a_2 \oplus a_3 \oplus a_0a_3 \oplus a_0a_2a_3 \quad (9)$$
$$i_3 = a_1 \oplus a_2 \oplus a_3 \oplus a_0a_3 \oplus a_1a_3 \oplus a_2a_3 \oplus a_1a_2a_3 \quad (10)$$

In the following, we apply further simplifications to the $i_0$, $i_1$, $i_2$ and $i_3$ terms:

$$\begin{aligned} i_0 &= a_0(1 \oplus a_2) \oplus a_1(1 \oplus a_2) \oplus a_2(1 \oplus a_0a_1) \oplus a_3(1 \oplus a_1a_2) \\ &= a_0a_2' \oplus a_1a_2' \oplus a_2(a_0a_1)' \oplus a_3(a_1a_2)' \\ &= [a_2'(a_0 \oplus a_1) \oplus a_2(a_0a_1)' \oplus a_3(a_1a_2)'] \end{aligned} \quad (11)$$

$$\begin{aligned} i_1 &= a_0(a_1 \oplus a_2) \oplus a_1a_2 \oplus a_3(1 \oplus a_1) \oplus a_0a_1a_3 \\ &= a_0(a_1 \oplus a_2) \oplus a_1a_2 \oplus a_3a_1' \oplus a_0a_1a_3 \\ &= [a_0(a_1 \oplus a_2) \oplus a_1a_2 \oplus a_3(a_1' \oplus a_0a_1)] \end{aligned} \quad (12)$$

$$\begin{aligned} i_2 &= a_0(a_1 \oplus a_2) \oplus (a_2 \oplus a_3) \oplus a_0a_3(1 \oplus a_2) \\ &= a_0(a_1 \oplus a_2) \oplus a_2 \oplus a_3 \oplus a_0a_3a_2' \\ &= a_0(a_1 \oplus a_2) \oplus a_2 \oplus a_3(1 \oplus a_0a_2') \\ &= [a_0(a_1 \oplus a_2) \oplus a_2 \oplus a_3(a_0a_2')'] \end{aligned} \quad (13)$$

$$\begin{aligned} i_3 &= a_1 \oplus a_2 \oplus a_3(1 \oplus a_0 \oplus a_1 \oplus a_2 \oplus a_1a_2) \\ &= a_1 \oplus a_2 \oplus a_3(a_0 \oplus (1 \oplus a_1)(1 \oplus a_2)) \\ &= [a_1 \oplus a_2 \oplus a_3(a_0 \oplus a_1'a_2')] \end{aligned} \quad (14)$$

Fig. 10e shows the proposed efficient hardware structure for field inversion over $F_{2^4}$. The hardware consumption and CPD of the original field inversion expressions are (23 2-input XOR gates and 11 2-input AND gates) and $(3T_X + 2T_A)$, respectively. After further simplifications of the $i_0$, $i_1$, $i_2$ and $i_3$ terms, the hardware consumption and CPD are (11 2-input XOR gates, 9 2-input AND gates, 3 2-input NAND gates and 3 NOT gates) and $(3T_X + T_A)$, respectively. Therefore, in the proposed structure, the number of logic gates and the CPD are reduced.

The area and CPD of S-box S1 when directly synthesising the equations using DC compiler are equal to 291 GE and 2.59 ns, respectively.

Fig. 10 Proposed structures of
(a) S-box S1, (b) Squaring over $F_{2^4}$, (c) Multiplication over $F_{2^4}$, (d) Multiplication by constant $\lambda$, (e) Inversion over $F_{2^4}$

Fig. 11 Proposed structures for generating the constant values
(a) $CON_i^{(k)}$, (b) Multiplication by $(\mathtt{0x0002})^{-1}$ over $F_{2^{16}}$

3.2 Proposed hardware structures of key schedules of CLEFIA block cipher

Fig. 11a shows the proposed structure for generation of the constant values. In this structure, four constant values are generated in each clock cycle (CC). The initial value $IV^{(k)}$ is loaded into Reg1 and $CON_0^{(k)}$ to $CON_3^{(k)}$ are generated in the first CC, while the LD signal is equal to ‘1’. In the other CCs, the LD signal is equal to ‘0’ and the next constant values are computed. The constant values for 128, 192 and 256 bit keys are generated by the proposed structure in 15, 21 and 23 CCs, respectively. For $A, B \in F_{2^{16}}$, the multiplication by $(\mathtt{0x0002})^{-1}$ (or $z^{-1}$), i.e. $B = z^{-1}A \bmod f_3(z)$, is given by [20, 21]

$$B = [a_0, (a_{15} \oplus a_0), a_{14}, (a_{13} \oplus a_0), a_{12}, (a_{11} \oplus a_0), a_{10}, a_9, a_8, a_7, a_6, (a_5 \oplus a_0), (a_4 \oplus a_0), a_3, a_2, a_1]. \quad (15)$$

As seen from Fig. 11b, the multiplication by $(\mathtt{0x0002})^{-1}$ is implemented using five XOR gates.

The proposed structures for generation of round keys for 128 and 192/256 bit main keys are shown in Figs. 12a and b, respectively. A multiplexer with control signal Ct and a D flip-flop are used to implement the two conditions i odd and i even in

Fig. 12 Proposed structures for generating the round keys for cases
(a) 128 bit, (b) 192 and 256 bit main keys
Table 3 Operations and control signals of the proposed structure for 192 bit round key generation

| Control signals St1, St2, St3, St4, St5, en0, en1 | Operations |
| --- | --- |
| 0100001 | $L_L \oplus (CON_{40}^{(192)} \parallel CON_{41}^{(192)} \parallel CON_{42}^{(192)} \parallel CON_{43}^{(192)})$ |
| 0001100 | $\Sigma(L_L) \oplus K_R \oplus (CON_{44}^{(192)} \parallel CON_{45}^{(192)} \parallel CON_{46}^{(192)} \parallel CON_{47}^{(192)})$ |
| 1010010 | $L_R \oplus (CON_{48}^{(192)} \parallel CON_{49}^{(192)} \parallel CON_{50}^{(192)} \parallel CON_{51}^{(192)})$ |
| 0010100 | $\Sigma(L_R) \oplus K_L \oplus (CON_{52}^{(192)} \parallel CON_{53}^{(192)} \parallel CON_{54}^{(192)} \parallel CON_{55}^{(192)})$ |
| 0000001 | $\Sigma^2(L_L) \oplus (CON_{56}^{(192)} \parallel CON_{57}^{(192)} \parallel CON_{58}^{(192)} \parallel CON_{59}^{(192)})$ |
| 0001101 | $\Sigma^3(L_L) \oplus K_R \oplus (CON_{60}^{(192)} \parallel CON_{61}^{(192)} \parallel CON_{62}^{(192)} \parallel CON_{63}^{(192)})$ |
| 0010010 | $\Sigma^2(L_R) \oplus (CON_{64}^{(192)} \parallel CON_{65}^{(192)} \parallel CON_{66}^{(192)} \parallel CON_{67}^{(192)})$ |
| 0010110 | $\Sigma^3(L_R) \oplus K_L \oplus (CON_{68}^{(192)} \parallel CON_{69}^{(192)} \parallel CON_{70}^{(192)} \parallel CON_{71}^{(192)})$ |
| 0000001 | $\Sigma^4(L_L) \oplus (CON_{72}^{(192)} \parallel CON_{73}^{(192)} \parallel CON_{74}^{(192)} \parallel CON_{75}^{(192)})$ |
| 0001101 | $\Sigma^5(L_L) \oplus K_R \oplus (CON_{76}^{(192)} \parallel CON_{77}^{(192)} \parallel CON_{78}^{(192)} \parallel CON_{79}^{(192)})$ |
| 0010010 | $\Sigma^4(L_R) \oplus (CON_{80}^{(192)} \parallel CON_{81}^{(192)} \parallel CON_{82}^{(192)} \parallel CON_{83}^{(192)})$ |

Algorithms 6 and 7 (line 6 in Algorithm 6 and lines 13 and 19 in Algorithm 7). If i is even, the zero input of the multiplexer is selected, and if i is odd, the one input of the multiplexer is selected. Therefore, in these structures, the output of the D flip-flop is equal to ‘0’ for even-numbered loop iterations and ‘1’ for odd-numbered ones. Table 3 shows the operations and control signals of the proposed structure for 192 bit round key generation (Fig. 12b). For example, in the first CC, the term $L_L \oplus (CON_{40}^{(192)} \parallel CON_{41}^{(192)} \parallel CON_{42}^{(192)} \parallel CON_{43}^{(192)})$ is computed. In this case, the control signals St2 and en1 are equal to ‘1’ and the other control signals are set to ‘0’. The subsequent CCs and computation terms are implemented by the proposed structure according to Table 3.
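The sequence of operations in Table 3 can be reproduced in software with a short loop. The sketch below is behavioural only and does not model the control signals or the multiplexers. It assumes that each entry of `con` is one 128 bit value formed from four consecutive 32 bit constants ($CON_{40}^{(192)}$ to $CON_{83}^{(192)}$), and it uses the DoubleSwap slices X[120:64], X[6:0], X[127:121] and X[63:7] of the public CLEFIA specification. The function names are hypothetical.

```python
def double_swap(x: int) -> int:
    """DoubleSwap of a 128 bit value (bit 127 = most significant)."""
    def bits(hi: int, lo: int) -> int:
        return (x >> lo) & ((1 << (hi - lo + 1)) - 1)
    return (bits(120, 64) << 71) | (bits(6, 0) << 64) | (bits(127, 121) << 57) | bits(63, 7)


def round_keys_192(ll: int, lr: int, kl: int, kr: int, con: list) -> list:
    """Return the eleven 128 bit round key groups listed in Table 3."""
    if len(con) != 11:
        raise ValueError("expected eleven 128 bit constant groups")
    out = []
    for i in range(11):
        if i % 4 in (0, 1):
            t = ll ^ con[i]
            ll = double_swap(ll)   # L_L is updated after each use
            if i % 2 == 1:
                t ^= kr            # odd iteration: add K_R
        else:
            t = lr ^ con[i]
            lr = double_swap(lr)   # L_R is updated after each use
            if i % 2 == 1:
                t ^= kl            # odd iteration: add K_L
        out.append(t)
    return out
```

Each group returned by `round_keys_192` corresponds to one row of Table 3 and supplies four 32 bit round keys, so eleven groups give the 44 round keys needed for the 22 rounds of the 192 bit configuration.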

Fig. 13 Proposed structures of CLEFIA block cipher for
(a) 128 bit key, (b) 192 bit key, (c) 256 bit key

3.3 Hardware structure of the CLEFIA block cipher

In the CLEFIA block cipher, the $GFN_{n,r}$ function is used for generating L (the first part) and for encryption processing (the second part) at two separate times. The proposed structure of the CLEFIA block cipher for the 128 bit key is shown in Fig. 13a. Also, the timing diagram of the proposed structure for the 128 bit key is presented in Fig. 14a. In the first part of the computations, the control signals Se0 and Se1 are set to ‘0’ and ‘1’, respectively. The constant values $CON_i$, $CON_{i+1}$, $CON_{i+2}$ and $CON_{i+3}$ and the main key K are then applied to the structure. In this case, the structure is configured to compute the 32 bit parts ($L_0$, $L_1$, $L_2$ and $L_3$) of L from the applied constant values and the main key. The $GFN_{4,20}$ block (for the 128 bit key) computes two rounds in each CC. Therefore, generating L requires ten CCs. After the computation of L, the control signals Se0 and Se1 are set to ‘1’ and ‘0’, respectively, and the structure is configured to compute the ciphertext $C_0$–$C_3$ in the encryption process. Also, the round key generator block generates $RK_i$, $RK_{i+1}$, $RK_{i+2}$ and $RK_{i+3}$ for the encryption part from the two inputs L and K. In this structure, the data processing and the generation of round keys start concurrently. The proposed structures for 192 and 256 bit keys are shown in Figs. 13b and c, respectively. The functionality of the proposed structures for the 192 and 256 bit keys is similar to that of the 128 bit structure.

Fig. 14 Timing diagrams of the proposed structures for cases
(a) 128 bit key, (b) Flexible structure configured for 256 bit key

4 Proposed flexible structures for CLEFIA block cipher

In this section, we present a flexible structure for implementation of the CLEFIA block cipher. As mentioned before, the CLEFIA block cipher has several key sizes. The flexible architecture is for
the realisation of the CLEFIA block cipher with different security levels and security configuration management. Here, key sizes of 128, 192 and 256 bit are supported by the proposed structure. The proposed flexible structure for implementation of the CLEFIA block cipher is shown in Fig. 15. The width of the registers and multiplexers is 32 bit. The structures are configured based on five control signals Sr0, Sr1, Sr2 and Sel128_192_256. The control signal Sel128_192_256 is used to select the initial constant value $IV^{(k)}$ for the related key size configuration. The switch between generating L from the main key and the encryption process (round keys are generated concurrently with the encryption process) is controlled by the signal Sr0: if Sr0 = 0, the structure is configured for generating L from the main key, and if Sr0 = 1, the encryption process is started. The timing diagram of the proposed flexible structure configured for a 256 bit key is presented in Fig. 14b. For both parts, generating L from the main key and the encryption process, the control signal Sr1 is equal to ‘1’ in the first CC and ‘0’ in the other CCs. The control signal Sr2 selects between the 192 and 256 bit keys (for Sr2 = ‘1’, the 192 bit key is selected; otherwise, the 256 bit key is selected for computations). The hardware consumption and CPD of the proposed flexible structure of the CLEFIA block cipher are presented in the following section.

Fig. 15 Proposed flexible structure of the CLEFIA block cipher

5 Results and comparison

The ASIC implementation results of the proposed structures are compared with other ASIC implementations of the CLEFIA block cipher in this section. The ASIC results are obtained using the Synopsys Design Compiler tool based on the library of standard cells with 180 nm CMOS technology. The area is usually measured in μm² and GE, where one GE is equivalent to the area of a 2-input NAND gate with the lowest driving strength of the corresponding technology. In other words, this metric represents the consumed area normalised to the area of one 2-input NAND gate. The performance of the designs is evaluated in terms of CPD, area, computation time, throughput and throughput/area.

The results of the proposed works are obtained for different key sizes (128, 192 and 256 bit). Table 4 shows the results of the proposed implementations and other related works for the CLEFIA cipher. The area and critical path values are taken from post-synthesis results (after performing place and route). In this table, the configurations (128-128, 128-192 and 128-256) are separated to allow comparison between the existing designs and the proposed ones. The optimisation in works [6, 9, 10] is based on area and speed. In [9], five high-performance hardware architectures for the 128 bit CLEFIA block cipher are presented. In [10], both fault diagnosis schemes and original structures for CLEFIA are proposed. Akishita and Hiwatari [8] proposed very compact hardware implementations of CLEFIA with 128 bit keys based on 8 bit shift registers. The implementations are based on serialised architectures in the data processing part.

The employed 180 nm ASIC technology is less recent than the 90 and 65 nm ones used in the competing designs. To ensure a fair and clear comparison, for different technologies (90 and 65 nm), the parameters CPD, time,

Table 4 Results of the proposed implementations and other related works on CLEFIA cipher

| Works | Technology, nm | Area (GE) | Number of CCs | CPD (Norm.), ns | Time (Norm.), ns | Thr. (Norm.), Mbps | Thr./area (Norm.), Mbps/GE |
|---|---|---|---|---|---|---|---|
| [6], type-A, Table, A, 128-128 | 90 | 10,380 | 18 | 2.76 (8.52) | 49.68 (153.36) | 2576.49 (834.64) | 0.248 (0.080) |
| [6], type-A, Table, S, 128-128 | 90 | 11,160 | 18 | 1.75 (5.40) | 31.50 (97.2) | 4063.49 (1316.87) | 0.364 (0.118) |
| [6], type-A, Com., A, 128-128 | 90 | 6050 | 18 | 4.76 (14.70) | 85.68 (264.6) | 1493.93 (483.75) | 0.247 (0.079) |
| [6], type-A, Com., S, 128-128 | 90 | 9330 | 18 | 1.90 (5.87) | 34.20 (105.66) | 3742.69 (1211.43) | 0.401 (0.129) |
| [6], type-A, T-box, A, 128-128 | 90 | 13,830 | 18 | 1.77 (5.46) | 31.86 (98.28) | 4017.58 (1302.40) | 0.291 (0.094) |
| [6], type-A, T-box, S, 128-128 | 90 | 21,070 | 18 | 1.34 (4.14) | 24.12 (74.52) | 5306.80 (1717.66) | 0.252 (0.082) |
| [6], type-B, Table, A, 128-128 | 90 | 8100 | 36 | 2.77 (8.55) | 99.72 (307.8) | 1283.59 (415.85) | 0.159 (0.051) |
| [6], type-B, Table, S, 128-128 | 90 | 12,250 | 36 | 1.93 (5.96) | 69.48 (214.56) | 1842.26 (596.57) | 0.150 (0.049) |
| [6], type-B, Com., A, 128-128 | 90 | 5490 | 36 | 6.75 (20.84) | 243 (750.24) | 526.75 (170.61) | 0.096 (0.031) |
| [6], type-B, Com., S, 128-128 | 90 | 6900 | 36 | 2.77 (8.55) | 99.72 (307.8) | 1283.59 (415.85) | 0.186 (0.025) |
| [9], area, 128-128 | 90 | 5979 | 18 | 4.43 (13.68) | 79.7 (246.24) | 1605.94 (519.82) | 0.269 (0.087) |
| [9], S, 128-128 | 90 | 12,009 | 13 | 3.37 (10.40) | 42.62 (135.2) | 3003 (946.75) | 0.250 (0.079) |
| [9], A, 128-128 | 90 | 4950 | 36 | 4.97 (15.34) | 178.85 (552.24) | 715.69 (231.78) | 0.145 (0.047) |
| [9], S, 128-128 | 90 | 9377 | 36 | 2.57 (7.93) | 92.41 (285.48) | 1385.10 (448.37) | 0.148 (0.048) |
| [10], S, 128-128 | 65 | 9423 | 18 | 1.65 (6.81) | 29.63 (122.58) | 4310 (1044.22) | 0.457 (0.111) |
| this work, 128-128 | 180 | 8667 | 15 | 3.08 | 46.2 | 2770.56 | 0.320 |
| [9], A, 128-192 | 90 | 8536 | 22 | 4.84 (14.94) | 106.50 (328.68) | 1201.85 (389.44) | 0.141 (0.046) |
| [10], A, 128-192 | 65 | 14,388 | 22 | 1.88 (7.76) | 41.42 (170.72) | 3090 (749.77) | 0.215 (0.052) |
| this work, 128-192 | 180 | 14,303 | 21 | 3.41 | 71.61 | 1787.46 | 0.125 |
| [9], A, 128-256 | 90 | 8482 | 26 | 4.84 (14.94) | 125.87 (388.44) | 1016.95 (329.52) | 0.120 (0.039) |
| [10], A, 128-256 | 65 | 14,320 | 25 | 1.88 (7.75) | 45.86 (193.75) | 2620 (660.65) | 0.183 (0.046) |
| this work, 128-256 | 180 | 14,179 | 23 | 3.41 | 78.43 | 1632.03 | 0.115 |

CCs: clock cycles; CPD: critical path delay; Thr.: throughput; Norm.: normalised; Com.: compact; S: speed is improved; and A: area is improved.
The rows labelled ‘this work’ give the results of the proposed works.
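
The bracketed (Norm.) entries in Table 4 can be sanity-checked with a short Python sketch. It does not reproduce the scaling equations of [22, 23]; it only computes the ratio between the bracketed and raw CPD values for a few rows, which suggests a roughly constant delay factor per technology node. These factors are inferred here from the table itself and are an assumption, not values stated in the paper; the names used are hypothetical.

```python
# (raw CPD in ns, normalised CPD in ns) pairs copied from Table 4.
CPD_90NM = [(2.76, 8.52), (1.75, 5.40), (4.76, 14.70), (6.75, 20.84)]
CPD_65NM = [(1.65, 6.81), (1.88, 7.76)]


def implied_factors(pairs):
    """Return normalised/raw ratios, i.e. the delay scaling implied by the table."""
    return [norm / raw for raw, norm in pairs]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


factor_90 = mean(implied_factors(CPD_90NM))
factor_65 = mean(implied_factors(CPD_65NM))

# Delay and time are multiplied by the factor; throughput is divided by it.
raw_thr_mbps = 2576.49  # [6], type-A, Table, A (90 nm)
norm_thr_mbps = raw_thr_mbps / factor_90

print(f"implied 90 nm -> 180 nm delay factor: {factor_90:.3f}")
print(f"implied 65 nm -> 180 nm delay factor: {factor_65:.3f}")
print(f"normalised throughput: {norm_thr_mbps:.1f} Mbps (Table 4 lists 834.64)")
```

Under this reading, CPD and time grow by the implied factor when scaled to 180 nm, while throughput and throughput/area shrink by the same factor.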
Table 5 Results of the proposed implementations on flexible CLEFIA cipher

| Works | Technology | Area (GE) | Number of CCs (128/192/256) | CPD, ns | Time, ns (128/192/256) | Thr., Mbps (128/192/256) | Thr./area, Mbps/GE (128/192/256) |
|---|---|---|---|---|---|---|---|
| this work | 180 nm | 14,951 | (15/21/23) | 4.045 | (60.675/84.945/93.035) | (2109.6/1506.86/1375.823) | (0.141/0.101/0.092) |
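
The derived columns of Tables 4 and 5 appear to follow directly from the clock-cycle count, the CPD and the area. The Python sketch below is an illustration only, assuming time = CCs × CPD, throughput = 128 bit block / time, and throughput/area = throughput / area. These relations are inferred from the tabulated numbers rather than quoted from the paper, and the function and variable names are hypothetical.

```python
BLOCK_BITS = 128  # CLEFIA block size in bits


def derived_metrics(cycles, cpd_ns, area_ge):
    """Return (time in ns, throughput in Mbps, throughput/area in Mbps/GE)."""
    time_ns = cycles * cpd_ns
    throughput_mbps = BLOCK_BITS / time_ns * 1000.0  # bits/ns (Gbps) -> Mbps
    return time_ns, throughput_mbps, throughput_mbps / area_ge


# (clock cycles, CPD in ns, area in GE) for the proposed designs.
DESIGNS = {
    "Table 4, 128-128": (15, 3.08, 8667),
    "Table 4, 128-192": (21, 3.41, 14303),
    "Table 4, 128-256": (23, 3.41, 14179),
    "Table 5, flexible, 128 bit key": (15, 4.045, 14951),
    "Table 5, flexible, 192 bit key": (21, 4.045, 14951),
    "Table 5, flexible, 256 bit key": (23, 4.045, 14951),
}

for name, params in DESIGNS.items():
    time_ns, thr, thr_per_ge = derived_metrics(*params)
    print(f"{name}: {time_ns:.3f} ns, {thr:.2f} Mbps, {thr_per_ge:.3f} Mbps/GE")
```

With these assumptions the computed values agree with the tabulated ones to within rounding, which also shows why the flexible structure, with its longer CPD of 4.045 ns, has lower throughput than the fixed-key designs at the same cycle counts.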
throughput and throughput/area (Table 4) of other works are normalised (scaled) to 180 nm technology based on the methods presented in [22, 23]. The proposed structures achieve better timing based on the normalised results. As given in this table, the proposed structures consume acceptable hardware resources with reasonable timing characteristics compared with the other architectures. We also get improvements in terms of area, throughput and throughput/area.

Results of the proposed implementation of the flexible CLEFIA cipher are shown in Table 5. This structure supports three main key sizes of 128, 192 and 256 bit. As given in Table 5, the proposed structure consumes an acceptable area with low CPDs.

6 Conclusion

Efficient and flexible ASIC implementations of the lightweight CLEFIA block cipher are presented. The most complex blocks in the CLEFIA algorithm are the substitution boxes (S0 and S1). The S0 S-box is implemented based on area-optimised combinational logic circuits by further simplification of the algebraic structure of the S-boxes. The S-box S1 is implemented based on a field inversion over $F_{2^8}$ and two affine transformations over $F_2$. The proposed inversion structure is designed over the composite field $F_{(2^4)^2}$, which is an important factor for the reduction of area consumption. In addition, we proposed a flexible structure that can perform various configurations of the CLEFIA cipher to support the variable key sizes 128, 192 and 256 bit. This architecture provides a versatile implementation that supports different security levels using a variable key size. Implementation results of the proposed architectures in 180 nm CMOS technology for different key sizes are achieved. The results show improvements in terms of area, throughput and throughput/area compared with other existing works.

7 References

[1] Hatzivasilis, G., Fysarakis, K., Papaefstathiou, I., et al.: ‘A review of lightweight block ciphers’, J. Cryptogr. Eng., 2018, 8, (2), pp. 141–184
[2] Mohd, B.J., Hayajneh, T., Vasilakos, A.V.: ‘A survey on lightweight block ciphers for low-resource devices: comparative study and open issues’, J. Cryptogr. Eng., 2015, 58, pp. 73–93
[3] Kitsos, P., Sklavos, N., Parousi, M., et al.: ‘A comparative study of hardware architectures for lightweight block ciphers’, J. Cryptogr. Eng., 2012, 38, pp. 148–160
[4] Rezaeian Farashahi, R., Rashidi, B., Sayedi, S.M.: ‘FPGA based fast and high-throughput 2-slow retiming 128-bit AES encryption algorithm’, Microelectron. J., 2014, 45, pp. 1014–1025
[5] The 128 bit block cipher CLEFIA: algorithm specification, Sony Corporation, Version 1, 2010

[6] Shirai, T., Shibutani, K., Akishita, T., et al.: ‘The 128-bit block cipher CLEFIA (extended abstract)’. Proc. Int. Workshop on Fast Software Encryption, Luxembourg, 2007 (LNCS, 4593), pp. 181–195
[7] CLEFIA standardization in ISO/IEC 29192-2. Available at http://www.sony.net/Products/cryptography/clefia/standard/iso.html, accessed in November 2012
[8] Akishita, T., Hiwatari, H.: ‘Very compact hardware implementations of the block cipher CLEFIA’. Proc. Int. Workshop on Selected Areas in Cryptography, Toronto, Canada, 2012 (LNCS, 7118), pp. 278–292
[9] Sugawara, T., Homma, N., Aoki, T., et al.: ‘High-performance ASIC implementations of the 128-bit block cipher CLEFIA’. Proc. Int. Workshop on Selected Areas in Cryptography, Seattle, WA, USA, 2008 (LNCS, 7118), pp. 2925–2928
[10] Mozaffari-Kermani, M., Azarderakhsh, R.: ‘Efficient fault diagnosis schemes for reliable lightweight cryptographic ISO/IEC standard CLEFIA benchmarked on ASIC and FPGA’, IEEE Trans. Ind. Electron., 2013, 60, (12), pp. 5925–5932
[11] Chaves, R.: ‘Embedded systems design with FPGA: compact CLEFIA implementation on FPGAs’ (Springer, New York, 2013, 1st edn.), pp. 225–243
[12] Resende, J.C., Chaves, R.: ‘Dual CLEFIA/AES cipher core on FPGA’. Proc. Int. Symp. Applied Reconfigurable Computing, Bochum, Germany, 2015 (LNCS, 9040), pp. 229–240
[13] Kryjak, T., Gorgon, M.: ‘Pipeline implementation of the 128-bit block cipher CLEFIA in FPGA’. Proc. Int. Conf. Field Programmable Logic and Applications, Prague, Czech Republic, 2009, pp. 373–378
[14] Suryawanshi, V.A., Manna, G.C., Dorale, S.S.: ‘Compact and high-speed hardware implementation of the block – cipher clefia’, Int. J. Comput. Appl., 2016, 133, (8), pp. 17–20
[15] Proenca, T., Chaves, R.: ‘Compact CLEFIA implementation on FPGAS’. Proc. 21st Int. Conf. Field Programmable Logic and Applications, Chania, Greece, 2012, pp. 512–517
[16] Bittencourt, J.C., Resende, J.C., Oliveira, W.L., et al.: ‘CLEFIA implementation with full key expansion’. Proc. Euromicro Conf. Digital System Design, Funchal, Portugal, 2015, pp. 555–558
[17] Hanley, N., Neill, M.: ‘Hardware comparison of the ISO/IEC 29192-2 block ciphers’. Proc. Computer Society Annual Symp. VLSI, Amherst, MA, USA, 2012, pp. 57–62
[18] Rudra, A., Dubey, P.K., Jutla, C.S., et al.: ‘Efficient Rijndael encryption implementation with composite field arithmetic’. Proc. Cryptographic Hardware and Embedded Systems (CHES), Paris, France, 2001 (LNCS, 2162), pp. 171–184
[19] Paar, C.: ‘Efficient VLSI architectures for bit-parallel computation in Galois fields’. PhD thesis, Institute for Experimental Mathematics, University of Essen, Essen, Germany, June 1994
[20] Rashidi, B., Sayedi, S.M., Rezaeian Farashahi, R.: ‘Efficient implementation of bit-parallel fault-tolerant polynomial basis multiplication and squaring over GF(2^m)’, IET Comput. Digit. Tech., 2016, 10, (1), pp. 18–29
[21] Rashidi, B., Rezaeian Farashahi, R., Sayedi, S.M.: ‘Fast and pipelined bit-parallel Montgomery multiplication and squaring over GF(2^m)’. Proc. 12th Int. Iranian Society of Cryptology Information Security and Cryptology (ISCISC), Gillan, Iran, 2015, pp. 17–22
[22] Stillmaker, A., Baas, B.: ‘Scaling equations for the accurate prediction of CMOS device performance from 180 to 7 nm’, Integr. VLSI J., 2017, 58, pp. 74–81
[23] Wong, H.-S., Frank, D.J., Solomon, P., et al.: ‘Nanoscale CMOS’, Proc. IEEE, 1999, 87, (4), pp. 537–570


IET Comput. Digit. Tech., 2020, Vol. 14 Iss. 2, pp. 69-79
