Efficient and Flexible Hardware Structures of the 128 Bit CLEFIA Block Cipher (2019)
Research Article
Bahram Rashidi¹
¹Department of Electrical Engineering, Ayatollah Boroujerdi University, Boroujerd 69199-69411, Iran
E-mail: b.rashidi@abru.ac.ir
Abstract: In this study, high-throughput and flexible hardware implementations of the CLEFIA lightweight block cipher are presented. A unified processing element is designed and shared to implement the generalised Feistel network (GFN), which computes the round keys and performs the encryption at two separate times. The most complex blocks in the CLEFIA algorithm are the substitution boxes (S-boxes) S0 and S1. The S-box S0 is implemented with area-optimised combinational logic circuits; in the proposed S0 structure, the number of logic gates and the critical path delay (CPD) are reduced by simplifying the computation terms. The S-box S1 consists of three steps: a field inversion over $\mathbb{F}_{2^8}$ and two affine transformations over $\mathbb{F}_2$. The inversion is implemented over the composite field $\mathbb{F}_{(2^4)^2}$ instead of over $\mathbb{F}_{2^8}$, which is an important factor in reducing area consumption. In addition, we propose a flexible structure that supports the CLEFIA configurations with 128, 192 and 256 bit keys. The proposed architectures are implemented in 180 nm complementary metal–oxide–semiconductor (CMOS) technology for the different key sizes. The results show improvements in execution time, throughput and throughput/area compared with other related works.
2.1 F-functions
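Two F-functions, F0 and F1, are used in $GFN_{d,r}$. Algorithms 3 and 4 (see Figs. 3 and 4) show the computations of F0 and F1, respectively. As these algorithms show, F0 and F1 are built from two non-linear 8 bit S-boxes, S0 and S1, and two 4 × 4 matrices, M0 and M1. The two F-functions apply S0 and S1 in a different order and use different matrices (M0 for F0 and M1 for F1).
Fig. 3 Algorithm 3: F0 function computation
The matrices M0 and M1, whose entries are hexadecimal elements of $\mathbb{F}_{2^8}$, are defined as follows [5]:
$$M_0 = \begin{pmatrix} \mathtt{01} & \mathtt{02} & \mathtt{04} & \mathtt{06} \\ \mathtt{02} & \mathtt{01} & \mathtt{06} & \mathtt{04} \\ \mathtt{04} & \mathtt{06} & \mathtt{01} & \mathtt{02} \\ \mathtt{06} & \mathtt{04} & \mathtt{02} & \mathtt{01} \end{pmatrix}, \qquad M_1 = \begin{pmatrix} \mathtt{01} & \mathtt{08} & \mathtt{02} & \mathtt{0A} \\ \mathtt{08} & \mathtt{01} & \mathtt{0A} & \mathtt{02} \\ \mathtt{02} & \mathtt{0A} & \mathtt{01} & \mathtt{08} \\ \mathtt{0A} & \mathtt{02} & \mathtt{08} & \mathtt{01} \end{pmatrix}$$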
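As a rough illustration of how M0 and M1 act inside an F-function, the following Python sketch multiplies a 4-byte column by either matrix over $\mathbb{F}_{2^8}$. It assumes the field polynomial $z^8+z^4+z^3+z^2+1$ of the CLEFIA specification [5]; `gf_mul`, `mat_vec` and `f_function` are hypothetical helper names, and the S-boxes are passed in as callables because their 256-entry tables are not reproduced here.

```python
# Sketch only: GF(2^8) matrix step of a CLEFIA-style F-function.
# Assumption: the field is defined by z^8 + z^4 + z^3 + z^2 + 1 (0x11D), as in the CLEFIA specification.
POLY = 0x11D

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add, reducing modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

M0 = [[0x01, 0x02, 0x04, 0x06],
      [0x02, 0x01, 0x06, 0x04],
      [0x04, 0x06, 0x01, 0x02],
      [0x06, 0x04, 0x02, 0x01]]
M1 = [[0x01, 0x08, 0x02, 0x0A],
      [0x08, 0x01, 0x0A, 0x02],
      [0x02, 0x0A, 0x01, 0x08],
      [0x0A, 0x02, 0x08, 0x01]]

def mat_vec(m, v):
    """Matrix-vector product over GF(2^8); field addition is XOR."""
    out = []
    for row in m:
        acc = 0
        for coef, x in zip(row, v):
            acc ^= gf_mul(coef, x)
        out.append(acc)
    return out

def f_function(rk: int, x: int, sboxes, m) -> int:
    """Key addition, four S-box lookups, then the matrix m.
    `sboxes` is a hypothetical list of four byte -> byte callables."""
    t = (rk ^ x) & 0xFFFFFFFF
    t_bytes = [(t >> (24 - 8 * i)) & 0xFF for i in range(4)]  # T0 is the most significant byte
    y = mat_vec(m, [s(b) for s, b in zip(sboxes, t_bytes)])
    return (y[0] << 24) | (y[1] << 16) | (y[2] << 8) | y[3]
```

Following the specification, F0 would correspond to `f_function(rk, x, [S0, S1, S0, S1], M0)` and F1 to `f_function(rk, x, [S1, S0, S1, S0], M1)`. Both matrices are involutions (M·M = I), so the same hardware block serves encryption and decryption.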
The key scheduling for 128 bit keys and for 192/256 bit keys uses a 128 bit $GFN_{4,12}$ and a 256 bit $GFN_{8,10}$, respectively. For the 128 bit key scheduling, the twenty-four 32 bit constants $CON_i^{(128)}$ ($0 \le i < 24$) are applied as round keys and K = K3|K2|K1|K0 is applied as the input of $GFN_{4,12}$ to generate the intermediate key L. In the next step, K and L are used to generate the round keys $RK_j$ ($0 \le j < 36$) with the thirty-six 32 bit constants $CON_i^{(128)}$ ($24 \le i < 60$). Algorithm 5 (see Fig. 5) shows the 128 bit key scheduling.
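$GFN_{4,r}$ itself is not spelled out in this section, so the sketch below follows the four-branch structure of the CLEFIA specification [5]: in each round, F0 and F1 update words 1 and 3, the four words are rotated by one position, and the rotation of the last round is undone. The F-functions are parameters; the functions used in the check are toy placeholders, not CLEFIA's F0 and F1, and `gfn4`/`gfn4_inv` are illustrative names.

```python
# Sketch of the 4-branch generalised Feistel network GFN_{4,r} (structure as in the CLEFIA specification).
from typing import Callable, List, Sequence

F = Callable[[int, int], int]  # F(round_key, word) -> 32-bit word

def gfn4(f0: F, f1: F, rk: Sequence[int], x: Sequence[int], r: int) -> List[int]:
    t = list(x)
    for i in range(r):
        t[1] ^= f0(rk[2 * i], t[0])
        t[3] ^= f1(rk[2 * i + 1], t[2])
        t = t[1:] + t[:1]            # rotate the four words left by one
    return t[3:] + t[:3]             # undo the rotation of the last round

def gfn4_inv(f0: F, f1: F, rk: Sequence[int], y: Sequence[int], r: int) -> List[int]:
    t = list(y)
    for i in range(r):
        t[1] ^= f0(rk[2 * (r - i) - 2], t[0])  # round keys used in reverse order
        t[3] ^= f1(rk[2 * (r - i) - 1], t[2])
        t = t[3:] + t[:3]            # rotate the four words right by one
    return t[1:] + t[:1]             # undo the extra rotation
```

Because this is a Feistel structure, `gfn4_inv` undoes `gfn4` for any choice of F0 and F1, and one round needs only two F-function evaluations, which is what allows a single shared processing element.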
Fig. 5 Algorithm 5: key scheduling for 128 bit keys
In the key scheduling for 192 bit keys, the 128 bit values KL and KR are generated from the 192 bit main key K = K5|K4|K3|K2|K1|K0, where Ki (0 ≤ i ≤ 5) are 32 bit words. In this case, KL = K5|K4|K3|K2 and KR = K1|K0|K̄5|K̄4, where K̄i denotes the bitwise complement (NOT) of Ki. The constants $CON_i^{(192)}$ ($0 \le i < 40$), used as round keys, and KL|KR, used as a 256 bit input, are applied to $GFN_{8,10}$ to generate the two 128 bit values LL and LR. The key scheduling for a 256 bit key is similar to that for a 192 bit key; however, the number of round keys $RK_i$ and the initialisation of KR differ from the 192 bit case. Algorithm 6 (see Fig. 6) shows the key scheduling for 192 and 256 bit keys.
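A small sketch of how the two 128 bit halves KL and KR could be formed from the main key, using the word order above (K5 is the most significant word of a 192 bit key). The 256 bit branch (KL = K7|K6|K5|K4, KR = K3|K2|K1|K0) follows the CLEFIA specification and is an assumption here, since this section only states that the initialisation of KR differs; `split_key` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF
MASK128 = (1 << 128) - 1

def split_key(key: int, key_bits: int):
    """Return (KL, KR) for a 192 or 256 bit main key given as an integer."""
    if key_bits == 192:
        words = [(key >> (32 * i)) & MASK32 for i in range(6)]  # words[i] = Ki, K0 least significant
        k0, k1, k4, k5 = words[0], words[1], words[4], words[5]
        kl = key >> 64                                          # K5|K4|K3|K2
        kr = (k1 << 96) | (k0 << 64) | ((~k5 & MASK32) << 32) | (~k4 & MASK32)  # K1|K0|~K5|~K4
        return kl, kr
    if key_bits == 256:
        # Assumed from the CLEFIA specification: both halves are taken directly from the key.
        return key >> 128, key & MASK128
    raise ValueError("key_bits must be 192 or 256")
```

In hardware the 192 bit case costs only 64 inverters for the complemented words; the rest is wiring.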
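In the key scheduling algorithms, we use the DoubleSwap function Σ, which for a 128 bit value X is defined as
$$\Sigma(X) = X[7\text{-}63] \,|\, X[121\text{-}127] \,|\, X[0\text{-}6] \,|\, X[64\text{-}120],$$
where X[a-b] denotes the bit string from bit a to bit b of X, bit 0 being the most significant bit, and | denotes concatenation.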
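DoubleSwap only rearranges bits, so in hardware it costs no logic gates, only wiring. The sketch below assumes, as in the CLEFIA specification, that bit 0 of X is the most significant bit; `bits` and `double_swap` are illustrative names.

```python
MASK128 = (1 << 128) - 1

def bits(x: int, a: int, b: int) -> int:
    """X[a-b] of a 128-bit X, where bit 0 is the most significant bit."""
    width = b - a + 1
    return (x >> (127 - b)) & ((1 << width) - 1)

def double_swap(x: int) -> int:
    """Sigma(X) = X[7-63] | X[121-127] | X[0-6] | X[64-120]  (57 + 7 + 7 + 57 bits)."""
    x &= MASK128
    return ((bits(x, 7, 63) << 71)
            | (bits(x, 121, 127) << 64)
            | (bits(x, 0, 6) << 57)
            | bits(x, 64, 120))
```

For example, the most significant input bit X[0] ends up at bit position 64 (counted from the MSB) of Σ(X).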
Table 2 Proposed simplifications of the four S-boxes SS0, SS1, SS2 and SS3
(Inputs are x3x2x1x0 with x3 the most significant bit; SSk,j is output bit j of SSk; ′ denotes complement, + OR, ⊕ XOR and ⊙ XNOR.)
Output | After simplification by Karnaugh mapping | Further simplification of algebraic terms
SS0,3 | x3′x1′x0′ + x3′x1 x0 + x2′x1′x0′ + x3 x2 x1′x0 + x3 x2 x1 x0′ + x3′x2′x0′ | x3 x2(x1 ⊕ x0) + x3′(x1 ⊙ x0) + x2′x0′(x3′ + x1′)
SS0,2 | x3′x2 x0 + x3 x2 x0′ + x3′x2′x1′ + x2′x1 x0′ | x2(x3 ⊕ x0) + x2′(x3′x1′ + x1 x0′)
SS0,1 | x3′x0 + x3′x2 x1 + x2′x1′x0′ + x2 x1 x0 | x3′(x0 + x2 x1) + x2′x1′x0′ + x2 x1 x0
SS0,0 | x2 x0 + x3 x1′ + x3 x2 | x2(x0 + x3) + x3 x1′
SS1,3 | x2′x1 x0 + x3′x2 x1′x0 + x3′x2 x1 x0′ + x3 x2′ + x3 x1′x0′ | x3′x2(x1 ⊕ x0) + x2′(x1 x0 + x3) + x3 x1′x0′
SS1,2 | x3′x2′x1′ + x2′x0 + x3 x1′x0 + x3 x1 x0′ | x3(x1 ⊕ x0) + x2′(x3′x1′ + x0)
SS1,1 | x3′x1′x0′ + x3′x2 + x3 x2′x1 + x2 x1′x0 | x3′(x1′x0′ + x2) + x3 x2′x1 + x2 x1′x0
SS1,0 | x1 x0 + x2 x0 + x3 x2′x1′x0′ + x3 x2 x1 | x0(x1 + x2) + x3(x2′x1′x0′ + x2 x1)
SS2,3 | x3′x1′x0′ + x2′x1′x0′ + x3 x2 x1 + x3′x2′x0 + x3′x1 x0 | x3′(x1 ⊙ x0) + x2′(x1′x0′ + x3′x0) + x3 x2 x1
SS2,2 | x3′x1 + x3′x2 x0 + x3 x2′x1′ + x2 x1 x0′ | x3′(x1 + x2 x0) + x3 x2′x1′ + x2 x1 x0′
SS2,1 | x2′x1 x0 + x3′x2 x1′ + x3 x2′ + x3′x1′x0′ | x2′(x1 x0 + x3) + x3′x1′(x2 + x0′)
SS2,0 | x3′x2′x0′ + x3 x2′x1′ + x3 x1 x0 + x3 x2 x0′ | x3(x2′x1′ + x1 x0 + x2 x0′) + x3′x2′x0′
SS3,3 | x3′x2′x1′x0′ + x3′x1 x0 + x3 x2 x1′ + x3 x2′x1 + x3 x1 x0′ | x3′(x2′x1′x0′ + x1 x0) + x3((x2 ⊕ x1) + x1 x0′)
SS3,2 | x3′x1 + x3 x1′x0 + x2 x1 x0′ + x3′x2 x0 | x0(x3′x2 + x3 x1′) + x1(x3′ + x0′x2)
SS3,1 | x3′x2′x0′ + x3′x2 x1 x0 + x2′x1′x0 + x2 x1′x0′ + x3 x1′x0 | x1′x0(x2′ + x3) + x2(x3′x1 x0 + x1′x0′) + x3′x2′x0′
SS3,0 | x2′x1 x0 + x3′x2 x0′ + x3 x0 + x2 x1′x0′ | x0′x2(x3′ + x1′) + x0(x3 + x2′x1)
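The simplified expressions in the right-hand column of Table 2 can be checked exhaustively against the 4 bit S-box tables. The sketch below assumes the SS0 to SS3 tables of the CLEFIA specification [5] (they are not reproduced in this section), with input x3x2x1x0 where x3 is the most significant bit; `ss_logic` is a hypothetical helper that evaluates the Table 2 expressions.

```python
# 4-bit S-boxes SS0..SS3, assumed from the CLEFIA specification; index = input x3x2x1x0.
SS = [
    [0xE, 0x6, 0xC, 0xA, 0x8, 0x7, 0x2, 0xF, 0xB, 0x1, 0x4, 0x0, 0x5, 0x9, 0xD, 0x3],
    [0x6, 0x4, 0x0, 0xD, 0x2, 0xB, 0xA, 0x3, 0x9, 0xC, 0xE, 0xF, 0x8, 0x7, 0x5, 0x1],
    [0xB, 0x8, 0x5, 0xE, 0xA, 0x6, 0x4, 0xC, 0xF, 0x7, 0x2, 0x3, 0x1, 0x0, 0xD, 0x9],
    [0xA, 0x2, 0x6, 0xD, 0x3, 0x4, 0x5, 0xE, 0x0, 0x7, 0x8, 0x9, 0xB, 0xF, 0xC, 0x1],
]

def xnor(a: int, b: int) -> int:
    return 1 ^ a ^ b

def ss_logic(k: int, x: int) -> int:
    """Evaluate the 'further simplification' column of Table 2 for S-box SSk."""
    x0, x1, x2, x3 = x & 1, (x >> 1) & 1, (x >> 2) & 1, (x >> 3) & 1
    n0, n1, n2, n3 = 1 - x0, 1 - x1, 1 - x2, 1 - x3   # complemented inputs
    if k == 0:
        b3 = (x3 & x2 & (x1 ^ x0)) | (n3 & xnor(x1, x0)) | (n2 & n0 & (n3 | n1))
        b2 = (x2 & (x3 ^ x0)) | (n2 & ((n3 & n1) | (x1 & n0)))
        b1 = (n3 & (x0 | (x2 & x1))) | (n2 & n1 & n0) | (x2 & x1 & x0)
        b0 = (x2 & (x0 | x3)) | (x3 & n1)
    elif k == 1:
        b3 = (n3 & x2 & (x1 ^ x0)) | (n2 & ((x1 & x0) | x3)) | (x3 & n1 & n0)
        b2 = (x3 & (x1 ^ x0)) | (n2 & ((n3 & n1) | x0))
        b1 = (n3 & ((n1 & n0) | x2)) | (x3 & n2 & x1) | (x2 & n1 & x0)
        b0 = (x0 & (x1 | x2)) | (x3 & ((n2 & n1 & n0) | (x2 & x1)))
    elif k == 2:
        b3 = (n3 & xnor(x1, x0)) | (n2 & ((n1 & n0) | (n3 & x0))) | (x3 & x2 & x1)
        b2 = (n3 & (x1 | (x2 & x0))) | (x3 & n2 & n1) | (x2 & x1 & n0)
        b1 = (n2 & ((x1 & x0) | x3)) | (n3 & n1 & (x2 | n0))
        b0 = (x3 & ((n2 & n1) | (x1 & x0) | (x2 & n0))) | (n3 & n2 & n0)
    else:
        b3 = (n3 & ((n2 & n1 & n0) | (x1 & x0))) | (x3 & ((x2 ^ x1) | (x1 & n0)))
        b2 = (x0 & ((n3 & x2) | (x3 & n1))) | (x1 & (n3 | (n0 & x2)))
        b1 = (n1 & x0 & (n2 | x3)) | (x2 & ((n3 & x1 & x0) | (n1 & n0))) | (n3 & n2 & n0)
        b0 = (n0 & x2 & (n3 | n1)) | (x0 & (x3 | (n2 & x1)))
    return (b3 << 3) | (b2 << 2) | (b1 << 1) | b0

mismatches = [(k, x) for k in range(4) for x in range(16) if ss_logic(k, x) != SS[k][x]]
print("mismatches:", mismatches)
```

An exhaustive check of this kind is cheap (64 cases) and guards against sign or complement slips when the terms are hand-factored.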
Fig. 9 Proposed structures of 4 bit S-boxes: (a) SS0, (b) SS1, (c) SS2, (d) SS3
In the proposed optimised structures, the CPDs of the S-boxes SS0, SS1, SS2 and SS3 are reduced to $T_X + T_A + 2T_O$, $3T_X + 2T_O$, $2T_A + 3T_O$ and $T_X + T_A + 2T_O$, respectively, where $T_X$, $T_O$ and $T_A$ are the delays of a 2-input XOR gate, a 2-input OR gate and a 2-input AND gate, respectively.
The area and CPD of the S-box S0, obtained by directly synthesising the equations with the Synopsys Design Compiler tool and a 180 nm CMOS standard-cell library, are 209 gate equivalents (GE) and 0.964 ns, respectively.
3.1.2 Hardware structure of the S-box S1: As mentioned before, the S-box S1 consists of three steps: two affine transformations f and g over $\mathbb{F}_2$ and a field inversion over $\mathbb{F}_{2^8}$ [5]. Implementing the inversion over a composite field instead of over $\mathbb{F}_{2^8}$ is an important factor in reducing area consumption [18]. For the proposed implementation, we use the composite field $\mathbb{F}_{(2^4)^2}$ defined by the irreducible polynomials of [5].
In the following, $a_i, b_i \in \mathbb{F}_2$ denote the input and output bits of the mappings. The affine transformation f and the isomorphic mapping φ are merged into a single matrix operation φ∘f:
$$\begin{pmatrix} b_7\\ b_6\\ b_5\\ b_4\\ b_3\\ b_2\\ b_1\\ b_0 \end{pmatrix} = \begin{pmatrix} 0&0&0&0&0&0&0&1\\ 0&0&0&0&0&0&1&0\\ 0&0&0&0&1&1&0&0\\ 0&0&0&1&0&0&1&0\\ 0&0&1&0&0&0&0&0\\ 0&1&0&0&0&0&0&0\\ 0&0&0&0&0&1&0&1\\ 1&0&0&1&0&0&0&0 \end{pmatrix} \begin{pmatrix} a_7\\ a_6\\ a_5\\ a_4\\ a_3\\ a_2\\ a_1\\ a_0 \end{pmatrix} \oplus \begin{pmatrix} 0\\ 1\\ 1\\ 0\\ 0\\ 1\\ 0\\ 1 \end{pmatrix}$$
Similarly, the inverse isomorphic mapping φ⁻¹ and the affine transformation g are merged into g∘φ⁻¹:
$$\begin{pmatrix} b_7\\ b_6\\ b_5\\ b_4\\ b_3\\ b_2\\ b_1\\ b_0 \end{pmatrix} = \begin{pmatrix} 0&0&0&0&0&0&1&0\\ 0&0&0&0&0&1&0&1\\ 0&0&0&1&1&0&0&0\\ 1&0&0&0&0&0&0&0\\ 0&1&0&0&0&0&1&0\\ 0&0&0&0&1&1&0&0\\ 0&0&1&1&0&0&0&0\\ 1&0&1&0&0&0&0&0 \end{pmatrix} \begin{pmatrix} a_7\\ a_6\\ a_5\\ a_4\\ a_3\\ a_2\\ a_1\\ a_0 \end{pmatrix} \oplus \begin{pmatrix} 0\\ 1\\ 1\\ 0\\ 1\\ 0\\ 0\\ 1 \end{pmatrix}$$
These matrix representations translate into the following computations for hardware implementation:
φ∘f: b7 = a0, b6 = a1′, b5 = (a3 ⊕ a2)′, b4 = a4 ⊕ a1, b3 = a5, b2 = a6′, b1 = a2 ⊕ a0, b0 = (a7 ⊕ a4)′
g∘φ⁻¹: see (1)
The hardware resources for implementing the merged mappings φ∘f and g∘φ⁻¹ are (two 2-input XOR gates, two 2-input XNOR gates and two NOT gates) and (two 2-input XOR gates and four 2-input XNOR gates), respectively.
In the following, we present the proposed hardware structure of the field inversion over the composite field $\mathbb{F}_{(2^4)^2}$. Let $n_1\beta + n_0$ be an arbitrary element of $\mathbb{F}_{(2^4)^2}$, where $n_1, n_0 \in \mathbb{F}_{2^4}$. The inverse $m_1\beta + m_0 = (n_1\beta + n_0)^{-1}$, with $m_1, m_0 \in \mathbb{F}_{2^4}$, can be calculated as
$$m_1 = n_1\Delta^{-1}, \qquad m_0 = (n_1 + n_0)\Delta^{-1}. \qquad (2)$$
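To cross-check the merged mappings φ∘f and g∘φ⁻¹ given earlier in this subsection, the sketch below encodes both matrices as row bit masks (rows b7 to b0, columns a7 to a0) together with their constant vectors. It confirms that the matrix form of φ∘f agrees with its bit-level equations for all 256 inputs and counts gates under a simple assumed rule: a row with two ones costs one 2-input XOR (or XNOR when its constant bit is 1), and a row with a single one costs a NOT only when its constant bit is 1. All names are illustrative.

```python
# Rows for b7..b0 as bit masks over a7..a0, plus the constant vector packed as b7..b0.
PHI_F = ([0x01, 0x02, 0x0C, 0x12, 0x20, 0x40, 0x05, 0x90], 0x65)
G_PHI_INV = ([0x02, 0x05, 0x18, 0x80, 0x42, 0x0C, 0x30, 0xA0], 0x69)

def apply_affine(matrix, a: int) -> int:
    """b = M * a XOR c over GF(2); each output bit is the parity of (row AND a)."""
    rows, const = matrix
    b = 0
    for row in rows:                                   # b7 is produced first
        b = (b << 1) | (bin(row & a).count("1") & 1)
    return b ^ const

def phi_f_bits(a: int) -> int:
    """Bit-level equations of phi∘f as listed in the text."""
    x = [(a >> i) & 1 for i in range(8)]
    b = [0] * 8
    b[7], b[6], b[5], b[4] = x[0], 1 ^ x[1], 1 ^ x[3] ^ x[2], x[4] ^ x[1]
    b[3], b[2], b[1], b[0] = x[5], 1 ^ x[6], x[2] ^ x[0], 1 ^ x[7] ^ x[4]
    return sum(bit << i for i, bit in enumerate(b))

def gate_count(matrix):
    """Assumed counting rule: two-input rows -> XOR/XNOR, single-input rows -> NOT if complemented."""
    rows, const = matrix
    xor = xnor = inv = 0
    for k, row in enumerate(rows):
        c = (const >> (7 - k)) & 1                     # constant bit of b_{7-k}
        ones = bin(row).count("1")
        if ones == 2:
            xnor += c
            xor += 1 - c
        elif ones == 1:
            inv += c
    return {"XOR": xor, "XNOR": xnor, "NOT": inv}

print(gate_count(PHI_F), gate_count(G_PHI_INV))
```

Under this rule the counts reproduce the figures stated in the text for both merged mappings, which is a quick sanity check on the reconstructed matrices.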
$$\begin{aligned} i_2 &= a_0(a_1 \oplus a_2) \oplus (a_2 \oplus a_3) \oplus a_0 a_3 (1 \oplus a_2) \\ &= a_0(a_1 \oplus a_2) \oplus a_2 \oplus a_3 \oplus a_0 a_3 a_2' \\ &= a_0(a_1 \oplus a_2) \oplus a_2 \oplus a_3 (1 \oplus a_0 a_2') \\ &= a_0(a_1 \oplus a_2) \oplus a_2 \oplus a_3 (a_0 a_2')' \end{aligned} \qquad (13)$$
$$\begin{aligned} i_3 &= a_0(1 \oplus a_2) \oplus a_1(1 \oplus a_2) \oplus a_2(1 \oplus a_0 a_1) \oplus a_3(1 \oplus a_1 a_2) \\ &= a_0 a_2' \oplus a_1 a_2' \oplus a_2 (a_0 a_1)' \oplus a_3 (a_1 a_2)' \\ &= a_2'(a_0 \oplus a_1) \oplus a_2 (a_0 a_1)' \oplus a_3 (a_1 a_2)' \end{aligned} \qquad (14)$$
Fig. 10e shows the proposed efficient hardware structure for field inversion over $\mathbb{F}_{2^4}$. The hardware consumption and CPD of the original field inversion expressions are (23 2-input XOR gates and 11 2-input AND gates) and $3T_X + 2T_A$, respectively. After further simplification of the terms $i_0$, $i_1$, $i_2$ and $i_3$, they become (11 2-input XOR gates, 9 2-input AND gates, 3 2-input NAND gates and 3 NOT gates) and $3T_X + T_A$, respectively. Therefore, the proposed structure reduces both the number of logic gates and the CPD.
The area and CPD of the S-box S1, obtained by directly synthesising the equations with Design Compiler, are 291 GE and 2.59 ns, respectively.
Fig. 11 Proposed structures for generating the constant values: (a) $CON_i^{(k)}$, (b) multiplication by $(\mathtt{0x0002})^{-1}$ over $\mathbb{F}_{2^{16}}$
3.2 Proposed hardware structures of key schedules of CLEFIA block cipher
Fig. 11a shows the proposed structure for generating the constant values. In this structure, four constant values are generated in each clock cycle (CC). The initial value $IV^{(k)}$ is loaded into Reg1, and $CON_0^{(k)}$ to $CON_3^{(k)}$ are generated in the first CC, while the LD signal is '1'. In the remaining CCs, LD is '0' and the next constant values are computed. For 128, 192 and 256 bit keys, the proposed structure generates the constant values in 15, 21 and 23 CCs, respectively. For $A, B \in \mathbb{F}_{2^{16}}$, the multiplication by $(\mathtt{0x0002})^{-1}$ (i.e. by $z^{-1}$), $B = z^{-1}A \bmod f_3(z)$, is computed as [20, 21]
$$B = [a_0,\ a_{15} \oplus a_0,\ a_{14},\ a_{13} \oplus a_0,\ a_{12},\ a_{11} \oplus a_0,\ a_{10},\ a_9,\ a_8,\ a_7,\ a_6,\ a_5 \oplus a_0,\ a_4 \oplus a_0,\ a_3,\ a_2,\ a_1]. \qquad (15)$$
As seen from Fig. 11b, the multiplication by $(\mathtt{0x0002})^{-1}$ is implemented with five XOR gates.
The proposed structures for generating the round keys for 128 and 192/256 bit main keys are shown in Figs. 12a and b, respectively. A multiplexer with control signal Ct and a D flip-flop are used to implement the two conditions, i odd and i even, in
Table 3 Operations and control signals of the proposed structure for 192 bit round key generation (CONi stands for CONi(192))
St1 St2 St3 St4 St5 en0 en1    Operation
0100001    LL ⊕ (CON40 | CON41 | CON42 | CON43)
0001100    Σ(LL) ⊕ KR ⊕ (CON44 | CON45 | CON46 | CON47)
1010010    LR ⊕ (CON48 | CON49 | CON50 | CON51)
0010100    Σ(LR) ⊕ KL ⊕ (CON52 | CON53 | CON54 | CON55)
0000001    Σ²(LL) ⊕ (CON56 | CON57 | CON58 | CON59)
0001101    Σ³(LL) ⊕ KR ⊕ (CON60 | CON61 | CON62 | CON63)
0010010    Σ²(LR) ⊕ (CON64 | CON65 | CON66 | CON67)
0010110    Σ³(LR) ⊕ KL ⊕ (CON68 | CON69 | CON70 | CON71)
0000001    Σ⁴(LL) ⊕ (CON72 | CON73 | CON74 | CON75)
0001101    Σ⁵(LL) ⊕ KR ⊕ (CON76 | CON77 | CON78 | CON79)
0010010    Σ⁴(LR) ⊕ (CON80 | CON81 | CON82 | CON83)
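Equation (15) gives the multiplication by $(\mathtt{0x0002})^{-1}$ used in the constant generator of Fig. 11, but $f_3(z)$ is not written out in this section. Reading the list in (15) from the most significant bit $b_{15}$ down to $b_0$, its XOR positions are consistent with $f_3(z) = z^{16} + z^{15} + z^{13} + z^{11} + z^5 + z^4 + 1$ (0x1A831), which the sketch below assumes. It compares a generic shift-and-reduce computation of $z^{-1}A$ with the bit list of (15) and checks it against multiplication by z; the function names are hypothetical.

```python
# Assumed reduction polynomial f3(z) = z^16 + z^15 + z^13 + z^11 + z^5 + z^4 + 1.
F3 = 0x1A831

def mul_z_inv(a: int) -> int:
    """B = z^-1 * A mod f3(z): if a0 = 1, add f3 first so that the value is divisible by z."""
    if a & 1:
        a ^= F3
    return a >> 1

def mul_z_inv_eq15(a: int) -> int:
    """Bit list of (15), most significant bit first; exactly five XORs, all with a0."""
    def bit(i: int) -> int:
        return (a >> i) & 1
    a0 = bit(0)
    msb_first = [a0, bit(15) ^ a0, bit(14), bit(13) ^ a0, bit(12), bit(11) ^ a0,
                 bit(10), bit(9), bit(8), bit(7), bit(6), bit(5) ^ a0, bit(4) ^ a0,
                 bit(3), bit(2), bit(1)]
    out = 0
    for b in msb_first:
        out = (out << 1) | b
    return out

def mul_z(a: int) -> int:
    """Multiplication by z (0x0002), used here only to check the inverse operation."""
    a <<= 1
    if a & 0x10000:
        a ^= F3
    return a
```

Since each step of the generator is a fixed wiring pattern plus five XORs, unrolling it several times per clock cycle stays cheap, which fits the four-constants-per-CC structure described above.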
the realisation of the CLEFIA block cipher with different security levels and security configuration management. Here, the proposed structure supports key sizes of 128, 192 and 256 bit. The proposed flexible structure for implementing the CLEFIA block cipher is shown in Fig. 15. The registers and multiplexers are 32 bits wide. The structure is configured by the control signals Sr0, Sr1, Sr2 and Sel128_192_256. The control signal Sel128_192_256 selects the initial constant value IV(k) for the chosen key size. Switching between generating L from the main key and the encryption process (during which the round keys are generated concurrently) is controlled by Sr0: if Sr0 = 0, the structure is configured to generate L from the main key, and if Sr0 = 1, the encryption process starts. The timing diagram of the proposed flexible structure configured for a 256 bit key is presented in Fig. 14b. In both phases (generating L from the main key and encryption), the control signal Sr1 is '1' in the first CC and '0' in all other CCs. The control signal Sr2 selects between the 192 and 256 bit key cases (if Sr2 = '1', the 192 bit key is selected; otherwise, the 256 bit key is selected). The hardware consumption and CPD of the proposed flexible structure of the CLEFIA block cipher are presented in the following section.
5 Results and comparison
In this section, the ASIC implementation results of the proposed structures are compared with those of other ASIC implementations of the CLEFIA block cipher. The ASIC results are obtained with the Synopsys Design Compiler tool using a 180 nm CMOS standard-cell library. Area is usually measured in μm² and in GE, where one GE equals the area of a 2-input NAND gate with the lowest driving strength in the corresponding technology; in other words, this metric is the consumed area normalised to the area of one 2-input NAND gate. The designs are evaluated in terms of CPD, area, computation time, throughput and throughput/area for the different key sizes (128, 192 and 256 bit). Table 4 shows the results of the proposed implementations and of other related CLEFIA implementations. The area and critical path values are obtained from post-synthesis results (after place and route). In this table, the configurations 128-128, 128-192 and 128-256 (block size-key size) are listed separately to allow comparison between the existing designs and the proposed ones. The works [6, 9, 10] optimise for both area and speed. In [9], five high-performance hardware architectures for the 128 bit CLEFIA block cipher are presented. In [10], both fault diagnosis schemes and the original structures for CLEFIA are proposed. Akishita and Hiwatari [8] proposed very compact hardware implementations of CLEFIA with 128 bit keys based on 8 bit shift registers; these implementations use serialised architectures in the data processing part.
The 180 nm ASIC technology employed here is older than the 90 and 65 nm technologies used in the competing designs. To ensure a fair comparison across technologies (90 and 65 nm), the parameters CPD, time,
throughput and throughput/area of the other works in Table 4 are normalised (scaled) to 180 nm technology using the methods presented in [22, 23]. Based on the normalised results, the proposed structures achieve better timing. As given in Table 4, the proposed structures consume acceptable hardware resources with reasonable timing characteristics compared with the other architectures, and they also improve area, throughput and throughput/area.
Results of the proposed flexible CLEFIA implementation are shown in Table 5. This structure supports three main key sizes: 128, 192 and 256 bit. As given in Table 5, the proposed structure consumes an acceptable area with low CPDs.
6 Conclusion
Efficient and flexible ASIC implementations of the lightweight CLEFIA block cipher are presented. The most complex blocks in the CLEFIA algorithm are the substitution boxes (S0 and S1). The S-box S0 is implemented with area-optimised combinational logic circuits obtained by further simplifying the algebraic structure of the S-boxes. The S-box S1 is implemented with a field inversion over $\mathbb{F}_{2^8}$ and two affine transformations over $\mathbb{F}_2$. The proposed inversion structure is designed over the composite field $\mathbb{F}_{(2^4)^2}$, which is an important factor in reducing area consumption. In addition, we proposed a flexible structure that supports the CLEFIA configurations with variable key sizes of 128, 192 and 256 bit. This architecture provides a versatile implementation that supports different security levels through a variable key size. The proposed architectures are implemented in 180 nm CMOS technology for the different key sizes, and the results show improvements in area, throughput and throughput/area compared with other existing works.
7 References
[1] Hatzivasilis, G., Fysarakis, K., Papaefstathiou, I., et al.: ‘A review of lightweight block ciphers’, J. Cryptogr. Eng., 2018, 8, (2), pp. 141–184
[2] Mohd, B.J., Hayajneh, T., Vasilakos, A.V.: ‘A survey on lightweight block ciphers for low-resource devices: comparative study and open issues’, J. Netw. Comput. Appl., 2015, 58, pp. 73–93
[3] Kitsos, P., Sklavos, N., Parousi, M., et al.: ‘A comparative study of hardware architectures for lightweight block ciphers’, Comput. Electr. Eng., 2012, 38, pp. 148–160
[4] Rezaeian Farashahi, R., Rashidi, B., Sayedi, S.M.: ‘FPGA based fast and high-throughput 2-slow retiming 128-bit AES encryption algorithm’, Microelectron. J., 2014, 45, pp. 1014–1025
[5] The 128 bit block cipher CLEFIA: algorithm specification, Sony Corporation, Version 1, 2010