On Efficiency Enhancement of SHA-3 For FPGA-Based Multimodal Biometric Authentication
Authorized licensed use limited to: PES University Bengaluru. Downloaded on March 27,2023 at 05:40:10 UTC from IEEE Xplore. Restrictions apply.
SRAVANI AND ANANIAH DURAI: ON EFFICIENCY ENHANCEMENT OF SHA-3 489
TABLE I
SHA-3 PERFORMANCE METRICS OF REPORTED WORK
TABLE II
SHA-3 VARIANTS WITH DIFFERENT OUTPUT LENGTHS [13]
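For orientation, the four fixed-length SHA-3 variants standardized in FIPS 202 [13] tie the output length d to the capacity c and rate r through c = 2d and r = 1600 - c. The short Python sketch below tabulates these values. The function and constant names are illustrative and are not taken from the paper, and the column layout is an assumption rather than a reproduction of Table II.

```python
# Sketch: rate/capacity of the fixed-length SHA-3 variants per FIPS 202.
# Names (STATE_BITS, sha3_parameters) are hypothetical, chosen for illustration.

STATE_BITS = 1600  # width of the Keccak-f[1600] permutation state


def sha3_parameters(digest_bits):
    """Return digest length, capacity, and rate (all in bits) for SHA3-<digest_bits>."""
    capacity = 2 * digest_bits       # FIPS 202: c = 2d
    rate = STATE_BITS - capacity     # r + c = 1600
    return {"digest": digest_bits, "capacity": capacity, "rate": rate}


for d in (224, 256, 384, 512):
    p = sha3_parameters(d)
    print(f"SHA3-{d}: capacity c = {p['capacity']} bits, rate r = {p['rate']} bits")
```

The largest capacity, 1024 bits, corresponds to SHA3-512 and leaves a rate of 576 bits per absorbed block.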
490 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 30, NO. 4, APRIL 2022
is moderate. To further decrease the computation time, and thereby increase the TP, Kerckhof et al. [15] proposed a design with two distinct asynchronous single-port random access memories (RAMs). The feasibility of processing the SHA-3 core steps in parallel [15] provided better TP. In addition, a low area was achieved by including a barrel rotator to store the offset values of the “rho” step, implemented using dedicated LUT resources instead of a basic multiplexer (MUX). However, although the design is compact, it could not meet the TP requirement for image-based biometric access control applications. Enhancing the TP with minimal modification of this architecture is not feasible because of the additional RAM store/fetch cycles. A co-processor design was later proposed by Knezevic et al. [16] for fair evaluation of the reported SHA-3 candidates. This co-processor architecture, with a control and cryptographic FPGA design in a fully autonomous mode, yielded an increased TP of 9 Gb/s. However, the word-length padding scheme increased the delay/wait cycles, resulting in a long waiting period for preparing the padded message. Latif et al. [17] and Jararweh et al. [18] suggested an efficient FPGA implementation technique that can boost the TP through efficient mapping of LUT resources. Although the area utilization was better, the TP achieved was unsatisfactory.

The designs reported in [14]–[18] include a software padder that lies outside the hardware core unit and therefore could not yield the required TP for complex applications. A full hardware crypto architecture that can promisingly cater to the TP requirement for image-based biometric access control applications was first proposed by Provelengios et al. [19]. A lane architectural design was suggested that implements a complete hardware padder block interfacing seamlessly with the core unit. Although the TP achieved was better than that of the earlier designs of [14]–[18], the additional clock cycles required by the lane architecture kept it from meeting the required performance level. A complete hardware architectural design with less computation time, together with an intermodule architectural modification that enhances TP and, thereby, the efficiency, is required for the target application. Efficiency is directly proportional to TP as per (1). Enhancing the TP depends directly on improving the frequency and bit rate; however, TP is inversely dependent on the number of clock cycles, thus requiring a design that yields high frequency, moderate bit rate, and a low number of clock cycles, as per (2)

$$E_f = \frac{\mathrm{TP}\ \text{(in Gb/s)}}{A\ \text{(in slices)}} \tag{1}$$

$$\mathrm{TP} = \frac{B \times f}{C_n} \tag{2}$$

where TP is throughput, $A$ is area, $B$ is block size, $f$ is maximum frequency, and $C_n$ is the number of clock cycles.

a technique to lower ROM area by resizing the bit-length from 64 to 8. Though the area occupied was convincingly low, the achieved maximum frequency is not satisfactorily high. A compression box technique reported in [22] minimized the area by 16% by reducing the round function steps; however, TP is compromised due to large path delay. Furthermore, a compact design with significantly low area is achieved by employing the folding factor (FR) technique in [23] and [24]. The Keccak round function is folded four times by a rescheduling concept. A separate state memory is also utilized to store the offset values of the “pi” step to minimize the rewriting cycles. Even though a low area of around 476 slices is achieved, TP is poor as the number of clock cycles is four times that of the basic/reference Keccak designs. It is observed that TP is thus inversely proportional to the FR and the number of clock cycles, which can be represented as follows:

$$\mathrm{FR} \propto C_n \propto 1/\mathrm{TP}. \tag{3}$$

Kundi and Aziz [25] suggested an approach to reduce area by way of efficient utilization of digital signal processor (DSP) resources. The folding technique was employed with no additional resource cost such as block RAM (BRAM), DSP, and RAM; however, the invariable increase in clock cycles resulted in poor TP. A technique to reduce the clock cycles by adding a round function within the loop of the core unit was successfully implemented in [26] while maintaining the same area. The unrolling technique utilized there, however, rendered poor frequency. Later, this loop unrolling technique was further enhanced by El Moumni et al. [27] to decrease the number of clock cycles from 24 to 2. However, the architecture consumed a larger area. A major concern posed by this architecture is its weak security against collision attack, which was successfully demonstrated by Guo et al. [28]. To enhance the security strength “S” of message bits, the capacity “c” bits must be larger compared with the rate “r” bits, as shown in Table II. Based on the security analysis, (3) can be rewritten in terms of capacity bits and number of clock cycles/rounds, as shown in the following equation:

$$S \propto c \propto C_n \propto \mathrm{FR} \propto 1/\mathrm{TP}. \tag{4}$$

Considering the parameter dependencies provided by (1)–(4), a Keccak design with capacity as high as 1024 bits that curbs collision attack is proposed in this work. Furthermore, the folding technique is not preferred for achieving high efficiency in this work, as it significantly increases the number of clock cycles, rendering poor TP. Instead, a frequency enhancement technique is employed. Therefore, considering (1)–(4), efficiency can be expressed in terms of frequency as in the following equation:

$$E_f \propto \mathrm{TP} \propto f. \tag{5}$$
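The dependencies in (1)–(3) can be illustrated numerically. The following is a minimal Python sketch, assuming that the block size B equals the Keccak-f[1600] rate for a 1024-bit capacity (1600 - 1024 = 576 bits) and that the unfolded design needs 24 clock cycles per block. The frequency (300 MHz) and area (1200 slices) are made-up operating points chosen only to keep the arithmetic readable; they are not results from this work, and the function names are hypothetical.

```python
# Sketch of Eqs. (1) and (2) with illustrative (assumed) numbers.

KECCAK_STATE_BITS = 1600  # Keccak-f[1600] state width


def throughput_gbps(block_bits, freq_mhz, clock_cycles):
    """Eq. (2): TP = B * f / Cn, returned in Gb/s."""
    return block_bits * freq_mhz * 1e6 / clock_cycles / 1e9


def efficiency_mbps_per_slice(tp_gbps, area_slices):
    """Eq. (1): Ef = TP / A, returned in Mb/s per slice."""
    return tp_gbps * 1e3 / area_slices


# Assumed operating point (hypothetical values, not measured data).
capacity_bits = 1024
block_bits = KECCAK_STATE_BITS - capacity_bits  # rate r = 576 bits
freq_mhz = 300.0
area_slices = 1200
base_cycles = 24          # one Keccak round per clock cycle
folding_factor = 4        # a fold-by-4 design needs 4x the cycles

base_tp = throughput_gbps(block_bits, freq_mhz, base_cycles)
folded_tp = throughput_gbps(block_bits, freq_mhz, base_cycles * folding_factor)
base_ef = efficiency_mbps_per_slice(base_tp, area_slices)

print(f"Unfolded TP: {base_tp:.2f} Gb/s")
print(f"Folded (x{folding_factor}) TP at the same frequency: {folded_tp:.2f} Gb/s")
print(f"Unfolded efficiency: {base_ef:.2f} Mb/s/slice")
```

At a fixed frequency, folding by four divides TP by four, which matches the inverse relation in (3). Whether efficiency also drops depends on how much area the folding saves, which is why the sketch does not assign an area to the folded design.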
TABLE IV
PADDER RULE FOR 32 BITS [13]
TABLE VI
ROUND-SPECIFIC RC SEQUENCES
Fig. 12. Fusion architectures: (a) base design and (b) dual-pipeline design
(Dual-p).
TABLE VII
IMPLEMENTATION RESULTS OF BASE ARCHITECTURE
TABLE VIII
IMPLEMENTATION RESULTS OF FUSION DESIGNS
Fig. 22. Efficiency versus frequency analysis on V5.

effective utilization of 5/6 input LUT resources by eliminating the additional pipeline stage. Furthermore, performance analysis in terms of TP revealed that the Cascade-P design achieved the highest TP of 24.43 Gb/s on V7 when compared with any other existing design, whereas Dual-f yielded the highest TP on V5.

The architecture in [21] exploited the fusion of unrolling, pipelining, and subpipelining techniques to raise the efficiency to 11.47 Mb/s/slices on V6. However, the proposed Dual-f design achieved the highest efficiency of 12.85 Mb/s/slices even with V5 as the target device. Furthermore, implementation of the same architecture on V7 rendered an efficiency of 15.11 Mb/s/slices. Though the area and TP performances are only marginally high, the efficiency observed is significantly high. Analysis was carried out to determine the cause of the efficiency improvement. Referring to (5), it might be inferred that the fusion technique, which tends to provide frequency improvement alone, has inherently improved the efficiency parameter. However, from Fig. 22, it can be observed that although a higher frequency of around 400 MHz is achieved by the Dual-p architecture, its efficiency is lower than that of Dual-f. Area versus efficiency investigations as per Table VIII revealed that a moderately low area combined with the increased frequency due to fusion of functional units yielded high efficiency. Therefore, it can be concluded that both frequency improvement and effective area utilization in the Dual-f architecture resulted in efficiency enhancement; hence, (5) can also be expressed as

$$E_f \propto f \propto 1/A. \tag{9}$$

VII. CONCLUSION
The comparative analysis from Figs. 18 to 22 shows that the fusion of unrolling and pipelining architecture (Dual-f) has satisfactory performance in terms of efficiency. A high

ACKNOWLEDGMENT
The authors acknowledge the facilities and support extended by VIT Chennai, Chennai, India, in carrying out the research work successfully. The suggestions of the anonymous reviewers, which improved the quality of this article, are duly recognized, and the authors express sincere thanks for their productive comments.

REFERENCES
[1] S. Thavalengal, P. Bigioi, and P. Corcoran, “Iris authentication in handheld devices–Considerations for constraint-free acquisition,” IEEE Trans. Consum. Electron., vol. 61, no. 2, pp. 245–253, May 2015.
[2] V. Talreja, M. C. Valenti, and N. M. Nasrabadi, “Deep hashing for secure multimodal biometrics,” IEEE Trans. Inf. Forensics Security, vol. 16, pp. 1306–1321, 2021.
[3] C. Li, J. Hu, J. Pieprzyk, and W. Susilo, “A new biocryptosystem-oriented security analysis framework and implementation of multibiometric cryptosystems based on decision level fusion,” IEEE Trans. Inf. Forensics Security, vol. 10, no. 6, pp. 1193–1206, Jun. 2015.
[4] M. Hammad, Y. Liu, and K. Wang, “Multimodal biometric authentication systems using convolution neural network based on different level fusion of ECG and fingerprint,” IEEE Access, vol. 7, pp. 26527–26542, 2018.
[5] B. Topcu, C. Karabat, M. Azadmanesh, and H. Erdogan, “Practical security and privacy attacks against biometric hashing using sparse recovery,” EURASIP J. Adv. Signal Process., vol. 2016, no. 1, pp. 1–20, Dec. 2016.
[6] N. D. Hammod, “Biometric authentication based on hash Iris features,” Int. J. Biometrics Bioinf. (IJBB), vol. 13, no. 1, pp. 1–11, 2020.
[7] M. M. Sravani and S. Ananiah Durai, “Attacks on cryptosystems implemented via VLSI: A review,” J. Inf. Secur. Appl., vol. 60, Aug. 2021, Art. no. 102861.
[8] M. Stevens, E. Bursztein, P. Karpman, A. Albertini, and Y. Markov, “The first collision for full SHA-1,” in Advances in Cryptology–(CRYPTO) (Lecture Notes in Computer Science), vol. 10401, Mountain View, CA, USA: Springer, 2017, pp. 570–596.
[9] T. Mladenov and S. Nooshabadi, “Implementation of reconfigurable SHA-2 hardware core,” in Proc. IEEE Asia Pacific Conf. Circuits Syst. (APCCAS), Macao, China, Nov. 2008, pp. 1802–1805.
[10] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche, “Keccak,” in Proc. Annu. Int. Conf. Theory Appl. Cryptograph. Techn. Berlin, Germany: Springer, 2013, pp. 313–314.
[11] M. Rao, T. Newe, I. Grout, and A. Mathur, “High speed implementation of a SHA-3 core on Virtex-5 and Virtex-6 FPGAs,” J. Circuits, Syst. Comput., vol. 25, no. 7, 2016, Art. no. 1650069.
[12] S. E. Moumni, M. Fettach, and A. Tragha, “High frequency implementation of cryptographic hash function Keccak-512 on FPGA devices,” Int. J. Inf. Comput. Secur., vol. 10, no. 4, pp. 361–373, 2018.
[13] M. J. Dworkin, SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions, FIPS, Standard 202, 2015.
[14] K. Gaj, E. Homsirikamol, and M. Rogawski, “Fair and comprehensive methodology for comparing hardware performance of fourteen round two SHA-3 candidates using FPGAs,” in Cryptographic Hardware and Embedded Systems (CHES) (Lecture Notes in Computer Science), vol. 6225, S. Mangard and F. X. Standaert, Eds. Berlin, Germany: Springer, 2010, pp. 264–278.
[15] S. Kerckhof, F. Durvaux, N. Veyrat-Charvillon, F. Regazzoni, G. M. de Dormale, and F. X. Standaert, “Compact FPGA implementations of the five SHA-3 finalists,” in Proc. Int. Conf. Smart Card Res. Adv. Appl. Berlin, Germany: Springer, 2011, pp. 217–233.
[16] M. Knezevic et al., “Fair and consistent hardware evaluation of fourteen round two SHA-3 candidates,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 5, pp. 827–840, May 2012.
[17] K. Latif, A. Aziz, and A. Mahboob, “Look-up table based implementations of SHA-3 finalists: JH, Keccak and Skein,” KSII Trans. Internet Inf. Syst., vol. 6, no. 9, pp. 2388–2404, 2012.
[18] Y. Jararweh, L. Tawalbeh, H. Tawalbeh, and A. Moh’d, “Hardware performance evaluation of SHA-3 candidate algorithms,” J. Inf. Secur., vol. 3, no. 2, pp. 69–76, 2012.
[19] G. Provelengios, P. Kitsos, N. Sklavos, and C. Koulamas, “FPGA-based design approaches of Keccak hash function,” in Proc. 15th Euromicro Conf. Digit. Syst. Design, Izmir, Turkey, Sep. 2012, pp. 5–8.
[20] R. Paul and S. Shukla, “Partitioned security processor architecture on FPGA platform,” IET Comput. Digit. Techn., vol. 12, no. 5, pp. 216–226, Sep. 2018.
[21] M. M. Wong, J. Haj-Yahya, S. Sau, and A. Chattopadhyay, “A new high throughput and area efficient SHA-3 implementation,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Florence, Italy, May 2018, pp. 1–5.
[22] A. Arshad, D.-E.-S. Kundi, and A. Aziz, “Compact implementation of SHA3-512 on FPGA,” in Proc. Conf. Inf. Assurance Cyber Secur. (CIACS), Jun. 2014, pp. 29–33.
[23] T. Honda, H. Guntur, and A. Satoh, “FPGA implementation of new standard hash function Keccak,” in Proc. IEEE 3rd Global Conf. Consum. Electron. (GCCE), Oct. 2014, pp. 275–279.
[24] M. Sundal and R. Chaves, “Efficient FPGA implementation of the SHA-3 hash function,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI), Jul. 2017, pp. 86–91.
[25] D.-E.-S. Kundi and A. Aziz, “A low-power SHA-3 designs using embedded digital signal processing slice on FPGA,” Comput. Electr. Eng., vol. 55, pp. 138–152, Oct. 2016.
[26] A. Gholipour and S. Mirzakuchaki, “High-speed implementation of the Keccak hash function on FPGA,” Int. J. Adv. Comput. Sci., vol. 2, no. 8, pp. 303–307, 2012.
[27] S. El Moumni, M. Fettach, and A. Tragha, “High throughput implementation of SHA3 hash algorithm on field programmable gate array (FPGA),” Microelectron. J., vol. 93, Nov. 2019, Art. no. 104615.
[28] J. Guo, G. Liao, G. Liu, M. Liu, K. Qiao, and L. Song, “Practical collision attacks against round-reduced SHA-3,” J. Cryptol., vol. 33, pp. 228–270, Jan. 2019.
[29] A. Akin, A. Aysu, O. C. Ulusel, and E. Savaş, “Efficient hardware implementations of high throughput SHA-3 candidates Keccak, Luffa and blue midnight wish for single- and multi-message hashing,” in Proc. 3rd Int. Conf. Secur. Inf. Netw., Taganrog, Russia, 2010, pp. 168–177.
[30] F. D. Pereira, E. D. M. Ordonez, I. D. Sakai, and A. M. de Souza, “Exploiting parallelism on Keccak: FPGA and GPU comparison,” Parallel Cloud Comput., vol. 2, no. 1, pp. 1–6, 2013.
[31] G. S. Athanasiou, G.-P. Makkas, and G. Theodoridis, “High throughput pipelined FPGA implementation of the new SHA-3 cryptographic hash algorithm,” in Proc. 6th Int. Symp. Commun., Control Signal Process. (ISCCSP), May 2014, pp. 538–541.
[32] F. Kahri, H. Mestiri, B. Bouallegue, and M. Machhout, “High speed FPGA implementation of cryptographic Keccak hash function crypto-processor,” J. Circuits, Syst. Comput., vol. 25, no. 4, 2016, Art. no. 1650026.
[33] H. E. Michail, L. Ioannou, and A. G. Voyiatzis, “Pipelined SHA-3 implementations on FPGA: Architecture and performance analysis,” in Proc. 2nd Workshop Cryptogr. Secur. Comput. Syst., Amsterdam, The Netherlands, 2015, pp. 13–18.
[34] P. Luo, Y. Fei, X. Fang, A. A. Ding, M. Leeser, and D. R. Kaeli, “Power analysis attack on hardware implementation of MAC-Keccak on FPGAs,” in Proc. Int. Conf. ReConFigurable Comput. FPGAs (ReConFig14), Dec. 2014, pp. 1–7.
[35] P. Luo, Y. Fei, X. Fang, A. A. Ding, D. R. Kaeli, and M. Leeser, “Side-channel analysis of MAC-Keccak hardware implementations,” in Proc. 4th Workshop Hardw. Architectural Support Secur. Privacy, Jun. 2015, p. 411.
[36] P. Luo, “Side-channel security analysis and protection of SHA-3,” Ph.D. dissertation, Dept. Comput. Eng., Northeastern Univ., Boston, MA, USA, 2017.
[37] P. Luo, Y. Fei, L. Zhang, and A. A. Ding, “Differential fault analysis of SHA-3 under relaxed fault models,” J. Hardw. Syst. Secur., vol. 1, no. 2, pp. 156–172, Jun. 2017.
[38] P. Luo, K. Athanasiou, Y. Fei, and T. Wahl, “Algebraic fault analysis of SHA-3 under relaxed fault models,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 7, pp. 1752–1761, Jul. 2018.
[39] B. Baldwin et al., “A hardware wrapper for the SHA-3 hash algorithms,” in Proc. IET Irish Signals Syst. Conf. (ISSC), Cork, Ireland, 2010, pp. 1–6.
[40] P. S. Z. Chen and S. Morozov, “A hardware interface for hashing algorithms,” Cryptol. e-Print Arch., Lyon, France, Tech. Rep. 2008/529, 2008. [Online]. Available: http://eprint.iacr.org/
[41] V. Conti, C. Militello, F. Sorbello, and S. Vitabile, “A multimodal technique for an embedded fingerprint recognizer in mobile payment systems,” Mobile Inf. Syst., vol. 5, no. 2, pp. 105–124, 2009.
[42] A. Alzahrani and F. Gebali, “Multi-core dataflow design and implementation of secure hash algorithm-3,” IEEE Access, vol. 6, pp. 6092–6102, 2018.
[43] A. Ashok, P. Poornachandran, and K. Achuthan, “Secure authentication in multimodal biometric systems using cryptographic hash functions,” in Proc. Int. Conf. Secur. Comput. Netw. Distrib. Syst. Berlin, Germany: Springer, 2012, pp. 168–177.
[44] D. Jagadiswary and D. Saraswady, “Biometric authentication using fused multimodal biometric,” Proc. Comput. Sci., vol. 85, pp. 109–116, Jan. 2016.
[45] A. Muthukumar and S. Kannan, “AES based multimodal biometric authentication using cryptographic level fusion with fingerprint and finger knuckle print,” Int. Arab J. Inf. Technol., vol. 12, no. 5, pp. 431–440, 2015.
[46] K. Vasavi, R. University, Y. Latha, and M. Reddy Engineering College for Women, “RSA cryptography based multi-modal biometric identification system for high-security application,” Int. J. Intell. Eng. Syst., vol. 12, no. 1, pp. 10–21, Feb. 2019.
[47] Xilinx. (2012). Virtex-5 User Guide V5.4. [Online]. Available: https://www.xilinx.com/support/documentation/user_guides/ug190.pdf
[48] Xilinx. (2019). VC707 Evaluation Board for the Virtex-7 FPGA. [Online]. Available: https://www.xilinx.com/support/documentation/boards_and_kits/vc707/ug885_VC707_Eval_Bd.pdf
[49] Xilinx. (2018). PG159—Virtual Input/Output V3.0 Product Guide v3.0. [Online]. Available: https://www.xilinx.com/support/documentation/ip_documentation/vio/v3_0/pg159-vio.pdf

M. M. Sravani received the bachelor’s and master’s degrees from JNTU Anantapur University, Anantapur, India, in 2012 and 2015, respectively. She is currently working toward the Ph.D. degree at the Vellore Institute of Technology, Chennai, India, under the supervision of Dr. S. Ananiah Durai.
Her research interests are cryptographic hardware implementation, reconfigurable computing, and digital logic design.

S. Ananiah Durai received the Ph.D. degree in integrated circuit design from Massey University, Auckland, New Zealand, in 2015.
He is currently an Associate Professor with the Center for Nanoelectronics and VLSI Design, School of Electronics Engineering, Vellore Institute of Technology, Chennai, India. He has authored over 14 research articles published in various journals. His research interests are analog CMOS IC design, microsensor system design with CMOS-MEMS, hardware security, and on-chip signal conditioning circuit design.