Proceedings of Data Analytics and Management

Lecture Notes in Networks and Systems 788

Abhishek Swaroop
Zdzislaw Polkowski
Sérgio Duarte Correia
Bal Virdee Editors

Proceedings
of Data
Analytics and
Management
ICDAM 2023, Volume 4
Lecture Notes in Networks and Systems

Volume 788

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of
Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of
Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems, and others. Of particular value to both
the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).
Abhishek Swaroop · Zdzislaw Polkowski ·
Sérgio Duarte Correia · Bal Virdee
Editors

Proceedings of Data
Analytics and Management
ICDAM 2023, Volume 4
Editors

Abhishek Swaroop
Department of Information Technology
Bhagwan Parshuram Institute of Technology
New Delhi, Delhi, India

Zdzislaw Polkowski
Jan Wyzykowski University
Polkowice, Poland

Sérgio Duarte Correia
Polytechnic Institute of Portalegre
Portalegre, Portugal

Bal Virdee
Centre for Communications Technology
London Metropolitan University
London, UK

ISSN 2367-3370 ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-99-6552-6 ISBN 978-981-99-6553-3 (eBook)
https://doi.org/10.1007/978-981-99-6553-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore

Paper in this product is recyclable.


ICDAM 2023 Steering Committee Members

Patrons

Prof. (Dr.) Don MacRaild, Pro-Vice Chancellor, London Metropolitan University,
London
Prof. (Dr.) Wioletta Palczewska, Rector, The Karkonosze State University of Applied
Sciences in Jelenia Góra, Poland
Prof. (Dr.) Beata Telążka, Vice-Rector, The Karkonosze State University of Applied
Sciences in Jelenia Góra

General Chairs

Prof. Dr. Janusz Kacprzyk, Polish Academy of Sciences, Systems Research Institute,
Poland
Prof. Dr. Karim Ouazzane, London Metropolitan University, London
Prof. Dr. Bal Virdee, London Metropolitan University, London
Prof. Cesare Alippi, Polytechnic University of Milan, Italy

Honorary Chairs

Prof. Dr. Aboul Ella Hassanien, Cairo University, Egypt
Prof. Dr. Vaclav Snasel, Rector, VSB-Technical University of Ostrava, Czech
Republic
Prof. Chris Lane, London Metropolitan University, London


Conference Chairs

Prof. Dr. Vassil Vassilev, London Metropolitan University, London
Dr. Pancham Shukla, Imperial College London, London
Prof. Dr. Mak Sharma, Birmingham City University, Birmingham
Dr. Shikun Zhou, University of Portsmouth
Dr. Magdalena Baczyńska, Dean, The Karkonosze State University of Applied
Sciences in Jelenia Góra, Poland
Dr. Zdzislaw Polkowski, Adjunct Professor KPSW, The Karkonosze State University
of Applied Sciences in Jelenia Góra
Prof. Dr. Abhishek Swaroop, Bhagwan Parshuram Institute of Technology, Delhi,
India
Prof. Dr. Anil K. Ahlawat, Dean, KIET Group of Institutes, India

Technical Program Chairs

Dr. Shahram Salekzamankhani, London Metropolitan University, London
Dr. Mohammad Hossein Amirhosseini, University of East London, London
Dr. Sandra Fernando, London Metropolitan University, London
Dr. Qicheng Yu, London Metropolitan University, London
Prof. Joel J. P. C. Rodrigues, Federal University of Piauí (UFPI), Teresina—PI, Brazil
Dr. Ali Kashif Bashir, Manchester Metropolitan University, UK
Dr. Rajkumar Singh Rathore, Cardiff Metropolitan University, UK

Conveners

Dr. Ashish Khanna, Maharaja Agrasen Institute of Technology (GGSIPU), New
Delhi, India
Dr. Deepak Gupta, Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi,
India

Publicity Chairs

Dr. Józef Zaprucki, Prof. KPSW, Rector’s Proxy for Foreign Affairs, The Karkonosze
State University of Applied Sciences in Jelenia Góra
Dr. Umesh Gupta, Bennett University, India
Dr. Puneet Sharma, Assistant Professor, Amity University, Noida

Dr. Deepak Arora, Professor and Head (CSE), Amity University, Lucknow Campus
João Matos-Carvalho, Lusófona University, Portugal

Co-conveners

Mr. Moolchand Sharma, Maharaja Agrasen Institute of Technology, India
Dr. Richa Sharma, London Metropolitan University, London

Preface

We are delighted to announce that London Metropolitan University, London, in
collaboration with The Karkonosze University of Applied Sciences, Poland,
Politécnico de Portalegre, Portugal, and Bhagwan Parshuram Institute of Technology,
India, hosted the eagerly awaited International Conference on Data Analytics and
Management (ICDAM 2023). The fourth edition of the conference attracted a diverse
range of engineering practitioners, academicians, scholars, and industry delegates,
with the received abstracts involving more than 7000 authors from different parts
of the world. The committee of professionals dedicated to the conference strove to
achieve a high-quality technical program, with tracks on data analytics, data
management, big data, computational intelligence, and communication networks. All
the tracks chosen for the conference are interrelated and highly active in the
present-day research community, with a great deal of research under way in these
tracks and their related sub-areas. More than 1200 full-length papers were received,
with contributions focused on theoretical work, computer simulation-based research,
and laboratory-scale experiments. Among these manuscripts, 190 papers have been
included in the Springer proceedings after a thorough two-stage review and editing
process. All the manuscripts submitted to ICDAM 2023 were peer-reviewed by at
least two independent reviewers, who were provided with a detailed review pro forma.
The comments from the reviewers were communicated to the authors, who incorporated
the suggestions in their revised manuscripts. The recommendations from both
reviewers were taken into consideration when selecting a manuscript for inclusion in
the proceedings. The exhaustiveness of the review process is evident given the large
number of articles received and the wide range of research areas they address. The
stringent review process ensured that each published manuscript met rigorous
academic and scientific standards. It is a gratifying experience to finally see these
contributions materialize into the four book volumes of the ICDAM proceedings,
published by Springer as “Proceedings of Data Analytics and Management: ICDAM 2023”.
ICDAM 2023 invited four keynote speakers, eminent researchers in the field of
computer science and engineering from different parts of the world. In addition to
the plenary sessions on each day of the conference, seventeen concurrent technical
sessions were held every day to accommodate the oral presentation of around 190
accepted papers. The keynote speakers and session chairs for each of the concurrent
sessions were leading researchers from the thematic area of the session. The
delegates were provided with a book of extended abstracts so that they could quickly
browse the contents and participate in the presentations, giving the work exposure
to a broad audience. The research part of the conference was organized in a total
of 22 special sessions, which gave researchers working in specific areas the
opportunity to present their results in a more focused environment.
An international conference of such magnitude, and the release of the ICDAM 2023
proceedings by Springer, is the remarkable outcome of the untiring efforts of the
entire organizing team. The success of such an event invariably rests on the
painstaking efforts of several contributors at different stages, driven by their
devotion and sincerity. Fortunately, since the beginning of its journey, ICDAM 2023
has received support and contributions from every corner. We thank everyone who
wished ICDAM 2023 well and contributed by any means toward its success. The edited
proceedings volumes published by Springer would not have been possible without the
perseverance of all the steering, advisory, and technical program committee
members.
The organizers of ICDAM 2023 owe thanks to all the contributing authors for their
interest and exceptional articles. We would also like to thank the authors for
adhering to the time schedule and for incorporating the review comments. We extend
our heartfelt acknowledgment to the authors, peer reviewers, committee members, and
production staff whose diligent work shaped the ICDAM 2023 proceedings. We
especially want to thank our dedicated team of peer reviewers, who volunteered for
the arduous and tedious task of quality-checking and critiquing the submitted
manuscripts. We thank our faculty colleague Mr. Moolchand Sharma for his enormous
assistance during the conference; the time he spent, and the midnight oil he burnt,
are greatly appreciated, and we will remain ever indebted. The management, faculty,
and administrative and support staff of the college have always extended their
services whenever needed, for which we remain thankful.
Lastly, we would like to thank Springer for accepting our proposal to publish the
ICDAM 2023 conference proceedings. The help received from Mr. Aninda Bose, Senior
Acquisitions Editor, throughout the process has been invaluable.

New Delhi, India Abhishek Swaroop
Polkowice, Poland Zdzislaw Polkowski
Portalegre, Portugal Sérgio Duarte Correia
London, UK Bal Virdee
Contents

Deep Spectral Feature Representations Via Attention-Based
Neural Network Architectures for Accented Malayalam
Speech—A Low-Resourced Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Rizwana Kallooravi Thandil, K. P. Mohamed Basheer, and V. K. Muneer
Improving Tree-Based Convolutional Neural Network Model
for Image Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Saba Raees and Parul Agarwal
Smartphone Malware Detection Based on Enhanced
Correlation-Based Feature Selection on Permissions . . . . . . . . . . . . . . . . . . 29
Shagun, Deepak Kumar, and Anshul Arora
Fake News Detection Using Ensemble Learning Models . . . . . . . . . . . . . . . 53
Devanshi Singh, Ahmad Habib Khan, and Shweta Meena
Ensemble Approach for Suggestion Mining Using Deep Recurrent
Convolutional Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Usama Bin Rashidullah Khan, Nadeem Akhtar, and Ehtesham Sana
A CNN-Based Self-attentive Approach to Knowledge Tracing . . . . . . . . . . 77
Anasuya Mithra Parthaje, Akaash Nidhiss Pandian, and Bindu Verma
LIPFCM: Linear Interpolation-Based Possibilistic Fuzzy C-Means
Clustering Imputation Method for Handling Incomplete Data . . . . . . . . . 87
Jyoti, Jaspreeti Singh, and Anjana Gosain
Experimental Analysis of Two-Wheeler Headlight Illuminance
Data from the Perspective of Traffic Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Aditya Gola, Chandra Mohan Dharmapuri, Neelima Chakraborty,
S. Velmurugan, and Vinod Karar
Detecto: The Phishing Website Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Ashish Prajapati, Jyoti Kukade, Akshat Shukla, Atharva Jhawar,
Amit Dhakad, Trapti Mishra, and Rahul Singh Pawar


Synergizing Voice Cloning and ChatGPT for Multimodal
Conversational Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Shruti Bibra, Srijan Singh, and R. P. Mahapatra
A Combined PCA-CNN Method for Enhanced Machinery Fault
Diagnosis Through Fused Spectrogram Analysis . . . . . . . . . . . . . . . . . . . . . 141
Harshit Rajput, Hrishabh Palsra, Abhishek Jangid, and Sachin Taran
FPGA-Based Design of Chaotic Systems with Quadratic
Nonlinearities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Kriti Suneja, Neeta Pandey, and Rajeshwari Pandey
A Comprehensive Survey on Replay Strategies for Object Detection . . . . 163
Allabaksh Shaik and Shaik Mahaboob Basha
Investigation of Statistical and Machine Learning Models
for COVID-19 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Joydeep Saggu and Ankita Bansal
SONAR-Based Sound Waves’ Utilization for Rocks’ and Mines’
Detection Using Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Adrija Mitra, Adrita Chakraborty, Supratik Dutta, Yash Anand,
Sushruta Mishra, and Anil Kumar
A Sampling-Based Logistic Regression Model for Credit Card
Fraud Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Prapti Patra, Srijal Vedansh, Vishisht Ved, Anup Singh,
Sushruta Mishra, and Anil Kumar
iFlow: Powering Lightweight Cross-Platform Data Pipelines . . . . . . . . . . . 211
Supreeta Nayak, Ansh Sarkar, Dushyant Lavania, Nittishna Dhar,
Sushruta Mishra, and Anil Kumar
Developing a Deep Learning Model to Classify Cancerous
and Non-cancerous Lung Nodules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Rishit Pandey, Sayani Joddar, Sushruta Mishra, Ahmed Alkhayyat,
Shaid Sheel, and Anil Kumar
Concrete Crack Detection Using Thermograms and Neural
Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Mabrouka Abuhmida, Daniel Milne, Jiping Bai, and Ian Wilson
Wind Power Prediction in Mediterranean Coastal Cities Using
Multi-layer Perceptron Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Youssef Kassem, Hüseyin Çamur,
and Abdalla Hamada Abdelnaby Abdelnaby
Next Generation Intelligent IoT Use Case in Smart Manufacturing . . . . . 265
Bharati Rathore

Forecasting Financial Success App: Unveiling the Potential
of Random Forest in Machine Learning-Based Investment
Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Ashish Khanna, Divyansh Goyal, Nidhi Chaurasia,
and Tariq Hussain Sheikh
Integration of Blockchain-Enabled SBT and QR Code Technology
for Secure Verification of Digital Documents . . . . . . . . . . . . . . . . . . . . . . . . . 293
Ashish Khanna, Devansh Singh, Ria Monga, Tarun Kumar,
Ishaan Dhull, and Tariq Hussain Sheikh
Time Series Forecasting of NSE Stocks Using Machine Learning
Models (ARIMA, Facebook Prophet, and Stacked LSTM) . . . . . . . . . . . . . 303
Prabudhd Krishna Kandpal, Shourya, Yash Yadav, and Neelam Sharma
Analysis of Monkey Pox (MPox) Detection Using UNETs
and VGG16 Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
V. Kakulapati
Role of Robotic Process Automation in Enhancing Customer
Satisfaction in E-commerce Through E-mail Automation . . . . . . . . . . . . . . 333
Shamini James, S. Karthik, Binu Thomas, and Nitish Pathak
Gene Family Classification Using Machine Learning:
A Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
Drishti Seth, KPA Dharmanshu Mahajan, Rohit Khanna,
and Gunjan Chugh
Dense Convolution Neural Network for Lung Cancer Classification
and Staging of the Diseases Using NSCLC Images . . . . . . . . . . . . . . . . . . . . 361
Ahmed J. Obaid, S. Suman Rajest, S. Silvia Priscila, T. Shynu,
and Sajjad Ali Ettyem
Sentiment Analysis Using Bi-ConvLSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
Durga Satish Matta and K. Saruladha
A New Method for Protein Sequence Comparison Using Chaos
Game Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Debrupa Pal, Sudeshna Dey, Papri Ghosh, Subhram Das,
and Bansibadan Maji
Credit Card Fraud Detection and Classification Using Deep
Learning with Support Vector Machine Techniques . . . . . . . . . . . . . . . . . . . 399
Fatima Adel Nama, Ahmed J. Obaid,
and Ali Abdulkarem Habib Alrammahi
Prediction of Criminal Activities Forecasting System and Analysis
Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
Mahendra Sharma and Laveena Sehgal

Comparing Techniques for Digital Handwritten Detection Using
CNN and SVM Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
M. Arvindhan, Shubham Upadhyay, Avdeep Malik,
Sudeshna Chakraborty, and Kimmi Gupta
Optimized Text Summarization Using Abstraction and Extraction . . . . . 445
Harshita Patel, Pallavi Mishra, Shubham Agarwal, Aanchal Patel,
and Stuti Hegde
Mall Customer Segmentation Using K-Means Clustering . . . . . . . . . . . . . . 459
Ashwani, Gurleen Kaur, and Lekha Rani
Modified Local Gradient Coding Pattern (MLGCP):
A Handcrafted Feature Descriptor for Classification of Infectious
Diseases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
Rohit Kumar Bondugula and Siba K. Udgata
Revolutionising Food Safety Management: The Role of Blockchain
Technology in Ensuring Safe and High-Quality Food Products . . . . . . . . . 487
Urvashi Sugandh, Swati Nigam, and Manju Khari
Securing the E-records of Patient Data Using the Hybrid
Encryption Model with Okamoto–Uchiyama Cryptosystem
in Smart Healthcare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Prasanna Kumar Lakineni, R. Balamanigandan, T. Rajesh Kumar,
V. Sathyendra Kumar, R. Mahaveerakannan, and Chinthakunta Swetha
Cuttlefish Algorithm-Based Deep Learning Model to Predict
the Missing Data in Healthcare Application . . . . . . . . . . . . . . . . . . . . . . . . . . 513
A. Sasi Kumar, T. Rajesh Kumar, R. Balamanigandan, R. Meganathan,
Roshan Karwa, and R. Mahaveerakannan
Drowsiness Detection System Using DL Models . . . . . . . . . . . . . . . . . . . . . . 529
Umesh Gupta, Yelisetty Priya Nagasai, and Sudhanshu Gupta

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543


Editors and Contributors

About the Editors

Prof. (Dr.) Abhishek Swaroop completed his B.Tech. (CSE) from GBP University
of Agriculture and Technology, M.Tech. from Punjabi University, Patiala, and Ph.D.
from NIT Kurukshetra. He has 8 years of industrial experience in organizations such
as Usha Rectifier Corporations and Envirotech Instruments Pvt. Limited, and 22
years of teaching experience. He has served in reputed educational institutions such as
Jaypee Institute of Information Technology, Noida; Sharda University, Greater Noida;
and Galgotias University, Greater Noida. He has held various administrative
positions such as Head of Department, Division Chair, NBA Coordinator for the
university, and Head of Training and Placements. Currently, he is serving as Professor
and HoD, Department of Information Technology, Bhagwan Parshuram Institute
of Technology, Rohini, Delhi. He is actively engaged in research and has more
than 60 quality publications, of which eight are SCI-indexed and 16 are Scopus-indexed.

Prof. (Dr.) Zdzislaw Polkowski is Adjunct Professor at the Faculty of Technical
Sciences of the Jan Wyzykowski University, Poland. He is also the Rector's Repre-
sentative for International Cooperation and the Erasmus Program, and was Dean of
the Technical Sciences Faculty from 2009 to 2012. His areas of research
include management information systems, business informatics, IT in business and
administration, IT security, small and medium enterprises, CC, IoT, big data, business
intelligence, and blockchain. He has published around 60 research articles and has
served the research community as Author, Professor, Reviewer, Keynote Speaker,
and Co-editor. He has attended several international conferences in various parts of
the world, and he also serves as a Principal Investigator.

Prof. Sérgio Duarte Correia received his Diploma in Electrical and Computer
Engineering from the University of Coimbra, Portugal, in 2000, his master's degree in
Industrial Control and Maintenance Systems from Beira Interior University, Covilhã,
Portugal, in 2010, and his Ph.D. in Electrical and Computer Engineering from the
University of Coimbra, Portugal, in 2020. Currently, he is Associate Professor at
the Polytechnic Institute of Portalegre, Portugal. He is Researcher at COPELABS—
Cognitive and People-centric Computing Research Center, Lusófona University of
Humanities and Technologies, Lisbon, Portugal, and at Valoriza—Research Center for
Endogenous Resource Valorization, Polytechnic Institute of Portalegre, Portalegre,
Portugal. Over the past 20 years, he has worked with several private companies in the
field of product development and industrial electronics. His current research interests
are artificial intelligence, soft computing, signal processing, and embedded computing.

Prof. Bal Virdee graduated with a B.Sc. (Engineering) Honors in Communication
Engineering and an M.Phil. from Leeds University, UK. He obtained his Ph.D. from the
University of North London, UK. He worked as an academic at the Open Univer-
sity and Leeds University. Prior to this, he was a Research and Development Elec-
tronic Engineer in the Future Products Department at Teledyne Defence (formerly
Filtronic Components Ltd., Shipley, West Yorkshire) and at PYE TVT (Philips)
in Cambridge. He has held numerous duties and responsibilities at the university,
including Health and Safety Officer, Postgraduate Tutor, Examinations Officer, Admis-
sions Tutor, Short Course Organizer, and Course Leader for the M.Sc./M.Eng. Satellite
Communications, B.Sc. Communications Systems, and B.Sc. Electronics programs. In
2010, he was appointed Academic Leader (UG Recruitment). He is a member of the
ethics committee and of the school's research committee and research degrees
committee.

Contributors

Abdalla Hamada Abdelnaby Abdelnaby Faculty of Engineering, Mechanical
Engineering Department, Near East University, Nicosia, North Cyprus, Cyprus
Mabrouka Abuhmida University of South Wales, Cardiff, UK
Parul Agarwal Department of Computer Science and Engineering, School of
Engineering Sciences and Technology, Jamia Hamdard University, New Delhi, India
Shubham Agarwal School of Information Technology, VIT University, Vellore,
India
Nadeem Akhtar Department of Computer Engineering and Interdisciplinary
Centre for Artificial Intelligence, Aligarh Muslim University, Aligarh, Uttar Pradesh,
India
Ahmed Alkhayyat Faculty of Engineering, The Islamic University, Najaf, Iraq
Ali Abdulkarem Habib Alrammahi National University of Science and Tech-
nology, Thi-Qar, Nasiriyah, Iraq

Yash Anand Kalinga Institute of Industrial Technology, Deemed to be University,
Bhubaneswar, India
Anshul Arora Delhi Technological University, Rohini, Delhi, New Delhi, India
M. Arvindhan School of Computing Science and Engineering, Galgotias Univer-
sity, Greater Noida, Uttar Pradesh, India
Ashwani Chitkara University Institute of Engineering and Technology, Rajpura,
Punjab, India
Jiping Bai University of South Wales, Cardiff, UK
R. Balamanigandan Department of Computer Science and Engineering, Saveetha
School of Engineering, Saveetha Institute of Medical and Technical Sciences,
Chennai, Tamil Nadu, India
Ankita Bansal Netaji Subhas University of Technology, Dwarka, India
Shaik Mahaboob Basha N.B.K.R. Institute of Science and Technology, Affiliated
to Jawaharlal Nehru Technological University Anantapur, Vidyanagar, Ananthapu-
ramu, Andhra Pradesh, India
Shruti Bibra SRM Institute of Science and Technology, Ghaziabad, India
Rohit Kumar Bondugula AI Lab, School of Computer and Information Sciences,
University of Hyderabad, Hyderabad, India
Hüseyin Çamur Faculty of Engineering, Mechanical Engineering Department,
Near East University, Nicosia, North Cyprus, Cyprus
Adrita Chakraborty Kalinga Institute of Industrial Technology, Deemed to be
University, Bhubaneswar, India
Neelima Chakraborty CSIR—Central Road Research Institute, New Delhi, India
Sudeshna Chakraborty School of Computing Science and Engineering, Galgotias
University, Greater Noida, Uttar Pradesh, India
Nidhi Chaurasia Maharaja Agrasen Institute of Technology, Guru Gobind Singh
Indraprastha, University Delhi, Delhi, India
Gunjan Chugh Department of Artificial Intelligence and Machine Learning,
Maharaja Agrasen Institute of Technology, Delhi, India
Subhram Das Narula Institute of Technology, Kolkata, India
Sudeshna Dey Narula Institute of Technology, Kolkata, India
Amit Dhakad Medi-Caps University, Indore, India
Nittishna Dhar Kalinga Institute of Industrial Technology, Deemed to be Univer-
sity, Bhubaneswar, India

KPA Dharmanshu Mahajan Department of Artificial Intelligence and Machine
Learning, Maharaja Agrasen Institute of Technology, Delhi, India
Chandra Mohan Dharmapuri G. B. Pant Government Engineering College, New
Delhi, India
Ishaan Dhull Department of Computer Science and Engineering, Maharaja
Agrasen Institute of Technology, GGSIPU, Delhi, India
Supratik Dutta Kalinga Institute of Industrial Technology, Deemed to be Univer-
sity, Bhubaneswar, India
Sajjad Ali Ettyem National University of Science and Technology, Thi-Qar, Iraq
Papri Ghosh Narula Institute of Technology, Kolkata, India
Aditya Gola G. B. Pant Government Engineering College, New Delhi, India
Anjana Gosain USICT, Guru Gobind Singh Indraprastha University, New Delhi,
India
Divyansh Goyal Maharaja Agrasen Institute of Technology, Guru Gobind Singh
Indraprastha, University Delhi, Delhi, India
Kimmi Gupta School of Computing Science and Engineering, Galgotias Univer-
sity, Greater Noida, Uttar Pradesh, India
Sudhanshu Gupta SCSET, Bennett University, Greater Noida, Uttar Pradesh, India
Umesh Gupta SCSET, Bennett University, Greater Noida, Uttar Pradesh, India
Stuti Hegde School of Information Technology, VIT University, Vellore, India
Shamini James Kalasalingam Academy of Research and Education, Krishnankoil,
Tamil Nadu, India
Abhishek Jangid Department of Electronics and Communication Engineering,
Delhi Technological University, Delhi, India
Atharva Jhawar Medi-Caps University, Indore, India
Sayani Joddar Kalinga Institute of Industrial Technology, Deemed to Be Univer-
sity, Bhubaneswar, India
Jyoti USICT, Guru Gobind Singh Indraprastha University, New Delhi, India
V. Kakulapati Sreenidhi Institute of Science and Technology, Yamnampet,
Ghatkesar, Hyderabad, Telangana, India
Prabudhd Krishna Kandpal Department of Artificial Intelligence and Machine
Learning, Maharaja Agrasen Institute of Technology, Delhi, India
Vinod Karar CSIR—Central Road Research Institute, New Delhi, India

S. Karthik Kalasalingam Academy of Research and Education, Krishnankoil,
Tamil Nadu, India
Roshan Karwa Department of CSE, Prof Ram Meghe Institute of Technology and
Research, Badnera-Amravati, India
Youssef Kassem Faculty of Engineering, Mechanical Engineering Department,
Near East University, Nicosia, North Cyprus, Cyprus;
Faculty of Civil and Environmental Engineering, Near East University, Nicosia,
North Cyprus, Cyprus;
Near East University, Energy, Environment, and Water Research Center, Nicosia,
North Cyprus, Cyprus
Gurleen Kaur Chitkara University Institute of Engineering and Technology,
Rajpura, Punjab, India
Ahmad Habib Khan Delhi Technological University, New Delhi, India
Usama Bin Rashidullah Khan Interdisciplinary Centre for Artificial Intelligence,
Aligarh Muslim University, Aligarh, India
Ashish Khanna Maharaja Agrasen Institute of Technology, Guru Gobind Singh
Indraprastha, University Delhi, Delhi, India;
Department of Computer Science and Engineering, Maharaja Agrasen Institute of
Technology, GGSIPU, Delhi, India
Rohit Khanna Department of Artificial Intelligence and Machine Learning,
Maharaja Agrasen Institute of Technology, Delhi, India
Manju Khari School of Computer and System Sciences, Jawaharlal Nehru Univer-
sity, New Delhi, India
Jyoti Kukade Medi-Caps University, Indore, India
Anil Kumar DIT University, Dehradun, India;
Tula’s Institute, Dehradun, India
Deepak Kumar Delhi Technological University, Rohini, Delhi, New Delhi, India
Tarun Kumar Department of Computer Science and Engineering, Maharaja
Agrasen Institute of Technology, GGSIPU, Delhi, India
Prasanna Kumar Lakineni Department of CSE, GITAM School of Technology,
GITAM University, Visakhapatnam, India
Dushyant Lavania Kalinga Institute of Industrial Technology, Deemed to be
University, Bhubaneswar, India
R. P. Mahapatra SRM Institute of Science and Technology, Ghaziabad, India
R. Mahaveerakannan Department of Computer Science and Engineering,
Saveetha School of Engineering, Saveetha Institute of Medical and Technical
Sciences, Chennai, Tamil Nadu, India
Bansibadan Maji National Institute of Technology, Durgapur, India
Avdeep Malik School of Computing Science and Engineering, Galgotias Univer-
sity, Greater Noida, Uttar Pradesh, India
Durga Satish Matta Department of Computer Science and Engineering,
Puducherry Technological University, Puducherry, India
Shweta Meena Delhi Technological University, New Delhi, India
R. Meganathan Department of Computer Science and Engineering, Koneru Laksh-
maiah Education Foundation, Vaddeswaram, AP, India
Daniel Milne University of South Wales, Cardiff, UK
Pallavi Mishra School of Information Technology, VIT University, Vellore, India
Sushruta Mishra Kalinga Institute of Industrial Technology, Deemed to be Univer-
sity, Bhubaneswar, India
Trapti Mishra Medi-Caps University, Indore, India
Adrija Mitra Kalinga Institute of Industrial Technology, Deemed to be University,
Bhubaneswar, India
K. P. Mohamed Basheer Sullamussalam Science College, Areekode, Kerala, India
Ria Monga Department of Computer Science and Engineering, Maharaja Agrasen
Institute of Technology, GGSIPU, Delhi, India
V. K. Muneer Sullamussalam Science College, Areekode, Kerala, India
Yelisetty Priya Nagasai SCSET, Bennett University, Greater Noida, Uttar Pradesh,
India
Fatima Adel Nama Faculty of Computer Science and Mathematics, University of
Kufa, Kufa, Iraq
Supreeta Nayak Kalinga Institute of Industrial Technology, Deemed to be Univer-
sity, Bhubaneswar, India
Swati Nigam Department of Computer Science, Faculty of Mathematics and
Computing, Banasthali Vidyapith, Banasthali, India
Ahmed J. Obaid Faculty of Computer Science and Mathematics, University of
Kufa, Kufa, Iraq;
Department of Computer Technical Engineering, Technical Engineering College,
Al-Ayen University, Thi-Qar, Iraq
Debrupa Pal Narula Institute of Technology, Kolkata, India;
National Institute of Technology, Durgapur, India
Hrishabh Palsra Department of Electronics and Communication Engineering,
Delhi Technological University, Delhi, India
Neeta Pandey Delhi Technological University, Delhi, India
Rajeshwari Pandey Delhi Technological University, Delhi, India
Rishit Pandey Kalinga Institute of Industrial Technology, Deemed to Be University,
Bhubaneswar, India
Akaash Nidhiss Pandian Delhi Technological University, New Delhi, India
Anasuya Mithra Parthaje Delhi Technological University, New Delhi, India
Aanchal Patel School of Information Technology, VIT University, Vellore, India
Harshita Patel School of Information Technology, VIT University, Vellore, India
Nitish Pathak Bhagwan Parshuram Institute of Technology (BPIT), GGSIPU, New
Delhi, India
Prapti Patra Kalinga Institute of Industrial Technology, Deemed to Be University,
Bhubaneswar, India
Rahul Singh Pawar Medi-Caps University, Indore, India
Ashish Prajapati Medi-Caps University, Indore, India
Saba Raees Department of Computer Science and Engineering, School of Engi-
neering Sciences and Technology, Jamia Hamdard University, New Delhi, India
T. Rajesh Kumar Department of Computer Science and Engineering, Saveetha
School of Engineering, Saveetha Institute of Medical and Technical Sciences,
Chennai, Tamil Nadu, India
Harshit Rajput Department of Electronics and Communication Engineering, Delhi
Technological University, Delhi, India
Lekha Rani Chitkara University Institute of Engineering and Technology, Rajpura,
Punjab, India
Bharati Rathore Birmingham City University, Birmingham, UK
Joydeep Saggu Netaji Subhas University of Technology, Dwarka, India
Ehtesham Sana Department of Computer Engineering, Aligarh Muslim Univer-
sity, Aligarh, India
Ansh Sarkar Kalinga Institute of Industrial Technology, Deemed to be University,
Bhubaneswar, India
K. Saruladha Department of Computer Science and Engineering, Puducherry
Technological University, Puducherry, India
A. Sasi Kumar Inurture Education Solutions Pvt. Ltd., Bangalore, India;
Department of Cloud Technology and Data Science, Institute of Engineering and
Technology, Srinivas University, Surathkal, Mangalore, India
V. Sathyendra Kumar Department of CSE, BIHER, Chennai, India;
Annamacharya Institute of Technology and Sciences, Rajampet, Andhra Pradesh, India
Laveena Sehgal IIMT College of Engineering, Greater Noida, UP, India
Drishti Seth Department of Artificial Intelligence and Machine Learning, Maharaja
Agrasen Institute of Technology, Delhi, India
Shagun Delhi Technological University, Rohini, Delhi, New Delhi, India
Allabaksh Shaik Jawaharlal Nehru Technological University Anantapur, Anantha-
puramu, Andhra Pradesh, India;
Sri Venkateswara College of Engineering Tirupati, Affiliated to Jawaharlal Nehru
Technological University Anantapur, Ananthapuramu, Andhra Pradesh, India
Mahendra Sharma IIMT College of Engineering, Greater Noida, UP, India
Neelam Sharma Department of Artificial Intelligence and Machine Learning,
Maharaja Agrasen Institute of Technology, Delhi, India
Shaid Sheel Medical Technical College, Al-Farahidi University, Baghdad, Iraq
Tariq Hussain Sheikh Department of Computer Science, Shri Krishan Chander
Government Degree College Poonch, Jammu and Kashmir, India
Shourya Department of Artificial Intelligence and Machine Learning, Maharaja
Agrasen Institute of Technology, Delhi, India
Akshat Shukla Medi-Caps University, Indore, India
T. Shynu Department of Biomedical Engineering, Agni College of Technology,
Chennai, Tamil Nadu, India
S. Silvia Priscila Bharath Institute of Higher Education and Research, Chennai,
Tamil Nadu, India
Anup Singh Kalinga Institute of Industrial Technology, Deemed to Be University,
Bhubaneswar, India
Devansh Singh Department of Computer Science and Engineering, Maharaja
Agrasen Institute of Technology, GGSIPU, Delhi, India
Devanshi Singh Delhi Technological University, New Delhi, India
Jaspreeti Singh USICT, Guru Gobind Singh Indraprastha University, New Delhi,
India
Srijan Singh SRM Institute of Science and Technology, Ghaziabad, India
Urvashi Sugandh Department of Computer Science, Faculty of Mathematics and
Computing, Banasthali Vidyapith, Banasthali, India
S. Suman Rajest Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, India
Kriti Suneja Delhi Technological University, Delhi, India
Chinthakunta Swetha Department of Computer Science and Technology, Yogi
Vemana University, Kadapa, YSR District Kadapa, Andhra Pradesh, India
Sachin Taran Department of Electronics and Communication Engineering, Delhi
Technological University, Delhi, India
Rizwana Kallooravi Thandil Sullamussalam Science College, Areekode, Kerala,
India
Binu Thomas Marian College Kuttikkanam, Peermade, Idukki, Kerala, India
Siba K. Udgata AI Lab, School of Computer and Information Sciences, University
of Hyderabad, Hyderabad, India
Shubham Upadhyay School of Computing Science and Engineering, Galgotias
University, Greater Noida, Uttar Pradesh, India
Vishisht Ved Kalinga Institute of Industrial Technology, Deemed to Be University,
Bhubaneswar, India
Srijal Vedansh Kalinga Institute of Industrial Technology, Deemed to Be Univer-
sity, Bhubaneswar, India
S. Velmurugan CSIR—Central Road Research Institute, New Delhi, India
Bindu Verma Delhi Technological University, New Delhi, India
Ian Wilson University of South Wales, Cardiff, UK
Yash Yadav Department of Artificial Intelligence and Machine Learning, Maharaja
Agrasen Institute of Technology, Delhi, India
Deep Spectral Feature Representations
Via Attention-Based Neural Network
Architectures for Accented Malayalam
Speech—A Low-Resourced Language

Rizwana Kallooravi Thandil, K. P. Mohamed Basheer, and V. K. Muneer

Abstract This study presents a novel methodology for Accented Automatic Speech Recognition (AASR) in Malayalam speech, utilizing Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) architectures, both integrated with attention blocks. The authors constructed a comprehensive accented speech corpus comprising speech samples from five distinct accents of the Malayalam language. The study was conducted in four phases, each exploring a different combination of features and model architecture. In the first phase, the authors utilized Mel frequency cepstral coefficients (MFCC) as the feature vectorization technique, combined with an RNN, to model the accented speech data. This configuration yielded a Word Error Rate (WER) of 11.98% and a Match Error Rate (MER) of 76.03%. In the second phase, MFCC and tempogram methods were used for feature vectorization, combined with an RNN incorporating an attention mechanism, yielding a WER of 7.98% and an MER of 82.31% for the unified construction of the accented data model. In the third phase, MFCC and tempogram feature vectors along with the LSTM mechanism were employed, resulting in a WER of 8.95% and an MER of 83.64%. In the fourth phase, the same feature set was used with an LSTM with attention mechanisms, leading to a WER of 3.8% and an MER of 87.11%, the best result of the experiment. Remarkably, the LSTM with attention mechanism architecture performed well even for unknown accents when combined with the appropriate accent attributes. Evaluation using WER and MER showed a significant reduction of 10–15% when incorporating attention mechanisms

R. K. Thandil (B) · K. P. Mohamed Basheer · V. K. Muneer


Sullamussalam Science College, Areekode, Kerala, India
e-mail: ktrizwana@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_1
in both the RNN and LSTM approaches. This reduction indicates the effectiveness
of attention mechanisms in improving the accuracy of accented speech recognition
systems.

Keywords Automatic speech recognition · Human computer interface · Speech feature vectorization · Attention mechanism · Deep neural networks · Accented Malayalam speech processing

1 Introduction

This study shows that the models constructed using MFCC and tempogram features with attention mechanisms outperform the other methods used in this experiment for recognizing accented speech.
The key contributions of the paper are as follows:
1. The authors constructed an accented speech dataset for conducting this study.
2. The experiment was conducted in four phases, each with a different approach, to identify the best one.
3. The implications of spectral features for accent identification are analyzed.
4. A novel approach is proposed to minimize vanishing and exploding gradients and maximize the expectation.

2 Related Work

AASR has been a challenging task due to the variability in speech patterns, which arises from differences in pronunciation, intonation, and rhythm, among other factors. Recently, attention-based neural network architectures have emerged as promising approaches for improving accented speech recognition. Low-resourced languages with a diverse range of accents pose a significant challenge for accurate speech recognition [1].
Ajay et al. [2] proposed an attention-based convolutional neural network (CNN) architecture to extract deep spectral features for Malayalam speech recognition. The attention mechanism was used to identify the most relevant frames for recognition, and the model achieved an accuracy of 76.45% on the accented Malayalam speech dataset.
In another study by Devi et al. [3], the authors proposed a deep attention-based
neural network architecture for constructing AASR for the Malayalam language.
The model exhibited an accuracy rate of 82.07% on the accented Malayalam speech
dataset.
Similarly, Sasikumar et al. [4] proposed an attention-based LSTM architecture for accented Malayalam speech recognition. They used Mel frequency spectral coefficients (MFSC) as the input features and employed an attention mechanism to construct the ASR. The model achieved an accuracy of 80.02% on the accented Malayalam speech dataset, outperforming the baseline models.
Sandeep Kumar et al. [5] explore a new approach to model emotion recognition
from human speech. The authors propose a method that combines CNNs and tensor
neural networks (TNNs) with attention mechanisms for constructing speech emotion
recognition (SER) models. The method achieves promising results compared to
several other approaches on publicly available datasets.
Zhao et al. [6] propose a feature extraction method that uses MFCCs and their time–frequency representations as input to the neural network, which yielded a better emotion recognition model on the datasets used in their study.
Kumar and Reddy [7] combined MFCC and PLP methods for
feature extraction and employed an attention mechanism to identify the most relevant
features for recognition. The model achieved an accuracy of 83.15% on the Hindi-
accented speech dataset.
Ghosh et al. [8] used a combination of MFCC and shifted delta cepstral (SDC)
features as input and employed an attention mechanism to learn the importance of
different features for recognition. The model constructed achieved an accuracy of
89.7%, outperforming the baseline models.
In another study, Kim et al. [9] used a combination of MFCC and shifted delta
cepstral (SDC) features as input and employed an attention mechanism to learn
the importance of different features for recognition. The model proposed by them
achieved an accuracy of 80.2%, outperforming the baseline models.
Similarly, in a study, Wang et al. [10] used a combination of MFCC and gamma
tone frequency cepstral coefficients (GFCC) as input and employed an attention
mechanism to identify the most relevant frames for recognition. The model proposed
by them achieved an accuracy of 80.3%, outperforming the baseline models.
Parvathi and Rajendran [11], in their study, proposed an attention-based RNN
architecture for Tamil-accented speech recognition. They used MFSC as the input
features and employed an attention mechanism to identify the most relevant frames
for recognition.
In another study by Xiong et al. [12], the authors proposed a deep spectral feature
representation approach for Mandarin-accented speech recognition. They used a
combination of MFCC and GFCC as the input features and employed a self-attention
mechanism to identify the relevant features for recognition. Their ASR model achieved an accuracy of 79.7% on the Mandarin-accented speech dataset, outperforming the baseline models.
Prajwal et al. [13], in their work, proposed a Malayalam ASR system that
can handle diverse accents, including those spoken by non-native speakers. The
researchers collected a corpus of spoken Malayalam from native and non-native
speakers and used it to train a deep neural network-based speech recognition model.
They found that their system achieved average recognition accuracy of 86.8% on
accented speech, which was higher than the accuracy achieved by a baseline system
trained only on native speech.
Bineesh et al. [14] proposed a speaker adaptation algorithm to improve accented
speech recognition in Malayalam. The researchers used a combination of acoustic
modeling and speaker adaptation techniques to develop an accent-independent
speech recognition system. They found that their system achieved average recog-
nition accuracy of 78.5% on accented speech, which was higher than the accuracy
achieved by a baseline system that did not use speaker adaptation.

3 Methodology

AASR is a particularly challenging task for low-resource languages like Malayalam. Publicly available data for conducting the study is very scarce, and hence the authors constructed an accented dataset of multisyllabic words for this purpose. The entire experiment was conducted in four phases (Fig. 1).

3.1 Data Collection

The authors have constructed a speech corpus, recorded under natural conditions, consisting of approximately 1.17 h of accented speech. The corpus comprises individual utterances of multisyllabic words lasting between two and five seconds. Data was collected from forty speakers, twenty male and twenty female, from five districts of Kerala where people speak Malayalam with distinct accents. The speech samples were collected from native speakers ranging from five to eighty years of age.

Fig. 1 Functional block diagram of the proposed methodology

3.2 Data Preprocessing

In the present study, we employed two distinct data preprocessing techniques: the
MFCC algorithm and the tempogram approach. The MFCC algorithm was chosen
due to its effectiveness in feature vectorization, allowing us to extract meaningful
features from the speech signals. On the other hand, the tempogram approach was
utilized to specifically capture accent and rhythm-related characteristics in the speech
data. By utilizing these preprocessing methods, we aimed to optimize the data repre-
sentation for subsequent analysis and modeling. Specifically, the vectors obtained
from the MFCC algorithm and a combination of the MFCC and tempogram vectors
concatenated together were utilized in different phases of the experiment (Table 1).
In this study, a total of 40 deep spectral coefficients were extracted from the accented speech signals using the MFCC approach, each capturing specific characteristics of the signal. Initially, 13 cepstral coefficients were extracted from the accented speech data. The first and second derivatives of these coefficients, which correspond to the rate of change of the spectral values over time, were then computed, giving 39 vector components. Finally, the mean of all 39 coefficients was calculated and appended to the vector list, resulting in a final set of 40 MFCC coefficients. Tempogram features were then extracted to specifically capture accent- and rhythm-related characteristics of the speech signals, yielding 384 speech vectors.
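As a concrete illustration, the 40-coefficient assembly described above can be sketched in a few lines of numpy. The sketch assumes the 13 base coefficients are already available as a (13, T) matrix (e.g., from librosa.feature.mfcc); the function name and the finite-difference derivative estimator are illustrative, not taken from the authors' code.

```python
import numpy as np

def assemble_mfcc_features(mfcc):
    """Assemble the 40-coefficient MFCC representation of Sect. 3.2.

    `mfcc` is a (13, T) matrix of base cepstral coefficients. The
    derivatives are approximated with finite differences along the
    frame axis.
    """
    delta = np.gradient(mfcc, axis=1)        # first derivative (rate of change)
    delta2 = np.gradient(delta, axis=1)      # second derivative
    feats = np.vstack([mfcc, delta, delta2])        # 13 + 13 + 13 = 39 rows
    mean_row = feats.mean(axis=0, keepdims=True)    # mean of the 39 coefficients
    return np.vstack([feats, mean_row])             # 40 rows in total

rng = np.random.default_rng(0)
out = assemble_mfcc_features(rng.standard_normal((13, 120)))
print(out.shape)  # → (40, 120)
```

The 384 tempogram features would be concatenated with these in phases II–IV to form the 424-dimensional input.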

Table 1 District-wise data collection statistics

District      No. of audio samples
Kasaragod     1360
Kannur        1360
Kozhikode     1690
Malappuram    1360
Wayanad       1300
Total         7070

Fig. 2 Proposed RNN

3.3 Accented Model Construction

Malayalam has a unique set of phonological features that can make it challenging
for speech recognition systems to accurately transcribe accented speech. Accented
speech poses a challenge to speech recognition systems since the pronunciation,
intonation, and rhythm of accented speech differ from standard speech [1, 24]. The
authors have constructed four accented speech models for the Malayalam language.
The experiment has been conducted in four different phases as part of investigating
the better approach to solving the problem. Each phase of the experiment is discussed
in detail in the coming sessions.

3.3.1 Phase 1: Unified Accented Model Construction Using RNN Architecture

RNN can be used for accented speech recognition by sequentially processing the
audio signal and capturing the temporal dependencies between the audio frames.
RNNs are particularly well suited for this task because they can handle variable-
length sequences of audio data and can learn long-term dependencies.
The 40 MFCC features are given as input to the RNN architecture shown in Fig. 2. The RNN processes the input sequences one at a time and maintains a context within the network. The output of the RNN layer is fed to a batch normalization layer to normalize the data, and then passed on to the sigmoid layer. The output vectors are concatenated and added together before being passed to the softmax layer, which predicts the target class with maximum probability.
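The frame-by-frame processing described above can be sketched as a minimal numpy forward pass. This is an illustrative simplification: the batch normalization and sigmoid stages of Fig. 2 are omitted, and the weights are random placeholders rather than trained parameters.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_predict(frames, Wx, Wh, b, Wo, bo):
    """Process feature frames one at a time, keeping a running context
    h, then classify the utterance with a softmax layer."""
    h = np.zeros(Wh.shape[0])
    for x in frames:
        h = np.tanh(Wx @ x + Wh @ h + b)   # recurrent context update
    return softmax(Wo @ h + bo)            # class probabilities

rng = np.random.default_rng(1)
H, D, C = 64, 40, 10                       # hidden size, MFCC dim, classes
probs = rnn_predict(rng.standard_normal((50, D)),
                    rng.standard_normal((H, D)) * 0.1,
                    rng.standard_normal((H, H)) * 0.1,
                    np.zeros(H),
                    rng.standard_normal((C, H)) * 0.1,
                    np.zeros(C))
print(probs.sum())  # → 1.0 (a valid probability distribution)
```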

3.3.2 Phase 2: Unified Accented Model Construction Using RNN with Attention Mechanism

The experiment in this phase was conducted using the 424 MFCC and tempogram feature vectors. We constructed the accented model using an RNN with an attention block architecture to focus on the relevant information in the accented speech. The feature input is fed into the RNN architecture and then into a dense layer. A dropout layer is included in the architecture to avoid overfitting, followed by another dense layer. The activation functions used in the network are sigmoid and ReLU. The predictions made by the softmax layer are fed into the RNN network with the attention layer. The output improved significantly with this approach (Fig. 3).

Fig. 3 Proposed RNN with attention mechanism
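One common way an attention block selects "relevant information" is dot-product attention over the per-frame hidden states: scores are softmax-normalized into weights, and the context vector is the weighted sum. The paper does not publish its exact scoring function, so the sketch below assumes a generic learned query vector.

```python
import numpy as np

def attention_pool(states, query):
    """Dot-product attention over per-frame RNN hidden states.

    states: (T, H) hidden state per frame; query: (H,) scoring vector.
    Returns the attention weights and the pooled context vector, which
    lets the classifier focus on accent-bearing frames.
    """
    scores = states @ query                 # one relevance score per frame
    w = np.exp(scores - scores.max())
    w = w / w.sum()                         # softmax-normalized weights
    return w, states.T @ w                  # weights (T,), context (H,)

rng = np.random.default_rng(2)
states = rng.standard_normal((30, 64))      # T = 30 frames, H = 64
w, context = attention_pool(states, rng.standard_normal(64))
print(w.sum(), context.shape)  # → 1.0 (64,)
```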

3.3.3 Phase 3: Unified Accented Model Construction Using LSTM Architecture

The feature input layer here contains the vectors obtained by applying the MFCC and tempogram methods to the speech signals, concatenated together; a total of 424 vectors were extracted from each speech signal in this phase. These vectors are given as input to the LSTM layers, which reduce the vanishing gradient problems of the RNN architecture. The output of this layer is normalized by a batch normalization layer. The concatenated output of these layers is then fed into the dense layers and finally to the softmax layer to make the predictions (Fig. 4).
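The vanishing-gradient mitigation comes from the LSTM's gated, additive cell-state update. A single step can be sketched in numpy as follows; the shapes follow the 424-dimensional input of this phase, while the packed weight layout and variable names are assumptions for illustration, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. The additive update of the cell state c is the
    path through time that mitigates the vanishing gradients of a
    plain RNN. W packs the four gate weight matrices row-wise."""
    z = W @ np.concatenate([x, h]) + b     # all four gates in one product
    H = h.shape[0]
    i = sigmoid(z[:H])                     # input gate
    f = sigmoid(z[H:2 * H])                # forget gate
    o = sigmoid(z[2 * H:3 * H])            # output gate
    g = np.tanh(z[3 * H:])                 # candidate cell values
    c = f * c + i * g                      # additive cell-state update
    h = o * np.tanh(c)                     # new hidden state
    return h, c

rng = np.random.default_rng(3)
Hs, D = 32, 424                            # hidden size, feature dimension
h, c = np.zeros(Hs), np.zeros(Hs)
W = rng.standard_normal((4 * Hs, D + Hs)) * 0.05
for x in rng.standard_normal((20, D)):     # run 20 feature frames through
    h, c = lstm_step(x, h, c, W, np.zeros(4 * Hs))
print(h.shape)  # → (32,)
```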

3.3.4 Phase 4: Unified Accented Model Construction Using LSTM with Attention Mechanism

Fig. 4 Proposed LSTM

The fourth phase of the experiment was conducted using the LSTM with attention block architecture. The feature vectors comprise the 424 MFCC and tempogram vectors extracted from the accented audio data. The proposed LSTM has two main branches: an operational block and a skip-connection block branch that is used to highlight only relevant activations during training. The attention block in the proposed LSTM reduces the computational resources that are wasted on irrelevant activations.
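One plausible reading of this gating of the skip-connection branch is that a sigmoid of per-frame relevance scores scales the activations, suppressing irrelevant ones so later layers concentrate on accent-bearing frames. The sketch below is hypothetical: the paper does not give the block's equations, and the function and score source here are illustrative.

```python
import numpy as np

def attention_gate(activations, scores):
    """Scale each frame's activations by a sigmoid gate in (0, 1).

    activations: (T, H) skip-branch activations; scores: (T,) relevance
    scores (e.g., from an attention scorer). Frames with low scores are
    damped toward zero, so no capacity is spent on them downstream.
    """
    gate = 1.0 / (1.0 + np.exp(-scores))   # per-frame gate in (0, 1)
    return activations * gate[:, None]     # gated skip branch

rng = np.random.default_rng(4)
acts = rng.standard_normal((20, 8))
gated = attention_gate(acts, rng.standard_normal(20))
print(gated.shape)  # → (20, 8)
```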

3.3.5 Result and Evaluation

The four phases of the study, with their different feature sets, methodologies, experimental parameters, and architectural frameworks, lead to different conclusions in the study of AASR for the Malayalam language. All experiments were conducted with similar training parameters and environmental setups. The optimizer used in phase I is Adam, and in all other phases rmsprop; the initial learning rate is 0.001 for phases I and II and 0.01 for phases III and IV. Phases I and II were set to run for 3000 epochs, while phases III and IV were set to 2000 since those models learned at a faster rate. The loss function used in all phases was categorical cross-entropy (Fig. 5; Table 2).
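For one-hot targets, the categorical cross-entropy used in all four phases is simply the negative log-probability the model assigns to the correct class, averaged over the batch. A minimal numpy version:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over the batch of -sum(target * log(prediction)).

    y_true: (N, C) one-hot targets; y_pred: (N, C) softmax outputs.
    Clipping avoids log(0) for numerically zero probabilities.
    """
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# One-hot target vs. a softmax output: loss is -log p(correct class).
y_true = np.array([[0.0, 1.0, 0.0]])
y_pred = np.array([[0.2, 0.5, 0.3]])
print(round(categorical_cross_entropy(y_true, y_pred), 4))  # → 0.6931
```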
Upon evaluating the performance of the experiments conducted in different
phases, a clear trend emerges. In Phase II, where the recurrent neural network (RNN)
was enhanced with an attention mechanism, a significant improvement in perfor-
mance is observed compared to Phase I, where only RNN was utilized. Phase II
exhibits higher accuracy rates and lower loss rates, indicating the effectiveness of
incorporating the attention mechanism.
Moving on to Phase III, which utilized Long Short-Term Memory (LSTM) for
modeling the accented data, a noticeable improvement is observed compared to Phase
II. However, the most noteworthy enhancement is observed in Phase IV, where LSTM
with an attention mechanism was employed. Phase IV demonstrates the highest
accuracy among all the experimental phases, accompanied by reduced loss rates.
The comprehensive evaluation of accuracies and loss rates throughout the entire
experiment highlights the superior performance of the accented model constructed
using LSTM with an attention block architecture. This model showcases enhanced
accuracy and lower error rates, indicating its proficiency in recognizing and capturing
the nuances of accented Malayalam speech.

Fig. 5 Proposed LSTM with attention block

Table 2 Evaluation metrics in terms of accuracy and loss

Phase      Train accuracy (%)   Validation accuracy (%)   Train loss   Validation loss   Number of epochs
Phase I    87.18                65.15                     0.0096       0.0277            3000
Phase II   92.02                72.61                     0.0074       0.0317            3000
Phase III  94.10                64.87                     0.0050       0.0309            2000
Phase IV   96.27                73.03                     0.0031       0.0291            2000

Overall, these findings emphasize the significance of incorporating attention mechanisms and LSTM architectures in the construction of accented speech recognition models. The improved performance achieved in Phase IV validates the effectiveness of LSTM with attention mechanisms in accurately recognizing and processing accented speech, leading to reduced error rates and higher accuracy.
In our research, we employed WER as a key evaluation metric to assess the effec-
tiveness of our proposed techniques for accented speech recognition. By comparing
the recognized output with the ground truth transcript, we were able to quantify the
quality of the ASR system in accurately transcribing accented speech. A lower WER
indicates a higher level of accuracy and performance in capturing the intended words
and linguistic content.
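The comparison of recognized output against the ground-truth transcript is conventionally computed via word-level edit distance: WER = (substitutions + deletions + insertions) / reference length. The self-contained implementation below follows that standard definition and is not the paper's evaluation script.

```python
def word_error_rate(reference, hypothesis):
    """Standard WER via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("too") and one deletion ("four") over 4 reference words.
print(word_error_rate("one two three four", "one too three"))  # → 0.5
```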
We meticulously computed the WER for each experimental phase in our study,
considering different combinations of feature vectorization techniques and model
architectures. Through these evaluations, we were able to observe the impact of
various factors on the recognition accuracy of accented speech. The results demon-
strated a reduction in WER as we introduced attention mechanisms and utilized deep
spectral feature representations.
The WER values obtained in our experiments provided valuable insights into the
performance and suitability of our proposed approach for accented speech recogni-
tion in the Malayalam language. These quantitative measures contribute to a compre-
hensive assessment of the system’s capability to handle variations in pronunciation,
intonation, and rhythm across different accents. Furthermore, the obtained WER
values serve as a basis for comparing our approach with existing systems and high-
light the advancements and contributions of our research in the field of accented
speech recognition.
In the context of our research on Accented Automatic Speech Recognition
(AASR) for Malayalam speech, we also considered the evaluation metric known
as Match Error Rate (MER). While Word Error Rate (WER) provides insights into
the accuracy of word-level recognition, MER offers a more comprehensive assess-
ment of the system’s performance by considering higher-level linguistic features and
semantic understanding.
Accented speech poses challenges in accurately capturing not only the individual
words but also the overall meaning and intent behind the spoken input. By incorpo-
rating MER in our evaluation, we aimed to assess the system’s ability to correctly
match the intended meaning of the accented speech, accounting for variations in
pronunciation, intonation, and rhythm.
Our study considered a range of accents in Malayalam and employed MER to eval-
uate the system’s performance in capturing the semantic understanding and overall
coherence of the spoken input. By analyzing the errors at a higher level, we gained
insights into the system’s ability to handle accent-specific variations and produce
meaningful and contextually relevant transcriptions.
The inclusion of MER in our research provided a more comprehensive assess-
ment of the AASR system’s effectiveness in recognizing and understanding accented
Malayalam speech. By considering both WER and MER, we obtained a well-rounded
evaluation that addressed both surface-level recognition accuracy and higher-level
linguistic aspects, contributing to a more thorough understanding of the system’s
capabilities in handling accented speech (Fig. 6; Table 3).

3.4 Conclusion

The authors propose a novel methodology for constructing a better model for AASR for the Malayalam language using different spectral feature combinations and architectural frameworks. The experimental results show that the LSTM with attention block architecture gave a lower WER and a higher MER compared to the other approaches. This work concludes that using an attention block with an LSTM architecture and proper feature vectors would be ideal for modeling accented speech for any low-resourced language. The novelty in extracting the accented speech

Fig. 6 Performance evaluation using WER and MER

Table 3 Comparison with existing research

References        Year   Methodology   Accuracy   SER     WER      PER      MER
[15]              2021   Att-DNN       –          –       7.52%    –        –
[16]              2020   CNN-LSTM      –          –       18.45%   42.25%   –
[17]              2019   DNN           –          –       6.94%    26.68%   –
[18]              2021   Bi-Att-RNN    –          –       10.18%   –        –
[19]              2020   DNN           –          –       16.6%    –        –
[20]              2020   Att-LSTM      –          –       12.56%   –        –
[21]              2019   Att-LSTM      –          –       7.94%    –        –
[22]              2019   DNN           –          3.93%   13.21%   –        –
[23]              2020   ML            93.6%      –       –        –        –
Proposed method   –      RNN           87.18%     –       11.98%   –        21.025%
                  –      Att-RNN       92.02%     –       7.98%    –        18.23%
                  –      LSTM          94.10%     –       8.95%    –        18.17%
                  –      Att-LSTM      96.27%     –       7.26%    –        17.21%

features also contributed to the better construction of the accented model. The model constructed here also worked well when tested with unknown accents. The dataset built for this study contains male and female voices of different age groups; hence, variations in prosodic values based on gender and age are well represented in the feature vectors.
Malayalam has a rich variety of accents that still need to be considered when constructing AASRs. The unavailability of a benchmark dataset for research in this area leaves a huge gap and makes such studies challenging, so the authors will initiate the construction of an accented dataset and make it publicly available for various studies. In the future, we would propose

better approaches for constructing unified accented models that would recognize
all accents in the language that can be adopted for modeling other low-resourced
languages.

References

1. Thandil RK, Mohamed Basheer KP (2023) Exploring deep spectral and temporal feature
representations with attention-based neural network architectures for accented Malayalam
speech—A low-resourced language. Eur Chem Bull 12(Special Issue 5):4786–4795. https://doi.org/10.48047/ecb/2023.12.si5a.0388. https://www.eurchembull.com/uploads/paper/a41a80a80b4fb50e88445aef896102a6.pdf
2. Ajay M, Sasikumar S, Soman KP (2020) Attention-based deep learning architecture for
accented Malayalam speech recognition. In: 2020 11th International conference on computing,
communication and networking technologies (ICCCNT). IEEE, pp 1–6
3. Devi SR, Bhat R, Pai RM (2021) Deep attention-based neural network architecture for accented
Malayalam speech recognition. In: 2021 IEEE 11th annual computing and communication
workshop and conference (CCWC). IEEE, pp 0277–0281
4. Sasikumar S, Ajay M, Soman KP (2021) Attention-based LSTM architecture for accented
Malayalam speech recognition. In: 2021 IEEE 11th annual computing and communication
workshop and conference (CCWC). IEEE, pp 0369–0373
5. Pandey SK, Shekhawat HS, Prasanna SRM (2022) Attention gated tensor neural network archi-
tectures for speech emotion recognition. Biomed Signal Process Control 71(Part A):103173.
https://doi.org/10.1016/j.bspc.2021.103173. ISSN 1746-8094
6. Zhao Z et al (2019) Exploring deep spectrum representations via attention-based recurrent and
convolutional neural networks for speech emotion recognition. IEEE Access 7:97515–97525.
https://doi.org/10.1109/ACCESS.2019.2928625
7. Kumar A, Reddy VV (2020) Deep attention-based neural network architecture for Hindi
accented speech recognition. In: 2020 11th international conference on computing, communi-
cation and networking technologies (ICCCNT). IEEE, pp 1–6
8. Ghosh P, Das PK, Basu S (2020) Deep attention-based neural network architecture for Bengali
accented speech recognition. In: Proceedings of the 5th international conference on intelligent
computing and control systems. Springer, pp 764–769
9. Kim D, Lee D, Shin J (2019) Attention-based deep neural network for Korean accented speech
recognition. J Inf Sci Eng 35(6):1387–1403
10. Wang C, Lu L, Wu Z (2019) Deep attention-based neural network for Mandarin accented
speech recognition. In: 2019 IEEE international conference on acoustics, speech and signal
processing (ICASSP). IEEE, pp 7060–7064
11. Parvathi PS, Rajendran S (2020) Attention-based RNN architecture for Tamil accented
speech recognition. In: 2020 international conference on smart electronics and communication
(ICOSEC). IEEE, pp 341–346
12. Xiong Y, Huang W, He Y (2020) Deep spectral feature representations via self-attention based
neural network architectures for Mandarin accented speech recognition. J Signal Process Syst
92(11):1427–1436
13. Prajwal KR, Mukherjee A, Sharma D (2019) Malayalam speech recognition using deep
neural networks for non-native accents. In: Proceedings of the 4th international conference
on intelligent human computer interaction. Springer, pp 191–201
14. Bineesh PV, Vijayakumar C, Rajan S (2020) Speaker adaptation for accented speech recognition
in Malayalam using DNN-HMM. In: Proceedings of the 12th international conference on
advances in computing, communications and informatics. IEEE, pp 1373–1380
15. Goodfellow I, Bengio Y, Courville A (2016) Deep learning, vol 1. MIT Press

16. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
17. Liu Y, Xu B, Xu C (2021) Accented speech recognition based on attention mechanism and
deep neural networks. Appl Sci 11(3):1238
18. Li X, Li H, Li Y, Li Y (2020) Accented speech recognition with deep learning models: a
comparative study. IEEE Access 8:98252–98261
19. Bhatia A, Sharma V (2019) Accent robust speech recognition using spectral features and deep
neural networks. J Intell Syst 28(2):271–283
20. Duong NQ, Nguyen TH (2021) Speech recognition for Vietnamese accented speech using
bidirectional attention based recurrent neural network. In: Proceedings of the 14th international
conference on knowledge and systems engineering, pp 159–167
21. Geetha PR, Balasubramanian R (2020) Attention based speech recognition for Indian accented
English. In: Proceedings of the international conference on computer communication and
informatics, pp 1–6
22. Luong M, Nguyen H, Nguyen T, Pham D (2020) Speech recognition for Vietnamese
accented speech using attention-based long short-term memory neural networks. J Sci Technol
58(6):139–151
23. Farahmandian M, Hadianfard MJ, Tahmasebi N (2019) Persian accented speech recogni-
tion using an attention-based long short-term memory network. J Electr Comput Eng Innov
7(2):105–112
24. Thandil RK, Mohamed Basheer KP, Muneer VK (2023) A multi-feature analysis of accented
multisyllabic Malayalam words—A low-resourced language. In: Chinara S, Tripathy AK, Li
KC, Sahoo JP, Mishra AK (eds) Advances in distributed computing and machine learning.
Lecture notes in networks and systems, vol 660. Springer, Singapore. https://doi.org/10.1007/978-981-99-1203-2_21
Improving Tree-Based Convolutional
Neural Network Model for Image
Classification

Saba Raees and Parul Agarwal

Abstract In recent years, convolutional neural networks (CNNs) have shown
remarkable success in image classification tasks. However, the computational
complexity of these networks increases significantly as the number of layers and
neurons grows, making them computationally expensive and challenging to deploy
on resource-limited devices. In this paper, we propose a novel CNN architecture
based on a tree data structure to address the computational complexity of standard
CNNs. The proposed model has each node in the tree representing a convolution oper-
ation that extracts spatial information from the input data. The primary objective of
our work is to design a computationally efficient model that delivers competitive
performance on the CIFAR-10 dataset. Our proposed model achieved a test accu-
racy of 81% on the CIFAR-10 dataset, which is comparable to previous work. The
model’s training time is also significantly lower than the standard CNNs, and it
uses fewer parameters, making it easier to deploy on resource-limited devices. Our
work offers a promising direction for designing efficient and effective deep neural
networks for image classification tasks. The proposed CNN architecture based on a
tree data structure provides a novel approach to address the computational complexity
of standard CNNs while maintaining competitive performance levels. Additionally,
our work improves upon previous tiny models by addressing their shortcomings and
achieving comparable performance levels while being more efficient. Our proposed
model is suitable for deployment on resource-limited devices, such as mobile devices
and edge computing devices.

Keywords Convolutional neural networks · Tree data structure · Deep learning · Image classification · CIFAR-10

S. Raees · P. Agarwal (B)
Department of Computer Science and Engineering, School of Engineering Sciences and
Technology, Jamia Hamdard University, New Delhi, India
e-mail: pagarwal@jamiahamdard.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 15
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_2

1 Introduction

Deep learning [1] models that fall within the category of convolutional neural
networks (CNNs) [2] are frequently employed for image and video processing
applications. The main principle of CNNs is to exploit the spatial structure of
images by applying a number of convolutional filters to the input image.
A set of convolutional filters, which are compact matrices that move across the
input image to extract local features, make up a CNN’s first layer. Each filter generates
a feature map, which identifies the presence of a specific pattern in the image. In the
second layer, a pooling [3] procedure is used to down-sample the feature maps by
taking the highest or average value within a narrow window. In order to extract
progressively complicated characteristics from the input image, the convolutional
and pooling layers are often stacked multiple times. Fully connected layers make up
the last layers of a CNN; they combine the learned information to produce a final
prediction.
The capacity of CNNs to automatically learn feature representations from raw
image data without the need for manual feature engineering is one of their main
strengths. CNNs are therefore quite effective for applications like object detection,
image segmentation and image classification. The capacity of CNNs to make extensive
use of data for training is another crucial aspect. From comparatively tiny datasets,
CNNs may acquire robust and generalizable features by utilizing methods like data
augmentation and transfer learning [4].
Given that our CNN image classification model is built on tree data structures,
here is a quick overview of tree-based data structures [5]. A group of nodes forms a
tree. The edges that connect these nodes to one another form a hierarchical frame-
work (without looping). When we wish to lower the processing cost and memory
utilization, trees are preferred. General tree, binary tree [6], binary search tree [7],
AVL tree, red–black tree [8], spanning tree [9] and B-tree are some of the several
types of trees in the data structure based on their properties. Trees are frequently used
in database indexing [10], dictionary implementation, quick pattern searching [11]
and shortest-distance calculations. Data may be searched and sorted quickly using
binary trees. Unlike arrays, linked lists, stacks and queues, which are linear data
structures, trees are nonlinear. A tree is a structure made up of a root and its children.
Trees are a wonderful modelling tool because they make use of the hierarchical
unidirectional links between the data. Many real-world structures can be represented
as trees, such as the organizational hierarchy of a company where individual contrib-
utors report to a team leader, who reports to higher management and so on up to
the CEO. This hierarchical structure can be visualized as a tree-like structure, as
illustrated in Fig. 1.
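The hierarchical, loop-free structure described above can be sketched with a minimal n-ary tree node in Python (an illustrative example, not code from the paper; the company hierarchy of Fig. 1 serves as sample data):

```python
# Illustrative sketch (not from the paper): a minimal n-ary tree node with
# unidirectional parent-to-child links and no loops.
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []          # hierarchical links to child nodes

    def add_child(self, node):
        self.children.append(node)
        return node

    def size(self):
        """Number of nodes in the subtree rooted at this node."""
        return 1 + sum(child.size() for child in self.children)

# The company hierarchy of Fig. 1 as sample data
ceo = TreeNode("CEO")
manager = ceo.add_child(TreeNode("Manager"))
lead = manager.add_child(TreeNode("Team Leader"))
lead.add_child(TreeNode("Contributor A"))
lead.add_child(TreeNode("Contributor B"))
```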
Convolutional neural networks (CNNs) frequently employ the global average
pooling (GAP) [12] method to shrink the spatial dimensions of feature maps and
obtain a global representation of the input image. While GAP provides a lot of advan-
tages, such as lowering the number of network parameters and enhancing computing
efficiency, it also has a big disadvantage: information loss.

Fig. 1 A dummy hierarchical structure of a company

In order to generate a
single value for each feature channel while utilizing GAP, the feature maps are aver-
aged along their spatial dimensions. As a result, just the channel-wise statistics are
kept and the spatial information present in the feature maps is ignored. Important
spatial features that are necessary for some activities, such as object localization and
segmentation, may be lost as a result. Furthermore, because GAP averages the feature
maps, it is less sensitive to subtle variations between related objects or regions in
an image. This may result in reduced accuracy while doing tasks like fine-grained
image classification or style transfer, which depend on minute changes. Additionally,
GAP assumes that all elements in a particular channel are of identical importance,
regardless of their spatial position. For applications like object detection and
segmentation, where spatial information is essential, this can be troublesome.
Although GAP offers several advantages, it should be utilized with caution in CNNs
and alternative techniques should be taken into account for tasks requiring spatial
information. Figure 2 illustrates the process of global average pooling.
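The GAP operation described above reduces an (H, W, C) feature map to a length-C vector by averaging over the spatial axes. A minimal NumPy sketch (illustrative, not the paper's code) also shows the information loss: two spatially different maps can pool to identical values.

```python
import numpy as np

# Illustrative sketch of GAP: average each channel over its spatial dimensions,
# so an (H, W, C) map becomes a length-C vector of channel statistics.
def global_average_pooling(feature_maps):
    return feature_maps.mean(axis=(0, 1))

fmap = np.arange(12, dtype=float).reshape(2, 2, 3)   # a tiny (2, 2, 3) feature map
pooled = global_average_pooling(fmap)                # shape (3,)

# A spatially flipped map pools to exactly the same vector, because only
# channel-wise statistics survive.
flipped = fmap[::-1, ::-1, :]
```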
Convolutional neural networks (CNNs) frequently employ the popular max
pooling technique to minimize the spatial dimensions of feature maps and derive
a summary of the most important information. The maximum value found in each
window is preserved as the output when using max pooling, which slides a window
over the feature map. Max pooling [13] provides several benefits, such as enhancing
translation invariance and lowering overfitting, but it also has some disadvantages.
The fact that max pooling discards the non-maximum values within each window
is one of its key drawbacks.

Fig. 2 Process of global average pooling

As a result, it may cause information loss, especially
for tasks like object detection and segmentation where fine-grained spatial informa-
tion is crucial. Additionally, because it keeps only the maximum value in each
window, max pooling is insensitive to small fluctuations in the feature maps, which
might impair the model's accuracy on tasks that depend on such subtle variations.
Convolutional layers with strides [14], on
the other hand, are a substitute method for shrinking the spatial dimensions of feature
maps without sacrificing information. This method results in a smaller output feature
map since the convolutional filters go over the input with a wider stride than usual,
thus skipping certain places. This method, which preserves all the data contained in
the feature map as opposed to max pooling, may be more effective at maintaining
spatial details.
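The max pooling mechanics described above can be sketched in NumPy (illustrative only, not the paper's code): a 2 × 2 window with stride 2 keeps one value per window and discards the other three.

```python
import numpy as np

# Illustrative sketch: 2x2 max pooling with stride 2. Each window keeps its
# maximum and discards the remaining three values.
def max_pool_2x2(x):
    h, w = x.shape
    windows = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return windows.max(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [4., 5., 6., 7.]])
pooled = max_pool_2x2(x)   # (4, 4) -> (2, 2); 12 of the 16 values are discarded
```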

1.1 Contribution of the Research Work

• A novel tree data structure-based convolutional neural network for image classification on the CIFAR-10 dataset. The proposed architecture's performance metrics are compared with those of other existing models.
• The proposed architecture focuses on reducing the loss of information during the propagation of signals or feature maps through the network. The proposed architecture also introduces stability in the training of the network.

The rest of the paper is organized as follows: Sect. 2 presents the literature review,
including the previous related work done by more than one author. The proposed
model, information of the dataset and the details on the technical solution are provided
in Sect. 3. Finally, the paper is concluded in Sect. 4 with future enhancements.

2 Literature Review

The subject of computer vision has been completely transformed by convolutional


neural networks (CNNs), and a variety of topologies have been suggested to enhance
their functionality. We provide a new CNN architecture based on trees in this study
and evaluate its performance in comparison with current cutting-edge models.

2.1 Previous Work

The most accurate model for the CIFAR-10 [16] dataset was the transformer-based
ViT-H/14 (2020) [17] model with 632 M parameters. In comparison with well-known
convolutional architectures, it is the first to successfully train a Transformer encoder
on ImageNet. While requiring significantly fewer computational resources to train, the
model achieves great outcomes when compared to SOTA convolutional networks. ViT-H/
14 has demonstrated good generalization performance on CIFAR-10 despite having
been initially developed for larger datasets, indicating that it might be useful for a
variety of image classification problems. The flexibility and adaptability of ViT-H/
14 allow it to be used for a variety of image classification tasks. It can be scaled
up or down to fit varied issue sizes. In some real-world applications, where faster
or more effective models may be preferred, ViT-H/14 may not be feasible due to its
high computing requirements. On the CIFAR-10 dataset, it achieved an accuracy score of 99.6%.
In classical deep convolutional neural networks, the problem of vanishing gradient
was pretty common until ResNet [18] came up with the skip connection architecture.
The ResNet Residual Network (ResNet) architecture, which was proposed for image
classification problems, has a variant called ResNet-56. With the potential for transfer
learning to other tasks, ResNet-56 is a strong and efficient architecture for image
classification tasks on the CIFAR-10 dataset. It may need a lot of processing power
to train and should be used cautiously on tiny datasets. To combat the vanishing
gradient issue that might arise in extremely deep networks, ResNet-56 makes use of
residual connections. As a result, the network’s ability to learn is enhanced during
training by allowing gradients to flow back through it. On small datasets like CIFAR-
10, overfitting is a concern when using a deep architecture. This can be mitigated
by using regularization strategies like dropout and weight decay. Another interesting
architecture is BiT-L [19]. With ResNet, one may train hundreds or even thousands
of layers while still getting outstanding results. Empirical research shows that these
networks are simpler to optimize and can attain higher accuracy with substantially
more depth. After ViT-H/14 (2020) and CaiT-M-36 U 224 (2022), this model achieved
one of the best accuracy scores on the CIFAR-10 dataset. The BiT-L model is a specialized
version of the ResNet architecture created for usage with bigger image datasets like
ImageNet. It can, however, also be applied to smaller picture datasets, such as CIFAR-
10. The ResNet design may be trained reasonably quickly because of its residual
connections, which improve gradient flow and hasten convergence. Numerous types

of visual distortions, including noise, blur and rotation, have been demonstrated to
be resistant to BiT-L (ResNet). Due to the deep architecture of this model, overfitting
may occur when applied to smaller datasets like CIFAR-10. Utilizing strategies like
weight decay or dropout can help to alleviate this. This model is computationally very
costly. On the CIFAR-10 dataset, the BiT-L (ResNet) model achieved an accuracy
score of 99.37%.
Machine learning techniques are used by neural architecture search (NAS) [20]
to automatically create neural network structures. It has resulted in considerable
increases in accuracy and efficiency and can significantly minimize the requirement
for manual model design trial and error. The EfficientNetV2 [21] model has been
created to train and run inference on image recognition problems in an efficient and
effective manner.
Using a coefficient, the EfficientNet CNN architecture and scaling method uniformly
scale all the dimensions. By using a set of predefined scaling coefficients, the Effi-
cientNet scaling method uniformly increases depth, resolution and network width.
Compared to SOTA models, EfficientNetV2 models train substantially more quickly.
While being up to 6.8 times smaller, EfficientNetV2 can learn up to 11 times faster. It
is more dependable for use in practical applications because it is made to be resilient
to several kinds of image distortions, such as noise, blur and rotation. It can be tricky
to determine which features EfficientNetV2-L is using to produce its predictions
because the network can be confusing to analyse and comprehend. EfficientNetV2-L
[21] obtained an accuracy score of 91.1% with 121 M parameters on the CIFAR-10
dataset.
Another class of architectures in convolutional neural networks is the DenseNets
[22], which significantly outperform most SOTA models while requiring less
processing power. This model is developed for image identification problems, espe-
cially with datasets like CIFAR-10 that have few training examples. Relative to other
deep neural network architectures, DenseNet has been demonstrated to be less prone to
overfitting, making it more resilient when working with sparse training data. Due to the tight
interconnectedness between the model’s layers, it may be more challenging to use on
devices with low memory capacity. Understanding and interpreting this model can
be complex, making it difficult to pinpoint the features the network is using to form
its predictions. On the CIFAR-10 dataset, the model DenseNet-BC-190 received an
accuracy score of 96.54%.
With the potential to be used for various computer vision tasks, PyramidNet [23]
is a potent deep neural network design that has demonstrated outstanding perfor-
mance on the CIFAR-10 dataset. The findings on the CIFAR-10 dataset demon-
strated that PyramidNet achieved SOTA performance with a much lower error rate
than earlier SOTA models, proving the efficacy of the pyramid structure and other
design decisions made in the architecture. PyramidNet’s pyramid structure enables
significant accuracy increases while lowering computational costs and memory use.
In comparison with previous models, PyramidNet employs a larger network, which
may enhance its capacity to identify intricate elements in the data. PyramidNet
training can be laborious and computationally costly, like with many deep neural
network architectures, especially when employing larger datasets or better-resolution

images. If the model is too intricate or there aren’t enough training data, overfitting
is a possibility.
DINOv2 [24] is a new computer vision model that uses self-supervised learning
to achieve results that match or surpass the standard approach used in the field. Self-
supervised learning is a powerful, flexible way to train AI models because it does
not require large amounts of labelled data. DINOv2 does not require fine-tuning,
providing high-performance features that can be directly used as inputs for simple
linear classifiers. The 1100 M-parameter DINOv2 model achieved 99.5% accuracy
on the validation set of CIFAR-10.
Astroformer [25] is a hybrid transformer-convolutional neural network that uses
relative attention, depth-wise convolutions and self-attention techniques. The model
employs a careful selection of augmentation and regularization strategies, with a
combination of mix-up and RandAugment for augmentation and stochastic depth
regularization, weight decay and label smoothing for regularization. The authors find
that strong augmentation techniques provide higher performance gains than stronger
regularization. The model is effective in low-data regime tasks due to the careful
selection of augmentation and regularization, great generalizability and inherent
translational equivalence. It can learn from any collection of images and can learn
features, such as depth estimation, that the current standard approach cannot. It attains
an impressive accuracy score of 99.12%.
In the past, tree data structure-based convolutional networks were based on
ternary trees for the initial layers, meaning each node in the architecture has
exactly three child nodes. This network used max pooling to downscale the
size of the feature maps during convolutions and global average pooling at the end
of the network to feed the output to the dense layers while generating fewer trainable
parameters. This tiny and promising network had as few as 1.8 M parameters
while achieving an accuracy of ~81% on the validation set of the CIFAR-10 dataset
(Table 1).

Table 1 Accuracy scores on CIFAR-10 dataset

References  Model name                   Accuracy scores
[15]        TBCNN                        81.14
[17]        ViT-H/14                     99.6
[18]        ResNet-56                    88.8
[19]        BiT-L (ResNet)               99.37
[21]        EfficientNetV2-L             91.1
[22]        DenseNet (DenseNet-BC-190)   96.54
[23]        PyramidNet                   97.14
[24]        DINOv2                       99.5
[25]        Astroformer                  99.12

2.2 Contribution

In this paper, we aim to explore the effects of popular dimensionality-reduction
techniques like 1D convolutions and letting convolutional layers learn to scale down
dimensions themselves, hence replacing the traditional techniques of global average
pooling and max pooling, respectively.

3 Methodology

3.1 Overview

The central concern of this paper is the information loss that occurs when using
pooling layers in convolutional neural networks (CNNs). The primary objective of
this research is to eliminate the need for pooling layers and instead develop a method
for reducing the dimensions of feature maps while retaining all relevant information.
This paper aims to address the critical issue of preserving spatial details in CNNs
and proposes a novel approach to achieve this objective. In this section, we provide
details about the dataset we use, the techniques we use to modify the model and
discuss the network architectural details.

3.2 Dataset

The CIFAR-10 dataset is a popular image dataset used in computer vision research.
It was created by Krizhevsky et al. [16]. The dataset comprises 60,000 images that
belong to ten different classes, including dogs, cats, horses, deer, aeroplanes, trucks,
ships, birds, frogs and automobiles. These images are split into a set of training and
testing images, with the training set containing 50,000 images and the remaining
10,000 used for testing the model’s generalization performance. The training dataset
is balanced, with 5000 images for each class. Each image in the dataset is of size
(32, 32) and has three colour channels (RGB).

3.3 1D Convolutions and Strides

Convolution operations are applied to a wide range of data: 2D convolutions on
images, where the data have pixel locations; 3D convolutions on videos [26], where in
addition to pixel location there is also a time component; and 1D convolutions, which
can be applied to a sequence such as a signal [27–29]. The goal, however, remains
the same across all of them. 1D convolutions
use kernels in a single dimension. These kernels are responsible for learning features
and patterns inside the sequence. They do so by sliding across the sequence and
taking a dot product at each position. This results in a new sequence containing
contextual information, which (with appropriate padding) is of the same size as the
sequence before.
The next concept in convolutions is the stride, the amount by which the kernel
jumps after it completes a computation on the sequence. For instance, if you have
a sequence of length 12 and a kernel of size 3, the kernel starts the operation by
aligning itself with the first three elements of the series and taking a dot product to
compute the result. To compute the next result it would, in principle, shift to the next
data point and repeat the operation, but with a stride it skips ahead by the stride value
instead. If we take the stride to be equal to 2, the kernel next aligns with the 3rd data
point of the sequence. As you may have noticed, this process results in a smaller
output sequence, and hence we achieve dimensionality reduction using strided
convolutions.
In our case, we have used 1-D convolutions just after the last 2-D convolution layer
to reduce the dimensionality of the feature maps further while retaining important
information.
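The sliding-window arithmetic above can be sketched in NumPy (an illustrative example with a placeholder kernel, not the paper's learned filters). With valid padding the output length is floor((L − k)/stride) + 1, so a length-12 sequence and a size-3 kernel give 10 outputs at stride 1 and 5 at stride 2:

```python
import numpy as np

# Illustrative sketch of a 1-D strided convolution (valid padding; the kernel
# values here are placeholders, not learned weights).
def conv1d(seq, kernel, stride=1):
    k = len(kernel)
    out_len = (len(seq) - k) // stride + 1
    return np.array([np.dot(seq[i * stride:i * stride + k], kernel)
                     for i in range(out_len)])

seq = np.arange(12, dtype=float)     # the length-12 sequence from the example
kernel = np.array([1., 0., -1.])     # a size-3 kernel
y1 = conv1d(seq, kernel, stride=1)   # length (12 - 3)//1 + 1 = 10
y2 = conv1d(seq, kernel, stride=2)   # length (12 - 3)//2 + 1 = 5
```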

3.4 Removal of Max Pooling

In recent studies investigating generative models, it has been found that models
perform better when the convolution layers are allowed to learn how to downscale
the feature maps on their own. Previously, max pooling was the norm: we choose a
patch of the desired size, the image or sequence is divided into these patches, and we
take the maximum as the input to the new feature map for each whole patch.
We follow in the footsteps of those studies and perform strided convolutions,
removing max pooling from the system entirely.
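A minimal NumPy sketch of the contrast (illustrative; the 2 × 2 kernel weights stand in for values a network would learn): a stride-2 convolution forms a weighted sum over every value in each window, whereas max pooling keeps one value and discards the rest.

```python
import numpy as np

# Illustrative sketch: downscaling by a stride-2 convolution uses all values in
# each window, while max pooling keeps only one value per window.
def strided_conv2x2(x, w):
    h, wd = x.shape
    out = np.empty((h // 2, wd // 2))
    for i in range(h // 2):
        for j in range(wd // 2):
            out[i, j] = np.sum(x[2 * i:2 * i + 2, 2 * j:2 * j + 2] * w)
    return out

x = np.array([[1., 2.],
              [3., 4.]])
w = np.full((2, 2), 0.25)            # placeholder for a learned 2x2 kernel
conv_out = strided_conv2x2(x, w)     # all four values contribute: 2.5
max_out = x.max()                    # only one value survives: 4.0
```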

3.5 Leaky ReLU

The rectified linear unit, most commonly known as ReLU, has been a very widely
used nonlinearity in neural networks, and the tree-based convolutional neural
networks use the same [30]. ReLU takes a value and returns the same value if it is
positive; otherwise it returns 0. This provides sparsity in the feature maps, which
creates a lasso-type regularization effect.
This sparsity, however, seems good at first, but as training continues, a position in
the feature map that has become 0 will not receive any gradient, and hence the
gradient dies there. That is where Leaky ReLU comes into play
[31]. It provides a small gradient for values less than zero, hence allowing
gradients to flow across the entire feature maps, which allows relative weight updates
to happen in the kernels corresponding to the locations that were zero before.
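A minimal NumPy sketch of the two activations (illustrative; alpha = 0.01 is a common default, not a value specified in the paper):

```python
import numpy as np

# Illustrative sketch: ReLU zeroes negative inputs and passes them no gradient,
# while Leaky ReLU keeps a small slope (alpha) so gradients still flow there.
def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
```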

3.6 Model Architecture

The previous model employed the traditional approach of using trees in its design,
which resulted in a unique structure. Notably, the top convolutional layers consisted
of three distinct blocks that utilized kernel sizes of 3, 5 and 7. This approach aimed to
extract diverse information from the input images. The model architecture is shown
in Fig. 3.
To scale down the feature maps generated from the convolution layers directly
above them, the model used max pooling. Another significant feature of the model
was its use of channel-wise addition of the feature maps at the output of blocks
after the top convolution layer. Overall, this design approach represents an innova-
tive attempt to extract a wide range of relevant information from input images while
optimizing performance through feature map scaling. The final layer of the model
incorporated global average pooling (GAP) layers to effectively decrease the dimen-
sionality of the feature maps before inputting them into the dense layers. As a result,
the model was relatively shallow and significantly smaller in size compared to current
state-of-the-art models, containing only 1.8 million parameters. This design approach
is an efficient means of reducing model complexity and optimizing performance in
settings where computational resources are limited.

Fig. 3 TBCNN model architecture [15]
Our proposed model follows a similar architecture with some modifications which
will result in better data retention. We apply modifications mentioned in the previous
sections to the network which results in a deeper network containing ~2.2 M param-
eters. The choice of optimizer [32] remains the same, Adam [33]. The learning rate
is set to ~0.0007, and we use a decay rate of ~0.00006 for better convergence. We also
replaced the addition layers with concatenate layers, so rather than adding feature
maps together, we concatenate them channel-wise so that information is preserved
and not lost in the process of addition [34]. Another small tweak we made is that we
added the batch normalization layer before the nonlinearity. This has been done
keeping in mind that if the nonlinearity is applied before batch normalization, the
resultant output would reflect the positive values only, whereas this ordering allows
us to incorporate the negative values resulting from the convolution operation. The
updated model architecture is shown in Fig. 4.

Fig. 4 Modified TBCNN model architecture
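The channel-wise concatenation described above can be sketched in NumPy (illustrative shapes only; the actual layer sizes in the network may differ):

```python
import numpy as np

# Illustrative sketch: addition merges two (H, W, C) feature maps into one
# C-channel map, while channel-wise concatenation keeps both maps intact.
a = np.random.rand(8, 8, 16)
b = np.random.rand(8, 8, 16)

added = a + b                              # shape (8, 8, 16): values are mixed
concat = np.concatenate([a, b], axis=-1)   # shape (8, 8, 32): both preserved
```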



4 Results and Conclusion

While the achieved validation accuracy of ~81% on the CIFAR-10 dataset with just 2.2
million parameters is impressive, there are still some limitations to consider. Firstly,
the model’s limited capacity due to the small number of parameters may not allow it
to capture all the complex features and patterns in the dataset, leading to underfitting
on the task. Secondly, the model’s performance may not generalize well to other
datasets with different characteristics. Lastly, the CIFAR-10 dataset only contains
ten classes, limiting the task scope of the model (Table 2).
In a similar training setup, the proposed model achieved an accuracy score
comparable to that of the baseline TBCNN model.
The loss and accuracy curves for the model are presented in Fig. 5. These results
suggest that the proposed model offers a promising solution for optimizing accuracy
in image classification tasks, without sacrificing model complexity or computational
efficiency.
Using the early stopping callback, we were able to stop training before the model overfit excessively, and the best weights were restored. It is noteworthy that the inclusion
of the proposed modification resulted in a reduction in the learning capacity of the
model, as evidenced by a decrease in training accuracy during the training process.
By comparison, the previous model attained a training accuracy of 87%, indicating
a higher model capacity than ours. However, the validation results obtained with
our model demonstrate less overfitting and a more stable training process. These

Table 2 Comparison of results of TBCNN and modified TBCNN model

Model                        Training accuracy (%)   Validation accuracy (%)   Parameters (M)
TBCNN                        87                      81.14                     1.8
Modified TBCNN (our model)   83                      80.57                     2.2

Fig. 5 Baseline performance metrics


Improving Tree-Based Convolutional Neural Network Model for Image … 27

outcomes offer promising evidence that our model may effectively balance learning
capacity with generalization performance.
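The early-stopping behaviour described above can be sketched independently of any framework (this mirrors, but does not reproduce, the callback used in training; the patience value and loss sequence below are illustrative):

```python
def early_stop(val_losses, patience=3):
    """Scan per-epoch validation losses; return (best_epoch, stop_epoch).

    Training stops once the loss has not improved for `patience` epochs,
    and the weights from `best_epoch` are the ones that would be restored.
    """
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # halt here, restore best weights
    return best_epoch, len(val_losses) - 1

# Loss improves until epoch 2, then degrades: stop at epoch 5, keep epoch 2.
best, stopped = early_stop([1.0, 0.8, 0.7, 0.75, 0.9, 0.95], patience=3)
```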
In future, additional modifications to the model architecture could include the
incorporation of skip connections to improve gradient flow throughout the network.
Although we opted to keep the model relatively shallow and fast for practical consid-
erations, a deeper network with a similar architecture may yield even better results.
Therefore, further exploration of model depth and complexity may be warranted in
future research endeavours.

References

1. Ramana K, Kumar MR, Sreenivasulu K, Gadekallu TR, Bhatia S, Agarwal P, Idrees SM (2022)
Early prediction of lung cancers using deep saliency capsule and pre-trained deep learning
frameworks. Front Oncol 12
2. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel
MA, Al-Amidie M, Farhan L (2021) Review of deep learning: concepts, CNN architectures,
challenges, applications, future directions. J Big Data 8:1–74
3. Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional
neural networks. In: Neural information processing systems, vol 25. https://doi.org/10.1145/
3065386
4. Han D, Liu Q, Fan W (2018) A new image classification method using CNN transfer learning
and web data augmentation. Expert Syst Appl 1(95):43–56
5. Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1:269–271
6. Guibas LJ, Sedgewick R (1978) A dichromatic framework for balanced trees. In: 19th Annual
symposium on foundations of computer science (SFCS 1978), Ann Arbor, MI, USA, pp 8–21.
https://doi.org/10.1109/SFCS.1978.3
7. Hoare CAR (1961) Algorithm 64: Quicksort. Commun ACM 4(7):321–322. https://doi.org/
10.1145/366622.366644
8. Zegour DE, Bounif L (2016) AVL and Red Black tree as a single balanced tree. 65–68. https://
doi.org/10.15224/978-1-63248-092-7-28
9. Cunha SDA (2022) Improved formulations and branch-and-cut algorithms for the angular
constrained minimum spanning tree problem. J Comb Optim. https://doi.org/10.1007/s10878-
021-00835-w
10. Saringat M, Mostafa S, Mustapha A, Hassan M (2020) A case study on B-tree database indexing
technique. https://doi.org/10.30880/jscdm.2020.01.01.004.
11. Liu L, Zhang Z (2013) Similar string search algorithm based on Trie tree. J Comput Appl
33:2375–2378. https://doi.org/10.3724/SP.J.1087.2013.02375
12. Gousia H, Shaima Q (2022) GAPCNN with HyPar: Global Average Pooling convolutional
neural network with novel NNLU activation function and HYBRID parallelism. Front Comput
Neurosci 16:1004988. https://doi.org/10.3389/fncom.2022.1004988. ISSN 1662-5188
13. Wang S-H, Satapathy SC, Anderson D, Chen S-X, Zhang Y-D (2021) Deep fractional max
pooling neural network for COVID-19 recognition. Front Public Health 9(2021):726144.
https://doi.org/10.3389/fpubh.2021.726144. ISSN 2296-2565
14. Radford A, Metz L, Chintala S (2016) Unsupervised representation learning with deep convo-
lutional generative adversarial networks. In: Proceedings of the international conference on
learning representations (ICLR)
15. Ansari AA, Raees S, Nafisur R (2022) Tree based convolutional neural networks for image
classification. https://eudl.eu/doi/10.4108/eai.24-3-2022.2318997
16. Krizhevsky A (2012) Learning multiple layers of features from tiny images. University of
Toronto

17. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2020) An image is worth 16 × 16 words: transformers for image recognition at scale
18. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016
IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA,
pp 770–778. https://doi.org/10.1109/CVPR.2016.90
19. Kolesnikov A, Beyer L, Zhai X, Puigcerver J, Yung J, Gelly S, Houlsby N (2020) Big transfer
(bit): general visual representation learning. In: Computer vision–ECCV 2020: 16th European
conference, Glasgow, UK, 23–28 Aug 2020, proceedings, Part V 16. Springer, pp 491–507
20. Zoph B, Le QV (2018) Efficient neural architecture search via parameter sharing. J Mach Learn
Res (JMLR) 19:1–45
21. Tan M, Le Q (2021) Efficientnetv2: smaller models and faster training. In: International
conference on machine learning. PMLR
22. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional
networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 4700–4708
23. Han D, Kim J, Kim J (2017) Deep pyramidal residual networks. In: Proceedings of the IEEE
conference on computer vision and pattern recognition (CVPR), pp 5927–5935
24. Oquab M, Darcet T, Moutakanni T, Vo H, Szafraniec M, Khalidov V, Fernandez P, Haziza D,
Massa F, El-Nouby A, Assran M, Ballas N, Galuba W, Howes R, Huang P-Y, Li S-W, Misra I,
Rabbat M, Sharma V, Bojanowski P (2023) DINOv2: learning robust visual features without
supervision
25. Dagli R (2023) Astroformer: More Data Might not be all you need for Classification. arXiv:
2304.05350
26. Rana S, Gaj S, Sur A, Bora PK (2016) Detection of fake 3D video using CNN. In: 2016 IEEE
18th international workshop on multimedia signal processing (MMSP), Montreal, QC, Canada,
pp 1–5. https://doi.org/10.1109/MMSP.2016.7813368
27. Kiranyaz S, Avci O, Abdeljaber O, Ince T, Gabbouj M, Inman DJ (2021) 1D convolutional
neural networks and applications: a survey. Mech Syst Signal Process 151:107398
28. Kiranyaz S, Ince T, Abdeljaber O, Avci O, Gabbouj M (2019) 1-D convolutional neural
networks for signal processing applications. In: ICASSP 2019—2019 IEEE international
conference on acoustics, speech and signal processing (ICASSP), Brighton, UK, pp 8360–8364
29. Markova M (2022) Convolutional neural networks for forex time series forecasting. AIP Conf
Proc 2459:030024. https://doi.org/10.1063/5.0083533
30. Agarap AF (2018) Deep learning using rectified linear units (ReLU). arXiv:abs/1803.08375.
n. pag
31. Xu B, Wang N, Chen T, Li Mu (2015) Empirical evaluation of rectified activations in
convolutional network
32. Shaziya H (2020) A study of the optimization algorithms in deep learning. https://doi.org/10.
1109/ICISC44355.2019.9036442
33. Kingma D, Ba J (2014) Adam: a method for stochastic optimization. In: International
conference on learning representations
34. Cengil E, Çınar A (2022) The effect of deep feature concatenation in the classification problem:
an approach on COVID-19 disease detection. Int J Imaging Syst Technol. 32(1):26–40. https://
doi.org/10.1002/ima.22659. Epub 2021 Oct 10. PMID: 34898851; PMCID: PMC8653237
Smartphone Malware Detection Based
on Enhanced Correlation-Based Feature
Selection on Permissions

Shagun, Deepak Kumar, and Anshul Arora

Abstract In the present day, smartphones are becoming increasingly ubiquitous,
with people of all ages relying on them for daily use. The number of app down-
loads continues to skyrocket, with 1.6 million apps downloaded every hour in 2022,
amounting to a staggering total of 142.6 billion downloads. Google Play outpaces
iOS with 110.1 billion downloads compared to iOS’s 32.6 billion. Given the grow-
ing threat of malware applications for Android users, it is essential to quickly and
effectively identify such apps. App permissions represent a promising approach to
malware detection, particularly for Android users. Researchers are actively explor-
ing various techniques for analyzing app permissions to enhance the accuracy of
malware detection. Overall, understanding the importance of app permissions in
identifying potentially harmful apps is a critical step in protecting smartphone users
from malware threats. In our paper, we implemented the enhanced correlation-based feature selection (ECFS) technique, which uses both feature-feature and feature-class correlation scores, i.e., ENMRS and crRelevance, to predict whether an app is malicious or non-malicious. We then evaluated the prediction accuracy with various machine learning techniques on the basis of the ECFS scores and found the highest accuracy to be 92.25% for n_1 and n_2 values of 0.9 and 0.1, respectively. This accuracy is achieved by the random forest ML technique.

Keywords ECFS · ECFS scores · Smartphones · ML techniques

1 Introduction

Over the past decade, smartphones have experienced an unprecedented rise in popu-
larity, transforming from niche gadgets to ubiquitous necessities in our modern world.
With each passing year, smartphones have become more affordable, technologically
advanced, and accessible to a wider range of users, leading to explosive growth
in adoption rates. According to industry reports, the global smartphone market is

Shagun (B) · D. Kumar · A. Arora
Delhi Technological University, Rohini, Delhi, New Delhi 110042, India
e-mail: yadavvshagun@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 29
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_3

projected to continue its upward trajectory, with an estimated 5.5 billion smartphone
users by 2025. The global smartphone market size was valued at USD 457.18 billion
in 2021 and is projected to grow from USD 484.81 billion in 2022 to USD 792.51
billion by 2029, exhibiting a CAGR of 7.3% during the forecast period.
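As a quick sanity check on the cited growth figure, the CAGR implied by those two market sizes can be recomputed:

```python
# CAGR check for the cited market forecast (2022 -> 2029, 7 years).
start, end, years = 484.81, 792.51, 7
cagr = (end / start) ** (1 / years) - 1
# cagr comes out near 0.073, i.e. ~7.3%, consistent with the cited figure.
```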
This phenomenal growth can be attributed to several factors, including the increas-
ing demand for mobile Internet access, the proliferation of social media, the rise of
e-commerce, and the integration of smartphones into various aspects of our daily
lives. From communication and entertainment to productivity and beyond, smart-
phones have become an indispensable tool for people of all ages and backgrounds.
As smartphones continue to evolve with new features and capabilities, such as aug-
mented reality, artificial intelligence, and 5G connectivity, their popularity is expected
to continue growing in the foreseeable future, shaping the way we live, work, and
connect in a rapidly changing digital landscape. The versatility of smartphones is a
key factor that contributes to their widespread appeal. They have become an all-in-
one device that seamlessly integrates various aspects of our lives into a single device.
In fact, for many people, smartphones have become the primary means of accessing
the Internet, checking emails, and staying connected with the world.
In conclusion, the adaptability of smartphones, their affordability, and the contin-
uous evolution of technology are key factors that have contributed to the widespread
popularity of smartphones. They have become indispensable companions in our mod-
ern lives, offering versatility, convenience, and accessibility that appeal to a wide
range of consumers. As technology continues to advance, smartphones are likely to
remain a dominant force in the realm of consumer electronics, shaping the way we
live, work, and connect in the digital age. When it comes to mobile operating systems,
Android has gained significant popularity in recent years, emerging as a dominant
force in the smartphone market. Android stands out as the undeniable front-runner
when considering the global usage of mobile operating systems. According to recent
findings by Statcounter, Android commands a staggering 71.45% of the worldwide
market share, while iOS trails behind with a 27.83% share. Together, these two
giants account for over 99% of the total market share, leaving scant room for other
contenders like Samsung and KaiOS, which collectively make up less than 1% of
the market. These numbers clearly highlight the indomitable dominance of Android
and iOS as the preeminent mobile operating systems that remain unrivaled in the
industry.1 The ability to customize and personalize the user experience has been a
significant draw for many Android users. Moreover, Android’s seamless integration
with Google services, such as Google Drive, Google Maps, and Google Assistant,
has also played a pivotal role in its widespread adoption. Finally, Android’s compati-
bility with a wide range of third-party devices and accessories, such as smartwatches,
smart TVs, and smart home devices, has further cemented its position as a preferred
choice for tech-savvy users who seek seamless connectivity across different devices.
Overall, Android’s flexibility, affordability, customization options, and compatibil-
ity have contributed to its growing popularity and market dominance in the realm of
mobile operating systems.

1 https://www.appmysite.com/blog/android-vs-ios-mobile-operating-system-market-share-
statistics-you-must-know/.

1.1 Motivation

Android has emerged as the primary target for malware apps due to several fac-
tors. First and foremost, Android’s widespread adoption as the most widely used
mobile operating system makes it an attractive target for cybercriminals seeking a
large user base to exploit. Additionally, the open-source nature of Android allows for
customization and flexibility, but it also means that potential vulnerabilities can be
exploited by malicious actors. The decentralized nature of the Android app ecosys-
tem, with multiple app stores and varying levels of app review processes, can also
create opportunities for malware to slip through the cracks. Furthermore, the diverse
hardware and software configurations across different Android devices can make it
challenging to implement uniform security measures. Lastly, the popularity of third-
party app stores and the availability of apps outside of the official Google Play Store
can increase the risk of downloading malware-laden apps. Collectively, these fac-
tors make Android the biggest target for malware apps, necessitating robust security
measures to safeguard users’ devices and data. During 2022, the worldwide number
of malware attacks reached 5.5 billion, an increase of two percent compared to the
preceding year. In recent years, the highest number of malware attacks was detected
in 2018, when 10.5 billion such attacks were reported across the globe.2
Malware, or malicious software, can pose various risks to Android devices. Some
potential risks of having malware on an Android device may include:
• Data Theft: Malware can be designed to steal sensitive information from your
Android devices, such as passwords, credit card numbers, and personal data.
• Unauthorized Charges: Some types of malware, such as premium-rate SMS mal-
ware, can send text messages to premium-rate numbers, resulting in unauthorized
charges on your mobile bill.
• Spread to Other Devices: Some malware can spread to other devices on the same
network or through infected apps.
• Financial Loss: Some types of malware, such as ransomware, can encrypt your
files and demand a ransom for their release.
Efforts to develop effective techniques for detecting malware in application stores
are critical due to the dynamic nature of permission usage in apps. Common issues
with permission feature-based detection methods include:
• Variability in permission usage across apps, making it challenging to establish
consistent correlations with malicious behavior.
• False positives, as benign apps may use permissions in similar ways to malicious
apps.
By leveraging innovative techniques, the proposed work aims to provide a fresh
perspective on detecting malicious apps in Android. The novelty of our proposed
work can be best described on the basis of the statistical selection procedure that we

2 https://www.statista.com/statistics/873097/malware-attacks-per-year-worldwide/.

have adopted: many existing works use only feature-class correlation, whereas we use both feature-feature and feature-class correlations via enhanced correlation-based feature selection (ECFS). The preliminary results of this work are satisfactory but require further evaluation for various values of n_1 and n_2 in future work.

1.2 Contributions

In this paper, we have used a statistical feature selection technique called enhanced
correlation-based feature selection (ECFS) which uses feature-feature correlation
scores evaluated using ENMRS and feature-class correlation scores evaluated using
crRelevance. The ECFS method was introduced by the authors of [1] for using
these correlations effectively to extract relevant feature subsets from multi-class
gene expression and other machine learning datasets. We have adapted this feature selection technique to binary-class data whose features are the different permissions requested by malicious and benign apps. The objects required by the adopted method are the malicious and non-malicious application names, and for the class parameter we define Class A for non-malicious applications and Class B for malicious applications. The following points summarize the contributions of this work.
• We extracted permissions from malicious and non-malicious applications.
• We evaluated ENMRS scores for both malicious and non-malicious applications.
• We evaluated crRelevance scores for both malicious and non-malicious applica-
tions.
• Next, we evaluated ECFS scores for different values of n_1 and n_2.
• Lastly, we evaluated the accuracy for each combination of n_1 and n_2 using various machine learning techniques.
• We concluded our paper by noting that the highest accuracy achieved is 92.25% for the combination n_1 = 0.9 and n_2 = 0.1 with the random forest technique.

2 Related Work

In this section, we shall embark on an intriguing expedition, delving into the depths
of preexisting or interconnected studies conducted in this specialized domain. There
are numerous studies in the literature that focus on detecting intrusions or anomalies
in the desktop domain [2–4]. However, since we aim to build a malware detector for
Android OS, hence, we focus on the discussion of Android malware. Some of the
Android malware detection techniques have analyzed dynamic network traffic fea-
tures such as [5–8]. Since we work on static detection, hence, we limit our discussion

mostly to static detection techniques. The authors in [1] proposed a permission-ensemble-based mechanism to detect Android malware with permission combinations. The authors in [9] introduced a permission-based malware detection system and
reimplemented Juxtapp for malware and piracy detection. Performance is evaluated
on a dataset with original, pirated, and malware-infected applications. The authors
in [10] introduced DynaMalDroid, a dynamic analysis-based framework for detect-
ing malicious Android apps. It employs system call extraction and three modules:
dynamic analysis, feature engineering, and detection. The authors in [11] developed
a new method for Android application analysis that involved using static analysis to
collect important features and passing them to a functional API deep learning model.
Li et al. [12] described a reliable Android malware classifier using factorization
machine architecture and app feature extraction. Their results showed that interac-
tions among features were critical to revealing malicious behavior patterns. Qiu et al.
[13] proposed Multiview Feature Intelligence (MFI) for detecting evolving Android
malware with similar capabilities. MFI extracts features via reverse engineering to
identify specific capabilities from known malware groups and detect new malware
with the same capability. The authors in [14] proposed a hybrid deep learning-based
malware detection method, utilizing convolutional neural networks and bidirectional
long short-term memory (BiLSTM) to accurately detect long-lasting malware. The
authors in [15] introduced a malware capability annotation (MCA) to detect security-
related functionalities of discovered malware.
The authors in [16] proposed a malware detection mechanism using transparent
artificial intelligence. This approach leverages app attributes to distinguish harmful
from harmless malware. Khalid and Hussain [17] analyzed the impact of dynamic
analysis categories and features on Android malware detection. Using filter and wrapper methods, they identified the most significant categories and listed important features within them. The authors in [18] introduced SHERLOCK, a deep learning algorithm
that uses self-supervision and the ViT model to identify malware. The authors in [19]
identified and ranked permissions commonly found in normal and malicious apps.
Li et al. [20] proposed a stealthy backdoor that is triggered when a specific app is
introduced and demonstrated the attack on common malware detectors. The authors
in [21] introduced AndroOBFS, a released obfuscated malware dataset spanning three
years (2018–2020). It consisted of 16,279 real-world malware samples across six
obfuscation categories, providing valuable temporal information. The authors in [22]
proposed AdMat, a framework that uses an adjacency matrix to classify Android apps
as images. This enables the convolutional neural network to differentiate between
benign and malicious apps. Canfora et al. [23] designed LEILA, a tool that uses
model checking to verify Java bytecode and detect Android malware families.
Yousefi-Azar et al. [24] proposed Byte2vec, which improves static malware detec-
tion by embedding semantic similarity of byte-level codes into feature and context
vectors. It allows for binary file feature representation and selection, enhancing mal-
ware detection capabilities. The authors in [25] presented Alterdroid, a dynamic anal-
ysis approach for detecting obfuscated malware components within apps. It works
by creating modified versions of the original app and observing the behavioral dif-
ferences. Eom et al. [26] used three feature selection methods to build a machine

learning-based Android malware detector, showing its effectiveness on the Malware Genome Project dataset and their own collected data. Zhang and Jin [27] proposed a
process for Android malware detection using static analysis and ensemble learning.
Dissanayake et al. [28] evaluated the K-nearest neighbor (KNN) algorithm's performance with different distance metrics and principal component analysis (PCA). Their results show improved classification accuracy and efficiency with the right distance
metric and PCA. The authors in [29] focused on detecting Android malware in APK
files by analyzing obfuscation techniques, permissions, and API calls. They high-
lighted the challenges faced by traditional antivirus software in detecting these mal-
ware variants. Amenova et al. [30] proposed a CNN-LSTM deep learning approach
for Android malware detection, achieving high accuracy through efficient feature
extraction. Mantoro et al. [31] employed dynamic analysis using the mobile security
framework to detect obfuscated malware. It showcases the effectiveness of dynamic
analysis in detecting various types of malware. The authors in [32] compared state-of-
the-art mobile malware detection methods, addressing Android malware and various
detection classifiers. It provided insights into the progress of the Android platform
and offered a clear understanding of the advancements in malware detection. The
authors in [33] proposed a framework, FAMD, for fast Android malware detection
based on a combination of multiple features. The original feature set is constructed by
extracting permissions and Dalvik opcode sequences from samples. Awais et al. [34]
introduced ANTI-ANT, a unique framework that detects and prevents malware on
mobile devices. It used three detection layers, static and dynamic analysis, and multi-
ple classifiers. Islam et al. [35] investigated the effectiveness of unigram, bigram, and
trigram with stacked generalization and found that unigram has the highest detection
rate with over 97% accuracy compared to bigram and trigram. The authors in [36–
39] have analyzed various manifest file components such as permissions, intents, and
hardware features for Android malware detection.
To the best of our knowledge, no other existing work has used the enhanced correlation-based feature selection method on permission features for Android malware detection. We explain the methodology in detail in the next section.

3 Proposed Methodology

We explain the system design in various sub-phases described below.

3.1 Datasets

Our study involved the use of two datasets, one comprising normal apps and the
other containing malicious apps. The dataset for normal apps was collected from
the Google Play Store, while the dataset for malicious apps was obtained from the

AndroZoo.3 It is important to note that our study solely focused on apps in the Google
Play Store and did not consider apps available on other platforms. The AndroZoo
website is a growing library of Android apps collected from various sources, includ-
ing the official Google Play app market. It also contains a collection of app-related
metadata aimed at facilitating research on Android devices. The library currently
contains 15,097,876 unique APKs, which have been scanned by multiple antivirus
programs to identify malicious software. Each software in the dataset has over 20
different types of metadata, including VirusTotal reports. Our dataset consisted of
111,010 applications, with 55,505 labeled as malicious and the remaining 55,505
labeled as normal.

3.2 Feature Extraction

Feature extraction is a vital process in Android malware analysis as it helps in identifying the characteristics of malware and distinguishing it from benign applications.
Android permissions are commonly used as features for building a predictive model
for Android malware analysis. Permissions related to the network, reading privacy,
receiving and sending SMS, dialing, and others are considered dangerous permis-
sions and are used to distinguish between malicious and benign applications. Hence,
we have selected permissions as the feature for experiments in this proposed work.
Android permission extraction is a crucial process used to detect potential malware
by extracting and analyzing permissions from Android apps. We followed a static
approach to extract permissions that involve decompiling the app’s APK file using
tools such as Apktool, JADX, or Androguard to extract the manifest file, which con-
tains details about the app’s rights. The permission declarations are then extracted
from the manifest file using XML parsing libraries. We had a total of 129 unique
permissions from both datasets.
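As a rough sketch of this step, assuming the manifest has already been decoded to plain XML (for example by Apktool), the uses-permission entries can be pulled out with Python's standard XML parser; the sample manifest below is invented for illustration:

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

def extract_permissions(manifest_xml: str):
    """Return the sorted permission names declared in a decoded AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    return sorted(
        elem.attrib[f"{{{ANDROID_NS}}}name"]        # namespace-qualified attribute
        for elem in root.iter("uses-permission")
    )

# Hypothetical manifest fragment for demonstration only.
sample = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""
```

Running the extractor over both datasets and taking the union of the returned names yields the binary permission-feature matrix used in the rest of the pipeline.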

3.3 Feature-Feature Correlation with ENMRS

To assess the ECFS scores in our dataset, we utilized the effective normalized mean
residue similarity (ENMRS) measure, an extension of the normalized mean residue
similarity (NMRS) measure.
While Pearson’s correlation coefficient is a commonly employed correlation mea-
sure, we selected NMRS as it exclusively focuses on detecting shifting correlation
rather than scaling correlation. However, both NMRS and Pearson’s correlation coef-
ficient are highly sensitive to atypical or noisy values, potentially leading to the
exclusion of significant features from the optimal feature subset. To overcome this
limitation, we substituted the object mean with object local means in ENMRS. These

3 https://androzoo.uni.lu.

local means are computed by averaging the element with its neighboring elements,
both to the left and right. This particular characteristic is crucial in feature-feature cor-
relation analysis when there is correlation within a subset of homogeneous objects.
In our study, we considered only a single left and right neighbor and opted for the
single neighborhood scheme in our local mean computation.
ENMRS quantifies the similarity between a pair of objects d_1 = [a_1, a_2, ..., a_n] and d_2 = [b_1, b_2, ..., b_n] and can be defined as follows:

ENMRS(d_1, d_2) = 1 - \frac{\sum_{i=1}^{n} \left| a_i - a_{lmean(i)} - b_i + b_{lmean(i)} \right|}{2 \times \max\left( \sum_{i=1}^{n} \left| a_i - a_{lmean(i)} \right|, \sum_{i=1}^{n} \left| b_i - b_{lmean(i)} \right| \right)}

where

a_{lmean(i)} = (a_{i-1} + a_i + a_{i+1})/3 if 1 < i < n,
a_{lmean(i)} = (a_i + a_{i+1})/2 if i = 1,
a_{lmean(i)} = (a_{i-1} + a_i)/2 if i = n,

and b_{lmean(i)} is defined analogously.
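A direct NumPy translation of ENMRS with the single-neighbourhood local-mean scheme might look as follows (our sketch, assuming feature vectors of length n ≥ 2):

```python
import numpy as np

def local_means(x):
    """Single-neighbourhood local mean of each element (left/right neighbours)."""
    x = np.asarray(x, dtype=float)
    lm = np.empty_like(x)
    lm[0] = (x[0] + x[1]) / 2                       # i = 1 case
    lm[-1] = (x[-2] + x[-1]) / 2                    # i = n case
    if x.size > 2:
        lm[1:-1] = (x[:-2] + x[1:-1] + x[2:]) / 3   # interior elements
    return lm

def enmrs(d1, d2):
    """ENMRS(d1, d2) in [0, 1]; 1 means identical local-mean residues."""
    r1 = np.asarray(d1, float) - local_means(d1)
    r2 = np.asarray(d2, float) - local_means(d2)
    denom = 2 * max(np.abs(r1).sum(), np.abs(r2).sum())
    if denom == 0:
        return 1.0   # both vectors locally flat: residues coincide
    return 1 - np.abs(r1 - r2).sum() / denom
```

Because only residues from local means enter the formula, a vector and its shifted copy score 1.0, which is exactly the shifting correlation that NMRS-style measures are designed to detect.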

3.4 Feature-Class Correlation Measure: crRelevance

The crRelevance measure assesses how well a feature can differentiate between
different class labels, specifically in our case, malware and normal, and provides a
value within the [0, 1] range. A class range refers to a range of values for a feature
where all objects share the same class label. This range is determined by assigning a
consecutive range of values to a feature with identical class labels. The crRelevance
measure is built upon four definitions that establish the theoretical foundation of
crRelevance. The first definition states that for a feature with values corresponding
to n objects or instances in the dataset, a class range can be defined as a range where
all objects within that range have the same class label. The second definition defines
the cardinality of a class range as the number of objects in the given range for the
feature. The third definition describes the class-cardinality of class A as the number
of objects with the class label A. The fourth definition pertains to the core class range
of class A, which represents the highest class range for class A.
crRelevance^{class}_{f_i}(A) is defined as follows:

crRelevance^{class}_{f_i}(A) = \frac{rcard(ccrange(A))}{ccard(A)}

For dataset D, the core class relevance of a feature f_i \in F can be defined as the highest crRelevance for a given class A_i. Mathematically, the crRelevance of a feature f_i, crRelevance(f_i), for a dataset with n classes A_1, A_2, ..., A_n can be defined as follows:

crRelevance(f_i) = \max_{1 \le j \le n} crRelevance^{class}_{f_i}(A_j)
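Under the definitions above, and assuming a class range is a maximal run of identical labels once the objects are ordered by the feature's values (our reading of the definition), the crRelevance of a single feature can be sketched as:

```python
from collections import Counter

def cr_relevance(values, labels):
    """crRelevance of one feature: cardinality of the core (largest) class
    range divided by that class's class-cardinality, maximised over classes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    sorted_labels = [labels[i] for i in order]     # labels ordered by value
    class_card = Counter(labels)                   # ccard(A) for each class A
    best, run_label, run_len = 0.0, None, 0
    for lab in sorted_labels:
        run_len = run_len + 1 if lab == run_label else 1
        run_label = lab
        best = max(best, run_len / class_card[lab])
    return best
```

A feature whose sorted values perfectly separate the classes scores 1.0; a feature whose sorted values alternate between classes scores close to 0.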

3.5 Proposed Feature Selection Technique: ECFS

The proposed method for feature selection utilizes ENMRS and crRelevance to cal-
culate an ECFS value for each pair of features, which falls within the range of 0–1.
The method ensures that a higher ECFS value corresponds to a stronger crRelevance
score (representing feature-class correlation) and a lower ENMRS score (represent-
ing feature-feature correlation). This is achieved by subtracting the ENMRS value
for the feature pair from 1 and adding it to the average crRelevance score. The con-
stants .n 1 and .n 2 are multiplied with the computed feature-feature and feature-class
components, respectively, to scale the range from [0, 2] to [0, 1] and control their
influence on the ECFS score.
The method selects a user-defined number of features by iteratively choosing
the next highest unprocessed feature pair that shares at least one common feature
and includes the common feature(s) in the selected subset. The resulting subset of
selected features is presented as the output.
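A rough sketch of this greedy selection loop, under our own reading of the procedure (pair scores are assumed precomputed, and pairs sharing no feature with the current subset are skipped); all names here are hypothetical:

```python
def select_features(pair_scores, k):
    """Greedily pick k features from ECFS-scored feature pairs (a sketch)."""
    # Rank feature pairs by their ECFS score, best first.
    ranked = sorted(pair_scores, key=pair_scores.get, reverse=True)
    selected = []
    for fa, fb in ranked:
        # After seeding with the best pair, only take pairs sharing
        # at least one feature with the current subset.
        if selected and fa not in selected and fb not in selected:
            continue
        for f in (fa, fb):
            if f not in selected:
                selected.append(f)
        if len(selected) >= k:
            break
    return selected[:k]

pairs = {("a", "b"): 0.9, ("c", "d"): 0.8, ("b", "c"): 0.7}
print(select_features(pairs, 3))  # ['a', 'b', 'c']
```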
ENMRS is calculated directly between each pair of features, while the crRelevance
values of the individual features are averaged to obtain the crRelevance value for the
feature pair. This ensures that a higher ECFS value, within the range of 0–1, reflects
a stronger feature-class correlation and a weaker feature-feature correlation (Table 10).
The ECFS value of a pair of features f1, f2 can be computed as follows:

    ECFS(f1, f2) = (n1 × (1 − ENMRS(f1, f2))) + (n2 × avgRelevance(f1, f2))

where n1 and n2 are constants such that n1 + n2 = 1, and

    avgRelevance(f1, f2) = (crRelevance(f1) + crRelevance(f2)) / 2

To adjust the range from [0, 2] to [0, 1] and regulate the
impact of the feature-feature and feature-class correlations on the ECFS score, the equation
multiplies the computed feature-feature and feature-class components
by the constants n1 and n2, chosen such that n1 + n2 = 1. When
both n1 and n2 are set to 0.5, the two components contribute equally to the score. If n1 is
greater than 0.5, the result has a larger contribution from ENMRS, representing
feature-feature correlation. Conversely, if n2 is greater than 0.5, the result has
a larger contribution from crRelevance, the feature-class correlation.
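The weighted combination above can be sketched in a few lines; `ecfs` and its argument names are our own, and the ENMRS and crRelevance inputs are assumed to be precomputed:

```python
def ecfs(enmrs_pair, cr_rel_1, cr_rel_2, n1=0.5, n2=0.5):
    """ECFS(f1, f2) = n1 * (1 - ENMRS(f1, f2)) + n2 * avgRelevance(f1, f2)."""
    assert abs(n1 + n2 - 1.0) < 1e-9, "weights must sum to one"
    avg_relevance = (cr_rel_1 + cr_rel_2) / 2.0
    # Low redundancy (small ENMRS) and high class relevance both raise the score.
    return n1 * (1.0 - enmrs_pair) + n2 * avg_relevance

# Equal weighting of the two components:
print(ecfs(0.2, 0.9, 0.7))  # 0.8
# Tilting the score toward the feature-feature (ENMRS) component:
print(ecfs(0.2, 0.9, 0.7, n1=0.9, n2=0.1))
```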

3.6 Machine Learning Techniques Used

We used the following machine learning techniques to evaluate the efficiency of the
ECFS scores for different values of n1 and n2.
• Decision Tree: Decision trees are easy to interpret and can handle both categorical
and numerical data.
38 Shagun et al.

• Support Vector Machine: SVM is particularly useful when the number of features
is high, and the data is not linearly separable. It can handle both linear and nonlinear
classification by using different types of kernels.
• Logistic Regression: Logistic regression is a simple yet powerful algorithm that can
handle binary and multi-class classification problems and provides interpretable
results in terms of the contribution of each feature to the prediction.
• Random Forest: Random forest is a popular algorithm that is known for its high
accuracy and robustness to overfitting. It can handle both classification and
regression problems and can provide insights into feature importance.
• K-Nearest Neighbor Classifier (KNN): KNN is a simple and intuitive algorithm
that does not make any assumptions about the underlying distribution of the data.
It can handle both classification and regression problems and can adapt to changes
in the data.
• Gaussian Naive Bayes: Gaussian Naive Bayes is a fast and efficient algorithm that
can handle high-dimensional data. It is particularly useful when the number of
features is much larger than the number of samples.
• Perceptron: Perceptron is a simple and efficient algorithm that can handle linearly
separable binary classification problems. It can converge quickly and is
computationally efficient.
• SGD Classifier: SGD Classifier is a fast and scalable algorithm that can handle
large datasets with high-dimensional features. It is particularly useful for online
learning and can adapt to changes in the data.

In conclusion, each of these machine learning algorithms has its strengths and
weaknesses, and the choice of algorithm depends on the specific problem and the
characteristics of the data. It is important to understand the underlying assumptions,
limitations, and trade-offs of each algorithm before applying it to real-world problems.
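As an illustration of this evaluation step (not the paper's actual experiment), the sketch below scores a few of the listed models from scikit-learn with 10-fold cross-validation, which matches the ten accuracy scores per model reported in Tables 1–9; the synthetic dataset stands in for the ECFS-selected feature subset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic placeholder for the ECFS-selected features of the malware dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "KNeighbors classifier": KNeighborsClassifier(),
    "Gaussian NB": GaussianNB(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # ten per-fold accuracies
    print(f"{name}: {scores.mean() * 100:.2f}%")
```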

4 Results and Discussion

As discussed in the section above, we can apply different values of n1 and n2 under
the constraint that their sum equals one. Hence, we have used different combinations
of n1 and n2, and we have summarized the results from each of the combinations in
the subsections described below.

4.1 n1 = 0.1 and n2 = 0.9

From Table 1, we conclude that for n1 = 0.1 and n2 = 0.9, the highest accuracy,
i.e., 89.21%, is obtained with random forest and the lowest accuracy, i.e., 70.16%, is
obtained by perceptron. As n1 < n2, the accuracy scores are more inclined toward
the crRelevance value of the ECFS score.

Table 1 ML accuracy results for n1 = 0.1 and n2 = 0.9

ML accuracy
n1  n2  ML model  Accuracy scores  Accuracy (%)
0.1 0.9 Decision tree 0.8762 0.8744 87.21
0.8675 0.8819
0.8788 0.8650
0.8788 0.8625
0.8681 0.8681
0.1 0.9 SVM 0.8100 0.8197 81.51
0.8125 0.8184
0.8150
0.1 0.9 Logistic 0.7881 0.8038 79.88
regression 0.8125 0.7994
0.8019 0.7838
0.8000 0.8038
0.7925 0.8031
0.1 0.9 Random forest 0.8956 0.8888 89.21
0.8988 0.8919
0.8938 0.8806
0.9038 0.8844
0.8888 0.8950
0.1 0.9 KNeighbors 0.8694 0.8669 86.78
classifier 0.8669 0.8675
0.8744 0.8575
0.8681 0.8631
0.8706 0.8738
0.1 0.9 Gaussian NB 0.6844 0.7006 69.88
0.7013 0.7044
0.6913 0.6819
0.7075 0.7131
0.7006 0.7038
0.1 0.9 Perceptron 0.7394 0.7400 70.16
0.7369 0.7463
0.7356 0.7713
0.5388 0.7550
0.5031 0.7500
0.1 0.9 SGD classifier 0.7881 0.8044 79.91
0.8131 0.7963
0.8044 0.7825
0.8006 0.8019
0.7944 0.8056

4.2 n1 = 0.2 and n2 = 0.8

From Table 2, we conclude that for n1 = 0.2 and n2 = 0.8, the highest accuracy,
i.e., 87.90%, is obtained with random forest and the lowest accuracy, i.e., 68.08%,
is obtained by perceptron. As n1 < n2, the accuracy scores are more inclined toward
the crRelevance value of the ECFS score.

4.3 n1 = 0.3 and n2 = 0.7

From Table 3, we conclude that for n1 = 0.3 and n2 = 0.7, the highest accuracy,
i.e., 88.54%, is obtained by random forest and the lowest accuracy, i.e., 70.91%, is
obtained by Gaussian NB. As n1 < n2, the accuracy scores are more inclined toward
the crRelevance value of the ECFS score.

4.4 n1 = 0.4 and n2 = 0.6

From Table 4, we conclude that for n1 = 0.4 and n2 = 0.6, the highest accuracy,
i.e., 89.91%, is obtained by random forest and the lowest accuracy, i.e., 62.14%, is
obtained by perceptron. As n1 < n2, the accuracy scores are more inclined toward
the crRelevance value of the ECFS score.

4.5 n1 = 0.5 and n2 = 0.5

From Table 5, we conclude that for n1 = 0.5 and n2 = 0.5, the highest accuracy, i.e.,
87.80%, is obtained by random forest and the lowest accuracy, i.e., 71.79%, is
obtained by Gaussian NB. As n1 = n2, the accuracy scores are balanced between the
crRelevance and ENMRS values of the ECFS score.

4.6 n1 = 0.6 and n2 = 0.4

From Table 6, we conclude that for n1 = 0.6 and n2 = 0.4, the highest accuracy,
i.e., 90.96%, is obtained by random forest and the lowest accuracy, i.e., 72.19%, is
obtained by Gaussian NB. As n1 > n2, the accuracy scores are more inclined toward
the ENMRS value of the ECFS score.

Table 2 ML accuracy results for n1 = 0.2 and n2 = 0.8

ML accuracy
n1  n2  ML model  Accuracy scores  Accuracy (%)
0.2 0.8 Decision tree 0.8581 0.8638 86.07
0.8713 0.8581
0.8650 0.8613
0.8463 0.8606
0.8488 0.8738
0.2 0.8 SVM 0.7778 0.7922 78.62
0.7850 0.7859
0.7900
0.2 0.8 Logistic 0.7694 0.7581 77.39
regression 0.7894 0.7744
0.7844 0.7688
0.7594 0.7881
0.7544 0.7931
0.2 0.8 Random forest 0.8831 0.8856 87.90
0.8925 0.8694
0.8769 0.8769
0.8669 0.8769
0.8713 0.8906
0.2 0.8 KNeighbors 0.8563 0.8631 85.58
classifier 0.8756 0.8488
0.8563 0.8400
0.8463 0.8588
0.8394 0.8738
0.2 0.8 Gaussian NB 0.7094 0.6919 70.50
0.7169 0.6931
0.7169 0.6944
0.6950 0.7131
0.6938 0.7256
0.2 0.8 Perceptron 0.6275 0.5019 68.08
0.5275 0.6813
0.7394 0.7325
0.7269 0.7625
0.7313 0.7775
0.2 0.8 SGD classifier 0.7713 0.7619 77.33
0.7888 0.7681
0.7844 0.7813
0.7575 0.7731
0.7519 0.7944

Table 3 ML accuracy results for n1 = 0.3 and n2 = 0.7

ML accuracy
n1  n2  ML model  Accuracy scores  Accuracy (%)
0.3 0.7 Decision tree 0.8656 0.8662 85.84
0.8563 0.8613
0.8631 0.8613
0.8606 0.8500
0.8456 0.8544
0.3 0.7 SVM 0.8016 0.8050 79.99
0.8000 0.7991
0.7938
0.3 0.7 Logistic 0.7594 0.7813 77.55
regression 0.7806 0.7825
0.7688 0.7769
0.7894 0.7625
0.7813 0.7725
0.3 0.7 Random forest 0.8831 0.8894 88.54
0.9013 0.8881
0.8919 0.8838
0.8819 0.8869
0.8731 0.8750
0.3 0.7 KNeighbors 0.8588 0.8662 85.94
classifier 0.8713 0.8538
0.8656 0.8588
0.8613 0.8588
0.8506 0.8494
0.3 0.7 Gaussian NB 0.6938 0.7106 70.91
0.7344 0.7150
0.7019 0.6994
0.7213 0.7069
0.7019 0.7063
0.3 0.7 Perceptron 0.7406 0.7863 74.28
0.7656 0.6881
0.7688 0.7531
0.7731 0.7525
0.6613 0.7388
0.3 0.7 SGD classifier 0.7669 0.8019 78.39
0.7925 0.7938
0.7550 0.7925
0.7925 0.7725
0.7850 0.7863

Table 4 ML accuracy results for n1 = 0.4 and n2 = 0.6

ML accuracy
n1  n2  ML model  Accuracy scores  Accuracy (%)
0.4 0.6 Decision tree 0.8856 0.8681 87.71
0.8788 0.8762
0.8738 0.8712
0.8819 0.88
0.8831 0.8725
0.4 0.6 SVM 0.8281 0.8281 82.39
0.8203 0.8153
0.8278
0.4 0.6 Logistic 0.7962 0.7938 78.82
regression 0.7844 0.7875
0.7831 0.7831
0.8006 0.775
0.7925 0.7863
0.4 0.6 Random forest 0.905 0.8994 89.91
0.9062 0.8988
0.8962 0.8906
0.8994 0.8981
0.9025 0.8944
0.4 0.6 KNeighbors 0.8731 0.8813 87.34
classifier 0.88 0.8688
0.8681 0.8763
0.8869 0.865
0.8656 0.8688
0.4 0.6 Gaussian NB 0.7219 0.7113 71.89
0.7056 0.7238
0.7106 0.7256
0.7375 0.7088
0.725 0.7188
0.4 0.6 Perceptron 0.595 0.78 62.14
0.7625 0.7644
0.4369 0.6
0.4988 0.4981
0.7794 0.4988
0.4 0.6 SGD classifier 0.8044 0.7956 79.38
0.7894 0.7663
0.7856 0.7913
0.8113 0.7863
0.7956 0.8119

Table 5 ML accuracy results for n1 = 0.5 and n2 = 0.5

ML accuracy
n1  n2  ML model  Accuracy scores  Accuracy (%)
0.5 0.5 Decision tree 0.8556 0.8431 85.21
0.8506 0.8437
0.8594 0.8681
0.8513 0.8475
0.8650 0.8363
0.5 0.5 SVM 0.7981 0.7903 79.68
0.7984 0.8034
0.7934
0.5 0.5 Logistic 0.7812 0.7750 77.54
regression 0.7750 0.7700
0.7681 0.7800
0.8006 0.7644
0.7669 0.7725
0.5 0.5 Random forest 0.8788 0.8775 87.80
0.8713 0.8575
0.8825 0.8931
0.8913 0.8781
0.8856 0.8644
0.5 0.5 KNeighbors 0.8306 0.8431 83.61
classifier 0.8263 0.8194
0.8419 0.8419
0.8513 0.8250
0.8506 0.8306
0.5 0.5 Gaussian NB 0.7188 0.7188 71.79
0.7250 0.7138
0.7194 0.7156
0.7425 0.7019
0.7131 0.7100
0.5 0.5 Perceptron 0.7325 0.5006 73.94
0.7594 0.7775
0.7719 0.7844
0.7888 0.7550
0.7663 0.7581
0.5 0.5 SGD classifier 0.7863 0.7863 77.86
0.7719 0.7688
0.7675 0.7856
0.8044 0.7669
0.7694 0.7788

Table 6 ML accuracy results for n1 = 0.6 and n2 = 0.4

ML accuracy
n1  n2  ML model  Accuracy scores  Accuracy (%)
0.6 0.4 Decision tree 0.8875 0.9 89.09
0.9075 0.8919
0.8819 0.8938
0.88 0.8831 0.89
0.8938
0.6 0.4 SVM 0.8528 0.8519 85.08
0.8422 0.8519
0.8553
0.6 0.4 Logistic 0.8256 0.8013 80.91
regression 0.8238 0.8125
0.805 0.7994
0.8069 0.8113
0.81 0.795
0.6 0.4 Random forest 0.9113 0.91 90.96
0.9231 0.915
0.8969 0.9063
0.9013 0.9069
0.9125 0.9125
0.6 0.4 KNeighbors 0.8925 0.8888 88.59
classifier 0.885 0.8931
0.8806 0.8894
0.8875 0.8794
0.8806 0.8819
0.6 0.4 Gaussian NB 0.7481 0.7013 72.19
0.7281 0.7081
0.7194 0.7081
0.745 0.7175
0.7275 0.7156
0.6 0.4 Perceptron 0.8331 0.5775 77.28
0.8156 0.81
0.7781 0.7675
0.8056 0.7763
0.7938 0.77
0.6 0.4 SGD classifier 0.83 0.8069 81.08
0.8138 0.8169
0.7919 0.8075
0.8125 0.8138
0.8238 0.7906

4.7 n1 = 0.7 and n2 = 0.3

From Table 7, we conclude that for n1 = 0.7 and n2 = 0.3, the highest accuracy,
i.e., 91.64%, is obtained by random forest and the lowest accuracy, i.e., 72.64%, is
obtained by Gaussian NB. As n1 > n2, the accuracy scores are more inclined toward
the ENMRS value of the ECFS score.

4.8 n1 = 0.8 and n2 = 0.2

From Table 8, we conclude that for n1 = 0.8 and n2 = 0.2, the highest accuracy,
i.e., 91.96%, is obtained by random forest and the lowest accuracy, i.e., 70.76%, is
obtained by perceptron. As n1 > n2, the accuracy scores are more inclined toward
the ENMRS value of the ECFS score.

4.9 n1 = 0.9 and n2 = 0.1

From Table 9, we conclude that for n1 = 0.9 and n2 = 0.1, the highest accuracy,
i.e., 92.25%, is obtained by random forest and the lowest accuracy, i.e., 71.20%, is
obtained by perceptron. As n1 > n2, the accuracy scores are more inclined toward
the ENMRS value of the ECFS score.

5 Conclusion

From Table 10, we conclude that the highest accuracy, i.e., 92.25%, was achieved
by the random forest ML technique for the values n1 = 0.9 and n2 = 0.1. As
n1 > n2, our accuracy results were more inclined toward the ENMRS values of the
ECFS scores. Table 10 also shows a pattern: the accuracy increases for higher
n1 values, i.e., with greater weight on the feature-feature correlation (ENMRS)
factor, and decreases for higher n2 values, i.e., with greater weight on the
feature-class correlation (crRelevance) scores. Thus, for a higher n1 value and a
lower n2 value in the ECFS score, the ML techniques achieve better accuracy. The
preliminary results of this work were satisfactory but call for further evaluation
over various values of n1 and n2 in future work.

Table 7 ML accuracy results for n1 = 0.7 and n2 = 0.3

ML accuracy
n1  n2  ML model  Accuracy scores  Accuracy (%)
0.7 0.3 Decision tree 0.9000 0.8981 89.50
0.8894 0.8894
0.9069 0.8969
0.8881 0.8856
0.9081 0.8875
0.7 0.3 SVM 0.8538 0.8547 85.89
0.8634 0.8588
0.8638
0.7 0.3 Logistic 0.8138 0.7944 81.48
regression 0.8038 0.8244
0.8269 0.8213
0.8206 0.8000
0.8294 0.8138
0.7 0.3 Random forest 0.9150 0.9175 91.64
0.9038 0.9113
0.9294 0.9275
0.9144 0.9056
0.9219 0.9175
0.7 0.3 KNeighbors 0.8894 0.8925 89.30
classifier 0.8813 0.8781
0.9025 0.9019
0.9025 0.8875
0.9006 0.8938
0.7 0.3 Gaussian NB 0.7244 0.7063 72.64
0.7250 0.7219
0.7413 0.7331
0.7263 0.7125
0.7375 0.7363
0.7 0.3 Perceptron 0.7425 0.7756 75.21
0.7619 0.5519
0.7975 0.8500
0.7981 0.8319
0.8656 0.5463
0.7 0.3 SGD classifier 0.8056 0.7975 81.60
0.8088 0.8281
0.8263 0.8288
0.8250 0.7994
0.8300 0.8106

Table 8 ML accuracy results for n1 = 0.8 and n2 = 0.2

ML accuracy
n1  n2  ML model  Accuracy scores  Accuracy (%)
0.8 0.2 Decision tree 0.8994 0.9156 89.975
0.8931 0.8912
0.9019 0.8969
0.9094 0.8950
0.9019 0.8931
0.8 0.2 SVM 0.8628 0.8641 86.7688
0.8663 0.8741
0.8713
0.8 0.2 Logistic 0.8088 0.8356 82.4250
regression 0.8244 0.8163
0.8294 0.8281
0.8250 0.8269
0.8319 0.8163
0.8 0.2 Random forest 0.9169 0.9150 91.9625
0.9169 0.9163
0.9288 0.9194
0.9244 0.9163
0.9225 0.9200
0.8 0.2 KNeighbors 0.8938 0.8963 89.8500
classifier 0.9000 0.8925
0.8988 0.8931
0.9100 0.8956
0.9075 0.8975
0.8 0.2 Gaussian NB 0.7150 0.7363 73.7063
0.7500 0.7219
0.7494 0.7425
0.7419 0.7363
0.7469 0.7306
0.8 0.2 Perceptron 0.5588 0.8050 70.7625
0.8113 0.7213
0.7375 0.7175
0.7988 0.8013
0.7281 0.3969
0.8 0.2 SGD classifier 0.8081 0.8381 82.2000
0.8100 0.8100
0.8363 0.8256
0.8238 0.8219
0.8306 0.8156

Table 9 ML accuracy results for n1 = 0.9 and n2 = 0.1

ML accuracy
n1  n2  ML model  Accuracy scores  Accuracy (%)
0.9 0.1 Decision tree 0.9013 0.9056 90.2938
0.9000 0.9100
0.9063 0.9069
0.9094 0.9019
0.8913 0.8969
0.9 0.1 SVM 0.8831 0.8797 87.4938
0.8747 0.8813
0.8559
0.9 0.1 Logistic 0.8188 0.8319 82.7063
regression 0.8263 0.8369
0.8269 0.8344
0.8313 0.8256
0.8200 0.8188
0.9 0.1 Random forest 0.9113 0.9256 92.2500
0.9256 0.9225
0.9319 0.9325
0.9256 0.9206
0.9194 0.9100
0.9 0.1 KNeighbors 0.8925 0.9000 90.5375
classifier 0.9156 0.9094
0.9063 0.9188
0.9019 0.9094
0.9088 0.8913
0.9 0.1 Gaussian NB 0.7388 0.7344 73.8750
0.7381 0.7538
0.7381 0.7413
0.7450 0.7281
0.7388 0.7313
0.9 0.1 Perceptron 0.8500 0.7988 71.2000
0.8388 0.5619
0.6325 0.8544
0.6206 0.8506
0.5619 0.5506
0.9 0.1 SGD classifier 0.8163 0.8256 82.2625
0.8250 0.8363
0.8200 0.8213
0.8294 0.8219
0.8175 0.8131

Table 10 Highest ML accuracy for different values of n1 and n2

n1 / n2       0.1/0.9  0.2/0.8  0.3/0.7  0.4/0.6  0.5/0.5  0.6/0.4  0.7/0.3  0.8/0.2  0.9/0.1
Accuracy (%)  89.21    87.89    88.54    89.90    87.80    90.95    91.63    91.96    92.25

Fake News Detection Using Ensemble
Learning Models

Devanshi Singh, Ahmad Habib Khan, and Shweta Meena

Abstract People are finding it simpler to find and ingest news as a result of the
information’s easy access, quick expansion, and profusion on social media and in
traditional news outlets. However, it is becoming increasingly difficult to distinguish
between true and false information, which has resulted in the proliferation of fake
news. Fake news is a term that refers to comments and journalism that intentionally
mislead readers. Additionally, the legitimacy of social media sites, where this news
is primarily shared, is at stake. These fake news stories can have significant negative
effects on society, so it is becoming increasingly important for researchers to focus
on how to identify them. In this research paper, we have compared ensemble learning
models for identifying fake news by analyzing a report’s accuracy and determining
its veracity. The paper’s objective is to use natural language processing (NLP) and
machine learning (ML) algorithms to identify false news based on the content of news
stories. Algorithms like decision trees, random forests, AdaBoost, and XGBoost
are used for the project. A web application has been developed using the
information.

Keywords Decision tree · Random forest · AdaBoost · XGBoost · Performance analysis

1 Introduction

The concept of fake news and hoaxes existed prior to the rise of online media.
However, the rapid expansion of availability of online news has made it challenging
to distinguish between genuine information and fake news. Fake news has two key
features: authenticity and objective. Authenticity means verifying the legitimacy of

D. Singh · A. H. Khan · S. Meena (B)


Delhi Technological University, Bawana Road, Rohini, New Delhi 110042, India
e-mail: shwetameena@dtu.ac.in
URL: http://dtu.ac.in/

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 53
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_4

information; fake news, in our case, comprises false information that is often
challenging to verify. The second feature of fake news is objective, or in simple
terms, intent, which means that the information is intentionally created to mislead
consumers, deceive the public into believing certain lies, and promote particular ideas
or agendas. The widespread circulation of false news can have a negative impact on
our society as a whole. Firstly, it can alter how people conceive and react to fake
news. Secondly, the abundance of fake news can undermine public trust in the media,
leading to skepticism and damaging the credibility of news sources. Thirdly, fake
news can manipulate people into accepting biased and false narratives. Politicians
and officials often manipulate and alter fake news for political purposes, influencing
consumers, and promoting their own agendas [1].
While social media titans like WhatsApp, Instagram, Facebook, and Twitter
acknowledge that their platforms are misused, they are also confronted with the
vast scope of the problem. Fake news can take various forms, including fake user
accounts containing fake content, photoshopped images, skillfully created network-based
content intended to mislead or delude a specific group of people, as well as
deliberately crafted stories that offer pseudo-scientific or cheap explanations of
unresolved issues, all of which eventually leads to the proliferation of false
information. As a result of the aforementioned characteristics, detecting fake news
poses new and challenging problems.
The work proposed here tackles this issue. We have developed a web application
for determining whether a news article is real or misleading using ensemble learning
algorithms. In the next section, we thoroughly review and discuss related works on
this matter to understand the complexities of the problem in every domain, including
how fake news can very conveniently change people's perspective. The third section
elaborates on the research methodology, where we compare the classifiers for fake
news detection; it contains a brief description of the dataset used, data preprocessing,
feature extraction, and the evaluation metrics, and also describes the algorithms and
their workings. The fourth section discusses the results obtained, and the last section
concludes our paper.
Main contributions:

• The work gives an opportunity to research new areas in fake news detection.
• The study compares ensemble-based learning methods.
• It gives a new understanding of the veracity of the news that we digest on a daily
basis.

2 Related Works

This section includes a review of fake news detection, how the problem of false news
still persists, and the effects of the spread of fake news around the world. Fake news
has become a major concern for society as it is generated for various reasons, including
commercial interests and malicious agendas. The diffusion of fake news on social media
platforms has created a double-edged sword for news consumers: not only does it
grant people accessibility, it also allows the spread of low-quality news with the
intention of misleading the public. Since false news online is spreading faster than
ever, it can not only alter people's perception of reality but can also have an adverse
effect on society. This makes the detection of fake news a vital research area, and it
has attracted significant attention in the past few years.
The definition of social media includes websites, applications, or software that
are specifically designed for creating and sharing content, social networking, open
discussions in forums, microblogging [2, 3]. Some scholars believe that fake news
may arise unintentionally, for example, due to a lack of education or inadvertent
actions, such as what occurred in the case of the Nepal Earthquake [4, 5]. During
the year 2020, a significant amount of fake news about health was circulated, posing
a threat to global health. In February 2020, the World Health Organization (WHO)
issued a cautionary statement regarding the COVID-19 pandemic, stating that it
had caused a widespread “infodemic” consisting of both accurate and inaccurate
information [1]. The prevalence of misleading news stories has been a persistent
issue, with some claiming that it played a role in influencing the outcome of the 2016
United States Presidential Election [6]. There are many methods developed to tackle
the problem of misinformation online, which has become rampant, especially with
the sharing of articles that are not based on facts. This issue has caused problems
majorly in politics and also in other areas such as sports, health, and science [7].
In response to the problem of fake news, researchers have focused on developing
algorithms for detecting false news. One approach is to use machine learning
algorithms such as decision trees, random forests, AdaBoost, and XGBoost. In one
project, the researchers focused on news articles that were marked as fake or real
and employed the decision tree algorithm [8], a form of machine learning classifier,
to aid in detecting misleading news.
In a different study, researchers used one such classifier being the random forest.
To determine the actual result of the classifier, multiple random forests [9] were
implemented, which assigned a score to each potential outcome, and the result with
the most votes was chosen. The employment of machine learning algorithms for fake
news detection shows promising results, but the challenges presented by fake news on
social media require continued research and development of more effective solutions.
In one study, a comparative analysis was conducted among SVM, Naïve Bayes, random
forest, and logistic regression classifiers for detecting fake news on different
datasets. The results were not very promising for SVM and logistic regression on
some datasets [10]. In another study, a dataset consisting of 10,000 tweets and
online posts of fake and real news concerning COVID-19 was analyzed. Machine
learning algorithms like logistic regression (LR), K-nearest
neighbor (KNN), linear support vector machine (LSVM), and stochastic gradient
descent (SGD) were used in that work [11].
As far as we know, the suggested methodology in this paper has not been applied
in any studies that have attempted to address this problem statement.
56 D. Singh et al.

3 Proposed Methodology

In our work, we examine ensemble learning models such as the decision tree algo-
rithm, the random forest algorithm (bagging), AdaBoost algorithm (boosting), and
the XGBoost algorithm (boosting) on a fake news dataset that was obtained from Kag-
gle. The models are evaluated, and the accuracy of each of the models is calculated.
Upon receiving the results, the algorithms are compared to rank their efficiency. This
section includes brief descriptions of the dataset, the data preprocessing, the feature
extraction, all the algorithms that we have evaluated, the evaluation metrics, and the
web application that we have developed. The use-case diagram briefly displays all
the processes involved in the application.

3.1 Dataset Description and Data Preprocessing

The data preprocessing module is responsible for all necessary preprocessing of the
training data collected from the primary dataset. This includes identifying any null or
missing values in the training data and performing preprocessing operations such as
tokenization, which involves breaking down a character string into smaller components
such as keywords, phrases, symbols, and words, and stemming, an NLP technique
used to reduce a word to its base form, or stem, to recognize similar words.
The dataset that we used in our work was obtained from Kaggle and was created by
Clément Bisaillon. To prepare that dataset for determining the authenticity of those
news articles, the process can be divided into several discrete steps. Initially, the data
was meticulously examined to ensure its availability and accessibility. The data was
then uploaded into CSV files. After that, it required preprocessing to improve its
quality and ensure compatibility with the machine learning program. Data preprocessing
eliminated unnecessary information such as punctuation (redundant commas,
question marks, quotes, and apostrophes) and removed irrelevant columns, missing
values, numeric text, and URLs from the text.
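As an illustration, the cleaning and normalization steps above might look like the following sketch (the exact rules, the regexes, and the crude suffix-stripping stand-in for a real stemmer such as NLTK's PorterStemmer are assumptions, not the authors' code):

```python
import re
import string

def crude_stem(token: str) -> str:
    # Crude placeholder for a real stemmer (e.g., NLTK's PorterStemmer):
    # strip a few common English suffixes.
    for suffix in ("ing", "edly", "ed", "ly", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"\d+", " ", text)                     # remove numeric text
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation
    tokens = text.split()                                # tokenization
    return " ".join(crude_stem(t) for t in tokens)       # stemming

print(preprocess("Breaking: 10,000 shares!! Read more at https://example.com"))
```

The same function would be mapped over the title and body columns of the CSV files before vectorization.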

3.2 Feature Extraction

Due to the enormous number of terms, words, and phrases in documents, text
categorization presents a challenge when working with high-dimensional data. As a
result, the learning process is subject to a significant computational load. Additionally,
the existence of duplicate and irrelevant features may adversely affect the classifier's
performance and accuracy. Therefore, feature reduction is crucial to reduce the size
of the text feature set and prevent the use of high-dimensional feature spaces. In
this work, two distinct feature selection methods—term frequency (TF) and term
frequency-inverse document frequency (TF-IDF)—were investigated. The following
provides an explanation of these methods.

Fake News Detection Using Ensemble Learning Models 57
Term Frequency (TF) Based on the frequency of words used in the documents, the
term frequency (TF) technique determines how similar the documents are to one
another. Word counts are contained in an equal-length vector that represents each
document. The vector is then normalized so that its components sum to one, and the
likelihood of the words appearing in the documents is calculated from the word
counts. In the simplest (binary) variant, a word is given a value of one if it appears
in a given document and a value of zero if it does not. As a result, each document is
represented by a group of words.
Term Frequency-Inverse Document Frequency (TF-IDF) A weighting measure
frequently used in information retrieval and natural language processing is the term
frequency-inverse document frequency (TF-IDF). It is a statistical tool used to assess
a term's importance to a document within a dataset. The relevance of a term grows
with its frequency inside a document but is balanced by the frequency of the word
across the corpus. A key quality of IDF is that it lessens the weight of very common
terms while enhancing the weight of uncommon ones. For instance, when using TF
alone, frequently occurring words like "the" and "then" may dominate the frequency
count. Employing IDF, though, lessens the significance of these words.
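The two weighting schemes can be sketched from scratch on a toy two-document corpus (an illustration only; the exact normalization choices, e.g., unsmoothed log IDF, are assumptions rather than the paper's implementation):

```python
import math
from collections import Counter

docs = [
    "the election was rigged the media lied",
    "the vaccine is safe say the doctors",
]

def tf(doc: str) -> dict:
    # Term frequency: word counts normalized so the components sum to one.
    counts = Counter(doc.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def idf(corpus: list[str]) -> dict:
    # Inverse document frequency: log(N / document frequency).
    n = len(corpus)
    vocab = {w for d in corpus for w in d.split()}
    df = {w: sum(1 for d in corpus if w in d.split()) for w in vocab}
    return {w: math.log(n / df[w]) for w in vocab}

def tfidf(doc: str, corpus: list[str]) -> dict:
    weights = idf(corpus)
    return {w: v * weights[w] for w, v in tf(doc).items()}

# "the" appears in every document, so its IDF (hence TF-IDF weight) is zero,
# while words that occur in only one document keep a positive weight.
print(tfidf(docs[0], docs))
```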

3.3 Algorithms

The classifiers used in the study are decision tree, random forest, AdaBoost, and
XGBoost. The algorithms and how they work are explained in this section.
Decision Tree Algorithm A decision tree can, in simple terms, be described as a
divide-and-conquer algorithm. In huge databases, it can be used to find characteristics
and identify patterns that are crucial for classification and predictive modeling. These
characteristics, along with their intuitive interpretation, are important in the extraction
of meaning from data. This is why decision trees have been used extensively for
predictive modeling applications for so many years and have established a solid
foundation in both machine learning and artificial intelligence.
In a decision tree, a tree-like structure resembling a flowchart is formed. The
top-most element is called the root node. An internal node signifies a feature or
attribute, an edge signifies a decision rule, and a leaf node signifies the outcome.
Based on the value of the attribute, the tree learns how to divide the data, and this
process is repeated until a final outcome is obtained; this is called recursive
partitioning. The resulting schematic diagram closely mimics human-level reasoning.
The fundamental concept of decision tree algorithms is as follows:
. For dividing the data into subsets, the best attribute is chosen. This is done with
the help of attribute selection measures (ASM).
Fig. 1 Decision tree structure

Fig. 2 Decision tree algorithm

. To divide the data into smaller groups, use that attribute as a decision node.
. Recursively carry out this procedure for each child node to build the tree until one
of the following conditions is satisfied: There are no more attributes to evaluate,
no more data instances to split, or when all of the data tuples have the same value
of an attribute (Figs. 1 and 2).

Decision tree modeling has the advantage of interpretability: apart from extracting
relevant patterns, important features can also be identified. Because of this
interpretability, information about interclass relationships can be used to support
future experiments and data analysis. The use of decision tree methods is applicable
to many different fields. It can
Fig. 3 Working of random forest algorithm

be used to improve search engines, find new applications in the medical disciplines,
find data, extract text, spot data gaps in a class, and replace statistical techniques.
Numerous decision tree algorithms have been created in which both their accuracy
and cost-effectiveness differ.
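As a concrete example of an attribute selection measure (ASM), the information gain used by ID3-style trees can be computed as follows (the toy "news" attributes below are invented for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label list, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting `rows` on attribute `attr`."""
    total = entropy(labels)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in by_value.values())
    return total - remainder

rows = [
    {"source": "verified", "clickbait": "no"},
    {"source": "verified", "clickbait": "no"},
    {"source": "unknown", "clickbait": "yes"},
    {"source": "unknown", "clickbait": "yes"},
]
labels = ["real", "real", "fake", "fake"]

# "source" separates the two classes perfectly, so it gains the full 1 bit
# and would be chosen as the decision node at this step.
print(information_gain(rows, labels, "source"))  # 1.0
```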
Random Forest Algorithm Random forest is an advanced version of the decision tree
that uses multiple trees to make predictions. The ultimate forecast is based on the
majority decision of the individual trees. This method results in a low error rate, as
the trees are weakly correlated with each other. The random forest algorithm operates
through the following sequence of operations (Fig. 3).

. Random samples are chosen from the provided dataset.
. A decision tree is created for each of the selected samples, and a prediction result
is obtained from each of them.
. The predicted results are subjected to voting, where mode is used for classification
problems and mean for regression problems.
. The final prediction is determined by the prediction result which had the highest
number of votes.

The random forest algorithm addresses the limitations of decision tree algorithms
by improving accuracy and mitigating overfitting. It also eliminates the require-
ment for complicated package configurations, like those necessary for Scikit-learn.
Notable characteristics of the random forest algorithm include its heightened accu-
racy in comparison with decision trees, its ability to proficiently handle missing data,
its capacity to generate accurate predictions without requiring hyper-parameter tun-
ing, and its remedy for the overfitting issue related to decision trees. Additionally,
each tree in the random forest algorithm selects a subset of features randomly at the
splitting point [12, 13].
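A minimal sketch of the TF-IDF-plus-random-forest pipeline, using scikit-learn on an invented six-document corpus (the real experiments used the Kaggle dataset described in Sect. 3.1; texts, labels, and hyper-parameters here are placeholders):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

texts = [
    "scientists confirm study results in peer reviewed journal",
    "officials release verified economic figures today",
    "local council approves documented budget report",
    "shocking secret cure doctors don't want you to know",
    "celebrity arrested aliens conspiracy exposed shocking",
    "you won't believe this one weird trick exposed",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = real, 0 = fake

# TF-IDF vectorization followed by a 100-tree forest; random_state fixes
# the bootstrap sampling so the sketch is reproducible.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(texts, labels)
print(model.predict(["shocking secret trick exposed"]))
```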
Fig. 4 Working of AdaBoost algorithm

AdaBoost Algorithm AdaBoost, also known as "Adaptive Boosting," is a machine
learning algorithm that is used for classification and regression tasks. In this method,
numerous “weak” classifiers are combined together to form a “strong” classifier.
In each iteration, AdaBoost trains a new classifier on the dataset and then assigns
higher weights to the samples that were misclassified by the previous classifiers. This
way, subsequent classifiers focus on the difficult samples that the previous classifiers
couldn’t classify accurately (Fig. 4).
In AdaBoost, because the weights are re-assigned at each iteration, the ultimate
"strong" classifier is a weighted combination of the "weak" classifiers, where the
weight of every classifier is proportional to its accuracy. The AdaBoost algorithm
integrates all of the weak classifiers’ predictions for a new sample, weighting them
according to their significance, to create a final prediction. AdaBoost has been widely
used in real-world applications, and it is known for its simplicity and high accuracy.
However, if the number of iterations is too large, it may overfit the training data and
be sensitive to noisy data and outliers.
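One round of the re-weighting described above can be worked through numerically (the sample weights and the weak learner's mistakes are invented; the update follows the standard discrete AdaBoost formula, which the text does not spell out):

```python
import math

# Five samples, initially weighted equally; the weak learner misclassified
# samples 3 and 5 (invented for illustration).
weights = [0.2] * 5
correct = [True, True, False, True, False]

err = sum(w for w, c in zip(weights, correct) if not c)   # weighted error = 0.4
alpha = 0.5 * math.log((1 - err) / err)                   # this classifier's weight

# Up-weight misclassified samples, down-weight correct ones, then re-normalize.
weights = [w * math.exp(alpha if not c else -alpha) for w, c in zip(weights, correct)]
total = sum(weights)
weights = [w / total for w in weights]

# After normalization the two misclassified samples each carry weight 0.25,
# so the next weak learner must focus on them.
print([round(w, 3) for w in weights])  # [0.167, 0.167, 0.25, 0.167, 0.25]
```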
XGBoost Algorithm XGBoost short for “Extreme Gradient Boosting” is a carefully
designed package. It was created to be an efficient, flexible, scalable, and a portable
distributed library. XGBoost is based on the gradient boosting framework, which
forms the foundation of the algorithm. With the help of this library's parallel tree
boosting technology, a variety of data science problems can be solved quickly and
precisely. XGBoost has gained significant popularity in recent years and is now a
prominent tool in applied machine learning. It is also a popular tool in Kaggle
competitions due to its high
Fig. 5 Working of XGBoost algorithm

scalability in almost all cases; many winning solutions have used XGBoost for
training their models. In essence, XGBoost is an improved version of gradient
boosted decision trees (GBM), designed to enhance speed and performance (Fig. 5).
XGBoost features:

. Regularized Learning: By refining the learned weights, regularization reduces the
likelihood of overfitting. Models that use simple and predictive functions are given
priority by the regularized objective.
. Parallelization: The most time-consuming step in tree learning is sorting the data.
Data is kept in in-memory units termed “blocks” to lower sorting costs. Data
columns in each block are organized according to the associated feature value.
Before training, this calculation only needs to be performed once and can be used
again. Block sorting can be done separately and distributed among the CPU’s
parallel threads. Multiple CPU cores are used to train the model. The parallel
collection of data for each column allows for the parallelization of the split finding.
. Two more methods, shrinkage and column subsampling, are used for further avoid-
ing overfitting. The first method, which is shrinkage, was presented by Friedman.
It was seen that after every step of tree boosting, shrinkage scales freshly added
weights by a factor η, decreasing the impact of each tree and allowing future trees
to enhance the model. The second one is column subsampling which speeds up
the parallel algorithm’s calculations while preventing overfitting even more than
the conventional row subsampling does.
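The effect of the shrinkage factor η can be illustrated with a stripped-down boosting loop on toy one-dimensional data, where one-split stumps stand in for full trees (this sketch deliberately omits XGBoost's regularized objective and block-parallel split finding; all numbers are invented):

```python
import statistics

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 7.8, 8.1, 8.3]

def fit_stump(xs, residuals):
    """Fit a one-split regression stump minimizing squared error on residuals."""
    best = None
    for thr in xs:
        left = [r for x, r in zip(xs, residuals) if x < thr]
        right = [r for x, r in zip(xs, residuals) if x >= thr]
        if not left or not right:
            continue
        lmean, rmean = statistics.mean(left), statistics.mean(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x < thr else rmean

eta = 0.3                                   # the shrinkage factor (η in the text)
pred = [statistics.mean(ys)] * len(xs)      # F_0: the global mean
for _ in range(50):
    residuals = [y - p for y, p in zip(ys, pred)]
    stump = fit_stump(xs, residuals)
    # F_m = F_{m-1} + η · f_m: each new tree's contribution is scaled down,
    # leaving room for later trees to improve the model.
    pred = [p + eta * stump(x) for p, x in zip(pred, xs)]

mse = sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)
print(round(mse, 4))
```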
3.4 Evaluation Metrics

To evaluate the accuracy of the algorithm in detecting fake news, various evaluation
measures are used. Among these measures, the most commonly used one is the
confusion matrix, which is used to evaluate classification tasks. By defining the task
of detecting fake news as a classification problem, the measures of the confusion
matrix can be used for evaluating its performance [12]:
Precision = TP / (TP + FP)                                    (1)

Recall = TP / (TP + FN)                                       (2)

Accuracy = (TP + TN) / (TP + TN + FP + FN)                    (3)

F1 Score = 2 · (Precision · Recall) / (Precision + Recall)    (4)

where TP represents True Positive, TN represents True Negative, FP represents
False Positive, and FN represents False Negative, as given in Table 1.
These evaluation measures are commonly used in machine learning to assess the
effectiveness of a classifier from various perspectives. One of the most important
is accuracy, which indicates how closely the predicted labels match the actual labels.
Precision gives the proportion of correctly identified fake news among all news
identified as fake, which is an important issue in fake news classification. Recall, in
turn, measures sensitivity: the percentage of actual fake news correctly identified as
fake. Recall matters because fake news datasets are frequently imbalanced, and a
higher precision can be obtained simply by making fewer positive predictions. Higher
values for recall, precision, and accuracy indicate better performance [14].

Table 1 Parameters of evaluation metrics

Parameters Description
True Positive (TP) Predicted positive, and the prediction is correct
True Negative (TN) Predicted negative, and the prediction is correct
False Positive (FP) Predicted positive, but the prediction is incorrect
False Negative (FN) Predicted negative, but the prediction is incorrect
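Equations (1)-(4) can be checked on a hypothetical confusion matrix (the counts below are invented for illustration, not results from the paper):

```python
# Hypothetical confusion-matrix counts for a two-class (fake/real) problem.
TP, TN, FP, FN = 4330, 4660, 120, 90

precision = TP / (TP + FP)                       # Eq. (1)
recall = TP / (TP + FN)                          # Eq. (2)
accuracy = (TP + TN) / (TP + TN + FP + FN)       # Eq. (3)
f1 = 2 * precision * recall / (precision + recall)  # Eq. (4)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"accuracy={accuracy:.3f} f1={f1:.3f}")
```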
Fig. 6 Use case diagram

3.5 Web Application

We have built a web application using Python Flask that uses the machine learning
algorithms—decision tree, random forest, AdaBoost, and XGBoost classifiers—for
classifying news as fake or real. Figure 6 shows the use-case diagram of the system.
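A minimal sketch of such a Flask service (the route name and the placeholder predict() helper are assumptions; in the real application it would wrap the trained vectorizer and classifiers rather than the keyword rule used here):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(text: str) -> str:
    """Placeholder for the trained pipeline (vectorizer + ensemble model)."""
    return "fake" if "miracle cure" in text.lower() else "real"

@app.route("/classify", methods=["POST"])
def classify():
    # Expects a JSON body like {"text": "..."} and returns the predicted label.
    text = request.get_json(force=True).get("text", "")
    return jsonify({"label": predict(text)})

if __name__ == "__main__":
    app.run(debug=True)
```

A request such as `POST /classify` with body `{"text": "..."}` would return `{"label": "fake"}` or `{"label": "real"}`.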

4 Results and Discussion

This section explains the findings of the study and the metrics used to gauge how
well the models performed. The evaluation outcomes on the fake news dataset are
illustrated using the confusion matrices of the four algorithms. The four algorithms
used for the detection are as follows:
Fig. 7 a Decision tree, b Random forest, c AdaBoost, d XGBoost

. Decision tree
. Random forest
. AdaBoost
. XGBoost.

The confusion matrices are obtained automatically in Python using the scikit-learn
library when running the algorithm code on Kaggle. Figure 7 shows the obtained
confusion matrices with their respective TP, FP, TN, and FN values.
Table 2 shows all of the outcomes of the evaluation metrics that were used to cor-
rectly categorize the fake news. The results obtained after classification displayed
the accuracy of the machine learning algorithms—decision tree, random forest,
AdaBoost, and XGBoost classifier as 93.4%, 97.2%, 94.0%, and 97.6%, respec-
tively.
All the machine learning algorithms gave promising results in the detection of
fake news, with XGBoost giving the highest accuracy of 97.6%.
Table 2 Performance comparison of classification models


Model Precision Recall F1-Score Accuracy (%)
Decision tree 0.93 0.93 0.93 93.4
Random forest 0.97 0.97 0.97 97.2
AdaBoost 0.94 0.94 0.94 94.0
XGBoost 0.98 0.98 0.98 97.6

5 Conclusion

The widespread distribution of false information on the Internet can have negative
consequences for society, as it can cause confusion and mislead readers. Machine
learning can be used to address this issue by predicting whether a news article is
genuine or fake. Although different machine learning techniques have shown some
success in distinguishing fake news from real news, one key limitation is the
constantly evolving nature of fake news, which poses a challenge for proper
classification. Additionally, acquiring large and diverse datasets that cover the vast
landscape of fake news is another constraint that remains a challenge. Some
supervised learning algorithms based on ensemble methods such as random forest
(Bagging) and XGBoost (Boosting), have been effective in detecting fake news.
However, collecting more data and continuously training the models with that data
is necessary to improve their accuracy. In the future, exploring ensemble methods
in neural networks could further enhance the performance of fake news detection
systems.
Furthermore, it is critical to take into account the ethical implications and any
biases related to automated fake news detection. Future studies should look into ways
to overcome these issues, like creating algorithms that are fairness-aware and incorpo-
rating explainable AI technologies to make the decision-making process transparent
and understandable. Advancements in machine learning and neural network-based
methods can make a substantial contribution to the creation of efficient and depend-
able systems for identifying and combatting fake news in the online ecosystem by
resolving these constraints and taking into account the future scope.

References

1. Pulido CM, Ruiz-Eugenio L, Redondo-Sama G, Villarejo-Carballido B (2020) A new
application of social impact in social media for overcoming fake news in health. Int J Environ
Res Public Health 17(7):2430
2. Economic and Social Research Council. Using Social media. Available at https://esrc.ukri.org/
research/impact-toolkit/social-media/using-social-media
3. Gil P (2019) Available at https://www.lifewire.com/what-exactly-is-twitter-2483331
4. Tandoc EC Jr et al (2017) Defining fake news: a typology of scholarly definitions. Digit J 1–17
5. Radianti J et al (2016) An overview of public concerns during the recovery period after a
major earthquake: Nepal twitter analysis. In: HICSS’16 Proceedings of the 2016 49th Hawaii
international conference on system sciences (HICSS), Washington, DC, USA. IEEE, pp 136–
145
6. Holan AD (2016) 2016 Lie of the year: fake news. Politifact, Washington, DC, USA
7. Lazer DMJ, Baum MA, Benkler Y et al (2018) The science of fake news. Science
359(6380):1094–1096
8. Kotteti CMM, Dong X, Li N, Qian L (2018) Fake news detection enhancement with data
imputation. In: 2018 IEEE 16th international conference on dependable, autonomic and secure
computing, 16th international conference on pervasive intelligence and computing, 4th inter-
national conference on big data intelligence and computing and cyber science and technology
congress (DASC/PiCom/DataCom/CyberSciTech)
9. Ni B, Guo Z, Li J, Jiang M (2020) Improving generalizability of fake news detection methods
using propensity score matching. Soc Inf Netw. https://arxiv.org/abs/2002
10. Choudhury D, Acharjee T (2023) A novel approach to fake news detection in social networks
using genetic algorithm applying machine learning classifiers. Multimed Tools Appl 82:9029–
9045. https://doi.org/10.1007/s11042-022-12788-1
11. Malhotra R, Mahur A, Achint (2022) COVID-19 fake news detection system. In: 2022 12th
International conference on cloud computing, data science & engineering (confluence), Noida,
India, pp 428–433. https://doi.org/10.1109/Confluence52989.2022.9734144
12. Jehad Ali RK, Ahmad N, Maqsood I (2019) Random forests and decision trees
13. Yousif SA, Samawi VW, Elkaban I, Zantout R (2015) Enhancement of Arabic text classification
using semantic relations of Arabic WordNet
14. Shu K, Wang S, Tang J, Liu H (2019) Fake news detection on social media: a data mining
perspective
15. Manzoor JS, Nikita (2019) Fake news detection using machine learning approaches: a sys-
tematic review. In: 2019 3rd International conference on trends in electronics and informatics
(ICOEI), pp 230–234. https://doi.org/10.1109/ICOEI.2019.8862770
Ensemble Approach for Suggestion
Mining Using Deep Recurrent
Convolutional Networks

Usama Bin Rashidullah Khan, Nadeem Akhtar, and Ehtesham Sana

Abstract The ability to extract valuable information from customer reviews and
feedback is crucial for businesses in today’s social media landscape. Many compa-
nies and businesses use social media networks to offer and deliver a range of services
to their customers as well as gather data on individual and customer opinions and
thoughts. An efficient method for automatically obtaining creative concepts and
suggestions from web sources is suggestion mining. For suggestion mining, we
present in this paper an ensemble model called DRC_Net that integrates deep neural
networks, recurrent neural networks, and convolutional neural networks. We evalu-
ated our model using the SemEval-2019 dataset containing reviews from multiple
domains. Our proposed model achieved better accuracy and F1-score than state-
of-the-art models and performed well on Subtask A and Subtask B, representing
in-domain and cross-domain validation. For Subtask A and Subtask B, the model
receives F1-scores of 0.80 and 0.87, respectively. The model’s ability to perform well
on cross-domain validation suggests that it can be applied to various domains and
datasets.

Keywords Suggestion mining · Online reviews · Ensemble deep learning

U. B. R. Khan (B)
Interdisciplinary Centre for Artificial Intelligence, Aligarh Muslim University, Aligarh 202002,
India
e-mail: usamakhan@zhcet.ac.in
N. Akhtar
Department of Computer Engineering and Interdisciplinary Centre for Artificial Intelligence,
Aligarh Muslim University, Aligarh, Uttar Pradesh 202002, India
e-mail: nadeemakhtar@zhcet.ac.in
E. Sana
Department of Computer Engineering, Aligarh Muslim University, Aligarh 202002, India
e-mail: ehteshamsana@zhcet.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 67
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_5
68 U. B. R. Khan et al.

1 Introduction

Businesses, customers, and researchers can all benefit from the knowledge provided
by opinionated text found on blogs, social networking sites, discussion forums, and
reviews. It provides insights into the quality of products and services, as well as the
experiences of customers. However, manually analysing large volumes of reviews
can be a daunting task, particularly as e-commerce continues to grow. Automated
approaches, such as natural language processing techniques, offer a promising solu-
tion to this problem. One important task in review analysis is suggestion mining,
which involves identifying suggestions or recommendations made by reviewers.
Suggestion mining has important applications in e-commerce, where businesses can
use the suggestions to improve their products and services, and consumers can make
more informed purchasing decisions and use services more effectively. Suggestion
mining involves the automatic extraction of suggestion sentences or phrases from
the online text where suggestions of interest are likely to appear [1]. These sugges-
tions can be explicit or implicit, with explicit suggestions being unambiguously
expressed in the text, and implicit suggestions requiring additional context to help
readers classify them as suggestions [2].
In this research, a new architecture for online review suggestion mining is evalu-
ated to address the challenge of analysing large volumes of reviews. The proposed
architecture DRC_Net is the combination of deep neural networks (DNN), recurrent
neural networks (RNN), and convolutional neural networks (CNN). Our architecture
captures both temporal and spatial dependencies in the reviews, allowing for more
accurate suggestion mining. The novelty of our approach lies in its ability to effec-
tively capture the complex relationships between different parts of the reviews, which
is critical to accurately identifying suggestions. We evaluate our architecture on the
SemEval-2019 Task 9 dataset, which is a benchmark dataset for suggestion mining
in reviews, which contains reviews from multiple domains. The task comprises two
subtasks, A and B, with labelled data for Windows phones from software sugges-
tion forums and hotel reviews, respectively. In Subtask A, the system is trained and
tested in the same domain, while in Subtask B, the system is evaluated using test data
from a domain other than the one for which training data is provided. Our approach
outperforms existing methods on both subtasks, demonstrating the effectiveness of
our architecture.
The paper makes the following main contributions:
1. Introduction of a novel ensemble architecture, DRC_Net, which combines deep
neural networks (DNN), recurrent neural networks (RNN), and convolutional
neural networks (CNN) for suggestion mining in online reviews.
2. Evaluation of the proposed architecture on both Subtasks A and B of the SemEval-
2019 Task 9 dataset. This dataset serves as a benchmark for suggestion mining
and contains reviews from various domains.
3. Comparison of the results obtained from the proposed architecture with those of
existing studies in the field. This allows for a comprehensive understanding of
the performance and effectiveness of the DRC_Net architecture.
Ensemble Approach for Suggestion Mining Using Deep Recurrent … 69

The rest of the paper is structured as follows: The relevant work in suggestion
mining is outlined in Sect. 2, our proposed architecture is described in Sect. 3, our
experimental setup is in Sect. 4, findings are shown in Sect. 5, limitations of the work
are discussed in Sect. 6, and the paper is concluded with future research possibilities
in Sect. 7.

2 Related Work

Early studies in suggestion mining have focused on detecting wishes, advice, product
defects, and improvements from online reviews. Some of the earliest works in this
field involved wish detection from weblogs and online forums [3, 4]. In 2009, the
concept of detecting suggestions in online portals was introduced [3], and the term
‘suggestion’ was later coined by Goldberg et al. [5] in 2013. However, their work was
limited to suggestions for product improvement only. Subsequent studies by Negi
[1] have explored various types of suggestions, ranging from customer-to-customer
suggestions to open-domain suggestion mining. Rule-based approaches have also
been used by some researchers, but deep learning approaches have shown more
effectiveness in suggestion mining. LSTM, CNN or their variants have been widely
used in text classification problems because they capture long-term dependencies
and spatial features [6].
In previous research on suggestion mining from online reviews, several methods
have been applied to extract suggestions from textual data in the SemEval-2019 Task
9 dataset. For example, the dataset was processed using the random multi-model
deep learning (RMDL) method by Liu et al. [7]. They used
GloVe embeddings as input features and chose the number of layers and nodes for
each model at random. The outcome was chosen by a majority vote.
In SemEval-2019 Task 9, Liu et al. [8] proposed an ensemble model that combined
BERT with various task-specific modules, including CNN, GRU, and FFA. The
BERT model was used for sentence perspective encoding, while the other modules
were stacked over BERT to enhance the model’s performance. The proposed model
achieved the highest scores in Subtasks A and B, outperforming the individual models.
This approach demonstrated the effectiveness of combining different neural network
architectures for suggestion mining tasks. With the use of WordNet hyponyms for the
term ‘message,’ Alekseev et al. [9] evaluated several strategies for labelling unknown
inputs and examined zero-shot learning algorithms for text categorization. Direct
labelling of the content as either a suggestion or not was their strategy. They verified
their work on both of SemEval-2019 Task 9's subtasks. Potomias et al. [10] developed a
rule-based approach that utilized heuristic, lexical, and syntactic patterns to determine
the degree of suggestion content in sentences. The weights of these patterns were used
to rank the sentences. The rule-based classifier was then combined with R-CNN to
enhance the performance of the model. The ensemble model attained the highest rank
in Subtask B, indicating its state-of-the-art performance on cross-domain evaluation.
In another work of ours [11], a temporal convolutional network (TCN) was applied
for suggestion mining. Two word embeddings, BERT and GloVe, were combined to
capture the semantic as well as
contextual information within the sentence. SemEval-2019 Subtask A dataset was
used to evaluate the model. Due to the dilation mechanism of the TCN, the proposed
model achieves the best results. Ramesh et al. [12] utilized the Subtask A dataset
to classify sentences. Their experiment involved feature selection techniques such as
chi-square (CHI2), document frequency difference (DFD) and multivariate relative
discriminative criterion (MRDC) to select important features and represent sentence
vectors. Support vector machine and random forest algorithms were employed for
classification, resulting in an accuracy of 83.47% for suggestion mining.

3 Proposed Architecture

In this study, we propose an ensemble model DRC_Net for suggestion mining that
builds upon the concept of random multimodal deep learning (RMDL) [12]. RMDL
incorporates three distinct deep learning architectures, including convolutional neural
networks (CNNs), recurrent neural networks (RNNs), and deep neural networks
(DNNs). The final prediction is obtained from the majority voting on the results
of each randomly generated model. In the DNN architecture, each layer is directly
linked to the layers above and below it. This enables DNN to handle high-dimensional
data with many input features and learn complex representations of sequential data
[13]. On the other hand, the RNN architecture [14] also uses the prior data points in
addition to the current input data. As a result, RNN can handle input sequences with
varied lengths and learn and recall long-term dependencies. Finally, the CNN archi-
tecture has the ability to extract pertinent input information. CNN’s convolutional
layers allow CNN to extract important features from sequential data automatically.
Furthermore, CNN can capture local patterns and relationships within the input data
[15]. Ensemble techniques can help to reduce variance and bias by combining the
predictions of multiple deep learning models. Each individual model in the ensemble
may have a high variance, but by combining their predictions, the overall variance
can be reduced. Deep learning models can also suffer from bias, particularly when
the training data is limited or biased towards certain examples. Ensemble techniques
can help to reduce bias by combining the predictions of multiple deep learning
models, each of which may have different sources of bias [16]. Figure 1 illustrates
the relationship between bias, variance, and model complexity.
DRC_Net extends the RMDL architecture by randomly generating three models
of each type, from which we selected the best of each kind. The proposed system
architecture is shown in Fig. 2. The ensemble model consists of three neural networks,
DNN, RNN, and CNN. The DNN has two dense layers with 64 and 32 neurons,
respectively, with each dense layer followed by a dropout layer having a dropout
rate of 0.2. The RNN has GRU, LSTM, and GRU layers, with units 128, 128, and
64, respectively, and each layer is separated by dropout layers with a dropout rate
of 0.5. The CNN has one convolutional 1D layer with kernel size 3, followed by
Ensemble Approach for Suggestion Mining Using Deep Recurrent … 71

Fig. 1 Relationship between the model’s complexity with the bias and variance

Fig. 2 System architecture of proposed model

maxpool1D and a dense layer of ten units. Three types of features are extracted from
three different neural networks, complex (nonlinear), temporal and spatial.
We used BERT word embeddings, which have been shown to be effective in
handling unbalanced classes and achieving good performance in several text classi-
fication tasks, including suggestion mining [17]. The pre-trained DistilBERT model,
a compact, quicker, and lighter transformer model based on the BERT architecture,
generates the BERT embeddings. Each model receives input from the DistilBERT
model, which creates word embeddings with a dimension of 768.
is produced by a dense layer made up of a single unit and having a sigmoid activation
function. Once the output of each model has been concatenated, the concatenated
features are sent to the final dense layer, which has two neurons and SoftMax acti-
vation. Instead of using the voting process as RMDL does, the final output is taken
from the last dense layer.
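The fusion step described above, concatenating the three branch outputs and passing them through a final two-neuron softmax layer rather than RMDL-style majority voting, can be sketched in NumPy. The branch outputs and dense-layer weights below are random placeholders, not values from the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical per-branch outputs for a batch of 4 sentences:
# each branch (DNN, RNN, CNN) ends in a single sigmoid unit.
dnn_out = rng.random((4, 1))
rnn_out = rng.random((4, 1))
cnn_out = rng.random((4, 1))

# Concatenate branch outputs instead of majority voting (as in RMDL).
features = np.concatenate([dnn_out, rnn_out, cnn_out], axis=1)  # shape (4, 3)

# Final dense layer: 2 neurons + softmax over {non-suggestion, suggestion}.
W = rng.standard_normal((3, 2))
b = np.zeros(2)
probs = softmax(features @ W + b)

assert probs.shape == (4, 2)
assert np.allclose(probs.sum(axis=1), 1.0)
```

In the actual model, each branch output would come from the trained DNN, RNN, or CNN sub-network; the sketch shows only the shapes and the fusion arithmetic.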

4 Experiments

4.1 Dataset and Pre-processing

The SemEval-2019 Task 9 dataset was used to evaluate the proposed architecture.
The training data for Subtask A consists of 8500 sentences, of which 2085 have been
labelled as suggestions. It also includes a validation set and a test set, containing
592 and 833 samples, respectively. All sentences are from the suggestion forum
for software developers on UserVoice, and further details can be found in [6]. The
labelled dataset was obtained from a GitHub repository where suggestion sentences
were labelled as 1 and non-suggestion sentences as 0. With far fewer sugges-
tion sentences than non-suggestion sentences, the dataset is extremely unbalanced.
Subtask B, on the other hand, considers the cross-domain setting; no training data
are provided for it, only validation and test data. We therefore used the training
data of Subtask A to train our model for cross-domain
validation. The validation data has 808 samples, and the test set has 824 samples collected
from hotel reviews on the Tripadvisor website. The statistics of the dataset are shown
in Table 1. In order to prepare the dataset for analysis, several pre-processing steps
were undertaken. These included the elimination of tags, emojis, special characters,
links, extra spaces, and words that were repeated in quick succession. The dataset
included contraction terms like ‘can’t’ and ‘he’d’ve,’ which the tokenizer does not
recognize, so those words were expanded to their full form. After that, stemming
was done to reduce the vocabulary size. Removing stop words, by contrast, would
have changed a phrase such as 'my app has a wp7 version and a wp8 version xap in
the same submission' to 'app wp7 version wp8 version xap submission,' which might
no longer make sense; stop words were therefore left in the sentences, as they
provide critical contextual information.
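The cleaning steps listed above can be sketched as follows. The contraction map and regular expressions here are illustrative assumptions, not the authors' exact rules:

```python
import re

# A small contraction map; the paper's full list is not given, so these
# entries are illustrative.
CONTRACTIONS = {"can't": "cannot", "he'd've": "he would have", "won't": "will not"}

def preprocess(text: str) -> str:
    """Sketch of the cleaning steps: drop tags, links, and special characters,
    expand contractions, collapse extra spaces and immediately repeated words.
    Stop words are deliberately kept for their contextual information."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)            # HTML-ish tags
    text = re.sub(r"https?://\S+", " ", text)        # links
    for short, full in CONTRACTIONS.items():         # expand contractions
        text = text.replace(short, full)
    text = re.sub(r"[^a-z0-9' ]+", " ", text)        # special chars / emojis
    text = re.sub(r"\s+", " ", text).strip()         # extra spaces
    text = re.sub(r"\b(\w+)( \1\b)+", r"\1", text)   # repeated words
    return text

print(preprocess("I can't <b>wait</b> wait for http://x.io this this feature!!"))
# → i cannot wait for this feature
```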

Table 1 SemEval-2019 Task 9 dataset

Subtasks    Domain                       Training  Validation  Test
Subtask A   Software development forums  8500      592         833
Subtask B   Hotel reviews                0         808         824
4.2 Experimental Setup

The proposed ensemble model combines the strengths of DNNs, RNNs, and CNNs
for suggestion mining and is implemented using Keras and TensorFlow as a backend.
The ensemble model consists of three individual models, one each for DNNs, RNNs,
and CNNs. The experimental setup includes two approaches, one for the same domain
and the other for cross-domain validation. Due to the high-class imbalance in the
dataset, the evaluation of the performance of the model involves metrics that are
based on positive and negative classes, such as precision, recall, and F1-score. Our
architecture is built on top of the DistilBERT model, where the pre-processed data
serves as the input and the output is the contextual word embedding which results in
a 768-dimensional vector for each word.
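Because of the class imbalance, precision, recall and F1 on the positive (suggestion) class are the metrics of interest. A minimal sketch of how they are computed, using toy labels rather than the paper's data:

```python
def prf(y_true, y_pred):
    """Precision, recall and F1 for the positive (suggestion) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy imbalanced labels: accuracy alone would look good even for a
# classifier that rarely predicts the minority (suggestion) class.
y_true = [1, 0, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
p, r, f = prf(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))  # → 0.667 0.667 0.667
```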

Subtask A The same dataset is used to train all three distinct models. The output
of each model is then concatenated and sent into a classification layer that is fully
connected. The result is a probability score indicating the likelihood that each input
sentence represents a suggestion. The model is trained with the Adam optimizer and
sparse categorical cross-entropy loss, using a learning rate of 1e-4, 25 epochs, and a
batch size of 50, and is then validated on same-domain data.

Subtask B The proposed ensemble model is trained on the Subtask A dataset for ten
epochs with a batch size of 50 since there is no training data available for Subtask B.
The model is then validated on the Subtask B validation set, which contains cross-
domain data from hotel reviews. The validation helps to improve the model's ability
to generalize across domains. After validation, the model is trained for a
further ten epochs on the Subtask B validation set and tested on the given test set.
The model is optimized using the same hyperparameters as in Subtask A.

5 Results and Discussion

The proposed ensemble model, DRC_Net, has demonstrated exceptional performance
in both Subtasks A and B, indicating the effectiveness of the model for both
same domain and cross-domain scenarios. Table 2 displays the evaluation metrics
for DRC_Net on the test set. The model completed Subtask A with precision, recall,
and an F1-score of 0.80. The model’s precision, recall, and F1-score in Subtask B
were 0.88, 0.87, and 0.87, respectively. The high precision and recall, which show
the model’s capacity to perform well on both positive and negative classes, serve as
evidence of its success on the unbalanced dataset.
In addition to evaluating the proposed ensemble model’s performance, a compar-
ison was made with previously published studies that utilized the same dataset for
evaluation. Tables 3 and 4 display the comparison results for Subtask A and Subtask
B. The proposed model, DRC_Net, was found to outperform RMDL on both subtasks,
Table 2 Results of DRC_Net on SemEval-2019 Task 9

Subtask  Precision  Recall  F1-score
A        0.80       0.80    0.80
B        0.88       0.87    0.87

Table 3 Comparison of DRC_Net with previous studies on Subtask A

Models                Subtask  F1-score
Alekseev et al. [9]   A        0.78
Liu et al. [7]        A        0.74
Potamias et al. [10]  A        0.74
Liu et al. [8]        A        0.78
Usama et al. [11]     A        0.82
DRC_Net               A        0.80

Bold indicates the proposed model's results

indicating that combining the strengths of DNN, RNN, and CNN architectures was
more effective than simply using a voting method for individual models’ predictions.
By concatenating the features extracted from each model, the proposed model was
able to capture both spatial and temporal dependencies and thus produce more
precise predictions. These results show that, on both subtasks of SemEval-2019
Task 9, the proposed ensemble model beat numerous state-of-the-art techniques for
suggestion mining.
Following the completion of trials on Subtasks A and B, a comparison of the
proposed ensemble model with other models shows that it outperforms most of them
in terms of F1-score, recall, and precision. The comparison results demonstrate
that the proposed ensemble model is an effective approach for suggestion mining,
outperforming several deep learning and ensemble models and achieving results
competitive with the state of the art in both subtasks.

Table 4 Comparison of DRC_Net with previous studies on Subtask B

Models                Subtask  F1-score
Alekseev et al. [9]   B        0.48
Liu et al. [7]        B        0.76
Potamias et al. [10]  B        0.85
Liu et al. [8]        B        0.85
DRC_Net               B        0.87

Bold indicates the proposed model's results
6 Limitations

The dataset used in this research is highly imbalanced and unstructured.
Applying appropriate balancing techniques, such as oversampling, would be benefi-
cial to address this issue and improve the model’s performance. The base models used
in the proposed architecture are selected randomly. While this allows for experimenta-
tion, using more specific and targeted models tailored to the task at hand could poten-
tially yield even better results. In this study, only contextual word embeddings from
BERT are utilized. Although BERT embeddings are powerful, combining them with
traditional word embeddings, such as word2vec or GloVe, could provide comple-
mentary information and potentially enhance the representation of the textual data.
Exploring different combinations of traditional and contextual word embeddings
may lead to improved results. By addressing these limitations, the proposed archi-
tecture can be enhanced to achieve better performance and reliability in suggestion
mining tasks.
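As a concrete illustration of the balancing suggestion above, random oversampling simply duplicates minority-class examples until both classes are equally represented. The toy sentences below are invented for the sketch:

```python
import random

def oversample(data, seed=13):
    """data: list of (sentence, label) pairs. Duplicates randomly chosen
    minority-class items until both classes have equal counts."""
    rng = random.Random(seed)
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced

# 1 suggestion vs 4 non-suggestions, mimicking the dataset's imbalance.
data = [("add dark mode", 1)] + [("the app crashed", 0)] * 4
balanced = oversample(data)
assert sum(1 for _, l in balanced if l == 1) == sum(1 for _, l in balanced if l == 0)
```

More refined techniques (e.g. synthetic oversampling) follow the same idea but generate new minority samples instead of duplicating them.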

7 Conclusion

In conclusion, our proposed ensemble model DRC_Net leverages the strengths of
DNNs, RNNs, and CNNs and uses BERT word embeddings to improve the perfor-
mance of suggestion mining. By randomly generating three models of each type
and selecting the best of each kind, we aim to achieve better performance than
using just one model. Our proposed ensemble model for suggestion mining has
shown promising results in accurately identifying suggestions from online reviews.
By combining the strengths of DNNs, RNNs, and CNNs, our model was able to learn
complex representations of sequential data and capture local patterns and relation-
ships within the input data, resulting in improved accuracy and F1-score compared
to other related studies. Our model’s ability to perform well on cross-domain valida-
tion suggests that it can be applied to various domains and datasets. Additionally, our
analysis of feature importance revealed that the extraction of spatial and temporal
features with contextual information played a crucial role in figuring out whether or
not a sentence is a suggestion. Our research highlights the effectiveness of ensemble
models in improving the accuracy and robustness of deep learning models for sugges-
tion mining tasks. However, further research is needed to explore the full potential
and limitations of the proposed model. In future, there are several directions that can
be taken to improve the proposed DRC_Net ensemble model for suggestion mining.
One potential direction is to explore different ensemble techniques, such as bagging,
stacking, and boosting, to further improve the accuracy and robustness of the model.
Another direction for future research is to apply the DRC_Net model to an open-
domain setting, where the model can be evaluated on a broader range of datasets
and tasks, to assess its effectiveness in a more general context. By exploring these
potential avenues for improvement, researchers can continue to advance the state of
the art in the field of suggestion mining and improve the performance of models like
DRC_Net.

References

1. Negi S (2019) Suggestion mining from text. Dissertation. National University of Ireland–
Galway
2. Negi S, Buitelaar P (2017) Suggestion mining from opinionated text. In: Sentiment analysis in
social networks, pp 129–139
3. Goldberg AB, Fillmore N, Andrzejewski D, Xu Z, Gibson B, Zhu X (2009) May all your wishes
come true: a study of wishes and how to recognize them. In: Proceedings of human language
technologies: the 2009 annual conference of the north american chapter of the association for
computational linguistics, pp 263–271
4. Ramanand J, Bhavsar K, Pedanekar N (2010) Wishful thinking: finding suggestions
and 'buy' wishes from product reviews. In: Proceedings of the NAACL HLT 2010 workshop on
computational approaches to analysis and generation of emotion in text, pp 54–61
5. Brun C, Hagege C (2013) Suggestion mining: detecting suggestions for improvement in users’
comments. Res Comput Sci 70(79):171–181
6. Negi S, Daudert T, Buitelaar P (2019) Semeval-2019 task 9: suggestion mining from online
reviews and forums. In: Proceedings of the 13th international workshop on semantic evaluation,
pp 877–887
7. Liu F, Wang L, Zhu X, Wang D (2019) Suggestion mining from online reviews using random
multimodel deep learning. In: 2019 18th IEEE international conference on machine learning
and applications (ICMLA). IEEE, pp 667–672
8. Liu J, Wang S, Sun Y (2019) Olenet at semeval-2019 task 9: Bert based multi-perspective
models for suggestion mining. In: Proceedings of the 13th international workshop on semantic
evaluation, pp 1231–1236.
9. Alekseev A, Tutubalina E, Kwon S, Nikolenko S (2021) Near-zero-shot suggestion mining with
a little help from WordNet. In: Analysis of images, social networks and texts: 10th international
conference, AIST 2021, Tbilisi, Georgia, 16–18 Dec 2021
10. Potamias RA, Neofytou A, Siolas G (2019) NTUA-ISLab at SemEval-2019 task 9: mining
suggestions in the wild. In: Proceedings of the 13th international workshop on semantic
evaluation, pp 1224–1230
11. Rashidullah Khan UB, Akhtar N, Kidwai UT, Siddiqui GA (2022) Suggestion mining
from online reviews using temporal convolutional network. J Discrete Math Sci Cryptogr
25(7):2101–2110
12. Ramesh A, Reddy KP, Sreenivas M, Upendar P (2022) Feature selection technique-based
approach for suggestion mining. In: Evolution in computational intelligence: proceedings of
the 9th international conference on frontiers in intelligent computing: theory and applications
(FICTA 2021). Springer, Singapore, pp 541–549
13. Kowsari K, Brown DE, Heidarysafa M, Meimandi KJ, Gerber MS, Barnes LE (2017) Hdltex:
hierarchical deep learning for text classification. In: 2017 ICMLA. IEEE, pp 364–371
14. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating
errors. Nature 323(6088):533–536
15. Zhang Y, Wallace B (2015) A sensitivity analysis of (and practitioners’ guide to) convolutional
neural networks for sentence classification. arXiv:1510.03820
16. Pretorius A, Bierman S, Steel SJ (2016) A bias-variance analysis of ensemble learning for
classification. In: Annual proceedings of the south african statistical association conference,
vol 2016, no. con-1, pp 57–64
17. Madabushi HT, Kochkina E, Castelle M (2020) Cost-sensitive BERT for generalisable sentence
classification with imbalanced data. arXiv:2003.11563
A CNN-Based Self-attentive Approach
to Knowledge Tracing

Anasuya Mithra Parthaje, Akaash Nidhiss Pandian, and Bindu Verma

Abstract A key component in modern education is personalized learning, which
involves the difficult challenge of precisely measuring knowledge acquisition. Deep
knowledge tracing (DKT) is a recurrent neural network-based method used for this
problem. However, the predictions from DKT can be inaccurate. To address this,
a mathematical computation model called convolutional neural network (CNN) is
leveraged to understand the DKT problem better. Our model identifies applicable
knowledge concepts (KC) and predicts the student’s conceptual mastery based on
them. SAKT, a self-attention method that successfully manages data sparsity, is
introduced here. Our model outperforms modern
knowledge tracing models, improving the average AUC by 5.56% through experi-
ments on real-world datasets. This performance breakthrough has positive ramifica-
tions for personalized learning, improving the accuracy and efficacy of knowledge
acquisition and ultimately resulting in more specialized and productive educational
settings.

Keywords DKT · CNN · AUC · SAKT

1 Introduction

Online education has leveraged MOOCs and intelligent tutoring platforms to offer
courses and exercises. Self-attentive techniques and data mining tools help forecast
student performance based on KCs, which can represent an exercise, skill or concept
[1]. Knowledge tracing is key to developing personalized learning that recognizes
student strengths and weaknesses. Modelling students’ knowledge state over time is
a challenging task due to the complexity of the human brain [2]. Knowledge tracing

A. M. Parthaje (B) · A. N. Pandian · B. Verma


Delhi Technological University, New Delhi, India
e-mail: anasuya.mithra@gmail.com
B. Verma
e-mail: binduverma@dtu.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 77
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_6
78 A. M. Parthaje et al.

is a supervised sequence learning task, formalized as predicting the next interaction
xt+1 given the sequence of past interactions X = (x1, x2, x3, …).
To enable students to adjust their practice, it is crucial for a system to identify
their strengths and weaknesses. This can also help teachers and system creators
suggest appropriate instructional materials and exercises. We represent a question–
answer interaction as xt = (et, rt), where the student attempts exercise et at
timestamp t and rt indicates whether the answer is correct. In KT, the target is to
predict the student's ability to respond accurately to the following exercise, i.e. to
forecast p(rt+1 = 1 | et+1, X). Knowledge tracing tasks thus aim to predict the next
interaction based on a student's prior interactions x0, …, xt in a given learning task [3].
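A common way to turn the pair xt = (et, rt) into a single model input, used by SAKT-style implementations, is to offset the exercise id by the total number of exercises when the answer is correct. This encoding is an assumption for illustration, and the ids below are made up:

```python
def encode_interaction(exercise_id: int, correct: int, num_exercises: int) -> int:
    """Map (e_t, r_t) to a single token id in [0, 2*E): incorrect answers
    occupy [0, E), correct ones [E, 2*E)."""
    assert 0 <= exercise_id < num_exercises and correct in (0, 1)
    return exercise_id + correct * num_exercises

E = 100                                 # hypothetical number of exercises
history = [(12, 1), (12, 0), (47, 1)]   # (exercise, correct) pairs
tokens = [encode_interaction(e, r, E) for e, r in history]
print(tokens)  # → [112, 12, 147]
```

The model can then look up a single embedding per interaction, with correctness folded into the token id.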
In order to forecast the student’s performance based on that performance, this
research suggests a method for extracting pertinent KCs from prior encounters. As
KCs, we predicted how students would do on tasks using exercises. SAKT gives
the previously completed exercises weights when estimating the student’s perfor-
mance on a specific exercise. Across all datasets, SAKT outperforms cutting-edge
KT techniques on average by 4.43% on the AUC. Due to the key element of SAKT,
self-attention, our model is also orders of magnitude faster than RNN-based models.
The rest of this work is structured as follows. In Sect. 2, modelling strategies applied
to students are examined. Section 3 presents the suggested model. The datasets we
used for our studies are described in Sect. 4, along with the findings.
This paper’s conclusion and future study directions are covered in Sect. 5.

2 Related Works

Within the area of machine learning (ML) research, the area of deep learning (DL)
has only recently begun to gain momentum. ML systems use DL algorithms to
discover multiple levels of representation and have shown empirical success in the
AI applications of NLP and CV. One such work proposes a novel knowledge tracing
model, dynamic deep knowledge tracing with student classification (DKT-DSC); at
each time step, the model divides students into groups based on their capacity for
learning. The DKT architecture utilizes recurrent neural networks (RNNs) to process
this information and predict student performance [4]. Classifying students in this
way, treating their histories as long-term memories, improves knowledge tracing with
RNNs, which are among the best approaches to knowledge tracing today.
Recent deep learning models such as deep knowledge tracing (DKT) [5] summarize
the student’s knowledge state using recurrent neural networks (RNNs). Memory-
augmented neural networks (MANs) were exploited in dynamic key-value memory
networks (DKVMNs) [5]. The learning algorithm learns the correlation between a
student’s knowledge state and underlying knowledge content, using two matrices, key
and value. It is difficult to interpret the parameters of the DKT model [6]. DKVMN
is simpler to comprehend than DKT since it explicitly maintains a key matrix
representing the KCs and a value matrix representing the student's mastery of them. All these deep learning
Fig. 1 An encoder/decoder connected by attention

models rely on RNNs and therefore cannot generalize well from sparse data [7].
The Transformer [8] is an architecture based purely on the attention mechanism.
The skills that students learn in the KT task are interconnected, and how
well they succeed on a specific exercise depends on how well they perform on earlier
activities that were similar to it.
This research proposes a technique for predicting student performance on exer-
cises by extracting relevant KCs from past interactions. SAKT, which assigns weights
to previous exercises, outperforms the best KT methods by 4.43% on average across
all datasets, and it is faster than RNN-based models because it uses self-attention. The
encoder converts data into features, and the decoder produces context vectors that
are interpreted by the deep CNN model shown in Fig. 1. LSTM networks summarize
input sequences in internal state vectors, which are input into the first cell of the
decoder network.

3 Proposed Method

By analysing a student's previous interactions, we are able to predict whether he/she
will provide the correct response for the next exercise et+1. Figure 3 illustrates how
the problem can be transformed into a sequential model. The model takes as inputs
the interactions x1, x2, x3, …, xt−1 together with the exercise sequence shifted one
position ahead, e1, e2, e3, …, et; the correct responses to these exercises form the
output. A one-hot embedding layer and a feature extractor pick out the relevant data
features, and the output data are labelled. The feature extractor is a multi-head
system that learns relevant information in various representative sub-spaces.
3.1 Architecture for the Proposed Model

3.1.1 Embedding Layer

As a first step, the tuples containing questions and answers are converted into one-hot
embeddings of real-valued vectors, as shown in Fig. 2. Given the input sequence
X = (e1, e2, e3, …, et) and outputs y = (y1, y2, y3, …, yt), let n be the maximum
length that may be processed by the model. Because the model works with fixed-
length sequence inputs, if t is less than n we repeatedly pad the sequence on the left
with a padding question–answer pair. Figure 3 depicts the model's general structure.
When t exceeds n, the sequence is divided into t/n (rounded up) subsequences, each
of length n, and the model takes all of these subsequences as inputs. An interaction
embedding matrix with latent dimension d is trained, and an embedding Msi is
obtained for each element in the sequence. A similar calculation is done for the
exercise set, so that every exercise ei is embedded along the same lines.
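The left-padding and partitioning rules described above can be sketched as follows; the padding pair (0, 0) is a placeholder assumption:

```python
PAD = (0, 0)  # hypothetical padding question-answer pair

def to_fixed_length(seq, n):
    """Left-pad a short interaction sequence to length n, or split a long
    one into ceil(t/n) subsequences of length n (short chunks padded)."""
    if len(seq) <= n:
        return [[PAD] * (n - len(seq)) + list(seq)]
    chunks = [seq[i:i + n] for i in range(0, len(seq), n)]
    return [c if len(c) == n else [PAD] * (n - len(c)) + c for c in chunks]

seq = [(1, 1), (2, 0), (3, 1), (4, 1), (5, 0)]
print(to_fixed_length(seq, 3))
# → [[(1, 1), (2, 0), (3, 1)], [(0, 0), (4, 1), (5, 0)]]
```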
We added residual connections to the above modelling structure upon completing
the self-attention layer as well as the feeding forward layer in order to train a more
complex network structure [9]. In addition, we normalized the layers and each layer’s
dropout.

Fig. 2 a SAKT network [12] estimates attention weights of previous elements at each timestamp
by extracting keys, values and queries from the embedding layer. Attention weight is determined
for the query element and its corresponding key element x t , j. b An embedding layer contains a
student’s current activity and past interactions to embed the current question at timestamp t + 1,
along with elements of prior interactions shown as an interaction embedding in the key and value
space [12]
Fig. 3 Proposed KTA model

3.1.2 Prediction Layer

Using the data on whether students answered exercises correctly or not, we create
a prediction layer by passing the learned representation F through a Sigmoid-
activated network, as seen in Eq. 1.

p = σ(FW + b) (1)

Here, p represents the likelihood that a student will answer exercise en correctly,
and σ(z) = 1/(1 + e−z)
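Eq. 1 can be evaluated directly. The feature vector, weights and bias below are illustrative numbers, not learned parameters:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias):
    """Eq. 1: p = sigma(F.W + b), the probability that the student
    answers the next exercise correctly."""
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)

F = [0.2, -0.5, 1.0]   # learned representation (illustrative values)
W = [0.7, 0.1, 0.4]
b = -0.1
p = predict(F, W, b)
assert 0.0 < p < 1.0
print(round(p, 3))  # → 0.596
```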

3.1.3 Network Training

Due to the fixed length of sequences in the self-attention model, each input sequence
X = (x1, x2, …, xt) is converted before being fed to our model, knowledge tracing
with attention (KTA). If X is shorter than l, the model repeatedly adds padding to
the left of X; if X exceeds l, the sequence is partitioned into subsequences of length
l. Training minimizes the negative log-likelihood of the observed sequence: the
parameters are learned by minimizing the cross-entropy loss between the predictions
p and the observed responses r, as presented in Eq. 2.

L = − Σi [ ri log(pi) + (1 − ri) log(1 − pi) ] (2)
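Eq. 2 is the standard binary cross-entropy. A direct transcription, with made-up probabilities, shows that better-calibrated predictions yield a lower loss:

```python
import math

def bce_loss(r, p, eps=1e-12):
    """Eq. 2: L = -sum_i [ r_i log(p_i) + (1 - r_i) log(1 - p_i) ].
    eps guards against log(0)."""
    return -sum(ri * math.log(pi + eps) + (1 - ri) * math.log(1 - pi + eps)
                for ri, pi in zip(r, p))

r = [1, 0, 1]                       # observed responses
good = bce_loss(r, [0.9, 0.1, 0.8])  # probabilities close to the labels
bad = bce_loss(r, [0.4, 0.6, 0.3])   # probabilities far from the labels
assert good < bad
```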
3.1.4 Feature Extraction

In feature extraction with neural networks, representations learned by a previously
trained network are reused to compute features for new samples, and a newly trained
classifier is then applied to those features. Here, the embedded vectors are fed into
a feature extractor to capture latent dependencies between inputs. The feature
extractor is made up of N identical blocks, each containing two sub-layers. The first
is a multi-head self-attention mechanism [8]. Using scaled dot-product attention,
global relationships are extracted based on how similar each item in the input
sequence is to the others [8]. The model calculates attention h times; this is known
as multi-head attention because it obtains relevant information from different
representative sub-spaces.
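The scaled dot-product attention at the core of each block can be sketched in NumPy. The causal mask, which keeps a timestamp from attending to future interactions, is standard in SAKT-style models; the dimensions here are arbitrary:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=True):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V. The causal mask
    prevents each position from attending to later positions."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if causal:
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(1)
t, d = 5, 8                      # sequence length, embedding dimension
Q, K, V = (rng.standard_normal((t, d)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
assert out.shape == (5, 8)
assert np.allclose(w.sum(axis=1), 1.0)
assert np.allclose(np.triu(w, k=1), 0.0)  # no attention to the future
```

Multi-head attention runs h such computations in parallel on learned projections of Q, K and V and concatenates the results.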

3.1.5 Position Encoding

Using the position encoding layer of the self-attention neural network, we are able
to encode the sequence order, just as a convolutional or recurrent neural network
would. This layer is important for this problem because a student's knowledge
changes gradually over time; there should be no abrupt transitions in the knowledge
state at a particular time instance [5]. A position embedding matrix P ∈ R^(n×d) is
learned during training. The ith row of the position embedding matrix is added to
the interaction embedding of the ith element of the interaction sequence.
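Adding the learned position matrix row-wise to the interaction embeddings can be sketched as follows, with random matrices standing in for the learned parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 4                       # max sequence length, latent dimension

M = rng.standard_normal((n, d))   # interaction embeddings for one sequence
P = rng.standard_normal((n, d))   # learned position embedding matrix, P in R^{n x d}

# The i-th row of P is added to the embedding of the i-th interaction,
# injecting order information that self-attention alone cannot see.
X = M + P
assert X.shape == (n, d)
assert np.allclose(X[3], M[3] + P[3])
```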

3.1.6 Prediction and Loss

The final decision is made using a Sigmoid function in the prediction stage. We won’t
elaborate here on the prediction and optimization processes [5]. The embedding layer
results in giving the embedded interaction input matrix and the embedded exercise
matrix (E) as outputs.

4 Experimentations

4.1 Dataset

A synthetic dataset and four real-world datasets produced from the ASSISTments
online tutoring platform were used to evaluate the proposed model. The ASSIST2009
dataset contains 4,417 students, 124 skills and 328,291 interactions. The prediction
results are visualized using some of the students
Table 1 Ablation study

Architecture  Statics  ASSISTment 2015  ASSISTment 2009  Synthetic
Block         0.819    0.822            0.837            0.826
2 block       0.845    0.853            0.840            0.827
No PE         0.832    0.849            0.842            0.827
No RC         0.834    0.857            0.847            0.823
Dropout       0.840    0.851            0.845            0.832
Single        0.85     0.845            0.828            0.823
Predict       0.854    0.854            0.853            0.836

Bold indicates the proposed model's results

in this dataset [9]. The ASSIST2015 dataset, with responses from 19,917 students
across 100 skills, contains a total of 708,631 question-answering interactions. Owing
to its larger number of students, ASSIST2015 has fewer interactions per skill and
per student than ASSIST2009. The ASSISTChall dataset was released for the 2017
ASSISTments data mining competition; it has 686 students, 102 skills and 942,816
interactions, giving it a higher average record count per student.
On all four datasets except Simulated-5, our proposed model achieves excellent
results, as shown in Table 1. On ASSIST2015, KTA exceeds DKT+ by 10%. When
compared with other models, our model also achieves notable improvements in the
F1 score. The synthetic dataset was generated by simulating the answering
trajectories of 4000 virtual students; each student is given a set of 50 exercises of
varying difficulty levels drawn from five virtual concepts. With a density of 0.81,
the ASSISTChall dataset is the densest dataset available.

4.2 Evaluation Methodology

A binary classification setting is used for the prediction task, i.e. correct and incor-
rect answers to exercises. In this case, the area under curve (AUC) metric is used
to compare the performance. This paper compares our proposed KT model with
DKT [10], DKT+ [5] and DKVMN [11], which are state-of-the-art KT methods.
The introduction describes these methods. The model was trained on 80% of the
dataset and then tested on the remaining 20%. Hidden-state dimensions of d = 50,
100, 150 and 200 were used for all proposed methods. Hyperparameters reported in the
competing papers were used for both approaches. We used the same procedure for
weight initialization and optimization. TensorFlow was used for SAKT implemen-
tation with ADAM optimizer. The ASSISTChall dataset was processed in batches
of 256 and the other datasets in batches of 128. A dropout rate of 0.2 was used for
all datasets. The number of sequences to be created was proportional to the number of exercise
tags for each student. ASSISTChall and STATICS datasets use n = 500. ASSIST2009
datasets use n = 100 and 50, and the synthetic and ASSIST2015 datasets use n =
50. The attention-based mechanism also lends itself to interpretable analyses of
knowledge tracing.
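The AUC used for evaluation can be computed without any library via its rank formulation: the probability that a randomly chosen positive interaction is scored above a randomly chosen negative one. The labels and scores below are toy values:

```python
def auc(labels, scores):
    """AUC via the rank formulation: fraction of (positive, negative)
    pairs where the positive is scored higher (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.3]
print(auc(labels, scores))  # → 0.8888888888888888  (8 of 9 pairs ranked correctly)
```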

5 Results and Discussion

There may be drawbacks to the proposed CNN-based self-attentive knowledge
tracing method. Given the complexity of CNNs and self-attention mechanisms,
it might be difficult to comprehend which features the model considers most influ-
ential. Scalability and generalisability may be issues, and the model’s performance
may be significantly influenced by the calibre and volume of training data. Addi-
tional factors to take into account are the capability of capturing intricate sequential
connections and possible overemphasis on exercise order. Furthermore, the dataset
used in our model lacks lengthy sequences, so the advantage of collecting long
sequences is not utilized. All questions appear once, and they have the same length,
so data dependencies are low. SAKT outperforms competing approaches by 3.16%
on ASSIST2009 and 15.87% on ASSISTment2015 due to its attention mechanism.
The proposed method performs similarly to DKT on ASSISTChall, and performs
better on STATICS2011 by 2.16%. Attention weight visualization aids students
in understanding which exercises are pertinent, so we calculate key and query attention weights
across all sequences. We normalize attention layer weights, and each element of the
relevance matrix represents the influence of the relevant exercises. The synthetic dataset
is analysed because its hidden concepts are known.
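The relevance weights described above can be illustrated with a standard scaled dot-product attention computation [8]. The sketch below is a hypothetical NumPy toy example, not the authors' TensorFlow implementation; the matrices Q and K stand in for the learned query and key projections.

```python
import numpy as np

def attention_weights(Q, K):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)).

    Row i of the result gives the normalized relevance of every past
    interaction (key) to query i; each row sums to 1.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

# Toy sequence of 3 interactions with hidden dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
W = attention_weights(Q, K)
print(W.shape)  # (3, 3)
```

Normalizing each row to sum to one is what allows the entries of the resulting relevance matrix to be read as the influence of each past exercise.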
Without self-attention, only the previous exercise affects subsequent exercise
predictions. The default architecture performs significantly worse without attention
blocks. Additional self-attention blocks increase model parameters, but in our case,
this did not improve performance and complicated the model (Table 1). Residual
connections have little impact on model performance, with removal even improving
performance for the ASSISTment 2015 dataset. To regularize the model, especially
with smaller datasets, we use dropout. Multiple attention heads capture different
subspaces, and using just one head results in worse performance for all datasets.
GPU training is significantly faster for SAKT (1.4 s/epoch) than DKT+ (65 s/epoch),
DKT (45 s/epoch) and DKVMN (26 s/epoch) (as seen in Fig. 4), using an NVIDIA
Titan V GPU for experiments.
A CNN-Based Self-attentive Approach to Knowledge Tracing 85

Fig. 4 Training and testing efficiency

6 Conclusion and Future Work

In this paper, by examining the pertinent exercises from their prior interactions, we
predict a student’s performance on the next exercise based on their interaction history
(without using any RNNs). We have extensively tested our model on multiple real-
world datasets, and we find that our method outperforms RNN-based methods in
prediction quality while training roughly an order of magnitude faster. To capture
global dependency relationships directly, attention-based KT models are presented,
which compute the similarity between input items regardless of their distance in the
sequence. Our experiments show that, compared to existing models, the proposed
model offers better predictions of future student performance.

References

1. Self J (1990) Theoretical foundations for intelligent tutoring systems. J Artif Intell Educ 1(4):3–
14
2. Chang H-S, Hsu H-J, Chen K-T (2015) Modeling exercise relationships in e-learning: a unified
approach. In: Proceedings of the 8th international conference on educational data mining
(EDM), pp 247–254
3. Corbett AT, Anderson JR (1994) Knowledge tracing: modeling the acquisition of procedural
knowledge. User Model User-Adap Interact 4:253–278
4. Minn S, Lee JY, Yoon B, Rajendran J (2018) Deep knowledge tracing and dynamic student
classification for knowledge tracing. In: 2018 IEEE international conference on data mining
(ICDM), pp 933–938
5. Yeung C-K, Yeung D-Y (2018) Addressing two problems in deep knowledge tracing via
prediction-consistent regularization. In: Proceedings of the fifth annual ACM conference on
learning at scale, pp 97–106
6. Khajah M, Lindsey RV, Mozer MC (2016) How deep is knowledge tracing? arXiv:1604.02416

7. Kang W-C, McAuley J (2018) Self-attentive sequential recommendation. In: 2018 IEEE
International conference on data mining (ICDM), pp 197–206
8. Vaswani A, Shazeer N, Parmar, N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I
(2017) Attention is all you need. In: Advances in neural information processing systems, pp
5998–6008
9. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
10. Xiong X, Lange K, Xing EP (2016) Going deeper with deep knowledge tracing. In: International
educational data mining society, pp 231–238
11. Poole B, Lahiri S, Raghu M, Sohl-Dickstein J, Ganguli S (2016) Exponential expressivity in
deep neural networks through transient chaos. In: Advances in neural information processing
systems, pp 3360–3368
12. Zhang J, Liu Y, Zhang L, Xu H (2017) Dynamic key-value memory networks for knowledge
tracing. In: Proceedings of the 26th international conference on World Wide Web, pp 765–774
LIPFCM: Linear Interpolation-Based
Possibilistic Fuzzy C-Means Clustering
Imputation Method for Handling
Incomplete Data

Jyoti, Jaspreeti Singh, and Anjana Gosain

Abstract Dealing with missing values has been a major obstacle in machine
learning. The occurrence of missing data is a significant problem that often results in
a noticeable reduction in data quality. Therefore, effective handling of missing data
is essential. This paper introduces a missing value imputation approach that utilizes
possibilistic fuzzy c-means clustering and proposes a method called LIPFCM by
combining the advantages of linear interpolation and fuzzy clustering techniques.
The performance of the LIPFCM method is compared with five state-of-the-art
imputation techniques using four commonly used real-world datasets from the UCI
repository. The experimental results indicate that our proposed method performs
significantly better than the existing imputation methods based on RMSE and MAE
for these datasets. Furthermore, the robustness of the proposed approach has been
experimentally validated on different missing ratios to analyze the impact of missing
values.

Keywords Missing values · Imputation · LI · PFCM · Incomplete data · Fuzzy clustering · MVI · LIPFCM

1 Introduction

Recently, missing values are emerging as a significant concern in the field of data
mining. It may occur in a database due to several reasons like lack of data, tools
defect, unresponsiveness, data entry errors, undetermined value, tools scarcity, data
inconsistency, shortage of time, etc. [1–3]. Therefore, the decision-making process
turns out to be ineffective, since the comprehensive understanding of the data is
unavailable.

Jyoti (B) · J. Singh · A. Gosain
USICT, Guru Gobind Singh Indraprastha University, New Delhi, India
e-mail: jyoti.19116490020@ipu.ac.in
J. Singh
e-mail: jaspreeti_singh@ipu.ac.in
A. Gosain
e-mail: anjana_gosain@ipu.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 87
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_7
88 Jyoti et al.
Missing values leading to incomplete data can result in reduced statistical power
and biased outcomes [4, 5]. Furthermore, incomplete data may lead to incorrect
conclusions during the analysis of a study. Thus, the treatment of missing values is
very crucial to attain accurate and reliable outcomes.
Initially, the data may not be clean, and duplication of some information may
be observed, which can further reduce the quality of data leading to difficulty in
achieving accurate results. Therefore, data pre-processing needs to be taken into
consideration for the enhancement of data quality. During the pre-processing stage,
treatment of missing values, outliers, and noisy values is done, which helps in
recovering from data impurities so that data can be used for further analysis.
A very simple method to tackle incomplete data is simply deleting the records
with missing values [2, 6], but it is only applicable when the number of observations
is very few and a significant data loss can be observed if the size of the data is large.
This in turn may reduce the performance and effectiveness of data [7]. For this reason,
it is considered to be the worst method for handling missing data. Thus, it becomes
essential to impute missing values rather than simply delete them to ensure
complete and accurate data. Completion of incomplete datasets using missing value
imputation (MVI) techniques is done through the imputation of values corresponding
to missing values [1, 8].
The MVI techniques can be categorized into two groups: single imputation and
multiple imputation [1, 6, 9]. Single imputation (SI) involves estimating only one
value for missing data [9], while multiple imputation (MI) techniques allow for the
use of more than one estimated value for imputation [1, 6]. Out of the various MI
methods, the most efficient one is the fuzzy clustering imputation method [2, 3].
Linear interpolation (LI) method is a simple method that uses correlation between
variables [10, 11]. Possibilistic fuzzy c-means (PFCM) is able to deal with noise and
outliers in a better way as compared to fuzzy c-means (FCM) and possibilistic c-
means (PCM) fuzzy clustering methods [12]. In this paper, the merits of LI (SI
technique) and PFCM (MI technique) for missing value imputation have been a
motivation to introduce a new hybrid imputation method called LIPFCM for dealing
with incomplete data. The main contribution of this research is as follows:
• A new approach LIPFCM has been proposed for the imputation of missing values.
• It can be experimentally validated that the proposed LIPFCM approach shows
better imputation performance in comparison with other state-of-the-art imputa-
tion methods.
• Our proposed method shows its robustness to the different percentages of missing
values.
The rest of the paper is organized as follows. Section 2 briefly shows the related
work. The proposed LIPFCM imputation method is described in Sect. 3. The experi-
mental framework is explained in Sect. 4. Section 5 illustrates the experimental results
LIPFCM: Linear Interpolation-Based Possibilistic Fuzzy C-Means … 89

and discussion of comparing LIPFCM with other imputation methods. Section 6
concludes the paper and discusses future work.

2 Related Work

Researchers have conducted a number of surveys to address the effect of missing
value imputation methods on performance measures [13–16]. Apart from these, many
researchers have proposed different imputation techniques based on fuzzy clustering
for handling the missing values.
Based on FCM algorithms, four different strategies for incomplete data were
proposed by Hathaway and Bezdek for the first time [17], namely optimal completion
strategy (OCS), whole data strategy (WDS), nearest prototype strategy (NPS), and
partial data strategy (PDS).
A kernel-based fuzzy clustering commonly known as kernel fuzzy c-means
(KFCM) approach was introduced by Zhang et al. in [18], for clustering incomplete
data. Optimization of fuzzy possibilistic c-means algorithm was done by utilizing
Support Vector Regression and Genetic Algorithm (SVRGA) while imputing missing
values for error minimization [19].
Several researchers found attribute’s type [20], weight [21], and correlation [22]
for imputing missing values and clustering the incomplete data in their studies. Hu
et al. introduced an approach to reduce the influence of outliers using similarity
measures for processing incomplete data [23].
Based on fuzzy c-means, an alternative approach using the data-filling approach
for the incomplete soft set (ADFIS) was proposed in [24]. Four real datasets were
used for performing the experiments. This paper concluded that ADFIS performs
better than all other methods in terms of accuracy.
Hybrid fuzzy c-means and majority vote techniques were proposed to achieve
high accuracy for predicting the missing value of microarray data which is typically
used in the medical field [25]. The performance of the proposed technique was
compared against three imputation methods, namely zero, kNN, and FCM using four
real datasets. They concluded that their proposed technique proved to be the best. In
[26], advanced and enhanced FCM algorithm for health care was suggested.

3 Proposed Methodology

LI imputation is a simple yet effective way of imputing missing values as it uses
correlation between variables for predicting the missing values [10, 11]. PFCM can
handle noise and outliers in a better way as compared to FCM and PCM fuzzy
clustering methods while handling incomplete data [12].
In this paper, a method called linear interpolation-based possibilistic fuzzy c-
means (LIPFCM) has been proposed for the imputation of missing value which
90 Jyoti et al.

exploits a fuzzy clustering approach and linear interpolation algorithm. The flowchart
of proposed method is shown in Fig. 1. The steps of the proposed method are
discussed below.
Step 1. Missing values simulation
From a whole dataset D, an incomplete dataset D f is generated by removing a certain
percentage of missing values randomly considering the following constraints:
1. Each attribute must contain at least one value.
2. Each record must contain at least one value.
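Step 1 can be sketched as follows. The function name, the NaN encoding of missing entries, and the undo-and-retry logic for enforcing the two constraints are assumptions of this illustrative sketch, not the authors' implementation.

```python
import numpy as np

def simulate_missing(D, ratio, seed=0):
    """Randomly mask `ratio` of the entries of D with NaN while keeping
    at least one observed value in every record and every attribute."""
    rng = np.random.default_rng(seed)
    orig = np.asarray(D, dtype=float)
    Df = orig.copy()
    n, p = Df.shape
    n_missing = int(round(ratio * n * p))
    cells = [(i, j) for i in range(n) for j in range(p)]
    rng.shuffle(cells)                    # visit cells in random order
    removed = 0
    for i, j in cells:
        if removed == n_missing:
            break
        Df[i, j] = np.nan
        # Undo the removal if it would empty a whole record or attribute
        if np.all(np.isnan(Df[i, :])) or np.all(np.isnan(Df[:, j])):
            Df[i, j] = orig[i, j]
        else:
            removed += 1
    return Df
```

Rejecting any removal that would empty a row or column is one straightforward way to satisfy the two constraints above.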

Step 2. Apply the LI imputation to complete the data


LI imputation method is applied by imputing the missing values to complete the
dataset [10]. LI fits a straight line between two datapoints and uses straight-line
equation for imputation of missing values. Let $y_1$, $y_2$, and $y_3$ be three datapoints
with coordinates $(y_{11}, y_{12})$, $(y_{21}, y_{22})$, and $(y_{31}, y_{32})$, respectively. If
datapoint $y_2$ has a missing value, then the missing value $y_{22}$ at $y_{21}$ is computed
using Eq. (1) as follows:

$$y_{22} = y_{12} + \frac{y_{32} - y_{12}}{y_{31} - y_{11}} \left(y_{21} - y_{11}\right). \quad (1)$$
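Eq. (1) can be sketched in a few lines of Python; the function name and argument order are illustrative only.

```python
def linear_interpolate(x1, y1, x3, y3, x2):
    """Value at x2 on the straight line through (x1, y1) and (x3, y3),
    mirroring Eq. (1)."""
    return y1 + (y3 - y1) / (x3 - x1) * (x2 - x1)

# Imputing the point halfway between (0, 10) and (4, 20)
print(linear_interpolate(0, 10, 4, 20, 2))  # 15.0
```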

Step 3. Apply PFCM on complete dataset


Fig. 1 Flowchart depicting the step-by-step process of the proposed LIPFCM method

The PFCM algorithm groups the data items of a dataset $X = \{x_1, x_2, \ldots, x_k\}$ into fuzzy
clusters $v = \{v_1, v_2, \ldots, v_c\}$ by minimizing the objective function shown in Eq. (2) [12]:

$$J_{\mathrm{PFCM}} = \sum_{j=1}^{c} \sum_{i=1}^{k} \left(a u_{ij}^{m} + b t_{ij}^{\eta}\right) d_{ij}^{2} + \sum_{j=1}^{c} \gamma_{j} \sum_{i=1}^{k} \left(1 - t_{ij}\right)^{\eta}, \quad (2)$$

where $c$, $m$, and $k$ represent the number of clusters, the fuzzifier (a constant),
and the number of data items, respectively; $\eta$ is the corresponding possibilistic exponent,
and $a$ and $b$ weight the fuzzy and possibilistic terms. $u_{ij}$ and $t_{ij}$ denote the fuzzy and
possibilistic memberships of datapoint $x_i$ in cluster $j$, as shown in Eqs. (4) and (5).

$$\sum_{j=1}^{c} u_{ij} = 1, \quad u_{ij} \in [0, 1], \quad \text{for } i = 1, 2, 3, \ldots, k. \quad (3)$$

$d_{ij}$ denotes the Euclidean distance between the datapoint $x_i$ and the cluster
center $v_j$, i.e., $d_{ij} = \|x_i - v_j\|$. Each datapoint's memberships are updated
as follows:

$$u_{ij} = \frac{1}{\sum_{s=1}^{c} \left( d_{ji} / d_{si} \right)^{2/(m-1)}} \quad \forall i, j, \quad (4)$$

$$t_{ij} = \frac{1}{1 + \left( \frac{b}{\gamma_{j}} \, d_{ij}^{2} \right)^{1/(\eta - 1)}}. \quad (5)$$

The cluster centers are updated using Eq. (6) as follows:


$$v_{j} = \frac{\sum_{i=1}^{k} \left(a u_{ij}^{m} + b t_{ij}^{\eta}\right) x_{i}}{\sum_{i=1}^{k} \left(a u_{ij}^{m} + b t_{ij}^{\eta}\right)}. \quad (6)$$

Step 4. Update previous imputed value with a new value


In this step, the already imputed missing data feature is updated with a new value.
Let $y_{lp}$ be the $p$-th feature value of the $l$-th record in dataset X. A missing value
$x_{lp}$ is recalculated using Eq. (7) as follows:

$$x_{lp} = \frac{\sum_{i=1}^{c} \left(a u_{ip}^{m} + b t_{ip}^{\eta}\right) v_{li}}{\sum_{i=1}^{c} \left(a u_{ip}^{m} + b t_{ip}^{\eta}\right)}. \quad (7)$$

Step 5. Stopping Criteria


The stopping criterion [27, 28] is specified as $\|v^{(k+1)} - v^{(k)}\| < \varepsilon$, where $\varepsilon = 0.00001$.
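A minimal NumPy sketch of one way Steps 3 and 5 could be implemented is shown below. The hyperparameter values a, b, m, η, the random center initialization, and the use of a single scalar γ for every cluster (rather than per-cluster γ_j) are simplifying assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def pfcm(X, c, a=1.0, b=1.0, m=2.0, eta=2.0, gamma=1.0,
         eps=1e-5, max_iter=100, seed=0):
    """Minimal PFCM sketch: alternate the membership updates (Eqs. 3-5)
    and the center update (Eq. 6) until the centers move less than eps
    (the Step 5 stopping criterion)."""
    rng = np.random.default_rng(seed)
    k, _ = X.shape
    V = X[rng.choice(k, size=c, replace=False)].copy()  # initial centers
    for _ in range(max_iter):
        # Squared Euclidean distances d_ij^2, shape (k, c)
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
        d2 = np.maximum(d2, 1e-12)                      # avoid division by zero
        # Fuzzy memberships, Eq. (4): each row sums to 1
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Possibilistic typicalities, Eq. (5)
        T = 1.0 / (1.0 + (b / gamma * d2) ** (1.0 / (eta - 1.0)))
        # Center update, Eq. (6)
        w = a * U ** m + b * T ** eta
        V_new = (w.T @ X) / w.sum(axis=0)[:, None]
        if np.linalg.norm(V_new - V) < eps:             # Step 5
            return U, T, V_new
        V = V_new
    return U, T, V
```

In the full LIPFCM loop, the memberships and centers returned here would be used to refresh the imputed entries as in Eq. (7) before the next pass.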

Table 1 Description of datasets

Name    No. of records    No. of attributes    No. of classes
Iris    150               4                    3
Glass   214               9                    2
Seed    210               7                    3
Wine    178               13                   3

4 Experimental Framework

4.1 Dataset Description

The performance of the proposed LIPFCM is assessed on four well-known, widely
used datasets: Iris, Glass, Seed, and Wine, obtained from the UCI machine learning
repository [29]. In addition, these datasets differ from each other in terms of size and
variables. A brief description of the datasets is presented in Table 1.

4.2 Missing Value Simulation

A specific portion of data from each dataset has been removed randomly to create
missing data for experimentation purposes. Different missing ratios of 1, 3, 5, 7, and
10% of missing values are injected into each original dataset for performance compar-
ison. The simulated missing values are then imputed using the imputation methods
mean, LI, KNNI, FKMI, LIFCM, and our proposed LIPFCM.

4.3 Evaluation Criteria

The performance of LIPFCM is compared utilizing two well-known evaluation
criteria, i.e., root mean squared error (RMSE) and mean absolute error (MAE). If $N$
is the number of artificially generated missing values, $A_i$ and $P_i$ ($1 \le i \le N$) are the
actual and predicted values of the $i$-th missing value, respectively.
RMSE and MAE measure the average difference between actual and predicted values
[30, 31] as given in Eqs. (8) and (9). These values range from 0 to ∞, where a lower
value denotes better imputation performance.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(P_i - A_i\right)^2}, \quad (8)$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|P_i - A_i\right|. \quad (9)$$
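Both criteria follow directly from Eqs. (8) and (9); the function names below are illustrative.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, Eq. (8)."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((p - a) ** 2)))

def mae(actual, predicted):
    """Mean absolute error, Eq. (9)."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(p - a)))

print(mae([1.0, 2.0], [1.0, 4.0]))  # 1.0
```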

5 Experimental Results and Discussion

We compare LIPFCM with five existing techniques, namely mean [6, 9], LI [10],
KNNI [32], FKMI [30], and LIFCM [27], for five different missing ratios. There are
altogether 20 missing combinations, generated by the combination of four datasets
on five different missing ratios.
The quantitative analysis of the performance of mean, LI, KNNI, FKMI, LIFCM,
and LIPFCM, on all datasets, based on RMSE and MAE for 20 missing combinations
are presented in Tables 2 and 3, respectively. Also, the best results among the six
techniques have been highlighted. The RMSE of LIPFCM with a missing ratio 1% is
0.127 for Iris dataset as displayed in Table 2. The RMSE of mean, LI, KNNI, FKMI,
and LIFCM with missing ratio 1% are 0.308, 0.263, 0.376, 0.248, and 0.178 for Iris
dataset, respectively. It can be observed that LIPFCM performs significantly better
than other techniques on all missing combinations for Iris, Glass, Seed, and Wine
datasets in terms of RMSE and MAE.
The graphical representation of RMSE and MAE for Iris, Glass, Seed, and Wine
dataset is illustrated in Figs. 2 and 3. It is clearly observed from Tables 2 and 3
and from Figs. 2 and 3 that the proposed LIPFCM MVI method gives better results
than the mean, LI, KNNI, FKMI, and LIFCM imputation methods examined on the
four datasets. It can be concluded from the experimental results that the proposed
method is more efficient for real-world datasets with different ratios of missing values.
Even though the experiments carried out on four datasets with different missing
ratios show the best results, this study still has some limitations: the proposed
algorithm is applied only to incomplete datasets with relatively low percentages of
missingness, and small datasets are utilized in the experiments.
Furthermore, the proposed technique may not be suitable for nonlinear datasets, and
the coincident cluster problem [12] may also persist while handling incomplete data.

6 Conclusion and Future Work

Missing values occur frequently in numerous datasets in data mining. Thus, MVI
methods are extensively employed in data mining to address missing values. In this
paper, a framework for imputing missing values is proposed with the hybridization of
linear interpolation and a fuzzy clustering technique called LIPFCM. The proposed
method has been compared with five other high-quality existing methods using four

Table 2 Comparison of performance of proposed method with other imputation methods on datasets in terms of RMSE

Dataset   Missing      Imputation technique1
          ratio (%)    Mean     LI       KNNI     FKMI     LIFCM    LIPFCM
Iris 1 0.308 0.263 0.376 0.248 0.178 0.127
3 0.301 0.389 0.313 0.309 0.408 0.245
5 0.600 0.550 0.620 0.527 0.448 0.409
7 0.608 0.469 0.486 0.636 0.556 0.393
10 0.720 0.616 0.629 0.520 0.594 0.417
Glass 1 0.084 0.083 0.050 0.100 0.052 0.029
3 0.080 0.094 0.116 0.084 0.074 0.060
5 0.106 0.146 0.162 0.109 0.120 0.079
7 0.181 0.170 0.187 0.147 0.109 0.097
10 0.192 0.171 0.198 0.138 0.117 0.116
Seed 1 0.126 0.119 0.105 0.084 0.079 0.046
3 0.103 0.118 0.093 0.158 0.135 0.069
5 0.114 0.105 0.106 0.150 0.110 0.042
7 0.122 0.102 0.117 0.113 0.133 0.102
10 0.148 0.117 0.157 0.142 0.150 0.117
Wine 1 0.165 0.191 0.177 0.143 0.142 0.142
3 0.213 0.194 0.220 0.204 0.227 0.182
5 0.091 0.130 0.244 0.095 0.123 0.085
7 0.095 0.177 0.230 0.106 0.176 0.094
10 0.108 0.174 0.237 0.135 0.187 0.097

publicly available real-world datasets for experimentation based on two evaluation
criteria: RMSE and MAE.
The experimental results show that LIPFCM method performs significantly better
than the other five state-of-the-art imputation algorithms based on RMSE and MAE.
The MAE of mean, LI, KNNI, FKMI, LIFCM, and LIPFCM with missing ratio
1% are 0.187, 0.143, 0.348, 0.148, 0.145, and 0.120 for Iris dataset, respectively.
Therefore, the proposed method is efficient in imputing missing values for all the
datasets. In the future, further research can be conducted to explore its suitability on
a more extensive range of datasets and applications to test its generalizability.

1 The best result among the six imputation methods is denoted by bold values in the table.

Table 3 Comparison of performance of proposed method with other imputation methods on datasets in terms of MAE

Dataset   Missing      Imputation technique
          ratio (%)    Mean     LI       KNNI     FKMI     LIFCM    LIPFCM
Iris 1 0.187 0.143 0.348 0.148 0.145 0.120
3 0.247 0.238 0.243 0.167 0.176 0.134
5 0.430 0.361 0.556 0.355 0.318 0.302
7 0.418 0.422 0.478 0.360 0.366 0.285
10 0.537 0.633 0.686 0.480 0.390 0.315
Glass 1 0.037 0.042 0.030 0.041 0.016 0.011
3 0.021 0.043 0.051 0.026 0.018 0.017
5 0.056 0.076 0.093 0.045 0.029 0.019
7 0.094 0.081 0.101 0.058 0.042 0.021
10 0.102 0.083 0.105 0.087 0.052 0.023
Seed 1 0.095 0.073 0.052 0.033 0.047 0.025
3 0.083 0.075 0.057 0.112 0.106 0.033
5 0.087 0.079 0.075 0.109 0.096 0.031
7 0.101 0.085 0.081 0.094 0.102 0.076
10 0.125 0.097 0.127 0.111 0.115 0.083
Wine 1 0.128 0.137 0.148 0.093 0.096 0.092
3 0.159 0.166 0.169 0.157 0.187 0.109
5 0.084 0.129 0.180 0.073 0.100 0.070
7 0.079 0.134 0.177 0.074 0.123 0.059
10 0.072 0.148 0.198 0.095 0.138 0.075

Fig. 2 Comparison of performance analysis of proposed imputation method with other imputation
methods on a Iris dataset, b Glass dataset, c Seed dataset, and d Wine dataset in terms of RMSE
(the lower the better)

Fig. 3 Comparison of performance analysis of proposed imputation method with other imputation
methods on a Iris dataset, b Glass dataset, c Seed dataset, and d Wine dataset in terms of MAE
(the lower the better)

References

1. Jyoti, Singh J, Gosain A (2022) Handling missing values using fuzzy clustering: a review. In: Inter-
national conference on innovations in data analytics 2022 Nov 29. Springer Nature Singapore,
Singapore, pp 341–353
2. Di Nuovo AG (2011) Missing data analysis with fuzzy C-Means: a study of its application in
a psychological scenario. Expert Syst Appl 38(6):6793–6797
3. Azim S, Aggarwal S (2014) Hybrid model for data imputation: using fuzzy c means and
multilayer perceptron. In: 2014 IEEE international advance computing conference (IACC)
2014 Feb 21. IEEE, pp 1281–1285
4. Zhang Y, Thorburn PJ (2022) Handling missing data in near real-time environmental
monitoring: a system and a review of selected methods. Futur Gener Comput Syst 1(128):63–72
5. Rani S, Solanki A (2021) Data imputation in wireless sensor network using deep learning tech-
niques. In: Data analytics and management: proceedings of ICDAM 2021. Springer Singapore,
pp 579–594
6. Rioux C, Little TD (2021) Missing data treatments in intervention studies: what was, what is,
and what should be. Int J Behav Dev 45(1):51–58
7. Kwak SK, Kim JH (2017) Statistical data preparation: management of missing values and
outliers. Korean J Anesthesiol 70(4):407–411
8. Goel S, Tushir M (2019) A semi-supervised clustering for incomplete data. In: Applications
of artificial intelligence techniques in engineering 2019. Springer, Singapore, pp 323–331
9. Nijman SW, Leeuwenberg AM, Beekers I, Verkouter I, Jacobs JJ, Bots ML, Asselbergs FW,
Moons KG, Debray TP (2022) Missing data is poorly handled and reported in prediction model
studies using machine learning: a literature review. J Clin Epidemiol 1(142):218–229
10. Noor MN, Yahaya AS, Ramli NA, Al Bakri AM (2014) Filling missing data using interpolation
methods: study on the effect of fitting distribution. Trans Tech Publications Ltd.
11. Huang G (2021) Missing data filling method based on linear interpolation and lightgbm. In:
Journal of physics: conference series , vol 1754, no 1. IOP Publishing, pp 012187
12. Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic fuzzy c-means clustering algorithm.
IEEE Trans Fuzzy Syst 13(4):517–530
13. Hasan MK, Alam MA, Roy S, Dutta A, Jawad MT, Das S (2021) Missing value imputation
affects the performance of machine learning: a review and analysis of the literature (2010–
2021). Inf Med Unlocked 1(27):100799
14. Gond VK, Dubey A, Rasool A (2021) A survey of machine learning-based approaches for
missing value imputation. In: 2021 third international conference on inventive research in
computing applications (ICIRCA). IEEE, pp 1–8
15. Das D, Nayak M, Pani SK (2019) Missing value imputation–a review. Int J Comput Sci Eng
7(4):548–558
16. Lin WC, Tsai CF (2020) Missing value imputation: a review and analysis of the literature
(2006–2017). Artif Intell Rev 53:1487–1509
17. Hathaway RJ, Bezdek JC (2001) Fuzzy c-means clustering of incomplete data. IEEE Trans
Syst Man, and Cybernet Part B (Cybernet) 31(5):735–744
18. Zhang DQ, Chen SC (2003) Clustering incomplete data using kernel-based fuzzy c-means
algorithm. Neural Process Lett 18(3):155–162
19. Saravanan P, Sailakshmi P (2015) Missing value imputation using fuzzy possibilistic c means
optimized with support vector regression and genetic algorithm. J Theoret Appl Inf Technol
72(1)
20. Furukawa T, Ohnishi SI, Yamanoi T (2013) A study on a fuzzy clustering for mixed numerical
and categorical incomplete data. In: 2013 International conference on fuzzy theory and its
applications (iFUZZY). IEEE, pp 425–428
21. Li D, Zhong C (2015) An attribute weighted fuzzy c-means algorithm for incomplete datasets
based on statistical imputation. In: 2015 7th international conference on intelligent human-
machine systems and cybernetics, vol 1. IEEE, pp 407–410

22. Mausor FH, Jaafar J, Taib SM (2020) Missing values imputation using fuzzy C means based on
correlation of variable. In: 2020 international conference on computational intelligence (ICCI)
2020 Oct 8, IEEE, pp 261–265
23. Hu Z, Bodyanskiy YV, Tyshchenko OK, Shafronenko A (2019) Fuzzy clustering of incomplete
data by means of similarity measures. In: 2019 IEEE 2nd Ukraine conference on electrical and
computer engineering (UKRCON), IEEE, pp 957–960
24. Sadiq Khan M, Al-Garadi MA, Wahab AW, Herawan T (2016) An alternative data filling
approach for prediction of missing data in soft sets (ADFIS). Springerplus 5(1):1–20
25. Kumaran SR, Othman MS, Yusuf LM, Yunianta A (2019) Estimation of missing values using
hybrid fuzzy clustering mean and majority vote for microarray data. Proced Comput Sci
1(163):145–153
26. Purandhar N, Ayyasamy S, Saravanakumar NM (2021) Clustering healthcare big data using
advanced and enhanced fuzzy C-means algorithm. Int J Commun Syst 34(1):e4629
27. Goel S, Tushir M (2021) Linear interpolation-based fuzzy clustering approach for missing data
handling. In: Advances in communication and computational technology: select proceedings
of ICACCT 2019 2021. Springer, Singapore, pp 597–604
28. Goel S, Tushir M (2020) A new iterative fuzzy clustering approach for incomplete data. J Stat
Manag Syst 23(1):91–102
29. Dua D, Graff C UCI machine learning repository http://archive.ics.uci.edu/ml
30. Li D, Deogun J, Spaulding W, Shuart B (2004) Towards missing data imputation: a study of
fuzzy k-means clustering method. In: International conference on rough sets and current trends
in computing. Springer, Berlin, Heidelberg, pp 573–579
31. Rahman MG, Islam MZ (2016) Missing value imputation using a fuzzy clustering-based EM
approach. Knowl Inf Syst 46(2):389–422
32. Beretta L, Santaniello A (2016) Nearest neighbor imputation algorithms: a critical evaluation.
BMC Med Inform Decis Mak 16(3):197–208
Experimental Analysis of Two-Wheeler
Headlight Illuminance Data
from the Perspective of Traffic Safety

Aditya Gola, Chandra Mohan Dharmapuri, Neelima Chakraborty, S. Velmurugan, and Vinod Karar

Abstract The goal of this study is to determine how headlight illumination affects
motorized two-wheeler visibility and safety on the road. Fifteen subjects were
considered in this study, each riding a different kind of two-wheeler, spanning a range
of lighting technologies and vehicle ages. The subjects were asked to rate the
visibility from their vehicles, and the study accordingly examined headlight
illumination for both vertical and horizontal light distributions at varied forward
distances. The study revealed that the age of the two-wheeler had a huge bearing on
the light output of the headlight, which can be attributed to a range of factors such as
the exterior polycarbonate cover becoming hazier with aging and handling, decreased
reflectivity, and misalignment of the light reflector/optics with respect to the light
source. Further, the technology of the headlight light source had a big impact on how
much light is produced; LED technology produces roughly three times as much light
as halogen-based technology. In terms of lux values, angular spread, and focusing
distances, there is significant variation in the light output measured across all 15
vehicles, pointing to workmanship issues with the headlight assembly, the light design
itself, the effect of headlight aging, or inconsistent headlight fitment on the
two-wheeler. These results shed light on such variable headlight
performance and the need for effective headlight technology, which can assist drivers,
automakers, and policymakers in enhancing road visibility and safety for motorized
two-wheeler vehicles.

A. Gola · C. M. Dharmapuri
G. B. Pant Government Engineering College, New Delhi, India
e-mail: chandra.m.dharmapuri@dseu.ac.in
N. Chakraborty · S. Velmurugan · V. Karar (B)
CSIR—Central Road Research Institute, New Delhi, India
e-mail: vinodkarar.crri@nic.in
N. Chakraborty
e-mail: neelima.crri@nic.in
S. Velmurugan
e-mail: vms.crri@nic.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 101
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_8
102 A. Gola et al.

Keywords Motorized two-wheelers · Headlight · Headlight illuminance · Light angular spread

1 Introduction

In recent years, the use of two-wheeled motorized vehicles has increased tremen-
dously in urban areas due to various reasons such as ease of use, lower cost, and
less space consumption. They are one of the most commonly used vehicles in cities,
particularly in developing countries. However, riding a two-wheeler can be risky due
to several factors, including poor road conditions, unpredictable weather, and reck-
less driving. Therefore, it is crucial for riders to take safety measures to reduce risk
of road crashes, associated fatalities, and serious injuries. One such safety feature
is the headlight, which plays a vital role in enhancing visibility and reducing the
likelihood of collisions.
One of the primary causes of road crashes involving two-wheeled vehicles is
the failure of other road users to see them. This is often due to the small size of
two-wheeled vehicles and the lack of proper lighting. Some of the purposes served
by two-wheeler headlights are increased visibility, improved situational awareness,
reduced blind spots, enhanced braking distance, and reduced risk of collisions.
The efficiency of the headlight in increasing visibility is greatly influenced by its intensity. Headlight intensity is expressed in lumens, which describe the total quantity of light the bulb emits: in general, the higher the lumens, the brighter the light and the better the visibility. Illuminance, on the other hand, refers to the amount of light that falls on a specific surface and is expressed in lux.
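Lux and lumens are linked through geometry: for an idealized point source radiating uniformly, the illuminance on a surface falls off with the square of the distance, E = I/d². The sketch below illustrates this relationship; it is a simplified model, not the measurement procedure of this study, and real headlight reflectors focus light so their fall-off deviates from the ideal.

```python
# Illustrative sketch (not from this study): for an idealized point source,
# illuminance E (lux) at distance d (metres) from a source of luminous
# intensity I (candela) follows the inverse-square law E = I / d**2.

def illuminance_from_point_source(intensity_cd: float, distance_m: float) -> float:
    """Illuminance in lux at a given distance from an idealized point source."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return intensity_cd / distance_m ** 2

# A hypothetical 400 cd source gives 25 lx at 4 m but only 4 lx at 10 m,
# mirroring the fall-off with forward distance discussed later in the paper.
print(illuminance_from_point_source(400, 4))   # 25.0
print(illuminance_from_point_source(400, 10))  # 4.0
```

The factor-of-six drop between 4 m and 10 m in this idealized model is why forward distance is a central variable in the measurement grid used below.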
In low-light situations, such as at dawn or dusk or in poorly lit places, a brighter
headlight can increase visibility. Employing high beam headlights in well-lit regions
can result in glare and reduce other road users’ vision. On the other hand, employing
low beam headlights in dimly lit regions might reduce the rider’s sight and raise the
danger of road crashes. A higher illuminance means that more light is falling on
the road surface, which might aid riders in spotting potential hazards like potholes,
debris, or animals.
Along with brightness and intensity, the headlight’s direction is equally important
for improving visibility and situational awareness. To make sure that the headlights
are pointing in the proper direction, they must be properly positioned. Incorrectly
positioned headlights can produce glare, reduce visibility, and result in road crashes.
The distance, spread, and field of view to which the headlight of a motorized two-
wheeler should focus on the road can vary depending on various factors such as the
type of headlight, the design of the motorcycle, and the intended use of the vehicle.
In terms of spread and field of view, the headlight needs to be designed such that it
illuminates a wide enough area to provide good visibility for the rider. The spread
and field of view depend on the type of headlight and the design of the motorcycle,
Experimental Analysis of Two-Wheeler Headlight Illuminance Data … 103

but it should be sufficient to allow the rider to see the road and any potential hazards
on either side of the vehicle and should not cause any inconvenience or danger to
other road users.
The primary objective of this study is to investigate the correlation between head-
light illumination and the visibility and safety of motorized two-wheeler vehicles
on the road. Through an experimental study, encompassing a diverse group of two-
wheeler models with varying lighting technologies and ages, we aim to study the
intricate relationship between headlight illumination and its impact on vehicle visi-
bility and safety. The paper is organized in the following way: Sect. 2 covers the
“Literature review”, Sect. 3 covers the Methodology, while Sects. 4 and 5 cover
“Results and Discussions” and “Conclusion and Future Scope”, respectively.

2 Literature Review

In developing nations, motorized two-wheelers (MTWs) are a common means of transportation, but they also present a substantial risk to passengers and other road
users [1]. The “Look But Fail to See” error has been linked to one of the most frequent forms of road crash involving MTWs: the failure of another road user to yield to
an approaching motorbike on the main roadway when exiting from a side road [2,
3]. In developing nations, motorbike injuries are a serious but under-reported rising
public health issue that considerably contributes to overall traffic injuries [1].
The causes of fatal motorbike crashes have been the subject of several research studies [4,
5]. The prevalence and pattern of motorbike crashes among commercial motorcyclists
in Adidome town were high, according to Sugiyanto’s analysis of the incidence of
motorbike crashes among these riders [6]. Additionally, the prevalence of protective measures and motorbike road crashes in Ado-Odo Ota, Ogun State, Nigeria, was assessed, and inadequate compliance with preventive road safety measures was observed [7].
When compared to riders of average weight, obese motorcyclists experience
different types of physical injuries and lengthier hospital stays. In order to mini-
mize injuries, shorten hospital stays, prevent physical handicap, and save societal
expenses by lowering the need for institutional care, safety measures including the
usage of suitable helmets and clothing are crucial [8].
Low motorcycle conspicuity, or the rider’s inability to be noticed by other road
users, is regarded as a significant risk factor for motorcycle crashes, according
to the literature. Low conspicuity may be caused by the size of the motorcycle, an
irregular contour, low brightness, or backdrop contrast [9]. Therefore, raising the
motorized two-wheeler’s lamp intensity can increase its visibility and lower the
likelihood of road crashes. A retrofit for managing an automobile headlight illumi-
nance has also been found to lessen glare and enhance the judgment of approaching
traffic. It is crucial to remember that giving the wrong signal to neighboring vehicles
can raise the likelihood of deadly collisions. As a result, using intelligent headlamp
intensity management systems can help to increase traffic safety [10, 11].
One study examined, both with and without street lighting, the contribution of car headlights to the visibility of the road and of targets located on it. It noted that street lighting alone was sufficient to ensure appropriate target visibility, and that using car headlights did not always increase target visibility on the road; instead, the glare from cross-beam headlights affected drivers [12]. In order to increase road safety, it is crucial to take other road users’ reactions
into account when deciding how bright to make headlights. Furthermore, head injuries are a leading cause of disability, and fatalities in motorbike road crashes are a major concern. Therefore, wearing the correct clothes and helmets is vital
for preventing injuries, shortening hospital stays, preventing physical handicap, and
saving money on social expenditures by lowering the need for institutional care [13].
Daytime running headlights have been shown to increase motorbike detection and lower the probability of road crashes and injuries, according to one study [14]. According to another study, the angle of the lights themselves may be altered to increase the visibility of the road and of targets placed there [15]. Additionally, a study assessed how different high beam lighting intensities affected driver visibility and traffic safety, evaluating headlight glare at distances of 30, 60, 120, and 150 m from the driven car [14]. In addition, a study discovered that using high beam headlights can improve
pedestrian conspicuity by enhancing visibility and illumination. To increase road
safety, it is crucial to take other road user reactions into account and deploy intel-
ligent headlamp intensity management systems [16, 17]. According to the findings
of the rule discovery process, while having a full license, daylight, and the presence
of shoulders increased the risk of fatal injuries at signalized intersections, inatten-
tiveness, a good road surface, nighttime, the absence of shoulders, and young riders
were highly likely to increase casualty fatalities at non-signalized intersections [18].

3 Methodology

The purpose of this study is to assess the safety aspects with respect to specifications
of headlight assembly, its modes of operation, age of the vehicle, and functional
status of the headlights.
The efficiency of a two-wheeler headlight is significantly influenced by the beam
angle profile and the brightness of the low and high beam operational modes. The low
beam mode is intended to offer a wide, even beam pattern that lights the road ahead without dazzling oncoming vehicles with glare. To comply with regulations, the beam angle is commonly fixed at 15° [19]. The high beam, on the other hand, has a narrower, longer-reaching beam pattern and is intended to offer maximum illumination in low-light or dark environments. However, if used
carelessly, the high beam light can irritate other motorists [17]. Higher headlamp
illuminance levels have been shown in tests to increase driver visibility and road
safety, making them another crucial consideration [14]. The optical layout of the
headlight system, which may include digital micromirrors to refocus incident light
at a certain angle, determines the form of the beam pattern. To balance the demand for
visibility with the need to lessen glare for other drivers, the headlight system beam
angle profile and illumination must be properly engineered. High beam lights are
directed upward to provide the most illumination, while low beam lights are angled
downward to reduce glare. According to studies, high beams help riders spot possible hazards and prevent collisions, but they can also irritate
and blind other drivers. Hence, this study was conducted to examine the operational
characteristics of two-wheeler headlights in both low and high beam modes, taking
into account various factors such as forward distance, light spread, vehicle age, and
headlight technology.

3.1 Methodology

• Research design: This study uses a mixed-method research design, which includes both qualitative and quantitative methods. The qualitative methods include literature reviews to gather information on the safety aspects of headlights and a questionnaire answered by the subjects. The quantitative methods encompass an observational study of the usage of headlights by two-wheeled motorized vehicle riders on city roads.
• Sample selection: The observational study is conducted on a sample of 15 two-wheeled motorized vehicle riders on city roads.
• Data collection: Data collection involves the two-wheelers, their riders, and a lux meter to measure headlight illuminance. Headlight illuminance is measured in low beam and high beam operation at forward distances of 4, 6, 8, and 10 m, at vertical heights of 0 and 1 m from the ground, and at horizontal spread positions of 0 m, 2 m left, and 2 m right of the approximate center of the headlight.
• Questionnaire to riders.
• Data analysis.
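The measurement grid described above (four forward distances, two heights, three lateral positions, and two beam modes) can be enumerated programmatically when organizing the readings. The sketch below is illustrative; the constant and function names are ours, not the paper's.

```python
from itertools import product

# Hypothetical encoding of the measurement grid described above.
FORWARD_DISTANCES_M = (4, 6, 8, 10)
HEIGHTS_M = (0, 1)
LATERAL_POSITIONS_M = (-2, 0, 2)   # 2 m left, centre, 2 m right
BEAM_MODES = ("low", "high")

def measurement_points():
    """Yield every (distance, height, lateral offset, beam mode) combination."""
    yield from product(FORWARD_DISTANCES_M, HEIGHTS_M,
                       LATERAL_POSITIONS_M, BEAM_MODES)

points = list(measurement_points())
print(len(points))  # 4 * 2 * 3 * 2 = 48 lux readings per vehicle
```

Across the 15 vehicles in the sample, such a grid would yield 720 lux readings, which is why a systematic enumeration helps when tabulating the data.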

3.2 Experimental Setup

The experimental setup shown in Fig. 1 comprises the two-wheelers, a lux meter, and a stand on which the lux meter is mounted and its height varied so that illuminance can be measured at different heights.
Fig. 1 Experimental setup

4 Results and Discussion

The subjective data obtained through the questionnaire filled in by the two-wheeler riders are listed in Table 1.
The results are depicted graphically in Figs. 2, 3, and 4.
From Fig. 2a, which shows the illuminance profile of two-wheeler headlights at
horizontal center at forward distances of 4, 6, 8, and 10 m at height 0 m from the
ground for low beam mode, it is seen that the headlight illuminance level goes down
as we move forward from 4 to 10 m for vehicles nos. 1, 2, 5, 8, 11, 12, and 14, while
it goes up as we move forward from 4 to 10 m for vehicle nos. 7, 9, 10, and 15. It
also goes up for vehicle nos. 3, 4, and 6 with exception at forward distances of 6 m,
8 m, and 10 m, respectively. Vehicle 13 did not have a working low beam mode.
From Fig. 2b, which shows the illuminance profile of two-wheeler headlights at
horizontal center at forward distances of 4 m, 6 m, 8 m, and 10 m at height 1 m from
the ground for low beam mode, it is seen that headlight illuminance level goes down
as we move forward from 4 to 10 m for vehicles nos. 1, 3, 4, 6, 7, 8, 9, 10, 11, 12, 14,
and 15. It also goes down for vehicle nos. 2 and 5 with exception at forward distance
of 6 m.
From Fig. 2c, which shows the illuminance profile of two-wheeler headlights at
horizontal center at forward distances of 4, 6, 8, and 10 m at height 0 m from the
ground for high beam mode, it is seen that headlight illuminance level goes down as
we move forward from 4 to 10 m for vehicles nos. 1, 2, 3, 5, 6, 8, 10, 11, 12, and 14
while it goes up for vehicle 4. The illuminance value goes down for vehicles 7, 9,
13, and 15 with exception at forward distance of 6 m.
From Fig. 2d, which shows the illuminance profile of two-wheeler headlights at
horizontal center at forward distances of 4, 6, 8, and 10 m at height 1 m from the
ground for high beam mode, it is seen that headlight illuminance level goes down as
we move forward from 4 to 10 m for all the vehicles.
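The per-vehicle trend reading applied above (whether illuminance falls or rises as forward distance increases, or deviates at a single distance) can be automated with a small helper. The sample readings below are invented for illustration and are not measured values from this study.

```python
def trend(readings):
    """Classify a sequence of lux readings taken at increasing forward
    distances as 'decreasing', 'increasing', or 'mixed'."""
    diffs = [b - a for a, b in zip(readings, readings[1:])]
    if all(d < 0 for d in diffs):
        return "decreasing"
    if all(d > 0 for d in diffs):
        return "increasing"
    return "mixed"

# Hypothetical readings at forward distances of 4, 6, 8, and 10 m:
print(trend([55.9, 40.2, 28.7, 20.1]))  # decreasing
print(trend([13.5, 15.8, 18.2, 21.0]))  # increasing
print(trend([30.0, 34.0, 27.0, 22.0]))  # mixed (exception at 6 m)
```

A "mixed" result corresponds to the exceptions noted above, where a vehicle's illuminance deviates from the overall trend at one particular distance.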
From Table 1 (which also provides subjective evaluation), Figs. 3a, b, it is seen that
range of headlight illuminance varies from 13.5 lx (8-year-old vehicle with halogen
headlight) to 55.9 lx (2-year-old vehicle with LED headlight) at measurement height
of 0 m from the ground in low beam mode, while it varies from 8.15 lx (11-year-old
Table 1 Subjective data through questionnaire obtained about the two-wheeler vehicle type/model,
vehicle and rider age, light source, original/replacement fitment, headlight working status, and light
source used in the headlight
Rider no. and age of rider (in years) | Vehicle type | Two-wheeler model | Vehicle age (Y: years, M: months) | Light: original or replaced | Is road visibility with current headlight good? | Working status of high and low modes of headlight | Headlight light source
Rider-1 | Bike | Hero Splendor i3s | 3Y 5M | Original | Yes | Both working | Halogen
Rider-2 | Bike | Hero Splendor Pro 2015 BS3 | 7Y 11M | Original | Yes | Both working | Halogen
Rider-3 | Scooter | Activa 6G STD | 11M | Original | Yes | Both working | Halogen
Rider-4 | Bike | Honda Sp125-Disc | 2Y 4M | Original | Yes | Both working | LED
Rider-5 | Bike | Hero CD Deluxe | 11Y | Original | Yes | Both working | Halogen
Rider-6 | Bike | TVS Redeon | 2Y 9M | Original | Yes | Both working | Halogen
Rider-7 | Bike | Hero Splendor i3S | 8M | Original | Yes | Both working | Halogen
Rider-8 | Bike | Hero Splendor Pro | 11Y 1M | Original | Yes | Both working | Halogen
Rider-9 | Bike | Hero HF Deluxe | 2Y 10M | Original | Yes | Both working | Halogen
Rider-10 | Bike | Hero Passion X Pro | 8Y 1M | Original | Yes | Both working | Halogen
Rider-11 | Scooter | Honda Activa 5G | 4Y 4M | Original | Yes | Both working | LED
Rider-12 | Bike | Super Splendor | 5Y | Original | Yes | Both working | Halogen
Rider-13 | Bike | Bajaj Platina 100 BS3 | 16Y | Original | Yes | Only high beam working | Halogen
Rider-14 | Bike | Hero Splendor + | 6Y 5M | Original | Yes | Both working | Halogen
Rider-15 | Bike | Hero Splendor + Xtec BSVI | 4M | Original | Yes | Both working | Halogen
Fig. 2 Illuminance profile of two-wheeler headlights at horizontal center at forward distances of 4, 6, 8, and 10 m: a at height 0 m in low beam mode, b at height 0 m in high beam mode, c at height 1 m in low beam mode, d at height 1 m in high beam mode

vehicle with halogen headlight) to 62.4 lx (2-year-old vehicle with LED headlight)
in high beam mode. At this height, the headlight illuminance value in low beam
mode is approximately 3 times higher at horizontal center than at left and right,
while illuminance value in high beam mode is approximately 2–3 times higher at
horizontal center than at left and right of the horizontal center.
The range of headlight illuminance varies from 13.75 lx (7-year-old vehicle with halogen headlight) to 193.31 lx (3-year-old vehicle with halogen headlight) in low beam mode for a height of 1 m from the ground, while it varies from 17.3 lx (7-year-old vehicle with halogen headlight) to 714 lx (4-year-old vehicle with LED headlight) in high beam mode. At this height, the headlight illuminance value in low beam mode is approximately 25 times higher at the horizontal center than at left and right of the horizontal center, while the illuminance value in high beam mode is approximately 35 times higher at the horizontal center than at left and right of the horizontal center.
It is also seen that at forward distance of 4 m, the intensity levels at left and right
positions from the horizontal center are almost symmetrical in most of the vehicles.
Age of the vehicle also plays a major role, chiefly because the outer polycarbonate cover becomes scratched and hazy with aging and handling. Vehicles nos. 5, 8, and 10 are more than 8 years old, and hence their headlight intensity has gone down due to the outer polycarbonate cover becoming hazy. Vehicle no. 3 is a new scooter with a halogen headlight; its light output measured at a height of 0 m is low but high at a height of 1 m because the light is designed to focus at a certain distance. The
Fig. 3 Illuminance profile of two-wheeler headlights at three horizontal positions at a forward distance of: a 4 m in low and high beams at height 0 m from the ground, b 4 m in low and high beams at height 1 m from the ground, c 6 m in low and high beams at height 0 m from the ground, d 6 m in low and high beams at height 1 m from the ground

headlight intensity for vehicle no. 4, a bike with an LED-based headlight, is quite high at 0 m height in both low and high beam modes but very low at 1 m height because the light is designed to focus at a certain distance. Vehicle no.
11, with headlight based on LED light source, has a very high light output measured
at a height of 1 m. One common observation is that the light output measured at
horizontal center is significantly more compared to the values measured at left and
right of the horizontal center. The light output measured for vehicle no. 7, which is just an 8-month-old bike with a halogen-based headlight, is considerably lower than that of the vehicles with LED headlights.
It is observed from Figs. 3c, d that the range of headlight illuminance varies from
12.02 lx (6-year-old vehicle with halogen headlight) to 58.3 lx (2-year-old vehicle
with LED headlight) for measurement height of 0 m from the ground, while it varies
from 6.95 lx (11-year-old vehicle with halogen headlight) to 29 lx (2-year-old vehicle
with LED headlight). At this height, the headlight illuminance value in low beam
mode is approximately 5 times higher at horizontal center than at left and right of
the horizontal center. The illuminance value in high beam mode is approximately 2
times higher at horizontal center than at left and right of the horizontal center.
Fig. 4 Illuminance profile of two-wheeler headlights at three horizontal positions at a forward distance of: a 8 m in low and high beams at height 0 m from the ground, b 8 m in low and high beams at height 1 m from the ground, c 10 m in low and high beams at height 0 m from the ground, d 10 m in low and high beams at height 1 m from the ground

The range of headlight illuminance varies from 6.92 lx (5-year-old vehicle with
halogen headlight) to 185.17 lx (11-year-old vehicle with halogen headlight) for
measurement height of 1 m from the ground, while it varies from 7.6 lx (7-year-old
vehicle with halogen headlight) to 372 lx (4-year-old vehicle with LED headlight).
At this height, the headlight illuminance value in low beam mode is approximately
10 times higher at horizontal center than at left and right of the horizontal center. The
illuminance value in high beam mode is approximately 15 times higher at horizontal
center than at left and right of the horizontal center. The vehicle 4 shows high illu-
minance levels at all regions at height of 0 m, while vehicle 11 provides maximum
illuminance levels among all vehicles at a height of 1 m. At 6 m forward distance and 1 m height, the illuminance levels in low beam mode are better than those at 4 m forward distance.
It is seen from Fig. 4a, b that the observed range of headlight illuminance varies
from 11.44 lx (11-year-old vehicle with halogen headlight) to 82.6 lx (2-year-old
vehicle with LED headlight) for measurement height of 0 m from the ground. These
values vary from 6.71 lx (11-month-old vehicle with halogen headlight) to 98.6 lx
(2-year-old vehicle with LED headlight). At this height, the headlight illuminance
value in low beam is approximately 2 times higher at the horizontal center than at
left and right of the horizontal center, while in high beam, it is approximately 2 times
higher at horizontal center than at left and right of the horizontal center.
The range of measured headlight illuminance varies from 3.35 lx (7-year-old vehicle with halogen headlight) to 59.13 lx (11-year-old vehicle with halogen headlight) for a height of 1 m from the ground. This value varies from 5.15 lx (7-year-old vehicle with halogen headlight) to 198.2 lx (4-year-old vehicle with LED headlight). At this height, the headlight illuminance value in low beam mode is approximately 5 times higher at the horizontal center than at left and right of the horizontal center, while in high beam mode it is approximately 7 times
higher at horizontal center than at left and right of the horizontal center. The vehicle
4 shows maximum illuminance value in all regions at height of 0 m, while vehicle
11 shows maximum illuminance value at high beam at height of 1 m.
It is seen from Figs. 4c, d that the range of measured headlight illuminance varies from 8.39 lx (5-year-old vehicle with halogen headlight) to 63.7 lx (2-year-old vehicle with LED headlight) for a height of 0 m from the ground, while it varies from 7.5 lx (11-month-old vehicle with halogen headlight) to 92.4 lx (2-year-old
vehicle with LED headlight). At this height, the headlight illuminance value in low
beam is approximately 2 times higher at horizontal center than at left and right of the
horizontal center, while measured illuminance value in high beam is approximately
2 times higher at horizontal center than at left and right of the horizontal center.
The range of measured headlight illuminance varies from 4.01 lx (7-year-old vehicle with halogen headlight) to 36.1 lx (8-year-old vehicle with halogen headlight) for a height of 1 m from the ground, while it varies from 5.14 lx (7-year-old vehicle
with halogen headlight) to 132.9 lx (4-year-old vehicle with LED headlight). At this
height, the headlight illuminance value in low beam is approximately 3 times higher
at horizontal center than at left and right of the horizontal center, while measured
illuminance value in high beam is approximately 4 times higher at horizontal center
than at left and right of the horizontal center.
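The centre-to-side comparisons quoted throughout this section reduce to the ratio of the centre reading to the mean of the two side readings. The helper below is an illustrative sketch using hypothetical readings, not values from Table 1 or the figures.

```python
def center_to_side_ratio(left_lx: float, center_lx: float, right_lx: float) -> float:
    """Ratio of the centre illuminance to the mean of the two side readings."""
    side_mean = (left_lx + right_lx) / 2
    if side_mean == 0:
        raise ValueError("side readings must not both be zero")
    return center_lx / side_mean

# Hypothetical readings: 12 lx at 2 m left, 36 lx at centre, 12 lx at 2 m right
print(center_to_side_ratio(12.0, 36.0, 12.0))  # 3.0
```

Ratios near 1 would indicate a wide, even lateral spread, while the large ratios reported above indicate that most of the light is concentrated at the horizontal centre.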
Broadly, it is observed that the age of a two-wheeler has an impact on the light output of the headlight, chiefly because the outer polycarbonate cover becomes hazier with aging and handling, the reflectivity of the reflector goes down, and the reflector becomes misaligned with respect to the light source and its mechanical fitment in the bike. The headlight light source technology plays a significant role in light output, with LED technology giving almost three times the light output of halogen-based technology. There is significant variation in the measured light output in terms of illuminance values in lux, angular spread, and focusing distances, pointing either to workmanship issues in the headlight assembly or to inconsistency in its fitment in the two-wheeler. The paper introduces an innovative perspective of
the research by examining the effects of headlight penetration in both low and high
beam operational modes on various motorized two-wheelers with different ages and
headlight technologies. Further exploration is needed to delve into the implications
of headlamp technology on both fellow drivers and the surrounding environment.
This necessitates conducting extensive investigations encompassing longer forward
distances, larger spatial gaps, wider horizontal angular coverage, varying heights,
and diverse ambient light conditions.

5 Conclusion and Future Scope

The significance of headlamp lighting in ensuring the visibility and safety of motor-
ized two-wheeler vehicles on the road has been highlighted in this study. The results
of the study demonstrate the important influence of headlamp technology and bike
age on light output, angular spread, and illuminance values. The necessity for effective headlamp technology is highlighted by the finding that LED technology produces nearly three times as much light as halogen-based technology. The study also found sizable variance in light output among the 15 vehicles, which may stem from poor workmanship in the headlamp assembly, the light design, aging of the headlights, or inconsistent fitment of the headlight into the two-wheeler. The
findings of the study emphasize the significance of precise assembly and uniform
headlamp placement on two-wheelers. The results of this study can help improve road
visibility and safety for motorized two-wheeler vehicles for drivers, automakers, and
policymakers. The findings of the study have important ramifications for both public
health and traffic safety, highlighting the necessity of effective headlight technology
to lower collision rates, particularly in low-light conditions. The effects of headlamp
technology on other drivers and the surroundings need to be further investigated at
longer forward ranges with larger distances, at wider horizontal angular spread and
different heights, and under various ambient light conditions.

References

1. Hassan O, Shaker R, Eldesouky R, Hasan O, Bayomy H (2014) Motorcycle crashes: attitudes of the motorcyclists regarding riders’ experience and safety measures. J Community Health
39. https://doi.org/10.1007/s10900-014-9883-1
2. Lee YM, Sheppard E, Crundall D (2015) Cross-cultural effects on the perception and appraisal
of approaching motorcycles at junctions. Transp Res Part F: Traffic Psychol Behav 31:77–86,
ISSN 1369-8478. https://doi.org/10.1016/j.trf.2015.03.013
3. Brown ID (2002) A review of the ‘looked but failed to see’ accident causation factor. Psychology
4. Soehodho S (2017) Public transportation development and traffic accident prevention in
Indonesia. IATSS Research 40:76–80
5. Suthanaya PA (2016) Analysis of fatal accidents involving motorcycles in low income region
(Case Study of Karangasem Region, Bali-Indonesia). Int J Eng Res Afr 19:112–122
6. Konlan KD, Doat AR, Mohammed I, Amoah RM, Saah JA, Konlan KD, Abdulai JA (2020)
Prevalence and pattern of road traffic accidents among commercial motorcyclists in the Central
Tongu District, Ghana. Sci World J 2020:10, Article ID 9493718. https://doi.org/10.1155/2020/
9493718
7. Afelumo OL, Abiodun OP, Sanni F (2021) Prevalence of protective measures and accident
among motorcycle riders with road safety compliance in a Nigerian semi-urban community.
Int J Occup. Safety Health 11:129–138. https://doi.org/10.3126/ijosh.v11i3.39764
8. Oliveira A, Petroianu A, Gonçalves D, Pereira G, Alberti L (2015) Characteristics of motorcyclists involved in accidents between motorcycles and automobiles. Rev Assoc Med Bras
1992(61):61–64. https://doi.org/10.1590/1806-9282.61.01.061
9. Wells S, Mullin B, Norton R, Langley J, Connor J, Lay-Yee R, Jackson R (2004) Motorcycle
rider conspicuity and crash related injury: case-control study. BMJ (Clinical research ed.)
328:857. https://doi.org/10.1136/bmj.37984.574757.EE
10. Sukumaran A, Narayanan P (2019) A retrofit for controlling the brightness of an automotive
headlight to reduce glare by using embedded C program on a PIC Microcontroller. Int J Recent
Technol Eng 8:4240–4244. https://doi.org/10.35940/ijrte.C5150.098319
11. Vrabel J, Stopka O, Palo J, Stopkova M, Droździel P, Michalsky M (2023) Research regarding different types of headlights on selected passenger vehicles when using sensor-related equipment. Sensors 23(4):1978. https://doi.org/10.3390/s23041978
12. Bacelar A (2004) The contribution of vehicle lights in urban and peripheral urban environments.
Light Res Technol 36(1):69–76. https://doi.org/10.1191/1477153504li105oa
13. Yousif MT, Sadullah AFM, Kassim KAA (2020) A review of behavioural issues contribution to
motorcycle safety. IATSS Research 44(2):142–154, ISSN 0386-1112. https://doi.org/10.1016/
j.iatssr.2019.12.001
14. Prasetijo J, Jawi ZM, Mustafa M, Zadie Z, Majid H, Roslan M, Baba I, Zulkifli AFH (2018)
Impacts of various high beam headlight intensities on driver visibility and road safety. J Soc
Automot Eng Malaysia 2:306–314. https://doi.org/10.56381/jsaem.v2i3.96
15. Chhirolya V, Sachdeva P, Gudipalli A (2019) Design of a modular beam control system for
vehicles. Int J Smart Sens Intell Syst 12:1–6. https://doi.org/10.21307/ijssis-2019-008
16. Sewall A, Borzendowski S, Tyrrell R, Stephens B, Rosopa P (2016) Observers’ judgments of the
effects of glare on their visual acuity for high and low contrast stimuli. Perception 45. https://
doi.org/10.1177/0301006616633591
17. Balk SA, Tyrrell RA (2011) The (in)accuracy of estimations of our own visual acuity in the
presence of glare. Proc Hum Factors Ergon Soc Ann Meet 55(1):1210–1214. https://doi.org/
10.1177/1071181311551252
18. Tamakloe R, Das S, Aidoo EN, Park D (2022) Factors affecting motorcycle crash casualty
severity at signalized and non-signalized intersections in Ghana: insights from a data mining
and binary logit regression approach. Accid Anal Prev 165:106517, ISSN 0001-4575. https://
doi.org/10.1016/j.aap.2021.106517
19. Tsai CM, Fang YC (2011) Optical design of adaptive automotive headlight system with digital
micro-mirror device. Proceedings SPIE 8170, illumination optics II, 81700A 21 Sept 2011.
https://doi.org/10.1117/12.896394
Detecto: The Phishing Website Detection

Ashish Prajapati, Jyoti Kukade, Akshat Shukla, Atharva Jhawar, Amit Dhakad, Trapti Mishra, and Rahul Singh Pawar

Abstract Phishing attacks are among the most prevalent types of cybercrime that
target people and businesses globally. Phishing websites mimic real websites to
obtain sensitive data of users like usernames, passwords, and credit card numbers.
To identify phishing websites, many people employ machine learning algorithms.
These algorithms use supervised learning techniques to classify websites into the
phishing or legitimate categories. Machine learning algorithms use features such as
URL length, domain age, SSL certificate, and content similarity to determine whether
a URL is real or fake. In recent years, authors have published papers on classifying websites from such features using a support vector machine, achieving 95% accuracy, and on classifying phishing websites using a URL identification strategy or the random forest algorithm. The dataset contains a collection of URLs of 11,000+ websites. Each has 30 parameters and a class label identifying it as a phishing website or not. To achieve the highest level of accuracy, we
suggested a model using 32 features extracted from phishing websites and various
machine learning classifiers. Every website has distinct characteristics that are cate-
gorized by trained models. We achieved 97.4% accuracy using 7 classifiers, including
Naïve Bayes, logistic regression, random forest, decision tree, and gradient boosting
algorithm.

A. Prajapati (B) · J. Kukade · A. Shukla · A. Jhawar · A. Dhakad · T. Mishra · R. S. Pawar
Medi-Caps University, Indore, India
e-mail: Aashishpra249@gmail.com
J. Kukade
e-mail: Jyoti.kukade@medicaps.ac.in
A. Jhawar
e-mail: en19it301024@medicaps.ac.in
A. Dhakad
e-mail: en19it301015@medicaps.ac.in
T. Mishra
e-mail: Trapti.mishra@medicaps.ac.in
R. S. Pawar
e-mail: rahuls.pawar@medicaps.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_9
Keywords Phishing · Legitimate · Machine learning · Cybercrime · Supervised learning

1 Introduction

Phishing attacks are serious hazards to individuals, businesses, and governments since they are a common and sophisticated threat in the digital world. The usage
of the internet increases security threats and cybercrimes [1]. Phishing is a fraudu-
lent activity where cyber criminals create fake websites that mimic legitimate ones,
intending to steal personal information like banking or social media ID passwords,
credit card numbers, or other personal data. Phishing attacks pose a serious risk
to both people and companies, and cybersecurity must find them. Machine learning
techniques have shown promising results in detecting phishing websites. These tech-
niques involve training models on large datasets of phishing and legitimate websites
and then using the models to classify new websites as either phishing or legitimate.
When a person falls for the scam by trusting the fake page, the phisher
succeeds. In recent studies, researchers have focused on phishing attempts to
prevent harm to unsuspecting web users [2]. Social networking, communication,
finance, marketing, and service delivery have all undergone revolutionary changes
as a result of the internet, and an increasing number of people use these facilities.
As communication technology develops to meet human needs, however, adversaries
also develop new ways to subvert it, deceiving users with malicious software or
phishing websites to steal crucial information. One of the deceptive methods used in
the online world is phishing. The scammer sends a lure that looks like a real
website and waits for users to fall victim. A common phishing attack tactic uses
a phishing website to trick people into visiting fraudulent websites by mimicking
the domain and designs of trustworthy websites like Flipkart, SBI, and Amazon [3].
Some common features that can be used to train these models include URL length,
presence of subdomains, use of HTTP or HTTPS, presence of certain keywords or
phrases in the URL, and characteristics of the website’s content. Phishing is an illegal
attempt made by attackers to obtain users’ personal information by setting up phony
websites. Users who submit transaction details, user IDs, passwords, and the like
into these false websites run the risk of that information being misused by the
attacker, which might result in loss of money and personal data. As a result of
ongoing technological advancements and the substantial daily influx of data handled
by businesses, numerous online enterprises, including those in the financial sector,
are suffering reputational damage from the proliferation of fraudulent websites.
It would be incredibly advantageous for everyone if these websites were detected
early on. Due to the dynamic nature of phishing efforts, there is no single approach
for eliminating phishing; hence, more efficient and improved methods for detecting
it are required. According to
Detecto: The Phishing Website Detection 117

the literature review, the majority of current machine learning techniques have flaws
such as a high false-alarm rate, a low detection rate, and classification models and
some hybridized techniques that fail to deliver highly effective and efficient
detection of phishing sites [4]. Because phishing sites are built with so many cutting-
edge methods, finding them is challenging. Although many methods have
been proposed for identifying such websites, many of them fall short of 100%
accuracy, and several new phishing websites may be created in a matter of minutes.
Machine learning-based classifiers can maintain their resistance to zero-hour
phishing attempts while achieving very accurate classification [5]. Overall, the
development of effective phishing detection techniques is an important step toward
enhancing online security, and the ongoing research in this area is expected to lead to
even more sophisticated and accurate methods for detecting and preventing phishing
attacks.
Main contributions of this study:
• We propose a system using 32 features extracted from website URLs that detects
phishing websites with high precision and an accuracy of 97.40%, maintaining
high precision even on simple websites.
• We introduce age of domain, IFrame redirection, and disabling of right-click as
extra features to classify websites as legitimate or phishing.
• We propose a novel approach to ensemble machine learning that uses
multi-threading to execute ensemble-based machine learning models in
parallel. Parallel processing throughout the training and testing phases speeds up
the procedure, making it possible to identify phishing URLs instantly.
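The multi-threaded ensemble idea can be sketched with the standard library alone. The three predict functions below are hypothetical stand-ins for trained classifiers (their thresholds are illustrative, not the paper's); only the threading and majority-voting plumbing reflects the described approach.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

# Hypothetical stand-ins for trained classifiers; each maps a feature
# dict to a label (1 = phishing, 0 = legitimate). Thresholds are made up.
def rf_predict(x):  return 1 if x["url_length"] > 54 else 0
def dt_predict(x):  return 1 if x["has_ip"] else 0
def gb_predict(x):  return 1 if x["domain_age_days"] < 180 else 0

MODELS = [rf_predict, dt_predict, gb_predict]

def parallel_vote(features):
    """Run every model in its own thread and majority-vote the labels."""
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        votes = list(pool.map(lambda m: m(features), MODELS))
    return Counter(votes).most_common(1)[0][0]

sample = {"url_length": 80, "has_ip": False, "domain_age_days": 90}
print(parallel_vote(sample))  # two of three stub models flag it -> 1
```

In the real system each worker would call a trained scikit-learn model's `predict` method instead of these stubs.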

2 Literature Review

The surveyed work is summarized below (S. No., author, publisher, year, problem
addressed, approach/results, limitation):

1. Shouq Alnemari et al. [6], Applied Sciences, 2023. Problem addressed: a better
approach to automate the detection of phishing URLs. Approach/results: used an
ensemble technique to integrate a neural network, random forest, and SVM,
achieving 96% accuracy. Limitation: a limited number of features and classifiers
are used to train the model.

2. Mausam et al. [7], IJSRED, 2022. Problem addressed: implementation of
sequential ML algorithms to detect phishing attacks. Approach/results: three ML
algorithms (XGBoost, RF, and KNN) are used, with RF producing 96.75%
accuracy. Limitation: only 10 features are extracted.

3. Sönmez et al. [1], ISDFS, 2018. Problem addressed: phishing attack
classification. Approach/results: the strategy categorizes websites and extracts
features from them; six activation functions were utilized in the extreme learning
machine (ELM), which outperformed the SVM and NB in accuracy. Limitation:
achieves 95.34% accuracy.

4. Zuhair and Selamat [2], Int. J. Intell. Syst. Technol. Appl., 2016. Problem
addressed: phishing detection. Approach/results: hybrid phishing detection with
different classifiers. Limitation: fewer feature comparisons.

5. Aydin and Baykal [8], IEEE Conf. Commun. Network Security, 2015. Problem
addressed: an adaptable and straightforward framework for feature extraction.
Approach/results: the dataset and outside service providers produced 133 features
with fresh tactics. Limitation: the result is produced by comparing only Naïve
Bayes and SMO.

6. Parekh et al. [9], ICICCT, 2018. Problem addressed: use URL identification to
identify phishing sites. Approach/results: eight features out of a total of 31 are
considered for parsing; the random forest approach reached an accuracy level of
95%. Limitation: it obtained an accuracy level of 95%.

7. Zhang et al. [10], International Journal of Engineering Research and Technology
(IJERT), 2017. Problem addressed: word embedding semantic characteristics,
semantic features, and multi-scale statistical features are mined by the phishing
detection model to efficiently detect phishing. Approach/results: to obtain
statistical aspects of web pages, 11 features were extracted and divided into 5
types; the model is trained and tested using AdaBoost, bagging, random forest,
and SMO. Limitation: only eleven features are extracted.

8. Jeeva et al. [11], Human-centric Computing and Information Sciences, 2016.
Problem addressed: combining length, slash number, point number, and position
attributes with transport layer security aspects. Approach/results: the rules
produced by the apriori algorithm discovered a 93% accuracy rate. Limitation: a
93% accuracy rate.

9. Gautam et al. [12], Springer, 2018. Problem addressed: association data mining
approach. Approach/results: 16 characteristics were extracted, with an accuracy
of 92.67%. Limitation: this is inadequate, and thus the suggested algorithm can
be improved for a high detection rate.

10. Sonowal [11], SN Computer Science, 2020. Problem addressed: detecting
phishing emails. Approach/results: the BSFS technique weighed in with an
accuracy of 97.31%. Limitation: accuracy of 97.31%.

11. Fadheel et al. [13], IEEE, 2017. Problem addressed: detect phishing websites.
Approach/results: to help with phishing identification, 19 of the site's original 30
characteristics were chosen. Limitation: only 19 features are used.

12. Shima [14], ICIN, 2018. Problem addressed: applying a neural network model
to the URL domain. Approach/results: a neural network model is used to
automatically extract information without any specialized knowledge of the URL
domain. Limitation: a minimal set of features is used.

2.1 Methodologies for Phishing Website Detection

The proposed method puts a strong emphasis on boosting the accuracy of spoofed
website detection using various supervised learning techniques. The data were
obtained from Kaggle. The dataset consists of 32 features and 11,056 occurrences.
The dataset is then partitioned based on entropy, and the partitioned dataset is
employed to check correctness. The best attributes for each leaf node are identified
using correlation and a working model. The prediction model is trained via ensemble
learning, which makes use of numerous learning models. By employing numerous
models when making predictions, no single model can dominate the results.
Accordingly, the final decision is computed as a majority vote over the output of all
models (Fig. 1).
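The entropy-based partitioning mentioned above relies on the Shannon entropy of the class labels at a candidate split; a minimal standard-library sketch (not the paper's code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A pure partition has entropy 0; a 50/50 partition has entropy 1.
print(entropy([1, 1, 1, 1]))
print(entropy([1, 1, -1, -1]))
```

A split that lowers the weighted entropy of the resulting partitions is preferred, which is exactly the criterion decision-tree learners use.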

2.2 Dataset

In this model, we have blended phishing datasets acquired from a variety of online
sources, including Kaggle. The Kaggle phishing dataset is used to test and train the
model in a 20:80 ratio, respectively. The dataset, which has 32 columns and 11,056
rows, includes information from both phishing and legitimate websites; 31 columns
are independent features and one is the dependent feature (class label). Features
drawn from the dataset include long URLs, short URLs, domain age, HTTPS, page
rank, etc. (Fig. 2).
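The 20–80 test/train split can be sketched as below; in practice scikit-learn's `train_test_split` would typically be used, so this stdlib version only illustrates the ratio (the helper name and seed are our own):

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle the rows and cut them into train/test partitions."""
    rows = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

dataset = list(range(11056))            # one entry per URL row in the dataset
train, test = train_test_split(dataset)
print(len(train), len(test))            # 8844 2212
```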

Fig. 1 Proposed methodologies

2.3 Feature Extraction

To distinguish between genuine and fake websites, several attributes can be extracted
from a website. The effectiveness of systems for identifying phishing websites
depends on the quality of the characteristics that are retrieved. More information
on these characteristics and their significance is provided in [14]. The features fall
into four categories: address bar grounded features, abnormal grounded
features, HTML and JavaScript grounded features, and domain grounded features.
Address bar grounded features refer to techniques that attackers use to manipulate
the URL in the address bar of the web browser. Some of the features in this category
include using the IP address instead of the domain name, using long URLs to hide
suspicious parts, using URL shortening services, redirecting using “//”, and adding
prefixes or suffixes separated by a hyphen to the domain. Other features in this
category include subdomains, HTTPS, domain registration length, favicon, using
non-standard ports, and the existence of the “HTTPS” token in the domain part of
the URL.
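A few of these address-bar checks can be sketched as plain string tests; the thresholds (e.g., 54 characters for a "long" URL, position 7 for a suspicious "//") are commonly cited heuristics, not values confirmed by this paper:

```python
import re
from urllib.parse import urlparse

def address_bar_features(url):
    """A handful of the address-bar features described above (illustrative)."""
    host = urlparse(url).netloc
    return {
        "uses_ip":               bool(re.match(r"^\d{1,3}(\.\d{1,3}){3}$", host)),
        "long_url":              len(url) > 54,       # common cut-off heuristic
        "has_at_symbol":         "@" in url,
        "prefix_suffix":         "-" in host,         # hyphen in the domain
        "double_slash_redirect": url.rfind("//") > 7, # "//" after the scheme
        "https_token_in_domain": "https" in host,
    }

print(address_bar_features("http://203.0.113.9//secure-login.example.com/pay"))
```

A full extractor would compute one such value per feature column of the dataset.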
Abnormal grounded features refer to techniques that attackers use to hide or
obfuscate the true nature of a website. Some of the features in this category include
the request URL, the URL of anchor tags, server form handlers (SFH), and submitting
information to email or abnormal URLs.

Fig. 2 Dataset classification heatmap

HTML and JavaScript grounded features refer to techniques that attackers use to
manipulate the HTML and JavaScript code of a website. Some of the features in this
category include website forwarding, status bar customization, disabling right-click,
using pop-up windows, and IFrame redirection.
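The HTML and JavaScript checks can likewise be approximated with simple pattern matching; this is an illustrative sketch (the regexes and attribute names are our assumptions), not the extractor used in the paper:

```python
import re

def html_js_features(html):
    """Detect the HTML/JavaScript tricks described above with simple patterns."""
    return {
        "iframe_redirection":   bool(re.search(r"<iframe\b", html, re.I)),
        "right_click_disabled": "event.button==2" in html.replace(" ", "")
                                or "oncontextmenu" in html.lower(),
        "popup_window":         "window.open(" in html,
        "status_bar_custom":    "window.status" in html,
    }

page = '<body oncontextmenu="return false"><iframe src="http://evil.test"></iframe></body>'
print(html_js_features(page))
```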
Domain grounded features refer to techniques that attackers use to manipulate
the domain name and its associated properties. Some of the features in this category
include the age of the domain, DNS records, website traffic, page rank, Google
index, the number of links pointing to the page, and statistical reports-based features
(Fig. 3).

2.4 Machine Learning Algorithm

Fig. 3 Feature importance

The decision tree is a machine learning technique used for classification and
regression analysis. The model predicts the value of the target variable based
on a number of input variables, building a tree-like representation of decisions
and their outcomes. The decision tree produces about 95.9% accuracy (Fig. 4).
Fig. 4 Decision tree

Fig. 5 Random forest

To increase prediction precision, random forest combines many decision trees. It
works effectively for model training and produces results with an accuracy of
96.7%. Moreover, it contributes to increased precision, decreased overfitting, and
the capacity to handle both categorical and numerical data (Fig. 5).
The Naïve Bayes classifier is a probability-based machine learning algorithm
used for classification tasks. It uses Bayes’ theorem, which describes the probability
of an event based on prior outcomes or evidence. It is a fast and simple algorithm,
helpful for handling large datasets with high-dimensional features. Naïve Bayes gives
an accuracy of 60.5%.

p(c|x) = p(x|c) p(c) / p(x)

p(c|x) ∝ p(x1 |c) × p(x2 |c) × · · · × p(xn |c) × p(c) (1)
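A tiny worked instance of Eq. (1): the priors and per-feature likelihoods below are made-up numbers, and the class products are normalized by p(x) to give posteriors.

```python
# Hypothetical per-class likelihoods for two binary features of one URL.
priors      = {"phishing": 0.3, "legit": 0.7}
likelihoods = {
    "phishing": {"long_url": 0.8, "no_https": 0.9},
    "legit":    {"long_url": 0.2, "no_https": 0.3},
}

def posterior(observed):
    """Apply Eq. (1): multiply the likelihoods and the prior, then normalize."""
    scores = {c: priors[c] * likelihoods[c][observed[0]] * likelihoods[c][observed[1]]
              for c in priors}
    total = sum(scores.values())        # plays the role of p(x)
    return {c: s / total for c, s in scores.items()}

print(posterior(("long_url", "no_https")))  # phishing wins: 0.216 vs 0.042 unnormalized
```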

In machine learning, logistic regression is used for binary classification problems.
It is a statistical model and a supervised learning algorithm that uses one or more
input variables to predict the probability of a binary outcome. The following
equation shows that the algorithm’s hypothesis is bounded between 0 and 1.

0 ≤ hθ (x) ≤ 1 (2)
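Eq. (2) holds because the hypothesis is a sigmoid of the linear score; a quick numeric check with illustrative (not learned) weights:

```python
from math import exp

def h_theta(x, theta):
    """Logistic hypothesis: sigmoid of the linear score, always in (0, 1)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + exp(-z))

theta = [0.4, -1.2, 2.0]                 # illustrative weights, not learned ones
for x in ([1, 0, 0], [1, 5, -3], [1, -4, 2]):
    assert 0.0 <= h_theta(x, theta) <= 1.0   # Eq. (2) holds for any input
```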

A supervised learning technique called the support vector machine (SVM) is utilized
for outlier identification, classification, and regression. We use it for its high accuracy,
its ability to handle high-dimensional data, and its robustness to outliers. It produces
results with an accuracy of 96.4%.

Fig. 6 Gradient boosting classifier

The gradient boosting classifier is a machine learning approach employed for
classification and regression problems. It is an ensemble learning technique that
combines several weak models into a potent one. At about 97.4% accuracy, it is the
best-performing model, determining whether a website is real or phishing with the
highest accuracy (Fig. 6).
K-nearest neighbors (KNN) is a nonparametric machine learning method used for
classification and regression problems; it generates predictions based on how closely
fresh input data points resemble those in the training set. It produces an accuracy of
95.6% (Fig. 7).

3 Result

To ensure the highest level of accuracy, this model has been evaluated and trained
using a variety of machine learning classifiers and various ensemble techniques. After
all algorithms have returned their results, each reports its estimated accuracy. The
algorithms are then compared to determine which offers the highest accuracy rate;
Table 1 lists the accuracy percentages. An earlier study [15] used an ensemble
technique and achieved 87% accuracy, as shown in Fig. 8.

Fig. 7 K-nearest neighbors

Table 1 Comparison table

S. No. ML model Accuracy f1_score Recall Precision
1 Gradient boosting classifier 0.974 0.977 0.994 0.986
2 Random forest 0.967 0.971 0.991 0.991
3 Support vector machine 0.964 0.968 0.980 0.965
4 Decision tree 0.959 0.963 0.991 0.993
5 K-nearest neighbors 0.956 0.961 0.991 0.993
6 Logistic regression 0.934 0.941 0.943 0.927
7 Naïve Bayes classifier 0.605 0.454 0.292 0.997
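All of the metrics in Table 1 derive from confusion-matrix counts; the sketch below shows the formulas on illustrative counts for a 2,212-URL test split (the counts are our own, not the paper's):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts on a 2,212-URL test split (not the paper's exact numbers).
acc, prec, rec, f1 = metrics(tp=1180, fp=25, fn=32, tn=975)
print(f"acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```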

Fig. 8 Accuracy of all models bar graph



Fig. 9 Accuracy comparison

Our model has performed best, with the highest accuracy: the gradient boosting
algorithm gives the best results, with a final accuracy of 97.4%. For easier under-
standing, an accuracy comparison graph shows the accuracy of each algorithm;
Fig. 9 displays the final algorithm accuracy comparison of our model.

4 Limitation

Since phishing attacks and cyber risks are continually changing, this study may not
have considered recent developments in detection techniques or emerging trends.
Depending on the particular context and features of the phishing attempts, the
efficiency and usefulness of the studied detection approaches may also change.
Moreover, the evaluation of detection algorithms is substantially influenced by the
quality and availability of datasets, which might not completely reflect the variety
of phishing cases. Finally, while the goal of this research is to suggest areas for
future study, it does not offer complete solutions to all the problems relating to
phishing website detection. To overcome the limitations found and to provide more
reliable and effective strategies against the increasing danger posed by phishing
attacks, additional study is required.

5 Conclusion

Phishing attacks are becoming more sophisticated, making it challenging to identify
phishing websites. The detection of phishing websites is essential for protecting
sensitive information from being stolen by cybercriminals. Various techniques and
methodologies can be used for phishing website detection, including machine
learning algorithms, blacklisting, and heuristic analysis. However, these techniques
have their limitations, and new techniques need to be developed to detect advanced
phishing attacks. In order to prevent sensitive information from being taken, it is
crucial to take the required precautions. Phishing attacks can result in severe financial
losses and identity theft.

References

1. Alnemari S, Alshammari M (2023) Detecting phishing domains using machine learning. Appl
Sci 13(8):4649
2. Mausam G, Siddhant K, Soham S, Naveen V (2022) Detection of phishing websites using
machine learning algorithms. Int J Sci Res Eng Dev 5:548–553
3. Pujara P, Chaudhari MB (2018) Phishing website detection using machine learning: a review.
Int J Sci Res Comput Sci Eng Inf Tech 3(7):395–399
4. Somesha M, Pais AR, Srinivasa Rao R, Singh Rathour V (2020) Efficient deep learning
techniques for the detection of phishing websites. Sādhanā 45:1–18
5. Yang R, Zheng K, Wu B, Wu C, Wang X (2021) Phishing website detection based on deep
convolutional neural network and random forest ensemble learning. Sensors 21(24):8281
6. Taha A (2021) Intelligent ensemble learning approach for phishing website detection based on
weighted soft voting. Mathematics 9(21):2799
7. Mehanović D, Kevrić J (2020) Phishing website detection using machine learning classifiers
optimized by feature selection. Traitement du Sig 37:4
8. Sönmez Y, Tuncer T, Gökal H, Avci E (2018) Phishing web sites features classification based
on extreme learning machine. In: 6th international symposium on digital forensic and security
(ISDFS 2018), pp 1–5
9. Zuhair H, Selamat A, Salleh M (2016) Feature selection for phishing detection: a review of
research. Int J Intell Syst Technol Appl 15(2):147–162
10. Aydin M, Baykal N (2015) Feature extraction and classification phishing websites based on
URL. In: 2015 IEEE conference on communications and network security, CNS 2015, pp
769–770
11. Jeeva SC, Rajsingh EB (2016) Intelligent phishing URL detection using association rule
mining. Hum Centric Comput Inf Sci 6(1):1–19
12. Zhang X, Zeng Y, Jin X, Yan Z, Geng G (2017) Boosting the phishing detection performance
by semantic analysis
13. Gautam S, Rani K, Joshi B (2018) Detecting phishing websites using rule-based classification
algorithm: a comparison. In: Information and communication technology for sustainable
development: proceedings of ICT4SD 2016, vol 1. Springer, Singapore, pp 21–33

14. Sonowal G (2020) Phishing email detection based on binary search feature selection. SN
Computer Science 1(4):191
15. Barraclough PA, Hossain MA, Tahir MA, Sexton G, Aslam N (2013) Intelligent Phishing
Detection and Protection Scheme for Online Transactions. Expert Syst Appl 40(11):4697–4706
Synergizing Voice Cloning and ChatGPT
for Multimodal Conversational Interfaces

Shruti Bibra, Srijan Singh, and R. P. Mahapatra

Abstract Conversational AI systems have gained a lot of attention in recent years
because they are capable of interacting with users in a natural and emotional
way. Designing a personalized and human-like chat experience remains difficult.
This paper delves into the possibilities of bringing together two technologies,
voice cloning and ChatGPT, to create more seamless, natural, and intriguing
multimodal conversational interactions. Voice cloning is the process of replicating
the voice of a user; ChatGPT, on the other hand, provides contextual and
human-like text-based responses. With the help of these two technologies, we
create an intuitive and natural conversational experience that better reflects the
user’s communication style. Analysis of our proposed system shows that this model
substantially improves conversational AI. This research provides valuable
insights into the potential of multimodal dialogue networks and opens the door for
further innovations in the field.

Keywords ChatGPT · Voice cloning · Multimodal dialogue networks

1 Introduction

Advancements in natural language processing have made conversational inter-
faces increasingly popular and widely used in various applications. However, the
current state-of-the-art conversational systems still struggle to generate coherent and
engaging responses consistently. One promising solution to this problem is the inte-
gration of voice cloning technology with language models such as ChatGPT, which
can potentially enhance the quality and naturalness of the conversational output. Our
proposed system works in two parts—voice cloning and ChatGPT. The voice cloning
system is a neural network-based system for text-to-speech (TTS) synthesis. We have
created our own ChatGPT model called Quicksilver using Python and OpenAI’s API.

S. Bibra (B) · S. Singh · R. P. Mahapatra


SRM Institute of Science and Technology, Ghaziabad, India
e-mail: shrutibibra00@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 131
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_10
132 S. Bibra et al.

These two technologies combine to create a more natural and advanced AI chatbot.
Voice technology models face a number of issues, and we strive to minimize them.
Chief among these are ethical concerns: unauthorized impersonation of a person’s
voice, which can be misused for malicious purposes. We eliminate this issue by
having the targeted user consensually say a meaningless sentence directly into the
system’s microphone, since a normal human being would not do this under ordinary
circumstances, and only then cloning the voice. This ensures that no voice is
replicated from a person’s previous audio or any other source. Voice cloning has
been flourishing for a long time now; however, it has rarely been integrated with
other technologies, and through this model we aim to synergize the two. Finally,
most AI voice assistants hold only limited contextual data. Through our system, we
are able to give “a brain to a cloned voice”.
The combination of these two technologies can potentially result in a more human-
like and engaging conversational interface. In this research paper, we investigate the
synergies between voice cloning and ChatGPT for the development of multimodal
conversational interfaces that can produce more natural and engaging responses. Our
study aims to shed light on the potential of this integration and identify the challenges
and opportunities that arise from the use of these technologies in combination. The
insights from our research can inform the development of more sophisticated conver-
sational systems that can provide a more natural and personalized experience to the
users.

2 Related Works

The hybrid approach combining voice cloning and ChatGPT is not yet a popular
technique; however, advancements have been made in both fields individually.
Methods for building a three-stage deep learning system that performs real-time
voice cloning have existed for some time [1]. Neural network-based speech
synthesis has been shown to produce high-quality speech for large numbers of
speakers. [2] introduces a neural voice cloning system that takes fewer audio
samples as input; speaker encoding and speaker adaptation are the two strategies
considered. Both methods work well in terms of the speech’s authenticity and
similarity to the actual speaker, even when there are not enough cloned audio
samples. Creating a speaker voice different from the one learned is expensive and
time-consuming, because additional data must be bought and the model retrained;
this is the fundamental justification for single-speaker TTS versions. By attempting
to develop a system that can simulate a multi-speaker acoustic space, [3] seeks to
get around these restrictions; as a result, voices that sound like various target
speakers can be produced even if they were not heard during the training phase.
A variety of chatbots are now managed through ChatGPT, a web-based chat
platform that allows for personal, sensitive conversations. [4] provides a highly
immersive and engaging user experience by seamlessly combining
cutting-edge computer vision, speech processing, and natural language processing
technology (Fig. 1).

3 The Proposed System

Our proposed system is a conversational AI that can be used as a chatbot as well as a
voice assistant. It is not just an integration of ChatGPT and a voice cloning model but
a carefully designed system that communicates with the user in a more natural and
intuitive way. The core concept is using the voice of the targeted user to train our
voice cloning model, which produces a cloned voice of the user. On the other hand,
a question asked of ChatGPT yields a response. This text response is converted to
speech using the cloned voice created before. The architecture of the model is simple
and easy to understand.

3.1 Voice-Enabled ChatGPT

The user gives the audio input to the voice interface. The voice interface converts this
audio into text using speech-to-text API (STT); here, we have used IBM Watson API.
This text input created goes to the OpenAI API where it first undergoes tokenization.
The input text is tokenized into individual words and punctuation marks. Each token
is encoded into a numerical vector representation that captures its meaning and
context. The GPT model, which comprises numerous layers of feedforward and self-
attention networks, processes the encoded input. The GPT model predicts the most
likely response based on the input and its own training data. The predicted response
is decoded from the numerical vector representation back into natural language text.
Hence, the response is obtained in the form of a text. The text input that has been

Fig. 2 Flowchart of voice-enabled ChatGPT

received is saved in a text file (.txt format) so that it is ready to be used by our voice
cloning model (Fig. 2).
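The transcript-to-text-file flow described above can be sketched with the completion call injected as a function, so the sketch runs without an API key; the real system would pass in a wrapper around the OpenAI API (the client call named in the comment is an assumption about the library version):

```python
import os
import tempfile

def answer_to_file(transcript, complete_fn, out_path):
    """Send the STT transcript to a completion function and save the reply
    as a .txt file for the voice cloning stage to read."""
    reply = complete_fn(transcript)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(reply)
    return reply

# In the real system complete_fn would wrap the OpenAI API (the exact call
# is an assumption, e.g. openai.ChatCompletion.create(...)); here a stub
# stands in so the flow runs without an API key.
fake_gpt = lambda prompt: "Answer: " + prompt
path = os.path.join(tempfile.gettempdir(), "response.txt")
print(answer_to_file("what is phishing", fake_gpt, path))
```

Injecting the completion function also makes the pipeline easy to unit test.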
The process of system-wide implementation entails steps like installing prereq-
uisites, setting up the environment so that the project can run smoothly, installing
libraries, and obtaining API keys. As previously stated, Python 3 and Google Colab
were used for all required coding and implementation. The required installations
are:
1. Gradio
2. IBM Watson STT
3. OpenAI
4. Whisper
The Python pip install command was used to complete the installation described
above.

3.2 Voice Cloning Model

Our voice cloning technique uses the Transfer Learning from Speaker Verification
to Multispeaker Text-to-Speech Synthesis study as a functional prototype. The
model is based upon a three-layer LSTM for text-to-speech synthesis. It has a
progressive three-stage pipeline that is capable of cloning an unknown voice from a
few seconds of sample speech.
These three stages are:
1. Speaker Encoder: creates an embedding from a single speaker’s brief utterance.
The speaker’s voice is meaningfully represented by the embedding, making
similar voices close to one another in latent space.
2. Synthesizer: creates a spectrogram from text based on the embedding of a speaker.
This model is a WaveNet-free version of the well-known Tacotron 2.
3. Vocoder: Spectrograms produced by the synthesizer are used to infer an audio
waveform.

Fig. 3 Flowchart of voice cloning model

The voice interface records the user whose voice is to be cloned, and the speaker
encoder picks up this voice. At the moment of inference, a brief reference utterance
from the speaker to be copied is given to the speaker encoder, which produces an
embedding used to condition the synthesizer. The synthesizer receives as input a
sentence that has been transformed into a phoneme sequence. The vocoder then
uses the synthesizer’s output to create the speech waveform that makes up the
cloned voice. The voice cloning model reads the saved text file it obtained from the
ChatGPT model before delivering the desired response in the voice of the targeted
user (Fig. 3).
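The three-stage data flow can be sketched with stub stages; these toy functions only mirror the shapes of the pipeline (utterance → embedding → spectrogram → waveform) and stand in for the real encoder, synthesizer, and vocoder models, which are neural networks:

```python
# Stub stages standing in for the trained models; each keeps only the
# shapes and relationships of the pipeline, not real signal processing.
def speaker_encoder(reference_wav):
    """Brief utterance -> fixed-size speaker embedding (toy 4-dim)."""
    return [sum(reference_wav) / len(reference_wav)] * 4

def synthesizer(text, embedding):
    """Phoneme sequence + embedding -> mel spectrogram (frames x dims)."""
    return [[len(text) * e for e in embedding] for _ in range(3)]

def vocoder(spectrogram):
    """Spectrogram -> audio waveform samples (here: just flattened)."""
    return [v for frame in spectrogram for v in frame]

embedding = speaker_encoder([0.1, 0.3, 0.2])
wav = vocoder(synthesizer("hello", embedding))
print(len(embedding), len(wav))   # 4-dim embedding, 3x4 = 12 samples
```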
The system-wide implementation entails steps like installing prerequisites, setting
up the project’s environment, retrieving datasets, and implementing the encoder,
synthesizer, and vocoder modules. As previously stated, Python 3 was used for all
required coding and implementation. The mandatory installations required for the
working of the project include:
1. TensorFlow
2. Numpy
3. Encoder
4. Vocoder
5. Pathlib
6. Synthesizer.inference
The Python pip install command was used to complete the installation described
above.

4 Methodology

The methodology for synergizing voice cloning and ChatGPT for multimodal
conversational interfaces involves a combination of data collection, training,
integration, and evaluation, with a focus on identifying and addressing the gaps in
the existing literature. The ultimate goal of this research is to create more effective
and natural multimodal conversational interfaces that better meet the needs of
users.

4.1 Voice-Enabled ChatGPT

The methodology for creating a voice-enabled ChatGPT using OpenAI involves
leveraging OpenAI’s pre-trained models for speech recognition and speech synthesis,
as well as its GPT-3 model as a starting point for the ChatGPT model architecture.
This approach can significantly reduce the time and resources required to develop a
high-quality voice-enabled conversational AI system.
1. Voice input: audio is taken as input in digital .wav format.
2. Speech-to-Text (STT) API: the STT API applies noise reduction and filtering to
improve audio quality and remove background noise. It then splits the audio input
into smaller chunks to improve processing efficiency and accuracy, and applies a
speech recognition algorithm (Hidden Markov Models) to the chunks to
transcribe the spoken words into text. It combines the transcribed text from each
chunk into a complete transcript, applies post-processing techniques such as
punctuation and capitalization normalization and spelling correction to improve
the accuracy and readability of the final transcript, and finally returns the
transcript as the output of the API.
3. OpenAI API: It makes use of the third version of the Generative Pretrained
Transformer, a neural network machine learning model, and the GPT-3 language
model. A large corpus of text data was used to pre-train ChatGPT using an
unsupervised learning method. The model learns to anticipate missing words
in a given text during pre-training. This aids in understanding the context and
connections between various words. ChatGPT is fine-tuned on particular activ-
ities, including text production or question answering, following pre-training.
During fine-tuning, the model is trained on a smaller dataset that is tailored to the
task at hand, enabling it to pick up on nuances and patterns unique to that work.
ChatGPT uses a range of NLP techniques, such as tokenization, word embed-
dings, and language modeling, to process and understand natural language input.
Beam search is a decoding algorithm used by ChatGPT to generate responses
to user input. It involves generating multiple possible responses and selecting
the one with the highest probability, based on the model’s predictions. Finally,
ChatGPT is able to generate responses that are contextually relevant to the user’s
input, thanks to its ability to understand the relationships between different words
and phrases (Fig. 4).

5 Voice Cloning

1. Speaker encoder: The speaker encoder is the first module that needs training. The
preprocessing, audio training, and visualization models are all included because
it manages the auditory input that is given to the system. The speaker encoder is an
LSTM3-layer with 256-unit projection layer and 768 hidden nodes. We assume
that a projection layer is just a densely networked layer with 256 outputs per
Synergizing Voice Cloning and ChatGPT for Multimodal … 137

Fig. 4 Voice cloning block diagram

LSTM that is iteratively applied to each LSTM output because there is no mention
of what a projection layer is in any of the publications. For quick prototyping,
simplicity, and a reduced training burden, it is possible to employ 256 LSTM
layers directly as opposed to building the speaker encoder for the first time. Our
result in this case is a 40-channel log-mel spectrogram with a stage of 10 ms
and a window width of 25 ms. The output (a 256-element vector) is the L2-
normalized hidden state of the last layer. Our method additionally includes a
pre-standardization ReLU layer that aims to make embedding sparse and more
understandable.
2. Synthesizer: The Google Tacotron 2 model synthesizer is utilized without
WaveNet. An iterative intersequence system called Tacotron forecasts text-based
mel spectrograms. A vector of specific characters is initially placed into the text
string standard layers which are added thereafter to lengthen a single encoder
block. To produce the output frames for the encoder, these frames pass through
bidirectional LSTMs. Each frame produced by the Tacotron encoder has a speaker
embedding associated with it. The attention function analyzes the encoder output
frame to produce a decoder input frame. Our solution does not validate the input
text’s pronunciation, and the characters are given exactly as they are. However,
there are some cleaning procedures. All letters are moved to ASCII, all spaces
are normalized, and all letters are reduced. Full-text format is used in place of
abbreviations and numbers. Punctuation is permitted, although it is not recorded
in the record.
3. Vocoder: The vocoder modules are trained last since the encoder synthesizer
vocoder is supposed to train the modules in the order. Tacotron 2 has a vocoder
called WaveNet. The vocoder model used is based on WaveRNN and is an open-
source PyTorch implementation 15, although it has a number of various user
138 S. Bibra et al.

fatchord design choices. “Alternative WaveRNN” is the name of this architec-


ture. In each training phase, the mel spectrogram and its related waveform are
separated into the same number of segments. Segments t and t-1 of the simulated
spectrogram serve as design inputs. It ought to be created so that each segment
of the waveform is the same length. The number of mel channels is kept constant
as the mel spectrogram is upsampled to fit the target waveform’s length. As the
mel spectrogram is converted to a waveform, models like ResNet use the spec-
trogram as an input to generate features that alter the layers. To change the length
of the waveform segment, the resulting vector is repeated. Then, this adjustment
vector is divided into four equal parts, each of which corresponds to a channel
dimension. The first portion of this division is concatenated with the upsampling
spectrogram and waveform segment of the preceding time step. The resulting
vector changes in certain ways when there is a skip connection. A high-density
layer comes after two GRU layers.

6 Result and Discussion

With the error-free working and execution of the project, the system was able to
successfully provide the response of the ChatGPT in the voice of the targeted user by
cloning his/her voice. We have evaluated are results on the basis of word error rate,
naturalness and the speed of response. Word Error Rate (WER) is a metric used to
evaluate the accuracy of speech recognition systems, machine translation systems,
and other natural language processing (NLP) models. It measures the difference
between the words in the predicted output and the words in the reference (i.e., the
ground truth). human speech. Naturalness is a measure of the degree to which synthe-
sized speech sounds like it are produced by a human speaker, both in terms of sound
quality and prosody (i.e., the rhythm, intonation, and stress patterns of speech).
Since our model uses OpenAI API, it would have the same results as that of
ChatGPT with a slight drop in speed. We have compared our paper with AI voice
assistants (Table 1).
The results of WER of Google assistant and Siri have been obtained from [5, 6],
respectively, whereas we see the evaluation of naturalness for Google and Siri in [7,
8], respectively. A detailed analysis has been provided by IBM Watson [9] for WER
of our model and [10] for naturalness.

Table 1 Final MOS results


Source Word error rate (%) Naturalness
Google voice assistant 4.9 3.5
Siri voice assistant 6.4 4.17
This paper 6.5 3.7
Synergizing Voice Cloning and ChatGPT for Multimodal … 139

7 Conclusion

This paper has successfully developed a framework for Integration of Voice Cloning
with ChatGPT. Despite a few complications, the results are competent. The model’s
ability to synthesize voices is very good. The ChatGPT provides excellent results,
but the speech of response can always be improved. Beyond the scope of the project,
there are still ways to improve certain frameworks and implement some of the recent
advances in this area made at the time of writing. While it has been agreed that
our proposed system is an improved design of an AI voice assistant. We can also
confirm the prediction that future development of the same technical area would lead
to the development of better and more sophisticated models. Therefore, this approach
proved to be an attempt to understand, implement, and innovate the expertise gained
during the research. We anticipate that this framework will soon be available in more
potent forms.

8 Future Scope

The current paper has laid the foundation for a number of potential areas of future
development and improvement. It can be expanded to include more languages and
more accents in the voice cloning model. Further, ChatGPT can also be inclusive of
more languages. The project can be integrated with edge computing and somehow,
be made for offline use as well. This research should also become an indispensable
part of mobile interfaces, operating systems, etc. It can be further optimized by
training the voice cloning model to be more natural and seamless. The ChatGPT
should become inclusive of images, videos and is giving out visual representations
as well. Customization of this proposed system can also be done according to the
specific needs and requirements of the various businesses and individuals. This can
be done by making the model train on a particular dataset as per the requirements.
Overall, the future scope of the project is promising and offers ample opportunities
for growth, innovation, and impact.

References

1. Jia Y, Zhang Y, Weiss RJ, Wang Q, Shen J, Ren F, Chen Z, Nguyen P, Pang R, Moreno IL, Wu Y
(2018) Transfer learning from speaker verification to multispeaker text-to-speech synthesis. In:
32nd conference on neural information processing systems (NeurIPS2018), Montréal, Canada
2. Arik SO, Chen J, Peng K, Ping W, Zhou Y (2018) Neural voice cloning with a few samples.
In: 32nd conference on neural information processing systems (NIPS2018), Montréal, Canada
3. Ruggiero G, Zovato E, Di Caro L, Pollet V (2021) Voice cloning: a multi-speaker text-to-speech
synthesis approach based on transfer learning. arXiv preprint arXiv:2102.05630
4. Alnuhait D, Wu Q, Yu Z (2023) FaceChat: an emotion-aware face-to-face dialogue framework.
arXiv preprint arXiv:2303.07316
140 S. Bibra et al.

5. Besacier et al (2019) Speech command recognition on a low-cost device: a comparative study.


IEEE Access J
6. Liu et al, A comparative study on mandarin speech recognition: Alexa, google assistant, and
Siri. In: 19th annual conference of the international speech communication association
7. Besacier L, Castelli E, Gauthier J, Karpov A, Naturalness and intelligibility of six major voice
assistants: Siri, Google assistant, Cortana, Bixby, Alexa, and Mycroft. In: Proceedings of the
19th annual conference of the international speech communication association (Interspeech),
pp 1303–1307
8. Apple, Deep learning for siri’s voice: on-device deep mixture density networks for hybrid unit
selection synthesis. Apple Mach Learn Res
9. IBM Watson, Speech to text API documentation. Accessed 8 Mar 2023 https://arxiv.org/abs/
2303.07316
10. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I, Language models are unsupervised
multitask learners. OpenAI Research
A Combined PCA-CNN Method
for Enhanced Machinery Fault Diagnosis
Through Fused Spectrogram Analysis

Harshit Rajput, Hrishabh Palsra, Abhishek Jangid, and Sachin Taran

Abstract This research introduces a novel strategy to improve the accuracy and
resilience of machinery malfunction identification by combining multimodal spec-
trogram fusion and deep learning techniques. The proposed approach involves the
division of the dataset into two equal portions, followed by the application of the
continuous wavelet transform (CWT) and Short-Time Fourier Transform (STFT)
separately, resulting in two sets of 2D images. A Principal Component Analysis
(PCA) fusion approach is then employed to merge these images, extracting the
most relevant and complementary characteristics. Subsequently, a convolutional
neural network (CNN) model is applied to the fused spectrogram, enabling clas-
sification and learning of intricate and abstract features. The suggested approach
offers several advantages, including enhanced feature extraction, improved accuracy,
faster processing, robustness to noise and artifacts, and transferability. To illustrate
its efficiency, the Case Western Reserve University (CWRU) dataset, comprising
vibration signals from various fault states in rotating machinery, is utilized. Experi-
mental results demonstrate that the proposed method surpasses existing approaches
in machinery failure diagnostics, achieving a high classification accuracy.

Keywords Spectrogram · CWRU dataset · Dataset splitting · Continuous wavelet


transform (CWT) · Short-time fourier transform (STFT) · Principal component
analysis (PCA) · Convolutional neural network (CNN)

H. Rajput (B) · H. Palsra · A. Jangid · S. Taran


Department of Electronics and Communication Engineering, Delhi Technological University,
Delhi 110042, India
e-mail: harshitraj2570@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 141
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_11
142 H. Rajput et al.

1 Introduction

Asset maintenance is crucial to preserve its practicality and prevent defects. Lack
of effective maintenance can reduce manufacturing capacity. There are two tradi-
tional approaches to maintenance—corrective and preventive. Corrective mainte-
nance is impulsive and leads to maximum exploitation of machinery, while preven-
tive maintenance is systematic but can be economically costly. Predictive mainte-
nance is an advancement over previous strategies and is required for smart machines.
Rolling elements are critical components, and early failure detection is essential for
minimizing downtime, waste rates, and maintaining product quality.
In the era of industries where we are pacing toward the Industry 4.0 which
is the automation of machines by providing them neural capabilities of humans,
i.e., building smart machines, by employing technique of machine learning or deep
learning.
In the approach suggested by Zhang et al. one-dimensional vibration signals are
transformed into time–frequency pictures using STFT, which are then fed into STFT-
CNN for learning and identifying fault feature [1]. A novel method for fault identi-
fication under varied working conditions based on STFT and transfer deep residual
network (TDRN) is described in the research put forth by Du et al. By combining
with transfer learning, the TDRN can create a link between two dissimilar working
environments, leading to excellent classification accuracy [2].
In order to address the issue of uneven data distribution in the field of diag-
nosing rolling bearing faults, Han et al. offer a data augmentation method integrating
continuous wavelet transform and deep convolution-generated adversarial network
(DCGAN). The technique assesses image quality and variety while expanding
time–frequency image samples of fault categories using DCGAN. According to
experimental findings, the proposed method is more accurate than the conventional
one-dimensional data expansion method [3].
Wang et al. suggested a hybrid approach employing variational modal decom-
position (VMD), CWT, CNN, and support vector machine (SVM) for diagnosing
rolling bearing faults. After preprocessing using VMD, CWT is used to create two-
dimensional time–frequency pictures. SVM is used for defect identification, and
CNN is utilized to extract features. With great accuracy, the approach is validated
using spindle device failure tests and datasets from CWRU. For better visualization
and feature extraction, time–frequency pictures can be acquired using CWT [4].
A Combined PCA-CNN Method for Enhanced Machinery Fault … 143

2 Proposed Methodology

2.1 Dataset

The rolling-element bearing vibration signals from various operating circumstances


are collected in the Case Western Reserve University (CWRU) dataset. Rotating
machinery frequently makes use of rolling-element bearings, and their failure may
result in costly downtime and repairs. Therefore, early detection of bearing faults is
crucial to prevent catastrophic failures and reduce maintenance costs.
The dataset includes vibration signals obtained from four types of bearing faults:
inner race, outer race, roller element, and combination faults. Each fault type is
simulated by introducing a defect at a specific location on the surface of bearing.
The vibratory signals are recorded using accelerometers placed on the motor casing.
The obtained signals are then sampled at a rate of 48 kHz, which means that the
signals are recorded 48,000 times per second [5].
To improve the accuracy and reliability of the machine learning models, data
collection under actual working settings is crucial. In the CWRU dataset, the load
applied to the bearings during the experiments is 1 horsepower (1HP), which corre-
sponds to a load of approximately 750 Watts. This load level is representative of
typical operating conditions in industrial applications (Fig. 1).

Fig. 1 a Ball bearing system experimental platform for the CWRU bearing test rig [6, 7], the REB’s
component parts, and b its cross-sectional view
144 H. Rajput et al.

2.2 Data Preprocessing

Two most popular data preprocessing methods are Continuous Wavelet Transform
(CWT) and Short-Time Fourier Transform (STFT).
CWT: CWT is a time–frequency analysis technique that allows the decomposi-
tion of a signal into different frequency components over time. A two-dimensional
image generated through convolution is commonly known as a scalogram or wavelet
spectrogram. In this representation, the x-axis represents time, the y-axis represents
frequency, and the color or intensity corresponds to the amplitude of the wavelet
coefficients. The fundamental benefit of CWT is that it can analyze signals with non-
stationary time–frequency content. However, the main drawback of CWT is that it
can be computationally expensive and can suffer from edge effects and cross-term
interference [8].
STFT: STFT, on the other hand, is a method for breaking down a signal into its
frequency components, over time using a series of overlapping windows. The STFT
method is useful for analyzing non-stationary signals, such as those generated by
rotating machinery, where the frequency content may change over time due to the
presence of faults or other operating conditions [9].

2.3 Fusion

PCA is a statistical method that is utilized to identify the most significant compo-
nents of the input data and to reduce the dimensionality of the data. In the context of
spectrogram fusion, PCA can be used to identify the common features across the two
spectrograms and to generate a new spectrogram that captures these features. The
fundamental principle of PCA is to transform the input data into a new coordinate
system so that it can be represented with fewer dimensions while still preserving the
most crucial information. The transformation is carried out by calculating the eigen-
values and eigenvectors of the input data’s covariance matrix. The data’s principal
components are represented by the eigenvectors, and each principal component’s
variance is shown by the eigenvalues [10].
To use PCA for spectrogram fusion, we follow these steps [11]:
• Preprocess the input data: The input data should be preprocessed to remove any
noise, artifacts, or irrelevant features. This can be done by applying suitable filters,
normalization, or other preprocessing techniques.
• Compute the CWT and STFT spectrograms: The CWT and STFT spectrograms
are computed separately for the preprocessed input data.
• Shuffling the time–frequency signals: The time–frequency signals obtained from
STFT and CWT are stored and shuffled according to their respective labels.
• Centering the matrix: The first step in centering the matrix is to take a time–
frequency signal X with dimensions n x m, where n is the number of frequency
components and m is the number of time points.
A Combined PCA-CNN Method for Enhanced Machinery Fault … 145
⎛ ⎞n
1 ⎝
m
X centered =X− Xi j ⎠ · 1m ,
m j=1
i=1

where 1m is a column vector of ones with length m.


• Compute the covariance matrix: The covariance matrix of the concatenated spec-
trogram matrix is computed. The covariance matrix represents the correlation
between the different elements of the spectrogram matrix. The covariance matrix
C of X centered is computed as follows:

1
C= T
X centered X centered .
m
• Determine the eigenvalues and eigenvectors: The covariance matrix’s eigenvalues
and eigenvectors are calculated. The spectrogram matrix’s primary components
are represented by the eigenvectors, while each principal component’s variance
is represented by the eigenvalues. The values of the covariance matrix C can be
calculated using a matrix decomposition:

C = V DV T ,

where V is a matrix of eigenvectors and D is a diagonal matrix of eigenvalues.


• Ordering the principal components: The principal components can be ordered by
their corresponding eigenvalues in descending order:

[PC1 , PC1 , . . . , PCk ] = sor t (diag(D), descend  )[, I ]


 
= sor t diag(D), descend  V = V (:, I )

• Generate the fused spectrogram: The fused spectrogram is generated by projecting


the CWT and STFT spectrograms onto the selected eigenvectors.

Y = V T X centered .

The resulting matrix Y represents the time–frequency signal projected onto


the selected principal components, with the most important patterns or features
highlighted.
PCA-based fusion algorithm has the advantage of being simple, fast, and effi-
cient. It can be used to identify the common features across the CWT and STFT
spectrograms and to generate a new spectrogram that captures these features [12].
146 H. Rajput et al.

Fig. 2 Data flow in model


suggested

2.4 CNN

After applying the PCA fusion algorithm to the CWT and STFT spectrograms to
obtain a fused spectrogram, a CNN model can be used for further feature extraction
and classification. A CNN is a particular kind of deep neural network that excels at
image and pattern recognition tasks. It is made up of many layers of neurons, the
processing units, which are arranged into convolutional, pooling, and fully linked
layers (Fig. 2).
The fundamental principle of a CNN is to learn a hierarchy of features that
abstractly describe the input data at various levels. Edge detection and corner detec-
tion are examples of local features that are extracted by the first few layers of the
CNN, whereas shape identification and texture recognition are examples of more
global features that are extracted by the subsequent levels. To apply a CNN to the
fused spectrogram obtained from the PCA fusion algorithm, we can follow these
steps [13].
A Combined PCA-CNN Method for Enhanced Machinery Fault … 147

3 Results and Discussion

From Table 1, the proposed method has the average accuracy of 99.63%, while
the CWT method and STFT method had 98.94% and 99.49%, respectively. The
confusion matrix is a tool for evaluating the performance of a classification model. It
provides detailed information on the number of correct and incorrect predictions for
each class. Figure 3 represents the confusion matrix obtained by CWT method having
maximum accuracy 99.30%. Fig. 4 represents the confusion matrix obtained by
STFT method having maximum accuracy 99.60%. Figure 5 represents the confusion
matrix obtained by proposed method having maximum accuracy 99.70%. The overall
advantages of using the methods of splitting the CWRU dataset, applying the PCA
fusion algorithm, and passing them through a CNN model include:

– Improved feature extraction: By splitting the dataset and applying different prepro-
cessing techniques, we can extract more diverse and complementary features from
the input data. The PCA fusion algorithm combines the features extracted by
CWT and STFT, which can improve the overall accuracy and robustness of the
classification.
– Increased accuracy: By using a CNN model, we can learn more complex and
abstract features from the fused spectrogram, which can improve the accuracy of
the classification. CNNs are particularly effective for image and pattern recog-
nition tasks and have been shown to achieve state-of-the-art results in many
applications.

Table 1 Table employing accuracies for different methodologies performed


Methods employed Accuracies
Average accuracy (%) Best accuracy (%) Worst accuracy (%)
CWT 98.94 99.30 98.50
STFT 99.49 99.60 99.38
Proposed 99.63 99.70 99.56

Fig. 3 Best accuracy


confusion matrix for CWT
148 H. Rajput et al.

Fig. 4 Best accuracy


confusion matrix for STFT

Fig. 5 Best accuracy


confusion matrix for
proposed method

– Faster processing: The use of PCA fusion algorithm can reduce the dimensionality
of the input data and provide a more compact representation of the features, which
can speed up the processing time of the CNN model.
– Robustness to noise and artifacts: The use of PCA fusion algorithm and CNN
model can improve the robustness of the classification to noise and artifacts in the
input data. The PCA fusion algorithm can reduce the effect of noise and artifacts
by combining information from multiple sources, while the CNN model can learn
to distinguish relevant features from irrelevant ones.

4 Conclusion

This paper proposes a new method having average accuracy 99.63% for fault diag-
nosis using a combination of CWT and STFT data preprocessing techniques. The
method involves splitting the dataset into two halves and passing each half to either
CWT or STFT, respectively. The resulting 2D spectrogram images are fused using
A Combined PCA-CNN Method for Enhanced Machinery Fault … 149

PCA and passed to a CNN for classification. The proposed method is compared to
existing methods that use either CWT or STFT alone, and experimental results on
the CWRU bearing dataset show that the proposed method outperforms the existing
methods in terms of diagnostic accuracy. The fusion of CWT and STFT provides
a more comprehensive representation of the data and improves the fault diagnosis
accuracy.
The proposed method was validated on 48 kHz sampling rate and 1Hp load of
CWRU dataset. Other wavelet transform methods and fusion methods can also be
investigated and analyzed. For example, PCA with methods like discrete wavelet
transform (DWT) fusion or independent component analysis (ICA) fusion can be
compared.

References

1. Zhang Q, Deng L (2023) An intelligent fault diagnosis method of rolling bearings based on
short-time fourier transform and convolutional neural network. J Failure Anal Prevent 1–17
2. Du Y, Wang A, Wang S, He B, Meng G (2020) Fault diagnosis under variable working conditions
based on stft and transfer deep residual network. Shock Vib 2020:1–18
3. Han T, Chao Z (2021) Fault diagnosis of rolling bearing with uneven data dis- tribution based
on continuous wavelet transform and deep convolution generated adversarial network. J Braz
Soc Mech Sci Eng 43(9):425
4. J. Wang, D. Wang, S. Wang, W. Li, and K. Song, “Fault diagnosis of bearings based on multi-
sensor information fusion and 2d convolutional neural network,” IEEE Access, vol. 9, pp. 23
717–23 725, 2021.
5. Yuan L, Lian D, Kang X, Chen Y, Zhai K (2020) Rolling bearing fault diagnosis based on
convolutional neural network and support vector machine. IEEE Access 8:137 395–137 406
6. Li SY, Gu KR (2019) Smart fault-detection machine for ball-bearing system with chaotic
mapping strategy. Sensors 19(9):2029
7. “Case western reserve university bearing data center (2019). https://csegroups.case.edu/bearin
gdatacenter/home. Accessed 22 Dec 2019
8. Sharma P, Amhia H, Sharma SD (2022) Transfer learning-based model for rolling bearing fault
classification using cwt-based scalograms. In: Pandian AP, Palanisamy R, Narayanan M, Senjyu
T eds Proceedings of third international conference on intelligent computing, information and
control systems. Singapore, Springer Nature Singapore, pp 565–576
9. Yoo Y, Jo H, Ban S-W (2023) Lite and efficient deep learning model for bearing fault diagnosis
using the cwru dataset. Sensors 23(6):3157
10. Hong M, Yang B (2013) An intelligent fault diagnosis method for rotating machin- ery based
on pca and support vector machine. Measurement 46(9):3090–3098
11. Wang J, Zhao X, Xie X, Kuang J (2018) A multi-frame pca-based stereo audio coding method.
Appl Sci 8(6). [Online]. Available: https://www.mdpi.com/2076-3417/8/6/967
12. Gupta V, Mittal M (2019) Qrs complex detection using stft, chaos analysis, and pca in standard
and real-time ecg databases. J Instit Eng (India): Series B 100(03)
13. Yang S, Yang P, Yu H, Bai J, Feng W, Su Y, Si Y (2022) A 2dcnn-rf model for offshore wind
turbine high-speed bearing-fault diagnosis under noisy environment. Energies 15(9):3340
FPGA-Based Design of Chaotic Systems
with Quadratic Nonlinearities

Kriti Suneja, Neeta Pandey, and Rajeshwari Pandey

Abstract This paper presents a systematized methodology to implement chaotic


systems with quadratic nonlinearities on digital platform using Runge–Kutta 4 (RK4)
numerical method. Field programmable gate arrays (FPGAs), because of their flex-
ibility, reconfigurability, and parallelism, have been used for the implementation
using Verilog hardware description language (HDL) and the state machine control.
The synthesis results based on Xilinx Artix device 7a200tffv1156-1, and simulation
results using inbuilt simulator of Vivado design suite have been presented. The simu-
lation results have been validated by python-based numerical simulations as well.
The implemented chaotic systems have been evaluated based on hardware utilization
and time delay.

Keywords Chaotic system · Quadratic nonlinearity · Field programmable gate


array · Synthesis · Simulation

1 Introduction

Since, in the decade of eighties, electronic system design witnessed a paradigm shift
from analog to digital domain, so digitized chaotic systems have their own advan-
tages. A field programmable gate array (FPGA) is an integrated circuit (IC) consisting
of three blocks mainly: logic blocks, interconnects, and input–output blocks, all three
being programmable. Each of the logic cell contains basic circuit elements, such as
lookup tables (LUTs) on which combinational logic can be mapped and flip flops
(FFs) to design sequential logic. However, the composition of these blocks differs
in various FPGA families and packages. Some of the FPGA devices offer additional
hardware resources for flexible designing capability, such as memory blocks and

K. Suneja (B) · N. Pandey · R. Pandey


Delhi Technological University, Delhi 110042, India
e-mail: kritisuneja@dtu.ac.in
R. Pandey
e-mail: rpandey@dce.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 151
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_12
152 K. Suneja et al.

digital signal processing (DSP) blocks. Because of their reprogrammable feature,


FPGAs have utility in being used as prototypes for application-specific integrated
circuits (ASICs) applications.
FPGA-based design of chaotic systems finds applications in the embedded
engineering areas, such as image encryption [1, 2], text encryption [3], random
number generation [4], secure communication [5], and cryptography [6]. The digital
designing of chaotic systems can be done using different types of digital platforms,
including application-specific integrated circuits (ASICs), digital signal processors
(DSPs) and FPGAs. ASICs are capable of providing better performance than its coun-
terparts, but at the cost of time and money in the production of prototypes. Also, in
order to bring down the cost, ASIC-based applications stand in need of mass produc-
tion which is intolerable to even minute errors. DSP chips are the favorite candidate
of engineers to implement complex mathematical operations and processes, but their
sequential manner of processing is not favorable for the concurrency requirements
of chaotic systems. FPGAs provide the desired flexibility, concurrency, low cost,
enough resources for the implementation of chaotic systems. Thus, in prototyping
phase, they stand out among others for rapid and low-cost design.
An extensive literature survey suggests that different combinations of hardware description languages (HDLs) and FPGA families have been used to design chaotic systems. For instance, the Virtex FPGA family has been used to map chaotic systems in [7–9], Artix in [10], Zynq in [11], Kintex in [12], and Altera Cyclone in [13]. Though Artix has fewer resources than Kintex and Virtex in the 7 series, all of them have sufficient resources to implement a chaotic system; thus, the choice of family does not affect performance unless the resources are exhausted. Among HDLs, VHDL has been chosen in [7, 8, 10, 11], while Verilog is used in [9, 12, 13].
This work presents a systematic approach to implementing chaotic systems with quadratic nonlinearities on an FPGA-based digital platform using the fourth-order Runge-Kutta (RK4) numerical method, which utilizes a weighted average of the slopes at four points and therefore provides better accuracy than lower-order RK methods.
The remainder of the paper is organized as follows: Sect. 2 explains the design methodology used for the digitization of chaotic systems, Sect. 3 describes the FPGA design flow and contains the synthesis and simulation results, and finally Sect. 4 concludes the paper.

2 Design Methodology

2.1 Mathematical Representation

Chaotic systems are represented by their constituent differential characteristic equations. In order to map those differential equations onto an FPGA board, existing numerical methods such as Euler, improved Euler (also known as Heun), and fourth-order Runge-Kutta (RK4) [14, 15] are used to discretize the differential
FPGA-Based Design of Chaotic Systems with Quadratic Nonlinearities 153

equations. Out of the existing numerical methods, we have chosen the RK4 method because of its higher degree of accuracy [16]. It uses four intermediate slopes K_1, K_2, K_3, and K_4 to determine the solution from the previous sample: K_1 corresponds to the beginning of the interval, K_2 and K_3 to points near the middle, and K_4 to the end of the interval. The three chaotic differential equations, corresponding to the three state variables x, y, and z, are thus discretized using the RK4 method as represented by Eqs. (1)-(6).

x(n + 1) = x(n) + (h/6)[K_x1 + 2K_x2 + 2K_x3 + K_x4]    (1)

y(n + 1) = y(n) + (h/6)[K_y1 + 2K_y2 + 2K_y3 + K_y4]    (2)

z(n + 1) = z(n) + (h/6)[K_z1 + 2K_z2 + 2K_z3 + K_z4]    (3)

where

K_x1 = f_x[x(n), y(n), z(n)]                                       (4a)
K_x2 = f_x[x(n) + h K_x1/2, y(n) + h K_y1/2, z(n) + h K_z1/2]      (4b)
K_x3 = f_x[x(n) + h K_x2/2, y(n) + h K_y2/2, z(n) + h K_z2/2]      (4c)
K_x4 = f_x[x(n) + h K_x3, y(n) + h K_y3, z(n) + h K_z3]            (4d)

K_y1 = f_y[x(n), y(n), z(n)]                                       (5a)
K_y2 = f_y[x(n) + h K_x1/2, y(n) + h K_y1/2, z(n) + h K_z1/2]      (5b)
K_y3 = f_y[x(n) + h K_x2/2, y(n) + h K_y2/2, z(n) + h K_z2/2]      (5c)
K_y4 = f_y[x(n) + h K_x3, y(n) + h K_y3, z(n) + h K_z3]            (5d)

K_z1 = f_z[x(n), y(n), z(n)]                                       (6a)
K_z2 = f_z[x(n) + h K_x1/2, y(n) + h K_y1/2, z(n) + h K_z1/2]      (6b)
K_z3 = f_z[x(n) + h K_x2/2, y(n) + h K_y2/2, z(n) + h K_z2/2]      (6c)
K_z4 = f_z[x(n) + h K_x3, y(n) + h K_y3, z(n) + h K_z3]            (6d)

where K_xi, K_yi, and K_zi (i = 1 to 4) represent the intermediate slopes of the variables x, y, and z, respectively; f_x, f_y, and f_z represent the differential equations corresponding to a given chaotic system; and h is the step size, i.e., the interval between consecutive samples.
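Equations (1)-(6) can be sketched in Python (the language used later in the paper for validation); the function f below is an illustrative stand-in for the system's derivative functions f_x, f_y, and f_z.

```python
def rk4_step(f, state, h):
    """One RK4 step, mirroring Eqs. (1)-(6).

    f: maps (x, y, z) to the derivatives (f_x, f_y, f_z).
    state: current sample (x(n), y(n), z(n)); h: step size.
    Returns the next sample (x(n+1), y(n+1), z(n+1)).
    """
    k1 = f(*state)                                       # slopes at the beginning (K_*1)
    k2 = f(*(s + h * k / 2 for s, k in zip(state, k1)))  # near the middle (K_*2)
    k3 = f(*(s + h * k / 2 for s, k in zip(state, k2)))  # near the middle (K_*3)
    k4 = f(*(s + h * k for s, k in zip(state, k3)))      # at the end (K_*4)
    return tuple(s + (h / 6) * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Illustrative use with the Lorenz derivatives (a = 10, b = 8/3, c = 28):
lorenz = lambda x, y, z: (10 * (y - x), 28 * x - x * z - y, x * y - (8 / 3) * z)
next_sample = rk4_step(lorenz, (0.0, 2.0, 0.0), 2 ** -7)
```

The same weighted average of four slopes is what the datapath described next computes in hardware.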
The set of equations (1)-(6) is implemented as follows. The digital design of the chaotic system has two paths: the control path, which controls the flow of the operations, and the datapath, which implements all the algebraic operations. The control path consists of one initial state, also known as the default state, to initialize the state variables, and one final or idle state, which waits for the next set of instructions. These two states are represented by S0 and S6, respectively. Besides these, five other states are required to evaluate Eqs. (1)-(6). The state diagram of the control path, consisting of seven states in total, is shown in Fig. 1. Three state bits are required to encode these states: S0 (000), S1 (001), S2 (010), S3 (011), S4 (100), S5 (101), and S6 (110).
The functions of the seven control-path states S0-S6 [17] are:
S0: The initial and default state. In this state, the initial values are assigned to the state variables x, y, and z. The process then passes unconditionally to the next state S1.
S1: In this state, the increments K_x1, K_y1, and K_z1, based on the slopes at the beginning of the interval, are calculated using (4a), (5a), and (6a). The process then jumps unconditionally to the next state S2.
S2: The increments based on the slopes near the midpoint of the interval, K_x2, K_y2, and K_z2, are calculated from K_x1, K_y1, and K_z1 using (4b), (5b), and (6b), followed by an unconditional transition to the next state S3.

Fig. 1 State transition graph of the finite state machine

S3: The increments based on the slopes again near the midpoint, but at a different point than the previous one, K_x3, K_y3, and K_z3, are calculated from K_x2, K_y2, and K_z2 using (4c), (5c), and (6c), followed by an unconditional transition to the next state S4.
S4: The increments based on the slopes at the end of the interval, K_x4, K_y4, and K_z4, are calculated from K_x3, K_y3, and K_z3 using (4d), (5d), and (6d), followed by an unconditional transition to the next state S5.
S5: In this state, the next chaotic samples x, y, and z are generated using (1)-(3). In the next clock cycle, if the counter's count C_p is less than the user-defined integer N, which represents the number of required samples, the process jumps to S1 to calculate the next solution; otherwise it jumps to S6, where it stays waiting.
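The control path above can be modeled as a plain transition table; the sketch below simulates the state sequence only (the state names and the counter comparison come from the text, everything else is illustrative).

```python
def next_state(state, cp, N):
    """Transition function of the seven-state control path S0-S6."""
    unconditional = {"S0": "S1", "S1": "S2", "S2": "S3", "S3": "S4", "S4": "S5"}
    if state in unconditional:
        return unconditional[state]
    if state == "S5":                 # compare the counter with N
        return "S1" if cp < N else "S6"
    return "S6"                       # S6: idle, waits for instructions

def run(N):
    """Return the sequence of states visited while producing N samples."""
    trace, state, cp = ["S0"], "S0", 0
    while state != "S6":
        state = next_state(state, cp, N)
        if state == "S5":
            cp += 1                   # one more sample computed in S5
        trace.append(state)
    return trace
```

For N = 1 the trace is exactly S0, S1, S2, S3, S4, S5, S6; for larger N the S1-S5 loop repeats once per sample.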

3 Results

Eleven chaotic systems [18-30], including some popularly known systems such as Rössler and Lorenz, have been designed using the methodology described in Sect. 2 on a common FPGA platform, in order to compare them and choose the best fit for digital applications. To synthesize these systems, both the control path and the datapath have been entered in the Xilinx tool using Verilog HDL; the top module is shown in Fig. 2 and the pseudocode in Table 1. The module has three 32-bit output signals x_n, y_n, and z_n for the chaotic system. The clock signal 'clk' and the step size 'h' are the input signals. The value of h has been chosen as 2^-7; it is taken as a power of two because division and multiplication by powers of two in binary logic can be implemented simply as right and left shift operations, respectively. A counter 'C_p' increments by '1' every time the sample values are calculated, until it reaches the parameter N = 50,000; N can be varied depending on the number of samples required by an application. Two tasks are declared: product, for the multiplication operations in the datapath, and F_det, which computes the time derivatives of the state variables for given inputs. All intermediate slopes K_i are evaluated using a case statement, and the product and F_det tasks are used in the datapath. Note that when configuring the FPGA for a particular chaotic system, only the definition of the F_det task, which evaluates xdot, ydot, and zdot in accordance with the characteristic equations, changes; the remaining code is identical for all the chaotic systems. This makes it easy for the user to implement any new chaotic system on the FPGA quickly.
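The power-of-two choice of h can be illustrated with integer arithmetic: in a fixed-point datapath, dividing by 2^7 is an arithmetic right shift by 7 bits and multiplying by 2 is a left shift by 1. A small sketch, assuming an illustrative Q16.16 fixed-point format:

```python
# Q16.16 fixed point: 16 integer bits, 16 fractional bits (illustrative format).
def to_fixed(v):
    return int(round(v * (1 << 16)))

def from_fixed(v):
    return v / (1 << 16)

x = to_fixed(3.5)       # the value 3.5 in Q16.16
half_step = x >> 7      # x * 2**-7: a right shift replaces the division
doubled = x << 1        # x * 2: a left shift replaces the multiplication
```

Multiplications by h and h/2 in Eqs. (4)-(6) therefore cost no DSP multipliers at all; only the genuinely nonlinear product terms need the product task.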
Eleven chaotic systems [18-30] having quadratic-type nonlinearity have been designed using the above methodology. The name/reference of each chaotic system, its three-dimensional characteristic equations, parameter values, and number of arithmetic operations are tabulated in Table 2. These chaotic systems have been both synthesized and simulated; the results obtained are presented and discussed below.

Fig. 2 Top module of RK4 method in Verilog

Table 1 Pseudocode to implement the proposed FSM in Xilinx

1 Module RK4(x_n, y_n, z_n, w_n, h, clk)
  Parameter N = 50,000
  Reg [31:0] C_p = 0
2 Task product(input [31:0] a, b, output reg [31:0] c)
3 Task F_det(input [31:0] x, y, z, w, output reg [31:0] xdot, ydot, zdot, wdot)
4 Always @(posedge clk)
  Case (state)
    S0: begin x_n = 32'h0000_0000; y_n = 32'h0002_0000; z_n = 32'h0000_0000
    S1: begin
          if (C_p < N)
            F_det(x_n, y_n, z_n, w_n, K_x1, K_y1, K_z1, K_w1)
          state = S2;
        end

3.1 Synthesis Results

The target device used for this purpose is the 7a200tffv1156-1 from Xilinx's Artix-7 FPGA family, which provides 134,600 slice LUTs and 740 DSP blocks. The percentage utilization of the available resources and the total delay, including both logic and net delay, for each chaotic system are summarized in Table 2. It is evident from Table 2 that an increase in the number of operations in the characteristic equations also increases the hardware requirements. The total delay, however, varies from system to system depending on the logic operations as well as the net delay. Among the implemented chaotic systems, the comparative analysis favors the Pehlivan system for its lower hardware requirements and the Rössler system for its lower delay.
Since ample hardware resources are available on the Artix device, with each of these chaotic systems using less than 10% of them, hyperchaotic systems can also be implemented on the same device. Moreover, delay becomes the critical parameter when the hardware requirements are within limits; based on the results obtained, we therefore recommend the Rössler chaotic system for FPGA-based applications.
FPGA-Based Design of Chaotic Systems with Quadratic Nonlinearities 157

Table 2 Chaotic systems' characteristic equations and synthesis results. Each entry lists: chaotic system (references); characteristic equations and parameter values; number of operations (number of product terms in brackets); % utilization of slice LUTs; % utilization of DSP blocks; total delay (ns).

1. Rössler [18]: ẋ = −y − z, ẏ = x + ay, ż = b + z(x − c); a = 0.2, b = 0.2, c = 5.7 | 7 (2) | 4.08 | 8.65 | 30.745
2. Lorenz [19]: ẋ = a(y − x), ẏ = cx − xz − y, ż = xy − bz; a = 10, b = 8/3, c = 28 | 9 (5) | 4.33 | 12.97 | 42.250
3. Pehlivan [20]: ẋ = y − x, ẏ = ay − xz, ż = xy − b; a = b = 0.5 | 6 (3) | 3.63 | 10.81 | 40.310
4. [21]: ẋ = a(y − x), ẏ = xz − y, ż = b − xy − cz; a = 5, b = 16, c = 1 | 8 (4) | 4.10 | 11.89 | 41.637
5. [22]: ẋ = a(y − x) + yz, ẏ = cx − y − xz, ż = xy − bz; a = 35, b = 8/3, c = 25 | 11 (6) | 4.81 | 16.22 | 41.391
6. MACM [23]: ẋ = −ax − byz, ẏ = −x + cy, ż = d − y² − z; a = 2, b = 2, c = 0.5, d = 4 | 11 (5) | 3.78 | 10.81 | 47.978
7. [24, 25]: ẋ = a(y − x), ẏ = cx − xz, ż = xy − bz; a = 35, b = 3, c = 35 | 9 (5) | 4.29 | 14.05 | 42.242
8. Li [26]: ẋ = a(y − x), ẏ = xz − y, ż = b − xy − cz; a = 5, b = 16, c = 1 | 8 (4) | 4.10 | 11.89 | 41.637
9. Rabinovich [27, 28]: ẋ = hy − ax + yz, ẏ = hx − by − xz, ż = xy − dz; a = 4, b = 1, d = 1, h = 6.75 | 14 (8) | 4.61 | 12.97 | 45.204
10. Chen [29]: ẋ = a(y − x), ẏ = (c − a)x + cy − xz, ż = xy − bz; a = 35, b = 3, c = 28 | 11 (6) | 4.50 | 15.14 | 39.717
11. Lü [30]: ẋ = a(y − x), ẏ = cy − xz, ż = xy − bz; a = 36, b = 3, c = 20 | 9 (5) | 4.29 | 14.05 | 40.431

3.2 Simulation Results

In the FPGA-based design flow, functional verification of the implemented system through simulation results is a necessary step to validate the design. For the validation of the FPGA-based results, all 11 chaotic systems have also been simulated in Python using the RK4 numerical method; the results, in the form of time series, for two of the systems, Lorenz and Rössler, are shown together with the Xilinx simulation results in Fig. 3. The simulation results from Xilinx Vivado are in line with the simulation results from Python, confirming the feasibility of implementing these chaotic systems on FPGA.

Fig. 3 Numerical simulation and Xilinx Vivado simulation results of a Lorenz, b Rössler

4 Conclusion

In this paper, the FPGA digital circuit design of eleven chaotic systems using the RK4 numerical method in the Verilog hardware description language has been proposed. The advantages of the proposed methodology over analog counterparts are its field programmability and the ease of implementing a new chaotic system. All considered chaotic systems have been synthesized and compared in terms of percentage utilization of hardware resources on the target FPGA device, Artix-7, and total time delay. While the Pehlivan chaotic system performs best in terms of hardware utilization, the Rössler chaotic system is the best fit where low delay is required. The simulation results have also been validated against Python-based numerical simulations. Based on this design methodology, these chaotic systems can be further used for digital applications.

References

1. Paliwal A, Mohindroo B, Suneja K (2020) Hardware design of image encryption and decryption
using CORDIC based chaotic generator. In: 2020 5th IEEE international conference on recent
advances and innovations in engineering (ICRAIE), Jaipur, India, pp 1–5. https://doi.org/10.
1109/ICRAIE51050.2020.9358354
2. Tang Z, Yu S (2012) Design and realization of digital image encryption and decryption based
on multi-wing butterfly chaotic attractors. In: 2012 5th international congress on image and
signal processing, Chongqing, China, pp 1143–1147. https://doi.org/10.1109/CISP.2012.646
9744
3. Negi A, Saxena D, Suneja K (2020) High level synthesis of chaos based text encryption
using modified hill cipher algorithm. In: 2020 IEEE 17th India Council International Confer-
ence (INDICON), New Delhi, India, pp 1–5. https://doi.org/10.1109/INDICON49873.2020.
9342591
4. Gomar S, Ahmadi M (2019) A digital pseudo random number generator based on a chaotic
dynamic system. In: 2019 26th IEEE international conference on electronics, circuits and
systems (ICECS), Genoa, Italy, pp 610–613. https://doi.org/10.1109/ICECS46596.2019.896
4861
5. Suchit S, Suneja K (2022) Implementation of secure communication system using chaotic
masking. In: 2022 IEEE global conference on computing, power and communication tech-
nologies (GlobConPT), New Delhi, India, pp 1–5. https://doi.org/10.1109/GlobConPT57482.
2022.9938303
6. Yang T, Wu CW, Chua LO (1997) Cryptography based on chaotic systems. IEEE Trans Circ
Syst I Fundam Theor Appl 44(5):469–472. https://doi.org/10.1109/81.572346
7. Tuna M, Alçın M, Koyuncu I, Fidan CB, Pehlivan I (2019) High speed FPGA-based chaotic
oscillator design. Microproces Microsyst 66:72–80
8. Tuna M, Fidan CB (2016) Electronic circuit design, implementation and FPGA-based realiza-
tion of a new 3D chaotic system with single equilibrium point. Optik 127(24):11786–11799
9. Chen S, Yu S, Lü J, Chen G, He J (2018) Design and FPGA-based realization of a chaotic
secure video communication system. IEEE Trans Circ Syst Video Technol 28(9):2359–2371.
https://doi.org/10.1109/TCSVT.2017.2703946
10. Nuñez-Perez JC, Adeyemi VA, Sandoval-Ibarra Y, Pérez-Pinal FJ, Tlelo-Cuautle E (2021)
FPGA realization of spherical chaotic system with application in image transmission. Math
Probl Eng. Article ID 5532106, 16p

11. Schmitz J, Zhang L (2017) Rössler-based chaotic communication system implemented on FPGA. In: 2017 IEEE 30th Canadian conference on electrical and computer engineering (CCECE), pp 1–4. https://doi.org/10.1109/CCECE.2017.7946729
12. Tolba MF, Elwakil AS, Orabi H, Elnawawy M, Aloul F, Sagahyroon A, Radwan AG (2020)
FPGA implementation of a chaotic oscillator with odd/even symmetry and its application.
Integration 72:163–170
13. Shi QY, Huang X, Yuan F, Li YX (2021) Design and FPGA implementation of multi-wing
chaotic switched systems based on a quadratic transformation. Chin Phys 30(2):020507-1–
020507-10
14. Koyuncu I, Özcerit A, Pehlivan I (2014) Implementation of FPGA-based real time novel chaotic
oscillator. Nonlinear Dyn 7:49–59
15. Garg A, Yadav B, Sahu K, Suneja K (2021) An FPGA based real time implementation of
Nosé hoover chaotic system using different numerical techniques. In: 2021 7th international
conference on advanced computing and communication systems (ICACCS), Coimbatore, India,
pp 108–113. https://doi.org/10.1109/ICACCS51430.2021.9441923
16. Cartwright JHE, Piro O (1992) The dynamics of Runge-Kutta methods. Int J Bifurcation Chaos
2:427–449
17. Sadoudi S, Tanougast C, Azzaz MS et al (2013) Design and FPGA implementation of a wireless
hyperchaotic communication system for secure real-time image transmission. J Image Video
Proc 2013:43. https://doi.org/10.1186/1687-5281-2013-43
18. Rössler OE (1976) An equation for continuous chaos. Phys Lett A 57(5):397–398
19. Lorenz EN (1963) Deterministic non-periodic flows. J Atmos Sci 20:130–141
20. Pehlivan I, Uyaroğlu Y (2010) A new chaotic attractor from general Lorenz system family and its electronic experimental implementation. Turkish J Electr Eng Comput Sci 18(2):171–184. https://doi.org/10.3906/elk-0906-67
21. Li XF, Chlouverakis KE, Xu DL (2009) Nonlinear dynamics and circuit realization of a new
chaotic flow: a variant of Lorenz, Chen and Lü. Nonlinear Anal Real World Appl 10(4):2357–
2368
22. Qi G, Chen G, Du S, Chen Z, Yuan Z (2005) Analysis of a new chaotic system. Physica A:
Stat Mechan Appl 352(2–4):295–308
23. Méndez-Ramírez R, Cruz-Hernández C, Arellano-Delgado A, Martínez-Clark R (2017) A new
simple chaotic Lorenz-type system and its digital realization using a TFT touch-screen display
embedded system. Complexity 6820492
24. Yang Q, Chen G (2008) A chaotic system with one saddle and two stable node-foci. Int J Bifur
Chaos 18:1393–1414
25. Liu Y, Yang Q (2010) Dynamics of a new Lorenz-like chaotic system. Nonlinear Anal Real
World Appl 11(4):2563–2572
26. Li XF, Chlouverakis KE, Xu DL (2009) Nonlinear dynamics and circuit realization of a new
chaotic flow: a variant of Lorenz, Chen and Lü. Nonlinear Anal Real World Appl 10:2357–2368
27. Pikovski AS, Rabinovich MI, Trakhtengerts VY (1978) Onset of stochasticity in decay
confinement of parametric instability. Soviet Physics JETP 47:715–719
28. Kocamaz UE, Uyaroğlu Y, Kizmaz H (2014) Control of Rabinovich chaotic system using sliding mode control. Int J Adapt Control Signal Process 28(12):1413–1421
29. Chen G, Ueta T (1999) Yet another chaotic attractor. Int J Bifurcat Chaos 9:1465–1466
30. Lu J, Chen G (2002) A new chaotic attractor coined. Int J Bifurcat Chaos 12:659–661. https://doi.org/10.1142/S0218127402004620
A Comprehensive Survey on Replay
Strategies for Object Detection

Allabaksh Shaik and Shaik Mahaboob Basha

Abstract Object detection is the task of predicting the precise location and category of objects in a scene. The development of the Convolutional Neural Network (CNN) gave rise to great advances in object detection. The most popular object detectors are YOLO and Faster RCNN (Region-based CNN). The primary limitation of these object detectors is their lack of capability to continually learn new objects in a dynamic world. Humans naturally acquire new knowledge continually while retaining old knowledge. However, every deep network has a limited capacity to learn and cannot exactly replicate the way humans perform continual learning. This is primarily due to a phenomenon called catastrophic forgetting, in which previously learnt knowledge cannot be retained while learning a new task. The issue of continual learning has been extensively studied in image classification applications, and those results are essential for resolving object detection problems. Incorporating continual learning strategies into existing deep learning-based object detectors will be very useful in applications like retail, autonomous driving, and surveillance. Various recent research findings rely on knowledge distillation to constrain the representation to hold older information; this rigid limitation is disadvantageous for learning new knowledge. Among the various techniques that exist in the literature, the replay-based approach is closest to the way humans perform continual learning to retain previous knowledge. This article surveys and analyzes the state-of-the-art replay techniques and compares them to identify the most suitable technique for object detection on edge devices.

Keywords Convolutional neural network · Object detector · Continual learning · Catastrophic forgetting · Replay strategies · Object detectors

A. Shaik (B)
Jawaharlal Nehru Technological University Anantapur, Ananthapuramu, Andhra Pradesh, India
e-mail: baksh402@gmail.com
Sri Venkateswara College of Engineering Tirupati, Affiliated to Jawaharlal Nehru Technological
University Anantapur, Ananthapuramu, Andhra Pradesh, India
S. M. Basha
N.B.K.R. Institute of Science and Technology, Affiliated to Jawaharlal Nehru Technological
University Anantapur, Vidyanagar, Ananthapuramu, Andhra Pradesh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_13

1 Introduction

In recent times, varied research efforts have been directed at practical approaches to speed up the progress of deep learning techniques, and many advanced techniques with exceptional outcomes can be observed. LeCun et al. developed the Convolutional Neural Network (CNN) [1], which augmented research advancements in object detection. Continuous refinements of deep learning procedures, backed by experimental results, are identified in the literature. Object localization is the recognition of all the objects in an image together with their precise locations. Effective deep learning techniques for object identification and localization in computer vision have been derived and refined rapidly. Object detection methods are widely used in various fields such as the military, radar sensing, and image processing. Exact object identification with respect to features like dimensions, postures, and viewpoints is a challenging area of research, and for the past few years enormous research has been carried out using Machine Learning (ML) and Deep Learning (DL) techniques. Xiao et al. proposed an object detection technique [2] which has a number of associations with object classification and semantic segmentation, explained with reference to Fig. 1.
Object classification involves determining the category to which objects in an
image belong. In contrast, object detection not only identifies the object categories
but also accurately locates them using rectangular bounding boxes. Semantic segmen-
tation focuses on predicting the object category for each individual pixel, without
distinguishing between different instances of the same object. On the other hand,
instance segmentation goes beyond semantic segmentation by not only predicting
the object categories for each pixel but also distinguishing between different instances
of objects.
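The quality of a predicted bounding box is conventionally scored against the ground truth by intersection over union (IoU, which also appears later in Table 1); a minimal sketch with boxes given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.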
Figure 2 illustrates the fundamental components utilized in object detection. The region selector employs a sliding-window approach, with windows of varying sizes and aspect ratios traversing the image, first from left to right and then from top to bottom, with a fixed step size. The sliding window is used to crop image blocks, which are then resized to a consistent dimension. Techniques for extracting attributes are described in HOG [3], Haar [4], and SIFT [5]. To recognize the type of object from the extracted attributes, classifiers such as SVM [6] and AdaBoost [7] are used. Figure 3 presents an overview of various object detection methods.

Fig. 1 a Object classification, b Object detection, c Semantic segmentation, d Instance segmentation

Fig. 2 Basic architecture of traditional object detection algorithm

Fig. 3 Conventional object detection based on DCNNs
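The region selector described above can be sketched as a generator of window coordinates; the window sizes and the step size below are illustrative.

```python
def sliding_windows(width, height, win_w, win_h, step):
    """Yield (x, y, win_w, win_h) crops, left to right then top to bottom,
    using a fixed step size, as in the traditional region selector."""
    for y in range(0, height - win_h + 1, step):      # top to bottom
        for x in range(0, width - win_w + 1, step):   # left to right
            yield x, y, win_w, win_h

# Windows of two sizes/aspect ratios over a hypothetical 64 x 48 image:
boxes = [b for w, h in [(16, 16), (32, 16)]
         for b in sliding_windows(64, 48, w, h, step=8)]
```

Even this small example produces dozens of candidate regions, which is why the exhaustive sliding-window stage is the bottleneck that later region proposal methods replace.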
Deep Convolutional Neural Networks (DCNNs) have undergone significant evolution over the years, leading to remarkable advancements in object detection. Object detection applications have frequently been used to investigate the evolution of DCNNs and their backbone networks. Subsequently, network frameworks and the reported loss functions for object detection are examined and compared. Future research will focus on improving object detection models' ability to generalize to unseen classes with limited or no training data, allowing better adaptability to novel scenarios. Object detection models will also be designed to learn continuously from new data, adapting to evolving environments without catastrophic forgetting.

2 Object Detectors

Object detection encompasses two main tasks: object localization and object classi-
fication. Deep Convolutional Neural Networks (DCNNs) have been widely used in
object detection, and they can be categorized into two types.
One type is the two-stage object detection architecture, which separates the tasks
of object localization and classification. This architecture first generates region
proposals and then classifies them. The key advantage of the two-stage approach
is its high accuracy. However, it suffers from slower detection speed. Some notable
examples of two-stage object detection architectures include RCNN [8], SPPNet [9],
Fast RCNN [10], Faster RCNN [11], Mask RCNN [12], and RFCN [13].
The other type is the one-stage object detection architecture, which directly locates and classifies objects using DCNNs without dividing the task into separate stages. A one-stage detector produces class probabilities and object location coordinates in a single step, eliminating the need for a region proposal process, which makes it simpler than the two-stage approach. One of the primary advantages of one-stage detectors is their ability to quickly identify objects in a scene; however, they often exhibit lower accuracy than two-stage architectures. Examples of one-stage object detection models include OverFeat [14], the YOLO series [15–17], SSD [18], DSSD [19], FSSD [20], and DSOD [21].
Table 1 presents performance parameters of some traditional two-stage and one-stage object detectors, while Fig. 6 illustrates the evolution of object detection milestones. For a comprehensive overview of milestone object detection designs, refer to Table 1, which summarizes their features, properties, and weaknesses. The subsequent sections investigate the details of the two-stage and one-stage object detection architectures and also highlight open-source object detection platforms.
RCNN uses a Deep Convolutional Neural Network (DCNN) as the backbone network for feature extraction, as opposed to HOG [3] and other typical hand-crafted feature extraction techniques, and pairs it with a region proposal method to generate region suggestions. As seen in Fig. 4, this gave rise to the RCNN architecture [8]. The steps of the RCNN pipeline are as follows. The selective search method produces about 2000 region proposals that are independent of any given category. These region proposals are fed into the DCNN, which creates a 4096-dimensional feature representation. The features are classified using the SVM approach, which improves RCNN performance by 30%. Although RCNN has demonstrated exceptional performance in object detection, it still has three key shortcomings, given below, that preclude it from being employed in practical applications. Roughly 2000 region proposals have to be pre-extracted for each image, which consumes a large amount of storage space and I/O resources; if an image does not yield enough region proposals, the correct objects may be missed. When AlexNet is used as the backbone network, each region block must be cropped or warped into a 227 × 227 RGB image, which truncates or stretches the object. This might lead to the loss of important

Table 1 Highlights, properties, and shortcomings of milestone object detection architectures

RCNN [8]. Highlights: uses Deep Convolutional Neural Networks (DCNNs) to extract image features; selects about 2000 proposals using the selective search algorithm; classifies regions with a Support Vector Machine (SVM); refines regions with a bounding-box regressor. Shortcomings: training is slow, consumes large resources, and is not end-to-end.

SPPNet [9]. Highlights: extracts features of the entire image with DCNNs; picks about 2000 region proposals on the image and maps them to the feature maps; spatial pyramid pooling allows multi-scale images to be submitted to the DCNNs. Shortcomings: selective search for region proposals is still slow; no end-to-end training.

Fast RCNN [10]. Highlights: extracts features from the entire image using DCNNs; extracts region proposals with the selective search technique and maps them to the feature maps; down-samples the region proposal data to obtain fixed-size feature maps. Shortcomings: selective search for region proposals is still slow; end-to-end training is not available.

Faster RCNN [11]. Highlights: replaces the selective search algorithm with a region proposal network (RPN); end-to-end training is achievable since the RPN shares the feature maps with the backbone network. Shortcomings: poor performance on objects scaled far up or down, which is a problem for applications that must respond quickly to changes, such as real-time games.

Mask RCNN [12]. Highlights: uses an ROI align layer instead of the ROI pooling layer, which improves detection precision; combines object detection and segmentation training for better detection accuracy; relevant for small object detection. Shortcomings: detection speed cannot keep up with real-time requirements.

YOLO [15]. Highlights: a novel single-stage detection network that detects objects at high speed, meeting real-time demands. Shortcomings: low detection accuracy on dense or small objects.

YOLOv2 [16]. Highlights: employs multi-dataset joint training; uses a new backbone network, DarkNet19; creates anchor boxes using the k-means clustering technique. Shortcomings: training is complex.

YOLOv3 [17]. Highlights: uses a new backbone network, DarkNet53, with feature fusion (combining the results of multiple layers) at many levels to increase the precision of multi-scale detection. Shortcomings: performance declines with increasing IoU.

SSD [18]. Highlights: uses a multi-layer detection mechanism; employs the multi-scale anchors mechanism at many levels. Shortcomings: not well suited to detecting small objects.

DSSD [19]. Highlights: uses a multi-layer detection mechanism; up-sampling with deconvolution instead of simple linear interpolation improves the resolution of the image. Shortcomings: detection speed decreases relative to SSD.

Fig. 4 R-convolutional neural network architecture

information. Another issue is that each region proposal is processed fully separately and does not make use of DCNN feature sharing, which means extracting features for all of them requires a significant amount of resources.
In SPPNet, a spatial pyramid pooling layer [9] is introduced after the last convolutional layer, and the cropping/warping phase of RCNN is eliminated; cropping or distorting an image can result in missing object information. With spatial pyramid pooling, an image of any size can be input into the DCNNs, producing a fixed-length feature vector of 21 bins per feature map for the fully connected (FC) layer, which then performs the classification. SPPNet test performance is 10-100 times faster than RCNN since the complete feature map is shared. However, because SPPNet and RCNN are multi-stage pipelines rather than end-to-end trainable models, they cannot be executed without extensive data preprocessing. Furthermore, in SPPNet the convolutional layers cannot be trained further during fine-tuning, which constrains network accuracy [10].
Girshick et al. described object detection using regional convolutional neural networks, and the proposed method gave future researchers an avenue to develop efficient algorithms that offer better accuracy and speed [8]. Numerous industries, including high-end robotics and automation, biometric and face identification, and medical imaging, use object detection. Based on how the tasks of classification and bounding-box regression are carried out, the majority of object detectors can be divided into two groups: two-stage and single-stage detectors.
A Comprehensive Survey on Replay Strategies for Object Detection 169

Dai et al. suggested that region-based fully convolutional networks [13] can recognize objects within the bounding box faster than RCNN. For the purpose of locating the target item within the bounding box, RCNN employs the selective search approach.
Girshick et al.'s work focused on Fast RCNN [10], a fast deep learning algorithm that uses an RoI pooling layer to reduce training time and memory. The RoI pooling layer reduces the number of required training samples by grouping nearby points into regions, which allows the algorithm to concentrate on the more crucial areas. In Fast RCNN, a single DCNN pass computes a feature map of the whole image. The selective search method locates region proposals, which are mapped onto the feature map. The RoI pooling layer then converts the differently sized feature regions into fixed-size feature vectors that are sent to the fully connected (FC) layer. Finally, bounding-box regression precisely estimates the object position, while the Softmax layer predicts the item categories, as shown in Fig. 5.
Using a multi-task loss, Fast RCNN trains classification and bounding-box regression concurrently, so the two tasks share convolution information; this multi-task training replaces the stage-wise SVM + bounding-box regression training. The resulting benefits of Fast RCNN over RCNN/SPPNet are as follows. Fast RCNN performs better in terms of accuracy than RCNN/SPPNet. Training of the detector is end-to-end due to the multi-task loss. Compared to SPPNet, which updates just the fully connected (FC) layers, Fast RCNN training can update all network layers. Hard-disk storage is not required for feature caching. Training and testing are quicker than for RCNN/SPPNet.
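To make the fixed-size pooling concrete, here is a minimal NumPy sketch of max-pooling a single RoI of a 2-D feature map to a fixed grid, so regions of different shapes yield equal-length vectors. Real Fast RCNN implementations operate on batched multi-channel tensors; the function name and arguments here are illustrative only.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(2, 2)):
    """Max-pool one RoI of a 2-D feature map down to a fixed out_size grid."""
    x0, y0, x1, y1 = roi                          # RoI in feature-map coordinates
    region = feature_map[y0:y1, x0:x1]
    oh, ow = out_size
    # Split the region into oh x ow (possibly unequal) cells.
    h_edges = np.linspace(0, region.shape[0], oh + 1).astype(int)
    w_edges = np.linspace(0, region.shape[1], ow + 1).astype(int)
    out = np.empty(out_size)
    for i in range(oh):
        for j in range(ow):
            cell = region[h_edges[i]:h_edges[i + 1], w_edges[j]:w_edges[j + 1]]
            out[i, j] = cell.max()                # max over each cell
    return out
```

Whatever the RoI's shape, the output is always `out_size`, which is what lets variable-sized proposals feed a fixed-width FC layer.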

Fig. 5 Fast RCNN architecture

Fig. 6 Faster RCNN architecture



The selective search technique takes a long time to examine all region proposals in the image and map them onto feature maps: Fast RCNN required about 2.3 s per prediction at test time, and roughly 2 s of that time was used to build 2000 RoIs. The conventional region proposal methodologies are thus the bottleneck of the object detection architecture. Ren et al. created a region proposal network (RPN) in Faster RCNN [11] to resolve this issue. The RPN shares the full-image convolutional feature maps with the detection network, so region proposal extraction adds almost no cost (Fig. 6).
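As a concrete illustration of the RPN's dense anchoring, the sketch below enumerates anchors of several scales and aspect ratios at every feature-map cell. The function name and defaults are illustrative (the common 16-pixel stride with three scales and three ratios), not taken from any particular implementation.

```python
from itertools import product

def generate_anchors(fm_h, fm_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (cx, cy, w, h) anchors for every cell of an fm_h x fm_w map.

    Each cell yields len(scales) * len(ratios) anchors, so with the
    defaults an H x W map produces H * W * 9 anchors.
    """
    anchors = []
    for y, x in product(range(fm_h), range(fm_w)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride   # cell centre in pixels
        for scale, ratio in product(scales, ratios):
            # ratio = h / w, chosen so that w * h stays equal to scale**2
            w = scale / ratio ** 0.5
            h = scale * ratio ** 0.5
            anchors.append((cx, cy, w, h))
    return anchors
```

Classification and regression heads then score and refine every one of these anchors in a single pass over the shared feature map.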
YOLO is a popular single-stage detector widely used in the literature for real-time object detection. Unlike two-stage detectors, all the YOLO versions (V1, V2, V3, etc.) operate in a single stage: there is no intermediate computation of probable region proposals.
Redmon et al. proposed single-stage detectors that execute in a single step. YOLO V1 [15] computes both the classification label and the bounding-box coordinates directly from the feature maps computed by the backbone feature extractor. However, all these object detectors suffer from the setback of catastrophic forgetting: they tend to forget the information gained from formerly learnt classes when refined for new classes. In YOLO, the given picture is divided into a grid of N × N cells, and each cell predicts confidences for B bounding boxes. The predicted result is encoded as a tensor of size N × N × (5B + C), where C is the number of classes. A detected bounding box has five attributes, namely the confidence score, width, height, and center coordinates (x, y). A number of limitations also exist in YOLO V1, largely tied to the closeness of objects in the image: if objects appear in a group, the small objects among them cannot be found. The key concern is localization error when locating objects in a given image; for example, YOLO V1 sometimes misidentifies a human as an airplane (Fig. 7).
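The shape bookkeeping above can be sketched in a few lines; `yolo_v1_output_shape` is a hypothetical helper whose defaults match the original paper's 7 × 7 grid, 2 boxes per cell, and 20 classes.

```python
def yolo_v1_output_shape(grid=7, boxes=2, classes=20):
    """Shape of the YOLO V1 prediction tensor.

    Each of the grid x grid cells predicts `boxes` bounding boxes, every
    box carrying five attributes (x, y, w, h, confidence), plus a single
    set of `classes` class probabilities per cell.
    """
    return (grid, grid, boxes * 5 + classes)
```

With the defaults this gives the familiar 7 × 7 × 30 tensor of the original paper.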
Fig. 7 YOLO architecture

Redmon and Farhadi's YOLO V2 [16] supersedes YOLO V1 and achieves a strong balance between running speed and precision. For higher accuracy, YOLO V2 integrates batch normalization into each convolution layer, which alone raises mAP by about two percentage points. A high-resolution classifier is employed so that the network's filters can adapt to larger inputs and perform well. For objects of various aspect ratios, YOLO V1 generalizes weakly; YOLO V2 addresses this by introducing anchors, so each grid cell can predict boxes at multiple scales and aspect ratios. To discover the prior bounding boxes automatically, YOLO V2 uses the k-means clustering technique, which increases detection accuracy. By restricting the predicted offset relative to the grid cell to a range between 0 and 1, YOLO V2 resolves the instability of the anchor-prediction approach in early training.
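The anchor-discovery step can be sketched as k-means over box (width, height) pairs using a 1 − IoU distance, as in YOLO V2. This is a simplified pure-Python illustration with made-up function names, not the paper's implementation.

```python
import random

def iou_wh(box, centroid):
    """IoU between two (w, h) pairs, both anchored at the same centre."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU to obtain k anchor priors."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid it overlaps most.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        # Recompute each centroid as the mean (w, h) of its cluster.
        centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids
```

Using IoU rather than Euclidean distance keeps large and small boxes from being traded off purely by pixel size.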
For object detection, Redmon et al. proposed the enhanced version, YOLO V3 [17], which is employed in many aspects of human life, including health, education, and many other sectors; as these sectors developed, the one-stage model had to be improved as well. In YOLO V3, the objectness score is calculated using logistic regression. Instead of the Softmax layer used in YOLO V2, YOLO V3 employs a per-class logistic classifier, and it uses DarkNet53, which has 53 convolution layers; YOLO V3 is therefore considerably deeper than YOLO V2's DarkNet19 backbone. YOLO V3 addressed YOLO V2's issues and struck a balance between speed and accuracy.
Both the RCNN series and YOLO exhibit a trade-off between accuracy and speed: RCNN has outstanding accuracy but reduced speed for object detection, whereas YOLO detects objects fast but its accuracy on small objects is limited.
Liu et al. presented the single-shot multibox detector (SSD) [18], which takes into account the benefits of both Faster RCNN and YOLO. SSD uses VGG16 as the backbone network for feature extraction. Figure 8 depicts the hierarchical feature extraction of the SSD network. Using an anchor mechanism in conjunction with multi-scale feature maps, as in Faster RCNN, SSD can be adapted to detect multi-scale objects. SSD512 outperforms Faster RCNN with a VGG16 backbone in accuracy while running about three times faster, and SSD300 runs at 59 frames per second, faster than YOLO, with substantially better detection quality [18].
Fig. 8 Single-shot multibox detector architecture

DSSD [19] uses a deeper backbone network, ResNet101, to improve the SSD's low-level feature maps. Feature fusion on the low-level feature maps is achieved by incorporating deconvolution and skip-connection modules. Similarly, FSSD [20], also based on SSD, fuses low-level features into higher-level representations.

3 Continual Learning and Catastrophic Forgetting

Research on training object detection models to continuously learn from fresh data over time, as surveyed by Menezes et al. [22], is known as continual learning for object detection. Traditional object detection algorithms are frequently trained on a fixed dataset and assumed to have access to all training data. However, in real-world scenarios, new object classes, variations, or environments may emerge after the initial training, requiring the model to adapt and learn from these new examples. Continual learning approaches aim to address this challenge by enabling object detection models to be trained incrementally on new data while preserving the knowledge gained from preceding data. The goal is to avoid catastrophic forgetting, in which the model forgets earlier learned data in favor of fresh data. Several techniques and strategies are used in continual learning for object detection, such as regularization, replay and memory, generative models, knowledge distillation, task freezing, and fine-tuning.
Modern researchers can use such a recap to guide their incremental object detector studies. In this manner, we contribute a brief and methodical recap of the key solutions to the issue of continuously learning and identifying new object instances. In the area of continuous object detection, CL methods are combined to address memory loss and knowledge transferability between object detection tasks. A broad understanding of both subjects is necessary in order to identify prospects in the area and understand the findings of this analysis.
The task ID is essential in determining the types of classes and distributions that can be encountered during testing in classification tasks; it determines whether a task-specific approach is required or whether a more general strategy is sufficient. As such, van de Ven and Tolias's convention of three typical task scenarios has been widely adopted in the continual learning (CL) literature. Task-incremental learning assumes that the model possesses knowledge of the task ID during both training and testing, which enables a task-specific solution. Domain-incremental learning, on the other hand, does not provide the task ID during testing but retains the task structure: class labels are normally kept, although the distribution of the data may alter. In class-incremental learning, the model must infer the task ID, since it is assumed not to be provided at test time; the model must gradually incorporate more classes and increase the variety of its predictions. Task-free or task-agnostic CL adds a scenario in which the task labels are provided during neither training nor testing, making it the most challenging scheme: the model must cope with a changing data distribution while lacking any knowledge of task boundaries. The major techniques are separated into three families: parameter isolation, regularization, and episodic replay. In parameter isolation, some of the network's parameters are frozen and a new layer is added whenever a new task is presented to the network.

This enables increased network capacity without training from scratch. Regularization prevents the network from overfitting to either the new or the old classes, aiding the learning ability of the object detection architectures. The final technique is based on a replay mechanism, where data corresponding to previously learnt knowledge are constantly replayed to the deep networks, which helps avoid the problem of catastrophic forgetting.
McCloskey and Cohen [23] and Ratcliff [24] describe catastrophic forgetting, an issue that affects neural networks along with other learning systems, both biological and artificial. A learning system may forget how to perform an earlier task when it is trained on another. A well-supported model of biological learning in humans suggests that neocortical neurons learn using a procedure that is prone to catastrophic forgetting, and that the neocortical learning procedure is complemented by a virtual experience process that replays memories stored in the hippocampus in order to continually reinforce tasks that have not been carried out recently [25]. As artificial intelligence researchers, the lesson we may take from this is that it is acceptable for our learning algorithms to suffer from forgetting, but they may need complementary algorithms to reduce the resulting information loss. Designing such complementary algorithms relies on understanding the characteristics of the forgetting experienced by our modern learning algorithms.
Goodfellow et al. [26] investigate the extent to which catastrophic forgetting affects a variety of learning algorithms and neural network activation functions. Neuroscientific evidence suggests that the relationship between the old and new tasks strongly influences the outcome of two successive learning experiences. Consequently, they consider three distinct types of relationship between tasks: tasks that are functionally identical but with distinct input formats, tasks that are similar, and tasks that are dissimilar.

4 Replay Strategies for Object Detection

Kudithipudi et al. [27] described how humans can learn about new objects from limited experience without forgetting the information about old objects. The brain draws on a wide variety of mechanisms, including neurogenesis, episodic replay, meta-plasticity, neuro-modulation, context-dependent perception and gating, hierarchical distributed systems, and many more. Among these multiple paradigms of continual learning, there is strong evidence in support of episodic replay for memory consolidation in the biological brain (Fig. 9).
Fig. 9 Replay-based continual learning framework

Even though episodic replay has been considerably explored in the context of image classification, very few works exist in the space of object detection. Figure 9 shows the generic block diagram for a replay-based continual learning framework. Replay-based techniques associate a memory with the object detection network, where either the instances or the feature maps corresponding to key instances are replayed to the network at regular intervals to avoid the catastrophic forgetting phenomenon. Both the classification and the bounding-box regression modules acquire the instance-level features. Binary cross-entropy is typically used as the classification loss; however, focal loss tends to minimize the effects of class imbalance. The classification loss over N samples can be written as

L_{cls} = \sum_{i=1}^{N} FL(d_t^i, d_p^i), (1)

where d_t^i indicates the ground-truth one-hot vector and d_p^i the predicted one-hot vector for the ith sample. For bounding-box regression, the typical loss function is

L_{loc} = \sum_{i \in \{x, y, w, h\}} smooth_{L1}(t_i^u - v_i). (2)

The smooth-L1 loss in Eq. 2 above is more robust than the L2 loss; {x, y, w, h} indicates the coordinates of the bounding box. This paper attempts to review and present a comparative analysis of modern replay-based approaches in object detection.
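A minimal scalar sketch of the two losses discussed above, binary focal loss and smooth-L1; production detectors compute these over whole tensors of predictions, so this is for illustration only and the function names are our own.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one predicted probability p and label y in {0, 1}.

    The (1 - pt)**gamma factor down-weights easy, well-classified examples
    relative to plain cross-entropy, which mitigates class imbalance.
    """
    pt = p if y == 1 else 1.0 - p          # probability of the true class
    a = alpha if y == 1 else 1.0 - alpha
    return -a * (1.0 - pt) ** gamma * math.log(pt)

def smooth_l1(x):
    """Smooth-L1 loss on a residual x = t - v.

    Quadratic near zero, linear for |x| > 1, hence less sensitive to
    outlier regression targets than a pure L2 loss.
    """
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5
```

Note how a confident correct prediction (p close to 1 for y = 1) contributes almost nothing to the focal loss, while a hard example keeps nearly its full cross-entropy weight.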
In Shin et al.'s paper “Continual Learning with Deep Generative Replay” [28], episodic replay is employed in order to continue learning: new data and old data held in memory are replayed at the same time. A key drawback of plain replay is that the old samples must be stored in memory, which cannot be scaled to larger datasets. Deep generative replay eliminates the need to store key samples: the model is trained on generated pseudo-data that replay the knowledge of old objects. The network adopts a twofold design, called a Scholar model, that combines a deep generative model (the “generator”) with a task-solving model (the “solver”). The generator is trained within the Generative Adversarial Network (GAN) framework and produces fake data matching the desired targets. When a new task is presented, both the generator and the solver are updated, and the input–target pairs produced by the previous generator–solver pair are used to teach the new models. In this way the network retains knowledge without needing to revisit the past data, thereby overcoming catastrophic forgetting.
In their study titled “Take goods from shelves: A dataset for class-incremental object detection” [29], Hao et al. made a valuable contribution by proposing a dataset specifically designed for class-incremental object detection. The dataset comprises 38,000 high-quality images organized into three coarse classes and 24 fine-grained classes. This work represents an advanced approach to class-incremental techniques.
The researchers adopted Faster RCNN (FRCNN) as the base model and made
several modifications to enhance its detection capabilities without sacrificing the
knowledge learned from previous classes. Specifically, they focused on modifying
the classification part of the model in a class-incremental fashion while keeping
the regression part unchanged. Additionally, the authors of [29] introduced a novel
technique involving knowledge distillation applied to the FRCNN branch, further
improving the model’s performance.
By leveraging these innovations, the authors successfully addressed the chal-
lenges of class-incremental object detection, providing a valuable resource for
future research in this field. The strategy applied here is an image-level exemplar management strategy used to avoid forgetting in the class-incremental learning model. Even though this approach does not directly use replay techniques, it provides a useful dataset for continual learning, so a brief description of the work has been included in this section.
Shieh et al.'s paper “Continual Learning for Autonomous Driving” [30] describes a continual learning framework for a one-stage object detection network that effectively combines old and new data through a memory-based replay technique. A portion of previously seen data is stored within a memory buffer, and the replayed information is used alongside the new data to avoid the catastrophic forgetting problem; each batch of data fed into the network contains both old and new classes. The approach has been validated on a modified version of the Pascal VOC 2007 dataset, making it suitable for continual learning with a YOLO network as the base detection model. However, augmented or expanded images must be stored in memory, which leads to some loss in accuracy.
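The batch composition described above can be sketched as follows; `build_replay_batch` and its replay ratio are hypothetical illustrations, not taken from [30].

```python
import random

def build_replay_batch(new_samples, memory_buffer, batch_size,
                       replay_ratio=0.5, seed=None):
    """Compose a training batch mixing new-class samples with old samples
    replayed from a memory buffer, so every batch contains both."""
    rng = random.Random(seed)
    # Cap the replayed portion by what the buffer actually holds.
    n_old = min(int(batch_size * replay_ratio), len(memory_buffer))
    n_new = batch_size - n_old
    batch = rng.sample(memory_buffer, n_old) + rng.sample(new_samples, n_new)
    rng.shuffle(batch)                      # avoid old/new ordering bias
    return batch
```

Feeding such mixed batches is what keeps gradients from drifting entirely toward the new classes.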
Acharya et al.'s paper “RODEO: Replay for online object detection” [31] addresses object detection as a localization task that entails predicting bounding boxes and class labels for all objects in a scene. The majority of deep learning systems for detection are trained offline, i.e., they cannot be continually updated with new object classes, and if the trained deep networks are simply fine-tuned, they suffer from catastrophic forgetting. In this work, RODEO replays compressed feature representations of objects from a fixed memory buffer while fine-tuning the pre-trained deep network. The feature representations are taken from an intermediate layer of the CNN backbone and compressed to reduce storage requirements. During the training procedure, the RODEO framework combines a random subset of samples from its replay buffer with the new input.

Yang et al.'s work “One-Shot Replay: Boosting Incremental Object Detection via Retrospection of One Object” [32] focuses on generating new data: synthetic samples containing objects of both old and new classes are created by copying the stored one-shot object of each old class and pasting it onto new samples or a clean background at random. The synthetic samples are fed into a dual network, where the old model provides the knowledge of old classes in the features and outputs. To minimize the storage of old data, the authors propose to store only one object per old class. They use copy–paste to perform replay for incremental learning, which replays objects of old classes by augmenting new samples. The approach first selects a cropped object from memory and resizes it to a random width and height within a range, then searches for a position in the new sample at which to paste the object, such that the IoUs between the object and the ground truths of the new sample stay below a threshold; the search time is bounded by an upper limit. The advantage of the copy–paste technique is that the memory usage of instance-level information is far less than that of whole images. Moreover, the approach does not enlarge the training set, so it does not increase the number of forward steps or the time consumed for training.
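The bounded paste-position search can be sketched as rejection sampling against an IoU threshold; the function names, thresholds, and try limit below are illustrative, not taken from [32].

```python
import random

def iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def find_paste_position(obj_w, obj_h, img_w, img_h, gt_boxes,
                        iou_thresh=0.2, max_tries=50, seed=None):
    """Search for a spot where a cropped old-class object can be pasted so
    its IoU with every new-sample ground truth stays below iou_thresh.
    Gives up after max_tries, which bounds the search time."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        x0 = rng.uniform(0, img_w - obj_w)
        y0 = rng.uniform(0, img_h - obj_h)
        cand = (x0, y0, x0 + obj_w, y0 + obj_h)
        if all(iou(cand, gt) < iou_thresh for gt in gt_boxes):
            return cand
    return None                              # no valid position found
```

Returning `None` when no placement is found is one simple way to honor the upper limit on search time.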
In Kim et al.'s paper “Continual Learning on Noisy Data Streams through a Self-Purified Replay” [33], the proposed Self-Purified Replay (SPR) is applied in a setting that combines continual learning with noisy-label classification. The technique performs continual learning by means of a replay-based method, and this simple procedure surpasses previous techniques in both performance and memory efficiency. Replaying from a noisy buffer intensifies the forgetting process because of the deceptive mapping of earlier knowledge, so filtering noise from the input data stream before storing it in the replay buffer is crucial; SPR ensures that a clean replay buffer is maintained.
Mirza et al.'s paper “Domain Incremental through Statistical Correction (DISC)” [34] described how extremely challenging it is to build autonomous vehicles with the ability to adapt to new weather conditions: while being adapted to new weather conditions, they tend to forget the information previously learnt. The approach proposed in this paper, referred to as DISC, can incrementally learn new tasks without the need for retraining or large storage space for previously learnt samples. The weather changes in DISC are captured in the form of statistical first- and second-order moments, which consume very little storage space. However, these statistical parameters capture only the global weather changes and may not adapt easily to domain shifts within local regions, such as an object's appearance.
Chen et al. in their paper titled “Rehearsal balancing priority assignment network
(RBPAN)” [35] proposed a continual learning detector for remote sensing image
applications with very high resolution (VHR). Due to the inherent class imbalance
problem in many datasets, the network tends to be biased to a certain class and this
in turn leads to rehearsal imbalance where the samples corresponding to certain
classes are given higher priority during the rehearsal. The authors of this work
propose RBPAN, which uses the entropy reservoir sampling technique to maintain the rehearsal balance during the training procedure. The proposed network in [35] assigns adaptive weights to the images during the replay procedure, which boosts the importance of the minority classes while decreasing the weights assigned to the majority classes.

Fig. 10 Comparison of various sampling strategies [36]: mIoU on BDD, on Cityscapes, and on average for the BRISQUE, class-balanced buffer, class-balanced samples, GSS, and RSS strategies
Kalb et al. study semantic segmentation, another methodology for identifying objects in images or videos [36]. The authors in [36] proposed an improved replay technique and showed that, within a class-incremental learning framework, maintaining a uniform distribution of classes in the memory buffer prevents the new object classes from biasing the network, whereas in a domain-incremental learning setting, sampling the features uniformly from the different domains tends to decrease the representation shift and thus avoids the problem of catastrophic forgetting. The comparative analysis of various sampling techniques is presented in Fig. 10.
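A class-balanced memory buffer of the kind discussed above can be sketched as follows; this is a simplified illustration that evicts from the currently largest class, not the exact scheme of [35] or [36].

```python
from collections import defaultdict

class ClassBalancedBuffer:
    """Replay buffer keeping an (approximately) uniform number of stored
    samples per class by evicting from the largest class when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = defaultdict(list)        # class label -> stored samples

    def __len__(self):
        return sum(len(v) for v in self.slots.values())

    def add(self, sample, label):
        self.slots[label].append(sample)
        if len(self) > self.capacity:
            # Evict the oldest sample of whichever class is largest,
            # pushing per-class counts toward a uniform distribution.
            biggest = max(self.slots, key=lambda c: len(self.slots[c]))
            self.slots[biggest].pop(0)
```

Even when the incoming stream is dominated by one class, the buffer's contents drift toward equal per-class representation.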
The experiments were performed on the BDD and Cityscapes datasets, which are popular autonomous driving datasets, using mIoU as the metric for comparison across the various sampling strategies. The relevant quantities can be computed using the following equations:

IoU = Area of Overlap / Area of Union, (3)

Accuracy = Number of predictions detected correctly / Total number of predictions, (4)

Sensitivity = TP / (TP + FN), (5)

Precision = TP / (TP + FP), (6)

TPR = TP / (TP + FN), (7)

FPR = FP / (FP + TN), (8)

where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively. These equations give the accuracy, sensitivity, precision, true positive rate, and false positive rate used in calculating the mean average precision (mAP) and the mean intersection over union (mIoU) of the various strategies.
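The counts-to-metrics computation in the equations above can be sketched directly; `detection_metrics` is a hypothetical helper name.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Precision, sensitivity (= TPR), accuracy, and FPR from raw counts
    of true/false positives and negatives."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)              # identical to the TPR
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "sensitivity": sensitivity,
            "accuracy": accuracy, "fpr": fpr}
```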
The mean is computed across all instances and images to report the mIoU used for comparison across the popular sampling strategies. Among all the approaches compared in Fig. 10, the RSS technique performs best, which signifies the need for a balanced uniform sampling technique in a domain-incremental scenario.

5 Conclusions

This paper provides a comprehensive review of replay-based techniques utilized within the framework of continual object detection. Experience replay is one of the prominent mechanisms for memory retention within the human brain, and, inspired by it, several approaches have been proposed for continual learning in image classification networks; far less attention has been paid to object detection networks, an important computer vision problem spanning several safety-critical applications in which new data continually evolve. In an ideal scenario, an external memory is used in conjunction with the deep networks to store the knowledge about previous tasks, thereby preventing catastrophic forgetting. However, edge devices have only a small amount of memory, and methods akin to experience replay cannot scale to bigger datasets. Generative or one-shot replay techniques combined with balanced sampling strategies will therefore be ideal for edge devices, where access to memory is very limited.

References

1. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989)
CNN-LeNet. In: IEEE International phoenix conference on computers and communications
2. Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X (2020) A review of object detection based
on deep learning. Multimedia Tools and Appl 23729–23791
3. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE
computer society conference on computer vision and pattern recognition (CVPR’05), June, vol
1. pp 886–893
4. Lienhart R, Maydt J (2002) An extended set of haar-like features for rapid object detection. In:
International conference on image processing, September, vol 1. pp I–I
5. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision
60:91–110
6. Cortes C, Vapnik V (1995) Support-vector networks. In: Machine learning, vol 20. Published
in Kluwer Academic Publishers, pp 273–297
7. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In:
Proceedings of the 2001 IEEE computer society conference on computer vision and pattern
recognition. CVPR 2001, December, vol 1. pp I–I
8. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object
detection and semantic segmentation. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 580–587
9. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks
for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
10. Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on
computer vision, pp 1440–1448
11. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with
region proposal networks. In: Advances in neural information processing systems, vol 28
12. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE
international conference on computer vision, pp 2961–2969
13. Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional
networks. In: Advances in neural information processing systems, vol 29
14. Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) Overfeat: inte-
grated recognition, localization and detection using convolutional networks. In: [ICLR 2014]
International conference on learning representations
15. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object
detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 779–788
16. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
conference on computer vision and pattern recognition, pp 7263–7271
17. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.
02767
18. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Single shot multibox
detector. In: Computer vision–ECCV 2016: 14th European conference, October 11–14, 2016,
Proceedings, Part I 14, pp 21–37
19. Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) Dssd: deconvolutional single shot detector.
arXiv preprint arXiv:1701.06659
20. Li Z, Zhou F (2017) FSSD: feature fusion single shot multibox detector. arXiv preprint arXiv:
1712.00960
21. Shen Z, Liu Z, Li J, Jiang YG, Chen Y, Xue X (2017) Dsod: learning deeply supervised object
detectors from scratch. In: Proceedings of the IEEE international conference on computer
vision, pp 1919–1927
22. Menezes AG, de Moura G, Alves C, de Carvalho AC (2023) Continual object detection: a
review of definitions, strategies, and challenges. Neural Netw
180 A. Shaik and S. M. Basha

23. McCloskey M, Cohen NJ (1989) Catastrophic interference in connectionist networks: the
sequential learning problem. In: Psychology of learning and motivation, vol 24. Academic
Press, pp 109–165
24. Ratcliff R (1990) Connectionist models of recognition memory: constraints imposed by
learning and forgetting functions. Psychol Rev 97(2):285
25. McClelland JL, McNaughton BL, O’Reilly RC (1995) Why there are complementary learning
systems in the hippocampus and neocortex: insights from the successes and failures of
connectionist models of learning and memory. Psychol Rev 102(3):419
26. Goodfellow IJ, Mirza M, Xiao D, Courville A, Bengio Y (2013) An empirical investigation of
catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211
27. Kudithipudi D, Aguilar-Simon M, Babb J, Bazhenov M, Blackiston D, Bongard J, Brna AP,
Chakravarthi Raja S, Cheney N, Clune J, Daram A (2022) Biological underpinnings for lifelong
learning machines. Nature Mach Intell 4(3):196–210
28. Shin H, Lee JK, Kim J, Kim J (2017) Continual learning with deep generative replay. In:
Advances in neural information processing systems, vol 30
29. Hao Y, Fu Y, Jiang YG (2019) Take goods from shelves: a dataset for class-incremental object
detection. In: Proceedings of the 2019 on international conference on multimedia retrieval,
June, pp 271–278
30. Shieh JL, Haq QMU, Haq MA, Karam S, Chondro P, Gao DQ, Ruan SJ (2020) Continual
learning strategy in one-stage object detection framework based on experience replay for
autonomous driving vehicle. Sensors 20(23):6777
31. Acharya M, Hayes TL, Kanan C (2020) Rodeo: replay for online object detection. arXiv preprint
arXiv:2008.06439
32. Yang D, Zhou Y, Hong X, Zhang A, Wang W, Yang D (2023) One-shot replay: boosting
incremental object detection via retrospecting one object. In: AAAI
33. Kim CD, Jeong J, Moon S, Kim G (2021) Continual learning on noisy data streams via self-
purified replay. In: Proceedings of the IEEE/CVF international conference on computer vision,
pp 537–547
34. Mirza MJ, Manasa M, Possegger H, Bischof H (2022) An efficient domain-incremental learning
approach to drive in all weather conditions. In: Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition, pp 3001–3011
35. Chen X, Jiang J, Li Z, Qi H, Li Q, Liu J, Zheng L, Liu M, Deng Y (2023) An online continual
object detector on VHR remote sensing images with class imbalance. Eng Appl Artif Intell
117:105549
36. Kalb T, Mauthe B, Beyerer J (2022) Improving replay-based continual semantic segmenta-
tion with smart data selection. In: 2022 IEEE 25th international conference on intelligent
transportation systems (ITSC), October, pp 1114–1121
Investigation of Statistical and Machine
Learning Models for COVID-19
Prediction

Joydeep Saggu and Ankita Bansal

Abstract The development of technology has a significant impact on every aspect
of life, whether it is in the medical industry or any other profession. By making
decisions based on the analysis and processing of data, artificial intelligence has
demonstrated promising outcomes in the field of health care. The most crucial action
is early detection of a life-threatening illness to stop its development and spread.
There is a need for a technology that can be utilized to detect the virus because of
how quickly it spreads. With the increased use of technology, we now have access to
a wealth of COVID-19-related information that may be used to learn crucial details
about the virus. In this study, we evaluated and compared various machine learning
models with the traditional statistical model. The results of the study concluded the
superiority of machine learning models over the statistical model. The models have
depicted the percentage improvement of 0.024%, 0.103%, 0.115%, and 0.034% in
accuracy, MSE, R2 score, and ROC score, respectively.

Keywords Machine learning · Computational intelligence · COVID-19 ·
Statistical algorithm · K-nearest neighbors · Logistic Regression · Decision Tree ·
Random Forest · XGBoost · Support Vector Machine

1 Introduction

The novel coronavirus first surfaced in Wuhan, China, in December 2019 [1],
and on December 31, 2019, it was reported to the World Health Organization.
On February 11, 2020, the WHO officially designated the virus COVID-19,
recognizing it as a threat to the entire world. Clinical studies demonstrate the
existence of asymptomatic carriers in the community as well as the age groups
most afflicted [2].

J. Saggu · A. Bansal (B)


Netaji Subhas University of Technology, Dwarka, India
e-mail: ankita.bansal06@gmail.com
J. Saggu
e-mail: joydeep.it19@nsut.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 181
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_14

A person who is infected develops symptoms in 2–14 days. The World Health
Organization lists fever, dry cough, and fatigue as indicators of mild-to-moderate
disease, whereas dyspnea may occur in severe cases. Person-to-person transmission
of the virus occurs mostly through direct contact and respiratory droplets [3].
According to the WHO, the incubation period of this virus ranges from 2 to 10 days
in most instances. People with major illnesses such as diabetes, asthma, and heart
disease are more likely to contract the virus and develop severe symptoms. The fast
spread of the virus, which has killed hundreds of thousands of people, has
necessitated the development of a technology that may be used to detect the
infection. However, infections can be reduced to some extent by practicing good
hygiene. Moreover, it has been observed that early detection of this disease can help
in the containment of the virus. Tools such as machine learning (ML) software,
datasets, and classification algorithms are crucial for creating the COVID-19
predictive model. Employing ML to detect COVID-19 has aided the monitoring and
prevention of infection and, in various circumstances, has detected COVID-19 more
efficiently than statistical models such as Linear and Logistic Regression, thereby
reducing dependence on hospitals, where RT-PCR tests were the standard method
for concluding whether an individual has COVID-19 [4, 5].
This project intends to compare the accuracies of various ML algorithms, namely
K-nearest neighbors, Decision Tree, Random Forest, XGBoost, and Support Vector
Machine (SVM), against the statistical model of Logistic Regression, and then
utilize the best of them to forecast whether or not an individual has COVID based
on the data presented to the model. The contributions of the work can be summarized
as: (i) evaluation of ML models for COVID-19 prediction, and (ii) comparison of
ML models with statistical models for COVID-19 prediction.
Following this section, Sect. 2 describes the work in literature. The methodology of
work is discussed in Sect. 3. Brief description about each model is given in Sect. 4.
The results are discussed in Sect. 5, followed by conclusion in Sect. 6.

2 Related Work

Millions of lives could be saved by a reliable and thorough diagnosis of COVID-19,
which would also provide a wealth of data for training ML models. In this
context, ML may offer useful inputs, particularly for formulating diagnoses based
on clinical literature, radiographic pictures, etc. According to studies in [6], a Support
Vector Machine (SVM) algorithm can successfully distinguish COVID-19 patients
from other patients in 85% of cases. In the study, COVID-19 test results from the
Israeli government database were analyzed. The data collected were in the duration
of March 2020–November 2021. During the first several weeks of the outbreak, it
served as one of the primary COVID testing facilities in the nation. A task committee
created to address the COVID-19 situation carried out this study. It evaluated the
efficacy of a few ML techniques (neural networks, gradient-boosted trees, Random

Forests, Logistic Regression, and SVM) for predicting COVID positivity. The study
in [7] used a number of classifiers, including Logistic Regression, multilayer
perceptron (MLP), and XGBoost; over 91% of COVID-19 patients were correctly
categorized. An ML algorithm was created and tested for COVID-19 diagnosis in
[8]. The algorithm was based on laboratory features and demographics. The authors
tested a few ML models before aggregating them to perform the final categorization.
The created method exhibited a 0.093 sensitivity and a 0.64 specificity. The
functions in [9] predict COVID-19 with accuracies of 91% and 89%, respectively.
Additionally, in 98% of cases, the requirement for an ICU or semi-ICU was
predicted [10]. Since there is not a lot of research on text-based diagnosis and
prediction, and most of the analysis is done on image recognition for COVID-19,
we employed ML models to categorize clinical reports as either COVID-positive
or COVID-negative.

3 Methodology

3.1 Data Collection

As the WHO declared the coronavirus pandemic a public health emergency,
hospitals and researchers have made data on the epidemic available to the public.
We procured a dataset from Kaggle.com; it has 5,861,480 rows and ten columns.
This dataset contains binary features/variables that could be determinants in the
prediction of COVID-19, as well as one class attribute that defines whether
COVID-19 is found. Table 1 gives a concise description of the columns of the
dataset used in our analysis.

3.2 Data Preprocessing

Data preprocessing is the process of converting raw data into an understandable
format. Data from the real world might contain noise, missing values, or an
incompatible format, rendering ML models unable to use it directly. Data
preparation is an important stage in which we clean the data and make it compatible
with, or suitable for use in, an ML model. The key phases in data preparation are
as follows:
Removing Features: Since test_date and test_indication = ‘Other’ would have no
significance in predicting the target variable, i.e., corona_result, we remove both
features.
Removed features = test_date, test_indication_other
We also performed a Chi-Square test on the dataset since our dataset is completely
categorical, to see if we can further remove any unimportant variables that may
not contribute to the detection of our target variable, i.e., corona_result. However,
completion of the test confirmed that there were no such variables.
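The two steps above, dropping the irrelevant columns and screening the remaining categorical features with a Chi-Square test of independence, can be sketched with pandas and scipy. The DataFrame here is a small synthetic stand-in for the Kaggle data; only the column names follow Table 1.

```python
# Sketch of the two preprocessing steps above, on a small synthetic
# stand-in for the Kaggle data (column names follow Table 1).
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.DataFrame({
    "cough":         [1, 0, 1, 1, 0, 0, 1, 0],
    "fever":         [1, 0, 1, 0, 0, 1, 1, 0],
    "test_date":     ["2020-04-01"] * 8,       # judged irrelevant; dropped
    "corona_result": [1, 0, 1, 0, 0, 1, 1, 0],
})

# Step 1: remove features with no bearing on the target.
df = df.drop(columns=["test_date"])

# Step 2: Chi-Square test of independence between each remaining
# categorical feature and corona_result; a large p-value would flag
# the feature as a candidate for removal.
for col in ["cough", "fever"]:
    table = pd.crosstab(df[col], df["corona_result"])
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"{col}: chi2={chi2:.3f}, p={p:.3f}")
```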

Table 1 Description of parameters considered in our dataset


Parameter Description
Cough This column describes whether a patient has cough or not
Fever This column describes whether a patient has fever or not
Sore_throat This column describes whether a patient has sore throat or not
Shortness_of_ This column describes whether a patient suffers from shortness of breath or
breath not
Head_ache This column describes whether a patient suffers from head ache or not
Corona_result This column describes whether a patient is COVID positive or negative
Age_60_and_ This column describes whether a patient’s age is above or below 60 year
above
Gender This column describes patient’s gender
Test_indication This column tells about the test_indication and is further classified into
‘Other’, ‘Abroad’, and ‘Contact with Confirmed’
Binary variables = sore_throat, cough, shortness_of_breath, fever, head_ache, corona_result, age_
60_and_above, gender. Non-binary variable = test_indication

Undersampling the data: We performed data undersampling after removing
the features, as it was noticed that the data were abundant in negative COVID
cases. We used RandomUnderSampler() from the imblearn.under_sampling library,
setting our sampling strategy as 0.6. Before and after undersampling of the dataset
is graphically demonstrated in Fig. 1.
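The undersampling step relies on imblearn's RandomUnderSampler; the equivalent operation, keeping all minority (positive) rows and randomly keeping only enough majority rows that minority/majority = 0.6, can be sketched with pandas alone. The toy DataFrame below is illustrative, not the real dataset.

```python
# Equivalent of RandomUnderSampler(sampling_strategy=0.6) in plain pandas:
# keep all minority rows and sample the majority class down until
# n_minority / n_majority equals the requested ratio.
import pandas as pd

def undersample(df, target="corona_result", ratio=0.6, seed=42):
    counts = df[target].value_counts()
    minority_label = counts.idxmin()
    majority_label = counts.idxmax()
    n_majority = int(counts[minority_label] / ratio)
    majority = df[df[target] == majority_label].sample(n=n_majority,
                                                       random_state=seed)
    minority = df[df[target] == minority_label]
    return pd.concat([minority, majority]).sample(frac=1, random_state=seed)

# Toy example: 100 negatives, 30 positives -> 30 positives, 50 negatives.
toy = pd.DataFrame({"corona_result": [0] * 100 + [1] * 30})
balanced = undersample(toy)
print(balanced["corona_result"].value_counts())
```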
Splitting the dataset: The next step in preprocessing is splitting the dataset. The
training and testing datasets for an ML model should be separate. We split the data
70:30, meaning that we preserve 30% of the data for testing and use the remaining
70% to train the model.
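The 70:30 split can be performed with scikit-learn's train_test_split; a minimal sketch with a synthetic stand-in for the preprocessed feature matrix and target column:

```python
# 70:30 train/test split as described above. X and y are synthetic
# stand-ins for the preprocessed features and corona_result.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randint(0, 2, size=(1000, 8))   # 8 binary features (toy)
y = np.random.randint(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
print(len(X_train), len(X_test))
```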

Fig. 1 Graph of corona_result versus count before (left) and after (right) undersampling the data

3.3 Performance Metrics

The following parameters are considered in order to draw a comparison between the
performances of the Logistic Regression model with all the other ML models under
consideration:
Accuracy: It evaluates a model’s percentage of correct predictions.

Accuracy = (TN + TP)/(TN + TP + FN + FP), (1)

where

TP true-positive predictions.
TN true-negative predictions.
FP false-positive predictions.
FN false-negative predictions.

Mean Squared Error: It helps to determine the average squared difference
between predicted and actual values.

MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)², (2)

where n is the number of data points, Y_i is the actual target value, and Ŷ_i is the
model’s predicted value.
R2 Score: It is the proportion of the variance in the dependent variable that can be
anticipated from the independent variables; it ranges from 0 to 1, and the larger the
value, the stronger the predictive power.

R2 Score = 1 − (SSR/SST), (3)

where
SSR (Sum of Squared Residuals) = variation in the dependent variable that the
model cannot explain.
SST (Total Sum of Squares) = total variation in the dependent variable.
ROC Score: It evaluates the effectiveness of a classification model by comparing
the true-positive rate to the false-positive rate at various threshold settings, with
greater scores suggesting stronger discrimination ability. The ROC score is typically
calculated as the Area Under the ROC Curve (AUC).
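All four metrics are available in scikit-learn; a toy example on hypothetical predictions (the equation numbers refer to the definitions above):

```python
# Computing the four performance metrics on hypothetical predictions.
import numpy as np
from sklearn.metrics import (accuracy_score, mean_squared_error,
                             r2_score, roc_auc_score)

y_true  = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred  = np.array([1, 0, 1, 0, 0, 1, 0, 0])                   # hard labels
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.3, 0.2])  # probabilities

print("Accuracy:", accuracy_score(y_true, y_pred))    # Eq. (1)
print("MSE:",      mean_squared_error(y_true, y_pred))  # Eq. (2)
print("R2:",       r2_score(y_true, y_pred))            # Eq. (3)
print("ROC AUC:",  roc_auc_score(y_true, y_score))      # area under ROC
```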

4 Algorithms Used

The major goal for us is to assess the effectiveness of the Logistic Regression statis-
tical model against the Supervised ML algorithms, which include KNN, Decision
Tree, Random Forest, XGBoost, and SVM.

4.1 Statistical Model

The Logistic Regression statistical technique is used to investigate the relationship
between a binary dependent variable and one or more independent variables [11].
It is often used to forecast the likelihood of an event occurring based on historical
data. Logistic Regression generates a probability value between 0 and 1 that may be
used to categorize fresh data items. The model employs a logistic function to convert
the input variables to the output probability.
Hyperparameters: C = 1; penalty = ‘l1’; solver = ‘saga’.
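A minimal sketch of this baseline with the hyperparameters quoted above, fitted on synthetic binary features rather than the real dataset (the L1 penalty requires a solver that supports it, hence 'saga'):

```python
# Logistic Regression baseline with the stated hyperparameters,
# fitted on synthetic binary features (not the real Kaggle data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 8)).astype(float)
y = X[:, 0].astype(int).copy()          # target driven by the first feature
flip = rng.random(400) < 0.05           # plus 5% label noise
y[flip] = 1 - y[flip]

model = LogisticRegression(C=1, penalty="l1", solver="saga", max_iter=5000)
model.fit(X, y)
proba = model.predict_proba(X)[:, 1]    # probabilities between 0 and 1
print("train accuracy:", model.score(X, y))
```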

4.2 Machine Learning Algorithms

Predictive modeling is evolving with the development of computer technology and
can now be done more effectively and economically than in the past. In order to
identify the best configuration for each classification method, we employ
GridSearchCV in our project. In Table 2, we list the algorithms along with the tuned
hyperparameters for the best performance:

5 Result Analysis

From Table 3, we can see that the ML algorithms outperform the statistical Logistic
Regression method on the various performance metrics. The ML algorithms also
have lower MSE values than Logistic Regression, which indicates less error. This
is shown graphically in Fig. 2, where the average of the ML algorithms performs
better than statistical Logistic Regression. Table 4 shows the difference between the
metrics of the average of the ML models and the statistical Logistic Regression
model. Among the ML models (Table 3), SVM and XGBoost have the highest
accuracy of 97.75% each, but the former has an MSE of 2.252 and the latter 2.254.
Hence, SVM is the best-performing ML algorithm. Therefore, the average
performance of the ML algorithms is better than Logistic Regression, and among
the ML models, SVM performs best for predicting the target variable in the dataset,
i.e., corona_result.
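As a quick arithmetic check, three of the four averaged improvements can be recomputed directly from the Table 3 values; MSE is omitted here because the averaging convention behind the reported 0.103% figure is not stated.

```python
# Recomputing the averaged percentage improvements from Table 3.
# Values are copied verbatim from the table; MSE is left out (the
# convention behind the reported 0.103% is not stated in the paper).
lr = {"acc": 97.72, "r2": 90.26, "roc": 98.151}        # Logistic Regression
ml = {                                                  # KNN, DT, RF, XGB, SVM
    "acc": [97.74, 97.74, 97.74, 97.75, 97.75],
    "r2":  [90.35, 90.36, 90.36, 90.37, 90.38],
    "roc": [98.179, 98.181, 98.189, 98.182, 98.193],
}

improvements = {}
for k in ("acc", "r2", "roc"):
    avg = sum(ml[k]) / len(ml[k])
    improvements[k] = (avg - lr[k]) / lr[k] * 100
    print(f"{k}: {improvements[k]:.3f}%")
```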

Table 2 Different machine learning models and the hyperparameters used


Algorithm Description Hyperparameters (if any)
KNN Classifies new data points in the training set based on k=5
the majority vote of their k-nearest neighbors
Decision Generates a flowchart-like tree structure in order to
Tree make judgments by recursively partitioning the data
depending on the decision criteria [12]
Random Based on ensemble learning, dataset is divided into n_estimators = 200,
Forest subsets and applied on different Decision Trees for max_depth = 8
implementing strong-learners [13, 14]
XGBoost Combines numerous weak predictive models stage by eta = 0.2, gamma = 0.5,
stage to generate a strong predictive model with the max_depth = 5, n_
goal of minimizing total prediction error estimators = 200
SVM Divides data points into classes by locating an ideal hyperplane with the
greatest margin between them [15, 16]
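The GridSearchCV tuning described in Sect. 4.2 can be sketched as follows, shown for Random Forest with a grid built around the Table 2 values; the data here are synthetic, so the selected parameters are illustrative only.

```python
# GridSearchCV sketch for Random Forest; the grid values follow Table 2.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 8)).astype(float)
y = (X[:, 0] == X[:, 1]).astype(int)      # a learnable toy target

grid = {"n_estimators": [100, 200], "max_depth": [4, 8]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                      cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```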

Table 3 Comparing performance of Logistic Regression with other machine learning models
Model Algorithm Accuracy MSE R2 score ROC score
Statistical Logistic Regression 97.72 2.279 90.26 98.151
Machine learning KNN 97.74 2.259 90.35 98.179
Decision Tree 97.74 2.258 90.36 98.181
Random Forest 97.74 2.255 90.36 98.189
XGBoost 97.75 2.254 90.37 98.182
SVM 97.75 2.252 90.38 98.193
Bold means Proposed Model results

6 Conclusion

With the rapid rise of COVID-19, it is essential to investigate various machine
learning models for accurate prediction of COVID-19. As machine learning has
shown tremendously positive outcomes in the healthcare field through the analysis
and processing of data, we incorporate these models into the detection
of COVID-19. We have used the Statistical Logistic Regression model and compared
it with the stronger ML models including KNN, Decision Tree, Random Forest,
Extreme Gradient Boost, and SVM. The ML models have outperformed the statistical
model. The models have depicted the percentage improvement of 0.024%, 0.103%,
0.115%, and 0.034% in accuracy, MSE, R2 score, and ROC score, respectively. There-
fore, the authors suggest the use of ML models to predict and diagnose COVID-19.
As future work, the authors plan to investigate more ML models including ensemble
learners on a number of open-source COVID-19 datasets. This would lead to more
generalizable results and hence more verified and stable conclusions.

Fig. 2 Graphs comparing the different evaluation metrics between Logistic Regression and an
average of all the machine learning algorithms

Table 4 Percentage improvement of ML algorithms over statistical model

Accuracy Mean squared error R2 score ROC score
0.024% 0.103% 0.115% 0.034%

References

1. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei, YY, Yuan
ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ (2020) A
new coronavirus associated with human respiratory disease in China. Nature 579(7798):265–269
2. Gautret P, Lagier JC, Parola P, Meddeb L, Mailhe M, Doudier B, Courjon J, Giordanengo
V, Vieira VE, Dupont HT (2020) Hydroxychloroquine and azithromycin as a treatment of
Covid-19. Int J Antimicrob Agents 56(1):105949
3. Lai CC, Shih TP, Ko WC, Tang HJ, Hsueh PR (2020) Severe acute respiratory syndrome
coronavirus 2 (sars-cov-2) and coronavirus disease-2019 (Covid-19). Int J Antimicrob Agents
55(3):105924

4. Garcia S, Luengo J, Sáez JA, Lopez V, Herrera F (2012) A survey of discretization techniques.
IEEE Trans Knowl Data Eng 25(4):734–750
5. Muhammad I, Yan Z (2015) Supervised machine learning approaches. ICTACT J Soft Comput
5(3)
6. Medscape Medical News (2020) The WHO declares public health emergency for novel
coronavirus
7. Batista AFM, Miraglia JL, Donato THR, Filho ADPC (2020) COVID-19 diagnosis prediction
in emergency care patients. medRxiv
8. Mondal MRH, Bharati S, Podder P, Podder P (2020) Data analytics for novel coronavirus
disease. Inform Med Unlocked 20:100374
9. Schwab P, Schutte AD, Dietz B, Bauer S (2020) Clinical predictive models for COVID-19:
systematic study. J Med Internet Res 22(10):e21439
10. Goodman-Meza D, Rudas A, Chiang JN, Adamson PC, Ebinger J (2020) A machine learning
algorithm to increase COVID-19 inpatient diagnostic capacity. PLoS ONE 15(9):e0239474
11. Connelly L (2020) Logistic regression. Medsurg Nurs 29(5):353–354
12. Patel BR, Rana KK (2014) A survey on decision tree algorithm for classification. Int J Eng
Dev Res IJEDR 2(1)
13. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
14. Hastie T, Tibshirani R, Friedman J (2009) Random forests. In: The elements of statistical
learning. Springer, pp 587–604
15. Wang H, Xiong J, Yao Z, Lin M, Ren J (2017) Research survey on support vector machine. In:
Proceedings of the 10th EAI International conference on mobile multimedia communications,
pp 95–103
16. Rahman MM, Islam MD, Manik MD, Hossen M, Al-Rakhami MS (2021) Machine learning
approaches for tackling novel coronavirus (Covid-19) pandemic. SN Comput Sci 2(5):1–10
17. Sun Y, Koh V, Marimuthu K, Ng OT, Young B, Vasoo S, Chan M (2020) Epidemiological and
clinical predictors of COVID-19. Clin Infect Dis 71(15):786–792
SONAR-Based Sound Waves’ Utilization
for Rocks’ and Mines’ Detection Using
Logistic Regression

Adrija Mitra, Adrita Chakraborty, Supratik Dutta, Yash Anand,
Sushruta Mishra, and Anil Kumar

Abstract SONAR, short for sound navigation and ranging, uses sound waves
to identify objects underwater. It is commonly utilized for two tasks: rock
detection and mine detection. Rock detection entails utilizing SONAR to identify
the presence of rocks or other underwater impediments that might endanger boats
or ships. This is often accomplished by analyzing the sound waves that bounce
back from the bottom and use machine learning algorithms to find patterns that
signal the presence of rocks. Mine detection, on the other hand, is a more difficult
process that entails recognizing and finding underwater explosive devices. This is
often accomplished by combining SONAR with additional sensing technologies,
such as magnetic or acoustic sensors. Machine learning techniques are then used
to analyze the data and detect patterns that indicate the existence of mines. Based
on the input features, logistic regression can predict one of two outcomes and is
frequently used for binary classification. It is capable of classifying SONAR data
as rock or mine. To train the logistic regression model, a dataset of rock and mine
examples are gathered and preprocessed to extract key characteristics which further
gets normalized. The model should then be trained to learn a decision boundary that
divides the two classes. The trained algorithm can predict whether new SONAR data
will be classified as rocks or mines. Depending on the properties of the dataset and
the task at hand, other machine learning algorithms, such as support vector machines
or neural networks, may be more effective. Recorded training and testing accuracies
using logistic regression were 96.2 and 91.5%, respectively.

Keywords Sound waves · Rocks · Mines · Machine learning · Logistic regression

A. Mitra · A. Chakraborty · S. Dutta (B) · Y. Anand · S. Mishra


Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India
e-mail: dsupratik1@gmail.com
S. Mishra
e-mail: sushruta.mishrafcs@kiit.ac.in
A. Kumar
DIT University, Dehradun, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 191
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_15

1 Introduction

The use of machine learning to distinguish rocks from mines in SONAR data is
covered in this article. SONAR is an acoustic technique used to find and gauge the
size and direction of underwater objects. The SONAR device detects and analyzes
sound waves produced or reflected by the object. There are three different categories of SONAR
systems. A target object reflects back a sound wave that is emitted by an acoustic
projector in an active SONAR system. A receiver picks up the reflected signal and
examines it in order to determine the target’s range, heading, and relative velocity.
In essence, passive systems are receivers that pick up noise that the target (such as a
ship, submarine, or torpedo) emits. By using this method of observation, waveforms
can be inspected to identify features as well as direction and distance. An acoustic
communication system, which requires a projector and receiver at both ends of the
acoustic line, is the third type of SONAR equipment. The extraction of rich minerals
from the crust of the globe, including its oceans, is known as mining. A mineral is
an inorganic substance that is found in nature and has distinct chemical properties,
physical features, or molecular structure, with a few significant exceptions. When
evaluating mineral reserves, profit must be taken into account. The ore reserve only
relates to the quantities that can be profitably extracted, whereas the mineral inven-
tory refers to the entire number of minerals in a given deposit. Figure 1 shows the
SONAR usage to distinguish between rocks and mines. SONAR noises and targets
can be recognized using machine learning and deep learning algorithms. Machine
learning enables the analysis of SONAR waves and target detection. It is a branch of
artificial intelligence that enables machines to improve their performance using data.
Receiving data as input, recognizing characteristics, and predicting fresh patterns are
the three stages of machine learning. Principal component analysis, logistic regres-
sion, support vector machines, k-nearest neighbors (KNN), C-means clustering, and
other ML approaches are commonly used in this subject.
Logistic regression is a statistical model in which a binary dependent variable
is modelled using a logistic function based on one or more independent variables,
often called features or predictors. The goal of logistic regression is to find the best-fit
parameters for the logistic function that can accurately predict the probability of the
binary outcome given the input features. In the case of SONAR, logistic regression
could be used to classify the data as either indicating the presence of a rock or a mine.
The input features would be derived from the SONAR data, such as the frequency
and amplitude of the sound waves. The logistic regression model would then learn a
decision boundary that separates the two classes based on the input features. To make
predictions, the logistic regression model calculates the probability of the binary
outcome (rock or mine) given the input features. If the probability is greater than
a certain threshold, typically 0.5, the model predicts the positive outcome (mine),
and if it is less than the threshold, the model predicts the negative outcome (rock).
This threshold can be adjusted to prioritize either precision or recall, depending on
the application’s needs. Overall, logistic regression is a powerful tool for binary

Fig. 1 Representing SONAR usage by submarines to detect the difference between
rocks and mines

classification tasks and has proven to be effective in various applications, including
SONAR rock and mine detection.
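The decision rule described above can be sketched in a few lines: a logistic (sigmoid) function maps a weighted sum of SONAR-derived features to a probability, and a 0.5 threshold turns that probability into a "mine" or "rock" label. The weights below are illustrative, not learned from real SONAR data.

```python
# Logistic-regression decision rule: sigmoid of a weighted feature sum,
# thresholded at 0.5. Weights and features here are hypothetical.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.5, 0.8])      # hypothetical learned weights
b = -0.3                            # hypothetical intercept

def predict(x, threshold=0.5):
    p = sigmoid(np.dot(w, x) + b)   # probability of "mine"
    return ("mine" if p > threshold else "rock"), p

label, p = predict(np.array([0.9, 0.1, 0.5]))
print(label, round(p, 3))
```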

2 Literature Review

In [1], the classification of underwater SONAR targets into rocks and mines is
discussed using Meta-Cognitive Neural Network (MCNN) and Extreme Learning
Machine (ELM) classifiers. It is done to achieve
an acceptable efficiency in Classification of SONAR targets using advanced neural
networks. In [2], researchers have tested a range of methods for linking and excluding
noisy data, which is usually referred to as arbitrary chaos in the training dataset. In
essence, these styles identify data exemplifications that confuse the training model
and lessen the delicateness of brackets. They typically look for data abnormalities and
analyze how they affect delicate categorization. In [3], numerous machine learning
methods are analyzed, and various approaches to the detection of network intrusions
are suggested. Firewalls, antivirus programmes, and other network intrusion detec-
tion systems are some of the various systems that make up the network security
system. The primary goal of an intrusion detection system is to identify unautho-
rized system activity like copying and modification. In [4], using a big, intricate, and
highly spatial SONAR dataset, this work was a basic case study that established a
machine learning technique for the classification of rocks and minerals. In [5],
Online Multiple Kernel Learning (OMKL), a technique created by Ravi et al. that
combines neural networks and online learning, aims to build a kernel-based
prediction function from a series of specified kernels. Here, SVM and NN algorithms were

used to separate the SONAR data. In [6], ocean mines are the primary threat to the
safety of large ships and other marine life. An ocean mine is a self-contained
explosive device used to destroy submarines or ships. Due to several factors, such as variations in
operating and target shapes, it is difficult to identify and classify SONAR pictures
with relation to underwater objects. In [7], if an object is within the sound pulse’s
range, the pulse will reflect off the target and return an echo toward the SONAR
transmitter. The time delay between the emission of the pulse and the reception of
its echo is measured by the transmitter. In [8], in recent years, the DL
area has rapidly grown and been successfully applied to a wide range of conventional
applications. More importantly, DL has outperformed well-known ML techniques in
a number of sectors, including cybersecurity, natural language processing, bioinfor-
matics, robotics and control, and the study of medical data. In [9], choosing a subset
of features for a learning and statistics system’s model construction is referred to
as feature selection. Local search algorithms can assist in reducing the number of
attributes by using sequential search methods. Artificial neural networks are a well-
known artificial intelligence technology that can depict and capture complex rela-
tionships between data input and output. In [10], underwater mines are a vital military
tactic for protecting any country’s maritime borders. They consist of an explosive
charge, a sensing mechanism, and a totally autonomous device. Mines from earlier
generations have to come into contact with the ship directly to detonate. In contrast,
newly built mines are equipped with advanced sensors that often recognize different
combinations of magnetic and acoustic signals.

3 Proposed Work

The proposed model is illustrated in Fig. 2. SONAR, which stands for sound navigation and ranging, is useful for exploring and charting the ocean, since sound waves travel farther in water than radar and light waves do. NOAA scientists primarily employ SONAR to make nautical charts, identify underwater navigational hazards, and locate and map objects on the seafloor, such as shipwrecks [11–13]. SONAR uses sound waves to provide vision in the water. In this study, we use SONAR to transmit signals and receive their reflections from rocks and metal cylinders; this allows us to determine whether or not a mine is present. We adapted the machine learning model to accommodate these SONAR data. Because the raw SONAR data cannot be used directly for modelling, they must first go through data preprocessing, which improves a dataset's accuracy and dependability by removing missing or inconsistent data values that are the result of either human or computer error. The data become consistent as a result. Multi-processing is the preprocessing method used for this project, since it allows two or more processors to operate simultaneously on the same dataset [14]. The same mechanism then stores this dataset. In a single computer system, the data are divided into frames, and each frame is processed in parallel by two or more CPUs.

SONAR-Based Sound Waves' Utilization for Rocks' and Mines' … 195

Fig. 2 Diagrammatic representation of the proposed model

After data processing, we must split this dataset into training and testing subsets. This phase is necessary because, after processing, the dataset may contain a large amount of data, most of which will be used for training and the remainder for testing.
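The frame-parallel preprocessing described above can be sketched with Python's standard library; the normalization function, frame size, and sample values below are illustrative assumptions, not the paper's actual pipeline.

```python
from concurrent.futures import ThreadPoolExecutor

def normalize_frame(frame):
    """Scale one frame of echo intensities into the range [0, 1]."""
    lo, hi = min(frame), max(frame)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in frame]

def preprocess(signal, frame_size=4, workers=2):
    """Split the signal into frames and normalize the frames in parallel."""
    frames = [signal[i:i + frame_size] for i in range(0, len(signal), frame_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(normalize_frame, frames))

# Two frames with very different dynamic ranges end up on a common [0, 1] scale.
raw = [0.2, 0.4, 0.6, 0.8, 10.0, 20.0, 30.0, 40.0]
print(preprocess(raw))
```

A process pool (or the multi-processor setup the paper describes) would follow the same split-map-collect pattern; a thread pool keeps the sketch self-contained.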
Given that this example involves a binary case model—that is, either we detect a rock
or a mine—and since the logistic regression model works best in binary situations,
we choose to employ this model. Logistic regression is one of the machine learning
algorithms that is most frequently employed in the Supervised Learning category. It
is used to forecast the categorical dependent variable using a specified set of inde-
pendent variables. The dataset that is currently available must be used to train this
data model. This well-trained logistic regression model will assist in identifying how
a mine's features differ from those of a rock. The final prediction is one of two labels: R for rock and M for mine [15].
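The pipeline described above (split, fit a logistic regression, predict R or M) can be sketched as follows. This is a minimal from-scratch sketch on a tiny synthetic dataset; the helper names, feature ranges, and labels are illustrative assumptions, not the actual 60-feature SONAR data.

```python
import math, random

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by stochastic gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid probability of class 1 (mine)
            err = p - yi                     # gradient of the log-loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Map the model probability to the paper's two labels."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return "M" if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else "R"

# Illustrative 2-feature echoes: mines (label 1) return stronger energy than rocks (0).
random.seed(0)
rocks = [[random.uniform(0.0, 0.3), random.uniform(0.0, 0.3)] for _ in range(20)]
mines = [[random.uniform(0.6, 1.0), random.uniform(0.6, 1.0)] for _ in range(20)]
X, y = rocks + mines, [0] * 20 + [1] * 20
# Hold out the last sample of each class for testing.
w, b = train_logistic(X[:19] + X[20:39], y[:19] + y[20:39])
print(predict(w, b, rocks[19]), predict(w, b, mines[19]))
```

In practice a library implementation (e.g. scikit-learn's `LogisticRegression`) would replace the hand-written gradient loop, but the decision rule is the same.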

4 Implementation Analysis

As shown in Fig. 3, the trained model achieved an accuracy of 96.2%, whereas the tested model achieved an accuracy of 91.5%. The model is continually trained and tested, and the process is then repeated with more datasets to assess the model's correctness. Since the accuracy kept varying within the range mentioned above, we ultimately aggregated the results into the metrics below.
Figure 4 shows a heat map, which is a graphic representation of data that uses a system of colour coding to represent different values. Heat maps can be used for a
196 A. Mitra et al.

Fig. 3 Accuracies of trained and tested models using Logistic Regression Classifier

wide range of statistics, although they are most often used to show user behaviour on certain websites or web page themes. The heat map shows the correlation between the numerous variables representing the different features of the dataset [16]. The colours in the illustration indicate intensity: warmer colours correspond to higher values and cooler colours to lower values. There were 58 rows in total; thus, each value is displayed separately. A colour scale from − 0.4 to 1.0 that displays the strength of the correlations among the values in the dataset is also present: lower values on the scale correspond to less intense colours, and vice versa.
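The correlation values behind such a heat map can be computed directly. The sketch below uses three small illustrative feature columns rather than the real SONAR features; the helper names are assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def correlation_matrix(columns):
    """Symmetric matrix of pairwise correlations; this matrix feeds the heat map."""
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]

# Three illustrative feature columns.
f1 = [0.02, 0.04, 0.35, 0.41, 0.12]
f2 = [0.01, 0.05, 0.33, 0.44, 0.10]   # strongly correlated with f1
f3 = [0.90, 0.70, 0.20, 0.10, 0.60]   # inversely correlated with f1
m = correlation_matrix([f1, f2, f3])
print(round(m[0][1], 2), round(m[0][2], 2))  # near +1 and near -1
```

A plotting library such as seaborn would then colour-code this matrix to produce the actual figure.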
Figure 5 shows the box plot representation of a particular feature from the dataset. The figure clearly conveys how the values are concentrated within the 0.01–0.04 range. As we go deeper into the dataset, we find a large variance in the data, ranging from as low as 0.07 to as high as 0.14. The scatter plot representation of the first two features of the dataset is shown in Fig. 6. The large number of dots in the range 0–0.05 shows how densely populated the dataset is within this range. Moreover, it also shows how minute the differences between a rock and a mine are in this dataset.
Some significant practical benefits of the model include that it works well with linearly separable datasets and offers a good level of accuracy on many common datasets [17–19]. It makes no assumptions regarding the distributions of the classes in feature space, and it is substantially simpler to set up and train than many other machine learning and AI applications [20].

Fig. 4 Heat map representation of a feature of the proposed model

Fig. 5 Box plot representation of a feature of the proposed model

5 Conclusion and Future Scope

Our research on “Underwater mine and rock prediction by evaluation of machine learning algorithms” identifies rocks and mines on the ocean floor. Naval mines are an effective tool for halting ships and restricting naval operations, but they have serious detrimental impacts on the economy and ecology. The two established methods for

Fig. 6 Scatter plot representation of first two features from the proposed model

locating mines are SONAR waves and manual labour. Given the increased risk, using SONAR signals has proven to be the more efficient strategy. The collected data are stored in a CSV file. By using a variety of machine learning approaches, we may explore and understand the nature of the prediction system. We can confirm and evaluate the accuracy of algorithms through analysis, and we can use the results to create a system that works better. In addition to rocks, the ocean floor contains a number of undesirable elements that could affect the accuracy of our model's predictions, including plastic wastes, radioactive wastes, and various other kinds of mines. Such a crucial calculation should have an accuracy of about 85–90%. For our machine learning algorithm to accurately identify the kind of substance encountered, much more research and innovation are needed.
In future studies, the big data Hadoop architecture will be used to handle increasingly complicated data. This work was primarily concerned with the SONAR system's backend capabilities; frontend development calls for familiarity with the Flask or Django frameworks. Once that groundwork is in place, we can build the frontend and consider deployment.

References

1. Lepisto L, Kunttu I, Visa AJE (2005) Rock image classification using color features in Gabor
space. J Electron Imag 14(4). Article ID 040503
2. Fong S, Deb S, Wong R, Sun G (2014) Aquatic sonar signals recognition by incremental data
sluice mining with conflict analysis. Int J Distrib Sens Netw 10(5):635834
3. Ali SF, Rasool A (2020) SONAR data classification using multi-layer perceptrons. Int J 5(11)

4. Hossain MM, Paul RK (2019) Prediction of underwater surface target through SONAR: a case study of machine learning. Int J Inform Technol 11(1):51–57. https://doi.org/10.1007/978-981-15-0128-9_10
5. Siddhartha JB, Jaya T, Rajendran V (2018) RDNN for classification and prediction of rock/
mine in underwater acoustics. J Appl Sci Comput 5(1):1–5
6. Padmaja V, Rajendran V, Vijayalakshmi P (2016) Study on metal mine detection from underwater sonar images using data mining and machine learning techniques. Int J Adv Res Electr Electron Instrum Eng 5(7):6329–6336. https://doi.org/10.1007/s12652-020-01958-4
7. Khare A, Mani K (2020) Prediction of rock and mineral from sound navigation and ranging
waves using artificial intelligence techniques. Int J Comput Intell Res 16(4):625–635
8. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel
MA, Al-Amidie M, Farhan L (2021) Review of deep learning: concepts, CNN architectures,
challenges, applications, future directions. Neural Comput Appl 33(19):14173–14192
9. Abdul-Qader B (2016) Techniques for classification sonar: rocks vs. mines. J Comput Sci
Technol 16(3):75–80
10. https://ieeexplore.ieee.org/abstract/document/10011104
11. Hożyń S (2018) A review of underwater mine detection and classification in sonar imagery.
Arch Min Sci 63(1):149–164
12. Tripathy HK, Mishra S (2022) A succinct analytical study of the usability of encryption
methods in healthcare data security. In: Next generation healthcare informatics. Springer Nature
Singapore, Singapore, pp 105–120
13. Raghuwanshi S, Singh M, Rath S, Mishra S (2022) Prominent cancer risk detection using
ensemble learning. In: Cognitive informatics and soft computing: proceeding of CISC 2021.
Springer Nature Singapore, Singapore, pp 677–689
14. Mukherjee D, Raj I, Mishra S (2022) Song recommendation using mood detection with Xcep-
tion model. In: Cognitive informatics and soft computing: proceeding of CISC 2021. Springer
Nature Singapore, Singapore, pp 491–501
15. Sinha K, Miranda AO, Mishra S (2022) Real-time sign language translator. In: Cognitive infor-
matics and soft computing: proceeding of CISC 2021. Springer Nature Singapore, Singapore,
pp 477–489
16. Mishra Y, Mishra S, Mallick PK (2022) A regression approach towards climate forecasting
analysis in India. In: Cognitive informatics and soft computing: proceeding of CISC 2021.
Springer Nature Singapore, Singapore, pp 457–465
17. Patnaik M, Mishra S (2022) Indoor positioning system assisted big data analytics in smart
healthcare. Connected e-health: integrated IoT and cloud computing. Springer International
Publishing, Cham, pp 393–415
18. Periwal S, Swain T, Mishra S (2022) Integrated machine learning models for enhanced secu-
rity of healthcare data. In: Augmented intelligence in healthcare: a pragmatic and integrated
analysis. Springer Nature Singapore, Singapore, pp 355–369
19. De A, Mishra S (2022) Augmented intelligence in mental health care: sentiment analysis and
emotion detection with health care perspective. In: Augmented intelligence in healthcare: a
pragmatic and integrated analysis, pp 205–235
20. Dutta P, Mishra S (2022) A comprehensive review analysis of Alzheimer’s disorder using
machine learning approach. In: Augmented intelligence in healthcare: a pragmatic and
integrated analysis, pp 63–76
A Sampling-Based Logistic Regression
Model for Credit Card Fraud Estimation
Prapti Patra, Srijal Vedansh, Vishisht Ved, Anup Singh, Sushruta Mishra,
and Anil Kumar

Abstract One of the most frequent problems that we face today is credit card fraud, and the most definite reason behind it is the phenomenal increase in online transactions. We often encounter such fraud cases due to unauthorized money transactions in our everyday life. Hence, to detect such fraudulent activities, we can use a credit card fraud assessment model. In this paper, we propose our approach to detecting such frauds. Our study mainly addresses the application of predictive techniques to this domain. The algorithms that we have used are logistic regression, decision tree classifier, and random forest classifier. The derived results are evaluated using accuracy, precision, recall, and F1-score. We have used all three algorithms for both undersampling and oversampling cases. The logistic regression technique generates the optimum result, giving the best accuracy, precision, recall, and F1-score. Thus, it can be inferred to be the best alternative for the detection of credit card frauds.

Keywords Credit card · Fraud detection · Credit card fraud · Logistic regression ·
Decision tree

1 Introduction

Fraud detection in credit cards involves tracking the activity of card holders so as to estimate and prevent unauthorized transactions and objectionable behavior. In the present world, credit card fraud is on the rise, especially in the corporate and finance industries. Our population is highly dependent on the Internet today, and that is one of the main reasons for online fraudulent transactions, although offline transactions are subject to similar fraud as well. We have data mining techniques to detect these

P. Patra · S. Vedansh · V. Ved · A. Singh · S. Mishra (B)


Kalinga Institute of Industrial Technology, Deemed to Be University, Bhubaneswar, India
e-mail: sushruta.mishrafcs@kiit.ac.in
A. Kumar
Tula’s Institute, Dehradun, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 201
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_16
202 P. Patra et al.

Fig. 1 Graph depicting growth of Internet users over time

frauds, but the result is not very accurate. Hence, we need some promising methods
to minimize such credit card frauds. We can do that with the help of efficient machine
learning algorithms [1].
As shown in Fig. 1, with the growing number of Internet users, finance companies issue credit cards to more and more individuals. As far as card issuance is concerned, the amount spent is to be paid back by the card user, along with any extra charges agreed upon by both parties.
Predictive techniques are designed to assess all valid processing and flag the ambiguous transactions. Professionals investigate the documents and contact the cardholders to verify whether a transaction is legitimate or fraudulent [2].
We have used three algorithms.
Logistic Regression
It is a statistical method used in prediction-based classification domains, where the objective is to estimate which category an input belongs to. The logistic regression algorithm uses a logistic function to calculate the probability of the input belonging to each category.
Decision Tree Classifier
This is a supervised technique applied to classification-based problems, which utilizes a hierarchical framework to categorize samples. The tree is made up of nodes and branches, where a node denotes a test on a variable and a branch denotes an outcome of that test. At each node, a decision is made based on the value of a feature or attribute, and the decision leads to the next node in the tree until a classification decision is made at a leaf node.
A Sampling-Based Logistic Regression Model for Credit Card Fraud … 203

Fig. 2 Rough architecture diagram for fraud detection

Random Forest Classifier


It is a popular ensembling technique in machine learning. It is a supervised learning algorithm that constructs many decision trees and integrates their estimations to generate the final outcome. During the training phase, the algorithm builds a forest of decision trees by repeatedly selecting a random subset of the data and features and then growing a decision tree on that subset. The method then combines the predicted values of all trees in the forest to produce an overall estimate [3]. Figure 2 shows a rough architecture of the fraud detection system.
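The bootstrap-and-vote idea at the heart of a random forest can be illustrated with a minimal sketch. The stump-style "trees", the toy transactions, and all helper names below are hypothetical simplifications, not the paper's actual models.

```python
import random
from collections import Counter

def grow_stump(rows, labels, feature_count):
    """A tiny 'tree': random feature and threshold fitted on a bootstrap sample."""
    idx = [random.randrange(len(rows)) for _ in rows]      # bootstrap sample
    feat = random.randrange(feature_count)                 # random feature choice
    thr = sum(rows[i][feat] for i in idx) / len(idx)       # split at the sample mean
    above = [labels[i] for i in idx if rows[i][feat] > thr]
    below = [labels[i] for i in idx if rows[i][feat] <= thr]
    hi = Counter(above).most_common(1)[0][0] if above else 0
    lo = Counter(below).most_common(1)[0][0] if below else 0
    return lambda x: hi if x[feat] > thr else lo

def forest_predict(stumps, x):
    """Majority vote over all trees in the forest."""
    return Counter(s(x) for s in stumps).most_common(1)[0][0]

random.seed(1)
# Toy transactions: [amount, hour of day]; label 1 = fraud (large late-night amounts).
rows = [[20, 10], [35, 14], [15, 9], [900, 2], [700, 3], [850, 1]]
labels = [0, 0, 0, 1, 1, 1]
stumps = [grow_stump(rows, labels, 2) for _ in range(25)]
print(forest_predict(stumps, [25, 11]), forest_predict(stumps, [800, 2]))
```

A real implementation grows full depth-limited trees rather than single-split stumps, but the bootstrap sampling and majority vote work the same way.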
Main contributions of the paper are as follows:
. Our objective is to detect fraud credit card processing with predictive methods.
. This study makes use of three predictive techniques, namely logistic regression,
decision tree classifier, and random forest classifier.
. It was observed that logistic regression provided the best accuracy on undersam-
pling data with 95.78%, whereas on oversampling data, random forest classifier
provides the best accuracy with 99.99%.

2 Literature Review

Researchers have introduced several new techniques for credit card fraudulent anal-
ysis, involving computational intelligent techniques and cognitive units. Below listed
are some related works in this regard. In 2019, Jain et al. [4] have researched a few
fraud detection techniques like SVM, ANN, Bayesian networks, KNN, and fuzzy
logic system. The authors inferred that the KNN, trees, and vector-based algorithms
had an average accuracy rate, while the fuzzy logic system and regression methods had the least precision among all methods. On the other hand, neural networks, Naive Bayes, fuzzy systems, and KNN algorithms achieved a higher prediction accuracy, and multi-level regression, vector methods, and cluster trees gave a middling predictive performance. A few methods, including neural networks and Bayesian models, performed well on several metrics but were costly to train. A significant demerit of all these models was that they did not produce identical results in all types of environments: they provided good outcomes on one sample set and inferior outcomes on other data. For instance, the KNN and SVM algorithms performed well on small datasets, whereas logistic regression and fuzzy logic systems showed better efficiency on the original unprocessed dataset. In
2019, Naika et al. [5] performed an analysis of four algorithms, namely Naive Bayes, AdaBoost, logistic regression, and J48. Naive Bayes utilizes Bayes' theorem to calculate the probability of occurrence of an activity. Logistic regression is similar to linear regression but is typically used for classification tasks, whereas linear regression is commonly used for predicting or forecasting values. J48 is an algorithm used for creating a decision tree and solving classification problems. It is an extension of the ID3 algorithm and a popular learning method which operates with both categorical and continuous variables. AdaBoost is designed for binary classification and is primarily utilized for improving the performance of decision trees. It is often used in fraud detection, such as classifying transactions as fraudulent or non-fraudulent. The researchers found that AdaBoost and logistic regression have almost similar efficiency; however, the AdaBoost algorithm is more suitable for detecting credit card fraud due to its faster processing time. In 2019, research has
been done by the authors in [6], where they introduced two significant algorithmic techniques: the whale optimization algorithm (WOA) and the synthetic minority oversampling technique (SMOTE). The primary objective of these techniques is to enhance the convergence speed and resolve the data-skewing (class imbalance) concern. The SMOTE technique addresses class imbalance by generating synthetic transactions that are re-sampled to validate dataset effectiveness, and the WOA technique is then applied to optimize the synthesized transactions. This algorithmic approach improves the reliability, efficiency, and convergence speed of the system.
In 2018, authors in [7] investigated decision trees, random forest, SVM, and
logistic regression on a highly skewed dataset. They evaluated the performance based
on metrics such as accuracy, sensitivity, specificity, and precision. The results showed
that the accuracy for logistic regression was 97.7%, for decision trees was 95.5%, for
random forest was 98.6%, and for SVM classifier was 97.5%. The authors confirmed
that random forest outperformed others and had the highest accuracy among the other
algorithms for detecting fraud. They also found that the SVM algorithm had a data
skewing issue and did not produce good outcome for determining credit card fraud.
In a related domain, Yu and Wang in [8] proposed an outlier detection concept to
detect suspicious variables in a data. Their method considers fraudulent points as a
separate zone in vectored region, which can either appear independently or be part of
a small group of clustered data points. According to findings, the approach achieves
an accuracy of 89.4%, while the outlier limit is predefined as 12.

3 Proposed Model

Figure 3 is a flowchart showing the methodology of our fraud detection system. We collect the sample dataset from the customer transaction database and train the three models that we are using, namely logistic regression, decision tree classifier, and random forest classifier [9]. When a user performs a transaction, it is passed to the decision function of the fraud detection algorithm, and the output is compared and analyzed. If found legitimate, the transaction is approved; if found fraudulent, the respective bank is alerted for verification.

Fig. 3 Proposed methodology of fraud detection



Fig. 4 Steps of project planning for the proposed work

Figure 4 shows the basic steps of project planning and execution. First, we collect input from the data samples and split it into training and testing sets. Next, we prepare the data. We then choose a model and train it on our dataset. We deploy the model and evaluate its performance by testing it. Finally, we use the model on the testing data to make accurate predictions [10].
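The split step above can be sketched in a few lines; the 80/20 ratio, seed, and helper name are illustrative assumptions.

```python
import random

def train_test_split(rows, labels, test_ratio=0.2, seed=42):
    """Shuffle indices and carve off the last test_ratio fraction for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(rows) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

# 100 dummy transactions with alternating labels.
rows = [[i] for i in range(100)]
labels = [i % 2 for i in range(100)]
X_tr, y_tr, X_te, y_te = train_test_split(rows, labels)
print(len(X_tr), len(X_te))  # 80 20
```

Library equivalents (e.g. scikit-learn's `train_test_split`) add stratification so both classes keep their proportions across the split.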

4 Result Analysis and Discussion

We aggregated the data samples from Kaggle [11], a widely used website for downloading datasets.
A full cross-validation was performed to validate the performance of the algorithms. For undersampling, after averaging the results across runs, it is observed that logistic regression provides the best results on the data, producing an accuracy, precision, recall, and F1-score of 95.78%, 95.46%, 94.78%, and 95.22%, respectively. Table 1 highlights the overall analysis using the classifiers.

Table 1 Performance results using credit fraud data samples

Algorithms used       Accuracy   Precision   Recall   F1-score
Logistic regression   95.78      95.46       94.78    95.22
Decision tree         90.32      93.68       87.25    90.35
Random forest         94.736     93.77       90.196   94.84
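The four metrics reported in Table 1 all derive from the confusion-matrix counts. A minimal sketch, using made-up counts rather than the paper's data:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)       # of flagged transactions, how many were fraud
    recall = tp / (tp + fn)          # of actual frauds, how many were caught
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for a fraud classifier on 1000 transactions.
acc, prec, rec, f1 = classification_metrics(tp=80, fp=20, fn=10, tn=890)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

Note that on imbalanced data accuracy alone is misleading (a model that flags nothing would still score 91% here), which is why precision, recall, and F1 are reported alongside it.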

Fig. 5 Performance analysis in context to use of sampling method. The bar chart compares logistic regression with and without sampling:

                   Accuracy   Precision   Recall   F1-score
With sampling      95.78      95.46       94.78    95.22
Without sampling   92.77      93.56       93.11    92.85

In Fig. 5, the effectiveness of logistic regression is validated based on the use of a sampling method on real-time credit card transaction data, and its performance is calculated with several metrics. The dataset is sampled by undersampling, yielding two data distributions. The effectiveness is examined based on accuracy, precision, recall, and F1-score [12–14]. It is noted that the use of a sampling approach for classification enhances the effectiveness of prediction.
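The undersampling step itself can be sketched as follows; the toy class sizes and helper name are illustrative, and real pipelines often use a library such as imbalanced-learn instead.

```python
import random
from collections import Counter

def undersample(rows, labels, seed=7):
    """Randomly drop majority-class rows until both classes are the same size."""
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    n_min = min(len(v) for v in by_class.values())   # minority class size
    rng = random.Random(seed)
    out_rows, out_labels = [], []
    for lab, members in by_class.items():
        for row in rng.sample(members, n_min):       # keep n_min rows per class
            out_rows.append(row)
            out_labels.append(lab)
    return out_rows, out_labels

# 990 legitimate (0) vs 10 fraudulent (1) transactions: heavily imbalanced.
rows = [[i] for i in range(1000)]
labels = [0] * 990 + [1] * 10
bal_rows, bal_labels = undersample(rows, labels)
print(Counter(bal_labels))  # 10 of each class
```

Oversampling works in the opposite direction, duplicating (or, with SMOTE, synthesizing) minority-class rows until the classes balance.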

5 Conclusion and Future Scope

This study shows the comparative performance of logistic regression, decision tree, and random forest. The alarming increase in credit card fraud is being addressed by the fraud control systems of all banks, and a machine learning based fraud detection system provides both accuracy and transparency in assessing these frauds. All three classifiers are trained on real-time credit card transactions, which will help reduce at least 40–50% of total fraud losses [15–18].
Given the flexibility of this study, various models can be combined as units, and
their outcome can be embedded to enhance the final result’s efficiency. To further
refine this model, additional algorithms can be integrated, as long as their output
matches the others’ format. This modular approach allows for increased versatility
and flexibility in the project.
Another opportunity for improvement lies within the dataset. As demonstrated
previously, the algorithms’ precision improves as the dataset’s size increases. There-
fore, increasing the dataset’s size is likely to enhance the model’s ability to detect
fraud and reduce false positives. However, gaining the necessary support from banks
is essential to achieving this goal [11].

References

1. Tripathy HK, Mishra S (2022) A succinct analytical study of the usability of encryption
methods in healthcare data security. In: Next generation healthcare informatics. Springer Nature
Singapore, Singapore, pp 105–120
2. Raghuwanshi S, Singh M, Rath S, Mishra S (2022) Prominent cancer risk detection using
ensemble learning. In: Cognitive informatics and soft computing: proceeding of CISC 2021.
Springer Nature Singapore, Singapore, pp 677–689
3. Mukherjee D, Raj I, Mishra S (2022) Song recommendation using mood detection with Xcep-
tion model. In: Cognitive informatics and soft computing: proceeding of CISC 2021. Springer
Nature Singapore, Singapore, pp 491–501
4. Jain Y, Tiwari N, Dubey S, Jain S (2019) A comparative analysis of various credit card fraud
detection techniques. Int J Recent Technol Eng 7(5S2):402–407, ISSN: 2277-3878
5. Naik H, Kanikar P (2019) Credit card fraud detection based on machine learning algorithms.
Int J Comput Appl 182(44):8–12
6. Mafarja MM, Mirjalili S (2017) Hybrid whale optimization algorithm with simulated annealing
for feature selection. Neurocomputing 260:302–312
7. Khare N, Yunus S (2021) Credit card fraud detection using machine learning models and collating machine learning models. Int J Pure Appl Math 118(20):825–838, ISSN: 1314-3395. https://doi.org/10.30534/ijeter/2021/02972021
8. Yu W, Wang N (2009) Research on credit card fraud detection model based on distance sum.
Int Joint Conf Artif Intell 2009:353–356
9. Credit card fraud detection (2018) A realistic modeling and a novel learning strategy. IEEE Trans Neural Netw Learn Syst 29(8)
10. Nadim A, Sayem IM, Mutsuddy A, Chowdhury MS (2019) Analysis of machine learning
techniques for credit card fraud detection. IEEE
11. Mishra N, Mishra S, Tripathy HK (2023) Rice yield estimation using deep learning. In: Innova-
tions in intelligent computing and communication: first international conference, ICIICC 2022,
Bhubaneswar, Odisha, India, Dec 16–17, 2022, Proceedings. Springer International Publishing,
Cham, pp 379–388
12. Chakraborty S, Mishra S, Tripathy HK (2023) COVID-19 outbreak estimation approach using
hybrid time series modelling. In: Innovations in intelligent computing and communication:
first international conference, ICIICC 2022, Bhubaneswar, Odisha, India, Dec 16–17, 2022,
Proceedings. Springer International Publishing, Cham, pp 249–260
13. Verma S, Mishra S (2022) An exploration analysis of social media security. In: Predictive
data security using AI: insights and issues of blockchain, IoT, and DevOps. Springer Nature
Singapore, Singapore, pp 25–44
14. Singh P, Mishra S (2022) A comprehensive study of security aspects in blockchain. In: Predic-
tive data security using AI: insights and issues of blockchain, IoT, and DevOps. Springer Nature
Singapore, Singapore, pp 1–24
15. Swain T, Mishra S (2022) Evolution of machine learning algorithms for enhancement of self-
driving vehicles security. In: 2022 international conference on advancements in smart, secure
and intelligent computing (ASSIC). IEEE, pp 1–5
16. Sahoo S, Mishra S (2022) A comparative analysis of PGGAN with other data augmentation
technique for brain tumor classification. In: 2022 international conference on advancements in
smart, secure and intelligent computing (ASSIC). IEEE, pp 1–7
17. Mohapatra SK, Mishra S, Tripathy HK (2022) Energy consumption prediction in elec-
trical appliances of commercial buildings using LSTM-GRU model. In: 2022 international
conference on advancements in smart, secure and intelligent computing (ASSIC). IEEE, pp
1–5
18. Stolfo SJ, Fan DW, Lee W, Prodromidis A, Chan PK (2000) Cost based modeling for fraud
and intrusion detection: results from the JAM project. Proc DARPA Inf Survivability Conf
Exposition 2(2000):130–144

19. Deepti DP, Sunita MK, Vijay MW, Gokhale JA, Prasad SH (2010) Comput Sci Netw Secur
10(8)
iFlow: Powering Lightweight
Cross-Platform Data Pipelines

Supreeta Nayak, Ansh Sarkar, Dushyant Lavania, Nittishna Dhar, Sushruta Mishra, and Anil Kumar

Abstract With the advent of ML applications cutting across sectors, data prepro-
cessing for the training and proper functioning of ML models has seen a rise in
importance. This research paper represents a similar attempt by proposing iFlow, a
software tool for the easy creation of cross-platform data flow pipelines based on
the Python programming language. The tool leverages the default file system of the
user’s operating system, enabling faster and real-time inflow and outflow of data
for easier and more convenient data processing. The project plan emphasizes modu-
larity and extensibility, with a focus on the automation of data pipelines, as well
as the development of associated UI components for a better user experience. The
paper highlights the potential applications of iFlow in the field of machine learning
pipelines, positioning it as a lightweight and open-source MLOps framework for the
future.

Keywords iFlow · Data pipelines · Data processing · Cross-platform · Lightweight

S. Nayak · A. Sarkar · D. Lavania · N. Dhar · S. Mishra (B)


Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India
e-mail: sushruta.mishrafcs@kiit.ac.in
S. Nayak
e-mail: 2005207@kiit.ac.in
A. Sarkar
e-mail: 2005787@kiit.ac.in
D. Lavania
e-mail: 2005799@kiit.ac.in
N. Dhar
e-mail: 2005813@kiit.ac.in
A. Kumar
DIT University, Dehradun, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 211
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_17
212 S. Nayak et al.

1 Introduction

With the advent of data science and associated analytics for deriving conclusions
on modern-day problems, new sophisticated tools have been developed by software
engineers around the world to enable the automation of data flow pipelines allowing
for faster and real-time in and outflow of data. These pipelines are generally used as
stand-alone real-time feeds for procuring insights into data being transferred between
(both inter and intra) systems. However, such tools often end up being platform-
specific (mostly Linux-based platforms) and require a high initial effort for setting
up thereby increasing the time required to get them up and running. The proposed
software being developed as a part of this minor project (further referred to as “iFlow”
in this document) allows for the easier creation of cross-platform pipelines based upon
the Python programming language and leveraging the default file system exposed by
the user’s OS. “iFlow” shall provide an easy and convenient way to set up such data
(pre)processing pipelines which would be both lightweight and extensible to a wide
variety of other possible use cases in the future. The inherent complexity involved
in the project has been handled by taking specific design decisions related to the
frameworks being used. Further implementation details can be found as we proceed
through this document. The project plan has been created keeping in mind both
modularity and extensibility which shall allow us to enhance support well into the
future and possibly add more features and types of pipelines. In the real world, one of
the major use cases of using data pipelines can be seen in the setting up of machine
learning pipelines which comes under the umbrella of a currently upcoming field
better known as MLOps. These pipelines allow the automation of the data cleaning
steps and allow the passing of the processed data to subsequent pipelines defined
in the workflow. Currently, the development of “iFlow” shall be solely focused on
the automation of data pipelines and the development of associated UI components
for a better and smoother user experience, but in the future, we plan to develop this
further as an open-source, lightweight, and cross-platform MLOps framework.
The following sections of this paper are structured to provide the reader with a
deeper understanding of the working of the system. The "Literature Review" surveys
the steps that have already been taken in this field of study and academic research. The
"Proposed Model" forms the bulk of this paper and details every aspect
of the entire system and how the different components come together
to act as an easy and efficient developer tool for setting up data flow and
preprocessing pipelines. The "Results" section, on the other hand, focuses on how
we plan to implement the various components of the system as well as the
advantages expected from the various design decisions taken over the course of
the project.
The major contributions that we aim to make in the analysis are summarized as
follows.
iFlow: Powering Lightweight Cross-Platform Data Pipelines 213

• Proposed "iFlow", a lightweight and cross-platform software tool for the easy
  creation of data flow pipelines in Python, leveraging the user's OS file system for
  real-time inflow and outflow of data.
• Emphasized "modularity and extensibility", focusing on the automation of data
  pipelines and the development of UI components, and detailed the proposed model
  for an easy and efficient developer tool for data flow and preprocessing pipelines.
• Positioned iFlow as a "lightweight and open-source" MLOps framework for easier
  automation of data cleaning steps and processing of data in machine learning
  pipelines.
• Provided a "cross-platform tool" for easier creation of pipelines that is lightweight
  and extensible to a variety of future use cases, with specific framework-related
  design decisions taken to handle the inherent complexity and enable faster data
  processing.

2 Literature Review

Before attempting to create our own data pipeline and preprocessing framework, it
is necessary to understand the tools already available in the market for this purpose,
in order to appreciate the problems faced by modern-day developers when building
such streams for feeding and training machine learning models. This literature review
section summarizes such works of research and condenses the matter discussed in
them.
Machine learning and data pipelines are rapidly evolving fields, with researchers
proposing various approaches to improve efficiency, scalability, and performance.
One of the proposed approaches is the use of distributed computing technologies,
as demonstrated by Bui et al. [1] in their data pipeline architecture that can handle
large volumes of data with low latencies. Li et al. [2] took this approach further
by introducing an automated pipeline optimization framework that uses a genetic
algorithm to efficiently search for the best pipeline configuration based on perfor-
mance metrics. However, integrating data pipelines and machine learning workflows
efficiently in real-world scenarios remains a challenge. Islam et al. [3] proposed
a conceptual architecture to seamlessly address this challenge. They identified the
challenges and opportunities of implementing such an architecture in real-world
scenarios. Cruz et al. [4] introduced a pipeline architecture that provides efficient
integration and deployment of machine learning workflows, highlighting its benefits
in terms of scalability, reusability, and easy integration. Another important aspect of
data pipelines for machine learning is the choice of framework. Sivakumar et al. [5]
compared different data pipeline frameworks based on factors such as ease of use,
scalability, and performance. Furthermore, Onu et al. [6] discuss the challenges and
opportunities of building an efficient and effective data pipeline for machine learning
workflows. To ensure the reliability and performance of data pipelines, it is impor-
tant to monitor them. Taranu et al. [7] provide a comprehensive review of existing
research in pipeline monitoring and identify key challenges and opportunities in
applying machine learning to this field.
214 S. Nayak et al.

Overall, these research papers provide valuable insights into various approaches
for improving the efficiency and scalability of machine learning pipelines while
identifying key challenges and opportunities in this rapidly evolving field. Data
preprocessing is an essential step in machine learning tasks, and researchers have
proposed various approaches to improve the efficiency and scalability of data prepro-
cessing pipelines. One such approach is the use of a modular pipeline architecture,
where each module performs a specific task such as data cleaning, transformation,
or feature extraction. The pipeline employs parallelization techniques to improve
processing speed [8, 9]. Another proposed approach is the use of cross-platform
data preprocessing frameworks that leverage machine learning algorithms and cloud
computing resources. The frameworks use deep neural networks (DNNs) to prepro-
cess time series data or principal component analysis (PCA) and artificial neural
networks (ANNs) to improve classification accuracy [10–12]. The pipeline archi-
tecture also supports cross-platform processing through the use of Apache Arrow
as a cross-platform data format. Additionally, the pipeline employs Apache Spark
for distributed processing and utilizes several optimization techniques, including
caching and parallelization, to improve processing speed [13]. The proposed models
ensure that the data can be processed efficiently on different platforms without the
need for data format conversion or data movement, making the pipeline portable
and allowing for seamless data preprocessing across different platforms, including
Windows, Linux, and macOS [14].
Data mining primitives have increasingly been used in Customer Relationship
Management (CRM) software. Open-source big data software stacks have emerged
as an alternative to traditional enterprise database stacks. A large-scale industrial
CRM pipeline is described that incorporates data mining and serves several applica-
tions using Kafka, Storm, HBase, Mahout, and Hadoop MapReduce [15]. MLCask
is an end-to-end analytics system that supports Git-like version control semantics
for machine learning pipelines. The system enables multiple user roles to perform
branching and merging operations, while also reducing storage consumption and
improving efficiency through reusable history records and pipeline compatibility
information [16]. Data exploration through visualization is a crucial step for scien-
tists to analyze and validate hypotheses [17]. Pipeline61 is a framework that supports
the building of data pipelines across multiple environments by reusing the existing
code of deployed jobs and providing version control and dependency management to
deal with typical software engineering issues [18]. Apache StreamPipes is a graphical
tool for pipeline management that utilizes container management tools like Kuber-
netes to manage and execute complex stream processing pipelines for big data. The
proposed architecture and evaluation provide insights into the dependencies and
interplay of the technologies involved in managing and executing big data stream
processing pipelines [19]. In [20], the proposed data pipeline framework can improve
the quality of Automatic Identification System (AIS) data and provide a foundation
for various maritime management applications. The framework includes data collec-
tion, preprocessing, visualization, trajectory reconstruction, and storage, utilizing
Apache Kafka for data streaming. The DFSR approach utilizes both data features
and service associations to automatically generate machine learning pipelines for data
analysis, reducing the level of expertise required for domain workers and making
automated decisions in data analysis more accessible [21]. The above-mentioned and
cited papers along with their concise summaries aim to provide the readers with the
required background for the proper understanding of the requirement of a new more
customizable, lightweight, and cross-platform framework like “iFlow” and justify
the development efforts that creating such a framework entails.

3 Proposed Methodology

iFlow v1.0.0 will focus solely on the creation and manipulation of data via
data processing pipelines, workflows, and connectors through a Command Line
Interface, or CLI. The architecture for this first implementation of iFlow consists
of four major components denoted by four distinct colors. The sections given below
elaborate on these major components and the functions they perform in
the framework. Figure 1 shows the overall workflow model.

Fig. 1 Zoomed out architecture of iFlow at a glance



Data Source Manager


The Data Source Manager allows the user to smoothly interact with datasets as well
as their precise locations on the user's file system via the CLI. Users can either
manually add folders containing .csv files under the datasets directory, or they can
directly download raw datasets from Kaggle via the public API. This remote access
and download of datasets is handled by the Kaggle API wrapper functions contained
in the KIM, or Kaggle Interaction Module, which provides a convenient
CLI-based system that developers can use to manage remote datasets by
obtaining structured local copies [22].
Implementation: The Data Source Manager is implemented as an interactive
Command Line Interface (CLI) developed in Python that uses the Python "requests"
library to make API calls to the public Kaggle Dataset API for fetching and
downloading resources in the backend. It also manages the directory
structure, making sure that the files and directories being created are consistent
with the packages and modules installed by iFlow.
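As a rough sketch of what the KIM wrapper functions might look like: the endpoint path, function names, and directory layout below are illustrative assumptions, not the actual iFlow implementation, and the sketch uses only the standard library (rather than "requests") so that it is self-contained.

```python
import urllib.request
from pathlib import Path

# Assumed public API root; the real Kaggle endpoint may differ.
KAGGLE_API_BASE = "https://www.kaggle.com/api/v1"

def dataset_download_url(owner: str, dataset: str) -> str:
    """Build the archive download URL for a public dataset."""
    return f"{KAGGLE_API_BASE}/datasets/download/{owner}/{dataset}"

def fetch_dataset(owner: str, dataset: str, datasets_dir: str = "datasets") -> Path:
    """Download a dataset archive into a structured local copy under datasets/."""
    target = Path(datasets_dir) / owner / f"{dataset}.zip"
    # Keep the directory layout consistent with the rest of the framework.
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(dataset_download_url(owner, dataset), target)
    return target
```

Calling `fetch_dataset("some-owner", "some-dataset")` would place the archive under `datasets/some-owner/`, mirroring the structured local copies described above.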
Script Source Manager
Every task that needs to be performed by iFlow is represented by a script that is run
on a .csv file, represented as a 2D matrix and stored in temporary files (which may or
may not be the case depending on the options provided in the configuration files)
during transit from one scheduled workflow job to another. The Script Source Manager is
somewhat similar to the NPM package registry used by NodeJS to distribute, manage,
and maintain packages at a global scale. It represents a marketplace that would
contain various modules and script packages created by third-party independent
users and developers that could be used by anyone to extend particular functionalities
to a project being developed using the “iFlow” framework. Users can create their
own custom scripts for data preprocessing and make those scripts available on the
marketplace for global use thereby leading to a strong developer ecosystem and
troubleshooting community.
Implementation: The Script Source Manager represents an API interface that allows
the uploading of new scripts (adding new scripts to the marketplace), downloading
scripts via the CLI for usage in a project (installing modules) as well as making
changes to an existing uploaded package by a developer who owns it [23, 24]. This
entire system would be represented by a well-documented API ecosystem accom-
panied by a built-in admin interface, developed using the Django Rest Framework
(DRF) thereby making it ideal for both scalability and ease of use for system admins.
The Script Source Manager has wrapper functions defined in it that call the above-
mentioned DRF-based API endpoints in the backend. Once the script manager fetches
the required scripts or modules from the API, it passes the data to the Data Source
Manager, which then decides where and how to structure the storage of the script
files.
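A sketch of such wrapper functions is shown below. The endpoint paths, class name, and URL scheme are hypothetical (the actual routes would be defined by the DRF-based API), and authentication is omitted for brevity.

```python
import json
import urllib.request

class ScriptSourceClient:
    """Thin wrapper around the (hypothetical) marketplace REST endpoints."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def script_url(self, name: str, version: str = "latest") -> str:
        # e.g. GET /scripts/<name>/<version>/ returns the package metadata.
        return f"{self.base_url}/scripts/{name}/{version}/"

    def download(self, name: str, version: str = "latest") -> dict:
        """Fetch a script package's metadata for installation into a project."""
        with urllib.request.urlopen(self.script_url(name, version)) as resp:
            return json.load(resp)

    def upload_request(self, name: str, payload: dict) -> urllib.request.Request:
        """Build the POST request that would publish a new script package."""
        return urllib.request.Request(
            f"{self.base_url}/scripts/",
            data=json.dumps({"name": name, **payload}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
```

The fetched metadata would then be handed to the Data Source Manager for placement on disk, as described above.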
iFlow Source Files
As has already been mentioned in the architecture, these files are used by "iFlow"
for running the entire framework and for enabling the user to interact with all the other
components of the system. These files can be thought of as utility code that is
frequently required or consumed by the other modules present in the system [25].
They may include functions that deal with the creation, updating, deletion, or any other
kind of management of the underlying file system. They can also include network-
based utility functions that are used for making specific API calls in a secure and
session-oriented manner. Other possible auxiliary or utility functions include those
concerned with encryption and decryption of files (intermediate iFlow files, if
the data requires confidentiality), compression, and more that will emerge as
development of the framework progresses.
Implementation: The iFlow Source Files do not have a specific implementation
language. They are represented by a mix of configuration files (.yml, .csv,
or .txt files) that come together or are utilized by other modules as mentioned
previously. The scripts are written in Python (.py files) and are responsible for parsing
the configuration and data files in order to carry out useful functions
[26].
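As an example of the kind of file-system utility code these source files might contain, the sketch below initializes a project skeleton and clears intermediate files. The directory names are assumptions for illustration, not iFlow's actual layout.

```python
import shutil
from pathlib import Path

# Assumed project layout; the real iFlow skeleton may differ.
IFLOW_LAYOUT = ("datasets", "scripts", "pipelines", "workflows", "tmp")

def init_project(root: str) -> Path:
    """Create the directory skeleton a project is assumed to need."""
    root_path = Path(root)
    for sub in IFLOW_LAYOUT:
        (root_path / sub).mkdir(parents=True, exist_ok=True)
    return root_path

def clean_temp(root: str) -> None:
    """Remove intermediate files left behind between workflow runs."""
    tmp = Path(root) / "tmp"
    if tmp.exists():
        shutil.rmtree(tmp)
    tmp.mkdir(parents=True)  # leave an empty tmp/ ready for the next run
```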
Config Files
The configuration files form the heart of “iFlow”. These files are used to create
and define pipelines, workflows, and connectors which provide iFlow modularity
and code reusability (not to mention code shareability via the “iFlow Developer
Marketplace”). The three types of config files that are used by iFlow shown in Fig. 2
are as follows:

1. Jobs/Tasks: These are the smallest quantum or token of iFlow and define the
   script or code that is to be run on a particular piece of data.
2. Pipelines: These are a collection of jobs (scripts and Python commands) defined
   using YAML and form the building blocks of workflows. A pipeline can have
   multiple jobs, and each job processes the data and passes the modified or
   transformed data to the next stage or job.
3. Connectors: These are logical units that are used to glue or connect pipelines
   together. Whenever the data encounters a connector, the logical code inside the
   connector is executed to decide which pipeline should receive the data next
   [27]. They allow for the creation of dynamic workflows based on certain data
   properties. Connectors are defined in YAML in conjunction with references to
   scripts that contain the Boolean logic based on which decisions are taken at the
   connectors.
4. Workflows: They refer to the entire system formed by connecting multiple
   pipelines together with connectors. Workflows are used to accomplish a partic-
   ular data processing task. Different workflows can be created based on the end
   users of the final data.

Fig. 2 Sample workflow using iFlow and the various constituent components
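To make this vocabulary concrete, the sketch below models jobs, pipelines, and connectors as plain Python objects. The dicts are a stand-in for the parsed YAML files, and all names and scripts are illustrative, not part of iFlow itself.

```python
def run_pipeline(pipeline: dict, data: list) -> list:
    """Run each job's script over the data in order, passing results along."""
    for job in pipeline["jobs"]:
        data = job["script"](data)
    return data

def run_workflow(workflow: dict, data: list) -> list:
    """Alternate pipelines and connectors; each connector picks the next pipeline."""
    pipeline = workflow["pipelines"][workflow["start"]]
    while True:
        data = run_pipeline(pipeline, data)
        connector = pipeline.get("connector")
        if connector is None:
            return data
        pipeline = workflow["pipelines"][connector(data)]

# A toy workflow: drop empty rows, then route small results to a padding
# pipeline and large results to a truncating pipeline.
clean = {"jobs": [{"script": lambda rows: [r for r in rows if r]}],
         "connector": lambda rows: "small" if len(rows) < 3 else "large"}
small = {"jobs": [{"script": lambda rows: rows + [["padded"]]}]}
large = {"jobs": [{"script": lambda rows: rows[:3]}]}
workflow = {"start": "clean",
            "pipelines": {"clean": clean, "small": small, "large": large}}
```

Here the lambda attached to `clean` plays the role of a connector's Boolean decision script, and the `workflow` dict corresponds to a workflow declaration tying the pipelines together.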

The above points aim to introduce the reader to the vocabulary used in the
iFlow documentation as well as in this paper as a whole [28]. The following subsection
dives deeper into the various definitions and schemas that are used
to create Jobs, Pipelines, Connectors, and Workflows in order to provide developers
with a more complete understanding of the framework.
iFlow Schemas: Every config file that represents a Job/Task, Pipeline, Connector,
or Workflow in iFlow is represented by a YAML (.yml) file that conforms to a
specification defined in the official iFlow master issues on Github under the “Master
Issue/Schemas” heading.
The following section aims at providing a quick developer-level description of
the schema that we have developed over time for iFlow keeping performance, ease
of parsing, and ease of defining by a user in mind.
Job/Task Schema: Name, Description, and Script
Pipeline Schema: Name, Description, and Jobs. Jobs further consist of the following
options:
• execute (required): The name of the task that will be carried out.
• in (optional): A list of the filenames that will serve as the job's input.
• out (optional): A list of the filenames that will serve as the job's output.
• encr (optional): A Boolean value expressing whether or not the input data is
  encrypted.
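A minimal validator for one job entry against this schema might look as follows. This is a sketch built only from the fields listed above, not iFlow's actual parser.

```python
REQUIRED = {"execute"}
OPTIONAL = {"in": list, "out": list, "encr": bool}

def validate_job(entry: dict) -> list:
    """Return a list of schema violations for one job entry in a pipeline file."""
    errors = []
    for key in REQUIRED:
        if key not in entry:
            errors.append(f"missing required field: {key}")
    for key, expected in OPTIONAL.items():
        if key in entry and not isinstance(entry[key], expected):
            errors.append(f"{key} must be a {expected.__name__}")
    for key in entry:
        if key not in REQUIRED | set(OPTIONAL):
            errors.append(f"unknown field: {key}")
    return errors
```

An empty list means the entry conforms; anything else pinpoints the offending field before the pipeline is scheduled.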

Connector Schema
Core YAML Structure: The Core YAML Structure defines the configuration options
for creating a connector in a data processing pipeline framework. The following are
the fields in the Core YAML Structure and their descriptions: Name, Description,
and Script.
Add On Branch for Intrinsic Branching (Not to be used now): The Add On
Branch for Intrinsic Branching provides configuration options for defining branches
for a connector. These branches are used for intrinsic branching, which is not recom-
mended for use at the moment. The following are the fields in the Add On Branch for
Intrinsic Branching and their descriptions: Branches, Branch, Assert, and Transfer.
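The branch evaluation a connector performs can be sketched as below, using Python callables in place of the referenced assert scripts. The field names mirror the schema fields above (Branches, Assert, Transfer), while the router itself is a made-up example.

```python
def route(connector: dict, data) -> str:
    """Evaluate each branch's assert predicate in order; return its transfer target."""
    for branch in connector["branches"]:
        if branch["assert"](data):
            return branch["transfer"]
    raise ValueError(f"no branch of {connector['name']} matched the data")

# Hypothetical connector: large batches go to one pipeline, everything
# else falls through to a streaming pipeline.
size_router = {
    "name": "size-router",
    "branches": [
        {"assert": lambda rows: len(rows) > 100, "transfer": "batch-pipeline"},
        {"assert": lambda rows: True, "transfer": "stream-pipeline"},  # default
    ],
}
```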

Workflow Schema
Recursive Workflow Declaration: The Recursive Workflow Declaration is used to
represent a workflow with branching and sub-flows. It uses the “pipeline-exec” and
“connector-exec” commands to execute pipelines and connectors, respectively.
Linear Workflow Declaration: The Linear Workflow Declaration is used to repre-
sent a workflow without branching. It uses the “pipeline-exec” and “connector-exec”
commands to execute pipelines and connectors, respectively.
The recursive workflow schemas are easier for the user to define and provide a
better developer experience due to their more natural representation. On the other
hand, the recursive nature of the schema can lead to deep nesting that makes it
difficult to model larger and more complex recursive or branching relations between
pipelines via connectors. Therefore, in the case of highly complex workflows, the
linear schema provides a more systematic and maintainable approach for defining
branching.
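To illustrate the relationship between the two declaration styles, the sketch below flattens a nested (recursive) declaration into the equivalent linear sequence of "pipeline-exec" and "connector-exec" commands. The dict structure is a stand-in for the parsed YAML, not iFlow's actual internal representation.

```python
def flatten(node: dict, out=None) -> list:
    """Flatten a nested workflow declaration into a linear command list."""
    if out is None:
        out = []
    out.append(("pipeline-exec", node["pipeline"]))
    connector = node.get("connector")
    if connector:
        out.append(("connector-exec", connector["name"]))
        for branch in connector.get("branches", []):
            flatten(branch, out)  # recurse into each sub-flow
    return out

# A small recursive declaration: ingest, then route to clean or archive.
nested = {
    "pipeline": "ingest",
    "connector": {"name": "router",
                  "branches": [{"pipeline": "clean"},
                               {"pipeline": "archive"}]},
}
```

The flattened output is exactly what a linear declaration would have listed explicitly, which is why the two forms are interchangeable in principle even though they suit different levels of workflow complexity.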
Tech Stack Used for Implementation
In terms of the tech stack, the majority of the codebase shall be written in Python.
This includes the source files for iFlow as well as the servers for the marketplace
(written using the Django Framework). The configuration files will be written in
YAML, and all other libraries that we use shall be documented as the
project proceeds and takes shape.
We will follow various software conventions such as semantic commit
messages and proper Git collaboration conventions, as well as ensuring automated
code coverage and testing by setting up appropriate CI/CD pipelines where required.
APIs shall be documented using "Swagger", framework-specific documentation
integrated with CI/CD shall be maintained on "Docusaurus", and tickets shall be
raised on "GitHub Issues".

4 Result Analysis and Discussion

The Django REST Framework includes support for serialization, authentication,
permissions, pagination, and filtering as some of its core features. Additionally, it
supports other document types including YAML, XML, and JSON. It is critical
to concentrate on constructing a simple and unified API architecture while imple-
menting APIs using the Django REST Framework. This is possible by adhering to
RESTful principles, which include using HTTP methods and status codes appropri-
ately, offering clear and simple documentation, and making sure that API endpoints
are logically organized.
Python is designed to be inherently cross-platform, running on operating systems
such as Windows, macOS, and Linux. This is because Python code is first compiled
into platform-independent bytecode, which is then translated by the Python
interpreter on the target platform. This was the major influence in choosing Python
as our primary programming language. In addition, since Python is cross-platform,
developers can write code once and run it on any system without making significant
changes. This makes Python a favorable language, as it avoids wasted time and effort
and guarantees that the program behaves consistently on all platforms. A comparative
analysis of the proposed iFlow and other existing approaches is given in Table 1.
Along with being cross-platform, Python is renowned for its lightweight
architecture, which makes it a popular option for applications where effective
resource use is crucial. As a dynamically typed language that does not need explicit
variable declarations, Python has a simpler and more streamlined syntax, making
code easier to develop and read. It also has a small footprint, which means that
running it consumes fewer system resources, a good fit for resource-constrained
environments or low-powered devices. Overall, Python's lightweight construction
and cross-platform portability make it an excellent choice for building "iFlow" as a
cross-platform application. This is further helped by the fact that Python comes
packaged with a wide variety of packaging, testing, coverage, and load-testing tools
such as "pip", "pytest", "codecov", and "locust", respectively, which allow for
in-house testing and reduced developer expense.
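As an illustration of the kind of measurement a load-testing tool like locust performs, the stdlib-only sketch below fires concurrent simulated requests and reports median latency and error rate, the same two factors used in the evaluation that follows. The request handler here is a stand-in, not iFlow's real API.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request() -> float:
    """Stand-in for one API call; returns its latency, raises on failure."""
    start = time.perf_counter()
    time.sleep(random.uniform(0.001, 0.005))  # simulated service time
    return time.perf_counter() - start

def load_test(n_users: int) -> dict:
    """Fire n_users concurrent requests and report latency and error rate."""
    latencies, errors = [], 0
    with ThreadPoolExecutor(max_workers=50) as pool:
        for future in [pool.submit(handle_request) for _ in range(n_users)]:
            try:
                latencies.append(future.result())
            except Exception:
                errors += 1
    return {"median_latency": statistics.median(latencies),
            "error_rate": errors / n_users}
```

Locust automates exactly this pattern against live HTTP endpoints, ramping `n_users` up over time and charting the two metrics.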
In order to test the scalability of the Django Rest Framework, on top of which
the major portion of the iFlow marketplace and script manager is built, we used
the "locust" framework provided by Python to load test our API endpoints.
We simulated the usage of iFlow by a carefully controlled and increasing user base
requesting resources from the server. Our findings have been summarized in terms
of two major factors: latency (response time) and error rate. Based on both these
factors, we have tried to present a measure of how scalable iFlow is. Since iFlow
takes an entirely different approach to the concept of data pipelines and simplifies
them to an entirely new level, no benchmark studies or direct competitors for the
framework were found to preexist in the market.

Table 1 Comparison between iFlow and other similar frameworks

Framework                      | iFlow         | Other similar frameworks
Language                       | Python        | Varies
Cross-platform                 | Yes           | Mostly platform-specific (e.g., Linux-based)
Leveraging OS file system      | Yes           | Varies
Lightweight                    | Yes           | Varies
Extensible                     | Yes           | Varies
Emphasis on automation         | Yes           | Varies
Emphasis on UI                 | Yes           | Varies
Potential application in MLOps | Yes           | Varies
Open source                    | Yes (planned) | Varies
Figure 3 shows two test runs (Run #1 and Run #2), representing the data for a
total of 500 and 1000 users, respectively. It can be clearly seen that the number of
failures recorded in both cases was 0, indicating that the server/framework is
capable of handling at least a thousand concurrent users in its native non-optimized
state by virtue of the Django Rest Framework. The response times, however,
undergo a sudden spike in the case of Run #2, indicating that prolonged periods of
such load might not be optimal for the server. In both cases, the response times
decrease drastically as the load becomes more or less constant.
In Fig. 4, however, two test runs (Run #3 and Run #4) represent the data for a
total of 1500 and 2000 users, respectively. It can be clearly seen that the number of
failures recorded in both cases was nonzero, indicating that the server/framework
is incapable of handling more than a thousand users concurrently in its native
non-optimized state. The response times once again undergo a sudden spike in both
Run #3 and Run #4, indicating that prolonged periods of such load might not be
optimal for the server. In both cases, the response times decrease drastically as the
load becomes more or less constant.
From the above-analyzed situations, we can conclude that in its native single-
threaded Django application state, iFlow is capable of handling up to 1000 concurrent
users. For a single instance-based application, this performance is significant.

Fig. 3 Locust tests for the iFlow Django Rest Framework-based server, Run #1 and Run #2

Fig. 4 Locust tests for the iFlow Django Rest Framework-based server, Run #3 and Run #4

In order to enable higher scalability of iFlow, it is necessary to run it in a containerized
form so that loads can be balanced across multiple instances. The user progression
for all four test cases was considered over a constant period of iterations with
different user base sizes and is displayed in Fig. 5.
Therefore, the scalability of iFlow depends to a large extent on the method used
to deploy it in a multi-instance environment with the help of containerized services.

Fig. 5 User progression graphs during load testing



5 Conclusion and Future Work

The proposed software tool, iFlow, offers a quick and simple method for establishing
Python-based cross-platform data flow pipelines by utilizing the operating system's
native file system. A strong emphasis has been placed on the modularity and
extensibility of the project design in order to support a wide range of potential use
cases in the future. In addition, there are plans to further expand it as an open-source,
lightweight, and cross-platform MLOps framework.
Overall, iFlow has the potential to revolutionize the field of data processing by
making it easier and more convenient for developers to set up cross-platform data flow
pipelines. With its modularity and extensibility, the software tool is well positioned
to be a valuable addition to the MLOps toolkit and beyond.

References

1. Bui MTH, Park SS, Lee SH, Lee KR (2020) Towards an efficient data pipeline for machine
learning on big data. Int J Mach Learn Comput 10(5):844–849
2. Li HYH, Wibowo LNV, Wu YL (2020) Automated machine learning pipeline optimization.
IEEE Access 8:133712–133722
3. Islam MR, Rausch T, Hansson GK (2019) Challenges and opportunities in integrating data
pipelines and machine learning workflows. arXiv preprint arXiv:1912.08088
4. Cruz AL, Rodríguez JM, Balaguer CM (2018) A pipeline for machine learning workflows. In:
Proceedings of the 2018 IEEE international conference on big data, pp 3583–3588
5. Sivakumar SS, Kannan SR, Sullivan SE (2020) A comprehensive study of data pipeline
frameworks for machine learning. Int J Adv Comput Sci Appl 11(2):210–218
6. Onu CA, Dike JD, Okpako DE (2020) Building a data pipeline for machine learning: challenges
and opportunities. J Comput Inf Technol 28(1):91–102
7. Taranu DM, Sweeney JD, Driscoll CT, Herborg LE (2020) Machine learning for pipeline
monitoring: a review of current research and future directions. Front Artif Intell 3:25
8. Bui DD, Nguyen TT, Moon T (2018) A parallel framework for efficient data preprocessing
with a focus on data cleaning and normalization. IEEE Xplore
9. Bui T, Nguyen T, Moon T (2019) Modular pipeline architecture for efficient and scalable data
processing. BioEssays 41(4):e1900004
10. Huang J, Li X, Zhang Y (2015) Principal component analysis and artificial neural networks-
based data preprocessing for classification. Math Prob Eng
11. Liu B, Guo S, Zhang S, Jin H (2021) Cross-platform data preprocessing framework based on
machine learning and cloud computing. MDPI
12. Liu C, Zhu C, Xu W, Yang X, Zhang L (2021) Time series data preprocessing with deep neural
networks. IEEE Xplore
13. Sadat-Mohtasham M, Farajzadeh MA (2020) Cross-platform data preprocessing: a survey.
Webology 17(2):52–68
14. Sun X, Guo Q, Zhou W, Jia H (2018) Cross-platform data preprocessing based on apache
arrow. IEEE Xplore
15. Li K, Deolalikar V, Pradhan N (2015) Big data gathering and mining pipelines for CRM using
open-source. In: 2015 IEEE international conference on big data (big data). Santa Clara, CA,
USA, pp 2936–2938. https://doi.org/10.1109/BigData.2015.7364128
16. Luo Z et al (2021) MLCask: efficient management of component evolution in collaborative data
analytics pipelines. In: 2021 IEEE 37th international conference on data engineering (ICDE),
Chania, Greece, 2021, pp 1655–1666. https://doi.org/10.1109/ICDE51399.2021.00146

17. Callahan SP, Freire J, Santos E, Scheidegger CE, Silva CT, Vo HT (2006) Managing the
evolution of dataflows with VisTrails. In: 22nd international conference on data engineering
workshops (ICDEW’06), Atlanta, GA, USA, 2006, pp 71–71. https://doi.org/10.1109/ICDEW.
2006.75
18. Wu D, Zhu L, Xu X, Sakr S, Sun D, Lu Q (2016) Building pipelines for heterogeneous execution
environments for big data processing. IEEE Software 33(2):60–67. https://doi.org/10.1109/MS.
2016.35
19. Faizan M, Prehofer C (2021) Managing big data stream pipelines using graphical service mesh
tools. In: 2021 IEEE cloud summit (cloud summit), Hempstead, NY, USA, 2021, pp 35–40.
https://doi.org/10.1109/IEEECloudSummit52029.2021.00014
20. Krismentari NKB, Widyantara IMO, ER NI, Asana IMDP, Hartawan IPN, Sudiantara IG
(2022) Data pipeline framework for AIS data processing. In: 2022 seventh international
conference on informatics and computing (ICIC), Denpasar, Bali, Indonesia, 2022, pp 1–6.
https://doi.org/10.1109/ICIC56845.2022.10006941
21. Ru-tao Z, Jing W, Gao-jian C, Qian-wen L, Yun-jing Y (2020) A Machine learning pipeline
generation approach for data analysis. In: 2020 IEEE 6th international conference on computer
and communications (ICCC), Chengdu, China, 2020, pp 1488–1493. https://doi.org/10.1109/
ICCC51575.2020.9345123
22. Mishra N, Mishra S, Tripathy HK (2023) Rice yield estimation using deep learning. In: Inno-
vations in intelligent computing and communication: first international conference, ICIICC
2022, Bhubaneswar, Odisha, India, Dec 16–17, 2022, Proceedings, pp 379–388. Springer
International Publishing, Cham
23. Chakraborty S, Mishra S, Tripathy HK (2023) COVID-19 outbreak estimation approach using
hybrid time series modelling. In: Innovations in intelligent computing and communication:
first international conference, ICIICC 2022, Bhubaneswar, Odisha, India, Dec 16–17, 2022,
Proceedings, pp 249–260. Springer International Publishing, Cham
24. Verma S, Mishra S (2022) An exploration analysis of social media security. In: Predictive data
security using AI: insights and issues of blockchain, IoT, and DevOps, pp 25–44. Springer
Nature Singapore, Singapore
25. Singh P, Mishra S (2022) A comprehensive study of security aspects in blockchain. In: Predic-
tive data security using AI: insights and issues of blockchain, IoT, and DevOps, pp 1–24.
Springer Nature Singapore, Singapore
26. Swain T, Mishra S (2022) Evolution of machine learning algorithms for enhancement of self-
driving vehicles security. In: 2022 international conference on advancements in smart, secure
and intelligent computing (ASSIC). IEEE, pp 1–5
27. Sahoo S, Mishra S (2022) A comparative analysis of PGGAN with other data augmentation
technique for brain tumor classification. In: 2022 international conference on advancements in
smart, secure and intelligent computing (ASSIC). IEEE, pp 1–7
28. Mohapatra SK, Mishra S, Tripathy HK (2022) Energy consumption prediction in elec-
trical appliances of commercial buildings using LSTM-GRU Model. In: 2022 international
conference on advancements in smart, secure and intelligent computing (ASSIC). IEEE, pp
1–5
Developing a Deep Learning Model
to Classify Cancerous and Non-cancerous
Lung Nodules

Rishit Pandey, Sayani Joddar, Sushruta Mishra, Ahmed Alkhayyat,
Shaid Sheel, and Anil Kumar

Abstract The detection of lung nodules is critical for enhancing patient outcomes, as
lung cancer is a major contributor to cancer-related deaths worldwide. Medical image
analysis has benefited greatly from deep learning approaches, specifically
CNNs. In this study, we utilized a dataset of chest CT scans to train a ConvNet model
that can automatically classify lung nodules as cancerous or non-cancerous. The
model performed well in both tasks, achieving a high level of accuracy. The
outcomes of this study suggest that CNNs have the potential to give more precise
results in nodular tumour diagnosis and screening.

Keywords Lung cancer · Deep learning · Classification · Accuracy rate · Machine learning

R. Pandey · S. Joddar · S. Mishra (B)
Kalinga Institute of Industrial Technology, Deemed to Be University, Bhubaneswar, India
e-mail: sushruta.mishrafcs@kiit.ac.in
R. Pandey
e-mail: 2005256@kiit.ac.in
S. Joddar
e-mail: 2005268@kiit.ac.in
A. Alkhayyat
Faculty of Engineering, The Islamic University, Najaf, Iraq
S. Sheel
Medical Technical College, Al-Farahidi University, Baghdad, Iraq
e-mail: shaid.sheel@uoalfarahidi.edu.iq
A. Kumar
Tula’s Institute, Dehradun, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 225
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_18
226 R. Pandey et al.

1 Introduction

Lung cancer is a disease that originates in lung cells, in which abnormal cells grow
uncontrollably in the lung tissue, form tumours, and can spread to other areas of the
body [1]. If not detected and treated early, lung cancer can be fatal, making it one of
the most prevalent and prominent causes of cancer-related death in the world. Signs
of a lung tumour include chronic coughing, chest pain, difficulty breathing, fatigue,
and unintended weight loss. While smoking is the main cause of lung cancer, exposure
to second-hand smoke, radon, asbestos, and other environmental factors can also
increase the likelihood of developing the disease. Lung cancer can be diagnosed by
examining a sample of lung cells in a laboratory. Pulmonary nodules, which are
abnormal growths in the lungs, can also be detected through medical imaging such
as CT scans. Although pulmonary nodules are typically non-cancerous, they can
indicate the presence of cancer in some cases. CT scans are superior to other medical
imaging techniques such as X-rays because they produce more accurate and less
noisy results. Early prediction and treatment of lung cancer are crucial for reliable
diagnosis and increased chances of survival. Lung cancer has four stages, defined by
how far the cancer has propagated within the lungs and to other organs [2].
. First stage: The cancer is limited to the lung and has not reached lymphatic nodes or other
organs. This is further categorized into I-A, where the nodule is smaller than 3 cm in
size, and I-B, where the nodule is larger than 3 cm.
. Second stage: The malignancy has spread to adjacent lymphatic nodes or lung
tissues. Stage II is further subdivided into II-A, where the nodule is smaller than 3 cm
and has spread to nearby lymphatic nodes, and II-B, where the tumour is larger than
5 cm and has spread to nearby lymphatic nodes, or is between 3 and 5 cm and has
spread to nearby lymphatic nodes.
. Third stage: The malignancy affects the lymphatic nodes in the mediastinum or
nearby structures such as the chest wall, diaphragm, or oesophagus. This is further
subdivided into III-A, where the cancer has spread to lymphatic nodes on the same
side of the chest as the primary tumour, or to nearby structures such as the chest wall
or diaphragm, and III-B, where it has spread to lymphatic nodes on the opposite side
of the chest from the primary tumour, or to structures such as the heart, major blood
vessels, or trachea.
. Fourth stage: The malignancy has propagated to other organs in the body, such as
the liver, brain, and bones. The survival rate differs at each stage; the earlier the
diagnosis, the greater the chance of survival, although there is no promising cure for
cancer yet.
Deep learning, a branch of computational intelligence that applies neural network
variants to learn from large collections of examples, has shown promise in detecting
lung cancer. There are several approaches to using deep learning for identifying lung
cancer, including the use of computed tomography (CT) scans and X-rays. In CT
scans, deep learning algorithms can be trained to identify lung nodules and other
abnormalities that may indicate the presence of cancer. Convolutional neural networks
(CNNs) or other deep learning algorithms capable of detecting patterns in medical
images may be employed for this purpose. This research focuses on a deep
learning-based convolutional neural network model to identify and categorize lung
tumours as cancerous (malignant) or non-cancerous (benign and normal). This may
help radiologists and other healthcare workers arrive at more accurate and efficient
diagnoses, which is particularly important in early-stage lung cancer, as earlier
detection can improve outcomes considerably.

2 Related Work

In recent decades, there has been substantial progress in image recognition tech-
niques. These methods have been widely utilized in different areas, including medical
imaging, pattern recognition, video processing, robot vision, and more. One signif-
icant advancement in medical image analysis has been the successful detection of
cancer using deep learning methodologies. In particular, convolutional neural
networks (ConvNets/CNNs) [1] have demonstrated favourable outcomes in
diagnosing cancer. Some studies have even reported achieving accuracy levels similar
to those of human radiologists. The authors in [2] put forth a CNN-driven approach
to identify lung risks in their initial phases and facilitate prompt treatment. They built
the model using Python 3's TensorFlow and Keras libraries. Initially, the dataset
comprised 1097 images, but the researchers augmented it to 8461 images. The model
yielded a remarkable accuracy of 99.45%. Sushruta
et al. [3] conducted a review of deep learning techniques that have been employed
in research related to lung cancer, particularly in detecting lung nodules from chest
radiographs and computed tomography scans using smart IoT module. Their study
revealed two key challenges in this domain. Firstly, there is a pressing need for more
rigorous testing of deep learning algorithms in actual medical practice to estab-
lish their practical utility. Secondly, future research must focus on incorporating
heterogeneity into the scenarios since real-world applications must be able to handle
diverse types of patients. In 2018, Asuntha et al. [4] presented a novel approach
for identifying cancerous lung nodules from input lung images, classifying the lung
cancer, and assessing the extent of the disease. Their research incorporated advanced
deep learning techniques for locating cancerous lung tumours. In that study, the
authors have used a combination of techniques to extract features from medical
images, including wavelet transforming features, histogram of oriented gradients,
scale invariant feature transform, local binary pattern, and Zernike moment. Fuzzy
particle swarm optimization approach was used next to select the most appropriate
attributes for classification. The selected features were then classified using deep
learning, with a novel FPSOCNN model designed to reduce the computational
complexity of the CNN. The researchers tested their approach on a dataset from Arthi
Scan Hospital and found that their FPSOCNN model performed better than other
methods. Overall, their approach shows promise for improving the accuracy and effi-
ciency of medical image analysis [5]. In 2019, S. Bhatia et al., from the Department
of Computer Science and Information Systems at BITS Pilani, developed a method
for detecting lung nodule malignancy in CT scans using residual deep learning. To
achieve this, they first created a preprocessing pipeline to identify the areas of the lung
that are prone to cancer. They then retrieved attributes from these areas using Unet
and ResNet models. The retrieved attributes were input to residual deep analytics
prototype to categorize the images as either cancerous or non-cancerous. This tech-
nique has the potential to increase the precision of detecting lung cancer and provide
a more efficient way of screening patients for the disease. They then used various
classifiers such as XGBoost and random forest to classify the extracted features, and
their individual outputs were ensemble to predict cancerous cells. Their proposed
method achieved an accuracy of 84% on the LIDC-IDRI dataset [6]. In 2020, N.
Kalaivani et al. from Sri Krishna College of Engineering and Technology and SACS
MAVMM Engineering College proposed a deep neural network (DenseNet) and
adaptive boosting algorithm-based model to classify lung nodules as normal or malignant
from CT scan imaging. They used a dataset that was composed of 201 lung images
which were split into the ratio of 85:15 for training and testing. Upon experimenting,
their model turned out to achieve an accuracy of 90.85% [7]. In 2022, researchers
from Bharath Institute of Higher Education and Research, Chennai, led by N. Sudhir
Reddy, conducted a study aimed at identifying early-stage malignancy in lung nodule
using deep learning techniques. They found that convolutional neural networks were
the best way for analysing medical images, classifying lung nodules, extracting
attributes, and predicting lung cancer. To predict the growth of malignant tissue
in CT imaging data, they used the intelligent deep learning algorithm (IDLA). The
implementation of IDLA for lung malignancy diagnosis and prediction involves four
stages: lesion localization, machine vision, AI-enabled bioinformatics,
and clinical CT image determination. They used a CNN with 2D convolutional layers,
including input, convolutional, rectified linear unit (ReLU), pooling, and
dense layers. Their proposed IDLA achieved an accuracy of 92.81% [8]. In 2019, I.
M. Nasser et al. developed an artificial neural network (ANN) model for detecting
the presence or absence of lung cancer in humans. The artificial neural network
(ANN) was instructed to recognize the existence of lung cancer utilizing various
input variables, including symptoms like wheezing, fatigue, chest pain, coughing,
shortness of breath, swallowing difficulty, yellow fingers, anxiety, chronic disease,
and allergy. The training, validation, and testing dataset utilized in the experiment
was called “survey lung cancer”. The outcomes demonstrate that the ANN model
attained a detection accuracy of 96.67% in identifying the presence or absence of
lung nodule malignancy [9]. In 2018, W. Rahane et al. discussed the prevalence of
lung cancer in India and the relevance of identifying it early as a means of treating
the patient. The study introduces a system for lung cancer detection that integrates
machine learning and image analysis techniques. The system can classify CT images
and blood samples to determine the presence of lung cancer. The CT images are first
categorized as normal or abnormal, and the abnormal images are segmented to isolate
the tumour area. The system then extracts features from the images and applies SVM
and image processing techniques to classify the images. The purpose of the study
is to improve the accuracy of lung cancer classification and staging [10]. In 2020, A.
Elnakib et al. presented a CADe system for the early detection of lung nodules from
LDCT images. The proposed system included contrast enhancement of raw data,
extraction of deep learning features from various networks, selection of the extracted
features using a genetic algorithm, and testing of different classifiers to identify lung
nodules. The system achieved an accuracy of 96.25%, a sensitivity of 97.5%, and a
specificity of 95% using a 19-layer Visual Geometry Group (VGG-19) architecture
and a support vector machine classifier on 320 LDCT images extracted from 50
subjects in the I-ELCAP database. The proposed system surpassed other
state-of-the-art approaches and demonstrated significant potential for early detection
of lung nodules [11]. In 2018, Suren Makaju et al. highlighted the importance of early
diagnosis and treatment of lung cancer and the challenges faced by doctors in accu-
rately interpreting CT scan images to identify cancerous cells. The research addresses
the limitations and drawbacks of several automated detection systems that involve
image processing and machine learning techniques. The authors suggest a new
model for finding malignant nodules in lung CT scan images which uses water-
shed segmentation for identification and SVM for categorization into malignant or
benign. The proposed model achieves an accuracy of 92% for detection and 86.6%
for classification, which is an improvement over the existing best model. Even so,
the proposed system cannot classify the cancer into different stages, and the authors
suggest further improvements in pre-processing and elimination of false objects to
increase accuracy. The paper concludes that future work can focus on implementing
classification into different stages and enhancing the accuracy of the proposed system
[12]. In his research, Mokhled S. Al-Tarawneh noted that image processing methods
are widely used in medical fields to enhance images and detect abnormalities in
target images, particularly in cancers like lung and breast cancer, where time is
crucial. His project seeks to enhance image quality and accuracy using minimal
pre-processing techniques such as Gaussian rules and Gabor
filters. An enhanced region of interest is discovered and employed for feature extrac-
tion after the process of segmentation. The image’s normality is then compared using
general characteristics, with pixel percentage and mask-labelling serving as the major
features for reliable image comparison [13]. In 2019, Radhika P.R. et al. conducted
a comparative study of the detection of cancerous lung nodules using machine
learning algorithms. Their paper focused on the early detection of lung cancer
through the analysis of various classification algorithms, including naïve Bayes,
support vector machine, decision tree, and logistic regression. The main aim was to evaluate
the performance of these algorithms in predicting lung cancer. In 2022 [14], a study
reviewed 65 papers focused on the prediction of different diseases using data science
algorithms, with the goal of identifying scope for future refinement of lung cancer
detection in medical technology. Each approach was studied, and its drawbacks were
brought forth. The study also examined the nature of the data used for predicting
diseases, whether benchmark datasets or manually collected. Finally, research
directions were identified to help future researchers accurately detect lung cancer
patients at an early stage without errors, based on the various methodologies
used [15].

3 Proposed Method

Here we propose a model based on a CNN. It consists of several convolutional and
max pooling layers, which are then flattened and passed through dense layers to give
the required results. The workflow is displayed in Fig. 1.

Fig. 1 Workflow model representation



Fig. 2 Lung nodule scans before and after enhancement

3.1 Dataset

We have taken the IQ-OTH/NCCD lung cancer dataset [16] from Kaggle, which has
three directories and one text file describing them. The directories are benign cases,
malignant cases, and normal cases; there are 120 files in the benign case directory,
561 in the malignant case directory, and 416 in the normal case directory. After
collecting the data, pre-processing is applied to it. The pre-processed scans are
shown in Fig. 2.
For data pre-processing, we resized the images to 256 × 256 so as to obtain
a homogeneous input size for the model. For image enhancement, we used
CLAHE [1]. CLAHE is an abbreviation for Contrast Limited Adaptive
Histogram Equalization, an image processing technique that improves an image's
contrast. The popular technique of histogram equalization redistributes pixel
intensity values to improve contrast; however, it has the disadvantage of exaggerating
the noise in the image [18, 19]. To counteract this, CLAHE applies histogram equal-
ization to small, local regions of the image instead of the entire image. This method
adapts contrast enhancement to the unique features of each region by constraining
the amplification of contrast based on the amount of data available in each region.
This is beneficial because contrast enhancement is flexible rather than uniform
throughout the entire image, preventing noise over-enhancement and preserving
overall image brightness. To increase the dataset size, we applied data augmentation.
Finally, we moved the benign and normal cases to the non-cancerous folder and the
malignant ones to the cancerous folder.
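The resizing and CLAHE steps described above can be sketched as follows. This is a minimal, NumPy-only approximation written for illustration: a nearest-neighbour resize and a tile-wise, clip-limited histogram equalization. Real CLAHE implementations (e.g. OpenCV's cv2.createCLAHE) additionally interpolate between neighbouring tile mappings; the tile count and clip limit here are illustrative assumptions, not the settings used in this work.

```python
import numpy as np

def resize_nearest(img, size=256):
    """Nearest-neighbour resize of a 2-D grayscale image to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

def equalize_tile(tile, clip_limit=40):
    """Histogram-equalize one tile, clipping bin counts to limit noise amplification."""
    hist, _ = np.histogram(tile, bins=256, range=(0, 256))
    excess = np.maximum(hist - clip_limit, 0).sum()
    hist = np.minimum(hist, clip_limit) + excess // 256  # redistribute clipped mass
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255.0
    return cdf[tile].astype(np.uint8)

def simple_clahe(img, tiles=4, clip_limit=40):
    """Apply clip-limited equalization independently to each non-overlapping tile."""
    h, w = img.shape
    th, tw = h // tiles, w // tiles
    out = np.empty_like(img)
    for i in range(tiles):
        for j in range(tiles):
            region = np.s_[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            out[region] = equalize_tile(img[region], clip_limit)
    return out
```

Clipping the histogram before building the mapping is what prevents the noise over-amplification mentioned above: no grey level can grab more than its capped share of the output range.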

3.2 Model Architecture

Our suggested model employs a number of convolutional layers, max pooling layers,
and dense layers to carry out the detection task. Each convolutional layer utilizes
kernels to retrieve features, such as edges, corners, or textures, from the input
image [17]. These filters consist of small weight matrices that slide over the input
image in a window-like manner, computing a dot product at each position; this
produces a feature map that emphasizes the presence of the particular feature in the
input image. Max pooling is a type of pooling layer used in CNNs to down-sample
the feature maps while retaining important information. It partitions the feature map
into non-overlapping regions and takes the maximum value from each region, which
reduces the spatial size of the feature map and introduces translational invariance
[18]. Max pooling is employed after a convolutional layer to reduce the number of
parameters and prevent overfitting. Dense layers, also known as fully connected
layers, connect each neuron in the current layer to each neuron in the previous layer.
They take the flattened input, multiply it by a weight matrix, and pass it through an
activation function to introduce nonlinearity. Dense layers are commonly used in
classification and regression tasks and are placed near the end of the model to
transform the output of earlier layers into a vector of predicted outputs. The number
of neurons should be adjusted to the complexity of the problem, as too many or too
few neurons can lead to overfitting or underfitting. Dense layers play a crucial role
in neural networks by allowing the model to learn and classify complex patterns in
the input data. Table 1 summarizes the suggested architecture [19], listing each layer
type, its output shape, and its parameter count. The nonlinearity has been added

Table 1 Parameters of the proposed model

Layer (type) Output shape Parameters
Sequential (32, 256, 256, 3) 0
Conv2D (32, 255, 255, 64) 832
MaxPooling2D (32, 127, 127, 64) 0
Conv2D_1 (32, 126, 126, 64) 16,448
MaxPooling2D_1 (32, 63, 63, 64) 0
Conv2D_2 (32, 62, 62, 32) 8224
MaxPooling2D_2 (32, 31, 31, 32) 0
Conv2D_3 (32, 30, 30, 16) 2064
MaxPooling2D_3 (32, 15, 15, 16) 0
Flatten (32, 3600) 0
Dropout (32, 3600) 0
Dense (32, 32) 115,232
Dense_1 (32, 3) 99
Total params: 141,898
Trainable params: 141,898
Non-trainable params: 0
using the rectified linear unit (ReLU). ReLU is an activation function that introduces
nonlinearity in neural networks: it sets any input value less than zero to zero and
keeps any positive value unchanged. ReLU has several advantages over other
activation functions, including faster convergence during training and better
performance in deep neural networks, since it avoids the vanishing gradient problem.
The three classes of lung scans (benign, malignant, and normal) were classified as
cancerous or non-cancerous using the softmax activation function. The softmax
function takes a vector of real-valued scores, such as the output of a fully connected
layer, applies the exponential function to each element to ensure non-negative
values, and then normalizes the resulting vector to sum to 1, representing a
probability distribution over classes.
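To make the layer mechanics above concrete, the following NumPy sketch implements ReLU, softmax, and 2 × 2 max pooling, and reproduces the per-layer parameter counts of Table 1. The 2 × 2 kernel size is inferred from the output shapes in the table rather than stated in the text, so treat it as an assumption.

```python
import numpy as np

def relu(x):
    """Set negative inputs to zero; keep positive values unchanged."""
    return np.maximum(x, 0.0)

def softmax(scores):
    """Exponentiate and normalize a score vector into a probability distribution."""
    z = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return z / z.sum()

def max_pool_2x2(x):
    """Non-overlapping 2 x 2 max pooling on an (H, W) feature map with even H, W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def conv2d_params(kh, kw, c_in, c_out):
    """Trainable parameters of a conv layer: weights plus one bias per filter."""
    return kh * kw * c_in * c_out + c_out

def dense_params(f_in, f_out):
    """Trainable parameters of a dense layer: weight matrix plus biases."""
    return f_in * f_out + f_out

# Per-layer counts matching Table 1 (2 x 2 kernels inferred from the output shapes)
layer_params = [
    conv2d_params(2, 2, 3, 64),      # conv2d   -> 832
    conv2d_params(2, 2, 64, 64),     # conv2d_1 -> 16,448
    conv2d_params(2, 2, 64, 32),     # conv2d_2 -> 8224
    conv2d_params(2, 2, 32, 16),     # conv2d_3 -> 2064
    dense_params(15 * 15 * 16, 32),  # dense, on the flattened 15 x 15 x 16 map -> 115,232
    dense_params(32, 3),             # dense_1, softmax over 3 classes -> 99
]
```

Working through the counts this way also makes the table auditable: each convolution contributes kernel-height × kernel-width × input-channels weights per filter plus one bias.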

4 Results and Discussion

Our model was built using Python 3, TensorFlow, and Keras with an initial dataset
size of 1097 images. The dataset was later increased to 3411 images using data
augmentation techniques. The entire dataset was then divided into three parts, in a
ratio of 0.70 for training, 0.15 for validation, and 0.15 for testing. These steps were
taken to ensure that the model was trained on enough data and tested thoroughly to
achieve the best possible results. Overall, the data augmentation techniques used to
increase the dataset size helped improve the model's accuracy, and the division of
the dataset into three parts helped to prevent overfitting and ensure that the model
was robust enough to handle new data. With these steps taken, the model is expected
to perform well on future datasets with similar characteristics. Our model achieved
accuracy comparable to other models while using far fewer parameters and far fewer
resources. The accuracy achieved was 96.26% on the training set and 97.4% on the
test set. The plots (Fig. 3a, b) illustrate the performance of the model by plotting the
training and validation accuracy as well as the training and validation loss over the
number of epochs. During training, the accuracy of the model increases while its
loss decreases. The validation accuracy and loss curves show how the model
performs on unseen validation data, which is used to guard against overfitting.
Ideally, the validation accuracy should increase and the validation loss should
decrease as the number of epochs increases. However, if the model overfits, the
validation accuracy may stop increasing, and the validation loss may start increasing.
Therefore, the plot of training and validation accuracy and loss provides insight into
the model's performance and helps to identify issues such as overfitting.
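The 0.70/0.15/0.15 partition described above can be sketched as a deterministic shuffled split; the helper name and seed are illustrative, not taken from this work's code.

```python
import random

def train_val_test_split(items, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle deterministically, then cut into train/validation/test partitions.

    The test partition receives the remainder, so the three parts always
    cover the whole dataset exactly once with no overlap.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

On a dataset of 3411 items this yields 2387 training, 511 validation, and 513 test samples; fixing the seed keeps the partition reproducible across runs.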
The accuracy can be assessed using different metrics from data science. We
considered the value 0 as cancerous and 1 as non-cancerous, on the basis of which
we obtained the confusion matrix depicting the predictions. The confusion matrix
of our model is shown in Fig. 4, and the classification report is summarized in
Table 2.

Fig. 3 a Training accuracy versus validation accuracy, b training loss versus validation loss

Fig. 4 Confusion matrix of the model

Table 2 Summary of lung cancer classification results


Class Precision (%) Recall (%) F1-score (%) Support
Cancer-0 97 95 95 278
Non-cancer-1 95 96 95 266

We have obtained a precision of 0.97 for cancerous and 0.95 for non-cancerous,
recall of 0.95 for cancerous and 0.96 for non-cancerous, and an F1-score of 0.95 for
both.
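The per-class precision, recall, and F1-score reported in Table 2 follow directly from confusion-matrix counts. The helper below shows the computation for a single class; the counts in the accompanying check are illustrative, not read from Fig. 4.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 for one class from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Guards against division by zero for degenerate classes.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so it is only high when both are high, which is why it is a useful single summary for a screening task like this one.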

5 Conclusion

The CNN model proposed here has demonstrated a high level of accuracy in
classifying cancerous and non-cancerous cases. The training set accuracy was
96.26%, and the testing set accuracy was 97.4%. The model performs comparably
to other models that require far more resources and have a higher number of
parameters. The precision, recall, and F1-score for both cancerous and
non-cancerous cases were also high. The model's effectiveness in detecting lung
cancer early can improve patients' prognosis and treatment options. Early-stage
lung cancer can be treated with surgery, radiation therapy, chemotherapy, or a
combination of these treatments. Late-stage lung cancer, on the other hand, has few
therapy options and can carry a poor prognosis.

References

1. Swain T, Mishra S (2022) Evolution of machine learning algorithms for enhancement of self-
driving vehicles security. In: 2022 international conference on advancements in smart, secure
and intelligent computing (ASSIC). IEEE, pp 1–5
2. Shimazaki A, Ueda D, Choppin A, Yamamoto A, Honjo T, Shimahara Y, Miki Y (2022) Deep
learning-based algorithm for lung cancer detection on chest radiographs using the segmentation
method. Sci Rep 12(1):727. https://doi.org/10.1038/s41598-021-04667-w
3. Mishra S, Thakkar HK, Mallick PK, Tiwari P, Alamri A (2021) A sustainable IoHT based
computationally intelligent healthcare monitoring system for lung cancer risk detection. Sustain
Cities Soc 72:103079
4. Asuntha A, Srinivasan A (2020) Deep learning for lung cancer detection and classification.
Multimedia Tools Appl 79:7731–7762
5. Bhatia S, Sinha Y, Goel L (2018) Lung cancer detection: a deep learning approach. Soft Comput
Probl Solving 699–705
6. Kalaivani N, Manimaran N, Sophia DS, Devi DD (2020) Deep learning based lung cancer
detection and classification. IOP Conf Ser: Mater Sci Eng 994(1):012026. https://doi.org/10.
1088/1757-899X/
7. Reddy N, Khanaa V (2023) Intelligent deep learning algorithm for lung cancer detection and
classification. Bull Elect Eng Inf 12(3):1747–1754. https://doi.org/10.11591/eei.v12i3.4579
8. Nasser IM, Abu-Naser SS (2019) Lung cancer detection using artificial neural network. Int J
Eng Inf Syst (IJEAIS) 3(3):17–23
9. Rahane W, Dalvi H, Magar Y, Kalane A, Jondhale S (2018) Lung cancer detection using image
processing and machine learning healthcare. In: 2018 international conference on current trends
towards converging technologies (ICCTCT). IEEE, pp 1–5
10. Elnakib A, Amer HM, Abou-Chadi FE (2020) Early lung cancer detection using deep learning
optimization
11. Makaju S, Prasad P, Alsadoon A, Singh A, Elchouemi A (2018) Lung cancer detection using
CT scan images. Procedia Comput Sci 125:107–114
12. Al-Tarawneh MS (2012) Lung cancer detection using image processing techniques. Leonardo
Electron J Pract Technol 11(21):147–158
13. Radhika PR, Nair RA, V G (2019) A comparative study of lung cancer detection using machine
learning algorithms. In: 2019 IEEE international conference on electrical, computer and
communication technologies (ICECCT), pp 1–4. https://doi.org/10.1109/ICECCT.2019.8869001

14. Pradhan K, Chawla P (2020) Medical internet of things using machine learning algorithms for
lung cancer detection. J Manage Anal 7(4):591–623. https://doi.org/10.1080/23270012.2020
15. Verma S, Mishra S (2022) An exploration analysis of social media security. In: Predictive
data security using AI: insights and issues of blockchain, IoT, and DevOps. Springer Nature
Singapore, Singapore, pp 25–44
16. Singh P, Mishra S (2022) A comprehensive study of security aspects in blockchain. In: Predic-
tive data security using AI: insights and issues of blockchain, IoT, and DevOps. Springer Nature
Singapore, Singapore, pp 1–24
17. Sahoo S, Mishra S (2022) A comparative analysis of PGGAN with other data augmentation
technique for brain tumor classification. In: 2022 international conference on advancements in
smart, secure and intelligent computing (ASSIC). IEEE, pp 1–7
18. Mohapatra SK, Mishra S, Tripathy HK (2022) Energy consumption prediction in elec-
trical appliances of commercial buildings using LSTM-GRU model. In: 2022 international
conference on advancements in smart, secure and intelligent computing (ASSIC). IEEE, pp
1–5
19. Tripathy HK, Mishra S (2022) A succinct analytical study of the usability of encryption
methods in healthcare data security. In: Next generation healthcare informatics. Springer Nature
Singapore, Singapore, pp 105–120
Concrete Crack Detection Using
Thermograms and Neural Network

Mabrouka Abuhmida, Daniel Milne, Jiping Bai, and Ian Wilson

Abstract In the field of building integrity testing, the structural integrity of concrete
structures can be adversely affected by various impact actions, such as conflict and
warfare. These actions can result in subsurface defects that compromise the safety of
the buildings, even if the impacts are indirect. However, detecting and assessing these
hidden defects typically require significant time and expert knowledge. Currently,
there is a lack of techniques that allow for rapid evaluation of usability and safety
without the need for expert intervention. This study proposes a non-contact method
for testing the integrity of structures, utilising the unique characteristics of thermog-
raphy and deep learning. By leveraging these technologies, hidden defects in concrete
structures can be detected. The deep learning model used in this study is based on the
pretrained ResNet50 model, which was fine-tuned using simulated data. It achieved
an impressive overall accuracy of 99.93% in classifying defected concrete blocks.
The training process involved two types of thermograms. The first type consisted of
simulated concrete blocks that were heated and subjected to pressure. The second
type involved real concrete blocks from the laboratory, which were subjected to
pressure using a pressure machine.

Keywords Convolution deep learning · Thermography · Concrete structures · Feature extraction

M. Abuhmida (B) · D. Milne · J. Bai · I. Wilson
University of South Wales, Cardiff, UK
e-mail: Mabrouka.abuhmida@southwales.ac.uk
D. Milne
e-mail: Daniel.milne@southwales.ac.uk
J. Bai
e-mail: Jiping.bai@southwales.ac.uk
I. Wilson
e-mail: Ian.wilson@southwales.ac.uk

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 237
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_19
238 M. Abuhmida et al.

1 Introduction

This paper aims to demonstrate the abilities of an autonomous system for detecting
subsurface-level cracks in thermograms of concrete structures. Such structures can
be a safety concern and can lie in areas overlooked by human experts. By
establishing a proof of concept, this research aims to highlight the potential effec-
tiveness of an automated approach, thereby giving even non-experts an indication
of a building's structural safety.
This paper is divided into four sections. Firstly, an introduction provides an
overview of the topic and highlights related work and literature. Secondly, the
methods are discussed, including the dataset creation and a description of the AI
systems. Thirdly, the findings are presented and analysed in detail. The last section
is the conclusion, summarising the main findings and the key takeaways, including
potential areas for further investigation.
The built environment comprises diverse structures, including commercial and
residential buildings, schools, hospitals, and civic institutions. These structures rely
on essential infrastructure such as water, sanitation, power, communications, and
transport systems, which are vital for the local population. When actions impact these
environments, there is an increased risk of damage to these structures, potentially
harming the civilian population [1].
Ensuring structural integrity is a crucial aspect of engineering, aiming to ensure
that structures and their components are suitable for their intended purposes and
can withstand normal operating conditions. They should also remain safe even if
conditions exceed the original design specifications. This involves supporting the
structure’s weight and preventing deformation, breaking, and catastrophic failure
throughout its expected lifespan. Like any built environment, concrete structures
require testing and monitoring to assess their structural integrity [1, 2].
Nondestructive testing techniques are used to conduct structural non-intrusive
testing [1]. These techniques are non-invasive technologies such as ground pene-
trating radar (GPR), thermography, microwaves, and infrared. They allow assess-
ment without compromising the integrity of the structure. On the other hand, partially
destructive testing techniques are commonly used when minor damage is permissible.
These methods include pull-out and pull-off tests, penetration resistance, and break-
off testing. However, certain destructive testing techniques have become necessary,
in which methods involve extracting samples from the structure’s material for off-site
laboratory analysis [2].
When rapid testing is required, destructive methods become impractical and
costly [3–5]. Furthermore, both partially destructive and destructive techniques
often necessitate repairs afterwards, which increases the complexity of performing
such tests. Considering these limitations, this study focuses on advancing
nondestructive contactless testing methods.
Surface-level defects observed in concrete structures may warrant structural safety
tests. However, subsurface-level defects that pose potential safety risks are not always
easily detectable, even by experts. Consequently, concrete testing may be deemed
unnecessary [6]. This issue is particularly concerning as the assessment of concrete
structures is often neglected beyond areas directly affected [7].
Nevertheless, these areas may harbour subsurface-level defects that can compro-
mise structural integrity. In such cases, experts may hesitate to perform safety tests,
given the cost involved and the perceived low probability of structural damage [8].
In addition to the practical challenges mentioned earlier, NDT techniques for
concrete typically require on-site intrusive interventions by experts to carry out
the tests and interpret the data. This study addresses these issues by proposing
thermography as a useful alternative approach. Defective regions in thermal
imaging exhibit features distinguishable from the surrounding concrete, enabling
differentiation and identification of these areas [8–10]. To this end, a deep learning
model is trained to classify defective and non-defective concrete blocks. AI has
proven effective in enhancing the identification process in numerous studies
[11–14]. Combining the AI system with thermal imaging allows for the efficient
evaluation of large sections of a structure, making this technique significantly more
time-efficient than existing methods.
Thermography is a technique that measures temperature by detecting light within
the infrared (IR) band of the electromagnetic spectrum. It uses specialised cameras
or sensors to capture the infrared radiation emitted by objects and converts it into
a visual representation of temperature. This non-contact method allows for
temperature measurement and visualisation in various applications, including
building diagnostics, industrial inspections, medical imaging, and surveillance
systems [15]. Thermograms take several formats, such as greyscale or overlay. Just
as each pixel in an RGB image represents colour intensity, each pixel in a
thermogram represents a temperature [16]. Thermography is a contactless and safe
means of collecting useful data about an object [16, 17].
Thermograms provide valuable information; the capture parameters used in this
study are summarised in Table 1.

Table 1 Thermal imaging capture parameters

Description                        Value
Full emissivity (1)                100%
Zero emissivity (0)                0%
Emissivity for concrete [18]       0.95
Distance from camera to target     1 m
Room temperature                   20 °C
1.1 Related Work

Artificial intelligence techniques have become increasingly popular in the field of
image processing and object recognition. Deep learning, a subset of artificial
intelligence, has made significant advancements in object detection and recognition.
Unlike traditional machine learning, deep learning uses neural network components
such as convolutional layers to enhance feature extraction.
Deep learning models, such as deep convolutional neural networks (DCNNs),
have complex structures and multiple hidden layers [16]. This allows them to abstract
features from data more effectively, capturing the intricacies of the data. Additionally,
the diverse convolutional layers in DCNNs enable more robust feature extraction [15].
Deep learning models operate on raw data, allowing for end-to-end processing
with minimal human intervention. This capability expands the potential for
recognition and detection in complex scenarios.
Researchers have made significant progress in pavement assessment in complex
scenarios, by integrating multi-sensors and deep learning techniques. Zhou and
Song [19] utilised deep convolutional neural networks (DCNN) in combination with
laser-scanned range images, incorporating depth mapping information to accurately
identify cracks while mitigating the impact of oil stains and shadows on pavement
analysis.
Researchers have employed transfer learning to adapt image recognition networks
in pavement crack detection. Gopalakrishnan et al. [20] used the VGG16 network, a
pretrained network trained on a large dataset of images. They fine-tuned the network
for the task of pavement distress detection and achieved high accuracy.
Guan et al. [21] used stereo vision and deep learning to implement automatic
pavement defect detection, leveraging 3D imaging to detect cracks and potholes
effectively. The depth information provided by the 3D images enabled volume
measurement of the potholes.
Cha et al. [22] modified the fast R-CNN architecture to accurately classify five
types of concrete crack. They achieved high accuracy for each type of crack, and
their method can handle multiple cracks in the same image. Zhang et al. [23]
employed transfer learning from a model such as AlexNet to classify background
regions and sealed cracks. These transfer learning-based deep learning models
explore new application scenarios while outperforming traditional image processing
methods. However, all these approaches are crack recognition methods; they do not
address complex factors like oil markings, joints, etc.
In their research, Yehia et al. [24] conducted a comparison of various nonde-
structive evaluation (NDE) methods to identify defects in concrete bridge decks. To
carry out their experiments, they utilised a laboratory-created bridge slab as a repre-
sentative sample. The authors employed infrared thermography (IRT) and ground
penetrating radar (GPR) techniques on the laboratory samples of bridge deck slabs.
Zhou and Song [19] employed DCNN with laser-scanned range images mapping
information to evaluate the effect of oil stains by classifying cracks and shadows on
pavement.

Thermal imaging has also been utilised for pavement crack detection, as the distri-
bution pattern of surface temperature directly correlates with crack profiles, serving as
an indicator of crack depth [25]. Seo et al. [26] conducted experimental studies using
infrared thermograms and confirmed the effectiveness of infrared thermal imagers
in crack detection, particularly for different widths of cracks. Thermal imagers offer
advantages such as real-time efficiency, cost-effectiveness, and direct compatibility
with deep learning networks, making them valuable tools for practical pavement
inspection.

2 Experiment Design

The data pre-processing stage involves several steps to prepare the captured videos
for analysis. Firstly, the videos are sliced into individual images, with frames
extracted at regular intervals. The interval can be adjusted to control the size of the
resulting image set.
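The slicing step can be sketched as follows. The helper name `slice_video` and the use of a NumPy array as a stand-in for decoded video frames are illustrative assumptions, not part of the authors' pipeline:

```python
import numpy as np

def slice_video(frames: np.ndarray, interval: int) -> np.ndarray:
    """Keep every `interval`-th frame from a stack of decoded video frames.

    A larger interval yields fewer images, so it controls the size of the
    resulting image set.
    """
    return frames[::interval]

# Toy "video": 120 grayscale frames of 8 x 8 pixels.
video = np.random.default_rng(0).random((120, 8, 8))
images = slice_video(video, interval=10)
print(images.shape)  # (12, 8, 8)
```

In practice the frames would come from a video decoder; the subsampling logic is the same.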
Next, the sliced images undergo data augmentation to increase their diversity and
quantity. Various transformations, such as rotation, flipping, and cropping, are applied
to the images. This augmentation process helps prevent the model from becoming
overly specialised to the training data, reducing overfitting.
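The augmentation step above can be sketched with simple array transforms; the helper name `augment` and the specific crop fraction are illustrative assumptions:

```python
import numpy as np

def augment(img: np.ndarray) -> list:
    """Produce simple augmented variants of one grayscale image:
    a 90-degree rotation, horizontal and vertical flips, and a central crop."""
    h, w = img.shape
    crop = img[h // 8 : h - h // 8, w // 8 : w - w // 8]  # trim 1/8 per side
    return [np.rot90(img), np.fliplr(img), np.flipud(img), crop]

img = np.random.default_rng(1).random((64, 64))
variants = augment(img)
print(len(variants))  # 4 extra samples derived from a single frame
```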
The dataset is divided into three sets following data augmentation: training, vali-
dation, and testing. The training set allows the model to learn the classes. The valida-
tion set evaluates the model’s performance and fine-tunes its parameters. Lastly, the
testing set assesses the model’s performance on unseen data, providing an unbiased
measure of its effectiveness.
The model’s performance is then assessed using the validation set, allowing for
further optimisation if necessary. Finally, the model is tested to estimate its ability
to generalise and perform well on new, unseen data (Fig. 1).

Fig. 1 Experiment phases
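The three-way split described above can be sketched with a hypothetical helper; the fractions below are illustrative (the paper only states that 25% of the data is reserved for testing in the training configuration):

```python
import numpy as np

def split_dataset(x, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices and split arrays into train/validation/test subsets."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_test = int(len(x) * test_frac)
    n_val = int(len(x) * val_frac)
    test, val, train = (idx[:n_test], idx[n_test:n_test + n_val],
                        idx[n_test + n_val:])
    return (x[train], y[train]), (x[val], y[val]), (x[test], y[test])

x = np.arange(1000).reshape(-1, 1)   # stand-in for image data
y = np.arange(1000) % 2              # stand-in for class labels
(train_x, train_y), (val_x, val_y), (test_x, test_y) = split_dataset(x, y)
print(len(train_x), len(val_x), len(test_x))  # 700 150 150
```

Shuffling before splitting avoids temporally ordered frames from one recording all landing in the same subset.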



2.1 Simulation Dataset Creation

A simulation is first generated using ABAQUS. This simulation aims to produce
data that can be utilised to assess the ability of a deep learning model to predict the
correct class of concrete defects. The main idea is to classify concrete structures as
safe or unsafe. Previous research such as [24, 27, 28] has successfully demonstrated
the use of thermography for detecting subsurface defects; however, there was no
accessible dataset to replicate the studies. Consequently, our simulation generated
a dataset of 12,020 RGB and corresponding thermograph samples with two main
classes representing defective and non-defective concrete. Figure 2 presents an
example of simulated concrete block images and thermographs.
To initiate the thermal analysis, every specimen is initially set at 0 °C. Subse-
quently, to simulate the effect of sunrise, a heat flux at 60 °C is applied at the rear
of the panel. The simulation generates a balanced dataset with 501 frames for each
class, which are exported in video format from ABAQUS. Each step in the
simulation represents one second, ensuring adequate iterations for the specimen’s
thermal properties to undergo changes and eventually stabilise. The exported frames
present the grayscale representation of the front side of the ABAQUS model, utilising
a fixed temperature scale. Within the model, nodes located in various regions possess
distinct temperature values. Figure 3 displays an example of the images utilised for
training the model, depicting the aforementioned characteristics.

Fig. 2 Visualisation of
simulated RGB and
thermography of concrete
blocks

Fig. 3 Examples of the simulation dataset

In the case of void-free specimens, the parameter variation is comparatively smaller.
Therefore, the simulation is designed to encompass 1000 steps, resulting in the
export of 1001 frames exhibiting the front of the specimen in video format from
ABAQUS. This approach ensures a balanced dataset. The video is cropped to include
the specimen for each simulation, and every frame is exported accordingly.

2.2 Camera and Concrete Blocks Specifications

The FLIR E8 thermal camera was used in this study. It can capture images in
either grayscale or colour. The concrete blocks used in the experiment are standard 100 ×
100 × 100 mm blocks that have been water cured for 28 days. The concrete blocks
were made using pulverised fuel ash (PFA) at varying levels of 10, 20, and 30%.
The FLIR E8 has a resolution of 640 × 480 pixels and a thermal sensitivity of < 0.05 °C. The
camera is also equipped with various features, such as a laser pointer and a built-in
Wi-Fi module.
Concrete blocks. The concrete blocks used in the experiment are standard 100
× 100 × 100 mm blocks. These blocks are commonly used for strength testing
purposes. The blocks were water cured for 28 days to ensure they were fully cured
before the experiment. PFA is a by-product of the coal-fired power industry. It is a
pozzolanic material which can form cementitious compounds. PFA is often used as a
substitute for cement in concrete mixtures.
The thermal imaging camera used in this study has a field of view of 25° × 19°,
allowing it to capture a wide scene area. It has a temperature measurement range of
−20 °C to 650 °C, enabling the detection of both low- and high-temperature variations.
The camera operates at a frequency of 9 Hz, providing enough images per second
for accurate temperature analysis. Temperature measurements are accurate to within
± 2 °C or ± 2% of the measured value.
The captured thermal images are saved in BMP file format, while the recorded videos
are stored in MP4 format. The resolution of the images is 640 × 512 pixels, ensuring
clear and detailed thermal visualisations.

2.3 Compression-Exposed Concrete Data Collection

To collect data, the concrete specimen was first loaded into the compression machine.
The thermal imaging camera was then positioned on a tripod at a distance of 1 m from
the specimen. The camera’s emissivity was set to 0.95 to ensure accurate temperature
measurements. The data was recorded as pressure was applied to the specimen using
the compression machine. The recording continued until the specimen fractured. The
thermal imaging camera was able to capture the dynamic thermal changes occurring
throughout the experiment, which allowed for the identification of defects in the
specimen, such as cracks and voids.
Gradual pressure was then applied to the specimen using the compression
machine, allowing for controlled stress on the material. The experiment proceeded
until visible cracks became evident on the specimen’s surface, at which point the
recording was promptly stopped. This entire process was repeated a total of eight
times, employing different concrete specimens for each repetition. As a result, a
substantial dataset was generated, consisting of thermal recordings depicting the
evolution of visible cracks in the concrete specimens.
The average video duration was approximately two minutes. Irrelevant data was
discarded, and the recordings were visually examined to separate them into two cate-
gories: non-defected and defected. This has allowed for the creation of two distinct
classes within the dataset. The frames from the videos were then extracted to form an
image dataset. The images were converted to grayscale to facilitate training, enhance
generalisation, and avoid local optima. Additionally, each image was normalised
within the range of zero to one. Blurred and unclean data were removed from the
dataset to reduce inconsistencies.
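The grayscale conversion and normalisation described above can be sketched as follows. The paper does not state which conversion weights were used, so the ITU-R BT.601 luminance weights below are an assumption:

```python
import numpy as np

def preprocess(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert an RGB frame to grayscale and normalise it to [0, 1]."""
    # ITU-R BT.601 luminance weights (assumed) for the grayscale conversion.
    gray = frame_rgb @ np.array([0.299, 0.587, 0.114])
    lo, hi = gray.min(), gray.max()
    # Min-max normalisation into the zero-to-one range used by the authors.
    return (gray - lo) / (hi - lo) if hi > lo else np.zeros_like(gray)

frame = np.random.default_rng(2).random((64, 64, 3))
img = preprocess(frame)
print(img.min(), img.max())  # 0.0 1.0
```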
Figure 4 illustrates an example of the images obtained. It includes an image of
the specimen without defects before applying pressure (Fig. 4a) and images showing
cracks forming during the application of pressure (Fig. 4b–d). The minimum temper-
ature recorded in the image is around 19 °C, while the maximum temperature reaches
approximately 63.5 °C. A total of 255 distinct grey thresholds were utilised in the
processed image, each corresponding to a specific temperature value.

(a) No Defect (b) Defected (c) Visible Defect (d) Clear Defect

Fig. 4 Blocks thermal defects
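Since the processed image uses a fixed set of grey thresholds over a known temperature range, each grey level can be mapped back to a temperature. The linear mapping below is an assumption consistent with the fixed temperature scale described, using the range recorded in the image above:

```python
import numpy as np

T_MIN, T_MAX = 19.0, 63.5   # °C, range recorded in the processed image
LEVELS = 255                # distinct grey thresholds in the processed image

def grey_to_temperature(grey: np.ndarray) -> np.ndarray:
    """Linearly map 8-bit grey levels (0..255) onto the temperature scale."""
    return T_MIN + (grey.astype(float) / LEVELS) * (T_MAX - T_MIN)

pixels = np.array([0, 255], dtype=np.uint8)
print(grey_to_temperature(pixels))  # endpoints map back to 19.0 and 63.5 °C
```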



Fig. 5 Data augmentation

The acquired videos were influenced by various factors related to the image acqui-
sition system, including non-uniform illumination and noise. Among these factors,
one of the most significant issues was the range of the colour bar used for thermal
imaging. To address this, a set of image augmentations was performed on the sliced
images; Fig. 5 highlights their impact on the frames.

2.4 Simulation Dataset Model

The residual network (ResNet50), a pretrained deep model, consists of 50 layers
of a specific type of convolutional neural network (CNN). A ResNet50-based model
was developed and trained on the simulation dataset to learn how to classify
specimens with and without voids. ResNet50 is recognised for its deep architecture
[27], which benefits from residual connections to retain knowledge during
training and improve network capacity, resulting in faster training [29]. Compared
to other CNNs, ResNet50 has consistently demonstrated superior performance [28].
The model consists of 48 CNN layers, one max-pool layer, and one average-pool
layer [24, 30]. ResNet50 is widely used for image classification tasks and performs
better than other pretrained models like VGG16 [31, 32].
The input size for the model is (256 × 256 × 1), indicating grayscale input images.
The model is trained for twenty epochs, with a training time of 37.35 minutes. The
data is split into a train-test split, with 25% reserved for testing. A batch size of
eight and a learning rate of 0.001 are used. The model is compiled with the Adam
optimiser, sparse categorical cross-entropy loss, and the ‘accuracy’ metric, which
evaluates prediction correctness compared to the labels.
The model setup hyperparameters are summarised in Table 2:

Table 2 Model hyperparameter summary

Layer (type)              Output shape                   Param #
InputLayer                [(Height, width, 3)]           0
Conv2D                    [(Height/2, width/2, 64)]      9472
BatchNormalization        [(Height/2, width/2, 64)]      256
block1_Conv2D             [(Height/4, width/4, 64)]      4160
block1_conv2_BatchNorm    [(Height/4, width/4, 64)]      256
block1_2_Conv2D           [(Height/4, width/4, 64)]      36,928
block1_2_BatchNorm        [(Height/4, width/4, 64)]      256
block1_0_Conv2D           [(Height/4, width/4, 256)]     16,640
block1_3_Conv2D           [(Height/4, width/4, 256)]     16,640
block1_0_BatchNorm        [(Height/4, width/4, 256)]     1024
block1_3_BatchNorm        [(Height/4, width/4, 256)]     1024
block2_1_Conv2D           [(Height/4, width/4, 64)]      16,448
block2_1_BatchNorm        [(Height/4, width/4, 64)]      256
block2_2_Conv2D           [(Height/4, width/4, 64)]      36,928
Optimizer                 optimizers.Adam(0.001)         0.001
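A minimal sketch of this configuration in Keras, assuming the TensorFlow implementation. The paper does not say how the single grayscale channel was fed to ResNet50, which expects three channels, so the channel repetition below is an assumption; the backbone weights are left random here to keep the sketch self-contained:

```python
import tensorflow as tf

# Grayscale 256 x 256 inputs, as described in the text.
inputs = tf.keras.Input(shape=(256, 256, 1))
# Assumption: repeat the grayscale plane to satisfy ResNet50's 3-channel input.
x = tf.keras.layers.Concatenate()([inputs, inputs, inputs])
x = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                   pooling="avg")(x)
# Two output classes: defective and non-defective.
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
# Compile settings from the text: Adam(0.001), sparse categorical
# cross-entropy, and the accuracy metric.
model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training as described: 25% held out for testing, batch size 8, 20 epochs.
# model.fit(x_train, y_train, validation_split=0.25, batch_size=8, epochs=20)
```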

2.5 Laboratory Dataset Model

The deep learning model initially underwent training using a dataset consisting of
simulated thermal images. This training approach involved generating computer-
simulated images, enabling the model to grasp the fundamental characteristics of
thermal images depicting concrete specimens.
The laboratory dataset, a collection of 11,140 thermographs acquired during the
experiment, was then used to adapt the model to the laboratory setting.
These images were captured under real-world conditions, allowing the model to learn
the specific features of thermal images of concrete specimens.
The dataset is balanced, comprising 5500 thermographs for each of the defective
and non-defective classes. This balance ensured an equal representation of both
specimen types, facilitating the model’s ability to distinguish
between them. The ResNet50 model and corresponding hyperparameters were
employed for the training process. This entailed utilising the ResNet50 deep learning
architecture and a set of optimised hyperparameters tailored to this task. The entire
retraining procedure was completed in approximately 37.75 min, indicating that it
took roughly 38 min to train the model anew using the updated dataset.

Fig. 6 Training and validation performance—simulation

3 Results and Analysis

3.1 Simulation Dataset Model Results

The model trained on the simulated data demonstrated successful classification
between simulated specimens with and without voids, even when the data included
unseen parameters. It achieved high accuracy rates on the unseen dataset, with 0.9992
accuracy for void-free simulations and 0.996 accuracy for simulations with voids.
Although the model did not attain 100% accuracy on the unseen dataset with voids,
this can be attributed to the time required for heat to propagate through the spec-
imen, making subsurface cracks visible. Overall, the model performed well, correctly
classifying most images, which is a positive outcome as shown in Fig. 6.
It is recommended to incorporate additional data from ABAQUS simulations
to enhance the model’s performance on the simulated dataset. This additional data
should encompass a broader range of parameters to enhance the model’s ability to
generalise predictions. For example, the concrete material can be varied to include
different concrete mixes.
Given the model’s already high accuracy, hyperparameter optimisation was not
conducted. The potential gains from further fine-tuning were deemed insignificant,
especially considering that this project serves as a proof-of-concept.

3.2 Laboratory Dataset Model Results

Achieving a training accuracy of 100%, a validation accuracy of 0.99, a training
loss of 7.0 × 10−6, a validation loss of 9.4 × 10−7, and an F1-score of 1.0, this
model demonstrates high confidence on the test data, even without simulating the
effect of the sun as in [33]. Figure 7 shows the results of the laboratory dataset
model.

Fig. 7 Training and validation performance—laboratory

Fig. 8 Model image visualisation during training

Recording an RGB video alongside the thermal images is recommended to
enhance visibility and facilitate visual analysis. This would enable a direct
side-by-side comparison between the two, highlighting the impact and confirming
the cracks detected in the thermographs.
Figure 8 shows a visualisation of a random image on the test dataset during the
testing phase of the model. The figure displays both the predicted and actual labels
for the image. In this instance, the predicted label is 1, indicating that the image is
classified as ‘unsafe’. This label signifies the presence of a crack that is not visible
to the naked eye but detectable based on its thermal properties alone.
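The label mapping described above can be sketched as a small post-processing step; the 0 = safe / 1 = unsafe encoding mirrors the labels described in the text, while the probability values below are purely illustrative:

```python
import numpy as np

LABELS = {0: "safe", 1: "unsafe"}

def classify(probabilities: np.ndarray) -> str:
    """Map the softmax output for a single image to its class name."""
    return LABELS[int(np.argmax(probabilities))]

# Illustrative softmax output for one test image.
print(classify(np.array([0.03, 0.97])))  # unsafe
```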

3.3 The Challenges of Using Thermal Images

Using thermal images for crack detection in concrete structures offers advantages
but also presents challenges and limitations. Thermal imaging is sensitive to environ-
mental factors, such as temperature and airflow, which can introduce noise and affect
accuracy. Also, thermal cameras may have lower resolution and struggle to detect
hidden or subsurface cracks lacking significant thermal variations. Variability in crack
patterns and the need for validation against ground truth data pose challenges. False
positives and false negatives need to be carefully considered. Cost and equipment
requirements, including calibration and training, can be limiting factors. Despite
these limitations, thermal imaging can provide valuable insights for crack detection
and contribute to maintenance efforts when combined with other techniques.

4 Conclusion

This study comprised two separate experiments. The first experiment
focuses on simulating the surface temperature of a concrete structure, examining the
thermal changes and variations when a hidden crack is present or absent. The second
experiment involves a concrete specimen in a laboratory setting, where the thermal
camera captures the thermal changes as pressure is applied until cracks become
visible. The data collected from both experiments is utilised to train two independent
deep learning models, enabling them to autonomously detect hidden defects. The
technique employed in these experiments proves to be highly effective for detecting
minor subsurface cracks by analysing thermograms of concrete block surfaces.
It is recommended to collect additional data from larger concrete blocks to enhance
the investigation further and improve the model’s predictions. This would provide
a more realistic representation of structural walls and enhance the model’s ability
to make predictions in diverse scenarios. Additionally, other specifications can be
considered to enhance the laboratory experiments including the distance between the
concrete surface and the camera. This can contribute to a better understanding of the
system’s capabilities and performances.

References

1. Onyeka FC (2020) A comparative analysis of the rebound hammer and pullout as nondestructive
method in testing concrete. Eur J Eng Technol Res 5(5):554–558
2. Khan AA (2002) Guidebook on nondestructive testing of concrete structures. Int Atomic Energy
Agency
3. Wankhade RL, Landage AB (2013) Nondestructive testing of concrete structures in Karad
region. Procedia Eng 51:8–18
4. Rende NS (2014) Nondestructive evaluation and assessment of concrete barriers for defects
and corrosion, pp 290–297
5. Thiagarajan G, Kadambi AV, Robert S, Johnson CF (2015) Experimental and finite element
analysis of doubly reinforced concrete slabs subjected to blast loads. Int J Impact Eng 75:162–
173
6. Scuro C, Lamonaca F, Porzio S, Milani G, Olivito RS (2021) Internet of Things (IoT) for
masonry structural health monitoring (SHM): overview and examples of innovative systems.
Constr Build Mater 290:123092

7. Jain A, Kathuria A, Kumar A, Verma Y, Murari K (2013) Combined use of nondestructive tests
for assessment of strength of concrete in structure. Procedia Engineering 54:241–251
8. Wang Z (2022) Integral fire protection analysis of complex spatial steel structure based on
optimised Gaussian transformation model. Comput Intell Neurosci
9. Cheng C, Shen Z (2018) Time-series based thermography on concrete block void detection.
In: Construction research congress 2018, pp 732–742
10. Farrag S, Yehia S, Qaddoumi N (2016) Investigation of mix-variation effect on defect-detection
ability using infrared thermography as a nondestructive evaluation technique. J Bridg Eng
21(3):04015055
11. Liu JC, Zhang Z (2020) A machine learning approach to predict explosive spalling of heated
concrete. Arch Civ Mech Eng 20:1–25
12. Gupta S (2013) Using artificial neural network to predict the compressive strength of concrete
containing nano-silica. Civ Eng Archit 1(3):96–102. https://doi.org/10.13189/cea.2013.010306
13. Gupta S (2015) Use of triangular membership function for prediction of compressive strength
of concrete containing nanosilica. Cogent Eng 2(1):1025578
14. Hosseinzadeh M, Dehestani M, Hosseinzadeh A (2023) Prediction of mechanical properties
of recycled aggregate fly ash concrete employing machine learning algorithms. J Build Eng
107006
15. Ignatov I, Mosin O, Stoyanov C (2014) Fields in electromagnetic spectrum emitted from human
body. Applications in medicine. J Health, Med Nurs 7(1–22)
16. Miao P, Srimahachota T (2021) Cost-effective system for detection and quantification of
concrete surface cracks by combination of convolutional neural network and image processing
techniques. Constr Build Mater 293:123549
17. Choi H, Soeriawidjaja BF, Lee SH, Kwak M (2022) A convenient platform for real-time
non-contact thermal measurement and processing. Bull Korean Chem Soc 43(6):854–858
18. Park BK, Yi N, Park J, Kim D (2012) Note: development of a microfabricated sensor to measure
thermal conductivity of picoliter scale liquid samples. Rev Sci Instrum 83(10)
19. Zhou S, Song W (2020) Deep learning-based roadway crack classification using laser-scanned
range images: a comparative study on hyperparameter selection. Autom Constr 114:103171
20. Gopalakrishnan K, Khaitan SK, Choudhary A, Agrawal A (2017) Deep convolutional neural
networks with transfer learning for computer vision-based data-driven pavement distress
detection. Constr Build Mater 157:322–330
21. Ramzan B, Malik MS, Martarelli M, Ali HT, Yusuf M, Ahmad SM (2021) Pixel frequency
based railroad surface flaw detection using active infrared thermography for structural health
monitoring. Case Stud Therm Eng 27:101234
22. Cha KH, Sahiner B, Pezeshk A, Hadjiiski LM, Wang X, Drukker K, Summers RM, Giger ML
(2019) Deep learning in medical imaging and radiation therapy. Med Phys 46(1):e1–e36
23. Kaige Z, Cheng HD, Zhang B (2018) Unified approach to pavement crack and sealed crack
detection using preclassification based on transfer learning. J Comput Civ Eng 32:04018001
24. Jang K, Kim N, An YK (2019) Deep learning–based autonomous concrete crack evaluation
through hybrid image scanning. Struct Health Monit 18(5–6):1722–1737
25. Li Z, Yoon J, Zhang R, Rajabipour F, Srubar III WV, Dabo I, Radlińska A (2022) Machine
learning in concrete science: applications, challenges, and best practices. NPJ Comput Mater
8(1):127
26. Seo H (2021) Infrared thermography for detecting cracks in pillar models with different
reinforcing systems. Tunn Undergr Space Technol 116:104118
27. Qin Z, Zhang Z, Li Q, Qi X, Wang Q, Wang S (2018) Deepcrack: learning hierarchical
convolutional features for crack detection. IEEE Trans Image Process 28:1498–1512
28. Rajadurai RS, Kang ST (2021) Automated vision-based crack detection on concrete surfaces
using deep learning. Appl Sci 11(11):5229
29. Mascarenhas S, Agarwal M (2021) A comparison between VGG16, VGG19 and ResNet50
architecture frameworks for Image Classification. In: 2021 international conference on disrup-
tive technologies for multi-disciplinary research and applications (CENTCON), 2021, vol 1,
pp 96–99

30. Fan Z, Li C, Chen Y, Wei J, Loprencipe G, Chen X, Di Mascio P (2020) Automatic crack
detection on road pavements using encoder-decoder architecture. Materials 13:2960
31. Islam MM, Hossain MB, Akhtar MN, Moni MA, Hasan KF (2022) CNN based on transfer
learning models using data augmentation and transformation for detection of concrete crack.
Algorithms 15(8):287
32. Guo M-H et al (2022) Attention mechanisms in computer vision: a survey. Comput Vis Media
8(3):331–368
33. Aggelis DG, Kordatos EZ, Strantza M, Soulioti DV, Matikas TE (2011) NDT approach for
characterisation of subsurface cracks in concrete. Constr Build Mater 25(7):3089–3097. https://
doi.org/10.1016/j.conbuildmat.2010.12.045
34. Wiggenhauser H (2002) Active IR-applications in civil engineering. Infrared Phys Technol
43(3–5):233–238
35. Abuhmida M, Milne D, Bai J, Sahal M (2022) ABAQUS-concrete hidden defects thermal
simulation. Mendeley Data. https://doi.org/10.17632/65nbxg9pr3.1
36. Hu D, Chen J, Li S (2022) Reconstructing unseen spaces in collapsed structures for search and
rescue via deep learning based radargram inversion. Autom Constr 140:104380
Wind Power Prediction
in Mediterranean Coastal Cities Using
Multi-layer Perceptron Neural Network

Youssef Kassem, Hüseyin Çamur, and Abdalla Hamada Abdelnaby Abdelnaby

Abstract Wind energy refers to a form of energy conversion where wind turbines
convert the kinetic energy of the wind into electrical energy that can be used as a
source of clean energy. Thus, estimating wind power is important for wind farm
planning and design. This study aims to predict the wind power density (WPD) in
Mediterranean coastal cities using multi-layer perceptron neural network (MLPNN)
model. For this aim, two scenarios were proposed. In scenario 1, the developed
model utilized global meteorological data (GMD) as input variables, including
precipitation (PP), maximum temperature (Tmax), minimum temperature (Tmin),
actual evapotranspiration (AE), wind speed at 10 m height (WS), and solar radiation
(SR). However, the input variables in scenario 2 were geographical coordinates (GC)
and GMD, which aims to estimate the influence of GC on the accuracy of the WPD
prediction. The results indicated that scenario 2 has decreased the RMSE and MAE
by 46%.

Keywords MLPNN · Mediterranean coastal cities · WPD · Meteorological


parameters · Geographical coordinates

Y. Kassem (B) · H. Çamur · A. H. A. Abdelnaby


Faculty of Engineering, Mechanical Engineering Department, Near East University, 99138
Nicosia, North Cyprus, Cyprus
e-mail: yousseuf.kassem@neu.edu.tr
H. Çamur
e-mail: huseyin.camur@neu.edu.tr
A. H. A. Abdelnaby
e-mail: 20213582@std.neu.edu.tr
Y. Kassem
Faculty of Civil and Environmental Engineering, Near East University, 99138 Nicosia, North
Cyprus, Cyprus
Near East University, Energy, Environment, and Water Research Center, 99138 Nicosia, North
Cyprus, Cyprus

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 253
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_20
254 Y. Kassem et al.

1 Introduction

Due to population growth, urbanization, and economic development, energy demand,
particularly for electricity, has been increasing rapidly in developing countries.
However, many developing countries face significant challenges in meeting this demand. Also,
the International Energy Agency (IEA) reports that more than 1.2 billion individuals
across the globe do not have electricity access [1]. Moreover, many developing coun-
tries rely heavily on imported fossil fuels for their energy needs, which can make
their energy systems vulnerable to price fluctuations and supply disruptions. There-
fore, renewable energy (RE) sources have significant potential to provide clean and
affordable energy to developing countries. Investing in RE can help these countries
reduce their dependence on imported fuels, and increase their energy security [2–5].
Accordingly, RE has received growing attention as a means to reduce emissions and
decrease reliance on fossil fuels for energy production. Recently, the use of RE has
been increasing rapidly in many countries around the world, such as Germany, China,
the USA, Denmark, Brazil, and Costa Rica [6–8].
Wind energy has several advantages over traditional energy sources. As it is
powered by natural wind, wind energy is a clean and environmentally friendly source
of energy [9]. It is also virtually unlimited and abundantly available worldwide,
making it a promising domestic energy source in many countries. Furthermore, with
the ongoing advancements in wind energy technology, it has become one of the
most cost-effective renewable energy sources available [10]. Generally, wind speed
is a critical parameter for evaluating wind potential at a specific location because it
directly influences the amount of energy that can be harnessed from the wind [9, 10].
Furthermore, the amount of wind power generated is proportional to the cube of the
wind speed [11]. Furthermore, wind speed is affected by several factors, including
topography, surface roughness, and local weather patterns [11].
Accordingly, wind power estimation (WPE) plays a crucial role in the context
of renewable energy, particularly in the planning and design of wind farms [12].
Wind power, which is harnessed from the kinetic energy of wind, is one of the most
abundant and widely available sources of renewable energy [13]. Accurate WPE
potential is essential for maximizing the efficiency and economic viability of wind
energy projects. Besides, wind power estimation assists in selecting the appropriate
wind turbine models and optimizing their placement within a wind farm [14]. There-
fore, by considering the estimated wind power, developers can determine the most
suitable turbine size and layout configuration to maximize energy capture and overall
project performance. Consequently, accurate wind power estimation enables devel-
opers to estimate the potential energy production of a wind farm. Additionally, by
estimating the potential energy generation, developers have the opportunity to assess
the environmental advantages in terms of greenhouse gas emissions reduction, air
pollution mitigation, and conservation of natural resources.
Wind Power Prediction in Mediterranean Coastal Cities Using … 255

Recently, it has been demonstrated that the artificial neural network (ANN) is a
powerful tool for predicting wind power. Several studies have utilized ANN to predict
wind speed and wind power using different meteorological/weather data [15–18]. For
instance, Noorollahi et al. [15] predicted the wind speed in Iran using three ANN
models. The results showed that the adaptive neuro-fuzzy inference system model
stands out as a superior tool for accurately predicting wind speeds. Ghorbani et al.
[16] presented a case study that focuses on modeling monthly wind speed values
using meteorological data in Iran. Ghanbarzadeh et al. [17] utilized air temperature,
relative humidity, and vapor pressure data as input variables for the ANN model to
estimate the future wind. Kassem et al. [18] evaluated the effectiveness of various
models in predicting wind power density (WPD) in Ercan, Northern Cyprus.
Consequently, this study aims to predict monthly wind power density in Mediter-
ranean coastal cities (MCCs) using a multi-layer perceptron feedforward neural
network model (MLPNN). To this aim, the meteorological input variables used in
this study were collected from TerraClimate from 2010 to 2021.

2 Material and Method

2.1 Study Area and Dataset

The Eastern Mediterranean region is known for its rich wind energy potential, partic-
ularly in the coastal cities located along the Mediterranean Sea. These cities have the
advantage of being situated in a region that experiences high wind speeds due to the
unique climatic conditions of the area. Additionally, the topography of the region,
including the surrounding mountains and valleys, can further enhance the wind flow
and increase the wind energy potential. Several studies have highlighted the potential
of wind energy in cities such as Alexandria, Beirut, Haifa, and Izmir, among others.
These cities have shown promising wind speed patterns, which make them suitable for
the installation of wind turbines and the generation of wind energy. The utilization
of wind energy in these cities can provide numerous benefits, including reducing
dependence on fossil fuels, mitigating greenhouse gas emissions, and promoting
sustainable development. Figure 1 shows the details regarding the selected MCCs.
In general, GMD has been employed to understand the impact of weather parame-
ters on wind power density (WPD) prediction due to the limited availability of actual
weather parameter data. It should be noted that WPD is estimated using Eq. (1) [16].

P/A = (1/2) ρ v³ (1)
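As a quick numerical check, Eq. (1) can be applied directly to a monthly mean wind speed (a minimal sketch; the air density of 1.225 kg/m³ is the standard sea-level value, assumed here since the chapter does not state the value used):

```python
def wind_power_density(wind_speed_ms, air_density=1.225):
    """Wind power density P/A = 0.5 * rho * v^3 (Eq. 1), in W/m^2."""
    return 0.5 * air_density * wind_speed_ms ** 3

# A monthly mean wind speed of 3.0 m/s, close to the dataset mean reported in Table 1
print(round(wind_power_density(3.0), 2))  # 16.54 (W/m^2)
```

The cubic dependence on wind speed is why even modest errors in predicted wind speed translate into large errors in predicted power density.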
In this study, GMD is obtained from the TerraClimate dataset. TerraClimate,
developed by Abatzoglou and colleagues [19], is a comprehensive and widely used
global gridded climate dataset that

Fig. 1 Latitude, longitude, and elevation for all selected locations

provides monthly estimates of various climate variables. It offers valuable insights
into the historical climate conditions across the globe and is particularly useful for
climate research, impact assessment, and modeling studies. The TerraClimate dataset
incorporates a wide range of data sources, including ground-based meteorological
station observations, satellite measurements, and reanalysis products. These sources
are integrated using advanced statistical techniques to create a consistent and high-
quality dataset. The dataset covers the entire globe with a spatial resolution of 2.5 arc-
minutes, which translates to approximately 0.04° × 0.04°. One of the key advantages
of the TerraClimate dataset is its extensive temporal coverage. It spans from 1958
to the present, providing over six decades of climate data. This long-term coverage
enables the study of climate variability, trends, and changes over time, aiding in the
understanding of climate dynamics and informing future projections. The TerraCli-
mate dataset includes several essential climate variables. These variables encompass
temperature, precipitation, vapor pressure, solar radiation, and wind speed. Each
variable is provided at a monthly resolution, allowing for a detailed examination
of seasonal and interannual variations [19–22]. Thus, the maximum temperature
(Tmax), minimum temperature (Tmin), downward radiation (DR), wind speed
(WS), actual evapotranspiration (AE), and precipitation (PP) data were extracted
for the period 2010–2021.

2.2 MLPNN Model

MLPNN is a specific type of artificial neural network (ANN) comprising numerous
layers of interconnected nodes [23, 24]. This type of neural network follows a feedforward architecture, enabling the flow of information in a unidirectional manner [23,
24]. The key feature of the MLPNN model is its ability to learn complex nonlinear
relationships between input and output data. During the training phase, the model

adjusts the weights and biases associated with each neuron to minimize the difference
between the predicted output and the desired output. This process, known as
backpropagation, utilizes an optimization algorithm to iteratively update the model
parameters and improve its performance; the process is illustrated in Fig. 2.
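The training loop described above can be sketched as a toy NumPy implementation of a single-hidden-layer network with a tanh hidden layer and a linear output; the data, network size, and learning rate below are arbitrary illustrative choices, not the study's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: one input feature, smooth nonlinear target
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(2.0 * X)

# One hidden layer (tanh) and a linear output unit
W1 = rng.normal(0.0, 0.5, size=(1, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.2

for _ in range(8000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    # Backpropagation: push the error gradient back through each layer
    dW2 = h.T @ err / len(X)
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)   # derivative of tanh
    dW1 = X.T @ dh / len(X)
    db1 = dh.mean(axis=0)
    # Iterative weight/bias update (gradient descent)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print(f"training MSE after backpropagation: {mse:.4f}")
```

The loop makes the mechanism concrete: each iteration computes predictions, measures the error, and propagates its gradient backwards to adjust every weight and bias.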
In general, the activation functions such as linear, hyperbolic tangent, and logistic
functions are commonly employed.

Linear = x (2)

Fig. 2 Flowchart of MLPNN model



Hyperbolic tangent = (e^x − e^(−x)) / (e^x + e^(−x)) (3)

Logistic = 1 / (1 + e^(−x)), (4)

where x is the input of the activation functions.
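For reference, these functions can be implemented directly; note that in their standard forms the logistic function is 1/(1 + e^(−x)) and the hyperbolic tangent is (e^x − e^(−x))/(e^x + e^(−x)) (a minimal NumPy sketch):

```python
import numpy as np

def linear(x):
    """Identity activation, typically used at the output layer."""
    return x

def hyperbolic_tangent(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), output in (-1, 1)."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def logistic(x):
    """Sigmoid 1 / (1 + e^-x), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(linear(x), hyperbolic_tangent(x), logistic(x))
```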

2.3 Statistical Indices (SI)

The performance evaluation of the developed models involves the utilization of
several statistical metrics. In the current study, four statistical metrics were employed
to estimate the performance of the model.
(a) Coefficient of Determination (R²)
R-squared assesses regression model fit by measuring the proportion of variance
explained by independent variables. Values range from 0 to 1, with higher values
indicating better fit.
(b) Root Mean Squared Error (RMSE)
RMSE directly quantifies the deviations or errors between the predicted values and
the corresponding observed values. It measures the average magnitude of these devi-
ations, providing a straightforward indication of how closely the model’s predictions
align with the actual data. A smaller RMSE signifies a better fit and indicates that
the model’s predictions are closer to the observed values.
(c) Mean Absolute Error (MAE)
MAE is a statistical index that calculates the average absolute difference between
the predicted and observed values in a regression model. It provides a measure of the
model’s accuracy without considering the direction of the errors. Like RMSE, lower
MAE values indicate better performance.
(d) Nash–Sutcliffe efficiency (NSE)
NSE is a statistical index commonly used in hydrological and environmental
modeling. It quantifies the proportionate difference between the residual variance
and the observed variance, providing a measure of the relative magnitude between
the two. NSE ranges from negative infinity to 1, with 1 indicating a perfect fit and
values below zero suggesting poor performance. NSE assesses the model’s ability to
reproduce the mean and variability of the observed data.
The mathematical expressions for these metrics, as used in this study, are presented
in Eqs. (5–8).
R² = 1 − [Σ_{i=1}^{n} (a_a,i − a_p,i)²] / [Σ_{i=1}^{n} (a_p,i − a_a,ave)²] (5)

RMSE = √[(1/n) Σ_{i=1}^{n} (a_a,i − a_p,i)²] (6)

MAE = (1/n) Σ_{i=1}^{n} |a_a,i − a_p,i| (7)

NSE = 1 − [Σ_{i=1}^{n} (a_a,i − a_p,i)²] / [Σ_{i=1}^{n} (a_a,i − a_a,ave)²] (8)

where a_a,i is the observed value, a_p,i is the predicted value, and a_a,ave is the mean of the observed values.
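The four indices of Eqs. (5)–(8) can be computed in a few lines (a NumPy sketch; the sample values are invented for illustration, and the R² denominator follows the chapter's formulation, with deviations of the predictions from the observed mean):

```python
import numpy as np

def metrics(actual, predicted):
    """R^2, RMSE, MAE, and NSE as defined in Eqs. (5)-(8)."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    sse = np.sum((a - p) ** 2)
    r2 = 1 - sse / np.sum((p - a.mean()) ** 2)    # Eq. (5)
    rmse = np.sqrt(sse / len(a))                  # Eq. (6)
    mae = np.mean(np.abs(a - p))                  # Eq. (7)
    nse = 1 - sse / np.sum((a - a.mean()) ** 2)   # Eq. (8)
    return r2, rmse, mae, nse

# Illustrative WPD values in W/m^2 (not the study's data)
obs = [19.0, 22.5, 17.8, 25.1, 20.3]
pred = [18.6, 23.0, 17.2, 24.7, 21.0]
print([round(float(v), 3) for v in metrics(obs, pred)])
```

RMSE and MAE share the target's units, while R² and NSE are dimensionless, which is why the former pair appears with units in the results tables.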

3 Results and Discussions

Evaluating the wind potential of a specific location is a crucial initial step in the
effective planning of wind energy systems. In this paper, the influence of GC on the
accuracy of WPD prediction was investigated. To achieve this objective, the proposed
models were implemented and evaluated in two different scenarios.

Scenario 1 (S#1): WPD = f(Tmax, Tmin, PP, AE, WS, SR) (9)

Scenario 2 (S#2): WPD = f(GC, Tmax, Tmin, PP, AE, WS, SR) (10)
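The input-variable sets of the two scenarios can be written out explicitly (the list/dictionary layout below is illustrative; the feature names are those used in the chapter):

```python
# Global meteorological data (GMD) and geographical coordinates (GC)
GMD = ["Tmax", "Tmin", "PP", "AE", "WS", "SR"]
GC = ["Lat", "Long", "Alt"]

scenarios = {
    "S#1": GMD,        # WPD predicted from meteorological data only
    "S#2": GC + GMD,   # WPD predicted from coordinates plus meteorological data
}

for name, features in scenarios.items():
    print(name, len(features), features)
```

Comparing the two feature sets isolates the contribution of the geographical coordinates, since S#2 differs from S#1 only by the three coordinate inputs.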

Generally, the partitioning of data can influence the model’s performance [16].
Moreover, Gholamy et al. [25] concluded that the empirical models achieve optimal
performance when approximately 70–80% of the data is allocated for training and the
remaining 20–30% is set aside for testing purposes. Therefore, the data were divided
randomly (75% for training and 25% for testing). Table 1 displays the descriptive
statistics for the selected data (Fig. 3).
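The random 75%/25% partition described above can be sketched as follows (illustrative only; the study does not specify its tooling, and the seed is an arbitrary choice):

```python
import numpy as np

def split_75_25(n_samples, seed=42):
    """Random 75%/25% train-test index split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle sample indices
    cut = int(0.75 * n_samples)               # 75% boundary
    return idx[:cut], idx[cut:]

train_idx, test_idx = split_75_25(1000)
print(len(train_idx), len(test_idx))  # 750 250
```

Shuffling before splitting matters here because the monthly records are ordered in time and by location; a non-random split would leave whole cities or seasons unseen during training.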
In this work, a trial-and-error approach was employed to find the optimum network
configuration. Table 2 lists the optimum network parameters. Figures 3 and
4 show the architecture of the model for S#1 and S#2, respectively.
The scenario’s performance is compared with each other to investigate the effect
of geographical coordinates on the accurate prediction of WPD. The values of R2 ,
RMSE, and MAE are tabulated in Table 3. It is found that S#2 with the combination
of geographical coordinates and global meteorological data has produced the highest
value of R2 and minimum value of RMSE and MAE. The scatter plots of observed
and estimated data are shown in Fig. 5.
According to the literature [26–29], geographical coordinates significantly impact the
accuracy of predicting wind power density. The latitude and longitude of a specific
location determine its proximity to prevailing wind patterns, topographical features,

Table 1 Descriptive statistics of the GMD and GC (latitude (Lat), longitude (Long), and elevation
(Alt)) for all selected locations

Variable  Unit   Mean    Standard deviation  Minimum  Maximum
Lat       °      34.187  1.91                31.132   36.897
Long      °      34.25   1.712               29.919   36.176
Alt       m      183     359.51              0        1798
Tmax      °C     25.546  5.762               12.21    37.39
Tmin      °C     16.381  5.741               3.57     27.66
PP        mm     50.64   68.13               0        444.6
AE        mm     40.185  31.585              0        142.5
SR        W/m²   216.2   73.13               70.75    338.31
WS        m/s    3.0165  0.682               0.91     5.49
WPD       W/m²   19.419  12.739              0.462    101.35

Fig. 3 MLPNN structure for S#1

and atmospheric conditions. These factors directly influence wind speed and direc-
tion, ultimately affecting the potential energy available for harnessing. Therefore,
considering the geographical coordinates is crucial for accurately predicting the wind
power density at a given site. Incorporating this information improves the precision
of wind energy assessments and facilitates optimal planning and design of wind
farms.

Table 2 Optimum parameters for the developed models

Scenario  Parameter              Value
S#1       Number of HL           1
          Number of units in HL  2
          AF (HL)                Hyperbolic tangent
          AF (OL)                Linear
S#2       Number of HL           1
          Number of units in HL  6
          AF (HL)                Hyperbolic tangent
          AF (OL)                Linear

AF activation function, HL hidden layer, OL output layer

Fig. 4 MLPNN structure for S#2



Table 3 Value of SI for the proposed models

Statistical indicator  S#1    S#2
R²                     0.993  0.998
RMSE [W/m²]            1.077  0.574
MAE [W/m²]             0.394  0.212
NSE                    0.993  0.998

Fig. 5 Comparing and correlation between the observed and predicted data using MLPNN

4 Conclusions

Although the results of the present study were derived from a mathematical model
utilizing various gridded data, it is important to acknowledge that this study possesses
certain limitations that could be explored and addressed in future research. Firstly,
utilizing data coming from satellite measurements and various reanalyses is key
to the next-generation wind resource assessment and forecasting. Thus, the results
should be compared with the data collected from reanalysis datasets such as ERA5
to show the accuracy of the models. Moreover, terrain analysis was not considered
in this study. However, previous studies [30, 31] have indicated that regions become
less suitable for wind turbine installations as elevation and slope increase. Therefore,
future research should focus on the site selection of wind energy power plants using
GIS-multi-criteria evaluation.
The prediction of wind power density (WPD) is a key factor in the design
and planning of wind farms. Accurate WPD predictions enable engineers and plan-
ners to make informed decisions regarding the optimal placement and layout of wind

turbines, considering factors such as wind resource availability, energy production
estimates, and overall project feasibility. By accurately predicting WPD, the design
and planning process of wind farms can be optimized, leading to efficient utiliza-
tion of wind energy resources. Based on the value of SI, S#2 has the best predictive
performance compared to S#1 for the WPD estimations in MCCs. Moreover, the
findings indicate that MLPNN with the combination of geographical and global
meteorological data could increase the average performance of the model by 46%.
In summary, geographical coordinates are essential in wind farm planning and design
in Mediterranean coastal cities. They provide crucial information for assessing wind
resources, determining turbine placement, considering environmental factors, plan-
ning infrastructure connections, and monitoring the performance of wind farms.
Accurate geographical coordinates enable developers to make informed decisions
and optimize the design and operation of wind energy projects in these regions.

References

1. Muh E, Amara S, Tabet F (2018) Sustainable energy policies in Cameroon: a holistic overview.
Renew Sustain Energy Rev 82:3420–3429
2. Seriño MNV (2022) Energy security through diversification of non-hydro renewable energy
sources in developing countries. Energy Environ 33(3):546–561
3. Elum ZA, Momodu AS (2017) Climate change mitigation and renewable energy for sustainable
development in Nigeria: a discourse approach. Renew Sustain Energy Rev 76:72–80
4. Urban F (2014) Low carbon transitions for developing countries. Routledge
5. Kaygusuz K (2007) Energy for sustainable development: key issues and challenges. Energy
Sources Part B 2(1):73–83
6. Martinot E (2016) Grid integration of renewable energy: flexibility, innovation, and experience.
Annu Rev Environ Resour 41:223–251
7. Juarez-Rojas L, Alvarez-Risco A, Campos-Dávalos N, de las Mercedes Anderson-Seminario
M, Del-Aguila-Arcentales S (2023) Effectiveness of renewable energy policies in promoting
green entrepreneurship: a global benchmark comparison. In: Footprint and entrepreneurship:
global green initiatives. Springer Nature Singapore, Singapore, pp 47–87
8. Wu X, Tian Z, Guo J (2022) A review of the theoretical research and practical progress of
carbon neutrality. Sustain Oper Comput 3:54–66
9. Kassem Y, Gökçekuş H, Zeitoun M (2019) Modeling of techno-economic assessment on wind
energy potential at three selected coastal regions in Lebanon. Model Earth Syst Environ 5:1037–
1049
10. Alayat MM, Kassem Y, Çamur H (2018) Assessment of wind energy potential as a power
generation source: a case study of eight selected locations in Northern Cyprus. Energies
11(10):2697
11. Kassem Y, Gökçekuş H, Janbein W (2021) Predictive model and assessment of the potential
for wind and solar power in Rayak region Lebanon. Model Earth Syst Environ 7:1475–1502
12. Kassem Y, Çamur H, Aateg RAF (2020) Exploring solar and wind energy as a power generation
source for solving the electricity crisis in Libya. Energies 13(14):3708
13. Gökçekuş H, Kassem Y, Al Hassan M (2019) Evaluation of wind potential at eight selected
locations in Northern Lebanon using open source data. Int J Appl Eng Res 14(11):2789–2794
14. Xu Y, Li Y, Zheng L, Cui L, Li S, Li W, Cai Y (2020) Site selection of wind farms using GIS
and multi-criteria decision-making method in Wafangdian China. Energy 207:118222
15. Noorollahi Y, Jokar MA, Kalhor A (2016) Using artificial neural networks for temporal and
spatial wind speed forecasting in Iran. Energy Convers Manage 115:17–25

16. Ghorbani MA, Khatibi R, Hosseini B, Bilgili M (2013) Relative importance of parame-
ters affecting wind speed prediction using artificial neural networks. Theoret Appl Climatol
114:107–114
17. Ghanbarzadeh A, Noghrehabadi AR, Behrang MA, Assareh E (2009) Wind speed predic-
tion based on simple meteorological data using artificial neural network. In: 2009 7th IEEE
international conference on industrial informatics. IEEE, pp 664–667
18. Kassem Y, Gökçekuş H, Çamur H (2019) Analysis of prediction models for wind power density,
case study: Ercan area, Northern Cyprus. In: 13th international conference on theory and
application of fuzzy systems and soft computing—ICAFS-2018. Springer International Publishing,
pp 99–106
19. Abatzoglou JT, Dobrowski SZ, Parks SA, Hegewisch KC (2018) TerraClimate, a high-
resolution global dataset of monthly climate and climatic water balance from 1958–2015.
Sci Data 5(1):1–12
20. Cepeda Arias E, Cañon Barriga J (2022) Performance of high-resolution precipitation datasets
CHIRPS and TerraClimate in a Colombian high Andean Basin. Geocarto Int 1–21
21. Wiwoho BS, Astuti IS (2022) Runoff observation in a tropical Brantas watershed as observed
from long-term globally available TerraClimate data 2001–2020. Geoenvironmental Disasters
9(1):12
22. Kassem Y, Gökçekuş H, Mosbah AAS (2023) Prediction of monthly precipitation using various
artificial models and comparison with mathematical models. Environ Sci Pollut Res 1–27
23. Kassem Y, Çamur H, Zakwan AHMA, Nkanga NA (2023) Prediction of cold filter plugging
point of different types of biodiesels using various empirical models. In: 15th international
conference on applications of fuzzy systems, soft computing, and artificial intelligence tools–
ICAFS-2022. Springer Nature Switzerland, Cham, pp 50–57
24. Kassem Y (2023) Analysis of different combinations of meteorological parameters and well
characteristics in predicting the groundwater chloride concentration with different empirical
approaches: a case study in Gaza Strip Palestine. Environ Earth Sci 82(6):134
25. Gholamy A, Kreinovich V, Kosheleva O (2018) Why 70/30 or 80/20 relation between training
and testing sets: a pedagogical explanation. Departmental Technical Reports (CS). 1209. https://
scholarworks.utep.edu/cs_techrep/1209
26. Manwell JF, McGowan JG, Rogers AL (2009) Wind energy explained: theory, design, and
application. Wiley
27. Wood DH (2012) Wind energy: Fundamentals, resource analysis, and economics. Springer
Science & Business Media
28. Li C, Yuan Y (2013) Wind resource assessment and micro-siting: science and engineering.
Springer Science & Business Media
29. Hasager CB, Nielsen M, Pena A (eds) (2016) Wind energy systems: optimising design and
construction for safe and reliable operation. Woodhead Publishing
30. Zalhaf AS, Elboshy B, Kotb KM, Han Y, Almaliki AH, Aly RM, Elkadeem MR (2021) A high-
resolution wind farms suitability mapping using GIS and fuzzy AHP approach: a national-level
case study in Sudan. Sustainability 14(1):358
31. Shorabeh SN, Firozjaei MK, Nematollahi O, Firozjaei HK, Jelokhani-Niaraki M (2019) A risk-
based multi-criteria spatial decision analysis for solar power plant site selection in different
climates: a case study in Iran. Renew Energy 143:958–973
Next Generation Intelligent IoT Use Case
in Smart Manufacturing

Bharati Rathore

Abstract Smart manufacturing has become a significant topic of interest among
manufacturing industry professionals and researchers in recent times. Smart manu-
facturing involves the incorporation of cutting-edge technologies, including the
Internet of things, cyber-physical systems, cloud computing, and big data. This is
evident in the Industry 4.0 framework. Henceforth, next generation smart manufac-
turing showcases a deep amalgamation of artificial intelligence (AI) tech and highly
developed production technologies. It is woven into each stage of the design, produc-
tion, product, and service cycle and influences the entire life cycle. The subsequent
form of smart manufacturing is the central propellant for the novel industrial revolu-
tion and is expected to be the principal catalyst for the transformation and betterment
of the manufacturing industry for the generations ahead. Through this research, we
proposed a ‘4*S Model’ via conceptualization of smart, sensorable, sustainable, and
secure concepts at various stages of manufacturing. The evolution of smart manufac-
turing for Industry 4.0 is an ongoing process and this research will provide insights
for further developments in manufacturing.

Keywords Smart manufacturing · Industry 4.0 · 4*S model

1 Introduction

Countries around the world are actively participating in the new industrial revolution
by adopting advanced technologies like AI, IoT, cloud computing, CPS, and big data
into their manufacturing processes through smart manufacturing (SM). By adopting
SM, they are able to gain access to improved visibility into production, value, and
performance in real-time, boost manufacturing agility and flexibility and enhance
predictive maintenance and analytics [1]. The inculcation of SM is believed to be a
crucial determinant in creating a competitive edge for the manufacturing industry of

B. Rathore (B)
Birmingham City University, Birmingham B5 5JU, UK
e-mail: Bharati.rathore@bcu.ac.uk

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 265
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_21
266 B. Rathore

widely recognised countries on the global scale. Through SM, nations have access to
smart devices, technologies, and tools that allow them to gain a comprehensive picture
of their manufacturing activities, the environmental and market conditions, and the
customer requirements, ultimately resulting in the highest levels of productivity,
efficiency, and cost savings among others [2].
Germany has crafted the national initiative 'Industrie 4.0', and the UK has
introduced their ‘UK Industry 2050’ policy as a response to the Fourth Industrial
Revolution and its use of expanded smart technologies and smart manufacturing
(SM) [3]. The afore-mentioned strategies fundamentally aim to bring to the forefront
the use of new technologies as a tool for revolutionising the classical manufacturing
industry and its practices. In this light, Germany’s Industrie 4.0 is focused on the
vertical integration of highly developed technologies such as AI, big data, the IoT,
and the like, whereas the UK Industry 2050 sets out an overall vision for the rapidly
evolving landscape of industrial technology. In addition,
France has launched the ‘New Industrial France’ initiative, Japan has set forward the
‘Society 5.0’ plan, and Korea has initiated the ‘Manufacturing Innovation 3.0’ course
of action to join the bandwagon of countries embracing the industrial revolution and
its use of highly developed technologies, as part of their smart manufacturing (SM)
methods [1, 4]. The New Industrial France brings about the idea of an ecosystem
for SM in France, Japan's Society 5.0 highlights the full use of SM and its
products, and Korea’s Manufacturing Innovation 3.0 focuses on economic outcomes.
All these strategies were made to enhance SM and its multiple effects and comply
with the principles of the Industrial Revolution 4.0 [4, 5]. The adoption of intelligent
manufacturing is considered crucial for major countries to tackle the challenges posed
by the Fourth Industrial Revolution and to stay ahead in the manufacturing industry
[6]. It is seen as a key strategy to gain a competitive edge. Intelligent manufacturing
refers to the integration of advanced digital technologies like IoT, AI, ML, robotics,
and automation to enhance manufacturing efficiency, improve product quality and
innovation, and promote production agility and flexibility. Intelligent manufacturing
provides a convergence of business, operational, and engineering processes, enabling
a comprehensive collection of data which can be used to drive agile, predictive,
and real-time decision-making [7]. As such, intelligent manufacturing can play a
significant role in helping countries to develop innovative products and services,
improve resource utilisation, and gain a competitive edge in the future of manufacturing
[8, 9].
Since the beginning of the twenty-first century, new-generation information tech-
nology has seen an explosive uptick. Modern technology like smartphones, tablets,
cloud computing, and social media have changed the way we interact, shop, consume
information, and create art. The emergence of the Internet of things, artificial intelli-
gence, and machine learning has spurred the creation of innovative digital technolo-
gies such as autonomous vehicles, speech recognition systems, virtual reality, and
robotics that are constantly evolving [10, 11]. A new method of producing goods and
services, new-generation smart manufacturing makes use of cutting-edge technolo-
gies including IoT, data analytics, machine learning, and 3D printing [12]. In order
to allow more effective, secure, and cost-effective operations, this new approach
Next Generation Intelligent IoT Use Case in Smart Manufacturing 267

makes use of automated procedures including predictive maintenance, supply chain
optimisation, and monitoring of numerous performance factors [13]. The Fourth
Industrial Revolution, known as Industry 4.0, is the cornerstone of smart production
and involves extensive automation, technological integration, and data sharing. Smart
manufacturing enables manufacturers to utilise their resources more effectively and
produce/serve clients more quickly and accurately than ever before [14].

2 Literature Review

Industry 4.0, or 'smart manufacturing', is a rapidly expanding field that uses
cutting-edge computing and communication technologies to enhance the efficiency of
automated production processes. In recent years, there have been a lot of fascinating
advancements and studies in this field.
The application of artificial intelligence (AI) and machine learning (ML) to
enhance industrial processes has received attention. Some companies, for instance,
are investigating how AI may be used to optimise production schedules in order
to increase productivity, decrease downtime, and cut costs. Others are focusing on
utilising ML to foresee equipment breakdowns in order to avoid costly repairs and
extend the lifespan of the equipment. The application of collaborative robots, or
‘cobots’, to enhance industrial processes is another field of research. Because cobots
are made to operate alongside human operators, they may do risky or repetitive jobs,
freeing up human employees to concentrate on more difficult ones. Cobots may also
pick up skills from human operators, making their work more efficient and secure.
Smart manufacturing has benefited greatly from the Internet of things (IoT).
monitoring of equipment performance, energy utilisation, and other important param-
eters is possible with IoT sensors and devices. Then, by utilising this information,
the output may be improved while reducing waste. Smart manufacturing technologies
have the potential to completely change the manufacturing sector, making it more
effective, sustainable, and lucrative.
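As a minimal illustration of the kind of real-time monitoring described above, sensor readings can be streamed against an alert threshold; the sensor names, values, and threshold below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    sensor_id: str
    value: float  # e.g. spindle temperature in degrees C

def monitor(readings, threshold):
    """Flag readings that exceed the alert threshold (illustrative only)."""
    return [r for r in readings if r.value > threshold]

stream = [
    Reading("spindle-1", 61.2),
    Reading("spindle-2", 84.7),
    Reading("spindle-3", 58.9),
]
alerts = monitor(stream, threshold=80.0)
print([r.sensor_id for r in alerts])  # ['spindle-2']
```

In a real deployment this logic would sit behind a message broker ingesting sensor data continuously; the sketch only shows the thresholding step that turns raw readings into actionable alerts.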

2.1 Research Objectives of This Study

. To evaluate the potential advantages of smart manufacturing for the manufacturing
sector and to pinpoint the crucial components necessary for its success.
. To assess the influence of new technologies on the manufacturing sector and their
potential uses for smart manufacturing, such as AI, IoT, CPS, cloud computing,
and big data.
. To put forth a framework that incorporates the ideas of smart, sensorable,
sustainable, and secure operations at various phases of the manufacturing process.

. To provide insights for further developments in smart manufacturing and its
potential contributions to the industrial revolution and the betterment of the
manufacturing industry.

2.2 Research Methodology

The research was conducted using qualitative data obtained from several secondary
sources, such as journals, newspapers, publications, magazines, books, and online
and offline sources. The information was collected from libraries and through online
searches and was thoroughly examined and verified for accuracy.

3 Next Generation Technology Development

The Fourth Industrial Revolution integrates advanced technologies, including the
industrial Internet of things, 3D printing, robotics, deep learning, artificial
intelligence, blockchain, and cloud computing, into manufacturing processes [15].
This combination of technologies will enable seamless communication between machines,
people, and systems, as well as accelerated automation and optimisation of entire
processes, whether in development, management, or operations [16]. It will also bring
down the cost of production while improving production output. Industry 4.0
fundamentally changes the way manufacturing processes are managed and controlled
by introducing cyber-physical systems into production [17]. By using connected
computers and smart sensors, cyber-physical systems allow manufacturers to simulate
and visualise entire production processes in a virtual environment, improve
performance, and make real-time decisions [15, 18]. Moreover, through real-time
analytics and predictive maintenance, manufacturers can achieve higher precision and
improved traceability. Finally, Industry 4.0 makes production sharing, collaboration,
and open innovation across borders possible thanks to smart data exchange [19, 20].

4 Defining ‘4*S Model’

A paradigm for smart manufacturing called the ‘4*S Model’ places an emphasis on
four fundamental principles: smart, sensorable, sustainable, and secure.
The term ‘smart’ describes the use of cutting-edge digital technology, such as
cloud computing, artificial intelligence, and the Internet of things (IoT), to optimise
industrial processes. Manufacturers may save costs, boost production, and improve
product quality by incorporating these technologies.
Next Generation Intelligent IoT Use Case in Smart Manufacturing 269

The term ‘sensorable’ highlights the significance of sensors in the production
process. Manufacturers may monitor and analyse vital parameters, such as temper-
ature, pressure, and humidity, to enhance efficiency and decrease waste by utilising
sensors to capture real-time data on tools, machinery, and products.
The term ‘sustainable’ emphasises the significance of socially and ecologically
conscious production methods. This entails minimising waste, cutting back on carbon
emissions, and encouraging moral employment practices.
The word ‘secure’ emphasises the necessity of strong cybersecurity measures to
guard against theft, hacking, and other security risks.
Overall, the ‘4*S Model’ offers a thorough framework for incorporating smart,
sensorable, sustainable, and secure technologies into the manufacturing process,
allowing manufacturers to increase productivity, cut costs, and improve product
quality while giving environmental and ethical considerations top priority.

4.1 Conceptualization of ‘4*S Model’

Planning Stage:
Smart: Employ AI-based planning algorithms that can optimise production schedules
and reduce energy consumption [21]. By analysing data and forecasting future
production requirements, AI-based planning algorithms may be utilised to optimise
production schedules and lower energy usage [22]. These algorithms can calculate
the best production schedule by taking into consideration factors including
production capacity, inventory levels, demand forecasts, and energy consumption
trends [23]. This schedule may be modified in real time to account for unforeseen
changes in demand or supply chain problems. AI-based algorithms that analyse
historical data on energy use can find trends that can be used to develop
energy-efficient production plans [24–26].
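A toy version of such energy-aware scheduling can be sketched as follows. The hourly price curve and the greedy cheapest-hours strategy below are illustrative stand-ins for the AI-based planners cited above, not the methods of any referenced work:

```python
# Toy energy-aware scheduler: place the required production hours in the
# cheapest hours of the day (prices are hypothetical).
def schedule(hours_needed, energy_price):
    """Return the chosen hours (sorted) that minimise total energy cost."""
    ranked = sorted(range(len(energy_price)), key=lambda h: energy_price[h])
    return sorted(ranked[:hours_needed])

prices = [5, 3, 2, 2, 4, 6, 9, 9]  # assumed price per hour, index = hour
plan = schedule(3, prices)
print(plan)                          # the three cheapest hours
print(sum(prices[h] for h in plan))  # total energy cost of the plan
```

A real planner would additionally respect capacity, inventory, and demand-forecast constraints, turning this into a constrained optimisation problem rather than a simple sort.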
Sensorable: Sensors can be deployed to monitor the quality of raw materials and to
track inventory levels. Sensors can be very useful in monitoring the quality of raw
materials and tracking inventory levels in a manufacturing facility [27]. For example,
sensors can be placed on conveyor belts or in storage bins to monitor the weight and
quantity of raw materials as they are received and used in the production process.
This data can be sent to a central system for analysis and used to optimise inventory
management and reduce waste [28, 29].
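The weight-based inventory tracking described above can be sketched as follows; the unit weight and reorder threshold are made-up illustrative values, not figures from the text:

```python
# Sketch: convert a raw weight reading from a storage-bin sensor into an
# inventory count and a reorder alert (constants are illustrative).
UNIT_WEIGHT_KG = 2.5   # assumed weight of one unit of raw material
REORDER_LEVEL = 20     # assumed minimum stock before reordering

def inventory_status(bin_weight_kg: float):
    """Return (units in stock, 'ok' or 'reorder')."""
    units = int(bin_weight_kg // UNIT_WEIGHT_KG)
    return units, ("reorder" if units < REORDER_LEVEL else "ok")

print(inventory_status(130.0))  # plenty of stock
print(inventory_status(40.0))   # below threshold, triggers reorder
```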
Here is a model in Fig. 1 that incorporates smart, sensorable, sustainable, and
secure concepts at various stages of manufacturing:
Sustainable: Implement eco-friendly production techniques that reduce waste and
minimise carbon footprint. Such techniques are crucial for reducing waste and
minimising the carbon footprint of a manufacturing facility [30].

Fig. 1 Proposed ‘4*S model’

Secure: Cybersecurity measures need to be taken to secure the planning data and
algorithms [31]. As AI-based planning algorithms become more prevalent in manufac-
turing facilities, it is critical to ensure that proper cybersecurity measures are in place
to protect the planning data and algorithms from cyberthreats [32].
Design Stage:
Smart: Employ 3D printing, virtual reality, and other digital tools to design and test
products before production [33]. Numerous advantages for product development can
come from using such digital tools throughout the design phase [34].
Sensorable: Data on product performance and usage may be gathered using sensors
and utilised to inform product design [35]. Sensors may be used to gather useful
information on how a product is used and how well it performs, which can then be
analysed and utilised in the design process. This information can shed light on how
users are interacting with the product, the aspects they value most, and potential areas
for development [36–38].
Sustainable: Use long-lasting materials, and create goods that can be recycled [39].
For products to have a less negative impact on the environment and to result in
long-term cost savings, sustainability must be included into product design [40, 41].

Delivery Stage:
Smart: Use IoT and machine learning to optimise transportation routes and delivery
schedules [42].
Sensorable: Sensors can be used to track product delivery and monitor temperature
and humidity levels during transportation [43].
Sustainable: Optimise delivery routes to minimise fuel consumption and reduce
carbon emissions.
Secure: Implement security measures to protect product data and prevent theft during
transportation [44, 45].
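A minimal sketch of route optimisation in this spirit, assuming straight-line distance as a proxy for fuel use and made-up stop coordinates. This simple nearest-neighbour heuristic only illustrates the idea and is not the IoT/ML method cited above:

```python
import math

# Toy delivery-route optimiser: visit the nearest unvisited stop next,
# using Euclidean distance as a stand-in for fuel consumption.
def route(depot, stops):
    order, here, todo = [], depot, list(stops)
    while todo:
        nxt = min(todo, key=lambda p: math.dist(here, p))
        order.append(nxt)
        todo.remove(nxt)
        here = nxt
    return order

stops = [(4, 0), (1, 1), (0, 3)]   # hypothetical delivery coordinates
print(route((0, 0), stops))
```

Production systems would combine real road distances, traffic data, time windows, and vehicle capacities, typically via a dedicated vehicle-routing solver.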
Maintenance Stage:
Smart: Use predictive maintenance techniques that employ machine learning
algorithms to predict equipment failures before they occur.
Sensorable: Sensors can be used to monitor equipment health and detect anomalies.
Sustainable: Use eco-friendly maintenance practices that reduce waste and energy
consumption.
Secure: Implement security measures to prevent unauthorised access to maintenance
data and control systems [46].
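A bare-bones sketch of the sensor-based anomaly detection mentioned above: flag a vibration reading that deviates more than three standard deviations from a recent baseline. The readings and the threshold are illustrative assumptions, not data from the text:

```python
import statistics

def is_anomalous(history, new_value, z_limit=3.0):
    """Flag a reading more than z_limit standard deviations from baseline."""
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    return abs(new_value - mean) > z_limit * sd

# Hypothetical recent vibration readings from a healthy machine
baseline = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.1, 0.9]
print(is_anomalous(baseline, 1.05))  # a normal reading
print(is_anomalous(baseline, 2.4))   # a spike that may precede a fault
```

Real predictive-maintenance systems typically replace this fixed z-score rule with learned models over many correlated signals, but the flow — baseline, deviation, alert — is the same.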

5 Challenges in Smart Manufacturing

73% of manufacturing companies report having less than two years of experience with
smart manufacturing, and 70% assert that they are moving slowly, or not at all, on
their smart manufacturing roadmaps. According to statistics from several publications,
numerous industrial organisations appear to still be in the early phases of deploying
smart manufacturing technology. Despite the potential advantages of smart
manufacturing, such as improved productivity, quality control, and personalization,
many businesses are having trouble moving forward with their smart manufacturing
roadmaps. There are a number of explanations as to why this could be the case [22, 47–49].
First, implementing smart manufacturing technologies can be expensive and time-
consuming. Many organisations may not have the resources or expertise to make the
necessary investments and changes to their operations [50–52]. Second, there may be
a lack of understanding or awareness of the potential benefits of smart manufacturing.
Some businesses might not perceive the benefit of investing in these technologies,
particularly if they have had prior success using conventional production techniques
[53–56]. Another deciding factor may be how challenging it is to implement
smart manufacturing technologies and integrate them with existing systems. Finan-
cial constraints and a shortage of qualified staff may impede certain firms’ progress
towards smart manufacturing [57–59]. Nevertheless, with the right knowledge, training,
and investment, firms can overcome these initial obstacles and realise the potential
benefits of smart manufacturing, such as increased effectiveness, productivity, and
cost savings [60].
Third, issues with data management and security can arise. Large-scale data gathering
and analysis are required for smart manufacturing systems, which can be challenging
to protect and manage. The amount of data produced by machines and systems has
increased as a result of the adoption of smart manufacturing technologies [61, 62].
This data needs to be managed efficiently to derive insights and make informed
decisions. Additionally, there is a need for secure data storage and transmission to
protect sensitive information from cyberthreats. Organisations must invest in robust
data management and security systems to address these challenges and ensure the
smooth functioning of their smart manufacturing operations [63].

6 Advantages of Smart Manufacturing

In order to increase productivity, quality, and efficiency in the manufacturing process,
‘smart manufacturing’ incorporates digital technologies and data analytics [47]. The
following are some advantages of smart manufacturing:

6.1 Direct Cost Savings

Smart manufacturing offers businesses various immediate cost-saving benefits. Here
are a few illustrations:
Reduced Labour Costs: Smart manufacturing technologies automate many procedures
that would typically require human labour, allowing businesses to considerably
lower their employment expenses.
Enhanced Efficiency: Real-time monitoring and process optimisation are possible
with smart manufacturing technologies. This contributes to waste reduction, down-
time reduction, and production efficiency improvement, all of which can result in
cost savings [47].
Lower Maintenance Costs: Real-time equipment and machinery monitoring is
another capability of smart manufacturing systems. By doing this, companies may
spot possible faults before they develop into bigger ones, which can cut down on
maintenance expenses and expensive downtime.
Reduced Energy Consumption: Smart manufacturing systems can optimise energy
consumption by identifying areas where energy is being wasted and making adjust-
ments to reduce consumption. This can lead to significant cost savings on energy
bills [48].

6.2 Indirect Cost Savings

In addition to the direct cost savings advantages of smart manufacturing, there are
also several indirect cost savings benefits that businesses can enjoy. Here are a few
examples:
Improved Quality: Smart manufacturing systems can monitor processes in real
time and make adjustments as needed to maintain consistent product quality. Cost
reductions may occur from fewer faults, reduced rework, and fewer warranty claims
as a consequence [49].
Enhanced Safety: By monitoring and managing hazardous processes, smart manu-
facturing systems may also increase worker safety. As a result, there may be a
decrease in the likelihood of accidents and injuries, which might lead to decreased
insurance premiums and workers’ compensation expenses.
Better Inventory Management: Real-time visibility into inventory levels and
production schedules may be provided by smart manufacturing systems. This can
assist companies in minimising inventory carrying costs, preventing stockouts and
overstocks, and optimising inventory levels—all of which can result in cost savings
[22, 49].
Improved Customer Satisfaction: Smart manufacturing systems can help busi-
nesses deliver products that meet or exceed customer expectations. This can lead to
higher customer satisfaction, repeat business, and positive word-of-mouth referrals,
all of which can result in increased revenue and profitability [22].

7 Limitations of This Study

There are a few limitations to this study that should be taken into account, even
though the suggested ‘4*S Model’ for smart manufacturing is a promising and innova-
tive way to improve the production process. The qualitative data used in this study
was gathered from secondary sources. Furthermore, the research just considers the
suggested model and does not offer a thorough examination of all feasible methods
for enhancing smart manufacturing. The model described in this study might not be
appropriate for all industrial processes and could need to be further customised and
adapted for certain industries or applications.

8 Conclusion

The term ‘Industry 4.0’ is frequently used to describe the trend towards geographi-
cally dispersed, Internet-connected, medium-sized smart manufacturing. This change
is being fuelled by advances in IoT, cloud computing, and AI technologies as well
as the growing availability of inexpensive and dependable Internet access. These

innovations make it possible for factories to be more adaptive and flexible, allowing
them to swiftly change their production methods in response to shifting consumer
needs. Real-time data analytics and the usage of smart sensors can also help to
decrease waste and improve manufacturing processes. The development of new busi-
ness models and income sources, such as the offering of value-added services or the
development of new goods, is another potential outcome of the move towards smart
factories. Using cutting-edge technology like IoT, AI, and machine learning, smart
manufacturing is a viable strategy for changing the industrial sector. The poten-
tial advantages of smart manufacturing, such as greater efficiency, productivity, and
cost savings, are enormous even if many organisations are still in the early phases of
implementation. To fully realise the advantages of smart manufacturing technologies,
organisations must handle challenges like data management, security, and workforce
development. Organisations may put themselves in a position for long-term success
in a market that is changing quickly by overcoming these obstacles and investing
in smart manufacturing. The ‘Proposed 4*S Model’ may be improved even more
by incorporating it with current frameworks, creating performance measures, using
it in certain sectors, exploring the social elements, and creating decision support
systems. The 4*S model may be modified to suit the particular challenges and poten-
tial of smart manufacturing in various sectors and environments by addressing these
research areas.

References

1. Wang B, Tao F, Fang X, Liu C, Liu Y, Freiheit T (2021) Smart manufacturing and intelligent
manufacturing: a comparative review. Engineering 7(6):738–757
2. Davis J, Edgar T, Porter J, Bernaden J, Sarli M (2012) Smart manufacturing, manufacturing
intelligence and demand-dynamic performance. Comput Chem Eng 47:145–156
3. Tao F, Qi Q, Liu A, Kusiak A (2018) Data-driven smart manufacturing. J Manuf Syst 48:157–
169
4. Yang H, Kumara S, Bukkapatnam ST, Tsung F (2019) The internet of things for smart
manufacturing: a review. IISE Trans 51(11):1190–1216
5. Rathore B (2022) Textile Industry 4.0 transformation for sustainable development: prediction
in manufacturing & proposed hybrid sustainable practices. Eduzone: Int Peer Rev/Refereed
Multidisciplinary J 11(1):223–241
6. Kusiak A (2017) Smart manufacturing must embrace big data. Nature 544(7648):23–25
7. Ramakrishna S, Khong TC, Leong TK (2017) Smart manufacturing. Proc Manuf 12:128–131
8. Ghobakhloo M (2020) Determinants of information and digital technology implementation for
smart manufacturing. Int J Prod Res 58(8):2384–2405
9. Rathore B (2023) Integration of artificial intelligence and it’s practices in apparel industry. Int
J New Media Stud (IJNMS) 10(1):25–37
10. Qu YJ, Ming XG, Liu ZW, Zhang XY, Hou ZT (2019) Smart manufacturing systems: state of
the art and future trends. Int J Adv Manuf Technol 103:3751–3768
11. Phuyal S, Bista D, Bista R (2020) Challenges, opportunities and future directions of smart
manufacturing: a state of art review. Sustain Futures 2:100023
12. Kusiak A (2019) Fundamentals of smart manufacturing: a multi-thread perspective. Annu Rev
Control 47:214–220

13. Zenisek J, Wild N, Wolfartsberger J (2021) Investigating the potential of smart manufacturing
technologies. Proc Comput Sci 180:507–516
14. Li L, Lei B, Mao C (2022) Digital twin in smart manufacturing. J Ind Inf Integr 26:100289
15. Zhou J, Li P, Zhou Y, Wang B, Zang J, Meng L (2018) Toward new-generation intelligent
manufacturing. Engineering 4(1):11–20
16. Leng J, Ye S, Zhou M, Zhao JL, Liu Q, Guo W, Cao W, Fu L (2020) Blockchain-secured smart
manufacturing in industry 4.0: a survey. IEEE Trans Syst Man Cybern Syst 51(1):237–252
17. Zheng P, Wang H, Sang Z, Zhong RY, Liu Y, Liu C, Mubarok K, Yu S, Xu X (2018) Smart manu-
facturing systems for Industry 4.0: conceptual framework, scenarios, and future perspectives.
Front Mech Eng 13:137–150
18. Namjoshi J, Rawat M (2022) Role of smart manufacturing in industry 4.0. Mater Today Proc
63:475–478
19. Mahmoud MA, Ramli R, Azman F, Grace J (2020) A development methodology framework of
smart manufacturing systems (Industry 4.0). Int J Adv Sci Eng Inf Technol 10(5):1927–1932
20. Çınar ZM, Zeeshan Q, Korhan O (2021) A framework for industry 4.0 readiness and maturity
of smart manufacturing enterprises: a case study. Sustainability 13(12):6659
21. Zuo Y (2021) Making smart manufacturing smarter—a survey on blockchain technology in
Industry 4.0. Enterp Inf Syst 15(10):1323–1353
22. Ahuett-Garza H, Kurfess T (2018) A brief discussion on the trends of habilitating technologies
for Industry 4.0 and smart manufacturing. Manuf Lett 15:60–63
23. Ludbrook F, Michalikova KF, Musova Z, Suler P (2019) Business models for sustainable
innovation in industry 4.0: smart manufacturing processes, digitalization of production systems,
and data-driven decision making. J Self-Gov Manage Econ 7(3):21–26
24. Valaskova K, Nagy M, Zabojnik S, Lăzăroiu G (2022) Industry 4.0 wireless networks and
cyber-physical smart manufacturing systems as accelerators of value-added growth in Slovak
exports. Mathematics 10(14):2452
25. Bajic B, Cosic I, Lazarevic M, Sremcev N, Rikalovic A (2018) Machine learning techniques
for smart manufacturing: applications and challenges in industry 4.0. Department of Industrial
Engineering and Management Novi Sad, Serbia, 29
26. Evjemo LD, Gjerstad T, Grøtli EI, Sziebig G (2020) Trends in smart manufacturing: role of
humans and industrial robots in smart factories. Curr Robot Rep 1:35–41
27. Hopkins E, Siekelova A (2021) Internet of things sensing networks, smart manufacturing big
data, and digitized mass production in sustainable industry 4.0. Econ Manage Financ Markets
16(4)
28. Saleh A, Joshi P, Rathore RS, Sengar SS (2022) Trust-aware routing mechanism through an
edge node for IoT-enabled sensor networks. Sensors 22(20):7820
29. Machado CG, Winroth MP, Ribeiro da Silva EHD (2020) Sustainable manufacturing in Industry
4.0: an emerging research agenda. Int J Prod Res 58(5):1462–1484
30. Davim JP (ed) (2013) Sustainable manufacturing. John Wiley & Sons
31. Sharma R, Jabbour CJC, Lopes de Sousa Jabbour AB (2021) Sustainable manufacturing and
industry 4.0: what we know and what we don’t. J Enterp Inf Manage 34(1):230–266
32. Petrillo A, Cioffi R, De Felice F (eds) (2018) Digital transformation in smart manufacturing.
BoD–Books on Demand
33. Abikoye OC, Bajeh AO, Awotunde JB, Ameen AO, Mojeed HA, Abdulraheem M, Oladipo
ID, Salihu SA (2021) Application of internet of thing and cyber physical system in Industry
4.0 smart manufacturing. In: Emergence of cyber physical system and IoT in smart automation
and robotics: computer engineering in automation. Springer International Publishing, Cham,
pp 203–217
34. Maheswari M, Brintha NC (2021) Smart manufacturing technologies in industry-4.0. In: 2021
Sixth international conference on image information processing (ICIIP), vol 6. IEEE, pp 146–
151
35. Bhatnagar D, Rathore RS. Cloud computing: security issues and security measures. Int J Adv
Res Sci Eng 4(01):683–690
36. Vaidya S, Ambad P, Bhosle S (2018) Industry 4.0—a glimpse. Proc Manuf 20:233–238

37. Wade K, Vochozka M (2021) Artificial intelligence data-driven internet of things systems,
sustainable industry 4.0 wireless networks, and digitized mass production in cyber-physical
smart manufacturing. J Self-Gov Manage Econ 9(3):48–60
38. Frontoni E, Loncarski J, Pierdicca R, Bernardini M, Sasso M (2018) Cyber physical systems for
industry 4.0: towards real time virtual reality in smart manufacturing. In: Augmented reality,
virtual reality, and computer graphics: 5th international conference, AVR 2018, Otranto, Italy,
June 24–27, Proceedings, Part II 5. Springer International Publishing, pp 422–434
39. Shin KY, Park HC (2019) Smart manufacturing systems engineering for designing smart
product-quality monitoring system in the industry 4.0. In: 2019 19th International conference
on control, automation and systems (ICCAS). IEEE, pp 1693–1698
40. Muthu SS (ed) (2017) Sustainability in the textile industry. Springer, Singapore
41. Lombardi Netto A, Salomon VA, Ortiz-Barrios MA, Florek-Paszkowska AK, Petrillo A, De
Oliveira OJ (2021) Multiple criteria assessment of sustainability programs in the textile industry.
Int Trans Oper Res 28(3):1550–1572
42. Nayyar A, Kumar A (eds) (2020) A roadmap to industry 4.0: smart production, sharp business
and sustainable development. Springer, Berlin, pp 1–21
43. Kumar K, Zindani D, Davim JP (2019) Industry 4.0: developments towards the fourth industrial
revolution. Springer, Cham, Switzerland
44. Kandasamy J, Muduli K, Kommula VP, Meena PL (eds) (2022) Smart manufacturing
technologies for industry 4.0: integration, benefits, and operational activities. CRC Press
45. Affatato L, Carfagna C (2013) Smart textiles: a strategic perspective of textile industry. In:
Advances in science and technology, vol 80. Trans Tech Publications Ltd, pp 1–6
46. Büchi G, Cugno M, Castagnoli R (2020) Smart factory performance and Industry 4.0. Technol
Forecast Soc Chang 150:119790
47. Liu Y, Xu X (2017) Industry 4.0 and cloud manufacturing: a comparative analysis. J Manuf
Sci Eng 139(3)
48. Osterrieder P, Budde L, Friedli T (2020) The smart factory as a key construct of industry 4.0:
a systematic literature review. Int J Prod Econ 221:107476
49. Longo F, Nicoletti L, Padovano A (2017) Smart operators in industry 4.0: A human-centered
approach to enhance operators’ capabilities and competencies within the new smart factory
context. Comput Ind Eng 113:144–159
50. Pascual DG, Daponte P, Kumar U (2019) Handbook of industry 4.0 and SMART systems. CRC
Press
51. Oztemel E, Gursev S (2020) Literature review of Industry 4.0 and related technologies. J Intell
Manuf 31:127–182
52. Misra S, Roy C, Mukherjee A (2021). Introduction to industrial internet of things and industry
4.0. CRC Press
53. Friedman T (2018) Hot, flat, and crowded: why we need a green revolution—and how it can
renew America, Farrar, Straus and Giroux, 2008. ISBN 978-0-312-42892-1
54. Rathore B (2022) Supply chain 4.0: sustainable operations in fashion industry. Int J New Media
Stud (IJNMS) 9(2):8–13
55. Porter M, Heppelmann J (2014) How smart, connected products are transforming competition.
Harvard Bus Rev
56. Chavarría-Barrientos D, Camarinha-Matos LM, Molina A (2017) Achieving the sensing, smart
and sustainable “everything”. In: Camarinha-Matos L, Afsarmanesh H, Fornasiero R (eds)
Collaboration in a data-rich world. PRO-VE 2017, vol 506. IFIP Advances in Information and
Communication Technology. Springer, Cham
57. Kumar S, Rathore RS, Mahmud M, Kaiwartya O, Lloret J (2022) BEST—blockchain-enabled
secure and trusted public emergency services for smart cities environment. Sensors 22(15):5733
58. Molina A, Ponce P, Ramirez M, Sanchez-Ante G (2014) Designing a S2-enterprise (smart x
sensing) reference model, collaborative systems for smart networked environments. IFIP Adv
Inf Commun Technol 434:384–395
59. Rosling H, Rosling O, Rosling R (2018) Factfulness. Factfulness AB. ISBN 978-1-250-10781-7

60. Bainbridge BS, Roco MC (eds) (2006) Managing nano-bio-info-cogno innovations: converging technologies in society. Springer
61. Miranda J, Cortes D, Ponce P, Noguez J, Molina JM, López EO, Molina A (2018) Sensing,
smart and sustainable products to support health and well-being in communities. In: 2018
International conference on computational science and computational intelligence (CSCI’18).
IEEE
62. Kang HS, Lee JY, Choi S, Kim H, Park JH, Son JY, Kim BH, Noh SD (2016) Smart manu-
facturing: past research, present findings, and future directions. Int J Precis Eng Manuf Green
Technol 3:111–128
63. Tuptuk N, Hailes S (2018) Security of smart manufacturing systems. J Manuf Syst 47:93–106
Forecasting Financial Success App:
Unveiling the Potential of Random Forest
in Machine Learning-Based Investment
Prediction

Ashish Khanna, Divyansh Goyal, Nidhi Chaurasia, and Tariq Hussain Sheikh

Abstract In the complicated and dynamic financial market, the forecast of financial
investment choices is essential for assisting investors in making decisions. In order
to predict financial investment alternatives, this research article focuses on using
machine learning techniques, specifically the random forest algorithm. The perfor-
mance of each algorithm, including KNN, decision tree, logistic regression, and random
forest, was evaluated based on its accuracy and precision using the testing set. Thus, the
study concludes that the random forest algorithm outperformed the other algorithms
with greater accuracy and is most suitable for predicting the profitability of finan-
cial investments. Additionally, we demonstrate the creation of a web application
that incorporates the predictive model and enables users to enter pertinent data and
obtain real-time predictions. To train and validate the random forest model, historical
financial data, including market indexes, business fundamentals, and macroeconomic
indicators, is gathered and preprocessed to eliminate any missing or inconsistent data
points. The algorithm’s collection of decision trees demonstrates that it is reliable and
adaptable when processing complex data. The Django-based web application’s user-
friendly interface allows users to enter parameters and get projections for various
investment options. Assessment metrics including accuracy and recall confirm that
the random forest computation is adequate. The incorporation of the predictive
model into the web application provides a practical tool for making data-driven
investment decisions. This research contributes to the development of a user-friendly
web application and observational evaluation of the algorithm’s performance, as

A. Khanna · D. Goyal · N. Chaurasia (B)
Maharaja Agrasen Institute of Technology, Guru Gobind Singh Indraprastha University,
Delhi, India
e-mail: passionatenidhi123@gmail.com
A. Khanna
e-mail: ashishkhanna@mait.ac.in
T. H. Sheikh
Department of Computer Science, Shri Krishan Chander, Government Degree College,
Poonch 185101, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 279
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_22
280 A. Khanna et al.

well as to the interests of profit-seeking investors, financial experts, and analysts


interested in machine learning for financial decision-making.

Keywords Financial investment · Machine learning · Random forest algorithm ·
K-nearest neighbors · Decision tree · Logistic regression

1 Introduction

The financial investment landscape is complex and uncertain, forcing investors to
make educated judgments in a setting where many variables are at play. Traditional
investment strategies frequently rely on fundamental analysis, technical indicators,
and professional judgment. However, as a result of rapid technological improvements
and the accessibility of enormous volumes of financial data, a growing number of
people are interested in using machine learning techniques to improve investment
decision-making.
Machine learning, a branch of artificial intelligence, has the capacity to find
and extract patterns from vast volumes of data [1]. It uses a variety of techniques and
models to classify or predict using past data without requiring explicit prior knowledge.
In the context of financial investing, machine learning has the capacity to recognize
relationships, pinpoint flaws, and adjust to market fluctuations.
This paper’s goal is to study how machine learning algorithms might
be used to predict financial investment options. We aim to develop predictive
models that can assist investors in making more informed decisions by
utilizing historical data and significant features, such as market indexes, corporate
fundamentals, and macroeconomic indicators. Accurate forecasting of investment deci-
sions can yield valuable insights, enabling investors to optimize portfolio
allocation, control risk, and identify potentially profitable opportunities. Data-driven
decision-making is a benefit provided by machine learning technologies, which can
supplement and enhance traditional investment methodologies. This study
centers on assessing the performance of different machine learning algorithms [2].
Through broad experimentation and investigation, we compare the predictive
accuracy, robustness, and computational efficiency of these algorithms. Moreover, we
explore the effect of distinct feature sets and data preprocessing
strategies on predictive performance.
The remainder of this paper is organized as follows: Section 2 gives an
overview of related work and the existing literature in the field of financial investment
prediction using machine learning. Section 3 describes the methodology, including
data collection, preprocessing, and the chosen machine learning algorithms.
Section 4 presents the experimental results and performance evaluation. Finally,
Section 5 summarizes the conclusions, discusses the limitations of the study, and
outlines potential avenues for future research.
Forecasting Financial Success App: Unveiling the Potential of Random … 281

2 Literature Review

See Table 1.

3 Concept

3.1 Financial Investment

A financial product, like a stock or a cryptocurrency, that has been purchased
primarily with the hope of making money is referred to as an investment. Every
investment comes with its own set of risks, rewards, and disadvantages, all of which
have an impact on how and when investors decide to buy or sell assets.

3.2 Machine Learning

A subset of artificial intelligence called machine learning enables software programs
to increase the precision of their predictions without being explicitly programmed.
These programs can analyze previous data and produce predictions for new output
values by using machine learning algorithms.
Types of learning in machine learning:
(i) supervised learning, (ii) unsupervised learning, and (iii) reinforcement
learning.
We have used a supervised learning approach in the model and trained it using a
random forest algorithm. Assuming trees are free to grow to a maximum height of
O(log n), training a random forest takes O(t·u·n·log n), where n is the number of
training samples, t is the number of trees, and u is the number of features
considered for splitting. The prediction of a new sample takes O(t·log n) [13] (Fig. 1).
In the above flowchart, these generalized steps have taken place into account.
Data preprocessing: Prepare the dataset (taken from Kaggle) of 1 lakh (100,000)
records by performing the necessary cleaning, transformation, and normalization
steps.
Fit the random forest algorithm to the training set: Train the Random Forest model
using the training data.
Predict the test results: Use the trained model to predict the outcomes for the test
set.
Assess the accuracy of the predictions: Evaluate the accuracy of the predictions by
creating a confusion matrix.
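The generalized steps above can be sketched with scikit-learn. The dataset here is synthetic stand-in data (the Kaggle file itself is not reproduced), so this is an illustrative pipeline rather than the paper's exact code:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed Kaggle dataset
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit the random forest to the training set
rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Predict the test results
y_pred = rf.predict(X_test)

# Assess accuracy via a confusion matrix (rows: actual, columns: predicted)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The matrix's diagonal counts correct predictions; its total equals the number of test samples.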

Table 1 Contribution in the field of finance and machine learning

S. No. | Ref. | Technology/key areas | Contributions
1 | [3] | Investment, return, prediction, machine learning | Performance of an investment return prediction model is analyzed
2 | [4] | Prediction with data mining techniques, efficient market hypothesis | Multi-modal regression; different techniques such as the co-integration test and Granger causality were implemented
3 | [5] | Auto-encoder; covariance matrix; dimensionality reduction; machine learning | Subperiod analysis; portfolio performance during different subperiods as defined by market volatility, inflation, and credit spread
4 | [6] | Deep learning, finance, risk prediction, machine learning | Using deep learning techniques for intelligent assessment, financial investment risk prediction can be significantly enhanced
5 | [7] | Machine learning, financial markets, economists, data analysis | Machine learning techniques are used to investigate the predictability of financial markets and are compared with respect to accuracy and profitability
6 | [8] | Deep learning, financial risk management, artificial intelligence, taxonomy, risk analysis | A thorough review of machine learning research in financial risk management covering tasks, approaches, difficulties, and new trends
7 | [9] | Machine learning, deep learning, quantitative analysis, financial risk management | Demonstrates the use of machine learning approaches for quantitative problems, significantly speeding up fitting while retaining acceptable levels of accuracy
8 | [10] | Convolutional neural network, deep learning, artificial intelligence | The rise of artificial intelligence has propelled numerous algorithms, making them popular technologies across diverse domains
9 | [11] | Gray relational analysis, analytic hierarchy process, variational auto-encoders | Investigates data augmentation using generative models to improve the construction of multilayer neural networks
10 | [12] | Forecasting techniques, sentiment analysis | Explores diverse approaches from fields such as data mining, machine learning, and sentiment analysis

Fig. 1 Model architecture elucidating sequential execution of the process pipeline

Visualize the test set results: Display the results of the test set using appropriate
visualization techniques.

4 Methodology

The random forest algorithm is a widely used supervised learning technique in
machine learning. It is capable of addressing both classification and regression
problems [14]. This algorithm leverages ensemble learning, which involves combining
multiple classifiers to effectively tackle complex problems and enhance model
performance.
In Fig. 2, we have tried depicting a dataset instance extracted from a training set
and generating a decision tree based on the inputs. Random forest classifier consists
of multiple decision trees trained on different subsets of the dataset [15]. By averaging
the predictions of these trees, the algorithm enhances the accuracy of predictions.
Instead of relying on a single decision tree, the random forest considers the majority
votes from all trees to determine the final output [16]. Increasing the number of trees
in the forest improves accuracy and mitigates overfitting concerns.
Despite being highly precise, random forests are harder to interpret. The use
of feature importance analysis, partial dependence graphs, and SHAP values can
get around this restriction. The interpretability of the random forest algorithm is
improved by these techniques, which offer insights into significant factors, connec-
tions between characteristics and forecasts, and individual prediction explanations.
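As a minimal illustration of the first of these techniques, a fitted scikit-learn forest exposes impurity-based importances directly; the feature names below are invented for the example, and SHAP values would additionally require the separate shap package:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names standing in for the survey columns
names = ["age", "income", "savings_goal", "horizon_years", "risk_score"]
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1.0 across features
for name, imp in sorted(zip(names, rf.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name:>14}: {imp:.3f}")
```

Ranking features this way gives a quick, model-internal view of which inputs drive the forest's predictions.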
Steps implemented for building our machine learning model:
1. Import libraries such as NumPy and Pandas for data manipulation, as well as csv
and warnings for handling CSV files and suppressing warnings, respectively.
2. The invest_data DataFrame is created by reading the ‘invest.csv’ file using
pd.read_csv(); the dataset contains 1 lakh records with columns for gender, age,
savings objective, time period for investment, purpose of investment, return rate,
etc.

Fig. 2 Random forest algorithm flowchart
3. Categorical columns in the invest_data DataFrame are encoded by replacing their
values with numerical equivalents using the replace() method.
4. The column ‘Which investment avenue do you mostly invest in?’ is dropped
from the invest_data DataFrame to create the feature set X. The target variable is
assigned to the y variable, which contains the values from the ‘Which investment
avenue do you mostly invest in?’ column of dataset.
5. The code then imports the necessary libraries for model training, including train_
test_split from scikit-learn for splitting the data, accuracy_score for evaluating
the model’s accuracy, and DecisionTreeClassifier for the decision tree classifier.
6. The features X and the target variable y are split into training and testing sets
using the train_test_split() function, with a test size of 0.3 and random_state of
42. The training set (X_train and y_train) is used to train the model, and the
testing set (X_test and y_test) is used to evaluate the model's performance.

7. Next, the code imports RandomForestClassifier from scikit-learn to create an
instance of the random forest classifier. The random forest classifier (rf) is trained
using the fit() method on the training data. The joblib library is imported to save
the trained random forest classifier to the ‘trained_model.sav’ file using the dump()
function.
This file is then loaded using the joblib.load() function to retrieve the trained
model for financial investment prediction. In summary, this code reads investment
data from a CSV file, encodes categorical columns, splits the data into training and
testing sets, trains a random forest classifier on the training data, and saves the trained
model to a file [17].
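The steps above can be condensed into a runnable sketch. The inline CSV, column names, and category codes are illustrative stand-ins for the real ‘invest.csv’, which has 1 lakh rows and more columns:

```python
import io

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Tiny inline stand-in for 'invest.csv'
target = "Which investment avenue do you mostly invest in?"
csv_text = f"Gender,Age,Return rate,{target}\n" + "\n".join(
    f"{'Male' if i % 2 else 'Female'},{20 + i % 40},{i % 3},"
    f"{'Stocks' if i % 2 else 'Gold'}"
    for i in range(200))
invest_data = pd.read_csv(io.StringIO(csv_text))

# Encode categorical columns with replace()
invest_data = invest_data.replace({
    "Gender": {"Female": 0, "Male": 1},
    target: {"Gold": 0, "Stocks": 1},
})

# Drop the target column to form the feature set X
X = invest_data.drop(columns=[target])
y = invest_data[target]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, rf.predict(X_test)))

joblib.dump(rf, "trained_model.sav")      # save the trained model
model = joblib.load("trained_model.sav")  # reload it for prediction
```

The dump/load pair mirrors step 7 and the later reloading of the model for prediction.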

5 Results

In this section, we present the results obtained with our machine learning
predictive model, which suggests investment options based on user inputs [18].
In Table 2, it can be seen that logistic regression has the minimum training and
testing accuracy in contrast to others. The underlying cause is the absence of a
linear relationship between the target label and the features. Consequently, logistic
regression struggles to accurately predict targets, even when trained on the available
data. With an increase in K-value, the K-nearest neighbors (KNN) algorithm fits
a more gradual curve to the data: a larger K-value incorporates a greater number of
neighboring points, reducing sharpness or abruptness and ultimately decreasing the
overall complexity and flexibility of the model [18].
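The smoothing effect of a larger K can be reproduced on synthetic data. This is a sketch, not the paper's experiment: with K = 1 each training point is its own nearest neighbor, so training accuracy is perfect, and it degrades as K grows:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# flip_y injects 20% label noise so larger K visibly smooths the fit
X, y = make_classification(n_samples=300, n_features=6,
                           flip_y=0.2, random_state=1)

accs = {}
for k in (1, 5, 15, 45):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    accs[k] = knn.score(X, y)  # training accuracy
    print(f"K={k:>2}: training accuracy {accs[k]:.3f}")
```

The K = 1 model memorizes the training set (including its noise), while K = 45 averages it away at the cost of flexibility.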
In Fig. 3, we can see that the random forest has the maximum testing accuracy.
In the graph shown in Fig. 4, we can see that the random forest has the maximum
training accuracy, equal to that of the decision tree algorithm.
By calculating the proportion of examples that are correctly classified to all occur-
rences, accuracy provides a simple and intuitive evaluation of performance. When
working with balanced classes, it works well. However, in situations of class
imbalance or variable misclassification costs, accuracy may not be sufficient on its
own. In these cases, precision, recall, or the F1 score ought to be taken into
account for a more thorough assessment of the model's performance.
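A deliberately imbalanced toy example makes the point: a degenerate classifier that always predicts the majority class reaches 90% accuracy yet has zero recall on the minority class (labels fabricated for the demonstration):

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [0] * 90 + [1] * 10   # 90:10 class imbalance
y_pred = [0] * 100             # "always predict the majority class"

print("accuracy :", accuracy_score(y_true, y_pred))                    # 0.9
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("recall   :", recall_score(y_true, y_pred))                      # 0.0
print("f1       :", f1_score(y_true, y_pred, zero_division=0))         # 0.0
```

Despite the high accuracy, every minority-class example is missed, which the other three metrics expose immediately.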
Accuracy on testing data of the random forest classifier: 0.4987 (approximately 50%).

Table 2 Comparison of several machine learning algorithms on the model’s prediction

Algorithm             Training accuracy   Testing accuracy
K-nearest neighbors   0.68825             0.5031
Decision tree         0.9920125           0.49905
Gaussian Naive Bayes  0.5039375           0.49525
Random forest         0.9920125           0.4987
Logistic regression   0.119485714         0.108033333

Fig. 3 Graph depicting the testing accuracy of each algorithm in the model

Fig. 4 Graph showing the training accuracy of several algorithms

Accuracy on training data of the random forest: 0.9920125 (approximately 99%).
In Fig. 5, data analysis is performed for female respondents to examine which
investment avenues they mostly invest in; a displot of count versus age for females
is created using the seaborn library in Python [19]. In Fig. 6, the same analysis is
performed for male respondents, creating a displot of count versus age for males
[20].
Based on the model generated, we created an application for predicting the best
option for financial investment based on the user inputs, giving output on a
real-time basis. Here is a glimpse of the dashboard of our Web App, containing user
inputs like gender, age, savings objective, time period for investment, purpose of
investment, return rate, and many more, fitting every age group and individuals
with different investing goals.

Fig. 5 Count versus age displot for which investment females mostly invest in

Fig. 6 Count versus age displot for which investment males mostly invest in
Based on the inputs entered, the model predicts the output and also shows some
examples based on the output as shown in Figs. 7 and 8.
Hence, as shown in Figs. 9 and 10, the ML model works in the backend along with
frontend technologies that provide the UI/UX for the predictions shown.

Fig. 7 Generic form to take basic user details like (gender, age, purpose for investment, etc.) and
predict appropriate results

Fig. 8 You get the flexibility to rank your investment option and see what is actually best for your
investment goals

6 Discussions

The discussion section of this paper centers on the findings and implications of
using the random forest algorithm for financial investment option prediction, as
well as the development of the web application using Django. The results illustrate
the effectiveness of the random forest algorithm in accurately predicting financial
investment options. The algorithm's capacity to capture patterns and relationships
in historical financial data contributes to its robust performance.

Fig. 9 Output shows the suggestions in a particular investment option (e.g., we can see the output
as cryptocurrency, for the inputs provided)

Fig. 10 Point-plot depicting the trend of investing in a result-based investment option for the last
5 years, fetching real-time data from an API
The integration of the predictive model into a web application built with Django
improves user accessibility and convenience. Users can input their parameters and
get real-time predictions, giving them valuable information for making investment
decisions.
The aim of this study is to create a system that suggests financial investments,
allowing users to decide wisely which assets to invest in, with personalized
investing options that fit their own requirements and preferences. Our goal is to
develop a robust system that can predict the assets that will yield the most
profitable investments by analyzing previous financial data and using the power of
machine learning.
The discussion highlights the potential benefits of further algorithmic
investigation, such as exploring other machine learning algorithms like KNN,
decision tree, Naive Bayes, and logistic regression. Comparing their performance
with the random forest algorithm may lead to improved prediction accuracy [20].
Overall, this investigation illustrates the potential of machine learning and web
application development in improving financial investment option prediction and
provides valuable insights for future research and development in this field.

7 Limitations

In this section, we have discussed the limitations faced during the dataset compilation,
data preprocessing challenges, and numerous errors that we handled to increase the
model’s efficiency. While the research paper explores the prediction of financial
investment choices using machine learning techniques, such as the random forest
algorithm, it is important to acknowledge certain limitations associated with the
study:
(i) Limited generalizability to other algorithms and approaches.
(ii) Potential impact of data availability and quality on model reliability.
(iii) Possibility of overfitting or model selection bias.
(iv) Lack of comparison to established benchmarks or existing approaches.
(v) Inadequate consideration of external factors and market volatility.
(vi) Absence of longitudinal analysis to assess model stability over time.
(vii) Limited evaluation of user feedback and experience with the web application.

8 Conclusion and Future Scope

In conclusion, our study has shown how well the random forest algorithm predicts
potential financial investment opportunities. The incorporation of this algorithm into
a Django-built web application has given consumers an easy-to-use platform to enter
their parameters and get real-time forecasts. The outcomes demonstrate the algo-
rithm’s capacity to identify links and trends in historical financial data, producing
precise forecasts. This study advances the subject of machine learning for financial
decision-making by demonstrating the usefulness of the random forest algorithm.
In the future, exploring alternative machine learning algorithms, improving feature
engineering, incorporating advanced risk assessment procedures, integrating
real-time data, gathering user feedback, extending the application's scale,
improving interpretability, and integrating with trading platforms will advance the
field of financial investment choice prediction using machine learning. These
efforts aim to supply investors with more accurate, effective, and user-friendly
tools for making informed investment choices.

References

1. Omar B, Zineb B, Cortés Jofré A, González Cortés D (2018) A comparative study of machine learning algorithms for financial data prediction. In: 2018 International symposium on advanced electrical and communication technologies (ISAECT). Rabat, Morocco, pp 1–5. https://doi.org/10.1109/ISAECT.2018.8618774
2. Dhokane RM, Sharma OP (2023) A comprehensive review of machine learning for financial market prediction methods. In: 2023 International conference on emerging smart computing and informatics (ESCI). Pune, India, pp 1–8. https://doi.org/10.1109/ESCI56872.2023.10099791
3. Ralevic N, Glisovic NS, Djakovic VD, Andjelic GB (2014) The performance of the investment return prediction models: theory and evidence. In: 2014 IEEE 12th International symposium on intelligent systems and informatics (SISY). https://doi.org/10.1109/sisy.2014.6923590
4. Pawar P, Nath S. Machine learning applications in financial markets
5. Brennan Irish MJ. Machine learning and factor-based portfolio optimization
6. Sun Y, Li J (2022) Deep learning for intelligent assessment of financial investment risk prediction. Comput Intell Neurosci 2022:11, Article ID 3062566. https://doi.org/10.1155/2022/3062566
7. Ma T, Hsu (2016) Bridging the divide in financial market forecasting: machine learners versus financial economists. Expert Syst Appl 61(C):215–234
8. Mashrur A, Luo W, Zaidi NA, Robles-Kelly A (2020) Machine learning for financial risk management: a survey. IEEE Access 8:203203–203223. https://doi.org/10.1109/ACCESS.2020.3036322
9. De Spiegeleer J, Madan DB, Reyners S, Schoutens W (2018) Machine learning for quantitative finance: fast derivative pricing, hedging and fitting. Quant Finance 18(10):1635–1643
10. Xing FZ, Cambria E, Welsch RE (2018) Natural language based financial forecasting: a survey. Artif Intell Rev 50(1):49–73
11. Das SP, Padhy S (2018) A novel hybrid model using teaching–learning-based optimization and a support vector machine for commodity futures index forecasting. Int J Mach Learn Cybern 9(1):97–111
12. Brabazon A, O’Neill M (2008) An introduction to evolutionary computation in finance. IEEE Comput Intell Mag 3(4):42–55
13. Marco Virgolin. Time complexity for different machine learning algorithms. https://marcovirgolin.github.io/extras/details_time_complexity_machine_learning_algorithms/
14. Chen C, Zhang P, Liu Y, Liu J (2020) Financial quantitative investment using convolutional neural network and deep learning technology. Neurocomputing 390:384–390. ISSN 0925-2312. https://doi.org/10.1016/j.neucom.2019.09.092
15. Chen M, Chiang H, Lughofer E, Egrioglu E (2020) Deep learning: emerging trends, applications and research challenges. Soft Comput A Fusion Found Methodologies Appl 24(11):7835–7838. https://doi.org/10.1007/s00500-020-04939-z
16. Xiao Y, Huang W, Wang J (2020) A random forest classification algorithm based on dichotomy rule fusion. In: 2020 IEEE 10th International conference on electronics information and emergency communication (ICEIEC). Beijing, China, pp 182–185. https://doi.org/10.1109/ICEIEC49280.2020.9152236
17. Nalabala D, Nirupamabhat M (2021) Financial predictions based on fusion models—a systematic review. In: 2021 International conference on emerging smart computing and informatics (ESCI). Pune, India, pp 28–37. https://doi.org/10.1109/ESCI50559.2021.9397024
18. Goyal. Investment option-prediction. Available at: https://rb.gy/xd5id
19. White H (1988) Economic prediction using neural networks: the case of IBM daily stock returns. IEEE Int Conf Neural Networks II:451–458
20. Unadkat V, Sayani P, Kanani P, Doshi P (2018) Deep learning for financial prediction. In: 2018 International conference on circuits and systems in digital enterprise technology (ICCSDET). Kottayam, India, pp 1–6. https://doi.org/10.1109/ICCSDET.2018.8821178
Integration of Blockchain-Enabled SBT
and QR Code Technology for Secure
Verification of Digital Documents

Ashish Khanna, Devansh Singh, Ria Monga, Tarun Kumar, Ishaan Dhull,
and Tariq Hussain Sheikh

Abstract Incorporating blockchain technology and QR codes presents a potential
solution to the issues educational bodies encounter while handling and authenticating
student records in today’s digital age. This research paper puts forth a method to
handle the security and authenticity of academic documents by incorporating these
technologies. The system proposed makes efficient use of cloud storage, namely
Amazon Web Services (AWS), to store student data in a CSV format, enabling effi-
cient addition and retrieval of student data. This method uses a distinctive tokenURI
for each student, forming QR codes that act as verifiable connections to their academic
records. When these QR codes are used, students are guided to link their web3 wallet
address or form a new wallet, resulting in the production of a non-transferable Soul-
bound token (SBT) that encompasses all academic information. Additionally, the QR
codes direct users to recognized non-fungible tokens (NFT) marketplaces where the
SBTs can be publicly verified. This innovative system ensures the authenticity and
integrity of student records, providing a reliable and decentralized means for docu-
ment verification in educational institutions. This research significantly contributes
to the advancement of secure and trustworthy digital credential management systems,
addressing the evolving needs of the academic community.

Keywords Blockchain · QR code · Soulbound token · Web3 · Smart contract ·


Document verification · AWS

A. Khanna · D. Singh · R. Monga · T. Kumar (B) · I. Dhull
Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology,
GGSIPU, Delhi, India
e-mail: tarunkumar.sbb@gmail.com
A. Khanna
e-mail: ashishkhanna@mait.ac.in
T. H. Sheikh
Department of Computer Science, Shri Krishan Chander Government Degree College Poonch,
Jammu and Kashmir, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 293
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_23

1 Introduction

In recent years, the use of blockchain technology has received a lot of attention in
various industries due to its potential to create secure and transparent systems. Decen-
tralization, immutability, transparency, and security are some aspects of blockchain.
One area that stands to benefit from blockchain implementation is higher education,
where the need for secure document verification and credential authentication is of
utmost importance as highlighted in the study conducted by Gräther et al. [1]. Forgery
of digital documents is a significant risk in the digital age, enabling the creation of
fraudulent records with serious consequences. This compromises data integrity and
trust and can result in legal and financial consequences [2]. Verifying digital docu-
ments takes a significant amount of time and human effort. By utilizing blockchain
technology, this burden can be greatly decreased, and institutions can use QR codes
to make the procedure simpler.
Traditional document storage and verification methods in academic institutions
involve CSV files, which despite their usefulness are prone to security breaches
and unauthorized access. To overcome these issues, we propose a blockchain and
QR code-based solution for a more secure and tamperproof document verification
system. Initially, student data, usually in Excel format, is migrated to a secured cloud
storage service like Amazon Web Services’ S3 bucket, maintaining the integrity and
privacy of each student’s information. Each student is assigned a unique tokenURI,
encompassing pertinent details from the uploaded data. This tokenURI is subse-
quently translated into a QR code, acting as a digital signature linked to the students’
documents, enhancing accessibility and streamlining the verification process. On
receiving a document such as a degree, students scan the QR code, which directs
them to provide their web3 wallet address. If not previously available, a new wallet
can be created.
The introduction of the wallet address allows for the minting of a unique Soul-
bound token (SBT) linked to each student’s details and the prior tokenURI, securing
the document’s future verifiability. The minting of the SBT culminates in the QR
code redirecting to vetted NFT marketplaces, where the SBT can be publicly scruti-
nized and authenticated. Parties such as potential employers or academic institutions
can thus verify the document’s authenticity, leveraging the unchangeable nature of
blockchain technology.
Highlights of the proposed work
. Integration of blockchain and QR codes for secure document verification.
. Enhanced data integrity and protection against manipulation and unauthorized
access.
. Transparent and auditable verification process with immutable blockchain
records.
. Streamlined verification procedures, reducing time and potential errors.

2 Literature Review

The rapid development of digital documents and the need for secure verification
methods have led to a growing interest in the use of blockchain technology. This
literature review aims to explore the existing research and advancements in inte-
grating blockchain-enabled Soulbound tokens (SBT) and QR code technology for
the secure verification of digital documents.
Blockchain technology aims to create a decentralized environment where third-
party control is not necessary [3]. Blockchain technology has gained significant
attention due to its decentralized and tamper-resistant nature. Several studies have
demonstrated its potential in document verification and authentication. For instance,
Kumutha and Jayalakshmi [4] proposed a blockchain-based document verification
system that ensures the immutability and integrity of documents, providing a reli-
able verification method. The utilization of blockchain-based solutions for the secure
storage of medical data has garnered considerable attention in recent times [5]. Chen
et al. [6] proposed a blockchain-based searchable encryption scheme for electronic
health records (EHRs) to address the issue of data leakage and enhance patient
privacy. The scheme enables different medical organizations and individuals to
securely access and share EHRs stored on the blockchain, ensuring a higher level of
confidence in data privacy and integrity.
Various concerns persist that hinder the widespread adoption of blockchain tech-
nology in the education sector. These concerns encompass legal complexities, chal-
lenges related to immutability, and issues of scalability [7]. Alam [8] discussed
how blockchain technology can help monitor student accomplishments precisely.
QR code technology has emerged as a widely adopted method for encoding and
storing information in a compact format. Its popularity can be attributed to its ease
of use and compatibility with smartphones. For example, Wellem et al. [9] used
digital signatures and a QR code-based document verification system that enables
efficient and convenient verification. The study demonstrated the potential of QR
codes in combating document counterfeiting and improving verification processes.
Researchers have explored the integration of QR codes in document verification
systems to enhance the security and accessibility of digital documents.
Blockchain’s decentralized and secure nature has shown promise in revolution-
izing industries, especially in digital document verification [10, 11]. One approach is
to use a public blockchain, where anyone can participate in the network and verify the
authenticity of the Soulbound tokens. Sharma et al. [12] have designed a blockchain-
based application to generate, maintain, and validate healthcare certificates. Li
et al. [13] presented a workflow for credentialing, identifying issues in the industry
and proposed ideal attributes for credentialing infrastructure. It also presented a
framework for evaluating blockchain-based education projects and discussed factors
hindering their adoption. Weyl et al. [14], in their whitepaper, propose the concept of
Soulbound tokens that can function as persistent records for credit-relevant history,
encompassing education credentials, work history, and rental contracts. This inno-
vative approach enables individuals to stake meaningful reputations based on these
tokens.
Currently, universities and institutes are leveraging blockchain technology
in education primarily for managing academic degrees and evaluating learning
outcomes [15, 16]. The University of Nicosia (UNIC) [17] utilizes a program that
generates and stores certificates on the Bitcoin platform. Furthermore, UNIC became
the first university in the world to issue academic certificates whose authenticity can
be verified through the Bitcoin blockchain. Sony Global Education [18] is creating a
blockchain network to store academic records securely and allow for different config-
urations and distribution of educational data. Blockcerts [19] is an open standard
that facilitates the creation, issuance, viewing, and verification of certificates using
blockchain technology. These digital records are securely registered on a blockchain,
digitally signed, resistant to tampering, and easily shareable.
Overall, the use of Soulbound tokens [14] for digital document verification has
the potential to enhance the security and verifiability of digital documents and has
been an active area of research in recent years.

3 Methodology

The proposed methodology consists of the following steps:


Step 1: Data Extraction and Formatting
. Obtain the student data from the institute in the form of an Excel file (CSV format).
. Extract relevant information for each student, such as name, student ID, program,
and other necessary details.
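Step 1 can be sketched with the standard library alone; the field names are assumptions about the institute's export format:

```python
import csv
import io

# Stand-in for the institute's exported CSV file
raw = """name,student_id,program,email
Asha Rao,MAIT001,B.Tech CSE,asha@example.edu
Dev Mehta,MAIT002,B.Tech IT,dev@example.edu
"""

students = []
for row in csv.DictReader(io.StringIO(raw)):
    # Keep only the fields needed later for the tokenURI
    students.append({k: row[k] for k in ("name", "student_id", "program")})

print(students[0])
```

The filtered records are what would then be uploaded to the S3 bucket in step 2.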
Step 2: Uploading Data to AWS S3 Bucket
. Set up an AWS S3 bucket to store the student data securely.
. Upload the extracted student data to the S3 bucket, ensuring appropriate access
controls and encryption mechanisms are implemented.
Step 3: TokenURI Generation and Soulbound Token (SBT) Creation
. For each student, generate a unique tokenURI based on their data stored in the S3
bucket.
. Utilize blockchain technology (e.g., Ethereum) to create an SBT that contains the
student’s tokenURI and additional metadata.
. Mint the SBT to the student’s wallet address or guide them to create a new web3
wallet for receiving the SBT.
Step 4: QR Code Generation and Linking
. Create a QR code specific to each student, embedding the link to their SBT on
the blockchain.
. Associate the QR code with the student’s data and add it to the corresponding row
in the Excel file.
Step 5: Document Verification Process
. At the time of issuing a document or degree, scan the QR code on the document
using a QR code reader or a dedicated application.
. The QR code will direct the user to an interface where they can either enter their
web3 wallet address or create a new wallet.
. After adding the address, the SBT will be minted to the student’s wallet, containing
all the details (tokenURI) generated during QR code generation.
. Verify the authenticity and integrity of the document by cross-referencing the
tokenURI and associated SBT properties.
Step 6: Soulbound Token Visibility and Verification
. Once the SBT has been successfully minted, the QR code will redirect to major
or verified NFT marketplaces.
. The Soulbound token will be visible on these marketplaces, allowing users
to verify the details and properties of the SBT, ensuring its authenticity and
legitimacy (Figs. 1, 2, and 3).
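Steps 3 and 4 can be sketched as building an ERC-721-style metadata document per student and deriving the tokenURI it would live at. The bucket URL pattern and attribute names are illustrative assumptions, and the QR step (noted in a comment) would use the third-party qrcode package:

```python
import json

def make_token_uri(student: dict, bucket_url: str) -> str:
    """Return the URL of an ERC-721-style metadata JSON for one student.

    Bucket URL pattern and attribute names are illustrative assumptions.
    """
    metadata = {
        "name": f"Degree Certificate - {student['name']}",
        "description": "Soulbound token certifying the academic record.",
        "attributes": [
            {"trait_type": "student_id", "value": student["student_id"]},
            {"trait_type": "program", "value": student["program"]},
        ],
    }
    # In the real system this JSON would be uploaded to the S3 bucket;
    # here we only print it and return the URI it would live at.
    print(json.dumps(metadata, indent=2))
    return f"{bucket_url}/{student['student_id']}.json"

uri = make_token_uri(
    {"name": "Asha Rao", "student_id": "MAIT001", "program": "B.Tech CSE"},
    "https://example-bucket.s3.amazonaws.com/tokenuri",
)
# Step 4's QR image would then be produced with the third-party
# qrcode package:  qrcode.make(uri).save("MAIT001.png")
print(uri)
```

Minting the SBT with this tokenURI (step 3) then binds the on-chain token to the stored metadata.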

Complete Algorithm: QR_Verified_Docu_Auth


QR_Verified_Docu_Auth assists higher education institutions in integrating QR
codes into document verification processes. In this algorithm, student data is extracted
from an Excel file and uploaded to an AWS S3 bucket. Token URIs are generated,
and Soulbound tokens (SBTs) are created for each student. SBTs are then linked to

Fig. 1 Process for institutions



Fig. 2 Process for student

QR codes specific to each student. QR codes are scanned during document issuance,
prompting users to enter their web3 wallet addresses or create new ones. Students
receive SBTs containing all the necessary information in their wallets.
The authenticity of the documents is verified by cross-referencing the tokenURI
and SBT properties. The QR codes also redirect to verified NFT marketplaces where
the SBTs can be examined, ensuring their legitimacy. This algorithm provides a
secure and efficient method for verifying student documents through QR codes and
blockchain technology.
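In outline, the institution-side part of this pipeline can be sketched as follows. This is an illustrative sketch only: the bucket path, contract address, and claim URL are invented placeholders, the tokenURI is approximated by hashing the record locally, and a real deployment would call AWS and a web3 library (e.g. web3.py) for the S3 upload and SBT minting.

```python
import hashlib
import json

def make_token_uri(student: dict, bucket: str = "s3://institute-records") -> str:
    """Derive a deterministic tokenURI for a student record.

    The real pipeline would upload the record to an AWS S3 bucket and use
    the object URL; here the record is hashed so the sketch stays
    self-contained and reproducible.
    """
    digest = hashlib.sha256(json.dumps(student, sort_keys=True).encode()).hexdigest()
    return f"{bucket}/{student['roll_no']}/{digest}.json"

def make_qr_payload(token_uri: str, contract: str = "0xCERTIFICATE") -> str:
    """Build the string a QR code would encode: a claim link (hypothetical
    URL) that lets the student mint the SBT bound to their tokenURI."""
    return f"https://verify.example.edu/claim?contract={contract}&uri={token_uri}"

# One row of the institution's Excel sheet, as a dict
students = [{"roll_no": "21CS001", "name": "A. Student", "degree": "B.Tech CSE"}]
payloads = [make_qr_payload(make_token_uri(s)) for s in students]
```

Because the tokenURI is derived deterministically from the record, re-running the pipeline on unchanged data yields the same QR payload, which is what makes later cross-referencing possible.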

4 Performance Analysis

The effectiveness and efficiency of the proposed algorithm were evaluated, and the
results were compared against the existing manual methodology. This involved
measuring various metrics, such as time, scalability, authentication, security, and
automation, to determine the overall performance of the proposed system.

Fig. 3 Complete working

4.1 Time

Considering the average number of students in an institute [20], the proposed method-
ology takes only 1–2 h to complete, depending on the availability of resources,
whereas traditional methods require considerably more time. Uploading data to
Amazon AWS takes only a few minutes. Generating QR codes and linking them
to Soulbound tokens can be done within an hour, while tokenURI and Soulbound
token generation take several minutes per student. QR codes provide the scalability
and efficiency needed to reduce the effort of claiming and verifying documents
through SBTs, so digital documents can be verified in far less time than manual
verification allows.

4.2 Scalability

Using QR codes to both claim and verify digital documents simplifies the verifi-
cation process and significantly reduces the time consumed compared to manual
verification. Because the system can manage massive amounts of student data, it is
straightforward to handle and verify information for a large number of students.
The use of blockchain technology increases trust and security in document
verification, further improving scalability by reducing the need for human
intervention and increasing the speed and accuracy of document verification. The
integration of QR codes with SBTs therefore offers an effective solution for student
document verification, meeting the rising demand for verification with efficiency,
accuracy, and adaptability.

4.3 Authentication and Security

Blockchain technology provides strong security through cryptographic hashing,
decentralization, immutability, and consensus algorithms, which ensure that the data
recorded on the blockchain is valid, distributed across various nodes, and tamper-
proof. The integration of QR codes and blockchain technology enhances document
authentication by providing an easier, more secure, and more reliable verification
process. Each document is assigned a unique QR code. Users can quickly access the
information they need about the document and check its legitimacy by scanning the
QR code. A distinct token is created for each document and kept on the blockchain.
The associated data of the document is safe and cannot be modified. Data can be
protected from unauthorized access by using encryption, access control methods,
and secure data sharing.
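The tamper-resistance described above rests on cryptographic hashing. As a generic illustration (not the paper's code), even a one-character change to a document produces a completely different digest, so a digest recorded on-chain exposes any later modification:

```python
import hashlib

def fingerprint(document_bytes: bytes) -> str:
    # The digest that would be recorded on the blockchain; any edit to the
    # document yields a different value, exposing tampering.
    return hashlib.sha256(document_bytes).hexdigest()

original = fingerprint(b"B.Tech degree, A. Student, CGPA 9.1, 2023")
tampered = fingerprint(b"B.Tech degree, A. Student, CGPA 9.9, 2023")
```

Verification then reduces to recomputing the digest of the presented document and comparing it with the immutable on-chain copy.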

4.4 Automation

Manual document verification is time-consuming and not automated: the verifier
must contact the institution to request information or engage a third-party
verification authority to validate each student document. With Soulbound tokens,
a student can instead share a link containing all relevant information about the
document to be verified, which is then confirmed automatically through blockchain
technology. The proposed methodology uses QR codes and blockchain technology
to simplify the authentication process. The automation process begins with
collecting and structuring student data from the college or university’s records.
The extracted student data is then loaded into the AWS S3 bucket, and a Soulbound
token (SBT) is created that can be claimed by the student and verified by an
employer.

5 Conclusion and Future Scope

The proposed methodology focuses on making the process of sharing student docu-
ments authentic and secure. The integration of QR codes and blockchain technology
offers a promising solution to securely store and process student data, generate
immutable Soulbound tokens (SBTs) on the blockchain, and facilitate seamless veri-
fication through QR codes. It makes the process of validating the integrity of student
documents tamper-free and efficient. The methodology contributes to the growth
of confidence and credibility in the higher education industry by combining the
simplicity of QR codes with the transparency and immutability of the blockchain.
By investigating compatibility with other blockchains and standards, imple-
menting modern cryptographic methods, conducting customer research, and applying
machine learning algorithms, the proposed methodology can be further enhanced.
The results of the following study may help refine the strategy and support the
continued improvement of document authentication systems in the higher education
sector.

References

1. Gräther W, Kolvenbach S, Ruland R, Schütte J, Torres C, Wendland F (2018) Blockchain for
education: lifelong learning passport. In: Proceedings of 1st ERCIM blockchain workshop
2018. European Society for socially embedded technologies (EUSSET)
2. Grolleau G, Lakhal T, Mzoughi N (2008) An introduction to the economics of fake degrees. J
Econ Issues 42(3):673–693
3. Yli-Huumo J, Ko D, Choi S, Park S, Smolander K (2016) Where is current research on
blockchain technology?—a systematic review. PLoS ONE 11(10):e0163477
4. Kumutha K, Jayalakshmi S (2022) Blockchain technology and academic certificate authen-
ticity—a review. Exp Clouds Appl Proc ICOECA 2021:321–334
5. Mahajan HB, Rashid AS, Junnarkar AA, Uke N, Deshpande SD, Futane PR, Alkhayyat A,
Alhayani B (2023) Integration of Healthcare 4.0 and blockchain into secure cloud-based
electronic health records systems. Appl Nanosci 13(3):2329–2342
6. Chen L, Lee WK, Chang CC, Choo KKR, Zhang N (2019) Blockchain based searchable
encryption for electronic health record sharing. Futur Gener Comput Syst 95:420–429
7. Loukil F, Abed M, Boukadi K (2021) Blockchain adoption in education: a systematic literature
review. Educ Inf Technol 26(5):5779–5797

8. Alam A (2022) Platform utilising blockchain technology for eLearning and online education
for open sharing of academic proficiency and progress records. In: Smart data intelligence:
proceedings of ICSMDI 2022. Springer Nature Singapore, Singapore, pp 307–320
9. Wellem T, Nataliani Y, Iriani A (2022) Academic document authentication using elliptic curve
digital signature algorithm and QR code. JOIV Int J Inform Vis 6(3):667–675
10. Imam IT, Arafat Y, Alam KS, Shahriyar SA (2021) DOC-BLOCK: a blockchain based authen-
tication system for digital documents. In: 2021 third international conference on intelligent
communication technologies and virtual mobile networks (ICICV). IEEE, pp 1262–1267
11. Yumna H, Khan MM, Ikram M, Ilyas S (2019) Use of blockchain in education: a system-
atic literature review. In: Intelligent information and database systems: 11th Asian confer-
ence, ACIIDS 2019, Yogyakarta, Indonesia, April 8–11, Proceedings, Part II 11. Springer
International Publishing, pp 191–202
12. Sharma P, Namasudra S, Crespo RG, Parra-Fuente J, Trivedi MC (2023) EHDHE:
enhancing security of healthcare documents in IoT-enabled digital healthcare ecosystems using
blockchain. Inf Sci 629:703–718
13. Li ZZ, Joseph KL, Yu J, Gasevic D (2022) Blockchain-based solutions for education
credentialing system: comparison and implications for future development. In: 2022 IEEE
international conference on blockchain (blockchain). IEEE, pp 79–86
14. Weyl EG, Ohlhaver P, Buterin V (2022) Decentralized society: finding web3’s soul. Available
at SSRN 4105763
15. Sharples M, Domingue J (2016) The blockchain and kudos: a distributed system for educational
record, reputation and reward. In: Adaptive and adaptable learning: 11th European conference
on technology enhanced learning, EC-TEL 2016, Lyon, France, September 13–16, Proceedings
11. Springer International Publishing, pp 490–496
16. Chen G, Xu B, Lu M, Chen NS (2018) Exploring blockchain technology and its potential
applications for education. Smart Learn Environ 5(1):1–10
17. UNIC (2018) Blockchain certificates (academic & others). https://www.unic.ac.cy/iff/blockc
hain-certificates/
18. Sony Global Education. Creating a trusted experience with blockchain. https://blockchain.son
yged.com/
19. Blockcerts. The open standard for blockchain credentials. https://www.blockcerts.org/
20. The Times of India. 1/3rd of undergraduate students in India doing BA: survey. http://tim
esofindia.indiatimes.com/articleshow/97465229.cms?from=mdr&utm_source=contentofint
erest&utm_medium=text&utm_campaign=cppst
Time Series Forecasting of NSE Stocks
Using Machine Learning Models
(ARIMA, Facebook Prophet,
and Stacked LSTM)

Prabudhd Krishna Kandpal, Shourya, Yash Yadav, and Neelam Sharma

Abstract It is widely recognised and acknowledged among market observers and
analysts that the stock market, by its very nature, exhibits a tremendous degree
of volatility, resulting in frequent and substantial fluctuations. Consequently, the
ability to accurately anticipate and forecast market trends assumes paramount impor-
tance when it comes to making well-informed decisions regarding the buying and
selling of stocks. To achieve such predictive capabilities, the focus of this partic-
ular research endeavour is specifically centred around leveraging advanced machine
learning models, including but not limited to AutoRegressive Integrated Moving
Average (ARIMA), Prophet, as well as deep learning models such as Long Short-
Term Memory (LSTM). Root Mean Squared Error (RMSE) is utilised to assess the
performance and efficacy of these models. Therefore, the results emanating from this
meticulously conducted study contribute invaluable insights and shed light on the
comparative effectiveness of different models within the realm of time series fore-
casting. Importantly, the prevailing body of evidence strongly supports the notion that
deep learning-based algorithms, such as LSTM, hold a distinct advantage over tradi-
tional statistical methods like the ARIMA model, thereby reinforcing their superiority
in this domain.

Keywords Deep learning · Long Short-Term Memory (LSTM) · AutoRegressive
Integrated Moving Average (ARIMA) · Prophet · Time series forecasting ·
Machine learning

P. K. Kandpal (B) · Shourya · Y. Yadav · N. Sharma
Department of Artificial Intelligence and Machine Learning, Maharaja Agrasen Institute of
Technology, Delhi, India
e-mail: prabudhd2003@gmail.com
N. Sharma
e-mail: neelamsharma@mait.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 303
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_24

1 Introduction

There are numerous techniques available for addressing time series forecasting prob-
lems. While these procedures assist in drawing conclusions, none of them guarantees
accurate results, so it is necessary to carefully weigh the pros and cons of each
method before applying it.
The ARIMA model is commonly used for stock predictions due to its simplicity,
ability to identify time-dependent patterns, and provision of statistical insights.
However, it has limitations in capturing nonlinear patterns, requires stationary data,
does not consider exogenous variables, and is less accurate for long-term forecasts.
Combining ARIMA models with advanced methodologies can improve accuracy
and overcome these limitations in real-world usage.
Prophet offers a user-friendly and efficient means of conducting time series anal-
ysis for stock forecasting. It provides a quick and simple solution without requiring
substantial modification. However, its rudimentary assumptions and limited control
might restrict its applicability in complex scenarios that demand better modelling
methodologies or detailed integration of external elements.
LSTM models for stock predictions offer benefits such as long-term fore-
casting, handling sequential data, incorporating exogenous factors, and capturing
complex patterns. However, they require a large amount of data, are computationally
demanding, are highly susceptible to overfitting, and lack interpretability. Despite
these shortcomings, LSTM models are often used in combination with various
methods to enhance accuracy for stock forecasting purposes.
This paper follows a clear and organised structure. It begins by introducing the
dataset used in the research and providing the necessary background information.
The different models utilised are then listed and explained, offering a comprehensive
overview of the methodologies employed. The proposed methodology is presented
in detail, describing the specific approach taken in the research. Subsequently, the
paper presents the results obtained from implementing these models, emphasising the
outcomes of the analysis. A thorough examination and analysis of these results are
conducted to gain a deeper understanding of the findings. Finally, the paper concludes
by summarising the findings of the proposed model, discussing their implications,
and acknowledging the limitations and potential for future research in the field.
The motivations of this paper can be summarised as follows:
1. This study aims to compare the performance of three popular forecasting models,
namely ARIMA, Facebook Prophet, and LSTM, in predicting stock prices. By
analysing the results across multiple companies with varying levels of stability
and volatility, the study aims to provide insights into the strengths and weaknesses
of each model.
2. Accurate stock predictions are crucial for financial decision-making. The
study aims to emphasise the importance of accurate forecasting for long-
term predictions and highlights how advanced models like LSTM can enhance
accuracy.

3. The study also aims to provide insights into the robustness and adaptability of
these models by examining their performance in both stable and volatile market
conditions.
Overall, this study contributes to the field of stock price prediction by comparing
the performance of different forecasting models and highlighting the potential of deep
learning-based algorithms. The findings have practical implications for researchers
and practitioners, paving the way for further advancements in the field of time series
forecasting.

2 Literature Review

Siami-Namini et al. have suggested that forecasting time series data presents a
formidable challenge, primarily due to the ever-evolving and unpredictable nature of
economic trends and the presence of incomplete information. Notably, the increasing
volatility witnessed in the market over recent years has raised significant concerns
when it comes to accurately predicting economic and financial time series. Conse-
quently, it becomes imperative to evaluate the precision and reliability of forecasts
when employing diverse forecasting methodologies, with a particular focus on regres-
sion analysis. This is crucial since regression analysis, despite its utility, possesses
inherent limitations in its practical applications [1]. Pang et al. have discovered
that neural networks have found extensive applications across a range of fields,
including pattern recognition, financial securities, and signal processing. Particularly
in the realm of stock market forecasting, neural networks have garnered consider-
able acclaim for their effectiveness in regression and classification tasks. Never-
theless, it is important to note that conventional neural network algorithms may
encounter challenges when attempting to accurately predict stock market behaviour.
One such obstacle arises from the issue of random weight initialisation, which can
lead to a susceptibility to local optima and, consequently, yield incorrect predictions
[2]. This study aims to accurately predict the closing price of various NSE stocks
using machine learning methods like ARIMA, Prophet, and deep learning models,
namely the LSTM model. Stock market forecasting has traditionally relied on linear
models such as AutoRegressive (AR), AutoRegressive Moving Average (ARMA),
and AutoRegressive Integrated Moving Average (ARIMA). However, a notable limi-
tation of these models is their specificity to a particular time series dataset. In other
words, a model that performs well for forecasting the stock market behaviour of one
company may not yield satisfactory results when applied to another company. This
can be attributed to the inherent ambiguity and unpredictable nature of the stock
market, which inherently carries a higher level of risk compared to other sectors.
Consequently, this inherent complexity and risk associated with stock market predic-
tion significantly contribute to the difficulty of accurately forecasting stock market
trends [3]. There are several reasons why deep learning models have come to be
significantly successful in comparison to traditional machine learning and statistical

models, and their usage has been on the rise for several decades. Models such as
the LSTM model possess the ability to take into account the temporal dependencies
present in time series data. Secondly, these models are very successful in extracting
features from raw data, eliminating the need for manual feature extraction. Moreover,
deep learning models are capable of accommodating both univariate and multivariate
time series data, with even irregular and unevenly spaced data points. With the latest
advancements in parallel computing and GPUs, deep learning models can be trained
and optimised on large-scale data. Saiktishna et al. [4] focus on the utilisation of
the FB Prophet model for historical analysis of stock markets and time series fore-
casting. It explores the techniques, conclusions, and limits of previous research in
this field. The evaluation emphasises FB Prophet’s ability to capture market patterns
and seasonality, as well as future research possibilities. It contains useful information
for scholars and practitioners who want to use FB Prophet for stock market study
and forecasting.
Numerous publications in the literature have made an effort to investigate the
hybrid modelling of financial time series movement using various models. He et al. [5]
utilised a hybrid model using the ARMA and CNN-LSTM model to accurately predict
the financial market by applying it to three different time series with different levels
of volatility. They presented that optimisations are still possible to machine learning
and deep learning models, given the rapid development in the aforementioned fields.
Fang et al. [6] proposed a novel approach using the dual-LSTM approach, which
consisted of two LSTM layers with batch normalisation which addressed the problem
of sharp point changes by capturing significant profit points using an adaptive cross-
entropy loss function, enhancing the model’s prediction capabilities. Gajamannage
et al. [7] have emphasised the importance of real-time forecasting and presented
its importance in risk analysis and management. They have put forth a sequentially
trained dual-LSTM model which has addressed the issue of semi-convergence in a
recurrent LSTM setup and has validated their results based on various diverse finan-
cial markets. Patil [8] highlights the use of machine learning approaches for stock
market forecasting, including ARIMA, support vector machines, random forest, and
recurrent neural networks (RNNs). The paper explores the strengths and limitations
of these models, emphasising the importance of feature engineering and selection in
improving prediction accuracy. It also includes case studies and empirical research
to show how these models may be used in stock price forecasting.
It is evident from the above examples that LSTM is a robust model that is excellent
for the purpose of time series analysis and forecasting. It has displayed its capabil-
ities in various other fields, including energy consumption forecasting, wind speed
forecasting, carbon emissions forecasting, and aircraft delays forecasting.

3 Dataset Description

The historical stock data used in this research study have been sourced from Yahoo
Finance. The dataset encompasses the stock prices of four prominent companies:
Reliance Industries, Tata Steel LLC, ICICI Bank, and Adani Enterprise.
The selection of Reliance Industries, Tata Steel LLC, ICICI Bank, and Adani
Enterprise for this research study was based on specific criteria. Reliance Industries,
Tata Steel LLC, and ICICI Bank were chosen due to their stable stock performance
and a general upward trend observed over time. These stocks have demonstrated
consistent growth and are considered relatively stable investments. In contrast, Adani
Enterprise was included in the dataset because it is known for its high volatility,
with stock prices being heavily influenced by market reports and external factors.
By including a mix of stocks with different characteristics, such as stability and
volatility, we aim to accurately assess the capabilities of our predictive models in
handling various stock market scenarios and understanding their effectiveness in
different market conditions.
Within the dataset, two crucial components play a pivotal role in our modelling
approach: the ‘Close’ and ‘Date’ variables. The ‘Close’ variable signifies the last
recorded price at which a particular stock was traded during the regular hours of a
trading session. It serves as the target variable in our predictive models, as we aim to
forecast future stock prices based on historical trends and patterns. On the other hand,
the ‘Date’ variable acts as the predictor variable, providing temporal information to
aid in the prediction process. For the purpose of this research, the dataset spans from
the inception of each respective company until May 1, 2023, thereby covering a
substantial period of historical data (Fig. 1).

Fig. 1 Sample dataset of Reliance Industries obtained from Yahoo Finance



Table 1 Number of time series observations

Stock                 Train (90%)   Test (10%)   Total
Reliance Industries   6182          687          6869
Tata Steel LLC        6184          688          6872
ICICI Bank            4656          518          5174
Adani Enterprise      4658          518          5176

4 Data Preparation

The financial time series datasets were divided into two parts: a training dataset and
a test dataset. The training dataset consisted of 90% of each dataset and was used to
train the models. The remaining 10% of each dataset was allocated to the test dataset
to evaluate the accuracy of the models. The number of time series observations for
each dataset is provided in Table 1.
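The chronological 90/10 split in Table 1 can be reproduced in a few lines. This is a sketch of the standard approach, not the authors' code: the split must preserve temporal order, since shuffling would leak future prices into training.

```python
def temporal_split(series, train_frac=0.9):
    """Split a time series chronologically: the first train_frac of the
    observations train the model, the remainder evaluates it. The temporal
    order is preserved so no future information leaks into training."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

closes = list(range(6869))  # stand-in for Reliance Industries' 6869 closes
train, test = temporal_split(closes)
```

With 6869 observations this yields exactly the 6182/687 split shown for Reliance Industries in Table 1.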

5 Assessment Metric

We have utilised the Root Mean Square Error (RMSE) to evaluate the precision of
our model’s predictions. RMSE measures the differences or residuals between the
predicted and actual values. By employing RMSE, we have been able to compare
prediction errors within the same dataset across different models rather than between
different datasets. The formula used to calculate RMSE is as follows:
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2},   (1)

where N is the total number of observations, x_i is the actual value of the stock,
and \hat{x}_i is the value predicted by our model.
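Equation (1) translates directly into code; a minimal standard-library version:

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error per Eq. (1): the square root of the mean
    squared residual between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

error = rmse([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])  # residuals 1, 0, -1
```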

6 Models

6.1 ARIMA

ARIMA is a widely used time series forecasting model that combines autoregressive
(AR), differencing (I), and moving average (MA) components to capture linear rela-
tionships, stationarity, and dependencies within the data [9]. We have made use of
rolling ARIMA to perform forecasting, meaning that we refitted the model at

Table 2 ARIMA model parameters

Stock                 ARIMA model (p, d, q)   RMSE
Reliance Industries   (5, 2, 0)               41.62
Tata Steel LLC        (2, 1, 2)               18.62
ICICI Bank            (4, 1, 4)               11.28
Adani Enterprise      (4, 2, 4)               83.27

RMSE values are for the entire test set.

each iteration as new data becomes available. This allowed our model to continuously
adapt to the most recent data, enhancing its accuracy and robustness of the forecast.
In ARIMA modelling, the notation ARIMA (p, d, q) is commonly used, where [1]:
• ‘p’ represents the number of lag observations used in training the model (i.e. lag
order).
• ‘d’ denotes the number of times differencing is applied (i.e. degree of differencing).
• ‘q’ indicates the size of the moving average window (i.e. order of moving average).
To determine the appropriate values for these parameters, we utilised the Autocor-
relation Function (ACF) graph and Partial Autocorrelation (PACF) graphs [10]. The
ACF graph provided insights into the correlation between the current observation
and lagged observations at various time lags. Meanwhile, the PACF graphs helped
us assess the correlation between the current observation and the residuals from
previous observations, taking into account the effects of intermediate observations.
By carefully examining these graphs, we were able to estimate the optimal values for
the AR, MA, and differencing components of the ARIMA model for each dataset.
They are listed in Table 2.
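The rolling procedure described above can be sketched as follows. To keep the example dependency-free, a closed-form least-squares AR(1) fit stands in for the full ARIMA(p, d, q) fit; in practice each step would refit a library ARIMA model (e.g. statsmodels) on all data seen so far.

```python
def fit_ar1(history):
    """Least-squares AR(1) fit: x_t ≈ c + phi * x_{t-1}."""
    x_prev, x_curr = history[:-1], history[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_curr = sum(x_curr) / n
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(x_prev, x_curr))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var if var else 0.0
    c = mean_curr - phi * mean_prev
    return c, phi

def rolling_forecast(series, train_len):
    """Refit on all data observed so far before every one-step forecast,
    mirroring the rolling-ARIMA procedure described above."""
    preds = []
    for t in range(train_len, len(series)):
        c, phi = fit_ar1(series[:t])        # refit at each iteration
        preds.append(c + phi * series[t - 1])
    return preds

series = [100.0, 101.0, 102.5, 101.5, 103.0, 104.2, 103.8, 105.0]
preds = rolling_forecast(series, train_len=5)
```

The key point is the refit inside the loop: the model continuously absorbs the newest observation before producing the next one-step-ahead forecast.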

6.2 Facebook Prophet

Prophet is an open-source library developed by Facebook and is generally used for
univariate time series forecasting. Prophet [11] is a decomposable framework that
reduces an elaborate problem, such as time series data prediction, into simpler ones
and does so by taking three factors into account: seasonality, holidays, and trend.

y(t) = g(t) + s(t) + h(t) + ε. (2)

Here, g(t) represents the trend, s(t) represents seasonality, h(t) represents holidays,
and ε represents the error term. The trend parameter monitors two additional
parameters: saturation growth and change points. Seasonality, another factor that
Prophet considers, is modelled using a Fourier series to create a precise final model.

s(t) = \sum_{n=1}^{N}\left(a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P}\right).   (3)

Here, s(t) denotes seasonality, and P denotes the time period, which might be daily,
weekly, monthly, quarterly, or even annual. N is the order of the Fourier series (the
frequency of change), and the parameters a_n and b_n depend on it. Prophet [12] is
adept at handling missing data, trend changes, and outliers. Compared to previous
time series forecasting techniques, Prophet generates forecasts that are both faster
and more precise.
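Equation (3) is straightforward to evaluate directly; the coefficients below are made-up values purely for illustration:

```python
import math

def seasonality(t, a, b, period):
    """Evaluate the truncated Fourier series of Eq. (3): order N = len(a),
    period P, with coefficients a_n, b_n for n = 1..N."""
    return sum(
        a[n] * math.cos(2 * math.pi * (n + 1) * t / period)
        + b[n] * math.sin(2 * math.pi * (n + 1) * t / period)
        for n in range(len(a))
    )

# Weekly seasonality: P = 7 days, Fourier order N = 2, arbitrary coefficients
a_n, b_n = [0.3, 0.1], [0.2, -0.05]
s0 = seasonality(0, a_n, b_n, period=7)  # at t = 0 only the cosine terms remain
```

By construction the series repeats with period P, which is exactly the behaviour Prophet exploits to model weekly or yearly cycles.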

6.3 LSTM

Long Short-Term Memory (LSTM) is a variation of the Recurrent Neural Network
(RNN) which is often used in time series forecasting due to its ability to take the
temporal dependencies of the time series data into account. Before diving into LSTM,
we will first go over the following to develop a better understanding:
(a) Layered-formatted neurons make up the core of Feedforward Neural Networks
(FFNNs) [13]. Each neuron updates its values using an optimisation algorithm,
such as Gradient Descent, Adam optimiser, and computes values based on
randomly initialised weights. FFNNs are loop-free and completely linked. Every
FFNN has three layers of neurons: an input layer that receives input from users,
a hidden layer that allows the network to learn complex patterns and relation-
ships in data, and an output layer that produces the output based on the input
from the last layer. In an FFNN, each layer of neurons feeds information to the
layer above it.
(b) Recurrent Neural Networks (RNNs) [1] are special neural networks where the
outputs are partially dependent on a series of outputs obtained in the previous
stages. The hidden layers in an RNN network work as memory units that hold
this information and use it during computation. The only drawback of RNNs is
that they are only capable of learning a small number of previous stages, which
makes them incapable of remembering long sequences of data. The LSTM
model solves this issue by introducing a ‘memory’ line.
(c) Long Short-Term Memory (LSTM) [14] is an improvement of the RNN model.
It is equipped with input, output, and forgetting gates to accommodate the effects
of longer time intervals and delays while also solving the problem of vanishing
gradient and exploding gradient. The structure of an LSTM cell is shown in
Fig. 2:
In Fig. 2, h(t) and h(t−1) represent the outputs of the current and previous cell,
x(t) represents the input of the current cell, and c(t) and c(t−1) represent the current
and previous states of the neuron at t. i(t) represents the input threshold which deter-
mines the information gain with the sigmoid function, and o(t) represents the output

Fig. 2 LSTM cell

threshold which determines the output neuron state using the sigmoid function and
the tanh activation function. f (t) represents the forgetting threshold which controls
the information that is discarded with the help of the sigmoid function.
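The gate arithmetic described above can be written out for a single scalar cell. Real LSTM layers use weight matrices over vectors; the scalar weights here are arbitrary values chosen only to make the sketch runnable:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, w):
    """One scalar LSTM step with the gates described above. w holds
    (input weight, recurrent weight, bias) for each gate."""
    f_t = sigmoid(w["f"][0] * x_t + w["f"][1] * h_prev + w["f"][2])    # forget gate f(t)
    i_t = sigmoid(w["i"][0] * x_t + w["i"][1] * h_prev + w["i"][2])    # input gate i(t)
    o_t = sigmoid(w["o"][0] * x_t + w["o"][1] * h_prev + w["o"][2])    # output gate o(t)
    g_t = math.tanh(w["g"][0] * x_t + w["g"][1] * h_prev + w["g"][2])  # candidate state
    c_t = f_t * c_prev + i_t * g_t   # new cell state c(t): kept memory + gated input
    h_t = o_t * math.tanh(c_t)       # new hidden state h(t)
    return h_t, c_t

w = {k: (0.5, 0.5, 0.0) for k in "fiog"}
h, c = lstm_cell(x_t=1.0, h_prev=0.0, c_prev=0.0, w=w)
```

The additive update of c(t) is what lets gradients flow across long time spans, which is how the cell avoids the vanishing-gradient problem of plain RNNs.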

7 Proposed Methodology

For the purpose of this study, we perform univariate time series forecasting, following
these steps:
Data Collection: The historical time series data relevant to the problem were
gathered. Factors like data quality, missing values, outliers, and potential seasonality
or trends in the data were considered.
Data Preprocessing: The data were prepared for modelling by performing various
preprocessing steps. Missing values were handled by deciding on a strategy to fill
or impute them, such as forward/backward filling, interpolation, or using statistical
methods. Scaling and normalisation techniques were applied to normalise the data
to a common scale, such as Min–Max scaling, to improve model convergence and
performance.
Train–Test Split: The data were split into training and testing sets. Typically,
a larger portion was allocated for training, while a smaller portion was kept for
evaluating the model’s performance. The split maintains the temporal order of the
data to simulate real-world forecasting scenarios.
Model Selection: The appropriate machine learning or deep learning models were
chosen for the time series forecasting task. These models were chosen for time series
forecasting:
Autoregressive model (ARIMA) captures the dependency of the current observa-
tion on previous observations.
Long Short-Term Memory (LSTM) model is a type of RNN model specifically
designed to capture long-term dependencies in time series data.
The Prophet model combines the flexibility of generalised additive models with
the simplicity of traditional forecasting methods. It incorporates seasonality, trend

Table 3 RMSEs of ARIMA, Prophet, and LSTM models for the last 100 days of forecast

Stock                 ARIMA    Prophet    LSTM    Lowest RMSE
Reliance Industries   32.38    286.16     14.81   LSTM
Tata Steel LLC        11.29    34.63      4.86    LSTM
ICICI Bank            9.73     33.91      9.36    LSTM
Adani Enterprise      155.54   1035.87    81.2    LSTM

changes, and holiday effects, making it effective for predicting time series data with
intuitive and interpretable results.
Model Training: The selected model was trained using the training dataset. During
training, the model learns to capture patterns, trends, and seasonality in the data.
Hyperparameters (e.g. learning rate, batch size, number of layers) were adjusted
through experimentation.
Model Evaluation: The trained model’s performance on the testing dataset was
evaluated using appropriate evaluation metrics for time series forecasting, such as
Root Mean Squared Error (RMSE). The model’s ability to generalise and make
accurate predictions on unseen data was assessed.
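RMSE is the square root of the mean squared deviation between predictions and actuals, so it is expressed in the same units as the data (here, rupees). A small self-contained example with hypothetical values:

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    sq_err = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq_err) / len(sq_err))

actual = [100.0, 102.0, 101.0]
predicted = [101.0, 101.0, 103.0]
print(round(rmse(actual, predicted), 3))  # 1.414
```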
Model Refinement: If the initial model’s performance was not satisfactory, the
model was refined by adjusting hyperparameters, trying different architectures, or
employing regularisation techniques to improve the model’s accuracy and gener-
alisation capabilities. This step was iterated until the desired performance was
achieved.
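The refinement loop amounts to a search over hyperparameters guided by validation error. A toy sketch, in which a moving-average "model" and its window size stand in for the real LSTM/ARIMA tuning (all values hypothetical):

```python
import math

def rmse(actual, predicted):
    sq_err = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq_err) / len(sq_err))

def moving_average_forecast(train, horizon, window):
    """Naive stand-in model: predict the mean of the last `window` points."""
    level = sum(train[-window:]) / window
    return [level] * horizon

train = [10.0, 11.0, 12.0, 13.0, 14.0]   # hypothetical training tail
valid = [15.0, 15.0]                      # hypothetical held-out values

best = None
for window in (1, 2, 3):                  # candidate hyperparameter values
    score = rmse(valid, moving_average_forecast(train, len(valid), window))
    if best is None or score < best[1]:
        best = (window, score)
print(best)  # the smallest window tracks the upward trend best: (1, 1.0)
```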

8 Observations and Results

The experimental findings have been summarised in Table 3, which provides a
comprehensive overview of the performance of the three models across the selected
set of four stocks (Figs. 3, 4, 5, and 6).

9 Result Analysis

Considering the higher level of comparability between the predictions of the ARIMA
and Stacked LSTM models compared to Prophet, we have done a detailed perfor-
mance evaluation of these two models. To gain more profound insights into the
accuracy of these models for stock price prediction, we have conducted our anal-
ysis individually for each company. This comprehensive approach has allowed us to
thoroughly assess the performance of ARIMA and Stacked LSTM models across all
four companies. Additionally, we have focused on the final 30 days of the forecasted
period for each company, enabling us to make precise evaluations of these models’
effectiveness.

Time Series Forecasting of NSE Stocks Using Machine Learning … 313

Fig. 3 a ARIMA and LSTM models’ last 100 days’ forecast on RELIANCE NSE stock. b Facebook
Prophet model’s forecast on RELIANCE NSE stock

Fig. 4 a ARIMA and LSTM models’ last 100 days’ forecast on TATA STEEL NSE stock.
b Facebook Prophet model’s forecast on TATA STEEL NSE stock

Fig. 5 a ARIMA and LSTM models’ last 100 days’ forecast on ICICI BANK NSE stock.
b Facebook Prophet model’s forecast on ICICI BANK NSE stock
A. Reliance Industries

Fig. 6 a ARIMA and LSTM models’ last 100 days’ forecast on ADANI NSE stock. b Facebook
Prophet model’s forecast on ADANI NSE stock

Fig. 7 a ARIMA model’s last 30 days’ forecast on RELIANCE NSE stock. b LSTM model’s last
30 days’ forecast on RELIANCE NSE stock

ARIMA: The ARIMA model for Reliance Industries achieved an RMSE of
32.26, indicating an average deviation of approximately Rs 32.26 between the
predicted and actual stock prices (Fig. 7).
Stacked LSTM: The LSTM model for Reliance Industries achieved a lower
RMSE of 9.26, implying an average deviation of approximately Rs 9.26.
Comparing the two models, both LSTM and ARIMA effectively captured the
trends in Reliance’s stock prices, as evident from the graphs. However, the LSTM
model appears to be more accurate in mapping these trends. The significantly lower
RMSE of 9.26 for the LSTM model suggests that it better captured the underlying
patterns in Reliance’s stock prices compared to ARIMA.
B. Tata Steel LLC:
ARIMA: The ARIMA model for Tata Steel achieved an RMSE of 8.1, indicating
an average deviation of approximately Rs 8.1 between the predicted and actual stock
prices (Fig. 8).
Stacked LSTM: The LSTM model for Tata Steel achieved a lower RMSE of 4.26,
implying an average deviation of approximately Rs 4.26.
Comparing the two models, both LSTM and ARIMA effectively captured the
trends in Tata Steel’s stock prices, as evident from the graphs. However, the LSTM
model demonstrated greater accuracy in mapping these trends. The significantly
lower RMSE of 4.26 for the LSTM model suggests that it better captured the
underlying patterns in Tata Steel’s stock prices compared to ARIMA.

Fig. 8 a ARIMA model’s last 30 days’ forecast on TATA STEEL NSE stock. b LSTM model’s
last 30 days’ forecast on TATA STEEL NSE stock
C. ICICI Bank:
ARIMA: The ARIMA model for ICICI Bank achieved an RMSE of 8.89, indi-
cating an average deviation of approximately Rs 8.89 between the predicted and
actual stock prices (Fig. 9).
Stacked LSTM: The LSTM model for ICICI Bank achieved a lower RMSE of
7.93, implying an average deviation of approximately Rs 7.93.
Comparing the two models, both LSTM and ARIMA effectively captured the
trends in ICICI Bank’s stock prices, as evident from the graphs. However, the
LSTM model demonstrated greater accuracy in mapping these trends. The signifi-
cantly lower RMSE of 7.93 for the LSTM model suggests that it better captured the
underlying patterns in ICICI Bank’s stock prices compared to ARIMA.
Fig. 9 a ARIMA model’s last 30 days’ forecast on ICICI BANK NSE stock. b LSTM model’s last
30 days’ forecast on ICICI BANK NSE stock

D. Adani Enterprises:

Fig. 10 a ARIMA model’s last 30 days’ forecast on ADANI NSE stock. b LSTM model’s last
30 days’ forecast on ADANI NSE stock

ARIMA: The ARIMA model for Adani Enterprises achieved an RMSE of 58.77,
indicating an average deviation of approximately Rs 58.77 between the predicted
and actual stock prices (Fig. 10).
Stacked LSTM: The LSTM model for Adani Enterprises achieved a slightly lower
RMSE of 58.0, implying an average deviation of approximately Rs 58.0.
Both the LSTM and ARIMA models achieved some success in capturing the
trends in Adani Enterprises’ stock prices, as seen in the graphs, but had high RMSE
values, indicating notable deviations from the actual prices. Adani’s stock is highly
volatile, influenced by market reports and external factors, making accurate predic-
tions challenging. Although the Stacked LSTM model was the better of the two,
accurately forecasting Adani Enterprises’ stock prices still has room for
improvement due to the inherent uncertainty and complexity of market dynamics.

10 Limitations

While the findings of this research provide valuable insights into the performance of
LSTM, ARIMA, and Facebook Prophet for stock price prediction, there are a few
limitations to consider:
1. Lack of External Factors: The models used in this study solely relied on histor-
ical stock price data as input. External factors, such as company-specific news,
industry trends, or global economic events, were not incorporated into the anal-
ysis. These external factors can significantly influence stock prices and might
enhance the accuracy of the predictions if considered.
2. Limited Generalizability: The study focused on a specific set of companies,
namely Reliance Industries, Tata Steel LLC, ICICI Bank, and Adani Enterprise.
The findings may not apply to other stocks or industries. The performance of the
models could vary when applied to different datasets with diverse characteristics.
3. Limited Scope of Model Selection: While the research compared LSTM,
ARIMA, and Facebook Prophet models, it is important to note that there
are several other advanced forecasting algorithms available, such as gradient
boosting machines and Long Short-Term Memory networks with attention mech-
anisms. Exploring a wider range of forecasting techniques could provide addi-
tional insights and potentially reveal alternative models that could yield different
results.
Addressing these limitations and conducting further research would enhance the
robustness and applicability of the findings, leading to a more comprehensive under-
standing of the strengths and weaknesses of different forecasting models in stock
price prediction.

11 Conclusion

The findings of this study highlight the superior performance of LSTM, a deep
learning-based algorithm, in comparison to ARIMA and Facebook Prophet for stock
price prediction across Reliance Industries, Tata Steel LLC, and ICICI Bank. These
companies, known for their stability, exhibited low RMSE values when analysed
using both ARIMA and Stacked LSTM models. However, when applied to the highly
volatile stock of Adani Enterprise, the models yielded higher RMSE values. Despite
this, the models were successful in capturing the general trend of the stock. It is worth
noting that while ARIMA demonstrated good overall performance, LSTM consis-
tently outperformed it in terms of accuracy. The limitations of Facebook Prophet
in handling time series with little or no seasonality, such as stock prices, were also
evident in this study.
This research highlights the advantages of deep learning-based algorithms in
analysing economic and financial data, providing valuable insights for finance and
economics researchers and practitioners. It calls for further exploration of these tech-
niques in different datasets containing varying features, expanding our understanding
of the improvements that can be achieved through deep learning in various domains.
In summary, this study contributes to the comparative performance analysis of
ARIMA, Prophet, and LSTM models in stock price prediction. It supports the notion
that deep learning-based algorithms, particularly LSTM, show promise in enhancing
prediction accuracy. It also recognises the reliability of ARIMA for stock price
prediction and acknowledges the limitations of Prophet for time series lacking strong
seasonality.

12 Social Impact

The study conducted in this paper, although not achieving highly precise stock price
prediction, has demonstrated the effectiveness of two models: ARIMA and Stacked
LSTM, in accurately forecasting market trends. This is evident from the 30-day
forecast presented in Sect. 9. While predicting the exact stock price is considered
an extremely challenging task, the ability to forecast market trends can provide
valuable assistance to society in the following ways:
1. Risk Management: Accurate trend forecasting helps investors and institutions
manage stock market risks, optimising investment strategies to minimise losses
and maximise returns.
2. Market Timing: Understanding market trends enables effective investment timing
and optimising buying and selling decisions to capitalise on opportunities and
enhance investment performance.
3. Strategic Planning: Accurate trend forecasting informs businesses’ strategic plan-
ning, aligning product development, marketing, and expansion strategies with
market dynamics for competitive advantage and informed resource allocation.
4. Economic Analysis: Trend forecasting contributes to understanding the overall
economy, providing insights into industry health, market sentiments, and
potential economic shifts, and aiding policymakers in decision-making.
5. Algorithmic Trading Strategies: The findings of this study can benefit algorithmic
trading developers, enhancing trading algorithm performance for more profitable
automated strategies.
In conclusion, while precise stock price prediction may be challenging, the ability
to forecast market trends, as demonstrated by the ARIMA and Stacked LSTM
models in this study, offers significant benefits to society. From risk management and
strategic planning to economic analysis, accurate trend forecasting supports informed
decision-making in various domains, contributing to better financial outcomes and
market understanding.

13 Future Scope

This research opens up several avenues for future investigation and expansion. Here,
we outline some potential directions and areas of exploration that can contribute to
the advancement of this field:
1. Incorporating External Factors: To enhance the predictive accuracy of the models,
future research can consider integrating external factors such as company-specific
news, industry trends, macroeconomic indicators, and market sentiment into the
analysis. This can provide a more comprehensive understanding of the factors
influencing stock prices and improve the models’ ability to capture complex
market dynamics.
2. Comparing with Other Advanced Forecasting Techniques: While this research
focused on LSTM, ARIMA, and Facebook Prophet, there are numerous other
advanced forecasting techniques available. Future studies could expand the
model selection and compare the performance of additional algorithms, such as
gradient boosting machines, support vector machines, or ensemble methods. This
comparative analysis can shed light on the relative strengths and weaknesses of
various forecasting approaches.

3. Real-Time Prediction and Adaptive Models: Another interesting avenue for
future research is to evaluate the performance of the models in real-time prediction
scenarios. This would involve updating the models with the latest available
data and assessing their ability to adapt to changing market conditions. Devel-
oping adaptive models that can adjust their predictions dynamically based on
new information can be valuable for investors and financial institutions.
4. Integration of Hybrid Models: Hybrid models that combine the strengths of
different forecasting techniques can be explored. For example, integrating the
strengths of LSTM and ARIMA in a hybrid model may provide improved
forecasting accuracy. Investigating the effectiveness of such hybrid models can
contribute to the development of more robust and accurate prediction systems.
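One simple form such a hybrid can take, sketched below, is a weighted average of the two models’ forecasts; the weight and the forecast values are hypothetical (another common design instead trains the LSTM on ARIMA’s residuals):

```python
def blend_forecasts(f_a, f_b, w=0.5):
    """Weighted average of two forecast sequences; w is the weight
    on the first model (here a hypothetical, validation-tuned value)."""
    return [w * a + (1 - w) * b for a, b in zip(f_a, f_b)]

arima_pred = [100.0, 102.0, 104.0]  # hypothetical ARIMA forecasts
lstm_pred = [98.0, 101.0, 105.0]    # hypothetical LSTM forecasts
blended = blend_forecasts(arima_pred, lstm_pred, w=0.4)
print([round(x, 2) for x in blended])  # [98.8, 101.4, 104.6]
```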
By addressing these future directions, researchers can further advance the field
of stock price prediction and deepen our understanding of the capabilities and
limitations of different forecasting models.

References

1. Siami-Namini S, Tavakoli N, Siami Namin A (2018) A comparison of ARIMA and LSTM in
forecasting time series. In: 2018 17th IEEE international conference on machine learning and
applications (ICMLA), pp 1394–1401
2. Pang X, Zhou Y, Wang P, Lin W, Chang V (2020) An innovative neural network approach
for stock market prediction. J Supercomput 76(3):2098–2118. https://doi.org/10.1007/s11227-
017-2228-y
3. Hiransha M, Gopalakrishnan EA, Menon VK, Soman KP (2018) NSE stock market prediction
using deep-learning models. Procedia Comput Sci 132:1351–1362. https://doi.org/10.1016/j.
procs.2018.05.050
4. Saiktishna C, Sumanth NS, Rao MM, Thangakumar J (2022) Historical analysis and time
series forecasting of stock market using FB prophet. In: 2022 6th International conference on
intelligent computing and control systems (ICICCS), pp 1846–1851
5. He K, Yang Q, Ji L, Pan J, Zou Y (2023) Financial time series forecasting with the deep learning
ensemble model. Mathematics 11(4):1054. https://doi.org/10.3390/math11041054
6. Fang Z, Ma X, Pan H, Yang G, Arce GR (2023) Movement forecasting of financial time series
based on adaptive LSTM-BN network. Expert Syst Appl 213:119207
7. Gajamannage K, Park Y, Jayathilake DI (2023) Real-time forecasting of time series in financial
markets using sequentially trained dual-LSTMs. Expert Syst Appl 223:119879. https://doi.org/
10.1016/j.eswa.2023.119879
8. Patil R (2021) Time series analysis and stock price forecasting using machine learning
techniques 19. https://doi.org/10.1994/Rajat/AI
9. Jamil H (2022) Inflation forecasting using hybrid ARIMA-LSTM model. Laurentian University
of Sudbury
10. Zhang R, Song H, Chen Q, Wang Y, Wang S, Li Y (2022) Comparison of ARIMA and LSTM for
prediction of hemorrhagic fever at different time scales in China. PLoS ONE 17(1):e0262009.
https://doi.org/10.1371/journal.pone.0262009
11. Lilly SS, Gupta N, Anirudh RRM, Divya D (2021) Time series model for stock market predic-
tion utilising prophet. Turk J Comput Math Educ (TURCOMAT) 12(6):4529–4534. https://
turcomat.org/index.php/turkbilmat/article/view/8439
12. Kaninde S, Mahajan M, Janghale A, Joshi B (2022) Stock price prediction using Facebook
prophet. ITM Web Conf 44:03060. https://doi.org/10.1051/itmconf/20224403060

13. Staudemeyer RC, Morris ER (2019) Understanding LSTM—a tutorial into long short-term
memory recurrent neural networks. arXiv [cs.NE]. http://arxiv.org/abs/1909.09586
14. Zhang J, Ye L, Lai Y (2023) Stock price prediction using CNN-BiLSTM-attention model.
Mathematics 11(9):1985. https://doi.org/10.3390/math11091985
Analysis of Monkey Pox (MPox)
Detection Using UNETs and VGG16
Weights

V. Kakulapati

Abstract As the world struggles to recover from the extensive destruction caused by
the advent of COVID-19, a new threat emerges: the MPox virus. MPox is neither as
deadly nor as ubiquitous as COVID-19, but it still causes new instances of infection
in patients every day. If another worldwide epidemic occurs for the same reason, it
would not come as a shock to anybody. Image-based diagnostics may benefit greatly
from the use of ML. For this reason, a comparable application may be modified
to detect the MPox-related illness as it manifests on human skin, and the obtained
picture can then be used to establish a diagnosis. However, there is no publicly
accessible MPox dataset for use in machine learning models. As a result, creating a
dataset with photos of people who have had MPox is an urgent matter. To do this,
continually gather fresh MPox images from MPox patients, evaluate the efficacy of
the recommended modeling using VGG16 on very skewed data, and compare the
results from our model to those from previous publications, all using the UNETs
with VGG16 weights’ model. The time it takes to go from diagnosis to treatment is
shortened because the MPox is easily seen. Because of this, there is a great need for
fast, accurate, and reliable computer algorithms. Using the U-Net and the VGG16
CNN, the system presented here can automatically recognize and analyze MPox.

Keywords Custom CNN · U-Net · VGG16 · CNN · Disease · Diagnosis · MPox
virus · Machine learning (ML) · Performance · Images · Patients

1 Introduction

The MPX infection causes a pathogenic sickness that has several diagnostic similar-
ities to chickenpox, measles, and smallpox. Due to its rarity and similarities to other
diseases, early diagnosis of monkeypox has proven challenging. In 1959, Denmark
became the first country to report a monkeypox outbreak. There were three outbreaks
of human MPX between October 1970 and May 1971, infecting a total of six people
in Liberia, Nigeria, and Sierra Leone. Ten further cases of monkeypox have been
reported in Nigeria since the first index case was found in 1971. Since the epidemic
started in 2013, there have been reports of monkeypox in people in 15 countries.
Eleven of these countries are located in Africa. There have been incidents of MPX
in countries as diverse as Singapore and Israel [1].

V. Kakulapati (B)
Sreenidhi Institute of Science and Technology, Yamnampet, Ghatkesar, Hyderabad,
Telangana 501301, India
e-mail: vldms@yahoo.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_25
The virus, which has its roots in Central and West Africa, has now spread to
several other regions and threatens to become a worldwide pandemic. There may
also be a rash and lymph node swelling. The condition is self-limiting and treated
primarily with symptomatic care; however, 3–5% of patients may die from medical
complications. As there is currently no medicine available that specifically targets the
MPox virus, antiviral and vaccinia immune globin, both designed for the treatment
of smallpox in older people, are used to control acute MPox infections [2, 3].
The virus that causes MPox may also infect humans and other animals. A rash
that begins as blisters and eventually crusts over is one symptom. Fever and enlarged
lymph nodes are also present. The time until symptoms appear after exposure might
be anything between five and twenty-one days. In most cases, people have symptoms
for two to four weeks. Mild symptoms are possible, but it is also possible that you
would not notice anything at all. Not every epidemic follows the usual pattern of fever,
aching muscles, swollen glands, and lesions appearing simultaneously. More severe
symptoms may be experienced by those who are more vulnerable to the disease, such
as infants, pregnant women, and those with impaired immune systems.
Increases in reported cases of MPox occur although the disease is not very infec-
tious in comparison to the 19 instances of Coronavirus that have been seen so far.
In 1990, there were only 50 confirmed cases of MPox in the whole of West and
Central Africa [4, 5]. However, by the year 2020, a startlingly high number of 5000
cases had occurred. Before 2022, it was thought that MPox only existed in Africa;
nevertheless, several countries outside of Africa, together with Europe and the USA,
confirmed detecting MPox infections in their populations. So far, 94 nations have
reported MPox cases to the Centers for Disease Control and Prevention (CDC) as of
December 21, 2022 [6]. As a result, widespread panic and
dread are on the rise [7]. This is often reflected in people’s online comments.
This investigation aims to extend a model that can more accurately identify and
diagnose monkey pox using current data. The UNETs and VGG16 weights are used
to construct an effective model for diagnosing monkey pox.
Similar to smallpox, although less severe, MPox has comparable clinical char-
acteristics [8]. Yet, the rashes and lesions produced by an MPox virus often mimic
those of chickenpox and cowpox. Due to its similarities in appearance and symptoms
to other poxviruses, early identification of MPox may be difficult for medical profes-
sionals. Moreover, since human cases of MPox were so uncommon before the current
epidemic [9], there is a significant information vacuum among healthcare providers
worldwide. Healthcare providers are utilized to detect poxviruses by image assess-
ment of skin lesions; however, the polymerase chain reaction (PCR) test is widely
regarded as the most reliable method for identifying an MPox infection [10]. While
fatalities from MPox infections are relatively rare (1–10%) [11], the disease may be
effectively controlled in communities by isolating patients and tracking down their
contacts as soon as possible.
By combining U-Net with structural contexts, numerous clinical feature extraction
approaches are used, such as those used to classify and segment retinal veins. VGG16,
a neural network, improves the strategy of merging several prediction performances
to raise the efficiency of classification methods. VGG16 is a convolutional neural
network (CNN) with 16 layers. It is commonly used pre-trained on the ImageNet
database, which contains over a million photos. The pre-trained network can classify
photos into 1,000 distinct object categories, such as computer keyboard and mouse,
giving the network rich exposure to visual variety. The network takes a 224 × 224
input picture.
While U-Net is among the most widely used CNN architectures for image segmen-
tation, many systems lack the time and resources necessary to implement it. To get
around this issue, we combined U-Net with another design, called VGG16, to lower
the number of layers and parameters required. VGG16 was selected because its
contracted layer is very similar to that of U-Net and because it has a large variety
of tuning options. We use the weights from VGG16, which are based on freely
accessible characteristics.
The remainder of the paper is arranged as follows: Sect. 2 reviews
previous studies. Next, Sect. 3 provides a brief overview of the recommended
model’s structure, the methodology utilized in this study, and the investigational
setup employed to assess the model’s efficacy. Section 4 provides visual explana-
tions of each evaluative metric. Outcomes from tests performed to address the issues
raised in Sect. 5’s work are then presented, together with an analysis of the resulting
data and any conclusions that may be drawn from it. Section 6 contains concluding
remarks, followed by future enhancement investigations.

2 Previous Works

Several deep learning models that had been pre-trained were utilized in a feasibility
study [12] to recognize MPox lesions as distinct from those caused by chickenpox and
measles. The dataset was collected from freely available online resources, including
news websites, and then, it was augmented using a data mining approach to boost its
size. Pre-trained deep learning models, such as Inception V3, ResNet50, and VGG-
16 are widely used. We discovered that the approach successfully differentiated
MPox lesions from measles and chickenpox lesions. Overall, the ResNet50 model
performed the best, with an accuracy of 82.96%. When compared to an ensemble
of the three models, VGG16 achieved 81.48 and 79.26% accuracy. The researcher
described the possibility of using AI to identify MPox lesions in digitized skin
images [13]. In 2022, the research debuted the biggest skin imaging collection to
date, which was collected from cases of MPox. Seven different DL models were
used for the study: ResNet50, DenseNet21, Inception-V3, Squeeze Net, MnasNet-
AI, MobileNet-V2, and ShuffleNet-V2-X. With an accuracy rate of 85%, the research
suggests that AI has considerable promise in diagnosing MPox from digitized skin
pictures. Using a retrospective observational research design, [14] describes the
clinical characteristics and treatment options for human MPox in the UK. Human MPox,
the study’s authors conclude, presents unusual difficulties even for the UK’s well-
endowed healthcare systems and their high-consequence infectious diseases (HCID)
networks.
To diagnose MPox, it was proposed to use a reworked version of the VGG16
technique. The results of their trials were split between two research projects.
In terms of the variables, these studies relied on batch size, learning rate, and the
number of epochs. According to the findings, the improved model correctly diagnosed
MPox patients in both experiments with an accuracy of 97 plus or minus 1.8% and 88
plus or minus 0.8%, respectively. Furthermore, a well-known explainable AI method
called LIME (Local Interpretable Model-Agnostic Explanations) was used to
interpret and explain the post-prediction and feature extraction outcomes. By studying
these early signs of the virus’s spread, LIME hopes to get a better understanding
of the virus itself. The results confirmed that the proposed models could pick up
on trends and pinpoint the exact location of the outbreak. MPox skin lesion photos,
together with those of chickenpox and measles, were assembled into a dataset called
“MPox Skin Lesion Dataset” (MSLD) [15].
Detecting and segmenting the brachial plexus have been suggested using variants
of the U-Net [16, 17], a popular model for ultrasound picture segmentation. The
VGG network [18] was proposed at the ImageNet Large-Scale Visual Recognition
Challenge (ILSVRC) in 2014; its best-performing convolutional neural network
(CNN) variant has 16 weight layers. VGG’s model helps with object
localization since it is based on a small filter size [19]. Networks using the VGG
architecture have been used for these tasks [20]. Several types of convolutional
architecture often include a skip connection, and attention is a method that, when
combined with an encoder and a decoder, may boost the performance of the models.
The ResNet50, AlexNet, ResNet18, and CNN models were used to test the efficacy
of the transfer learning approach. The recommended network achieves a 91.57%
accuracy rate with a sensitivity of 85.7%, a specificity of 85.7%, and a precision
of 85.7%. This information may be used to create new PC-based testing methods
for broad monitoring and early identification of MPox. Furthermore, it would allow
those who fear that they have MPox to do basic testing in the comfort of their own
homes, putting them at a safer distance from the disease’s potentially harmful effects
in its early stages [21].

3 Methodology

Although computer vision has found several uses, it has seen very little use in health
care. Investigation into health-related image identification has greatly benefited from
the latest developments in deep learning methodologies for image detection. UNETs
are a specialized method for segmenting images. Much of what has inspired this
study is the extensive research into monkeypox prediction utilizing ML techniques
like VGG16 and UNETs.
We preprocessed a dataset of images of patients with MPox and of others with
similar-looking conditions, and transformed the images into masked pictures. In
order to prepare for any future illnesses that may threaten human life, a model was
developed that is data-agnostic. After extensive training and testing, the models
finally produced output pictures with unprecedented precision.

3.1 VGG16

A convolutional neural network (CNN) created by the Visual Geometry Group
(VGG) at Oxford University won the ImageNet [22] competition in 2014. This
model has a total of 13 layers of convolution, five max-pooling layers, and three
dense layers. Since it includes 16 layers with trainable weight parameters, it is
known as VGG16 [23].
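A quick arithmetic check on this architecture: the five 2 × 2 max-pooling stages each halve the spatial resolution, while VGG16’s 3 × 3 convolutions preserve it through padding, so the 224 × 224 input ends in a 7 × 7 feature map. A minimal sketch:

```python
def vgg16_feature_map_side(input_side=224, n_pool=5):
    """Spatial side length after VGG16's pooling stages: only the
    2x2 max-pools shrink the map; 'same'-padded 3x3 convs do not."""
    side = input_side
    for _ in range(n_pool):
        side //= 2
    return side

print(vgg16_feature_map_side())  # prints 7 (224 -> 112 -> 56 -> 28 -> 14 -> 7)
```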

3.2 CNN

The structure has a U-shaped layout, divided into two halves: an encoder and a
decoder. UNET takes a photograph as input and outputs a picture of its own. It
is well-established for the diagnosis of various pictures to recognize aberrant traits.
The encoder, like other convolutional layers, works to reduce the overall size of the
input picture, while the decoder increases it again by fusing the corresponding
encoder layer with its own output [24].

3.3 Custom CNN

CNNs, or ConvNets, are a kind of deep learning network design that may acquire
knowledge directly from data without the requirement for human intervention to
extract features. CNNs excel at recognizing objects, people, and scenes by analyzing
pictures for recurring patterns. Building a custom CNN makes it simpler to create
bespoke neural networks for particular purposes, using TensorFlow Core to construct
the computation graph and execute it in a training session.

3.4 UNET

It is a CNN model with a U-shaped architecture, comprising two pieces: an encoder
and a decoder. UNET takes an input picture and delivers an output image. It
is established for the diagnosis of different images to identify characteristics of
abnormality. The encoder, like every convolutional layer, minimizes the image
size, while the decoder restores the pixel dimensions of the image by integrating
each encoder layer with the corresponding decoder layer [25].
Image segmentation is complete inside the image, and each pixel’s characteristics
are estimated. The neighbourhood window has a dimension of 3 × 3, with the centre
pixel being the one of interest. All pixels are visited in turn to evaluate the
characteristics.
In many segmentation models, the contracting layer and expansion layer are the
foundational layers. Many up-sampling layers and convolution layers were added
to the end of the VGG16 architecture in this study to make it more like the U-Net.
When finished, the model’s architecture will have the symmetry of the letter U.
Consequently, the VGG16 will serve as the contracting layer in the UNet-VGG16
model’s design, while the expansion layer will be introduced later.
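The U shape can be sketched by tracking spatial size alone: each encoder level halves the side length, and each decoder level doubles it and (in a real U-Net) concatenates the encoder map of matching size via a skip connection. The depth and input size below are illustrative assumptions, and channels are ignored:

```python
def unet_side_lengths(input_side=224, depth=4):
    """Side lengths down the encoder and back up the decoder of a
    U-shaped network (channels and convolutions omitted)."""
    enc = [input_side]
    for _ in range(depth):
        enc.append(enc[-1] // 2)   # encoder: downsample by 2
    dec = [enc[-1]]
    for _ in range(depth):
        dec.append(dec[-1] * 2)    # decoder: upsample by 2 and fuse the
                                   # same-sized encoder map (skip connection)
    return enc, dec

enc, dec = unet_side_lengths(224, depth=4)
print(enc)  # [224, 112, 56, 28, 14]
print(dec)  # [14, 28, 56, 112, 224]
```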

4 Implementation Analysis

The dataset was collected from publicly available web sources; roughly 150 MPox images were gathered. Though it is a small dataset, we obtained accurate results. To prepare the MPox dataset, all photos are read, resized to a common size, their pixel values normalized, and the dataset divided into TRAIN and TEST subsets: 70% of the dataset is used for training and the remaining 30% for testing the computations. The VGG16 algorithm and a tailor-made CNN algorithm are then executed; VGG16 is fed 80% training photos and 20% test images to evaluate its efficacy (Fig. 1).
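The preprocessing and train/test split described above can be sketched in plain Python (a hypothetical helper, not the authors' actual code; image resizing is omitted and images are represented as nested lists):

```python
# Hypothetical sketch of the preprocessing pipeline: normalize 8-bit pixel
# values into [0, 1] and split the image list into TRAIN (80%) and TEST (20%).
import random

def normalize(image):
    """Scale 8-bit pixel values to the [0, 1] range."""
    return [[px / 255.0 for px in row] for row in image]

def train_test_split(items, train_frac=0.8, seed=42):
    """Shuffle and partition a dataset into train and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

# 150 tiny stand-in "images" (one row of three pixels each)
images = [normalize([[i % 256, 128, 255]]) for i in range(150)]
train, test = train_test_split(images)
print(len(train), len(test))  # 120 30
```

A real pipeline would first read each file from disk and resize it to a common shape before normalizing.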

4.1 Data Preprocessing

Preprocessing is crucial for improving the quality of MPox photos and preparing
them for feature extraction through image analysis, which may be carried out by
either humans or robots. Preprocessing has several benefits, including higher signal-
to-noise ratios, a clearer perception of an MPox picture, less clutter, and more accurate
color reproduction.
Analysis of Monkey Pox (MPox) Detection Using UNETs and VGG16 … 327

Fig. 1 Monkey pox image dataset images

4.2 Extracting Features

Feature extraction reduces the time spent analyzing data by isolating and quantifying the specific features that make up a given training sample. By capturing the most informative aspects of an image in a feature space, the extracted features provide the input for the subsequent classifier. Eight separate textural features are used in this image analysis study; however, they are evaluated differently for classification and for segmentation tasks. For image-level classification, the characteristics are approximated from the complete image, whereas for segmentation the characteristics are predicted per pixel within the picture. Here, a 3 × 3 neighbourhood window is used, with the centre pixel marked as the pixel of interest; for a complete evaluation of the features, all pixels are visited in turn.
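The per-pixel 3 × 3 neighbourhood scan described above can be illustrated as follows (an assumed helper, not the paper's code; border pixels are skipped for brevity):

```python
# Slide a 3 x 3 window over every interior pixel; the centre pixel is the
# pixel of interest and the window feeds the textural-feature computation.
def patches_3x3(image):
    """Yield (centre_value, 3x3 neighbourhood) for each interior pixel."""
    h, w = len(image), len(image[0])
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            yield image[r][c], window

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
for centre, window in patches_3x3(img):
    # textural features (e.g. mean, contrast) would be computed over `window`
    pass
```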
(1) Upload MPox dataset: the dataset is added to the program through this module.
(2) Preprocess dataset: read the complete dataset, scale all of the photos to the same size, normalize the pixel values of the images, and then divide the dataset into two parts, one for the training set and one for the testing set. Prediction accuracy is calculated by applying the 20% of test photos to the trained model.
(3) Execute VGG16 algorithm: the 80% of photos that have already been processed are used as input to train a prediction model, which is then used to make predictions on test images.
(4) Execute Custom CNN algorithm: the 80% of processed photos are fed into the Custom CNN algorithm to train the prediction model, which is then used to make predictions on test images.
(5) The Comparison Graph module is used to create a graph contrasting the VGG and Custom CNN methods.

(6) Predict Disease from Test Image: we can submit a test image and have the Custom CNN determine whether the image is healthy or contaminated with MPox.

4.3 Measures of Performance

Jaccard and Dice coefficients are used to evaluate the trustworthiness of the proposed ML method. Both are computed from the intersection and union of the ground-truth (GT) and predicted-segmentation (PS) masks.
Jaccard's index ranges from 0 to 1,

    Jaccard's Index = (GT ∩ PS) / (GT ∪ PS) = TP / (TP + FN + FP),

and the Dice coefficient quantifies the degree of overlap between two masks; one indicates complete overlap, whereas zero indicates no overlap at all:

    Dice coefficient = 2(GT ∩ PS) / ((GT ∩ PS) + (GT ∪ PS)) = 2TP / (2TP + FN + FP),

    Dice loss = 1 − Dice coefficient.

Dice loss is used with binary or categorical cross-entropy in various segmentation
situations.
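Both metrics can be computed directly from the masks; a minimal sketch, assuming the ground-truth (GT) and predicted (PS) masks are flattened into 0/1 lists:

```python
# Jaccard index, Dice coefficient, and Dice loss from binary masks.
def confusion(gt, ps):
    """True positives, false positives, and false negatives of two masks."""
    tp = sum(1 for g, p in zip(gt, ps) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gt, ps) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gt, ps) if g == 1 and p == 0)
    return tp, fp, fn

def jaccard(gt, ps):
    tp, fp, fn = confusion(gt, ps)
    return tp / (tp + fn + fp)

def dice(gt, ps):
    tp, fp, fn = confusion(gt, ps)
    return 2 * tp / (2 * tp + fn + fp)

gt = [1, 1, 1, 0, 0]
ps = [1, 1, 0, 1, 0]
print(jaccard(gt, ps))      # 0.5 (TP=2, FN=1, FP=1)
print(dice(gt, ps))         # 0.666...
print(1 - dice(gt, ps))     # Dice loss
```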
The model achieves a Dice coefficient of 89%, which corresponds to a Jaccard index of 80% (Fig. 2).
The MPox recognition model is built on the CNN and VGG16-UNET architecture. The MPox detector was trained and evaluated on the acquired dataset, with 80% used for training and 20% for testing (Figs. 3, 4, and 5).
The UNET-VGG16 model was trained over several epochs using an efficient machine learning procedure.

5 Discussion

The zoonotic illness MPox, which is caused by an Orthopoxvirus, has changed since
it was originally identified in the Democratic Republic of the Congo in 1970. Ten
African countries and four other countries have reported human cases of monkeypox.
The median age at presentation has grown from 4 years old in the 1970s to 18 years
old in the 2010s and 2020s, and there have been at least 10 times as many cases
reported. Death rates in Central African clades were almost twice as high as those
in West African clades, at 10.6% versus 3.6%. The dynamic epidemiology of this
reemerging illness can only be comprehended through the use of surveillance and
detection methods.

Fig. 2 Comparison and segmentation of the MPox image

Deforestation may be a cause of, or even potentiate, the comeback of MPox, although declining immunity is the most widely held explanation.
The Orthopoxvirus family includes the very similar MPox virus, the variola virus that causes smallpox, and the vaccinia virus used in the smallpox vaccine. While smallpox was widespread, no instances of MPox were ever recorded at the time. This might have
happened for a few reasons: either the emphasis was on smallpox and the symptoms
of the two illnesses are similar, or the absence of scientific proof of the causative
agent led to the presumption of smallpox. In the past, we knew that the smallpox
vaccine provided around 85% protection against MPox.
Investigations on the median nerve dataset’s segmentation showed that models
constructed with the learning algorithm and/or the residual module outperformed

Fig. 3 Detection of MPox confusion matrix

Fig. 4 Predict mask of MPox images

Fig. 5 UNET-VGG16 model was trained via several epochs

their baseline counterparts. These results showed that the two augmentations might
increase model performance by using more learned information between the layers.
Some of the original image’s spatial information may be lost during the pooling
process, but the attention mechanism may turn it into a new space while keeping
important information or attributes. The residual module can then restore this infor-
mation. As a result of combining U-Net with VGG, the proposed VGG16-UNet
outperforms previous iterations of both models.

6 Conclusion

Using masked photos and a combination of the U-Net segmentation network (UNET) and the deep convolutional neural network VGG16, we can determine whether a patient has MPox. The UNET approach may provide great
performance in a wide variety of biomedical segmentation applications. The first
step is to use data augmentation to enlarge the training data, then use picture edge detection to pinpoint the region of interest in MPox photos. With neural networks such as UNET and VGG16, MPox cases can be classified efficiently.

7 Future Enhancement

In future work, multimodal classification algorithms will be applied to an extensive dataset to enhance classification accuracy, and nature-inspired optimization algorithms will be applied for more precise performance.

References

1. Kakulapati V et al (2023) Prevalence of MPX (Monkeypox) by using machine learning approaches. Acta Sci Comput Sci 5(5):10–15
2. Gessain A, Nakoune E, Yazdanpanah Y (2022) Monkeypox. N Engl J Med 387:1783–1793
3. Mileto D, Riva A, Cutrera M, Moschese D, Mancon A, Meroni L, Giacomelli A, Bestetti G,
Rizzardini G, Gismondo MR et al (2022) New challenges in human monkeypox outside Africa:
a review and case report from Italy. Travel Med Infect Dis 49:102386
4. Doucleff M (2022) Scientists warned us about MPox in 1988. Here’s why they were right
5. https://www.npr.org/sections/goatsandsoda/2022/05/27/1101751627/scientists-warned-us-
about-MPox-in-1988-heres-why-they-were-right
6. WHO L (2022) Multi-country MPox outbreak in non-endemic countries. https://www.who.int/
emergencies/disease-outbreak-news/item/2022-DON385. Accessed on 29 May 2022
7. https://www.cdc.gov/poxvirus/MPox/symptoms.html
8. Bragazzi NL et al (2022) Attaching a stigma to the LGBTQI+ community should be avoided
during the MPox epidemic. J Med Virol
9. Rizk JG, Lippi G, Henry BM, Forthal DN, Rizk Y (2022) Prevention and treatment of MPox.
Drugs 1–7

10. Sklenovska N, Van Ranst M (2018) Emergence of MPox as the most important orthopoxvirus
infection in humans. Front Public Health 6:241
11. Erez N, Achdout H, Milrot E, Schwartz Y, Wiener-Well Y, Paran N, Politi B, Tamir H, Israely
T, Weiss S et al (2019) Diagnosis of imported MPox, Israel, 2018. Emerg Infect Dis 25(5):980
12. Gong Q, Wang C, Chuai X, Chiu S (2022) MPox virus: a reemergent threat to humans.
Virologica Sinica
13. Nafisa Ali S, Ahmed T, Paul J, Jahan T, Sani S, Noor N, Hasan T. MPox skin lesion detection
using deep learning models: a feasibility study. arXiv, 13. Available online: https://arxiv.org/
pdf/2207.03342.pdf
14. Islam T, Hussain M, Chowdhury F, Islam B (2022) Can artificial intelligence detect MPox from
digital skin images? BioRxiv
15. Adler H et al (2022) Clinical features and management of human MPox: a retrospective
observational study in the UK. Lancet Infect Dis 22:1153–1162
16. Ali SN et al (2022) MPox skin lesion detection using deep learning models: a feasibility study.
arXiv:2207.03342
17. Ronneberger O, Fischer P, Brox T (eds) (2015) U-net: convolutional networks for biomedical
image segmentation. In: International conference on medical image computing and computer-
assisted intervention. Springer
18. Kakade A, Dumbali J (eds) (2018) Identification of nerve in ultrasound images using u-net
architecture. In: 2018 International conference on communication information and computing
technology (ICCICT). Mumbai, India
19. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recogni-
tion. Available online: https://arxiv.org/abs/1409.1556
20. Inan MSK et al, Deep integrated pipeline of segmentation leading to classification for automated
detection of breast cancer from breast ultrasound images. Available online: https://arxiv.org/
abs/2110.14013
21. Iglovikov V, Shvets A. Ternausnet: U-net with VGG11 encoder pre-trained on imagenet for
image segmentation. Available online: https://arxiv.org/abs/1801.05746
22. Kakulapati V et al (2023) Monkeypox detection using transfer learning, ResNet50, Alex Net,
ResNet18 and custom CNN model. Asian J Adv Res Rep 17(5):7–13. https://doi.org/10.9734/
ajarr/2023/v17i5480
23. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical
image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE,
pp 248–255
24. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
25. Kakulapati V et al (2021) Analysis of tumor detection using UNETS and VGG16 weights. J
Med Pharm Appl Sci 10(4). ISSN: 2320-7418
Role of Robotic Process Automation
in Enhancing Customer Satisfaction
in E-commerce Through E-mail
Automation

Shamini James, S. Karthik, Binu Thomas, and Nitish Pathak

Abstract In recent years, the use of Robotic Process Automation (RPA) in e-commerce has grown in popularity. RPA gives businesses the ability to automate
routine, manual processes, increasing productivity, cutting down on response times,
and improving customer satisfaction. RPA can be used in e-commerce to automate a
variety of e-mail-related functions, including reading, processing, and handling client
inquiries. RPA can also be effectively used for handling online payments and sending
personalized immediate responses to customers. This paper is a case study that gives
an overview of RPA technology and how it is used in real-time e-mail automation,
particularly for managing customer payments and e-mail feedback. The paper also
explains the implementation experiences of RPA systems in creating accounts in
Moodle LMS and appropriate course enrollment in the LMS as per user require-
ments. Consequently, the paper introduces the systematic procedures for setting up
RPA automation in an e-learning environment to improve efficiency and customer
satisfaction. The paper also discusses the advantages and general concerns over using
RPA in customer support and e-mail automation in an e-commerce environment.

Keywords Robotic process automation · E-commerce · E-mail automation · Customer support

S. James · S. Karthik
Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India
B. Thomas (B)
Marian College Kuttikkanam, Peermade, Idukki, Kerala, India
e-mail: Binu.thomas@mariancollege.org
N. Pathak
Bhagwan Parshuram Institute of Technology (BPIT), GGSIPU, New Delhi, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 333
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_26
334 S. James et al.

1 Introduction

Robotic Process Automation, or RPA, is a technology that enables businesses to automate manual and repetitive operations. In RPA, regular tasks like data entry,
form filling, and process execution are carried out by software robots, or “bots,” in
place of humans. The bots may communicate with a variety of systems and appli-
cations, including enterprise resource planning (ERP) software, customer relation-
ship management (CRM) systems, and other backend systems [1]. The RPA bots
are built to imitate human activities. Organizations gain greatly from RPA, which
boosts production and efficiency while reducing costs and increasing compliance.
Additionally, it helps to free up workers from menial and low-value duties, so they
can concentrate on more strategic and creative work [2]. RPA has grown quickly
in popularity in recent years and is anticipated to become a crucial technology for
businesses. E-mail is a common form of consumer communication for e-commerce
companies regarding purchases, delivery, refunds, and other issues [3]. E-commerce
companies must have a system in place for managing and responding to customer
inquiries promptly and effectively if they want to offer e-mail customer service that
is effective [4]. This often entails having a group of customer service professionals
who are qualified to respond to a variety of questions from clients and address their
problems [5, 6].
Robotic Process Automation (RPA) can also be used for Moodle administration, since it automates time-consuming, repetitive chores, streamlines administrative procedures, and improves overall effectiveness [7]. This paper describes the use of the RPA tool in managing customer service in the payment section of an e-learning site operated by Marian College Kuttikkanam (Autonomous), called the Marian Institute of Innovative Teaching Learning and Evaluation (MIITLE). It also discusses the implementation experience of RPA in Moodle LMS administration.

2 Review of Literature

Studies show that RPA may greatly improve the e-commerce industry in terms of
effectiveness, accuracy, and cost savings [8]. The capacity of technology to automate
tedious and routine work has allowed employees to focus on more strategic and
value-adding tasks [1]. Several e-commerce tasks, including order processing [1],
customer assistance [9], and inventory management [10], can be carried out using
RPA. According to the study, RPA has also been found to work well with other
technologies, including artificial intelligence and machine learning [5], allowing
for even greater automation and efficiency gains. Studies have also examined how RPA will impact the workforce in e-commerce [11]. RPA can cause employment losses, according to some studies [12], but others [13, 14] have found that it can also create jobs and opportunities for reskilling.
Role of Robotic Process Automation in Enhancing Customer … 335

The research suggests that RPA might considerably benefit the e-commerce
industry overall, but it also highlights the challenges and limitations of using the
technology. Studies on RPA are mostly concerned with figuring out how to apply
it in a way that is advantageous to businesses and employees [6] and how to use it
to enhance the customer experience in e-commerce. Recent studies have shown that e-mail automation can significantly improve consumer experiences and productivity in the e-commerce industry [4]. The literature has also emphasized the difficulties and restrictions associated
with e-mail automation implementation in the e-commerce sector. These include
concerns about data security and privacy [15], employee resistance to change [11,
16], and the substantial up-front expenditures of installing e-mail automation solu-
tions [17, 18]. In general, the literature indicates that e-mail automation has the
potential to significantly help the e-commerce sector, but it also emphasizes the
difficulties and constraints of putting the technology into practice [18]. In Moodle,
repetitive and time-consuming administrative processes including user management,
course enrollment, data entry and migration, grading, and report preparation can all
be automated using RPA, according to the literature [19]. RPA increases adminis-
trative productivity by increasing efficiency, lowering errors, and improving overall
performance of the LMS [7].

3 The E-learning Environment

Marian College Kuttikkanam (Autonomous) has a training institute named Marian Institute for Innovative Teaching Learning and Evaluation (MIITLE). It aims at
bringing innovations into teaching learning and evaluation through faculty empow-
erment. During Covid-19, MIITLE started offering online courses on Moodle, Video
Content Creation, ICT-Enabled Teaching, Google Classroom, etc. The courses were
offered to teachers after accepting online payments. The payment gateway was inte-
grated using the Razorpay payment portal. During Covid pandemic, 2230 teachers
from colleges, schools, and medical institutions have joined the online courses. The
online courses were offered using a dedicated Moodle Learning Management System
(LMS) Server installed on the Amazon cloud platform where the participants were
enrolled in the courses immediately after receiving payments through the payment
gateway.

Table 1 Areas of RPA implementation in the project


Areas of implementation Purpose of automation
Online payment Ensuring successful transaction
Customer support Communication about payment status
Moodle LMS administration Creation of Moodle accounts
Course enrollment Enrollment of participants to Moodle courses
E-mail automation Intimating Moodle users about their login credentials

4 Need for Robotic Process Automation

The online courses were offered by the college during the Covid pandemic lock-
down. During that time, it was extremely difficult to find support staff to coordinate
marketing, receive payments from individuals, create Moodle accounts, enroll partic-
ipants in their preferred courses, and send e-mail communications to participants. The
participants were expecting immediate responses from the college after making their
payments. If there were payment issues like failures and multiple payments, these
issues also had to be communicated to the participants. After a successful payment
also, the MIITLE office had to do a series of routine activities before sending the
e-mail confirmation to the participants. These routine activities are listed in Table 1.
Usually, these routine activities were done by support staff available at the college
office, and due to the lockdown situation, the college had to rely on RPA technolo-
gies to automate these repetitive tasks. Due to the lack of availability of human
resources during Covid lockdown, it was taking almost two days for managing
customer payments and to send e-mails with Moodle LMS login credentials.

5 RPA Implementation

MIITLE had decided to implement RPA in managing the user accounts of the e-
learning platform to overcome the challenges caused by COVID. The first step was
to identify the specific areas of routine operations to implement Robotic Process
Automation. After a detailed analysis of the requirement, it was decided to incor-
porate payment management, user feedback, Moodle account creation, and user
notification in the RPA module. Responding to user inquiries was not incorporated under RPA because it was not technologically feasible. It was decided to use the UiPath
Business automation package as the RPA tool for development. The RPA automa-
tion plan was mainly focusing on e-mail automation so that the participants will
immediately receive e-mail clarification and login credentials for Moodle eLearning
accounts. Different modules of the eLearning environment were considered for RPA
automation.

5.1 Payment Management

The RPA module was designed to automate the process of capturing payment infor-
mation from customers, such as credit card details or other forms of payment, and then
processing the payment through the relevant payment gateway. It can also reconcile
payments received from customers against duplicate payments and failed payments
to reduce manual effort and improve accuracy. The same RPA module is capable
of segregating and preparing an Excel worksheet that contains the e-mail and other
contact details of customers based on successful payments, failed payments, and
duplicate payments. Excel application scope container activity of UiPath is used for
reading Excel files and creating a new Excel file.
The Razorpay payment portal can prepare a daily payment report from the dash-
board. UiPath Open Application Activity is used for opening the Razorpay portal
and downloading the daily payment reports. Logging into the portal, locating the
report generation tab, selecting the duration for report generation, and downloading
the report in Excel format are automated at this stage.
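The reconciliation logic can be illustrated with a small Python analogue of the UiPath workflow (the row layout and status values are assumptions, not Razorpay's exact report schema):

```python
# Segregate a payment report into successful, failed, and duplicate payments,
# keyed on the payer's e-mail address.
def reconcile(rows):
    """rows: dicts with 'email' and 'status' ('captured' or 'failed')."""
    successful, failed, duplicates = [], [], []
    seen = set()
    for row in rows:
        if row["status"] != "captured":
            failed.append(row)
        elif row["email"] in seen:
            duplicates.append(row)  # the same payer paid more than once
        else:
            seen.add(row["email"])
            successful.append(row)
    return successful, failed, duplicates

report = [
    {"email": "a@x.org", "status": "captured"},
    {"email": "b@x.org", "status": "failed"},
    {"email": "a@x.org", "status": "captured"},  # duplicate payment
]
ok, bad, dup = reconcile(report)
print(len(ok), len(bad), len(dup))  # 1 1 1
```

Each of the three lists would then be written to its own Excel worksheet, as the UiPath module does.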

5.2 Moodle Account Creation

The RPA module is designed to automate the process of creating Moodle accounts by automatically inputting the information received from the payment gateway into
the Moodle Administration tab. The Open Application Activity of UiPath is used
for generating unique login credentials for each user after automatically logging
into Moodle. The RPA module developed for Moodle automation creates a Moodle account after payment is received and can then assign a student role to the new participant. After assigning the student role, the module automatically enrolls the participant in an appropriate course based on the choice made at the time of course registration.
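As an aside, the same account-creation step could also be performed through Moodle's web-service API (function core_user_create_users) rather than UI automation. The sketch below only assembles one user record with generated credentials; the endpoint and token are left as placeholders:

```python
# Build one Moodle user record with generated login credentials. The record
# shape follows Moodle's core_user_create_users web-service function; the
# password suffix is a hypothetical way to satisfy a typical password policy.
import secrets
import string

def build_moodle_user(fullname, email):
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(10)) + "#1aA"
    first, _, last = fullname.partition(" ")
    return {
        "username": email.split("@")[0].lower(),
        "password": password,
        "firstname": first,
        "lastname": last or first,
        "email": email,
    }

user = build_moodle_user("Asha Menon", "asha@example.org")
# A real call would POST this record to
# https://<moodle-host>/webservice/rest/server.php
# with wsfunction=core_user_create_users and a web-service token.
```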

5.3 E-mail Automation

Customers usually expect immediate feedback after making the payment to join a
course. Previously, these communications were sent manually after verifying the
payments from the payment gateway application. This was a time-consuming task
to verify the successful payments and to intimate the participants by e-mail about
their login credentials. An RPA module was developed using the Open Application
and SMTP mail message activities of UiPath. Send SMTP Mail Message activity
available in UiPath studio for e-mail automation is used for sending e-mails. Different
components used in RPA are explained in Table 2.

Table 2 UiPath components used in the automation


Area of automation UiPath component used
Payment through Razorpay Open application UiPath activity
Extracting personal information Excel application scope container UiPath activity
Moodle account creation Open application UiPath activity
Moodle course enrollment Open application UiPath activity
E-mail automation SMTP mail message UiPath activity
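UiPath's Send SMTP Mail Message activity has a close analogue in Python's standard smtplib and email modules; a minimal sketch of composing the credentials e-mail (the sender address and SMTP server are placeholders):

```python
# Compose the confirmation e-mail a participant receives after payment.
from email.message import EmailMessage

def credentials_mail(to_addr, username, password):
    msg = EmailMessage()
    msg["Subject"] = "Your Moodle LMS login credentials"
    msg["From"] = "miitle@example.org"  # placeholder sender
    msg["To"] = to_addr
    msg.set_content(
        f"Payment received. Your Moodle account is ready.\n"
        f"Username: {username}\nPassword: {password}\n"
    )
    return msg

msg = credentials_mail("asha@example.org", "asha", "s3cret")
# Sending, with a placeholder SMTP server:
# import smtplib
# with smtplib.SMTP("smtp.example.org", 587) as s:
#     s.starttls()
#     s.login("bot", "app-password")
#     s.send_message(msg)
```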

6 Discussions

The implementation of Robotic Process Automation has many visible benefits. During the COVID lockdown, the MIITLE office could not run online courses without support staff. Even with clerical support staff, it was taking more than a day to
send the first response to the customer after receiving the payment. It was taking
two days for MIITLE to create Moodle accounts, enroll participants into courses
according to their choice, and send them their Moodle LMS login credentials.
With the implementation of the RPA module to automate these tasks, it was taking
less than ten minutes to send responses to the participants after getting the payment
reports from the Razorpay payment gateway. The average time for the first response
to customers was reduced to ten minutes through RPA. There were 1513 participants
to whom the first confirmation e-mails were sent through the automation process.
The details are depicted in Fig. 1. From the figure, it is clear that the time required
for sending the first response varies from 2 to 10 min, and in most of the cases, it is
between 3 and 8 min. The variation in time occurs due to the delay in realizing the
payment. The benefits of using RPA in the e-learning environment are explained in
Table 3.
The time for Moodle account creation, enrollment of participants in the courses of their preference, and sending an automated e-mail with their login credentials was found to be between 1 and 6 h. This could be achieved in less time with RPA alone, but a manual verification of Moodle accounts and course enrollment is performed before sending the e-mail. The time taken for this process against the number of participants falling in each time slot is illustrated in Fig. 2. From Fig. 2,

Fig. 1 Customers and time taken for the first response through RPA after payment

Table 3 Benefits of using RPA in customer support


Activity Duration before RPA implementation Duration after RPA
First response after payment One day 10 min
Creation of Moodle account Two days 6 h
Course enrollment Two days 6 h
Sending Moodle login credentials Two days 6 h

Fig. 2 Time for sending LMS login credentials and number of customers

it is clear that most of the participants received their LMS login credentials between
1.45 and 1.89 h. Forty participants received their login details between 5.85 and
6.26 h.
Based on this study, integrating RPA into Moodle administration frees the Moodle administrator from tedious activities and allows them to concentrate on more strategic and value-added initiatives. RPA has several advantages, but there are also a number of difficulties and considerations, which are covered in the literature.
These include the necessity for strong data security measures, potential stakeholder
resistance to automation, implementation-related technological challenges, and the
continuing upkeep and supervision of RPA bots.
Many e-commerce procedures can be automated with RPA, but human oversight
is still required to make sure that the automation is effective and that any failures
or exceptions are handled effectively. RPA implementation must ensure that it has
the appropriate personnel in place to supervise the RPA systems and handle any
problems that may occur.

7 Conclusion

There are several advantages for both customers and businesses when Robotic
Process Automation (RPA) is used in e-mail automation for customer care during
online payments. RPA frees up customer service representative’s time, so they
may work on more challenging and value-adding tasks by automating repetitive

and manual processes like sending e-mails, gathering and analyzing data, and
updating client information. RPA deployment also ensures accuracy and consis-
tency in customer communications, improving customer loyalty and satisfaction.
According to the study’s findings, RPA is a highly efficient technique for streamlining
and enhancing the customer assistance experience for online payments. Numerous
advantages result from the incorporation of RPA in the project, including increased
administrative effectiveness, improved data synchronization and integration, and the
capacity to concentrate on strategic activities. Future studies should concentrate on assessing RPA's long-term effects on Moodle administration and on investigating new developments depending on the scalability of automation.

References

1. Lee J, Lee M, Kim K (2018) Impact of robotic process automation on business process
outsourcing: a knowledge-based view. J Bus Res 91:428–436
2. Aguirre S, Rodriguez A (2017) Automation of a business process using robotic process
automation (RPA): a case study. 2:65–71
3. Choi D, Hind R (2021) Candidate digital tasks selection methodology for automation with
robotic process automation
4. Akshay PN, Kalagi N, Shetty D, Ramalingam HM (2020) E-mail client automation with RPA
5. Wang D, Chen S, Zhao X, Li X (2018) Understanding the impact of robotic process automation
on business processes: a case study in the financial sector. Inform Syst Front 20(4):799–814
6. Yu KC, Lu HP, Chen JC (2021) The impact of robotic process automation on customer
satisfaction: evidence from the banking industry. J Business Res 125:586–597
7. Sharma U, Gupta D (2021) E-mail ingestion using robotic process automation for online travel agency. In: 2021 9th International conference on reliability, Infocom technologies and optimization (trends and future directions) (ICRITO). IEEE, pp 1–5
8. Lacity MC, Willcocks LP (2017) Robotic process automation and risk mitigation: the role of
internal audit. J Inf Technol 32(3):256–268
9. Menon VS, Soman R (2020) Robotic process automation (RPA) for financial reporting: a review of emerging practices and research opportunities. J Account Lit 45:23–40
10. Seidel S, Hirsch B, Treiblmaier H (2020) Towards a comprehensive understanding of the impact
of robotic process automation on organizations. J Bus Res 108:365–379
11. Madakam S, Holmukhe RM, Jaiswal DK (2019) The future digital work force: robotic process
automation (RPA). JISTEM-J Inform Syst Technol Managem 16
12. Bourgouin A, Leshob A, Renard L (2018) Towards a process analysis approach to adopt robotic
process automation
13. Bhardwaj V, Rahul KV, Kumar M, Lamba V (2022) Analysis and prediction of stock market movements using machine learning. In: 2022 4th International conference on inventive research in computing applications (ICIRCA), pp 946–950
14. Hofmann P (2019) Robotic process automation
15. Issac Ruchi RM (2018) Delineated analysis of robotic process automation tools, pp 0–4
16. Mohamed SA, Mahmoud MA, Mahdi MN, Mostafa SA (2022) Improving efficiency and
effectiveness of robotic process automation in human resource management. Sustainability
14(7):3920
17. Sobczak A (2022) Robotic process automation as a digital transformation tool for increasing organizational resilience in Polish enterprises. Sustainability 14(3)
18. Hyun Y, Lee D, Chae U, Ko J (2021) Improvement of business productivity by applying robotic process automation. Appl Sci

19. Munawar G (2021) Bot to monitor student activities on e-learning system based on robotic
process automation (RPA). Sinkron: J dan penelitian teknik informatika 6(1):53–61
20. Bhardwaj V, Kukreja V, Sharma C, Kansal I, Popali R (2021) Reverse engineering-a method for
analyzing malicious code behavior. In: 2021 International conference on advances in computing
communication and control (ICAC3), pp 1–5
21. Athavale VA, Bansal A (2022) Problems with the implementation of blockchain technology
for decentralized IoT authentication: a literature review. Blockchain Ind 4.0, pp 91–119
22. Van der Aalst WM, Bichler M, Heinzl A (2018) Robotic process automation. Bus Inf Syst Eng
60:269–272
Gene Family Classification Using Machine Learning: A Comparative Analysis

Drishti Seth, KPA Dharmanshu Mahajan, Rohit Khanna, and Gunjan Chugh

Abstract Accurate classification of gene families is of utmost importance in comprehending the functional roles and evolutionary history of genes within a
genome. The exponential growth of genomic data has heightened the urgency for
efficient and effective methods to classify gene families from DNA sequences. In
this research paper, we present a novel approach for classifying DNA sequences into
seven gene families. Our approach is based on machine learning and uses k-mer
counting as a feature engineering technique to predict the gene family of a given
DNA sequence. We evaluated our approach on a large dataset of DNA sequences
and achieved a high accuracy of 90.9% in classification performance. Our results
demonstrate the potential of machine learning methods for advancing our under-
standing of DNA sequences and gene families and can provide valuable insights for
biologists and geneticists.

Keywords Bioinformatics · Gene family · DNA sequences · Classification · Machine learning · k-mer

D. Seth · KPA Dharmanshu Mahajan (B) · R. Khanna · G. Chugh
Department of Artificial Intelligence and Machine Learning, Maharaja Agrasen Institute of Technology, Delhi, India
e-mail: kpamahajan@gmail.com
D. Seth
e-mail: drishtiseth@gmail.com
R. Khanna
e-mail: rohitkhannajee@gmail.com
G. Chugh
e-mail: gunajanchugh@mait.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 343
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_27
344 D. Seth et al.

1 Introduction

Bioinformatics is a discipline that uses computer science, statistics, mathematics, and engineering to analyze and understand biological data. Its origins can be traced back to Gregor Mendel's pioneering work on hereditary traits in 1865. James Watson and Francis Crick's groundbreaking discovery of the structure of DNA in 1953 further solidified the foundations of this discipline. Since then, bioinformatics has been
essential in the evaluation and interpretation of biological data. Bioinformatics is a
field that aims to unravel the principles governing nucleic acid and protein sequences,
with a specific emphasis on gene sequences. Chromosomes, which reside within the
nucleus of every cell, serve as the structures that contain the cell’s DNA. The DNA
that makes up each chromosome is tightly coiled around proteins known as histones to
support its shape. DNA carries heritable information in the form of nucleotides, which
consist of three components: a nitrogenous base, a phosphate group, and a 5-carbon
sugar. The nitrogenous base can be either adenine (A), thymine (T), cytosine (C),
or guanine (G). The genetic information is encoded by the order in which these four nucleotides appear, and understanding that order is essential for the study of biological processes such as gene expression and heritable variation. Bioinformatics has emerged as a vital field that utilizes computational tools and algorithms to dissect and interpret such complex biological data.
Bioinformatics is a rapidly evolving discipline, with new algorithms being regu-
larly developed. For instance, bioinformatics increasingly employs machine learning
to generate predictive models capable of accurately classifying biological data. The
analysis of complex biological data, such as DNA sequences and protein structures,
relies on deep learning, a subfield of machine learning.

1.1 Problem Statement

This project’s primary objective is to develop a machine learning model that can
appropriately classify DNA sequences into various gene families. DNA sequences
are crucial molecules that transport the genetic data required for the synthesis of
proteins in living things. We can learn more about how genes work by determining
which gene family a specific DNA sequence belongs to. Building a solid machine
learning model that can precisely classify DNA sequences into their appropriate gene
families is the goal of this study.

1.2 Machine Learning in Bioinformatics

Due to the exponential expansion of sequencing data and the limits of conventional
methods based on sequence alignment and homology, gene family categorization is
a difficult task in bioinformatics.
One of the key difficulties in gene family classification is distinguishing between orthologous and paralogous genes. Paralogous genes arise from gene duplication events and may have diverged significantly from their ancestral sequence, while orthologous genes arise through speciation and typically retain similar sequences and functions. Machine learning algorithms aid in identifying sequence patterns that indicate orthology or paralogy, helping to differentiate these gene types.
Machine learning has become an essential tool in gene family classification, partic-
ularly for the analysis of large-scale genomic data. As machine learning algorithms
continue to advance and as more genomic data becomes available, it is anticipated
that machine learning will continue to play a crucial role in the analysis of biological
data and the discovery of novel gene families.

1.3 Motivation

Gene family classification plays a significant role in precision medicine by providing insights into individual genetic profiles and enabling personalized healthcare. By
accurately classifying gene families, it becomes possible to understand the vari-
ations and mutations within specific gene families, leading to the identification of
disease-associated variants and the prediction of individual disease risks. This knowl-
edge is critical for assessing an individual’s susceptibility to certain diseases and
enables proactive measures for disease prevention, early detection, and personalized
screening programs. Moreover, accurate gene family classification plays a vital role
in treatment selection and response prediction. Different gene families may influence
an individual’s response to specific therapies, and by considering the genetic profile
of the patient, healthcare providers can predict treatment responses and select the
most effective treatment options. Additionally, gene family classification facilitates
the development of targeted therapies tailored to an individual’s genetic profile. This
study is organized into several sections to effectively present the study’s objectives
and findings. The dataset description section provides a detailed overview of the
dataset used, including its source, size, and characteristics of the gene sequences.
The proposed architecture section outlines the methodology and approach
employed, incorporating various machine learning algorithms such as support vector
machines, random forests, or XGBoost. Feature extraction using k-mer is explained
separately, detailing the process and rationale behind its selection. The limitations of
the research are discussed, highlighting potential biases or challenges faced during
the study. A comparative analysis section presents a comprehensive evaluation of
the performance and accuracy of the different machine learning algorithms used.

Results and discussion are presented, including the outcome of the classification
experiments and the interpretation of the findings. The conclusion summarizes the
main findings, emphasizes their significance, and proposes future directions for gene
family classification research.

2 Literature Survey

The authors in [1] use a genetic algorithm in conjunction with deep learning to
classify viral DNA sequences. The suggested approach uses a genetic algorithm for
feature selection and a convolutional neural network (CNN) for feature extraction.
The proposed methodology calls for preprocessing DNA sequences to extract useful
features, followed by the use of several machine learning classification algorithms,
such as support vector machines, decision trees, and random forests.
The researchers in [2] focus on the challenging task of accurately classifying
viral DNA sequences, which is crucial for understanding viral evolution, developing
diagnostics, and designing targeted treatments. The proposed approach leverages the
power of deep learning models, specifically convolutional neural networks (CNNs),
to automatically extract relevant features from the DNA sequences.
The experts who authored the publication [3] explore the use of machine learning
techniques in analyzing DNA subsequence and restriction sites. It likely discusses
the application of machine learning algorithms to automate and enhance the analysis
of DNA sequences and restriction enzyme recognition sites.
In [4], an AdaBoost algorithm based on support vector machines (SVMs) and its use in diverse domains are presented. The study suggests a novel method that combines the AdaBoost algorithm with SVM to enhance classification performance.
The authors of this study [5] provide a summary of the most significant devel-
opments in DNA sequencing technology between 2006 and 2016. It goes into the
effect of numerous sequencing platforms, their advantages and disadvantages, and
the study fields in which they are used.
The researchers in [6] discuss the problems and potential paths for DNA
sequencing research. This research introduces a deep learning method for classifying
DNA sequences.
In [7], the authors explore algorithms including artificial neural networks, support
vector machines, decision trees, and random forests to classify DNA sequences
according to their functions. The authors also explore feature selection strategies
that might be applied to retrieve pertinent data from DNA sequences.
The experts in [8] suggest a technique that makes use of the AdaBoost algorithms
to identify DNA-binding proteins. The method combines PseKNC, a feature extrac-
tion technique that captures sequence information, with the AdaBoost algorithm
for classification. By utilizing these two techniques, the proposed method aims to
enhance the accuracy of DNA-binding protein recognition.

The authors in [9] show that their approach achieves excellent accuracy and
specificity while outperforming other widely used approaches for identifying DNA-
binding proteins. Understanding protein–DNA interactions and drug discovery may
both benefit from the proposed approach. The paper also describes a method for storing bio-orthogonal data in l-DNA using a mirror-image polymerase; the authors demonstrate that the mirror-image polymerase accurately synthesizes l-DNA and showcase the potential for scalable information storage using this approach.
In [10], an unsupervised classifier, CDLGP, utilizes deep learning for gene prediction. The paper discusses the methodology behind CDLGP and its application in gene prediction, and presents experimental results to demonstrate its effectiveness.
In [11], the paper introduces a k-Nearest Neighbors (kNN) model-based approach for classification tasks. It discusses the methodology of using kNN for classification, including the selection of the k value and distance metrics, and provides insights into the strengths, limitations, and experimental evaluations of the kNN model-based approach.
The authors in [12] explain the theoretical aspects of SVMs, discussing the opti-
mization problem and the underlying mathematical concepts. It explores the gener-
alization properties of SVMs and their ability to handle nonlinear classification
tasks through kernel functions. Additionally, the authors present practical consid-
erations for implementing SVMs, such as the choice of kernel and the selection of
hyperparameters.
The experts in [13] conduct an empirical analysis of decision tree algorithms by
applying them to various benchmark datasets. They compare the performance of
different algorithms based on evaluation metrics such as accuracy, precision, recall,
and F1-score. The analysis aims to provide insights into the strengths, weaknesses,
and suitability of each algorithm for different classification tasks.
The experts in the paper ‘Random Forests and Decision Trees’ [14] discuss
machine learning algorithms. Random forests combine multiple decision trees,
offering robustness and generalization. Decision trees make predictions by recur-
sively splitting data based on features. The paper likely covers principles, advantages,
construction, and performance comparisons of these algorithms.
In [15], the study introduces XGBoost, an optimized implementation of gradient
boosting machines, focusing on scalability, speed, and performance improvements.
It discusses the algorithm’s key features, techniques, and empirical results.
The authors in [16] explore the AdaBoost algorithm, a popular ensemble learning
technique. The paper likely discusses the algorithm’s principles, and applications,
and provides research insights into its effectiveness.

3 Proposed Work

3.1 Architecture

In the proposed architecture, the DNA dataset first undergoes preprocessing to prepare it for analysis. This involves cleaning the data and removing unwanted symbols and information. The model is then trained on the human dataset and tested on both the chimpanzee and dog datasets. Figure 1 depicts the architecture of our proposed methodology.
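As an illustration of the preprocessing step, one way the cleaning could be done is sketched below. The paper does not specify which symbols are removed, so the function name and the regular expression are our assumptions.

```python
import re

def clean_sequence(raw):
    """Normalize a raw DNA record: uppercase it and keep only valid
    nucleotide symbols (A, C, G, T, plus N for ambiguous bases)."""
    seq = raw.strip().upper()
    # Assumption: anything outside the nucleotide alphabet is noise
    # (whitespace, digits, separator characters) and is dropped.
    return re.sub(r"[^ACGTN]", "", seq)
```

For example, `clean_sequence(" atg-c 12gTn\n")` yields `"ATGCGTN"`.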

Fig. 1 Architectural view of the proposed methodology

3.2 Implementation of Machine Learning Algorithms

3.2.1 k-Nearest Neighbors Classifier

The k-Nearest Neighbors (kNN) [11] classifier assigns gene sequences to specific families based on their similarity to known labeled sequences, computed as the distance between a new gene sequence and the labeled sequences in the training set.

3.2.2 Support Vector Machine (SVM) Classifier

The support vector machine (SVM) [12] classifier creates a hyperplane that effec-
tively distinguishes various gene families by considering their feature vectors. By
identifying the best decision boundary, SVM accurately assigns unknown gene
sequences to their corresponding families.

3.2.3 Decision Tree Classifier

The decision tree classifier [13] constructs a hierarchical tree-like structure where
internal nodes represent feature tests and leaf nodes represent class labels (gene
families). The algorithm recursively partitions the feature space based on the most
informative features, leading to efficient classification.

3.2.4 Random Forest Classifier

The random forest classifier [14] is an ensemble learning approach that builds a
group of decision trees during the training process. After constructing the forest, the
classifier determines the gene family of an unknown sequence by selecting the class
that is most commonly predicted by the individual trees.

3.2.5 XGBoost Classifier

The XGBoost classifier [15] is a robust machine learning algorithm commonly utilized in gene family classification. It is an optimized implementation of gradient boosting, which sequentially combines weak learners to form a strong predictive model.

3.2.6 AdaBoost Classifier

The AdaBoost classifier [16] can effectively handle imbalanced datasets, where
certain gene families may be underrepresented. It focuses on improving the classi-
fication of challenging gene sequences by assigning higher weights to misclassified
instances. Through iterative training, AdaBoost adapts and enhances its predictive
performance, leading to accurate gene family classification results.
The above-mentioned classifiers are widely employed in gene family classification
due to their simplicity, interpretability, and ability to handle different types of data.
These classifiers are often used as baseline models to establish a performance bench-
mark. Boosting classifiers are powerful ensemble methods that combine multiple
weak learners (base classifiers) to create a strong classifier. In gene family classifica-
tion, boosting algorithms can effectively handle complex relationships and capture
subtle patterns in the data. This iterative process improves the overall performance
of the classifier. Boosting classifiers are known for their high accuracy and ability to
handle class imbalance.
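The six classifiers described above can be compared side by side with scikit-learn. The sketch below is illustrative rather than the authors' exact pipeline: it uses a small synthetic dataset in place of the k-mer features, and scikit-learn's GradientBoostingClassifier stands in for the external XGBoost package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bag-of-words k-mer features (3 of 7 classes).
X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"f1={f1_score(y_te, pred, average='macro'):.3f}")
```

The same loop structure applies unchanged once the synthetic features are replaced by the count-vectorized hexamer sentences.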

4 Implementation

A gene family is a group of genes that share a common origin and have similar
functions or sequences. These genes are derived from a common ancestral gene
through processes such as gene duplication and divergence over evolutionary time.
Gene divergence is a fundamental mechanism that drives the evolution of gene fami-
lies. As duplicated genes accumulate mutations, they can acquire distinct functions
or develop specialized roles within an organism. Studying gene families provides
insights into the evolutionary processes that have shaped the genomes of organisms.
By comparing gene family composition and organization across different species,
researchers can unravel evolutionary relationships and trace the origins of key biological innovations. The gene families mentioned below have significant biological and physiological significance:
(1) G protein-coupled receptors (GPCRs): They are essential membrane proteins
that play an important role in cell signaling. They are involved in transmitting
signals from various external stimuli (such as hormones, neurotransmitters,
and light) into the cell.
(2) Tyrosine kinase: These are enzymes that add phosphate groups to specific tyro-
sine residues of target proteins. They play a vital role in cellular communication
and signaling pathways, including growth, differentiation, and cell cycle control.
(3) Tyrosine phosphatase: These are enzymes that remove phosphate groups from
tyrosine residues and act in balance with tyrosine kinases to regulate cellular
signaling and control various cellular processes.

Table 1 Gene families

Gene family                  Number of samples  Class label
G protein-coupled receptors  531                0
Tyrosine kinase              534                1
Tyrosine phosphatase         349                2
Synthetase                   672                3
Synthase                     711                4
Ion channel                  240                5
Transcription factor         1343               6

(4) Synthetase: They are enzymes involved in the synthesis of various molecules that
catalyze the attachment of amino acids to their corresponding tRNA molecules
during protein synthesis.
(5) Synthase: They are enzymes that catalyze the synthesis of complex molecules
by joining smaller molecular components. For example, ATP synthase is an
enzyme involved in the synthesis of ATP, the primary energy currency of cells.
Synthases are vital for energy production and various biosynthetic pathways.
(6) Ion channels: They are membrane proteins that allow the selective passage of
ions across cell membranes. They play critical roles in regulating the elec-
trical properties of cells, nerve impulses, muscle contraction, and numerous
physiological processes.
(7) Transcription factors: They are proteins that control the transcription of target
genes by binding to certain DNA regions to influence gene expression.
Table 1 represents the gene families and the number of samples along with the
class label it belongs to.

4.1 k-mer Counting

In bioinformatics and genomics, k-mer counting is a form of feature extraction. When working with DNA sequencing data, we must convert the DNA sequences into a format that can be properly analyzed, and k-mer counting is a prominent technique for this. It involves breaking down the DNA sequence into smaller parts based on a value called 'k'. This approach is particularly useful when dealing with large sets of DNA sequences because it allows for quick and efficient analysis. A key characteristic of k-mer counting is that it helps us deal with the challenge of variable read lengths in DNA sequencing data.
When we sequence DNA, the lengths of the fragments we obtain can vary, which
makes it tricky to compare and analyze them accurately. But by dividing the sequences
into fixed-size segments called k-mers, we simplify the analysis process. Each k-mer
represents a unique subsequence within the DNA sequence, so we can focus on
specific patterns or areas of interest. Another advantage of K-mer counting is that

Fig. 2 Analysis of DNA sequence with k = 6

Fig. 3 Class distribution of human dataset

it makes it easier to use machine learning models in DNA sequence analysis. Many
machine learning algorithms work best when the inputs have fixed lengths, and using
fixed-size k-mers meets this requirement. By transforming the DNA sequences into
fixed-length vectors of k-mers, we can more effectively apply machine learning
techniques to uncover meaningful patterns or features in the data.
In our study, we have used k = 6, which means a word length of six, also called hexamers. Figure 2 shows how a DNA sequence is broken down into k-mers of size 6. The next step involves converting the list of k-mers of length six for each gene into sentences composed of strings. This step combines all the sequences, which simplifies the conversion into a bag-of-words representation and allows the creation of independent features in the form of strings.
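The hexamer decomposition described above can be sketched in a few lines; the function names here are ours, not the authors'.

```python
def kmers(sequence, k=6):
    """Slide a window of length k over the sequence to collect
    all overlapping k-mers (hexamers when k = 6)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def to_sentence(sequence, k=6):
    """Join the k-mers into a space-separated 'sentence' so that a
    bag-of-words vectorizer can treat each hexamer as a word."""
    return " ".join(kmers(sequence, k))
```

For instance, `kmers("ATGCATGCA")` produces the four hexamers `ATGCAT`, `TGCATG`, `GCATGC`, and `CATGCA`, and `to_sentence` turns them into one string ready for count vectorization.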

4.2 Dataset Description

We used three distinct datasets: human, dog, and chimpanzee genomes from
Kaggle. Each dataset consists of DNA sequences along with corresponding class
labels. In the datasets, we encountered varying class distributions, with some classes
having low representation, while others were relatively balanced. Therefore, utilizing

the available data directly seemed like a viable option. In addition to this, if the issue of
class imbalance persists, oversampling techniques can be employed. Oversampling
involves duplicating or synthesizing minority class samples to create a more balanced
distribution, helping machine learning models learn from a representative dataset.
Another technique that can be used is class weighting. It assigns different weights to
instances of each class during the training process, typically inversely proportional to
the class frequencies. Class weighting gives higher importance to the minority class,
enabling the model to focus on correctly predicting instances from that class. This
helps reduce the influence of class disparities and increases the model’s performance.
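The inverse-frequency class weighting described above can be computed directly. This sketch mirrors the common 'balanced' heuristic, n_samples / (n_classes * class_count); it is our illustration, not code from the paper.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    weight(c) = n_samples / (n_classes * count(c))."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}
```

Applied to Table 1, the ion-channel class (240 samples) would receive the largest weight and the transcription-factor class (1343 samples) the smallest, steering the model toward the underrepresented families.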
For the human genome dataset, we divided it into training and test sets in an 80:20
ratio. The training set was used to train various classification algorithms, while the
test set served as an independent evaluation of model performance. The chimpanzee
and dog datasets were solely employed for testing purposes. Our primary objective
was to assess the generalizability of the machine learning model across divergent
species. By initially training the model on the human data, we aimed to determine
its effectiveness when applied to the chimpanzee and dog datasets, which repre-
sent species with increasing divergence from humans. To evaluate the accuracy of
the classification algorithms, we employed different classification metrics derived
from the confusion matrix. These metrics provided a comprehensive assessment of
the model’s performance. In Figs. 3 and 4, we present the Class Balance of each
dataset, showcasing the distribution of classes across the human, dog, and chim-
panzee genomes. Table 2 provides a comprehensive overview of the datasets utilized
in this study, including the number of sequences contained in each dataset and their
corresponding attributes.
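The 80:20 split can be reproduced with a few lines of standard-library Python (scikit-learn's `train_test_split` does the same job); the fixed seed below is arbitrary and our own choice.

```python
import random

def split_80_20(X, y, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed, then hold out the last
    20% of samples as the test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_size))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [X[i] for i in te],
            [y[i] for i in tr], [y[i] for i in te])
```

In this study only the human dataset is split this way; the chimpanzee and dog datasets are used whole, as held-out test sets.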

Fig. 4 Class distribution of dog and chimpanzee datasets



Table 2 Description of each dataset

Items                 Chimpanzee  Dog  Human
Number of sequences   1682        820  4380
Number of attributes  2           2    2

4.3 Limitations and Challenges

Pseudogenes: The classification of pseudogenes, non-functional gene copies with high sequence similarity, can be challenging. Distinguishing them from functional
genes requires careful examination of features such as intact open reading frames or
regulatory elements. Incorporating such functional elements is crucial for accurate
classification.
Sequence Variations: Genetic polymorphisms, allelic differences, and sequencing
errors introduce variations that can complicate gene family classification. To address
this, advanced algorithms and comprehensive analysis of multiple individuals or
species are necessary. Considering a broader range of genetic variations helps to
improve the accuracy of classification.
Identification of New Gene Families: Gene families exhibit a dynamic nature,
constantly evolving and giving rise to new families. Staying updated with the latest
genomic data, regularly updating classification pipelines, and integrating diverse
genomic information are essential for identifying and classifying new gene families.
A holistic approach that encompasses multiple data sources and analytical techniques
facilitates the discovery of novel gene families.
Gene Family Size and Complexity: Gene families can vary significantly in size
and complexity. Some gene families consist of only a few closely related genes,
while others are large and contain divergent members. Classifying complex fami-
lies with extensive divergence necessitates the use of specialized algorithms and
scalable computational resources due to the computational demands involved. These
resources enable comprehensive analysis and accurate classification of gene families
with varying sizes and complexities.

5 Assessment Metrics

Accuracy and F1-score are significant evaluation metrics to take into account when
using machine learning approaches for classifying gene families. By calculating the
proportion of cases that are correctly classified to all instances, accuracy assesses the
classification model’s overall correctness. A more complete metric that incorporates
recall and precision is the F1-score. It takes into account both false positives (putting
a gene in the incorrect family) and false negatives (not finding a gene that belongs
to a certain family). It is frequently crucial to achieve both a high accuracy and a high F1-score simultaneously when classifying gene families.
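Both metrics can be written out explicitly. The sketch below computes accuracy and a macro-averaged F1; the macro averaging scheme is our assumption, since the paper does not state which F1 variant it reports.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances that are correctly classified."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Per-class F1 from precision and recall, averaged over classes."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

The false positives enter through precision and the false negatives through recall, which is exactly why the F1-score penalizes both kinds of misclassification described above.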

Achieving high accuracy in classifying gene families is important for understanding the functions and evolutionary history of genes. When we can accurately classify genes into families, it helps us assign meaningful roles to them and predict their
functions. Accurate classification also allows us to trace the evolutionary relation-
ships of genes, identifying genes that have a shared ancestry or that have arisen
from gene duplication events. By understanding these relationships, we can gain
insights into how genes have evolved and how their functions have changed over
time. Furthermore, accurate classification helps us unravel the intricate networks
and interactions between genes, shedding light on complex biological processes.

6 Comparative Analysis

In our research, we conducted a comprehensive comparison and analysis of various machine learning models to evaluate their accuracy and F1-score. The models considered in our study are those mentioned above. Among these models, the random forest method achieved a remarkable accuracy of 90.9% and an F1-score of 91.06%.
The results of our study show that the proposed model is highly effective in
accurately categorizing DNA sequences. Among the various algorithms we tested,
the random forest classifier performed the best, achieving the highest accuracy and
F1-score. Based on this performance, we decided to further evaluate the random forest
model using additional datasets, specifically DNA sequences from Chimpanzees
and dogs. By including these genetically related (human and Chimpanzee) and less
related (human and dog) species, we aimed to assess the model’s performance across
different levels of genetic similarity.
In Figs. 5, 6 and Table 3, we present the comparative analysis of all the algorithms
we tested. These figures provide visual representations of how the different models
performed in terms of accuracy and F1-score. The data demonstrates the superior
performance of the random forest model, reinforcing our decision to select it for
further evaluation with the Chimpanzee and dog datasets.

7 Result Analysis

In our analysis, we placed specific emphasis on evaluating the performance of two classification algorithms: the random forest classifier and the XGBoost classifier. The confusion matrices depicted below present a comprehensive depiction of the models' predictions by comparing the actual labels with the predicted labels. These matrices serve as an invaluable summary of the classification outcomes.
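A confusion matrix of the kind shown in Figs. 7-10 is simply a count table over (actual, predicted) label pairs; a minimal sketch, with the seven-family setup of Table 1 as the default:

```python
def confusion_matrix(y_true, y_pred, n_classes=7):
    """Rows are actual class labels, columns are predicted labels;
    cell [i][j] counts sequences of class i predicted as class j."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m
```

The diagonal holds the correct predictions, so accuracy is the trace of the matrix divided by the total number of samples.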
Upon assessing the human genome test set, we observed that the random forest
classifier achieved an impressive accuracy of 91.5%.
When extending our analysis to the chimpanzee dataset, which exhibits genetic
similarities to humans, both the random forest and XGBoost classifiers showcased

Fig. 5 Accuracy performance of all the algorithms

Fig. 6 F1-score performance of all the algorithms

Table 3 Accuracy and F1-score performance of all the algorithms

S. No  Classifier     Accuracy (%)  F1-score (%)
1      KNN            81.3          79.8
2      SVM            81.5          81.9
3      Decision tree  83.5          81.1
4      Random forest  90.9          91.06
5      XGBoost        89.1          89.3
6      AdaBoost       77.6          83.6

Fig. 7 Confusion matrix for human dataset using random forest classifier

comparable accuracies. The random forest classifier attained an accuracy of 98.4%. However, the divergent dog genome dataset presented a distinct challenge. Here, the random forest classifier outperformed the XGBoost classifier, attaining an accuracy of 82%.
Overall, the analysis highlights the varying performance of the random forest and
XGBoost classifiers across different genome datasets. While the XGBoost classi-
fier excelled on the human genome and exhibited comparable performance on the
chimpanzee dataset, the random forest classifier proved to be more effective when
confronted with the distinct characteristics of the dog genome.
The accuracy values presented above were derived from analysis of the corre-
sponding confusion matrices, as visually depicted in Figs. 7, 8, 9, and 10. These
matrices encapsulate the classification outcomes and serve as valuable references for
evaluating the performance of the classifiers.

8 Conclusion and Future Scope

This study undertook a comparative analysis of six distinct classification algorithms, namely KNN, SVM, decision tree, random forest, XGBoost, and AdaBoost. To
convert DNA sequences into fixed-length vectors, the k-mer encoding technique
was employed. Additionally, NLP’s bag-of-words algorithm, implemented using a
count vectorizer, facilitated the processing of DNA sequence strings. Among all
the algorithms, Random forest and XGBoost displayed equally impressive results,
achieving accuracies of 90.9% and 89.1%, respectively.
The project has ample room for future enhancements and expansions, offering
an opportunity to delve into the relationship between classifier performance and

Fig. 8 Confusion matrix for human dataset using XGBoost classifier

Fig. 9 Confusion matrix for chimpanzee and dog datasets, respectively, using random forest
classifier

Fig. 10 Confusion matrix for chimpanzee and dog datasets, respectively, using XGBoost classifier

variations in the ‘k’ values. By adjusting the ‘k value’, which determines the length
of the substring in gene sequences, we can gain insights into how different values
impact the classifier’s effectiveness. Investigating the effects of diverse ‘k’ values
on the classifier’s performance presents an enticing path for further research and
exploration.

Dense Convolution Neural Network
for Lung Cancer Classification
and Staging of the Diseases Using
NSCLC Images

Ahmed J. Obaid, S. Suman Rajest, S. Silvia Priscila, T. Shynu, and Sajjad Ali Ettyem

Abstract Lung cancer is a life-threatening disease caused by the abnormal development of cells in the lung and its surrounding tissue. Identification and classification of lung tumor growth through physical examination are extremely challenging due to complex boundaries and features with a high degree of intraclass variation and a low degree of interclass variation. Machine learning approaches have been implemented to classify the cancer on the basis of the tumor representation and its features, but those models consume more computation time and produce reduced accuracy and efficiency. To manage those complications, a deep learning architecture has been introduced, as it has multiple advantages in characterizing lung lesion features accurately. In this article, a dense Convolution Neural Network architecture for lung cancer classification and staging of the disease on NSCLC images is proposed. Initially, a Wiener filter is employed as a preprocessing technique, as it improves the results of segmentation. Next, gradient vector flow-based segmentation is applied to the images to segment the coarse appearance and lesion boundary, and the segmented image is processed with the ABCD rule as a feature descriptor to extract lesion features such as lesion diameter, asymmetry, border and color. The extracted features are used to train the densely connected multi-constrained Convolution Neural Network, which contains dense blocks with 128 layers and is capable of producing better accuracy with

A. J. Obaid (B)
Faculty of Computer Science and Mathematics, University of Kufa, Kufa, Iraq
e-mail: ahmedj.aljanaby@uokufa.edu.iq
S. Suman Rajest · S. Silvia Priscila
Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, India
e-mail: silviaprisila.cbcs.cs@bharathuniv.ac.in
T. Shynu
Department of Biomedical Engineering, Agni College of Technology, Chennai, Tamil Nadu, India
S. A. Ettyem
National University of Science and Technology, Thi-Qar, Iraq
e-mail: sajad.a.ataim@nust.edu.iq

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 361
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_28

reduced processing time. Furthermore, the proposed model uses hyperparameter optimization to reduce the network complexity and enhance the computational efficiency. The implementation outcome of the current approach is assessed using MATLAB software on the NSCLC dataset. Performance analysis of the proposed model on three classes of the disease, large cell carcinoma, adenocarcinoma and squamous cell carcinoma, yields 98.75% accuracy, 98.46% specificity and 99% sensitivity, respectively, compared against conventional classifiers.

Keywords Lung cancer · Deep learning · Dense Convolution Neural Network · ABCD segmentation · Feature extraction

1 Introduction

Lung cancer is a complex, deadly illness that primarily arises from the accumulation of numerous molecular variations. Molecular variation of lung cells leads to a tumor in the form of cancer with nonuniform representation and extremities [1]. Identification of lung cancer can be carried out using invasive techniques such as clinical screening and biopsy, and non-invasive techniques such as analysis of images with respect to dermoscopic and histopathological aspects. However, correct diagnosis of a lung lesion is hard, cumbersome and complicated due to the heterogeneous appearance, nonuniform shapes and segments of lung lesions [2]. Manual recognition and classification of lung lesions are extremely demanding and difficult on features with a large degree of intraclass changes and a low degree of interclass modification [3].
Machine learning algorithms such as K-Nearest Neighbor [4], Random Forest [5] and Artificial Neural Network [6] have been implemented to classify the cancer into tumor and non-tumor types on the basis of the tumor representation and its characteristics of structure, dimensions and edges. Machine learning models are not suitable for staging the tumor features; those approaches require high computation time, which leads to reduced accuracy and efficiency. Furthermore, these approaches possess reduced time-varying capability and are less resilient to tumor boundary changes across the multiple classes of tumor features of small lung cancer cells. To mitigate those limitations, a deep learning architecture is utilized, as it is highly beneficial in categorizing the features of the lesions efficiently and accurately [7].
In this paper, a novel dense Convolution Neural Network for lung cancer classification and staging of the disease on NSCLC images is proposed. Initially, image noise removal, segmentation and feature extraction are employed: preprocessing for noise, segmentation of the coarse appearance and lesion boundary, and extraction of lesion features such as lesion diameter, lesion asymmetry and lesion borders. Further, the extracted features are fed to the dense Convolution Neural Network, which contains dense blocks to classify the features into three disease classes: adenocarcinoma, large cell carcinoma and squamous cell carcinoma [8, 21, 22]. Finally, the proposed model uses hyperparameter optimization to reduce the network complexity and enhance the computational efficiency.
The remaining article is partitioned as follows: Sect. 2 represents the problem
statement and literature review for lung cancer classification. In Sect. 3, the current
dense Convolution Neural Network architecture for disease classification on lesion
features into types and stages has been provided. Implementation analysis of the
current methodology on the disease dataset is accomplished in Sect. 4 along with
experimental analysis on numerous measures like accuracy, recall and precision
on the confusion matrix. Finally, Sect. 5 concludes the work with remarkable
suggestions.

2 Related Work

In this part, multiple traditional approaches are implemented for lung lesion identifi-
cation and classification as an automated approach on the analysis of medical images
by incorporating a machine learning model which is illustrated as follows.

2.1 Lung Lesion Classification Using Artificial Neural Network

Artificial Neural Networks are effective in detecting and classifying lung lesions. Categorization is carried out after preprocessing the lung images with a maximum gradient intensity algorithm [9]. Next, the preprocessed image is segmented using the Otsu threshold model to separate the lung lesion. A gray-level co-occurrence matrix [10] is applied to extract multiple features from the segmented images, and the computed features are employed to train the neural network, which classifies them into tumor and non-tumor classes. The finding for this particular architecture is that it is capable of classifying lung disease into various classes with a performance accuracy of 96% and reduced processing time compared with other machine learning classifiers.

2.2 K-Nearest Neighbor Classification Model for Lung Lesion Classification

The K-Nearest Neighbor classification model is employed to identify lung lesions and segment them into normal and benign. Classification is carried out after preprocessing, feature extraction and segmentation of the images using region growing and a local binary pattern mechanism [11]. These processes generate the lesion boundaries and effective features for classification. KNN classification on the obtained features [12] classifies tumor and non-tumor with high scalability and reliability. The finding for this particular architecture is that it is capable of classifying lung disease into various classes with a performance accuracy of 87%, but it leads to over-fitting and under-fitting issues.

3 Current Approach

In this part, a novel deep learning approach, represented as a dense Convolution Neural Network architecture, is designed for the lung lesion illness. This approach is established to detect and classify the severity of disease tumors into adenocarcinoma, large cell carcinoma and squamous cell carcinoma, classified and staged with respect to the lesion features.

3.1 Image Preprocessing

A lung image may contain artifacts such as noise, which can be eliminated using an image contrast enhancement technique termed CLAHE. It is employed to smoothen the image by eliminating the artifacts without altering the necessary characteristics of the lesion image. CLAHE incorporates the histogram equalization operation along with bilinear interpolation, which is employed to enhance the contrast and reduce the noise, including adaptive median filtering.
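The adaptive median filtering mentioned above can be illustrated with a fixed-window simplification. This is a hedged sketch (the function name and the fixed 3×3 window are our choices, not the paper's implementation):

```python
import statistics

def median_filter3(img):
    """3x3 median filter over a 2D grayscale image (list of lists).
    Border pixels are copied unchanged; interior pixels are replaced by
    the median of their 3x3 neighbourhood, which suppresses impulse noise."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = statistics.median(window)
    return out

noisy = [[10, 10, 10],
         [10, 255, 10],   # isolated impulse ("salt") noise at the centre
         [10, 10, 10]]
clean = median_filter3(noisy)
# the bright outlier is replaced by the neighbourhood median (10)
```

An adaptive variant would grow the window until the median is judged reliable; the fixed window here is the simplest case.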

3.2 Image Segmentation—Gradient Vector Flow (GVF)

The preprocessed NSCLC image is segmented using Gradient Vector Flow [14] to detect the lesion boundaries. The gradient vector allows large-scale feature similarity of the tumor part when processing with respect to the lesion boundary. The object boundary of the image lesion is fixed using the following equation:

X(s) = (x(s), y(s)), where s ∈ [0, 1].

The image contour is started using heuristic criteria. The criteria compute coarse appearance and lesion boundary. The lesion is represented using the differential equation

dX(s, t)/dt = F_int(X(s, t)) + V_int(X(s, t)),

where F_int is an internal force which keeps the shape continuity and smoothness of the contour, and V_int is the gradient vector flow. The vector flow contains the lesion boundaries, which are taken for feature extraction.
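A toy discretization of the gradient vector flow field can be sketched as follows; the parameter values, helper names and grid size are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def laplacian(a):
    """Discrete 5-point Laplacian with wrap-around boundaries."""
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0)
            + np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4 * a)

def gvf(edge_map, mu=0.1, iters=80):
    """Toy gradient vector flow: iteratively diffuse the edge-map gradient
    (fx, fy) into homogeneous regions, so the field can pull a contour
    toward the lesion boundary from far away."""
    f = edge_map.astype(float)
    fy, fx = np.gradient(f)            # gradients along rows (y) and columns (x)
    u, v = fx.copy(), fy.copy()
    mag2 = fx ** 2 + fy ** 2           # keep the field pinned to strong edges
    for _ in range(iters):
        u = u + mu * laplacian(u) - mag2 * (u - fx)
        v = v + mu * laplacian(v) - mag2 * (v - fy)
    return u, v

edges = np.zeros((16, 16))
edges[5:11, 5:11] = 1.0                # a square "lesion" edge map
u, v = gvf(edges)                      # external force field V_int for the snake
```

The returned (u, v) field plays the role of V_int in the evolution equation above.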

3.3 Feature Extraction—ABCD Rule

The segmented image is processed by employing the ABCD rule [15] as a feature descriptor to extract lesion features such as lesion diameter, asymmetry, border and color. Variational features are segmented into normal and abnormal tumor features by employing the ABCD segmentation conditions, which process the feature vector to identify the asymmetry, border and color characteristics of the feature [10] and segment the tumor regions accurately.
• Asymmetry
Asymmetry is important in lesion segment analysis. The asymmetry of the lesion is computed using the asymmetry index and the lengthening index. The asymmetry index of the lesion image segments is computed using

AI = (ΔA / A) × 100,
where A is the total surface of the image and ∆A is the surface difference among the
tumor surfaces of the image.
• Border Irregularity
Border irregularity computes the irregularity in the border of the lesion segments.
There are many measures to compute irregularity termed as compact index, fractal
index, edge and pigment variations.
• Compact Index: It is the computation of the barrier to noise along the boundary. It is computed using the following equation:

CI = P_L² / (4π A_L),

where P_L is the lesion perimeter and A_L is the lesion area.
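The asymmetry and compactness measures above can be computed from a binary lesion mask. This is an illustrative sketch: reading ΔA as the area that does not overlap when the mask is folded about its vertical axis, and estimating the perimeter by counting exposed pixel edges, are our assumptions:

```python
import numpy as np

def asymmetry_index(mask):
    """AI = (dA / A) * 100, with dA taken as the area that fails to
    overlap when the lesion mask is mirrored about its vertical axis."""
    delta_a = np.logical_xor(mask, mask[:, ::-1]).sum()
    return 100.0 * delta_a / mask.sum()

def compact_index(mask):
    """CI = P_L^2 / (4 * pi * A_L), with the perimeter P_L estimated by
    counting lesion pixel edges exposed to non-lesion pixels."""
    padded = np.pad(mask.astype(int), 1)
    core = padded[1:-1, 1:-1]
    neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                  + padded[1:-1, :-2] + padded[1:-1, 2:])
    perimeter = ((4 - neighbours) * core).sum()
    area = core.sum()
    return perimeter ** 2 / (4 * np.pi * area)

lesion = np.zeros((12, 12), dtype=bool)
lesion[2:10, 2:10] = True   # a symmetric square "lesion": AI is 0, CI > 1
```

A perfect disk gives CI near 1, so larger CI values indicate a more irregular border.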

3.4 Dense Convolution Neural Network

In this work, the extracted features of the ABCD feature descriptor are fed to the DenseNet architecture. It processes the feature vector to produce the disease type and stage, such as large cell carcinoma, adenocarcinoma and squamous cell carcinoma. An AlexNet mechanism [16] is employed to generate combinations of feature maps with minimum resolution, as it is a mixture of the convolution layer, pooling layer, activation layer, loss layer and fully connected layer as the classification layer.

Fig. 1 Feature map of the lung lesion feature vectors
• Convolution Layer
In this convolution layer, a kernel size of 3×3 is utilized to process the features from the ABCD descriptors. It constructs the feature map from the feature vectors. Figure 1 represents the feature map generation in the convolution layer.
• Max Pooling Layer

In this layer, the feature vector in the form of a feature map is down-sampled by half by computing the relationships among the lesion features, and a pooling index is created for the features to control over-fitting issues. The max pooling layer extracts features which have high-level representations in the feature vector constructed by the ABCD rule. The feature map is represented as:

F_m ∈ R^(C×H×W).

• Activation Layer

The proposed approach employs rectified linear units (ReLU) as the activation function, as it improves the training stage to minimize errors and introduces nonlinearity among the max-pooled feature vectors. The activation function is given by

ReLU: F(x) = x if x > 0, and 0 if x ≤ 0.

Every activation output is processed with segment normalization to eliminate over-fitting issues and enhance model generalization by normalizing the activation output of the model. A further convolution operation of the model produces the staging of the diseases for the tumor features. Figure 2 illustrates the current representation of the dense Convolution Neural Network.

Fig. 2 Representation of the processing of dense Convolution Neural Network

• Fully Connected Layer

The fully connected layer is a flattening process that learns the most discriminative features of the feature map to construct a class. Classes containing the feature vector are transformed into one-dimensional data.
• Softmax Layer
The Softmax module is employed to map each image pixel to a certain class of lung tumor. The Softmax classifier maps the feature outcomes of the image pixels to an N-channel segment of probabilities and identifies the segments related to the disease class with the largest probability for every image portion of the tumor.

P(y = i | x) = e^(x_i) / Σ_t e^(x_t),

where x is the feature map and the sum runs over the n classes.
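The ReLU, max pooling and Softmax operations described above can be expressed compactly in NumPy. This is an illustrative sketch of the layer mathematics, not the paper's network:

```python
import numpy as np

def relu(x):
    """ReLU: F(x) = x for x > 0, else 0."""
    return np.maximum(0, x)

def max_pool2(fmap):
    """Down-sample a (C, H, W) feature map by half with 2x2 max pooling."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def softmax(x):
    """P(y = i | x) = exp(x_i) / sum_t exp(x_t), shifted for numerical stability."""
    e = np.exp(x - x.max())
    return e / e.sum()

fmap = np.arange(16.0).reshape(1, 4, 4)
pooled = max_pool2(relu(fmap - 8))          # (1, 2, 2) high-level features
probs = softmax(np.array([2.0, 1.0, 0.5]))  # scores for 3 disease classes
# probs sums to 1; the largest probability selects the predicted class
```

The argmax over `probs` corresponds to choosing the disease class with the largest probability, as the Softmax layer does.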

Algorithm: Lung Lesion Classification Using Dense CNN

Input: Feature vector V = {v1, v2, …, vN}
Output: Disease class label D = {C1, C2, …, CN}
Process:
  Convolution layer: fix the kernel as a 3×3 matrix
  Generate the convolution feature vector
  F_m = feature map containing high-level features of the lesion
  F_C = BatchNormalization(F_m)
  Tumor label = Softmax(F_C)
  Update model parameters θ using the gradient descent function
  Class ∈ {large cell carcinoma, adenocarcinoma, squamous cell carcinoma}

4 Experimental Results

Implementation outcomes of the current model have been evaluated in the MATLAB simulation environment using the NSCLC dataset [17]. MATLAB 2021 is used for modeling the learning model due to its in-built deep learning functionalities. In processing the images, the dataset is segmented into training, testing and validation sets. In this work, tenfold validation is employed to enhance the classification performance and disease staging with high scalability and accuracy. Model performance is computed with the Dice Similarity Coefficient, sensitivity and specificity. Table 1 presents the performance evaluation of the lung cancer detection and classification approaches.
• Dice Similarity Coefficient

It is estimated by the distance variation between the classification outcomes of the approach and the ground-truth data of the model. It is computed from the true-positive, false-positive and false-negative measures of the lesion classification outcome [18]. It is denoted as

Dice Similarity Coefficient = 2TP / (2TP + FP + FN).

The Dice Similarity Coefficient generates excellent outcomes when assessing lung cancer classification results containing the malignant lesion classes as squamous cell

Table 1 Performance evaluation of lung cancer classification techniques

Disease class            Technique                                Dice coefficient  Sensitivity  Specificity
Large cell carcinoma     Dense CNN (proposed model)               0.9948            0.9512       0.9979
                         Artificial Neural Network (existing)     0.9751            0.9436       0.9843
Squamous cell carcinoma  Dense CNN (proposed model)               0.9945            0.9614       0.9977
                         Artificial Neural Network (existing)     0.9742            0.9489       0.9849
Adenocarcinoma           Dense CNN (proposed model)               0.9916            0.9715       0.9915
                         Artificial Neural Network (existing)     0.9736            0.9499       0.9821

Fig. 3 Performance evaluation of lesion classification architectures with respect to Dice coefficient

carcinoma with 98.75% accuracy, respectively, compared against conventional classifiers, as represented in Fig. 3.
• Sensitivity

It is the ratio of the similar instances to the extracted instances; in other words, the resultant features extracted correctly among the tumor features [19]. It is represented as

Sensitivity = TP / (TP + FN).

Performance analysis of the proposed model on the sensitivity measure yields 99% sensitivity across the three disease classes, compared against conventional classifiers. Figure 4 represents the performance of the lesion classifier on sensitivity.
• Specificity

It is the ratio of correctly rejected irrelevant instances outside the tumor region; in other words, non-tumor features are correctly excluded from the extracted tumor features [20]. It is represented as

Specificity = TN / (TN + FP).

Performance analysis of the proposed model yields 98.46% specificity across the three disease classes, compared against conventional classifiers. Figure 5 represents the performance of the lung lesion classifier on specificity.
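The three evaluation measures follow the standard confusion-matrix definitions (with specificity conventionally computed from true negatives and false positives). A minimal sketch with hypothetical counts:

```python
def dice(tp, fp, fn):
    """Dice Similarity Coefficient = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

def sensitivity(tp, fn):
    """Sensitivity (recall) = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Specificity (true negative rate) = TN / (TN + FP)."""
    return tn / (tn + fp)

# hypothetical counts: 95 of 100 lesions found, 2 false alarms in 98 healthy cases
d = dice(95, 2, 5)          # ~0.9645
sens = sensitivity(95, 5)   # 0.95
spec = specificity(96, 2)   # ~0.9796
```

Per-class values such as those in Table 1 are obtained by computing these quantities from each class's one-vs-rest confusion matrix.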
Performance analysis validates the proposed model's efficiency and accuracy in lung lesion classification under cross-fold validation using a confusion matrix. Deep

Fig. 4 Performance evaluation of lung tumor classification approaches with respect to sensitivity

Fig. 5 Performance evaluation of tumor classification approaches with respect to specificity

learning methods show excellent performance compared to conventional methods for disease analysis. The proposed model provides improved performance with gradient descent optimization as parameter tuning, and its results closely match the ground-truth data during validation.

5 Conclusion

In this paper, a dense Convolution Neural Network for lung cancer classification and staging of the disease from images has been designed and implemented. The proposed classifier model includes preprocessing using contrast enhancement and noise filtering, image segmentation using the gradient vector flow approach for effective coarse appearance and lesion boundary extraction, and an ABCD-rule feature extractor for lesion features such as lesion diameter, asymmetry, border and color. The entire set of extracted features is used by the dense Convolution Neural Network (CNN) at the classification layer to identify the predicted class. The proposed model has been validated on the dataset with classification results covering three malignant tumor classes: adenocarcinoma, large cell carcinoma and squamous cell carcinoma. Further, it has shown better performance than the existing conventional method, with 98.75% accuracy, 98.46% specificity and 99% sensitivity, respectively, compared against conventional classifiers.

References

1. Revathi V, Chithra A (2015) A review on segmentation techniques in lung lesion images. Intl Res J Eng Tech (IRJET) 2(9):2598–2603
2. Abbas Q, Garcia IF, Emre Celebi M, Ahmad W, Mushtaq Q (2013) A perceptually oriented method for contrast enhancement and segmentation of dermoscopy images. Skin Res Technol 19(1):e490–e497
3. Adegun AA, Viriri S (2020) Deep learning-based system for automatic melanoma detection. IEEE Access 8:7160–7172
4. Alquran H, Qasmieh IA, Alqudah AM, Alhammouri S, Alawneh E, Abughazaleh A, Hasayen F (2017) The melanoma skin cancer detection and classification using support vector machine. In: Proceedings IEEE Jordan conference applications electrical engineering computing technology (AEECT), October 2017, pp 1–5
5. Hameed N, Hameed F, Shabut A, Khan S, Cirstea S, Hossain A (2019) An intelligent computer-aided scheme for classifying multiple lung lesions. Computers 8(3):62
6. Murugan A, Nair SAH, Kumar KPS (2019) Detection of skin cancer using SVM, random forest and kNN classifiers. J Med Syst 43(8):269
7. Seeja RD, Suresh A (2019) Deep learning based skin lesion segmentation and classification of melanoma using support vector machine (SVM). Asian Pacific J Cancer Prevent 20(5):1555–1561
8. Li Y, Shen L (2018) Skin lesion analysis towards melanoma detection using deep learning network. Sensors 18(2):556
9. Rajpara SM, Botello AP, Townend J, Ormerod AD (2009) Systematic review of dermoscopy and digital dermoscopy/artificial intelligence for the diagnosis of melanoma. Brit J Dermatol 161(3):591–604
10. Hekler A, Utikal JS, Enk AH, Hauschild A, Weichenthal M, Maron RC, Berking C, Haferkamp S, Klode J, Schadendorf D, Schilling B, Holland-Letz T, Izar B, Von Kalle C, Fröhling S, Brinker TJ (2019) Superior skin cancer classification by the combination of human and artificial intelligence. Eur J Cancer 120:114–121
11. Brinker TJ, Hekler A, Utikal JS, Grabe N, Schadendorf D, Klode J, Berking C, Steeb T, Enk AH, von Kalle C (2018) Skin cancer classification using convolutional neural networks: systematic review. J Med Internet Res 20(10):e11936
12. Guha SR, Haque SR (2020) Performance comparison of machine learning-based classification of skin diseases from lung lesion images. In: Proceedings international conference on communications, computing and electronics systems. Springer, Singapore, pp 15–25
13. Bi L, Feng D, Kim J (2018) Dual-path adversarial learning for fully convolutional network (FCN)-based medical image segmentation. Vis Comput 34(6–8):1043–1052
14. Bi L, Kim J, Ahn E, Kumar A, Fulham M, Feng D (2017) Dermoscopic image segmentation via multistage fully convolutional networks. IEEE Trans Biomed Eng 64(9):2065–2074
15. Abdollahi B, Tomita N, Hassanpour S (2020) Data augmentation in training deep learning models for medical image analysis. In: Deep learners and deep learner descriptors for medical applications. Springer, Cham, Switzerland, pp 167–180
16. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E (2018) Deep learning for computer vision: a brief review. Comput Intell Neurosci 2018:1–13
17. Bi L, Kim J, Ahn E, Kumar A, Feng D, Fulham M (2019) Stepwise integration of deep class-specific learning for dermoscopic image segmentation. Pattern Recognit 85:78–89
18. Pereira S, Meier R, McKinley R, Wiest R, Alves V, Silva CA, Reyes M (2018) Enhancing interpretability of automatically extracted machine learning features: application to a RBM-random forest system on brain lesion segmentation. Med Image Anal 44:228–244
19. Akhavan Aghdam M, Sharifi A, Pedram MM (2018) Combination of rs-fMRI and sMRI data to discriminate autism spectrum disorders in young children using deep belief network. J Digit Imag 31(6):895–903
20. Zhu Y, Wang L, Liu M, Qian C, Yousuf A, Oto A, Shen D (2017) MRI-based prostate cancer detection with high-level representation and hierarchical classification. Med Phys 44(3):1028–1039
21. Obaid AJ (2022) Data mining analysis models based on prospective detection of infectious disease. In: Sharma DK, Peng SL, Sharma R, Zaitsev DA (eds) Micro-electronics and telecommunication engineering. Lecture Notes in Networks and Systems, vol 373. Springer, Singapore. https://doi.org/10.1007/978-981-16-8721-1_41
22. Radhi A (2023) Early stage prediction of COVID-19 using machine learning model. Wasit J Comput Mathem Sci 2(1):46–61
Sentiment Analysis Using Bi-ConvLSTM

Durga Satish Matta and K. Saruladha

Abstract Sentiment analysis employs a variety of automated cognitive techniques to establish the speaker's or author's attitudes regarding the general emotional tendencies of an articulated object or text. Mining human sentimental tendencies has been significantly hampered in recent years by the expanding volume of opinionated content on social networks. Moreover, a single contextual representation customised for a specific activity cannot handle several sentiment analysis tasks. For this kind of issue, a hierarchical model for sentiment analysis is proposed in this research. Initially, natural language sentiment words are embedded into vector representations using the Skip-Gram model. After that, features are extracted from the input words using the CenterNet technique: feature maps are derived from the input data using CenterNet's backbone network, so CenterNet is employed here to effectively extract the textual information. Finally, the emotions are classified into positive and negative categories using a model known as Long Short-Term Memory with Bi-directional Convolution (Bi-ConvLSTM). The binary Stanford Sentiment Treebank (SST-2) and the Sentiment140 dataset were used in the experiments. Classification accuracy was improved as compared to other techniques.

Keywords Sentiment analysis · Classification · Bi-ConvLSTM · Word2vec · CenterNet

D. S. Matta (B) · K. Saruladha
Department of Computer Science and Engineering, Puducherry Technological University, Puducherry 605014, India
e-mail: durgasatish@pec.edu
K. Saruladha
e-mail: charuladha@ptuniv.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 373
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_29

1 Introduction

A growing number of people have started expressing their thoughts openly on websites in recent years due to the rapid expansion of the Internet and social networks. As a result, the Internet generates a lot of user comment data. For instance, product reviews are created on e-commerce sites like Taobao and Jingdong, while hotel reviews are created on travel sites like ELong and Ctrip. It is challenging to manually examine comments given their exponential growth [1–3]. The emotional inclinations of comment messages can be extracted using artificial intelligence technologies in the age of big data to quickly assess network public opinion. Finding the sentiment trend in the comments requires conducting sentiment analysis research.
The sentiment orientation analysis of the comment corpus, which shows that
people express negative or positive sentiments towards events or products, is the
major focus of the sentiment analysis of comments. Also, there are different kinds
of sentiment analysis, including film comment analysis, product comment analysis,
news comment analysis, and others [4, 5]. These comments represent the opinions
of online users about various products, trending topics, etc. With the use of rele-
vant product reviews, retailers may control consumer contentment. By reading these
product reviews, prospective customers can assess the products. Social media is
becoming the best location to share opinions about events, products, and services
due to the growth of user-generated content on websites like TripAdvisor, Amazon,
Facebook, Twitter, and Instagram [6, 7]. Coupled with the ease with which online
content is distributed, this has raised the importance of the viewpoints being voiced.
This enormous amount of data is now being analysed using a number of NLP
techniques. Sentiment analysis (SA) is one of the most significant methods used
today to classify attitudes and opinions expressed by speakers and in written material.
The goal of SA is to examine and retrieve data from the subjective data that is posted
online. SA has lately emerged as a key area of study in data mining and NLP due to its
many academic and commercial implications as well as the speedy rise of Web 2.0. As
a result, numerous techniques and tools for defining a document’s polarity have been
created recently [8–10]. The majority of sentiment analysis applications rely heavily
on the binary classification task of polarity detection. To achieve acceptable polarity
classification results, the majority of prior approaches for SA have trained shallow
models on carefully created useful features. In the proposed approach, the sentiment
words are first converted into vectors. Because the words are written in natural
language, they are converted using the Skip-Gram approach. After that, the features
are extracted from the input words using the CenterNet technique. Finally, the Long
Short-Term Memory with Bi-directional Convolution (Bi-ConvLSTM) approach is
used to classify the emotions into positive and negative groups.
The key contributions of this paper are:
• To convert the natural language sentiment words into vectors, we utilised the
Skip-Gram-based word embedding model.
• The CenterNet technique is then used to extract the features of the sentiment words.
Sentiment Analysis Using Bi-ConvLSTM 375

• The sentiments are classified as positive or negative using a cutting-edge Long
Short-Term Memory with Bi-directional Convolution (Bi-ConvLSTM) deep
learning technique.
• This novel deep learning technique is applied to sentiment words and provides
higher accuracy than other existing techniques.
• The Sentiment140 and binary Stanford Sentiment Treebank (SST-2) datasets, two
well-known datasets, were used as the basis for the experiments.
The remainder of this article is structured as follows: Sect. 2 surveys the sentiment
analysis literature. Section 3 presents the problem statement of current approaches,
Sect. 4 details the suggested model, Sect. 5 presents the experiments and their results,
and Sect. 6 concludes the study and suggests some directions for future research.

2 Literature Survey

Many researchers have published papers pertaining to sentiment analysis; a few
sources that informed our methodology are reviewed here. Basiri et al. [11] proposed
an Attention-based Bi-directional CNN-RNN deep model (ABCDM). By consid-
ering the temporal data flow in two paths, ABCDM will be able to retrieve both
future and past contexts by using two independent GRU and bi-directional LSTM
layers. Additionally, ABCDM applies an attention mechanism to the outputs of the
bi-directional layers to place more or less emphasis on particular phrases. To reduce data dimensionality and
recover position-invariant local characteristics, ABCDM uses pooling and convolu-
tion methods. ABCDM’s efficacy is evaluated using sentiment polarity detection, the
most common and important sentiment analysis task.
A unique attention-based method that makes use of CNNs and LSTM (named
ACL-SA) is suggested by Kamyab et al. [12]. It uses a preprocessor on the textual
input first to enhance the quality of the data before extracting relevant information
from it. Then, pre-trained glove word embedding and TF-IDF feature weighting
techniques are used. Moreover, it extracts contextual information and lowers feature
dimensionality using CNN’s max-pooling. Moreover, it incorporates a Bi-LSTM to
track long-term dependencies. Additionally, it displays the attention intensity of each
word by using the CNN output layer’s attention algorithm. The Gaussian noise and
Gaussian-Dropout regularisation are used to prevent overfitting.
To capitalise on the affective dependencies of the phrase as per the particular
component, Liang et al. [13] suggested Sentic GCN, a GCN based on SenticNet.
To be more precise, they investigate a method of building GNNs by incorporating
emotive information from SenticNet to enhance the graph dependency of phrases.
The resulting affective enhanced graph model considers both the interdependence
of feature words and contextual words and the affective information between the
sentiment words and the component.

Neogi et al. [14] collected information about the farmer’s protest from the
microblogging platform Twitter to comprehend the feelings that the general public
worldwide shared. A collection of about 20,000 tweets about the protest was utilised
to analyse and rank the viewpoints. The analysis used both Bag of Words and
TF-IDF, with Bag of Words performing better. Additionally, they made use of Naive
Bayes, decision trees, random forests, and support vector machines.
Kaur et al. [15] examined Twitter data using the R programming language. They
gathered tweets based on the hashtags for COVID-19, coronavirus, deaths, and newly
discovered cases, and proposed the hybrid heterogeneous support vector machine
(H-SVM) technique. The sentiment scores
were classified as positive, negative, or neutral using this algorithm, which also
did sentiment categorization. They also compared the performance of the proposed
method to that of the SVM and RNN using metrics like F1-score, recall, precision,
and accuracy. The comparison of similar work’s performances is shown in Table 1.
The limitations of the existing deep models for SA are:
• Existing deep learning models focus on only some of these issues, such as people
expressing negative sentiments using positive words, while ignoring others.
• To retrieve semantic and sentiment information for emotion identification, for
instance, they used two pre-trained word embeddings and LSTM; however, their
model failed to consider the varying relevance of the various sentence components.
• It excluded information from common sense knowledge and solely took into
account a sentence’s syntactic relationships.
• They lacked the processing power to process such a sizable quantity of tweets;
therefore, more tweets might have been useful in revealing a sizable amount of
sentiment.

3 Problem Statement

In most recent papers, dataset-preparation techniques such as data augmentation,
data normalisation, and a number of other data preprocessing methods make
sentiment analysis more difficult, since these techniques increase network
complexity. To overcome this problem, a more effective preprocessing method for
sentiment analysis is used in this research.

4 Proposed Methodology

The goal of sentiment analysis, a task of NLP, is to retrieve views and attitudes from
texts. Data from text and other modalities, such as visual data, is also starting to be
included in new sentiment analysis approaches. In this paper, the proposed model
converts word sentiments into vectors as an initial step. When the words are written

Table 1 Performance comparison of the literature survey

| Reference and year | Technique | Dataset | Performance metric | Limitation |
|---|---|---|---|---|
| Basiri et al. [11] | ABCDM | Android, Kindle store, airline, Sentiment140, T4SA datasets | Recall, precision, F1-score, accuracy | Uses a quite complicated architecture, combining attention techniques with convolutional and recurrent neural networks; this may make the model difficult to execute and fine-tune |
| Kamyab et al. [12] | ACL-SA | Sentiment140, US-Airline | Accuracy | Will not work with other languages |
| Liang et al. [13] | SenticNet | REST14, LAP14, REST15, REST16 | Accuracy, Macro-F1 | Only applicable to aspect-based sentiment analysis; other natural language processing tasks are difficult and the outcomes are not easily interpreted |
| Neogi et al. [14] | SVM | Twitter dataset | Precision, recall, F1-score, accuracy | It was easy to extract a substantial number of tweets because millions of people tweeted their opinions about the protests, but the authors lacked the computational power to examine such a large quantity of tweets, even though more tweets might have revealed a significant number of feelings |
| Kaur et al. [15] | H-SVM | Twitter dataset | Precision, recall, F1-score, accuracy | Only evaluated on a small dataset of COVID-19-related tweets; may not be applicable to other natural language processing tasks that require sentiment analysis |

in natural language, they are converted utilising the Skip-Gram-based word
embedding model. The CenterNet method is then used to retrieve the features from
the input words. After that, the feelings are classified into positive and negative
categories using the Long Short-Term Memory with Bi-directional Convolution (Bi-ConvLSTM)
approach. Classical LSTM models use full connections for input-to-state and state-
to-state transitions, but their main flaw is that they do not take spatial correlation
into consideration. ConvLSTM, which was proposed to address this problem, used
convolution operations in input-to-state and state-to-state transitions. The experi-
ments used Sentiment140 and the binary Stanford Sentiment Treebank (SST-2), the
two well-known datasets.

Fig. 1 Architecture of the suggested methodology

|                           | Existing model  | Proposed model         |
|---------------------------|-----------------|------------------------|
| Input                     | Sentiment words | Sentiment words        |
| Preprocessing             | Skip-Gram model | Skip-Gram model        |
| Feature extraction method | CNN             | CenterNet              |
| Deep learning model       | SMVMED          | Bi-ConvLSTM            |
| Dataset used              | SST-1 and SST-2 | SST-2 and Sentiment140 |

The suggested methodology’s architecture is depicted in Fig. 1.

4.1 Preprocessing

During the preprocessing phase, each natural language term is converted into a
vector. Although the contents are written in natural language, it can be challenging
to present them in a way that deep neural networks can understand. Word vectors
have replaced the more traditional one-hot vectors in this field, and many natural
language processing tasks have performed far better as a result. Word2vec, the
fundamental building block of the proposed model, does this
by utilising a shallow two-layer neural network to convert words into D-dimensional
word vectors. Word2Vec has two variations: the Skip-Gram model, used in this paper

Fig. 2 Word embedding model flowchart

[16], and the Continuous Bag of Words (CBOW) model. Typically, when learning a
word’s vector representation, Skip-Gram takes the word’s context into account.
The results of the Skip-Gram training procedure are stored as a look-up table
with a corresponding vector for each word. With n being the number of input words
and d being the length of word embedding, each word in the embedding matrix is
retained in a row vector, which is handled as single-view data for a document. The
single-view data is then obtained using convolutional and recursive neural networks,
respectively. The flowchart for the word embedding segment utilising the Skip-Gram
approach is shown in Fig. 2.
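The Skip-Gram step above can be sketched as follows. This is a minimal illustration of generating (centre, context) training pairs and storing per-word vectors as the look-up table whose rows form the n × d single-view document matrix; the toy corpus, window size, embedding dimension, and the randomly initialised vectors (standing in for trained Skip-Gram vectors) are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def skip_gram_pairs(tokens, window=2):
    """Generate (centre, context) pairs: Skip-Gram predicts each context
    word lying within `window` positions of the centre word."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

tokens = ["the", "movie", "was", "great"]
pairs = skip_gram_pairs(tokens, window=1)

# Look-up table: one d-dimensional row vector per vocabulary word
# (random here; training would set these to the learned Skip-Gram vectors).
d = 8
rng = np.random.default_rng(0)
lookup = {w: rng.normal(size=d) for w in sorted(set(tokens))}

# Each word of the document kept as a row vector: the (n, d) single-view data.
doc_matrix = np.stack([lookup[w] for w in tokens])
```

With `window=1`, the four-token example yields six training pairs, and the resulting document matrix has one row per input word.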

4.2 Feature Extraction

After converting the words into vectors, the word features are extracted using the
CenterNet technique. Because CenterNet is an anchor-free detector with fewer
hyperparameters than other detectors, it can directly predict an object's category and
coordinates on feature maps without the need for several pre-set anchor boxes. In addition,
the CenterNet establishes centre points by key point estimation before regressing the
size and location of the object characteristics.
The backbone network, feature enhancement layer, and detecting head are the
three components that make up the CenterNet, as depicted in Fig. 3. CenterNet first
uses the backbone network to retrieve the initial features from the input image,
and then, it applies a feature enhancement layer to improve the semantic contexts
of the features to provide high-resolution features [17]. Lastly, the detecting head
classifies and regresses using high-resolution characteristics to predict the bounding
boxes of objects.
These are some benefits of the CenterNet model.
(1) By detecting the location of the centre point and directly returning the attributes
of the recognition target, the CenterNet model was able to achieve anchor-free
identification.

Fig. 3 Architecture of CenterNet

(2) The CenterNet model could perform detection quickly because it focused only
on the target's centre point information.
(3) By extracting the local peak points from the centre point's feature map,
the CenterNet model was capable of drastically minimising the amount of
computation required to find a target.
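Benefit (3), extracting local peak points from the centre-point feature map, is commonly realised in CenterNet implementations as a max-pool-based non-maximum suppression: a cell is kept only if it equals the maximum of its 3 × 3 neighbourhood. The sketch below assumes a single-class 2-D heatmap and an illustrative score threshold; it is not the paper's exact implementation.

```python
import numpy as np

def extract_peaks(heatmap, k=3, threshold=0.3):
    """Keep only local maxima of the centre-point heatmap: a cell survives
    if it equals the maximum of its 3x3 neighbourhood (max-pool NMS) and
    exceeds the score threshold; return the top-k (y, x) locations."""
    H, W = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # 3x3 max pool via the nine shifted views of the padded map.
    pooled = np.max(
        [padded[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    peaks = (heatmap == pooled) & (heatmap > threshold)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(heatmap[ys, xs])[::-1][:k]  # highest scores first
    return list(zip(ys[order], xs[order]))
```

Because every non-peak cell is suppressed in one vectorised pass, each object is reduced to a single centre point, which is what keeps the detection step cheap.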

4.3 Classification

After the features have been extracted, the Long Short-Term Memory with
Bi-directional Convolution (Bi-ConvLSTM) deep learning technique is used to
analyse the sentiments. LSTM is an extended RNN that takes temporal dependence
into better consideration, but it cannot incorporate spatial correlation; ConvLSTM
addresses this issue by applying convolution operations to both the input-to-state
and state-to-state transitions [18]. The ConvLSTM shares the same input gate,
output gate, forget gate, and memory cell as LSTM. ConvLSTM is described as
follows:
 
$$i_t = \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right), \quad (1)$$

$$f_t = \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right), \quad (2)$$

$$C_t = f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right), \quad (3)$$

$$o_t = \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right), \quad (4)$$

$$H_t = o_t \circ \tanh(C_t). \quad (5)$$

All of the $W$ terms in these equations are 2D convolution kernels, the input tensor is
$X_t$, the hidden state tensor is $H_t$, the memory cell tensor is $C_t$, and all of the bias
terms are $b$. The convolution operation and the Hadamard product are indicated by
the symbols $*$ and $\circ$, respectively.
In contrast to ConvLSTM, which only utilises forward data dependencies, Bi-
ConvLSTM processes input data using both forward and backward ConvLSTMs
before determining the information dependencies in both directions and evaluating
the present input. The dependence on both forward and backward data can boost
forecast accuracy.
Figure 4 displays the output of Bi-ConvLSTM:

$$Y_t = \tanh\left(W_y^{\overrightarrow{H}} \cdot \overrightarrow{H}_t + W_y^{\overleftarrow{H}} \cdot \overleftarrow{H}_t + b\right). \quad (6)$$

When combining the outputs, the nonlinear function $\tanh$ is utilised. $\overrightarrow{H}_t$ and
$\overleftarrow{H}_t$ represent the forward and backward states' respective hidden gate tensors,
with $b$ being the bias term.
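Equations (1)–(6) can be sketched numerically as below. This is a minimal 1-D NumPy illustration under stated assumptions: randomly initialised kernels and peephole weights, a shared parameter set for the forward and backward passes, and unit merge weights in Eq. (6) for brevity (the paper's trained Bi-ConvLSTM would use separate, learned weights).

```python
import numpy as np

def conv1d_same(x, w):
    """'Same'-padded 1-D convolution of feature map x with kernel w."""
    k = len(w)
    xp = np.pad(x, k // 2)
    return np.array([np.dot(xp[i:i + k], w) for i in range(len(x))])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x_t, h_prev, c_prev, W, b):
    """One ConvLSTM step per Eqs. (1)-(5): convolutions (*) for the
    input-to-state / state-to-state transitions, Hadamard products with C."""
    i = sigmoid(conv1d_same(x_t, W["xi"]) + conv1d_same(h_prev, W["hi"])
                + W["ci"] * c_prev + b["i"])                               # Eq. (1)
    f = sigmoid(conv1d_same(x_t, W["xf"]) + conv1d_same(h_prev, W["hf"])
                + W["cf"] * c_prev + b["f"])                               # Eq. (2)
    c = f * c_prev + i * np.tanh(conv1d_same(x_t, W["xc"])
                                 + conv1d_same(h_prev, W["hc"]) + b["c"])  # Eq. (3)
    o = sigmoid(conv1d_same(x_t, W["xo"]) + conv1d_same(h_prev, W["ho"])
                + W["co"] * c + b["o"])                                    # Eq. (4)
    h = o * np.tanh(c)                                                     # Eq. (5)
    return h, c

def bi_convlstm(xs, W, b):
    """Run the sequence forward and backward and merge per Eq. (6),
    with unit merge weights assumed for this sketch."""
    L = len(xs[0])
    h, c = np.zeros(L), np.zeros(L)
    fwd = []
    for x in xs:                       # forward pass
        h, c = convlstm_step(x, h, c, W, b)
        fwd.append(h)
    h, c = np.zeros(L), np.zeros(L)
    bwd = []
    for x in reversed(xs):             # backward pass
        h, c = convlstm_step(x, h, c, W, b)
        bwd.append(h)
    bwd.reverse()
    return [np.tanh(f + g) for f, g in zip(fwd, bwd)]  # Eq. (6)

# Illustrative run: random weights, 1-D feature maps of length 5, 4 time steps.
rng = np.random.default_rng(1)
L = 5
W = {k: rng.normal(scale=0.5, size=3)
     for k in ["xi", "hi", "xf", "hf", "xc", "hc", "xo", "ho"]}
W.update({k: rng.normal(scale=0.5, size=L) for k in ["ci", "cf", "co"]})
b = {k: 0.0 for k in "ifco"}
xs = [rng.normal(size=L) for _ in range(4)]
ys = bi_convlstm(xs, W, b)  # one merged output map per time step
```

The point of the sketch is the structure: each gate is a convolution over the input and the previous hidden state plus a Hadamard peephole on the cell state, and the bidirectional merge sees dependencies in both temporal directions.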

Fig. 4 Bi-directional ConvLSTM



5 Results and Discussions

This section’s first half categorises sentiment analysis using the dataset’s evaluation
and methodology for extracting word feature attributes, contrasting our approach
with “state-of-the-art” techniques.

5.1 Dataset Description

Two datasets are used in this paper: the SST-2 and Sentiment140 datasets.
1. SST-2
To thoroughly examine the compositional effects of sentiment in language, one can
use the Stanford Sentiment Treebank, a corpus made up of fully annotated parse
trees. The corpus is made up of 11,855 single sentences that were taken from movie
reviews and is based on the dataset first provided by Pang and Lee (2005). Three
human judges have annotated each of the 215,154 unique phrases that the
Stanford parser produced after processing the data. SST-2 or SST binary is the name
of the dataset. It was employed in a study that classified sentences as either negative
or somewhat negative versus somewhat positive or positive, with neutral sentences
being excluded.
2. Sentiment140

Graduate students at Stanford trawled Twitter to create the Sentiment140 dataset.
There are 248,576 positive tweets and 80,000 negative tweets, categorised as positive
or negative based on their emotive words. Nowadays, this is one of the standard
datasets used most frequently for text categorization. Over a million tweets, both
positive and negative, make up the Sentiment140 Twitter dataset, which was produced
by Alec Go, Richa Bhayani, and Lei Huang.
Sentiment analysis tools may have trouble deciphering the underlying context of
a response when backhanded compliments are used to express a negative viewpoint.
As a result, input that appears "positive" may actually be negative.

5.2 Evaluation Metrics

The proposed technique’s F1-score (F), accuracy (A), precision (P), and recall were
examined as performance metrics (R). These measurements show:
(a) Accuracy

Accuracy is the degree to which a value measured agrees with its known or expected
value. A classifier generally performs better with higher accuracy. Equation (7)
demonstrates what accuracy means.

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}. \quad (7)$$

(b) Sensitivity
The sensitivity of a classifier is determined by its capacity to display the fraction of
all expected positive cases, also known as recall. Equation (8) represents sensitivity.

$$\text{Sensitivity} = \frac{TP}{TP + FN}. \quad (8)$$

(c) Specificity
By expressing the percentage of all negative samples that are correctly identified,
specificity quantifies the classifier’s ability to identify negative samples. The denoted
specificity is displayed in Eq. (9).

$$\text{Specificity} = \frac{TN}{TN + FP}. \quad (9)$$

(d) Precision
Precision is the ratio of correctly predicted positive observations to all predicted
positive observations. Precision is given by:

$$\text{Precision} = \frac{TP}{TP + FP}. \quad (10)$$
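Equations (7)–(10) translate directly into code given the four confusion-matrix counts. The example counts below are illustrative, not results from the paper.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute the four metrics of Eqs. (7)-(10) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),   # Eq. (7)
        "sensitivity": tp / (tp + fn),                 # Eq. (8), also called recall
        "specificity": tn / (tn + fp),                 # Eq. (9)
        "precision": tp / (tp + fp),                   # Eq. (10)
    }

# Illustrative counts: 90 true positives, 80 true negatives,
# 10 false positives, 20 false negatives.
m = confusion_metrics(tp=90, tn=80, fp=10, fn=20)
# accuracy = 170/200 = 0.85, sensitivity = 90/110, specificity = 80/90,
# precision = 90/100 = 0.90
```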

5.3 Performance Evaluation

The performance on the Sentiment140 and SST-2 datasets is compared with that
of the existing approaches, including LSTM, LR-LSTM, TS-GRU, and Bi-LSTM.
The Bi-ConvLSTM technique yields improved precision, accuracy, F1-score, and
recall in comparison with these techniques.
On the Sentiment140 dataset from Table 2, the proposed strategy is contrasted with
two other strategies. In comparison to other approaches, it achieves 98.72% accuracy
and 97.24% precision, and its recall is 97.48%. In comparison to the other two existing
techniques, the proposed technique achieved 98.90% accuracy and 98.15% precision
in the SST-2 dataset.

Table 2 Comparison of the results obtained on the datasets

| Datasets | Methods | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|---|
| Sentiment140 dataset | LSTM | 85 | 83 | 84 | 84 |
| | LR-LSTM | 81 | 80 | 90 | 80 |
| | Proposed (Bi-ConvLSTM) | 97.24 | 97.48 | 98.10 | 98.72 |
| SST-2 dataset | TS-GRU | 84.25 | 86.13 | 85.84 | 86.51 |
| | Bi-LSTM | 75.1 | 75 | 75 | 75.1 |
| | Proposed (Bi-ConvLSTM) | 98.15 | 97.84 | 98.27 | 98.90 |

Figure 5 presents a graphic representation of accuracy and precision on the
Sentiment140 dataset. The proposed method provides higher results than the
previous approaches.
Figure 6 illustrates accuracy and precision on the SST-2 dataset. The proposed
approach again outperforms the previous approaches.
In Table 3, we can see how the new method stacks up against some of the old
ones. It shows the performance evaluation values of accuracy, F1-score, recall, and
precision.
The F1-score, recall, and precision metrics are compared with earlier sentiment
classification methods in Table 3. The proposed approach achieved a greater accuracy
of 99.29%. Compared with existing algorithms, the proposed technique classifies
positive sentiment with 99.12% precision, 98.76% recall, and 96.49% F1-score, and
negative sentiment with 98.50% precision,

Fig. 5 Performance of the proposed approach on the Sentiment140 dataset compared with previous
approaches

Fig. 6 Performance of the proposed approach on the SST-2 dataset compared with previous approaches

Table 3 Comparison of the evaluation results with other existing techniques

| Methods | Positive precision | Positive recall | Positive F1-score | Negative precision | Negative recall | Negative F1-score | Accuracy |
|---|---|---|---|---|---|---|---|
| ABCDM [11] | 95.70 | 90.88 | 93.22 | 91.34 | 95.91 | 93.56 | 93.40 |
| ACL-SA [12] | 90.51 | 86.16 | 79.07 | 87.26 | 81.84 | 75.84 | 87.12 |
| SenticNet [13] | 88.19 | 82.06 | 78.73 | 84.15 | 80.24 | 78.19 | 89.38 |
| SVM & NB [14] | 87.45 | 73.78 | 79.08 | 83.47 | 65.19 | 73.24 | 83.45 |
| H-SVM [15] | 89.05 | 71.54 | 75.94 | 86.71 | 69.43 | 77.49 | 96.30 |
| Proposed (Bi-ConvLSTM) | 99.12 | 98.76 | 96.49 | 98.50 | 98.73 | 97.08 | 99.29 |

98.73% recall, and 97.08% F1-score. The results show that, in comparison with
existing strategies, the proposed model effectively increases classification
performance while requiring less computational time.
Figure 7 demonstrates the comparison of the proposed and existing methods on
positive and negative sentiments: the suggested approach achieves superior recall,
precision, and F1-score values.
Figure 8 shows that the proposed categorization technique obtains higher accuracy
values than previous sentiment analysis approaches.

Fig. 7 Comparison of the proposed method precision, recall, and F1-score with existing approaches
a positive, b negative

Fig. 8 Comparison of the proposed approach’s categorization accuracy results with previous
techniques

6 Conclusion

Now opinions, thoughts, and ideas can be expressed via digital media networks.
Social networks have gained popularity not just for this but also for disseminating
ideas and creating personal viewpoints. One can gain insight into society and the
environment by looking at the specifics of social media sites. Converting the word
emotions into vectors was the first step in this work; because the words are written
in natural language, the Skip-Gram approach was used for this conversion. The
CenterNet method was then used to retrieve the features from the input words, and
the Bi-ConvLSTM technique classified the emotions into positive and negative
categories. The two well-known datasets Sentiment140 and the binary Stanford
Sentiment Treebank (SST-2) were employed in the research. This

experiment achieved higher accuracy compared with previous approaches. In the
future, larger datasets will be collected for sentiment analysis work.

References

1. Pandian AP (2021) Performance evaluation and comparison using deep learning techniques in
sentiment analysis. J Soft Comput Paradigm (JSCP) 3(02):123–134
2. Nemes L, Kiss A (2021) Social media sentiment analysis based on COVID-19. J Inform
Telecommun 5(1):1–15
3. Barkur G, Kamath GB (2020) Sentiment analysis of nationwide lockdown due to COVID 19
outbreak: evidence from India. Asian J Psychiatr 51:102089
4. Manguri KH, Ramadhan RN, Amin PRM (2020) Twitter sentiment analysis on worldwide
COVID-19 outbreaks. Kurdistan J Appl Res 54–65
5. Hazarika D, Zimmermann R, Poria S (2020) Misa: modality-invariant and specific represen-
tations for multimodal sentiment analysis. In: Proceedings of the 28th ACM international
conference on multimedia, October, pp 1122–1131
6. Singh M, Jakhar AK, Pandey S (2021) Sentiment analysis on the impact of coronavirus in
social life using the BERT model. Soc Netw Anal Min 11(1):33
7. Li H, Chen Q, Zhong Z, Gong R, Han G (2022) E-word of mouth sentiment analysis for user
behavior studies. Inf Process Manag 59(1):102784
8. Li R, Chen H, Feng F, Ma Z, Wang X, Hovy E (2021) Dual graph convolutional networks for
aspect-based sentiment analysis. In: Proceedings of the 59th annual meeting of the association
for computational linguistics and the 11th international joint conference on natural language
processing, August, vol 1. Long Papers, pp 6319–6329
9. Basiri ME, Nemati S, Abdar M, Asadi S, Acharrya UR (2021) A novel fusion-based deep
learning model for sentiment analysis of COVID-19 tweets. Knowl-Based Syst 228:107242
10. Garcia K, Berton L (2021) Topic detection and sentiment analysis in Twitter content related to
COVID-19 from Brazil and the USA. Appl Soft Comput 101:107057
11. Basiri ME, Nemati S, Abdar M, Cambria E, Acharya UR (2021) ABCDM: an attention-
based bidirectional CNN-RNN deep model for sentiment analysis. Futur Gener Comput Syst
115:279–294
12. Kamyab M, Liu G, Adjeisah M (2021) Attention-based CNN and Bi-LSTM model based on
TF-IDF and glove word embedding for sentiment analysis. Appl Sci 11(23):11255
13. Liang B, Su H, Gui L, Cambria E, Xu R (2022) Aspect-based sentiment analysis via affective
knowledge enhanced graph convolutional networks. Knowl-Based Syst 235:107643
14. Neogi AS, Garg KA, Mishra RK, Dwivedi YK (2021) Sentiment analysis and classification of
Indian farmers’ protest using twitter data. Int J Inform Managem Data Insights 1(2):100019
15. Kaur H, Ahsaan SU, Alankar B, Chang V (2021) A proposed sentiment analysis deep learning
algorithm for analyzing COVID-19 tweets. Inform Syst Front 1–13
16. Styawati S, Nurkholis A, Aldino AA, Samsugi S, Suryati E, Cahyono RP (2022) Sentiment
analysis on online transportation reviews using Word2Vec text embedding model feature extrac-
tion and support vector machine (SVM) algorithm. In: 2021 International seminar on machine
learning, optimization, and data science (ISMODE), January, IEEE, pp 163–167
17. Nazir T, Nawaz M, Rashid J, Mahum R, Masood M, Mehmood A, Hussain A (2021) Detection
of diabetic eye disease from retinal images using a deep learning based CenterNet model.
Sensors 21(16):5283

18. Jiang F, Zhi X, Ding X, Tong W, Bian Y (2020) DLU-net for pancreatic cancer segmentation.
In: 2020 IEEE International conference on bioinformatics and biomedicine (BIBM), December,
IEEE, pp 1024–1028
19. Dudla AK, Atthuluri MR, Shaik PS, Yalamanchili VSB (2023) An efficient approach
for analyzing reviews using an ensemble technique. In: 2023 9th international conference
on advanced computing and communication systems (ICACCS), Coimbatore, India, pp
1576–1581
A New Method for Protein Sequence
Comparison Using Chaos Game
Representation

Debrupa Pal, Sudeshna Dey, Papri Ghosh, Subhram Das, and Bansibadan Maji

Abstract The assessment of numerous protein sequences is a dynamic phase in
bioinformatics when measuring sequence similarities and evolutionary relationships.
However, the swift increase in protein sequences has necessitated a proper and
effective method for processing varied-length datasets. Therefore, a new alignment-
free method is introduced for protein sequence analysis. Here, using the six classifi-
cations of amino acids and applying the chaos game theory on the hexagonal model,
two parameters, viz. mean and standard deviation, are generated for each protein
sequence. Thereafter, a distance matrix is calculated for phylogenetic tree construction.
Furthermore, when the output of the suggested technique is compared with that of
former methods for analysing the phylogenetic tree, the present procedure yields
more appropriate outcomes than the earlier techniques.
Moreover, the Symmetric Distance (SD) values are computed to prove the domi-
nance of the proposed method over existing methods. Additionally, the execution
time for computation is minimal in the suggested method.

Keywords Alignment-free method · Chaos game theory · Symmetric distance ·
Protein sequence · Phylogenetic trees

1 Introduction

The vital function that proteins perform in the majority of biological processes
makes research into proteins one of the fundamental topics in biology. In the last few
years, several numeric depiction techniques have been recommended and adapted
in classification of proteins. Numerous applications in a variety of fields include

D. Pal · S. Dey · P. Ghosh (B) · S. Das
Narula Institute of Technology, Kolkata, India
e-mail: paprighosh06@gmail.com

D. Pal · B. Maji
National Institute of Technology, Durgapur, India
e-mail: bansibadan.maji@ece.nitdgp.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 389
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_30
390 D. Pal et al.

protein sequence comparison from the study of these sequences [1]. When examined
in their sequential form, proteins that appear to have a similar structure might actually
differ greatly. There are two types of approaches that are employed in this context.
Researchers develop algorithms that quickly process enormous amounts of data due
to the protein database’s explosive growth. Again, the protein sequence’s length in
the majority of species is different, and thus, strategies that are rapid and adaptable
enough to manage protein sequences of various lengths are proposed. Due to the
extreme temporal complexity in this case, conventional alignment-based approaches
are no longer appropriate [2–4]. Later, alignment-free techniques are used to address
the alignment-based strategy’s limitations [5–8].
With the enormous growth of biological sequence data in the last few decades,
mathematical encoding techniques have become the most effective. An effective representation makes
it easier to locate and analyse the features of those sequences, making it the basis
of protein sequence analysis. Recently, numerous numerical techniques are applied
in prediction of protein function and classification [9–11]. In [12] Barnsley estab-
lished the chaos game algorithm to create fractals from random input. Chaos game
representation was later used in bioinformatics as a depiction of images of DNA
sequence [13]. Graphical representations of sequences are treated as a mathematical
invariant of the sequences in the branch of graphical bioinformatics [14]. Later, the
concept of CGR is applied to alignment-free sequence comparison of proteins. The
20 amino acids were first distributed on the vertices of a standard 20-sided polygon
by Fiser et al. [15], and a series of proteins were used to symbolise coordinates
within a circle of unit radius. Basu et al. [16] presented a 12-vertex CGR, with each
vertex representing an amino acid and associated conservative replacements in a
standard 12-sided polygon. In [17, 18], the CGR’s vertex count was reduced to four,
where each vertex of the square is representing one of the four classes of amino
acids. Although these types of CGR are not a strict one-to-one depiction of protein
sequence similar to DNA, the reduction in the vertices of a CGR image can help
depict the resemblance in homologous protein sequences [13]. Proteins are made up
of twenty different types of amino acids, whereas DNA is made up of four different
types of nucleotides. Therefore, when converting CGR to a visual representation of
proteins, it is necessary to choose how the 20 amino acids will be distributed. The
investigation of molecular sequences is transformed into the investigation of their
CGR images using the chaos game representation technique.
In this work, amino acids are divided into six groups based on their properties. In
our research, chaos game theory on hexagonal model is applied to obtain the distance
between two proteins, and thereafter, a set of non-degenerate points are obtained
for construction of phylogenetic tree. At the end, the obtained phylogenetic tree is
analysed and compared with existing phylogenetic trees for NADH Dehydrogenase-5
(ND5) dataset.

Table 1 Amino acid categorization based on their chemical nature

| Group No. | Representation | Amino acids |
|---|---|---|
| I | Strongly hydrophilic or polar (L) | R, D, E, N, Q, K, H |
| II | Strongly hydrophobic (B) | L, I, V, A, M, F |
| III | Weakly hydrophilic (W) | S, T, Y, W |
| IV | Proline (P) | P |
| V | Glycine (G) | G |
| VI | Cysteine (C) | C |

2 Proposed Method

2.1 Classification of Amino Acids

The protein sequences of animals and plants are formed by 20 amino acids. In the proposed
method, these amino acids are divided into six classes: strongly hydrophilic or
polar, strongly hydrophobic, weakly hydrophilic, Proline, Glycine, and Cysteine.
The group classification is depicted in Table 1.

2.2 Proposed Method Using Chaos Game Theory on Hexagonal Model

The study of nonlinear dynamics and unpredictable phenomena in the mathematical
field is known as chaos theory. It deals with issues that are difficult to properly predict
or control. The goal is to comprehend how unpredictable the occurrence will be so
that we can prevent taking any decisions that could harm our long-term well-being.
In simple terms, it depicts how random outcomes from regular equations are feasible
[19].
In this research, a hexagonal model is considered in which each corner point
corresponds to one of the groups portrayed in Table 1. The centre point is considered
as (0, 0), and the radius is r. Based on the radius (r), the six corner points are computed and depicted
in Fig. 1. The protein sequence of a species consists of multiple combinations of 20
amino acids. Based on this group classification from Table 1, the proposed method
applies the chaos game on a hexagonal model where each corner of a hexagon is
considered as a category of amino acids. This process generates mean and standard
deviation for each input protein sequence which further helps to measure the distance
between two species. The distance matrix is computed using the Euclidean distance
of two parameters: the mean and standard deviation of each input protein sequence.
Equation 1 depicts the measurement of the distance between two species Spi and Spj .
392 D. Pal et al.

Fig. 1 Hexagonal model for the proposed research

Distance Measure of Sp_i and Sp_j = √((Mean_i − Mean_j)² + (SD_i − SD_j)²). (1)

2.3 Algorithm of Proposed Method

Input: sp ← protein sequences of n species
Output: dm[][] ← distance matrix
Process:
• Calculation of mean and standard deviation for each protein sequence:
    For each protein sequence s (of length l) of the n species {
        Current_Point ← (0, 0)
        For each input character i of the protein sequence:
            Next_Point ← corner point of the group of character i (Table 1)
            Midpoint ← (Current_Point + Next_Point) / 2
            dis_i ← distance between Current_Point and Midpoint
            Current_Point ← Midpoint
        mean_s ← (Σ_{i=1..l} dis_i) / l
        sd_s ← √(Σ_{i=1..l} (dis_i − mean_s)² / l)
    }
• Distance matrix calculation:
    dm[][] ← 0                          // initially all the values are set to 0
    for i ← 1 to (n − 1)
        for j ← (i + 1) to n
            dm[i][j] ← √((mean_i − mean_j)² + (sd_i − sd_j)²)   // ref. Eq. 1
The time complexity of the proposed method reflects the efficiency of the algorithm,
which depends on two parameters: the number of species and the length of the protein
sequences. Computing the mean and standard deviation takes O(N) computational time
per sequence, where 'N' is the length of the protein sequence. The distance matrix
computation and the earlier operations require similar amounts of time.
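As a concrete illustration, the procedure above can be sketched in Python. This is a hypothetical sketch, not the authors' implementation: the vertex ordering, the radius r = 1, and the mapping of all 20 amino acids onto the six groups of Table 1 are assumptions made here for concreteness.

```python
import math

# Illustrative sketch of the hexagonal chaos game (Sects. 2.2-2.3).
# Vertex order and the 20-amino-acid-to-6-group mapping are assumptions.
GROUP = {aa: g for g, letters in enumerate(
    ["RDENQKH",   # I   strongly hydrophilic or polar
     "LIVAMF",    # II  strongly hydrophobic
     "STYW",      # III weakly hydrophilic
     "P", "G", "C"]) for aa in letters}

def hexagon_vertices(r=1.0):
    # Six corner points of a hexagon centred at (0, 0) with radius r.
    return [(r * math.cos(math.radians(60 * k)),
             r * math.sin(math.radians(60 * k))) for k in range(6)]

def chaos_stats(seq, r=1.0):
    # Jump halfway towards the vertex of each residue's group and
    # record the length of every half-jump.
    verts = hexagon_vertices(r)
    x, y = 0.0, 0.0
    dists = []
    for aa in seq:
        vx, vy = verts[GROUP[aa]]
        mx, my = (x + vx) / 2.0, (y + vy) / 2.0
        dists.append(math.hypot(mx - x, my - y))
        x, y = mx, my
    mean = sum(dists) / len(dists)
    sd = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return mean, sd

def species_distance(seq_i, seq_j, r=1.0):
    # Eq. (1): Euclidean distance between the (mean, SD) descriptors.
    mi, si = chaos_stats(seq_i, r)
    mj, sj = chaos_stats(seq_j, r)
    return math.hypot(mi - mj, si - sj)
```

Running `species_distance` over all pairs of the nine ND5 sequences would fill the upper triangle of the distance matrix dm used for tree construction.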

3 Results and Analysis

A dataset of NADH Dehydrogenase-5 (ND5) protein with nine protein sequences is
chosen to validate the present work. The sequences are assembled from the NCBI
genome database and have an approximate length of 605 residues. The nine NADH
Dehydrogenase-5 species are Human, Gorilla, Common chimpanzee, Pigmy chimpanzee,
Fin Whale, Blue Whale, Rat, Mouse, and Opossum. These nine species belong to four
families, viz. Hominidae, Balaenopteridae, Muridae, and Didelphidae. When applying
the chaos game theory on the hexagonal model, the length (l) of each input protein
sequence is processed and l points (x, y) are generated for each species. Figure 2a
and b depict the graphical representation of these x–y plots for one member from
each family, respectively.

Fig. 2 Graphical representation of x–y plots generated through chaos game theory on hexagonal
model a Human from Hominidae family b Fin Whale from Balaenopteridae family
394 D. Pal et al.

Fig. 3 Phylogenetic tree of 9 NADH Dehydrogenase-5 (ND5) species using our proposed method

According to the biological reference, the Hominidae family includes Humans,


Gorillas, C. chimpanzees, and P. chimpanzees; the Balaenopteridae family includes
Blue Whales and Fin Whales; the Muridae family includes Rats and Mice; and the
Didelphidae family includes Opossums. In the result of the proposed approach, the
Hominidae family contains the C. chimpanzee, P. chimpanzee, Human, and Gorilla;
the Balaenopteridae family includes the Blue Whale and Fin Whale; the Muridae
family includes the Rat and Mouse; and the Didelphidae family includes the Opossum
portrayed in Fig. 3. This exactly matches the biological reference provided.

3.1 Comparison Analysis of Proposed Model with Earlier Approaches

3.1.1 Using Phylogenetic Tree

To compare the proposed method with the existing approaches, a few research works
on the same dataset are considered. In most cases, the present results are more
appropriate than, or on par with, those produced by earlier methodologies. A detailed
result analysis comparing our proposed method with other methodologies follows.
In [20], Mouse has been inaccurately clustered with Opossum, and Rat has been
clustered with Fin Whale, which falls in the Balaenopteridae family with Blue Whale.
Despite the fact that P. chimpanzee and C. chimpanzee should be clustered together,
Gorilla and Human have been placed alongside P. chimpanzee and C. chimpanzee,
respectively. Therefore, based on biological references, this research does not provide
a suitable classification. In the next research paper, by Saw et al., the Fitch–Margoliash
approach is applied [21]. Although C. chimpanzee should have been clustered with
P. chimpanzee first and subsequently with Human, that grouping is not observed,
which goes against the taxon's biological reference once more. In the research by
Xu et al., although both Rat and Mouse are members of the Muridae family, they do
not constitute a cluster [22]. This
indicates that the family grouping is not biologically consistent. In Pal et al. [23],
the C. chimpanzee and P. chimpanzee have not been clustered together; moreover,
Mouse is clustered with Blue Whale and Fin Whale instead of Rat, and Rat is not
clustered with any other species.

Table 2  Comparison of SD values from phylogenetic trees of conventional methods
with the proposed method, against the Clustal Omega tree

Species                          Proposed method   Method in [4]   [20]   [22]   [23]
9 NADH Dehydrogenase-5 (ND5)     0                 2               6      2      6

3.1.2 SD Values

The Symmetric Distance (SD) is a distance measurement based on the topologies of
any two trees. The treedist utility of the PHYLIP package is used to retrieve the
symmetric distance [24]. Using this method, it is possible to count the number of
partitions that are not shared by the two trees. In this instance, the SD values are
measured from the phylogenetic trees produced by the proposed method as well as
by various other methods.
These trees are contrasted with the Clustal Omega-generated phylogenetic tree,
as illustrated in Table 2. It is seen that the SD value of the proposed method is
lower than those of earlier methods. Therefore, it can be concluded that our method
generates more accurate phylogenetic trees than earlier techniques.
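To make the SD measure concrete, the following sketch counts the bipartitions present in one small rooted tree but not the other, trees being given as nested tuples. This is an illustration of the idea only, not the PHYLIP treedist program; the tree encoding and species names are hypothetical.

```python
# Symmetric distance: count clades present in one tree but not the other.
def bipartitions(tree):
    """Collect the non-trivial clades of a nested-tuple tree."""
    clades = set()
    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset(l for child in node for l in walk(child))
            clades.add(leaves)
            return leaves
        return frozenset([node])
    all_leaves = walk(tree)
    clades.discard(all_leaves)  # the full leaf set is trivial
    return clades

def symmetric_distance(t1, t2):
    # Size of the symmetric difference of the two clade sets.
    return len(bipartitions(t1) ^ bipartitions(t2))

t_ref = ((("Human", "Gorilla"), ("CChimp", "PChimp")), ("Rat", "Mouse"))
t_alt = ((("Human", "CChimp"), ("Gorilla", "PChimp")), ("Rat", "Mouse"))
d = symmetric_distance(t_ref, t_alt)  # shared subtrees contribute nothing
```

A tree identical to the reference yields SD = 0, matching the proposed method's score in Table 2.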

4 Conclusion

Analysing protein sequences is one of the core areas in the field of computational
biology. However, the large variety of protein sequences and the high cost of studying
them experimentally reduce the feasibility of analysing these sequences. Therefore,
it is necessary to develop a new and simple method for protein sequence analysis to
determine how similar proteins are and the evolutionary relationship among them.
In the present research, based on the six-group classification of amino acids, the
proposed method applies the chaos game on a hexagonal model where each corner
of the hexagon represents one category of amino acids. Further, the method generates
a set of non-degenerate points (viz. mean and standard deviation) for each input
protein sequence. Next, the distance matrix is computed among the various species
for phylogenetic tree construction. Then, the generated phylogenetic tree is evaluated
and contrasted with other phylogenetic trees. On the NADH Dehydrogenase-5 (ND5)
dataset, previous techniques failed to provide fully accurate classifications, whereas
the species classifications produced here match the biological reference. As a result,
an alternative and unique alignment-free approach for protein sequence comparison
of different distinct organisms is provided with minimal computational time.

References

1. Dey G, Meyer T (2015) Phylogenetic profiling for probing the modular architecture of the
human genome. Cell Syst 1(2):106–115
2. Zielezinski A, Vinga S, Almeida J, Karlowski WM (2017) Alignment-free sequence compar-
ison: benefits, applications, and tools. Genome Biol 18:1–17
3. Bernard G, Chan CX, Chan YB, Chua XY, Cong Y, Hogan JM, Ragan MA (2019) Alignment-
free inference of hierarchical and reticulate phylogenomic relationships. Brief Bioinform
20(2):426–435
4. Just W (2001) Computational complexity of multiple sequence alignment with SP-score. J
Comput Biol 8(6):615–623
5. Phillips A, Janies D, Wheeler W (2000) Multiple sequence alignment in phylogenetic analysis.
Mol Phylogenet Evol 16(3):317–330
6. Katoh K, Misawa K, Kuma KI, Miyata T (2002) MAFFT: a novel method for rapid multiple
sequence alignment based on fast Fourier transform. Nucleic Acids Res 30(14):3059–3066
7. Vinga S, Almeida J (2003) Alignment-free sequence comparison—a review. Bioinformatics
19(4):513–523
8. Pinello L, Lo Bosco G, Yuan GC (2014) Applications of alignment-free methods in
epigenomics. Brief Bioinform 15(3):419–430
9. Jurtz VI, Johansen AR, Nielsen M, Almagro Armenteros JJ, Nielsen H, Sønderby CK, Sønderby
SK (2017) An introduction to deep learning on biological sequence data: examples and
solutions. Bioinformatics 33(22):3685–3690
10. Li J, Koehl P (2014) 3D representations of amino acids—applications to protein sequence
comparison and classification. Comput Struct Biotechnol J 11(18):47–58
11. Li B, Cai L, Liao B, Fu X, Bing P, Yang J (2019) Prediction of protein subcellular localization
based on fusion of multi-view features. Molecules 24(5):919
12. Barnsley MF (2012) Fractals everywhere: New Edition
13. Jeffrey HJ (1990) Chaos game representation of gene structure. Nucleic Acids Res 18(8):2163–
2170
14. Randić M, Novič M, Plavšić D (2013) Milestones in graphical bioinformatics. Int J Quantum
Chem 113(22):2413–2446
15. Fiser A, Tusnady GE, Simon I (1994) Chaos game representation of protein structures. J Mol
Graph 12(4):302–304
16. Basu S, Pan A, Dutta C, Das J (1997) Chaos game representation of proteins. J Mol Graph
Model 15(5):279–289
17. Yu ZG, Anh V, Lau KS (2004) Chaos game representation of protein sequences based on the
detailed HP model and their multifractal and correlation analyses. J Theor Biol 226(3):341–348
18. Gao J, Xu HX, Ding T, Wang K (2017) Early-warning model of influenza a virus pandemic
based on principal component analysis. Appl Ecol Environ Res 15(3):891–899
19. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search
tool. J Mol Biol 215(3):403–410
20. Czerniecka A, Bielińska-Wąż D, Wąż P, Clark T (2016) 20D-dynamic representation of protein
sequences. Genomics 107(1):16–23
21. Saw AK, Tripathy BC, Nandi S (2019) Alignment-free similarity analysis for protein sequences
based on fuzzy integral. Sci Rep 9(1):1–13

22. Xu C, Sun D, Liu S, Zhang Y (2016) Protein sequence analysis by incorporating modified chaos
game and physicochemical properties into Chou’s general pseudo amino acid composition. J
Theor Biol 406:105–115
23. Pal J, Ghosh S, Maji B, Bhattacharya DK (2022) Mathematical approach to protein sequence
comparison based on physiochemical properties. ACS Omega 7(43):39446–39455
24. Kuhner MK, Felsenstein J (1994) A simulation comparison of phylogeny algorithms under
equal and unequal evolutionary rates. Mol Biol Evol 11(3):459–468
25. Brown TA (1998) Genetics: a molecular approach (No. Ed. 3). Chapman & Hall Ltd.
Credit Card Fraud Detection
and Classification Using Deep Learning
with Support Vector Machine Techniques

Fatima Adel Nama, Ahmed J. Obaid,


and Ali Abdulkarem Habib Alrammahi

Abstract Detecting credit card fraud is a critical problem that online vendors face in
the finance marketplace. Many sectors of the financial industry suffer from fraud and
heavy financial losses due to the rapid growth of modern technologies.
The Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP)
techniques are used to quantify the uncertainty of credit card fraud detection and
classification. Through experimental analysis, the accuracy of the SVM and MLP
techniques is 94.59% and 91.21%, respectively. Experimental results show that the SVM
and MLP techniques classify credit card fraud transactions with more than 90%
accuracy.

Keywords Classification · Machine learning · Deep learning · Data analytics

1 Introduction

A growing threat to the finance industry, corporations, and governments is financial
fraud. Fraud can be defined as a criminal act of deception for financial gain.
The increased use of credit cards can be related to the increasing reliance on Internet
technologies. Credit card fraud has increased as credit card transactions have become
more prevalent in both online and offline shopping. Internal and external credit card
fraud are the two basic forms.
Financial institutions have been severely affected by fraudulent credit card trans-
actions. According to a recent report, credit card fraud has accounted for 27.85

F. A. Nama · A. J. Obaid (B)


Faculty of Computer Science and Mathematics, University of Kufa, Kufa, Iraq
e-mail: ahmedj.aljanaby@uokufa.edu.iq
A. J. Obaid
Department of Computer Technical Engineering, Technical Engineering College, Al-Ayen
University, Thi-Qar, Iraq
A. A. H. Alrammahi
National University of Science and Technology, Thi-Qar, Nasiriyah, Iraq

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 399
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_31

billion dollars in losses in 2018, a 16.2% increase over the 23.97 billion dollars
lost in 2017; estimates predict 35 billion dollars to be lost by 2023 [1]. Fraud
monitoring and prevention can reduce these losses. Nonetheless, class imbalances
in the datasets make it challenging to detect credit card fraud from a learning
perspective [2]. Many problems
hinder credit card fraud detection, but class imbalance is the most important [3].
Real-world ML applications suffer from class imbalance problems where datasets
have an uneven distribution of classes [4, 5].
Similar tasks can be accomplished by numerous machine learning algorithms [6,
7]. Each transaction is classified as legitimate or fraudulent by the algorithm in such
a task. Supervised and unsupervised machine learning classifiers have been proposed
to detect credit card fraud [8]. Supervised ML classifiers can learn the behavior of the
customer(s) and fraudsters by using labeled transaction data. However, unsupervised
machine learning does not rely on labeled data; it observes outliers. ML classifiers
that are supervised produce fewer false alarms than those that are unsupervised. In
this study, supervised ML classifiers are considered along with deep learning. In
this paper, Multi-Layer Perceptron (MLP) is used to detect credit card fraudulent
transactions and Support Vector Machine (SVM) is used to classify the credit card
fraud transactions.
This paper makes the following significant contributions:
• Data preprocessing has been performed, so errors and malware are effectively
eliminated.
• The deep learning (DL) technique is considered for detecting credit card fraud
transactions.
• The SVM-supervised ML classifier was implemented on the public data to
classify credit card fraud transactions.
• The performance of the deep learning and machine learning techniques is
compared based on the various performance parameters.
The remainder of the paper is organized as follows: Sect. 2 presents the related
works based on the latest techniques and presents a comparative analysis based
on their proposed method and objective. Section 3 gives a brief description of the
proposed methodology. The result analysis and discussion of the dataset are presented
in Sect. 4. Finally, the conclusion is discussed in Sect. 5.

2 Related Works

The past decade has seen much attention paid to fraud detection. The purpose of this
section is to review various techniques used in the fraud detection domain of credit
cards. Increasingly, fraud detection techniques have been proposed in related work.
The research has identified two main types of credit card fraud transactions:
internal and external [9, 10]. However, a more comprehensive classification has been

proposed, encompassing three distinct groups: traditional card fraud (including appli-
cation, theft, fake, counterfeit, and account takeover), merchant fraud, and Internet
frauds (such as site cloning, false merchant sites, and credit card generators). A study
[11] shows that in 2014, banks and businesses worldwide suffered fraud losses worth
more than USD 16 billion, an increase of nearly USD 2.5 billion over the previous
year's recorded losses. According to the report, 5.6 cents of every USD 100 transacted
were fraudulent.
The latest research focused on developing a hybrid model combining RF, LR,
Gradient Boosting (GB), and voting classifiers to identify the fraud transaction using
credit card datasets [12]. According to the author, there was a maximum detection
rate for RF and GB. The studies mentioned above have dealt with fraud detection;
however, the algorithms used varied according to the datasets.
Using machine learning methods, credit card fraud has been detected [13, 14].
A supervised learning algorithm, which uses labeled datasets containing previous
transactions to build ML techniques that can identify the fraudulent transactions, is
highly effective in detecting credit card fraud. There are supervised learning tech-
niques which contain logistic regression [15], Support Vector Machines (SVMs)
[16], decision trees [17], adaptive boosting (AdaBoost) [18], random forest [19], and
artificial neural networks (ANNs) [20–22]. Another study [23] applied a genetic algo-
rithm (GA) to feature selection to detect credit card fraud. After selecting features,
ML models were trained using Naive Bayes (NB), logistic regression (LR), decision
trees (DTs), ANNs, and random forests (RFs). Various machine learning algorithms
can detect fraud on credit cards using various algorithms, but random forest achieves
the highest accuracy. A neural network model based on artificial intelligence and
machine learning used by [24], a distributed data mining system used by [25], a
sequence alignment algorithm based on a cardholder’s spending profile, and an intel-
ligent decision-making engine that uses meta-learning agents and fuzzy systems of
artificial intelligence was used in [26] (Table 1).

3 Methodology

Large datasets can be used for decision-making and evaluating the probability of
future events using machine learning methods. Fraud detection, marketing, and scien-
tific discovery use machine learning insights [33]. In this paper, Multi-Layer Percep-
tron (MLP) is utilized to detect credit card fraudulent transactions and SVM is used
to classify the credit card fraud transactions [37, 38]. The layout of the proposed
methodology is presented in Fig. 1.

Table 1  Research literature summary

Authors and year       | Proposed method                | Problem statement                                                                 | Objective
Xuan et al. [27]       | Random forest                  | A weak classifier is used, and the data are not normalized                        | The standard/fraud behavior features are trained using two different types of random forests to deal with fraud detection
Randhawa et al. [18]   | AdaBoost                       | Classification is computationally complex, and features are not selected properly | An evaluation of various ML models for fraud detection using real-world credit card data
Dubey et al. [22]      | Backpropagation neural network | In education, classical algorithms are used                                       | The model is created using the artificial neural network (ANN) technique with backpropagation
Taha and Malebary [28] | Gradient boosting              | The input space has no feature selection, resulting in a complex input space      | Combining bio-inspired optimization techniques with ML models may enhance the performance of the ML
Shukur and Kurnaz [29] | Artificial neural network      | Trial and error are the best way to choose activation functions                   | Proposed the combination of the ML and ANN techniques
Yee et al. [30]        | Bayesian, logistic, and J48    | Training different models is computationally complex                              | The Bayesian, logistic, and J48 ML techniques are used for fraud prediction
Save et al. [31]       | Decision tree                  | Transfer of data between directories is not possible                              | Design a novel technique for fraud detection using a tree-based ML technique

3.1 MLP

In ANNs, the biological neural system is modeled mathematically. Multiplication,
addition, and activation are the three main stages of this technique. Each input value
of the artificial neural network is multiplied by a weight. In the middle of the ANN,
a sum function combines all the weighted inputs. At the end of the ANN, the weighted
sum passes through the activation phase, also known as the transfer function. In an
MLP, each neuron is linked by its weights to form a feedforward ANN. An MLP
generates the desired outputs based on a set of inputs. The 28 inputs, 12 hidden
layers, and 1 output layer make up the MLP, as shown in Fig. 2. The hidden layer
receives the input data from the input layer and forwards it to the output layer [34].
A perceptron with input, hidden, and output layers is called a Multi-Layer
Perceptron. At each node, the weighted sum of the inputs is computed, a bias is
added, and the activation function is applied. Individual neurons can also be removed
or ignored during network construction to study their effect [21].
X²_F = (12N / (K(K + 1))) · [ Σ_j R_j² − K(K + 1)² / 4 ]. (1)

Here, R_j represents algorithm j's average rank, K is the total number of algorithms,
and N is the number of datasets over which the algorithms are ranked.
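Read this way, Eq. (1) is a Friedman-type rank statistic. A minimal sketch, with made-up ranks and assuming the interpretation of K, N, and R_j given above:

```python
# Sketch of the rank statistic in Eq. (1). avg_ranks holds R_j for each
# of the K algorithms; n_datasets is N. Values below are illustrative.
def friedman_stat(avg_ranks, n_datasets):
    k = len(avg_ranks)
    return (12 * n_datasets / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)

stat = friedman_stat([1.0, 2.0, 3.0], n_datasets=10)  # fully consistent ranks
```

When all algorithms receive the same average rank, the bracketed term vanishes and the statistic is zero.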

Fig. 2 MLP network with 28 inputs, 12 hidden layers, and 1 output
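The 28-12-1 forward pass described above can be sketched as follows. The weights here are random placeholders purely for illustration (in practice they are learned by backpropagation), and the sigmoid activation is an assumption, since the paper does not state which activation function it uses.

```python
import math, random

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    # Each neuron: weighted sum of inputs + bias, then sigmoid activation.
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    hidden = [sig(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sig(sum(wo * h for wo, h in zip(w_out, hidden)) + b_out)

random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(28)] for _ in range(12)]
w_out = [random.uniform(-1, 1) for _ in range(12)]
x = [0.5] * 28                       # one PCA-transformed transaction
score = mlp_forward(x, w_hidden, [0.0] * 12, w_out, 0.0)
# score lies in (0, 1); above a chosen threshold the transaction is flagged
```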



3.2 Support Vector Machine (SVM)

The SVM can classify, predict, recognize patterns, and detect outliers [35]. On the
credit card dataset, SVMs are used for prediction and classification. Credit card
transactions are classified into two categories using the SVM algorithm: fraud and
genuine. The hyperplane acts as the decision-maker in the SVM. Kernel representation
and margin optimization are the two important strengths of the SVM technique.
The optimal hyperplane has the following characteristics:

y_i (v^T x_i + b) ≥ 1,  i = 1, 2, 3, ..., n. (2)

The objective function to be minimized can be seen in Eq. (3):

φ(v) = (1/2) ‖v‖². (3)

A kernel function is defined in the following way, where v is the normal vector to the
hyperplane, x_i is the credit card transaction to classify, and y_i is the class of the
credit card transaction that the point belongs to:

k(x₁, x₂) = ⟨Φ(x₁), Φ(x₂)⟩, (4)

where Φ : X → D maps transactions from the input space X to a higher-dimensional
space D. The credit card fraud dataset is processed using the kernel function on
distinct transactions; the resulting separating hyperplane is

⟨v, x⟩ + b = 0. (5)

The SVM classification decision is given by:

f(x) = sign( Σ_{i=1}^{n} α_i y_i k(x_i, x) + b ). (6)

The choice of kernel function (KF) depends on the dataset and the classification
requirements. Several easily recognizable kernel functions are available, such as the
Gaussian (RBF) kernel, the polynomial kernel, the normalized polynomial kernel,
the precomputed kernel, and the string kernel. For the classification task, the
polynomial kernel function is used. The best-selected features are applied to the SVM
algorithm to build a classification model. In addition to being well known as a solution
to classification problems, the SVM is also capable of dealing with high-dimensional
datasets.
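The decision rule of Eq. (6) can be sketched directly. The support vectors, multipliers α_i, bias, and the degree-2 polynomial kernel below are made-up illustrative values, not a model trained on the credit card data.

```python
def poly_kernel(x, y, degree=2):
    # Polynomial kernel on two feature vectors.
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** degree

def svm_classify(x, support_vectors, labels, alphas, b):
    # Sign of Eq. (6); the +1 = fraud, -1 = genuine convention is assumed.
    s = sum(a * y * poly_kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1

svs = [[1.0, 1.0], [-1.0, -1.0]]     # illustrative support vectors
labels = [1, -1]
alphas = [0.5, 0.5]
pred = svm_classify([0.9, 1.1], svs, labels, alphas, b=0.0)  # → 1
```

Only the support vectors (transactions with non-zero α_i) contribute to the sum, which is what keeps SVM prediction cheap even on large datasets.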

4 Result Analysis and Discussion

We have considered deep learning and machine learning techniques for detecting
and classifying credit card fraud transactions. The performance of both models is
observed based on accuracy, recall, precision, and specificity parameters. The result
analysis and discussion of the proposed model are organized as outlined in Fig. 3.

4.1 Performance Matrices

A binary classification task based on ML and DL is presented in this paper. The


primary performance metric is the accuracy (AC) of the training and test data. The
recall (RC), the precision (PR), and the specificity (SC) of each model are additionally
computed [36]. AUC can also assess the quality of a model’s classification. The
confusion metric measures the effectiveness of a classifier for a specific classification
task.
• True positive (TP): a fraudulent transaction accurately recognized as fraudulent.
• True negative (TN): a legitimate transaction correctly classified as legitimate.
• False positive (FP): a legitimate transaction incorrectly labeled as fraudulent.
• False negative (FN): a fraudulent transaction incorrectly classified as legitimate.

AC = (TN + TP) / (TP + TN + FP + FN), (7)

RC = TP / (FN + TP), (8)

PR = TP / (FP + TP), (9)

SC = TN / (TN + FP). (10)
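Eqs. (7)–(10) follow directly from the four confusion counts; the counts in the example below are invented for illustration.

```python
def metrics(tp, tn, fp, fn):
    # AC, RC, PR, SC exactly as in Eqs. (7)-(10).
    return {
        "accuracy":    (tn + tp) / (tp + tn + fp + fn),
        "recall":      tp / (fn + tp),
        "precision":   tp / (fp + tp),
        "specificity": tn / (tn + fp),
    }

m = metrics(tp=90, tn=85, fp=15, fn=10)   # e.g. accuracy 0.875, recall 0.9
```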

Fig. 3 Analysis of the proposed model



4.2 Dataset

This implementation considers a dataset accessible via the public web platform
'Kaggle', available in CSV format. The dataset contains transactions made by European
cardholders with their credit cards in September 2013. All of the study's input variables
are numeric, and those variables are PCA-transformed. Due to privacy concerns, the
raw data's characteristics cannot be disclosed, nor can any additional context be
provided. PCA has transformed all features except two: 'Time' and 'Amount'; the
remainder are the principal components derived through PCA. For each transaction,
the 'Time' feature records the seconds elapsed since the first transaction in the dataset.
The 'Amount' feature can be used for cost-sensitive learning based on example-dependent
transaction amounts. If fraud is detected, the response variable 'Class' takes the
value 1; otherwise, it takes the value 0.

4.3 Result Discussion

The SVM and MLP techniques have been implemented in Matlab R2020a. The system
configuration is an i5-4310U CPU with a 2.60 GHz clock speed; the system's
secondary and primary memory capacities are 1 TB and 16 GB, respectively.
Figure 4 shows the SVM model's accuracy, recall, precision, and specificity. The
SVM technique's performance is measured based on the above-mentioned performance
parameters, as shown in the bar plot in Fig. 4. As shown in the figure, the recall has
the highest value compared to the other performance parameters. For the SVM
technique, the maximum true positive rate observed is 96.99%, and the proportion of
accurate predictions is 95.59%.

Fig. 4 Performance analysis of the SVM technique based on accuracy, precision, recall, and
specificity

Fig. 5 Classification rate of the SVM technique

The SVM technique shows that the percentages of correct and incorrect classifications
are 94.59% and 5.40%, respectively, as shown in Fig. 5.
The performance analysis of the MLP technique is observed based on accuracy,
recall, precision, and specificity, as shown in Fig. 6. The graphical representation
of the MLP technique's performance shows that the accuracy is 91.21%, recall
95.08%, precision 85.29%, and specificity 88.51%. Experimental results show that
MLP gives the highest accuracy (91.21%) with the lowest error rate (0.003).
The MLP has used 28 inputs, 12 hidden layers, and 1 output. The MLP technique
shows that the percentages of correct and incorrect classifications are 91.21% and 8.79%,
respectively, as mentioned in Fig. 7.

Fig. 6 Performance analysis of the MLP technique based on accuracy, recall, precision, and
specificity

Fig. 7 Classification rate of the MLP technique

The cross-entropy measures the discrepancy between the network's outputs and the
target values; the lowest cross-entropy value indicates the smallest error. Figure 8
displays that the average cross-entropy is very low during the prediction of the credit
card fraud transactions. The selection of deep learning for predicting credit card
fraud transactions with train, test, and validation cross-entropy is a robust and
effective approach. The training cross-entropy is lower compared to the test and
validation cross-entropy.
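For reference, the binary cross-entropy monitored in Fig. 8 is the average negative log-likelihood of the targets under the network's outputs. A small sketch with made-up targets and outputs:

```python
import math

def cross_entropy(targets, outputs, eps=1e-12):
    # Average over examples; eps guards against log(0).
    return -sum(t * math.log(o + eps) + (1 - t) * math.log(1 - o + eps)
                for t, o in zip(targets, outputs)) / len(targets)

ce = cross_entropy([1, 0, 0, 1], [0.9, 0.1, 0.2, 0.8])  # ≈ 0.164
```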
This visualization of the classification model shows the relation between the inputs
and the results once the predicted results have been associated with the actual ones.
An association table is created by transforming the anticipated results into a variable.
Confusion metrics
Fig. 8 Cross-entropy of the MLP technique based on the train, test, and validation sets over
100 epochs (best validation performance: 0.26595 at epoch 82)

can be plotted using the association table as a heat map. While confusion metrics
can be visualized using numerous built-in methods, they can also be defined and
visualized according to the score to improve the correlation. Figure 9 demonstrates
the confusion metrics of MLP based on the training, test, validation, and complete
data. The fundamental confusion metrics calculate the four main parameters: TP,
FP, TN, and FN. Figure 10 shows the comparative analysis of the SVM and MLP
techniques based on the confusion metrics.
Figure 11 compares the various parameters like accuracy, recall, precision,
and specificity concerning the MLP and SVM techniques. The figure’s graphical
representation shows that SVM has better performance than the MLP technique.
The SVM and MLP techniques show that the correct and incorrect classification
percentages are 94.59%, 5.40% and 91.21%, 8.79%, respectively, as shown in Fig. 12.
The SVM technique shows a higher correct classification rate than the MLP, while
the MLP shows a higher incorrect classification rate. The MLP did not consider any
suspicious transaction in the correctly classified transactions.

Fig. 9 Confusion matrix of the MLP technique for the training, testing, validation, and complete
dataset

Fig. 10 Confusion matrix of the SVM and MLP techniques, respectively

Fig. 11 Comparative analysis of the SVM and MLP techniques based on accuracy, recall, precision, and specificity

Fig. 12 Comparative analysis of the classification rate of the SVM and MLP techniques

5 Conclusion

Financial institutions have recently become especially concerned about credit card
fraud. The need for investigating different reliable ways of detecting fraudulent credit
card transactions still exists, despite the existing methods used in the past to detect
fraudulent activities. This paper uses machine and deep learning techniques to detect
and classify fraudulent credit card transactions. The predictive accuracy is high with
reduced false alarms. The accuracy percentages for the SVM and MLP are 94.59% and
91.21%, respectively. Compared to the MLP technique, the accuracy of the SVM technique is
observed to be the highest. The experimental results show that the SVM counts
some missed or suspicious transactions as correct, whereas the MLP directly rejects them and counts them as
incorrect or fraudulent transactions.

References

1. Tingfei H, Guangquan C, Kuihua H (2020) Using variational auto encoding in credit card fraud
detection. IEEE Access 8:149841–149853
2. Dal Pozzolo A, Boracchi G, Caelen O, Alippi C, Bontempi G (2017) Credit card fraud detec-
tion: a realistic modeling and a novel learning strategy. IEEE Trans Neural Netw Learn Syst
29(8):3784–3797
3. Makki S, Assaghir Z, Taher Y, Haque R, Hacid M-S, Zeineddine H (2019) An experimental
study with imbalanced classification approaches for credit card fraud detection. IEEE Access
7:93010–93022
4. Rani P, Singh PN, Verma S, Ali N, Shukla PK, Alhassan M (2022) An implementation of
modified blowfish technique with honey bee behavior optimization for load balancing in cloud
system environment. Wirel Commun Mob Comput 2022:1–14. https://doi.org/10.1155/2022/3365392
5. Rani P, Verma S, Yadav SP, Rai BK, Naruka MS, Kumar D (2022) Simulation of the lightweight
blockchain technique based on privacy and security for healthcare data for the cloud system.
Int J E-Health Med Commun 13(4):1–15. https://doi.org/10.4018/IJEHMC.309436
6. Bhattacharyya S, Jha S, Tharakunnel K, Westland JC (2011) Data mining for credit card fraud:
a comparative study. Decis Support Syst 50(3):602–613
7. Seeja KR, Zareapoor M (2014) Fraudminer: a novel credit card fraud detection model based
on frequent itemset mining. Sci World J 2014
8. Lucas Y, Jurgovsky J (2020) Credit card fraud detection using machine learning: a survey.
ArXiv Preprint ArXiv:2010.06479
9. Chaudhary K, Mallick B (2012) Credit card fraud: the study of its impact and detection
techniques. Int J Comput Sci Netw (IJCSN) 1(4):31–35
10. Shen A, Tong R, Deng Y (2007) Application of classification models on credit card fraud
detection. In: 2007 International conference on service systems and service management, pp
1–4
11. Evans DS, Chang H, Joyce S (2015) The impact of the US debit-card interchange fee regulation
on consumer welfare. J Compet Law Econ 11(1):23–67
12. Sivanantham S, Dhinagar SR, Kawin P, Amarnath J (2021) Hybrid approach using machine
learning techniques in credit card fraud detection. Adv Smart Syst Technol Select Proc ICFSST
2019:243–251
13. Fanai H, Abbasimehr H (2023) A novel combined approach based on deep Autoencoder and
deep classifiers for credit card fraud detection. Expert Syst Appl 119562
14. Ni L, Li J, Xu H, Wang X, Zhang J (2023) Fraud feature boosting mechanism and spiral
oversampling balancing technique for credit card fraud detection. IEEE Trans Comput Soc
Syst
15. Singadkar G, Mahajan A, Thakur M, Talbar S (2021) Automatic lung segmentation for the inclu-
sion of juxtapleural nodules and pulmonary vessels using curvature based border correction. J
King Saud Univ-Comput Inf Sci 33(8):975–987
16. Hussain SS, Reddy ESC, Akshay KG, Akanksha T (2021) Fraud detection in credit card
transactions using SVM and random forest algorithms. In: 2021 Fifth international conference
on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC), pp 1013–1017
17. Mienye ID, Sun Y, Wang Z (2019) Prediction performance of improved decision tree-based
algorithms: a review. Procedia Manufact 35:698–703
18. Randhawa K, Loo CK, Seera M, Lim CP, Nandi AK (2018) Credit card fraud detection using
AdaBoost and majority voting. IEEE Access 6:14277–14284
19. Lin T-H, Jiang J-R (2021) Credit card fraud detection with autoencoder and probabilistic
random forest. Mathematics 9(21):2683
20. Akande ON, Misra S, Akande HB, Oluranti J, Damasevicius R (2021) A supervised approach
to credit card fraud detection using an artificial neural network. In: Applied informatics:
fourth international conference, ICAI 2021, Buenos Aires, Argentina, October 28–30, 2021,
Proceedings 4, pp 13–25
21. Asha RB, KR SK (2021) Credit card fraud detection using artificial neural network. Glob
Transitions Proc 2(1):35–41
22. Dubey SC, Mundhe KS, Kadam AA (2020) Credit card fraud detection using artificial neural
network and backpropagation. In: 2020 4th International conference on intelligent computing
and control systems (ICICCS), pp 268–273
23. Ileberi E, Sun Y, Wang Z (2022) A machine learning based credit card fraud detection using
the GA algorithm for feature selection. J Big Data 9(1):1–17
24. Ansari G, Rani P, Kumar V (2023) A novel technique of mixed gas identification based on
the group method of data handling (GMDH) on time-dependent MOX gas sensor data. In:
Mahapatra RP, Peddoju SK, Roy S, Parwekar P (eds) Proceedings of international conference
on recent trends in computing, vol 600. Springer Nature Singapore, pp 641–654. https://doi.
org/10.1007/978-981-19-8825-7_55

25. Phua C, Lee V, Smith K, Gayler R (2010) A comprehensive survey of data mining-based fraud
detection research. ArXiv Preprint ArXiv:1009.6119
26. Rani P, Hussain N, Khan RAH, Sharma Y, Shukla PK (2021) Vehicular intelligence system:
time-based vehicle next location prediction in software-defined internet of vehicles (SDN-
IOV) for the smart cities. In: Al-Turjman F, Nayyar A, Devi A, Shukla PK (eds) Intelligence of
Things: AI-IoT based critical-applications and innovations. Springer International Publishing,
pp 35–54. https://doi.org/10.1007/978-3-030-82800-4_2
27. Xuan S, Liu G, Li Z, Zheng L, Wang S, Jiang C (2018) Random forest for credit card fraud
detection. In: 2018 IEEE 15th international conference on networking, sensing and control
(ICNSC), pp 1–6
28. Taha AA, Malebary SJ (2020) An intelligent approach to credit card fraud detection using an
optimized light gradient boosting machine. IEEE Access 8:25579–25587
29. Shukur HA, Kurnaz S (2019) Credit card fraud detection using machine learning methodology.
Int J Comput Sci Mob Comput 8(3):257–260
30. Yee OS, Sagadevan S, Malim NHAH (2018) Credit card fraud detection using machine learning
as data mining technique. J Telecommun Electron Comput Eng (JTEC) 10(1–4):23–27
31. Save P, Tiwarekar P, Jain KN, Mahyavanshi N (2017) A novel idea for credit card fraud detection
using decision tree. Int J Comput Appl 161(13)
32. Alkhalili M, Qutqut MH, Almasalha F (2021) Investigation of applying machine learning for
watch-list filtering in anti-money laundering. IEEE Access 9:18481–18496
33. Özdemir A, Yavuz U, Dael FA (2019) Performance evaluation of different classification
techniques using different datasets. Int J Electr Comput Eng 9(5):2088–8708
34. Krenker A, Bešter J, Kos A (2011) Introduction to the artificial neural networks. Artif
Neural Netw: Methodological Adv Biomed Appl
35. Pouyan MB, Yousefi R, Ostadabbas S, Nourani M (2014) A hybrid fuzzy-firefly approach for
rule-based classification. In: The twenty-seventh international flairs conference
36. Kasongo SM, Sun Y (2020) A deep long short-term memory based classifier for wireless
intrusion detection system. ICT Express 6(2):98–103
37. Taher MM, George LE (2022) A digital signature system based on hand geometry-survey.
Wasit J Comput Math Sci 1(1):1–14
38. Iqbal MO, Obaid AJ, Agarwal P, Mufti T, Hassan AR (2023) Blockchain technology and decen-
tralized applications using blockchain. In: Tuba M, Akashe S, Joshi A (eds) ICT infrastructure
and computing. Lecture notes in networks and systems, vol 520. Springer, Singapore. https://
doi.org/10.1007/978-981-19-5331-6_57
Prediction of Criminal Activities
Forecasting System and Analysis Using
Machine Learning

Mahendra Sharma and Laveena Sehgal

Abstract The development of machine intelligence has resulted in automated
machines that are intelligent enough to carry out their tasks with the ability to influence
the outcome, leading to a considerable reduction in the need for human intervention
in redundant processes. Significant technical developments have made many chores
much simpler and more discernible through mechanization. However, current
surveillance cameras are merely video-producing equipment and do not have any
intelligence built into them. Due to the increase in material
from security cameras, there is now a demand for intelligent video feeds that can auto-
matically recognize abnormal occurrences. The major objective of the project has
been to enhance community safety through the automation of measurement assign-
ment processes and the analysis of actual closed-circuit television (CCTV) footage
of criminal activity. The responsibility of detecting illegal behavior has been dele-
gated to a structure that is able to recognize and differentiate it for more effective
monitoring, enabling the successful completion of the work. In this study, we have
presented a model that can distinguish between specific offenses with a precision of
more than 90% for all variables.

Keywords Surveillance camera · CCTV face recognition · Object detection

1 Introduction

The machine vision problem of image classification is framed here in order to address
and automate supervised classification of real-time live streams. Given
how new the problem is, many solutions remain untested. In addition to
this, such applications provide a broad variety of solutions, which vary from the
first identification of noteworthy sporting acts [1] or ordinary activities taking place
in a scene to a number of safety and health-related activities. Our research aims

M. Sharma · L. Sehgal (B)


IIMT College of Engineering, Greater Noida, UP, India
e-mail: laveenasehgal30@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_32

to improve community safety by automating crime measurement and analysis by
delegating the task of identifying illegal or unusual behavior to a machine that is
skilled at inferring patterns that distinguish criminal behavior from routine activity.
Existing detection systems have a number of flaws that make them incompatible with
modern, readily available facilities. One of the most glaring shortcomings of
conventional surveillance systems is that they rely on a diligent supervisor to evaluate
footage and guarantee that any strange behavior is appropriately noticed and handled.
A human must evaluate the CCTV video, which might lead to mistakes [2].
Deep learning techniques, in particular CNN architectures such as residual models
[3], have been used in order to automatically detect crimes, such as localizing the
person and the weapon in video footage [4].
It is necessary to have access to the relevant information or data, as well as a
technique for analyzing that data. The government departments that deal with law
enforcement can quickly access vast volumes of data. For instance, for the years 2017,
2018, and 2019, the city of Detroit has 81,440, 82,197, and 83,893 arrest records
accessible [5]. This brings the total number of records for these three years to roughly
a quarter million and presents options for the police to increase their knowledge of
current and future criminal behavior while also simplifying their decision-making
processes for how to combat criminal activity. The techniques for machine learning
scale effectively even for very large datasets [6]. A kind of machine learning known
as neural networks (NNs) takes its inspiration from the way in which the human brain
operates and the way in which it processes neuro-cognitive data. According to Mena
[6], NNs are the most efficient and accurate clustering approach that is presently
in use. This makes it possible for high-quality data to assist in the decision-making
process for the police.
How unexpected are distinct types of crime and the factors that lead to them,
even when they seem to be predictable? The writers of reference [7] state that the
requirement for a prediction system has risen due to the fact that society and the
economy are responsible for the creation of new forms of crimes. In the reference
[8], a method called dynamic time warping and a Mahalanobis-distance-based system that
analyzes crime patterns and makes predictions are described. Together, these two
tools make it possible to anticipate criminal activity and apprehend the real perpe-
trator. Every day, there are several ATM robberies, raising the issue of security. To
avoid this problem, each ATM has a watchman assigned to it. Every day, CCTV
cameras installed inside the ATM capture a large number of these videos. The length
of recorded movies is excessive, and automated video analysis techniques [9] have
yet to produce the desired results. It is exhausting to watch all of the videos because
they are so long. A system that extracts and summarizes only the most important
information from a lengthy video is required. Surveillance videos primarily record
any suspicious activity, such as robberies and murders. As a result, it is necessary
to extract this critical information from lengthy videos. This is where our problem
statement comes from. Following the selection of priority information for sampling,

these films were summarized. Because the priority information has been collected
in the summary film, the time required to browse through the entire surveillance
footage has been reduced.
Recognizing a picture of a person of interest as precisely and as rapidly as possible is
the fundamental objective of this effort. Many problems with present systems
have been identified, such as the difficulty of recognizing a person when
there is low or dim light, poor picture quality at capture time, the
expense of training the database, and so on. This research, on the other hand,
addresses the bulk of these problems.
The primary benefit is that the training set is substantially smaller, which has had a
considerable influence on the storage factor with respect to the vector characteristics
of these pictures. This has been achieved by reducing the amount of data that has to
be stored. The subject is discussed in further detail in the following sections. Instead
of performing a manual search, we alert the administrator using a complete set
of associations that is automatically formed from live-streaming video footage
as well as information from the picture domain. The
database is “trained” by simply applying certain processes and algorithms depending
on the attribute that has been selected. This ensures that only proper or precise images
or faces are allowed. The primary goal of the paper is to identify the individual and
notify the appropriate administrator or person. It is primarily useful in shopping
malls, where there is a higher risk of theft, for unauthorized access security checks,
in banks, in offices where only authorized personnel are permitted entry, and in
the data centers of numerous corporations, including Google, Microsoft, and others.
There are numerous methods available in the modern world for locating or identifying
faces. However, each method has advantages and disadvantages. Furthermore, there
are two major issues that pose a difficult threat. The first is used to train the database,
while the second is used to differentiate between similarities. As a result, we saw
these two risks as a challenge and attempted to address them.
Our analysis of previously published publications revealed numerous challenges
in the field of facial image analysis. These include issues with lighting, generic facial
image annotations, integrating face sequence extraction and similarity evaluation, and
accurately retrieving similar facial images during searches. Additionally, the extrac-
tion of precise features from low-quality, blurry, or dark images poses significant
obstacles that must be addressed to advance the field effectively.
This work describes a strategy for the problem statement. Using convolutional
neural networks, we created an algorithm that samples priority information and gener-
ates image summaries. We compiled a dataset of frames containing both suspicious
and unsurprising movements. We trained our machine using this training dataset.
Convolutional neural networks were used to train our dataset. Then, we used a testing
video to put our model to the test. If there is a suspicious movement frame, it is
retrieved, and the priority information frames (robbery, kidnapping, theft, etc.) are
included. This summarized video is then immediately sent to the host via the SMTP
protocol. This effectively extracts the most important data from a long-duration
surveillance video.
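The final alerting step described above can be sketched with Python's standard email modules; the addresses, host name, and filename below are placeholders, not values from the paper:

```python
from email.message import EmailMessage
# import smtplib  # needed only when actually sending

def build_alert_email(video_bytes, sender, recipient, filename="summary.mp4"):
    """Package the summarized surveillance clip as an email attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Suspicious activity detected"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("A summarized clip of the flagged activity is attached.")
    msg.add_attachment(video_bytes, maintype="video", subtype="mp4",
                       filename=filename)
    return msg

# Sending would require a reachable SMTP server (host is a placeholder):
# with smtplib.SMTP("smtp.example.com") as server:
#     server.send_message(msg)
```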

2 Literature Review

This article discusses a number of papers on the subject as well as their implementations.
First, the authors of references [9, 10] forecasted crime for 2014 and
2013, respectively, using the KNN method. Using the provided strategy, they were
able to improve accuracy to roughly 67%. Shojaee et al. [10] utilized a straightforward
KNN method to sort crime statistics into urgent and non-urgent groups.
They had a remarkable percentage of success, being right 87% of the time. One
example of an SPPnet is the fast R-CNN [11], which uses a single spatial pyramid
pooling layer—also called the region of interest (ROI) pooling layer—to fine-tune a
pretrained ImageNet model from beginning to finish. The ROI pooling layer allows
for this to be achieved. Because of this improvement, it outperforms the original R-CNN.
The faster R-CNN [11], an improvement on the fast R-CNN algorithm, is now the
preferred method for proposing representative regions. When tested on a number of
different item recognition benchmarks, the faster R-CNN consistently outperforms
the competition.
Predictions of criminal activity for the years 2015 and 2013 are made using a decision
tree in references [1, 3]. From the findings of Obuandike et al. in references
[4, 12], we learn that crime analysis and forecasting were performed using the
naive Bayes method. Jangra and Kalsi [4] were able to precisely predict crimes with
an astounding 87% success rate; however, they were unable to extend their technique
to enormous datasets. In contrast, due to overlooking computing speed, resilience,
and scalability, Wibowo and Oesman’s [12] forecasts were only true 66% of the
time. The aforementioned reference [13] provides a deep neural network (DNN)-
based feature-level data fusion technique for efficiently fusing multi-model data
from several domains with environmental context information for crime prediction.
The DNN model indicated improved accuracy to an astounding level of 84.25%.
The experimental results showed that the recommended DNN model outperformed
the other prediction models in its ability to foresee the occurrence of criminal activ-
ities. In reference [14], we see a technique for foreseeing criminal behavior based
on an analysis of a dataset including actual instances of criminal behavior and the
patterns they followed. The proposed system heavily employs both KNN ML and
decision trees. The prediction model’s precision was improved by using both the
random forest technique and adaptive boosting. Using the random forest technique
along with undersampling and oversampling unexpectedly increased the accuracy to
99.16%.

3 Proposed Method

The major objective of this study is to recognize criminal activity captured on video
feeds from closed-circuit television cameras. A separate module uses triplet loss, a
recently developed technique, to find faces within those streams. This article offers

two different models as possible ways to convey the subject at hand. The first involves
identifying faces in CCTV footage using the techniques listed below. This is achieved
by first cleaning the data, then recognizing faces in the supplied data, and then
going on to feature extraction training using the triplet loss function. After all of the
embeddings have been generated, the algorithm is used to identify such faces. This
is achieved simply by creating the model, identifying the face with a camera, and
enclosing it in bounding boxes while also providing a confidence parameter. During
our assessment, this module was able to provide some very significant
findings. The first selection of source videos for the crime detection system comprises
recordings both with and without instances of criminal behavior. It is trained using
an existing ResNet architecture after the fundamental preprocessing steps have been
completed; this is where its efficiency and other analytic measures are reported. In
the end, by playing duplicated webcam feeds from a mobile phone through
the trained model, we were able to identify crimes. When analyzing a
total of six classes, this work achieved accuracy rates of more than 90%. Finally, this
article proposes a final pipeline that would connect both of these components and
use them to determine whether someone is engaging in criminal activity.

3.1 Face Recognition

The process of finding and identifying the precise location of a face within the
confines of a still picture or moving video stream is referred to as face detection.
The verification of a person's identity is carried one step further by comparing the
face currently being shown to previously saved images of faces.
This is accomplished via the use of distance measures, such as a clustering method
or the L2 norm, to evaluate how similar two faces are
to one another. The suggested method for face recognition offered in this research
includes face detection, face embedding computation, retraining on
the provided embeddings, and face recognition in pictures or emulated streaming video.
The workflow detailed in this paper is shown in Fig. 2. For the purpose of
face detection, a Caffe model is used, while open pretrained models are utilized for feature
extraction (Fig. 1).
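The distance-based verification mentioned above can be sketched as follows; the 0.6 threshold is an illustrative value, not one reported in this research:

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(embedding_a, embedding_b, threshold=0.6):
    """Embeddings closer than the threshold are treated as the same face."""
    return l2_distance(embedding_a, embedding_b) < threshold
```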
OpenCV's deep learning module is used. A ResNet backbone with a single-shot detector (SSD)
architecture is used for face detection. The term "single-shot detection" refers to a
method in which the model needs only a single forward pass to identify multiple objects
in an image. By discretizing the input image into a set of default bounding boxes, it
generates several boxes around areas whose extracted features score with
high certainty. In order to acquire the optimal fit for detection, the level of confidence
associated with each of these boxes is determined, and the size of each box
is adjusted. After it recognizes a face, the complete bounding boxes appear as
shown in Fig. 3.
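The confidence filtering and box resizing step can be sketched like this, assuming a hypothetical detection format of (confidence, normalized box); SSD implementations differ in their exact output layout:

```python
def scale_box(box, width, height):
    """SSD boxes come out in normalized [0, 1] coordinates; scale to pixels."""
    x1, y1, x2, y2 = box
    return (int(x1 * width), int(y1 * height), int(x2 * width), int(y2 * height))

def filter_detections(detections, min_confidence=0.5):
    """Keep only detections whose confidence clears the threshold.
    Each detection is assumed to look like (confidence, (x1, y1, x2, y2))."""
    return [d for d in detections if d[0] >= min_confidence]
```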

Fig. 1 Crime and face detection model

A. Face
We are able to preprocess photographs and carry out face alignment thanks
to the dlib package's ability to localize facial landmarks, including the mouth, right
and left brows, eyes, nose, and jawline. This enables us to get superior outcomes.
After basic editing such as cropping and facial alignment, we feed the
supplied face into the recommended neural network. To train a face recognition model,
an input batch must contain an anchor picture (a recent picture of person "A"), a positive picture
(another image of person "A"), and a negative picture (any image that is not
of person "A"). With these, the neural
network calculates the face embeddings and adjusts its weights using a
method called triplet loss. This method places the embeddings of the "anchor" and "positive"
pictures relatively near to one another, while the embedding of the
"negative" image is pushed a considerable distance away. To train a

Fig. 2 Procedure of the face detection model

classifier, such as SVMs, SGD classifiers, random forests, and so on,
on top of the extracted face embeddings, a CNN (Caffe) model computes the deep features for each
and every picture that is input into the system. This is how the face
recognition pipeline is built.
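The triplet loss described above can be sketched directly on embedding vectors; the 0.2 margin is an illustrative choice, not a value from the paper:

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin): pulls the positive
    embedding toward the anchor and pushes the negative one away."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)
```

A well-separated triplet (positive close, negative far) drives the loss to zero, which is exactly the geometry the training procedure aims for.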
(1) Data Augmentation: This study proposes specific data augmentation techniques
for identifying patterns in our data and increasing the efficacy of the currently
limited amount of data available. The fundamentals of flipping, rotating,
zooming, and translating an image, as well as scaling, cropping, shifting along
the x- and y-axes, adding Gaussian noise, shearing, and skewing, have all been
incorporated into the program. An expanded representation of our dataset is
shown in Fig. 4.
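Two of the listed augmentations (flipping and x-axis translation) can be sketched on a toy image represented as nested lists; real pipelines would operate on image arrays:

```python
def horizontal_flip(image):
    """Mirror each row of a 2D image given as nested lists of pixel values."""
    return [row[::-1] for row in image]

def translate_x(image, shift, fill=0):
    """Shift pixels right by `shift` columns, padding the left edge with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in image]
```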

Fig. 3 Complete bounding boxes

Fig. 4 Expanded version of our dataset



B. Video Classification for Crime Detection

This module looks for anomalies in footage of everyday activities. The only dataset
that was utilized for this purpose was the UCF crimes dataset, which comprises
recordings of a variety of different sorts of crimes, each of which has important
and differentiating qualities. It was used for the purposes of training. There are
thirteen categories that are covered by the statistics, and some of them are as follows:
assault, arson, fighting, burglary, shoplifting, robbery, gunshot, abuse, arrest, theft,
and vandalism. There are approximately 1900 real-world examples in total.
1. Dataset Preparation and Preprocessing

In each scenario, one of three techniques is used: data conversion, enrichment,
or augmentation at the end. This study classified crimes into six types:
robbery, vandalism, normal, fighting, burglary, and abuse. Because of the length
of the provided videos, the researchers were forced to cut down on the number of
dataset classes. Video editing and cutting eliminated unnecessary and misleading
components by condensing five minutes of film down to forty-five seconds and
concentrating on the precise moment when the event took place. Low-resolution videos
were sharpened and cropped to draw attention to specific elements of a crime scene.
The elements of each criminal video are labeled individually. After that, the default
class may be assigned to the remaining video stream. Additionally, data
augmentation was achieved by expanding the sorts of data that could be used to train
a particular model, allowing a greater variety of data to be included in the
training process. An example of data augmentation may be seen in Fig. 4.
2. Residual Network (ResNet)

Rather than learning unreferenced functions, the ResNet layers are reformulated
to learn residual functions with reference to the layer inputs. The architecture,
which has a depth of 18–152 convolutional layers, includes shortcut connections
that prevent the signal from degrading as it passes from one layer to the next.
These connections let gradients flow across the network from the later layers back
to the initial layers, making training of very deep networks easier. Figure 5 shows
how the shortcut connection carries the signal past the layers from the top to the bottom of the block.
To solve the design's vanishing gradient problem, the residual network concept
and skip connections were used. As a consequence, a direct relationship to the
outcome can be built, avoiding many stages of training [18, 19].
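The residual computation y = F(x) + x can be sketched for a single vector input; `f` here stands in for the block's convolutional transformation, which is abstracted away in this illustration:

```python
def relu(vec):
    return [max(0.0, x) for x in vec]

def residual_block(x, f):
    """y = ReLU(F(x) + x): the skip connection adds the input back onto the
    transformed signal, giving gradients a direct path through the block."""
    fx = f(x)
    return relu([a + b for a, b in zip(fx, x)])
```

Because the input is added back unchanged, the block only has to learn the residual correction F(x), which is the design insight behind very deep ResNets.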
3. Technique for the Proposed Classification of Videos

Implementing the proposed criminal classification pipeline requires a loop through
each frame of the video sequence. Each frame is run through a CNN and classified
separately and independently. The algorithm labels the frame with the most
probable label before writing the output frame. Because our problem is sequential,

Fig. 5 Residual block

the previously mentioned method, which considers only one frame, will not work.
To detect criminal activity from a single video feed, the correlation between
subsequent frames must be taken into account.
The pipeline therefore calculates the prediction for each frame and keeps track of
the most recent "N" predictions. It then computes the average of these last "N"
predictions, chooses the label with the greatest probability, and presents the
outcome.
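The rolling average over the last "N" per-frame predictions can be sketched as follows; the class names and window size are illustrative:

```python
from collections import deque

class RollingPrediction:
    """Average the last N per-frame class probabilities and report the top label."""

    def __init__(self, labels, n=10):
        self.labels = labels
        self.history = deque(maxlen=n)

    def update(self, probs):
        """Add one frame's class probabilities and return the smoothed label."""
        self.history.append(probs)
        avg = [sum(p[i] for p in self.history) / len(self.history)
               for i in range(len(self.labels))]
        return self.labels[avg.index(max(avg))]
```

Averaging suppresses one-frame misclassifications, so the reported label only changes when several consecutive frames agree.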

4. Training ResNet on the UCF Crimes Dataset

The ResNet evaluation and training procedures are described in some detail here.
The first step is to locate the folders that contain the image
files used for training and validation. Also included are training
parameters such as batch size, epoch count, image height and width, and learning
rate. Train and test data may be generated with the assistance of the TensorFlow
image data generator module. During training, each epoch's activities are recorded, and
those records are used to plot results at the conclusion of training. In addition,
both the model and its weights are saved. In the training phase,
the model's stated hyper-parameters are established, including the classifier,
the epoch count, and the testing set. When training a CNN, one needs to take a
number of different hyper-parameters into consideration in order to get optimal results.
The ResNet module was given the following inputs:
• Splitting the dataset into two parts, keeping 25% of the data as test data and
utilizing the remaining data for training the model
• Total number of epochs = 50
• Loss = categorical cross-entropy
• Optimizer = stochastic gradient descent with a learning rate of 0.0001
• Metric = accuracy
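These hyper-parameters can be collected in a configuration sketch, together with the basic SGD update they imply; this is an illustrative summary, not the authors' training script:

```python
# Hyper-parameter values taken from the text; the dict layout is illustrative.
CONFIG = {
    "test_split": 0.25,                   # 25% held out as test data
    "epochs": 50,
    "loss": "categorical_crossentropy",
    "optimizer": "sgd",
    "learning_rate": 0.0001,
    "metric": "accuracy",
}

def sgd_step(weights, grads, lr=CONFIG["learning_rate"]):
    """One stochastic gradient descent update: w <- w - lr * grad."""
    return [w - lr * g for w, g in zip(weights, grads)]
```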

Fig. 6 Face recognition for person A (simulating a CCTV)

The following criteria form the basis for the final evaluation and result:
• False Positive: The model predicts a crime, but no crime actually happens at the site.
• False Negative: A crime occurs, but the model fails to detect the violation.
• True Positive: The model correctly identifies the criminal activity.
• True Negative: No crime occurs, and the model does not report one.
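The four outcomes can be sketched as a simple mapping from (ground truth, prediction) pairs:

```python
def outcome(actual_crime, predicted_crime):
    """Map a (ground truth, prediction) pair to TP/FP/TN/FN."""
    if actual_crime and predicted_crime:
        return "TP"
    if not actual_crime and predicted_crime:
        return "FP"
    if actual_crime and not predicted_crime:
        return "FN"
    return "TN"
```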

4 Results

4.1 Face Recognition

Fifteen photographs of each individual were used to train three different faces for face
recognition. When simulating a CCTV feed with a camera, the authors decided to utilize
the least amount of data possible for this module; the restricted budget and capabilities
of the authors' computers led to this decision. The outcomes
are shown in Figs. 6 and 7. The system was capable of producing bounding
boxes and confidence ratings for each face, as well as effectively classifying the faces.

5 Crime Detection

The results of testing the model over fifty epochs are shown in Tables 1 and 2. Both the accuracy and the loss during training and validation were recorded. The authors applied the trained model to a continuous stream of a YouTube video containing illegal behavior such as "fighting," "property destruction," and "abuse." The model was able to accurately predict the occurrences in the stream (Table 3).

426 M. Sharma and L. Sehgal

Fig. 7 Face recognition for person B (simulating a CCTV)

6 Conclusion and Future Works

Through the work carried out in this project, the face detection module was able to effectively reproduce a face detection model and present a prototype of the eventual product. After the most recent data was acquired, the crime detection approach was used to train the ResNet network, which proved effective across a number of classes, and the training yielded positive outcomes. In the future, the facial recognition model could be improved by adding more facial classes, among other things. Furthermore, it would be worthwhile to compile a database of such features so that a specific victim could be instantly identified from a live CCTV camera feed. Experimenting with various facial recognition methods and analyzing the results would also be of interest. The development of a face detection algorithm that is effective regardless of skin tone and less prone to errors and failures could further improve the system. A broader set of characteristics would also eliminate the possibility of deceiving the system by changing facial features. As various tools for crime prevention and prediction come into use, the role of law enforcement organizations may change. Combining machine learning and computer vision can significantly improve law enforcement organizations' overall effectiveness: by combining computer vision and machine learning with security equipment, a machine will be able to accurately predict future crimes without the need for human intervention. Building a system that can anticipate and predict urban crime hotspots is one option for such automation.

Table 1 A comprehensive overview of the background work


References Methods Observation
[1] Decision tree: They contrasted the ZeroR model with the J48 naïve Bayesian model
[9] KNN (K = 5): They demonstrate that the GBWKNN filling technique, when combined with the KNN classification algorithm, can result in a higher level of accuracy
[10] KNN (K = 10): They first separated the data into critical and non-critical categories, then compared the two sets against five different classification systems. They found that neural networks, naïve Bayesian models, and KNN models forecast outcomes more accurately than decision trees and SVM models
[4] Naïve Bayes classifier: For crime analysis and prediction, they used a cutting-edge criminal detection naïve Bayes algorithm
[3] Decision tree: This study compared the accuracy of naïve Bayesian, decision tree, and KNN algorithms in crime detection
[12] Autoregressive integrated moving average (ARIMA): This article examines the shortcomings of fuzzy cognitive maps for making predictions about time series; ARIMA makes use of autocorrelation parameters in its calculations
[15] Regression model: They try to forecast crimes 30 days in advance; the experiment is conducted in Pittsburgh
[2] SVM: They compare several models to determine which is the best option for forecasting hotspots
[16] Random forest regressor: They used 80% of the data to train the model and the remaining 20% to test it; as a consequence, the model earned a score of 90%. The data was split into two sections and used in different ways
[17] Object detection techniques using Fast/Faster R-CNN and YOLO: Their method can incrementally increase the mean average precision (mAP) in an object detection task from 0.6702 to 0.6764
[11] Fast R-CNN: The detection approach delivers accuracy on par with state-of-the-art standards on a GPU despite making only 300 proposals per picture and running at a frame rate of five frames per second (covering each and every step)
[18] Convolutional neural networks (CNNs): Achieves a mean average precision (mAP) of 53.3% while boosting mAP by more than 30% compared with the previous result
[19] A robust model for word recognition based on bag-of-words: At 10,097 different sites, selective search yielded a recall rate of 99% and a mean average best overlap of 0.8779
(continued)

[20] Convolutional neural networks on a massive scale for deep learning in image recognition: An investigation of networks of increasing depth using a relatively small (3 × 3) convolution filter design revealed that raising the depth to 16–19 weight layers can enhance current setups
[21] Convolutional neural networks: When applied to VGA-resolution images, it achieves a detection speed of 100 frames per second (FPS) on a GPU and 14 FPS on a single CPU core. These findings are based on comparisons with two publicly available face detection benchmarks

Table 2 Results for the different classes

Classes     Precision   Recall   F1-score   Support
Abuse       0.96        0.94     0.96       237
Assault     0.99        0.52     0.69       59
Fighting    0.96        0.97     0.93       394
Normal      1.00        1.03     1.04       1010
Robbery     0.97        0.97     0.96       335
Vandalism   0.97        1.02     0.98       173

Table 3 Accuracy metrics

Metric         Precision   Recall   F1-score   Support
Accuracy       0.93        0.96     0.96       2199
Macro-avg      0.95        0.97     0.99       2199
Weighted avg   0.93        0.96     0.97       2199

References

1. Obuandike GN, Isah A, Alhasan J (2015) Analytical study of some selected classification
algorithms in WEKA using real crime data. Int J Adv Res Artif Intell 4(12):44–48
2. Gorr W, Olligschlaeger A, Thompson Y (2000) Assessment of crime forecasting accuracy for
deployment of police. Int J Forecast 743–754
3. Iqbal R, Murad MAA, Mustapha A, Panahy PHS, Khanahmadliravi N (2013) An experimental
study of classification algorithms for crime prediction. Indian J Sci Technol 6(3):4219–4225
4. Jangra M, Kalsi S (2019) Crime analysis for multistate network using naive Bayes classifier.
Int J Comput Sci Mob Comput 8(6):134–143
5. Strom KJ, Smith EL (2017) The future of crime data: the case for the national incident-based
reporting system (NIBRS) as a primary data source for policy evaluation and crime analysis.
Criminol Public Policy 16(4):1027–1048
6. Mena J (2016) Machine learning forensics for law enforcement, security, and intelligence. CRC
Press

7. Chen P, Yuan H, Shu X (2008) Forecasting crime using the ARIMA model. In: 2008 fifth
international conference on fuzzy systems and knowledge discovery, vol 5. IEEE, pp 627–630
8. Rani A, Rajasree S (2014) Crime trend analysis and prediction using mahanolobis distance and
dynamic time warping technique. Int J Comput Sci Inf Technol 5(3):4131–4135
9. Sun CC, Yao CL, Li X, Lee K (2014) Detecting crime types using classification algorithms. J
Digit Inf Manag 12(8):321–327. https://doi.org/10.14400/JDC.2014.12.8.321
10. Shojaee S, Mustapha A, Sidi F, Jabar MA (2013) A study on classification learning algorithms
to predict crime status. Int J Digital Content Technol Appl 7(9):361–369
11. Hao L, Jiang F (2018) A new facial detection model based on the faster R-CNN. In: IOP
conference series: materials science and engineering, vol 439, no 3. IOP Publishing, p 032117
12. Wibowo AH, Oesman TI (2020) The comparative analysis on the accuracy of k-NN, naive
bayes, and decision tree Algorithms in predicting crimes and criminal actions in Sleman
Regency. J Phys Conf Ser 1450(1):012076
13. Kang HW, Kang HB (2017) Prediction of crime occurrence from multi-modal data using deep
learning. PLoS ONE 12(4):e0176244
14. Hossain S, Abtahee A, Kashem I, Hoque MM, Sarker IH (2020) Crime prediction using spatio-
temporal data. In: International conference on computing science, communication and security.
Springer, Singapore, pp 277–289
15. Vanhoenshoven F, Nápoles G, Bielen S, Vanhoof K (2017) Fuzzy cognitive maps employing
ARIMA components for time series forecasting. In: International conference on intelligent
decision technologies. Springer, Cham, pp 255–264
16. Yu CH, Ward MW, Morabito M, Ding W (2011) Crime forecasting using data mining tech-
niques. In: 2011 IEEE 11th international conference on data mining workshops. IEEE, pp
779–786
17. Alves LG, Ribeiro HV, Rodrigues FA (2018) Crime prediction through urban metrics and
statistical learning. Physica A 505:435–443
18. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with
region proposal networks. Adv Neural Inf Process Syst 28
19. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object
detection and semantic segmentation. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 580–587
20. Uijlings JR, Van De Sande KE, Gevers T, Smeulders AW (2013) Selective search for object
recognition. Int J Comput Vis 104(2):154–171
21. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. CoRR, abs/1409.1556
Comparing Techniques for Digital
Handwritten Detection Using CNN
and SVM Model

M. Arvindhan, Shubham Upadhyay, Avdeep Malik, Sudeshna Chakraborty,


and Kimmi Gupta

Abstract The level of human dependence on machines has never been higher, affecting everything from object identification to adding sound to silent movies and images using deep learning and machine learning techniques. In a similar vein, handwritten text recognition is a substantial field of R&D with several possible applications. Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the computer's ability to read and comprehend legible handwritten input from sources such as paper documents, pictures, touchscreens, and other devices. In this piece, we carried out a procedure for recognizing handwritten digits. Using the MNIST dataset, networks are modelled with convolutional neural networks (CNN), Support Vector Machines (SVM), and multilayered perceptrons (MLP). To find the most efficient digit recognition model, it is important to assess how accurately and quickly each of these models performs.

Keywords Support Vector Machine (SVM) · Multilayered perceptron (MLP) ·
Convolutional neural network (CNN) · Handwritten digit recognition · Deep
learning · Machine learning · MNIST dataset

1 Introduction

Handwritten digit recognition is the ability of a computer to recognize the digits in handwritten text: to identify human-written numerals from sources such as images, documents, and touchscreens, and to group them into the ten predefined categories (0–9). Deep learning research on this subject has a long history and is far from over. Digit recognition has numerous uses, such as sorting mail and handling bank checks [1]. There are numerous

M. Arvindhan (B) · S. Upadhyay · A. Malik · S. Chakraborty · K. Gupta


School of Computing Science and Engineering, Galgotias University, Greater Noida, Uttar
Pradesh 203201, India
e-mail: saroarvindmster@gmail.com
K. Gupta
e-mail: kimmi.gupta@galgotiasuniversity.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 431
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_33
432 M. Arvindhan et al.

difficulties with handwritten digit recognition: owing to the widely varying writing styles of different authors, it is not a simple optical character recognition task. This research provides a thorough comparison between various artificial intelligence and machine learning approaches to recognizing handwritten digits. In this case, we used a convolutional neural network, Support Vector Machines, and multilayer perceptrons. Plots and charts made with the data visualization tool matplotlib back up the comparisons of these algorithms' accuracy, error rate, and testing/training time.
Accuracy matters because it determines how well models make decisions; low-accuracy models are useless in practice. Automated bank check processing systems that read check amounts and dates need high accuracy, as a misidentified digit could do catastrophic damage. Real-world applications therefore need high-accuracy algorithms. We compare accuracy in order to find the most exact and error-free solution for handwritten digit detection applications.
This paper provides a good understanding of SVM, CNN, and MLP for recognizing handwritten digits and indicates the best algorithm for digit recognition. In the next sections of this study, we discuss pertinent research on the three algorithms, followed by how each approach is implemented and utilized. Our study's results and conclusions follow, along with a few future recommendations. The last section contains this paper's references and citations.

2 Related Work

Significant effort has been invested in studying how machines can be made to behave more like humans via AI, ML, and deep learning. The sophistication of machines is increasing over time; from performing simple math operations to retinal identification, they have improved the safety and control of human lives. As with facial recognition, deep learning and ML are essential applications that aid in the identification of fakes in handwritten text recognition. Many studies have already been conducted that include a thorough analysis and application of numerous well-known algorithms. For instance, [2] provided a comparison of Support Vector Machines, convolutional neural networks, and K-Nearest Neighbour and found that the highest accuracy, 98.72%, was achieved by a convolutional neural network, while the RFC method yielded the lowest. The study's authors concluded that among the tested classifiers, MLP had the most reliable results with the smallest margin of error compared to SVM. In turn, [3] conducted a thorough comparison of SVM, KNN, and MLP models for digit classification, finding that both KNN and SVM are 99.26% accurate across all classes of the dataset, whereas with MLP the digit 9 becomes a little more difficult to categorize; the authors recommend utilizing CNN and Keras to enhance the classification of the number 9. Finally, [4] concentrated on contrasting machine learning approaches for classification.
A convolution can be described as "looking at a function's surroundings to produce an accurate prediction of its outcome," which is what we need in order to train a
Comparing Techniques for Digital Handwritten Detection Using CNN … 433

large neural network to achieve. Convolutional neural networks have been utilized by [5, 6] to recognize handwritten digits from the MNIST dataset. In order to determine and then compare precision over various epochs, [5] used a seven-layered CNN model with backpropagation, gradient descent, and five hidden layers, yielding a maximum accuracy of 99.2%. A brief discussion of CNN's various components and its development, along with comparisons between models from LeNet-5 to SENet, including AlexNet, DenseNet, and ResNet, may be read in [6]. LeNet-5 and LeNet-5 (with distortions) obtained test error rates of 0.95% and 0.8%, respectively, on the MNIST dataset. The design and precision of AlexNet are similar to those of LeNet-5, but it has a much larger set of parameters (about 4,096,000), and the ILSVRC-2017 winner, the "Squeeze-and-Excitation network" (SENet), lowered the top-five error rate to 2.25%.

3 Methodology

SVM, MLP, and CNN are contrasted feature by feature. Considerations for each algorithm include dataset size, iterations, accuracy, and the hardware (e.g., Windows 10 LTS, i5 7th-generation CPU) required for optimal performance.

3.1 Dataset

The field of handwritten character recognition has seen a great deal of research and has already yielded numerous implementation strategies, including important learning datasets, well-known algorithms, and methods for feature extraction and feature scaling. The MNIST dataset was constructed from NIST Special Databases 1 and 3, whose digits were written by high school students and by staff of the US Census Bureau, respectively. MNIST contains seventy thousand pictures in total (sixty thousand for training and ten thousand for testing), each fit into a bounding box of 28 × 28 pixels with anti-aliasing. Each of these images has an associated Y value that identifies the digit (Figs. 1 and 2).
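These counts and shapes can be reproduced with Keras's built-in MNIST loader (a generic sketch, not the authors' code; the first call downloads the data and caches it locally):

```python
from tensorflow.keras.datasets import mnist

# Downloads the dataset on first use, then caches it under ~/.keras/datasets.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 60,000 training and 10,000 test images, each a 28 x 28 grayscale array,
# with integer labels 0-9.
print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)
```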

3.2 Support Vector Machine

SVMs are increasingly useful for supervised machine learning. Data points are plotted in an n-dimensional space, one dimension per feature, with each coordinate representing a feature's value. The hyperplane that best separates the two groups is then found and the classification is completed. SVM chooses extreme vectors to construct a robust classification hyperplane; these extreme cases are called support vectors, and hence the technique is named Support Vector Machine. Both linear and nonlinear SVMs exist (Fig. 3).

Fig. 1 Bar graph illustrating the MNIST handwritten digit training dataset (label vs total number of training samples)

Fig. 2 Plotting of some random MNIST handwritten digits

3.3 Multilayered Perceptron

MLPs are a kind of artificial neural network (ANN). An MLP has three types of layer: input, hidden, and output. Every layer is composed of nodes, also called neurons, and each node is linked to the nodes of the next layer. A basic multilayered perceptron has three layers, but any number of hidden layers can be added as needed. The number of dataset attributes determines the input layer's node count, and the number of visible classes determines the output layer's. Because there is no fixed rule, the number of hidden layers and nodes is set experimentally. Each of the model's hidden layers can have its own activation function. The MLP is trained with backpropagation, a supervised learning method. Every node connection in the MLP has a weight that is adjusted during the model's training [7] (Fig. 4).

Fig. 3 This image describes the working mechanism of SVM classification with supporting vectors and hyperplanes

Fig. 4 This figure illustrates the basic architecture of the multilayer perceptron with variable specification of the network

Fig. 5 This figure shows the architectural of CNN layers in the form of a flowchart

3.4 Convolutional Neural Network

Convolutional neural networks classify images using deep learning. This class of deep neural network requires minimal preprocessing. Instead of inputting an image pixel by pixel, it processes the image in chunks, which helps the network recognize subtle patterns such as edges. A CNN contains input nodes, an output layer, and hidden convolutional, pooling, fully connected, and normalizing layers [8]. The weighted arrays of CNN filters (kernels) extract image features, and the activation functions of CNN layers are non-linear [9]. The spatial dimensions of a CNN shrink as the number of channels rises, and the resulting column matrix is projected to the output in the last stage [10] (Fig. 5).

3.5 Visualization

In this study, deep learning and ML algorithms are compared on MNIST (a dataset of handwritten digits) in terms of runtime, complexity, average accuracy, epochs, and hidden layers. The matplotlib package provides the clearest depictions of how the algorithms actually perform the task of recognizing the digits, and we were able to depict the data gathered from the deep analysis of the algorithms using bar graphs and tabular charts. Graphs are provided at each crucial stage of the programs to give a visual representation of each step and support the conclusion.

4 Implementation

For each method, we used one of three classifiers to evaluate its performance across the following criteria: accuracy, speed, complexity, and the number of iterations used for deep learning. The classification strategies are the multilayer perceptron, the convolutional neural network, and the Support Vector Machine. The implementation of each method is detailed below in order to develop a flow for this analysis that enables an accurate and seamless comparison.

4.1 Preprocessing

Preprocessing improves the input data by eliminating contaminants and redundancy. We reshaped all input data: each image becomes an array of shape (28, 28, 1). The pixel values range from 0 to 255, so the dataset was divided by 255.0 to create "float32" input features in the range 0.0–1.0. Then each of the y values was one-hot encoded as a 10-element binary vector; for example, the label 4 is represented as the array [0, 0, 0, 0, 1, 0, 0, 0, 0, 0].
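The steps above can be sketched in plain NumPy; the toy batch below stands in for the real images, and `keras.utils.to_categorical` would produce the same one-hot encoding:

```python
import numpy as np

# Toy batch standing in for MNIST images (uint8 values 0-255) and labels.
x = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)
y = np.array([3, 0, 4, 9, 1])

# Reshape to (n, 28, 28, 1) and scale into float32 values in [0.0, 1.0].
x = x.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# One-hot encode each label as a 10-element binary vector;
# row k of the identity matrix is the encoding of digit k.
y_onehot = np.eye(10, dtype="float32")[y]
```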

4.2 Support Vector Machine

Sample vectors may be either sparse (any scipy.sparse matrix) or dense (numpy.ndarray, or anything convertible to numpy.ndarray) when used as inputs to the SVMs in scikit-learn [11]. Scikit-learn has several classes that can carry out multi-class classification on a dataset, including SVC, NuSVC, and LinearSVC. In this research, we utilized LinearSVC to classify the MNIST dataset using a linear kernel implemented with LIBLINEAR [12].
Sklearn, seaborn, NumPy, matplotlib, and pandas are a few of the libraries that were used throughout development. After obtaining the MNIST datasets in CSV format, we import them into pandas and analyze the data.
After that, features were normalized and scaled; individual samples were then plotted and converted into a matrix. Finally, to assess the accuracy of the model, we constructed a linear SVM and used a confusion matrix [13].
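A sketch of this pipeline might look as follows. To keep it self-contained, scikit-learn's small built-in 8 × 8 digits set stands in for the MNIST CSV files (which would normally be read with pandas):

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Built-in 8x8 digits stand in for the MNIST CSV files.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Normalize/scale the features, then fit a linear-kernel SVM
# (LinearSVC wraps the LIBLINEAR solver).
scaler = StandardScaler().fit(X_train)
clf = LinearSVC(dual=False).fit(scaler.transform(X_train), y_train)

# Assess the model with a confusion matrix and accuracy score.
pred = clf.predict(scaler.transform(X_test))
cm = confusion_matrix(y_test, pred)
acc = accuracy_score(y_test, pred)
```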

4.3 Multilayered Perceptron

With the aid of the Keras module, a feedforward artificial neural network, also known as a multilayer perceptron [14], is used to implement the recognition of handwritten digits. The MLP model is created with the Sequential class, and the associated hidden layers are added, each with its own activation function, using a 28 × 28 pixel grid image as input. Following the construction of a sequential model, we added dense layers with adjustable parameters and dropout layers, as seen in the block diagram below (Fig. 6). Once the test and training sets of data are ready, the neural network can be trained in Keras.

Fig. 6 Sequential block diagram of multi-layers perceptron model built with the help of Keras module

We employed a neural network with four hidden layers and a ten-unit output layer (the total number of labels). The unit count of the hidden layers is kept at 512. The 28 × 28 image is turned into a 784-dimensional array as the network's input. The network was built using the sequential model. Since we utilized dense layers to generate a feedforward network, all of the neurons in each layer are directly connected to those in the layer before it.
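In Keras, the network described here (a 784-dimensional input, four 512-unit hidden layers, and a ten-unit softmax output) can be sketched as below; the dropout rate and optimizer are assumptions, since the text does not state them:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Four hidden layers of 512 units each; the 0.2 dropout rate is assumed.
model = keras.Sequential([
    keras.Input(shape=(784,)),               # flattened 28 x 28 image
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(10, activation="softmax"),  # one unit per digit class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```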

4.4 Convolutional Neural Network

Utilizing Keras, a convolutional neural network [15] is used to accomplish handwritten digit recognition. This open-source neural network library is used for designing and putting deep learning models into practice.
We built the model incrementally using the Sequential class in Keras. The input picture is 28 pixels tall, 28 pixels wide, and 1 channel deep. Then, following [16], we built a model with a convolution layer at its core. To extract features from the input data, this layer convolves a matrix over the input's height and width.
This matrix is called a kernel or filter, and its values represent the filter's weights. We applied 32 filters of dimensions (3, 3) with a stride of one; the stride is the number of pixels by which the filter shifts.
By using the equation

((N + 2P − F)/S) + 1,

where N is the dimension of the input image, P is the padding, F is the size of the filter, and S is the stride, we can compute the dimension of the resulting activation maps. For example, a 28 × 28 input with a 3 × 3 filter, no padding, and stride 1 yields ((28 + 0 − 3)/1) + 1 = 26. The number of filters determines the depth (number of channels) of the output.
We used the ReLU activation function [17] to boost the nonlinearity. For the next convolutional layer, we again use the ReLU function, with 64 filters of identical dimensions (3, 3) and stride 1.

Fig. 7 Detailed architecture of convolutional neural network with apt specification of each layer

We then applied a pooling layer with a pool size of (2, 2) and a stride of 2, so that adjacent pixels are merged. To avoid overfitting and make the model more compact, we utilize the dropout layer [18], which randomly removes some of the neurons from the network; at this point, a node has a 25% probability of being removed. After that, we used the flatten layer, which turns our two-dimensional data into a one-dimensional vector. The following dense layer is made up of 128 neurons with a dropout rate of 0.5 (50%). After the ReLU activation function has been applied, the output is delivered to the model's last layer, the output layer. The ten neurons in this layer represent the classes (digits 0–9), and classification is performed with the help of the SoftMax function [19], which returns a probability distribution over the ten classes. The output is the class with the highest probability (Fig. 7).
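Putting the layers described above together, a Keras sketch of the full model might read as follows (the optimizer is an assumption; the text does not name one):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),  # -> 26 x 26 x 32
    layers.Conv2D(64, (3, 3), activation="relu"),  # -> 24 x 24 x 64
    layers.MaxPooling2D(pool_size=(2, 2)),         # -> 12 x 12 x 64
    layers.Dropout(0.25),                          # drop 25% of the nodes
    layers.Flatten(),                              # -> 9216-element vector
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                           # drop 50% of the nodes
    layers.Dense(10, activation="softmax"),        # class probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The per-layer shapes in the comments follow the ((N + 2P − F)/S) + 1 formula above.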

5 Result

With the aid of experimental graphs, we examined the accuracy and execution times of SVM, MLP, and CNN after implementing each method. The training and testing accuracy of all of the models mentioned above was taken into consideration. Having run all the models, we found that the Support Vector Machine has the best accuracy on the training data and CNN has the best accuracy on the testing data. Additionally, in order to better understand how the algorithms function, we compared their execution times. An algorithm's running time typically varies with the number of operations it performs. To get the required outcome, we trained the Support Vector Machine models with standard settings and used up to 30 epochs to build our deep learning models. In terms of running time, the Support Vector Machine was faster than CNN.
In Table 1, we can see how each model performed overall. It has four columns: the first gives the model's name, the second and third give the accuracy rates during training and testing, and the fourth lists the model's execution time (Table 1 and Figs. 8 and 9).

Table 1 Comparison analysis of different models

Model   Training rate (%)   Test (%)   Execution time (min)
SVM     99.98               94.005     1:35
MLP     99.92               98.85      2:32
CNN     99.53               99.31      44:02

Fig. 8 Bar chart contrasting accuracy (SVM, 99.98% in training, 93.77% in testing; MLP, 99.2% in training, 98.15% in testing; CNN, 99.53% in training, 99.31% in testing)
We also visualized the performance metrics of the deep learning models to show how accuracy improved and error rates decreased with the number of epochs. The purpose of sketching these graphs is to identify where early stopping should be applied in order to prevent overfitting, since the change in accuracy becomes negligible after a certain number of epochs (Figs. 10, 11 and 12).

Fig. 9 Graph contrasting the times it takes to run SVM, MLP, and CNN (SVM: 1.58 min, MLP:
2.53 min, CNN: 44.05 min)

Fig. 10 Loss rate vs. epochs is a graph depicting how the training loss in multilayer perceptron
changes over time

Fig. 11 Accuracy against epoch count plot for a multilayer perceptron shows how training results
improve with time

Fig. 12 Loss rate against number of epochs is a graph depicting how the training loss of a CNN
changes over time

6 Conclusion

Using the MNIST dataset, we created three deep learning and machine learning-based models for handwritten digit recognition in this study. To determine which model was the most accurate, we compared them based on their individual properties. The greatest training accuracy rate is seen with Support Vector Machines, which are among the simplest classifiers; however, compared with more advanced algorithms like MLPs and CNNs, SVMs struggle to correctly classify complex and ambiguous pictures. Based on our findings, we can confidently say that CNN is the most effective method for recognizing handwritten digits, and indeed the most effective strategy for visual prediction problems in general. When comparing the algorithms' execution times, we also found that beyond a certain threshold the model starts overfitting the dataset and giving biased predictions; extending the number of epochs without modifying the algorithm's setup is therefore useless.

7 Future Enhancement

The potential for further development of applications based on deep learning and ML is almost boundless. In the future, we may research a dense or mixed algorithm with a wider variety of data than the current collection of algorithms in order to solve several issues at once. Future development will allow us to build high-level applications that can be utilized by both the general public and government entities. For example, these algorithms may be used in healthcare settings for accurate diagnosis, treatment, patient monitoring, and even surveillance. Deep learning and application-based artificial intelligence are the technologies of the future because of their unmatched precision and advantages on numerous significant issues.

References

1. What can a digit recognizer be used for? Available at: https://www.quora.com/Whatcanadigitrecognizer
2. Dutt A, Dutt A, Handwritten digit recognition using deep learning
3. Hamid NBA, Sharif NNBA, Handwritten recognition using SVM, KNN, and neural networks
4. Wang H, Zhou Z, Li Y, Chen Z, Lu P, Wang W, Liu W, Yu L, Comparison of machine learning approaches for classifying mediastinal lymph node metastases of non-small cell lung cancer from 18F-FDG PET/CT images
5. Siddique F, Sakib S, Siddique MAB, Recognition of handwritten digit using convolutional neural network in Python with TensorFlow and comparison of performance for various hidden layers
6. Sultana F, Sufian A, Dutta P, Advancements in image classification using convolutional neural network
7. Neural networks crash course. Available at: https://machinelearningmastery.com/neural-networks-crash-course/
8. An introduction to convolutional neural networks: an overview, p 12. Available at: researchgate.net/publication/285164623
9. Nwankpa CE, Ijomah W, Gachagan A, Marshall S, Activation functions: comparison of trends in practice and research for deep learning. Available at: arxiv.org/pdf/1811.03378.pdf
10. Basic overview of convolutional neural network (CNN). Available at: https://medium.com/dataseries/basic-overview-of-convolutional-neuralnetwork-cnn-4fcc7dbb4f17
11. Support Vector Machines, scikit-learn user guide. Available at: https://scikit-learn.org/stable/modules/svm.html
12. LIBSVM, Wikipedia
13. GitHub user rishikakushwah16, SVM digit recognition with the MNIST dataset
14. Atlas/handwritten digit recognition using https://github.com/FlameMLP/blob/master/ANN.py
15. Handwritten-digit-recognition. Available at: https://github.com/dixitritik17
16. How do convolutional layers work in deep learning neural networks? Available at: https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/
17. Rectified linear activation function for deep learning neural networks. Available at: machinelearningmastery.com
18. Dropout in deep machine learning. Available at: https://medium.com/@amarbudhiraja/https-medium-comamarbudhiraja-learning-less-to-learn-better-dropout-in-deep-machinelearning-74334da4bfc5
19. Understand the softmax function in minutes. Available at: https://medium.com/data-science-bootcamp/understand-the-softmaxfunction-in-minutes-f3a59641e86d
20. Handwriting recognition, Wikipedia. Available at: en.wikipedia.org/wiki/Handwriting_recognition
21. Shamim SM, Miah MBA, Sarker A, Rana M, Al Jobair A, Handwritten digit recognition using machine learning algorithms
22. Support Vector Machine algorithm. Available at: www.javatpoint.com/machine-learning/support-vector-machine-algorithm
23. Image classification using a feedforward neural network in Keras. Available at: https://www.learnopencv.com/
24. Pooling layers for convolutional neural networks, Machine Learning Mastery
25. Basic overview of convolutional neural networks. Available at: https://medium.com/dataseries/basic-overview-of-cnn-4fcc7dbb4f17
Optimized Text Summarization Using
Abstraction and Extraction

Harshita Patel, Pallavi Mishra, Shubham Agarwal, Aanchal Patel,
and Stuti Hegde

Abstract In today’s world, a huge amount of digital data is generated every day.
Text is one of the primary and most important forms of digital data and has seen a lot
of evolution in the last decade. It contains a lot of important and sensitive information
that needs to be efficiently summarized so as to extract meaningful information from
it. Text summarization refers to the method of developing a concise version of a text
document that preserves the important information and the overall meaning of the
source data. Automated summarization of text has emerged as a crucial method for
efficiently identifying pertinent information from lengthy texts within a brief time-
frame and with minimal exertion. There are, essentially, two kinds of procedures for
text summarization, namely extractive and abstractive. The objective of this paper
is to develop a method that integrates these two approaches in order to assess the
effectiveness of the resulting model. Additionally, the aim is to provide a compre-
hensive and integrated review of both extractive and abstractive approaches to text
summarization by means of a comparative empirical analysis.

Keywords Digital data · Text summarization · Natural language processing ·
Extractive summary · Abstractive summary

H. Patel · P. Mishra (B) · S. Agarwal · A. Patel · S. Hegde
School of Information Technology, VIT University, Vellore, India
e-mail: pallavi.mishra@vit.ac.in
H. Patel
e-mail: harshita.patel@vit.ac.in
S. Agarwal
e-mail: shubham.agarwl2019@vitstudent.ac.in
A. Patel
e-mail: aanchal.patel2020@vitstudent.ac.in
S. Hegde
e-mail: stuti.hegde2020@vitstudent.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 445
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_34
446 H. Patel et al.

1 Introduction

The amount of information on the internet is ever-growing in this fast-paced techno-
logical era, facilitated by various computational devices [1–3]. This information is available in
multiple formats, including texts, codes, photos, audio, and video. Finding informa-
tion that is pertinent to a person’s interest has become challenging as a result. When a
user searches for information online, they are likely to find thousands of results, many
of which may not be relevant to their query. This makes it more difficult to find the
necessary information and to understand it. Our goal is to handle this issue by devel-
oping an automatic text summarization tool that will simplify the process of easily
accessing the necessary information. The two fundamental methods which are used
for the process are extractive and abstractive summarization. Abstractive summaries
are created by paraphrasing the sentences, whereas summaries that are extractive are
created by extracting complete sentences from the text. Abstractive summarization
involves initially generating an interpretation of a text document through machine
learning or deep learning techniques. Using the interpretations, the machine makes
a prediction for a summary. Accordingly, it rephrases the sections of the original
document to create it further. Extractive summarization entails creating a summary
of a given text by choosing a subset of sentences from the original document. The
selection is, essentially, based on a score calculated from the words used in each
sentence, with the most important phrases or sentences chosen for inclusion in the
summary. A summary is, essentially, a condensed version
of one or more texts that eliminates superfluous words while still communicating
the essential ideas from the source material. Ideally, a summary should be between
five and ten percent of the original text without losing its meaning. The most
significant advantage of adopting a summary is that it saves time by shortening the
reading process. One of the primary advantages of this study for society is that it can
help users to effectively and quickly get the context of the data without wasting much
of their time on unnecessary and redundant content. The evaluation
of automated text summarization can be done in several ways, such as accuracy
and precision, number of repetitions, semantics of the text, linguistic characteristics,
and length of the summary. In this work, text summarization is evaluated precisely
based on the length and accuracy of the summary. Text summarization has numerous
applications, including in the fields of academia, research, medicine, journalism,
and literature, especially when analyzing a large number of papers manually. The
primary contribution of our work is integrating the two approaches, namely, abstrac-
tive and extractive which, to the best of our knowledge, has not been done before
to enhance the effectiveness of the automated text summarization. The experiments
are conducted to comparatively evaluate the efficiency of the model with other pre-
existing models of the literature. Based on our experiments, we have determined that
our model generates a summary of a document that is more precise and succinct.
We can extend our approach by considering the files of various formats as well. The
paper is organized as follows. Section 1 deals with the introduction of the automated
text summarization. Section 2 provides the literature survey of the recent works
Optimized Text Summarization Using Abstraction and Extraction 447

carried out to automate the process of text summarization effectively. The proposed
integrated methodology together with extensive detailed algorithm for the automated
text summarization is given in Sect. 3. After this, the experimental results and discus-
sion are provided in Sect. 4 which is followed by conclusion and the scope of future
work in Sect. 5.

2 Literature Survey

In this section, we shall discuss the detailed survey of the previous studies that have
been conducted for automated text summarization.
The article [2] introduces a range of statistical, machine learning, and deep
learning algorithms to implement both extractive and abstractive summarization
techniques. The paper [3] presents a detailed analysis of the results obtained for
text summarization, along with the processes and methods commonly used by
researchers for comparison and further development of their methods. Additionally,
the paper highlights various opportunities, limitations and shortcomings pertaining
to the research of text summarization and provides recommendations based on them.
Authors of [4] also have extensively covered the two primary methods of summa-
rization, namely extractive and abstractive, along with various linguistic techniques
used for summarization, ranging from structured to unstructured. While research
has been conducted in Indian languages such as Hindi, Punjabi, Bengali, Kannada,
Malayalam, Telugu, and Tamil, it is currently in its early stages and mainly focuses on
extractive methods. Thus, this paper offers a comprehensive overview of the present
scenario of research in the text summarization.
A study has been conducted in paper [5] that investigates different techniques
for natural language processing-based text summarization. The research involves
analyzing the linguistic and statistical characteristics of sentences to calculate their
implications and explores both extractive and abstractive methods for generating
summaries. The study specifically looks into techniques that produce summaries
with minimal repetition and maximum concision. Moreover, the paper evaluates
various approaches using Intrinsic and Extrinsic Evaluations for Automated Text
Summarization. According to the author’s findings of [6], reversing the word order
of source sentences, but not target sentences, led to a huge improvement in the
performance of LSTM.
The objective of paraphrase generation is to enhance the comprehensibility of a
sentence by expressing it in different words while retaining its original meaning. To
achieve higher quality in the generation of paraphrases, the whitepaper [7] presents
a prototype which integrates the strengths of two models, namely, transformer
employing GRU-RNN and sequence-to-sequence (seq2seq) models. The current arti-
ficial neural network (ANN) models for sentence classification often fail to consider
the surrounding context while classifying individual sentences. To overcome this
limitation, the authors of paper [8] have proposed a novel artificial neural network
architecture that combines the benefits of both typical ANN models for individual
sentence classification and structured prediction to effectively classify sentences in
their context. The paper [9] also presents various techniques and datasets for auto-
matic text summarization in which the focus is mainly on neural networks and deep
learning techniques. Additionally, the paper considers a word processing method that
reduces the number of keystrokes needed for typing and generates a summary.
The paper [10] introduces a new neural network model called the RNN encoder–
decoder, comprising two recurrent neural networks (RNNs). Qualitative analysis
reveals that the proposed model is capable of learning a meaningful representation of
linguistic phrases that captures both syntactic and semantic nuances. Additionally,
the model employs other advanced techniques as well. The work in neural networks
goes on, and the authors of the study [11] introduce an enhancement to the well-
known Long Short-Term Memory (LSTM) technique, by incorporating common
gating between the present input and previous output. This approach allows for a
more comprehensive modeling of the interactions between inputs and their context.
Essentially, this mechanism makes the LSTM’s transition function dependent on its
context, thereby broadening the scope of interactions between the two.
In the review [12], various summarization processes are reviewed with a focus on
showcasing their effectiveness and shortcomings. In the review, the techniques and
methods are tested using high-volume text datasets.
In paper [13], the concept of text summarization is said to be divided into two
groups: indicative and informative. The former presents the primary idea of the text in
a concise manner, typically within 5% of the original text. The latter, on the other hand,
provides a brief summary of the main text, usually within 20% of the given text.
Additionally, text summarization methods can be categorized based on their source,
either single or multiple document summarization, and the techniques used can be
extractive or abstractive. However, the ultimate objective of text summarization is
not only to eliminate redundancy and select appropriate text for a summary but
also to ensure that the final summary is both coherent and comprehensive in itself,
while also providing originality. The paper [14] highlights two approaches for text
summarization: the Fuzzy logic Extraction approach and the semantic approach using
latent semantic analysis. The Fuzzy logic Extraction approach uses sentence features
based on semantics, word, and sentence tokens to extract high relevance (rank) from
the document. The semantic approach focuses on the importance of sentences in the
document and incorporates latent semantic analysis. The proposed method aims to
upgrade the quality of the summary by combining these two approaches.
The demand for extracting knowledge from digital documents is increasing
rapidly. The study [15] introduces a proposed approach for summarization, which
comprises three steps: topic identification, interpretation, and generation. The authors
of [16] examine different datasets along with the associated metrics used in text
summarization, initially in a general context and then specifically for legal text. Ulti-
mately, the paper concludes by suggesting some potential areas for future research.
The paper [17] concentrates on extractive text summarization techniques and evalu-
ates their performance using sentence scoring. The authors carried out both quanti-
tative and qualitative assessments of 15 algorithms for sentence scoring across three
distinct datasets. There is a significant research in the area of automatic text summa-
rization within the Natural Language Processing (NLP) community also, particu-
larly the statistical machine learning community. The aim of another study [18] is
to explore different techniques for generating concise summaries of source code to
facilitate maintenance tasks for developers who are unable to read the entire code-
base of large systems. By providing an overview of the key entities within the code,
developers can focus on the relevant code sections for their tasks. The research find-
ings suggest that a combination of text summarization techniques is effective for
summarizing source code.
Recently, in [23], the authors explore the works in abstractive ATS with respect
to various aspects such as types, datasets, techniques, architectures, challenges, and
others. The study shows that among abstractive ATS models, SimCLS and CIT
outperform all the other models for short documents, whereas LSH + HEPOS and
BigBird-PEGASUS are the best models for long-document abstractive ATS. Also, in
[24] the authors propose SeqCo, a sequence level contrastive learning model for text
summarization. The aim of the work is to reduce the distances between the document,
its summary, and the generated summaries during training. Their experimental results
showed that the model improved a strong Seq2Seq text generation model.
Currently, various algorithms have been developed for imbalanced data [25, 28],
but recently extensive work has been going on for the summarization of the data also.
In [26], an evolutionary algorithm has been used as optimization strategy, whereas
in [27], a novel approach, DeepSumm for the extractive summarization of single
document has been introduced. It is shown that the model outperformed existing
state-of-the-art approaches.
Thus, we see that the field of text summarization is currently the subject of
extensive research, with multiple techniques, including machine learning, and deep
learning algorithms, being utilized to aid the process. Nevertheless, with the constant
growth of data, there remains a need for an efficient mechanism to automate text
summarization.

3 Proposed Methodology

In this section, we shall discuss the methodology and the algorithm that are used to
automate the text summarization. Our model is trained on the CNN/DailyMail news
dataset, which has been downloaded from Kaggle as a CSV file [19]. The dataset
provides us with over three
hundred thousand unique news articles that were written by journalists at CNN and
the Daily Mail.
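The dataset can be loaded and sanity-checked with pandas before summarization. Below is a minimal sketch in which an inline sample stands in for the real Kaggle CSV so the snippet is self-contained; the `article` and `highlights` column names are assumptions about that file.

```python
import io

import pandas as pd

# Inline stand-in for the Kaggle CSV; the real file would be read the same
# way with pd.read_csv("path/to/train.csv"). The "article" and "highlights"
# column names are assumptions about that file.
sample_csv = io.StringIO(
    "id,article,highlights\n"
    "1,The quick brown fox jumps over the lazy dog.,A fox jumps over a dog.\n"
    "2,Stocks rose sharply on Monday after the report.,Stocks rose on Monday.\n"
)
df = pd.read_csv(sample_csv)

# Sanity checks mirroring the first steps of the pipeline:
print(df.shape)                 # (2, 3)
print(df.isnull().sum().sum())  # 0 -> no missing values
print(df.duplicated().sum())    # 0 -> no duplicate rows
```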
For text summarization, a three-step process is followed, namely analysis step,
transformation step, and synthesis step. The analysis step involves analyzing the
source text and selecting relevant attributes. Next is the transformation step where
the results of this analysis are transformed. Finally, the synthesis step involves
synthesizing these transformed attributes to produce a summary representation of
the original text.
Prior to the process, it is necessary to preprocess the data by performing data
cleaning. This involves carrying out a series of steps as outlined below to ensure that
the data is free of errors and inconsistencies.

3.1 Preprocessing

The raw dataset is cleaned using the following techniques:


• Lower Casing: It converts the input text into a uniform casing format so that all
occurrences of uppercase, lowercase, and mixed-case words are treated equally.
• Eliminate Punctuation: It removes punctuation marks, links, and tags that do not
provide meaningful information to the text, such as !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~,
so as to standardize the text.
• Remove stop words and commonly occurring words: It is used to remove frequent
words such as “the”, “a”, which are commonly used in a text but do not provide
beneficial information for subsequent analysis.
• Stemming: It is used to reduce inflected words to their root form.
• Lemmatization: It is used to convert derived words to their base forms, ensuring
that the resulting words belong to the language itself.
• Contraction mapping: It is used to expand the condensed version of words or
syllables.
• Scaling: It is used to scale the values of the vectors to expedite the process,
because it eliminates the need to handle large data values. The formulas used
here are as follows:

Linear scaling formula: x_new = (x − x_min)/(x_max − x_min)

Log scaling formula: x_new = log(x_old)

• Feature Clipping: It is used to handle extreme outliers (if any exist) by capping
all feature values above (or below) a certain threshold to a fixed value. It may
potentially affect the training, especially while dealing with smaller datasets.
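The cleaning steps above can be sketched in Python. This is a minimal illustration, using a tiny hand-rolled stopword list in place of a full one (a real pipeline would use NLTK's list), covering lower casing, punctuation removal, stop-word removal, and the linear scaling formula:

```python
import string

# Tiny illustrative stopword list; a real pipeline would use the full
# NLTK list via nltk.corpus.stopwords.words("english").
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}

def preprocess(text):
    # Lower casing: treat all case variants of a word equally.
    text = text.lower()
    # Eliminate punctuation: strip characters like !"#$%&'()*+,-./ etc.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove stop words: drop frequent words carrying little information.
    return [w for w in text.split() if w not in STOP_WORDS]

def min_max_scale(values):
    # Linear scaling formula: x_new = (x - x_min) / (x_max - x_min)
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

print(preprocess("The quick, brown FOX jumps over the lazy dog!"))
# -> ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
print(min_max_scale([2, 4, 6, 10]))
# -> [0.0, 0.25, 0.5, 1.0]
```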

3.2 Algorithm

In this section, we shall describe the algorithm that is used to automate the text
summarization process. The algorithm typically involves several steps. First, the
dataset must be scanned and read properly. Next, the dataset must be preprocessed
to ensure that it is clean and suitable for analysis. The steps for preprocessing have
already been described in above section. Following this, the label data needs to be
converted into numerical vectors. Once these initial steps have been completed, the
data is suitable to be trained using abstractive summarization techniques. After the
training gets completed, the output data from this step is then trained using extractive
summarization techniques. Once all the training processes get finished, the summary
is tested and the accuracy and precision scores are calculated. Finally, the results are
displayed along with the final summary, and the process is complete. The description
of this algorithm, which is termed NLTK, is given as follows.

Algorithm: Automated Text Summarization (NLTK)
Input: Dataset
Output: Summary obtained
1:  Read the dataset
2:  Check for null and duplicate values
3:  Tokenize the text with variable stopWords
4:  Create a frequency table with variable freqTable
5:  For (w ∈ words)
6:      Convert w to lowercase
7:      If (w ∈ stopWords)
8:          continue
9:      If (w ∈ freqTable)
10:         freqTable[w] = freqTable[w] + 1
11:     else
12:         freqTable[w] = 1
13: print freqTable
14: Calculate the score of each sentence with variable sentences
15: For (s ∈ sentences)
16:     For (f ∈ freqTable)
17:         If (s ∈ sentenceValue)
18:             sentenceValue[s] = sentenceValue[s] + f
19:         else
20:             sentenceValue[s] = f
21: print sentenceValue
22: sumValues = 0            # Initializing the variable sumValues
23: For (s ∈ sentenceValue)
24:     sumValues = sumValues + sentenceValue[s]
25: Compute the average value of a sentence
26: avg = sumValues / len(sentenceValue)
27: summary = ""             # Initializing the string variable summary
28: For (s ∈ sentences)
29:     If (s ∈ sentenceValue) ∧ (sentenceValue[s] > 1.2 * avg)
30:         summary = summary + s
31: print summary
32: Apply nlp and print doc
33: Compute the length of doc with variable summarylen
34: Read the summarized file from an online source
35: Compute the length of the online summary with variable onlinelen
36: x = onlinelen
37: y = summarylen
38: Plot the length-of-summary graph with x and y
39: Define read_file(filename)
40:     Split text lines into words with variable translation_table
41: Define get_words_from_line_list(text)
42:     text = text.translate(translation_table)
43:     wordlist = text.split()
44:     return wordlist
45: Define count_frequency(wordlist)
46:     D = {}
47:     For (new_word ∈ wordlist)
48:         If (new_word ∈ D)
49:             D[new_word] = D[new_word] + 1
50:         else
51:             D[new_word] = 1
52:     return D
53: Define word_frequencies_for_file(filename)
54:     line_list = read_file(filename)
55:     wordlist = get_words_from_line_list(line_list)
56:     freqmap = count_frequency(wordlist)
57:     return freqmap
58: Define dotProduct(D1, D2)
59:     sum = 0
60:     For (key ∈ D1)
61:         If (key ∈ D2)
62:             sum = sum + D1[key] * D2[key]
63:     return sum
64: Define vector_angle(D1, D2)
65:     num = dotProduct(D1, D2)
66:     den = sqrt(dotProduct(D1, D1) * dotProduct(D2, D2))
67:     return arccos(num / den)
68: Define documentSimilarity(file1, file2)
69:     sorted_wordlist1 = word_frequencies_for_file(file1)
70:     sorted_wordlist2 = word_frequencies_for_file(file2)
71:     distance = vector_angle(sorted_wordlist1, sorted_wordlist2)
72: Plot the documentSimilarity graph with the summary (model and online source)
73: print output
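The two halves of the listing — frequency-based sentence scoring (steps 3–31) and word-vector document similarity (steps 58–67) — can be sketched in Python. This is a minimal illustration, not the exact implementation: the stopword list is a toy one, and the score threshold factor of 1.2 is chosen illustratively.

```python
import math

# Toy stopword list for illustration only.
STOPS = {"the", "a", "an", "is", "and", "of", "to", "in", "it"}

def summarize(text, threshold=1.2):
    """Frequency-based extractive summary (steps 3-31 of the listing)."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # freqTable: word -> occurrence count over the whole text.
    freq = {}
    for w in text.lower().split():
        w = w.strip(".,!?")
        if w and w not in STOPS:
            freq[w] = freq.get(w, 0) + 1
    # sentenceValue: sentence -> sum of its word frequencies.
    score = {s: sum(freq.get(w.strip(".,!?"), 0) for w in s.lower().split())
             for s in sentences}
    # Keep sentences scoring above threshold * average score.
    avg = sum(score.values()) / len(score)
    return ". ".join(s for s in sentences if score[s] > threshold * avg)

def dot_product(d1, d2):
    # Steps 58-63: dot product of two word-frequency maps.
    return sum(v * d2.get(k, 0) for k, v in d1.items())

def vector_angle(d1, d2):
    # Steps 64-67: angle between frequency vectors; 0 means identical
    # word distributions, pi/2 means no shared words.
    return math.acos(dot_product(d1, d2) /
                     math.sqrt(dot_product(d1, d1) * dot_product(d2, d2)))

print(summarize("Deep learning models summarize text. Cats sleep. "
                "Deep learning models learn representations of text."))
# -> Deep learning models summarize text. Deep learning models learn representations of text
```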

4 Experimental Results and Analysis

In this section, the performance of the proposed algorithm is investigated. First,
the results obtained by our algorithm NLTK in terms of accuracy are provided for
abstraction and extraction evaluation separately in Table 1.
Then, its performance is compared with that of other existing algorithms used for
automated summarization of the data which has been provided in Table 2.
The algorithms or the methods that have been taken for comparative analysis are
as follows.
GPT-2: This method has been taken from [20] in which topic-centric unsuper-
vised summarization of multi-documents taken from scientific and news articles
was conducted.
T5: The method has been taken from [21], where a French Wikipedia abstractive
text summarizer for SMS was devised, based on T5.
Text ranking: It is a text summarization method for blind people, taken from [22].
Also, Fig. 1 gives the graphical representation of the comparison between different
methods in terms of accuracy.
From Table 2 and Fig. 1, it is quite clear that the algorithm NLTK outperforms
GPT-2 and T5 in terms of accuracy for both extraction and abstraction evaluation.
The algorithm text ranking, however, outperforms NLTK for extraction evaluation,
but it does not give any results for abstraction, whereas NLTK efficiently performs
abstraction as well. Our comparative study reveals that NLTK performs much better
overall in terms of accuracy as compared to all these mentioned algorithms.
After this, the performance of the proposed algorithm is measured individually in
terms of length and accuracy of the summary. For this, we implemented the algorithm
first for abstractive and extractive evaluation separately, and then we combined both
approaches to analyze their effect on the length and accuracy of the output. We also
compared the obtained results with the original length and accuracy.
Table 3 summarizes these results as follows.

Table 1 Performance of NLTK

Evaluation methods    NLTK performance
Extraction            69.1
Abstraction           77.2

Table 2 Comparative study for different models

Evaluation methods    GPT-2    T5      Text ranking    NLTK
Extraction            47.43    52      96.51           69.1
Abstraction           77.00    52.3    –               77.2

Fig. 1 Graphical representation of the comparative study

Table 3 Empirical analysis in terms of length and accuracy of the summary

Evaluations                   Length    Accuracy
Original                      1281      100
Abstraction only              833       77.2
Extraction only               226       69.1
Abstraction + extraction      144       77.2
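The length reduction reported in Table 3 can also be read as a retention ratio, the fraction of the original length kept by each evaluation. A small worked sketch using the Table 3 lengths (the text does not state whether these are word or character counts):

```python
def compression_ratio(original_len, summary_len):
    # Fraction of the original text retained by the summary.
    return summary_len / original_len

# Lengths from Table 3 (original = 1281):
for name, length in [("Abstraction only", 833),
                     ("Extraction only", 226),
                     ("Abstraction + extraction", 144)]:
    print(f"{name}: {compression_ratio(1281, length):.3f}")
# -> Abstraction only: 0.650
# -> Extraction only: 0.176
# -> Abstraction + extraction: 0.112
```

The combined approach retains only about 11% of the original length, versus 65% for abstraction alone.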

Figure 2 provides the graphical representation of this individual comparative study
with respect to abstraction, extraction, and abstraction together with extraction as well.
Also, Fig. 3 provides the boxplots in order to summarize the spread of the values
of the length and summary of the different text summarization evaluations.
By conducting a comparative analysis as presented in Table 3 and Figs. 2, 3, it
can be inferred that the integration of both the text summarization techniques yields
a more efficient summary of the input data. Specifically, it shortens the length of the
text while maintaining the quality of the information.
Fig. 2 Graphical representation of the performance of extraction evaluation, abstraction evaluation,
and abstraction together with extraction evaluation for the given algorithm

Fig. 3 Boxplots for the length and accuracy for the different evaluation methods

5 Conclusion

Automatic text summarization aims to provide the original material in a concise and
semantically rich form. As a result, reading time is cut down, and it is also easier to
choose which documents to read next. It produces a more accurate summary that only
includes the key ideas presented in a document. In this work, we extensively studied
the work that has been done to automate the process. Then, we developed our own
model in which we combined both abstractive and extractive techniques. The results
in terms of length and accuracy of the obtained summary for the individual evaluations
as well as integrated model are presented. We further conducted comparative study
of the proposed approach with some of the pre-existing methods in the literature.
Our study has led us to conclude that our model produces more accurate and concise
summary of a document. The limitation of this method is that it has still not been
employed for different formats of the documents. To improve our methodology, we
intend to incorporate the capability to convert diverse document formats into a text
file. Furthermore, we can investigate an alternative technique that entails extracting
and processing the textual content from various document types to create summaries.
Additionally, our forthcoming efforts may revolve around summarizing multiple files
concurrently.

References

1. Patel H, Rajput D (2011) Data mining applications in present scenario: a review. Int J Soft
Comput 6(4):136–142
2. Andhale N, Bewoor LA (2016) An overview of text summarization techniques. In: 2016
International conference on computing communication control and automation (ICCUBEA).
IEEE, pp 1–7
3. Widyassari AP, Rustad S, Shidik GF, Noersasongko E, Syukur A, Affandy A (2020) Review
of automatic text summarization techniques & methods. J King Saud Univ-Comput Inf Sci
34(4):1029–1046
4. Gaikwad DK, Mahender CN (2016) A review paper on text summarization. Int J Adv Res
Comput Commun Eng 5(3):154–160
5. Awasthi I, Gupta K, Bhogal PS, Anand SS, Soni PK (2021) Natural language processing
(NLP) based text summarization—a survey. In 2021 6th International conference on inventive
computation technologies (ICICT). IEEE, pp 1310–1317
6. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks.
Adv Neural Inf Process Syst 2
7. Egonmwan E, Chali Y (2019) Transformer and seq2seq model for paraphrase generation. In:
Proceedings of the 3rd workshop on neural generation and translation, pp 249–255
8. Dernoncourt F, Lee JY, Szolovits P (2016) Neural networks for joint sentence classification in
medical paper abstracts. arXiv preprint arXiv:1612.05251
9. Shini RS, Kumar VA (2021) Recurrent neural network based text summarization techniques
by word sequence generation. In: 2021 6th International conference on inventive computation
technologies (ICICT). IEEE, pp 1224–1229
10. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y
(2014) Learning phrase representations using RNN encoder-decoder for statistical machine
translation. arXiv preprint arXiv:1406.1078
11. Melis G, Kočiský T, Blunsom P (2019) Mogrifier lstm. arXiv preprint arXiv:1909.01792
12. Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K (2017) Text
summarization techniques: a brief survey. arXiv preprint arXiv:1707.02268
13. Tas O, Kiyani F (2007) A survey automatic text summarization. PressAcademia Procedia
5(1):205–213
14. Babar SA, Patil PD (2015) Improving performance of text summarization. Procedia Comput
Sci 46:354–363
15. Dalal V, Malik L (2013) A survey of extractive and abstractive text summarization techniques.
In: 2013 6th International conference on emerging trends in engineering and technology. IEEE,
pp 109–110
16. Kanapala A, Pal S, Pamula R (2019) Text summarization from legal documents: a survey. Artif
Intell Rev 371–402
17. Ferreira R, de Souza Cabral L, Lins RD, e Silva GP, Freitas F, Cavalcanti GD, Lima R, Simske
SJ, Favaro L (2013) Assessing sentence scoring techniques for extractive text summarization.
Expert Syst Appl 40(14):5755–5764
18. Haiduc S, Aponte J, Moreno L, Marcus A (2010) On the use of automated text summarization
techniques for summarizing source code. In: 17th Working conference on reverse engineering.
IEEE, pp 35–44
19. CNN-DailyMail news text summarization. Kaggle (n.d.). https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail
20. Alambo A, Lohstroh C, Madaus E, Padhee S, Foster B, Banerjee T, Thirunarayan K, Raymer
M (2020) Topic-centric unsupervised multi-document summarization of scientific and news
articles. In 2020 IEEE international conference on big data (big data). IEEE, pp 591–596
21. Fendji JL, Taira DM, Atemkeng M, Ali AM (2021) WATS-SMS: a T5-based french wikipedia
abstractive text summarizer for SMS. Future Internet 13(9):238
22. Basheer S, Anbarasi M, Sakshi DG, Vinoth KV (2020) Efficient text summarization method
for blind people using text mining techniques. Int J Speech Technol 23(4):713–725
23. Alomari A, Idris N, Sabri AQM, Alsmadi I (2022) Deep reinforcement and transfer learning
for abstractive text summarization: a review. Comput Speech Lang 71:101276
24. Xu S, Zhang X, Wu Y, Wei F (2022) Sequence level contrastive learning for text summarization.
In: Proceedings of the AAAI conference on artificial intelligence, vol 36, no 10, pp 11556–
11565
25. Patel H, Rajput D, Stan OP, Miclea LC (2022) A new fuzzy adaptive algorithm to classify
imbalanced data. CMC Comput Mater Continua 70(1):73–89
26. Saini N, Reddy SM, Saha S, Moreno JG, Doucet A (2023) Multi-view multi-objective
clustering-based framework for scientific document summarization using citation context. Appl
Intell 1–25
27. Joshi A, Fidalgo E, Alegre E, Fernández-Robles L (2023) DeepSumm: exploiting topic models and sequence to sequence networks for extractive text summarization. Expert Syst Appl 211:118442
28. Patel H, Thakur GS (2017) Improved fuzzy-optimally weighted nearest neighbor strategy to
classify imbalanced data. Int J Intell Syst 10:156–162
Mall Customer Segmentation Using
K-Means Clustering

Ashwani, Gurleen Kaur, and Lekha Rani

Abstract This research paper investigates the use of k-means clustering for
segmenting mall customers utilizing a dataset. Customer segmentation is a crucial
aspect of marketing strategy as it allows for a more targeted and personalized
approach. Therefore, specialized attributes of the customers are utilized in this
study to segment the mall customers. Furthermore, detailed exploratory research has
also been carried out, revealing interesting insights about the customers and their
expenditure in shopping malls. K-means clustering has been deployed, and five clusters
have been formed and finalized using the elbow method. Further, the clusters have been
interpreted and analysed, and inferences have been drawn according to the settled
clusters. This research provides valuable insights for mall management to understand
their customer base and devise effective marketing strategies. The study also
concluded that customers with an average age of 28.25 have a higher spending score of
71.67, while customers with higher incomes have lower spending scores. As per the data,
a customer earning 90 k has the lowest spending score of 15.74.

Keywords Mall customer · Market basket analysis · k-means clustering ·
Segmentation · An unsupervised algorithm · Marketing strategy · Data analysis

Ashwani · G. Kaur (B) · L. Rani
Chitkara University Institute of Engineering and Technology, Rajpura, Punjab, India
e-mail: gurleen1391.cse19@chitkara.edu.in
Ashwani
e-mail: ashwani1488.cse19@chitkara.edu.in
L. Rani
e-mail: lekha@chitkara.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 459
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_35

1 Introduction

Mall customers refer to individuals who visit and shop at shopping malls. These
customers are crucial for the success of any mall, as they provide the revenue needed
for the mall to operate and grow. Understanding mall customers and their behaviour
is essential for mall management to effectively market to them, as well as to improve
the overall shopping experience. Customers at shopping malls can be better understood
through demographic research. This entails investigating client demographics
including age, gender, income and level of education. This information can be used to
target specific customer segments with tailored marketing campaigns and to improve
the overall shopping experience by providing products and services that meet the
needs of the customers. The objective of this research paper is to analyse the effec-
tiveness of k-means clustering in segmenting mall customers using a dataset obtained
from Kaggle. Customer segmentation is a fundamental aspect of marketing strategy
as it allows for a more personalized and targeted approach. The dataset used in this
research comprises customer demographics and shopping behaviour data. The
k-means algorithm, a popular method for clustering data points based on their
similarity, is employed in this study to segment the mall customers. K-means clustering
is a form of unsupervised machine learning and is useful in identifying patterns and
natural groupings within a dataset. In this context, it can be used to group customers
based on their demographics and shopping behaviours, allowing mall management
to better understand their customer base and devise effective marketing strategies.
Additionally, this research also delves into market basket analysis, which is the
method of identifying items that are frequently purchased together. Market basket
analysis can provide valuable insights into the purchasing habits of mall customers
and can be used to identify potential cross-selling and upselling opportunities.
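As a toy illustration of the market basket idea (not the paper's actual data or code), the sketch below counts how often item pairs co-occur in hypothetical baskets; pairs whose support exceeds a threshold are candidates for cross-selling:

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions; in practice these would come from mall POS data.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
]

def frequent_pairs(baskets, min_support=0.4):
    """Count item pairs and keep those bought together in at least
    min_support fraction of all baskets."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

pairs = frequent_pairs(transactions)
# ("bread", "milk") appears in 3 of 5 baskets -> support 0.6
```

Real market basket analysis typically adds confidence and lift on top of support, but pair counting is the core of the frequent-itemset step.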
The analysis performed using k-means clustering will also aid market
basket analysis by providing a deeper understanding of the demographics of the
customers, such as their annual income, age group and gender. Furthermore, the
results of this research will provide valuable insights for mall management in terms
of understanding their customer base and devising effective marketing strategies.
To summarize, this research article aims to demonstrate the effectiveness of k-
means clustering in segmenting mall customers and identifying purchasing patterns
through market basket analysis. The insights gained from this study will be beneficial
for mall management in terms of understanding their customer base and devising
effective marketing strategies. The results of this research will also provide a deeper
understanding of the demographics of the customers and help in identifying potential
cross-selling and upselling opportunities.
To compile the overall crisp objective of the study, the main contribution of the
study is highlighted below:
• To study how k-means clustering can be used to analyse market basket data and divide mall shoppers into distinct groups based on their shopping habits.

• To extract information regarding customers and group them into several clusters based on their gender, income and age.
• To utilize the capabilities and potential of machine learning algorithms in the field of market analysis and marketing.

2 Literature Review

This research work utilizes a data-driven approach to segment mall customers
based on their demographics, shopping behaviour and other customer attributes.
In recent years, the use of data mining techniques for customer segmentation has
become increasingly popular. A few excerpts from the literature highlighting the
experimentation with this algorithm have been compiled here.
The first study centres on improving mall customer segmentation with k-means
clustering [1]. The authors suggest optimizing the k-means algorithm by incorporating
demographic data, spending patterns and mall visit frequency. This approach
outperforms traditional segmentation methods, providing actionable insights into
customer behaviour and preferences, and the authors claim their method improves
mall retailers' marketing and sales.
Another article [2] uses a recency-frequency-monetary (RFM) model and the
k-means clustering algorithm to cluster existing customers. Unsupervised k-means
clustering is used to classify client data in study [3], where the clusters are built
from two independent variables: auto and life insurance costs. The authors then
describe the demographic most likely to buy life insurance and the factors that
influence this decision.
This paper [4] extends the regularized k-means clustering approach with L1-norm
to the elastic net penalty method that accounts for variable correlations. Simulations
show that the proposed method reduces error rates and allows simultaneous variable
selection in four scenarios. An online retailer is used as an example to show how
the suggested approach can cluster high-dimensional applications. Another study
[5] accelerates massive data processing using Hadoop’s adaptability. Three market
segmentation experiments were assessed using modified best-fit regression and the
expectation–maximization (EM) and k-means++ clustering methods. The investigation
found that most customers stay with a company for less than three years and
that 52% leave.
This paper [6] compares agglomerative, k-means and advanced k-means RFM-based
market segmentation methods. Experimental results show that agglomerative
clustering takes longer to process large datasets than k-means and its advanced
version. The segmentation results show that advanced k-means can speed up
clustering by 27.8% compared to standard k-means and 97.8% compared to
agglomerative clustering.

This study [7] used hierarchical and k-means clustering. Based on the many factors
that influence mall choices, the research identified several shopper segments; the
sanitary, extended and prudent clusters differed. This study will help retailers
develop differentiation strategies to better serve the target group.
Another interesting study [8] used exploratory data analysis and k-means
clustering to classify shoppers from a shopping centre. The correlation between age,
spending score and annual income yielded two groups. Four optimal age and
expenditure clusters and five optimal yearly income and expenditure clusters were
found using the elbow graph technique. First, older people tend to spend less. The
second notable finding was that clusters with high annual incomes and low spending
scores had the most striking economic profiles. Thus, the mall can offer these groups
discounts and other incentives to shop there, increasing profits.
This study [9] analysed 2010 retail data to develop RFM values for businesses and,
following Chen et al., created consumer categories from 2011 data using k-means
clustering. Both k-means and self-organizing map (SOM) clustering were tested on
the filtered target dataset; self-organizing maps, unlike the k-means approach of
Chen et al., are based on neural networks.
Earnings can help a company's customer lifetime value (CLV) model predict customer
value. This study [10] calculates CLV for each customer category using LRFM-based
(length, recency, frequency, monetary) k-means clustering, with the SSE and elbow
methods used to determine the number of clusters.
This article [11] analyses client buying behaviour by applying RFM and k-means
clustering to a company's online transaction data. Purchase behaviour divides the
customers into four segments, and CRM strategies built on these segments can
improve customer satisfaction.
Customer relationship management (CRM), i.e. business-to-customer communication,
is the focus of this research [12]. CRM categorizes customer attitudes, attributes
and related characteristics, which helps companies find profitable customers;
customer segmentation, the partitioning of the customer database into distinct
subgroups, enables this. Most data miners use clustering methods to classify
consumers [13]. That research demonstrates the effectiveness of LU factorization
(also known as LU decomposition) as a feature extraction technique in conjunction
with a naive Bayes classifier for recognizing handwritten Odia numerals, and its
experimental results indicate that LU factorization could be a viable option for
feature extraction in pattern classification problems. In [14], the goal of the
proposed method is recognizing printed Gurmukhi digits using a neural network model,
which has been tested on a dataset of 1500 different digits; with 144 binary
characteristics collected from each digit for classification, the recognition
accuracy is stated to be 93.66%. Data analysts use several segmentation methods.
Traditional market segmentation helps corporations target consumers and helps
marketers reach potential customers, and client segmentation can improve marketing
and lower investment risk.
Another study [15] used deep learning and RFM analysis to improve e-commerce
client segmentation. The RFM analysis extracted customer behaviour metrics from
e-commerce transaction data, and a deep learning algorithm was applied to the RFM
measures to find patterns and clusters. In terms of accuracy and precision, the
proposed consumer segmentation strategy outperformed plain k-means clustering, in
which customers were segmented into six groups based on annual income and
consumption and the six clusters were grouped and analysed using the two principal
components.
The study by Ensafi et al. [16] suggested a way to divide customers into groups
based on how they use mobile applications, using time-series data and clustering
algorithms. The authors looked at how customers used applications over time and
applied clustering algorithms such as k-means, DBSCAN and hierarchical clustering
to divide the customers into groups. The study found that the proposed method
succeeded in finding customer groups with different ways of using products and
services. The authors also compared how well the different clustering algorithms
worked and found that k-means and DBSCAN were less accurate at segmenting than
hierarchical clustering.
These studies show that k-means clustering is still widely used for mall customer
segmentation and could be applied in other retail industries. K-means clustering has
been shown to segment retail customers well, and this review accordingly favours
k-means clustering for customer segmentation over alternatives such as fuzzy
c-means.

3 Research Methodology

A stepwise procedure has been carried out to conduct the research experiment.
The research methodology includes phases like data collection, data preparation,
exploratory data analysis, feature selection, model deployment, cluster validation
and cluster interpretation.
At the onset, data collection and preparation were performed, followed by the
exploratory data analysis stage. The final major stage comprises model deployment,
execution and analysis. The stages of the entire experiment are depicted in the
flowchart in Fig. 1.

(1) Data Collection: The first step in this methodology is to collect the data. For
the experiment, the dataset used for the analysis has been taken from Kaggle.
The dataset is a csv file containing attributes such as customer id, gender, age,
annual income and spending score. Here, it is important to note that data should
be collected in a format that is compatible with the chosen model or algorithm.

Fig. 1 Research methodology
(2) Data Preparation: The data must be cleaned and formatted before it can be
analysed. Steps in this process include making sure all variables are in the right
format, eliminating duplicates and cleaning the data for accuracy. Additionally,
any outliers or extreme values in the data have to be identified and either removed
or transformed to ensure they do not skew the results of the analysis. As part of
data preparation, unnecessary and unwanted attributes are removed first; we
dropped the customer id, as it has little bearing on the targeted analysis. The
data was then checked for null values, and all null values were removed.
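A minimal sketch of this preparation step, using only the Python standard library; the sample rows below follow the usual Kaggle mall customers schema but are hypothetical, not the real dataset:

```python
import csv, io

# Hypothetical excerpt mirroring the Kaggle mall customers csv schema.
raw = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Female,21,15,81
3,Female,20,16,
2,Female,21,15,81
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Drop the id column, remove rows with missing values, then deduplicate.
cleaned, seen = [], set()
for r in rows:
    r.pop("CustomerID")                   # id has no analytical value here
    if any(v == "" for v in r.values()):  # null/missing check
        continue
    key = tuple(r.values())
    if key not in seen:                   # duplicate removal
        seen.add(key)
        cleaned.append(r)
# cleaned now holds 2 unique, complete records
```

In practice a dataframe library would do the same in a few calls, but the three operations (drop id, drop nulls, deduplicate) are exactly those described above.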
(3) Exploratory Data Analysis: It is recommended to conduct an exploratory data
analysis to better understand the data before employing the k-means clustering
algorithm. As part of this process, it is helpful to generate summary statistics
and visual representations of the data in order to better comprehend the data's
structure and discover hidden linkages and trends. The gender distribution of the
dataset was investigated first. In the graphical analysis depicted in Fig. 2, the
red part denotes males and the blue part the females; the analysis shows that
there are 12% more females than males.

Fig. 2 Gender distribution (red part: males, blue part: females)



Fig. 3 Box plot analysis for age-wise analysis (market basket analysis)

After the gender analysis, the age analysis was performed. It is evident from the
plotted box plot (Fig. 3) that the minimum age for both males and females is 18,
whereas the average age for males is 37 and for females 35.
Analysing the age group of customers is truly significant, as the analysis aids in
understanding which age group is likely to take part in the market basket. The data
also indicates that there are more women than men. Customers between the ages of 30
and 34, then 35–39 and finally 45–49, make up the bulk of our clientele (Figs. 4
and 5).
Ages 30–34 had the highest expenditure score, followed by those aged 35–39 and
those aged 20–24. The group with the highest annual income was also the focus of
the other section of the analysis. Here, we studied the dataset values for annual
income by age group and found that those between the ages of 30 and 34 have the
highest average income, with those between 35 and 39 coming second and those
between 45 and 49 third, in contrast to the expenditure score (Fig. 6).

Fig. 4 Count of customers as per their age

Fig. 5 Spending score age-wise
Additionally, a scatter plot was generated combining the annual income, the
expenditure score and the age. After graphing the data, we observed something
interesting: young individuals spend more money but have lower incomes.
Individuals in the 30–34 and 35–39 age brackets, in contrast, are more numerous
and have higher spending scores and annual incomes. Customers in the target age
range of 30–34 are likely to be repeat buyers (Fig. 7).

Fig. 6 Annual income-age group-wise



Fig. 7 Annual income versus spending score scatter plot

(4) Feature Selection: The next step is to select the variables that will be used in
the analysis. This is important because the k-means algorithm works best when
applied to variables that are not highly correlated with one another. Therefore, it
is important to select a subset of variables that are most relevant to the research
question and are not highly correlated with each other. Thus, in this phase, the
authors focused on having significant variables in the considered deployment
phase of k-means clustering.
(5) Dataset Standardization: The authors apply standardization to the data so that
it can be more easily fed into the k-means clustering algorithm. During
standardization, categorical values are transformed into numeric form; since the
model does not support string input, males are encoded as 1 and females as 0.
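A small sketch of this step with hypothetical records. The paper only states the 1/0 gender encoding; the z-score scaling shown afterwards is one common way to standardize the numeric columns and is an assumption, not the paper's stated procedure:

```python
from statistics import mean, pstdev

# Hypothetical raw records: (gender, age, annual income k$, spending score)
records = [("Male", 19, 15, 39), ("Female", 21, 15, 81),
           ("Female", 20, 16, 6), ("Male", 23, 16, 77)]

# Encode gender as 1/0, since the model cannot take string input.
encoded = [(1 if g == "Male" else 0, age, inc, score)
           for g, age, inc, score in records]

def zscore_columns(data):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*data))
    out_cols = []
    for col in cols:
        m, s = mean(col), pstdev(col)
        out_cols.append([(v - m) / s if s else 0.0 for v in col])
    return [tuple(r) for r in zip(*out_cols)]

X = zscore_columns(encoded)  # each column of X now has mean 0
```

Standardizing matters for k-means because the algorithm is distance-based: without it, the income column (tens of k$) would dominate the age and gender columns.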

(6) Model Deployment: Once the data has been prepared and the variables have
been selected, the k-means algorithm can be applied to the data. The k-means
algorithm is an unsupervised learning technique that is used to group similar
observations together into clusters. The number of clusters (k) is a user-specified
parameter, and it is important to choose an appropriate value for k based on the
data and the research question.
(7) Evaluating the Best Number of Clusters (K)

To find the best value of k, the elbow method has been deployed.
Using a range of k-values (say, 1–10), the elbow method applies k-means clustering
to the dataset and calculates the average score for each value of k. The sum of the
squared distances between each point and its assigned centre is the "distortion
score," which is computed automatically.

Fig. 8 Glimpse of data standardization

By plotting the distortion score against each value of k, we can determine the best
value for k. If the line graph resembles an arm, the best value of k is the "elbow,"
i.e. the point of inflection on the curve; the best fit lies at the point of
steepest inflection, if there is one.
The elbow method compares the within-cluster sum of squares (WCSS) to the k-value,
the number of clusters, to find the optimal number of clusters. Figure 9 depicts
such a plot. The WCSS aggregates the squared distances of observations from their
cluster centres.
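In standard notation, the WCSS referred to here can be written as

```latex
\mathrm{WCSS} = \sum_{i=1}^{k} \sum_{x \in C_i} \left\lVert x - \mu_i \right\rVert^2
```

where $C_i$ is the set of observations assigned to cluster $i$ and $\mu_i$ is that cluster's centroid.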
It is clear that the optimal number of clusters for our data is 5, as the slope of
the curve is not steep enough beyond that point. Observing the curve, the last
elbow comes at k = 5; had we chosen a much higher range, the elbow would be
difficult to visualize.
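The visual elbow rule can also be approximated numerically. One simple heuristic (a sketch, not the paper's procedure) picks the k where the decrease in WCSS flattens most, i.e. the largest second difference of the curve; the WCSS values below are illustrative, not the paper's actual numbers:

```python
def pick_elbow(wcss, k_start=1):
    """Pick the elbow as the k where the WCSS curve flattens most,
    using the largest second difference (change in slope)."""
    drops = [wcss[i] - wcss[i + 1] for i in range(len(wcss) - 1)]
    curvature = [drops[i] - drops[i + 1] for i in range(len(drops) - 1)]
    return k_start + 1 + curvature.index(max(curvature))

# Illustrative WCSS values for k = 1..10 (not the paper's actual numbers)
wcss = [1000, 700, 500, 350, 200, 180, 165, 150, 140, 132]
best_k = pick_elbow(wcss)  # flattening after k = 5 -> elbow at 5
```

This mirrors what the eye does on the plot: large drops up to the elbow, small drops after it.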
(8) Cluster Validation: After the clustering has been completed, it is important to
validate the clusters that have been created. This can be done using a variety
of techniques such as silhouette analysis, elbow method and Davies–Bouldin
index to ensure that the clusters are meaningful and not just random groupings
of observations.
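To make the silhouette criterion concrete, here is a minimal sketch (not the paper's code) that computes the mean silhouette coefficient for pre-assigned clusters of 1-D points; well-separated clusters score close to +1:

```python
def silhouette(clusters):
    """Mean silhouette coefficient for pre-assigned clusters of 1-D points:
    s(p) = (b - a) / max(a, b), where a is the mean intra-cluster distance
    of p and b is its mean distance to the nearest other cluster."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for i, p in enumerate(cluster):
            others = [q for j, q in enumerate(cluster) if j != i]
            a = sum(abs(p - q) for q in others) / len(others) if others else 0.0
            b = min(sum(abs(p - q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated toy clusters score close to +1
score = silhouette([[0.0, 0.1], [10.0, 10.1]])
```

Scores near +1 indicate compact, well-separated clusters, near 0 indicate overlapping clusters, and negative values indicate likely misassignments.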
(9) Execution of K-means Clustering: With the optimal value derived in the last
phase, k-means clustering has been implemented to divide the data into five
distinct groups. The algorithm's goal is to group together data points that are
similar in some way, maximizing intra-class similarity while keeping data points
from different clusters as dissimilar as possible. The algorithm iteratively
refines its assignments until each data point is closer to the centroid of its own
cluster than to the centroids of other clusters, thereby decreasing the total
within-cluster distance. The k-means method looks for a predetermined number of
clusters in an unlabelled dataset. A final clustering is constructed iteratively
based on the number of clusters the user has chosen (represented by the variable
K): first, k-means uses randomly chosen data points as proposed group centroids
and then iteratively recalculates the centroids until it finds a good clustering.

In particular, our model operates as follows:


1. A cluster’s centroid is chosen arbitrarily by the algorithm. If “k” equals 3, for
instance, the procedure will randomly pick three centres.
2. According to k-means, a data point belongs to the cluster to which it is geograph-
ically closest, and a point is considered to belong to a cluster if it is closer to the
centroid of that cluster than to any other centroid.
3. The intra-cluster variance is reduced because the method recalculates the centroid
for each cluster by averaging all points within the cluster. As the centroids move,
the procedure reassigns the pointers to the new centres.
4. The approach iteratively computes centroids and assigns points to them until
either the sum of distances between the data points and their respective centroid
is reduced, a maximum number of iterations is reached, or no changes are made
to the value of the centroids.
5. Figure 9 shows the locations of all of the cluster hubs that have been found.
Centroid coordinates are (0.38201978, 0.03202921, 1.20287422, − 1.31619469)
for cluster 1 and so on for the other four groups (Fig. 10).
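Steps 1–4 above can be sketched directly in code. The following is a minimal pure-Python implementation of that loop (illustrative only; the paper's actual implementation is not given), run on a tiny synthetic 2-D dataset:

```python
import random

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    return tuple(sum(c) / len(c) for c in zip(*points))

def kmeans(points, k, max_iter=100, seed=42):
    """Steps 1-4 above: random initial centroids, nearest-centroid
    assignment, centroid recomputation, repeat until stable."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)                       # step 1
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                                  # step 2
            clusters[min(range(k), key=lambda i: dist2(p, centres[i]))].append(p)
        new = [centroid(c) if c else centres[i]           # step 3
               for i, c in enumerate(clusters)]
        if new == centres:                                # step 4: converged
            break
        centres = new
    return centres, clusters

# Two obvious 2-D groups; the fitted centres land near (0.33, 0.33)
# and (10.33, 10.33), the means of the two groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, clusters = kmeans(data, 2)
```

For real workloads a library implementation with k-means++ initialization and multiple restarts is preferable, since plain random initialization is sensitive to the starting centroids.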

(10) Cluster Interpretation: The process ends with the interpretation of the
clusters that have been formed. This involves analysing the properties shared by
each cluster and identifying any recurring patterns or trends. Furthermore, it is
essential to examine the variables included in the analysis and determine how
they contributed to the clustering results.

Fig. 9 WCSS versus k-value (elbow method)



Fig. 10 List of cluster centroids

Fig. 11 3D representation of k-means

(11) Visualization of K-Means Clusters: After deployment of the model, the
necessary analysis must be carried out, and this analysis would not be complete
without visualizations of the k-means clusters. The authors have produced a 3D
visualization of the k-means clustering results, in which the x-axis denotes the
cluster, the y-axis the spending score and the z-axis the age. A 2D scatter plot
visualization has also been included (Figs. 11 and 12).

4 Result Analysis

Cluster 0 (Dark Blue): These individuals have an average income and average
spending patterns. The mean age of the group is 49. When shopping, they exercise
restraint with their spending (Figs. 13 and 14).

Fig. 12 2D representation of k-means (scatter plot)

Fig. 13 3D visualization of segregated clusters as a result of k-means



Fig. 14 Grouped cluster analysis

Cluster 1 (Purple): Customers who fall into this category have both a high income
and a high propensity to spend. The mean age of the group is 28. They are lucrative
customers: discounts and other offers tailored to this group will boost their
spending score, resulting in a greater overall profit.
Cluster 2 (Pink): These individuals have incomes slightly above average and
spending habits slightly above average. The mean age of the group is 55. They
exercise restraint in how much money they spend in the store.
Cluster 3 (Orange): The clients who fall into this category have the greatest
spending scores and the second-highest incomes among all of the groups. Their
typical age is 28. They are lucrative customers: discounts and other offers
tailored to this group will boost their spending score, resulting in a greater
overall profit.
Cluster 4 (Yellow): High income but reduced spending. It is interesting that
despite their high incomes, their spending levels are quite low; their typical age
is 40. These may be individuals who are dissatisfied or displeased with the
services provided by the mall. The mall should focus its marketing efforts on
these people because they have the potential to spend money there. Therefore, the
mall administration should attempt to construct new amenities in order to both
entice these individuals and satisfy their requirements.
The approach has some limitations when applied to mall customer segmentation
data: k-means is sensitive to initial conditions, assumes equal-sized and
spherical clusters, requires a predetermined number of clusters, is sensitive to
outliers and has limited applicability to non-numeric data. These limitations
should be carefully considered and addressed to obtain meaningful and accurate
results.

Table 1 Simulation results: silhouette coefficients of the compared algorithms

Algorithm                        Silhouette coefficient
K-means                          0.6
Hierarchical clustering          0.4
DBSCAN                           0.2
Gaussian mixture model (GMM)     0.5

5 Conclusion

This study examined the relationship between client demographics and product
category using k-means clustering. The authors also discussed k-means
clustering-based segmentation methods. This research shows that customer
demographics like age, gender and income are linked. According to the findings, if
a business owner knows the demographics of the customers they want to attract,
they can plan an effective marketing campaign that could increase their profit
margins. This research paper has examined mall shoppers, their demographics and
their spending habits. Mall foot traffic has been steadily declining as more
people shop online. A mall can measure customer behaviour by tracking what
customers buy and how much they spend, among other things. This study uses k-means
clustering to find groups in the dataset, and some of the characteristics used to
classify consumers have been grouped into their respective categories. The
resulting clusters suggest that age, gender, annual income and spending score may
influence consumers' buying habits. The study found that the average
28.25-year-old had a spending score of 71.67, while those with higher incomes
spent less; the lowest spending score was 15.74 for a 90 k earner. In conclusion,
real-time monitoring will further improve client data analysis.
The authors have also compared the performance of the proposed model with the
existing models, as displayed in Table 1.
As the table shows, k-means outperformed all the other algorithms with an average
silhouette coefficient of 0.6. Hierarchical clustering and GMM performed
reasonably well with scores of 0.4 and 0.5, respectively, while DBSCAN had the
lowest score of 0.2.

6 Future Direction

In the future, the authors will apply various other techniques to the dataset and
compare the results to see which other techniques are suitable for it. The authors
also recommend implementing the technique on other datasets and comparing the
results with the current study for better insight. More features such as family
background, marital status and job title can be added to study the impact of
various features on a person's spending habits.

References

1. Pradana MG, Ha HT (2021) Maximizing strategy improvement in mall customer segmentation


using k-means clustering. J Appl Data Sci 2(1):19–25
2. Shirole R, Salokhe L, Jadhav S (2021) Customer segmentation using RFM model and K-means
clustering. Int J Sci Res Sci Technol 8:591–597
3. Khanizadeh F, Khamesian F, Bahiraie A (2021) Customer segmentation for life insurance in
Iran using k-means clustering. Int J Nonlinear Anal Appl 12(Special Issue):633–642
4. Zhao H-H, Luo X-C, Ma R, Lu X (2021) An extended regularized K-means clustering approach
for high-dimensional customer segmentation with correlated variables. IEEE Access 9:48405–
48412
5. Yoseph F, Ahamed Hassain Malim NH, Heikkilä M, Brezulianu A, Geman O, Paskhal Rostam
NA (2020) The impact of big data market segmentation using data mining and clustering
techniques. J Intell Fuzzy Syst 38(5):6159–6173
6. Shihab SH, Afroge S, Mishu SZ (2019) RFM based market segmentation approach using
advanced k-means and agglomerative clustering: a comparative study. In: 2019 International
conference on electrical, computer and communication engineering (ECCE), pp 1–4
7. Weldode V, Kulkarni S, Udgir S (2018) Study on understanding the decision making styles
of consumers with respect to shopping malls in Pune city. In: Proceedings of international
conference on advances in computer technology and management (ICACTM), pp 206–208
8. Kumar A (2023) Customer segmentation of shopping mall users using K-Means clustering. In:
Advancing SMEs toward e-commerce policies for sustainability. IGI Global, pp 248–270
9. Vohra R, Pahareeya J, Hussain A, Ghali F, Lui A (2020) Using self organizing maps and K
means clustering based on RFM model for customer segmentation in the online retail business.
In: International conference on intelligent computing, pp 1–14
10. Marisa F, Ahmad SSS, Yusof ZIM, Hunaini F, Aziz TMA (2019) Segmentation model of
customer lifetime value in small and medium enterprise (SMEs) using K-means clustering and
LRFM model. Int J Integr Eng 11(3):47–64
11. Wu J et al (2020) An empirical study on customer segmentation by purchase behaviors using
a RFM model and K-means algorithm. Math Probl Eng 2020:1–7
12. Nandapala EYL, Jayasena KPN (2020) The practical approach in Customers segmentation by
using the K-means algorithm. In: 2020 IEEE 15th international conference on industrial and
information systems (ICIIS), pp 344–349
13. Sarangi PK, Ahmed P, Ravulakollu KK (2014) Naïve Bayes classifier with LU factorization for
recognition of handwritten Odia numerals. Indian J Sci Technol 7(1):35–38
14. Sarangi PK, Sahoo AK, Kaur G, Nayak SR, Bhoi AK (2022) Gurmukhi numerals recognition
using ANN. In: Cognitive informatics and soft computing: proceedings of CISC 2021, pp 377–386
15. Xian Z, Keikhosrokiani P, XinYing C, Li Z (2022) An RFM model using K-means clustering
to improve customer segmentation and product recommendation. In: Handbook of research on
consumer behavior change and data analytics in the socio-digital Era. IGI Global, pp 124–145
16. Ensafi Y, Amin SH, Zhang G, Shah B (2022) Time-series forecasting of seasonal items sales
using machine learning—a comparative analysis. Int J Inf Manage Data Insights 2(1):100058
Modified Local Gradient Coding Pattern
(MLGCP): A Handcrafted Feature
Descriptor for Classification of Infectious
Diseases

Rohit Kumar Bondugula and Siba K. Udgata

Abstract Medical image analysis is an important component of modern health care. However, most research in this field focuses on accuracy over other
criteria. This emphasis on accuracy may limit the model’s efficacy in detecting and
managing infectious diseases like viral and bacterial pneumonia and COVID-19,
which exhibit common features. In this work, we propose a novel method, the Modified Local Gradient Coding Pattern (MLGCP), which uses a handcrafted feature descriptor to extract classification features from chest X-rays. The proposed MLGCP model was tested against state-of-the-art techniques and performed better, with 95.50% accuracy and gains on other metrics. This enables a more capable decision-making system that can assist in the early detection and control of infectious diseases. This approach offers a broader perspective on medical imaging analysis,
which can help clinicians make more informed decisions to control the spread of
infectious diseases. Its handcrafted feature descriptor can be customized to specific
applications, making it more versatile and adaptable than other methods.

Keywords Medical imaging · Modified local gradient coding pattern ·


Classification

1 Introduction

Pneumonia, a lung infection, can be brought on by acute respiratory illnesses. The


alveoli, or tiny air sacs in the lungs, fill with air when a healthy person breathes.
Breathing becomes difficult, and oxygen absorption is hampered in pneumonia due
to the pus and fluid-filled alveoli. Pneumonia, which is brought on by bacteria and

R. K. Bondugula · S. K. Udgata (B)


AI Lab, School of Computer and Information Sciences, University of Hyderabad, Hyderabad,
India
e-mail: udgata@uohyd.ac.in
R. K. Bondugula
e-mail: rohitbond@uohyd.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 475
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_36

viruses known as pathogens, is one of the most prevalent infectious diseases seen in clinical settings. It has a high incidence, a rapid onset, and recognizable symptoms including fever, cough, and sputum production. Globally, infectious diseases account for the great majority of pediatric fatalities [1].
In 2019, pneumonia claimed the lives of more than 740,180 children under the age of five, accounting for 14% of all deaths of children in that age range and 22% of all deaths among children aged one to five years. Pneumonia affects children and families everywhere, although fatalities are most widespread in sub-Saharan Africa and South Asia. Pneumonia can be prevented in children with simple precautions and can be managed with inexpensive, non-invasive medical treatment and care [2].
Increased bacterial resistance, brought on by the inappropriate use of antibiotics, and the variety of contemporary pathogenic factors also contribute to the rising prevalence of pneumonia [3]. The patient's life may be in danger if pneumonia is not treated, even though the diagnosis and treatment of the condition have greatly improved [4, 5]. Both H1N1 [6] and SARS [7] are highly contagious diseases brought on by viral infections that not only put people's lives and health at risk but also cause significant economic damage to nations.
One of the most recently discovered infectious diseases is coronavirus disease (COVID-19), caused by the virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first case was reported in Wuhan, China, in December 2019. COVID-19 is widespread and is transmitted directly from infected persons through close interaction and, subsequently, through the air, surfaces, and surroundings with which infected people come into contact [8].
The most frequent signs and symptoms are coughing, fever, headaches, exhaustion, breathing problems, and a loss of taste and smell. The
lung is the first organ the virus targets in the human body, though it also affects
other organs. In the lungs, the disease produces viral pneumonia, leading to acute
respiratory issues and the creation of a lung lesion. Segmenting the virus-infected
lung tissue is crucial for the subsequent assessment, and computed tomography (CT)
imaging has become a vital tool for finding infected lung tissue. Studies conducted in
the past have shown that COVID-19 inchoate screening using radiological imaging
is successful [9].
Other reported symptoms include fever, breathing difficulty, headache, weariness, and dyspnea [8, 10, 11].
Due to the limited availability of vaccines for the novel coronavirus (COVID-19), there is an intense need to diagnose the infection at an early stage and to quarantine the infected person immediately for treatment to stop the spread. Quickly diagnosing the infected person's symptoms and quarantining them is therefore critical to containing the disease's spread [12].
Millions of people have been affected by and have died from this deadly virus. The doctors cannot be blamed, as they carry considerable responsibility with limited resources. However, we can lessen their burden by devising an AI model that can be leveraged to diagnose whether a subject is potentially a carrier of the disease [13].
Researchers worldwide are working on several domains associated with COVID-
19 treatment and diagnosis, e.g., vaccine development, medicine for the treatment,
and medical equipment to diagnose and detect COVID-19. The healthcare sector
is concentrating on cutting-edge technology that can identify, track, and diagnose
illness as well as stop the COVID-19 pandemic from spreading. The Internet of Medical Things (IoMT) is a complex instrument that can track individuals through crowd screening, monitoring, notification, and infection detection, as well as control the spread of infections through contact tracing and by informing healthcare authorities [13, 14]. Bondugula et al. [15] proposed a novel method for the classification of infectious diseases.
According to the state of the art described in Table 1, radiological imaging of pneumonia and COVID-19, together with artificial intelligence, can be helpful for a precise and prompt diagnosis of illness [16]. According to the comprehensive assessment, chest X-ray and CT images can be utilized to diagnose pneumonia and COVID-19 patients at an early stage [17]. Hence, in this work, we propose MLGCP, a handcrafted feature extraction method for the classification of infectious diseases from chest X-ray images.
The rest of the work is structured as follows. The proposed method is thoroughly detailed in Sect. 2. Section 3 describes the experimental design, and Sect. 4 presents the results along with a comparison and remarks. Finally, in Sect. 5, we conclude by discussing future work.

1.1 Limitation of the Related Work

After a thorough literature review, some limitations were identified, as discussed below.

• Most of the research focuses on improving accuracy but does not emphasize understanding the changing patterns of infectious diseases.
• Although there are a few works on infectious diseases, they do not focus on handcrafted feature extraction techniques for distinguishing the several classes.
• Emphasis was given only to accuracy, while other performance metrics were not used for the experimental analysis.

Table 1 Summary of the several models in the literature review

Technique | Database | Performance evaluation | Modality
GAN and transfer learning models (ResNet18, AlexNet, SqueezeNet) [18] | Two classes, normal and pneumonia, with 5863 images [19, 20] | Recall, precision, and F1-score of ResNet18 is 98.97% | X-ray
DarkNet [21] | 127 COVID-19 positive cases [22] | Binary classification accuracy 98.08%; multi-class 87% | X-ray
ResNet50 and VGG16 [23] | 102 each of COVID-19 positive and pneumonia [24] | Accuracy = 89.2%, AUC = 0.95 | X-ray
Inception-Net [25] | D1: 73 positive and 300 healthy; D2: 73 positive and 80 healthy; D3: 73 positive and 1583 healthy [21, 22, 26] | 99.96% accuracy, AUC of 1 | X-ray
Otsu method [27] | 20 axial-view slices and 90 coronal-view slices | COVID-19 and its rate are detected | CT
Inception-ResNetV2 [28] | 50 positive and 50 negative cases [26] | Accuracy 87% | CT
Random forest model [29] | 176 COVID-19 positive images | Accuracy = 87.5%, AUC = 0.91, TP = 93.3% | CT
MODE and CNN [30] | Binary classification of images | Sensitivity 95%, specificity 93% | CT

1.2 Contributions to the Current Work

1. We propose a Modified Local Gradient Coding Pattern (MLGCP) method for early diagnosis of infectious diseases.
2. Based on the feature descriptor, the method can be leveraged for other medical image modalities of infectious diseases.
3. The objective is also to compare several performance metrics and to focus on reducing false positives and false negatives to control the community spread of the disease.

Fig. 1 Feature extraction through LGC. a Sample 3×3 mask [k4 k3 k2; k5 Gc k1; k6 k7 k8], b sample 3×3 image patch [122 117 124; 146 141 144; 157 151 145], c feature value obtained through LGC (96), d feature value obtained through MLGCP (21)

2 Proposed Methodology

2.1 Modified Local Gradient Coding Pattern

Initially, Tong et al. [31] proposed Local Gradient Coding (LGC) for the extraction of face features. LGC extracts the texture characteristics of a 3×3 neighborhood. To create an eight-bit binary number, the LGC operator encodes the gradient data in the horizontal, vertical, and diagonal directions. The resulting binary number is transformed into a decimal value, which is then substituted for the center pixel. This procedure is performed over the entire image, and the final feature vector is created by concatenating all of the histogram features block by block. This LGC encoding captures the constant expression-specific texture properties in both directions. Equation (1) defines the thresholding function used for feature extraction with the LGC operator; a 3×3 mask is used, with D = 1 (radius) and P = 8 neighbors. Some extensions of the LGC operator have also been proposed, namely the LGC-FN, LGC-HD, and LGC-AD operators.

s(m, n) = 1, if m ≥ n; 0, otherwise (1)

MLGCP_DP = s(k4, k2) × 21 + s(k5, k1) × 13 + s(k6, k8) × 8 + s(k4, k6) × 5
+ s(k3, k7) × 3 + s(k2, k8) × 2 + s(k4, k8) × 1 + s(k2, k6) × 1 (2)

Local Gradient Coding Pattern (LGCP) is a method used in image analysis for
feature extraction. It is based on the concept of the local gradient orientation of image
pixels and its distribution within a neighborhood. LGCP extracts texture features by
encoding the local gradient orientations and their relationships with neighboring
pixels. It has been applied to various image classification tasks, including medical
image analysis. In LGC, as binary weights are used, the feature vector length is 256.
In order to effectively reduce the feature vector length of LGC, MLGCP has been proposed.
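As a concrete illustration of the coding scheme, the sketch below (our illustration, not the authors' implementation) computes both the LGC code with binary weights and the MLGCP code with Fibonacci weights for the 3×3 patch shown in Fig. 1:

```python
def s(m, n):
    # Thresholding function of Eq. (1): 1 if m >= n, else 0
    return 1 if m >= n else 0

def lgc_and_mlgcp(patch):
    """Codes of a 3x3 patch laid out as [[k4, k3, k2], [k5, Gc, k1], [k6, k7, k8]]."""
    (k4, k3, k2), (k5, _gc, k1), (k6, k7, k8) = patch
    # The eight gradient comparisons, in the order of the coding formula
    bits = [s(k4, k2), s(k5, k1), s(k6, k8), s(k4, k6),
            s(k3, k7), s(k2, k8), s(k4, k8), s(k2, k6)]
    lgc = sum(b * w for b, w in zip(bits, [128, 64, 32, 16, 8, 4, 2, 1]))
    mlgcp = sum(b * w for b, w in zip(bits, [21, 13, 8, 5, 3, 2, 1, 1]))
    return lgc, mlgcp

# Sample image patch from Fig. 1b
print(lgc_and_mlgcp([[122, 117, 124], [146, 141, 144], [157, 151, 145]]))
# -> (96, 21), matching the values in Fig. 1c and Fig. 1d
```

Only two comparisons fire for this patch (k5 ≥ k1 and k6 ≥ k8), giving 64 + 32 = 96 with binary weights and 13 + 8 = 21 with Fibonacci weights.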

Table 2 Data statistics of the dataset


Class Images
COVID 4400
Normal 26,800
Pneumonia 26,900
Total 58,100

Modified Local Gradient Coding Pattern (MLGCP) is an extension of the LGCP


method specifically developed for medical image analysis, including classifying
infectious diseases such as viral and bacterial pneumonia and COVID-19. In MLGCP,
instead of binary weights, Fibonacci weights are used. With Fibonacci weights, the feature vector length is reduced from 256 to 55, since the weights 21, 13, 8, 5, 3, 2, 1, and 1 sum to 54 and the codes therefore range from 0 to 54. MLGCP works by encoding
a given image’s local gradient orientation distribution in a specific way, to enhance
the feature representation of infectious diseases in chest X-rays.
In MLGCP, the image is divided into a set of overlapping blocks, and the gradient
orientation histogram is calculated for each block. To enhance the discriminative
power of the feature descriptor, the gradient orientation histogram is further processed
to extract local patterns. These patterns are then used to create a feature vector that
describes the image as shown in Fig. 1. Finally, a classification algorithm is used to
classify the image based on its feature vector.
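The block-histogram pipeline described above can be sketched as follows (a minimal, unoptimized illustration; the block size and helper names are our assumptions, not the paper's implementation):

```python
import numpy as np

FIB = [21, 13, 8, 5, 3, 2, 1, 1]  # Fibonacci weights: codes fall in 0..54

def mlgcp_code(p):
    # 3x3 neighborhood laid out as [[k4, k3, k2], [k5, Gc, k1], [k6, k7, k8]]
    (k4, k3, k2), (k5, _gc, k1), (k6, k7, k8) = p
    bits = [k4 >= k2, k5 >= k1, k6 >= k8, k4 >= k6,
            k3 >= k7, k2 >= k8, k4 >= k8, k2 >= k6]
    return sum(int(b) * w for b, w in zip(bits, FIB))

def mlgcp_descriptor(image, block=32):
    """Concatenate a 55-bin MLGCP histogram per block of the code map."""
    img = np.asarray(image, dtype=int)
    h, w = img.shape
    codes = np.array([[mlgcp_code(img[i:i + 3, j:j + 3])
                       for j in range(w - 2)] for i in range(h - 2)])
    feats = []
    for bi in range(0, codes.shape[0], block):
        for bj in range(0, codes.shape[1], block):
            hist, _ = np.histogram(codes[bi:bi + block, bj:bj + block],
                                   bins=55, range=(0, 55))
            feats.append(hist)
    return np.concatenate(feats)  # feature vector fed to the classifier
```

A real implementation would vectorize the comparisons, but the structure is the same: a 55-value code map, blockwise histograms, and concatenation into one descriptor.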
The proposed method has several advantages over other feature extraction meth-
ods. First, it is a handcrafted feature descriptor, which can be customized for specific
applications, making it more versatile and adaptable. Second, it captures the texture
information of the image in a robust and discriminative way, making it suitable for
medical image analysis. Finally, it has been shown to be as accurate as state-of-the-art
techniques, indicating its potential for use in clinical practice.

3 Experimental Design

3.1 Dataset Description

A benchmark chest X-ray image dataset was used in our experiment. The chest X-ray images are divided into three categories:

• COVID: The chest X-ray images in which the findings are labeled as COVID-19
by the doctors.
• Pneumonia: The chest X-ray images of infections caused by bacterial or viral agents, labeled as pneumonia by the physicians.
• Normal: In these chest X-ray images, the findings are found to be healthy and
normal.

3.2 Experimentation

Initially, the chest X-ray images are loaded with RGB channels. Since chest X-rays carry no color information, we converted the channels to grayscale images. As a result, space is saved and calculations are accelerated. Traditional machine learning methods demand a 2D array for training, so the images are flattened and passed as input.
We performed experiments for the early classification and detection of infectious diseases using three classes of chest X-ray images: normal, COVID, and pneumonia. The performance of the proposed MLGCP method is evaluated using a tenfold cross-validation procedure for classification. Eighty percent of the chest X-ray image data are used for training and the remaining 20% for the test set. The experiments are repeated ten times, and the MLGCP method is trained for 100 epochs.
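The preprocessing and fold-splitting steps above can be sketched as follows (our illustration with placeholder data; the luminance coefficients and helper names are assumptions, not the paper's code):

```python
import numpy as np

def to_gray_flat(images_rgb):
    """RGB -> grayscale (luminance), then flatten to the 2-D array
    (n_samples, n_features) that traditional classifiers require."""
    gray = images_rgb[..., :3] @ np.array([0.299, 0.587, 0.114])
    return gray.reshape(len(gray), -1)

def tenfold_indices(n, seed=0):
    """Shuffle sample indices and yield (train, test) index pairs for 10 folds."""
    idx = np.random.default_rng(seed).permutation(n)
    for fold in np.array_split(idx, 10):
        yield np.setdiff1d(idx, fold), fold

rng = np.random.default_rng(1)
images = rng.random((30, 16, 16, 3))   # placeholder "chest X-rays"
X = to_gray_flat(images)               # shape (30, 256)
folds = list(tenfold_indices(len(X)))  # 10 train/test splits
print(X.shape, len(folds))
```

Each fold holds out roughly 10% of the samples for testing; in the paper's setting the splits are drawn so that about 80% of the data trains the model in each repetition.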

4 Results and Discussions

4.1 Discussions and Performance Comparison

In Table 3, we compare the proposed method using various metrics on the chest X-ray dataset. For a fair comparison, the results of the proposed methodology were reported using tenfold validation. The proposed method had an average accuracy of 95.50% with an average specificity of 97.29%, a good improvement over the baseline models; the best performances are given in Table 4. The confusion matrix results are shown in Fig. 2. The performance metrics of the proposed method are plotted using the box plot in Fig. 3(h), and the overall performance metrics are shown in Fig. 3. We report the maximum, minimum, and average results of the performance metrics in Table 4. As given in Table 5, we observe that our proposed method, MLGCP, achieved an accuracy of 95.50%, which is better than that of the other methods.
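For reference, the per-class metrics reported in Table 3 can be derived from a multi-class confusion matrix in a one-vs-rest fashion. A minimal sketch (our illustration, with a made-up confusion matrix, not the paper's actual counts):

```python
import numpy as np

def per_class_metrics(cm):
    """One-vs-rest metrics from a confusion matrix cm, where
    cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # missed cases of each class
    fp = cm.sum(axis=0) - tp   # false alarms for each class
    tn = cm.sum() - tp - fn - fp
    return {"accuracy": tp.sum() / cm.sum(),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp)}

cm = [[90, 5, 5],   # hypothetical counts: normal / COVID / pneumonia
      [3, 94, 3],
      [4, 2, 94]]
m = per_class_metrics(cm)
print(round(float(m["accuracy"]), 3))  # 0.927
```

Averaging these per-class values over the ten folds yields entries of the kind shown in Tables 3 and 4.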

Fig. 2 Confusion matrix



Fig. 3 Comparison of the overall performance metrics: a accuracy, b sensitivity, c specificity, d precision, e F1-score, f MCC, g kappa, h performance comparison with box plot



Table 3 Comparison of various metrics on chest X-ray dataset


Fold Accuracy Sensitivity Specificity Precision F1-score MCC Kappa
Fold 1 95.71 95.01 97.42 95.86 95.43 92.88 90.36
Fold 2 95.71 94.48 97.11 95.78 95.11 92.28 89.31
Fold 3 95.40 94.66 97.24 95.39 95.01 92.29 89.66
Fold 4 96.21 95.24 97.69 96.68 95.93 93.70 91.48
Fold 5 95.49 94.72 97.28 95.70 95.20 92.52 89.85
Fold 6 95.44 94.30 97.27 95.27 94.77 92.09 89.74
Fold 7 95.28 94.64 97.16 95.43 95.02 92.21 89.39
Fold 8 94.87 93.83 96.90 95.11 94.45 91.41 88.46
Fold 9 95.89 95.07 97.50 96.24 95.64 93.19 90.74
Fold 10 95.51 94.54 97.28 95.77 95.14 92.47 89.89
Average 95.50 94.65 97.29 95.72 95.17 92.50 89.89

Table 4 Performance metrics on chest X-ray dataset


Metric Accuracy Sensitivity Specificity Precision F1-score MCC Kappa
Maximum 97.42 97.52 98.49 97.79 97.07 95.36 94.19
Minimum 93.29 90.73 95.89 92.14 92.43 88.56 84.90
Average 95.50 94.65 97.29 95.72 95.17 92.50 89.89

Table 5 Comparison with three classes


Study Method Accuracy
Ozturk [21] DarkCOVIDNet 87.02
Wang and Wong [32] COVID-Net 92.4
Ioannis et al. [33] VGG-19 93.48
Ioannis et al. [33] MobileNet v2 94.72
Khan et al. [34] CoroNet 95
Toraman et al. [35] CapsNet 84.22
Proposed MLGCP 95.50

5 Conclusions and Future Scope

In this research, we proposed the Modified Local Gradient Coding Pattern (MLGCP) method, which uses a handcrafted feature descriptor for the classification of infectious diseases.
The model was tested using a publicly available dataset that includes three classes, COVID, pneumonia, and normal, to evaluate the performance by extracting the features and classifying the infectious diseases. The results showed that the MLGCP method gave 95.50% overall accuracy across all tenfolds. Additionally, the proposed method performed better than the other baseline models applied to the same dataset, outperforming them on the reported metrics. Furthermore, as given in Table 5, the performance metrics of our method are reasonably good when compared with the other studies. The
MLGCP method has several advantages over other feature extraction methods. First,
it is a handcrafted feature descriptor, which can be customized for specific applications, making it more versatile and adaptable. Second, it captures the texture information of the image in a robust and discriminative way, making it suitable for medical image analysis. Finally, it has been shown to be as accurate as state-of-the-art techniques,
indicating its potential for use in clinical practice. In the future, we will collect our own dataset and work closely with inputs from medical practitioners.

References

1. Carden DL, Smith JK (1989) Pneumonias. Emerg Med Clin North Am 7(2):255–278
2. “Statistical data of pneumonia.” https://www.who.int/news-room/fact-sheets/detail/
pneumonia. Accessed: 2010-09-30
3. Jain S, Self WH, Wunderink RG, Fakhran S, Balk R, Bramley AM, Reed C, Grijalva CG, Ander-
son EJ, Courtney DM et al (2015) Community-acquired pneumonia requiring hospitalization
among us adults. N Engl J Med 373(5):415–427
4. Jones BE, Herman DD, Dela Cruz CS, Waterer GW, Metlay JP, Ruminjo JK, Thomson CC
(2020) Summary for clinicians: clinical practice guideline for the diagnosis and treatment of
community-acquired pneumonia. Ann Am Thorac Soc 17:133–138
5. Zilberberg MD, Nathanson BH, Puzniak LA, Shorr AF (2022) Descriptive epidemiology
and outcomes of nonventilated hospital-acquired, ventilated hospital-acquired, and ventilator-
associated bacterial pneumonia in the united states, 2012–2019. Crit Care Med 50(3):460
6. Dawood FS, Iuliano AD, Reed C, Meltzer MI, Shay DK, Cheng P-Y, Bandaranayake D,
Breiman RF, Brooks WA, Buchy P et al (2012) Estimated global mortality associated with
the first 12 months of 2009 pandemic influenza a h1n1 virus circulation: a modelling study.
Lancet Infect Dis 12(9):687–695
7. Hui DS, Zumla A (2019) Severe acute respiratory syndrome: historical, epidemiologic, and
clinical features. Infect Dis Clin 33(4):869–889
8. Novel CPERE et al (2020) The epidemiological characteristics of an outbreak of 2019 novel
coronavirus diseases (covid-19) in China. Zhonghua liu xing bing xue za zhi= Zhonghua
liuxingbingxue zazhi 41(2):145
9. Dixit A, Mani A, Bansal R (2021) Cov2-detect-net: design of covid-19 prediction model based
on hybrid de-PSO with SVM using chest x-ray images. Inf Sci 571:676–692
10. Wang D, Hu B, Hu C, Zhu F, Liu X, Zhang J, Wang B, Xiang H, Cheng Z, Xiong Y et
al (2020) Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–
infected pneumonia in Wuhan, china. JAMA 323(11):1061–1069
11. Xie Z (2020) Pay attention to sars-cov-2 infection in children. Pediatr Invest 4(1):1–4
12. Bommi NS, Bommi SK (2022) A parallelized approach toward solving the weighted consensus
model for classifying covid-19 infection. Intell Syst 371–380
13. Udgata SK, Suryadevara NK, Internet of things and sensor network for covid-19
14. Bondugula RK, Udgata SK, Rahman N, Sivangi KB (2022) Intelligent analysis of multimedia
healthcare data using natural language processing and deep-learning techniques. In: Edge-of-
Things in personalized healthcare support systems. Elsevier, pp 335–358
15. Bondugula RK, Udgata SK, Bommi NS (2021) A novel weighted consensus machine learning
model for covid-19 infection classification using CT scan images. Arab J Sci Eng, 1–12

16. Santosh K (2020) Ai-driven tools for coronavirus outbreak: need of active learning and cross-
population train/test models on multitudinal/multimodal data. J Med Syst 44(5):1–5
17. Liu K-C, Xu P, Lv W-F, Qiu X-H, Yao J-L, Jin-Feng G et al CT manifestations of coronavirus disease-2019: a retrospective analysis of 73 cases by disease severity. Eur J Radiol, 108941
18. Khalifa NEM, Taha MHN, Hassanien AE, Elghamrawy S (2020) Detection of coronavirus
(covid-19) associated pneumonia based on generative adversarial networks and a fine-tuned
deep transfer learning model using chest x-ray dataset. arXiv preprint arXiv:2004.01184
19. dos Santos DP, Brodehl S, Baeßler B, Arnhold G, Dratsch T, Chon S-H, Mildenberger P,
Jungmann F (2019) Structured report data can be used to develop deep learning algorithms: a
proof of concept in ankle radiographs. Insights Imaging 10(1):93
20. Irvin J, Rajpurkar P, Ko M, Yu Y, Ciurea-Ilcus S, Chute C, Marklund H, Haghgoo B, Ball
R, Shpanskaya K et al Chexpert: a large chest radiograph dataset with uncertainty labels and
expert comparison. In: Proceedings of the AAAI conference on artificial intelligence, vol 33,
pp 590–597
21. Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR (2020) Automated
detection of covid-19 cases using deep neural networks with x-ray images. Comput Biol Med,
103792
22. Cohen JP, Morrison P, Dao L (2020) Covid-19 image data collection. arXiv 2003.11597
23. Hall LO, Paul R, Goldgof DB, Goldgof GM (2020) Finding covid-19 from chest x-rays using
deep learning on a small dataset. arXiv preprint arXiv:2004.02060
24. Minaee S, Kafieh R, Sonka M, Yazdani S, Soufi GJ (2020) Deep-covid: predicting covid-19
from chest x-ray images using deep transfer learning. arXiv preprint arXiv:2004.09363
25. Das D, Santosh K, Pal U (2020) Truncated inception net: Covid-19 outbreak screening using
chest x-rays. Phys Eng Sci Med, 1–11
26. Mooney P (2020) kaggle chest x-ray images (pneumonia) dataset
27. Narin A, Kaya C, Pamuk Z (2020) Automatic detection of coronavirus disease (covid-19) using
x-ray images and deep convolutional neural networks. arXiv preprint arXiv:2003.10849
28. Rajinikanth V, Dey N, Raj ANJ, Hassanien AE, Santosh K, Raja N (2020) Harmony-search
and otsu based system for coronavirus disease (covid-19) detection using lung ct scan images.
arXiv preprint arXiv:2004.03431
29. Tang Z, Zhao W, Xie X, Zhong Z, Shi F, Liu J, Shen D (2020) Severity assessment of coronavirus
disease 2019 (covid-19) using quantitative features from chest CT images. arXiv preprint
arXiv:2003.11988
30. Wu Y-H, Gao S-H, Mei J, Xu J, Fan D-P, Zhao C-W, Cheng M-M (2020) Jcs: an explain-
able covid-19 diagnosis system by joint classification and segmentation. arXiv preprint
arXiv:2004.07054
31. Tong Y, Chen R, Cheng Y (2014) Facial expression recognition algorithm using LGC based
on horizontal and diagonal prior principle. Optik 125(16):4186–4189
32. Linda W (2020) A tailored deep convolutional neural network design for detection of covid-19
cases from chest radiography images. J Netw Comput Appl
33. Apostolopoulos ID, Mpesiana TA (2020) Covid-19: automatic detection from x-ray images
utilizing transfer learning with convolutional neural networks. Phys Eng Sci Med, 1
34. Khan AI, Shah JL, Bhat MM (2020) Coronet: a deep neural network for detection and diagnosis
of covid-19 from chest x-ray images. Comput Methods Programs Biomed, 105581
35. Toraman S, Alakus TB, Turkoglu I (2020) Convolutional capsnet: a novel artificial neural
network approach to detect covid-19 disease from x-ray images using capsule networks. Chaos,
Solitons Fractals 140:110122
Revolutionising Food Safety
Management: The Role of Blockchain
Technology in Ensuring Safe
and High-Quality Food Products

Urvashi Sugandh, Swati Nigam, and Manju Khari

Abstract Ensuring food safety and quality is of utmost importance for public health
and consumer trust. In recent years, blockchain technology (BCT) has been increas-
ingly explored as a potential solution to revolutionise food safety management
systems. This paper provides an inclusive overview of the role of BCT in ensuring
safe and high-quality food products. Drawing on a review of relevant literature and a
case study analysis, the paper evaluates the potential benefits and limitations of BCT
in the food industry. The results show that blockchain-based food safety management
systems offer several advantages, including increased transparency, traceability, and
efficiency. However, there are also challenges related to implementation, such as the
need for standardisation and interoperability. The paper concludes by offering recom-
mendations for the adoption and implementation of BCT in the food industry and
identifying opportunities for further research. The findings of this study have impli-
cations for policymakers, food safety regulators, and food industry stakeholders
who seek to improve food safety and quality through the adoption of innovative
technologies.

Keywords Smart agriculture · Blockchain technology · Distributed ledger ·


Agriculture

1 Introduction

Public health and consumer trust depend on food safety and quality. According to the WHO, 600 million people become sick and 420,000 die each year from tainted food. BCT may transform food safety management methods. Blockchain offers safe, transparent, and

U. Sugandh (B) · S. Nigam


Department of Computer Science, Faculty of Mathematics and Computing, Banasthali Vidyapith,
Banasthali, India
e-mail: Urvashi5sugandh@hotmail.com
M. Khari
School of Computer and System Sciences, Jawaharlal Nehru University, New Delhi, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 487
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_37
488 U. Sugandh et al.

immutable transaction recording and tracking. Stakeholders and politicians are inter-
ested in its potential to improve food supply chain transparency and traceability
[1].
This study discusses BCT’s significance for food safety. It evaluates BCT’s food
sector advantages and drawbacks using relevant research and case studies. The study
intends to improve knowledge of BCT in food safety management systems and
highlight research gaps.

1.1 Background and Context of the Problem

Given the hazards of tainted food, food safety and quality are global concerns. Foodborne infections affect one in ten people and kill roughly 420,000 each year, according to the WHO. From manufacturing to consumption, food poisoning may harm public
health and the economy. Complex food safety management systems encompass food
growers, processors, distributors, retailers, regulators, and consumers [2]. Paper-
based food safety management methods are laborious and error-prone. These tech-
nologies are also opaque and untraceable, making contamination detection and
prevention challenging.
BCT may transform food safety management methods. BCT records and tracks
transactions securely, transparently, and immutably [3]. BCT might improve food
supply chain transparency, traceability, and efficiency, lowering foodborne diseases
and enhancing customer confidence [4]. BCT has the potential to improve the food
business, but standardisation and compatibility are still issues. This article evaluates
BCT’s role in ensuring safe and high-quality food items and identifies prospects for
additional study and novel solutions to enhance food safety management procedures.

1.2 Purpose and Significance of the Study

This research evaluates blockchain’s significance in food safety and quality. This
endeavour will enhance knowledge of BCT in food safety management systems
and identify research gaps. BCT’s ability to solve global food safety and quality
challenges makes the research noteworthy. The study asks: what are the pros and
cons of employing BCT in food safety management systems? How effective are blockchain-based food safety management systems compared with conventional methods? What are the consequences for food safety management, for BCT uptake and implementation in the food sector, and for research limits and opportunities?
The study is significant to food safety and public health stakeholders, policy-
makers, and researchers. Food producers, processors, wholesalers, and retailers, who
are accountable for product safety and quality, will be interested in the research
results. Regulators and policymakers interested in using BCT to enhance food safety
management may find the study’s suggestions helpful.

1.3 Research Questions and Objectives

The primary objective of this study is to evaluate the role of BCT in ensuring safe
and high-quality food products. To achieve this objective, the study aims to address
the following research questions:
. What are the potential benefits of using BCT in food safety management systems,
and how do they compare to traditional systems?
. What are the limitations of using BCT in food safety management systems, and
how can these limitations be addressed?
. How can BCT improve transparency and traceability in the food supply chain,
and what are the implications of these improvements for food safety management
practices?
. What are the current challenges and opportunities for the adoption and imple-
mentation of BCT in the food industry, and how can these challenges be
addressed?
The study’s goals are to evaluate BCT in systems for managing food safety,
identify its pros and cons, and advocate its acceptance and application in the food
business. The study also intends to uncover prospects for BCT research in systems
for managing food safety and their implications for food safety practices. The project
seeks to improve the operation of food safety systems, reduce foodborne diseases,
and boost customer confidence in the food business.

1.4 Overview of the Paper

The article begins with background on food safety and the need for effective systems
for managing food safety. Section 2 reviews literature on BCT, food management
systems for safety, and food sector blockchain applications. Section 3 describes
the proposed approach used to assess blockchain’s involvement in food safety and
quality. Section 4 presents the main findings of the research and answers the research
questions. Section 4 also discusses food safety management techniques, BCT acceptance
and application in the food business, and the limits and prospects for additional
study. Finally, Sect. 5 concludes the article.
490 U. Sugandh et al.

2 Literature Review

2.1 Overview of BCT

BCT records data securely, transparently, and immutably. The technology was
designed for digital currencies like Bitcoin, but it has many other uses. Health care,
logistics, and supply chain management are using BCT. Blockchains are secure,
transparent digital ledgers [5, 6]. Each block in the chain includes a cryptographic
hash of the preceding block, preserving all transactions. This makes data tampering
almost impossible, assuring its validity.
Blockchain might improve the management of food safety systems. BCT may
make the food supply chain more traceable and transparent, from raw supplies to sales
[7, 8]. Stakeholders can promptly detect and fix safety concerns, reducing foodborne
disease risk. BCT may minimise food fraud and forgery, enhancing customer confi-
dence in the food business [9, 10]. Stakeholders may verify food items’ authenticity
and quality across the supply chain using BCT.

2.2 Existing Research on Blockchain in the Food Industry

BCT’s decentralised, secure, and transparent supply chain data recording and moni-
toring might improve food safety. Food safety management research has examined
blockchain’s pros and cons.
González-Puetate et al. [1] examine how BCT might improve agri-food safety.
Blockchain can improve the safety of food, traceability, and supply chain trans-
parency, according to the authors. Xu et al. [2] analyse BCT’s present and potential
applications in food safety regulation. Blockchain is used for product traceability,
fraud protection, and the management of supply chains.
Hong et al. [3] use China’s Zhihu platform to study public perceptions of BCT
in food safety management. The authors discovered that food safety blockchain
knowledge is minimal but rising. They also recommend blockchain education and
marketing. Krishna et al. [4] propose a blockchain-based food security paradigm. The
authors propose a blockchain-based food tracking system to improve food safety.
Wang et al. [11] propose a blockchain food safety governance paradigm.
Blockchain may improve food supply chain transparency, traceability, and account-
ability, the authors say. Singh et al. [12] examine food safety using BCT. Food trace-
ability, management of supply chains, and fraud detection are blockchain-based food
safety applications.
Vu et al. [13] evaluate food supply chain blockchain adoption and suggest
an implementation methodology. Blockchain can improve food traceability, trans-
parency, and supply chain confidence, according to the authors. Yang et al. [14]
examine how blockchain might assist food supply chains with platform operations

during the COVID-19 pandemic. The authors describe a blockchain-based
infrastructure that facilitates contactless delivery and reduces COVID-19 transfer
to improve food safety.
Shrivastava and Jain [15] noted the benefits of blockchain in food safety and
inspection, including transparency, fraud reduction, and traceability. Uzair et al. [16]
reviewed the literature on blockchain-based food traceability and found that it can
improve food supply chain openness and accountability while lowering foodborne
disease outbreaks.
Li et al. [17] described blockchain-based food tracking, its merits, and its draw-
backs, namely data privacy and system compatibility. Bhattacharya et al. [18] found
that BCT may enhance food safety by increasing traceability, accountability, and
transparency.
Ricci et al. [19] reviewed BCT in the agri-food business and found that stan-
dardisation, interoperability, and stakeholder participation are needed to maximise
blockchain’s potential. Lastly, Li et al. [20] discussed BCT’s capacity to improve
traceability, transparency, accountability, and food fraud and waste.
Yang et al. [21] presented a blockchain-based food tracking system using smart
contracts to enhance food safety and quality. The system was tested using a Chinese
dairy production and transportation case study. The solution increased milk supply
chain traceability and transparency, minimising food safety issues.
Xu et al. [22] presented a blockchain-based system for managing food safety that
records and tracks food product information. The method was tested using a Chinese
pork production case study. The technology enhanced food safety management’s
accuracy and efficiency, minimising foodborne disease.
Shen et al. [23] presented a blockchain-based system for managing food safety
using IoT technologies to increase food traceability and security. The system was
tested using a Chinese vegetable farming and distribution case study. The solution
increased vegetable supply chain traceability and transparency, minimising food
safety issues.

2.3 Advantages and Limitations of BCT

BCT might transform food safety management. Blockchain improves supply chain
transparency, food traceability, and foodborne disease prevention. BCT has limits.
This study examines blockchain-based food safety management’s pros and cons.
The benefits of blockchain-based food safety management are shown in Fig. 1.

Improved Traceability and Security


BCT can monitor food goods from source to consumer. This reduces food fraud and
contamination. Food firms may track and eliminate tainted items from the supply
chain using blockchain.

Fig. 1 Advantages of blockchain in food safety

Increased Transparency
BCT’s decentralised database provides real-time food product tracking and moni-
toring. Consumers may quickly get information about food goods they buy, which
helps develop trust between food firms and customers.

Lower Expenses
BCT for the management of food safety may minimise foodborne disease outbreaks
and product recall costs. Companies may reduce recall costs and scope by identifying
and tracing affected items quicker.

Improved Food Quality


BCT’s auditable supply chain records increase food quality. This reduces food
deterioration and contamination by ensuring proper storage and transit.
The limitations of blockchain-based food safety management are shown in Fig. 2.

Low Adoption
The food business is still adopting BCT, despite its potential advantages. New
technology that needs major process adjustments may deter food firms from investing.

Complexity
Using BCT requires technical skill and money. Smaller food firms may struggle to
adapt.

Data Management
BCT demands a lot of data, which might be difficult for certain firms to manage. The
blockchain's correctness also depends on the accuracy of the data placed on it.

Fig. 2 Limitations of blockchain in food safety

Regulatory Issues
BCT for food safety monitoring may provide data privacy and security issues.
Businesses must follow guidelines and requirements to avoid legal difficulties.

3 Proposed Food Safety System

Figure 3 shows the food safety and quality monitoring system. Each entity involved
in the food supply chain (i.e. the farm, processor, distributor, retailer, and
consumer) initiates and verifies transactions related to food safety, such as quality
checks, certifications, and shipping information. These transactions are added to a
block, which is then validated by the network nodes (i.e. the computers connected
to the blockchain network). Once the block is validated, it is added to the chain,
which is an immutable record of all the transactions that have occurred in the supply
chain. This allows for greater transparency and traceability, which can help ensure
the safety and quality of food products.
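The hash-linked block structure described above can be illustrated with a minimal sketch; the transaction fields (actor, event, lot) are hypothetical examples, not the paper's schema:

```python
import hashlib
import json
import time

def make_block(transactions, prev_hash):
    # each transaction is a food-safety event, e.g. a quality check or shipment
    block = {"timestamp": time.time(),
             "transactions": transactions,
             "prev_hash": prev_hash}
    # the block's hash covers its contents and the previous block's hash,
    # so altering any earlier block invalidates every later link in the chain
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block([{"actor": "farm", "event": "harvest", "lot": "A-17"}],
                     "0" * 64)
nxt = make_block([{"actor": "processor", "event": "quality_check", "lot": "A-17"}],
                 genesis["hash"])
```

In a real deployment the network nodes would validate each block before appending it; the hash link alone is what makes retroactive tampering detectable.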
Here are the step-by-step algorithms for each stakeholder in the blockchain-based
food safety management system, and Fig. 4 shows the workflow.

3.1 Farmers

• Register on the blockchain platform.

Fig. 3 Diagram for food safety and quality monitoring system

Fig. 4 Workflow of proposed food safety and traceability system



• Provide details about the farm, including location, crop type, and planting date.
• Record data about the use of pesticides, fertilisers, and other chemicals.
• Update information about crop growth, harvest dates, and yields.
• Upload certificates of authenticity and other relevant documents.

3.2 Processors

• Register on the blockchain platform.
• Record data about incoming raw materials, including quantity, quality, and origin.
• Monitor and record the processing steps for each batch of product.
• Perform quality checks and record the results.
• Upload certificates of authenticity and other relevant documents.

3.3 Distributors

• Register on the blockchain platform.
• Record data about incoming products, including quantity, quality, and origin.
• Monitor and record the distribution steps for each batch of product.
• Perform quality checks and record the results.
• Upload certificates of authenticity and other relevant documents.

3.4 Retailers

• Register on the blockchain platform.
• Record data about incoming products, including quantity, quality, and origin.
• Monitor and record the sale of each product.
• Perform quality checks and record the results.
• Upload certificates of authenticity and other relevant documents.

3.5 Consumers

• Scan the QR code on the product packaging using a smartphone app.
• View the product's journey on the blockchain, including information about the
farm, processing, distribution, and retailing.
• Verify the product's authenticity and quality based on the information provided.
• Report any issues or concerns to the relevant stakeholders or regulatory authorities.
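The five stakeholder procedures above follow one pattern: each actor appends events for a product lot, and the consumer's QR scan replays the lot's history. A toy, in-memory sketch of that pattern (class and field names are illustrative, not taken from the paper):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TraceEvent:
    actor: str            # "farmer", "processor", "distributor", or "retailer"
    lot_id: str
    action: str           # e.g. "register", "quality_check", "ship", "sell"
    details: dict = field(default_factory=dict)

ledger: List[TraceEvent] = []   # stand-in for the validated blockchain

def record(event: TraceEvent) -> None:
    # on the real system this would be a transaction validated by network nodes
    ledger.append(event)

def trace(lot_id: str) -> List[TraceEvent]:
    # what a consumer's QR-code scan returns: the lot's full journey, in order
    return [e for e in ledger if e.lot_id == lot_id]

record(TraceEvent("farmer", "A-17", "register", {"crop": "tomato"}))
record(TraceEvent("processor", "A-17", "quality_check", {"result": "pass"}))
record(TraceEvent("retailer", "A-17", "sell"))
```

Because every actor writes to the same append-only ledger, the consumer-facing trace is just a filtered replay rather than a query across separate company databases.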

4 Discussion

4.1 Summary of the Main Findings

Recently, BCT in the food sector has garnered interest. A safe and transparent method for
monitoring and certifying food items across the supply chain might enhance food
safety management. BCT can ensure food safety and quality, according to many
studies. BCT improves food transparency and traceability, allowing prompt detection
and isolation of tainted items in food safety incidents.
The technology makes food product origin and authenticity verification easier,
reducing food fraud. It streamlines product monitoring, payment processing, and
inventory management to boost supply chain efficiency. Farmers, producers, whole-
salers, dealers, and regulatory agencies must work together to use BCT in the food
business. Stakeholders need education and training to comprehend and implement
the technology.

4.2 Answers to the Research Questions

• RQ1: The literature suggests BCT might improve food safety management
systems. The benefits include increased efficiency, transparency, traceability, and
stakeholder confidence. BCT may offer a more secure and decentralised data
management platform than existing methods, improving record-keeping and food
supply chain problem detection.
• RQ2: BCT has various drawbacks. They include high implementation costs, a
lack of established standards, and data privacy and security risks. Interoperability
standards, scalability, and blockchain security are being developed to alleviate
these issues.
• RQ3: BCT provides a tamper-proof record of all food production and distribution
transactions, improving transparency and traceability. This may minimise
foodborne disease and boost customer confidence in food safety and quality.
Traceability also improves food safety recalls.
• RQ4: Food sector stakeholders must collaborate to build common standards and
protocols and integrate BCT into current systems to accept and utilise BCT.
BCT might enhance food business supply chain efficiency, waste reduction, and
customer trust. Stakeholders must collaborate to create interoperable, scalable
blockchain solutions that handle data privacy and security.

5 Conclusions

BCT can safeguard and trace food goods from farm to table, revolutionising the food
sector. Blockchain improves food safety by boosting traceability, minimising fraud
and mistakes, and building customer confidence. Yet, high implementation costs,
limited technical understanding, and the requirement for standards make food sector
blockchain deployment difficult. Industry players, governments, and regulators must
collaborate to address these difficulties. BCT’s full potential and food sector adoption
issues need further study and development.
BCT is projected to become a crucial tool for ensuring safe and high-quality food
items in future. Empirical research is needed to determine if blockchain-based solu-
tions improve food safety management and supply chain transparency and trace-
ability. BCT’s influence on customer trust and food safety might potentially be
studied. Lastly, future studies might examine the economic and social effects of exten-
sive BCT use in the food sector, including changes in market dynamics, company
models, and regulatory systems. BCT might revolutionise food safety management,
but more study is required.

References

1. González-Puetate I, Marín-Tello C, Pineda HR (2022) Agri-food safety optimized by BCT:
review. Rev Fac Nac Agron Medellin 75(1):9839–9851. https://doi.org/10.15446/RFNAM.V75N1.95760
2. Xu Y, Li X, Zeng X, Cao J, Jiang W (2022) Application of BCT in food safety control: current
trends and future prospects. Crit Rev Food Sci Nutr 62(10):2800–2819. https://doi.org/10.1080/10408398.2020.1858752
3. Hong W, Mao J, Wu L, Pu X (2021) Public cognition of the application of blockchain in food
safety management—data from China's Zhihu platform. J Clean Prod 303:127044. https://doi.org/10.1016/j.jclepro.2021.127044
4. Krishna AVP, Srinaga AM, Kumar RA, Nachiketh R, Vardhan VV (2021) Planning secure
consumption: food safety using blockchain. https://doi.org/10.1109/TRIBES52498.2021.9751659
5. Panwar A, Bhatnagar V (2021) A research on different type of possible attacks on blockchain:
susceptibilities of the utmost secure technology. In: Machine intelligence and smart systems.
Algorithms for intelligent systems. Springer, Singapore, pp 537–551. https://doi.org/10.1007/978-981-33-4893-6_46
6. Panwar A, Bhatnagar V (2020) Distributed ledger technology (DLT): the beginning of a
technological revolution for blockchain. In: 2nd International conference on data, engineering
and applications (IDEA 2020). https://doi.org/10.1109/IDEA49133.2020.9170699
7. Panwar A, Bhatnagar V (2020) Analyzing the performance of data processing in private
blockchain based distributed ledger. J Inf Optim Sci 41(6):1407–1418. https://doi.org/10.1080/02522667.2020.1809095
8. Panwar A, Bhatnagar V, Sinha S, Ranjan R (2021) IoT security issues and solutions with
blockchain. In: Shivani Bali SS, Aggarwal S (eds) Industry 4.0 technologies for business
excellence: frameworks, practices, and applications, 1st edn. CRC Press, pp 141–161. https://doi.org/10.1201/9781003140474-8
9. Sugandh U, Nigam S, Khari M (2022) Blockchain technology in agriculture for Indian farmers:
a systematic literature review, challenges, and solutions. IEEE Syst Man Cybern Mag 8(4):36–43.
https://doi.org/10.1109/MSMC.2022.3197914
10. Sugandh U, Khari M, Nigam S (2022) The integration of blockchain and IoT edge devices for
smart agriculture: challenges and use cases. In: Advances in computers, vol 127, pp 507–537.
https://doi.org/10.1016/bs.adcom.2022.02.015
11. Wang P, Yang S, Li T (2021) BCT in food safety governance. In: ACM international conference
proceeding series, pp 90–95. https://doi.org/10.1145/3481127.3481245
12. Singh A, Kumar V, Kumar Ravi A, Chatterjee K (2021) Ensuring food safety through
blockchain. In: Lecture notes in electrical engineering, vol 668. Springer, Singapore, pp
745–755. https://doi.org/10.1007/978-981-15-5341-7_56
13. Vu N, Ghadge A, Bourlakis M (2021) Blockchain adoption in food supply chains: a review and
implementation framework. Prod Plan Control, pp 1–18. https://doi.org/10.1080/09537287.2021.1939902
14. Yang L, Zhang J, Shi X (2021) Can blockchain help food supply chains with platform operations
during the COVID-19 outbreak? Electron Commer Res Appl 49:101093. https://doi.org/10.1016/j.elerap.2021.101093
15. Shrivastava U, Jain A (2018) Use of blockchain in food safety and inspection. In: 2018 2nd
International conference on advances in electronics, computers and communications (ICAECC).
IEEE, pp 1–5
16. Uzair M, Bilal HSM, Ullah I (2019) Food traceability using blockchain: a systematic review
of the literature. Foods 8(8):356
17. Li Y, Liu H, Zhang Y, Jiang Y (2019) Blockchain-based food traceability: an overview. J Food
Qual 2019:9093195
18. Bhattacharya M, Islam N, Islam SR, Paul A (2020) BCT in food safety and traceability
management: a systematic review. J Clean Prod 259:120804
19. Ricci A, Mazzocchetti A, Esposito C, De Felice F, Stornelli VM (2021) A literature review
on BCT in the agri-food industry: state of art, challenges and opportunities. J Clean Prod
280:124263
20. Li C, Li H, Li Y, Li H, Li X (2021) Blockchain in food safety: an overview. J Clean Prod
318:128492
21. Yang Y, Zhang Z, Zhang Y, Cao J (2019) Blockchain-based food traceability system for food
safety. In: Proceedings of the 3rd international conference on computer science and application
engineering. IEEE, pp 536–540
22. Xu Y, Lu Y, Li X, Sun J, Huang W (2020) A blockchain-based food safety management system.
In: Proceedings of the IEEE international conference on artificial intelligence and knowledge
engineering. IEEE, pp 451–456
23. Shen J, Tan Y, Li D, Li X, Li Y (2021) A blockchain-based food safety management system
using Internet of Things. In: Proceedings of the IEEE international conference on smart Internet
of Things. IEEE, pp 1–6
Securing the E-records of Patient Data
Using the Hybrid Encryption Model
with Okamoto–Uchiyama Cryptosystem
in Smart Healthcare

Prasanna Kumar Lakineni, R. Balamanigandan, T. Rajesh Kumar,
V. Sathyendra Kumar, R. Mahaveerakannan, and Chinthakunta Swetha

Abstract The term “Internet of Things” (IoT) refers to interacting with common-
place items that are networked together through wireless sensors. This fast-
developing technology has the potential to enhance the standard of treatment that
patients receive greatly. Many wearable devices are used in the e-healthcare system
to monitor vital signs, such as a patient's blood pressure and temperature. Sensitive
information is gathered through wearable gadgets. As a result, protecting this kind
of private information is crucial. This study proposes EPPHS, a hybrid security
solution for e-healthcare, to solve this problem. The Okamoto–Uchiyama (OU)
cryptosystem’s data obfuscation and encryption technologies form the basis of this
suggested security method, designed to shield sensitive information from prying eyes.

P. K. Lakineni
Department of CSE, GITAM School of Technology, GITAM University, Visakhapatnam, India
e-mail: lpk.lakineni@gmail.com
R. Balamanigandan · T. Rajesh Kumar · R. Mahaveerakannan (B)
Department of Computer Science and Engineering, Saveetha School of Engineering, Saveetha
Institute of Medical and Technical Sciences, Chennai, Tamil Nadu 602105, India
e-mail: mahaveerakannanr.sse@saveetha.com
R. Balamanigandan
e-mail: balamanigandanr.sse@saveetha.com
T. Rajesh Kumar
e-mail: t.rajesh61074@gmail.com
V. Sathyendra Kumar
Department of CSE, BIHER, Chennai, India
Annamacharya Institute of Technology and Sciences, Rajampet, Andhra Pradesh, India
V. Sathyendra Kumar
e-mail: vsk9985666531@gmail.com
C. Swetha
Department of Computer Science and Technology, Yogi Vemana University, Kadapa, YSR
District Kadapa, Andhra Pradesh, India
e-mail: reddyswetha704@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 499
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_38
500 P. K. Lakineni et al.

Therefore, this scheme uses a safe method of transmitting medical records. Compared
to existing methods, the proposed EPPHS provides a more all-encompassing solu-
tion to the problem of securing the transfer of illness prediction data while protecting
users’ privacy. In the illness model training phase, for example, we mix a super-
increasing sequence with an OU cryptographic technique to effectively extract the
indication set for each disease.

Keywords Internet of things · Medical records · Data obfuscation · Security ·
Okamoto–Uchiyama system · Privacy preserving

1 Introduction

Institutions providing healthcare throughout the globe have to deal with a deluge of
electronic health records (EHR) because of the number of people they serve. As of
2013, healthcare data had amassed 153 exabytes (10^18 bytes), which was expected to
grow to 2314 exabytes by 2020 [1], according to a forecast published by consultancy
firm EMC and research firm IDC. Although these medical records need extensive
space for storage and maintenance [2, 3], they may lose all value if the proper methods
are not developed to extract those values. Data mining methods have profoundly
affected people’s daily lives during the last two decades by allowing us to anticipate
human actions and social trends [4]. These methods are well suited for transforming
data in storage into useful knowledge for decision support in the healthcare system,
for instance, to enhance diagnostic precision and decrease turnaround time [5].
The present situation benefits from the ability of the health information system (HIS)
to explain the technical elements of healthcare breakthroughs. Nowadays, the HIS
administers two forms of records: personal health records (PHRs) and EHRs. According
to the American Health Information Management Association (AHIMA, 2016), a PHR is a
device used to record, track, and disseminate information about a patient's
health [6]. Due to this knowledge, patients may avoid unnecessary medical testing,
saving them both time and money. As time goes on, the practitioner updates the
patient’s electronic health record (EHR), essentially an electronic representation of
the patient’s medical history. Patient demographics, issues, prescriptions, vital signs,
prior medical history, vaccines, laboratory data, and radiology reports are all
examples of the potential contents of an EHR [7, 8]. To improve efficiency, clinicians may use the EHR's ability
to automate data access. The electronic health record (EHR) is a major breakthrough
in medical history since it allows stakeholders quick and simple access to all of their
essential data stored in the cloud [9].
All cloud-based software must address the current challenge of security. To
tackle this difficulty, numerous researchers developed encryption approaches. Before
sending information from sender to receiver, encryption systems use encoding
methods to transform data into ciphertext. When the correct decoding procedure
and key are used, the recipient may see the original message [10]. AES, 3DES,
and Blowfish are all examples of symmetric/private-key algorithms, which encrypt
and decrypt with the same key. In contrast, RSA and ElGamal are public-key/
Securing the E-records of Patient Data Using the Hybrid Encryption … 501

asymmetric cryptosystems [11] because they use two separate keys for encoding and
decrypting data. Therefore, it is necessary to have a more secure encryption system
that focuses on user roles or permissions rather than user identities.
The healthcare industry has benefited greatly from the encryption models since
they provide a new approach to securing sensitive patient information. Given that
an unknown third party is holding private medical information, proper management
of privacy concerns and the efficacy of predictions is essential for its growth [12].
Therefore, it is important to develop medical data privacy-preserving data mining
methods. Because of their value to the hospital, the prediction models developed via
medical data training and used to anticipate patients’ ailments cannot be shared with
other parties. Without proper safeguards, a third party might exploit illness prediction
tools, reducing revenue for healthcare providers. So, for the suggested model, it is
equally important to consider how to protect the confidentiality of prediction models.
This study focused on developing a hybrid encryption model to safeguard electronic
health records (EHR), which is discussed in Sect. 3, with relevant research presented
in Sect. 2. Section 4 compares the proposed model to preexisting encryption models
and provides an analysis of their validity based on a variety of measures. The study’s
final result is presented in Sect. 5.

2 Related Works

According to Chinnasamy and Deepalakshmi [13], healthcare providers, patients,
and government and insurance agencies may all benefit from access to their records.
Since many users may require access to medical information, EHRS must be built
with robust security measures to prevent unauthorised access. To this end, the authors
offer a new method for providing safe cloud storage by using a hybrid cryptographic
procedure: the Improved Key Generation Arrangement of RSA (IKGSR) proce-
dure for protecting sensitive patient information and the Blowfish algorithm for
protecting sensitive keys. We use a steganography-based access control system based
on substring indexing and a keyword search method to quickly decrypt encrypted
material. We compare the suggested technique to an existing hybrid method while
considering the New York State Department of Health dataset and measuring perfor-
mance assessment and security. The findings prove without a shadow of a doubt that
our approach not only improves security but also retrieves data quickly.
In the method given by Domadiya and Rao [14], the authors stressed how important it
is to increase illness prediction accuracy while respecting patient privacy by
implementing collaborative association rule mining across distributed EHR systems.
In order to safeguard user privacy while compiling global association rules from
several collaborative EHR schemes, the suggested method made use of the Additive
Homomorphic ElGamal Cryptosystem. Experimental data demonstrate that the
suggested method outperforms individual EHR outcomes when utilising aggregated
data from all EHR systems. The authors also discuss whether COVID-19 can be

fought using the offered method; they plan to investigate the COVID-19 patient
database thoroughly in the near future.
The suggested approach by Obayya et al. [15] contains two primary phases:
compression and encryption. First, the authors use an adjacent indexing sequence (NIS)
technique to compress the data. The NIS-BWT method takes advantage of the correlation
between nearby bits to minimise the needless transmission of duplicate information.
An improved artificial butterfly optimization with signcryption (EABO-SC)
approach is then employed to efficiently encrypt the compressed data. Using the EABO
algorithm, the encryption keys for the SC method may be selected appropriately. A
wide set of simulations was conducted on datasets, and the results were evaluated under
different assessment criteria. According to the experimental results, the proposed
strategy outperforms the current state-of-the-art approaches.
In this work, HE and FHE systems are compared and surveyed by Munjal and
Bhatia [16]. The building blocks of HE were explored first, with a focus on Partial
Homomorphic Encryption (PHE) and Somewhat Homomorphic Encryption (SHE), both of
which are essential to the final goal of Fully Homomorphic Encryption (FHE). Four
categories of FHE methods were identified, and the most important methods from each
category were compared. Lastly, in the fields of healthcare and bioinformatics, HE
methods for safely diagnosing illnesses and for secure query generation were
collected and compared. When homomorphic encryption's performance and usefulness improve, it
will be a benefit for the healthcare industry as a whole.
With these problems in mind, Hamed and Yassin [17] present a secure user veri-
fication approach for patients in the healthcare scheme, and they employ the formal
security tool scyther to verify the safety of their work. Our work offers a prac-
tical approach to constructing a secure environment for the exchange of electronic
healthcare records, including but not limited to: configuring, registering, storing,
searching, analysing, authenticating, and validating. The proposed method also
employs symmetric encryption using a cryptographic hash function to access the
patient's unique identifier and a one-time password (OTP). The research concludes with a performance
analysis showing a balance between security and performance, something that is
often missing from prior efforts.

3 Proposed Methodology

A thriving use case for the IoT is the electronic healthcare system. It is crucial in
ensuring people's wellbeing and protecting them from harm. Some of the most important
applications are remote health management, chronic illness management, home care
for the elderly, fitness programme management, care for people with lifestyle-related
disorders, and care for people living with disabilities. Wearable Internet of Things
(or simply “wearables”) are wireless sensors used in e-healthcare to track a patient’s
vitals. A technical infrastructure that interconnects wearable sensors is required to
monitor human aspects such as health, wellbeing, behaviour, and other data valuable

in improving human quality of life. More private information will be at risk as
e-healthcare systems and medical IoT devices continue to expand at a rapid rate. One of
the biggest problems with medical IoT devices is securing the data they gather [18].

3.1 Okamoto–Uchiyama Cryptosystem

The OU cryptosystem comprises three algorithms: key generation, encryption, and
decryption. The hybrid approach described in Sect. 3.2 is used to carry out the
encryption.

(1) Key Generation: Given the security parameter κ, choose primes p and q with the
same bit-length |p| = |q| = κ, and calculate N = p^2 q. Then, select g ∈ Z*_N such
that the order of g^(p−1) mod p^2 is p, and set g1 = g^N mod N. The public key is
pk = (N, g, g1, κ) and the secret key is sk = (p, q).
(2) Encryption: Given a message 0 ≤ m < 2^(κ−1), choose a random number r ∈ Z_N
using the algorithm presented in Sect. 3.2; the ciphertext is then calculated as

C = E(m) = g^m · g1^r mod N (1)

(3) Decryption: The message is recovered as

D(C) = (((C^(p−1) mod p^2) − 1)/p) · α^(−1) mod p (2)

where α = (((g^(p−1) mod p^2) − 1)/p) mod p. The correctness of the OU cryptosystem
can be referred to [18]. The OU cryptosystem also supports the additive homomorphism

D(E(m1) · E(m2) mod N) = D(g^(m1+m2) · g1^(r1+r2) mod N) = D(E(m1 + m2)) (3)

where m1 + m2 < 2^(κ−1).

It is worth noting that the Paillier cryptosystem has seen widespread use in
ciphertext-based operations, in addition to the OU. However, the Paillier message
space is 1024 bits and its ciphertext space is 2048 bits, whereas the OU
cryptosystem's are roughly 512 bits and 1536 bits for the same security parameter.
Accordingly, the OU cryptosystem is the better option for space-constrained
applications. Our technique uses the OU cryptosystem, which reduces the computation
costs in addition to the transmission overhead due to the small size of the plaintext.
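The OU key generation, encryption, and decryption steps above can be sketched in Python with toy parameters; the primes below are far too small for real security (real deployments use κ-bit primes of roughly 512 bits) and are chosen only so the arithmetic is easy to follow:

```python
import random

def keygen(p, q):
    # key generation: N = p^2 * q; pick g whose power g^(p-1) mod p^2 has order p
    n = p * p * q
    g = 2
    while pow(g, p - 1, p * p) == 1:   # if it equals 1 the order is not p; try next g
        g += 1
    g1 = pow(g, n, n)                  # g1 = g^N mod N
    return (n, g, g1), (p, q)

def encrypt(pk, m):
    n, g, g1 = pk
    r = random.randrange(1, n)         # random r in Z_N
    return (pow(g, m, n) * pow(g1, r, n)) % n   # C = g^m * g1^r mod N

def decrypt(pk, sk, c):
    n, g, _ = pk
    p, _ = sk
    L = lambda x: (x - 1) // p         # the (x - 1)/p map used in the decryption formula
    a = L(pow(c, p - 1, p * p))
    alpha = L(pow(g, p - 1, p * p))
    return (a * pow(alpha, -1, p)) % p # m = a * alpha^(-1) mod p

pk, sk = keygen(7919, 7927)            # toy primes, illustration only
c1, c2 = encrypt(pk, 123), encrypt(pk, 456)
plain_sum = decrypt(pk, sk, (c1 * c2) % pk[0])   # additive homomorphism
```

Multiplying two ciphertexts and decrypting yields the sum of the plaintexts, which is what allows the illness-model training phase to operate on encrypted indication values.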

3.2 Proposed Hybrid Technique

This method proposes the use of data encryption and obfuscation as a means of
protecting sensitive information in the realm of electronic healthcare. The local
server/gateway encrypts the data (numerical values) using an obfuscation method
in order to keep the data acquired from the sensor private. When physicians have
enough information from a patient's sensor, they may prescribe specific medications,
which can then be delivered to the patient's home via the Internet and handed to a
carer. Symmetric encryption is used to secure this sort of alphanumeric information.
The goal of this study is to safeguard personal health information stored in electronic
medical records. The proposed e-healthcare system architecture is shown in Fig. 1.

Fig. 1 Sample of e-healthcare scheme
• Step 1: Patient information is gathered via sensors implanted in the body.
• Step 2: Obfuscation is used to encrypt the data before sending it to the treating
physician.
• Step 3: The doctor will recommend medications to the patient’s carer after
reviewing the gathered data and analysing the outcomes. Here, storing information
on the cloud plays a crucial role.

3.3 Process for Encrypting Alpha or Alphanumeric Sensed Data

The following procedure is used to encipher alphabetic or alphanumeric data:

1. Count the number of characters in the plain text.
2. Transform the plain text with a mono-alphabetic substitution cipher (C1).
3. Convert the encrypted text (C1) into ASCII format (C2).
4. Create a square matrix (r × c ≥ N) from the ASCII values that correspond to
the cipher text (C2).
5. One row at a time, insert the cipher text (C2) values into the matrix.
6. Using separate encryption keys k = k1, k2, k3, …, read the values from
the matrix beginning with the outer circular layer (starting with the outermost
rows and columns, in a clockwise direction).
7. Use key k1 to encrypt the message's outer circular layer and key k2 to
encrypt its inner layer.
8. Write the encrypted text back into the matrix exactly as it was read.
9. Apply the Columnar Transposition cipher to the matrix.
10. Using k4, read the message from the matrix column by column.
11. Convert each ASCII value back into its corresponding character.
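The steps above can be sketched as follows. This is a simplified illustration, not the authors' exact implementation: the substitution alphabet and the key k4 are made-up placeholders, and the circular-layer encryption of steps 5–8 is omitted, leaving the substitution, ASCII/matrix, and columnar-transposition stages.

```python
import math
import string

# Simplified sketch of steps 1-4 and 9-11: mono-alphabetic substitution,
# ASCII conversion, square-matrix fill, and columnar transposition.
SUB = str.maketrans(string.ascii_uppercase, "QWERTYUIOPASDFGHJKLZXCVBNM")
INV = str.maketrans("QWERTYUIOPASDFGHJKLZXCVBNM", string.ascii_uppercase)

def encrypt(plain, k4):
    c1 = plain.upper().translate(SUB)          # step 2: substitution cipher (C1)
    c2 = [ord(ch) for ch in c1]                # step 3: ASCII values (C2)
    n = len(c2)                                # step 1: character count
    side = math.ceil(math.sqrt(n))             # step 4: square matrix, r*c >= N
    c2 += [32] * (side * side - n)             # pad with spaces
    matrix = [c2[i * side:(i + 1) * side] for i in range(side)]
    # steps 9-10: columnar transposition, columns read in the order given by k4
    return [matrix[r][c] for c in k4 for r in range(side)], n

def decrypt(cipher, n, k4):
    side = math.isqrt(len(cipher))
    matrix = [[0] * side for _ in range(side)]
    it = iter(cipher)
    for c in k4:                               # write columns back in k4 order
        for r in range(side):
            matrix[r][c] = next(it)
    flat = [v for row in matrix for v in row][:n]
    # step 11 + inverse of step 2: ASCII back to characters, undo substitution
    return "".join(chr(v) for v in flat).translate(INV)

cipher, n = encrypt("PATIENTID", k4=[2, 0, 1])
assert decrypt(cipher, n, [2, 0, 1]) == "PATIENTID"
```

A full version would additionally encrypt the outer and inner circular layers of the matrix with k1 and k2 before applying the transposition.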

3.4 Process for Decrypting Alpha or Alphanumeric Sensed Data

For each piece of alpha or alphanumeric data, the following procedure must be
followed in order to reveal its original form:
• Convert the encrypted characters into their ASCII equivalents.
• Decrypt the columnar transposition cipher using key k4.
• Write the message into the matrix.
• Using the outer-layer key k1 and the inner-layer key k2, decipher the message.
• Read the values from the matrix beginning from the outer circular layer.
• Convert the ASCII values back into the corresponding data.

3.5 Process for Obfuscating Numerical Detected Data

To hide the numerical values, the following procedure is used:

1. Hide the number by applying the mathematical square root function (C1).
2. Round off the value of the encrypted text (C2).
3. Compute the cipher value as Y = C2 + C2, using the same value for each term.

3.6 Technique for De-obfuscating Numerical Sensed Data

The following steps are involved in de-obfuscating the numerical data:

1. Solve for C2 using the formula C2 = Y / 2.
2. To decode C2, apply the power function with exponent 2: C1 = pow(C2, 2).
3. Round the result using the floor function: Plain text = floor(C1).
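A minimal sketch of Sects. 3.5–3.6 follows. Two assumptions are made explicit here: "round off" is taken to keep four decimal places (the text does not fix the precision), and the final step uses round-to-nearest rather than the floor named above, because the rounded square root can fall slightly below the true root, which would make floor off by one.

```python
import math

# Sketch of the numeric obfuscation (Sect. 3.5) and de-obfuscation (Sect. 3.6).
# Assumption: "round off" keeps 4 decimal places.
def obfuscate(m: int) -> float:
    c1 = math.sqrt(m)          # step 1: square root hides the raw value
    c2 = round(c1, 4)          # step 2: round off the result
    return c2 + c2             # step 3: Y = C2 + C2

def deobfuscate(y: float) -> int:
    c2 = y / 2                 # step 1: C2 = Y / 2
    c1 = pow(c2, 2)            # step 2: square to undo the root
    return round(c1)           # step 3: round-to-nearest instead of floor (see note)

assert deobfuscate(obfuscate(120)) == 120   # e.g. a heart-rate reading
```

With four decimal places the squaring error stays well under 0.5 for values up to the millions, so round-to-nearest recovers the original integer exactly in that range.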

4 Results and Discussion

In this section, we describe the dataset and the symptoms often linked with breast cancer.
The system specifications for the experimental analysis are as follows: an i5 2.1 GHz
CPU and the NetBeans IDE. All of the records were replicated so as to reach an
80 K record tally. As part of a cooperative effort, all electronic health record (EHR)
systems share uniformly distributed patient data.

4.1 Breast Cancer Analysis

Breast cancer is a major health problem and a leading cause of cancer death.
Experiments are performed using the Wisconsin breast cancer dataset [19], which
may be downloaded for free from the UCI repository.

4.2 Details of Wisconsin Breast Cancer Dataset

There are 32 different attributes in the Wisconsin breast cancer dataset. This inves-
tigation considers the ten of the 32 linked attributes that are deemed most
critical. Benign and malignant are the category designations used to describe the
breast cancer status. All characteristics are on a 1–10 scale. Table 1 displays
statistics on Wisconsin breast cancer cases.
The suggested methodology encrypts the aforementioned patient
information for secure transmission. The breast cancer data is processed with
the current general methods, and the results are summarised in Table 2 and Fig. 2.
The time needed to generate keys using the hybrid model is reduced when
compared to current methods. The proposed model functions well for a variety of file
sizes; for a 70 MB file, for instance, AES takes 2.29 s, DES takes 2.03 s, RSA takes
0.006 s, and the proposed model takes 0.004 s. The time spent encrypting and decrypting
files of varying sizes during transfer is summarised in Table 3 and Figs. 3 and 4.
When the file size is 1 MB, the UL times for the existing techniques are 2.06, 2.90,
and 1.24 s, while the proposed model takes 1.20 s. The same techniques take 1.48, 1.85,
1.26, and 1.24 s for DL of files of the same size. This investigation clearly shows
that the proposed model requires less UL and DL time for files of different sizes.
For a large file (500 MB), AES takes 492 s for UL and 229.81 s for DL,
RSA takes 33.24 s for UL and 39.25 s for DL, and the proposed hybrid model takes only

Table 1 Characteristic detail of dataset

Attribute                     Range
Class                         2 for benign, 4 for malignant
Clump thickness               1–10
Bare nuclei                   1–10
Bland chromatin               1–10
Normal nucleoli               1–10
Uniformity of cell size       1–10
Uniformity of cell shape      1–10
Marginal adhesion             1–10
Single epithelial cell size   1–10
Mitoses                       1–10

Table 2 Key generation time

Methodologies (time in seconds)
FS (MB)   AES     DES     RSA       Proposed hybrid
10 1.594 1.534 0.004 0.00212
20 1.741 1.606 0.00425 0.00235
30 2.321 1.684 0.00476 0.00286
40 1.888 1.799 0.005 0.00302
50 1.952 1.866 0.00512 0.00328
60 2.193 1.923 0.0055 0.0035
70 2.286 2.034 0.00598 0.00398
80 2.694 2.129 0.00632 0.00427
90 2.827 2.388 0.00664 0.00463
100 2.887 2.545 0.00697 0.00499

Fig. 2 Key generation time


508 P. K. Lakineni et al.

Table 3 Time taken for uploading (UL) the encrypted files and downloading (DL) the decrypted
files
Methodologies (time in seconds)
FS (MB) AES DES RSA Proposed hybrid
UL DL UL DL UL DL UL DL
0.1 1.4 0.99 1.48 1.15 0.80 0.80 0.70 0.70
0.5 1.48 1.03 1.89 1.31 0.94 0.96 0.80 0.82
1 2.06 1.48 2.90 1.85 1.24 1.26 1.20 1.24
10 14.95 9.90 14.59 10.45 6.43 6.48 5.60 5.68
50 58.56 35.57 60.37 35.90 9.01 10.24 8.25 8.78
100 112.41 59.14 115.15 61.59 17.39 20.68 16.35 18.98
500 492.03 229.81 872.09 400.21 33.24 39.25 31.10 38.22

Fig. 3 Uploading time for various algorithms

Fig. 4 Downloading time for proposed model with existing techniques

31.10 s for UL and 38.22 s for DL. The reason for the better performance is that the
encryption uses the hybrid model, whereas the existing techniques equally face the

Table 4 Speed of file uploading

File size (MB)   Uploading speed (Mb/s)
0.1              11.5
0.5              12
1                11.9
10               12.92
50               12.5
100              13
250              13
500              13

Fig. 5 Uploading speed for proposed model

files for encryption. Table 4 reports the file uploading speed achieved by the proposed
hybrid model, and Fig. 5 presents the corresponding graphical analysis.
For a 50 MB file, the proposed model achieves an uploading speed of 12.5 Mb/s,
and 13 Mb/s for a 500 MB file. Even when the file size is small, the uploading speed
remains efficient; for instance, the proposed model reaches 11.5 Mb/s for 0.1 MB files
and 11.9 Mb/s for 1 MB files. The experimental analysis shows that the hybrid model
achieved effective results in securing the EHR.

5 Conclusion

Wearable devices and the apps that run on them present significant security risks for
healthcare networks because they may easily access personally identifiable informa-
tion about patients. In this study, we present EPPHS, a hybrid security system for
e-healthcare that aims to be both effective and private. To begin, our EPPHS relies on
the privacy safeguards provided by the OU cryptosystem. Then, to lessen the burden
on the computer and the network, a super-increasing sequence was implemented.
Data encryption and obfuscation are two methods described in this study to safe-
guard sensitive information in electronic healthcare networks. By using two separate
security measures, sensitive patient information may be protected to a greater degree.


Numerical values in the data gathered from healthcare IoT sensors are disguised
using this method. It takes the input text and runs it through a series of mathematical
operations in order to produce gibberish. The alphabetic data is encrypted using a
symmetric algorithm, while the numeric data is masked using an obfuscation method.
The information included inside the e-healthcare system is kept private and secure
thanks to this method. In addition, attackers have a hard time tracing the material
back to its source. Information in the field of electronic healthcare (e-healthcare) will
be able to benefit from a wide variety of future data transfer methods.

Acknowledgements Conflicts of Interest

Paper entitled: Securing the E-records of Patient Data Using the Hybrid Encryption Model with
Okamoto–Uchiyama Cryptosystem in Smart Healthcare.
• The authors have no conflict of interest to declare.
• On behalf of all co-authors, the corresponding author shall bear full responsibility for the
submission.

References

1. Chennam KK, Uma Maheshwari V, Aluvalu R (2022) Maintaining IoT healthcare records using
cloud storage. In IoT and IoE driven smart cities. Springer, Cham, pp 215–233
2. Kantipudi MVV, Moses CJ, Aluvalu R, Kumar S (2021) Remote patient monitoring using
IoT, cloud computing and AI. In: Hybrid artificial intelligence and iot in healthcare. Springer,
Singapore, pp 51–74
3. Abhishek R, Kushal K, Reddy P, Shetty R, Eswaran S, Honnavalli P (2022) An enhanced
deployment of 5G network using multi objective genetic algorithm. In: 2022 IEEE International
conference on electronics, computing and communication technologies (CONECCT), pp 1–6.
https://doi.org/10.1109/CONECCT55679.2022.9865106
4. Aluvalu R, Uma Maheswari V, Chennam KK, Shitharth S (2021) Data security in cloud
computing using Abe-based access control. In: Architectural wireless networks solutions and
security issues. Springer, Singapore, pp 47–61
5. Shi S, He D, Li L, Kumar N, Khan MK, Choo KKR (2020) Applications of blockchain in
ensuring the security and privacy of electronic health record systems: a survey. Comput Secur
97:101966
6. Kasthuri S, Nisha Jebaseeli A (2020) Review analysis of Twitter sentimental data. Biosci
Biotechnol Res Commun (BBRC), (UGC CARE J—Web Sci), Special Issue, 13 (6):209–214.
ISSN: 0974-6455
7. Chennam KK, Aluvalu R, Shitharth S (2021) An authentication model with high security for
cloud database. In: Architectural wireless networks solutions and security issues. Springer,
Singapore, pp 13–25
8. Wang Y, Zhang A, Zhang P, Wang H (2019) Cloud-assisted EHR sharing with security and
privacy preservation via consortium blockchain. IEEE Access 7:136704–136719
9. Maheswari VU, Aluvalu R, Kantipudi MP, Chennam KK, Kotecha K, Saini JR (2022) Driver
drowsiness prediction based on multiple aspects using image processing techniques. IEEE
Access
10. McDermott DS, Kamerer JL, Birk AT (2019) Electronic health records: a literature review of
cyber threats and security measures. Int J Cyber Res Educ (IJCRE) 1(2):42–49
11. Chennam KK, Muddana L, Aluvalu RK (2017) Performance analysis of various encryp-
tion algorithms for usage in multistage encryption for securing data in cloud. In: 2017 2nd
IEEE International conference on recent trends in electronics, information & communication
technology (RTEICT). IEEE, pp 2030–2033
12. Yang G, Li C (2018) A design of blockchain-based architecture for the security of electronic
health record (EHR) systems. In: 2018 IEEE International conference on cloud computing
technology and science (CloudCom). IEEE, pp 261–265
13. Chinnasamy P, Deepalakshmi P (2022) HCAC-EHR: hybrid cryptographic access control for
secure EHR retrieval in healthcare cloud. J Ambient Intell Humaniz Comput 13(2):1001–1019
14. Domadiya N, Rao UP (2022) ElGamal homomorphic encryption-based privacy preserving
association rule mining on horizontally partitioned healthcare data. J Inst Eng (India): Ser B,
pp 1–14
15. Obayya M, Eltahir MM, Alharbi O, Maashi M, Al-Humaimeedy AS, Alotaibi N, Nour MK,
Hamza MA (2022) Intelligent compression then encryption scheme for resource constrained
sustainable and smart healthcare environment. Sustain Energy Technol Assess 53:102690
16. Munjal K, Bhatia R (2022) A systematic review of homomorphic encryption and its
contributions in healthcare industry. Complex Intell Syst, pp 1–28
17. Hamed NM, Yassin AA (2022) Secure patient authentication scheme in the healthcare system
using symmetric encryption. Iraqi J Electr Electron Eng, 18(1)
18. Okamoto T, Uchiyama S (1998) A new public-key cryptosystem as secure as factoring. In:
International conference on the theory and applications of cryptographic techniques. Springer,
Berlin, Heidelberg, pp 308–318
19. “Breast Cancer Wisconsin (Original) Data Set” [Online] Available: https://archive.ics.uci.
edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data.
Accessed 28-May-2018
Cuttlefish Algorithm-Based Deep
Learning Model to Predict the Missing
Data in Healthcare Application

A. Sasi Kumar, T. Rajesh Kumar, R. Balamanigandan, R. Meganathan,


Roshan Karwa, and R. Mahaveerakannan

Abstract Countless sensors embedded in IoT devices produce an ocean of data. The
quality of IoT services depends on this information; hence, its accuracy is critical.
Unfortunately, noise, collision, unreliable network connectivity, failed equipment,
and manual system shutdown are all common causes of missing and partial values
in IoT data. The use of predictive analytics to EMRs, which include healthcare
data, inspired this effort. Predictive models based on such data are inherently flawed
because they include noise, missing data, and inconsistencies between classes of
interest. We argue for the need to create specialised methods of data preprocessing

A. Sasi Kumar
Inurture Education Solutions Pvt. Ltd., Bangalore, India
Department of Cloud Technology and Data Science, Institute of Engineering and Technology,
Srinivas University, Srinivas Nagar, Mukka, Surathkal, Mangalore, India
A. Sasi Kumar
e-mail: askmca@yahoo.com
T. Rajesh Kumar · R. Balamanigandan · R. Mahaveerakannan (B)
Department of Computer Science and Engineering, Saveetha School of Engineering, Saveetha
Institute of Medical and Technical Sciences, Chennai, Tamil Nadu 602105, India
e-mail: mahaveerakannanr.sse@saveetha.com
T. Rajesh Kumar
e-mail: t.rajesh61074@gmail.com
R. Balamanigandan
e-mail: balamanigandanr.sse@saveetha.com
R. Meganathan
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation,
Vaddeswaram, AP, India
e-mail: rmeganathan@gmail.com
R. Karwa
Department of CSE, Prof Ram Meghe Institute of Technology and Research, Badnera-Amravati,
India
e-mail: karwa.roshan@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 513
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_39
and categorisation, since typical data mining approaches frequently provide unsatis-
factory performance measurements. To build a long-term predictor in the presence of
significant nonlinearity and noise in the missing data, this study proposes a
bidirectional self-attentive encoding and decoding architecture (BEDA). Prior to any
further processing, the raw data is subjected to a wavelet threshold filter for
denoising purposes. The weights are optimised via a mapping-based cuttlefish
optimisation algorithm (MCFA), and the bidirectional long short-term memory is chosen
as the basic unit for extracting temporal and sequential features. The prediction
performance is then enhanced by adding the multi-head self-attention mechanism
into the encoder–decoder architecture. Comparisons with other models on public data
sets with unbalanced classes and missing values in health-related applications
demonstrate that the proposed technique yields fast, accurate, and resilient
classification results.

Keywords Health care · Bidirectional self-attentive encoder–decoder · Noise ·


Missing values · Internet of things

1 Introduction

For the purpose of gauging structural reactions and assessing structural states, struc-
tural health monitoring (SHM) approaches have found widespread use in bridges via
the installation of structural health monitoring systems (SHMSs) [1, 2]. Sensors
like accelerometers, strain gauges, and inclinometers are used in SHMS imple-
mentations. Frequency components of the observed time series signals provide a
wealth of structural and damage details. Both structural integrity and functionality
are assessed from these measured signals using frequency-domain analysis [3].
In reality, however, data loss or missing data invariably occurs, which severely
impacts structural measurements and evaluations. This is likely due to a combination
of factors, such as sensor malfunction [4]. Many sensors,
transmission systems, data-collecting modules, and controllers are required for a
complete wired or wireless SHMS. Any software or hardware failure [5] has the
potential to interrupt certain signals, either momentarily or permanently. It has been
reported that 10% observation noise has the same impact on power spectral density
as loss rates between 1 and 2%. It is challenging for measured signals with total
data loss or significant data-loss ratios to accurately represent underlying structural
characteristics [6]. In particular [7], two parallel strain gauges should be
utilised to determine the neutral axis of a section; if one of them breaks, both become
equally useless for assessing that section [8]. Although the malfunctioning
sensors may be replaced, the process of identifying and fixing the faults is laborious
and time-consuming. Built-in sensors, moreover, can never be swapped out.
Reconstruction techniques for SHM data loss have been suggested in abundance.
As the amount of SHM records in the database continues to expand, the prospect of
using deep learning techniques to decipher this information is exciting. CNNs are
one kind of deep learning architecture that can adaptively learn features by using
convolutional filters to scan through small regions of massive data sources [9]. CNNs have
been implemented in SHM for a variety of purposes, including fracture detection
[10, 11], corrosion detection [12, 13], vehicle identification [14], and structural state
assessment [15, 16]. One example of a data creation problem is recreating previous
measurements that were lost. Output labels are often not provided in most real-world
setups. The input is compressed into a tiny collection of data, and the encoder–
decoder network then reconstructs the input from these data [15]. With this quality,
time series data may be compressed and reconstructed with ease.
Strongly random noise in the data and an unlearnable model. As a consequence,
the accuracy with which predictions are made has diminished. While deep learning
models are capable of impressive learning, they may become overfit to the training
set noise if it is present. A BEDA is presented for building the long-term predictor
with significant nonlinearity and noise, which would help address the current issues
in missing data prediction. To begin, this research employs filtering the EHR’s noisy
data using polynomial boundary eliminating, therefore reducing the risk of overfitting
the model and enhancing the model’s stability. For this reason, the encoder–decoder
architecture is built on (LSTM) units, which are the basic building blocks. For optimal
prediction performance and model resilience, the suggested framework incorporates
multi-head self-attention, which also shows high generalisation impact for varied
input time series data.
This article continues with the following structure. The second section presents the
supplementary materials. The suggested model’s schematic and detailed structure are
described in Sect. 3. The methods used and findings are presented in Sect. 4. Section 5
offers the last thoughts on the matter.

2 Related Works

Izonin et al. [16] provide a method for recovering lost information, fully or in part,
using an improved ensemble that combines a recurrent neural network (an SGTM) with
two additional-input GRNNs. The
latter is used to make the weighted total more precise by relocating the
outputs of both GRNN networks. The ensemble's functional diagram is shown, and the
training algorithm and its implementation are described in depth. The enhanced
ensemble prediction approach was put to the test by filling in the blanks of a real
data set from monitoring an HVAC system. Experiments have
been conducted to identify the appropriate component-level parameters of the
enhanced ensemble. Using the MAPE and RMSE accuracy metrics, experi-
mental comparisons with other methods in the same class have shown its efficacy:
the produced solution attains the lowest application error among the previous ones.
Application delays for the evaluated approaches have also been
measured experimentally.

The Iterated Imputation and Prediction approach put forward by Camastra et al. [17]
makes it feasible to forecast time series that lack complete data. The method relies
on correlation-dimension estimation to determine the model order (i.e., the number of
time samples needed to represent the time sequence correctly) and on SVM regression
for time series estimation. To empirically test the technique, three
environmental time series representing the ozone levels at three European sites were
employed. For the three time series in the test set, the average per cent prediction
error was under 1%.
Zhou et al. [18] have employed AE-LSTM, an autoencoder combined with LSTM-based
time series prediction, together with similar-pixel clustering. In the beginning,
pixel-wise time series data was created using masking.

3 Proposed System

We assume m mobile IoT devices (doctors and patients) are spread across m locations
in the system model. In addition, we anticipate that the processing power of each IoT
device and fog layer node is constrained and will therefore be unable to fully analyse
the EHR data. Here, patients use fog and cloud layers to transmit their localised
EHR data values to their physicians. We take into account the data scope and latency
(transmission delays) of IoT devices, fog, and cloud layers along the course of this
data flow. The process of the suggested model is shown in Fig. 1.
We assume that the sets E, F, and S denote the IoT edge devices, the fog nodes, and
the sensed data, respectively:

E = {e1 , e2 , e3 , . . . , em }, (1)

F = { f 1 , f 2 , . . . , f m }, (2)

S = {s1 , s2 , . . . , sm }, (3)

where m is the total number of devices/sensors. L_ef and L_fc are the sets of
communication links among the layers.
{ }
L e f = le1 , f1 , le2 , f1 , le3 , f1 , . . . , lei , f j , (4)

{ }
L f c = l f1 ,c , l f2,c , . . . , l fi,c . (5)

Sm_t is the total quantity of data arriving from the sensors at time t:


Sm_t = Σ_{i=1}^{m} s_i   (0 < s_i).   (6)

Fig. 1 Working flow of proposed model

Note that p_ei is the computation capability of each device e_i:

pei ← {i = 1, 2, . . . , m}. (7)

So, the prediction delay of each proposed model can be given by

d_φ^pred = (1 / p_ei) · Sm_t,   (8)

where φ signifies the model name. Thus, the entire computational delay of DeepMDP
in that layer can be given by

d_x^comp = d_φ^pred + d_Sm^prep,   (9)

where d_Sm^prep is the data preprocessing delay that must elapse before each prediction
model is executed. Therefore, the total computation delay for each layer (λ) can be
expressed as:

d_λ^comp = Σ_{i=1}^{n} d_{x_i}^comp,   λ = { Edge if x_i = e_i;  Fog if x_i = f_i;  Cloud if x_i = c_i },   (10)

where n denotes the total number of devices and x_i indicates the layer on which the
protocol operates. Assuming no data loss, each communication delay on the up/downlinks
can be described as

d_{lef/lfc}^{up/down} = Sm_t^{up/down} / B^{up/down}.   (11)
After adding up all the delays, the communication delay between the publisher
and the subscriber is given by
d_pub-subs^comm = d_lef^up + d_lfc^up + d_lef^down + d_lfc^down.   (12)

Delays in computation on the layer where devices are placed and in communica-
tions between publishers and subscribers make up the majority of the overall system
latency. In summary, the system delay may be written as
D_total^system = d_λ^comp + d_pub-subs^comm.   (13)

In the next section, we explain how BEDA reduces the total data transmission volume
and delay in the system.
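The delay model of Eqs. (8)–(13) can be sketched numerically as follows. All variable names and figures are illustrative assumptions, and the four per-link terms of Eq. (12) are folded into a single uplink and a single downlink term for brevity.

```python
# Sketch of the delay model in Eqs. (8)-(13); names and values are assumptions.
def prediction_delay(data_size_mb, device_capability):
    """Eq. (8): d_pred = Sm_t / p_e (data volume over compute capability)."""
    return data_size_mb / device_capability

def total_system_delay(sensor_loads, capabilities, prep_delay,
                       bw_up, bw_down, up_mb, down_mb):
    # Eqs. (9)-(10): per-device computation delay (prediction + preprocessing),
    # summed over the n devices of a layer
    d_comp = sum(prediction_delay(s, p) + prep_delay
                 for s, p in zip(sensor_loads, capabilities))
    # Eqs. (11)-(12): publisher-subscriber communication delay, with the
    # edge-fog and fog-cloud hops folded into one up and one down term
    d_comm = up_mb / bw_up + down_mb / bw_down
    # Eq. (13): total system delay
    return d_comp + d_comm

# e.g. three devices, 0.5 s preprocessing each, 10 Mb/s links
print(total_system_delay([4, 8, 2], [2, 4, 2], 0.5,
                         bw_up=10, bw_down=10, up_mb=14, down_mb=14))
```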

3.1 Bidirectional LSTM Unit

A bidirectional LSTM, or BiLSTM, combines forward- and backward-facing LSTM


into a single unit. The advantages of BiLSTM over traditional LSTM are clear: it
outperforms traditional LSTM on complicated classification and regression tasks and
can mine rules that LSTM has trouble resolving. The time series vectors are fed into
a unidirectional LSTM network in order to get a prediction. When dealing with the
actual world, it’s important to remember that information in the present is tied not
just to information in the past, but also to information in the future. In Fig. 2, we see the
BiLSTM network architecture.
Two separate LSTM layers, one facing forward and one facing backward, make
up the BiLSTM. While the forward LSTM computes the input sequence in forward
order, the reverse LSTM does the opposite. There is no communication between the
forward and reverse LSTM networks; they are trained separately. As
a result, BiLSTM is better able to determine the internal correlation between time
series. Specifically, the LSTM cell’s forward propagation process consists of the
following steps:

Fig. 2 LSTM model construction

f_t = σ(w_f [x_t; h_{t−1}] + b_f),   (14)

i_t = σ(w_i [x_t; h_{t−1}] + b_i),   (15)

c̃_t = tanh(w_c [x_t; h_{t−1}] + b_c),   (16)

c_t = f_t × c_{t−1} + i_t × c̃_t,   (17)

o_t = σ(w_o [x_t; h_{t−1}] + b_o),   (18)

h_t = o_t × tanh(c_t),   (19)

where f_t is the forget gate, h_{t−1} is the hidden state of the previous cell,
c_{t−1} is the previous cell state, c̃_t is the candidate cell state, c_t is the new
cell state, x_t is the current input value, σ is the sigmoid activation function, and
tanh is the hyperbolic tangent. The mapping-based cuttlefish algorithm (MCFA) used to
optimise these learning parameters is explained below.
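A toy, single-unit rendition of Eqs. (14)–(19) follows; the scalar weights are illustrative placeholders rather than trained values, and a BiLSTM would run the same cell a second time over the reversed sequence.

```python
import math
import random

# One scalar LSTM cell step implementing Eqs. (14)-(19); weights are random
# placeholders, not trained values.
random.seed(0)
w = {g: (random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)) for g in "fico"}
b = {g: 0.0 for g in "fico"}
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    gate = lambda g, act: act(w[g][0] * x_t + w[g][1] * h_prev + b[g])
    f = gate("f", sigmoid)          # forget gate, Eq. (14)
    i = gate("i", sigmoid)          # input gate, Eq. (15)
    c_tilde = gate("c", math.tanh)  # candidate cell state, Eq. (16)
    c = f * c_prev + i * c_tilde    # cell state update, Eq. (17)
    o = gate("o", sigmoid)          # output gate, Eq. (18)
    h = o * math.tanh(c)            # hidden state, Eq. (19)
    return h, c

# forward pass over a toy sequence; a BiLSTM would also process it in reverse
h, c = 0.0, 0.0
for x in [0.1, 0.4, -0.2, 0.3]:
    h, c = lstm_step(x, h, c)
print(round(h, 4))
```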

3.2 Mapping-Based Cuttlefish Algorithm (MCFA)

The appropriate weights for the BiLSTM are chosen with the use of a modified cuttlefish
algorithm (MCFA) [19], which is inspired by the ways in which cuttlefish
skin may change colour. To get the best answer, CFA primarily makes use of two
methods: visualisation and reflection. Reflection mimics light refraction, whereas
vision mimics seeing corresponding patterns. CFA’s main selling point is its ability
to combine the search for local and global values, with the first stage of the search
process adjusting the contribution of this integration. CFA’s main drawback, though,
is its propensity to get stuck in a local optimum. As indicated, a chaotic mapping is
applied to CFA in order to solve this issue, which injects energy into the algorithm
by allowing for a random search; this enables the algorithm to escape local minima.
As a result, the proposed MCFA adopts CFA's chaotic mapping, which boosts the
algorithm's efficiency in solving a wide range of optimisation problems. This MCFA
therefore initialises its population using a chaos-mapping calculation. Most states in
a given area may be reached through a search aided by chaotic sequences, with little
overlap. In the proposed MCFA, the chaotic sequences are generated from the logistic
chaos mapping equations (20) and (21):

Cr_{n+1} = δ · Cr_n · (1 − Cr_n)   for 0 < δ ≤ 4,   (20)

Br_{n+1} = δ · Br_n · (1 − Br_n)   for 0 < δ ≤ 4,   (21)

where Cr and Br are logistic mapping functions with initial values between 0
and 1. Solution variety and breadth are both enhanced
by using this paradigm. In MCFA, the random parameter from traditional CFA is
replaced by Cr and Br. Starting at random, a new solution is produced from reflection
and visibility, as shown in Eq. (22).

Ns = Re + Vs . (22)

where Ns stands for the new solution, Vs for visibility, and Re for reflection.
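The chaotic initialisation of Eqs. (20)–(21) and the reflection-plus-visibility update of Eq. (22) can be sketched as follows; the specific best point and current point are hypothetical values chosen for illustration.

```python
import random

# Sketch of the chaotic initialisation in Eqs. (20)-(21): the logistic map
# replaces the uniform random numbers of the classical cuttlefish algorithm.
def logistic_sequence(x0, delta=4.0, n=5):
    """Iterate x_{n+1} = delta * x_n * (1 - x_n), with 0 < delta <= 4."""
    seq, x = [], x0
    for _ in range(n):
        x = delta * x * (1 - x)
        seq.append(x)
    return seq

cr = logistic_sequence(random.random())   # chaos numbers Cr_n
br = logistic_sequence(random.random())   # chaos numbers Br_n

# Eq. (22): a new solution combines reflection and visibility, shown here for
# one dimension of a group-1 individual (states 1-2, Eqs. (23)-(24))
best, point = 0.9, 0.4                    # hypothetical best and current points
reflection = cr[0] * point                # Re[j] = Cr * G1[i].points[j]
visibility = br[0] * (best - point)       # Vs[j] = Br * (b.point[j] - G1[i].points[j])
new_solution = reflection + visibility    # Ns = Re + Vs
```

For δ = 4 the logistic map stays inside [0, 1], so Cr and Br can directly replace the uniform random factors of the original CFA update rules.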
The population is split into four groups by this procedure. To produce a new solution
for type 1, the algorithm takes into account states 1 and 2 (the interaction between
iridophores and chromatophores). In both cases, a local search is conducted using the
values of individual points to uncover new solutions close to the optimal answer within
a certain interval. To get a better answer, the type 2 algorithm searches locally in
states 3 (the iridophores reflection operator) and 4. In state 5, an interaction is
employed to provide near-optimal solutions for type 3. For type 4, state 6 conducts a
global search that simply reflects incoming light. Below are the formulas used to
calculate each type's reflection and visibility.
1. State 1 and 2 for Type 1

Re [ j] = Cr ∗ G 1 [i].points[ j], (23)

Vs [ j] = Br ∗ (b.point[ j] − G 1 [i].points[ j]), (24)

where G1[i] denotes the i-th individual of group 1 and b.point denotes the best
solution's points. In addition, Re stands for the magnitude of the reflection,
Vs stands for the degree of visibility of the matching pattern, and Cr and Br are the
chaos mapping numbers.
2. State 3 and 4 for Type 2

Re [ j] = Cr ∗ G 1 [i].points[ j], (25)

Vs [ j] = Br ∗ (b.point[ j] − G 2 [i].points [ j]). (26)

3. State 5 for Type 3

Re [ j] = Cr ∗ G 1 [i].points[ j], (27)

Vs [ j] = Br ∗ (b.point[ j] − AVB ), (28)

where AVB denotes the average value of the best points.


4. State 6 for Type 4

p[i].points[ j] = random ∗ (U1 − L I ) + L I , (29)

where i, j = 1, 2, …, n, and U1 and L1 indicate the limits of the problem field.
The signal fed into the forward LSTM is [→h_1, →h_2, …, →h_t]. The forward LSTM
and the backward LSTM perform the same procedure, except that the order of the input
sequences is reversed. When all is said and done, the output of the reverse LSTM is
[←h_1, ←h_2, …, ←h_t]. The final output of the BiLSTM network is:

[h_1, h_2, …, h_t] = [[→h_1; ←h_1], [→h_2; ←h_2], …, [→h_t; ←h_t]],   (30)

where [;] may denote concatenating, summing, or multiplying like terms. By storing
information pertaining to the past and the future in two distinct LSTMs, one running
forward and one backward, BiLSTM captures the interplay within sequence data sets
and mines their information features more precisely.
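Eq. (30) amounts to pairing each forward hidden state with its backward counterpart. A small sketch, taking [;] as concatenation (one of the options the text lists); each hidden state is represented here as a plain list of floats:

```python
def bilstm_output(forward_states, backward_states):
    """Eq. (30): pair each forward state h->i with its backward
    counterpart h<-i; [;] is read here as concatenation."""
    return [f + b for f, b in zip(forward_states, backward_states)]
```

For instance, `bilstm_output([[0.1, 0.2]], [[0.3, 0.4]])` yields `[[0.1, 0.2, 0.3, 0.4]]`, doubling the hidden dimension at each time step.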

3.3 Multi-head Self-attention Apparatus

Self-attention is an attention mechanism for sequences in deep learning that aids
in learning task-specific relationships, yielding a more accurate representation of the
sequence. An attention function can be thought of as a mapping between
a query and a collection of key-value pairs, with the mapping ultimately leading to
some kind of output, where the query, the key, the value, and the output are each
represented by a vector. The output is computed as a weighted sum of the values by
making use of a compatibility function that is applied to both the query and the key
that corresponds to it.
522 A. Sasi Kumar et al.

Q = wq q, (31)

K = wk k, (32)

V = wv v. (33)

Next, the dot products of the query with all the keys are calculated. The softmax
function is then applied to the scaled scores to derive the attention weights. The
output vector is calculated by weighting each value accordingly.

Attention weight = softmax(QKᵀ/√d),   (34)

head = softmax(QKᵀ/√d) V,   (35)

in which head refers to the weighted output sequence, d is the size of K, and the
attention weights are computed from how similar Q and K are to one another.
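Eqs. (31)–(35) can be sketched without any ML framework. The snippet below uses plain lists, with d taken as the key dimension; it is a minimal illustration of scaled dot-product attention, not the paper's implementation:

```python
import math

def matmul(A, B):
    # Naive matrix product over nested lists
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    # Numerically stable softmax over one row of scores
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    d = len(K[0])                         # size of the key vectors
    KT = [list(col) for col in zip(*K)]   # transpose of K
    scores = matmul(Q, KT)                # Q K^T
    weights = [softmax([x / math.sqrt(d) for x in row])
               for row in scores]         # Eq. (34)
    return matmul(weights, V), weights    # Eq. (35): head = weights * V
```

Each row of the weight matrix sums to 1, so the head output is a convex combination of the value vectors.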
With multi-head attention, many attention modules run simultaneously. The model
can learn h different linear projections, owing to its ability to model h diverse
subspaces using h diverse sets of weight matrices:

o = wo concat(head0 , . . . , headh ). (36)

Concatenating the head outputs and then feeding them into the linear layer completes
the process. With the assistance of several heads, the model can pay attention to data
from multiple representation subspaces simultaneously. Attention based on the scaled
dot product can be compared to convolution, where different convolution kernels
extract features of varying granularity.

3.4 Outline of Self-attentive Encoding and Decoding in a Bidirectional Perspective

With regard to this study, the encoder–decoder architecture is used to build a model
that can anticipate the user's own focus of attention. Although the encoder–decoder
construction is conventional, the performance of the network degrades with an
increasing input sequence length because of the information lost through fixed-length
coding. By conducting attended feature selection on the encoder output, the attention
mechanism is able to circumvent the constraints of the encoder–decoder architecture.
Compared to additive attention and multiplicative attention, multi-head attention
built utilising finely tuned matrix multiplication allows for a reduction in processing
time in addition to an increase in space efficiency. We construct an encoder–decoder
using MCFA-based BiLSTM and integrate a self-attentive mechanism into it.
Filtered data X = [x1, x2, …, xt]ᵀ ∈ Rᵗ and the actual value Y ∈ Rᵗ are obtained by
first applying a sliding window to the raw data. A BiLSTM network with several
layers is used as the encoder. The encoder output H = [h1, h2, …, hi, …, ht]ᵀ ∈ R^(t×u),
where u is the number of cells produced by the BiLSTM, hi = [h→i; h←i], and [;]
represents the concatenation of two vectors, is acquired by passing X through the
encoder.

The encoder output H is used for all queries, keys, and values within the
multi-head attention layer. Nonlinearly mapping the encoder output H k times
yields k sets of queries, keys, and values.
Qi = σ(wq,i H + bq,i), i = 1, 2, …, k,   (37)

Ki = σ(wk,i H + bk,i), i = 1, 2, …, k,   (38)

Vi = σ(wv,i H + bv,i), i = 1, 2, …, k,   (39)

where wq,i ∈ R^(u×d), wk,i ∈ R^(u×d), wv,i ∈ R^(u×d), bq,i ∈ R^(t×d), bk,i ∈ R^(t×d), and bv,i ∈ R^(t×d)
are the parameters to be learned, d = u/k, Qi ∈ R^(t×d), Ki ∈ R^(t×d), and Vi ∈ R^(t×d).
We then compute scaled dot-product attention for each of the k groups Qi, Ki,
and Vi. The time-based attention weights are represented by a t × t matrix.
headi = softmax(Qi Kiᵀ/√d) Vi.   (40)

The output is created by stringing together all of the results and then making linear
adjustments to them.

O = wo concat(head1, …, headk) + bo,   (41)

where wo ∈ R^(u×u) and bo ∈ R^(t×u) are the parameters learned to obtain the encoding vector C ∈ R^(t×u).
The decoder, like the encoder, uses a BiLSTM network with several layers. The
predicted value is obtained by forward propagating the encoding vector C and
performing a nonlinear transformation on the decoder's outputs. First, the decoder's
output vector is linearly scaled down to a lower-dimensional vector, and then the
ReLU activation function applies a nonlinear transformation to produce the final
predicted value.

ŷ = relu(wy s + by).   (42)

First, we train on the training set, and then, to prevent overfitting, we validate on the
validation set. After the training procedure and the selection of the parameters
have been completed, the final assessment is carried out on the unlabelled testing
set to determine how well the model has done. The Adaptive Moment Estimation
(Adam) optimization technique is used across all models; it is computationally
efficient, has a small memory footprint, and makes use of momentum for
convergence. Mean squared error (MSE) is the loss function used during model
training:

MSE = (1/n) Σᵢ₌₁ⁿ (ŷi − yi)²,   (43)

where n is the total number of samples, ŷ is the predicted value, and y is the actual
value. The MSE can be calculated anywhere, and its gradient values change
continuously and converge quickly.
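The training loss in Eq. (43) translates to a one-line helper (a hypothetical sketch, not the paper's code):

```python
def mse(predicted, actual):
    # Eq. (43): mean of the squared prediction errors
    assert len(predicted) == len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
```

Its gradient with respect to each prediction is 2(ŷi − yi)/n, which is what makes it convenient for gradient-based optimizers such as Adam.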

4 Results and Discussion

4.1 Data Set Description of Public Data Sets

The suggested MCFA-BEDA is evaluated against other models for missing-value
classification. A summary of the data sets is shown in Table 1. These public data sets
are used to test and compare the proposed model to others, namely GRNN-SGTM [16],
SVM [17], AE-LSTM [18], hybrid-DL [20], and SGD-DL [21], and the results are
shown. The approaches are evaluated on data sets with 5, 10, 20, and 40% missing
values (Tables 2 and 3).

Table 1 Public data sets

Data set        r_imb   n_f    |J|       |C+|     |C−|
Twonorm         0.50    20     7400      3703     3697
Letter26        0.96    16     20,000    734      19,266
Ringnorm        0.50    20     7400      3664     3736
Cod-rna         0.67    8      59,535    19,845   39,690
Clean (Musk)    0.85    166    6598      1017     5581
Advertisement   0.86    1558   3279      459      2820
Nursery         0.67    8      12,960    4320     8640
Hypothyroid     0.94    21     3919      240      3679
Buzz            0.80    77     140,707   27,775   112,932
Forest          0.98    54     581,012   9493     571,519

Table 2 Confusion matrix

                     Actual positive       Actual negative
Predicted positive   True positive (TP)    False positive (FP)
Predicted negative   False negative (FN)   True negative (TN)

Table 3 Comparative average analysis of a proposed model with existing techniques


Missing level (%) SVM GRNN-SGTM SGD-DL Hybrid-DL AE-LSTM Proposed
5 0.86 0.87 0.90 0.91 0.93 0.95
10 0.87 0.88 0.91 0.93 0.94 0.97
20 0.88 0.89 0.92 0.94 0.95 0.97
40 0.89 0.90 0.93 0.95 0.97 0.98

4.2 Performance Measures

Evaluation of the classification techniques is carried out with the aid of performance
metrics derived from the confusion matrix (see Table 2).
For problems involving binary classification, the performance measures
are defined as follows:
SN = TP/(TP + FN),   SP = TN/(TN + FP),   (44)

G-mean = √(SP × SN),   (45)

ACC = (TP + TN)/(TP + TN + FP + FN).   (46)
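Eqs. (44)–(46) translate directly from the confusion-matrix counts. A small sketch (the square root in the G-mean follows that metric's standard definition; the function name is illustrative):

```python
import math

def classification_metrics(tp, tn, fp, fn):
    sn = tp / (tp + fn)                     # Eq. (44): sensitivity
    sp = tn / (tn + fp)                     # Eq. (44): specificity
    g_mean = math.sqrt(sp * sn)             # Eq. (45): geometric mean
    acc = (tp + tn) / (tp + tn + fp + fn)   # Eq. (46): accuracy
    return sn, sp, g_mean, acc
```

The G-mean is useful for the imbalanced data sets in Table 1, since it stays low whenever either class is classified poorly.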

Existing approaches perform poorly when faced with significant amounts of
missing data. SVM achieved 0.87, GRNN-SGTM achieved 0.88, SGD-DL achieved
0.91, and hybrid-DL scored 0.93 when the missing level was 10%. The suggested
model performed better than AE-LSTM (0.97 vs. 0.94). Even with a large amount of
missing data, the suggested model and the other methods still did quite well. For
instance, SVM, GRNN-SGTM, SGD-DL, hybrid-DL, AE-LSTM, and the proposed
MCFA-BEDA all managed to exceed 0.90, while AE-LSTM and MCFA-BEDA went
above 0.98. The suggested model's overall classification accuracy is shown
graphically in Fig. 3.
Table 4 summarises the outcomes of all the strategies applied to the missing data
in the public data sets. SVM, GRNN-SGTM, SGD-DL, and the hybrid-DL all scored
roughly 0.86 to 0.88, AE-LSTM got 0.89, and the suggested model got 0.90 in
the accuracy study. When the methods are evaluated using gmean, the suggested
model outperformed them all with a mean of 0.91–0.93. Whereas existing algorithms
directly forecast the missing data, which raises computing complexity, the suggested
approach selects the BiLSTM optimally by employing MCFA, which leads to higher
performance.

Fig. 3 Average accuracy for various missing levels with various techniques


The comparison between the proposed model and current methods across several
metrics is graphically shown in Fig. 4.
Table 4 Analysis of proposed model with existing techniques

Classification technique   Specificity   gmean    Accuracy
SVM                        0.8638        0.9294   0.8652
GRNN-SGTM                  0.8864        0.9415   0.8875
SGD-DL                     0.8586        0.9266   0.8600
Hybrid-DL                  0.8671        0.9269   0.8817
AE-LSTM                    0.8738        0.9371   0.8963
Proposed                   0.9085        0.9531   0.9094

Fig. 4 Graphical analysis of MCFA-BEDA with existing models on different metrics



5 Conclusion

In this study, we offer a new technique for missing data prediction and demonstrate
its usefulness in a variety of practical settings. To achieve this goal, we first create an
efficient DL model, which serves as the backbone of the proposed protocol. Then,
to address the issue of missing data, we suggest a technique we call BEDA. The
suggested model is dubbed MCFA-BEDA since it employs the MCFA optimization
approach to choose the learning weight parameter of the BiLSTM model. Using this
testbed architecture, we thoroughly assess BEDA in terms of the mobile edge, fog,
and cloud. Finally, we provide a comprehensive performance evaluation of the
current DL models and the proposed MCFA-BEDA. The results collected demonstrate
that our DL models perform better than previous baseline approaches when predicting
missing data. However, ensuring the safety of the EHR is crucial for efficient data
transfer, and the overall prediction of missing values in the EHR is a tough problem. Future
work will thus include incorporating the lightweight encryption approach into the
proposed paradigm in order to address data security concerns.

References

1. Nie ZH, Lin J, Li J et al (2020) Bridge condition monitoring under moving loads using two
sensor measurements. Struct Health Monit 19(3):917–937
2. Carden EP, Fanning P (2004) Vibration based condition monitoring: a review. Struct Health
Monit 3(4):355–377
3. Sun L, Shang Z, Xia Y et al (2020) Review of bridge structural health monitoring aided by big
data and artificial intelligence: from condition assessment to damage detection. J Struct Eng
146(5):04020073
4. Kullaa J (2013) Detection, identification, and quantification of sensor fault in a sensor network.
Mech Syst Signal Pr 40(1):208–221
5. Abdulkarem M, Samsudin K, Rokhani FZ et al (2020) Wireless sensor network for structural
health monitoring: a contemporary review of technologies, challenges, and future direction.
Struct Health Monit 19(3):693–735
6. Fan G, Li J, Hao H (2020) Dynamic response reconstruction for structural health monitoring
using densely connected convolutional networks. Struct Health Monit. Epub ahead of print 24
May 2020. https://doi.org/10.1177/1475921720916881
7. Chen ZC, Bao YQ, Li H et al (2018) A novel distribution regression approach for data loss
compensation in structural health monitoring. Struct Health Monit 17(6):1473–1490
8. Rajarajeswari G, Kasthuri S (2017) Co-clustering interpretations for feature selection by using
sparsity learning. Int J Res Instinct (INJRI) 4(1):30–35, E-ISSN: 2348-2095
9. Guo T, Wu LP, Wang CJ et al (2020) Damage detection in a novel deep-learning framework:
a robust method for feature extraction. Struct Health Monit 19(2):424–442
10. Cha YJ, Choi W, Buyukozturk O (2017) Deep learning-based crack damage detection using
convolutional neural networks. Comput Aided Civil Infrastruct Eng 32(5):361–378
11. Beckman GH, Polyzois D, Cha Y-J (2019) Deep learning-based automatic volumetric damage
quantification using depth camera. Autom Constr 99:114–124
12. Atha DJ, Jahanshahi MR (2018) Evaluation of deep learning approaches based on convolutional
neural networks for corrosion detection. Struct Health Monit 17(5):1110–1128
13. Xia Y, Jian XD, Yan B et al (2019) Infrastructure safety oriented traffic load monitoring using
multi-sensor and single camera for short and medium span bridges. Remote Sens 11(22):2651
14. Khodabandehlou H, Pekcan G, Fadali MS (2019) Vibration-based structural condition
assessment using convolution neural networks. Struct Health Monit 26(2):e2308
15. Shang Z, Sun L, Xia Y et al (2020) Vibration-based damage detection for bridges by deep
convolutional denoising autoencoder. Struct Health Monit. Epub ahead of print 28 July 2020.
https://doi.org/10.1177/1475921720942836
16. Izonin I, Tkachenko R, Verhun V, Zub K (2021) An approach towards missing data management
using improved GRNN-SGTM ensemble method. Eng Sci Technol, Int J 24(3):749–759
17. Camastra F, Capone V, Ciaramella A, Riccio A, Staiano A (2022) Prediction of environ-
mental missing data time series by support vector machine regression and correlation dimension
estimation. Environ Model Softw 150:105343
18. Zhou YN, Wang S, Wu T, Feng L, Wu W, Luo J, Zhang X, Yan NN (2022) For-backward
LSTM-based missing data reconstruction for time-series Landsat images. GIScience Remote
Sens 59(1):410–430
19. Eesa AS, Brifcani AMA, Orman Z (2013) Cuttlefish algorithm-a novel bio-inspired optimiza-
tion algorithm. Int J Sci Eng Res 4(9):1978–1986
20. Shifana Begum D, Senthil Kumar D, Mahaveerakannan R (2022) Refinement model based on
deep learning technique for prediction of temperature using missing data. Math Stat Eng Appl
71(3s2):824–835
21. Gad I, Hosahalli D, Manjunatha BR, Ghoneim OA (2021) A robust deep learning model for
missing value imputation in big NCDC dataset. Iran J Comput Sci 4(2):67–84
Drowsiness Detection System Using DL
Models

Umesh Gupta , Yelisetty Priya Nagasai, and Sudhanshu Gupta

Abstract A feeling of sleepiness and lethargy combined with driving is a
dangerous mixture. Drowsy driving usually happens due to lack of sleep, untreated
sleep disorders, alcohol, medications, or shift work. There is no exact time
or moment at which sleep can take over consciousness. The danger, risk, and tragedy
involved in drowsy driving are alarming. The drowsiness detection system will be made
using Python, CNN, and OpenCV. Using Inception V3, we can increase the accuracy
and make the system more precise. Artificial intelligence and machine learning have
made it possible to detect and prevent accidents caused by drowsiness. This development
is based on artificial neural networks, a branch of artificial intelligence. This
paper aims to understand the optimal and correct way of creating an effective system
to detect drowsiness.

Keywords Artificial intelligence · CNN · Drowsiness detection · Drivers ·


Machine learning · OpenCV

1 Introduction

Drowsiness has been a cause of death in multiple places around the globe. Using
machine learning to help reduce these accidents can bring huge relief. An effective
system with an accurate model would be highly beneficial, and if such a
system is integrated with other hardware, it can be life changing. Drowsiness makes
a person pay less attention to the road, affects the ability to make decisions, and slows
reaction time. More than 2 out of 20 people aged 18 and above have
fallen asleep while driving. In 2017, 91 k crashes, 50 k injuries, and approximately
850 deaths were estimated by the National Highway Traffic Safety Administration.

U. Gupta (B) · Y. P. Nagasai · S. Gupta


SCSET, Bennett University, Greater Noida, Uttar Pradesh, India
e-mail: er.umeshgupta@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 529
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3_40

1.1 Research Objectives

Anyone who doesn’t get enough sleep, takes medications which makes them drowsy
is likely to doze off while driving. Yawning, frequently missing exits, blinking,
drifting from lanes, etc. could all be signs of drowsy driving. Drowsy driving can
cause death or life changing, paralyzing injuries resulting bed ridden situation or
amputation of limbs, etc. Drowsy driving can be avoided by sleeping for at least
seven hours, developing good habits related to sleeping, avoiding taking alcohol or
medications that cause sleepiness.

1.2 Target Group

The risk of drowsy driving surrounds anyone who drives with less than the recommended
amount of sleep, drives late at night, or drives on a straight road without any distractions
or interruptions such as rumble strips or bumps. The target group for this drowsiness
detector includes Ola and Uber drivers, bus and truck drivers, i.e., commercial drivers,
and indeed anyone who drives daily or at night.

1.3 The Proposed System

The proposed model will be built using OpenCV and CNN or Euclidean distance
calculation. It will detect the face, then focus on the eyes and predict whether
the driver's eyes are open or closed. The model will wait for a certain time before
sounding an alarm to wake the driver.

1.4 Novelty of the Project

Using a huge dataset that results in increased accuracy of drowsiness prediction
is a major asset of the proposed project. The detection system
can run offline, without Internet or network access, as it has no such requirement.
The system can also include a voice assistant feature, whose
sole purpose is to ask the driver whether he/she wants to pick up or drop a call. It can
also alert the driver if the battery charge is too low. The system can run in the background,
which ensures that Google Maps can be used alongside it. The only
requirement is continuous usage of the camera. By using grayscale images,
the system does not depend on the lighting levels on the face of the driver.

1.5 Why Will this Solution Work?

The drowsiness detector is made using CNN, i.e., convolutional neural networks, a
type of deep neural network specialized for image classification. The
CNN architecture of the model consists of 4 layers, of which 3 are convolutional
layers of 32 and 64 nodes with kernel size 3, and a fully connected layer of
128 nodes. In basic terminology, these layers are the input, output, and hidden layers.
The network performs convolution operations via 2-dimensional matrix multiplications.
Using the CNN model, we can achieve accuracy that we might not get from using
Euclidean distance alone. Python, being a very high-level language, supports our cause of
using a machine learning model to classify the eyes of the driver as open or closed.
It also provides the wide range of libraries that the system requires, such as VideoStream,
dlib, playsound, imutils, NumPy, and SciPy. OpenCV provides the system with
the video frames it needs to run the algorithm and detect whether the driver is drowsy.
The classification model is trained using a huge dataset
of grayscale images, which ensures independence from light levels on the face
of the driver. The system uses 2 types of drowsiness detection, i.e., a
CNN classifier and the Euclidean distance between landmarks in the region of interest,
the eyes. The CNN model provides a binary indication of the eyes being closed or
open as a sequence of 0s and 1s. By setting a limit on the number of consecutive 0s
or 1s, we can raise an alarm before the person starts to drowse. The Euclidean distance
gives us the Eye Aspect Ratio; based on experiments, the system is fed a threshold after
which it starts counting the number of frames for which the driver's eyes are closed. After
the number of consecutive frames crosses the frame threshold, the system raises an
alarm to wake the driver up. Together, these two approaches provide a
good detection capability for the proposed system.

2 Related Work

The system made by Data Flair uses haar cascades: it takes input from the camera,
detects the face and ROI using a cascade classifier, and performs detection using
detectMultiScale, which returns an array with the x, y coordinates, height, and width
of the boundary box [1]. Using detectMultiScale, the system detects the left and
right eye, and the classifier feeds the image to the model, which calculates a score
depending on how long the eyes were closed. One way of implementing this system
is by using a Raspberry Pi, as in [2], and instead of using CNN, we can use the
Euclidean distance formula to find out whether the driver is drowsy or not.

D = √[(x2 − x1)² + (y2 − y1)²]   (1)

Later, the Eye Aspect Ratio (EAR) is computed as the summation of the distances
between the upper and lower eyelids divided by 2 times the horizontal distance
between the eye corners. The overall ratio averages the two eyes:

EAR = (EARleft + EARright)/2   (2)

The value returned will be almost constant when the person's eye is open, and it
will rapidly decrease toward zero during an eye blink. If the eyes are closed,
then the EAR value will remain constant but smaller compared to when the eyes are
open. Reference [3] continues the above method by using the EAR in a video as
a qualitative measure. The blog also touches on the downfalls of using this during
fast-changing angles, which can produce a false-positive detection: detecting a blink
when in reality the person did not blink. Soukupová and Čech suggest computing the EAR
for the Nth frame along with the EARs for the N − 6 and N + 6 frames, forming a 13D
feature vector, and training a support vector machine on those feature vectors.
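Eqs. (1) and (2) can be sketched with the six-landmark eye layout used by Soukupová and Čech; the landmark ordering below follows that convention (an assumption, since the text does not spell out the indices):

```python
import math

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks, ordered so that p1 and p4 are the
    horizontal corners and (p2, p6), (p3, p5) are the vertical
    upper/lower pairs."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)   # Eq. (1), twice
    horizontal = math.dist(p1, p4)                     # Eq. (1)
    return vertical / (2.0 * horizontal)

def average_ear(left_eye, right_eye):
    # Eq. (2): average the two per-eye ratios
    return (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
```

For a fully closed eye the vertical distances collapse to zero, so the ratio drops toward 0, which is what the threshold test exploits.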
The CSIR-CEERI Chennai centre proposed a system which can monitor fatigue and
drowsiness levels [4] using ECG sensors attached at various locations to monitor
physical parameters continuously and non-intrusively. This system relies heavily on
hardware and, if combined with machine learning, can help detect drowsiness
easily and efficiently. Grant Zhong [5] used a similar method to create his own
drowsiness detector, which can detect not only heavy sleepiness but also softer
signals. He used a dataset of 30 h of footage of 60 participants and was able to
obtain enough data for the alert and drowsy levels. He also used MAR, the mouth
aspect ratio, which is the ratio of the length of the mouth to its width.
MAR will be higher depending on the state of the driver, as a drowsy, sleepy
person tends to yawn, which gives an early indication of drowsiness.
PUC focuses more on the pupil than on the entire eye. He also calculated MOE,
i.e., the mouth-over-eye ratio. Using all these features yielded him a poor accuracy of
50–60%. Another classification method uses the duration of frames as an observation
window. It was experimentally found for 30 fps videos that ± 6
frames around the frame where the eyes are most closed have a significant
impact on blink detection. To exploit this, a thirteen-dimensional feature is
gathered by concatenating the Eye Aspect Ratios (EARs) of the adjacent ± 6
frames. This was implemented using a linear support vector
machine (SVM) classifier, AKA EAR-SVM, which uses manually annotated sequences
for training. A 13-dimensional feature is calculated and classified by the EAR-SVM
for each frame except at the beginning and the end of a video sequence. An interesting
way of finding the ROI is by using facial proportions, in which the face is divided into
3 parts: the 1st part contains the forehead and eyebrows, the 2nd contains the eyes and nose,
and the 3rd contains the mouth and chin area. This, however, won't always be accurate,
as different ethnic groups have different facial proportions across the globe. Another way is to use
anthropometry, the science of identifying differences in the human body.
We'll need to measure 3 distances based on the anthropometric landmarks. The four
landmarks are as follows: TR for the hairline (trichion), N for the nasion, SN for the subnasale,
and GN for the gnathion, i.e., the lowest part of the chin.

3 Proposed System

The proposed system focuses mainly on the driver's eyes, the region of interest
(ROI). The system first must detect the driver's face, locate the ROI, compare it with
pre-set thresholds (based on experimental values), and, if the score exceeds
the threshold, raise an alarm. The different resources
the system requires are as follows:
. SciPy: to compute the distance between eye landmarks for calculating the
Eye Aspect Ratio (EAR).
. Imutils: to streamline computer vision operations and graphics with OpenCV.
. Dlib: to detect and define areas of facial landmarks.
. OpenCV: to take facial detection input for the system.
. CNN: to classify and figure out whether the eyes are open or closed (Fig. 1).

3.1 The Detection of Facial Landmark

Facial landmark detection can be done using OpenCV and a library known as dlib.
As soon as the system is initiated, the camera sensors trigger the webcam to start.
Using dlib, the system initiates a detector and a predictor to detect the frontal face and
its shape. Then, using the correct array slicing indexes, the system obtains the ROI, i.e.,
the left and right eye of the driver. After image preprocessing using grayscaling for
intensity extraction, the system uses VideoStream to declare a variable for the frames
of images taken from the webcam, and dlib's detector to locate the facial landmarks.

3.2 Detection of ROI

The system uses dlib’s estimator to take facial landmarks and convert the result into a
numeric NumPy array. The next step is to get the coordinates of the left and right eye.
After finding the corner of eye and center of eye, it is essential to find the distance
from the eye-center and eyebrow region. Consider the face region, an array of 68

Fig. 1 Procedure of the


system
534 U. Gupta et al.

points as shown below in the image. After close examination we can see that the eyes
can be assessed by using right eye’s points [3, 6] and left eye’s points [4, 7]. A main
task in this step is to visualize each of these facial landmarks overlaying the results on
input layers. To extract the facial regions, we simply need to compute a box bounded
by the x, y coordinates and use NumPy array slicing to extract it. One approach
to accomplish this task involves training a dataset of labeled facial landmarks on
an image. The process requires manual labeling of the images by specifying the x
and y coordinates of the landmarks. Once the training is finalized, an ensemble of
regression trees can be trained to estimate and locate the region of interest (ROI).
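The ROI slicing described above can be sketched as follows, assuming the widely used 68-point dlib landmark layout (indices 36–41 for the right eye and 42–47 for the left are that convention's values, not stated in the paper; the helper names are illustrative):

```python
# Eye index ranges in the common 68-point dlib facial-landmark convention
RIGHT_EYE = slice(36, 42)
LEFT_EYE = slice(42, 48)

def eye_regions(landmarks):
    """landmarks: list of 68 (x, y) tuples from the shape predictor."""
    return landmarks[LEFT_EYE], landmarks[RIGHT_EYE]

def bounding_box(points):
    # Box bounded by the x, y coordinates, as described in Sect. 3.2;
    # returns (x_min, y_min, x_max, y_max)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)
```

With a real predictor, the returned box would then be used to slice the frame array and crop each eye image.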

3.3 Classification of Eyes Using CNN

CNN is very popular for computer vision tasks: it takes images as input, extracts
features from them, and learns parameters to efficiently classify, detect, etc.
Filters are used to extract various features from images; we convolve the image
with these filters through convolution operations. Our model's output is
binary, i.e., 0 and 1. It predicts each eye: lpred[0] = 1 means that the
eye is perceived to be open (alert), and lpred[0] = 0 means that the eye is
closed.

3.4 Score Calculation Algorithm

To determine how long the driver's eyes were not alert, we calculate a score.
We incrementally raise the score while the system detects closed eyes and decrease
it when it detects open eyes. The current score
is displayed on the screen, providing real-time information about the individual's
eye status. By setting a threshold of 15, we can determine whether the driver's eyes have
been closed for an extended period. When this occurs, the system triggers an alarm
to alert the driver.
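The scoring logic above reads as a simple saturating counter. A minimal sketch, interpreting the intended behaviour (the score rises while the eyes stay closed and falls once they reopen); the class name and API are illustrative:

```python
ALARM_THRESHOLD = 15   # frames, the value chosen in Sect. 3.4

class DrowsinessScore:
    """Score rises while the eyes are detected closed and falls while
    they are open; the alarm fires once the score passes the threshold."""
    def __init__(self, threshold=ALARM_THRESHOLD):
        self.score = 0
        self.threshold = threshold

    def update(self, eyes_closed):
        if eyes_closed:
            self.score += 1
        else:
            self.score = max(0, self.score - 1)   # never below zero
        return self.score > self.threshold        # True -> raise alarm
```

Calling `update()` once per frame means the threshold of 15 corresponds to roughly half a second of closed eyes at 30 fps, and brief blinks decay back to zero without triggering the alarm.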

4 Dataset and Preprocessing

4.1 Dataset Description

To create the dataset, a script was written to capture eye images from the camera and
store them on the local disk. The dataset was categorized with the labels 'Open' or
'Closed' to classify the images. Manual cleaning of the data was performed to eliminate
any irrelevant or unwanted information that could potentially affect the model's
performance. The dataset comprises more than 84,000 images capturing individuals'
eyes under various lighting conditions, which makes detection easier under low-light
conditions [8]. There were also several images in which the person is yawning
with eyes closed. Following the completion of model training using the dataset, the
final weights were incorporated into the model architecture file. Consequently, this
dataset can now be employed to determine whether a person's eyes are open or closed.
When iterating over the detected faces, the width of the bounding box for each face
is computed to draw appropriate boundary boxes around them (Table 1).

Table 1 Data splitting

           Closed    Open
Training   37,946    38,952
Testing    4000      4000

4.2 Preprocessing

For face detection, the image is first converted to grayscale, as OpenCV's object
detection algorithms use grayscale images as input; color information is not required
for this task. The haar cascade classifier is therefore used to identify the face. We use
face = cv2.CascadeClassifier(path to the haar cascade xml file) to set our classifier.
Next, the detection is carried out using the detectMultiScale(gray) function, which
returns an array of detected objects along with their corresponding x and y coordinates
and their widths and heights, representing the objects' bounding boxes. Subsequently,
iteration is performed over the detected faces, enabling the drawing of a boundary box
for each individual face in the image.

The same technique is used to detect the eyes. Eyes are detected by setting
cascade classifiers for both eyes in leye and reye, respectively, and then using
detectMultiScale(gray). The eye image data is then extracted from the full
image. To extract the eye image from the frame, the eye's bounding box is cropped
using the following code: l_eye = frame[y: y + h, x: x + w]. The variable l_eye then
contains only the image data corresponding to the left eye. This extracted eye image
is used as input for the CNN classifier, which predicts whether the eye is
open or closed. The right eye image, r_eye, is extracted in the same way.

5 Algorithmic Steps

Here are five algorithmic steps for the proposed drowsiness detector:

5.1 Face Detection

A haar cascade classifier or similar face detection algorithm is utilized to locate
and identify faces in the input video frames. The bounding box coordinates
and dimensions are retrieved for each detected face.

5.2 Eye Extraction

Extract the regions of interest (the eyes) from the face using the bounding box
coordinates obtained in the previous step. Isolate the left and right eye regions for
further analysis.

5.3 Eye Classification

Feed the extracted eye images into a pre-trained convolutional neural network (CNN)
classifier. The CNN predicts whether each eye is open or closed based on the learned
patterns and features.
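Before an eye crop reaches the classifier, it must match the network's input shape. A minimal sketch of that preprocessing, assuming a hypothetical model trained on 24 × 24 grayscale crops (the actual input size depends on the trained model):

```python
import numpy as np

def prepare_eye(eye_img):
    """Normalize a grayscale eye crop to [0, 1] and add batch and
    channel axes, yielding the (1, H, W, 1) tensor a Keras-style
    CNN classifier expects."""
    eye = eye_img.astype("float32") / 255.0
    return eye.reshape(1, *eye.shape, 1)

batch = prepare_eye(np.zeros((24, 24), dtype=np.uint8))
print(batch.shape)  # (1, 24, 24, 1)
# The trained open/closed classifier would then be called as, e.g.:
# pred = model.predict(batch)
```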

5.4 EAR Calculation (Eye Aspect Ratio)

Calculate the Eye Aspect Ratio by measuring the distances between key landmarks
on each eye, such as the outer and inner corners. The EAR formula helps quantify
the openness of the eyes and can indicate drowsiness when it falls below a certain
threshold [1–5].
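The standard EAR formula uses six landmarks p1…p6 per eye, where p1 and p4 are the horizontal corners and p2, p3 (upper lid) pair with p6, p5 (lower lid): EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2‖p1 − p4‖). A sketch:

```python
import numpy as np

def eye_aspect_ratio(pts):
    """pts: array of six (x, y) eye landmarks p1..p6.
    The ratio drops toward 0 as the eye closes."""
    p1, p2, p3, p4, p5, p6 = np.asarray(pts, dtype=float)
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

# Synthetic open eye: lids sit one unit above/below the corner axis.
open_eye = [(0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1)]
print(round(eye_aspect_ratio(open_eye), 2))  # 0.5
```

For a closed eye, the upper and lower lid landmarks coincide, so the vertical distances (and hence the EAR) fall to zero, which is what the threshold test exploits.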

5.5 Drowsiness Detection

Monitor the EAR values over time. Apply a threshold to determine whether the driver's
eyes have been closed for an extended period, indicating drowsiness. If the threshold
is exceeded, trigger an alarm to notify the driver or take appropriate action to
prevent accidents. By following the above steps, the proposed drowsiness detector aims
to identify drowsiness based on eye behavior and provide timely warnings to ensure
driver safety [6–10].
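The monitoring logic described above reduces to a per-frame counter; a minimal sketch, with the threshold and frame-count limit chosen here purely for illustration:

```python
class DrowsinessMonitor:
    """Counts consecutive frames with EAR below a threshold and
    signals an alarm once the count exceeds a limit."""

    def __init__(self, ear_threshold=0.25, max_closed_frames=15):
        self.ear_threshold = ear_threshold
        self.max_closed_frames = max_closed_frames
        self.closed_frames = 0

    def update(self, ear):
        """Feed one frame's EAR; return True if the alarm should ring."""
        if ear < self.ear_threshold:
            self.closed_frames += 1
        else:
            self.closed_frames = 0  # eyes opened: reset the counter
        return self.closed_frames > self.max_closed_frames

monitor = DrowsinessMonitor(ear_threshold=0.25, max_closed_frames=3)
readings = [0.32, 0.30, 0.10, 0.09, 0.08, 0.07]
print([monitor.update(e) for e in readings])  # alarm on the last frame
```

Resetting the counter whenever the eyes reopen ensures that ordinary blinks, which last only a few frames, never reach the alarm limit.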
Drowsiness Detection System Using DL Models 537

6 Flow/Block Diagram

Figure 2 is the flow chart of the scoring system. It explains the basic algorithm
behind the counter and alarm. Here, the Eye Aspect Ratio continuously indicates whether the
eyes are closed or open. The system is provided with a minimum Eye Aspect
Ratio (EAR), i.e., the threshold after which the system commences counting the
number of frames for which the driver is detected as drowsy (based on how long the eyes are
closed) (Figs. 2, 5 and 6).
Figure 3 shows the proposed system implemented in a real-life mobile appli-
cation that continuously checks the eye states, head movement, and yawning. This
is the future scope of the project. Using real-time image acquisition and alerting
the driver, we can reduce the accidents caused by drowsy driving (Figs. 3, 5 and 6).

Fig. 2 Scoring system

Fig. 3 Pictorial depiction



Fig. 4 Flow diagram

The diagram below (Fig. 4) explains the flow of the proposed system. The system
first takes input from the webcam, detects the face, and draws a box around the region of
interest, i.e., the eyes. The ROI is fed to the classifier, and the CNN model
classifies whether the driver is alert or drowsy. The system keeps a score
based on the threshold and the number of frames. This score determines whether the driver is
drowsy. If the score exceeds the threshold, an alarm rings to wake the
driver (Figs. 4, 5 and 6).
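Put together, the flow in Fig. 4 is a single capture–detect–classify–score loop. A skeleton with the detection and classification stages stubbed out (real implementations would call the cascade and CNN described earlier):

```python
def run_pipeline(frames, detect_eyes, classify_eye, score_threshold=3):
    """Skeleton of the Fig. 4 flow. detect_eyes(frame) -> eye crops;
    classify_eye(crop) -> True if the eye is open. The score rises
    while eyes are closed and decays while they are open."""
    score = 0
    alarms = []
    for frame in frames:
        eyes = detect_eyes(frame)
        eyes_open = all(classify_eye(e) for e in eyes)
        score = max(0, score - 1) if eyes_open else score + 1
        alarms.append(score > score_threshold)
    return alarms

# Stub stages: each "frame" is just a flag saying whether eyes are open.
frames = [True, True, False, False, False, False, False]
alarms = run_pipeline(frames,
                      detect_eyes=lambda f: [f],
                      classify_eye=lambda open_flag: open_flag)
print(alarms)  # alarm once the score passes the threshold
```

Letting the score decay rather than reset outright is one way to smooth over single-frame detection glitches; the exact update rule is a design choice, not something the paper specifies.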

7 Result

This system is still under development and will be able to detect drowsiness effi-
ciently after incorporating additional models and techniques. The
preprocessing successfully obtains results with respect to facial recognition. The
Haar cascade classifier is employed to locate and identify faces, returning an
array that includes the x and y coordinates, as well as the width and height of each
bounding box.
Subsequently, iteration is performed over the detected faces, enabling the drawing
of a bounding box around each identified face. By applying this technique, the system
can detect regions of interest, such as the eyes, when processing video input and deter-
mine whether the driver is drowsy. On a Lenovo ThinkPad T470 (i7 processor,
16 GB RAM), it took approximately 85 min to run 5 epochs. The model gives an
accuracy of 93.94% after the final epoch (5/5) (Figs. 5 and 6; Table 2).
The average accuracy across all epochs, i.e., the accuracy of the model, is 93.294%,
and the final loss of the model is 18.11% (Table 2).

Fig. 5 Result with specs

Fig. 6 Result without specs

Table 2 Accuracy and loss of model for InceptionV3

Total epochs   Accuracy (%)   Val. loss (%)
Epoch 1        92.34          21.67
Epoch 2        93.01          19.15
Epoch 3        93.48          19.37
Epoch 4        93.70          19.12
Epoch 5        93.94          18.11

Bold represents the highest accuracy and lowest val. loss

Table 3 Accuracy and loss of model for ResNet50

Total epochs   Accuracy (%)   Val. loss (%)
Epoch 1        50.35          69.32
Epoch 2        50.38          69.31
Epoch 3        50.61          69.31
Epoch 4        50.45          69.31
Epoch 5        50.63          69.31

Bold represents the highest accuracy and lowest val. loss

The average accuracy and loss after using another pre-built neural
network architecture, ResNet50, were 50.484% and 69.312%, respectively (Table 3). The layers
we added did not improve the accuracy. The better model is Incep-
tionV3, with 93.3% accuracy. The layers used in these models included mixed
sequences of Dropout, Input, Flatten, Dense, and MaxPooling2D. The activation
functions utilized include ReLU and Softmax [11–15].
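A sketch of how such a model could be assembled in Keras, using several of the layer types named above. The layer sizes, the 224 × 224 input, and the two-class softmax head are illustrative assumptions, and weights=None stands in for the pretrained weights so the sketch stays self-contained:

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model

# Backbone: InceptionV3 without its classification head.
inputs = Input(shape=(224, 224, 3))
base = InceptionV3(include_top=False, weights=None, input_tensor=inputs)
base.trainable = False  # freeze the backbone for transfer learning

# Custom head: Flatten -> Dense(ReLU) -> Dropout -> 2-way Softmax.
x = Flatten()(base.output)
x = Dense(128, activation="relu")(x)
x = Dropout(0.5)(x)
outputs = Dense(2, activation="softmax")(x)  # open vs. closed

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 2)
```

Freezing the backbone and training only the added head is the standard transfer-learning setup; swapping InceptionV3 for ResNet50 in the first line would reproduce the Table 3 comparison.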

8 Conclusion and Future Scope

The proposed system has the potential to reduce the number of accidents
caused by drowsiness and sleepiness while driving. The system does not
guarantee that the driver will not fall asleep at the wheel, as that also requires proper
7–8 h of sleep, not taking medications that cause drowsiness before driving, etc. The
system can be made more beneficial if it is integrated into phones, connected
with Google Maps, and triggered automatically when the driver opens Maps in a
moving vehicle. This would help predict the drowsiness of the driver continuously without
depending on the person to start the app. Multiple other detection features,
such as yawning and slowness in blinking, can also be added. Building multiple models and
combining them using transfer learning can provide more diversity and raise
accuracy. Integrating this system with the car itself could change the evolution of the
industry. With a combination of IoT and hardware, this prediction model
can be made more accurate and precise.

References

1. Jayanthy S, Chandru R, Yuvaprakash YM, Sathishbabu D (2021) Automatic warning system for
drivers using deep learning algorithm. In: 2021 Second international conference on electronics
and sustainable communication systems (ICESC). IEEE, pp 1730–1737
2. Rosebrock A (2017) Drowsiness detection with OpenCV. Py Image Search
3. Rosebrock A (2017) Eye blink detection with OpenCV, Python, and dlib. https://www.pyimagesearch.com/2017/0, 4, 24
4. Muthukumaran N, Prasath NRG, Kabilan R (2019) Driver sleepiness detection using deep
learning convolution neural network classifier. In: 2019 Third international conference on
I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC). IEEE, pp 386–390
5. Rundo F, Rinella S, Massimino S, Coco M, Fallica G, Parenti R, Perciavalle V (2019) An
innovative deep learning algorithm for drowsiness detection from EEG signal. Computation
7(1):13
6. Saha P, Bhattacharjee D, De BK, Nasipuri M (2015) An approach to detect the region of interest
of expressive face images. Procedia Comput Sci 46:1739–1746
7. Hazarika BB, Gupta D, Gupta U (2023) Intuitionistic fuzzy Kernel random vector functional
link classifier. In: Machine intelligence techniques for data analysis and signal processing:
proceedings of the 4th international conference MISP 2022, vol 1. Springer Nature, Singapore,
pp 881–889
8. Media Research Lab (MRL). Dataset: http://mrl.cs.vsb.cz/eyedataset/

9. Malviya L, Mal S, Kumar R, Roy B, Gupta U, Pantola D, Gupta M (2023) Mental stress level
detection using LSTM for WESAD dataset. In: Proceedings of data analytics and management:
ICDAM 2022. Springer Nature, Singapore, pp 243–250
10. Gupta U, Gupta D (2022) Least squares structural twin bounded support vector machine on
class scatter. Appl Intell, pp 1–31
11. Fouad IA (2023) A robust and efficient EEG-based drowsiness detection system using different
machine learning algorithms. Ain Shams Eng J 14(3):101895
12. Prasath N, Sreemathy J, Vigneshwaran P (2022) Driver drowsiness detection using machine
learning algorithm. In: 2022 8th International conference on advanced computing and
communication systems (ICACCS), vol 1. IEEE, pp 01–05
13. El-Nabi SA, El-Shafai W, El-Rabaie ESM, Ramadan KF, Abd El-Samie FE, Mohsen S (2023)
Machine learning and deep learning techniques for driver fatigue and drowsiness detection: a
review. Multim Tools Appl, pp 1–37
14. Albadawi Y, AlRedhaei A, Takruri M (2023) Real-time machine learning-based driver
drowsiness detection using visual features. J Imaging 9(5):91
15. Singh J, Kanojia R, Singh R, Bansal R, Bansal S (2023) Driver drowsiness detection system:
an approach by machine learning application. arXiv preprint arXiv:2303.06310
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2023
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 788,
https://doi.org/10.1007/978-981-99-6553-3