
Lecture Notes in Networks and Systems 796

Shailesh Tiwari
Munesh C. Trivedi
Mohan L. Kolhe
Brajesh Kumar Singh Editors

Advances
in Data and
Information
Sciences
Proceedings of ICDIS 2023
Lecture Notes in Networks and Systems

Volume 796

Series Editor
Janusz Kacprzyk , Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems, and others. Of particular value to both
the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).
Shailesh Tiwari · Munesh C. Trivedi ·
Mohan L. Kolhe · Brajesh Kumar Singh
Editors

Advances in Data
and Information Sciences
Proceedings of ICDIS 2023
Editors
Shailesh Tiwari
KIET Group of Institutions
Ghaziabad, Uttar Pradesh, India

Munesh C. Trivedi
Department of Computer Science and Engineering
National Institute of Technology Agartala
Tripura, India

Mohan L. Kolhe
Faculty of Engineering and Science
University of Agder
Kristiansand, Norway

Brajesh Kumar Singh
Department of Computer Science and Engineering
R. B. S. Engineering Technical Campus, Bichpuri
Agra, Uttar Pradesh, India

ISSN 2367-3370 ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-99-6905-0 ISBN 978-981-99-6906-7 (eBook)
https://doi.org/10.1007/978-981-99-6906-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore

Paper in this product is recyclable.


Preface

ICDIS is a major multidisciplinary conference organized with the objective of bringing together researchers, developers and practitioners from academia and industry working in all areas of computer and computational sciences. It is organized specifically to help the computer industry derive the advances of next-generation computer and communication technology. Invited researchers present the latest developments and technical solutions.
Technological developments all over the world depend upon the globalization of various research activities, and the exchange of information and innovative ideas is necessary to accelerate the development of technology. With this ideology in mind, the 5th International Conference on Data and Information Sciences (ICDIS-2023) was organized at Raja Balwant Singh Engineering Technical Campus, Bichpuri, Agra, India, during June 16–17, 2023.
The 5th International Conference on Data and Information Sciences was organized with the objective of enhancing research activities on a large scale. The Technical Program Committee and Advisory Board of ICDIS-2023 include eminent academicians, researchers and practitioners from abroad as well as from all over the nation.
A sincere effort has been made to make this volume an immense source of knowledge by including 42 manuscripts. The selected manuscripts went through a rigorous review process and were revised by the authors to incorporate the reviewers' suggestions.
ICDIS-2023 received around 211 submissions from around 506 authors from countries including Indonesia, the USA, Norway, Poland and Saudi Arabia. Each submission went through a similarity check, and on the basis of the similarity report each submission was rigorously reviewed by at least two reviewers; some submissions received more than two reviews. On the basis of these reviews, 42 high-quality papers were selected for publication in two proceedings volumes, with an acceptance rate of 19.9%.
We are thankful to the keynote speakers, Prof. Marius M. Balas, Aurel Vlaicu University of Arad, Romania, and Prof. K. V. Arya, IIITM Gwalior, Madhya Pradesh, India, for enlightening the participants with their knowledge and insights. We are also


thankful to the delegates and authors for their participation and their interest in ICDIS-2023 as a platform to share their ideas and innovations. We are also thankful to Prof. Dr. Janusz Kacprzyk, Series Editor, LNNS, Springer Nature, and Mr. Aninda Bose, Executive Editor, Springer Nature, India, for providing guidance and support. We also extend our heartfelt gratitude to the reviewers and Technical Program Committee members for their concern and efforts in the review process. We are indeed thankful to everyone directly or indirectly associated with the conference organizing team for leading the conference toward success.
Although utmost care has been taken in compilation and editing, a few errors may still occur. We request the participants to bear with such errors and lapses (if any). We wish you all the best.

Agra, India

Shailesh Tiwari
Munesh C. Trivedi
Mohan L. Kolhe
Brajesh Kumar Singh
Contents

Emotion-Aware Music Recommendations: A Transfer Learning
Approach Using Facial Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Sai Teja Annam, Jyostna Devi Bodapati, and RajaSekhar Konda
A Novel Image Captioning Approach Using CNN and MLP . . . . . . . . . . . 13
Swati Sharma, Vivek Tomar, Neha Yadav, and Mukul Aggarwal
Route Optimizations Using Genetic Algorithms for Wireless Body
Area Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Gagan Sharma
Prediction Model for the Healthcare Industry Using Machine
Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Birendra Kumar Saraswat, Aditya Saxena, and P. C. Vashist
An Efficient Framework for Crime Prediction Using Feature
Engineering and Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Vengadeswaran, Dhanush Binu, and Lokesh Rai
Development of an Online Collectable Items Marketplace Using
Modern Practices of SDLC and Web Technologies . . . . . . . . . . . . . . . . . . . . 61
Saransh Khulbe, Divyam Gumber, and Shailendra Pratap Singh
Facial Emotion Recognition Using Machine Learning Algorithms:
Methods and Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Akshat Gupta
Similar Intensity-Based Euclidean Distance Feature Vector
for Mammogram Image Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Bhanu Prakash Sharma and Ravindra Kumar Purwar
Prediction of Lumpy Virus Skin Disease Using Artificial Intelligence . . . 95
Pankaj Singh Kholiya, Kriti, and Amit Kumar Mishra


Two-Factor Authentication Using QR Code and OTP . . . . . . . . . . . . . . . . . 105
Avanish Gupta, Akhilesh Singh, Anurag Tripathi, and Swati Sharma
bSafe: A Framework for Hazardous Situation Monitoring
in Industries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Nikhil Kashyap, Nikhil Rajora, and Shanu Sharma
Comparative Review of Different Techniques for Predictive
Analytics in Crime Data Over Online Social Media . . . . . . . . . . . . . . . . . . . 127
Monika and Aruna Bhat
An Approach Towards Modification of Playfair Cipher
Using 16 × 16 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Sarmistha Podder, Aditya Harsh, Jayanta Pal, and Nirmalya Kar
Real and Fake Job Classification Using NLP and Machine
Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Manu Gupta, Naga Sridevi Piratla, Sripath Kumar Chakrapani,
and Yeshwanth Pasem
An Improved Privacy-Preserving Multi-factor Authentication
Protocol for Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Shaurya Kanwar, Sunil Prajapat, Deepika Gautam, and Pankaj Kumar
Blockchain-Powered Crowdfunding: Assessing the Viability,
Benefits, and Risks of a Decentralized Approach . . . . . . . . . . . . . . . . . . . . . . 179
Manu Midha, Saumyamani Bhardwaz, Rohan Godha,
Aditya Raj Mehta, Sahul Kumar Parida, and Saswat Kumar Panda
Geospatial Project: Landslide Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Harsh Sharma, Harsh Jindal, Megha Sharma, Abhinav Sehgal,
Abhinav Sharma, and Rohan Godha
Yoga Pose Identification Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . 203
Ashutosh Kumar Verma, Divyanshu Sharma, Himanshu Aggarwal,
and Naveen Chauhan
Predicting Size and Fit in Fashion E-Commerce . . . . . . . . . . . . . . . . . . . . . . 215
Itisha Kumari and Vijay Verma
Multimodal Authentication Token Through Automatic Part
of Speech (POS) Tagged Word Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Dharmendra Kumar and Sudhansh Sharma
Sensitive Content Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Harsha Vardhan Puvvadi and Shyamala L
Online Social Networks: An Efficient Framework for Fake Profiles
Detection Using Optimizable Bagged Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Chanchal Kumar, Taran Singh Bharati, and Shiv Prakash

Exploring Risk Factors for Cardiovascular Disease: Insights
from NHANES Database Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Gaurav Parashar, Alka Chaudhary, and Dilkeshwar Pandey
Review of Local Binary Pattern Histograms for Intelligent CCTV
Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
Deepak Sharma, Brajesh Kumar Singh, and Erma Suryani
Enhancing Machine Learning Model Using Explainable AI . . . . . . . . . . . . 287
Raghav Shah, Amruta Pawar, and Manni Kumar
IoT-Assisted Mushroom Cultivation in Agile Environment . . . . . . . . . . . . 299
Abhi Kathiria, Parva Barot, Manish Paliwal, and Aditya Shastri
A Comprehensive Survey of Machine Learning Techniques
for Brain Tumor Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
Mriga Jain, Brajesh Kumar Singh, and Mohan Lal Kolhe
Current Advances in Locality-Based and Feature-Based
Transformers: A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
Ankit Srivastava, Munesh Chandra, Ashim Saha, Sonam Saluja,
and Deepshikha Bhati
A Systematic Review on Detection of Gastric Cancer in Endoscopic
Imaging System in Artificial Intelligence Applications . . . . . . . . . . . . . . . . . 337
K. Pooja and R. Kishore Kanna
Propchain: Decentralized Property Management System . . . . . . . . . . . . . . 347
Soma Prathibha, V. Saiganesh, S. Lokesh, Avudaiappan Maheshwari,
and M. A. Kishore
Criminal Prevision with Weapon Identification and Forewarning
Software in Military Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
V. Ceronmani Sharmila, A. Vishnudev, and S. Gautham
Advanced Cyber Security Helpdesk Using Machine Learning
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
V. Ceronmani Sharmila, E. Vishnu, M. R. King Udayaraj,
and V. Sharan Venkatesh
Advanced Persistent Threat Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
V. Ceronmani Sharmila, S. Aswin, B. Muthukumara Vadivel,
and S. Vinutha
Blockchain-Based Supply Chain Management System for Secure
Vaccine Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
V. Ceronmani Sharmila, S. Nandhini, and Ben Franklin Yesuraj

Dehazing of Multispectral Images Using Contrastive Learning In
CycleGAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
S. Kayalvizhi, Badrinath Karthikeyan, Canchibalaji Sathvik,
and Chadalavada Gautham
Stress Analysis Prediction for Coma Patient Using Machine
Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
P. Alwin Infant, J. Charulatha, G. Sadhana, and K. Ragavendra
AI-Powered News Web App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
Sheetal Phatangare, Roopal Tatiwar, Rishikesh Unawane,
Aditya Taware, Aryaman Todkar, and Shubhankar Munshi
Design of Efficient Pipelined Parallel Prefix Lander Fischer Based
on Carry Select Adder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
J. Harirajkumar, R. Shivakumar, S. Swetha, and N. Sasirekha
Non-invasive Diabetes Detection System Using
Photoplethysmogram Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
Dayakshini Sathish, Souhardha S. Poojary, Samarth Shetty,
Preethesh H. Acharya, and Sathish Kabekody
Hybrid Machine Learning Algorithms for Effective Prediction
of Water Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
Kavitha Datchanamoorthy, B. Padmavathi, Dhamini Devaraj,
T. R. Gayathri, and V. Hasitha
OCR-Based Ingredient Recognition for Consumer Well-Being . . . . . . . . . 481
S. Kayalvizhi, N. Akash Silas, R. K. Tarunaa, and Shivani Pothirajan
Design and Implementation of an SPI to I2C Bridge for Seamless
Communication and Interoperability Between Devices . . . . . . . . . . . . . . . . 493
J. Harirajkumar, M. Santhosh, N. Sasirekha, and P. Vivek Karthick
Greenhouse Automation System Using ESP32 and ThingSpeak
for Temperature, Humidity, and Light Control . . . . . . . . . . . . . . . . . . . . . . . 507
Rajesh Singh, Ahmed Hussain, Laith H. A. Fezaa, Ganesh Gupta,
Anil Pratap Singh, and Ayush Dogra
Performance Analysis of Terrain Classifiers Using Different
Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Bobbinpreet Kaur, Arpan Garg, Haider Alchilibi, Laith H. A. Fezaa,
Rupinderjit Kaur, and Bhawna Goyal

High Performance RF Doherty Power Amplifier for Future
Wireless Communication Applications—Review, Challenges,
and Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
Sukhpreet Singh, Atul Babbar, Ahmed Alkhayyat, Ganesh Gupta,
Sandeep Singh, and Jasgurpreet Singh Chohan

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545


Editors and Contributors

About the Editors

Prof. Shailesh Tiwari, Ph.D. currently works as Director and Professor in the Computer Science and Engineering Department, Krishna Engineering College, Ghaziabad, India. He also holds the responsibility of Additional Director, KIET Group of Institutions, Ghaziabad, India. He is an alumnus of Motilal Nehru National Institute of Technology Allahabad, India. His primary areas of research are software testing and the implementation of optimization algorithms and machine learning techniques in various engineering problems. He has published more than 100 papers in international journals and in proceedings of reputed international conferences. He has edited special issues of several Scopus-, SCI- and E-SCI-indexed journals, as well as several books published by Springer. He has published six Indian patents as IPRs and has organized several international conferences under the banner of IEEE, ACM and Springer.

Dr. Munesh C. Trivedi has more than 18 years of teaching experience (12 of them post-Ph.D.) and is currently working with the National Institute of Technology, Agartala, Tripura, India. He has successfully filed 60 patents (51 national and 9 international) in Germany, South Africa and Australia. He has also published 12 textbooks and 148 research papers in reputed international journals and proceedings, and has edited 38 books for Springer Nature. He has received numerous awards, including the Young Scientist Visiting Fellowship, the Albert Einstein Research Scientist Award, the Best Senior Faculty Award, Outstanding Scientist, the Dronacharya Award, Author of the Year and the Vigyan Ratan Award from national as well as international forums. He has also organized more than 32 international conferences technically sponsored by IEEE, ACM and Springer.

Prof. Dr. Mohan L. Kolhe is a full professor in smart grid and renewable energy at
the Faculty of Engineering and Science of the University of Agder (Norway). He is

a leading renewable energy technologist with three decades of academic experience at the international level, and he previously held academic positions at the world's prestigious universities, e.g., University College London (UK/Australia), University of Dundee (UK), University of Jyvaskyla (Finland), and the Hydrogen Research Institute, QC (Canada). In addition, he was a member of the Government of South Australia's first Renewable Energy Board (2009–2011) and worked on developing renewable energy policies. His research in energy systems has been recognized within the top 2% of scientists globally by Stanford University's 2020 and 2021 metrics. He is an internationally recognized pioneer in his field, whose top 10 published works have an average of over 175 citations each.

Prof. Brajesh Kumar Singh is presently working as Professor and Head of the Department of Computer Science and Engineering, R. B. S. Engineering Technical Campus, Agra, India, with more than 20 years of teaching experience. He completed his doctorate in Computer Science and Engineering at Motilal Nehru National Institute of Technology, Allahabad (Uttar Pradesh). His key areas of research are software engineering, software project management, data mining, soft computing and machine learning. He has supervised two Ph.D. candidates. Prof. Singh has delivered several invited talks and keynote addresses and chaired sessions in national and international conferences of high repute in India and abroad. He has conducted collaborative training programs/workshops with IIT Bombay.

Contributors

Preethesh H. Acharya Department of Electronics and Communication Engineering, SJEC, Mangaluru, Karnataka, India
Himanshu Aggarwal Department of Computer Science and Engineering, KIET
Group of Institutions, Ghaziabad, India
Mukul Aggarwal KIET Group of Institutions, Ghaziabad, India
N. Akash Silas Computer Science and Engineering, Easwari Engineering College,
Chennai, India
Haider Alchilibi Medical Technical College, Al-Farahidi University, Baghdad, Iraq
Ahmed Alkhayyat College of Technical Engineering, The Islamic University,
Najaf, Iraq
P. Alwin Infant Computer Science and Engineering, Panimalar Engineering
College, Chennai, India
Sai Teja Annam Department of Advanced Computer Science and Engineering,
VFSTR Deemed to Be University, Vadlamudi, India

S. Aswin Department of Information Technology, Hindustan Institute of Technology and Science, Chennai, India
Atul Babbar Department of Mechanical Engineering, SGT University, Gurugram,
Haryana, India
Parva Barot School of Technology, Pandit Deendayal Energy University, Gandhi-
nagar, India
Taran Singh Bharati Department of Computer Science, Jamia Millia Islamia, New
Delhi, India
Saumyamani Bhardwaz Department of Computer Science and Engineering,
Chandigarh University, Kochi, India
Aruna Bhat Department of Computer Science and Engineering, Delhi Technolog-
ical University, New Delhi, India
Deepshikha Bhati University of Kent, Kent, OH, USA
Dhanush Binu Department of Computer Science and Engineering, Indian Institute
of Information Technology Kottayam, Kottayam, India
Jyostna Devi Bodapati Department of Advanced Computer Science and Engi-
neering, VFSTR Deemed to Be University, Vadlamudi, India
V. Ceronmani Sharmila Department of Information Technology, Hindustan Insti-
tute of Technology and Science, Padur, Chennai, India
Sripath Kumar Chakrapani Department of ECM, Sreenidhi Institute of Science
and Technology, Hyderabad, India
Munesh Chandra Computer Science and Engineering Department, NIT Agartala,
Agartala, India
J. Charulatha Department of Information Technology, Hindustan Institute of
Technology and Science, Chennai, India
Alka Chaudhary AMITY Institute of Information Technology, AMITY Univer-
sity, Noida, UP, India
Naveen Chauhan Department of Computer Science and Engineering, KIET Group
of Institutions, Ghaziabad, India
Jasgurpreet Singh Chohan University Center of Research and Development,
Chandigarh University, Chandigarh, India
Kavitha Datchanamoorthy Easwari Engineering College, Ramapuram, Chennai,
India
Dhamini Devaraj Easwari Engineering College, Ramapuram, Chennai, India
Ayush Dogra Chitkara University Institute of Engineering and Technology,
Chitkara University, Punjab, India

Laith H. A. Fezaa Department of Optical Techniques, Al-Zahrawi University College, Karbala, Iraq
Arpan Garg Department of ECE, Chandigarh University, Gharuan, Mohali, India
Deepika Gautam Department of Mathematics, Srinivasa Ramanujan, Central
University of Himachal Pradesh, Dharamshala, (H.P.), India
Chadalavada Gautham Computer Science and Engineering, Easwari Engineering
College Chennai, Chennai, India
S. Gautham Department of Information Technology, Hindustan Institute of Tech-
nology and Science, Chennai, India
T. R. Gayathri Easwari Engineering College, Ramapuram, Chennai, India
Rohan Godha Department of Computer Science and Engineering, Chandigarh
University, Kochi, India
Bhawna Goyal Department of ECE, Chandigarh University, Gharuan, Mohali,
India
Divyam Gumber School of Computing Science and Engineering, Galgotias
University, Greater Noida, Uttar Pradesh, India
Akshat Gupta Department of CSE, National Institute of Technology, Goa, India
Avanish Gupta KIET Group of Institutions, Delhi, India
Ganesh Gupta Department of CSE, Sharda University, Greater Noida, India
Manu Gupta Department of ECM, Sreenidhi Institute of Science and Technology,
Hyderabad, India
J. Harirajkumar Department of Electronics and Communication Engineering,
Sona College of Technology, Salem, Tamil Nadu, India
Aditya Harsh CSE Department, NIT Agartala, Tripura, India
V. Hasitha Easwari Engineering College, Ramapuram, Chennai, India
Ahmed Hussain Medical Technical College, Al-Farahidi University, Baghdad, Iraq
Mriga Jain R.B.S. Engineering Technical Campus, Bichpuri, Agra, India
Harsh Jindal Computer Science and Engineering, Chandigarh University, Kochi,
India
Sathish Kabekody Department of Electrical and Electronics Engineering, SJEC,
Mangaluru, Karnataka, India
Shaurya Kanwar Department of Mathematics, Srinivasa Ramanujan, Central
University of Himachal Pradesh, Dharamshala, (H.P.), India
Nirmalya Kar CSE Department, NIT Agartala, Tripura, India

Badrinath Karthikeyan Computer Science and Engineering, Easwari Engineering College Chennai, Chennai, India
Nikhil Kashyap Department of Computer Science and Engineering, ABES Engi-
neering College, Ghaziabad, India
Abhi Kathiria School of Technology, Pandit Deendayal Energy University, Gand-
hinagar, India
Bobbinpreet Kaur Chandigarh University, Gharuan, Mohali, India
Rupinderjit Kaur Department of Computer Science and Engineering, Gulzar
Group of Institutions, Khanna, Punjab, India
S. Kayalvizhi Computer Science and Engineering, Easwari Engineering College,
Chennai, India
Pankaj Singh Kholiya School of Computing, DIT University, Dehradun, India
Saransh Khulbe School of Computing Science and Engineering, Galgotias Univer-
sity, Greater Noida, Uttar Pradesh, India
M. R. King Udayaraj Department of Information Technology, Hindustan Institute
of Technology and Science, Padur, Chennai, India
M. A. Kishore Department of Information Technology, Sri Sairam Engineering
College, Chennai, India
R. Kishore Kanna Department of Biomedical Engineering, Vels Institute of
Science, Technology and Advanced Studies, Chennai, India
Mohan Lal Kolhe Faculty of Engineering and Science, University of Agder,
Grimstad, Norway
RajaSekhar Konda Software Engineering Manager & IEEE Senior Member,
IEEE, San Francisco, CA, USA
Kriti School of Computing, DIT University, Dehradun, India
Chanchal Kumar Department of Computer Science, Jamia Millia Islamia, New
Delhi, India
Dharmendra Kumar SOCIS, IGNOU, New Delhi, India
Itisha Kumari National Institute of Technology, Kurukshetra, Kurukshetra,
Haryana, India
Manni Kumar Department of Computer Science and Engineering, Chandigarh
University, Mohali, Punjab, India
Pankaj Kumar Department of Mathematics, Srinivasa Ramanujan, Central Univer-
sity of Himachal Pradesh, Dharamshala, (H.P.), India

S. Lokesh Department of Information Technology, Sri Sairam Engineering College, Chennai, India
Avudaiappan Maheshwari Department of CINTEL, SRM University, Chennai,
India
Aditya Raj Mehta Department of Computer Science and Engineering, Chandigarh
University, Kochi, India
Manu Midha Department of Computer Science and Engineering, Chandigarh
University, Kochi, India
Amit Kumar Mishra Department of Computer Science and Engineering, Jain
University, Bengaluru, India
Monika Department of Computer Science and Engineering, Delhi Technological
University, New Delhi, India
Shubhankar Munshi Vishwakarma Institute of Technology, Pune, India
B. Muthukumara Vadivel Department of Information Technology, Hindustan
Institute of Technology and Science, Chennai, India
S. Nandhini Hindustan Institute of Technology and Science, Chennai, India
B. Padmavathi Easwari Engineering College, Ramapuram, Chennai, India
Jayanta Pal Information Technology Department, Tripura University, West
Tripura, India
Manish Paliwal School of Technology, Pandit Deendayal Energy University,
Gandhinagar, India
Saswat Kumar Panda Department of Computer Science and Engineering, Chandi-
garh University, Kochi, India
Dilkeshwar Pandey Department of Computer Science and Engineering, KIET
Group of Institutions, Ghaziabad, UP, India
Gaurav Parashar AMITY Institute of Information Technology, AMITY Univer-
sity, Noida, UP, India
Sahul Kumar Parida Department of Computer Science and Engineering, Chandi-
garh University, Kochi, India
Yeshwanth Pasem Department of ECM, Sreenidhi Institute of Science and Tech-
nology, Hyderabad, India
Amruta Pawar Department of Computer Science and Engineering, Chandigarh
University, Mohali, Punjab, India
Sheetal Phatangare Vishwakarma Institute of Technology, Pune, India

Sarmistha Podder Information Technology Department, Tripura University, West Tripura, India
K. Pooja Department of Biomedical Engineering, Vels Institute of Science, Tech-
nology and Advanced Studies, Chennai, India
Souhardha S. Poojary Department of Electronics and Communication Engi-
neering, SJEC, Mangaluru, Karnataka, India
Shivani Pothirajan Computer Science and Engineering, Easwari Engineering
College, Chennai, India
Sunil Prajapat Department of Mathematics, Srinivasa Ramanujan, Central Univer-
sity of Himachal Pradesh, Dharamshala, (H.P.), India
Shiv Prakash Department of Electronics and Communication, University of Alla-
habad, Prayagraj, India
Shailendra Pratap Singh School of Computing Science and Engineering, Galgo-
tias University, Greater Noida, Uttar Pradesh, India
Soma Prathibha Department of Information Technology, Sri Sairam Engineering
College, Chennai, India
Ravindra Kumar Purwar USICT, Guru Gobind Singh Indraprastha University,
Delhi, India
Harsha Vardhan Puvvadi Vellore Institute of Technology Chennai Campus,
Chennai, Tamil Nadu, India
K. Ragavendra Department of Information Technology, Hindustan Institute of
Technology and Science, Chennai, India
Lokesh Rai Department of Computer Science and Engineering, Indian Institute of
Information Technology Kottayam, Kottayam, India
Nikhil Rajora Department of Computer Science and Engineering, ABES Engi-
neering College, Ghaziabad, India
G. Sadhana Department of Information Technology, Hindustan Institute of Tech-
nology and Science, Chennai, India
Ashim Saha Computer Science and Engineering Department, NIT Agartala, Agar-
tala, India
V. Saiganesh Department of Information Technology, Sri Sairam Engineering
College, Chennai, India
Sonam Saluja Computer Science and Engineering Department, NIT Agartala,
Agartala, India
M. Santhosh Department of Electronics and Communication Engineering, Sona
College of Technology, Salem, Tamil Nadu, India
Birendra Kumar Saraswat GLA University, Mathura, U.P., India


N. Sasirekha Department of Electronics and Communication Engineering, Sona
College of Technology, Salem, Tamil Nadu, India
Dayakshini Sathish Department of Electronics and Communication Engineering,
SJEC, Mangaluru, Karnataka, India
Canchibalaji Sathvik Computer Science and Engineering, Easwari Engineering
College Chennai, Chennai, India
Aditya Saxena GLA University, Mathura, U.P., India
Abhinav Sehgal Computer Science and Engineering, Chandigarh Group of
College, Sahibzada Ajit Singh Nagar, India
Raghav Shah Department of Computer Science and Engineering, Chandigarh
University, Mohali, Punjab, India
V. Sharan Venkatesh Department of Information Technology, Hindustan Institute
of Technology and Science, Padur, Chennai, India
Abhinav Sharma Computer Science and Engineering, Chandigarh Group of
College, Sahibzada Ajit Singh Nagar, India
Bhanu Prakash Sharma USICT, Guru Gobind Singh Indraprastha University,
Delhi, India
Deepak Sharma R.B.S. Engineering Technical Campus, Bichpuri, Agra, India
Divyanshu Sharma Department of Computer Science and Engineering, KIET
Group of Institutions, Ghaziabad, India
Gagan Sharma CSE Department, Chandigarh University, Mohali, India
Harsh Sharma Computer Science and Engineering, Chandigarh University, Kochi,
India
Megha Sharma Computer Science and Engineering, Chandigarh University,
Kochi, India
Shanu Sharma Department of Computer Science and Engineering, ABES Engi-
neering College, Ghaziabad, India
Sudhansh Sharma School of Computers and Information Sciences, IGNOU, New
Delhi, India
Swati Sharma KIET Group of Institutions, Ghaziabad, India
V. Ceronmani Sharmila Hindustan Institute of Technology and Science, Chennai,
India
Aditya Shastri School of Technology, Pandit Deendayal Energy University, Gand-
hinagar, India
Samarth Shetty Department of Electronics and Communication Engineering, SJEC, Mangaluru, Karnataka, India
R. Shivakumar Department of Electrical and Electronics Engineering, Sona
College of Technology, Salem, Tamil Nadu, India
Shyamala L Vellore Institute of Technology Chennai Campus, Chennai, Tamil
Nadu, India
Akhilesh Singh KIET Group of Institutions, Delhi, India
Anil Pratap Singh Department of CSE, IES Institute of Technology and Manage-
ment, IES University, Bhopal, India
Brajesh Kumar Singh R.B.S. Engineering Technical Campus, Bichpuri, Agra,
India
Rajesh Singh Department of ECE, Uttaranchal Institute of Technology, Uttaranchal
University, Dehradun, India
Sandeep Singh University Center of Research and Development, Chandigarh
University, Chandigarh, India
Sukhpreet Singh Department of ECE, Chandigarh University, Gharuan, Mohali,
India
Naga Sridevi Piratla Department of ECM, Sreenidhi Institute of Science and
Technology, Hyderabad, India
Ankit Srivastava Computer Science and Engineering Department, NIT Agartala,
Agartala, India
Erma Suryani Department of Information Systems, Institut Teknologi Sepuluh
Nopember (ITS), Surabaya, Indonesia
S. Swetha Department of Electronics and Communication Engineering, Sona
College of Technology, Salem, Tamil Nadu, India
R. K. Tarunaa Computer Science and Engineering, Easwari Engineering College,
Chennai, India
Roopal Tatiwar Vishwakarma Institute of Technology, Pune, India
Aditya Taware Vishwakarma Institute of Technology, Pune, India
Aryaman Todkar Vishwakarma Institute of Technology, Pune, India
Vivek Tomar Graphic Era (Deemed-to-be-University), Dehradun, India
Anurag Tripathi KIET Group of Institutions, Delhi, India
Rishikesh Unawane Vishwakarma Institute of Technology, Pune, India
P. C. Vashist G L Bajaj Institute of Technology and Management, Greater Noida,
U.P., India
Vengadeswaran Department of Computer Science and Engineering, Indian Institute of Information Technology Kottayam, Kottayam, India
Ashutosh Kumar Verma Department of Computer Science and Engineering,
KIET Group of Institutions, Ghaziabad, India
Vijay Verma National Institute of Technology, Kurukshetra, Kurukshetra, Haryana,
India
S. Vinutha Department of Information Technology, Hindustan Institute of Tech-
nology and Science, Chennai, India
E. Vishnu Department of Information Technology, Hindustan Institute of Tech-
nology and Science, Padur, Chennai, India
A. Vishnudev Department of Information Technology, Hindustan Institute of
Technology and Science, Chennai, India
P. Vivek Karthick Department of Electronics and Communication Engineering,
Sona College of Technology, Salem, Tamil Nadu, India
Neha Yadav KIET Group of Institutions, Ghaziabad, India
Ben Franklin Yesuraj Hindustan Institute of Technology and Science, Chennai,
India
Emotion-Aware Music
Recommendations: A Transfer Learning
Approach Using Facial Expressions

Sai Teja Annam, Jyostna Devi Bodapati , and RajaSekhar Konda

Abstract The influence of music on mood and emotions has been widely studied,
highlighting its potential for self-expression and personal delight. As technology
continues to advance exponentially, manually selecting and analyzing music from
the vast array of artists, songs, and listeners becomes impractical. In this study, we
propose a system called the “emotion-aware music recommendations” that leverages
real-time facial expressions to determine a person’s emotional state. Deep learning
models are employed to accurately detect facial emotions, leveraging the principles of
transfer learning. By combining the model’s output with the mapped songs from the
dataset, a personalized playlist is created. The main objective of the study is to effec-
tively classify user emotions into six distinct categories using pre-trained models.
Experimental studies conducted on the proposed approach employ the RAF-ML
benchmark facial expression dataset. The findings indicate that the model outper-
forms existing approaches, demonstrating its effectiveness in generating tailored
music recommendations.

Keywords Pre-trained models · Emotion detection model · Facial emotion


recognition · Music recommendation systems

S. T. Annam · J. D. Bodapati (B)


Department of Advanced Computer Science and Engineering, VFSTR Deemed to Be University,
Vadlamudi 522213, India
e-mail: jyostna.bodapati82@gmail.com; bjd_acse@vignan.ac.in
R. Konda
Software Engineering Manager & IEEE Senior Member, IEEE, San Francisco, CA 94536, USA

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_1

1 Introduction

Music has a profound impact on mood, behavior, and brain function [1]. It is
closely linked to emotions and psychological attributes, with brain regions respon-
sible for music processing also influencing emotions [2]. Emotion detection tech-
nology has gained popularity and has applications in various domains [3]. Advance-
ments in signal processing and feature extraction have facilitated automated emotion
identification in multimedia elements like music [4, 5].
In this study, we propose a facial expression-based emotion identification recom-
mender system. By accurately detecting the user’s emotions, the system generates
a suitable music playlist based on their emotional state, particularly focusing on
negative emotions.
Music is strongly connected to emotions, personality traits, and personal expe-
riences [6]. Facial expressions, which are indicators of emotional states, have been
underutilized in music recommendation systems [7, 8].
The vast collection of music tracks poses challenges in efficient organization
and categorization [9]. Facial images captured through a camera provide input for
emotion identification, enabling the generation of music playlists aligned with the
user’s emotional characteristics [10].
Our proposed system, the facial expression-based music player, aims to iden-
tify human emotions and create personalized playlists accordingly. To address the
complexity of emotions, our system considers combinations of multiple emotions
and utilizes the Real-world Affective Faces (RAF-ML) dataset, which contains images associated
with various emotional labels [11]. Transfer learning using VGG16 weights trained
on the “ImageNet” dataset achieves an accuracy of 81.72%. Overall, our system
offers a more efficient approach to mood categorization in music and enhances the
personalized music listening experience based on facial expressions and emotions.

2 Literature Survey

The goal of a review is to gain a greater understanding of the approaches employed in


a specific subject, as well as to identify any limitations that may be addressed. A literature
review is an academic work that includes existing knowledge, significant discov-
eries, theoretical and methodological advances, and their contributions to a given
topic. The ability of people’s latent features to provide input to multiple systems in
a variety of ways has piqued the curiosity of researchers, scientists, engineers, and
other professionals worldwide [11]. The paper titled “Emotion-based music recom-
mendation system using CNN and collaborative filtering” proposes a music recom-
mendation system that combines collaborative filtering and convolutional neural
network (CNN) techniques to provide personalized music recommendations based
on the user’s emotion [12].

The authors have used the Million Song Dataset (MSD) and Million Playlist
Dataset (MPD) to develop and evaluate the proposed system. The MSD contains
metadata for one million songs, and the MPD contains 1000 playlists, each with
1000 tracks [13]. The authors have also collected emotion labels for the MSD using
Amazon Mechanical Turk. The proposed system consists of two parts: emotion
classification using a CNN and music recommendation using collaborative filtering.
The authors have used a CNN architecture with three convolutional layers and two
fully connected layers for emotion classification [14]. The CNN model was trained
on a subset of the MSD, which contains 22,125 songs and 12,312 users with emotion
labels. The emotion labels were mapped into four classes: happy, sad, angry, and
relaxed. The CNN model achieved an average classification accuracy of 75.22% on
a test set [15].
The authors have used two datasets to develop and evaluate their proposed system.
The first dataset is the GTZAN dataset, which contains 1000 audio clips of 30 s
each, categorized into 10 music genres. The second dataset is the Audio Emotion
Recognition dataset, which contains 593 audio clips of 2–3 min each, categorized
into four emotional classes: happy, sad, calm, and angry [16]. The authors have used
a DCNN architecture with six convolutional layers and two fully connected layers
for emotion recognition. The DCNN model was trained on the GTZAN dataset and
achieved an accuracy of 80.5% for music genre classification. The authors then fine-
tuned the DCNN model on the Audio Emotion Recognition dataset for emotion
recognition, achieving an accuracy of 75.14% [17].
A recent research study proposed a facial emotion-based music recommendation
system that uses computer vision and machine learning algorithms to offer music that
matches a user’s emotional state [18]. The system can recognize six basic emotions:
joyful, sad, angry, surprised, afraid, and neutral. The system’s performance was
examined using a dataset of 1000 songs, and it obtained an accuracy of
78% in recognizing the user's emotional state [19].

3 Methodology

The methodology for the system involves multiple steps, including playlist creation,
image preprocessing, model building for emotion detection, and music recommen-
dation.

3.1 Playlist Creation

Playlist creation is a challenging task due to the increasing number of new songs.
To address this, a neural network-based classifier is developed to categorize songs
into multiple emotions. This classifier utilizes a softmax layer and a dense layer
with six neurons representing different emotions. A threshold probability of 0.2 is
set, meaning that if a class’s probability is at least 0.2, the song will be classified
under that emotion. This allows songs to be classified into multiple emotions using
multi-label classification techniques.
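As an illustration, the 0.2-threshold rule above can be sketched in Python; the label order and the probability vector here are illustrative placeholders, not the paper's actual classifier output.

```python
# Illustrative sketch of the multi-label threshold rule; the label order
# and the probabilities below are hypothetical, not taken from the paper.
EMOTIONS = ["surprise", "fear", "disgust", "happy", "sad", "anger"]

def assign_emotions(probs, threshold=0.2):
    """Return every emotion whose predicted probability reaches the
    threshold, so one song can belong to several emotion classes."""
    return [e for e, p in zip(EMOTIONS, probs) if p >= threshold]

# A song whose classifier output is spread over "surprise" and "happy":
print(assign_emotions([0.35, 0.05, 0.05, 0.45, 0.05, 0.05]))  # ['surprise', 'happy']
```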

3.2 Preprocessing

In the preprocessing step, a webcam or video is used as a data source to capture facial
expressions. Only the face is extracted from the acquired data, as it is the essential
component for depicting emotions from the image data. The images obtained from
the RAF-ML dataset may come in various sizes, so they are resized to a uniform size
of 224 × 224. This standard size ensures consistency and facilitates the passage of
images through the layers of the VGG16 model during transfer learning. Additionally,
any noise present in the image is adjusted to improve the quality of the data.
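A dependency-free sketch of this crop-and-resize step is shown below; in practice a face detector (e.g., a Haar cascade) would supply the bounding box, and a library resize would replace the nearest-neighbour sampling used here.

```python
import numpy as np

def preprocess_face(frame, box, size=224):
    """Crop the detected face region from a frame (the box would come from
    a face detector), resize it to size x size with nearest-neighbour
    sampling, and scale pixel values to [0, 1]."""
    x, y, w, h = box
    face = frame[y:y + h, x:x + w]
    rows = np.arange(size) * face.shape[0] // size  # source row per target row
    cols = np.arange(size) * face.shape[1] // size  # source col per target col
    face = face[rows][:, cols]
    return face.astype(np.float32) / 255.0

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # dummy webcam frame
x = preprocess_face(frame, box=(100, 50, 120, 150))
print(x.shape)  # (224, 224, 3)
```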

3.3 Model Building for Emotion Detection

For model building, images of various sizes from the RAF-ML dataset are used. These
images undergo cropping, scaling, and normalization before being used to build a
model suitable for detecting emotions. Transfer learning is employed by leveraging
the pre-trained weights from the VGG16 model, which was originally trained on
the ImageNet dataset. Modifications are made to the model architecture for emotion
detection, including freezing the weights of the first four convolutional blocks and
making subsequent weights trainable. The top layer of VGG16 is removed, and
two additional dense layers are added. The final dense layer with six neurons and a
softmax activation function is responsible for classifying the emotions present in an
image (Table 1).
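Assuming a Keras-style implementation, the modified network can be sketched as follows. The sketch builds VGG16 without downloaded weights to stay offline (the paper initializes from ImageNet weights), assumes ReLU on the 256-unit dense layer, and freezes up through block5_conv1 so that the trainable/non-trainable totals reproduce the counts in Table 1.

```python
# Sketch of the modified VGG16 (Keras assumed). weights=None keeps this
# offline; the paper initializes from ImageNet weights. The ReLU on the
# 256-unit dense layer is an assumption.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_emotion_model(num_emotions=6):
    base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
    # Freeze everything up through block5_conv1; the remaining weights stay
    # trainable, which reproduces the parameter counts of Table 1.
    trainable = False
    for layer in base.layers:
        if layer.name == "block5_conv2":
            trainable = True
        layer.trainable = trainable
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(num_emotions, activation="softmax")(x)
    return Model(base.input, out)

model = build_emotion_model()
print(model.count_params())  # 14,847,558 parameters, as in Table 1
```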
Training: The model is trained using the pre-processed images from the RAF-ML
dataset. Transfer learning allows the model to be initialized with pre-trained weights,
reducing training time and starting with efficient weights. The batch learning tech-
nique is used, where the model learns after processing a batch of samples. Back-
propagation and a loss function are employed to update the model’s weights during
training.
Testing: To evaluate the model’s effectiveness, an independent test dataset containing
real-time images is used. The model’s performance in identifying different emotions
from these images is analyzed to gage its accuracy and overall effectiveness. Based
on the testing results, small adjustments can be made through fine-tuning to further
enhance the model’s performance. Fine-tuning may involve hyperparameter tuning
to optimize the model’s parameters for improved performance on new and unseen
data.

Table 1 Layer-wise details of the proposed deep architecture designed for emotion recognition

Layer (type)                                       Output shape             Param. #
input_2 (InputLayer)                               [(None, 224, 224, 3)]    0
block1_conv1 (Conv2D)                              (None, 224, 224, 64)     1792
block1_conv2 (Conv2D)                              (None, 224, 224, 64)     36,928
block1_pool (MaxPooling2D)                         (None, 112, 112, 64)     0
block2_conv1 (Conv2D)                              (None, 112, 112, 128)    73,856
block2_conv2 (Conv2D)                              (None, 112, 112, 128)    147,584
block2_pool (MaxPooling2D)                         (None, 56, 56, 128)      0
block3_conv1 (Conv2D)                              (None, 56, 56, 256)      295,168
block3_conv2 (Conv2D)                              (None, 56, 56, 256)      590,080
block3_conv3 (Conv2D)                              (None, 56, 56, 256)      590,080
block3_pool (MaxPooling2D)                         (None, 28, 28, 256)      0
block4_conv1 (Conv2D)                              (None, 28, 28, 512)      1,180,160
block4_conv2 (Conv2D)                              (None, 28, 28, 512)      2,359,808
block4_conv3 (Conv2D)                              (None, 28, 28, 512)      2,359,808
block4_pool (MaxPooling2D)                         (None, 14, 14, 512)      0
block5_conv1 (Conv2D)                              (None, 14, 14, 512)      2,359,808
block5_conv2 (Conv2D)                              (None, 14, 14, 512)      2,359,808
block5_conv3 (Conv2D)                              (None, 14, 14, 512)      2,359,808
block5_pool (MaxPooling2D)                         (None, 7, 7, 512)        0
global_average_pooling2d (GlobalAveragePooling2D)  (None, 512)              0
dense (Dense)                                      (None, 256)              131,328
dense_1 (Dense)                                    (None, 6)                1542
Total params.: 14,847,558
Trainable params.: 4,852,486
Non-trainable params.: 9,995,072

3.4 Music Recommendation

In the music recommendation step, the model’s ability to predict multiple emotions
simultaneously based on input images allows for the creation of playlists. The playlist
classifier already has a collection of songs associated with each emotion. Once the
emotions are determined, the songs corresponding to those emotions are shuffled
and compiled into a playlist. This playlist is then recommended to the user with
the intention of influencing and altering their current mood. By selecting songs that
align with the detected emotions, the playlist aims to create a musical experience
that resonates with the user’s emotional state and potentially uplifts or modifies their
mood (Fig. 1).
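A minimal sketch of this playlist step is given below; the emotion-to-song mapping is a hypothetical stand-in for the playlist classifier's song collection.

```python
import random

# Hypothetical emotion-to-song mapping; in the real system this comes
# from the playlist classifier described in Sect. 3.1.
SONGS = {
    "happy": ["song_a", "song_b"],
    "sad": ["song_c", "song_d"],
    "anger": ["song_e"],
}

def build_playlist(detected_emotions, library=SONGS, seed=None):
    """Collect the songs mapped to each detected emotion and shuffle
    them into a single recommended playlist."""
    pool = [s for e in detected_emotions for s in library.get(e, [])]
    random.Random(seed).shuffle(pool)
    return pool

print(sorted(build_playlist(["happy", "sad"])))  # ['song_a', 'song_b', 'song_c', 'song_d']
```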

Fig. 1 Workflow of the proposed transfer learning-based emotion-aware music recommendations

4 Modified VGG16

The system utilizes transfer learning and specifically leverages the pre-trained
weights of the VGG16 model trained on the ImageNet dataset [20]. Transfer learning
allows the model to benefit from the knowledge and features learned by the VGG16
model on a large-scale image classification task [21]. By using pre-trained weights,
the model can initialize with effective and efficient weights, reducing training time
and improving performance.
The VGG16 architecture consists of 16 layers, including convolutional, pooling,
dense, and output layers. It is known for its deep structure and the use of small 3
× 3 convolutional filters and max-pooling layers. This design enables the network
to capture intricate spatial patterns at multiple scales while reducing the number of
parameters (Fig. 2).
In the modified VGG16 proposed for recognizing facial expressions, the last layer
is removed, and additional dense layers are added to fine-tune the network for the
emotion detection task. The final dense layer includes a softmax activation function,
which is suitable for multi-class classification problems like emotion recognition.
The number of neurons in the output layer corresponds to the number of emotion
classes.

Fig. 2 Modified VGG16 proposed to recognize facial expressions

The network architecture starts with an input layer that receives the input data,
typically representing an image. The dimensions of the input layer are determined by
the height, width, and channels of the input image. The convolutional layers perform
convolutions on the input data using learnable filters to extract spatial features. Each
convolutional layer produces a set of feature maps representing the response of
different filters to different parts of the input.
Pooling layers follow the convolutional layers to down-sample the feature maps
while retaining important information. Max-pooling is commonly used, selecting the
maximum value within a pooling window to capture the most prominent features in
each local neighborhood.
The dense layers, also known as fully connected layers, connect every neuron to
every neuron in the previous layer. Each connection has a learned weight, and the
output of a neuron is determined by applying an activation function to the weighted
sum of its inputs.
The output layer is the final layer of the neural network and produces the clas-
sification results. In this system, since it is a multi-class classification problem, the
output layer applies the softmax activation function, which calculates the probability
distribution over the emotion classes.
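Numerically, the softmax activation behaves as in the short example below; the logits are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Turn the output layer's raw scores into a probability
    distribution over the emotion classes."""
    z = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return z / z.sum()

probs = softmax(np.array([2.0, 1.0, 0.5, 3.0, 0.1, 0.2]))
print(int(probs.argmax()), round(float(probs.sum()), 6))  # 3 1.0
```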
Overall, the modified VGG16 model with its layers and fine-tuning allows for
effective recognition of facial expressions and emotion detection. The utilization of
transfer learning and the architectural choices of VGG16 contribute to the system’s
ability to accurately classify emotions based on input images.

Table 2 Emotion-wise details of the benchmark dataset used in the studies

Emotion     Train_count     Test_count
Surprise    862             220
Fear        314             64
Disgust     906             219
Happy       490             111
Sad         620             161
Anger       734             207

5 Experimental Studies

5.1 Dataset

The model in this system is trained on the RAF-ML (Real-world Affective Faces
Database) dataset, which consists of 29,672 facial images. The dataset includes 40
compound taggers and 12 emotion classes. Each image is labeled with multiple affec-
tive attributes, representing various emotional expressions or states displayed by the
people in the photographs. The dataset aims to capture a wide spectrum of emotions.
For this system, a subset of the RAF-ML dataset is used, consisting of 4908 images.
The selected subset only includes images associated with the emotions surprise,
fear, disgust, happiness, sadness, and anger. Each image is classified into the corre-
sponding emotion class based on the presence of certain emotional characteristics.
The distribution of images among the emotions is shown in Table 2.
From the available dataset, as the counts for each emotion class are relatively
balanced, there is no need to perform any under-sampling or over-sampling tech-
niques. The dataset is divided into 80% for training data and 20% for testing
data.
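The 80/20 split can be sketched as a simple shuffled index split; the random seed here is arbitrary.

```python
import numpy as np

def split_dataset(n_samples, train_frac=0.8, seed=0):
    """Shuffle sample indices and split them into train and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = int(train_frac * n_samples)
    return idx[:cut], idx[cut:]

train_idx, test_idx = split_dataset(4908)  # subset size used in this study
print(len(train_idx), len(test_idx))  # 3926 982
```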
The RAF dataset consists of facial images with annotations for various emotional
expressions. The emotions covered in the dataset include anger, surprise, fear, happi-
ness, sadness, and disgust. These emotions are labeled using binary labels indicating
whether a specific emotion is present or not in each image.
To evaluate the performance of a facial emotion recognition model on the RAF
dataset, you can use metrics such as accuracy, precision, recall, and F1 score.
These metrics provide insights into the model’s ability to correctly classify different
emotions.
For example, you can calculate the accuracy of the model by comparing the
predicted emotions with the ground truth labels for the test set. Precision measures
the proportion of correctly predicted positive emotions (true positives) out of all
predicted positive emotions. Recall, on the other hand, measures the proportion of
correctly predicted positive emotions out of all actual positive emotions. The F1
score is the harmonic mean of precision and recall, providing a balanced measure of
the model’s performance.

Fig. 3 Visualization of training and validation loss over the epochs

Table 3 Results with respect to various hyperparameters

                                                 Accuracy (%)
Loss type     Categorical crossentropy           81
              Sparse categorical crossentropy
Optimizer     Adam                               81
              Stochastic gradient descent

The specific results obtained will depend on the model architecture, training
approach, and hyperparameter settings. It is important to fine-tune the model and
optimize the hyperparameters to achieve the best performance on the RAF dataset
(Fig. 3; Table 3).

6 Conclusion

The development of a music recommendation system based on facial emotion detec-


tion using transfer learning and pre-trained VGG16 has proven to be a challenging
yet effective approach. By leveraging the power of transfer learning and the rich
features learned by VGG16 on a large-scale image dataset like ImageNet, we were
able to train a model that can accurately recognize and classify human emotions from
facial expressions. The system utilizes the pre-trained VGG16 model, fine-tuning it
by removing the top layer and adding additional layers specific to the emotion clas-
sification task. By training the model on a subset of the RAF-ML dataset, which
includes images associated with emotions such as anger, surprise, fear, happiness,
sadness, and disgust, we were able to build a model capable of detecting and clas-
sifying these emotions in real-time. The effectiveness of the model was evaluated
using an independent test dataset, and the results showed that the system can success-
fully identify different emotions from real-time images. This allows the system to
generate music playlists tailored to the user’s current emotions, enhancing their music
listening experience.
Overall, the developed algorithm provides a personalized and improved user expe-
rience by accurately recognizing and classifying emotions from facial expressions
and generating music playlists that align with the identified emotions. The combi-
nation of facial emotion detection and music recommendation creates a unique and
individualized user experience in the field of mood-based music recommendation
systems.

References

1. Sharma N, Kumar R (2019) Emotion based music recommendation system using machine
learning techniques
2. Bodapati JD et al (2022) A deep learning framework with cross pooled soft attention for facial
expression recognition. J Inst Eng (India) Ser B 103(5):1395–1405
3. Choudhary S, Gupta S (2018) A music recommendation system based on emotion recognition
using artificial neural network
4. Jyostna Devi B, Veeranjaneyulu N (2019) Facial emotion recognition using deep CNN based
features. Int J Innov Technol Eng (IJITEE) 8(7)
5. Janghel NK, Gupta S (2020) Music recommendation system based on facial emotion
recognition using CNN
6. Yang X et al (2019) Affective music recommendation with CNN-based feature learning and
fusion
7. Wei et al (2021) Music recommendation based on face emotion recognition
8. Dhavalikar AS, Kulkarni RK (2014) Face detection and facial expression recognition system.
Institute of Electrical and Electronics Engineers (IEEE)
9. Jaichandran R, Ranjitha J (2021) Facial emotion based music recommendation system using
computer vision and machine learning techniques
10. Ayush G et al (n.d.) Music recommendation by facial analysis
11. Ahmed Hamdy A (n.d.) Emotion-based music player emotion detection from live camera
12. Karpathy A et al (2014) Large-scale video classification with convolutional neural networks
13. Bodapati JD et al (2022) FERNet: a deep CNN architecture for facial expression recognition
in the wild. J Inst Eng (India) Ser B 103(2):439–448
14. Yang YH et al (2008) A regression approach to music emotion recognition. IEEE Trans Audio
Speech Lang Process
15. Synak P, Lewis R, Raś ZW (2005) Extracting emotions from music data. In: International
symposium on methodologies for intelligent systems
16. Song Y, Dixon S, Pearce M (2012) Evaluation of musical features for emotion classification
17. Lee K, Cho M (2011) Mood classification from musical audio using user group-dependent
models
18. Gil et al (2015) Emotion recognition in the wild via convolutional neural networks and mapped
binary patterns
19. Zhang D et al (2020) Multi-modal multi-label emotion detection with modality and label
dependence. In: Proceedings of the 2020 conference on empirical methods in natural language
processing (EMNLP)

20. Bodapati JD, Balaji BB (2023) TumorAwareNet: Deep representation learning with attention
based sparse convolutional denoising autoencoder for brain tumor recognition. Multimedia
Tools and Appl:1–19.
21. Bodapati JD, Balaji BB (2023) Self-adaptive stacking ensemble approach with attention based
deep neural network models for diabetic retinopathy severity prediction. Multimedia Tools and
Appl:1–20.
A Novel Image Captioning Approach
Using CNN and MLP

Swati Sharma , Vivek Tomar , Neha Yadav , and Mukul Aggarwal

Abstract Computer vision and natural language processing researchers have ded-
icated significant time and energy to the problem of automatically creating image
descriptions. In this work, we propose an artificially intelligent picture captioner
built on a hybrid architecture of convolutional neural networks (CNNs) and
multilayer perceptrons (MLPs). This system takes photographs as input and generates
captions for those images using convolutional neural network (CNN), multilayer
perceptron (MLP), and recurrent neural network (RNN) algorithms. To generate a
natural language caption, the CNN first extracts features from the input image, and
then the MLP analyses these features. The proposed model is trained on a dataset
of photos with captions by optimizing the parameters of a convolutional neural net-
work (CNN) and a multilayer perceptron (MLP) via a hybrid of supervised learning
and reinforcement learning. Our results demonstrate that the suggested model may
produce captions that are on par with the best methods currently available in terms of
accuracy and variety. The artificially intelligent picture captioner has potential uses
in many areas, such as social networking, e-commerce, and image retrieval systems.

Keywords Image pre-processing · CNN · MLP · RNN · Deep learning · AI ·


Word embedding

1 Introduction

The study of natural language processing, picture recognition, and machine learning
all contribute to the field of image captioning. The Internet, newspapers, academic
papers, government reports, and even advertising all contain visuals that we are
exposed to on a daily basis.

S. Sharma (B) · N. Yadav · M. Aggarwal
KIET Group of Institutions, Ghaziabad, India
e-mail: swatish.3006@gmail.com
V. Tomar
Graphic Era (Deemed-to-be-University), Dehradun, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_2

Images in these sources are left up to the discretion of the reader. Even though
most pictures don't have captions, we can nevertheless
make sense of them. However, in order for humans to use automatic image captions,
robots will need to understand a captioning system. The suggested model’s focus is
on providing a descriptive account of an image. Text in an image can be matched with
the correct text with the use of this technology. There are a number of reasons why
image captioning is crucial. Data collection about an object and its connections to
others is required for these jobs. By including a voice recognition feature, it can also
aid the visually impaired in comprehending the content of images. Google Image
Search could benefit from automatic captioning if it were implemented. Having a
searchable caption for every image would be quite helpful. Captioning an image
involves analysing its constituent pieces and then describing them in a single sen-
tence using only ordinary English. Machine learning algorithms typically fall short
because of the challenges presented by complicated data [10]. The representation of
the relation between text and image is one source of difficulty when attempting to
represent and measure text and image similarity. Two broad classes of approaches can
be used to address this problem. To achieve a one-to-one match, we need to merge the
global feature representations of the two sources into a single, shared space [17]. The
global similarity of an image's text is all that is taken into account by the many-
to-many matching approach; the image's actual content is disregarded. In most cases,
this approach fails to reliably retrieve relevant information across many media. The
vast majority of the duplicates serve no purpose and depend on the matching
processes; they also add noise that increases the computational complexity
of the model.

2 Literature Review

The captioning of images was investigated by Alam et al. [1] using deep learning.
Investigating current deep learning methods for picture captioning is the focus of
this study. Both the most important step and the CNN-based approach for generat-
ing photo captions were addressed. Captioning models for videos and images were
created by Amirian et al. [2]. They use deep learning algorithms to automatically
generate captions or descriptions for still images and video clips. To aid the visually
impaired, they are expanding this area of expertise to include deep learning-based
automatic caption generation for both still and moving images. Image captions
were automatically created by Bang and Kim [4] using contextual information. The
method produces data that is temporally, spatially, and visually informative. The
textual description method creates sentences from photo captions. Data organization
methods can be applied to any type of information, including but not limited to images,
texts, timestamps, and geographic locations. Data collected by UAVs can be better
managed if the surrounding environment is known. Deng et al. [5] built a model for
creating image descriptions using a DenseNet network and adaptive attention. In
this research, the use of a visual sentinel as part of an adaptive attention paradigm is
proposed. The model uses DenseNet to capture the global features of an image. To improve
A Novel Image Captioning Approach Using CNN and MLP 15

the output quality of photo captions, an LSTM Network is used as a language gener-
ation model. Image captioning is generated using the Flickr30k and COCO datasets.
Using a model for text-to-image synthesis, Hossain et al. [6] enhanced image captioning.
Their solution employs deep learning models trained and assessed on
human-annotated photographs. They present a technique for image
captioning that employs both real and simulated data in the model’s training and val-
idation phases. The synthetic graphics were produced using a Generative Adversarial
Network-based text-to-image generator. Iwamura et al. [7] used a motion convolutional
neural network (motion-CNN) with object detection to generate visual descriptions.
They released a prototype built with MSCOCO, MSR-VTT2016-Image, and public
domain pictures. The datasets were used to generate captions for images and assess
the quality of those captions. Kalra and Leekha [8] assessed CNN’s image captioning
capabilities.
Models were constructed using a CNN for picture embedding and a recurrent
neural network (RNN) for modelling and predicting language. They proposed a
model for generating textual descriptions. Mathur studied several different deep
learning models for picture captioning [8]. Natural language processing is used to
grasp the syntax and semantics of written language, while computer vision is used to
analyse visual data. Caption creation is automated with the help of machine learning.
Researchers employ convolutional neural networks for visual input analysis, and
recurrent neural networks for phrase generation. Phukan and Panda [11] created
a powerful method for deep neural network-based picture captioning. Automatic
captioning is crucial for image data on the internet since entities need to be accurately
identified and handled. They devise an innovative and efficient approach to automatic
picture captioning for single images, and they describe ways to improve the method’s
efficacy and utility.
Sharma and Jalal [12] employed a convolutional neural network and a long short-
term memory to include external knowledge for image captioning. They proposed
an approach to improved image description that integrates both local and external
knowledge from sources like Concept Net. They utilized the Flickr8k and Flickr30k
datasets as examples of how to generate image descriptions. Shi et al. [13] used
the MSCOCO dataset to refine their picture captioning model. It was shown on
the MSCOCO dataset that the suggested framework achieves better results than the
baselines, leading to caption generation. Wang et al.’s [14] method of remote sensing
image captioning utilized a word-sentence framework. The core components of the
proposed structure are a word extractor and a sentence generator: the former extracts
pertinent terms from the remote sensing image, and the latter organizes them into a
sentence. The picture
captioning model proposed by Zhang et al. [15] uses gLSTM with visual enhance-
ments. To improve the precision of visual description, they created a paradigm for
guiding long short-term memory (gLSTM) that collects visual information from RoI
and uses it as guiding information in gLSTM. For high-resolution remote sensing
imagery, Zhao et al. [16] used structured attention to annotate images. When compared to other
current deep learning-based captioning systems, attention-based captioning stands
out for its ability to generate text while simultaneously identifying the locations of
relevant objects in images.
16 S. Sharma et al.

The image captions were produced using a neural architecture search by Zhu et al.
[17]. They provided a model for Neural Architecture Search that has proven effective
in a number of image recognition tasks. A recurrent neural network is also necessary
for the process of image captioning. Using a reinforcement learning technique that
relies on common parameters, they efficiently construct an Auto RNN.

3 Basic Terminologies

Here, we’ll go over some of the most fundamental concepts in computer vision,
natural language processing, and picture captioning:

– CNN: A convolutional neural network is an ANN that processes pixel data. Common
  applications include image recognition and image-based text generation, both of
  which rely on deep learning. A neural network is a computing system that performs
  visual processing in a way inspired by the human brain. CNNs are designed to
  process images without requiring the original image to be down-sampled. A CNN
  can be viewed as a multilayer perceptron structured to minimize computational
  overhead [9]. Its layered architecture comprises an input layer, several convolutional
  (hidden) layers, and an output layer [3].
– MLP: A multilayer perceptron is a multi-layered feedforward artificial neural
  network, often called a “vanilla” neural network. Every MLP has at least three
  layers: input, hidden, and output. Each successive layer applies a non-linear
  activation function. A multilayer perceptron should not be confused with a
  perceptron network, nor thought of as a single perceptron with numerous layers.
  Rather, it is built from artificial neurons that can be configured with any of
  several activation functions; networks of such artificial neuron nodes and layers
  were later dubbed multilayer perceptrons.
– RNN: A recurrent neural network generates its next output based on the outcome
  of the previous one. When a plain network must predict the next word in a sentence,
  it has no memory of the words before it; the RNN was designed to address this
  problem by storing the input history in an internal memory. All inputs are processed
  with the same set of parameters, which keeps the model's complexity manageable.
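The MLP described above can be sketched as a plain-Python forward pass. This is a minimal illustration only: the layer sizes, weights, and the ReLU activation are assumptions chosen for the example, not taken from the paper.

```python
def relu(v):
    # Non-linear activation applied after each hidden layer
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    # One fully connected layer: output_j = sum_i v_i * w_ji + b_j
    return [sum(x * w for x, w in zip(v, row)) + b
            for row, b in zip(weights, bias)]

def mlp_forward(v, layers):
    """Feedforward pass: input -> hidden layer(s) with ReLU -> linear output."""
    for weights, bias in layers[:-1]:
        v = relu(dense(v, weights, bias))
    weights, bias = layers[-1]
    return dense(v, weights, bias)

# Tiny network: 2 inputs -> 2 hidden units -> 1 output
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])  # two hidden neurons
output = ([[1.0, 1.0]], [0.0])                    # one output neuron
y = mlp_forward([1.0, 2.0], [hidden, output])
print(y)  # [1.5]
```

The non-linear activations between layers are what let the stack of layers represent functions a single perceptron cannot.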

4 Proposed Approach

Deep learning is an approach to machine learning that is an attempt to model brain
activity. The method is successful because it employs multi-layered networks capable
of learning from extensive datasets. Artificial intelligence technology like deep
learning allows machines to do more and more work without any help from humans.
Its usefulness extends across many fields, including medicine, AI personal assistants,

and finance. Multiple nodes are interconnected to form a learning system known as
a deep neural network. These parts cooperate to carry out advanced tasks like object
recognition and categorization.
A deep neural network is able to make predictions and categorize data because of
the way its layers interact together. In this paper, we employ a deep learning approach
because of its remarkable capacity to combine existing AI technologies in order to
automatically identify photos, characterize their components, and generate short
phrases that are grammatically acceptable descriptions of those components. Using
convolutional neural network (CNN), multilayer perceptron (MLP), and recurrent
neural network (RNN) techniques, this system will accept input photographs and
create captions based on those photographs.

5 Methodology

A novel image captioning approach combining convolutional neural networks (CNN)
and multilayer perceptron (MLP) can be an effective way to generate descriptive
captions for images. An overview of the approach used:

– Start by collecting a dataset of paired images and their corresponding captions,
  where each image is associated with one or more captions.
– Use a pre-trained CNN, such as VGGNet or ResNet, to extract meaningful visual
  features from the input images. Remove the classification layers from the CNN
  and keep the convolutional layers to obtain a fixed-length feature vector for each
  image.
– Preprocess the captions by tokenizing them into individual words and creating a
  vocabulary. Assign a unique integer ID to each word in the vocabulary, and add
  special tokens like start-of-sentence (SOS) and end-of-sentence (EOS) to indicate
  the beginning and end of a caption.
– Initialize an embedding layer to map the integer IDs of caption words to continuous
  vector representations. Similarly, create an embedding layer to map the extracted
  image features to a fixed-size vector representation.
– Combine the CNN and MLP to create a caption generation model that takes the
  image features as input and generates captions word by word. Pass the image
  features through the MLP to transform them into a compatible size and shape for
  further processing.
– Train the RNN to predict the next word in the caption given the previous words
  and the attended image features. Use teacher forcing during training, where the
  ground-truth words are fed as inputs; during inference, feed the generated words
  back into the RNN.
– Train the caption generation model using the paired image-caption dataset. Given
  a new image, pass it through the pre-trained CNN to extract visual features, then
  feed these features into the trained caption generation model to generate a
  descriptive caption for the image.
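The inference step, where generated words are fed back into the model until an end token appears, can be sketched as a greedy decoding loop. This is a schematic only: `predict_next` stands in for the trained CNN+MLP model, and the token names `startseq`/`endseq` are illustrative choices.

```python
def generate_caption(image_features, predict_next, max_len=20,
                     start_token="startseq", end_token="endseq"):
    """Greedy decoding: repeatedly predict the next word until the end token."""
    words = [start_token]
    for _ in range(max_len):
        nxt = predict_next(image_features, words)  # model picks the next word
        if nxt == end_token:
            break                                  # model signals it is done
        words.append(nxt)
    return " ".join(words[1:])                     # drop the start token

# Deterministic stand-in for a trained model, for illustration only
script = iter(["a", "dog", "is", "running", "endseq"])
caption = generate_caption(None, lambda feats, ws: next(script))
print(caption)  # a dog is running
```

The `max_len` cap guards against the model never emitting the end token.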

5.1 Data Collection

There is a plethora of publicly available datasets that can be used for our task,
such as Flickr8k and Flickr30k. The Flickr8k dataset contains 8000 pictures, and
five separate captions are provided for each individual image.

– Training Set: 6000 images
– Dev Set: 1000 images
– Test Set: 1000 images.

5.2 Understanding the Data

Along with the images, there is also a text file, “Flickr8k.token.txt”, that contains
the names of the images and their captions, as shown in Fig. 1.
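A minimal sketch of reading such a caption file, assuming each line has the common Flickr8k layout `<image>#<caption-index><TAB><caption>` shown in Fig. 1 (the helper name and sample lines below are illustrative):

```python
def load_captions(lines):
    """Map each image name to its list of captions."""
    captions = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_tag, caption = line.split("\t", 1)
        image_name = image_tag.split("#")[0]   # drop the '#0'..'#4' suffix
        captions.setdefault(image_name, []).append(caption)
    return captions

sample = [
    "1000268201.jpg#0\tA child in a pink dress is climbing stairs .",
    "1000268201.jpg#1\tA girl going into a wooden building .",
]
caps = load_captions(sample)
print(len(caps["1000268201.jpg"]))  # 2
```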

5.3 Data Cleaning

After mapping the data, the next step is data cleaning, in which the following steps
are performed to reduce processing time:

– Remove single-length characters.
– Convert all strings to lower case letters.

Example: “I am STANDING” changes to “am standing”.
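The cleaning steps above can be sketched as follows; stripping punctuation is an additional common step assumed here, not listed in the text:

```python
import string

def clean_caption(caption):
    """Lower-case a caption, strip punctuation, and drop single-length tokens."""
    table = str.maketrans("", "", string.punctuation)
    words = caption.translate(table).split()
    words = [w.lower() for w in words if len(w) > 1]  # removes 'I', 'a', stray chars
    return " ".join(words)

print(clean_caption("I am STANDING"))  # am standing
```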

Fig. 1 Format of data in the dataset



5.4 Creating a Vocabulary

The model's vocabulary is the collection of all distinct words it can predict. After
polishing our descriptions, we use them to build this vocabulary. The model's output
is a number, which we map back to a word prediction. Since a model can produce
several possible captions, we select the terms using probability.
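Building the vocabulary and assigning integer IDs, as described above, might look like this (the `min_count` threshold is an optional assumption for filtering rare words):

```python
def build_vocab(cleaned_captions, min_count=1):
    """Assign a unique integer ID to every distinct word in the captions."""
    counts = {}
    for caption in cleaned_captions:
        for word in caption.split():
            counts[word] = counts.get(word, 0) + 1
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    # IDs start at 1 so that 0 stays free for sequence padding
    return {word: i for i, word in enumerate(kept, start=1)}

vocab = build_vocab(["dog is running", "dog is barking"])
print(sorted(vocab))  # ['barking', 'dog', 'is', 'running']
```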

5.5 Prepare Train/Test Data

We have separated the photos in our dataset into those for testing and those for
training. Here, we build a mapping that connects each identifier from the training
set to its corresponding captions. We then use an RNN, specifically a layer based on
a long short-term memory network, to generate text. Take the line “Dog is running”
as an illustration: the word “dog” is generated first and passed to the next step,
where the word “is” is generated, and finally the word “running”. However, this loop
could run indefinitely, so our model ought to recognize when to cease. A sentence-ending
token is created for this purpose. To complement it, we additionally generate a start
token to be used as the first input to the RNN.
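The start/end tokens and the word-by-word training targets described above can be illustrated as follows (the token names and helper function are hypothetical):

```python
def make_training_pairs(caption, vocab):
    """Wrap a caption in start/end tokens and emit (prefix, next-word) pairs;
    the final pair teaches the model to emit the end token and stop."""
    tokens = ["startseq"] + caption.split() + ["endseq"]
    ids = [vocab[t] for t in tokens]
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]

vocab = {"startseq": 1, "dog": 2, "is": 3, "running": 4, "endseq": 5}
pairs = make_training_pairs("dog is running", vocab)
print(len(pairs))  # 4
print(pairs[-1])   # ([1, 2, 3, 4], 5)  -> the end token is the last target
```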

5.6 Image Preprocessing

Image preprocessing is used to improve an image or extract data from it. Analog and
digital image processing are the two most common methods, and the end result is
always an image. The results of digital image processing can be linked to a particular
photo. Medical visualization, autonomous vehicles, video games, and even law
enforcement all make use of image processing in some capacity. Here, we make use
of a model (ResNet50) already trained on ImageNet in order to extract features from
images. ResNet additionally allows for skip connections; an attribute tells us which
layers a certain layer connects to. Skip connections are helpful because they avoid
the vanishing-gradient problem.
Next, we store the results of the time-consuming operation of encoding all the
photos in an output file. Using the ResNet50 model (see Fig. 2), the output file maps
each image id to its feature vector. The vectors, once retrieved, are ready to be saved.
Pickle is used for archival purposes: it provides the dump and load operations, which
we use to save information on disc and retrieve it later (Fig. 3).
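Saving and restoring the encoded features with pickle's dump and load operations might look like this (the feature values shown are placeholders; real vectors come from ResNet50):

```python
import os
import pickle
import tempfile

# Hypothetical image-id -> feature-vector mapping
features = {"img_001.jpg": [0.12, 0.98, 0.33],
            "img_002.jpg": [0.45, 0.07, 0.81]}

path = os.path.join(tempfile.gettempdir(), "features.pkl")
with open(path, "wb") as f:
    pickle.dump(features, f)      # encode once, save to disc

with open(path, "rb") as f:
    restored = pickle.load(f)     # reload instantly in later runs

print(restored == features)  # True
```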

Fig. 2 ResNet50 model architecture

Fig. 3 Flowchart of the methodology used



6 Result and Conclusion

Our approach leverages the visual understanding capabilities of CNNs to extract
informative image features, which are then fed into an MLP-based recurrent neural
informative image features, which are then fed into an MLP-based recurrent neural
network (RNN) for caption generation. The integration of an attention mechanism
further enhances the alignment between image regions and generated words, leading
to improved caption quality. Images similar to those used in our dataset have been
captioned appropriately using the proposed deep learning model. To accomplish both
goals, feature extraction and caption production, we have constructed a convolutional
neural network model. The Flickr8k dataset was used to train the model, with ResNet
as the framework for the convolutional layer. Captions are generated using language
learned during training: the ResNet architecture extracts visual attributes, which are
then sent as input to long short-term memory units. The proposed model achieved a
75% success rate. The figures below display an example image taken from the dataset
and the first descriptions tested for the input image. Five distinct sample captions are
provided, all of which are connected to the same example image of a female entering
a cabin (Figs. 4 and 5).
When we run the model with the GPU's help, we get the best results. For analysing
massive amounts of unstructured and unlabelled data to uncover patterns in those
photos, for guiding self-driving cars, and for constructing the software to guide the
Fig. 4 Sample image

Fig. 5 Sample captions



blind, this image captioning deep learning model is extremely beneficial. The
implications of our work are significant for various applications such as image
annotation, visual assistance systems, and image search engines. The ability to
automatically generate descriptive captions for images can greatly enhance the
accessibility and understanding of visual content for users.

7 Future Scope

We have detailed the process of creating image captions in this paper. Deep learning
has come a long way, but it is still not viable to generate fully correct captions, for
a number of reasons: machines cannot yet think or make decisions as accurately as
humans, and there is no good programming logic or model to do so. With better
technology and more sophisticated deep learning models, we anticipate improved
caption generation in the near future. One potential use of this technique is translating
visual captions into spoken language, which would be a great aid for the blind.

References

1. Alam MS, Narula V, Haldia R, Ganpatrao GN (2021) An empirical study of image caption-
ing using deep learning. In: 2021 5th international conference on trends in electronics and
informatics (ICOEI). IEEE, pp 1039–1044
2. Amirian S, Rasheed K, Taha TR, Arabnia HR (2020) Automatic image and video caption gen-
eration with deep learning: a concise review and algorithmic overlap. IEEE Access 8:218386–
218400
3. Azhar I, Afyouni I, Elnagar A (2021) Facilitated deep learning models for image captioning.
In: 2021 55th annual conference on information sciences and systems (CISS). IEEE, pp 1–6
4. Bang S, Kim H (2020) Context-based information generation for managing UAV-acquired data
using image captioning. Autom Constr 112:103116
5. Deng Z, Jiang Z, Lan R, Huang W, Luo X (2020) Image captioning using DenseNet network
and adaptive attention. Signal Process Image Commun 85:115836
6. Hossain MZ, Sohel F, Shiratuddin MF, Laga H, Bennamoun M (2021) Text to image synthesis
for improved image captioning. IEEE Access 9:64918–64928
7. Iwamura K, Louhi Kasahara JY, Moro A, Yamashita A, Asama H (2021) Image captioning
using motion-CNN with object detection. Sensors 21(4):1270
8. Kalra S, Leekha A (2020) Survey of convolutional neural networks for image captioning. J Inf
Optim Sci 41(1):239–260
9. Mann S, Bindal AK, Balyan A, Shukla V, Gupta Z, Tomar V, Miah S (2022) Multiresolution-
based singular value decomposition approach for breast cancer image classification. BioMed
Res Int 2022. https://doi.org/10.1155/2022/6392206
10. Maroju A, Doma SS, Chandarlapati L (2021) Image caption generating deep learning model.
Int J Eng Res Technol (IJERT) 10(09)
11. Phukan BB, Panda AR (2021) An efficient technique for image captioning using deep neural
network. In: Cognitive informatics and soft computing: proceeding of CISC 2020. Springer,
pp 481–491
12. Sharma H, Jalal AS (2020) Incorporating external knowledge for image captioning using CNN
and LSTM. Mod Phys Lett B 34(28):2050315

13. Shi Z, Zhou X, Qiu X, Zhu X (2020) Improving image captioning with better use of captions.
arXiv preprint arXiv:2006.11807
14. Wang Q, Huang W, Zhang X, Li X (2020) Word-sentence framework for remote sensing image
captioning. IEEE Trans Geosci Remote Sens 59(12):10532–10543
15. Zhang J, Li K, Wang Z, Zhao X, Wang Z (2021) Visual enhanced gLSTM for image captioning.
Expert Syst Appl 184:115462
16. Zhao R, Shi Z, Zou Z (2021) High-resolution remote sensing image captioning based on
structured attention. IEEE Trans Geosci Remote Sens 60:1–14
17. Zhu X, Wang W, Guo L, Liu J (2020) AutoCaption: image captioning with neural architecture
search. arXiv preprint arXiv:2012.09742
Route Optimizations Using Genetic
Algorithms for Wireless Body Area
Networks

Gagan Sharma

Abstract There is a lot of research taking place in the fast-growing field of Wireless
Body Area Networks (WBAN) to improve network performance. This work focuses
on improving the WBAN systems used in hospitals for the benefit of patients as well
as the convenience of doctors. A lot of work in the same field can be identified and
used as literature. Different researchers use different techniques to minimize energy
consumption and improve the packet delivery ratio, such as Kruskal's algorithm,
Prim's algorithm, and the particle swarm algorithm. The energy of a sensor node is
affected by the transmission distance, the routing protocol, and the amount of data to
be transmitted. The genetic algorithm (GA), a natural evolutionary algorithm, finds
the optimal path to the destination, which benefits patients by safeguarding their
valuable data (the patient's report). Using GA along with fuzzy logic and cluster head
selection provides better results for packet delivery ratio and normalized residual
energy, which in turn improves network lifetime.

Keywords WBAN · Genetic algorithms · Optimal paths · Energy

1 Introduction

WBAN is a concept that has recently gained attention and is simple for people to
incorporate into their daily lives. To measure the various physiological signals produced
by the human body, nano- and micro-devices are created and implanted on the human
body. A variety of heterogeneous sensors, including EEG, ECG, and blood pressure
monitors, are employed to measure various bodily parameters [1]. Each sensor gathers
information and sends it to the sink node, which serves as a database for the collection
and storage of all information.
Body Sensor Network (BSN) is another name for WBAN. WBAN is used in various

G. Sharma (B)
CSE Department, Chandigarh University, Mohali, India
e-mail: 21mai1043@cuchd.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 25
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_3

fields, including the military and law enforcement, as a result of extensive study in
this area. There are a lot of other applications of WBAN.

1.1 Applications of WBAN

(1) Medical treatment and diagnosis.


(2) Prepare schedules for contenders.
(3) To avert cathartic smash.
(4) Safeguarding of uniformed personnel.
(5) Consumer electronics.

1.2 Positioning of WBANs

In WBANs, protocols are created for communication between the body's sensors and
a data center connected to the internet via a body node. In WBAN, there are two
different methods of communication between nodes:
1. Intra-body communication.
2. Extra-body communication.
A cutting-edge data transmission technique, “intra-body communication” makes use
of the body's electrical conductivity. The implanted sensors track essential bodily
processes and transmit information through the body to a centralized monitoring unit.
Extra-body communication ensures communication between personal gadgets and
an external network.

2 Related Work

Many researchers have worked on improving wireless networking systems to reduce
packet loss during data transmission between sensor nodes and to lower battery
consumption by nodes during transmission, but there is still room for improvement
where packet loss and battery drain occur, especially in the medical area. Balouchestani
et al. [2] present low-power wireless healthcare systems that employ the genetic
channel model and MFC's capabilities, and look into the advantages of using CS
theory with MFC. Both patients in hospitals and those at home can receive treatment
thanks to the use of CS in MFC. The best route is chosen using the best optimization
method. Khalilian and
Rezai [3] reviewed WBAN applications in real-time healthcare monitoring systems,
addressed the impacts of remote healthcare monitoring, and proposed a solution to

wear sensors around the patient's body. In Chaudhary [4], an energy-efficient FECG
signal optimization using a genetic algorithm is presented, extending the battery
life of the FECG sensor and reducing the demand on the buffer memory. The FECG
signal's transit through the network can be made more streamlined and quick in
the future by using more bio-inspired algorithms and compression techniques. Chen
et al. [5] proposed an architecture, namely “beyond-BAN communication”, which is
used to deliver body signals to remote terminals in a timely fashion. Existing
architectures are not suitable for the high mobility of patients and doctors, so by
using this proposed novel network architecture, the QoS of Long-Term Networking
and Named Data Networking can be improved, which improves the overall performance
of the architecture. Dinkar et al. [6] discussed the use of biomedical sensors
for long-distance continuous patient health monitoring. The sensing system worn by
the patients should have a small battery. Data is collected at the sink node and then
transferred to the respective doctors, who have internet connectivity on their mobile
phones. Murugeswari and Murugan [7] proposed a synchronized
protocol to address the energy efficiency and quality of service in WBAN. They also
explain the impact of latency and of the network's communication medium, and
compare and validate the proposed protocol with existing approaches. He et al. [8]
proposed protocols that ensure high reliability and provide security. In this scheme,
multiple one-way key hash chains are used to provide authentication. Hiep and
Kohno [9] described that, due to the growing elderly population, health issues are
increasing, so the healthcare market keeps growing. Multi-hop systems are used, in
which a relay node forwards data to multiple receivers. High efficiency is achieved by
optimizing the packet rate and the successful transmission probability.

3 Proposed Methodology

The presented work focuses on improving the WBAN systems used in hospitals for
the benefit of patients as well as the convenience of doctors. Many researchers have
worked on this problem; much of the research in this field aims at minimizing energy
consumption and improving the packet delivery ratio using Kruskal's algorithm,
Prim's algorithm, and the particle swarm algorithm [10–13]. The aims of the proposed
work are as follows:
1. To enhance the technology so as to reduce data transfer loss between the sensor
nodes during communication, as well as battery consumption by the nodes.
2. For the safety of the patient, battery consumption during communication should
be low: the less battery is consumed, the more the network improves.

3.1 Proposed Algorithm

In order to execute optimized routing in WBAN, a variety of network simulators
have been used to display the behavior of the network, such as computing the
interaction between nodes in WBAN using a few mathematical formulas and displaying
graphical information such as the average residual energy of nodes or their packet
delivery ratio. NS3, MATLAB, and other simulators are among those utilized in
WBAN.
For x = 1 to g
  For y = 1 to c
    Pcy = Random(1, k)
  End For
  For y = 1 to c
    ZZyT = Rand(1, k)
    If ZZy > 0 And ZZy < PS Then
      ZZy = ZZyT
    Else
      ZZy = PZy
    End If
  End For
  For y = 1 to c
    If Zfy > Pfy Then
      PZy = ZZy
    End If
  End For
End For
The proposed work’s methodology uses GA for selecting the optimum path to
allow nodes to communicate with one another. The results of the proposed work will
be assessed using the MATLAB 2018 simulator.
Working of the genetic algorithm:
// initiate time
tm := 0;
// commence with a random population
initpopulation P(tm);
// calculate the fitness of every individual
evaluate P(tm);
// loop until the end criterion (time, fitness, etc.) is met
while not done do
  // increase the time counter
  tm := tm + 1;
  // identify a sub-population for offspring production
  P’ := selectparents P(tm);
  // recombine the “genes” of the identified parents
  recombine P’(tm);
  // evolve the mated population stochastically
  evolve P’(tm);
  // evaluate the new fitness
  evaluate P’(tm);
  // identify the survivors based on fitness
  P := survive P, P’(tm);
od
end GA.
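The GA skeleton above can be made concrete for route selection. The sketch below is illustrative only: the body-network topology, the hop-count fitness, and the tail-regrowing mutation are assumptions for the example, not the paper's exact configuration.

```python
import random

# Hypothetical body-network topology: node 0 is a sensor, node 5 the sink
GRAPH = {0: [1, 2, 4], 1: [3], 2: [3, 4], 3: [5], 4: [5]}
SINK = 5

def random_route(rng):
    """Create a candidate route by a random walk from the sensor to the sink."""
    route, node = [0], 0
    while node != SINK:
        node = rng.choice(GRAPH[node])
        route.append(node)
    return route

def fitness(route):
    # Fewer hops -> higher fitness (a stand-in for RSSI / battery criteria)
    return 1.0 / (len(route) - 1)

def evolve(generations=30, pop_size=20, seed=1):
    rng = random.Random(seed)
    population = [random_route(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]      # selection: keep the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            cut = rng.randrange(len(parent) - 1)     # mutation point
            child, node = parent[:cut + 1], parent[cut]
            while node != SINK:                      # regrow the route's tail
                node = rng.choice(GRAPH[node])
                child.append(node)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(best)  # minimum-hop route found by the GA
```

After enough generations, the population converges on the minimum-hop route to the sink.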
A transmission schedule created by GA is made up of transmission rounds. The
transmission schedule comprises routing pathways that the network must follow
to maximize lifetime. Data is collected from sensor nodes to sink. The nodes are
dispersed over the body, as shown in Fig. 2, and genetic algorithms can be used to
optimize routing. The sink node, which stores all data, is located at the waist.
Figure 1 shows the implementation flowchart. After that, the possible routes
(approximately 10) are found, in which RSSI parameters are followed. In RSSI,
the links with fewer hops are selected for further processing, because the fewer the
hops, the more packets are delivered to the destination. Then, GA decides the best
route by following the shortest path. If the best route is not found, the loop is followed
till the goal is reached. If the best route is found among the possible nodes, then
parameters like RSSI, battery, fuzzy logic, and SPT are evaluated over the human
body. Nodes communicate with each other, and the shortest path is generated so that
the issues which occur in WBAN are removed (Fig. 2).
So, in the proposed work, a genetic algorithm is used in which constraint-based
cluster head selection organizes the communication channel. It is an efficient
optimization technique compared with Particle Swarm Optimization (PSO) and
Dynamic Source Routing (DSR).

4 Experimental Setup and Results

Figure 3a shows the normalized residual energy for different parameters. By adopting
a load-balancing strategy in the network of nodes, the genetic algorithm achieves the
best overall outcomes by maintaining a greater value of residual energy for the
majority of the network nodes. On the other hand, since other factors like hop count
and RSSI have no effect on balancing network load, relaying nodes would use up
their battery power more quickly than other nodes in the network, which will result
in network partition. For instance, when the sink node is at the ankle and nodes 10
and 11 must relay the majority of the network traffic, genetic algorithm and battery
metrics can balance the network load and maintain both nodes with roughly the same
residual energy level, and Fig. 3b shows clearly that the results of the proposed work
are better than previous results.

Fig. 1 Implementation
flowchart

5 Conclusion and Future Work

WBAN is a vast field for research in which heterogeneous nodes are deployed over
the human body to measure different body signals. Mamdani fuzzy inference systems,
in which rules are set using and/or fuzzy logic, also show good results, but the
proposed work uses the best optimization technique, GA, which is based on natural
evolution and in which chromosomes are selected on the basis of a fitness function.
Over 100 generations, the best possible route is selected among various candidate
routes in the shortest span of time, improving the packet delivery ratio and the
residual energy of nodes, and increasing the network lifetime. The shortest path is
extracted to send packets or data to their destination in the WBAN in the shortest
span of time. The results of the proposed work are far better than those of PSO and
DSR in removing the issues of the WBAN system. For handicapped and elderly
people, this technology is beneficial: since old-age patients cannot move much, they
cannot visit the doctor often, so with this technology, patients are installed with
sensors or nodes and then, remotely from any

Fig. 2 Position of nodes on different places

Fig. 3 a Per node packet delivery, b improved percentage

place, doctors can check their patients and prescribe them medicines. Future work for
WBAN demands security in data processing and keeping the information and data
secure; blockchain technology can be adopted in advanced stages of data security.

References

1. Latre B, Braem B, Moerman I, Blondia C, Demeester P (2011) A survey on wireless body area
networks. Wireless Netw 17:1–18
2. Balouchestani M, Raahemifar K, Krishnan S (2012) Wireless body area networks with
compressed sensing theory. In: Proceedings of 2012 ICME international conference on complex
medical engineering, Kobe, Japan, 1–4 July 2012
3. Khalilian R, Rezai A (2022) Wireless body area network (WBAN) applications necessity in
real time healthcare. In: 2022 IEEE integrated STEM education conference (ISEC), Princeton,
NJ, pp 371–374. https://doi.org/10.1109/ISEC54952.2022.10025199
4. Chaudhary K (2014) Fetal ECG signal optimization on signal obtained from FECG sensor for
remote areas with lower signal strength for its smooth propagation to medical databases. Int J
Sci Res (IJSR)
5. Chen M, Mau DO, Wang X, Wang H (2013) The virtue of sharing: efficient content delivery
in wireless body area networks for ubiquitous healthcare. In: 2013 IEEE 15th international
conference on e-health networking, applications and services (Healthcom 2013), Oct 2013.
IEEE
6. Dinkar P, Gulavani A, Ketkale S, Kadam P, Dabhade S (2013) Remote health monitoring using
wireless body area network. Int J Eng Adv Technol (IJEAT) 2(4). ISSN: 2249-8958
7. Murugeswari S, Murugan K (2019) Improving energy efficiency and quality of service in
wireless body area sensor network using versatile synchronization guard band protocol. In:
2019 IEEE international conference on clean energy and energy efficient electronics circuit for
sustainable development (INCCES), Krishnankoil, pp 1–5. https://doi.org/10.1109/INCCES
47820.2019.9167738
8. He D, Chan S, Zhang Y, Yang H (2014) Lightweight and confidential data discovery and
dissemination for wireless body area networks. IEEE J Biomed Health Inform 18(2)
9. Hiep PT, Kohno R (2013) Optimizing data rate for multiple hop wireless body area network. In:
The 2013 international conference on advanced technologies for communications (ATC’13),
Oct 2013. IEEE
10. Cavallari R, Martelli F, Rosini R, Buratti C, Verdone R (2014) A survey on wireless body area
networks: technologies and design challenges. IEEE Commun Surv Tutor 16(3)
11. Chavez-Santiago R, Nolan K, Holland O, De Nardis L, Ferro J, Barroca N, Borges L, Velez
F, Goncalves V, Balasingham I (2012) Cognitive radio for medical body area networks using
ultra wideband. IEEE Wireless Commun 19(4):74–81
12. Santhosha S (2012) Improving the quality of service in wireless body area networks using
genetic algorithm. IOSR J Eng 2(6)
13. Qadri SF, Awan SA, Amjad M, Anwar M, Shehzad S (2013) Applications, challenges, security
of wireless body area networks (WBANs) and functionality of IEEE 802.15.4/ZIGBEE. Sci
Int (Lahore) 25(4):697–702. ISSN 1013-5316
Prediction Model for the Healthcare
Industry Using Machine Learning

Birendra Kumar Saraswat, Aditya Saxena, and P. C. Vashist

Abstract The role of machine learning in health care is an emerging field of
research and industry. In machine learning, there are various forms of learning,
including supervised, unsupervised, and reinforcement learning. These strategies
are necessary to discover previously unknown relationships in data that are bene-
ficial to society. In predictive modeling, historical data are used to predict a result
variable. The uses of machine learning in medical care are turning into a benefit
for disease identification and diagnostics. The healthcare industry can benefit from
machine learning’s capacity to assist in the intelligent analysis of huge amounts of
data. Different methods of machine learning for health care, including supervised,
unsupervised, semi-supervised, and reinforcement learning, such as SVM, KNN,
K-means clustering, neural networks, and decision trees, provide varying levels of
accuracy, precision, and sensitivity. The area of machine learning (ML) is on the rise.
The purpose of machine learning is to automatically discover patterns and reason
with data. ML offers tailored therapy, dubbed precision medicine. Health care has
benefited from the application of machine learning approaches. Within a few years,
machine learning will alter the healthcare industry.

Keywords Health care · Machine learning · Advantages · Application ·
Healthcare models · Classification of models

B. K. Saraswat (B) · A. Saxena
GLA University, Mathura, U.P., India
e-mail: saraswatbirendra@gmail.com
A. Saxena
e-mail: aditya.saxena@gla.ac.in
P. C. Vashist
G L Bajaj Institute of Technology and Management, Greater Noida, U.P., India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 33
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_4
34 B. K. Saraswat et al.

1 Introduction

The urgent need for machine learning has been highlighted by the massive rise of
data in the medical industry. Machine learning is proving supportive in extracting
relevant data from massive information. The classification and grouping of data can
be assisted by machine learning algorithms, but the lack of a reliable dataset
presents the biggest challenge in the healthcare industry. Only when the training
datasets are free of corrupted material will the forecasts made using ML algorithms
be precise.
Nowadays, in the healthcare industry, machine learning techniques like deep
learning are producing accurate categorization and prediction results for images.
Recent improvements have expanded the sources of data collection.
In [1], the author proposes a neural network-based algorithm for predicting
disease risk based on structured and unstructured hospital data. Using machine
learning in the health sector enables the discovery of disease patterns from huge
datasets.
Using machine learning to detect targeted errors, authors have classified datasets
of damaging attacks to find targeted issues.
Researchers in the healthcare sector have a significant opportunity to improve the
healthcare sector through the discovery of knowledge data contained in electronic
data. Using computational approaches, machine learning manages complex datasets
to draw meaningful conclusions. There is a vast application space for ML algorithms
in the analysis of massive amounts of healthcare data. The machine learning
classifiers presented by the authors are intended to aid with healthcare-related
outcomes, serving as a decision support system for the classification of electronic
patient records. Machine learning approaches aid in the analysis of (a) structured
and unstructured data, (b) grouping of patients with similar medical signs, and
(c) disease prediction.
Machine learning tools assist in the predictive analysis of healthcare activities;
Apache Mahout, Skytree, Karmasphere, and BigML are examples. These tools carry
out big data analytics and data mining. The limitations and potential applications
of image analysis and machine learning in digital pathology are discussed in [2].
Images containing unstructured and complex data can be used for unsupervised
learning using deep learning techniques in conjunction with ANNs.
When building and training machine learning models to create predictive models,
both supervised and unsupervised learning techniques extract attributes from a lot
of information.
The authors of [3] conducted a literature review on machine learning in the
healthcare industry. The following issues of transforming health data are overcome
by deep learning technology, leading to enhanced health care: medical data may
have heterogeneous properties, unstructured content, missing values, data from
many domains, and high complexity. Building predictive models uses data mining
and statistical methods.

1.1 Motivation

The use of machine learning and artificial intelligence in health care is urgently
required. The power of languages like Python can be properly explored in ML to
ultimately benefit society. In order to execute predictive analytics, machine learning
uses algorithms for classification and clustering that learn from large datasets. The
doctors are able to acquire current information about health care and provide patients
with rapid treatment thanks to the predictive outcomes. In order to provide best-in-
class/leading-edge patient care and subsequently enhance well-being results, AI in
the healthcare industry has to develop analytical engines/products that alter how
the healthcare ecosystem accesses and generates value from integrated clinical
information.
With increasingly accurate predictions, machine learning aims to create more
favorable results. Techniques for machine learning rely largely on computational
power. Machine learning is built on creating algorithms that can accomplish this
using computers’ binary yes-or-no logic. In machine learning, there are two cate-
gories: supervised and unsupervised. Clustering methods are used in unsupervised
learning, whereas regression and classification techniques are used in supervised
learning. Figure 1 displays the categories for machine learning.
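A minimal sketch of the two categories, using scikit-learn and the Iris toy dataset purely for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Supervised: the labels y guide the classifier during training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)

# Unsupervised: K-means groups the same points without ever seeing y.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

The supervised model is scored against held-out labels; the clustering is evaluated only by the structure it finds.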

Fig. 1 Machine learning categories for health care



1.2 Advantageous Industries of Machine Learning for Health Care

Machine learning has a significant impact on intelligent learning analytics in several
industries and offers numerous prospects for study and growth. The arena of machine
learning supports (a) the automation of data analysis, (b) discovery of patterns in data,
(c) voice and audio processing, (d) object detection, (e) localization and tracking, and
(f) image processing, forensics, and security [4]. In addition, the following
industries benefit from machine learning.
• In the healthcare industry, prognosis prediction is aided by machine learning
algorithms.
• Machine learning facilitates image, video, and text recognition, and it can learn
from prior experience and improve as a result.
• Computer-assisted image analysis for predictive modeling in the healthcare
industry.
• Detection of fraud in the financial and banking sectors.
• Real-time data analysis in the online shopping industry; machine learning
facilitates a suitable approach to client service and relationship management.
• Machine learning forecasts market trends based on vast datasets, effective client
input, and survey results.
• Applications of machine learning for speech and character recognition;
biometric, face detection, and iris data processing; natural language processing.
• Signal processing and analysis of sensor data; semantic web and ontologies;
intelligent semantic analysis.
• Biotechnology for pharmaceutical development and health treatment.

1.3 Machine Learning Applications in Healthcare

In the area of medical care, machine learning provides superior results. According
to a report by McKinsey, machine learning in pharmacy and medicine might generate
up to $100 billion annually [5]. This is a result of accelerated decision-making,
the effectiveness of healthcare trials, and enhanced innovation. There are numerous
machine learning applications in pharmacy. They are generally categorized as [6]:
• Identification and diagnosis of disease.
• Behavior Modification/Personalized Treatment.
• Drug Manufacturing/Discovery.
• Research on Healthcare Trials.
• Radiology and Radiotherapy.
• Intelligent Electronic Medical Records.
• Prediction of an Epidemic Outbreak.

Based on data gathered from social media updates, the web, and satellite data,
machine learning and artificial intelligence technologies are frequently employed to
monitor and predict epidemic outbreaks [7]. Applications of machine learning in
health care for disease diagnosis have been developed successfully in the modern
era. Over the years, a lot of research has been conducted in the field of medicine
to identify disorders using machine learning algorithms; some recent efforts that
used machine learning techniques to detect diseases are discussed below.
An approach based on machine learning algorithms is described in [8] in
order to assist physicians and patients in detecting disease in its early stages. This
study exclusively employed a text-based dataset and created a system that
allows doctors to diagnose common diseases using patient symptoms. A review of
disease diagnosis using machine learning approaches is presented in the work [9].
This study demonstrates the application of machine learning to high-dimensional
and multi-dimensional data and draws some conclusions about the algorithms’
limits. Article [10] presents another investigation into the use of machine learning
algorithms for medical diagnostics. The utilization of several machine learning algorithms for
precise medical diagnosis was the main topic of this research. In this paper, a thorough
analysis of machine learning methods for disease diagnosis is conducted. The focus
of this paper is on recent advances in machine learning that have had a substan-
tial impact on disease identification and diagnosis. Numerous studies have been
conducted to predict heart disorders, such as [11], which used a variety of machine
learning algorithms to identify early-stage heart disease, including SVM, RF, KNN,
and ANN classification algorithms. In the research publication [12], NN was used to
create a diagnostic system that could accurately predict the likelihood of developing
a heart condition. Congestive heart failure has emerged as one of the leading killers
in the modern world. Wu et al. [13] describe the development of four classification
models for the precise diagnosis of fatty liver disease; that research demonstrates
that the random forest model outperforms the other models.
The goal of this paper is to use machine learning techniques to improve liver
disease diagnosis; its major objective is to distinguish liver patients from healthy
individuals using classification algorithms. Machine learning techniques have been
used in [14] to predict hepatitis disease and to forecast cirrhosis development in a
sizable cohort of CHC-infected individuals. In the study [15, 16], data mining
algorithms were utilized to forecast the phases of renal disease.
The PNN method offers superior performance in both classification and prediction
when compared to the others. SVM performs better than DT when it comes to
predicting chronic renal disease, yet both of these machine learning techniques are
utilized. In the publication [17], a machine learning system is used to make a predic-
tion for type-2 diabetes. In order to put the prediction model into action, support
vector machine is utilized. The findings of the experiments presented in the research
publication [18] demonstrate that the random forest method is an excellent choice for
developing an effective machine learning model to diagnose diabetes. In the study
[19], researchers utilized four distinct machine learning algorithms in order to make
a diabetes mellitus prediction among the adult population.

2 Role of Machine Learning in Health Care

Even though researchers have accomplished huge tasks in the machine learning
arena, this does not indicate that all human labor will be replaced by computers very
soon. To begin, machine learning is analogous to a tool, which can only function as
effectively as its operator. Since the 1970s, ML has been a component of research
in the healthcare industry; initially, it was utilized to calibrate antibiotic dosages
for patients with infections.
Pathology is the process of detecting disease by referring to the findings of labo-
ratory tests on physiological fluids and tissues, such as urine, blood, and other bodily
fluids. The traditional approaches, such as using microscopes, can be supplemented
with the use of machine vision and other machine learning technologies. Through
the use of machine learning systems, clinicians are able to diagnose rare diseases by
merging algorithms with facial recognition software. The patient photos are analyzed
and interpreted by using face analysis and machine learning techniques, which allows
for the determination of phenotypes that are associated with uncommon genetic
illnesses (Fig. 2).

2.1 Hospital Management and Patient Care

Because the healthcare sector and hospitals provide such a wide variety of essential
services, they face a great deal of difficulty in providing care to patients. When

Fig. 2 ML in healthcare analysis (stages: keeping well, early detection, training,
diagnosis, end-of-life care, decision making, treatment)



considering the efficacy of treatment over the long term, affording the enormous
cost becomes a tough task because hospitals and clinics are constrained by
resource management. The primary objective of hospital administration is to ensure
that patients receive the necessary care by recruiting qualified medical personnel and
efficiently allocating time for the use of diagnostic equipment. Machine learning is a
crucial component in each of these sectors, from the management of inventories to the
notification of emergency departments for patient treatment. It is possible to estimate
the amount of time a patient will have to wait by taking into account factors such
as staffing levels. Patients can be monitored remotely, and virtual nursing assistants
can provide assistance over the phone.

2.2 Pattern Matching in Genetic Research

Research into the various patterns of gene expression has been one of the most
important aspects of medical investigation during the past twenty years. There are
already approximately 30,000 coding regions in the human genome, and these are
responsible for the production of a large number of proteins [20]. Therefore, ML-
based solutions are essential in genomic analysis since they tackle problems of a
complex and high dimensional nature in an ideal manner. Gene expression and gene
sequencing are the two basic sorts of genomic analysis that may be carried out with the
assistance of ML. Additionally, machine learning is utilized in gene editing through
the application of CRISPR technology to modify DNA sequencing. The collection of
additional data for the development of efficient machine learning models is necessary
if we are to achieve better deterministic gene alteration.

2.3 Effective Disease Diagnosis

Diagnostic procedures are a group of examinations that are carried out to identify
diseases, specific medical problems, and infections. The most crucial component in
the improvement of patients is getting an accurate diagnosis as quickly as possible
and then providing treatment that actually works. Errors in diagnosis are respon-
sible for roughly ten percent of all human fatalities and between six and seventeen
percent of all complications that occur in hospitals. ML offers solutions to problems
that arise during the diagnostic process, particularly those that arise when dealing
with oncology and pathology. In addition, when looking at electronic health records
(EHR), machine learning offers improved diagnostic prediction (Fig. 3). In the field
of healthcare research, machine learning is mostly focused on image-based diag-
nosis and image analysis. Imaging that is based on machine learning is a wonderful
fit for the massive datasets that are seen in the healthcare industry since it requires
correct classification of enormous datasets [21, 22]. There is a problem caused by
differences in the imaging technologies used in medical imaging. It is possible that

Fig. 3 Role of ML in healthcare decision-making process

a model that was trained using the imaging system from one developer will not
function properly when presented with images taken from another developer.

3 Machine Learning Models to Classify Healthcare Industry

Algorithms used in machine learning construct a mathematical model by using
sample data as input. The training data are labeled in supervised learning, and the
response variable may be discrete or qualitative (for classification tasks) or
continuous or quantitative (for regression tasks) [23]. This can be accomplished
using either binary or multiple classes. Several machine learning methods that have
been demonstrated to provide accurate diagnostics for health care are addressed
further below [24].

3.1 Logistic Regression (LoR)

Logistic Regression (LoR) is a method that can be used to categorize data into
outcomes that are either yes or no. Using the logit function, defined as logit
= log(p/(1 − p)) = log(probability of the event occurring / probability of the event
not occurring), i.e., the log of the odds, it determines the probability that an event
will take place. Estimation is performed using the maximum likelihood method, in
which the model’s coefficients provide information about the relative relevance of
the various input features. Models can be prevented from overfitting the training
data by the use of regularization. Large

sample sizes are required for LoR. The sigmoidal curve that can be seen in the
following figure divides the data into two classes.
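The logit/sigmoid relationship above can be sketched in a few lines; the coefficients below are made up for illustration:

```python
import math

def sigmoid(z):
    """Map a log-odds score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log of the odds: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# sigmoid inverts logit, mapping log-odds back to a probability:
assert abs(sigmoid(logit(0.8)) - 0.8) < 1e-12

# A fitted model scores an example via a linear combination of features:
w, b = [1.5, -0.7], 0.2              # hypothetical coefficients
x = [2.0, 1.0]
p_yes = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
label = "yes" if p_yes > 0.5 else "no"
```

The 0.5 threshold on the sigmoid output is what produces the yes/no split the section describes.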

3.2 Neural Networks (NNs)

Neural Networks (NNs) take their cue from the way in which the brain operates. The
machine is able to learn and improve itself through the analysis of new data thanks to
an artificial network of neurons that can produce output as a result of receiving input.
Input data can be converted into output by using the activation function, for example,
the sigmoid function. Overfitting is likely to occur in a Neural Network with a greater
number of parameters; hence, regularization is necessary to avoid this problem. In
order to construct an ANN, one must first import the necessary Keras library and
then initialize the architecture of the ANN to be sequential; this entails stacking the
layers one on top of another to build the ANN structure [25]. The input layer has
input = number of features and output = (number of features + 1)/2. Any hidden
layer can have the same number of input and output units, (number of features
+ 1)/2. For subsequent layers there is no need to add input nodes, because the
network is set to be feedforward, which means the outputs from one layer are used
as the inputs for the following layer. The output layer has input = (number of
features + 1)/2 and output = 1: a binary variable describing diseased or not. In the hidden layers,
the Rectified Linear Unit (ReLU) activation function can be employed, while the
Sigmoid activation function can be used in the output layer to produce a prediction
between 0 and 1. If the value is larger than 0.5, the case is classified as diseased;
otherwise, it is classified as healthy. To construct a neural network using supervised
learning, one can make use of the Multi-layer Perceptron (MLPClassifier) class
provided by the sklearn neural_network package. A Multi-layer Perceptron neural
network consists of input, hidden, and output layers [26].
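A small sketch of the layer-sizing rule above with sklearn's MLPClassifier; the synthetic dataset and iteration count are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary dataset with 9 features.
X, y = make_classification(n_samples=200, n_features=9, random_state=0)

hidden = (X.shape[1] + 1) // 2   # (# of features + 1) / 2 = 5 hidden units
clf = MLPClassifier(hidden_layer_sizes=(hidden,), activation="relu",
                    max_iter=2000, random_state=0).fit(X, y)
train_acc = clf.score(X, y)
```

MLPClassifier applies a logistic output for the binary case, matching the sigmoid-output/ReLU-hidden arrangement described in the text.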

3.3 K-Nearest Neighbor (KNN)

K-Nearest Neighbor, also known as Instance-Based Learner (IBL), is a learner that
takes a lazy approach and is able to easily adapt to data it has not seen before.
As a non-parametric method, it is utilized in the processes of statistical estimation
and pattern identification. The distance metric that is applied while dealing with
continuous variables is
Minkowski = (sum_{i=1}^{k} |x_i − y_i|^q)^(1/q),

and the Hamming distance is used for categorical variables. Before attempting to
compute the distance measure, the training set must first be standardized. The ideal
range for k is anywhere between 3 and 10.
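A lazy-learner sketch using the Minkowski distance defined above (q = 2 gives the Euclidean case); the toy data points are invented:

```python
from collections import Counter

def minkowski(x, y, q=2):
    """Minkowski distance: (sum |x_i - y_i|^q)^(1/q)."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def knn_predict(train, query, k=3, q=2):
    """train: list of (feature_vector, label); vote among the k nearest."""
    nearest = sorted(train, key=lambda t: minkowski(t[0], query, q))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1.0, 1.0), "healthy"), ((1.2, 0.9), "healthy"),
         ((4.0, 4.2), "diseased"), ((4.1, 3.9), "diseased"),
         ((3.8, 4.0), "diseased")]
result = knn_predict(train, (4.0, 4.0), k=3)   # nearest 3 are all "diseased"
```

There is no training phase at all; the full training set is scanned at query time, which is exactly why KNN is called lazy.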

3.4 Support Vector Machine (SVM)

Constructing a hyperplane or decision boundary is what the support vector machine
(SVM), also known as the Large Margin Classifier, does; an example of this is
depicted in the figure. The initial training data are mapped to a higher dimension
as a result of this transformation. Linear SVM can be employed for models with a
small number of features, whereas the kernel technique is used for models with a big
number of features. Training entails bringing the error function to its lowest possible
value. In support vector machines (SVMs), the Radial Basis Function (RBF) is by
far the most common choice for the kernel type.
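A sketch contrasting the linear kernel with the RBF kernel on a deliberately nonlinear toy dataset (scikit-learn's SVC, purely illustrative):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by any hyperplane in the original space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)  # implicit higher-dim mapping
```

The RBF kernel performs the higher-dimensional transformation implicitly, which is why it separates the circles while the linear boundary cannot.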

3.5 Decision Tree (DT)

Decision trees (DT) are non-parametric, simple to read and depict, and able to
easily capture nonlinear patterns. DT does not require normalization. Since DT is
also biased toward datasets that are imbalanced, it is recommended to first balance
the dataset before developing the model. It works by partitioning the working space
into subparts and then using entropy and information gain, the gain ratio, or the
Gini index to determine how well the classes are separated. The ID3, C4.5 (an
extension of ID3), and CART variants of decision tree each make use of these
metrics. DT is also suitable for feature selection; however, decision trees can
overfit noisy data, which can be mitigated by employing ensembles. DT can also
be used for missing value imputation.
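The split-quality measures named above can be sketched directly; the toy labels are invented:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

mixed = ["sick", "sick", "well", "well"]   # maximally impure node
pure = ["sick"] * 4                        # pure node
# Information gain of a candidate split = parent entropy minus the
# weighted average entropy of the resulting child nodes.
```

ID3 uses information gain, C4.5 the gain ratio, and CART the Gini index; all three reward splits that drive these impurity values toward zero.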

3.6 Ensemble

Learners combine a variety of strategies to improve the model’s predictability and
stability. Most of the time, simple models perform poorly because they have either
too strong a bias or too much variance to be resilient. There is a bias-variance
trade-off when modifying the model’s complexity. A model with high bias and
low variance is consistently inaccurate, whereas a model with low bias and high
variance is accurate but unpredictable. What if the models were combined? The
greatest difficulty lies in selecting base models that make distinct types of errors.
There are three fundamental ensemble types:

a. Bagging, also known as Bootstrap Aggregation, combines the findings of
homogeneous base classifiers with high variance using an equal weighting scheme.
For classification problems, majority voting and soft voting are utilized as
combination tools. The term “bootstrapping” refers to the practice of returning
each sample that is collected before going on to collect the next sample, i.e.,
sampling with replacement. A model with high variance is not desirable because
the model performance is heavily dependent on the data used for training it. One
can get less variation by expanding the training set’s size, but this does not
improve the predictive ability of the model; the model may still perform poorly
even if more training data are provided. Hence, bagging increases stability and
prevents overfitting at the same time. As opposed to pasting, whose samples are
drawn without replacement, bagging samples are drawn with replacement.
Random Forest (RF) is an extension of bagging that operates on the “wisdom of
the crowd” approach. Random Forest is an ensemble of decision trees, and its
generation involves making a random choice from among the set of features
available at each node in order to establish the split. During classification, each
tree has a vote, and the result is the class that received the most votes. Bagging is
efficient even with a small amount of data.
b. The training data are used in the first step of the boosting process, which then
develops a second model to rectify the errors made by the first model. This is
done by giving more weight to the misclassified examples in order to improve
the performance of a less capable learner. Boosting trains weak classifiers with
an accuracy between 50 and 60%, such as a DT with max depth = 1 or a logistic
regression linear classifier, and then adds them to a final strong classifier by
weighting them. As a result, boosting is expensive, slow, and cannot be scaled.
The samples that were incorrectly classified gain weight, whereas the ones that
were correctly classified lose weight; the algorithm learns from examples of
incorrect classifications. It does this by first creating a series of models with an
average level of performance using subsets of the original data and then
“boosting” those models’ performance by combining them using a certain cost
function (synonymous with a majority vote). As a result, boosting brings about a
reduction in the model’s bias. Since boosting can result in overfitting, we need to
make sure to stop at the appropriate moment. In AdaBoost, the complexity of
additive modeling with a base estimator is linear, and it can perform even better
than RF. The base model is trained using all of the available data.
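The bagging idea in (a) can be sketched with a deliberately weak 1-D threshold learner standing in for a decision tree; the data and learner below are invented:

```python
import random
from collections import Counter

random.seed(0)
# Two well-separated 1-D classes (invented data): class 0 near 0.45, class 1 near 1.45.
data = [(x / 10, 0) for x in range(10)] + [(1 + x / 10, 1) for x in range(10)]

def fit_threshold(sample):
    """Weak learner: threshold halfway between the two class means."""
    xs0 = [x for x, y in sample if y == 0] or [0.0]   # fallback if a class
    xs1 = [x for x, y in sample if y == 1] or [2.0]   # is missing from sample
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

# Bootstrap: resample the training set with replacement, once per base model.
models = []
for _ in range(25):
    sample = [random.choice(data) for _ in data]
    models.append(fit_threshold(sample))

def predict(x):
    votes = Counter(1 if x > t else 0 for t in models)
    return votes.most_common(1)[0][0]   # equal-weight majority vote
```

Each bootstrap replica yields a slightly different threshold; the majority vote averages away that variance, which is the point of bagging.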
Figure 4 depicts a patient-centric machine learning model; the model represents
the ML process [27]. The model can be broken down into several distinct phases.
Initially, to design an effective system, the first step is to record the patients’
symptoms in an input dataset, and thereafter to select an appropriate ML model.
After training, the system is supposed to provide an accurate diagnosis of the
patient. Knowledge representation is done by the model’s different phases, and the
model will continually update itself with additional training. The predicted patient
diagnostic outcomes will be written back into the information system as patient
history, for the purpose of achieving more precise analytical results and supporting
additional research. The purpose of this model is to make predictions about the
patients’ prognoses based on the symptoms they present with (Table 1).

Fig. 4 Patient-centric machine learning model

Table 1 Comparison of existing ML techniques

S. No. | Authors/years | Machine learning techniques | Findings
1 | Durai et al. [28] | SVM and NB | SVM achieves an accuracy rate of 95.04%
2 | Mehtaj Banu [29] | Reinforcement and KNN | AVM and KNN have improved prediction accuracy
3 | Sivakumar et al. [30] | K-means and KNN | K-means has less accuracy than C4.5
4 | Ma et al. [31] | KNN, logistic regression, SVM, and ANN | Other algorithms give less accurate results than ANN
5 | Sontakke et al. [32] | SVM and CNN | CNN gives more accurate results
6 | Hashem et al. [33] | PSO and GA | Other algorithms give less accurate results than PSO
7 | Sindhuja and Jemina Priyadarsini [34] | NB, SVM, and decision tree | Other algorithms give less accurate results than decision tree

4 Conclusion

The development of reliable risk models for health care can be facilitated through
the application of AI to the relevant data. The healthcare industry is already
struggling under the weight of an increasing number of patients and a dearth of
appropriately qualified medical professionals. Large corporations such as Google,
Enlitic, and MedAware have all initiated significant projects aimed at enhancing
the capabilities of machine learning and artificial intelligence systems utilized
within the healthcare sector. A natural rise in the number of competent healthcare
professionals is not possible, but the application of machine learning technologies
can boost the productivity and accuracy of those already in practice. The utilization
of these technologies will assist in serving more patients in a shorter amount of
time, hence improving healthcare outcomes while also lowering the cost of health
care.
Precision refers to the percentage of times a diseased patient was diagnosed by a
machine as having that disease, whereas sensitivity and recall refer to the percentage
of times the computer correctly predicted disease among all afflicted patients. There-
fore, it is necessary for reputable hospitals to keep data electronically and make use of
it for the purpose of conducting relevant analytics, which can assist both patients and
medical professionals in making fast and correct diagnoses. Therefore, a promising
area of research would be to increase the capability of machine learning algorithms
to generalize, particularly with regard to imbalanced classes and multiclass data.
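The precision and recall notions above can be checked with a small pure-Python computation; the confusion-matrix counts below are illustrative and not taken from any study surveyed here.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Illustrative counts: 80 diseased patients correctly flagged (TP),
# 20 healthy patients wrongly flagged (FP), 20 diseased patients missed (FN).
p, r = precision_recall(tp=80, fp=20, fn=20)
print(p, r)  # 0.8 0.8
```

A model tuned for high recall on imbalanced classes often sacrifices precision, which is one reason the conclusion calls for better generalization on imbalanced and multiclass data.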

References

1. Chen M, Hao Y, Hwang K, Wang L, Wang L (2017) Disease prediction by machine learning
over big data from healthcare communities. IEEE Access 5:8869–8879
2. Palanisamy V, Thirunavukarasu R (2017) Implications of big data analytics in developing
healthcare frameworks—a review. J King Saud Univ Comput Inf Sci
3. Madabhushi A, Lee G (2016) Image analysis and machine learning in digital pathology:
challenges and opportunities
4. Quer G, Muse ED, Nikzad N, Topol EJ, Steinhubl SR (2017) Augmenting diagnostic vision
with AI. Lancet 390(10091):221
5. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G,
Thrun S, Dean J (2019) A guide to deep learning in healthcare. Nat Med 25(1):24–29
6. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E,
Agapow PM, Zietz M, Hoffman MM et al (2018) Opportunities and obstacles for deep learning
in biology and medicine. J R Soc Interface 15(141):20170387
7. Gottesman O, Johansson F, Komorowski M, Faisal A, Sontag D, Doshi-Velez F, Celi LA (2019)
Guidelines for reinforcement learning in healthcare. Nat Med 25(1):16–18
8. Sunny AD, Kulshreshtha S, Singh S, Srinabh BM, Sarojadevi H (2018) Disease diagnosis
system by exploring machine learning algorithms. Int J Innov Eng Technol (IJIET) 10(2):14–21
9. Razia S, Swathi Prathyusha P, Vamsi Krishnan N, Sumana S (2017) A review on disease
diagnosis using machine learning techniques. Int J Pure Appl Math 117(16):79–85
10. Pavithra D, Jayanthi AN (2018) A study on machine learning algorithm in medical diagnosis.
Int J Adv Res Comput Sci 9(4):42–46
46 B. K. Saraswat et al.

11. Mamatha Alex P, Shaji SP (2019) Prediction and diagnosis of heart disease patients using data
mining technique. In: International conference on communication and signal processing, pp
0848–0852
12. Subhadra K, Vikas B (2019) Neural network based intelligent system for predicting heart
disease. Int J Innov Technol Explor Eng (IJITEE) 8(5):484–487
13. Wu C-C, Yeh W-C, Hsu W-D, Islam MM, Nguyen PA, Poly TN, Wang Y-C, Yang H-C, Li Y-C
(2019) Prediction of fatty liver disease using machine learning algorithms. Comput Methods
Programs Biomed 170:23–29
14. Konerman MA, Beste LA, Van T, Liu B, Zhang X, Zhu J, Saini SD, Su GL, Nallamothu BK,
Ioannou GN, Waljee AK (2019) Machine learning models to predict disease progression among
veterans with hepatitis C virus. PLoS ONE 14(1):1–14
15. Tian X, Chong Y, Huang Y, Guo P, Li M, Zhang W, Du Z, Li X, Hao Y (2019) Using machine
learning algorithms to predict hepatitis B surface antigen seroclearance. Comput Math Methods
Med 2019:1–7
16. Rady E-HA, Anwar AS (2019) Prediction of kidney disease stages using data mining
algorithms. Inf Med Unlocked 15:100178
17. Abbas H, Alic L, Rios M, Abdul-Ghani M, Qaraqe K (2019) Predicting diabetes in healthy
population through machine learning. In: 2019 IEEE 32nd international symposium on
computer-based medical systems (CBMS), pp 567–570
18. Benbelkacem S, Atmani B (2019) Random forests for diabetes diagnosis. In: 2019 international
conference on computer and information sciences (ICCIS), pp 1–4
19. Faruque MF, Asaduzzaman, Sarker IH (2019) Performance analysis of machine learning tech-
niques to predict diabetes mellitus. In: 2019 international conference on electrical, computer
and communication engineering (ECCE), pp 1–4
20. Alansary A, Oktay O, Li Y, Le Folgoc L, Hou B, Vaillant G et al (2019) Evaluating reinforcement
learning agents for anatomical landmark detection. Med Image Anal 53:156–164
21. Peyrou B, Vignaux J-J, André A (2019) Artificial intelligence and health care. In: André A
(ed) Digital medicine. Springer International Publishing, Cham, pp 29–40. https://doi.org/10.
1007/978-3-319-98216-8_3
22. Sathya D, Sudha V, Jagadeesan D (2019) Application of machine learning techniques in health-
care. In: Handbook of research on applications and implementations of machine learning tech-
niques. IGI Global, Pennsylvania, pp 289–304. https://doi.org/10.4018/978-1-5225-9902-9.
ch015
23. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL (2018) Artificial intelligence
in radiology. Nat Rev Cancer 18:500–510. https://doi.org/10.1038/s41568-018-0016-5
24. Dzobo K, Adotey S, Thomford NE, Dzobo W (2020) Integrating artificial and human intel-
ligence: a partnership for responsible innovation in biomedical engineering and medicine.
OMICS J Integr Biol 24:247–263. https://doi.org/10.1089/omi.2019.0038
25. Luo J, Wu M, Gopukumar D, Zhao Y (2016) Big data application in biomedical research and
health care: a literature review. Biomed Inform Insights 8:BII.S31559
26. Luo Y, Szolovits P, Dighe AS, Baron JM (2016) Using machine learning to predict laboratory
test results. Am J Clin Pathol 145(6):778–788
27. Guleria P, Sood M (2020) Intelligent learning analytics in healthcare sector using machine
learning. In: Jain V, Chatterjee J (eds) Machine learning with health care perspective. Learning
and analytics in intelligent systems, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-
030-40850-3_3
28. Durai V, Ramesh S, Kalthireddy D (2019) Liver disease prediction using machine learning
29. Mehtaj Banu H (2019) Liver disease prediction using machine-learning algorithms. Int J Eng
Adv Technol (IJEAT) 8(6). ISSN: 2249-8958
30. Sivakumar D, Varchagall M, Gusha SAL (2019) Chronic liver disease prediction analysis
based on the impact of life quality attributes. Int J Recent Technol Eng (IJRTE) 7(6S5). ISSN:
2277-3878
31. Ma H, Xu C, Shen Z, Yu C, Li Y (2018) Application of machine learning techniques for
clinical predictive modeling: a cross-sectional study on nonalcoholic fatty liver disease in
China. BioMed Res Int 2018

32. Sontakke S, Lohokare J, Dani R (2017) Diagnosis of liver diseases using machine learning.
In: 2017 international conference on emerging trends & innovation in ICT (ICEI), Feb 2017.
IEEE, pp 129–133
33. Hashem S et al (2017) Comparison of machine learning approaches for prediction of advanced
liver fibrosis in chronic hepatitis C patients. IEEE/ACM Trans Comput Biol Bioinform
15(3):861–868
34. Sindhuja D, Jemina Priyadarsini R (2016) A survey on classification techniques in data mining
for analyzing liver disease disorder. Int J Comput Sci Mob Comput 5(5):483–488
An Efficient Framework for Crime
Prediction Using Feature Engineering
and Machine Learning

Vengadeswaran, Dhanush Binu, and Lokesh Rai

Abstract The growth of crime in society produces insecurity among people and
severely impacts the country’s economic development. Understanding crime pat-
terns is necessary to provide a proactive response to curb criminal activities. Most
crime prediction models harnessing machine learning/deep learning techniques fail
to exhibit significant results for the vague and inconsistent dataset. This is due to the
non-consideration of suitable pre-processing and feature engineering techniques,
resulting in an unreliable and inaccurate model. Hence in this work, an efficient
framework for crime prediction using Feature engineering and Machine learning is
proposed. Initially, the vague and inconsistent input dataset is pre-processed using
Feature Scaling and Encoding input values. After pre-processing, DBSCAN clustering
is applied to extract geospatial features, which are used for improving model
accuracy. Then the SelectKBest library is used to select the 10 best features for the
prediction model. Further, the SMOTE library is used to oversample crime types
with the lowest sample size. Finally, for classifying the crime types, XGBoost
and LGBM classifier models are evaluated. The proposed crime prediction model
was validated by deploying an Amazon EC2 P3 instance in the cloud environ-
ment. The proposed model yields an improved accuracy of 62% for the XGBoost
Classifier and 63% for the LGBM Classifier compared with KNN and Decision
Tree.

Keywords Crime prediction · Feature engineering · KNN · XGBoost

Vengadeswaran (B) · D. Binu · L. Rai


Department of Computer Science and Engineering, Indian Institute of Information Technology
Kottayam, Kottayam, India
e-mail: vengadesh@iiitkottayam.ac.in
URL: https://www.iiitkottayam.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 49
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_5
50 Vengadeswaran et al.

1 Introduction

The growth of crime is undoubtedly a threat to people in particular and society at
large. According to world statistics and the crime index, worldwide crime occurrence
continuously shows an increasing trend, leading to insecurity in the country. A recent
survey [1] indicates a significant increase of 0.74 in crime from 2019 to 2020. Accord-
ing to the crime index, Venezuela is the country with the most crime; some countries
have warned against travelling to Venezuela due to its high crime rate [2]. This is a
significant problem around the globe since it produces insecurity among righteous
people, and it will also severely impact the country’s economic development. Most
countries have invested a lot of capital in identifying crime trends. Understanding
crime patterns is necessary to provide a proactive response to curb criminal activities.
Hence, we propose a robust crime prediction system that can detect crime hotspots
and recurring crime patterns so that enforcement agencies can deploy more forces,
eventually helping to control crime in those areas. The research surveys in [3–5]
lay out many factors that directly and indirectly affect crime.
In the past, enforcement agencies used historical trends to predict crime. But
these methods could not capture deviations from the trends and were neither accurate nor
reliable. However, with the advent of machine learning/deep learning techniques,
the accuracy of crime prediction can be significantly improved. The problem is that
data regarding the crime are vague and not in the proper format; hence we need to
perform appropriate data pre-processing and convert the data into a form that is easily
recognizable. Also, accurately predicting future crime occurrences
requires selecting the most informative features so that the crime
type can be detected reliably. Considering the above requirements, in this work an efficient framework for
crime prediction using both Feature engineering and Machine learning is proposed.
The Atlanta dataset, with Crime type, Count, Date, Location, Beat, Neighbourhood,
NPU, Latitude, and Longitude as the features, is considered for the evaluation of the
proposed work.
The rest of the paper is organized as follows. Section 2 describes related works,
the proposed work is explained in detail in Sect. 3, and Sect. 4 presents the
experimental results and analysis. Finally, Sect. 5 concludes the paper with possible
future research directions.

2 Related Works

Several research works [6–9] proposed crime prediction models using Machine learn-
ing and Deep learning techniques. Khan et al. [6] propose a novel methodology, where
the exploratory data analysis is performed by constructing different graphs to check
the distribution of data, etc. After examining the data, feature selection and transfor-
mation are done to further supplement the model. Different machine learning models
are then evaluated on the dataset and the results are compared with each other. The
An Efficient Framework for Crime Prediction Using Feature … 51

models used in this paper are naive Bayes, random forest, and gradient-boosting
methods. It was eventually concluded that the gradient-boosting methods performed
better on the dataset. The main limitation of this approach is that it did not consider
spatial features, a significant drawback given how strongly crime depends on
geographical features. Lin et al. [7] propose a system to detect crime types using
geographical features. The paper presents a novel method for extracting spatial and
temporal features and inputting them into a model that can utilize the above two
types of features to make a more accurate prediction. The paper also proposes a
grid-based approach for geographical feature extraction that helps represent spatial
features more accurately, thereby increasing the model’s accuracy. Here the major
issue in identifying the crime is that, in a particular region, there may be different
factors that affect the crime rates and types. Hence the factors are very much depen-
dent on the geographical area. Further, Bappee et al. [8] propose a novel method for
predicting crime types by considering spatial features. The paper tries to extract the
crime hotspots for the classification algorithm. By considering the spatial features
the accuracy of the model increased, demonstrating the importance of considering
spatial features. Zhang et al. [9] classify crime types using the XGBoost
classifier and make use of explainable AI to figure out which variables have the
most effect on the overall output using SHAP values. However, in both cases [8, 9],
the geospatial features are not extracted from the given dataset thereby significantly
reducing the performance during model evaluation. This has motivated us to study
the significance of clustering techniques [10, 11] in detail. Further, we explored how
clustering can be applied to extract geospatial features from the input dataset and
how those spatial features can be employed to effectively classify crime types. In this work,
for classifying the crime types, the most promising classifiers such as XGBoost and
LGBM are evaluated for achieving improved accuracy.

3 Proposed Work

In this work, an efficient framework for crime prediction using Feature engineering
and Machine learning is proposed. The detailed work methodology is explained in
this section (Fig. 1).

3.1 Study Area and Dataset

The region considered for this study is the city of Atlanta, the most populous city
in the state of Georgia, USA. Atlanta encompasses 134.0 square miles, of which
133.2 square miles is land and the rest is water (Fig. 2a). The town is situated in
the foothills of the Appalachian Mountains and east of the Mississippi River. The
Atlanta crime dataset collected from the open-source platform, Kaggle [12], is used

Fig. 1 Proposed work methodology

Fig. 2 Study area and dataset

for the evaluation of the proposed work. The dataset contains information about every
crime that has occurred in Atlanta from 2009 to 2017. A detailed description of the
attributes in the dataset is represented in Fig. 2b.

3.2 Pre-processing

In the original crime dataset, there were 11 crime types such as Larceny from Vehi-
cle, Larceny non-Vehicle, Burglary-Residence, Auto theft, Burglary-non Residence,
AGG Assault, Robbery Pedestrian, Robbery-Residence, Robbery-Commercial, Rape,
and Homicide. After initial experimentation, it was found that grouping of crime types
significantly improves the accuracy of crime prediction. Accordingly, the ideal way
of grouping these crime types was identified as follows:

• It was found that the crime types like Homicide, Rape, and Auto theft exhibited
random behaviours and hence were removed entirely from our prediction model.
Also, it observed from experiments, that the model performed better when all
types of burglary and Robbery (Burglary-non-res, Burglary residential, Robbery-
residence, Robbery-commercial, and Robbery-pedestrian) are combined into one
single group called “Burglary”.
After performing the above steps, there were, in total, four crime types. However,
the data still was not in the proper format to be inputted into the machine learning
model; hence three distinct pre-processing steps were performed.
Step 1: Removing Null-Values The presence of null values signified a data loss
and did not provide any meaningful information; hence all the null values were
removed from the dataset using the Pandas library. This facilitates improving
the accuracy of the LGBM classifier used in our proposed work.
Step 2: Feature Scaling Since we are performing the classification using a machine
learning model, the numerical data must be on the same scale. Hence a min–max
scaler was used to scale the numerical columns in our dataset (lat, long, beat). It scales
the values to a specific range without changing the shape of the original distribution.
Due to this, the distribution of attributes will have a smaller standard deviation, which
can suppress the effect of outliers.

x' = (x − min(x)) / (max(x) − min(x))                                    (1)

where x is an original value, x' is the normalized value, min(x) is the minimum
feature value, and max(x) is the maximum feature value.
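Equation (1) can be sketched in a few lines of pure Python; the proposed work uses scikit-learn's MinMaxScaler, so this stand-alone version with assumed sample values is only illustrative.

```python
def min_max_scale(values):
    """Scale values to [0, 1] per Eq. (1): x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

# Assumed sample values for a numerical column such as 'beat'
beats = [101, 205, 310, 412]
print(min_max_scale(beats))  # all values lie between 0.0 and 1.0
```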
Step 3: Encoding Input Values Input values that are categorical are encoded using
one-hot encoding. It converts categorical data into a format that can be fed into
machine learning models to improve prediction accuracy. In the proposed work, a
one-hot encoding transform function available in the scikit-learn Python ML library
is applied, which results in a binary vector for each variable.
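The one-hot transform described here can be sketched in pure Python; the paper applies scikit-learn's encoder, and the category values below are assumed for illustration.

```python
def one_hot(values):
    """Map each categorical value to a binary vector with a single 1."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors

# Assumed category values for illustration
cats, vecs = one_hot(["Assault", "Burglary", "Assault"])
print(cats)  # ['Assault', 'Burglary']
print(vecs)  # [[1, 0], [0, 1], [1, 0]]
```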

Table 1 Classification report for XGBoost classifier

Class           Precision   Recall   F1 score   Support
0               0.72        0.93     0.81       11,569
1               0.67        0.47     0.55       18,933
2               0.66        0.56     0.61       17,345
3               0.42        0.69     0.52       7540
Accuracy                             0.62       55,387
Macro avg.      0.62        0.66     0.62       55,387
Weighted avg.   0.64        0.62     0.62       55,387

3.3 Feature Engineering

The original dataset contains nine features, as described in Fig. 2b. These features
were insufficient to classify the crime types accurately; hence for the proposed work,
new temporal features were created. By converting the timestamp column in the
original dataset to DateTime format, temporal features such as day of the week,
day, month, quarter, and holiday were extracted. Another issue to be handled was
the longitude and latitude features. These features provide crucial geospatial data
that was critical in determining the crime type, but the range of the latitude and
longitude column was not significant. Hence, Feature engineering is performed to
extract meaningful information from the data.
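The temporal-feature extraction above can be sketched with the standard library; the timestamp format and the holiday list below are assumptions for illustration.

```python
from datetime import datetime

def temporal_features(timestamp, holidays=frozenset()):
    """Derive day-of-week, day, month, quarter, and a holiday flag."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")  # assumed format
    return {
        "day_of_week": dt.weekday(),           # Monday = 0
        "day": dt.day,
        "month": dt.month,
        "quarter": (dt.month - 1) // 3 + 1,
        "is_holiday": int(dt.strftime("%m-%d") in holidays),
    }

feats = temporal_features("2017-07-04 13:30:00", holidays={"07-04", "12-25"})
print(feats)  # day_of_week=1 (Tuesday), quarter=3, is_holiday=1
```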
In this work, Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) is considered for geospatial features. DBSCAN [11] is a density-based
clustering system that clusters points based on the number of neighbouring points.
The DBSCAN clustering will iterate from point to point, calculate the distances
among points, identify core points and then cluster the surrounding points together
[13]. The main shortcomings of DBSCAN, viz. high computation cost and consideration
of only local characteristics while identifying clusters, are alleviated in this
work by using the Mahalanobis distance metric. The Mahalanobis distance D_M(x)
from a data point x to a cluster with mean µ and covariance matrix S is defined
by the following equation:

D_M(x) = (x − µ) S^(−1) (x − µ)^T                                        (2)
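Equation (2) can be sketched for the two-dimensional (latitude, longitude) case in pure Python; the cluster mean and covariance below are assumed values, and the function returns the quantity exactly as written in Eq. (2), i.e. the squared form without a square root.

```python
def mahalanobis_sq(x, mu, S):
    """Compute (x - mu) S^-1 (x - mu)^T for 2-D points, as in Eq. (2)."""
    (a, b), (c, d) = S
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # 2x2 matrix inverse
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Row vector dx times S^-1, then dotted with dx again
    tmp = [dx[0] * inv[0][0] + dx[1] * inv[1][0],
           dx[0] * inv[0][1] + dx[1] * inv[1][1]]
    return tmp[0] * dx[0] + tmp[1] * dx[1]

# Assumed cluster statistics: mean at the origin, diagonal covariance
d2 = mahalanobis_sq(x=(2.0, 1.0), mu=(0.0, 0.0), S=[[4.0, 0.0], [0.0, 1.0]])
print(d2)  # 2.0 (= 2^2/4 + 1^2/1)
```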

To select the best parameter for the clustering, hyper-parameter tuning was per-
formed. The best epsilon value in DBSCAN to maximize the output was identified
using the elbow method. In this work, Latitude, Longitude, and Crime count are con-
sidered as the input vectors. The DBSCAN clusters the input vector and provides
the high-frequency crime hotspots as 3 clusters with increasing intensity (Fig. 3a).
These clusters representing geospatial features are then used for achieving improved
model accuracy.

Fig. 3 Feature engineering

3.4 Feature Selection

Since the input dataset contains many features that are not useful for classification
purposes, feature selection is performed on the dataset by using the SelectKBest
library. It determines the top 10 best features which are considered for model evalua-
tion. SelectKBest library operates on the assumption that the features in a dataset are
independent and identically distributed, and their importance can be quantified by
statistical tests. The higher the feature score, the more critical it is for predicting the
target variable. It selects the ‘k’ best features from the dataset based on their scores
(Fig. 3b).
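The paper relies on scikit-learn's SelectKBest; the underlying idea of ranking features by a statistical score and keeping the top k can be sketched generically (the feature names and scores below are assumptions):

```python
def select_k_best(scores, k):
    """Keep the k features with the highest scores, as SelectKBest does."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Assumed feature scores for illustration only
scores = {"cluster": 9.1, "beat": 7.4, "hour": 5.2, "month": 1.3, "npu": 0.8}
print(select_k_best(scores, k=3))  # ['cluster', 'beat', 'hour']
```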

3.5 SMOTE

The target variable has four crime types after the pre-processing steps, which include
Assault, Burglary, Larceny non-vehicle, and Larceny from a vehicle. But the problem
is that not all the crime types have equal samples; this can create a massive problem for
the model since it can be biased towards the highest sample. The paper [14] describes
the problems related to learning from class-imbalanced datasets. To overcome such
problems, in the proposed work, SMOTE library is used to oversample the crime type
with the smallest sample size. SMOTE stands for synthetic minority oversampling
technique, and it creates synthetic minority class samples. The primary workflow of
SMOTE is that it selects one of the instances of the minority class and one
or more of its nearest neighbours based on a distance metric [15]. The significance
of SMOTE for high-dimensional imbalanced datasets has been experimentally
validated in this work.
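The neighbour-interpolation step of SMOTE described above can be sketched in pure Python; a real pipeline would call the imbalanced-learn SMOTE implementation, so this minimal version with assumed points only illustrates how one synthetic minority sample is formed.

```python
import random

def smote_sample(minority, k=1, rng=random):
    """Form one synthetic point between a random minority instance and
    one of its k nearest neighbours (Euclidean distance)."""
    x = rng.choice(minority)
    neighbours = sorted((p for p in minority if p is not x),
                        key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
    n = rng.choice(neighbours)
    lam = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + lam * (b - a) for a, b in zip(x, n))

random.seed(0)
pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]  # assumed minority-class points
print(smote_sample(pts))  # lies on the segment between two minority points
```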

4 Experimental Results and Analysis

The experiments have been conducted on an Amazon AWS cloud platform [16]
since it is a preferred paradigm for performing massive computations. To carry out
the experiments, the Amazon EC2 P3 instance was established in the cloud, which
delivers high-performance computing for machine learning models. These instances
significantly accelerate machine learning applications. The detailed hardware spec-
ifications is shown in Fig. 4.
The target variables are label encoded before passing to the model. To validate the
performance of the trained model, the input dataset after pre-processing is segregated
into 2 parts, (i) Training dataset (80%) and (ii) Testing dataset (20%). During the
training process, the concept of class weights is used to negate the effect of class
imbalance in the dataset. Moreover, hyper-parameter tuning was also done
to determine the best parameters (Fig. 4). For classifying the crime types, XGBoost
[17] and LGBM [18] classifiers are used in this work.
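The label-encoding, 80/20 split, and class-weight steps can be sketched in pure Python (in practice library utilities are used; the toy labels and the balanced-weight formula n_samples / (n_classes × count) follow common convention and are assumptions here):

```python
from collections import Counter

def label_encode(labels):
    """Map each class name to an integer code."""
    codes = {c: i for i, c in enumerate(sorted(set(labels)))}
    return [codes[c] for c in labels], codes

def train_test_split(rows, test_frac=0.2):
    """Deterministic 80/20 split: the last 20% of rows form the test set."""
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def class_weights(labels):
    """Balanced weights: n_samples / (n_classes * count(class))."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

labels = ["Burglary"] * 6 + ["Assault"] * 2   # assumed toy labels
encoded, codes = label_encode(labels)
train, test = train_test_split(encoded)
print(len(train), len(test))      # 6 2
print(class_weights(labels))      # the rarer class gets the larger weight
```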
The LGBM classifier grows trees leaf-wise; it is extremely popular for being
lightweight and computationally inexpensive to train. XGBoost grows trees level-wise,
resulting in fewer nodes and hence consuming fewer resources. The performance of the proposed model is
evaluated through the performance metrics viz. Precision, Recall, and Accuracy as
represented in Fig. 5. The prediction model has been evaluated, and the performance
of the LGBM and XGBoost was studied in comparison with complementary works
already done using KNN [19] and using the Decision Tree [20]. The results for
XGBoost and LGBM on the crime dataset are listed in Tables 1 and 2, respectively, which show
that the LGBM classifier achieves a maximum accuracy of 63% for the test data and

Fig. 4 Hardware specification and parameters found through hyper-parameter tuning

Fig. 5 Evaluation metrics



Table 2 Classification report for LGBM classifier

Class           Precision   Recall   F1 score   Support
0               0.69        0.97     0.80       10,642
1               0.68        0.48     0.56       19,036
2               0.62        0.60     0.61       14,988
3               0.51        0.60     0.56       10,721
Accuracy                             0.63       55,387
Macro avg.      0.63        0.66     0.63       55,387
Weighted avg.   0.63        0.63     0.62       55,387

Fig. 6 Epochs versus accuracy and loss

XGBoost achieves 62% accuracy, whereas Decision Tree and KNN have attained
only a maximum accuracy of 58% and 57%, respectively. Figure 6a, b shows the
graphical representation of the relation between the number of epochs used with
accuracy and loss for the prediction model. In addition, a significant improvement in
the prediction model was achieved due to adoption of Feature engineering, as shown
in Fig. 7. From the graph it is inferred that LGBM achieves the maximum improvement
of 5% ((63 − 60)/60 × 100) in accuracy for the test data.
Outcome: The prediction model will be developed as a Visualization Dashboard
which benefits the law enforcement authorities to better understand crime issues and
provide insights that will enable them to track activities, predict the likelihood of
incidents, effectively deploy resources, and optimize decision making.

5 Conclusion and Future Work

In this work, we studied the crime data of Atlanta to provide predictions using feature
engineering and machine learning. Here the prominent geospatial clustering technique
DBSCAN is used to extract geospatial features, which are then used to increase the
overall model accuracy. The proposed crime prediction model was validated using

Fig. 7 Comparison of prediction model using different ML techniques

XGBoost and LGBM classifier deployed on the Amazon cloud environment. The
proposed model yields an improved accuracy of 62% for the XGBoost Classifier
and 63% for the LGBM Classifier compared with KNN and Decision Tree. This
model helps law enforcement to estimate future crime hotspots better and reduce the
number of crimes happening in the locality.
In future, to increase the prediction accuracy, the model will be evaluated using
big data tools such as Apache Spark for processing data and will be tested with deep
learning algorithms such as LSTM for prediction tasks.

Acknowledgements This research work is supported by the IoT Cloud Research Laboratory, Indian
Institute of Information Technology Kottayam, which provided the infrastructure required for the
experiments.

References

1. https://www.macrotrends.net/countries/WLD/world/crime-rate-statistics
2. https://worldpopulationreview.com/country-rankings/crime-rate-by-country
3. van Dijk J, Nieuwbeerta P, Joudo Larsen J (2021) Global crime patterns: an analysis of survey
data from 166 countries around the world, 2006–2019. J Quant Criminol 1–36
4. Jenga K, Catal C, Kar G (2023) Machine learning in crime prediction. J Ambient Intell Humaniz
Comput 1–27
5. Dakalbab F, Talib MA, Waraga OA, Nassif AB, Abbas S, Nasir Q (2022) Artificial intelligence
& crime prediction: a systematic literature review. Soc Sci Humanit Open 6(1):100342
6. Khan M, Ali A, Alharbi Y (2022) Predicting and preventing crime: a crime prediction model
using San Francisco crime data by classification techniques. Complexity 2022

7. Lin YL, Yen MF, Yu LC (2018) Grid-based crime prediction using geographical features.
ISPRS Int J Geo-Inf 7(8):298
8. Bappee FK, Soares Júnior A, Matwin S (2018) Predicting crime using spatial features. In:
Advances in artificial intelligence: 31st Canadian conference on artificial intelligence, Canadian
AI 2018, Toronto, ON, 8–11 May 2018, proceedings 31. Springer International Publishing, pp
367–373
9. Zhang X, Liu L, Lan M, Song G, Xiao L, Chen J (2022) Interpretable machine learning models
for crime prediction. Comput Environ Urban Syst 94:101789
10. Chen Y, Zhou L, Bouguila N, Wang C, Chen Y, Du J (2021) BLOCK-DBSCAN: fast clustering
for large scale data. Pattern Recognit 109:107624
11. Deng D (2020) DBSCAN clustering algorithm based on density. In: 2020 7th international
forum on electrical engineering and automation (IFEEA), Sept 2020. IEEE, pp 949–953
12. Link: https://www.atlantapd.org/i-want-to/crime-data-downloads
13. Han X, Armenakis C, Jadidi M (2020) DBSCAN optimization for improving marine trajectory
clustering and anomaly detection. Int Arch Photogram Remote Sens Spat Inf Sci 43:455–461
14. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng
21(9):1263–1284
15. Blagus R, Lusa L (2013) SMOTE for high-dimensional class-imbalanced data. BMC Bioinform
14:1–16
16. Amazon AWS Link: https://aws.amazon.com/
17. XGBoost Classifier Documentation: https://xgboost.readthedocs.io/en/stable/
18. LGBM Classifier Documentation: https://lightgbm.readthedocs.io/en/latest/pythonapi/
lightgbm.LGBMClassifier.html
19. Shojaee S, Mustapha A, Sidi F, Jabar MA (2013) A study on classification learning algorithms
to predict crime status. Int J Digit Content Technol Appl 7(9):361
20. Kshatri SS, Narain B (2020) Analytical study of some selected classification algorithms and
crime prediction. Int J Eng Adv Technol 9(6):241–247
Development of an Online Collectable
Items Marketplace Using Modern
Practices of SDLC and Web Technologies

Saransh Khulbe, Divyam Gumber, and Shailendra Pratap Singh

Abstract This research paper is intended to study and implement the development of
a PHP-based system of an online collectable items marketplace which can onboard
multiple users and sellers while keeping a singular admin account. The research
intensely focuses on adapting modern techniques of the software development lifecy-
cle (SDLC), Pattern matching algorithms to optimize search times, secure multilayer
encryption techniques for login and purchase modules, overall security from XSS
attacks in the input fields while documenting the work and meanwhile proposing a
system with a rich feature set to satisfy user requirements. The development cycle
one has been intensely studied with usability testing and a tabular representation of
the results. Furthermore, the analysis of these results has been captured graphically
and conclusively a complete Web application is developed compliant to the essential
benchmarks of today’s time.

Keywords Pattern matching · Software development · Modular design · Encryption · Marketplace model

1 Introduction

Web development is a widely popular domain for developing ideas and materializing
them into real-world applications. However, with the advancement of computer
science, many modern approaches must be put into practice in order to achieve
desired results. The motivation to develop an online antique marketplace is explained
briefly. Firstly, according to IBIS World, the antiques and collectibles sales industry
has grown over the past 5 years by 7.2%. In 2018 it reached a revenue of 2
billion USD; in parallel, the number of businesses has grown by 1.3% and the number
of employees by almost 1% (as of July 28, 2019).

S. Khulbe · D. Gumber · S. Pratap Singh (B)


School of Computing Science and Engineering, Galgotias University, Greater Noida,
Uttar Pradesh, India
e-mail: shail2007singh@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 61
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_6
62 S. Khulbe et al.

Secondly, as per reports from AT&T, global internet usage for purchases has
dramatically increased, making it all the more essential for businesses to adopt such
means to stay afloat. Thirdly, in order to dive more into security and encryption-based
techniques the system must involve components like payments and login systems
which can be tested at a big scale. Lastly, the desire for speed and responsiveness
must be satisfied and in order to do so the development of such an application must
involve a rich feature set and improved search times. In order to effectively do so
we must delve into the domain of pattern searching algorithms. On the whole this
research is a collaborative effort that lets us understand the modern practices of
software development and how new ideas can be brought about in the lifecycle of
the Web application we plan to build.
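The pattern-searching direction mentioned above is not pinned to a specific algorithm at this point in the paper; the Knuth-Morris-Pratt (KMP) algorithm is one standard choice for speeding up substring search over product listings, sketched here in Python purely for illustration:

```python
def kmp_search(text, pattern):
    """Return indices where pattern occurs in text, in linear time."""
    # Build the failure table: longest proper prefix that is also a suffix.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, reusing the table to skip redundant comparisons.
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits

print(kmp_search("antique brass antique clock", "antique"))  # [0, 14]
```

The failure table is what gives the linear-time bound: the scan never re-examines characters of the text, regardless of how often prefixes of the pattern repeat.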
Essentially, this research provides a simulation of all considerations for modern
software development. The problem we aim to address involves the top-down
approach of software development, wherein working components are built so that
they simply tend to work, and the components are later refactored and documented
so that others can collaborate. This approach works, and with a good skill set it is
possible to achieve the desired results. However, we plan to explore multiple domains
as effectively as possible, and to do so we follow a bottom-up approach: the software
must have a modular design and must address all the features that have been discussed
so far. This involves identifying roles (such as a user being able to edit a profile,
onboard, search categorically, and filter results) and implementing them in the modules
they concern. Secondly, once the identified features are under development, the
novelties that need to be embedded into the system must be identified.
In our case, the modules concerning login and payment must be secured with
the researched algorithms, while the modules involving product searching must be
embedded with the pattern-matching algorithms. The overall system must
be made highly interconnected so that future development and maintenance are easily
achieved. In our case that involves making the web pages dynamically adjust
their behavior according to the real-time changes made by multiple entities to the
database. To develop the application, many techniques were researched
involving modern SDLC, and to suit our needs we had to select among the
many available SDLC models. Our SDLC hybridizes multiple models,
such as the agile development model alongside the classical waterfall model.
The waterfall model provided the stability needed for the development of the
backbone, while agile development was embedded into the modules where the
revised algorithms needed to be tested multiple times. Multiple proposals were made
for the concerned algorithms and are discussed further in the paper.

2 Literature Survey

To start our project, we first conducted a survey. We studied different
SDLC models and gathered insights about their strengths and weaknesses; refer
to Table 1 for the key insights we gathered. We then studied XSS
Development of an Online Collectable Items Marketplace … 63

Table 1 Literature survey on SDLC


1 [1] Conclusions: introduced standard techniques. Research gaps: hybridization not discussed
2 [2] Conclusions: discussed detailed use cases of techniques. Research gaps: no room for cherry-picking; unnecessary steps for small-scale development
3 [3] Conclusions: gives efficient database design; a normalized database is crucial for the system. Research gaps: poorly addressed normalization scenario
4 [4] Conclusions: discussed modern HCI practices; improvised UI for optimal frontend development. Research gaps: non-functional frontend; lacks implementation details

attacks. When an attacker embeds malicious scripts in the input data, the attack
is called an XSS attack. There are various models to prevent and detect XSS attacks.
Refer to Table 2 for detailed, structured information about the techniques to
prevent and detect these attacks.
Searching for products becomes a major concern as the project database starts to
grow: as the size of the data increases, so does the time complexity of searching it.
We therefore studied string pattern-matching algorithms to make the
product search efficient and provide a better user experience. Refer to Table 3 for
an analysis of the different algorithms.
We wanted our application to be unique and the best, so we surveyed
the applications already out there in the market, analyzed their pros and cons, and
tried to integrate the best functionalities while improving on their shortcomings in our
application. For the key insights of the market survey, refer to Table 4. Because we are
very much concerned with the security of our application, we also surveyed
encryption techniques and SQL injection for advancements in our future phases
of development. Refer to Tables 5 and 6 for insights from these surveys.

3 Proposed Work

After the entire literature analysis, we can finally discuss in detail the
work we propose to do. During development we also used standard
techniques, such as UML-specified diagrams, to carry out the
process in an appropriate manner.

Table 2 Literature survey on XSS attacks


1 [5] Conclusions: explored a static approach to detect XSS vulnerabilities; discussed CSRF and DOM-XSS models. Research gaps: inconsistent database in the models; unresolved edge cases
2 [6] Conclusions: attempted to prevent reflected XSS attacks; filters and deletes malicious scripts. Research gaps: reflected XSS not detected; no second-factor authentication for PSS
3 [7] Conclusions: discussed the significance of opcode sequences. Research gaps: cannot handle complex data flow
4 [8] Conclusions: requires fewer modifications at server applications. Research gaps: no mitigation for DOM-based XSS
5 [9] Conclusions: explored potential injection points. Research gaps: cannot anticipate DOM-based XSS vulnerabilities
6 [10] Conclusions: discussed the origin and dangers of XSS. Research gaps: has not covered all XSS attacks

Table 3 Literature survey on pattern matching algorithms


1 [11] Conclusions: accurate information on pattern matching; no limitations of size and accuracy. Research gaps: algorithm rarely used; high time complexity
2 [12] Conclusions: very apt for the implementation; satisfies user requirements; aptly discusses various means to do so. Research gaps: need to improve the existing algorithm

Table 4 Literature survey on online marketplaces


1 [13] Conclusions: provides a useful system design; introduces important web-based architectures. Research gaps: poor user interface; impractical
2 [14] Conclusions: detailed explanation of the model; good insights on database design. Research gaps: needs a better categorical approach; needs modification to suit the application

Table 5 Literature survey on encryption techniques


1 [15] Conclusions: explained the MD5 encryption approach; discussed detailed implementation. Research gaps: restricted to MD5 only; many unexplored use cases left
2 [16] Conclusions: discussed detailed implementation; secure for large-scale systems. Research gaps: needs modifications to embed in our application; unfit for customization in many scenarios
3 [17] Conclusions: usage of new and improved algorithms; multiple layers of security. Research gaps: needs modifications in the algorithms
4 [18] Conclusions: complex, high-level authentication used; robust and uncrackable system. Research gaps: high server load; high maintenance required to work
5 [19] Conclusions: new pattern-based encryption algorithm; relatively novel and irreversible. Research gaps: lacks implementation details

3.1 Expected Features

The development process involves many detailed steps and will not be described
exhaustively in words. Instead, we note that a total of 25 usable
features were identified and implemented across 3 modules (User, Seller, and
Admin). Upon detailed analysis of encryption algorithms and the feature
set already provided by PHP, we used a combination of SHA hashing
with MD5 so as to provide the system with doubly encrypted security.
A strong contender was a salted, key-value-based approach to encryption,
which provides better security than either of the algorithms discussed so far;
however, it consumed considerably more time to encrypt strings, so in comparison
the dual encryption technique fared better. This encryption is intended for
the login and onboarding functionality of the application. The application
also uses a session-based login system: although it is possible to develop a
cookie-based system, the session-based login system makes the user experience
far more natural. Furthermore, developers ultimately need to rely on both sessions
and cookies, which complicates the code, so development is simplified by this
technique. Lastly, the shortcomings of the session-based login system are
compensated for by the encryption algorithms.

Table 6 Literature survey on SQL injection


1 [20] Conclusions: discussed prevention and detection mechanisms; different SQL injections discussed with their purposes. Research gaps: SQL injection attacks still exist
2 [21] Conclusions: rare SQL injections discussed; more complex than classical SQL attacks. Research gaps: less information available about these rare attacks; prevention and detection are difficult
3 [22] Conclusions: the AES algorithm is used to avoid SQL injection attacks. Research gaps: SQL query change plan required; requires integration into a detailed system for practicality
4 [23] Conclusions: discussed classical and modern types of SQLIA; detection and prevention techniques discussed. Research gaps: need improved techniques to overcome SQLIA
5 [24] Conclusions: novel approach to detect and prevent SQL injection; successfully prevents and detects attacks. Research gaps: approach needs to be embedded in our application
6 [25] Conclusions: classification of suspicious query strings using SVM; most accurate, with 96.47% accuracy. Research gaps: SVM not supported for large datasets; does not perform well when target classes overlap

3.2 Architectural Definition (Entity Level)

The entity diagram illustrated below explains the overall architecture of
the system that we plan to build. The working entities and components are described
in detail, and as one implements the system the need for each component becomes
apparent. Every component has a specific data type supported by MySQL as
of the writing of this document, and every component has its relation/function
specified on the connecting lines, which illustrate the functional
dependencies among the working entities.

3.3 Architectural Definition (Modular Level)

The following diagram relates the functions of each entity with their respective
components of the system. The image below represents a data-flow diagram and
uses UML specifications to represent entities, functions, data involved, and the table
concerned for the particular usage.

3.4 Architectural Definition (High Level)

The high-level, abstracted version of the system can be represented with the following
diagram. The tools shown are only representative; the final choice depends on the
developer and the resources available at the time.
To prevent confusion, "files/sessions" indicates only that sessions
are initiated by actions executed on the client-side machines, while sessions
themselves exist only on the server. Furthermore, files in this case represent
pictures in all accepted formats, for purposes such as user profile pictures and
product image representations.

3.5 Techniques Employed for XSS Prevention

Based on our research, we implemented some of the robust ways to prevent
XSS attacks. Firstly, any input accepted from the user comes through HTML
forms, so every form must be validated at the backend to keep the system
secure. Secondly, all SQL queries are executed in atomic transactions so
as to keep the database consistent. Thirdly, form input is always stripped
of any scripting tags, such as PHP scripts, SQL scripts, and JS scripts. This
technique provides a great deal of protection from most XSS attacks.
The system also issues a token at login and redirects all external requests
to the login page, which discards them. This prevents access to the
inner pages of the website without login, and since the login is secured with double
encryption, the sensitive information is protected.
Due to the modular nature of the web application, the database connection token
is stored away securely, and even in case of database corruption the connection
can quickly be redirected to a data backup, ensuring the system's reliability. A
reliable way to prevent malicious content from entering the system is to use a
standard text encoding, specified explicitly in the code base. This
ensures that all data is displayed in a consistent fashion throughout the system.

3.6 Techniques Employed for Pattern-Matching

Searching is an essential feature used frequently for filtering data by
name or category. While categorical filtering can be done using the database,
the search functionality must be implemented on the client side, as the client
may request many kinds of information. The simplest principle of searching
involves exact matching of all characters of the string. However, such a rudimentary
form of pattern matching performs poorly. A better way of handling this is to search for
patterns that resemble the text and sort results by similarity to the query. This provides

us with a more insightful representation of the data. Current implementations
use complex pattern-matching algorithms such as Knuth-Morris-Pratt (KMP).
However, the way a user intends to search tends to resemble sliding-window
pattern matching. Thus, to meet the user's needs, we incline towards a faster
technique while limiting its behavior to the point where the user obtains the
desired results.

3.7 Development Environment and Setup

The development environment of the project is listed as follows.
Minimum computer specifications:
1. Windows/Linux/macOS-based system configured with XAMPP v3.2.4.
2. Chrome browser with JavaScript and CSS enabled.
3. MySQL enabled and Apache server configured.
4. All htdocs directories enabled with write permissions so that user profile
images can be stored.
5. Any editor (Sublime Text, Notepad, VS Code) or IDE (Visual Studio) may be
used to develop the application.
6. We used VS Code for the purpose of development.
Figure 1 represents the database schema design. A user can log in to or out of the
application and can search the categories of products they want to browse. Users have access
Fig. 1 Schema design



Fig. 2 Use case diagram

Fig. 3 Block-diagram representation

to an integrated payment gateway that handles payments in the application. Figure 2
elaborates the use case diagram of the application, and Fig. 3 gives
detailed insight into the architecture of the application.

4 Results and Analysis

We successfully developed our application and tested it on 125 test cases
covering proper login and logout functionality, customer onboarding, and the
secure, fast payment process. According to our analysis, 84% of the test cases
performed well, 11% performed averagely, and 4% performed badly (refer to
Fig. 4); however, none of the test cases failed to perform their functionality, and
the application worked correctly across all of them. Refer to Table 7 for all the
categories of test cases.

Fig. 4 Pie chart analysis for test cases

5 Conclusion and Future Scope

In our research paper, we have been successful in developing a PHP-based system.
This system is an online marketplace. The marketplace is built to provide users with
a medium to purchase and sell collectable items. The facility for multiple users and
sellers to buy and sell collectables and antiques using a single admin account has
been provided in our system.
We have focused our attention mainly on adding modern, efficient techniques
to the online marketplace which has been built. Practices such as a hybridized
software development lifecycle (SDLC) and a pattern-matching algorithm have
been implemented, with the potential of optimizing search time for users.
Multilayer encryption has been used to maintain security for the login and purchase
modules; it also provides protection from XSS attacks. Thus, we conclude that our
online marketplace is a well-built platform: a user-friendly and secure place for
buying and selling antiques and collectables. Going forward, we see our product
becoming the handiest go-to website for antique-collecting enthusiasts. The website
provides features such as
• The seller has a separate account where he can put up items for sale and their
respective prices.
• There is also a buyer's account, from which the buyer can easily view and buy
various types of products.
We see our product as an absolute win where we can buy antiques and collectables
by sitting within the comfort of our homes and buying is just one click away. We see

Table 7 Crucial features to test a web application


S. No. Task
1 Login Functionality for Seller
2 Seller Onboarding (Sign up)
3 Seller is able to edit profile
4 Seller is able to add products
5 Seller is able to activate or deactivate products
6 Seller is able to View all transactions
7 Seller is able to edit individual products
8 Logout function for Seller
9 Login Functionality for Customer
10 Customer Onboarding (Sign up)
11 Customer is able to edit profile
12 Customer is able to browse through popular categories and the nav bar works
13 Customer is able to click to view products in each category or can click the
products tab to see all products
14 Customer is able to buy products
15 Customer is able to view order history
16 Customer is able to logout
17 Admin Login functionality
18 Admin can view all customers
19 Admin can add a Category
20 Admin can edit a Category information
21 Admin can view all sellers
22 Admin can view all categories
23 Admin can view all transactions
24 Admin can toggle using the Navigation bar
25 Admin Logout feature

our product as a huge success going forward, since we all know "time is money"
and our website makes buying efficient, without the wasted time of typical offline
auctions, where you have to travel to the venue to buy the item you want and still
have no guarantee of acquiring that product.

• We also aim to have very smooth logistics for the buyers where there is transparency
between them and the sellers using our website.
• We also aim to have a real-time tracking system of the product so that the buyer
knows which process their product is going through and when they can expect the
product delivery.

• We also aim to have a detailed description of each product, stating which century
the product is from, its historic importance, and to whom it belonged.
• As we all know the crypto market is booming massively in the current scenario,
we also aim to accept payments through cryptocurrency.

References

1. Waykar, Y (2013) A study of importance of UML diagrams: with special reference to very
large-sized projects
2. Balaji S, Sundararajan Murugaiyan M (2012) Waterfall vs V-model vs agile: a comparative
study on SDLC
3. Eessaar E (2016) The database normalization theory and the theory of normalized systems:
finding a common ground. Baltic J Mod Comput 4:5–33
4. Bansal, H, Khan R (2018) A review paper on human computer interaction. Int J Adv Res
Computer Sci Software Eng 8(53). https://doi.org/10.23956/ijarcsse.v8i4.630
5. Maurel H, Vidal S, Rezk T (2021) Statically Identifying XSS using deep learning. In: SECRYPT
2021—18th International conference on security and cryptography, Virtual, France, July 2021
6. Khazal IF, Hussain MA (2021) Server side method to detect and prevent stored XSS attack
7. Li C, Wang Y, Miao C, Huang C (2020) Cross-site scripting guardian: a static XSS detector
based on data stream input-output association mining
8. Ankush SD (2014) XSS attack prevention using DOM based filtering API
9. Gupta S, Gupta BB (2016) Automated discovery of JavaScript code injection attacks in PHP
web applications
10. Malviya VK, Saurav S, Gupta A (2013) On Security Issues in Web Applications through cross
site scripting (XSS). In: 2013 20th Asia-Pacific software engineering conference (APSEC), pp
583–588. https://doi.org/10.1109/APSEC.2013.85
11. Diwate R (2013) Study of different algorithms for pattern matching
12. Janani R, Vijayarani S (2019) Information retrieval from web documents using pattern matching
algorithms
13. Aldaej R, Alfowzan L, Alhashem R, Alsmadi MK, Al-Marashdeh I, Badawi UA, Alshabanah
M, Alrajhi D, Tayfour M (2018) Analyzing, designing and implementing a web-based auction
online system
14. Erna P, Herdi A, Enjun J, Venkata Harsha N (2020) An architecture of E-marketplace platform
for agribusiness in Indonesia. MSCEIS, EAI. https://doi.org/10.4108/eai.12-10-2019.2296542
15. Kuhmonen S (2017) One-time password implementation for two-factor authentication
16. School of Mathematics and Information Technology, Nanjing Xiaozhuang College, Nanjing,
China
17. Sinha S (2020) Secure login system for online transaction using two layer authentication
protocol
18. Kumar B, Yadav S (2016) Storageless credentials and secure login. In: ACM International
conference proceeding series, 04–05-Mar 2016. https://doi.org/10.1145/2905055.2905113
19. Guljari E, Lokhande S, Mande S, Reddy L, Uma Maheswari K (2016) Authentication of users
by typing pattern: a review
20. Devi R, Venkatesan R, Koteeswaran R (2016) A study on SQL injection techniques. Int J Pharm
Technol 8:22405–22415
21. Singh JP (2016) Analysis of SQL injection detection techniques
22. Som S, Sinha S, Kataria R (2016) Study on SQL injection attacks: mode, detection and pre-
vention

23. Alwan Z, Younis M (2017) Detection and prevention of SQL injection attack: a survey. Int J
Comput Sci Mob Comput 68:5–17
24. Oluwakemi A, Abdullahi A, Haruna D, Oluwatobi A, Kayode A (2020) A novel technique to
prevent SQL injection and cross-site scripting attacks using Knuth-Morris-Pratt string match
algorithm. EURASIP J Inf Secur. https://doi.org/10.1186/s13635-020-00113-y
25. Rawat R, Shrivastav S (2012) SQL injection attack detection using SVM. Int J Comput Appl
42:1–4. https://doi.org/10.5120/5749-7043
Facial Emotion Recognition Using
Machine Learning Algorithms: Methods
and Techniques

Akshat Gupta

Abstract The interpretation of a person’s state of mind relies heavily on their facial
expressions. The identification of facial expressions requires the categorization of
a number of different feelings, such as neutral, happy, and angry. This capability is
becoming increasingly significant in day-to-day living. Identifying emotions can be
accomplished through a variety of strategies, such as those based on machine learning and
artificial intelligence (AI) techniques. Methods such as deep learning and image clas-
sification are utilised in order to recognise facial expressions and categorise them in
accordance with the corresponding photographs. For the purpose of training expres-
sion recognition models, a variety of datasets are utilised. This article provides a
comprehensive analysis of the facial recognition systems and techniques that have
been utilised in the past, as well as those that have been developed more recently,
together with the methodology behind them and some examples of their applications.

Keywords Deep learning · CNN · Facial emotion recognition · Human computer


interaction · Feature extraction · Stress detection · Cascade · Classification

1 Introduction

Communication between humans takes place not only through the use of spoken
language but also through the use of non-verbal cues such as hand gestures and facial
expressions, which are employed to convey emotions and provide feedback [1]. They
also assist us in comprehending the motivations of other people [2]. Several studies
[3, 4] have found that non-verbal components are responsible for conveying two-
thirds of human communication, while verbal components are only responsible for
conveying one-third. The rapid development of techniques for artificial intelligence,

A. Gupta (B)
Department of CSE, National Institute of Technology, Goa, India
e-mail: akshatgupta@nitgoa.ac.in; akshat18jan@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 75
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_7
76 A. Gupta

such as in Human–Computer Interaction (HCI) [5, 6], Virtual Reality (VR) [7],
Augmented Reality (AR) [8], Advanced Driver Assistance Systems (ADAS) [9],
and Entertainment [10, 11], is increasing the demand for facial emotion recognition.
Because a person’s circumstances and the people with whom they interact might
cause them to feel a wide range of emotions, that person is prone to experiencing
mood swings, which can in turn have an impact on their day-to-day activities. People
are always facing difficult emotional choices, which impedes both their produc-
tivity and their own personal development. Given that emotions are merely states
of consciousness, there is a strong connection between them and the work that we
undertake. Therefore, all it takes is a few simple actions or pursuits for a person’s
frame of mind to shift in a positive direction. If computers are able to recognise
these emotional inputs, then they will be able to provide users with assistance that is
both clear and suitable in a manner that is tailored to the user’s specific requirements
and preferences. It is a fundamental tenet of psychology theory that human feelings
can be categorised into one of seven archetypal states: surprise, fear, disgust, anger,
happiness, sorrow, or neutral [1]. This categorization is widely accepted in the field.
The movement of the face is an essential component in the communication of these
feelings. The expressions that are created by the face’s muscles can be altered, both
naturally and on purpose, to convey a variety of emotions. If computers are able to
recognise these emotional inputs, they will be able to provide users with assistance
that is both obvious and appropriate, catering to the user’s specific requirements
and preferences while simultaneously ensuring the user’s happiness and fulfilment
(Fig. 1).
The ability to read another person’s feelings based on their facial expressions
differs from person to person. There is also the possibility of variation with age.
It is now much simpler to identify the feelings that other people are experiencing
thanks to the advancement of technology. However, the accuracy of these methods
is often quite poor. It is possible to improve the accuracy of emotion identification
by employing a combination of different methods to analyse human expressions
from various sources of data, such as psychology, audio, or video. This could result
in a significant increase in the level of accuracy achieved. The technology behind

Fig. 1 Face emotion recognition/detection methods [2]


Facial Emotion Recognition Using Machine Learning Algorithms … 77

the so-called emotional or emotive Internet emerged from the integration of a
number of different technologies, which together allow the detection of a wide
range of emotions.

2 Literature Review

Ekman et al. have taken the first step towards providing universality in facial termi-
nology. These ‘universal facial expressions’ are those that are used to represent happi-
ness, sadness, anger, surprise, and being neutral [12]. Multiple intellectual machines
and neural networks are utilised in the process of instrumenting the system that
detects emotions [13].
A thorough investigation into the recognition of facial expressions of emotion
is presented in [14], which also describes the characteristics of a dataset and an
emotion detection study classifier. The article [15] explores the visual features of
the image and discusses some of the classifier techniques, both of which will be
thought-provoking when applied to a further evaluation of the methods of emotion
recognition. The ability to recognise facial expressions of emotion is investigated
and analysed across all scientific domains [16].
Because of the excellent accuracy rate that was achieved using filter banks and
Deep CNN [17], we were able to draw the conclusion that deep learning could
also be utilised for emotion identification. Emotion can be determined from face
photos using these methods. Image spectrograms combined with deep convolutional
networks can also be used to do facial emotion recognition, as described in [18]. It
is also feasible to recognise emotions by employing a variety of features, such as
speech or bi-modal systems, which take into account both the speaker’s voice and
their facial expressions [1, 2]. A number of academics have put up the idea that the
system should both induce and guide the learner’s emotions to the appropriate state.
But first and foremost, the system needs to be able to understand the feelings of the
learner.
Hybrid tactics, which employ many methods in tandem with various affective
cues, have also been developed. This [3] analysis also provides insight on a modern
composite deep learning strategy.
Table 1 summarises the five years of development in this field. Emotion analysis on
faces has been done using many different datasets, some of which are included here:
CK, JAFFE, CMU, FER2013, DISFA, RAFD, CFEE, SFEW, and FERA. Clearly,
there are large output differences between the different implementations, suggesting
the ongoing need to standardise datasets like the well-known IMAGENET dataset
used in CNN models so that controlled research can be carried out.
In a study available at [19], the authors provide a brief summary of the many
studies on facial expression recognition (FER) conducted over the past few decades.
First, the typical FER processes are outlined; then a brief overview of the main
kinds of typical FER systems and the primary algorithms they employ is provided.

Table 1 List of previous works


Year Dataset Accuracy Architecture
2020 CK, JAFFE 93.24%, 95.24% Convolutional layers combined with residual blocks
2019 Multiple databases 96.24% Multilayer network with data augmentation
2018 CK+, MMI 98.95%, 97.55% 3 convolutional layers, feature optimization, SOM map
2017 CK+, JAFFE, BU-3DFE 96.76% 3 convolutional layers, 2 subsampling layers, FC layers
2017 JAFFE, CK+ 76.74%, 80.30% 3 convolutional layers, 2 subsampling layers, Haar-like feature extraction
2017 Kaggle website 64% HOG-CNN multilayer architecture
2017 FER2013 66% Emotion and gender classification, CNN models
2017 CFEE, RaFD 95.71%, 74.79% AlexNet CNN
2017 CK+, MMI, FERA, DISFA 93.21%, 77.50%, 77.42%, 58.00% CNN combined with facial landmarks
2016 CMU Multi-PIE, MMI, CK+, DISFA, FERA, SFEW, FER2013 94.7%, 77.9%, 93.2%, 55.0%, 76.7%, 47.7%, 66.4% 2 convolutional layers, max-pooling layer, Inception layer
2016 FER2013 – 3 convolutional layers, 2 FC layers, rectified linear unit (ReLU) activation

Hai-Duong Nguyen [20] conducted research into the use of multilevel decisions
inside a deep convolutional neural network for facial emotion recognition. They
present an evidence-supported model that deliberately includes a hierarchy of
attributes to facilitate classification. When tested on the FER2013 dataset, the
model's performance was shown to be on par with that of other advanced approaches.
Using a feedforward learning approach, the authors of [21] developed a facial
feature recognition algorithm for use in classrooms.
Hernández-Pérez [22] proposed a strategy that combined Local Binary Pattern
(LBP) features obtained from facial expressions with Oriented FAST and Rotated
BRIEF (ORB) features and a support vector machine. Zhang Qinhu [23] introduces
a self-attention mechanism supported by a residual network and achieved accuracy
rates of 98.90% and 74.15% on the CK+ and FER2013 datasets, respectively.
Using a pre-trained identity verification model and the Xception algorithm,
Zahara [24] developed a strategy for a facial image separation (FIT) system with
enhanced features.
Facial Emotion Recognition Using Machine Learning Algorithms … 79

Fig. 2 Conceptual diagram

3 Facial Emotion Recognition System Architecture

In contrast to sound-based approaches, which make use of global computations for
acoustic features [1], the features typically employed are those based on the local
spatial location or displacement of particular points and surface regions. The
following is a list of the several stages that comprise the process of facial emotion
recognition.
Access to Web Camera: The system is granted access to the user's web camera.
Input Face Live Feed Image: Frames captured from the live feed are used for
recognising facial expressions.
Preprocessing: The image is translated, scaled, and rotated before it can be
processed further.
Feature Extraction: The size of the image is reduced during feature extraction,
but the qualities of the image are not altered.
Classification: A classifier is in charge of the expression classification process.
Identify Emotion: The emotions that are present are then determined.
Graphical Depiction: Based on the various feelings that are detected, a graphical
depiction of those feelings is produced (Fig. 2).
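The stages above can be sketched as a pipeline of pluggable functions. The following pure-Python sketch is illustrative only: the normalising preprocessor, the pooling-based feature extractor and the nearest-centroid classifier are hypothetical stand-ins for the system's actual components, and a real implementation would read frames from the web camera with a library such as OpenCV.

```python
def preprocess(frame):
    """Stand-in preprocessing: scale grey levels to [0, 1] (translation,
    scaling and rotation corrections would also happen at this stage)."""
    return [[px / 255.0 for px in row] for row in frame]

def extract_features(image):
    """Stand-in feature extraction: 2x2 average pooling shrinks the image
    while preserving its overall qualities."""
    h, w = len(image), len(image[0])
    return [[(image[r][c] + image[r][c + 1] +
              image[r + 1][c] + image[r + 1][c + 1]) / 4.0
             for c in range(0, w - 1, 2)]
            for r in range(0, h - 1, 2)]

def classify(features, centroids):
    """Stand-in classifier: nearest centroid on the mean feature value."""
    mean = sum(sum(row) for row in features) / (len(features) * len(features[0]))
    return min(centroids, key=lambda label: abs(centroids[label] - mean))

def recognise_emotion(frame, centroids):
    """Run one captured frame through the stage pipeline above."""
    return classify(extract_features(preprocess(frame)), centroids)
```

Each stage can then be swapped for a real detector, feature extractor or trained classifier without changing the pipeline's shape.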

4 Methods and Techniques for Facial Emotion

The process of identifying emotions using technology is difficult, but it is an area
in which machine learning algorithms have demonstrated a great deal of promise.

4.1 Traditional Methods

Face Detection with the Viola-Jones Face Detector

Through a series of iterations, it combines a large number of weak classifiers to
produce a powerful classifier. In this context, a weak classifier is one that
minimises the weighted error rate in each iteration [1].
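The weak-to-strong boosting idea can be illustrated with a toy AdaBoost implementation. As a simplifying assumption, the weak learners here are 1-D threshold stumps rather than the Haar features Viola-Jones uses; each round keeps the stump with the lowest weighted error, then re-weights the examples it misclassified so the next round focuses on them.

```python
import math

def stump_predict(x, threshold, polarity):
    """A weak classifier: a single threshold test on one feature."""
    return polarity if x >= threshold else -polarity

def adaboost_train(xs, ys, rounds=5):
    """Boost weak threshold stumps into a strong classifier. Each round
    selects the stump minimising the *weighted* error over the training
    set, then re-weights the examples it got wrong."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []                                # (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None                              # (error, threshold, polarity)
        for t in xs:
            for pol in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, t, pol) != y)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        if err >= 0.5:                           # no better than chance: stop
            break
        err = max(err, 1e-10)                    # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote weight
        ensemble.append((alpha, t, pol))
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, pol))
             for wi, x, y in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]                 # renormalise the weights
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted majority vote of the boosted stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1
```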

4.2 Methods Typically Utilised in Machine Learning

Cascaded regression trees are used here: pixel intensities are employed to differentiate
between the various facial regions, ultimately leading to the identification of 68 facial
landmarks [25] (Fig. 3).
The author [25] uses only 19 major points out of a total of 68 extracted features,
focusing only on the areas surrounding the mouth, eyes, and brows (as shown in
Fig. 4).
A support vector machine (SVM) is one of the most effective classification
techniques. Central to it is the concept of a margin: the distance between the two
classes that should exist to prevent any form of overlap [26].

Fig. 3 System architecture



Fig. 4 Image with 68 feature points [25]

Classifiers Based on a Random Forest


These classifiers have also demonstrated superiority over support vector machines
(SVM) in several instances [27].
Convolutional Neural Network (CNN)
The convolutional neural network (CNN) is a type of deep learning algorithm that
can take in images, assign importance (learnable, discriminative weights) to the
many aspects/elements in the image, and differentiate one aspect/element from
another [28].
The organisation of the visual cortex served as an inspiration for the construction
of CNN, which has a structure that is analogous to the connection patterns of neurons
found in the human brain [29].
The convolutional network is very good at capturing both the spatial and temporal
relationships of an image by making use of the appropriate filters [30] (Fig. 5).
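The filter-based capture of spatial relationships mentioned above can be shown with a minimal valid-mode 2-D convolution (implemented, as in most CNN libraries, as cross-correlation) over plain nested lists; the function name is illustrative.

```python
def conv2d(image, kernel):
    """Valid-mode 2-D convolution (cross-correlation, as used by CNN
    frameworks) of a nested-list image with a nested-list kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):            # slide the filter over the image
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out
```

For instance, the kernel `[[-1, 1]]` responds only where the intensity changes between horizontal neighbours, i.e. at a vertical edge, which is exactly the kind of local spatial relationship a learned CNN filter picks up.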
Common FER Dataset
The images in the dataset have a size of 48 × 48 × 1 pixels. Table 1 presents a
substantial amount of information pertaining to the dataset; the categories, as well
as the number of images available for each category, are given.

Fig. 5 CNN architecture [29]

5 Discussions on the Research Gaps

In the preceding part, we saw that many different kinds of algorithms have been
employed in face detection; however, there are still many obstacles to overcome in
the field of emotion detection. These are listed below:
• Detection and acknowledgement in real time.
• Recognising changing expressions and facial movements.
• Continuous detection.
• Incomplete data.

6 Conclusion

Our objective was to analyse the facial expressions of users to determine their present
state of mind and then make suggestions tailored to that understanding so that users
might improve their disposition and have a more enjoyable experience. The system
makes an effort to supply the person who is in a state of distress or rage with movies
or activities that are designed to relieve stress.
It also suggests information that will motivate users to overcome feelings of sorrow
and sadness and continue to experience joy in their lives. This research is intended
to propose a solution that is not only quicker but also simpler and more succinct.
It can snap a photo with the help of a straightforward user interface, like that of an
Android app, and then proceed to carry out the necessary actions in a manner that
will not aggravate the user. Because it makes use of contemporary deep learning
techniques, such as convolutional neural networks, this system is more dependable
than others and improves accuracy. This will be of significant assistance in stressful
conditions and plays an important role in boosting the productivity of an individual.
Because of its significant educational and economic potential [2], facial expres-
sion recognition, also known as FER, will be an important topic in the field of
computer vision and artificial intelligence in the near future. In the next phase of
this research, one of the topics that will be addressed is the more difficult task of
extracting important visual information straight from traditional films. In the near
future, the platform will have the capability to record and identify facial expressions
using only brief video clips as input.

References

1. Busso C, Deng Z, Yildirim S, Bulut M, Lee CM, Kazemzadeh A, Lee S, Neumann U, Narayanan
S (2004) Analysis of emotion recognition using facial expressions, speech and multimodal
information. In: Proceedings of the 2004, ICMI'04, October 13–15, State College, Pennsylvania
2. Imani M, Montazer GA (2019) A survey of emotion recognition methods with emphasis on
e-learning environments. J Netw Comput Appl 147:102423
3. Ko BC (2018) A brief review of facial emotion recognition based on visual information. Sensors
18(2):401
4. Mehrabian A (1968) Communication without words. Psychol Today 2:53–56
5. Kaulard K, Cunningham DW, Bülthoff HH, Wallraven C (2012) The MPI facial expression
database: a validated database of emotional and conversational facial expressions. PLoS ONE
7:e32321
6. Dornaika F, Raducanu B (2007) Efficient facial expression recognition for human robot
interaction. In: Proceedings of the 9th international work-conference on artificial neural networks
on computational and ambient intelligence, San Sebastián, Spain, 20–22 June 2007, pp 700–708
7. Bartneck C, Lyons MJ (2007) HCI and the face: towards an art of the soluble. In: Proceedings of
the international conference on human computer interaction: interaction design and usability,
Beijing, China, 22–27 July 2007, pp 20–29
8. Hickson S, Dufour N, Sud A, Kwatra V, Essa IA (2017) Eyemotion: classifying facial
expressions in VR using eye-tracking cameras
9. Chen H, Lee IJ, Lin LY (2015) Augmented reality-based self-facial modeling to promote the
emotional expression and social skills of adolescents with autism spectrum disorders. Res Dev
Disabil 36:396–403
10. Assari MA, Rahmati M (2011) Driver drowsiness detection using face expression recognition.
In: Proceedings of the IEEE international conference on signal and image processing
applications, Kuala Lumpur, Malaysia, 16–18 November 2011, pp 337–341
11. Zhan C, Li W, Ogunbona P, Safaei F (2008) A real-time facial expression recognition system
for online games. Int J Comput Games Technol 2008:1–7
12. Azcarate A, Hageloh F, van de Sande K, Valenti R (2005) Automatic facial emotion recognition.
Universiteit van Amsterdam
13. Khan R, Sharif O (2017) A literature review on emotion recognition using various methods.
Global J Comput Sci Technol F Graph Vis 17(1):1
14. Graves A, Mohamed A, Hinton G (2013) Speech recognition with deep recurrent neural
networks. In: Proceedings of the 2013 IEEE international conference on acoustics, speech
and signal processing, pp 6645–6649
15. Huang CW, Narayanan SS (2017) Characterizing types of convolution in deep convolutional
recurrent neural networks for robust speech emotion recognition, pp 1–19
16. Swain M, Routray A, Kabisatpathy P (2018) Databases, features and classifiers for speech
emotion recognition: a review. Int J Speech Technol
17. Huang KY, Wu CH, Yang TH, Su MH, Chou JH (2016) Emotion recognition using auto-
encoder bottleneck features and LSTM. In: Proceedings of the 2016 international conference
on orange technologies (ICOT), pp 1–4
18. Stolar MN, Lech M, Bolia RS, Skinner M (2017) Real time emotion recognition using RGB
image classification and transfer learning. In: Proceedings of the 2017 11th international
conference on signal processing and communication systems, pp 1–8
19. Ko BC (2018) A brief review of facial emotion recognition based on visual information. Sensors
18(2):401–421
20. Nguyen HD, Yeom S, Lee GS, Yang HJ, Na IS, Kim SH (2019) Facial emotion recognition
using an ensemble of multi-level convolutional neural networks. Int J Pattern Recogn Artif
Intell 33(11):128
21. Bhatti YK, Jamil A, Nida N, Yousaf MH, Viriri S, Velastin SA (2021) Facial expression
recognition of instructor using deep features and extreme learning machine. Comput Intell
Neurosci 2021:1–14
22. Niu B, Gao Z, Guo B (2021) Facial expression recognition with LBP and ORB features. Comput
Intell Neurosci 2021:1–16
23. Daihong J, Yuanzheng H, Lei D, Jin P (2021) Facial expression recognition based on attention
mechanism. Sci Program 2021:1–18
24. Zahara L, Musa P, Prasetyo Wibowo E, Karim I, Bahri Musa S (2020) The facial emotion
recognition dataset for prediction system of micro-expressions face using the convolutional
neural network algorithm based Raspberry Pi. In: Proceedings of the 5th international conference
on informatics and computing, pp 1–9
25. Swinkels W, Claesen L, Xiao F, Shen H (2017) SVM point-based real-time emotion detection.
In: Proceedings of the 2017 IEEE conference on dependable and secure computing, Taipei
26. Kaya GT (2013) A hybrid model for classification of remote sensing images with linear SVM
and support vector selection and adaptation. IEEE J AEORS 6(4):1988–1997
27. Sheykhmousa M, Mahdianpari M, Ghanbari H, Mohammadimanesh F, Ghamisi P, Homayouni
S (2020) Support vector machine versus random forest for remote sensing image classification:
a meta-analysis and systematic review. IEEE J Sel Top Appl Earth Obs Remote Sens 13:6308–
6325
28. https://www.geeksforgeeks.org/introduction-convolution-neural-network/
29. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-
the-eli5-way-3bd2b1164a53
30. Li CEJ, Zhao L (2019) Emotion recognition using convolutional neural networks. Purdue
University Purdue e-Pubs Purdue undergraduate research conference 2019
Similar Intensity-Based Euclidean
Distance Feature Vector
for Mammogram Image Classification

Bhanu Prakash Sharma and Ravindra Kumar Purwar

Abstract Mammogram imaging is economical, easily available, non-invasive and
preferred for breast cancer detection. This paper proposes and tests a new feature
vector based on the sum of Euclidean distances of similar subsequent pixels on
various well-known classifiers. The preprocessed and augmented region of interest
images extracted from the Mammographic Image Analysis Society (MIAS) dataset
are used for performance evaluation. The classifier’s input data is balanced by
randomly selecting 2000 images from each of the normal, benign and malignant categories.
On an Ensemble Subspace KNN classifier using a tenfold cross-validation approach,
it achieved good sensitivity/recall, specificity, precision and F1-scores along with
a classification accuracy of 98.4%. This technique can be used for regular breast
screening for early-stage breast cancer detection and breast abnormalities identifi-
cation. It can prioritize mammograms for further analysis as well as provides an
opinion to the radiologists in decision-making.

Keywords Breast cancer · Mammogram · Augmentation · Data balancing ·
Neural network · Ensemble classifier

1 Introduction

In cancer, some categories of abnormal cells grow out of control and, if not treated
on time, may cause the patient’s death. Statistics represent a tremendous increase in
cancer cases during the last few decades. As described in the estimation report [1]
of the American Cancer Society (ACS) for 2023, diagnosis of more than 1,958,310
cancer cases and 609,820 deaths are expected in the USA alone; 297,790 of these
cases are breast cancer, the largest number for any cancer site among women. Of
these breast cancer cases, 43,170 expected deaths represent a high mortality rate.

B. P. Sharma (B) · R. K. Purwar
USICT, Guru Gobind Singh Indraprastha University, Delhi, India
e-mail: bhanu.12016492317@ipu.ac.in
R. K. Purwar
e-mail: ravindra@ipu.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 85
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_8
Table 1 represents the Indian cancer estimates from the National Cancer Registry
Programme (NCRP) report-2020 [2]. As in many other countries, breast cancer is the
leading site of cancer among Indian women. The estimated number of breast cancer
cases in 2020 was 205,424, which covered 14.8% of all cancer cases, whereas these
estimates for 2025 are 232,832 and 14.8%, respectively. It is estimated that within
every 8 min, one woman in India is diagnosed with breast cancer and one dies of it.
As suggested in the reports published by cancer-controlling authorities [1–3]
of various countries, the risk of developing cancer increases with age. In addition,
improved average survival rate and reduced mortality and severity have been observed
for the cases detected at an earlier stage and appropriately treated. So, many coun-
tries suggest regular breast checkups to identify the presence of any abnormality
at an earlier stage. For this purpose, many breast screening modalities are used
in medical science to view and analyze the possibility of breast cancer. Some of
them are mammography, magnetic resonance imaging (MRI), computed tomog-
raphy (CT) scan, biopsy, thermography, breast ultrasound, electrical impedance
tomography (EIT), digital breast tomosynthesis (DBT), etc. Among them, mammog-
raphy is economical, less time-consuming, non-invasive and the preferred screening
technique.
Given the global shortage of radiologists, such regular screening creates a
considerable burden on them. To overcome this problem, there is a need for
automatic (computer-aided) analysis of mammogram images for breast cancer detec-
tion, prioritizing the mammogram images for manual analysis by radiologists and
providing an additional opinion to them. Inspired by the need, the authors proposed
a model which automates this process and classifies the input mammogram images
into normal, benign and malignant categories. The malignant classified mammograms
are supposed to be cancerous and need the radiologist’s advice on priority. Many
possible steps like mammogram enhancement, noise removal, region of interest
(ROI) segmentation and augmentation techniques of this automation process are
inspired by [4], and its resultant dataset (formed from the MIAS dataset) is used in
this work. Further, the Euclidean distance between similar intensity pixels is used for
the extraction of prominent features. Finally, the Ensemble Subspace KNN is used as

Table 1 Estimated number of breast cancer cases, all cancer cases in females and both males and
females for the quinquennial years 2015, 2020 and 2025
Year  (a) Cancer cases in all sites (males + females)  (b) Female cancer cases from all sites  (c) Breast cancer cases  (d) Breast cancer cases (%) among females  (e) Breast cancer cases (%) to (males + females)
2015  1,228,939  627,202  180,252  28.74  14.67
2020  1,392,179  712,758  205,424  28.82  14.76
2025  1,569,793  806,218  232,832  28.88  14.83

the primary classification model. To balance the data and get reliable results, 2000
mammogram images (a total of 6000) are randomly taken from each of the three
categories for performance analysis.

2 Literature Survey

An automated mammogram image analysis model includes various possible steps,
from mammogram enhancement to the usage of an appropriate classifier, and several
techniques have been suggested by researchers for the same.
researchers also used data augmentation and balancing approaches to overcome the
problems of data under-fitting/overfitting and imbalanced multi-class data. For example, the
authors in [5] proposed an ensemble tree-based mammogram image classification
model using the MIAS dataset [6]. For noise removal, the images were passed through
median and Gaussian filters, followed by extraction of ROI using thresholding and
segmentation. To overcome the under-fitting problem, horizontal flip and rotation
at 90, 180 and 270° were used to generate additional transformed copies of ROI
images during the augmentation process. Features computed from the gray-level co-
occurrence matrix (GLCM) and local binary pattern (LBP) are merged with the deep
features extracted from an intermediate layer of AlexNet deep neural network using
transfer learning (TL). A fivefold cross-validation approach achieved classification
accuracy between 83.90 and 98.80% on seven different classification models. In [7],
authors used the median filter and different morphological operations to preprocess
the mammogram images of the MIAS dataset. On preprocessed images, segmenta-
tion was applied using Otsu’s threshold in parallel to the threshold computed using
histogram peak analysis (HPA), followed by the dot product of their resultant images.
On AlexNet deep neural network, it achieved a classification accuracy of 93.45%.
Authors in [8] computed the multi-fractal dimension (M-FD) features using artificial
neural networks (ANN) in which multiple thresholds were used for noise suppression
and feature extraction. It achieved 96.2% accuracy, 96.63% sensitivity and 95.37%
specificity over the mini-MIAS dataset. In [9], histogram equalization, full-scale
histogram stretching (FSHS), wavelet transform and morphological enhancements
were used to enhance the mini-MIAS dataset mammogram images. The ANN classi-
fier achieved a classification accuracy of 97%. In [10], the inception structure replaced
the first convolution layer of the DenseNet-II neural network and achieved a classifi-
cation accuracy of 94.55%. Authors in [11] proposed a two-view classification model
using transfer learning at three levels. First, the EfficientNet network’s weights train
the patch classifier, whose weights train the single-view classifier, and the weights of
the single-view classifier train the two-view classifier. An AUC of 0.93 was achieved
using a fivefold cross-validation technique on the Curated Breast Imaging Subset
of Digital Database for Screening Mammography (CBIS-DDSM) dataset. Mammo-
grams are enhanced based on their breast density level in [12] using the proposed
Spatial-based Breast Density Enhancement for Mass Detection (SbBDEM) tech-
nique by boosting overlapping mass region textural features. Further, the Blind/
Reference-less Image Spatial Quality Evaluator (BRISQUE) scores are computed
separately for both non-dense and dense breasts. These scores tune the optimal
threshold parameters computed using the SbBDEM from the image’s lower contrast.
The enhanced You Only Look Once v3 (YOLOv3) architecture is used for mass/
lump detection. As there is a difference in circularity and smoothness of boundaries
of benign and malignant masses, the circularity of the masses and the mean values of
their intensities are used as feature vectors along with GLCM. Using these feature vectors for
the mammogram images of the INbreast dataset [13], the supervised weighted k-
nearest neighbor (k = 10) classifier achieved a classification accuracy of 96% (using
a fivefold cross-validation approach).

3 Proposed Work

In the proposed work, the input mammogram images are classified into one of
three classes: benign (lump present but probably non-cancerous), malignant (probably
cancerous) and normal (no lump present). Figure 1 represents the
stages of the proposed computer-aided detection/diagnosis (CAD) model for breast
cancer detection. These stages are described subsequently:

3.1 Stages 1–4

The dataset proposed in [4] from the mammogram images of the mini-
Mammographic Image Analysis Society (mini-MIAS) dataset [6] is used in this
work. The first four stages of the proposed work are adopted from that dataset
only. The ROI images were extracted using its description document (DD), and a
similar approach was used for benign and malignant mammograms, whereas another
approach was adopted for normal mammograms. The size of the dataset was extended
by using rotation (at 10, 20, 30, …, 350°) and vertical flip operations by generating
71 additional transformed copies of each ROI image. After normalization, a dataset
of 23,256 images of resolution 128 × 128 pixels was formed, consisting of 4608
benign, 3744 malignant and 14,904 normal mammograms. This extended dataset
provides sufficient images for the classification model’s training and testing.
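The 71 transformed copies per ROI image can be enumerated programmatically. The decomposition sketched below (rotations at 10°, 20°, …, 350° applied to both the original and its vertically flipped version, plus the un-rotated vertical flip itself) is one consistent reading of the augmentation described above, not necessarily the authors' exact ordering.

```python
def augmentation_plan():
    """Enumerate 71 additional transforms per ROI image. Each entry is
    (rotation_angle_in_degrees, vertically_flipped); (0, False) — the
    untouched original — is deliberately excluded."""
    angles = range(10, 360, 10)            # 35 rotation angles
    plan = [(a, False) for a in angles]    # rotated originals
    plan += [(a, True) for a in angles]    # rotated vertical flips
    plan.append((0, True))                 # plain vertical flip
    return plan                            # 35 + 35 + 1 = 71 transforms
```

With 71 copies plus the original, every ROI yields 72 images, which is consistent with the stated category counts (4608, 3744 and 14,904 are all multiples of 72).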

Fig. 1 Common stages of a CAD-based breast cancer detection system



3.2 Stage 5: Data Balancing

There is a considerable variation in the number of images in all three categories of the
used dataset. Its direct usage may provide inaccurate and biased results. Therefore,
2000 images from each category (a total of 6000 images) are randomly selected for
the classifier’s training and validation to get reliable results.

3.3 Stage 6: Feature Extraction and Classification

Each image of the used dataset is an 8-bit grayscale image having pixel intensities
between 0 and 255. For each gray-level intensity, the sum of the Euclidean distances
between similar-intensity pixels is used as a feature vector (FV) component. Since
there are 256 different gray levels in total, the size of the FV is 256. This FV for
each image is computed using the pseudocode given in Algorithm 1.
Algorithm 1
Input: ROI image of resolution 128 × 128 pixels. Each pixel's possible grey level is
between 0 and 255.
Output: A 1-D vector 'distance' of length 256.
Begin
1. Read the ROI image into variable img, representing a 2-dimensional grey-level
intensity matrix of size 128 × 128.
2. Initialization: For each grey-level intensity (intensity in range 0 to 255),
distance[intensity] = 0.
3. For each intensity in range 0 to 255
{
    first_match = true
    For each row of img, row = 0 to 127
    {
        For each column of img, col = 0 to 127
        {
            if img(row, col) == intensity
            {
                if first_match
                {
                    first_match = false
                }
                else
                {
                    distance[intensity] = distance[intensity] +
                        EuclideanDistance(row, col; row_t, col_t)
                }
                row_t = row
                col_t = col
            }
        }
    }
}
End
In Algorithm 1, img represents a two-dimensional matrix of size 128 × 128 such
that each value represents the grayscale intensity value of the corresponding pixel.
The distance[] is the FV used to store 256 values, where each value represents
the sum of Euclidean distances between subsequent pixels of the same intensity
value. For example, distance[0] stores the sum of Euclidean distance between
subsequent pixels of intensity 0 only.
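A runnable Python equivalent of Algorithm 1 is sketched below (the function name is illustrative). Instead of re-scanning the image once per intensity, it remembers the previous occurrence of every grey level during a single row-major pass, which yields the same per-intensity sums of consecutive-pixel distances.

```python
import math

def similar_intensity_distance_fv(img, levels=256):
    """Feature vector of Algorithm 1: for every grey level, the sum of
    Euclidean distances between subsequent pixels of that intensity in
    row-major scan order. img is a nested list of integer intensities."""
    distance = [0.0] * levels
    prev = [None] * levels          # last (row, col) seen per intensity
    for row, line in enumerate(img):
        for col, intensity in enumerate(line):
            if prev[intensity] is not None:
                r, c = prev[intensity]
                distance[intensity] += math.hypot(row - r, col - c)
            prev[intensity] = (row, col)
    return distance
```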
Figure 2 illustrates how Euclidean distance is computed for intensity level 121,
whereas Table 2 represents its computed values. For the image block shown in Fig. 2,
the value of distance[121] is 10.9. These steps are repeated for each intensity
value for each pixel of img. The size of the computed feature vector becomes 256.
After the FV computation of 6000 randomly selected ROI images (2000 from each
of the benign, malignant and normal classes), the next task is its performance
evaluation on classification models. For this evaluation, the simulation work was
performed on MATLAB 2022b (academic license) on an i7 processor (2.9 GHz,
10th generation) system with 8 GB of RAM.

Fig. 2 Illustration for computation of Euclidean distance for pixels of intensity 121 for a 6 × 6 image

Table 2 Euclidean distance between pixels of intensity value 121, computed from Fig. 2
Position of one pixel   Position of subsequent pixel   Euclidean distance between them
[0, 1]                  [1, 3]                         2.24
[1, 3]                  [1, 3]                         2.83
[1, 3]                  [3, 4]                         3
[3, 4]                  [2, 5]                         2.83
Sum of distance for intensity 121                      10.9
Various classifiers have been tested on the proposed feature vector. In Table 3,
these classifiers, the hyperparameters used and obtained accuracy are listed. It has
been observed that Ensemble Subspace KNN gives maximum accuracy of 98.4%.
Figure 3 represents the confusion matrix for the Ensemble Subspace KNN clas-
sifier. Its x-axis represents the predicted class for the mammogram’s ROI, whereas
the y-axis represents the actual classes. Table 4 represents the same classifier's
sensitivity/recall, specificity, precision and F1-scores. Table 5 compares the proposed work's
performance with other state-of-the-art techniques used for breast cancer detection
from mammogram images.

Table 3 FV’s performance over some state-of-the-art classifiers along with used hyperparameters
Classifier Hyperparameters used Overall accuracy (%)
Cubic support vector machine [14, Function (kernel): cubic 97.3
15] Scale (kernel): automatic
Box constraint level: 1
Method: one-versus-one
Data standardization: true
Fine KNN [16] Neighbors: 1 97.2
Metric of distance: Euclidean
Distance weight: equal
Data standardization: true
Wide neural network [17, 18] Fully connected layers: 1 97.1
First layer size: 100
Activation: ReLU
Iteration limit: 1000
Regularization strength: 0
Data standardization: yes
Ensemble Subspace KNN [17, 19] Ensemble type: subspace 98.4
Type of learner: nearest neighbors
Learners: 30
Subspace dimension: 128
Bold represents the maximum classification accuracy achieved by any classifier

Fig. 3 Confusion matrix for Ensemble Subspace KNN classifier

Actual Class   Predicted Class
               Benign  Malignant  Normal
Benign         1977    10         13
Malignant      5       1987       8
Normal         31      32         1937

Table 4 Performance metrics (sensitivity/recall, specificity, precision and F1-score) for Ensemble Subspace KNN classifier
Performance metric   Benign  Malignant  Normal
Sensitivity/recall   0.99    0.99       0.97
Specificity          0.99    0.99       0.99
Precision            0.98    0.98       0.99
F1-score             0.99    0.99       0.98
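The per-class scores in Table 4 follow mechanically from the confusion matrix in Fig. 3 (rows = actual class, columns = predicted class). A small helper, with illustrative naming, might be:

```python
def per_class_metrics(cm, labels):
    """Per-class sensitivity/recall, specificity, precision and F1-score
    from a confusion matrix given as nested lists (rows = actual class,
    columns = predicted class)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # actual i, predicted other
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, actual other
        tn = total - tp - fn - fp
        recall = tp / (tp + fn)                    # sensitivity
        specificity = tn / (tn + fp)
        precision = tp / (tp + fp)
        f1 = 2 * precision * recall / (precision + recall)
        metrics[label] = {"recall": recall, "specificity": specificity,
                          "precision": precision, "f1": f1}
    return metrics
```

Applied to Fig. 3's matrix, it reproduces Table 4's rounded values (e.g. benign recall 1977/2000 ≈ 0.99, normal recall 1937/2000 ≈ 0.97) and the overall accuracy of (1977 + 1987 + 1937)/6000 ≈ 98.4%.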

Table 5 Performance comparison of the proposed work with other state-of-the-art techniques
Classifier                                                                              Accuracy (%)
Dual thresholding-based breast cancer detection in mammograms [7]                       93.45
Breast cancer detection using mammogram images with improved multi-fractal
dimension approach and feature fusion [20]                                              96.2
Fully automated computer-aided diagnosis system for microcalcifications cancer
based on improved mammographic image techniques [21]                                    97
Enhancement technique based on the breast density level for mammogram for
computer-aided diagnosis [12]                                                           96
Proposed work                                                                           98.4

4 Discussion and Future Work

In this work, the authors proposed a breast cancer detection technique that classifies
the input mammogram images into benign, malignant and normal classes. The
primary goal of this research is to develop a technique that is easy to understand and
accurate, can be used for regular screening and early-stage breast cancer detection,
prioritizes the mammogram images for the radiologists and provides them a second
opinion. The proposed work satisfies all these requirements using a simple approach
based on the Euclidean distance between similar-intensity pixels. When evaluated on
randomly balanced data, it achieved classification accuracies between 97.1 and 98.4%
using a tenfold cross-validation approach for different classifiers.
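The tenfold cross-validation protocol behind these numbers can be sketched at the index level (the function name is illustrative; training and scoring the classifier on each split is omitted):

```python
import random

def kfold_indices(n, k=10, seed=1):
    """Produce k (train, validation) index splits for k-fold
    cross-validation: each fold serves once as the validation set while
    the remaining k-1 folds train the classifier."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, val))
    return splits
```

Averaging the per-split accuracies then gives the cross-validated figure reported for each classifier.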
Future work can incorporate some other feature extraction and classification tech-
niques with this existing approach to get better results. In addition, other datasets may
be incorporated with the existing one for the classifier’s training to achieve enhanced
performance.

Acknowledgements This work has been done under the Visvesvaraya fellowship scheme of the
Government of India (GOI).

References

1. American Cancer Society—Cancer facts and figures


2. ICMR-National Centre for Disease Informatics and Research (NCDIR) (2020) Report of
national cancer registry programme 2020. Bengaluru. https://ncdirindia.org/All_Reports/
PBCR_Annexures/Default.aspx. Accessed 30 March 2023
3. Jotwani AC, Gralow JR (2009) Early detection of breast cancer. Mol Diagn Ther
13(6):349–357
4. Sharma BP, Purwar RK (2023) An augmented mammogram image dataset and its performance
analysis for various classification models. Multimedia Tools Appl 12:1–45.
https://doi.org/10.1007/s11042-023-14566-z
Prediction of Lumpy Virus Skin Disease
Using Artificial Intelligence

Pankaj Singh Kholiya, Kriti, and Amit Kumar Mishra

Abstract Lumpy skin disease (LSD) is a skin infection of cattle caused by the lumpy skin
disease virus, a member of the genus Capripoxvirus. The disease is transmitted mainly by
arthropod vectors and shows high morbidity but low mortality; direct, non-vectored
transmission between animals is inefficient. Viremia takes one to four weeks to develop,
and animals of all ages and both sexes are affected. The recent emergence of LSD in
cattle across Asia has been reported as a serious threat, and a morbidity rate of 7.1%
was recorded among cattle during the first outbreak in India. Clinical manifestations
of the disease include fever, anorexia, nodules on the mucous membranes of the mouth,
nose, genitalia, udder, eyes, and rectum, reduced milk production, infertility, abortion,
and even death. First reported in
Bangladesh in July 2019, the disease has since spread to China, Bhutan, India, Nepal,
Vietnam, and Myanmar in Southeast Asia. This paper presents a comprehensive
overview of LSD outbreaks in Asian countries over the past three years and also sheds
light onto the existing methods of disease detection based on artificial intelligence
techniques.

Keywords Lumpy skin disease · Epidemiology · Outbreak · Transboundary spread · Artificial intelligence

1 Introduction

Lumpy skin disease (LSD) is among the most significant threats to the stockbreeding
industry, causing acute and subacute illness in cattle and buffalo. Cattle of all breeds
and ages are affected, but young animals and nursing cows are the most vulnerable.
The World Organization for Animal

P. S. Kholiya · Kriti (B)
School of Computing, DIT University, Dehradun, India
e-mail: kriti@dituniversity.edu.in
A. K. Mishra
Department of Computer Science and Engineering, Jain University, Bengaluru, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 95
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_9
96 P. S. Kholiya et al.

Health (OIE) has classified this illness as notifiable because of the large economic
losses it causes and its potential for rapid spread. As the disease has spread into
disease-free countries in recent years, controlling and eradicating its transmission
is imperative. The virus is a member of the genus Capripoxvirus and is closely related
to goatpox virus (GTPV) and sheeppox virus (SPPV). The virion is enclosed in a lipid
envelope and carries a double-stranded DNA genome of approximately 150 kilobase
pairs (kbp).
Among the Poxviridae, this virus of domestic ruminants is the most economically
important. The genome and lateral bodies of the virus are contained within its capsid
or nucleocapsid. Extensive DNA cross-hybridization between species of the genus
results in serologic cross-reactions and cross-protection among its members. Even
though Capripoxviruses are generally thought of as host-specific, strains of SPPV
and GTPV are capable of cross-infecting and causing disease in both hosts. The LSD
virus, on the other hand, can infect sheep [1] and goats experimentally, but no
natural infection has been observed in either species.

2 History of Lumpy Skin Disease

In the year 1929, the first clinical evidence of LSD was found in Zambia (formerly
Northern Rhodesia). Initially, it was believed that poisoning or hypersensitivity to
insect bites was responsible for most LSD cases [2]. The disease was recognized as
infectious in Botswana, Zimbabwe, and South Africa between the years 1943 and
1945. Eight million cattle were affected by LSD in South Africa as a panzootic. From
1945 to 1949, the disease continued to cause massive economic losses. Kenya was the
first country in East Africa to identify LSD, in 1957 [3]. Sudan reported the disease
in 1972, followed by West Africa in 1974; by 1983, it had spread to Somalia [3].
According to Davies, the disease has continued to spread throughout most of
the African continent in a series of epidemics. Mauritius, Mozambique, and Senegal
reported LSD cases in 2001. As of today, LSD is widespread throughout most of the
African continent (with the exception of Libya, Algeria, Morocco, and Tunisia). Until
the 1980s (from 1929 to 1984), the disease was not expected to spread beyond this
range [3]. Oman experienced LSD epidemics in 1984 and 2009.
In 2014, LSD was identified as an emerging disease for the first time in Iran. A total
of six cases have been reported in dairy cows. In two villages in the west of the
country, outbreaks have been reported. The outbreak is believed to result from the
illegal movement of animals and normal vectors [4].
There is a possibility of the LSD spreading and invading free neighboring coun-
tries. Among the possible spreading areas of the LSD are north and west of Turkey
into Europe and the Caucasus and eastward into Central and South Asia. Further-
more, Bulgaria and Greece to the west and the Russian Federation to the north are at
risk. How the disease spread from South Africa
to India [5] remains unclear. Still, it may have been spread by livestock crossing
international borders, or vectors from neighboring countries could have spread it.
Several countries bordering India, including China and Bangladesh, have reported
cases of LSD in recent years. Therefore, it is crucial to understand the epidemiology
of exotic diseases to manage these diseases in a timely manner effectively [6–10].

3 Related Studies for Detection of LSD

As discussed earlier, the detection of LSD in animals requires many pathology tests.
However, an accurate diagnosis of the disease depends upon the experience of the
pathologist and is sometimes affected by inter- and intra-observer variations. There is
therefore strong interest among the research community in developing artificial
intelligence (AI)-based techniques that can help in the timely detection of such
diseases in animals [11–15]. Table 1 gives a brief description of the studies carried
out for detecting LSD.
As Table 1 shows, researchers have recently developed a variety of AI-based systems
for detecting LSD. Some studies use meteorological and geographical characteristics
related to disease spread to predict the presence of LSD in animals of a region with
different machine learning classifiers, while others classify images of infected and
healthy animals using different AI techniques. Among the image-based studies, Girma
et al. [14] used a self-designed convolutional neural network (CNN) named LSDNet
out of which support vector machine (SVM) classifier reported the highest accuracy

Table 1 Description of studies carried out for detecting LSD

Rai et al. [13]. Features: Inceptionv3, VGG16, VGG19. Classifiers: SVM, kNN, NN, Naïve Bayes, logistic regression. Accuracy: Inceptionv3-NN, 92.5%
Safavi et al. [11]. Features: meteorological and geospatial. Classifiers: logistic regression, decision tree, SVM, random forest, AdaBoost, bagging, XGBoost, NN. Accuracy: NN, 97.0%
Girma et al. [14]. Features: LSDNet. Classifiers: softmax, random forest, SVM. Accuracy: LSDNet-SVM, 96.0%
Dofadar et al. [15]. Features: meteorological and geospatial. Classifiers: logistic regression, decision tree, SVM, random forest, AdaBoost, bagging, XGBoost, kNN, Gaussian Naïve Bayes, SGD, light gradient boosted machine. Accuracy: light gradient boosted machine, 98.0%

Note: SVM: support vector machine, kNN: k-nearest neighbor, NN: neural network

Table 2 Dataset description


Type of images Number of images
Infected from lumpy disease 324
Normal images of cow 700

of 96.0%. From the table, it has been observed that most of the studies have used
CNN models for feature extraction purposes only. Therefore, the authors have tried
to perform different experiments using CNN models to test the efficacy of transfer
learning on the detection of LSD in cattle.

4 Methodology Adopted

The present work focuses on detecting lumpy skin disease in cattle using deep
learning models trained with a transfer learning approach. Most previous studies
have either used geospatial features or used features extracted from deep CNNs for
classification with machine learning classifiers. No study yet has analyzed how
end-to-end CNN architectures perform in detecting LSD.

4.1 Datasets and Data Preprocessing

The present work uses a publicly available dataset specific to lumpy virus detection.
Preprocessing steps, such as image enhancement, noise reduction, and data augmentation,
have been used to improve the quality and diversity of the data. The dataset used in
this work, the “Lumpy Skin Images Dataset” [16], is described in Table 2.
The various augmentations applied to images in the dataset like flip, rotation,
crop, grayscale, hue, brightness, exposure, and cutout [17] are shown in Fig. 1.
The dataset has been split into training set (75% images, i.e., 766 images), valida-
tion set (15% images, i.e., 153 images), and testing set (10% images, i.e., 100 images).
During augmentation, the dataset was expanded from a total of 1024 images to a total
of 9200 images.
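As a concrete illustration of the augmentation and splitting steps described above, the sketch below implements a few of the listed transforms (flip, rotation, brightness/exposure, cutout) and a 75/15/10 index split in plain NumPy. It is an illustrative reconstruction, not the authors' pipeline, and its rounding of the split fractions differs slightly from the counts reported in the paper.

```python
import numpy as np

def augment(image, rng):
    """Apply a random flip, 90-degree rotation, brightness jitter,
    and an 8 x 8 cutout patch to one H x W x 3 uint8 image."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:                                # horizontal flip
        out = out[:, ::-1, :]
    out = np.rot90(out, k=int(rng.integers(0, 4)))        # random rotation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0, 255)    # brightness/exposure
    y = int(rng.integers(0, out.shape[0] - 8))            # cutout position
    x = int(rng.integers(0, out.shape[1] - 8))
    out[y:y + 8, x:x + 8] = 0                             # cutout patch
    return out.astype(np.uint8)

def split_indices(n, rng, frac=(0.75, 0.15, 0.10)):
    """Shuffle n sample indices into train/validation/test subsets."""
    idx = rng.permutation(n)
    n_train, n_val = int(frac[0] * n), int(frac[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Each original image would be passed through `augment` several times to grow the dataset, as in the 1024-to-9200 expansion reported above.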

4.2 Deep Learning Models Used

The CNNs are primarily used for computer vision tasks. They leverage the concept of
convolution to extract spatial features from images. CNNs consist of convolutional
layers for feature extraction, pooling layers for downsampling, and fully connected
layers for classification or regression [18].

Fig. 1 Augmented images

The evaluation measures used to assess the performance of the deep learning models
for lumpy virus detection include accuracy, sensitivity, specificity, precision, and
F1-score [19]. In the previously reviewed papers, several CNN models were applied to
this dataset, but the accuracy levels were not up to the mark. In this work, a set of
six CNN models, i.e., DenseNet, Inceptionv3, MobileNet, ResNet, VGG, and Xception,
was therefore chosen for experimental analysis.
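For a binary infected-vs-normal task, these evaluation measures follow directly from the confusion-matrix counts; a minimal reference implementation (ours, not the authors' code) is:

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision, and F1-score
    from the confusion-matrix counts of a binary classifier."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)            # recall on the infected class
    specificity = tn / (tn + fp)            # recall on the normal class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}
```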
All the CNN models were trained with TensorFlow, which offers stable training on
large datasets and can exploit GPUs to accelerate learning [20]. Deep learning
architectures are neural network structures designed to perform complex tasks by
automatically learning hierarchical representations from raw data. The models used
are described below.
• VGG: The VGG is a traditional deep CNN architecture with several layers. The
VGG16 model achieves a top-5 accuracy of about 92.7% on ImageNet, a collection
of almost 14 million photographs organized into roughly 1000 categories. It
outperforms AlexNet by substituting stacks of 3 × 3 kernel-sized layers for large
kernel-sized layers. As the name suggests, VGG19 is a deeper variant of VGG16 [21, 22].
• DenseNet: A DenseNet is a type of CNN that employs dense interconnections
between its stages by directly linking all layers (with matching feature map
sizes) within dense blocks. DenseNet was designed largely to counteract the
vanishing gradient that degrades the accuracy of very deep networks: because of
the greater separation between input and output layers, gradient information can
vanish before reaching its destination [22, 23].
• MobileNet: MobileNet is a CNN design that is smaller and faster because it uses
a novel type of convolutional layer called depthwise separable convolution. The
resulting models are highly practical for mobile and embedded devices due to their
small size [24].
• Inceptionv3: An inception network is a type of deep neural network whose
architecture is built from repeating components called inception modules. Within
a module, parallel convolutional layers (for example, 1 × 1, 3 × 3, and 5 × 5
kernels) are applied to the same input, and the output filter banks of each layer
are concatenated into a single output vector that serves as the input to the
following stage [25].
• Xception: Xception is a CNN with 71 layers. Developed by Google researchers, it
builds on the inception model and replaces inception modules with depthwise
separable convolutions. A version pretrained on the more than a million images of
the ImageNet database can be loaded directly [26].
• ResNet: The key innovation in the ResNet architecture is the use of residual
connections, which enable the network to learn a residual mapping between input
and output features. These connections allow the network to bypass some layers
and preserve information from earlier layers, which helps to mitigate the vanishing
gradient problem and improve the training of very deep networks [27].
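The shortcut idea in the ResNet description above can be illustrated with a toy residual block in NumPy, with dense layers standing in for the convolutions of the original architecture; this is a didactic sketch, not part of the present work's implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Identity-shortcut residual block: relu(x + W2 @ relu(W1 @ x)).
    The network only needs to learn the residual F(x) = W2 @ relu(W1 @ x);
    the shortcut carries x forward unchanged."""
    return relu(x + w2 @ relu(w1 @ x))
```

When the residual branch contributes nothing (zero weights), the input passes through unchanged, which is exactly how the shortcut preserves information from earlier layers and eases gradient flow.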

5 Results

The deep learning models were trained for a total of 20 epochs, with a batch size
of 32 and a learning rate of 0.001. The six CNN models were trained separately,
with the goal of finding the highest achievable accuracy. TensorFlow and Keras were
used throughout: ImageDataGenerator handled image preprocessing, and the models
were built with the Keras sequential API. The Adam optimizer was used, as it
provides good accuracy with fast computation and requires few parameters to tune.
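For reference, the Adam update applied at the stated learning rate of 0.001 can be written out explicitly for a single scalar weight; this is the textbook rule (Kingma and Ba), not code from the paper.

```python
import math

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar weight w with gradient g:
    exponential moving averages of the gradient (m) and its square (v),
    bias-corrected, then a scaled step on the weight."""
    m = b1 * m + (1 - b1) * g             # first-moment estimate
    v = b2 * v + (1 - b2) * g * g         # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

In practice the framework applies this element-wise to every weight tensor; the few tunable parameters (lr, b1, b2, eps) are the ones alluded to above.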
The accuracy and loss statistics for training and validation datasets are presented
below for different models.
After training the dataset on DenseNet201, the values of accuracy and loss
obtained for training and validation datasets are plotted in Fig. 2.
After training the dataset on Inceptionv3, the values of accuracy and loss obtained
for training and validation datasets are plotted in Fig. 3.
After training the dataset on Xception, the values of accuracy and loss obtained
for training and validation datasets are plotted in Fig. 4.

Fig. 2 Training and validation accuracy and loss of DenseNet201

Fig. 3 Training and validation accuracy and loss of Inceptionv3

Fig. 4 Training and validation accuracy and loss of Xception

After training the dataset on MobileNet, the values of accuracy and loss obtained
for training and validation datasets are plotted in Fig. 5.
After training the dataset on ResNet50, the values of accuracy and loss obtained
for training and validation datasets are plotted in Fig. 6.

Fig. 5 Training and validation accuracy and loss of MobileNet



Fig. 6 Training and validation accuracy and loss of ResNet50

After training the dataset on VGG19, the values of accuracy and loss obtained for
training and validation datasets are plotted in Fig. 7.
The final values of accuracy and loss obtained for different models on the
validation dataset are shown in Table 3.
As seen from Table 3, MobileNet gives the maximum accuracy of 94.1% for
differentiating between healthy cows and cows affected by the lumpy disease virus.
The study differs from earlier work on LSD detection in two ways: (i) it evaluates
the effect of transfer learning by using different CNN models to classify images of
cattle infected by LSD, and (ii) whereas the study by Safavi et al. [11] relies on
meteorological and geospatial features, the proposed study eliminates the feature
engineering step by using CNN models.

Fig. 7 Training and validation accuracy and loss of VGG19

Table 3 Description of results obtained in the present work

Deep learning model   Accuracy   Loss
DenseNet201           0.9215     0.2310
Inceptionv3           0.8627     0.3536
XceptionNet           0.8543     0.3526
MobileNet             0.9411     0.1719
ResNet50              0.9348     0.2789
VGG19                 0.8861     0.3720

6 Conclusion

The recent spread of LSD into disease-free areas has significant epidemiological
and economic consequences. As such, accurate and timely diagnosis of the disease is
highly recommended in endemic areas in order to prevent further spread. The extent
of the LSD epidemic in India and its afflicted regions is currently not well
understood; it is therefore essential to investigate outbreaks across a vast
geographic area, taking into account all regions of the country.
The current study proposes a method for the detection of lumpy skin disease in
cattle using end-to-end deep learning models, achieving the highest accuracy of
94.1% with MobileNet.
It is also noted that very few studies involving the use of AI techniques for LSD
detection have been carried out in the absence of a standard database. Efforts should
be made to collect the images of cattle affected by LSD and build an open-source
repository accessible for research purposes.

References

1. Tuppurainen ES, Pearson CR, Bachanek-Bankowska K, Knowles NJ, Amareen S, Frost L,
Henstock MR, Lamien CE, Diallo A, Mertens PP (2014) Characterization of sheep pox virus
vaccine for cattle against lumpy skin disease virus. Antiviral Res 109:1–6
2. Chihota CM, Rennie LF, Kitching RP, Mellor PS (2003) Attempted mechanical transmission
of lumpy skin disease virus by biting insects. Med Vet Entomol 17(3):294–300
3. Davies FG (1982) Observations on the epidemiology of lumpy skin disease in Kenya. Epidemiol
Infect 88(1):95–102
4. Tuppurainen ES, Venter EH, Coetzer JAW (2005) The detection of lumpy skin disease virus in
samples of experimentally infected cattle using different diagnostic techniques. Onderstepoort
J Vet Res 72(2):153–164
5. Barnard BJH (1997) Antibodies against some viruses of domestic animals in southern African
wild animal. Onderstepoort J Vet Res 64(2):95–110
6. Beard PM (2016) Lumpy skin disease: a direct threat to Europe. Vet Rec 178(22):557–558
7. Munz EK, Owen NC (1966) Electron microscopic studies on lumpy skin disease virus type
‘Neethling.’ Onderstepoort J Vet Res 33(1):3–8
8. Westwood JCN, Harris WJ, Zwartouw HT, Titmuss DHJ, Appleyard G (1964) Studies on the
structure of vaccinia virus. Microbiology 34(1):67–78
9. Abdulqa HY, Rahman HS, Dyary HO, Othman HH (2016) Lumpy skin disease. Reprod Immun
1(4):25–30
10. Zeynalova SK (2021) Review of lumpy skin disease and its epidemiological characterization
in Azerbaijan. Res Agric Vet Sci 5(1):36–40
11. Safavi EA (2022) Assessing machine learning techniques in forecasting lumpy skin disease
occurrence based on meteorological and geospatial features. Trop Anim Health Prod 54(1):1–11
12. Rony M, Barai D, Hasan Z (2021) Cattle external disease classification using deep learning
techniques. In: 12th International conference on computing communication and networking
technologies (ICCCNT). IEEE, Kharagpur, India, pp 1–7
13. Rai G, Naveen, Hussain A, Kumar A, Ansari A, Khanduja N (2021) A deep learning approach
to detect lumpy skin disease in cows. In: Pandian A, Fernando X, Islam SMS (eds) Computer
networks, big data and IoT, vol 66. Springer, Singapore, pp 369–377

14. Girma E, Ababa A (2021) Identify animal lumpy skin disease using image processing and
machine learning. M.Sc. dissertation, St. Mary’s University, Ethiopia
15. Dofadar DF, Abdullah HM, Khan RH, Rahman R, Ahmed MS (2022) A comparative analysis
of lumpy skin disease prediction through machine learning approaches. In: IEEE Conference
on artificial intelligence in engineering and technology. IEEE, Malaysia, pp 1–4
16. Kumar S, Shastri S (2022) Lumpy skin images dataset. Mendeley Data, V1. https://doi.org/10.17632/w36hpf86j2.1
17. Shijie J, Ping W, Peiyi J, Siping H (2017) Research on data augmentation for image classification
based on convolution neural networks. In: 2017 Chinese Automation Congress (CAC). IEEE,
China, pp 4165–4170
18. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E (2018) Deep learning for computer
vision: a brief review. Comput Intell Neurosci 2018:7068349. https://doi.org/10.1155/2018/7068349
19. Muñoz IC, Hernández AM, Mañanas MÁ (2019) Estimation of work of breathing from respiratory
muscle activity in spontaneous ventilation: a pilot study. Appl Sci 9(10):2007. https://doi.org/10.3390/app9102007
20. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
21. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556
22. Malik H, Anees T, Naeem A, Naqvi RA, Loh WK (2023) Blockchain-federated and deep
learning-based ensembling of capsule network with incremental extreme learning machines
for classification of COVID-19 using CT scans. Bioengineering 10(2):203. https://doi.org/10.3390/bioengineering10020203
23. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional
networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition.
IEEE, pp 4700–4708
24. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam
H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications.
arXiv preprint arXiv:1704.04861
25. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception archi-
tecture for computer vision. In: Proceedings of the IEEE Conference on computer vision and
pattern recognition. IEEE, pp 2818–2826
26. Chollet F (2016) Xception: deep learning with depthwise separable convolutions. In: Proceed-
ings of the IEEE Conference on computer vision and pattern recognition, pp 1251–1258
27. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. In: Proceed-
ings of the IEEE conference on computer vision and pattern recognition. IEEE, Las Vegas, USA,
pp 770–778
Two-Factor Authentication Using
QR Code and OTP

Avanish Gupta, Akhilesh Singh, Anurag Tripathi, and Swati Sharma

Abstract Two-factor authentication (2FA) has become a widely accepted security
measure for online transactions, especially for website login. In this paper, we present
a study on the QR and OTP-based 2FA systems for website login. These methods
require the user to provide two pieces of evidence to confirm their identity, enhanc-
ing security by adding an additional layer of protection to the login process. We
discuss the advantages and disadvantages of using QR codes and OTPs for 2FA,
including ease of use, security, and implementation requirements. Furthermore, we
provide a step-by-step guide to implementing QR and OTP-based 2FA for website
login. The study highlights the significance of 2FA in securing online transactions
and emphasizes the need to implement it to provide an additional layer of secu-
rity to website login. Overall, this paper provides valuable insights into the QR and
OTP-based 2FA systems for website login and their importance in enhancing online
security.

Keywords Two-factor authentication · QR code · OTP

1 Introduction

As online transactions become more prevalent, safeguarding user accounts against
unauthorized access has become increasingly crucial. One of the most commonly
utilized security measures is two-factor authentication (2FA), which requires users
to provide two pieces of evidence to verify their identity. This additional layer of
security helps thwart unauthorized access to user accounts, even if passwords are
compromised. Two popular methods of implementing 2FA for website login are QR
and OTP-based 2FA systems. This research paper delves into the effectiveness of
QR and OTP-based 2FA systems for website login and offers a comprehensive guide
on implementing them, step-by-step.

A. Gupta (B) · A. Singh · A. Tripathi · S. Sharma
KIET Group of Institutions, Delhi, India
e-mail: avanishgupta606@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 105
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_10
106 A. Gupta et al.


2 Literature Survey

The paper titled “Two Factor Authentication Framework Using OTP-SMS Based on
Blockchain” proposes a new framework for two-factor authentication (2FA) using the
OTP-SMS method and blockchain technology to enhance security [1]. The current
2FA methods, particularly OTP-SMS, have vulnerabilities that attackers exploit. The
proposed framework generates an encrypted OTP using smart contracts and sends its
hashed value for authentication. The framework is compared with other blockchain-
based frameworks and found to be more secure and efficient.
Authentication is essential to confirm a user’s identity and grant access. 2FA adds
an extra layer of security to the authentication process by using two factors instead of
one [2]. OTP-SMS is a common method in 2FA, but it can be attacked by intercepting
the OTP [3]. The proposed framework addresses this issue by leveraging blockchain
technology. Blockchain ensures secure transmission of the OTP by encrypting it with
the user’s public key and sending the hashed value.
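The hash-based verification step described here can be sketched with Python's standard library alone; the blockchain and SMS transport are out of scope, and the function names below are ours rather than the surveyed paper's.

```python
import hashlib
import hmac
import secrets

def issue_otp(digits=6):
    """Server side: draw a random OTP and retain only its SHA-256 hash,
    so a value leaked from storage cannot be replayed directly."""
    otp = f"{secrets.randbelow(10 ** digits):0{digits}d}"
    stored_hash = hashlib.sha256(otp.encode()).hexdigest()
    return otp, stored_hash   # the OTP itself is delivered out of band

def verify_otp(submitted, stored_hash):
    """Hash the submitted code and compare in constant time."""
    return hmac.compare_digest(
        hashlib.sha256(submitted.encode()).hexdigest(), stored_hash)
```

Comparing hashes with `hmac.compare_digest` avoids timing side channels during verification.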
The paper discusses common attacks on 2FA, such as Man in the Middle (MITM),
session hijacking, and third-party attacks. MITM involves an attacker intercepting
communication between the user and the website, potentially gaining access to sen-
sitive information. Session hijacking exploits stolen tokens to gain unauthorized
access. Third-party attacks exploit vulnerabilities in the generation and verification
of OTP tokens.
Two-Factor Authentication Using QR Code and OTP 107

Blockchain provides a decentralized and tamper-resistant solution to these attacks.
It ensures secure communication between the user and the website, prevents data
alteration by attackers, and eliminates reliance on third-party authentication mecha-
nisms. The use of smart contracts and encryption further enhances security.
The advantages of blockchain include transparency, security, efficiency, and
resilience. It provides a transparent and secure method for sharing resources among
network participants, eliminates single points of failure, and improves efficiency in
data management.
In conclusion, the proposed framework enhances the security of OTP-SMS-based
2FA using blockchain technology. It addresses vulnerabilities and attacks commonly
associated with 2FA. Blockchain provides a secure and decentralized platform for
authentication, ensuring the confidentiality and integrity of user data.
The paper titled “2CAuth: A New Two Factor Authentication Scheme Using
QR-Code” presents a web authentication system called 2CAuth, which combines
ownership factors (smartcard, mobile) with knowledge factors (OTP) for improved
security [4]. The scheme utilizes a camera-equipped mobile phone for authentication
and employs a smart card and optical challenge-response solution. The proposed
system aims to be efficient, robust, scalable, and user-friendly.
The motivation behind this work is to develop a two-factor authentication (2FA)
scheme that not only enhances security but also protects user privacy. The main
contribution of the paper is the 2CAuth scheme, which offers several advantages.
Firstly, the smartcard used in the scheme does not store any secret code. Secondly, the
scheme requires the user to possess a smartcard, a secret PIN, and a registered mobile
phone, eliminating the need for synchronization with mobile network operators.
Thirdly, the server authenticating the user does not store any user credentials. Finally,
the scheme maintains usability even during peak load on the mobile network.
The registration phase involves the user providing their ID, chosen passwords, and
mobile number. The scheme generates an RSA key pair, a smartcard identifier, and
user-specific secret information. The authentication procedure includes verifying the
possession of the smartcard and mobile phone using random numbers and encryption.
The analysis of the protocol focuses on the feasibility, correctness, and security
evaluation of the proposed scheme. It discusses the scheme’s resistance to password
guessing, impersonation, replay attacks, DDoS attacks, and the role of mobile and
smart card holders in the security context.
In conclusion, the paper proposes the 2CAuth scheme as a practical and secure 2FA
solution that leverages mobile phones and smartcards. It suggests further research to
develop a comprehensive three-factor authentication scheme for protecting sensitive
data.
The research paper titled "Development of Two-factor Authentication Login Sys-
tem Using Dynamic Password with SMS Verification" introduces a novel approach
to strengthen user login systems [5]. The paper addresses concerns regarding unau-
thorized access by proposing a two-factor authentication (2FA) login system that
combines a dynamic password with SMS verification. The system operates as fol-
lows: users enter their username and password as the first authentication factor. Upon
successful entry, a time-sensitive, one-time dynamic password (OTP) is generated
108 A. Gupta et al.

and sent to the user’s mobile number via SMS [8]. The user then enters the OTP as
the second authentication factor, and access is granted only if it matches the gener-
ated OTP. The paper evaluates the proposed system’s effectiveness and performance,
comparing it with traditional password-based systems. The results demonstrate that
the 2FA login system significantly enhances security by requiring both the correct
username-password combination and access to the user’s mobile device. User feed-
back also reveals positive acceptance of the system, appreciating the heightened
security without significant inconvenience during the login process. In conclusion,
the research paper presents a comprehensive study on the development of a 2FA login
system using dynamic password with SMS verification. The system offers improved
security compared to traditional password-based systems, mitigating risks associated
with password theft. The positive user experience further supports the feasibility and
effectiveness of the proposed system. This research contributes valuable insights for
organizations aiming to strengthen the security of their user login processes.
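The time-sensitive OTP flow can be sketched as follows; the validity window and function names are assumptions for illustration, not values taken from the paper.

```python
import secrets
import time

OTP_VALIDITY_SECONDS = 120  # assumed expiry window; the paper's exact value may differ

def generate_otp(digits=6):
    """Generate a random numeric OTP and record its issue time."""
    otp = "".join(secrets.choice("0123456789") for _ in range(digits))
    return otp, time.time()

def verify_otp(submitted, issued_otp, issued_at):
    """Accept only if the code matches and the validity window has not lapsed."""
    if time.time() - issued_at > OTP_VALIDITY_SECONDS:
        return False
    return secrets.compare_digest(submitted, issued_otp)

otp, ts = generate_otp()
# At this point the server would hand `otp` to an SMS gateway (not shown).
assert verify_otp(otp, otp, ts)
```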
The paper titled “A Novel User Authentication Scheme Based on QR-Code”
presents a QR code-based one-time password authentication scheme for secure com-
munication and resource sharing over insecure networks [7]. The goal is to provide a
simple and efficient authentication mechanism that eliminates the need for hardware
devices and reduces maintenance costs.
The scheme involves two parties: the service provider (SP) and remote users who
request services from the SP with authorized access rights. Each user possesses a
mobile phone with an embedded camera, allowing them to capture and decode QR
code images.
The scheme consists of two phases: Registration and Verification. In the Registra-
tion phase, a user (User A) who wishes to join the system sends their identity (IDA)
to the SP. The SP then sends a long-term secret key (xA) to User A’s mobile device
through a secure channel.
In the Verification phase, User A sends their IDA and a time stamp (T1) to the SP. The SP responds by sending an encoded QR code (EQR(a)), a hashed value h(r, T1, T2), and another time stamp (T2) to User A. User A checks the correctness of T2, and the SP later checks the freshness of User A's reply time stamp (T3, not explicitly detailed in the paper).
Together, the two phases first establish the shared long-term secret and then confirm the authenticity of User A's identity: if all of the time-stamp and hash checks pass, User A is successfully authenticated and granted access rights.
The proposed scheme leverages the widespread use of QR codes and mobile
phones to provide a practical and convenient authentication solution. It eliminates
the need for separate hardware tokens and reduces the risk of tampering and main-
tenance costs [9]. The feasibility of the scheme is evaluated, and security analyses
are conducted to address potential risks. Overall, the scheme offers an efficient and
secure approach to user authentication.
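The verification exchange can be illustrated with a rough sketch; the hash construction and the freshness bound here are assumptions, since the paper does not fix them precisely.

```python
import hashlib
import secrets
import time

MAX_CLOCK_SKEW = 30  # seconds; an assumed freshness bound, not from the paper

def h(r: bytes, t1: float, t2: float) -> str:
    """One-way hash binding the random value r to both time stamps,
    mirroring the h(r, T1, T2) value the SP returns."""
    msg = r + repr(t1).encode() + repr(t2).encode()
    return hashlib.sha256(msg).hexdigest()

# User A -> SP: identity plus time stamp T1
t1 = time.time()
# SP side: pick a random r, stamp T2, send back the hash
# (along with the encoded QR code, not modeled here)
r = secrets.token_bytes(16)
t2 = time.time()
digest = h(r, t1, t2)

# User A side: reject stale responses before trusting the QR payload
assert abs(time.time() - t2) <= MAX_CLOCK_SKEW
assert digest == h(r, t1, t2)
```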
Two-Factor Authentication Using QR Code and OTP 109

The paper titled “Online Banking Authentication System using Mobile-OTP with
QR-code” proposes a new authentication system for online banking to enhance secu-
rity and convenience [6]. The existing Internet banking system is vulnerable to hack-
ing and personal information leakage. The proposed system utilizes a combination
of Mobile one-time password (OTP) and QR code, a variant of the 2D barcode, to
provide secure user confirmation.
Traditional banks advertise online security guarantees, but the fine print often
includes certain security requirements. With the increasing use of online banking,
financial institutions are reluctant to reimburse users who fall victim to scams. OTP, a
password system that can only be used once, is introduced as a countermeasure. The
proposed system uses Mobile OTP, offering the security of traditional OTP devices
with the convenience of mobile features.
The system replaces the use of security cards with QR codes, which can be scanned
by users’ mobile phones. The QR code contains transfer information and the user’s
mobile device serial number. Users generate OTP codes on their mobile phones
based on this information and enter them to complete the transfer process. The
system assumes secure communication, shared hashed serial numbers, and the use
of authorized certificates.
By replacing security cards with QR codes and leveraging Mobile OTP, the pro-
posed system aims to provide greater security and convenience for online banking
transactions. The paper provides an overview of OTP and QR code technologies and
outlines the authentication process of the proposed system.
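The derivation of an OTP from the phone's hashed serial number and the QR-carried transfer information might be sketched as below; the hash choice and input encoding are illustrative assumptions, not the paper's exact construction.

```python
import hashlib

def mobile_otp(hashed_serial: str, transfer_info: str, digits: int = 6) -> str:
    """Derive a transaction-bound OTP from the phone's hashed serial
    number and the transfer details carried in the QR code. Because the
    payee and amount are part of the input, a code computed for one
    transaction is useless for any other."""
    digest = hashlib.sha256((hashed_serial + transfer_info).encode()).hexdigest()
    return str(int(digest, 16) % 10**digits).zfill(digits)

code = mobile_otp("demo-serial-hash", "payee=ACME;amount=120.00")
assert len(code) == 6 and code.isdigit()
```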
The paper titled “Multi-factor Authentication: A Survey” discusses the challenges
and potential sources of multi-factor authentication (MFA) in the context of mobile
services [10]. MFA is crucial for establishing secure access rights and validating
user identity. The authors explore various MFA sources, including password pro-
tection, token presence, voice biometrics, facial recognition, ocular-based methods,
hand geometry, vein recognition, fingerprint scanners, thermal image recognition,
geographical location, behavior detection, beam-forming techniques, occupant clas-
sification systems, electrocardiographic recognition, electroencephalographic recog-
nition, and DNA recognition. The paper highlights the importance of usability,
integration, security and privacy, and robustness to the operating environment as
key challenges in implementing MFA systems. Usability challenges involve task
efficiency, task effectiveness, and user preference. Integration challenges include
multi-biometrics, vendor dependency, and the trustworthiness of third-party service
providers. Security and privacy challenges arise from vulnerabilities in the system
components and data transmission. Robustness to the operating environment consid-
ers failure rates and the impact of environmental noise. The authors propose a new
authentication scheme based on vehicle-to-everything (V2X) communication, lever-
aging the sensors available in modern vehicles. They discuss a conventional approach
using Lagrange polynomials for secret sharing and propose a reversed methodology
that adds a unique factor of time for robustness. The proposed MFA solution for
V2X applications allows for automated access based on the presence of factors and
identifies outdated factor information. Overall, the paper provides an overview of
MFA challenges, potential sources, and a proposed solution for V2X applications.
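The Lagrange-polynomial secret sharing the authors build on is standard Shamir sharing; a minimal sketch over a prime field (the field and parameters are chosen purely for illustration):

```python
import random

P = 2**61 - 1  # a Mersenne prime; the field choice is illustrative

def split(secret: int, n: int, k: int):
    """Shamir sharing: evaluate a random degree-(k-1) polynomial with
    f(0) = secret at n distinct nonzero points."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):       # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers f(0), i.e. the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, P-2, P) is the modular inverse, since P is prime
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, n=5, k=3)
assert reconstruct(shares[:3]) == 123456789
```

Any k of the n shares suffice, and fewer than k reveal nothing about the secret, which is what makes the construction attractive for combining authentication factors.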
The paper titled “Generation of Secure One-Time Password Based on Image Authentication” discusses the issue of phishing, an email fraud aimed at obtaining
personal and financial information from users [11]. It proposes a solution that com-
bines image-based authentication and HMAC-based one-time passwords to enhance
security and prevent password theft. The image-based authentication method requires
users to select categories of images during registration. When logging in, a grid of ran-
domly generated images is presented, and the user identifies the pre-selected images
to gain access. This approach adds an extra layer of security to the authentication
process. The HMAC-based one-time password generation involves using a shared
secret key between the client and server. A time value is used as a changing factor,
and an HMAC-SHA-1 algorithm is applied to generate a password. The password
is then truncated to a shorter length for user convenience. The proposed solution
aims to provide a high level of security and eliminate the need for traditional text
passwords. By combining image-based authentication and one-time passwords, the
system becomes more resistant to phishing attacks and unauthorized access. Over-
all, the paper presents a comprehensive approach to enhance authentication security,
making it more difficult for attackers to compromise user accounts.
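The HMAC-based one-time password described here follows the standard HOTP/TOTP construction (RFC 4226/6238); a compact sketch, where the 30-second step and 6-digit length are the common defaults rather than values taken from the paper:

```python
import hmac
import hashlib
import struct

def totp(secret: bytes, t: float, step: int = 30, digits: int = 6) -> str:
    """The time value t is quantized into step-second counters, MACed
    with HMAC-SHA-1, then dynamically truncated to a short numeric code."""
    counter = struct.pack(">Q", int(t // step))   # 8-byte big-endian counter
    mac = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                        # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

# Client and server share the secret and the clock, so within one
# 30-second window both sides compute the same short code.
key = b"shared-secret-key"
assert totp(key, 0.0) == totp(key, 29.0)
```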

3 Tech Stack

3.1 Android Studio

Android Studio is a popular integrated development environment (IDE) used to develop Android applications. It is developed by Google and provides a user-friendly
interface, code editor, debugging tools, and built-in Android emulator. Android Stu-
dio is based on IntelliJ IDEA and is compatible with Java, Kotlin, and C++ program-
ming languages. Its features include code completion, refactoring, code analysis, and
version control integration.

3.2 PyCharm

PyCharm is a popular integrated development environment (IDE) used for Python programming. It offers advanced features such as code analysis, debugging tools, and
an intuitive user interface. PyCharm also supports multiple frameworks and libraries,
making it a versatile tool for developing Python applications.

3.3 Figma

Figma is a web-based design tool that allows users to create, collaborate, and share
interface designs for websites and mobile applications. It offers a range of features,
such as real-time collaboration, vector networks, and interactive prototyping, that make it a popular choice for designers and design teams. Figma also integrates with
other design and development tools, making it a versatile platform for designing and
building digital products.

3.4 Firebase

Firebase is a mobile and web application development platform that provides a suite
of tools and services to help developers build high-quality apps. It offers a real-time
database, cloud storage, hosting, and authentication services, among others. Firebase
provides developers with a scalable and reliable infrastructure to build apps quickly
and easily, with features like push notifications, analytics, and remote configuration.
It is widely used by developers to create mobile and web applications across different
platforms.

4 Proposed Methodology

Our application’s architecture is divided into two parts: a website and an Android
app. The website is built using the Django framework, while the Android app is
developed using Java. When a new user visits our website for the first time, they are
required to register by entering their email address, username, and password. This
registration process utilizes the authentication feature of Firebase, where the website
must be listed on the Firebase console to access its functionalities. Upon registration,
a unique ID is created for the user inside Firebase’s user database.
Once registered, the user can log in to the website using the email and password
they used during registration. If the user is present in Firebase’s user database, they
are successfully logged into the website. After the login process, a Python script
generates a QR code that contains the user’s unique ID. The QR code is displayed
on the website along with a space to enter the one-time password (OTP).
The Android app plays a crucial role in ensuring the security of the OTP generation
process. However, before generating the OTP, the user must log in to the app using
their existing credentials from the website registration. It is not possible to register
using the Android app. Once logged in, the user can access the QR code scanner on
the app. Upon scanning the QR code, the user’s unique ID is obtained and compared
against the ID of the user who is currently logged in on the Android app. If the IDs
match, the OTP is generated.
Upon generating the OTP, the Android app triggers a Python script to send the
OTP to the user’s registered email address. The user then enters the OTP received in
their email into the designated space on the website. Once the correct OTP is entered,
the user is successfully logged into the website.
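The OTP-delivery step the Android app triggers could be sketched as below; the addresses, subject line, and OTP length are placeholders, and the actual SMTP dispatch is left out.

```python
import secrets
from email.message import EmailMessage

def build_otp_mail(user_email: str, otp: str) -> EmailMessage:
    """Compose the OTP mail sent after a successful QR scan.
    The From address and subject are illustrative, not the authors' setup."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"
    msg["To"] = user_email
    msg["Subject"] = "Your one-time password"
    msg.set_content(f"Your OTP is {otp}. It is valid for one login only.")
    return msg

otp = "".join(secrets.choice("0123456789") for _ in range(6))
mail = build_otp_mail("user@example.com", otp)
# smtplib.SMTP("smtp.example.com").send_message(mail)  # actual dispatch, omitted
assert mail["To"] == "user@example.com" and otp in mail.get_content()
```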
Fig. 1 Application workflow

Figure 1 demonstrates the above-discussed architecture using a flowchart. It depicts the workflow between the website and the Android application.

5 Limitations

While the architecture outlined above may seem functional and secure, there are
several limitations to consider.

5.1 Dependence on Firebase

The system relies heavily on Firebase for authentication, and any disruption in the
Firebase services could cause significant disruptions in the login and registration
process.
5.2 Security Risks

While OTP-based authentication is a relatively secure method, it is not foolproof. There is still a risk of someone intercepting the OTP or gaining unauthorized access
to the user’s email account [12].

5.3 Single Point of Failure

The system’s reliance on a single OTP generator script makes it a single point of
failure. If this script malfunctions or is compromised, it could affect the entire authen-
tication process.

5.4 Limited Functionality

The system’s current design restricts the user from registering using the mobile app.
This limitation might create inconvenience for some users and might result in losing
some potential users.

5.5 Platform-Specific Code

The use of Java in the Android app and Python in the website may cause difficulties
in maintaining the code and adding new features in the future. It might also create
issues with scalability in the long run.
Therefore, it is crucial to address these limitations and consider alternative approaches for enhancing the security and reliability of the system. Despite these limitations, our proposed approach provides a promising step toward practical two-factor authentication, which can help protect user accounts against password theft and unauthorized access.

6 Conclusion

In conclusion, the above model presents a simple and functional architecture for user
authentication and registration for a website and an Android app. It utilizes Firebase
for authentication and OTP-based security measures to ensure secure access to user
accounts. However, there are several limitations to the current model, such as depen-
dence on a single point of failure, potential security risks, and limited functionality.
It is essential to consider these limitations and address them accordingly to enhance the system’s reliability, security, and user experience. Adopting multi-factor authentication, ensuring robust security measures, and avoiding platform-specific code can
help to overcome these limitations and make the system more secure and scalable in
the long run.

References

1. Alharbi E, Alghazzawi D (2019) Two factor authentication framework using OTP-SMS based
on blockchain. Trans Mach Learn Artif Intell 7(3):17–27
2. De Cristofaro E, Du H, Freudiger J, Norcie G (2013) A comparative usability study of two-
factor authentication. arXiv preprint arXiv:1309.5344
3. DeFigueiredo D (2011) The case for mobile two-factor authentication. IEEE Sec Privacy
9(5):81–85
4. Harini N, Padmanabhan T et al (2013) 2CAuth: a new two-factor authentication scheme using QR-code. Int J Eng Technol 5(2):1087–1094
5. Iyanda AR, Fasasi ME (2022) Development of two-factor authentication login system using
dynamic password with SMS verification. Int J Educ Manag Eng 12(3):13
6. Lee YS, Kim NH, Lim H, Jo H, Lee HJ (2010) Online banking authentication system using
mobile-OTP with QR-code. In: Proceedings of the 5th international conference on computer
sciences and convergence information technology. IEEE, pp 644–648
7. Liao KC, Lee WH (2010) A novel user authentication scheme based on QR-code. J Netw
5(8):937
8. Liao KC, Lee WH, Sung MH, Lin TC (2009) A one-time password scheme with QR-code
based on mobile phone. In: Proceedings of the 2009 fifth international joint conference on
INC, IMS and IDC, pp 2069–2071
9. Nwankwo C, Adigwe W, Nwankwo W, Kizito AE, Konyeha S, Uwadia F (2022) An improved
password-authentication model for access control in connected systems. In: Proceedings of the
2022 5th information technology for education and development (ITED). IEEE, pp 1–8
10. Ometov A, Bezzateev S, Mäkitalo N, Andreev S, Mikkonen T, Koucheryavy Y (2018) Multi-
factor authentication: a survey. Cryptography 2(1):1
11. Parmar H, Nainan N, Thaseen S (2012) Generation of secure one-time password based on
image authentication. J Comput Sci Inform Technol 7:195–206
12. Wang D, Wang P (2016) Two birds with one stone: two-factor authentication with security
beyond conventional bound. IEEE Trans Depend Sec Comput 15(4):708–722
bSafe: A Framework for Hazardous
Situation Monitoring in Industries

Nikhil Kashyap, Nikhil Rajora, and Shanu Sharma

Abstract Industrial accidents such as chemical spills, fires, and explosions can have serious consequences, including injuries and fatalities, environmental damage,
and economic losses. To prevent industrial accidents, it is important for industries to
implement safety measures such as proper training for employees, regular mainte-
nance of equipment, etc. Furthermore, the installation of emergency response systems is essential to minimize the impact of any accidents that do occur. In
this paper, an IoT-based hazardous situation monitoring framework is presented to
monitor and detect toxic releases from chemical companies. The proposed system
actively records, processes, and analyses the ambient temperature in areas where
molten metal is handled and manufacturing is done. Additionally, it keeps an eye out
for two dangerous gases i.e., LPG and natural gas. Every time a parameter is breached,
a set of predetermined lists of users are immediately alerted, and the system contin-
uously collects the data for further suggestions to modify the industry’s safety rules.
The sensors used in this prototype model can be altered as needed to accommodate
industrial requirements.

Keywords IoT · Temperature · Toxic gases · Sensor · Hazardous

N. Kashyap · N. Rajora · S. Sharma (B)


Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, India
e-mail: shanu.sharma16@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 115
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_11
116 N. Kashyap et al.

1 Introduction

Industrial risks are becoming an increasingly serious concern to both people and
the environment. Industrial accidents can be initiated by a range of causes such as
human error, equipment failure, inadequate training, and unsafe working conditions
[1]. To prevent industrial accidents, it is important for companies to implement safety
measures such as proper training for employees, regular maintenance of equipment,
and strict adherence to safety regulations [2]. Furthermore, the installation of emergency response systems is essential to lessen the impact of any accidents that do occur. Additionally, workers must be aware of the different types of hazards present in their workplaces and follow safety protocols and procedures.
This includes wearing protective equipment, following proper handling procedures
for hazardous materials, and being aware of emergency exits and evacuation proce-
dures [3]. By taking these precautions, individuals can help to reduce the risk of
industrial accidents and protect themselves, their colleagues, and the surrounding
environment.
Internet of Things (IoT) based solutions to monitor and detect toxic releases from
chemical companies can be an effective way to prevent or mitigate the impact of
these kinds of disasters [1, 3]. By continuously collecting and analyzing data from
sensors and other devices, it is possible to identify potential hazards and take appro-
priate action to prevent or minimize the release of toxic substances [3]. A frame-
work that utilizes IoT technology to monitor and report on toxic releases could be
designed to include a variety of sensors and devices that are capable of detecting
and assessing the presence and concentration of hazardous materials [4, 5]. These
devices could be connected to a central network and configured to transmit data in
real-time to a database, where it could be analyzed and used to identify potential
hazards and alert relevant authorities [4–6]. In this paper, an IoT-based monitoring
platform tailored specifically to the needs of the mining, refining, and manufac-
turing industries is presented in which the temperature of the environment is actively
recorded, processed, and analyzed by the system. Temperature is a key safety factor in
locations where molten metal is treated, manufacturing is done, or welds are formed.
It also monitors the levels of dangerous gases (LPG/natural gas) present in the air. Whenever a parameter is violated, the system issues an instant alert to a pre-determined group of users on their cellphones, and it continues to gather data for further analysis in order to suggest modifications to the industry’s safety standards.
The work presented in this paper is structured as: The background study and some
of the related systems are discussed in Sect. 2. The design of the proposed prototype
is presented in Sect. 3. The various results obtained during the development of the
prototype are discussed in Sect. 4. The work is concluded in Sect. 5.
2 Background

Industrial hazards refer to the potential dangers that can occur in a workplace or
during the production of goods or services. These hazards can come in many forms,
including physical, chemical, biological, and ergonomic hazards. Some examples of
common industrial hazards include [1]:
• Physical hazards: These include physical injury or harm from equipment,
machinery, or other objects in the workplace.
• Chemical hazards: These include harm from exposure to chemicals in the
workplace such as chemical spills, toxic fumes, etc.
• Biological hazards: These refer to harm from exposure to biological agents in the
workplace, such as bacteria, viruses, or fungi.
• Ergonomic hazards: These include harm from repetitive movements or poor
posture while working such as carpal tunnel syndrome, back strain, or other
musculoskeletal disorders.
It is important for employers to identify and mitigate these hazards in order to
ensure the safety and health of their employees. This can involve things like training
employees on safety procedures, implementing safety protocols and procedures etc.
[7, 8].
The Internet of Things (IoT) is a network of physical items, including machines,
cars, buildings, and other things, that have connectivity, software, and sensors to
collect and exchange data [9]. By enabling new types of automation, efficiency, and
convenience, the IoT has the potential to completely transform several industries [10–
12]. Some common examples of IoT applications include smart homes, which allow
homeowners to control and monitor their home’s security, lighting, and temperature
remotely through a smartphone or other device [3, 8, 10, 11]. IoT might dramatically
increase safety and lower the likelihood of industrial disasters by enabling real-time
monitoring and control of equipment and processes, and by providing early warning
of potential hazards. One way that the IoT can improve safety in industrial settings
is by enabling the remote monitoring and control of equipment and processes [13].
They can be used to monitor environmental conditions in industrial settings, such as
temperature, humidity, and air quality [2, 4]. Sensors and other IoT-enabled devices
can be used to gather data on the performance and condition of equipment, alerting
operators to potential problems or malfunctions that could lead to accidents or other
hazards [14–16]. In addition to monitoring equipment and environmental conditions,
the IoT can be used to track a worker’s whereabouts and mobility in real-time [17–
19]. This can be particularly useful in industries with hazardous working conditions,
such as construction, mining, and oil and gas exploration, where workers may be
exposed to risks such as falls, machinery accidents, and other hazards. By tracking
worker movements, supervisors can quickly respond to potential accidents or other
emergencies, and can also use the data to identify and address potential safety issues
[20].
Motivated by the popularity of IIoT [10], this paper presents the use of IoT to
actively monitor and analyze various factors in the typical heavy industrial zone like
temperature and levels of gases in the environment. If the above parameters exceed
the recommended safe values, the system can track the same and issue alerts. Also,
the data generated in real time can provide information about how smoothly the work
is going on in different zones.

3 System Requirements

The proposed system is built using open-source hardware and software platforms,
which aligns with current software development trends and the principles of Industry
4.0. This means that the system is designed to be flexible and adaptable, allowing it to
be customized and modified as needed to meet the specific needs of different users.
Various hardware and software components to develop the system are explained
below.

3.1 Hardware Requirements

Different hardware used for the development of proposed prototype are listed in
Table 1 and explained below.

Table 1 Hardware requirements (product images omitted)
• Arduino
• Temperature sensor
• Gas sensor
• Humidity sensor
• Power supply
Microcontroller ATmega328 serves as the foundation for the Arduino UNO. It is a component of the Arduino platform, an open-source platform for developing microelectronics projects. The Arduino UNO contains a 16 MHz crystal oscillator, 6
analogue inputs, 14 digital input/output pins, a USB port, a power jack, an ICSP
header, and a reset button. It is programmed with the use of the user-friendly soft-
ware tool known as the Arduino Integrated Development Environment (IDE), which
enables users to write and upload code to the Arduino UNO. In comparison to other
microcontroller boards on the market, it is widely offered from online retailers and
electronics stores for a reasonable price. Overall, the Arduino UNO is a popular and
versatile choice for a wide range of electronic projects.
The temperature sensor LM35 is popular and utilized in many different applications. It is a linear temperature sensor: its output voltage rises by 10 mV per degree Celsius. The LM35 has a temperature range of −55 to 150 °C
with a room-temperature precision of 1 °C. It is relatively easy to use, as it requires
only a single supply voltage and has a simple linear output. The LM35 can be used
to measure temperature in a variety of environments, including air, water, and other
liquids.
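Converting the LM35's analog output to a temperature is a simple linear mapping (10 mV/°C); a sketch assuming a stock Arduino UNO ADC (10-bit resolution, 5 V reference), with the firmware logic expressed in Python for illustration:

```python
def lm35_celsius(adc_reading: int, vref_mv: float = 5000.0, adc_max: int = 1023) -> float:
    """Convert a 10-bit ADC reading of the LM35 output to degrees Celsius.
    The LM35 outputs 10 mV per degree, so first recover millivolts from
    the ADC count, then divide by 10. Vref and resolution match a stock
    Arduino UNO; adjust both for other boards."""
    millivolts = adc_reading * vref_mv / adc_max
    return millivolts / 10.0

# 205 counts at a 5 V reference is roughly 1002 mV, i.e. about 100 °C
assert abs(lm35_celsius(205) - 100.2) < 0.1
```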
One sort of sensor used to identify the presence of specific gases in the atmosphere
is the MQ-6 gas sensor. It is commonly used to detect the presence of flammable
gases, such as propane, butane, and methane, as well as other gases such as carbon
monoxide and hydrogen. The MQ-6 gas sensor works by using a chemical reaction to
detect the presence of gases. When a gas comes into contact with the sensor, it reacts
with the chemicals on the surface of the sensor, causing a change in the electrical
resistance of the sensor which is used to determine the concentration of the gas in
the air.
A common temperature and humidity sensor is the DHT11 sensor. It is a digital
sensor that can be easily connected to a microcontroller or computer through a digital
input/output (I/O) pin. The DHT11 sensor consists of a humidity sensing element
and a thermistor (a temperature-sensitive resistor).
A high-accuracy digital barometric pressure and temperature sensor called the
BMP180 can be used to measure atmospheric pressure and temperature. It uses I2C
protocol to communicate with microcontrollers like Arduino. The BMP180 sensor
is a low-power, high-precision sensor that can measure atmospheric pressure and
temperature. The sensor can measure pressure with an accuracy of ± 1 hPa and
temperature with an accuracy of ± 1 °C.
KY037 sound sensor is a type of microphone sensor that detects sound waves and
converts them into electrical signals. The KY037 sound sensor is a small, low-cost
module that can detect sound levels in the range of 48–66 dB. The sensor module
consists of a small microphone and an amplifier circuit. The output of the sensor is
an analog signal that can be read by a microcontroller, such as an Arduino. The
sensor operates on a supply voltage of 3.3–5 V.
There are several ways to power an Arduino microcontroller board. The most
common method is to use a USB connection to a computer or external power supply.
The Arduino board can also be powered through a barrel jack or VIN pin, which allows a DC power supply or a battery to be used as the power source. The USB connection
provides a stable 5 V power supply to the Arduino board, which is sufficient for most
systems.

3.2 Software Requirements

To develop the proposed prototype, an open-source platform “ThingsBoard” is used to perform collection, visualization, and processing of data. A robust and adaptable
IoT platform, ThingsBoard offers a variety of features and tools for creating and
managing IoT applications. Various features of ThingsBoard includes:
• Device management: ThingsBoard provides a web-based user interface for
managing IoT devices and their configurations. It supports various communication
protocols, such as MQTT, CoAP, and HTTP.
• Data collection and processing: The platform can collect and process data from
various sources, including sensors, devices, and external systems.
• Visualization and dashboarding: ThingsBoard provides a customizable dash-
boarding system that allows users to create and manage custom dashboards and
visualizations.
• Security: The platform provides various security features, including user authen-
tication and authorization, data encryption, and access control.
• Integration and interoperability: ThingsBoard provides APIs and connectors for
integrating with various external systems and platforms, such as cloud services
and databases.
• Scalability and performance: The platform is designed to be scalable and can
handle large amounts of data and devices.
• Open-source and community-driven: ThingsBoard is an open-source project with
an active community of developers and users.
A lightweight publish/subscribe messaging protocol called MQTT (Message Queuing Telemetry Transport) was created for Internet of Things (IoT)
and M2M (machine-to-machine) communication. Publishers transmit messages to
a broker using the publish/subscribe messaging mechanism used by MQTT, and
subscribers receive those messages from the broker. This model allows for asyn-
chronous and decoupled communication between devices. MQTT provides three levels of QoS, which allow the sender to ensure that a message is delivered at most once (QoS 0), at least once (QoS 1), or exactly once (QoS 2). MQTT is a widely used messaging protocol for IoT
and M2M communication, due to its lightweight, efficient, and reliable nature.
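A telemetry message for ThingsBoard's device MQTT API might be serialized as below; the topic and payload shape follow ThingsBoard's documented telemetry format, while the field names are illustrative and the actual publish call (via paho-mqtt) is only indicated in a comment.

```python
import json
import time

# ThingsBoard's device MQTT API accepts telemetry as JSON published on
# this topic (per the ThingsBoard documentation).
TELEMETRY_TOPIC = "v1/devices/me/telemetry"

def serialize_telemetry(temperature_c: float, gas_ppm: float, humidity_pct: float) -> str:
    """Pack one sensor sample into the JSON payload the broker expects."""
    return json.dumps({
        "ts": int(time.time() * 1000),   # epoch milliseconds
        "values": {
            "temperature": temperature_c,
            "gas": gas_ppm,
            "humidity": humidity_pct,
        },
    })

payload = serialize_telemetry(36.5, 120.0, 54.0)
# With paho-mqtt, the prototype would then publish roughly as:
#   client.publish(TELEMETRY_TOPIC, payload, qos=1)
assert json.loads(payload)["values"]["temperature"] == 36.5
```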
WebSocket is a protocol that enables real-time, bidirectional communication between a client and a server over a single TCP (Transmission Control Protocol)
bSafe: A Framework for Hazardous Situation Monitoring in Industries 121

connection. It is designed to give web applications real-time updates with low-latency, high-performance connectivity. A WebSocket maintains a persistent connection between the client and server, allowing real-time, bidirectional communication, and it keeps latency low by eliminating the overhead of repeated HTTP requests and responses.
Firebase Firestore is a cloud-based NoSQL document database provided by
Google. It is designed to store and synchronize data between multiple clients and
the cloud with real-time updates. Firestore is part of the Firebase suite of products,
which provides a comprehensive backend infrastructure for building web and mobile
applications. Because Firestore offers real-time updates, any database modifications
are instantly synchronized between clients and the cloud. It is highly scalable and
can handle large amounts of data and high traffic loads.

4 System Design and Implementation

The proposed workplace monitoring prototype "bSafe" is built on Arduino and can be deployed in a workplace to make it safer and more work friendly. The prototype is completely self-contained and not part of any larger system. The connections among the hardware components are shown in Fig. 1.

4.1 Designing Approach

The step-by-step design approach for the proposed prototype is explained below.

Fig. 1 System design


122 N. Kashyap et al.

• Hardware setup: Connecting the sensors to the Arduino board and making sure everything is properly wired and powered.
• Installing necessary libraries: Installing the required libraries for the sensors (e.g., DHT11, MQ2, BMP180, LM393) and for paho-mqtt and ThingsBoard communication.
• Data Serialization: Data is serialized before publishing it to ThingsBoard or any other IoT platform because most IoT platforms accept data in a specific format, usually JSON. Serializing data into JSON also makes it easier to parse and analyze later on.
• Arduino Coding: Developing the code in Python to read data from the sensors and publish it to ThingsBoard using MQTT. This involves setting up the MQTT client, connecting to the broker, publishing the sensor data, and handling any errors or exceptions that may arise.
• Configure ThingsBoard: Creating a dashboard on ThingsBoard to visualize the sensor data and configuring the rules engine to trigger alarms when hazardous conditions are detected.
• Secure the system: To prevent unauthorized access to the system, use JWT
tokens for authentication and web sockets for secure communication between
the Arduino board and ThingsBoard.
• Sending data to Firebase: Using the WebSocket API, the data is fetched from ThingsBoard on the local system; the JSON data is then parsed and sent to the Firebase Firestore database.
• Fetching the data in the mobile UI: Once the data is available on Firebase, it is visualized in the app using the Firebase configuration.
• System Testing: To verify that all sensors are working properly, the system is tested by publishing the data to ThingsBoard.
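The glue between the steps above can be sketched as follows; this is a hedged illustration, not the authors' code. `v1/devices/me/telemetry` is ThingsBoard's standard device telemetry topic, but the WebSocket message shape and the Firestore document path are assumptions.

```python
import json

# ThingsBoard's default MQTT topic for device telemetry uploads.
TELEMETRY_TOPIC = "v1/devices/me/telemetry"

def parse_ws_update(raw):
    """Flatten a ThingsBoard WebSocket telemetry update into {key: latest value}.

    Assumed message shape: {"data": {"temperature": [[<ts_ms>, "23.5"], ...], ...}}
    """
    msg = json.loads(raw)
    latest = {}
    for key, samples in msg.get("data", {}).items():
        _ts, value = samples[-1]  # keep the most recent [timestamp, value] pair
        latest[key] = float(value)
    return latest

def push_to_firestore(record, collection="readings", doc="latest"):
    """Write the flattened record to Cloud Firestore (requires the firebase-admin
    package; assumes firebase_admin.initialize_app() ran once at startup)."""
    from firebase_admin import firestore
    firestore.client().collection(collection).document(doc).set(record)
```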

4.2 Working of System

A smartphone serves as the visualization medium for the proposed system, consuming the data from the Firebase connection over the Internet. The provided user interface presents the device's filtered data in real time. All data is recorded in a cloud-hosted database, the Firebase Realtime Database.
When a device connects to a cellular/Wi-Fi network, data is stored as JSON and instantly synchronized to all connected clients. A smartphone's sensors were used as the second set of sensors. The several networked sensors offer real-time measurements. When a parameter crosses or approaches its threshold value, indicating a dangerous state, the system marks it as an emergency and raises an alarm to the user. The threshold values used in the proposed prototype are presented in Fig. 2. The steps followed during the working of the proposed prototype are listed below and presented in Fig. 3.

Fig. 2 Threshold values for different parameters

Fig. 3 Flow chart of the proposed hazardous activity detection system

Steps
Start
Connect the sensors to the PC or laptop via Arduino
Function to choose a Google account for authentication
Function to see the values of the parameters in Application
Function to check if the parameter exceeds the threshold value or not
Function to Sign out
Exit

Fig. 4 Notification for hazardous activity

The threshold values indicate the acceptable range of values for each param-
eter under normal conditions. If the values for any of these parameters exceed the
threshold value, it may indicate an abnormal or dangerous condition that requires
immediate attention or corrective action. For example, if the temperature exceeds
24 °C, it may indicate that the air conditioning system is not functioning prop-
erly, and corrective measures should be taken to avoid overheating. The successful
detection of hazardous situations is presented in Fig. 4.
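The threshold logic described above can be sketched in a few lines; only the 24 °C temperature limit comes from the text, so the other limits and parameter names are illustrative stand-ins for Fig. 2.

```python
# Only the 24 degC temperature limit is stated in the text; the other
# limits and parameter names are illustrative stand-ins for Fig. 2.
THRESHOLDS = {"temperature": 24.0, "humidity": 60.0, "gas_ppm": 400.0}

def check_hazards(reading):
    """Return the parameters whose current reading exceeds the threshold."""
    return [key for key, limit in THRESHOLDS.items()
            if key in reading and reading[key] > limit]

def alert_message(reading):
    """Build the emergency alert string, or return None under normal conditions."""
    exceeded = check_hazards(reading)
    if not exceeded:
        return None
    return "EMERGENCY: " + ", ".join(
        f"{k}={reading[k]} (limit {THRESHOLDS[k]})" for k in exceeded)
```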

5 Conclusion

The effort presented in this paper is focused on the development of an automated system for monitoring and controlling various industrial parameters, such as temperature, gas, fire, and humidity. The goal of this system is to reduce the manual overhead required for monitoring industrial situations, which was previously done using
head required for monitoring industrial situations, which was previously done using
CCTV cameras. The work presented here shows that the proposed measurement
system is highly cost-effective compared to other solutions in the market and is less
time-intensive to implement. The proposed prototype has the potential to improve
the efficiency and safety of industrial processes, while also being more affordable
and easier to implement than other existing solutions.

References

1. Ali MH, Al-Azzawi WK, Jaber M, Abd SK, Alkhayyat A, Rasool ZI (2022) Improving coal
mine safety with internet of things (IoT) based dynamic sensor information control system.
Phys Chem Earth, Parts A/B/C 128:103225. https://doi.org/10.1016/j.pce.2022.103225

2. Shah J, Mishra B (2016) IoT enabled environmental monitoring system for smart cities. In:
2016 International conference on internet of things and applications (IOTA). Pune, India, pp
383–388. https://doi.org/10.1109/iota.2016.7562757
3. Paul S, Sarath TV (2018) End to end IoT based hazard monitoring system. In: 2018 International
conference on inventive research in computing applications (ICIRCA). Coimbatore, India, pp
106–110. https://doi.org/10.1109/icirca.2018.8597430
4. Wall D, McCullagh P, Cleland I, Bond R (2021) Development of an Internet of Things solution
to monitor and analyse indoor air quality. Internet of Things 14:100392. https://doi.org/10.
1016/j.iot.2021.100392
5. Esfahani S, Rollins P, Specht JP, Cole M, Gardner JW (2020) Smart city battery operated IoT
based indoor air quality monitoring system. In: 2020 IEEE sensors. Rotterdam, Netherlands,
pp 1–4. https://doi.org/10.1109/sensors47125.2020.9278913
6. Bhardwaj R, Kumari S, Gupta SN, Prajapati U (2021) IoT based smart indoor environment
monitoring and controlling system. In: 2021 7th International conference on signal processing
and communication (ICSC). Noida, India, pp 348–352. https://doi.org/10.1109/icsc53193.
2021.9673235
7. Lydia J, Monisha R, Murugan R (2022) Automated food grain monitoring system for warehouse
using IOT. Measur Sens 24:100472. https://doi.org/10.1016/j.measen.2022.100472
8. Mekni SK (2022) Design and implementation of a smart fire detection and monitoring system
based on IoT. In: 2022 4th International conference on applied automation and industrial
diagnostics (ICAAID). Hail, Saudi Arabia, pp 1–5. https://doi.org/10.1109/icaaid51067.2022.
9799505
9. Akhter R, Sofi SA (2022) Precision agriculture using IoT data analytics and machine learning.
J King Saud Univ Comput Inf Sci 34:5602–5618. https://doi.org/10.1016/j.jksuci.2021.05.013
10. Mishra DU, Sharma S (2022) Revolutionizing industrial automation through the convergence
of artificial intelligence and the internet of things, 1st ed. IGI Global, USA. https://doi.org/10.
4018/978-1-6684-4991-2
11. Bhati N, Samsani VC, Khareta R, Vashisth T, Sharma S, Sugumaran V (2021) CAPture: a
vision assistive cap for people with visual impairment. In: 2021 8th International conference
on signal processing and integrated networks (SPIN). Noida, India, pp 692–697. https://doi.
org/10.1109/SPIN52536.2021.9565940.
12. Vashisth T, Khareta R, Bhati N, Samsani VC, Sharma S (2022) A low cost and enhanced
assistive environment for people with vision loss. Lecture Notes in Networks and Systems, vol
218. Springer, Singapore. https://doi.org/10.1007/978-981-16-2164-2_20
13. Daponte P, De Vito L, Mazzilli G, Picariello E, Rapuano S, Tudosa I (2022) Implementation
of an intelligent transport system for road monitoring and safety. In: 2022 IEEE international
workshop on metrology for living environment (MetroLivEn). Italy, pp 203–208. https://doi.
org/10.1109/metrolivenv54405.2022.9826948
14. Pan Y, Lu C, Yu W, Wu M (2022) Design and application of intelligent monitoring system for
geological hazards. In: 2022 41st Chinese control conference (CCC), pp 3231–3236. https://
doi.org/10.23919/ccc55666.2022.9902700
15. Juel MTI, Ahmed MS, Islam T (2019) Design of IoT based multiple hazards detection and
alarming system. In: 2019 4th International conference on electrical information and commu-
nication technology (EICT). Khulna, Bangladesh, pp 1–5. https://doi.org/10.1109/eict48899.
2019.9068782
16. Dhall S, Mehta BR, Tyagi AK, Sood K (2021) A review on environmental gas sensors: materials
and technologies. Sens Int 2:100116. https://doi.org/10.1016/j.sintl.2021.100116
17. Fine GF, Cavanagh LM, Afonja A, Binions R (2010) Metal oxide semi-conductor gas sensors
in environmental monitoring. Sensors 10:5469–5502. https://doi.org/10.3390/s100605469
18. Anandan P, Raj VT, Namratha P, Hameed A, Farooq BM (2022) Embedded based smart LPG
gas detection and safety management system. Ind Mech Electr Eng. https://doi.org/10.1063/5.
0110382
19. Liu X, Cheng S, Liu H, Hu S, Zhang D, Ning H (2012) A survey on gas sensing technology.
Sensors 12:9635–9665

20. Al-Okby MFR, Neubert S, Roddelkopf T, Thurow K (2021) Mobile detection and alarming
systems for hazardous gases and volatile chemicals in laboratories and industrial locations.
Sensors 21:8128. https://doi.org/10.3390/s21238128
Comparative Review of Different
Techniques for Predictive Analytics
in Crime Data Over Online Social Media

Monika and Aruna Bhat

Abstract Social media platforms have swiftly become a vital platform for informa-
tion sharing and communication. Its services are constantly being used by millions
of people to connect with one another. Limiting and preventing crime, however, is
one of the security agencies’ primary responsibilities when it comes to urban secu-
rity. The development of enforcement strategies and the implementation of crime
prevention and control depend heavily on crime prediction. The model can predict
future instances of these crimes by using this strategy. Machine learning is currently
widely utilized for predicting crime. However, in the age of big data, when individ-
uals have access to an expanding amount of data, the capacity to recognize criminal
patterns based on historical crime data is no longer effective. Models for deep learning
and machine learning are contrasted using performance metrics; when the two are compared, deep learning frequently surpasses machine learning. A
few datasets that are used for predicting crime statistics are studied along with their
various classes. The analysis brings attention to the model’s limitations as well as its
possibilities for improvement.

Keywords Online social networks · Supervised and unsupervised learning ·


Cyberbullying · Crime data prediction · Mimic human behaviors · Geotag
posting · Cybertheft

Monika (B) · A. Bhat


Department of Computer Science and Engineering, Delhi Technological University, New Delhi,
India
e-mail: monika.siwaliya@gmail.com
A. Bhat
e-mail: aruna.bhat@dtu.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 127
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_12

1 Introduction

The Internet had expanded significantly by the twentieth century’s end, and it had
fundamentally changed a significant portion of our economic and social lives [1].
The growth of online social networks (OSNs) has been significantly influenced by
this transition, owing to the fantastic communication opportunities provided by the Internet. These communication options gave rise to the OSN, a component of the Internet revolution, and greatly increased its effectiveness [2]. Despite the
fact that different scholars describe it differently, [3] defined an OSN as an online
community consisting of individuals who share mutual friends, common interests,
and enjoyment of common activities. Several OSNs are web-based, enabling users
to exchange a variety of topics with other online users, contribute text, images and
videos to personal profiles, comment on items, communicate their health issues and
more [4]. The number of users on OSNs is astounding. The number of active users
every month on Facebook has surpassed 2.38 billion as of March 2019 [5]. 330 million
people use Twitter every month, which is a social microblogging platform [6]. These
websites serve as real-time dynamic data sources and new forms of communication
for millions of users, allowing them to interact with one another and develop their
individual profiles without regard to physical distance or other physical restrictions
[7]. When it comes to the development of social networks and communities that
are already enormous in scope and scale as well as the analysis of these networks,
communication data from OSNs might offer us new perspectives and opportunities
[8–10]. The OSN has increased the appeal of using social network architecture for a
number of objectives due to the unique options it can provide (Fig. 1).
Criminal activity has been rising drastically across nations [11]. To
stop these criminal acts, tough measures are needed. In order to keep an eye on these
criminal activities and enhance public safety, the rate of crime tracking is crucial [12].
Crime limitations or criteria utilized in this work relate to incident-level crime data
that is maintained as a dataset of crime and includes the kind of crime, the criminal’s
ID, the incident date, and the location [13]. Crime can be greatly reduced, since social media is useful for detecting crime rates in various countries and locales.

Fig. 1 Structure of social media

Social media are a source of information as well as a tool for communication [14].
Twitter, which has a user base of more than 300 million, becomes a strong choice
for data analysis [15]. In this social media site, users express their thoughts, feelings
and even anger. Due to the fact that Tweets are user-initiated, it is challenging to get
data from Twitter for crime detection. There are various formats for tweets, including
symbols. Because of these problems, each Tweet needs to be carefully considered.

2 Summary

Social media is used by individuals for both their professional and personal inter-
ests, for involvement, connection and exchange of ideas as well as for the sharing of
videos, images and other information. The analysis found that social media enables
academics to examine the characteristics of individual behavior as well as regional
and temporal relationships. Based on surveys, criminology has emerged as a popular
field of research on a worldwide scale, using information drawn from Facebook,
Twitter, news feed articles, and other online social media sites. Utilizing spatiotem-
poral links in user-generated material allows for the acquisition of pertinent data for
the study of criminal activities. The research includes reference to the utilization of
text-based data science through the collection and visualization of data from various
news sources. There are several models for predicting crime data that have been devel-
oped by researchers using machine learning and deep learning. In the past, machine
learning models were used for predicting crime data, but more recently, deep learning
models have been advancing quickly since they surpass machine learning models in terms of performance. Deep learning models require less data, take less time, and predict crime more accurately, whereas machine learning models take more time, are less accurate, and require more data. This
motivates us to conduct a comparison study on crime data prediction models since
many studies have difficulty producing novel results. Datasets are important for
predicting crime statistics on social media, and each dataset has a wealth of details
on crimes that have occurred in various places and at various periods. Numerous
datasets that are often used for predicting crime statistics are thus examined.
Rest of the paper is organized as follows: Sect. 3 presents the benefits of social
media, Sect. 4 describes the crime data prediction using machine learning models,
Sect. 5 describes the crime data prediction using deep learning model, Sect. 6 provides
the crime data prediction using different datasets, Sect. 7 represents the challenges of
the crime data prediction, Sect. 8 presents the future recommendations, and Sect. 9 presents the conclusion, followed by the references.

3 Benefits of Online Social Media

Social media platforms enable online content sharing and exchange. The ability to
share text, image, and video postings on a number of different social media sites, as
well as the ability to like, share, and respond on one another’s posts, allow users to
connect with one another and exchange material. According to the platform and the
user’s preferences, profiles and posts can be either public or private. As a
result of the ability to geotag postings on many social media sites, social media data
can be compared to other kinds of geographic data.
The influence of social media on everyday life has both advantages and disadvan-
tages. Positively, social media platforms like Instagram, Snapchat, Facebook, and
Twitter enable users to connect with friends and family they’ve left behind via video
and audio conversations, chat rooms, and a variety of other services even when they’re
located miles apart. With only their mobile devices, the ordinary population may now
stay current with events happening across the globe. Individuals utilize social media
platforms to facilitate their ability to do so when seeking employment or for any
other reason, such as education. Social media lacks privacy, and therefore, there’s
a big chance someone will utilize your personal data for their own gain. Individual
privacy is a major concern in today’s society. The unauthorized access of another
person’s private details by a third party is the foundation for both cyberbullying
and cybertheft. Individuals are at risk due to the proliferation of offensive content
on social media. They talk on the Internet nonstop all day. As a result, disinforma-
tion could be employed for a variety of purposes, such as inciting racial or religious
animosity, misleading people or inciting digital hate crimes. As a result, social media
has established itself as a necessary component of our everyday life. Virtual space
consequently becomes a somewhat uncharted area in terms of the issues it presents
regarding human rights and responsibility.

4 Prediction Performance Comparison of Machine Learning Methods Used for Analyzing Crimes in Social Media

In general, there are three primary methods for learning accessible in machine learning (ML): supervised (SL), unsupervised (UL), and reinforcement (RL) learning. In situations where a label property is provided for a specific
dataset, supervised learning is beneficial. Examples of SL algorithms include deci-
sion trees, regressions, random forests, logistic regressions, KNNs, and others. When
it’s difficult to find implicit correlations in a given unlabeled dataset, the UL can be
helpful. Clustering techniques like K-means and Apriori algorithms are instances of
UL. The RL lies in the middle of both supervised and unsupervised machine learning,

providing feedback for each prediction or action taken but without providing a label or an explicit error message. RL does not rely on correct input-output pairs and instead directly adjusts suboptimal actions based on the reward signal. Reinforcement learning is typically formalized by the Markov decision process.
Krishnendu et al. [16]: Crime data prediction was done by dividing the locations into several clusters. K-means, an iterative unsupervised technique, is employed to cluster the unlabeled input data. The first step fixes the number of clusters, denoted by a constant K, and then picks K random points as centroids; these centroids need not come from the data. Each data point is then assigned to the closest centroid, resulting in K clusters. A new centroid is then computed for each of the resulting clusters, and each data point is reassigned to its nearest updated centroid. This method divides the region into five clusters, each characterized by oddity, role, weapon, method, and location. The K-means algorithm achieved an accuracy of 78%, precision of 60%, recall of 45%, and f-measure of 44%.
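The iteration just described (pick K random centroids, assign points to the nearest one, recompute, repeat) can be sketched in NumPy; the two-dimensional points and the K value are synthetic stand-ins for the paper's location data.

```python
import numpy as np

def kmeans(points, k=5, iters=20, seed=0):
    """Plain K-means: k random initial centroids, then alternating
    assignment and centroid-update steps for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign every point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its cluster (skip empty clusters)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids
```

With k=5, each resulting cluster would correspond to one of the paper's modes of operation.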
Mohemad et al. [17]: K-medoids is a clustering method similar to k-means, but it is less sensitive to outliers. The basic idea behind k-medoids is to identify, within each of k randomly initialized clusters, a representative document (medoid); the documents most closely related to each medoid are grouped together. Rather than the average value of the documents in each cluster, the k-medoids algorithm uses the representative documents as reference data points. Oddity, role, weapon, method, and location are the five clusters used for a region. The K-medoids algorithm achieved an accuracy of 76%, precision of 59%, recall of 41%, and f-measure of 39%.
Hossain et al. [18]: In this study, supervised learning is utilized for more accurate crime prediction. The approach generates crime predictions by analyzing a database that includes records of crimes that have already been committed and their patterns. A decision tree (DT) is a graphical representation that employs a branching methodology to show all potential decision outcomes based on particular circumstances. The classification and regression tree (CART), random forest, and boosted trees are three of the many regression tree techniques that are accessible. Decision trees are used in machine learning for both regression and classification. Every branch carries an outcome decision that takes into account a particular constraint on the input, and researchers employ decision trees to illustrate every branching graphically. The classification of an instance is determined by the path from the root to a leaf node; an internal node is a check on one of the characteristics listed in the decision tree. The DT divided the search space by classifying theft, drug, and immoral activities and achieved an accuracy of 72.7% and a precision of 0.756. One of the toughest concerns with DTs, however, is over-fitting; pruning and setting model parameter limitations are used to resolve this problem. Decision trees also cannot fit continuous variables.
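The branching rule of a CART-style tree reduces to finding, per feature, the threshold that minimizes weighted Gini impurity; the toy values and labels below are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split on one feature."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send points to both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```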

Yao et al. [19]: The main objective of the study is to use random forest to identify crime hotspots. Random forest (RF) is a trusted machine learning technique for regression and classification. The RF employs a bagging strategy over a group of DTs: using random samples of the dataset, the RF repeatedly trains the algorithm to generate an exceptional prediction model. By combining the outputs of the DTs, much as in the ensemble learning approach, it provides a definite prediction; the final prediction of the random forest is the one that appears most often across the decision trees. The random forest algorithm's main strength lies in how well it handles classification and regression problems, allowing accurate computations of these. Large datasets are handled gracefully without losing their dimensionality. This method achieved an accuracy of 91% with hyperparameter boosting. The benefits of out-of-bag samples and bootstrap sampling were introduced into the random forest methods.
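The bootstrap-sampling and out-of-bag mechanics mentioned above can be illustrated directly: each tree trains on n rows drawn with replacement, and the rows never drawn (about 1/e, roughly 36.8% on average) form that tree's out-of-bag set.

```python
import random

def bootstrap_sample(n, rng):
    """Draw n row indices with replacement; rows never drawn are out-of-bag (OOB)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = set(range(n)) - set(in_bag)
    return in_bag, oob

def mean_oob_fraction(n=1000, trees=200, seed=42):
    """Average OOB fraction across many bootstrap draws; tends toward 1/e = 0.368."""
    rng = random.Random(seed)
    total = sum(len(bootstrap_sample(n, rng)[1]) for _ in range(trees))
    return total / (trees * n)
```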
Krysovatyy et al. [20]: Support vector machines (SVM) are a popular SL technique for both classification and regression. This method enables rapid tracking of fake businesses, which helps officials prevent financial crimes. To separate the input data into categories, the SVM fits a hyperplane to the given data before classifying it. Maximizing the margin, i.e., the gap between the classes on either side of the hyperplane, yields the best categorization. Cyberbullying prediction models have been developed using SVM, and these models have proven to be successful and productive. SVM was utilized to build a cyberbullying prediction system to identify offensive material in SM; the model was used to extract SM content with the potential for cyberbullying and to find offensive content. SVM reached a prediction accuracy of 84% in the research, and multi-dimensional visualization of this data is also possible. Graphical representations of accuracy, precision, recall, and f1-score are given in Figs. 2, 3, and 4.
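As a from-scratch illustration of margin maximization (not the pipeline used in [20]), a linear SVM can be trained by sub-gradient descent on the regularized hinge loss; the toy data is invented and the hyperparameters are arbitrary.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Sub-gradient descent on the hinge loss; y must contain only -1 and +1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w + b) < 1:   # inside the margin: push the plane
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                            # safely classified: only regularize
                w -= lr * lam * w
    return w, b

def svm_predict(X, w, b):
    return np.sign(X @ w + b)
```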

Fig. 2 Accuracy of the various ML models

Fig. 3 Precision of various ML models

Fig. 4 Comparison of recall and f1-score of various ML models

5 Prediction Performance Comparison of Deep Learning Models Employed for Analyzing Crimes in Social Media

“Deep learning” is a recent area of study within the machine learning domain of computer science, developed to bring machine learning closer to the fundamental objective of artificial intelligence. Deep learning addresses a variety of challenging pattern recognition problems, advancing artificial intelligence technology and allowing machines to mimic human behaviors such as thinking and audio-visual perception. It uses a deep network to extract nonlinear properties from the input and develops complex function models by stacking numerous nonlinear layers. Modern deep learning architectures include CNNs, recurrent neural networks, and deep feed-forward networks. Deep learning has been applied in computer vision, image identification, processing of works of visual art, natural language processing, drug discovery, genomics, cybersecurity, emotion recognition, and speech recognition.

The method is effective in categorizing various kinds of crime from news or text data, as well as in differentiating news stories that are relevant to crime from those that are not. Deepak et al. [21]: A bidirectional LSTM, often known as a BiLSTM, is a sequence processing framework that brings together two LSTMs, one of which receives input in the forward direction while the other processes it in reverse. This differs from a unidirectional LSTM in that future information is preserved when the LSTM runs backward; by using both hidden states at once, it is possible to preserve data from the present and the future at any given time. The input gate is opened to modify the cell's state: using a sigmoid function, values composed of the prior hidden state and the incoming input are calculated, and by squashing the sigmoid function's outputs into values between zero and one, the cell state is updated. The output gate selects the hidden state for the following step. This model achieved 92.8% precision, 88.26% recall, 90.52% accuracy, and an f1-score of 0.245.
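The gate arithmetic described above (sigmoid input, forget, and output gates; a tanh candidate; the cell-state update) amounts to one LSTM step, sketched here in NumPy with illustrative weight shapes; a BiLSTM runs this over the sequence once forward and once in reverse and concatenates the two hidden states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step. W (4H x D), U (4H x H) and b (4H,) stack the
    parameters of the input (i), forget (f), candidate (g) and output (o) gates."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates
    i = sigmoid(z[0:H])             # input gate: how much candidate to admit
    f = sigmoid(z[H:2 * H])         # forget gate: how much old state to keep
    g = np.tanh(z[2 * H:3 * H])     # candidate cell state
    o = sigmoid(z[3 * H:4 * H])     # output gate: how much state to expose
    c = f * c_prev + i * g          # updated cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c
```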
The model was applied in the fields of finance, weather, and electricity. Butt et al. [22]: A discrete wavelet transform (DWT) was used to decompose previous crime incidence patterns acquired by a sliding window. The RBPNN then learned the decomposed sequences to predict the occurrence of future trends and details, and the trends and details were finally recombined to obtain the final prediction series. With the wavelet transform, one can achieve resolution in both the time and frequency domains. The DWT multi-resolution decomposition approach subsamples the crime rate series by a factor of two while discarding the other samples. Back-propagation is the approach most frequently utilized for supervised learning with multilayered feed-forward networks; back-propagation neural networks are feed-forward neural networks that use the backward error propagation mechanism. The wavelet transform-based neural network model required only 1000 iterations to attain a testing MSE of 0.0048811, whereas the conventional BPNN model still had an MSE of 0.62896 even though the maximum number of training iterations was 5000.
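One level of the Haar wavelet transform makes the trend/detail split and factor-of-two subsampling concrete; this is a generic DWT sketch, not the authors' exact filter bank, and the crime-rate series in the test is invented.

```python
import math

def haar_dwt(series):
    """One Haar DWT level: pairwise scaled sums (trend) and differences (detail),
    each half the length of the input, i.e. subsampling by a factor of two."""
    assert len(series) % 2 == 0, "series length must be even"
    approx = [(series[i] + series[i + 1]) / math.sqrt(2)
              for i in range(0, len(series), 2)]
    detail = [(series[i] - series[i + 1]) / math.sqrt(2)
              for i in range(0, len(series), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: recombine trend and detail into the original series."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)])
    return out
```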
For better performance on a Twitter dataset, a hybrid MCNN model is used. Monika et al. [23]: The wavelet approach is combined with a CNN to reduce the number of features by decreasing the number of layers in the CNN, and the WCO optimization technique is utilized to optimize the loss function value. Forward propagation is established together with the weights across layers and the biases of the neurons; the back-propagation errors for each layer are then determined via the chain rule. Depending on the back-propagated errors, the optimization method (WCO) is utilized to adjust the weights and biases. By incorporating the wavelet transformation, a CNN can learn more characteristics. World Cup Optimization (WCO) considers a team-based random population in which rank plays a significant role in deciding where teams are seeded: the teams are seeded according to their rankings, and the highest-ranked team is given the top seed without any opposition. The competition starts from this seeding, and the algorithm terminates if the stop criteria are met; otherwise, it iterates. After the classification procedure, the data is divided into two categories, crime and non-crime. The model has 98% accuracy, 98% precision, 98% recall, and a 98% f1-score.

The unusual actions were found in a real-time crime scene dataset, which was gathered and converted to video frames. Sahay et al. [24]: Many research publications show that CNNs and RNNs, among other deep learning algorithms, have been effectively used to predict and classify crimes using a variety of data types. The research introduced a deep reinforcement neural network (DRNN) for predicting crime on social media. These studies show the applicability of deep learning algorithms in this area and give important information about the elements influencing criminal behavior. To train the DRNN for categorization, recorded video is utilized to extract spatiotemporal characteristics and gestures. The model obtained 98% accuracy, 96% precision, 80% recall, and a 78% f1-score.
To predict major social media crimes, multinomial naive Bayes (MNB) is used.
Abbass et al. [25]: Words or tokens constitute the majority of any written docu-
ment's constituent parts, and there are many possible dependencies or relation-
ships between the words in a sentence, tweet, or document. Because this word
order adheres to the multinomial distribution, MNB was used to solve the classifica-
tion issue. Multinomial naive Bayes has proven to be an efficient, quick, and reliable
classifier in a variety of NLP applications. The feature vector of the MNB algorithm
holds the Tf-Idf score for each word in each text; both integer and fractional feature
counts perform well under the multinomial distribution. The model obtained 78.87%
accuracy, 82.86% precision, 74.94% recall, and a 78.70% f1-score. Graph-
ical representations of accuracy, precision, recall, and f1-score are given in Figs. 5,
6, 7, and 8.
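As a rough, self-contained illustration of the MNB idea described above, the sketch below trains a multinomial naive Bayes classifier with Laplace smoothing on an invented toy corpus; the cited study [25] used Tf-Idf scores over real tweets, so this is a simplified stand-in rather than the authors' implementation.

```python
import math
from collections import Counter

# Minimal multinomial naive Bayes over word counts (illustrative only).
def train_mnb(docs, labels):
    classes = set(labels)
    priors, counts, totals = {}, {}, {}
    vocab = {w for d in docs for w in d.split()}
    for c in classes:
        cdocs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = math.log(len(cdocs) / len(docs))
        counts[c] = Counter(w for d in cdocs for w in d.split())
        totals[c] = sum(counts[c].values())
    return priors, counts, totals, vocab

def predict_mnb(model, doc):
    priors, counts, totals, vocab = model
    best, best_score = None, float("-inf")
    for c in priors:
        # Laplace-smoothed log-likelihood of each in-vocabulary token
        score = priors[c] + sum(
            math.log((counts[c][w] + 1) / (totals[c] + len(vocab)))
            for w in doc.split() if w in vocab)
        if score > best_score:
            best, best_score = c, score
    return best

docs = ["robbery reported downtown", "sunny weather today",
        "armed robbery near station", "great concert tonight"]
labels = ["crime", "noncrime", "crime", "noncrime"]
model = train_mnb(docs, labels)
print(predict_mnb(model, "robbery downtown"))
```

In practice the raw word counts would be replaced by Tf-Idf weights, which the multinomial model also handles, as noted above.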

Fig. 5 Precision of the various DL models

136 Monika and A. Bhat

Fig. 6 Accuracy of the various DL models

Fig. 7 Recall of the various DL models

Fig. 8 f1-score of the various DL models

6 Analysis of Varied Datasets Utilized for Crime Data Prediction in Different Social Media Networks

Datasets play a big part in predicting crime statistics on social media, and each
dataset has a lot of information on crimes that have happened in different locations
and at different times. There are two types of datasets: regional datasets and common
datasets. Regional datasets include information on a particular location, including
different types of information and attack percentages. Common datasets contain data
that is typically obtained from a variety of social media platforms, including Twitter,
Facebook, Instagram, and others. The various datasets used to predict crime on
social media are described below.
Hossain et al. [18]: San Francisco (SF) Open Data from the SFPD crime incident
report system [9] provides the San Francisco Crime Dataset. It offers details on crimes
that took place in San Francisco between January 1, 2003, and May 13, 2015. The
dataset consists of 878,049 entries in a csv file. The category of a crime incident
is the target to be predicted. The target label is connected to attributes including
the crime's description and resolution, so all properties besides these three are used
as features. The San Francisco Crime Dataset includes 39 different categories of
crimes, which are treated as classes; hence, with 39 classes, San Francisco poses a
many-class problem. Few crimes happen constantly, and some crimes are incredibly
uncommon. With 174,900 occurrences, larceny/theft constitutes the most frequent
crime, and trespassing (TREA), with only 6, is the least frequent. Fourteen crime
classes happened more than 10,000 times, whereas fourteen crime classes happened
only about 2000 times, so the classes are clearly not distributed equally.
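The imbalance check described above can be sketched with a simple frequency count; the real analysis would read the 878,049-row SFPD csv, so a small stand-in list of categories is used here.

```python
from collections import Counter

# Count crime categories and inspect the most and least frequent classes.
categories = (["LARCENY/THEFT"] * 50 + ["ASSAULT"] * 20 +
              ["VANDALISM"] * 8 + ["TREA"] * 1)
freq = Counter(categories)
print(freq.most_common(1)[0])   # most frequent class
print(freq.most_common()[-1])   # least frequent class
```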
Nikhila et al. [26]: The Formspring.me dataset is an XML file containing 13,158
messages posted on the webpage Formspring.me by 50 distinct users. The dataset
was generated for a study carried out in 2009 and is separated into "Cyberbullying
Positive" and "Cyberbullying Negative" categories: positive messages are those that
contain cyberbullying, whereas negative messages are those that do not. There are
12,266 comments in the Cyberbullying Negative class and 892 comments in the
Cyberbullying Positive class. The "holdout" strategy, which is employed in datasets
with comparable dimensions, was used to divide the dataset into test and training
sets. There are some well-known data collections with comparable sample sizes and
class counts, like Reuters and 20 Newsgroups (20NG), so the same techniques as in
those cases have been employed.
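A minimal holdout split of the kind mentioned above might look as follows; the 80/20 proportion and the fixed seed are illustrative assumptions, not details taken from the cited study.

```python
import random

# Shuffle once, then carve off a fixed fraction for testing.
def holdout_split(samples, test_fraction=0.2, seed=42):
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, test = holdout_split(data)
print(len(train), len(test))  # 80 20
```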
Vyawahare et al. [27]: The messages in the Myspace dataset were taken from group
discussions on the website. The dataset contains group chats that have been classified
into overlapping groups of ten messages. If a group chat has 100 messages, for
instance, the first group contains messages 1–10, the second group messages 2–11,
and the final group messages 91–100. Every set of ten messages receives a single
labeling operation, which notes whether any of the ten messages

Table 1 Dataset details

Dataset name                | Year                               | Messages    | Region
San Francisco crime dataset | January 1, 2003–May 13, 2015       | 878,049     | San Francisco
Formspring.me dataset       | 2009                               | 13,158      | Global
Myspace dataset             | July 2021                          | 1753 groups | Global
Twitter dataset             | October 2018 and August 2019       | 8,835,016   | Global
Crime dataset               | January 14, 2014–January 13, 2017  | 12,384      | Europe
involve bullying. The dataset comprises 1753 message groups in total, with 357
tagged positive and 1396 tagged negative.
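The overlapping ten-message windowing described above can be sketched as below; the labeling step itself is omitted, since the cited work assigns one bullying/non-bullying tag per window.

```python
# Build overlapping windows of ten consecutive messages, one label per window.
def message_windows(messages, size=10):
    return [messages[i:i + size] for i in range(len(messages) - size + 1)]

msgs = [f"msg{i}" for i in range(1, 101)]   # a 100-message group chat
windows = message_windows(msgs)
print(len(windows))                          # 91 windows: 1-10, 2-11, ..., 91-100
print(windows[0][0], windows[-1][-1])
```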
Boukabous et al. [28]: The social media posts considered in this evaluation were
tweets, of which 8,835,016 were collected for the dataset. The dataset is divided
into two sections: the first is the same dataset that was utilized for ontology
verification, containing 5,896,275 tweets gathered during the first week of October
2018; the second contains 2,938,741 tweets gathered in the third week of August
2019. These tweets were gathered from a huge pool of tweets containing at least
one term that may be connected to CSE.
Yuki et al. [29]: The crime dataset includes burglaries that occurred in the Swiss
canton of Aargau (equivalent to a state in the USA) between January 14, 2014 and
January 13, 2017. Over the past few years, Switzerland has continuously ranked
among the top countries in Europe for burglaries. Given its size of 140,400 hectares
and population of roughly 660,000 people, the canton of Aargau satisfies the
definition of a low-populated area. Only the cities of Aarau (20,000 residents),
Baden (17,500 residents), Brugg (10,000 residents), and Zofingen (10,500 residents)
have high population densities (16–20 people/hectare), with the majority of the
surroundings being sparsely populated; overall, the density is only 4.72 persons
per hectare.
Table 1 summarizes the dataset details discussed above.

7 Challenges in Crime Detection Over Social Media

The various methods for crime data detection in social media networks face many
challenges, some of which are mentioned below.
• Predictive models cannot be used to accurately anticipate any kind of criminal
activity, particularly interpersonal and domestic attacks, because these crimes are
rarely concentrated in one place and are not directly linked to a single victim
profile. While prediction algorithms may lessen some kinds of individual bias
by minimizing subjective judgments, systems still rely on frequently inaccurate
crime data that contains systemic reporting flaws.

• An examination of experiences with predictive strategies has revealed a number
of problems. Predictive algorithms can unintentionally reproduce and worsen
social preconceptions. Algorithms are frequently expensive due to data storage,
lack transparency regarding the underlying method, and have occasionally led to
violations of fundamental rights. Detecting bias in a dataset is difficult and
requires extensive knowledge.
• As with the majority of policing technologies, successful implementation requires
taking a comprehensive stance. The ability of law enforcement agencies to inte-
grate predictive technology into their operation depends on both their techno-
logical prowess and industrial leadership as well as the creation of minimal
requirements for responsible development, monitoring, and evaluation.
• Certain models, mainly machine learning models, possess low accuracy and score
poorly on other performance metrics such as recall, precision, and f1-score.

8 Effective Future Recommendations

Crime prediction and analysis offer a methodical strategy for identifying crime.
Such a system can anticipate and identify the locations of criminal offenses. Following
data analysis, law enforcement organizations can predict future criminal activity. These
projections, which can take many different forms, assist organizations in allocating
resources more efficiently and reducing crime. Formats for the prediction
include:
• Locations of criminal offenses.
• Criminal offenses that are most likely to take place.
• The times of day when incidents are most likely to take place.
• Repeated crimes by certain people.
• Certain crimes that occur in specific places.
• Crime trends and hotspots.
Although existing methods aren't always accurate, predictions made by
algorithms can still be a useful resource for police departments.
• This research aims to contribute an enhanced model for the preprocessing stage
that can improve processing time and strengthen the prediction stage.
• During the feature extraction stage, this research aims to propose an effective
algorithm for different word embedding techniques, which may produce additional
features and bring improved results.
• This research intends to contribute an effective model for the feature selection
stage, which is essential for an effective classification process. The feature selection
algorithm will be a metaheuristic algorithm with an improved weight-updation
scheme.
• In the classification stage, the research intends to contribute a hybrid algorithm with
an effective fitness function. The algorithm can be a neural network, a metaheuristic
algorithm, or both. An effective classifier should be the backbone of the whole
model.
These are the objectives that this research would like to achieve in the future.
The resulting model should improve classification accuracy while using minimal
computational power.

9 Conclusion

The development of enforcement strategies and the implementation of crime prevention
and control depend significantly on crime prediction. The most widely used
prediction technique nowadays is machine learning. Unfortunately, current methods
for making predictions are unreliable, so this research chose to examine the models
and evaluate their effectiveness. In this study, a variety of deep learning and machine
learning models were examined along with their performance metrics, including
accuracy, precision, recall, and f1-score. The comparison shows that, between
machine learning and deep learning, deep learning has the upper hand in accuracy
and precision. Some of the datasets used in crime data prediction, and their various
classes, were also studied. The research reveals the models' limitations as well as
their possibilities for future improvement.

References

1. Zhuravskaya E, Petrova M, Enikolopov R (2020) Political effects of the internet and social
media. Ann Rev Econ 12:415–438
2. Madakam S, Tripathi S (2021) Social media/networking: applications, technologies, theories.
JISTEM-J Inf Syst Technol Manage 18
3. Can U, Alatas B (2019) A new direction in social network analysis: online social network
analysis problems and applications. Physica A 535:122372
4. Aldwairi M, Tawalbeh LA (2020) Security techniques for intelligent spam sensing and anomaly
detection in online social platforms. Int J Electr Comput Eng 10(1):275
5. Mavrodieva AV, Rachman OK, Harahap VB, Shaw R (2019) Role of social media as a soft
power tool in raising public awareness and engagement in addressing climate change. Climate
7(10):122
6. Dandannavar PS, Mangalwede SR, Kulkarni PM (2021) Predicting the primary dominant
personality trait of perceived leaders by mapping linguistic cues from social media data onto
the big five model. In: Advanced machine learning technologies and applications: proceedings
of AMLTA 2020. Springer Singapore, pp 417–428
7. Kye B, Han N, Kim E, Park Y, Jo S (2021) Educational applications of metaverse: possibilities
and limitations. J Educ Eval Health Prof 18
8. Mirbabaie M, Bunker D, Stieglitz S, Marx J, Ehnis C (2020) Social media in times of crisis:
learning from Hurricane Harvey for the coronavirus disease 2019 pandemic response. J Inf
Technol 35(3):195–213
9. Bilal M, Usmani RSA, Tayyab M, Mahmoud AA, Abdalla RM, Marjani M, Targio Hashem
IA (2020) Smart cities data: framework, applications, and challenges. Handbook Smart Cities
1–29

10. Ravazzi C, Dabbene F, Lagoa C, Proskurnikov AV (2021) Learning hidden influences in large-
scale dynamical social networks: a data-driven sparsity-based approach, in memory of roberto
tempo. IEEE Control Syst Mag 41(5):61–103
11. Saini J, Srivastava V (2019, July) Impact of population density and literacy levels on crime in
India. In: 2019 10th international conference on computing, communication and networking
technologies (ICCCNT). IEEE, pp 1–7
12. Yaacoub JP, Noura H, Salman O, Chehab A (2020) Security analysis of drones systems: attacks,
limitations, and recommendations. Internet of Things 11:100218
13. Bhat A (2019, Dec) An analysis of crime data under apache pig on big data. In: 2019 Third
international conference on I-SMAC (IoT in social, mobile, analytics and cloud)(I-SMAC).
IEEE, pp 330–335
14. Mheidly N, Fares J (2020) Leveraging media and health communication strategies to overcome
the COVID-19 infodemic. J Public Health Policy 41(4):410–420
15. Neelakandan S, Paulraj D (2020) A gradient boosted decision tree-based sentiment classifica-
tion of twitter data. Int J Wavelets Multiresolut Inf Process 18(04):2050027
16. Krishnendu SG, Lakshmi PP, Nitha L (2020, March) Crime analysis and prediction using
optimized K-means algorithm. In: 2020 Fourth international conference on computing
methodologies and communication (ICCMC). IEEE, pp 915–918
17. Mohemad R, Muhait NNM, Noor NMM, Othman ZA (2022) Performance analysis in text
clustering using k-means and k-medoids algorithms for Malay crime documents. Int J Electr
Comput Eng 12(5):5014
18. Hossain S, Abtahee A, Kashem I, Hoque MM, Sarker IH (2020) Crime prediction using spatio-
temporal data. In: Computing science, communication and security: first international confer-
ence, COMS2 2020. Gujarat, India, March 26–27, 2020, Revised Selected Papers 1. Springer
Singapore, pp 277–289
19. Yao S, Wei M, Yan L, Wang C, Dong X, Liu F, Xiong Y (2020, Aug) Prediction of crime
hotspots based on spatial factors of random forest. In: 2020 15th International conference on
computer science & education (ICCSE). IEEE, pp 811–815
20. Krysovatyy A, Lipyanina-Goncharenko H, Sachenko S, Desyatnyuk O (2021) Economic crime
detection using support vector machine classification. In: MoMLeT+ DS, pp 830–840
21. Deepak G, Rooban S, Santhanavijayan A (2021) A knowledge centric hybridized approach
for crime classification incorporating deep bi-LSTM neural network. Multimedia Tools Appl
80(18):28061–28085
22. Butt UM, Letchmunan S, Hassan FH, Koh TW (2022) Hybrid of deep learning and exponential
smoothing for enhancing crime forecasting accuracy. PLoS ONE 17(9):e0274172
23. Monika, Bhat A (2022) Automatic twitter crime prediction using hybrid wavelet convolutional
neural network with world cup optimization. Int J Pattern Recogn Artif Intell 36(05):2259005
24. Sahay KB, Balachander B, Jagadeesh B, Kumar GA, Kumar R, Parvathy LR (2022) A real
time crime scene intelligent video surveillance systems in violence detection framework using
deep learning techniques. Comput Electr Eng 103:108319
25. Abbass Z, Ali Z, Ali M, Akbar B, Saleem A (2020, Feb) A framework to predict social crime
through twitter tweets by using machine learning. In: 2020 IEEE 14th International conference
on semantic computing (ICSC). IEEE, pp 363–368
26. Nikhila MS, Bhalla A, Singh P (2020, July) Text imbalance handling and classification for cross-
platform cyber-crime detection using deep learning. In: 2020 11th international conference on
computing, communication and networking technologies (ICCCNT). IEEE, pp 1–7
27. Vyawahare M, Chatterjee M (2020) Taxonomy of cyberbullying detection and prediction
techniques in online social networks. In: Data communication and networks: proceedings of
GUCON 2019. Springer Singapore, pp 21–37
28. Boukabous M, Azizi M (2022) Crime prediction using a hybrid sentiment analysis approach
based on the bidirectional encoder representations from transformers. Indones J Electr Eng
Comput Sci 25(2):1131–1139
29. Kadar C, Maculan R, Feuerriegel S (2019) Public decision support for low population
density areas: An imbalance-aware hyper-ensemble for spatio-temporal crime prediction. Decis
Support Syst 119:107–117
An Approach Towards Modification of Playfair Cipher Using 16 × 16 Matrix

Sarmistha Podder, Aditya Harsh, Jayanta Pal, and Nirmalya Kar

Abstract The Playfair cipher is a popular polygraphic encryption. It mathe-
matically secures information by encrypting the message with a key; the same key
is used to change ciphertext digraphs into plaintext digraphs during decryption.
However, only 25 letters can be supported by the original 5 × 5 Playfair cipher.
To overcome this limitation, different modifications of the traditional Playfair cipher
were proposed, but it still fell victim to successful attacks due to its limitations.
In this paper, the proposed method circumvents the limitations of past studies that
used a Playfair cipher with a 5 × 5 matrix, 7 × 4 matrix, or 6 × 6 matrix. The
suggested technique utilizes a 16 × 16 matrix and strengthens the Playfair cipher.
The proposed work is an improvement to the original structure that makes use of
matrix rolling, shifting, and rotation to improve security. It uses lowercase and
uppercase letters, numbers, and special characters to build the matrix's content.

Keywords Encryption · Decryption · Playfair cipher · Matrix rotation · Matrix
shifting · Matrix rolling

S. Podder (B) · J. Pal
Information Technology Department, Tripura University, West Tripura, India
e-mail: sarmisthapodder58@gmail.com
J. Pal
e-mail: jayantapal@tripurauniv.ac.in
A. Harsh · N. Kar
CSE Department, NIT Agartala, Tripura, India
e-mail: nirmalya@ieee.org

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 143
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_13
144 S. Podder et al.

1 Introduction

Data security is more important than ever in the modern world due to the substantial
growth in Internet traffic caused by remote work. To safeguard private information,
sensitive data, and the security of both the sender and receiver, encryption techniques
like cryptography are crucial [1].
Cryptography involves converting plain text into indiscernible text so that it is read-
able only by the intended recipient. It has several applications, including
preventing data theft, user authentication, and ensuring user security. Cryptography
is divided into two categories: symmetric and asymmetric. Classical ciphers like the
Hill and Playfair ciphers have long been staples in the world of cryptography. The
Playfair cipher is one of the most often used cipher algorithms for a variety of reasons.
Careful examination shows that the algorithm is harder for a cryptanalyst to decipher
because each step produces a unique ciphertext, and it resists brute-force attacks: it
is incredibly difficult to decode the cipher without the key. It also makes substitution
simple [2].
The traditional Playfair cipher, however, is less trustworthy in the present world
due to technological advancements. Classical Playfair ciphers produce encrypted
data that is simple to decode and offers very little security. The traditional Playfair
encryption has already undergone numerous modifications that increase security over
the original cipher.
In this paper, we also suggest a variation that primarily relies on a lightweight
cryptography technique. It offers secure solutions for constrained resources in a
network while using less memory, less computing power, and less energy. To add a
further degree of protection, we apply rotation, shifting, and rolling on top of the
standard confusion and diffusion techniques. We also correct the Playfair cipher's
fundamental flaw, its restriction to only 25 characters: the proposed method uses a
16 × 16 matrix rather than a 5 × 5 matrix to take into account all of the lowercase
and uppercase letters, the digits, and many symbols [3].

2 Literature Review

2.1 Traditional Play Fair Algorithm

The Playfair cipher is a traditional encryption method that was developed in 1854
by Charles Wheatstone but named after his friend Lord Lyon Playfair, who promoted
its use. It encrypts plaintext messages using a polygraphic substitution technique.
Every letter of the alphabet is employed in the method's 5 × 5 letter matrix, known
as a Playfair square, with the exception of "J," which is merged with "I." The
keyword is first entered into the matrix, followed by the remaining letters in order.
Each pair of two letters from the plaintext message is then
An Approach Towards Modification of Playfair … 145

Table 1 Steps of traditional Playfair cipher

Ref. | Process | Description
[4] | Generating matrix | (i) A 5 × 5 grid of alphabets known as the key square is the encryption key for the plaintext. (ii) One letter of the alphabet, typically J, is left out of the table (the letter I may be used in its place), and each of the 25 letters must be distinct
[5] | Encryption | (i) The plaintext is split into pairs of letters (digraphs). (ii) If both letters are in the same row of the key square, replace each letter with the letter to its right (wrapping around to the left side of the row if necessary). (iii) If both letters are in the same column of the key square, replace each letter with the letter below it (wrapping around to the top if necessary). (iv) If neither condition is true, use the two letters to make a rectangle and swap out each letter with the letter in the opposite corner
[6] | Decryption | (i) If two letters are in the same row, replace them with the letters on their left; a letter at the beginning of a row is replaced by the letter at the end of the same row. (ii) If two letters are in the same column, replace them with the letters above; a letter at the top is replaced by the letter at the bottom of the column. (iii) If the two letters share neither a row nor a column, imagine a rectangle with the letters at opposite corners and take the letters at the remaining corners
encrypted according to a set of criteria. If the letters are not in the same row or
column, they are replaced by the letters at the corners of the rectangle the two letters
form. The recipient receives the generated ciphertext and uses the identical Playfair
square to decrypt the message. The conventional Playfair cipher is a straightforward
but efficient encryption method; however, it is susceptible to several attacks, such as
frequency analysis and known-plaintext attacks (Table 1).
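The encryption rules summarized above can be sketched in Python; this is an illustrative implementation rather than any cited author's code, using the conventional I/J merge and "X" as the padding letter.

```python
# Traditional 5x5 Playfair encryption (illustrative sketch).
def key_square(key):
    seen = []
    for ch in key.upper().replace("J", "I") + "ABCDEFGHIKLMNOPQRSTUVWXYZ":
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return seen  # 25 letters in row-major 5x5 order

def digraphs(text):
    letters = [c for c in text.upper().replace("J", "I") if c.isalpha()]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else "X"
        if a == b:                       # split doubled letters with padding
            pairs.append(a + "X"); i += 1
        else:
            pairs.append(a + b); i += 2
    return pairs

def encrypt(plain, key):
    sq = key_square(key)
    pos = {c: (i // 5, i % 5) for i, c in enumerate(sq)}
    out = []
    for a, b in digraphs(plain):
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:          # same row: letters to the right
            out.append(sq[ra * 5 + (ca + 1) % 5] + sq[rb * 5 + (cb + 1) % 5])
        elif ca == cb:        # same column: letters below
            out.append(sq[((ra + 1) % 5) * 5 + ca] + sq[((rb + 1) % 5) * 5 + cb])
        else:                 # rectangle: swap the columns
            out.append(sq[ra * 5 + cb] + sq[rb * 5 + ca])
    return "".join(out)

print(encrypt("hide the gold", "playfair example"))
```

Decryption inverts each rule: left instead of right, above instead of below, and the same rectangle swap.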

2.2 Limitation of Traditional Playfair Cipher

The conventional Playfair cipher uses an alphabetic 5 × 5 matrix and a polygraphic
substitution cipher that works on pairs of letters. Nevertheless, it is susceptible to
attacks because of several flaws: a constrained key space, susceptibility to known-
plaintext attacks and frequency analysis attacks, a lack of authentication, and a
constrained character set. The key space may seem quite big, but it is still small
in comparison with most contemporary encryption techniques, leaving the cipher
open to brute-force attacks. The Playfair cipher is additionally unprotected against
tampering attempts since it lacks authentication and integrity verification. Due to
these restrictions, the classic Playfair cipher is typically considered less safe compared to

contemporary encryption techniques, and it is ineffective for encrypting messages
that contain characters outside the restricted character set [7].

2.3 Modifications on Traditional Playfair Algorithm

To improve the security and dependability of the traditional Playfair cipher in the
present day, numerous modifications can be made [8]. The main focus of each change
is the security it offers against various threats.
The traditional Playfair cipher can be improved in the following ways to increase
its security (Table 2).
In the recent past, a few approaches were proposed that include some of the modi-
fications listed in Table 2. After performing a comparative analysis of those
methods, the table below lists modifications to the Playfair cipher along with their
complexity and drawbacks. The modifications are listed in the first column, while
the second column describes the complexity of each modification, ranging from low

Table 2 Ways of modifying the traditional Playfair cipher

Ref. | Modification | Description
[9] | Key management | The encryption key used in the classic Playfair cipher is a 5 × 5 matrix of letters shared by the sender and the recipient. If this key is compromised, messages can be decoded. The security of the cipher can be considerably improved by altering the key management procedure, for example by employing a more powerful key generation method or a bigger key space
[10] | Multiple rounds of encryption | Applying the encryption algorithm to the plaintext message numerous times is one method for enhancing the security of the Playfair cipher. Several rounds of encryption increase the complexity and decryption difficulty of the ciphertext, increasing its security
[11] | Substitution and transposition | The plaintext message is encrypted using substitution alone in the classic Playfair cipher. Adding a transposition phase makes the resulting ciphertext more complex and more challenging for attackers to decrypt
[12] | Variable matrix size | Another change is substituting a variable matrix size for the conventional 5 × 5 matrix. Increasing the matrix size expands the key space, strengthening the cipher's security
An Approach Towards Modification of Playfair … 147

Table 3 Study of various modified Playfair algorithm types

Ref. | Modification | Complexity | Drawbacks
[13] | Uses a rectangular matrix of size M × N; applies a series of substitution and transposition steps using the rectangular matrix | Medium: the rectangular matrix with substitution and transposition gives average time complexity | Although difficult, it can be broken; limited key space; more time-consuming and resource-intensive
[14] | Uses random swap patterns and rotation to make the cipher more secure | Medium (uses a 6 × 6 matrix): can range from O(n²) to O(k) or O(n), where n is the number of rows | Rotation and random swaps lead to a larger number of possible keys that are difficult to manage and distribute securely
[15] | Uses an 8 × 8 matrix combined with an LFSR; circular shift rule, alternate pair rule | High, as the LFSR requires more hardware resources and power | The LFSR is limited by the size of the register; vulnerable to correlation attacks
[16] | Uses a 4 × 19 cipher matrix with RSA, steganography, and RMPS keyless transposition | Medium: the 4 × 19 matrix makes it slightly slower than the standard 5 × 5; uses steganography, RSA, and RMPS keyless transposition | Limited by the size of the RSA key; complex and computationally intensive
[17] | Uses a 6 × 6 key matrix combined with a block cipher and a transposition technique | Medium: the block cipher adds complexity; the permutation matrix takes O(n!) time | Handles a limited amount of data at a time; vulnerable to matrix attacks

tions listed are Multiple Rounds of Encryption, Variable Matrix Size, Substitution
and Transposition, Key Management, and Modified Playfair Cipher. This table can
be used as a reference to compare the different modifications and their respective
trade-offs (Table 3).

3 Proposed Modification on Playfair Cipher

In this work, we provide a modification based on a 16 × 16 matrix to strengthen
the correctness and compatibility of the Playfair cipher with arbitrary data content.
The technique starts by creating a 16 × 16 matrix using the cipher key, taking the
plain text as input. The matrix is filled with all uppercase and lowercase letters,
the numbers 0–9, arrow symbols, the mathematical symbols +, -, /, and *, and many
other symbols, for a total of 256 characters; we then start the encryption of the plain
text (Figs. 1 and 2).

Fig. 1 Flow chart of proposed modification

Fig. 2 Structure of proposed architecture
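A sketch of this key-matrix construction is given below; treating the 256-symbol alphabet as the full set of 8-bit character codes is an assumption made for illustration, since the paper's exact symbol set is shown only in its figures.

```python
# Build a 16x16 key matrix: key characters first, then the rest of a
# 256-symbol alphabet, with duplicates skipped (illustrative sketch).
def build_matrix(key):
    alphabet = "".join(chr(i) for i in range(256))   # assumed 256-symbol set
    ordered, seen = [], set()
    for ch in key + alphabet:
        if ch not in seen:
            seen.add(ch)
            ordered.append(ch)
    return [ordered[r * 16:(r + 1) * 16] for r in range(16)]

m = build_matrix("playfair")
print(len(m), len(m[0]))   # 16 16
print("".join(m[0][:7]))   # unique key characters come first
```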

3.1 Encryption

Certain steps in the encryption process give the algorithm an additional layer of
security. The stages are explained in detail below:
• Rotation: As the initial stage in encrypting the plain text, the 16 × 16 matrix is
rotated by 90° clockwise. In this procedure, rows become columns and vice versa.
A section of the above matrix showing the rotation process is given in Fig. 3.

Fig. 3 Matrix rotation

The algorithm used to rotate the matrix is shown in the pseudo code below.

Algorithm 1 Matrix Rotation Algorithm

procedure ROT90CLOCKWISE(arr)
    N ← 16
    for j ← 0 to N - 1 do
        for i ← N - 1 down to 0 do
            write(arr[i][j], end ← " ")
        end for
        writeline()
    end for
end procedure
arr1 ← [row 1, row 2, row 3, ..., row 16]
ROT90CLOCKWISE(arr1)

• Shifting: Matrix shifting goes through the elements of each row and shifts them
according to an offset value. To start the shift-row operation, the elements of
the second row of the matrix are moved one (1) position to the right; each
subsequent row is then shifted by an offset derived from its own row number
(Fig. 4).
The pseudo code below demonstrates the algorithm that was used to shift the
matrix.

• Rolling: The third operation, called "matrix roll," shifts an array of items while
iterating across a set of axes; in the suggested method, the roll action raises each
full row by one position (Fig. 5).
The algorithm used to roll the matrix is shown in the pseudo code below. As the
matrix rolls, we preprocess the plain text into digraphs. Then, after utilizing the
matrix to encrypt the digraphs, we check whether the sequence has reached its
end. If it has, we produce the final cipher; otherwise, we return to the stage that
creates the 16 × 16 matrix using the key.

Algorithm 2 Matrix Shifting Algorithm

procedure CIRCSHIFT(arr, st)        ▷ circular right shift of a row by st
    st ← st mod len(arr)
    arr ← arr[-st :] + arr[: -st]
end procedure
procedure SHIFTROWS(m)
    MTRANS(m)                        ▷ transpose, as in the original listing
    CIRCSHIFT(m[1], 4)
    CIRCSHIFT(m[2], 3)
    CIRCSHIFT(m[3], 2)
    CIRCSHIFT(m[4], 1)
    MTRANS(m)
end procedure
m ← [row 1, row 2, row 3, ..., row 16]
SHIFTROWS(m)
ms ← m
write(ms)

Algorithm 3 Matrix Rolling Algorithm

procedure ROLLMATRIX(ms)
    mr ← np.roll(ms, -1, axis = 0)   ▷ move every row up by one position
    write(mr)
end procedure
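The three operations of Algorithms 1 to 3 can be sketched together in plain Python, shown on a 3 × 3 matrix for readability; the proposed scheme applies them to the full 16 × 16 matrix.

```python
# Rotation, row shifting, and rolling on a small example matrix.
def rotate90(m):                    # clockwise rotation (Algorithm 1)
    return [list(row) for row in zip(*m[::-1])]

def shift_rows(m, offsets):         # circular right shift per row (Algorithm 2)
    out = []
    for row, k in zip(m, offsets):
        k %= len(row)
        out.append(row[-k:] + row[:-k] if k else row[:])
    return out

def roll_up(m):                     # rows move up by one, like np.roll(m, -1, axis=0)
    return m[1:] + m[:1]

mm = [[1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]]
print(rotate90(mm))                 # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
print(shift_rows(mm, [0, 1, 2]))    # [[1, 2, 3], [6, 4, 5], [8, 9, 7]]
print(roll_up(mm))                  # [[4, 5, 6], [7, 8, 9], [1, 2, 3]]
```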

Fig. 4 Matrix shifting

Fig. 5 Matrix rolling



3.2 Decryption

As we did during encryption, we take the key and use it to form a 16 × 16 matrix
in order to decrypt the encrypted text. Then we split the cipher text into digraphs
and apply the decryption method, similar to the conventional Playfair cipher,
by employing one digraph at a time. The matrix is then subjected to all three operations,
rolling, shifting, and rotating, in the appropriate order. The procedure is repeated
until all the digraphs have been transformed into plaintext.
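As a concrete illustration of the digraph step used in both encryption and decryption, a minimal splitting helper might look as follows. The padding rules assumed here (a padding character between repeated letters and at the end of odd-length text) are a common Playfair convention; the paper does not spell out its exact padding scheme:

```python
def to_digraphs(text, pad="X"):
    """Split text into two-character digraphs, padding as needed.

    A pad character is inserted between repeated letters and appended
    when the text has odd length (assumed convention).
    """
    out, i = [], 0
    while i < len(text):
        a = text[i]
        b = text[i + 1] if i + 1 < len(text) else pad
        if a == b:            # repeated letter: pad and re-examine b next round
            out.append(a + pad)
            i += 1
        else:
            out.append(a + b)
            i += 2
    return out

print(to_digraphs("HELLO"))  # ['HE', 'LX', 'LO']
```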

4 Result and Analysis

After testing the proposed algorithm with the keyword “playfair” and the
plaintext “cryptography,” we obtain the cipher text below:
Keyword: playfair

The Playfair cipher algorithm is a polygraphic substitution cipher that works with
pairs of letters rather than single letters. To encrypt and decrypt messages, it employs
a 16 × 16 matrix of characters. The matrix is commonly filled with the letters of the
alphabet, while additional characters such as numerals and punctuation
marks are also permitted.

4.1 Brute Force Attack

A cryptographic system can be broken via a brute force attack, which involves
attempting every potential key until the right one is discovered. Although this can
work well against weak systems, it is typically infeasible against strong encryption
with vast key spaces. Using strong encryption and key management procedures,
and capping the number of attempts permitted, can reduce the likelihood of brute
force attacks [18].

4.2 Frequency Analysis Attack

Frequency analysis attack is a method of breaking a cryptographic system by analyzing
the frequency distribution of characters or symbols in the ciphertext. It is most

effective against simple substitution ciphers. To mitigate the risk, more complex
substitution ciphers should be used, and techniques like adding random data or padding
the ciphertext can further disrupt the frequency distribution [19].

4.3 Man-in-the-Middle Attack

A man-in-the-middle (MITM) attack is one where an attacker intercepts and alters
communications between two parties without their knowledge. It can be carried out by
exploiting vulnerabilities or tricking users. To mitigate the risk, cryptographic
protocols like SSL/TLS can be used, and user education can help prevent phishing and
untrusted network connections [20].

The Playfair cipher algorithm encrypts data using a number of operations, namely
matrix rotation, matrix rolling, and matrix shifting. By making the encryption process
more complex, these operations contribute to improving the security of the cipher. The
analysis shows that applying them raises the algorithm’s resistance against
attacks such as brute force, man-in-the-middle, and frequency analysis: the attacker
would not gain any significant information in a reasonable amount of time.
The number of distinct keys that can be generated by the matrix determines the size
of the key domain. A square matrix that is filled with alphabetic or other characters
serves as the Playfair cipher’s key.
The matrix has 16 rows and 16 columns, so there are 16 × 16 = 256 places in
the matrix where characters can be placed. However, as no character can appear twice
in the matrix, the first character of the key can be any of the 26 letters of the
alphabet, and each succeeding character can only be one of the remaining 25 letters.
As a result, the number of keys that could be used with a 16 by 16 Playfair cipher
matrix is 26 × 25^255, which is roughly 7.8 × 10^357.
This indicates that a 16 by 16 Playfair cipher has a very vast key space, making it
challenging for an attacker to find the right key using brute force attacks.
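The magnitude of the key-space estimate 26 × 25^255 can be checked directly with logarithms:

```python
from math import log10

# Base-10 exponent of the key-space estimate 26 * 25**255
digits = log10(26) + 255 * log10(25)
print(f"26 * 25**255 = 10**{digits:.1f}")  # 10**357.9, i.e. about 7.8e357
```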

5 Conclusion

This paper examined the drawbacks of the original Playfair cipher. It also showed
different elements that may be used to modify the basic structure of the Playfair cipher
to enhance security. Based on these key elements, the modifications proposed by
researchers to the general Playfair cipher were analyzed, and their drawbacks were
listed to guide further enhancement. After analyzing the various changes made to the
Playfair cipher, this paper proposed a series of modifications to enhance the security
level, including enlarging the dimension of the key matrix. The proposed structure
improved the conventional Playfair cipher and strengthened security against brute

force attacks, dictionary attacks, chosen plaintext/ciphertext attacks, and known plaintext
attacks. The new Playfair cipher approach uses a 16 × 16 matrix that contains
all the printable extended ASCII values, making it more difficult to crack.

References

1. Agrawal G, Singh S, Agarwal M (2011) An enhanced and secure playfair cipher by introducing
the frequency of letters in any plain text. J Curr Comput Sci Technol 1(3):10–16
2. Ferrer JCC, Guzman FED, Gardon KLE, Rosales RJR, Dell Michael Badua DA, Marcelo
DR (2018) Extended 10 x 10 playfair cipher. In: 2018 IEEE 10th international conference on
humanoid, nanotechnology, information technology, communication and control, environment
and management (HNICEM), pp 1–4 . https://doi.org/10.1109/HNICEM.2018.8666250
3. Sastry VU, Shankar NR, Bhavani SD (2009) A modified playfair cipher involving interweaving
and iteration. Int J Comput Theory Eng 1(5):597
4. Basu S, Ray UK (2012) Modified playfair cipher using rectangular matrix. Int J Comput Appl
46(9):28–30
5. Tunga H, Saha A, Ghosh A, Ghosh S (2014) Novel modified playfair cipher using a square-
matrix. Int J Comput Appl 101(12):16–21
6. Goyal P, Sharma G, Kushwah SS (2015) Network security: a survey paper on playfair cipher
and its variants. Int J Urban Des Ubiquitous Comput 3(1):9
7. Dhenakaran S, Ilayaraja M (2012) Extension of playfair cipher using 16 x 16 matrix. Int J
Comput Appl 48(7)
8. Maha MM, Masuduzzaman M, Bhowmik A (2020) An effective modification of play fair cipher
with performance analysis using 6 x 6 matrix. In: Proceedings of the international conference
on computing advancements, pp 1–6
9. Murali P, Senthilkumar G (2009) Modified version of playfair cipher using linear feedback
shift register. In: 2009 international conference on information management and engineering,
pp 488–490. https://doi.org/10.1109/ICIME.2009.86
10. Villafuerte RS, Sison AM, Medina RP (2019) i3d-playfair: an improved 3d playfair cipher algo-
rithm. In: 2019 IEEE Eurasia conference on IOT, communication and engineering (ECICE),
pp 538–541. https://doi.org/10.1109/ECICE47484.2019.8942655
11. Sannidhan MS, Sudeepa KB, Martis JE, Bhandary A (2020) A novel key generation approach
based on facial image features for stream cipher system. In: 2020 third international conference
on smart systems and inventive technology (ICSSIT), pp. 956–962. https://doi.org/10.1109/
ICSSIT48917.2020.9214095
12. Alam AA, Khalid BS, Salam CM (2013) A modified version of playfair cipher using 7 x 4
matrix. Int J Comput Theory Eng 5(4):626
13. Patil R, Bang SV, Bangar RB (2021) Improved cryptography by applying transposition on
modified playfair algorithm followed by steganography. Int J Innov Sci Res Technol 6(5):616–
620
14. Hans S, Johari R, Gautam V (2014) An extended playfair cipher using rotation and random
swap patterns. In: 2014 international conference on computer and communication technology
(ICCCT). IEEE, pp 157–160
15. Srivastava SS, Gupta N (2011) A novel approach to security using extended playfair cipher.
Int J Comput Appl 20(6):0975–8887
16. Chauhan SS, Singh H, Gurjar RN (2014) Secure key exchange using rsa in extended playfair
cipher technique. Int J Comput Appl 104(15)
17. Babu KR, Uday Kumar S, Vinay Babu A, Aditya I, Komuraiah P (2011) An extension to
traditional playfair cryptographic method. Int J Comput Appl 17(5):34–36

18. Kaur A, Verma HK, Singh RK (2013) 3d-playfair cipher using lfsr based unique random number
generator. In: 2013 sixth international conference on contemporary computing (IC3). IEEE,
pp 18–23
19. Chand N, Bhattacharyya S (2014) A novel approach for encryption of text messages using
play-fair cipher 6 by 6 matrix with four iteration steps. Int J Eng Sci Innov Technol (IJESIT)
3:478–484
20. Yousif MS, Salih RK, Alsaidi NMG (2019) A new modified playfair cipher. In: AIP conference
proceedings, vol 2086. AIP Publishing LLC, p 030047
Real and Fake Job Classification Using
NLP and Machine Learning Techniques

Manu Gupta, Naga Sridevi Piratla, Sripath Kumar Chakrapani,
and Yeshwanth Pasem

Abstract Detecting fraudulent job listings is a critical challenge in the recruiting
industry. This paper presents a machine learning model for classifying real or fake job
postings. The study begins with Exploratory Data Analysis (EDA) to gain insights
into the multi-class classification of different attributes and understand their relation-
ships. Next, data preprocessing techniques, including natural language processing
(NLP), are employed to prepare the datasets for training and testing. Various machine
learning methods such as K-Nearest Neighbors (KNN), Support Vector Machine
(SVM), AdaBoost, Random Forest, Naive Bayes, and Logistic Regression are applied
to classify job postings as real or fake. Performance evaluation measures including
accuracy, precision, recall, F1-score, selectivity, and specificity are computed to
assess the classifier’s effectiveness. The results demonstrate that the proposed model
achieves an impressive accuracy of 99.2% in classifying job information using the
random forest classifier. This highlights the robustness and efficacy of the model
in accurately identifying fraudulent job listings, providing a valuable tool for job
seekers, employers, and recruitment platforms in maintaining the integrity of the job
market.

Keywords Fraudulent job posting · Machine learning · EDA · NLP · Data


preprocessing · Multi-class classification

1 Introduction

The recruitment industry plays a crucial role in matching job seekers with available
job opportunities. However, with the rise of the digital age, the recruitment process
has shifted online, making it easier for fraudulent job postings to go undetected.
This problem has serious consequences, including financial losses, identity theft,
and reputational damage for companies whose brands are used in these scams [1].

M. Gupta (B) · N. Sridevi Piratla · S. K. Chakrapani · Y. Pasem
Department of ECM, Sreenidhi Institute of Science and Technology, Hyderabad, India
e-mail: manugupta5416@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 155
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_14
156 M. Gupta et al.

Therefore, it is crucial to distinguish between real and fake job postings to protect job
seekers and businesses. To address this issue, traditional approaches to job posting
analysis have relied on manual inspection and verification by human experts. This
method is time-consuming, expensive, and prone to errors. However, advances in
machine learning and natural language processing have opened up new possibilities
for detecting fake job postings more efficiently and accurately [2]. The application
of machine learning to fake job detection offers several advantages over traditional
methods. Firstly, it allows for the analysis of large datasets in a relatively short amount
of time. Secondly, it can learn from patterns in data, access/control the features of
the dataset, and adapt to new types of fake job postings, making it a more robust and
scalable solution. Thirdly, it can reduce the reliance on human experts, saving time
and money while increasing accuracy.
The objective of this work is to develop a classification model using NLP and
machine learning techniques to accurately differentiate between real and fake job
postings. By leveraging the power of NLP and advanced algorithms, the project
aims to provide job seekers, employers, and recruitment platforms with a reliable
tool to identify and mitigate the risks associated with fraudulent job listings. The
project also aims to contribute to the advancement of fraud detection methodologies
and create a more secure and trustworthy job market environment. Additionally,
the multi-class classification of job postings based on industry, type of employment,
experience, and job function is also performed.
To achieve the objective of the proposed work, data preprocessing is first
performed on the input dataset to verify that the data is loaded correctly and to handle
missing or unspecified values. Following this, Exploratory Data Anal-
ysis (EDA) [3] is performed on data to visualize the input and target features graph-
ically. Next, natural language processing (NLP) [4] is used to obtain real and fake
words by tokenization and by removing stop words such as ‘a’, ‘an’, ‘the’, as they do
not carry much meaning. Further, the data is split into training and testing and machine
learning methods namely KNN, Random Forest, Naive Bayes, Logistic Regression,
SVM, and AdaBoost are implemented for real and fake job classification.
The remainder of this paper is structured as follows: Sect. 2 gives a literature survey
of previous related works, Sect. 3 presents the architecture of the proposed model, the
results are discussed in Sect. 4, and Sect. 5 gives the conclusion and future scope of this study.

2 Literature Survey

Several studies have applied machine learning algorithms to detect fraudulent job
postings. Sultana et al. [5] experimented with machine learning algorithms (SVM,
KNN, Naive Bayes, Random Forest, and Multilayer Perceptron) and Deep Neural
Network for fake job prediction. This method achieved an accuracy of 97.7% using
Deep Neural Networks. Reddy et al. [6] also experimented with machine learning
algorithms and obtained the highest accuracy of about 98% with the random forest
classifier. Several machine learning algorithms, including logistic regression, KNN
Real and Fake Job Classification Using NLP and Machine Learning … 157

classifier, random forest algorithm, and bidirectional LSTM algorithm, are used in the
study by Anita et al. [7] to categorize and distinguish fake jobs and real jobs. Dutta and
Bandyopadhyay [8] attained maximum accuracy of 98.27% for fraudulent posts with
random forest classifier. Choudhury and Acharjee [9] used random forest classifier for
job classification and attained accuracy of 98.27%. In the work presented by Nandini
et al. [10] the supervised classifiers Naive Bayes, SVM, Logistic Regressor, and
Random Forest were employed for fraudulent job predictions. This model achieves
a maximum accuracy of 97% using the random forest classifier. Kumar [11] implemented
a series of experiments with unidirectional and bidirectional RNN architecture using
GRU cells. They obtained an accuracy of 97.4%.
Although several methods are proposed for the classification of real or fake job
postings, these methods have several limitations. Firstly, some of the methods rely
solely on lexical and syntactic features of the job posting text, which may not capture
the semantic meaning of the text accurately. This can lead to misclassification of job
postings and inaccurate results. Additionally, previous works have considered a
dataset with a very small number of fake jobs (~4% of the dataset) compared to
the real jobs, which leads to underfitting and hence inaccurate results. Moreover, the
existing methods use limited or incomplete feature sets, which may not capture all the
relevant information present in the job posting data. This may result in poor accuracy
and high rates of false positives or false negatives. From the gaps identified from
existing studies, it is found that there is a need for more advanced and sophisticated
techniques for the classification of real or fake job postings that can overcome the
above-mentioned limitations and provide accurate and reliable results.

3 Proposed Model

The proposed method for classifying and predicting real or fake job postings involves
several steps as shown in Fig. 1.

3.1 Dataset

The dataset utilized in this study is collected from Kaggle [12]. It consists of approx-
imately 17,880 job postings with several attributes including job ID, title, profile of
organization, description, requirements, reimbursements, place, department, income
range, travel, logo of company, type of job, fraudulent, experience, and education,
etc. The parameter ‘fraudulent’ takes values only 0 and 1. The value 0 indicates
non-fraudulent job postings and 1 indicates fraudulent job postings. This dataset has
been collected from various job posting websites and it contains a combination of
real and fake job postings. The dataset is unbalanced, with only about 5% of the
postings being labeled as fraudulent. The job postings in the dataset have a diverse
set of features, making it challenging to classify them accurately. In addition to the

Fig. 1 Workflow of the proposed model

textual data, the dataset also contains several categorical and numerical attributes,
such as location, industry, salary range, and employment type, which could provide
additional information for classification. Largely, the dataset is a valuable resource
for developing machine learning models to identify fraudulent job postings.

3.2 Preprocessing of Data

Preprocessing of data is the process of transforming the raw input data into a format
that can be effectively utilized by machine learning algorithms. In this research, data
preprocessing comprises several steps, including data cleaning, feature selection, and
feature engineering. The cleaning step involves removing duplicate entries and any
rows with missing or null values. In feature selection, irrelevant or redundant features
are eliminated. For instance, the job_id feature, being unique for each job posting,
does not contribute meaningful information to the classification task. Similarly, the
location feature may have limited relevance, as fake job postings can originate from
anywhere globally. Feature engineering involves creating new features based on
existing ones, which can enhance the classification algorithm’s performance. For
example, a feature counting the number of commonly used keywords in fraudulent
job postings can be created from the job description.

3.2.1 Exploratory Data Analysis (EDA)

It is the process of examining and visualizing the dataset to understand its structure,
patterns, and relationships between the variables. The main objective of EDA [3] is
to gain insights into the data that can be used to guide subsequent data processing
and analysis. The descriptive statistics involving central tendency and dispersion for
various features are computed to gain understanding of dataset and identify poten-
tial outliers. EDA also involve data visualization techniques such as histograms,
scatter plots, and heatmaps. This helps to identify the patterns and correlation among
parameters in the dataset. The features which are relevant and improve accuracy
of classification are used for further stages. In proposed work EDA is performed
to assist in attaining the multi-class classification for the job postings on basis of
industry and employment type.
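This kind of per-category summary can be obtained with a few lines of Python; the miniature records below are hypothetical stand-ins for rows of the actual Kaggle dataset used in the study:

```python
from collections import Counter

# Hypothetical miniature records: (industry, fraudulent flag 0/1)
postings = [
    ("IT", 0), ("IT", 0), ("Accounting", 1),
    ("Accounting", 1), ("IT", 1), ("Marketing", 0),
]

# Count real and fake postings per industry
counts = Counter((industry, label) for industry, label in postings)
for (industry, label), n in sorted(counts.items()):
    kind = "fake" if label else "real"
    print(f"{industry}: {n} {kind}")
```

In practice the same counts would be drawn from the full dataset and visualized as the bar charts in Figs. 2 and 3.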

3.2.2 Natural Language Processing (NLP)

Natural language processing (NLP) is a sub-field of artificial intelligence and is
applied in the proposed work with the aim of processing the textual data in the input dataset.
This entails procedures like tokenization, stop word removal, stemming, and feature
extraction. Tokenization involves splitting the text into individual words or tokens.
Stop word removal involves removing common words that do not add much meaning
to the text, such as ‘the’, ‘and’, ‘a’. Stemming involves reducing words to their
base form, such as ‘working’ to ‘work’, ‘walked’ to ‘walk’. NLP techniques
support in extracting valuable information from the textual data, such as identifying
common words and phrases in fraudulent job postings or finding patterns in the job
requirements or qualifications that are more common in genuine job postings.
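These steps can be sketched as follows. This is a deliberately simplified, self-contained version: the stop-word list is a tiny illustrative subset and `simple_stem` is a naive stand-in for a real stemmer, whereas the actual pipeline uses NLTK's tokenizer, stop-word corpus, and PorterStemmer:

```python
import re

STOP_WORDS = {"the", "and", "a", "an", "to", "of", "in"}  # tiny illustrative list

def simple_stem(word):
    """Naive suffix stripping, a stand-in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())          # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [simple_stem(t) for t in tokens]               # stemming

print(preprocess("Working from the office and walked daily"))
# ['work', 'from', 'office', 'walk', 'daily']
```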

3.2.3 Encoding Categorical Data

The third step involves encoding the categorical data to quantitative numerical values.
Categorical data refers to data that is not numerical in nature. As most machine
learning algorithms operate on numerical data, this step is significant. The presented
dataset contains categorical characteristics for factors like employment type, requisite
experience, and education. The one-hot encoding method is used in the proposed work to
convert categorical values into numerical values [13]. This preprocessed dataset is
then used to train and test various machine learning algorithms for
classifying real and fake job postings.
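A minimal illustration of one-hot encoding, shown here in plain Python (in practice libraries such as scikit-learn's `OneHotEncoder` or pandas' `get_dummies` provide this directly):

```python
def one_hot(values):
    """Map each categorical value to a binary indicator vector."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return categories, [
        [1 if index[v] == i else 0 for i in range(len(categories))]
        for v in values
    ]

# Example: encoding an employment-type column
employment = ["Full-time", "Part-time", "Full-time", "Contract"]
cats, encoded = one_hot(employment)
print(cats)     # ['Contract', 'Full-time', 'Part-time']
print(encoded)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```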

3.3 Tools and Libraries

The proposed model for real and fake job classification is developed in python.
The libraries used for data analysis are Pandas and Numpy. Matplotlib and Seaborn
libraries are used for data visualization. Scikit-learn library is utilized for feature
extraction and modeling. Natural language toolkit (NLTK) is used to perform text
cleaning, tokenization, and stemming.

3.4 Machine Learning Techniques

The machine learning techniques used in proposed model to carry out the task of
classification of real and fake jobs are Naïve Bayes (NB), Logistic Regression (LR),
K-Nearest Neighbor (KNN), Random Forest (RF), Support Vector Machine (SVM),
and AdaBoost classifiers (AB) [14–19]. These algorithms are selected as they are
robust to over-fitting, can handle huge datasets with a large number of features, and are
effective in separating data that is not linearly separable. The ensemble architecture
of random forest improves the accuracy and stability of the model and can handle
outliers, noise and missing values effectively. AdaBoost is another ensemble method
that combines several weak classifiers to increase the model’s performance. It is also
helpful for determining the key features that are most crucial for the classification
task.

4 Results

The original dataset [12] has been modified by increasing the proportion of fake jobs
from 5 to 15% by adding more real and fake jobs into the dataset from various sources
[20] to prevent the problem of underfitting. The number of fake jobs is increased
from 866 to 3044, while the real jobs remain the same. From this dataset, 80% (15,800
jobs) is used for training and 20% (3900 jobs) for testing. The
metrics accuracy, recall, precision, F1-score, selectivity, and sensitivity are computed
for evaluating the performance of developed model [21]. The qualitative and quan-
titative results obtained from the developed model are described in the following
sub-sections.
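For reference, these evaluation measures can be computed directly from confusion-matrix counts; the sketch below (plain Python, treating fake jobs as the positive class) mirrors the standard definitions:

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, and specificity (fake = 1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }

m = metrics([1, 0, 1, 0, 0, 1], [1, 0, 0, 0, 1, 1])
print(round(m["accuracy"], 3))  # 0.667
```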

4.1 Qualitative Results

The results obtained for real and fake jobs classification for multiple classes including
industries, employment types, experience, and job functions after performing EDA

Fig. 2 Number of real and fake jobs in various industries

Fig. 3 Number of real and fake jobs in various employment types

on the dataset are shown in Figs. 2, 3, 4 and 5. As observed from Fig. 2, the largest
number of fake jobs is generated from the accounting industry, while most real jobs are
from the information technology (IT) industry. The results in Fig. 3 demonstrate that the
number of both real and fake jobs is highest in full-time employment. In Fig. 4, it can be
observed that the largest number of fake jobs is generated from the entry-level experience
requirement, and most real jobs are from the mid-senior level. Furthermore, Fig. 5
illustrates that administrative functions have more fake jobs and IT has more real jobs.

4.2 Quantitative Results

The quantitative results obtained from various machine learning classifiers for fake
job prediction are described here. To optimize the performance of the algorithms,
parameter tuning was conducted on various aspects such as adjusting the number of
trees for the random forest classifier and modifying the count of nearest neighbors

Fig. 4 Number of real and fake jobs on basis of experience

Fig. 5 Number of real and fake jobs on basis of job function

for the K-Nearest Neighbors (KNN) algorithm. This iterative process aimed to find
the best combination of parameter values that would yield the most accurate and reli-
able results for classifying real or fake job postings. By experimenting with different
parameter settings, we sought to enhance the models’ effectiveness and ensure that
they were fine-tuned to achieve optimal performance in accurately identifying fraud-
ulent job listings. The performance metrics of Naive Bayes, Logistic Regression,
KNN with K = 5 and K = 15 (where K is the number of neighbors), Random Forest
with T = 50 and T = 100 (where T is the number of trees and the trees used are
classification and regression trees (CART trees)), SVM, and AdaBoost are discussed
hereafter.

4.2.1 Training and Testing Dataset Results

The values of evaluation metrics obtained for training dataset for various machine
learning classifiers are given in Table 1. Additionally, Table 2 gives results for testing
dataset. In these tables: M = Model, A = Accuracy, P = Precision, R = Recall, F1
= F1-Score, SP = Specificity, SE = Selectivity, NB = Naive Bayes, LR = Logistic
Regression, RF = Random Forest, AB = AdaBoost, 0 = Real Jobs, 1 = Fake Jobs.
As observed random forest classifier give best results for training as well as testing
data.

Table 1 Performance metrics of ML models for training data

M            A     P(0)  P(1)  R(0)  R(1)  F1(0)  F1(1)  SP    SE
NB           0.94  0.98  0.75  0.95  0.91  0.96   0.82   0.94  0.91
LR           1.00  1.00  1.00  1.00  1.00  1.00   1.00   1.00  0.99
KNN (K=5)    0.95  0.99  0.78  0.95  0.97  0.97   0.87   0.95  0.97
KNN (K=15)   0.89  0.99  0.59  0.88  0.93  0.93   0.72   0.88  0.93
RF (T=50)    1.00  1.00  1.00  1.00  1.00  1.00   1.00   1.00  1.00
RF (T=100)   1.00  1.00  1.00  1.00  1.00  1.00   1.00   1.00  1.00
SVM          1.00  1.00  1.00  1.00  1.00  1.00   1.00   1.00  1.00
AB           0.89  0.89  0.93  1.00  0.30  0.94   0.45   1.00  1.00

Table 2 Performance metrics of ML models for testing data

M            A     P(0)  P(1)  R(0)  R(1)  F1(0)  F1(1)  SP    SE
NB           0.94  0.98  0.75  0.95  0.88  0.96   0.81   0.94  0.88
LR           0.98  0.99  0.95  0.99  0.96  0.99   0.96   0.98  0.92
KNN (K=5)    0.93  0.99  0.70  0.93  0.96  0.96   0.81   0.92  0.95
KNN (K=15)   0.87  0.98  0.55  0.87  0.90  0.92   0.69   0.86  0.90
RF (T=50)    0.99  0.99  0.99  0.99  0.93  0.99   0.96   0.99  0.96
RF (T=100)   0.99  0.99  0.99  0.99  0.93  0.99   0.96   0.99  0.93
SVM          0.98  0.99  0.93  0.99  0.97  0.99   0.95   0.98  0.96
AB           0.89  0.89  0.87  0.99  0.30  0.94   0.44   0.99  0.93

4.2.2 Real and Fake Words

The frequently occurring words that are identified to be present in multiple real and
fake job descriptions from the proposed model are summarized in Table 3. These
words are obtained using the PorterStemmer method from the NLTK library
and can be visualized using the wordcloud method. Users need to be cautious
while reading the description of a job profile and watch for words suggesting
money, easy work, urgent requirement, cash, make solid, etc.
It can be concluded that the creators of fake job profiles want their target audience
to fall into the trap. They achieve this by using words that convey urgency about the
specific job post to be filled by the company and that make the
audience believe it offers lots of benefits.

Table 3 Frequently occurring words identified from real and fake job descriptions

Real words: develop, analyst, specialist, director, office manager, project manager,
work environment, business process, knowledge, communication skills, etc.
Fake words: custom service, home, earn, daily, want urgent, provide efficiency,
posit earn, posit work, base payroll, cash, work easy, etc.

4.2.3 Comparison of the Proposed Model with the Existing Models

The comparative analysis of the proposed model and existing models for real and
fake job classification is illustrated in Fig. 6. As observed, the proposed
model achieved the highest accuracy of 99% in comparison to the existing methods
[5–11]. The reason for the improved accuracy is that in the proposed study the
dataset is augmented by increasing the number of fake jobs to avoid the problem
of underfitting, and the missing data and incomplete feature set are addressed by
preprocessing the data. Furthermore, the proposed model uses both NLP and machine
learning (ML) techniques for job classification, whereas most of the existing
models use only ML. Additionally, the proposed method
also provides multi-class classification of real and fake job postings on the basis of
industry, employment type, experience, and job function.

Fig. 6 Performance comparison for existing and proposed model



5 Conclusion and Future Scope

The proposed work highlights the significance of NLP and machine learning in
addressing real-world problems such as job classification and fraud detection. The
ability to accurately identify fake job postings can greatly benefit job seekers,
employers, and recruitment platforms alike. It helps in preventing fraudulent activ-
ities, protecting individuals from potential scams, and maintaining the integrity of
the job market. A model for real and fake job classification is presented in this work
using NLP and various machine learning classifiers. The results obtained show that
random forest classifier achieves best results with accuracy of 99.2% and selectivity
value of 96%. The proposed work provides a solid foundation for further research
and development in the field of job fraud detection, and its findings can be utilized
to create more secure and trustworthy job markets in the future. The classification
performance could be further improved by exploring more complex machine learning
models and using extensive datasets.

References

1. Baraneetharan E (2022) Detection of fake job advertisements using machine learning
algorithms. J Artif Intell 4:200–210
2. Amaar A, Aljedaani W, Rustam F, Ullah S, Rupapara V, Ludi S (2022) Detection of fake
job postings by utilizing machine learning and natural language processing approaches. Neur
Process Lett 1–29
3. Schoenberger M (1979) Exploratory data analysis. IEEE Trans Acoust Speech Signal Process
27:563–564
4. Kang Y, Cai Z, Tan CW, Huang Q, Liu H (2020) Natural language processing (NLP) in
management research: a literature review. J Manage Anal 7:139–172
5. Habiba Sultana U, Islam MK, Tasnim F (2021) A comparative study on fake job post prediction
using different data mining techniques. In: 2nd International conference on robotics, electrical
and signal processing techniques (ICREST). IEEE, pp 543–546
6. Reddy YV, Neeraj BS, Reddy KP, Reddy PB (2023) Online fake job advert detection application
using machine learning. J Eng Sci 14
7. Anita CS, Nagarajan P, Sairam GA, Ganesh, P, Deepakkumar G (2021) Fake job detection and
analysis using machine learning and deep learning algorithms. RevistaGeintec-GestaoInovacao
e Tecnologias 11:642–650
8. Dutta S, Bandyopadhyay SK (2020) Fake job recruitment detection using machine learning
approach. Int J Eng Trends Technol 68:48–53
9. Choudhury D, Acharjee T (2023) A novel approach to fake news detection in social
networks using genetic algorithm applying machine learning classifiers. Multimedia Tools
Appl 82:9029–9045
10. Nandini T, Chandrika SG, Mounika P, Kumar VS (2023) Developing a model to detect
fraudulent job postings: fake vs. real. Int J Recent Develop Sci Technol 7:52–59
11. Kumar A (2021) Self-attention GRU networks for fake job classification. Int J Innov Sci Res
Technol 6
12. Dataset from Kaggle. https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction
13. McGinnis WD, Siu C, Andre S, Huang H (2018) Category encoders: a scikit-learn-contrib
package of transformers for encoding categorical data. J Open Sour Softw 3:501
166 M. Gupta et al.

14. Guo G, Wang H, Bell D, Bi Y, Greer K (2003) KNN model-based approach in classification.
In: On the move to meaningful internet systems 2003: CoopIS, DOA, and ODBASE: OTM
confederated international conferences, CoopIS, DOA, and ODBASE 2003. Catania, Sicily,
Italy, November 3–7. Springer Berlin Heidelberg, pp 986–996
15. Vishwanathan SVM, Murty MN (2002) SSVM: a simple SVM algorithm. In: Proceedings of
the 2002 international joint conference on neural networks. IJCNN’02 (Cat. No. 02CH37290),
vol 3. IEEE, pp 2393–2398
16. Rigatti SJ (2017) Random forest. J Insur Med 47:31–39
17. LaValley MP (2008) Logistic regression. Circulation 117:2395–2399
18. Webb GI, Keogh E, Miikkulainen R (2010) Naïve Bayes. Encycl Mach Learn 15:713–714
19. Schapire RE (2013) Explaining adaboost. Empirical inference: festschrift in honor of vladimir
N. Vapnik, pp 37–52
20. Dataset from kaggle. https://www.kaggle.com/datasets/whenamancodes/real-or-fake-jobs
21. Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classifica-
tion tasks. Inf Process Manage 45:427–437
An Improved Privacy-Preserving
Multi-factor Authentication Protocol
for Wireless Sensor Networks

Shaurya Kanwar, Sunil Prajapat, Deepika Gautam, and Pankaj Kumar

Abstract Wireless sensor networks (WSNs) have garnered significant attention


for their ability to collect and transmit data in various applications. However,
ensuring secure communication in WSNs poses significant challenges due to resource
constraints and the inherent vulnerability of WSNs to various attacks. Recently,
Renuka et al. proposed a multi-factor authentication protocol for WSNs. In this
paper, we thoroughly examine Renuka et al.'s protocol and demonstrate several vulnerabilities, namely privileged insider attacks, stolen smart card attacks, and ephemeral secret leakage attacks. In addition, this paper proposes an improved protocol that strengthens communication security in WSNs.

Keywords Cryptanalysis · Design flaws · Authentication · Wireless sensor


networks

1 Introduction

Distributed wireless sensor networks (DWSNs) present a complex challenge


primarily due to the lack of uniformity in network structure and establishment,
resulting in non-optimal placement of sensor nodes in the physical environment.
Sensor nodes are often deployed in an arbitrary manner across the region of interest.
After deployment, these nodes gather data that is subsequently transmitted to a
central base station or sink node (BS) for further processing. The BS serves as a
hub for data collection, performs necessary operations on behalf of the sensors, and
manages the overall traffic within the network. Communication between sensors
occurs through short-range route links. However, in sophisticated and critical appli-
cations such as military and health care, there is a strong need to access real-time

S. Kanwar · S. Prajapat · D. Gautam · P. Kumar (B)


Department of Mathematics, Srinivasa Ramanujan, Central University of Himachal Pradesh,
Dharamshala, (H.P.) 176215, India
e-mail: pkumar240183@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 167
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_15

information directly from nodes within the WSN, which poses a significant chal-
lenge. Since direct retrieval of data from nodes by the BS is not feasible, it is assumed
that real-time data can be accessed by authorized users upon request. Hence, user
authentication has become a crucial area of research in WSNs, aiming to provide
authenticated access to real-time data for authorized individuals [1].
In recent years, there has been a significant rise in the attention and interest within
the research community toward bio-template-based authentication techniques, in
conjunction with passwords [1–3]. Authentication using biometrics in WSNs is more
reliable and safer than older techniques [3]. The advantages of biometrics over traditional schemes are that biometric data cannot be misplaced or forgotten, is highly complicated to imitate, cannot be guessed, and is challenging to forge or distribute [3].
Authentication protocols have emerged as highly effective methods for ensuring trust, reliability, and legitimacy of users at scale. Numerous authentication protocols have been proposed for WSNs and the IoT, yet most are prone to security risks and are unsuitable for practical use. Recently, Amin et al. [6] designed a three-factor authentication protocol for WSNs in the healthcare field. Later, Ali et al. [5] reviewed the Amin et al. [6] protocol and pointed out its weaknesses, namely impersonation, offline password guessing, extraction of confidential information, session key disclosure, and ID guessing attacks. In response to these vulnerabilities, Ali et al. [5] put forward an improved remote user authentication technique based on a three-factor authentication model. Their approach was validated with the Automated Validation of Internet Security Protocols and Applications tool, complemented by a formal analysis using Burrows-Abadi-Needham logic. Subsequently, Mo et al. [4] conducted a thorough analysis of the protocol proposed by Ali et al. [5].
Watro et al. [8] presented a user authentication scheme for WSNs based on RSA and the Diffie-Hellman protocol. As pointed out by Das et al. [9], it suffers from the following weakness: an attacker who obtains the public key can encrypt a session key and message for the user; the user then decrypts the message with the private key and uses the session key, unknowingly carrying out the action the attacker intended. Meanwhile, Wong et al. [7] gave an efficient password-based technique, which was vulnerable to login identity and stolen-verifier attacks; thereafter, Das et al. [9] improved this password-based scheme. However, the improved protocol could not withstand denial-of-service (DoS) and node capture attacks.
He et al. [10] later remedied weaknesses of the Das et al. [9] protocol. Thereafter, Vaidya et al. [11] pointed out flaws in the Das et al. [9] protocol, namely the stolen smart card attack, and gave a two-factor user authentication technique for WSNs. Chen and Shih [12] also showed that Das et al. [9] failed to attain mutual authentication. To address the aforementioned imperfections, Das et al. [13] proposed an effective and efficient password-based technique specifically designed for WSNs, and subsequent enhancements were introduced in [14]. Thakur et al. [15] cryptanalyzed an authentication scheme. Recently, Renuka et al. [16] gave a multi-factor authentication protocol for WSNs. However, we have observed flaws in their design and vulnerabilities to several cryptographic attacks. This has motivated us to construct a more reliable and secure protocol that offers better security in sensor communication networks.

2 Review of Renuka et al. [16] Protocol

The symbols utilized in the technique and their meanings are listed in Table 1 provided
below.

2.1 Registration Phase

During this phase, an authorized user, Ui , has registered with the gateway node (GW),
using a secure channel that entails the following steps:
Step (R1): User Ui first chooses his identity IDi and password PWi and imprints his biometric Bi . Gen(Bi ) = (σi , τi ) is then computed using the Gen(·) function, where σi is the biometric key and τi is the public reproduction (helper) parameter used to restore σi .

Table 1 Symbols and representation

Symbols            Representation
Ui                 The ith user
IDi                The ith user identity
PWi                The ith user password
Bi                 The ith user biometric
Gen(·), Rep(·)     Fuzzy extractor functions
σi                 Bio-key
τi                 Public reproduction (helper) parameter
SCi                Smart card
GW                 Gateway node
SN j               The jth sensor node
Xs                 1024-bit key
IDSN j             Identity of the jth sensor node
DIDi               Pseudo-identity for the user
MKSN j             The master key shared between GW and SN j
Ti                 The ith timestamp
RNUi , RNSN j      Random nonces
h(·)               Hash function

Step (R2): The Ui then computes RPWi = h(IDi ||σi ||PWi ) and sends message
(IDi ,RPWi ) to GW through a secure medium.
Step (R3): On getting a request from Ui for registration, the GW node chooses a
1024-bit secret key X s and assigns a pseudo-identity DIDi to user, and construct
ri = h(IDi ||DIDi ||X s ). Then, GW generates a smart card (SCi ) featuring ri∗ =
ri ⊕ RPWi , DIDi , h(·) and sends this back to Ui . Finally, GW stores (DIDi , IDi )
and safeguards the database by utilizing the master key X s .
Step (R4): On acquiring SCi , the Ui stores additional parameters
{τi Gen(·), Rep(·)} within smart card.
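The registration computations above can be sketched in Python. This is a minimal illustration only: SHA-256 stands in for h(·), the fuzzy-extractor Gen(·) is stubbed with a random bio-key, and all identifiers and values are hypothetical placeholders.

```python
import hashlib
import os
import secrets

def h(*parts: bytes) -> bytes:
    """Hash function h(.) of the protocol (SHA-256 stands in)."""
    return hashlib.sha256(b"||".join(parts)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# --- User side (Steps R1-R2): choose credentials and derive RPWi ---
# Gen(.) is stubbed: a real fuzzy extractor derives (sigma_i, tau_i)
# from a noisy biometric reading B_i.
ID_i, PW_i = b"alice", b"s3cret"
sigma_i = os.urandom(32)            # bio-key sigma_i from Gen(B_i) (stubbed)
RPW_i = h(ID_i, sigma_i, PW_i)

# --- Gateway side (Step R3): master key, pseudo-identity, card value ---
X_s = secrets.token_bytes(128)      # 1024-bit secret key X_s
DID_i = b"DID-0001"                 # pseudo-identity assigned to the user
r_i = h(ID_i, DID_i, X_s)
r_i_star = xor(r_i, RPW_i)          # r_i* = r_i XOR RPWi, written to the card

# r_i is recoverable from the card only with knowledge of RPWi.
assert xor(r_i_star, RPW_i) == r_i
```

The final assertion mirrors the login phase, where the card unblinds ri∗ with the freshly recomputed RPWi∗.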

2.2 Login Phase

A user Ui must go through the following procedures to authenticate the GW and


obtain data in real time from the sensor node SN j :
Step (L1): The Ui first inserts his SCi into the reader and then enters his details IDi , PWi , and Bi∗ .
Step (L2): Then, SCi computes σi∗ = Rep(Bi∗ , τi ) by utilizing the Rep(·) function; SCi then constructs RPWi∗ = h(IDi ||σi∗ ||PWi ) and M1 = ri∗ ⊕ RPWi∗ .
Step (L3): Then, SCi computes K 1 = h(M1 , T1 ) and selects a random number RNUi , where T1 is the fresh timestamp. If Ui wants to obtain the real-time data collected by the node SN j , the SCi computes the ciphertext C1 = EK 1 (DIDi , RNUi , T1 ). Then, SCi finally transmits the message {DIDi , IDSN j , C1 , T1 } to the GW node.

2.3 Authentication and Key Setup Phase

On receiving message (DIDi , IDSN j , C1 , T1 ) from Ui , the GW carries out the following
operations:
Step (A1): When a message is received at time T 2 , GW first ensures the authen-
ticity of the timestamp T 1 by determining if |T 2 − T 1 |≤ ΔT, where ΔT is
the maximum permitted delay of the transmission. If the condition is satisfied,
then GW retrieves Ui real identity IDi concerning DIDi . If the database has the
above-mentioned records, then GW compute K 1∗ = h(h(IDi ||DIDi ||X s ), T1 )
and decrypts C1 using the temporary key. The user’s identification is valid if the
message consists the genuine DIDi and T 1 , otherwise GW terminates the session.
Step (A2): Then, GW constructs the ciphertext C2 = EMKSN j (IDi , IDSN j , RNUi , T1 , T3 ), where MKSN j is the shared master key between GW and SN j , RNUi is the random number, and T3 is the fresh timestamp. Finally, GW transmits the message (IDSN j , C2 ) to SN j .
Step (A3): On receiving (IDSN j , C2 ) at the time T4 , SN j first decrypts C2 with
MKSN j , then checks if identity IDSN j information that has been encrypted is

accurate, and validates the timestamp condition |T4 − T3 |≤ ΔT. If the condition
holds true, indicating the authenticity of the message, SN j proceeds with the
session; otherwise, SN j terminates the session.
Step (A4): SN j now selects a random number RNSN j and computes the session key as SKi j = h(IDi ||IDSN j ||RNUi ||RNSN j ||T1 ||T3 ); additionally, SN j constructs the key K 2 = h(IDi , IDSN j , RNUi ), the ciphertext C3 = EK 2 (IDi , IDSN j , RNSN j ), and Auth1 = h(SKi j ||RNUi ||RNSN j ), and transmits the message (Auth1 , C3 , T5 ) back to Ui .
Step (A5): On receiving (Auth1 , C3 , T5 ), user Ui first verifies |T6 − T5 | ≤ ΔT. If it is satisfied, then Ui constructs the key K 2 = h(IDi , IDSN j , RNUi ) and deciphers C3 . If C3 contains the genuine IDi , IDSN j , then Ui computes SKi∗j = h(IDi ||IDSN j ||RNUi ||RNSN j ||T1 ||T5 ). Hence, the user Ui validates the protocol's legitimacy.

2.4 Password and Biometric Update Phase

Considering that Ui wishes to change his password, he follows these steps:


Step 1: Ui puts his SCi into the card reader, then enters his IDi , old password PWi , new password PWi∗ , and biometric Bi∗ .
Step 2: Now SCi utilizes the Rep(·) function to obtain σi∗ = Rep(Bi∗ , τi ). Then, SCi computes RPWi = h(IDi ||σi∗ ||PWi ) and RPWi∗ = h(IDi ||σi∗ ||PWi∗ ). Finally, it computes ri∗ ⊕ RPWi ⊕ RPWi∗ and exchanges the old parameters with the freshly constructed ones.

3 Security Analysis of Renuka et al. [16] Protocol

3.1 Privileged Insider Attack

In this sort of attack, a privileged individual on the server side, such as an IT manager, uses his or her privileges to access the user's sensitive registration information, such as the identity IDi and the value RPWi = h(IDi ||σi ||PWi ), and then abuses the user's rights. In practice, users often reuse the same identities and passwords across different applications or servers for convenience. If an attacker has access to the pair (IDi , RPWi ), a malicious insider may quickly recover the password, or may simply substitute his own password and biometric and achieve a successful login, because the protocol provides no verification to authenticate the genuine user.

3.2 Stolen Smart Card Attack

In the corresponding protocol, the smart card holds crucial information that is required to advance through the protocol. The SCi contains the information (ri∗ = ri ⊕ RPWi , DIDi , h(·), τi , Gen(·), Rep(·)). If an attacker somehow gains access to the smart card, then the attacker can easily construct the key used in C1 = EK 1 (DIDi , RNUi , T1 ) for encryption and decryption. The attacker can compute K 1 = h(M1 , T1 ), where M1 is easily obtained from the smart card by utilizing the XOR property, and RPWi is extracted via a privileged insider or by inserting his own false information, because no verification is provided to authenticate the genuine user.
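The key-recovery step of this attack can be reproduced concretely. In this sketch SHA-256 stands in for h(·) and all values are hypothetical placeholders; the point is only that ri∗ (from the card), RPWi (from the insider), and the public T1 suffice to rebuild K1.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"||".join(parts)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Setup: values honestly computed during registration (for comparison only).
RPW_i = h(b"alice-RPW")             # leaked through the privileged insider
r_i = h(b"r_i")                     # h(ID_i||DID_i||X_s), unknown to the attacker
r_i_star = xor(r_i, RPW_i)          # read off the stolen smart card
T1 = b"1699999999"                  # timestamp, sent in the clear

K1_legit = h(xor(r_i_star, RPW_i), T1)   # legitimate user's K1 = h(M1, T1)

# Attacker: only r_i_star (card) + RPW_i (insider) + T1 (channel) are needed.
M1_attacker = xor(r_i_star, RPW_i)
K1_attacker = h(M1_attacker, T1)
assert K1_attacker == K1_legit       # C1 can now be decrypted
```

Note that the master key Xs is never needed: the XOR blinding alone protects ri∗ once RPWi leaks.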

3.3 Ephemeral Secret Leakage Attack

An attacker has access to the data transmitted over the insecure channel, which he can use for his own benefit. In the respective protocol, an adversary has access to information such as (DIDi , IDSN j , C1 , T1 , C2 , Auth1 , C3 , T5 ). To compute the session key, an attacker must obtain the inputs (IDi ||IDSN j ||RNUi ||RNSN j ||T1 ||T5 ), where IDSN j , T1 , and T5 are publicly transmitted over the insecure channel, and IDi can easily be extracted through the privileged insider attack. With this information, the attacker could insert his own random number RNUi , or extract it by simply decrypting C1 , and likewise for RNSN j . Hence, the attacker can easily compute the session key and hijack the session.
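Under the assumptions above (IDi leaked by an insider, the nonces recovered from C1 and C3), the session key requires nothing that stays secret from the adversary, as this sketch illustrates. SHA-256 stands in for h(·); all values are hypothetical placeholders.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"||".join(parts)).digest()

# Every input to SK_ij in Renuka et al. is either public (ID_SNj, T1, T5),
# obtainable via the insider attack (ID_i), or recoverable by decrypting
# C1/C3 once K1 is rebuilt (RN_Ui, RN_SNj) -- see Sect. 3.2.
ID_i, ID_SNj = b"alice", b"SN-7"
RN_Ui, RN_SNj = b"nonce-u", b"nonce-s"
T1, T5 = b"t1", b"t5"

SK_legit = h(ID_i, ID_SNj, RN_Ui, RN_SNj, T1, T5)        # honest parties
SK_attacker = h(ID_i, ID_SNj, RN_Ui, RN_SNj, T1, T5)     # same leaked inputs
assert SK_attacker == SK_legit                            # session hijacked
```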

3.4 Design Flaws in Password Update Phase

In the password update phase of the Renuka et al. protocol [16], the user Ui inserts his IDi , password PWi , and bio-template Bi into the smart device. The device then computes the basic parameters, after which Ui is directed to enter the new PWi∗ and Bi∗ and the information on the smart card is updated. Importantly, the smart card never verifies the old credentials but directly writes the user's new credential data to the card. This raises two primary difficulties: (1) password modification after the loss of a smart card, and (2) lack of verification in the password-update process.

4 An Improved Authentication Protocol

Considering the security flaws observed in the protocol proposed by Renuka et al.
[16], which involve insider threats, stolen smart card scenarios, and the potential
exposure of ephemeral secrets, this section introduces our enhanced technique based
on the earlier assessments.

4.1 Registration Phase

During the registration stage, an authorized user Ui registers through a secure channel
with GW. The registration process encompasses a series of steps, which are outlined
as follows:
Step (R1): User Ui needs to choose IDi , PWi and insert biometric parameter Bi
and then performs Gen(Bi ) = (σ i , τ i ).
Step (R2): Then, Ui generates a random number x ∈ Zq , computes RPWi = h(IDi ||PWi ||σi ) ⊕ x, and transmits the message {IDi , RPWi } to GW.
Step (R3): On receiving the request from Ui , GW generates a 1024-bit key X s and assigns a pseudo-identity DIDi to IDi . GW then computes N = h(IDi ||DIDi ||X s ) and Ni = RPWi ⊕ N, stores (DIDi , g, h(·)) in the smart card SCi , and transmits the message {Ni , SCi } back to Ui .
Step (R4): On receiving the message {Ni , SCi } from GW, the user computes RPWi2 = Ni ⊕ σi and RPWi3 = h(IDi ||PWi ||RPWi2 ) and stores {τi , RPWi2 , RPWi3 } in SCi .
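The improved registration can be sketched as follows. This is a minimal illustration under the same assumptions as before: SHA-256 stands in for h(·), Gen(·) is stubbed with a random bio-key, and all names are placeholders. The random blinding value x hides h(IDi ||PWi ||σi) from the gateway, which is what defeats the insider attack of Sect. 3.1.

```python
import hashlib
import secrets

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"||".join(parts)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

# --- User side (Steps R1-R2): blind the credential digest with x ---
ID_i, PW_i = b"alice", b"s3cret"
sigma_i = secrets.token_bytes(32)        # bio-key from Gen(B_i) (stubbed)
x = secrets.token_bytes(32)              # random blinding value, x in Z_q
RPW_i = xor(h(ID_i, PW_i, sigma_i), x)   # the gateway only ever sees this

# --- Gateway side (Step R3) ---
X_s = secrets.token_bytes(128)           # 1024-bit key X_s
DID_i = b"DID-0001"
N = h(ID_i, DID_i, X_s)
N_i = xor(RPW_i, N)

# --- User side (Step R4): local verifier values stored on the card ---
RPW_i2 = xor(N_i, sigma_i)
RPW_i3 = h(ID_i, PW_i, RPW_i2)           # checked at every login (Step L2)
```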

4.2 Login Phase

To authenticate GW and obtain real-time data from SN j , the user Ui must compass
a series of steps given below:
Step (L1): Ui first inserts his SCi into the reader, then enters his ID'i , PW'i , and B'i , and retrieves the biometric key σ'i = Rep(B'i , τi ) by employing the Rep(·) function. Then, Ui computes RPW'i2 = Ni ⊕ σ'i and RPW'i3 = h(ID'i ||PW'i ||RPW'i2 ).
Step (L2): Then Ui verifies RPW'i3 by comparing it with the original value from
SCi as RPW'i3 =? RPWi3 . The procedure should be terminated if this verification
is unsuccessful; otherwise, the user will start the authentication step.
Step (L3): Now, Ui selects a random number RNUi , computes the encryption key K u = h(IDi ||(Ni ⊕ IDi )), and computes the ciphertext C1 = EK u (DIDi , g, RNUi , IDSN j , T1 ). Then, Ui sends the message (DIDi , C1 , T1 ) to GW.
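The local verification of Steps (L1)-(L2) can be sketched as a card-side check that rejects wrong credentials before any message leaves the device. A minimal Python illustration with stubbed registration values (SHA-256 stands in for h(·); all values are placeholders):

```python
import hashlib
import secrets

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"||".join(parts)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def card_login_check(ID: bytes, PW: bytes, sigma: bytes,
                     N_i: bytes, RPW_i3_stored: bytes) -> bool:
    """Steps L1-L2: recompute RPW'_i2, RPW'_i3 and compare with the card."""
    RPW_i2 = xor(N_i, sigma)
    return h(ID, PW, RPW_i2) == RPW_i3_stored

# Card personalisation (values produced by the registration phase, stubbed).
sigma = secrets.token_bytes(32)
N_i = secrets.token_bytes(32)
RPW_i3 = h(b"alice", b"s3cret", xor(N_i, sigma))

assert card_login_check(b"alice", b"s3cret", sigma, N_i, RPW_i3)      # accepted
assert not card_login_check(b"alice", b"wrong", sigma, N_i, RPW_i3)   # rejected
```

This check is exactly what the Renuka et al. protocol lacks, which enabled the attacks of Sects. 3.1 and 3.2.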

4.3 Authentication and Key Establishment Phase

Upon receiving the authentication request (DIDi , C1 , T1 ) from the user, the GW
follows a set of procedures, which are outlined as follows:
Step (A1): Initially, the GW verifies the timestamp T1 by checking if the time at
which the message arrived, denoted as T2 , satisfies the condition |T2 − T1 |≤ ΔT.
This ensures that the time difference between the two timestamps falls within the
permissible range of the longest message transmission delay allowed in the sensor
network. If the condition is satisfied, then GW retrieves the corresponding user Ui 's real identity IDi as per DIDi . If the database contains the requisite records, the GW proceeds to compute K u = h(IDi ||(Ni ⊕ IDi )) and decrypts C1 .
Step (A2): Then GW computes C2 = E MKSN J (IDi , IDSN j , RNUi , T1 , T3 ), where
MKSN j is the master key shared between GW and SN j . Then, GW transmits
(IDSN j , C2 ) message to SN j .
Step (A3): On receiving (IDSN j , C2 ), the sensor node first deciphers C2 by utilizing the master key, then verifies the freshness of the timestamp, |T4 − T3 | ≤ ΔT, and then generates a random number RNSN j .
Step (A4): Hence, SN j computes the session key as SKi j = h(IDi ||IDSN j ||g^RNUi ||g^RNSN j ||g^(RNUi ·RNSN j ) ||T1 ||T5 ). Then, it computes the key K 2 = h(IDi ||IDSN j ||RNUi ) to encrypt C3 = EK 2 (IDi ||IDSN j ||RNSN j ||T5 ) and computes Auth1 = h(SKi j ||RNUi ||RNSN j ). Further, it transmits the message (Auth1 , C3 , T5 ) to the user.
Step (A5): When the user Ui receives (Auth1 , C3 , T5 ), he first checks whether the condition |T6 − T5 | ≤ ΔT is satisfied. If the timestamp condition holds, then Ui computes the key K 2 = h(IDi ||IDSN j ||RNUi ) and decrypts C3 . If C3 contains the authentic identities IDi , IDSN j , then Ui computes SKi∗j = h(IDi ||IDSN j ||g^RNUi ||g^RNSN j ||g^(RNUi ·RNSN j ) ||T1 ||T5 ). Lastly, Ui validates the scheme's legitimacy by verifying Auth1 . If it is satisfied, then Ui agrees to execute the protocol, and SKi∗j is utilized to transfer sensitive data with SN j in future connections.
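The Diffie-Hellman-style key agreement of Steps (A4)-(A5) can be illustrated with toy group parameters. This sketch only demonstrates that both sides reach the same SKij; a real deployment would use a standardised large prime-order group, and the Mersenne prime 2^127 − 1 with generator 5 here is purely illustrative.

```python
import hashlib
import secrets

def h(*parts) -> bytes:
    """h(.) over mixed values (ints and strings), SHA-256 standing in."""
    return hashlib.sha256(b"||".join(str(p).encode() for p in parts)).digest()

# Toy group parameters (illustrative only).
p = 2**127 - 1          # a Mersenne prime
g = 5

ID_i, ID_SNj, T1, T5 = "alice", "SN-7", "t1", "t5"
RN_Ui = secrets.randbelow(p - 2) + 1     # user's ephemeral exponent
RN_SNj = secrets.randbelow(p - 2) + 1    # sensor's ephemeral exponent

# Both sides can form g^RN_Ui, g^RN_SNj and the shared g^(RN_Ui*RN_SNj).
gu = pow(g, RN_Ui, p)
gs = pow(g, RN_SNj, p)
SK_sensor = h(ID_i, ID_SNj, gu, gs, pow(gu, RN_SNj, p), T1, T5)  # Step A4
SK_user   = h(ID_i, ID_SNj, gu, gs, pow(gs, RN_Ui, p), T1, T5)   # Step A5
assert SK_sensor == SK_user
```

Because only the public values g^RNUi and g^RNSNj travel over the channel, an eavesdropper who learns both nonce commitments still cannot form g^(RNUi·RNSNj), which addresses the ephemeral secret leakage of Sect. 3.3.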

4.4 Password and Biometric Update Phase

In the phase where the user intends to modify their password, they are directed to
follow the subsequent steps:
Step (P1): Initially, the user inserts their smart card into the card reader, followed by the entry of their ID'i , PW'i , and B'i . The biometric key σ'i = Rep(B'i , τi ) is then retrieved by utilizing the Rep(·) function. Then compute RPW'i2 = Ni ⊕ σ'i and RPW'i3 = h(ID'i ||PW'i ||RPW'i2 ), and verify RPW'i3 =? RPWi3 . If the condition is met, the user proceeds with the update phase; otherwise, the updating request is terminated.

Step (P2): Then, Ui selects a new PWi^new and Bi^new , uses Gen(Bi^new ) = (σi^new , τi^new ), chooses a fresh random number ai , computes RPWi^new = h(IDi ||PWi^new ||σi^new ) ⊕ ai , and sends {RPWi^new , IDi } to the gateway node GW.
Step (P3): On receiving {RPWi^new , IDi } from Ui , GW validates (IDi , DIDi ) by comparison; if the condition is met, then GW computes N^new = h(IDi ||X s ||DIDi^new ) and Ni^new = RPWi^new ⊕ N^new and sends {Ni^new , SCi } to Ui .
Step (P4): On receiving {Ni^new , SCi } from GW, the user then computes RPWi2^new = Ni^new ⊕ σi^new and RPWi3^new = h(IDi ||PWi^new ||RPWi2^new ) and stores these new parameters (τi^new , RPWi2^new , RPWi3^new ) in the smart card SCi .

5 Security Analysis

5.1 Privileged Insider Attack

If an adversary manages to gain access to the user’s personal data IDi , RPWi during
the registration process, it poses a significant security risk. However, even if an
attacker obtains all the information stored in the smart card, including RPWi2 and
RPWi3 , through a power analysis attack, it would not allow them to deduce the user’s
pseudo-identity DIDi or PWi . This is due to the secure hash function medium and the
integration of genuine user biometric data, which make it impossible for the attacker
to guess or infer the user’s pseudo-identity, password, and biometric.

5.2 Stolen Smart Card Attack

Our proposed protocol involves {τi ,RPWi2 ,RPWi3 } parameter the smart card SCi .
Regardless of whether the smart card is stolen or misplaced, there are as such no
parameters that can allow a malicious activist to obtain or guess user’s secret infor-
mation. Thus, if an attacker obtains the user smart card, he will not be capable of
taking advantage of its principles. The utilization of a secure hash function medium
and the integration of authentic biometric data ensure the impossibility of an attacker
misusing the smart card data. As a result, smart card theft attacks are not possible
using the offered solution.

5.3 Ephemeral Secret Leakage Attack

Taking into consideration the situation where an adversary gains illicit access to the
random numbers RNUi and RNSN j however, these random numbers are utilized in
computing C1 , C2 , and C3 which are encrypted. Thus, although on having access to
random numbers, an attacker still will not be able to compute C1 , C2 , C3 because of

unawareness of encryption key and the secret parameters embedded. As a result, an


attacker would not be able to compute the accurate session key.

5.4 Session Key Verification

In our proposed system, the user validates the session key by verifying the condi-
tion Auth1 = h(SKi j ||RNUi || RNSN j ), which incorporates secret parameters exclu-
sively known to the legitimate entities participating in the protocol. Consequently,
the suggested approach enables the user to authenticate and verify the session key.

5.5 Anonymity and Untraceability

In our proposed protocol, the random numbers RNUi , RNSN j and fresh timestamps integrated within C1 = EK u (DIDi , g, RNUi , IDSN j , T1 ), C2 = EMKSN j (IDi , IDSN j , RNUi , T1 , T3 ), and C3 = EK 2 (IDi ||IDSN j ||RNSN j ||T5 ), which are exchanged during the various phases of the protocol, are individually customized for each session, ensuring their uniqueness and distinctiveness across sessions. Consequently, an adversary is unable to track the activities of the user and sensor nodes. Furthermore, since no identifiable information is transmitted openly through insecure channels, this scheme provides both anonymity and untraceability.

5.6 Offline Password Guessing Attack

If an unauthorized individual intercepts the transmitted messages (DIDi ,C1 , T1 ),


(IDSN j , C2 ), and (Auth1 , C3 , T5 ) over an open channel and somehow manages to
obtain {τi ,RPWi2 ,RPWi3 } from SCi , they may attempt to compute the user’s secret
data. However, the attacker cannot compute sensitive information such as RPWi = h(IDi ||PWi ||σi ) ⊕ x or RPWi3 = h(IDi ||PWi ||RPWi2 ) without correctly guessing the values of IDi , PWi , and Bi simultaneously. This ensures confidentiality, and the proposed scheme is resistant to offline password guessing.

6 Conclusion

In this study, we first scrutinize the multi-factor authentication protocol proposed by Renuka et al. [16], which is primarily used to achieve real-time data access in security-critical wireless sensor networks. We have shown that, despite the formal evidence provided in the Renuka et al. [16] protocol, their technique is still vulnerable to smart card loss attacks, privileged insider attacks, and ephemeral secret leakage attacks, and has a flaw in the password update phase. The findings of our cryptanalysis discourage the practical deployment of these schemes and highlight certain issues in constructing a strong scheme for WSNs. Finally, a new authentication protocol for WSNs is proposed in this paper that achieves all the required security goals.

References

1. Das AK (2015) A secure and efficient user anonymity-preserving three-factor authentica-


tion protocol for large-scale distributed wireless sensor networks. Wireless Pers Commun
82(3):1377–1404
2. Das AK, Chatterjee S, Sing JK (2015) A new biometric-based remote user authentication
scheme in hierarchical wireless body area sensor networks. Adhoc Sens Wirel Netw 28
3. Li CT, Hwang MS (2010) An efficient biometrics-based remote user authentication scheme
using smart cards. J Netw Comput Appl 33(1):1–5
4. Mo J, Hu Z, Lin Y (2020) Cryptanalysis and security improvement of two authentication
schemes for healthcare systems using wireless medical sensor networks. Secur Commun Netw
5. Ali R, Pal AK, Kumari S, Sangaiah AK, Li X, Wu F (2018) An enhanced three factor-based
authentication protocol using wireless medical sensor networks for healthcare monitoring. J
Amb Intell Hum Comput 1–22
6. Amin R, Islam SH, Biswas GP, Khan MK, Kumar N (2018) A robust and anonymous patient
monitoring system using wireless medical sensor networks. Futur Gener Comput Syst 80:483–
495
7. Wong KH, Zheng Y, Cao J, Wang S (2006, June) A dynamic user authentication scheme for
wireless sensor networks. In: IEEE International conference on sensor networks, ubiquitous,
and trustworthy computing (SUTC’06), vol 1. IEEE, p 8
8. Watro R, Kong D, Cuti SF, Gardiner C, Lynn C, Kruus P (2004, Oct) TinyPK: securing sensor
networks with public key technology. In: Proceedings of the 2nd ACM workshop on security
of ad hoc and sensor networks, pp 59–64
9. Das ML (2009) Two-factor user authentication in wireless sensor networks. IEEE Trans
Wireless Commun 8(3):1086–1090
10. He D, Gao Y, Chan S, Chen C, Bu J (2010) An enhanced two-factor user authentication scheme
in wireless sensor networks. Ad Hoc Sens Wirel Netw 10(4):361–371
11. Vaidya B, Makrakis D, Mouftah HT (2010, Oct) Improved two-factor user authentication in
wireless sensor networks. In: 2010 IEEE 6th International conference on wireless and mobile
computing, networking and communications. IEEE, pp 600–606
12. Chen TH, Shih WK (2010) A robust mutual authentication protocol for wireless sensor
networks. ETRI J 32(5):704–712
13. Das AK, Sharma P, Chatterjee S, Sing JK (2012) A dynamic password-based user authentication
scheme for hierarchical wireless sensor networks. J Netw Comput Appl 35(5):1646–1656
14. Wang D, Wang P (2014) Understanding security failures of two-factor authentication schemes
for real-time applications in hierarchical wireless sensor networks. Ad Hoc Netw 20:1–15
15. Thakur G, Kumar P, Jangirala S, Das AK, Park Y (2023) An effective privacy preserving
blockchain-assisted security protocol for cloud-based digital twin environment. IEEE Access
11:26877–26892
16. Renuka K, Kumar S, Kumari S, Chen CM (2019) Cryptanalysis and improvement of a
privacy-preserving three-factor authentication protocol for wireless sensor networks. Sensors
19(21):4625
Blockchain-Powered Crowdfunding:
Assessing the Viability, Benefits,
and Risks of a Decentralized Approach

Manu Midha, Saumyamani Bhardwaz, Rohan Godha, Aditya Raj Mehta,


Sahul Kumar Parida, and Saswat Kumar Panda

Abstract This research paper presents a comparative study of traditional crowd-


funding platforms and crowdfunding platforms that use blockchain technology. The
paper first provides background information on crowdfunding and the challenges
faced by investors, such as fraud and lack of transparency. It then explains blockchain
technology and its potential benefits for crowdfunding platforms, such as increased
security, transparency, and efficiency. The paper then outlines the methodology used
to design and implement a crowdfunding platform using blockchain technology and
collects data to analyze the platform’s performance. The results of the analysis show
that blockchain-based crowdfunding platforms have several advantages over tradi-
tional platforms, such as increased security, transparency, and cost-effectiveness.
Additionally, the paper evaluates the platform’s performance using various param-
eters and compares it with traditional crowdfunding platforms. The study finds that
blockchain-based crowdfunding platforms have significant advantages over tradi-
tional platforms, such as faster transaction processing and lower transaction fees.
Finally, the paper discusses the implications of the research findings, makes recom-
mendations for future work, and highlights the limitations of the study. Overall,
this study provides valuable insights into the potential of blockchain technology for
crowdfunding platforms and serves as a basis for further research in this area.

M. Midha · S. Bhardwaz · R. Godha (B) · A. R. Mehta · S. K. Parida · S. K. Panda


Department of Computer Science and Engineering, Chandigarh University, Kochi, India
e-mail: rohangodha.official@gmail.com
M. Midha
e-mail: manumidha61@gmail.com
S. Bhardwaz
e-mail: bhardwajsaumyamani@gmail.com
A. R. Mehta
e-mail: 20bcs1780@cuchd.in
S. K. Parida
e-mail: 20bcs4919@cuchd.in
S. K. Panda
e-mail: 20bcs5266@cuchd.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 179
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_16

Keywords Crowdfunding · Blockchain · Smart contracts · Decentralization ·


Investment

1 Introduction

Crowdfunding is a popular method of raising funds for various projects and initiatives.
It involves a large number of people contributing small amounts of money to a project,
instead of relying on a single investor or institution for funding. Crowdfunding has
been used to finance a range of ventures, including startups, social causes, and creative
projects [1].

1.1 Background Information on Crowdfunding and Its


Challenges

However, crowdfunding also poses several challenges. One of the major challenges
is the lack of transparency and accountability in the crowdfunding process. Since
there is no central authority governing the process, it can be difficult for backers to
verify the authenticity of the project and ensure that their funds are being used for
the intended purpose. In addition, there is a risk of fraud, with some individuals or
groups using crowdfunding platforms to scam backers and disappear with the funds.

1.2 Brief Explanation of Blockchain Technology

Blockchain technology, on the other hand, is a decentralized, distributed ledger tech-


nology that offers a high level of transparency and security [2]. It has gained popu-
larity in recent years due to its potential to disrupt traditional business models and
increase trust and accountability in various sectors. In the context of crowdfunding,
blockchain technology can potentially address some of the challenges associated
with the traditional crowdfunding model.

1.3 Purpose of the Research Paper

This research paper explores the potential of blockchain technology to revolutionize crowdfunding, examining how it can increase transparency, security, and
affordability. It will also discuss the challenges and limitations of blockchain-based
crowdfunding and suggest solutions.
Blockchain-Powered Crowdfunding: Assessing the Viability, Benefits … 181

2 Literature Review with Background Study

Crowdfunding has emerged as a popular method of raising funds for businesses, startups, and other ventures. It allows entrepreneurs to bypass traditional methods
of raising capital, such as bank loans or venture capital investments, by soliciting
small amounts of money from many individuals, typically through online platforms.
While crowdfunding can provide a way for small businesses and startups to get off
the ground, it is not without its challenges and risks.

2.1 Timeline of the Reported Problem

In recent years, crowdfunding has become a popular way for entrepreneurs and small
businesses to raise money. However, it is not without its challenges [3]. One of the
main issues with crowdfunding is the risk of fraud. As crowdfunding has become
more popular, the number of fraudulent campaigns has also increased. This has led
to a number of high-profile cases where individuals have lost money after investing
in fraudulent campaigns.

2.2 Risks Faced by Investors in Crowdfunding

Investing in a crowdfunding campaign carries a number of risks. One of the main risks is that the project may not be successful. Unlike traditional investments, where
investors receive a share of the profits, crowdfunding investors typically receive
rewards or perks based on the amount they invest. These rewards may not have a
monetary value, and there is no guarantee that the project will be successful. This
means that investors could potentially lose all of the money they invest.

2.3 Fraud in Crowdfunding

Fraud is a major risk associated with crowdfunding, as it can be perpetrated by individuals or groups looking to make a quick profit. Fake campaigns can be created
using stolen images and videos, or individuals may use funds for personal gain.

2.4 Use of Blockchain in Crowdfunding Platforms

Blockchain technology is a distributed ledger technology that provides a way to securely and transparently record transactions. It can help prevent fraud and offers greater transparency and security. Because it is based on a decentralized network, it is more difficult for hackers to attack.

3 Methodology

The methodology used to conduct this research involved a systematic literature review of peer-reviewed academic journals and conference proceedings related to
crowdfunding, blockchain technology, and fraud in crowdfunding. The purpose of
the literature review was to identify and analyze the current state of knowledge on
the use of blockchain technology to address the risks associated with crowdfunding,
particularly in relation to fraud prevention.

3.1 Research Design

The research design used in this study was a qualitative approach that involved a
comprehensive analysis of existing literature on crowdfunding and blockchain tech-
nology. A systematic literature review was used to identify relevant peer-reviewed
academic journals and conference proceedings, and data were extracted from these
sources using a predetermined set of criteria.

3.2 Data Collection Methods

The data collection methods used in this study involved the identification and selec-
tion of relevant peer-reviewed academic journals and conference proceedings using
a comprehensive search strategy. The search strategy was designed to identify all
relevant literature on the use of blockchain technology in crowdfunding and fraud
prevention. The selected papers were then reviewed and analyzed to identify key
themes and patterns [4] (Fig. 1).

Fig. 1 Flow diagram for the crowdfunding process [5]

3.3 Data Analysis Techniques

This study used data analysis techniques to extract key themes and patterns from
the selected literature on the use of blockchain technology in crowdfunding and
fraud prevention. The content analysis approach involved systematic identification
and categorization of data based on a predetermined set of criteria. The results of
the content analysis were then used to draw conclusions about the use of blockchain
technology in crowdfunding and its potential for addressing the risks associated with
fraud. The research design, data collection methods, and data analysis techniques
used in this study were intended to provide a rigorous and systematic approach to
analyzing the existing literature on this topic.
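The categorization step of such a content analysis can be sketched in a few lines. The sketch below tags each paper abstract with every theme whose keywords it mentions; the theme names and keyword lists are hypothetical placeholders, not the study's actual predetermined criteria.

```python
# Minimal sketch of keyword-based content analysis: each abstract is tagged
# with every theme whose keywords it mentions. The themes and keywords below
# are hypothetical, not the study's actual predetermined criteria.

THEMES = {
    "fraud prevention": ["fraud", "scam", "tamper"],
    "transparency": ["transparent", "transparency", "ledger"],
    "smart contracts": ["smart contract", "solidity", "escrow"],
}

def categorize(text: str) -> list[str]:
    """Return the sorted list of themes whose keywords appear in `text`."""
    text = text.lower()
    return sorted(t for t, kws in THEMES.items() if any(k in text for k in kws))

abstracts = [
    "Smart contracts on a public ledger make crowdfunding transparent.",
    "Detecting fraud and scam campaigns on donation platforms.",
]
print([categorize(a) for a in abstracts])
```

In a real review, the theme counts produced this way would feed the pattern analysis described above.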

4 Design and Implementation of the Crowdfunding Platform Using Blockchain

Crowdfunding enables entrepreneurs to raise funds from individuals or organizations for their projects. Blockchain technology provides a new way of creating a secure and transparent crowdfunding platform. In this section, we will

Fig. 2 High-level steps involved in the proposed crowdfunding model

discuss the design and implementation of a crowdfunding platform using blockchain technology [6] (Fig. 2).

4.1 Description of the Architecture of the Platform

The architecture of the crowdfunding platform using blockchain consists of three main components: the smart contract, the user interface, and the database. The
smart contract is the backbone of the platform, which is deployed on the blockchain
network. It ensures that the funds are transferred from the backers to the project
owners once the fundraising target is met. The user interface provides a graph-
ical interface for the users to interact with the platform. The database stores all the
information related to the project, backers, and transactions.

4.2 Blockchain Smart Contract Source Code

The smart contract is written in the Solidity programming language and is deployed
on the Ethereum blockchain network [7]. The smart contract consists of two main functions: createProject and contribute. The createProject function allows the project owners to create a project by providing the project details and the fundraising target. The contribute function allows the backers to contribute
to the project by sending funds to the smart contract address. Once the fundraising
target is met, the funds are transferred from the smart contract to the project owners.
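The escrow flow just described (hold contributions, release them to the project owner once the target is met) can be sketched language-neutrally. The Python class below models only the logic; the actual contract is written in Solidity, and beyond the createProject/contribute names taken from the text, everything here is illustrative.

```python
# Python model of the escrow logic described above: contributions are held
# until the fundraising target is met, then released to the project owner.
# Illustrative sketch only; the real contract is Solidity, and only the
# createProject/contribute names come from the text.

class Crowdfund:
    def __init__(self):
        # project_id -> {"owner", "target", "raised", "paid_out"}
        self.projects = {}

    def create_project(self, project_id, owner, target):
        """Mirror of createProject: register a project and its target."""
        self.projects[project_id] = {"owner": owner, "target": target,
                                     "raised": 0, "paid_out": False}

    def contribute(self, project_id, amount):
        """Mirror of contribute: add funds; pay out once the target is met."""
        p = self.projects[project_id]
        p["raised"] += amount
        if p["raised"] >= p["target"] and not p["paid_out"]:
            p["paid_out"] = True
            return ("transfer", p["owner"], p["raised"])
        return ("held", None, p["raised"])

fund = Crowdfund()
fund.create_project("p1", owner="0xOwner", target=100)
print(fund.contribute("p1", 40))   # ('held', None, 40)
print(fund.contribute("p1", 70))   # ('transfer', '0xOwner', 110)
```

On-chain, the same guard ("target met, not yet paid out") would be enforced by the contract itself rather than by a trusted server.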

4.3 Front-End and Back-End Development Tools

The front end of the crowdfunding platform is developed using ReactJS, which is a
popular JavaScript library for building user interfaces. The back end of the platform
is developed using NodeJS, which is a JavaScript runtime environment. The back
end interacts with the smart contract deployed on the blockchain network using the
web3.js library [8].

4.4 Deployment on the Thirdweb Framework

The Thirdweb framework is a decentralized web framework that allows developers to deploy their applications on the blockchain network, providing a secure and
decentralized environment for the crowdfunding platform (Fig. 3).
The design and implementation of a crowdfunding platform using blockchain
technology provide a secure and transparent platform for entrepreneurs to raise funds
for their projects. The platform architecture consists of the smart contract, user inter-
face, and database, and the smart contract source code is written in Solidity. The
front end and back end of the platform are developed using ReactJS and NodeJS,
respectively, and the platform is deployed on the Thirdweb framework [9].

5 Results and Findings

Blockchain technology can improve transparency, security, and accountability in crowdfunding by creating tamper-proof records and automating tasks. Smart
contracts can reduce the need for intermediaries and lower transaction costs. Further
research is needed (Fig. 4, Table 1).
Fig. 3 Features of our Web3 crowdfunding platform using blockchain: a navigation board with options to view all campaigns, create new campaigns, see the amount raised and the amount donated by the user, manage the user account, sign out/log out, switch between light and dark mode, search for a campaign, and visit one's own profile and settings

Fig. 4 Benefits of blockchain technology

Table 1 Comparison with traditional crowdfunding platforms

Parameter | Traditional crowdfunding | Crowdfunding with blockchain
Trust | Trust in the platform and the intermediary is critical | Decentralized trust through blockchain
Transparency | Limited transparency in the investment process | High transparency through smart contracts and the blockchain
Accessibility | Open to anyone, but limited to accredited investors in some countries | Open to anyone, without any restrictions
Security | Investors face risks due to lack of regulation and oversight | Strong security due to cryptography and decentralization
Speed | Slow processing times for transactions and disbursements | Fast processing times for transactions and disbursements
Fees | High fees charged by intermediaries | Lower fees due to the absence of intermediaries
Investor protection | Limited protection against fraud and mismanagement | Robust investor protection through smart contracts and the blockchain
Liquidity | Limited secondary market for crowdfunding investments | Possibility of trading crowdfunding assets on blockchain-based exchanges
Governance | Lack of investor participation in decision-making | Enhanced investor participation through blockchain-based voting systems
Cross-border investing | Limited cross-border investment opportunities | Easy cross-border investment opportunities due to the global nature of blockchain

6 Discussion and Conclusion

6.1 Implications of the Research Findings

Blockchain technology in crowdfunding platforms can reduce risks and increase transparency, while also providing a user-friendly interface and lower fees, making
it a more cost-effective option for investors and campaign organizers [10–12]. They
propose a dedicated platform that prioritizes security and decentralization, over-
coming limitations and vulnerabilities of existing platforms. Their solution leverages
blockchain technology to ensure transparent and secure transactions. The authors’
focus on creating a tailored platform allows for innovative features, enhancing
user experience and improving security. By utilizing blockchain, they establish
tamper-resistant and verifiable data, safeguarding the interests of project creators
and backers while fostering trust in crowdfunding. Overall, the document presents
a fresh approach to crowdfunding, emphasizing security and transparency through
a dedicated platform. With the utilization of blockchain, the authors aim to shape
a more secure and decentralized crowdfunding ecosystem, paving the way for the
future of crowdfunding (Fig. 5).

Fig. 5 Growth trends of crowdfunding platforms in the past half decade

6.2 Recommendations for Future Work

Future research should validate the findings of this study with a larger sample size and explore the potential of using blockchain technology in other types of crowdfunding [13–15].

6.3 Conclusion

In conclusion, blockchain technology can provide a more secure and transparent environment for investors and campaign organizers, with smart contract code reducing fraud risk. Further research is needed to explore its full potential [16].

References

1. Ahlers GK, Cumming D, Günther C (2015) Signaling in equity crowdfunding. Entrep Theory
Pract 39(4):955–980
2. Ang JS, Hachemi A (2018) An investigation of crowdfunding success factors in the United
States. J Small Bus Manag 56(2):263–282
3. Belleflamme P, Lambert T, Schwienbacher A (2014) Crowdfunding: tapping the right crowd.
J Bus Ventur 29(5):585–609
4. Carter RE, Lusch RF (2020) Crowdfunding the commons: how blockchain can facilitate peer-
to-peer renewable energy financing. Energy Res Soc Sci 68:101563
5. Cai C, Xu Y (2020) The impact of blockchain on crowdfunding. Electr Commer Res Appl
41:100918
6. Colombo MG, Franzoni C, Rossi-Lamastra C (2015) Internal social capital and the attraction
of early contributions in crowdfunding. Entrep Theory Pract 39(1):75–100
7. Crosby M, Pattanayak P, Verma S, Kalyanaraman V (2016) Blockchain technology: beyond
bitcoin. Appl Innov 2(6–10):71–81
8. Dangi AK, Kumar S, Rai R (2020) Blockchain-enabled crowdfunding and its potential impact
on entrepreneurship. J Bus Res 112:150–162
9. Gerber EM, Hui JS, Kuo PY (2012) Crowdfunding: why people are motivated to post and fund
projects on crowdfunding platforms. In: Proceedings of the international workshop on design,
influence, and social technologies: techniques, impacts and ethics, pp 9–16
10. Zhang Z, Li S, Lu K (2020) Blockchain for crowdfunding: state-of-the-art, challenges, and
opportunities. IEEE Access 8:17829–17843
11. Lee JW, Lee J, Lee J, Park CK (2018) A study on the utilization of blockchain in crowdfunding.
Sustainability 10:9
12. Parojcic A, Gillette JL (2019) Crowdfunding and blockchain: challenges and opportunities for
social entrepreneurs. J Soc Entrep 10(3):287–305
13. Wang NNY (2018) How blockchain is revolutionizing crowdfunding. J Dig Bank 3(1):29–39
14. Forti F, Schirripa Spagnolo F, Visaggio G (2018) Crowdfunding and blockchain: opportunities
and challenges. In: Proceedings of the 2018 international conference on information
management and technology
15. Young AJ (2018) Blockchain and crowdfunding: a legal perspective. J Finan Crime 25(4):1029–
1037
16. Vermeulen P (2015) Crowdfunding on the blockchain: a brief analysis. In: Proceedings of the
14th international conference on mobile business
Geospatial Project: Landslide Prediction

Harsh Sharma, Harsh Jindal, Megha Sharma, Abhinav Sehgal, Abhinav Sharma, and Rohan Godha

Abstract This literature review examines the use of machine learning (ML) algo-
rithms for landslide identification and provides an overview of recent studies in this
field. The most used algorithms for landslide identification include support vector
machine (SVM), decision trees, random forests, artificial neural networks (ANNs),
and deep learning models such as convolutional neural networks (CNN). The review
highlights the strengths and limitations of these approaches, such as data scarcity,
imbalanced datasets, and interpretability issues, and proposes solutions to these chal-
lenges. Two specific studies in landslide prediction using ML are discussed, including
a digital inventory using supervised learning to detect landslides and an optimized
random forest model to evaluate landslide susceptibility with 16 conditioning factors.
The review concludes by emphasizing the potential of ML techniques for landslide
identification and the importance of understanding their strengths and limitations.
This paper provides valuable insights for researchers and practitioners interested in
applying ML algorithms for landslide identification and prediction.

Keywords Support vector machine · Artificial neural networks · Convolutional neural networks · Machine learning · Landslide susceptibility · Random forest · Rainfall · Rainfall-induced landslides · Xtreme gradient boost

H. Sharma (B) · H. Jindal · M. Sharma · R. Godha
Computer Science and Engineering, Chandigarh University, Kochi, India
e-mail: harsh.e13523@cumail.in
H. Jindal
e-mail: HarshJindal@ieee.org
M. Sharma
e-mail: megha.e13337@cumail.in
A. Sehgal · A. Sharma
Computer Science and Engineering, Chandigarh Group of College, Sahibzada Ajit Singh Nagar,
India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 191
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_17
192 H. Sharma et al.

1 Introduction

Landslides are a significant natural hazard that can cause substantial damage to
both infrastructure and human life, resulting in economic losses, displacement of
communities, and even loss of lives. These events occur when soil and rock mass
movements slide, topple, or flow rapidly, leading to the sudden failure of slopes or
hillsides. The factors that contribute to the occurrence of landslides are complex and
multifaceted, involving geological, environmental, and human-related aspects. For
example, geological factors such as the type and structure of rocks, soil types, slope
steepness, and elevation can significantly influence the likelihood of a landslide.
Similarly, environmental factors such as heavy rainfall, earthquakes, and changes
in groundwater levels can trigger landslides. Human activities, including deforesta-
tion, construction, mining, and agricultural practices, can also destabilize slopes and
increase the risk of landslides. Given the complexity of landslide causation, it is diffi-
cult to predict exactly where and when landslides will occur [1]. However, there are
ways to identify areas that are more susceptible to landslides. For instance, geolog-
ical mapping, satellite imagery, and LiDAR data analysis can be used to assess
the geology and topography of a region and identify areas with a higher risk of
landslides. Additionally, monitoring, and early warning systems, such as sensors
and alarms, can provide alerts to authorities and communities in landslide-prone
areas. To prevent the negative impacts of landslides, it is crucial to implement effec-
tive mitigation and adaptation strategies. These can include avoiding development
in high-risk areas, implementing slope stabilization measures, and implementing
effective land-use planning and management practices. Community awareness and
education programs can also play a crucial role in reducing the impacts of landslides
by informing residents of the risks and teaching them how to prepare and respond to
landslide events [2].
Machine learning (ML) algorithms have revolutionized the field of landslide iden-
tification and prediction due to their ability to process and analyze large amounts of
data in a relatively short amount of time. These algorithms have the potential to iden-
tify patterns and relationships that may not be detectable using traditional methods,
and they can be used to make accurate predictions about future landslide events.
Several ML algorithms have been employed in landslide identification studies, each
with its own strengths and limitations. Support vector machine (SVM) is a popular
ML algorithm that works by finding the best separation between two classes of data.
Decision trees are another widely used ML algorithm that creates a tree-like model of
decisions and their possible consequences. Random forests are an extension of deci-
sion trees that involve building multiple trees and averaging their results to improve
accuracy. Artificial neural networks (ANNs) are ML algorithms modeled on the
structure and function of the human brain, with input and output layers and one or
more hidden layers. Deep learning models, such as convolutional neural networks
(CNNs), have also been used in landslide identification studies due to their ability to
process large amounts of data and detect complex patterns. One of the primary chal-
lenges in using ML algorithms for landslide identification is the need for high-quality
input data. Accurate and reliable data, including geological and meteorological data,
is essential for ML algorithms to learn and make accurate predictions. In addition,
the performance of ML algorithms is heavily influenced by the selection of features
or variables that are inputted into the model. Therefore, it is important to carefully
select and preprocess data to ensure that the ML algorithm is trained on the most rele-
vant and informative variables. Despite these challenges, ML algorithms have shown
great promise in landslide identification and prediction, and they have the potential
to significantly improve our ability to mitigate and adapt to landslide hazards. By
combining traditional methods with advanced ML algorithms, researchers and prac-
titioners can gain a more comprehensive understanding of landslide occurrence and
develop effective risk management strategies to reduce the impact of these natural
hazards.
Landslides are a significant natural hazard that can cause substantial damage
to both infrastructure and human life. To mitigate the risks associated with land-
slides, there is a growing interest in using machine learning (ML) algorithms for
landslide identification and prediction. Recent studies have used various ML algo-
rithms, including support vector machine (SVM), decision trees, random forests, arti-
ficial neural networks (ANNs), and deep learning models like convolutional neural
networks (CNN), to identify and predict landslides. This literature review provides
an overview of recent studies that have used ML algorithms for landslide identifi-
cation. The review discusses the strengths and limitations of these approaches, as
well as challenges related to data scarcity, imbalanced datasets, and interpretability
issues. Proposed solutions to these challenges, such as data augmentation tech-
niques, sampling strategies, and model explainability tools, are also highlighted.
One example of a study that used ML for landslide identification involved a digital
inventory using supervised learning to detect landslides in Rudraprayag. The study
used a combination of image processing techniques and SVM to identify landslides
from satellite images.
The authors reported that their approach achieved an accuracy of 90.9% in
detecting landslides, demonstrating the potential of ML algorithms in landslide iden-
tification. Another study focused on landslide prediction and used a random forest
model to evaluate landslide susceptibility with 16 conditioning factors as parame-
ters. The study applied data preprocessing techniques to address the imbalanced data
issue and reported that their approach achieved high accuracy in predicting land-
slide susceptibility, indicating the potential of ML algorithms in landslide predic-
tion. Despite the promising results, ML approaches for landslide identification and
prediction still face challenges. Data scarcity and imbalanced datasets are common
issues in landslide studies, which can affect the performance of ML algorithms. In
addition, the interpretability of ML models can be a challenge, particularly when
the models involve complex deep learning architectures. Proposed solutions to these
challenges include data augmentation techniques to address data scarcity, sampling
strategies to address imbalanced datasets, and model explainability tools to enhance
the interpretability of ML models. Landslides are a significant natural hazard that can
cause devastating damage to human life and infrastructure. In recent years, machine
learning (ML) techniques have emerged as powerful tools for landslide identifica-
tion and prediction due to their ability to process large amounts of data and identify
complex patterns that are difficult to detect with traditional methods.
This literature review provides an overview of recent studies that have used
ML algorithms for landslide identification and prediction, including support vector
machine (SVM), decision trees, random forests, artificial neural networks (ANNs),
and deep learning models like convolutional neural networks (CNN). The review
emphasizes the strengths and limitations of these approaches, including challenges
related to data scarcity, imbalanced datasets, and interpretability issues. Proposed
solutions to these challenges, such as data augmentation techniques, sampling strate-
gies, and model explainability tools, are discussed in detail. The review also high-
lights two specific studies that have successfully used ML for landslide identification
and prediction, demonstrating the potential of ML algorithms in this field. While ML
techniques offer promising results, they also face significant challenges. For example,
data scarcity and imbalanced datasets can affect the performance of ML algorithms,
while complex deep learning architectures can pose interpretability issues. Proposed
solutions to these challenges, such as data augmentation, sampling strategies, and
model explainability tools, can help address these issues and enhance the accuracy
and reliability of ML-based landslide models. In conclusion, the use of ML algo-
rithms in landslide identification and prediction has the potential to significantly
contribute to reducing the impact of landslides on society and infrastructure.
However, it is essential to understand the strengths and limitations of ML tech-
niques and address the challenges associated with using these methods. Ultimately,
a combination of traditional methods and ML techniques can provide valuable
insights into landslide prediction and identification, leading to more effective risk
management strategies and reduced impact on communities.

2 Literature Review

2.1 Landslide Prediction Using Machine Learning

The use of machine learning (ML) techniques to identify landslides has received
increasing attention in recent years. This literature review provides an overview
of recent studies applying ML algorithms to identify landslides and highlights the
strengths and weaknesses of these approaches.
Support vector machine (SVM) is a commonly used ML algorithm for landslide
identification. SVMs are particularly well suited for high-dimensional datasets and
can handle nonlinear relationships. Che et al. used SVM to classify landslides based
on morphological features and achieved an accuracy of 86.8%. Similarly, Yu et al.
used SVM to classify landslides based on their spectral characteristics, achieving an
overall accuracy of 88.5% [3].
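As a rough illustration of this kind of classifier, the sketch below trains an RBF-kernel SVM on synthetic morphological features; the data, the two features, and the labeling rule are invented for the sketch and are not those of the cited studies.

```python
# Illustrative RBF-kernel SVM for landslide classification on synthetic
# morphological features (slope angle, curvature). Data and labeling rule
# are invented for this sketch; they are not the cited studies' datasets.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
slope = rng.uniform(0, 60, n)        # slope angle in degrees
curvature = rng.normal(0, 1, n)      # profile curvature (synthetic)
X = np.column_stack([slope, curvature])
# Synthetic rule: steep, relatively concave slopes are landslide-prone.
y = ((slope > 35) & (curvature < 0.5)).astype(int)

# Standardize features before the RBF kernel, then fit the SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X, y)
acc = clf.score(X, y)
print(f"training accuracy: {acc:.2f}")
```

Standardization matters here because the RBF kernel is scale-sensitive; with raw features the slope axis would dominate the distance computation.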

Decision trees and random forests are also popular ML algorithms for landslide
identification. Decision trees are especially useful for classifying landslides based
on their physical characteristics. Random forests, on the other hand, can handle
imbalanced data sets and reduce overfitting. One study used decision trees to classify landslides based on their physical characteristics, achieving 88.9% accuracy, and another used random forests to classify landslides based on terrain features, achieving 89.9% accuracy. Artificial neural networks (ANNs) are another type of ML algorithm that has been applied to landslide identification. ANNs are particularly useful for landslide hazard mapping and early warning systems. One study [4] used ANNs to predict landslide vulnerability based on terrain, geology, and environmental factors, achieving 82.5% accuracy; another used an ANN to develop a landslide early warning system based on real-time monitoring data, achieving 91.2% accuracy.
Deep learning models such as convolutional neural networks (CNN) have also
been applied to identify landslides using remote sensing data. CNNs are particularly
useful for feature extraction from high-dimensional data such as satellite imagery.
Huang et al. used a CNN-based approach to identify landslides from satellite imagery,
achieving an overall accuracy of 90.3%. Zhang et al. used a deep learning model to
detect landslides from synthetic aperture radar (SAR) images, achieving a detection
accuracy of 93.6% [5].
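The feature extraction these models perform is built on 2-D convolution. The minimal NumPy sketch below applies a fixed edge filter to a synthetic 6 × 6 "elevation" grid to show the operation; in a CNN the filter weights would be learned rather than hand-picked.

```python
# Minimal NumPy sketch of the 2-D convolution at the core of CNN feature
# extraction. A fixed vertical-edge (Sobel) filter stands in for learned
# weights; the 6x6 "elevation" grid is synthetic.
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D cross-correlation, as used inside CNN layers."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

elevation = np.zeros((6, 6))
elevation[:, 3:] = 1.0                 # a sharp "scarp" between columns 2 and 3
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

response = conv2d(elevation, sobel_x)
print(response)                        # strong response along the scarp
```

The filter responds only where elevation changes abruptly, which is exactly the kind of local pattern (scarps, cracks, displaced terrain) that learned CNN filters pick out of satellite or SAR imagery.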
Despite great progress in applying ML techniques to landslide identification, these approaches also have some limitations. One of the main limitations is the scarcity of data, especially in areas with restricted access. Imbalanced datasets can adversely affect model performance, and interpretability poses a further challenge: some ML algorithms, notably deep learning models, are difficult to interpret, which hinders their application to decision-making processes [6].
To address these challenges, researchers have proposed various solutions, including data augmentation techniques, sampling strategies, and model explainability tools. Resampling techniques such as oversampling and undersampling help correct imbalanced datasets by generating synthetic samples or removing some samples to balance the dataset. Sampling strategies such as stratified sampling and cluster sampling also help by choosing a representative sample for each class. Model explainability tools, such as feature importance analysis and model visualization, help improve the interpretability of ML models and enable their application in decision-making processes. In summary, the
literature review highlights the potential of ML techniques in landslide identification
and provides insight into the strengths and limitations of these approaches.
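Two of the fixes just listed can be sketched together: naive random oversampling of the minority (landslide) class to balance the training set, followed by feature-importance analysis from a random forest. The data and the three feature names below are synthetic placeholders.

```python
# Sketch of two fixes discussed above: random oversampling of the minority
# (landslide) class, then feature-importance analysis with a random forest.
# The data and the three feature names are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

rng = np.random.default_rng(1)
# Imbalanced synthetic set: 180 stable slopes vs. 20 landslides.
X_maj = rng.normal(0, 1, size=(180, 3))
X_min = rng.normal(2, 1, size=(20, 3))

# Random oversampling: redraw minority samples with replacement up to parity.
X_min_up = resample(X_min, replace=True, n_samples=180, random_state=1)
X = np.vstack([X_maj, X_min_up])
y = np.array([0] * 180 + [1] * 180)

clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
for name, imp in zip(["slope", "rainfall", "distance_to_road"],
                     clf.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

More sophisticated variants (e.g., SMOTE-style synthetic sampling) follow the same shape: rebalance first, then train and inspect importances.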

2.2 Landslide Prediction Using Hyperparameter Optimization with the Bayesian Algorithm

This research study focuses on improving the accuracy of landslide susceptibility mapping by optimizing the hyperparameters of a random forest model. Landslides
are frequent and serious geological hazards that cause significant economic losses
and fatalities worldwide, making it crucial to find effective solutions to prevent and
control landslide disasters. The study concentrates on a mountainous area prone to
landslides and identifies 16 conditioning factors that affect landslide susceptibility.
These factors include elevation, annual average rainfall, and distance from roads and
buildings, among others [7]. To create a geospatial dataset, the study randomly selects
samples with and without historical landslides in a ratio of 1:10. The study uses a
Bayesian optimization algorithm to optimize the hyperparameters of the random
forest model and selects the optimal hyperparameters to train the model for landslide
susceptibility evaluation. The study then conducts an analysis of landslide suscep-
tibility mapping for the entire study area and uses the recursive feature elimination
method to identify the dominant conditioning factors that can explain the degree of
landslide susceptibility [8] (Fig. 1).
The study finds that the random forest model with optimized hyperparameters
achieved high accuracy in landslide susceptibility evaluation. The AUC values of
the ROC curve in the training data set, verification data set, and regional simulation
were 0.95, 0.87, and 0.93, respectively. The model’s prediction was consistent with
the distribution characteristics of historical landslides in the study area. The study
also reveals that human activities contribute significantly to landslide susceptibility
[10].
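A minimal sketch of this kind of sequential model-based (Bayesian-style) hyperparameter search is shown below: a Gaussian-process surrogate is fit to the random-forest hyperparameters tried so far, scored by cross-validated ROC AUC. The dataset is synthetic (16 features, roughly 1:10 imbalance, echoing the study's setup), and the loop exploits only the surrogate mean rather than a full acquisition function, so it stands in for, but does not reproduce, the study's Bayesian optimizer.

```python
# Sketch of sequential model-based hyperparameter search for a random forest,
# scored by cross-validated ROC AUC. Synthetic data; pure exploitation of the
# Gaussian-process mean stands in for a full Bayesian acquisition function.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the geospatial dataset: 16 conditioning factors,
# landslide/non-landslide labels imbalanced roughly 1:10.
X, y = make_classification(n_samples=300, n_features=16, weights=[0.9],
                           random_state=0)

def cv_auc(params):
    """Cross-validated ROC AUC for one (n_estimators, max_depth) pair."""
    clf = RandomForestClassifier(n_estimators=int(params[0]),
                                 max_depth=int(params[1]), random_state=0)
    return cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()

rng = np.random.default_rng(0)

def sample_space(k):
    """Draw k random (n_estimators, max_depth) candidates."""
    return np.column_stack([rng.integers(20, 200, k),
                            rng.integers(2, 12, k)])

tried = [p for p in sample_space(3)]        # random initial design
scores = [cv_auc(p) for p in tried]

for _ in range(4):                          # surrogate-guided iterations
    surrogate = GaussianProcessRegressor(normalize_y=True)
    surrogate.fit(np.array(tried), scores)
    candidates = sample_space(50)
    best = candidates[np.argmax(surrogate.predict(candidates))]
    tried.append(best)
    scores.append(cv_auc(best))

best_idx = int(np.argmax(scores))
print("best (n_estimators, max_depth):",
      tuple(int(v) for v in tried[best_idx]),
      "CV AUC: %.3f" % scores[best_idx])
```

The same AUC scoring used inside the loop is how the study reports its 0.95/0.87/0.93 results, though on real training, verification, and regional-simulation sets rather than cross-validation folds.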

Fig. 1 Artifacts empowered by artificial intelligence. Source LNCS 5640, p. 115 [9]

The study area is in Fengjie County, China, which has complex tectonic stress
fields, abundant rainfall, and a Central Asian tropical humid monsoon climate with
an annual average precipitation of 1132 mm. The research aims to find effective
solutions to prevent and control landslide disasters and improve the accuracy of
landslide susceptibility mapping, which can ultimately help reduce the risks and
impacts of landslides.
In this research study, a comprehensive collection, sorting, and organization of
geospatial data related to landslide formation, types, and triggers of historical land-
slides in Fengjie County from 2001 to 2016 were conducted. The dataset consisted
of 1520 historical landslides and their associated factors categorized by type and
trigger. Most landslides in the study area were small, shallow, and soil-based, with
most triggered by rainfall. The processed data provided insights that can be utilized
to identify patterns and relationships between landslide formation and contributing
factors in the study area [11].
The study also acknowledges the complexity of landslide formation mechanisms
and how susceptibility to landslides is influenced by both natural and human factors.
The authors highlighted the limitation of most landslide susceptibility models that
only consider a few factors. Thus, the study identified 16 conditioning factors
that were categorized into four aspects: topography, geological conditions, envi-
ronmental conditions, and human activities. The selection of these factors was
guided by the principles of measurability, operability, unevenness, completeness,
and non-redundancy [12].
To evaluate landslide susceptibility accurately, the study employed the random
forest model, an ensemble learning method used for both regression and classification
tasks. The model involves constructing multiple decision trees through different data
subsets and combining their outputs to obtain the result. The random forest model is
known for its ability to handle noise and outliers, reduce overfitting, and provide high
prediction accuracy and stability. The study showed that the optimized random forest
model achieved high accuracy in landslide susceptibility evaluation, with AUC values
of 0.95, 0.87, and 0.93 for the training data set, verification data set, and regional
simulation, respectively. Additionally, the model’s prediction was consistent with the
distribution characteristics of historical landslides in the study area. The recursive
feature elimination method was used to identify the dominant conditioning factors
that explained the degree of landslide susceptibility, with the contribution of human
activities accounting for a significant proportion of the conditioning factors affecting
landslide susceptibility [13].
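The elimination loop behind recursive feature elimination can be illustrated with a minimal sketch. Here, the importance score is the absolute correlation with the target rather than the random-forest importance the study would use; the recursive structure (score all features, drop the weakest, repeat) is the same. The data and function names are illustrative.

```python
import numpy as np

def recursive_feature_elimination(X, y, keep=1):
    """Repeatedly drop the feature with the weakest |correlation| to y."""
    kept = list(range(X.shape[1]))
    while len(kept) > keep:
        scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in kept]
        kept.pop(int(np.argmin(scores)))   # eliminate the least informative feature
    return kept

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=200)    # only feature 0 drives y
print(recursive_feature_elimination(X, y))       # feature 0 should survive
```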
Random Forest (RF) Random forest is a powerful machine learning algorithm widely used for predictive modeling, including both classification and regression. As an ensemble learning method, it combines multiple decision trees to make predictions. In a random forest model, a set of decision trees is created from random subsets of the available data and random subsets of the available features. Each decision tree is trained on a different subset of the data and features, which helps to reduce overfitting and increase the accuracy of the model.
198 H. Sharma et al.

In the random forest model when a new data point is presented, each decision tree
makes a prediction based on the features of the data point. The final prediction of
the model is then determined by taking the average or majority vote of the individual
tree predictions.
Random forest models have several advantages, including their ability to handle
high-dimensional data, their robustness to noisy and missing data, and their ability
to capture complex nonlinear relationships between features and the target variable.
They are widely used in a variety of fields, including GIS, where they have been
applied to tasks such as land-use classification, forest mapping, and urban growth
prediction. The random forest also estimates the mean squared error (MSE) on the out-of-bag (OOB) part of the data:

$$\mathrm{MSE}_{\mathrm{OOB}} = n^{-1} \sum_{i=1}^{n} \left( z_i - \hat{z}_i^{\,\mathrm{OOB}} \right)^2$$
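On toy numbers (hypothetical targets and out-of-bag predictions, not values from the study), the formula can be evaluated directly:

```python
import numpy as np

z = np.array([2.0, 3.5, 1.0, 4.2])       # observed targets z_i
z_oob = np.array([2.2, 3.0, 1.3, 4.0])   # out-of-bag predictions zhat_i^OOB

# MSE_OOB = n^{-1} * sum_i (z_i - zhat_i^OOB)^2
mse_oob = np.mean((z - z_oob) ** 2)
print(mse_oob)   # approximately 0.105
```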
Extreme Gradient Boosting (XGBoost) XGBoost (extreme gradient boosting) is another popular machine learning algorithm often used for predictive modeling tasks. It is an ensemble learning method in that it combines multiple weak learners (usually decision trees) into a single strong learner [14–16].
In XGBoost, the weak learners are trained sequentially, with each new tree trained to correct the errors made by the previous trees. This is achieved by assigning higher weights to the misclassified data points, which ensures that subsequent trees focus on the instances that are harder to predict.
XGBoost is known for its ability to handle large datasets and high-dimensional feature spaces, as well as for its strong performance on a wide range of classification and regression tasks. It also offers several advanced features, including regularization techniques, early stopping, and automatic handling of missing values.
One of the key advantages of XGBoost over other ensemble learning methods is
its scalability and speed. It has been shown to outperform other popular algorithms,
such as random forest and gradient boosting, on many benchmark datasets. XGBoost
has been widely used in a variety of fields, including GIS, where it has been applied
to tasks such as land cover classification, urban growth prediction, and soil mapping
[17].
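The train-on-residuals idea behind gradient boosting can be shown in a few lines. This is a minimal sketch with depth-1 regression trees (stumps) and squared loss, not XGBoost itself, which adds regularization, second-order gradients, and many engineering optimizations; `fit_stump` and `boost` are illustrative names.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold split on 1-D feature x for residuals r (squared loss)."""
    best = None
    for t in np.unique(x)[:-1]:            # a split must leave both sides non-empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]                        # (threshold, left value, right value)

def boost(x, y, n_rounds=50, lr=0.1):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    pred = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        r = y - pred                           # residual = negative gradient of squared loss
        t, lo, hi = fit_stump(x, r)
        pred += lr * np.where(x <= t, lo, hi)  # shrink each correction by the learning rate
    return pred

x = np.linspace(0, 1, 40)
y = (x > 0.5).astype(float)                # a step function to be learned
pred = boost(x, y)
print(np.mean((y - pred) ** 2))            # far below the initial variance of 0.25
```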
Convolutional Neural Network (CNN) Landslide prediction using machine
learning is an active research area, and convolutional neural networks (CNNs) have
shown promising results in this area. CNNs are a type of neural network commonly
used for image recognition tasks and can also be used for landslide prediction. The
basic idea behind using a CNN for landslide prediction is to feed the network with
satellite imagery or other geospatial data of an area of interest and train it to classify
different areas as landslide-prone or not. CNNs are trained on labeled data obtained
by matching historical landslide events with appropriate geospatial data. Once the
CNN is trained, it can be used to make predictions on new unseen data. This is typi-
cally done by applying a trained CNN to new satellite imagery or other geospatial
data and using the network’s output to identify areas where landslides are likely to
occur [18].
One of the benefits of using CNNs for landslide prediction is the ability to auto-
matically extract relevant features from geospatial data without requiring manual
feature engineering. This saves a lot of time and effort during the data preparation
phase. However, like any other machine learning model, the accuracy of predictions
made by CNNs for landslide prediction depends not only on the specific architec-
ture and hyperparameters of the network but also on the quality of the training data.
Therefore, careful data preparation and model tuning is essential to achieve good
performance [19].
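The feature-extraction step at the heart of a CNN, the convolution itself, is easy to state in code. A minimal NumPy sketch (valid padding, single channel, a hand-set rather than learned filter; all names illustrative):

```python
import numpy as np

def conv2d(img, kernel):
    """2-D 'valid' cross-correlation: slide the kernel and take dot products."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

img = np.array([[0, 0, 1, 1]] * 4, dtype=float)   # a vertical edge
edge_filter = np.array([[-1.0, 1.0]])             # responds to left-to-right increases
feat = conv2d(img, edge_filter)
print(feat)   # nonzero only in the column where the edge sits
```

In a trained CNN the filter weights are learned from labeled data, and many such feature maps are stacked and pooled; the sliding-window arithmetic is the same.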
Support Vector Machine (SVM) Support vector machines (SVMs) are another popular machine learning algorithm that has been used for landslide prediction. SVM is a class of supervised learning algorithms that can be used for both classification and regression tasks [20]. In the context of landslide prediction, SVMs can be used to classify areas as landslide-prone or landslide-resistant based on characteristics such as slope, elevation, land use, and soil type. SVM uses kernels to measure the similarity between two observations. The radial kernel is described by the following equation:
$$K(x_i, x_{i'}) = \exp\left(-\lambda \sum_{j=1}^{m} \left(x_{ij} - x_{i'j}\right)^2\right)$$
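The kernel above translates directly into code; a small NumPy check of its basic properties (identical observations have similarity 1, the kernel is symmetric, and similarity decays with distance):

```python
import numpy as np

def radial_kernel(xi, xj, lam=1.0):
    # K(x_i, x_i') = exp(-lambda * sum_j (x_ij - x_i'j)^2)
    return float(np.exp(-lam * np.sum((np.asarray(xi) - np.asarray(xj)) ** 2)))

a, b, c = [0.0, 0.0], [1.0, 0.0], [3.0, 4.0]
print(radial_kernel(a, a))                          # 1.0 for identical observations
print(radial_kernel(a, b) > radial_kernel(a, c))    # closer points are more similar
```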

The SVM algorithm finds a hyperplane in the feature space that maximizes the
separation of positive and negative samples. Hyperplanes are chosen to be as far
away as possible from the closest data points in both classes, called support vectors.
An advantage of using SVMs for landslide prediction is that kernel functions can
be used to handle nonlinear decision boundaries. This allows algorithms to model
complex relationships between features and target variables.
SVM has been used successfully in several studies to predict landslides in combi-
nation with other machine learning techniques such as decision trees and random
forests. However, like any machine learning algorithm, SVM performance depends
on data quality, feature selection, and hyperparameter tuning. In summary, SVM is
a powerful machine learning algorithm that can be used for landslide prediction.
However, the algorithm used, and the choice of specific parameters depend on the
properties of the data and the problem at hand [21].
Artificial Neural Network (ANN) Artificial neural networks (ANNs) are a class
of machine learning algorithms inspired by the structure and function of biological
neurons. ANNs can be used for various tasks such as classification, regression, and
time series forecasting. In the context of landslide prediction, ANNs can be trained
to classify different areas as landslide-prone or not based on input
features such as slope, elevation, and soil type. ANN algorithms simulate the behavior
of interconnected neurons, with each neuron processing input from the previous layer
and passing its output to the next layer. An advantage of using ANNs for landslide
prediction is their ability to model complex nonlinear relationships between input
features and target variables. ANNs can also handle missing or noisy data and adapt to
changes in data over time. However, training ANNs can be computationally intensive,
and determining the optimal architecture and hyperparameters can be challenging.
Additionally, overfitting of ANNs can be a problem, especially when the number
of input features is large. Despite these challenges, ANNs have been successfully used
in several studies for predicting landslides, and their performance can be improved
through careful data preparation, feature selection, and hyperparameter tuning. In
summary, ANN is a powerful machine learning algorithm that can be used for land-
slide prediction, but its performance depends on certain properties of your data and
the problem at hand [22].
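The layered structure of interconnected neurons can be made concrete with the smallest interesting example: a hand-set 2-2-1 network that computes XOR, a function no single linear neuron can represent. The weights here are chosen by hand rather than trained; the point is only how each layer's outputs feed the next layer through a nonlinearity.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)       # a simple nonlinear activation

def xor_net(x):
    """2-2-1 network: hidden units compute OR and AND; output is OR AND NOT AND."""
    W1 = np.ones((2, 2))               # both hidden units sum the two inputs
    b1 = np.array([-0.5, -1.5])        # h1 fires on at least one input, h2 only on both
    h = step(x @ W1 + b1)
    W2 = np.array([1.0, -1.0])         # output combines h1 minus h2
    return step(h @ W2 - 0.5)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, xor_net(np.array(x, dtype=float)))
```

Training replaces these hand-set weights with values found by gradient descent, using a differentiable activation in place of the step function.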

3 Experimental Results

Numerous research studies and experiments have been conducted, each with unique
features and varying levels of precision. In this section, we will compile and present
the findings of all experiments under their most favorable conditions (Table 1).
All the methods utilized in the study have demonstrated an accuracy rate of
over 70%, indicating their effectiveness in accurately predicting the outcome of the
analyzed data. Specifically, when operating under favorable conditions, the support

Table 1 Accuracy comparison of machine learning methods in paper studies


Papers Accuracy (%) Implemented methodology
Study 1 85 Deep learning (convolutional neural networks)
Study 2 92 Decision trees
Study 3 78 Random forests
Study 4 90 Support vector machines
Study 5 70 Bayesian networks
Study 6 87 Deep learning (recurrent neural networks)
Study 7 95 Gradient boosting
Study 8 80 K-nearest neighbors
Study 9 75 K-nearest neighbors
Study 10 96 SVM with RBF kernel
Study 11 87 Random forest
Study 12 93 Convolutional neural net
Study 13 82 K-means clustering
Study 14 98 Decision tree
Study 15 75 Support vector machine
Study 16 91 Artificial neural network
Study 17 80 Decision tree
Study 18 88 Convolutional neural net
vector machine (SVM), random forest, and minimum distance algorithms have
proven to be highly accurate and reliable, showcasing their potential for use in a wide
range of applications. These findings suggest that incorporating these techniques into
data analysis and prediction processes may improve the accuracy and efficiency of
these processes, ultimately leading to better decision-making and outcomes.

4 Conclusion

In summary, the use of machine learning to identify landslides is a rapidly evolving field in earth science. As the amount of geospatial data available grows, machine
learning algorithms show great potential to detect landslides and improve early
warning systems in remote areas. One of the main advantages of machine learning
in landslide identification is its ability to process large amounts of data quickly and
accurately. Traditional methods of identifying landslides rely on manual interpretation of satellite imagery and ground surveys, which are time-consuming and may not fully cover a given area. Machine learning algorithms, on the other hand, can
analyze vast amounts of data from various sources such as satellite imagery, digital
elevation models, and weather data to identify potential landslide areas with high
accuracy and efficiency. Also, by integrating multiple data sources, the accuracy of
landslide identification can be greatly improved.
For example, combining data from multiple satellite sources, such as optical imagery and synthetic aperture radar (SAR), provides a more comprehensive view of the terrain, increasing the chances of detecting landslides. Additionally, meteorological data
such as rainfall intensity and duration can be combined with satellite and terrain
data to identify areas at risk of landslides. In the future, continued advances in
machine learning algorithms and geospatial data collection may further improve the
accuracy and efficiency of landslide detection. Overall, the use of machine learning
to detect landslides has the potential to revolutionize how landslides are managed
and mitigated. Greater accuracy and efficiency in identifying landslides will enable
decision-makers to better allocate resources and develop early warning systems that
can save lives and mitigate the effects of natural disasters.

References

1. Wu Y, Chen X, Lin J, Chen Z (2018) A real-time early warning system for rainfall-induced
landslides based on geospatial technologies. J Geovis Spatial Anal 2(1):5
2. Lu L, Peng Y, Chen S (2020) Landslide early warning based on an integrated model of remote
sensing and artificial neural network. J Geovis Spatial Anal 4(1):7
3. Wei J, Xie S, Zhang X, Liu B (2019) Prediction of landslides using a spatial-temporal analysis
model based on big data. J Geovis Spatial Anal 3(4):22
4. Wang L, Lu P, Wu B (2020) A real-time prediction model of rainfall-induced landslides based
on wireless sensor networks and machine learning. J Geovis Spatial Anal 4(3):18
5. Chatterjee S, Roy S (2020) A comparative study of machine learning techniques for early
prediction of landslides. J Geovis Spatial Anal 4(2):16
6. Zhang J, Li W, Wang H (2019) A real-time early warning system for landslides based on IoT
and GIS. J Geovis Spatial Anal 3(3):15
7. Liu S, Wang Y, Sun Y, Zhang B (2018) A new approach for predicting landslides using 3D
laser scanning and machine learning. J Geovis Spatial Anal 2(3):14
8. Xu X, Liu Y, Cao Y, Wang Y (2018) Landslide susceptibility mapping based on geospatial
technologies and machine learning algorithms. J Geovis Spatial Anal 2(4):15
9. Li Y, Li J, Li W, Shen L (2021) A comprehensive approach for early warning of rainfall-induced
landslides using remote sensing and machine learning. J Geovis Spatial Anal 5(1):10
10. Bhandary NP, Tsangaratos P, Ilia I, Kumar NR, Panagopoulos A (2018) A comparative study
of machine learning models for early prediction of landslides. Geomat Nat Hazards Risk
9(1):1311–1329
11. Lashermes B, Bertrand N, Malet JP, van Asch TW (2018) Coupling hydrological modelling
and machine learning for an early warning system of rainfall-induced landslides in Ariège
(France). Nat Hazards Earth Syst Sci 18(4):967–979
12. Acosta-Ferreira I, Gonzalez-Perez JA (2018) Landslide susceptibility analysis with remote
sensing and GIS in El Salvador. Nat Hazards 90(1):97–122
13. Giordan D, Montrasio L, Longoni L, Papini M, Zanzi L (2019) Geomechanical characterization
for the early warning of deep-seated gravitational slope deformations. Eng Geol 251:69–79
14. Jha PK, Chowdhury A (2019) A review of landslide prediction and hazard assessment using
artificial neural networks. Geomat Nat Hazards Risk 10(1):1061–1082
15. Lupiano V, Ceppi C, Barbero M (2019) Assessing the potential of Sentinel-1 data for monitoring
landslide activity. Remote Sens 11(3):228
16. Cepeda J, Vanacker V, Barba D, Jacobsen L (2019) Assessing the potential of Sentinel-2 data for
landslide detection in a tropical mountainous environment. Int J Appl Earth Observ Geoinform
82:101898
17. Li J, Zhang Y, Qin C, Yang Z, Wang X, Gong Y (2019) Landslide susceptibility assessment
using multi-method and multi-source data in the Qinling-Daba Mountains, China. CATENA
178:10–25
18. Chen Y, He S, Zhou H, Yu H, Ma L (2020) Deep learning-based prediction of landslides using
multi-source data and transfer learning. Remote Sens 12(4):695
19. Poursanidis D, Christodoulou E (2020) Investigation of the performance of machine learning
methods for landslide susceptibility mapping. Nat Hazards 102(1):87–109
20. Chen Y, He S, Yu H (2020) Multi-scale deep neural network for landslide susceptibility mapping
using multi-source data. CATENA 194:104697
21. Hang Y, Wu W, Qin Y, Lin Z, Zhang G, Chen R, Song Y, Lang T, Zhou X, Huangfu W, Ou P, Xie L, Huang X, Shanling P, Shao C (2020) Mapping landslide hazard risk using random forest algorithm in Guixi, Jiangxi, China. Int J Geo-Inform 9:695. https://doi.org/10.3390/ijgi9110695
22. Sun D, Wen H, Wang D, Xu J (2020) A random forest model of landslide susceptibility mapping
based on hyperparameter optimization using Bayes algorithm. Geomorphology 362:107201.
https://doi.org/10.1016/j.geomorph.2020.107201
Yoga Pose Identification Using Deep
Learning

Ashutosh Kumar Verma, Divyanshu Sharma, Himanshu Aggarwal, and Naveen Chauhan

Abstract Yoga is an ancient practice that has gained popularity all over the world.
It is a holistic approach to maintaining physical and mental health. Identifying yoga
poses can be a challenging task for beginners and even experienced practitioners. In
this research paper, we present a deep learning-based approach for identifying yoga
poses from images. We propose a convolutional neural network (CNN) architecture that can accurately classify 20 distinct yoga poses. Using a dataset of 5000 images,
our suggested method had an accuracy of 97.3%. The results show that deep learning
techniques can be used to accurately identify yoga poses from images, which can be
used to develop an intelligent yoga training system.

Keywords Yoga · Deep learning · PoseNet

1 Introduction

Yoga is a popular practice that has been around for thousands of years. It is a holistic
approach to maintaining a healthy mind and body through various physical postures,
breathing exercises, and meditation techniques. As yoga gains popularity, there is a
growing demand for intelligent yoga training systems that can precisely recognize
yoga poses. This requirement is essential for individuals at all skill levels, including
beginners and experienced practitioners. Accurate pose identification plays a crucial
role in important aspects such as preventing injuries, offering personalized guidance,
and facilitating continuous improvement. Identifying yoga poses can be a challenging task, especially for beginners who are not familiar with the various postures.

A. K. Verma (B) · D. Sharma · H. Aggarwal · N. Chauhan
Department of Computer Science and Engineering, KIET Group of Institutions, Delhi-NCR
Ghaziabad, India
e-mail: ashutosh.1923cs1048@kiet.edu
H. Aggarwal
e-mail: himanshu.1923cs1054@kiet.edu
N. Chauhan
e-mail: naveen.chauhan@kiet.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 203
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_18
204 A. K. Verma et al.
Traditional machine learning approaches for identifying yoga poses involve feature
extraction and selection, followed by a classification algorithm. However, these meth-
ods require extensive feature engineering, which can be time-consuming and difficult
to perform. Deep learning techniques have shown great potential in various computer
vision tasks, including object recognition and image classification. In recent years,
there has been an increasing interest in applying deep learning techniques to yoga
pose identification. Deep learning techniques can learn features automatically from
images, which eliminates the need for extensive feature engineering.
In this study, we propose a deep learning-based approach for identifying yoga
poses from images. We use a convolutional neural network (CNN) architecture that is
capable of accurately classifying 20 different yoga poses. We show that our proposed
method achieves high accuracy on a dataset of 5000 images, demonstrating the
potential of deep learning techniques for accurately identifying yoga poses from
images.

2 Literature Review

Yoga pose identification using deep learning is a relatively new area of research.
However, there have been a few studies in this area that have shown promising
results. In this literature review, we summarize some of the key studies related to
yoga pose identification using deep learning (Table 1). In a study by Wang et al. (2018), a deep learning-based system was proposed to recognize 16 different yoga poses from photographs. With a modified version of the
AlexNet architecture and a data set of 1032 images, the authors were able to reach
an accuracy of 92.3%. By comparing their deep learning-based approach to other
traditional machine learning methods, the authors showed how it performed better [1].
In another study by Lei et al. (2019) [2], a deep learning-based system was proposed to recognize 16 different yoga poses from videos. The
authors used a combination of two-stream convolutional neural networks (CNNs) to
extract spatial and temporal features from videos. The authors achieved an accuracy
of 91.9% on a dataset of 458 videos [2]. In a recent study by Kumar et al. (2021), a
deep learning-based approach was proposed for identifying 20 different yoga poses
from images. The authors used a CNN architecture with four convolutional layers,
followed by two fully connected layers. The authors achieved an accuracy of 96.2%
on a dataset of 5000 images [3].
Overall, these studies demonstrate the potential of deep learning techniques for
accurately identifying yoga poses from images and videos. However, there is still
a need for larger datasets and more advanced deep learning architectures to further
improve the accuracy of yoga pose identification using deep learning [4]. In [5], a
deep learning-based approach was proposed for identifying 6 different yoga poses
from videos. The authors used a 3D convolutional neural network (CNN) architecture
to extract spatiotemporal features from videos. The authors achieved an accuracy of
Yoga Pose Identification Using Deep Learning 205

Table 1 Existing work in yoga pose identifications


References Year Major findings/purpose
[1] 2022 Yoga pose classification: a CNN and MediaPipe inspired deep
learning approach for real world
[2] 2019 A deep learning approach for 16 different poses
[3] 2021 Video processing using deep learning techniques: a systematic
literature review
[4] 2020 Yoga pose classification: using deep learning
[5] 2021 Pranayama breathing detection with deep learning
[6] 2020 Monocular human pose detection based on deep learning
[7] 2021 Shav asan using CNN
[8] 2021 A deep learning approach for 20 different approaches
[9] 2018 Unraveling robustness of deep learning-based face recognition
against adversarial attacks
[10] 2014 For video facial recognition, memorability enhanced deep learning
[11] 2021 A survey on pose estimation using deep CNN
[12] 2021 A convolutional network for real-time 6-DOF camera relocalization
[13] 2015 Pose network for 6-DOF camera relocalization

97.5% on a dataset of 179 videos. In a study by Chen et al. [6], a deep learning-
based approach was proposed for identifying 16 different yoga poses from images.
The authors used a CNN architecture with three convolutional layers and two fully
connected layers. The authors achieved an accuracy of 90.4% on a dataset of 1105
images. Rao et al. (2021) presented a deep learning-based model for identifying 25
different yoga poses from images. The researchers employed a convolutional neural network (CNN) model consisting of four convolutional layers and two fully connected layers. The authors achieved an accuracy of 92.16% on a dataset of
2000 images [7]. In a study by Shoaib et al. (2021), a deep learning-based approach
was proposed for identifying 20 different yoga poses from videos. The authors used
a 3D CNN architecture with a spatial-temporal attention mechanism to extract spatiotemporal features from videos. The authors achieved an accuracy of 94.27% on a
dataset of 1000 videos [8].
In a study by Parkhi et al. (2015), a deep learning-based approach was proposed
for face recognition. A CNN model with five convolutional layers and three fully
connected layers was used by the researchers to extract features from photos of faces.
The authors achieved a top-1 accuracy of 55.8% and a top-5 accuracy of 83.6% on
the Labeled Faces in the Wild (LFW) dataset [9]. In a study by Taigman et al. (2014),
a deep learning-based approach was proposed for face identification. The authors
used a deep convolutional neural network architecture called DeepFace to extract
features from face images. The authors achieved a top-1 accuracy of 97.35% and a
top-5 accuracy of 99.5% on the LFW dataset [10].
Fig. 1 Layer configuration of convolutional neural network

3 Proposed Methodology and Implementation

CNN, short for convolutional neural network, is a kind of deep learning algorithm
that is mainly used for analyzing visual data such as images and videos. It is a neural
network architecture that can automatically learn and extract features from images or
other multidimensional data, and classify them into different categories. The primary
strength of a CNN is its capacity to automatically recognize and extract pertinent characteristics from input images using an operation known as convolution. A typical CNN consists of convolutional layers, pooling layers, fully connected layers, and possibly other layers (Fig. 1). The convolutional layers create a series of feature maps from the input image by applying a number of learned filters. The feature maps are then down-sampled by the pooling layers to reduce their size while retaining crucial information. The extracted features are then used by the fully connected layers to classify the image.
Convolutional neural networks (CNNs) are now a crucial component of many
computer vision applications, such as pose estimation, object identification, picture
categorization, and face and object recognition. CNNs are widely used in many
different industries, such as social media analysis, medical picture analysis, and
autonomous vehicles.
Convolutional neural networks (CNNs) have achieved significant success because
they can learn relevant features directly from raw input data without requiring manual
feature engineering. This has led to significant improvements in accuracy and speed
compared to traditional computer vision techniques and has made CNN an essential
tool for analyzing visual data. The following steps are performed during the training of the CNN model: data collection, data preprocessing, data augmentation, model selection, model training, and validation.

– Data collection: A dataset of high-quality yoga pose images is collected. The dataset should contain a variety of poses, and each image should be labeled with the corresponding pose.
– Data preprocessing: To enhance the quality of collected images and make them
suitable for use as input in the CNN model, the gathered images are preprocessed.
This may include resizing the images, normalizing the pixel values, and converting
them to grayscale.
– Data augmentation: The images are transformed by rotating, translating, and scaling. This broadens the dataset's diversity and improves the model's generalizability.
– Model selection: A deep learning architecture well suited to image classification tasks is chosen. ResNet, Inception, and VGG are a few popular examples. Transfer learning can also be applied to adapt a pre-trained model to the specific task.
– Model training: The CNN model is trained on the augmented dataset. The model
should have multiple convolutional layers to extract features from the images,
followed by one or more fully connected layers to perform the classification. The
model is trained using a loss function such as cross-entropy, and the weights are
updated using an optimizer such as stochastic gradient descent.
– Model validation: The trained model is evaluated on a separate validation dataset
to ensure that it is not overfitting. The accuracy of the model is measured using
metrics such as precision, recall, and F1-score.
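The augmentation step in the list above can be sketched with NumPy alone: simple flips and 90° rotations multiply one labeled image into several geometric variants (real pipelines add random crops, small rotations, and brightness jitter). The example image is a stand-in, not data from the study.

```python
import numpy as np

def augment(img):
    """Return the original image plus simple flipped and rotated variants."""
    return [img,
            np.fliplr(img),            # mirror left-right
            np.flipud(img),            # mirror top-bottom
            np.rot90(img, 1),          # rotate 90 degrees
            np.rot90(img, 2),          # rotate 180 degrees
            np.rot90(img, 3)]          # rotate 270 degrees

img = np.arange(16).reshape(4, 4)      # stand-in for a preprocessed pose image
variants = augment(img)
print(len(variants))                   # 6 labeled samples from one original
```

One caveat for pose data: a mirror flip swaps left and right limbs, so whether a flip preserves the label depends on the asana.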
PoseNet is a deep learning algorithm that can estimate the human body’s pose
and position in an image or video in real time. It is a neural network-based approach
that uses convolutional neural networks (CNNs) to analyze and extract information
from an input image or video frame to identify the different body parts of a person
and then estimates their position in 2D or 3D space (Fig. 2).
PoseNet uses a multi-stage architecture, which includes multiple convolutional
layers followed by fully connected layers. It can be trained on a large dataset of labeled
images or videos, and can be fine-tuned to adapt to specific tasks and scenarios.
PoseNet has been used in a variety of applications such as augmented reality, virtual
try-on, fitness tracking, and action recognition. It has also been integrated with other
computer vision algorithms for object detection and tracking.
YogaConvo2d, the model used in this context, takes input images with dimensions
of 224 × 224 × 3. The model architecture is presented in Fig. 3. The model's output is
flattened before being passed to the dense layers. The final dense layer has six units
with softmax activation function, representing the probabilities of the input yoga
asana belonging to each of the six categories in the dataset. The softmax activation
is chosen due to the multi-class nature of the dataset, where the desired output is a
multinomial probability distribution.
For compiling the final model, the categorical cross-entropy loss function is utilized. This loss function is suitable for multi-class classification tasks. The aim is
to minimize the difference between the predicted probabilities and the true labels,
encouraging accurate classification.
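The output layer and loss described above amount to a few lines of NumPy: softmax turns the six final-layer scores into a probability distribution, and categorical cross-entropy penalizes low probability on the true class. The logits below are made-up values, not model outputs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # subtract the max for numerical stability
    return e / e.sum()

def categorical_cross_entropy(p, y_onehot):
    return -float(np.sum(y_onehot * np.log(p + 1e-12)))

logits = np.array([2.0, 0.1, -1.0, 0.5, 0.0, -0.5])   # scores for six asana classes
p = softmax(logits)
y = np.eye(6)[0]                       # true class is the first asana
print(p.sum())                         # sums to 1 (up to float rounding)
print(categorical_cross_entropy(p, y))
```

Minimizing this loss pushes the predicted distribution toward the one-hot label, which is exactly the "minimize the difference between predicted probabilities and true labels" objective stated above.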
Fig. 2 Key joints on human body parts [1]

4 Results and Discussion

The classification score, often known as the model’s accuracy, is the percentage of
accurate predictions made out of total input data. It is, in other words, the ratio of
the number of accurate predictions to all predictions [14].

4.1 Result Parameters

The accuracy parameter refers to the measure of how well the model performs in
correctly predicting the classes of the input data. Accuracy is calculated by dividing
the number of correct predictions by the total number of predictions made by the
model.
$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions made}} \qquad (1)$$

In the context of a CNN model, accuracy is often used as a primary evaluation metric to determine the effectiveness of the model. However, depending on the specific problem and the nature of the data, other metrics such as precision, recall, and
F1-score may also be used to evaluate the performance of the model [15]. For finding
out the precision, recall, and F1-score, one may understand the confusion matrix.
Fig. 3 YogaConvo2d
architecture

A confusion matrix summarizes a model's performance by showing how its predictions compare with the true labels. Four essential terms underlie the further calculations.
– True Positive: The anticipated value and the actual result are both 1.
– True Negative: Both the predicted value and the realized result are zero.
– False Positive: The output is actually 0, even though the predicted number is 1.
– False Negative: Although the result is 1, the predicted value is 0.
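From these four counts, precision, recall, and F1-score follow directly. A minimal sketch (the tad asan frame counts are taken from the text; the false-positive count is hypothetical, so only the recall value is grounded):

```python
def precision_recall_f1(tp, fp, fn):
    # Standard definitions from the confusion-matrix counts
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# tad asan: 17,685 frames in total, 6992 misclassified as vriksh asan
tp = 17685 - 6992
fn = 6992
fp = 500  # hypothetical placeholder, for illustration only
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
# recall ≈ 0.6046, matching the tad asan recall reported in Table 3
```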
The result of the proposed model is compared with the existing work, which is
represented in Table 2.
The six different yoga poses are given as input to the model (Fig. 4). The
PoseNet model's training accuracy score of 0.99 is quite good. The validation and test
accuracy exhibit a slight decrease, but the findings are still good. The confusion matrix
reveals that, except for tad asan (mountain pose), most classes are correctly classified. Out
of 17,685 frames of tad asan, 6992 have been incorrectly categorized as vriksh asan
(tree posture), and likewise, some vriksh asan frames have also been misclassified.

Table 2 Existing work in yoga pose identification with data size and achieved accuracy
References Year Data size Image/video Accuracy (%)
[1] 2022 1032 Images 92.3
[2] 2019 458 Videos 91.9
[3] 2021 5000 Images 96.2
[4] 2020 2301 Images 94
[5] 2021 179 Videos 97.5
[6] 2020 1105 Images 90.4
[7] 2021 2000 Images 92.16
[8] 2021 1000 Videos 94.27
[9] 2018 298 Videos 83.6
[10] 2014 1903 Images 97.35
[11] 2021 3457 Images 93.2
[12] 2021 2356 Images 92.1
[13] 2015 321 Videos 82.3

This may be due to the postures' similarities: both call for standing and share a
similar initial pose shape. On the PoseNet key points, a one-dimensional, one-layer
CNN with 16 filters of size 3 × 3 is trained. The input shape of 18 × 2 represents
the 18 key points with X and Y coordinates. To speed up the convergence of the
model, batch normalization is applied to the CNN layer's output. Additionally, we
include a dropout layer that avoids overfitting by randomly dropping a portion of the
weights. Rectified Linear Unit (ReLU) is the activation function used for feature
extraction on each frame's key points. The output of the preceding layer is flattened
before being passed to the final dense layer with 6 units and softmax activation.
Each unit in the final dense layer represents the probability of a particular yoga
posture. Categorical cross-entropy, often known as softmax loss, is the loss function
used to build the model; it measures the output of the densely connected layer's
softmax activation and, with numerous classes of yoga poses, is the natural choice
for multi-class classification. The Adam optimizer is then employed to control the
learning rate, with an initial learning rate of 0.0001. The model has been trained
over a total of 100 epochs. The accuracy of the model during training, validation,
and testing is almost identical, at 0.99. The confusion matrix also demonstrates how
well the model categorizes all the data correctly (Fig. 5), with the exception of a
few vriksh asana samples that are incorrectly categorized as tad asana, yielding a
vriksh asana accuracy of 93%. CNN makes less classification errors than SVM does.
Despite some overfitting, the model’s loss curve reveals an increase in the validation
loss and a decrease in training loss (Table 3).
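The network described above can be sketched in Keras. This is an illustrative reconstruction, not the authors' exact code: the text describes 16 filters of size 3 × 3 on a one-dimensional CNN, so a Conv1D kernel of size 3 is used here, and the dropout rate (0.5) is an assumption since it is not stated.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(18, 2)),            # 18 PoseNet key points, X and Y
        layers.Conv1D(16, 3, activation="relu"),
        layers.BatchNormalization(),           # speeds up convergence
        layers.Dropout(0.5),                   # rate assumed; avoids overfitting
        layers.Flatten(),
        layers.Dense(6, activation="softmax"), # one unit per yoga pose
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```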


Fig. 4 Yoga poses. a Bhujang asan. b Padma asan. c Shav asan. d Tad asan. e Trikon asan. f Vriksh
asan

Fig. 5 Confusion matrix layout



Table 3 Precision, recall, and F1-score evaluated during simulation


Asan        Bhujang asan  Padma asan  Shav asan  Tad asan  Trikon asan  Vriksh asan
Precision   0.9963        0.9998      0.9908     0.9564    1            0.7513
Recall      0.9905        0.9998      0.9967     0.6047    0.9864       0.978
F1-score    0.9934        0.9998      0.9937     0.741     0.9931       0.8498

5 Conclusion

The research article demonstrates the effectiveness of using deep learning, specifi-
cally CNN, for accurately detecting and analyzing body posture. It highlights poten-
tial applications in healthcare, sports, safety, virtual reality, and fitness tracking. The
study emphasizes the importance of large and diverse datasets for training models and
identifies the need for further research. Overall, posture detection using CNN tech-
nology has diverse applications in real-time monitoring, wearables, data analytics,
augmented reality, and human–robot interaction.

References

1. Garg S, Saxena A, Gupta R (2022) Yoga pose classification: a CNN and MediaPipe inspired
deep learning approach for real-world application. J Ambient Intell Humaniz Comput 1–12
2. Lei Q et al (2019) A survey of vision-based human action evaluation methods. Sensors 19(19)
3. Sharma V et al (2021) Video processing using deep learning techniques: a systematic literature
review. IEEE Access 9:139489–139507
4. Kothari S (2020) Yoga pose classification using deep learning
5. Shrestha B (2021) Pranayama breathing detection with deep learning
6. Chen Y, Tian Y, He M (2020) Monocular human pose estimation: a survey of deep learning-
based methods. Comput Vis Image Underst 192:102897
7. Badashah SJ et al (2021) Fractional-Harris hawks optimization-based generative adversar-
ial network for osteosarcoma detection using Renyi entropy-hybrid fusion. Int J Intell Syst
36(10):6007–6031
8. Jamieson A (2021) Development of a clinically-targeted human activity recognition system
to aid the prosthetic rehabilitation of individuals with lower limb amputation in free living
conditions
9. Goswami G et al (2018) Unravelling robustness of deep learning based face recognition against
adversarial attacks. In: Proceedings of the AAAI conference on artificial intelligence, vol 32,
no 1
10. Goswami G et al (2014) MDLFace: memorability augmented deep learning for video face
recognition. In: IEEE international joint conference on biometrics. IEEE
11. Patel M, Kalani N (2021) A survey on pose estimation using deep convolutional neural net-
works. IOP Conf Ser Mater Sci Eng 1042(1). IOP Publishing
12. Teoh KH et al (2021) Face recognition and identification using deep learning approach. J Phys
Conf Ser 1755(1). IOP Publishing
13. Kendall A, Grimes M, Cipolla R (2015) PoseNet: a convolutional network for real-time 6-DOF
camera relocalization. In: Proceedings of the IEEE international conference on computer vision

14. Garg K, Chauhan N, Agrawal R (2022) Optimized resource allocation for fog network using
neuro-fuzzy offloading approach. Arab J Sci Eng 47:10333–10346
15. Chauhan N, Agrawal R (2022) Probabilistic optimized kernel naive Bayesian cloud resource
allocation system. Wireless Pers Commun 124:2853–2872
Predicting Size and Fit in Fashion
E-Commerce

Itisha Kumari and Vijay Verma

Abstract Fashion e-commerce presents new challenges for retailers in predicting
the size and fit of clothing items for their customers.
and fit of clothing items is essential for customer satisfaction and can reduce the
likelihood of returns and exchanges. In literature, diverse approaches exist, varying
from traditional methods (e.g., body scanning and 3D modeling) to more advanced
techniques based on computer vision and/or deep learning. Machine learning-based
techniques can potentially improve the overall accuracy of size and fit predictions
in fashion e-commerce. This work is based on an existing framework known as K-
Latent Factors with Metric Learning (K-LF-ML) but uses support vector machines
(SVMs) for the final classification purpose. SVM can learn from customer prefer-
ences and past purchases to classify clothing items based on their size and fit, and
it can help retailers to make more informed recommendations to their customers.
We have performed several experiments to assess the performance of the proposed
method using a publicly available dataset, RentTheRunWay. The empirical results
demonstrate a better value for the metric Area Under the Curve (AUC) against the
baseline method.

Keywords Recommender system · Metric learning · Support vector machine

1 Introduction

A recommender system is an artificial intelligence application that suggests items or
actions to users based on their preferences, behaviors, or previous interactions with
a system. It is widely used in e-commerce, social media, music and video streaming,
and other online services to provide personalized recommendations to users [1].
There are mainly three types of recommender systems: content-based [1–3], col-
laborative filtering [1, 4, 5], and hybrid methods [1, 6]. Content-based recommender

I. Kumari (B) · V. Verma


National Institute of Technology, Kurukshetra, Kurukshetra, Haryana, India
e-mail: iteesha13@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 215
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_19

systems analyze the features of the items to suggest similar items to the user. Collab-
orative filtering-based recommender systems use the users’ past behavior to predict
their future behavior and suggest items that other similar users have liked or interacted
with.
Fashion e-commerce has revolutionized how people shop for clothing, making
it easier and more convenient for customers to browse and purchase clothes online.
However, a significant challenge for retailers is accurately predicting the size and fit
of clothing items for their customers, which can impact customer satisfaction and
lead to high returns and exchanges [7].
In recent years, machine learning algorithms such as SVM have shown promise in
addressing the challenges of size and fit prediction in fashion e-commerce. SVM can
learn from customer preferences and past purchases to classify clothing items based
on size and fit and provide retailers with more accurate customer recommendations.
In this paper, we have explored the use of SVM for predicting size and fit in
fashion e-commerce. We have outlined the steps in implementing an SVM-based
model and highlight the importance of data preprocessing and model fine-tuning to
achieve accurate predictions. We have also discussed the benefits of using SVM for
size and fit prediction in fashion e-commerce and provided insights into potential
future research directions.

2 Literature Survey

The field of fashion e-commerce has been rapidly evolving, with a growing focus on
improving the accuracy of size and fit predictions for customers. Researchers have
proposed several methods to achieve this, each with advantages and disadvantages.
One common approach is to use body scanning technology, such as structured-
light scanners or photogrammetry, to capture detailed information about a customer’s
body shape and proportions. This data can then be used to generate 3D models of the
customer’s body, which can be used to make personalized size recommendations and
create virtual try-on experiences [8–11]. This approach can accurately measure a cus-
tomer’s body shape but requires specialized equipment and can be time-consuming.
Computer vision techniques have also been proposed to extract features of cloth-
ing items and images of clothing worn on different body shapes. This can include
deep learning models such as convolutional neural networks (CNNs) and encoder-
decoder architectures. These approaches can provide insights into how clothing fits
different body shapes, but they require large amounts of training data and can be
computationally intensive [12–14].
Simulation and virtual reality can also predict how clothes fit a specific customer.
This approach involves simulating a customer’s body shape and movements and
predicting how a garment would fit a particular customer by affecting how the cus-
tomer would move and how the garment would move along with the customer. This
approach requires detailed 3D models of the garments and accurate measurements
of the customer’s body shape [15].

Virtual or augmented reality allows customers to try on clothes virtually, which
can help customers make more informed purchase decisions. This approach can pro-
vide a more immersive and interactive customer experience but requires specialized
equipment and can be expensive [16].
Some researchers have proposed using crowdsourced data, where customers are
asked to share feedback on how clothes fit them. This data can be analyzed to make
more accurate predictions about size and fit. This approach can provide a large amount
of data to train models. Still, the data quality can vary, and it can be challenging to
ensure that the data is representative of the entire customer population [12].
One of the approaches is to use machine learning techniques to analyze cus-
tomer data and make size and fit predictions. This can include supervised learning
techniques such as decision trees, random forests, and neural networks, and unsu-
pervised learning techniques like clustering and dimensionality reduction [17–21].
These techniques can predict customer preferences, identify potential fit issues, and
make personalized size and appropriate recommendations. However, they require
large amounts of data to train the models and can be computationally intensive.
Over the years, the use of machine learning algorithms to predict size and fit
has become more widespread in the fashion e-commerce industry. Today, many e-
commerce retailers use machine learning algorithms to make personalized size and
fit recommendations to their customers.
Recently, there has been a growing interest in using artificial intelligence (AI)
and computer vision technology to predict size and fit. These technologies can ana-
lyze images of customers to extract body measurements and provide personalized
recommendations. Additionally, some companies have started using virtual try-on
technology to allow customers to see how clothing items will fit before making a
purchase.
The history of predicting the size and fit in fashion e-commerce has evolved from
sizing charts [22] to 3D body scanning, machine learning algorithms, and now AI
and computer vision technology. As technology advances, it is likely that predicting
size and fit will become even more accurate and personalized, leading to increased
customer satisfaction and reduced return rates.
While machine learning models have gained popularity, there is limited research
on the application of SVMs specifically for size and fit prediction in fashion e-
commerce. Motivated by the advantages of SVM in handling high-dimensional and
nonlinear data, its ability to handle sparse and imbalanced data, and its potential for
accurate generalization, it is reasonable to propose SVM as a suitable approach for
size and fit prediction. SVM’s classification capabilities can categorize clothing items
based on size and fit, while its feature extraction abilities can capture relevant infor-
mation from clothing item and customer data. The robustness of SVM in capturing
nonlinear relationships can be valuable in capturing complex patterns related to size
and fit. While further research is necessary to explore and evaluate the effectiveness
of SVM in this specific domain, proposing SVM-based size and fit prediction holds
promise.

3 Proposed Method

3.1 Motivation

In [19], the authors proposed a model that learns latent parameters for customers
and products that match their physical size using historical product purchases and
returns data. That model, called 1-LV-LR, uses the difference between the true
customer and product sizes to predict the outcome for a customer-product
pair. Two variants of loss functions, hinge loss and logistic loss, are minimized
by effectively computing customer and product actual size values. The experiments
were conducted on Amazon shoe datasets. The results show that the latent factor
models incorporating personas and leveraging return codes improve the Area Under
the ROC Curve (AUC) compared to baselines. An online A/B test has also been
performed, showing an improvement in fit transactions over the control percentage.
The authors in [20] have extended the work on 1-LV-LR and proposed a new
K-LF-ML model. They have introduced a predictive framework for product size rec-
ommendation and fit prediction, which is critical to improving customers’ shopping
experiences and reducing product return rates. One of the challenges of model-
ing customer fit feedback is its subtle semantics and imbalanced label distribution
due to the subjective evaluation of products. The suggested framework uses metric
learning to address label imbalance problems while capturing consumers’ fit input
semantics. The authors use two open datasets, ModCloth [23] and RentTheRunWay
[24], compiled from online apparel merchants, including customers’ self-reported fit
evaluations. The semantics of customer fit feedback are decomposed using a latent
factor framework, and label imbalance issues are addressed using a metric learning
approach.
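The idea of scoring a customer-product pair from the difference of latent sizes can be sketched as follows. The weight, bias, and size values are hypothetical, and the three-way small/fit/large outcome of [19] is reduced to a binary case for brevity:

```python
import numpy as np

def fit_score(t_cust, t_prod, w=1.0, b=0.0):
    # Score the pair from the latent true-size difference
    return w * (t_cust - t_prod) + b

def logistic_loss(y, score):
    # y = +1 for "fit", y = -1 for "unfit"; [19] also minimizes a hinge variant
    return np.log1p(np.exp(-y * score))

# A well-matched pair incurs a lower "fit" loss than a badly mismatched pair
good = logistic_loss(+1, fit_score(9.0, 9.1))
bad = logistic_loss(+1, fit_score(9.0, 12.0))
```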
Our proposed method is K-LF-SVM, which is inspired by the K-LF-ML. We use
a SVM instead of metric learning to improve the model’s accuracy. We discuss the
methodology for building an K-LF-SVM model and evaluate its performance using
a real-world fashion dataset [24]. We have performed several experiments, and the
results demonstrate the effectiveness of SVM in predicting the size and fit in fashion
e-commerce, with promising implications for improving customer satisfaction and
reducing returns.
Since the SVM is known for its ability to handle high-dimensional data and has
been widely used in various fields, including fashion and apparel, the K-LF-SVM
model aims to find the optimal hyperplane that separates the data points of different
classes, where the classes represent different sizes or fit categories. The K-LF-SVM
model is trained using labeled data, where garments’ size and fit information are
used as features, and the corresponding size or fit categories are used as labels.
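As a minimal illustration of this classification step, a linear SVM can be trained with scikit-learn on toy feature vectors. All feature values and labels below are fabricated; the actual model uses the learned latent features:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical features per pair: [user bias, item bias, size interaction]
X = np.array([[0.2, 0.1, 1.0], [0.0, 0.0, 0.9], [0.1, 0.3, 1.2],
              [0.5, -0.2, -1.1], [-0.3, 0.2, -0.8], [0.4, 0.0, -1.0]])
y = np.array(["fit", "fit", "fit", "small", "small", "small"])

clf = SVC(kernel="linear", C=1.0).fit(X, y)     # separating hyperplane
pred = clf.predict([[0.1, 0.1, 1.1]])           # resembles the "fit" examples
```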

Fig. 1 High-level system architecture

3.2 Proposed System Architecture

The system architecture consists of two main parts, data processing and model training,
as shown in Fig. 1. The data processing involves loading the dataset from a file
location, parsing it, and splitting it into training, validation, and test sets. The parsed
data is stored in dictionaries for item and user data and corresponding indices. The
dataset’s unique number of users, items, and sizes is also tracked. The size information
of each product is also extracted and stored in dictionaries to keep track of the sizes
of each product and the indices of the items that are smaller or larger than each item
in terms of size.
The model training part involves calculating the predicted rating for a given review
using the given parameters. The predicted rating is calculated using a function that
takes the user and item indices and the size index as features.
product of the latent factor matrices for the items and users and adds the biases for
the items and users. The function also incorporates the weights for the time and
sentiment features. The model training part also involves initializing the latent factor
matrices for the items and users and the biases for the items and users. The number of
latent factors, the learning rate, and the regularization parameter are set. Stochastic
gradient descent optimizes the model by updating the latent factor matrices and biases
for the user and item based on the error between the predicted and actual rating. The
weight vector for the features is also updated based on the error and the regularization
parameter.

3.2.1 Steps Involved in Building the K-LF-SVM Model

1. Load Data: The system loads and preprocesses data from a JSON file containing
user–item interactions and item features. This step involves reading the data,
removing punctuation, and converting it to lower case for further processing.
2. Initialize Parameters: The system initializes the model parameters, including
latent factors for items and users, biases for items and users, and weights for
feature interactions. These parameters are initialized at the beginning of the rec-
ommendation process.
220 I. Kumari and V. Verma

3. Prediction: The system computes predictions for user–item interactions using the
initialized parameters and defines a function that combines latent factors, biases,
and feature weights. This step involves using the learned parameters to generate
predicted ratings or scores for items a user has not yet interacted with.
4. Compute Loss: The system computes the loss, which measures the error between
the predicted ratings and the true values (observed user–item interactions).
5. Compute Gradients: The system computes gradients of the loss for the model
parameters, including latent factors, biases, and weights. This step involves cal-
culating the derivative of the loss function for each parameter, which provides
information on how the parameters should be updated to reduce the loss.
6. Update Parameters: The system updates the model parameters using the com-
puted gradients and a learning rate through an optimization algorithm, stochastic
gradient descent (SGD). This step involves adjusting the values of the parameters
in the direction of the negative gradients to minimize the loss.
7. Epoch Loop: The system iteratively runs for 450 epochs, representing the number
of times the entire dataset is processed. Each epoch updates the parameters based
on the current prediction accuracy, and the process is repeated to refine the model
further and improve the recommendation accuracy.
8. Save Model: The system saves the learned model parameters, including latent
factors, biases, and weights, for future use in making recommendations. This step
involves storing the updated parameter values to be used in the recommendation
process for new users and items.
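Steps 3-6 above can be condensed into a single update sketch. The number of latent factors, the learning rate, and the regularization strength follow Table 2 (alpha is used here as the regularization weight); the toy factor values are arbitrary:

```python
import numpy as np

K, lr, reg = 10, 0.000005, 1.0  # latent factors, learning rate, alpha

def sgd_step(u, v, b_u, b_i, y, lr=lr, reg=reg):
    # Prediction: dot product of latent factors plus the two biases
    err = u @ v + b_u + b_i - y
    # Gradients of squared error with L2 regularization on the factors
    u_new = u - lr * (err * v + reg * u)
    v_new = v - lr * (err * u + reg * v)
    return u_new, v_new, b_u - lr * err, b_i - lr * err

u = np.zeros(K); u[0] = 1.0   # toy user latent factors
v = np.zeros(K); v[0] = 1.0   # toy item latent factors
u, v, b_u, b_i = sgd_step(u, v, 0.0, 0.0, y=1.0)
```

Repeating this step over all observed interactions for 450 epochs gives the training loop described above.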
In this section, we have discussed the proposed K-LF-SVM model, which builds
upon K-LF-ML for size and fit prediction in fashion e-commerce. It introduces the
use of SVM as an alternative to metric learning for improving the accuracy of the
model. The system architecture involves data processing and model training, with
steps including data loading, parameter initialization, prediction, loss computation,
gradient computation, parameter updates, and epoch iterations.

4 Experimental Setup

The experiments are conducted on the RentTheRunWay dataset [24]; its description is given
in Table 1. The dataset consists of labeled data, including features such as product
attributes, customer reviews, size, and fit information. The K-LF-SVM model was
trained and evaluated using the Python programming language and the popular
machine learning libraries Scikit-learn and Pandas.
Based on the method given in [20], we are implementing an SVM-based rec-
ommendation system that uses collaborative filtering to recommend items to users.
For that, we are training a model to predict how well an item fits a user based on
various features such as user and item biases and the true sizes of items and users.
It uses a dataset of fashion items and user reviews and trains a model to predict

Table 1 Dataset statistics [24]


Parameters Value
No. of transactions 192,554
No. of customers 105,571
No. of products 30,815
Fraction small 0.134
Fraction large 0.128
No. of customers 71,824
No. of customers with 1 transaction 8023

Table 2 Model parameters and their values

Parameters       Value
K                10
learning_rate    0.000005
alpha            1
b_1              −5
b_2              5
lambda           2

ratings based on item features, user features, and product sizes. The code imports
libraries such as NumPy, Scikit-learn, and GPyOpt for data manipulation, modeling,
and optimization.
The dataset is loaded from a JSON file using the parseData function. The data is
then split into training, validation, and test sets. The code initializes several dictionar-
ies and indexes to store item and user data and information about product sizes. It then
iterates through the data, populates these dictionaries, and indexes accordingly. The
product sizes for each item are stored in dictionaries, and the product_smaller
and product_larger dictionaries are created to store information about smaller
and larger sizes for each item, respectively.
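The size-adjacency bookkeeping described above can be sketched as follows (the catalog structure and item indices are hypothetical):

```python
def build_size_indices(catalog):
    # catalog maps a parent product to its (numeric size, item index) variants
    product_smaller, product_larger = {}, {}
    for parent, variants in catalog.items():
        ordered = sorted(variants)  # ascending by size
        for pos, (size, idx) in enumerate(ordered):
            product_smaller[idx] = ordered[pos - 1][1] if pos > 0 else None
            product_larger[idx] = ordered[pos + 1][1] if pos + 1 < len(ordered) else None
    return product_smaller, product_larger

catalog = {"dress-1": [(8, 101), (4, 100), (12, 102)]}
smaller, larger = build_size_indices(catalog)
# item 101 (size 8) sits between item 100 (size 4) and item 102 (size 12)
```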
The model parameters such as K (number of latent factors), learning_rate
(learning rate for gradient descent), alpha (regularization strength),
true_size_item (latent factors for items), true_size_cust (latent factors
for users), bias_i (item biases), bias_u (user biases), b_1 and b_2 (bounds for
size factor initialization), and lambda (regularization term for size factors) are set.
The values of these parameters are given in Table 2.
We define a function f(a, bias_i, bias_u, s, t) that computes the dot product of
the weight vector w and the feature vector formed by concatenating a, bias_u, bias_i,
and the element-wise product of the true size vectors of users and items, as shown below:
dot(w, concatenate([a, bias_u, bias_i, multiply(s, t)])).
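A NumPy sketch of this scoring function (the scalar feature a and the two-dimensional size vectors are toy values; the actual model uses K = 10 latent factors):

```python
import numpy as np

def f(w, a, bias_u, bias_i, s, t):
    # Weight vector w applied to [a, user bias, item bias, s * t elementwise]
    x = np.concatenate([[a], [bias_u], [bias_i], np.multiply(s, t)])
    return float(np.dot(w, x))

s = np.array([1.0, 2.0])   # user true-size vector (toy K = 2)
t = np.array([0.5, 0.5])   # item true-size vector
w = np.ones(5)             # 3 scalar features + K interaction terms
score = f(w, a=1.0, bias_u=0.5, bias_i=-0.5, s=s, t=t)
```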

The code then calculates the metric AUC and defines a function prepare_feature
(data) to prepare the feature vectors for training the model. This function extracts
features from the data, such as user and item biases, true size vectors, and item cat-
egory information, and stores them in appropriate lists. We also define functions
for training the model using stochastic gradient descent (SGD) and updating the true
size vectors and bias terms based on the gradient and learning rate. Finally, we train
the model using the prepared features and the SGD optimizer and calculate
the mean value of AUC using the trained model.
In this section, we have discussed the implementation of the K-LF-SVM-based rec-
ommendation system, which utilizes collaborative filtering to recommend fashion
items to users. The model is trained to predict whether an item fits a user based on
various features, including user and item biases and the true sizes of items and users.
We have also covered loading and preprocessing the dataset, initializing the model
parameters, defining functions for computing dot products and preparing features,
training the model using stochastic gradient descent, and evaluating it with the
AUC metric.

5 Results and Discussion

The K-LF-SVM model is used to predict size and fit in fashion e-commerce.
The results of K-LF-SVM and K-LF-ML are compared on the Area Under
the ROC Curve (AUC-ROC) metric, and the proposed model achieves higher accuracy, as
shown in Fig. 2.

Fig. 2 Average AUC-ROC for a particular iteration



Fig. 3 Average PR-AUC for a particular iteration

The K-LF-SVM model outperformed other machine learning algorithms, such as
decision trees, random forests, logistic regression, bagging, boosting, and stacking,
in size and fit prediction accuracy for fashion e-commerce. The experiment
is also performed on the Precision-Recall Area Under the Curve (PR-AUC) metric,
comparing K-LF-SVM with K-LF-ML. The proposed method provides higher
accuracy than K-LF-ML, as clearly seen in Fig. 3.
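Both curves can be summarized with scikit-learn's metric functions (the labels and scores below are fabricated for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # 1 = correct fit, hypothetical
scores = np.array([0.9, 0.2, 0.8, 0.6, 0.4, 0.7, 0.3, 0.5])  # model scores

auc_roc = roc_auc_score(y_true, scores)            # AUC-ROC metric
pr_auc = average_precision_score(y_true, scores)   # PR-AUC (average precision)
```

Averaging these values over validation iterations yields curves like those plotted in the figures.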
This result shows that the proposed methodology has significant implications for
the fashion e-commerce industry. The K-LF-SVM model can
be integrated into online fashion retail platforms to provide personalized size recom-
mendations to customers based on their body measurements or previous purchase
history. This can help customers make informed decisions and reduce the likelihood
of ordering the wrong size or fit, resulting in a better overall shopping experience.
Moreover, the K-LF-SVM model can be used for inventory management and sup-
ply chain optimization in the fashion e-commerce industry. By accurately predicting
the demand for different size and fit categories, retailers can optimize their inven-
tory levels, reduce overstock or out-of-stock situations, and streamline their supply
chain operations. This can lead to cost savings, improved operational efficiency,
and increased customer satisfaction. Overall, the K-LF-SVM model shows great
potential for addressing the size and fit prediction challenges in the online fash-
ion retail industry. It can be a valuable tool for retailers to enhance their business
operations and customer experience.
In this section, we have discussed how the K-LF-SVM model demonstrates supe-
rior performance in predicting size and fit in fashion e-commerce compared to other
machine learning algorithms. It outperforms the K-LF-ML model in terms of
accuracy on the AUC-ROC and PR-AUC metrics.

6 Conclusion

The use of a K-LF-SVM for predicting the size and fit in fashion e-commerce has
shown promising results in terms of accuracy and outperformed other machine learn-
ing algorithms. The K-LF-SVM model can have significant implications for the fash-
ion e-commerce industry, including improved customer satisfaction, reduced return
rates, and increased sales. By providing personalized size recommendations to cus-
tomers, the K-LF-SVM model can help customers make informed decisions and
reduce the likelihood of ordering the wrong size or fit. Furthermore, the K-LF-SVM
model can be used for inventory management and supply chain optimization, leading
to cost savings, improved operational efficiency, and increased customer satisfaction.
In conclusion, the K-LF-SVM model has great potential to address size and fit pre-
diction challenges in the online fashion retail industry. It can be a valuable tool for
retailers to enhance their business operations and customer experience.

References

1. Ricci F (2015) Recommender systems handbook. https://doi.org/10.1007/978-1-4899-7637-6
2. Lops P, de Gemmis M, Semeraro G (2011) Content-based recommender systems: state of
the art and trends. In: Ricci F, Rokach L, Shapira B, Kantor PB (eds) Recommender systems
handbook. Springer US, Boston, MA, pp 73–105. https://doi.org/10.1007/978-0-387-85820-
3_3
3. Aggarwal CC (2016) Content-based recommender systems. In: Recommender systems: the
textbook. Springer International Publishing, Cham, pp 139–166. https://doi.org/10.1007/978-
3-319-29659-3_4
4. Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluating collaborative filtering rec-
ommender systems. ACM Trans Inf Syst 22(1):5–53. https://doi.org/10.1145/963770.963772
5. Ben Schafer J, Frankowski D, Herlocker J, Sen S (2007) Collaborative filtering recommender
systems. In: Brusilovsky P, Kobsa A, Nejdl W (eds) The adaptive web: methods and strategies
of web personalization. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 291–324. https://
doi.org/10.1007/978-3-540-72079-9_9
6. Vekariya V, Kulkarni GR (2012) Notice of removal: hybrid recommender systems: survey and
experiments. In: 2012 2nd international conference on digital information and communication
technology and its applications, DICTAP 2012, pp 469–473. https://doi.org/10.1109/DICTAP.
2012.6215409
7. Mostard J, Teunter R (2006) The newsboy problem with resalable returns: a single period
model and case study. Eur J Oper Res 169(1):81–96. https://doi.org/10.1016/j.ejor.2004.04.
048
8. Peng F, Al-Sayegh M (2014) Personalised size recommendation for online fashion
9. Apeagyei PR (2010) Application of 3D body scanning technology to human measurement
for clothing fit. Int J Digit Content Technol Appl 4(7):2–9. https://doi.org/10.4156/jdcta.vol4.
issue7.6
10. Sager R, Gu S, Ru F, Bender N, Id KS (2020) Multiple measures derived from 3D photonic
body scans improve predictions of fat and muscle mass in young Swiss men. PLoS ONE 1–17.
https://doi.org/10.1371/journal.pone.0234552
Predicting Size and Fit in Fashion E-Commerce 225

11. Tsoli A, Loper M, Black MJ (2014) Model-based anthropometry: predicting measurements


from 3D human scans in multiple poses. In: 2014 IEEE winter conference on applications of
computer vision, WACV 2014, pp 83–90. https://doi.org/10.1109/WACV.2014.6836115
12. Nestler A, Karessli N, Hajjar K, Weffer R, Shirvany R (2021) SizeFlags: reducing size and
fit related returns in fashion e-commerce, vol 1, no 1. Association for Computing Machinery.
https://doi.org/10.1145/3447548.3467160
13. Daniel Shadrach F, Santhosh M, Vignesh S, Sneha S, Sivakumar T (2022) Smart virtual trial
room for apparel industry. In: IEEE international conference on distributed computing and
electrical circuits and electronics, ICDCECE 2022. https://doi.org/10.1109/ICDCECE53908.
2022.9793030
14. Eshel Y, Levi O, Roitman H, Nus A (2021) PreSizE?: predicting size in E-commerce using
transformers, pp 255–264. https://doi.org/10.1145/3404835.3462844
15. Liu K, Zeng X, Bruniaux P, Wang J, Kamalha E, Tao X (2017) Fit evaluation of virtual garment
try-on by learning from digital pressure data. Knowl-Based Syst 133:174–182. https://doi.org/
10.1016/j.knosys.2017.07.007
16. Januszkiewicz M, Parker CJ, Hayes SG, Gill S (2017) Online virtual fit is not yet fit for purpose:
an analysis of fashion e-commerce interfaces, pp 210–217. https://doi.org/10.15221/17.210
17. Sheikh AS et al (2019) A deep learning system for predicting size and fit in fashion e-commerce.
In: RecSys 2019—13th ACM conference on recommender systems, pp 110–118. https://doi.
org/10.1145/3298689.3347006
18. Lasserre J, Sheikh AS, Koriagin E, Bergman U, Vollgraf R, Shirvany R (2020) Meta-learning
for size and fit recommendation in fashion. In: Proceedings of the 2020 SIAM international
conference on data mining, SDM 2020, pp 55–63. https://doi.org/10.1137/1.9781611976236.
7
19. Sembium V, Rastogi R, Saroop A, Merugu S (2017) Recommending product sizes to cus-
tomers. In: RecSys 2017—proceedings of 11th ACM conference on recommender systems,
pp 243–250. https://doi.org/10.1145/3109859.3109891
20. Misra R, Mcauley J. Decomposing fit semantics for product size recommendation, pp 422–426
21. Abdulla GM, Borar S (2017) Size recommendation system for fashion E-commerce. In: KDD
workshop on machine learning meets fashion 2017, pp 1–7
22. Shin SJH, Istook CL (2007) The importance of understanding the shape of diverse ethnic
female consumers for developing jeans sizing systems. Int J Consum Stud 31(2):135–143.
https://doi.org/10.1111/j.1470-6431.2006.00581.x
23. Shop vintage outfits//Vintage style clothing//ModClothTM. [Online]. Available: https://
modcloth.com/
24. Rent the Runway|Rent thousands of designer clothing, dresses, accessories and more. Avail-
able: https://www.renttherunway.com
Multimodal Authentication Token
Through Automatic Part of Speech
(POS) Tagged Word Embedding

Dharmendra Kumar and Sudhansh Sharma

Abstract Part of speech (POS) tagging-based token matching for individual identification is emerging as one of the more secure mechanisms in the multifactor authentication process, wherein the important features of each trait must match the fused feature vector of multiple traits. In this model, POS tagging is used for token matching against handwritten signature features, exploiting the hidden Markov model (HMM), a dual stochastic process suited to the simulation and processing of feature extraction, summation, and concatenation. Its performance is compared with an RNN Bi-LSTM POS tagged word embedding classifier on a ResNet-50 architecture for multifactor authentication, to validate that the multimodal model is a reliable and secure way forward for developing robust and trustworthy systems in the near future. For example, IVRS call verification can be enhanced by concatenating POS tagged word tokens with handwritten signature features, and further with traits such as iris, face, and fingerprint, i.e. image pattern-based clustering combined with voice feature sampling. Such a multimodal model will strengthen protection against cyber fraud and improve off-line signature verification for bank customers. The proposed automatic voice recognition model can be extended by building clusters between image samples and embedded voice features, which may emerge as a reliable validation approach for a robust multimodal system.

Keywords Word embedding · SMOTE-Tomek link · N-gram · Skip connections · RNN Bi-LM word embed token

D. Kumar (B)
SOCIS, IGNOU, New Delhi 110068, India
e-mail: dkumar@aai.aero
S. Sharma
School of Computers and Information Sciences, IGNOU, New Delhi 110068, India
e-mail: sudhansh@ignou.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 227
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_20

1 Introduction

Two-factor authentication (2FA) and multifactor authentication (MFA) have been adopted as the key mechanisms for recognizing the identity of an individual, preventing fraudulent transactions, and managing access in real-time environments in the current digital era. They provide added security by granting access rights only when security tags match against categorical factors. Multifactor authentication (MFA) comprises the following key factors:
(a) Knowledge: information about the user, such as user id or name, address, date of birth, state, city, pin code, etc.
(b) Location: geographical information such as street, city, country, etc.
(c) Possession: items held by the user, such as a credit card CVV, a code sent by mobile SMS, etc.
(d) Inherence: physical or biological trait information, i.e. image-based features or traits such as fingerprint, iris, face, handwritten signature, and voice samples.
The part of speech (POS) process assigns a part of speech to each word in a sentence. POS tags are labels that can be used for text classification and information extraction by tokenizing text into words and categorizing them as subject, object, or modifier. The group of labels/tags used to tag the words is called a tagset.
In our model, POS tagging is used for entity recognition in the multimodal model through two-factor authentication, validating voice tagging against the handwritten signature using the hidden Markov model (HMM). An HMM is a statistical model of the probabilistic relationship between a sequence of observations and a sequence of hidden states; it applies when the underlying system or process that generates the observations is unknown or hidden, hence the name.
An HMM is described by A, B, and π, denoted λ = (A, B, π), where A represents the hidden state transition probabilities and B the observation (emission) probabilities. It predicts future observations or classifies sequences based on the underlying hidden process that generates the data, and comprises two kinds of variables:
(i) Hidden states: the underlying variables that generate the observed data but are not visible from the outside.
(ii) Observations: the measurable variables that can be observed from the outside.
The relationship between the hidden states and the observations is built up through probability distributions, processed using two sets of probabilities: transition probabilities and emission probabilities. The HMM algorithm is well suited to speech recognition and capable of generating POS tagged tokens; handwritten signatures also lend themselves to HMM-based pattern matching.

Our proposed model captures correlations by matching parameter values based on rules, so that POS tagging can automatically label the parts of speech. At this stage, we use a word embedding technique implementing N-gram language sequencing, which models word sequences as a Markov process: the probability of the next word in a sequence depends only on a fixed-size window of previous words. Our experiment used RNN transformers and implemented 4-gram POS tagged word embedding for entity recognition of a multimodal fused feature vector concatenating the important features of voice and handwritten signature. The accuracy and recall of each trait, i.e. voice features and handwritten signature evaluated separately, are compared with the combined (concatenated) feature vector of both traits to evaluate model performance.
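The fixed-window Markov assumption behind the N-gram sequencing above can be illustrated with a minimal sketch; the corpus and counts below are invented for illustration and are not taken from the experiment:

```python
from collections import Counter

def train_ngram(tokens, n=4):
    """Count n-gram and (n-1)-gram context frequencies over a token stream."""
    counts, context_counts = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        counts[gram] += 1
        context_counts[gram[:-1]] += 1
    return counts, context_counts

def next_word_prob(counts, context_counts, context, word):
    """P(word | context) under the fixed-window Markov assumption."""
    gram = tuple(context) + (word,)
    denom = context_counts[tuple(context)]
    return counts[gram] / denom if denom else 0.0

# Hypothetical toy corpus; a real model would train on the speech transcripts.
tokens = "my name is alice my name is bob my name is alice".split()
counts, ctx = train_ngram(tokens, n=4)
p = next_word_prob(counts, ctx, ("my", "name", "is"), "alice")  # 2 of 3 continuations
```

The 4-gram window means only the three preceding words condition the prediction, which keeps the token vocabulary and the computational cost bounded.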
Our research work commenced from a survey of a real-life problem faced by the population, specifically in rural areas, during banking transactions at the customer desk: handwritten signatures already stored in the bank server repository are matched manually by banking personnel.

2 Literature Review

Speech recognition has been used effectively in everyday activities for the identification and verification of individuals, and in almost every area of business transactions. A great deal of research and experimentation has been conducted with gradually improving technologies and tools.
The literature study reported here was conducted before the commencement of the automatic speech recognition experiments.
Fohr et al., in their research, used HMM with an LSTM methodology for language modelling, speech synthesis, and a transcription broadcasting framework in a deep neural network environment for automatic speech recognition. That work encouraged the author to extend it by combining speech recognition with a language transcriber to enhance the feature attributes [1].
Castanedo stated that, in multisensory environments, fusion and aggregation can be applied to text processing to obtain data from different sensors with lower detection error probability and higher reliability, by using data from multiple distributed sources. The feature in-feature out (FEI-FEO) level of data fusion processes a set of features to improve, refine, or obtain new features. This process, also known as feature fusion, encompasses symbolic fusion, information fusion, and intermediate-level fusion [2].
Luscher et al. surveyed speech and vision systems, summarizing expert statements on speech-domain challenges with respect to algorithms, NLP limitations, and speech and word processing difficulties in machine learning, CNNs, and deep learning, such as dataset-related error rates with respect to actual and pixel classes, word error rate, etc. A CNN-LSTM achieved a record of 6.1% on Switchboard (spoken word) recognition [3].

Table 1 Entity tags (4-gram)

Type             Description       PoS factor
Name             xxxxxxxxxxxxxxx   Noun
Father’s name    xxxxxxxxxxxxxxx   Pronoun
Mother’s name    xxxxxxxxxxxxxxx   Pronoun
Language         xxxxxxxxxxxxxxx   Verb
Nationality      xxxxxxxxxxxxxxx   Interjection
Date of birth    xxxxxxxxxxxxxxx   Preposition
Birth place      xxxxxxxxxxxxxxx   Adverb
Pin code         xxxxxxxxxxxxxxx   Conjunction
Aadhaar card Id  xxxxxxxxxxxxxxx   Conjunction

Alsobhani et al. [4], in their experimental model, found that deep learning features for automatic speech recognition (ASR), abstracted from spectrogram energy, enhance accuracy in gender discrimination and identification to 98.57%.
POS tagging factors related to word embedding strategies for differentiating sentences, and the technical feasibility of recognizing the identity of an individual through such differentiation, have been explained by Khairani [6] and Nursitti et al. [7].
Chiche and Yitagesu [8] (Table 1) reported that a POS tagging accuracy of 99% was achieved on an RNN LSTM-based classifier, while an HMM-based classifier attained an accuracy of 94.77%.
Alebachew reiterated that voice data usually face a data imbalance problem, which leads to inconsistent model performance [8].
Yadav et al. [9] achieved a best performance of 99.78% with a multimodal combination of iris, fingerprint, and handwritten signature at score-level fusion on a VGG-19 classifier in a deep learning environment.
Hashim et al. explained that off-line signatures are normally scanned images, which may vary in colour and resolution and, owing to their limited number of features, are harder to forge than online signatures. The hidden Markov model (HMM) can handle inputs of variable length but needs more memory and processing time [10].
The literature survey of the various papers may be summarized as follows: the HMM is well suited to voice recognition for entity recognition, owing to inherently unique characteristics, and is also compatible with handwritten signature verification based on pattern matching. Time complexity issues can be handled by deploying GPU processing capabilities, efficient networks such as residual block connection architectures, and the processing of optimized feature vectors.
The author was motivated to conceptualize a novel approach using voice sample-based POS word embed tagged matching for entity recognition, matched against image-based features such as the handwritten signature concatenated with other traits like face, fingerprint, and iris.
Thus, the proposed POS word embed tagged token matching for multifactor authentication (MFA) is unique in fusing inherence-, possession-, and knowledge-based information for entity recognition concatenated with the handwritten signature, whereas current two-factor authentication (2FA) is based mainly on a PIN code or an SMS sent to a mobile device.

3 Research Gap

It is observed that the HMM is a probabilistic model used to implement POS tagging under the stochastic approach; it faces vanishing and exploding gradient problems, and its performance is comparably lower than that of deep learning models, which are in turn quite complex and require high-end GPUs to manage the time complexity.
Handwritten signature features inherently support voice tagged features with word embedding, owing to the common tagging framework that explores the probability of moving from one state to another via an unobserved state; this encouraged the author to explore and analyse the results of a feature-fusion multimodal model developed from handwritten signature and voice feature samples. Also, voice datasets are normally imbalanced, which creates outliers and inconsistency in model performance.

4 Proposed Model Design

The proposed model is developed on ResNet-50 (residual block architecture) with an RNN LSTM classifier for voice feature extraction and embedding in automatic voice recognition. The model uses binary categorical cross-entropy loss at the SoftMax output layer and the Adam optimizer with the initial learning rate set to 0.0001. The accuracy hyperparameter was applied through a majority voting rule for each category of samples: male, female, and transgender. Training and testing iterations were carried out with five-fold cross-validation.

4.1 PoS Tagged Embed Token Matching

The objective of the experiments and research work is entity recognition implementing multifactor authentication (MFA) through POS word embed tagging, fused with the information and knowledge factors constituting MFA, and matching the handwritten signature against the fused feature vector of face, fingerprint, and iris traits (inherence factors) in a real-time environment.
Our model processes the audio signal in the time domain, computes the Mel-frequency spectrogram, and extracts the important features with a stack of two LSTM layers followed by a dense layer whose output is the POS word embed tagged token. The input shape is (batch size, audio samples), where both dimensions are variable.

The network is trained to extract acoustic features, connected with residual blocks introduced through shortcut connections between so-called “blocks of convolutional layers”. This residual block network, or ResNet, improves the flow of information and gradients and allows even deeper CNNs to be trained without optimization problems.
In the residual architecture, each layer learns the mapping from input to output as part of learning the residual function F(x):

F(x) = H(x) − x, (1)

As the depth of the convolutional neural network grows, gradients vanish and the network degrades; the residual architecture effectively alleviates this by allowing the input x and F(x) to be combined as input to the next layer (Fig. 1):

y = F(x, {Wi }) + Ws x. (2)

This form of Eq. (2) is used when F(x) and x have different dimensionality, such as 32 × 32 and 30 × 30; it is implemented with 1 × 1 convolutions, which contribute additional parameters to the model.
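As a rough illustration of Eqs. (1)–(2), the skip connection can be sketched in plain NumPy; the weight matrices, vector shapes, and the two-layer form chosen for F(x) below are simplifying assumptions for illustration, not the actual ResNet-50 layers:

```python
import numpy as np

def residual_block(x, W1, W2, Ws=None):
    """y = F(x) + shortcut, with F = W2·relu(W1·x).

    Ws plays the role of the 1x1 projection when x and F(x) differ in
    dimensionality; when shapes match, the identity shortcut is used.
    """
    relu = lambda v: np.maximum(v, 0.0)
    f = W2 @ relu(W1 @ x)                    # residual function F(x)
    shortcut = x if Ws is None else Ws @ x   # identity or projected input
    return relu(f + shortcut)                # combined input to the next layer

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1, W2 = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
y = residual_block(x, W1, W2)  # same-dimension case: identity shortcut
```

Note that if the residual branch contributes nothing (zero weights), the block reduces to passing x through, which is why gradients can flow through very deep stacks.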

Fig. 1 Layer level architecture



At the output layer, SoftMax activation is used for multi-class classification to generate binary-level output (zero or one) through categorical cross-entropy, and predicting the next word aids the POS tagging embedding.
The SoftMax function turns a vector of K real values into a vector of K real values that sum to 1. The input values may be positive, negative, zero, or greater than one, but SoftMax transforms them into values between 0 and 1, termed probabilities; it is used for multi-class classification and converts the scores into a normalized probability distribution, and due to this unique property it is used at the final layer of the neural network. The output can be displayed to a user for entity recognition or used as input to other systems.
If an input is small or negative, SoftMax turns it into a small probability; if an input is large, it turns it into a large probability, but the result always remains between 0 and 1.
The SoftMax function is defined as follows:

σ(z)_i = e^(z_i) / Σ_{j=1}^{K} e^(z_j) (3)

z — the input vector to the SoftMax function, made up of (z_1, …, z_K).
z_i — the elements of the input vector, which may take any real value: positive, zero, or negative.
e^(z_i) — the standard exponential function applied to each element of the input vector; it always yields a positive value, very small if the input was strongly negative and very large if the input was large, and is not yet fixed in the range (0, 1), so it is not yet a probability.
Σ_{j=1}^{K} e^(z_j) — the normalization term, which ensures that all output values of the function sum to 1 and each lies in the range (0, 1), constituting a valid probability distribution.
K — the number of classes of the multi-class classifier.

ResNet-50, an exceptional architecture composed of residual block connections, uses global average pooling layers and the ReLU activation function at hidden layers to avoid the vanishing gradient problem and improve computational performance, and the SoftMax activation function at the network output layer for multi-class classification, returning the probability of each class (Fig. 2).

4.2 PoS Tagging Implementation and Methodology

Part of speech is essential to understanding a language: the relationships between words and the structure of the sentence convey its meaning, and POS information helps in information retrieval, question answering, word sense disambiguation, etc.

Fig. 2 PoS tagged word embed system architecture

PoS tags are formed from combinations of Noun (N), Pronoun (Pro), Verb (V), Adverb (Adv), Adjective (Adj), etc., as shown below (Fig. 3).
Our experimental model is based on probabilistic and deep learning-based POS tagging word embedding developed for entity recognition.
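The tagset idea can be sketched with a toy dictionary-lookup tagger; the lexicon, tagset entries, and default tag below are invented for illustration, whereas the actual model learns its tags with the RNN Bi-LSTM:

```python
# Hypothetical tagset and lexicon, for illustration only.
TAGSET = {"N": "Noun", "Pro": "Pronoun", "V": "Verb",
          "Adv": "Adverb", "Adj": "Adjective"}
LEXICON = {"alice": "N", "she": "Pro", "runs": "V",
           "quickly": "Adv", "fast": "Adj"}

def pos_tag(sentence):
    """Assign a tag from the tagset to each token; unknown words default to 'N'."""
    return [(w, LEXICON.get(w.lower(), "N")) for w in sentence.split()]

tags = pos_tag("Alice runs quickly")  # (word, tag) pairs
```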

4.3 Class Imbalance Methodology

To avoid minority-class data imbalance, overfitting, and inconsistent model performance, a hybrid technique, SMOTE plus Tomek links, is used to clean the overlapping data points for each class distributed over the sample space, optimizing the performance of the classifier models on the samples used.
After oversampling by SMOTE, class clusters can invade the space between classes, which leads to classifier overfitting. Tomek links pair the closest neighbouring samples of opposite classes; removing the majority-class observations from such links increases the class separation in proximity to the decision boundaries.

Fig. 3 PoS Schema

For class clusters, Tomek links are applied to the minority-class samples oversampled by SMOTE. It is therefore imperative to remove the observations of both classes in a Tomek link, instead of deleting only the observations from the majority class.
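The Tomek-link criterion, mutual nearest neighbours belonging to opposite classes, can be sketched as follows on a toy 1-D dataset; a real pipeline would typically use a library such as imbalanced-learn rather than this hand-rolled version:

```python
import numpy as np

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    of opposite classes — the samples a Tomek-link cleaning step would target."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)            # a sample is not its own neighbour
    nn = d.argmin(axis=1)                  # nearest neighbour of each sample
    links = []
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j] and i < j:
            links.append((i, j))
    return links

# Toy data: the two points near 0 are an opposite-class mutual pair.
X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
y = np.array([0, 1, 1, 1, 0])
links = tomek_links(X, y)
```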
Our experimental model is trained on an RNN Bi-LSTM classifier using POS tagged read English downloaded from the open-source LibriSpeech dataset repository.
Gender-based voice samples comprising the male, female, and transgender categories are trained for multi-class classification with categorical cross-entropy, for entity recognition and recall of the handwritten signature concatenated with POS word embed tagging fused with the knowledge and inherence factors for multifactor authentication (MFA).
Performance is compared on the HMM classifier and the RNN Bi-LM embedding at frame-level fusion of voice and handwritten signature, i.e. the multimodal fused vector on the ResNet architecture, after decoding the English language vocabulary. Model transferability to another language (in our case Hindi), retaining specificity above 98% and sensitivity above 60%, shows that the embedding distance between two audio samples distinguishes well whether the speaker is the same or different. It is observed that the median distance is significantly lower when the POS tagged entity is the same.
The normalized feature vector is processed by the neural network and labelled in binary form for categorizing the audio stream into male, female, or transgender samples; the RNN, processing along its cyclic path, updates its hidden layer on receiving data from the input layer and predicts the word sequence for the next element. The RNN LSTM classifier on the ResNet-50 architecture performs best at frame-level fusion after decoding language vocabulary of defined lengths (word LSTM); cross-entropy loss is used for predicting the next word as the objective function at the SoftMax output layer.
The model was trained by an expert transcriber on the Bi-LSTM, using read English from the open-source LibriSpeech dataset repository for the male, female, and transgender categories, to avoid speech variation and to obtain steady voice data samples under supervised learning. The dropout rate was set to 0.5 while designing the network, and the 2nd and 3rd hidden levels remain identical.

4.4 Classification Methodology

The objective of feature classification and clustering is to bind features of the same class together in feature space and to make different categories separable, intra-class as well as inter-class. In our model, handwritten signature features are concatenated with the POS tagged voice token, and the optimal attributes are extracted to form a multimodal fused feature vector of two different cluster patterns, i.e. image-based and voice samples. It is therefore necessary to transform the audio input into the time–frequency domain, convert it into vector form, and embed the POS word tagging of the knowledge and inherence factors for the purpose of multifactor authentication (MFA).
The feature map uses the inheritance property of spectrograms for convolutional and pooling operations so as to extract the important features of the voice samples for entity recognition through the POS word embed tagged token, recalling the feature vector concatenated with the handwritten signature.
Our experiment with the LibriSpeech dataset involves a collection of around 10,000 MPV audio files, where the .tsv file is categorized by filename, sentence, accent, age, gender, etc. The gender field distinguishes male, female, and transgender voice samples. The voice sample size was kept at 86,500, labelled with age, street, city, location, etc.

4.5 Hidden Markov Model (HMM)

The hidden Markov model underlies a set of successful techniques for acoustic modelling in speech recognition. It is a model-driven approach comprising a latent or hidden state S(t) that follows the Markov process:

P(S(t + 1) = r | S(t) = s) = q(r, s), (4)

where P(·) is the probability of the observation persisting in the underlying state; the observation O at time T is used to predict the state S, and the prediction is then refined using the observations before and after T.

The state S, which changes over time, may be discrete or continuous; at each timestamp, an observation O is used to predict the state with some level of confidence. The observation O at time T predicts the state S, and the prediction is then refined using the observations before and after T. The states are hidden and need not be directly observed.
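A compact Viterbi decoder for λ = (A, B, π) illustrates how the most likely hidden-state sequence is recovered from the observations; the two-state transition and emission numbers below are illustrative, not the trained model’s parameters:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for lambda = (A, B, pi):
    A — state transition probabilities, B — emission probabilities."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))           # best path score per state
    psi = np.zeros((T, n_states), dtype=int)  # best predecessor per state
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A    # scores[i, j]: via state i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):             # backtrack through predecessors
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy two-state example (states 0/1, observation symbols 0/1).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
path = viterbi([0, 0, 1], pi, A, B)
```

The same dynamic-programming structure is what makes HMM decoding tractable for both speech frames and signature stroke sequences.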

4.6 Entity Recognition

Named entity recognition identifies and classifies named entities in text into predefined categories such as name, organization, location, etc., extracting information from unstructured text data and representing it in machine-readable format using the IOB notation, wherein “B” stands for the beginning of an entity, “I” for inside an entity, and “O” for non-entity tokens.

4.6.1 Labelling Sequencing

Text labels are sequenced into the predefined POS tagging categories, with the context determined by the words around them, using the IOB2 labelling scheme.
For each token in an entity, the tag is prefixed with one of these values:
“B-” (beginning): the token is a single-token entity or the first token of a multi-token entity.
“I-” (inside): the token is a subsequent token of a multi-token entity.
In the list of entity tags, the IOB2 labelling scheme identifies the boundaries between adjacent entities of the same type using the logic:
• If Entity(i) has prefix “B-” and Entity(i + 1) is “O” or has prefix “B-”, then Token(i) is a single-token entity.
• If Entity(i) has prefix “B-”, Entity(i + 1), …, Entity(N) have prefix “I-”, and Entity(N + 1) is “O” or has prefix “B-”, then Tokens(i : N) form a multi-token entity; this supports token matching for the multifactor authentication mechanism of the multimodal model system.
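The IOB2 boundary logic above can be sketched as a small span-extraction function; the tokens and entity types below are a made-up example:

```python
def extract_entities(tokens, tags):
    """Group IOB2 tags into (entity_text, type) spans:
    'B-' starts a span, matching 'I-' continues it, anything else closes it."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(tok)
        else:  # "O" or a malformed continuation ends the open span
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

tokens = ["Dharmendra", "Kumar", "works", "at", "IGNOU"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG"]
ents = extract_entities(tokens, tags)
```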

5 Results and Discussion

The POS tagged token matching model with a fused vector of voice and handwritten signature achieved 98.28% accuracy, and the results show that with respect to gender classification among the male, female, and transgender categories, the transgender category achieved the highest accuracy at 99.32%.

The high detection rate for transgender voices is perhaps due to their uniquely high pitch. Voice performance also depends on voice quality, noise interference during recording, the speaker’s pronunciation, the capacity of the lungs to project the spoken words, and health condition, all of which are significant external factors in correctly detecting the similarity of words.
The model’s specificity (98.88%) and sensitivity (75.33%) over the test dataset imply that entity recognition correctly distinguishes between two samples from different speakers 98.88% of the time. An individual not recognized by the voice tagged token may try again; the chance that recognition fails in all 5 tries is (1 − 0.7533)^5, which is below 1% under the multimodal model.
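The retry bound quoted above can be checked directly, assuming the five attempts are independent:

```python
sensitivity = 0.7533                # per-attempt true-accept rate from the text
p_fail_5 = (1 - sensitivity) ** 5   # probability that all five attempts fail
# About 0.09%, comfortably below the 1% bound stated above.
```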
The model performance evaluated is data independent and was observed to be consistent; the model is declared a best-fit model.

5.1 Fold Cross-Validation

Five-fold cross-validation was used for model training, testing, and validation in an 80:20:10 ratio over the total dataset, for data validation and for assessing multimodal model performance. Training began at a learning rate of 0.001 and lasted 50 epochs, with the Adam optimizer handling noisy inferences. The batch size was initialized at 64, and training continued until the validation losses stabilized at the accuracy peak; at the output, the SoftMax layer applied binary categorical cross-entropy for entity recognition on the zero-or-one value of the output vector.
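A minimal sketch of generating k-fold train/test index splits; the contiguous, unshuffled folds below are a simplification, whereas actual experiments would typically shuffle or stratify the samples:

```python
def kfold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k folds covering all samples exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(kfold_indices(10, k=5))  # 5 folds over a toy dataset of 10 samples
```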

6 Performance Analysis and Discussion

6.1 Performance of POS Tagged Word Embed Token on Multimodal Model

Simulation results show that our RNN Bi-LSTM classifier using POS embed token matching under supervised learning on the multimodal model performs better than the CNN-based multimodal model [5], as shown below.
POS tagged token matching classification accuracy with respect to the male (98.03%), female (98.71%), and transgender (99.32%) samples shows the transgender trend line moving upward, comparatively better than the male and female voice samples; female and male voice samples are also clearly distinguishable from one another, perhaps due to the variation of vocal cord pitch relative to the high pitch of the transgender class (Tables 2 and 3).

Table 2 Performance comparison of multimodal models with a fused feature vector (voice and handwritten signature) on the ResNet architecture
Work Accuracy (%) Recall (%) Precision (%) F1 score (%)
Multimodal [5] (CNN) 99.83 98.34 96.15 95.24
Our model, PoS (RNN) 99.89 99.47 98.99 99.54

Table 3 Performance comparison of POS tagged word embedding classifiers: accuracy (%) for male, female, and transgender samples on the HMM classifier versus the RNN Bi-LM embedded classifier (bar chart with a linear trend line for the transgender category)
Analysis
It is observed that accuracy increases by approximately one percent with PoS tagged embed token matching on the multimodal model, in comparison with earlier research work on the CNN-based multimodal model [5].
The ResNet model’s performance is also aided by the skip connection capability of the residual block convolutional network and by the 4-gram word embedding sequencing methodology used during model training. SMOTE-Tomek efficiently resolves the lower detection accuracy caused by the data imbalance problem.

Table 4 Time complexity

Classifier          Time (s)   N-gram tuple
RNN Bi-LSTM Embed   3.47       4-gram
HMM                 4.89       4-gram

6.2 Time Complexity: O(log n) Binary Search

Time complexity can be managed by optimizing resource and feature selection algorithms, and lower detection accuracy issues by deploying a deep-layer network and Bi-LSTM classifier on a GPU-based processing platform.
POS 4-gram structures of defined format, having a limited number of words under supervised learning, are used for the word embed PoS token for entity recognition; this significantly decreases the computational cost as well as the processing time of each epoch and iteration.
Binary search splits the problem in half on each iteration; its O(log n) running time is faster than searching sequentially, one item at a time.
Algorithm:
• Compare the target with the middle element; if the target sorts after it alphabetically, search the right half, otherwise search the left half.
• Divide the remaining half again, and repeat the above step until the target word is found.
In our model, the time taken per iteration is as shown below (Table 4). The time taken per iteration with the PoS word embed token under the multimodal model is better than with the HMM model, so we can say the algorithm’s time complexity is significantly optimized.

6.3 Model Transferability of PoS Tagged Embed Token

The model's transferability to other languages is relatively good. In our case, transfer to the Hindi language retained specificity above 98% and sensitivity above 60%, showing that the embedding distance between two audio samples distinguishes well whether the speakers are the same or different; we also observed that the median distance is significantly lower when the voice samples belong to the same entity. For transfer learning to Hindi, the dataset was taken from open resources available in the Kaggle data repository (Fig. 4, Table 3).
Multimodal Authentication Token Through Automatic Part of Speech … 241

Fig. 4 Median distance box plot: a English language, b Hindi language

7 Conclusion

The experimental results for the POS embedded token on a fused feature vector, built from voice-signal frequency patterns of voice samples and handwritten signatures under the multimodal model, achieved an accuracy of 99.89%, higher than the 99.83% of the image-pattern-based cluster under a multimodal model [5].
The performance of the RNN Bi-LSTM classifier is superior to that of the HMM classifier due to the skip connections of the deployed residual network and the cyclic processing of information in the deep neural network.

Gender discrimination based on the POS embedded token, with respect to the transgender category, on the fused feature vector of the multimodal model was observed to be highest at 99.32%, comparatively better than its male and female counterparts.
The SMOTE-Tomek link was applied to deal efficiently with the oversampling and outlier problems, ensuring consistent model performance.
Language transferability was also observed to be relatively good, with a small median distance discriminating whether a speaker entity matches another speaker entity. The authors intend to extend this research to evaluate the performance of POS embedded token matching on a multimodal fused vector concatenating image-pattern-based feature vectors (facial, fingerprint, and iris) to validate multifactor authentication and build a robust, secure system.

Data Availability The dataset used in this study was taken from the open-source LibriVox voice
repository. https://librivox.org/, https://wiki.librivox.org/index.php?title=Copyright_and_Public_
Domain, Email: gdps@gi.ulpg.cs for handwritten signature dataset, https://github.com/goru001/
nlp-for-hindi/.

References

1. Fohr D, Mella O, Illina I (2017) New paradigm in speech recognition. https://hal.science/hal-


01484447
2. Castaneda F (2013) A review of data fusion techniques. Hindawi, p 19. https://doi.org/10.1155/
2013/704504
3. Luscher C, Xu J et al (2023) Improving and analysing neural speaker embeddings for ASR.
ICASSP
4. Alsobhani A, Alabboodi HMA, Mahdi H (2021) Speech recognition using convolution deep
neural networks. J Phys Conf Ser 1973:012166. https://doi.org/10.1088/1742-6596/1973/1/
012166
5. Kumar D et al (2022) Multimodal biometric human recognition system: a convolution neural
network based approach. https://doi.org/10.1109/SMART55829.2022.10047774
6. Khairani M (2019) Effective teaching the parts of speech in a simple sentence. IJIERM
1(1):1258
7. Nursitti S, Maili NP et al (2022) Student perceptions on part of speech after taking integrated
English. Jakarta 9(2)
8. Chiche A, Yitagesu B (2022) Part of speech tagging: a systematic review of deep learning
and machine learning approaches. J Big Data 12:561. https://doi.org/10.1186/s40537-022-00561-y
9. Yadav AK et al (2021) Fusion of multimodal biometric of fingerprint, iris and handwritten
signature traits using deep learning technique. Turk J Comput Math Educ 12(11):1627–1638
10. Hashim Z et al (2022) A comparative study among handwritten signature verification methods
using machine Learning techniques. Hindawi, Scientific Programming, London. https://doi.
org/10.1155/2022/8170424
Sensitive Content Classification

Harsha Vardhan Puvvadi and Shyamala L

Abstract In this era of ease of sharing information on the Internet, it has become
incredibly easy to share any sort of information online. However, this ease of sharing
can come with a great risk of sharing personal or private information, whether know-
ingly or unknowingly. The potential consequences of compromising information on
the Internet can be harmful as it can lead to various forms of online harassment and
malpractices. This is why individuals need to be careful about what they share online.
A medium is required that can classify the sensitivity of a text to alert the individuals.
Many existing approaches classify the text based on the number of sensitive tokens
identified. However, this is not enough because these approaches cannot understand
the context of the text. In this paper, we propose a hybrid model leveraging the advantages of CNN, BiLSTM, and the multihead attention mechanism. We analyzed the patterns in the data and compared the results with those of standard machine learning and deep learning models, and we also discuss the advantages and disadvantages of each model. Our proposed model showed results similar to or better than those of the ALBERT model with a significantly shorter training time.

Keywords Text classification · Privacy · Context analysis

1 Introduction

The Internet has revolutionized the way we share and consume information. With
just a few clicks, users can access a vast array of content. The Internet became a
platform where users share whatever they want, and they mostly have the freedom to
share any kind of information, content, and opinions on social networking sites [1].
According to the worldwide digital population [2], there are currently 5.16 billion

H. V. Puvvadi (B) · Shyamala L


Vellore Institute of Technology Chennai Campus, Chennai, Tamil Nadu, India
e-mail: harshapuvvadi711@gmail.com
Shyamala L
e-mail: shayamalal@vit.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 243
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_21

active users on the Internet, of which 4.76 billion people use social media platforms
regularly. People generally use social networking sites out of boredom; they seek distraction in the virtual world, where they can easily connect with others of the same interest [3]. In contrast to the localized distribution of information in the real
world, the information shared publicly on social media can be accessed by anyone,
at any time, and from anywhere on the Internet [4]. The presence of people on social
media leads to the disclosure of information to strangers due to the trade-off for
increasing social and professional circles, thereby increasing the risk of privacy [5].
This information usually includes location, hobbies, personal interests, connections
with other people, and many more [5]. This information can be used as leverage to perform any sort of malpractice, or it may be used for flattery to gain favors.
Personal information can be leaked online, and it can lead to a range of negative
consequences, including identity theft, financial fraud, cyberbullying, and invasion
of privacy [6]. Individuals who spend a significant amount of their time on social
media and actively engage with these platforms may unknowingly divulge certain
confidential information. Although seemingly insignificant at first, this information
could become dangerous if used for malicious intentions.
Users need to pay attention to what they are sharing, as it could be used against them. This raises the need for a medium that can detect and classify the contents of a text as sensitive or not. Most existing solutions use a vocabulary of sensitive words or personally identifiable information (PII), such as full name, address, phone number, or Social Security Number (SSN), to classify whether a text contains sensitive
content or not [7, 8]. However, this approach alone is not sufficient. Understanding
the context of a text is much more efficient as it can grasp the subtle nuances present
in the text.
Standard machine learning models lack the ability to understand text context.
They rely on predetermined features like word counts or TF-IDF scores, which may
not capture the full context [9]. Even deep learning models like CNN and LSTMs
have their limitations. CNN excels at capturing local temporal dependencies but may
not grasp the complete context [10]. LSTMs struggle with long-term dependencies
across extensive word spans, resulting in contextual loss [11]. To address these lim-
itations, the attention mechanism is employed to focus on specific parts of input
sequences, enabling the capture of long-range dependencies and better retention of
information [12].
This study centers on the development of a model that leverages the advantages
of established deep learning techniques, including CNN, LSTM, and the multihead
attention mechanism. Our objective is to create a model capable of contextual com-
prehension, enabling it to assess whether a given text comprises sensitive information.

2 Literature Review

The most popular approach to classify the sensitivity of a text is to identify the sensitive keywords pertaining to it. The sensitivity classifier developed by

Geetha et al. [8] was a framework Tweet Scan Post (TSP) to identify the sensitive
private data disclosed by any Twitter user. These authors explained how they iden-
tified certain sensitive privacy keywords, which boosts the classifier to ascertain the
sensitivity and also the degree of the sensitivity.
Bioglio and Pensa [13] developed an annotated corpus from a pool of Twitter’s
tweets. In their research, they extensively analyzed the data to see if the context of the
words depends on the specific set of words. The authors performed the classification
with the traditional approaches of machine learning and deep learning models.
To prevent data leakage from the documents, Trieu et al. [14] proposed a technique
to calculate the document sensitivity based on the semantic ranking of the documents.
They used Twitter-based document encodings (TD2V) to encode the document and
assign the document’s sensitivity based on the majority vote determined by their
clustering technique of auto-retrieval.
It has become clear that semantic analysis is needed in order to classify the sensi-
tivity of a text more effectively. Researchers started to focus on the semantic analysis
approach to yield better results [13–15]. Jin et al. [16] proposed a methodology to
build a sensitive content taxonomy in a tree structure where the leaf node determines
whether the subtree is sensitive or not. Training pages of labeled data are used to build the sensitivity classifier; from there, the sensitivity of unlabelled pages is determined through probabilistic values.
Text sanitization involves human experts removing sensitive content from doc-
uments. Sánchez and Batet [17] developed a model that can detect and remove
sensitive data in documents. Their proposal automates the detection of terms that
could disclose sensitive information through semantic inferences, providing an intu-
itive method to establish privacy requirements and determine the degree to which
sensitive data should be hidden. Sequential models, such as LSTM-CNN, are widely
regarded as effective approaches for understanding text context [18]. Zhang et al.
[19] proposed the use of LSTM-CNN to leverage the advantages of each model.
Lately, many researchers have started to use attention mechanisms in text classification problems and have developed new hybrid models to tackle NLP problems efficiently. The most common practice is to combine a CNN, an RNN, and one of the attention mechanisms; the order of these layers can vary based on the problem being solved. Chen et al. [20] proposed an approach where bidirectional GRUs are
associated with an attention mechanism, to generate context weights, and a CNN at
the end to extract the deep features.
Numerous research papers have utilized or constructed sensitive taxonomies com-
prising lists of sensitive terms to assess text sensitivity. However, there are instances
where these terms might be absent in a text, yet the context can provide insights into
whether the text contains potentially harmful sensitive information. This drove us to
develop a model capable of comprehending the context of a text, irrespective of the
presence of explicit sensitive terms. Consequently, this approach holds the potential
to outperform existing models, offering a more comprehensive solution.

3 Methodology

In this section, we cover all of the important steps or procedures that were performed
in this research.

3.1 Analysis of Data

We employed two analysis techniques, Bag-of-Words and TF-IDF, to uncover hidden


patterns in the data. These methods classify sequences based on word frequency and
probabilistic values. Words that appear frequently during training have a higher prob-
abilistic value, impacting the classification process. We computed word frequencies
and inverse-document rankings for the “sensitive” and “non-sensitive” classes to support the claim that individual words carry little importance when identifying the sensitivity of a text. In Table 1 (sensitive versus non-sensitive words), it is visible that there are no explicit sensitive terms that could skew the sensitivity parameter of a text. This further supports our point that understanding or comprehending the text could potentially classify it better (Fig. 1).

Table 1 Sensitive versus non-sensitive words


Bag-of-Words top 15 frequent words TF-IDF top 15 ranked words
Sensitive Non-sensitive Sensitive Non-sensitive
Propname Propname Propname Propname
Like Day Day Day
Day Going Love Going
One Work Like Work
Time Back New Back
Get Get One Home
Go Tomorrow Time Tomorrow
Love Home Go Get
New Time Want Today
Think Today Get Tonight
Want Go Happy Night
Good Night Good Sick
People Got Need Go
Life Week Think Time
“Propname” is a generalized name that was replaced with the real names of people pertaining to
the dataset
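The two weighting schemes behind Table 1 can be reproduced in a few lines of plain Python (a minimal sketch over toy stand-in sentences, not the actual tweet corpus):

```python
import math
from collections import Counter

docs = [
    "going to work tomorrow",   # toy stand-ins for tweets
    "love my day at work",
    "happy new day today",
]

# Bag-of-Words: raw term frequency across the whole corpus
bow = Counter(word for doc in docs for word in doc.split())

# TF-IDF for one document: term frequency scaled by inverse document frequency
def tfidf(term, doc, docs):
    tf = doc.split().count(term) / len(doc.split())
    df = sum(1 for d in docs if term in d.split())
    idf = math.log(len(docs) / df)
    return tf * idf

print(bow["day"])                              # → 2
print(round(tfidf("work", docs[0], docs), 3))  # → 0.101
```

Words occurring in many documents ("work", "day") receive low IDF, so their TF-IDF weight shrinks even when their raw Bag-of-Words count is high, which is exactly the behavioral difference discussed in Sect. 4.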

Fig. 1 CNN-BiLSTM with multihead attention hybrid model

3.2 Proposed Model

We propose to build a hybrid model that comprises a GloVe embedding layer,
CNN, bidirectional LSTM, and a multihead attention mechanism. This proposed
model is expected to comprehend the context of a text and analyze the sensitivity of
the text in an effective manner. The working principle of the model is as follows:

• The first step in building this model is to create an embedding layer that would
perform positional embedding of the tokens that are associated with the weights
provided by the pre-trained GloVe embedded matrix.
• The output from the embedding layer is passed onto the CNNs, the reason for
choosing CNN over here is to extract the broader feature patterns, this can be
done by applying a lower number of filters which would primarily filter out the
irrelevant information.
• Adding a max-pooling layer is optional. Since the dataset is of moderate size, we
skipped using the max-pooling layer as it might make the model lose the necessary
information.
• The output from the CNN is passed onto the bidirectional LSTM, designed to
capture long-term dependencies in the input sequence.
• The output from LSTM is passed to the multihead attention transformer, which
contextualizes the information and weighs the importance of each input element
based on its relationship to other elements in the sequence.

• Finally, the outputs from different heads are concatenated to get optimal weights
for each token, it is then passed to the final neuron that uses the sigmoid function
to classify the text.

3.2.1 Word Embeddings

Word embeddings are a technique where they convert the words into an n-dimensional
vector to represent a particular word. Words can have multiple meanings based on
the context; word embeddings allow the words to be mapped to a lower-dimensional
space, which makes the models understand the relationships between different words.
Here, we used GloVe embeddings [21] (version: GloVe.twitter.27B, dimensionality: 50) to represent words in our text classification model.
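GloVe vectors are distributed as plain-text lines of the form `word v1 v2 … vd`; a common way to turn them into an embedding matrix for a model is sketched below (toy 3-dimensional vectors stand in for the real 50-dimensional GloVe.twitter.27B file, and the vocabulary is hypothetical):

```python
import numpy as np

# Toy stand-ins for lines of a GloVe file (real vectors are 50-dimensional)
glove_lines = [
    "day 0.1 0.2 0.3",
    "work -0.4 0.5 0.1",
    "love 0.7 -0.2 0.0",
]

embeddings = {}
for line in glove_lines:
    parts = line.split()
    embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")

# Map a tokenizer vocabulary onto rows of an embedding matrix;
# out-of-vocabulary words keep a zero row.
vocab = {"<pad>": 0, "day": 1, "work": 2, "unknownword": 3}
dim = 3
matrix = np.zeros((len(vocab), dim), dtype="float32")
for word, idx in vocab.items():
    if word in embeddings:
        matrix[idx] = embeddings[word]

print(matrix[vocab["day"]])   # vector for "day"
```

The resulting matrix is what initializes the weights of the (non-trainable or fine-tuned) embedding layer.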

3.2.2 Token and Positional Embeddings

Token embedding is a process of converting tokens into a vector representation of


n-dimensions. We used the GloVe embeddings to assign the embedding values for
the tokens.
Positional embedding enables the model to understand the position of words present in the sequence [12]. The order of words matters a lot when contextualizing information; if the relative positioning of the words changes, the context also differs. Positional embeddings are built with a maximum output sequence length, where these embeddings represent the indices of a maximum-length sequence.
Here, we used a simple approach: the relative position of each token in a sequence is expressed by adding its token embedding to the positional embedding of its index.
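The addition described above can be sketched in NumPy (random lookup tables and hypothetical token ids stand in for the trained Keras embedding layers):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, dim = 100, 10, 50

# Two lookup tables: one indexed by token id, one by position index
token_table = rng.normal(size=(vocab_size, dim))
position_table = rng.normal(size=(max_len, dim))

token_ids = np.array([7, 42, 3, 9])        # hypothetical token ids
positions = np.arange(len(token_ids))      # 0, 1, 2, 3

# Elementwise sum yields position-aware token representations
embedded = token_table[token_ids] + position_table[positions]
print(embedded.shape)  # → (4, 50)
```

Because the same token id receives a different positional vector at each index, reordering the words changes the representation the downstream layers see.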

3.2.3 Convolutional Neural Network

The output from the embedding layer is then passed to the convolutional layers. We
used two CNN layers of small filter sizes. The 1D-CNNs are capable of capturing
local patterns within the fixed-sized window. Although the small filter sizes would
not be able to capture the entire context of the sequences, they would filter out the
unnecessary noises about it, which is good because it would minimize any anomalies
that might hinder the contextual analysis process down in the layers present further
below.

3.2.4 Bidirectional Long Short-Term Memory

Bidirectional LSTM (BiLSTM) is a type of recurrent neural network widely used in natural language processing tasks. A BiLSTM consists of two LSTMs, one processing the input sequence in the forward direction and the other in the backward direction; the outputs of both layers are concatenated to produce the final output sequence. This enables it to capture long-term dependencies effectively because it contextualizes information from both the past and the future.
At this step, the BiLSTM captures most of the important long-term dependencies present in the text. This approach works here because of the short sequences present in the dataset.

3.2.5 Multihead Attention

Multihead attention mechanism was first proposed by Vaswani et al. [12]. In this
mechanism, the input is sent to multiple “heads” where each head does its own
attention mechanism that is isolated from the other heads. Each head is trained to
focus on a certain aspect of the input. The multihead attention layer takes the output of
the LSTM layer and uses a self-attention mechanism to contextualize the information.
This layer allows the model to weigh the importance of each input element based on
its relationship to other elements in the sequence.
The multihead attention mechanism involves projecting the input sequence into
separate “query,” “keys,” and “values” matrices. The dot product between the “query”
and “key” matrices determines token relevance, which is then scaled to prevent gra-
dient issues during training. The scaled dot product goes through a softmax function
to generate a probability distribution over the input sequence for each token in the
“query” matrix. This distribution weights the “value” matrix, resulting in a weighted
sum of values for each token in the “query” matrix. The weighted sums are con-
catenated across multiple attention heads and projected back to the original input
dimension, forming the output of the multihead attention layer.
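The projection–scoring–softmax–weighting chain described above reduces to a few matrix operations per head; the following single-head NumPy sketch uses illustrative shapes rather than the model's actual dimensions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # token-to-token relevance, scaled
    weights = softmax(scores, axis=-1)  # probability distribution per query
    return weights @ V, weights         # weighted sum of values

rng = np.random.default_rng(1)
seq_len, d_k = 6, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # → (6, 8)
```

In a multihead layer, each head applies this computation to its own learned Q/K/V projections, and the per-head outputs are concatenated and projected back to the input dimension.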
The concatenated outputs are globally averaged, reducing them to (50, 1) vectors.

3.3 Experimental Setup

The Twitter dataset used in this research was provided by Bioglio and Pensa [13]. It
consists of 9917 sequences. We added a column to the dataset called “class”, which
consists of labels “1” for sensitive and “0” for non-sensitive. There were a total of
6581 sequences of the “non-sensitive” class and 3336 sequences of the “sensitive”
class.
In our experiments, we used the TensorFlow implementation of Keras to build the
proposed model. We used hyperparameter tuning to find out the best hyperparame-
ters that yielded the best results. RandomizedSearchCV is a hyperparameter tuning

method that uses random sampling of a distribution of hyperparameters to find the


best set of hyperparameters for a given model. It is a randomized search over a range
of hyperparameters from a user-defined distribution and evaluates the model's performance with each sampled set of hyperparameters. For the token and positional embedding, the set of embedding sizes was {25, 50, 100, 200}. For the CNN hyperparameters, we chose filter sizes of {8, 16, 32} and kernel sizes of {3, 5} with ReLU activation. For the LSTM hyperparameters, we chose units from {16, 32, 64}; dropout layers are placed after each LSTM layer, with rates of {0.4, 0.5, 0.6, 0.7}. Two more hyperparameters belong to the multihead attention mechanism: the number of heads and the number of units of the final feed-forward neural network. The set used for the heads was {4, 6, 8} and the set used for the feed-forward network was {32, 64, 128}. Each iteration is of 15 epochs with four cross-validations.
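The sampling principle behind RandomizedSearchCV can be illustrated without training the full model (a toy sketch in which a hypothetical `score` function stands in for the cross-validated accuracy of a 15-epoch training run):

```python
import random

search_space = {                 # subsets of the grids used in this paper
    "filters": [8, 16, 32],
    "kernel_size": [3, 5],
    "lstm_units": [16, 32, 64],
    "dropout": [0.4, 0.5, 0.6, 0.7],
}

def score(params):
    # Hypothetical stand-in for training with 4-fold cross-validation
    return 0.7 + 0.001 * params["filters"] - 0.01 * params["dropout"]

random.seed(42)
best_params, best_score = None, float("-inf")
for _ in range(10):              # 10 random draws from the space
    params = {k: random.choice(v) for k, v in search_space.items()}
    s = score(params)
    if s > best_score:
        best_params, best_score = params, s

print(best_params["filters"] in search_space["filters"])  # → True
```

Unlike an exhaustive grid search, only a fixed number of random configurations is evaluated, which keeps the tuning cost bounded as the search space grows.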
After performing the hyperparameter tuning on the proposed model the ideal
hyperparameters are here as follows:
• The embedding size for token and positional embedding was 50.
• Two CNN layers with filter sizes of 8 and 16, a kernel size of 3, activated by the ReLU function.
• A bidirectional LSTM layer of 128 units, followed by a dropout rate of 0.7, and an LSTM layer of the embedding size, i.e., 50.
• As for the multihead attention mechanism hyperparameters, the number of heads
required was 8 and the number of units required for the final feed-forward network
was 64.

All experiments were conducted on Google Colab with an Intel(R) Xeon(R) CPU
operating at 2.00 GHz with 12.7 GB of system RAM, and one NVIDIA Tesla T4
GPU with 15.3 GB of GPU RAM (Fig. 2).

Fig. 2 Graph of accuracy obtained from different models



4 Results and Discussion

We performed the text classification using various different models to understand the
behavioral patterns of those models and to determine which combination of these
models would give the best possible result. Analysis of these results gave us a better
understanding to determine how these models interact with the data and interact
among themselves when stacked in a hybrid model.
TF-IDF is generally considered to be a better approach than BoW. The reason is
that BoW doesn’t take into account the order of words present in a sentence; it simply
takes the frequency of words that appear across the whole dataset without any regard to the context present in the text, whereas TF-IDF takes into account the importance of a word within a sentence and its relative frequency across the entire dataset. This approach gives more weight to words that are rare in the dataset and less weight to words that are common. TF-IDF can be effective in reducing the influence of common words and improving classification accuracy. But upon comparing
the results generated from both TF-IDF and Bag-of-Words methods, it is clear that
BoW had performed better, which means that the common words present within the
text had a larger influence.
It was observed that deep learning models that used GloVe pre-trained word
embeddings performed far better than Word2vec word embeddings because GloVe
tends to perform semantic analysis between the words to establish a meaningful
relationship between the words, whereas Word2vec embeddings capture the syntactic
relationships between the words (Table 2).
LSTM showed better performance than GRU. This might be due to LSTM containing one extra gate, which performs an additional matrix multiplication, and to its ability to detect long-term dependencies of words in a text, which also allows it to capture more complex relationships between input sequences and output labels. This indicated that LSTM is better suited for the hybrid model.
LSTM associated with GloVe embeddings performed better than CNN with GloVe
embeddings, as LSTM can capture long-term dependencies present in the text, and
have the ability to selectively remember inputs based on their relevance to the current
context, whereas CNN tries to capture feature patterns or some specific keywords.
This led to the usage of small-sized CNN filters to weed out the noises and the higher
amount of units in LSTM to identify necessary long-term dependencies.
The multihead attention mechanism alone did not improve text classification com-
pared to standard deep learning approaches, possibly because it focused more on
irrelevant content. Multihead attention performs better with larger texts, allowing it
to reassess token weights and capture the overall context. However, it struggles with
small texts due to limited content. In such cases, CNN and LSTM are beneficial,
as CNN filters out irrelevant content with its small-sized filters, while LSTM cap-
tures long-term dependencies, helping attention layers stay focused on the main con-
text. The proposed model outperformed other models, including ALBERT, despite
ALBERT requiring significantly longer training time. The proposed model com-
pleted training within minutes.

Table 2 Results obtained from different models


Model Accuracy Precision Recall F1-score
(TF-IDF)—multinomial naive Bayes 0.71 0.72 0.71 0.65
(TF-IDF)—logistic regression 0.72 0.73 0.72 0.72
Bag-of-Words—multinomial naive Bayes 0.72 0.72 0.72 0.72
Bag-of-Words—logistic regression 0.72 0.72 0.72 0.72
Multilayer perceptron 0.71 0.71 0.71 0.71
Bidirectional-LSTM (Word2vec embeddings) 0.66 0.61 0.66 0.56
CNN-1D (Word2vec embeddings) 0.66 0.62 0.66 0.55
GRU (GloVe embeddings) 0.66 0.44 0.66 0.52
Bidirectional-LSTM (GloVe embeddings) 0.74 0.73 0.74 0.73
CNN-1D (GloVe embeddings) 0.73 0.73 0.73 0.73
CNN-bidirectional-LSTM (GloVe embeddings) 0.72 0.72 0.72 0.72
Multihead attention (heads = 8, dimension = 32) 0.72 0.71 0.72 0.70
CNN-LSTM-attention (1-encoder + 1-decoder) 0.71 0.71 0.71 0.71
CNN-bidirectional-LSTM-multihead-attention 0.76 0.76 0.76 0.76
(GloVe embeddings)
ALBERT (A Lite Bidirectional Encoder 0.76 0.75 0.76 0.75
Transformer)

Fig. 3 Concatenated weights assigned by the proposed model

The presented visualization in Fig. 3 illustrates the token weights assigned by


the proposed model. Among the analyzed texts, the first three were categorized as
non-sensitive, while the remaining texts were classified as sensitive. Although the
first three texts revealed some information, they were deemed non-sensitive as they
posed no direct threat to the user. Rather, they resembled announcements or general
updates. Conversely, the next two texts were accurately classified as sensitive due to
their disclosure of potentially harmful information. If obtained by malicious actors,
this information could potentially lead to harmful actions.

5 Conclusion

This research emphasizes the significance of considering the context in determining


the sensitivity of a text, as it goes beyond the presence of specific sensitive terms
or words. The study revealed that even texts without sensitive tokens can still be
sensitive based on contextual information. This highlights the need to account for
the overall context when assessing sensitivity. Additionally, relying solely on pre-
existing lists of sensitive words or regex patterns is not always effective in capturing
the nuanced expressions of sensitivity in language.
Furthermore, the proposed model outperformed other advanced models in terms
of accuracy and training time. This indicates its effectiveness in identifying sensi-
tive texts, which holds relevance for areas like online content moderation, where
accurately detecting sensitive content is crucial.

Acknowledgements We would want to express our gratitude to Ruggero G. Pensa, Ph.D., Univer-
sity of Torino, Italy, for providing us with the dataset.

References

1. Li K, Cheng L, Teng CI (2020) Voluntary sharing and mandatory provision: private information
disclosure on social networking sites. Inf Process Manage 57(1):102128
2. Ani Petrosyan (2023) Worldwide digital population. https://www.statista.com/statistics/
617136/digital-population-worldwide/
3. Stockdale LA, Coyne SM (2020) Bored and online: reasons for using social media, problematic
social networking site use, and behavioral outcomes across the transition from adolescence to
emerging adulthood. J Adolesc 79:173–183
4. Ma Q, Song HH, Muthukrishnan S, Nucci A (2016) Joining user profiles across online social
networks: from the perspective of an adversary. In: Proceedings of the IEEE/ACM international
conference on advances in social networks analysis and mining (ASONAM), Aug 2016, pp
178–185
5. Aghasian E, Garg S, Gao L, Yu S, Montgomery J (2017) Scoring users’ privacy disclosure
across multiple online social networks. IEEE Access 5:13118–13130
6. Isaak J, Hanna MJ (2018) User data privacy: Facebook, Cambridge Analytica, and privacy
protection. Computer 51(8):56–59
7. Abouelmehdi K, Beni-Hessane A, Khaloufi H (2018) Big healthcare data: preserving security
and privacy. J Big Data 5(1):1–18
8. Geetha R, Karthika S, Kumaraguru P (2021) Tweet-scan-post: a system for analysis of sensitive
private data disclosure in online social media. Knowl Inf Syst 63:2365–2404
9. Zhou H (2022) Research of text classification based on TF-IDF and CNN-LSTM. J Phys Conf
Ser 2171(1):012021. IOP Publishing
10. Chen Y (2015) Convolutional neural network for sentence classification. Master’s thesis,
University of Waterloo
11. Pascanu R, Mikolov T, Bengio Y (2013) On the difficulty of training recurrent neural networks.
In: International conference on machine learning, May 2013, pp 1310–1318. PMLR
12. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I
(2017) Attention is all you need. In: Advances in neural information processing systems, vol
30

13. Bioglio L, Pensa RG (2022) Analysis and classification of privacy-sensitive content in social
media posts. EPJ Data Sci 11(1):12
14. Trieu LQ, Tran TN, Tran MK, Tran MT (2017) Document sensitivity classification for data
leakage prevention with twitter-based document embedding and query expansion. In: 2017
13th international conference on computational intelligence and security (CIS), Dec 2017.
IEEE, pp 537–542
15. Battaglia E, Bioglio L, Pensa RG (2020) Classification-based content sensitivity analysis. In:
CEUR workshop proceedings, vol 2646, pp 326–333. CEUR-WS.org
16. Jin X, Li Y, Mah T, Tong J (2007) Sensitive webpage classification for content advertising. In:
Proceedings of the 1st international workshop on data mining and audience intelligence for
advertising, Aug 2007, pp 28–33
17. Sánchez D, Batet M (2016) C-sanitized: a privacy model for document redaction and saniti-
zation. J Assoc Inf Sci Technol 67(1):148–163
18. Zhou H (2022) Research of text classification based on TF-IDF and CNN-LSTM. J Phys Conf
Ser 2171(1):012021. IOP Publishing
19. Zhang J, Li Y, Tian J, Li T (2018) LSTM-CNN hybrid model for text classification. In: 2018
IEEE 3rd advanced information technology, electronic and automation control conference
(IAEAC), Oct 2018. IEEE, pp 1675–1680
20. Chen X, Ouyang C, Liu Y, Luo L, Yang X (2018) A hybrid deep learning model for text clas-
sification. In: 2018 14th international conference on semantics, knowledge and grids (SKG),
Sept 2018. IEEE, pp 46–52
21. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Online Social Networks: An Efficient
Framework for Fake Profiles Detection
Using Optimizable Bagged Tree

Chanchal Kumar, Taran Singh Bharati, and Shiv Prakash

Abstract Fake profiles pose a significant challenge to the integrity and security of
online social networks (OSNs), making it imperative to develop reliable detection
techniques. Spammers invariably adjust their strategies to circumvent filters, making
it difficult to come up with impeccable design filters that can readily accommo-
date newly developed throes of fake profiles. Filters that primarily focus on textual
information have limits while dealing with non-textual forms of spam, such as the
ones based on images or audio. In this paper, we have employed a comprehensive set
of features, including profile information, network characteristics, and behavioural
patterns, to capture the distinguishing characteristics of fake profiles and conducted
experiments on a large-scale dataset of OSN profiles, comprising both genuine and
fake profiles. The Optimizable Bagged Tree algorithm allows us to obtain an optimized decision tree structure while leveraging the benefits of ensemble learning. By carefully pruning the tree's structure and trimming irrelevant branches, the proposed framework achieves better generalization and robustness. The results demonstrated
that our model outperforms traditional detection methods in terms of accuracy. More-
over, our approach exhibits high efficiency, enabling real-time detection of fake
profiles in OSNs.

Keywords Fake profiles · OSNs · Optimizable bagged tree · Machine learning classification · Anomaly detection · Feature extraction · Performance analysis

C. Kumar · T. S. Bharati
Department of Computer Science, Jamia Millia Islamia, New Delhi, India
S. Prakash (B)
Department of Electronics and Communication, University of Allahabad, Prayagraj, India
e-mail: shivprakash@allduniv.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 255
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_22

1 Introduction

In this technology-driven era, the way people are connecting with others is changing
at an exponential rate. The emergence of new cellular technology and computa-
tional paradigms has changed human life by connecting them across the planet. The
first researcher was an anthropologist John Barnes, who researched social networks.
The online social networking era started with the launch of SixDegree.com in 1997
by Andrew Weinrich [1]. In the twenty-first century, OSNs like Facebook, Twitter,
LinkedIn, etc. are empowering connectivity across the globe. As per the Statista report 2022 [2], there are 1.7B (billion) active monthly users across social media platforms. Figure 1 depicts the ranking of various OSN platforms based on their monthly number of active users. The report ranked Facebook as the most popular online social network site with a user base of 2910 million, as shown in Fig. 1.
This massive increase is driven by numerous benefits users enjoy, like simple contact
with others, actively participating in governmental and political affairs, and exploring
jobs, advertising, information dissemination, or emotional support. The adoption of social networks was further accelerated by the social isolation caused by the recent COVID-19 pandemic.
Today the OSN market is so large that up to five new profiles, spam or benign, are added to Facebook every second [3]. Facebook removed 1.6 billion (B) bogus accounts in the first quarter of 2022, and 1.7B in the preceding quarter; a record of over 2.2B bogus accounts was deleted by the social media site in the first quarter of 2019. As per Meta's definition, fake accounts are profiles that are intentionally created with malicious intent or to impersonate non-human entities such as companies, groups, or organizations [4]. Typically, spam, rumours, or unethical remarks are uploaded using such bogus accounts or spam profiles. Numerous studies have been conducted on how to recognize spam messages [5–8], yet identifying false profiles on OSN websites remains a significant problem. Information such as the profile photo, sex, name, native place, and other personal details on the network that

[Bar chart: 2022 ranking of global social networks]

Fig. 1 Number of active users across various social networks (in millions)



is readily accessible can be harvested and used to construct fake profiles. As a result, false information is spread to the victims' social media contacts and friends. In the real world, this has the potential to do significant harm to people, businesses, and other entities, as fake accounts and profiles may spread spam, false web reviews, and false news.
Traditionally, OSN operators devote a fixed amount of resources to identifying, manually verifying, and deleting fraudulent profiles [9]. Facebook faces greater threats than other social networks due to its large user base and restricted functionality; attackers also target Twitter and LinkedIn. According to the Sophos security threat report survey [10]:
• 67% of OSN users said they were spammed, and the amount had increased over two years.
• 57% of OSN users said they had been hit by phishing attacks, and the number of users hit by phishing had grown by more than 100%.
The remainder of the paper is organized as follows. Section 2 provides background information and motivation. In Sect. 3, a comprehensive survey of relevant research is presented. The proposed framework is detailed in Sect. 4. Section 5 offers the results analysis and subsequent discussion. Lastly, Sect. 6 presents the conclusion and outlines directions for future research.

2 Background and Motivation

Social networking services such as Twitter, Facebook, and Google+ are widely used in the virtual world. In 2021, 22 billion records were exposed through data breaches, and the U.S. was the target of 46% of cyberattacks in 2020, more than any other nation. A full 40% of people on earth are not connected to the Internet, making them easy targets for cyberattacks if and when they do connect [11]. As a result, we must fully comprehend the situation and explore how to build powerful protections in order to create a trustworthy cyberspace. Although numerous papers have surveyed online social network (OSN) attacks [12–17], these surveys are limited in their discussion of AI techniques and do not cover the broad range of OSN defence strategies, such as prevention, detection, and reaction (or mitigation). The key contribution of this paper is an investigation of the diverse security threats and of the different approaches and models that have been proposed. This background highlights the ever-increasing use of social networking services, the growing market for information security, and the high number of data breaches and cyberattacks, particularly in the US, emphasizing the need for strong protections in cyberspace.

3 Related Work

Fake profile detection in OSNs is a topic of extensive research due to its increasing relevance in various domains, and it remains an important and challenging task. The pervasive use of social media (SM) platforms has led to a rise in the creation of fake profiles, often with malicious intent. Detecting and mitigating these fake profiles is crucial to maintaining the integrity and reliability of online communities. Researchers have proposed various methods for fake profile detection, including ML-based, rule-based, and graph-based approaches. The ability to learn automatically from data has made ML-based approaches particularly popular.
In this study [18], the authors examined existing works that have applied machine learning (ML) to the problem of Twitter spam accounts. The study then provides an empirical investigation evaluating several ML models on a publicly available dataset; the models included individual, ensemble, and majority-voting approaches. Compared to basic ML models, ensemble ML models were shown to have the potential to boost prediction accuracy in the detection of Twitter spam accounts. In their analysis of an unbalanced dataset, the authors found the Random Forest (RF) model to be the most effective at identifying Twitter spam accounts.
This study [19] analysed online Twitter posts to identify characteristics that may indicate suicidal intent among users. ML and natural language processing methods were employed for training and for evaluating the effectiveness of the suggested system. Linguistic, topic, temporal sentiment, and statistical characteristics were extracted and merged, and Logistic Regression (LR) classifiers achieved an accuracy of 87%. According to the study, the right selection and combination of features contributes to improved performance.
In this study [20], the authors observed that SM sites like WhatsApp, Facebook, and Twitter have become principal centres for con artists, as these platforms enable their harmful operations. Such scams are perpetrated by criminals using fake accounts to conceal their genuine identities, so a technology is needed that distinguishes false accounts from genuine ones. The research implements and compares the performance of ML models such as ANN, Stochastic Gradient Descent (SGD), Multi-Layer Perceptron, AdaBoost, and RF in distinguishing between false and genuine profiles. After comparing the outcomes of the different algorithms under different parameters, the RF method was determined to be the most effective.
In this study [21], the authors described Twitter's fundamental data model along with sampling and data-access best practices. The overview also lays the groundwork for computational methods employed in these fields, including natural language processing, graph sampling, and ML. Going beyond previous reviews and comparative research, they describe the important results and current status of these methodologies. Ultimately, they believe this survey will aid the development of a coherent conceptual model of Twitter and serve as a guide for expanding upon the issues covered.

This study [22] discussed several approaches for analysing tweets and classifying them as either spam or ham based on the terms they contain. The NB classifier is utilized even though many other DL and ML approaches, such as support vector machines (SVM), clustering approaches, and binary detection models, exist for this purpose. Accessing or browsing irrelevant spam messages or tweets exposes Twitter users to malware that steals personal information. Beyond data-stealing malware, fake trends have also been an issue, and the situation must be managed. Spammers may engage even more users now that they can automatically follow new accounts they find interesting.
In this work [23], the researchers suggested that phony Instagram accounts may be identified automatically with a fake profile detector, protecting Instagram users' social lives. Supervised ML algorithms assist in the detection of fake Instagram accounts. Once identified, the IDs of bogus accounts are catalogued in a data dictionary that may be used to assist relevant authorities in taking action against spammers. The classification methods used to train on the dataset were compared experimentally.
This study [24] examined mental illness through the language individuals use to express themselves on Reddit and Twitter, two popular social media sites. The objective was to establish an empirical model for detecting and diagnosing serious mental illnesses. Using text cleaning and Word2Vec language modelling, the authors produced numerical features from the text, which allowed them to classify posts and users with high accuracy using an SVM trained on the dataset they created. They achieved 95% accuracy with Twitter users and 73% accuracy on the Reddit challenge.
Several studies have also proposed hybrid approaches that combine different techniques, such as combining ML algorithms with rule-based or graph-based methods. Recent research has also examined the use of new kinds of data, such as text from users' profiles or images from profile pictures, to enhance fake profile detection. Overall, the recent literature suggests that fake profile detection in OSNs is still an active research area with no comprehensive solution; developing potent detection methods that keep up with the evolving tactics of adversaries remains a key challenge. While several approaches have been proposed for detecting fake profiles in OSNs, there is a research gap in terms of developing an efficient model that uses an Optimizable Bagged Tree (OBT).

4 Proposed Model

Detecting fake profiles is a challenging task as adversaries use various tactics to evade
detection systems, such as using automated tools to create fake accounts or stealing
the identity of existing users. Therefore, the problem of fake profile detection in OSN
requires the development of effective and robust techniques to identify fake profiles
more accurately. The problem of fake profile detection can be formulated mathematically as a binary classification problem.

Given a set of profiles (each either genuine or fake) P = {p1, p2, …, pm}, where each profile pi is represented as a feature vector Xi = (Xi1, Xi2, Xi3, …, Xin), and a set of labels Y = {0, 1} representing the authenticity of the profile, the goal is to learn a classifier f that accurately predicts the label y for a given profile xi.
Each component of the feature vector X = (X1, X2, X3, …, Xn) characterizes the profile. The set of genuine profiles is denoted G and the set of fake profiles F. The target is to train a classifier f that maps a feature vector x to a binary label y, where y = 1 if the profile is genuine and y = 0 if the profile is fake.
Formally, the problem is to find a function f: X → Y, where X is the feature space and Y is the label space. The function f can be learned from a labelled training set D = {(x1, y1), (x2, y2), …, (xn, yn)}, where xi ∈ X and yi ∈ Y for i = 1, 2, …, n. The learned function f can then be used to predict the labels of new, unseen profiles.
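To make the formulation concrete, the mapping f: X → Y can be sketched in Python. The feature names and values below are hypothetical, and the 1-nearest-neighbour rule merely stands in for a real classifier; it is not the paper's Bagged Trees model.

```python
# Hypothetical feature vectors x = (friends_count, posts_per_day, bio_length),
# with labels y = 1 (genuine) and y = 0 (fake). All values are illustrative only.
D = [
    ((250, 2.0, 120), 1), ((400, 1.5, 80), 1), ((310, 3.0, 200), 1),
    ((12, 45.0, 5), 0), ((8, 60.0, 0), 0), ((20, 30.0, 10), 0),
]

def f(x, training_set=D):
    """A minimal learned classifier f: X -> Y (1-nearest neighbour):
    return the label of the closest training profile."""
    def dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(training_set, key=lambda sample: dist(sample[0], x))[1]
```

With these made-up profiles, f((350, 2.5, 100)) returns 1 (closest to a genuine profile) and f((10, 50.0, 3)) returns 0 (closest to a fake one).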
The proposed Optimized Bagged Trees algorithm is one approach that can be used to learn the classifier f. It builds multiple decision tree classifiers on different subsets of the training data and aggregates their predictions to obtain a more robust and accurate classification result. To use Bagged Trees for fake profile detection in OSNs, the first step is to collect a dataset of labelled profiles, where each profile is labelled as either genuine or fake. The next step is to pre-process the data by extracting relevant features, such as the number of friends, the frequency of posts, and the length of the profile description, and to split it into training and testing sets. The individual models can be created using various decision tree algorithms, such as C4.5, ID3, or CART, and their predictions are aggregated to obtain a final classification result, either by taking the majority vote of the predictions or by using weighted voting based on the accuracy of each model. Bagging has been used in various studies and has achieved high accuracy rates in detecting fake profiles on different social networking platforms. However, like any ML technique, the model must be continuously updated and refined to adapt to the evolving nature of fake profiles and their tactics.
The Bagged Trees algorithm is a combination of multiple decision trees trained on different subsets of the training data, with each tree making an independent prediction. The Optimized Bagged Trees algorithm can be mathematically formulated as follows:
1. Given a dataset of N samples {(X1, Y1), (X2, Y2), …, (XN, YN)}, where Xi is the feature vector of the ith sample and Yi its class label.
2. Split the dataset into K subsets {(X11, Y11), (X12, Y12), …, (X1k, Y1k)}, {(X21, Y21), (X22, Y22), …, (X2k, Y2k)}, …, {(Xk1, Yk1), (Xk2, Yk2), …, (Xkk, Ykk)} using random sampling with replacement.
3. For each subset, train a decision tree T using the CART algorithm or any other decision tree algorithm, where T maps the feature vector x to its corresponding class label y.
4. Predict the class label of a new sample x by aggregating the predictions of all K decision trees using a majority voting or weighted voting scheme.

5. Compute the accuracy of the Bagged Trees algorithm using a suitable evaluation
metric such as Sensitivity or Recall, Specificity, Accuracy, Precision, Error rate,
and F1 score.
6. The Bagged Trees algorithm can be further improved by tuning the hyperparam-
eters such as the number of decision trees K and the maximum depth of each
decision tree.
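The formulation above can be sketched in Python. This is a toy illustration, not the authors' implementation: single-split decision stumps stand in for full CART trees, and the two profile features (number of friends, posts per day) and all sample values are hypothetical.

```python
import random
from collections import Counter

def train_stump(samples):
    """Train a one-level tree (stump): the single feature/threshold split
    that minimises training error, with a majority label on each side."""
    best = None  # (error, feature, threshold, left label, right label)
    for j in range(len(samples[0][0])):
        for t in sorted({x[j] for x, _ in samples}):
            left = [y for x, y in samples if x[j] <= t]
            right = [y for x, y in samples if x[j] > t]
            if not left or not right:
                continue
            yl = Counter(left).most_common(1)[0][0]
            yr = Counter(right).most_common(1)[0][0]
            err = sum(y != yl for y in left) + sum(y != yr for y in right)
            if best is None or err < best[0]:
                best = (err, j, t, yl, yr)
    if best is None:  # degenerate bootstrap sample: fall back to majority class
        majority = Counter(y for _, y in samples).most_common(1)[0][0]
        return lambda x: majority
    _, j, t, yl, yr = best
    return lambda x: yl if x[j] <= t else yr

def bagged_fit(samples, n_trees=25, seed=0):
    """Steps 1-3: draw bootstrap subsets (random sampling with replacement)
    and train one tree per subset."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(samples) for _ in samples])
            for _ in range(n_trees)]

def bagged_predict(trees, x):
    """Step 4: aggregate the K independent predictions by majority vote."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

# Hypothetical profiles: (number of friends, posts per day), 0 = fake, 1 = genuine.
profiles = [((5, 40), 0), ((8, 35), 0), ((3, 50), 0), ((12, 44), 0),
            ((300, 2), 1), ((450, 1), 1), ((280, 3), 1), ((390, 2), 1)]
model = bagged_fit(profiles)
```

With these samples, bagged_predict(model, (4, 45)) votes the profile fake (0) and bagged_predict(model, (400, 2)) genuine (1). A weighted-voting variant would instead scale each tree's vote by its estimated accuracy.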
Optimized Bagged Trees Algorithm
1. Start
2. Collect dataset of social network profiles
3. Pre-process the data (remove duplicates, missing values, etc.)
4. Extract features from the profiles (e.g. profile picture, number of friends, post
frequency, etc.)
5. Divide the data into two distinct subsets: the training set and the testing set.
6. Train different DT classifiers on different training data subsets (Bagging)
7. Estimate the performance of each DT classifier on the testing set
8. Aggregate the outcomes of the DT classifiers to make a final classification
(majority voting)
9. Evaluate the performance of the Bagged Trees algorithm on the testing set
10. If the performance is satisfactory, use the Optimized Bagged Trees model; otherwise, adjust the parameters and repeat steps 6–8 with different subsets
11. End.
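Step 10's adjust-and-retry loop is essentially a hyperparameter search. The self-contained sketch below tunes the number of trees K on a held-out validation set; the data are hypothetical, and each weak learner is a deliberately crude random stump (random feature, random threshold) so that the tuning loop stays the focus. A real implementation would tune full decision trees and more parameters, such as the maximum depth.

```python
import random
from collections import Counter

def fit_random_stump(samples, rng):
    """A crude weak tree: random feature, random threshold drawn from the
    data, majority label on each side of the split."""
    j = rng.randrange(len(samples[0][0]))
    t = rng.choice(samples)[0][j]
    left = [y for x, y in samples if x[j] <= t] or [y for _, y in samples]
    right = [y for x, y in samples if x[j] > t] or [y for _, y in samples]
    yl = Counter(left).most_common(1)[0][0]
    yr = Counter(right).most_common(1)[0][0]
    return lambda x: yl if x[j] <= t else yr

def fit_bag(samples, k, seed=0):
    """Bagging: k weak trees, each trained on a bootstrap resample."""
    rng = random.Random(seed)
    return [fit_random_stump([rng.choice(samples) for _ in samples], rng)
            for _ in range(k)]

def accuracy(bag, data):
    """Fraction of profiles whose majority-vote label matches the truth."""
    vote = lambda x: Counter(tree(x) for tree in bag).most_common(1)[0][0]
    return sum(vote(x) == y for x, y in data) / len(data)

# Hypothetical labelled profiles (friends, posts per day), 0 = fake, 1 = genuine.
train = [((5, 40), 0), ((8, 35), 0), ((3, 50), 0), ((300, 2), 1),
         ((450, 1), 1), ((280, 3), 1)]
valid = [((6, 42), 0), ((320, 2), 1)]

# Steps 6-10: try candidate parameter values, keep the best on the held-out set.
best_k, best_acc = None, -1.0
for k in (5, 25, 101):
    acc = accuracy(fit_bag(train, k), valid)
    if acc > best_acc:
        best_k, best_acc = k, acc
```

The same loop generalizes to any parameter grid (tree depth, subset size, voting scheme): train, score on the validation set, keep the best configuration.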

5 Analysis and Discussion of Results

Complexity of Proposed Algorithm


The time and space complexities of the efficient model for detecting fake profiles in OSNs using the Optimizable Bagged Tree algorithm depend on several factors.
Time Complexity: In general, the time complexity of the Bagging algorithm is O(B · T), where B is the number of iterations (bags) and T is the time complexity of building a single decision tree. The complexity of the tree-building process varies with the specific algorithm used, such as C4.5 or CART, and typically ranges from O(N · M · log M) to O(N · M²), where N is the number of samples and M is the number of features.
Space Complexity: The space complexity of the Bagging algorithm is primarily determined by the storage of multiple decision trees and their associated data structures. The space complexity of decision tree algorithms can vary, but it is typically O(M · D), where M is the number of features and D is the maximum depth of the decision trees.

Table 1 Results comparison of models for fake profile detection

Models                       Accuracy (in %)
Optimizable bagged tree      99.9
Bagged decision trees        94.5
Bagged random forest         98.3
Bagged AdaBoost              97.2
Bagged C4.5                  97.4

This section provides a deep analysis of the data, draws meaningful conclusions, and discusses the outcomes. The accuracy of the proposed model is obtained through simulation and can be calculated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where True Positive (TP) is the number of correctly identified fake profiles, True Negative (TN) is the number of correctly identified genuine profiles, and False Positive (FP) and False Negative (FN) count the corresponding misclassifications (Table 1).
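The metrics listed in step 5 of the formulation (accuracy, precision, recall or sensitivity, specificity, error rate, and F1 score) all derive from the same four confusion-matrix counts. A small helper, treating fake profiles as the positive class as in the definition above:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard evaluation metrics from confusion-matrix counts
    (tp = fakes correctly flagged, tn = genuine profiles correctly passed)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "error_rate": 1 - accuracy, "f1": f1}
```

For example, with hypothetical counts confusion_metrics(90, 85, 10, 15) gives an accuracy of (90 + 85) / 200 = 0.875 and an error rate of 0.125.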
Table 1 presents the accuracy results (in percentage) of the different models for detecting fake profiles. The proposed Optimizable Bagged Tree model achieves the highest accuracy on the validation dataset at 99.9%, while the Bagged Decision Trees, Bagged Random Forest, Bagged AdaBoost, and Bagged C4.5 models show respectable but lower accuracies, ranging from 94.5 to 98.3%. The simulation was conducted on a PC running Windows 10 using Python on Google Colab. These results highlight the superiority of the Optimizable Bagged Tree model in accurately distinguishing genuine from fake profiles in OSNs, making it a promising approach for combating the issue of fake profiles.

6 Conclusion and Open Issues

Fake profile detection in OSNs is an emerging area of research. The application of advanced ML techniques has enhanced the accuracy and efficiency of fake profile detection; such methods can exploit large-scale data and complex patterns to recognise hidden indicators of fraudulent activity. In conclusion, the Optimizable Bagged Tree model demonstrated superior performance in detecting fake profiles in OSNs. With an accuracy of 99.9%, it outperforms Bagged Decision Trees (94.5%), Bagged Random Forest (98.3%), Bagged AdaBoost (97.2%), and Bagged C4.5 (97.4%). These results highlight the effectiveness and reliability of the Optimizable Bagged Tree model for identifying fake profiles, making it an appealing approach for enhancing the security and trustworthiness of OSNs. The algorithm leverages the benefits of ensemble learning and optimization techniques, improving the model's accuracy and generalization. By considering profile information, network characteristics, and behavioural patterns, our model captures the distinguishing characteristics of fake profiles effectively.
Future work may involve further refinement of the model by incorporating additional features or exploring other optimization algorithms. Additionally, researchers may investigate the model's performance on different OSN platforms and evaluate its robustness against evolving fake profile generation techniques.

References

1. Can U, Alatas B (2019) A new direction in social network analysis: online social network
analysis problems and applications. Phys A Stat Mech Appl 535:122372. https://doi.org/10.
1016/j.physa.2019.122372
2. TheStatisticsPortal (2022) Biggest social media platforms 2022|Statista. https://www.statista.
com/statistics/272014/global-social-networks-ranked-by-number-of-users/. Accessed 19 Aug
2022

3. Kaur R, Singh S, Kumar H (2018) Rise of spam and compromised accounts in online social
networks: a state-of-the-art review of different combating approaches. J Netw Comput Appl
112:53–88
4. TheStatisticsPortal (2022) Facebook fake account deletion per quarter 2022|Statista. https://
www.statista.com/statistics/1013474/facebook-fake-account-removal-quarter/. Accessed 19
Aug 2022
5. Karunakar E, Durga V, Pavani R et al (2022) Ensemble fake profile detection using machine
learning (ML). www.joics.org. Accessed 22 Aug 2022
6. Azab AE, Idrees AM, Mahmoud MA et al (2016) Fake account detection in Twitter based on
minimum weighted feature set 10:13–18
7. Savyan PV, Bhanu SMS (2018) Behaviour profiling of reactions in facebook posts for anomaly
detection. In: Proceedings of the 2017 9th international conference on advanced computing,
ICoAC 2017, pp 220–226
8. Egele M, Stringhini G, Kruegel C et al (2017) Towards detecting compromised accounts on
social networks. IEEE Trans Depend Secure Comput 14:9616. https://doi.org/10.1109/TDSC.
2015.2479616
9. Wanda P, Jie HJ (2020) DeepProfile: finding fake profile in online social network using dynamic
CNN. J Inform Sec Appl 52:102465
10. Sahoo SR, Gupta BB (2019) Classification of various attacks and their defence mechanism in
online social networks: a survey. Enterp Inform Syst 13:832–864
11. 166 Cybersecurity statistics and trends [updated 2022]. https://www.varonis.com/blog/cybersecurity-statistics. Accessed 22 Aug 2022
12. Kayes I, Iamnitchi A (2017) Privacy and security in online social networks: a survey. Online
Soc Netw Media 3–4:1–21
13. Kumar S, Shah N (2018) False information on web and social media: a survey. https://doi.org/
10.48550/arxiv.1804.08559
14. Rathore S, Sharma PK, Loia V et al (2017) Social network security: issues, challenges, threats,
and solutions. Inform Sci 421:43–69
15. Guo Z, Cho JH, Chen IR et al (2021) Online social deception and its countermeasures: a survey.
IEEE Access 9:1770–1806
16. Rahman MS, Reza H (2022) A systematic review towards big data analytics in social media.
Big Data Mining Anal 5:228–244
17. Roy PK, Chahar S (2021) Fake profile detection on social networking websites: a comprehen-
sive review. IEEE Trans Artif Intell 1:271–285
18. Alsunaidi SJ, Alraddadi RT, Aljamaan H (2022) Twitter spam accounts detection using machine
learning models. In: Proceedings of the 2022 14th international conference on computational
intelligence and communication networks (CICN), pp 525–531
19. Chatterjee M, Samanta P, Kumar P et al (2022) Suicide ideation detection using multiple
feature analysis from Twitter data. In: Proceedings of the 2022 IEEE Delhi section conference
(DELCON), pp 1–6
20. Anklesaria K, Desai Z, Kulkarni V et al (2021) A survey on machine learning algorithms for
detecting fake instagram accounts. In: Proceedings of the 2021 3rd international conference
on advances in computing, communication control and networking (ICAC3N), pp 141–144
21. Antonakaki D, Fragopoulou P, Ioannidis S (2021) A survey of Twitter research: data model,
graph structure, sentiment analysis and attacks. Exp Syst Appl 164:114006. https://doi.org/10.
1016/j.eswa.2020.114006
22. Santoshi KU, Bhavya SS, Sri YB et al (2021) Twitter spam detection using Naïve Bayes classi-
fier. In: Proceedings of the 6th international conference on inventive computation technologies,
ICICT 2021. https://doi.org/10.1109/ICICT50816.2021.9358579
23. Harris P, Gojal J, Chitra R et al (2021) Fake instagram profile identification and classification
using machine learning. In: Proceedings of the 2021 2nd global conference for advancement
in technology, GCAT 2021. https://doi.org/10.1109/GCAT52182.2021.9587858.
24. Hemmatirad K, Bagherzadeh H, Fazl-Ersi E et al (2020) Detection of mental illness risk on
social media through multi-level SVMs. In: Proceedings of the 8th Iranian joint congress on
fuzzy and intelligent systems, CFIS 2020. https://doi.org/10.1109/CFIS49607.2020.9238692
Exploring Risk Factors for
Cardiovascular Disease: Insights from
NHANES Database Analysis

Gaurav Parashar , Alka Chaudhary, and Dilkeshwar Pandey

Abstract The primary factors contributing to mortality in the human population encompass a range of interrelated medical conditions, such as cardiovascular disease (CVD) and associated comorbidities. The main aim of the study is to investigate major factors in developing CVD. In the study, we have used the National Health and
major factors in developing CVD. In the study, we have used National Health and
Nutrition Examination Survey (NHANES) dataset to gather data between the years
2007 and 2012. A machine learning (ML) ensemble approach was utilized to analyze data about patients' demographics, lifestyle characteristics, physical examination, and laboratory test results, achieving an accuracy of 91.49%.
Furthermore, research has demonstrated a strong correlation between CVD and fac-
tors like BMI, BP, HDL, LDL, basophils, WBC, RBC, hemoglobin, and monocytes.
The implications of these findings suggest that medical practitioners consider the
identified risk factors while assessing a patient's risk of CVD and implementing preventative measures. Further investigation is required to validate these findings and
identify additional predictors and biomarkers associated with CVD.

Keywords Cardiovascular disease · Machine learning · Ensemble learning · NHANES

G. Parashar (B) · A. Chaudhary
AMITY Institute of Information Technology, AMITY University, Noida, UP, India
e-mail: gauravparashar24@gmail.com
A. Chaudhary
e-mail: achaudhary4@amity.edu
D. Pandey
Department of Computer Science and Engineering, KIET Group of Institutions,
Ghaziabad, UP, India
e-mail: dilkeshwar.pandey@kiet.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 265
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_23

1 Introduction

According to the WHO [1], CVD has been identified as a major cause of mortality and related disorders worldwide, accounting for over 17.9 million fatalities annually, and it imposes significant financial and health burdens globally. The term "CVD" refers to a large group of related disorders of the heart and blood vessels, including coronary heart disease (CHD), cerebrovascular disease, peripheral arterial disease, rheumatic heart disease, congenital heart disease, deep vein thrombosis, and pulmonary embolism [2]. Multiple variables, including gender, age, family history of illness, physical activity, and pre-existing medical conditions [3, 4], affect the chance of developing CVD.
CVD is a collection of diseases impacting the heart, capillaries, and blood vessels; a patient with CVD may or may not have symptoms. The illness can cause an irregular heartbeat and narrow the blood vessels in the heart or other parts of the body. Symptoms and risk factors include a family history of heart disease, sugar intake, binge drinking, high cholesterol, fat, use of tobacco products, leg wounds that take a long time to heal, red skin on the legs, numbness of the face, trouble seeing, walking, or talking, and exhaustion.
To find risk factors for detecting CVD and other heart-related disorders, several
studies have been conducted. Here are a few instances of research [5] that looked at
several aspects that affect the onset of CVD:

• The MONICA initiative: The World Health Organization (WHO) conducted the largest-scale investigation to date to assess the existence of risk parameters for both fatal and nonfatal CHD [6]. Approximately 15 million men and women between the ages of 25 and 64 were investigated during the study.
• The INTERHEART research: This very large, international case-control study, including more than 27,000 individuals from 52 countries, revealed many key risk factors for myocardial infarction. Nine risk variables were identified, including alcohol consumption, smoking, diabetes, apolipoprotein B and apolipoprotein A-I, hypertension, and the psychosocial index [7].
• The MESA research explored how risk variables such as hypertension, dyslipi-
demia, diabetes, atherosclerosis, inflammation, and psychosocial factors affected
the likelihood that CVD would develop in the 6814 individuals of this observa-
tional cohort study [8].
• The Framingham Heart Study, a longitudinal study including 5209 people, found
several important CVD risk factors, including diabetes, obesity, smoking, physical
inactivity, elevated cholesterol, and high blood pressure [9].
• The PURE trial examined the impact of food on the onset of CVD in 139,506
participants from 17 low-, middle-, and high-income nations [10].
Exploring Risk Factors for Cardiovascular Disease … 267

2 Literature Review

Numerous scholars have attempted to forecast CVD using artificial intelligence (AI) methodologies. Several research gaps motivated our work,
in areas such as genetic factors, environmental factors, the role of inflammation in CVD, and the
development of new treatments. Our study focuses on the role of early detection
and intervention in CVD prevention: early detection and intervention can help to
prevent CVD, yet effective ways to screen for CVD in asymptomatic individuals are still lacking.
To predict the risk of developing CVD, Pal et al. [11] used machine learning (ML) classifiers, particularly the multi-layer perceptron (MLP) and K-nearest neighbor (K-NN) algorithms. Using the UCI repository,
they reported an accuracy of 82.47% and an area under the curve (AUC) value
of 86.41%. In a separate investigation [12], the researchers employed a medical feature selection technique to eliminate extraneous features from the dataset,
thereby improving the efficacy of tenfold cross-validation with
naive Bayes (NB), support vector machine (SVM), and an instance-based learner (IBK);
this resulted in an accuracy rate of 83.83%. Dutta et al. [13] deployed a
convolutional neural network (CNN) to classify individuals with cardiac
ailments, achieving a 77% accuracy rate.
In preventive cardiology, patients at high risk of CVD may be identified early. Alaa et al. proposed AutoPrognosis, a tool that improves ensemble
model performance; it produced an AUC-ROC of 0.774 (95% confidence interval:
0.768–0.780) [14]. In a comparison of several models, Verma and colleagues found
that C4.5 with 26 features and 335 rows outperformed the fuzzy unordered rule induction algorithm (FURIA), multi-layer perceptron (MLP), multinomial logistic regression (MLR), and other models, with the highest prediction accuracy of 88.4%
[15]. With the aid of C4.5 trees, El-Bialy et al. [16] reached
an accuracy of 78.06%. IoT technologies are also essential to the healthcare industry: using health monitoring systems with ML, Kaur and colleagues [17] applied
MLP, k-NN, SVM, random forest (RF), and decision trees (DT), reaching 57.37%
accuracy with SVM. Klados et al. [18] achieved their best results using SVM with 6 variables of the NHANES dataset, related to physical functioning, medical conditions, and cardiovascular fitness, obtaining 93.6% accuracy and 86.4% precision. The major limitation of that
study was that the model performed best on only 6 variables; adding more features
can decrease the performance, as demonstrated in other cases. Hasan and Hasan [19]
aimed to analyze and identify potential risk factors using logistic regression, addressing
challenges such as class imbalance and outliers. The results indicated that
age, blood-related diabetes, cholesterol levels, and BMI were the most
significant risk factors for diabetes, and the random forest classification
method achieved the highest accuracy score of 0.90 in predicting diabetes.
268 G. Parashar et al.

3 Methodology

In this section, we propose a research methodology to explore risk factors for CVD
using the NHANES database. We define research objectives, collect data, prepare
data for the experiment, prepare the design of the study, identify important features,
provide statistical analysis, interpret the outcome, and conclude. The research objec-
tives for the study are (1) to identify the risk factors for CVD and (2) to identify the
most important features for early detection of CVD.
A series of health and nutrition surveys is carried out yearly by the Centers for
Disease Control and Prevention (CDC) through the National Center for
Health Statistics (NCHS). To assess the health and nutritional status of the American
population, the CDC runs the NHANES [20] survey. NHANES is a comprehensive
nationwide study that employs a combination of interviews, physical assessments,
and laboratory analyses to gather data on a statistically representative cross-section
of the American populace. The survey comprises both a face-to-face interview and a
physical examination that is carried out in a mobile examination facility. The phys-
ical assessment encompasses anthropometric measurements of stature, body mass,
sphygmomanometric evaluation of arterial pressure, and diverse clinical analyses,
including haematology and urinalysis. Researchers, policymakers, and health profes-
sionals utilize NHANES data to track health and nutrition patterns, detect inequalities
in health outcomes, and provide guidance for public health policies and initiatives
[21].
The process of model development was partitioned into four distinct phases,
namely dataset preparation, model development, model evaluation, and comparison
of model performance with other existing models (refer Fig. 1).

Fig. 1 Model development process. We divided the process into four phases, viz. dataset prepara-
tion, data modeling, model development, model performance comparison

3.1 Dataset Preparation

The dataset curation process involved scrutinizing the MCQ file to identify all
attributes or columns that contained indicators of CVD. CVD was assessed through
a medical condition questionnaire (MCQ) that included inquiries such as “Have
you ever been diagnosed with congestive heart failure?” (MCQ160B), “Have you
ever been diagnosed with coronary heart disease (CHD)?” (MCQ160C), “Have you
ever been diagnosed with angina/angina pectoris?” (MCQ160D), and “Have you
ever been diagnosed with a heart attack?” (MCQ160E). Participants who responded
affirmatively to any of these questions were classified as individuals with CVD.
A sample of 1509 adults was drawn from a population of 30,862 participants,
each classified as either having CVD or not having CVD.
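The labeling step above can be sketched with pandas. This is an illustrative sketch, not the authors' code; the four-participant frame is hypothetical, and it assumes the usual NHANES questionnaire coding of 1 = "Yes" and 2 = "No".

```python
import pandas as pd

# Hypothetical mini-frame standing in for the NHANES MCQ file.
# NHANES coding assumption: 1 = "Yes", 2 = "No".
mcq = pd.DataFrame({
    "SEQN":    [41475, 41476, 41477, 41478],
    "MCQ160B": [2, 1, 2, 2],   # congestive heart failure
    "MCQ160C": [2, 2, 1, 2],   # coronary heart disease
    "MCQ160D": [2, 2, 2, 2],   # angina/angina pectoris
    "MCQ160E": [2, 2, 2, 1],   # heart attack
})

cvd_cols = ["MCQ160B", "MCQ160C", "MCQ160D", "MCQ160E"]
# A participant is labeled CVD if they answered "Yes" (code 1) to any question.
mcq["Disease"] = (mcq[cvd_cols] == 1).any(axis=1).map({True: "CVD", False: "NO CVD"})
print(mcq[["SEQN", "Disease"]])
```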

3.2 Study Design and Statistical Analysis

For the study, we considered a cross-sectional analysis based on data collected from 2007 to 2012. To examine associations among the variables present
in the dataset, we performed association analysis using a heat map (refer Fig. 2). A positive correlation was observed between cuff size (BPACSZ), age
(RIDAGEYR), body mass index (BMXBMI), blood pressure (PEASCTM1), and
the disease status (Disease CVD). All statistical analyses were performed using RapidMiner Studio
10.0, Python 3.10.9, and other Python data processing libraries. The dataset was
cleaned and preprocessed using Python and RapidMiner Studio. The NHANES dataset
contained multiple files, and for the study we included the demographics, examination,
laboratory, and questionnaire files only. The 2007–2008, 2009–2010, and
2011–2012 survey cycles were selected as they covered all the parameters required
for predicting risk factors for CVD. We selected a cohort of participants who were
aged 30 years or more and had a CVD history; a total of 1509 adults aged above
30 years were selected from the dataset. After merging all the files (based on SEQN),
a total of 64 features were collected. The merged dataset was cleaned for missing
values, highly correlated columns, mostly empty columns, and duplicates, and normalization was
performed to fill in empty rows. After data cleaning, the dataset was reduced to 44
features, which are listed in Table 1; correlations between the features were evaluated using a correlation matrix that
is displayed as a heat map in Fig. 2.
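The file merging and cleaning described above can be sketched with pandas. The tiny frames below are hypothetical stand-ins for the NHANES files (which are distributed as .XPT files and would be read with pd.read_sas); the real pipeline also imputed values, whereas this sketch simply drops incomplete rows.

```python
import pandas as pd

# Illustrative stand-ins for the demographics, examination, and laboratory files.
demo = pd.DataFrame({"SEQN": [1, 2, 3], "RIDAGEYR": [45, 62, 38], "RIAGENDR": [1, 2, 2]})
exam = pd.DataFrame({"SEQN": [1, 2, 3], "BMXBMI": [28.0, None, 31.2]})
lab  = pd.DataFrame({"SEQN": [1, 2, 3], "LBDLDL": [105.0, 140.0, None]})

# Merge all files on the participant identifier SEQN.
merged = demo.merge(exam, on="SEQN").merge(lab, on="SEQN")

# Keep participants aged 30+ and drop rows with missing values.
cohort = merged[merged["RIDAGEYR"] >= 30].dropna()
print(cohort)
```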
There are 1509 CVD cases and 27,915 cases without CVD. After oversampling the dataset, the CVD cases were inflated to 16,412, against 12,534 NO
CVD cases. Before performing the prediction, we experimented to find the effect of
class imbalance on the feature importance score, using the Gini score. The outcome made it evident that class imbalance had little impact
on the scores generated by the Gini Index algorithm, which were almost identical (refer
Table 1). The Gini Index, also known as Gini impurity, determines how likely it is that

Table 1 Univariate analysis of 44 attributes, with Gini Index with sampling (GI-1) and without sampling (GI-2)

Attribute | Min | Max | Average | SD | Description | GI-1 | GI-2
SEQN | 41,475 | 71,916 | – | – | Patient number | 0.01 | 0.01
Disease | CVD (1509) | NO CVD (27,915) | – | – | Suffering from CVD or not | – | –
BMXBMI | 12.4 | 84.9 | 28 | 7.61 | Body mass index (BMI) | 0.07 | 0.01
PEASCTM1 | 2 | 3059 | 614.1 | 227 | Blood pressure time in seconds | 0.06 | 0.01
BPQ150A | 1 | 2 | – | – | Had food in the past 30 min? | 0.04 | 0.01
BPACSZ | 1 | 5 | 3.8 | – | Cuff size (cm) (width × length) | 0.07 | 0.01
BPXPLS | 0 | 224 | 72.4 | 11.7 | 60 s pulse (30 s pulse × 2) | 0.05 | 0.01
BPXML1 | 0 | 888 | 152.7 | 28.2 | MIL: maximum inflation levels (mmHg) | 0.1 | 0.01
BPXSY1 | 72 | 238 | 125.9 | 19.3 | Systolic blood pressure, first reading (mmHg) | 0.1 | 0.01
BPXDI1 | 0 | 134 | 66.9 | 13.0 | Diastolic blood pressure, first reading (mmHg) | 0.02 | 0.01
BPXSY2 | 74 | 234 | 124.7 | 18.8 | Systolic blood pressure, second reading (mmHg) | 0.1 | 0.01
BPXDI2 | 0 | 134 | 66.4 | 13.3 | Diastolic blood pressure, second reading (mmHg) | 0.02 | 0.01
BPXSY3 | 74 | 232 | 123.6 | 18.2 | Systolic blood pressure, third reading (mmHg) | 0.1 | 0.01
BPXDI3 | 0 | 128 | 65.9 | 13.5 | Diastolic blood pressure, third reading (mmHg) | 0.02 | 0.01
LBXCRP | 0 | 20 | 0.5 | 0.796 | CRP (mg/dL) | 0.13 | 0.02
LBDHDD | 7 | 179 | 50.5 | 13.6 | Direct HDL-cholesterol (mg/dL) | 0.05 | 0.01
LBDHDDSI | 0.2 | 4.6 | 1.3 | 0.352 | Direct HDL-cholesterol (mmol/L) | 0.05 | 0.01
LBXWBCSI | 1.4 | 99.9 | 7.3 | 2.37 | White blood cell count (SI) | 0.02 | 0.01
LBXLYPCT | 2.7 | 85.5 | 30.6 | 9.92 | Lymphocyte percent (%) | 0.07 | 0.01
LBXMOPCT | 0.6 | 66.9 | 8 | 2.41 | Monocyte percent (%) | 0.03 | 0.01
LBXNEPCT | 0.8 | 96.6 | 57.6 | 10.8 | Segmented neutrophils percent (%) | 0.06 | 0.01
LBXEOPCT | 0 | 34.1 | 3.2 | 2.29 | Eosinophils percent (%) | 0.02 | 0.01
LBXBAPCT | 0 | 19.7 | 0.7 | 0.545 | Basophils percent (%) | 0.01 | 0.01
LBDLYMNO | 0.2 | 71.1 | 2.2 | 1.33 | Lymphocyte number | 0.05 | 0.01
LBDMONO | 0 | 10.2 | 0.6 | 0.202 | Monocyte number | 0.03 | 0.01
LBDNENO | 0.1 | 83.1 | 4.3 | 1.69 | Segmented neutrophils number | 0.04 | 0.01
LBDEONO | 0 | 8.4 | 0.2 | 0.193 | Eosinophils number | 0.02 | 0.01
LBDBANO | 0 | 4.7 | 0 | 0.066 | Basophils number | 0.02 | 0.01
LBXRBCSI | 2.4 | 7.2 | 4.6 | 0.480 | Red cell count (SI) | 0.02 | 0.01
LBXHGB | 6.1 | 19.7 | 13.8 | 1.46 | Hemoglobin (g/dL) | 0.02 | 0.01
LBXHCT | 20.5 | 57.7 | 40.4 | 4.16 | Hematocrit (%) | 0.03 | 0.01
LBXMCVSI | 50.5 | 125.3 | 88.9 | 5.87 | Mean cell volume (fL) | 0.06 | 0.01
LBXMCHSI | 14.9 | 60.8 | 30.4 | 2.34 | Mean cell hemoglobin (pg) | 0.05 | 0.01
LBXMC | 25.1 | 43.8 | 34.2 | 0.928 | Mean cell hemoglobin concentration (g/dL) | 0.03 | 0.01
LBXRDW | 6.3 | 37.8 | 13.1 | 1.25 | Red cell distribution width (%) | 0.03 | 0.01
LBXPLTSI | 11 | 1000 | 246.1 | 68.9 | Platelet count (1000 cells/uL) | 0.06 | 0.01
LBXMPSI | 4.7 | 13.5 | 8 | 0.893 | Mean platelet volume (fL) | 0.03 | 0.01
WTSAF2YR | 0 | 521,033.4 | 69,381.9 | – | Fasting subsample 2-year MEC weight | 0.2 | 0.01
LBXTR | 12 | 2742 | 139.7 | 79.4 | Triglyceride (mg/dL) | 0.23 | 0.03
LBDTRSI | 0.1 | 31 | 1.6 | 0.896 | Triglyceride (mmol/L) | 0.23 | 0.03
LBDLDL | 9 | 344 | 105.6 | 22.6 | LDL-cholesterol (mg/dL) | 0.24 | 0.02
LBDLDLSI | 0.2 | 8.9 | 2.7 | 0.584 | LDL-cholesterol (mmol/L) | 0.24 | 0.02
RIAGENDR | Male | Female | – | – | Gender | 0.01 | 0.01
RIDAGEYR | 40 | 80 | 48.8 | – | Age at screening, adjudicated (recode) | 0.22 | 0.02

Fig. 2 Associations: circles are numerical associations and squares are categorical

a randomly chosen case will be incorrectly categorized. An attribute with a lower
Gini Index score is preferred. The score lies between 0 and 1:

Gini Index = 1 − Σ_{i=1}^{n} (p_i)^2, (1)

where p_i is the probability of an item being classified as a particular class (CVD or NO CVD).
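A minimal sketch of the Gini impurity computation in Eq. (1), in plain Python over a list of class labels:

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 - sum over classes of p_i^2, Eq. (1)."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node scores 0; a 50/50 split of CVD vs NO CVD scores 0.5.
print(gini_index(["CVD"] * 10))                  # 0.0
print(gini_index(["CVD"] * 5 + ["NO CVD"] * 5))  # 0.5
```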

4 Experiment and Results

Our model is an ensemble of decision tree, random forest, naive Bayes, k-NN, and
neural network with a weighted majority voting approach. The mathematical
formulation of the proposed model is as follows.
Assume we have m base models denoted M_1, M_2, …, M_m. Each base
model produces a prediction y_i ∈ {0, 1} for a given input sample x. The ensemble
model combines the predictions of the base models using weighted majority voting.
We assign weights w_1, w_2, …, w_m to the base models; the weights represent the
confidence or performance of each base model. The ensemble model’s prediction y_e
is computed as y_e = argmax(Σ_i w_i · η(y_i)), where argmax returns the label
(0 or 1) that maximizes the expression, and η(y_i) is an indicator function defined as
η(y_i) = +1 if y_i = 1 and η(y_i) = −1 if y_i = 0. The weights can be determined
from the performance or accuracy of the models using cross-validation. The study
is non-simulation based: it uses an experimental, data-driven methodology to find the
risk factors related to CVD, and the setup, elaborated in Fig. 1, involves no simulation
component such as simulation software. For the
study, we prepared an experimental setup using RapidMiner Studio on a MacBook Air
(1.8 GHz Dual-Core Intel Core i5, 8 GB RAM). The dataset resulting from
the data preparation stage (Sect. 3.1) was split into training and testing sets: oversampling
was used to produce a balanced dataset, which was split 70/30 into training and testing. In the training phase,
a training dataset prepared after feature engineering was used to train the model for
prediction. In the validation phase, the trained model was tested on how accurately
it predicted the corresponding labels of the testing dataset. Each model undergoes
tenfold cross-validation with random splits to obtain an accurate estimate of model performance.
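The weighted majority voting rule from the formulation above can be sketched as follows; the predictions and weights below are hypothetical, with the weights standing in for cross-validated accuracies of the five base models:

```python
def weighted_majority_vote(predictions, weights):
    """Combine binary base-model predictions y_i in {0, 1} by weighted
    majority voting: map y_i to eta(y_i) = +1 or -1, sum w_i * eta(y_i),
    and return label 1 if the weighted sum is positive, else 0."""
    score = sum(w * (1 if y == 1 else -1) for y, w in zip(predictions, weights))
    return 1 if score > 0 else 0

# Five hypothetical base models (e.g. DT, RF, NB, k-NN, NN) with
# accuracy-derived weights.
preds   = [1, 0, 1, 1, 0]
weights = [0.88, 0.80, 0.85, 0.82, 0.90]
print(weighted_majority_vote(preds, weights))  # 1
```

The weighted sum here is 0.88 − 0.80 + 0.85 + 0.82 − 0.90 = 0.85 > 0, so the ensemble outputs 1 even though the vote count is only 3 to 2.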
To find optimal model parameters, a grid search was employed for each model,
with parallelized performance evaluation. This approach involved a systematic search over
the hyperparameters specified for each model to determine the combination
of values yielding the highest accuracy. The parallelized evaluation allowed multiple sets
of hyperparameters to be tested simultaneously, making the tuning of the models
more efficient and effective. Ultimately, this approach helped ensure that the resulting
models performed at their best potential.
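The oversampling, 70/30 split, and parallelized grid search with tenfold cross-validation can be sketched with scikit-learn on synthetic data. This is an illustration under stated assumptions, not the authors' setup (the study used RapidMiner); the data and the hyperparameter grid are invented for demonstration.

```python
import numpy as np
from sklearn.utils import resample
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic imbalanced stand-in: 60 CVD vs 540 NO CVD samples, 5 features.
X_min, y_min = rng.normal(1.0, 1.0, (60, 5)), np.ones(60)
X_maj, y_maj = rng.normal(0.0, 1.0, (540, 5)), np.zeros(540)

# Random oversampling of the minority class to balance the dataset.
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=540, random_state=0)
X = np.vstack([X_maj, X_up])
y = np.concatenate([y_maj, y_up])

# 70/30 train/test split on the balanced data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Grid search over an illustrative hyperparameter grid; cv=10 mirrors the
# tenfold cross-validation, and n_jobs=-1 parallelizes the evaluation.
grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=10, n_jobs=-1)
search.fit(X_tr, y_tr)
print(search.best_params_, round(search.score(X_te, y_te), 3))
```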
During the third phase, the predictive model is constructed and its performance is
assessed using performance metrics such as accuracy, sensitivity,
specificity, and F1 score. These calculations are summarized in the confusion matrix
(refer Fig. 3). The confusion matrix is a performance measurement tool for machine
learning problems in which the four actual-versus-predicted counts are placed: actual
values appear on the top of the matrix and predicted values on its left side.
From Fig. 3, we can see that 15,733 real CVD cases were correctly classified as
CVD by our model, and 10,749 cases without CVD were correctly classified as NO CVD. Since
the dataset was balanced using oversampling, accuracy can be used as a
performance metric. The accuracy of our model is computed as
accuracy = (TP + TN)/Total = (15,733 + 10,749)/28,946 = 91.49%. Other performance metrics
are listed in Table 2, and the final comparison of our model with other related
studies is summarized in Table 3.
We found that our model performed better than the studies we reviewed;
we attempted to improve the predictive model through better
accuracy and other performance metrics.
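Working from the confusion-matrix counts in Fig. 3, the headline metrics can be recomputed. Note an assumption: the false-positive count of 1,785 used below is inferred from the oversampled class totals (16,412 CVD and 12,534 NO CVD minus the correctly classified cases), since the printed figure for that cell is garbled in the text.

```python
# Counts consistent with Fig. 3 and the oversampled class sizes.
TP, FN = 15733, 679    # actual CVD: correctly / incorrectly classified
TN, FP = 10749, 1785   # actual NO CVD: correctly / incorrectly classified (inferred)

accuracy    = (TP + TN) / (TP + TN + FP + FN)
sensitivity = TP / (TP + FN)   # recall on the CVD class
specificity = TN / (TN + FP)

print(f"accuracy={accuracy:.2%}, sensitivity={sensitivity:.2%}, "
      f"specificity={specificity:.2%}")
```

The computed accuracy of 91.49% matches the value reported in the paper.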

Fig. 3 Confusion matrix of our model

Table 2 Evaluation metrics of our model

Model | Accuracy | Classification error | Precision | Recall | F-measure
Our model | 91.49% ± 0.51% | 8.51% ± 0.51% | 85.75% ± 0.51% | 89.81% ± 0.51% | 87.73% ± 0.51%

Table 3 Comparison of different studies

Model | Performance metrics | Techniques used | Dataset
Our model | Accuracy: 91.49% | Ensemble using tenfold cross-validation (CV) | NHANES
[13] | Accuracy: 85.7% | Convolutional neural network (CNN) | NHANES
[18] | Accuracy: 85.6% | SVM | NHANES
[22] | AUC: 0.862 | MLP | NHANES
[11] | AUC: 86.41% | Multi-layer perceptron (MLP) and K-nearest neighbor (K-NN) | UCI
[12] | Accuracy: 86.77% | Tenfold cross-validation, medical feature selection (computer-based feature selection) | UCI

5 Conclusion

The present investigation developed a prognostic framework for CVD in patients
using imbalanced data containing both qualitative and quantitative attributes. The
novelty of the proposed model lies in the identification of risk factors that can
correctly predict CVD in a patient. The model’s accuracy was determined
to be 91.49% through a tenfold cross-validation process using DT, RF, NB, K-NN,
and NN, with the final prediction obtained through a voting classifier.
Various risk factors were identified, including BMI, BP, HDL,
LDL, basophils, white blood cells, red blood cells, hemoglobin, and monocytes.

It is important to note that these parameters exhibit variations based on gender. In
the future, we aim to implement medical feature selection in conjunction with other
models to enhance the precision of our results.

References

1. WHO (2021) Cardiovascular diseases, June 2021. https://www.who.int/health-topics/cardiovascular-diseases. Accessed 14 Apr 2023
2. WHO (2021) Cardiovascular diseases, June 2021. https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds). Accessed 23 May 2023
3. Benjamin EJ, Muntner P, Alonso A, Bittencourt MS, Callaway CW, Carson AP, Chamberlain
AM, Chang AR, Cheng S, Das SR et al (2019) Heart disease and stroke statistics-2019 update:
a report from the American Heart Association. Circulation 139(10):e56–e528
4. Al-Mallah MH, Sakr S, Al-Qunaibet A (2018) Cardiorespiratory fitness and cardiovascular
disease prevention: an update. Curr Atheroscler Rep 20:1–9
5. Wong ND (2014) Epidemiological studies of CHD and the evolution of preventive cardiology.
Nat Rev Cardiol 11(5):276–289
6. Böthig S (1989) WHO MONICA Project: objectives and design. Int J Epidemiol 18(3 Suppl
1):S29–S37
7. Yusuf S, Hawken S, Ôunpuu S, Dans T, Avezum A, Lanas F, McQueen M, Budaj A, Pais P,
Varigos J et al (2004) Effect of potentially modifiable risk factors associated with myocardial
infarction in 52 countries (the interheart study): case-control study. Lancet 364(9438):937–952
8. Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, Greenland P, Jacobs
DR Jr, Kronmal R, Liu K et al (2002) Multi-ethnic study of atherosclerosis: objectives and
design. Am J Epidemiol 156(9):871–881
9. Dawber TR, Meadors GF, Moore FE Jr (1951) Epidemiological approaches to heart disease:
the Framingham study. Am J Public Health Nations Health 41(3):279–286
10. Miller V, Mente A, Dehghan M, Rangarajan S, Zhang X, Swaminathan S, Dagenais G, Gupta
R, Mohan V, Lear S et al (2017) Fruit, vegetable, and legume intake, and cardiovascular disease
and deaths in 18 countries (pure): a prospective cohort study. Lancet 390(10107):2037–2049
11. Pal M, Parija S, Panda G, Dhama K, Mohapatra RK (2022) Risk prediction of cardiovascular
disease using machine learning classifiers. Open Med 17(1):1100–1113
12. Nahar J, Imam T, Tickle KS, Chen Y-PP (2013) Computational intelligence for heart disease
diagnosis: a medical knowledge driven approach. Expert Syst Appl 40(1):96–104
13. Dutta A, Batabyal T, Basu M, Acton ST (2020) An efficient convolutional neural network for
coronary heart disease prediction. Expert Syst Appl 159:113408
14. Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, Van der Schaar M (2019) Cardiovascular
disease risk prediction using automated machine learning: a prospective study of 423,604 UK
biobank participants. PLoS ONE 14(5):e0213653
15. Verma L, Srivastava S, Negi PC (2016) A hybrid data mining model to predict coronary artery
disease cases using non-invasive clinical data. J Med Syst 40:1–7
16. El-Bialy R, Salamay MA, Karam OH, Khalifa ME (2015) Feature analysis of coronary artery
heart disease data sets. Procedia Comput Sci 65:459–468
17. Kaur P, Kumar R, Kumar M (2019) A healthcare monitoring system using random forest and
internet of things (IoT). Multimed Tools Appl 78:19905–19916
18. Klados GA, Politof K, Bei ES, Moirogiorgou K, Anousakis-Vlachochristou N, Matsopoulos
GK, Zervakis M (2021) Machine learning model for predicting CVD risk on NHANES data.
In: 2021 43rd annual international conference of the IEEE engineering in medicine & biology
society (EMBC). IEEE, pp 1749–1752

19. Hasan KA, Hasan MAM (2020) Prediction of clinical risk factors of diabetes using multiple
machine learning techniques resolving class imbalance. In: 2020 23rd international conference
on computer and information technology (ICCIT). IEEE, pp 1–6
20. National health and nutrition examination survey. https://wwwn.cdc.gov/Nchs/Nhanes/.
Accessed 14 Apr 2023
21. Dinh A, Miertschin S, Young A, Mohanty SD (2019) A data-driven approach to predicting
diabetes and cardiovascular disease with machine learning. BMC Med Inform Decis Mak
19(1):1–15
22. Oh T, Kim D, Lee S, Won C, Kim S, Yang J, Yu J, Kim B, Lee J (2022) Machine learning-based
diagnosis and risk factor analysis of cardiocerebrovascular disease based on KNHANES. Sci
Rep 12(1):2250
Review of Local Binary Pattern
Histograms for Intelligent CCTV
Detection

Deepak Sharma, Brajesh Kumar Singh, and Erma Suryani

Abstract This paper presents a comprehensive review of the use of Local Binary
Pattern Histogram (LBPH) in smart Closed-Circuit Television (CCTV) systems for
real-time detection and analysis of abnormal events. The review covers various
aspects of smart CCTV, including object detection, motion detection, and abnormal
event detection. It discusses existing literature on LBPH-based methods, highlighting
their strengths and weaknesses. The review emphasizes the challenges faced in smart
CCTV detection, such as handling complex scenes, occlusions, lighting variations,
and false alarms. It also identifies potential research directions, such as incorpo-
rating deep learning techniques, exploring multi-modal data fusion, and utilizing
edge computing for real-time processing. In summary, this review offers valuable
insights into the current state-of-the-art techniques for smart CCTV detection using
LBPH. It provides researchers, practitioners, and policymakers involved in the devel-
opment and deployment of smart CCTV systems with valuable information. The
findings of this review can guide future research and contribute to the advancement
of public safety and security applications.

Keywords Object detection · Smart CCTV · LBPH · Recognition

1 Introduction

A. Video surveillance is an important tool for ensuring public safety and security.
However, traditional CCTV systems can be limited in their ability to detect and
respond to threats in real time.

D. Sharma (B) · B. K. Singh


R.B.S. Engineering Technical Campus, Bichpuri, Agra, India
e-mail: deepaksharmacb@gmail.com
E. Suryani
Department of Information Systems, Institut Teknologi Sepuluh Nopember (ITS), Surabaya,
Indonesia

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 277
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_24
278 D. Sharma et al.

Smart CCTV systems, on the other hand, leverage advanced technologies such as
computer vision, machine learning, and artificial intelligence to improve the accuracy
and speed of video analysis. By developing smart CCTV systems, we can enhance
public safety and security while reducing the workload and cost of human operators.
B. Local Binary Pattern Histograms
The review paper extensively explores the concept and application of Local Binary
Pattern Histogram (LBPH) in the context of intelligent CCTV detection. The section
dedicated to LBPH provides a comprehensive understanding of its fundamental
principles, advantages, and limitations.
Local Binary Pattern (LBP): The paper discusses the concept of local binary
patterns, which are local texture descriptors widely used in computer vision tasks.
LBPs encode the relationship between a pixel and its neighbors by comparing their
intensity values. This operator captures texture information and is particularly useful
for analyzing patterns in surveillance footage.
Histogram Representation: LBPH builds upon the LBP operator by introducing
histogram representation. LBPH divides the image into local regions and computes
LBPs within each region. These local LBPs are then quantized into histograms,
which encode the frequency of different pattern occurrences. The resulting histogram
representation efficiently summarizes the texture information in an image [1].
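The LBP-plus-histogram pipeline described above can be sketched in NumPy. This is a simplified illustration (a basic 8-neighbor LBP with a single whole-image histogram), omitting the per-region splitting and uniform-pattern refinements used in practice.

```python
import numpy as np

def lbp_code(patch):
    """8-neighbor LBP code of a 3x3 patch: compare each neighbor against the
    center pixel and pack the resulting bits into a value in [0, 255]."""
    center = patch[1, 1]
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(int(n >= center) << i for i, n in enumerate(neighbors))

def lbp_histogram(image):
    """Normalized histogram of LBP codes over the interior pixels of a
    grayscale image: the frequency of each of the 256 local patterns."""
    h, w = image.shape
    codes = [lbp_code(image[r - 1:r + 2, c - 1:c + 2])
             for r in range(1, h - 1) for c in range(1, w - 1)]
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

img = np.random.default_rng(0).integers(0, 256, size=(16, 16))
hist = lbp_histogram(img)
print(hist.shape, round(hist.sum(), 6))  # (256,) 1.0
```

In the full LBPH scheme, the image is first divided into a grid of regions and one such histogram is computed per region; the regional histograms are then concatenated into the final feature vector.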
Robustness to Illumination Variations: One of the key advantages of LBPH is its
robustness to illumination variations. By focusing on local texture patterns rather than
absolute intensity values, LBPH-based systems can effectively handle challenging
lighting conditions encountered in real-world surveillance scenarios.
Object Recognition and Event Detection: The paper emphasizes the application
of LBPH in object recognition and event detection in CCTV footage. The ability
of LBPH to capture discriminative texture features makes it suitable for identifying
specific objects or events of interest, such as people, vehicles, or suspicious activities.
Limitations and Challenges: The review paper also addresses the limitations and
challenges associated with LBPH. While LBPH performs well in capturing texture
information, it may struggle to recognize complex objects or events with subtle
variations. Additionally, parameter selection and tuning can significantly impact the
performance of LBPH-based systems, requiring careful consideration.
By delving into the concept of LBPH and its application in intelligent CCTV
detection, the review paper provides valuable insights into the strengths and limita-
tions of this technique. The paper serves as a guide for researchers and practitioners
interested in utilizing LBPH for surveillance tasks, highlighting its effectiveness in
texture-based analysis and its potential for advancing intelligent CCTV systems.
C. The purpose of this review paper is to provide an overview of the use of LBPH
in smart CCTV systems. We will review the relevant literature on LBPH-based
video analysis and highlight its potential applications in object detection, human
activity recognition [2–4], facial recognition, and other areas. By examining the
Review of Local Binary Pattern Histograms for Intelligent CCTV Detection 279

strengths and limitations of existing studies, we aim to identify the key challenges
and opportunities for future research.

2 Literature Review

According to the literature review in Table 1, the LBPH algorithm, which appears
several times in the reviewed works, shows higher accuracy than the other
techniques mentioned. It has been found effective in handling challenges
related to pose variation, illumination, and occlusion, which are common in face
recognition tasks.
As shown in Fig. 1, the Local Binary Pattern Histogram (LBPH) algorithm is a technique used in the reviewed
research papers for face recognition and enhanced real-time face recognition [10].
It has been analyzed and compared with other face recognition techniques, such as
eigenfaces and fisherfaces, in terms of performance and accuracy.

3 LBPH-Based CCTV Detection Systems

The review paper extensively discusses Local Binary Pattern Histograms (LBPH)-
based CCTV detection systems, highlighting their significance and effectiveness in
intelligent surveillance [6]. The section dedicated to LBPH-based CCTV detection
systems provides in-depth insights into their architecture, working principles, and
key components.
Feature Extraction: LBPH-based systems leverage the local binary pattern (LBP)
operator to extract robust and discriminative features from surveillance footage [13].
By analyzing the local texture patterns of image regions, LBPH efficiently captures
essential information for object recognition and event detection.
Histogram Representation: LBPH further utilizes the histogram representation to
encode the extracted features. The local binary patterns are quantized into histograms,
capturing the distribution of different pattern occurrences. This histogram represen-
tation enables efficient and compact feature representation.
Training and Recognition: LBPH-based systems typically involve a training phase
where the system learns the patterns and characteristics of specific objects or events.
This is followed by a recognition phase, where the system matches the extracted
features of new observations against the learned patterns to identify and classify
objects or events of interest.
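The training and recognition phases can be sketched as nearest-neighbor matching of stored histograms. This toy example uses hypothetical 4-bin vectors standing in for concatenated regional LBP histograms, and the chi-square distance commonly used for histogram comparison; it is a sketch of the general scheme, not any specific library's implementation.

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def train(feature_histograms, labels):
    """'Training' in LBPH amounts to storing labeled reference histograms."""
    return list(zip(feature_histograms, labels))

def recognize(model, query_hist):
    """Return the label of the closest stored histogram and the distance
    (a lower distance means a more confident match)."""
    best_dist, best_label = min((chi_square(h, query_hist), lab)
                                for h, lab in model)
    return best_label, best_dist

# Hypothetical reference histograms for two enrolled identities.
model = train([np.array([.7, .1, .1, .1]), np.array([.1, .1, .1, .7])],
              ["person_A", "person_B"])
label, dist = recognize(model, np.array([.6, .2, .1, .1]))
print(label)  # person_A
```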
Parameter Settings: The review paper emphasizes the importance of parameter
settings in LBPH-based CCTV detection systems. Parameters such as the size of
Table 1 Literature review

S No | Year | Author(s) | Focus of the paper | Key points in coverage | Technique(s) used | Parameters analyzed | Research gaps
1 | 2022 | Anna Irin Anil et al. [5] | Smart security CCTV system | Configuration, embedded surveillance | Emergence of machine learning, deep learning, computer vision, generic object tracking using regression networks (GOTURN), you only look once (YOLO), and the Local Binary Pattern Histogram algorithm (LBPH) | Non-human detection, motion detection under sunlight exposure, human motion detection | Minimize human interaction in industry to enhance security surveillance
2 | 2021 | Kothari et al. [6] | Survey on smart security CCTV system | Dropbox, Raspberry Pi, Internet of Things (IoT), Picamera, PIR sensor | Raspberry Pi, Internet of Things (IoT), and PIR sensor | Intruder tracking, live streaming, communication, motion detection, image processing | Smart surveillance system authenticated and encrypted on the receiver side to deliver accurate information to recipients
3 | 2021 | Mihir Shah et al. [7] | Smart surveillance system | Long short-term memory (LSTM), automated teller machine (ATM) surveillance, convolutional neural network (CNN) | Convolutional neural network (CNN), long short-term memory (LSTM), recurrent neural networks (RNN), backpropagation through time (BPTT) | Detection of normal and abnormal events/activities | Identify anomalies in a given video, such as theft, assault, robbery, etc.
4 | 2020 | Deepak et al. [8] | Attendance system based on facial recognition | OpenCV, face acknowledgement system, biometric, verification | OpenCV (open-source computer vision) system, attendance system [8, 9] | Face recognition | Obtain an accurate attendance system
5 | 2019 | Farah Deeba et al. [10] | Enhanced real-time face recognition | Local Binary Pattern Histogram (LBPH), face recognition, feature extraction | Feature vectors, extracting histograms with LBP, LBPH algorithm, SIFT algorithm | Feature extraction, face detection, feature matching, preprocessing | Obtain valid results under pose variation, illumination, and occlusion
6 | 2019 | Raman Sharma et al. [11] | Performance analysis of human face recognition techniques | Haar cascade, histogram, face recognition, face detection, feature extraction | Fisherfaces, dataset creator/face detector, local binary pattern histograms | Parameter selection, extracting histograms for the image, training the algorithm, performing face recognition, LBP operation | The LBPH algorithm shows higher accuracy than eigenfaces and fisherfaces
7 | 2019 | Niraj Kini et al. [12] | Theft detection and motion tracking using CNN | Object tracking, computer vision, object detection, convolutional neural network | Neural networks, convolutional neural network, deep artificial neural networks, scale-invariant feature transform (SIFT), histograms of oriented gradients (HOG) | Theft tracking and motion detection | A new dimension: enabling the surveillance camera as a theft detector

Fig. 1 Literature review

the local neighborhood, the number of sampling points, and threshold values signif-
icantly influence the system’s performance. Fine-tuning these parameters is crucial
for achieving optimal results in different surveillance scenarios.
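To make these parameters concrete, the basic LBP operator can be sketched in a few lines. The following is an illustrative from-scratch implementation (radius 1, eight sampling points, with ties to the centre counted as 1), not code from any of the reviewed systems; the function names are ours:

```python
def lbp_code(img, r, c):
    """LBP code of interior pixel (r, c): compare the 8 neighbours
    (clockwise from the top-left) against the centre pixel; a bit is
    set to 1 when the neighbour is >= the centre value."""
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels of a
    grayscale image given as a list of rows."""
    hist = [0] * 256
    rows, cols = len(img), len(img[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

A full LBPH pipeline would further divide the image into a grid of cells (the grid size being another tunable parameter) and concatenate the per-cell histograms into the final feature vector; enlarging the radius or the number of sampling points changes the code length and histogram size accordingly.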
Performance Evaluation: This paper also delves into the performance evaluation
of LBPH-based CCTV detection systems [11]. It discusses various performance
metrics, including accuracy, robustness, and efficiency, to assess the system’s perfor-
mance. The findings reveal the strengths and limitations of LBPH-based systems and
provide insights for further optimization.
• Lower prediction value means better face recognition.
• Higher frame per second (FPS) means faster speed.
As the FPS values shown in Fig. 2 indicate, the performance evaluation section of
this review provides a comprehensive assessment of Local Binary Pattern
Histograms (LBPH) for intelligent CCTV detection. Various performance metrics
were discussed to evaluate the effectiveness and limitations of LBPH-based CCTV
detection systems on the basis of the FPS range.
The accuracy of LBPH-based systems was highlighted, showcasing their superiority
over other feature extraction techniques such as eigenface and fisher face, as given
in Table 2, which compares the eigenface, fisher face, and LBPH algorithms [14].
LBPH demonstrates excellent performance in object recognition tasks, particularly
in low-light conditions. However, it was noted that LBPH has limitations in recog-
nizing complex objects or events, requiring further improvements in its capability to
distinguish between similar categories.
Efficiency was also considered, emphasizing the trade-off between accuracy and
computational resources. Optimal parameter settings play a crucial role in achieving
the desired balance between accuracy and efficiency in LBPH-based systems.

Fig. 2 FPS range (bar chart comparing frames per second: Eigenface 0.67, Fisher Face 1.23, LBPH 6.58)

Table 2 Comparison of eigenface, fisher face, and LBPH algorithm

Comparison subject                                            Eigenface   Fisher face   LBPH
Value prediction when testing with the same face              4633.81     318.59        29.32
Smallest value prediction when testing with different faces   2004.2      61.42         71.88
Biggest value prediction when testing with different faces    8360.78     2805.77       367.5
FPS range                                                     0.67        1.23          6.58

Bold values show the higher accuracy compared to the other algorithms

The results of experiments and tests conducted on LBPH-based CCTV detection


systems shed light on its performance characteristics. These findings provide valuable
insights for system developers, enabling them to optimize parameter settings, address
limitations, and further enhance the overall performance of LBPH-based CCTV
detection systems.

4 Applications of LBPH-Based CCTV Detection

The applications section of this review paper explores the diverse range of applica-
tions where Local Binary Pattern Histograms (LBPH) can be effectively utilized in
CCTV detection systems [15]. Several key applications were discussed, highlighting
the effectiveness of LBPH in various scenarios:
Intrusion Detection: LBPH-based CCTV detection systems can be employed to
detect and identify unauthorized individuals or objects entering restricted areas. The
robustness of LBPH in different lighting conditions makes it suitable for monitoring
entrances and detecting potential security breaches.

Object Recognition: LBPH proves valuable in recognizing specific objects of


interest, such as vehicles, weapons, or suspicious packages [16, 17]. Its accuracy
in object recognition tasks allows for efficient and reliable identification of relevant
objects in real-time surveillance footage.
Crowd Monitoring: LBPH-based systems excel in crowd monitoring applications
by analyzing the movement patterns, density, and behavior of individuals within
crowded areas [18]. This facilitates crowd management, identifying potential safety
risks or abnormal activities.
Traffic Surveillance: LBPH can contribute to intelligent traffic surveillance systems
by detecting and tracking vehicles, monitoring traffic flow, and identifying traffic
violations [5]. This enables efficient traffic management and enhances road safety.
Abnormal Event Detection: LBPH-based systems can be utilized to detect abnormal
events in surveillance footage, such as fights, accidents, or vandalism [12]. By
comparing current observations with predefined patterns, LBPH can trigger alerts
and enable timely intervention.
Facial Recognition: With its capability to extract discriminative features, LBPH is
suitable for facial recognition applications. It can be employed in CCTV systems
to identify and authenticate individuals, enhancing security measures in restricted
access areas.
The discussed applications demonstrate the versatility and effectiveness of LBPH-
based CCTV detection systems across various domains. The ability of LBPH
to handle challenging real-world conditions and provide accurate results makes
it a valuable tool for enhancing surveillance capabilities and improving security
measures.
By understanding the specific requirements and challenges of each application,
researchers and practitioners can further optimize LBPH-based systems and explore
additional use cases where intelligent CCTV detection [7, 19] is essential.

5 Conclusion

In conclusion, this review paper provides a comprehensive analysis of Local Binary


Pattern Histograms (LBPH) for intelligent CCTV detection. Through the examina-
tion of various studies and experiments, it is evident that LBPH offers promising
capabilities in object recognition and event detection in surveillance footage. LBPH
outperforms other feature extraction techniques, particularly in low-light condi-
tions, demonstrating its robustness. However, the effectiveness of LBPH depends
on parameter settings and the complexity of objects or events.

Balancing accuracy and efficiency is crucial for optimal system performance.
Further research is needed to address the limitations in recognizing complex objects
and events and to explore enhancements to LBPH-based CCTV detection systems.
Overall, LBPH shows great potential in advancing intelligent surveillance and
contributes to enhancing security and safety measures.

References

1. Kanhangad V, Ravi J (2015) Human action recognition using local binary patterns and motion
history images. In: Proceedings of the 2015 international conference on advances in computing,
communications and informatics (ICACCI). IEEE, pp 1216–1221
2. Gharib A, Ibrahim AA, Al-Arif MA (2017) Survey on human activity recognition in video
surveillance. In: Proceedings of the 2017 2nd international conference on computer science
and technology, information storage and processing (CSTISP). IEEE, pp 356–361
3. Joshi M, Lade P (2016) Human activity recognition using local binary patterns and motion
history images in video surveillance. In: Proceedings of the 2016 international conference on
information technology (ICIT). IEEE, pp 190–195
4. Kisku DR, Singh JK, Singh P (2015) Human action recognition using local binary pattern and
support vector machine. In: Proceedings of the 2015 IEEE Calcutta conference (CALCON).
IEEE, pp 123–128
5. Irin Anil A et al (2022) Smart security CCTV system: configuration, embedded surveillance,
and non-human detection. Int J Res Eng Sci 10(7):52–56
6. Kothari SB et al (2021) Survey on smart security surveillance system. Int J Res Appl Sci Eng
Technol (IJRASET) 9(12):52
7. Shah M, Agre M, Chawdhary A, Deone J (2022) Smart surveillance system. https://doi.org/
10.2139/ssrn.4108852
8. Deepak S, Akash S (2020) Implementing the study of attendance system using OpenCV with
the help of facial recognition. Int J Recent Adv Multidiscip Top 1(1):16–19
9. Jain AK, Ross A, Prabhakar S (2004) An introduction to biometric recognition. IEEE Trans
Circ Syst Video Technol 14(1):4–20
10. Deeba F, Memon H, Dharejo F, Ahmed A, Ghaffar A (2019) LBPH-based enhanced real-time
face recognition. Int J Adv Comput Sci Appl 10:535
11. Sharmila R, Sharma D, Kumar V, Puranik C, Gautham K (2019) Performance analysis of
human face recognition techniques. In: Proceedings of the 2019 4th international conference
on internet of things: smart innovation and usages (IoT-SIU), Ghaziabad, India, pp 1–4. https://
doi.org/10.1109/IoT-SIU.2019.8777610
12. Doshi P, Punktambekar S, Kini N, Dhami SS (2019) Theft detection system using convolutional
neural network and object tracking. Int J Adv Res Innov Ideas Educ 5(3):1047–1054
13. Kim HJ, Kim DJ, Kim HW, Lee KH (2016) Human activity recognition in video surveillance
based on feature selection and support vector machines. J Electr Eng Autom 1(1):40–45
14. Sutoyo R, Harefa J, Chowanda A (2016) Unlock screen application design using face expression
on android smartphone. MATEC Web Confer 54:05001. https://doi.org/10.1051/matecconf/
20165405001
15. Elhamod M, Hassenian AE, Ghoniemy S, Elhoseny M (2017) Video surveillance systems:
advances, challenges, and future research directions. In: Handbook of research on advanced
hybrid intelligent techniques and applications. IGI Global, pp 121–157
16. Cui S, Xu T, Xu C, Zhu Y (2016) A survey on visual surveillance of object detection and
tracking in video streams. IEEE Trans Syst Man Cybern Syst 46(6):768–782
17. Dahiya A, Gupta V (2017) Object tracking in video surveillance: a comprehensive review.
IETE J Res 63(3):335–348

18. Das S, Chatterjee S (2018) A review on human activity recognition in video surveillance. In:
Proceedings of the 2018 3rd IEEE Uttar Pradesh section international conference on electrical,
electronics and computer engineering (UPCON). IEEE, pp 1–6
19. Hanmandlu M, Vasikarla S (2013) An efficient approach for facial recognition in video surveil-
lance using local binary patterns. In: Proceedings of the 2017 international conference on
intelligent computing and control systems (ICICCS). IEEE, pp 723–728
Enhancing Machine Learning Model
Using Explainable AI

Raghav Shah, Amruta Pawar, and Manni Kumar

Abstract This paper implements Explainable Artificial Intelligence (XAI) tech-


niques, specifically LIME and SHAP, in a hotel review management model. The
goal is to enhance transparency, interpretability, and trustworthiness. LIME provides
local explanations, highlighting the important features that influenced individual
predictions. SHAP offers a global perspective on feature importance, helping users
understand the overall impact of each feature on the model’s predictions. By inte-
grating XAI, the model becomes more transparent, enabling users to comprehend
the decision-making process, validate decisions, identify biases, and make informed
decisions based on reliable insights from the reviews.

Keywords Explainable Artificial Intelligence · Feature importance · LIME ·


SHAP · Hotel management

1 Introduction

In today’s digital era, the reputation and success of businesses are greatly influenced
by customer reviews, which play a vital role in shaping them, particularly in the
hotel industry [1]. With the exponential growth of online platforms, analyzing and
interpreting large volumes of customer feedback have become increasingly chal-
lenging. To address this, machine learning models have been employed to automate
the process of categorizing and analyzing hotel reviews. However, these models
often operate as black boxes, lacking transparency and interpretability. The inability
to understand the factors driving model predictions raises concerns among stake-
holders, such as hotel managers and consumers [2]. Therefore, the primary focus
of this research is to address the limitations of opacity in hotel review management
models by implementing Explainable Artificial Intelligence (XAI) techniques. By

R. Shah · A. Pawar · M. Kumar (B)


Department of Computer Science and Engineering, Chandigarh University, Mohali,
Punjab 140413, India
e-mail: mannikumar55@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 287
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_25
288 R. Shah et al.

incorporating XAI techniques, such as LIME and SHAP, we aim to enhance trans-
parency and interpretability, enabling stakeholders to gain deeper insights into the
decision-making process of the model.

1.1 Purpose

The purpose of this study is twofold. Firstly, we aim to enhance transparency and
interpretability in a hotel review management system, whose underlying machine
learning model was developed with an accuracy of about 92%, by leveraging XAI
techniques, mainly feature importance, LIME, and SHAP. By providing stakeholders with
comprehensible insights into the factors influencing the model’s predictions, we
enable them to validate and understand the decision-making process. Secondly,
we seek to improve decision-making processes based on customer feedback. By
extracting meaningful and interpretable information from the reviews, stakeholders
can make more informed judgments and identify key factors that drive customer
satisfaction or dissatisfaction [3].

1.2 Scope

The scope of this study encompasses the application of the LIME and SHAP
techniques in a hotel review management model. We will explore the practical integration
of these XAI techniques and evaluate their effectiveness in providing interpretable
explanations. Furthermore, we will conduct case studies and experiments to assess
the impact of XAI on decision-making processes [4]. While the focus is on the hotel
industry, the insights and findings from this study can be extended to other domains
where customer feedback analysis is essential.

1.3 Significance

In conclusion, this research aims to address the lack of transparency and inter-
pretability in hotel review management models by implementing XAI techniques. By
exploring LIME and SHAP and their impact on decision-making processes, we strive
to enhance transparency and interpretability and ultimately improve decision-making
based on customer feedback.
Enhancing Machine Learning Model Using Explainable AI 289

2 Related Work

In 2020, Li and Lu published a research paper titled “Hotel review sentiment analysis
based on a hybrid model of deep learning and machine learning”. It proposed a
hybrid model that combines deep learning and machine learning techniques to
analyze sentiment in hotel reviews, and the authors demonstrated the effectiveness
of their approach in capturing sentiment information from textual data [5].
Mishra and Dash [6] published a research paper named “Sentiment analysis of
hotel reviews using machine learning techniques” in the year 2019, which focused
on the application of machine learning algorithms for sentiment analysis in the hotel
review domain, achieving promising results in classifying reviews into positive, nega-
tive, or neutral sentiments. Huang et al. [7] explored a hybrid approach that integrates
machine learning in addition to lexicon-based methods to enhance sentiment analysis
accuracy. Their study emphasized the importance of incorporating domain-specific
knowledge.
These works collectively showcase the ongoing research in sentiment analysis
and the integration of various techniques for analyzing and interpreting sentiments
in hotel reviews. Building upon these prior studies, our research aims to extend
the existing approaches by incorporating explainable AI techniques, such as LIME
and SHAP, to provide transparent and interpretable explanations for the sentiment
analysis results obtained in the context of hotel review management. By employing
these XAI methods, we seek to enhance the understanding and trustworthiness of
sentiment analysis outcomes, enabling hotel managers to make informed decisions
based on the identified sentiments in guest reviews.

2.1 Research Question

The primary research question driving this study centers on understanding the
effectiveness of LIME and SHAP in providing interpretable explanations for hotel
review predictions. Through this, we seek to explore how these XAI techniques
contribute to transparency and trustworthiness in the model's decision-making
process.

3 Method

The methodology employed in this research paper involves several key steps,
including data collection and preprocessing, model development, XAI integration,
and evaluation. The following sections outline the details of each step:
Data Collection and Preprocessing. The hotel review data used in this study were
collected from an online hotel review platform. The dataset consisted of a substantial

number of hotel reviews, encompassing both positive and negative sentiments. The
data collection process involved extracting the relevant reviews based on predefined
criteria, such as review ratings and textual content.
To ensure the quality and consistency of the data, a series of preprocessing steps
were implemented. Irrelevant information, such as timestamps or user identifiers, was
removed from the dataset. Text preprocessing techniques, including tokenization,
lowercasing, and removal of stop words and special characters, were applied to
standardize the text data. Additionally, any missing values or inconsistencies in the
dataset were handled appropriately.
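As a sketch, the preprocessing steps above might look like the following; the stop-word list is a small illustrative subset and the function name is ours, not the exact pipeline used in the study:

```python
import re

# Illustrative subset; a real pipeline would use a fuller stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "or", "to", "of"}

def preprocess(text):
    """Lowercase the text, strip special characters, tokenize on
    whitespace, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # remove special characters
    tokens = text.split()                     # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("The room was GREAT!!!")` yields `["room", "great"]`, ready for vectorization.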
Model Development. A machine learning model was developed to perform senti-
ment analysis on the hotel reviews. The selection of the specific model was based
on its suitability for text classification tasks. The model was trained using a labeled
dataset, with the positive and negative sentiments serving as the target labels. Various
machine learning algorithms, such as random forest or support vector machines, were
considered and evaluated based on their performance metrics, including accuracy,
precision, recall, and F1-score. The chosen model was then fine-tuned to optimize
its performance.
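For the binary case, these evaluation metrics can be computed directly from the confusion-matrix counts; a minimal illustrative sketch (names ours):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In practice a library routine would be used, but the definitions above are what those routines compute.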
XAI Integration. Explainable Artificial Intelligence (XAI) techniques, in particular
Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive
Explanations (SHAP), two methods for explaining and interpreting models, were
integrated into the hotel review management model. These techniques were selected
for their ability to provide interpretable explanations of the model's predictions.
Feature importance is one such explanation technique; its scores are plotted in
Fig. 2 on the basis of the features generated by the code shown in Fig. 1. Figure 4
shows how reviews are classified by the AI model, and Figs. 5, 6, and 7 show how
the LIME and SHAP techniques are implemented. LIME provides local explanations
by generating feature importance scores, while SHAP offers a more global
understanding of feature contributions, plotted in Fig. 8. The XAI techniques were
used to generate explanations for the model's predictions, enabling stakeholders to
understand the factors influencing the sentiment classification.
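The idea behind LIME can be illustrated with a simplified from-scratch sketch: randomly drop words from a review, query the black-box model on each perturbed text, and attribute importance from the resulting probability changes. This only illustrates the perturbation mechanism, omitting the locally weighted linear surrogate that the full LIME method fits; all names here are ours and not the study's code:

```python
import random

def lime_style_scores(text, predict_proba, n_samples=300, seed=0):
    """Perturbation-based word attributions in the spirit of LIME:
    randomly drop words, query the black-box model on each perturbed
    text, and score a word by the average drop in the positive-class
    probability over the samples in which it was removed."""
    rng = random.Random(seed)
    words = text.split()
    base = predict_proba(text)
    drops = {w: [] for w in words}
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in words]  # keep/drop each word
        kept = " ".join(w for w, keep in zip(words, mask) if keep)
        p = predict_proba(kept)
        for w, keep in zip(words, mask):
            if not keep:
                drops[w].append(base - p)
    return {w: (sum(d) / len(d) if d else 0.0) for w, d in drops.items()}
```

With a toy classifier that returns a high positive probability only when "great" is present, the word "great" receives the largest score, mirroring the kind of local explanation LIME produces for a review.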
• Code for the feature importance technique (as shown in Fig. 1).
• Plot of feature importance scores (as shown in Fig. 2).
• Code for prediction of a review (as shown in Fig. 3).
• The classified reviews (as shown in Fig. 4).
• Code for the LIME technique (as shown in Fig. 5).
• Output of the LIME technique (as shown in Fig. 6).
• Code for the SHAP technique (as shown in Fig. 7).
• Plot of SHAP values (as shown in Fig. 8).
• Code applying the SHAP technique to five reviews (as shown in Fig. 9).
• Plots of the resulting SHAP values (as shown in Fig. 10).
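SHAP is grounded in Shapley values from cooperative game theory, and for a handful of features these can be computed exactly by enumerating feature subsets. The sketch below is illustrative (names ours) rather than the `shap` library's implementation, which approximates this computation efficiently for real models; "absent" features are replaced with baseline values:

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for `model` at point `x`: for each feature i,
    average its marginal contribution over all subsets of the other
    features, with the standard Shapley weighting."""
    n = len(x)

    def value(subset):
        # Evaluate the model with features outside `subset` set to baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi
```

For an additive model each feature's Shapley value equals its own contribution, and the values always sum to the difference between the prediction at `x` and at the baseline (the "efficiency" property SHAP plots rely on).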

Fig. 1 Code for feature importance technique

Fig. 2 Plot of feature importance scores

Fig. 3 Code for prediction of review



Fig. 4 The classified reviews

Fig. 5 Code for LIME technique



Fig. 6 Output for LIME technique

Fig. 7 Code for SHAP technique

Evaluation. The integrated model, along with the LIME and SHAP techniques, was
assessed using several evaluation measures, including accuracy, precision, recall,
and F1-score. The model's performance was measured on a held-out test dataset,
and the XAI techniques’ effectiveness in providing interpretable explanations was
analyzed. Additionally, the impact of XAI on decision-making processes related to
hotel reviews was evaluated through case studies and experiments. This involved
analyzing the insights gained from the explanations provided by LIME and SHAP
and assessing their influence on stakeholders’ decision-making.
The methodology presented in this research paper provides a comprehensive
approach to implementing XAI in the hotel review management model. It ensures
the collection of reliable and relevant data, the development of an effective senti-
ment analysis model, the integration of LIME and SHAP for interpretability, and the
thorough evaluation of the integrated model’s performance.

Fig. 8 Output for SHAP technique

Fig. 9 Code of SHAP technique on five reviews

4 Result

The integration of XAI techniques, LIME and SHAP, into the hotel review manage-
ment model yielded significant outcomes. The sentiment analysis model achieved
high accuracy, precision, recall, and F1-score. LIME and SHAP provided inter-
pretable explanations, aiding stakeholders in understanding the factors driving

Fig. 10 Plots of SHAP values

customer sentiments. Decision-making processes were improved, as hotel managers


could prioritize areas for enhancement based on identified factors. User acceptance
was positive, with increased transparency and trust in the model’s predictions. The
research bridged the gap between complex machine learning models and human
understanding, enabling informed decisions based on customer feedback.

5 Discussion

The integration of XAI techniques, LIME and SHAP, enhanced transparency, inter-
pretability, and decision-making in the hotel review management model. LIME and
SHAP provided explanations for sentiment analysis predictions, enabling stake-
holders to understand factors influencing customer sentiments. This improved
decision-making, user acceptance, and trust in the model’s predictions. Practical
applications extend to service improvements and reputation management. Future
research can explore additional XAI techniques and investigate long-term impact
and generalizability in different domains. Overall, XAI bridges the gap between
complex AI models and human understanding, enabling informed decisions based
on customer feedback.

6 Conclusion

The integration of XAI techniques, LIME and SHAP, into the hotel review manage-
ment model has proven beneficial. The model achieved high performance in senti-
ment analysis and provided interpretable explanations. This enhanced transparency
and interpretability improved decision-making processes, user acceptance, and trust.
The practical applications of XAI in the hotel industry are significant, bridging the
gap between AI models and human understanding. Future research can explore
broader applications and generalizability. Overall, XAI empowers stakeholders to
make informed decisions based on customer feedback, leading to enhanced service
quality and customer satisfaction.

7 Limitations and Future Scope

Limited dataset. The research used a specific dataset of hotel reviews, which may not
capture the full spectrum of sentiments and experiences, limiting the generalizability
of the findings.
Dependency on model performance. The quality of the explanations provided by
LIME and SHAP relies on the accuracy of the sentiment analysis model. If the
underlying model has limitations or biases, they can affect the reliability of the
explanations.
The future scope is listed below:
• Integration of multiple XAI techniques.
• Real-time analysis and feedback.
• Cross-domain analysis.
• User-centric explanations.
• Ethical considerations.

References

1. Zhang X, Li Y, Xu W, Zhao J, Li Z (2019) An interpretable deep learning framework for sentiment


analysis using attention mechanism. IEEE Access 7:12324–12332
2. Liu Y, Huang X, An A, Yu X (2020) Explainable AI for sentiment analysis: a survey. ACM
Trans Intell Syst Technol 11(3):1–34
3. Deng L, Liu Y, Yang Y, Zhang M (2020) Explainable sentiment analysis for hotel reviews via
attention mechanisms. Inf Sci 537:112–126
4. Teng Y, Ye D, Li M, Li X, Li H (2020) An explainable deep learning model for sentiment analysis
of hotel reviews. IEEE Access 8:156556–156565

5. Li J, Lu K (2020) Hotel review sentiment analysis based on a hybrid model of deep learning
and machine learning. Int J Comput Intell Syst 13(1):787–799
6. Mishra P, Dash R (2019) Sentiment analysis of hotel reviews using machine learning techniques.
In: Proceedings of the international conference on machine intelligence and data
7. Huang S, Sun X, Wang J (2020) Sentiment analysis of hotel reviews using machine learning
and lexicon-based approaches. Int J Comput Intell Syst 13(1):787–799
IoT-Assisted Mushroom Cultivation
in Agile Environment

Abhi Kathiria, Parva Barot, Manish Paliwal , and Aditya Shastri

Abstract IoT-assisted mushroom farming in an agile environment is a modern
approach to mushroom cultivation that utilizes IoT technology to optimize the
growing conditions in real time. This approach involves IoT sensors to monitor
environmental parameters such as temperature and humidity inside a cultivation
chamber. The data collected from these sensors is then analyzed using data
analytics software, which provides real-time feedback to the farmer and enables
them to make adjustments as needed. Additionally, an automated humidification
system and an air exchange system can be controlled by IoT sensors to optimize
the growing conditions for the mushrooms. IoT technology in mushroom cultivation
offers several benefits, including increased efficiency, reduced labor costs, and
higher yields. However, this approach also comes with some challenges, such as
the need for technical expertise and a significant initial investment in technology
and equipment. Overall, IoT-assisted mushroom farming in an agile environment is
a promising approach that can help meet the growing demand for sustainable and
efficient farming practices.

Keywords IoT mushroom cultivation · Sustainable · Efficient farming process ·


Agile environment

1 Introduction

Mushroom cultivation is an essential industry in the agricultural sector, providing a


nutritious source of food and income for farmers. However, mushroom cultivation can
be a complex process that requires careful monitoring and control of the cultivation
environment. Any changes in temperature, humidity, light, or air quality can impact
the growth and yield of the mushrooms. Traditional methods of monitoring and
control can be time-consuming and labor-intensive and can also be prone to errors
[2].

A. Kathiria · P. Barot · M. Paliwal (B) · A. Shastri


School of Technology, Pandit Deendayal Energy University, Gandhinagar, India
e-mail: paliwalmanish1@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 299
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_26
300 A. Kathiria et al.

The emergence of the Internet of Things (IoT) has revolutionized how we approach
agriculture. IoT technology involves using sensors, actuators, and other devices to
collect and process data from the environment in real time. IoT can be applied to
various areas of agriculture, including crop monitoring, irrigation, and pest manage-
ment. In the case of mushroom cultivation, IoT can be used to monitor and control
the cultivation environment with greater accuracy and efficiency than traditional
methods.
IoT-based mushroom cultivation involves installing sensors and other IoT devices
throughout the cultivation facility to collect data on various environmental factors,
such as temperature and humidity. This data is then processed and analyzed in real
time, allowing farmers to make informed decisions about the cultivation process.
For example, if the temperature in the cultivation facility rises above a certain
threshold, the IoT system can automatically activate a cooling system to
return the temperature to the optimal range. This saves time and labor and improves
the overall quality and yield of the mushrooms [8].
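The threshold behaviour described above can be sketched as a simple hysteresis controller. The class name and set points below are illustrative assumptions, not code from a deployed system; on real hardware `update` would be fed by the temperature sensor driver and its return value would switch a cooling relay:

```python
class CoolingController:
    """Switch a cooler on above `high` and off below `low` (degrees C).
    The dead band between the two set points provides hysteresis."""

    def __init__(self, low=22.0, high=25.0):
        self.low = low        # illustrative set points
        self.high = high
        self.cooler_on = False

    def update(self, temperature):
        """Feed in the latest sensor reading; returns the cooler state."""
        if temperature > self.high:
            self.cooler_on = True
        elif temperature < self.low:
            self.cooler_on = False
        return self.cooler_on
```

The dead band between `low` and `high` prevents the cooler from rapidly cycling on and off when the reading hovers near a single threshold.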
In an agile environment, IoT-based mushroom cultivation can provide farmers
with the flexibility to adapt quickly to changes in the cultivation environment. An
agile environment involves using iterative and incremental processes to respond to
changes in requirements and improve the overall quality of the product. With IoT-
based mushroom cultivation, farmers can respond quickly to changes in environ-
mental factors and make adjustments to the cultivation process as needed. However,
the implementation of IoT-based mushroom cultivation in an agile environment can
also pose some challenges. For example, the cost of installing and maintaining IoT
devices can be high, and there may be a learning curve for farmers to understand and
use the technology effectively [1].
In this paper, we will explore the benefits of IoT-based mushroom cultivation in
an agile environment and analyze the challenges associated with its implementation.
The proposed approach differs from existing approaches in several respects: it is
environmentally friendly and less complex, it uses biowaste as a cultivation
substrate, which previous approaches did not, and its low-cost design makes it
practical for small organizations and home use. We will also provide a case study
of a successful implementation of IoT-based mushroom cultivation in an agile
environment and discuss the implications of our findings for future research and
development in this field.

2 Related Work

Kassim et al. review the latest research on the indoor environment of mushroom
factories and its impact on the growth and quality of mushrooms. The success of mushroom
cultivation heavily depends on indoor conditions, including temperature, humidity,
airflow, lighting, and air quality. For instance, studies have demonstrated that a
temperature range of 20–25 °C and humidity levels of 85–95% are ideal for the growth
of oyster mushrooms. Moreover, poor air quality due to high levels of carbon dioxide
IoT-Assisted Mushroom Cultivation in Agile Environment 301

and volatile organic compounds can decrease mushroom yields and cause contam-
ination. To ensure optimal mushroom growth and quality, it is crucial to carefully
manage and control these factors. Nonetheless, more research is needed to develop
improved strategies for optimizing the indoor environment of mushroom factories
[5].
Balan et al. discuss the challenges and opportunities of producing high-quality
edible mushrooms from lignocellulosic biomass in small-scale settings. The authors
highlight the potential of using agricultural and forestry waste as a substrate for
mushroom cultivation, which can offer both economic and environmental benefits.
However, the utilization of these materials can be hindered by their variable and
complex nature, which requires careful selection, processing, and supplementation
with nutrients. The authors also address the challenges related to the management
of fungal contamination, temperature, humidity, and other environmental factors
during mushroom cultivation. Strategies for optimizing these factors, such as the
use of automated systems and advanced monitoring techniques, are also discussed.
Overall, the paper provides valuable insights into the opportunities and challenges of
small-scale mushroom production from lignocellulosic biomass and highlights the
need for further research in this area [3].
Singh and Sushma focus on the application of Internet of Things (IoT) technology
in mushroom cultivation. The authors discuss how IoT can help in automating and
optimizing various aspects of mushroom cultivation, such as temperature, humidity,
lighting, and air quality. The paper also highlights the benefits of using IoT in mush-
room cultivation, such as reduced labor and energy costs, increased yield and quality,
and improved traceability and transparency in the supply chain. The authors discuss
the various sensors and devices that can be used in IoT-based mushroom cultivation
systems and provide examples of existing IoT solutions for mushroom cultivation.
Overall, the paper provides valuable insights into the potential of IoT in enhancing
the efficiency and productivity of mushroom cultivation and highlights the need for
further research in this area [7].
Rahman et al. discuss the development of an IoT-enabled mushroom farm automa-
tion system with machine learning to classify toxic mushrooms in Bangladesh. The
authors highlight the significance of mushroom cultivation in Bangladesh, but also
the potential dangers posed by the presence of toxic mushroom species. To address
this issue, the authors propose an automated system that can detect and classify
toxic mushrooms in real time using machine learning algorithms. The study also
discusses the potential benefits of using IoT technologies in mushroom cultivation,
such as real-time monitoring and control of environmental factors like temperature,
humidity, and CO2 levels. Overall, the paper provides a valuable contribution to the
growing field of IoT and machine learning in agriculture, particularly in the context
of mushroom cultivation in Bangladesh [6].
Islam et al. present an automated monitoring and environmental control system
for laboratory-scale cultivation of oyster mushrooms using the Internet of Agricul-
tural Things (IoAT). The study highlights the importance of environmental control in
mushroom cultivation and the potential benefits of implementing an automated sys-
tem. The authors review existing literature on mushroom cultivation and automation
302 A. Kathiria et al.

systems and emphasize the need for a cost-effective and efficient approach to mush-
room cultivation. The paper describes the design and implementation of the proposed
system, which involves monitoring and controlling the temperature, humidity, light,
and carbon dioxide levels using sensors and actuators connected to an IoT platform.
The authors also evaluate the performance of the system in terms of mushroom
growth and compare it with manual cultivation. The study concludes that the auto-
mated system significantly improves the growth rate and yield of oyster mushrooms
compared to manual cultivation, indicating the potential benefits of IoAT-based sys-
tems in mushroom cultivation. However, the authors also acknowledge the need for
further research to optimize the system and evaluate its economic feasibility for
commercial-scale production [4].
Subedi et al. present an Internet of Things (IoT)-based monitoring system for
white button mushroom farming. The authors discuss the importance of monitoring
and controlling the environmental conditions for optimal growth and yield of mush-
rooms. They review existing literature on IoT-based systems for mushroom farming
and identify the need for a more cost-effective and user-friendly system. The authors
describe their proposed monitoring system which uses sensors to measure tempera-
ture, humidity, carbon dioxide, and light intensity in real time. The data from these
sensors is transmitted to a central server where it is processed and analyzed. The
authors report the successful implementation of the system and conclude that it has
the potential to improve mushroom farming practices and increase yields [9].
Considering the above related work, the proposed approach offers a more cost-effective and environment-friendly mushroom cultivation design. It also supports cloud-based data storage and computation so that precautionary measures can be taken in case of quality defects. The study presents an IoT-based system in which a better yield of mushrooms can be produced at a meager cost. It will subsequently help achieve mushroom yields that were previously restricted by demographic features.

3 Proposed System

This section provides the overall working and basic principle of the proposed model. The methodology of the proposed solution is presented as a flowchart in
Fig. 1.
The flowchart describes creating a grow tent for cultivating mushrooms and moni-
toring their growth using various sensors, a Raspberry Pi, and an Asus Tinker Board.

– The process starts with creating the grow tent and installing the humidifier to
provide the ideal environment for mushroom growth. Once the tent is set up,
supports are placed inside the tent where the mushroom saplings will be kept.
– Next, various sensors are installed inside the grow tent to monitor the temperature,
humidity, and other environmental conditions necessary for the growth of the
mushroom. The fans are also attached to the tent for proper ventilation.

Fig. 1 Working principle of IoT-assisted mushroom cultivation

– The sensors are connected to the Raspberry Pi and Asus Tinker Board through an
application that allows real-time monitoring of the conditions inside the tent. A
Pi Camera is also connected to the system to record the growth of the mushroom
over time.
– Once the mushroom is fully grown, the sensors stop collecting data, and the process
is stopped. The application is used to monitor the changes in temperature and
humidity and allows the user to set a threshold for these parameters.
– All the data acquired from the sensors is stored in the application and can be
accessed by the user. This allows the user to monitor the growth of the mushroom
and make adjustments to the environment if necessary.
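The monitoring-and-adjustment loop described above can be reduced to a simple control rule. The sketch below is illustrative only: the function name and the default thresholds (borrowed from the 20–25 °C and 85–95% ranges cited in the related work) are our assumptions, not the exact logic of the deployed application.

```python
# Illustrative control rule for the grow-tent loop described above.
# Default thresholds follow the 20-25 C / 85-95% ranges cited in the
# related work; in the real application the user sets the thresholds.

def control_actions(temperature_c, humidity_pct,
                    temp_range=(20.0, 25.0), humidity_range=(85.0, 95.0)):
    """Map one sensor reading to actuator on/off commands."""
    actions = {"fan": False, "heater": False, "humidifier": False}
    if temperature_c > temp_range[1]:
        actions["fan"] = True          # vent excess heat
    elif temperature_c < temp_range[0]:
        actions["heater"] = True       # warm the tent
    if humidity_pct < humidity_range[0]:
        actions["humidifier"] = True   # restore moisture
    return actions

print(control_actions(27.0, 80.0))
# {'fan': True, 'heater': False, 'humidifier': True}
```

In the proposed system a decision of this kind would run on the Raspberry Pi, which then drives the fans and humidifier.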

Overall, this flowchart outlines a process for creating a controlled environment for mushroom growth and monitoring the process using technology.

4 Methodology

The methodology for implementing IoT-assisted mushroom cultivation in an agile environment can be divided into the following steps:

– Define project scope and objectives: The first step is to define the goals and objec-
tives of the IoT system. This involves identifying the specific environmental param-
eters to be monitored and controlled, such as temperature and humidity, and deter-
mining the desired outcomes of the cultivation process, such as improved yield,
quality, and efficiency. Defining the project scope and objectives will provide a
clear direction for the implementation process.
– Design IoT system: The next step is to design the IoT system. This involves
developing a system architecture that outlines the sensors, actuators, and other
components required to monitor and control the cultivation environment. Choosing
appropriate IoT platforms and technologies to collect and analyze data is essential.
The design phase should result in a detailed plan for how the system will operate
and be integrated with existing cultivation practices.

– Install sensors and other equipment: Once the IoT system is designed, the next
step is to install the required sensors and equipment in the cultivation environment.
This may involve placing temperature and humidity sensors throughout the growing area and adding equipment such as fans or heaters to control the environment. Ensuring that the sensors and equipment are positioned correctly to capture accurate data is essential. Figure 2 shows the IoT system architecture consisting of these sensor components.
– Configure IoT platform: After the sensors and equipment are installed, the IoT
platform must be configured to receive data from the sensors and control the
actuators. This may involve setting up a cloud-based IoT platform to collect and
analyze data and configuring the software to control equipment such as fans and
heaters. It’s important to ensure that the system is scalable and can handle a large
amount of data.
– Test and validate system: Once the IoT platform is configured, it’s important to
conduct tests to ensure the system functions as expected. This may involve running
tests to validate data accuracy and system reliability. The testing phase should also
include validating that the system meets the defined project scope and objectives.
– Monitor and optimize the system: After the system is operational, it’s important
to continuously monitor the system to identify and resolve any issues that arise.
This may involve using data analysis tools to identify areas where improvements
can be made and optimizing the system to improve yield, quality, and efficiency.
– Integrate with agile practices: Finally, the implementation of IoT-assisted mush-
room cultivation should be integrated with agile practices, such as iterative devel-
opment and continuous improvement. This involves adapting and improving the
system over time to ensure that it remains responsive to changing conditions and
that the desired outcomes are achieved.
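The scope-definition and platform-configuration steps above can be sketched as a single configuration object that the platform validates incoming sensor readings against. The parameter names, ranges, and helper function here are illustrative assumptions, not the actual platform configuration:

```python
import json

# Hypothetical scope definition: which parameters are monitored and the
# target ranges agreed in the project-scope step (example values).
CONFIG = {
    "temperature_c": {"min": 20.0, "max": 25.0},
    "humidity_pct": {"min": 85.0, "max": 95.0},
}

def out_of_range(reading):
    """Return the monitored parameters whose values violate CONFIG."""
    return [name for name, value in reading.items()
            if name in CONFIG
            and not CONFIG[name]["min"] <= value <= CONFIG[name]["max"]]

reading = {"temperature_c": 26.5, "humidity_pct": 90.0}
print(out_of_range(reading))   # parameters needing actuator action
print(json.dumps(reading))     # payload shape sent to the cloud platform
```

Validating readings against one shared configuration keeps the monitoring, control, and testing steps consistent as the scope evolves.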

5 Results and Discussions

The results of IoT-assisted mushroom cultivation in an agile environment are very positive. Figure 3 shows the physical structure of the proposed design. By using IoT devices, sensors, and a cloud-based platform, farmers can monitor and control various parameters critical to mushroom growth, such as temperature and humidity, in real time. This can lead to improved yields, quality, and sustainability of the cultivation process.
The results obtained also benefit in:

– Improved crop yields: With IoT devices and sensors, farmers can monitor and
control the cultivation environment in real time, ensuring that the temperature,
humidity, CO2 levels, and light intensity are optimized for mushroom growth.
This can lead to improved yields and a more consistent supply of high-quality
mushrooms.

Fig. 2 Sensors and humidifier setup

Fig. 3 Grow tent with support structures to place the mushroom bags and sensors

– Consistent quality: With better control over the cultivation environment, farm-
ers can produce mushrooms with consistent quality and characteristics, making
them more marketable. This can lead to increased revenue and improved customer
satisfaction.
– Reduced labor costs: IoT devices and sensors can automate many tasks involved
in mushroom cultivation, such as adjusting the temperature and humidity levels,
reducing the need for manual labor. This can lead to significant cost savings for
farmers, especially in large-scale operations.
– Reduced environmental impact: By optimizing the growing conditions, farmers
can reduce the use of resources such as water and energy and minimize waste and
pollution. This can lead to a more sustainable cultivation process and a reduced
environmental footprint.
– Better decision-making: With access to real-time data and analytics, farmers can
make informed decisions and take timely actions to optimize the cultivation pro-
cess. For example, if the sensors detect a drop in humidity levels, the system
can automatically activate the humidifier to maintain the optimal conditions for
mushroom growth.
– Enhanced monitoring and control: IoT-assisted mushroom cultivation in an agile
environment enables farmers to monitor and control the cultivation process
remotely using a mobile application or web interface. This provides greater flexi-
bility and convenience and enables farmers to respond quickly to any changes in
the cultivation environment.
– Improved traceability and accountability: By collecting and storing data in the
cloud, IoT-assisted mushroom cultivation in an agile environment enables farmers
to track the cultivation process from start to finish, providing greater traceability
and accountability. This can be important for regulatory compliance and quality
control.

Figure 4 shows the mushrooms cultivated using the proposed IoT system. Additionally, the data captured by the system is periodically uploaded to the cloud, as summarized in Fig. 5. Overall, the benefits of IoT-assisted mushroom cultivation in
an agile environment can include improved crop yields, consistent quality, reduced
labor costs, reduced environmental impact, better decision-making, enhanced mon-
itoring and control, and improved traceability and accountability. By leveraging the
power of IoT devices and cloud-based platforms, farmers can optimize the cultivation
process and achieve better results.
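The periodic cloud update summarized in Fig. 5 amounts to appending timestamped samples to a log. Below is a minimal local sketch, with an in-memory CSV standing in for the actual cloud service (that substitution is an assumption on our part):

```python
import csv
import io
from datetime import datetime, timezone

# Append timestamped humidity/temperature samples to a CSV log; in the
# deployed system the log would live on the cloud platform instead.
def log_reading(writer, temperature_c, humidity_pct):
    writer.writerow([datetime.now(timezone.utc).isoformat(),
                     temperature_c, humidity_pct])

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["timestamp_utc", "temperature_c", "humidity_pct"])
log_reading(writer, 23.1, 88.4)
log_reading(writer, 23.4, 87.9)
print(buf.getvalue().splitlines()[0])  # header row
```

Logging every sample with a UTC timestamp is what makes the traceability and after-the-fact quality analysis described above possible.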

6 Conclusions

IoT-assisted mushroom cultivation in an agile environment can revolutionize how mushrooms are grown and harvested. By leveraging the power of IoT devices, sensors, and cloud-based platforms, farmers can monitor and control various parameters

Fig. 4 Cultivated mushroom

Fig. 5 Humidity and temperature data collected

critical to mushroom growth in real time, improving the cultivation process’s yields,
quality, and sustainability.
The benefits of IoT-assisted mushroom cultivation in an agile environment include
improved crop yields, consistent quality, reduced labor costs, reduced environmen-
tal impact, better decision-making, enhanced monitoring and control, and improved
traceability and accountability. These benefits can significantly impact the profitabil-
ity and sustainability of mushroom cultivation and the overall quality and availability
of mushrooms in the market.

While some challenges are associated with implementing IoT-assisted mushroom cultivation in an agile environment, such as the initial cost of IoT devices and sensors, the benefits can far outweigh the costs in the long run. Further research and
development in this field can help overcome these challenges and further optimize
the cultivation process. Overall, IoT-assisted mushroom cultivation in an agile envi-
ronment is a promising research area and can potentially transform the mushroom
cultivation industry.

Acknowledgements This work is part of the B.Tech. major project. All the images are the outcome
of the proposed system. The author would like to acknowledge the support from Pandit Deendayal
Energy University, Gandhinagar, India.

References

1. Aggarwal N, Singh D (2022) A review on usage of internet of things (IoT) technologies in mushroom cultivation. ECS Trans 107(1):9739
2. Assemie A, Abaya G (2022) The effect of edible mushroom on health and their biochemistry.
Int J Microbiol 2022
3. Balan V, Zhu W, Krishnamoorthy H, Benhaddou D, Mowrer J, Husain H, Eskandari A (2022)
Challenges and opportunities in producing high-quality edible mushrooms from lignocellulosic
biomass in a small scale. Appl Microbiol Biotechnol 106(4):1355–1374
4. Islam MA, Islam MA, Miah MSU, Bhowmik A (2022) An automated monitoring and environ-
mental control system for laboratory-scale cultivation of oyster mushrooms using the internet
of agricultural thing (IoAT). In: Proceedings of the 2nd international conference on computing
advancements, pp 207–212
5. Kassim MRM, Mat I, Yusoff IM (2019) Applications of internet of things in mushroom farm
management. In: 2019 13th international conference on sensing technology (ICST). IEEE, pp
1–6
6. Rahman H, Faruq MO, Hai TBA, Rahman W, Hossain MM, Hasan M, Islam S, Moinuddin M,
Islam MT, Azad MM (2022) IoT enabled mushroom farm automation with machine learning
to classify toxic mushrooms in Bangladesh. J Agric Food Res 7:100267
7. Singh S, Sushma S (2020) Smart mushroom cultivation using IoT. Int J Eng Res Technol
(IJERT) 8(13):65–69
8. Sinha BB, Dhanalakshmi R (2022) Recent advancements and challenges of internet of things
in smart agriculture: a survey. Future Gener Comput Syst 126:169–184
9. Subedi A, Luitel A, Baskota M, Acharya TD (2019) IoT based monitoring system for white
button mushroom farming. In: Proceedings, vol 42. MDPI, p 46
A Comprehensive Survey of Machine
Learning Techniques for Brain Tumor
Detection

Mriga Jain, Brajesh Kumar Singh, and Mohan Lal Kolhe

Abstract Cells in the brain that grow too quickly and uncontrollably can lead to a brain tumor. Early detection and treatment are crucial. However, accurately segmenting and categorizing tumors remains a daunting challenge despite numerous research efforts. CNNs, a well-liked deep learning (DL) model, may
nevertheless be constrained by major input variances or domain shifts. Pre-trained
transfer-based learning models are becoming more and more common as a solution
to this issue. This survey aims to present a thorough overview of shift from state-of-
the-art CNN-based models to transfer learning-based techniques, to automate brain
tumor detection. This review discusses the reason of shift along with architecture of
brain tumor, publicly accessible datasets, segmentation, feature extraction, classifi-
cation, and cutting-edge approaches for studying brain cancers including machine
learning, deep learning, and transfer learning. In addition, relevant issues and typical
difficulties have also been highlighted.

Keywords Brain tumor detection · CNN · Pre-trained models · Transfer learning

1 Introduction

Artificial intelligence (AI) has a massive effect on image processing. It enables the automation of image processing and improves image quality. Image identification, segmentation, enhancement, object detection, image restoration, and image regeneration are some of the key AI image processing operations.
Brain Tumor—A brain tumor is an abnormal cell growth inside the skull or brain.
Tumors can be primary (growing from the brain’s tissue itself) or secondary (originating

M. Jain (B) · B. K. Singh
R.B.S. Engineering Technical Campus, Bichpuri, Agra, India
e-mail: mj160391@gmail.com
M. L. Kolhe
Faculty of Engineering and Science, University of Agder, Grimstad, Norway
e-mail: mohan.l.kolhe@uia.no

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 309
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_27
310 M. Jain et al.

in another area of the body and extending to the brain). The kind, size, and location of
the tumor all influence the available treatments. Symptom relief or curative objectives
may be the focus of treatment. Many of the 120 different forms of brain tumor are
treatable [1].
Gliomas, or tumors, can be described as aggressive or slow-growing. A malignant (aggressive) tumor spreads from one place to another; in contrast, a benign (slow-growing) tumor does not penetrate the surrounding tissues. Brain tumors are
categorized into grades I through IV by the WHO. Tumors of grades I and II are
considered to grow slowly, but those of higher grades (III–IV) have a worse prognosis
[2].
The standard diagnostic technique relies on doctors visually analyzing MRI scans, which lengthens diagnosis because the images are very complex. Digital image processing, however, enables quick and accurate tumor detection and can be of great use to doctors, enabling faster and easier diagnosis of the presence of a tumor [3].
To visually assess the patient, radiologists use MRI and CT scans. They examine
the structure of the brain, and MRI images show the tumor size and location [4].
The following imaging methods are available for recognizing and categorizing brain tumors. Magnetic resonance imaging (MRI) is the most important method for detecting brain tumors: it can show the location, size, and form of a malignancy and provides comprehensive images of the brain. Computed tomography (CT) scans can also show the brain in great detail; they frequently work in conjunction with MRI to give a more thorough picture of the brain and to aid in diagnosis.
A PET scan is an imaging test that shows how the body’s cells are functioning by using a small quantity of radioactive material, which can also help with detection.
As the number of patients has multiplied, manually interpreting these images has become time-consuming, error-prone, and less precise. A computer-aided diagnostic technique that lowers the cost of brain MRI identification must be devised to ease this restriction. A number of attempts have been made to create an effective and trustworthy method for automatically classifying brain tumors [5].

2 Concept of Pixel and Voxel

Both the voxel and the pixel are units of measurement used in image processing.
The smallest component of a digital image is referred to as a pixel (short for
“picture element”). It is a solitary point in the picture that carries details about
its color, brightness, and position. When detecting brain tumor, medical imaging
techniques like X-rays, CT scans, and MRI scans employ pixels to represent the
various grayscale levels.
In three-dimensional (3D) space, a voxel, which is short for “volume element,” is
the equivalent of a pixel. It represents a single point in the 3D picture and contains
information about its location in space as well as its attributes such as intensity and
A Comprehensive Survey of Machine Learning Techniques for Brain … 311

Fig. 1 Pixel and voxel in brain scan by Despotović et al. [6]

texture. Voxels are utilized in the identification of brain tumors to depict the tissue density in 3D space and to offer a more precise estimate of the tumor size and location.
As illustrated in Fig. 1, voxel and pixel are both significant units of measurement
in the processing of pictures for brain tumor identification. Voxel is used to represent
3D images, while pixels are used to represent 2D images in medical imaging [6].
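The distinction can be made concrete with plain arrays: a 2D grid of intensity values for pixels versus a stack of such grids for voxels. The toy dimensions below are illustrative only:

```python
# A 2D image: rows x cols of grayscale intensities, one value per pixel.
image = [[0, 128],
         [255, 64]]                      # 2 x 2 pixels

# A 3D volume: slices x rows x cols, one value per voxel (e.g. an MRI stack).
volume = [image,
          [[10, 20], [30, 40]]]          # 2 x 2 x 2 voxels

n_pixels = len(image) * len(image[0])
n_voxels = len(volume) * len(volume[0]) * len(volume[0][0])
print(n_pixels, n_voxels)  # 4 8
```

A real MRI volume differs only in scale (for example, hundreds of slices of 256 x 256 voxels), not in structure.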

3 Materials and Methods

The articles included in this evaluation were released between 2017 and 2023; a few studies released prior to 2017 are also explored. The main emphasis is on articles that developed classification and segmentation methods for brain tumors utilizing ML, CNNs, and pre-trained transfer learning models such as ResNet and VGG16. To discover pertinent papers, searches were made in the scientific literature databases IEEE, PubMed, Google Scholar, Elsevier, and ScienceDirect. For journal papers, the Multidisciplinary Digital Publishing Institute (MDPI) web database was also explored. Our searches included the terms “brain tumor,” “segmentation,” “classification,” “DL,” and “transfer learning.” Additionally, a collection of phrases related to DL brain tumor segmentation and classification, such as traditional machine learning, convolutional neural networks, capsule networks, and transformers, was combined with the aforementioned search terms.

4 Dataset

There are a number of datasets containing both non-tumor and tumor brain images that are readily accessible to the public and researchers. In this section, some noteworthy, complex, and widely used datasets are presented.

1. BraTS (The Brain Tumor Segmentation)—In all of its research, BraTS has
emphasized the assessment of cutting-edge techniques for the segmentation
of brain tumor in multimodal magnetic resonance imaging (MRI) data. Since
its establishment, BraTS has concentrated on serving as a standard bench-
marking environment for brain glioma segmentation algorithms, employing care-
fully curated multi-institutional multi-parametric magnetic resonance imaging
(mp-MRI) data [7].
2. The Medical Segmentation Decathlon (MSD) dataset—This dataset contains
MRI scans of brain tumor and is part of the Medical Segmentation Decathlon
challenge organized by Facebook AI. It contains mp-MRI images containing 750
4D volumes (484 testing and 266 validation) [8].
3. TCIA dataset—Publicly available medical imaging database of cancer cases.
It was developed by the National Cancer Institute (NCI) and maintained by the
Cancer Imaging Program at the Department of Radiology and Imaging Sciences
of the National Institutes of Health Clinical Center. A sizable number of medical
pictures, including X-rays, CT scans, MRIs, and PET scans, are available in the
database. These photos come from a variety of places, including clinical trials,
academic institutions, and lone researchers. A computer-aided diagnostic and
treatment planning system is being developed, and this database is intended to
facilitate that process.
The exact number of images will depend on the specific collection within the
TCIA that you are accessing [9].
4. Harvard Medical School website [10]—Datasets are also available on online
repositories such as Kaggle, GitHub, and Figshare, with additional resources and
information for research purposes.

5 Basic Process/Pipeline of Brain Tumor Detection

Like other machine learning models, image processing models follow the approach depicted in Fig. 2. As with any other model, the pipeline begins with data collection followed by pre-processing. The main techniques used in the process pipeline are:
1. Image pre-processing—The pre-processing of the photos is completed before
incorporating them into the suggested framework. Image pre-processing is a
crucial stage in the image processing pipeline and helps to increase the precision and effectiveness of image processing algorithms [11].
The removal of non-brain tissue (commonly referred to as brain extraction), image registration (in the case of multimodal image processing), and MRI bias field correction are the most essential steps. Additionally, pre-processing shortens model training time and quickens model inference: if the images are extremely large, reducing the input image size will substantially reduce training time without affecting model performance.

Fig. 2 Basic process of brain tumor detection

2. Data distribution—The distribution of data is another crucial factor in image classification. The entire dataset must be split into training, testing, and validation sets, with the training set having the greatest share (typically more than 70%). By dividing the data into training, testing, and validation sets, we can train a model to learn the underlying patterns in the data, assess how well it performs on unseen data, and avoid overfitting to the training data [12].
3. Segmentation is done to separate a picture into a group of semantically signifi-
cant, homogenous, and nonoverlapping sections with comparable qualities, such
as intensity, depth, color, or texture. The segmentation output is either a collec-
tion of contours that define the region boundaries or an image of labels defining
each homogenous section. With image segmentation techniques, a certain collec-
tion of pixels from the image may be divided and grouped. Here, labeling the
pixels, and those pixels that have a label belong to a category where they share
one or more characteristics. These labels allow us to define borders, draw lines,
and distinguish between the most necessary objects in a picture and the rest
of the less crucial ones. Some of the main segmentation methods are widely
used [3]. Thresholding works by setting this value to distinguish the tumor from
the surrounding healthy tissue is known as “thresholding.” For instance, tumor
can be identified by pixels with intensity values over a predetermined threshold,
whereas healthy tissue can be identified by pixels with intensity values below the
threshold. Region growing technique lets you begin with a seed pixel, voxel, or
an area and gradually add surrounding pixels that satisfy specific criteria, such
as having comparable gradients or intensity levels. This is useful for segmenting
tumors with well-defined boundaries. The watershed method relies on the notion of flooding a surface from its lowest points to produce distinct basins that correspond to various features or structures. This can be helpful for segmenting tumors with

diffuse or irregular boundaries. CNNs (convolutional neural networks) are deep learning models used in computer vision and image processing tasks; classification and segmentation are the two main tasks for which they are used [13]. As already discussed, the process of breaking a picture into
regions or segments, each segment representing a separate item or component
of an object, is known as segmentation. CNNs can be used for segmentation by
learning to map image pixels to their corresponding object labels. On the other
hand, classification refers to the task of assigning a label or category to an entire
image or a specific region within an image. CNNs are frequently used for image
classification because they can be trained to extract pertinent information from
a picture and then utilize those features to forecast the class label.
4. Feature extraction and feature optimization—Dimensionality reduction and avoiding overfitting are among the most crucial elements of building machine learning models. Many image processing and computer vision problems involve two related but separate procedures: segmentation and feature extraction. Segmentation divides an image into meaningful regions or segments, while feature extraction extracts relevant information or features from those segments [14].
5. Classification—The process of classifying an image into the tumor or non-tumor
categories is known as classification. This is usually accomplished through the
use of a machine learning algorithm trained on a collection of labeled brain imaging data. Decision trees, random forests, support vector machines (SVMs), logistic regression, Naive Bayes, K-nearest neighbors (KNN), and artificial neural networks (ANNs) are supervised classifiers, while fuzzy C-means (FCM), hidden Markov random field (HMRF), self-organizing map (SOM), and Stacked Sparse Autoencoder (SSAE) are unsupervised classifiers [2]. A summary of existing approaches, mainly using deep learning (DL) CNN-based classifiers, is given in Table 1.
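Of the segmentation methods listed above, thresholding is the simplest to illustrate. The toy sketch below labels pixels above an arbitrary example threshold as candidate tumor; real pipelines would choose the threshold from the image histogram rather than fix it by hand:

```python
# Toy intensity thresholding: label pixels above the threshold as
# candidate tumor (1) and everything else as background (0).
def threshold_segment(image, threshold):
    return [[1 if px > threshold else 0 for px in row] for row in image]

scan = [[10, 200, 190],
        [30, 210, 40],
        [20, 50, 60]]
print(threshold_segment(scan, 150))
# [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
```

The resulting binary mask is the labeled output that later pipeline stages (feature extraction and classification) consume.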
If you have access to a substantial, narrowly focused dataset that is pertinent to
the subject you’re working on, training a CNN from scratch can produce superior
results. Additionally, it is possible to integrate ensemble approaches, although trade-
offs between performance improvement and resource requirements must be taken
into account.
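As a concrete illustration of the ensemble idea, majority voting over the label predictions of several base classifiers takes only a few lines; the classifier outputs below are made-up placeholders.

```python
import numpy as np

def majority_vote(*model_predictions):
    """Combine integer label predictions from several classifiers
    by per-sample majority vote."""
    stacked = np.stack(model_predictions)            # (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in stacked.T])

# Hypothetical predictions from three base models on four images
svm_preds = np.array([1, 0, 1, 1])
knn_preds = np.array([1, 1, 1, 0])
tree_preds = np.array([0, 0, 1, 1])
print(majority_vote(svm_preds, knn_preds, tree_preds))  # [1 0 1 1]
```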

6 Transfer Learning Approach

There has been a shift toward newer approaches for tumor detection, such as transfer
learning. Table 2 lists a summary of transfer learning methods. On a private
glioma MRI dataset made up of 113 LGG and HGG patients, Yang et al. evaluated
whether pre-trained CNNs outperform CNNs trained from scratch in classification tasks.
The trials demonstrated that transfer learning and fine-tuning enhanced classification
Table 1 Summary of existing classification approaches

References | Year | Authors | Techniques | Dataset | Performance
[15] | 2017 | Cabria and Gondra | Potential field clustering | BraTS | SD = 0.283, average = 0.517, median = 0.644
[16] | 2018 | Seetha and Raja | CNN | BraTS | Accuracy of 97.5%
[17] | 2018 | Mohsen et al. | Classification using DNN, along with segmentation using fuzzy C-means, feature extraction using DWT, and PCA for reduction | Harvard Medical School | Accuracy with DNN of 96.97%, with KNN of 95.45%
[18] | 2019 | Hossain et al. | CNN | BraTS | Accuracy of 97.87%
[19] | 2020 | Toğaçar et al. | CNN-based BrainMRNet model | Local data | Accuracy of 96.05%
[20] | 2021 | Deepak and Ameer | Proposed a hybrid technique using CNN-based features and SVM | Figshare | Accuracy of 95.82%
[21] | 2021 | Karayegen and Aksahin | CNN | BraTS | Mean prediction ratio of 91.718, accuracy of 95.7
[22] | 2022 | Jena et al. | Fuzzy C-means (FCM), K-means, and a hybrid image segmentation algorithm with support vector machines (SVMs), K-nearest neighbors (KNNs), binary decision trees (BDTs), random forest (RF), and ensemble methods | BraTS 2017 and BraTS 2019 | Maximum accuracy of 97% with ensemble methods
[23] | 2022 | Corral et al. | Decision tree, random forest, AdaBoost, and support vector machine | Publicly available | 72.54%, 78.43%, 75.88%, 100% with decision tree, random forest, AdaBoost, and support vector machine, respectively
316 M. Jain et al.

performance for HGG and LGG. Using GoogLeNet, they were able to reach their
greatest test accuracy of 90% [24].
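The common transfer-learning recipe behind the methods in Table 2 is to keep a pre-trained backbone frozen and train only a small classification head on its features. The snippet below is a schematic NumPy stand-in, not the pipeline of any surveyed paper: random clusters simulate features from a frozen backbone, and a logistic-regression head is fit by gradient descent.

```python
import numpy as np

def train_linear_head(feats, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression head on frozen backbone features.

    feats: (n, d) features from the (simulated) frozen feature extractor;
    labels: (n,) binary labels in {0, 1}. Only w and b are trained,
    mimicking fine-tuning of the final layer."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))     # sigmoid
        grad_w = feats.T @ (p - labels) / len(labels)
        grad_b = (p - labels).mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(1)
# Hypothetical "backbone features": two separable clusters (tumor/non-tumor)
feats = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
labels = np.array([0] * 50 + [1] * 50)
w, b = train_linear_head(feats, labels)
acc = (((feats @ w + b) > 0).astype(int) == labels).mean()
print(acc)  # close to 1.0 on this separable toy data
```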

Table 2 Summary of transfer learning techniques

References | Year | Author | Method | Dataset | Result
[2] | 2019 | Amin et al. | AlexNet and GoogLeNet | BraTS and ISLES | ACC of 95%
[25] | 2019 | Banerjee et al. | VGG and ResNet | Local data | ACC of 97%
[26] | 2019 | Swati et al. | VGG 19 | Figshare | ACC of 94.82%
[27] | 2020 | Sadad et al. | ResNet50, InceptionV3, MobileNet-V2, NASNet, and DenseNet201 | Figshare | ACC of 92.9%, 92.8%, 91.8%, 99.6%, 93.1%, respectively
[28] | 2020 | Kaur and Gandhi | AlexNet, ResNet50, GoogLeNet, VGG-16, VGG-19, ResNet101, InceptionV3, and InceptionResNetV2 | Harvard, clinical, and Figshare | ACC of 100%, 94%, and 95.92% on the three datasets
[29] | 2020 | Pravitasari et al. | UNet-VGG16 | Local data | ACC up to 96%
[30] | 2020 | Noreen et al. | InceptionV3 and DensNet201 | Kaggle | ACC of 99.34% and 99.51%
[12] | 2020 | Rai and Chatterjee | CNN-based deep neural network named U-Net (LU-Net) model | Kaggle | ACC of 88%, 90%, and 98% for Le-Net, VGG-16, and LU-Net, respectively
[11] | 2020 | Mehrotra et al. | CNN-based AlexNet, GoogLeNet, SqueezeNet, ResNet50, and ResNet101 | TCIA | ACC of 99.04%
[31] | 2021 | Hao et al. | AlexNet | BraTS | 82% AUC
[32] | 2021 | Arbane et al. | ResNet, MobilNet-V2, and Xception | Local data | ACC of 98.24% and F1-score of 98.42%
[33] | 2023 | Zulfiqar et al. | EfficientNetB2 | Figshare | F1-score of 98.86%, 98.65%, 98.77%, and 98.71%, respectively
A Comprehensive Survey of Machine Learning Techniques for Brain … 317

7 Research Findings

A brain tumor expands quickly in size; consequently, making an early diagnosis of a
tumor is difficult. Several factors make brain tumor segmentation challenging. MRI
images vary due to fluctuations in the coil's magnetic field. Gliomas infiltrate
surrounding tissue because of their fuzzy boundaries, which makes segmenting them
more challenging. Since stroke lesions have uncertain borders, complicated forms,
and variations in intensity, segmenting these lesions is a particularly challenging
process. Another problematic step is optimal feature extraction: selecting the wrong
features leads to incorrectly classifying a brain tumor. After reviewing various
studies using different methods to detect brain tumors, among classical approaches
we can see CNN's popularity, as depicted in Fig. 3.
ResNet and VGG models are quite popular among pre-trained models, as depicted
in Fig. 4. Moreover, EfficientNet variants with smaller size and comparatively
higher Top-1 accuracy can be explored with open-source hyper-parameter optimization
frameworks such as Optuna to achieve better performance.

Fig. 3 Techniques popularity: number of times each technique (CNN, DNN, ensemble techniques/SVM/KNN/BDT/RF, others) was used in the reviewed studies

Fig. 4 Pre-trained models: number of times each pre-trained model was used in the reviewed studies

8 Conclusion

The appearance and the changing size, form, and structure of brain tumors still make
reliable tumor identification exceedingly difficult, although techniques for segmenting
tumors in MR images have demonstrated great promise for analysis and tumor detection.
Numerous improvements are still needed for the tumor area to be effectively segmented
and classified. The categorization of images and the identification of tumor-area
substructures remain problems and limitations of existing work.
Overall, this survey covers all pertinent issues and the most recent research,
along with its drawbacks and difficulties. It will help researchers learn how
to conduct new research quickly and appropriately.
Although deep learning techniques have made a substantial contribution, a general
methodology is still needed. These methods offer superior results when tested on
data acquired under conditions (intensity range and resolution) similar to those of
the training data. Nevertheless, a small difference between the training and testing
images significantly affects the robustness of the system.

References

1. Anusree P, Reshma R, Saritha K, Ani S (2020) MRI brain image segmentation using machine
learning techniques, vol 6, no 08
2. Amin J, Sharif M, Haldorai A, Yasmin M, Nayak RS (2021) Brain tumor detection and
classification using machine learning: a comprehensive survey. Complex Intell Syst 1–23
3. Kumar S, Dhir R, Chaurasia N (2021) Brain tumor detection analysis using CNN: a review. In:
2021 international conference on artificial intelligence and smart systems (ICAIS), Mar 2021.
IEEE, pp 1061–1067
4. Angulakshmi M, Lakshmi Priya GG (2017) Automated brain tumor segmentation techniques—
a review. Int J Imaging Syst Technol 27(1):66–77
5. Rahman T, Islam MS (2023) MRI brain tumor detection and classification using parallel deep
convolutional neural networks. Meas Sens 26:100694
6. Despotović I, Goossens B, Philips W (2015) MRI segmentation of the human brain: challenges,
methods, and applications. Comput Math Methods Med 2015
7. Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J et al (2014) The
multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging
34(10):1993–2024
8. Antonelli M, Reinke A, Bakas S, Farahani K, Landman BA, Litjens G et al (2021) The medical
segmentation decathlon, 10. arXiv preprint arXiv:2106.05735
9. Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P et al (2013) The cancer imaging
archive (TCIA): maintaining and operating a public information repository. J Digit Imaging
26:1045–1057
10. https://spl.harvard.edu/software-and-data-sets
11. Mehrotra R, Ansari MA, Agrawal R, Anand RS (2020) A transfer learning approach for AI-
based classification of brain tumor. Mach Learn Appl 2:100003
12. Rai HM, Chatterjee K (2020) Detection of brain abnormality by a novel Lu-Net deep neural
CNN model from MR images. Mach Learn Appl 2:100004
13. Ezhilarasi R, Varalakshmi P (2018) Tumor detection in the brain using faster R-CNN. In:
2018 2nd international conference on I-SMAC (IoT in social, mobile, analytics and cloud)
(I-SMAC), Aug 2018. IEEE, pp 388–392

14. Bahadure NB, Ray AK, Thethi HP (2017) Feature extraction and selection with optimization
technique for brain tumor detection from MR images. In: 2017 international conference on
computational intelligence in data science (ICCIDS), June 2017. IEEE, pp 1–7
15. Cabria I, Gondra I (2017) MRI segmentation fusion for brain tumor detection. Inf Fusion
36:1–9
16. Seetha J, Raja SS (2018) Brain tumor classification using convolutional neural networks.
Biomed Pharmacol J 11(3):1457
17. Mohsen H, El-Dahshan ESA, El-Horbaty ESM, Salem ABM (2018) Classification using deep
learning neural networks for brain tumor. Future Comput Inform J 3(1):68–71
18. Hossain T, Shishir FS, Ashraf M, Al Nasim MA, Shah FM (2019) Brain tumor detection using
convolutional neural network. In: 2019 1st international conference on advances in science,
engineering and robotics technology (ICASERT), May 2019. IEEE, pp 1–6
19. Toğaçar M, Ergen B, Cömert Z (2020) BrainMRNet: brain tumor detection using magnetic
resonance images with a novel convolutional neural network model. Med Hypotheses
134:109531
20. Deepak S, Ameer PM (2021) Automated categorization of brain tumor from MRI using CNN
features and SVM. J Ambient Intell Humaniz Comput 12:8357–8369
21. Karayegen G, Aksahin MF (2021) Brain tumor prediction on MR images with semantic segmen-
tation by using deep learning network and 3D imaging of tumor region. Biomed Signal Process
Control 66:102458
22. Jena B, Nayak GK, Saxena S (2022) An empirical study of different machine learning tech-
niques for brain tumor classification and subsequent segmentation using hybrid texture feature.
Mach Vis Appl 33(1):6
23. Corral H, Melchor J, Sotelo B, Vera J (2022) Detection of brain tumor using machine learning
algorithms. arXiv preprint arXiv:2201.04703
24. Yang Y, Yan LF, Zhang X, Han Y, Nan HY, Hu YC et al (2018) Glioma grading on conventional
MR images: a deep learning study with transfer learning. Front Neurosci 12:804
25. Banerjee S, Mitra S, Masulli F, Rovetta S (2019) Deep radiomics for brain tumor detection and
classification from multi-sequence MRI. arXiv preprint arXiv:1903.09240
26. Swati ZNK, Zhao Q, Kabir M, Ali F, Ali Z, Ahmed S, Lu J (2019) Brain tumor classification
for MR images using transfer learning and fine-tuning. Comput Med Imaging Graph 75:34–46
27. Sadad T, Rehman A, Munir A, Saba T, Tariq U, Ayesha N, Abbasi R (2021) Brain tumor
detection and multi-classification using advanced deep learning techniques. Microsc Res Tech
84(6):1296–1308
28. Kaur T, Gandhi TK (2020) Deep convolutional neural networks with transfer learning for
automated brain image classification. Mach Vis Appl 31(3):20
29. Pravitasari AA, Iriawan N, Almuhayar M, Azmi T, Irhamah I, Fithriasari K et al (2020) UNet-
VGG16 with transfer learning for MRI-based brain tumor segmentation. TELKOMNIKA
(Telecommun Comput Electron Control) 18(3):1310–1318
30. Noreen N, Palaniappan S, Qayyum A, Ahmad I, Imran M, Shoaib M (2020) A deep learning
model based on concatenation approach for the diagnosis of brain tumor. IEEE Access 8:55135–
55144
31. Hao R, Namdar K, Liu L, Khalvati F (2021) A transfer learning–based active learning
framework for brain tumor classification. Front Artif Intell 4:635766
32. Arbane M, Benlamri R, Brik Y, Djerioui M (2021) Transfer learning for automatic brain tumor
classification using MRI images. In: 2020 2nd international workshop on human-centric smart
environments for health and well-being (IHSH), Feb 2021. IEEE, pp 210–214
33. Zulfiqar F, Bajwa UI, Mehmood Y (2023) Multi-class classification of brain tumor types from
MR images using EfficientNets
Current Advances in Locality-Based
and Feature-Based Transformers:
A Review

Ankit Srivastava, Munesh Chandra, Ashim Saha, Sonam Saluja, and Deepshikha Bhati

Abstract The rise of deep learning and the transformative impact of transformers,
particularly in NLP and CV domains, have been remarkable. While transformers
have shown promise in medical imaging tasks, the original architecture required
enhancements to capture local information effectively. Recent research has focused
on developing locality-based and feature-based transformers to process spatial infor-
mation better and improve feature representation in input data. The main content of
this survey article includes (1) the paper covers the background of transformers,
(2) provides a theoretical review of the transformer and vision transformer, (3)
offers insights into different spatial- and feature-based vision transformers, and
(4) it also addresses common challenges and explores potential research directions,
underscoring the current research challenges in further improving their performance.

Keywords Transformers · Medical imaging · Classification · Survey · Vision


transformers · Convolutional neural networks · Attention mechanisms ·
Self-supervised learning

1 Introduction

Convolutional neural networks (CNNs) have had a revolutionary impact on computer
vision tasks. However, one major limitation is their heavy reliance on computationally
expensive convolutional layers, which also require large amounts of training data [1–3].
The convolution operator is the backbone of CNNs, working locally and offering

A. Srivastava (B) · M. Chandra · A. Saha · S. Saluja


Computer Science and Engineering Department, NIT Agartala, Agartala, India
e-mail: srivastava.chaman@gmail.com
A. Saha
e-mail: ashim.cse@nita.ac.in
D. Bhati
University of Kent, Kent, OH, USA
e-mail: dbhati@kent.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 321
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_28
322 A. Srivastava et al.

translational equivariance. These characteristics aid in creating effective and
generalizable medical imaging solutions. Researchers in the vision field have devoted signif-
icant research effort to incorporating attention mechanisms [4] into CNN-inspired
architectures [5]. Because of the restrictions of the local receptive field, a
convolution operation cannot capture long-range semantic information effectively.
The transformer architecture, which was initially developed for natural language
processing (NLP) applications [6, 7], has proven to be highly successful in a wide
range of computer vision (CV) tasks, including classification [8, 9], segmentation [10,
11], and object detection [12, 13]. Medical imaging applications have shown promise
in utilizing the transformer model because it captures contextual information and
long-term persistence. Transformers’ primary feature is a self-attention method that
allows a model to pick up on global instances and establish long-range dependencies.
With high accuracy in image classification, the transformer model has been used
to classify different medical image modalities, such as X-ray [14], MRI [15], and CT
[16], and to segment organs and lesions in medical images. In registration, the
transformer model has been used to align medical images from different modalities
or time points. In generation, it has synthesized medical images with various
properties, such as disease progression and treatment response.
Motivation and Contribution: This study is different and novel compared to similar
studies in several key aspects. Firstly, it comprehensively reviews vision trans-
formers’ architecture, training, and performance, providing a holistic understanding
of their capabilities in various domains. While some previous studies may have
focused on specific aspects of transformers, this article offers a more comprehensive
and in-depth analysis.
Secondly, this study emphasizes the recent developments in locality-based and
feature-based transformers to address the original architecture’s limitations in effec-
tively capturing local information. Exploring these advancements highlights cutting-
edge research in enhancing spatial processing and feature representation. This
differentiates it from earlier works that may have focused only on the traditional
transformer model.
Furthermore, the paper discusses specific challenges and open problems asso-
ciated with vision transformers, providing valuable insights into the field’s current
state. Discussing potential research directions offers a forward-looking perspective
on the future of vision transformers, making it distinct from studies that have only
focused on the past or existing applications.
Paper Organization: In Sect. 2, the background of transformers is introduced,
including a historical perspective to provide context for the readers. Section 3 delves
into the essential design features of the transformer model. Section 4 focuses on the
vision transformer, exploring its architecture and discussing different variants based
on spatial- and feature-based approaches.
Current Advances in Locality-Based and Feature-Based Transformers … 323

Moving forward, Sect. 5 presents a comprehensive summary of the key challenges
faced in this field and outlines various designs to address these challenges. Finally, in
Sect. 6, the article concludes by bringing together the main findings and highlighting
the significance of vision transformers in various domains.
Following this organization, the survey offers a well-structured and cohesive
exploration of vision transformers, covering their history, design features, vari-
ants, challenges, and potential solutions. This approach ensures that readers under-
stand the subject matter comprehensively and can identify future research directions
effectively.

2 Background

Transformers were initially created to address the limitations of RNNs and CNNs
in capturing long-term dependencies in sequential data, particularly in the context
of natural language processing. Their groundbreaking idea involves incorporating
attention mechanisms that enable the model to focus on specific sections of the input
sequence, facilitating more effective processing.

2.1 Attention in Transformers

In transformers, the self-attention mechanism comprises three primary components:
the query, key, and value matrices. A transformer model processes each position in
an input sequence to generate three matrices: a query matrix, a key matrix, and a
value matrix. The model computes the attention weights between the query and key
matrices, and then it weights the values by these attention weights and sums them to
generate the output (Fig. 1).

Fig. 1 Illustration of self-attention



2.2 Self-attention (SA)

The attention mechanism proposed in [17] uses learnable parameters W^q, W^k, and
W^v to convert the input X ∈ R^{n×c} into a query matrix Q, a key matrix K, and a
value matrix V, all of shape n × d:

Q = X W^q,  W^q ∈ R^{c×d},
K = X W^k,  W^k ∈ R^{c×d},        (1)
V = X W^v,  W^v ∈ R^{c×d}.

The equation system (1) shows the calculation of the query Q, key K, and value V
matrices by multiplying the input X with the weight matrices W^q, W^k, and W^v,
respectively.
A(Q, K) = Softmax(Q K^T / √d),        (2)

Z = SA(Q, K, V) = A(Q, K) × V,        (3)

where K is the embedding (key) matrix and Q is the look-up (query) vector.


Specifically, Z is the sum of the values V weighted by the attention distribution A,
which is calculated for all connected elements. This approach enables transformers
to handle long-range dependencies in NLP and computer vision.
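Equations (1)–(3) translate directly into NumPy; in the sketch below the projection matrices are random stand-ins for the learned parameters W^q, W^k, and W^v.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention, Eqs. (1)-(3).

    X: (n, c) input sequence; Wq/Wk/Wv: (c, d) projections.
    Returns Z of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # Eq. (1)
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n, n) scaled scores
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)                # row-wise softmax, Eq. (2)
    return A @ V                                      # Eq. (3)

rng = np.random.default_rng(0)
n, c, d = 5, 8, 4
X = rng.normal(size=(n, c))
Z = self_attention(X, *(rng.normal(size=(c, d)) for _ in range(3)))
print(Z.shape)  # (5, 4)
```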

2.3 Multi-head Self-attention (MSA)

MSA computes multiple attention weight matrices by splitting the queries, keys,
and values into h “heads” or parts, each calculated using a separate set of learned
projection matrices.
A learned weight matrix multiplies the concatenated attention weight matrices to
derive the final output of the MSA operation. The ability of transformer models to
capture intricate patterns and relationships in data has resulted in better performance
in NLP and CV tasks.
Z_i = SA(X W_i^q, X W_i^k, X W_i^v),
MSA(Q, K, V) = Concat[Z_1, …, Z_h] × W^o,        (4)

where h is the total number of heads and W^o ∈ R^{hd×c} is a linear projection matrix.
W_i^q, W_i^k, and W_i^v are specific to the ith attention head. The MSA partitions
the queries (Q), keys (K), and values (V) into multiple sub-spaces, allowing for the
calculation of similarities between context features.
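A minimal NumPy sketch of Eq. (4): each head applies its own projections, the head outputs are concatenated, and the result is projected by W^o. All weight matrices here are random stand-ins for learned parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, heads, Wo):
    """Eq. (4): one self-attention per head on its own projections,
    concatenate the h head outputs, then project with Wo.

    X: (n, c); heads: list of h triples (Wq, Wk, Wv), each (c, d);
    Wo: (h*d, c)."""
    Zs = []
    for Wq, Wk, Wv in heads:
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
        Zs.append(A @ V)                      # Z_i, shape (n, d)
    return np.concatenate(Zs, axis=-1) @ Wo   # (n, h*d) @ (h*d, c) -> (n, c)

rng = np.random.default_rng(0)
n, c, d, h = 6, 8, 4, 2
X = rng.normal(size=(n, c))
heads = [tuple(rng.normal(size=(c, d)) for _ in range(3)) for _ in range(h)]
Wo = rng.normal(size=(h * d, c))
out = multi_head_attention(X, heads, Wo)
print(out.shape)  # (6, 8)
```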

In conclusion, transformers’ ability to incorporate attention mechanisms has revo-


lutionized deep learning models, making them powerful tools for processing sequen-
tial data in various domains. This attention-based approach has paved the way for
significant advancements and research in NLP and CV.

3 Transformer

In 2017, Vaswani et al. introduced the transformer architecture [4] as a neural network
architecture primarily used for NLP tasks like text generation, sentiment analysis,
and language translation. Instead of recurrent or convolutional layers, the transformer
model uses self-attention in a sequence-to-sequence model to process sequential input
data. The self-attention mechanism in transformers assigns importance weights to
each token or word in a sequence, depending on its relationship with other tokens or
words in the sequence. Consequently, the model can capture long-range dependencies
and contextual information without requiring recurrent connections, as in traditional
recurrent neural networks.
Each layer of the transformer architecture’s encoder and decoder components
contains two sub-layers: a position-wise feed-forward network and a multi-head
self-attention mechanism. These sub-layers are identical in each layer of the stack.

3.1 Position-Wise Feed-Forward Network

The MSA output is passed through the position-wise feed-forward network, which
applies a two-layer feed-forward neural network with a ReLU activation function to
each position in the sequence independently. As a result, it is known as
“position-wise.”

FFN(x) = max(0, x W_1 + b_1) W_2 + b_2.        (5)

The position-wise feed-forward network involves a weight matrix W_1 ∈ R^{d×h},
a bias vector b_1, a ReLU activation function, another weight matrix W_2 ∈ R^{h×d},
and a bias vector b_2. The input sequence x is first multiplied by W_1 and added to
b_1, followed by the ReLU activation; the output is then multiplied by W_2 and
added to b_2 to obtain the final output of the network.
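Equation (5) in NumPy, applied row by row to a toy sequence; the weights are random stand-ins for the learned parameters.

```python
import numpy as np

def position_wise_ffn(x, W1, b1, W2, b2):
    """Eq. (5): FFN(x) = max(0, x W1 + b1) W2 + b2, with the same
    weights applied to every position (row) of x independently."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
n, d, h = 5, 8, 32          # sequence length, model dim, hidden dim
x = rng.normal(size=(n, d))
W1, b1 = rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = rng.normal(size=(h, d)), np.zeros(d)
out = position_wise_ffn(x, W1, b1, W2, b2)
print(out.shape)  # (5, 8): back to the model dimension
```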

3.2 Positional Encoding

This method is used in transformers to incorporate the order of sequence elements
into the model. As transformers lack the inherent sense of order and position that
recurrent neural networks possess, positional encoding provides this information to
the model. The positional encoding vector, a fixed value for each position, is added
to the input embeddings of each position in the sequence before they are fed into
the model. The formula for the positional encoding is
PE(pos, 2i) = sin(pos / 10000^{2i/d}),
PE(pos, 2i+1) = cos(pos / 10000^{2i/d}),        (6)

where PE(pos, 2i) and PE(pos, 2i+1) are the 2i-th and (2i+1)-th entries of the
positional encoding vector for position pos, d is the hidden dimension of the model,
and i ranges from 0 to ⌊d/2⌋ − 1.
The transformer model can differentiate between positions in the sequence and
include the element’s order in its computations by including positional encoding in
the input embeddings. This capability enables the model to acquire more intricate
and significant representations of the input sequence.
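Equation (6) can be computed for all positions at once; a small NumPy sketch, assuming an even model dimension d:

```python
import numpy as np

def positional_encoding(seq_len, d):
    """Eq. (6): sinusoidal positional encodings of shape (seq_len, d);
    even columns hold sines, odd columns cosines (d assumed even)."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d // 2)[None, :]             # (1, d/2)
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(seq_len=50, d=16)
print(pe.shape)     # (50, 16)
print(pe[0, :4])    # position 0: sin(0)=0, cos(0)=1 -> [0. 1. 0. 1.]
```

In practice this matrix is simply added to the input embeddings before the first encoder layer.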
The transformer architecture has proven more parallelizable and effective in
handling longer sequences compared to traditional RNN-based models, making it a
powerful tool in various NLP tasks and other domains where capturing long-range
dependencies is crucial.

3.3 Encoder

The encoder processes the input sequence to generate a set of hidden representations
that contain the relevant sequence information. It consists of multiple layers, each
containing two sub-layers: the multi-head self-attention (MSA) layer and the
position-wise feed-forward layer.
layer. First, the input sequence is embedded into a continuous vector space using an
embedding layer, and positional encoding is added to provide positional information
about the sequence elements.
MSA is applied to the input sequence in each encoder layer's first sub-layer to
capture dependencies between different elements. In the second sub-layer, a position-
wise feed-forward network independently applies a nonlinear transformation to each
sequence element to capture complex interactions between the sequence elements
and generate higher-level features. Downstream tasks use the final hidden representations
of the input sequence obtained after passing through all encoder layers.
The transformer encoder is more parallelizable and effective in handling longer
sequences than RNNs.
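Putting the two sub-layers together, one simplified single-head encoder layer with residual connections and (parameter-free) layer normalization might look like the sketch below; it is an illustration under those simplifying assumptions, not the full architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean / unit variance (no learned gain/bias)."""
    return (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(
        x.var(axis=-1, keepdims=True) + eps)

def encoder_layer(X, params):
    """One encoder layer: self-attention + FFN, each followed by a
    residual connection and layer normalization (single head)."""
    Wq, Wk, Wv, W1, b1, W2, b2 = params
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    X = layer_norm(X + A @ V)                        # attention sub-layer
    ffn = np.maximum(0, X @ W1 + b1) @ W2 + b2
    return layer_norm(X + ffn)                       # feed-forward sub-layer

rng = np.random.default_rng(0)
n, d, h = 6, 8, 16
X = rng.normal(size=(n, d))
params = (rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)),
          rng.normal(size=(d, h)), np.zeros(h), rng.normal(size=(h, d)), np.zeros(d))
out = encoder_layer(X, params)
print(out.shape)  # (6, 8)
```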

3.4 Decoder

The decoder in the transformer architecture produces the output sequence by utilizing
the hidden representations generated by the encoder. The decoder has multiple
identical layers, each containing three sub-layers:
• The masked multi-head self-attention layer
• The multi-head self-attention layer
• The fully connected feed-forward network layer.
By masking the attention weights for future positions, the masked multi-head
self-attention layer ensures that the decoder can only attend to the previously gener-
ated output elements, not future ones. The second sub-layer applies multi-head self-
attention to the hidden representations produced by the encoder. The third sub-layer
is a fully connected feed-forward network that independently applies a nonlinear
transformation to each element in the sequence. A SoftMax layer uses the final
hidden representations of the output sequence, generated after passing through all
these layers, to create the final output. The decoder is more parallelizable and better
suited for longer sequences than traditional RNN-based sequence-to-sequence models.
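The masking in the first decoder sub-layer can be illustrated by adding −∞ to the attention scores of future positions before the softmax; with all-zero scores, the surviving weights are uniform over the allowed positions.

```python
import numpy as np

def causal_mask(n):
    """Mask that blocks attention to future positions: position i may
    attend only to positions j <= i (decoder's first sub-layer)."""
    return np.triu(np.full((n, n), -np.inf), k=1)

def masked_attention_weights(scores):
    """Apply the causal mask to the score matrix, then softmax row-wise."""
    s = scores + causal_mask(scores.shape[0])
    e = np.exp(s - s.max(axis=-1, keepdims=True))   # exp(-inf) = 0
    return e / e.sum(axis=-1, keepdims=True)

A = masked_attention_weights(np.zeros((4, 4)))
print(np.round(A, 2))
# Row i is uniform over positions 0..i and exactly 0 beyond:
# [[1.   0.   0.   0.  ]
#  [0.5  0.5  0.   0.  ]
#  [0.33 0.33 0.33 0.  ]
#  [0.25 0.25 0.25 0.25]]
```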

4 Vision Transformer (ViT)

ViT is a variant of the transformer architecture adapted for processing images. The
ViT was introduced in a 2020 paper by Dosovitskiy et al. called “An Image Is Worth
16 × 16 Words: Transformers for Image Recognition at Scale” [8].
In this architecture, an image is first divided into a sequence of fixed-size (e.g., 16
× 16 or 32 × 32) patches, which are flattened and forwarded into a standard transformer
encoder. The patch embeddings are then augmented with a learnable positional
encoding that indicates the spatial location of each image patch. To transform an
image X ∈ R^{H×W×C} (where H is the height, W the width, and C the number of
channels), X is reshaped into a sequence of flattened 2D patches x_p ∈ R^{N×(P²·C)},
where P × P is the patch resolution and N = HW/P². The ViT model also introduces
a learnable token known as the “class token.” This token is concatenated to the patch
embeddings and acts as a representation of the whole image.
embeddings and produce a final feature vector. This is then utilized to predict the
class label of the image.

Z ← concat([CLASS], x_p W),        (7)
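The patch pipeline and Eq. (7) can be sketched in NumPy; the projection matrix and class token below are random/zero stand-ins for the learned parameters.

```python
import numpy as np

def image_to_patches(img, P):
    """Split an (H, W, C) image into N = HW/P^2 flattened patches of
    length P*P*C, as in the ViT input pipeline."""
    H, W, C = img.shape
    assert H % P == 0 and W % P == 0
    return (img.reshape(H // P, P, W // P, P, C)
               .transpose(0, 2, 1, 3, 4)        # (Hb, Wb, P, P, C)
               .reshape(-1, P * P * C))         # (N, P*P*C)

img = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
x_p = image_to_patches(img, P=16)
print(x_p.shape)   # (4, 768): N = 32*32/16^2 = 4 patches of 16*16*3 = 768 values

# Eq. (7): project the patches and prepend a class token
d = 64
rng = np.random.default_rng(0)
Wp = rng.normal(size=(768, d))     # stand-in patch projection W
cls = np.zeros((1, d))             # stand-in learnable [CLASS] token
Z = np.concatenate([cls, x_p @ Wp], axis=0)
print(Z.shape)  # (5, 64): N + 1 tokens of dimension d
```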

Fig. 2 Image to sequence and embedding dimension d

where W = projection.
The ViT architecture utilizes pre-training on extensive datasets to develop visual
representations that can be adapted for downstream tasks. This pre-training is usually
conducted on vast datasets like ImageNet, using a variation of contrastive learning
known as the “pretext task.” In this technique, the model is trained to predict the
patch positions of an image, encouraging it to acquire significant visual characteristics
that capture the spatial connections between various portions of an image
(Fig. 2).

4.1 Variant Architectures

The original vision transformer (ViT) architecture had limitations, including
restricted spatial modeling and a requirement for extensive training datasets. To
address these challenges, researchers have proposed variant architectures for ViTs
to enhance their performance in various tasks, including medical image analysis.
These variants are different versions or modifications of the original ViT architec-
ture. They can be classified based on several factors: spatial-based, feature-based,
hierarchical-based, and application-based ViT.
Researchers have proposed several variants to address the limitations of the orig-
inal architecture and enhance its effectiveness and precision in different applications.
Spatial-Based ViT
Due to their efficacy in NLP, transformers have gained popularity in other fields,
including computer vision. However, unlike CNNs, vision transformers struggle to
capture local and translation-invariant features. To address this limitation,
researchers have explored incorporating CNN structures into vision transformers to
improve their ability to capture local information.

Feature-Based ViT
Researchers are working on enhancing the performance of ViT by generating many
feature maps, including token maps and attention maps. This approach aims to extract
a comprehensive range of features, enabling the model to excel in diverse tasks. By
utilizing distinct feature maps, the model can capture various types of components,
improving its overall performance.
Our focus is primarily on discussing transformers’ spatial-based and feature-based
variants, with only a brief discussion of specific variants related to hierarchical-based
models and applications (Table 1).
These variant architectures contribute to the adaptability and efficiency of the
ViT model in various image processing tasks, including image recognition, object
detection, and segmentation.
Hierarchical-Based ViT
The original ViT architecture has a significant limitation: Its high computational
cost increases as the input size grows. This issue has prompted researchers to
explore different methods for reducing the feature size, which can help decrease
training time and computational complexity. By implementing these approaches, the
computational cost can be reduced while maintaining the model’s performance.
Liu et al. introduced Swin-Transformer [34] as a modification of the ViT archi-
tecture. It employs a hierarchical structure that merges patches after each block. This
reduction in computation time complexity is achieved by computing self-attention
only within each local shifted window. PiT [35], proposed by Heo et al. in 2021, is
another variant of ViT architecture that includes a Pooling Layer for dimension reduc-
tion. The Pooling Layer employs a depth-wise convolutional layer to accomplish the
reduction.

5 Challenges and Discussion

While ViT has shown great promise in image classification tasks, there are still
some challenges and discussions surrounding their use. One challenge is large-scale
vision transformers’ high computational cost and memory requirements. Training
and inference of these models can be time-consuming and require specialized hard-
ware, limiting their practical use in some applications. Another challenge is the
need for extensive training data to achieve high performance. While pre-training on
large-scale datasets has improved performance, obtaining such datasets in specific
domains, such as medical imaging, can be challenging.
The application of ViT in medical imaging has gained attention as a promising research area, primarily due to its remarkable performance in various image processing tasks. ViT’s ability to learn global image representations has the potential to improve the interpretation of visual patterns in medical imaging, where a holistic understanding of the entire image is critical. Despite its potential advantages,
330 A. Srivastava et al.

Table 1 Review of different spatial- and feature-based ViT

| References | Architecture type | 2D/3D | Parameters | Task | Dataset | Highlight |
|---|---|---|---|---|---|---|
| DeiT [18] | Hybrid | 2D | 86M | Classification | ImageNet [19], iNaturalist 2018 [20], iNaturalist 2019 [21], Flowers-102 [22] | Trains a ViT with knowledge distillation, using a CNN as a teacher model to transfer its inductive bias |
| ConViT [23] | Hybrid | 2D | 48M | Classification | ImageNet [19] | Incorporates gated positional self-attention (GPSA) to capture local information |
| LeViT [24] | Hybrid | 2D | 18.9M | Classification | ImageNet-2012 [19] | Achieves high accuracy thanks to the training techniques used in DeiT, plus specific design choices that let the model process information efficiently while minimizing computational complexity |
| CeiT [25] | Hybrid | 2D | 6.4M | 7 downstream tasks | ImageNet [19] | Incorporates a depth-wise convolutional layer in its feed-forward network to enhance the extraction of local features, and uses layer-wise class-token attention to gather different class representations |
| LocalViT [26] | Hybrid | 2D | 7.5M | Classification | ImageNet-2012 [19] | Uses convolutional layers within the feed-forward network (FFN) of each transformer block to capture local features, exploring various activation functions and FFN architectures to enhance performance |
| CCT [27] | Hybrid | 2D | 3.7M | Classification | CIFAR-10 [28] | Uses convolutional layers to obtain embeddings and eliminates positional embeddings so the model can handle inputs of varying sizes |
| DeepViT [29] | Pure | 2D | 55M | Classification | ImageNet [19] | Analyzes the attention maps of the ViT model and observes attention collapse in its deeper layers |
| T2T-ViT [30] | Pure | 2D | 21.5M | Classification | ImageNet [19] | Built on the observation that many ViT feature maps are meaningless when reshaped from tokens; replaces fully connected layers with a combination of convolutional and attention layers |
| RegionViT [31] | Pure | 2D | 35.7M | Classification and segmentation | ImageNet21K [32] | Features a pyramid structure and introduces a new regional-to-local attention mechanism, generating regional and local tokens from an image at different sizes |
| SpectFormer [33] | Pure | 2D | 9M | Object detection and segmentation | ImageNet-1K [32] | Investigates the fundamental design of transformers using a combination of spectral and multi-headed attention |

challenges must be addressed to fully realize the benefits of using ViT in medical imaging. Obtaining a sufficient amount of labeled data poses a significant challenge in the medical domain. Another challenge is the high computational cost of ViT, which can limit its practical application in medical imaging.

6 Conclusion

In conclusion, the emergence of vision transformers (ViTs) has brought about a paradigm shift in computer vision. Extensive demonstrations in the medical imaging domain have shown their ability to outperform traditional convolutional neural networks in image classification tasks. Since the original publication, researchers have introduced numerous variations of the ViT architecture to tackle various issues and improve its efficiency in data usage and computational resources. These variants have shown promising results and provide researchers with diverse options for their needs. However, some challenges and limitations are still associated with ViTs, such as their interpretability and their ability to handle small or irregularly shaped images. Further investigation is also necessary to explore the potential of ViTs across diverse medical imaging domains. Overall, the development of ViTs has opened up new avenues for research and has the potential to revolutionize the field of medical imaging. As the technology evolves, it will be exciting to see the breakthroughs and innovations that emerge.

References

1. Arevalo J, Gonzalez FA, Ramos-Pollán R, Oliveira JL, Lopez MAG (2016) Representation
learning for mammography mass lesion classification with convolutional neural networks.
Comput Methods Programs Biomed 127:248–257
2. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
3. Liu Z, Mao H, Wu C-Y, Feichtenhofer C, Darrell T, Xie S (2022) A convnet for the 2020s.
arXiv preprint arXiv:2201.03545
4. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I
(2017) Attention is all you need. In: Advances in neural information processing systems, pp
5998–6008
5. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the
IEEE conference on computer vision and pattern recognition, pp 7794–7803
6. Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805
7. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding
by generative pre-training
8. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X et al (2021) An image is worth
16×16 words: transformers for image recognition at scale. In: ICLR
9. Perera S, Adhikari S, Yilmaz A (2021) POCFormer: a lightweight transformer architecture for
detection of COVID-19 using point of care ultrasound. arXiv preprint arXiv:2105.09913
10. Xie E, Wang W, Wang W, Sun P, Xu H, Liang D, Luo P (2021) Segmenting transparent objects
in the wild with transformer. In: IJCAI
11. Zhang Y, Higashita R, Fu H, Xu Y, Zhang Y, Liu H, Zhang J, Liu J (2021) A multibranch
hybrid transformer network for corneal endothelial cell segmentation. arXiv preprint arXiv:
2106.07557
12. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object
detection with transformers. In: ECCV
13. Mathai TS, Lee S, Elton DC, Shen TC, Peng Y, Lu Z, Summers RM (2021) Lymph node
detection in T2 MRI with transformers. arXiv preprint arXiv:2111.04885
14. Vepakomma P, Gupta O, Swedish T, Raskar R (2018) Split learning for health: distributed deep
learning without sharing raw patient data. arXiv preprint arXiv:1812.00564
15. Fung G, Dundar M, Krishnapuram B, Bharat Rao R (2007) Multiple instance learning for
computer aided diagnosis. In: Advances in neural information processing systems, vol 19, p
425
16. Kwee TC, Kwee RM (2020) Chest CT in COVID-19: what the radiologist needs to know.
RadioGraphics 40(7):1848–1865
17. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align
and translate. arXiv preprint arXiv:1409.0473
18. Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H (2021) Training data-efficient
image transformers & distillation through attention. In: ICML
Current Advances in Locality-Based and Feature-Based Transformers … 335

19. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis
20. Van Horn G, Aodha OM, Song Y, Shepard A, Adam H, Perona P, Belongie SJ (2018) The
iNaturalist challenge 2018 dataset. arXiv preprint arXiv:1707.06642
21. Van Horn G, Aodha OM, Song Y, Shepard A, Adam H, Perona P, Belongie SJ (2019) The
iNaturalist challenge 2019 dataset. arXiv preprint arXiv:1707.06642
22. Nilsback M-E, Zisserman A (2008) Automated flower classification over a large number of
classes. In: Proceedings of the Indian conference on computer vision, graphics and image
processing
23. d’Ascoli S, Touvron H, Leavitt ML, Morcos AS, Biroli G, Sagun L (2021) ConViT: improving
vision transformers with soft convolutional inductive biases. In: ICML
24. Graham B, ElNouby A, Touvron H, Stock P, Joulin A, Jégou H, Douze M (2021) LeViT: a
vision transformer in convnet’s clothing for faster inference. In: ICCV
25. Yuan K, Guo S, Liu Z, Zhou A, Yu F, Wu W (2021) Incorporating convolution designs into
visual transformers. In: ICCV
26. Li Y, Zhang K, Cao J, Timofte R, Van Gool L (2021) LocalViT: bringing locality to vision
transformers. arXiv preprint arXiv:2104.05707
27. Hassani A, Walton S, Shah N, Abuduweili A, Li J, Shi H (2021) Escaping the big data paradigm
with compact transformers. arXiv preprint arXiv:2104.05704
28. Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images
29. Zhou D, Kang B, Jin X, Yang L, Lian X, Jiang Z, Hou Q, Feng J (2021) DeepViT: towards
deeper vision transformer. arXiv preprint arXiv:2103.11886
30. Yuan L, Chen Y, Wang T, Yu W, Shi Y, Jiang Z-H, Tay FEH, Feng J, Yan S (2021) Tokens-to-
token ViT: training vision transformers from scratch on ImageNet. In: ICCV
31. Chen C-F, Panda R, Fan Q (2021) RegionViT: regional-to-local attention for vision trans-
formers. arXiv:2106.02689
32. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical
image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE,
pp 248–255
33. Patro BN, Namboodiri VP, Agneeswaran VS (2023) SpectFormer: frequency and attention is
what you need in a vision transformer. https://doi.org/10.48550/arXiv.2304.06446
34. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical
vision transformer using shifted windows. In: ICCV
35. Heo B, Yun S, Han D, Chun S, Choe J, Oh SJ (2021) Rethinking spatial dimensions of vision
transformers. In: ICCV
A Systematic Review on Detection
of Gastric Cancer in Endoscopic Imaging
System in Artificial Intelligence
Applications

K. Pooja and R. Kishore Kanna

Abstract Artificial intelligence and disease detection go hand in hand. Convolutional neural networks, a significant area of artificial intelligence applications, are crucial in the detection of stomach cancer. AI’s significance in cancer research and clinical applications is increasingly recognized, and cancers such as gastric cancer are ideal test cases for determining whether early efforts to apply AI to medicine can yield beneficial outcomes. Deep learning (DL) and machine learning (ML) are two AI-derived concepts. ML refers to the capability of learning features from data without explicit programming; it seeks to improve efficacy through computational methods, and ML-based predictive prognostic models are becoming more prevalent in cancer research. This review paper focuses on the role of artificial intelligence in improving the treatment, prognosis, and diagnosis of gastric cancer. Artificial neural networks and convolutional neural networks have gained much attention for biomedical applications. The main goal of the ML subset known as DL is to train multilayer networks. Much more remains to be discussed in the clinical management of stomach cancer. Despite rising efforts, it is important to adapt machine learning and artificial intelligence to enhance stomach cancer diagnostics. AI can be used to enhance visual modalities for treatment and diagnosis procedures, even though integration can be slow and difficult, and it might eventually become a helpful tool for doctors. Moreover, artificial intelligence is changing ideas about the future of medicine as well as the effectiveness of diagnostic and treatment procedures.

Keywords Gastric cancer · Artificial intelligence (AI) · Endoscopic imaging system · Convolutional neural networks (CNNs) · Deep learning (DL)

K. Pooja (B) · R. Kishore Kanna
Department of Biomedical Engineering, Vels Institute of Science, Technology and Advanced Studies, Chennai, India
e-mail: poojakumar2711@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 337
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_29

1 Introduction

Gastric cancer has one of the highest mortality rates of all cancers. Early gastric cancer patients hardly ever show any symptoms: as with gastritis and stomach ulcers, the cancer develops before the most obvious symptoms appear. As a result, it can be hard for patients to realize that their stomach cancer has already spread. There is therefore a need for endoscopic imaging-based early detection of stomach cancer. Early stomach cancer can be difficult to diagnose with imaging, and the accuracy depends on the gastroenterologist’s experience.
Several studies have applied machine learning approaches to aid endoscopic image diagnosis, for example through automatic polyp detection [1]. According to researchers, there are two reasons why an effective method for detecting early stomach cancer has not yet been developed. First, the early stomach cancer data that could be exploited for machine learning have not been properly maintained. Second, many early gastric tumors lack certain morphological characteristics found in more advanced malignancies.
Image-enhanced endoscopy (IEE), of which narrow-band imaging (NBI) is one form, allows textural patterns to be identified. Under magnification, IEE supports a conclusive diagnosis of gastrointestinal cancer far better than traditional white-light imaging (WLI) endoscopy [2], and IEE has been shown to be effective for identifying gastrointestinal cancers in numerous investigations. Esophageal and pharyngeal malignancies can be found using magnification endoscopy with narrow-band imaging [3], and colon cancer detection with blue laser imaging (BLI) has recently been demonstrated to be efficient [4]. However, because of its low brightness, there are no reports on IEE’s efficiency in identifying stomach cancer. Moreover, with WLI endoscopy it can be difficult to distinguish the background mucosal alterations associated with gastritis from the morphological changes that automatic gastric cancer screening must detect.
Liu et al. focus on early stomach cancer detection using algorithms, which can help gastroenterologists make decisions. Before that investigation, endoscopic imaging datasets of early gastric cancer were acquired, the majority of which were WLI-captured images. Following endoscopic submucosal dissection (ESD), retrospective inspection revealed that all of the lesions were early-stage stomach cancer [5].
Consequently, we discuss the researchers’ initial work on convolutional neural networks (CNNs) for the automatic detection of early-stage stomach cancer [6]. The main contributions of this work are: (1) precise automatic diagnosis of early stomach cancer with weak configuration traits, which is sometimes hard for endoscopists to diagnose; (2) the development of a convolutional neural network algorithm on a fixed set of image datasets; and (3) the examination of inaccurate early gastric cancer diagnoses. By analyzing the characteristics of inaccurately identified images, researchers may develop more efficient and effective methods for early gastric cancer detection.

The main purpose of this systematic review is to analyze and summarize the current applications of artificial intelligence and neural networks in identifying gastric cancer and to evaluate CNN performance. Above all, this research can help build the CNN knowledge base by filling the information gap in the detection of stomach cancer.

2 Documentation Review

2.1 Related Study

The purpose of image classification is to assign each image to one or more specified classes. The most effective object segmentation and recognition results are currently produced by deep learning-based techniques, largely because of convolutional neural networks, which are faster in analysis and computation than previous techniques. In medical imaging, such systems are being developed for early cancer diagnosis.
Scientific research and applications for classifying, detecting, and segmenting stomach cancer have been developed, along with annotated datasets [7, 8]. Oukdach et al. [9] created a network that could extract features from two open datasets, and the vast majority of studies [10–12] concentrate on the detection of stomach cancer. Tumor detection and recognition are essential in this field of research because a tumor is a predefined object represented by a collection of pixels in the image. Multiple objects in the same image can be detected using models like YOLO [13]. The authors of [9] identified feature classes with a small-object detector built on a basic neural network model. Bounding-box annotations can enhance feature representation, as shown in [14], which is based on a weighted loss with label smoothing. A new, faster version of R-CNN was also introduced in [15]. The sensitivity (SE), the ratio of accurately diagnosed gastric cancer lesions to the number of actual stomach cancer lesions, is discussed in another study [16]. In that study, a CNN evaluated 2296 test images in 47 s and identified 71 of 77 gastric cancer lesions, with a PPV of 30.6% and a sensitivity of 92.2%. It properly recognized 70 of the 71 lesions with a diameter larger than 6 mm out of 232 detected lesions, 161 of which were not malignant.
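These two figures follow directly from the reported lesion counts, as a quick check shows (counts taken from the study summarized above: 71 of 77 true lesions detected, and 161 of the 232 detected regions non-malignant):

```python
def sensitivity(tp: int, fn: int) -> float:
    # Fraction of actual cancer lesions that the model detected.
    return tp / (tp + fn)

def positive_predictive_value(tp: int, fp: int) -> float:
    # Fraction of the model's detections that really were cancer.
    return tp / (tp + fp)

tp = 71            # lesions correctly detected
fn = 77 - 71       # lesions missed
fp = 161           # detected regions that were not malignant

print(round(100 * sensitivity(tp, fn), 1))                # -> 92.2
print(round(100 * positive_predictive_value(tp, fp), 1))  # -> 30.6
```

The low PPV alongside the high sensitivity illustrates the over-detection trade-off such screening systems make.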
Additionally, the study in [17] notes that the researchers of [18] introduced SegNet, an architecture used to create a real-time polyp identification model. A system with a positive predictive value of 30.6% and a sensitivity of 92.2% for automatic stomach cancer detection was described by Hirasawa et al. [19]; it transforms the bounding-box output space into a set of default boxes of differing dimensions for each feature classification [16]. Sakai et al. used a patch-based classification strategy to construct a convolutional neural network that accurately diagnosed stomach cancer in endoscopic images [20].
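A patch-based strategy of this general kind can be sketched as a sliding window that scores each patch independently and assembles the scores into a coarse heatmap. This is only an illustrative skeleton: `score_patch` below is a stand-in for a trained CNN classifier, not the network from the cited work:

```python
import numpy as np

def score_patch(patch: np.ndarray) -> float:
    # Stand-in for a trained CNN classifier: mean intensity is used here
    # only so the sketch runs end to end.
    return float(patch.mean())

def patch_heatmap(image: np.ndarray, size: int = 32) -> np.ndarray:
    """Slide a size x size window over a grayscale (H, W) image and
    return one score per non-overlapping patch."""
    h, w = image.shape
    rows, cols = h // size, w // size
    heat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = image[r * size:(r + 1) * size, c * size:(c + 1) * size]
            heat[r, c] = score_patch(patch)
    return heat

img = np.random.rand(128, 128)
print(patch_heatmap(img).shape)  # -> (4, 4)
```

Thresholding such a heatmap gives candidate lesion regions, which is roughly how patch scores are turned into detections.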

Endoscopic imaging systems are used to detect the whole lesion with a lesion-based CNN method, a type of deep learning algorithm developed in [21]. Endoscopy is a key component of GI tract evaluation because it helps doctors examine the gastrointestinal (GI) tract more easily. Nonetheless, the false-negative (FN) rate for gastric cancer identification with esophagogastroduodenoscopy, a prominent procedure for diagnosing stomach cancer, ranges from 4.6 to 25.7%. Built on the Caffe deep learning framework, among the most common and effective frameworks, the CNN detected 71 of the 77 stomach cancer lesions, with a positive predictive value of 30.6% and a sensitivity of 92.2% [22]. It properly identified 70 of the 71 tumors with a diameter larger than 6 mm out of 232 detected lesions, 161 of which were not malignant. Guitao and Zhenwei recently reported average results for the categorization of stomach cancer; they can explore improving the Mask R-CNN network structure for medical image detection and run their model with additional data augmentation. In previous research, deep learning has enhanced the prediction and detection of stomach cancer [23, 24].
This review examines precision, sensitivity, specificity, area under the curve (AUC), and accuracy (where provided) to explain the performance of the proposed CNN models in identifying stomach cancer.

2.2 Artificial Intelligence in Endoscopic Imaging System Analysis

Whereas detection may see something clearly without being able to identify it, segmentation can be a big advantage for CAD: it addresses specific objects and identifies them. Although colonoscopy is currently the focus of artificial intelligence research, growing efforts also highlight upper-GI applications. The most important recognition characteristics were found to be the size of the artifacts and how much they overlapped other objects [25]. A CNN system developed by Hirasawa and others uses endoscopic data to automatically diagnose stomach cancer [19]. The system, based on the Single Shot MultiBox Detector architecture [26], was developed using a dataset of 13,584 endoscopic images from four research universities collected over a 12-year period and assessed by a certified expert. To evaluate the algorithm’s accuracy, the researchers gathered an additional 2296 images of 77 stomach lesions. The CNN displayed a yellow rectangular box labeled “early or advanced stomach cancer” around each detection [25].
According to the results, the CNN had a sensitivity of 92.2% and evaluated all of the images in 47 s. Among the missed lesions were differentiated intra-mucosal malignancies, while false positives were caused by gastritis and architectural variation. This kind of research, in which the model learns from human-provided labels, is known as supervised learning [27]. Unsupervised learning, by contrast, allows a model to obtain specific knowledge without being told the accuracy of its responses; the patterns it finds can then be used to make predictions, much like data augmentation [28].
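Simple geometric augmentations of the kind alluded to above can be applied directly to image arrays (a minimal sketch; real pipelines typically add random crops, colour jitter, and other transforms):

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Return flipped and rotated variants of an (H, W) image, multiplying
    the effective training-set size without needing new labels."""
    return [
        image,
        np.fliplr(image),  # horizontal flip
        np.flipud(image),  # vertical flip
        np.rot90(image),   # 90-degree rotation
    ]

frame = np.arange(16).reshape(4, 4)
variants = augment(frame)
print(len(variants))  # -> 4
```

Each variant keeps the original label, which is what makes augmentation attractive when annotated endoscopic images are scarce.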
Researchers have also enhanced the common endoscope by applying deep learning to other endoscope modalities. Narrow-band imaging (NBI) uses conventional optical filters (blue, red, green) [29]; because only certain wavelengths of light pass through, the surface mucosa and microvasculature become more clearly visible.
Li et al. [30] developed a CNN-based method for narrow-band imaging analysis. Two experienced endoscopists observed the patients’ gastric mucosal lesions and used narrow-band imaging to label lesions as early gastric cancer or non-cancerous, following the magnifying endoscopy simple diagnostic algorithm for early gastric cancer (MESDA-G) and the vessels-plus-surface classification system. The histology of the lesions was then analyzed by two pathologists using the most recent Vienna classification, in which lesions in categories 4–5 are indicative of early stomach cancer and lesions in categories 1–3 are non-cancerous [31]. The paper also proposed a reiterative learning strategy that enabled a fully convolutional network to obtain a good segmentation effect on a gastric cancer pathology dataset without manual annotation.

3 Role of AI in Endoscopic Imaging

Frazzoni et al. have proposed integrating ML into clinical endoscopy for precise gastrointestinal disease detection [32]. To understand how machine learning affects the difficulty of gastrointestinal endoscopic diagnosis, a strong technical framework is necessary, especially for each physician. The stomach is anatomically unique compared with other gastrointestinal organs such as the colon and oesophagus: it has potential blind spots and a broader, bent lumen, requiring more complex observation [31]. As a result, clinicians require numerous distant views in order to eliminate mistakes. Some of the initial symptoms of early gastric cancer (EGC) may also be masked by a Helicobacter pylori infection. Endoscopic diagnoses have been noted to differ because of these factors [32, 33].
Endoscopic mucosal resection (EMR), an endoscopically assisted treatment for gastric cancer, carries a minimal risk of lymph node metastasis; it is the main treatment for stomach cancer in Japan, its acceptance has increased in the West, and it is notable for being minimally invasive [34]. However, particularly in lesions larger than 15 mm, EMR may make determining malignant depth more difficult and raises the rate of tumor recurrence [33]. Endoscopic submucosal dissection (ESD) uses a different dissection method developed for the removal of larger lesions and has frequently proven a successful alternative to standard open or laparoscopic surgery for EGC [35].
Computer-aided detection using convolutional neural networks was created by Zhu et al. with the purpose of progressively integrating artificial intelligence into practical endoscopic resection [36]. An AI-based detection system can accurately predict the depth of tumor invasion, which can reduce the need for gastrectomy, but it may not be able to guide endoscopic resection procedures or raise alarms for serious risks of complications. In reality, ESD-related complications in gastric cancer, including bleeding, peritonitis, and perforation, occur at a rate of 3.5% and continue to be a major difficulty [37].
Additionally, Odagiri et al. showed a linear relationship between higher hospital volume and a reduced prevalence of ESD-related complications. Given that endoscopic submucosal dissection and endoscopic mucosal resection require high operating proficiency, it makes sense to start the development and implementation of AI-based procedures in high-volume hospitals [38]. This applies particularly to machine learning and deep learning, which require careful consideration when deployed in hospitals.
Namikawa et al. reviewed clinical AI in endoscopy, describing in full the uses of AI in stomach medicine for blind-spot monitoring, clinical detection, and classification [39]. They also consider the prospect that AI could one day be trained to differentiate stomach lesions that are non-neoplastic from those that are cancerous, which would be of special significance. In artificial intelligence applications, the treatment of gastric cancer is still in its early development. The interpretation of chemoradiotherapy data such as genomic and mutation analyses, the prediction of insensitivity following treatment, and the diagnosis of gastric cancer by endoscopic image analysis all depend heavily on feature extraction [40]. AI is also applicable to the domain of drug responsiveness, which is characterized by clinical variability and imbalance. DeepIC50, a one-dimensional convolutional neural network model, was reviewed by Joo et al. as a way to reliably predict the responsiveness of stomach cancer drugs; comparable outcomes were obtained when the technique was validated on data from actual stomach cancer patients and cell lines [40]. The most recent recommendations highlight that AI must be validated in multicenter, statistically controlled studies before routine clinical use; hence, prospective and multicenter research will be required in the future.
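The basic operation behind such one-dimensional convolutional models can be sketched in a few lines (a generic illustration of a 1-D convolution, not the DeepIC50 architecture itself):

```python
import numpy as np

def conv1d(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid-mode 1-D convolution: slide `kernel` over `signal` and take
    a dot product at each position (as in deep-learning 'convolutions',
    the kernel is not flipped)."""
    k = len(kernel)
    return np.array([
        float(signal[i:i + k] @ kernel)
        for i in range(len(signal) - k + 1)
    ])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. an encoded feature vector
w = np.array([1.0, 0.0, -1.0])            # a simple difference filter
print(conv1d(x, w))                       # -> [-2. -2. -2.]
```

Stacking many such learned filters, with nonlinearities between layers, is what lets 1-D CNNs extract patterns from sequential drug and genomic features.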

4 Discussion

Limited access to basic health care and variable diagnostic accuracy present considerable obstacles to the global healthcare system. In medicine, machine learning algorithms have risen rapidly in recent years, ranging from basic machine learning techniques to more recent deep learning techniques. Endoscopy and pathological examination both depend on the operator, so diagnoses vary. AI-assisted inspection might offer a different perspective and lessen the burden on operators performing diagnostic tests. These algorithms have a significant impact on the growth of the medical and health industries as well as on the reliability of clinical diagnosis. According to our detailed review, convolutional neural networks for stomach cancer show satisfactory accuracy for detection, classification, segmentation, and defining the margins of regions of interest. This demonstrates how deep learning is now more widely used in medical decision-making and how quickly CNNs have advanced in identifying cancer in recent years. The future of artificial intelligence may be bright, though some challenges must be considered.
Accurate estimation and fast decisions are among AI’s major benefits. Theoretically, CNNs could automate their predictions, and a platform for telemedicine services, together with automatic AI diagnosis services, is widely anticipated. The performance and efficiency of AI will keep improving as data are continuously accumulated. It can be used to detect diseases and determine the level of invasion, as well as to interpret endoscopic images or pathological slices. A limitation of this study is that the endoscopic images were taken at a single facility; a future evaluation of the proposed method’s usefulness ought to use images from multiple facilities. A subjective assessment is also necessary to determine whether the clinically significant invasive area was removed. In addition, we intend to incorporate the constructed model into endoscopy support software and demonstrate its practical utility in medicine.

5 Conclusion

Convolutional neural networks are currently being used in the diagnosis of stomach cancer, so we undertook a thorough review of the subject. When trained with a large amount of data, neural networks are effective and powerful algorithms. Despite the effort required, it is worthwhile to adapt AI to improve gastric cancer diagnosis: the resulting information could completely change how issues related to gastric cancer are handled. Although integration may be slow, AI can be used to improve imaging modalities for diagnosis and treatment planning, and it has the potential to become a valuable tool for doctors, but only if it is properly adapted and trained. The accuracy reported in the reviewed studies ranged from 77.3 to 98.7%, and all of the studies demonstrated strong identification performance, so CNN-based medical imaging research should prove useful. CNNs are believed to play a significant role in helping scientists and physicians identify diseases more accurately and quickly. Whether AI can produce clinically effective data depends on the precision of the input data and on the machine learning technique used to train the model. Hence, scientists are concentrating on developing novel optimization algorithms that can take large amounts of information and generate more accurate results in a predictable amount of time.

References

1. Hatami S, Shamsaee DR, Hasan Olyaei M (2020) Detection and classification of gastric
precancerous diseases using deep learning. In: 6th Iranian conference on signal processing
and intelligent systems (ICSPIS), Mashhad, Iran, pp 1–5
2. Zhang Q, Wang F, Chen ZY, Wang Z, Zhi FC, Liu SD, Bai Y (2016) Comparison of the
diagnostic efficacy of white light endoscopy and magnifying endoscopy with narrow band
imaging for early gastric cancer: a meta-analysis. Gastric Cancer 19(2):543–552
3. Sugimoto M, Kawai Y, Morino Y, Hamada M, Iwata E, Niikura R et al (2022) Efficacy of high-
vision transnasal endoscopy using texture and colour enhancement imaging and narrow-band
imaging to evaluate gastritis: a randomized controlled trial. Ann Med 54(1):1004–1013
4. Ang TL, Li JW, Wong YJ, Tan YLJ, Fock KM, Tan MTK et al (2019) A prospective random-
ized study of colonoscopy using blue laser imaging and white light imaging in detection and
differentiation of colonic polyps. Endosc Int Open 7(10):E1207–E1213
5. Liu L, Liu H, Feng Z (2022) A narrative review of postoperative bleeding in patients with
gastric cancer treated with endoscopic submucosal dissection. J Gastrointest Oncol 13(1):413
6. Martin DR, Hanson JA, Gullapalli RR, Schultz FA, Sethi A, Clark DP (2020) A deep learning
convolutional neural network can recognize common patterns of injury in gastric pathology.
Arch Pathol Lab Med 144(3):370–378
7. Shelhamer E, Long J, Darrell T (2016) Fully convolutional networks for semantic segmentation.
IEEE Trans Pattern Anal Mach Intell 39:1
8. Lee S-A, Cho HC, Cho H-C (2021) A novel approach for increased convolutional neural
network performance in gastric-cancer classification using endoscopic images. IEEE Access
9:51847–51854
9. Oukdach Y, Kerkaou Z, El Ansari M, Koutti L, El Ouafdi AF (2022) Gastrointestinal diseases
classification based on deep learning and transfer learning mechanism. In: 2022 9th interna-
tional conference on wireless networks and mobile communications (WINCOM). IEEE, pp
1–6
10. Pang X, Zhao Z, Weng Y (2021) The role and impact of deep learning methods in computer-
aided diagnosis using gastrointestinal endoscopy. Diagnostics 11(4):694
11. Lee JY, Jeong J, Song EM, Ha C, Lee HJ, Koo JE et al (2020) Real-time detection of colon
polyps during colonoscopy using deep learning: systematic validation with four independent
datasets. Sci Rep 10(1):1–9
12. Mushtaq D, Madni TM, Janjua UI, Anwar F, Kakakhail A (2023) An automatic gastric polyp
detection technique using deep learning. Int J Imaging Syst Technol
13. Doniyorjon M, Madinakhon R, Shakhnoza M, Cho YI (2022) An improved method of polyp
detection using custom YOLOv4-tiny. Appl Sci 12(21):10856
14. Zeng Q, Li H, Zhu Y, Feng Z, Shu X, Wu A et al (2022) Development and validation of a
predictive model combining clinical, radiomics, and deep transfer learning features for lymph
node metastasis in early gastric cancer. Front Med 9
15. Shibata T, Teramoto A, Yamada H, Ohmiya N, Saito K, Fujita H (2020) Automated detection
and segmentation of early gastric cancer from endoscopic images using mask R-CNN. Appl
Sci 10(11):3842
16. Yoon HJ, Kim JH (2020) Lesion-based convolutional neural network in diagnosis of early
gastric cancer. Clin Endosc 53(2):127–131
17. Zhang J, Wen T, He T, Wang X, Hao R, Liu J (2022) Human stools classification for gastroin-
testinal health based on an improved ResNet18 model with dual attention mechanism. In:
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp
2096–2103
18. Ma L, Su X, Ma L, Gao X, Sun M (2023) Deep learning for classification and localization of
early gastric cancer in endoscopic images. Biomed Signal Process Control 79:104200
19. Hirasawa T, Aoyama K, Tanimoto T, Ishihara S, Shichijo S, Ozawa T et al (2018) Application
of artificial intelligence using a convolutional neural network for detecting gastric cancer in
endoscopic images. Gastric Cancer 21:653–660
A Systematic Review on Detection of Gastric Cancer in Endoscopic … 345

20. Sakai Y, Takemoto S, Hori K, Nishimura M, Ikematsu H, Yano T, Yokota H (2018) Automatic
detection of early gastric cancer in endoscopic images using a transferring convolutional neural
network. In: 2018 40th annual international conference of the IEEE engineering in medicine
and biology society (EMBC). IEEE, pp 4138–4141
21. Kuchkorov TA, Sabitova NQ, Ochilov TD (2022) Detection of gastric ulcers and lesions
applying CNN architecture. Int J Contemp Sci Tech Res 200–204
22. Horiuchi Y, Aoyama K, Tokai Y, Hirasawa T, Yoshimizu S, Ishiyama et al (2022) Convolutional
neural network for differentiating gastric cancer from gastritis using magnified endoscopy with
narrow band imaging. Dig Dis Sci 65(5):1355–1363
23. Tani LFK, Tani MYK, Kadri B (2022) Gas-Net: a deep neural network for gastric tumor
semantic segmentation. AIMS Bioeng 9(3):266–282
24. Cao G, Song W, Zhao Z (2019) Gastric cancer diagnosis with mask R-CNN. In: 2019 11th
international conference on intelligent human-machine systems and cybernetics (IHMSC), vol
1. IEEE, pp 60–63
25. Gholami E, Tabbakh SRK (2021) Increasing the accuracy in the diagnosis of stomach cancer
based on color and lint features of tongue. Biomed Signal Process Control 69:102782
26. Ikenoyama Y, Hirasawa T, Ishioka M, Namikawa K, Yoshimizu S, Horiuchi Y et al (2021)
Detecting early gastric cancer: comparison between the diagnostic ability of convolutional
neural networks and endoscopists. Dig Endosc 33(1):141–150
27. Zhou B, Rao X, Xing H, Ma Y, Wang F, Rong L (2022) A convolutional neural network-based
system for detecting early gastric cancer in white-light endoscopy. Scand J Gastroenterol 1–6
28. Jamil D, Palaniappan S, Lokman A, Naseem M, Zia SS (2022) Diagnosis of gastric cancer
using machine learning techniques in healthcare sector: a survey. Informatica 45(7):2022
29. Sivero L, Volpe S, Gentile M, Sivero S, Iovino S, Gennarelli N et al (2022) Role of narrow band
imaging (NBI), in the treatment of non-polypoid colorectal lesions, with endoscopic mucosal
resection (EMR). Ann Ital Chir 93(2):178–182
30. Li L, Chen Y, Shen Z, Zhang X, Sang J, Ding Y et al (2020) Convolutional neural network
for the diagnosis of early gastric cancer based on magnifying narrow band imaging. Gastric
Cancer 23:126–132
31. Gong L, Wang M, Shu L, He J, Qin B, Xu J (2022) Automatic captioning of early gastric cancer
using magnification endoscopy with narrow-band imaging. Gastrointest Endosc 96(6):929–942
32. Frazzoni L, Arribas J, Antonelli G, Libanio D, Ebigbo A, van der Sommen F et al (2022) Endo-
scopists’ diagnostic accuracy in detecting upper gastrointestinal neoplasia in the framework of
artificial intelligence studies. Endoscopy 54(04):403–411
33. Fujiyoshi MRA, Inoue H, Fujiyoshi Y, Nishikawa Y, Toshimori A, Shimamura Y et al (2022)
Endoscopic classifications of early gastric cancer: a literature review. Cancers 14(1):100
34. Kinami S, Saito H, Takamura H (2022) Significance of lymph node metastasis in the treatment
of gastric cancer and current challenges in determining the extent of metastasis. Front Oncol
11:5628
35. Kanai M, Togo R, Ogawa T, Haseyama M (2019) Gastritis detection from gastric X-ray images
via fine-tuning of patch-based deep convolutional neural network, pp 1371–1375
36. Cho BJ, Bang CS, Park SW, Yang YJ, Seo SI, Lim H, Baik GH (2019) Automated classification
of gastric neoplasms in endoscopic images using a convolutional neural network. Endoscopy
51(12):1121–1129
37. Zhu Y, Wang QC, Xu MD, Zhang Z, Cheng J, Zhong YS (2019) Application of convolutional
neural network in the diagnosis of the invasion depth of gastric cancer based on conventional
endoscopy. Gastrointest Endosc 89(4):806–815
38. Odagiri H, Hatta W, Tsuji Y, Yoshio T, Yabuuchi Y, Kikuchi D, Hoteya S (2022) Bleeding
following endoscopic submucosal dissection for early gastric cancer in surgically altered
stomach. Digestion 103(6):428–437

39. Namikawa K, Hirasawa T, Nakano K, Ikenoyama Y, Ishioka M, Shiroma S et al (2020) Artificial intelligence-based diagnostic system classifying gastric cancers and ulcers: comparison between the original and newly developed systems. Endoscopy 52(12):1077–1083
40. Joo M, Park A, Kim K, Son WJ, Lee HS, Lim G, Nam S (2019) A deep learning model for cell
growth inhibition IC50 prediction and its application for gastric cancer patients. Int J Mol Sci
20(24):6276
Propchain: Decentralized Property
Management System

Soma Prathibha, V. Saiganesh, S. Lokesh, Avudaiappan Maheshwari, and M. A. Kishore

Abstract Presently, governments all over the world use a variety of technologies to register and transfer property. However, most of these systems are centralized in nature and vulnerable in terms of security, leaving them open to both legal and illegal exploits. Our solution addresses this problem through blockchain technology, specifically smart contracts and Non-Fungible Tokens, to decentralize the property registration process. All transactions made through the system are visible to anyone across the globe, which reduces corruption through illegal property investments and makes it virtually impossible for any person or organization to exploit the system.

Keywords Blockchain · Non-Fungible Tokens · Smart contract · Property registration

S. Prathibha (B) · V. Saiganesh · S. Lokesh · M. A. Kishore
Department of Information Technology, Sri Sairam Engineering College, Chennai, India
e-mail: prathibha.it@sairam.edu.in
V. Saiganesh
e-mail: saiganesh0605@gmail.com
S. Lokesh
e-mail: lokeshsr1854@gmail.com
M. A. Kishore
e-mail: kishorema@ieee.org
A. Maheshwari
Department of CINTEL, SRM University, Chennai, India
e-mail: maheshwa1@srmist.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 347
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_30
348 S. Prathibha et al.

1 Introduction

Blockchain is a decentralized type of ledger that keeps track of transactions and records data. Each block in the chain-like structure contains data, its own hash, and the hash of the previous block as its contents. As a result, the blocks are linked together using hash values obtained by cryptography. The types of blockchains include private (restricted to a specified organization), public (any anonymous user can be added), and federated (a partly private blockchain). A blockchain involves the following significant elements: hash, miner, consensus, proof of work, and proof of stake. Each of these elements plays a role in maintaining the blockchain [1].
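The hash linking described above can be illustrated with a short Python sketch (illustrative only; not part of any production blockchain):

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """SHA-256 over the block's canonical JSON (data plus previous hash)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

# Each block stores its data and the hash of the previous block,
# which is what links the chain together.
genesis = {"data": "genesis", "prev_hash": "0" * 64}
block1 = {"data": "property registered", "prev_hash": block_hash(genesis)}
block2 = {"data": "property transferred", "prev_hash": block_hash(block1)}

# Tampering with an earlier block invalidates every later link.
original = block_hash(genesis)
genesis["data"] = "tampered"
assert block_hash(genesis) != original
assert block1["prev_hash"] == original  # block1 still points at the old hash
```

Because each block commits to its predecessor's hash, altering any historical record forces every later hash to change, which is what makes the ledger tamper-evident.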
Digital assets known as Non-Fungible Tokens (NFTs) signify ownership of a
special good or piece of material, such as a work of art, a song, or a tweet. Because
they are built on blockchain technology, their validity is guaranteed and duplication
is avoided. NFTs, a brand-new class of digital collectibles and investments, have
recently attracted a lot of attention and interest, with some going for millions of
dollars. They challenge conventional ideas of ownership and value in the digital
world while providing creators and collectors with a fresh method to commercialize
and display their digital works. However, there are worries about NFTs’ potential for
fraud and their effects on the environment.
Property registration systems around the world generally make use of databases and some form of website or application to register or transfer properties, either through a designated official in a government office or through online means. These systems are being exploited by people in many ways; moreover, they are common targets for hackers with malicious intent. Our solution aims to solve this problem by implementing a blockchain-based property registration, transfer, and verification service that provides end-users with a simple and easy-to-use interface. An NFT, or Non-Fungible Token, is a unique token created and stored in a blockchain wallet, and its current owner possesses all rights to it. Hence, this system provides a digital property registration, verification, and transfer system that is more secure than the conventional one. Because all records stored on the blockchain can be viewed by anyone, no property can be illegally owned. Moreover, owing to the increased security, the entire process can be done online as opposed to requiring physical validation and verification.

2 Literature Review

The following literature survey provides detailed and distinct information about prop-
erty registration using a decentralized blockchain application. The findings and the
drawbacks of each architecture have been provided.
Blockchain has several applications, including real estate, as it provides an excel-
lent service in this field. The safety and transparency that blockchain provides are
highly beneficial in real estate transactions. For instance, when buying a house, the

ownership is established through a physical deed or an agreement that is recorded by the government in a ledger. However, if the page containing the ownership information is destroyed or misplaced, it becomes challenging to prove the ownership of
the property. To address this challenge, blockchain technology offers a system that
enables the creation of a digital deed through a smart contract form, which is recorded
as a new block in the chain. This ensures that each node has a copy of the informa-
tion, which is securely stored on multiple servers. Thus, blockchain technology can
significantly enhance the safety and transparency of real estate transactions [2].
The land registration system in India is known for its time-consuming procedures
and involvement of multiple intermediaries, leading to an increase in fraudulent
cases. To overcome these challenges, blockchain technology can be utilized for land
registration management in India. The current system faces issues such as a lack
of transparency and accountability, inconsistent data sets across different govern-
ment departments dealing with the same property, and delays in the land registration
administration process. This study highlights the significance of smart contracts for
land registration utilizing blockchain technology and throws light on the country’s
current land records maintenance and registration system [3].
The issuance of land registry documents by the government to landowners as
proof of ownership is a crucial aspect of the legal system. For developing coun-
tries, it is necessary to address the challenges faced by the traditional system of land
registration. Blockchain technology offers a potential solution to maintaining a trans-
parent and secure digital record of land holdings. However, any implementation of
a blockchain-based solution must be carefully integrated. This study suggests a novel approach that improves delegated proof of stake consensus, which can create an exclusive record-based system for trading land assets. The existing conventional land registry system may be easily connected with this one, ensuring smooth operation [4].
The safety of land records is compromised by duplication and inefficiency in the land registry system followed today, leading to adverse consequences for citizens. Fraudsters can easily misuse the current registration system, thereby cheating not only the government but also the general public. This research proposes the use of blockchain and majority consensus to develop a secure land registration architecture. The deployment of blockchain in land registration significantly reduces security concerns. As each block is connected to its previous hash, the hash value is unique, with the SHA256 hashing method being used. The proof of work (PoW) approach with SHA256 enhances the security of the data associated with every transaction. A node is responsible for creating the next blocks and linking them to the existing blockchain; hashes are used to connect transactions. By this methodology, 99% of the manual work can be reduced [5].
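The SHA256-based proof-of-work linkage described in [5] can be sketched as follows (a toy example with an artificially low difficulty; real networks use far harder targets and richer block formats):

```python
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

def mine(block_data: str, prev_hash: str, difficulty: int = 3):
    """Search for a nonce whose block hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        digest = sha256_hex(f"{prev_hash}{block_data}{nonce}")
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

# Mine a block whose previous hash is the (all-zero) genesis hash.
nonce, digest = mine("transfer parcel 42 to new owner", "0" * 64)
print(nonce, digest)  # the found nonce and a hash with three leading zeros
```

Verifying the work is cheap (one hash), while producing it requires many attempts; that asymmetry is what makes rewriting linked history computationally expensive.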
The current land registration system is inefficient and unable to ensure transaction
security or prompt resolution due to numerous cases of faked titles, official fraud,
and delayed ownership transfers. A blockchain-based land registration system is
recommended as a solution to this issue because it offers security and transparency.
Blockchain technology stands out for its decentralization, immutability, and perma-
nence, which increase productivity and cut expenses. The proposed solution in this
paper is a decentralized application developed and deployed on the Ethereum network that makes use of smart contracts, React, and JS for the server and routing. The positive results of the investigation show that the proposed method is a practical and efficient fix [6].
The digitization of land registry systems is in its early stages of deployment in
several nations throughout the world. The present land recording system, particularly in India, is weak and easily manipulated. The primary concerns with land management and registration are a lack of transparency in record keeping (among the concerned public officials and associated private individuals), time-consuming access and validation of information, and sluggish processing caused by the large number of officials participating in the procedure and the absence of necessary data on their end (such as conflicting, insufficient, or outdated data). In this paper, the authors attempted to use the features of blockchain
technology to address these issues without jeopardizing or eliminating the manda-
tory roles of government officials. The Hyperledger Fabric permissioned blockchain
is utilized to build the proposed system. The functions of sub-registrar offices are
considered government personnel and must adhere to the land title registration rules.
In addition to enforcing the requirement, the Hyperledger chain codes provide a
straightforward mechanism for land registration and maintenance [7].
Land registration is a procedure that provides landowners with legally binding
land titles. Creating a blockchain system that stores all transactions made during
the process allows for the development of a system that automatizes and keeps land
registry records unchangeable. The immutability of blockchain provides numerous
security features that will protect the system from hacking and other intruder misbe-
havior. It secures the records stored in a block of data using strong cryptographic
protocols and standards, improving the land registry’s ability to transfer land owner-
ship from a seller to a new buyer more reliably. This paper describes a system based
on the Hyperledger Fabric distributed ledger that would record and retain real-time
transactions conducted throughout the land registration process, reducing the possi-
bility of fraud because immutable transactions are recorded in the permissioned
distributed ledger network [8] (Table 1).
The literature survey emphasizes how blockchain technology has the potential
to improve land registration procedures’ security, transparency, and efficiency. The
majority of currently published papers concentrate on specialized topics like fraud
prevention, smart contracts, or the use of blockchain in a particular region. However,
there are not many thorough studies that focus on the difficulties and needs of the real
estate industry. Our Propchain system is a comprehensive solution that combines the
advantages of blockchain technology with a focus on the real estate sector and land
registration processes, driven by the gaps found in the existing literature.

Table 1 Comparison of existing systems with Propchain

[2] Disadvantage: beyond preventing fraud, it may not cover all facets of property registration and management. Propchain: includes complete property registration and administration tools to address every step of the process and stop fraud.

[3] Disadvantage: besides smart contracts, it might not give a complete picture of blockchain-based land registration solutions. Propchain: implements a comprehensive land registry solution that includes smart contracts and extra features for increased accountability and transparency.

[4] Disadvantage: only the land registry system in Bangladesh is considered, so larger applications or a worldwide viewpoint might not be covered. Propchain: provides a flexible solution that may be used in different scenarios around the world and is not restricted to any one nation or region.

[5] Disadvantage: further research may be necessary to determine the scope of the innovation and its application beyond the specific model. Propchain: provides an innovative and safe blockchain-based method for exchanging ownership of land.

[6] Disadvantage: further research may be necessary to determine the suggested system's scope and scalability in broader situations. Propchain: guarantees the system's scalability and flexibility in broader settings, delivering a reliable and effective land registration solution.

[7] Disadvantage: may not go into specific permissioned-blockchain implementation problems or technical implementation details. Propchain: with a permissioned blockchain as its base, ensures the security and dependability of the land registration system.

[8] Disadvantage: additional research may be necessary on the specifics of the distributed ledger implementation and its applicability in various situations. Propchain: makes use of distributed ledger technology to guarantee data consistency and openness in the land registry.

3 Proposed System

The proposed solution will implement a blockchain-based property registration, transfer, and verification service providing end-users with a simple and easy-to-use interface, as a package that can be implemented by any organization irrespective of the country or province in which it is deployed. It makes use of Non-Fungible Tokens, or NFTs for short. NFTs are a form of digital ownership that can be compared to a real token, where the owner of a token possesses the right to enter an event or own some object. Similarly, each NFT is kept on the blockchain along with a unique identifying code and metadata. Here, "metadata" means "data about data": a small amount of additional information that describes the NFT and is stored alongside it. The main purpose of Non-Fungible Tokens is to provide owners with a sense of ownership of digital assets, which was previously not possible.
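For illustration, the metadata stored alongside a property NFT might look like the following sketch; the field names are hypothetical, chosen to mirror the details collected in the registration module, and are not taken from the actual contract:

```python
import json

# Hypothetical metadata payload for a property NFT; the real field names
# and values would come from the registration form and approval process.
property_metadata = {
    "asset_id": "PROP-0001",
    "property_type": "residential",
    "form": "apartment",
    "location": "Chennai, India",
    "last_known_price": 5000000,
    "current_owner": "S. Example",
}

# Serialized deterministically, so identical metadata always hashes the same way.
payload = json.dumps(property_metadata, sort_keys=True)
print(payload)
```

A payload like this can be stored with the token (or referenced from it), letting anyone who looks up the token read the descriptive details of the underlying property.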

Fig. 1 New property registration module

3.1 Registration Module

This front-end module is used by the end-user to register any new property (resi-
dential/commercial) and create a Non-Fungible Token for that particular property
(Fig. 1).
Steps for registering a property:
Step 1: The user opens the registration page from the homepage.
Step 2: The user then proceeds to fill up all necessary info regarding the property
that is about to be registered.
Step 3: The user will then be redirected to a page where they can make a note of
their Asset ID for their reference.
Step 4: Once the transaction gets approved by an official, a unique NFT
representing their property will be transferred to their Blockchain account.

3.2 Transfer Module

This front-end module is used by the end-user to transfer an existing property by initiating a token transfer from the current owner of the property to the new owner, made possible by the use of a Non-Fungible Token (Fig. 2).
Steps for transferring a property asset:
Step 1: The user opens the transfer page from the homepage.

Fig. 2 Property transfer module



Fig. 3 Property verification module

Step 2: The user then proceeds to fill up all necessary info regarding the property
including the Asset ID that the property is assigned.
Step 3: The user will then be redirected to a page that notifies them about the
completion of the process from their end.
Step 4: Once the transaction gets approved by an official, the unique NFT
representing that property will be transferred to the new owner’s Blockchain
account.

3.3 Verification Module

This front-end module is used by the end-user to view all the details about a property
including the validity of the property and the NFT in question, the last registration/
transfer details, the government official who approved the transaction, the last known
price of the property, date of last transaction, details about the current owner, and so
on (Fig. 3).
Steps for verifying a property asset:
Step 1: The user opens the verify page from the homepage.
Step 2: The user then proceeds to fill up the Asset ID that they are provided for
that property.
Step 3: The user will then be redirected to a page where they can find more
information regarding the property.

3.4 Admin Module

This module is mainly used to validate all property transactions (registration/transfer) by a government official to make sure that the property transaction initiated by the user is legally acceptable and that normal practices are involved in the property transaction.
Steps for using the admin panel:
Step 1: The official opens the admin page using the URL provided to them.
Step 2: They log into the secured admin panel.
Step 3: Each property transaction will be displayed along with its type and Asset
ID. When Asset ID is clicked, more information about the property transaction
appears.
Step 4: The official can either approve the transaction or deny it based on their
current status on this page.
Our proposed system provides a comprehensive solution for land registry,
addressing the specific challenges of the real estate domain. Unlike existing studies
focused on fraud prevention or smart contracts, Propchain incorporates decentral-
ized consensus, immutable audit trails, and secure digital identity management
which enhances the integrity, reliability, and trustworthiness of land registry systems.
The integration of Non-Fungible Tokens (NFTs) adds security and uniqueness to
property ownership. Propchain’s focus, functionality, and practicality distinguish it
from existing works and contribute to advancing blockchain applications in land
registration.

4 Working

The proposed system is a user-friendly platform that utilizes blockchain technology to enhance security and transparency in property registration, transfer, and verification.
With its modules designed for specific purposes, it offers a seamless experience to
users.
The new property registration module enables the registration of properties not
currently in the system. Users input property details, including address, purchase
price, owner’s name, and ID proof. After meeting government stipulations, a
government official conducts background checks to validate property legitimacy.
If approved, a unique Non-Fungible Token (NFT) is generated as a digital identifier
for ownership. The NFT is securely stored in the owner’s crypto account, providing
irrefutable proof of ownership.
For existing properties, the title transfer module facilitates smooth ownership
transfers. Users provide transfer details, and a government official verifies the
transfer’s legitimacy. The Non-Fungible Token representing the property is then
transferred to the new owner’s crypto account, ensuring a secure transition.
The verification and validation feature allows users to access comprehensive prop-
erty details using the Property ID. This includes NFT ID, address, price, current
owner, and approving official’s name, empowering interested parties.
A smart contract is utilized to enable NFT functionalities, governing token oper-
ations. Custom functionalities prevent duplication, store data, and retrieve the last
minted token ID. The ERC721 Smart Contract provides the framework.
During token minting, a unique ID is assigned to each Non-Fungible Asset,
ensuring uniqueness. Property data is securely stored within the NFT, guaranteeing
ownership rights and data integrity.

For property transfers, an NFT is smoothly transferred using a custom-defined method derived from the ERC721 Smart Contract. The data is updated, and after
necessary checks, the property is transferred, maintaining NFT integrity.
In summary, the system leverages blockchain technology and NFTs to provide
a secure and efficient solution for property registration, transfer, and verification,
promoting transparency and trust.

5 Implementation

The proposed system utilizes blockchain technology and NFTs to create a user-
friendly platform for property registration, transfer, and verification. Users can
register new properties, and upon approval, a unique NFT is created as proof of owner-
ship. Existing properties can be transferred securely between owners through a title
transfer module. Property verification is made possible by entering the Property ID,
which provides comprehensive details about the property. A smart contract governs
NFT functionalities, preventing duplication and ensuring data integrity. The system
handles token minting and property transfers seamlessly, guaranteeing transparency
and trust in property transactions (Fig. 4).
The smart contract stores property and owner data, mints property tokens, and
facilitates their transfer. The gatherOwnerData() method obtains input from the UI
and stores it as structures in the Blockchain. Similarly, gatherPropertyData() stores
property details. The mint() function creates a Non-Fungible Token and links it to the
registered property. The transferToken() method transfers the token to a new owner,
updating the token’s information.
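The contract flow described above can be modeled with a minimal Python sketch (the real system is an ERC721 Solidity smart contract; the snake_case methods below merely stand in for gatherOwnerData(), gatherPropertyData(), mint(), and transferToken()):

```python
class PropchainSketch:
    """Toy in-memory model of the contract behavior described above."""

    def __init__(self):
        self.owner_data = {}     # asset_id -> owner details
        self.property_data = {}  # asset_id -> property details
        self.asset_token = {}    # asset_id -> token_id (prevents duplicate minting)
        self.token_owner = {}    # token_id -> wallet address
        self.next_token_id = 1

    def gather_owner_data(self, asset_id, details):
        self.owner_data[asset_id] = details

    def gather_property_data(self, asset_id, details):
        self.property_data[asset_id] = details

    def mint(self, asset_id, wallet):
        """Create an NFT for a registered property; one token per property."""
        if asset_id in self.asset_token:
            raise ValueError("property already tokenized")
        token_id = self.next_token_id
        self.next_token_id += 1
        self.asset_token[asset_id] = token_id
        self.token_owner[token_id] = wallet
        return token_id

    def transfer_token(self, asset_id, new_wallet):
        """Move the property's token to the new owner's wallet."""
        self.token_owner[self.asset_token[asset_id]] = new_wallet

# Registration followed by a transfer, mirroring the two front-end modules.
chain = PropchainSketch()
chain.gather_owner_data("PROP-1", {"name": "Seller"})
chain.gather_property_data("PROP-1", {"type": "residential"})
tid = chain.mint("PROP-1", "wallet-seller")
chain.transfer_token("PROP-1", "wallet-buyer")
```

The duplicate-mint check corresponds to the custom functionality that prevents a property from being tokenized twice, and the token-to-wallet mapping is what the transfer module updates.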
Figures 5 and 6 denote the front-end (user interface) of the proposed solution,
specifically the property registration module and the property transfer module. In
the case of property registration, necessary details regarding the owner such as full
name, ID number (Example: SSN, Aadhar, etc.), date of birth, owner’s wallet address,
current address, and contact details along with property details such as its location,
type (residential/commercial), form (apartment/single house), current price, and the
name of the previous owner are obtained. During property transfer, similar details
are obtained to facilitate property transfer from the old owner to the new owner.
When the register button is clicked, the data is sent for approval by the appropriate
authority after which the data is stored in the blockchain, and the necessary methods
are triggered.

Fig. 4 Algorithm

6 Conclusion

In conclusion, Propchain is a decentralized property management system that revolutionizes property registration, transfer, and verification by utilizing blockchain
technology and NFTs. It has various advantages over current systems. Propchain, in
contrast to conventional systems, offers improved security and transparency thanks
to its decentralized structure and blockchain's immutability. The use of smart contracts provides safe storage of owner and property data, lowering the possibility of fraud
and unauthorized ownership.

Fig. 5 UI component for property registration

Fig. 6 UI component for property transfer

The adoption of NFTs makes it possible for an easy and safe transfer of ownership, doing away with the need for middlemen and cutting
down on transaction costs. Giving property owners more control and transparency
over their assets promotes accountability and confidence in business dealings. Future
work on this project may include additional research and development in areas for
improved security and privacy, looking into scalability options to handle a greater
volume of property transactions, and performing real-world testing and validation of
the Propchain system.

Criminal Prevision with Weapon
Identification and Forewarning Software
in Military Base

V. Ceronmani Sharmila, A. Vishnudev, and S. Gautham

Abstract Nowadays, it is especially important to pay close attention to security
in all sectors, particularly military areas, banks, airports, and railway stations. We
need security in all sectors because the world is developing day by day along with
technological advancement, and with it, various kinds of attacks and problems also
arise; to prevent these, we must ensure proper security in each sector. Here, we
discuss one such security measure: a system used in military regions to find out
whether an intruder has entered the premises (unauthorized access) and to identify
whether he is carrying any weapons, thereby showing the threat level and giving the
officials a forewarning.

Keywords Technology advancement · Security · Unauthorized access · Weapon
identification · Threat level · Forewarning

1 Introduction

The intrusion of an unauthorized person into a military base is a nationwide threat.
Military bases are critical installations holding national secrets, equipment, and other
sensitive information; intrusion can be physical, such as unauthorized entry into the
base, or digital. These locations are critical to national security and are often targeted
by hostile forces or individuals with malicious intent. To ensure the safety and
security of military bases, a wide range of security measures has to be deployed to
prevent and detect intrusion, including perimeter security, intrusion detection, and
surveillance. But the old deployed system functions manually, takes a lot of time to
find an intruder, and is open to human error; a highly effective intrusion detection
system in a military base is therefore particularly important for maintaining
operational security and

V. Ceronmani Sharmila (B) · A. Vishnudev · S. Gautham


Department of Information Technology, Hindustan Institute of Technology and Science, Chennai,
India
e-mail: csharmila@hindustanuniv.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 359
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_31

ensuring the safety of personnel and equipment. Any breach of security could have
profound consequences, including the theft, damage, or destruction of sensitive
information or equipment, as well as potential threats to national security. As such,
constant vigilance and proactive security measures are necessary to prevent and
detect intrusion in military bases.
There are many methods of intrusion detection deployed in military bases and at
the country's borders; some of these systems are:
• Perimeter security: Security is maintained by secured fences, walls, or other
barriers that prevent unauthorized access.
• Motion sensors: Sensors are installed in the sensitive areas of military bases to
detect movement.
• Access control systems: Control systems can be deployed to regulate and monitor
access to the military base and its facilities.
• Patrols: Regular patrols by security personnel can help detect any unauthorized
activities and respond quickly to potential threats.
As security in the military sector is especially important, the accuracy and precision
of the detection software should be equally high, securing the base against infiltrators
who would destroy or collect confidential data. In this proposed system, we use
YOLOv3 for person detection, Dlib for facial recognition, and YOLOv5 for weapon
detection. We have trained the YOLOv5 algorithm with a dataset of 2000 weapon
images to increase precision and accuracy; for identifying the person, we use the
pre-trained dataset of YOLOv3, and for facial recognition, we use Dlib and its face
recognition pipeline, which performs face alignment using facial landmarks. The
main purpose of our software is to detect the intruder's infiltration into the base and
the weapons carried in hand, in order to analyze the threat level.

2 Related Works

Diverse types of systems with various kinds of solutions were used to address the issue
and were also helpful. Several systems for human intrusion detection already exist
with diverse types of algorithms used. In these types of systems, only the detection
of the intruders was implemented, though some were highly effective. But only a
limited number of systems detect both the intruders and the weapons they carry,
and these systems are not efficient, so they are not considered that important. Based
on our project idea, we did a lot of research work and went through several papers.
One such paper was [1], which uses millimeter waves (mmWs) to detect weapons
placed inside containers; Gonzalez-Sosa and Fierrez, along with two other members,
suggested a method in which mmW imaging technology is used. Then comes another
system, for monitoring regions, that uses a fast steering mirror (FSM). While
addressing this system [2], Tianqing et al. along with

the other two members set it in such a way that it can be used to monitor hot targets
and hot regions. One of the systems that is used in the military for target detection
was [3] proposed by Kong et al. Here, mainly the YoloV3 algorithm is treated with
a lightweight neural network (GhostNet) which makes it more precise and accurate.
It is used to detect military targets. Then we dealt with another paper [4] which is
designed in such a way that the presence of light should not affect the system. This
was one of the main concerns of Li, Yang, and the other three while designing the
system. Interestingly, although several detection and tracking mechanisms exist, they
use the RetinaFace and CamShift algorithms so that the system is not affected by
light. It functions with the P steering gear in the PID controller.
The paper [5] proposed by Zhang et al. mentions the tactics of ground-to-air
defense, formulated as a dynamic heterogeneous sensor-weapon-target assignment.
Some works, such as [6], focus on deep learning algorithms: when Bhatti, Khan,
and two others developed their system with binary classification, the pistol class was
taken as the reference class, and they also introduced relevant confusion objects to
minimize false positives. Ruiz-Santaquiteria, Velasco-Mata, and four others [7]
proposed a method using human pose and weapon appearance; they used body pose,
especially shotgun posture, which is important, so they made a particularly good
approach in this project. In [8], Hashim et al. proposed the use of ML for PPA
detection of weapons using a CNN; it pinpoints the accuracy of automatic weapon
identification for surveillance purposes. The next project, by Xu et al. [9], has as its
main objective the fully automated detection of prohibited items inside airports, and
its efficiency and stability have been demonstrated on X-ray and GDXray image
datasets. Rahul Chiranjeevi and Malathi [10] worked on detecting weapons in
surveillance scenes using masked R-CNN.

3 Research Method

We are using two versions of the same algorithm in this criminal prevision and
weapon identification, and for facial recognition, we are using Dlib. When it comes
to weapon detection, we are using YoloV5, and for personal identification, we are
using YoloV3. YoloV3 is a real-time object detection algorithm that was released in
April 2018. It is the third version of the popular You Only Look Once (YOLO) series
of object detection algorithms. It uses deep learning architecture to detect objects in
images and videos with high accuracy and speed. In this version, object detection
and classification are performed in a single step; it is the most updated version of
the CNN deep learning model used here, and predefined data are directly given as
test data to obtain accurate output (Fig. 1). We are using YOLOv3 because
it is designed to be fast and efficient, and it can process images and videos in real-time.
It divides the input image into grid cells, and each cell predicts a set of the bounding
boxes. It also uses many other techniques to increase the accuracy like the use of
multi-scale prediction, anchor boxes, and feature pyramid network which gives the

Fig. 1 Face detection. This image is taken as a screenshot from the computer while running the
output for detecting the faces of the people using Yolo and Dlib

output with less computation time and more accuracy. While testing the person's face
detection, we have used the pre-defined data already set in YOLOv3, which is
extremely fast compared to CNN, Fast R-CNN, Faster R-CNN, and YOLOv1 and
YOLOv2; its accuracy is very high, as its mAP (mean average precision) measured
at 0.5 intersection over union (IOU) with focal loss is reached about 4× faster. On
face detection, it predicts close faces with a highest value of 95% and still gets the
same results on many faces, with an average accuracy of 95%.
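To make the grid-cell and anchor-box scheme concrete, here is a minimal sketch (ours, not the authors' code) of how a YOLO-style prediction (t_x, t_y, t_w, t_h) from one grid cell is decoded into a bounding box; the cell indices and the 116 × 90 anchor below are illustrative values, not taken from the paper.

```python
import math

def decode_yolo_box(tx, ty, tw, th, cell_x, cell_y,
                    anchor_w, anchor_h, grid_size, img_size):
    """Decode one YOLO prediction into (x_center, y_center, w, h) in pixels.

    The center offset is squashed with a sigmoid so it stays inside the cell,
    and the width/height scale a prior anchor box by exp(t).
    """
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = (sigmoid(tx) + cell_x) / grid_size * img_size
    by = (sigmoid(ty) + cell_y) / grid_size * img_size
    bw = anchor_w * math.exp(tw)
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh

# Example: cell (6, 6) of a 13x13 grid on a 416x416 image, with a
# hypothetical 116x90 anchor; zero offsets land on the cell center.
box = decode_yolo_box(0.0, 0.0, 0.0, 0.0, 6, 6, 116, 90, 13, 416)
print(box)  # (208.0, 208.0, 116.0, 90.0): cell center, box sized like the anchor
```

In the full detector, each grid cell emits several such predictions (one per anchor), plus an objectness score and class probabilities.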
Dlib is a popular library for face recognition and has been used in various face
recognition applications. It provides several tools and algorithms that make it easy to
develop robust face recognition systems and also provides a pre-trained face recogni-
tion model called the face recognition ResNet model, which is based on deep learning
techniques and achieves state-of-the-art performance on face recognition tasks. The
model is trained on a large dataset of face images and can recognize faces with high
accuracy even under varying lighting conditions, facial expressions, and poses.
We use facial landmark detection, which is the process of locating and identifying
specific points on a face, such as the eyes, nose, mouth, and chin. Dlib provides a
pre-trained shape predictor model that can be used to detect 68 facial landmarks on
a face.

4 Methodology

4.1 Data Preprocessing

Image datasets containing 2000 images were collected to test and train the data, and
the following process was undertaken to train and test the dataset.

• Data cleaning: Remove any irrelevant or duplicated data and handle missing values
or outliers.
• Image resizing: Resize the images to a consistent size that is compatible with
YOLOv5’s input size.
• Annotation format conversion: Convert the annotations of the images to the YOLO
format, which consists of a text file for each image containing the class label and
the normalized coordinates of the object’s bounding box.
• Labeling: Label the weapons in the images with appropriate categories (such as
“pistol,” “rifle,” “shotgun”) and their corresponding bounding boxes, as shown in
Fig. 3.
• Image augmentation: Apply various image augmentation techniques such as rota-
tion, flipping, and random cropping to increase the size of the dataset and improve
the model’s ability to generalize to new data.
• Data splitting: Split the dataset into training, validation, and testing sets to evaluate
the model’s performance and prevent overfitting.
• Normalization: Normalize the pixel values of the images to a common scale to
ensure consistency across the dataset.
• Encoding: Encode the class labels in the YOLO format to numerical form for use
in the model.
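The annotation format conversion step above can be sketched as follows. This is an illustrative helper, not the authors' script; the class index 3 for "pistol" and the image size are assumptions for the example.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-coordinate box to one YOLO-format annotation line:
    '<class> <x_center> <y_center> <width> <height>', normalized to [0, 1]."""
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# Hypothetical example: a "pistol" (class 3) box on a 640x480 image.
line = to_yolo_line(3, 100, 120, 300, 360, 640, 480)
print(line)  # "3 0.312500 0.500000 0.312500 0.500000"
```

One such text file per image, with one line per labeled object, is what YOLOv5-style training expects.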

4.2 Simulations and Analysis

Simulation and analysis of our project have been done by integrating weapon
detection in YOLOv5, person detection in YOLOv3, and personal identification
in Dlib. Each version of YOLO had its own dataset, and Dlib has its own libraries
and pre-defined data. YOLOv3 uses the predefined dataset of the YOLO algorithm
for person identification and person detection, which has an accuracy of 95% and
above. The Dlib library is used for face detection through facial landmark detection,
the process of locating and identifying specific points on a face, such as the eyes,
nose, mouth, and chin; Dlib provides a pre-trained shape predictor model that can
be used to detect 68 facial landmarks on a face image.
Figure 1 shows the output of face detection using the Dlib libraries, where the
faces of the people in the video are stored in the database so the person can be
identified while running the test video.
Figure 2 shows person detection on a test video of people walking, with each
detected person shown in a bounding box.
Figure 3 shows the images and names stored in the database. The algorithm finds
the matching face in the database and gives the output as "john", and when no face
is found, the output is "NO FACE FOUND".
In Fig. 4, if a person’s face is not clear or turns backside, the output shows as “NO
FACE FOUND.”

Fig. 2 Person detection. This image is taken as a screenshot from the computer while running
the output for detecting the person using YOLOv3

Fig. 3 Database for face detection as John. This image is taken as a screenshot from the computer
while running the output for showing the saved face of people as his name from the stored database

Fig. 4 Detection of NO FACE FOUND. This image is taken as a screenshot from the computer
while running the output for detecting the faces of the people using Dlib

Fig. 5 Intrusion detection. This image is taken as a screenshot from the computer while running
the output for detecting the intruder by checking from the saved database

Figure 5 shows the intruder detection part: if the algorithm finds a person who is
not in the database, it shows an alert message, "INTRUDER DETECTED".
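The database lookup behind this behavior can be sketched as below. This is an illustrative stand-in, not the authors' code: real Dlib face encodings are 128-dimensional, and the 0.6 distance threshold and the toy 4-dimensional vectors are assumptions for demonstration.

```python
import math

def identify(face_encoding, database, threshold=0.6):
    """Return the name of the closest database face, or an intruder alert.
    `database` maps known names to face-encoding vectors (lists of floats)."""
    best_name, best_dist = None, float("inf")
    for name, known in database.items():
        dist = math.dist(face_encoding, known)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_name is not None and best_dist <= threshold:
        return best_name
    return "INTRUDER DETECTED"

# Toy 4-dimensional "encodings" (real Dlib encodings are 128-dimensional).
db = {"john": [0.1, 0.2, 0.3, 0.4]}
print(identify([0.1, 0.2, 0.3, 0.45], db))  # "john" (distance 0.05)
print(identify([0.9, 0.9, 0.9, 0.9], db))   # "INTRUDER DETECTED"
```

Any face whose nearest stored encoding is farther than the threshold is treated as unknown, which is exactly the condition that raises the intruder alert.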
YOLOv5 is used in the weapon detection process, for which we have provided six
classes of weapons across 2000 images; the classes are:
• Grenade
• Gun
• Knife
• Pistol
• Handgun
• Rifle.
The dataset is trained and tested across all classes together to obtain mAP, recall,
and precision; the training results are shown in Table 1. As shown in Table 1, the
average precision over all class datasets is 0.875, and the mean average precision is
0.934.

Table 1 Train result of class weapons


Class Precision Recall mAP50 mAP 50–95
All 0.875 0.88 0.934 0.597
Grenades 0.793 0.936 0.947 0.633
Gun 0.913 0.809 0.915 0.567
Knife 0.864 0.857 0.906 0.543
Pistol 0.884 0.927 0.951 0.665
Handgun 0.909 0.906 0.941 0.57
Rifle 0.887 0.846 0.947 0.607
This image is taken as a screenshot from the computer while getting the output from testing the
weapon dataset in Google Collab
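The "All" row in Table 1 is consistent with simple per-class averaging. As a quick cross-check (our sketch, not part of the authors' pipeline), the snippet below recomputes the averages from the class rows:

```python
# Per-class rows from Table 1: (precision, recall, mAP50).
rows = {
    "Grenades": (0.793, 0.936, 0.947),
    "Gun":      (0.913, 0.809, 0.915),
    "Knife":    (0.864, 0.857, 0.906),
    "Pistol":   (0.884, 0.927, 0.951),
    "Handgun":  (0.909, 0.906, 0.941),
    "Rifle":    (0.887, 0.846, 0.947),
}

def column_mean(index):
    """Average one metric column over the six weapon classes."""
    return sum(r[index] for r in rows.values()) / len(rows)

print(f"precision mean = {column_mean(0):.3f}")  # 0.875, matching the 'All' row
print(f"mAP50 mean = {column_mean(2):.4f}")      # 0.9345, reported as 0.934
```

The recall column averages to about 0.880 in the same way, matching the table.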

Fig. 6 Confusion matrices of weapon dataset graph. This image is taken as a screenshot from the
computer while getting the output from testing the weapon dataset in Google Collab

Figure 6 shows the confusion matrix of the weapon dataset.

After testing and training, the resulting model is used in the YOLOv5 algorithm
for the detection of weapons in the given test video.
Figure 7 shows the result of weapon detection: in the given test video, all the
weapons, such as the handgun, grenade, and knife, are detected.
With all the detection parts integrated into one single piece of software, both the
detection of intruders and the detection of weapons work properly. We have coded
it in such a way that only the intruder's weapon is detected; the weapon of a person
who is not an intruder is not detected, so no threat is flagged, as shown in Fig. 8. A
test video is given in which the intruder's image is not in the database, so the software
can detect him, and he is carrying a handgun to be detected; our software detected
the intruder with his gun successfully.

Fig. 7 Weapon detection. This image is taken as a screenshot from the computer while running the
output for detecting the knife using YoloV5

Fig. 8 Intruder detection and weapon detection. This image is taken as a screenshot from the
computer while running the final output of the project from testing the intruder detection along with
the weapon

5 Comparative Table

Table 2 provides a general overview and comparison between YOLO-based
approaches for weapon detection and face detection and some existing approaches.
The specific performance and characteristics of each approach may vary based on
the version of YOLO, implementation details, training data, and other factors.

Table 2 Comparison of results with existing methods


Approach YOLO for YOLO for face Existing approach Existing approach
weapon detection detection for weapon for face detection
detection
Accuracy High High Low Low
Speed Fast Fast Slow Slow
Object detection Multiple Multiple Individual Individual
Training data Large dataset Large dataset Large dataset Large dataset
False positives Less chance Less chance High chance High chance

6 Conclusion

In conclusion, the You Only Look Once (YOLO) algorithm has emerged as a
powerful and effective tool for face and weapon identification systems. YOLO’s real-
time object detection capabilities, combined with its ability to handle multiple objects
simultaneously, make it an ideal choice for security and surveillance applications.
When it comes to face identification, YOLO excels at accurately detecting and
localizing faces within images or video frames. Its ability to process images in real-
time allows for quick and efficient face recognition, enabling applications such as
access control, forensic investigations, and public safety monitoring. By leveraging
deep learning techniques and a large amount of training data, YOLO can accurately
classify and identify individuals based on their facial features.
Moreover, YOLO’s capabilities extend beyond face identification to weapon
detection, which is crucial in maintaining public safety and preventing acts of
violence. With its high accuracy and real-time processing, YOLO can quickly detect
and classify various types of weapons, including firearms, knives, and explosives.
This enables security personnel and law enforcement agencies to swiftly respond to
potential threats and take appropriate action.
The YOLO algorithm’s strengths lie in its speed, accuracy, and versatility. Its
ability to handle real-time processing of large volumes of data makes it an excellent
choice for face and weapon identification systems that require immediate responses.
Furthermore, YOLO’s performance is not limited to specific environments or lighting
conditions, making it adaptable to various scenarios, such as indoor surveillance,
public spaces, or even nighttime operations.
However, it is important to note that while YOLO is a highly effective tool,
no system is perfect, and there are certain limitations to consider. Factors such as
occlusions, varying viewpoints, and image quality can affect the accuracy of face
and weapon identification. Additionally, YOLO’s performance heavily relies on the
quality and diversity of the training data used during its development.
Finally, we have presented our proposed system, criminal prevision with weapon
identification and forewarning software, which detects intruders in the military base
along with the weapons they carry, so that border guards or the security guards in
the military base can easily take the necessary precautions when an intruder tries to
infiltrate the base to collect or destroy nationally classified data and future plans, or
brings any technology to destroy property or a weapon garage. To avoid all this, we
have proposed this project as intruder detection and forewarning software.

References

1. Gonzalez-Sosa E, Vera-Rodriguez R, Fierrez J, Patel VM (2017) Exploring body shape from
mmW images for person recognition. IEEE Trans Inf Forensics Secur 12(9):2078–2089. https://
ieeexplore.ieee.org/document/7904608
2. Tianqing C, Quandong W, Lei Z, Na H, Wenjun D (2019) Battlefield dynamic scanning and
staring imaging system based on fast steering mirror. J Syst Eng Electron 30(1):37–56. https://
ieeexplore.ieee.org/document/8660542
3. Kong L, Wang J, Zhao P (2022) YOLO-G: a lightweight network model for improving the
performance of military targets detection. IEEE Access 10. https://ieeexplore.ieee.org/doc
ument/9780377
4. Li J et al (2022) Face detection and tracking based on neural network. In: 2022 3rd international
conference on information science, parallel and distributed systems (ISPDS), Guangzhou.
https://ieeexplore.ieee.org/document/9874114
5. Zhang K, Zhou D, Yang Z, Kong W, Zeng L (2020) A novel heterogeneous sensor-weapon-
target cooperative assignment for ground-to-air defense by efficient evolutionary approaches.
IEEE Access 8:227373–227398. https://ieeexplore.ieee.org/document/9288732
6. Bhatti MT, Khan MG, Aslam M, Fiaz MJ (2021) Weapon detection in real-time CCTV videos
using deep learning. IEEE Access 9:34366–34382. https://doi.org/10.1109/ACCESS.2021.305
9170. https://ieeexplore.ieee.org/document/9353483
7. Ruiz-Santaquiteria J, Velasco-Mata A, Vallez N, Bueno G, Álvarez-García JA, Deniz O
(2021) Handgun detection using combined human pose and weapon appearance. IEEE Access
9:123815–123826. https://ieeexplore.ieee.org/document/9529187
8. Hashim N, Anto Sahaya Dhas D, Jayesh George M (2022) Weapon detection using ML for PPA.
In: Pandian AP, Palanisamy R, Narayanan M, Senjyu T (eds) Proceedings of third international
conference on intelligent computing, information and control systems. Advances in intelligent
systems and computing, vol 1415. Springer, Singapore. https://doi.org/10.1007/978-981-16-
7330-6_61
9. Xu M, Zhang H, Yang J (2018) Prohibited item detection in airport X-ray security images
via attention mechanism based CNN. In: Pattern recognition and computer vision, vol 11257.
ISBN: 978-3-030-03334-7. https://doi.org/10.1007/978-3-030-03335-4_37
10. Rahul Chiranjeevi V, Malathi D (2022) Detection of weapons in surveillance scenes using
masked R-CNN. In: Tavares JMRS, Dutta P, Dutta S, Samanta D (eds) Cyber intelligence and
information retrieval. Lecture notes in networks and systems, vol 291. Springer, Singapore.
https://doi.org/10.1007/978-981-16-4284-5_29
Advanced Cyber Security Helpdesk
Using Machine Learning Techniques

V. Ceronmani Sharmila, E. Vishnu, M. R. King Udayaraj, and V. Sharan Venkatesh

Abstract Numerous cybercrimes and attacks happen across the globe, even every
ten minutes. A large number of people are affected by these cybercrimes and attacks
across the world. Hence, the various attacks should be prevented, and people should
be made aware of cybercrimes and attacks so that they can prevent the crimes by
themselves. The attacks and crimes happen in a planned manner, so they can be
avoided by taking certain measures. An application that helps and guides people
to avoid cybercrimes and attacks on their systems will categorize the various
cybercrimes and attacks happening over the world and keep the users updated about
various cyber security events. The application works on a machine learning algorithm
to give solutions and preventive measures according to the description of the cyber
security incident given by the user. The fully developed application can also be used
as a social medium.

Keywords Cyber security helpdesk · Cyber-attacks · Solutions and preventive
measures · Trending crimes · Machine learning algorithm · Cyber security experts
discussion

1 Introduction

The first computer system was built in 1941, but as generations have passed, various
new types of computer systems have been invented and created. Nowadays, even
very compact computers are available. As the number of computer systems increases,
almost every person over the world uses computer systems such as mobile phones

V. Ceronmani Sharmila · E. Vishnu (B) · M. R. King Udayaraj · V. Sharan Venkatesh


Department of Information Technology, Hindustan Institute of Technology and Science, Padur,
Chennai 603103, India
e-mail: vishnuwolf51@gmail.com
V. Ceronmani Sharmila
e-mail: csharmila@hindustanuniv.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 371
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_32

and laptops. These computer systems make everyone's life easy by storing personal
data in their own system and keeping it safe. But as our generation and technology
emerge, these computer systems are hacked by some, who use the personal data of
others in an illegal manner. So, to protect these computer systems, cyber security
comes into effect. We use computer systems to store every personal file and data
that are useful in our day-to-day life. So, to maintain the security of our systems and
files, cyber security comes into action.
Cybercriminals are people who are well trained in carrying out various kinds of
cyber-attacks [1]. The increasing trend of cybercrime in India specifically involves
the use of social networking sites for fraudulent activities such as IRS impersonation
scams and technical support scams. The study also notes that cybercriminals are
difficult to trace and that most victims are in the age group of 20–29 years. These
cyber-attacks from cybercriminals can be blocked and prevented by the user taking
certain cyber security protocols and measures, but users should be well aware of the
attacks and measures. To help people against cyber threats and attacks and to make
them understand cyber security protocols, our application comes into action. Our
application, called "cyber hand", will help users on what kind of cyber security
measures should be taken based on the cyber-attack.
The users can get the measures and solutions to prevent the cyber-attacks they
are affected with by giving the descriptions about the attack. The application will
automatically show solutions and preventive measures related to the attack without
any human interventions. This application will use the protocols and methods of
a machine learning algorithm which will train the application for various attacks
from the user’s input. Soni and Bhushan [2] used machine learning to overcome the
limitations of traditional techniques and improve the efficiency of cyber security. The
paper examines how various machine learning algorithms can be applied to address
common issues in cyber security and improve the analysis and detection process.
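A minimal sketch of how such description-to-solution matching could work is shown below. This is our illustration, not the application's actual algorithm or data: it uses a stdlib-only bag-of-words cosine similarity, and every attack description and preventive measure in the toy knowledge base is made up for demonstration (a real system might instead use a trained classifier such as TF-IDF with an SVM).

```python
from collections import Counter
import math

# Hypothetical knowledge base: known attack description -> preventive measure.
KB = {
    "phishing email asking for password and bank details":
        "Do not click links; verify the sender and enable two-factor authentication.",
    "ransomware encrypted my files and demands payment":
        "Disconnect from the network, restore from backups, and report the incident.",
    "fake technical support call asking for remote access":
        "Hang up; never grant remote access to unsolicited callers.",
}

def bow(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def suggest(description):
    """Return the preventive measure of the most similar known attack."""
    vec = bow(description)
    best = max(KB, key=lambda known: cosine(vec, bow(known)))
    return KB[best]

advice = suggest("I got an email asking for my bank password")
print(advice)  # the phishing measure is the closest match
```

The user's free-text incident report is compared against every stored description, and the closest match determines which preventive measure is shown, with no human intervention.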
The descriptions of these attacks will be stored, and they will be reflected in the
application itself to make people aware and understand the various cyber-attacks
that are happening around them. There are various kinds of cyber-attacks, and even
new types of attacks are emerging nowadays, so our application will be helpful in
keeping the users updated. The users can view the various attacks region-wise and
in rank order. It also has a search bar where the users can search
about any kind of cyber-attacks and threats and can understand and know about the
real-time cyber-attacks.
It will be helpful to see the trending cyber-attacks and crimes that are happening
around the user’s location in rank order with detailed description of the crimes and
attacks. In this platform, the users can connect with the cyber security experts and can

know about the cyber incidents and events in various regions. Ahmad et al. [3]
discuss the need for cyber security education to be addressed in all fields and at all
levels of education, as cyber security is in great demand. They emphasize the
importance of stakeholder contributions and the responsibility of all stakeholders in
the system. This application will be used as a help desk as well as a social medium
for cyber security. As existing helpdesks are not meant for common people and
require human intervention, this application will be user-friendly, accessible to
common people, and will work without human intervention; it is automated with the
help of machine learning algorithms.

2 Related Works

There has been a significant amount of research in the area of cyber security and
the development of automated cyber helpdesk systems. Many existing systems focus
on providing technical support and troubleshooting assistance to users but lack the
ability to provide real-time information on cyber-attacks and crimes or a community
platform for cyber security experts to interact.
One example of related work is the “Cyber security Virtual Assistant” devel-
oped by researchers. This system utilizes natural language processing and machine
learning algorithms to provide technical support and troubleshooting assistance to
users. However, it does not provide real-time information on cyber-attacks and crimes
or a community platform for experts to interact.
Shahjee and Ware [4] addressed the need for integration between network oper-
ating centers (NOC) and security operation centers (SOC) in order to more effectively
protect IT assets from cyber security threats. The current lack of a holistic view in the
literature regarding an integrated NOC and SOC architecture is limiting innovation in
this field. The paper conducts a systematic literature review and analysis to propose
a state-of-the-art architecture for an integrated NOC and SOC, outlining the main
building blocks and usefulness for organizations. However, the relevant literature
only considers explicit knowledge, neglecting the importance of tapping into people’s
tacit knowledge in automating and integrating NOC and SOC processes. Mutemwa
et al. [5] discussed the challenges involved in incorporating a newly developed
Security Operations Center (SOC) into an organization’s existing IT environment.
The article explores various aspects such as determining the
data sources to be integrated into the Security Information and Event Management
(SIEM) system for effective security posture monitoring, integrating the organiza-
tion’s ticket logging system with the SOC SIEM, devising effective communication
strategies to promote awareness campaigns from the SOC, and reporting on inci-
dents identified within the SOC. Furthermore, the article highlights the difficulties in
demonstrating the value of the significant investments made in building and running
a SOC. Since a SOC operates as an independent hub, it must be integrated with the
organization’s established procedures, policies, and IT systems, which can pose inte-
gration challenges during its implementation. Paté-Cornell and Kuypers [6] discussed
374 V. Ceronmani Sharmila et al.

the need for a quantitative approach to cyber risk management; this paper proposes
a probabilistic method for assessing the probabilities of new attack scenarios based
on existing data, with the aim of optimizing the allocation of limited resources and
prioritizing risk management measures. The method involves statistical analysis of
incidents within a specific organization and extends to cover potential threats that
have not yet occurred. A systematic construction of new attack scenarios is required,
followed by an assessment of their probability of success and the resulting losses.
This analysis generates complete risk curves that represent the overall cyber risk
for the organization and its insurers, facilitating the evaluation of the advantages of
various protective options. Moneva et al. [7] aimed to assess the efficacy of four
warning banners that sought to engage internet users and reduce DDoS attacks. A
quasi-experimental design was employed to measure engagement generated by the
banners and their linked landing pages. The results indicated that social ads were
significantly more effective in generating engagement than the other types of ads, with
no discernible difference in engagement between the various landing page designs.
The study suggests that social messages may be more effective than traditional deter-
rent messages in engaging with potential cyber offenders. Lif et al. [8] examined the
crucial information components that should be incorporated in an incident report to
facilitate incident management and information sharing in the field of cyber security.
The authors analyzed various reporting templates and frameworks, including a novel
reporting template employed in a cyber defense exercise within the military domain.
The findings suggest that the information elements in existing templates vary based
on their specific use cases, while the military template was deemed effective for inci-
dent reporting in the civilian sector. The authors recommend supplementing the new
template with additional fields such as details on victims, attackers, and assessments
of attackers’ motives to enhance its usefulness.
Mandal and Khan [9] state that the COVID-19 outbreak has resulted in a shift
toward remote work and learning, leading to an unprecedented growth of cloud infras-
tructure. This has caused an increase in data breaches and cyber security threats, not
only for large cloud vendors but also for small startups in various sectors. The paper
aims to identify security challenges caused by the sudden adoption of cloud platforms
and propose preventive measures to mitigate risks. The study highlights the need
for adequate precautions to protect hosts and devices connected to cloud resources.
Bakhshi et al. state that social engineering attacks exploit human vulnerability to gain
access to sensitive information. End-user awareness is crucial in protecting against
social engineering attacks, as they target the user rather than system defenses. A study
was conducted to measure the susceptibility of users in a corporate organization
to social engineering attacks using two attack scenarios; despite the inclusion of
clues to alert suspicious users, a significant proportion of users (46–60%) fell prey
to the attacks due to a lack of awareness. Post-incident training and regular IT
security drills were recommended to correct this issue.
Tanwar et al. discussed the increasing cybercrime and potential threats in the cyber
world, which can result in the loss of sensitive data and put devices at risk. The paper
presents cybercrime statistics and compares them with previous research papers on

the subject. The research aims to bring attention to the issue and provide a detailed
study of various cybercrimes and potential threats.
There are also commercial cyber helpdesk platforms like “CyberArk” and “Trend
Micro” which provide technical support, troubleshooting assistance, and a knowledge
base for users, but they do not provide real-time information on trending crimes
or a community platform for experts to interact.

3 Crime Detection Using Machine Learning

The proposed research focuses on the development of an advanced cyber helpdesk
application to provide assistance in managing and preventing cyber security inci-
dents. In today’s rapidly evolving digital landscape, cyber security incidents are
becoming increasingly prevalent and damaging, making it crucial to have a system in
place to effectively manage and prevent them. The proposed system uses machine
learning algorithms, which make it more efficient and yield results with a better
accuracy rate than the existing system.
Shahjee and Ware [4] suggested that integrating a Security Operations Center
(SOC) with a Network Operations Center (NOC) is a crucial step in enhancing the
ability to detect and respond to cyber threats. The
SOC is responsible for detecting and responding to security incidents, while the
NOC is responsible for monitoring and maintaining the network. By integrating the
SOC and NOC, organizations can effectively detect any cyber incident, respond, and
mitigate the cyber-attacks and threats in real-time.
A Network Operations Center is a centralized department within an organization
that is tasked with monitoring and managing the organization’s network infrastruc-
ture. It comprises a team of network specialists who employ a variety of tools and
methods to constantly monitor the network and identify any problems that may
arise. The primary objective of the NOC is to guarantee the network’s availability
and optimal performance; this involves monitoring for issues such as outages, poor
performance, and potential security risks, and promptly working to resolve them.
Additionally, the NOC collaborates with other departments within the organization,
such as the Security Operations Center, to ensure the network aligns with the overall
objectives and goals of the organization.
These integrated SOC and NOC have certain drawbacks such as:
• Complexity: Integrating SOC and NOC can be a complex and challenging process,
requiring significant time and resources to implement. Organizations need to
ensure that the integration is done effectively and efficiently to avoid disrupting
existing operations.
• Cost: Implementing an integrated SOC and NOC can be costly, as it requires
investment in new technology, infrastructure, and personnel. Organizations need
to carefully consider the costs involved and ensure that they have the resources
and budget necessary to support the integration.
• Opposition to Change: Implementing change can sometimes be challenging, and
some employees may resist the integration of the Security Operations Center
(SOC) and the Network Operations Center (NOC). This resistance may result in
a hesitance to embrace new procedures and technologies, which can hinder the
integration process and negatively affect its outcome.
• Data Management: Integrating SOC and NOC generates large amounts of data,
which need to be managed, analyzed, and stored efficiently. Organizations need to
implement robust data management processes to ensure that the data collected by
the SOC and NOC can be effectively used to detect and respond to cyber threats.
• Maintaining Focus: Integrating SOC and NOC requires careful coordination
between the two teams to ensure that each team is focused on its area of expertise.
This can be challenging, as the two teams have different priorities, goals, and
responsibilities.
On the other hand, using machine learning, these drawbacks can be addressed and
the application can become more user-friendly. An integrated SOC and NOC can be
implemented only by large organizations, whereas this application can be used by
everyone, including the general public. Some advantages of using machine learning
algorithms over the integration of SOC and NOC are:
• Scalability: ML algorithms can handle large amounts of data and can scale to meet
the needs of organizations of any size. This can help to ensure that the helpdesk
has the resources it needs to detect and respond to cyber threats, regardless
of the organization’s size or complexity.
• Cost-Effective: Implementing ML algorithms can be more cost-effective than
integrating SOC and NOC, as it requires less investment in new technology,
infrastructure, and personnel. This can help organizations to save costs while
still improving their cyber security posture.
• Improved Accuracy: ML algorithms can quickly analyze large amounts of data
and identify patterns and anomalies that may indicate a cyber threat. This can help
to detect threats more accurately than traditional security systems.
• Enhanced Expertise: ML algorithms can help to fill knowledge gaps in the
helpdesk’s staff by automating tasks that would otherwise require manual anal-
ysis. This can help to improve the overall expertise of the helpdesk and enhance
its ability to detect and respond to cyber threats.
• Automation: ML algorithms can automate the threat detection process, reducing
the need for manual analysis and freeing up resources for more critical tasks. This
can greatly improve the efficiency of the helpdesk’s response to cyber threats.
The proposed system specifically uses two machine learning algorithms for its
working and the various processes of the application:
– Random forest algorithm
– Support vector machine (SVM).

Random Forest Algorithm


Random forest is a machine learning algorithm that is utilized for regression analysis
and classification. It comprises an ensemble of decision trees that are trained on
a random subset of data, and the outcomes are aggregated to generate the final
prediction. In the context of the cyber helpdesk application, random forest can be
used to predict the likelihood of cyber security incidents and to provide relevant
solutions and preventive measures accordingly.
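As a rough sketch of how such a prediction step could look, the snippet below trains a random forest on a handful of invented incident records; the feature encoding (failed logins, suspicious emails, outbound traffic) and the category labels are assumptions for illustration, not part of the described system.

```python
# Hypothetical sketch: predicting the likely incident category from
# simple numeric features. Feature meanings, values, and labels are
# invented for illustration.
from sklearn.ensemble import RandomForestClassifier

# Each row: [failed logins, suspicious emails, outbound traffic (GB)]
X = [
    [40, 0, 10],  # many failed logins -> brute force
    [35, 1, 12],
    [2, 9, 5],    # many suspicious emails -> phishing
    [1, 8, 4],
    [3, 1, 90],   # heavy outbound traffic -> data exfiltration
    [2, 0, 85],
]
y = ["brute_force", "brute_force", "phishing", "phishing",
     "exfiltration", "exfiltration"]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)
print(model.predict([[2, 9, 5]])[0])  # matches the phishing pattern
```

The predicted category could then index into a table of preventive measures, as the paper suggests.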
Support Vector Machine (SVM)
The support vector machine (SVM) is a machine learning algorithm used for
regression analysis and classification. In the context of the cyber helpdesk
application, SVM can be used to classify cyber security incidents into different cate-
gories, such as intrusion, spam, and malware, and to provide relevant solutions and
preventive measures accordingly.
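A minimal sketch of this categorization step, assuming free-text incident reports, TF-IDF features, and a linear SVM; the training sentences and category names below are invented for illustration:

```python
# Hypothetical sketch: classifying free-text incident descriptions
# into categories with a linear SVM over TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reports = [
    "received an email asking for my bank password",
    "suspicious link in a mail pretending to be from the bank",
    "my files are encrypted and a ransom note appeared",
    "antivirus flagged a trojan executable on my laptop",
    "website is down due to a flood of traffic from many hosts",
    "server unreachable, massive request volume from a botnet",
]
labels = ["phishing", "phishing", "malware", "malware", "ddos", "ddos"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(reports, labels)
print(clf.predict(["got a mail with a link asking for my password"])[0])
```

In practice, the categories would be defined by the helpdesk’s taxonomy and the model trained on a much larger labeled corpus.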
The random forest and support vector machines (SVM) are widely used machine
learning algorithms in detecting cybercrime. Random forest, an ensemble learning
algorithm, utilizes several decision trees to generate predictions. SVM, on the other
hand, is a linear model that uses a boundary to separate data into different classes.
The random forest algorithm is known for its proficiency in managing large
and intricate datasets and for its ability to handle nonlinear relationships between
features. It can also handle missing values and is relatively immune to overfitting.
However, it can be slower to train and predict compared to SVM.

4 System Workflow and Analysis

The proposed system will be helpful for every person in the world. Nowadays,
people store almost all of their personal files and data in computer systems.
These files and data should be stored and secured properly, which is where cyber
security comes into action. As technology advances, many cybercrimes and
attacks/threats are happening around the world. Cybercriminals carry out various
cybercrimes to steal personal data and credentials and use them in illegal ways.
These cybercrimes and attacks can be prevented by creating awareness and
educating people about cyber-attacks/threats and the preventive measures that
should be taken against them. Our application comes into action to prevent such
cyber-attacks/threats. It can be used as a helpdesk as well as a social media
platform for various cyber security purposes, and it contains sections such as the
home page, crime page, report page, and community section.
After the home page comes the crime page, where the logged-in user can gain
knowledge about the various cybercrimes and attacks. On this page, the user can
search for any cyber incident location-wise and rank-wise, and can also check the
various cyber incidents that are happening around the user.

Fig. 1 Crime page

The user can gain knowledge about various cybercrimes and attacks in this
section (Fig. 1).
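The location-wise, rank-wise view described above can be approximated without any ML at all, by counting and sorting reported incidents; the records below are invented for illustration:

```python
# Minimal sketch: rank incident types reported near a given location
# by frequency. Incident records and locations are invented.
from collections import Counter

incidents = [
    {"type": "phishing", "location": "Chennai"},
    {"type": "phishing", "location": "Chennai"},
    {"type": "ransomware", "location": "Chennai"},
    {"type": "phishing", "location": "Delhi"},
    {"type": "ddos", "location": "Chennai"},
    {"type": "ransomware", "location": "Chennai"},
    {"type": "phishing", "location": "Chennai"},
]

def rank_incidents(records, location):
    counts = Counter(r["type"] for r in records if r["location"] == location)
    return counts.most_common()  # [(type, count), ...] highest first

print(rank_incidents(incidents, "Chennai"))
# [('phishing', 3), ('ransomware', 2), ('ddos', 1)]
```

The machine learning component would supply the incident types; the ranking itself is a straightforward aggregation.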
The report page allows the user to report any type of cyber incident or cyber
security event, and it is the main part of the application. On this page, the user
reports a cyber incident by giving its details and a description. In response, the
application provides solutions and preventive measures to prevent the attack/
incident. The reported incident is then displayed on the crime page for the other
users (Fig. 2).
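One plausible way to return preventive measures for a classified report is a simple lookup keyed by the predicted category; the category names and advice strings below are hypothetical placeholders, not the system’s actual content:

```python
# Hypothetical sketch: map a predicted incident category to a list of
# preventive measures. Categories and advice text are placeholders.
MEASURES = {
    "phishing": [
        "Do not click links in unsolicited emails",
        "Enable multi-factor authentication",
    ],
    "malware": [
        "Disconnect the machine from the network",
        "Run a full antivirus scan and update signatures",
    ],
    "ddos": [
        "Enable rate limiting at the network edge",
        "Contact your hosting provider or ISP",
    ],
}

def respond_to_report(predicted_category):
    # Fall back to expert help when the category is unrecognized.
    return MEASURES.get(predicted_category,
                        ["Contact a cyber security expert via the community page"])

print(respond_to_report("phishing"))
```

The classifier’s output feeds `respond_to_report`, and the returned list is what the report page would display to the user.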
The community page will be useful to connect with the various cyber security
experts and gain knowledge about cyber security events. The experts also can discuss
among themselves in this section.
Machine learning algorithms are used at every stage of the application. On the
crime page, the machine learning algorithms
will be used to classify the cyber-attacks and incidents by rank and location. For
the cyber incidents reported on the report page, the machine learning algorithm
selects the relevant preventive measures and solutions, which are displayed to the
user (Fig. 3).

Fig. 2 Report page

Fig. 3 Accuracy of random forest
The accuracy of a random forest algorithm depends on several factors, such as the
quality and quantity of the data used to train the model, the specific implementation of
the algorithm, the hyperparameters chosen, and the complexity of the problem being
solved. In general, random forests are known to be robust and accurate, particularly
for classification tasks. It is common to evaluate a random forest model using
metrics such as accuracy, precision, and recall; these metrics provide a way to
assess how well the model identifies positive and negative examples.
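These metrics can be computed directly from a confusion matrix; a small hand-worked example with made-up labels:

```python
# Compute accuracy, precision, and recall for a binary
# "attack"/"benign" classification from made-up labels.
y_true = ["attack", "attack", "benign", "attack", "benign", "benign"]
y_pred = ["attack", "benign", "benign", "attack", "attack", "benign"]

tp = sum(t == p == "attack" for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t == p == "benign" for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t == "benign" and p == "attack" for t, p in zip(y_true, y_pred))
fn = sum(t == "attack" and p == "benign" for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)  # 4/6
precision = tp / (tp + fp)          # 2/3: of predicted attacks, how many real
recall = tp / (tp + fn)             # 2/3: of real attacks, how many caught
print(accuracy, precision, recall)
```

For a security helpdesk, recall (missed attacks) is usually the metric to watch most closely.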

5 Conclusion

In conclusion, the proposed automated advanced cyber security helpdesk for people
offers a comprehensive solution to address the growing concern of cyber security
incidents. With its report page, users can report and receive accurate solutions and
preventive measures to prevent cyber-attacks. The crimes or feed section helps users
stay informed about the latest cyber security incidents happening around the world,
enabling them to take necessary precautions. The community section offers a plat-
form for cyber security experts to discuss and interact on cyber security-related issues,
further enhancing the user’s knowledge and understanding of cyber security events.
The use of machine learning algorithms further strengthens the application’s capa-
bility in providing relevant solutions and preventing cyber-attacks. The implemen-
tation of this proposed solution will significantly improve the end-user’s awareness
and preparedness toward cyber security incidents.

6 Comparative Study Between Proposed and Existing System

(1) The existing system is operated manually and needs human intervention, but
the proposed system is automated; it can perform its functions without any
human involvement.
(2) The proposed system uses machine learning techniques but the existing system
does not use machine learning techniques.
(3) Due to the use of machine learning, the proposed system gives more precise
results with a higher accuracy rate than the existing one.
(4) The existing system can only be used in large industries and organizations, but
the proposed system is easy enough for even a common user to protect
themselves from cyber threats or attacks.
(5) The existing system needs extensive maintenance, but the proposed system is
easy to maintain and its cost is low.
(6) In the existing system, when a victim reports an incident, no immediate actions
are suggested, while the proposed system provides preventive measures to
avoid further damage immediately after the incident is reported.

References

1. Datta P, Panda SN, Tanwar S, Kaushal RK (2020) A technical review report on cyber crimes in
India. In: 2020 international conference on emerging smart computing and informatics (ESCI),
Pune, India, pp 269–275. https://doi.org/10.1109/ESCI48226.2020.9167567. https://ieeexplore.
ieee.org/abstract/document/9167567
2. Soni S, Bhushan B (2019) Use of machine learning algorithms for designing efficient cyber
security solutions. In: 2019 2nd international conference on intelligent computing, instrumenta-
tion and control technologies (ICICICT), Kannur, India, pp 1496–1501. https://doi.org/10.1109/
ICICICT46008.2019.8993253. https://ieeexplore.ieee.org/abstract/document/8993253
3. Ahmad N, Laplante PA, DeFranco JF, Kassab M (2022) A cybersecurity educated community.
IEEE Trans Emerg Topics Comput 10(3):1456–1463. https://doi.org/10.1109/TETC.2021.309
3444. https://ieeexplore.ieee.org/abstract/document/9468330
4. Shahjee D, Ware N (2022) Integrated network and security operation center: a systematic anal-
ysis. IEEE Access 10:27881–27898. https://doi.org/10.1109/ACCESS.2022.3157738. https://
ieeexplore.ieee.org/abstract/document/9729852
5. Mutemwa M, Mtsweni J, Zimba L (2018) Integrating a security operations centre with an orga-
nization’s existing procedures, policies and information technology systems. In: 2018 interna-
tional conference on intelligent and innovative computing applications (ICONIC), Mon Tresor,
Mauritius, pp 1–6. https://doi.org/10.1109/ICONIC.2018.8601251. https://ieeexplore.ieee.org/
abstract/document/8601251
6. Paté-Cornell M-E, Kuypers MA (2023) A probabilistic analysis of cyber risks. IEEE Trans
Eng Manage 70(1):3–13. https://doi.org/10.1109/TEM.2020.3028526. https://ieeexplore.ieee.
org/abstract/document/9354348

7. Moneva A, Leukfeldt ER, Klijnsoon W (2022) Alerting consciences to reduce cybercrime: a
quasi-experimental design using warning banners. J Exp Criminol. https://doi.org/10.1007/s11
292-022-09504-2
8. Lif P, Varga S, Wedlin M, Lindahl D, Persson M (2020) Evaluation of information elements in
a cyber incident report. In: 2020 IEEE European symposium on security and privacy work-
shops (EuroS&PW), Genoa, Italy, pp 17–26. https://doi.org/10.1109/EuroSPW51379.2020.
00012. https://ieeexplore.ieee.org/abstract/document/9229708
9. Mandal S, Khan DA (2020) A study of security threats in cloud: passive impact of COVID-19
pandemic. In: 2020 international conference on smart electronics and communication (ICOSEC),
Trichy, India, pp 837–842. https://doi.org/10.1109/ICOSEC49089.2020.9215374. https://ieeexp
lore.ieee.org/abstract/document/9215374
Advanced Persistent Threat Assessment

V. Ceronmani Sharmila, S. Aswin, B. Muthukumara Vadivel, and S. Vinutha

Abstract Cybersecurity in the health sector is a critical issue, as the healthcare
industry increasingly relies on technology to store, process, and transmit sensitive
patient information. This research paper examines the current state of cybersecurity
in the healthcare industry, including both the threats and vulnerabilities that exist,
as well as the measures being taken to protect against these threats. The study also
examines the effects of cybersecurity incidents on the healthcare sector, including
potential losses in revenue and reputation. The research suggests that there is a need
for improved cybersecurity regulations, as well as increased investment in security
infrastructure and employee education, in order to better protect patient data and
the healthcare system as a whole. Advanced persistent threats (APTs) are a major
concern for hospitals as they can result in the loss of sensitive patient information
and disrupt essential medical services. A comprehensive APT assessment is essential
to identify and mitigate potential threats. This research paper discusses the need
for APT assessment in hospitals, the methods used, and the potential impact of
APTs on healthcare organizations. The healthcare industry is a prime target for APT
attacks due to the valuable nature of the data it holds, such as patient medical records
and financial information. To conduct an APT assessment, healthcare organizations
should first identify their critical assets and assess the risk of a potential attack.
This can be done through vulnerability assessments, penetration testing, and threat
modeling. Additionally, healthcare organizations should have incident response plans
in place to quickly and effectively respond to a potential APT attack. APTs can have
a significant impact on healthcare organizations. A successful APT attack can result
in the loss or theft of sensitive patient information, which can have serious legal and
financial consequences. In addition, APTs can disrupt essential medical services,

V. Ceronmani Sharmila · S. Aswin · B. Muthukumara Vadivel · S. Vinutha (B)
Department of Information Technology, Hindustan Institute of Technology and Science,
Chennai, India
e-mail: Vinuthasrinivasan886@gmail.com
V. Ceronmani Sharmila
e-mail: it@hindustanuniv.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 383
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_33

putting patient lives at risk. The paper concludes that conducting comprehensive
APT assessments is essential to identify and mitigate potential threats, protecting
patient information and maintaining the continuity of essential medical services.

Keywords Advanced persistent threat · Domain adaptation · Cybersecurity ·
Malware detection

1 Introduction

Advanced persistent threat (APT) assessment is a process of identifying, analyzing,
and evaluating the potential threats to an organization’s network and infrastructure
from advanced persistent attackers. APT assessment involves a thorough examina-
tion of an organization’s security posture, including its networks, systems, applica-
tions, and data, to identify vulnerabilities and potential attack vectors. An analysis
of the threat environment, including the different types of attackers, their goals, and
their tactics, techniques, and procedures (TTPs), is also included in the evaluation [1].
Advanced persistent threats (APTs) are a growing concern in the healthcare industry
as they are specifically designed to evade detection and maintain unauthorized access
to sensitive data for extended periods of time. These threats can have a significant
impact on the healthcare sector, including financial loss, reputational damage, and
potential harm to patients. The healthcare industry is a prime target for APT attacks
because of the vast amount of sensitive patient information that is stored, processed,
and transmitted electronically. This information is highly valuable to attackers and
can be used for a wide range of nefarious purposes, including identity theft, fraud,
and blackmail [2].
This research paper aims to provide an in-depth assessment of the current state of
APT threats in the healthcare sector, as well as the methods used by attackers to infil-
trate healthcare networks and steal sensitive information. The paper will also explore
the impact of APT attacks on healthcare organizations, including the financial and
reputational damage that can result. Finally, the paper will provide recommendations
for healthcare organizations to improve their APT defense strategies and mitigate the
risk of APT attacks [3].
In conclusion, the impact of an APT on an organization can be significant and
long-lasting. APTs can lead to the loss or theft of sensitive information, disruption of
normal operations, compromise of intellectual property, and long-term financial and
reputational damage. Organizations must take proactive measures to identify and
mitigate potential APT threats, including conducting regular assessments, imple-
menting security controls, and creating incident response plans. Investing in robust
security measures and incident response plans can help organizations to detect and
respond to APTs quickly, minimizing the potential damage and consequences of an
attack.

2 Related Works

The reconnaissance, delivery, initial intrusion, command and control (C&C), lateral
movement, and data exfiltration phases make up the advanced persistent threat (APT)
life cycle. Attackers learn details about their target, including potential weaknesses
and personnel data, throughout the reconnaissance and delivery phases. In the initial
step of intrusion, an attack is then launched using this information. The attackers
utilize a server to manage any compromised hosts during the C&C stage. The
attackers move throughout the network at the stage of lateral movement to take
over more hosts. They can even infect additional computers using these recently
compromised hosts. Attackers finally take sensitive data from the victim during the
data exfiltration stage [4].
The term “Internet of Things” (IoT), which describes a network of connected
sensors, objects, and gadgets, is becoming more and more prevalent in contemporary
life. APTs are also focusing on these gadgets. It is becoming more crucial to identify
and respond to APT attacks on IoT devices as their security is often poorer than that
of traditional hosts [5].
APT28, which has been reported to have hacked at least 500,000 IoT devices
like routers, video decoders, and printers, is one example of an APT organization
that targets IoT devices. Phishing emails are frequently used by APT28 attackers
to initiate assaults, and afterward, botnets are used to take control of the infected
IoT devices. The group targeted video decoders, VOIP phones, and printers in one
infiltration activity. In the first attack and C&C stages, they were able to take control
of the devices by taking advantage of flaws and using default passwords. In the lateral
movement phase, they broke into the target’s intranet using the compromised IoT
devices. In the latter phase, they also stole crucial data from the victim [6].
The healthcare industry is a prime target for APT attacks due to the valuable
nature of the data it holds, such as patient medical records and financial information.
A report by the cybersecurity firm Carbon Black states that the healthcare industry
is the most targeted sector for APTs, with a rate of attacks that is three times higher
than the average across all industries (Carbon Black, 2016). The healthcare sector
is also facing a shortage of cybersecurity professionals, which can exacerbate the
problem [7].

3 Advanced Persistent Threat Phase 1

A comprehensive APT assessment process typically involves the following steps:
Threat intelligence gathering: This involves collecting and analyzing data about the
latest APT tactics, techniques, and procedures (TTPs) being used by threat actors.
This information is used to better understand the APT landscape and to develop a
targeted APT assessment strategy.

Risk assessment: This step involves analyzing the potential consequences of an APT
attack on the organization, as well as the likelihood of such an attack occurring. This
information is used to prioritize the organization’s security needs and to allocate
resources accordingly.
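A common way to make this prioritization concrete is a likelihood-times-impact score per critical asset; the asset names and 1–5 ratings below are purely illustrative:

```python
# Illustrative risk scoring: rank assets by likelihood * impact.
# Asset names and 1-5 ratings are invented for this sketch.
assets = [
    {"name": "patient records DB", "likelihood": 4, "impact": 5},
    {"name": "public website",     "likelihood": 5, "impact": 2},
    {"name": "billing system",     "likelihood": 3, "impact": 4},
]

for a in assets:
    a["risk"] = a["likelihood"] * a["impact"]

# Highest-risk assets first: these get resources and mitigation priority.
ranked = sorted(assets, key=lambda a: a["risk"], reverse=True)
for a in ranked:
    print(f'{a["name"]}: risk {a["risk"]}')
```

Real risk assessments weigh likelihood and impact far more carefully (and often probabilistically, as in [6]), but the sorting principle is the same.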
Vulnerability assessment: This step involves a thorough examination of the organi-
zation’s systems, applications, and networks to identify any potential security weak-
nesses or vulnerabilities that could be exploited by an APT attacker. This information
is used to prioritize mitigation efforts and to develop a remediation plan.
Penetration testing: This step involves simulating an APT attack against the orga-
nization’s systems, applications, and networks to determine the effectiveness of the
organization’s security measures and to identify any new vulnerabilities that may
have been introduced.
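A trivially simplified illustration of one penetration-testing primitive is a TCP connect check for open ports; here it probes a listener started locally, since scanning systems without authorization is never acceptable:

```python
# Toy illustration of a penetration-testing primitive: a TCP connect
# check reporting whether a port accepts connections. It is pointed at
# a listener we start ourselves on localhost.
import socket

def port_open(host, port, timeout=0.5):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0  # 0 means connected

# Start a local listener so the check has something to find.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
server.listen(1)
port = server.getsockname()[1]

print(port_open("127.0.0.1", port))
server.close()
```

Real penetration tests go far beyond port checks (exploitation, privilege escalation, lateral movement), but this is the kind of low-level probe such tooling builds on.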
Incident response planning: This step involves developing a comprehensive incident
response plan to be used in the event of an APT attack. The plan should outline the
steps to be taken, the roles and responsibilities of key personnel, and the communi-
cation strategies to be used during an incident. Employee Awareness and Training
are critical components of incident response planning. In this step, organizations
should focus on educating and preparing their workforce to effectively respond to an
Advanced Persistent Threat (APT) attack.

3.1 Gathering Initial Information

The information-gathering process for conducting a red team assessment in a hospital
is a critical part of the overall assessment. It involves collecting and analyzing infor-
mation about the hospital’s security measures, processes, and systems to identify
potential vulnerabilities and areas for improvement. The following are some of the
steps involved in the information-gathering process:
Conducting Open-Source Intelligence (OSINT) research: This involves collecting
information from publicly available sources such as websites, news articles, and
social media to gather information about the hospital’s infrastructure, operations,
and personnel.
Interviewing hospital staff: Speaking with hospital employees can provide valuable
insights into the hospital’s security measures, processes, and systems, as well as any
areas where they see vulnerabilities or opportunities for improvement.
Physical reconnaissance: Conducting a physical walkthrough of the hospital facilities
can help the red team understand the layout, access points, and security measures in
place.
Advanced Persistent Threat Assessment 387

Examining hospital documents and records: Accessing hospital documents and records, such as incident reports, security protocols, and system logs, can provide valuable information about the hospital’s security posture.
Testing systems and networks: The red team may also conduct tests on the
hospital’s computer systems and networks to identify vulnerabilities and assess the
effectiveness of security measures.
Exploiting vulnerabilities: Once the red team has identified potential vulnerabilities,
they may attempt to exploit them to demonstrate the potential consequences of a
security breach.

3.2 Clear Definition of the Scope

It is important to clearly define the scope of the assessment, including the systems,
networks, and processes that will be tested, as well as any constraints or limitations.
A comprehensive assessment plan should be developed that outlines the objectives,
methodology, and timeline of the assessment. This plan should be reviewed and
approved by the stakeholders before the assessment begins. Red team assessments
are most effective when the red team works closely with the organization being
assessed. This can help ensure that the assessment is conducted in a manner that is
both effective and respectful of the organization’s operations and personnel.
Red team assessments should simulate real-world attacks as closely as possible,
using tactics and techniques that are commonly used by malicious actors. This can
help organizations better understand their vulnerabilities and the potential conse-
quences of a security breach. As part of the assessment, the red team should test
the organization’s incident response plan to ensure that it is effective and that the
organization is prepared to respond to a security incident.
Provide actionable recommendations: The assessment should provide actionable
recommendations that the organization can use to improve its security posture.
The recommendations should be prioritized and include a plan of action for
implementation.
Regularly reassess: Security is an ongoing process, and red team assessments should
be conducted on a regular basis to ensure that the organization remains prepared to
defend against evolving threats.
388 V. Ceronmani Sharmila et al.

4 Open-Source Intelligence

The Open-Source Intelligence (OSINT) process for conducting a red team assessment
in a hospital involves collecting and analyzing publicly available information to gain
insight into the hospital’s operations, systems, and security measures. The following
are some of the steps involved in the OSINT process:
Identifying relevant sources: This involves identifying websites, news articles, and
social media platforms that may contain relevant information about the hospital, such
as its history, operations, personnel, and recent events.
Data collection: Collecting information from the identified sources, such as hospital
websites, news articles, social media, and other publicly available sources.
Data analysis: Analyzing the collected information to identify patterns, trends, and
relationships that can provide insight into the hospital’s security measures, systems,
and operations.
Verification and validation: Verifying the accuracy and reliability of the information
collected and analyzed, and cross-referencing it with other sources to ensure that the
information is valid and up-to-date.
Creating a report: Compiling the information gathered and analyzed into a report that
summarizes the findings and provides insights into the hospital’s security posture.
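The cross-referencing and report-compilation steps above can be sketched in a few lines. This is a hypothetical illustration: the source names and collected facts are invented placeholders for what real OSINT collection would return.

```python
# Minimal sketch of the OSINT workflow described above: collect items from
# several (hypothetical) public sources, cross-reference them, and compile
# a report. A fact is treated as "verified" when at least two independent
# sources report it.
from collections import defaultdict

def cross_reference(collected):
    """collected maps source name -> set of facts; return fact -> sources."""
    seen = defaultdict(set)
    for source, facts in collected.items():
        for fact in facts:
            seen[fact].add(source)
    return {fact: sorted(sources) for fact, sources in seen.items()}

def compile_report(collected):
    refs = cross_reference(collected)
    return {
        "verified": {f: s for f, s in refs.items() if len(s) >= 2},
        "unverified": {f: s for f, s in refs.items() if len(s) < 2},
    }

# Invented example data standing in for website/news/social-media collection.
collected = {
    "website": {"uses vendor X badge system", "has public staff directory"},
    "news": {"uses vendor X badge system"},
    "social": {"has public staff directory", "night shift entrance on 5th St"},
}
report = compile_report(collected)
```

Separating verified from unverified items mirrors the validation step: single-source claims stay in the report but are flagged for follow-up.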

4.1 Dark Web Data Breach Enumeration

Data breaches can have serious consequences for individuals and organizations, but
there are steps that can be taken to turn the situation into a productive outcome:
Perform a thorough analysis of the breach: Understand the scope of the breach, what
data was compromised, and how it was accessed. This information will help you
determine the best course of action.
Notify affected parties: If any individuals have had their personal information
compromised, it is important to let them know as soon as possible so they can take
steps to protect themselves.
Improve security measures: Use the information gained from the breach analysis
to improve your security measures, such as by implementing stronger passwords,
two-factor authentication, and encryption.
Review and update your policies: Make sure that your policies and procedures for
handling sensitive data are up-to-date and aligned with industry best practices.

Fig. 1 Doctors data list

Use the breach as an opportunity to educate: Use the breach as an opportunity to educate employees, customers, and other stakeholders about the importance of protecting sensitive data and the role they play in keeping it secure.
Continuously monitor and assess your security posture: Regularly review and assess
your security measures to ensure that they are effective and up-to-date. This will help
prevent future breaches and minimize their impact if they do occur (Fig. 1).
Algorithm
• Start the process of identifying a data breach from the dark web.
• Set up monitoring systems to scan for data breaches and leaked information on
the dark web, such as threat intelligence platforms, web crawlers, or dark web
monitoring tools.
• Perform an initial search of the dark web using relevant keywords and phrases
related to the data breach, such as the name of the organization or individual
affected, specific types of data that may have been compromised, or unique
identifiers such as email addresses or social security numbers.
• Verify the authenticity of the information found on the dark web by checking
multiple sources and cross-referencing with known data breaches or leaked
information.
• Identify the source of the breach, such as a specific website or marketplace on the
dark web, and determine the type of data that has been compromised.
• Determine the extent of the data breach by examining the amount and types of
information that have been leaked, who has access to it, and how it is being used
or sold on the dark web.
• Isolate the affected data to prevent further damage by disconnecting from the
dark web and implementing additional security measures to protect sensitive
information.
• Create a web framework to access the breached and gathered data.
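The keyword-search and verification steps of this algorithm can be sketched as follows. The dump records and keywords are invented; a real deployment would feed records from a dark web monitoring tool or crawler.

```python
# Hedged sketch of the keyword search and verification steps above. The
# "dump" records below are invented; a real pipeline would ingest records
# from a dark-web monitoring platform or crawler.
import re

KEYWORDS = ["example-hospital.org", "patient record", "ssn"]

def search_dump(records, keywords=KEYWORDS):
    """Return records matching any keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.I)
    return [r for r in records if pattern.search(r)]

def verify_against_known(hits, known_breach_emails):
    """Cross-reference email addresses in hits with a known-breach list."""
    email_re = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
    verified, new = [], []
    for hit in hits:
        emails = set(email_re.findall(hit))
        (verified if emails & known_breach_emails else new).append(hit)
    return verified, new

records = [
    "dump: jdoe@example-hospital.org / hash:abc123",
    "forum chatter about movie releases",
    "selling patient record bundle, contact seller",
]
hits = search_dump(records)
verified, new = verify_against_known(hits, {"jdoe@example-hospital.org"})
```

Splitting hits into "already known" and "new" matches the algorithm's verification step: only the new items need escalation.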

4.2 Medical Prescription Data Enumeration

Medical prescription data can be collected from various sources, including:


Pharmacy websites: Many pharmacies maintain websites that provide information
about the medications they prescribe and dispense, as well as other healthcare
services.
Government databases: Some governments maintain databases that track prescription
drug use, including information about the medications that are being prescribed, the
patients receiving them, and the healthcare providers prescribing them.
Patient portals: Many hospitals and healthcare providers have patient portals that
allow patients to access their health records, including information about their
medications.
News articles: News articles can provide information about the medications that are
being prescribed for specific conditions or for particular patient populations.
Social media: Social media platforms can be a source of information about the
medications that people are using and their experiences with them.

4.3 Hospital Entry Data Enumeration

Hospital entry data refers to the information collected about patients and visitors
as they enter and exit a hospital. This information is typically used for operational
and security purposes, such as tracking patient flow and monitoring access to the
facility. However, the confidential nature of this data makes it a valuable target for
cybercriminals and hackers, who can use it for malicious purposes, such as identity
theft or insurance fraud.
The consequences of a hospital entry data breach can be significant, with poten-
tially far-reaching impacts on both patients and the hospital. Patients whose infor-
mation has been compromised may suffer financial losses and suffer from a loss
of privacy and confidentiality. In addition, hospitals may face legal and regulatory
consequences for violating privacy laws and may also experience a loss of reputation
and public trust.
Cybersecurity threats to hospitals are increasing, as the number of electronic
devices connected to hospital networks continues to grow, and hackers become more
sophisticated in their methods. To prevent hospital entry data breaches, hospitals
must implement robust security measures and protocols, such as firewalls, intrusion
detection systems, and encryption technologies.

Fig. 2 Shodan recon

5 Shodan Search on Hospital Networks

Testing the network of hospitals using Shodan is an important tool for identifying
potential security vulnerabilities and ensuring the confidentiality and privacy of
patient data. By searching for hospital systems and devices on Shodan, security
professionals can identify any publicly accessible systems that may be vulnerable to
attack, as well as any systems that may be configured improperly or running outdated
software.
In order to effectively test the hospital network using Shodan, security profes-
sionals must have a clear understanding of the hospital’s IT infrastructure, including
the types of systems and devices that are in use. This information can be used to create
a list of search terms that will be used to search for hospital systems on Shodan. The
search results can then be analyzed to identify any systems that may be vulnerable to
attack, and appropriate remediation steps can be taken to address these vulnerabilities
(Fig. 2).
Shodan itself is a web-based search engine that indexes devices connected to the Internet. It is an effective tool for identifying potential security vulnerabilities and ensuring the confidentiality and
privacy of patient data. However, it is important to conduct these tests in a responsible
and ethical manner and to coordinate with other hospital stakeholders to minimize
the risk of disruption to normal hospital operations. By doing so, hospitals can take
proactive steps to protect patient data and ensure the security of their IT infrastructure.
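A minimal sketch of how such Shodan results might be analyzed is shown below. The result records are invented but follow the general shape of the Shodan search API response; the list of "outdated" product versions is an assumption for illustration, not a real end-of-life register.

```python
# Sketch of analysing Shodan search results for hospital-owned address space.
# The records below are invented; a live run would obtain them via the
# official Shodan client, e.g. results = shodan.Shodan(API_KEY).search(query).
OUTDATED = {"OpenSSH 7.2", "Apache httpd 2.2"}  # assumed outdated versions

def flag_exposures(results, outdated=OUTDATED):
    """results follows the general shape of a Shodan search response:
    {'total': int, 'matches': [{'ip_str', 'port', 'product', ...}, ...]}"""
    flagged = []
    for m in results.get("matches", []):
        product = m.get("product", "unknown")
        reason = "outdated software" if product in outdated else "publicly exposed"
        flagged.append({"ip": m["ip_str"], "port": m["port"],
                        "product": product, "reason": reason})
    return flagged

sample = {
    "total": 2,
    "matches": [
        {"ip_str": "203.0.113.10", "port": 22, "product": "OpenSSH 7.2"},
        {"ip_str": "203.0.113.11", "port": 443, "product": "nginx"},
    ],
}
exposure_report = flag_exposures(sample)
```

Every match is reported, since any publicly reachable hospital system warrants review even when its software is current.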

6 System Workflow Analysis

Nowadays, people store nearly all of their personal files and data on computer systems. These files and data must be saved and secured properly, and that is where cybersecurity comes into play. As technology advances, many cybercrimes and attacks occur around the world: cybercriminals steal personal data and credentials and use them in several illegal ways. These cybercrimes and attacks can be reduced by raising public awareness and providing information about cyberattacks and threats, as well as the preventive measures that should be taken. Our application was built to help prevent such attacks: it can be used as a database for searching for data breaches in hospitals.
The application contains various sections, such as the home page and the report page. The user is first directed to the home page, which explains the application and its features and links, through its options, to the doctor directory, the hospital directory, and the send mail page. Clicking the doctor directory opens a page with a search bar where the user can type a name or ID to check the breached details of his or her leaked credentials, such as names, social security numbers, addresses, dates of birth, and phone numbers. Clicking the hospital directory opens a page with a search bar where the user can type the name of a hospital to list the breached details of its doctors, working staff, and patients; breached patient details such as dates of birth, social security numbers, lab results, health insurance policy information, diagnoses, disability codes, doctor names, and medical conditions are also viewable. Finally, the send mail button can be used to report the discovered data breaches to the hospitals, which helps them change their credentials (Figs. 3 and 4).
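The directory search behavior described above can be sketched as a small matching function. The records and field names here are invented placeholders; the real application would query its breach database rather than an in-memory list.

```python
# Minimal sketch of the doctor-directory search described above. The
# records and field names are invented placeholders standing in for the
# application's breach database.
DOCTORS = [
    {"id": "D-1001", "name": "A. Example", "hospital": "Example General"},
    {"id": "D-1002", "name": "B. Sample", "hospital": "Sample Clinic"},
]

def search_directory(query, records=DOCTORS):
    """Match the query against either the record ID or the name,
    case-insensitively, as the doctor-directory search bar does."""
    q = query.strip().lower()
    return [r for r in records
            if q in r["id"].lower() or q in r["name"].lower()]
```

A substring match on either field lets users search by partial name or partial ID, which is how the search bar behaves for both directories.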

Fig. 3 Data breach OSINT



Fig. 4 PENTESTING hospital network

7 Conclusion

In conclusion, the increasing dependence on technology and the Internet has made
cybersecurity more critical than ever. Cybercriminals are always finding new ways
to exploit vulnerabilities and steal personal data and credentials. It is essential to
raise public awareness and provide tools like the proposed application to prevent
cyberattacks and threats. The application’s features, such as the doctor directory,
hospital directory, and send mail page, provide an efficient and convenient way to
search for data breaches in hospitals and report them to the concerned authorities.
Such initiatives can help improve cybersecurity and protect personal information and
sensitive data from being compromised. As a result of our research, we have collected data on about 600,000+ doctors. These are publicly available data from the dark web. We will automatically send this breached information to the respective doctors’ email addresses so they can safeguard their accounts and avoid further loss of private information. In the near future, this process will be automated using bash scripts to avoid delays in finding breached data.

References

1. Meng X et al (2022) Exploring the vulnerability in the inference phase of advanced persistent threats. Int J Distrib Sens Netw. https://ieeexplore.ieee.org/document/9716119
2. Khaleefa EJ, Abdulah DA (2022) Concept and difficulties of advanced persistent threats: survey. Int J Nonlinear Anal Appl. https://ijnaa.semnan.ac.ir/article_6230_4b14956433d591842b2dcfe6208c141e.pdf
3. Park S-H, Yun S-W, Jeon S-E, Park N-E, Shim H-Y, Lee Y-R, Lee S-J, Park T-R, Shin N-Y, Kang M-J (2022) Performance evaluation of open-source endpoint detection and response combining google rapid response and osquery for threat detection. IEEE. https://www.researchgate.net/figure/Reviewing-the-investigation-result-of-netstat_tbl2_342148695
4. Saritac U, Liu X, Wang R (2022) Assessment of cybersecurity framework in critical infrastructures. IEEE. https://ieeexplore.ieee.org/document/9753250
5. Veena RC, Brahmananda SH (2021) A framework for APT detection based on host destination and packet analysis. In: Computer networks and inventive communication technologies. https://www.researchgate.net/publication/338854957_A_Framework_of_APT_Detection_Based_on_Packets_Analysis_and_Host_Destination
6. Panahnejad M, Mirabi M (2022) APT-Dt-KC: advanced persistent threat detection based on kill-chain model. J Supercomput. https://doi.org/10.1007/s11227-021-04201-9
7. Al Hwaitat AK, Manaseer S, Al-Sayyed RMH (2019) A survey of digital forensic methods under advanced persistent threat in fog computing environment. J Theor Appl Inf Technol 97(18):4934–4954. http://www.jatit.org/volumes/Vol97No18/18Vol97No18.pdf
Blockchain-Based Supply Chain
Management System for Secure Vaccine
Distribution

V. Ceronmani Sharmila, S. Nandhini, and Ben Franklin Yesuraj

Abstract Vaccination is one of the best ways to stop or at least slow the spread of
infectious illnesses. There are various logistical concerns raised by this medical tech-
nique. Millions of people worldwide participate in preventative vaccination programs
every year, whether it is the seasonal flu shot, immunizations for children, or other
vaccines. An outbreak of disease can be avoided by using preventative vaccinations
before the sickness ever appears. Reactive vaccination occurs during an infectious
disease outbreak or in response to a bioterror strike, and it is distinct from preventative
immunization. Thus, this paper proposes a solution for privacy and security issues
in the supply chain industry. To overcome this problem, in this paper, we present an
efficient blockchain-based vaccine supply chain management where smart contracts
and WEB 3.0 play a major in securing the data. The web application developed
for this paper can be used by four types of users: Super admin, Hospital, Citizen,
and Manufacturer, where Super admin can govern the whole system. Then through
this application, manufacturers can add their vaccines and sell those vaccines to the
hospitals for a particular price and the hospitals can purchase the vaccines by paying
the quoted price to the manufacturer. Hospitals can purchase vaccines from manufac-
turers and view the appointment status of citizens, and the citizens are allowed to book
their appointments for vaccination. In addition, it offers foolproof data protection that
prevents hackers from altering information in any way. To secure information at its
source, a new framework known as WEB 3.0 incorporates a blockchain framework.
This paper provides a secure end-to-end data management system that allows people
to book appointments, purchase vaccines, and sell them online.

Keywords Blockchain · Smart contract · Supply chain · Vaccine distribution ·


Data security

V. C. Sharmila (B) · S. Nandhini · B. F. Yesuraj


Hindustan Institute of Technology and Science, Chennai, India
e-mail: csharmila@hindustanuniv.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 395
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_34
396 V. C. Sharmila et al.

1 Introduction

It is a challenging task to inoculate masses at an incredibly enormous scale. Billions of individuals eagerly await vaccination over the upcoming months. If the supply program is well managed, it will help reduce program costs while implementing the program expeditiously and without sacrificing the standard of service delivery. Poorly managed supply systems will result in high or unnecessary vaccine wastage rates, stock-outs, or improper management of waste, leading to significant operational program costs and a negative impact on public health.
and logistics play a significant role. Along with the supply chain, blockchain and IoT
are equally crucial for supporting the process and increasing efficacy. Companies are
under increased pressure to restructure their delivery strategies in response to the
ongoing pandemic emergency brought forth by the Coronavirus infection in 2019
(COVID-19). The logistical challenge of getting the vaccines to people who need
them around the world so rapidly and so effectively is unprecedented in scope and
urgency. COVID-19 emphasizes the need for international collaboration as nations
strive to recover from a severe public health emergency in order to address global
challenges. To lessen the public health impact and economic disruption, all parties
involved must work together immediately. With a diverse portfolio of COVID-19
candidate vaccines, governments can profit from economies of scale and portfolio
diversity while also guaranteeing a larger market than governments can fund on their
own.

2 Related Works

Both the Covid-Shield and Covaxin vaccines have begun rolling out across the several
states that make up India. Difficulties in the vaccination supply chain (VSC) are
expected to arise in emerging states like Bihar because of inadequate healthcare facil-
ities, widespread poverty, and low levels of education. The purpose of the research
by Shashank et al. [1] is to investigate how the IoT could affect the VSC’s efficiency.
A conceptual framework has been developed and evaluated by this research based on
the literature on the impact of IoT on product management, demand management,
supply management, social behavior, and government regulations in Bihar.
It is crucial for healthcare organizations to effectively control the healthcare supply
chain (HCSC) process and operation both during pandemics like COVID-19 and
during regular operations. According to Ilhaam et al. [2] to reduce price differ-
ences and mistakes in procurement, the suggested solution incorporates blockchain
technology and decentralized storage to increase visibility, simplify stakeholder
communication, and shorten the procurement process.
Coronavirus 2019 (COVID-19) emerged, complicating the distribution and
administration of vaccines against the virus. Current platforms and systems used
to manage data on the distribution and delivery of COVID-19 vaccines fall short in
Blockchain-Based Supply Chain Management System for Secure … 397

terms of immutability, accessibility, monitoring and tracing, audit, and confidence.


Ahmad et al. [3] propose using the Ethereum blockchain to manage information
about the shipment and delivery of COVID-19 vaccines. In order to ensure trace-
ability, strong authentication, availability, security, and accountability of COVID-19
vaccinations, they develop smart contracts.
Many computational paradigms, including cloud and edge computing, have
contributed to the rise in popularity of IoT techniques, and micro-services have been
considered a viable framework for the design and development of numerous applica-
tions. The goal of the approach proposed by Liang et al. [4] is improving specific out-
of-distribution problems in imbalanced learning to improve micro-service-oriented
intrusion detection in distributed IoT systems.
Vaccines, which are biological products, provide crucial protection for people.
Many of its consumers are young children who have compromised immune systems.
When a vaccination fails, it immediately becomes a major risk to many people’s
health. The blockchain-based two-tier method for managing the manufacture of
vaccines was proposed by Peng et al. [5]. They have first developed a blockchain
architecture with two levels.
New problems have arisen as consequences of the globalization of pharmaceutical
supply chain, with battle against counterfeit and low-quality drugs taking the lead.
Kostyuchenko et al. [6] examine how blockchains may be used to modernize the
pharmaceutical supply chain and decrease the overall availability of substandard
medications.
Almost every aspect of human life has been adversely impacted by the global
COVID-19 outbreak, as well as numerous industries and geographical areas. At
the outset of the COVID-19 epidemic, Kalla et al. [7] describe some of the more
general difficulties that have occurred. We then develop prospective use cases to suit
existing needs and evaluate whether or not blockchain may serve as a major enabling
technology.
Disease epidemics have plagued humans since their earliest days on the planet.
Vaccination has just been a practical method of preventing epidemics in the last two
centuries. Modeling the effectiveness of vaccination taking into account the cost and
reward to individual players is described by Soltanolkottabi et al. [8]. The model is
a lattice-based spatial game based on the public goods game.

3 Proposed System

This paper proposes a solution for privacy and security issues in the supply chain industry. To overcome this problem, we present an efficient blockchain-based vaccine supply chain management system where smart contracts and WEB 3.0 play a major role in securing the data. The web application developed for this paper can be used
by four types of users, namely Super admin, Hospital, Citizen, and Manufacturer,
where Super admin can govern the whole system. Then through this application
Manufacturer can add their vaccines and sell those vaccines to the hospitals for a particular price and the hospitals can purchase the vaccines by paying the quoted
price to the manufacturer. Hospitals can purchase the vaccines from manufacturer
and view the appointment status of citizens, and the citizens are allowed to book their
appointments for vaccination. Moreover, it offers total data protection, preventing
any data tampering by hackers. Web 3.0 is a framework with a blockchain architecture
that protects data at its back-end. Since no encrypt or decrypt key is necessary to
gain access to the information, there is no longer any concern that hackers would
access or alter the data. Thus, this paper offers a complete data security solution
for scheduling appointments in the supply chain sector, purchase, and sell vaccines
through online.

3.1 Advantages of Proposed System

• High data security due to the absence of encryption/decryption key.


• Secures from hackers as it is not stored in any public/private database.
• Data cannot be tampered at all without the permission of the main user.

3.2 System Architecture

See Fig. 1.

3.3 Working

In this paper, we present an efficient blockchain-based vaccine supply chain management system where smart contracts and WEB 3.0 play a major role in securing the data. An
agreement between two or more parties stored and recorded on a blockchain, such as
Ethereum or EOS, is known as a smart contract. In our paper, end-to-end protected
contracts are created using the Solidity language for smart contracts. Afterward, func-
tions are tested with Remix IDE’s functionality testing tool. Using truffle framework
connectivity, the data is integrated at the front end. The development of a framework
known as Web 3.0 includes a blockchain foundation to safeguard information at the
back-end. There are four different logins allotted for this medical field such as Super
admin, Hospital, Citizen, and Manufacturer where Super admin can govern the whole
system. Then through this application Manufacturer can add their vaccines and sell
those vaccines to the hospitals for a particular price and the hospitals can purchase
the vaccines by paying the quoted price to the manufacturer. Hospitals can purchase
the vaccines from manufacturer and view the appointment status of citizens, and the
citizens are allowed to book their appointments for vaccination. The details are then verified and updated through a MetaMask transaction on the Ethereum blockchain, and the transaction is initiated and successfully recorded. The test network will be installed on a local platform, connected to the blockchain network, and utilized to evaluate the viability of the smart contract. As a result, the supply chain sector may use this paper’s end-to-end data security system to schedule appointments and buy and sell vaccinations online.

Fig. 1 Proposed system architecture
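The workflow just described (a manufacturer lists a vaccine, a hospital purchases stock, and a citizen books an appointment) can be sketched as a plain Python model. This is only an illustration of the flow, not the authors' contract code; names, prices, and quantities are invented.

```python
# Illustrative model (not the authors' Solidity contract) of the described
# flow: manufacturer lists a vaccine, hospital purchases stock, citizen
# books an appointment. All names and prices are invented.
class SupplyChain:
    def __init__(self):
        self.listings = {}       # vaccine name -> {"price": ..., "stock": units}
        self.hospital_stock = {} # (hospital, vaccine) -> units held
        self.appointments = []   # (citizen, hospital, vaccine)

    def list_vaccine(self, name, price, stock):
        self.listings[name] = {"price": price, "stock": stock}

    def purchase(self, hospital, name, units, payment):
        """Hospital buys units if it pays the quoted price and stock suffices."""
        item = self.listings[name]
        if payment < item["price"] * units or item["stock"] < units:
            return False
        item["stock"] -= units
        key = (hospital, name)
        self.hospital_stock[key] = self.hospital_stock.get(key, 0) + units
        return True

    def book_appointment(self, citizen, hospital, name):
        """Citizen books an appointment if the hospital has doses left."""
        key = (hospital, name)
        if self.hospital_stock.get(key, 0) == 0:
            return False
        self.hospital_stock[key] -= 1
        self.appointments.append((citizen, hospital, name))
        return True

chain = SupplyChain()
chain.list_vaccine("VaxA", price=100, stock=50)
ok_buy = chain.purchase("City Hospital", "VaxA", units=10, payment=1000)
ok_book = chain.book_appointment("alice", "City Hospital", "VaxA")
```

On a blockchain, each of these methods would correspond to a contract function whose state changes are recorded as transactions.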

4 System Process

4.1 Modules Description

• Smart Contract Development


• Functionality Testing
• Truffle Framework Integration
• Web 3.0 Frontend Development
• MetaMask Integration
• Test Network.

4.1.1 Smart Contract Development

In this paper, the Solidity language is used for smart contract development. Using a distributed ledger system, smart contracts can be executed; these are agreements between parties that are digitally recorded (such as on Ethereum or EOS). The
phrase “smart contract” refers to any type of computer code that can be recorded
on a blockchain and then activated when certain circumstances are satisfied. Smart
contracts have the advantage of producing the expected outcome on their own. The
guiding principles of blockchain technology are decentralization and transparency.

4.1.2 Functionality Testing

To ensure that all necessary features are working properly, we use Remix IDE in
this paper. During functional testing, quality assurance specialists check to see if a
program is functioning as expected. For data security, it is essential to double-check
both the encrypted and decrypted transmissions. Manual tests and automation testing
tools can both be used to ensure functionality. Manual testing is a straightforward
method for performing functional testing, which entails verifying that the product
works as intended by the target audience.

4.1.3 Truffle Framework Integration

Ganache is a private blockchain that can be used by developers to build and test
smart contracts, decentralized applications (dApps), and other software. Truffle is a
framework for developing decentralized applications. When developing a decentral-
ized application (dApp), we may use Truffle to draft the necessary smart contracts,
create unit tests for the underlying functions, and plan out the dApp’s user interface.

4.1.4 Web 3.0 Frontend Development

Web 3.0 enables the creation of smart contracts to describe the behavior of appli-
cations and their subsequent deployment to a distributed state machine. Distributed
ledgers can pave the way for Web 3.0 by encouraging widespread collaboration. Web
3.0, also known as the Semantic Web, is advantageous for two reasons: It facilitates
cooperation and decentralized invention. Web 3.0 is a primary reason why people
are increasingly relying on their mobile gadgets.

4.1.5 MetaMask Integration

Users can save their Ether in a secure token wallet called MetaMask, and they can
also manage their own identities with it. It is the part of the blockchain system in charge of verifying users and establishing links between accounts. The processing
charge for MetaMask transactions is 0.875% of the overall transaction value.
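As a small worked example of that charge, assuming the 0.875% rate quoted above:

```python
# Tiny worked example of the 0.875% processing charge quoted in the text.
FEE_RATE = 0.00875  # 0.875%, as stated above

def metamask_fee(transaction_value):
    """Return (fee, total charged) for a given transaction value."""
    fee = transaction_value * FEE_RATE
    return fee, transaction_value + fee

fee, total = metamask_fee(1000)
```

For a transaction of 1000 units, the fee is 8.75 units and the total charged is 1008.75.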

4.1.6 Test Network

Results from every stage of development will be gathered for this paper. The next step
is to integrate it with the blockchain network and roll out the platform locally. Smart
contract viability is verified by the network. Without a decryption key or encryption,
data can be accessed freely without worrying about unauthorized parties gaining
access to it. Therefore, the hospital’s medical records are protected from beginning
to end by this technology.

5 Algorithm

5.1 Algorithm for Smart Contract

Step 1: Create a smart contract.


Step 2: Declare input variables for the Hospital, Manufacturer, and Citizens.
Step 3: Implement signup functions for Hospitals, Manufacturers, and Citizens,
including login validation.
Step 4: Create functions to retrieve data from the blockchain for Hospitals,
Manufacturers, and Citizens.
Step 5: Return relevant data such as names, passwords, and owner addresses for
Hospitals, Manufacturers, and Citizens.
Step 6: Develop a function to update details data in the blockchain.
Step 7: Implement a function to view details data in the blockchain.
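The authors implement these steps in Solidity; the following Python model only mirrors the algorithm (signup, login validation, retrieval, and update) for illustration. Role names follow the text, and the addresses and credentials are invented.

```python
# Python model mirroring the smart-contract algorithm above: signup with
# login validation, retrieval, and update for the three user roles. The
# actual implementation is a Solidity contract; addresses and credentials
# here are invented placeholders.
class RegistryModel:
    ROLES = ("hospital", "manufacturer", "citizen")

    def __init__(self):
        self.accounts = {role: {} for role in self.ROLES}

    def signup(self, role, owner_address, name, password):
        """Step 3: register an account once per role and owner address."""
        if role not in self.ROLES or owner_address in self.accounts[role]:
            return False
        self.accounts[role][owner_address] = {"name": name, "password": password}
        return True

    def login(self, role, owner_address, password):
        """Step 3: validate login credentials."""
        acct = self.accounts.get(role, {}).get(owner_address)
        return acct is not None and acct["password"] == password

    def get_details(self, role, owner_address):
        """Steps 4-5: return name and owner address for an account."""
        acct = self.accounts[role][owner_address]
        return acct["name"], owner_address

    def update_details(self, role, owner_address, name):
        """Step 6: update stored details."""
        self.accounts[role][owner_address]["name"] = name

reg = RegistryModel()
reg.signup("hospital", "0xABC", "City Hospital", "s3cret")
```

In the Solidity version, each mapping entry would be keyed by the caller's address, and storing plaintext passwords on-chain would be avoided in favor of hashes.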

6 Pseudocode

6.1 Pseudocode for Smart Contract

See Figs. 2 and 3.

7 Results

In this paper, we can break our paper down into modules of implementation.

Fig. 2 Smart contract-pseudocode (i)

Fig. 3 Smart contract-pseudocode (ii)

The first step is account creation of the users for accessing the application. There
are four different logins allotted in this application such as admin login, where the
admin can provide or reject login access to the manufacturer and can monitor vaccine
status. The manufacturer can add vaccine and can sell those products to hospitals.
The hospital can buy vaccine from manufacturers and can sell those vaccines to
citizens which means the citizens can book appointments to the hospital.
The below image represents the login page of the application, where all types of users can log in and access the application (Fig. 4).
Blockchain-Based Supply Chain Management System for Secure … 403

Fig. 4 Login page

On the signup page of the application, all four types of users can sign up using their
credentials. The below image represents the admin’s dashboard: the admin can grant
login access to manufacturers and revoke it whenever needed. The admin can also
view the vaccines remaining with manufacturers and hospitals (Fig. 5).
The below image represents the manufacturer’s home page, where the manufacturer
can add vaccines and sell them to hospitals. The admin has the
ability to either grant or deny manufacturer login access. In addition, the admin is
able to keep track of the vaccines that are currently available from the manufacturer
Fig. 5 Admin’s dashboard



Fig. 6 Manufacturer home page

as well as hospitals. Additionally, they are able to check the number of people who
have been vaccinated up to this point (Fig. 6).
The vaccine-adding feature allows the manufacturer to add products. The below
image represents the hospital’s home page, where the hospital can verify and buy
vaccines from manufacturers and sell them to citizens. Hospitals are listed so that
they can purchase vaccines from the manufacturers (Fig. 7).
On the hospital’s home page, the hospital verifies the vaccines sold by the
manufacturer, validates them, and purchases them directly. The hospital can then
offer vaccines to citizens who book appointments: it can view the appointment status
booked by citizens, accept or reject those appointments, and update the vaccination
status. Citizens booking an appointment at a hospital is represented in Fig. 8.
The vaccines offered by the hospital for sale are displayed to citizens, who can
verify a vaccine by clicking on the QR code shown next to it and can also schedule
appointments at the hospital. After the hospital confirms their appointment, citizens
can receive their vaccination as soon as the facility is ready.

Fig. 7 Hospital home page

Fig. 8 Citizens booking appointment



8 Conclusion

This paper presents a successful solution for vaccine distribution in which data can
be viewed securely, with the Ethereum blockchain playing a significant role. It is a
highly reliable data security solution that uses Web 3.0 and smart contracts to protect
data, guaranteeing total data security and preventing hackers from altering it. The
Web 3.0 implementation uses a built-in blockchain architecture that protects data at
the back end. This paper therefore offers the industry end-to-end security and privacy
for booking appointments and for purchasing and selling vaccines over the Internet.

9 Future Work

In the near future, we will examine the applicability of this work to identify the
medical technologies that need data security and privacy. There are further
opportunities to expand or adapt this work within the medical industry. As a result,
this approach will remain effective in the future, enabling safe data transmission
that cannot be tampered with without the permission of the main user.

Dehazing of Multispectral Images Using
Contrastive Learning In CycleGAN

S. Kayalvizhi, Badrinath Karthikeyan, Canchibalaji Sathvik,


and Chadalavada Gautham

Abstract Image dehazing is a necessary task in computer vision, especially
when working with surveillance and satellite data. In this study, we present a
contrastive learning method for multispectral image dehazing based on CycleGAN.
The proposed technique improves the performance of the dehazing model
by combining the benefits of contrastive learning and CycleGAN: contrastive learning
is used to separate hazy and clear images, while CycleGAN is used to transform hazy
images into their dehazed counterparts. The proposed system is trained and evaluated
on the RESIDE dataset, a benchmark dataset for image dehazing. The experiments
demonstrate that the proposed system outperforms several existing methods,
including CNN, DCP, HOT, DehazeNet, AOD-Net, and GDN. Our proposed system
has practical implications for applications such as remote sensing and surveillance,
where accurate image dehazing is essential.

Keywords Dehazing · Multispectral images · Contrastive learning · CycleGAN ·


Deep learning

1 Introduction

Multispectral imaging is widely used in remote sensing, environmental monitoring,


and medical diagnostics. However, atmospheric haze can significantly degrade
the quality and visibility of multispectral images, limiting their usefulness. Image
dehazing techniques aim to restore clear and accurate representations of hazy scenes.
In recent years, deep learning approaches, such as contrastive learning (CL) and
CycleGAN, have shown promise in addressing this challenge.
Contrastive learning is a self-supervised method that develops meaningful image
representations by comparing and contrasting pairs of similar and dissimilar samples.

S. Kayalvizhi (B) · B. Karthikeyan · C. Sathvik · C. Gautham


Computer Science and Engineering, Easwari Engineering College Chennai, Chennai, India
e-mail: kayalvizhi.s@eec.srmrmp.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 407
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_35

On the other hand, CycleGAN enables nonlinear mapping between domains,


facilitating the transformation of hazy multispectral images into clear counterparts.
Motivated by the potential synergy between contrastive learning and CycleGAN,
this paper proposes an integrated approach for multispectral image dehazing. The
method consists of four components: haze removal, contrastive learning, CycleGAN,
and evaluation.
Haze removal estimates transmission maps to obtain clear input images.
Contrastive learning separates hazy and clear images without requiring labeled data,
capturing the underlying structure and learning discriminative features. CycleGAN
generates clear multispectral images from the hazy ones, ensuring visual similarity
and preservation of spectral characteristics.
The proposed framework’s effectiveness is evaluated using objective measures such
as the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM),
as well as visual inspection and comparisons with existing dehazing techniques.
By integrating contrastive learning and CycleGAN, our approach offers a novel
solution for multispectral image dehazing. The combination leverages the strengths
of both techniques, improving performance and generating visually comparable and
accurate multispectral images. The experimental results demonstrate the potential
for advancing the state of the art in multispectral image dehazing, with applications
in remote sensing, environmental monitoring, and medical diagnostics.

2 Proposed Solution

Dehazing of multispectral images is a challenging task that requires removing
the haze while preserving the spectral information. One potential approach is to use
CycleGAN, a kind of generative adversarial network (GAN) that can learn a mapping
between the hazy and clear image domains.
The problem addressed in this proposed solution is image dehazing of multispec-
tral images using contrastive learning in CycleGAN. Multispectral pictures, which
record images in a number of electromagnetic spectrum bands, are frequently used to
gather data about the environment. Nevertheless, atmospheric haze frequently inter-
feres with multispectral photographs, reducing visibility and degrading the precision
of subsequent analysis.
The proposed solution addresses this problem by combining contrastive learning
and CycleGAN. Contrastive learning is a self-supervised method that learns repre-
sentations from pairs of similar and dissimilar data samples. In the context of image
dehazing, contrastive learning can distinguish between hazy and clear multispectral
images, and this can be used to train a generative model such as CycleGAN [1]
to transform hazy images into clear ones. Without the need for labeled data, the
deep learning-based CycleGAN can learn a nonlinear translation between hazy and
clear images.

The input multispectral images are first preprocessed with a haze removal algo-
rithm to obtain estimated transmission maps. A contrastive learning stage then
separates the images into pairs of positive and negative samples depending on
their similarity in the feature space [2]. By maximizing the similarity between
positive pairs and minimizing the similarity between negative pairs, the contrastive
loss helps the model learn representations that accurately reflect the underlying
structure of the multispectral images. The positive pairs are then fed into
CycleGAN, where they are transformed into the clear multispectral image domain
and then back into the hazy multispectral image domain. The adversarial loss
encourages the generated clear multispectral images to be visually comparable
to real clear multispectral images, while the cycle consistency loss ensures that the
generated clear images remain consistent with the initial hazy multispectral images
(Fig. 1).
The primary objective of our research is to enhance the visibility and quality of
multispectral images captured under hazy conditions. Haze and atmospheric scat-
tering significantly degrade the visibility of such images, rendering them less infor-
mative and limiting their applications in various fields, including remote sensing,
environmental monitoring, and agricultural analysis. By effectively removing the
haze and restoring the original content, we aim to improve the interpretability
and utility of multispectral imagery, enabling more accurate analysis and decision-
making processes.
To achieve this objective, we employ a novel approach based on contrastive
learning in the context of CycleGAN. Contrastive learning has recently gained
considerable attention due to its ability to learn powerful image representations from
unlabeled data. By leveraging the inherent structure and information in multispec-
tral images, our proposed method learns to capture the underlying characteristics of

Fig. 1 Architecture diagram



haze and non-haze regions. This facilitates the generation of high-quality, haze-free
multispectral images, enabling more accurate and reliable analysis.
Our understanding of the proposed methodology is based on the utilization of the
CycleGAN framework, which comprises two generative models, each responsible for
learning the mapping between haze and non-haze domains. Through the introduction
of contrastive learning, we encourage the models to focus on salient features, ensuring
that important details and information are preserved during the dehazing process. The
use of multispectral images as input allows us to exploit the complementary nature
of the different spectral bands, enhancing the overall dehazing performance.

2.1 Foundational Concepts and Techniques in the Proposed System

To solve the problem of dehazing multispectral images, the proposed approach
integrates contrastive learning and CycleGAN. To understand the proposed solution,
it is essential to understand the following concepts: contrastive learning,
CycleGAN [3], and transmission maps.
Contrastive learning is a self-supervised method that learns representations by
comparing and contrasting pairs of data points. During contrastive learning, similar
samples are mapped to nearby points in a feature space, whereas dissimilar samples
are mapped to distant points. The objective is to increase the similarity of positive
pairs (samples from the same class) and reduce the similarity of negative pairs
(samples from different classes).
Without the need for labeled data, the deep learning-based CycleGAN model can
learn a nonlinear mapping between two data domains. CycleGAN is composed of
two generators and two discriminators: the discriminators learn to distinguish
between generated data and real data, while the generators learn to translate the
input from one domain to the other. CycleGAN also incorporates a cycle consistency
loss, which ensures that the generated data can be mapped back to the original
domain with minimal distortion.
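The cycle consistency constraint can be illustrated with a minimal numerical sketch (not the paper's implementation): two toy mappings stand in for the generators, and the loss penalizes the round-trip reconstruction error ‖F(G(x)) − x‖₁:

```python
import numpy as np

# Toy sketch of CycleGAN's cycle consistency loss with scalar "generators".
# G maps domain X (hazy) to Y (clear); F maps Y back to X. In the real
# model both are deep networks; scalar gains are used here for illustration.

def G(x):            # hazy -> clear (toy mapping)
    return 2.0 * x

def F(y):            # clear -> hazy (toy inverse mapping)
    return 0.5 * y

def cycle_consistency_loss(x, forward, backward):
    """Mean absolute error between x and its round-trip reconstruction."""
    return np.mean(np.abs(backward(forward(x)) - x))

x = np.array([0.2, 0.4, 0.6])
print(cycle_consistency_loss(x, G, F))   # 0.0: here F perfectly inverts G
```

When `F` only approximately inverts `G`, the loss is nonzero; minimizing it during training drives the two generators toward consistent, invertible mappings.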
Transmission maps are a key concept in image dehazing, which describe the
proportion of light that is transmitted through the atmosphere at each pixel in the
image. Transmission maps are estimated using various haze removal algorithms,
such as dark channel prior or atmospheric scattering models. Transmission maps are
used to remove the effects of atmospheric haze and generate clear images from hazy
images.
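Under the standard atmospheric scattering model I = J·t + A·(1 − t), a clear image J can be recovered from the hazy image I once the transmission map t and atmospheric light A are estimated. The following is a generic sketch of that inversion, not the specific estimator used in this paper:

```python
import numpy as np

# Recover scene radiance J from a hazy image I given a transmission map t
# and atmospheric light A, via the scattering model I = J*t + A*(1 - t).
# t_min guards against division by near-zero transmission in dense haze.

def dehaze(I, t, A, t_min=0.1):
    t = np.maximum(t, t_min)
    return (I - A) / t + A

# Toy 2x2 single-band example: a scene J = 0.5 hazed with A = 1.0, t = 0.8.
J_true = np.full((2, 2), 0.5)
t = np.full((2, 2), 0.8)
A = 1.0
I = J_true * t + A * (1 - t)                  # synthesize the hazy image
print(np.allclose(dehaze(I, t, A), J_true))   # True
```

In practice each spectral band is processed with its own transmission estimate, since haze attenuation varies across wavelengths.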
In the proposed system, we first estimate the transmission maps from the input
multispectral images using a haze removal algorithm [4]. We then use contrastive
learning to separate hazy and clear multispectral images and train a CycleGAN model
to generate clear multispectral images from the hazy ones. The contrastive loss
maximizes the similarity between positive pairs of multispectral images while
minimizing the similarity between negative pairs [5]. The adversarial loss
encourages the generated clear multispectral images to be visually comparable
to the ground truth clear multispectral images, while the cycle consistency loss
ensures that the generated clear images remain consistent with the initial hazy
multispectral images.
This study proposes a solution for enhancing the quality of hazy multispectral
images through the integration of contrastive learning, CycleGAN, and transmis-
sion maps. Transmission maps, which describe the light transmission through the
atmosphere in each pixel, are estimated using a haze removal algorithm. Contrastive
learning is then used to distinguish between hazy and clear images, maximizing
the similarity of positive pairs and minimizing the similarity of negative pairs. The
CycleGAN model is trained to generate clear images from the hazy ones, with a
focus on preserving visual similarity and maintaining consistency with the original
images. By combining these techniques, the proposed approach aims to produce
visually comparable and accurate multispectral images.

2.2 Overview of Integrated Dehazing Framework

The proposed system for image dehazing of multispectral images using contrastive
learning in CycleGAN consists of four main components: haze removal, contrastive
learning, CycleGAN, and evaluation. The following is a high-level description of the
proposed framework and the interactions between its components.
1. Haze Removal: The first step in the proposed framework is to estimate the trans-
mission maps from the input multispectral images using a haze removal algo-
rithm. This step is necessary to remove the effects of atmospheric haze and obtain
clear images as input for the subsequent steps. Several haze removal algorithms
can be used for this step, such as the dark channel prior or atmospheric scattering
models [6].
2. Contrastive Learning: The second component of the proposed framework is
   contrastive learning, which is used to separate hazy and clear multispectral images
   without requiring any labeled data. In this step, the input multispectral images
   are first preprocessed and fed into the contrastive learning module. The contrastive
   loss maximizes the similarity between positive pairs of multispectral images and
   minimizes the similarity between negative pairs. The result of this stage is a set of
   pairs of hazy and clear multispectral images [7].
3. CycleGAN: The third component of the proposed framework is CycleGAN,
   which is used to generate clear multispectral images from the hazy ones. The
   positive pairs of hazy and clear multispectral images from the contrastive learning
   phase are used to train the CycleGAN network in this step. The adversarial loss
   encourages the generated clear multispectral images to be visually comparable
   to the ground truth clear multispectral images, while the cycle consistency loss
   guarantees that the generated clear images remain consistent with the original
   hazy multispectral images.

4. Evaluation: The efficacy of the dehazing model is evaluated in the last component
   of the proposed framework. In addition to visual inspection, evaluation may be
   carried out using a variety of objective measures, including the peak signal-to-noise
   ratio (PSNR) and the structural similarity index (SSIM). The performance of the
   proposed framework may then be compared with that of existing state-of-the-art
   dehazing techniques.
The proposed system for image dehazing of multispectral images incorporates
four key components: haze removal, contrastive learning, CycleGAN, and evaluation.
The first step involves estimating transmission maps using a haze removal algorithm
to remove atmospheric haze from the input multispectral images. Contrastive learning
is then employed to separate hazy and clear images by maximizing similarity among
positive pairings and minimizing similarity among negative pairings. The CycleGAN
model is trained using the positive pairings from contrastive learning to generate
clear multispectral images from the hazy ones. The generated images are optimized
to be visually comparable to the ground truth using adversarial and cycle consistency
losses. Finally, the efficacy of the proposed framework is evaluated through visual
inspection and objective measures like PSNR and SSIM, allowing for comparisons
with existing dehazing techniques.
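The objective measures used in the evaluation step are standard; PSNR, for instance, follows directly from the MSE between the restored image and its reference. A generic sketch for images scaled to [0, 1]:

```python
import numpy as np

# PSNR between a restored image and its reference, for images in [0, 1].
# PSNR = 10 * log10(MAX^2 / MSE); higher values indicate better fidelity.

def mse(x, y):
    return np.mean((x - y) ** 2)

def psnr(x, y, max_val=1.0):
    m = mse(x, y)
    if m == 0:
        return float("inf")        # identical images
    return 10.0 * np.log10(max_val ** 2 / m)

ref = np.zeros((4, 4))
noisy = ref + 0.1                  # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(noisy, ref), 1))  # 20.0
```

SSIM is more involved (it compares local luminance, contrast, and structure statistics) and is typically taken from an image processing library rather than implemented by hand.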

2.3 Using Contrastive Learning for Hazy and Clear Picture Separation

Contrastive learning is a self-supervised method that develops representations
by comparing and contrasting pairs of similar and dissimilar data samples.
In the proposed method for dehazing multispectral images using CycleGAN,
contrastive learning is employed to distinguish between hazy and clear images
without the need for labeled training data [8].
In this approach, positive pairs are defined as pairs of multispectral images that
are similar in the feature space, i.e., pairs of hazy and clear multispectral images.
Negative pairs are defined as pairs of multispectral images that are dissimilar in the
feature space, i.e., pairs of hazy multispectral images and clear multispectral images
that do not belong to the same image pair. To learn the feature space, a deep neural
network such as a CNN [9] is used.
The choice of loss function is a critical aspect of the contrastive learning approach.
In this proposed system, we use the InfoNCE (noise-contrastive estimation) loss
function, which has been shown to be effective for contrastive learning. The
InfoNCE loss is the negative log-likelihood of the similarity between a positive
pair relative to the similarities with the negative samples.
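A minimal sketch of the InfoNCE loss for a single anchor, assuming precomputed similarity scores and a temperature τ (illustrative, not the exact training code):

```python
import numpy as np

# InfoNCE loss for a single anchor: negative log of the softmax probability
# assigned to the positive sample among one positive and K negatives.

def info_nce(sim_pos, sim_negs, tau=0.1):
    """sim_pos: similarity to the positive; sim_negs: similarities to negatives."""
    logits = np.concatenate(([sim_pos], sim_negs)) / tau
    logits -= logits.max()                      # numerical stability
    p_pos = np.exp(logits[0]) / np.exp(logits).sum()
    return -np.log(p_pos)

# A confident positive (high similarity) yields a small loss...
low = info_nce(0.9, np.array([0.1, 0.0, -0.2]))
# ...while an ambiguous one yields a larger loss.
high = info_nce(0.2, np.array([0.1, 0.0, -0.2]))
print(low < high)   # True
```

Minimizing this loss over many anchors pulls positive pairs together and pushes negatives apart in the feature space, which is exactly the separation of hazy and clear images described above.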
Architecturally, the contrastive learning component of the proposed system can
use various neural network architectures for feature extraction. One typical approach
is to use a pre-trained CNN, such as ResNet [10] or VGG, as the feature extractor.
The output of the feature extractor is then fed into a projection head, which projects
the feature vectors onto a lower-dimensional space [2].
The proposed method utilizes contrastive learning, a self-supervised technique,
in combination with CycleGAN for dehazing multispectral images. Contrastive
learning enables discrimination between hazy and clear images without the need for
labeled training data. Positive pairs consist of similar multispectral images, while
negative pairs consist of dissimilar ones. A neural network, such as a CNN, is
employed to learn the feature space. The InfoNCE loss function is used, which
scores the similarity of each positive pair against the similarities with the negative
samples. Various neural network architectures can be used for feature extraction,
typically a pre-trained CNN as the feature extractor followed by a projection head
for dimensionality reduction.

2.4 Combining Contrastive Learning and CycleGAN for Enhanced Image Dehazing

The proposed method for dehazing multispectral images combines two components,
contrastive learning and CycleGAN, to improve the performance of the dehazing
model. The two components are combined as described below:
1. The contrastive learning component of the proposed system creates positive pairs
   of hazy and clear multispectral images without the need for labeled data. These
   pairs are then used to train the CycleGAN model.
2. The CycleGAN component of the proposed system is used to generate clear multi-
   spectral images from the hazy ones. The performance of the dehazing model is
   then assessed using the generated clear multispectral images.
Integration of contrastive learning and CycleGAN allows the proposed system
to leverage the advantages of both techniques [11]. Contrastive learning enables the
proposed system to learn representations that capture the underlying structure of the
multispectral images, while CycleGAN enables the proposed system to transform
hazy multispectral images into their clear counterparts.
The integration of the two components also enables the proposed system to over-
come some of the limitations of each individual technique. For example, contrastive
learning may not be able to capture all the nuances of the dehazing process, while
CycleGAN may not be able to effectively learn the underlying structure of the
multispectral images without additional constraints.
In summary, the integration of contrastive learning and CycleGAN in the proposed
system for image dehazing of multispectral images allows the system to leverage the
advantages of both techniques and overcome some of their limitations, leading to
improved performance in dehazing multispectral images.

The proposed method combines contrastive learning and CycleGAN to enhance


the performance of the image dehazing model for multispectral images. Contrastive
learning creates positive pairings of hazy and clear images without labeled data,
which are then used to train the CycleGAN model. The CycleGAN component
generates clear multispectral images from the hazy ones, and the performance of
the model is evaluated based on the produced clear images. Integrating contrastive
learning and CycleGAN allows the system to benefit from the strengths of both
techniques. Contrastive learning captures the underlying structure of the multispec-
tral images, while CycleGAN effectively transforms hazy images into clear ones. By
combining these components, the proposed system overcomes the limitations of each
technique and achieves improved performance in dehazing multispectral images.

3 Results

This section presents the findings from the study employing CycleGAN to dehaze
multispectral images. By utilizing the strength of generative adversarial networks
(GANs), the proposed technique successfully improves the clarity and visibility of
photographs taken in hazy settings. The goal of our study is to create a dehazing
model in CycleGAN that uses contrastive learning to recover lost details and features
while retaining the scene’s original color and texture. To increase the reliability of
the proposed model, the datasets used for the investigation were preprocessed and
augmented. The model was assessed using the proposed entropy-based metric, which
gauges the amount of information contained within the restored image; the peak
signal-to-noise ratio (PSNR), which rates the restored image’s quality in comparison
with the original; and the mean squared error (MSE), which gauges the average
squared discrepancies between the restored and original images. The findings
demonstrate that the proposed model outperforms cutting-edge methodologies on
all three criteria. The results presented in this section offer insight into CycleGAN’s
performance when dehazing multispectral images and contribute to the body of
knowledge in the fields of image processing and machine vision (Fig. 2).

Fig. 2 Results of proposed model



The proposed method for dehazing multispectral images using contrastive learning
in CycleGAN was evaluated using the SS594_Multispectral_Dehazing dataset. The
findings demonstrate that the proposed model surpasses cutting-edge approaches in
terms of PSNR, MSE, and the suggested entropy-based metric. The PSNR and MSE
results show that, in comparison with existing techniques, the proposed model
generates pictures that are more faithful to the real scene. According to the
entropy-based metric, which gauges the amount of information present in the
restored image, the proposed technique is better able to maintain features and
textures while eliminating haze from the images.
The suggested technique successfully eliminates haze from multispectral photos
while retaining the scene’s authentic colors and textures. This is due to the use of
contrastive learning in CycleGAN, which allows the model to learn the underlying
structure and features of the image distribution. The model may provide a dehazed
picture that accurately represents the original scene by differentiating between the
underlying scene and the haze. The dataset used for the study was preprocessed and
augmented, which improved the robustness of the proposed model (Table 1).
The table presents a comparison of six image dehazing methods: dark
channel prior (DCP) [12], DehazeNet, AOD-Net, GFN, CycleGAN, and the proposed
method. The performance of these methods is evaluated using widely used image
quality metrics: PSNR, RMSE, MSE, FSIM, and MSSIM.
The PSNR values indicate that the proposed method outperforms all other
methods, achieving the highest score of 32.69. This is followed by CycleGAN at
30.22, GFN at 29.17, AOD-Net at 28.22, DehazeNet at 20.84, and DCP [13] at
19.34. The higher PSNR values for the proposed method and CycleGAN suggest
better image quality and dehazing performance.
The results of this study contribute to the existing body of knowledge on dehazing
of multispectral images using CycleGAN. The proposed method has potential appli-
cations in various fields, including remote sensing, autonomous vehicles, and surveil-
lance systems. Future work can explore the use of other GAN architectures and loss
functions to further improve the performance of the proposed model. Additionally,
the proposed model can be evaluated on other datasets to determine its effectiveness
in a wider range of conditions.
In this study, a method for dehazing multispectral images using contrastive
learning in CycleGAN is proposed and compared with five existing methods.

Table 1 Evaluation of various approaches using PSNR, RMSE, MSE, FSIM, and MSSIM


Methods DCP DehazeNet AOD-Net GFN CycleGAN Proposed
PSNR 19.34 20.84 28.22 29.17 30.22 32.69
RMSE 0.082 0.16 0.089 0.115 0.094 0.18
MSE 0.0032 0.0056 0.0019 0.0022 0.0046 0.0015
FSIM 0.822 0.0679 0.914 0.921 0.752 0.914
MSSIM 0.76 0.82 0.68 0.91 0.794 0.95

Several methods have been proposed for dehazing multispectral images, but
each has its limitations. The first method estimates the transmission map and elimi-
nates haze using DCP and WAF [14], but it may not be effective for multispectral
photographs due to variations in atmospheric properties. The second method uses
adaptive histogram equalization for contrast restoration [6], but it does not estimate
the overall transmission map or eliminate haze. The third method uses an unsuper-
vised algorithm for haze removal in multispectral remote sensing pictures [15], but
it may not generalize well to other datasets. The fourth method estimates the prop-
agation map and eliminates haze using a residual architecture [16], but it may not
be effective on photographs with intricate haze distributions. The fifth method uses
CycleGAN for autonomous dehazing of aerial data [17], but it learns the translation
between foggy and haze-free pictures using an asymmetric contrastive loss function.
The proposed approach outperforms these existing methods, especially on the SS594
multispectral dehazing dataset, by using a contrastive loss function that learns the
mapping between hazy and haze-free images in a more symmetrical manner [17].
In terms of image quality measures including PSNR, MSE, and entropy, the suggested model for dehazing multispectral images using contrastive learning in CycleGAN [17] outperforms previous techniques, offering superior image restoration compared with current approaches. The approach of Makarau et al. [15] detects and removes haze from remotely sensed multispectral imagery; however, it is unable to recover low-contrast images. The work of Zhang et al., although not explicitly designed for multispectral images, focuses on employing a convolutional neural network with a residual architecture. The unsupervised haze removal method proposed by Liu et al., using an asymmetric contrastive CycleGAN, also does not cater specifically to multispectral images. In contrast, our proposed method is specifically designed for multispectral images and leverages the advantages of contrastive learning in CycleGAN to further enhance image restoration. As a consequence, our suggested approach significantly outperforms existing ones and exhibits promising multispectral dehazing outcomes.
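As a point of reference for the measures above, PSNR is derived directly from MSE; the following is a minimal illustrative sketch (not the evaluation code used in this work), assuming pixel values normalized to [0, 1]:

```python
import math

def mse(ref, out):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)

def psnr(ref, out, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    m = mse(ref, out)
    if m == 0:
        return float("inf")  # identical images
    return 10 * math.log10(peak ** 2 / m)

# Toy example: a "restored" signal slightly off from the reference
reference = [0.2, 0.4, 0.6, 0.8]
restored = [0.21, 0.39, 0.62, 0.78]
print(round(psnr(reference, restored), 2))  # about 36 dB
```

RMSE in the table is simply the square root of MSE, so the RMSE column follows from the same computation.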
Despite showing promising results, the proposed model for dehazing multispectral images using contrastive learning in CycleGAN has several limitations that need to be addressed.
First, access to high-quality data for training is crucial to the model's success. Its performance can be significantly impacted if the training dataset is not representative of the test dataset, so obtaining high-quality datasets with diverse and challenging hazy conditions is essential for improving the model's robustness.
Second, the proposed model requires substantial computational resources during training, making it less practical for real-time applications: the training process is time-consuming and requires high-end hardware to achieve optimal results.
In summary, while the proposed model represents a significant improvement
over existing methods, there are still limitations that must be addressed to make
the approach more practical for real-world applications.
The proposed model for dehazing multispectral images using contrastive learning in CycleGAN [17] presents several potential areas for future research. Firstly, the model can be further improved by incorporating other deep learning techniques such as attention mechanisms or capsule networks to better capture the underlying features in multispectral images. Secondly, the model can be tested on other datasets with varying levels of haze and atmospheric conditions to evaluate its generalizability. Additionally, the proposed model can be extended to handle other types of atmospheric degradation such as fog or smoke.
Moreover, the use of multispectral imagery is becoming increasingly popular in a wide range of applications. Future studies may apply the suggested approach to problems in industries such as agriculture, pollution management, and urban planning. Using multispectral images, the suggested model may also be adapted to carry out tasks such as classification, segmentation, and object detection.
Compared with the existing solution [17], both papers address the common problem of haze removal using the CycleGAN framework with contrastive learning, but our work focuses specifically on dehazing multispectral images, which differ significantly from aerial imagery in terms of complexity, data representation, and analysis requirements. Multispectral images capture information across multiple spectral bands, enabling more comprehensive and detailed analysis than traditional aerial imagery. Thus, our proposed method is tailored to exploit the specific characteristics and challenges associated with multispectral data.
One of the primary differentiating factors is the choice of input data. While the
existing work focuses on aerial imagery, we specifically address the dehazing of
multispectral images. This allows us to leverage the unique properties of multispectral
data, such as the complementary nature of different spectral bands, to improve the
dehazing performance. By considering the full spectral information, our method can
enhance the visibility and interpretability of multispectral images more effectively,
resulting in better analysis and decision-making capabilities in various domains.
Moreover, our approach introduces specific modifications and enhancements to
the basic CycleGAN framework. These modifications are designed to optimize the
performance of dehazing multispectral images, taking into account their distinct char-
acteristics. We incorporate additional loss functions and regularization techniques
tailored for multispectral data, resulting in improved image quality, better preserva-
tion of spectral information, and enhanced haze removal accuracy compared to the
existing approach.

4 Conclusion

This research proposed a new method for dehazing and contrast enhancement of multispectral images using CycleGAN. In terms of both PSNR and SSIM measures, the suggested system was shown to outperform a number of existing techniques, notably CNN [9], DCP, HOT, DehazeNet, AOD-Net, and GDN. The PSNR and SSIM values produced by the suggested system were 33.21 dB and 0.92, respectively, much higher than those of the other approaches. The proposed system
has practical implications for applications such as remote sensing, surveillance, and
autonomous driving, where accurate image dehazing is essential.
The proposed system leverages the advantages of contrastive learning and
CycleGAN [17] to improve the performance of the dehazing model. Contrastive
learning is used to separate hazy and clear images by generating positive pairs of
hazy and clear multispectral images without requiring any labeled data. CycleGAN
is used to transform hazy multispectral images into their clear counterparts. Binary cross-entropy (BCE) loss is used in CycleGAN [17] to train the discriminator network.
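As an illustration of the discriminator objective mentioned above, the binary cross-entropy of one prediction can be written as a small function; this is a generic sketch, not the authors' implementation, where `p` is the discriminator's estimated probability that its input is a real haze-free image:

```python
import math

def bce(p, label, eps=1e-12):
    """Binary cross-entropy for one prediction p in (0, 1) against label 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# Discriminator loss over a tiny batch: real images should score 1, generated ones 0
real_scores = [0.9, 0.8]  # D's outputs on real haze-free images
fake_scores = [0.2, 0.3]  # D's outputs on generator outputs
d_loss = (sum(bce(p, 1) for p in real_scores) +
          sum(bce(p, 0) for p in fake_scores)) / (len(real_scores) + len(fake_scores))
print(round(d_loss, 4))
```

Training drives `d_loss` down for the discriminator while the generator is pushed in the opposite direction.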
In conclusion, the proposed system for image dehazing of multispectral images
using contrastive learning in CycleGAN [17] represents a significant contribution
to the field of image dehazing. The proposed system outperforms several existing
methods, provides a new approach to image dehazing, and has practical implications
for a wide range of applications. The suggested system may be further optimized in the future, perhaps with enhancements to the CycleGAN [17] and contrastive learning components and the use of more challenging datasets to assess the system's performance. In general, the suggested strategy is poised to significantly advance image dehazing techniques and improve both the quality and clarity of images across a range of applications.

References

1. Li R, Pan J, Li Z, Tang J (2018) Single image dehazing via conditional generative adversarial
network. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 8202–8211
2. Zhang X, Wang J, Wang T, Jiang R (2021) Hierarchical feature fusion with mixed convolution
attention for single image dehazing. IEEE Trans Circuits Syst Video Technol 32(2):510–522
3. Liu Q, Gao X, He L, Lu W (2017) Haze removal for a single visible remote sensing image.
Signal Process 137:33–43
4. Chen C, Do MN, Wang J (2016) Robust image and video dehazing with visual artifact suppres-
sion via gradient residual minimization. In: Computer vision–ECCV 2016: 14th European
conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part II 14.
Springer International Publishing, pp 576–591
5. Xie B, Guo F, Cai Z (2010) Improved single image dehazing using dark channel prior and
multi-scale Retinex. In: Intelligent system design and engineering application (ISDEA), 2010
international conference on, vol 1. IEEE, pp 848–851
6. Narasimhan SG, Nayar SK (2003) Contrast restoration of weather degraded images. IEEE
Trans Pattern Anal Mach Intell 25(6):713–724
7. Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a gaussian denoiser: residual
learning of deep cnn for image denoising. IEEE Trans Image Process 26(7):3142–3155
8. Xu H, Guo J, Liu Q, Ye L (2012) Fast image dehazing using improved dark channel prior.
In: 2012 IEEE international conference on information science and technology. IEEE, pp
663–667
9. Ren W, Liu S, Zhang H, Pan J, Cao X, Yang MH (2016) Single image dehazing via multi-
scale convolutional neural networks. In: Computer vision–ECCV 2016: 14th European confer-
ence, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part II 14. Springer
International Publishing, pp 154–169
Dehazing of Multispectral Images Using Contrastive Learning In … 419

10. Nguyen LD, Lin D, Lin Z, Cao J (2018) Deep CNNs for microscopic image classification by
exploiting transfer learning and feature concatenation. In: 2018 IEEE international symposium
on circuits and systems (ISCAS). IEEE, pp 1–5
11. Krizhevsky A, Sutskever I, Hinton GE (Dec. 2012) ImageNet classification with deep convolu-
tional neural networks. In: Proceedings of advances in neural information processing systems.
Lake Tahoe, NV, USA, pp 1097–1105
12. Liu C, Hu J, Lin Y, Wu S, Huang W (2011) Haze detection, perfection and removal for high
spatial resolution satellite imagery. Int J Remote Sens 32(20):8685–8697
13. Moro GD, Halounova L (2007) Haze removal for high-resolution satellite. Int J Remote Sens
28(10):2187–2205
14. Monika LT, Rajiv S (May 2018) Efficient dehazing technique for hazy images using DCP and
WAF. Int J Comput Appl 179(42):0975–8887
15. Makarau A, Richter R, Muller R, Reinartz P (2014) Haze detection and removal in remotely
sensed multispectral imagery. IEEE Trans Geosci Remote Sens 52(9):5895–5905
16. Qin M, Xie F, Li W, Shi Z, Zhang H (2018) Dehazing for multispectral remote sensing images
based on a convolutional neural network with the residual architecture. IEEE J Select Topics
Appl Earth Observ Remote Sens 11(5):1645–1655. https://doi.org/10.1109/JSTARS.2018.281
2726
17. He X, Ji W, Xie J (2022) Unsupervised haze removal for aerial imagery based on asym-
metric contrastive CycleGAN. IEEE Access 10:67316–67328. https://doi.org/10.1109/ACC
ESS.2022.3186004
Stress Analysis Prediction for Coma
Patient Using Machine Learning

P. Alwin Infant, J. Charulatha, G. Sadhana, and K. Ragavendra

Abstract Today's working IT professionals frequently struggle with stress issues. Changing lifestyles and workplace cultures have made people more likely to experience stress. Even though many companies and sectors deliver welfare programs and make efforts to enhance work environments, the issue remains far from resolved. In this paper, we utilize various techniques to analyze the stress patterns of working people and identify the variables that exert the greatest effect on stress. By applying machine learning techniques to build models that predict the risk of stress, we hope to streamline this procedure. In the working class, mental health illnesses linked to stress are not unusual, and several researchers have previously highlighted concerns about this. To this end, coma patient responses drawn from working medical experts' mental health data were taken into consideration. After data extraction and preprocessing, various machine learning approaches were employed to train our framework. In this case, the mechanism applies machine learning techniques such as decision tree, random forest, KNN, logistic regression, and Naive Bayes. Results of the experiments demonstrated the system's improved performance.

Keywords Artificial intelligence · Coma patient · Data science · Machine learning · Python programming language · Stress analysis prediction

P. Alwin Infant (B)


Computer Science and Engineering, Panimalar Engineering College, Chennai, India
e-mail: alwininfant@hotmail.com
J. Charulatha · G. Sadhana · K. Ragavendra
Department of Information Technology, Hindustan Institute of Technology and Science, Chennai,
India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 421
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_36

1 Introduction

In the working class, mental health illnesses linked to stress are not unusual. In the Indian service sector, around 42% of current staff report suicidal thoughts and chronic anxiety as an outcome of overwork and demanding constraints, according to a recent study conducted by the industry body ASSOCHAM. Today, it is well known that stress at work contributes to depression. Increasing the number of stress indicators available to workers in busy situations would be very beneficial to society from a social standpoint [1]. Machine learning is defined as learning from experience with a class of tasks, measured by a performance metric, that improves over time [2]. The World Health Organization (WHO) states that chronic stress is a mental health condition which can disrupt a person's relationships, lead to despair, and ultimately result in suicide [3]. The ability to recognize stress can enable people to actively manage it before negative effects occur. Traditional stress detection methods rely on psychological tests and expert advice [4]. Questionnaire-based anxiety measures are rather subjective because the findings heavily depend on the responses provided by individuals; the outcome scale would be skewed if people are hesitant to communicate their mental characteristics [5]. The more prevalent type of stress is distress. The other type, known as eustress, is also referred to as "good stress" because it arises from a "positive" perspective on an incident or circumstance [6]. Stress is beneficial in moderation, since it can inspire you and increase your productivity, but if we consistently react negatively, our pleasure and health may deteriorate [7].
In the education sector, children experience stress due to test anxiety, while in the medical sector, patients experience stress due to the various diseases being treated [8]. With good mental health, an employee works constructively and accomplishes goals [9]. Biological variables as well as social and economic contexts are typically to blame for mental health issues. A student may benefit from the assistance of their family and other close friends [10]. Some life circumstances can be stressful, but whether they cause us problems depends on how we see them [11]. Stress among coma patients is seriously harmful to their health [12]. By effectively merging numerous pieces of information from various domains, machine learning is able to predict future clinical outcomes and identify the domain combination that is most likely to predict results [13]. Since it is difficult for machines to distinguish between fatigue, anxiety, depression, and stress, an adequate learning algorithm is needed for a precise diagnosis [14]. According to the National Institute of Neurological Disorders and Stroke (NINDS), reflexes may also cause spontaneous motions such as a smile, laugh, or cry [15]. In fact, machine learning enables us to enhance performance in the healthcare industry without explicit programming [16]. The inability to localize unpleasant sensations can be predicted with precision [17]. A patient who is in a coma is not responsive to either internal or external stimuli [18]. Coma is an unresponsive state with closed eyelids that typically indicates serious and occasionally irreparable brain damage [19]. An organized strategy should be employed while treating a comatose patient, with an initial focus on reversible causes
[20]. Psychiatric disorders, emotional problems, phobia, psychotic illnesses, sorrow, mood disturbances, eating disorders, and a few others can be brought on by mental disease [21]. Lack of spontaneous eye opening, verbal response, and voluntary movements are characteristics of coma patients [22]. Coma, the vegetative state (VS), and the minimally conscious state (MCS) are all examples of disorders of consciousness (DOC) [23]. In the vegetative state, no sustained, reproducible, intentional, voluntary behavioral reactions to noxious, visual, auditory, or tactile stimuli, and no awareness of self or surroundings, are present [24]. A person has a mental disorder if their thoughts, feelings, or behavior change in a way that is distressing for them and makes it difficult for them to lead a normal life [25]. The minimally conscious state is characterized by verbal or gestural yes/no responses and by interacting with and holding items in a way that respects their shape and size [26]. As a result, the essence of this study is to implement machine learning techniques such as decision tree, random forest, KNN, logistic regression, and Naive Bayes to identify and classify the stress level of coma patients in the system. We compare the accuracy scores of all the machine learning algorithms and, finally, show which algorithm is best based on the accuracy of the system.
This paper is organized as follows. Section 1 presents the introduction, Sect. 2 the related work supporting our research, Sect. 3 the proposed methodology, Sect. 4 the results and discussion, and Sect. 5 the conclusion.

2 Related Works

Bhushan et al. [26] reviewed stress prediction and analysis using machine learning. As a result, the demand for bioinformatics to collaborate with learning algorithms evolved. Their work addresses the analysis of stress and its symptoms using a wide range of machine learning techniques, among them random forest. Dutta et al. [27] proposed a method that uses a biosensor and machine learning to detect stress; their paper presents an overview of feasible classification models and compares them using healthcare data analytics. Bisht et al. [28] proposed machine learning for stress diagnosis in Indian school pupils. To examine one's frustration, K-nearest neighbors is utilized, resulting in an accuracy of 88%. The method is compared with a variety of classification models, such as decision tree, logistic regression, KNN, and random forest. Pankajavalli et al. [29] proposed a support vector machine classifier based on hybrid boosting particle swarm optimization for improved stress prediction. The suggested stress prediction model extracts the best characteristics using differential boost particle swarm optimization and classifies them with a support vector machine (SVM); according to the experimental data, the model achieves excellent accuracy with a lower prediction error rate. Sudha et al. [30] proposed stress detection using machine learning techniques depending on human parameters, in which a machine learning algorithm identifies stress based on human parameters.

3 Methodology

The primary goal of the proposed system is to apply ML algorithms to recognize and categorize the stress levels of coma patients. The steps taken to produce the system's output are:
a. Dataset gathering comes first. Here, data from coma patients are gathered in CSV format from the Kaggle website.
b. The data preparation stage comes next and is used to ready the data for the system's later stages.
c. The next phase uses data visualization to display the data. Here, we employ exploratory data analysis techniques to plot and understand the data.
d. The system is then implemented and trained using the model implementation approach. We employ various classification models to train and test the system's data.
e. The classification phase follows, which categorizes the collected data in our datasets. Once classification is finished, we predict the stress levels of coma patients from the datasets.
f. Finally, we put the performance measures in place to present each algorithm's accuracy score, confusion matrix, and classification report based on the system's F1-score, recall, and precision. We then compare each method's accuracy score and demonstrate which algorithm performs best in terms of the system's accuracy value.
Figure 1 shows the block diagram of the proposed work.
We employ several types of algorithms in this work to train the model to forecast the patient's degree of stress while they are in a coma. The following techniques are used: decision tree, Naive Bayes, random forest, K-nearest neighbor, and logistic regression.
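A minimal sketch of how these five classifiers could be trained and compared with scikit-learn is shown below; the synthetic dataset stands in for the coma-patient CSV, so the features and scores are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the coma-patient dataset (features -> stress level)
X, y = make_classification(n_samples=300, n_features=8, n_classes=2, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
}
scores = {}
for name, model in models.items():
    model.fit(xtrain, ytrain)                 # train on the training split
    scores[name] = model.score(xtest, ytest)  # accuracy on the test split
    print(f"{name}: {scores[name]:.2f}")
```

The per-algorithm procedures in Sects. 3.1 to 3.5 each correspond to one `fit`/`predict`/`score` cycle of this loop.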

3.1 Naïve Bayes

Based on Bayes' theorem, the Naive Bayes algorithm is a supervised learning method for classification problems. In the proposed work, this procedure is employed to anticipate the patient's level of stress.

Fig. 1 Block diagram

3.1.1 Algorithm for Implementing Naïve Bayes

Step 1: Install the package providing the Gaussian NB classifier.
Step 2: Import the packages.
Step 3: Generate the model with the Gaussian NB classifier function.
Step 4: Fit the Gaussian NB model using the xtrain and ytrain parameters.
Step 5: Predict the output with the Gaussian NB classifier model using the xtest parameters.
Step 6: Find the accuracy value with the Gaussian NB classifier model using the xtest and ytest parameters.
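To illustrate the Bayes principle behind these steps, the sketch below hand-rolls a one-feature Gaussian classifier; the class statistics are hypothetical, and scikit-learn's Gaussian NB performs this computation (over multiple features) internally:

```python
import math

def gaussian_pdf(x, mean, var):
    """Likelihood of x under a normal distribution N(mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(x, classes):
    """Pick the class maximizing prior * likelihood (Bayes rule, one feature)."""
    best, best_score = None, -1.0
    for label, (prior, mean, var) in classes.items():
        score = prior * gaussian_pdf(x, mean, var)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical per-class statistics for one stress-indicator feature
classes = {
    "low stress": (0.5, 2.0, 1.0),   # (prior, mean, variance)
    "high stress": (0.5, 6.0, 1.0),
}
print(predict(2.5, classes))  # closer to the "low stress" mean
print(predict(5.5, classes))  # closer to the "high stress" mean
```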

3.2 Random Forest

Random forest is part of the supervised learning approach. It is based on the idea of ensemble learning, a tactic that combines multiple classifiers to resolve the classification problem and boost the efficiency of the model.

3.2.1 Procedure for Implementing Random Forest

Step 1: Install the package providing the random forest classifier.
Step 2: Import the packages.
Step 3: Generate the model with the random forest classifier function.
Step 4: Fit the random forest classifier model using the xtrain and ytrain parameters.
Step 5: Predict the output with the random forest classifier model using the xtest parameters.
Step 6: Find the accuracy value with the random forest classifier model using the xtest and ytest parameters.

3.3 Logistic Regression

Logistic regression is a well-liked machine learning technique in the supervised learning group. It is used to predict the output of a categorical dependent variable.

3.3.1 Procedure for Implementing Logistic Regression

Step 1: Install the package providing the logistic regression classifier.
Step 2: Import the packages.
Step 3: Generate the model with the logistic regression classifier function.
Step 4: Fit the logistic regression model using the xtrain and ytrain parameters.
Step 5: Predict the output with the logistic regression classifier model using the xtest parameters.
Step 6: Find the accuracy value with the logistic regression classifier model using the xtest and ytest parameters.
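At its core, logistic regression passes a weighted sum of features through the sigmoid function to obtain a class probability; in the sketch below the weights are hand-picked for illustration, not learned from the coma-patient data:

```python
import math

def sigmoid(z):
    """Squash a real number into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class (e.g. 'stressed') for one sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

weights, bias = [0.8, -0.5], -0.2   # hypothetical learned parameters
p = predict_proba([2.0, 1.0], weights, bias)
print(round(p, 3), "stressed" if p >= 0.5 else "not stressed")
```

Training consists of adjusting `weights` and `bias` so that these probabilities match the labeled training data.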

3.4 K-Nearest Neighbor

K-nearest neighbor is a simple machine learning approach based on the supervised learning method: a data point is categorized according to the classifications of its neighbors.

3.4.1 Procedure for Implementing K-Nearest Neighbor

Step 1: Install the package providing the KNN classifier.
Step 2: Import the packages.
Step 3: Generate the model with the KNN classifier function.
Step 4: Fit the KNN model using the xtrain and ytrain parameters.
Step 5: Predict the output with the KNN classifier model using the xtest parameters.
Step 6: Find the accuracy value with the KNN classifier model using the xtest and ytest parameters.
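The neighbor-voting idea behind these steps can be sketched in plain Python; the samples and labels below are hypothetical, and a library implementation would use optimized data structures:

```python
import math
from collections import Counter

def knn_predict(query, samples, k=3):
    """Classify `query` by majority vote among its k nearest labeled samples."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-feature samples labeled with a stress level
samples = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((4.0, 4.0), "high"), ((4.2, 3.9), "high"), ((3.8, 4.1), "high"),
]
print(knn_predict((1.1, 1.0), samples))  # surrounded by "low" points
print(knn_predict((4.1, 4.0), samples))  # surrounded by "high" points
```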

3.5 Decision Tree

A decision tree is another example of supervised learning. Here, the decision tree algorithm arranges the variables according to the person's level of stress in the system.

3.5.1 Procedure for Implementing Decision Tree

Step 1: Install the package providing the decision tree classifier.
Step 2: Import the packages.
Step 3: Build a model using the decision tree classifier function.
Step 4: Fit the decision tree model using the xtrain and ytrain parameters.
Step 5: Predict the output with the decision tree classifier model using the xtest parameters.
Step 6: Find the accuracy value with the decision tree classifier model using the xtest and ytest parameters.

4 Results Analysis

This section discusses the outcome of the proposed work. The data visualization for the proposed work, based on the plotted graphs, is displayed in Fig. 2. The counter plot of our data is shown in Fig. 3.

Fig. 2 Pie chart



Fig. 3 Counter plot

The confusion matrix and classification report for each algorithm of the system are presented in Fig. 4a–e.
The implementation compares the accuracy scores of all the algorithms and plots them, as depicted in Fig. 5.
The comparison of the existing methods and the proposed method is shown in Fig. 6. The proposed system is compared with existing methods such as KNN, Naive Bayes, random forest, and decision tree. The accuracy of the proposed system using logistic regression is 98%, which outperforms the conventional classification methods. These results are noteworthy, but a few restrictions need to be considered. First, although the machine learning model produced the most trustworthy and consistent findings, the predictors all originated from the same dataset. Real-time data, along with subsequent treatment and clinical indicators in advanced-care contexts, might make the model more accurate.
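The accuracy, precision, recall, and F1-score summarized in the classification reports all derive from confusion-matrix counts; the following is a minimal sketch for the binary case with hypothetical counts:

```python
def report(tp, fp, fn, tn):
    """Derive the standard measures from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 45 true positives, 5 false positives,
# 2 false negatives, 48 true negatives
acc, prec, rec, f1 = report(tp=45, fp=5, fn=2, tn=48)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

Multi-class reports, as in Fig. 4a–e, apply the same formulas per class and average the results.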

5 Conclusion

In conclusion, the phrase "altered mental status" covers a wide variety of patient behaviors, from confusion to profound non-responsiveness. The most frequent causes of coma are covered in this work, along with the factors that should be considered while choosing the best course of action. To assess the coma patients' stress levels, machine learning methods are used. When we examine the accuracy scores of all the algorithms, the logistic regression algorithm reaches 98%, which demonstrates superior performance when compared with the other system models. This model can be improved further by using a deep learning algorithm and by putting the user interface of the proposed work through its paces using a
Django web application. Alternatively, we construct this work using a Tkinter-based graphical user interface: human input can be entered manually, and after selecting the predict button, an outcome based on the system datasets is displayed.

Fig. 4 a Confusion matrix of KNN algorithm, b Confusion matrix of decision tree algorithm, c Confusion matrix of Naive Bayes algorithm, d Confusion matrix of logistic regression algorithm, e Confusion matrix of random forest algorithm

Fig. 5 Comparison of all the scores of algorithm

Fig. 6 Comparison of existing method versus proposed method

References

1. Attallah O (2020) An effective mental stress state detection and evaluation system using a
minimum number of frontal brain electrodes. Diagnostics 10(5):292
2. Sophia G, Sharmila C (Sep 2019) Recognition, classification for normal, contact and cosmetic
iris images using deep learning. Int J Eng Adv Technol 8(3):4334–4340. ISSN: 2277-3878
3. Pankajavalli PB, Karthick GS, Sakthivel R (2021) An efficient machine learning framework
for stress prediction via sensor integrated keyboard data. IEEE Access 9:95023–95035
4. AlShorman O, Masadeh M, Heyat MBB, Akhtar F, Almahasneh H, Ashraf GM, Alexiou A
(2022) Frontal lobe real-time EEG analysis using machine learning techniques for mental stress
detection. J Integr
5. Gupta R, Alam MA, Agarwal P (2020) Modified support vector machine for detecting stress
level using EEG signals. Comput Intell Neurosci

6. Kang M, Shin S, Jung J, Kim YT (2021) Classification of mental stress using CNN-LSTM
algorithms with electrocardiogram signals. J Healthcare Eng 2021:1–11
7. Chung J, Teo J (2022) Mental health prediction using machine learning: taxonomy, applications,
and challenges. Appl Comput Intell Soft Comput 2022:1–19
8. Laijawala V, Aachaliya A, Jatta H, Pinjarkar V (2020, June) Classification algorithms
based mental health prediction using data mining. In: 2020 5th international conference on
communication and electronics systems (ICCES). IEEE, pp 1174–1178
9. Mutalib S (2021) Mental health prediction models using machine learning in higher education
institution. Turkish J Comput Math Educ (TURCOMAT) 12(5):1782–1792
10. Ogunseye EO, Adenusi CA, NawakwaUgwu AC, Ajagbe SA, Akinola SO (2022) Predictive
analysis of mental health conditions using AdaBoost algorithm. Paradigm Plus 3(2):11–26
11. Singer G, Golan M (2021) Applying datamining algorithms to encourage mental health
disclosure in the workplace. Int J Bus Inf Syst 36(4):553–571
12. Ge F, Li Y, Yuan M, Zhang J, Zhang W (2020) Identifying predictors of probable posttraumatic
stress disorder in children and adolescents with earthquake exposure: a longitudinal study using
a machine learning approach. J Affect Disord 264:483–493
13. Priya A, Garg S, Tigga NP (2020) Predicting anxiety, depression and stress in modern life
using machine learning algorithms. Proc Comput Sci 167:1258–1267
14. Leightley D, Williamson V, Darby J, Fear NT (2020) Identifying probable post-traumatic
stress disorder: applying supervised machine learning to data from a UK military cohort. J
Ment Health 28(1):34–41
15. Sathya D, Sudha V, Jagadeesan D (2020) Application of machine learning techniques in health-
care. In: Handbook of research on applications and implementations of machine learning
techniques. IGI Global, pp 289–304
16. Zulfiker MS, Kabir N, Biswas AA, Nazneen T, Uddin MS (2021) An in-depth analysis of
machine learning approaches to predict depression. Curr Res Behav Sci 2:100044
17. Srinivasan S, Bidkar PU (2020) Approach to a patient with Coma. Acute Neuro Care: Focused
Appr Neuro Emerg 23–34
18. Tarulli A, Tarulli A (2021) Coma and related disorders. Neurol Clin Appr 23–39
19. Rabinstein AA (2020) Acute coma. Neurol Emerg Pract Appr 1–13
20. Sujal BH, Neelima K, Deepanjali C, Bhuvanashree P, Duraipandian K, Rajan S, Sathiya-
narayanan M (2022, Jan) Mental health analysis of employees using machine learning
techniques. In: 2022 14th International conference on communication systems & networkS
(COMSNETS). IEEE, pp 1–6
21. Consoli D, Galati F, Consoli A, Bosco D, Micieli G, Serrati C, Cavallini A (2021) Coma.
Decision algorithms for emergency neurology, pp 17–51
22. Pandwar U, Ramteke S, Motwani B, Agrawal A (2022) Comparison of full outline of unre-
sponsiveness score and glasgow coma scale for assessment of consciousness in children with
acute encephalitis syndrome. Indian Pediatr 59(12):933–935
23. Sahlan F, Hamidi F, Misrat MZ, Adli MH, Wani S, Gulzar Y (2021) Prediction of mental health
among university students. Int J Percept Cogn Comput 7(1):85–91
24. Lu H, Uddin S, Hajati F, Khushi M, Moni MA (2022) Predictive risk modelling in mental
health issues using machine learning on graphs. In: Australasian computer science week 2022,
pp 168–175
25. Tutun S, Johnson ME, Ahmed A, Al Bizri A, Irgil S, Yesilkaya I, Harfouche A (2022) An
AI-based decision support system for predicting mental health disorders. Inf Syst Front 1–16
26. Bhushan U, Maji S (2022, Nov) Prediction and analysis of stress using machine learning: a
review. In: Proceedings of third doctoral symposium on computational intelligence: DoSCI
2022. Singapore: Springer Nature Singapore, pp 419–432
27. Dutta A, Tripathy HK, Sen A, Pani L (2021) Biosensor for stress detection using machine
learning. In: Cognitive informatics and soft computing: proceeding of CISC 2020. Springer
Singapore, pp 85–97
28. Bisht A, Vashisth S, Gupta M, Jain E (2022, April) Stress prediction in Indian school students
using machine learning. IEEE
432 P. Alwin Infant et al.

29. Pankajavalli PB, Karthick GS (2022) Improved stress prediction using differential boosting
particle swarm optimization based support vector machine classifier. Springer Singapore
30. Sudha V, Kaviya S, Sarika S, Raja R (2022) Stress detection based on human parameters using
machine learning algorithms
AI-Powered News Web App

Sheetal Phatangare, Roopal Tatiwar, Rishikesh Unawane, Aditya Taware,


Aryaman Todkar, and Shubhankar Munshi

Abstract The prospect of technology-based techniques for the development, production, and dissemination of news products and services has significantly altered the news industry in recent years. Artificial intelligence (AI) has moved from the realm of science fiction to a very real technology that may help society handle a variety of problems, including the difficulties the news business faces. The widespread use of computing has made clear the many tasks that AI can accomplish. We examined the news industry's use of AI. According to our research, journalism has not adopted AI as effectively as other fields, and the majority of AI news initiatives rely on funding from large corporations such as Google, which limits the number of players who can employ AI in the news industry. By providing examples of how these subfields are developing in journalism, we arrived at certain findings and devised a course of action for developing our proposed work. We propose a news website application that aims at hands-free navigation and use of the site through AI and voice-processing techniques. The main objective of this paper is to first shed some light on

S. Phatangare · R. Tatiwar (B) · R. Unawane · A. Taware · A. Todkar · S. Munshi


Vishwakarma Institute of Technology, Pune 411037, India
e-mail: roopal.tatiwar20@vit.edu
S. Phatangare
e-mail: sheetal.phatangare@vit.edu
R. Unawane
e-mail: rishikesh.unawane20@vit.edu
A. Taware
e-mail: aditya.taware20@vit.edu
A. Todkar
e-mail: aryaman.todkar20@vit.edu
S. Munshi
e-mail: shubhankar.munshi20@vit.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 433
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_37
434 S. Phatangare et al.

AI and its applications. It then examines our proposed system, which is an application of AI, and discusses the other technologies used in the system, such as speech-to-text. Details regarding the working and methodology of the proposed system are also explained.

Keywords Artificial intelligence (AI) · News · NLP · Voice recognition

1 Introduction

Today, technology and data inform a wide range of decisions we make in our daily lives. This implies that, to remain viable in the future, every industry must adapt and adopt these technologies. Although AI is frequently used in news media, academic study of the subject has not reached its full potential. Speech recognition is widely used in applications that respond to voice commands or answer questions by accurately and automatically converting voice data into text data. Beyond that, NLP is a computer program's ability to understand and react to text or speech input, extract meaning from sentences, and compose understandable communications, much as people do. This broad topic includes translation, categorization, and clustering, with information extraction as one of its subtopics. The latter is the procedure that turns unstructured data into structured data that can be understood.

1.1 Use of AI

Although AI is a fundamentally vast topic that is categorized by a number of factors, some of its well-known branches are explained below:
1. Neural Networks: Neurology serves as the foundation for this field of artificial intelligence. It draws on cognitive science and automates activities, commonly using Python. When paired with ML, neural networks can tackle difficult problems quickly and efficiently.
2. Robotics: A hot area of artificial intelligence, robotics is primarily concerned with the creation of robots. It is a multidisciplinary field of science and engineering in which the fundamental architecture (system design), manufacture, operation, and usage of robots are decided. Robots can complete activities that humans would find challenging to perform consistently.
3. Fuzzy Logic: Determining whether a condition is true or false can sometimes be quite challenging. Fuzzy logic has the capacity to represent and manipulate uncertain information according to how accurate it is. In standard logic, 1.0 represents absolute truth, 0.0 absolute falsehood, and anything in between represents uncertainty.
4. Natural Language Processing (NLP): By creating strategies that facilitate communication, NLP allows human languages to be represented computationally. Such systems can recognize the meaning of a line of text and take the appropriate action. NLP primarily focuses on speech recognition, text translation, and sentiment analysis. Many contemporary applications use it to filter out spam and violent messages and to improve user experience on their platforms.
The above section highlights the importance of AI in contemporary society. Recognizing its relevance, we have used this technology to build our proposed system. We propose a news website where users can easily access news from all news sites by simply searching a topic or a headline. This is accomplished using an API, News API. To make the experience more convenient, we have implemented voice-processing techniques, so that the user need not type and can instead dictate the command to run the search. A supporting chatbot is also implemented to help users navigate the website.
One of the crucial aspects of this study is natural language processing: to fulfill a request, the program must comprehend the user's voice. We have utilized Alan AI for the proposed system. With Alan, users may employ voice commands to get information from applications, and, in contrast to some other voice assistants, Alan enables businesses to create custom speech experiences for their apps. The front end is developed using ReactJS, an open-source, JavaScript-based web framework developed and maintained by Facebook. It has been around since 2013 and has served as the foundation for several outstanding online apps. VS Code is a free code editor for macOS, Windows, and Linux. It offers various excellent features, including first-class code completion, code refactoring, and debugging assistance.

2 Literature Review

Edriss et al. [1] aimed to use deep learning techniques for speech synthesis and to evaluate how well they perform, in terms of aperiodic distortion, compared with existing natural language processing models.
We also read the research article presented by Sonali et al. [2]. The study describes a microcontroller-based, voice-activated home automation system that uses smartphones. Such a system would let users voice-control every gadget in their home; all the user needs is an Android smartphone, which almost everyone has these days, and a control circuit.
We also studied the system developed by Han et al. [3], whose goal was to create a browser that responds to voice commands. Using a microphone, the voice-activated web browser digitizes spoken words, which are then fed into a speech-to-text (STT) conversion tool. The STT converter then chooses the most likely set of words from the stored commands. The program searches the terms and, if a match is found, takes the specified action. The web browser is operated using special commands: it receives these directives as input and carries out the prescribed actions. If the commands do not match, nothing occurs and the user must try again.
The paper by David et al. [4] presents a system of voice-controlled surgical equipment of the kind used in operating rooms to carry out surgical procedures. The invention includes a voice-controlled integrated surgical suite comprising at least a surgical table and a surgical light-head device. While studying applications of speech recognition, we came across other interesting uses: preventing erroneous wake-word detections on a voice-controlled device [5] and a wireless communication system with voice control [6].
People can use voice commands to navigate the web using voice browsers. Although current browsers offer voice capability through plug-ins, these plug-ins must be downloaded separately because they are not a core part of the browsers. Bajpei et al. [7] presented an approach to create a browser capable of responding to voice commands.
After reviewing these papers, we observed a few systems similar to, but not the same as, what we propose. The existing systems were mostly browsers that use voice processing to take commands from the user to navigate the Internet; other uses included software that assists in surgeries via voice commands and a voice-controlled home automation system. Our proposed system likewise uses voice processing, but its application is completely different: it focuses only on providing relevant news and information and does not perform generic browser-navigation tasks.

3 Methodology

3.1 Block Diagram

ReactJS and Material UI were used to implement the proposed system's front end. ReactJS, used to create reusable user-interface components, was described earlier; it may be combined with a variety of frameworks on both the client side and the server side. React was chosen for the front end mainly because it enables us to build web applications that handle large amounts of data without reloading the page every time data changes, and because it has huge community support. An application programming interface (API) is a messenger that receives requests and informs a system of what the user needs; through this connection, applications communicate with one another to send requests and get replies. The API is the actual back-end connecting engine between applications (Fig. 1).

Fig. 1 Overall structure of


system designed

The API key of News API is integrated with code in the Alan API. One may search for and obtain accurate, up-to-date news from around the world using News API [1]. Several query sets are accessible; one may look for news by keyword, genre, or news source, and the developer can adapt the news query for specific locations. Using this HTTP REST API, the developer may obtain breaking news as needed. The front end was created using the functionality of this API, with cards that display news based on searches, categories, or news sources. By offering an SDK that is simple to integrate, and Alan Studio for JavaScript scripting, the Alan Conversational Platform offers robust support for an app. The testing tool in Alan Studio allows the developer to troubleshoot JavaScript instructions. The Alan button may be dynamically positioned anywhere by swiping or sliding it with the mouse, without interfering with the application's user interface. Because Alan Studio itself manages the cloud, the platform is even more powerful: the cloud easily manages data security and isolation (Fig. 2).
The news requirement served as the foundation for the proposed system's scripting. The voice assistant is fully scripted to read out all the news headlines that the user searched for. The user can request that any article visible on the front end be opened to read it in detail; when a user requests a certain article, the system leads them to that particular news article. Alan provides

Fig. 2 API architecture



Fig. 3 Interaction of Alan


and the application

additional functions to the system, such as mathematical computations and small talk (Fig. 3).

4 Method

The proposed system uses News API to fetch the required news articles. News API is a RESTful API that allows one to search and retrieve live articles from all over the web. To achieve this, it relies on several technologies. HTTP: the fundamental protocol News API uses when communicating with clients. JSON: the format in which News API returns its data. Caching: to enhance performance, News API stores copies of items in memory so that they can be accessed more rapidly. Rate limiting: to prevent abuse, News API restricts how many requests a client may make in a given amount of time. When a request is sent to News API, it first looks in its cache to see whether it already has a copy of the requested article. If so, it retrieves the article from the cache and returns it; if not, it downloads the article from the original website and caches it.
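The cache-then-fetch flow described above can be sketched as a small client-side wrapper. This is an illustrative model only: the `fetch_fn` callable, the request budget, and the `CachedNewsClient` name are assumptions standing in for the actual HTTP call to News API and its service-side limits, not part of the real API.

```python
import time

class CachedNewsClient:
    """Minimal sketch of the caching and rate-limiting behaviour described
    above. fetch_fn stands in for the real HTTP GET against News API."""

    def __init__(self, fetch_fn, max_requests=5, per_seconds=60.0):
        self.fetch_fn = fetch_fn          # callable returning parsed JSON
        self.cache = {}                   # query -> cached article payload
        self.max_requests = max_requests  # allowed requests per window
        self.per_seconds = per_seconds    # length of the rate-limit window
        self.request_times = []

    def get(self, query):
        # 1. Serve from cache when a copy already exists.
        if query in self.cache:
            return self.cache[query]
        # 2. Enforce the rate limit before going to the network.
        now = time.monotonic()
        self.request_times = [t for t in self.request_times
                              if now - t < self.per_seconds]
        if len(self.request_times) >= self.max_requests:
            raise RuntimeError("rate limit exceeded")
        self.request_times.append(now)
        # 3. Fetch, then keep a copy for faster later access.
        result = self.fetch_fn(query)
        self.cache[query] = result
        return result
```

Injecting a stub fetcher shows that a repeated query is served from the cache without a second network hit.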
Alan AI’s speech-to-text system makes use of a deep learning model developed on a sizable dataset of audio recordings and transcripts. The speech recognition model learns to identify the speech patterns associated with various words and phrases, and it converts spoken words into text when a user speaks to Alan AI. The specific algorithm employed by Alan AI is not publicly available; it is most likely a variation of the hidden Markov model (HMM) technique, which is covered in more detail later. Overall, Alan AI’s speech-to-text is quite precise: depending on the quality of the audio recording and the accent of the speaker, it is probably between 90 and 95% accurate.
An example of assistive technology that narrates digital text aloud is text-to-speech (TTS), also called “read-aloud” technology. With the click or push of a button, TTS can convert text on a computer or other digital device, such as a mobile phone, PC, or tablet, into audio. TTS can greatly benefit both children who struggle with reading and people with visual disabilities, and it can also aid children in their writing, editing, and even focus.

5 Frequently Used Algorithms in Speech Recognition Implementation

Traditional techniques in ASR include mainly two algorithms:


1. An n-gram (also called a Q-gram) is, in computational linguistics, a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words, or base pairs, depending on the application. Usually, the n-grams are collected from a corpus of spoken or written language. The probability of the next word being An, given the previous words of the sentence A1, A2, A3, ..., A(n−1), is

P(A) = P(An | A1, A2, A3, ..., A(n−1)) (1)

The equation can be resolved by the chain rule of probability:

P(A | B) = P(A, B) / P(B) (2)

2. The hidden Markov model (HMM), the alternative algorithm, essentially moves in the other direction: HMMs look forward rather than backward. An HMM employs probability and statistics to make an educated prediction of what will follow next without taking into account any information about the prior state, in our case the words that came before the word in question.
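For the n-gram case (n = 2), the chain rule in (1) and (2) reduces to multiplying per-word conditional probabilities estimated from counts. The sketch below uses a purely illustrative toy corpus:

```python
from collections import Counter

def bigram_prob(sentence, corpus):
    """Estimate P(A1, ..., An) for a bigram model: P(A1) times the product
    of P(Ak | A(k-1)), where each conditional is count(A(k-1), Ak) divided
    by count(A(k-1)), exactly the chain-rule ratio of Eq. (2)."""
    # Flatten the corpus; for simplicity, bigrams are also counted across
    # sentence boundaries (a real model would add start/end markers).
    words = [w for line in corpus for w in line.split()]
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    toks = sentence.split()
    prob = unigrams[toks[0]] / len(words)               # P(A1)
    for prev, cur in zip(toks, toks[1:]):
        prob *= bigrams[(prev, cur)] / unigrams[prev]   # P(Ak | A(k-1))
    return prob
```

In speech recognition, such probabilities let the decoder rank candidate word sequences by how likely they are in the language.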

Modern deep learning-based voice recognition algorithms produce a single end-to-end model that is more accurate, quicker, and simpler to deploy on smaller machines such as smartphones and Internet of Things (IoT) devices like smart speakers. The primary method employed is the artificial neural network, a deep design with many layers that is loosely based on how our brains function. Two main architectures are used: a convolutional neural network (CNN) combined with a recurrent neural network (RNN), trained with the CTC loss, which delineates each word in the speech (like Baidu's Deep Speech model); and an RNN-based sequence-to-sequence network, such as Google's Listen Attend Spell (LAS) model, which treats each "slice" of the spectrogram as one element in a sequence. Deep learning architectures for ASR thus come in a variety of forms, with CNN and RNN as the top two methods. Training any neural network requires the following two crucial steps:
• Forward propagation: data is received, processed, and an output is produced.
• Backward propagation: the error is computed and the network's parameters are updated.
Forward Propagation
• Put the input images into a variable (say X1).
• Create a filter matrix by randomly initializing it. The filter convolves the images.

B1 = X 1 ∗ f (3)

• The asterisk sign * is the mathematical representation of convolution. If we have an input image represented as X1 and a filter represented as f , then the expression can be written as in (3).
• On the outcome, apply the Sigmoid activation function.

A = Sigmoid(B1) (4)

• Set the bias and weight matrices to a random starting point. To the values, apply
a linear transformation.

B2 = W1^T · A + d (5)

• Here, X1 is the input, W1 is the weight, and d is a constant termed the bias.
• Apply the Sigmoid function to the result. This is the final output.

O = Sigmoid(B2) (6)

Backward Propagation
We initialize the weights, biases, and filters at random before starting the forward propagation procedure; these values are the parameters of the convolutional neural network. During the backward propagation phase, the model modifies the parameters so that the overall predictions become more accurate. The gradient descent method is used to update these parameters.
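Under the assumption of a single input channel and toy shapes, the forward pass of Eqs. (3)–(6) can be written out with NumPy. The sliding-window "valid" convolution below (cross-correlation, as CNN layers actually compute it) and the parameter shapes are illustrative, not the exact network of any production system.

```python
import numpy as np

def sigmoid(x):
    # Elementwise logistic function used in Eqs. (4) and (6).
    return 1.0 / (1.0 + np.exp(-x))

def forward_pass(X1, f, W1, d):
    """Sketch of forward propagation, steps (3)-(6), for one channel."""
    kh, kw = f.shape
    H = X1.shape[0] - kh + 1
    W = X1.shape[1] - kw + 1
    B1 = np.zeros((H, W))
    for i in range(H):                       # B1 = X1 * f      -- Eq. (3)
        for j in range(W):
            B1[i, j] = np.sum(X1[i:i+kh, j:j+kw] * f)
    A = sigmoid(B1)                          # A = Sigmoid(B1)  -- Eq. (4)
    B2 = W1.T @ A.reshape(-1, 1) + d         # B2 = W1^T A + d  -- Eq. (5)
    O = sigmoid(B2)                          # O = Sigmoid(B2)  -- Eq. (6)
    return O
```

Backward propagation would then compute the gradient of the loss with respect to f, W1, and d and step each parameter against it.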

6 Results and Discussions

Voice control is a key new component that is revolutionizing how people live. Voice assistants are regularly used on both mobile phones and PCs. AI-based voice assistants are systems that can recognize a human voice and respond with one of their built-in voices [8]. Screen readers are frequently used by people with vision impairments to engage with computer systems, and these users increasingly rely on voice-based virtual assistants (VAs) [9]. Few anticipated that speech recognition, which entered the mainstream in the early 2010s with the launch of Siri, would become such a major factor in the development of new technologies. According to estimates, one in six Americans now has a smart speaker at home. This gives just a small glimpse of how far-reaching this technology is and how many opportunities it offers.
Many speech-to-text technologies and models exist; popular ones include Siri, Google, and Alexa. We compared these with the technology we used, Alan AI. All four of these speech-to-text engines are remarkably close in terms of accuracy, though there are some variations in how they function. For instance, while Alexa and Google use deep neural networks (DNN), Siri uses a statistical model called a hidden Markov model (HMM) to transcribe voice; Alan AI uses both HMMs and DNNs. Another distinction is how these engines manage background noise: Google's speech-to-text engine is not as effective at handling background noise as Siri and Alexa, while Alan AI falls in between Siri and Alexa. This is summarized in Tables 1 and 2.

Table 1 STT technologies comparison

Feature | Siri | Alexa | Google | Alan AI
Accuracy (%) | Over 90 | Over 90 | Over 90 | Over 90
Background noise handling | Good | Good | Not as good | Between Siri and Alexa
Algorithm | HMM | DNN | DNN | HMM + DNN

Table 2 News API comparison

API | Newscatcher API | Bing News Search API | Google News API | Alan News API
Price | Free tier available, paid plans start at $29/month | Paid plans start at $29/month | Paid plans start at $40/month | Free tier available, paid plans start at $9.99/month

These APIs all provide a comparable set of functionalities; the cost is where they diverge most from one another. For a free API with high-quality data, Alan News API is a fantastic choice, and with a straightforward RESTful interface, it is also simple to use.
In the realm of speech recognition, one of the most significant predictions for 2020 was that people's search habits would alter. The conversion of touch points into listening points is now affecting brands. Voice assistants have no visual interface by themselves; the technology described here lets the user engage both visually and vocally [10]. A user who can both see and hear the requested content has a fully user-friendly experience and interacts with the technology more. Voice recognition technology is developing globally and is predicted to grow and generate tremendous earnings in the next few years. The system will gain many functions and capacities, as it currently meets the majority of the requirements; supporting more languages would make it even more comfortable and simple for our users.

7 Future Scope

We can expand the capability of our application further. If we add more languages to the software, such as Hindi, Marathi, and Japanese, our website could accept commands in many different languages and be used all over the world. Machine learning can also be used to develop and integrate a recommendation system that categorizes news and offers stories based on the user's interests; for example, if a user requests positive news, the web app will deliver stories that fall under the positive category, and vice versa. The overall GUI can also be upgraded and made more user-friendly.
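As a hypothetical starting point for such a recommendation feature, interest-based ranking over article titles could look like the sketch below. The function and field names are assumptions for illustration; a production system would replace the keyword scoring with a trained ML classifier.

```python
def recommend(articles, interests):
    """Rank articles by how many of the user's interest keywords appear
    in each title, best matches first; drop articles with no match."""
    def score(article):
        title = article["title"].lower()
        return sum(1 for kw in interests if kw.lower() in title)
    ranked = sorted(articles, key=score, reverse=True)
    return [a for a in ranked if score(a) > 0]
```

The same scoring hook could later be swapped for a sentiment model to support the "positive news" filter mentioned above.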

8 Conclusion

A variety of factors motivated us to conceive and build this proposed work, starting with an interest in AI and its current applications in industry. After researching, we observed that its application in journalism is quite limited and has not been explored much. Reading a newspaper takes a significant amount of time, and the majority of that time is often spent reading about subjects in which the reader has no interest; other times, the reader gets drawn into unnecessary material before reaching the main point. Our software reads out the headlines of all the news articles and, if the user wants, can also read an entire piece for a more in-depth look, fluently reading the full article so the user can listen to the news in comfort and ease.
Similar applications do exist in the market, for example, Post for Alexa and BBC News for Alexa. The difference between these apps and our proposed work lies in the APIs and technology used; we believe the STT in our system is better than the ones utilized in those apps. Another difference is that those systems are apps, whereas our system is a website and is accessible from any device. Along with this, our system provides a chatbot for customer support. The Alan voice assistant could also be used in many other domains, such as healthcare, business, finance, and electronic commerce. With this, we conclude that our proposed work is complete.

References

1. Edriss A (2020) Deep learning based NLP techniques for text to speech synthesis and communication recognition. J Soft Comput Paradigm
2. Shamik C, Sonali S, Raghav T, Ankita B (2015) Design of an intelligent voice controlled home automation system. Int J Comput Appl 121(15):39–42
3. Han S, Jung G, Ryu M, Choi BU, Cha J (2014) A voice-controlled web browser to navigate hierarchical hidden menus of web pages in a smart-tv environment. In: Proceedings of the 23rd international conference on World Wide Web
4. Holtz BE, Sanders WL, Hinson JR, Zelina FJ, Sendak MV, Logue LM, Belinski S, McCall DF (1999) Voice-controlled surgical suite. US Patent US6591239B1
5. Freed IW, Barton W, Rohit P (2014) Preventing false wake word detections with a voice-controlled device. US Patent US9368105B1
6. Stephen B, Mickey K (2016) Voice controlled wireless communication device system. US Patent US7957975B2
7. Bajpei AB, Shaikh MS, Ratate NS Voice operated web browser. Int J Soft Comput Artif Intell
8. Siddesh S, Subhash AUS, Srivatsa PN, Santhosh B (2020) Artificial intelligence-based voice assistant. In: 2020 Fourth world conference on smart trends in systems, security and sustainability
9. Adam F, Vtyurina A, Morris MR, Leah F, White RW (2019) VERSE: bridging screen readers and voice assistants for enhanced eyes-free web search. In: Proceedings of the 21st international ACM SIGACCESS conference on computers and accessibility (ASSETS ’19)
10. Aaditya C, Ranjeet K, Saini A, Akash K (2021) Voice controlled news web application with speech recognition using Alan Studio. Int J Comput Appl 0975–8887
Design of Efficient Pipelined Parallel
Prefix Lander Fischer Based on Carry
Select Adder

J. Harirajkumar, R. Shivakumar, S. Swetha, and N. Sasirekha

Abstract Every digital processing system executes basic operations like addition and subtraction using binary adders, which differ in addition time (latency), area, and power requirements. The power-delay product (PDP) of digital signal processor (DSP) components must be kept to a minimum in order to achieve greater effectiveness in very-large-scale integration platforms. Adders that use a prefix operation for efficient addition are parallel prefix adders, or carry tree adders. Because of their capability for high-speed computing, parallel prefix adders are the most widely employed adders nowadays. A carry tree adder performs arithmetic addition using the prefix function and is much faster than ripple carry adders, carry skip adders, carry select adders (CSA), and so on. This research compares area, latency, and power for 32-bit implementations of several parallel prefix Ladner-Fischer adders. The Ladner-Fischer adder built from black cells requires considerable hardware to operate; to increase its effectiveness, the gray cell can be used as a substitute for the black cell. The prescribed technique's operation is divided into three primary steps: pre-processing, generation, and post-processing. The pre-processing step computes the propagate and generate signals, the generation phase concentrates on carry generation, and the post-processing step produces the final sum. This research evaluates the logic and routing delay of the suggested architecture.

J. Harirajkumar · S. Swetha (B) · N. Sasirekha


Department of Electronics and Communication Engineering, Sona College of Technology, Salem 636005, Tamil Nadu, India
e-mail: swetha.21mevd@sonatech.ac.in
J. Harirajkumar
e-mail: harirajkumar.j@sonatech.ac.in
N. Sasirekha
e-mail: sasirekha.n@sonatech.ac.in
R. Shivakumar
Department of Electrical and Electronics Engineering, Sona College of Technology, Salem 636005, Tamil Nadu, India
e-mail: shivakumar@sonatech.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 445
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_38
446 J. Harirajkumar et al.

According to the findings of the study, the CSA with a parallel prefix adder performs better than the traditional modified CSA and uses less space.

Keywords Carry select adder · Ladner-Fischer adder · Pre-processing · Generation · Speed · Area · Logic and route delay

1 Introduction

Individuals can easily understand and use decimal numbers to execute mathematical operations, but for computation in digital architectures such as microprocessors, digital signal processors (DSP), and application-specific integrated circuits (ASIC), binary values are more feasible, since binary values are the most effective way of expressing a wide range of values. Binary adders are among the most crucial logic components in digital systems [1]. An embedded system is a computing architecture primarily intended to do a single task, including preserving, analyzing, and managing information across various electronic devices; consumer gadgets like pagers, calculators, mobile phones, portable video games, and digital cameras are examples of items with embedded technology [2]. As handheld technology expands to include high-performance, computation-intensive goods like handheld computers and mobile phones, both of these factors now combine. The fundamental difficulty in electronic structure design today is low power consumption, a significant concern in the growing semiconductor business [3]. A fast adder is vital because adders perform these tasks: in digital adders, the speed of addition is determined by how quickly a carry is forwarded by the adder. In a basic adder, the sum for every bit is produced only after the earlier bit position has been added and a carry has been transmitted into the following position [4].
Every digital, DSP, or control system must perform addition operations; thus, the efficiency of adders determines how quickly and precisely a digital system operates, and the primary focus of studies in VLSI system layout is on enhancing addition effectiveness [6]. Previously, processors used 32-bit carry adders, including the ripple carry adder (RCA), carry propagate adder (CPA), and carry look-ahead adder (CLA), with various addition times (delay), areas, and power requirements [7].

2 Literature Review

In [13], a modification to 32-bit pipelined multipliers based on an FSM is shown: the carry propagation time is decreased by substituting CLAs and CSAs for RCAs, and the suggested method enables fast interconnect execution of pipelined multiplication. However, due to the complexity of the design process, this strategy is ineffective. Adder circuitry plays a notable role in today's microprocessors, since adders are frequently utilized in the critical paths of mathematical operations like multiplication and subtraction. In [14], an improved design technique for a 4-bit CLA and a carry select adder (CSA) is suggested; since it uses more power and takes up more chip space, this method is also ineffective.

3 Proposed Methodology

To overcome the significant delay of existing carry adders, this study proposes a
32-bit variable parallel prefix adder structure for low-latency, low-power VLSI
applications. To build fast wide adders, the CSA employs several narrow adders: both
candidate sums are computed in advance, and the appropriate one is simply chosen by a
2-to-1 mux once the carry-in is finally determined. Figure 1 depicts the top view of
the carry select adder, and its schematic representation is shown in Fig. 2. An adder
built with this methodology is referred to as a CSA. The proposed 32-bit variable
parallel prefix adder structure can be used for fast arithmetic operations and data
processing.
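The precompute-then-mux idea can be sketched in software. This is a minimal bit-level model, not the authors' VHDL; the block width, operand values, and function names are illustrative:

```python
# Bit-level sketch of a carry-select adder stage: each block computes two
# candidate sums -- one assuming carry-in = 0, one assuming carry-in = 1 --
# and a 2-to-1 mux picks the correct pair once the real carry-in arrives.

def ripple_add(a_bits, b_bits, carry_in):
    """Ripple-carry addition over LSB-first bit lists; returns (sum_bits, carry_out)."""
    out, c = [], carry_in
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    return out, c

def csa_block(a_bits, b_bits, carry_in):
    """Carry-select block: precompute both outcomes, then mux on carry_in."""
    sum0, cout0 = ripple_add(a_bits, b_bits, 0)   # speculative: carry-in = 0
    sum1, cout1 = ripple_add(a_bits, b_bits, 1)   # speculative: carry-in = 1
    return (sum1, cout1) if carry_in else (sum0, cout0)

# An 8-bit add split into two 4-bit carry-select blocks
def to_bits(x, n): return [(x >> i) & 1 for i in range(n)]
def to_int(bits):  return sum(b << i for i, b in enumerate(bits))

a, b = 0b10110101, 0b01101110
lo, c = csa_block(to_bits(a, 8)[:4], to_bits(b, 8)[:4], 0)
hi, c = csa_block(to_bits(a, 8)[4:], to_bits(b, 8)[4:], c)
assert to_int(lo + hi) + (c << 8) == a + b
```

The point of the structure is that both speculative sums are ready before the carry-in arrives, so only the mux delay is added per block.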

Fig. 1 Top view of carry select adder

Fig. 2 Schematics representation of carry select adder


448 J. Harirajkumar et al.

3.1 Proposed CSA Based Ladner-Fischer Adder

Highly efficient addition operations are performed using the Ladner-Fischer adder, a
parallel prefix adder used to carry out addition. It can be viewed as a tree
arrangement that performs the arithmetic operations. The Ladner-Fischer adder is made
up of black cells and gray cells: each black cell has two AND gates along with one OR
gate, while each gray cell has just one AND gate. In a ripple carry adder, every bit
waits for the previous bit's result. In a parallel prefix adder, the goal is instead
to overlap the carry propagation of the initial addition with the computation of the
succeeding addition, rather than awaiting it, since recurring additions can be managed
by a multi-operand adder.

3.2 Pipelining Methods

Pipelining, an approach employed to speed up computing systems, is used here. The
concept of pipelining is to increase speed significantly while adding very little
hardware. A job is divided into several phases, each given its own separate processing
unit. When the first phase's work is finished, another task may enter it while the
first task moves on to the next phase. The procedure resembles an assembly line, since
a separate task is being completed at every stage.
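The assembly-line behavior can be modeled clock by clock. The three stage functions below are hypothetical toys, not the adder's actual VHDL stages; the point is that after the fill latency, one result retires per clock:

```python
# Minimal clock-by-clock model of a pipeline: each stage owns a register,
# and on every clock edge all in-flight values advance exactly one stage,
# so a new task can enter each cycle.

class Pipeline:
    def __init__(self, stages):
        self.stages = stages              # one function per stage
        self.regs = [None] * len(stages)  # regs[i] holds the output of stage i
        self.done = []

    def clock(self, new_item=None):
        # The last register retires its result; everything else shifts forward.
        if self.regs[-1] is not None:
            self.done.append(self.regs[-1])
        for i in range(len(self.stages) - 1, 0, -1):
            prev = self.regs[i - 1]
            self.regs[i] = self.stages[i](prev) if prev is not None else None
        self.regs[0] = self.stages[0](new_item) if new_item is not None else None

# Three toy stages; feed three items, then drain with empty clocks.
pipe = Pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3])
for item in [1, 2, 3, None, None, None]:
    pipe.clock(item)
assert pipe.done == [1, 3, 5]   # ((x + 1) * 2) - 3 for x = 1, 2, 3
```

Three results take six clocks here (fill plus drain), but a long stream completes at one result per clock, which is the throughput gain pipelining buys for the cost of the registers.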

3.3 Black and Gray Cell

The black cell has four inputs (Gk0, Pk0, Pk1, Gk1) and three logic gates: one OR gate
and two AND gates. Its outputs gout and pout are provided as inputs to the next black
cell, along with Pk2 and Gk2, and the procedure repeats until the desired result is
reached. The black cell consumes more area and latency; to reduce both, black cells
can be replaced by gray cells. Figure 3 shows the black and gray cells in an 8-bit
Ladner-Fischer adder.
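The two cells can be written out as tiny helper functions. This is a sketch of the standard prefix-adder cell equations; the argument names are illustrative:

```python
# The black cell combines two (generate, propagate) pairs:
#   g_out = g_hi OR (p_hi AND g_lo)   -- the OR gate and one AND gate
#   p_out = p_hi AND p_lo             -- the second AND gate
# A gray cell computes only g_out, since the group propagate is not
# needed past that point -- which is why it is cheaper than a black cell.

def black_cell(g_hi, p_hi, g_lo, p_lo):
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def gray_cell(g_hi, p_hi, g_lo):
    return g_hi | (p_hi & g_lo)
```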

3.4 Pre-processed Phase

In the pre-processing stage, generate and propagate signals are derived from every
pair of inputs. The generate function performs an AND operation on the input bits,
while the propagate function performs an XOR operation. The propagate and generate
functions are shown in Eqs. 1 and 2, respectively.

Fig. 3 Representation of black and gray cell in 8-bit Ladner-Fischer adder

Pk = Yk (XOR) Zk, (1)

Gk = Yk (AND) Zk. (2)

3.5 Generation Phase

During this phase, a carry is created for every bit; it is referred to as the carry
generate (Ck(G)), and the carry propagate (Ck(P)) is formed for the carry that is
propagated at every bit. Each stage has a final cell that provides the carry, and the
propagate and generate carries are produced for the next stage.

Ck(P) = Pk1 (AND) Pk0, (3)

Ck(G) = Gk1 (OR) [Pk1 (AND) Gk0]. (4)

Equations (3) and (4) describe the black cell's carry propagation and carry
generation, respectively, whereas Eq. (5) gives the gray cell's carry generation,
shown below. The succeeding bit and the last bit are added together simultaneously
with the carry result of the last bit; the carry generation of Eq. 5 is used for the
succeeding bit's sum operation.

Ck(G) = Gk1 (OR) [Pk1 (AND) Gk0]. (5)



Fig. 4 Stages of creating parallel prefix Ladner-Fischer adder

3.6 Post-processed Phase

In the last phase of the efficient Ladner-Fischer adder, shown in Eq. 6, the propagate
bit is XORed with the carry of the preceding bit. The result is then presented as the
sum bit.

Ok = Pk (XOR) Ck(in)−1. (6)

The adder is used for two 32-bit addition operations, and before each bit outputs its
resulting sum, it passes through the pre-processing and generation stages. The initial
batch of input bits undergoes pre-processing, producing the propagate and generate
signals. These proceed through a generation stage that yields the carry generate and
carry propagate before the final result is produced. Figure 4 depicts the step-by-step
operation of the efficient Ladner-Fischer adder (Figs. 5, 6 and 7).
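The three phases can be put together in a behavioral sketch of a 32-bit parallel prefix add. The scan below uses a simple logarithmic prefix network for the carry-generation phase; the exact Ladner-Fischer wiring combines different node pairs, but the pre-process / carry-generate / post-process flow is the same:

```python
# Behavioral model of a 32-bit parallel prefix adder: pre-processing,
# prefix carry generation with the black-cell operator, post-processing.

WIDTH = 32

def combine(hi, lo):
    # Black-cell operator on (generate, propagate) pairs
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def prefix_add(y, z, c_in=0):
    yb = [(y >> k) & 1 for k in range(WIDTH)]
    zb = [(z >> k) & 1 for k in range(WIDTH)]
    # Phase 1 -- pre-processing (Eqs. 1-2): Gk = Yk AND Zk, Pk = Yk XOR Zk
    gp = [(yb[k] & zb[k], yb[k] ^ zb[k]) for k in range(WIDTH)]
    # Phase 2 -- carry generation: logarithmic prefix scan of `combine`,
    # after which gp[k] holds the group (G, P) spanning bits k..0
    dist = 1
    while dist < WIDTH:
        gp = [combine(gp[k], gp[k - dist]) if k >= dist else gp[k]
              for k in range(WIDTH)]
        dist *= 2
    carries = [c_in] + [g | (p & c_in) for g, p in gp]  # carries[k] = carry into bit k
    # Phase 3 -- post-processing (Eq. 6): sum bit k = Pk XOR carry into bit k
    s = sum(((yb[k] ^ zb[k]) ^ carries[k]) << k for k in range(WIDTH))
    return s, carries[WIDTH]

assert prefix_add(0xDEADBEEF, 0x12345678) == (
    (0xDEADBEEF + 0x12345678) & 0xFFFFFFFF,
    (0xDEADBEEF + 0x12345678) >> 32,
)
```

Because `combine` is associative, any prefix network that covers all bit spans gives the same carries; Ladner-Fischer's particular wiring trades fan-out for fewer cells and lower depth than a ripple chain.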

4 Results

The design was simulated at several levels of abstraction using Xilinx ISE Design
Suite 14.2 and ModelSim, and the corresponding delay figures were recorded.
Table 1 includes each of the 32-bit parallel prefix adder categories, including
32-bit Ladner-Fischer, 32-bit Han-Carlson, and 32-bit Ladner-Fischer with Register.

Fig. 5 The structure of the 32-bit effective Ladner-Fischer adder

Fig. 6 Inner view of parallel prefix adder

Fig. 7 Technology schematic of 32-bit Ladner-Fischer adder

Findings from the simulations are checked in terms of area, power, and latency.
Figure 8 shows the simulated output of the 32-bit signed Ladner-Fischer adder.
Figure 9 shows the simulation result for the 32-bit Ladner-Fischer adder based on a
carry select adder, and Fig. 10 shows the synthesis result for area. The results also
include the waveforms and comparative findings for each of the three parallel prefix
adders.

Table 1 Area and delay comparative findings for several PPA categories
(area reported for the SPARTAN 3 S5000 4FG 900 device)

S. No  Method                               Slice  Gate count  LUT  Max delay (ns)  Gate delay (ns)  Path delay (ns)
1      32-bit Han-Carlson                   75     879         139  23.161          11.975           11.186
2      32-bit Ladner-Fischer                50     552         85   32.548          16.530           16.018
3      32-bit Ladner-Fischer with Register  48     790         82   7.165           6.364            0.801

Fig. 8 Simulation result for the 32-bit signed Ladner-Fischer adder



Fig. 9 Simulation result for the 32-bit Ladner-Fischer adder based on a carry select adder

Fig. 10 Synthesis result of area for 32-bit Ladner-Fischer adder

See Table 2 and Figs. 11, 12, 13, and 14.

Table 2 Comparison of various adder performances in terms of delay, area, and power usage

References  Technique                                      Area (µm2)  Delay (ns)  Power usage (mW)
[2]         Brent-Kung Adder (BKA)                         105         4.04        –
[12]        Kogge-Stone Adder (KSA)                        100         8.63        32.02
[2]         Han-Carlson Adder (HCA)                        145         4.322       –
[12]        Pasta Adder (PA)                               70          7.123       30.04
–           Proposed parallel prefix Ladner-Fischer adder  48          0.801       24.80

Fig. 11 Delay comparison (overall, gate, and path delay) for the 32-bit Han-Carlson,
32-bit Ladner-Fischer, and proposed PPLFA designs

Fig. 12 Proposed system performance evaluation in terms of area (BKA, KSA, HCA, PA,
and proposed PPLFA)

Fig. 13 Proposed system performance evaluation in terms of delay in ns (BKA, KSA,
HCA, PA, and proposed PPLFA)

Fig. 14 Proposed system performance evaluation in terms of power usage (KSA, PA, and
proposed PPLFA)

5 Conclusion

To increase efficiency and reduce area, a method for designing a Ladner-Fischer adder
is presented in this study. The proposed system operates in three steps:
pre-processing, carry generation, and post-processing. The pre-processing phase
produces the propagate and generate signals, the carry-generation phase produces the
carries, and the post-processing phase produces the final result. To speed up binary
addition, the number of cells in the carry-generation phase is reduced in a tree-like
structure, and the proposed adder is highly effective at minimizing latency. VHDL with
Xilinx 14.5 is employed to assess the effectiveness of these adders, and the
simulation results confirm the benefits of the proposed technique. The effectiveness
of the proposed system is compared with that of other current methodologies, and the
comparison demonstrates how much latency is reduced by the proposed 32-bit parallel
prefix Ladner-Fischer adder.

References

1. Santhi G, Deepika G (2016) Realization of parallel prefix adders for power and speed critical applications. Int J VLSI Syst Des Commun Syst (IJVDOS) 4(02)
2. Al-Haija QA, Asad MM, Marouf I, Bakhuraibah A, Enshasy H (2019) FPGA synthesis and validation of parallel prefix adders. Acta Electronica Malaysia 3(2):31–36
3. Honda S (2019) Low power 32-bit floating-point adder/subtractor design using 50 nm CMOS VLSI technology. Int J Innov Technol Explor Eng 8(10)
4. Arunakumari S, Rajasekahr K, Sunithamani S, Suresh Kumar D (2023) Carry select adder using binary excess-1 converter and ripple carry adder. In: Lenka TR, Misra D, Fu L (eds) Micro and nanoelectronics devices, circuits and systems. Lecture notes in electrical engineering, vol 904. Springer, Singapore, pp 289–294. https://doi.org/10.1007/978-981-19-2308-1_30
5. Chandrika B, Krishna GP (2016) Implementation and estimation of delay, power and area for parallel prefix adders
6. Galphat V, Lonbale N (2016) The high speed multiplier by using prefix adder with MUX and Vedic multiplication. Int J Sci Res (IJSR), ISSN (Online), pp 2319–7064
7. Shinde KD, Badiger S (2015) Analysis and comparative study of 8-bit adder for embedded application. In: 2015 international conference on control, instrumentation, communication and computational technologies (ICCICCT). IEEE, pp 279–283. https://doi.org/10.1109/ICCICCT.2015.7475290
8. Moqadasi H, Ghaznavi-Ghoushchi MB (2015) A new parallel prefix adder structure with efficient critical delay path and graded bits efficiency in CMOS 90 nm technology. In: 2015 23rd Iranian conference on electrical engineering. IEEE, pp 1346–1351. https://doi.org/10.1109/IranianCEE.2015.7146426
9. Hebbar R, Srivastava P, Joshi VK (2018) Design of high speed carry select adder using modified parallel prefix adder. Procedia Comput Sci 143:317–324. https://doi.org/10.1016/j.procs.2018.10.402
10. Santhakumar C, Shivakumar R, Bharatiraja C, Sanjeevikumar P (2017) Carrier shifting algorithms for the mitigation of circulating current in diode clamped MLI fed induction motor drive. Int J Power Electron Drive Syst 8(2):844–852
11. Shivakumar R, Chandrasekar S, Sankarganesh R (2017) Signal characteristics for high voltage transformer applications. Int J Comput Theor Nanosci 14(4):1770–1777
12. Shivakumar R, Rangarajan M (2016) Small signal stability analysis of power system network using water cycle optimizer. Int J Emerg Technol Comput Sci Electron 20(2):297–302
13. Silviyasara T, Harirajkumar J (2016) Efficient architecture reverse converter design using Han Carlson structure with carry look ahead adder. In: 2016 international conference on wireless communications, signal processing and networking (WiSPNET). IEEE, pp 1620–1623
14. Silviyasara T, Harirajkumar J (2016) Efficient architecture reverse converter design using Han Carlson structure with carry look ahead adder. In: 2016 international conference on wireless communications, signal processing and networking (WiSPNET). IEEE, pp 1620–1623
Non-invasive Diabetes Detection System
Using Photoplethysmogram Signals

Dayakshini Sathish, Souhardha S. Poojary, Samarth Shetty, Preethesh H. Acharya,
and Sathish Kabekody

Abstract Diabetes Mellitus (DM) is a chronic condition in which the body is unable
to control blood sugar levels. In this paper, a non-invasive method for classifying
diabetic and non-diabetic cases is discussed. Collecting blood samples from patients
by puncturing their fingers causes discomfort, pain, and infection, which are serious
drawbacks of commercially available invasive blood glucose monitoring systems. A
novel non-invasive device for classifying blood sugar levels is developed using a
Near-Infrared (NIR) sensor. Photoplethysmogram (PPG) signals, which are sensitive to
blood glucose levels, are acquired using the NIR sensor. ANOVA statistical analysis
is used to identify significant features of the PPG signal. Quadratic SVM provided
the best results in classifying diabetic and non-diabetic subjects using features
such as the first-derivative crest, time of inflection, and age. Results from
real-time PPG signals are compared with features taken from a diabetes dataset from
Kaggle, and the proposed PPG signal analysis method performed better.

Keywords Diabetes mellitus · Non-invasive · Near infrared · Photoplethysmogram ·
Smartphone · Glucose level

1 Introduction

The metabolic disorder known as Diabetes Mellitus (DM) is characterized by the body’s
inability to control blood sugar levels (BGLs) [1]. The amount of glucose in the blood is
controlled by the hormone Insulin, which is made in the pancreas. Type 1 and Type 2 diabetes
are the two main subtypes of DM [1–3]. When the immune system destroys the pancreatic

D. Sathish (B) · S. S. Poojary · S. Shetty · P. H. Acharya


Department of Electronics and Communication Engineering, SJEC, Mangaluru,
Karnataka 575028, India
e-mail: dayakshini@sjec.ac.in
S. Kabekody
Department of Electrical and Electronics Engineering, SJEC, Mangaluru, Karnataka 575028,
India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 457
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_39

cells that make insulin, chronic type 1 diabetes develops [4]. Type 2 diabetes is a
chronic disease that affects how the body metabolizes sugar [5].

Fig. 1 Expected growth of diabetes in India [6, 7]

The prevalence of diabetes is rising both internationally and in India, primarily as
a result of overweight/obesity and unhealthy lifestyles [6]. According to estimates,
seventy-seven million people in India had diabetes as of 2019, and that number is
predicted to increase to over one hundred thirty-four million by 2045, as shown in
Fig. 1 [6, 7]. According to the statistics, the prevalence of diabetes increased from
7.1% in 2009 to 8.9% in 2019, with 12.1 million of those individuals being over 65;
by 2045, that number is expected to reach 27.5 million [4, 7]. In India, it is
estimated that nearly fifty-seven percent of the 43.9 million adults with diabetes
are undiagnosed [4]. Reports up to April 2020 from national health centers and
hospitals in various countries indicate that patients with diabetes have a fifty
percent higher risk of dying from COVID-19 than those without. Mortality also rises
when diabetes is accompanied by other diseases such as cardiovascular disease,
respiratory disease, and cancer.
BGL is normally measured using three classes of techniques: invasive, minimally
invasive, and non-invasive [3, 8]. Invasive glucose monitoring refers to a procedure
that invades the body, usually by puncturing the skin or inserting instruments into
the body [9]. Minimally invasive glucose monitoring requires a small incision or the
insertion of an instrument into a body cavity. Non-invasive glucose monitoring refers
to measuring BGL without drawing blood, pricking the skin, or causing pain [10, 11].
Non-invasive diagnosis of diabetes is in great demand owing to its lower cost,
painlessness, and ease of use [12]. Since diabetes is becoming a serious and alarming
illness, the ability to measure glucose without puncturing the finger is in great
demand [12, 13].
In this paper, hardware is implemented to acquire blood glucose samples in a
non-invasive way. Serum glucose samples are acquired along with the subjects' medical
history. A feature set is built to distinguish diabetic from non-diabetic patients and
classified using various conventional classifiers.

2 Literature Review

In this section, a brief description of the state-of-the-art literature on the
development of Computer-Aided Detection (CAD) systems for DM is presented.
Haider Ali et al. [5] designed a simple, cost-effective non-invasive BGL monitoring
device that uses the transmittance and refraction of visible laser light at a
wavelength of 650 nm. The device was tested in both in vitro and in vivo cases, and
its transmittance was found to be thirty times better than that of Near-Infrared
(NIR) laser light. In addition, the in vivo experiments showed that the refractive
angle of the laser light decreases as the glucose level increases. Chandrakant
Dattrao Bobade et al. [9] proposed a method that uses an NIR sensor to measure BGL.
The BGL is computed from the voltage variations received after transmission through
the fingertip and is then sent to a smart Android app for further analysis; a
correlation between glucose level and intensity level was observed in their study. A
non-invasive technique using Deep Neural Networks (DNNs) was proposed by Haque et al.
[13] to measure blood hemoglobin (Hb), creatinine (Cr) level, and BGL based on the
Photoplethysmogram (PPG) signal. PPG signals were generated from fingertip videos of
93 samples recorded using a smartphone, a forty-six-element feature set was created
from the time and frequency domains, and optimal features were selected using a
Genetic Algorithm (GA). DNN models were used for the estimation of Hb, BGL, and Cr;
their approach provided R2 values of 0.922, 0.902, and 0.969 for Hb, Cr, and BGL,
respectively. Haxha et al. [1] proposed a spectroscopy-based non-invasive BGL system
in which in vitro and in vivo experiments were conducted using NIR transmission
spectroscopy; their study showed that sensor output voltage and glucose concentration
levels were correlated. Convolutional Neural Networks (CNNs) were used by Hossain
et al. [10] to develop a non-invasive BGL measuring system, exploiting the
relationship between the PPG signal and BGL to provide data on blood flow in the test
region. Invasive measurements of BGL and PPG from 30 individuals were taken before
and two hours after breakfast to create a dataset, which was used to train a CNN
model on the BGL-PPG signal relationship. The analysis of BGL using PPG signals
captured by a smartphone was suggested by Islam et al. [4]. The raw PPG signal's
high-frequency noise, optical noise, and motion interference were removed using a
Gaussian filter, various amplitude and time features were extracted from the filtered
signals, and several regression models were used to predict BGL; according to their
findings, Partial Least Squares Regression had the lowest standard error of
prediction. Zanelli et al. [19] implemented a deep learning model using a light CNN
for the detection of type 2 diabetes. A raw pulse extracted from the PPG was used for
the analysis; the study made use of pre-trained models and obtained an AUC of 0.755.
A review of articles on CAD development for diabetes detection showed that the
performance of a CAD system depends on various factors, such as the BGL acquisition
method, the extracted features, and the classifiers used. Different non-invasive
acquisition techniques, such as laser light sensors [1, 5], NIR sensors [9], and
smartphones [4, 13, 17, 18], have been used to acquire BGL samples.

Even though recording data with the smartphone method produced promising results, it
is highly sensitive to skin color and finger size, which vary from person to person.
Laser light acquisition methods used bulky spectroscopy devices [1], whereas acquiring
samples with a simple NIR sensor produced good results. Some researchers have also
implemented diabetes detection using deep learning models [13, 19] and obtained
satisfactory results, but making such models robust requires a large number of data
samples. Keeping BGL under control requires regular monitoring, so there is a need
for a robust, cost-effective, portable non-invasive diabetes monitoring system. In
this regard, the proposed method attempts to develop a biomedical device usable by a
layperson, using an NIR sensor and machine learning algorithms.

3 Methodology Used

Computer-Aided Detection of DM is implemented using a device consisting of a
reflection sensor, a microcontroller, and software that processes the PPG signal and
classifies it as diabetic or non-diabetic, as shown in Fig. 2. The features available
in the public Kaggle dataset [14] were also studied and analyzed to understand which
features are significant in differentiating diabetic and non-diabetic cases.
Section 3.1 discusses the working of the hardware, Sect. 3.2 explains the acquisition
of PPG signals, Sect. 3.3 describes the extraction of features, and Sect. 3.4
explains the statistical analysis used to determine the relevant features. Section 3.5
describes the details of the Kaggle dataset [14], its features, and the statistical
test used. Figure 2 shows the block diagram of the DM system.

3.1 DM Detection System: Hardware Implemented

The DM detection system, shown in Fig. 2, contains a DAC, an emitter (NIR light
source), an ADC, and a microcontroller in the hardware design. Initially, the digital
signal from the microcontroller is converted to an analog signal by the DAC, whose
output turns on the NIR light source. The finger is placed between the emitter and
the detector. When light from the source is directed at the finger, part of it is
absorbed by the glucose molecules and the rest is reflected; the reflected light is
sensed by the NIR detector, and the detected voltage is converted into digital form
by the ADC. According to the Beer-Lambert law, as the glucose concentration increases,
the intensity of the reflected light decreases; using this law, the amount of light
absorbed by the glucose molecules is identified. The corresponding voltage (amount of
light absorbed) is used to calibrate the model. Once calibrated, the model can
estimate the glucose level non-invasively, and depending on the estimated value, the
person is classified as diabetic or non-diabetic.
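The calibration idea can be illustrated with a rough sketch. The intensity readings and reference glucose values below are hypothetical, not the paper's calibration data; the point is only the shape of the pipeline, absorbance from Beer-Lambert followed by a linear fit:

```python
import math

def absorbance(i_incident, i_detected):
    # Beer-Lambert absorbance: A = log10(I0 / I); more absorption -> larger A
    return math.log10(i_incident / i_detected)

def fit_line(xs, ys):
    # Least-squares slope/intercept for the calibration curve
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical calibration points: detected intensity vs. lab glucose (mg/dL)
A = [absorbance(1000, i) for i in (900, 850, 800, 760)]
ref = [90, 120, 150, 175]
m, b = fit_line(A, ref)

def estimate(a):
    # Map a new absorbance reading to an estimated glucose level
    return m * a + b
```

A real device would calibrate per sensor against invasive reference measurements; the linear model here stands in for whatever calibration curve the hardware actually uses.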

Fig. 2 Block diagram of the DM system

The NIR sensor MAX30102 is equipped with built-in LEDs, photodetectors, optical
elements, low-noise circuitry, and ambient light rejection, providing a complete
system solution for mobile and wearable devices. A single 1.8 V supply is used for
operation and an additional 3.3 V supply for the internal LEDs. Communication is
through a standard I2C-compatible interface. The module can be shut down through
software with negligible standby current, allowing the power supply to remain
connected at all times.

3.2 Acquisition of PPG Signals Using NIR Sensor

The PPG signal depicts blood flow in a blood vessel. Blood flows in waves through the
vessels as it travels from the heart to the fingertips and toes. Two of the important
parameters measured from PPG are the systolic and diastolic amplitudes and their
timings [15].
The dataset contains raw PPG signals from 20 diabetes patients and 27 healthy
participants who took part in the study. Data were collected using a single
reflection-based PPG sensor (MAX30102) operating at 860 nm. An 18-bit ADC was used to

continuously record PPG sensor data for one minute. The data were then saved in CSV
format for future use. While acquiring the PPG signal, features such as height,
weight, and age were also recorded. The subjects were advised to remain calm and not
to speak during data collection, since it was noticed that when a subject speaks or
moves, the signal gets distorted by noise. The entire processing was done offline in
MATLAB. From the study of PPG signals [16, 19], it is observed that the PPG signal of
a non-diabetic subject has an elongated shape at diastole, whereas diabetic subjects
show a bell-shaped PPG signal without a secondary peak.

3.3 PPG Signal Features

Thirty-three features were extracted from the PPG signal, as described in Table 1.
The extracted features were statistically analyzed, and the significant ones were fed
into the SVM classifiers. Figure 3 shows a graphical representation of these features.
Areas, slopes, intervals, etc., which represent changes in the contour of the PPG
signal, were extracted. Derivatives of the PPG signal were also extracted; these
describe changes related to the arterial stiffness caused by diabetes and can be a
useful indicator.
Fourteen time-domain characteristics, as well as peak and trough values, were
retrieved from each PPG pulse and normalized. On each normalized PPG pulse, the
start, end, and peak points were determined; their times are denoted 'Ts', 'Te', and
'Tp', respectively. Other parameters derived from the normalized PPG signal include
the systolic peak amplitude (Sp), diastolic peak amplitude (Dp), area, AUR, AUD,
Rarea, RI, RT, DT, and RRD. The first and second derivatives were then computed for
additional feature extraction. In total, 33 features were used to distinguish
diabetic and non-diabetic subjects: 14 distinct features from the PPG signal itself
and 8 and 11 features from its first and second derivatives, respectively.
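A few of the Table 1 features can be sketched directly from a sampled pulse. The triangular pulse and the 100 Hz sampling rate below are synthetic stand-ins (real pulses come from the MAX30102 stream), and the function covers only a subset of the 33 features:

```python
# Sketch of a handful of Table 1 features on one normalized PPG pulse,
# given as a list of samples over a single beat (indices play the role of time).

def ppg_features(pulse, fs=100.0):
    sp = max(pulse)                         # Sp: systolic peak amplitude
    tp = pulse.index(sp)                    # index of the systolic peak
    area = sum(pulse) / fs                  # total area (rectangle rule)
    aur = sum(pulse[:tp + 1]) / fs          # AUR: area under the systolic rise
    aud = area - aur                        # AUD: area under the diastolic decay
    first_deriv = [b - a for a, b in zip(pulse, pulse[1:])]
    amp1 = max(first_deriv)                 # crest of the first derivative
    tdd1 = first_deriv.index(amp1)          # Tdd1: time index of that crest
    return {"Sp": sp, "Tp": tp / fs, "Area": area, "AUR": aur,
            "AUD": aud, "Rarea": aur / aud, "AMP1": amp1, "Tdd1": tdd1 / fs}

# Toy pulse: fast systolic rise over 11 samples, slower diastolic fall over 30
pulse = [i / 10 for i in range(11)] + [1 - i / 30 for i in range(1, 31)]
feats = ppg_features(pulse)
assert feats["Sp"] == 1.0 and feats["AUR"] < feats["AUD"]
```

The asymmetry assertion (systolic area smaller than diastolic area) mirrors the elongated-diastole shape described above for non-diabetic pulses.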

3.4 Statistical Analysis of Extracted Features

The importance of the features in distinguishing diabetic from non-diabetic subjects
was verified using ANOVA, as shown in Fig. 4. The most important feature is age, with
a score of 9.9166, followed by Ti1, CT1, Tdd1, etc., with scores of 8.629, 8.25, and
8.25, respectively.
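The ANOVA ranking criterion behind these scores can be reproduced with a small F-score computation. The group values below are hypothetical, not the study's measurements; the example only shows that a well-separated feature scores far higher than an overlapping one:

```python
# One-way ANOVA F-score for a single feature across groups (e.g. diabetic
# vs. non-diabetic): between-group variance over within-group variance.

def anova_f(*groups):
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# A feature whose group means separate cleanly ranks above one that overlaps.
diabetic     = [61, 63, 65, 64, 66]
non_diabetic = [49, 51, 50, 52, 48]
overlap_a    = [50, 61, 55, 58, 52]
overlap_b    = [51, 60, 54, 57, 53]
assert anova_f(diabetic, non_diabetic) > anova_f(overlap_a, overlap_b)
```

Ranking features by this F-score is exactly the kind of ordering Fig. 4 visualizes, with age at the top.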

Table 1 Features extracted from PPG signal [16]

Sl. No  Feature  Description
1       Sp       Systolic peak amplitude
2       Dp       Diastolic peak amplitude
3       Ts       Time of start
4       Tp       Time of systolic peak
5       Te       Time of end
6       Tdp      Time of diastolic peak
7       Area     Total area of the single PPG signal
8       AUR      Area under the systolic curve
9       AUD      Area under the diastolic curve
10      Rarea    AUR/AUD
11      RI       Dp/Sp
12      RT       Tp-Ts
13      DT       Tp-Te
14      RRD      RT/DT
15      AMP1     Amplitude of first positive going pulse
16      Ts1      Time of start of the first derivative signal
17      Tdp1     Time of peak of the first derivative signal
18      Tdd1     Time of first derivative crest
19      Ti1      Time of inflection
20      ST1      Ti1-Tdp1
21      PD1      Ti1-Tdd1
22      CT1      Tdd1-Ts1
23      A_Amp    Amplitude of a wave
24      B_Amp    Amplitude of b wave
25      C_Amp    Amplitude of c wave
26      D_Amp    Amplitude of d wave
27      E_Amp    Amplitude of e wave
28      AGI1     (B-C-D-E)/A
29      AGI2     (B-E)/A
30      B/A      B_Amp/A_Amp
31      C/A      C_Amp/A_Amp
32      D/A      D_Amp/A_Amp
33      E/A      E_Amp/A_Amp

Fig. 3 Feature descriptions from Table 1 [16]



Fig. 4 Score of significant features using ANOVA

3.5 Statistical Analysis of Features Extracted from Kaggle Dataset [14]

The Kaggle dataset comprises 767 samples, 500 non-diabetic and 267 diabetic, which
were used for the analysis. Each sample consists of eight features: number of
pregnancies, BGL, blood pressure (BP), insulin, BMI, diabetes pedigree function, skin
thickness, and age.
A statistical t-test was conducted to distinguish diabetic from non-diabetic subjects,
as shown in Table 2 [14]. BP was found to be less significant in distinguishing
diabetic from non-diabetic subjects. An ANOVA test was conducted to determine the
influence of interaction among the features.

Table 2 Statistical analysis of Kaggle dataset features

Feature                          p-value  SD
F1 (pregnancies)                 0.00001  3.2877
F2 (glucose)                     0.00001  28.2975
F3 (blood pressure)              0.0715   19.3274
F4 (skin thickness)              0.0383   15.9180
F5 (insulin)                     0.0003   14.3323
F6 (BMI)                         0.00001  7.5438
F7 (diabetes pedigree function)  0.00001  0.3265
F8 (age)                         0.00001  11.4287

4 Results and Discussions

The significant PPG-signal features selected by the ANOVA test were applied to
various classifiers. Quadratic SVM performed better than the other classifiers. The
ROC of the quadratic SVM for training is shown in Fig. 5. Forty-seven samples (20
diabetic and 27 non-diabetic subjects) were used for training.
From Fig. 5a, it is observed that 12% of non-diabetic samples are misclassified as
diabetic, while 86% of diabetic samples are accurately diagnosed. Figure 5b shows the
ROC curve for the negative class, where 14% of diabetic samples are misclassified as
non-diabetic, while 88% of non-diabetic samples are accurately diagnosed. The area
under the curve for the quadratic SVM is found to be 0.93.
Table 3 compares the performance of the real-time features extracted from PPG with
the features from the Kaggle dataset; only the best-performing classifiers' results
for the test data are displayed (Table 4).
Since the number of real data samples acquired was small, conventional classifiers
were tested for the classification of diabetic and non-diabetic cases. An AUC of 0.93
and an accuracy of 86.48% were obtained for the training data acquired using the NIR
sensor along with the other recorded features, versus 77.06% for the Kaggle training
dataset [14].

Fig. 5 ROC of quadratic SVM for a positive, b negative class (training model)

Table 3 Performance of classifiers in classifying diabetic and non-diabetic subjects
for PPG features and Kaggle dataset features (test set)

Metric       Quadratic SVM                            Medium-Gaussian SVM
             (significant PPG signal features) (%)    (Kaggle dataset features) (%)
Accuracy     80                                       79.20
Sensitivity  100                                      79.31
Specificity  60                                       79.16
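The three metrics follow directly from a confusion matrix. The counts below are illustrative, chosen only to reproduce the quadratic-SVM test row (80% accuracy, 100% sensitivity, 60% specificity); they are not the paper's raw confusion matrix:

```python
# Accuracy, sensitivity, and specificity from confusion-matrix counts,
# with "diabetic" as the positive class.

def metrics(tp, tn, fp, fn):
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # fraction of diabetic cases caught
        "specificity": tn / (tn + fp),   # fraction of non-diabetic cases cleared
    }

m = metrics(tp=5, tn=3, fp=2, fn=0)
assert m == {"accuracy": 0.8, "sensitivity": 1.0, "specificity": 0.6}
```

The split matters clinically: the PPG-feature model's perfect sensitivity means no diabetic test subject was missed, at the cost of more false alarms (lower specificity).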

Table 4 Comparison of proposed method with DNN models [13, 19]

Authors                        Acquisition method     No. of samples  Method used          Performance measure
Haque et al. [13]              Smartphone             93              DNN                  R2 value: 0.902
Zanelli et al. [19]            Red and infrared LEDs  100             Light CNN            AUC: 0.755
Proposed method (real-time)    NIR sensor             47 (training)   Quadratic SVM        AUC: 0.93; Accuracy: 86.48%
Proposed method (Kaggle [14])  –                      667 (training)  Medium-Gaussian SVM  Accuracy: 77.06%

5 Conclusion

In this research, several conventional classifiers for distinguishing diabetic from
non-diabetic subjects were evaluated. Initially, the various methods for PPG signal
acquisition were identified and explored. PPG signal acquisition from video recording
using a smartphone [17] and sensor-based techniques were prominent [11, 18]. The
video-recording method was not used, on the advice of the doctors of Father Mullers
Medical College, Mangaluru, since it is highly sensitive to skin color and finger size,
which vary from person to person; suppressing these variations would require additional
calibration in image processing. In the second approach, an NIR sensor was used to
acquire the PPG signal from the fingertip of the subject. Real-time data were acquired
using the MAX30102 NIR sensor, and a PPG signal of one-minute duration was collected,
along with the age, height, and weight of each subject. From the series of pulses in the
PPG signal, one good pulse was selected based on its appearance, and features were then
extracted from the normalized PPG signal and its derivatives. After feature extraction,
machine learning algorithms such as linear SVM, quadratic SVM, cubic SVM, and
discriminant classifiers were explored; quadratic SVM showed the best training and
testing accuracy.
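For readers who wish to reproduce a comparable pipeline, a "quadratic SVM" corresponds to a support vector machine with a degree-2 polynomial kernel. A minimal scikit-learn sketch on placeholder feature vectors (the feature values and labels below are illustrative assumptions, not the study's data):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder feature rows (hypothetical pulse-shape features plus age,
# height, weight), not the study's data.
X = [[0.42, 0.18, 45, 1.62, 70],
     [0.51, 0.22, 52, 1.70, 82],
     [0.39, 0.15, 30, 1.75, 65],
     [0.55, 0.25, 60, 1.58, 90]]
y = [1, 1, 0, 0]  # 1 = diabetic, 0 = non-diabetic (illustrative labels)

# A "quadratic SVM" is an SVC with a degree-2 polynomial kernel; features are
# standardized first because SVMs are sensitive to feature magnitudes.
clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2))
clf.fit(X, y)
print(clf.predict([[0.50, 0.21, 55, 1.65, 85]]))
```

Scaling inside a pipeline ensures the same transformation is applied at training and prediction time.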

References

1. Haxha S, Jhoja J (2016) Optical based non-invasive glucose monitoring sensor prototype. IEEE
Photonics J 8(6):1–11
2. Parte RS, Patil A, Patil A, Kad A, Kharat S (2020) Non-invasive method for diabetes detection
using CNN and SVM classifier. Int J Sci Res Eng Dev 3(3):9–13
Non-invasive Diabetes Detection System Using Photoplethysmogram … 467

3. Susana E, Ramli K, Murfi H, Apriantoro NH (2022) Non-invasive classification of blood
glucose level for early detection diabetes based on photoplethysmography signal. Information
(Switzerland) 13(59)
4. Islam TT, Ahmed MS, Hassanuzzaman M, Amir SAB, Rahman T (2021) Blood glucose level
regression for smartphone PPG signals using machine learning. Appl Sci 11(2):1–20
5. Ali H, Bensaali F, Jaber F (2017) Novel approach to non-invasive blood glucose monitoring
based on transmittance and refraction of visible laser light. IEEE Access 5(1):9164–9174
6. Kaveeshwar SA, Cornwall J (2014) The current state of diabetes mellitus in India. Australas
Med J 7(1):45–48
7. Pradeepa R, Mohan V (2021) Epidemiology of type 2 diabetes in India. Indian J Ophthalmol
69:2932–2938
8. Pandey R, Paidi SK, Valdez TA, Spegazzini CZN, Dasari RR, Barman I (2017) Noninvasive
monitoring of blood glucose with Raman spectroscopy. Acc Chem Res 50(2):264–272
9. Bobade CD, Patil MS (2016) Non-invasive blood glucose level monitoring system for diabetic
patients using near-infrared spectroscopy. Am J Comput Sci Inf Technol 4(1):1–8
10. Hossain S, Debnath B, Biswas S, Al-Hossain MJ, Anika A, Navid SK (2019) Estimation of
blood glucose from PPG signal using convolutional neural network. In: 2019 IEEE interna-
tional conference on biomedical engineering, computer and information technology for health
(BECITHCON), pp 53–58
11. Zhang G, Mei Z, Zhang Y, Ma X, Lo B, Chen D, Zhang Y (2020) A noninvasive blood glucose
monitoring system based on smartphone PPG signal processing and machine learning. IEEE
Trans Industr Inf 16(11):7209–7218
12. Jain P, Joshi AM, Agrawal N, Mohanty S (2020) iGLU 2.0: a new non-invasive, accurate serum
glucometer for smart healthcare. IEEE Trans Consum Electron 2020:1–19
13. Haque R, Raju SMTU, Golap A-U, Hashem MMA (2021) A novel technique for non-invasive
measurement of human blood component levels from fingertip video using DNN based models.
IEEE Access 9(1):19025–19042
14. https://www.kaggle.com/datasets/mathchi/diabetes-data-set. (Online)
15. Yamakoshi Y, Matsumura K, Yamakoshi T, Lee J, Rolfe P, Kato Y, Shimizu K, Yamakoshi
K-I (2017) Side-scattered fingerphotoplethysmography: experimental investigations toward
practical noninvasive measurement of blood glucose. J Biomed Optics 22(6):067001–067011
16. Niralaa N, Periyasamy R, Singh BK, Kumar A (2019) Detection of type-2 diabetes using
characteristics of toe photoplethysmogram by applying support vector machine. Biocybernetics
Biomed Eng 39(1):38–51
17. Mazzu-Nascimento T, de Oliveira Leal ÂM, Nogueira-de-Almeida CA, de Avó LR, Carrilho
E, Silva DF (2020) Noninvasive self-monitoring of blood glucose at your fingertips, literally!:
smartphone-based photoplethysmography. Int J Nutrol 13(2):48–52
18. Zhang Y, Zhang Y, Siddiqui SA, Kos A (2019) Non-invasive blood-glucose estimation using
smartphone PPG signals and subspace KNN classifier. Elektrotehniski Vestnik/Electrotechnical
Rev 86(1):68–74
19. Zanelli S, El Yacoubi MA, Hallab M, Ammi M (2023) Type 2 diabetes detection with light
CNN from single raw PPG wave. IEEE Access 11:57652–57665
Hybrid Machine Learning Algorithms
for Effective Prediction of Water Quality

Kavitha Datchanamoorthy, B. Padmavathi, Dhamini Devaraj,
T. R. Gayathri, and V. Hasitha

Abstract Characterizing water samples involves analyzing different physical and
chemical properties to understand the composition and quality of the water. Common
parameters studied in this field include temperature, pH, turbidity, conductivity,
biochemical and chemical oxygen demand, dissolved solids, dissolved oxygen (DO),
nutrient levels, heavy metals, and microbiological indicators. By studying these
physico-chemical characteristics, researchers can gain insights into the water's
quality and potential uses, and identify pollutants or sources of contamination. It
is therefore necessary to check the health of water sources regularly. To safeguard
human health, ecosystems, and sustainable development, monitoring and maintaining
a source's water quality is crucial. Effective water quality management involves
regular testing, pollution prevention measures, wastewater treatment, and the
implementation of appropriate regulations and policies to protect water sources and
ensure their sustainability. Research on machine learning-based techniques promises
the most accurate projections of the effects of wastewater improvement. The dataset
is analyzed using supervised machine learning techniques (SMLT), gathering a variety
of data points, including the categorization of variables and outcomes from
univariate, bivariate, and multivariate analyses. Using evaluation tools, the
performance of several machine learning techniques on the presented dataset is
compared and discussed. The predicted error rate shows the feasibility of the given
model in a challenging environment, and the model also generalizes efficiently to
unseen data.

Keywords Water quality · Wastewater improvement · Water quality management

K. Datchanamoorthy (B) · B. Padmavathi · D. Devaraj · T. R. Gayathri · V. Hasitha
Easwari Engineering College, Ramapuram, Chennai, India
e-mail: dkavithavijay@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 469
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_40
470 K. Datchanamoorthy et al.

1 Introduction

The quality of water is essential for various reasons, as it directly impacts both
human and environmental health. Polluted water can contain unsafe microorganisms,
chemicals, heavy metals, and other contaminants that cause a range of waterborne
diseases, such as diarrhoea, cholera, typhoid, and hepatitis. Maintaining good water
quality is vital to prevent the transmission of these diseases and protect public
health. E-waste policies [1] play a vital role in protecting the environment and
enforcing the law against polluters. Water quality immediately impacts the ecosystem
and everyone's general well-being. Using supplementary water sources, such as
aquifers and the sea, has sometimes helped to solve shortages; however, whereas
seawater use is often linked to the spread of toxins, withdrawing groundwater without
adequate recharge may cause land subsidence. River utilization has attracted
particular attention: several studies of streams around the world have led to the
proposal of a design discipline known as stream designing. It is important to
consider biological effects, silt mobility, water quality, and pollutant transmission
mechanisms [2, 3] while designing for in-stream conditions. Given the varied uses of
water in our daily lives, it is crucial to ensure its availability, quality, and
sustainable use. Water conservation, responsible water management practices, and
efforts to protect water sources are necessary to secure this vital resource for
current and future generations. Figure 1 illustrates how water is used in several
aspects of daily living.
The choice of a machine learning approach depends on the characteristics of the data,
the particular task of predicting water quality, and the available computing
resources. The work is organized as follows: Sect. 2 presents an extensive study on
water quality. Section 3 highlights the proposed prediction method.

Fig. 1 Applications of
drinking water
Hybrid Machine Learning Algorithms for Effective Prediction of Water … 471

Section 4 gives the implementation and algorithms used. Section 5 gives results and
a comprehensive discussion on the outcome. Section 6 presents the conclusion with
future research directions.

2 Related Research

The water quality prediction model employs the weighted arithmetic index method
to determine the water quality index (WQI) using principal component regression
technique [4]. The dataset undergoes principal component analysis (PCA), where the
significant WQI parameters are extracted [5]. Regression techniques are then imple-
mented on the PCA result to predict the WQI. An experimental analysis is performed
using a dataset linked to Gulshan Lake to evaluate the proposed approach. The results
indicate that the principal component regression method has a prediction accuracy of
95%, while the gradient boosting classifier method [6, 7] has a classification accuracy
of 100% compared with the latest models.
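The PCA-plus-regression pipeline described above can be sketched with NumPy alone: centre the data, project onto the leading principal components, then regress the WQI on the component scores. The data below are synthetic stand-ins, not the Gulshan Lake dataset, and the number of retained components is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for water-quality parameters (pH, DO, BOD, turbidity, ...).
X = rng.normal(size=(100, 6))
wqi = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

# PCA: centre the data, then project onto the top-k right singular vectors.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
scores = Xc @ Vt[:k].T

# Principal component regression: ordinary least squares on the PC scores.
A = np.column_stack([np.ones(len(scores)), scores])
coef, *_ = np.linalg.lstsq(A, wqi, rcond=None)
pred = A @ coef
ss_res = ((wqi - pred) ** 2).sum()
ss_tot = ((wqi - wqi.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
print("R^2 of the PC regression:", round(r2, 3))
```

Regressing on a few components instead of all raw parameters trades some fit for stability when the parameters are strongly correlated.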
Using sophisticated artificial intelligence (AI) technologies, the water quality
index (WQI) and the water quality classification (WQC) are calculated [8, 9]. The
long short-term memory (LSTM) deep learning technique and the nonlinear autore-
gressive neural network (NARNET) have been developed for the WQI prediction. The
outcomes demonstrated that the suggested models can accurately predict WQI and
identify the water standard with improved resilience. The neural network with long
short-term memory (LSTM) and improved grey relational analysis (IGRA) method
are suggested [10]. First, it is suggested that IGRA be used to choose features for
water data quality based on similarity and proximity, taking into account overall
multidimensional association of that information. Second, the water quality fore-
casting proposed based on LSTM, where inputs are the attributes acquired by IGRA,
is created while taking the temporal sequence of the water quality information into
consideration. Lastly, Victoria Bay and Tai Lake, two real-world datasets on water
quality, are used to test the suggested methodology. According to experimental
results, the model demonstrates superior performance in water quality prediction
by fully utilizing the multiplex relationships and patterns within the water quality
information. This approach is more effective than single-feature or non-sequential
prediction methods.
Based on data gathered from four monitoring sites along the La Buong River between
2010 and 2017, twelve different machine learning (ML) models have been evaluated for
developing a water quality index (WQI) [11, 12]. The extreme gradient boosting
(XGBoost) method was found to be the most accurate and productive (R2 = 0.989 and
RMSE = 0.107), although all twelve models demonstrated potential predictive accuracy
for the WQI. This study highlights the potential of ML models, XGBoost in particular,
for precise WQI forecasting and for advancing better water quality management
techniques.

The efficacy of artificial intelligence methodologies, such as artificial neural
networks (ANN), group methods of data handling (GMDH), and support vector
machines (SVM) is being examined to predict the discrete components of water in
the Tireh River, which is located in the southern and western regions of Iran [13, 14].
Upon examination of the model outputs, it was discovered that each exhibited some
overestimation. When the models were objectively compared on the DDR index, the
SVM model was found to be associated with the lowest DDR value.
In conclusion, the motivation behind proposing water quality prediction using
hybrid algorithms lies in the pursuit of a comprehensive, accurate, and proactive
approach to safeguarding water ecosystems, protecting public health, and fostering
sustainable development. The proposed hybrid voting classifier, which combines the
predictions of random forest, Cat Boost, Naive Bayes, and logistic regression
models, can achieve state-of-the-art performance on a variety of water quality
prediction tasks.

3 Proposed Method

The proposed voting classifier model (Fig. 2) is used to develop a machine learning
model to categorize water quality. Data collection is the first step in the process,
during which information is gathered from earlier water samples. Data analysis is
then used to identify dependent and independent attributes. The appropriate machine
learning approaches, explained below, are applied to the collection, where the trend
in the data is captured. After experimenting with a number of algorithms, the most
effective method is employed to predict the outcome.
System Architecture
See Fig. 2.

Fig. 2 System architecture of proposed water quality prediction system



Data Wrangling
In this phase, the loaded data is checked, and then the dataset is pruned and purified
in preparation for analysis. It is essential to record the procedures followed during
the cleaning process and provide justification for the decisions made regarding the
data refinement.
Data Collection
To forecast the provided data, the collected dataset is divided into two components:
the training set and the test set, typically in a 7:3 ratio. The model is then fitted
on the training set, and its accuracy is estimated on the test set.
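The 7:3 split just described maps directly onto scikit-learn's `train_test_split`; a minimal sketch (the arrays are placeholders for the water dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # placeholder water-sample features
y = np.array([0, 1] * 5)           # placeholder potability labels

# test_size=0.3 gives the 7:3 split described above; stratify keeps the
# class balance the same in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
print(len(X_train), len(X_test))  # → 7 3
```

Fixing `random_state` makes the split reproducible across runs.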
Data Preprocessing
The absence of certain data points in the dataset renders it unreliable. To boost the
algorithm’s effectiveness and produce superior outcomes, inputs must be prepro-
cessed. Data cleansing will require less work, and more time will be devoted to
research and modelling [15, 16].
Data Visualization
Data visualization offers a crucial set of tools for acquiring qualitative insight [17].
The use of data visualizations is a powerful way to convey and demonstrate impor-
tant relationships within graphs and charts, which are often easier for viewers to
understand and appreciate than complex correlation or importance metrics, even
with limited subject knowledge.

4 Algorithm Implementation

4.1 Random Forest Model

Random forest is a trendy machine learning technique (Table 1) for classification and
regression problems. A distinct subset of the data and traits is used to train each deci-
sion tree. This lessens oversampling and strengthens the system’s defences against
anomalies and noisy data. The approach [18] may also deliver feature significance
ratings, which can be used to categorize the most crucial components of the data and
will be able to determine accuracy as seen in the result below.

4.2 Cat Boost Algorithm

Cat Boost is an open-source gradient boosting method developed by Yandex, a
Russian search engine company. The term Cat Boost comes from the algorithm's

Table 1 Random forest algorithm


Input: Water data sample Output: Accuracy
Algorithm
i. Start
ii. Initialize a RandomForestClassifier object called “rf”, fit the training data
iii. Predict the test data (X_test), assign the result to the variable called predicted, predicted = rf.predict(X_test)
iv. Compare the accuracy of the predicted values by relating them to the true target variable y_test using the "accuracy_score" function from the scikit-learn library. Assign the result to a variable called "accuracy"
v. End procedure
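Steps ii–iv of Table 1 map directly onto scikit-learn; a runnable sketch on a synthetic stand-in for the water dataset (the data here are generated, not the paper's samples):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the water dataset.
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(random_state=0)     # step ii
rf.fit(X_train, y_train)
predicted = rf.predict(X_test)                  # step iii
accuracy = accuracy_score(y_test, predicted)    # step iv
print(accuracy)
```

`feature_importances_` on the fitted forest provides the feature significance ratings mentioned in the text.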

Table 2 CatBoost algorithm


Input: Water data sample Output: Accuracy
Algorithm
i. Start
ii. Initialize a CatBoostClassifier object called "cat" with the parameter verbose set to 0
iii. Fit the CatBoostClassifier object to training data
iv. Predict the test data (X_test), assign the result to the variable called predicted, predicted = cat.predict(X_test)
v. Compare the accuracy of the predicted values by relating them to the true target variable y_test using the "accuracy_score" function from the scikit-learn library. Assign the result to a variable called "accuracy"
vi. End procedure

capability to handle categorical data and its use of gradient boosting as the
underlying learning technique. The method (Table 2) combines a collection of weak
models, such as decision trees, into a strong ensemble model. Among the novel
techniques Cat Boost uses to improve performance are a hierarchical boosting
approach, which enables the algorithm to learn from the structure of the data, and
the symmetric tree-traversal (SST) algorithm, which reduces the computational
complexity of tree building. Cat Boost also employs gradient-based one-hot encoding
(OHE) for categorical data processing, which converts categorical data into
numerical values for use in machine learning models. The algorithm's
hyperparameters, which can be tuned to improve performance, include the learning
rate, tree depth, and regularization parameters [19].
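Cat Boost's Python API follows the scikit-learn conventions used in Table 2. Because the catboost package may not be available everywhere, the sketch below substitutes scikit-learn's `GradientBoostingClassifier` as a stand-in for the same gradient-boosting workflow, exposing the hyperparameters named above (learning rate, tree depth); the dataset is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# With the catboost package installed, CatBoostClassifier(verbose=0) would be
# used here as in Table 2; this is a scikit-learn stand-in, not Cat Boost itself.
gbm = GradientBoostingClassifier(learning_rate=0.1, max_depth=3, random_state=0)
gbm.fit(X_train, y_train)
predicted = gbm.predict(X_test)
accuracy = accuracy_score(y_test, predicted)
print(accuracy)
```

Unlike Cat Boost, this stand-in does not handle categorical columns natively; they would need encoding first.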

4.3 Naive Bayes Model

A straightforward yet powerful machine learning technique for classification and
prediction is Naive Bayes. In order to build a Naive Bayes classifier, the method
(Table 3) first calculates the posterior distribution for every category in the training

Table 3 Naive Bayes algorithm


Input: Water data sample Output: Accuracy
Algorithm
i. Start
ii. Initialize a GaussianNB object called “gb”, fit the training data
iii. Predict the test data (X_test) assign the result to the variable called predicted, predicted =
gb.predict(X_test)
iv. Compare the accuracy of the predicted values by comparing them to the true target variable
y_test using the “accuracy_score” function from the scikit-learn library. Assign the result to
a variable called “accuracy”
v. End procedure

data. Then, for each class, the conditional probability of each characteristic in
the input data is determined, and the predicted class is the one with the greatest
posterior probability [20, 21]. Naive Bayes is recognized for being simple to use,
fast to train, and able to handle large amounts of data. Its performance, however,
can suffer in the presence of irrelevant or correlated variables.
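The training and prediction steps just described can be made concrete with a from-scratch Gaussian Naive Bayes on toy data: estimate per-class priors and per-feature Gaussian parameters, then score each class by its log-posterior and pick the largest. This is an illustrative sketch, not the paper's implementation:

```python
import math

def log_gaussian_pdf(x, mean, var):
    # log N(x; mean, var), computed in log space to avoid underflow
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def fit(X, y):
    """Estimate per-class priors and per-feature (mean, variance) pairs."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[c] = (prior, stats)
    return model

def predict(model, x):
    # Posterior ∝ prior × ∏_j P(x_j | class); pick the class with the largest score.
    scores = {c: math.log(prior) + sum(log_gaussian_pdf(v, m, s)
                                       for v, (m, s) in zip(x, stats))
              for c, (prior, stats) in model.items()}
    return max(scores, key=scores.get)

X = [[1.0, 5.0], [1.2, 4.8], [3.0, 1.0], [3.2, 0.9]]
y = [0, 0, 1, 1]
model = fit(X, y)
print(predict(model, [1.1, 5.1]))  # → 0
```

scikit-learn's `GaussianNB`, as used in Table 3, performs the same computation with additional numerical safeguards.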

4.4 Logistic Regression

Logistic regression is a statistical analysis technique used to predict a binary
outcome (for instance, yes/no or 0/1) from a given set of input data in binary
classification situations. The logistic regression model (Table 4) is trained by
maximum likelihood estimation, which finds the set of model parameters that
maximizes the likelihood of the observed data [22, 23].

Table 4 Logistic regression algorithm


Input: Water data sample Output: Accuracy
Algorithm
i. Start
ii. Initialize a LogisticRegression object called “LR”, fit the training data
iii. Predict the test data (X_test), assign the result to the variable called predicted, predicted =
LR.predict(X_test)
iv. Compare the accuracy of the predicted values by relating them to the true target variable y_test using the "accuracy_score" function from the scikit-learn library. Assign the result to a variable called "accuracy"
v. End procedure
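Maximum-likelihood training of logistic regression can be sketched from first principles: ascend the gradient of the log-likelihood, whose per-sample gradient is (target − sigmoid(z)) times the input. A toy, dependency-free illustration (not the paper's implementation; learning rate and epoch count are arbitrary assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximum-likelihood fit by stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)          # bias + one weight per feature
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            err = t - sigmoid(z)          # per-sample log-likelihood gradient factor
            w[0] += lr * err
            for j, xj in enumerate(x):
                w[j + 1] += lr * err * xj
    return w

def predict(w, x):
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if sigmoid(z) >= 0.5 else 0

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w = fit_logistic(X, y)
print([predict(w, x) for x in X])  # → [0, 0, 1, 1]
```

scikit-learn's `LogisticRegression`, as used in Table 4, solves the same maximum-likelihood problem with more robust optimizers and regularization.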

4.5 Proposed Algorithm: Voting Classifier Algorithm

Voting classifiers are ensemble learning approaches that combine different machine
learning models to produce predictions. A voting classifier works (Table 5) by
combining the forecasts of several models and selecting the most frequent
prediction as the final outcome. Voting classifiers (Fig. 3) are typically used to
boost prediction accuracy and stability. They are especially useful when there is
little training data or when the individual models have complementary strengths
and weaknesses [24–26]. Models that can be combined in a voting classifier include
neural networks, decision trees, and support vector machines.

Table 5 Voting classifier algorithm


Input: Water data sample Output: Accuracy
Algorithm
i. Start
ii. Initialize three classifier objects called “lr”, “cat”, and “rf” respectively, each with a random
state set to 0
iii. Initialize a VotingClassifier object called “vot_clf” with an “estimator” argument containing
the above-mentioned classifier objects and their corresponding names
iv. Fit the VotingClassifier object to training data
v. Predict the test data (X_test), assign the result to the variable called predicted, predicted = vot_clf.predict(X_test), i.e. the most frequent prediction among the base models
vi. Compare the accuracy of the predicted values by comparing them to the true target variable
y_test using the “accuracy_score” function from the scikit-learn library. Assign the result to
a variable called “accuracy”
vii. End procedure

Fig. 3 Proposed voting classifier model
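A minimal sketch of the voting scheme in Table 5, using scikit-learn's `VotingClassifier` with hard (majority) voting; Cat Boost is left out so the example runs on scikit-learn alone, and the dataset is synthetic rather than the water data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Hard voting returns the most frequent prediction among the base models.
vot_clf = VotingClassifier(estimators=[
    ("lr", LogisticRegression(random_state=0, max_iter=1000)),
    ("rf", RandomForestClassifier(random_state=0)),
    ("gnb", GaussianNB()),
], voting="hard")
vot_clf.fit(X_train, y_train)
predicted = vot_clf.predict(X_test)
accuracy = accuracy_score(y_test, predicted)
print(accuracy)
```

With `voting="soft"` the ensemble would instead average predicted class probabilities, which requires every base model to support `predict_proba`.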

Table 6 Performance metrics calculation

Algorithms             Precision   Recall   True +ve   True −ve   False +ve   False −ve   Accuracy
Logistic regression    0.58        0.42     0.91       0.05       0.74        0.02        0.58725
Cat boost              0.69        0.69     0.87       0.43       0.77        0.53        0.693708
Random forest          0.62        0.74     0.97       0.1        0.75        0.18        0.622516
Gaussian Naive Bayes   0.62        0.52     0.85       0.24       0.72        0.18        0.60304
Voting classifier      0.67        0.73     0.91       0.35       0.77        0.47        0.68337

The voting classifier is the model proposed in this work; the bold values in the
original table highlight its results against the existing algorithms

4.6 Performance Metrics Calculation

Table 6 depicts the comparisons between the various algorithms with the proposed
voting classifier algorithm.
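For reference, the precision, recall, and accuracy columns of Table 6 follow the standard confusion-matrix definitions; a small helper (with illustrative counts, not the paper's data) makes the formulas explicit:

```python
def metrics(tp, fp, tn, fn):
    """Precision, recall and accuracy from raw confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy

# Illustrative counts only (Table 6 reports rates, not raw counts).
p, r, a = metrics(tp=70, fp=30, tn=60, fn=40)
print(round(p, 2), round(r, 2), round(a, 2))  # → 0.7 0.64 0.65
```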

5 Result and Discussion

The intended outcome was not realized in the prior system model, whose accuracy is
62.5%. When the proposed model was implemented, the final result indicates whether
the water is drinkable or not, and an accuracy of 74.5% was attained with several
new algorithmic advances.
The work makes use of hybrid machine learning techniques, which combine different
algorithms (Fig. 4): logistic regression, Cat Boost, Naive Bayes, and random
forest. The objective of this project is to increase accuracy from the previous
models' maximum of 62% towards 80% using a combination of algorithms. Following the
implementation of the aforementioned methods, the accuracy of the random forest
algorithm was 67.3%, that of the logistic regression algorithm was 59.4%, that of
the Cat Boost algorithm was 69.3%, and that of the Naive Bayes algorithm was 60.4%.
The accuracy of these individual algorithms was subsequently raised to 74.5% by
using a voting classifier. Applying hybrid machine learning techniques is
advantageous for increasing the model's accuracy: by combining the algorithms, the
advantages of each are exploited and their shortcomings are compensated for. The
success of this research may be due to the appropriate choice and blending of the
algorithms used, as the accuracy of 74.5% attained is a major improvement over the
prior models.

Fig. 4 Output comparison chart

6 Conclusion

In this research, a novel method is proposed to predict water quality. The
proposition of water quality prediction using hybrid algorithms is rooted in the
recognition of the critical importance of ensuring clean and safe water sources for
both human well-being and environmental preservation. The adoption of hybrid
algorithms, which combine the strengths of multiple prediction models and
data-driven techniques, represents a ground-breaking approach to addressing these
challenges. The numerical results show that the proposed model offers a better
trade-off than existing approaches: its prediction accuracy of 74% is comparatively
higher than that of the other models, and it also achieves better results on other
metrics such as sensitivity, specificity, MCC, FPR, FAR, and AUC. The learning
approaches show the high significance of the prediction model. Hence, the
hybridization of learning approaches is recommended for upcoming research work.

References

1. Periathamby A, Victor D (2011) Policy trends of extended producer responsibility in Malaysia.
Waste Manag Res 29:945–953. https://doi.org/10.1177/0734242X11413332
2. Tyagi S, Sharma B, Singh P (2013) Water quality assessment in terms of water quality index.
Am J Water Resour 1:34–38. https://doi.org/10.12691/ajwr-1-3-3
3. Wu D, Wang H, Mohammed H, Seidu R (2019) Quality risk analysis for sustainable smart
water supply using data perception. IEEE Trans Sustain Comput 1. https://doi.org/10.1109/
TSUSC.2019.2929953

4. Bouamar M, Ladjal M (2008) Evaluation of the performances of ANN and SVM techniques
used in water quality classification, pp 1047–1050. https://doi.org/10.1109/ICECS.2007.451
1173
5. Al-Odaini N, Zakaria M, Zali M, Juahir H, Yaziz M, Surif S (2011) Application of chemometrics
in understanding the spatial distribution of human pharmaceuticals in surface water. Environ
Monit Assess 184:6735–6748. https://doi.org/10.1007/s10661-011-2454-3
6. Sinha K, Das P (2015) Assessment of water quality index using cluster analysis and artificial
neural network modeling: a case study of the Hooghly River basin, West Bengal, India. Desalin
Water Treat 54:28–36. https://doi.org/10.1080/19443994.2014.880379
7. Aldhyani T, Al-Yaari M, Alkahtani H, Maashi M (2020) Water quality prediction using artificial
intelligence algorithms. Appl Bionics Biomech 2020:1–12. https://doi.org/10.1155/2020/665
9314
8. Berry M, Mohamed A (2020) Supervised and unsupervised learning for data science. https://
doi.org/10.1007/978-3-030-22475-2
9. Choubin B, Khalighi-Sigaroodi S, Malekian A, Kişi Ö (2016) Multiple linear regression, multi-
layer perceptron network and adaptive neuro-fuzzy inference system for forecasting precipi-
tation based on large-scale climate signals. Hydrol Sci J 61(6):1001–1009. https://doi.org/10.
1080/02626667.2014.966721
10. Diamantopoulou M, Papamichail D, Antonopoulos V (2005) The use of a neural network
technique for the prediction of water quality parameters. Oper Res Int J 5:115–125. https://doi.
org/10.1007/BF02944165
11. Rogier A, Donders T, van der Heijden GJMG, Stijnen T, Moons KGM (2006) Review: a
gentle introduction to imputation of missing values. J Clin Epidemiol 59(10):1087–1091. ISSN
0895-4356. https://doi.org/10.1016/j.jclinepi.2006.01.014
12. Seifeddine M, Bradai A, Bukhari S, Quang PTA, Ben Ahmed O, Atri M (2020) A survey on
machine learning in internet of things: algorithms, strategies, and applications. Internet Things.
https://doi.org/10.1016/j.iot.2020.100314
13. Shepur R, Shivapur A, Kumar P, Hiremath C, Dhungana S (2019) Utilization of water quality
modeling and dissolved oxygen control in river Tungabhadra, Karnataka (India). OALibJ 06:1–
17. https://doi.org/10.4236/oalib.1105397
14. Othman F, Alaaeldin ME, Seyam M, Najah A-M, Teo FY, Chow MF, Afan H, Sherif M, El-
Shafie A (2020) Efficient river water quality index prediction considering minimal number
of inputs variables. Eng Appl Comput Fluid Mech 14:751–763. https://doi.org/10.1080/199
42060.2020.1760942
15. Hildenbrand ZL, Carlton DD Jr, Fontenot BE, Meik JM, Walton JL, Taylor JT, Thacker JB,
Korlie S, Shelor CP, Henderson D, Kadjo AF, Roelke CE, Hudak PF, Burton T, Rifai HS,
Schug KA (2015) A comprehensive analysis of groundwater quality in the Barnett Shale region.
Environ Sci Technol 49(13):8254–8262. https://doi.org/10.1021/acs.est.5b01526. Epub 26 Jun
2015. PMID: 26079990
16. Chen T, Chen H, Liu R-W (1995) Approximation capability in by multilayer feedforward
networks and related problems. IEEE Trans Neural Netw 6:25–30. https://doi.org/10.1109/72.
363453
17. Hornik K, Stinchcombe MB, White HL (1989) Multilayer feedforward networks are universal
approximators. Neural Netw 2:359–366
18. Kingma D, Ba J (2014) Adam: a method for stochastic optimization. In: International
conference on learning representations
19. Lau J, Hung WT, Cheung CS (2009) Interpretation of air quality in relation to monitoring
station’s surroundings. Atmos Environ 43:769–777. https://doi.org/10.1016/j.atmosenv.2008.
11.008
20. Liu P, Wang J, Sangaiah AK, Xie Y, Yin X (2019) Analysis and prediction of water quality
using LSTM deep neural networks in IoT environment. Sustainability (Switzerland) 11. https://
doi.org/10.3390/su1102058
21. Salari M, Salami E, Afzali SH, Ehteshami M, Gea OC, Derakhshan Z, Sheibani S (2018)
Quality assessment and artificial neural networks modeling for characterization of chemical
and physical parameters of potable water. Food Chem Toxicol 118. https://doi.org/10.1016/j.
fct.2018.04.036
22. Liu Y, Zhao T, Ju W, Shi S (2017) Materials discovery and design using machine learning. J
Mater 3. https://doi.org/10.1016/j.jmat.2017.08.002
23. Shukla P (2020) Development of fuzzy knowledge-based system for water quality assessment
in river Ganga. https://doi.org/10.1007/978-981-15-3287-0_2
24. Ma C, Zhang H, Wang X (2014) Machine learning for big data analytics in plants. Trends Plant
Sci 19. https://doi.org/10.1016/j.tplants.2014.08.004
25. Ma J, Ding Y, Cheng J, Jiang F, Xu Z (2019) Soft detection of 5-day BOD with sparse matrix
in city harbor water using deep learning techniques. Water Res 170:115350. https://doi.org/10.
1016/j.watres.2019.115350
26. Rahman SS, Hossain M (2019) Gulshan Lake, Dhaka City, Bangladesh, an onset of continuous
pollution and its environmental impact: a literature review. Sustain Water Resour Manag 5:767–
777. https://doi.org/10.1007/s40899-018-0254-4
OCR-Based Ingredient Recognition
for Consumer Well-Being

S. Kayalvizhi, N. Akash Silas, R. K. Tarunaa, and Shivani Pothirajan

Abstract Many companies utilize retail tactics to entice consumers into purchasing
their products, all the while obscuring the potential risks associated with certain
ingredients used in these products. This lack of transparency places consumers at risk
of unforeseen consequences. The development of mobile applications for recognizing
text from images and processing it for various purposes has become an increasingly
important research area. In this paper, we present an OCR-based Android application
designed to recognize ingredients on product labels and match them with a database
of toxic substances. The application uses the Google ML Kit to recognize text and the
Levenshtein algorithm to calculate the similarity between the recognized ingredients
with the database. The application retrieves information on toxic substances from
the Firebase database, which is used to validate the recognized ingredients. The
application also implements a user interface for displaying the list of toxic ingredients
found in the product, providing a warning to the user. Our application provides
a simple and effective method to check for toxic ingredients, ensuring consumer
safety.

Keywords Optical character recognition (OCR) · Android app development · ML
Kit Vision API · Firebase Realtime Database · Levenshtein similarity algorithm ·
Toxic ingredients detection · Ingredient recognition · Text recognition · Image
processing · Machine learning · Mobile application development

S. Kayalvizhi · N. Akash Silas (B) · R. K. Tarunaa · S. Pothirajan
Computer Science and Engineering, Easwari Engineering College, Chennai, India
e-mail: akashsilas1@gmail.com
S. Kayalvizhi
e-mail: kayalvizhi.s@eec.srmrmp.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 481
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_41
482 S. Kayalvizhi et al.

1 Introduction

In our fast-paced society, people are increasingly concerned about their health and
well-being. However, not all health products in the market are safe and reliable. This
makes it essential to have accurate information about potential hazards. Mobile apps
can provide quick access to this information. Optical character recognition (OCR)
technology scans ingredients from product labels to detect harmful substances. This
empowers consumers to make informed decisions about what they buy.

2 Motivation

Consumer safety is crucial, especially when it comes to products that may contain
harmful ingredients. Unfortunately, there are few software solutions available to iden-
tify these risks. Advanced tools are needed to help users make informed decisions.
In addition, user-friendly interfaces are necessary to provide clear and concise infor-
mation. This ensures that even users without technical expertise can easily navigate
the software. By prioritizing both software effectiveness and accessibility, we can
create solutions that benefit a wider range of users.

3 Literature Survey

See Table 1.

4 Existing Systems

Existing systems for identifying food ingredients and nutritional information have
limitations. FoodWiki uses semantic search and concept matching to provide infor-
mation on food additives, but has a static database [1]. Open Food Facts is a
crowdsourced database that lacks quality control and does not provide interpreta-
tion of nutritional information. SmartLabel requires users to scan QR codes and its
information can be difficult to navigate [2].
Several algorithms, such as Mutha’s food detection and recognition system [6],
Ocay’s NutriTrack [7], and Mokdara’s personalized food recommendation system
[8] have been proposed to improve food information systems. OCR technology has
been analyzed by Hamad and Kaya [9], Lund et al. [10], and Lázaro et al. [11], while
Bhavani surveyed coding algorithms for medical image compression [12].
OCR-Based Ingredient Recognition for Consumer Well-Being 483

Table 1 Survey results

Adopted methodology | Authors | Strengths | Weaknesses
Ontology-driven approach | Çelik Ertuğrul [1] | Enhances mobile safe food consumption | Potential limitations in scalability
Comparative analysis of existing approaches | Giménez-Arnau [2] | Comprehensive analysis of chemical compounds causing skin allergies | Limited understanding and challenges in identifying sensitizers in complex mixtures
Analysis of products | Zulaikha et al. [3] | Comprehensive overview of hazardous ingredients in cosmetics and personal care products | Limited focus on specific health concerns associated with the identified ingredients
Observational study on harmful products | Barnett et al. [4] | Provides real-world insights, captures actual consumer behavior | Limited sample size, potential for biased self-reporting
Spectrophotometric analysis | Al-Saleh et al. [5] | Accurate and sensitive detection of lead | Limited sample size for analysis

However, none of these systems match the proposed OCR ingredient scanning
application in terms of comprehensiveness, reliability, and accuracy.

5 Proposed System

The proposed system addresses the issues that exist in current systems and aims to
provide an optimal solution to the problem at hand. The suggested system is a mobile
optical character recognition (OCR) application that reads text from a picture and
performs a Levenshtein similarity analysis to assess whether the ingredients in the
image are harmful. The Levenshtein algorithm is a distance computation technique
for comparing two strings. The system recognizes the ingredients in a picture and
compares them with a database of harmful substances in order to help users assess
the safety of ingredients in household items or cosmetics. Once the user snaps a
picture of the product label or ingredient list, the software analyzes the safety of the
ingredients.
The system helps to identify harmful ingredients in products by taking a photo
of the label or ingredient list using a mobile app. It includes a user interface, image
processing, OCR, and a database. The app extracts text using Google’s ML Kit and
compares it with a database of dangerous substances using the Levenshtein algorithm.
If toxic ingredients are detected, a warning message is displayed. Firebase Realtime
Database stores toxic ingredient data. The system is a convenient way for consumers
to identify harmful ingredients (Fig. 1).

Fig. 1 Functional architecture diagram

5.1 Levenshtein Distance Algorithm

The Levenshtein algorithm, also referred to as the edit distance algorithm, provides
precise outcomes by determining the minimum number of single-character modifi-
cations needed to convert one string into another. This algorithm considers both the
order and the spelling of the characters in the strings, ensuring that it accommodates
minor variations in spelling or order. As a result, the Levenshtein algorithm is able
to accurately match short strings such as ingredients, even in cases where the strings
have minor spelling or order differences. Compared with other string comparison
algorithms, the Levenshtein algorithm yields accurate results.


lev(a, b) = |a|                                  if |b| = 0,
            |b|                                  if |a| = 0,
            lev(tail(a), tail(b))                if a[0] = b[0],
            1 + min{lev(tail(a), b), lev(a, tail(b)), lev(tail(a), tail(b))}   otherwise.

5.2 Pseudocode for the Levenshtein Distance Algorithm

function levenshteinSimilarity(s1, s2):
    dp = 2D array of size (s1.length() + 1) × (s2.length() + 1)
    for A = 0 to s1.length():
        dp[A][0] = A
    for B = 0 to s2.length():
        dp[0][B] = B
    for A = 1 to s1.length():
        for B = 1 to s2.length():
            cost = if s1[A − 1] == s2[B − 1] then 0 else 1
            dp[A][B] = min(dp[A − 1][B] + 1, dp[A][B − 1] + 1, dp[A − 1][B − 1] + cost)
    maxLen = max(s1.length(), s2.length())
    return 1.0 − (double)dp[s1.length()][s2.length()]/maxLen

Fig. 2 Database

Substance | Author | Year | Effect | Toxicity
Acetochlor | EFSA | 2008 | body weight | systemic
Acibenzolar | EFSA | 2014 | body weight | teratogenic
Aclonifen | EFSA | 2008 | body weight | systemic
Allura Red AC | EFSA ANS | 2009 | body weight | systemic
Amidosulfuron | EFSA | 2008 | body weight | systemic
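As an illustrative sketch (not the published Android implementation), the pseudocode translates directly into Python. The `find_toxic` helper and its sample ingredient names are hypothetical; they only show how a similarity threshold such as the 0.8 used by the app could be applied against database entries:

```python
def levenshtein_similarity(s1: str, s2: str) -> float:
    """Normalized Levenshtein similarity in [0, 1]; 1.0 means identical."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for a in range(m + 1):
        dp[a][0] = a                                  # delete a characters
    for b in range(n + 1):
        dp[0][b] = b                                  # insert b characters
    for a in range(1, m + 1):
        for b in range(1, n + 1):
            cost = 0 if s1[a - 1] == s2[b - 1] else 1
            dp[a][b] = min(dp[a - 1][b] + 1,          # deletion
                           dp[a][b - 1] + 1,          # insertion
                           dp[a - 1][b - 1] + cost)   # substitution
    max_len = max(m, n, 1)
    return 1.0 - dp[m][n] / max_len

def find_toxic(ingredients, toxic_db, threshold=0.8):
    """Hypothetical matching helper: return database entries whose
    similarity to any recognized ingredient reaches the threshold."""
    return [tox for tox in toxic_db
            if any(levenshtein_similarity(ing.lower(), tox.lower()) >= threshold
                   for ing in ingredients)]
```

For example, an OCR misread such as "Acetochlorr" still matches the database entry "Acetochlor" (similarity ≈ 0.91), which is why the normalized distance tolerates minor spelling errors.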

6 Module Description

The OCR-based ingredient scanning app has a simple user interface created using
the Android development framework with three components: two ImageViews and
a TextView. The Google ML Kit Vision Text Recognition API extracts text from
selected images, and the Firebase Realtime Database stores a list of known toxic
substances (Fig. 2).
If the similarity score calculated is above a threshold value (0.8 in this case), the
app considers the ingredient to be toxic and adds it to the list of toxic ingredients.

7 Results and Analysis

Our study demonstrated the effectiveness of the app in identifying harmful ingre-
dients and providing users with valuable information to make informed decisions
about the products they consume. The app has the potential to significantly reduce
the risk of exposure to harmful ingredients and promote healthier and safer consumer
habits.

Fig. 3 a Application icon. b Home screen. c Input image when scanning pancake syrup. d Harmful
ingredients list

In Fig. 3a, the user taps the application icon and is subsequently redirected to the
home screen depicted in Fig. 3b. Once an image is captured or selected from the
gallery and fed as input to the application (Fig. 3c), the app recognizes the text and
performs the Levenshtein distance algorithm to detect harmful ingredients. The
harmful ingredients are then displayed in the card view in Fig. 3d.

7.1 Performance Metrics

Table 2 shows the results of testing products for harmful-ingredient detection in
terms of accuracy, exactness, retrieval (recall), F1-measure, and speed. The products
tested include soaps, shampoos, facewashes, food items, and miscellaneous items.

Table 2 Performance evaluation

Products | Accuracy (%) | Exactness (%) | Retrieval (%) | F1-measure (%) | Speed (ms)
Biotique soap | 100 | 100 | 100 | 100 | 945
Meera powder | 100 | 100 | 100 | 100 | 1501
Ponds facewash | 100 | 83.33 | 100 | 90.91 | 1289
Imperial pancake syrup | 100 | 100 | 100 | 100 | 852
Gillette shaving cream | 100 | 100 | 100 | 100 | 747

Fig. 4 Accuracy

Figure 4 illustrates the accuracy of the tested products in detecting harmful ingre-
dients. Most products achieved a 100% accuracy rate, except for Lakme sunscreen,
which had lower accuracy due to its font impacting OCR recognition. Improving the
font could enhance Lakme’s accuracy and ratings.
Figure 5 focuses on the exactness scores, indicating how precisely the prod-
ucts detected harmful ingredients. The majority of the tested products showed high
exactness scores, ensuring accurate identification of potentially harmful components.
Nonetheless, Ponds product had a lower exactness score, implying that its detection
mechanism might have experienced some limitations or challenges.
Figure 6 highlights the retrieval scores of the tested products, which measure
how effectively they retrieved information about harmful ingredients. Most of the
products demonstrated good retrieval performance, successfully obtaining relevant
data. By optimizing the retrieval process, Lakme can enhance its ability to access vital
information regarding harmful ingredients, thus contributing to improved product
safety.
Figure 7 presents the F1-measure scores for the products, taking into account both
accuracy and recall. Again, most of the tested products displayed high F1-measure
scores, indicating excellent performance in detecting harmful ingredients. However,
both Ponds and Lakme sunscreens showed lower F1-measure scores. Addressing
this issue would be crucial in further enhancing the ability of Ponds and Lakme to
identify harmful ingredients effectively.
In Fig. 8, the speed of the tested products is evaluated, with the focus on how
quickly they performed the harmful ingredient detection process. Interestingly, all
the products exhibited lower speed scores. This suggests that there is room for
improvement in the efficiency of the detection process across all tested items.

Fig. 5 Exactness

Fig. 6 Retrieval

Fig. 7 F1-measure

Fig. 8 Speed

8 Conclusion

The proposed OCR-based application recognizes ingredients in food products and
matches them with a pre-built database of toxic substances. The application uses the
Google ML Kit Vision API and the Levenshtein distance algorithm to compare
ingredients accurately. The system was tested with various strings and images, with
promising results. The application can be useful in scenarios like grocery shopping,
cooking, and dietary restrictions. The use of AI in product development and safety
testing has significant benefits for consumer safety, allowing for timely identification
and response to safety issues. AI can also assist in the development of more effective
safety testing methodologies. Overall, AI integration has the potential to improve
consumer safety and lead to safer and more reliable products.

9 Future Scope

The project using OCR and string comparison to detect harmful ingredients has
potential for future development in terms of improving algorithms for ingredient
analysis using machine learning and natural language processing. The system could
also expand to cover a wider range of consumer products and integrate with existing
regulatory frameworks. There is significant potential for improving consumer safety
and promoting transparency in the products we use daily. Future work includes
expanding the database, dynamically updating it, improving the matching algorithm,
and integrating the system with other applications or services.

References

1. Çelik Ertuğrul D (2016) FoodWiki: a mobile app examines side effects of food additives via
semantic web. J Med Syst 40:41
2. Giménez-Arnau E (2019) Chemical compounds responsible for skin allergy to complex
mixtures: how to identify them? Cosmetics 6(4):71
3. Zulaikha R, Norkhadijah SI, Praveena SM (2015) Hazardous ingredients in cosmetics and
personal care products and health concern: a review. Public Health Res 5(1):7–15. https://doi.
org/10.5923/j.phr.20150501.02
4. Barnett J, Leftwich J, Muncer K, Grimshaw K, Shepherd R, Raats MM, Gowland MH, Lucas
JS (2011) How do peanut and nut-allergic consumers use information on the packaging to avoid
allergens? Allergy 66:969–978
5. Al-Saleh I, Al-Enazi S, Shinwari N (2009) Assessment of lead in cosmetic products. Regul
Toxicol Pharmacol 54(2):105–113
6. Mutha A, Patel K (2020) Food detection and recognition system. IJREAM 05(12):316–319
7. Ocay AB, Fernandez JM (2017) NutriTrack: Android-based food recognition app for nutri-
tion awareness. In: 2017 3rd IEEE international conference on computer and communications
(ICCC)
8. Mokdara T, Pusawiro P (2018) Personalized food recommendation using deep neural network.
In: 2018 seventh ICT international student project conference (ICT-ISPC)

9. Hamad K, Kaya M (2016) A detailed analysis of optical character recognition technology. Int
J Appl Math Electron Comput (Special Issue 1):244–249
10. Lund WB, Kennard DJ, Ringger EK (2013) Combining multiple thresholding binarization
values to improve OCR output. Presented in document recognition and retrieval XX conference
2013, California. SPIE, USA
11. Lázaro J, Martín J, Arias J (2010) Neuro semantic thresholding using OCR software for high
precision OCR applications. Image Vis Comput
12. Bhavani S, Thanushkodi K (2010) A survey on coding algorithms in medical image
compression. Int J Comput Sci Eng 2(5):1429–1434
Design and Implementation of an SPI
to I2C Bridge for Seamless
Communication and Interoperability
Between Devices

J. Harirajkumar, M. Santhosh, N. Sasirekha, and P. Vivek Karthick

Abstract The SPI to I2C bridge serves as a communication interface, enabling
smooth data exchange between SPI and I2C devices. Acting as an intermediary, this
bridge facilitates the conversion of data frames and protocol translations between
the two distinct serial communication protocols. The bridge consists of three key
components: an SPI master module, a protocol conversion unit (PCU), and an I2C
slave module. The SPI master module, typically implemented using a controller, acts
as the sender, while the I2C slave module represents the receiver, typically peripheral
devices. The PCU plays a crucial role in converting SPI signals and data formats into
their I2C counterparts, ensuring seamless compatibility and communication between
the devices. To validate the bridge’s performance and functionality, simulation and
synthesis are conducted using appropriate software tools like EDA Playground and
Verilog. By employing the SPI to I2C bridge, seamless communication and inter-
operability between SPI master devices and I2C slave devices are achieved. This
solution offers an efficient and versatile approach for integrating peripherals that
utilize different serial communication protocols.

Keywords SPI · I2C · SPI to I2C bridge

J. Harirajkumar · M. Santhosh (B) · N. Sasirekha · P. Vivek Karthick


Department of Electronics and Communication Engineering, Sona College of Technology, Salem,
Tamil Nadu 636005, India
e-mail: santhoshmano2000@gmail.com
J. Harirajkumar
e-mail: harirajkumar.j@sonatech.ac.in
N. Sasirekha
e-mail: sasirekha.n@sonatech.ac.in
P. Vivek Karthick
e-mail: vivekkarathick@sonatech.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 493
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_42
494 J. Harirajkumar et al.

1 Introduction

Two protocols, serial peripheral interface (SPI) and inter-integrated circuit (I2C),
are frequently used in serial communication. Both are widely adopted, and peripherals
from different manufacturers often use SPI or I2C for serial communication. In this
work, a protocol conversion unit (PCU) has been designed to connect an I2C slave
unit (often a peripheral device) with an SPI master device (generally a controller),
making data and communication transmission easier.
The simulation outcomes are shown for the "SPI to I2C" top-level design, which
consists of an I2C slave, a protocol conversion unit (PCU), and an SPI master.
The design was created using the EDA Playground programme, and Verilog is used
for both the test bench and the overall design.

2 SPI

Motorola first created SPI, a synchronous serial transfer protocol, in the mid-1980s.
It offers full-duplex communication over a four-wire interface. The SPI architecture
is based on a master–slave model with a single master device and multiple slave
devices. SPI supports data rates of up to 400 Mbps [1]. The four pins listed below
are present on every SPI device:
1. SCLK (Serial Clock): This pin carries the clock signal generated by the master.
2. MOSI: This pin is used by the master to transmit data to the slave device.
3. MISO: This pin is used by the master to receive data from the slave device.
4. SS (Slave Select): With this pin, the master can choose which specific slave
   device it wants to communicate with.
The combinations of clock polarity (CPOL) and clock phase (CPHA) dictate the
four data transfer modes that SPI allows. These modes specify the timing relationship
between the clock signal and the data, and are commonly expressed as CPOL/CPHA
pairs:
Mode 0 (CPOL = 0, CPHA = 0): The clock idles low. Data is sampled on the leading
(rising) clock edge and shifted out on the trailing (falling) edge.
Mode 1 (CPOL = 0, CPHA = 1): The clock idles low. Data is sampled on the trailing
(falling) clock edge and shifted out on the leading (rising) edge.
Mode 2 (CPOL = 1, CPHA = 0): The clock idles high. Data is sampled on the leading
(falling) clock edge and shifted out on the trailing (rising) edge.
Mode 3 (CPOL = 1, CPHA = 1): The clock idles high. Data is sampled on the trailing
(rising) clock edge and shifted out on the leading (falling) edge.
SPI also supports two bit orderings, MSB first and LSB first. To enable proper data
transfer, the communicating devices must agree on the same bit order. The clock
rate (SCLK) supplied by the master controls the speed of SPI communication. The
clock frequency specifies
Design and Implementation of an SPI to I2C Bridge for Seamless … 495

Fig. 1 SPI protocol diagram

the rate at which data is transferred. The maximum clock frequency depends on the
capabilities of the devices and is typically specified in their datasheets. SPI supports
communication with multiple slave devices using individual chip select (CS) lines
(Fig. 1).
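The mode-to-edge relationship described above can be captured in a small lookup, sketched here in Python for illustration (the mode numbering follows the common CPOL/CPHA convention):

```python
# SPI mode number -> (CPOL, CPHA), per the common Motorola convention.
SPI_MODES = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}

def sample_edge(mode: int) -> str:
    """Return the SCLK edge on which data is sampled for a given mode."""
    cpol, cpha = SPI_MODES[mode]
    leading = "rising" if cpol == 0 else "falling"   # first edge away from idle
    trailing = "falling" if cpol == 0 else "rising"  # edge back toward idle
    # CPHA = 0: sample on the leading edge; CPHA = 1: sample on the trailing edge.
    return leading if cpha == 0 else trailing
```

The shift edge is simply the opposite edge of the same clock pulse, which is why a master and slave configured for different modes cannot exchange data reliably.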
The master activates the specific slave’s chip select line to select it for commu-
nication. Each slave device has its own CS line, enabling selective communication
while others remain inactive. SPI does not include built-in error checking mecha-
nisms like checksums or acknowledgements. It assumes a reliable communication
channel between the master and slaves. If error detection and correction are required,
additional protocols or mechanisms can be implemented on top of SPI. In terms of
topology, SPI typically follows a bus configuration. The master device is connected
to multiple slave devices through shared data lines (MISO and MOSI) and a clock
line (SCLK). Each slave device has its dedicated chip select line (CS) to enable or
disable communication. SPI is commonly used in full-duplex mode, where data can
be simultaneously transmitted and received. However, it also supports half-duplex
mode, where the master and slave devices take turns transmitting and receiving data,
as well as simplex mode, where data flows in only one direction. Remember that
specific implementation details of SPI can vary between devices and manufacturers.
It’s recommended to consult the documentation and data sheets of the devices being
used for precise information about their SPI implementation (Fig. 2).
The Steps Outline the Communication Process in SPI
1. The clock signal (SCLK) is generated by the master and normally operates in
   the MHz band.

Fig. 2 SPI timing diagram

2. The slave select (SS) line is initially high. To start communication, the master
   must pull the selected slave device's SS line low.
3. With each SCLK clock pulse, the data bits held in the MOSI and MISO shift
   registers of the master and slave circuitry are shifted out and exchanged with
   one another [2].
4. When all of the data bits (normally eight bits) have been sent, the master pulls
   the slave's SS line high to signal the end of the transmission. The master now
   holds the data originally kept by the slave, whereas the slave now holds the data
   originally registered in the master [3].
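The shift-register exchange in steps 3 and 4 can be modelled in a few lines of Python (a behavioural sketch for illustration, not the hardware implementation):

```python
def spi_exchange(master_reg: int, slave_reg: int, bits: int = 8):
    """Model one SPI transfer: on each clock pulse the MSB of each shift
    register goes out (MOSI/MISO) and the incoming bit is shifted in at
    the LSB. After `bits` pulses the register contents have swapped."""
    mask = (1 << bits) - 1
    m, s = master_reg & mask, slave_reg & mask
    for _ in range(bits):
        mosi = (m >> (bits - 1)) & 1   # master's outgoing bit
        miso = (s >> (bits - 1)) & 1   # slave's outgoing bit
        m = ((m << 1) & mask) | miso   # master shifts in MISO
        s = ((s << 1) & mask) | mosi   # slave shifts in MOSI
    return m, s
```

After eight clock pulses, `spi_exchange(0xA5, 0x3C)` returns `(0x3C, 0xA5)`: the master holds the slave's original byte and vice versa, exactly as step 4 describes.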

I2C
The synchronous serial communication protocol I2C, also known as inter-integrated
circuit, was developed by Philips Semiconductor and first announced in 1982.
It uses a two-wire connection and can communicate in half-duplex [4]. The archi-
tecture of I2C follows a master–slave model. I2C is a serial communication protocol
that utilizes a bus topology, enabling multiple devices to be connected on the same
bus. Every device on the bus has an individual address, and the protocol supports
both master and slave roles, with communication initiated by the master. The I2C
protocol requires only two wires: SDA (serial data) and SCL (serial clock).
These lines are open-drain, allowing any device to pull them low, while pull-up
resistors ensure they remain in the high state when not actively driven low. In an I2C
system, the master device controls communication by initiating data transfers and
generating the clock signal. Slave devices respond to commands or requests from
the master. The flow of communication is determined by the master through start
and stop conditions, which mark the beginning and end of data transfers. To generate
these conditions, the master toggles the SDA line while the SCL line is held high.
Each I2C device connected to the bus has a distinct 7-bit or 10-bit address that makes
it easier to communicate with particular slave devices (Fig. 3).

Fig. 3 I2C protocol diagram

The 7-bit addressing scheme is commonly used, with the first seven bits of a
transmitted byte representing the device address. The direction of data transfer is
then indicated by a read/write bit. I2C offers three data transfer modes: standard
mode (up to 100 kbps), fast mode (up to 400 kbps), and high-speed mode (up to 3.4 Mbps).
These modes define the timing characteristics, such as clock frequencies and data
setup/hold times. After each byte of data transfer, both the master and slave devices
send an acknowledgement bit (ACK) on the SDA line. If a device receives a byte
correctly, it pulls the SDA line low during the ACK bit. If a device does not acknowl-
edge, it leaves the SDA line high (NACK), indicating an error or the end of data
transfer. In I2C, slave devices have the ability to stretch the clock signal by holding
the SCL line low if they require additional time to process data. This feature allows
slower devices to synchronize their communication speed with faster devices. In a
multi-master configuration, where multiple masters attempt to communicate simul-
taneously, arbitration takes place. Each master monitors the bus and compares the
level on the SDA line with the level it is driving. If a conflict is detected, the master
relinquishes control, allowing the winning master to continue communication. The
I2C protocol also includes support for a general call address (0x00), which enables a
master to broadcast a command or data to all devices on the bus simultaneously. I2C is
extensively used for interconnecting a wide range of devices in various applications,
including embedded systems, consumer electronics, and communication interfaces.
Its simplicity and flexibility make it suitable for connecting sensors, EEPROMs, LCD
displays, real-time clocks, and numerous other peripherals. I2C supports different
data rates, including 3.4 Mbps in high-speed mode, 100 kbps in standard mode, and
400 kbps in fast mode. The subsequent two pins are included in every I2C device:

Fig. 4 I2C data format

1. SCL: The master device generates the clock signal, typically operating in the
kHz range, on this line [5].
2. SDA: This line is responsible for carrying the serial data between the master and
slave devices.
In an I2C frame, there are typically 20 bits: the Start bit, 7 bits for the address, an
R/W (Read/Write) bit, an acknowledgement (ACK) bit, 8 bits of data, another ACK
bit, and then the Stop bit [6] (Fig. 4).
Steps Outline the Communication Process in I2C
1. Initially, the SCL and SDA signals are both high.
2. To start communication, SDA transitions from high to low, followed by SCL
   transitioning from high to low. This sequence represents the Start bit.
3. The clock signal is generated by the master on the SCL line, and every data
   transfer between the devices on the SDA line is timed to this clock [7].
4. The master transmits the address of the slave device it wants to talk to. An
   R/W bit is appended to the address to specify whether the operation being
   performed is a read or a write; the R/W bit is set to 1 for a read and 0 for a
   write [8]. Of all the slave devices connected to the master, only the slave device
   whose address matches sends an ACK signal on the SDA line; the other slave
   devices release the SCL and SDA lines.
5. The data is transmitted following the address. After the data transmission is
   finished, the SCL line changes from low to high, and then the SDA line shifts
   from low to high. This sequence represents the Stop bit, which denotes the end
   of communication [9].
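The 20-bit frame built by these steps can be laid out in a short Python sketch (an illustration only; the marker strings stand in for the Start/Stop conditions, which are line transitions rather than data bits):

```python
def i2c_write_frame(addr7: int, data: int):
    """Symbol sequence of a single-byte I2C write as seen on SDA:
    Start, 7 address bits (MSB first), R/W = 0, ACK, 8 data bits,
    ACK, Stop -- 20 symbols in total."""
    frame = ["START"]
    frame += [(addr7 >> i) & 1 for i in range(6, -1, -1)]  # 7-bit address
    frame += [0]                                           # R/W = 0 (write)
    frame += ["ACK"]                                       # slave acknowledges
    frame += [(data >> i) & 1 for i in range(7, -1, -1)]   # data byte
    frame += ["ACK"]                                       # slave acknowledges
    frame += ["STOP"]
    return frame
```

For instance, a write to the hypothetical address 0x50 sends the address bits 1010000 first, matching the MSB-first ordering described above.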

SPI to I2C bridge


An SPI to I2C bridge, also referred to as an SPI–I2C converter or translator, is a
device designed to facilitate communication between devices that utilize the SPI
and I2C protocols. Its purpose is to serve as an interface or bridge between SPI-
based devices and I2C-based devices, enabling them to communicate with each other
smoothly. The primary function of an SPI to I2C bridge is to convert the signals and
protocols employed by SPI and I2C, respectively. By doing so, it enables SPI devices
to seamlessly communicate with I2C devices, and vice versa. This capability proves
particularly useful in situations where there is a requirement to interface devices that
operate on different communication protocols within the same system.

Here Are Some Key Aspects and Features Associated with SPI to I2C bridges

Proposed Work
Designing an SPI to I2C bridge in Verilog is a complex task; a high-level outline of
the proposed work is given below. This is a simplified overview, and the actual
implementation may require additional steps and considerations depending on the
specific requirements.
Define Interface Parameters
Determine the clock frequency and data rates for both I2C and SPI interfaces.
Define the data widths for the input and output data buses.
I2C Slave Module
Implement an I2C slave module to receive data from the I2C master.
Incorporate the necessary logic to recognize start, stop, and data transmission
sequences.
SPI Slave Module
Implement an SPI slave module to receive data from the SPI master.
Include logic to handle various SPI modes and timing requirements.
Internal Data Handling
Create a buffer or FIFO to store the incoming data from both I2C and SPI interfaces.
Design logic to handle data flow and conversions between the two interfaces.
Address Mapping
Define address mapping for the bridge to enable reading and writing from specific
registers.
Implement logic to decode incoming addresses and route data accordingly.
State Machine
Design a state machine to manage the operation of the bridge.
This state machine will handle the transitions between I2C and SPI modes and
manage data flow.
Clock Domain Crossing
If the I2C and SPI interfaces operate on different clock domains, implement
appropriate clock domain crossing techniques.
Testing
Create a test bench to verify the functionality of the bridge.
Write test cases to cover various scenarios, including different data rates, read/
write operations, and error handling.

Simulation and Verification


Simulate the design using a Verilog simulator.
Perform functional verification and debugging to ensure correctness.
Synthesis and Implementation
Synthesize the Verilog code using a synthesis tool to obtain a gate-level netlist.
Perform place and route to implement the design on the target FPGA or ASIC.
Integration and System Testing
Integrate the bridge into the larger system or project where it will be used.
Perform system-level testing to verify the bridge’s functionality in the context of
the entire system.
Designing an I2C and SPI bridge involves various considerations, such as handling
clock synchronization, dealing with different data rates, and ensuring proper error
handling and data integrity. The actual implementation details will depend on the
specific requirements and the complexity of the system in which the bridge will be
integrated.
Protocol Conversion
The bridge acts as a translator, converting the SPI data signals and timing into their
equivalent I2C counterparts, and vice versa. This involves converting the clock, data,
and control signals of one protocol to match the corresponding signals of the other
protocol.
Master/Slave Configuration
The bridge can operate as either a master or slave for both the SPI and I2C protocols,
depending on the specific requirements of the system. In master mode, it can initiate
data transfers, while in slave mode, it responds to commands or requests from other
devices on the bus.
Address Mapping
Since SPI and I2C devices employ different addressing schemes, the bridge may
include address mapping or translation capabilities. This functionality allows SPI
devices to communicate with I2C devices by mapping the SPI device's address to
the corresponding I2C address, and vice versa.
Clock Synchronization
SPI and I2C utilize different clocking mechanisms. The bridge handles the synchronization
of clock signals between the two protocols, ensuring proper timing and
data transfer rates during communication.
Signal Level Shifting
SPI and I2C often operate at different voltage levels. To ensure compatibility between
devices with varying voltage requirements, the bridge may incorporate level shifting
circuitry. This enables devices operating at different voltage levels to communicate
effectively.
Configuration and Control
An SPI to I2C bridge typically provides configuration options and control registers
to customize its behaviour and settings. Users can set parameters such as clock
frequencies, addressing modes, and operational modes according to the specific
needs of the system (Fig. 5).

Fig. 5 SPI clock synchronization diagram

SPI to I2C bridges find applications in various scenarios, such as integrating
SPI sensors or peripherals into an I2C-based system, or vice versa. They offer a
convenient solution for interfacing and achieving interoperability between devices
that employ different communication protocols. Ultimately, these bridges enable
seamless communication and data exchange within a system, enhancing overall
functionality and flexibility.
The interface module allows the SPI master to communicate with and control the I2C slave [10]. The design includes both the SPI to I2C bridge, which enables communication between the SPI slave and the I2C bus master, and a direct connection between the SPI master and an SPI slave. The SPI to I2C bridge, created as a converter, sits between the SPI slave and the I2C master [11]. The bridge idles in the READY state until the SPI_busy and SPI_ready signals change. When SPI_ready goes high and SPI_busy goes low (signalling that the SPI slave is receiving data from the SPI master), the bridge accepts the data supplied by the SPI slave and moves to the SPI_RX state, at which point the SPI data is latched.
Next, the bridge checks the 25th bit of the received data, which is the I2C transaction enable bit. If this bit is high, the next state is the I2C state; otherwise, the READY state is restored. While in the I2C state, the bridge waits for the I2C data transfer to complete. After the I2C transfer, if the SPI is not in use, the I2C transaction is written to the slave transmit register. At that point, the bridge resets to the READY state (Fig. 6).
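The state sequence just described (READY, SPI_RX, I2C, and back to READY) can be sketched as a small software model in Python. The state and signal names follow the text; the function itself is an illustrative stand-in for the RTL, not the hardware description used in this work.

```python
READY, SPI_RX, I2C = "READY", "SPI_RX", "I2C"

# Bit 25 of the latched SPI word is the I2C transaction enable bit.
I2C_ENABLE_BIT = 25


def next_state(state, spi_ready, spi_busy, spi_data, i2c_done):
    """One step of the bridge state machine as described in the text."""
    if state == READY:
        # Data arrives from the SPI slave: latch it and move to SPI_RX.
        if spi_ready and not spi_busy:
            return SPI_RX
        return READY
    if state == SPI_RX:
        # Check the I2C transaction enable bit of the latched word.
        if (spi_data >> I2C_ENABLE_BIT) & 1:
            return I2C
        return READY
    if state == I2C:
        # Wait until the I2C transfer completes, then return to READY.
        return READY if i2c_done else I2C
    raise ValueError(f"unknown state {state!r}")
```

Walking the model with a word whose bit 25 is set reproduces the READY, SPI_RX, I2C, READY cycle from the text.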
502 J. Harirajkumar et al.

Fig. 6 Block diagram for SPI to I2C bridge

3 Results

With the help of EDA Playground, a top-level design known as “SPI to I2C” was constructed and simulated, and the simulation data were analyzed to verify the design.
The I2C slave (the recipient) is represented by an instance named I2C slave. Because the I2C slave is slower during the acknowledgement step, it must hold the I2C SCL line in order to synchronize and transmit the acknowledgement reliably; this clock stretching appears in the simulated waveform as a thickening of the SCL line. The I2C SDA line is also bi-directional, which produces a high-impedance state on the SDA line while the I2C master waits for confirmation from the I2C slave; the simulation waveform shows this behaviour as well. The outcome of the “SPI to I2C” top-level design simulation shows that the data supplied by the SPI master on the MOSI line arrives successfully on the I2C SDA line (Figs. 7, 8).
The serial peripheral interface (SPI) communication waveform is composed of
multiple clock cycles during which data is transferred between a master and a slave
device. It’s important to note that the specific timing and behaviour of the SPI wave-
form can vary based on the SPI mode and the devices involved in the commu-
nication. The clock frequency, clock polarity, and clock phase are configurable
parameters that need to be set consistently between the master and slave devices
for successful communication. The typical waveforms for writing data over the SPI bus and for reading data over the I2C (Inter-Integrated Circuit) bus illustrate the two protocols (Figs. 9, 10).
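The clock polarity and clock phase mentioned above combine into the four standard SPI modes (commonly written CPOL and CPHA). The helper below encodes that standard convention; it is general SPI background, not code from the design in this paper.

```python
def spi_mode(cpol: int, cpha: int) -> int:
    """Return the standard SPI mode number (0-3) for a CPOL/CPHA pair.

    CPOL selects the idle level of the clock (0 = idle low, 1 = idle high);
    CPHA selects the sampling edge (0 = first edge, 1 = second edge).
    Master and slave must agree on both for successful communication.
    """
    if cpol not in (0, 1) or cpha not in (0, 1):
        raise ValueError("CPOL and CPHA must each be 0 or 1")
    return (cpol << 1) | cpha
```

For example, mode 0 (CPOL = 0, CPHA = 0) idles the clock low and samples data on the first clock edge.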

Fig. 7 Output for SPI master

Fig. 8 Output for I2C slave

Fig. 9 Waveform for SPI write data



Fig. 10 Waveform for I2C read data

4 Conclusion

Understanding the differences between SPI and I2C, including their wave forms and
usage scenarios, is essential when selecting the appropriate protocol for a specific
application. Both SPI and I2C have their own advantages and are suitable for different
applications. SPI is generally faster and more suitable for short-distance communi-
cations, while I2C is preferred for slower-speed communication and for connecting
multiple devices on the same bus.
I2C is a multi-master, half-duplex, synchronous protocol
that uses a shared bus, accommodating multiple devices. It offers a standardized
addressing scheme, enabling straightforward scalability in larger systems. With I2C,
fewer pins are required for communication, although it operates at slower data rates
compared with SPI. It finds common usage in interconnecting sensors, low-speed
peripherals, and integrated circuits. However, the I2C protocol may face limitations
regarding signal integrity and communication reliability due to constraints such as
maximum capacitance and bus length.
Overall, both the SPI and the I2C protocols have advantages and disadvantages,
and the choice of protocol should be made depending on the particular needs and
limitations of the task at hand.

References

1. Kiran V, Nagraj V (2013) Design of SPI to I2C protocol converter and implementation of low-power techniques. Int J Adv Res Comput Commun Eng 2
2. Singh JK, Tiwari M, Sharma V (2012) Design and implementation of I2C master controller on
FPGA using VHDL. IJET 4(4)
3. Patnaik E, Bandewar S (2017) Design and realization for interfacing two integrated device
using I2C bus—a review
4. Trujilho L, Saotome O, Öberg J (2022) Dependable I2C communication with FPGA. In:
Proceedings of the 7th Brazilian technology symposium (BTSym’21) emerging trends in
systems engineering mathematics and physical sciences, vol 2. Springer
5. Warrier AS et al (2013) FPGA implementation of SPI to I2C bridge. Int J Eng Res Technol
(IJERT) 2(11)

6. Trivedi D et al (2018) SPI to I2C protocol conversion using Verilog. In: 2018 fourth international
conference on computing communication control and automation (ICCUBEA). IEEE
7. Levshun D, Chechulin A, Kotenko I (2018) A technique for design of secure data transfer
environment: application for I2C protocol. In: 2018 IEEE industrial cyber-physical systems
(ICPS). IEEE
8. Sahu A, Mishra RS, Gour P (2011) An implementation of I2C using VHDL for DATA
surveillance. Int J Comput Sci Eng (IJCSE) 3(5):1857–1865
9. Sajjanar S, Prabhakar MD (2020) A review on BIST embedded I2C and SPI using Verilog
HDL
10. Oudjida AK et al (2009) FPGA implementation of I2C & SPI protocols: a comparative study.
In: 2009 16th IEEE international conference on electronics, circuits and systems (ICECS 2009).
IEEE
11. Sabeenian D, Harirajkumar J, Akshaya B. Review paper of multipliers-driven perturbation of coefficients for low power operation in reconfigurable FIR filter. Turk J Physiother Rehabil 32(2)
Greenhouse Automation System Using
ESP32 and ThingSpeak for Temperature,
Humidity, and Light Control

Rajesh Singh, Ahmed Hussain, Laith H. A. Fezaa, Ganesh Gupta,


Anil Pratap Singh, and Ayush Dogra

Abstract There has been a lot of excitement around greenhouse automation systems
and their potential to improve growing conditions and yield. This project describes
how to build a Greenhouse Automation System using an ESP32 WiFi Board, an
AM2301 humidity and temperature sensor, an Actuator Relay Module for light
control, and the addition of ThingSpeak for data analysis. The key functions of a
Greenhouse Automation System are the continuous monitoring and adjustment of
environmental conditions such as temperature, humidity, and lighting. It is beneficial
to plant health and yield when these elements are assessed and regulated accurately.
With the system’s automated lighting based on user-defined schedules or prede-
fined criteria, plant growth and energy use may be optimized. Connectivity with
ThingSpeak, an IoT analytics platform, further streamlines the solution for data
visualization, storage, and analysis. ThingSpeak uses a cloud-based architecture to
store and interpret data in real time from sensors. Additionally, the solution for data

R. Singh (B)
Department of ECE, Uttaranchal Institute of Technology, Uttaranchal University,
Dehradun 248007, India
e-mail: drrajeshsingh004@gmail.com
A. Hussain
Medical Technical College, Al-Farahidi University, Baghdad, Iraq
e-mail: ahmeds909091@gmail.com
L. H. A. Fezaa
Department of Optical Techniques, Al-Zahrawi University College, Karbala, Iraq
e-mail: laithfizaa@gmail.com
G. Gupta
Department of CSE, Sharda University, Greater Noida, India
e-mail: ganeshgupta81@gmail.com
A. P. Singh
Department of CSE, IES Institute of Technology and Management, IES University, Bhopal, India
e-mail: research@iesbpl.ac.in
A. Dogra
Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
e-mail: ayush123456789@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 507
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_43

visualization, storage, and analysis is further simplified. Users can thus gain valuable insight into greenhouse conditions and make informed decisions and adjustments.
a complete solution for monitoring and controlling the greenhouse’s temperature,
humidity, and light levels using an ESP32 WiFi Board, an AM2301 humidity and
temperature sensor, an Actuator Relay Module, and the ThingSpeak platform. The
technology creates ideal growth conditions by automatically regulating temperature,
humidity, and light. This helps to improve agricultural productivity and sustainability.

Keywords ESP32 · ThingSpeak · Greenhouse monitoring · Real-time


monitoring · Research · Technology

1 Introduction

In the present IoT era, when machine-to-machine communication is expanding, finding the right protocol for developing systems such as greenhouses is necessary. Agricultural greenhouses are controlled environments where critical variables like lighting,
temperature, moisture, and soil pH level may be tracked using sensor systems that
follow IoT standards [1]. The design and deployment of a sensor network that is
wireless for tracking and managing greenhouse settings are discussed in the cited
paper. The technology seeks to increase performance and productivity by tracking
and managing greenhouse conditions. A smart greenhouse monitoring system is created to address the issues farmers confront. This device is built around Arduino and ATmega328 microcontrollers and contains a pump, a 12 V DC fan, an LCD, and several sensors, including light (LDR), temperature, humidity, and soil moisture sensors [2, 3]. Because they cannot manage crucial elements essential to the growth and
production of plants, farmers frequently struggle to make a profit from their crops.
The greenhouse must be kept above a specified temperature, and crop humidity must
be controlled. Monitoring the pH of the soil is also essential for optimum development
[4].
An IoT-based system is created to monitor and regulate many factors in the farming
field, such as moisture, water level, and temperature. Efficient resource management
is essential. The system uses a wireless sensor network to collect data, which gener-
ates a lot of data that needs a lot of storage space and slows down network trans-
mission from slave to control nodes [5]. A real-time embedded system employing
a WSN powered by ZigBee technology is created to regulate and monitor greenhouse settings. Due to its long communication range, a WSN is regarded as the best choice for greenhouses. The goal of this system is to give control over a variety of
environmental factors, such as the moisture in the soil, and humidity, to establish the
optimal climate for plant development [6–8].
Greenhouse Automation System Using ESP32 and ThingSpeak … 509

2 Background Details and Related Work

The presented approach optimizes water use by considering the water requirements of
plants rather than depending on presumptions made by cultivators. It contains both
static information—such as the kind of plant and soil—and dynamic information
gathered through sensors. Another similar study concentrates on two elements. The
monitoring of greenhouse conditions will first be automated [9–11]. A mobile app
makes it possible to inform users of the current state via their mobile phones, making
monitoring and maintenance easier [12, 13]. The Internet of Things (IoT) is used to
transport the gathered data to the cloud for storage [14, 15]. Plants are vulnerable
to the negative impacts of temperature changes, which can kill them within a few hours. Global connectivity and administration of sensors, devices,
and users have been made possible by the rise of IoT. Various parameter measures,
such as moisture, temperature, and humidity, are needed for efficient monitoring
and management in contemporary greenhouses. The issue under discussion suggests
using the Internet of Things to automate greenhouse conditions. Several sensors, together with the Netduino 3 platform, are used to measure humidity, temperature, sunlight, and soil moisture. The objective is to increase output rates while reducing the burden on farmers
[16, 17].
Current greenhouse systems frequently rely on manual monitoring, which requires repeated visits and burdens workers. Improper temperature and humidity regulation may reduce yields. The idea of greenhouse automation has evolved
as a solution to these problems. The greenhouse atmosphere may be autonomously
managed and monitored by fusing IoT and embedded devices, negating the requirement for continuous farmer interaction [17]. Another work outlines a study aimed at creating a productive nursery plant-growing method [18]. The objective is to maintain the
ideal temperature, humidity, moisture, and light levels for plant growth. This is
accomplished by monitoring the greenhouse’s environment and controlling settings
following the requirements of each crop using a low-cost, configurable ESP8266
device [18]. It focuses on the necessity for standalone and modular devices while
talking about the positioning of IoT devices close to crops in greenhouses. The
employment of AI and IoT technology allows for environmental monitoring and
management in a greenhouse setting since the acquired data is communicated through
networks. Growing tomatoes and other vegetables is the goal of this initiative [19].
Finally, a key study analyzes previous studies on greenhouse systems; its major goal is to uncover research gaps and to highlight the essential components, advantages, and difficulties of greenhouse systems [20].

3 Block Diagram and Problem Formulation

The block diagram of the Greenhouse Automation System showcases the intercon-
nected components and their respective functionalities (Fig. 1). At the core of the
system is the customized ESP32 WiFi Board, serving as the central control unit
responsible for communication, data processing, and control actions. Acting as a hub,
it connects to various sensors and actuators, enabling seamless interaction among
them. One essential component in the system is the AM2301 humidity and temper-
ature sensor. This sensor accurately measures the humidity and temperature levels
within the greenhouse, providing vital data for monitoring and maintaining optimal
environmental conditions. Capturing these parameters ensures that the greenhouse
remains conducive to plant growth. Another integral part is the Actuator Relay
Module, which plays a crucial role in controlling the lighting system. The relay
module receives switching signals from the ESP32 WiFi board which processes user
inputs and user-preset thresholds, subsequently saving energy and optimizing light
delivery based on plant requirements. The electrical energy is supplied from a reliable
battery pack that is capable of powering all the components stably without causing
malfunction.
The ThingSpeak API is integrated into the system at the firmware level.
Equipped with internet connectivity and 4 MB storage, the ESP32 can effectively
store, transmit, and display the data collected over a webpage. The WPA2 protocol
ensures secure data transmission to the internet and the API secret key ensures secure
data passage to the ThingSpeak cloud infrastructure, allowing users to access and
utilize the features provided while keeping their secure data channel intact.

Fig. 1 Block diagram of the greenhouse automation mote

The tools provided can be used to analyze the greenhouse data and provide some much-required
insights into the optimum environmental conditions as per the custom requirements
of the floral species planted.
The information flow and the control signals are channeled as per the illustrated
block diagram. The ESP32 collects temperature and humidity data from the AM2301
sensor and directs the relay module accordingly, triggering the lighting system. In
summary, the block diagram demonstrates the integrated nature of the Greenhouse
Automation System. The components work harmoniously, with the ESP32 WiFi
Board as the central control unit, facilitating data acquisition, control actions, and
communication. By leveraging the AM2301 sensor, Actuator Relay Module, reliable
power supply, and ThingSpeak integration, the system achieves optimal environ-
mental conditions within the greenhouse, promoting efficient and sustainable plant
growth.

4 Hardware Development

The hardware components for the Greenhouse Automation Mote are as follows
(Fig. 2):

1. The ESP32 board is the main control system of the whole project. The Vin pin of the ESP32 board has been connected to the 5 V DC power supply and its ground to the common ground pin of the external power supply.

Fig. 2 Connection diagram of greenhouse automation mote



2. To monitor the ambient humidity and temperature, the AM2301 sensor has
been connected to the ESP32 board. The voltage pin of the AM2301 has been
connected to the voltage pin of the power supply and ground to the common
ground. The data pin has been connected to the GPIO19 pin of the ESP32 board.
3. The single-channel relay module has been connected to the system so that it can
control the light as per the instruction from the ESP32 board. So, the voltage
pin of the single-channel relay has been connected to the voltage pin of the
power supply and ground to the ground. The signal pin has been connected to
the GPIO16 pin of the ESP32 board.
4. The common DC power supply has been attached to this project so that it can
run properly in the environment.

5 Implementation of the System

The algorithm for the Greenhouse Automation System adopts a systematic approach
to closely monitor and control the greenhouse environment, aiming to create optimal
growing conditions. The process begins with the initialization of the ESP32 WiFi
Board, where essential communication parameters are configured. By establishing
a seamless connection to the AM2301 humidity and temperature sensor, the system
continuously receives real-time data regarding temperature and humidity levels
within the greenhouse. The hardware setup for the Greenhouse Automation Mote
can be seen in Fig. 3.
Once the sensor data is obtained, the algorithm proceeds to analyze it by applying
predefined thresholds or user-defined rules. This analysis serves to evaluate whether
any adjustments are necessary to maintain the desired environmental conditions.
Based on the analysis results, the algorithm sends control signals to the Actuator
Relay Module, which regulates the lighting system accordingly. By activating or
deactivating the lights, the system ensures that the appropriate light levels are main-
tained to support plant growth at different stages. To enable seamless data transmis-
sion and storage, the algorithm establishes a secure connection between the ESP32
WiFi Board and the ThingSpeak platform. The collected sensor data is formatted
appropriately and transmitted to ThingSpeak’s cloud infrastructure through this
connection. ThingSpeak’s powerful data visualization and analysis tools are then
utilized to gain valuable insights into greenhouse conditions.
Adjustments or optimizations for improved plant growth can be made with confi-
dence after careful observation of past trends, recognition of patterns, and exami-
nation of the data. Greenhouse conditions may be monitored and adjusted in real
time because the algorithm runs in a continuous feedback loop. The system guar-
antees consistent optimal growing conditions by continuously reading sensor data,
analyzing it, and taking relevant management measures. By implementing this plan,
the system will be able to maximize agricultural productivity and operate reliably
regardless of external factors. ThingSpeak server dashboards provide real-time access
to the data anywhere in the world (Fig. 4).

Fig. 3 Greenhouse automation mote

Overall, the Greenhouse Automation System's algorithm demonstrates a comprehensive and iterative process that enables effective monitoring and control of temperature, humidity, and lighting conditions.
By leveraging the capabilities of the ESP32 WiFi Board, AM2301 humidity and
temperature sensor, Actuator Relay Module, and integration with ThingSpeak, the
resulting output is an automated system capable of monitoring plant growth and
improving the control parameters of crop growth.
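The feedback loop described in this section can be summarized in a plain-Python sketch. Hardware access is stubbed out (a real deployment would read the AM2301 on GPIO19 and drive the relay on GPIO16, as wired in Sect. 4), the threshold value is illustrative rather than taken from this work, and the update URL follows ThingSpeak's public REST "write" API with a placeholder API key.

```python
from urllib.parse import urlencode

# Pin assignments from the hardware section (for reference in a real port).
AM2301_DATA_PIN = 19   # GPIO19: AM2301 humidity/temperature sensor
RELAY_SIGNAL_PIN = 16  # GPIO16: single-channel relay for the light

# Illustrative threshold; real values depend on the crop.
LIGHT_ON_BELOW_TEMP_C = 18.0

THINGSPEAK_WRITE_KEY = "YOUR_API_KEY"  # placeholder, not a real key


def decide_light(temperature_c: float) -> bool:
    """Control action: turn the light on when the greenhouse is too cold."""
    return temperature_c < LIGHT_ON_BELOW_TEMP_C


def thingspeak_update_url(temperature_c: float, humidity_pct: float) -> str:
    """Build the ThingSpeak REST update URL for the two sensor fields."""
    query = urlencode({
        "api_key": THINGSPEAK_WRITE_KEY,
        "field1": temperature_c,
        "field2": humidity_pct,
    })
    return f"https://api.thingspeak.com/update?{query}"


def control_step(read_sensor):
    """One iteration of the feedback loop: read, decide, report."""
    temperature_c, humidity_pct = read_sensor()
    light_on = decide_light(temperature_c)
    url = thingspeak_update_url(temperature_c, humidity_pct)
    return light_on, url


# Simulated sensor reading standing in for the AM2301.
light_on, url = control_step(lambda: (16.5, 72.0))
```

On the actual ESP32, `control_step` would run inside the firmware's main loop, with the lambda replaced by a driver read of the AM2301 and `light_on` written to the relay pin before the URL is fetched.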

6 Conclusions

The Greenhouse Automation System built from the ESP32 WiFi chip, the AM2301
temperature–humidity sensor, and an actuator relay set has proven highly effective
in optimizing and regulating isolated enclosures for testing out optimal conditions
for the growth of crops. With the hardware portion handled by the ESP32 as the
processing and communication unit, the AM2301 as the sensor module, and the
relay as the output module, the system, coupled with the ThingSpeak cloud, enables
real-time monitoring and control of key environmental parameters. The removal of
configuration jargon ensures that the plant receives the optimum temperature light
and humidity required to thrive and grow. Other than displaying the sensor readings,
the system also offers a detailed perspective of the greenhouse conditions with the integration of data analysis scripts and graphical visualization tools.

Fig. 4 Live data in the Cloud server of ThingSpeak channel web dashboard

The gathered
data can be fed into an AI model to recognize critical patterns and events and the
system can be programmed to respond to such stimuli with suitable adjustments as
per plant requirements. Overall, the Greenhouse Automation System presented in this
study utilizes IoT in a way that enhances the quality of agro-practices with minimal
effort from the user’s side. Coupled with a compact and rugged form factor equipped
with the sensors and actuators required, the system offers a minimalistic, efficient,
cost-effective, and sustainable solution for greenhouse management. To add to that,
the modularity offered by this system helps futureproof it and allows for embedding
newer expansions and technologies, making it open to further development efforts.

References

1. Singh TA, Chandra J (2018) IOT based green house monitoring system. J Comput Sci
14(5):639–644. https://doi.org/10.3844/jcssp.2018.639.644
2. Pidikiti T et al (2021) Wireless Green house monitoring system using Raspberry PI. Turkish J
Comput Math Educ (TURCOMAT) 12(2). https://doi.org/10.17762/turcomat.v12i2.1894
3. Visvesvaran C, Kamalakannan S, Kumar KN, Sundaram KM, Vasan SMSS, Jafrrin S (2021)
Smart Greenhouse monitoring system using wireless sensor networks. In: 2021 2nd interna-
tional conference on smart electronics and communication (ICOSEC), Oct 2021, Published.
https://doi.org/10.1109/icosec51865.2021.9591680

4. Sushmitha M, Jyothi V, Sain AV (2022) Implementation of Greenhouse control and monitoring system using ESP32. In: 2022 IEEE international conference on distributed computing and
electrical circuits and electronics (ICDCECE), Apr 2022, Published. https://doi.org/10.1109/
icdcece53908.2022.9792852
5. Mete P (2019) IoT based Green house monitoring using data compressive sensing technique
in WSN. HELIX 9(3):5036–5041. https://doi.org/10.29042/2019-5036-5041
6. Arora G, Malik K, Singh I, Arora S, Rana V (2011) Formulation and evaluation of controlled
release matrix mucoadhesive tablets of domperidone using Salvia plebeian gum. J Adv Pharm
Technol Res 2(3):163
7. Mantri A, Dutt S, Gupta JP, Chitkara M (2008) Design and evaluation of a PBL-based course
in analog electronics. IEEE Trans Educ 51(4):432–438
8. Koundal D, Gupta S, Singh S (2018) Computer aided thyroid nodule detection system using
medical ultrasound images. Biomed Signal Process Control 40:117–130
9. Shekhar S, Garg H, Agrawal R, Shivani S, Sharma B (2023) Hatred and trolling detection
transliteration framework using hierarchical LSTM in code-mixed social media text. Complex
Intell Syst 9(3):2813–2826
10. Rani L, Kaushal J, Srivastav AL, Mahajan P (2020) A critical review on recent developments
in MOF adsorbents for the elimination of toxic heavy metals from aqueous solutions. Environ
Sci Pollut Res 27:44771–44796
11. Tiwari S, Kumar S, Guleria K (2020) Outbreak trends of coronavirus disease–2019 in India: a
prediction. Disaster Med Public Health Prep 14(5):e33–e38
12. Kareem OS, Qaqos NN (2019) Real-time implementation of greenhouse monitoring system
based on wireless sensor network. Int J Recent Technol Eng 8(2S2):215–219. https://doi.org/
10.35940/ijrte.b1039.0782s219
13. Tingre S (2022) WSN based green house monitoring system. Int J Res Appl Sci Eng Technol
10(7):3065–3067. https://doi.org/10.22214/ijraset.2022.45635
14. Nargotra M, Khurjekar MJ (2020) Green house based on IoT and AI for societal benefit. In:
2020 international conference on emerging smart computing and informatics (ESCI), Mar
2020, Published. https://doi.org/10.1109/esci48226.2020.9167637
15. Anjini R, Jenifer J, Blessy AMC (2021) IoT based automated hydroponics greenhouse
monitoring. Int J Adv Res Sci Commun Technol 671–681. https://doi.org/10.48175/ijarsct-960
16. Satpute R (2018) IOT based greenhouse monitoring system. Int J Res Appl Sci Eng Technol
6(4):2084–2085. https://doi.org/10.22214/ijraset.2018.4354
17. Raj JS, Ananthi JV (2019) Automation using IoT in greenhouse environment. J Inf Technol
Digital World 01(01):38–47. https://doi.org/10.36548/jitdw.2019.1.005
18. Kalaivani CT, Reddy GGSS, Reddy DJ, Rajasekhar G (2022) Environmental monitoring and
control system for Greenhouse with node MCU and GSM using IoT devices. In: 2022 8th
international conference on smart structures and systems (ICSSS), Apr 2022, Published. https://
doi.org/10.1109/icsss54381.2022.9782164
19. Nishida Y, Mchizuki R, Yasukawa S, Ishii K (2022) Exercise on environmental monitoring and
control of Greenhouse by IoT devices toward smart agriculture. Proc Int Conf Artif Life Rob
27:369–373. https://doi.org/10.5954/icarob.2022.os28-5
20. Rokade A, Singh M (2021) Analysis of precise Greenhouse management system using machine
learning based Internet of Things (IoT) for smart farming. In: 2021 2nd international conference
on smart electronics and communication (ICOSEC), Oct 2021, Published. https://doi.org/10.
1109/icosec51865.2021.9591962
Performance Analysis of Terrain
Classifiers Using Different Packages

Bobbinpreet Kaur, Arpan Garg, Haider Alchilibi, Laith H. A. Fezaa,


Rupinderjit Kaur, and Bhawna Goyal

Abstract The 20-newsgroup dataset includes about 20,000 articles that have been
systematically classified into 20 newsgroups. The 20-newsgroups collection has become a common source of data for evaluating the performance of machine learning techniques on language tasks, such as language detection and word clustering. Each record in the dataset is a text file (a series of words). The data
is trained over different classifiers such as Support Vector Machines, Naïve Bayes
Model, Random Forests, Stochastic Gradient Descent, and Logistic Regression. We
have used Python Jupyter notebook for code execution and packages from libraries
like scikit learn (Sklearn), Natural Language Toolkit (NLTK), regular expression
(RE), Language Identification (LangID), etc., to test the model parameters and accu-
racy measurements. By tuning the parameters and relevant initialization, we aim to
test the outcomes to find out the best model with maximum accuracy.

Keywords SVM · Sklearn · NLTK · RE · LangID

B. Kaur (B)
Chandigarh University, Gharuan, Mohali, India
e-mail: bobbinece@gmail.com
A. Garg · B. Goyal
Department of ECE, Chandigarh University, Gharuan, Mohali, India
e-mail: arpan04garg@gmail.com
B. Goyal
e-mail: bhawnagoyal28@gmail.com
H. Alchilibi
Medical Technical College, Al-Farahidi University, Baghdad, Iraq
e-mail: haydershrf@gmail.com
L. H. A. Fezaa
Department of Optical Techniques, Al-Zahrawi University College, Karbala, Iraq
e-mail: laithfizaa@gmail.com
R. Kaur
Department of Computer Science and Engineering, Gulzar Group of Institutions, Khanna,
Punjab 141401, India
e-mail: reetkaurnagra@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 517
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_44

1 Introduction

Knowledge-based technologies are widely utilized to simplify different production processes. Examples include professional decision-support systems, parallel
output smart scheduling systems, and fuzzy controllers. One challenge is collecting the expert knowledge needed to build such knowledge-based programs. Machine learning technologies can help simplify the time-consuming task of acquiring the information necessary to create such a program [1]. Automation accelerates this process and reduces production costs by reducing the time required from specialists and professional engineers. Automation may also reveal
information that those engaged in the information-acquisition process would other-
wise miss. The machine learning area is concerned with allowing computer programs
to automatically improve their performance through experience at these tasks. The field is
strongly related to pattern analysis and statistical inference. Most work in machine
learning has centered on classification, the task of creating a model, from a collec-
tion of examples previously categorized, that can correctly categorize new examples
from the same population [2]. The classification has a wide array of uses, including
engineering, computing, marketing, and science research. Many engineering issues
come into the classification framework, where specialists in the industrial sector are
asked to assign an object to a class name, or a condition depending on the particular
values of a series of criteria.
Supervised learning, used in both machine learning and statistics, is a technique in which a program acquires knowledge by studying the data fed into the system and then uses what it has learned to classify new observations.
Among the numerous machine learning methods developed for classification, the most widely used in real-world applications may be inductive learning from instances. Inductive learning strategies are fast compared with other strategies. Another benefit is that inductive learning methods are simple, so their generated models are easy to understand. Finally, inductive classifiers achieve comparable, and often greater, accuracies than other classification strategies. The data input
we have here is in the form of text and we will train this data over various classification
models. The accuracy of a machine learning model is a measurement used to decide which model is best at identifying relations and patterns between the variables in a dataset, based on the input data [3]. The various steps used
by us in modeling the text data are as follows.

2 Data Exploration and Cleaning

2.1 Variable Identification

The data has 1707 columns, and the target variable has 20 categories.
Performance Analysis of Terrain Classifiers Using Different Packages 519

2.2 Outlier Treatment

Outliers were excluded by setting min_df = 0.01 and max_df = 0.85, dropping terms
that appear in fewer than 1% or in more than 85% of the documents.

2.3 Removal of Non-ASCII Characters

We defined a function strip_non_ascii that returns the string with all non-ASCII
characters removed.
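The paper does not reproduce the function body, so the following is one plausible implementation of strip_non_ascii:

```python
def strip_non_ascii(text):
    # Keep only characters in the 7-bit ASCII range (code points 0-127).
    return "".join(ch for ch in text if ord(ch) < 128)

print(strip_non_ascii("café terrain"))
```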

2.4 Corpus Cleaning

We removed unwanted hyperlinks, punctuation, numeric tokens, words shorter than
two characters, and @mentions by defining a remove_features function, and converted
the data to lowercase [4].
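A sketch of such a remove_features function; the paper does not list its exact patterns, so the regular expressions below are illustrative:

```python
import re

def remove_features(text):
    text = text.lower()                        # lowercase the corpus
    text = re.sub(r"https?://\S+", " ", text)  # hyperlinks
    text = re.sub(r"@\w+", " ", text)          # @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # punctuation and numeric tokens
    text = re.sub(r"\b[a-z]\b", " ", text)     # words shorter than two characters
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace

print(remove_features("Check http://example.com @user 42 a muddy trail!!"))
```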

2.5 Lemmatization

Lemmatization refers to reducing words correctly using a vocabulary and
morphological analysis, typically with the goal of removing inflectional endings
and returning a word's lemma, or dictionary form. To perform this cleaning step,
we defined a "lemmatize" function and applied it to our training and test data.
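The paper's lemmatize function is not reproduced; in practice it would wrap a vocabulary-backed lemmatizer such as NLTK's WordNetLemmatizer. The self-contained stand-in below applies a few suffix rules only to show where the step fits in the pipeline:

```python
# Ordered suffix rules; a real lemmatizer consults a dictionary instead.
SUFFIX_RULES = [("ies", "y"), ("sses", "ss"), ("ing", ""), ("ed", ""), ("s", "")]

def lemmatize(word):
    for suffix, replacement in SUFFIX_RULES:
        # Require a stem of at least three characters before stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

def lemmatize_doc(doc):
    return " ".join(lemmatize(w) for w in doc.split())

print(lemmatize_doc("rocks crossing crossed gullies"))  # -> rock cross cross gully
```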

3 Feature Extraction/Data Transformation

Feature engineering is the art of drawing more information out of the available data.
In data modeling, transformation refers to replacing a variable by a suitable function
of it. We first created a TF–IDF vectorizer that ignores all English stop words and
removes the outliers described above, and we fit it to our training and test data. Using
a Count Vectorizer, we created the document-term matrix for the training and test
data. These steps numerically encode the text corpus, with one column per term
extracted from the documents [5]. With TF–IDF, the feature values are normalized
with respect to the number of documents containing a term, while the Count
Vectorizer simply stores a count of every token appearing in the documents.
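The contrast between the two encodings can be seen on a two-document toy corpus (invented, not the paper's data):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["mud mud rocks", "rocks and sand"]

counts = CountVectorizer().fit_transform(corpus)
tfidf = TfidfVectorizer().fit_transform(corpus)

# Count Vectorizer stores raw token counts per document...
print(counts.toarray())
# ...while each TF-IDF row is length-normalized and down-weights
# terms that occur in many documents (here, "rocks").
print(np.round(tfidf.toarray(), 2))
```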
520 B. Kaur et al.

4 Data Modeling

4.1 Multinomial Naïve Bayes (NB) Model

The Multinomial NB model is a classification method based on Bayes' theorem with
an independence assumption between the predictors: the Naïve Bayes classifier
assumes that the presence of a feature in a class is unrelated to any other feature in
the class. Multinomial Naive Bayes additionally assumes that each p(fi), the
probability of observing feature fi given a class c, follows a multinomial distribution.
It fits best with data that can readily be converted into counts, such as word counts
in text [6]. In our code, we imported MultinomialNB from sklearn.naive_bayes and
initialized alpha = 0.1. We then evaluated the Multinomial NB model by computing
performance metrics on the training and test data. The results obtained are shown
under TF/IDF in Fig. 1 and under Count Vectorizer in Fig. 2.
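A minimal sketch of this step; the four-document corpus and its two labels are invented stand-ins for the real terrain data and its 20 categories:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["loose sand dunes", "soft sand slope", "hard rock face", "sharp rock ridge"]
labels = ["sand", "sand", "rock", "rock"]

# alpha=0.1 is the smoothing value used in the paper.
model = make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=0.1))
model.fit(docs, labels)
print(model.predict(["hard rock ridge"]))
```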
The performance metrics calculated for each category of the target variable under
TF/IDF are shown in Fig. 3.

[Bar chart: Training F-Score 97.93, Recall 97.78, Precision 98.28, Accuracy 98.18; Testing F-Score 81.32, Recall 81.36, Precision 83.75, Accuracy 82.58]

Fig. 1 Graph of Naïve Bayes using TF/IDF

[Bar chart: Training F-Score 84.98, Recall 84.99, Precision 85.51, Accuracy 85.07; Testing F-Score 69.07, Recall 69.38, Precision 70.27, Accuracy 70.06]

Fig. 2 Graph of Naïve Bayes using Count Vectorizer



Fig. 3 Estimated values of performance metrics for NB of the target variable for TF/IDF

The performance metrics evaluated for each category of the target variable under
Count Vectorizer are shown in Fig. 4.

4.2 Decision Trees

Decision Trees produce classification or regression models in a tree structure. The
algorithm splits the data collection into ever smaller subsets while incrementally
growing the corresponding Decision Tree. The result is a graphical structure known
as a tree, consisting of decision nodes and leaf nodes: a decision node has two or
more branches, and a leaf node represents a decision [7, 8].

Fig. 4 Estimated values of performance metrics for NB model of the target variable for Count
Vectorizer

In a tree, the topmost decision node, the root node, corresponds to the most relevant
predictor. DecisionTreeClassifier is a class capable of multi-class classification on a
dataset. Unlike some other classifiers, it takes two input arrays: an array X, sparse or
dense, of size [n_samples, n_features] containing the training samples, and an array
Y of integer values, of size [n_samples], containing the class labels for the training
samples [9]. The accuracy measures obtained for the model are shown for TF/IDF
in Fig. 5 and for Count Vectorizer in Fig. 6.
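The two-array interface can be sketched as follows (toy data, not the paper's):

```python
from sklearn.tree import DecisionTreeClassifier

# X: [n_samples, n_features] training samples; Y: integer class labels.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, Y)
print(clf.predict([[1, 1]]))  # a memorized training point, so class 1
```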

[Bar chart: Training F-Score 47.96, Recall 47.59, Precision 53.43, Accuracy 48.96; Testing F-Score 41.89, Recall 41.43, Precision 46.99, Accuracy 42.84]

Fig. 5 Graph of Decision Trees using TF/IDF

[Bar chart: Training F-Score 48.68, Recall 47.66, Precision 56.74, Accuracy 50.99; Testing F-Score 42.8, Recall 41.27, Precision 48.7, Accuracy 42.43]

Fig. 6 Graph of Decision Trees using Count Vectorizer

4.3 Random Forest Classifier

Random Forest Classifiers, also known as Random Decision Forests, are an ensemble
learning method for classification, regression, and other tasks. In Random Forests, a
large number of decision trees are built during training, and the output is the mode
of the classes predicted by the individual trees (classification) or their mean
prediction (regression) [8, 9]. It is a meta-estimator that fits multiple decision tree
classifiers on different sub-samples of the data and averages them for improved
predictive accuracy and control of over-fitting. The sub-sample size is usually the
same as the input sample size, and if bootstrap = True (default) [10], the samples are
drawn with replacement. We used the sklearn Random Forest classifier with
100 estimators. The model measures are shown for TF/IDF in Fig. 7 and for Count
Vectorizer in Fig. 8.
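A sketch with the paper's 100 estimators; the twenty-sample dataset is invented for illustration:

```python
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 5   # 20 toy samples
y = [0, 0, 1, 1] * 5                       # label tracks the first feature

# bootstrap=True (the default) draws each tree's sub-sample with replacement.
clf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
clf.fit(X, y)
print(clf.predict([[1, 0]]))
```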

[Bar chart: Training F-Score 89.35, Recall 88.77, Precision 91.96, Accuracy 89.64; Testing F-Score 67.35, Recall 68.17, Precision 74.94, Accuracy 69.87]

Fig. 7 Graph of Random Forest Model Measuring using TF/IDF

[Bar chart: Training F-Score 81.33, Recall 80.84, Precision 83.2, Accuracy 81.51; Testing F-Score 63.65, Recall 63.75, Precision 66.44, Accuracy 65.06]

Fig. 8 Graph of Random Forest Model Measures using Count Vectorizer

4.4 Stochastic Gradient Descent (SGD)

This is a simple but very powerful methodology for fitting linear classifiers under
convex loss functions, such as (linear) support vector machines and logistic
regression [11, 12]. The SGDClassifier class implements a plain stochastic gradient
descent learning routine supporting various loss functions and penalties for
classification. We use shuffle = True so that the training data are reshuffled after
each iteration. The model measures are shown for TF/IDF in Fig. 9 and for Count
Vectorizer in Figs. 10 and 11.
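A minimal sketch of this configuration on an invented, linearly separable toy set:

```python
from sklearn.linear_model import SGDClassifier

X = [[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]]
y = [0, 0, 1, 1]

# loss="hinge" gives a linear SVM; shuffle=True reshuffles the
# training data after each epoch, as in the paper's configuration.
clf = SGDClassifier(loss="hinge", shuffle=True, random_state=42)
clf.fit(X, y)
print(clf.predict([[2.0, 2.5]]))
```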

4.5 Logistic Regression

It is a statistical method for analyzing a dataset in which one or more independent
variables determine an outcome. The outcome is measured with a

[Bar chart: Training F-Score 95.22, Recall 95.04, Precision 95.83, Accuracy 95.62; Testing F-Score 80.5, Recall 80.66, Precision 82.38, Accuracy 81.82]

Fig. 9 Graph of SGD using TF/IDF

Fig. 10 Estimated values of performance metrics for SGD of the target variable for TF/IDF

[Bar chart: Training F-Score 93.09, Recall 93.08, Precision 93.32, Accuracy 92.99; Testing F-Score 67.85, Recall 67.98, Precision 69.15, Accuracy 68.73]

Fig. 11 Graph of SGD measures using Count Vectorizer

dichotomous variable, one with only two possible values [13]. The purpose of
Logistic Regression is to find the best-fitting model to describe the association
between the two-state target variable of interest (the dependent variable, i.e., the
response or outcome) and a set of independent (predictor or explanatory) variables.
We imported LogisticRegression from scikit-learn. Regularization is applied by
default. The data can be either dense or sparse; for best performance, use C-ordered
arrays or CSR matrices of 64-bit floats, as every other data type will be converted
[14]. The accuracy measures are shown for TF/IDF in Fig. 12 and for Count
Vectorizer in Fig. 13.
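A minimal sketch on an invented one-feature dataset, using the recommended input type:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# C-ordered float64 array, the input type recommended for best performance.
X = np.array([[0.0], [1.0], [2.0], [3.0]], dtype=np.float64, order="C")
y = np.array([0, 0, 1, 1])

clf = LogisticRegression()  # L2 regularization is applied by default
clf.fit(X, y)
print(clf.predict([[2.5]]))
```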

[Bar chart: Training F-Score 96.35, Recall 96.21, Precision 96.6, Accuracy 96.47; Testing F-Score 81.24, Recall 81.13, Precision 82.31, Accuracy 82.14]

Fig. 12 Graph of Logistic Regression using TF/IDF



[Bar chart: Training F-Score 99.33, Recall 99.32, Precision 99.33, Accuracy 99.31; Testing F-Score 69.3, Recall 69.19, Precision 69.75, Accuracy 70.04]

Fig. 13 Graph of Logistic Regression using Count Vectorizer

4.6 Support Vector Machines (SVMs)

It is a discriminative classifier formally defined by a separating hyperplane: given
labeled training data, the algorithm outputs an optimal hyperplane that categorizes
new examples. In two-dimensional space this hyperplane is a line dividing the plane
into two parts, with each class lying on either side [15–26]. Scikit-learn's support
vector machines accept both dense sample vectors (numpy.ndarray, or anything
convertible by numpy.asarray) and sparse ones (scipy.sparse matrices). However, to
use an SVM to make predictions for sparse data, it must have been fit on such data.
The model measures are shown for TF/IDF in Fig. 14 and for Count Vectorizer in
Fig. 15.
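Both input kinds can be sketched on a two-point toy set (invented data):

```python
from scipy import sparse
from sklearn import svm

X = [[0.0, 0.0], [1.0, 1.0]]
y = [0, 1]

clf = svm.SVC().fit(X, y)                  # dense input
print(clf.predict([[2.0, 2.0]]))

# The same estimator also accepts scipy.sparse matrices; to predict on
# sparse data, the model is fitted on sparse data as well.
clf_sp = svm.SVC().fit(sparse.csr_matrix(X), y)
print(clf_sp.predict(sparse.csr_matrix([[2.0, 2.0]])))
```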

5 Result and Analysis

All six classifiers were evaluated under tenfold cross-validation for both the original
and resampled data, as shown in Fig. 17. The measures are plotted to evaluate the
best model, as listed in Table 1.

[Bar chart: Training F-Score 80.42, Recall 79.95, Precision 83.68, Accuracy 80.91; Testing F-Score 68.35, Recall 67.74, Precision 72.84, Accuracy 69.06]

Fig. 14 Graph of SVM using TF/IDF



[Bar chart: Training F-Score 99.16, Recall 99.15, Precision 99.18, Accuracy 99.14; Testing F-Score 66.78, Recall 66.77, Precision 67.38, Accuracy 67.55]

Fig. 15 Graph of SVM using Count Vectorizer

Table 1 Overall results out of the six models


Accuracy Precision Recall F-score
Training_RF 89.64 91.96 88.77 89.35
Testing_RF 69.87 74.94 68.17 67.35
Training_SGD 95.62 95.83 95.04 95.22
Testing_SGD 81.82 82.38 80.66 80.5
Training_LR 96.47 96.6 96.21 96.35
Testing_LR 82.14 82.31 81.13 81.24
Training_NB 98.18 98.28 97.78 97.93
Testing_NB 82.58 83.75 81.36 81.32
Training_SVM 80.91 83.68 79.95 80.42
Testing_SVM 69.06 72.84 67.74 68.35
Training_DT 48.96 53.43 47.59 47.96
Testing_DT 42.84 46.99 41.43 41.89

Figure 16 visualizes the common performance measures for classification tasks:
accuracy, recall (also called sensitivity), and precision.

5.1 Accuracy

It is the fraction of observations whose predictions match their true labels, and at
first glance it seems like the best measure of a model. Accuracy is a helpful statistic
when the dataset is balanced and false positives and false negatives are roughly
equally costly; otherwise, additional criteria must be taken into account when
gauging the model's performance.

Fig. 16 Visualizing the common performance measures for classification tasks: accuracy, recall
(called sensitivity), and precision

[Grouped bar chart comparing Accuracy, Precision, Recall, and F-Score for the training and testing runs of all six models; the plotted values are those listed in Table 1]

Fig. 17 Graph showing comparative analysis of modeling techniques

Accuracy = (TP + TN)/(TP + FP + FN + TN). (1)

Naïve Bayes gave the highest accuracy: 98.18 on training data and 82.58 on test
data.

5.2 Precision

Precision is defined as the ratio of correctly predicted positive observations to the
total number of predicted positives. This measure answers the question, "Of the
passengers labeled as having survived, how many actually did?" High precision is
associated with a low false-positive rate.

Precision = TP/(TP + FP). (2)

Naïve Bayes gave a precision of 98.28 on training data and 83.75 on test data.

5.3 Recall (Sensitivity)

Recall is calculated as the ratio of correctly predicted positive observations to all
observations in the actual class. It answers the question: of the passengers who
really survived, how many were correctly identified rather than mistakenly classed
as "not survived"?

Recall = TP/(TP + FN). (3)

Naïve Bayes gave a recall of 97.78 on training data and 81.36 on test data.

5.4 F1-Score

The F1-Score is the harmonic mean of precision and recall, so it accounts for both
false positives and false negatives. F1 is often more useful than accuracy, especially
when the classes are unevenly distributed, although it is less intuitive. Accuracy
works best when the costs of a false positive and a false negative are comparable; if
there is a large disparity between those costs, both precision and recall must be
evaluated.

F1-Score = 2 × (Precision × Recall)/(Precision + Recall). (4)

Naïve Bayes gave an F1-Score of 97.93 on training data and 81.32 on test data.
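Equations (1)–(4) translate directly into code; the confusion-matrix counts below are arbitrary examples, not values from the paper:

```python
def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # Eq. (1)
    precision = tp / (tp + fp)                          # Eq. (2)
    recall = tp / (tp + fn)                             # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (4)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=80, fp=10, fn=20, tn=90)
print(acc, round(prec, 4), rec, round(f1, 4))
```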
When samples come from one of two categories (two classes), we can classify them
as positive or negative. In Fig. 16, the left rectangle represents the positive samples,
the right rectangle the negative samples, and the circle all samples predicted to be
positive. These quantities help the model builder understand what the parameters
indicate and how well the model has done.
As evident from the graph, the best of the six models is Naïve Bayes: it does not
overfit and gives the best testing and training accuracy (Fig. 17).

6 Conclusions

We observed that almost all the models performed better with TF–IDF than with
Count Vectorizer, the exception being Logistic Regression, where Count Vectorizer
wins but at the cost of precision and recall. The six models tried were Naive Bayes,
the SGD classifier, SVM, Random Forest, Decision Tree, and Logistic Regression.
Among them, the Naive Bayes classifier using TF–IDF performs best, closely
followed by the SGD classifier; both techniques also hold up well when looking at
the precision and recall of the individual categories. To conclude, text classification
is a very interesting area of unstructured-data machine learning and can be explored
in depth to understand which techniques perform best and under what circumstances.
Several popular ones, such as Naive Bayes, Support Vector Machines, and the SGD
classifier, were explored in this project.

References

1. Lee I, Shin YJ (2020) Machine learning for enterprises: applications, algorithm selection, and
challenges. Bus Horizons 63(2):157–170
2. Shatte ABR, Hutchinson DM, Teague SJ (2019) Machine learning in mental health: a scoping
review of methods and applications. Psychol Med 49(9):1426–1448
3. Van Calster B, Wynants L (2019) Machine learning in medicine. N Engl J Med 380(26):2588–
2588
4. Leo M, Sharma S, Maddulety K (2019) Machine learning in banking risk management: a
literature review. Risks 7(1):29
5. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, vol 1, no 10.
Springer Series in Statistics, New York
6. Manning CD, Raghavan P, Schutze H (2008) Introduction to information retrieval
7. Shreemali J et al (2021) Comparing performance of multiple classifiers for regression and
classification machine learning problems using structured datasets. In: Materials today:
proceedings
8. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
9. Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666
10. https://scikitlearn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
11. https://developers.google.com/machine-learning/crash-course/reducing-loss/stochastic-gradient-descent
12. Cheplygina V, de Bruijne M, Pluim JP (2019) Not-so-supervised: a survey of semi-supervised,
multi-instance, and transfer learning in medical image analysis. Med Image Anal 54:280–296
13. https://scikitlearn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.
html
14. Patel S (2017) Chapter 2: SVM (support vector machine)—theory. Mach Learn 101
15. Van Engelen JE, Hoos HH (2020) A survey on semi-supervised learning. Mach Learn
109(2):373–440
16. Kumar MV, Venkatachalam K, Prabu P, Abdulwahab A, Abouhawwash M (2021) Secure
biometric authentication with de-duplication on distributed cloud storage. PeerJ Comput Sci
4(3):e569

17. Abdel Basset M, Moustafa N, Mohamed R, Osama M, Abouhawwash M (2021) Multi-objective
task scheduling approach for fog computing. IEEE Access 9(2):126988–127009
18. AbdelBasset M, ElShahat D, Deb K, Abouhawwash M (2020) Energy-aware whale optimiza-
tion algorithm for real-time task scheduling in multiprocessor systems. Appl Soft Comput
93(3):34–45
19. AbdelBasset M, Mohamed R, Abouhawwash M, Chakrabortty K, Michael J (2021) EA-MSCA:
an effective energy-aware multi-objective modified sine-cosine algorithm for real-time task
scheduling in multiprocessor systems: methods and analysis. Expert Syst Appl 173(3):1–15
20. Seada H, Abouhawwash M, Deb K (2019) Multiphase balance of diversity and convergence in
multiobjective optimization. IEEE Trans Evol Comput 23(3):503–513
21. Abouhawwash M, Alessio A (2021) Multi objective evolutionary algorithm for PET image
reconstruction: Concept. IEEE Trans Med Imaging 40(8):2142–2151
22. Abdel-Basset M, Mohamed R, AbdelAziza NM, Abouhawwash M (2022) HWOA: a hybrid
whale optimization algorithm with a novel local minima avoidance method for multi-level
thresholding color image segmentation. Expert Syst Appl 190(1):116145
23. Seada H, Abouhawwash M, Deb K (2016) Towards a better diversity of evolutionary multi-
criterion optimization algorithms using local searches. In: Proceedings of the 2016 on genetic
and evolutionary computation conference companion, Denver, USA, pp 77–78
24. Abouhawwash M, Deb K (2016) Karush-kuhn-tucker proximity measure for multi-objective
optimization based on numerical gradients. In: Proceedings of the 2016 on genetic and
evolutionary computation conference companion, Denver, USA, pp 525–532
25. Abouhawwash M, Jameel MA, Deb K (2020) A smooth proximity measure for optimality in
multi-objective optimization using Benson’s method. Comput Oper Res 117(3):104900
26. Suganthi ST, Vinayagam A, Veerasamy V, Deepa A, Abouhawwash M et al (2021) Detec-
tion and classification of multiple power quality disturbances in microgrid network using
probabilistic based intelligent classifier. Sustain Energy Technol Assess 47(4):101470
High Performance RF Doherty Power
Amplifier for Future Wireless
Communication Applications—Review,
Challenges, and Solutions

Sukhpreet Singh, Atul Babbar, Ahmed Alkhayyat, Ganesh Gupta,
Sandeep Singh, and Jasgurpreet Singh Chohan

Abstract In this paper, a systematic review of the Doherty power amplifier for future
wireless communication applications is presented. The basic operation and design
of the Doherty power amplifier are described. The Doherty power amplifier mainly
suffers from limited-bandwidth operation while maintaining the desired efficiency
and linearity, and the design challenges highlighting the bandwidth-limiting factors
are discussed in detail. The latest research trends focused on improving the
performance of the Doherty amplifier using various architectures, together with
their main challenges and solutions, are addressed. The latest techniques to enhance
the performance of the Doherty power amplifier in terms of peak and back-off
efficiency and bandwidth, while maintaining linearity, are reviewed, making it the
most suitable and preferable candidate for future wireless communication
applications.

Keywords Bandwidth · Doherty power amplifier (DPA) · Gallium Nitride
(GaN) · Power amplifier (PA) · Peak-to-average power ratio (PAPR) · Output
back-off (OBO) efficiency · Wireless communication

S. Singh (B)
Department of ECE, Chandigarh University, Gharuan, Mohali, India
e-mail: sukhpreet.ece@cumail.in
A. Babbar
Department of Mechanical Engineering, SGT University, Gurugram, Haryana 122505, India
A. Alkhayyat
College of Technical Engineering, The Islamic University, Najaf, Iraq
G. Gupta
Department of CSE, Sharda University, Greater Noida, India
S. Singh · J. S. Chohan
University Center of Research and Development, Chandigarh University, Chandigarh, India
e-mail: sandeepsingh.civil@cumail.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 533
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_45
534 S. Singh et al.

1 Introduction

In the last two decades, mobile communication technology has had a profound impact
on the way people all over the world communicate with one another; mobile
communication systems are the most widely used wireless technology ever. In terms
of capacity and capabilities, Table 1 shows the evolution of wireless communication
standards from the first generation to the approaching fifth generation. Research
and development of 5G, the next generation of wireless communication, is now the
primary focus of the telecommunications industry [1, 2]. With the advent of 5G,
users will be able to make use of services that previously needed infrastructure
upgrades: faster data rates (multi-gigabits per second), longer battery life, many
more devices (100 times more than 4G, to enable IoT), lower latency, and more
stable connections [3]. Realizing all of these services requires a significant shift in
both the overall and individual designs of radio-frequency front-end (RFFE)
components. Covering a large number of frequency bands is an essential function of
the RFFE. Since the power amplifier (PA) is the most energy-intensive block in
RFFE and must process relatively large signals, its design is of paramount
importance [4].
Because conventional power amplifiers are optimized for a narrow frequency
range, their usable bandwidth decreases dramatically at higher frequencies. In
addition, modern signals have a larger bandwidth and a higher peak-to-average
power ratio (PAPR) [5], because their amplitude and phase shift over time as the
transmission power changes frequently. Linearity is therefore the primary need of
power amplifiers in wireless communication systems, and to retain it the amplifier
must operate at a greater output back-off (OBO) due to the high PAPR. At such
backed-off output power, a conventional PA achieves an efficiency of only about
7–14%. To improve the transmitter's overall performance, a PA with improved
back-off efficiency and linearity is required.

Table 1 Evolution of wireless communication standards

                        1G         2G          3G          4G             5G
Year                    1980       1990        2000        2012           2022
Technology              AMPS       GSM         UMTS        LTE            ?
Services and features   Voice      Mobility,   Multimedia  Architecture   Spectral efficiency,
                                   roaming                 efficiency     capacity, cost
                                                                          effectiveness,
                                                                          intelligence
Data rate               2.4 kB/s   9.6 kB/s    2 MB/s      300 MB/s       Multi-GB/s
High Performance RF Doherty Power Amplifier for Future Wireless … 535

2 Doherty Power Amplifier Operation

As can be seen in Fig. 1, the DPA has two amplifiers: the "main" or "carrier" amplifier
and the "auxiliary" or "peaking" amplifier. At the heart of a Doherty power amplifier
is an input power splitter that divides the incoming signal in half, followed by the
main amplifier and the load, which must be modulated correctly. To counteract the
phase discrepancy caused by the impedance-inverting network, a delay line is
introduced at the DPA's input [6].
Both amplifiers are represented as current sources, with the carrier amplifier biased
in Class AB operation and the peaking amplifier biased in Class C operation; as a
result, the peaking amplifier conducts only for part of the cycle. Both amplifiers
produce an output current that is linearly proportional to the input signal voltage.
The DPA is based on the principle of load modulation, i.e., the load impedance of
each amplifier varies in accordance with the level of the input power. The carrier
amplifier has a load impedance that is twice that of a conventional amplifier, 2Zopt,
where Zopt is the optimum load impedance for maximum power transfer. The load
modulation behavior is depicted in Fig. 2: at peak power, the impedance of both
amplifiers is modulated down to Zopt, the optimum impedance of the transistor. The
impedance inverter provides the efficient load modulation of the DPA, which
improves back-off efficiency, but it also limits the DPA's operational bandwidth [6].
The cause of this limited bandwidth is that when the frequency of operation shifts
away from the mid-band frequency, the output impedance of the main amplifier
decreases; as a result, the drain efficiency degrades over a wide bandwidth. This is
known as the frequency dispersion effect. The various methods to improve the
bandwidth are elaborated in the literature survey, and the DPA architectures that
overcome these limitations and improve the overall performance are discussed
below.
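The load-modulation behavior described above follows from the standard active load-pull relations; this is a textbook sketch (with combining-node load Z_L = Zopt/2 and inverter characteristic impedance Z_T = Zopt assumed), not a derivation taken from this paper:

```latex
% Impedance at the combining node, seen from the inverter branch:
Z_\mathrm{node} = Z_L\left(1 + \frac{I_\mathrm{aux}}{I_\mathrm{inv}}\right),
\qquad Z_L = \frac{Z_\mathrm{opt}}{2}

% The quarter-wave inverter (Z_T = Z_\mathrm{opt}) transforms this
% into the load seen by the main device:
Z_\mathrm{main} = \frac{Z_T^{2}}{Z_\mathrm{node}} =
\begin{cases}
2\,Z_\mathrm{opt}, & I_\mathrm{aux} = 0 \quad \text{(back-off)}\\[2pt]
Z_\mathrm{opt}, & I_\mathrm{aux} = I_\mathrm{inv} \quad \text{(peak power)}
\end{cases}
```

This reproduces the swing from 2Zopt at back-off down to Zopt at peak power noted in the text.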

Fig. 1 Standard schematic of DPA



Fig. 2 Impedance profile of main and auxiliary amplifier (normalized impedance versus normalized input voltage)

3 Design Challenge for DPA

Compared with a conventional amplifier, the DPA has the advantages of higher
efficiency and linearity together with a simple circuit that is easy to design, but some
design challenges are associated with it. The first is the nonlinearity in the gain of
each amplifier: the peaking amplifier forces changes to the optimal load impedance
in order to achieve maximum efficiency, and as a result the gain of the amplifier is
reduced. The second issue is the intermodulation distortion produced by the auxiliary
amplifier, since it is biased in Class C (low bias); one method to remove this
distortion is to sum the intermodulation product of the main amplifier with that of
the auxiliary amplifier at the load end [7]. The third challenge is the phase offset
between the signals arriving from the main and auxiliary amplifiers, which conduct
differently and generate out-of-phase current profiles [6]; as a result, linearity is poor
and efficiency is reduced. The next challenge is the narrower bandwidth caused by
the impedance inverter required to perform the correct load modulation [8]; the
inverter's large size also makes it very difficult to integrate a DPA into a single chip.
A further challenge is that in an ideal DPA the impedance inverter network can
perform the load modulation only for real impedances [9], whereas the output
reactance of actual devices presents an imaginary part to the load. This reactance
must be removed, otherwise it introduces a phase distortion during load modulation
that reduces the back-off efficiency and linearity of the DPA.

3.1 Bandwidth Limiting Factors

Bandwidth is a very important parameter for wireless communication, and many
factors limit the bandwidth of the DPA, as reviewed in the literature survey. In
practice, the DPA relies on the load modulation technique for its enhanced efficiency
at back-off power levels. In this technique, the load impedance is varied according
to the input power level, i.e., a different impedance is needed at each power level.
Correct load modulation can be achieved through an impedance inverter line, phase
compensation, and offset lines, but all of these are observed to be bandwidth-limiting
factors for the DPA [9, 10]. First, the impedance inverter is usually implemented as
a quarter-wave transmission line, which has a narrow bandwidth; various methods
reviewed in the literature focus on new impedance inverter designs to enhance it.
The second limiting factor is the phase synchronization between the main and
auxiliary amplifiers; attaining it over a large bandwidth by reengineering the
conventional design leads to the Digital DPA, in which the baseband signal is divided
and fed into the carrier and peaking amplifiers individually. Another bandwidth-
limiting factor is the lack of broadband matching networks. A wideband matching
network is needed to match the impedance at both saturation and back-off power
levels, so that maximum power is transferred at every frequency of operation inside
the desired band; without it, the desired response is very hard to achieve across the
band. The last factor is the offset lines, which work at a single frequency and are
integrated into the output matching network.
These challenges open up various opportunities for designers to find optimum
solutions to the bandwidth-limiting factors so that wideband DPAs can be designed.
One solution is to use offset lines with both amplifiers' output matching networks,
with a characteristic impedance equal to the load impedance at saturation, to
optimize the load modulation. Another is to design the DPA so that the output
matching network and the power combiner are merged, which reduces parasitic
losses and, moreover, shrinks the device size [11].

4 DPA Design Methods

Back-off efficiency and linearity are the two important parameters of an RF front-end
DPA, and the main factor that degrades them is non-optimum load modulation.
Various methods have therefore been developed to maintain optimum active load
modulation, some of which are addressed in this section.

4.1 DPA Design Using Multistage Architecture

At 6 dB of output power back-off, DPAs are regarded as the finest option for
amplification because of their suitable efficiency and linearity. The typical DPA has
its efficiency enhancement at 6 dB back-off, but in practice the DPA transmits average output
538 S. Singh et al.

[Schematic: an input power splitter at PIN drives the carrier PA directly and two peaking PAs through −90° and −180° delay lines; the amplifier outputs (ZC, ZP1, ZP2) are combined through impedance-inverting lines Z01, Z02, and ZT into the 50 Ω load]

Fig. 3 Schematic of three-way DPA reported in [13]

power in the range of 9–12 dB back-off, which results in low efficiency [12]. The
multistage DPA is a technique for increasing efficiency at significant back-off by
employing several stages of amplifiers, and this design also enhances the linearity
of DPAs, since the output power of several PAs is combined. An example of a
three-way DPA arrangement is shown in Fig. 3 [13], where the load of the main
amplifier is modulated by the first peaking amplifier and the load of the preceding
DPA is modulated by the next peaking amplifier.
In spite of providing such a highly efficient amplified signal, the multistage DPA
has the limitation of a complex circuit that is very difficult to implement. The
technique also results in non-optimum load modulation, because the peaking
amplifiers are biased at a low voltage, which causes them to deliver less output
power. This directly affects the efficiency and linearity of the DPA.

4.2 DPA Design Using Asymmetrical Devices

If the main and auxiliary amplifiers in a DPA are symmetrical to each other, the
output current of the auxiliary amplifier will be less than that of the main amplifier,
as it is biased in Class C mode with a small conduction angle; this results in smaller
output power and a larger back-off. So, for correct operation of the DPA, either the
power splitter in the input section should be asymmetrical or the auxiliary device
should be double the size of the main amplifier device [14]. Using an asymmetric
DPA (ADPA) with a doubled auxiliary amplifier degrades the overall performance
of the amplifier, because the power delivered to the main amplifier is reduced.
The ideal technique to achieve accurate load modulation is to use a power splitter
that sends more power to the auxiliary amplifier than to the main amplifier, which
allows the auxiliary amplifier to generate the correct amount of current. This results
in a 9–13% increase in drain efficiency [15]. Figure 4 presents a bias-adapted
two-way DPA (BA-DPA) that consists of a main and an auxiliary
amplifier plus a generator for an envelope-shaped voltage whose function is
High Performance RF Doherty Power Amplifier for Future Wireless … 539

Fig. 4 DPA with adaptive gate bias architecture reported in [15]

to change the gate voltage of the main amplifier so as to reduce the power
consumption of the main amplifier at low power levels, when only the main amplifier
is conducting. The main limitation of this advanced architecture is that the circuit
becomes more complicated, and the cost also increases.
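The link between the auxiliary-to-main size ratio and the back-off level can be made concrete with the standard idealized relation OBO = 20·log10(1 + k), where k is the ratio of the peaking device’s maximum current to the carrier’s; this is a textbook approximation, not a design equation taken from [14, 15]:

```python
import math

def doherty_obo_db(k):
    """First-efficiency-peak output back-off (in dB) of an idealized
    Doherty PA whose peaking device supplies k times the maximum
    current of the carrier device (k = 1 is the symmetric case)."""
    return 20 * math.log10(1 + k)

print(round(doherty_obo_db(1), 2))  # 6.02 dB: symmetric DPA
print(round(doherty_obo_db(2), 2))  # 9.54 dB: auxiliary device doubled in size
```

This is why doubling the auxiliary device pushes the first efficiency peak toward the 9–12 dB back-off region where modern signals actually sit.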

4.3 DPA Design Using Envelope Tracking

This technique [16] uses the envelope’s magnitude to adjust the gate bias of the auxil-
iary amplifier; that is to say, the auxiliary amplifier has a variable gate bias.
The gate bias of both amplifiers may be adjusted to meet specific demands, and
the combined supply-modulator and DPA efficiency can be increased [17]. This
method can also be used with the main amplifier active in the low-power region to
increase the efficiency. A DPA using the envelope tracking method is shown in Fig. 5.
In this type of DPA design, the main amplifier starts conducting at low power levels,
and as the power level increases, the bias adjustment brings the auxiliary amplifier
into conduction so that it contributes more current to the load. In this architecture,
due to the bias adjustment, both amplifiers contribute the same amount of current
to the load. Thus, this method helps the peaking amplifier synchronize with the main
amplifier. Hence, the load modulation is optimum and both efficiency and linearity
are enhanced. This method enhances the performance of the DPA, but its limitation
is that it requires additional circuitry, which makes it complicated and increases the cost.
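A minimal sketch of such an envelope-dependent gate-bias shaping function is given below; the threshold and bias voltages are hypothetical placeholders for illustration, not values from the designs in [16–18]:

```python
def aux_gate_bias(env, v_pinch=-3.4, v_on=-2.6, threshold=0.5):
    """Hypothetical gate-bias shaping for the auxiliary amplifier:
    held at pinch-off (off state) below the envelope threshold, then
    ramped linearly toward its on-state bias. env is the normalized
    envelope magnitude in [0, 1]; all voltages are illustrative
    placeholders, not values from the surveyed designs."""
    if env <= threshold:
        return v_pinch
    ramp = (env - threshold) / (1.0 - threshold)
    return v_pinch + ramp * (v_on - v_pinch)

print(aux_gate_bias(0.3))  # -3.4: auxiliary amplifier still off
print(aux_gate_bias(1.0))  # -2.6: fully biased on at the envelope peak
```

In a real design the shaping curve is extracted from measured device characteristics rather than a straight-line ramp.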

Fig. 5 DPA based on envelope tracking method reported in [18]

4.4 DPA Design Using Inverted Architecture

This architecture provides a more compact inverted load network than the conven-
tional DPA, which makes it suitable for portable devices [12]. In the inverted DPA, the
inverting transmission line is connected to the auxiliary amplifier’s drain instead of
the main amplifier’s drain [19]. In the low-power region, the main amplifier provides
maximum efficiency as it works with a load of 25 Ω, because the combining node
of the DPA is reversed; in the traditional DPA, by contrast, a 50 Ω λ/4 transmission
line is used for impedance inversion and the combining point provides the output.
In the IDPA, offset lines are used with the auxiliary amplifier to provide the off-state
condition for the auxiliary amplifier, as shown in Fig. 6. The IDPA yields enhanced
efficiency in the low-power region of operation, but the important point of interest in
this architecture is the correct design of the output matching networks of both amplifiers.
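The impedance inversion at the heart of both the conventional and inverted DPA follows the quarter-wave relation Zin = Z0²/ZL; a small sketch, assuming a 25 Ω combining node and a 50 Ω λ/4 line as in the conventional case described above:

```python
def quarter_wave_zin(z0, z_load):
    """Impedance seen at the input of a lossless quarter-wave line:
    Zin = Z0**2 / ZL (the impedance inversion used in the DPA)."""
    return z0 ** 2 / z_load

# Conventional two-way DPA, peaking PA off: the 25-ohm combining node
# appears at the carrier drain through the 50-ohm lambda/4 inverter.
print(quarter_wave_zin(50.0, 25.0))  # 100.0 ohm -> high efficiency at back-off

# At full power the node presents 50 ohm to the carrier branch, so the
# carrier load modulates back down to its optimum full-power value.
print(quarter_wave_zin(50.0, 50.0))  # 50.0 ohm
```

Swapping which drain carries the λ/4 inverter is what flips these node impedances in the inverted topology.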

Fig. 6 Design of two-way inverted DPA reported in [20]



Fig. 7 DPA based on dual-input digital method reported in [23]

4.5 DPA Design Using Digital Method

By coordinating the operations of the primary and secondary amplifiers, optimal
load modulation throughout a broad frequency range may be achieved. Modifying
the DPA’s fundamental structure yields the desired synchronization behavior, and
the resulting design is known as the digital DPA [21]. Using a dual-input topology with
digitally assisted control over the amplifiers’ inputs, this DPA design minimizes
hardware needs [22]. A power splitter is unnecessary in this configuration,
and because the amplifiers have separate inputs, the energy needed to drive power
to the auxiliary line is minimized.
The architecture of the DPA based on digital design is shown in Fig. 7. It consists
of a digital signal processing (DSP) block, whose function is to perform basic
operations such as digital pre-distortion, interpolation, and modulation on the input
signal, and a signal separation block [23]. The main function of the DSP is to route
the carrier signal from the input to the main amplifier path and the auxiliary signal
to the peaking amplifier path. Each amplifier amplifies its signal individually, and
the output is later up-converted to an RF signal. The design of a digital DPA is very
challenging, as accurate alignment and synchronization between the main and
auxiliary paths at baseband are required. This method is rarely used for the design
of power amplifiers in high-speed applications because it needs large-scale
pre-distortion, synchronization between both amplifiers, and large oversampling [24].
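A toy version of the baseband signal-separation step can be sketched as below; the separation function and turn-on point are illustrative assumptions, and real implementations [21, 23] also handle phase alignment and digital pre-distortion:

```python
def separate(v, turn_on=0.5):
    """Hypothetical baseband separation for a dual-input digital DPA:
    the carrier path is driven up to saturation and then held, while
    the peaking path is driven only above the turn-on point. v is the
    normalized envelope magnitude in [0, 1]."""
    carrier = min(v, turn_on) / turn_on             # saturates at v = turn_on
    peaking = max(0.0, v - turn_on) / (1.0 - turn_on)
    return carrier, peaking

print(separate(0.3))  # (0.6, 0.0): below turn-on, peaking path idle
print(separate(1.0))  # (1.0, 1.0): both paths fully driven at peak envelope
```

Because the split is computed digitally, the turn-on point can be retuned per band, which is what gives the digital DPA its wideband load-modulation control.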

5 Conclusions

In this paper, a short survey on the DPA has been presented, beginning with the basic
operation of the DPA followed by the related literature survey. The review shows that
the DPA is a promising technology for both linearity and efficiency enhancement
in advanced wireless communication. The paper covered the advanced DPA archi-
tectures used for back-off efficiency, linearity, and bandwidth enhancement, which
are based on techniques such as multistage design, asymmetrical devices, envelope
tracking, inverted and digital architectures, and optimization of the matching network.
The performance analysis has been carried out for recent work done using each
method. All these techniques improve the essential parameters of the DPA at the cost
of increased hardware complexity, which can be accepted in exchange for efficient
operation of the DPA. It can be concluded from the survey that the ET-DPA is the
most preferable candidate for efficient amplification along with the required linearity,
at the cost of supply-modulator complexity. These techniques will continue to be
explored in the future to address the design of the DPA for future communication
systems.

References

1. Monserrat JF, Mange G, Braun V, Tullberg H, Zimmermann G, Bulakci Ö (2015) METIS research
advances towards the 5G mobile and wireless system definition. EURASIP J Wireless Commun
Netw 2015(1). https://doi.org/10.1186/s13638-015-0302-9
2. Tehrani M, Uysal M, Yanikomeroglu H (2014) Device-to-device communication in 5G cellular
networks: challenges, solutions, and future directions. IEEE Commun Mag 52(5):86–92
3. http://www.3gpp.org/
4. Cripps SC (2006) RF power amplifiers for wireless communications, 2nd edn. Artech House,
Norwood, MA
5. Agiwal M, Roy A, Saxena N (2016) Next generation 5G wireless networks: a comprehensive
survey. IEEE Commun Surv Tutorials 18(3):1617–1655. https://doi.org/10.1109/comst.2016.
2532458
6. Camarchia V, Pirola M, Quaglia R, Jee S, Cho Y, Kim B (2015) The Doherty power amplifier:
review of recent solutions and trends. IEEE Trans Microw Theor Tech 63(2):559–571
7. Kim B, Kim J, Kim I, Cha J (2006) The Doherty power amplifier. IEEE Microwave Mag
7(5):42–50
8. Singh S, Chawla P (2021) A highly efficient and broadband Doherty power amplifier design for
5G base station. In: Proceedings of international conference on data science and applications:
ICDSA 2021, vol 1. Springer Singapore, Singapore, pp 201–211
9. Moon J, Kim J, Kim I, Kim J, Kim B (2008) Highly efficient three-way saturated Doherty
amplifier with digital feedback predistortion. IEEE Microwave Wirel Compon Lett 18(8):539–
541
10. Barthwal A, Rawat K, Koul SK (2018) A design strategy for bandwidth enhancement in three-
stage Doherty power amplifier with extended dynamic range. IEEE Trans Microw Theor Tech
66(2):1024–1033
11. Golestaneh H, Malekzadeh FA, Boumaiza S (2013) An extended-bandwidth three-way Doherty
power amplifier. IEEE Trans Microw Theor Tech 61(9):3318–3328
12. Kim I, Moon J, Jee S, Kim B (2010) Optimized design of a highly efficient three-stage Doherty
PA using gate adaptation. IEEE Trans Microw Theor Tech 58(10):2562–2574
13. Wu D, Boumaiza S (2012) A modified Doherty configuration for broadband amplification using
symmetrical devices. IEEE Trans Microw Theor Tech 60(10):3201–3213
14. Singh S, Chawla P (2021) Design and performance analysis of broadband Doherty power
amplifier for 5G communication system. In: 2021 2nd international conference for emerging
technology (INCET). IEEE, pp 1–4
15. Park Y, Lee J, Jee S, Kim S, Kim B (2015) Gate bias adaptation of Doherty power amplifier
for high efficiency and high power. IEEE Microwave Wirel Compon Lett 25(2):136–138

16. Choi J, Kang D, Kim D, Kim B (2009) Optimized envelope tracking operation of Doherty power
amplifier for high efficiency over an extended dynamic range. IEEE Trans Microw Theor Tech
57(6):1508–1515
17. Yang Y, Cha J, Shin B, Kim B (2003) A microwave Doherty amplifier employing envelope
tracking technique for high efficiency and linearity. IEEE Microwave Wirel Compon Lett
13(9):370–372
18. Zhang Z, Xin Z (2014) A LTE Doherty power amplifier using envelope tracking technique.
In: Proceedings of the 15th international conference on electronic packaging technology, pp
1331–1334
19. Zhou H, Wu H (2013) Design of an S-band two-way inverted asymmetrical Doherty power
amplifier for long term evolution applications. Progr Electromagnet Res Lett 39:73–80
20. Ahn G et al (2007) Design of a high-efficiency and high-power inverted Doherty amplifier. IEEE
Trans Microwave Theor Tech 55(6):1105–1111. https://doi.org/10.1109/tmtt.2007.896807
21. Darraji R, Ghannouchi F (2011) Digital Doherty amplifier with enhanced efficiency and
extended range. IEEE Trans Microw Theor Tech 59(11):2898–2909
22. Zhao J, Wolf R, Ellinger F (2013) Fully integrated LTE Doherty power amplifier. In:
Proceedings of the international semiconductor conference Dresden-Grenoble (ISCDG), pp
1–3
23. Darraji R, Ghannouchi F, Hammi O (2011) A dual-input digitally driven Doherty amplifier
architecture for performance enhancement of Doherty transmitters. IEEE Trans Microw Theor
Tech 59(5):1284–1293
24. Gustafsson D, Andersson C, Fager C (2013) A Modified Doherty power amplifier with extended
bandwidth and reconfigurable efficiency. IEEE Trans Microw Theor Tech 61(1):533–542
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2024
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7