Shailesh Tiwari
Munesh C. Trivedi
Mohan L. Kolhe
Brajesh Kumar Singh Editors
Advances
in Data and
Information
Sciences
Proceedings of ICDIS 2023
Lecture Notes in Networks and Systems
Volume 796
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to both
the contributors and the readership are the short publication timeframe and
the worldwide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).
Shailesh Tiwari · Munesh C. Trivedi ·
Mohan L. Kolhe · Brajesh Kumar Singh
Editors
Advances in Data
and Information Sciences
Proceedings of ICDIS 2023
Editors

Shailesh Tiwari
KIET Group of Institutions
Ghaziabad, Uttar Pradesh, India

Munesh C. Trivedi
Department of Computer Science and Engineering
National Institute of Technology Agartala
Tripura, India

Mohan L. Kolhe
Faculty of Engineering and Science
University of Agder
Kristiansand, Norway

Brajesh Kumar Singh
Department of Computer Science and Engineering
R. B. S. Engineering Technical Campus, Bichpuri
Agra, Uttar Pradesh, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
thankful to the delegates and the authors for their participation and their interest in ICDIS-2023 as a platform to share their ideas and innovations. We are also thankful to Prof. Dr. Janusz Kacprzyk, Series Editor, LNNS, Springer Nature, and Mr. Aninda Bose, Executive Editor, Springer Nature, India, for providing guidance and support. We also extend our heartfelt gratitude to the reviewers and Technical Program Committee members for their concern and efforts in the review process. We are indeed thankful to everyone directly or indirectly associated with the conference organizing team for leading it toward success.

Although utmost care has been taken in compilation and editing, a few errors may still occur. We request the participants to bear with such errors and lapses (if any). We wish you all the best.
Prof. Shailesh Tiwari, Ph.D. currently works as a director and professor in the Computer Science and Engineering Department, Krishna Engineering College, Ghaziabad, India. He also holds the responsibility of Additional Director, KIET Group of Institutions, Ghaziabad, India. He is an alumnus of Motilal Nehru National Institute of Technology Allahabad, India. His primary areas of research are software testing and the application of optimization algorithms and machine learning techniques to various engineering problems. He has published more than 100 papers in international journals and in proceedings of reputed international conferences. He has edited special issues of several Scopus-, SCI- and E-SCI-indexed journals, and has also edited several books published by Springer. He has published six Indian patents as IPRs. He has organized several international conferences under the banner of IEEE, ACM and Springer.
Dr. Munesh C. Trivedi has more than 18 years of teaching experience (of which 12 years are post-Ph.D.) and is currently working with the National Institute of Technology, Agartala, Tripura, India. He has successfully filed 60 patents (51 national and 9 international) in Germany, South Africa and Australia. He has also published 12 textbooks and 148 research papers in reputed international journals and proceedings. He has edited 38 books for Springer Nature. He has received numerous awards, including the Young Scientist Visiting Fellowship, Albert Einstein Research Scientist Award, Best Senior Faculty Award, Outstanding Scientist Award, Dronacharya Award, Author of the Year Award and Vigyan Ratan Award, from different national as well as international forums. He has also organized more than 32 international conferences technically sponsored by IEEE, ACM and Springer.
Prof. Dr. Mohan L. Kolhe is a full professor in smart grid and renewable energy at the Faculty of Engineering and Science of the University of Agder (Norway).
Prof. Brajesh Kumar Singh is presently working as a professor and head of the Department of Computer Science and Engineering, R. B. S. Engineering Technical Campus, Agra, India, with more than 20 years of teaching experience. He completed his doctorate in Computer Science and Engineering at Motilal Nehru National Institute of Technology, Allahabad (Uttar Pradesh). His key areas of research are software engineering, software project management, data mining, soft computing and machine learning. He has supervised two Ph.D. candidates. Prof. Singh has delivered several invited talks/keynote addresses and chaired sessions at reputed national and international conferences in India and abroad. He runs collaborative training programs/workshops with IIT Bombay.
Abstract The influence of music on mood and emotions has been widely studied, highlighting its potential for self-expression and personal enjoyment. As technology continues to advance exponentially, manually selecting and analyzing music from the vast array of artists, songs, and listeners becomes impractical. In this study, we propose a system called “emotion-aware music recommendations” that leverages real-time facial expressions to determine a person's emotional state. Deep learning models are employed to accurately detect facial emotions, leveraging the principles of transfer learning. By combining the model's output with the mapped songs from the dataset, a personalized playlist is created. The main objective of the study is to effectively classify user emotions into six distinct categories using pre-trained models. Experimental studies conducted on the proposed approach employ the RAF-ML benchmark facial expression dataset. The findings indicate that the model outperforms existing approaches, demonstrating its effectiveness in generating tailored music recommendations.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_1
2 S. T. Annam et al.
1 Introduction
Music has a profound impact on mood, behavior, and brain function [1]. It is
closely linked to emotions and psychological attributes, with brain regions respon-
sible for music processing also influencing emotions [2]. Emotion detection tech-
nology has gained popularity and has applications in various domains [3]. Advance-
ments in signal processing and feature extraction have facilitated automated emotion
identification in multimedia elements like music [4, 5].
In this study, we propose a facial expression-based emotion identification recom-
mender system. By accurately detecting the user’s emotions, the system generates
a suitable music playlist based on their emotional state, particularly focusing on
negative emotions.
Music is strongly connected to emotions, personality traits, and personal expe-
riences [6]. Facial expressions, which are indicators of emotional states, have been
underutilized in music recommendation systems [7, 8].
The vast collection of music tracks poses challenges in efficient organization
and categorization [9]. Facial images captured through a camera provide input for
emotion identification, enabling the generation of music playlists aligned with the
user’s emotional characteristics [10].
Our proposed system, the facial expression-based music player, aims to iden-
tify human emotions and create personalized playlists accordingly. To address the
complexity of emotions, our system considers combinations of multiple emotions
and utilizes the Real-world Affective Faces (RAF-ML) dataset, which contains images associated with various emotional labels [11]. Transfer learning using VGG16 weights trained
on the “ImageNet” dataset achieves an accuracy of 81.72%. Overall, our system
offers a more efficient approach to mood categorization in music and enhances the
personalized music listening experience based on facial expressions and emotions.
2 Literature Survey
The authors have used the Million Song Dataset (MSD) and Million Playlist
Dataset (MPD) to develop and evaluate the proposed system. The MSD contains
metadata for one million songs, and the MPD contains 1000 playlists, each with
1000 tracks [13]. The authors have also collected emotion labels for the MSD using
Amazon Mechanical Turk. The proposed system consists of two parts: emotion
classification using a CNN and music recommendation using collaborative filtering.
The authors have used a CNN architecture with three convolutional layers and two
fully connected layers for emotion classification [14]. The CNN model was trained
on a subset of the MSD, which contains 22,125 songs and 12,312 users with emotion
labels. The emotion labels were mapped into four classes: happy, sad, angry, and
relaxed. The CNN model achieved an average classification accuracy of 75.22% on
a test set [15].
The authors have used two datasets to develop and evaluate their proposed system.
The first dataset is the GTZAN dataset, which contains 1000 audio clips of 30 s
each, categorized into 10 music genres. The second dataset is the Audio Emotion
Recognition dataset, which contains 593 audio clips of 2–3 min each, categorized
into four emotional classes: happy, sad, calm, and angry [16]. The authors have used
a DCNN architecture with six convolutional layers and two fully connected layers
for emotion recognition. The DCNN model was trained on the GTZAN dataset and
achieved an accuracy of 80.5% for music genre classification. The authors then fine-
tuned the DCNN model on the Audio Emotion Recognition dataset for emotion
recognition, achieving an accuracy of 75.14% [17].
A recent research study proposed a facial emotion-based music recommendation
system that uses computer vision and machine learning algorithms to offer music that
matches a user’s emotional state [18]. The system can recognize six basic emotions:
joyful, sad, angry, surprised, afraid, and neutral. The system’s performance was
examined using a dataset of 1000 songs, and it obtained an excellent accuracy of
78% in accurately recognizing the user’s emotional state [19].
3 Methodology
The methodology for the system involves multiple steps, including playlist creation,
image preprocessing, model building for emotion detection, and music recommen-
dation.
3.1 Playlist Creation

Playlist creation is a challenging task due to the ever-increasing number of new songs. To address this, a neural network-based classifier is developed to categorize songs into multiple emotions. This classifier uses a dense layer with six neurons, one per emotion, followed by a softmax layer. A threshold probability of 0.2 is set, meaning that if a class's probability is at least 0.2, the song is classified under that emotion. This allows songs to be classified into multiple emotions using multi-label classification techniques.
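The thresholding described above can be sketched as follows. The emotion order, the example probabilities, and the fallback to the most likely class are illustrative assumptions; the trained song classifier itself is taken as given.

```python
import numpy as np

EMOTIONS = ["surprise", "fear", "disgust", "happy", "sad", "anger"]
THRESHOLD = 0.2  # a song joins every class whose probability is at least 0.2

def classify_song(probabilities):
    """Map a softmax output vector to one or more emotion labels."""
    probs = np.asarray(probabilities, dtype=float)
    labels = [e for e, p in zip(EMOTIONS, probs) if p >= THRESHOLD]
    # Fallback (an assumption): keep the single most likely emotion
    # if no class reaches the threshold.
    return labels or [EMOTIONS[int(probs.argmax())]]
```

For instance, a vector such as `[0.45, 0.05, 0.02, 0.25, 0.18, 0.05]` yields both "surprise" and "happy", so the song lands in two emotion classes at once.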
3.2 Preprocessing
In the preprocessing step, a webcam or video is used as a data source to capture facial
expressions. Only the face is extracted from the acquired data, as it is the essential
component for depicting emotions from the image data. The images obtained from
the RAF-ML dataset may come in various sizes, so they are resized to a uniform size
of 224 × 224. This standard size ensures consistency and facilitates the passage of
images through the layers of the VGG16 model during transfer learning. Additionally,
any noise present in the image is reduced to improve the quality of the data.
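A minimal sketch of the resize-and-normalize step, assuming the face has already been cropped from the frame; the nearest-neighbour resampling below is a self-contained stand-in for a library resize such as cv2.resize, and the function name is illustrative.

```python
import numpy as np

TARGET = 224  # input size expected by the VGG16 layers

def preprocess_face(face):
    """Resize an (H, W, 3) uint8 face crop to 224x224 and scale to [0, 1]."""
    h, w = face.shape[:2]
    rows = (np.arange(TARGET) * h) // TARGET  # nearest-neighbour row indices
    cols = (np.arange(TARGET) * w) // TARGET  # nearest-neighbour column indices
    resized = face[rows][:, cols]
    return resized.astype("float32") / 255.0
```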
3.3 Model Building

For model building, images of various sizes from the RAF-ML dataset are used. These images undergo cropping, scaling, and normalization before being used to build a model suitable for detecting emotions. Transfer learning is employed by leveraging the pre-trained weights from the VGG16 model, which was originally trained on the ImageNet dataset. The model architecture is modified for emotion detection: the weights of the first four convolutional blocks are frozen while subsequent weights are made trainable, the top layer of VGG16 is removed, and two additional dense layers are added. The final dense layer, with six neurons and a softmax activation function, classifies the emotions present in an image (Table 1).
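A sketch of the modified network in Keras, assuming TensorFlow is available; the function name is ours and the data pipeline is omitted. The text freezes the first four convolutional blocks, which this sketch follows, although the trainable-parameter count in Table 1 suggests block5_conv1 was frozen as well.

```python
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

def build_emotion_model(weights="imagenet"):
    """VGG16 backbone with a new 6-way softmax head (cf. Table 1)."""
    base = VGG16(weights=weights, include_top=False,
                 input_shape=(224, 224, 3))
    for layer in base.layers:
        # Freeze blocks 1-4; only block5 remains trainable.
        layer.trainable = layer.name.startswith("block5")
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(256, activation="relu")(x)
    out = Dense(6, activation="softmax")(x)  # six emotion classes
    model = Model(base.input, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_emotion_model()` downloads the ImageNet weights; the resulting total of 14,847,558 parameters matches Table 1.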
Training: The model is trained using the pre-processed images from the RAF-ML
dataset. Transfer learning allows the model to be initialized with pre-trained weights,
reducing training time and starting with efficient weights. The batch learning tech-
nique is used, where the model learns after processing a batch of samples. Back-
propagation and a loss function are employed to update the model’s weights during
training.
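The batch learning described above can be illustrated with a simple epoch iterator; this is a generic sketch with hypothetical names, since the actual batching and weight updates are handled inside the deep learning framework.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size=32, seed=0):
    """Yield shuffled (inputs, labels) batches covering the data once."""
    order = np.random.default_rng(seed).permutation(len(x))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]
```

During training, backpropagation updates the weights once per yielded batch rather than once per sample.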
Testing: To evaluate the model’s effectiveness, an independent test dataset containing
real-time images is used. The model’s performance in identifying different emotions
from these images is analyzed to gauge its accuracy and overall effectiveness. Based
on the testing results, small adjustments can be made through fine-tuning to further
enhance the model’s performance. Fine-tuning may involve hyperparameter tuning
to optimize the model’s parameters for improved performance on new and unseen
data.
Emotion-Aware Music Recommendations: A Transfer Learning … 5
Table 1 Layer-wise details of the proposed deep architecture designed for emotion recognition

Layer (type)                                        Output shape            Param #
input_2 (InputLayer)                                [(None, 224, 224, 3)]   0
block1_conv1 (Conv2D)                               (None, 224, 224, 64)    1792
block1_conv2 (Conv2D)                               (None, 224, 224, 64)    36,928
block1_pool (MaxPooling2D)                          (None, 112, 112, 64)    0
block2_conv1 (Conv2D)                               (None, 112, 112, 128)   73,856
block2_conv2 (Conv2D)                               (None, 112, 112, 128)   147,584
block2_pool (MaxPooling2D)                          (None, 56, 56, 128)     0
block3_conv1 (Conv2D)                               (None, 56, 56, 256)     295,168
block3_conv2 (Conv2D)                               (None, 56, 56, 256)     590,080
block3_conv3 (Conv2D)                               (None, 56, 56, 256)     590,080
block3_pool (MaxPooling2D)                          (None, 28, 28, 256)     0
block4_conv1 (Conv2D)                               (None, 28, 28, 512)     1,180,160
block4_conv2 (Conv2D)                               (None, 28, 28, 512)     2,359,808
block4_conv3 (Conv2D)                               (None, 28, 28, 512)     2,359,808
block4_pool (MaxPooling2D)                          (None, 14, 14, 512)     0
block5_conv1 (Conv2D)                               (None, 14, 14, 512)     2,359,808
block5_conv2 (Conv2D)                               (None, 14, 14, 512)     2,359,808
block5_conv3 (Conv2D)                               (None, 14, 14, 512)     2,359,808
block5_pool (MaxPooling2D)                          (None, 7, 7, 512)       0
global_average_pooling2d (GlobalAveragePooling2D)   (None, 512)             0
dense (Dense)                                       (None, 256)             131,328
dense_1 (Dense)                                     (None, 6)               1542

Total params: 14,847,558
Trainable params: 4,852,486
Non-trainable params: 9,995,072
3.4 Music Recommendation

In the music recommendation step, the model's ability to predict multiple emotions simultaneously from input images allows for the creation of playlists. The playlist classifier already has a collection of songs associated with each emotion. Once the emotions are determined, the songs corresponding to those emotions are shuffled and compiled into a playlist. This playlist is then recommended to the user with the intention of influencing their current mood. By selecting songs that align with the detected emotions, the playlist aims to create a musical experience that resonates with the user's emotional state and potentially uplifts or modifies their mood (Fig. 1).
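The recommendation step can be sketched as follows; the emotion-to-song mapping and the song names are purely illustrative stand-ins for the playlist classifier's real collection.

```python
import random

# Hypothetical mapping from emotions to songs (stand-in data)
SONGS_BY_EMOTION = {
    "happy": ["song_h1", "song_h2"],
    "sad": ["song_s1", "song_s2"],
    "anger": ["song_a1"],
}

def build_playlist(detected_emotions, seed=None):
    """Gather the songs mapped to each detected emotion and shuffle them."""
    playlist = []
    for emotion in detected_emotions:
        playlist.extend(SONGS_BY_EMOTION.get(emotion, []))
    random.Random(seed).shuffle(playlist)
    return playlist
```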
Fig. 1 Workflow of the proposed transfer learning-based emotion-aware music recommendation system
4 Modified VGG16
The system utilizes transfer learning and specifically leverages the pre-trained
weights of the VGG16 model trained on the ImageNet dataset [20]. Transfer learning
allows the model to benefit from the knowledge and features learned by the VGG16
model on a large-scale image classification task [21]. By using pre-trained weights,
the model can initialize with effective and efficient weights, reducing training time
and improving performance.
The VGG16 architecture consists of 16 layers, including convolutional, pooling,
dense, and output layers. It is known for its deep structure and the use of small 3
× 3 convolutional filters and max-pooling layers. This design enables the network
to capture intricate spatial patterns at multiple scales while reducing the number of
parameters (Fig. 2).
In the modified VGG16 proposed for recognizing facial expressions, the last layer
is removed, and additional dense layers are added to fine-tune the network for the
emotion detection task. The final dense layer includes a softmax activation function,
which is suitable for multi-class classification problems like emotion recognition.
The number of neurons in the output layer corresponds to the number of emotion
classes.
The network architecture starts with an input layer that receives the input data,
typically representing an image. The dimensions of the input layer are determined by
the height, width, and channels of the input image. The convolutional layers perform
convolutions on the input data using learnable filters to extract spatial features. Each
convolutional layer produces a set of feature maps representing the response of
different filters to different parts of the input.
Pooling layers follow the convolutional layers to down-sample the feature maps
while retaining important information. Max-pooling is commonly used, selecting the
maximum value within a pooling window to capture the most prominent features in
each local neighborhood.
The dense layers, also known as fully connected layers, connect every neuron to
every neuron in the previous layer. Each connection has a learned weight, and the
output of a neuron is determined by applying an activation function to the weighted
sum of its inputs.
The output layer is the final layer of the neural network and produces the clas-
sification results. In this system, since it is a multi-class classification problem, the
output layer applies the softmax activation function, which calculates the probability
distribution over the emotion classes.
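The softmax function applied in this output layer is the standard definition; the sketch below is a plain-Python illustration, not code from the system itself.

```python
import math

def softmax(logits):
    """Turn the output layer's raw scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The six returned values are non-negative and sum to 1, one per emotion class.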
Overall, the modified VGG16 model with its layers and fine-tuning allows for
effective recognition of facial expressions and emotion detection. The utilization of
transfer learning and the architectural choices of VGG16 contribute to the system’s
ability to accurately classify emotions based on input images.
Table 2 Emotion-wise details of the benchmark dataset used in the studies

Emotion    Train_count   Test_count
Surprise   862           220
Fear       314           64
Disgust    906           219
Happy      490           111
Sad        620           161
Anger      734           207
5 Experimental Studies
5.1 Dataset
The model in this system is trained on the RAF-ML (Real-world Affective Faces
Database) dataset, which consists of 29,672 facial images. The dataset includes 40
compound taggers and 12 emotion classes. Each image is labeled with multiple affec-
tive attributes, representing various emotional expressions or states displayed by the
people in the photographs. The dataset aims to capture a wide spectrum of emotions.
For this system, a subset of the RAF-ML dataset is used, consisting of 4908 images.
The selected subset only includes images associated with the emotions surprise,
fear, disgust, happiness, sadness, and anger. Each image is classified into the corre-
sponding emotion class based on the presence of certain emotional characteristics.
The distribution of images among the emotions is shown in Table 2.
From the available dataset, as the counts for each emotion class are relatively
balanced, there is no need to perform any under-sampling or over-sampling tech-
niques. The dataset is divided into 80% for training data and 20% for testing
data.
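The 80/20 split can be sketched as a shuffled partition; the helper name and seed are illustrative assumptions.

```python
import random

def train_test_split(items, train_fraction=0.8, seed=42):
    """Shuffle a dataset and split it into train and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Applied to the 4908-image subset, this yields 3926 training and 982 test images, consistent with the per-emotion totals in Table 2.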
The RAF dataset consists of facial images with annotations for various emotional
expressions. The emotions covered in the dataset include anger, surprise, fear, happi-
ness, sadness, and disgust. These emotions are labeled using binary labels indicating
whether a specific emotion is present or not in each image.
To evaluate the performance of a facial emotion recognition model on the RAF
dataset, you can use metrics such as accuracy, precision, recall, and F1 score.
These metrics provide insights into the model’s ability to correctly classify different
emotions.
For example, you can calculate the accuracy of the model by comparing the
predicted emotions with the ground truth labels for the test set. Precision measures
the proportion of correctly predicted positive emotions (true positives) out of all
predicted positive emotions. Recall, on the other hand, measures the proportion of
correctly predicted positive emotions out of all actual positive emotions. The F1
score is the harmonic mean of precision and recall, providing a balanced measure of
the model’s performance.
The specific results obtained will depend on the model architecture, training
approach, and hyperparameter settings. It is important to fine-tune the model and
optimize the hyperparameters to achieve the best performance on the RAF dataset
(Fig. 3; Table 3).
6 Conclusion
In this work, by classifying facial expressions into the six emotions of surprise, fear, happiness, anger, sadness, and disgust, we were able to build a model capable of detecting and classifying these emotions in real time. The effectiveness of the model was evaluated
using an independent test dataset, and the results showed that the system can success-
fully identify different emotions from real-time images. This allows the system to
generate music playlists tailored to the user’s current emotions, enhancing their music
listening experience.
Overall, the developed algorithm provides a personalized and improved user expe-
rience by accurately recognizing and classifying emotions from facial expressions
and generating music playlists that align with the identified emotions. The combi-
nation of facial emotion detection and music recommendation creates a unique and
individualized user experience in the field of mood-based music recommendation
systems.
References
1. Sharma N, Kumar R (2019) Emotion based music recommendation system using machine
learning techniques
2. Bodapati JD et al (2022) A deep learning framework with cross pooled soft attention for facial
expression recognition. J Inst Eng (India) Ser B 103(5):1395–1405
3. Choudhary S, Gupta S (2018) A music recommendation system based on emotion recognition
using artificial neural network
4. Jyostna Devi B, Veeranjaneyulu N (2019) Facial emotion recognition using deep CNN based
features. Int J Innov Technol Eng (IJITEE) 8(7)
5. Janghel NK, Gupta S (2020) Music recommendation system based on facial emotion
recognition using CNN
6. Yang X et al (2019) Affective music recommendation with CNN-based feature learning and
fusion
7. Wei et al (2021) Music recommendation based on face emotion recognition
8. Dhavalikar AS, Kulkarni RK (2014) Face detection and facial expression recognition system.
Institute of Electrical and Electronics Engineers (IEEE)
9. Jaichandran R, Ranjitha J (2021) Facial emotion based music recommendation system using
computer vision and machine learning techniques
10. Ayush G et al (n.d.) Music recommendation by facial analysis
11. Ahmed Hamdy A (n.d.) Emotion-based music player emotion detection from live camera
12. Karpathy A et al (2014) Large-scale video classification with convolutional neural networks
13. Bodapati JD et al (2022) FERNet: a deep CNN architecture for facial expression recognition
in the wild. J Inst Eng (India) Ser B 103(2):439–448
14. Yang YH et al (2008) A regression approach to music emotion recognition. IEEE Trans Audio
Speech Lang Process
15. Synak P, Lewis R, Raś ZW (2005) Extracting emotions from music data. In: International
symposium on methodologies for intelligent systems
16. Song Y, Dixon S, Pearce M (2012) Evaluation of musical features for emotion classification
17. Lee K, Cho M (2011) Mood classification from musical audio using user group-dependent
models
18. Gil et al (2015) Emotion recognition in the wild via convolutional neural networks and mapped
binary patterns
19. Zhang D et al (2020) Multi-modal multi-label emotion detection with modality and label
dependence. In: Proceedings of the 2020 conference on empirical methods in natural language
processing (EMNLP)
20. Bodapati JD, Balaji BB (2023) TumorAwareNet: deep representation learning with attention-based sparse convolutional denoising autoencoder for brain tumor recognition. Multimed Tools Appl 1–19
21. Bodapati JD, Balaji BB (2023) Self-adaptive stacking ensemble approach with attention-based deep neural network models for diabetic retinopathy severity prediction. Multimed Tools Appl 1–20
A Novel Image Captioning Approach
Using CNN and MLP
Abstract Computer vision and natural language processing researchers have dedicated significant time and energy to the problem of automatically creating image descriptions. In this work, we propose an artificially intelligent picture captioner built on a hybrid architecture of convolutional neural networks (CNNs) and multilayer perceptrons (MLPs). This system takes photographs as input and generates captions for those images using algorithms from convolutional neural networks (CNNs), multilayer perceptrons (MLPs), and recurrent neural networks (RNNs). To generate a natural language caption, the CNN first extracts features from the input image, and the MLP then analyses these features. The proposed model is trained on a dataset of photos with captions by optimizing the parameters of a convolutional neural network (CNN) and a multilayer perceptron (MLP) via a hybrid of supervised learning and reinforcement learning. Our results demonstrate that the suggested model can produce captions that are on par with the best methods currently available in terms of accuracy and variety. The artificially intelligent picture captioner has potential uses in many areas, such as social networking, e-commerce, and image retrieval systems.
1 Introduction
The study of natural language processing, picture recognition, and machine learning
all contribute to the field of image captioning. The Internet, newspapers, academic
papers, government reports, and even advertising all contain visuals that we are
exposed to on a daily basis. Images in these sources are left up to the discretion
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_2
14 S. Sharma et al.
of the reader. Even though most pictures don’t have captions, we can nevertheless
make sense of them. However, in order for humans to use automatic image captions,
robots will need to understand a captioning system. The suggested model’s focus is
on providing a descriptive account of an image. Text in an image can be matched with
the correct text with the use of this technology. There are a number of reasons why
image captioning is crucial. Data collection about an object and its connections to
others is required for these jobs. By including a voice recognition feature, it can also
aid the visually impaired in comprehending the content of images. Google Image
Search could benefit from automatic captioning if it were implemented. Having a
searchable caption for every image would be quite helpful. Captioning an image
involves analysing its constituent pieces and then describing them in a single sen-
tence using only ordinary English. Machine learning algorithms typically fall short
because of the challenges presented by complicated data [10]. One source of difficulty
is representing the relation between text and image so that their similarity can be
measured. Two broad classes of approaches can be used to address this problem.
One-to-one matching merges the global feature representations of the two sources
into a single, shared space [17]. Many-to-many matching, by contrast, considers only
the global similarity between an image and its text, disregarding the image’s actual
content; in most cases, this approach fails to reliably retrieve relevant information
across media. Most of the resulting duplicate matches serve no purpose, depend on
the matching process, and add noise that increases the computational complexity
of the model.
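The one-to-one matching idea can be illustrated with a small sketch: assuming an image and a caption have already been projected into a shared feature space (the projection itself is omitted, and the vectors below are invented for illustration), similarity reduces to a cosine score between the two global vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two global feature vectors in the shared space."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 4-dimensional embeddings of one image and two candidate captions.
image_vec = [0.9, 0.1, 0.3, 0.0]
caption_a = [0.8, 0.2, 0.4, 0.1]   # semantically close caption
caption_b = [0.0, 0.9, 0.1, 0.8]   # unrelated caption

# The closer caption scores higher, so it would be retrieved first.
print(cosine_similarity(image_vec, caption_a) > cosine_similarity(image_vec, caption_b))  # → True
```

In a real system the vectors would come from trained image and text encoders; the retrieval step itself is just this ranking by similarity.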
2 Literature Review
The captioning of images was investigated by Alam et al. [1] using deep learning.
Investigating current deep learning methods for picture captioning is the focus of
this study. Both the most important step and the CNN-based approach for generat-
ing photo captions were addressed. Captioning models for videos and images were
created by Amirian et al. [2]. They use deep learning algorithms to automatically
generate captions or descriptions for still images and video clips. To aid the visually
impaired, they are extending this work to deep learning-based automatic caption
generation for both still and moving images. Image captions
were automatically created by Bang and Kim [4] using contextual information. The
method produces data that is temporally and spatially contextualized and visually
informative. The textual description method creates sentences from photo captions.
Data organization methods can be applied to any type of information, including but
not limited to images, texts, timestamps, and geographic locations. Data collected by UAVs can be better
managed if the surrounding environment is known. Deng et al. [5] built a model for
creating image descriptions using a DenseNet network and adaptive attention. In
this research, the use of a visual sentinel as part of an adaptive attention paradigm is
proposed. The model uses DenseNet to capture the global features of an image. To improve
A Novel Image Captioning Approach Using CNN and MLP 15
the output quality of photo captions, an LSTM network is used as the language
generation model. Captions are generated using the Flickr30k and COCO datasets.
Using a model for text-to-image synthesis, Hossain et al. [6] enhanced image
captioning. The solutions employ deep learning models for training and assessing the
models using human-annotated photographs. They present a technique for image
captioning that employs both real and simulated data in the model’s training and val-
idation phases. The synthetic graphics were produced using a Generative Adversarial
Network-based text-to-image generator. Iwamura et al. [7] used a motion convolutional
neural network (motion-CNN) with object detection to generate visual descriptions.
They released a prototype built with MSCOCO, MSR-VTT2016-Image, and public
domain pictures. The datasets were used to generate captions for images and assess
the quality of those captions. Kalra and Leekha [8] assessed CNN’s image captioning
capabilities.
Models were constructed using a CNN for picture embedding and a recurrent
neural network (RNN) for modelling and predicting language. They proposed a
model for generating textual descriptions. Mathur studied several different deep
learning models for picture captioning [8]. Natural language processing is used to
grasp the syntax and semantics of written language, while computer vision is used to
analyse visual data. Caption creation is automated with the help of machine learning.
Researchers employ convolutional neural networks for visual input analysis, and
recurrent neural networks for phrase generation. Phukan and Panda [11] created
a powerful method for deep neural network-based picture captioning. Automatic
captioning is crucial for image data on the internet since entities need to be accurately
identified and handled. They devise an innovative and efficient approach to automatic
picture captioning for single images, and they describe ways to improve the method’s
efficacy and utility.
Sharma and Jalal [12] employed a convolutional neural network and a long short-
term memory to include external knowledge for image captioning. They proposed
an approach to improved image description that integrates both local and external
knowledge from sources like Concept Net. They utilized the Flickr8k and Flickr30k
datasets as examples of how to generate image descriptions. Shi et al. [13] used
the MSCOCO dataset to refine their picture captioning model; the suggested
framework was shown on MSCOCO to achieve better caption generation results
than the baselines. Wang et al.’s [14] method of remote sensing
image captioning utilized a word-sentence framework. The core components of the
proposed structure are a sentence generator and a word extractor; rather than enhancing
the remote sensing image itself, the word extractor identifies pertinent terms from it. The picture
captioning model proposed by Zhang et al. [15] uses gLSTM with visual enhance-
ments. To improve the precision of visual description, they created a paradigm for
guiding long short-term memory (gLSTM) that collects visual information from RoI
and uses it as guiding information in gLSTM. Zhao et al. [16] used structured
attention to caption high-resolution remote sensing images. When compared to other
current deep learning-based captioning systems, attention-based captioning stands
out for its ability to generate text while simultaneously identifying the locations of
relevant objects in images.
The image captions were produced using a neural architecture search by Zhu et al.
[17]. They provided a model for Neural Architecture Search that has proven effective
in a number of image recognition tasks. A recurrent neural network is also necessary
for the process of image captioning. Using a reinforcement learning technique that
relies on shared parameters, they efficiently construct an AutoRNN.
3 Basic Terminologies
Here, we’ll go over some of the most fundamental concepts in computer vision,
natural language processing, and picture captioning:
– CNN: An artificial neural network that processes pixel data. Common applications
include image recognition and image-to-text description, both of which rely on deep
learning. Such a network performs visual processing in a way loosely modelled on
the human brain, and CNNs are designed to process images without first
downsampling them. A convolutional neural network (CNN) is a sparsely connected
multilayer perceptron, a structure chosen to minimize computational
overhead [9]. An input layer, hidden layers, and several convolutional layers
make up its layered architecture [3].
– MLP: A multilayer perceptron is a multi-layered feedforward artificial neural
network, commonly called a “vanilla” neural network. Every such model has at
least three layers: input, hidden, and output. Each successive layer applies a
non-linear activation function. A multilayer perceptron should not be confused with
a perceptron network, nor thought of as a single perceptron with numerous layers;
rather, it is composed of artificial neurons that may be programmed with any
number of different activation functions. Networks of such artificial neuron nodes
and layers were later dubbed multilayer perceptrons.
– RNN: A neural network that generates its next output based on the outcome
of the prior one. When an ordinary model predicts the next word in a sentence, it
often forgets the words before it; the RNN was designed to address this problem
by storing input data in its memory. All inputs are processed with the same
set of parameters, which keeps the model’s complexity down.
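As an illustration of the multilayer perceptron described above, the following is a minimal forward pass through one hidden layer; the weights are arbitrary illustrative numbers, and tanh stands in for any non-linear activation.

```python
import math

def mlp_forward(x, w_hidden, w_output):
    """One hidden layer with a non-linear activation, then a linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_output]

# Arbitrary illustrative weights: 2 inputs -> 3 hidden units -> 1 output.
w_hidden = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
w_output = [[1.0, -0.5, 0.25]]

print(mlp_forward([0.6, -0.1], w_hidden, w_output))
```

The non-linearity between layers is what distinguishes an MLP from a stack of purely linear transformations, which would collapse into a single linear map.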
4 Proposed Approach
and finance. Multiple nodes are interconnected to form a learning system known as
a deep neural network. These parts cooperate to carry out advanced tasks like object
recognition and categorization.
A deep neural network is able to make predictions and categorize data because of
the way its layers interact together. In this paper, we employ a deep learning approach
because of its remarkable capacity to combine existing AI technologies in order to
automatically identify photos, characterize their components, and generate short
phrases that are grammatically acceptable descriptions of those components. Using
convolutional neural network (CNN), multilayer perceptron (MLP), and recurrent
neural network (RNN) techniques, this system will accept input photographs and
create captions for them.
5 Methodology
A plethora of publicly available datasets can be used for our task, such as Flickr8k
and Flickr30k. We use Flickr8k, which contains 8000 pictures, each provided with
five separate captions.
Along with the images, there is also a text file, “Flickr8k.token.txt”, that contains the
names of the images and their captions as shown in Fig. 1.
After mapping the data, the next step is data cleaning, in which a number of
preprocessing steps are applied to reduce processing time.
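As an illustration only (the exact cleaning steps are not enumerated in the text), a typical caption-cleaning routine for this task lowercases the text, strips punctuation, and drops non-alphabetic tokens:

```python
import string

def clean_caption(caption):
    """Lowercase, strip punctuation, and keep only alphabetic tokens."""
    table = str.maketrans('', '', string.punctuation)
    words = caption.lower().translate(table).split()
    return ' '.join(w for w in words if w.isalpha())

print(clean_caption("A dog, 2 cats and a Ball!"))  # → a dog cats and a ball
```

Cleaning like this shrinks the vocabulary the model must learn, which is exactly the processing-time saving the step is meant to provide.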
The model’s vocabulary is the collection of all distinct words it can predict. After
cleaning our descriptions, we will build the vocabulary from them. The model’s output
is a number, and we’ll use that number to look up the predicted word. Since a model
can produce several possible captions, we will select the words using probability.
We’ve separated the photos in our dataset into those for testing and those for training.
Here, we’ll build a mapping that connects each identifier from the training set to its
corresponding caption. Now we’ll utilize an RNN, specifically a layer based on a long short-term
memory (LSTM) network, to generate text. Take the line, “Dog is running”, as an illustration.
In this case, the word “dog” is generated first and passed to the next step, where the
word “is” is generated, and finally the word “running”. At this point, however, the
loop could become infinite, so our model must recognize when to cease. The ‘e.>’
sentence-ending token will be created for this purpose. To complement the first RNN
input, we will additionally generate a start token to be used as an input.
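The generation loop with start and end tokens can be sketched as follows; the trained LSTM is replaced here by a hypothetical next-word lookup table, and the token names '<s>' and '<e>' are placeholders, so only the control flow (stop at the end token, cap the length so the loop cannot become infinite) is meant literally.

```python
# Stand-in for the trained LSTM: maps the current word prefix to the next word.
# '<s>' and '<e>' are placeholder start and end tokens.
NEXT_WORD = {
    ('<s>',): 'dog',
    ('<s>', 'dog'): 'is',
    ('<s>', 'dog', 'is'): 'running',
    ('<s>', 'dog', 'is', 'running'): '<e>',
}

def generate_caption(model, max_len=20):
    """Greedy generation: seed with the start token, stop at the end token or max_len."""
    words = ['<s>']
    while len(words) < max_len:
        nxt = model.get(tuple(words), '<e>')  # unknown prefix also ends generation
        if nxt == '<e>':
            break
        words.append(nxt)
    return ' '.join(words[1:])  # drop the start token

print(generate_caption(NEXT_WORD))  # → dog is running
```

The `max_len` cap is the safeguard against the infinite loop mentioned above: even if the model never emits the end token, generation terminates.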
Image processing is used to enhance an image or to extract data from it. Analog and
digital image processing are the two most common approaches; in either case the end
result is an image, and the results of digital processing can be linked to a particular
photo. Medical visualization, autonomous vehicles, video games, and even law
enforcement all make use of image processing in some capacity. Here, we make use of
a model (ResNet50) pre-trained on ImageNet in order to extract features from images.
ResNet additionally allows for skip connections; an attribute tells us which layers a
certain layer connects to. By avoiding the “vanishing gradient” problem, skip
connections are helpful.
Next, we’ll store the results of the time-consuming operation of encoding all the
photos in an output file. Using the ResNet50 model (see Fig. 2), the output file will
have an image id that corresponds to a feature vector. The vectors have been retrieved
and are ready to be saved. Pickle will be used for archival purposes. It provides us
with the dump and load operations, which we can use to save information on disc
and retrieve it later (Fig. 3).
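The archival step can be sketched with pickle’s dump and load operations; the image ids and feature vectors below are placeholders standing in for real ResNet50 outputs.

```python
import os
import pickle
import tempfile

# Placeholder mapping: image id -> feature vector (real vectors would come from ResNet50).
features = {
    'img_001': [0.12, 0.87, 0.33],
    'img_002': [0.45, 0.02, 0.91],
}

path = os.path.join(tempfile.gettempdir(), 'encoded_images.pkl')
with open(path, 'wb') as f:
    pickle.dump(features, f)   # save the encodings to disk once

with open(path, 'rb') as f:
    restored = pickle.load(f)  # reload them later without re-encoding the photos

print(restored == features)  # → True
```

Because encoding every photo through the CNN is the slow step, persisting the id-to-vector mapping once and reloading it on later runs is the main time saving here.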
blind, this image captioning deep learning model is extremely beneficial. The impli-
cations of our work are significant in various applications such as image annotation,
visual assistance systems, and image search engines. The ability to automatically
generate descriptive captions for images can greatly enhance the accessibility and
understanding of visual content for users.
7 Future Scope
We have detailed the process of creating image captions in this paper. Deep learning
has come a long way, but it’s still not viable to generate correct captions for a
number of reasons (machines can’t think or make decisions as accurately as humans,
for example; there’s no good programming logic or model to do so). With better
technology and more sophisticated deep learning models, we anticipate improved
caption generation in the near future. One potential use of this technique is translating
visual captions into spoken language. For the blind, this is a great aid.
References
1. Alam MS, Narula V, Haldia R, Ganpatrao GN (2021) An empirical study of image caption-
ing using deep learning. In: 2021 5th international conference on trends in electronics and
informatics (ICOEI). IEEE, pp 1039–1044
2. Amirian S, Rasheed K, Taha TR, Arabnia HR (2020) Automatic image and video caption gen-
eration with deep learning: a concise review and algorithmic overlap. IEEE Access 8:218386–
218400
3. Azhar I, Afyouni I, Elnagar A (2021) Facilitated deep learning models for image captioning.
In: 2021 55th annual conference on information sciences and systems (CISS). IEEE, pp 1–6
4. Bang S, Kim H (2020) Context-based information generation for managing UAV-acquired data
using image captioning. Autom Constr 112:103116
5. Deng Z, Jiang Z, Lan R, Huang W, Luo X (2020) Image captioning using DenseNet network
and adaptive attention. Signal Process Image Commun 85:115836
6. Hossain MZ, Sohel F, Shiratuddin MF, Laga H, Bennamoun M (2021) Text to image synthesis
for improved image captioning. IEEE Access 9:64918–64928
7. Iwamura K, Louhi Kasahara JY, Moro A, Yamashita A, Asama H (2021) Image captioning
using motion-CNN with object detection. Sensors 21(4):1270
8. Kalra S, Leekha A (2020) Survey of convolutional neural networks for image captioning. J Inf
Optim Sci 41(1):239–260
9. Mann S, Bindal AK, Balyan A, Shukla V, Gupta Z, Tomar V, Miah S (2022) Multiresolution-
based singular value decomposition approach for breast cancer image classification. BioMed
Res Int 2022. https://doi.org/10.1155/2022/6392206
10. Maroju A, Doma SS, Chandarlapati L (2021) Image caption generating deep learning model.
Int J Eng Res Technol (IJERT) 10(09)
11. Phukan BB, Panda AR (2021) An efficient technique for image captioning using deep neural
network. In: Cognitive informatics and soft computing: proceeding of CISC 2020. Springer,
pp 481–491
12. Sharma H, Jalal AS (2020) Incorporating external knowledge for image captioning using CNN
and LSTM. Mod Phys Lett B 34(28):2050315
13. Shi Z, Zhou X, Qiu X, Zhu X (2020) Improving image captioning with better use of captions.
arXiv preprint arXiv:2006.11807
14. Wang Q, Huang W, Zhang X, Li X (2020) Word-sentence framework for remote sensing image
captioning. IEEE Trans Geosci Remote Sens 59(12):10532–10543
15. Zhang J, Li K, Wang Z, Zhao X, Wang Z (2021) Visual enhanced gLSTM for image captioning.
Expert Syst Appl 184:115462
16. Zhao R, Shi Z, Zou Z (2021) High-resolution remote sensing image captioning based on
structured attention. IEEE Trans Geosci Remote Sens 60:1–14
17. Zhu X, Wang W, Guo L, Liu J (2020) AutoCaption: image captioning with neural architecture
search. arXiv preprint arXiv:2012.09742
Route Optimizations Using Genetic
Algorithms for Wireless Body Area
Networks
Gagan Sharma
Abstract There is a lot of research taking place in this fast-growing technology,
namely Wireless Body Area Networks (WBAN), to improve network performance.
This work focuses on improving the WBAN systems used in hospitals for the
benefit of patients and the convenience of doctors. A lot of prior work in this field
serves as literature: different researchers use different techniques, such as Kruskal’s
algorithm, Prim’s algorithm, and the particle swarm algorithm, to minimize energy
consumption and improve the packet delivery ratio. The energy of a sensor node is
affected by transmission distance, the routing protocol used, and the amount of data
to be transmitted. The genetic algorithm (GA), a natural evolutionary algorithm,
finds the optimal path to the destination, which benefits patients by safeguarding
their valuable data (patient reports). Using GA along with fuzzy logic and cluster
head selection provides better results for packet delivery ratio and normalized
residual energy, which in turn improves network lifetime.
1 Introduction
WBAN is a concept that has recently gained attention and is simple for people to
incorporate into their daily lives. To measure various physiological signals produced
by the human body, nano- and micro-devices are created and implanted on the human
body. A variety of heterogeneous sensors, including EEG, ECG, blood pressure level
monitoring, and other sensors, are employed to measure various bodily parameters
[1]. Each sensor gathers information from its own planted sensor and sends it to the
sink node, which serves as a database for the collection and storage of all information.
Body Sensor Network (BSN) is another name for WBAN. WBAN is used in various
G. Sharma (B)
CSE Department, Chandigarh University, Mohali, India
e-mail: 21mai1043@cuchd.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_3
fields, including the military and law enforcement, as a result of extensive study in
this area. There are a lot of other applications of WBAN.
Protocols are created for communication in WBANs between the body’s sensors and
a data center connected to the internet via a body node. In WBAN, there are two
different methods of communication between nodes:
1. Intra-body communication.
2. Extra-body communication.
A cutting-edge data transmission technique called “intra-body communication”
makes use of the body’s electrical conductivity. The implanted sensors track essential
bodily processes and transmit information to a centralized monitoring unit through
the body.
Communication between personal gadgets and an external network is ensured by
extra-body communication.
2 Related Work
Many researchers have worked on improving wireless networking systems, both to
reduce packet loss during data transmission between sensor nodes and to lower
battery consumption by nodes during transmission, but there is still some room for
improvement where packet loss and battery drain occur, especially in the medical
area. Balouchestani et al. [2] present low-power wireless healthcare systems that
employ the genetic channel model and MFC’s capabilities, and look into the
advantages of using CS theory with MFC. Both patients in hospitals and those at
home can receive treatment thanks to the use of CS in MFC, and the best route is
chosen using an optimization method. Khalilian and
Rezai [3] reviewed WBAN applications in real-time healthcare monitoring systems,
addressed the impacts of remote healthcare monitoring, and proposed the solution of
wearing sensors around the patient’s body. In Chaudhary [4], an energy-efficient FECG signal
optimization utilizing a genetic algorithm has been presented, extending the battery
life of the FECG sensor and reducing the demand on the buffer memory. The FECG
signal’s transit through the network can be made more streamlined and quick in
the future by using more bio-inspired algorithms and compression techniques. Chen
et al. [5] proposed an architecture, namely “beyond-BAN communication”, which is
used to deliver body signals to remote terminals in a timely fashion. Existing
architectures are not suitable for the high mobility of patients and doctors, so by
using this proposed novel network architecture, the QoS of Long-Term Networking
and Named Data Networking can be improved, which improves the overall
performance of the architecture. Dinkar et al. [6] discussed the use of biomedical sensors
for long-distance continuous patient health monitoring. The sensing system worn by
patients should have a small battery. Data is collected at the sink node and then
transferred to the respective doctors, who have internet connectivity on their mobile
phones. Murugeswari and Murugan [7] proposed a synchronized protocol to address
energy efficiency and quality of service in WBAN; they also explain the impact of
latency and of the communication medium of the network, and compare and
validate the proposed protocols against existing approaches. He
et al. [8] proposed protocols which ensure high reliability and provide security.
In this scheme, multiple one-way key hash chains are used to provide authentication.
Hiep and Kohno [9] described that, as the elderly population increases, health issues
increase, and the healthcare market therefore keeps growing. Multi-hop systems are
used in which a relay node forwards data to multiple receivers. High efficiency is
achieved by optimizing the packet rate and the successful transmission probability.
3 Proposed Methodology
The presented work focuses on improving the WBAN systems used in hospitals for
the benefit of patients and the convenience of doctors. Many researchers have
previously worked on this problem: a lot of research has taken place in this field
on minimizing energy consumption and improving the packet delivery ratio using
Kruskal’s algorithm, Prim’s algorithm, and the particle swarm algorithm [10–13].
The aims of the proposed work are as follows:
1. To enhance the technology so as to reduce data transfer loss between sensor
nodes during communication as well as battery consumption by the nodes.
2. For the safety of the patient, battery consumption during communication should
be low: the more battery is consumed, the less the network improves; inversely,
the less battery is consumed, the more improvement occurs in the network.
  evolve P'(t);
  // evaluate new fitness
  evaluate P'(t);
  // identify the survivors
  P := survive(P, P'(t));
od
end GA.
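A minimal, self-contained sketch of this evaluate-and-survive loop applied to route selection is given below; the candidate routes and link costs are invented, fitness is taken as the inverse of total path cost, and the authors’ actual crossover and mutation operators are omitted.

```python
import random

random.seed(42)  # deterministic for illustration

# Hypothetical candidate routes (node sequences) and per-hop link costs.
LINK_COST = {(0, 1): 2, (1, 4): 3, (0, 2): 1, (2, 4): 5, (0, 3): 4, (3, 4): 2}
ROUTES = [[0, 1, 4], [0, 2, 4], [0, 3, 4]]

def fitness(route):
    """Lower total link cost -> higher fitness."""
    cost = sum(LINK_COST[(a, b)] for a, b in zip(route, route[1:]))
    return 1.0 / cost

def ga_select(routes, generations=100, pop_size=6):
    """Evaluate the population, keep the fittest survivors, refill, repeat."""
    population = [random.choice(routes) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)       # evaluate P'(t)
        survivors = population[: pop_size // 2]          # identify the survivors
        refill = [random.choice(routes) for _ in range(pop_size - len(survivors))]
        population = survivors + refill
    return max(population, key=fitness)

print(ga_select(ROUTES))
```

Because survivors are carried over each generation, the best route found so far is never lost, which mirrors the elitist flavour of the schedule described in the text.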
A transmission schedule created by GA is made up of transmission rounds. The
transmission schedule comprises routing pathways that the network must follow
to maximize lifetime. Data is collected from the sensor nodes to the sink. The nodes are
dispersed over the body, as shown in Fig. 2, and genetic algorithms can be used to
optimize routing. The sink node, which stores all data, is located at the waist.
Figure 1 shows the implementation flowchart. First, the possible routes are found
(approximately 10 routes) subject to RSSI parameters. Under RSSI, links with fewer
hops are selected for further processing, because the fewer the hops, the more
packets are delivered to their destination. Then, GA decides the best route by
following the shortest path. If the best route is not found, the loop repeats until the
goal is reached. If the best route is found among the candidate routes, then
parameters like RSSI, battery, fuzzy logic, and SPT are evaluated over the human
body. Nodes communicate with each other, and the shortest path is generated so
that the issues which occur in WBAN are removed (Fig. 2).
So in the proposed work, a Genetic Algorithm is used in which constraint-based
cluster head selection organizes the communication channel. Compared with
Particle Swarm Optimization (PSO) and Dynamic Source Routing (DSR), it is the
more efficient optimization technique.
Figure 3a shows the normalized residual energy for different parameters. By adopting
a load-balancing strategy in the network of nodes, the genetic algorithm achieves the
best overall outcomes by maintaining a greater value of residual energy for the
majority of the network nodes. On the other hand, since other factors like hop count
and RSSI have no effect on balancing network load, relaying nodes would use up
their battery power more quickly than other nodes in the network, which will result
in network partition. For instance, when the sink node is at the ankle and nodes 10
and 11 must relay the majority of the network traffic, genetic algorithm and battery
metrics can balance the network load and maintain both nodes with roughly the same
residual energy level, and Fig. 3b shows clearly that the results of the proposed work
are better than previous results.
Fig. 1 Implementation flowchart
WBAN is a vast field for research in which heterogeneous nodes are deployed over
the human body to measure different body signals. A Mamdani fuzzy inference
system, in which rules are set using and/or fuzzy logic, also shows good results, but
the proposed work uses the best optimization technique, namely GA, which is based
on natural evolution and selects chromosomes on the basis of a fitness function.
Over 100 generations, the best route is selected among the various possible routes
in the shortest span of time, improving the packet delivery ratio and the residual
energy of nodes, and increasing network lifetime. The shortest path is extracted so
that packets reach their destination in the WBAN in the shortest span of time. The
results of the proposed work are far better than those of PSO and DSR at removing
the issues of the WBAN system. This technology is beneficial for handicapped and
elderly people: since old-age patients cannot move much and cannot visit the doctor
often, patients are fitted with sensors or nodes, and doctors can then remotely check
their patients from any place and prescribe them medicines. Future work for WBAN
demands security in data processing and keeping information and data secure;
blockchain technology can be adopted in advanced stages of data security.
References
1. Latre B, Braem B, Moerman I, Blondia C, Demeester P (2011) A survey on wireless body area
networks. Wireless Netw 17:1–18
2. Balouchestani M, Raahemifar K, Krishnan S (2012) Wireless body area networks with
compressed sensing theory. In: Proceedings of 2012 ICME international conference on complex
medical engineering, Kobe, Japan, 1–4 July 2012
3. Khalilian R, Rezai A (2022) Wireless body area network (WBAN) applications necessity in
real time healthcare. In: 2022 IEEE integrated STEM education conference (ISEC), Princeton,
NJ, pp 371–374. https://doi.org/10.1109/ISEC54952.2022.10025199
4. Chaudhary K (2014) Fetal ECG signal optimization on signal obtained from FECG sensor for
remote areas with lower signal strength for its smooth propagation to medical databases. Int J
Sci Res (IJSR)
5. Chen M, Mau DO, Wang X, Wang H (2013) The virtue of sharing: efficient content delivery
in wireless body area networks for ubiquitous healthcare. In: 2013 IEEE 15th international
conference on e-health networking, applications and services (Healthcom 2013), Oct 2013.
IEEE
6. Dinkar P, Gulavani A, Ketkale S, Kadam P, Dabhade S (2013) Remote health monitoring using
wireless body area network. Int J Eng Adv Technol (IJEAT) 2(4). ISSN: 2249-8958
7. Murugeswari S, Murugan K (2019) Improving energy efficiency and quality of service in
wireless body area sensor network using versatile synchronization guard band protocol. In:
2019 IEEE international conference on clean energy and energy efficient electronics circuit for
sustainable development (INCCES), Krishnankoil, pp 1–5. https://doi.org/10.1109/INCCES
47820.2019.9167738
8. He D, Chan S, Zhang Y, Yang H (2014) Lightweight and confidential data discovery and
dissemination for wireless body area networks. IEEE J Biomed Health Inform 18(2)
9. Hiep PT, Kohno R (2013) Optimizing data rate for multiple hop wireless body area network. In:
The 2013 international conference on advanced technologies for communications (ATC’13),
Oct 2013. IEEE
10. Cavallari R, Martelli F, Rosini R, Buratti C, Verdone R (2014) A survey on wireless body area
networks: technologies and design challenges. IEEE Commun Surv Tutor 16(3)
11. Chavez-Santiago R, Nolan K, Holland O, De Nardis L, Ferro J, Barroca N, Borges L, Velez
F, Goncalves V, Balasingham I (2012) Cognitive radio for medical body area networks using
ultra wideband. IEEE Wireless Commun 19(4):74–81
12. Santhosha S (2012) Improving the quality of service in wireless body area networks using
genetic algorithm. IOSR J Eng 2(6)
13. Qadri SF, Awan SA, Amjad M, Anwar M, Shehzad S (2013) Applications, challenges, security
of wireless body area networks (WBANs) and functionality of IEEE 802.15.4/ZIGBEE. Sci
Int (Lahore) 25(4):697–702. ISSN 1013-5316
Prediction Model for the Healthcare
Industry Using Machine Learning
Abstract The role of machine learning in health care is an emerging field of
research for the industry. In machine learning, there are various forms of learning,
including supervised, unsupervised, and reinforcement learning. These strategies
are necessary to discover previously unknown relationships in data that are bene-
ficial to society. In predictive modeling, historical data are used to predict a result
variable. The uses of machine learning in medical care are turning into a benefit
for disease identification and diagnostics. The healthcare industry can benefit from
machine learning’s capacity to assist in the intelligent analysis of huge amounts of
data. Different methods of machine learning for health care, spanning supervised,
unsupervised, semi-supervised, and reinforcement learning, such as SVM, KNN,
K-means clustering, neural networks, and decision trees, provide varying levels of
accuracy, precision, and sensitivity. The area of machine learning (ML) is on the rise.
The purpose of machine learning is to automatically discover patterns and reason
with data. ML offers tailored therapy, dubbed precision medicine. Health care has
benefited from the application of machine learning approaches. Within a few years,
machine learning will alter the healthcare industry.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_4
34 B. K. Saraswat et al.
1 Introduction
The urgent need for machine learning has been highlighted by the massive rise of
data in the medical industry. Machine learning is proving useful for extracting
relevant data from massive stores of information. Machine learning algorithms can
assist with the classification and grouping of data, but the lack of a reliable dataset
presents the biggest challenge in the healthcare industry. Only when the training
datasets are free from corrupt material will the forecasts made using machine
learning algorithms be precise.
Nowadays, in the healthcare industry, machine learning techniques like deep
learning are producing accurate categorization and prediction results for images.
Recent improvements have expanded the sources of data collection.
In [1], the author proposes a neural network-based algorithm for predicting disease
risk based on structured and unstructured hospital data. Using machine learning in
the health sector enables the discovery of disease patterns from huge datasets.
Using machine learning to detect targeted errors, authors have classified datasets of
poisoning attacks to find targeted issues.
Researchers have a significant opportunity to improve the healthcare sector through
the discovery of knowledge contained in electronic data. Using computational
approaches, machine learning manages complex datasets
to draw meaningful conclusions. There is a vast application space for ML algorithms
in the analysis of massive amounts of healthcare data. The machine learning
classifiers presented by the authors are intended to aid with healthcare-related
outcomes, forming a decision support system for the classification of electronic
patient records. The machine learning approaches aid in the analysis of
(a) structured and unstructured data, (b) the grouping of patients with similar
medical signs, and (c) illness prediction.
Machine learning tools aid in the predictive analysis of healthcare
activities; Apache Mahout, Skytree, Karmasphere, and BigML are examples.
These tools carry out big data analytics and data mining. The limitations
and potential applications of image analysis and machine learning in digital pathology
are discussed in [2]. Images containing unstructured and complex data can be used
for unsupervised learning using deep learning techniques in conjunction with ANNs.
When building and training machine learning models, both supervised and unsupervised
learning techniques extract attributes from large amounts of data to create
predictive models.
The authors of [3] conducted a literature review on machine learning in the health-
care industry. Deep learning technology overcomes the following issues of transforming
health data, leading to enhanced healthcare: medical data may have heterogeneous
properties, unstructured content, missing values, data from many domains, and high
complexity. Building predictive models uses data mining and statistical methods.
Prediction Model for the Healthcare Industry Using Machine Learning 35
1.1 Motivation
The use of machine learning and artificial intelligence in health care is urgently
required. The power of languages such as Python can be properly explored in machine
learning to ultimately benefit society. To execute predictive analytics, machine learning
uses classification and clustering algorithms that learn from large datasets.
Predictive outcomes enable doctors to acquire current information about health care
and provide patients with rapid treatment. In order to provide best-in-class,
leading-edge patient care and thereby enhance well-being outcomes, AI in
the healthcare industry has to develop analytical engines and products that alter how
the healthcare ecosystem accesses and generates value from integrated clinical
information.
With increasingly accurate predictions, machine learning aims to create more
favorable results. Machine learning techniques rely largely on computational
power; the field is built on creating algorithms that can accomplish this
using computers' binary yes-or-no logic. Machine learning falls into two
categories: supervised and unsupervised. Clustering methods are used in
unsupervised learning, whereas regression and classification techniques are used in
supervised learning. Figure 1 displays the categories of machine learning.
In the area of medical care, machine learning provides superior results. According
to a report by McKinsey, machine learning in pharmacy and medicine might generate
up to $100 billion annually [5]. This is a result of accelerated decision-making,
the effectiveness of healthcare trials, and enhanced innovation. There are numerous
machine learning applications in pharmacy. They are generally categorized as [6]:
• Identification and diagnosis of disease.
• Behavior Modification/Personalized Treatment.
• Drug Manufacturing/Discovery.
• Research on Healthcare Trials.
• Radiology and Radiotherapy.
• Intelligent Electronic Medical Records.
• Prediction of an Epidemic Outbreak.
Based on data gathered from social media updates, the web, and satellite data,
machine learning and artificial intelligence technologies are frequently employed to
monitor and predict epidemic outbreaks [7]. Applications of machine learning in
health care for disease diagnosis have been developed successfully in the modern
era. In earlier years, a great deal of research was conducted in the field of medicine
to identify disorders using machine learning algorithms. Some recent efforts that
applied machine learning techniques to detect diseases are discussed below.
An approach based on machine learning algorithms has been described in [8] in
order to assist physicians and patients in detecting disease in its early stages. This
study exclusively employed a text-based dataset and created a system that
allows doctors to diagnose common diseases using patient symptoms. A review of
disease diagnosis using machine learning approaches is presented in [9].
That study demonstrates the application of machine learning to high-dimensional and
multi-dimensional data and draws some conclusions about the algorithms' limits.
In [10], another investigation into the use of machine learning algorithms
for medical diagnostics focused on the utilization of several such algorithms for
precise diagnosis. In this paper, a thorough
analysis of machine learning methods for disease diagnosis is conducted. The focus
of this paper is on recent advances in machine learning that have had a substan-
tial impact on disease identification and diagnosis. Numerous studies have been
conducted to predict heart disorders, such as [11], which used a variety of machine
learning algorithms to identify early-stage heart disease, including SVM, RF, KNN,
and ANN classification algorithms. In the research publication [12], NN was used to
create a diagnostic system that could accurately predict the likelihood of developing
a heart condition. Congestive heart failure has emerged as one of the leading killers
in the modern world. Wu et al. [13] describe the development of four classification
models for the precise diagnosis of fatty liver disease and demonstrate
that the random forest model outperforms the other models.
The goal of this paper is to use machine learning techniques to improve liver
disease diagnosis. The major objective of this study is to distinguish liver patients
from healthy individuals using classification algorithms. Machine learning techniques
have been used in [14] to predict hepatitis disease and to forecast cirrhosis
development in a sizable cohort of CHC-infected individuals. In the studies
[15, 16], data mining algorithms were utilized to forecast the stages of renal disease.
The PNN method offers superior performance in both classification and prediction
when compared to the others. Both SVM and DT have been utilized for predicting
chronic renal disease, with SVM performing better than DT. In the publication [17],
a machine learning system is used to make a prediction for type-2 diabetes.
In order to put the prediction model into action, support
vector machine is utilized. The findings of the experiments presented in the research
publication [18] demonstrate that the random forest method is an excellent choice for
developing an effective machine learning model to diagnose diabetes. In the study
[19], researchers utilized four distinct machine learning algorithms in order to make
a diabetes mellitus prediction among the adult population.
Even though researchers have accomplished a great deal in the machine learning arena,
this does not mean that all human work will be replaced by computers any time soon.
To begin with, machine learning is analogous to a tool, which can only function as
effectively as its operator. ML has been a component of research in the
healthcare industry since the 1970s, when it was first used to calibrate antibiotic
dosages for patients with infections.
Pathology is the process of detecting disease by referring to the findings of labo-
ratory tests on physiological fluids and tissues, such as urine, blood, and other bodily
fluids. The traditional approaches, such as using microscopes, can be supplemented
with the use of machine vision and other machine learning technologies. Through
the use of machine learning systems, clinicians are able to diagnose rare diseases by
merging algorithms with facial recognition software. The patient photos are analyzed
and interpreted by using face analysis and machine learning techniques, which allows
for the determination of phenotypes that are associated with uncommon genetic
illnesses (Fig. 2).
[Fig. 2 panels: Keeping Well, Early Detection, Training, Treatment, Decision Making, End of Life Care]
Because the healthcare sector and hospitals provide such a wide variety of essential
services, they face a great deal of difficulty in providing care to patients. When
considering the efficacy of the treatment over the long term, affording the enormous
cost becomes a tough task because hospitals and clinics are constrained by
resource management. The primary objective of hospital administration is to ensure
that patients receive the necessary care by recruiting qualified medical personnel and
efficiently allocating time for the use of diagnostic equipment. Machine learning is a
crucial component in each of these sectors, from the management of inventories to the
notification of emergency departments for patient treatment. It is possible to estimate
the amount of time a patient will have to wait by taking into account factors such
as staffing levels. Patients can be monitored remotely, and virtual nursing assistants
can provide assistance over the phone.
Research into the various patterns of gene expression has been one of the most
important aspects of medical investigation during the past twenty years. There are
already approximately 30,000 coding regions in the human genome, and these are
responsible for the production of a large number of proteins [20]. Therefore, ML-
based solutions are essential in genomic analysis since they tackle problems of a
complex and high dimensional nature in an ideal manner. Gene expression and gene
sequencing are the two basic sorts of genomic analysis that may be carried out with the
assistance of ML. Additionally, machine learning is utilized in gene editing through
the application of CRISPR technology to modify DNA sequencing. The collection of
additional data for the development of efficient machine learning models is necessary
if we are to achieve better deterministic gene alteration.
Diagnostic procedures are a group of examinations that are carried out to identify
diseases, specific medical problems, and infections. The most crucial component in
the improvement of patients is getting an accurate diagnosis as quickly as possible
and then providing treatment that actually works. Errors in diagnosis are respon-
sible for roughly ten percent of all human fatalities and between six and seventeen
percent of all complications that occur in hospitals. ML offers solutions to problems
that arise during the diagnostic process, particularly those that arise when dealing
with oncology and pathology. In addition, when looking at electronic health records
(EHR), machine learning offers improved diagnostic prediction (Fig. 3). In the field
of healthcare research, machine learning is mostly focused on image-based diag-
nosis and image analysis. Imaging that is based on machine learning is a wonderful
fit for the massive datasets that are seen in the healthcare industry since it requires
correct classification of enormous datasets [21, 22]. There is a problem caused by
differences in the imaging technologies used in medical imaging. It is possible that
a model trained on the imaging system of one developer will not
function properly when presented with images taken from another developer's system.
Logistic Regression (LoR) is a method that can be used to categorize data into
outcomes that are either yes or no. It determines the probability that an event will
take place using the logit function, defined as logit(p) = log(p/(1 − p)) =
log(probability of occurrence/probability of non-occurrence) = log(odds). Estimation is
performed using the Maximum Likelihood method, in which the model's coefficients
provide information about the relative relevance of the various input features. Models
can be prevented from overfitting the training data by the use of regularization. Large
sample sizes are required for LoR. The sigmoidal curve that can be seen in the
following figure divides the data into two classes.
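As an illustrative sketch (the synthetic data and parameter choices are ours, not the paper's), a regularized logistic regression of this kind can be fitted with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: two input features, binary "diseased" label
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# L2 regularization (penalty="l2") helps prevent overfitting;
# coefficients are estimated by maximum likelihood
model = LogisticRegression(penalty="l2", C=1.0).fit(X, y)

# Predicted probability of the positive class (sigmoid of the log-odds)
proba = model.predict_proba(X[:5])[:, 1]

# decision_function returns the log-odds (logit) for each sample
logit = model.decision_function(X[:1])
```

The fitted coefficients (`model.coef_`) indicate the relative relevance of each input feature, as described above.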
Neural Networks (NNs) take their cue from the way in which the brain operates. The
machine is able to learn and improve itself through the analysis of new data thanks to
an artificial network of neurons that can produce output as a result of receiving input.
Input data can be converted into output by using the activation function, for example,
the sigmoid function. Overfitting is likely to occur in a Neural Network with a greater
number of constraints; hence, regularization is necessary to avoid this problem. In
order to construct ANN, one must first import the necessary Keras library and then
initialize the architecture of ANN to be sequential. This entails piling all of the layers,
one on top of the other, building the ANN structure, and then stacking them in that
order [25]. The input layer has input = number of features and output = (number of
features + 1)/2. Any hidden layer can have the same number of input and output
units, (number of features + 1)/2. For subsequent layers there is no need to specify
input nodes, because the network is feedforward: the outputs of one layer are used
as the inputs to the next. The output layer has input = (number of features + 1)/2
and output = 1, a binary variable indicating diseased or not. In the hidden layers,
the Rectified Linear Unit (ReLU) activation function can be employed, while the
Sigmoid activation function can be used in the output layer to produce a prediction
between 0 and 1. If the value is larger than 0.5, the patient is predicted to be
diseased; otherwise healthy. To construct a neural network using supervised learning,
one can make use of the Multi-layer Perceptron (MLPClassifier) class provided by the
sklearn.neural_network package. A Multi-layer Perceptron neural network consists of
input, hidden, and output layers [26].
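A minimal sketch of the layer-sizing heuristic above, using scikit-learn's MLPClassifier on synthetic data (the dataset and hyper-parameters are illustrative assumptions, not the paper's configuration):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical data: n_features inputs, binary "diseased" label
rng = np.random.default_rng(1)
n_features = 9
X = rng.normal(size=(300, n_features))
y = (X.sum(axis=1) > 0).astype(int)

# One hidden layer sized by the (number of features + 1)/2 heuristic,
# ReLU activation in the hidden layer; predictions are thresholded at 0.5
hidden = (n_features + 1) // 2
clf = MLPClassifier(hidden_layer_sizes=(hidden,), activation="relu",
                    alpha=1e-3,  # L2 regularization against overfitting
                    max_iter=2000, random_state=0).fit(X, y)

pred = clf.predict(X[:5])  # 1 = diseased, 0 = healthy
```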
In the K-Nearest Neighbours (KNN) classifier, the Euclidean distance is typically
used for numeric variables, and the Hamming distance for categorical variables.
Before computing the distance measure, the training set must first be standardized.
The ideal range for k is anywhere between 3 and 10.
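The standardize-then-classify recipe can be sketched as follows (synthetic numeric data, so Euclidean distance applies; Hamming would be used instead for categorical features):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical numeric data with a binary label
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Standardize before computing distances, as noted above
X_std = StandardScaler().fit_transform(X)

# k chosen from the suggested 3-10 range
knn = KNeighborsClassifier(n_neighbors=5).fit(X_std, y)
pred = knn.predict(X_std[:5])
```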
Decision trees (DT) are non-parametric models that are simple to read and depict,
and they can easily capture nonlinear patterns. DT does not require normalization.
Since DT is biased toward imbalanced datasets, it is recommended to first balance
the dataset before developing the model. DT works by partitioning the feature space
into subparts and then using entropy and information gain, gain ratio, or the Gini
index to determine how well the classes are separated. The ID3, C4.5 (an extension
of ID3), and CART variants of decision tree each make use of these metrics.
Decision trees can overfit noisy data, which can be mitigated by employing
ensembles. They are also suitable for missing value imputation and feature
selection.
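A minimal decision-tree sketch on synthetic data (the feature construction and depth limit are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data with a nonlinear pattern (AND of two features)
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)

# Gini index (the default) or entropy/information gain as the split
# criterion; limiting depth curbs the tendency to overfit noisy data
tree = DecisionTreeClassifier(criterion="gini", max_depth=4,
                              random_state=0).fit(X, y)

# feature_importances_ supports feature selection, as noted above
importances = tree.feature_importances_
```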
3.6 Ensemble
as patient history for the purpose of achieving more precise analytical results and
doing additional research. The purpose of this model is to make predictions about
the patients' prognoses based on the symptoms they present with (Table 1).
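As a hedged sketch of an ensemble built from the classifiers discussed above (the particular base models and synthetic data are our illustrative choices, not those of the paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical symptom-like features with a binary prognosis label
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Soft voting averages the predicted probabilities of the base models,
# typically reducing the variance of any single classifier
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="soft").fit(X, y)

pred = ensemble.predict(X[:5])
```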
4 Conclusion
The development of reliable risk models for health care can be facilitated through the
utilization of AI to the relevant data. The healthcare industry is already struggling
under the weight of an increasing number of patients despite a dearth of appropri-
ately qualified medical professionals. Large corporations such as Google, Enlitic,
and MedAware have all initiated significant projects aimed at enhancing the capa-
bilities of machine learning as well as artificial intelligence systems utilized within
the healthcare sector. It is not possible for there to be a natural rise in the number of
competent healthcare professionals. The application of technologies that use machine
learning can boost the productivity and accuracy of those already in use. The utiliza-
tion of these technologies will assist in the service of more patients in a shorter
amount of time, hence improving healthcare outcomes while also lowering the cost
of health care.
Precision refers to the percentage of patients diagnosed by the machine as diseased
who actually have the disease, whereas sensitivity (recall) refers to the percentage
of all afflicted patients for whom the computer correctly predicted the disease.
Therefore, it is necessary for reputable hospitals to keep data electronically and make use of
it for the purpose of conducting relevant analytics, which can assist both patients and
medical professionals in making fast and correct diagnoses. Therefore, a promising
area of research would be to increase the capability of machine learning algorithms
to generalize, particularly with regard to imbalanced classes and multiclass data.
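Concretely, with a small hypothetical set of labels, the two metrics above can be computed as:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = diseased, 0 = healthy
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

# precision = TP / (TP + FP): of those predicted diseased, how many are
precision = precision_score(y_true, y_pred)  # 3 / (3 + 1) = 0.75
# recall (sensitivity) = TP / (TP + FN): of all diseased, how many found
recall = recall_score(y_true, y_pred)        # 3 / (3 + 1) = 0.75
```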
References
1. Chen M, Hao Y, Hwang K, Wang L, Wang L (2017) Disease prediction by machine learning
over big data from healthcare communities. IEEE Access 5:8869–8879
2. Palanisamy V, Thirunavukarasu R (2017) Implications of big data analytics in developing
healthcare frameworks—a review. J King Saud Univ Comput Inf Sci
3. Madabhushi A, Lee G (2016) Image analysis and machine learning in digital pathology:
challenges and opportunities
4. Quer G, Muse ED, Nikzad N, Topol EJ, Steinhubl SR (2017) Augmenting diagnostic vision
with AI. Lancet 390(10091):221
5. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G,
Thrun S, Dean J (2019) A guide to deep learning in healthcare. Nat Med 25(1):24–29
6. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E,
Agapow PM, Zietz M, Hoffman MM et al (2018) Opportunities and obstacles for deep learning
in biology and medicine. J R Soc Interface 15(141):20170387
7. Gottesman O, Johansson F, Komorowski M, Faisal A, Sontag D, Doshi-Velez F, Celi LA (2019)
Guidelines for reinforcement learning in healthcare. Nat Med 25(1):16–18
8. Sunny AD, Kulshreshtha S, Singh S, Srinabh BM, Sarojadevi H (2018) Disease diagnosis
system by exploring machine learning algorithms. Int J Innov Eng Technol (IJIET) 10(2):14–21
9. Razia S, Swathi Prathyusha P, Vamsi Krishnan N, Sumana S (2017) A review on disease
diagnosis using machine learning techniques. Int J Pure Appl Math 117(16):79–85
10. Pavithra D, Jayanthi AN (2018) A study on machine learning algorithm in medical diagnosis.
Int J Adv Res Comput Sci 9(4):42–46
11. Mamatha Alex P, Shaji SP (2019) Prediction and diagnosis of heart disease patients using data
mining technique. In: International conference on communication and signal processing, pp
0848–0852
12. Subhadra K, Vikas B (2019) Neural network based intelligent system for predicting heart
disease. Int J Innov Technol Explor Eng (IJITEE) 8(5):484–487
13. Wu C-C, Yeh W-C, Hsu W-D, Islam MM, Nguyen PA, Poly TN, Wang Y-C, Yang H-C, Li Y-C
(2019) Prediction of fatty liver disease using machine learning algorithms. Comput Methods
Programs Biomed 170:23–29
14. Konerman MA, Beste LA, Van T, Liu B, Zhang X, Zhu J, Saini SD, Su GL, Nallamothu BK,
Ioannou GN, Waljee AK (2019) Machine learning models to predict disease progression among
veterans with hepatitis C virus. PLoS ONE 14(1):1–14
15. Tian X, Chong Y, Huang Y, Guo P, Li M, Zhang W, Du Z, Li X, Hao Y (2019) Using machine
learning algorithms to predict hepatitis B surface antigen seroclearance. Comput Math Methods
Med 2019:1–7
16. Rady E-HA, Anwar AS (2019) Prediction of kidney disease stages using data mining
algorithms. Inf Med Unlocked 15:100178
17. Abbas H, Alic L, Rios M, Abdul-Ghani M, Qaraqe K (2019) Predicting diabetes in healthy
population through machine learning. In: 2019 IEEE 32nd international symposium on
computer-based medical systems (CBMS), pp 567–570
18. Benbelkacem S, Atmani B (2019) Random forests for diabetes diagnosis. In: 2019 international
conference on computer and information sciences (ICCIS), pp 1–4
19. Faruque MF, Asaduzzaman, Sarker IH (2019) Performance analysis of machine learning tech-
niques to predict diabetes mellitus. In: 2019 international conference on electrical, computer
and communication engineering (ECCE), pp 1–4
20. Alansary A, Oktay O, Li Y, Le Folgoc L, Hou B, Vaillant G et al (2019) Evaluating reinforcement
learning agents for anatomical landmark detection. Med Image Anal 53:156–164
21. Peyrou B, Vignaux J-J, André A (2019) Artificial intelligence and health care. In: André A
(ed) Digital medicine. Springer International Publishing, Cham, pp 29–40. https://doi.org/10.
1007/978-3-319-98216-8_3
22. Sathya D, Sudha V, Jagadeesan D (2019) Application of machine learning techniques in health-
care. In: Handbook of research on applications and implementations of machine learning tech-
niques. IGI Global, Pennsylvania, pp 289–304. https://doi.org/10.4018/978-1-5225-9902-9.
ch015
23. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL (2018) Artificial intelligence
in radiology. Nat Rev Cancer 18:500–510. https://doi.org/10.1038/s41568-018-0016-5
24. Dzobo K, Adotey S, Thomford NE, Dzobo W (2020) Integrating artificial and human intel-
ligence: a partnership for responsible innovation in biomedical engineering and medicine.
OMICS J Integr Biol 24:247–263. https://doi.org/10.1089/omi.2019.0038
25. Luo J, Wu M, Gopukumar D, Zhao Y (2016) Big data application in biomedical research and
health care: a literature review. Biomed Inform Insights 8:BII.S31559
26. Luo Y, Szolovits P, Dighe AS, Baron JM (2016) Using machine learning to predict laboratory
test results. Am J Clin Pathol 145(6):778–788
27. Guleria P, Sood M (2020) Intelligent learning analytics in healthcare sector using machine
learning. In: Jain V, Chatterjee J (eds) Machine learning with health care perspective. Learning
and analytics in intelligent systems, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-
030-40850-3_3
28. Durai V, Ramesh S, Kalthireddy D (2019) Liver disease prediction using machine learning
29. Mehtaj Banu H (2019) Liver disease prediction using machine-learning algorithms. Int J Eng
Adv Technol (IJEAT) 8(6). ISSN: 2249-8958
30. Sivakumar D, Varchagall M, Gusha SAL (2019) Chronic liver disease prediction analysis
based on the impact of life quality attributes. Int J Recent Technol Eng (IJRTE) 7(6S5). ISSN:
2277-3878
31. Ma H, Xu C, Shen Z, Yu C, Li Y (2018) Application of machine learning techniques for
clinical predictive modeling: a cross-sectional study on nonalcoholic fatty liver disease in
China. BioMed Res Int 2018
32. Sontakke S, Lohokare J, Dani R (2017) Diagnosis of liver diseases using machine learning.
In: 2017 international conference on emerging trends & innovation in ICT (ICEI), Feb 2017.
IEEE, pp 129–133
33. Hashem S et al (2017) Comparison of machine learning approaches for prediction of advanced
liver fibrosis in chronic hepatitis C patients. IEEE/ACM Trans Comput Biol Bioinform
15(3):861–868
34. Sindhuja D, Jemina Priyadarsini R (2016) A survey on classification techniques in data mining
for analyzing liver disease disorder. Int J Comput Sci Mob Comput 5(5):483–488
An Efficient Framework for Crime
Prediction Using Feature Engineering
and Machine Learning
Abstract The growth of crime in society produces insecurity among people and
severely impacts the country’s economic development. Understanding crime pat-
terns is necessary to provide a proactive response to curb criminal activities. Most
crime prediction models harnessing machine learning/deep learning techniques fail
to exhibit significant results for the vague and inconsistent dataset. This is due to the
non-consideration of suitable pre-processing and feature engineering techniques,
resulting in an unreliable and inaccurate model. Hence in this work, an efficient
framework for crime prediction using Feature engineering and Machine learning is
proposed. Initially, the vague and inconsistent input dataset is pre-processed using
feature scaling and encoding of input values. After pre-processing, DBSCAN clus-
tering is applied to extract geospatial features, which are used to improve model
accuracy. Then the SelectKBest library is used to select the 10 best features for the
prediction model. Further, the SMOTE library is used to oversample crime types
with the lowest sample size. Finally, for classifying the crime types, XGBoost
and LGBM classifier models are evaluated. The proposed crime prediction model
was validated by deploying an Amazon EC2 P3 instance in the cloud environ-
ment. The proposed model yields an improved accuracy of 62% for the XGBoost
Classifier and 63% for the LGBM Classifier compared with KNN and Decision
Tree.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 49
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_5
50 Vengadeswaran et al.
1 Introduction
2 Related Works
Several research works [6–9] proposed crime prediction models using Machine learn-
ing and Deep learning techniques. Khan et al. [6] propose a novel methodology, where
the exploratory data analysis is performed by constructing different graphs to check
the distribution of data, etc. After examining the data, feature selection and transfor-
mation are done to further supplement the model. Different machine learning models
are then evaluated on the dataset and the results are compared with each other. The
An Efficient Framework for Crime Prediction Using Feature … 51
models used in this paper are naive Bayes, random forest, and gradient-boosting
methods. It was eventually concluded that the gradient-boosting methods performed
better on the dataset. The major problem with this approach is that it didn’t consider
the spatial features, which is a major drawback considering crime is very prone to
geographical features. Lin et al. [7] propose a system to detect crime types using
geographical features. The paper presents a novel method for extracting spatial and
temporal features and inputting them into a model that can utilize the above two
types of features to make a more accurate prediction. The paper also proposes a
grid-based approach for geographical feature extraction that helps represent spatial
features more accurately, thereby increasing the model’s accuracy. Here the major
issue in identifying the crime is that, in a particular region, there may be different
factors that affect the crime rates and types; hence, the factors are very much depen-
dent on the geographical area. Further, Bappee et al. [8] propose a novel method for
predicting crime types by considering spatial features; the paper tries to extract
crime hotspots for the classification algorithm. By considering the spatial features,
the accuracy of the model increased, demonstrating the importance of considering
spatial features. Zhang et al. [9] classify crime types using the XGBoost
classifier and make use of explainable AI to figure out which variables have the
most effect on the overall output, using SHAP values. However, in both cases [8, 9],
the geospatial features are not extracted from the given dataset thereby significantly
reducing the performance during model evaluation. This has motivated us to study
the significance of clustering techniques [10, 11] in detail. Further we explored, how
clustering can be applied to extract the geospatial features from the input dataset and
employ those spatial features that can effectively classify crime types. In this work,
for classifying the crime types, the most promising classifiers such as XGBoost and
LGBM are evaluated for achieving improved accuracy.
3 Proposed Work
In this work, an efficient framework for crime prediction using Feature engineering
and Machine learning is proposed. The detailed work methodology is explained in
this section (Fig. 1).
The region considered for this study is the city of Atlanta, the most populous city
in the state of Georgia, USA. Atlanta encompasses 134.0 square miles, of which
133.2 square miles is land and the rest is water (Fig. 2a). The town is situated in
the foothills of the Appalachian Mountains and east of the Mississippi River. The
Atlanta crime dataset collected from the open-source platform, Kaggle [12], is used
for the evaluation of the proposed work. The dataset contains information about every
crime that has occurred in Atlanta from 2009 to 2017. A detailed description of the
attributes in the dataset is represented in Fig. 2b.
3.2 Pre-processing
In the original crime dataset, there were 11 crime types such as Larceny from Vehi-
cle, Larceny non-Vehicle, Burglary-Residence, Auto theft, Burglary-non Residence,
AGG Assault, Robbery Pedestrian, Robbery-Residence, Robbery-Commercial, Rape,
and Homicide. After initial experimentation, it was found that grouping of crime types
significantly improves the accuracy of crime prediction. Accordingly, the ideal way
of grouping these crime types was identified as follows:
• It was found that crime types such as Homicide, Rape, and Auto theft exhibited
random behaviour and hence were removed entirely from our prediction model.
It was also observed from experiments that the model performed better when all
types of Burglary and Robbery (Burglary-non-Residence, Burglary-Residence,
Robbery-Residence, Robbery-Commercial, and Robbery-Pedestrian) are combined
into one single group called “Burglary”.
After performing the above steps, there were, in total, four crime types. However,
the data still was not in the proper format to be inputted into the machine learning
model; hence three distinct pre-processing steps were performed.
Step 1: Removing Null Values The presence of null values signified data loss
and did not provide any meaningful information; hence all null values were
removed from the dataset using the Pandas library. This facilitates improving
the accuracy of the LGBM classifier used in our proposed work.
Step 2: Feature Scaling Since we are performing the classification using a machine
learning model, the numerical data must be on the same scale. Hence a min–max
scaler was used to scale the numerical columns in our dataset (lat, long, beat). It scales
the values to a specific range without changing the shape of the original distribution.
Due to this, the distribution of attributes will have a smaller standard deviation, which
can suppress the effect of outliers.
x' = (x − min(x)) / (max(x) − min(x))                                        (1)
where x is an original value, .x ' is the normalized value, min(x) is the Minimum
feature value and max(x) is the Maximum feature value.
Step 3: Encoding Input Values Input values that are categorical are encoded using
one-hot encoding. It converts categorical data into a format that can be fed into
machine learning models to improve prediction accuracy. In the proposed work, a
one-hot encoding transform function available in the scikit-learn Python ML library
is applied, which results in a binary vector for each variable.
The original dataset contains nine features, as described in Table 1. These features
were insufficient to classify the crime types accurately; hence for the proposed work,
new temporal features were created. By converting the timestamp column in the
original dataset to DateTime format, temporal features such as day of the week,
day, month, quarter, and holiday were extracted. Another issue to be handled was
the longitude and latitude features. These features provide crucial geospatial data
that was critical in determining the crime type, but the range of the latitude and
longitude column was not significant. Hence, Feature engineering is performed to
extract meaningful information from the data.
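The temporal extraction described above can be sketched with pandas' DateTime accessors. The timestamps are toy values, and the holiday flag would additionally require a holiday calendar (e.g. the `holidays` package), so it is omitted here.

```python
import pandas as pd

# Toy timestamps standing in for the dataset's timestamp column.
df = pd.DataFrame({"timestamp": ["2021-01-01 14:30:00", "2021-07-04 02:15:00"]})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Derive the temporal features named above from the parsed DateTime column.
df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday = 0
df["day"] = df["timestamp"].dt.day
df["month"] = df["timestamp"].dt.month
df["quarter"] = df["timestamp"].dt.quarter

print(df[["day_of_week", "day", "month", "quarter"]])
```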
In this work, Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) is considered for geospatial features. DBSCAN [11] is a density-based
clustering system that clusters points based on the number of neighbouring points.
The DBSCAN clustering will iterate from point to point, calculate the distances
among points, identify core points and then cluster the surrounding points together
[13]. The main shortcomings of DBSCAN, viz. high computation cost and
consideration of only local characteristics while identifying clusters, are alleviated
in this work by using the Mahalanobis distance metric. The Mahalanobis distance
D_M(x) from a point x to a cluster with mean µ and covariance matrix S is defined
by the following equation.
D_M(x) = √((x − µ) S⁻¹ (x − µ)ᵀ)    (2)
To select the best parameter for the clustering, hyper-parameter tuning was per-
formed. The best epsilon value in DBSCAN to maximize the output was identified
using elbow methods. In this work, Latitude, Longitude, and Crime count are con-
sidered as the input vectors. The DBSCAN clusters the input vector and provides
the high-frequency crime hotspots as 3 clusters with increasing intensity (Fig. 3a).
These clusters representing geospatial features are then used for achieving improved
model accuracy.
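The clustering step can be sketched with scikit-learn's DBSCAN, which supports the Mahalanobis metric of Eq. (2) when given an inverse covariance matrix. The blob data below is a synthetic stand-in for the (latitude, longitude, crime count) input vectors, and the eps value is illustrative rather than the elbow-tuned one.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Three synthetic hotspots standing in for the real input vectors.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.5, random_state=0)

# The inverse covariance matrix turns DBSCAN's metric into Eq. (2).
VI = np.linalg.inv(np.cov(X.T))
db = DBSCAN(eps=0.7, min_samples=5, metric="mahalanobis",
            metric_params={"VI": VI}, algorithm="brute").fit(X)

n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print(n_clusters)  # number of discovered hotspots
```

The toy data happens to contain three clusters, mirroring the three hotspot clusters reported above.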
An Efficient Framework for Crime Prediction Using Feature … 55
Since the input dataset contains many features that are not useful for classification
purposes, feature selection is performed on the dataset using the SelectKBest
library. It determines the 10 best features, which are considered for model evaluation.
The SelectKBest method operates on the assumption that the features in a dataset are
independent and identically distributed, and their importance can be quantified by
statistical tests. The higher the feature score, the more critical it is for predicting the
target variable. It selects the ‘k’ best features from the dataset based on their scores
(Fig. 3b).
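Feature selection with SelectKBest can be sketched as below on synthetic data; f_classif is assumed as the scoring function, since the text does not name the statistical test used.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in: 300 samples, 15 features, 4 classes (like the 4 crime types).
X, y = make_classification(n_samples=300, n_features=15, n_informative=6,
                           n_classes=4, random_state=0)

# Score every feature against the target and keep the 10 best.
selector = SelectKBest(score_func=f_classif, k=10)
X_top = selector.fit_transform(X, y)

print(X_top.shape)                         # 10 columns survive
print(selector.get_support(indices=True))  # indices of the selected features
```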
3.5 SMOTE
The target variable has four crime types after the pre-processing steps, which include
Assault, Burglary, Larceny non-vehicle, and Larceny from a vehicle. However, not
all the crime types have an equal number of samples, which can create a massive
problem for the model since it can become biased towards the majority class. The paper [14] describes
the problems related to learning from class-imbalanced datasets. To overcome such
problems, in the proposed work, the SMOTE library is used to oversample the crime
types with the smallest sample sizes. SMOTE stands for synthetic minority oversampling
technique, and it creates synthetic minority class samples. The primary workflow of
SMOTE is that it selects one of the instances of the minority class and selects one
or more of its nearest neighbours based on a distance metric [15]. The significance
of SMOTE on high-dimensional imbalanced datasets has been experimented with and
validated in this work.
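The interpolation idea behind SMOTE can be sketched without the imbalanced-learn library (which the implementation presumably used); smote_sample below is a hypothetical helper that generates synthetic samples between a minority point and one of its nearest minority-class neighbours.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating each chosen
    sample toward one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: self is a neighbour
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))  # pick a minority sample
        j = rng.choice(idx[i][1:])    # pick one of its true neighbours
        lam = rng.random()            # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.random.default_rng(1).normal(size=(20, 5))  # toy minority class
X_new = smote_sample(X_min, n_new=30)
print(X_new.shape)  # 30 new synthetic minority samples
```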
The experiments have been conducted on an Amazon AWS cloud platform [16]
since it is a preferred paradigm for performing massive computations. To carry out
the experiments, the Amazon EC2 P3 instance was established in the cloud, which
delivers high-performance computing for machine learning models. These instances
significantly accelerate machine learning applications. The detailed hardware
specifications are shown in Fig. 4.
The target variables are label encoded before passing to the model. To validate the
performance of the trained model, the input dataset after pre-processing is segregated
into 2 parts, (i) Training dataset (80%) and (ii) Testing dataset (20%). During the
training process, the concept of class weights is used to negate the effect of class
imbalance caused due to the dataset. Moreover, hyper-parameter tuning was also done
to determine the best parameters (Fig. 4). For classifying the crime types, XGBoost
[17] and LGBM [18] classifiers are used in this work.
The LGBM classifier grows trees leaf-wise and is extremely popular for being
lightweight and computationally inexpensive to train. XGBoost grows trees
level-wise, resulting in fewer nodes and hence consuming fewer resources.
The performance of the proposed model is
evaluated through the performance metrics viz. Precision, Recall, and Accuracy as
represented in Fig. 5. The prediction model has been evaluated, and the performance
of the LGBM and XGBoost was studied in comparison with complementary works
already done using KNN [19] and using the Decision Tree [20]. The results for
LGBM and XGBoost on the crime dataset are listed in Tables 1 and 2. Table 1 shows
that the LGBM classifier achieves a maximum accuracy of 63% for the test data and
XGBoost achieves 62% accuracy, whereas Decision Tree and KNN have attained
only a maximum accuracy of 58% and 57%, respectively. Figure 6a, b shows the
graphical representation of the relation between the number of epochs used with
accuracy and loss for the prediction model. In addition, a significant improvement in
the prediction model was achieved due to the adoption of feature engineering, as shown
in Fig. 7. From the graph it is inferred that LGBM achieves a maximum improvement
of 5% ((63 − 60)/60 × 100) in accuracy for the test data.
Outcome: The prediction model will be developed as a Visualization Dashboard
which benefits the law enforcement authorities to better understand crime issues and
provide insights that will enable them to track activities, predict the likelihood of
incidents, effectively deploy resources, and optimize the decision making.
In this work, we studied the crime data of Atlanta to provide predictions using feature
engineering and machine learning. Here, the prominent geospatial clustering
algorithm DBSCAN is used to extract geospatial features, which are then used to increase the
overall model accuracy. The proposed crime prediction model was validated using
XGBoost and LGBM classifier deployed on the Amazon cloud environment. The
proposed model yields an improved accuracy of 62% for the XGBoost Classifier
and 63% for the LGBM Classifier compared with KNN and Decision Tree. This
model helps law enforcement to estimate future crime hotspots better and reduce the
number of crimes happening in the locality.
In future, to increase the prediction accuracy, the model will be evaluated using
big data tools such as Apache Spark for processing data and will be tested with deep
learning algorithms such as LSTM for prediction tasks.
Acknowledgements This research work is supported by the IoT Cloud Research Laboratory, Indian
Institute of Information Technology Kottayam, which provided the infrastructure required for the
experiments.
References
1. https://www.macrotrends.net/countries/WLD/world/crime-rate-statistics
2. https://worldpopulationreview.com/country-rankings/crime-rate-by-country
3. van Dijk J, Nieuwbeerta P, Joudo Larsen J (2021) Global crime patterns: an analysis of survey
data from 166 countries around the world, 2006–2019. J Quant Criminol 1–36
4. Jenga K, Catal C, Kar G (2023) Machine learning in crime prediction. J Ambient Intell Humaniz
Comput 1–27
5. Dakalbab F, Talib MA, Waraga OA, Nassif AB, Abbas S, Nasir Q (2022) Artificial intelligence
& crime prediction: a systematic literature review. Soc Sci Humanit Open 6(1):100342
6. Khan M, Ali A, Alharbi Y (2022) Predicting and preventing crime: a crime prediction model
using San Francisco crime data by classification techniques. Complexity 2022
7. Lin YL, Yen MF, Yu LC (2018) Grid-based crime prediction using geographical features.
ISPRS Int J Geo-Inf 7(8):298
8. Bappee FK, Soares Júnior A, Matwin S (2018) Predicting crime using spatial features. In:
Advances in artificial intelligence: 31st Canadian conference on artificial intelligence, Canadian
AI 2018, Toronto, ON, 8–11 May 2018, proceedings 31. Springer International Publishing, pp
367–373
9. Zhang X, Liu L, Lan M, Song G, Xiao L, Chen J (2022) Interpretable machine learning models
for crime prediction. Comput Environ Urban Syst 94:101789
10. Chen Y, Zhou L, Bouguila N, Wang C, Chen Y, Du J (2021) BLOCK-DBSCAN: fast clustering
for large scale data. Pattern Recognit 109:107624
11. Deng D (2020) DBSCAN clustering algorithm based on density. In: 2020 7th international
forum on electrical engineering and automation (IFEEA), Sept 2020. IEEE, pp 949–953
12. Link: https://www.atlantapd.org/i-want-to/crime-data-downloads
13. Han X, Armenakis C, Jadidi M (2020) DBSCAN optimization for improving marine trajectory
clustering and anomaly detection. Int Arch Photogram Remote Sens Spat Inf Sci 43:455–461
14. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng
21(9):1263–1284
15. Blagus R, Lusa L (2013) SMOTE for high-dimensional class-imbalanced data. BMC Bioinform
14:1–16
16. Amazon AWS Link: https://aws.amazon.com/
17. XGBoost Classifier Documentation: https://xgboost.readthedocs.io/en/stable/
18. LGBM Classifier Documentation: https://lightgbm.readthedocs.io/en/latest/pythonapi/
lightgbm.LGBMClassifier.html
19. Shojaee S, Mustapha A, Sidi F, Jabar MA (2013) A study on classification learning algorithms
to predict crime status. Int J Digit Content Technol Appl 7(9):361
20. Kshatri SS, Narain B (2020) Analytical study of some selected classification algorithms and
crime prediction. Int J Eng Adv Technol 9(6):241–247
Development of an Online Collectable
Items Marketplace Using Modern
Practices of SDLC and Web Technologies
Abstract This research paper is intended to study and implement the development of
a PHP-based system of an online collectable items marketplace which can onboard
multiple users and sellers while keeping a singular admin account. The research
intensely focuses on adapting modern techniques of the software development
lifecycle (SDLC), pattern matching algorithms to optimize search times, secure
multilayer encryption techniques for the login and purchase modules, and overall
security from XSS attacks in the input fields, while documenting the work and
proposing a system with a rich feature set to satisfy user requirements. The first
development cycle has been intensely studied through usability testing, with a tabular representation of
the results. Furthermore, the analysis of these results has been captured graphically
and conclusively a complete Web application is developed compliant to the essential
benchmarks of today’s time.
1 Introduction
Web development is a widely popular domain for development of ideas and materi-
alizing them into real world applications. However, with the advancement of com-
puter science many modern approaches must be put to practice in order to achieve
desired results. The motivation to develop an online antique marketplace is explained
briefly. Firstly, according to IBIS World, the antiques and collectibles sales industry
has grown over the past 5 years by 7.2%. In the year 2018 it reached a revenue of 2
billion USD; in parallel, the number of businesses has grown by 1.3% and the
number of employees by almost 1% (as of July 28, 2019).
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 61
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_6
62 S. Khulbe et al.
Secondly, as per reports from AT&T the global internet usage for purchase has
dramatically increased and makes it all the more essential for businesses to adopt such
means to stay afloat. Thirdly, in order to dive more into security and encryption-based
techniques the system must involve components like payments and login systems
which can be tested at a big scale. Lastly, the desire for speed and responsiveness
must be satisfied and in order to do so the development of such an application must
involve a rich feature set and improved search times. In order to effectively do so
we must delve into the domain of pattern searching algorithms. On the whole this
research is a collaborative effort that lets us understand the modern practices of
software development and how new ideas can be brought about in the lifecycle of
the Web application we plan to build.
Essentially, this research provides a simulation of all the considerations for modern
software development. The problem we aim to deal with involves the top-down
approach of software development, wherein the working components are made so
that they simply tend to work, and the components are later refactored and
documentation developed so that others can collaborate. This approach works, and
with a good skillset it is possible to achieve the desired results. However, we plan
to explore multiple domains as effectively as possible, and in order to do so we shall
follow a bottom-up approach of software development: the system must have a
modular design and must address all the features that have been discussed so far.
This involves identifying roles (such as: a user can edit their profile, onboard, search
categorically, and filter results) and implementing them in the modules they concern.
Secondly, once the identified features are being developed, the novelties that need
to be embedded into the system must be identified.
In our case, the modules concerning login and payment must be encrypted with
the researched algorithms while the modules involving product searching must be
embedded with the algorithms involving pattern matching. The overall system must
be made highly interconnected so that future developments and maintenance is easily
achieved. In our case that would involve making the web-pages dynamically adjust
their behavior as per the real-time changes made by the multiple entities to the
database. In order to develop the application, many techniques involving modern
SDLC were researched, and to suit our means we had to select among the present
SDLC models. Our SDLC hybridizes multiple models, combining the agile
development model with the classical waterfall model.
The waterfall model provided us the needed stability for the development of the
backbone while the agile development was embedded into the modules where the
revised algorithms needed to be tested multiple times. Multiple proposals were made
for the concerned algorithms and will be discussed further into the paper.
2 Literature Survey
To start with our project, we first needed a survey to be conducted. We studied
different SDLC models and gathered some insights about their strengths and
weaknesses. Refer to Table 1 for the best insights that we gathered. Then we studied XSS
Development of an Online Collectable Items Marketplace … 63
attacks. When an attacker inserts malicious scripts into the data, such attacks
are called XSS attacks. There are various models to prevent and detect XSS attacks.
Refer to Table 2 for detailed and structured information about the techniques to
prevent and detect these attacks.
Searching for a product becomes a major concern as the project database starts to
grow: as the size of the data increases, the time complexity of searching that data also
increases. Therefore, we studied string pattern matching algorithms to make product
search efficient and provide a better user experience. Refer to Table 3 for the
analysis of different algorithms.
We wanted our application to be unique and the best; hence, we did market surveys
of all the applications out there in the market, analyzed their pros and cons, and
tried to integrate all the best functionalities while improving on their shortcomings
in our application. For the best insights into the market, refer to Table 4. We are very
much concerned with the security of our application; therefore, we have done an
encryption technique and SQL injection survey for advancements in our future
phases of development. Refer to Tables 5 and 6 for insights about the survey.
3 Proposed Work
After the entire literature analysis, we can finally start to discuss in detail all
the work we propose to do. In the process of development, we also used many
standard techniques like development of UML specified diagrams to carry out the
process in an appropriate manner.
The development process involves many detailed steps and will not be described
entirely in words. Instead, we can state that a total of 25 usable
features were identified and implemented across 3 modules (User, Seller, and
Admin). Now, upon detailed analysis of encryption algorithms and the feature set
already provided in PHP, we used a combination of SHA encryption algorithms with
MD5 so as to provide the system with double-encrypted security.
Strong contenders for the encryption algorithms included salting, which takes a
key-value-based approach to encryption and provides much better security than each
of the algorithms discussed so far. Although this algorithm is very capable on its
own, it consumed a lot of time to encrypt the strings, and thus in comparison the
dual-encryption technique fared better. This encryption technique is intended for the
login and onboarding functionalities of the application. The application also aims
to use a session-based login system mainly because, although it is
possible to develop a cookie-based system, the session-based login system makes
the user experience far more natural. Furthermore, developers ultimately need to rely
on both session and cookies which complicates the code and hence development is
simplified by using such a technique. Lastly, the shortcomings of the session-based
login system are compensated by the usage of encryption algorithms.
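A rough sketch of the described SHA-plus-MD5 "double encryption" (strictly speaking, double hashing) using Python's hashlib; the SHA variant (SHA-256) and the composition order are assumptions, since the text does not specify them.

```python
import hashlib

def double_hash(password: str) -> str:
    """Hash with SHA-256, then hash the hex digest again with MD5
    (a hypothetical reconstruction of the SHA + MD5 scheme above)."""
    sha = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return hashlib.md5(sha.encode("utf-8")).hexdigest()

stored = double_hash("s3cret!")
print(len(stored))                       # MD5 hex digests are 32 characters
print(stored == double_hash("s3cret!"))  # deterministic, so login can compare
```

Note that unsalted MD5/SHA digests are considered weak by current password-storage standards; the sketch only mirrors the scheme described above.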
The entity diagram illustrated below rightfully explains the overall architecture of
the system that we plan to build. The working entities and components are described
in detail and as one tries to implement the system the need for each component is
apparent. Every component has a specific data type which is supported by MySQL as
of the development of this document and every component has its relation/function
specified on the connecting lines of the component which illustrates the functional
dependency amongst the working entities.
The following diagram relates the functions of each entity with their respective
components of the system. The image below represents a data-flow diagram and
uses UML specifications to represent entities, functions, data involved, and the table
concerned for the particular usage.
The high-level abstracted version of the system can be represented with the following
diagram. The tools used are just for representation but ultimately depend on the
developer and what all resources are available at that moment.
To prevent confusion, 'files/sessions' represents only the fact that sessions
are initiated based on actions executed by client-side machines, while sessions
themselves function on the server only. Furthermore, 'files' in this case represents
the pictures, in all accepted formats, used for purposes like user profile pictures,
product image representations, etc.
As per the research, we conclude by implementing some robust ways to prevent
XSS attacks. Firstly, any input from the user is accepted through HTML
forms, and thus every form must be validated at the backend so as to keep the system
more secure. Secondly, all SQL queries are to be executed in atomic transactions so
as to keep the database consistent. Thirdly, the form input must always be stripped
of any scripting tags such as PHP scripts, SQL scripts, and JS scripts. This
technique provides a great deal of prevention from most XSS attacks.
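The stripping step can be sketched as below; the regular expression and the function name are illustrative, and a production system would rely on a vetted sanitization library rather than this sketch.

```python
import html
import re

# Matches opening and closing <script>/<php> style tags in a form field.
SCRIPT_TAGS = re.compile(r"<\s*/?\s*(script|php)[^>]*>", re.IGNORECASE)

def sanitize(field: str) -> str:
    """Strip script-style tags from a form field, then HTML-escape the rest."""
    stripped = SCRIPT_TAGS.sub("", field)
    return html.escape(stripped)

print(sanitize('<script>alert("xss")</script>Hello'))
```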
The system also issues a token for login and redirects all external requests
to the login page, discarding those requests. This completely prevents access to the
inner pages of the website without login, and since the login is secured with double
encryption, the sensitive information is protected.
Due to the modular nature of the web application the database connection token
is stored away securely and even in case of database corruption the connection token
can quickly be redirected to a data backup, thus ensuring the system's reliability. A
reliable way to prevent malicious content from entering into the system is to use a
standard text encoding technique by explicitly specifying it in the code base. This
ensures that all the data is displayed in a consistent fashion throughout the system.
Searching is an essential feature, used frequently for filtering data based on
name or category. While categorical filtering can be done using the database,
the searching functionality must be implemented on the client side, as the client
may request multiple kinds of information. The simplest principle of searching
involves simple matching of all characters of the string. However, such a rudimentary
form of pattern matching is poor. Thus, the better way of handling this is to search for
patterns that resemble the text and sort based on similarity of the search. This provides
us with a more insightful representation of the data. The current algorithms used
involve complex pattern matching algorithms like Knuth Morris Pratt (KMP) pattern
matching algorithms. However, the way a user intends to search the information tends
to resemble pattern matching techniques similar to sliding windows. Thus, in order
to satiate the needs of the user we tend to incline ourselves to a faster technique while
limiting the behavior to the point where the user is able to obtain the desired results.
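The KMP algorithm mentioned above can be sketched in a few lines; the sample text and pattern are illustrative.

```python
def kmp_search(text: str, pattern: str) -> list:
    """Return all start indices where pattern occurs in text (classic KMP)."""
    if not pattern:
        return []
    # Failure table: longest proper prefix of pattern[:i+1] that is also a suffix.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, reusing matched prefixes instead of backtracking.
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # allow overlapping matches
    return hits

print(kmp_search("antique amber ant lamp", "ant"))  # matches at indices 0 and 14
```

The failure table lets the scan run in O(n + m) time, which is the property that makes KMP attractive as the product database grows.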
We successfully developed our application and tested it on various test cases. We
ran the application on 125 test cases covering proper login and logout functionality,
customer onboarding, and a secure and fast payment process. According to our
analysis, 84% of the test cases performed well, 11% performed averagely, and 4%
performed badly (refer to Fig. 4), but we are happy to say that none of our test cases
failed to perform their functionality, and our application worked fine with all the
test cases. Refer to Table 7 for all the categories of the test cases.
In our research paper, we have been successful in developing a PHP based system.
This system is an online marketplace. The marketplace is built to provide users with
a medium to purchase and sell collectable items. The facility for multiple users and
sellers to buy and sell collectables and antiques, overseen by a single admin account,
has been provided in our system.
We have focused our attention mainly on adding modern and new efficient tech-
niques to the online marketplace which has been built. Techniques like modern
software development lifecycle (SDLC) practices and pattern matching algorithms
have been used and implemented, which has the potential of optimizing search
times for users. Multilayer encryption has also been used to maintain security for the login and purchase
modules. Not only that, but the multilayer encryption also provides security from
XSS attacks. Thus, we conclude that our online marketplace is a well-built platform
that is a user-friendly and secure place for buying and selling antiques and
collectables. Going forward, we can see our product being the handiest and the go-to website
for antique collecting enthusiasts. As you can see on our website, we have features
such as
• The seller has a separate account where he can put up items for sale and their
respective prices.
• Also, there is a buyer's account from which the buyer can easily buy and view
various types of products.
We see our product as an absolute win where we can buy antiques and collectables
by sitting within the comfort of our homes and buying is just one click away. We see
our product as a huge success going forward in the future since we all know “time
is money” and our website makes buying very efficient and without any wastage of
time compared to the typical offline auctions where you have to specifically go to
the place to buy the item you want and still there is no guarantee of you acquiring
that product.
• We also aim to have very smooth logistics for the buyers where there is transparency
between them and the sellers using our website.
• We also aim to have a real-time tracking system of the product so that the buyer
knows which process their product is going through and when they can expect the
product delivery.
• We also aim to have a detailed description of our products, wherein the listing
states which century the product is from, its historic importance, and to whom
it belonged.
• As we all know the crypto market is booming massively in the current scenario,
we also aim to accept payments through cryptocurrency.
References
1. Waykar, Y (2013) A study of importance of UML diagrams: with special reference to very
large-sized projects
2. Balaji S, Sundararajan Murugaiyan M (2012) Waterfall vs V-model vs agile: a comparative
study on SDLC
3. Eessaar E (2016) The database normalization theory and the theory of normalized systems:
finding a common ground. Baltic J Mod Comput 4:5–33
4. Bansal, H, Khan R (2018) A review paper on human computer interaction. Int J Adv Res
Computer Sci Software Eng 8(53). https://doi.org/10.23956/ijarcsse.v8i4.630
5. Maurel H, Vidal S, Rezk T (2021) Statically Identifying XSS using deep learning. In: SECRYPT
2021—18th International conference on security and cryptography, Virtual, France, July 2021
6. Khazal IF, Hussain MA (2021) Server side method to detect and prevent stored XSS attack
7. Li C, Wang Y, Miao C, Huang C (2020) Cross-site scripting guardian: a static XSS detector
based on data stream input-output association mining
8. Ankush SD (2014) XSS attack prevention using DOM based filtering API
9. Gupta S, Gupta BB (2016) Automated discovery of JavaScript code injection attacks in PHP
web applications
10. Malviya VK, Saurav S, Gupta A (2013) On Security Issues in Web Applications through cross
site scripting (XSS). In: 2013 20th Asia-Pacific software engineering conference (APSEC), pp
583–588. https://doi.org/10.1109/APSEC.2013.85
11. Diwate R (2013) Study of different algorithms for pattern matching
12. Janani R, Vijayarani S (2019) Information retrieval from web documents using pattern matching
algorithms
13. Aldaej R, Alfowzan L, Alhashem R, Alsmadi MK, Al-Marashdeh I, Badawi UA, Alshabanah
M, Alrajhi D, Tayfour M (2018) Analyzing, designing and implementing a web-based auction
online system
14. Erna P, Herdi A, Enjun J, Venkata Harsha N (2020) An architecture of E-marketplace platform
for agribusiness in Indonesia. MSCEIS, EAI. https://doi.org/10.4108/eai.12-10-2019.2296542
15. Kuhmonen S (2017) One-time password implementation for two-factor authentication
16. School of Mathematics and Information Technology, Nanjing Xiaozhuang College, Nanjing,
China
17. Sinha S (2020) Secure login system for online transaction using two layer authentication
protocol
18. Kumar B, Yadav S (2016) Storageless credentials and secure login. In: ACM International
conference proceeding series, 04–05-Mar 2016. https://doi.org/10.1145/2905055.2905113
19. Guljari E, Lokhande S, Mande S, Reddy L, Uma Maheswari K (2016) Authentication of users
by typing pattern: a review
20. Devi R, Venkatesan R, Koteeswaran R (2016) A study on SQL injection techniques. Int J Pharm
Technol 8:22405–22415
21. Singh JP (2016) Analysis of SQL injection detection techniques
22. Som S, Sinha S, Kataria R (2016) Study on SQL injection attacks: mode, detection and pre-
vention
23. Alwan Z, Younis M (2017) Detection and prevention of SQL injection attack: a survey. Int J
Comput Sci Mob Comput 68:5–17
24. Oluwakemi A, Abdullahi A, Haruna D, Oluwatobi A, Kayode A (2020) A novel technique to
prevent SQL injection and cross-site scripting attacks using Knuth-Morris-Pratt string match
algorithm. EURASIP J Inf Secur. https://doi.org/10.1186/s13635-020-00113-y
25. Rawat R, Shrivastav S (2012) SQL injection attack detection using SVM. Int J Comput Appl
42:1–4. https://doi.org/10.5120/5749-7043
Facial Emotion Recognition Using
Machine Learning Algorithms: Methods
and Techniques
Akshat Gupta
Abstract The interpretation of a person’s state of mind relies heavily on their facial
expressions. The identification of facial expressions requires the categorization of
a number of different feelings, such as neutral, happy, and angry. This application is
becoming increasingly significant in day-to-day living. Identifying emotions can be
accomplished through a variety of strategies, such as those based on machine learning and
artificial intelligence (AI) techniques. Methods such as deep learning and image clas-
sification are utilised in order to recognise facial expressions and categorise them in
accordance with the corresponding photographs. For the purpose of training expres-
sion recognition models, a variety of datasets are utilised. This article provides a
comprehensive analysis of the facial recognition systems and techniques that have
been utilised in the past, as well as those that have been developed more recently,
together with the methodology behind them and some examples of their applications.
1 Introduction
Communication between humans takes place not only through the use of spoken
language but also through the use of non-verbal cues such as hand gestures and facial
expressions, which are employed to convey emotions and provide feedback [1]. They
also assist us in comprehending the motivations of other people [2]. Several studies
[3, 4] have found that non-verbal components are responsible for conveying two-
thirds of human communication, while verbal components are only responsible for
conveying one-third. The rapid development of techniques for artificial intelligence,
A. Gupta (B)
Department of CSE, National Institute of Technology, Goa, India
e-mail: akshatgupta@nitgoa.ac.in; akshat18jan@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 75
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_7
76 A. Gupta
such as in Human–Computer Interaction (HCI) [5, 6], Virtual Reality (VR) [7],
Augmented Reality (AR) [8], Advanced Driver Assistance Systems (ADAS) [9],
and Entertainment [10, 11], is increasing the demand for facial emotion recognition.
Because a person’s circumstances and the people with whom they contact might
cause them to feel a wide range of emotions, that person is prone to experiencing
mood swings, which can in turn have an impact on their day-to-day activities. People
are always facing difficult emotional choices, which impedes both their produc-
tivity and their own personal development. Given that emotions are merely states
of consciousness, there is a strong connection between them and the work that we
undertake. Therefore, all it takes is a few simple actions or pursuits for a person’s
frame of mind to shift in a positive direction. If computers are able to recognise
these emotional inputs, then they will be able to provide users with assistance that is
both clear and suitable in a manner that is tailored to the user’s specific requirements
and preferences. It is a fundamental tenet of psychology theory that human feelings
can be categorised into one of seven archetypal states: surprise, fear, disgust, anger,
happiness, sorrow, or neutral [1]. This categorization is widely accepted in the field.
The movement of the face is an essential component in the communication of these
feelings. The expressions that are created by the face’s muscles can be altered, both
naturally and deliberately, to convey a variety of emotions. Recognising these
expressions therefore allows computers to respond in a way that caters to the user's
specific requirements and preferences while supporting the user's wellbeing (Fig. 1).
The ability to read another person's feelings from their facial expressions differs
from person to person and can also vary with age. Advances in technology have made
it much simpler to identify the feelings that other people are experiencing; however,
the accuracy of these methods is often quite poor. Accuracy can be improved by
combining different methods to analyse human expressions from multiple sources
of data, such as psychological signals, audio, or video, which can yield a significant
gain in recognition performance.
2 Literature Review
Ekman et al. took the first step towards establishing universality in facial termi-
nology. These 'universal facial expressions' are those used to represent happiness,
sadness, anger, surprise, and neutrality [12]. Multiple intelligent machines and
neural networks are utilised in instrumenting systems that detect emotions [13].
A thorough investigation into the recognition of facial expressions of emotion
is presented in [14], which also describes the characteristics of a dataset and a
classifier for emotion detection. The article [15] explores the visual features of
the image and discusses some classifier techniques, both of which are thought-
provoking inputs to further evaluation of emotion recognition methods. The ability
to recognise facial expressions of emotion is investigated and analysed across all
scientific domains [16].
Because of the excellent accuracy achieved using filter banks and a deep CNN [17],
we can conclude that deep learning is also well suited to emotion identification;
these methods determine emotion from face photos. Facial emotion recognition can
also be performed with image spectrograms combined with deep convolutional
networks, as described in [18]. It is likewise feasible to recognise emotions from
other features, such as speech, or with bi-modal systems that take into account both
the speaker's voice and their facial expressions [1, 2]. A number of academics have
put forward the idea that a system should both induce and guide the learner's
emotions towards the appropriate state; first and foremost, however, the system
needs to be able to understand the learner's feelings.
Hybrid tactics, which employ several methods in tandem with various affective
cues, have also been developed; the analysis in [3] also provides insight into a
modern composite deep learning strategy.
Table 1 summarises the five years of development in this field. Emotion analysis on
faces has been done using many different datasets, some of which are included here:
CK, JAFFE, CMU, FER2013, DISFA, RAFD, CFEE, SFEW, and FERA. Clearly,
there are large output differences between the different implementations, suggesting
an ongoing need for a standardised benchmark, analogous to the well-known
ImageNet dataset used for CNN models, so that controlled comparisons can be
carried out.
In a study available at [19], the authors provide a brief summary of the many
studies on facial expression recognition (FER) conducted over the past few decades.
First, the typical FER processes are outlined; then a brief overview of the main
kinds of FER systems and the primary algorithms they employ is provided.
Research into the use of multilevel decisions inside a deep convolutional neural
network for facial emotion recognition was conducted by Hai-Duong Nguyen [20].
They present an evidence-based model that deliberately includes a hierarchy of
attributes to facilitate classification. When tested on the FER2013 dataset, the
model's performance was shown to be on par with other advanced approaches.
Using a feedforward learning approach, the authors of [21] developed a facial
expression recognition algorithm for use in classrooms.
The authors of [22] proposed a strategy that combined Local Binary Pattern (LBP)
features obtained from facial expressions with Oriented FAST and Rotated BRIEF
(ORB) features and a support vector machine. The work in [23] introduced a self-
attention mechanism based on a residual network and achieved accuracy rates of
98.90% and 74.15% on the CK+ and FER2013 datasets, respectively.
Using a pre-trained face verification model and the Xception algorithm, Zahara
et al. [24] developed a micro-expression prediction system for facial images with
enhanced features.
Facial Emotion Recognition Using Machine Learning Algorithms … 79
Cascaded regression trees are used; pixel intensities are employed to differentiate
between the various facial regions, ultimately leading to the identification of 68
facial landmarks [25] (Fig. 3).
The author of [25] uses only 19 major points out of the 68 recovered landmarks,
focusing on the areas surrounding the mouth, eyes, and brows (as shown in
Fig. 4).
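The landmark-selection step can be sketched as follows. The concrete index ranges below follow the common iBUG 300-W 68-point convention (brows 17–26, eyes 36–47, mouth 48–67) and are an assumption, since [25] does not list which 19 of the 68 points it keeps; this sketch retains the full brow, eye and mouth regions instead.

```python
import numpy as np

# Assumed iBUG 300-W 68-point layout: brows 17-26, eyes 36-47, mouth 48-67.
REGION_IDX = list(range(17, 27)) + list(range(36, 48)) + list(range(48, 68))

def select_landmarks(landmarks):
    """Keep only the (x, y) landmarks around the brows, eyes and mouth."""
    pts = np.asarray(landmarks)   # shape (68, 2)
    return pts[REGION_IDX]        # shape (42, 2)
```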
A support vector machine (SVM) is one of the most effective classification
techniques. It maximises the margin, i.e. the distance between the decision boundary
and the nearest samples of each class, which should exist to prevent any form of
overlap [26].
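The margin idea can be illustrated with a minimal linear SVM trained by sub-gradient descent on the hinge loss. This is a deliberate simplification of the solvers used in practice (e.g. SMO), intended only to show how points inside the margin push the boundary away:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise hinge loss + L2 penalty with sub-gradient descent.
    Labels y must be in {-1, +1}. Returns weight vector w and bias b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in range(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:              # point inside the margin: push it out
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                       # correctly classified: only regularise
                w -= lr * lam * w
    return w, b
```

On linearly separable data the learned boundary separates the two classes with the largest margin this simple update can find.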
In the preceding part, we saw that many different kinds of algorithms have been
employed in face detection; however, there are still many obstacles to overcome in
the field of emotion detection. These include:
• Detection and acknowledgement in real time.
• Recognising changing expressions and facial movements.
• Continuous detection.
• Incomplete data.
6 Conclusion
Our objective was to analyse the facial expressions of users to determine their present
state of mind and then make suggestions tailored to that understanding so that users
might improve their disposition and have a more enjoyable experience. The system
makes an effort to supply the person who is in a state of distress or rage with movies
or activities that are designed to relieve stress.
It also suggests information that will motivate users to overcome feelings of sorrow
and sadness and continue to experience joy in their lives. This research is intended
to propose a solution that is not only quicker but also simpler and more succinct.
It can snap a photo with the help of a straightforward user interface, like that of an
Android app, and then proceed to carry out the necessary actions in a manner that
will not aggravate the user. Because it makes use of contemporary deep learning
techniques, such as convolutional neural network, this system is more dependable
References
1. Busso ZD, Yildirim S, Bulut M, Lee CM, Kazemzadeh A, Lee S, Neumann U, Narayanan
S (2004) Analysis of emotion recognition using facial expressions, speech and multimodal
information. In: Proceedings of ICMI'04, October 13–15, 2004, State College, Pennsylvania
2. Imani M, Montazer GA (2019) A survey of emotion recognition methods with emphasis on
E-Learning environments. J Netw Comput Appl 147:102423
3. Ko BC (2018) A brief review of facial emotion recognition based on visual information. MDPI
J 18(2):401
4. Mehrabian A (1968) Communication without words. Psychol Today 2:53–56
5. Kaulard K, Cunningham DW, Bülthoff HH, Wallraven C (2012) The MPI facial expression
database: a validated database of emotional and conversational facial expressions. PLoS ONE
7:e32321
6. Dornaika F, Raducanu B (2007) Efficient facial expression recognition for human robot interaction. In: Proceedings of the 9th international work-conference on artificial neural networks on
computational and ambient intelligence, San Sebastián, Spain, 20–22 June 2007, pp 700–708
7. Bartneck S, Lyons MJ (2007) HCI and the face: towards an art of the soluble. In: Proceedings of
the international conference on human computer interaction: interaction design and usability,
Beijing, China, 22–27 July 2007, pp 20–29
8. Hickson S, Dufour N, Sud A, Kwatra V, Essa IA (2017) Eyemotion: classifying facial
expressions in VR using eye-tracking cameras
9. Chen H, Lee IJ, Lin LY (2015) Augmented reality-based self-facial modeling to promote the
emotional expression and social skills of adolescents with autism spectrum disorders. Res Dev
Disabil 36:396–403
10. Assari MA, Rahmati M (2011) Driver drowsiness detection using face expression recogni-
tion. In: Proceedings of the IEEE international conference on signal and image processing
applications, Kuala Lumpur, Malaysia, 16–18 November 2011, pp 337–341
11. Zhan C, Li W, Ogunbona P, Safaei F (2008) A real-time facial expression recognition system
for online games. Int J Comput Games Technol 2008:1–7
12. Azcarate A, Hageloh F, van de Sande K, Valenti R (2005) Automatic facial emotion recognition.
Universiteit van Amsterdam
13. Khan R, Sharif O (2017) A literature review on emotion recognition using various methods.
Global J Comput Sci Technol F Graph Vis 17(1):1
14. Hintin G, Greves S, Mohemed A (2013) Emotion recognition with deep recurrent neural
networks. In: Proceedings of the 2013 IEEE international conference on acoustics, speech
and signal processing, pp 6645–6649
15. Weint K, Huaang CW (2017) Characterizing types of convolution in deep convolutional
recurrent neural networks for robust speech emotion recognition, pp 1–19
16. Routrey MS, Kabisetpathy P (2018) Database, features and classifiers for emotion recognition:
a review. Int J Speech Technol
17. Hueng KY, Wiu CH, Yieng TH, Sha MH, Chiu JH (2016) Emotion recognition using auto-
encoder bottleneck features and LSTM. In: Proceedings of the 2016 international conference
on orange technologies (ICOT), pp 1–4
18. Sttilar MN, Leich M, Bolie RS, Skinter M (2017) Real time emotion recognition using RGB
image classification and transfer learning. In: Proceedings of the 2017 11th international
conference, signal processing communication systems, pp 1–8
19. Ko BC (2018) A brief review of facial emotion recognition based on visual information. Sensors
18(2):401–421
20. Nguyen HD, Yeom S, Lee GS, Yang HJ, Na IS, Kim SH (2019) Facial emotion recognition
using an ensemble of multi-level convolutional neural networks. Int J Pattern Recogn Artif
Intell 33(11):128
21. Bhatti YK, Jamil A, Nida N, Yousaf MH, Viriri S, Velastin SA (2021) Facial expression
recognition of instructor using deep features and extreme learning machine. Comput Intell
Neurosci 2021:1–14
22. Niu B, Gao Z, Guo B (2021) Facial expression recognition with LBP and ORB features. Comput
Intell Neurosci 2021:1–16
23. Daihong J, Yuanzheng H, Lei D, Jin P (2021) Facial expression recognition based on attention
mechanism. Sci Program 2021:1–18
24. Zahara L, Musa P, Prasetyo Wibowo E, Karim I, Bahri Musa S (2020) The facial emotion
recognition dataset for prediction system of micro-expressions face using the convolutional
neural network algorithm based Raspberry Pi. In: Proceedings of the 5th international conference
on informatics and computing, pp 1–9
25. Swinkels W, Claesen L, Xiao F, Shen H (2017) SVM point-based real-time emotion detection.
In: Proceedings of the 2017 IEEE conference on dependable and secure computing, Taipei
26. Kaya GT (2013) A hybrid model for classification of remote sensing ımages with linear SVM
and support vector selection and adaptation. IEEE J AEORS 6(4):1988–1997
27. Sheykhmousa M, Mahdianpari M, Ghanbari H, Mohammadimanesh F, Ghamisi P, Homayouni
S (2020) Support vector machine versus random forest for remote sensing image classification:
a meta-analysis and systematic review. IEEE J Sel Top Appl Earth Obs Remote Sens 13:6308–
6325
28. https://www.geeksforgeeks.org/introduction-convolution-neural-network/
29. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
30. Li CEJ, Zhao L (2019) Emotion recognition using convolutional neural networks. Purdue
University Purdue e-Pubs Purdue undergraduate research conference 2019
Similar Intensity-Based Euclidean
Distance Feature Vector
for Mammogram Image Classification
1 Introduction
In cancer, some categories of abnormal cells grow out of control and, if not treated
on time, may cause the patient's death. Statistics show a tremendous increase in
cancer cases during the last few decades. As described in the estimation report [1]
of the American Cancer Society (ACS) for 2023, more than 1,958,310 new cancer
cases and 609,820 cancer deaths are expected in the USA alone. Of these, 297,790 cases
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 85
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_8
86 B. P. Sharma and R. K. Purwar
belong to breast cancer alone, the category-wise maximum among women. Among
these breast cancer cases, 43,170 expected deaths represent a high mortality rate.
Table 1 presents the Indian cancer estimates of the National Cancer Registry
Programme (NCRP) report-2020 [2]. As in many other countries, breast cancer is the
leading site of cancer among Indian women. The estimated number of breast cancer
cases in 2020 was 205,424, which covered 14.8% of all cancer cases, whereas the
corresponding estimates for 2025 are 232,832 cases and 14.8%, respectively. It is
estimated that, every 8 min, one woman in India is diagnosed with breast cancer
and one woman dies of it.
As suggested in the reports published by cancer-controlling authorities [1–3]
of various countries, the risk of developing cancer increases with age. In addition,
improved average survival rate and reduced mortality and severity have been observed
for the cases detected at an earlier stage and appropriately treated. So, many coun-
tries suggest regular breast checkups to identify the presence of any abnormality
at an earlier stage. For this purpose, many breast screening modalities are used
in medical science to view and analyze the possibility of breast cancer. Some of
them are mammography, magnetic resonance imaging (MRI), computed tomog-
raphy (CT) scan, biopsy, thermography, breast ultrasound, electrical impedance
tomography (EIT), digital breast tomosynthesis (DBT), etc. Among them, mammog-
raphy is economical, less time-consuming, non-invasive and the preferred screening
technique.
Given the global shortage of radiologists, such regular screening creates a
considerable burden on them. To overcome this problem, there is a need for
automatic (computer-aided) analysis of mammogram images for breast cancer detec-
tion, which can prioritize the mammogram images for manual analysis by radiologists
and provide them with an additional opinion. Inspired by this need, the authors propose
a model which automates this process and classifies the input mammogram images
into normal, benign and malignant categories. The mammograms classified as malignant
are supposed to be cancerous and need the radiologist's advice on priority. Several
steps of this automation process, such as mammogram enhancement, noise removal,
region of interest (ROI) segmentation and augmentation, are inspired by [4], and the
resultant dataset (formed from the MIAS dataset) is used in this work. Further, the
Euclidean distance between similar-intensity pixels is used for the extraction of
prominent features. Finally, the Ensemble Subspace KNN is used as
Table 1 Estimated number of breast cancer cases, all cancer cases in females and both males and
females for the quinquennial years 2015, 2020 and 2025
Year | (a) Cancer cases in all sites (males + females) | (b) Female cancer cases from all sites | (c) Breast cancer cases | (d) Breast cancer cases (%) among females | (e) Breast cancer cases (%) to (males + females)
2015 | 1,228,939 | 627,202 | 180,252 | 28.74 | 14.67
2020 | 1,392,179 | 712,758 | 205,424 | 28.82 | 14.76
2025 | 1,569,793 | 806,218 | 232,832 | 28.88 | 14.83
the primary classification model. To balance the data and get reliable results, 2000
mammogram images (a total of 6000) are randomly taken from each of the three
categories for performance analysis.
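The Ensemble Subspace KNN classifier named above follows the random-subspace idea: each of several nearest-neighbour learners sees only a random subset of the feature dimensions, and the learners vote. The sketch below is an illustration of that idea in plain NumPy, not the implementation actually used; the learner count and subspace size default to the values reported later (30 learners, 128-dimensional subspaces).

```python
import numpy as np

def subspace_knn_predict(X_train, y_train, X_test, n_learners=30,
                         subspace_dim=128, k=1, seed=0):
    """Random-subspace ensemble of k-NN learners with majority voting."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((len(X_test), n_learners), dtype=int)
    for m in range(n_learners):
        # each learner gets its own random subset of feature dimensions
        dims = rng.choice(X_train.shape[1], size=subspace_dim, replace=False)
        Xtr, Xte = X_train[:, dims], X_test[:, dims]
        # pairwise Euclidean distances, then the k nearest training samples
        d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
        nn = np.argsort(d, axis=1)[:, :k]
        # majority label among the k neighbours of each test sample
        votes[:, m] = [np.bincount(y_train[row]).argmax() for row in nn]
    # majority vote across all learners
    return np.array([np.bincount(v).argmax() for v in votes])
```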
2 Literature Survey
3 Proposed Work
In the proposed work, the input mammogram images are classified into one of
three classes: benign (a lump is present but probably non-cancerous), malignant
(probably cancerous) and normal (no lump present). Figure 1 represents the
stages of the proposed computer-aided detection/diagnosis (CAD) model for breast
cancer detection. These stages are described subsequently:
The dataset proposed in [4], derived from the mammogram images of the mini-
Mammographic Image Analysis Society (mini-MIAS) dataset [6], is used in this
work. The first four stages of the proposed work are adopted from that dataset.
The ROI images were extracted using its description document (DD); a similar
approach was used for benign and malignant mammograms, whereas another
approach was adopted for normal mammograms. The size of the dataset was extended
by rotation (at 10, 20, 30, …, 350°) and vertical flip operations, generating
71 additional transformed copies of each ROI image. After normalization, a dataset
of 23,256 images of resolution 128 × 128 pixels was formed, consisting of 4608
benign, 3744 malignant and 14,904 normal mammograms. This extended dataset
provides sufficient images for the classification model's training and testing.
There is a considerable variation in the number of images in all three categories of the
used dataset. Its direct usage may provide inaccurate and biased results. Therefore,
2000 images from each category (a total of 6000 images) are randomly selected for
the classifier’s training and validation to get reliable results.
Each image of the used dataset is an 8-bit grayscale image with pixel intensities
between 0 and 255. For each gray-level intensity, the sum of the Euclidean distances
between similar-intensity pixels is used as a feature vector (FV) component. Since
there are 256 different gray levels in total, the size of the FV is 256. This FV is
computed for each image using the pseudocode given in Algorithm 1.
Algorithm 1
Input: ROI image of resolution 128 × 128 pixels. Each pixel's possible grey level is
between 0 and 255.
Output: A 1-D vector 'distance' of length 256.
Begin
1. Read the ROI image into the img variable, representing a 2-dimensional grey-level
   intensity matrix of size 128 × 128.
2. Initialization: For each grey-level intensity (intensity in range 0 to 255) {
       distance[intensity] = 0
       last[intensity] = NONE   // position of the last pixel seen with this intensity
   }
3. For row = 0 to 127 {
       For col = 0 to 127 {
           intensity = img[row][col]
           If last[intensity] ≠ NONE {
               distance[intensity] = distance[intensity]
                                     + EuclideanDistance([row, col], last[intensity])
           }
           last[intensity] = [row, col]
       }
   }
End
In Algorithm 1, img represents a two-dimensional matrix of size 128 × 128 such
that each value represents the grayscale intensity value of the corresponding pixel.
The distance[] is the FV used to store 256 values, where each value represents
the sum of Euclidean distances between subsequent pixels of the same intensity
value. For example, distance[0] stores the sum of Euclidean distance between
subsequent pixels of intensity 0 only.
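Algorithm 1 can be rendered directly in Python. The sketch below assumes NumPy and assumes that "subsequent" means the next same-intensity pixel encountered in raster-scan order, which the paper does not state explicitly:

```python
import numpy as np

def intensity_distance_fv(img):
    """256-bin feature vector: for each grey level, the sum of Euclidean
    distances between successive pixels of that intensity (raster order)."""
    fv = np.zeros(256)
    last = {}  # intensity -> (row, col) of the last pixel seen
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            g = int(img[r, c])
            if g in last:
                pr, pc = last[g]
                fv[g] += ((r - pr) ** 2 + (c - pc) ** 2) ** 0.5
            last[g] = (r, c)
    return fv
```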
Figure 2 illustrates how Euclidean distance is computed for intensity level 121,
whereas Table 2 represents its computed values. For the image block shown in Fig. 2,
the value of distance[121] is 10.9. These steps are repeated for each intensity
value for each pixel of img. The size of the computed feature vector becomes 256.
After the FV computation of 6000 randomly selected ROI images (2000 from each
category of benign, malignant and normal class), the next task is its performance
evaluation on classification models. For this evaluation, the simulation work was
Table 2 Euclidean distance between pixels of intensity value 121, computed from Fig. 2
Position of one pixel Position of subsequent pixel Euclidean distance between them
[0, 1] [1, 3] 2.24
[1, 3] [1, 3] 2.83
[1, 3] [3, 4] 3
[3, 4] [2, 5] 2.83
Sum of distance for intensity 121 10.9
Table 3 FV's performance over some state-of-the-art classifiers along with used hyperparameters
Classifier | Hyperparameters used | Overall accuracy (%)
Cubic support vector machine [14, 15] | Function (kernel): cubic; Scale (kernel): automatic; Box constraint level: 1; Method: one-versus-one; Data standardization: true | 97.3
Fine KNN [16] | Neighbors: 1; Metric of distance: Euclidean; Distance weight: equal; Data standardization: true | 97.2
Wide neural network [17, 18] | Fully connected layers: 1; 1st layer size: 100; Activation: ReLU; Iteration limit: 1000; Regularization strength: 0; Data standardization: yes | 97.1
Ensemble Subspace KNN [17, 19] | Ensemble type: subspace; Type of learner: nearest neighbors; Learners: 30; Subspace dimension: 128 | 98.4
Bold represents the maximum classification accuracy achieved by any classifier
Table 4 Performance metrics (sensitivity/recall, specificity, precision and F1-score) for Ensemble Subspace KNN classifier
Performance metric | Benign | Malignant | Normal
Sensitivity/recall | 0.99 | 0.99 | 0.97
Specificity | 0.99 | 0.99 | 0.99
Precision | 0.98 | 0.98 | 0.99
F1-score | 0.99 | 0.99 | 0.98
Table 5 Performance comparison of the proposed work with other state-of-the-art techniques
Classifier | Accuracy (%)
Dual thresholding-based breast cancer detection in mammograms [7] | 93.45
Breast cancer detection using mammogram images with improved multi-fractal dimension approach and feature fusion [20] | 96.2
Fully automated computer-aided diagnosis system for microcalcifications cancer based on improved mammographic image techniques [21] | 97
Enhancement technique based on the breast density level for mammogram for computer-aided diagnosis [12] | 96
Proposed work | 98.4
In this work, the authors proposed a breast cancer detection technique that classi-
fies input mammogram images into benign, malignant and normal classes. The
primary goal of this research is to develop a technique that is easy to understand
and accurate, that can be used for regular early-stage breast cancer screening, and
that prioritizes mammogram images for radiologists and provides them with a
second opinion. The proposed work satisfies all these requirements using a simple
approach based on the Euclidean distance between similar-intensity pixels. When
evaluated on randomly selected, balanced data, it achieved classification accuracies
between 97.1 and 98.4% under tenfold cross-validation for the different classifiers.
Future work can incorporate some other feature extraction and classification tech-
niques with this existing approach to get better results. In addition, other datasets may
be incorporated with the existing one for the classifier’s training to achieve enhanced
performance.
Acknowledgements This work has been done under the Visvesvaraya fellowship scheme of the
Government of India (GOI).
References
20. Zebari DA, Ibrahim DA, Zeebaree DQ, Mohammed MA, Haron H, Zebari NA, Maskeli-
unas R (2021) Breast cancer detection using mammogram images with improved multi-fractal
dimension approach and feature fusion. Appl Sci 11(24):12122
21. Mabrouk MS, Afify HM, Marzouk SY (2019) Fully automated computer-aided diagnosis
system for micro-calcifications cancer based on improved mammographic image techniques.
Ain Shams Eng J 10(3):517–527
Prediction of Lumpy Virus Skin Disease
Using Artificial Intelligence
Abstract Lumpy skin disease (LSD) is a skin infection caused by the lumpy skin
disease virus, which belongs to the genus Capripoxvirus. The disease, transmitted
by arthropod vectors, has high morbidity and low mortality rates. It is spread
mechanically by biting insects, and transmission without vectors is inefficient. It
takes one to four weeks for viremia to develop, and the disease affects animals
regardless of their age or gender. The recent spread of LSD among cattle in Asia
has been reported as a serious threat; a morbidity rate of 7.1% was reported among
cattle in India's first recorded LSD outbreak. Clinical manifestations of the disease
include fever, anorexia, nodules on the mucous membranes of the mouth, nose,
genitals, udder, eyes, and rectum, reduced milk production, infertility, abortion, and
even death. First reported in Bangladesh in July 2019, the disease has since spread
to China, Bhutan, India, Nepal, Vietnam, and Myanmar. This paper presents a
comprehensive overview of LSD outbreaks in Asian countries over the past three
years and also sheds light on existing methods of disease detection based on
artificial intelligence techniques.
1 Introduction
Among the most significant threats to the stockbreeding industry, lumpy skin disease
(LSD) is one of the most common, causing acute and subacute illness in cattle and
buffalo. Cattle of many breeds and ages are affected, but young cattle and nursing
cows are the most vulnerable. The World Organization for Animal
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 95
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_9
96 P. S. Kholiya et al.
Health (OIE) has identified this illness as a notifiable disease due to its large
economic losses and potential for fast spread. As the disease has spread into
previously disease-free countries in recent years, controlling and eradicating its
transmission is imperative. The virus is a member of the genus Capripoxvirus and
is closely related to goatpox virus (GTPV) and sheeppox virus (SPPV). The virus
has a double-stranded DNA genome of approximately 150 kilobase pairs (kbp)
and a lipid envelope.
Among viruses affecting domestic ruminants, it is the most economically significant
member of the Poxviridae family. The genome and lateral bodies of the virus are
contained within its capsid or nucleocapsid. Extensive DNA cross-hybridization
between species results in serologic cross-reactions and cross-protection. Even
though Capripoxviruses are generally thought of as host-specific, strains of SPPV
and GTPV are capable of cross-infecting and causing disease in both hosts. The LSD
virus, on the other hand, is capable of infecting sheep [1] and goats experimentally,
but no natural infection has been observed in sheep or goats.
In the year 1929, the first clinical evidence of LSD was found in Zambia (formerly
Northern Rhodesia). Initially, it was believed that poisoning or hypersensitivity to
insect bites was responsible for most LSD cases [2]. The disease was recognized as
infectious in Botswana, Zimbabwe, and South Africa between the years 1943 and
1945. Eight million cattle were affected by LSD in South Africa as a panzootic. From
1945 to 1949, the disease continued to cause massive economic losses. Kenya was the
first country in East Africa to identify LSD [3], in 1957. Sudan reported the disease
in 1972, followed by West Africa in 1974. In 1983, it spread to Somalia [3].
According to Davies, the disease continues to spread throughout most of the
African continent in a series of epidemics. Mauritius, Mozambique, and Senegal
reported LSD cases in 2001. As of today, LSD is widespread throughout most of the
African continent (with the exception of Libya, Algeria, Morocco, and Tunisia).
Until the 1980s (from 1929 to 1984), the disease had remained confined to this
range [3]. In 1984 and 2009, Oman was the source of LSD epidemics.
In 2014, LSD was identified as an emerging disease for the first time in Iran, with a
total of six cases reported in dairy cows. Outbreaks were reported in two villages in
the west of the country and are believed to result from the illegal movement of
animals and natural vectors [4].
There is a possibility of LSD spreading into and invading disease-free neighboring
countries. Among the possible spreading areas of LSD are north and west of Turkey
into Europe and the Caucasus, and eastward into Central and South Asia. Further-
more, Bulgaria and Greece to the west and the Russian Federation to the north are at
risk. LSD can also impair respiration through its effects on the respiratory muscles
[29]. How the disease spread from South Africa
to India [5] remains unclear. Still, it may have been spread by livestock crossing
international borders, or vectors from neighboring countries could have spread it.
Several countries bordering India, including China and Bangladesh, have reported
cases of LSD in recent years. Therefore, it is crucial to understand the epidemiology
of exotic diseases in order to manage them effectively and in a timely manner [6–10].
As discussed earlier, the detection of LSD in animals requires many pathology tests.
However, an accurate diagnosis of the disease depends upon the experience of the
pathologist and is sometimes affected by inter- and intra-observer variations. There-
fore, there is considerable interest among the research community in developing
artificial intelligence (AI)-based techniques that can help in the timely detection of
such diseases in animals [11–15]. Table 1 gives a brief description of the studies
carried out for detecting LSD.
As observed from Table 1, researchers have recently attempted to develop systems
based on different AI techniques for detecting the presence of LSD. Some studies
utilize meteorological and geographical characteristics related to disease spread to
detect the presence of LSD in animals of a region using different machine learning
classifiers, while a few studies utilize images of infected and healthy animals to
detect LSD with different AI techniques. Among the image-based studies, Girma
et al. [14] used a self-designed convolutional neural network (CNN) named LSDNet
for feature extraction. The extracted features were then fed to different classifiers, of
which the support vector machine (SVM) classifier reported the highest accuracy
of 96.0%. From the table, it can be observed that most studies have used CNN
models for feature extraction only. Therefore, the authors have performed different
experiments using CNN models to test the efficacy of transfer learning on the
detection of LSD in cattle.
4 Methodology Adopted
The present work focuses on the detection of lumpy virus disease in cattle by making
use of deep learning-based models with a transfer learning approach. Most previous
studies have either used geospatial features or used features extracted from deep
CNNs for classification with machine learning classifiers. No study has yet analyzed
how end-to-end CNN architectures perform in the detection of LSD.
The present work uses publicly available datasets specific to lumpy virus detection.
Preprocessing steps, such as image enhancement, noise reduction, and data augmen-
tation, have been used to improve the quality and diversity of the datasets. The
dataset used for this work, the "Lumpy Skin Images Dataset" [16], is described in
Table 2.
The various augmentations applied to images in the dataset like flip, rotation,
crop, grayscale, hue, brightness, exposure, and cutout [17] are shown in Fig. 1.
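A few of the augmentations listed above can be sketched with plain NumPy. This is illustrative only; the actual augmentation pipeline applied to the dataset may differ in parameters and implementation:

```python
import numpy as np

def hflip(img):
    """Horizontal flip: mirror the image left-to-right."""
    return img[:, ::-1]

def rotate90(img):
    """Rotate the image 90 degrees counter-clockwise."""
    return np.rot90(img)

def adjust_brightness(img, delta):
    """Shift pixel intensities, clipping to the valid 8-bit range."""
    return np.clip(img.astype(int) + delta, 0, 255).astype(np.uint8)

def cutout(img, top, left, size):
    """Zero out a square patch, forcing the model away from local cues."""
    out = img.copy()
    out[top:top + size, left:left + size] = 0
    return out
```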
The dataset has been split into a training set (75% of the images, i.e., 766 images), a validation
set (15%, i.e., 153 images), and a testing set (10%, i.e., 100 images).
During augmentation, the dataset was expanded from a total of 1024 images to a total
of 9200 images.
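A three-way split of this kind can be sketched in Python (a minimal, hypothetical illustration; the actual split in this work was produced by the dataset tooling, and subset sizes depend on how the fractions are rounded):

```python
import random

def split_dataset(items, train_frac=0.75, val_frac=0.15, seed=42):
    """Shuffle items (e.g., image paths) and split them into
    training, validation, and testing subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(1024))
```

With 1024 images this sketch yields 768/153/103 images; the counts reported above differ slightly because of rounding in the tooling used.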
CNNs are primarily used for computer vision tasks. They leverage the concept of
convolution to extract spatial features from images. CNNs consist of convolutional
layers for feature extraction, pooling layers for downsampling, and fully connected
layers for classification or regression [18].
Prediction of Lumpy Virus Skin Disease Using Artificial Intelligence 99
The evaluation measures used to assess the performance of the deep learning models for
lumpy virus detection include accuracy, sensitivity, specificity, precision, and F1-score [19].
In the previously reviewed papers, several CNN models were applied to
this dataset, but the accuracy levels were not up to the mark. Therefore, in this work, a set of
six CNN models was chosen for experimental analysis: DenseNet, Inceptionv3, MobileNet, ResNet, VGG,
and Xception.
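These evaluation measures can be computed directly from binary confusion-matrix counts. The following is a minimal sketch (function and label conventions are illustrative, assuming 1 = LSD-infected and 0 = healthy):

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity (recall), specificity, precision,
    and F1-score for a binary task (1 = LSD-infected, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}
```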
All the CNN models were trained using TensorFlow, which provides stable training
on large datasets and can leverage GPUs to accelerate the training of deep learning
models [20]. Deep learning architectures are neural network structures designed to
perform complex tasks by automatically learning hierarchical representations from
raw data. The models used are described below.
• VGG: VGG is a traditional deep CNN architecture with several layers. The
VGG16 model achieves a top-5 accuracy of about 92.7% on ImageNet, a
collection of almost 14 million photographs organized into thousands of categories.
It outperforms AlexNet by substituting multiple 3 × 3 kernel-sized layers for the
large kernel-sized layers. As the name suggests, VGG19 is a deeper variant of
VGG16 [21, 22].
• DenseNet: A DenseNet is a type of CNN that employs dense interconnections
between its stages by directly linking all layers (with matching feature map
sizes) using dense blocks. DenseNet was built largely to counter the vanishing
gradient and diminishing accuracy of increasingly deep neural networks: the signal
can vanish before reaching its destination because of the greater separation between
the input and output layers [22, 23].
100 P. S. Kholiya et al.
• MobileNet: MobileNet is a CNN design built on a novel type of convolutional
layer called depthwise separable convolution, which makes the model both smaller
and faster. These models are highly handy on mobile and embedded devices due
to their small size [24].
• Inceptionv3: An inception network is a type of DNN whose architecture is made
up of recurring components called inception modules. Within a module, parallel
convolutional layers (a 1 × 1 layer, a 3 × 3 layer, and a 5 × 5 layer) are applied,
and the output filter banks from each layer are joined to create a single output
vector that serves as the input for the following stage [25].
• Xception: Xception is a CNN with 71 layers. It is based on Google's inception
model and, according to Google researchers, uses depthwise separable convolutions.
A version of the network pretrained on the ImageNet database's more than a
million images can be imported [26].
• ResNet: The key innovation in the ResNet architecture is the use of residual
connections, which enable the network to learn a residual mapping between input
and output features. These connections allow the network to bypass some layers
and preserve information from earlier layers, which helps to mitigate the vanishing
gradient problem and improve the training of very deep networks [27].
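Two of the architectural ideas above can be illustrated in a few lines of Python: the parameter savings of depthwise separable convolutions (used by MobileNet and Xception) and the identity shortcut of ResNet. This is a conceptual sketch, not the actual convolutional implementations:

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1
    pointwise convolution to combine channels."""
    return k * k * c_in + c_in * c_out

def residual_block(x, f):
    """ResNet-style identity shortcut: output = F(x) + x, so the
    stacked layers only need to learn the residual mapping F."""
    return [fi + xi for fi, xi in zip(f(x), x)]
```

For a 3 × 3 layer with 64 input and 128 output channels, the standard convolution needs 73,728 weights against 8,768 for the depthwise separable version; and if F learns the zero mapping, the residual block passes its input through unchanged, which eases the optimization of very deep stacks.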
5 Results
The deep learning models have been trained for a total of 20 epochs, with
a batch size of 32 and a learning rate of 0.001. The six CNN models were trained
separately, with the aim of obtaining the highest possible accuracy. TensorFlow
and Keras were used throughout the execution of the models, and
ImageDataGenerator was used for image preprocessing. The models were built with
the Keras sequential API on top of TensorFlow. The Adam optimizer was used, as it provides good
accuracy with faster computational time and requires few parameters for tuning.
The accuracy and loss statistics for the training and validation datasets are presented
below for the different models.
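For reference, a single Adam update for one scalar parameter looks as follows, with the same learning rate of 0.001 and the standard default hyperparameters (a sketch of the optimizer's update rule, not the TensorFlow implementation):

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: the biased first/second moment estimates m and v
    are updated, bias-corrected, then used to scale the step size."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)   # bias correction (t = step number, from 1)
    v_hat = v / (1 - b2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

Besides the learning rate, only the two decay rates b1 and b2 need tuning, which is the "few parameters" advantage noted above.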
After training DenseNet201 on the dataset, the accuracy and loss values obtained
for the training and validation datasets are plotted in Fig. 2. The corresponding
values are plotted in Fig. 3 for Inceptionv3, in Fig. 4 for Xception, in Fig. 5 for
MobileNet, in Fig. 6 for ResNet50, and in Fig. 7 for VGG19.
The final accuracy and loss values obtained for the different models on the
validation dataset are shown in Table 3.
As seen from Table 3, MobileNet gives the maximum accuracy
of 94.1% for differentiating between healthy cows and cows affected by the lumpy
skin disease virus. The present study differs from earlier work on LSD detection
in two ways: (i) it evaluates the effect of transfer learning
by using different CNN models to classify the images of cattle infected by LSD;
and (ii) the study carried out by Safavi et al. [11] takes into account geometric and
geospatial features, whereas in the proposed study the feature engineering step is
eliminated by using CNN models.
Table 3 Description of results obtained in the present work

Deep learning model   Accuracy   Loss
DenseNet201           0.9215     0.2310
Inceptionv3           0.8627     0.3536
XceptionNet           0.8543     0.3526
MobileNet             0.9411     0.1719
ResNet50              0.9348     0.2789
VGG19                 0.8861     0.3720
6 Conclusion
References
14. Girma E, Ababa A (2021) Identify animal lumpy skin disease using image processing and
machine learning. M.Sc. dissertation, St. Mary’s University, Ethiopia
15. Dofadar DF, Abdullah HM, Khan RH, Rahman R, Ahmed MS (2022) A comparative analysis
of lumpy skin disease prediction through machine learning approaches. In: IEEE Conference
on artificial intelligence in engineering and technology. IEEE, Malaysia, pp 1–4
16. Kumar S, Shastri S (2022) Lumpy skin images dataset. Mendeley Data, V1. https://doi.org/10.17632/w36hpf86j2.1
17. Shijie J, Ping W, Peiyi J, Siping H (2017) Research on data augmentation for image classification
based on convolution neural networks. In: 2017 Chinese Automation Congress (CAC). IEEE,
China, pp 4165–4170
18. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E (2018) Deep learning for computer vision: a brief review. Comput Intell Neurosci 2018:7068349. https://doi.org/10.1155/2018/7068349
19. Muñoz IC, Hernández AM, Mañanas MÁ (2019) Estimation of work of breathing from respiratory muscle activity in spontaneous ventilation: a pilot study. Appl Sci 9(10):2007. https://doi.org/10.3390/app9102007
20. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
21. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556
22. Malik H, Anees T, Naeem A, Naqvi RA, Loh WK (2023) Blockchain-federated and deep learning-based ensembling of capsule network with incremental extreme learning machines for classification of COVID-19 using CT scans. Bioengineering 10(2):203. https://doi.org/10.3390/bioengineering10020203
23. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE Conference on computer vision and pattern recognition. IEEE, pp 4700–4708
24. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam
H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications.
arXiv preprint arXiv:1704.04861
25. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception archi-
tecture for computer vision. In: Proceedings of the IEEE Conference on computer vision and
pattern recognition. IEEE, pp 2818–2826
26. Chollet F (2016) Xception: deep learning with depthwise separable convolutions. In: Proceed-
ings of the IEEE Conference on computer vision and pattern recognition, pp 1251–1258
27. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. In: Proceed-
ings of the IEEE conference on computer vision and pattern recognition. IEEE, Las Vegas, USA,
pp 770–778
Two-Factor Authentication Using
QR Code and OTP
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 105
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_10
106 A. Gupta et al.
With the proliferation of online transactions, protecting user accounts from unauthorized
access has become paramount. Two-factor authentication (2FA) has emerged
as one of the most widely adopted security measures. It requires the user to provide
two pieces of evidence to verify their identity, adding a layer of security that helps
prevent unauthorized access to user accounts even if passwords are compromised.
QR and OTP-based 2FA systems are two of the most popular methods of
implementing 2FA for website login. This paper investigates the effectiveness of
QR and OTP-based 2FA systems for website login and provides a detailed,
step-by-step guide to implementing them.
2 Literature Survey
The paper titled “Two Factor Authentication Framework Using OTP-SMS Based on
Blockchain” proposes a new framework for two-factor authentication (2FA) using the
OTP-SMS method and blockchain technology to enhance security [1]. The current
2FA methods, particularly OTP-SMS, have vulnerabilities that attackers exploit. The
proposed framework generates an encrypted OTP using smart contracts and sends its
hashed value for authentication. The framework is compared with other blockchain-
based frameworks and found to be more secure and efficient.
Authentication is essential to confirm a user’s identity and grant access. 2FA adds
an extra layer of security to the authentication process by using two factors instead of
one [2]. OTP-SMS is a common method in 2FA, but it can be attacked by intercepting
the OTP [3]. The proposed framework addresses this issue by leveraging blockchain
technology. Blockchain ensures secure transmission of the OTP by encrypting it with
the user’s public key and sending the hashed value.
The paper discusses common attacks on 2FA, such as Man in the Middle (MITM),
session hijacking, and third-party attacks. MITM involves an attacker intercepting
communication between the user and the website, potentially gaining access to sen-
sitive information. Session hijacking exploits stolen tokens to gain unauthorized
access. Third-party attacks exploit vulnerabilities in the generation and verification
of OTP tokens.
Two-Factor Authentication Using QR Code and OTP 107
In the study by Iyanda and Fasasi [5], a two-factor authentication login system using
a dynamic password is developed, in which an OTP is generated
and sent to the user’s mobile number via SMS [8]. The user then enters the OTP as
the second authentication factor, and access is granted only if it matches the gener-
ated OTP. The paper evaluates the proposed system’s effectiveness and performance,
comparing it with traditional password-based systems. The results demonstrate that
the 2FA login system significantly enhances security by requiring both the correct
username-password combination and access to the user’s mobile device. User feed-
back also reveals positive acceptance of the system, appreciating the heightened
security without significant inconvenience during the login process. In conclusion,
the research paper presents a comprehensive study on the development of a 2FA login
system using dynamic password with SMS verification. The system offers improved
security compared to traditional password-based systems, mitigating risks associated
with password theft. The positive user experience further supports the feasibility and
effectiveness of the proposed system. This research contributes valuable insights for
organizations aiming to strengthen the security of their user login processes.
The paper titled “A Novel User Authentication Scheme Based on QR-Code”
presents a QR code-based one-time password authentication scheme for secure com-
munication and resource sharing over insecure networks [7]. The goal is to provide a
simple and efficient authentication mechanism that eliminates the need for hardware
devices and reduces maintenance costs.
The scheme involves two parties: the service provider (SP) and remote users who
request services from the SP with authorized access rights. Each user possesses a
mobile phone with an embedded camera, allowing them to capture and decode QR
code images.
The scheme consists of two phases: Registration and Verification. In the Registra-
tion phase, a user (User A) who wishes to join the system sends their identity (IDA)
to the SP. The SP then sends a long-term secret key (xA) to User A’s mobile device
through a secure channel.
In the Verification phase, User A sends their IDA and a time stamp (T1) to the
SP. The SP responds by sending an encoded QR code (EQR(a)), a hashed value h(r,
T1, T2), and another time stamp (T2) to User A. User A checks the correctness of
T2, and the SP checks the correctness of a further time stamp, T3 (not explicitly
detailed in the paper).
The Registration phase establishes the initial connection between User A and the
SP, where User A sends their identity to the SP, and the SP sends the long-term secret
key to User A’s mobile device.
The Verification phase is the process of confirming the authenticity of User A’s
identity and granting access rights. User A sends their identity and a time stamp to
the SP, and the SP responds with an encoded QR code, a hashed value, and another
time stamp. User A verifies the SP’s time stamp, and the SP verifies the user’s (via
T3, not explicitly detailed). If all checks pass, User A is successfully authenticated.
The proposed scheme leverages the widespread use of QR codes and mobile
phones to provide a practical and convenient authentication solution. It eliminates
the need for separate hardware tokens and reduces the risk of tampering and main-
tenance costs [9]. The feasibility of the scheme is evaluated, and security analyses
are conducted to address potential risks. Overall, the scheme offers an efficient and
secure approach to user authentication.
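The hashed value h(r, T1, T2) and the time-stamp freshness check at the heart of this scheme can be sketched as follows (illustrative only: the cited paper does not specify the hash function or the allowed clock skew, so SHA-256 and the 30-second window are assumptions):

```python
import hashlib

MAX_SKEW = 30.0  # seconds; an assumed freshness window

def challenge_hash(r: str, t1: float, t2: float) -> str:
    """Compute h(r, T1, T2): a digest over the shared random value r
    and the two time stamps exchanged during verification."""
    data = f"{r}|{t1:.3f}|{t2:.3f}".encode()
    return hashlib.sha256(data).hexdigest()

def timestamp_fresh(ts: float, now: float) -> bool:
    """Reject stale or future-dated time stamps to resist replay."""
    return abs(now - ts) <= MAX_SKEW
```

Because the digest binds the secret value to both time stamps, a replayed message with an old T1 or T2 fails either the freshness check or the hash comparison.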
The paper titled “Online Banking Authentication System using Mobile-OTP with
QR-code” proposes a new authentication system for online banking to enhance secu-
rity and convenience [6]. The existing Internet banking system is vulnerable to hack-
ing and personal information leakage. The proposed system utilizes a combination
of Mobile one-time password (OTP) and QR code, a variant of the 2D barcode, to
provide secure user confirmation.
Traditional banks advertise online security guarantees, but the fine print often
includes certain security requirements. With the increasing use of online banking,
financial institutions are reluctant to reimburse users who fall victim to scams. OTP, a
password system that can only be used once, is introduced as a countermeasure. The
proposed system uses Mobile OTP, offering the security of traditional OTP devices
with the convenience of mobile features.
The system replaces the use of security cards with QR codes, which can be scanned
by users’ mobile phones. The QR code contains transfer information and the user’s
mobile device serial number. Users generate OTP codes on their mobile phones
based on this information and enter them to complete the transfer process. The
system assumes secure communication, shared hashed serial numbers, and the use
of authorized certificates.
By replacing security cards with QR codes and leveraging Mobile OTP, the pro-
posed system aims to provide greater security and convenience for online banking
transactions. The paper provides an overview of OTP and QR code technologies and
outlines the authentication process of the proposed system.
The paper titled “Multi-factor Authentication: A Survey” discusses the challenges
and potential sources of multi-factor authentication (MFA) in the context of mobile
services [10]. MFA is crucial for establishing secure access rights and validating
user identity. The authors explore various MFA sources, including password pro-
tection, token presence, voice biometrics, facial recognition, ocular-based methods,
hand geometry, vein recognition, fingerprint scanners, thermal image recognition,
geographical location, behavior detection, beam-forming techniques, occupant clas-
sification systems, electrocardiographic recognition, electroencephalographic recog-
nition, and DNA recognition. The paper highlights the importance of usability,
integration, security and privacy, and robustness to the operating environment as
key challenges in implementing MFA systems. Usability challenges involve task
efficiency, task effectiveness, and user preference. Integration challenges include
multi-biometrics, vendor dependency, and the trustworthiness of third-party service
providers. Security and privacy challenges arise from vulnerabilities in the system
components and data transmission. Robustness to the operating environment consid-
ers failure rates and the impact of environmental noise. The authors propose a new
authentication scheme based on vehicle-to-everything (V2X) communication, lever-
aging the sensors available in modern vehicles. They discuss a conventional approach
using Lagrange polynomials for secret sharing and propose a reversed methodology
that adds a unique factor of time for robustness. The proposed MFA solution for
V2X applications allows for automated access based on the presence of factors and
identifies outdated factor information. Overall, the paper provides an overview of
MFA challenges, potential sources, and a proposed solution for V2X applications.
3 Tech Stack
3.2 PyCharm
3.3 Figma
Figma is a web-based design tool that allows users to create, collaborate, and share
interface designs for websites and mobile applications. It offers a range of features,
3.4 Firebase
Firebase is a mobile and web application development platform that provides a suite
of tools and services to help developers build high-quality apps. It offers a real-time
database, cloud storage, hosting, and authentication services, among others. Firebase
provides developers with a scalable and reliable infrastructure to build apps quickly
and easily, with features like push notifications, analytics, and remote configuration.
It is widely used by developers to create mobile and web applications across different
platforms.
4 Proposed Methodology
Our application’s architecture is divided into two parts: a website and an Android
app. The website is built using the Django framework, while the Android app is
developed using Java. When a new user visits our website for the first time, they are
required to register by entering their email address, username, and password. This
registration process utilizes the authentication feature of Firebase, where the website
must be listed on the Firebase console to access its functionalities. Upon registration,
a unique ID is created for the user inside Firebase’s user database.
Once registered, the user can log in to the website using the email and password
they used during registration. If the user is present in Firebase’s user database, they
are successfully logged into the website. After the login process, a Python script
generates a QR code that contains the user’s unique ID. The QR code is displayed
on the website along with a space to enter the one-time password (OTP).
The Android app plays a crucial role in ensuring the security of the OTP generation
process. However, before generating the OTP, the user must log in to the app using
their existing credentials from the website registration. It is not possible to register
using the Android app. Once logged in, the user can access the QR code scanner on
the app. Upon scanning the QR code, the user’s unique ID is obtained and compared
against the ID of the user who is currently logged in on the Android app. If the IDs
match, the OTP is generated.
Upon generating the OTP, the Android app triggers a Python script to send the
OTP to the user’s registered email address. The user then enters the OTP received in
their email into the designated space on the website. Once the correct OTP is entered,
the user is successfully logged into the website.
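The OTP and QR-code steps of this flow can be sketched with the Python standard library (names such as `qr_payload` and the SMTP settings are hypothetical; the actual website renders the payload as a QR image, e.g., with a library such as `qrcode`, and uses the project's own mail configuration):

```python
import secrets
import smtplib
from email.message import EmailMessage

def generate_otp(length: int = 6) -> str:
    """Generate a numeric one-time password using a CSPRNG."""
    return "".join(secrets.choice("0123456789") for _ in range(length))

def qr_payload(user_uid: str) -> str:
    """String encoded into the QR code: the user's Firebase UID,
    prefixed so the Android app can recognize login QR codes."""
    return f"2fa-login:{user_uid}"

def send_otp_email(otp: str, to_addr: str, smtp_host: str = "localhost") -> None:
    """Email the OTP to the user's registered address (hypothetical SMTP host)."""
    msg = EmailMessage()
    msg["Subject"] = "Your login OTP"
    msg["To"] = to_addr
    msg.set_content(f"Your one-time password is {otp}")
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

Using `secrets` rather than `random` matters here: OTPs must come from a cryptographically secure generator, or an attacker who observes a few codes could predict the next one.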
5 Limitations
While the architecture outlined above may seem functional and secure, there are
several limitations to consider.
The system relies heavily on Firebase for authentication, and any disruption in the
Firebase services could cause significant disruptions in the login and registration
process.
The system’s reliance on a single OTP generator script makes it a single point of
failure. If this script malfunctions or is compromised, it could affect the entire authen-
tication process.
The system’s current design restricts the user from registering using the mobile app.
This limitation might create inconvenience for some users and might result in losing
some potential users.
The use of Java in the Android app and Python in the website may cause difficulties
in maintaining the code and adding new features in the future. It might also create
issues with scalability in the long run.
Therefore, it is crucial to address these limitations and consider alternative
approaches for enhancing the security and reliability of the system. Despite these
limitations, the proposed approach provides a promising step toward secure and
convenient two-factor authentication for website login.
6 Conclusion
In conclusion, the above model presents a simple and functional architecture for user
authentication and registration for a website and an Android app. It utilizes Firebase
for authentication and OTP-based security measures to ensure secure access to user
accounts. However, there are several limitations to the current model, such as depen-
dence on a single point of failure, potential security risks, and limited functionality.
References
1. Alharbi E, Alghazzawi D (2019) Two factor authentication framework using OTP-SMS based
on blockchain. Trans Mach Learn Artif Intell 7(3):17–27
2. De Cristofaro E, Du H, Freudiger J, Norcie G (2013) A comparative usability study of two-
factor authentication. arXiv preprint arXiv:1309.5344
3. DeFigueiredo D (2011) The case for mobile two-factor authentication. IEEE Sec Privacy
9(5):81–85
4. Harini N, Padmanabhan T et al (2013) 2 Cauth: a new two factor authentication scheme using
qr-code. Int J Eng Technol 5(2):1087–1094
5. Iyanda AR, Fasasi ME (2022) Development of two-factor authentication login system using
dynamic password with SMS verification. Int J Educ Manag Eng 12(3):13
6. Lee YS, Kim NH, Lim H, Jo H, Lee HJ (2010) Online banking authentication system using
mobile-OTP with QR-code. In: Proceedings of the 5th international conference on computer
sciences and convergence information technology. IEEE, pp 644–648
7. Liao KC, Lee WH (2010) A novel user authentication scheme based on QR-code. J Netw
5(8):937
8. Liao KC, Lee WH, Sung MH, Lin TC (2009) A one-time password scheme with QR-code
based on mobile phone. In: Proceedings of the 2009 fifth international joint conference on
INC, IMS and IDC, pp 2069–2071
9. Nwankwo C, Adigwe W, Nwankwo W, Kizito AE, Konyeha S, Uwadia F (2022) An improved
password-authentication model for access control in connected systems. In: Proceedings of the
2022 5th information technology for education and development (ITED). IEEE, pp 1–8
10. Ometov A, Bezzateev S, Mäkitalo N, Andreev S, Mikkonen T, Koucheryavy Y (2018) Multi-
factor authentication: a survey. Cryptography 2(1):1
11. Parmar H, Nainan N, Thaseen S (2012) Generation of secure one-time password based on
image authentication. J Comput Sci Inform Technol 7:195–206
12. Wang D, Wang P (2016) Two birds with one stone: two-factor authentication with security
beyond conventional bound. IEEE Trans Depend Sec Comput 15(4):708–722
bSafe: A Framework for Hazardous
Situation Monitoring in Industries
Abstract Industrial accidents such as chemical spills, fires, and explosions can
have serious consequences, including injuries and fatalities, environmental damage,
and economic losses. To prevent industrial accidents, it is important for industries to
implement safety measures such as proper training for employees, regular mainte-
nance of equipment, etc. Furthermore, the installation of emergency response systems
is the need of the era to minimize the impact of any accidents that do occur. In
this paper, an IoT-based hazardous situation monitoring framework is presented to
monitor and detect toxic releases from chemical companies. The proposed system
actively records, processes, and analyses the ambient temperature in areas where
molten metal is handled and manufacturing is done. Additionally, it keeps an eye out
for two dangerous gases, i.e., LPG and natural gas. Every time a parameter is breached,
a predetermined list of users is immediately alerted, and the system continuously
collects the data for further suggestions to modify the industry’s safety rules.
The sensors used in this prototype model can be altered as needed to accommodate
industrial requirements.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 115
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_11
116 N. Kashyap et al.
1 Introduction
Industrial risks are becoming an increasingly serious concern to both people and
the environment. Industrial accidents can be initiated by a range of causes such as
human error, equipment failure, inadequate training, and unsafe working conditions
[1]. To prevent industrial accidents, it is important for companies to implement safety
measures such as proper training for employees, regular maintenance of equipment,
and strict adherence to safety regulations [2]. Furthermore, the installation of emer-
gency response systems is the need of the era to lessen the impact of any accidents
that do occur. Additionally, it is important for having knowledge of different types of
hazards present at the workplace. It is also important for workers to be aware of the
hazards present in their workplaces and to follow safety protocols and procedures.
This includes wearing protective equipment, following proper handling procedures
for hazardous materials, and being aware of emergency exits and evacuation proce-
dures [3]. By taking these precautions, individuals can help to reduce the risk of
industrial accidents and protect themselves, their colleagues, and the surrounding
environment.
Internet of Things (IoT) based solutions to monitor and detect toxic releases from
chemical companies can be an effective way to prevent or mitigate the impact of
these kinds of disasters [1, 3]. By continuously collecting and analyzing data from
sensors and other devices, it is possible to identify potential hazards and take appro-
priate action to prevent or minimize the release of toxic substances [3]. A frame-
work that utilizes IoT technology to monitor and report on toxic releases could be
designed to include a variety of sensors and devices that are capable of detecting
and assessing the presence and concentration of hazardous materials [4, 5]. These
devices could be connected to a central network and configured to transmit data in
real-time to a database, where it could be analyzed and used to identify potential
hazards and alert relevant authorities [4–6]. In this paper, an IoT-based monitoring
platform tailored specifically to the needs of the mining, refining, and manufac-
turing industries is presented in which the temperature of the environment is actively
recorded, processed, and analyzed by the system. Temperature is a key safety factor in
locations where molten metal is treated, manufacturing is done, or welds are formed.
It also keeps an eye on the high levels of dangerous gases (LPG/Natural Gas) that
are present in the air. If a parameter is exceeded, the system issues an instant alert
to a predetermined group of users on their cellphones, while continuing to gather
and monitor data for further analysis in order to suggest modifications to the
industry’s safety standards.
The work presented in this paper is structured as: The background study and some
of the related systems are discussed in Sect. 2. The design of the proposed prototype
is presented in Sect. 3. The various results obtained during the development of the
prototype are discussed in Sect. 4. The work is concluded in Sect. 5.
bSafe: A Framework for Hazardous Situation Monitoring in Industries 117
2 Background
Industrial hazards refer to the potential dangers that can occur in a workplace or
during the production of goods or services. These hazards can come in many forms,
including physical, chemical, biological, and ergonomic hazards. Some examples of
common industrial hazards include [1]:
• Physical hazards: These include physical injury or harm from equipment,
machinery, or other objects in the workplace.
• Chemical hazards: These include harm from exposure to chemicals in the
workplace such as chemical spills, toxic fumes, etc.
• Biological hazards: These refer to harm from exposure to biological agents in the
workplace, such as bacteria, viruses, or fungi.
• Ergonomic hazards: These include harm from repetitive movements or poor
posture while working such as carpal tunnel syndrome, back strain, or other
musculoskeletal disorders.
It is important for employers to identify and mitigate these hazards in order to
ensure the safety and health of their employees. This can involve things like training
employees on safety procedures, implementing safety protocols and procedures etc.
[7, 8].
The Internet of Things (IoT) is a network of physical items, including machines,
cars, buildings, and other things, that have connectivity, software, and sensors to
collect and exchange data [9]. By enabling new types of automation, efficiency, and
convenience, the IoT has the potential to completely transform several industries [10–
12]. Some common examples of IoT applications include smart homes, which allow
homeowners to control and monitor their home’s security, lighting, and temperature
remotely through a smartphone or other device [3, 8, 10, 11]. IoT might dramatically
increase safety and lower the likelihood of industrial disasters by enabling real-time
monitoring and control of equipment and processes, and by providing early warning
of potential hazards. One way that the IoT can improve safety in industrial settings
is by enabling the remote monitoring and control of equipment and processes [13].
They can be used to monitor environmental conditions in industrial settings, such as
temperature, humidity, and air quality [2, 4]. Sensors and other IoT-enabled devices
can be used to gather data on the performance and condition of equipment, alerting
operators to potential problems or malfunctions that could lead to accidents or other
hazards [14–16]. In addition to monitoring equipment and environmental conditions,
the IoT can be used to track a worker’s whereabouts and mobility in real-time [17–
19]. This can be particularly useful in industries with hazardous working conditions,
such as construction, mining, and oil and gas exploration, where workers may be
exposed to risks such as falls, machinery accidents, and other hazards. By tracking
worker movements, supervisors can quickly respond to potential accidents or other
emergencies, and can also use the data to identify and address potential safety issues
[20].
118 N. Kashyap et al.
Motivated by the popularity of the IIoT [10], this paper presents the use of IoT to
actively monitor and analyze various factors in a typical heavy industrial zone, such
as the temperature and the levels of gases in the environment. If these parameters
exceed the recommended safe values, the system can track this and issue alerts. In
addition, the data generated in real time can provide information about how smoothly
work is proceeding in different zones.
3 System Requirements
The proposed system is built using open-source hardware and software platforms,
which aligns with current software development trends and the principles of Industry
4.0. This means that the system is designed to be flexible and adaptable, allowing it to
be customized and modified as needed to meet the specific needs of different users.
Various hardware and software components to develop the system are explained
below.
The different hardware components used for the development of the proposed prototype
are listed in Table 1 and explained below.
Table 1 Hardware requirements (product images not reproduced):
• Arduino
• Temperature sensor
• Gas sensor
• Humidity sensor
• Power supply
There are several ways to power an Arduino microcontroller board. The most
common method is to use a USB connection to a computer or an external power supply.
The Arduino board can also be powered through the barrel jack or the VIN pin, which
allows a DC power supply or a battery to be used as the power source. The USB
connection provides a stable 5 V power supply to the Arduino board, which is
sufficient for most systems.
WebSocket is a communication protocol that provides a persistent, full-duplex
connection between the client and the server, allowing real-time, bidirectional
communication. It is designed to give web applications real-time updates with
low-latency, high-performance connectivity, as it eliminates the overhead of
repeated HTTP requests and responses.
Firebase Firestore is a cloud-based NoSQL document database provided by
Google. It is designed to store and synchronize data between multiple clients and
the cloud with real-time updates. Firestore is part of the Firebase suite of products,
which provides a comprehensive backend infrastructure for building web and mobile
applications. Because Firestore offers real-time updates, any database modifications
are instantly synchronized between clients and the cloud. It is highly scalable and
can handle large amounts of data and high traffic loads.
The step-by-step approach to designing the proposed prototype is explained below.
• Hardware setup: Connecting the sensors to the Arduino board and making sure
everything is properly wired and powered.
• Installing necessary libraries: Installation of required libraries for the sensors
(e.g. DHT11, MQ2, BMP180, LM393) and for paho.mqtt and ThingsBoard
communication.
• Data Serialization: Serialization of data is done before publishing it to Things-
Board or any other IoT platform because most IoT platforms accept data in a
specific format, and that format is usually JSON. Serializing data into JSON
format also makes it easier to parse and analyze the data later on.
• Arduino Coding: Developing the code in Python to read data from the sensors and
publish the sensor data to ThingsBoard using MQTT. This involves setting up the
MQTT client, connecting to the broker, publishing the sensor data, and handling
any errors or exceptions that may arise.
• Configure ThingsBoard: Creating a dashboard on ThingsBoard to visualize the
sensor data and configure the rules engine to trigger alarms when hazardous
conditions are detected.
• Secure the system: To prevent unauthorized access to the system, use JWT
tokens for authentication and web sockets for secure communication between
the Arduino board and ThingsBoard.
• Sending data to Firebase: Using the WebSocket API, the data is fetched on the
local system from ThingsBoard; the JSON data is then parsed and sent to the
Firebase Firestore database.
• Fetching the data in the mobile UI: Once the data is available on Firebase, it is
visualized in the app using the Firebase configuration.
• System Testing: To verify that all sensors are working properly, the system is
tested by publishing the data to ThingsBoard.
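The serialization step above can be sketched in Python. This is a minimal illustration, not the prototype's actual code: the field names and the `{"ts": ..., "values": ...}` payload layout are assumptions modeled on the JSON telemetry format that ThingsBoard-style platforms commonly accept.

```python
import json
import time

def serialize_reading(temperature_c, humidity_pct, gas_ppm):
    """Pack one set of sensor readings into a JSON string for publishing.

    The key names are illustrative; ThingsBoard-style platforms generally
    accept a timestamp plus a dictionary of telemetry values.
    """
    payload = {
        "ts": int(time.time() * 1000),  # timestamp in epoch milliseconds
        "values": {
            "temperature": temperature_c,
            "humidity": humidity_pct,
            "gas": gas_ppm,
        },
    }
    return json.dumps(payload)
```

The resulting string would then be handed to an MQTT client, for example paho-mqtt's `client.publish("v1/devices/me/telemetry", payload)`, the telemetry topic ThingsBoard exposes to MQTT devices.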
A smartphone is used as the visualization medium for the proposed system, which
utilizes the data from the Firebase connection over the Internet. The provided user
interface handles the real-time presentation of the device's filtered data. All data
is recorded in a cloud-hosted database, the Firebase Realtime Database.
When a device connects to a cellular/Wi-Fi network, data is stored as JSON and
instantly synchronized to all connected clients. A smartphone's sensors were used
as the second set of sensors; these networked sensors offer real-time measurements.
When a parameter crosses or approaches its threshold value, indicating a dangerous
state, the system marks it as an emergency situation by generating an alarm to the
user. The various threshold values used in the proposed prototype are presented in
Fig. 2. The steps followed during the working of the proposed prototype are listed
below and presented in Fig. 3.
Steps:
• Start
• Connect the sensors using the Arduino with a PC or laptop
• Choose a Google account for authentication
• See the values of the parameters in the application
• Check whether a parameter exceeds its threshold value
• Sign out
• Exit
The threshold values indicate the acceptable range of values for each param-
eter under normal conditions. If the values for any of these parameters exceed the
threshold value, it may indicate an abnormal or dangerous condition that requires
immediate attention or corrective action. For example, if the temperature exceeds
24 °C, it may indicate that the air conditioning system is not functioning prop-
erly, and corrective measures should be taken to avoid overheating. The successful
detection of hazardous situations is presented in Fig. 4.
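The threshold check described in this section can be sketched as a small Python function. Only the 24 °C temperature limit comes from the text; the humidity and gas limits below are placeholder values for the sketch.

```python
# Only the 24 degC temperature limit is taken from the text; the other
# thresholds are placeholder values for this sketch.
THRESHOLDS = {"temperature": 24.0, "humidity": 60.0, "gas": 400.0}

def check_alarms(reading, thresholds=THRESHOLDS):
    """Return the parameters whose value exceeds its threshold.

    A non-empty result corresponds to an abnormal or dangerous condition
    that should trigger an alarm to the user.
    """
    return [param for param, limit in thresholds.items()
            if reading.get(param, float("-inf")) > limit]
```

For example, `check_alarms({"temperature": 26.0, "humidity": 50.0, "gas": 100.0})` returns `["temperature"]`, matching the overheating example above.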
5 Conclusion
References
1. Ali MH, Al-Azzawi WK, Jaber M, Abd SK, Alkhayyat A, Rasool ZI (2022) Improving coal
mine safety with internet of things (IoT) based dynamic sensor information control system.
Phys Chem Earth, Parts A/B/C 128:103225. https://doi.org/10.1016/j.pce.2022.103225
2. Shah J, Mishra B (2016) IoT enabled environmental monitoring system for smart cities. In:
2016 International conference on internet of things and applications (IOTA). Pune, India, pp
383–388. https://doi.org/10.1109/iota.2016.7562757
3. Paul S, Sarath TV (2018) End to end IoT based hazard monitoring system. In: 2018 International
conference on inventive research in computing applications (ICIRCA). Coimbatore, India, pp
106–110. https://doi.org/10.1109/icirca.2018.8597430
4. Wall D, McCullagh P, Cleland I, Bond R (2021) Development of an Internet of Things solution
to monitor and analyse indoor air quality. Internet of Things 14:100392. https://doi.org/10.
1016/j.iot.2021.100392
5. Esfahani S, Rollins P, Specht JP, Cole M, Gardner JW (2020) Smart city battery operated IoT
based indoor air quality monitoring system. In: 2020 IEEE sensors. Rotterdam, Netherlands,
pp 1–4. https://doi.org/10.1109/sensors47125.2020.9278913
6. Bhardwaj R, Kumari S, Gupta SN, Prajapati U (2021) IoT based smart indoor environment
monitoring and controlling system. In: 2021 7th International conference on signal processing
and communication (ICSC). Noida, India, pp 348–352. https://doi.org/10.1109/icsc53193.
2021.9673235
7. Lydia J, Monisha R, Murugan R (2022) Automated food grain monitoring system for warehouse
using IOT. Measur Sens 24:100472. https://doi.org/10.1016/j.measen.2022.100472
8. Mekni SK (2022) Design and implementation of a smart fire detection and monitoring system
based on IoT. In: 2022 4th International conference on applied automation and industrial
diagnostics (ICAAID). Hail, Saudi Arabia, pp 1–5. https://doi.org/10.1109/icaaid51067.2022.
9799505
9. Akhter R, Sofi SA (2022) Precision agriculture using IoT data analytics and machine learning.
J King Saud Univ Comput Inf Sci 34:5602–5618. https://doi.org/10.1016/j.jksuci.2021.05.013
10. Mishra DU, Sharma S (2022) Revolutionizing industrial automation through the convergence
of artificial intelligence and the internet of things, 1st ed. IGI Global, USA. https://doi.org/10.
4018/978-1-6684-4991-2
11. Bhati N, Samsani VC, Khareta R, Vashisth T, Sharma S, Sugumaran V (2021) CAPture: a
vision assistive cap for people with visual impairment. In: 2021 8th International conference
on signal processing and integrated networks (SPIN). Noida, India, pp 692–697. https://doi.
org/10.1109/SPIN52536.2021.9565940.
12. Vashisth T, Khareta R, Bhati N, Samsani VC, Sharma S (2022) A low cost and enhanced
assistive environment for people with vision loss. Lecture Notes in Networks and Systems, vol
218. Springer, Singapore. https://doi.org/10.1007/978-981-16-2164-2_20
13. Daponte P, De Vito L, Mazzilli G, Picariello E, Rapuano S, Tudosa I (2022) Implementation
of an intelligent transport system for road monitoring and safety. In: 2022 IEEE international
workshop on metrology for living environment (MetroLivEn). Italy, pp 203–208. https://doi.
org/10.1109/metrolivenv54405.2022.9826948
14. Pan Y, Lu C, Yu W, Wu M (2022) Design and application of intelligent monitoring system for
geological hazards. In: 2022 41st Chinese control conference (CCC), pp 3231–3236. https://
doi.org/10.23919/ccc55666.2022.9902700
15. Juel MTI, Ahmed MS, Islam T (2019) Design of IoT based multiple hazards detection and
alarming system. In: 2019 4th International conference on electrical information and commu-
nication technology (EICT). Khulna, Bangladesh, pp 1–5. https://doi.org/10.1109/eict48899.
2019.9068782
16. Dhall S, Mehta BR, Tyagi AK, Sood K (2021) A review on environmental gas sensors: materials
and technologies. Sens Int 2:100116. https://doi.org/10.1016/j.sintl.2021.100116
17. Fine GF, Cavanagh LM, Afonja A, Binions R (2010) Metal oxide semi-conductor gas sensors
in environmental monitoring. Sensors 10:5469–5502. https://doi.org/10.3390/s100605469
18. Anandan P, Raj VT, Namratha P, Hameed A, Farooq BM (2022) Embedded based smart LPG
gas detection and safety management system. Ind Mech Electr Eng. https://doi.org/10.1063/5.
0110382
19. Liu X, Cheng S, Liu H, Hu S, Zhang D, Ning H (2012) A survey on gas sensing technology.
Sensors 12:9635–9665
20. Al-Okby MFR, Neubert S, Roddelkopf T, Thurow K (2021) Mobile detection and alarming
systems for hazardous gases and volatile chemicals in laboratories and industrial locations.
Sensors 21:8128. https://doi.org/10.3390/s21238128
Comparative Review of Different
Techniques for Predictive Analytics
in Crime Data Over Online Social Media
Abstract Social media platforms have swiftly become a vital medium for information
sharing and communication. Their services are constantly used by millions
of people to connect with one another. Limiting and preventing crime, however, is
one of the security agencies’ primary responsibilities when it comes to urban secu-
rity. The development of enforcement strategies and the implementation of crime
prevention and control depend heavily on crime prediction: a model can predict
future instances of these crimes from historical patterns. Machine learning is currently
widely utilized for predicting crime. However, in the age of big data, when individ-
uals have access to an expanding amount of data, the capacity to recognize criminal
patterns based on historical crime data alone is no longer effective. Deep learning
and machine learning models are contrasted using performance metrics, and deep
learning frequently surpasses machine learning in these comparisons. A
few datasets that are used for predicting crime statistics are studied along with their
various classes. The analysis brings attention to the model’s limitations as well as its
possibilities for improvement.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 127
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_12
128 Monika and A. Bhat
1 Introduction
The Internet had expanded significantly by the end of the twentieth century, and it had
fundamentally changed a significant portion of our economic and social lives [1].
The growth of online social networks (OSNs) has been significantly influenced by
this transition, owing to the communication opportunities provided by the Internet.
These communication options gave rise to the OSN, a component of the Internet
revolution, and greatly increased its effectiveness [2]. Despite the
fact that different scholars describe it differently, [3] defined an OSN as an online
community consisting of individuals who share mutual friends, common interests,
and enjoyment of common activities. Several OSNs are web-based, enabling users
to exchange a variety of topics with other online users, contribute text, images and
videos to personal profiles, comment on items, communicate their health issues and
more [4]. The number of users on OSNs is astounding: Facebook had surpassed
2.38 billion monthly active users as of March 2019 [5], and Twitter, a social
microblogging platform, has 330 million monthly users [6]. These
websites serve as real-time dynamic data sources and new forms of communication
for millions of users, allowing them to interact with one another and develop their
individual profiles without regard to physical distance or other physical restrictions
[7]. When it comes to the development of social networks and communities that
are already enormous in scope and scale as well as the analysis of these networks,
communication data from OSNs might offer us new perspectives and opportunities
[8–10]. The OSN has increased the appeal of using social network architecture for a
number of objectives due to the unique options it can provide (Fig. 1).
Criminal activity continues to rise drastically across many nations [11]. To
stop these criminal acts, tough measures are needed. In order to keep an eye on these
criminal activities and enhance public safety, the rate of crime tracking is crucial [12].
Crime limitations or criteria utilized in this work relate to incident-level crime data
that is maintained as a dataset of crime and includes the kind of crime, the criminal’s
ID, the incident date, and the location [13]. Crime can be greatly reduced, since
social media is useful for detecting crime rates in various countries and locales.
Social media are a source of information as well as a tool for communication [14].
Twitter, which has a user base of more than 300 million, is a strong choice
for data analysis [15]. On this social media site, users express their thoughts,
feelings and even anger. Because Tweets are user-initiated, it is challenging to
obtain data from Twitter for crime detection. Tweets come in various formats,
including symbols. Because of these problems, each Tweet needs to be carefully
considered.
2 Summary
Social media is used by individuals for both their professional and personal inter-
ests, for involvement, connection and exchange of ideas as well as for the sharing of
videos, images and other information. The analysis found that social media enables
academics to examine the characteristics of individual behavior as well as regional
and temporal relationships. Based on surveys, criminology has emerged as a popular
field of research on a worldwide scale, using information drawn from Facebook,
Twitter, news feed articles, and other online social media sites. Utilizing spatiotem-
poral links in user-generated material allows for the acquisition of pertinent data for
the study of criminal activities. The research includes reference to the utilization of
text-based data science through the collection and visualization of data from various
news sources. There are several models for predicting crime data that have been
developed by researchers using machine learning and deep learning. In the past,
machine learning models were used for predicting crime data, but more recently
deep learning models have advanced quickly, as they surpass machine learning
models in performance. In contrast to machine learning models, which require more
data and more time and are less accurate, deep learning models require less data,
take less time, and are more accurate at predicting crime. This motivates a
comparative study of crime data prediction models, since many studies struggle to
produce novel results. Datasets are important for
predicting crime statistics on social media, and each dataset has a wealth of details
on crimes that have occurred in various places and at various periods. Numerous
datasets that are often used for predicting crime statistics are thus examined.
The rest of the paper is organized as follows: Sect. 3 presents the benefits of social
media, Sect. 4 describes crime data prediction using machine learning models,
Sect. 5 describes crime data prediction using deep learning models, Sect. 6 covers
crime data prediction using different datasets, Sect. 7 discusses the challenges of
crime data prediction, Sect. 8 presents future recommendations, and Sect. 9 presents
the conclusion and references.
Social media platforms enable online content sharing and exchange. The ability to
share text, image, and video posts on a number of different social media sites, as
well as the ability to like, share, and respond to one another's posts, allows users
to connect with one another and exchange material. Depending on the platform and the
user's preferences, profiles and posts can be either public or private. Because many
social media sites allow posts to be geotagged, social media data can be compared
with other kinds of geographic data.
The influence of social media on everyday life has both advantages and disadvan-
tages. Positively, social media platforms like Instagram, Snapchat, Facebook, and
Twitter enable users to connect with friends and family they’ve left behind via video
and audio conversations, chat rooms, and a variety of other services even when they’re
located miles apart. With only their mobile devices, the ordinary population may now
stay current with events happening across the globe. Individuals utilize social media
platforms to facilitate their ability to do so when seeking employment or for any
other reason, such as education. Social media lacks privacy, and therefore, there’s
a big chance someone will utilize your personal data for their own gain. Individual
privacy is a major concern in today’s society. The unauthorized access of another
person’s private details by a third party is the foundation for both cyberbullying
and cybertheft. Individuals are at risk due to the proliferation of offensive content
on social media, where users communicate continuously throughout the day. As a
result, disinformation can be employed for a variety of purposes, such as inciting
racial or religious animosity, misleading people or inciting digital hate crimes.
Consequently, social media
has established itself as a necessary component of our everyday life. Virtual space
consequently becomes a somewhat uncharted area in terms of the issues it presents
regarding human rights and responsibility.
In general, there are three primary learning paradigms in machine learning (ML):
supervised learning (SL), unsupervised learning (UL), and reinforcement learning
(RL). In situations where a label attribute is provided for a given dataset,
supervised learning is beneficial. Examples of SL algorithms include decision
trees, regressions, random forests, logistic regression, KNNs, and others. When
it is difficult to find implicit correlations in a given unlabeled dataset, UL can
be helpful. Clustering techniques like K-means and the Apriori algorithm are
instances of UL. RL lies in the middle of supervised and unsupervised machine learning,
providing evaluative feedback for each prediction or action taken, but without
providing a label or error message. RL does not rely on labeled input-output pairs
and instead adjusts suboptimal actions directly. The reinforcement paradigm is
commonly modeled by the Markov decision process.
Krishnendu et al. [16]: Crime data prediction was performed by dividing the locations
into several clusters. K-means, an iterative unsupervised technique, is employed
to cluster the unlabeled input data. The first step fixes the number of clusters,
denoted by a constant K, and picks K random points as centroids. These centroids
need not come from the data. Each data point is then assigned to the closest
centroid, resulting in K clusters. For each of the newly generated clusters, a new
centroid is computed, and each data point is then reassigned to the updated nearest
centroid. This method divides the region into five clusters, each with the mode of
operation of oddity, role, weapon, method, and location. The K-means algorithm
achieved an accuracy of 78%, precision of 60%, recall of 45%, and F-measure of 44%.
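The iterative loop described above can be written out in a few lines; this is a minimal from-scratch sketch of K-means in plain Python, not the implementation used in [16].

```python
import random

def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means: pick K random points as centroids, assign every
    point to its nearest centroid, recompute centroids, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # keep the old centroid if a cluster ended up empty
        centroids = [centroid(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```

Running it on a toy 2-D dataset with two well-separated groups yields one cluster per group.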
Mohemad et al. [17]: K-medoids is a clustering method similar to K-means, although
it is less sensitive to outliers. The basic idea behind K-medoids is to identify a
representative document in each of k randomly initialized clusters; the documents
most closely connected to each medoid are grouped together. The K-medoids
algorithm uses these representative documents as reference data points rather than
the average value of the documents in each cluster. Oddity, role, weapon, method,
and location are the five clusters used for a region. The K-medoids algorithm
achieved an accuracy of 76%, precision of 59%, recall of 41%, and F-measure of 39%.
Hossain et al. [18]: In this study, supervised learning is utilized for more accurate
crime prediction. The approach generates crime predictions by analyzing a
database of crimes that have already been committed and their patterns. A decision
tree (DT) is a graphical representation that employs a branching methodology to
show all potential decision outcomes under particular circumstances. The
classification and regression tree (CART), random forest, and boosted trees are
three of the many regression tree techniques available. Decision trees are used in
machine learning for both regression and classification. Every branch has an
outcome decision that takes into account a particular constraint on the input, and
researchers employ decision trees to illustrate every branch graphically. The
classification is determined by following the path from the root to a leaf node; an
internal node is a check on one of the attributes listed in the decision tree. The DT
divided the search space by classifying theft, drug, and immoral activities and
achieved an accuracy of 72.7% and a precision of 0.756. However, one of the toughest
concerns with DTs is over-fitting; pruning and setting limits on model parameters
can be used to resolve this problem. Decision trees also cannot fit continuous
variables.
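A depth-1 decision tree (a decision stump) illustrates the branching idea on a toy scale; the sketch below exhaustively searches for one (feature, threshold) split, a simplified stand-in for the CART-style trees discussed above.

```python
from collections import Counter

def best_stump(X, y):
    """Exhaustively pick the (feature, threshold) split whose two branches,
    each predicting its majority label, classify the training data best."""
    best, best_acc = None, -1.0
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            pred_l = Counter(left).most_common(1)[0][0] if left else y[0]
            pred_r = Counter(right).most_common(1)[0][0] if right else y[0]
            acc = (sum(lab == pred_l for lab in left)
                   + sum(lab == pred_r for lab in right)) / len(y)
            if acc > best_acc:
                best, best_acc = (f, t, pred_l, pred_r), acc
    return best

def stump_predict(stump, row):
    """Follow the single branch: root test on one feature -> leaf label."""
    f, t, pred_l, pred_r = stump
    return pred_l if row[f] <= t else pred_r
```

A full decision tree recursively applies the same split search to each branch; the stump stops after one level, which is why it cannot over-fit the way deep trees can.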
Yao et al. [19]: The main objective of the study is to use random forest to identify
crime hotspots. Random forest (RF) is a reliable machine learning technique for
regression and classification. The RF employs a bagging strategy over a group of
DTs: using random samples of the dataset, the RF repeatedly trains base learners to
generate a strong prediction model. By combining the outputs of the DTs, much as
in the ensemble learning approach, it provides a definite prediction: the final
random forest prediction is the one that appears most often among the decision
trees. The random forest algorithm's main strength is how well it handles
classification and regression problems, allowing accurate computations of these.
Large datasets are handled without losing dimensionality. This method achieved an
accuracy of 91% with hyperparameter tuning. The benefits of out-of-bag samples
and bootstrap sampling were introduced to random forest methods.
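The bagging strategy can be illustrated in isolation. In the sketch below each base learner is a stand-in 1-nearest-neighbour rule rather than a full decision tree, so the code shows only the two ingredients named above: bootstrap sampling and a majority vote over the learners.

```python
import random
from collections import Counter

def bagged_predict(X, y, query, n_learners=25, seed=0):
    """Bootstrap aggregation as used by random forests: fit each base
    learner on a bootstrap sample (drawn with replacement) and return the
    majority vote.  The base learner here is a 1-nearest-neighbour rule,
    a stand-in for a decision tree."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_learners):
        # bootstrap sample: len(X) indices drawn with replacement
        sample = [rng.randrange(len(X)) for _ in range(len(X))]
        nearest = min(sample, key=lambda i: sum((a - b) ** 2
                                                for a, b in zip(X[i], query)))
        votes.append(y[nearest])
    return Counter(votes).most_common(1)[0][0]
```

The samples an index never appears in are that learner's out-of-bag samples, which random forests use for an internal error estimate.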
Krysovatyy et al. [20]: A popular SL technique for both classification and
regression is the support vector machine (SVM). This method enables rapid tracking
of fake businesses, which helps officials prevent financial crimes. The SVM fits a
separating plane to the given data before classifying it; maximizing the margin,
the side-to-side gap between the classes and the plane, yields the best
categorization. Cyberbullying prediction models have been developed using SVM,
and these models have proven to be successful and productive. SVM was utilized to
build a cyberbullying prediction system to identify offensive material in social
media: the model was used to extract social media material with the potential for
cyberbullying and to find offensive content. SVM achieved a prediction accuracy of
84% in the research, and multi-dimensional visualization of the data is also
possible. Graphical representations of accuracy, precision, recall, and F1-score
are given in Figs. 2, 3, and 4.
A recent area of study in computer science within the machine learning domain is
"deep learning". Deep learning was developed to bring machine learning closer to
the fundamental objective of artificial intelligence. It addresses a variety of
challenging pattern recognition problems, advancing artificial intelligence
technology and allowing machines to mimic human behaviors like reasoning and
audio-visual perception. It uses a deep network to extract nonlinear properties
from the input and develops complex function models by stacking numerous nonlinear
layers. Modern deep learning architectures include CNNs, recurrent neural networks,
and deep feed-forward networks. Deep learning has been applied in computer vision,
image recognition, processing of works of visual art, natural language processing,
drug discovery, genomics, cybersecurity, emotion recognition, and speech
recognition.
Unusual actions were detected in a real-time crime scene dataset that was gathered
and converted to video frames. Sahay et al. [24]: Much research, including published
studies, shows that CNNs and RNNs, among other deep learning algorithms, have been
effectively used to predict and classify crimes using a variety of data types. The
research introduced a deep reinforcement neural network (DRNN) for predicting crime
on social media. These studies show the applicability of deep learning algorithms
in this area and give important information about the elements influencing criminal
behavior. In order to train the DRNN for categorization, recorded video is utilized
to extract spatiotemporal characteristics and gestures. The model obtained 98%
accuracy, 96% precision, 80% recall, and 78% F1-score.
To predict major social media crimes, multinomial naive Bayes (MNB) is used.
Abbass et al. [25]: Words or tokens constitute the majority of any written
document's constituent parts, and there are many dependencies between the words in
a sentence, tweet, or document. Because word counts follow a multinomial
distribution, MNB was used to solve the classification problem. Multinomial naive
Bayes has proven to be an efficient, quick, and reliable classifier in a variety of
NLP uses. The Tf-Idf scores for each word in each text form the feature vector of
the MNB algorithm; both integer and fractional feature counts perform well under
the multinomial distribution. The model obtained 78.87% accuracy, 82.86% precision,
74.94% recall, and a 78.70% F1-score. Graphical representations of accuracy,
precision, recall, and F1-score are given in Figs. 5, 6, 7, and 8.
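A compact from-scratch multinomial naive Bayes shows the mechanics on raw token counts (the cited work uses Tf-Idf features instead); the training corpus below is invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes with Laplace (add-alpha) smoothing."""
    vocab = {w for d in docs for w in d.split()}
    token_counts = defaultdict(Counter)  # class -> token frequencies
    class_counts = Counter(labels)
    for doc, cls in zip(docs, labels):
        token_counts[cls].update(doc.split())
    model = {}
    for cls in class_counts:
        total = sum(token_counts[cls].values())
        log_prior = math.log(class_counts[cls] / len(docs))
        log_lik = {w: math.log((token_counts[cls][w] + alpha)
                               / (total + alpha * len(vocab)))
                   for w in vocab}
        model[cls] = (log_prior, log_lik)
    return model, vocab

def predict_mnb(model, vocab, doc):
    """Pick the class maximizing log prior + sum of token log-likelihoods."""
    def score(cls):
        log_prior, log_lik = model[cls]
        return log_prior + sum(log_lik[w] for w in doc.split() if w in vocab)
    return max(model, key=score)
```

Working in log space avoids numeric underflow when many token probabilities are multiplied, and the add-alpha smoothing keeps unseen class-token pairs from zeroing out a score.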
Datasets play a big part in predicting crime statistics on social media, and each
dataset has a lot of information on crimes that have happened in different locations
and at different times. There are two types of datasets: regional datasets and common
datasets. Regional datasets include information on a particular location, including
different types of information and attack percentages. Common datasets contain data
that is typically obtained from a variety of social media platforms, including Twitter,
Facebook, Instagram, and others. On social media, there are various datasets that are
used to predict crime as given below.
Hossain et al. [18]: San Francisco (SF) Open Data from the SFPD crime incident
report system [9] provides the San Francisco Crime Dataset. It offers details on
crimes that took place in San Francisco between January 1, 2003, and May 13, 2015.
The dataset consists of 878,049 entries in a CSV file. The category of a crime
incident is the target to be predicted. The target label is connected to attributes
including the crime's description and resolution; all properties besides these
three are used as features. The San Francisco Crime Dataset includes 39 different
categories of crimes, and these categories are regarded as classes. Hence, with 39
classes, the San Francisco dataset poses a multi-class problem. Few crimes happen
constantly, and some crimes are incredibly uncommon: with 174,900 occurrences,
larceny/theft constitutes the most frequent crime, and trespassing (TREA), with
only 6 occurrences, is the least frequent. Fourteen crime classes happened more
than 10,000 times, whereas fourteen crime classes only happened about 2,000 times.
The classes are thus not allocated equally.
Nikhila et al. [26]: The Formspring.me dataset is an XML file containing 13,158
messages posted on the website Formspring.me by 50 distinct users. The dataset was
generated for a study carried out in 2009. It is separated into "Cyberbullying
Positive" and "Cyberbullying Negative" categories: positive messages are those that
do contain cyberbullying, whereas negative messages are those that do not. There
are 12,266 comments in the Cyberbullying Negative class and 892 comments in the
Cyberbullying Positive class. The "holdout" strategy, which is employed for datasets
of comparable dimensions, was used to divide the dataset into test and training sets.
There are some well-known data collections with comparable sample sizes and class
counts, such as Reuters and 20 Newsgroups (20NG), so the same techniques as in
those cases have been employed.
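The holdout strategy mentioned above can be sketched as follows. This is a minimal Python illustration with an invented toy dataset, not the study's actual data or code:

```python
import random

def holdout_split(samples, test_fraction=0.2, seed=42):
    """Simple 'holdout' split: shuffle once, then carve off a fixed test set."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # deterministic shuffle for the example
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]   # (train, test)

# Toy stand-in for the 13,158 Formspring.me messages; labels are invented.
messages = [("message %d" % i, i % 14 == 0) for i in range(100)]
train, test = holdout_split(messages, test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

The split is disjoint and exhaustive: every sample lands in exactly one of the two sets, which is the defining property of holdout evaluation.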
Vyawahare et al. [27]: The messages in the Myspace dataset were taken from group
discussions on the website. The dataset contains group chats that have been
classified and divided into windows of ten messages. If a group chat has 100
messages, for instance, the first window holds messages 1–10, the second window
messages 2–11, and the final window messages 91–100. Every set of ten messages
receives a single label, which notes whether any of the ten messages involve
bullying. This dataset consists of 1753 message groups, with 357 positive and 1396
negative tags.
138 Monika and A. Bhat
Boukabous et al. [28]: The social media posts considered in this evaluation were
tweets; 8,835,016 tweets were collected for the dataset. The dataset is divided into
two sections: the first is the same dataset that was used for ontology verification,
containing 5,896,275 tweets gathered during the first week of October 2018; the
second contains 2,938,741 tweets gathered in the third week of August 2019. The
tweets were collected using a large set of terms, each tweet containing at least one
term that may be connected to CSE.
Yuki et al. [29]: The crime dataset includes burglaries that occurred in the Swiss
canton of Aargau (roughly equivalent to a state in the USA) between January 14,
2014 and January 13, 2017. Over the past few years, Switzerland has consistently
ranked among the top countries in Europe for burglaries. Given its size of 140,400
hectares and population of roughly 660,000 people, the canton of Aargau satisfies
the definition of a low-populated area. Only the cities of Aarau (20,000 residents),
Baden (17,500 residents), Brugg (10,000 residents), and Zofingen (10,500 residents)
have high population densities (16–20 people/hectare), with the majority of the
surroundings sparsely populated; overall, the canton averages only 4.72 persons
per hectare.
Table 1 summarizes the dataset details discussed above.
The various methods for crime detection in social media networks face many
challenges, some of which are mentioned below.
• Predictive models cannot be used to accurately anticipate any kind of criminal
activity, particularly interpersonal and domestic attacks, because these crimes are
rarely concentrated in one place and are not directly linked to a single victim
profile. While prediction algorithms may lessen some kinds of individual bias
by minimizing subjective judgments, systems still rely on frequently inaccurate
crime data that contains systemic reporting flaws.
Comparative Review of Different Techniques for Predictive Analytics … 139
9 Conclusion
References
1. Zhuravskaya E, Petrova M, Enikolopov R (2020) Political effects of the internet and social
media. Ann Rev Econ 12:415–438
2. Madakam S, Tripathi S (2021) Social media/networking: applications, technologies, theories.
JISTEM-J Inf Syst Technol Manage 18
3. Can U, Alatas B (2019) A new direction in social network analysis: online social network
analysis problems and applications. Physica A 535:122372
4. Aldwairi M, Tawalbeh LA (2020) Security techniques for intelligent spam sensing and anomaly
detection in online social platforms. Int J Electr Comput Eng 10(1):275
5. Mavrodieva AV, Rachman OK, Harahap VB, Shaw R (2019) Role of social media as a soft
power tool in raising public awareness and engagement in addressing climate change. Climate
7(10):122
6. Dandannavar PS, Mangalwede SR, Kulkarni PM (2021) Predicting the primary dominant
personality trait of perceived leaders by mapping linguistic cues from social media data onto
the big five model. In: Advanced machine learning technologies and applications: proceedings
of AMLTA 2020. Springer Singapore, pp 417–428
7. Kye B, Han N, Kim E, Park Y, Jo S (2021) Educational applications of metaverse: possibilities
and limitations. J Educ Eval Health Prof 18
8. Mirbabaie M, Bunker D, Stieglitz S, Marx J, Ehnis C (2020) Social media in times of crisis:
learning from Hurricane Harvey for the coronavirus disease 2019 pandemic response. J Inf
Technol 35(3):195–213
9. Bilal M, Usmani RSA, Tayyab M, Mahmoud AA, Abdalla RM, Marjani M, Targio Hashem
IA (2020) Smart cities data: framework, applications, and challenges. Handbook Smart Cities
1–29
10. Ravazzi C, Dabbene F, Lagoa C, Proskurnikov AV (2021) Learning hidden influences in large-
scale dynamical social networks: a data-driven sparsity-based approach, in memory of Roberto
Tempo. IEEE Control Syst Mag 41(5):61–103
11. Saini J, Srivastava V (2019, July) Impact of population density and literacy levels on crime in
India. In: 2019 10th international conference on computing, communication and networking
technologies (ICCCNT). IEEE, pp 1–7
12. Yaacoub JP, Noura H, Salman O, Chehab A (2020) Security analysis of drones systems: attacks,
limitations, and recommendations. Internet of Things 11:100218
13. Bhat A (2019, Dec) An analysis of crime data under apache pig on big data. In: 2019 Third
international conference on I-SMAC (IoT in social, mobile, analytics and cloud)(I-SMAC).
IEEE, pp 330–335
14. Mheidly N, Fares J (2020) Leveraging media and health communication strategies to overcome
the COVID-19 infodemic. J Public Health Policy 41(4):410–420
15. Neelakandan S, Paulraj D (2020) A gradient boosted decision tree-based sentiment classifica-
tion of twitter data. Int J Wavelets Multiresolut Inf Process 18(04):2050027
16. Krishnendu SG, Lakshmi PP, Nitha L (2020, March) Crime analysis and prediction using
optimized K-means algorithm. In: 2020 Fourth international conference on computing
methodologies and communication (ICCMC). IEEE, pp 915–918
17. Mohemad R, Muhait NNM, Noor NMM, Othman ZA (2022) Performance analysis in text
clustering using k-means and k-medoids algorithms for Malay crime documents. Int J Electr
Comput Eng 12(5):5014
18. Hossain S, Abtahee A, Kashem I, Hoque MM, Sarker IH (2020) Crime prediction using spatio-
temporal data. In: Computing science, communication and security: first international confer-
ence, COMS2 2020. Gujarat, India, March 26–27, 2020, Revised Selected Papers 1. Springer
Singapore, pp 277–289
19. Yao S, Wei M, Yan L, Wang C, Dong X, Liu F, Xiong Y (2020, Aug) Prediction of crime
hotspots based on spatial factors of random forest. In: 2020 15th International conference on
computer science & education (ICCSE). IEEE, pp 811–815
20. Krysovatyy A, Lipyanina-Goncharenko H, Sachenko S, Desyatnyuk O (2021) Economic crime
detection using support vector machine classification. In: MoMLeT+ DS, pp 830–840
21. Deepak G, Rooban S, Santhanavijayan A (2021) A knowledge centric hybridized approach
for crime classification incorporating deep bi-LSTM neural network. Multimedia Tools Appl
80(18):28061–28085
22. Butt UM, Letchmunan S, Hassan FH, Koh TW (2022) Hybrid of deep learning and exponential
smoothing for enhancing crime forecasting accuracy. PLoS ONE 17(9):e0274172
23. Monika, Bhat A (2022) Automatic twitter crime prediction using hybrid wavelet convolutional
neural network with world cup optimization. Int J Pattern Recogn Artif Intell 36(05):2259005
24. Sahay KB, Balachander B, Jagadeesh B, Kumar GA, Kumar R, Parvathy LR (2022) A real
time crime scene intelligent video surveillance systems in violence detection framework using
deep learning techniques. Comput Electr Eng 103:108319
25. Abbass Z, Ali Z, Ali M, Akbar B, Saleem A (2020, Feb) A framework to predict social crime
through twitter tweets by using machine learning. In: 2020 IEEE 14th International conference
on semantic computing (ICSC). IEEE, pp 363–368
26. Nikhila MS, Bhalla A, Singh P (2020, July) Text imbalance handling and classification for cross-
platform cyber-crime detection using deep learning. In: 2020 11th international conference on
computing, communication and networking technologies (ICCCNT). IEEE, pp 1–7
27. Vyawahare M, Chatterjee M (2020) Taxonomy of cyberbullying detection and prediction
techniques in online social networks. In: Data communication and networks: proceedings of
GUCON 2019. Springer Singapore, pp 21–37
28. Boukabous M, Azizi M (2022) Crime prediction using a hybrid sentiment analysis approach
based on the bidirectional encoder representations from transformers. Indones J Electr Eng
Comput Sci 25(2):1131–1139
29. Kadar C, Maculan R, Feuerriegel S (2019) Public decision support for low population
density areas: An imbalance-aware hyper-ensemble for spatio-temporal crime prediction. Decis
Support Syst 119:107–117
An Approach Towards Modification of
Playfair Cipher Using 16 × 16 Matrix
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 143
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_13
144 S. Podder et al.
1 Introduction
Data security is more important than ever in the modern world due to the substantial
growth in Internet traffic caused by remote work. To safeguard private information,
sensitive data, and the security of both the sender and receiver, encryption techniques
like cryptography are crucial [1].
Cryptography involves converting plain text into unintelligible text so that it is
readable only by the intended sender and recipient. It has several applications,
including preventing data theft, authenticating users, and ensuring user security.
Cryptography is divided into two categories: symmetric and asymmetric. Classical
ciphers like the Hill and Playfair ciphers have long been staples in the world of
cryptography. The Playfair cipher is the most often used classical cipher for a
variety of reasons. The algorithm is more difficult for the cryptanalyst to decipher
because each step produces a unique ciphertext, as careful examination shows. It is
resistant to brute-force attacks, since it is incredibly difficult to decode the cipher
without the key, and it makes substitution simple [2].
The traditional Playfair cipher, however, is less trustworthy in the present world
due to technological advancements. Classical Playfair ciphers produce encrypted
data that is simple to decode and offers very little security. The traditional Playfair
cipher has already undergone numerous modifications that increase security over
the original.
In this paper, we also propose a variation that primarily relies on a lightweight
cryptography technique. It offers secure solutions for constrained resources in a
network while using less memory, less computing power, and less energy. To add a
further degree of protection, we add rotation, shifting, and rolling on top of the
standard confusion and diffusion techniques. We also correct the Playfair cipher's
fundamental flaw, its restriction to only 25 characters: the proposed method uses a
16 × 16 matrix rather than a 5 × 5 matrix to accommodate all the lowercase and
uppercase letters, digits, and many symbols [3].
2 Literature Review
The Playfair cipher is a traditional encryption method that was developed in 1854
by Charles Wheatstone but named after Lord Lyon Playfair, who promoted its use.
It encrypts plaintext messages using a polygraphic substitution technique. Every
letter of the alphabet is employed in the method's 5 × 5 letter matrix, known as a
Playfair square, with the exception of "J," which is merged with "I." The keyword
is entered into the matrix first, followed by the remaining letters in order. Each pair
of two letters from the plaintext message is then encrypted according to a set of
rules: if the two letters are in the same row, each is replaced by the letter to its
right; if they are in the same column, by the letter below it; otherwise, each is
replaced by the letter at the corner of the rectangle the two letters form that lies in
its own row. The recipient receives the generated ciphertext and uses the identical
Playfair square to decrypt the message. The conventional Playfair cipher is a
straightforward but efficient encryption method; however, it is susceptible to several
attacks, such as frequency analysis and known-plaintext attacks (Table 1).
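The classic rules described above can be sketched in Python. This is a minimal illustrative implementation, not the paper's code, assuming the square is built from the keyword with "J" merged into "I":

```python
# Minimal sketch of the classic 5 x 5 Playfair rules (illustrative only).

def build_square(keyword):
    """Build the 5 x 5 Playfair square: keyword first, then remaining letters."""
    seen, cells = set(), []
    for ch in (keyword.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ"):
        ch = "I" if ch == "J" else ch          # "J" is merged with "I"
        if ch.isalpha() and ch not in seen:
            seen.add(ch)
            cells.append(ch)
    return [cells[r * 5:r * 5 + 5] for r in range(5)]

def encrypt_pair(square, a, b):
    """Encrypt one digraph (a, b) using the three substitution rules."""
    pos = {square[r][c]: (r, c) for r in range(5) for c in range(5)}
    (ra, ca), (rb, cb) = pos[a], pos[b]
    if ra == rb:                                # same row: letters to the right
        return square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5]
    if ca == cb:                                # same column: letters below
        return square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb]
    # rectangle: each letter keeps its row and takes the other's column
    return square[ra][cb] + square[rb][ca]

square = build_square("PLAYFAIR")
print(encrypt_pair(square, "H", "I"))  # -> EB
```

With the keyword "PLAYFAIR", the first row of the square is P L A Y F, followed by the unused letters in alphabetical order.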
To improve the security and dependability of the traditional Playfair cipher in the
present day, numerous modifications can be made [8]. The main focus of these
changes is the security they offer against various threats.
The traditional Playfair cipher can be improved in the following ways to increase
security (Table 2).
In the recent past, a few approaches were proposed that include some of the modi-
fications listed in the above table. After performing a comparative analysis of those
methods, the table below lists modifications to the Playfair cipher along with their
complexity and drawbacks. The modifications are listed in the first column, the
second column describes the complexity of each modification, ranging from low
to high, and the third column lists the drawbacks of each modification. The modifica-
tions listed are Multiple Rounds of Encryption, Variable Matrix Size, Substitution
and Transposition, Key Management, and Modified Playfair Cipher. This table can
be used as a reference to compare the different modifications and their respective
trade-offs (Table 3).
[10] Multiple rounds of encryption: Applying the encryption algorithm to the
plaintext message numerous times is one method for enhancing the security of the
Playfair cipher; this procedure is known as several rounds of encryption. It increases
the complexity of the ciphertext and the difficulty of decryption, improving security.
[11] Substitution and transposition: The classic Playfair cipher encrypts the
plaintext message using substitution alone. Adding a transposition phase makes the
ciphertext generated by the encryption process more complex and more challenging
for attackers to decrypt.
[12] Variable matrix size: Another change to the Playfair cipher is the substitution
of a variable matrix size for the conventional 5 × 5 matrix. Increasing the matrix
size expands the key space, strengthening the cipher's security.
Together with the lowercase and uppercase letters, the numbers 0–9, the arrow, the
mathematical symbols +, -, /, and *, and many other symbols, the result is a total of
256 characters, after which we start the encryption of the plain text (Figs. 1 and 2).
3.1 Encryption
Certain steps in the encryption process give the algorithm an additional layer of
security. The stages are explained in detail below:
• Rotation: The 16 × 16 matrix is rotated by 90° clockwise as the initial stage in
encrypting plain text. In this procedure, rows and columns are transformed into
one another. A section of the above matrix illustrating the rotation process is
shown in Fig. 3.
The algorithm used to rotate the matrix is shown in the pseudo code below.
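The rotation step can be sketched in Python. This is an illustrative implementation, not the authors' pseudo code: rotating a square matrix 90° clockwise turns each row into a column, moving element (r, c) to (c, n-1-r).

```python
# Illustrative sketch of the matrix-rotation step (not the paper's pseudo code).

def rotate_90_clockwise(matrix):
    """Rotate a square matrix 90 degrees clockwise: (r, c) moves to (c, n-1-r)."""
    n = len(matrix)
    return [[matrix[n - 1 - c][r] for c in range(n)] for r in range(n)]

demo = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(rotate_90_clockwise(demo))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```

The same function applies unchanged to the proposed 16 × 16 character matrix, since it works for any square matrix.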
• Shifting: Matrix shifting goes through the elements of each row and shifts them
according to an offset value. To start the shift-row operation, the elements of
the second row of the matrix are moved one (1) position to the right. The
subsequent rows are then shifted by an offset determined by their individual row
number (Fig. 4).
The pseudo code below demonstrates the algorithm that was used to shift the
matrix.
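The shifting step can be sketched as follows. This is an illustrative Python implementation, not the paper's pseudo code, assuming (as the description suggests) that row r, counting from zero, is cyclically shifted r positions to the right, so the first row is unchanged and the second row moves by one:

```python
# Illustrative sketch of the row-shifting step; the per-row offset rule is an
# assumption inferred from the description, not taken from the paper's code.

def shift_rows(matrix):
    """Cyclically shift row r to the right by r positions."""
    out = []
    for r, row in enumerate(matrix):
        k = r % len(row)                      # offset equals the row index
        out.append(row[-k:] + row[:-k] if k else list(row))
    return out

demo = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(shift_rows(demo))  # [[1, 2, 3], [3, 1, 2], [2, 3, 1]]
```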
• Rolling: The third operation, called "matrix roll," shifts an array of items while
iterating across a set of axes. In the proposed method, a full row is raised by the
roll action (Fig. 5).
The algorithm used to roll the matrix is shown in the pseudo code below. As the
matrix rolls, we preprocess the plain text into digraphs. Then, after using the
matrix to encrypt the digraphs, we check whether the sequence has come to an end.
If it has, we produce the final cipher; otherwise, we return to the stage of
creating the 16 × 16 matrix from the key.
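The rolling step can be sketched as follows. This is an illustrative Python implementation, not the paper's pseudo code, assuming the roll behaves like a cyclic shift of whole rows along the first axis (each row moves up by one, with the top row wrapping to the bottom), similar to NumPy's roll with a negative shift:

```python
# Illustrative sketch of the matrix-roll step; the "rows move up" behavior is
# an assumption inferred from the description, not the paper's own code.

def roll_rows_up(matrix, steps=1):
    """Cyclically move every full row up by `steps`; top rows wrap to the bottom."""
    k = steps % len(matrix)
    return matrix[k:] + matrix[:k]

print(roll_rows_up([[1, 1], [2, 2], [3, 3]]))  # [[2, 2], [3, 3], [1, 1]]
```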
3.2 Decryption
As during encryption, we take the key and use it to form a 16 × 16 matrix in order
to decrypt the encrypted text. We then split the cipher text into digraphs and
apply the decryption method, as in the conventional Playfair cipher, one digraph
at a time. The matrix is subjected to all three operations, rolling, shifting, and
rotating, in the appropriate order. The procedure is repeated until all the digraphs
have been transformed back into plaintext.
After testing the proposed algorithm in the system with the keyword "playfair" and
the plaintext "cryptography," we get the cipher text below:
Keyword:- playfair
The Playfair cipher algorithm is a polygraphic substitution cipher that works with
pairs of letters rather than single letters. To encrypt and decrypt messages, it
employs a 16 × 16 matrix of characters. The matrix is commonly filled with the
letters of the alphabet, while additional characters such as numerals and punctuation
marks are also permitted.
A cryptographic system can be broken via a brute-force attack, which involves
attempting every potential key until the right one is discovered. Although it can
work well against weak systems, it is typically infeasible against strong encryption
with vast key spaces. Using strong encryption and key management procedures,
and capping the number of attempts permitted, can reduce the likelihood of brute-
force attacks [18].
Frequency analysis attacks, which exploit the statistical distribution of letters in
the ciphertext, are especially effective against simple substitution ciphers. To
mitigate the risk, more complex substitution ciphers should be used, and techniques
like adding random data or padding the ciphertext can further disrupt the frequency
distribution [19].
The Playfair cipher algorithm encrypts data using a number of operations, such as
matrix rotation, matrix rolling, and matrix shifting. By making the encryption process
more complex, these operations help improve the security of the cipher. The analysis
shows that the above operations raise the algorithm's resistance against attacks such
as brute force, man-in-the-middle, and frequency analysis: the attacker would not
gain any significant information in a reasonable amount of time.
The number of distinct keys that can be generated by the matrix determines the size
of the key domain. A square matrix that is filled with alphabetic or other characters
serves as the Playfair cipher’s key.
The matrix has 16 rows and 16 columns, thus there are 16 × 16 = 256 places in
the matrix where characters could be placed. However, as no character can appear
twice in the matrix, the first character of the key can be any of the 26 letters of the
alphabet, and each succeeding character can only be one of the remaining 25 letters.
As a result, the number of keys that could be used with a 16 × 16 Playfair cipher
matrix is 26 × 25^255, which is roughly equivalent to 1.045 × 10^472.
This indicates that a 16 × 16 Playfair cipher has a very vast key space, making it
challenging for an attacker to find the right key using brute-force attacks.
5 Conclusion
This paper examined the drawbacks of the original Playfair cipher. It also showed
different elements that may be used to modify the basic structure of the Playfair
cipher to enhance security. Based on these key elements, the modifications proposed
by researchers to the general Playfair cipher were analyzed and their drawbacks
listed for further enhancement. After analyzing the various changes made to the
Playfair cipher, this paper proposed a series of modifications to enhance the security
level, including enlarging the dimension of the key matrix. The proposed structure
improved the conventional Playfair cipher and strengthened security against brute-
force attacks, dictionary attacks, chosen plaintext/ciphertext attacks, and known
plaintext attacks. The new Playfair cipher approach uses a 16 × 16 matrix that
contains all the printable extended ASCII values, making it more difficult to crack.
References
1. Agrawal G, Singh S, Agarwal M (2011) An enhanced and secure playfair cipher by introducing
the frequency of letters in any plain text. J Curr Comput Sci Technol 1(3):10–16
2. Ferrer JCC, Guzman FED, Gardon KLE, Rosales RJR, Dell Michael Badua DA, Marcelo
DR (2018) Extended 10 × 10 playfair cipher. In: 2018 IEEE 10th international conference on
humanoid, nanotechnology, information technology, communication and control, environment
and management (HNICEM), pp 1–4. https://doi.org/10.1109/HNICEM.2018.8666250
3. Sastry VU, Shankar NR, Bhavani SD (2009) A modified playfair cipher involving interweaving
and iteration. Int J Comput Theory Eng 1(5):597
4. Basu S, Ray UK (2012) Modified playfair cipher using rectangular matrix. Int J Comput Appl
46(9):28–30
5. Tunga H, Saha A, Ghosh A, Ghosh S (2014) Novel modified playfair cipher using a square-
matrix. Int J Comput Appl 101(12):16–21
6. Goyal P, Sharma G, Kushwah SS (2015) Network security: a survey paper on playfair cipher
and its variants. Int J Urban Des Ubiquitous Comput 3(1):9
7. Dhenakaran S, Ilayaraja M (2012) Extension of playfair cipher using 16 × 16 matrix. Int J
Comput Appl 48(7)
8. Maha MM, Masuduzzaman M, Bhowmik A (2020) An effective modification of play fair cipher
with performance analysis using 6 × 6 matrix. In: Proceedings of the international conference
on computing advancements, pp 1–6
9. Murali P, Senthilkumar G (2009) Modified version of playfair cipher using linear feedback
shift register. In: 2009 international conference on information management and engineering,
pp 488–490. https://doi.org/10.1109/ICIME.2009.86
10. Villafuerte RS, Sison AM, Medina RP (2019) i3d-playfair: an improved 3d playfair cipher algo-
rithm. In: 2019 IEEE Eurasia conference on IOT, communication and engineering (ECICE),
pp 538–541. https://doi.org/10.1109/ECICE47484.2019.8942655
11. Sannidhan MS, Sudeepa KB, Martis JE, Bhandary A (2020) A novel key generation approach
based on facial image features for stream cipher system. In: 2020 third international conference
on smart systems and inventive technology (ICSSIT), pp. 956–962. https://doi.org/10.1109/
ICSSIT48917.2020.9214095
12. Alam AA, Khalid BS, Salam CM (2013) A modified version of playfair cipher using 7 × 4
matrix. Int J Comput Theory Eng 5(4):626
13. Patil R, Bang SV, Bangar RB (2021) Improved cryptography by applying transposition on
modified playfair algorithm followed by steganography. Int J Innov Sci Res Technol 6(5):616–
620
14. Hans S, Johari R, Gautam V (2014) An extended playfair cipher using rotation and random
swap patterns. In: 2014 international conference on computer and communication technology
(ICCCT). IEEE, pp 157–160
15. Srivastava SS, Gupta N (2011) A novel approach to security using extended playfair cipher.
Int J Comput Appl 20(6):0975–8887
16. Chauhan SS, Singh H, Gurjar RN (2014) Secure key exchange using rsa in extended playfair
cipher technique. Int J Comput Appl 104(15)
17. Babu KR, Uday Kumar S, Vinay Babu A, Aditya I, Komuraiah P (2011) An extension to
traditional playfair cryptographic method. Int J Comput Appl 17(5):34–36
18. Kaur A, Verma HK, Singh RK (2013) 3d-playfair cipher using lfsr based unique random number
generator. In: 2013 sixth international conference on contemporary computing (IC3). IEEE,
pp 18–23
19. Chand N, Bhattacharyya S (2014) A novel approach for encryption of text messages using
play-fair cipher 6 by 6 matrix with four iteration steps. Int J Eng Sci Innov Technol (IJESIT)
3:478–484
20. Yousif MS, Salih RK, Alsaidi NMG (2019) A new modified playfair cipher. In: AIP conference
proceedings, vol 2086. AIP Publishing LLC, p 030047
Real and Fake Job Classification Using
NLP and Machine Learning Techniques
1 Introduction
The recruitment industry plays a crucial role in matching job seekers with available
job opportunities. However, with the rise of the digital age, the recruitment process
has shifted online, making it easier for fraudulent job postings to go undetected.
This problem has serious consequences, including financial losses, identity theft,
and reputational damage for companies whose brands are used in these scams [1].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 155
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_14
156 M. Gupta et al.
Therefore, it is crucial to distinguish between real and fake job postings to protect job
seekers and businesses. To address this issue, traditional approaches to job posting
analysis have relied on manual inspection and verification by human experts. This
method is time-consuming, expensive, and prone to errors. However, advances in
machine learning and natural language processing have opened up new possibilities
for detecting fake job postings more efficiently and accurately [2]. The application
of machine learning to fake job detection offers several advantages over traditional
methods. Firstly, it allows for the analysis of large datasets in a relatively short amount
of time. Secondly, it can learn from patterns in data, access/control the features of
the dataset, and adapt to new types of fake job postings, making it a more robust and
scalable solution. Thirdly, it can reduce the reliance on human experts, saving time
and money while increasing accuracy.
The objective of this work is to develop a classification model using NLP and
machine learning techniques to accurately differentiate between real and fake job
postings. By leveraging the power of NLP and advanced algorithms, the project
aims to provide job seekers, employers, and recruitment platforms with a reliable
tool to identify and mitigate the risks associated with fraudulent job listings. The
project also aims to contribute to the advancement of fraud detection methodologies
and create a more secure and trustworthy job market environment. Additionally,
the multi-class classification of job postings based on industry, type of employment,
experience, and job function is also performed.
To achieve the objective of the proposed work, data preprocessing is first performed
on the input dataset to verify that the data is loaded correctly and to handle
missing or unspecified values. Following this, Exploratory Data Analysis (EDA) [3]
is performed to visualize the input and target features graphically. Next, natural
language processing (NLP) [4] is used to obtain real and fake words by tokenization
and by removing stop words such as 'a', 'an', and 'the', as they do not carry much
meaning. Finally, the data is split into training and testing sets, and the machine
learning methods KNN, Random Forest, Naive Bayes, Logistic Regression, SVM,
and AdaBoost are implemented for real and fake job classification.
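The tokenization and stop-word removal step described above can be sketched as follows. The stop-word list here is a tiny illustrative subset, not a full library list such as NLTK's, and the function is an assumption about the pipeline, not the paper's code:

```python
# Illustrative sketch of tokenization plus stop-word removal.
import re

STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and"}  # tiny subset

def tokenize(text):
    """Lowercase the text and keep only word tokens that are not stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(tokenize("The company is an equal opportunity employer"))
# ['company', 'equal', 'opportunity', 'employer']
```

Dropping stop words shrinks the vocabulary the classifiers must learn while preserving the content-bearing terms.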
The remainder of the paper is structured as follows: Sect. 2 gives a literature survey
of previous related works, Sect. 3 presents the architecture of the proposed model,
the results are discussed in Sect. 4, and Sect. 5 gives the conclusion and future
scope of this study.
2 Literature Survey
Several studies have applied machine learning algorithms to detect fraudulent job
postings. Sultana et al. [5] experimented with machine learning algorithms (SVM,
KNN, Naive Bayes, Random Forest, and Multilayer Perceptron) and Deep Neural
Network for fake job prediction. This method achieved accuracy of 97.7% using
Deep Neural Networks. Reddy et al. [6], also experimented with machine learning
algorithms and obtained the highest accuracy of about 98% with the random forest
classifier. Several machine learning algorithms, including logistic regression, the
KNN classifier, the random forest algorithm, and the bidirectional LSTM algorithm,
are used in the study by Anita et al. [7] to categorize and distinguish fake jobs
from real jobs. Dutta and
Bandyopadhyay [8] attained maximum accuracy of 98.27% for fraudulent posts with
random forest classifier. Choudhury and Acharjee [9] used random forest classifier for
job classification and attained accuracy of 98.27%. In the work presented by Nandini
et al. [10] the supervised classifiers Naive Bayes, SVM, Logistic Regressor, and
Random Forest were employed for fraudulent job predictions. This model achieves
maximum accuracy of 97% using random forest classifier. Kumar [11], implemented
a series of experiments with unidirectional and bidirectional RNN architecture using
GRU cells. They obtained an accuracy of 97.4%.
Although several methods are proposed for the classification of real or fake job
postings, these methods have several limitations. Firstly, some of the methods rely
solely on lexical and syntactic features of the job posting text, which may not capture
the semantic meaning of the text accurately. This can lead to misclassification of job
postings and inaccurate results. Additionally, previous works have considered a
dataset with a very small number of fake jobs (~4% of the dataset) compared to the
real jobs, which leads to underfitting and, in turn, inaccurate results. Moreover, the
existing methods use limited or incomplete feature sets, which may not capture all the
relevant information present in the job posting data. This may result in poor accuracy
and high rates of false positives or false negatives. From the gaps identified in
existing studies, it is found that there is a need for more advanced and sophisticated
techniques for the classification of real or fake job postings that can overcome the
above-mentioned limitations and provide accurate and reliable results.
3 Proposed Model
The proposed method for classifying and predicting real or fake job postings involves
several steps as shown in Fig. 1.
3.1 Dataset
The dataset utilized in this study is collected from Kaggle [12]. It consists of approx-
imately 17,880 job postings with several attributes including job ID, title, profile of
organization, description, requirements, reimbursements, place, department, income
range, travel, logo of company, type of job, fraudulent, experience, and education,
etc. The parameter ‘fraudulent’ takes only the values 0 and 1. The value 0 indicates
non-fraudulent job postings and 1 indicates fraudulent job postings. This dataset has
been collected from various job posting websites and it contains a combination of
real and fake job postings. The dataset is unbalanced, with only about 5% of the
postings being labeled as fraudulent. The job postings in the dataset have a diverse
set of features, making it challenging to classify them accurately. In addition to the
textual data, the dataset also contains several categorical and numerical attributes,
such as location, industry, salary range, and employment type, which could provide
additional information for classification. Largely, the dataset is a valuable resource
for developing machine learning models to identify fraudulent job postings.
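The class imbalance noted above is straightforward to quantify once the labels are loaded. The sketch below uses invented labels standing in for the dataset's 'fraudulent' column (roughly 5% positives, mirroring the description, not the actual file):

```python
# Illustrative check of class imbalance; the labels are synthetic stand-ins
# for the Kaggle dataset's 'fraudulent' column, not the real data.
from collections import Counter

labels = [0] * 950 + [1] * 50          # invented ~5% fraud rate
counts = Counter(labels)
fraud_share = counts[1] / len(labels)
print(counts, f"fraudulent share = {fraud_share:.1%}")
```

Such a skew is exactly why accuracy alone is a misleading metric here: a classifier that predicts "real" for everything would already score ~95%.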
Preprocessing of data is the process of transforming the raw input data into a format
that can be effectively utilized by machine learning algorithms. In this research, data
preprocessing comprises several steps, including data cleaning, feature selection, and
feature engineering. The cleaning step involves removing duplicate entries and any
rows with missing or null values. In feature selection, irrelevant or redundant features
are eliminated. For instance, the job_id feature, being unique for each job posting,
does not contribute meaningful information to the classification task. Similarly, the
location feature may have limited relevance, as fake job postings can originate from
anywhere globally. Feature engineering involves creating new features based on
existing ones, which can enhance the classification algorithm’s performance. For
example, a feature counting the number of commonly used keywords in fraudulent
job postings can be created from the job description.
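These preprocessing steps can be sketched with pandas; the column names follow the dataset schema described above, while the keyword list and the sample rows are illustrative assumptions, not the paper's exact choices:

```python
import pandas as pd

# Illustrative keywords often associated with fraudulent postings (assumed list)
SCAM_KEYWORDS = ["cash", "earn", "urgent", "easy", "home"]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Data cleaning: drop duplicate entries and rows with missing values
    df = df.drop_duplicates().dropna()
    # Feature selection: remove attributes with little predictive value
    df = df.drop(columns=["job_id", "location"], errors="ignore")
    # Feature engineering: count scam-associated keywords in the description
    df["keyword_count"] = df["description"].str.lower().apply(
        lambda text: sum(text.count(k) for k in SCAM_KEYWORDS)
    )
    return df

sample = pd.DataFrame({
    "job_id": [1, 2, 2],
    "location": ["US", "IN", "IN"],
    "description": ["Earn easy cash from home, urgent!",
                    "Develop software as an analyst.",
                    "Develop software as an analyst."],
    "fraudulent": [1, 0, 0],
})
clean = preprocess(sample)
print(clean[["keyword_count", "fraudulent"]])
```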
Real and Fake Job Classification Using NLP and Machine Learning … 159
Exploratory data analysis (EDA) is the process of examining and visualizing the dataset to understand its structure,
patterns, and relationships between variables. The main objective of EDA [3] is
to gain insights into the data that can be used to guide subsequent data processing
and analysis. Descriptive statistics involving central tendency and dispersion are
computed for various features to gain an understanding of the dataset and identify
potential outliers. EDA also involves data visualization techniques such as histograms,
scatter plots, and heatmaps, which help to identify patterns and correlations among
parameters in the dataset. The features that are relevant and improve classification
accuracy are used in further stages. In the proposed work, EDA is performed
to assist in the multi-class classification of the job postings on the basis of
industry and employment type.
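A brief sketch of this EDA step, again with pandas; the column names and values here are illustrative stand-ins for the dataset's attributes:

```python
import pandas as pd

df = pd.DataFrame({
    "telecommuting": [0, 1, 0, 1, 0, 0],
    "has_company_logo": [1, 0, 1, 0, 1, 1],
    "fraudulent": [0, 1, 0, 1, 0, 0],
})

# Descriptive statistics: central tendency (mean) and dispersion (std) per feature
stats = df.describe()
print(stats.loc[["mean", "std"]])

# Class balance of the target variable
counts = df["fraudulent"].value_counts()
print(counts)

# Correlation matrix: the numeric basis of a heatmap visualization
corr = df.corr()
print(corr["fraudulent"])
```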
The third step involves encoding the categorical data into quantitative numerical values.
Categorical data refers to data that is not numerical in nature. As most machine
learning algorithms operate on numerical data, this step is significant. The presented
dataset contains categorical attributes for factors such as employment type, requisite
experience, and education. The one-hot encoding method is used in the proposed work to
convert categorical values into numerical values [13]. This preprocessed dataset is
then used to train and test various machine learning algorithms for the classification of
real and fake job postings.
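One-hot encoding can be sketched with pandas; the two categorical columns below are illustrative examples of the dataset's attributes:

```python
import pandas as pd

df = pd.DataFrame({
    "employment_type": ["Full-time", "Part-time", "Full-time"],
    "required_education": ["Bachelor", "Master", "Bachelor"],
})

# One-hot encode each categorical column into 0/1 indicator columns
encoded = pd.get_dummies(
    df, columns=["employment_type", "required_education"], dtype=int
)
print(encoded.columns.tolist())
print(encoded.to_numpy())
```

Each categorical value becomes its own binary column, so an algorithm never has to interpret arbitrary category codes as ordered numbers.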
The proposed model for real and fake job classification is developed in Python.
The libraries used for data analysis are Pandas and NumPy. The Matplotlib and Seaborn
libraries are used for data visualization. The Scikit-learn library is utilized for feature
extraction and modeling. The Natural Language Toolkit (NLTK) is used to perform text
cleaning, tokenization, and stemming.
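The NLTK text pipeline can be sketched as follows; tokenization is done here with a simple regex split rather than NLTK's `word_tokenize` (which requires extra data files), and the sample sentence is illustrative:

```python
import re
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def clean_tokenize_stem(text: str) -> list[str]:
    # Cleaning: lowercase and strip non-alphabetic characters
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Tokenization: simple whitespace split
    tokens = text.split()
    # Stemming: reduce each token to its Porter stem
    return [stemmer.stem(t) for t in tokens]

out = clean_tokenize_stem("Earning positions available, work easily!")
print(out)
```

Stemming collapses inflected forms ("earning", "positions") to shared stems ("earn", "posit"), which is why the frequent-word lists reported later contain stems rather than full words.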
The machine learning techniques used in the proposed model to carry out the
classification of real and fake jobs are Naïve Bayes (NB), Logistic Regression (LR),
K-Nearest Neighbor (KNN), Random Forest (RF), Support Vector Machine (SVM),
and AdaBoost (AB) classifiers [14–19]. These algorithms are selected because they are
robust to over-fitting, can handle huge datasets with a large number of features, and are
effective in separating data that is not linearly separable. The ensemble architecture
of random forest improves the accuracy and stability of the model and can handle
outliers, noise, and missing values effectively. AdaBoost is another ensemble method
that combines several weak classifiers to increase the model's performance. It is also
helpful for determining the key features that are most crucial for the classification
task.
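A condensed sketch of this modeling stage with Scikit-learn, using TF-IDF features from the job text; the tiny corpus and the hyperparameters are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Tiny illustrative corpus: 1 = fake posting, 0 = real posting
texts = ["earn cash fast work from home urgent",
         "software engineer develop backend services",
         "easy money daily payout no experience urgent cash",
         "data analyst business process communication skills",
         "urgent cash work easy home daily",
         "project manager office environment knowledge"]
labels = [1, 0, 1, 0, 1, 0]

X = TfidfVectorizer().fit_transform(texts)

models = {
    "NB": MultinomialNB(),
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(),
    "AB": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in models.items():
    model.fit(X, labels)
    print(name, model.score(X, labels))
```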
4 Results
The original dataset [12] has been modified by increasing the proportion of fake jobs
from 5 to 15% by adding more real and fake jobs into the dataset from various sources
[20] to prevent the problem of underfitting. The number of fake jobs is increased
from 866 to 3044, while the number of real jobs remains the same. From this dataset, 80% (15,800
jobs) is used for training and 20% (3900 jobs) for testing. The
metrics accuracy, recall, precision, F1-score, selectivity, and sensitivity are computed
to evaluate the performance of the developed model [21]. The qualitative and quantitative
results obtained from the developed model are described in the following
sub-sections.
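These evaluation metrics follow directly from the confusion-matrix counts; a minimal sketch (the counts below are hypothetical, not the paper's reported confusion matrix):

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics above from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)     # also called selectivity
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical counts for a test split of 3900 postings
m = binary_metrics(tp=560, tn=3300, fp=20, fn=20)
print(m)
```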
The results obtained for real and fake job classification for multiple classes, including
industries, employment types, experience, and job functions, after performing EDA
on the dataset are shown in Figs. 2, 3, 4 and 5. As observed from Fig. 2, the highest number
of fake jobs is generated from the accounting industry, while most real jobs are from the
information technology (IT) industry. The results in Fig. 3 demonstrate that the numbers
of real and fake jobs are both highest for full-time employment. In Fig. 4, it can be observed
that the highest number of fake jobs is generated at the entry-level experience requirement,
while real jobs dominate at the mid-senior level. Furthermore, Fig. 5 illustrates that
administrative functions have more fake jobs, while IT has more real jobs.
The quantitative results obtained from various machine learning classifiers for fake
job prediction are described here. To optimize the performance of the algorithms,
parameter tuning was conducted on various aspects, such as adjusting the number of
trees for the random forest classifier and modifying the count of nearest neighbors
for the K-Nearest Neighbors (KNN) algorithm. This iterative process aimed to find
the combination of parameter values that would yield the most accurate and reliable
results for classifying real or fake job postings. By experimenting with different
parameter settings, we sought to enhance the models' effectiveness and ensure that
they were fine-tuned to achieve optimal performance in accurately identifying fraudulent
job listings. The performance metrics of Naive Bayes, Logistic Regression,
KNN with K = 5 and K = 10 (where K is the number of neighbors), Random Forest
with T = 50 and T = 100 (where T is the number of trees; the trees used are
classification and regression trees (CART)), SVM, and AdaBoost are discussed
hereafter.
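The parameter sweep over T (trees) and K (neighbors) can be sketched with Scikit-learn's grid search; the synthetic data stands in for the preprocessed job-posting features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the preprocessed job-posting features
X, y = make_classification(n_samples=300, n_features=10, weights=[0.85],
                           random_state=0)

# Sweep the number of trees (T) for random forest and neighbors (K) for KNN
rf_search = GridSearchCV(RandomForestClassifier(random_state=0),
                         {"n_estimators": [50, 100]}, cv=5)
knn_search = GridSearchCV(KNeighborsClassifier(),
                          {"n_neighbors": [5, 10]}, cv=5)
rf_search.fit(X, y)
knn_search.fit(X, y)
print("best T:", rf_search.best_params_["n_estimators"])
print("best K:", knn_search.best_params_["n_neighbors"])
```

Cross-validated search like this picks the parameter value with the best held-out score rather than the best training score, which guards against over-fitting during tuning.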
The values of the evaluation metrics obtained on the training dataset for the various machine
learning classifiers are given in Table 1. Additionally, Table 2 gives the results for the testing
dataset. In these tables: M = Model, A = Accuracy, P = Precision, R = Recall, F1
= F1-Score, SP = Specificity, SE = Selectivity, NB = Naive Bayes, LR = Logistic
Regression, RF = Random Forest, AB = AdaBoost, 0 = Real Jobs, 1 = Fake Jobs.
As observed, the random forest classifier gives the best results for both training and testing
data.
The frequently occurring words identified in multiple real and
fake job descriptions by the proposed model are summarized in Table 3. These
words are obtained using the PorterStemmer method from the NLTK library
and can be visualized using the wordcloud method. Users need to be cautious
while reading the description of a job profile and watch for words that
suggest money, easy work, urgent requirement, cash, "make solid", etc.
It can be concluded that the creators of fake job profiles want their target audience
to fall into the trap. They achieve this by using words that make the target audience feel
the urgency of the specific job post to be filled by the company, and that also make the
audience think the job offers many benefits.
Table 3 Frequently occurring words identified from real and fake job descriptions
Real words: Develop, Analyst, Specialist, Director, Office manager, Project manager, Work environment, Business process, Knowledge, Communication skills, Provide efficiency, etc.
Fake words: Custom service, Home, Earn, Daily, Want urgent, Posit earn, Posit work, Base payroll, Cash, Work easy, etc.
The comparative analysis of the proposed model and existing models for real and
fake job classification is illustrated in Fig. 6. As observed, our proposed
model achieved the highest accuracy of 99% in comparison to the existing methods
[5–11]. The reason for the improved accuracy may be that in the proposed study the
dataset is augmented by increasing the number of fake jobs to avoid the problem
of underfitting. Also, the missing data and incomplete feature set are addressed by
preprocessing the data. Furthermore, the proposed model uses both NLP and machine
learning (ML) techniques for job classification, whereas most existing
models use only ML models. Additionally, the proposed method
also provides multi-class classification of real and fake job postings on the basis of
industry, employment type, experience, and job function.
The proposed work highlights the significance of NLP and machine learning in
addressing real-world problems such as job classification and fraud detection. The
ability to accurately identify fake job postings can greatly benefit job seekers,
employers, and recruitment platforms alike. It helps in preventing fraudulent activities,
protecting individuals from potential scams, and maintaining the integrity of
the job market. A model for real and fake job classification is presented in this work
using NLP and various machine learning classifiers. The results obtained show that the
random forest classifier achieves the best results, with an accuracy of 99.2% and a selectivity
value of 96%. The proposed work provides a solid foundation for further research
and development in the field of job fraud detection, and its findings can be utilized
to create more secure and trustworthy job markets in the future. The classification
performance could be further improved by exploring more complex machine learning
models and using more extensive datasets.
References
14. Guo G, Wang H, Bell D, Bi Y, Greer K (2003) KNN model-based approach in classification.
In: On the move to meaningful internet systems 2003: CoopIS, DOA, and ODBASE: OTM
confederated international conferences, CoopIS, DOA, and ODBASE 2003. Catania, Sicily,
Italy, November 3–7. Springer Berlin Heidelberg, pp 986–996
15. Vishwanathan SVM, Murty MN (2002) SSVM: a simple SVM algorithm. In: Proceedings of
the 2002 international joint conference on neural networks. IJCNN’02 (Cat. No. 02CH37290),
vol 3. IEEE, pp 2393–2398
16. Rigatti SJ (2017) Random forest. J Insur Med 47:31–39
17. LaValley MP (2008) Logistic regression. Circulation 117:2395–2399
18. Webb GI, Keogh E, Miikkulainen R (2010) Naïve Bayes. Encycl Mach Learn 15:713–714
19. Schapire RE (2013) Explaining AdaBoost. In: Empirical inference: festschrift in honor of
Vladimir N. Vapnik. Springer, pp 37–52
20. Dataset from Kaggle. https://www.kaggle.com/datasets/whenamancodes/real-or-fake-jobs
21. Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classifica-
tion tasks. Inf Process Manage 45:427–437
An Improved Privacy-Preserving
Multi-factor Authentication Protocol
for Wireless Sensor Networks
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 167
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_15
168 S. Kanwar et al.
information directly from nodes within the WSN, which poses a significant chal-
lenge. Since direct retrieval of data from nodes by the BS is not feasible, it is assumed
that real-time data can be accessed by authorized users upon request. Hence, user
authentication has become a crucial area of research in WSNs, aiming to provide
authenticated access to real-time data for authorized individuals [1].
In recent years, there has been a significant rise in attention and interest within
the research community toward bio-template-based authentication techniques used in
conjunction with passwords [1–3]. Authentication using biometrics in WSNs is more
reliable and safer than older techniques [3]. The advantages of biometrics over traditional
schemes are that biometric data cannot be misplaced or forgotten, is highly complicated
to imitate, cannot be guessed, and is challenging to fake or distribute [3].
Authentication protocols have emerged as highly effective methods for ensuring
trust, reliability, and legitimacy of users on a large scale. In the present context,
sundry authentication protocols have been proposed in the WSN and IoT areas. Most
protocols are prone to safety risks and are inappropriate for practical use. Recently,
Amin et al. proposed a three-factor authentication protocol for WSNs in the
healthcare field [4]. Later, Ali et al. [5] reviewed the Amin et al.
[6] protocol and pointed out its weaknesses, namely impersonation, offline password
guessing, extraction of confidential information, session key leakage, and ID guessing attacks.
Subsequently, in response to these vulnerabilities, Ali et al. [5] put forward an improved
remote user authentication technique based on a three-factor authentication model.
Their approach incorporated the Automated Validation of Internet Security Protocols
and Applications tool, complemented by simulation techniques employing
Burrows–Abadi–Needham logic [7]. Subsequently, Jiaqing et al. [4] conducted a
thorough analysis of the protocol proposed by Ali et al. [5].
Watro et al. [8] presented a user authentication scheme for WSNs using the RSA and
Diffie–Hellman protocols. According to the findings presented in Das
et al. [9], the following weakness was pointed out: an attacker, on obtaining the public
key, was able to encrypt a session key and text to the user, so that the user decrypts
the message with the private key and uses the session key to carry out the actions the
attacker intended. On the other hand, Wong et al. [7] gave an effective password-based
technique, which was vulnerable to login identity and stolen verifier attacks;
thereafter, Das et al. [9] improved the above password-based scheme. However,
the improved protocol could not safeguard against DoS and node capture attacks.
He et al. [10] later improved upon the weaknesses of the Das et al. [9] protocol. Thereafter,
Vaidya et al. [11] pointed out flaws in the Das et al. protocol [9], namely a stolen smart
card attack, and gave a two-factor user authentication technique for WSNs.
Chen and Shih [12] also showed that Das et al. [9] failed to attain mutual authentication.
To address the aforementioned imperfections, Das et al. [13] proposed an effective
and efficient password-based technique specifically designed for WSNs. Additionally,
subsequent enhancements were introduced in [14] to further improve upon the
existing approach. Thakur et al. [15] cryptanalyzed an authentication scheme. A recent
multi-factor authentication protocol for WSNs was given by Renuka et al. However,
An Improved Privacy-Preserving Multi-factor Authentication Protocol … 169
we have observed some flaws in their design and vulnerabilities against certain cryptographic
attacks. All of this has motivated us to construct a more reliable and secure
protocol that offers better security in sensor communication networks.
The symbols utilized in the technique and their meanings are listed in Table 1 provided
below.
During this phase, an authorized user Ui registers with the gateway node (GW)
over a secure channel, which entails the following steps:
Step (R1): User Ui first chooses his identity IDi and password PWi and inserts
biometric Bi. Gen(Bi) = (σi, τi) is then performed using the Gen(·) function, where
σi is the biometric key and τi is the public helper parameter used to restore σi.
Step (R2): Ui then computes RPWi = h(IDi || σi || PWi) and sends the message
(IDi, RPWi) to GW through a secure medium.
Step (R3): On receiving the registration request from Ui, the GW node chooses a
1024-bit secret key Xs, assigns a pseudo-identity DIDi to the user, and constructs
ri = h(IDi || DIDi || Xs). Then, GW generates a smart card (SCi) containing ri* =
ri ⊕ RPWi, DIDi, and h(·) and sends it back to Ui. Finally, GW stores (DIDi, IDi)
and safeguards the database by utilizing the master key Xs.
Step (R4): On acquiring SCi, Ui stores the additional parameters
{τi, Gen(·), Rep(·)} within the smart card.
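The registration computations can be checked with a small sketch; SHA-256 stands in for h(·), and the Gen(·) fuzzy extractor is mocked by hashing the biometric template, which is an assumption for illustration only:

```python
import hashlib
import secrets

def h(*parts: bytes) -> bytes:
    """Hash function h(.) modeled with SHA-256 over the concatenation."""
    return hashlib.sha256(b"||".join(parts)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# User side (R1-R2): biometric key sigma_i mocked as a hash of the template
ID_i, PW_i, B_i = b"alice", b"pw123", b"biometric-template"
sigma_i = h(B_i)                       # stand-in for Gen(B_i) -> sigma_i
RPW_i = h(ID_i, sigma_i, PW_i)

# Gateway side (R3): 1024-bit master secret X_s, pseudo-identity DID_i
X_s = secrets.token_bytes(128)
DID_i = secrets.token_bytes(16)
r_i = h(ID_i, DID_i, X_s)
r_i_star = xor(r_i, RPW_i)             # value stored on the smart card

# The card stores r_i*, so r_i is recoverable only with knowledge of RPW_i
print("r_i recoverable:", xor(r_i_star, RPW_i) == r_i)
```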
On receiving the message (DIDi, IDSNj, C1, T1) from Ui, the GW carries out the following
operations:
Step (A1): When the message is received at time T2, GW first ensures the authenticity
of the timestamp T1 by checking whether |T2 − T1| ≤ ΔT, where ΔT is
the maximum permitted transmission delay. If the condition is satisfied,
GW retrieves Ui's real identity IDi corresponding to DIDi. If the database has the
above-mentioned record, then GW computes K1* = h(h(IDi || DIDi || Xs), T1)
and decrypts C1 using this temporary key. The user's identification is valid if the
message contains the genuine DIDi and T1; otherwise, GW terminates the session.
Step (A2): Then, GW constructs the ciphertext C2 =
E_MKSNj(IDi, IDSNj, RNUi, T1, T3), where MKSNj is the shared master key
between GW and SNj, RNUi is the random number, and T3 is the fresh
timestamp. Finally, GW transmits the message (IDSNj, C2) to SNj.
Step (A3): On receiving (IDSNj, C2) at time T4, SNj first decrypts C2 with
MKSNj, then checks whether the encrypted identity IDSNj is
accurate, and validates the timestamp condition |T4 − T3| ≤ ΔT. If the condition
holds true, indicating the authenticity of the message, SNj proceeds with the
session; otherwise, SNj terminates the session.
Step (A4): SNj now selects a random number RNSNj and computes the session
key as SKij = h(IDi || IDSNj || RNUi || RNSNj || T1 || T3); additionally, SNj constructs
the key K2 = h(IDi, IDSNj, RNUi), the ciphertext C3 = E_K2(IDi, IDSNj, RNSNj), and Auth1
= h(SKij || RNUi || RNSNj), and transmits the message (Auth1, C3, T5) back to Ui.
Step (A5): On receiving (Auth1, C3, T5), user Ui first verifies |T6 − T5| ≤
ΔT. If satisfied, Ui constructs the key K2 = h(IDi, IDSNj, RNUi) and deciphers
C3. If C3 contains the genuine IDi, IDSNj, then Ui computes SKij* =
h(IDi || IDSNj || RNUi || RNSNj || T1 || T5). Hence, the user Ui validates the protocol's
legitimacy.
In this sort of attack, a privileged individual on the server side, such as an IT
manager, uses his or her privileges to access the user's sensitive information, such
as the identity and the value RPWi = h(IDi || σi || PWi), and then violates the users'
rights. In general, users may access different apps or servers using the same identities
and passwords for ease of use. If an attacker has access
to the pair (IDi, RPWi), a malicious actor may quickly obtain the password, or may
even achieve login by imprinting another password and biometric, because the protocol
does not provide any verification to authenticate the genuine user.
In the corresponding protocol, the smart card holds crucial information that is
required to advance in the protocol. The SCi contains the information
(ri* = ri ⊕ RPWi, DIDi, h(·), τi, Gen(·), Rep(·)). If an attacker somehow gains
access to the smart card, then the attacker can easily construct the key that is being
utilized in C1 = E_K1(DIDi, RNUi, T1) for encryption and decryption purposes. An
attacker can easily compute K1 = h(M1, T1), where M1 can easily be computed from the
smart card by utilizing the XOR property and RPWi is extracted via the privileged insider
attack or by inserting the attacker's own false information, because no verification is
provided to authenticate the genuine user.
An attacker has access to the data transmitted over the insecure channel,
which can be used for the attacker's own benefit. In the respective protocol, an
adversary has access to information such as (DIDi, IDSNj, C1, T1, C2, Auth1, C3,
T5). Thus, to compute the session key, the attacker must obtain the data
(IDi || IDSNj || RNUi || RNSNj || T1 || T5), where IDSNj, T1, and T5 are publicly
transmitted through the insecure channel, and IDi can easily be extracted through the
privileged insider attack. With this information, the attacker could insert
his own random number RNUi, or easily extract it by simply decrypting C1;
the same goes for RNSNj. Hence, the attacker can easily compute the session key
and hence hijack the session.
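This ephemeral secret leakage can be illustrated with a short sketch: once the adversary holds the identities, nonces, and timestamps listed above, the reviewed protocol's session key falls out of a single hash. SHA-256 stands in for h(·), and all values are illustrative:

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"||".join(parts)).digest()

# Values the adversary can collect: public channel data plus leaked secrets
ID_i = b"alice"            # leaked via the privileged insider attack
ID_SNj = b"sensor-42"      # transmitted publicly
RN_Ui = b"nonce-user"      # recovered by decrypting C1 (or injected)
RN_SNj = b"nonce-sensor"   # recovered analogously
T1, T5 = b"t1", b"t5"      # timestamps sent in the clear

# Adversary recomputes SK_ij = h(ID_i || ID_SNj || RN_Ui || RN_SNj || T1 || T5)
SK_adversary = h(ID_i, ID_SNj, RN_Ui, RN_SNj, T1, T5)

# The legitimate parties derive the very same key from the same inputs
SK_session = h(ID_i, ID_SNj, RN_Ui, RN_SNj, T1, T5)
print("session hijacked:", SK_adversary == SK_session)
```

Because no long-term secret enters the key derivation, knowledge of the session-specific values alone suffices to reconstruct the key.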
In the password update setup of the respective Renuka et al. protocol [16],
the user Ui inserts his IDi, password PWi, and bio-template Bi into the smart device. The
device then computes basic parameters, after which Ui is directed to insert a new PWi* and
Bi* into the smart device and update the corresponding information on the smart card. Importantly,
in this protocol the smart card never verifies the old information but
directly updates the user's new credential data onto the smart card. In this
scenario, two difficulties arise: (1) password modification after the
loss of a smart card, and (2) lack of verification in the password-update process.
Considering the security flaws observed in the protocol proposed by Renuka et al.
[16], which involve insider threats, stolen smart card scenarios, and the potential
exposure of ephemeral secrets, this section introduces our enhanced technique based
on the earlier assessments.
During the registration stage, an authorized user Ui registers through a secure channel
with GW. The registration process encompasses a series of steps, which are outlined
as follows:
Step (R1): User Ui chooses IDi and PWi, inserts the biometric parameter Bi,
and then performs Gen(Bi) = (σi, τi).
Step (R2): Then, Ui randomly generates a number x ∈ Zq, computes
RPWi = h(IDi || PWi || σi) ⊕ x, and transmits the message {IDi, RPWi}
to GW.
Step (R3): On receiving the request from Ui, GW generates a 1024-bit key Xs and
assigns a pseudo-identity DIDi to IDi. GW then computes N = h(IDi || DIDi || Xs)
and Ni = RPWi ⊕ N and stores (DIDi, g, h(·)) in the smart card SCi. It then
transmits the message {Ni, SCi} back to Ui.
Step (R4): On receiving the message {Ni, SCi} from GW, the user computes RPWi2 =
Ni ⊕ σi and RPWi3 = h(IDi || PWi || RPWi2) and stores {τi, RPWi2, RPWi3} in SCi,
respectively.
To authenticate with GW and obtain real-time data from SNj, the user Ui must complete
the series of steps given below:
Step (L1): Ui first inserts his SCi into the reader, then inserts his ID'i, PW'i, and B'i,
and retrieves the biometric key σ'i = Rep(B'i, τi) by employing the Rep(·) function.
Then, Ui computes RPW'i2 = Ni ⊕ σ'i and RPW'i3 = h(ID'i || PW'i || RPW'i2).
Step (L2): Then Ui verifies RPW'i3 by comparing it with the original value from
SCi, i.e., RPW'i3 =? RPWi3. The procedure is terminated if this verification
is unsuccessful; otherwise, the user starts the authentication step.
Step (L3): Now, Ui selects a random number RNUi, computes the encryption key Ku =
h(IDi || (Ni ⊕ IDi)), computes the ciphertext C1 = E_Ku(DIDi, g,
RNUi, IDSNj, T1), and sends the message (DIDi, C1, T1) to GW.
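The improved registration and the local verification in steps (L1)/(L2) can be sketched as follows; SHA-256 stands in for h(·), and the fuzzy-extractor key σi is mocked by hashing the biometric template (an assumption for illustration):

```python
import hashlib
import secrets

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"||".join(parts)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

# Registration (R1-R4)
ID_i, PW_i = b"alice", b"pw123"
sigma_i = h(b"biometric-template")          # stand-in for Gen(B_i) -> sigma_i
x = secrets.token_bytes(32)                 # random x in Z_q
RPW_i = xor(h(ID_i, PW_i, sigma_i), x)

X_s, DID_i = secrets.token_bytes(128), secrets.token_bytes(16)
N = h(ID_i, DID_i, X_s)
N_i = xor(RPW_i, N)

# Values stored on the smart card at the user side
RPW_i2 = xor(N_i, sigma_i)
RPW_i3 = h(ID_i, PW_i, RPW_i2)

# Login (L1-L2): recompute from fresh inputs and compare with the stored value
sigma_p = h(b"biometric-template")          # Rep(B'_i, tau_i) -> sigma_i
check = h(ID_i, PW_i, xor(N_i, sigma_p))
print("login verified:", check == RPW_i3)
```

Unlike the reviewed scheme, the card-side comparison against RPWi3 rejects wrong credentials before any message leaves the device.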
Upon receiving the authentication request (DIDi, C1, T1) from the user, the GW
follows a set of procedures, which are outlined as follows:
Step (A1): Initially, the GW verifies the timestamp T1 by checking whether the time at
which the message arrived, denoted as T2, satisfies the condition |T2 − T1| ≤ ΔT.
This ensures that the time difference between the two timestamps falls within the
permissible range of the longest message transmission delay allowed in the sensor
network. If the condition is satisfied, then GW retrieves the corresponding user
Ui's real identity IDi as per DIDi. Once the database contains the requisite record,
the GW proceeds to calculate Ku = h(IDi || (Ni ⊕ IDi)) and decrypt C1.
Step (A2): Then GW computes C2 = E_MKSNj(IDi, IDSNj, RNUi, T1, T3), where
MKSNj is the master key shared between GW and SNj. Then, GW transmits the
message (IDSNj, C2) to SNj.
Step (A3): On receiving (IDSNj, C2), the sensor node first deciphers C2 by utilizing the
master key and then verifies the freshness of the timestamp, |T4 − T3| ≤ ΔT. It then
generates a random number RNSNj.
Step (A4): SNj then computes the session key as SKij = h(IDi || IDSNj || g^RNUi ||
g^RNSNj || g^(RNUi·RNSNj) || T1 || T5). It then computes the key K2 = h(IDi || IDSNj || RNUi) to
encrypt C3 = E_K2(IDi || IDSNj || RNSNj || T5) and computes Auth1 = h(SKij || RNUi ||
RNSNj). Further, it transmits the message (Auth1, C3, T5) to the user.
Step (A5): When the user Ui acquires (Auth1, C3, T5), the user first checks
whether the condition |T6 − T5| ≤ ΔT is satisfied. If the timestamp condition holds,
then Ui computes the key K2 = h(IDi || IDSNj || RNUi) and decrypts C3. If
C3 contains the authentic identities IDi, IDSNj, then Ui computes SKij* =
h(IDi || IDSNj || g^RNUi || g^RNSNj || g^(RNUi·RNSNj) || T1 || T5). Lastly, Ui validates the
scheme's legitimacy by verifying Auth1. If it is satisfied, then Ui agrees to execute
the protocol, and SKij* is utilized to transfer sensitive data with SNj in
future connections.
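The session key now binds Diffie–Hellman-style terms g^RNUi, g^RNSNj, and g^(RNUi·RNSNj), so the third term cannot be formed without one of the exponents. The symmetry of the derivation can be checked with a small sketch (toy group parameters, SHA-256 for h(·), and illustrative identities and nonces):

```python
import hashlib

# Toy Diffie-Hellman group (illustrative only): Mersenne prime p = 2^127 - 1
p, g = 2**127 - 1, 3

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"||".join(parts)).digest()

def enc(n: int) -> bytes:
    return n.to_bytes(16, "big")

ID_i, ID_SNj, T1, T5 = b"alice", b"sensor-42", b"t1", b"t5"
RN_Ui, RN_SNj = 123456789, 987654321          # session-specific nonces

gu, gs = pow(g, RN_Ui, p), pow(g, RN_SNj, p)  # g^RNUi, g^RNSNj

# SN_j side: knows RN_SNj and the received g^RNUi
sk_sensor = h(ID_i, ID_SNj, enc(gu), enc(gs), enc(pow(gu, RN_SNj, p)), T1, T5)
# U_i side: knows RN_Ui and the received g^RNSNj
sk_user = h(ID_i, ID_SNj, enc(gu), enc(gs), enc(pow(gs, RN_Ui, p)), T1, T5)

print("keys match:", sk_sensor == sk_user)
```

An eavesdropper seeing only g^RNUi and g^RNSNj would have to solve the Diffie–Hellman problem to obtain g^(RNUi·RNSNj), which is what blocks the session-key computation attack described earlier.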
In the phase where the user intends to modify their password, they are directed to
follow the subsequent steps:
Step (P1): Initially, the user inserts their smart card into the card reader, followed
by the insertion of their ID'i, PW'i, and B'i, and retrieves the biometric key
σ'i = Rep(B'i, τi) by utilizing the Rep(·) function. The user then computes RPW'i2 = Ni ⊕ σ'i and
RPW'i3 = h(ID'i || PW'i || RPW'i2), and verifies RPW'i3 =? RPWi3. If the condition is
met, the user proceeds with the update phase; otherwise, the updating request is
terminated.
Step (P2): Then, Ui selects a new PWi^new and Bi^new, uses Gen(Bi^new) = (σi^new, τi^new) to
compute RPWi^new = h(IDi || PWi^new || σi^new) ⊕ ai, and sends {RPWi^new, IDi} to the gateway
node GW.
Step (P3): On receiving {RPWi^new, IDi} from Ui, GW validates (IDi, DIDi) by
comparison; if the condition is met, then GW computes N^new = h(IDi || Xs || DIDi^new)
and Ni^new = RPWi^new ⊕ N^new, and sends {Ni^new, SCi} to Ui.
Step (P4): On receiving {Ni^new, SCi} from GW, the user then computes RPWi2^new
= Ni^new ⊕ σi^new and RPWi3^new = h(IDi || PWi^new || RPWi2^new) and stores these new
parameters (τi^new, RPWi2^new, RPWi3^new) in the SCi smart card.
5 Security Analysis
If an adversary manages to gain access to the user's personal data IDi, RPWi during
the registration process, it poses a significant security risk. However, even if an
attacker obtains all the information stored in the smart card, including RPWi2 and
RPWi3, through a power analysis attack, this would not allow them to deduce the user's
pseudo-identity DIDi or password PWi. This is due to the secure one-way hash function and the
integration of genuine user biometric data, which make it impossible for the attacker
to guess or infer the user's pseudo-identity, password, or biometric.
Our proposed protocol stores the parameters {τi, RPWi2, RPWi3} on the smart card SCi.
Even if the smart card is stolen or misplaced, there are no
parameters that allow a malicious actor to obtain or guess the user's secret information.
Thus, if an attacker obtains the user's smart card, he will not be capable of
exploiting its contents. The utilization of a secure one-way hash function
and the integration of authentic biometric data ensure the impossibility of an attacker
misusing the smart card data. As a result, smart card theft attacks are not possible
against the offered solution.
Consider the situation where an adversary gains illicit access to the
random numbers RNUi and RNSNj. These random numbers are utilized in
computing C1, C2, and C3, which are encrypted. Thus, even with access to the
random numbers, an attacker still will not be able to compute C1, C2, C3 because of
In our proposed system, the user validates the session key by verifying the condi-
tion Auth1 = h(SKi j ||RNUi || RNSN j ), which incorporates secret parameters exclu-
sively known to the legitimate entities participating in the protocol. Consequently,
the suggested approach enables the user to authenticate and verify the session key.
In our proposed protocol, the random numbers RNUi and RNSNj and the fresh timestamps
integrated within C1 = E_Ku(DIDi, g, RNUi, IDSNj, T1), C2 = E_MKSNj(IDi, IDSNj, RNUi,
T1, T3), and C3 = E_K2(IDi || IDSNj || RNSNj || T5), which are exchanged during various
phases of the protocol, are individually customized for each session, ensuring their
uniqueness and distinctiveness across sessions. Consequently, an adversary is unable to
track the activities of the user and sensor nodes. Furthermore, since no identifiable
information is transmitted openly through insecure channels, this scheme provides
both anonymity and untraceability.
6 Conclusion
Despite the formal evidence provided for the Renuka et al. [16] protocol, their techniques are
still vulnerable to smart card loss attacks, privileged insider attacks, ephemeral secret
leakage attacks, and a flaw in the password update phase. The findings of our cryptanalysis
discourage the practical deployment of these schemes and highlight certain issues
in constructing a strong scheme for WSNs. Finally, a new authentication protocol for
WSNs is proposed in this paper that achieves all the required security goals.
References
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 179
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_16
180 M. Midha et al.
1 Introduction
Crowdfunding is a popular method of raising funds for various projects and initiatives.
It involves a large number of people contributing small amounts of money to a project,
instead of relying on a single investor or institution for funding. Crowdfunding has
been used to finance a range of ventures, including startups, social causes, and creative
projects [1].
However, crowdfunding also poses several challenges. One of the major challenges
is the lack of transparency and accountability in the crowdfunding process. Since
there is no central authority governing the process, it can be difficult for backers to
verify the authenticity of the project and ensure that their funds are being used for
the intended purpose. In addition, there is a risk of fraud, with some individuals or
groups using crowdfunding platforms to scam backers and disappear with the funds.
In recent years, crowdfunding has become a popular way for entrepreneurs and small
businesses to raise money. However, it is not without its challenges [3]. One of the
main issues with crowdfunding is the risk of fraud. As crowdfunding has become
more popular, the number of fraudulent campaigns has also increased. This has led
to a number of high-profile cases where individuals have lost money after investing
in fraudulent campaigns.
3 Methodology
The research design used in this study was a qualitative approach that involved a
comprehensive analysis of existing literature on crowdfunding and blockchain tech-
nology. A systematic literature review was used to identify relevant peer-reviewed
academic journals and conference proceedings, and data were extracted from these
sources using a predetermined set of criteria.
The data collection methods used in this study involved the identification and selec-
tion of relevant peer-reviewed academic journals and conference proceedings using
a comprehensive search strategy. The search strategy was designed to identify all
relevant literature on the use of blockchain technology in crowdfunding and fraud
prevention. The selected papers were then reviewed and analyzed to identify key
themes and patterns [4] (Fig. 1).
Blockchain-Powered Crowdfunding: Assessing the Viability, Benefits … 183
This study used data analysis techniques to extract key themes and patterns from
the selected literature on the use of blockchain technology in crowdfunding and
fraud prevention. The content analysis approach involved systematic identification
and categorization of data based on a predetermined set of criteria. The results of
the content analysis were then used to draw conclusions about the use of blockchain
technology in crowdfunding and its potential for addressing the risks associated with
fraud. The research design, data collection methods, and data analysis techniques
used in this study were intended to provide a rigorous and systematic approach to
analyzing the existing literature on this topic.
The smart contract is written in the Solidity programming language and is deployed
on the Ethereum blockchain network [7]. The smart contract consists of two main
functions: the createProject function and the contribute function. The createProject
function allows the project owners to create a project by providing the project details
and the fundraising target. The contribute function allows the backers to contribute
to the project by sending funds to the smart contract address. Once the fundraising
target is met, the funds are transferred from the smart contract to the project owners.
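The two functions can be sketched as follows. This is a minimal Python model of the described logic for illustration only: the actual contract is written in Solidity, and the state handling here is a simplified stand-in for on-chain storage and transfers, not the authors' code.

```python
# Illustrative model of the contract logic: createProject and contribute
# mirror the functions named in the text; "paid_out" is simplified
# bookkeeping for the transfer that happens once the target is met.

class CrowdfundingContract:
    def __init__(self):
        self.projects = {}   # project id -> campaign state
        self.next_id = 0

    def create_project(self, owner, target):
        """Project owner registers a campaign with its details and target."""
        pid = self.next_id
        self.projects[pid] = {"owner": owner, "target": target,
                              "raised": 0, "paid_out": False}
        self.next_id += 1
        return pid

    def contribute(self, pid, amount):
        """Backer sends funds; once the fundraising target is met,
        funds are released to the project owner."""
        project = self.projects[pid]
        project["raised"] += amount
        if project["raised"] >= project["target"]:
            project["paid_out"] = True
        return project["raised"]
```

In the real contract the same flow is enforced by the Ethereum runtime: contributions are value-carrying transactions to the contract address, and the payout is a transfer executed by the contract itself.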
The front end of the crowdfunding platform is developed using ReactJS, which is a
popular JavaScript library for building user interfaces. The back end of the platform
is developed using NodeJS, which is a JavaScript runtime environment. The back
end interacts with the smart contract deployed on the blockchain network using the
web3.js library [8].
Fig. 3 Features of our Web3 crowdfunding platform: a blockchain-backed navigation board offering
options such as viewing all campaigns, creating new campaigns, amount raised, amount donated by
the user, the user account, sign out/log out, a light/dark mode switcher, campaign search, and access
to the user's own profile and settings
Fig. 4 Benefits of
blockchain technology
and backers while fostering trust in crowdfunding. Overall, the document presents
a fresh approach to crowdfunding, emphasizing security and transparency through
a dedicated platform. With the utilization of blockchain, the authors aim to shape
a more secure and decentralized crowdfunding ecosystem, paving the way for the
future of crowdfunding (Fig. 5).
Future research should conduct a larger-scale study with a larger sample size to
validate the findings of this study and focus on the potential of using blockchain
technology in other types of crowdfunding [13–15].
6.3 Conclusion
In conclusion, blockchain technology can provide a more secure and transparent envi-
ronment for investors and campaign organizers, with smart contract code reducing
fraud risk. Further research is needed to explore its full potential [16].
References
1. Ahlers GK, Cumming D, Günther C (2015) Signaling in equity crowdfunding. Entrep Theory
Pract 39(4):955–980
2. Ang JS, Hachemi A (2018) An investigation of crowdfunding success factors in the United
States. J Small Bus Manag 56(2):263–282
3. Belleflamme P, Lambert T, Schwienbacher A (2014) Crowdfunding: tapping the right crowd.
J Bus Ventur 29(5):585–609
4. Carter RE, Lusch RF (2020) Crowdfunding the commons: how blockchain can facilitate peer-
to-peer renewable energy financing. Energy Res Soc Sci 68:101563
5. Cai C, Xu Y (2020) The impact of blockchain on crowdfunding. Electr Commer Res Appl
41:100918
6. Colombo MG, Franzoni C, Rossi-Lamastra C (2015) Internal social capital and the attraction
of early contributions in crowdfunding. Entrep Theory Pract 39(1):75–100
7. Crosby M, Pattanayak P, Verma S, Kalyanaraman V (2016) Blockchain technology: beyond
bitcoin. Appl Innov 2(6–10):71–81
8. Dangi AK, Kumar S, Rai R (2020) Blockchain-enabled crowdfunding and its potential impact
on entrepreneurship. J Bus Res 112:150–162
9. Gerber EM, Hui JS, Kuo PY (2012) Crowdfunding: why people are motivated to post and fund
projects on crowdfunding platforms. In: Proceedings of the international workshop on design,
influence, and social technologies: techniques, impacts and ethics, pp 9–16
10. Zhang Z, Li S, Lu K (2020) Blockchain for crowdfunding: state-of-the-art, challenges, and
opportunities. IEEE Access 8:17829–17843
11. Lee JW, Lee J, Lee J, Park CK (2018) A study on the utilization of blockchain in crowdfunding.
Sustainability 10:9
12. Parojcic A, Gillette JL (2019) Crowdfunding and blockchain: challenges and opportunities for
social entrepreneurs. J Soc Entrep 10(3):287–305
13. Wang NNY (2018) How blockchain is revolutionizing crowdfunding. J Dig Bank 3(1):29–39
14. Forti F, Schirripa Spagnolo F, Visaggio G (2018) Crowdfunding and blockchain: opportu-
nities and challenges. In: Proceedings of the 2018 international conference on information
management and technology
15. Young AJ (2018) Blockchain and crowdfunding: a legal perspective. J Finan Crime 25(4):1029–
1037
16. Vermeulen P (2015) Crowdfunding on the blockchain: a brief analysis. In: Proceedings of the
14th international conference on mobile business
Geospatial Project: Landslide Prediction
Abstract This literature review examines the use of machine learning (ML) algo-
rithms for landslide identification and provides an overview of recent studies in this
field. The most used algorithms for landslide identification include support vector
machine (SVM), decision trees, random forests, artificial neural networks (ANNs),
and deep learning models such as convolutional neural networks (CNN). The review
highlights the strengths and limitations of these approaches, such as data scarcity,
imbalanced datasets, and interpretability issues, and proposes solutions to these chal-
lenges. Two specific studies in landslide prediction using ML are discussed, including
a digital inventory using supervised learning to detect landslides and an optimized
random forest model to evaluate landslide susceptibility with 16 conditioning factors.
The review concludes by emphasizing the potential of ML techniques for landslide
identification and the importance of understanding their strengths and limitations.
This paper provides valuable insights for researchers and practitioners interested in
applying ML algorithms for landslide identification and prediction.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 191
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_17
192 H. Sharma et al.
1 Introduction
Landslides are a significant natural hazard that can cause substantial damage to
both infrastructure and human life, resulting in economic losses, displacement of
communities, and even loss of lives. These events occur when soil and rock mass
movements slide, topple, or flow rapidly, leading to the sudden failure of slopes or
hillsides. The factors that contribute to the occurrence of landslides are complex and
multifaceted, involving geological, environmental, and human-related aspects. For
example, geological factors such as the type and structure of rocks, soil types, slope
steepness, and elevation can significantly influence the likelihood of a landslide.
Similarly, environmental factors such as heavy rainfall, earthquakes, and changes
in groundwater levels can trigger landslides. Human activities, including deforesta-
tion, construction, mining, and agricultural practices, can also destabilize slopes and
increase the risk of landslides. Given the complexity of landslide causation, it is diffi-
cult to predict exactly where and when landslides will occur [1]. However, there are
ways to identify areas that are more susceptible to landslides. For instance, geolog-
ical mapping, satellite imagery, and LiDAR data analysis can be used to assess
the geology and topography of a region and identify areas with a higher risk of
landslides. Additionally, monitoring and early warning systems, such as sensors
and alarms, can provide alerts to authorities and communities in landslide-prone
areas. To prevent the negative impacts of landslides, it is crucial to implement effec-
tive mitigation and adaptation strategies. These can include avoiding development
in high-risk areas, implementing slope stabilization measures, and adopting
effective land-use planning and management practices. Community awareness and
education programs can also play a crucial role in reducing the impacts of landslides
by informing residents of the risks and teaching them how to prepare and respond to
landslide events [2].
Machine learning (ML) algorithms have revolutionized the field of landslide iden-
tification and prediction due to their ability to process and analyze large amounts of
data in a relatively short amount of time. These algorithms have the potential to iden-
tify patterns and relationships that may not be detectable using traditional methods,
and they can be used to make accurate predictions about future landslide events.
Several ML algorithms have been employed in landslide identification studies, each
with its own strengths and limitations. Support vector machine (SVM) is a popular
ML algorithm that works by finding the best separation between two classes of data.
Decision trees are another widely used ML algorithm that creates a tree-like model of
decisions and their possible consequences. Random forests are an extension of deci-
sion trees that involve building multiple trees and averaging their results to improve
accuracy. Artificial neural networks (ANNs) are ML algorithms modeled on the
structure and function of the human brain, with input and output layers and one or
more hidden layers. Deep learning models, such as convolutional neural networks
(CNNs), have also been used in landslide identification studies due to their ability to
process large amounts of data and detect complex patterns. One of the primary chal-
lenges in using ML algorithms for landslide identification is the need for high-quality
input data. Accurate and reliable data, including geological and meteorological data,
is essential for ML algorithms to learn and make accurate predictions. In addition,
the performance of ML algorithms is heavily influenced by the selection of features
or variables that are inputted into the model. Therefore, it is important to carefully
select and preprocess data to ensure that the ML algorithm is trained on the most rele-
vant and informative variables. Despite these challenges, ML algorithms have shown
great promise in landslide identification and prediction, and they have the potential
to significantly improve our ability to mitigate and adapt to landslide hazards. By
combining traditional methods with advanced ML algorithms, researchers and prac-
titioners can gain a more comprehensive understanding of landslide occurrence and
develop effective risk management strategies to reduce the impact of these natural
hazards.
To mitigate the risks associated with landslides, there is growing interest in using
ML algorithms for landslide identification and prediction. Recent studies have applied
a variety of algorithms for this purpose, including support vector machine (SVM),
decision trees, random forests, artificial neural networks (ANNs), and deep learning
models such as convolutional neural networks (CNNs). This review surveys these
studies, discussing the strengths and limitations of each approach, the challenges
posed by data scarcity, imbalanced datasets, and limited interpretability, and
proposed solutions such as data augmentation techniques, sampling strategies, and
model explainability tools.
One example of a study that used ML for landslide identification involved a digital
inventory using supervised learning to detect landslides in Rudraprayag. The study
used a combination of image processing techniques and SVM to identify landslides
from satellite images.
The authors reported that their approach achieved an accuracy of 90.9% in
detecting landslides, demonstrating the potential of ML algorithms in landslide iden-
tification. Another study focused on landslide prediction and used a random forest
model to evaluate landslide susceptibility with 16 conditioning factors as parame-
ters. The study applied data preprocessing techniques to address the imbalanced data
issue and reported that their approach achieved high accuracy in predicting land-
slide susceptibility, indicating the potential of ML algorithms in landslide predic-
tion. Despite the promising results, ML approaches for landslide identification and
prediction still face challenges. Data scarcity and imbalanced datasets are common
issues in landslide studies, which can affect the performance of ML algorithms. In
addition, the interpretability of ML models can be a challenge, particularly when
the models involve complex deep learning architectures. Proposed solutions to these
challenges include data augmentation techniques to address data scarcity, sampling
strategies to address imbalanced datasets, and model explainability tools to enhance
the interpretability of ML models. In conclusion, ML techniques have emerged as
powerful tools for landslide identification and prediction because of their ability
to process large amounts of data and identify complex patterns that are difficult
to detect with traditional methods, and their use has the potential to significantly
reduce the impact of landslides on society and infrastructure. It is essential,
however, to understand the strengths and limitations of these techniques and to
address the challenges associated with them. Ultimately, a combination of traditional
methods and ML techniques can provide valuable insights into landslide prediction
and identification, leading to more effective risk management strategies and reduced
impact on communities.
2 Literature Review
The use of machine learning (ML) techniques to identify landslides has received
increasing attention in recent years. This literature review provides an overview
of recent studies applying ML algorithms to identify landslides and highlights the
strengths and weaknesses of these approaches.
Support vector machine (SVM) is a commonly used ML algorithm for landslide
identification. SVMs are particularly well suited for high-dimensional datasets and
can handle nonlinear relationships. Che et al. used SVM to classify landslides based
on morphological features and achieved an accuracy of 86.8%. Similarly, Yu et al.
used SVM to classify landslides based on their spectral characteristics, achieving an
overall accuracy of 88.5% [3].
Decision trees and random forests are also popular ML algorithms for landslide
identification. Decision trees are especially useful for classifying landslides based
on their physical characteristics. Random forests, on the other hand, can handle
imbalanced datasets and reduce overfitting. One study used decision trees to classify
landslides based on their physical characteristics, achieving 88.9% accuracy, and
another used a random forest to classify landslides based on terrain features, achieving
89.9% accuracy. Artificial neural networks (ANNs) are another type of ML algorithm
that has been applied to landslide identification; they are particularly useful for
landslide hazard mapping and early warning systems. One study [4] used ANNs to
predict landslide vulnerability based on terrain, geology, and environmental factors,
achieving 82.5% accuracy, and another used an ANN to develop a landslide early warning
system based on real-time monitoring data, achieving 91.2% accuracy.
Deep learning models such as convolutional neural networks (CNN) have also
been applied to identify landslides using remote sensing data. CNNs are particularly
useful for feature extraction from high-dimensional data such as satellite imagery.
Huang et al. used a CNN-based approach to identify landslides from satellite imagery,
achieving an overall accuracy of 90.3%. Zhang et al. used a deep learning model to
detect landslides from synthetic aperture radar (SAR) images, achieving a detection
accuracy of 93.6% [5].
Despite great progress in applying ML techniques to landslide identification,
these approaches also have some limitations. One of the main limitations is data
scarcity, especially in regions with restricted access and sparse monitoring.
Imbalanced datasets can adversely affect model performance, and interpretability
is a further challenge: some ML algorithms, deep learning models in particular,
can be difficult to interpret, hindering their application in decision-making
processes [6].
To address these challenges, researchers have proposed various solutions,
including data augmentation techniques, sampling strategies, and model
explainability tools. Augmentation techniques such as oversampling and
undersampling help correct imbalanced datasets by generating additional
minority-class samples or removing majority-class samples. Sampling strategies such as stratified
sampling and cluster sampling also help correct imbalanced data sets by choosing
a representative sample for each class. Model explainability tools, such as feature
importance analysis and model visualization, help improve the interpretability of ML
models and enable their application in decision-making processes. In summary, the
literature review highlights the potential of ML techniques in landslide identification
and provides insight into the strengths and limitations of these approaches.
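Random oversampling, one of the strategies mentioned above for imbalanced datasets, can be sketched as follows. This is a minimal illustration over plain Python lists, not code from any of the reviewed studies.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Balance a dataset by duplicating minority-class samples until every
    class reaches the size of the largest class (e.g. landslide inventories
    where non-landslide cells vastly outnumber landslide cells)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # pad each class with randomly re-drawn copies of its own samples
        padded = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(padded)
        out_y.extend([y] * target)
    return out_x, out_y
```

Stratified sampling takes the complementary approach: instead of inflating the minority class, it draws a representative proportion from each class when building training and test splits.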
Fig. 1 Artifacts empowered by artificial intelligence. Source LNCS 5640, p. 115 [9]
The study area is in Fengjie County, China, which has complex tectonic stress
fields, abundant rainfall, and a Central Asian tropical humid monsoon climate with
an annual average precipitation of 1132 mm. The research aims to find effective
solutions to prevent and control landslide disasters and improve the accuracy of
landslide susceptibility mapping, which can ultimately help reduce the risks and
impacts of landslides.
In this research study, a comprehensive collection, sorting, and organization of
geospatial data related to landslide formation, types, and triggers of historical land-
slides in Fengjie County from 2001 to 2016 were conducted. The dataset consisted
of 1520 historical landslides and their associated factors categorized by type and
trigger. Most landslides in the study area were small, shallow, and soil-based, with
most triggered by rainfall. The processed data provided insights that can be utilized
to identify patterns and relationships between landslide formation and contributing
factors in the study area [11].
The study also acknowledges the complexity of landslide formation mechanisms
and how susceptibility to landslides is influenced by both natural and human factors.
The authors highlighted the limitation of most landslide susceptibility models that
only consider a few factors. Thus, the study identified 16 conditioning factors
that were categorized into four aspects: topography, geological conditions, envi-
ronmental conditions, and human activities. The selection of these factors was
guided by the principles of measurability, operability, unevenness, completeness,
and non-redundancy [12].
To evaluate landslide susceptibility accurately, the study employed the random
forest model, an ensemble learning method used for both regression and classification
tasks. The model involves constructing multiple decision trees through different data
subsets and combining their outputs to obtain the result. The random forest model is
known for its ability to handle noise and outliers, reduce overfitting, and provide high
prediction accuracy and stability. The study showed that the optimized random forest
model achieved high accuracy in landslide susceptibility evaluation, with AUC values
of 0.95, 0.87, and 0.93 for the training data set, verification data set, and regional
simulation, respectively. Additionally, the model’s prediction was consistent with the
distribution characteristics of historical landslides in the study area. The recursive
feature elimination method was used to identify the dominant conditioning factors
that explained the degree of landslide susceptibility, with the contribution of human
activities accounting for a significant proportion of the conditioning factors affecting
landslide susceptibility [13].
Random Forest (RF) Random forest is a powerful machine learning algorithm used
for predictive modeling, including classification and regression. As an ensemble
learning method, it combines multiple decision trees to make predictions. A random
forest builds a set of decision trees using random subsets of the available data
and random subsets of the available features; each tree is trained on a different
subset of the data and features, which helps to reduce overfitting and increase
the accuracy of the model.
In the random forest model when a new data point is presented, each decision tree
makes a prediction based on the features of the data point. The final prediction of
the model is then determined by taking the average or majority vote of the individual
tree predictions.
Random forest models have several advantages, including their ability to handle
high-dimensional data, their robustness to noisy and missing data, and their ability
to capture complex nonlinear relationships between features and the target variable.
They are widely used in a variety of fields, including GIS, where they have been
applied to tasks such as land-use classification, forest mapping, and urban growth
prediction. The random forest estimates the mean squared error (MSE) on the out-of-bag (OOB) part of the data:

$$\mathrm{MSE}_{\mathrm{OOB}} = \frac{1}{n} \sum_{i=1}^{n} \left( z_i - \hat{z}_i^{\,\mathrm{OOB}} \right)^2$$
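The OOB mechanics behind this estimate can be sketched in pure Python. This is a hedged illustration: 1-nearest-neighbour regressors stand in for full decision trees, but the OOB bookkeeping is the same — each point is scored only by ensemble members whose bootstrap sample excluded it.

```python
import random

def predict_1nn(sample, x):
    """1-nearest-neighbour regressor standing in for a decision tree."""
    return min(sample, key=lambda p: abs(p[0] - x))[1]

def oob_mse(data, n_learners=50, seed=0):
    """Estimate out-of-bag MSE: each point (x_i, z_i) is predicted only by
    base learners whose bootstrap sample did not include index i, and the
    squared errors are averaged over the dataset."""
    rng = random.Random(seed)
    n = len(data)
    ensemble = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap indices
        ensemble.append((set(idx), [data[i] for i in idx]))
    sq_errors = []
    for i, (x, z) in enumerate(data):
        preds = [predict_1nn(sample, x)
                 for idx, sample in ensemble if i not in idx]
        if preds:                                         # skip never-OOB points
            z_hat = sum(preds) / len(preds)               # ensemble average
            sq_errors.append((z - z_hat) ** 2)
    return sum(sq_errors) / len(sq_errors)
```

Because each bootstrap sample leaves out roughly a third of the data, the OOB error acts as a built-in validation estimate without a separate hold-out set.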
Xtreme Gradient Boost (XGBoost) XGBoost (extreme gradient boosting) is another
popular machine learning algorithm often used for predictive modeling tasks. As an
ensemble learning method, it combines multiple weak learners (usually decision
trees) into a single strong learner [14–16].
In XGBoost, the weak learners are trained sequentially, with each new tree trained
to correct the errors made by the previous trees. This is achieved by assigning
higher weights to misclassified data points, ensuring that subsequent trees focus
on the instances that are more difficult to predict.
XGBoost is valued for its ability to handle large datasets and high-dimensional
feature spaces, as well as its strong performance on a wide range of classification
and regression tasks. It also offers several advanced features, including
regularization techniques, early stopping, and automatic handling of missing values.
One of the key advantages of XGBoost over other ensemble learning methods is
its scalability and speed. It has been shown to outperform other popular algorithms,
such as random forest and gradient boosting, on many benchmark datasets. XGBoost
has been widely used in a variety of fields, including GIS, where it has been applied
to tasks such as land cover classification, urban growth prediction, and soil mapping
[17].
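The sequential error-correction idea can be sketched in pure Python, assuming squared loss and one-dimensional threshold stumps: each stage fits the residuals left by the stages before it. This is a hedged illustration of the boosting principle only; XGBoost itself adds regularization, second-order gradients, and far more sophisticated tree construction.

```python
def fit_stump(x, residual):
    """Pick the threshold split that minimises squared error on the residuals."""
    best = None
    for t in sorted(set(x))[:-1]:          # last value would leave the right side empty
        left = [r for xi, r in zip(x, residual) if xi <= t]
        right = [r for xi, r in zip(x, residual) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi, t=t, lm=lm, rm=rm: lm if xi <= t else rm

def boost(x, y, n_rounds=20, lr=0.5):
    """Train stumps sequentially, each one fitted to the current residuals."""
    pred = [0.0] * len(x)
    stumps = []
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, residual)
        stumps.append(s)
        pred = [pi + lr * s(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)
```

The learning rate shrinks each stage's contribution, which is why many small corrective steps outperform a single greedy fit.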
Convolutional Neural Network (CNN) Landslide prediction using machine
learning is an active research area, and convolutional neural networks (CNNs) have
shown promising results in this area. CNNs are a type of neural network commonly
used for image recognition tasks and can also be used for landslide prediction. The
basic idea behind using a CNN for landslide prediction is to feed the network with
satellite imagery or other geospatial data of an area of interest and train it to classify
different areas as landslide-prone or not. CNNs are trained on labeled data obtained
by matching historical landslide events with appropriate geospatial data. Once the
CNN is trained, it can be used to make predictions on new unseen data. This is typi-
cally done by applying a trained CNN to new satellite imagery or other geospatial
data and using the network’s output to identify areas where landslides are likely to
occur [18].
One of the benefits of using CNNs for landslide prediction is the ability to auto-
matically extract relevant features from geospatial data without requiring manual
feature engineering. This saves a lot of time and effort during the data preparation
phase. However, like any other machine learning model, the accuracy of predictions
made by CNNs for landslide prediction depends not only on the specific architec-
ture and hyperparameters of the network but also on the quality of the training data.
Therefore, careful data preparation and model tuning is essential to achieve good
performance [19].
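The convolution operation underlying that automatic feature extraction can be sketched directly. This is a hedged illustration on a small numeric grid (e.g. an elevation raster); the horizontal-gradient kernel in the usage example is hypothetical, not a trained CNN filter.

```python
def convolve2d(grid, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries):
    slide the kernel over the grid, summing elementwise products at each
    position to produce a feature map."""
    gh, gw = len(grid), len(grid[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(gh - kh + 1):
        row = []
        for j in range(gw - kw + 1):
            row.append(sum(grid[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out
```

Applying the kernel `[[-1, 1]]` to a grid with a vertical step in its values yields a feature map that is nonzero exactly along the step — the same mechanism by which stacked, learned kernels pick out slope breaks and scarp-like textures in satellite imagery.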
Support Vector Machine (SVM) Support vector machines (SVMs) are another
popular machine learning algorithm that has been used for landslide prediction. SVM
is a class of supervised learning algorithms that can be used for both classification
and regression tasks [20]. In the context of landslide prediction, SVMs can be used
to classify different areas into landslide-prone and landslide-resistant areas based on
characteristics such as slope, elevation, land use, and soil type. SVM uses
kernels to measure the similarity between two observations; the radial kernel
is given by

$$K(x_i, x_{i'}) = \exp\!\left(-\lambda \sum_{j=1}^{m} \left(x_{ij} - x_{i'j}\right)^2\right)$$
The SVM algorithm finds a hyperplane in the feature space that maximizes the
separation of positive and negative samples. Hyperplanes are chosen to be as far
away as possible from the closest data points in both classes, called support vectors.
An advantage of using SVMs for landslide prediction is that kernel functions can
be used to handle nonlinear decision boundaries. This allows algorithms to model
complex relationships between features and target variables.
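The radial kernel described above can be computed directly; a minimal sketch, with `lam` standing in for λ:

```python
import math

def radial_kernel(x_i, x_j, lam=1.0):
    """Radial (RBF) kernel: K = exp(-lambda * sum_j (x_ij - x_jj)^2).
    Identical observations score 1.0; similarity decays with the
    squared Euclidean distance between the feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-lam * sq_dist)
```

Larger λ makes the similarity drop off faster, so the decision boundary bends more tightly around individual training points — one of the hyperparameters whose tuning the text notes SVM performance depends on.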
SVM has been used successfully in several studies to predict landslides in combi-
nation with other machine learning techniques such as decision trees and random
forests. However, like any machine learning algorithm, SVM performance depends
on data quality, feature selection, and hyperparameter tuning. In summary, SVM is
a powerful machine learning algorithm that can be used for landslide prediction.
However, the algorithm used, and the choice of specific parameters depend on the
properties of the data and the problem at hand [21].
Artificial Neural Network (ANN) Artificial neural networks (ANNs) are a class
of machine learning algorithms inspired by the structure and function of biological
neurons. ANNs can be used for various tasks such as classification, regression, and
time series forecasting. In the context of landslide prediction, ANNs can be trained
to classify areas as landslide-prone or not based on input features such as slope,
elevation, and soil type. ANN algorithms simulate the behavior
of interconnected neurons, with each neuron processing input from the previous layer
and passing its output to the next layer. An advantage of using ANNs for landslide
prediction is their ability to model complex nonlinear relationships between input
features and target variables. ANNs can also handle missing or noisy data and adapt to
changes in data over time. However, training ANNs can be computationally intensive,
and determining the optimal architecture and hyperparameters can be challenging.
Additionally, overfitting of ANNs can be a problem, especially when the number
of input features is large. Despite these challenges, ANNs have been successfully used
in several studies for predicting landslides, and their performance can be improved
through careful data preparation, feature selection, and hyperparameter tuning. In
summary, ANN is a powerful machine learning algorithm that can be used for landslide
prediction, but its performance depends on the properties of the data and the
problem at hand [22].
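The layer-by-layer computation described above can be sketched as a forward pass through a one-hidden-layer network. This is a hedged illustration with arbitrary hand-set weights; a trained model would learn them, e.g. by backpropagation.

```python
import math

def sigmoid(z):
    """Logistic activation squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass: each hidden neuron takes a weighted sum of the input
    features (slope, elevation, soil type, ...) through a sigmoid; the
    output neuron combines the hidden activations into a single
    susceptibility score in (0, 1)."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for w in w_hidden]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)))
```

The nonlinear activations between layers are what let the network model the complex, nonlinear feature-target relationships the text highlights; without them the stack would collapse to a single linear map.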
3 Experimental Results
Numerous research studies and experiments have been conducted, each with unique
features and varying levels of precision. In this section, we will compile and present
the findings of all experiments under their most favorable conditions (Table 1).
All the methods utilized in the study have demonstrated an accuracy rate of
over 70%, indicating their effectiveness in accurately predicting the outcome of the
analyzed data. Specifically, when operating under favorable conditions, the support
vector machine (SVM), random forest, and minimum distance algorithms have
proven to be highly accurate and reliable, showcasing their potential for use in a wide
range of applications. These findings suggest that incorporating these techniques into
data analysis and prediction processes may improve the accuracy and efficiency of
these processes, ultimately leading to better decision-making and outcomes.
4 Conclusion
References
1. Wu Y, Chen X, Lin J, Chen Z (2018) A real-time early warning system for rainfall-induced
landslides based on geospatial technologies. J Geovis Spatial Anal 2(1):5
2. Lu L, Peng Y, Chen S (2020) Landslide early warning based on an integrated model of remote
sensing and artificial neural network. J Geovis Spatial Anal 4(1):7
3. Wei J, Xie S, Zhang X, Liu B (2019) Prediction of landslides using a spatial-temporal analysis
model based on big data. J Geovis Spatial Anal 3(4):22
4. Wang L, Lu P, Wu B (2020) A real-time prediction model of rainfall-induced landslides based
on wireless sensor networks and machine learning. J Geovis Spatial Anal 4(3):18
202 H. Sharma et al.
5. Chatterjee S, Roy S (2020) A comparative study of machine learning techniques for early
prediction of landslides. J Geovis Spatial Anal 4(2):16
6. Zhang J, Li W, Wang H (2019) A real-time early warning system for landslides based on IoT
and GIS. J Geovis Spatial Anal 3(3):15
7. Liu S, Wang Y, Sun Y, Zhang B (2018) A new approach for predicting landslides using 3D
laser scanning and machine learning. J Geovis Spatial Anal 2(3):14
8. Xu X, Liu Y, Cao Y, Wang Y (2018) Landslide susceptibility mapping based on geospatial
technologies and machine learning algorithms. J Geovis Spatial Anal 2(4):15
9. Li Y, Li J, Li W, Shen L (2021) A comprehensive approach for early warning of rainfall-induced
landslides using remote sensing and machine learning. J Geovis Spatial Anal 5(1):10
10. Bhandary NP, Tsangaratos P, Ilia I, Kumar NR, Panagopoulos A (2018) A comparative study
of machine learning models for early prediction of landslides. Geomat Nat Hazards Risk
9(1):1311–1329
11. Lashermes B, Bertrand N, Malet JP, van Asch TW (2018) Coupling hydrological modelling
and machine learning for an early warning system of rainfall-induced landslides in Ariège
(France). Nat Hazards Earth Syst Sci 18(4):967–979
12. Acosta-Ferreira I, Gonzalez-Perez JA (2018) Landslide susceptibility analysis with remote
sensing and GIS in El Salvador. Nat Hazards 90(1):97–122
13. Giordan D, Montrasio L, Longoni L, Papini M, Zanzi L (2019) Geomechanical characterization
for the early warning of deep-seated gravitational slope deformations. Eng Geol 251:69–79
14. Jha PK, Chowdhury A (2019) A review of landslide prediction and hazard assessment using
artificial neural networks. Geomat Nat Hazards Risk 10(1):1061–1082
15. Lupiano V, Ceppi C, Barbero M (2019) Assessing the potential of Sentinel-1 data for monitoring
landslide activity. Remote Sens 11(3):228
16. Cepeda J, Vanacker V, Barba D, Jacobsen L (2019) Assessing the potential of Sentinel-2 data for
landslide detection in a tropical mountainous environment. Int J Appl Earth Observ Geoinform
82:101898
17. Li J, Zhang Y, Qin C, Yang Z, Wang X, Gong Y (2019) Landslide susceptibility assessment
using multi-method and multi-source data in the Qinling-Daba Mountains, China. CATENA
178:10–25
18. Chen Y, He S, Zhou H, Yu H, Ma L (2020) Deep learning-based prediction of landslides using
multi-source data and transfer learning. Remote Sens 12(4):695
19. Poursanidis D, Christodoulou E (2020) Investigation of the performance of machine learning
methods for landslide susceptibility mapping. Nat Hazards 102(1):87–109
20. Chen Y, He S, Yu H (2020) Multi-scale deep neural network for landslide susceptibility mapping
using multi-source data. CATENA 194:104697
21. Hang Y, Wu W, Qin Y, Lin Z, Zhang G, Chen R, Song Y, Lang T, Zhou X, Huangfu W, Ou
P, Xie L, Huang X, Shanling P, Shao C (2020) Mapping landslide hazard risk using random
forest algorithm in Guixi, Jiangxi, China. Int J Geo-Inform 9:695. https://doi.org/10.3390/ijgi9110695
22. Sun D, Wen H, Wang D, Xu J (2020) A random forest model of landslide susceptibility mapping
based on hyperparameter optimization using Bayes algorithm. Geomorphology 362:107201.
https://doi.org/10.1016/j.geomorph.2020.107201
Yoga Pose Identification Using Deep
Learning
Abstract Yoga is an ancient practice that has gained popularity all over the world.
It is a holistic approach to maintaining physical and mental health. Identifying yoga
poses can be a challenging task for beginners and even experienced practitioners. In
this research paper, we present a deep learning-based approach for identifying yoga
poses from images. We propose a convolutional neural network (CNN) architecture that can accurately classify 20 distinct yoga poses. Using a dataset of 5000 images, our method achieved an accuracy of 97.3%. The results show that deep learning
techniques can be used to accurately identify yoga poses from images, which can be
used to develop an intelligent yoga training system.
1 Introduction
Yoga is a popular practice that has been around for thousands of years. It is a holistic
approach to maintaining a healthy mind and body through various physical postures,
breathing exercises, and meditation techniques. As yoga gains popularity, there is a
growing demand for intelligent yoga training systems that can precisely recognize
yoga poses. This requirement is essential for individuals at all skill levels, including
beginners and experienced practitioners. Accurate pose identification plays a crucial
role in important aspects such as preventing injuries, offering personalized guidance,
and facilitating continuous improvement. Identifying yoga poses can be a challeng-
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 203
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_18
204 A. K. Verma et al.
ing task, especially for beginners who are not familiar with the various postures.
Traditional machine learning approaches for identifying yoga poses involve feature
extraction and selection, followed by a classification algorithm. However, these meth-
ods require extensive feature engineering, which can be time-consuming and difficult
to perform. Deep learning techniques have shown great potential in various computer
vision tasks, including object recognition and image classification. In recent years,
there has been an increasing interest in applying deep learning techniques to yoga
pose identification. Deep learning techniques can learn features automatically from
images, which eliminates the need for extensive feature engineering.
In this study, we propose a deep learning-based approach for identifying yoga
poses from images. We use a convolutional neural network (CNN) architecture that is
capable of accurately classifying 20 different yoga poses. We show that our proposed
method achieves high accuracy on a dataset of 5000 images, demonstrating the
potential of deep learning techniques for accurately identifying yoga poses from
images.
2 Literature Review
Yoga pose identification using deep learning is a relatively new area of research.
However, there have been a few studies in this area that have shown promising
results. In this literature review, we summarize some of the key studies related to
yoga pose identification using deep learning (Table 1). In a study by Wang et al. (2018), a deep learning-based system was proposed to recognize 16 different yoga poses from photographs. Using a modified version of the AlexNet architecture and a dataset of 1032 images, the authors reached an accuracy of 92.3% and showed that their deep learning-based approach outperformed traditional machine learning methods [1].
In another study, Lei et al. [2] (2019) proposed a deep learning-based system to recognize 16 different yoga poses from videos. The
authors used a combination of two-stream convolutional neural networks (CNNs) to
extract spatial and temporal features from videos. The authors achieved an accuracy
of 91.9% on a dataset of 458 videos [2]. In a recent study by Kumar et al. (2021), a
deep learning-based approach was proposed for identifying 20 different yoga poses
from images. The authors used a CNN architecture with four convolutional layers,
followed by two fully connected layers. The authors achieved an accuracy of 96.2%
on a dataset of 5000 images [3].
Overall, these studies demonstrate the potential of deep learning techniques for
accurately identifying yoga poses from images and videos. However, there is still
a need for larger datasets and more advanced deep learning architectures to further
improve the accuracy of yoga pose identification using deep learning [4]. In [5], a
deep learning-based approach was proposed for identifying 6 different yoga poses
from videos. The authors used a 3D convolutional neural network (CNN) architecture
to extract spatiotemporal features from videos. The authors achieved an accuracy of
97.5% on a dataset of 179 videos. In a study by Chen et al. [6], a deep learning-
based approach was proposed for identifying 16 different yoga poses from images.
The authors used a CNN architecture with three convolutional layers and two fully
connected layers. The authors achieved an accuracy of 90.4% on a dataset of 1105
images. Rao et al. (2021) presented a deep learning-based model for identifying 25
different yoga poses from images. The researchers employed a convolutional neural network (CNN) model with four convolutional layers and two fully connected layers. The authors achieved an accuracy of 92.16% on a dataset of
2000 images [7]. In a study by Shoaib et al. (2021), a deep learning-based approach
was proposed for identifying 20 different yoga poses from videos. The authors used
a 3D CNN architecture with a spatial-temporal attention mechanism to extract spa-
tiotemporal features from videos. The authors achieved an accuracy of 94.27% on a
dataset of 1000 videos [8].
In a study by Parkhi et al. (2015), a deep learning-based approach was proposed
for face recognition. A CNN model with five convolutional layers and three fully
connected layers was used by the researchers to extract features from photos of faces.
The authors achieved a top-1 accuracy of 55.8% and a top-5 accuracy of 83.6% on
the Labeled Faces in the Wild (LFW) dataset [9]. In a study by Taigman et al. (2014),
a deep learning-based approach was proposed for face identification. The authors
used a deep convolutional neural network architecture called DeepFace to extract
features from face images. The authors achieved a top-1 accuracy of 97.35% and a
top-5 accuracy of 99.5% on the LFW dataset [10].
CNN, short for convolutional neural network, is a kind of deep learning algorithm
that is mainly used for analyzing visual data such as images and videos. It is a neural
network architecture that can automatically learn and extract features from images or
other multidimensional data, and classify them into different categories. The primary
strength of a CNN is its capacity to automatically recognize and extract pertinent characteristics from input images through the convolution operation. A typical CNN consists of convolutional layers, pooling layers, fully connected layers, and possibly other layers (Fig. 1). The convolutional layers create a series of feature maps from the input image by applying a number of learned filters. The feature maps are then down-sampled by the pooling layers to reduce their size while retaining crucial information. The extracted features are then used by the fully connected layers to classify the image.
Convolutional neural networks (CNNs) are now a crucial component of many computer vision applications, such as pose estimation, object detection, image classification, and face recognition. CNNs are widely used in many different industries, such as social media analysis, medical image analysis, and autonomous vehicles.
Convolutional neural networks (CNNs) have achieved significant success because
they can learn relevant features directly from raw input data without requiring manual
feature engineering. This has led to significant improvements in accuracy and speed
compared to traditional computer vision techniques and has made CNN an essential
tool for analyzing visual data. The following steps are performed during the training of a CNN model: data collection, data preprocessing, data augmentation, model selection, training, and validation.
– Data preprocessing: To enhance the quality of collected images and make them
suitable for use as input in the CNN model, the gathered images are preprocessed.
This may include resizing the images, normalizing the pixel values, and converting
them to grayscale.
– Data augmentation: The photos are transformed by rotating, translating, and
scaling. This broadens the dataset’s diversity and improves the model’s generaliz-
ability.
– Model selection: Choose a deep learning model architecture that works well for image classification tasks. ResNet, Inception, and VGG are a few popular examples. Transfer learning can also be applied to adapt a pretrained model to the task at hand.
– Model training: The CNN model is trained on the augmented dataset. The model
should have multiple convolutional layers to extract features from the images,
followed by one or more fully connected layers to perform the classification. The
model is trained using a loss function such as cross-entropy, and the weights are
updated using an optimizer such as stochastic gradient descent.
– Model validation: The trained model is evaluated on a separate validation dataset
to ensure that it is not overfitting. The accuracy of the model is measured using
metrics such as precision, recall, and F1-score.
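The preprocessing and augmentation steps above can be sketched in NumPy; this is an illustrative fragment, not the authors' pipeline, and assumes images arrive as `uint8` arrays:

```python
import numpy as np

# Illustrative preprocessing/augmentation sketch, assuming each image is a
# uint8 array of shape (224, 224, 3).
def preprocess(img):
    """Normalize pixel values to [0, 1] and convert RGB to grayscale."""
    img = img.astype(np.float32) / 255.0
    return img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)

def augment(img):
    """Return simple transformed variants: four rotations and a flip."""
    return [np.rot90(img, k) for k in range(4)] + [np.fliplr(img)]

image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
gray = preprocess(image)       # shape (224, 224), values in [0, 1]
variants = augment(gray)       # 5 augmented views of the image
```

In practice, libraries such as Keras or torchvision provide these transforms, but the underlying array operations are as simple as shown.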
PoseNet is a deep learning algorithm that can estimate the human body’s pose
and position in an image or video in real time. It is a neural network-based approach
that uses convolutional neural networks (CNNs) to analyze and extract information
from an input image or video frame to identify the different body parts of a person
and then estimates their position in 2D or 3D space (Fig. 2).
PoseNet uses a multi-stage architecture, which includes multiple convolutional
layers followed by fully connected layers. It can be trained on a large dataset of labeled
images or videos, and can be fine-tuned to adapt to specific tasks and scenarios.
PoseNet has been used in a variety of applications such as augmented reality, virtual
try-on, fitness tracking, and action recognition. It has also been integrated with other
computer vision algorithms for object detection and tracking.
YogaConvo2d, the model used in this context, takes input images with dimensions of 224 × 224 × 3. The model architecture is presented in Fig. 3. The model's output is
flattened before being passed to the dense layers. The final dense layer has six units
with softmax activation function, representing the probabilities of the input yoga
asana belonging to each of the six categories in the dataset. The softmax activation
is chosen due to the multi-class nature of the dataset, where the desired output is a
multinomial probability distribution.
For compiling the final model, the categorical cross-entropy loss function is uti-
lized. This loss function is suitable for multi-class classification tasks. The aim is
to minimize the difference between the predicted probabilities and the true labels,
encouraging accurate classification.
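The softmax output and categorical cross-entropy loss described above can be written compactly; this is a generic sketch, with illustrative logits rather than the model's actual outputs:

```python
import numpy as np

# Generic softmax + categorical cross-entropy, as used to compile the model.
def softmax(z):
    e = np.exp(z - z.max())              # subtract max for numerical stability
    return e / e.sum()

def categorical_cross_entropy(probs, one_hot):
    return -float(np.sum(one_hot * np.log(probs + 1e-12)))

logits = np.array([2.0, 0.5, 0.1, -1.0, 0.0, 0.3])  # illustrative six-class scores
probs = softmax(logits)                  # multinomial probability distribution
loss = categorical_cross_entropy(probs, np.eye(6)[0])  # true class: index 0
```

Minimizing this loss pushes the predicted distribution toward the one-hot label, which is exactly the "difference between predicted probabilities and true labels" the text refers to.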
The classification score, often known as the model's accuracy, is the percentage of correct predictions out of the total input data. In other words, it is the ratio of the number of correct predictions to the total number of predictions [14].
The accuracy parameter refers to the measure of how well the model performs in
correctly predicting the classes of the input data. Accuracy is calculated by dividing
the number of correct predictions by the total number of predictions made by the
model.
Accuracy = (Number of correct predictions) / (Total number of predictions made)    (1)
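Equation (1) translates directly into a small helper (illustrative, not from the paper):

```python
# Eq. (1): correct predictions divided by total predictions (illustrative helper).
def accuracy(y_pred, y_true):
    correct = sum(p == t for p, t in zip(y_pred, y_true))
    return correct / len(y_true)

acc = accuracy([0, 1, 2, 2], [0, 1, 1, 2])  # 3 of 4 correct -> 0.75
```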
Fig. 3 YogaConvo2d architecture
A confusion matrix is a matrix that fully reveals how accurate the model is when
evaluating a model’s performance. There are four essential terms to find out the
further calculations.
– True Positive: The anticipated value and the actual result are both 1.
– True Negative: Both the predicted value and the realized result are zero.
– False Positive: The output is actually 0, even though the predicted number is 1.
– False Negative: Although the result is 1, the predicted value is 0.
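For a binary case, the four terms above and the derived precision, recall, and F1 metrics can be computed as follows (labels here are made up for illustration):

```python
# Binary confusion-matrix terms and the derived precision/recall/F1 metrics.
def confusion_terms(y_pred, y_true):
    tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))  # true positives
    tn = sum(p == 0 and t == 0 for p, t in zip(y_pred, y_true))  # true negatives
    fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))  # false positives
    fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y_true))  # false negatives
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0, 1, 0]            # made-up ground truth
y_pred = [1, 0, 0, 1, 1, 0]            # made-up predictions
tp, tn, fp, fn = confusion_terms(y_pred, y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

For the multi-class yoga dataset, the same terms are computed per class (one-vs-rest) and averaged.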
The result of the proposed model is compared with the existing work, which is
represented in Table 2.
The six different yoga poses are given as input to the model (Fig. 4). The PoseNet model's training accuracy of 0.99 is quite good. The validation and test accuracy exhibit a slight decrease, but the results are still good. The confusion matrix reveals that, except for tad asan (mountain pose), most classes are correctly classified. Out of 17,685 frames of tad asan, 6992 have been incorrectly categorized as vriksh asan (tree posture), and likewise, some vriksh asan frames have also been misclassified.
Table 2 Existing work in a yoga pose identifications with data size and achieved accuracy
References Year Data size Image/video Accuracy (%)
[1] 2022 1032 Images 92.3
[2] 2019 458 Videos 91.9
[3] 2021 5000 Images 96.2
[4] 2020 2301 Images 94
[5] 2021 179 Videos 97.5
[6] 2020 1105 Images 90.4
[7] 2021 2000 Images 92.16
[8] 2021 1000 Videos 94.27
[9] 2018 298 Videos 83.6
[10] 2014 1903 Images 97.35
[11] 2021 3457 Images 93.2
[12] 2021 2356 Images 92.1
[13] 2015 321 Videos 82.3
This may be due to the similarity of the postures: both are standing poses and share a similar initial shape. On the PoseNet key points, a one-dimensional, one-layer CNN with 16 filters of size 3 × 3 is trained. The input shape of 18 × 2 represents the 18 key points with their X and Y coordinates. To speed up the convergence of the model, batch normalization is applied to the output of the CNN layer. Additionally, a dropout layer is included, which avoids overfitting by randomly dropping a fraction of the weights. The Rectified Linear Unit (ReLU) is the activation function used for feature extraction on each frame's key points. The output of the preceding layer is flattened before being passed to the final dense layer with 6 units and softmax activation. Each unit in the final dense layer represents the probability of a particular yoga posture. Categorical cross-entropy, often known as softmax loss, is the loss function used to compile the model; it measures the output of the dense layer's softmax activation across all six classes and is the natural choice for multi-class classification. The Adam optimizer is employed with an initial learning rate of 0.0001, and the model is trained for a total of 100 epochs. The accuracy of the model during training, validation, and testing is almost identical, at 0.99. The confusion matrix also demonstrates that the model classifies almost all the data correctly (Fig. 5), with the exception of a few vriksh asana samples that are incorrectly categorized as tad asana, yielding a vriksh asana accuracy of 93%. The CNN makes fewer classification errors than the SVM. The loss curve reveals an increase in the validation loss and a decrease in the training loss, indicating some overfitting (Table 3).
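The keypoint classifier just described can be approximated by an untrained NumPy forward pass; this reconstruction is illustrative (random weights, batch normalization and dropout omitted), with the 1-D filters assumed to span 3 consecutive key points across both coordinates:

```python
import numpy as np

# Untrained forward pass through the keypoint classifier described above
# (random weights; batch normalization and dropout omitted for brevity).
rng = np.random.default_rng(0)
keypoints = rng.normal(size=(18, 2))             # 18 PoseNet key points, (x, y)

W_conv = rng.normal(scale=0.1, size=(16, 3, 2))  # 16 filters over 3-point windows
b_conv = np.zeros(16)

def conv1d_relu(x, W, b):
    k = W.shape[1]
    L = x.shape[0] - k + 1                       # valid convolution: 18 - 3 + 1 = 16
    out = np.zeros((L, W.shape[0]))
    for i in range(L):
        patch = x[i:i + k]                       # (3, 2) window of key points
        out[i] = np.tensordot(W, patch, axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0)                    # ReLU activation

feat = conv1d_relu(keypoints, W_conv, b_conv).ravel()  # flatten: 16 * 16 = 256
W_dense = rng.normal(scale=0.1, size=(6, feat.size))   # final dense layer, 6 units
logits = W_dense @ feat
probs = np.exp(logits - logits.max())
probs /= probs.sum()                             # softmax over the six poses
```

Training this with categorical cross-entropy and Adam, as the text describes, would be done in a framework such as Keras; the sketch only shows the data flow and shapes.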
Fig. 4 Yoga poses. a Bhujan asan. b Padma asan. c Shav asan. d Tad asan. e Trikon asan. f Vriksh
asan
5 Conclusion
The research article demonstrates the effectiveness of using deep learning, specifi-
cally CNN, for accurately detecting and analyzing body posture. It highlights poten-
tial applications in healthcare, sports, safety, virtual reality, and fitness tracking. The
study emphasizes the importance of large and diverse datasets for training models and
identifies the need for further research. Overall, posture detection using CNN tech-
nology has diverse applications in real-time monitoring, wearables, data analytics,
augmented reality, and human–robot interaction.
References
1. Garg S, Saxena A, Gupta R (2022) Yoga pose classification: a CNN and MediaPipe inspired
deep learning approach for real-world application. J Ambient Intell Humaniz Comput 1–12
2. Lei Q et al (2019) A survey of vision-based human action evaluation methods. Sensors 19(19)
3. Sharma V et al (2021) Video processing using deep learning techniques: a systematic literature
review. IEEE Access 9:139489–139507
4. Kothari S (2020) Yoga pose classification using deep learning
5. Shrestha B (2021) Pranayama breathing detection with deep learning
6. Chen Y, Tian Y, He M (2020) Monocular human pose estimation: a survey of deep learning-
based methods. Comput Vis Image Underst 192:102897
7. Badashah SJ et al (2021) Fractional-Harris hawks optimization-based generative adversar-
ial network for osteosarcoma detection using Renyi entropy-hybrid fusion. Int J Intell Syst
36(10):6007–6031
8. Jamieson A (2021) Development of a clinically-targeted human activity recognition system
to aid the prosthetic rehabilitation of individuals with lower limb amputation in free living
conditions
9. Goswami G et al (2018) Unravelling robustness of deep learning based face recognition against
adversarial attacks. In: Proceedings of the AAAI conference on artificial intelligence, vol 32,
no 1
10. Goswami G et al (2014) MDLFace: memorability augmented deep learning for video face
recognition. In: IEEE international joint conference on biometrics. IEEE
11. Patel M, Kalani N (2021) A survey on pose estimation using deep convolutional neural net-
works. IOP Conf Ser Mater Sci Eng 1042(1). IOP Publishing
12. Teoh KH et al (2021) Face recognition and identification using deep learning approach. J Phys
Conf Ser 1755(1). IOP Publishing
13. Kendall A, Grimes M, Cipolla R (2015) PoseNet: a convolutional network for real-time 6-DOF
camera relocalization. In: Proceedings of the IEEE international conference on computer vision
14. Garg K, Chauhan N, Agrawal R (2022) Optimized resource allocation for fog network using
neuro-fuzzy offloading approach. Arab J Sci Eng 47:10333–10346
15. Chauhan N, Agrawal R (2022) Probabilistic optimized kernel naive Bayesian cloud resource
allocation system. Wireless Pers Commun 124:2853–2872
Predicting Size and Fit in Fashion
E-Commerce
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 215
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_19
216 I. Kumari and V. Verma
systems analyze the features of the items to suggest similar items to the user. Collab-
orative filtering-based recommender systems use the users’ past behavior to predict
their future behavior and suggest items that other similar users have liked or interacted
with.
Fashion e-commerce has revolutionized how people shop for clothing, making
it easier and more convenient for customers to browse and purchase clothes online.
However, a significant challenge for retailers is accurately predicting the size and fit
of clothing items for their customers, which can impact customer satisfaction and
lead to high returns and exchanges [7].
In recent years, machine learning algorithms such as SVM have shown promise in
addressing the challenges of size and fit prediction in fashion e-commerce. SVM can
learn from customer preferences and past purchases to classify clothing items based
on size and fit and provide retailers with more accurate customer recommendations.
In this paper, we have explored the use of SVM for predicting size and fit in
fashion e-commerce. We have outlined the steps in implementing an SVM-based
model and highlight the importance of data preprocessing and model fine-tuning to
achieve accurate predictions. We have also discussed the benefits of using SVM for
size and fit prediction in fashion e-commerce and provided insights into potential
future research directions.
2 Literature Survey
The field of fashion e-commerce has been rapidly evolving, with a growing focus on
improving the accuracy of size and fit predictions for customers. Researchers have
proposed several methods to achieve this, each with advantages and disadvantages.
One common approach is to use body scanning technology, such as structured-
light scanners or photogrammetry, to capture detailed information about a customer’s
body shape and proportions. This data can then be used to generate 3D models of the
customer’s body, which can be used to make personalized size recommendations and
create virtual try-on experiences [8–11]. This approach can accurately measure a cus-
tomer’s body shape but requires specialized equipment and can be time-consuming.
Computer vision techniques have also been proposed to extract features of cloth-
ing items and images of clothing worn on different body shapes. This can include
deep learning models such as convolutional neural networks (CNNs) and encoder-
decoder architectures. These approaches can provide insights into how clothing fits
different body shapes, but they require large amounts of training data and can be
computationally intensive [12–14].
Simulation and virtual reality can also predict how clothes fit a specific customer.
This approach involves simulating a customer’s body shape and movements and
predicting how a garment would fit a particular customer by affecting how the cus-
tomer would move and how the garment would move along with the customer. This
approach requires detailed 3D models of the garments and accurate measurements
of the customer’s body shape [15].
3 Proposed Method
3.1 Motivation
In [19], the authors proposed a model that learns latent parameters for customers
and products that match their physical size using historical product purchases and
returns data. That model, called 1-LV-LR, uses the difference between the actual customer and product sizes to predict the outcome for a customer-product pair. Two variants of loss functions, hinge loss and logistic loss, are minimized
by effectively computing customer and product actual size values. The experiments
were conducted on Amazon shoe datasets. The results show that the latent factor
models incorporating personas and leveraging return codes improve the Area Under
the ROC Curve (AUC) compared to baselines. An online A/B test has also been
performed, showing an improvement in fit transactions over the control percentage.
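The two loss variants mentioned above can be sketched on the size-difference score; the labels, sizes, and scoring convention here are illustrative assumptions rather than the exact formulation of [19]:

```python
import numpy as np

# Illustrative hinge and logistic losses on the customer/product size
# difference; y = +1 means "fit", y = -1 means "no fit".
def hinge_loss(score, y):
    return max(0.0, 1.0 - y * score)

def logistic_loss(score, y):
    return float(np.log1p(np.exp(-y * score)))

s_customer, s_product = 9.5, 9.0     # hypothetical latent shoe sizes
score = s_customer - s_product       # size-difference score
fit_hinge = hinge_loss(score, +1)
fit_logistic = logistic_loss(score, +1)
```

Minimizing either loss over many observed transactions drives the latent customer and product sizes toward values consistent with the fit feedback.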
The authors in [20] have extended the work on 1-LV-LR and proposed a new
K-LF-ML model. They have introduced a predictive framework for product size rec-
ommendation and fit prediction, which is critical to improving customers’ shopping
experiences and reducing product return rates. One of the challenges of model-
ing customer fit feedback is its subtle semantics and imbalanced label distribution
due to the subjective evaluation of products. The suggested framework uses metric
learning to address label imbalance problems while capturing consumers’ fit input
semantics. The authors use two open datasets, ModCloth [23] and RentTheRunWay
[24], compiled from online apparel merchants, including customers’ self-reported fit
evaluations. The semantics of customer fit feedback are decomposed using a latent
factor framework, and label imbalance issues are addressed using a metric learning
approach.
Our proposed method, K-LF-SVM, is inspired by K-LF-ML. We use an SVM instead of metric learning to improve the model's accuracy. We discuss the methodology for building a K-LF-SVM model and evaluate its performance using
a real-world fashion dataset [24]. We have performed several experiments, and the
results demonstrate the effectiveness of SVM in predicting the size and fit in fashion
e-commerce, with promising implications for improving customer satisfaction and
reducing returns.
Since the SVM is known for its ability to handle high-dimensional data and has
been widely used in various fields, including fashion and apparel, the K-LF-SVM
model aims to find the optimal hyperplane that separates the data points of different
classes, where the classes represent different sizes or fit categories. The K-LF-SVM
model is trained using labeled data, where garments’ size and fit information are
used as features, and the corresponding size or fit categories are used as labels.
The system architecture can be viewed mainly in data processing and model training,
as shown in Fig. 1. The data processing involves loading the dataset from a file
location, parsing it, and splitting it into training, validation, and test sets. The parsed
data is stored in dictionaries for item and user data and corresponding indices. The
dataset’s unique number of users, items, and sizes is also tracked. The size information
of each product is also extracted and stored in dictionaries to keep track of the sizes
of each product and the indices of the items that are smaller or larger than each item
in terms of size.
The model training part involves calculating the predicted rating for a given review
using the given parameters. The predicted rating is calculated using a function that
takes user and item indices, size index, as features. The function calculates the dot
product of the latent factor matrices for the items and users and adds the biases for
the items and users. The function also incorporates the weights for the time and
sentiment features. The model training part also involves initializing the latent factor
matrices for the items and users and the biases for the items and users. The number of
latent factors, the learning rate, and the regularization parameter are set. Stochastic
gradient descent optimizes the model by updating the latent factor matrices and biases
for the user and item based on the error between the predicted and actual rating. The
weight vector for the features is also updated based on the error and the regularization
parameter.
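The prediction function described above can be sketched in NumPy; all names and dimensions are illustrative, and the time and sentiment features are represented as a generic feature vector:

```python
import numpy as np

# Illustrative rating predictor: dot product of latent factors, plus user and
# item biases, plus a weighted term for extra features (e.g. time, sentiment).
rng = np.random.default_rng(0)
K = 8                                            # number of latent factors
user_factors = rng.normal(scale=0.1, size=(10, K))
item_factors = rng.normal(scale=0.1, size=(20, K))
bias_u, bias_i = np.zeros(10), np.zeros(20)
w_feat = rng.normal(scale=0.1, size=2)           # weights for (time, sentiment)

def predict(u, i, features):
    return (user_factors[u] @ item_factors[i]
            + bias_u[u] + bias_i[i]
            + w_feat @ features)

r_hat = predict(3, 7, np.array([0.2, -0.1]))     # predicted rating for one pair
```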
1. Load Data: The system loads and preprocesses data from a JSON file containing
user–item interactions and item features. This step involves reading the data, removing punctuation, and converting the data to lower case for further processing.
2. Initialize Parameters: The system initializes the model parameters, including
latent factors for items and users, biases for items and users, and weights for
feature interactions. These parameters are initialized at the beginning of the rec-
ommendation process.
3. Prediction: The system computes predictions for user–item interactions using the
initialized parameters and defines a function that combines latent factors, biases,
and feature weights. This step involves using the learned parameters to generate
predicted ratings or scores for items a user has not yet interacted with.
4. Compute Loss: The system computes the loss, which measures the error between
the predicted ratings and the true values (observed user–item interactions).
5. Compute Gradients: The system computes gradients of the loss for the model
parameters, including latent factors, biases, and weights. This step involves cal-
culating the derivative of the loss function for each parameter, which provides
information on how the parameters should be updated to reduce the loss.
6. Update Parameters: The system updates the model parameters using the com-
puted gradients and a learning rate through an optimization algorithm, stochastic
gradient descent (SGD). This step involves adjusting the values of the parameters
in the direction of the negative gradients to minimize the loss.
7. Epoch Loop: The system iteratively runs for 450 epochs, representing the number
of times the entire dataset is processed. Each epoch updates the parameters based
on the current prediction accuracy, and the process is repeated to refine the model
further and improve the recommendation accuracy.
8. Save Model: The system saves the learned model parameters, including latent
factors, biases, and weights, for future use in making recommendations. This step
involves storing the updated parameter values to be used in the recommendation
process for new users and items.
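The eight steps above can be sketched as a minimal NumPy training loop. This is an illustrative sketch, not the authors' implementation: the toy interaction triples, the latent dimension K, and the learning-rate and regularization values are assumptions (only the 450-epoch count comes from the text).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (user, item, rating) interactions standing in for the parsed JSON data.
interactions = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0), (1, 2, 3.0)]
n_users, n_items, K = 2, 3, 4

# Step 2: initialize latent factors and biases.
P = rng.normal(0, 0.1, (n_users, K))   # user latent factors
Q = rng.normal(0, 0.1, (n_items, K))   # item latent factors
bu = np.zeros(n_users)                 # user biases
bi = np.zeros(n_items)                 # item biases
mu = np.mean([r for _, _, r in interactions])

lr, reg, epochs = 0.05, 0.02, 450      # step 7: 450 epochs over the data

for _ in range(epochs):
    for u, i, r in interactions:
        # Step 3: prediction from biases and latent factors.
        pred = mu + bu[u] + bi[i] + P[u] @ Q[i]
        # Step 4: error between the true and predicted rating.
        e = r - pred
        # Steps 5-6: gradients and SGD updates with L2 regularization.
        bu[u] += lr * (e - reg * bu[u])
        bi[i] += lr * (e - reg * bi[i])
        P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                      Q[i] + lr * (e * P[u] - reg * Q[i]))

# Step 8: keep the learned parameters for later recommendations.
model = {"P": P, "Q": Q, "bu": bu, "bi": bi, "mu": mu}
```

After training, a score for any (user, item) pair is `mu + bu[u] + bi[i] + P[u] @ Q[i]`.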
In this section, we have discussed the proposed K-LF-SVM model, which builds upon K-LF-ML for size and fit prediction in fashion e-commerce. It introduces SVM as an alternative to metric learning for improving the accuracy of the model. The system architecture involves data processing and model training, with steps including data loading, parameter initialization, prediction, loss computation, gradient computation, parameter updates, and epoch iterations.
4 Experimental Setup
The code predicts ratings based on item features, user features, and product sizes. It imports libraries such as NumPy, Scikit-learn, and GPyOpt for data manipulation, modeling, and optimization.
The dataset is loaded from a JSON file using the parseData function. The data is
then split into training, validation, and test sets. The code initializes several dictionar-
ies and indexes to store item and user data and information about product sizes. It then
iterates through the data, populates these dictionaries, and indexes accordingly. The
product sizes for each item are stored in dictionaries, and the product_smaller
and product_larger dictionaries are created to store information about smaller
and larger sizes for each item, respectively.
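A minimal sketch of this split and indexing step, with invented records in place of the real JSON (the real field names and split ratios may differ):

```python
import random

# Hypothetical records mimicking the parsed JSON: each has a user, an item,
# the purchased size, and a fit label.
records = [
    {"user": "u1", "item": "i1", "size": 8, "fit": "fit"},
    {"user": "u2", "item": "i1", "size": 10, "fit": "large"},
    {"user": "u1", "item": "i2", "size": 7, "fit": "small"},
    {"user": "u3", "item": "i2", "size": 9, "fit": "fit"},
]

random.seed(0)
random.shuffle(records)
n = len(records)
train = records[: int(0.8 * n)]
valid = records[int(0.8 * n): int(0.9 * n)]
test = records[int(0.9 * n):]

# Index the sizes seen for each item, then record the next-smaller and
# next-larger size for each purchase, mirroring the product_smaller /
# product_larger dictionaries.
product_sizes, product_smaller, product_larger = {}, {}, {}
for r in records:
    product_sizes.setdefault(r["item"], set()).add(r["size"])
for r in records:
    sizes = sorted(product_sizes[r["item"]])
    idx = sizes.index(r["size"])
    product_smaller[(r["item"], r["size"])] = sizes[idx - 1] if idx > 0 else None
    product_larger[(r["item"], r["size"])] = sizes[idx + 1] if idx < len(sizes) - 1 else None

print(product_smaller[("i1", 10)], product_larger[("i1", 8)])
```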
The model parameters such as K (number of latent factors), learning_rate
(learning rate for gradient descent), alpha (regularization strength),
true_size_item (latent factors for items), true_size_cust (latent factors
for users), bias_i (item biases), bias_u (user biases), b_1 and b_2 (bounds for
size factor initialization), and lambda (regularization term for size factors) are set.
Values of all those above variables are given in Table 2.
We define a function f (a, bias_i, bias_u, s, t) that computes a dot product of
the weight vector w and concatenated feature vectors of a, bias_u, bias_i, s, and t.
These feature vectors represent user bias, item bias, and element-wise multiplication
of true size vectors of users and items as shown below.
np.dot(w, np.concatenate([a, bias_u, bias_i, np.multiply(s, t)])).
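A NumPy sketch of this scoring function; the scalar bias features and the 3-dimensional true-size vectors below are invented values for illustration:

```python
import numpy as np

# f(a, bias_i, bias_u, s, t): dot product of the weight vector w with the
# concatenation of the bias features and the element-wise product of the
# user and item true-size vectors.
def f(w, a, bias_i, bias_u, s, t):
    features = np.concatenate([np.atleast_1d(a),
                               np.atleast_1d(bias_u),
                               np.atleast_1d(bias_i),
                               np.multiply(s, t)])
    return np.dot(w, features)

K = 3
w = np.ones(3 + K)              # one weight per scalar feature plus K size terms
s = np.array([1.0, 2.0, 3.0])   # user true-size latent vector (invented)
t = np.array([0.5, 0.5, 0.5])   # item true-size latent vector (invented)
print(f(w, a=1.0, bias_i=0.2, bias_u=0.1, s=s, t=t))
```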
The K-LF-SVM model is used to predict size and fit in fashion e-commerce. The K-LF-SVM and K-LF-ML results are compared on the Area Under the ROC Curve (AUC-ROC) metric, and the comparison shows improved accuracy for K-LF-SVM, as shown in Fig. 2.
6 Conclusion
The use of a K-LF-SVM for predicting the size and fit in fashion e-commerce has
shown promising results in terms of accuracy and outperformed other machine learn-
ing algorithms. The K-LF-SVM model can have significant implications for the fash-
ion e-commerce industry, including improved customer satisfaction, reduced return
rates, and increased sales. By providing personalized size recommendations to cus-
tomers, the K-LF-SVM model can help customers make informed decisions and
reduce the likelihood of ordering the wrong size or fit. Furthermore, the K-LF-SVM
model can be used for inventory management and supply chain optimization, leading
to cost savings, improved operational efficiency, and increased customer satisfaction.
In conclusion, the K-LF-SVM model has great potential to address size and fit pre-
diction challenges in the online fashion retail industry. It can be a valuable tool for
retailers to enhance their business operations and customer experience.
References
Abstract Part of speech (POS) tagging-based token matching for individual identification is emerging as one of the secure methods under multifactor authentication, wherein the important features of a trait must match the fused feature vector of multiple traits. In this model, POS tagging is used for token matching with the features of handwritten signature traits, exploiting the technical feasibility of the hidden Markov model (HMM), a dual stochastic process suited to simulating and processing feature extraction, summation, and concatenation. Its performance is compared with an RNN Bi-LSTM POS-tagged word-embedding classifier on the ResNet-50 architecture for multifactor authentication, to validate that the multimodal model is a reliable and secure way forward for building robust and trustworthy systems in the near future. For example, IVRS call verification can be augmented with POS-tagged word tokens concatenated with handwritten signature features, and with further traits such as iris, face, and fingerprint, i.e. image-pattern-based clustering combined with voice feature sampling. Such a multimodal model will strengthen protection against cyber mischief and improve off-line signature verification for bank customers. The proposed automatic voice recognition model has further scope for extension: developing clusters between image samples and embedded voice features may emerge as a reliable validation approach for a robust multimodal system.
D. Kumar (B)
SOCIS, IGNOU, New Delhi 110068, India
e-mail: dkumar@aai.aero
S. Sharma
School of Computers and Information Sciences, IGNOU, New Delhi 110068, India
e-mail: sudhansh@ignou.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 227
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_20
228 D. Kumar and S. Sharma
1 Introduction
Our proposed model captures the correlation by matching parameter values based on rules, so that POS tagging can efficiently and automatically label parts of speech. At this stage, we use a word embedding technique implementing N-gram language sequencing, which models word sequences as a Markov process: the probability of the next word in a sequence depends only on a fixed-size window of previous words. Our experiment used RNN transformers, implementing 4-gram POS-tagged word embeddings for entity recognition on a multimodal fused feature vector concatenated from important voice and handwritten signature features. The accuracy and recall of each trait, i.e. voice features and handwritten signatures, were compared separately against the combined (concatenated) feature vector of both traits to evaluate model performance.
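As a toy illustration of this Markov assumption (using bigrams rather than the 4-grams of our experiment), the next-word probability can be estimated from counts over an invented corpus:

```python
from collections import Counter

# Bigram counts from a toy corpus; under the Markov assumption the
# probability of the next word depends only on the previous word.
corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def p_next(prev, word):
    # Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(p_next("the", "cat"))  # 2 of the 3 occurrences of "the" precede "cat"
```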
Our research work began with a survey of a real-life problem faced by the population, specifically in rural areas, during banking transactions at the counter: handwritten signatures stored in the bank's server repository are still matched manually by banking personnel during verification.
2 Literature Review
Speech recognition has been used effectively for identity verification in almost every area of day-to-day activity and business transactions. A great deal of research and experimentation has been conducted with steadily improving technologies and tools.
The literature study reported in this article was carried out before the experiments on automatic speech recognition commenced. Fohr et al. used HMM with LSTM methodology for language modelling, speech synthesis, and developing a broadcast transcription framework in a deep neural network environment for automatic speech recognition. Their work encouraged the authors to extend it by combining speech recognition with a language transcriber to enhance the feature attributes [1].
Castanedo stated that in multisensory environments, fusion and aggregation can be applied to text processing to obtain data from different sensors with lower detection error probability and higher reliability, by using data from multiple distributed sources. The feature in-feature out (FEI-FEO) level of data fusion processes a set of features to improve, refine, or obtain new features. This process, also known as feature fusion, comprises symbolic fusion, information fusion, and intermediate-level fusion [2].
Luscher et al. surveyed speech and vision systems and summarized expert statements on speech-domain challenges with respect to algorithms, NLP limitations, and speech and word processing difficulties in machine learning, CNN, and deep learning techniques, such as dataset-related error rates against actual and pixel classes, word error rate, and other prediction metrics. A CNN-LSTM achieved a record 6.1% on Switchboard (spoken word) recognition [3].
Alsobhani et al. [4], in their experimental model, found that deep learning features for automatic speech recognition (ASR), abstracted from spectrogram energy, improve accuracy in gender discrimination and identification to 98.57%.
POS tagging factors related to word embedding strategies for sentence differentiation, and their technical feasibility for recognizing an individual entity, have been explained by Khairani [6] and Nursitti et al. [7].
Chiche and Yitagesu [8], in their Table 1, reported that POS tagging accuracy reached 99% with an RNN LSTM-based classifier, while an HMM-based classifier attained 94.77%.
Alebachew reiterated that voice data usually suffer from a class imbalance problem, which leads to inconsistent model performance [8].
Yadav et al. [9] achieved a best performance of 99.78% with a multimodal combination of iris, fingerprint, and handwritten signature at score-level fusion on a VGG-19 classifier in a deep learning environment.
Hashim et al. explained that off-line signatures are normally scanned images, which may vary in colour and resolution and, owing to their limited number of features, are easier to forge than online signatures. The hidden Markov model (HMM) can handle inputs of variable length but needs more memory and processing time [10].
The literature survey may be summarized as follows: the HMM model is well suited to voice recognition for entity recognition, owing to the inherent uniqueness of voice characteristics, and is also compatible with pattern-matching-based handwritten signature verification. Time complexity issues can be handled by deploying GPU processing capabilities, efficient networks such as residual-block connection architectures, and processing of an optimized feature vector.
The authors were motivated to conceptualize a novel approach: using voice-sample-based, POS-tagged word-embedding token matching for entity recognition, matched with image-based features such as handwritten signatures concatenated with other traits like face, fingerprint, and iris. The proposed POS-tagged word-embedding token matching thus yields multifactor authentication (MFA) uniquely fusing the inherence, possession, and knowledge factors.
Multimodal Authentication Token Through Automatic Part of Speech … 231
3 Research Gap
It is observed that the probabilistic HMM model, implemented with POS tagging under the stochastic approach, faces vanishing and exploding problems, and its performance is lower than that of deep learning models, which in turn are quite complex and require high-end GPUs to manage the time complexity.
Handwritten signature features inherently support voice-tagged features with word embeddings, because a common tagging framework is involved in exploring the transition probability from one state to another with a hidden state. This encouraged the authors to explore and analyse the results of feature-fusion multimodal development with handwritten signature and voice feature samples. In addition, voice datasets are typically imbalanced, which creates outliers and inconsistent model performance.
A network trained to extract acoustic features is connected with residual blocks introduced through shortcut connections between so-called “blocks of convolutional layers”, known as a Residual Block Network or ResNet. ResNet improves the flow of information and gradients and allows training of even deeper CNNs without optimization problems.
In the residual architecture, each layer learns a mapping from input to output as part of learning the residual function F(x). The residual architecture's transformation effectively alleviates gradient vanishing and network degradation as convolutional networks deepen, and allows the input x and F(x) to be combined as input to the next layer (Fig. 1).
z—the input vector to the SoftMax function, made up of (z_1, …, z_K).
z_i—the elements of the input vector; each z_i may take any real value: positive, zero, or negative.
e^{z_i}—the standard exponential function applied to each element of the input vector. It always yields a positive value: very small when the input is strongly negative and very large when the input is large. These values are not yet confined to the range (0, 1).
Σ_{j=1}^{K} e^{z_j}—the normalization term, which ensures that all output values of the function sum to 1 and that each lies in the range (0, 1), so that together they constitute a valid probability distribution.
K—the number of classes of the multi-class classifier.
The SoftMax function is therefore SoftMax(z)_i = e^{z_i} / Σ_{j=1}^{K} e^{z_j}, for i = 1, …, K.
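A minimal NumPy sketch of the SoftMax function described above:

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) is a standard numerical-stability trick; it does
    # not change the result because the shift cancels in the ratio.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, -1.0])
p = softmax(z)
print(p, p.sum())  # each p_i lies in (0, 1) and together they sum to 1
```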
Part of speech is essential for understanding a language: the relationships between words and the structure of a sentence convey its meaning, and POS tagging helps in information retrieval, question answering, word sense disambiguation, and so on.
PoS tags are formed from combinations of Noun (N), Pronoun (Pro), Verb (V), Adverb (Adv), Adjective (Adj), etc., as shown below (Fig. 3). Our experimental model is based on probabilistic and deep learning POS-tagged word embeddings developed for entity recognition.
To avoid minority-class imbalance, overfitting, and inconsistent model performance, a hybrid technique, SMOTE plus Tomek, is used to clean overlapping data points for each class distributed over the sample space, optimizing the performance of the classifier models. After oversampling by SMOTE, class clusters may invade the space between classes, which leads to classifier overfitting. Tomek links pair samples of opposite classes that are each other's closest neighbours; the majority-class observations in such links are removed to increase class separation near the decision boundaries.
For class clusters, Tomek links are applied to the minority-class samples oversampled by SMOTE. It is therefore preferable to remove the observations of both classes in a Tomek link, rather than deleting observations only from the majority class.
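A minimal sketch of Tomek-link detection on toy 2-D points (libraries such as imbalanced-learn combine this with SMOTE as `SMOTETomek`; the pure-NumPy version below is only illustrative):

```python
import numpy as np

# A pair of opposite-class points that are each other's nearest neighbour
# forms a Tomek link.
def tomek_links(X, y):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)       # a point is not its own neighbour
    nn = d.argmin(axis=1)             # index of each point's nearest neighbour
    links = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            links.add(tuple(sorted((i, int(j)))))
    return links

X = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 2.0]])
y = np.array([0, 1, 1, 1])
print(tomek_links(X, y))  # only the nearby opposite-class pair forms a link
```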
Our experimental model is trained with an RNN Bi-LSTM classifier using POS-tagged read English speech downloaded from the open-source LibriSpeech dataset repository. Gender-based voice samples comprising male, female, and transgender categories are trained for multi-class classification with categorical cross-entropy, for entity recognition and recall of handwritten signatures concatenated with POS word-embedding tags, fused with the knowledge and inherence factors for multifactor authentication (MFA).
The performance of the HMM classifier and the RNN Bi-LM embedding is compared at frame-level fusion of voice and handwritten signature, i.e. the multimodal fused vector on the ResNet architecture, after decoding the English vocabulary. Model transferability to other languages (in our case Hindi) retained specificity above 98% and sensitivity above 60%, showing that the embedding distance between two audio samples distinguishes well whether the speakers are the same or different. It is also observed that the median distance is significantly lower when the POS-tagged entity is the same.
The normalized feature vector is processed by the neural network and labelled for categorizing the audio stream into male, female, or transgender samples. The RNN, processing data in its cyclic path, updates its hidden layer from the input layer and predicts the next element of the word sequence. The RNN LSTM classifier on the ResNet-50 architecture performs best at frame-level fusion after decoding vocabulary of defined lengths (word-level LSTM), with cross-entropy loss at the SoftMax output layer used as the objective function for predicting the next word.
The model was trained by an expert transcriber on the Bi-LSTM using read English from the open-source LibriSpeech dataset repository, with male, female, and transgender categories, in order to avoid speech variation and obtain steady voice samples under supervised learning. Dropout was set at 0.5 while designing the network, and the 2nd and 3rd hidden levels remain identical.
The objective of feature classification and clustering is to bind features of the same class together in feature space and make them separable both intra-class and inter-class. In our model, handwritten signature features are concatenated with POS-tagged voice tokens, and the optimal attributes are extracted to form a multimodal fused feature vector of two different cluster patterns, i.e. image-based and voice samples. It is therefore necessary to transform the audio input into the time–frequency domain, convert it into vector form, and embed POS word tagging with the knowledge and inherence factors for multifactor authentication (MFA).
The feature map uses the inheritance property of spectrograms for convolutional and pooling operations so as to extract the important features of voice samples for entity recognition, via the POS word-embedding tag token, to recall the feature vector concatenated with the handwritten signature.
Our experiment uses the LibriSpeech dataset, containing a collection of around 10,000 audio files, in which a .tsv file is categorized by filename, sentence, accent, age, gender, etc. The gender field covers male, female, and transgender voice sample identification. The voice sample size was kept at 86,500, labelled with age, street, city, location, etc.
The hidden Markov model underlies a set of successful techniques for acoustic modelling in speech recognition. It is a model-driven approach comprising a latent or hidden state S(t) that follows a Markov process, where P(S) is the probability of the observation persisting in the underlying state; an observation O at time T is used to predict the state S, and the prediction is then refined using the observations before and after T.
The state S, which changes over time, may be discrete or continuous. At each timestamp, an observation O is used to predict the state with some level of confidence: the observation at time T predicts the state S, and the prediction is then refined using the observations before and after T. The states themselves are hidden and need not be actually observed.
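The hidden-state process described above can be sketched with the standard HMM forward algorithm; the transition, emission, and initial probabilities here are toy values, not trained speech parameters:

```python
import numpy as np

A = np.array([[0.7, 0.3],    # transition probabilities P(S_t | S_{t-1})
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # emission probabilities P(O_t | S_t)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

def forward(obs):
    # alpha[s] accumulates P(O_1..O_t, S_t = s); summing at the end gives
    # the total probability of the observation sequence under the model.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward([0, 1, 0]))
```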
Named entity identification and classification assigns text to predefined categories such as name, organization, and location, extracting information from unstructured text and representing it in a machine-readable format using the IOB notation, wherein B stands for the beginning of an entity, I for inside an entity, and O for non-entity tokens. The text label sequences are mapped into the predefined POS tagging categories, used to determine the context of the surrounding words, with the IOB2 labelling scheme.
For each token in an entity, the tag is prefixed with one of these values:
“B-” (beginning)—The token is a single-token entity or the first token of a multi-token entity.
“I-” (inside)—The token is a subsequent token of a multi-token entity.
In the list of entity tags, the IOB2 labelling scheme identifies the boundaries between adjacent entities of the same type using the following logic:
• If Entity(i) has prefix “B-” and Entity(i + 1) is “O” or has prefix “B-”, then Token(i) is a single-token entity.
• If Entity(i) has prefix “B-”, Entity(i + 1), …, Entity(N) have prefix “I-”, and Entity(N + 1) is “O” or has prefix “B-”, then Tokens(i:N) form a multi-token entity; such entities support the token matching for the multifactor authentication mechanism of the multimodal model system.
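The boundary logic above can be sketched as a small helper that converts a sequence of IOB2 tags into entity spans (the tag set is illustrative):

```python
# Group B-/I- tags into (start, end) entity spans: a new "B-" tag or an
# "O" tag closes the entity currently being built.
def iob2_spans(tags):
    spans, start = [], None
    for i, tag in enumerate(tags):
        if start is not None and (tag == "O" or tag.startswith("B-")):
            spans.append((start, i - 1))
            start = None
        if tag.startswith("B-"):
            start = i
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans

tags = ["B-PER", "I-PER", "O", "B-ORG", "B-LOC", "I-LOC"]
print(iob2_spans(tags))  # [(0, 1), (3, 3), (4, 5)]
```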
The experimental performance of the POS-tagged token-matching model on the fused vector of voice and handwritten signature reached 98.28%, and the results show that, for gender classification among the male, female, and transgender categories, the transgender category achieved the highest accuracy at 99.32%.
Five-fold cross-validation was used for model training, testing, and validation in an 80:20:10 ratio over the total dataset, for data validation and multimodal model performance assessment. Training began at a learning rate of 0.001 and lasted 50 epochs, with the Adam optimizer used to handle noisy inferences. The batch size was initialized at 64 and training continued until the validation loss stabilized at its accuracy peak; at the output, a SoftMax layer with categorical cross-entropy was applied for entity recognition on the zero-or-one values of the output vector.
Table 2 Performance parameters comparison of multimodal with fused feature vector (voice and
handwritten signature) on ResNet architecture
Work Accuracy (%) Recall (%) Precision (%) F1 score (%)
Multimodal [5] (CNN) 99.83 98.34 96.15 95.24
Our model, PoS (RNN) 99.89 99.47 98.99 99.54
[Bar chart: accuracy (%) of the HMM classifier versus the RNN Bi-LM embedding for the male, female, and transgender categories, with a linear trend line for the transgender category.]
Analysis
It is observed that accuracy improves by approximately one percent with PoS-tagged embedding token matching in the multimodal model, compared with earlier CNN-based multimodal research [5].
ResNet model performance also benefited from the skip-connection capability of the residual-block convolutional network and the 4-gram word-embedding sequencing methodology used during model training. SMOTE-Tomek efficiently resolved the lower detection accuracy caused by the data imbalance problem.
Time complexity can be managed by optimizing resource and feature selection algorithms, and lower detection accuracy can be addressed by deploying a deep-layer network and Bi-LSTM classifier on a GPU-based processing platform. POS 4-gram structures of defined format, with a limited word count under supervised learning, used for word-embedding PoS tokens in entity recognition, significantly decrease the computational cost as well as the processing time per epoch and iteration.
Binary search runs in O(log n) time: it halves the search space on each iteration, which is faster than sequentially checking items one by one. Over an alphabetically ordered word list the algorithm is:
• Compare the target with the middle element; if the target sorts after it, continue the search in the right half, otherwise in the left half.
• Halve the remainder again and repeat the step above until the target word is found.
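The steps above correspond to the classic binary search over a sorted word list, sketched here with an invented vocabulary:

```python
# Binary search over an alphabetically sorted word list: halve the search
# interval each step, giving O(log n) comparisons.
def binary_search(words, target):
    lo, hi = 0, len(words) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if words[mid] == target:
            return mid
        if words[mid] < target:
            lo = mid + 1      # target sorts after the midpoint: go right
        else:
            hi = mid - 1      # target sorts before the midpoint: go left
    return -1                 # target not in the vocabulary

vocab = ["apple", "bank", "cat", "dog", "zebra"]
print(binary_search(vocab, "cat"))   # 2
print(binary_search(vocab, "fish"))  # -1
```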
In our model, each iteration takes time in microseconds, as shown in Table 4. The time taken per iteration of the PoS word-embedding token under the multimodal model is better than that of the HMM model, so we can say that the algorithm's time complexity is significantly optimized.
The model transfers to other languages relatively well: in our case, transfer to Hindi retained specificity above 98% and sensitivity above 60%, showing that the embedding distance between two audio samples distinguishes well whether the speakers are the same or different; the median distance is also significantly lower when the voice samples belong to the same entity. The Hindi dataset used for transfer learning was taken from open resources available in the Kaggle data repository (Fig. 4, Table 3).
7 Conclusion
The experimental results of the POS-embedding token on the fused feature vector, based on the frequency patterns of voice samples and handwritten signatures under the multimodal model, achieved an accuracy of 99.89%, higher than the 99.83% of the image-pattern-based cluster multimodal model [5]. The RNN Bi-LSTM classifier outperforms the HMM classifier owing to the skip connections of the deployed residual network and the cyclic processing of information in the deep neural network.
Data Availability The dataset used in this study was taken from the open-source LibriSpeech voice repository: https://librivox.org/, https://wiki.librivox.org/index.php?title=Copyright_and_Public_Domain; Email: gdps@gi.ulpg.cs for the handwritten signature dataset; https://github.com/goru001/nlp-for-hindi/.
References
Abstract In this era of ease of sharing information on the Internet, it has become
incredibly easy to share any sort of information online. However, this ease of sharing
can come with a great risk of sharing personal or private information, whether know-
ingly or unknowingly. The potential consequences of compromising information on
the Internet can be harmful as it can lead to various forms of online harassment and
malpractices. This is why individuals need to be careful about what they share online.
A medium is required that can classify the sensitivity of a text to alert the individuals.
Many existing approaches classify the text based on the number of sensitive tokens
identified. However, this is not enough because these approaches cannot understand
the context of the text. In this paper, we propose a hybrid model leveraging the advantages of CNN, BiLSTM, and a multihead attention mechanism. We analyze the patterns and compare the results with those of standard machine learning and deep learning models, and discuss the advantages and disadvantages of each model. Our proposed model showed results similar to or better than those of the ALBERT model with a significantly shorter training time.
1 Introduction
The Internet has revolutionized the way we share and consume information. With
just a few clicks, users can access a vast array of content. The Internet became a
platform where users share whatever they want, and they mostly have the freedom to
share any kind of information, content, and opinions on social networking sites [1].
According to the worldwide digital population [2], there are currently 5.16 billion
active users on the Internet, of which 4.76 billion people use social media platforms
regularly. People generally use social networking sites out of boredom, they try to
seek distraction from the virtual world where they can easily connect with others of
the same interest [3]. In contrast to the localized distribution of information in the real
world, the information shared publicly on social media can be accessed by anyone,
at any time, and from anywhere on the Internet [4]. The presence of people on social
media leads to the disclosure of information to strangers due to the trade-off for
increasing social and professional circles, thereby increasing the risk of privacy [5].
This information usually includes location, hobbies, personal interests, connections with other people, and more [5]. It can be used as leverage to perform malpractice, or to flatter a person in order to gain favors.
Personal information can be leaked online, and it can lead to a range of negative
consequences, including identity theft, financial fraud, cyberbullying, and invasion
of privacy [6]. Individuals who spend a significant amount of their time on social
media and actively engage with these platforms may unknowingly divulge certain
confidential information. Although seemingly insignificant at first, this information
could become dangerous if used for malicious intentions.
The users need to pay attention to what they are sharing, as it could be used against them. This motivates building a medium that can detect and classify the contents of a text as sensitive or not. Most of the existing solutions use a vocabulary of sensitive words or personally identifiable information (PII), such as full name, address, phone number, or Social Security Number (SSN), to classify whether a text contains sensitive content or not [7, 8]. However, this approach alone is not sufficient. Understanding
the context of a text is much more efficient as it can grasp the subtle nuances present
in the text.
Standard machine learning models lack the ability to understand text context.
They rely on predetermined features like word counts or TF-IDF scores, which may
not capture the full context [9]. Even deep learning models like CNN and LSTMs
have their limitations. CNN excels at capturing local temporal dependencies but may
not grasp the complete context [10]. LSTMs struggle with long-term dependencies
across extensive word spans, resulting in contextual loss [11]. To address these lim-
itations, the attention mechanism is employed to focus on specific parts of input
sequences, enabling the capture of long-range dependencies and better retention of
information [12].
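The core of the attention mechanism referred to here can be sketched as scaled dot-product attention in NumPy; the shapes and random inputs are illustrative assumptions:

```python
import numpy as np

# Scaled dot-product attention: each output position is a weighted mix of
# all value vectors, so long-range dependencies are one step away rather
# than being carried through a recurrent state.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # query-key similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # softmax over keys
    return w @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
K = rng.normal(size=(6, 8))   # 6 key/value positions
V = rng.normal(size=(6, 8))
out = attention(Q, K, V)
print(out.shape)
```

Multihead attention repeats this with several independently projected Q/K/V triples and concatenates the results.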
This study centers on the development of a model that leverages the advantages
of established deep learning techniques, including CNN, LSTM, and the multihead
attention mechanism. Our objective is to create a model capable of contextual com-
prehension, enabling it to assess whether a given text comprises sensitive information.
2 Literature Review
Geetha et al. [8] proposed a framework, Tweet Scan Post (TSP), to identify sensitive private data disclosed by any Twitter user. The authors explained how they identified certain sensitive privacy keywords, which help the classifier ascertain both the sensitivity and its degree.
Bioglio and Pensa [13] developed an annotated corpus from a pool of Twitter’s
tweets. In their research, they extensively analyzed the data to see if the context of the
words depends on the specific set of words. The authors performed the classification
with the traditional approaches of machine learning and deep learning models.
To prevent data leakage from the documents, Trieu et al. [14] proposed a technique
to calculate the document sensitivity based on the semantic ranking of the documents.
They used Twitter-based document encodings (TD2V) to encode the document and
assign the document’s sensitivity based on the majority vote determined by their
clustering technique of auto-retrieval.
It has become clear that semantic analysis is needed in order to classify the sensi-
tivity of a text more effectively. Researchers started to focus on the semantic analysis
approach to yield better results [13–15]. Jin et al. [16] proposed a methodology to build a sensitive content taxonomy in a tree structure, where the leaf nodes determine whether a subtree is sensitive. Labeled training pages are used to build the sensitivity classifier, from which the sensitivity of unlabelled pages is determined through probabilistic values.
Text sanitization involves human experts removing sensitive content from doc-
uments. Sánchez and Batet [17] developed a model that can detect and remove
sensitive data in documents. Their proposal automates the detection of terms that
could disclose sensitive information through semantic inferences, providing an intu-
itive method to establish privacy requirements and determine the degree to which
sensitive data should be hidden. Sequential models, such as LSTM-CNN, are widely
regarded as effective approaches for understanding text context [18]. Zhang et al.
[19] proposed the use of LSTM-CNN to leverage the advantages of each model.
Lately, many researchers have started to use attention mechanisms in text classification problems, developing new hybrid models to tackle NLP problems efficiently. The most common practice is to combine a CNN, an RNN, and one of the attention mechanisms; the order of these layers varies with the approach taken to the problem. Chen et al. [20] proposed an approach where bidirectional GRUs are associated with an attention mechanism to generate context weights, with a CNN at the end to extract deep features.
Numerous research papers have utilized or constructed sensitive taxonomies com-
prising lists of sensitive terms to assess text sensitivity. However, there are instances
where these terms might be absent in a text, yet the context can provide insights into
whether the text contains potentially harmful sensitive information. This drove us to
develop a model capable of comprehending the context of a text, irrespective of the
presence of explicit sensitive terms. Consequently, this approach holds the potential
to outperform existing models, offering a more comprehensive solution.
246 H. V. Puvvadi and L. Shyamala
3 Methodology
In this section, we cover all of the important steps or procedures that were performed
in this research.
We propose to build a hybrid model that constitutes the GloVe embedding layer,
CNN, bidirectional LSTM, and a multihead attention mechanism. This proposed
model is expected to comprehend the context of a text and analyze the sensitivity of
the text in an effective manner. The working principle of the model is as follows:
• The first step in building this model is to create an embedding layer that would
perform positional embedding of the tokens that are associated with the weights
provided by the pre-trained GloVe embedded matrix.
• The output from the embedding layer is passed onto the CNNs, the reason for
choosing CNN over here is to extract the broader feature patterns, this can be
done by applying a lower number of filters which would primarily filter out the
irrelevant information.
• Adding a max-pooling layer is optional. Since the dataset is of moderate size, we
skipped using the max-pooling layer as it might make the model lose the necessary
information.
• The output from the CNN is passed onto the bidirectional LSTM, designed to
capture long-term dependencies in the input sequence.
• The output from LSTM is passed to the multihead attention transformer, which
contextualizes the information and weighs the importance of each input element
based on its relationship to other elements in the sequence.
• Finally, the outputs from different heads are concatenated to get optimal weights
for each token, it is then passed to the final neuron that uses the sigmoid function
to classify the text.
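The layer stack described above can be sketched in Keras as follows. This is an illustrative sketch only: the vocabulary size, filter counts, LSTM units, and number of attention heads are assumptions, not the authors' exact hyperparameters.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, SEQ_LEN, EMB_DIM = 20000, 50, 50  # assumed; EMB_DIM matches GloVe-50

inputs = layers.Input(shape=(SEQ_LEN,))
# Embedding layer; in the real model, weights come from the pre-trained GloVe matrix
x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inputs)
# Two small-filter 1D CNNs to suppress irrelevant local patterns
x = layers.Conv1D(32, kernel_size=3, padding="same", activation="relu")(x)
x = layers.Conv1D(32, kernel_size=3, padding="same", activation="relu")(x)
# BiLSTM for long-term dependencies; full sequences are kept for the attention layer
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
# Multihead self-attention re-weights each token against the whole sequence
x = layers.MultiHeadAttention(num_heads=4, key_dim=32)(x, x)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # binary sensitivity score
model = tf.keras.Model(inputs, outputs)
```

Note that, following the working principle above, no max-pooling layer is inserted between the CNN and BiLSTM stages.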
Word embeddings convert words into n-dimensional vectors. Words can have multiple meanings depending on context; word embeddings map words into a lower-dimensional space, which helps models understand the relationships between different words. Here, we used GloVe embeddings [21] (version: GloVe.twitter.27B, dimensionality: 50) to represent words in our text classification model.
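A pre-trained GloVe file maps each word to a fixed vector; these vectors are typically collected into an embedding matrix indexed by the tokenizer's vocabulary. A minimal sketch (the helper name and the tokenizer variable in the usage comment are illustrative, and the file name is taken from the public GloVe.twitter.27B release):

```python
import numpy as np

def build_embedding_matrix(glove_lines, word_index, dim=50):
    """Build a (vocab_size + 1, dim) matrix from GloVe-format lines.

    Each line is "<word> v1 v2 ... v_dim"; words absent from the GloVe
    vocabulary keep a zero vector.
    """
    vectors = {}
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix

# Typical usage (path from the GloVe.twitter.27B release; tokenizer is assumed):
# with open("glove.twitter.27B.50d.txt", encoding="utf-8") as f:
#     embedding_matrix = build_embedding_matrix(f, tokenizer.word_index, dim=50)
```

The resulting matrix is passed to the embedding layer as its initial weights.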
The output from the embedding layer is then passed to the convolutional layers. We used two CNN layers with small filter sizes. The 1D-CNNs are capable of capturing local patterns within a fixed-sized window. Although small filters cannot capture the entire context of a sequence, they filter out unnecessary noise, which minimizes anomalies that might hinder the contextual analysis performed in the later layers.
Sensitive Content Classification 249
Bidirectional LSTM (BiLSTM) is a type of recurrent neural network widely used in natural language processing tasks. A BiLSTM consists of two LSTMs, one processing the input sequence in the forward direction and the other in the backward direction; the outputs of the two are concatenated to produce the final output sequence. This enables it to capture long-term dependencies effectively, because it can contextualize information from both the past and the future. At this step, the BiLSTM captures most of the important long-term dependencies present in the text. This approach works well here because the sequences in the dataset are short.
Multihead attention mechanism was first proposed by Vaswani et al. [12]. In this
mechanism, the input is sent to multiple “heads” where each head does its own
attention mechanism that is isolated from the other heads. Each head is trained to
focus on a certain aspect of the input. The multihead attention layer takes the output of
the LSTM layer and uses a self-attention mechanism to contextualize the information.
This layer allows the model to weigh the importance of each input element based on
its relationship to other elements in the sequence.
The multihead attention mechanism involves projecting the input sequence into
separate “query,” “keys,” and “values” matrices. The dot product between the “query”
and “key” matrices determines token relevance, which is then scaled to prevent gra-
dient issues during training. The scaled dot product goes through a softmax function
to generate a probability distribution over the input sequence for each token in the
“query” matrix. This distribution weights the “value” matrix, resulting in a weighted
sum of values for each token in the “query” matrix. The weighted sums are con-
catenated across multiple attention heads and projected back to the original input
dimension, forming the output of the multihead attention layer.
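The projection-and-softmax procedure just described can be expressed compactly. The following NumPy sketch implements single-head scaled dot-product attention; the projection matrices Wq, Wk, Wv are random placeholders standing in for one head's learned projections.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # token relevance, scaled
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # probability distribution per query token
    return weights @ V, weights

# One head over a toy sequence of 4 tokens with d_k = 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
```

In the multihead case, several such heads run in parallel on separate projections and their outputs are concatenated and projected back to the input dimension.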
The concatenated outputs are globally averaged, reducing them to (50, 1) vectors.
The Twitter dataset used in this research was provided by Bioglio and Pensa [13]. It
consists of 9917 sequences. We added a column to the dataset called “class”, which
consists of labels “1” for sensitive and “0” for non-sensitive. There were a total of
6581 sequences of the “non-sensitive” class and 3336 sequences of the “sensitive”
class.
In our experiments, we used the TensorFlow implementation of Keras to build the proposed model. We used hyperparameter tuning (RandomizedSearchCV) to find the hyperparameters that yielded the best results.
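RandomizedSearchCV (from scikit-learn) samples candidate hyperparameters from given distributions and keeps the best-scoring combination under cross-validation. A minimal sketch of the interface, shown here with a generic scikit-learn estimator and synthetic data rather than the paper's Keras model, whose wrapping details are not given:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data; the real search would run over the tweet features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Distributions to sample hyperparameters from (illustrative ranges)
param_dist = {"n_estimators": randint(10, 50), "max_depth": randint(2, 8)}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)
best = search.best_params_  # best sampled combination
```

Unlike an exhaustive grid search, only `n_iter` random combinations are evaluated, which keeps tuning cheap.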
All experiments were conducted on Google Colab with an Intel(R) Xeon(R) CPU
operating at 2.00 GHz with 12.7 GB of system RAM, and one NVIDIA Tesla T4
GPU with 15.3 GB of GPU RAM (Fig. 2).
We performed the text classification using various different models to understand the
behavioral patterns of those models and to determine which combination of these
models would give the best possible result. Analysis of these results gave us a better
understanding to determine how these models interact with the data and interact
among themselves when stacked in a hybrid model.
TF-IDF is generally considered a better approach than BoW. BoW does not take into account the order of words in a sentence; it simply counts the frequency of each word across the whole dataset without regard to the context of the text. TF-IDF, in contrast, weighs the importance of a word in a sentence by its relative frequency across the entire dataset: rare words receive more weight and common words less. TF-IDF can thus reduce the influence of common words and improve classification accuracy. But upon comparing the results generated by the TF-IDF and Bag-of-Words methods, it is clear that BoW performed better, which means that the common words present within the text had the larger influence.
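The two weighting schemes compared above differ only in how term counts are scaled; a minimal scikit-learn comparison on toy documents (the example sentences are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "my password was leaked online",
    "great weather in the park today",
    "the leaked report mentioned my address",
]

bow = CountVectorizer().fit_transform(docs)    # raw term frequencies (BoW)
tfidf = TfidfVectorizer().fit_transform(docs)  # frequencies down-weighted by commonness

# Same vocabulary and matrix shape; only the cell values differ
```

Both produce a document-term matrix over the same vocabulary; TF-IDF additionally l2-normalises each row, so its values never exceed 1.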
It was observed that deep learning models that used GloVe pre-trained word
embeddings performed far better than Word2vec word embeddings because GloVe
tends to perform semantic analysis between the words to establish a meaningful
relationship between the words, whereas Word2vec embeddings capture the syntactic
relationships between the words (Table 2).
LSTM showed better performance than GRU. This might be due to LSTM containing one extra gate, which performs an additional matrix multiplication, and to its ability to detect long-term dependencies between words in a text, which also allows it to capture more complex relationships between input sequences and output labels. This indicated that LSTM is better suited for the hybrid model.
LSTM with GloVe embeddings performed better than CNN with GloVe embeddings: LSTM can capture long-term dependencies present in the text and can selectively remember inputs based on their relevance to the current context, whereas CNN captures feature patterns or specific keywords. This led to the use of small CNN filters to weed out noise and a larger number of LSTM units to identify the necessary long-term dependencies.
The multihead attention mechanism alone did not improve text classification com-
pared to standard deep learning approaches, possibly because it focused more on
irrelevant content. Multihead attention performs better with larger texts, allowing it
to reassess token weights and capture the overall context. However, it struggles with
small texts due to limited content. In such cases, CNN and LSTM are beneficial,
as CNN filters out irrelevant content with its small-sized filters, while LSTM cap-
tures long-term dependencies, helping attention layers stay focused on the main con-
text. The proposed model outperformed other models, including ALBERT, despite
ALBERT requiring significantly longer training time. The proposed model com-
pleted training within minutes.
5 Conclusion
Acknowledgements We would want to express our gratitude to Ruggero G. Pensa, Ph.D., Univer-
sity of Torino, Italy, for providing us with the dataset.
References
1. Li K, Cheng L, Teng CI (2020) Voluntary sharing and mandatory provision: private information
disclosure on social networking sites. Inf Process Manage 57(1):102128
2. Petrosyan A (2023) Worldwide digital population. https://www.statista.com/statistics/617136/digital-population-worldwide/
3. Stockdale LA, Coyne SM (2020) Bored and online: reasons for using social media, problematic
social networking site use, and behavioral outcomes across the transition from adolescence to
emerging adulthood. J Adolesc 79:173–183
4. Ma Q, Song HH, Muthukrishnan S, Nucci A (2016) Joining user profiles across online social
networks: from the perspective of an adversary. In: Proceedings of the IEEE/ACM international
conference on advances in social networks analysis and mining (ASONAM), Aug 2016, pp
178–185
5. Aghasian E, Garg S, Gao L, Yu S, Montgomery J (2017) Scoring users’ privacy disclosure
across multiple online social networks. IEEE Access 5:13118–13130
6. Isaak J, Hanna MJ (2018) User data privacy: Facebook, Cambridge Analytica, and privacy
protection. Computer 51(8):56–59
7. Abouelmehdi K, Beni-Hessane A, Khaloufi H (2018) Big healthcare data: preserving security
and privacy. J Big Data 5(1):1–18
8. Geetha R, Karthika S, Kumaraguru P (2021) Tweet-scan-post: a system for analysis of sensitive
private data disclosure in online social media. Knowl Inf Syst 63:2365–2404
9. Zhou H (2022) Research of text classification based on TF-IDF and CNN-LSTM. J Phys Conf
Ser 2171(1):012021. IOP Publishing
10. Chen Y (2015) Convolutional neural network for sentence classification. Master’s thesis,
University of Waterloo
11. Pascanu R, Mikolov T, Bengio Y (2013) On the difficulty of training recurrent neural networks.
In: International conference on machine learning, May 2013, pp 1310–1318. PMLR
12. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I
(2017) Attention is all you need. In: Advances in neural information processing systems, vol
30
13. Bioglio L, Pensa RG (2022) Analysis and classification of privacy-sensitive content in social
media posts. EPJ Data Sci 11(1):12
14. Trieu LQ, Tran TN, Tran MK, Tran MT (2017) Document sensitivity classification for data
leakage prevention with twitter-based document embedding and query expansion. In: 2017
13th international conference on computational intelligence and security (CIS), Dec 2017.
IEEE, pp 537–542
15. Battaglia E, Bioglio L, Pensa RG (2020) Classification-based content sensitivity analysis. In:
CEUR workshop proceedings, vol 2646, pp 326–333. CEUR-WS.org
16. Jin X, Li Y, Mah T, Tong J (2007) Sensitive webpage classification for content advertising. In:
Proceedings of the 1st international workshop on data mining and audience intelligence for
advertising, Aug 2007, pp 28–33
17. Sánchez D, Batet M (2016) C-sanitized: a privacy model for document redaction and saniti-
zation. J Assoc Inf Sci Technol 67(1):148–163
18. Zhou H (2022) Research of text classification based on TF-IDF and CNN-LSTM. J Phys Conf
Ser 2171(1):012021. IOP Publishing
19. Zhang J, Li Y, Tian J, Li T (2018) LSTM-CNN hybrid model for text classification. In: 2018
IEEE 3rd advanced information technology, electronic and automation control conference
(IAEAC), Oct 2018. IEEE, pp 1675–1680
20. Chen X, Ouyang C, Liu Y, Luo L, Yang X (2018) A hybrid deep learning model for text clas-
sification. In: 2018 14th international conference on semantics, knowledge and grids (SKG),
Sept 2018. IEEE, pp 46–52
21. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation
Online Social Networks: An Efficient
Framework for Fake Profiles Detection
Using Optimizable Bagged Tree
Abstract Fake profiles pose a significant challenge to the integrity and security of
online social networks (OSNs), making it imperative to develop reliable detection
techniques. Spammers invariably adjust their strategies to circumvent filters, making
it difficult to design impeccable filters that can readily accommodate newly emerging waves of fake profiles. Filters that primarily focus on textual
information have limits while dealing with non-textual forms of spam, such as the
ones based on images or audio. In this paper, we have employed a comprehensive set
of features, including profile information, network characteristics, and behavioural
patterns, to capture the distinguishing characteristics of fake profiles and conducted
experiments on a large-scale dataset of OSN profiles, comprising both genuine and
fake profiles. The Optimizable Bagged Tree algorithm allows us to reach an optimized decision tree structure while leveraging the benefits of ensemble learning. By tena-
ciously pruning the tree’s structure and trimming irrelevant branches, the proposed
framework achieves better generalization and robustness. The results demonstrated
that our model outperforms traditional detection methods in terms of accuracy. More-
over, our approach exhibits high efficiency, enabling real-time detection of fake
profiles in OSNs.
C. Kumar · T. S. Bharati
Department of Computer Science, Jamia Millia Islamia, New Delhi, India
S. Prakash (B)
Department of Electronics and Communication, University of Allahabad, Prayagraj, India
e-mail: shivprakash@allduniv.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 255
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_22
256 C. Kumar et al.
1 Introduction
In this technology-driven era, the way people are connecting with others is changing
at an exponential rate. The emergence of new cellular technology and computational paradigms has changed human life by connecting people across the planet. The anthropologist John Barnes was the first researcher to study social networks. The online social networking era started with the launch of SixDegree.com in 1997 by Andrew Weinrich [1]. In the twenty-first century, OSNs like Facebook, Twitter, LinkedIn, etc. are empowering connectivity across the globe. As per the Statista report 2022 [2], there are 1.7B (billion) active monthly users across social media platforms. Figure 1 depicts the ranking of various OSN platforms by their monthly number of active users. The report ranked Facebook as the most popular online social network site, with a user base of 2,910 million, as shown in Fig. 1.
This massive increase is driven by numerous benefits users enjoy, like simple contact
with others, actively participating in governmental and political affairs, and exploring
jobs, advertising, information dissemination, or emotional support. The adoption of
social networks was naturally pushed by the emergence of social isolation due to the
recent COVID-19 pandemic.
Today the OSN market is so big that up to five new profiles, spam or benign, are added to Facebook every second [3]. Facebook removed 1.6 billion (B) bogus accounts in the first quarter of 2022, and 1.7B in the quarter before; a record of over 2.2B bogus accounts was deleted by the platform in the first quarter of 2019. As per Meta’s
definition, fake accounts encompass such profiles that are intentionally created with a
malicious intent or to impersonate non-human entities such as companies, groups,
or organizations [4]. Typically, spams, rumours, or unethical remarks are uploaded
using a bogus account or spam profile. Numerous studies have been conducted to
decide how to recognize spam messages [5–8]. However, it is still a significant
problem to identify false profiles on OSN websites. The information like profile
photo, sex, name, native place, and other personal information on the network that
is readily accessible is grabbed and utilized to construct fake profiles. As a result, false information is spread to the social media contacts and friends connected to the victim. In the real world, this circumstance has the potential to do significant harm to people, businesses, and other entities, as fake accounts and profiles may convey spam, false web reviews, and false news.
Traditionally, OSN operators use a set amount of resources to identify, physically
verify, and delete fraudulent profiles [9]. Facebook faces greater threats than other
social networks due to its large user base and restricted functionality. Attackers next target Twitter and LinkedIn. According to the Sophos security threat report survey [10]:
• 67% of OSN users said they were spammed, and the amount had increased in two
years.
• 57% of OSN users said they had been hit by phishing attacks, and the number of
users who have been hit by phishing has grown by more than 100%.
The paper proceeds as follows. Section 2 provides background information and motivation. In Sect. 3, a comprehensive survey of relevant research is
presented. The proposed framework is detailed in Sect. 4. Section 5 offers the results
analysis and subsequent discussion. Lastly, Sect. 6 encompasses the conclusion and
outlines directions for future research.
2 Background and Motivation
Social networking services like Twitter, Facebook, Google+, etc. are widely used in the virtual world. 22 billion records were exposed due to data breaches in 2021. More than any other nation, the U.S. was the target of 46% of cyberattacks in 2020. A whopping 40% of people on earth are not connected to the Internet, making them easy targets for cyberattacks if and when they do connect [11]. As a result, we must
fully comprehend the situation and explore how to build powerful protections in
order to create a trustworthy cyberspace. Even though there have been numerous papers surveying online social network (OSN) attacks [12–17], these surveys are limited in their discussion of AI techniques. They did not cover a broad range of OSN attack defence strategies, such as prevention, detection, and reaction (or mitigation). The investigation of the diverse security threats and of the different approaches and models proposed is the key contribution of this paper. The background and motivation of this paper highlight the ever-increasing use of social networking services and the ever-growing market for information security. The paper
notes the high number of data breaches and cyberattacks, particularly in the US, and
emphasizes the need for strong protections in cyberspace.
3 Related Work
This study [22] discussed several approaches for analysing tweets and identifying
them as either spam or ham based on the terms that are included in the tweets. NB
classifier is utilized even though many other DL and ML approaches, such as support
vector machine (SVM), clustering approaches, and binary detection models, exist
for this purpose. Accessing or browsing irrelevant spam messages or tweets exposes
Twitter users to malware that steals personal information. Malware that steals data
has been an issue, but so have fake trends. The situation must be managed. It’s
conceivable that spammers will engage more users now that they can automatically
follow new accounts they find interesting.
In this work [23], the researchers suggested that phony Instagram accounts may be identified automatically with the use of a fake profile detector, protecting Instagram users’ social lives. The detection of fake Instagram accounts is aided by supervised ML algorithms. Once identified, the IDs of bogus accounts are catalogued in a data dictionary that may be used to assist relevant authorities in taking action against spammers. The classification methods used to train on the dataset were compared experimentally.
This study [24] examined mental illnesses via the language individuals use to express themselves on Reddit and Twitter, two popular social media sites. Their objective is to establish an empirical model for detecting and diagnosing serious mental illnesses. Using text cleaning and Word2Vec language modelling, they produce numerical features from the text; this allows them to classify posts and users with high accuracy using an SVM trained on the dataset they created. They achieved 95% accuracy with Twitter users and 73% accuracy on the Reddit challenge.
Several studies have also proposed hybrid approaches that combine different tech-
niques, such as combining ML algorithms with rule-based or graph-based methods.
Recent research has also examined the use of new kinds of data, such as text from user profiles or profile-picture images, to enhance fake profile detection.
Overall, the recent literature suggests that the problem of fake profile detection in
OSN is still an active research area, and there is no comprehensive solution. The
development of potent detection methods to keep up with evolving tactics of adver-
saries remains a key challenge. While several approaches have been proposed for detecting fake profiles in OSNs, there is a research gap in developing an efficient model that uses an Optimizable Bagged Tree (OBT).
4 Proposed Model
Detecting fake profiles is a challenging task as adversaries use various tactics to evade
detection systems, such as using automated tools to create fake accounts or stealing
the identity of existing users. Therefore, the problem of fake profile detection in OSN
requires the development of effective and robust techniques to identify fake profiles
more accurately. The problem of fake profile can be formulated mathematically as a
binary-classification problem.
Given a set of profiles P = {p1, p2, …, pm}, each either genuine or fake, where each profile pi is represented as a feature vector xi = (xi1, xi2, xi3, …, xin), and a set of labels Y = {0, 1} representing the authenticity of the profile, the goal is to learn a classifier f that accurately predicts the label y for a given profile xi.
Each component of the feature vector x = (x1, x2, x3, …, xn) is a feature that characterizes the profile. The set of genuine profiles is denoted as G and the set of fake profiles as F. The target is to train a classifier f that maps a feature vector x to a binary label y, where y = 1 if the profile is genuine and y = 0 if it is fake.
Formally, the problem is to find a function f: X → Y, where X is the feature space and Y is the label space. The function f can be learned from a labelled training set D = {(x1, y1), (x2, y2), …, (xn, yn)}, where xi ∈ X and yi ∈ Y for i = 1, 2, …, n. The learned function f can then be used to predict the labels of new, unseen profiles.
The proposed Optimized Bagged Trees algorithm is one approach that can be used to learn the classifier f. It builds multiple decision tree classifiers on different subsets of the training data; their predictions are then aggregated to obtain a more robust and accurate classification result. To use Bagged Trees for fake profile detection in OSNs, the first step is to collect a dataset of labelled profiles, where each profile is labelled as either genuine or fake. The next step is to pre-process the data by extracting relevant features, such as the number of friends, the frequency of posts, and the length of the profile description, and to split it into training and testing sets. The tree models can be created using various DT algorithms, such as C4.5, ID3, or CART. Their predictions are then aggregated to obtain a final classification result, either by taking the majority vote of the predictions or by weighted voting based on the accuracy of each model. Bagging has been used in various studies and has achieved high accuracy in detecting fake profiles on different social networking platforms. However, like any ML technique, the model must be continuously updated and refined to adapt to the evolving nature of fake profiles and their tactics.
Bagged Trees algorithm is a combination of multiple decision trees trained on
different subsets of the training data, with each tree making an independent predic-
tion. The Optimized Bagged Trees algorithm can be mathematically formulated as
follows:
1. Given a dataset of N samples {(x1, y1), (x2, y2), …, (xN, yN)}, where xi is the feature vector of the ith sample and yi its class label.
2. Split the dataset into K subsets {(x11, y11), (x12, y12), …, (x1k, y1k)}, {(x21, y21), (x22, y22), …, (x2k, y2k)}, …, {(xK1, yK1), (xK2, yK2), …, (xKk, yKk)} using random sampling with replacement.
3. For each subset, train a decision tree T using the CART algorithm or any other decision tree algorithm, where T maps the feature vector x to its corresponding class label y.
4. Predict the class label of a new sample x by aggregating the predictions of all K decision trees using a majority voting scheme or a weighted voting scheme.
5. Compute the accuracy of the Bagged Trees algorithm using a suitable evaluation
metric such as Sensitivity or Recall, Specificity, Accuracy, Precision, Error rate,
and F1 score.
6. The Bagged Trees algorithm can be further improved by tuning the hyperparam-
eters such as the number of decision trees K and the maximum depth of each
decision tree.
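The six steps above correspond closely to what scikit-learn's BaggingClassifier does internally, so the scheme can be sketched as follows. The synthetic dataset and the hyperparameter values (K = 25 trees, depth 6) are illustrative assumptions, not the paper's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labelled profile dataset (feature vectors X, labels y)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# K = 25 CART trees, each trained on a bootstrap sample (steps 2-3);
# predictions are aggregated by majority vote (step 4)
bag = BaggingClassifier(
    DecisionTreeClassifier(max_depth=6),
    n_estimators=25, bootstrap=True, random_state=0,
)
bag.fit(X_tr, y_tr)
accuracy = bag.score(X_te, y_te)  # evaluation on the held-out set (step 5)
```

Tuning K and the tree depth (step 6) would wrap this estimator in a hyperparameter search.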
Optimized Bagged Trees Algorithm
1. Start
2. Collect dataset of social network profiles
3. Pre-process the data (remove duplicates, missing values, etc.)
4. Extract features from the profiles (e.g. profile picture, number of friends, post
frequency, etc.)
5. Divide the data into two distinct subsets: the training set and the testing set.
6. Train different DT classifiers on different training data subsets (Bagging)
7. Estimate the performance of each DT classifier on the testing set
8. Aggregate the outcomes of the DT classifiers to make a final classification
(majority voting)
9. Evaluate the performance of the Bagged Trees algorithm on the testing set
10. If the performance is satisfactory, use the Optimized Bagged Trees model; otherwise, adjust the parameters and repeat steps 6–8 with different subsets
11. End.
This section provides a deep analysis of the data, draws meaningful conclusions, and addresses the outcomes. The accuracy of the proposed model is obtained through simulation. Accuracy can be calculated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where True Negative (TN) is the number of correctly identified genuine profiles (Table 1).
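The accuracy formula, expressed as a small helper; the confusion-matrix counts in the example are invented for illustration.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# e.g. 90 fakes caught, 95 genuine kept, 5 genuine flagged, 10 fakes missed
score = accuracy(tp=90, tn=95, fp=5, fn=10)  # 185 / 200 = 0.925
```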
The table above presents the accuracy results (in percentage) of different models
for detecting fake profiles. The Optimizable Bagged Tree model outperforms other
models, achieving an accuracy of 99.9%.
Bagged Decision Trees, Bagged Random Forest, Bagged AdaBoost, and Bagged
C4.5 models also show respectable accuracy, with percentages ranging from 94.5 to
98.3%. These results highlight the superiority of the Optimizable Bagged Tree model
in accurately identifying fake profiles in OSNs. Its high accuracy score demonstrates
its effectiveness in distinguishing between genuine and fake profiles, making it a
promising approach for combating fake profiles. From the table and figure above, the Optimized Bagged Tree model has the highest accuracy on the validation dataset (99.9%), while the Bagged Decision Trees, Bagged Random Forest, Bagged AdaBoost, and Bagged C4.5 models have lower accuracy values, ranging from 94.5 to 98.3%. In summary, our proposed efficient model for detecting fake profiles in OSNs using the Optimizable Bagged Tree algorithm achieves a high accuracy of 99.9%, compared with the lower accuracies of the other models. The simulation was conducted on a
PC running Windows 10 using Python on Google Colab. These outcomes suggest
that the Optimizable Bagged Tree model outperforms other models in accurately
identifying fake profiles in OSNs, making it a promising approach for combating the
issue of fake profiles.
Fake profile detection in OSNs is an emerging area of research. The application of advanced ML techniques has enhanced the accuracy and efficiency of fake profile detection. Such methods may utilise large-scale data and complex patterns to recognise hidden indicators of fraudulent activities. In conclusion, the Optimizable Bagged Tree model demonstrated superior performance in detecting fake profiles in OSNs. With an accuracy of 99.9%, it outperforms models such as Bagged Decision Trees (94.5%), Bagged Random Forest (98.3%), Bagged AdaBoost (97.2%), and Bagged C4.5 (97.4%). These results highlight the effectiveness and reliability of the Optimizable Bagged Tree model for identifying fake profiles, making it an appealing approach for enhancing the security and trustworthiness of OSNs. The
Optimizable Bagged Tree algorithm leverages the benefits of ensemble learning and
optimization techniques, allowing us to improve the model’s accuracy and generaliza-
tion. By considering the information such as profile information, network character-
istics, and behavioural patterns, our model captures the distinguishing characteristics
of fake profiles effectively.
Future work may involve further refinement of the model by incorporating additional features or exploring other optimization algorithms. Additionally, researchers
may investigate the model’s performance on different OSN platforms and evaluate
its robustness against evolving fake profile generation techniques.
References
1. Can U, Alatas B (2019) A new direction in social network analysis: online social network
analysis problems and applications. Phys A Stat Mech Appl 535:122372. https://doi.org/10.
1016/j.physa.2019.122372
2. TheStatisticsPortal (2022) Biggest social media platforms 2022|Statista. https://www.statista.
com/statistics/272014/global-social-networks-ranked-by-number-of-users/. Accessed 19 Aug
2022
264 C. Kumar et al.
3. Kaur R, Singh S, Kumar H (2018) Rise of spam and compromised accounts in online social
networks: a state-of-the-art review of different combating approaches. J Netw Comput Appl
112:53–88
4. TheStatisticsPortal (2022) Facebook fake account deletion per quarter 2022|Statista. https://
www.statista.com/statistics/1013474/facebook-fake-account-removal-quarter/. Accessed 19
Aug 2022
5. Karunakar E, Durga V, Pavani R et al (2022) Ensemble fake profile detection using machine
learning (ML). www.joics.org. Accessed 22 Aug 2022
6. Azab AE, Idrees AM, Mahmoud MA et al (2016) Fake account detection in Twitter based on
minimum weighted feature set 10:13–18
7. Savyan PV, Bhanu SMS (2018) Behaviour profiling of reactions in facebook posts for anomaly
detection. In: Proceedings of the 2017 9th international conference on advanced computing,
ICoAC 2017, pp 220–226
8. Egele M, Stringhini G, Kruegel C et al (2017) Towards detecting compromised accounts on
social networks. IEEE Trans Depend Secure Comput 14:9616. https://doi.org/10.1109/TDSC.
2015.2479616
9. Wanda P, Jie HJ (2020) DeepProfile: finding fake profile in online social network using dynamic
CNN. J Inform Sec Appl 52:102465
10. Sahoo SR, Gupta BB (2019) Classification of various attacks and their defence mechanism in
online social networks: a survey. Enterp Inform Syst 13:832–864
11. 166 Cybersecurity statistics and trends [updated 2022]. https://www.varonis.com/blog/cybersecurity-statistics. Accessed 22 Aug 2022
12. Kayes I, Iamnitchi A (2017) Privacy and security in online social networks: a survey. Online
Soc Netw Media 3–4:1–21
13. Kumar S, Shah N (2018) False information on web and social media: a survey. https://doi.org/
10.48550/arxiv.1804.08559
14. Rathore S, Sharma PK, Loia V et al (2017) Social network security: issues, challenges, threats,
and solutions. Inform Sci 421:43–69
15. Guo Z, Cho JH, Chen IR et al (2021) Online social deception and its countermeasures: a survey.
IEEE Access 9:1770–1806
16. Rahman MS, Reza H (2022) A systematic review towards big data analytics in social media.
Big Data Mining Anal 5:228–244
17. Roy PK, Chahar S (2021) Fake profile detection on social networking websites: a comprehen-
sive review. IEEE Trans Artif Intell 1:271–285
18. Alsunaidi SJ, Alraddadi RT, Aljamaan H (2022) Twitter spam accounts detection using machine
learning models. In: Proceedings of the 2022 14th international conference on computational
intelligence and communication networks (CICN), pp 525–531
19. Chatterjee M, Samanta P, Kumar P et al (2022) Suicide ideation detection using multiple
feature analysis from Twitter data. In: Proceedings of the 2022 IEEE Delhi section conference
(DELCON), pp 1–6
20. Anklesaria K, Desai Z, Kulkarni V et al (2021) A survey on machine learning algorithms for
detecting fake instagram accounts. In: Proceedings of the 2021 3rd international conference
on advances in computing, communication control and networking (ICAC3N), pp 141–144
21. Antonakaki D, Fragopoulou P, Ioannidis S (2021) A survey of Twitter research: data model,
graph structure, sentiment analysis and attacks. Exp Syst Appl 164:114006. https://doi.org/10.
1016/j.eswa.2020.114006
22. Santoshi KU, Bhavya SS, Sri YB et al (2021) Twitter spam detection using Naïve Bayes classi-
fier. In: Proceedings of the 6th international conference on inventive computation technologies,
ICICT 2021. https://doi.org/10.1109/ICICT50816.2021.9358579
23. Harris P, Gojal J, Chitra R et al (2021) Fake instagram profile identification and classification
using machine learning. In: Proceedings of the 2021 2nd global conference for advancement
in technology, GCAT 2021. https://doi.org/10.1109/GCAT52182.2021.9587858.
24. Hemmatirad K, Bagherzadeh H, Fazl-Ersi E et al (2020) Detection of mental illness risk on
social media through multi-level SVMs. In: Proceedings of the 8th Iranian joint congress on
fuzzy and intelligent systems, CFIS 2020. https://doi.org/10.1109/CFIS49607.2020.9238692
Exploring Risk Factors for Cardiovascular Disease: Insights from NHANES Database Analysis
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 265
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_23
266 G. Parashar et al.
1 Introduction
According to the WHO [1], CVD is a major cause of mortality and related disorders
worldwide, accounting for over 17.9 million fatalities annually. It also imposes
significant financial and health burdens globally. The term "CVD" refers to a large
group of related diseases that affect the heart and blood vessels, including coronary
heart disease (CHD), cerebrovascular disease, peripheral arterial disease, rheumatic
heart disease, congenital heart disease, deep vein thrombosis, and pulmonary
embolism [2]. Multiple variables, including gender, age, family history of illness,
physical activity, and pre-existing medical conditions [3, 4], affect the chance of
developing CVD.
CVD is a collection of diseases that affect the heart, capillaries, and blood vessels.
A patient with CVD may or may not have symptoms. The illness can cause an
irregular heartbeat and narrow the blood vessels in the heart or other parts of the
body. Risk factors include a family history of heart disease, sugar intake, binge
drinking, high cholesterol, excess body fat, and use of tobacco products; symptoms
include leg sores that are slow to heal, red skin on the legs, numbness of the face,
trouble seeing, walking, or talking, and exhaustion.
To find risk factors for detecting CVD and other heart-related disorders, several
studies have been conducted. Here are a few instances of research [5] that looked at
several aspects that affect the onset of CVD:
2 Literature Review
3 Methodology
In this section, we propose a research methodology to explore risk factors for CVD
using the NHANES database. We define research objectives, collect data, prepare
data for the experiment, prepare the design of the study, identify important features,
provide statistical analysis, interpret the outcome, and conclude. The research objec-
tives for the study are (1) to identify the risk factors for CVD and (2) to identify the
most important features for early detection of CVD.
A series of health and nutrition surveys are carried out yearly by the Centers for
Disease Control and Prevention (CDC) via its subsidiary, the National Center for
Health Statistics (NCHS). To assess the population’s health and nutritional status in
America, the CDC runs the NHANES [20] survey. NHANES is a comprehensive
nationwide study that employs a combination of interviews, physical assessments,
and laboratory analyses to gather data on a statistically representative cross-section
of the American populace. The survey comprises both a face-to-face interview and a
physical examination that is carried out in a mobile examination facility. The phys-
ical assessment encompasses anthropometric measurements of stature, body mass,
sphygmomanometric evaluation of arterial pressure, and diverse clinical analyses,
including haematology and urinalysis. Researchers, policymakers, and health profes-
sionals utilize NHANES data to track health and nutrition patterns, detect inequalities
in health outcomes, and provide guidance for public health policies and initiatives
[21].
The process of model development was partitioned into four distinct phases,
namely dataset preparation, model development, model evaluation, and comparison
of model performance with other existing models (refer Fig. 1).
Fig. 1 Model development process. We divided the process into four phases, viz. dataset preparation, model development, model evaluation, and model performance comparison
Exploring Risk Factors for Cardiovascular Disease … 269
The dataset curation process involved scrutinizing the MCQ file to identify all
attributes or columns that contained indicators of CVD. CVD was assessed through
a medical condition questionnaire (MCQ) that included inquiries such as “Have
you ever been diagnosed with congestive heart failure?” (MCQ160B), “Have you
ever been diagnosed with coronary heart disease (CHD)?” (MCQ160C), “Have you
ever been diagnosed with angina/angina pectoris?” (MCQ160D), and “Have you
ever been diagnosed with a heart attack?” (MCQ160E). Participants who responded
affirmatively to any of the questions were classified as individuals with CVD disease
(CVD). A sample of 1509 adults was drawn from a population of 30,862 participants,
who were classified as either having CVD or not having CVD.
For the study, we considered a cross-sectional analysis based on data collected
from 2007 to 2012. To examine associations among the variables present in the
dataset, we performed association analysis using a heat map (refer Fig. 2). We
observed a positive correlation between cuff size (BPACSZ), age (RIDAGEYR),
body mass index (BMXBMI), blood pressure (PEASCTM1), and
Disease CVD. All the statistical analyses were performed using RapidMiner Studio
10.0, Python 3.10.9, and other Python data processing libraries. The dataset was
cleaned and preprocessed using Python and RapidMiner Studio. NHANES dataset
contained multiple files, and for the study, we included demographics, examination,
laboratory, and questionnaire files only. For this study, 2007–2008, 2009–2010, and
2011–2012 year surveys were selected as they covered all the parameters required
for predicting risk factors for CVD. We selected a cohort of participants who were
aged 30 years or more and had a CVD history. A total of 1509 adults aged above
30 years were selected from the dataset. After merging all the files (based on SEQN),
a total of 64 features were collected. The merged dataset was cleaned of missing
values, highly correlated columns, mostly empty columns, and duplicates. Normalization
was performed to fill in empty rows. After data cleaning, the dataset was reduced to 44
features, which are listed in Table 1 and shown in the heat map in Fig. 2. As Fig. 2
shows, correlations between the different features were evaluated using a correlation
matrix that is then displayed as a heat map.
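The merge-and-clean step described above can be sketched with pandas. NHANES component files all share the participant key SEQN, so demographics, examination, and laboratory tables can be joined on it; the miniature frames below are invented stand-ins for the real files, and the cleaning thresholds are illustrative.

```python
# Sketch of merging NHANES component tables on SEQN, filtering the cohort,
# and imputing remaining gaps. Frame contents are invented for illustration.
import pandas as pd

demo = pd.DataFrame({"SEQN": [1, 2, 3], "RIDAGEYR": [45, 61, 38]})
exam = pd.DataFrame({"SEQN": [1, 2, 3], "BMXBMI": [28.1, None, 31.4]})
lab = pd.DataFrame({"SEQN": [1, 2, 3], "LBXTR": [140, 95, None]})

# Join all component tables on the shared participant key.
merged = demo.merge(exam, on="SEQN").merge(lab, on="SEQN")

# Keep participants aged 30+ (the paper's cohort criterion), drop columns
# that are mostly empty, then impute the remaining gaps with column means.
cohort = merged[merged["RIDAGEYR"] >= 30]
cohort = cohort.dropna(axis=1, thresh=int(0.5 * len(cohort)))
cohort = cohort.fillna(cohort.mean(numeric_only=True))
print(cohort.shape)
```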
There are 1509 CVD cases and 27,915 cases without CVD. After oversampling the
dataset, we were able to inflate CVD cases to 16,412, with 12,534 NO CVD cases.
Before making predictions, we experimented to determine the effect of class
imbalance on the feature importance score, using the Gini score. The outcome made
evident that class imbalance had little impact on the scores generated by the Gini
Index algorithm; the scores were almost similar (refer Table 1). The Gini Index,
also known as Gini impurity, determines how likely it is that a randomly chosen
element would be incorrectly classified.
Table 1 Univariate analysis of 44 attributes, with the Gini Index computed with sampling (GI-1) and without sampling (GI-2)

| Attribute | Min | Max | Average | SD | Description | GI-1 | GI-2 |
| SEQN | 41,475 | 71,916 | – | – | Patient number | 0.01 | 0.01 |
| Disease | CVD (1509) | NO CVD (27,915) | – | – | Suffering from CVD or not | – | – |
| BMXBMI | 12.4 | 84.9 | 28 | 7.61 | Body mass index (BMI) | 0.07 | 0.01 |
| PEASCTM1 | 2 | 3059 | 614.1 | 227 | Blood pressure time in seconds | 0.06 | 0.01 |
| BPQ150A | 1 | 2 | – | – | Had food in the past 30 min? | 0.04 | 0.01 |
| BPACSZ | 1 | 5 | 3.8 | – | Cuff size (cm) (width × length) | 0.07 | 0.01 |
| BPXPLS | 0 | 224 | 72.4 | 11.7 | 60 s pulse (30 s pulse × 2) | 0.05 | 0.01 |
| BPXML1 | 0 | 888 | 152.7 | 28.2 | MIL: maximum inflation level (mmHg) | 0.1 | 0.01 |
| BPXSY1 | 72 | 238 | 125.9 | 19.3 | Systolic blood pressure, first reading (mmHg) | 0.1 | 0.01 |
| BPXDI1 | 0 | 134 | 66.9 | 13.0 | Diastolic blood pressure, first reading (mmHg) | 0.02 | 0.01 |
| BPXSY2 | 74 | 234 | 124.7 | 18.8 | Systolic blood pressure, second reading (mmHg) | 0.1 | 0.01 |
| BPXDI2 | 0 | 134 | 66.4 | 13.3 | Diastolic blood pressure, second reading (mmHg) | 0.02 | 0.01 |
| BPXSY3 | 74 | 232 | 123.6 | 18.2 | Systolic blood pressure, third reading (mmHg) | 0.1 | 0.01 |
| BPXDI3 | 0 | 128 | 65.9 | 13.5 | Diastolic blood pressure, third reading (mmHg) | 0.02 | 0.01 |
| LBXCRP | 0 | 20 | 0.5 | 0.796 | CRP (mg/dL) | 0.13 | 0.02 |
| LBDHDD | 7 | 179 | 50.5 | 13.6 | Direct HDL-cholesterol (mg/dL) | 0.05 | 0.01 |
| LBDHDDSI | 0.2 | 4.6 | 1.3 | 0.352 | Direct HDL-cholesterol (mmol/L) | 0.05 | 0.01 |
| LBXWBCSI | 1.4 | 99.9 | 7.3 | 2.37 | White blood cell count: SI | 0.02 | 0.01 |
| LBXLYPCT | 2.7 | 85.5 | 30.6 | 9.92 | Lymphocyte percent (%) | 0.07 | 0.01 |
| LBXMOPCT | 0.6 | 66.9 | 8 | 2.41 | Monocyte percent (%) | 0.03 | 0.01 |
| LBXNEPCT | 0.8 | 96.6 | 57.6 | 10.8 | Segmented neutrophils percent (%) | 0.06 | 0.01 |
| LBXEOPCT | 0 | 34.1 | 3.2 | 2.29 | Eosinophils percent (%) | 0.02 | 0.01 |
| LBXBAPCT | 0 | 19.7 | 0.7 | 0.545 | Basophils percent (%) | 0.01 | 0.01 |
| LBDLYMNO | 0.2 | 71.1 | 2.2 | 1.33 | Lymphocyte number | 0.05 | 0.01 |
| LBDMONO | 0 | 10.2 | 0.6 | 0.202 | Monocyte number | 0.03 | 0.01 |
| LBDNENO | 0.1 | 83.1 | 4.3 | 1.69 | Segmented neutrophils number | 0.04 | 0.01 |
| LBDEONO | 0 | 8.4 | 0.2 | 0.193 | Eosinophils number | 0.02 | 0.01 |
| LBDBANO | 0 | 4.7 | 0 | 0.066 | Basophils number | 0.02 | 0.01 |
| LBXRBCSI | 2.4 | 7.2 | 4.6 | 0.480 | Red cell count SI | 0.02 | 0.01 |
| LBXHGB | 6.1 | 19.7 | 13.8 | 1.46 | Hemoglobin (g/dL) | 0.02 | 0.01 |
| LBXHCT | 20.5 | 57.7 | 40.4 | 4.16 | Hematocrit (%) | 0.03 | 0.01 |
| LBXMCVSI | 50.5 | 125.3 | 88.9 | 5.87 | Mean cell volume (fL) | 0.06 | 0.01 |
| LBXMCHSI | 14.9 | 60.8 | 30.4 | 2.34 | Mean cell hemoglobin (pg) | 0.05 | 0.01 |
| LBXMC | 25.1 | 43.8 | 34.2 | 0.928 | Mean cell hemoglobin concentration (g/dL) | 0.03 | 0.01 |
| LBXRDW | 6.3 | 37.8 | 13.1 | 1.25 | Red cell distribution width (%) | 0.03 | 0.01 |
| LBXPLTSI | 11 | 1000 | 246.1 | 68.9 | Platelet count (1000 cells/uL) | 0.06 | 0.01 |
| LBXMPSI | 4.7 | 13.5 | 8 | 0.893 | Mean platelet volume (fL) | 0.03 | 0.01 |
| WTSAF2YR | 0 | 521,033.4 | 69,381.9 | – | Fasting subsample 2-year MEC weight | 0.2 | 0.01 |
| LBXTR | 12 | 2742 | 139.7 | 79.4 | Triglyceride (mg/dL) | 0.23 | 0.03 |
| LBDTRSI | 0.1 | 31 | 1.6 | 0.896 | Triglyceride (mmol/L) | 0.23 | 0.03 |
| LBDLDL | 9 | 344 | 105.6 | 22.6 | LDL-cholesterol (mg/dL) | 0.24 | 0.02 |
| LBDLDLSI | 0.2 | 8.9 | 2.7 | 0.584 | LDL-cholesterol (mmol/L) | 0.24 | 0.02 |
| RIAGENDR | Male | Female | – | – | Gender | 0.01 | 0.01 |
| RIDAGEYR | 40 | 80 | 48.8 | – | Age at screening adjudicated (recode) | 0.22 | 0.02 |
Fig. 2 Associations: circles are numerical associations and squares are categorical
Gini Index = 1 − Σ_{i=1}^{n} (p_i)^2   (1)
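Eq. (1) can be implemented directly: the impurity is zero when every sample in a node shares one class, and 0.5 for a balanced binary node. The class labels below are illustrative.

```python
# Minimal Gini impurity function matching Eq. (1): 1 minus the sum of
# squared class probabilities. Lower impurity => purer node.
from collections import Counter

def gini_index(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_index(["CVD"] * 10))                  # pure node -> 0.0
print(gini_index(["CVD"] * 5 + ["NO CVD"] * 5))  # balanced node -> 0.5
```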
Our model is an ensemble of decision tree, random forest, naive Bayes, k-NN, and
neural network with a weighted majority voting approach. Here is the mathematical
formulation of our proposed model.
Assume we have m base models denoted M_1, M_2, ..., M_m. Each base
model produces a prediction y_i ∈ {0, 1} for a given input sample x. The ensemble
model combines the predictions of the base models using weighted majority voting:
we assign weights w_1, w_2, ..., w_m to each base model, based on the performance
or accuracy of the models as estimated by cross-validation. The study is
non-simulation based: it uses an experimental, data-driven methodology to find the
risk factors related to CVD. The setup is elaborated in Fig. 1 and involves no
simulation component or simulation software. For the
study, we prepared an experimental setup using RapidMiner Studio on a MacBook Air
(1.8 GHz Dual-Core Intel Core i5, 8 GB RAM). The dataset resulting from
the data preparation stage (Sect. 3.1) was balanced by oversampling and split
70/30 into training and testing sets. In the training phase, we used the training
dataset, prepared after feature engineering, to train the model for prediction. In
the validation phase, the trained model was tested on how accurately it predicted
the corresponding labels of the testing dataset. Each model underwent tenfold
cross-validation with random splits to obtain a reliable estimate of model performance.
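A sketch of the described ensemble, weighting each base model by its cross-validated accuracy: the five scikit-learn estimators stand in for the paper's base models, and the synthetic data and model settings are illustrative, not the paper's.

```python
# Weighted majority voting over decision tree, random forest, naive Bayes,
# k-NN, and a neural network; weight w_i = cross-validated accuracy of M_i.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, random_state=0)

models = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                          random_state=0)),
]

# Derive each model's weight from its cross-validated accuracy.
weights = [cross_val_score(m, X, y, cv=5).mean() for _, m in models]

ensemble = VotingClassifier(models, voting="hard", weights=weights)
ensemble.fit(X, y)
print(ensemble.score(X, y))
```

In hard voting, each model casts a vote for a class and the votes are summed with these weights; the class with the largest weighted total wins.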
To produce optimal model parameters, a grid search with parallelized performance
evaluation was employed for each model. This approach involved a systematic
search over the hyperparameters specified for the model to determine the
combination of values yielding the highest accuracy. The parallelized evaluation
allowed multiple sets of hyperparameters to be tested simultaneously, making
tuning more efficient and effective. Ultimately, this approach helped ensure that
the resulting models performed at their best potential.
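The tuning procedure can be sketched as follows; the random-forest grid and data are hypothetical, with `n_jobs=-1` providing the parallelized evaluation of parameter sets and folds.

```python
# Grid search with tenfold cross-validation and parallel evaluation,
# mirroring the tuning procedure described above (illustrative grid/data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

search = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8, None]},
    cv=10,        # tenfold cross-validation, as in the paper
    n_jobs=-1,    # evaluate candidate settings in parallel
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```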
During the third phase, the predictive model is constructed and its performance is
assessed using metrics such as accuracy, sensitivity, specificity, and F1 score.
These calculations are summarized in the confusion matrix (refer Fig. 3). The
confusion matrix is a performance measurement tool for machine learning problems
in which counts of actual versus predicted labels are arranged in four cells;
actual values are placed along the top of the matrix and predicted values along
its left side. From Fig. 3, a total of 15,733 real CVD cases were correctly
classified as CVD by our model, and 10,749 cases without CVD were correctly
classified as NO CVD. Since the dataset was balanced using oversampling, we
can use accuracy as a
performance metric. The accuracy of our model is computed as
accuracy = (TP + TN)/Total = (15,733 + 10,749)/(15,733 + 10,749 + 679 + 1,785) = 91.49%.
Other performance metrics are given in Table 2. The final comparison of our model
with other related studies is summarized in Table 3.
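The reported metrics follow directly from the confusion-matrix counts. TP and TN are given in the text; the FN and FP values below are inferred from the oversampled class totals (16,412 CVD and 12,534 NO CVD), which reproduces the stated 91.49% accuracy.

```python
# Recomputing the performance metrics from the Fig. 3 confusion-matrix
# counts. FN = 16,412 - 15,733 and FP = 12,534 - 10,749 are inferred from
# the oversampled class totals.
tp, tn, fn, fp = 15_733, 10_749, 679, 1_785

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)       # recall on the CVD class
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(f"accuracy={accuracy:.2%} sensitivity={sensitivity:.2%} "
      f"specificity={specificity:.2%} f1={f1:.2%}")
```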
We found that our model performed better than the studies we reviewed. We
improved the predictive model's accuracy and other performance metrics.
5 Conclusion
References
19. Hasan KA, Hasan MAM (2020) Prediction of clinical risk factors of diabetes using multiple
machine learning techniques resolving class imbalance. In: 2020 23rd international conference
on computer and information technology (ICCIT). IEEE, pp 1–6
20. National health and nutrition examination survey. https://wwwn.cdc.gov/Nchs/Nhanes/.
Accessed 14 Apr 2023
21. Dinh A, Miertschin S, Young A, Mohanty SD (2019) A data-driven approach to predicting
diabetes and cardiovascular disease with machine learning. BMC Med Inform Decis Mak
19(1):1–15
22. Oh T, Kim D, Lee S, Won C, Kim S, Yang J, Yu J, Kim B, Lee J (2022) Machine learning-based
diagnosis and risk factor analysis of cardiocerebrovascular disease based on KNHANES. Sci
Rep 12(1):2250
Review of Local Binary Pattern
Histograms for Intelligent CCTV
Detection
Abstract This paper presents a comprehensive review of the use of Local Binary
Pattern Histogram (LBPH) in smart Closed-Circuit Television (CCTV) systems for
real-time detection and analysis of abnormal events. The review covers various
aspects of smart CCTV, including object detection, motion detection, and abnormal
event detection. It discusses existing literature on LBPH-based methods, highlighting
their strengths and weaknesses. The review emphasizes the challenges faced in smart
CCTV detection, such as handling complex scenes, occlusions, lighting variations,
and false alarms. It also identifies potential research directions, such as incorpo-
rating deep learning techniques, exploring multi-modal data fusion, and utilizing
edge computing for real-time processing. In summary, this review offers valuable
insights into the current state-of-the-art techniques for smart CCTV detection using
LBPH. It provides researchers, practitioners, and policymakers involved in the devel-
opment and deployment of smart CCTV systems with valuable information. The
findings of this review can guide future research and contribute to the advancement
of public safety and security applications.
1 Introduction
A. Video surveillance is an important tool for ensuring public safety and security.
However, traditional CCTV systems can be limited in their ability to detect and
respond to threats in real time.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 277
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_24
278 D. Sharma et al.
Smart CCTV systems, on the other hand, leverage advanced technologies such as
computer vision, machine learning, and artificial intelligence to improve the accuracy
and speed of video analysis. By developing smart CCTV systems, we can enhance
public safety and security while reducing the workload and cost of human operators.
B. Local Binary Pattern Histograms
The review paper extensively explores the concept and application of Local Binary
Pattern Histogram (LBPH) in the context of intelligent CCTV detection. The section
dedicated to LBPH provides a comprehensive understanding of its fundamental
principles, advantages, and limitations.
Local Binary Pattern (LBP): The paper discusses the concept of local binary
patterns, which are local texture descriptors widely used in computer vision tasks.
LBPs encode the relationship between a pixel and its neighbors by comparing their
intensity values. This operator captures texture information and is particularly useful
for analyzing patterns in surveillance footage.
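A minimal version of the LBP operator described above can be written in a few lines: each of the 8 neighbors in a 3x3 patch is thresholded against the center pixel and the resulting bits form an 8-bit code. The sample patch is illustrative.

```python
# Basic 3x3 LBP operator: threshold each neighbor against the center pixel
# and pack the 8 resulting bits into one code.
import numpy as np

def lbp_code(patch):
    """LBP code of the center pixel of a 3x3 grayscale patch."""
    center = patch[1, 1]
    # Neighbors clockwise from the top-left corner.
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = [1 if n >= center else 0 for n in neighbors]
    return sum(bit << i for i, bit in enumerate(bits))

patch = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])
print(lbp_code(patch))  # -> 120
```

Because the code depends only on sign comparisons, adding a constant to every pixel (a global brightness shift) leaves it unchanged, which is the source of LBP's illumination robustness.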
Histogram Representation: LBPH builds upon the LBP operator by introducing
histogram representation. LBPH divides the image into local regions and computes
LBPs within each region. These local LBPs are then quantized into histograms,
which encode the frequency of different pattern occurrences. The resulting histogram
representation efficiently summarizes the texture information in an image [1].
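The histogram step can be sketched as follows: compute an LBP code for every interior pixel, split the code image into a grid of local regions, histogram each region's codes, and concatenate the normalized histograms into one feature vector. The grid size and random image are illustrative.

```python
# LBPH feature extraction: per-region normalized histograms of LBP codes,
# concatenated into a single descriptor.
import numpy as np

def lbp_image(img):
    """8-bit LBP code for every interior pixel of a grayscale image."""
    h, w = img.shape
    center = img[1:h-1, 1:w-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Neighbors clockwise from the top-left corner, one bit each.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.uint8) << i
    return codes

def lbph_features(img, grid=(2, 2)):
    """Concatenate per-region normalized histograms of LBP codes."""
    codes = lbp_image(img)
    hists = []
    for band in np.array_split(codes, grid[0], axis=0):
        for region in np.array_split(band, grid[1], axis=1):
            hist, _ = np.histogram(region, bins=256, range=(0, 256))
            hists.append(hist / region.size)
    return np.concatenate(hists)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16))
feat = lbph_features(img)
print(feat.shape)  # 2x2 regions x 256 bins -> (1024,)
```

Keeping one histogram per region preserves coarse spatial layout, which a single global histogram would discard.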
Robustness to Illumination Variations: One of the key advantages of LBPH is its
robustness to illumination variations. By focusing on local texture patterns rather than
absolute intensity values, LBPH-based systems can effectively handle challenging
lighting conditions encountered in real-world surveillance scenarios.
Object Recognition and Event Detection: The paper emphasizes the application
of LBPH in object recognition and event detection in CCTV footage. The ability
of LBPH to capture discriminative texture features makes it suitable for identifying
specific objects or events of interest, such as people, vehicles, or suspicious activities.
Limitations and Challenges: The review paper also addresses the limitations and
challenges associated with LBPH. While LBPH performs well in capturing texture
information, it may struggle to recognize complex objects or events with subtle
variations. Additionally, parameter selection and tuning can significantly impact the
performance of LBPH-based systems, requiring careful consideration.
By delving into the concept of LBPH and its application in intelligent CCTV
detection, the review paper provides valuable insights into the strengths and limita-
tions of this technique. The paper serves as a guide for researchers and practitioners
interested in utilizing LBPH for surveillance tasks, highlighting its effectiveness in
texture-based analysis and its potential for advancing intelligent CCTV systems.
C. The purpose of this review paper is to provide an overview of the use of LBPH
in smart CCTV systems. We will review the relevant literature on LBPH-based
video analysis and highlight its potential applications in object detection, human
activity recognition [2–4], facial recognition, and other areas. By examining the
Review of Local Binary Pattern Histograms for Intelligent CCTV Detection 279
strengths and limitations of existing studies, we aim to identify the key challenges
and opportunities for future research.
2 Literature Review
According to the literature review (Table 1), the LBPH algorithm, which appears
several times in the surveyed works, shows higher accuracy than the other
techniques mentioned. It has been found effective in handling pose variation,
illumination, and occlusion, which are common challenges in face recognition tasks.
As noted in the reviewed literature (see Fig. 1), the Local Binary Pattern
Histogram (LBPH) algorithm is a technique used for face recognition and
enhanced real-time face recognition [10]. It has been analyzed and compared
with other face recognition techniques, such as eigenfaces and fisherfaces,
in terms of performance and accuracy.
The review paper extensively discusses Local Binary Pattern Histograms (LBPH)-
based CCTV detection systems, highlighting their significance and effectiveness in
intelligent surveillance [6]. The section dedicated to LBPH-based CCTV detection
systems provides in-depth insights into their architecture, working principles, and
key components.
Feature Extraction: LBPH-based systems leverage the local binary pattern (LBP)
operator to extract robust and discriminative features from surveillance footage [13].
By analyzing the local texture patterns of image regions, LBPH efficiently captures
essential information for object recognition and event detection.
Histogram Representation: LBPH further utilizes the histogram representation to
encode the extracted features. The local binary patterns are quantized into histograms,
capturing the distribution of different pattern occurrences. This histogram represen-
tation enables efficient and compact feature representation.
Training and Recognition: LBPH-based systems typically involve a training phase
where the system learns the patterns and characteristics of specific objects or events.
This is followed by a recognition phase, where the system matches the extracted
features of new observations against the learned patterns to identify and classify
objects or events of interest.
Parameter Settings: The review paper emphasizes the importance of parameter
settings in LBPH-based CCTV detection systems. Parameters such as the size of
Table 1 Literature of review

1. 2022, Anna Irin Anil et al. [5]
   Focus: smart security CCTV system
   Key points: configuration, embedded surveillance
   Techniques: machine learning, deep learning, computer vision, generic object tracking using regression networks (GOTURN), you only look once (YOLO), Local Binary Pattern Histogram (LBPH) algorithm
   Parameters analyzed: non-human detection, motion detection under sunlight exposure, human motion detection
   Research gap: minimize human interaction within the industry to enhance security surveillance

2. 2021, Kothari et al. [6]
   Focus: survey on smart security CCTV system
   Key points: Dropbox, Raspberry Pi, Internet of Things (IoT), Picamera, PIR sensor
   Techniques: Raspberry Pi, IoT, PIR sensor
   Parameters analyzed: intruder tracking, live streaming, communication, motion detection, image processing
   Research gap: a smart surveillance system authenticated and encrypted on the receiver side to deliver accurate information to the recipients

3. 2021, Mihir Shah et al. [7]
   Focus: smart surveillance system
   Key points: long short-term memory (LSTM), automated teller machine (ATM) surveillance, convolutional neural network (CNN)
   Techniques: CNN, LSTM, recurrent neural networks (RNN), backpropagation through time (BPTT)
   Parameters analyzed: detection of normal and abnormal events/activities
   Research gap: identifying anomalies in a given video, such as theft, assault, or robbery

4. 2020, Deepak et al. [8]
   Focus: attendance system based on facial recognition
   Key points: OpenCV, face acknowledgement system, biometric, verification
   Techniques: OpenCV (open-source computer vision) system, attendance system [8, 9]
   Parameters analyzed: face recognition
   Research gap: obtaining an accurate attendance system

5. 2019, Farah Deeba et al. [10]
   Focus: enhanced real-time face recognition
   Key points: Local Binary Pattern Histogram (LBPH), face recognition, feature extraction
   Techniques: feature vectors, extracting histograms with LBP, LBPH algorithm, SIFT algorithm
   Parameters analyzed: feature extraction, face detection, feature matching, preprocessing
   Research gap: obtaining valid results under pose variation, illumination, and occlusion

6. 2019, Raman Sharma et al. [11]
   Focus: performance analysis of human face recognition techniques
   Key points: Haar cascade, histogram, face recognition, face detection, feature extraction
   Techniques: fisherfaces, dataset creator/face detector, local binary pattern histograms
   Parameters analyzed: parameter selection, extracting histograms for the image, training the algorithm, performing face recognition, the LBP operation
   Research gap: in this paper, the LBPH algorithm has higher accuracy than eigenfaces and fisherfaces

7. 2019, Niraj Kini et al. [12]
   Focus: theft detection and motion tracking using CNN
   Key points: object tracking, computer vision, object detection, convolutional neural network
   Techniques: neural networks, CNN, deep artificial neural networks, scale-invariant feature transform (SIFT), histograms of oriented gradients (HOG)
   Parameters analyzed: theft tracking and motion detection
   Research gap: a new dimension to enable the surveillance camera as a theft detector
the local neighborhood, the number of sampling points, and threshold values signif-
icantly influence the system’s performance. Fine-tuning these parameters is crucial
for achieving optimal results in different surveillance scenarios.
Performance Evaluation: This paper also delves into the performance evaluation
of LBPH-based CCTV detection systems [11]. It discusses various performance
metrics, including accuracy, robustness, and efficiency, to assess the system’s perfor-
mance. The findings reveal the strengths and limitations of LBPH-based systems and
provide insights for further optimization.
• Lower prediction value means better face recognition.
• Higher frame per second (FPS) means faster speed.
As shown in Fig. 2 (FPS range), the performance evaluation section of this
review provides a comprehensive assessment of Local Binary Pattern
Histograms (LBPH) for intelligent CCTV detection. Various performance metrics
were discussed to evaluate the effectiveness and limitations of LBPH-based CCTV
detection systems on the basis of the FPS range.
The accuracy of LBPH-based systems was highlighted, showcasing their superiority
over other feature extraction techniques such as eigenface and fisherface, as
given in Table 2, a comparison of the eigenface, fisherface, and LBPH algorithms [14].
LBPH demonstrates excellent performance in object recognition tasks, particularly
in low-light conditions. However, it was noted that LBPH has limitations in
recognizing complex objects or events with subtle variations, and its ability to
distinguish between similar categories requires further improvement.
Efficiency was also considered, emphasizing the trade-off between accuracy and
computational resources. Optimal parameter settings play a crucial role in achieving
the desired balance between accuracy and efficiency in LBPH-based systems.
Fig. 2 Prediction values plotted against the FPS range
The applications section of this review paper explores the diverse range of applica-
tions where Local Binary Pattern Histograms (LBPH) can be effectively utilized in
CCTV detection systems [15]. Several key applications were discussed, highlighting
the effectiveness of LBPH in various scenarios:
Intrusion Detection: LBPH-based CCTV detection systems can be employed to
detect and identify unauthorized individuals or objects entering restricted areas. The
robustness of LBPH in different lighting conditions makes it suitable for monitoring
entrances and detecting potential security breaches.
284 D. Sharma et al.
5 Conclusion
Balancing accuracy and efficiency is crucial for optimal system performance.
Further research is needed to address the limitations in recognizing complex objects
and events and to explore enhancements to LBPH-based CCTV detection systems.
Overall, LBPH shows great potential in advancing intelligent surveillance and
contributes to enhancing security and safety measures.
References
1. Kanhangad V, Ravi J (2015) Human action recognition using local binary patterns and motion
history images. In: Proceedings of the 2015 international conference on advances in computing,
communications and informatics (ICACCI). IEEE, pp 1216–1221
2. Gharib A, Ibrahim AA, Al-Arif MA (2017) Survey on human activity recognition in video
surveillance. In: Proceedings of the 2017 2nd international conference on computer science
and technology, information storage and processing (CSTISP). IEEE, pp 356–361
3. Joshi M, Lade P (2016) Human activity recognition using local binary patterns and motion
history images in video surveillance. In: Proceedings of the 2016 international conference on
information technology (ICIT). IEEE, pp 190–195
4. Kisku DR, Singh JK, Singh P (2015) Human action recognition using local binary pattern and
support vector machine. In: Proceedings of the 2015 IEEE Calcutta conference (CALCON).
IEEE, pp 123–128
5. Irin Anil A et al (2022) Smart security CCTV system: configuration, embedded surveillance,
and non-human detection. Int J Res Eng Sci 10(7):52–56
6. Kothari SB et al (2021) Survey on smart security surveillance system. Int J Res Appl Sci Eng
Technol (IJRASET) 9(12):52
7. Shah M, Agre M, Chawdhary A, Deone J (2022) Smart surveillance system. https://doi.org/
10.2139/ssrn.4108852
8. Deepak S, Akash S (2020) Implementing the study of attendance system using OpenCV with
the help of facial recognition. Int J Recent Adv Multidiscip Top 1(1):16–19
9. Jain AK, Ross A, Prabhakar S (2004) An introduction to biometric recognition. IEEE Trans
Circ Syst Video Technol 14(1):4–20
10. Deeba F, Memon H, Dharejo F, Ahmed A, Ghaffar A (2019) LBPH-based enhanced real-time
face recognition. Int J Adv Comput Sci Appl 10:535
11. Sharmila R, Sharma D, Kumar V, Puranik C, Gautham K (2019) Performance analysis of
human face recognition techniques. In: Proceedings of the 2019 4th international conference
on internet of things: smart innovation and usages (IoT-SIU), Ghaziabad, India, pp 1–4. https://
doi.org/10.1109/IoT-SIU.2019.8777610
12. Doshi P, Punktambekar S, Kini N, Dhami SS (2019) Theft detection system using convolutional
neural network and object tracking. Int J Adv Res Innov Ideas Educ 5(3):1047–1054
13. Kim HJ, Kim DJ, Kim HW, Lee KH (2016) Human activity recognition in video surveillance
based on feature selection and support vector machines. J Electr Eng Autom 1(1):40–45
14. Sutoyo R, Harefa J, Chowanda A (2016) Unlock screen application design using face expression
on android smartphone. MATEC Web Confer 54:05001. https://doi.org/10.1051/matecconf/
20165405001
15. Elhamod M, Hassenian AE, Ghoniemy S, Elhoseny M (2017) Video surveillance systems:
advances, challenges, and future research directions. In: Handbook of research on advanced
hybrid intelligent techniques and applications. IGI Global, pp 121–157
16. Cui S, Xu T, Xu C, Zhu Y (2016) A survey on visual surveillance of object detection and
tracking in video streams. IEEE Trans Syst Man Cybern Syst 46(6):768–782
17. Dahiya A, Gupta V (2017) Object tracking in video surveillance: a comprehensive review.
IETE J Res 63(3):335–348
18. Das S, Chatterjee S (2018) A review on human activity recognition in video surveillance. In:
Proceedings of the 2018 3rd IEEE Uttar Pradesh section international conference on electrical,
electronics and computer engineering (UPCON). IEEE, pp 1–6
19. Hanmandlu M, Vasikarla S (2013) An efficient approach for facial recognition in video surveil-
lance using local binary patterns. In: Proceedings of the 2017 international conference on
intelligent computing and control systems (ICICCS). IEEE, pp 723–728
Enhancing Machine Learning Model
Using Explainable AI
1 Introduction
In today’s digital era, the reputation and success of businesses are greatly influenced
by customer reviews, which play a vital role in shaping them, particularly in the
hotel industry [1]. With the exponential growth of online platforms, analyzing and
interpreting large volumes of customer feedback have become increasingly chal-
lenging. To address this, machine learning models have been employed to automate
the process of categorizing and analyzing hotel reviews. However, these models
often operate as black boxes, lacking transparency and interpretability. The inability
to understand the factors driving model predictions raises concerns among stake-
holders, such as hotel managers and consumers [2]. Therefore, the primary focus
of this research is to address the limitations of opacity in hotel review management
models by implementing Explainable Artificial Intelligence (XAI) techniques. By
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 287
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_25
288 R. Shah et al.
incorporating XAI techniques, such as LIME and SHAP, we aim to enhance trans-
parency and interpretability, enabling stakeholders to gain deeper insights into the
decision-making process of the model.
1.1 Purpose
The purpose of this study is twofold. Firstly, we aim to enhance transparency and
interpretability in a hotel review management model, whose underlying machine
learning model achieves an accuracy of about 92%, by leveraging XAI techniques,
mainly feature importance, LIME, and SHAP. By providing stakeholders with
comprehensible insights into the factors influencing the model’s predictions, we
enable them to validate and understand the decision-making process. Secondly,
we seek to improve decision-making processes based on customer feedback. By
extracting meaningful and interpretable information from the reviews, stakeholders
can make more informed judgments and identify key factors that drive customer
satisfaction or dissatisfaction [3].
1.2 Scope
The scope of this study encompasses the application of the LIME and SHAP tech-
niques in a hotel review management model. We will explore the practical integration
of these XAI techniques and evaluate their effectiveness in providing interpretable
explanations. Furthermore, we will conduct case studies and experiments to assess
the impact of XAI on decision-making processes [4]. While the focus is on the hotel
industry, the insights and findings from this study can be extended to other domains
where customer feedback analysis is essential.
1.3 Significance
In conclusion, this research aims to address the lack of transparency and inter-
pretability in hotel review management models by implementing XAI techniques. By
exploring LIME and SHAP and their impact on decision-making processes, we strive
to enhance transparency and interpretability and ultimately improve decision-making
based on customer feedback.
Enhancing Machine Learning Model Using Explainable AI 289
2 Related Work
In 2020, Li and Lu published a research paper titled "Hotel review sentiment analysis
based on a hybrid model of deep learning and machine learning", which proposed
a hybrid model combining deep learning and machine learning techniques
to analyze sentiment in hotel reviews. They demonstrated the effectiveness
of their approach in capturing sentiment information from textual data [5].
Mishra and Dash [6] published a research paper named “Sentiment analysis of
hotel reviews using machine learning techniques” in the year 2019, which focused
on the application of machine learning algorithms for sentiment analysis in the hotel
review domain, achieving promising results in classifying reviews into positive, nega-
tive, or neutral sentiments. Huang et al. [7] explored a hybrid approach that integrates
machine learning in addition to lexicon-based methods to enhance sentiment analysis
accuracy. Their study emphasized the importance of incorporating domain-specific
knowledge.
These works collectively showcase the ongoing research in sentiment analysis
and the integration of various techniques for analyzing and interpreting sentiments
in hotel reviews. Building upon these prior studies, our research aims to extend
the existing approaches by incorporating explainable AI techniques, such as LIME
and SHAP, to provide transparent and interpretable explanations for the sentiment
analysis results obtained in the context of hotel review management. By employing
these XAI methods, we seek to enhance the understanding and trustworthiness of
sentiment analysis outcomes, enabling hotel managers to make informed decisions
based on the identified sentiments in guest reviews.
The research questions driving this study are twofold. The primary research ques-
tion centers on understanding the effectiveness of LIME and SHAP in providing
interpretable explanations for hotel review predictions. Through this, we seek to
explore how these XAI techniques contribute to transparency and trustworthiness in
the model’s decision-making process.
3 Method
The methodology employed in this research paper involves several key steps,
including data collection and preprocessing, model development, XAI integration,
and evaluation. The following sections outline the details of each step:
Data Collection and Preprocessing. The hotel review data used in this study were
collected from an online hotel review platform. The dataset consisted of a substantial
number of hotel reviews, encompassing both positive and negative sentiments. The
data collection process involved extracting the relevant reviews based on predefined
criteria, such as review ratings and textual content.
To ensure the quality and consistency of the data, a series of preprocessing steps
were implemented. Irrelevant information, such as timestamps or user identifiers, was
removed from the dataset. Text preprocessing techniques, including tokenization,
lowercasing, and removal of stop words and special characters, were applied to
standardize the text data. Additionally, any missing values or inconsistencies in the
dataset were handled appropriately.
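A minimal sketch of these preprocessing steps is shown below; the stop-word list is an illustrative subset, not the one used in the study.

```python
import re

# Illustrative stop-word subset; a real pipeline would use a full list
# (e.g. from NLTK or scikit-learn).
STOP_WORDS = {"the", "a", "an", "was", "is", "and", "to", "of"}

def preprocess(text):
    """Lowercase, strip special characters, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits and punctuation
    tokens = text.split()                  # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The room was CLEAN & the staff friendly!!"))
# → ['room', 'clean', 'staff', 'friendly']
```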
Model Development. A machine learning model was developed to perform senti-
ment analysis on the hotel reviews. The selection of the specific model was based
on its suitability for text classification tasks. The model was trained using a labeled
dataset, with the positive and negative sentiments serving as the target labels. Various
machine learning algorithms, such as random forest or support vector machines, were
considered and evaluated based on their performance metrics, including accuracy,
precision, recall, and F1-score. The chosen model was then fine-tuned to optimize
its performance.
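As a hedged sketch of this step, a TF-IDF plus random-forest pipeline in scikit-learn might look as follows; the reviews, labels, and hyperparameters are illustrative stand-ins, not the paper's actual dataset or tuned settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus; the study's real dataset is far larger.
reviews = ["great room and friendly staff", "loved the clean spacious room",
           "terrible service and dirty room", "awful noisy room, rude staff",
           "wonderful stay, excellent breakfast", "worst hotel, bad smell"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = positive, 0 = negative

# Pipeline: vectorize text to TF-IDF features, then classify.
model = make_pipeline(TfidfVectorizer(),
                      RandomForestClassifier(n_estimators=100, random_state=0))
model.fit(reviews, labels)
print(model.predict(["clean and friendly hotel"]))
```

Swapping `RandomForestClassifier` for `LinearSVC` gives the support-vector variant mentioned above; both expose the same `fit`/`predict` interface through the pipeline.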
XAI Integration. Two Explainable Artificial Intelligence (XAI) techniques for
explaining and interpreting models, Local Interpretable Model-Agnostic Explanations
(LIME) and Shapley Additive Explanations (SHAP), were integrated into the hotel
review management model. These techniques were selected for their ability to provide
interpretable explanations for the model's predictions. The feature importance
technique also helps to generate explanations for the model's predictions; it is
plotted in Fig. 2 on the basis of the features generated by the code in Fig. 1.
A detailed description of how the reviews are classified by the AI model using
XAI techniques is shown in Fig. 4, and the implementation of LIME and SHAP is
shown in Figs. 5, 6, and 7. LIME provides local explanations by generating feature
importance scores, while SHAP offers a more global understanding of feature
contributions; the SHAP values are plotted in Fig. 8. The XAI techniques were
implemented to generate explanations for the model's predictions, enabling
stakeholders to understand the factors influencing the sentiment classification.
• Code of the feature importance technique (as shown in Fig. 1).
• Plot of the feature importance scores (as shown in Fig. 2).
• Code for prediction of a review (as shown in Fig. 3).
• The classified reviews (as shown in Fig. 4).
• Code of the LIME technique (as shown in Fig. 5).
• Output of the LIME technique (as shown in Fig. 6).
• Code of the SHAP technique (as shown in Fig. 7).
• Plot of the SHAP values (as shown in Fig. 8).
• Code implementing the SHAP technique on five reviews (as shown in Fig. 9).
• Plots of the corresponding SHAP values (as shown in Fig. 10).
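LIME's mechanism can be illustrated without the `lime` package: perturb the input by randomly masking words, score each perturbation with the black-box model, and fit a linear surrogate whose coefficients rank local word importance. The keyword scorer and word lists below are hypothetical stand-ins for the trained model; the real `lime` library additionally weights samples by proximity to the original input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in black-box sentiment scorer (in practice, the trained classifier).
POSITIVE = {"clean", "friendly", "great"}
NEGATIVE = {"dirty", "rude"}
def black_box(tokens):
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

tokens = ["great", "location", "but", "dirty", "room"]

# LIME idea: sample binary masks over the tokens, score each perturbed text,
# then fit a linear surrogate (with intercept) whose coefficients rank the
# words' local importance for this one prediction.
n_samples = 500
masks = rng.integers(0, 2, size=(n_samples, len(tokens)))
scores = np.array([black_box([t for t, m in zip(tokens, row) if m])
                   for row in masks])
design = np.column_stack([masks, np.ones(n_samples)])
coef, *_ = np.linalg.lstsq(design, scores, rcond=None)

for word, w in zip(tokens, coef[:-1]):
    print(f"{word}: {w:+.2f}")
```

Here "great" receives a large positive weight and "dirty" a large negative one, mirroring the per-word bar plots that LIME produces.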
Evaluation. The integrated model, along with the LIME and SHAP techniques, was
assessed using different evaluation measures, including accuracy, precision, recall,
and F1-score. The efficiency of the model was assessed on a held-out test dataset,
and the XAI techniques’ effectiveness in providing interpretable explanations was
analyzed. Additionally, the impact of XAI on decision-making processes related to
hotel reviews was evaluated through case studies and experiments. This involved
analyzing the insights gained from the explanations provided by LIME and SHAP
and assessing their influence on stakeholders’ decision-making.
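For reference, the evaluation measures named above can be computed from scratch; the toy labels below are illustrative only.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class from scratch."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 4 actual positives; the model finds 3 of them plus 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # → (0.75, 0.75, 0.75)
```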
The methodology presented in this research paper provides a comprehensive
approach to implementing XAI in the hotel review management model. It ensures
the collection of reliable and relevant data, the development of an effective senti-
ment analysis model, the integration of LIME and SHAP for interpretability, and the
thorough evaluation of the integrated model’s performance.
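SHAP's attributions are grounded in Shapley values from cooperative game theory: each feature's average marginal contribution over all orderings. The following from-scratch sketch shows the definition on a toy additive model; the `shap` package computes efficient approximations, since exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values via subset enumeration (O(2^n), toy scale only)."""
    n = len(features)
    phi = {f: 0.0 for f in features}
    for f in features:
        rest = [g for g in features if g != f]
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(rest, k):
                phi[f] += weight * (value(set(subset) | {f}) - value(set(subset)))
    return phi

# Hypothetical model output: "clean" adds 2, "dirty" subtracts 1, no interaction.
def model_output(present):
    return 2.0 * ("clean" in present) - 1.0 * ("dirty" in present)

phi = shapley_values(["clean", "dirty"], model_output)
print(phi)  # → {'clean': 2.0, 'dirty': -1.0}
```

For an additive model like this, the Shapley values recover the per-word contributions exactly, which is the additivity property SHAP's summary plots rely on.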
4 Result
The integration of XAI techniques, LIME and SHAP, into the hotel review manage-
ment model yielded significant outcomes. The sentiment analysis model achieved
high accuracy, precision, recall, and F1-score. LIME and SHAP provided inter-
pretable explanations, aiding stakeholders in understanding the factors driving the
model's predictions.
5 Discussion
The integration of XAI techniques, LIME and SHAP, enhanced transparency, inter-
pretability, and decision-making in the hotel review management model. LIME and
SHAP provided explanations for sentiment analysis predictions, enabling stake-
holders to understand factors influencing customer sentiments. This improved
decision-making, user acceptance, and trust in the model’s predictions. Practical
applications extend to service improvements and reputation management. Future
research can explore additional XAI techniques and investigate long-term impact
and generalizability in different domains. Overall, XAI bridges the gap between
complex AI models and human understanding, enabling informed decisions based
on customer feedback.
6 Conclusion
The integration of XAI techniques, LIME and SHAP, into the hotel review manage-
ment model has proven beneficial. The model achieved high performance in senti-
ment analysis and provided interpretable explanations. This enhanced transparency
and interpretability improved decision-making processes, user acceptance, and trust.
The practical applications of XAI in the hotel industry are significant, bridging the
gap between AI models and human understanding. Future research can explore
broader applications and generalizability. Overall, XAI empowers stakeholders to
make informed decisions based on customer feedback, leading to enhanced service
quality and customer satisfaction.
The limitations of this study are as follows:
Limited dataset. The research used a specific dataset of hotel reviews, which may not
capture the full spectrum of sentiments and experiences, limiting the generalizability
of the findings.
Dependency on model performance. The quality of the explanations provided by
LIME and SHAP relies on the accuracy of the sentiment analysis model. If the
model has limitations or biases, these can impact the reliability of the explanations.
The future scope is listed below:
• Integration of multiple XAI techniques.
• Real-time analysis and feedback.
• Cross-domain analysis.
• User-centric explanations.
• Ethical considerations.
References
5. Li J, Lu K (2020) Hotel review sentiment analysis based on a hybrid model of deep learning
and machine learning. Int J Comput Intell Syst 13(1):787–799
6. Mishra P, Dash R (2019) Sentiment analysis of hotel reviews using machine learning techniques.
In: Proceedings of the international conference on machine intelligence and data
7. Huang S, Sun X, Wang J (2020) Sentiment analysis of hotel reviews using machine learning
and lexicon-based approaches. Int J Comput Intell Syst 13(1):787–799
IoT-Assisted Mushroom Cultivation
in Agile Environment
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 299
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_26
300 A. Kathiria et al.
The emergence of the Internet of Things (IoT) has revolutionized how we approach
agriculture. IoT technology involves using sensors, actuators, and other devices to
collect and process data from the environment in real time. IoT can be applied to
various areas of agriculture, including crop monitoring, irrigation, and pest manage-
ment. In the case of mushroom cultivation, IoT can be used to monitor and control
the cultivation environment with greater accuracy and efficiency than traditional
methods.
IoT-based mushroom cultivation involves installing sensors and other IoT devices
throughout the cultivation facility to collect data on various environmental factors,
such as temperature and humidity. This data is then processed and analyzed in real
time, allowing farmers to make informed decisions about the cultivation process.
For example, suppose the temperature in the cultivation facility rises above a certain
threshold. In that case, the IoT system can automatically activate a cooling system to
return the temperature to the optimal range. This saves time and labor and improves
the overall quality and yield of the mushrooms [8].
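The threshold rule described above can be sketched as a single control step; the function name and the 20–25 °C optimal band (reported elsewhere in the literature for oyster mushrooms) are illustrative assumptions, and a real deployment would read actual sensors and drive a relay or cooling unit.

```python
# Hypothetical optimal band; tune per mushroom species and facility.
OPTIMAL_RANGE = (20.0, 25.0)  # °C

def control_step(temperature_c, cooler_on):
    """Return the cooler state after one control cycle (simple threshold rule)."""
    low, high = OPTIMAL_RANGE
    if temperature_c > high:
        return True       # too warm: activate cooling
    if temperature_c < low:
        return False      # at or below the band: deactivate cooling
    return cooler_on      # inside the band: keep the current state

print(control_step(27.3, False))  # → True
print(control_step(22.0, True))   # → True (stays on until temp leaves the band)
```

Keeping the current state inside the band provides a crude hysteresis, so the cooler does not rapidly toggle around the upper threshold.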
In an agile environment, IoT-based mushroom cultivation can provide farmers
with the flexibility to adapt quickly to changes in the cultivation environment. An
agile environment involves using iterative and incremental processes to respond to
changes in requirements and improve the overall quality of the product. With IoT-
based mushroom cultivation, farmers can respond quickly to changes in environ-
mental factors and make adjustments to the cultivation process as needed. However,
the implementation of IoT-based mushroom cultivation in an agile environment can
also pose some challenges. For example, the cost of installing and maintaining IoT
devices can be high, and there may be a learning curve for farmers to understand and
use the technology effectively [1].
In this paper, we will explore the benefits of IoT-based mushroom cultivation in
an agile environment and analyze the challenges associated with its implementation.
The proposed approach differs from existing approaches in being environment-friendly
and less complex: it uses biowaste as a cultivation platform, which previous
approaches did not include, and its low-cost design makes it efficient for small
organizations and home use. We will also provide a case study of a successful
implementation of IoT-based mushroom cultivation in an agile environment and discuss
the implications of our findings for future research and development in this field.
2 Related Work
Kassim et al. state the latest research on the indoor environment of mushroom facto-
ries and its impact on the growth and quality of mushrooms. The success of mushroom
cultivation heavily depends on indoor conditions, including temperature, humidity,
airflow, lighting, and air quality. For instance, studies have demonstrated that a
temperature range of 20–25 °C and humidity levels of 85–95% are ideal for the growth
of oyster mushrooms. Moreover, poor air quality due to high levels of carbon dioxide
IoT-Assisted Mushroom Cultivation in Agile Environment 301
and volatile organic compounds can decrease mushroom yields and cause contam-
ination. To ensure optimal mushroom growth and quality, it is crucial to carefully
manage and control these factors. Nonetheless, more research is needed to develop
improved strategies for optimizing the indoor environment of mushroom factories
[5].
Balan et al. discuss the challenges and opportunities of producing high-quality
edible mushrooms from lignocellulosic biomass in small-scale settings. The authors
highlight the potential of using agricultural and forestry waste as a substrate for
mushroom cultivation, which can offer both economic and environmental benefits.
However, the utilization of these materials can be hindered by their variable and
complex nature, which requires careful selection, processing, and supplementation
with nutrients. The authors also address the challenges related to the management
of fungal contamination, temperature, humidity, and other environmental factors
during mushroom cultivation. Strategies for optimizing these factors, such as the
use of automated systems and advanced monitoring techniques, are also discussed.
Overall, the paper provides valuable insights into the opportunities and challenges of
small-scale mushroom production from lignocellulosic biomass and highlights the
need for further research in this area [3].
Singh and Sushma focus on the application of Internet of Things (IoT) technology
in mushroom cultivation. The authors discuss how IoT can help in automating and
optimizing various aspects of mushroom cultivation, such as temperature, humidity,
lighting, and air quality. The paper also highlights the benefits of using IoT in mush-
room cultivation, such as reduced labor and energy costs, increased yield and quality,
and improved traceability and transparency in the supply chain. The authors discuss
the various sensors and devices that can be used in IoT-based mushroom cultivation
systems and provide examples of existing IoT solutions for mushroom cultivation.
Overall, the paper provides valuable insights into the potential of IoT in enhancing
the efficiency and productivity of mushroom cultivation and highlights the need for
further research in this area [7].
Rahman et al. discuss the development of an IoT-enabled mushroom farm automa-
tion system with machine learning to classify toxic mushrooms in Bangladesh. The
authors highlight the significance of mushroom cultivation in Bangladesh, but also
the potential dangers posed by the presence of toxic mushroom species. To address
this issue, the authors propose an automated system that can detect and classify
toxic mushrooms in real time using machine learning algorithms. The study also
discusses the potential benefits of using IoT technologies in mushroom cultivation,
such as real-time monitoring and control of environmental factors like temperature,
humidity, and CO2 levels. Overall, the paper provides a valuable contribution to the
growing field of IoT and machine learning in agriculture, particularly in the context
of mushroom cultivation in Bangladesh [6].
Islam et al. present an automated monitoring and environmental control system
for laboratory-scale cultivation of oyster mushrooms using the Internet of Agricul-
tural Things (IoAT). The study highlights the importance of environmental control in
mushroom cultivation and the potential benefits of implementing an automated sys-
tem. The authors review existing literature on mushroom cultivation and automation
systems and emphasize the need for a cost-effective and efficient approach to mush-
room cultivation. The paper describes the design and implementation of the proposed
system, which involves monitoring and controlling the temperature, humidity, light,
and carbon dioxide levels using sensors and actuators connected to an IoT platform.
The authors also evaluate the performance of the system in terms of mushroom
growth and compare it with manual cultivation. The study concludes that the auto-
mated system significantly improves the growth rate and yield of oyster mushrooms
compared to manual cultivation, indicating the potential benefits of IoAT-based sys-
tems in mushroom cultivation. However, the authors also acknowledge the need for
further research to optimize the system and evaluate its economic feasibility for
commercial-scale production [4].
Subedi et al. present an Internet of Things (IoT)-based monitoring system for
white button mushroom farming. The authors discuss the importance of monitoring
and controlling the environmental conditions for optimal growth and yield of mush-
rooms. They review existing literature on IoT-based systems for mushroom farming
and identify the need for a more cost-effective and user-friendly system. The authors
describe their proposed monitoring system which uses sensors to measure tempera-
ture, humidity, carbon dioxide, and light intensity in real time. The data from these
sensors is transmitted to a central server where it is processed and analyzed. The
authors report the successful implementation of the system and conclude that it has
the potential to improve mushroom farming practices and increase yields [9].
Considering the above related work, the proposed approach offers a more cost-effective
and environment-friendly mushroom cultivation design. It also supports cloud-based
data storage and computation to take precautionary measures in case of quality
defects. The study will present an IoT-based system in which a better yield of
mushrooms can be produced at a meager cost, subsequently helping to achieve
mushroom production in regions where it was previously restricted by demographic
features.
3 Proposed System
This section provides the overall working and basic principle of the proposed model.
The methodology of the proposed solution is presented as a flowchart in Fig. 1.
The flowchart describes creating a grow tent for cultivating mushrooms and moni-
toring their growth using various sensors, a Raspberry Pi, and an Asus Tinker Board.
– The process starts with creating the grow tent and installing the humidifier to
provide the ideal environment for mushroom growth. Once the tent is set up,
supports are placed inside the tent where the mushroom saplings will be.
– Next, various sensors are installed inside the grow tent to monitor the temperature,
humidity, and other environmental conditions necessary for the growth of the
mushroom. The fans are also attached to the tent for proper ventilation.
– The sensors are connected to the Raspberry Pi and Asus Tinker Board through an
application that allows real-time monitoring of the conditions inside the tent. A
Pi Camera is also connected to the system to record the growth of the mushroom
over time.
– Once the mushroom is fully grown, the sensors stop collecting data, and the process
is stopped. The application is used to monitor the changes in temperature and
humidity and allows the user to set a threshold for these parameters.
– All the data acquired from the sensors is stored in the application and can be
accessed by the user. This allows the user to monitor the growth of the mushroom
and make adjustments to the environment if necessary.
4 Methodology
– Define project scope and objectives: The first step is to define the goals and objec-
tives of the IoT system. This involves identifying the specific environmental param-
eters to be monitored and controlled, such as temperature and humidity, and deter-
mining the desired outcomes of the cultivation process, such as improved yield,
quality, and efficiency. Defining the project scope and objectives will provide a
clear direction for the implementation process.
– Design IoT system: The next step is to design the IoT system. This involves
developing a system architecture that outlines the sensors, actuators, and other
components required to monitor and control the cultivation environment. Choosing
appropriate IoT platforms and technologies to collect and analyze data is essential.
The design phase should result in a detailed plan for how the system will operate
and be integrated with existing cultivation practices.
– Install sensors and other equipment: Once the IoT system is designed, the next
step is to install the required sensors and equipment in the cultivation environment.
This may involve placing temperature and humidity sensors throughout the grow-
ing area and adding equipment such as fans or heaters to control the environment.
Ensuring that the sensors and equipment are positioned correctly to capture accurate
data is essential. Figure 2 demonstrates the IoT system architecture consisting of
these sensor components.
– Configure IoT platform: After the sensors and equipment are installed, the IoT
platform must be configured to receive data from the sensors and control the
actuators. This may involve setting up a cloud-based IoT platform to collect and
analyze data and configuring the software to control equipment such as fans and
heaters. It’s important to ensure that the system is scalable and can handle a large
amount of data.
– Test and validate system: Once the IoT platform is configured, it’s important to
conduct tests to ensure the system functions as expected. This may involve running
tests to validate data accuracy and system reliability. The testing phase should also
include validating that the system meets the defined project scope and objectives.
– Monitor and optimize the system: After the system is operational, it’s important
to continuously monitor the system to identify and resolve any issues that arise.
This may involve using data analysis tools to identify areas where improvements
can be made and optimizing the system to improve yield, quality, and efficiency.
– Integrate with agile practices: Finally, the implementation of IoT-assisted mush-
room cultivation should be integrated with agile practices, such as iterative devel-
opment and continuous improvement. This involves adapting and improving the
system over time to ensure that it remains responsive to changing conditions and
that the desired outcomes are achieved.
– Improved crop yields: With IoT devices and sensors, farmers can monitor and
control the cultivation environment in real time, ensuring that the temperature,
humidity, CO2 levels, and light intensity are optimized for mushroom growth.
This can lead to improved yields and a more consistent supply of high-quality
mushrooms.
Fig. 3 Grow tent with support structures to place the mushroom bags and sensors
– Consistent quality: With better control over the cultivation environment, farm-
ers can produce mushrooms with consistent quality and characteristics, making
them more marketable. This can lead to increased revenue and improved customer
satisfaction.
– Reduced labor costs: IoT devices and sensors can automate many tasks involved
in mushroom cultivation, such as adjusting the temperature and humidity levels,
reducing the need for manual labor. This can lead to significant cost savings for
farmers, especially in large-scale operations.
– Reduced environmental impact: By optimizing the growing conditions, farmers
can reduce the use of resources such as water and energy and minimize waste and
pollution. This can lead to a more sustainable cultivation process and a reduced
environmental footprint.
– Better decision-making: With access to real-time data and analytics, farmers can
make informed decisions and take timely actions to optimize the cultivation pro-
cess. For example, if the sensors detect a drop in humidity levels, the system
can automatically activate the humidifier to maintain the optimal conditions for
mushroom growth.
– Enhanced monitoring and control: IoT-assisted mushroom cultivation in an agile
environment enables farmers to monitor and control the cultivation process
remotely using a mobile application or web interface. This provides greater flexi-
bility and convenience and enables farmers to respond quickly to any changes in
the cultivation environment.
– Improved traceability and accountability: By collecting and storing data in the
cloud, IoT-assisted mushroom cultivation in an agile environment enables farmers
to track the cultivation process from start to finish, providing greater traceability
and accountability. This can be important for regulatory compliance and quality
control.
Figure 4 shows the cultivated mushroom using the proposed IoT system. Addi-
tionally, the captured data of the system is periodically updated on the cloud which is
summarized in Fig. 5. Overall, the benefits of IoT-assisted mushroom cultivation in
an agile environment can include improved crop yields, consistent quality, reduced
labor costs, reduced environmental impact, better decision-making, enhanced mon-
itoring and control, and improved traceability and accountability. By leveraging the
power of IoT devices and cloud-based platforms, farmers can optimize the cultivation
process and achieve better results.
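A minimal sketch of the periodic cloud update mentioned above, assuming a hypothetical device ID and JSON payload layout (the paper does not specify the actual cloud schema or endpoint):

```python
import json
import time

def build_telemetry(device_id, readings, ts=None):
    """Package one sensor sample as a JSON record for a cloud dashboard.
    Field names are illustrative assumptions, not the system's real schema."""
    return json.dumps({
        "device": device_id,
        "timestamp": ts if ts is not None else int(time.time()),
        "readings": readings,
    }, sort_keys=True)

# Fixed timestamp keeps the example deterministic.
payload = build_telemetry(
    "tent-01", {"temperature_c": 24.1, "humidity_pct": 90.2}, ts=1700000000
)
print(payload)
```

The resulting string would typically be POSTed to the cloud platform on a timer; traceability then comes from the stored timestamped records.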
6 Conclusions
The proposed IoT system monitors and controls the environmental parameters critical to mushroom growth in real time, improving the cultivation process's yields, quality, and sustainability.
The benefits of IoT-assisted mushroom cultivation in an agile environment include
improved crop yields, consistent quality, reduced labor costs, reduced environmen-
tal impact, better decision-making, enhanced monitoring and control, and improved
traceability and accountability. These benefits can significantly impact the profitabil-
ity and sustainability of mushroom cultivation and the overall quality and availability
of mushrooms in the market.
Acknowledgements This work is part of the B.Tech. major project. All the images are the outcome
of the proposed system. The author would like to acknowledge the support from Pandit Deendayal
Energy University, Gandhinagar, India.
Abstract Cells in the brain that grow too quickly and uncontrollably can lead to a brain tumor. Early detection and treatment are crucial. However, accurately segmenting and categorizing tumors remains a daunting challenge despite numerous research efforts. CNNs, a popular deep learning (DL) model, can nevertheless be constrained by large input variances or domain shifts. Pre-trained, transfer-learning-based models are becoming more and more common as a solution to this issue. This survey aims to present a thorough overview of the shift from state-of-the-art CNN-based models to transfer-learning-based techniques for automating brain tumor detection. It discusses the reasons for this shift along with the anatomy of brain tumors, publicly accessible datasets, segmentation, feature extraction, classification, and cutting-edge approaches for studying brain cancers, including machine learning, deep learning, and transfer learning. Relevant issues and typical difficulties are also highlighted.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 309
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_27
310 M. Jain et al.
in another area of the body and extending to the brain). The kind, size, and location of
the tumor all influence the available treatments. Symptom relief or curative objectives
may be the focus of treatment. Many of the 120 different forms of brain tumor are
treatable [1].
Gliomas and other tumors can be described as aggressive or slow-growing. A malignant (aggressive) tumor spreads from one place to another, whereas a benign (slow-growing) tumor does not penetrate the surrounding tissues. Brain tumors are categorized into grades I through IV by the WHO. Tumors of grades I and II are considered slow-growing, while those of higher grades (III–IV) have a worse prognosis [2].
The standard diagnostic technique relies on doctors visually analyzing an MRI scan, which lengthens diagnosis because the images are very complex. Digital image processing, however, enables quick and accurate tumor detection and can be of great use to doctors, allowing them to work faster and diagnose the presence of a tumor more easily [3].
To visually assess the patient, radiologists use MRI and CT scans. They examine
the structure of the brain, and MRI images show the tumor size and location [4].
The following imaging methods are available for recognizing and categorizing brain tumors. Magnetic resonance imaging (MRI) is the most important method for detecting brain tumors: it can demonstrate the location, size, and form of a malignancy and provides comprehensive images of the brain.
Computed tomography (CT) scans can also show the brain in great detail. They frequently work in conjunction with MRI to give a more thorough picture of the brain and to aid in diagnosis.
A PET scan is an imaging test that shows how the body's cells are functioning by using a small quantity of radioactive material, which can also help with detection.
As the number of patients has multiplied, manually interpreting these images has become time-consuming, confusing, and less precise. A computer-aided diagnostic technique that lowers the cost of brain MRI identification must be devised to ease this restriction. A number of attempts have been made to create an effective and trustworthy method for automatically classifying brain tumors [5].
Both the voxel and the pixel are units of measurement used in image processing.
The smallest component of a digital image is referred to as a pixel (short for
“picture element”). It is a solitary point in the picture that carries details about
its color, brightness, and position. When detecting brain tumor, medical imaging
techniques like X-rays, CT scans, and MRI scans employ pixels to represent the
various grayscale levels.
In three-dimensional (3D) space, a voxel, which is short for “volume element,” is
the equivalent of a pixel. It represents a single point in the 3D picture and contains
information about its location in space as well as its attributes such as intensity and
A Comprehensive Survey of Machine Learning Techniques for Brain … 311
texture. Voxels are utilized in the identification of brain tumor to depict the tissue
density in 3D space and to offer a more precise estimation of the tumor size and
location.
As illustrated in Fig. 1, voxel and pixel are both significant units of measurement
in the processing of pictures for brain tumor identification. Voxel is used to represent
3D images, while pixels are used to represent 2D images in medical imaging [6].
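The pixel/voxel distinction can be made concrete with a toy NumPy example: a binary 3D mask of "tumor" voxels, combined with an assumed (purely illustrative) voxel spacing, yields a physical volume estimate:

```python
import numpy as np

# A 2D slice is a grid of pixels; stacking slices gives a 3D grid of voxels.
rng = np.random.default_rng(0)
volume = (rng.random((4, 8, 8)) > 0.9).astype(np.uint8)  # toy binary tumor mask

# Assumed physical size per voxel (mm): slice thickness x in-plane spacing.
spacing_mm = (5.0, 1.0, 1.0)
voxel_volume_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]

# Volume estimate = number of tumor voxels x volume of one voxel.
tumor_voxels = int(volume.sum())
tumor_volume_mm3 = tumor_voxels * voxel_volume_mm3
print(tumor_voxels, tumor_volume_mm3)
```

The same count on a single 2D slice would give only a cross-sectional area, which is why voxels give the more precise size estimate mentioned above.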
The articles included in this review were released between 2017 and 2023; a few studies released prior to 2017 are also explored. The main emphasis is on articles that developed classification and segmentation methods for brain tumors using ML, CNNs, and pre-trained transfer learning models such as ResNet and VGG16. To discover pertinent papers, searches were made in the scientific literature databases IEEE, PubMed, Google Scholar, Elsevier, and ScienceDirect. For journal papers, the Multidisciplinary Digital Publishing Institute (MDPI) web database was also explored. Our searches included the words "brain tumor," "segmentation," "classification," "DL," and "transfer learning." Additionally, a collection of phrases related to DL brain tumor segmentation and classification, such as traditional machine learning, convolutional neural networks, capsule networks, and transformers, was combined with the aforementioned search terms.
4 Dataset
There are a number of datasets containing both non-tumor and tumor brain images that are readily accessible to the public and researchers. This section presents some noteworthy and complex datasets that are widely used.
1. BraTS (The Brain Tumor Segmentation)—In all of its research, BraTS has
emphasized the assessment of cutting-edge techniques for the segmentation
of brain tumor in multimodal magnetic resonance imaging (MRI) data. Since
its establishment, BraTS has concentrated on serving as a standard bench-
marking environment for brain glioma segmentation algorithms, employing care-
fully curated multi-institutional multi-parametric magnetic resonance imaging
(mp-MRI) data [7].
2. The Medical Segmentation Decathlon (MSD) dataset—This dataset contains MRI scans of brain tumors and is part of the Medical Segmentation Decathlon challenge. Its brain tumor task comprises 750 4D mp-MRI volumes (484 for training and 266 for testing) [8].
3. TCIA dataset—Publicly available medical imaging database of cancer cases.
It was developed by the National Cancer Institute (NCI) and maintained by the
Cancer Imaging Program at the Department of Radiology and Imaging Sciences
of the National Institutes of Health Clinical Center. A sizable number of medical
pictures, including X-rays, CT scans, MRIs, and PET scans, are available in the
database. These photos come from a variety of places, including clinical trials,
academic institutions, and lone researchers. A computer-aided diagnostic and
treatment planning system is being developed, and this database is intended to
facilitate that process.
The exact number of images will depend on the specific collection within the
TCIA that you are accessing [9].
4. Harvard Medical School website [10]—Datasets are also available on online
repositories such as Kaggle, GitHub, and Figshare, with additional resources and
information for research purposes.
Like other machine learning models, these image processing models follow the approach depicted in Fig. 2. As with any other model, the pipeline begins with data collection followed by pre-processing. The main techniques used in the pipeline are
1. Image pre-processing—The images are pre-processed before being fed into the proposed framework. Image pre-processing is a crucial stage in the image processing pipeline and helps to increase the precision and effectiveness of the algorithms [11].
The most essential steps are the removal of non-brain tissue (commonly referred to as brain extraction), image registration (in the case of multimodal image processing), and MRI bias field correction. Additionally, pre-processing helps shorten model training and quicken model inference: if the images are very large, reducing the input image size will substantially reduce training time without affecting model performance.
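A minimal sketch of the resizing and intensity-normalization steps mentioned above, using plain NumPy (brain extraction, registration, and bias-field correction are omitted; the pooling factor is an illustrative choice):

```python
import numpy as np

def preprocess(img, factor=2):
    """Downsample a 2D image by average pooling, then z-score normalize it."""
    h, w = img.shape
    h2, w2 = h - h % factor, w - w % factor          # crop to a multiple of factor
    small = (
        img[:h2, :w2]
        .reshape(h2 // factor, factor, w2 // factor, factor)
        .mean(axis=(1, 3))                            # average each factor x factor tile
    )
    return (small - small.mean()) / (small.std() + 1e-8)  # zero mean, unit variance

img = np.arange(64, dtype=np.float64).reshape(8, 8)   # toy "slice"
out = preprocess(img)
print(out.shape)  # (4, 4)
```

Real pipelines would use dedicated tools for the MRI-specific steps; this only illustrates why smaller, normalized inputs train faster without changing the image content much.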
There has been a shift to new approaches for tumor detection, such as transfer-learning-based approaches. Table 2 summarizes transfer learning methods. On a private glioma MRI dataset of 113 LGG and HGG patients, Yang et al. showed that pre-trained CNNs outperform CNNs trained from scratch in classification tasks. The trials demonstrated that transfer learning and fine-tuning enhanced classification
Table 1 Summary of existing classification approaches

References | Year | Authors | Techniques | Dataset | Performance
[15] | 2017 | Cabria and Gondra | Potential field clustering | BraTS | SD = 0.283, average = 0.517, median = 0.644
[16] | 2018 | Seetha and Raja | CNN | BraTS | Accuracy of 97.5%
[17] | 2018 | Mohsen et al. | Classification using DNN, with segmentation using fuzzy C-means, feature extraction using DWT, and PCA for reduction | Harvard Medical School | Accuracy with DNN 96.97%, with KNN 95.45%
[18] | 2019 | Hossain et al. | CNN | BraTS | Accuracy 97.87%
[19] | 2020 | Toğaçar et al. | CNN-based BrainMRNet model | Local data | Accuracy 96.05%
[20] | 2021 | Deepak and Ameer | Hybrid technique using CNN-based features and SVM | Figshare | Accuracy 95.82%
[21] | 2021 | Karayegen and Aksahin | CNN | BraTS | Mean prediction ratio 91.718, accuracy 95.7
[22] | 2022 | Jena et al. | Fuzzy C-means (FCM), K-means, and hybrid image segmentation algorithms with support vector machines (SVMs) and K-nearest neighbors | BraTS 2017 and BraTS 2019 | Maximum accuracy 97% with ensemble methods
performance for HGG and LGG. Using GoogLeNet, they were able to reach their
greatest test accuracy of 90% [24].
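The idea behind these results — fine-tuning only a classifier head on top of frozen pre-trained features — can be illustrated with a toy NumPy sketch. The random "backbone", data, and labels below are synthetic stand-ins, not the networks (e.g. GoogLeNet) or datasets used in the cited studies:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a frozen pre-trained backbone: a fixed random projection + ReLU.
W_backbone = rng.normal(size=(64, 16)) / 8.0   # frozen, never updated below

def extract_features(x):
    return np.maximum(x @ W_backbone, 0.0)

# Synthetic binary task standing in for tumor vs. no-tumor classification.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)
F = extract_features(X)                        # features computed once, frozen

# Trainable "head": logistic regression on the frozen features.
w, b = np.zeros(16), 0.0

def loss_and_grads(w, b):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    g = (p - y) / len(y)
    return loss, F.T @ g, g.sum()

first_loss, _, _ = loss_and_grads(w, b)
for _ in range(300):                           # fine-tune only the head
    loss, gw, gb = loss_and_grads(w, b)
    w -= 0.5 * gw
    b -= 0.5 * gb
print(first_loss > loss)  # True: the head learns while the backbone stays fixed
```

In real transfer learning the backbone carries features learned on ImageNet-scale data, which is what makes the frozen representation useful for small medical datasets.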
7 Research Findings
[Chart: number of times each technique (CNN, DNN, ensemble techniques such as SVM, KNN, BDT, and RF, pre-trained models, and others) was used in the reviewed studies]
8 Conclusion
The varying appearance, size, form, and structure of brain tumors still make reliable tumor identification exceedingly difficult, although techniques for segmenting tumors in MR images have demonstrated great promise for analysis and tumor detection. Numerous improvements are still needed for the tumor area to be effectively segmented and classified. The categorization of images and the identification of tumor-area substructures are both problems and limitations of existing work.
Overall, this survey covers all pertinent issues and the most recent research, along with its drawbacks and difficulties. It will help researchers learn how to conduct new research quickly and appropriately.
Although deep learning techniques have made a substantial contribution, a general methodology is still needed. These methods offer superior results when tested after training under similar acquisition conditions (intensity range and resolution). Nevertheless, even a small difference between the training and test images significantly affects the resilience of the system.
References
1. Anusree P, Reshma R, Saritha K, Ani S (2020) MRI brain image segmentation using machine
learning techniques, vol 6, no 08
2. Amin J, Sharif M, Haldorai A, Yasmin M, Nayak RS (2021) Brain tumor detection and
classification using machine learning: a comprehensive survey. Complex Intell Syst 1–23
3. Kumar S, Dhir R, Chaurasia N (2021) Brain tumor detection analysis using CNN: a review. In:
2021 international conference on artificial intelligence and smart systems (ICAIS), Mar 2021.
IEEE, pp 1061–1067
4. Angulakshmi M, Lakshmi Priya GG (2017) Automated brain tumor segmentation techniques—
a review. Int J Imaging Syst Technol 27(1):66–77
5. Rahman T, Islam MS (2023) MRI brain tumor detection and classification using parallel deep
convolutional neural networks. Meas Sens 26:100694
6. Despotović I, Goossens B, Philips W (2015) MRI segmentation of the human brain: challenges,
methods, and applications. Comput Math Methods Med 2015
7. Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J et al (2014) The
multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging
34(10):1993–2024
8. Antonelli M, Reinke A, Bakas S, Farahani K, Landman BA, Litjens G et al (2021) The medical segmentation decathlon. arXiv preprint arXiv:2106.05735
9. Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P et al (2013) The cancer imaging
archive (TCIA): maintaining and operating a public information repository. J Digit Imaging
26:1045–1057
10. https://spl.harvard.edu/software-and-data-sets
11. Mehrotra R, Ansari MA, Agrawal R, Anand RS (2020) A transfer learning approach for AI-
based classification of brain tumor. Mach Learn Appl 2:100003
12. Rai HM, Chatterjee K (2020) Detection of brain abnormality by a novel Lu-Net deep neural
CNN model from MR images. Mach Learn Appl 2:100004
13. Ezhilarasi R, Varalakshmi P (2018) Tumor detection in the brain using faster R-CNN. In:
2018 2nd international conference on I-SMAC (IoT in social, mobile, analytics and cloud)
(I-SMAC), Aug 2018. IEEE, pp 388–392
14. Bahadure NB, Ray AK, Thethi HP (2017) Feature extraction and selection with optimization
technique for brain tumor detection from MR images. In: 2017 international conference on
computational intelligence in data science (ICCIDS), June 2017. IEEE, pp 1–7
15. Cabria I, Gondra I (2017) MRI segmentation fusion for brain tumor detection. Inf Fusion
36:1–9
16. Seetha J, Raja SS (2018) Brain tumor classification using convolutional neural networks.
Biomed Pharmacol J 11(3):1457
17. Mohsen H, El-Dahshan ESA, El-Horbaty ESM, Salem ABM (2018) Classification using deep
learning neural networks for brain tumor. Future Comput Inform J 3(1):68–71
18. Hossain T, Shishir FS, Ashraf M, Al Nasim MA, Shah FM (2019) Brain tumor detection using
convolutional neural network. In: 2019 1st international conference on advances in science,
engineering and robotics technology (ICASERT), May 2019. IEEE, pp 1–6
19. Toğaçar M, Ergen B, Cömert Z (2020) BrainMRNet: brain tumor detection using magnetic
resonance images with a novel convolutional neural network model. Med Hypotheses
134:109531
20. Deepak S, Ameer PM (2021) Automated categorization of brain tumor from MRI using CNN
features and SVM. J Ambient Intell Humaniz Comput 12:8357–8369
21. Karayegen G, Aksahin MF (2021) Brain tumor prediction on MR images with semantic segmen-
tation by using deep learning network and 3D imaging of tumor region. Biomed Signal Process
Control 66:102458
22. Jena B, Nayak GK, Saxena S (2022) An empirical study of different machine learning tech-
niques for brain tumor classification and subsequent segmentation using hybrid texture feature.
Mach Vis Appl 33(1):6
23. Corral H, Melchor J, Sotelo B, Vera J (2022) Detection of brain tumor using machine learning
algorithms. arXiv preprint arXiv:2201.04703
24. Yang Y, Yan LF, Zhang X, Han Y, Nan HY, Hu YC et al (2018) Glioma grading on conventional
MR images: a deep learning study with transfer learning. Front Neurosci 12:804
25. Banerjee S, Mitra S, Masulli F, Rovetta S (2019) Deep radiomics for brain tumor detection and
classification from multi-sequence MRI. arXiv preprint arXiv:1903.09240
26. Swati ZNK, Zhao Q, Kabir M, Ali F, Ali Z, Ahmed S, Lu J (2019) Brain tumor classification
for MR images using transfer learning and fine-tuning. Comput Med Imaging Graph 75:34–46
27. Sadad T, Rehman A, Munir A, Saba T, Tariq U, Ayesha N, Abbasi R (2021) Brain tumor
detection and multi-classification using advanced deep learning techniques. Microsc Res Tech
84(6):1296–1308
28. Kaur T, Gandhi TK (2020) Deep convolutional neural networks with transfer learning for
automated brain image classification. Mach Vis Appl 31(3):20
29. Pravitasari AA, Iriawan N, Almuhayar M, Azmi T, Irhamah I, Fithriasari K et al (2020) UNet-
VGG16 with transfer learning for MRI-based brain tumor segmentation. TELKOMNIKA
(Telecommun Comput Electron Control) 18(3):1310–1318
30. Noreen N, Palaniappan S, Qayyum A, Ahmad I, Imran M, Shoaib M (2020) A deep learning
model based on concatenation approach for the diagnosis of brain tumor. IEEE Access 8:55135–
55144
31. Hao R, Namdar K, Liu L, Khalvati F (2021) A transfer learning–based active learning
framework for brain tumor classification. Front Artif Intell 4:635766
32. Arbane M, Benlamri R, Brik Y, Djerioui M (2021) Transfer learning for automatic brain tumor
classification using MRI images. In: 2020 2nd international workshop on human-centric smart
environments for health and well-being (IHSH), Feb 2021. IEEE, pp 210–214
33. Zulfiqar F, Bajwa UI, Mehmood Y (2023) Multi-class classification of brain tumor types from
MR images using EfficientNets
Current Advances in Locality-Based
and Feature-Based Transformers:
A Review
Abstract The rise of deep learning and the transformative impact of transformers, particularly in the NLP and CV domains, have been remarkable. While transformers have shown promise in medical imaging tasks, the original architecture required enhancements to capture local information effectively. Recent research has focused on developing locality-based and feature-based transformers to better process spatial information and improve feature representation of input data. This survey (1) covers the background of transformers, (2) provides a theoretical review of the transformer and vision transformer, (3) offers insights into different spatial- and feature-based vision transformers, and (4) addresses common challenges and explores potential research directions, underscoring the current research challenges in further improving their performance.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 321
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_28
322 A. Srivastava et al.
2 Background
Transformers were initially created to address the limitations of RNNs and CNNs
in capturing long-term dependencies in sequential data, particularly in the context
of natural language processing. Their groundbreaking idea involves incorporating
attention mechanisms that enable the model to focus on specific sections of the input
sequence, facilitating more effective processing.
The attention mechanism proposed in [17] involves the use of learnable parameters W^q, W^k, and W^v to convert the input X ∈ R^(n×c) into a query matrix Q, a key matrix K, and a value matrix V, all of shape n × d:

Q = X × W^q, W^q ∈ R^(c×d),
K = X × W^k, W^k ∈ R^(c×d),    (1)
V = X × W^v, W^v ∈ R^(c×d).
The equation system (1) shows the calculation of query Q, key K, and value V
matrices by multiplying input X with weight matrices W q , W k , and W v , respectively.
A(Q, K) = Softmax(Q × K^T / √d),    (2)
MSA computes multiple attention weight matrices by splitting the queries, keys,
and values into h “heads” or parts, each calculated using a separate set of learned
projection matrices.
A learned weight matrix multiplies the concatenated attention weight matrices to
derive the final output of the MSA operation. The ability of transformer models to
capture intricate patterns and relationships in data has resulted in better performance
in NLP and CV tasks.
Z_i = SA(X × W_i^q, X × W_i^k, X × W_i^v),
MSA(Q, K, V) = Concat[Z_1, . . . , Z_h] × W^o,    (4)
The MSA operation thus splits the queries (Q), keys (K), and values (V) into multiple sub-spaces, allowing for the calculation of similarities between context features.
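Equations (1), (2), and (4) can be implemented directly in NumPy; the dimensions below are arbitrary illustrative choices:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Eq. (2): A(Q, K) = Softmax(Q K^T / sqrt(d)); the output weights V by A.
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))
    return A @ V, A

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, h):
    # Eqs. (1) and (4): project X, split into h heads, attend, concat, project.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # each of shape n x d
    d_head = Q.shape[1] // h
    heads = []
    for i in range(h):
        s = slice(i * d_head, (i + 1) * d_head)    # one head's sub-space
        Z_i, _ = attention(Q[:, s], K[:, s], V[:, s])
        heads.append(Z_i)
    return np.concatenate(heads, axis=1) @ Wo      # n x d output

rng = np.random.default_rng(0)
n, c, d, h = 5, 8, 8, 2
X = rng.normal(size=(n, c))
Wq, Wk, Wv = (rng.normal(size=(c, d)) for _ in range(3))
Wo = rng.normal(size=(d, d))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, h)
print(out.shape)  # (5, 8)
```

Each row of the attention matrix A is a probability distribution over the n input positions, which is exactly what lets the model weigh "which tokens matter" per query.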
3 Transformer
In 2017, Vaswani et al. introduced the transformer architecture [4] as a neural network
architecture primarily used for NLP tasks like text generation, sentiment analysis,
and language translation. Instead of recurrent or convolutional layers, the transformer
model uses self-attention in a sequence-to-sequence model to process sequential input
data. The self-attention mechanism in transformers assigns importance weights to
each token or word in a sequence, depending on its relationship with other tokens or
words in the sequence. Consequently, the model can capture long-range dependencies
and contextual information without requiring recurrent connections, as in traditional
recurrent neural networks.
Each layer of the transformer architecture’s encoder and decoder components
contains two sub-layers: a position-wise feed-forward network and a multi-head
self-attention mechanism. These sub-layers are identical in each layer of the stack.
The position-wise feed-forward network takes the MSA output and applies a two-layer feed-forward neural network with a ReLU activation function to each position in the sequence independently, hence the name "position-wise."
PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d)),

where PE(pos, 2i) and PE(pos, 2i+1) are the 2i-th and (2i + 1)-th entries of the positional encoding vector for position pos, d is the hidden dimension of the model, and i ranges from 0 to [d/2] − 1.
The transformer model can differentiate between positions in the sequence and
include the element’s order in its computations by including positional encoding in
the input embeddings. This capability enables the model to acquire more intricate
and significant representations of the input sequence.
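The sinusoidal encoding described above can be computed as follows (a sketch assuming an even hidden dimension d):

```python
import numpy as np

def positional_encoding(n_pos, d):
    """Sinusoidal positional encoding: sin on even indices, cos on odd ones.
    Assumes d is even."""
    pos = np.arange(n_pos)[:, None]                  # (n_pos, 1)
    i = np.arange(d // 2)[None, :]                   # (1, d/2)
    angle = pos / (10000.0 ** (2 * i / d))           # one frequency per pair
    pe = np.zeros((n_pos, d))
    pe[:, 0::2] = np.sin(angle)                      # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                      # PE(pos, 2i+1)
    return pe

pe = positional_encoding(10, 16)
print(pe.shape)       # (10, 16)
print(pe[0, 0], pe[0, 1])  # 0.0 1.0 (sin(0) and cos(0))
```

The encoding is simply added to the input embeddings, so no extra parameters are learned for position information.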
The transformer architecture has proven more parallelizable and effective in
handling longer sequences compared to traditional RNN-based models, making it a
powerful tool in various NLP tasks and other domains where capturing long-range
dependencies is crucial.
3.3 Encoder
It processes the input sequences to generate a set of hidden patterns that contain
the relevant sequence information. It consists of multiple layers containing two sub-
layers: the multi-head self-attention (MSA) layer and the position-wise feed-forward
layer. First, the input sequence is embedded into a continuous vector space using an
embedding layer, and positional encoding is added to provide positional information
about the sequence elements.
MSA is applied to the input sequence in each encoder layer’s first sub-layer to
capture dependencies between different elements. In the second sub-layer, a position-
wise feed-forward network applies a nonlinear transformation independently to each
sequence component to capture complex interactions between the sequence elements
and generate higher-level features. The downstream tasks use the final hidden repre-
sentations of the input sequence obtained after passing through all encoder layers.
The transformer encoder is more parallelizable and effective in handling longer
sequences than RNNs.
3.4 Decoder
The decoder in the transformer architecture produces the output sequence by utilizing the hidden representations generated by the encoder. The decoder has multiple identical layers, each containing three sub-layers:
• The masked multi-head self-attention layer
• The multi-head self-attention layer
• The fully connected feed-forward network layer.
By masking the attention weights for future positions, the masked multi-head
self-attention layer ensures that the decoder can only attend to the previously gener-
ated output elements, not future ones. The second sub-layer applies multi-head self-
attention to the hidden representations produced by the encoder. The third sub-layer
is a fully connected feed-forward network that independently involves a nonlinear
transformation of each element in the sequence. A SoftMax layer utilizes the final
hidden representations of the output sequence generated after passing through all
these layers to create the final output. The decoder is more parallelizable and better
suited for longer sequences than traditional RNN-based sequence-to-sequence models.
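The masking step can be illustrated in NumPy: future positions receive −∞ before the softmax, so their attention weights become exactly zero (the uniform scores are an illustrative input):

```python
import numpy as np

def causal_mask(n):
    """Additive mask: 0 on and below the diagonal, -inf above it,
    so position t may attend only to positions <= t."""
    upper = np.triu(np.ones((n, n)), k=1)
    return np.where(upper == 1, -np.inf, 0.0)

def masked_attention_weights(scores):
    """Apply the causal mask to raw attention scores, then softmax per row."""
    s = scores + causal_mask(scores.shape[0])
    e = np.exp(s - s.max(axis=-1, keepdims=True))   # exp(-inf) -> 0
    return e / e.sum(axis=-1, keepdims=True)

W = masked_attention_weights(np.zeros((4, 4)))
print(np.round(W, 3))
# Row t is uniform over positions 0..t and zero over all future positions.
```

This is exactly why the decoder can be trained in parallel on a full target sequence while still behaving autoregressively at inference time.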
ViT is a variant of the transformer architecture adapted for processing images. The
ViT was introduced in a 2020 paper by Dosovitskiy et al. called “An Image Is Worth
16 × 16 Words: Transformers for Image Recognition at Scale” [8].
In this architecture, an image is first divided into a sequence of fixed-size (i.e., 16
× 16, 32 × 32) patches, flattened, and forwarded into a standard transformer encoder.
The patch embeddings are then augmented with a learnable positional encoding that
indicates the spatial location of each image patch. To transform an image X ∈ R^(H×W×C) (where H is the height, W the width, and C the number of channels), X is reshaped into a sequence of flattened 2D patches x_p ∈ R^(N×(P²·C)), where (P × P) is the patch resolution and N = HW/P².
of varying sizes, and it achieves this by introducing a learnable token known as the
“class token.” This token is concatenated to the patch embeddings and acts as a
representation of the whole image. The transformer layers process the token-patch
embeddings and produce a final feature vector. This is then utilized to predict the
class label of the image.
Z ← concat([CLASS], X W ), (7)
where W is the patch projection matrix.
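The patch flattening and class-token concatenation of Eq. (7) can be sketched in NumPy; the projection dimension and random weights are illustrative stand-ins for learned parameters:

```python
import numpy as np

def patchify(X, P):
    """Reshape an H x W x C image into N x (P*P*C) flattened patches,
    with N = HW / P^2."""
    H, W, C = X.shape
    X = X.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    return X.reshape(-1, P * P * C)

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32, 3))
patches = patchify(img, 16)
print(patches.shape)  # (4, 768): N = 32*32/16^2 = 4 patches of 16*16*3 values

# Eq. (7): project the patches and prepend a (here zero-initialized) class token.
W_proj = rng.normal(size=(768, 64))        # stand-in for the learned projection W
cls = np.zeros((1, 64))                    # stand-in for the learnable [CLASS] token
Z = np.concatenate([cls, patches @ W_proj], axis=0)
print(Z.shape)  # (5, 64): one class token + four patch embeddings
```

The transformer layers then operate on Z, and the final state of the class token serves as the whole-image representation for classification.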
The ViT architecture utilizes pre-training on extensive datasets to develop visual
representations that can be adapted for downstream tasks. This pre-training is usually
conducted on vast datasets like ImageNet, using a variation of contrastive learning
known as the "pretext task." In this technique, the model is trained to predict the positions of patches of an image, encouraging it to acquire meaningful visual characteristics that capture the spatial connections between various portions of an image (Fig. 2).
Feature-Based ViT
Researchers are working on enhancing the performance of ViT by generating many
feature maps, including token maps and attention maps. This approach aims to extract
a comprehensive range of features, enabling the model to excel in diverse tasks. By
utilizing distinct feature maps, the model can capture various types of components,
improving its overall performance.
Our focus is primarily on discussing transformers’ spatial-based and feature-based
variants, with only a brief discussion of specific variants related to hierarchical-based
models and applications (Table 1).
These variant architectures contribute to the adaptability and efficiency of the
ViT model in various image processing tasks, including image recognition, object
detection, and segmentation.
Hierarchical-Based ViT
The original ViT architecture has a significant limitation: Its high computational
cost increases as the input size grows. This issue has prompted researchers to
explore different methods for reducing the feature size, which can help decrease
training time and computational complexity. By implementing these approaches, the
computational cost can be reduced while maintaining the model’s performance.
Liu et al. introduced Swin-Transformer [34] as a modification of the ViT archi-
tecture. It employs a hierarchical structure that merges patches after each block. This
reduction in computation time complexity is achieved by computing self-attention
only within each local shifted window. PiT [35], proposed by Heo et al. in 2021, is
another variant of ViT architecture that includes a Pooling Layer for dimension reduc-
tion. The Pooling Layer employs a depth-wise convolutional layer to accomplish the
reduction.
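The patch-merging idea can be sketched as follows; the averaging "reduction" matrix stands in for the learned linear layer in Swin-Transformer and is purely illustrative:

```python
import numpy as np

def merge_patches(tokens, H, W):
    """Swin-style patch merging sketch: concatenate each 2x2 neighborhood of
    tokens (quartering the token count), then linearly map 4C -> 2C channels."""
    C = tokens.shape[1]
    grid = tokens.reshape(H, W, C)
    merged = np.concatenate(
        [grid[0::2, 0::2], grid[1::2, 0::2], grid[0::2, 1::2], grid[1::2, 1::2]],
        axis=-1,
    ).reshape(-1, 4 * C)
    W_reduce = np.ones((4 * C, 2 * C)) / (4 * C)  # stand-in for learned weights
    return merged @ W_reduce

tokens = np.arange(16 * 8, dtype=float).reshape(16, 8)  # 4x4 token grid, C=8
out = merge_patches(tokens, 4, 4)
print(out.shape)  # (4, 16): token count divided by 4, channels doubled
```

Because each stage divides the number of tokens by four, self-attention in later stages runs over far shorter sequences, which is the source of the complexity savings described above.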
While ViT has shown great promise in image classification tasks, there are still
some challenges and discussions surrounding their use. One challenge is large-scale
vision transformers’ high computational cost and memory requirements. Training
and inference of these models can be time-consuming and require specialized hard-
ware, limiting their practical use in some applications. Another challenge is the
need for extensive training data to achieve high performance. While pre-training on
large-scale datasets has improved performance, obtaining such datasets in specific
domains, such as medical imaging, can be challenging.
The application of ViT in medical imaging has gained attention as a promising
research area, primarily due to its remarkable performance in various image
processing tasks. ViT’s ability to learn global image representations has the potential
to improve the interpretation of visual patterns in medical imaging, where a global understanding of the entire image is critical. Despite its potential advantages,
Table 1 (continued)

References | Architecture type | 2D/3D | Parameters | Task | Dataset | Highlight
CeiT [25] | Hybrid | 2D | 6.4M | 7 downstream tasks | ImageNet [19] | CeiT incorporates a depth-wise convolutional layer in its feed-forward network to enhance the extraction of local features. The model employs layer-wise class-token attention, computing self-attention on the class tokens to gather different class representations
LocalViT [26] | Hybrid | 2D | 7.5M | Classification | ImageNet-2012 [19] | The authors propose using convolutional layers within the feed-forward network (FFN) to capture local features in each transformer block. They used various activation functions and architectures within the FFN to enhance the model's performance
CCT [27] | Hybrid | 2D | 3.7M | Classification | CIFAR-10 [28] | CCT utilizes convolutional layers to obtain embeddings and eliminates positional embeddings to enable the model to handle inputs of varying sizes
DeepViT [29] | Pure | 2D | 55M | Classification | ImageNet [19] | The authors analyzed the attention maps of the ViT model and observed that attention collapse occurs in the deeper layers of the model
T2T-ViT [30] | Pure | 2D | 21.5M | Classification | ImageNet [19] | The architecture is based on the observation that many of the feature maps in ViT are meaningless when reshaped from tokens; it replaces fully connected layers with a combination of convolutional and attention layers
Current Advances in Locality-Based and Feature-Based Transformers … 333
RegionViT [31] | Pure | 2D | 35.7M | Classification and segmentation | ImageNet21K [32] | The proposed ViT features a pyramid structure and introduces a new regional-to-local attention mechanism in vision transformers. It generates two types of tokens, regional and local, from an image at different sizes
SpectFormer [33] | Pure | 2D | 9M | Object detection and segmentation | ImageNet-1K [32] | To investigate the fundamental design of transformers, the authors employ a combination of techniques, including spectral and multi-headed attention
challenges must be addressed to fully realize the benefits of using ViT in medical
imaging. Obtaining a sufficient amount of labeled data poses a significant challenge
in the medical domain. Another challenge is the high computational cost of ViT,
which can limit its practical application in medical imaging.
6 Conclusion
provided researchers with diverse options for their needs. However, some challenges
and limitations are still associated with ViTs, such as their interpretability and ability
to handle small or irregularly shaped images. Further investigation is also
necessary to explore the potential of ViTs in diverse medical imaging domains.
Overall, the development of ViTs has opened up new avenues for research and has
the potential to revolutionize the field of medical imaging. As technology evolves, it
will be exciting to see breakthroughs and innovations emerging.
References
1. Arevalo J, Gonzalez FA, Ramos-Pollán R, Oliveira JL, Lopez MAG (2016) Representation
learning for mammography mass lesion classification with convolutional neural networks.
Comput Methods Programs Biomed 127:248–257
2. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
3. Liu Z, Mao H, Wu C-Y, Feichtenhofer C, Darrell T, Xie S (2022) A convnet for the 2020s.
arXiv preprint arXiv:2201.03545
4. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I
(2017) Attention is all you need. In: Advances in neural information processing systems, pp
5998–6008
5. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the
IEEE conference on computer vision and pattern recognition, pp 7794–7803
6. Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805
7. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding
by generative pre-training
8. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X et al (2021) An image is worth
16×16 words: transformers for image recognition at scale. In: ICLR
9. Perera S, Adhikari S, Yilmaz A (2021) POCFormer: a lightweight transformer architecture for
detection of COVID-19 using point of care ultrasound. arXiv preprint arXiv:2105.09913
10. Xie E, Wang W, Wang W, Sun P, Xu H, Liang D, Luo P (2021) Segmenting transparent objects
in the wild with transformer. In: IJCAI
11. Zhang Y, Higashita R, Fu H, Xu Y, Zhang Y, Liu H, Zhang J, Liu J (2021) A multibranch
hybrid transformer network for corneal endothelial cell segmentation. arXiv preprint arXiv:
2106.07557
12. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object
detection with transformers. In: ECCV
13. Mathai TS, Lee S, Elton DC, Shen TC, Peng Y, Lu Z, Summers RM (2021) Lymph node
detection in T2 MRI with transformers. arXiv preprint arXiv:2111.04885
14. Vepakomma P, Gupta O, Swedish T, Raskar R (2018) Split learning for health: distributed deep
learning without sharing raw patient data. arXiv preprint arXiv:1812.00564
15. Fung G, Dundar M, Krishnapuram B, Bharat Rao R (2007) Multiple instance learning for
computer aided diagnosis. In: Advances in neural information processing systems, vol 19, p
425
16. Kwee TC, Kwee RM (2020) Chest CT in COVID-19: what the radiologist needs to know.
RadioGraphics 40(7):1848–1865
17. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align
and translate. arXiv preprint arXiv:1409.0473
18. Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H (2021) Training data-efficient
image transformers & distillation through attention. In: ICML
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 337
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_29
338 K. Pooja and R. Kishore Kanna
1 Introduction
Gastric cancer has one of the highest mortality rates of all cancers. Early gastric
cancer patients hardly ever show any symptoms. As with gastritis and stomach
ulcers, the cancer develops before the most obvious symptoms appear. As a result,
patients may not realize that their stomach cancer has already spread.
There is therefore a need for endoscopic imaging-based early stomach cancer
detection. Early stomach cancer can be difficult to diagnose with imaging, and the
accuracy depends on the gastroenterologist's experience.
To aid image diagnosis, several studies have applied machine learning
approaches to endoscopy, for example for automatic polyp detection [1]. According
to researchers, there are two reasons why an effective method for detecting early
stomach cancer has not yet been developed. First, the early stomach cancer data that
could be exploited in machine learning has not been properly maintained. Second,
many early gastric tumors lack certain morphological characteristics found in more
advanced malignancies.
Image-enhanced endoscopy (IEE), of which narrow-band imaging (NBI) is one
technique, allows for the identification of textural patterns. When magnified, IEE
supports a conclusive diagnosis of gastrointestinal cancer far better than traditional
white-light imaging (WLI) endoscopy [2]. IEE has been shown to be effective
for identifying gastrointestinal cancers in numerous investigations. Esophageal and
pharyngeal malignancies can be found using magnification endoscopy with narrow-
band imaging (NBI) [3]. Colon cancer detection with blue laser imaging (BLI) has
recently been demonstrated to be efficient [4]. However, because of its low brightness,
there are no reports on the efficiency of IEE in identifying stomach cancer. In
automatic gastric cancer screening using WLI endoscopy, it can be difficult to
distinguish the background mucosal alterations associated with gastritis from the
morphological changes associated with cancer.
Liu et al. focus on early stomach cancer detection using algorithms that can
help gastroenterologists make decisions. Before this investigation, the authors acquired
several endoscopic medical imaging datasets of early gastric cancer, the majority
of which were WLI-captured images. Retrospective inspection following endoscopic
submucosal dissection (ESD) revealed that all of the lesions were early-stage
stomach cancer [5].
This motivated the researchers' initial work on a convolutional neural network
(CNN) to automatically detect early-stage stomach cancer [6]. The main
contributions of this work are: (1) the precise automatic diagnosis of early stomach
cancer with weak configuration traits, which is sometimes hard for endoscopists to
diagnose; (2) the development of a convolutional neural network algorithm with a
fixed set of image datasets; (3) the analysis of inaccurate early gastric cancer
diagnoses. By examining the characteristics of inaccurately identified images,
researchers may develop more efficient and effective methods for the early detection
of gastric cancer.
A Systematic Review on Detection of Gastric Cancer in Endoscopic … 339
The main purpose of this systematic review is to analyze and summarize the
current applications of artificial intelligence and neural networks in the identification
of gastric cancer and to evaluate CNN performance. This research can help build
the knowledge base for CNNs by filling the information gap in the detection of
stomach cancer.
2 Documentation Review
The purpose of image classification is to assign each image to one or more
specified classes. The most effective object segmentation and recognition results
are currently produced by recent deep learning-based techniques, largely thanks to
convolutional neural networks. These techniques are faster in analysis and
computation than previous ones. In medical imaging, such
systems are being developed for early cancer diagnosis.
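The building blocks these CNN systems rely on — convolution, a non-linearity, and pooling — can be illustrated with a minimal NumPy sketch. The kernel and image here are toy values, not part of any system reviewed below.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation of a grayscale image with one kernel."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linearity: keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool(x, k=2):
    """Downsample by taking the maximum of each k x k block."""
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
edge = np.array([[1.0, -1.0]])                  # horizontal edge detector
feat = max_pool(relu(conv2d(img, edge)))
print(feat.shape)   # (3, 2)
```

Real systems stack many such layers with learned kernels; the point here is only the shape of the computation: a convolution slides a small filter over the image, the non-linearity gates its responses, and pooling shrinks the feature map.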
Scientific research and applications for classifying, detecting, and segmenting
stomach cancer have been developed, along with annotated datasets [7, 8]. Oukdach
et al. [9] created a network that could extract features from two open datasets, and
the vast majority of studies [10–12] concentrate on the detection of stomach cancer.
Tumor detection and recognition are essential in this field of research because a
tumor is a well-defined object represented by a collection of pixels in an image.
Multiple objects in the same image can be detected using models like YOLO [13].
The authors of [9] identified feature classes using a small-object detector built on a
standard neural network model. Bounding-box annotations can enhance feature
representation, as shown in [14], which is based on a weighted loss with label
smoothing. A new, faster version of R-CNN was also introduced in
[15]. The sensitivity (SE), i.e., the ratio of the number of accurately diagnosed
gastric cancer lesions to the number of actual stomach cancer lesions, is discussed
in another study [16]. In that study, a CNN identified 71 of 77 gastric cancer
lesions after evaluating 2296 test images in 47 s, with a positive predictive value
(PPV) of 30.6% and a sensitivity of 92.2%. It properly recognized 70 of 71 lesions
with a diameter larger than 6 mm; of the 232 lesions it flagged, 161 were not malignant.
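The two metrics quoted above can be reproduced from the reported counts — 71 of 77 true lesions detected, and 161 of the 232 flagged lesions benign — with a few lines of Python:

```python
def sensitivity(tp, fn):
    """Fraction of actual cancer lesions that the system detected."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Positive predictive value: fraction of flagged lesions that were cancerous."""
    return tp / (tp + fp)

# Counts reported in the study cited above.
tp = 71          # lesions correctly detected
fn = 77 - 71     # true lesions missed
fp = 232 - 71    # flagged lesions that were not malignant
print(round(100 * sensitivity(tp, fn), 1))  # 92.2
print(round(100 * ppv(tp, fp), 1))          # 30.6
```

The contrast between the two numbers is typical of screening settings: the system misses few true lesions, but most of its alarms are false positives that a clinician must review.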
Additionally, the study in [17] notes that the researchers of [18] introduced
SegNet, an architecture used to create a real-time polyp identification model. A
system with a positive predictive value of 30.6% and a sensitivity of 92.2% was
described by Hirasawa et al. for automatic stomach cancer detection [19]. It
transforms the bounding-box output space into a set of default boxes with differing
dimensions for every feature class [16]. Sakai et al. used a patch-based classification
strategy to construct a convolutional neural network that accurately diagnosed
stomach cancer in endoscopic images [20].
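The default-box mechanism mentioned above can be sketched as follows. The grid size, scale, and aspect ratios are illustrative values, not those of the cited detector.

```python
import numpy as np

def default_boxes(feature_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate SSD-style default boxes (cx, cy, w, h), normalized to [0, 1].

    One box per aspect ratio is centered on every cell of a square
    feature map; `scale` is the box size relative to the image, and
    sqrt(aspect_ratio) trades width against height at constant area.
    """
    boxes = []
    for i in range(feature_size):
        for j in range(feature_size):
            cx = (j + 0.5) / feature_size
            cy = (i + 0.5) / feature_size
            for ar in aspect_ratios:
                boxes.append([cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)])
    return np.array(boxes)

boxes = default_boxes(feature_size=4, scale=0.3)
print(boxes.shape)   # (48, 4): 4x4 cells x 3 aspect ratios
```

The detector then predicts, for each default box, a class score and an offset, so detection reduces to classification and regression over this fixed set.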
Endoscopic imaging systems are used to detect whole lesions with a lesion-based
CNN method, a type of deep learning algorithm developed in [21]. Endoscopy is a
key component of gastrointestinal (GI) tract evaluation because it helps doctors
examine the GI tract more easily. Moreover, the false-negative rate for gastric cancer
identification with esophagogastroduodenoscopy, a prominent procedure for
diagnosing stomach cancer, is significant, ranging from 4.6 to 25.7%. The Caffe
deep learning framework, among the most common and effective frameworks,
served as the foundation in [22]: with a positive predictive value of 30.6% and a
sensitivity of 92.2%, the CNN successfully detected 71 of the 77 stomach cancer
lesions. It properly identified 70 of 71 tumors larger than 6 mm in diameter; of the
232 lesions flagged, 161 were not malignant. Guitao and Zhenwei recently reported
average results for the classification of stomach cancer; they are exploring ways to
improve the mask R-CNN network structure for medical image detection and to
run their model with additional data augmentation. In previous research, deep
learning has enhanced the prediction and detection of stomach cancer [23, 24].
This review examines the precision, sensitivity, specificity, area under the curve
(AUC), and accuracy (where provided) of the reviewed models to characterize
CNN performance in the identification of stomach cancer.
Whereas detection can localize an object without identifying it precisely,
segmentation can be a big advantage to CAD: it addresses specific objects and
identifies them. Although colonoscopy is currently the focus of artificial intelligence
research, its importance also highlights upper-GI applications. The
most important recognition characteristics were found to be the size of the artifacts
and how much they overlapped other objects [25]. A CNN system developed by Hirasawa
and others uses endoscopic data to automatically diagnose stomach cancer [19]. The
system was developed using a dataset of 13,584 endoscopic images from four
research universities collected over a 12-year period and assessed by a certified
expert. It was based on the Single Shot MultiBox Detector architecture [26]. To
compare and evaluate the algorithm's accuracy, researchers gathered an additional
2296 images from 77 stomach lesions. On screen, the CNN displayed a yellow
rectangular box labeled "early or advanced stomach cancer" [25].
According to the results, the CNN had a sensitivity of 92.2% and evaluated
all of the images in 47 s. Differentiated intra-mucosal malignancies in some lesions
were missed, and false positives were caused by gastritis and architectural variation.
This kind of research, known as supervised learning, is explained in [27].
Frazzoni et al. have proposed integrating ML into clinical endoscopy for precise gastroin-
testinal illness detection [32]. To understand how machine learning influences the
difficulty of gastrointestinal endoscopic diagnosis, a strong technical framework
is necessary, especially for each physician. The stomach is anatomically unique
compared to other gastrointestinal organs such as the colon and oesophagus: it has
potential blind spots and a broader, bent lumen, requiring more complex
observation [31]. As a result, in order to eliminate mistakes, clinicians require
numerous distinct views. Some initial symptoms of EGC may also be
masked by a Helicobacter pylori infection. Endoscopic diagnoses have been noted to
differ because of these factors [32, 33].
Endoscopic mucosal resection (EMR), a method of endoscopically assisted treat-
ment for gastric cancer, carries a minimal risk when lymph node metastasis is
unlikely; it is the main treatment for stomach cancer in Japan, and its acceptance has
increased in the West. It is notable for being minimally invasive [34]. Particularly
for lesions larger than 15 mm, EMR may make determining malignant depth more
difficult and raises the risk of tumor recurrence [33]. Endoscopic submucosal
dissection (ESD) uses a different dissection method developed for the removal of
larger lesions and has frequently proven a successful alternative to standard open or
laparoscopic surgery for EGC [35].
Computer-aided detection using convolutional neural networks was created by
Zhu et al. with the purpose of progressively integrating artificial intelligence into the
practical endoscopic resection method [36]. The extent of tumor invasion can be
accurately predicted by an AI-based detection system, which can reduce the need for
a gastrectomy, but it may not be able to direct endoscopic resection procedures or raise
alarms for serious risks of complications. In reality, at a rate of 3.5%, ESD-related
complications in gastric cancer, including bleeding, peritonitis, and perforation,
continue to be a major difficulty [37].
Additionally, Odagiri et al. showed a linear relationship between higher hospital
volume and a reduced prevalence of ESD-related issues. Given that endoscopic
submucosal dissection and endoscopic mucosal resection (EMR) demand high oper-
ating proficiency, it makes sense to start the development and implementation of
AI-based procedures at high-volume hospitals [38]. This is particularly true for
hospitals using machine learning or deep learning, which require proper
consideration when applied.
Namikawa et al. compiled a clinical guide to endoscopic AI that describes in full the
blind-spot monitoring, clinical detection, and classification uses of AI in stomach
medicine [39]. They also consider the prospect that AI could one day be properly
trained to differentiate between stomach lesions that are non-neoplastic and those that
are cancerous, which would be of special significance. In artificial intelligence appli-
cations, the treatment of gastric cancer is still in its early stages. These include the
interpretation of chemoradiotherapy data such as genomics and mutation analysis,
the prediction of treatment insensitivity, and the diagnosis of gastric cancer by
endoscopic image analysis, which depends heavily on feature extraction [40]. AI is
also applicable to the domain of drug responsiveness, which is characterized by
clinical variability and imbalances. DeepIC50, a one-dimensional convolutional
neural network model reviewed by Joo et al., reliably predicts the responsiveness of
stomach cancer drugs. A comparable outcome was obtained when the technology
was validated using information from actual stomach cancer patients and cell lines
[40]. The most recent recommendations highlight that AI must be validated in
multicenter, statistically controlled studies before routine clinical usage; hence,
prospective and multicenter research will be required in the future.
4 Discussion
Limited access to basic health care and variable diagnostic accuracy present
considerable obstacles to the global health care system. The use of machine learning
algorithms in medicine has risen rapidly in recent years, ranging from more basic
machine learning techniques to more recent deep learning techniques. Endoscopy and
pathological examination both depend on the operator, and the diagnosis is therefore
variable. The use of AI-assisted inspection might help to offer a different perspective
and lessen the burden on operators performing diagnostic tests. The applications of these
algorithms have a significant impact on the growth of the medical and health indus-
tries as well as the reliability of clinical diagnosis. According to our detailed review,
convolutional neural networks for stomach cancer show satisfactory accuracy.
5 Conclusion
In summary, the novelty of this paper is a thorough review of how convolutional
neural networks are currently being used in the diagnosis of stomach cancer. When
trained with a large amount of data, neural networks are effective and powerful
algorithms. It is worthwhile to adapt AI to improve gastric cancer diagnoses, and
efforts are increasing. The resulting information could completely change how we
deal with issues related to gastric cancer. Although integration may be slow, AI can
be used to improve imaging modalities for diagnosis and treatment planning. It has
the potential to become a valuable tool for doctors, but only if it is properly adapted
and trained. The accuracy results of the reviewed research ranged from 77.3 to
98.7%, and all studies demonstrated strong identification performance. CNNs are
believed to have a significant role in assisting scientists and physicians in identifying
diseases more accurately and quickly. The precision of the input data and the machine
learning technique used to train the AI model are the only factors that determine
whether AI will be able to produce clinically useful data. Hence, scientists are
concentrating on developing novel optimization algorithms that can take large
amounts of information and generate more accurate results in a predictable amount
of time.
References
1. Hatami S, Shamsaee DR, Hasan Olyaei M (2020) Detection and classification of gastric
precancerous diseases using deep learning. In: 6th Iranian conference on signal processing
and intelligent systems (ICSPIS), Mashhad, Iran, pp 1–5
2. Zhang Q, Wang F, Chen ZY, Wang Z, Zhi FC, Liu SD, Bai Y (2016) Comparison of the
diagnostic efficacy of white light endoscopy and magnifying endoscopy with narrow band
imaging for early gastric cancer: a meta-analysis. Gastric Cancer 19(2):543–552
3. Sugimoto M, Kawai Y, Morino Y, Hamada M, Iwata E, Niikura R et al (2022) Efficacy of high-
vision transnasal endoscopy using texture and colour enhancement imaging and narrow-band
imaging to evaluate gastritis: a randomized controlled trial. Ann Med 54(1):1004–1013
4. Ang TL, Li JW, Wong YJ, Tan YLJ, Fock KM, Tan MTK et al (2019) A prospective random-
ized study of colonoscopy using blue laser imaging and white light imaging in detection and
differentiation of colonic polyps. Endosc Int Open 7(10):E1207–E1213
5. Liu L, Liu H, Feng Z (2022) A narrative review of postoperative bleeding in patients with
gastric cancer treated with endoscopic submucosal dissection. J Gastrointest Oncol 13(1):413
6. Martin DR, Hanson JA, Gullapalli RR, Schultz FA, Sethi A, Clark DP (2020) A deep learning
convolutional neural network can recognize common patterns of injury in gastric pathology.
Arch Pathol Lab Med 144(3):370–378
7. Shelhamer E, Long J, Darrell T (2016) Fully convolutional networks for semantic segmentation.
IEEE Trans Pattern Anal Mach Intell 39:1
8. Lee S-A, Cho HC, Cho H-C (2021) A novel approach for increased convolutional neural
network performance in gastric-cancer classification using endoscopic images. IEEE Access
9:51847–51854
9. Oukdach Y, Kerkaou Z, El Ansari M, Koutti L, El Ouafdi AF (2022) Gastrointestinal diseases
classification based on deep learning and transfer learning mechanism. In: 2022 9th interna-
tional conference on wireless networks and mobile communications (WINCOM). IEEE, pp
1–6
10. Pang X, Zhao Z, Weng Y (2021) The role and impact of deep learning methods in computer-
aided diagnosis using gastrointestinal endoscopy. Diagnostics 11(4):694
11. Lee JY, Jeong J, Song EM, Ha C, Lee HJ, Koo JE et al (2020) Real-time detection of colon
polyps during colonoscopy using deep learning: systematic validation with four independent
datasets. Sci Rep 10(1):1–9
12. Mushtaq D, Madni TM, Janjua UI, Anwar F, Kakakhail A (2023) An automatic gastric polyp
detection technique using deep learning. Int J Imaging Syst Technol
13. Doniyorjon M, Madinakhon R, Shakhnoza M, Cho YI (2022) An improved method of polyp
detection using custom YOLOv4-tiny. Appl Sci 12(21):10856
14. Zeng Q, Li H, Zhu Y, Feng Z, Shu X, Wu A et al (2022) Development and validation of a
predictive model combining clinical, radiomics, and deep transfer learning features for lymph
node metastasis in early gastric cancer. Front Med 9
15. Shibata T, Teramoto A, Yamada H, Ohmiya N, Saito K, Fujita H (2020) Automated detection
and segmentation of early gastric cancer from endoscopic images using mask R-CNN. Appl
Sci 10(11):3842
16. Yoon HJ, Kim JH (2020) Lesion-based convolutional neural network in diagnosis of early
gastric cancer. Clin Endosc 53(2):127–131
17. Zhang J, Wen T, He T, Wang X, Hao R, Liu J (2022) Human stools classification for gastroin-
testinal health based on an improved ResNet18 model with dual attention mechanism. In:
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp
2096–2103
18. Ma L, Su X, Ma L, Gao X, Sun M (2023) Deep learning for classification and localization of
early gastric cancer in endoscopic images. Biomed Signal Process Control 79:104200
19. Hirasawa T, Aoyama K, Tanimoto T, Ishihara S, Shichijo S, Ozawa T et al (2018) Application
of artificial intelligence using a convolutional neural network for detecting gastric cancer in
endoscopic images. Gastric Cancer 21:653–660
20. Sakai Y, Takemoto S, Hori K, Nishimura M, Ikematsu H, Yano T, Yokota H (2018) Automatic
detection of early gastric cancer in endoscopic images using a transferring convolutional neural
network. In: 2018 40th annual international conference of the IEEE engineering in medicine
and biology society (EMBC). IEEE, pp 4138–4141
21. Kuchkorov TA, Sabitova NQ, Ochilov TD (2022) Detection of gastric ulcers and lesions
applying CNN architecture. Int J Contemp Sci Tech Res 200–204
22. Horiuchi Y, Aoyama K, Tokai Y, Hirasawa T, Yoshimizu S, Ishiyama et al (2022) Convolutional
neural network for differentiating gastric cancer from gastritis using magnified endoscopy with
narrow band imaging. Dig Dis Sci 65(5):1355–1363
23. Tani LFK, Tani MYK, Kadri B (2022) Gas-Net: a deep neural network for gastric tumor
semantic segmentation. AIMS Bioeng 9(3):266–282
24. Cao G, Song W, Zhao Z (2019) Gastric cancer diagnosis with mask R-CNN. In: 2019 11th
international conference on intelligent human-machine systems and cybernetics (IHMSC), vol
1. IEEE, pp 60–63
25. Gholami E, Tabbakh SRK (2021) Increasing the accuracy in the diagnosis of stomach cancer
based on color and lint features of tongue. Biomed Signal Process Control 69:102782
26. Ikenoyama Y, Hirasawa T, Ishioka M, Namikawa K, Yoshimizu S, Horiuchi Y et al (2021)
Detecting early gastric cancer: comparison between the diagnostic ability of convolutional
neural networks and endoscopists. Dig Endosc 33(1):141–150
27. Zhou B, Rao X, Xing H, Ma Y, Wang F, Rong L (2022) A convolutional neural network-based
system for detecting early gastric cancer in white-light endoscopy. Scand J Gastroenterol 1–6
28. Jamil D, Palaniappan S, Lokman A, Naseem M, Zia SS (2022) Diagnosis of gastric cancer
using machine learning techniques in healthcare sector: a survey. Informatica 45(7):2022
29. Sivero L, Volpe S, Gentile M, Sivero S, Iovino S, Gennarelli N et al (2022) Role of narrow band
imaging (NBI), in the treatment of non-polypoid colorectal lesions, with endoscopic mucosal
resection (EMR). Ann Ital Chir 93(2):178–182
30. Li L, Chen Y, Shen Z, Zhang X, Sang J, Ding Y et al (2020) Convolutional neural network
for the diagnosis of early gastric cancer based on magnifying narrow band imaging. Gastric
Cancer 23:126–132
31. Gong L, Wang M, Shu L, He J, Qin B, Xu J (2022) Automatic captioning of early gastric cancer
using magnification endoscopy with narrow-band imaging. Gastrointest Endosc 96(6):929–942
32. Frazzoni L, Arribas J, Antonelli G, Libanio D, Ebigbo A, van der Sommen F et al (2022) Endo-
scopists’ diagnostic accuracy in detecting upper gastrointestinal neoplasia in the framework of
artificial intelligence studies. Endoscopy 54(04):403–411
33. Fujiyoshi MRA, Inoue H, Fujiyoshi Y, Nishikawa Y, Toshimori A, Shimamura Y et al (2022)
Endoscopic classifications of early gastric cancer: a literature review. Cancers 14(1):100
34. Kinami S, Saito H, Takamura H (2022) Significance of lymph node metastasis in the treatment
of gastric cancer and current challenges in determining the extent of metastasis. Front Oncol
11:5628
35. Kanai M, Togo R, Ogawa T, Haseyama M (2019) Gastritis detection from gastric X-ray images
via fine-tuning of patch-based deep convolutional neural network, pp 1371–1375
36. Cho BJ, Bang CS, Park SW, Yang YJ, Seo SI, Lim H, Baik GH (2019) Automated classification
of gastric neoplasms in endoscopic images using a convolutional neural network. Endoscopy
51(12):1121–1129
37. Zhu Y, Wang QC, Xu MD, Zhang Z, Cheng J, Zhong YS (2019) Application of convolutional
neural network in the diagnosis of the invasion depth of gastric cancer based on conventional
endoscopy. Gastrointest Endosc 89(4):806–815
38. Odagiri H, Hatta W, Tsuji Y, Yoshio T, Yabuuchi Y, Kikuchi D, Hoteya S (2022) Bleeding
following endoscopic submucosal dissection for early gastric cancer in surgically altered
stomach. Digestion 103(6):428–437
Abstract Presently, governments all over the world use various technologies to
register and transfer property. However, most of these systems are centralized in
nature and vulnerable in terms of security, so legal and illegal exploits are possible.
Our solution aims to address this problem through blockchain technology,
specifically smart contracts and Non-Fungible Tokens, to decentralize the property
registration process. All transactions made through the system are visible to anyone
across the globe, which reduces corruption through illegal property investments and
makes it virtually impossible for any person or organization to exploit the system.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 347
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_30
348 S. Prathibha et al.
1 Introduction
2 Literature Review
The following literature survey provides detailed and distinct information about prop-
erty registration using a decentralized blockchain application. The findings and the
drawbacks of each architecture have been provided.
Blockchain has several applications, including real estate, as it provides an excel-
lent service in this field. The safety and transparency that blockchain provides are
highly beneficial in real estate transactions. For instance, when buying a house, the
Propchain: Decentralized Property Management System 349
3 Proposed System
This front-end module is used by the end-user to register any new property (resi-
dential/commercial) and create a Non-Fungible Token for that particular property
(Fig. 1).
Steps for registering a property:
Step 1: The user opens the registration page from the homepage.
Step 2: The user then proceeds to fill up all necessary info regarding the property
that is about to be registered.
Step 3: The user will then be redirected to a page where they can make a note of
their Asset ID for their reference.
Step 4: Once the transaction gets approved by an official, a unique NFT
representing their property will be transferred to their Blockchain account.
Steps for transferring a property title:
Step 2: The user then proceeds to fill up all necessary info regarding the property
including the Asset ID that the property is assigned.
Step 3: The user will then be redirected to a page that notifies them about the
completion of the process from their end.
Step 4: Once the transaction gets approved by an official, the unique NFT
representing that property will be transferred to the new owner’s Blockchain
account.
This front-end module is used by the end-user to view all the details about a property
including the validity of the property and the NFT in question, the last registration/
transfer details, the government official who approved the transaction, the last known
price of the property, date of last transaction, details about the current owner, and so
on (Fig. 3).
Steps for verifying a property asset:
Step 1: The user opens the verify page from the homepage.
Step 2: The user then enters the Asset ID provided for that property.
Step 3: The user will then be redirected to a page where they can find more
information regarding the property.
4 Working
5 Implementation
The proposed system utilizes blockchain technology and NFTs to create a user-
friendly platform for property registration, transfer, and verification. Users can
register new properties, and upon approval, a unique NFT is created as proof of owner-
ship. Existing properties can be transferred securely between owners through a title
transfer module. Property verification is made possible by entering the Property ID,
which provides comprehensive details about the property. A smart contract governs
NFT functionalities, preventing duplication and ensuring data integrity. The system
handles token minting and property transfers seamlessly, guaranteeing transparency
and trust in property transactions (Fig. 4).
The smart contract stores property and owner data, mints property tokens, and
facilitates their transfer. The gatherOwnerData() method obtains input from the UI
and stores it as structures in the Blockchain. Similarly, gatherPropertyData() stores
property details. The mint() function creates a Non-Fungible Token and links it to the
registered property. The transferToken() method transfers the token to a new owner,
updating the token’s information.
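The contract code itself is not shown here; as a rough illustration only, the following Python sketch mirrors the described flow (gatherOwnerData, gatherPropertyData, mint, transferToken) with hypothetical field names. The real system would implement this logic as an on-chain smart contract:

```python
class PropertyRegistry:
    """Illustrative model of the described contract flow.

    Method names mirror those in the text; all field names are
    hypothetical, and this is not the chapter's actual contract.
    """

    def __init__(self):
        self.owners = {}      # wallet address -> owner details
        self.properties = {}  # asset id -> property details
        self.tokens = {}      # asset id (token) -> current owner's wallet

    def gather_owner_data(self, wallet, name, id_number):
        # Store owner input from the UI, as the contract stores structs.
        self.owners[wallet] = {"name": name, "id_number": id_number}

    def gather_property_data(self, asset_id, location, ptype, price):
        # Store property details keyed by the Asset ID.
        self.properties[asset_id] = {"location": location,
                                     "type": ptype, "price": price}

    def mint(self, asset_id, wallet):
        # Create the NFT for a registered property once an official
        # approves; refusing to mint twice prevents duplication.
        if asset_id in self.tokens:
            raise ValueError("property already tokenized")
        self.tokens[asset_id] = wallet
        return asset_id

    def transfer_token(self, asset_id, new_wallet):
        # Move the token to the new owner, updating its information.
        if asset_id not in self.tokens:
            raise KeyError("unknown token")
        self.tokens[asset_id] = new_wallet
```

The duplication check in `mint()` corresponds to the text's claim that the contract prevents duplication and ensures data integrity.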
Figures 5 and 6 denote the front-end (user interface) of the proposed solution,
specifically the property registration module and the property transfer module. In
the case of property registration, necessary details regarding the owner such as full
name, ID number (e.g., SSN, Aadhaar), date of birth, owner’s wallet address,
current address, and contact details along with property details such as its location,
type (residential/commercial), form (apartment/single house), current price, and the
name of the previous owner are obtained. During property transfer, similar details
are obtained to facilitate property transfer from the old owner to the new owner.
When the register button is clicked, the data is sent for approval by the appropriate
authority after which the data is stored in the blockchain, and the necessary methods
are triggered.
Fig. 4 Algorithm
6 Conclusion
The proposed system enables secure property registration and safe transfer of ownership, doing away with the need for middlemen and cutting
down on transaction costs. Giving property owners more control and transparency
over their assets promotes accountability and confidence in business dealings. Future
work on this project may include additional research and development in areas for
improved security and privacy, looking into scalability options to handle a greater
volume of property transactions, and performing real-world testing and validation of
the Propchain system.
Criminal Prevision with Weapon
Identification and Forewarning Software
in Military Base
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 359
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_31
360 V. Ceronmani Sharmila et al.
ensuring the safety of personnel and equipment. Any breach of security could have
profound consequences, including theft, damage, or destruction of sensitive
information or equipment, as well as potential threats to national security. As such,
constant vigilance and proactive security measures are necessary to prevent and
detect intrusion in military bases.
There are many intrusion detection methods deployed at military bases and along the country’s borders; some of these systems are:
• Perimeter security: Security is maintained with secure fences, walls, or other
barriers to prevent unauthorized access.
• Motion sensors: Sensors are installed in sensitive areas of military bases to detect
movement.
• Access control systems: Control systems can be deployed to regulate and monitor
access to the military base and its facilities.
• Patrols: Regular patrols by security personnel can help detect any unauthorized
activities and respond quickly to potential threats.
Since security is especially important in the military sector, the detection software must be equally accurate and precise, preventing infiltrators from destroying or collecting confidential data. In this proposed system, we use YOLOv3 for person detection, Dlib for facial recognition, and YOLOv5 for weapon detection. We trained the YOLOv5 algorithm on a dataset of 2000 weapon images to increase precision and accuracy; for identifying the person, we use the pre-trained YOLOv3 model, and for facial recognition, we use Dlib and its face recognition pipeline, which performs face alignment using facial landmarks. The main purpose of our software is to detect an intruder’s infiltration into the base and the weapons carried in hand, in order to analyze the threat level.
2 Related Works
Diverse types of systems with various kinds of solutions have been used to address this issue, and they have been helpful. Several systems for human intrusion detection already exist, built with diverse types of algorithms. In these systems, only the detection of intruders was implemented, though some were highly effective. Only a limited number of systems detect both the intruders and the weapons they carry, and because these systems are not efficient, they have not been considered important. Based on our project idea, we carried out extensive research and reviewed several papers. One of these was [1], which uses millimeter waves (mmWs) to detect weapons placed inside containers; Gonzalez-Sosa and Fierrez, along with two other authors, proposed a method based on mmW imaging. Another system, used for monitoring regions, employs a fast steering mirror (FSM): in [2], Tianqing et al., along with two other authors, designed it to monitor hot targets and hot regions. One of the systems used in the military for target detection was proposed by Kong et al. [3]; here, the YOLOv3 algorithm is combined with a lightweight neural network (GhostNet), which makes it more precise and accurate.
It is used to detect military targets. Another paper [4] describes a system designed so that lighting conditions do not affect it; this was one of the main concerns of Li, Yang, and three other authors while designing the system. Although several detection and tracking mechanisms exist, they use the RetinaFace and CamShift algorithms so that the system is robust to light, and it operates the P steering gear of a PID controller.
The paper [5], proposed by Zhang et al., addresses ground-to-air defense tactics, formulating the problem as dynamic sensor and heterogeneous weapon target assignment. Some works, such as [6], focus on deep learning algorithms: when Bhatti, Khan, and two other authors developed their system with binary classification, the pistol class was taken as the reference class, and they also introduced relevant confusion objects to minimize false positives. Ruiz-Santaquiteria, Velasco-Mata, and four other authors [7] proposed a method based on human pose and weapon appearance; they used body pose, which is especially informative for shotgun postures, making theirs a particularly good approach. In [8], Hashim et al. proposed the use of machine learning for PPA detection of weapons using CNNs, highlighting the accuracy of automatic weapon identification for surveillance purposes. The next work, by Xu et al. [9], concentrates on fully automated detection of prohibited items inside airports; its efficiency and stability were demonstrated on X-ray and GDX-ray image datasets. Rahul Chiranjeevi and Malathi presented their work in [10].
3 Research Method
We are using two versions of the same algorithm in this criminal prevision and weapon identification system, together with Dlib for facial recognition: YOLOv5 for weapon detection and YOLOv3 for person identification. YOLOv3 is a real-time object detection algorithm that was released in April 2018, the third version of the popular You Only Look Once (YOLO) series of object detection algorithms. It uses a deep learning architecture to detect objects in images and videos with high accuracy and speed; in this version, object detection and classification are performed in a single step, it is the most updated version of this CNN-based deep learning model for processing the data, and predefined data are given directly as testing data to obtain accurate output (Fig. 1). We are using YOLOv3 because it is designed to be fast and efficient, and it can process images and videos in real time. It divides the input image into grid cells, and each cell predicts a set of bounding boxes. It also uses several other techniques to increase accuracy, such as multi-scale prediction, anchor boxes, and a feature pyramid network, which give the output with less computation time and more accuracy.
Fig. 1 Face detection. This image is taken as a screenshot from the computer while running the
output for detecting the faces of the people using YOLO and Dlib
While testing the person’s face detection, we used the pre-defined data already set in YOLOv3, which is extremely fast compared to CNN, Fast R-CNN, Faster R-CNN, and YOLOv1 and YOLOv2; its accuracy is very high, as the mAP (mean average precision) measured at 0.5 intersection over union (IOU) with focal loss is about 4× faster. On face detection, it predicts close faces with a highest accuracy of 95% and achieves the same results on many faces, with an average accuracy of 95%.
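The grid-cell prediction described above can be made concrete. Each cell's raw outputs are decoded with YOLOv3's standard anchor-box equations; the following self-contained sketch shows that decoding step only, and all numeric values in it are illustrative, not taken from this chapter:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, grid=13):
    """Decode one raw YOLOv3 prediction into a normalized bounding box.

    (cell_x, cell_y) is the grid cell that made the prediction and
    (anchor_w, anchor_h) is the anchor-box prior in grid units; the
    13x13 grid matches YOLOv3's coarsest detection scale.
    """
    bx = (sigmoid(tx) + cell_x) / grid   # box center x, in [0, 1]
    by = (sigmoid(ty) + cell_y) / grid   # box center y, in [0, 1]
    bw = anchor_w * math.exp(tw) / grid  # box width relative to the image
    bh = anchor_h * math.exp(th) / grid  # box height relative to the image
    return bx, by, bw, bh
```

With all raw outputs at zero, the decoded box sits at the center of its cell with the anchor's size, which is the behavior the sigmoid offsets and exponential scaling are designed to give.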
Dlib is a popular library for face recognition and has been used in various face
recognition applications. It provides several tools and algorithms that make it easy to
develop robust face recognition systems and also provides a pre-trained face recogni-
tion model called the face recognition ResNet model, which is based on deep learning
techniques and achieves state-of-the-art performance on face recognition tasks. The
model is trained on a large dataset of face images and can recognize faces with high
accuracy even under varying lighting conditions, facial expressions, and poses.
We use facial landmark detection which is the process of locating and identifying
specific points on a face, such as the eyes, nose, mouth, and chin. Dlib provides a
pre-trained shape predictor model that can be used to detect 68 facial landmarks on
a face.
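Dlib's ResNet model maps each face to a 128-dimensional embedding, and two faces are conventionally treated as a match when the Euclidean distance between their embeddings falls below about 0.6. The sketch below illustrates only that matching step; the short vectors and names stand in for real embeddings and enrolled identities:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(probe, database, threshold=0.6):
    """Return the enrolled name closest to the probe embedding, or None.

    In the real pipeline, `probe` and the database values are 128-d
    embeddings from Dlib's ResNet model; 0.6 is the distance threshold
    commonly used with that model. Toy vectors are used here.
    """
    best_name, best_dist = None, threshold
    for name, ref in database.items():
        d = euclidean(probe, ref)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name
```

When no enrolled embedding is within the threshold, `identify` returns `None`, which corresponds to the system's "NO FACE FOUND" / intruder path described later.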
4 Methodology
An image dataset containing 2000 images was collected, and the following process was undertaken to train and test the model.
• Data cleaning: Remove any irrelevant or duplicated data and handle missing values
or outliers.
• Image resizing: Resize the images to a consistent size that is compatible with
YOLOv5’s input size.
• Annotation format conversion: Convert the annotations of the images to the YOLO
format, which consists of a text file for each image containing the class label and
the normalized coordinates of the object’s bounding box.
• Labeling: Label the weapons in the images with appropriate categories (such as
“pistol,” “rifle,” “shotgun”) and their corresponding bounding boxes, as shown in
Fig. 3.
• Image augmentation: Apply various image augmentation techniques such as rota-
tion, flipping, and random cropping to increase the size of the dataset and improve
the model’s ability to generalize to new data.
• Data splitting: Split the dataset into training, validation, and testing sets to evaluate
the model’s performance and prevent overfitting.
• Normalization: Normalize the pixel values of the images to a common scale to
ensure consistency across the dataset.
• Encoding: Encode the class labels in the YOLO format to numerical form for use
in the model.
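The annotation format conversion step above can be sketched as follows; the pixel coordinates and class index used here are illustrative, not taken from the chapter's dataset:

```python
def to_yolo(box, img_w, img_h):
    """Convert (x_min, y_min, x_max, y_max) pixel coordinates into the
    normalized (x_center, y_center, width, height) YOLO expects."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return x_c, y_c, w, h

def yolo_line(class_id, box, img_w, img_h):
    """Format one annotation line of the per-image text file."""
    x_c, y_c, w, h = to_yolo(box, img_w, img_h)
    return f"{class_id} {x_c} {y_c} {w} {h}"
```

For example, a 200 x 100 pixel box in a 400 x 400 image with class 0 becomes the line `0 0.5 0.375 0.5 0.25`.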
Simulation and analysis of our project were done by integrating weapon detection in YOLOv5, person detection in YOLOv3, and personal identification in Dlib. Each YOLO version ran on its own dataset, and Dlib has its own libraries and pre-defined data. YOLOv3 uses its predefined dataset for person identification and person detection, which has an accuracy of 95% and above. The Dlib library is used for face detection through facial landmark detection, the process of locating and identifying specific points on a face, such as the eyes, nose, mouth, and chin; Dlib provides a pre-trained shape predictor model that can detect 68 facial landmarks on a face image.
Figure 1 shows the output of face detection using the Dlib libraries, where the faces of the people in the video are stored in the database and used to identify each person while running the test video.
Figure 2 shows person detection on a test video of people walking; the output shows each person in a bounding box.
Figure 3 shows the images and names in the database; the algorithm finds the matching face in the database and gives the output “john”, which is in the database, and when no face is found, the output is “NO FACE FOUND”.
In Fig. 4, if a person’s face is not clear or is turned away, the output shows “NO FACE FOUND.”
Fig. 2 Person detection. This image is taken as screenshot from the computer while running the
output for detecting the person using YOLOv5
Fig. 3 Database for face detection as John. This image is taken as a screenshot from the computer
while running the output for showing the saved face of people as his name from the stored database
Fig. 4 Detection of NO FACE FOUND. This image is taken as screenshot from the computer while
running the output for detecting the faces of the people using Dlib
Fig. 5 Intrusion detection. This image is taken as a screenshot from the computer while running
the output for detecting the intruder by checking from the saved database
Figure 5 shows the intruder detection part: if the algorithm finds a person who is not in the database, it shows an alert message, “INTRUDER DETECTED.”
YOLOv5 is used in the weapon detection process; we trained it on 2000 images covering six classes of weapons:
• Grenade
• Gun
• Knife
• Pistol
• Handgun
• Rifle.
All classes are trained and tested together on the dataset to obtain mAP, recall, and precision; the results of the training dataset are shown in Table 1. As shown in Table 1, the average precision across all class datasets is 0.875/1, and the mean average precision is 0.934/1.
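The metrics above relate in a simple way; the sketch below shows how precision, recall, and mAP are computed, with hypothetical per-class values that are illustrative only and not the actual entries of Table 1:

```python
def precision(tp, fp):
    # Of all detections made, the fraction that were correct.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Of all ground-truth weapons, the fraction that were found.
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_average_precision(ap_per_class):
    # mAP is simply the mean of the per-class average precisions.
    return sum(ap_per_class.values()) / len(ap_per_class)

# Hypothetical per-class AP values, one per weapon class.
ap = {"grenade": 0.90, "gun": 0.96, "knife": 0.92,
      "pistol": 0.94, "handgun": 0.95, "rifle": 0.93}
```

Averaging these six hypothetical APs gives a single mAP figure, which is how a per-class table like Table 1 collapses into the one reported number.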
Fig. 6 Confusion matrices of the weapon dataset. This image is taken as a screenshot from the
computer while getting the output from testing the weapon dataset in Google Colab
Fig. 7 Weapon detection. This image is taken as a screenshot from the computer while running the
output for detecting the knife using YoloV5
Fig. 8 Intruder detection and weapon detection. This image is taken as a screenshot from the
computer while running the final output of the project from testing the intruder detection along with
the weapon
5 Comparative Table
6 Conclusion
In conclusion, the You Only Look Once (YOLO) algorithm has emerged as a
powerful and effective tool for face and weapon identification systems. YOLO’s real-
time object detection capabilities, combined with its ability to handle multiple objects
simultaneously, make it an ideal choice for security and surveillance applications.
When it comes to face identification, YOLO excels at accurately detecting and
localizing faces within images or video frames. Its ability to process images in real-
time allows for quick and efficient face recognition, enabling applications such as
access control, forensic investigations, and public safety monitoring. By leveraging
deep learning techniques and a large amount of training data, YOLO can accurately
classify and identify individuals based on their facial features.
Moreover, YOLO’s capabilities extend beyond face identification to weapon
detection, which is crucial in maintaining public safety and preventing acts of
violence. With its high accuracy and real-time processing, YOLO can quickly detect
and classify various types of weapons, including firearms, knives, and explosives.
This enables security personnel and law enforcement agencies to swiftly respond to
potential threats and take appropriate action.
The YOLO algorithm’s strengths lie in its speed, accuracy, and versatility. Its
ability to handle real-time processing of large volumes of data makes it an excellent
choice for face and weapon identification systems that require immediate responses.
Furthermore, YOLO’s performance is not limited to specific environments or lighting
conditions, making it adaptable to various scenarios, such as indoor surveillance,
public spaces, or even nighttime operations.
However, it is important to note that while YOLO is a highly effective tool,
no system is perfect, and there are certain limitations to consider. Factors such as
occlusions, varying viewpoints, and image quality can affect the accuracy of face
and weapon identification. Additionally, YOLO’s performance heavily relies on the
quality and diversity of the training data used during its development.
Finally, we have presented our proposed system, criminal prevision with weapon identification and forewarning software, which consists of intruder detection in the military base along with detection of the weapons carried. With it, border guards or the security guards in a military base can easily take the necessary precautions when an intruder tries to infiltrate the base to collect or destroy nationwide classified data or future plans, or to take any technology, or to destroy property or a weapon garage. To avoid all this, we have proposed this project for intruder detection and forewarning.
References
Abstract Numerous cybercrimes and attacks happen across the globe, as often as every ten minutes. A large number of people are affected by these cybercrimes and attacks worldwide. Hence, the various attacks should be prevented, and people should be made aware of cybercrimes and attacks so that they can prevent the crimes themselves. The attacks and crimes happen in a planned manner, so they can be avoided by taking certain measures. An application that helps and guides people on avoiding cybercrimes and attacks on their systems will categorize the various cybercrimes and attacks happening around the world and keep users updated about cyber security events. The application will work on a machine learning algorithm to give solutions and preventive measures according to the description of the cyber security incident given by the user. The fully developed application can also be used as a social media platform.
1 Introduction
The first computer system was built in 1941, but as generations have grown up, various new types of computer systems have been invented and created. Nowadays, even very compact computers are available. As the number of computer systems increases, almost every person in the world uses computer systems such as mobile phones
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 371
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_32
and laptops. These computer systems make everyone’s life easier by storing personal data in their own system and keeping it safe. But as our generation and technology advance, these computer systems are hacked by some people, who use the personal data of others in an illegal manner. So, to protect these computer systems, cyber security comes into effect. We use computer systems to store every personal file and item of data that is useful in our day-to-day life; to maintain the security of our systems and files, cyber security comes into action.
Cybercriminals are people who are well trained in carrying out various kinds of cyber-attacks [1]. A study of the increasing trend of cybercrime in India specifically mentions the use of social networking sites for fraudulent activities such as IRS impersonation scams and technical support scams. The study also notes that cybercriminals are difficult to trace and that most victims are in the age group of 20–29 years. These cyber-attacks can be blocked and prevented by users adopting certain cyber security protocols and measures, but users should be well aware of the attacks and measures. To help people facing cyber threats and attacks, and to make them understand cyber security protocols, our application comes into action. Our application, called “Cyber Hand,” helps users determine what kind of cyber security measures should be taken based on the cyber-attack.
Users can get the measures and solutions to prevent the cyber-attacks they are affected by, simply by giving a description of the attack. The application will automatically show solutions and preventive measures related to the attack without any human intervention. It uses a machine learning algorithm trained on the various attacks from users’ input. Soni and Bhushan [2] used machine learning to overcome the limitations of traditional techniques and improve the efficiency of cyber security; their paper examines how various machine learning algorithms can be applied to address common issues in cyber security and improve the analysis and detection process. The descriptions of these attacks will be stored and reflected in the application itself, to make people aware of the various cyber-attacks happening around them. There are various kinds of cyber-attacks, and new types are emerging; our application will keep users updated. Users can view the various attacks region-wise and in rank order. The application also has a search bar where users can search for any kind of cyber-attack or threat and learn about real-time cyber-attacks.
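The chapter does not name the specific algorithm used to map a user's description to a solution, so the following is an illustrative sketch only: a simple bag-of-words similarity matcher in the spirit of the described helpdesk. All incident texts and preventive measures below are hypothetical examples, not taken from the paper:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

# Hypothetical incident descriptions and measures, for illustration only.
KNOWN_INCIDENTS = {
    "phishing": "suspicious email link asking for password and bank details",
    "ransomware": "files encrypted and a ransom payment demanded",
}
MEASURES = {
    "phishing": "Do not click the link; report the mail and change your passwords.",
    "ransomware": "Disconnect the machine, restore from backup, and do not pay.",
}

def suggest(description):
    """Match a user's description to the closest known incident type."""
    words = Counter(description.lower().split())
    scores = {kind: cosine(words, Counter(text.split()))
              for kind, text in KNOWN_INCIDENTS.items()}
    best = max(scores, key=scores.get)
    return MEASURES[best] if scores[best] > 0 else "No matching incident found."
```

A production system would replace this toy matcher with a trained text classifier, but the interface is the same: free-text description in, preventive measures out, with no human intervention.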
It will be helpful to see the trending cyber-attacks and crimes happening around the user’s location in rank order, with detailed descriptions of the crimes and attacks. On this platform, users can connect with cyber security experts and learn about cyber incidents and events in various regions.
Advanced Cyber Security Helpdesk Using Machine Learning Techniques 373
Ahmad et al. [3] discuss the need for cyber security education to be addressed in all fields and at all levels of education, as the need for cyber security is great; they emphasize the importance of stakeholder contributions and the responsibility of all stakeholders in the system. This application will be used as a help desk as well as a social media platform for cyber security. As existing helpdesks are not meant for common people and rely on human intervention, this application will be user-friendly, accessible to common people, and will work without human intervention; it is automated with the help of machine learning algorithms.
2 Related Works
There has been a significant amount of research in the area of cyber security and
the development of automated cyber helpdesk systems. Many existing systems focus
on providing technical support and troubleshooting assistance to users but lack the
ability to provide real-time information on cyber-attacks and crimes or a community
platform for cyber security experts to interact.
One example of related work is the “Cyber security Virtual Assistant” devel-
oped by researchers. This system utilizes natural language processing and machine
learning algorithms to provide technical support and troubleshooting assistance to
users. However, it does not provide real-time information on cyber-attacks and crimes
or a community platform for experts to interact.
Shahjee and Ware [4] addressed the need for integration between network oper-
ating centers (NOC) and security operation centers (SOC) in order to more effectively
protect IT assets from cyber security threats. The current lack of a holistic view in the
literature regarding an integrated NOC and SOC architecture is limiting innovation in
this field. The paper conducts a systematic literature review and analysis to propose
a state-of-the-art architecture for an integrated NOC and SOC, outlining the main
building blocks and usefulness for organizations. However, the relevant literature
only considers explicit knowledge, neglecting the importance of tapping into people’s
tacit knowledge in automating and integrating NOC and SOC processes. Mutemwa
et al. [5] discussed the challenges involved in incorporating a newly developed Security Operations Center (SOC) into an organization’s existing IT environment. The article explores various aspects such as determining the
data sources to be integrated into the Security Information and Event Management
(SIEM) system for effective security posture monitoring, integrating the organiza-
tion’s ticket logging system with the SOC SIEM, devising effective communication
strategies to promote awareness campaigns from the SOC, and reporting on inci-
dents identified within the SOC. Furthermore, the article highlights the difficulties in
demonstrating the value of the significant investments made in building and running
a SOC. Since a SOC operates as an independent hub, it must be integrated with the
organization’s established procedures, policies, and IT systems, which can pose inte-
gration challenges during its implementation. Paté-Cornell and Kuypers [6] discussed
the need for a quantitative approach to cyber risk management; this paper proposes
a probabilistic method for assessing the probabilities of new attack scenarios based
on existing data, with the aim of optimizing the allocation of limited resources and
prioritizing risk management measures. The method involves statistical analysis of
incidents within a specific organization and extends to cover potential threats that
have not yet occurred. A systematic construction of new attack scenarios is required,
followed by an assessment of their probability of success and the resulting losses.
This analysis generates complete risk curves that represent the overall cyber risk
for the organization and its insurers, facilitating the evaluation of the advantages of
various protective options. Moneva et al. [7] aimed to assess the efficacy of four
warning banners that sought to engage internet users and reduce DDoS attacks. A
quasi-experimental design was employed to measure engagement generated by the
banners and their linked landing pages. The results indicated that social ads were
significantly more effective in generating engagement than the other types of ads, with
no discernible difference in engagement between the various landing page designs.
The study suggests that social messages may be more effective than traditional deter-
rent messages in engaging with potential cyber offenders. Lif et al. [8] examined the
crucial information components that should be incorporated in an incident report to
facilitate incident management and information sharing in the field of cyber security.
The authors analyzed various reporting templates and frameworks, including a novel
reporting template employed in a cyber defense exercise within the military domain.
The findings suggest that the information elements in existing templates vary based
on their specific use cases, while the military template was deemed effective for inci-
dent reporting in the civilian sector. The authors recommend supplementing the new
template with additional fields such as details on victims, attackers, and assessments
of attackers’ motives to enhance its usefulness.
Mandal and Khan [9] state that the COVID-19 outbreak has resulted in a shift
toward remote work and learning, leading to an unprecedented growth of cloud infras-
tructure. This has caused an increase in data breaches and cyber security threats, not
only for large cloud vendors but also for small startups in various sectors. The paper
aims to identify security challenges caused by the sudden adoption of cloud platforms
and propose preventive measures to mitigate risks. The study highlights the need
for adequate precautions to protect hosts and devices connected to cloud resources.
Bakhshi et al. state that social engineering attacks exploit human vulnerability to gain
access to sensitive information. End-user awareness is crucial in protecting against
social engineering attacks, as they target the user rather than system defenses. A study
was conducted to measure the susceptibility of users in a corporate organization to
social engineering attacks using two attack scenarios. Despite the inclusion of clues
meant to alert suspicious users, a significant proportion of users (46–60%) fell prey
to the attacks due to a lack of user awareness. Post-incident training and regular IT
security drills were recommended to correct this issue.
Tanwar et al. discussed the increasing cybercrime and potential threats in the cyber
world, which can result in the loss of sensitive data and put devices at risk. The paper
presents cybercrime statistics and compares them with previous research papers on
Advanced Cyber Security Helpdesk Using Machine Learning Techniques 375
the subject. The research aims to bring attention to the issue and provide a detailed
study of various cybercrimes and potential threats.
There are also commercial cyber helpdesk platforms like “CyberArk” and “Trend
Micro” which provide technical support, troubleshooting assistance, and a knowledge
base for users, but they do not provide real-time information on trending crimes
or a community platform for experts to interact.
The proposed system is intended to be useful to every person in the world. Nowadays,
people store almost all of their personal files and data on computer systems. These
files and data should be saved and secured properly, and that is where cyber security
comes into action. As technology evolves, many cybercrimes and attacks/threats
occur around the world. Cybercriminals carry out these crimes to steal personal data
and credentials and use them in several illegal ways. Such cybercrimes and attacks
can be prevented by raising awareness and educating people about cyber-attacks/
threats and the preventive measures that should be taken. Our application comes into
action to prevent such cyber-attacks/threats. It can be used as a helpdesk as well as
a social media platform for various cyber security purposes, and it contains several
sections: a home page, a crime page, a report page, and a community section.
After the home page comes the crime page, where the logged-in user can gain
knowledge about various cybercrimes and attacks. On this page, the user can search
for cyber incidents location-wise and rank-wise, and can also check the various cyber
incidents that are happening around them.
The user can thus gain knowledge about various cybercrimes and attacks in this
section (Fig. 1).
The report page allows the user to report any type of cyber incident or cyber
security event, and it is the main part of the application. On this page, the user reports
a cyber incident by giving details and a description of the incident; in response, the
application provides solutions and preventive measures to prevent such attacks/
incidents. The reported incident is then displayed on the crime page for other users
(Fig. 2).
The community page will be useful to connect with the various cyber security
experts and gain knowledge about cyber security events. The experts also can discuss
among themselves in this section.
Machine learning algorithms are used at every stage of the application. On the
crime page, they classify cyber-attacks and incidents by rank and location. For
incidents reported on the report page, the algorithms select the relevant preventive
measures and solutions to display to the user (Fig. 3).
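The report-to-remediation flow described above can be sketched in Python. The paper's system uses a trained random forest classifier; since no dataset or feature set is given, a simple keyword matcher stands in for the trained model here, and the categories and preventive measures are illustrative assumptions:

```python
# Hypothetical sketch of the report-page flow: classify a reported incident,
# then look up preventive measures for the predicted category. A keyword
# matcher stands in for the paper's trained random forest; categories and
# measures below are illustrative only.

CATEGORY_KEYWORDS = {
    "phishing": ["email", "link", "credential", "login page"],
    "ransomware": ["encrypted", "ransom", "locked", "bitcoin"],
    "ddos": ["slow", "unreachable", "flood", "traffic"],
}

PREVENTIVE_MEASURES = {
    "phishing": "Do not click the link; report the sender and reset passwords.",
    "ransomware": "Isolate the machine, restore from backups, do not pay.",
    "ddos": "Enable rate limiting and contact your upstream provider.",
    "unknown": "Forward the report to a security expert for triage.",
}

def classify_incident(description):
    """Pick the category whose keywords best match the description."""
    text = description.lower()
    scores = {cat: sum(kw in text for kw in kws)
              for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def respond_to_report(description):
    """Return the preventive measures shown back to the reporting user."""
    return PREVENTIVE_MEASURES[classify_incident(description)]

if __name__ == "__main__":
    report = "I got an email with a login page link asking for my credential"
    print(classify_incident(report))   # phishing
    print(respond_to_report(report))
```

In the real system the same two-step shape applies: the model predicts an incident category, and the category keys into a table of vetted remediation advice.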
The accuracy of a random forest algorithm depends on several factors, such as the
quality and quantity of the data used to train the model, the specific implementation of
the algorithm, the hyperparameters chosen, and the complexity of the problem being
solved. In general, random forests are known to be a robust and accurate machine
learning algorithm, particularly for classification tasks. It is common to evaluate
a random forest model with metrics such as accuracy, precision, recall, and F1-score;
these metrics provide a way to assess how well the model performs at correctly
identifying positive and negative examples.
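As a concrete illustration of such an evaluation, the standard classification metrics can be computed directly from true and predicted labels. This is a minimal stdlib-only sketch; the label vectors are made up:

```python
# Minimal evaluation sketch for a binary classifier such as the random
# forest described above: accuracy, precision, recall, and F1 computed
# from predicted vs. true labels (1 = attack, 0 = benign).

def evaluate(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # illustrative labels
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
    print(evaluate(y_true, y_pred))
```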
5 Conclusion
In conclusion, the proposed automated advanced cyber security helpdesk for people
offers a comprehensive solution to address the growing concern of cyber security
incidents. With its report page, users can report and receive accurate solutions and
preventive measures to prevent cyber-attacks. The crimes or feed section helps users
stay informed about the latest cyber security incidents happening around the world,
enabling them to take necessary precautions. The community section offers a plat-
form for cyber security experts to discuss and interact on cyber security-related issues,
further enhancing the user’s knowledge and understanding of cyber security events.
The use of machine learning algorithms further strengthens the application’s capa-
bility in providing relevant solutions and preventing cyber-attacks. The implemen-
tation of this proposed solution will significantly improve the end-user’s awareness
and preparedness toward cyber security incidents.
(1) The existing system is operated manually and needs human intervention, but
the proposed system is automated; it can perform its functions without any
involvement of humans.
(2) The proposed system uses machine learning techniques, whereas the existing
system does not.
(3) Due to the use of machine learning, the proposed system gives results with
higher precision and accuracy than the existing one.
(4) The existing system can only be used in large industries and organizations, but
the proposed system is easy enough that even a common person can use it to
protect themselves from cyber threats or attacks.
(5) The existing system needs to be maintained properly, but the proposed system
is easy to maintain and its cost is low.
(6) In the existing system, if the attacked person reports an incident, it does not
state what immediate actions should be taken, while the proposed system
provides preventive measures to avoid further damage immediately after the
incident is reported.
References
1. Datta P, Panda SN, Tanwar S, Kaushal RK (2020) A technical review report on cyber crimes in
India. In: 2020 international conference on emerging smart computing and informatics (ESCI),
Pune, India, pp 269–275. https://doi.org/10.1109/ESCI48226.2020.9167567. https://ieeexplore.
ieee.org/abstract/document/9167567
2. Soni S, Bhushan B (2019) Use of machine learning algorithms for designing efficient cyber
security solutions. In: 2019 2nd international conference on intelligent computing, instrumenta-
tion and control technologies (ICICICT), Kannur, India, pp 1496–1501. https://doi.org/10.1109/
ICICICT46008.2019.8993253. https://ieeexplore.ieee.org/abstract/document/8993253
3. Ahmad N, Laplante PA, DeFranco JF, Kassab M (2022) A cybersecurity educated community.
IEEE Trans Emerg Topics Comput 10(3):1456–1463. https://doi.org/10.1109/TETC.2021.309
3444. https://ieeexplore.ieee.org/abstract/document/9468330
4. Shahjee D, Ware N (2022) Integrated network and security operation center: a systematic anal-
ysis. IEEE Access 10:27881–27898. https://doi.org/10.1109/ACCESS.2022.3157738. https://
ieeexplore.ieee.org/abstract/document/9729852
5. Mutemwa M, Mtsweni J, Zimba L (2018) Integrating a security operations centre with an orga-
nization’s existing procedures, policies and information technology systems. In: 2018 interna-
tional conference on intelligent and innovative computing applications (ICONIC), Mon Tresor,
Mauritius, pp 1–6. https://doi.org/10.1109/ICONIC.2018.8601251. https://ieeexplore.ieee.org/
abstract/document/8601251
6. Paté-Cornell M-E, Kuypers MA (2023) A probabilistic analysis of cyber risks. IEEE Trans
Eng Manage 70(1):3–13. https://doi.org/10.1109/TEM.2020.3028526. https://ieeexplore.ieee.
org/abstract/document/9354348
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 383
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_33
putting patient lives at risk. The paper concludes that conducting comprehensive
APT assessments is essential to identify and mitigate potential threats, protecting
patient information and maintaining the continuity of essential medical services.
1 Introduction
2 Related Works
The reconnaissance, delivery, initial intrusion, command and control (C&C), lateral
movement, and data exfiltration phases make up the advanced persistent threat (APT)
life cycle. Attackers learn details about their target, including potential weaknesses
and personnel data, throughout the reconnaissance and delivery phases. In the initial
step of intrusion, an attack is then launched using this information. The attackers
utilize a server to manage any compromised hosts during the C&C stage. The
attackers move throughout the network at the stage of lateral movement to take
over more hosts. They can even infect additional computers using these recently
compromised hosts. Attackers finally take sensitive data from the victim during the
data exfiltration stage [4].
The term “Internet of Things” (IoT), which describes a network of connected
sensors, objects, and gadgets, is becoming more and more prevalent in contemporary
life. APTs are also focusing on these gadgets. It is becoming more crucial to identify
and respond to APT attacks on IoT devices as their security is often poorer than that
of traditional hosts [5].
APT28, which has been reported to have hacked at least 500,000 IoT devices
like routers, video decoders, and printers, is one example of an APT organization
that targets IoT devices. Phishing emails are frequently used by APT28 attackers
to initiate assaults, and afterward, botnets are used to take control of the infected
IoT devices. The group targeted video decoders, VOIP phones, and printers in one
infiltration activity. In the first attack and C&C stages, they were able to take control
of the devices by taking advantage of flaws and using default passwords. In the lateral
movement phase, they broke into the target’s intranet using the compromised IoT
devices. In the final, data exfiltration phase, they stole crucial data from the victim [6].
The healthcare industry is a prime target for APT attacks due to the valuable
nature of the data it holds, such as patient medical records and financial information.
A report by the cybersecurity firm Carbon Black states that the healthcare industry
is the most targeted sector for APTs, with a rate of attacks that is three times higher
than the average across all industries (Carbon Black, 2016). The healthcare sector
is also facing a shortage of cybersecurity professionals, which can exacerbate the
problem [7].
Risk assessment: This step involves analyzing the potential consequences of an APT
attack on the organization, as well as the likelihood of such an attack occurring. This
information is used to prioritize the organization’s security needs and to allocate
resources accordingly.
Vulnerability assessment: This step involves a thorough examination of the organi-
zation’s systems, applications, and networks to identify any potential security weak-
nesses or vulnerabilities that could be exploited by an APT attacker. This information
is used to prioritize mitigation efforts and to develop a remediation plan.
Penetration testing: This step involves simulating an APT attack against the orga-
nization’s systems, applications, and networks to determine the effectiveness of the
organization’s security measures and to identify any new vulnerabilities that may
have been introduced.
Incident response planning: This step involves developing a comprehensive incident
response plan to be used in the event of an APT attack. The plan should outline the
steps to be taken, the roles and responsibilities of key personnel, and the communi-
cation strategies to be used during an incident. Employee Awareness and Training
are critical components of incident response planning. In this step, organizations
should focus on educating and preparing their workforce to effectively respond to an
Advanced Persistent Threat (APT) attack.
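The risk assessment step above can be sketched as a simple prioritization by likelihood times impact, so that mitigation resources go to the highest-scoring scenarios first. The scenario names and scores below are purely illustrative:

```python
# Hedged sketch of the risk-assessment step: rank candidate APT scenarios
# by likelihood * impact so that mitigation resources are allocated to the
# highest scores first. Names and numbers are illustrative assumptions.

def prioritize(scenarios):
    """Return scenarios sorted by risk score (likelihood * impact), highest first."""
    return sorted(scenarios,
                  key=lambda s: s["likelihood"] * s["impact"],
                  reverse=True)

if __name__ == "__main__":
    scenarios = [
        {"name": "phishing foothold", "likelihood": 0.6, "impact": 7},
        {"name": "EHR exfiltration", "likelihood": 0.2, "impact": 10},
        {"name": "IoT device takeover", "likelihood": 0.4, "impact": 6},
    ]
    for s in prioritize(scenarios):
        print(s["name"], s["likelihood"] * s["impact"])
```

Real assessments would derive the likelihood and impact estimates from incident data and asset valuations rather than fixed constants.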
It is important to clearly define the scope of the assessment, including the systems,
networks, and processes that will be tested, as well as any constraints or limitations.
A comprehensive assessment plan should be developed that outlines the objectives,
methodology, and timeline of the assessment. This plan should be reviewed and
approved by the stakeholders before the assessment begins. Red team assessments
are most effective when the red team works closely with the organization being
assessed. This can help ensure that the assessment is conducted in a manner that is
both effective and respectful of the organization’s operations and personnel.
Red team assessments should simulate real-world attacks as closely as possible,
using tactics and techniques that are commonly used by malicious actors. This can
help organizations better understand their vulnerabilities and the potential conse-
quences of a security breach. As part of the assessment, the red team should test
the organization’s incident response plan to ensure that it is effective and that the
organization is prepared to respond to a security incident.
Provide actionable recommendations: The assessment should provide actionable
recommendations that the organization can use to improve its security posture.
The recommendations should be prioritized and include a plan of action for
implementation.
Regularly reassess: Security is an ongoing process, and red team assessments should
be conducted on a regular basis to ensure that the organization remains prepared to
defend against evolving threats.
4 Open-Source Intelligence
The Open-Source Intelligence (OSINT) process for conducting a red team assessment
in a hospital involves collecting and analyzing publicly available information to gain
insight into the hospital’s operations, systems, and security measures. The following
are some of the steps involved in the OSINT process:
Identifying relevant sources: This involves identifying websites, news articles, and
social media platforms that may contain relevant information about the hospital, such
as its history, operations, personnel, and recent events.
Data collection: Collecting information from the identified sources, such as hospital
websites, news articles, social media, and other publicly available sources.
Data analysis: Analyzing the collected information to identify patterns, trends, and
relationships that can provide insight into the hospital’s security measures, systems,
and operations.
Verification and validation: Verifying the accuracy and reliability of the information
collected and analyzed, and cross-referencing it with other sources to ensure that the
information is valid and up-to-date.
Creating a report: Compiling the information gathered and analyzed into a report that
summarizes the findings and provides insights into the hospital’s security posture.
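The five OSINT steps above can be arranged as a small pipeline. The sources and records below are stubs, since real collection depends on the hospital being assessed:

```python
# Skeleton of the OSINT process as a pipeline: identify sources, collect,
# analyze, verify, report. All data here is stubbed; a real assessment
# would scrape or query each identified source.

def identify_sources():
    return ["hospital website", "news articles", "social media"]

def collect(sources):
    # Stub: stands in for per-source scraping/queries.
    return [{"source": s, "fact": f"public info from {s}"} for s in sources]

def analyze(records):
    # Group findings by source to look for patterns across sources.
    return {r["source"]: r["fact"] for r in records}

def verify(findings):
    # Stub cross-check: discard empty findings.
    return {k: v for k, v in findings.items() if v}

def report(findings):
    lines = [f"- {src}: {fact}" for src, fact in sorted(findings.items())]
    return "OSINT report\n" + "\n".join(lines)

if __name__ == "__main__":
    print(report(verify(analyze(collect(identify_sources())))))
```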
Data breaches can have serious consequences for individuals and organizations, but
there are steps that can be taken to turn the situation into a productive outcome:
Perform a thorough analysis of the breach: Understand the scope of the breach, what
data was compromised, and how it was accessed. This information will help you
determine the best course of action.
Notify affected parties: If any individuals have had their personal information
compromised, it is important to let them know as soon as possible so they can take
steps to protect themselves.
Improve security measures: Use the information gained from the breach analysis
to improve your security measures, such as by implementing stronger passwords,
two-factor authentication, and encryption.
Review and update your policies: Make sure that your policies and procedures for
handling sensitive data are up-to-date and aligned with industry best practices.
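As one concrete instance of "implementing stronger passwords," credentials can be stored as salted PBKDF2 hashes rather than plaintext, using only the Python standard library. This is a sketch; the iteration count is an illustrative choice:

```python
# Sketch of post-breach credential hardening: store a per-user random salt
# and a PBKDF2-SHA256 digest instead of the plaintext password. The
# iteration count (200,000) is an illustrative parameter.

import hashlib
import hmac
import os

def hash_password(password, salt=None):
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest

def verify_password(password, salt, digest):
    # compare_digest avoids timing side channels on the comparison.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return hmac.compare_digest(candidate, digest)

if __name__ == "__main__":
    salt, digest = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, digest))  # True
    print(verify_password("wrong guess", salt, digest))                   # False
```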
Advanced Persistent Threat Assessment 389
Hospital entry data refers to the information collected about patients and visitors
as they enter and exit a hospital. This information is typically used for operational
and security purposes, such as tracking patient flow and monitoring access to the
facility. However, the confidential nature of this data makes it a valuable target for
cybercriminals and hackers, who can use it for malicious purposes, such as identity
theft or insurance fraud.
The consequences of a hospital entry data breach can be significant, with poten-
tially far-reaching impacts on both patients and the hospital. Patients whose information
has been compromised may suffer financial losses as well as a loss
of privacy and confidentiality. In addition, hospitals may face legal and regulatory
consequences for violating privacy laws and may also experience a loss of reputation
and public trust.
Cybersecurity threats to hospitals are increasing, as the number of electronic
devices connected to hospital networks continues to grow, and hackers become more
sophisticated in their methods. To prevent hospital entry data breaches, hospitals
must implement robust security measures and protocols, such as firewalls, intrusion
detection systems, and encryption technologies.
Testing the network of hospitals using Shodan is an important tool for identifying
potential security vulnerabilities and ensuring the confidentiality and privacy of
patient data. By searching for hospital systems and devices on Shodan, security
professionals can identify any publicly accessible systems that may be vulnerable to
attack, as well as any systems that may be configured improperly or running outdated
software.
In order to effectively test the hospital network using Shodan, security profes-
sionals must have a clear understanding of the hospital’s IT infrastructure, including
the types of systems and devices that are in use. This information can be used to create
a list of search terms that will be used to search for hospital systems on Shodan. The
search results can then be analyzed to identify any systems that may be vulnerable to
attack, and appropriate remediation steps can be taken to address these vulnerabilities
(Fig. 2).
Shodan is a web-based tool that can identify any device connected to a network,
and it is effective for identifying potential security vulnerabilities and ensuring the
confidentiality and privacy of patient data. However, it is important to conduct these
tests in a responsible
and ethical manner and to coordinate with other hospital stakeholders to minimize
the risk of disruption to normal hospital operations. By doing so, hospitals can take
proactive steps to protect patient data and ensure the security of their IT infrastructure.
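Building the Shodan search-term list from an asset inventory, as described above, might look like the following sketch. The organization name is a placeholder and the port-to-device mapping is an assumption based on commonly used medical-protocol ports; query strings use Shodan's standard `org:` and `port:` filters:

```python
# Sketch: derive Shodan query strings from a (hypothetical) hospital asset
# inventory. The organization name is a placeholder, and the device/port
# mapping is an assumption based on commonly used protocol ports.

HOSPITAL_ORG = "Example General Hospital"   # placeholder, not a real target
DEVICE_PORTS = {
    "DICOM imaging": 104,     # commonly used DICOM port
    "HL7 interface": 2575,    # commonly used HL7 port
    "Web portal": 443,
}

def build_queries(org, device_ports):
    """Map each device label to a Shodan search string using org:/port: filters."""
    return {label: f'org:"{org}" port:{port}'
            for label, port in device_ports.items()}

if __name__ == "__main__":
    for label, query in build_queries(HOSPITAL_ORG, DEVICE_PORTS).items():
        print(f"{label}: {query}")
```

Each resulting string would be submitted through Shodan's search interface or API, and matching hosts reviewed for exposure and outdated software.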
Nowadays, people store almost all of their personal files and data on computer
systems. These files and data should be saved and secured properly, and that is
where cybersecurity comes into play. As technology
advances, many cybercrimes and attacks/threats occur around the world. Cybercriminals
carry out these crimes to steal personal data and credentials and use them
in several illegal ways. These cybercrimes and attacks can be avoided by raising
public awareness and providing information about cyberattacks and threats, as well
as the preventive measures that should be taken. Our application comes into action
to prevent such cyberattacks/threats. It can be used as a database for searching for
data breaches in hospitals, and it contains
various sections, such as the home page and the report page. The user will be directed
to the application’s home page, where they can learn more about it and its features.
Through the home page, the user can access the other pages and features: the doctor
directory, the hospital directory, and the send mail page. The home page also gives
directions on how to use the application and describes its other features. Clicking the
doctor directory redirects to a page with a search bar, where the user can type a name
or ID to check the breached details of his or her leaked credentials, such as names,
social security numbers, addresses, dates of birth, and phone numbers. Clicking the
hospital directory redirects to a page with a search bar, where the user can type the
name of a hospital to list the breached details of its doctors, working staff, and patient
data. The breached details of patients, such as dates of birth, social security numbers,
lab results, health insurance policy information, diagnoses, disability codes, doctor
names, and medical conditions, are also viewable. Finally, the send mail button can
be used to report the data breaches to the affected hospitals, which helps them change
their credentials (Figs. 3 and 4).
7 Conclusion
In conclusion, the increasing dependence on technology and the Internet has made
cybersecurity more critical than ever. Cybercriminals are always finding new ways
to exploit vulnerabilities and steal personal data and credentials. It is essential to
raise public awareness and provide tools like the proposed application to prevent
cyberattacks and threats. The application’s features, such as the doctor directory,
hospital directory, and send mail page, provide an efficient and convenient way to
search for data breaches in hospitals and report them to the concerned authorities.
Such initiatives can help improve cybersecurity and protect personal information and
sensitive data from being compromised. As a result of our research, we have collected
data on about 600,000+ doctors, publicly available from the dark web. We will
automatically send this breached information to the respective doctors’ email addresses
so they can safeguard their accounts and avoid further loss of private information. In
the near future, this process will be automated using bash scripts to avoid delays in
finding the breached data.
References
1. Meng X et al (2022) Exploring the vulnerability in the inference phase of advanced persistent
threats. Int J Distrib Sens Netw. https://ieeexplore.ieee.org/document/9716119
2. Khaleefa EJ, Abdulah DA (2022) Concept and difficulties of advanced persistent threats: survey.
Int J Nonlinear Anal Appl. https://ijnaa.semnan.ac.ir/article_6230_4b14956433d591842b2dc
fe6208c141e.pdf
3. Park S-H, Yun S-W, Jeon S-E, Park N-E, Shim H-Y, Lee Y-R, Lee S-J, Park T-R, Shin N-Y, Kang
M-J (2022) Performance evaluation of open-source endpoint detection and response combining
google rapid response and osquery for threat detection. IEEE. https://www.researchgate.net/fig
ure/Reviewing-the-investigation-result-of-netstat_tbl2_342148695
Abstract Vaccination is one of the best ways to stop or at least slow the spread of
infectious illnesses. There are various logistical concerns raised by this medical tech-
nique. Millions of people worldwide participate in preventative vaccination programs
every year, whether it is the seasonal flu shot, immunizations for children, or other
vaccines. An outbreak of disease can be avoided by using preventative vaccinations
before the sickness ever appears. Reactive vaccination occurs during an infectious
disease outbreak or in response to a bioterror strike, and it is distinct from preventative
immunization. Thus, this paper proposes a solution for privacy and security issues
in the supply chain industry. To overcome this problem, in this paper, we present an
efficient blockchain-based vaccine supply chain management where smart contracts
and WEB 3.0 play a major role in securing the data. The web application developed
for this paper can be used by four types of users: Super admin, Hospital, Citizen,
and Manufacturer, where Super admin can govern the whole system. Then through
this application, manufacturers can add their vaccines and sell those vaccines to the
hospitals for a particular price and the hospitals can purchase the vaccines by paying
the quoted price to the manufacturer. Hospitals can purchase vaccines from manufac-
turers and view the appointment status of citizens, and the citizens are allowed to book
their appointments for vaccination. In addition, it offers foolproof data protection that
prevents hackers from altering information in any way. To secure information at its
source, a new framework known as WEB 3.0 incorporates a blockchain framework.
This paper provides a secure end-to-end data management system that allows people
to book appointments, purchase vaccines, and sell them online.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 395
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_34
396 V. C. Sharmila et al.
1 Introduction
2 Related Works
Both the Covid-Shield and Covaxin vaccines have begun rolling out across the several
states that make up India. Difficulties in the vaccination supply chain (VSC) are
expected to arise in emerging states like Bihar because of inadequate healthcare facil-
ities, widespread poverty, and low levels of education. The purpose of the research
by Shashank et al. [1] is to investigate how the IoT could affect the VSC’s efficiency.
A conceptual framework has been developed and evaluated by this research based on
the literature on the impact of IoT on product management, demand management,
supply management, social behavior, and government regulations in Bihar.
It is crucial for healthcare organizations to effectively control the healthcare supply
chain (HCSC) process and operation both during pandemics like COVID-19 and
during regular operations. According to Ilhaam et al. [2], to reduce price differences
and mistakes in procurement, the suggested solution incorporates blockchain
technology and decentralized storage to increase visibility, simplify stakeholder
communication, and shorten the procurement process.
Coronavirus 2019 (COVID-19) emerged, complicating the distribution and
administration of vaccines against the virus. Current platforms and systems used
to manage data on the distribution and delivery of COVID-19 vaccines fall short in
Blockchain-Based Supply Chain Management System for Secure … 397
3 Proposed System
This paper proposes a solution for privacy and security issues in the supply chain
industry. To overcome these problems, we present an efficient blockchain-
based vaccine supply chain management where smart contracts and WEB 3.0 play
a major role in securing the data. The web application developed for this paper can be used
by four types of users, namely Super admin, Hospital, Citizen, and Manufacturer,
where the Super admin can govern the whole system. Then, through this application,
manufacturers can add their vaccines and sell those vaccines to the hospitals for a
particular price and the hospitals can purchase the vaccines by paying the quoted
price to the manufacturer. Hospitals can purchase the vaccines from manufacturers
and view the appointment status of citizens, and the citizens are allowed to book their
appointments for vaccination. Moreover, it offers total data protection, preventing
any data tampering by hackers. Web 3.0 is a framework with a blockchain architecture
that protects data at its back-end. Since no encryption or decryption key is necessary
to gain access to the information, there is no longer any concern that hackers would
access or alter the data. Thus, this paper offers a complete data security solution for
the supply chain sector, allowing users to schedule appointments and purchase and
sell vaccines online.
See Fig. 1.
3.3 Working
and the transaction gets initiated and successfully gets updated. The test network will
be installed on a local platform, connected to the blockchain network, and utilized to
evaluate the viability of the smart contract. As a result, the supply chain sector may
use this paper’s end-to-end data security system to schedule appointments, buy and
sell vaccinations online, and more.
4 System Process
To ensure that all necessary features are working properly, we use Remix IDE in
this paper. During functional testing, quality assurance specialists check to see if a
program is functioning as expected. For data security, it is essential to double-check
both the encrypted and decrypted transmissions. Manual tests and automation testing
tools can both be used to ensure functionality. Manual testing is a straightforward
method for performing functional testing, which entails verifying that the product
works as intended by the target audience.
Ganache is a private blockchain that can be used by developers to build and test
smart contracts, decentralized applications (dApps), and other software. Truffle is a
framework for developing decentralized applications. When developing a decentral-
ized application (dApp), we may use Truffle to draft the necessary smart contracts,
create unit tests for the underlying functions, and plan out the dApp’s user interface.
Web 3.0 enables the creation of smart contracts to describe the behavior of appli-
cations and their subsequent deployment to a distributed state machine. Distributed
ledgers can pave the way for Web 3.0 by encouraging widespread collaboration. Web
3.0, also known as the Semantic Web, is advantageous for two reasons: It facilitates
cooperation and decentralized invention. Web 3.0 is a primary reason why people
are increasingly relying on their mobile gadgets.
Users can save their Ether in a secure token wallet called MetaMask, and they can
also manage their own identities with it. It is the part of the blockchain system in
charge of verifying users and establishing links between accounts. The processing
charge for MetaMask transactions is 0.875% of the overall transaction value.
Results from every stage of development are gathered for this paper. The next step is to integrate the application with the blockchain network and deploy the platform locally, where the network verifies the viability of the smart contracts. Without the decryption key, the data cannot be read, so unauthorized parties cannot gain access to it. The hospital's medical records are therefore protected from beginning to end by this technology.
5 Algorithm
6 Pseudocode
7 Results
The implementation of this work can be broken down into modules.
402 V. C. Sharmila et al.
The first step is account creation for the users accessing the application. Four different logins are allotted in this application: the admin login, through which the admin can grant or reject login access for the manufacturer and can monitor vaccine status; the manufacturer, who can add vaccines and sell those products to hospitals; the hospital, which can buy vaccines from manufacturers and sell them to citizens; and the citizen, who can book appointments at the hospital.
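The four-role flow described above can be sketched as a minimal in-memory simulation; the class and method names are assumptions for illustration and stand in for the actual smart contract logic, which the paper implements on Ethereum.

```python
class VaccineSupplyChain:
    """Toy simulation of the admin/manufacturer/hospital/citizen flow."""

    def __init__(self):
        self.manufacturer_access = {}   # manufacturer -> bool, set by admin
        self.stock = {}                 # holder -> vaccine count
        self.appointments = []          # appointment records

    # Admin grants or revokes manufacturer login access.
    def set_manufacturer_access(self, manufacturer, allowed):
        self.manufacturer_access[manufacturer] = allowed

    # Manufacturer adds vaccine stock (only if the admin granted access).
    def add_vaccine(self, manufacturer, qty):
        if not self.manufacturer_access.get(manufacturer):
            raise PermissionError("login access not granted by admin")
        self.stock[manufacturer] = self.stock.get(manufacturer, 0) + qty

    # Hospital buys vaccine from a manufacturer.
    def buy(self, hospital, manufacturer, qty):
        if self.stock.get(manufacturer, 0) < qty:
            raise ValueError("insufficient stock")
        self.stock[manufacturer] -= qty
        self.stock[hospital] = self.stock.get(hospital, 0) + qty

    # Citizen books an appointment; the hospital later accepts or rejects it.
    def book(self, citizen, hospital):
        self.appointments.append({"citizen": citizen, "hospital": hospital,
                                  "status": "pending"})

chain = VaccineSupplyChain()
chain.set_manufacturer_access("m1", True)
chain.add_vaccine("m1", 100)
chain.buy("h1", "m1", 40)
chain.book("alice", "h1")
print(chain.stock)  # {'m1': 60, 'h1': 40}
```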
The below image represents the login page of the application, where all types of users can log in and access the application (Fig. 4).
On the signup page of the application, all types of users can sign up using their credentials; the application can be accessed by four types of users. The below image represents the admin's dashboard: the admin can grant login access to the manufacturer and can revoke it whenever needed. The admin can also view the vaccines remaining with manufacturers and hospitals (Fig. 5).
The below image represents the manufacturer's home page, where the manufacturer can add vaccines and sell them to hospitals. The admin has the ability to either grant or deny manufacturer login access. In addition, the admin is able to keep track of the vaccines currently available from the manufacturer as well as from hospitals, and can check the number of people who have been vaccinated so far (Fig. 6).
The vaccine-adding feature allows the manufacturer to add products. The below image represents the hospital's home page, where the hospital can verify and buy vaccines from the manufacturer and sell them to citizens. Hospitals will be listed so that they can purchase vaccines from the manufacturers (Fig. 7).
On the hospital's home page, the hospital verifies the vaccines sold by the manufacturer; hospitals can validate vaccines and purchase them directly from manufacturers. The hospital can also offer vaccines to citizens who book appointments: it can view the appointment status booked by citizens, accept or reject those appointments, and update the vaccination status. Citizens booking an appointment at the hospital is represented in Fig. 8.
The vaccines offered for sale by the hospital are listed, allowing citizens to verify a vaccine by clicking the QR code displayed next to it. Citizens can also schedule appointments at the hospital; once the hospital has confirmed their appointment, they can receive their vaccination as soon as the facility is ready.
8 Conclusion
This paper delivers a vaccine distribution system in which data can be viewed securely, with the Ethereum blockchain playing a significant role. It is a highly reliable data security solution that uses Web 3.0 and smart contracts to protect data, guaranteeing total data security and preventing hackers from altering it. The Web 3.0 design uses a built-in blockchain architecture that protects data at the back end. This paper therefore offers the industry end-to-end security and privacy for booking appointments and purchasing and selling vaccines over the Internet.
9 Future Work
We will examine the paper's applicability in the near future to identify medical technologies that need data security and privacy. There are further opportunities to expand or adapt this work in the medical industry. As a result, this approach will remain effective in the future, enabling safe data transmission that cannot be tampered with without the permission of the primary user.
Dehazing of Multispectral Images Using
Contrastive Learning In CycleGAN
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 407
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_35
408 S. Kayalvizhi et al.
2 Proposed Solution
The input multispectral images are first preprocessed with a haze removal algorithm to obtain estimated transmission maps. Contrastive learning then separates the images into positive and negative pairs depending on how similar they are in the feature space [2]. By maximizing similarity between positive pairs and minimizing similarity between negative pairs, the contrastive loss helps the model develop representations that accurately reflect the underlying structure of the multispectral images. The positive pairs are then fed into CycleGAN, where they are transformed into the clear multispectral image domain and back into the hazy multispectral image domain. The adversarial loss encourages the generated clear images to be visually comparable to real clear multispectral images, while the cycle consistency loss ensures that the generated clear images remain consistent with the initial hazy multispectral images (Fig. 1).
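The haze removal preprocessing step can be sketched with the dark channel prior, one commonly used choice for estimating the transmission map. The following is a simplified single-scale version that assumes the atmospheric light A is known; a full implementation would also estimate A and refine the map.

```python
import numpy as np

def dark_channel(img, patch=3):
    """Per-pixel minimum over channels, then a local minimum filter."""
    dc = img.min(axis=2)
    h, w = dc.shape
    pad = patch // 2
    padded = np.pad(dc, pad, mode="edge")
    out = np.empty_like(dc)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(img, A, omega=0.95, patch=3):
    """Transmission map t = 1 - omega * dark_channel(I / A)."""
    normed = img / A
    return 1.0 - omega * dark_channel(normed, patch)

hazy = np.full((8, 8, 3), 0.6)     # uniformly hazy toy image
A = np.array([1.0, 1.0, 1.0])      # assumed atmospheric light
t = estimate_transmission(hazy, A)
print(round(t[0, 0], 2))           # 0.43, i.e. 1 - 0.95 * 0.6
```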
The primary objective of our research is to enhance the visibility and quality of
multispectral images captured under hazy conditions. Haze and atmospheric scat-
tering significantly degrade the visibility of such images, rendering them less infor-
mative and limiting their applications in various fields, including remote sensing,
environmental monitoring, and agricultural analysis. By effectively removing the
haze and restoring the original content, we aim to improve the interpretability
and utility of multispectral imagery, enabling more accurate analysis and decision-
making processes.
To achieve this objective, we employ a novel approach based on contrastive
learning in the context of CycleGAN. Contrastive learning has recently gained
considerable attention due to its ability to learn powerful image representations from
unlabeled data. By leveraging the inherent structure and information in multispec-
tral images, our proposed method learns to capture the underlying characteristics of
haze and non-haze regions. This facilitates the generation of high-quality, haze-free
multispectral images, enabling more accurate and reliable analysis.
Our understanding of the proposed methodology is based on the utilization of the
CycleGAN framework, which comprises two generative models, each responsible for
learning the mapping between haze and non-haze domains. Through the introduction
of contrastive learning, we encourage the models to focus on salient features, ensuring
that important details and information are preserved during the dehazing process. The
use of multispectral images as input allows us to exploit the complementary nature
of the different spectral bands, enhancing the overall dehazing performance.
The proposed system for image dehazing of multispectral images using contrastive
learning in CycleGAN consists of four main components: haze removal, contrastive
learning, CycleGAN, and evaluation. The following is a high-level description of the
proposed framework and the interactions between its components.
1. Haze Removal: The first step in the proposed framework is to estimate the trans-
mission maps from the input multispectral images using a haze removal algo-
rithm. This step is necessary to remove the effects of atmospheric haze and obtain
clear images as input for the subsequent steps. Several haze removal algorithms
can be used for this step, such as the dark channel prior or atmospheric scattering
models [6].
2. Contrastive Learning: The second component of the proposed framework is contrastive learning, which separates hazy and clear multispectral images without requiring any labeled data. In this step, the input multispectral images are first preprocessed and fed into the contrastive learning module. The contrastive loss maximizes the similarity between positive pairs of multispectral images and minimizes it between negative pairs. The result of this stage is a set of pairs of hazy and clear multispectral images [7].
3. CycleGAN: The third component of the proposed framework is CycleGAN, which is used to generate clear multispectral images from the hazy ones. The positive pairs of hazy and clear multispectral images from the contrastive learning phase are used to train the CycleGAN network. The adversarial loss encourages the generated clear multispectral images to be visually comparable to the ground truth clear multispectral images, while the cycle consistency loss guarantees that the generated clear images remain consistent with the original hazy multispectral images.
4. Evaluation: The efficacy of the dehazing model is evaluated in the last part of the proposed framework. In addition to visual inspection, evaluation may be carried out using a variety of objective measures, including the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). The proposed framework's performance may then be compared to that of existing state-of-the-art dehazing techniques.
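A minimal reference implementation of one of these objective measures is shown below, assuming 8-bit images with a peak value of 255 (SSIM would normally come from an image processing library rather than being coded by hand).

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images."""
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    err = mse(a, b)
    if err == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / err)

ref = np.zeros((4, 4), dtype=np.uint8)
restored = np.full((4, 4), 10, dtype=np.uint8)   # off by 10 everywhere
print(round(psnr(ref, restored), 2))             # 28.13, since MSE = 100
```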
The proposed system for image dehazing of multispectral images incorporates
four key components: haze removal, contrastive learning, CycleGAN, and evaluation.
The first step involves estimating transmission maps using a haze removal algorithm
to remove atmospheric haze from the input multispectral images. Contrastive learning
is then employed to separate hazy and clear images by maximizing similarity among
positive pairings and minimizing similarity among negative pairings. The CycleGAN
model is trained using the positive pairings from contrastive learning to generate
clear multispectral images from the hazy ones. The generated images are optimized
to be visually comparable to the ground truth using adversarial and cycle consistency
losses. Finally, the efficacy of the proposed framework is evaluated through visual
inspection and objective measures like PSNR and SSIM, allowing for comparisons
with existing dehazing techniques.
The output of the feature extractor is then fed into a projection head, which projects
the feature vectors onto a lower-dimensional space [2].
The proposed method utilizes contrastive learning, a self-supervised technique, in combination with CycleGAN for dehazing multispectral images. Contrastive learning enables discrimination between hazy and clear images without the need for labeled training data. Positive pairs consist of similar multispectral images, while negative pairs consist of dissimilar ones. A neural network, such as a CNN, is employed to learn the feature space. The InfoNCE loss function is used, which measures the similarity of the positive pair relative to the negative samples, normalized over the number of negative samples. Various neural network architectures can be used for feature extraction, often involving a pre-trained CNN as the feature extractor and a projection head for dimensionality reduction.
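The InfoNCE loss just described can be sketched as follows; cosine similarity and the temperature value tau are common choices and are assumptions here, as are the toy feature vectors.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def info_nce(anchor, positive, negatives, tau=0.1):
    """-log softmax of the positive-pair similarity over all candidates."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = np.array(sims) / tau
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

rng = np.random.default_rng(0)
anchor = rng.normal(size=16)
positive = anchor + 0.05 * rng.normal(size=16)   # near-duplicate feature
negatives = [rng.normal(size=16) for _ in range(8)]
loss = info_nce(anchor, positive, negatives)
# Expect the true positive pair to give a much smaller loss than
# treating a random negative as the positive.
print(loss < info_nce(anchor, negatives[0], negatives[1:]))
```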
The proposed method for dehazing multispectral images using contrastive learning in CycleGAN combines the two components, contrastive learning and CycleGAN, to improve the performance of the dehazing model. The two components are combined as follows:
1. The contrastive learning component creates positive pairs of hazy and clear multispectral images without the need for labeled data. These pairs of hazy and clear multispectral images are used to train the CycleGAN model.
2. The CycleGAN component of the proposed system is used to generate clear multispectral images from the hazy ones. The performance of the dehazing model is then assessed using the generated clear multispectral images.
Integration of contrastive learning and CycleGAN allows the proposed system
to leverage the advantages of both techniques [11]. Contrastive learning enables the
proposed system to learn representations that capture the underlying structure of the
multispectral images, while CycleGAN enables the proposed system to transform
hazy multispectral images into their clear counterparts.
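The cycle consistency idea behind this hazy-to-clear transformation can be illustrated with a toy example in which the two generators are stand-in analytic functions rather than trained networks: mapping hazy to clear and back should reproduce the input under an L1 penalty. All functions here are illustrative assumptions.

```python
import numpy as np

def G_clear(hazy, haze=0.3):
    """Stand-in hazy -> clear generator (inverts a simple haze model)."""
    return (hazy - haze) / (1 - haze)

def G_hazy(clear, haze=0.3):
    """Stand-in clear -> hazy generator (simple additive haze model)."""
    return clear * (1 - haze) + haze

def cycle_loss(x, fwd, back):
    """L1 cycle consistency loss: |back(fwd(x)) - x| averaged."""
    return float(np.mean(np.abs(back(fwd(x)) - x)))

hazy = np.linspace(0.3, 1.0, 8)
print(round(cycle_loss(hazy, G_clear, G_hazy), 6))  # 0.0, exact inverses
```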
The integration of the two components also enables the proposed system to over-
come some of the limitations of each individual technique. For example, contrastive
learning may not be able to capture all the nuances of the dehazing process, while
CycleGAN may not be able to effectively learn the underlying structure of the
multispectral images without additional constraints.
In summary, the integration of contrastive learning and CycleGAN in the proposed
system for image dehazing of multispectral images allows the system to leverage the
advantages of both techniques and overcome some of their limitations, leading to
improved performance in dehazing multispectral images.
3 Results
The findings from the study employing CycleGAN to dehaze multispectral images are presented in this section. The proposed technique successfully improves the clarity and usability of images captured in hazy settings by utilizing the strength of generative adversarial networks (GANs). The goal of our study is to create a dehazing model in CycleGAN that uses contrastive learning to recover lost details and features while retaining the scene's original color and texture. To increase the reliability of the proposed model, the datasets used for investigation were generated and augmented. The metrics used to assess the proposed model were entropy, which gauges the amount of information contained in the restored image, the peak signal-to-noise ratio (PSNR), which rates the restored image's quality in comparison with the original, and the mean squared error (MSE), which gauges the average squared discrepancy between the restored and original images. The findings demonstrate that the proposed model outperforms cutting-edge methodologies on all three criteria. The results presented in this section offer insight into CycleGAN's performance when dehazing multispectral images and contribute to the body of knowledge in the fields of image processing and machine vision (Fig. 2).
Several methods have been proposed for dehazing multispectral images, but each has its limitations. The first method estimates the transmission map and eliminates haze using DCP and WAF [14], but it may not be effective for multispectral images due to variations in atmospheric properties. The second method uses adaptive histogram equalization for contrast restoration [6], but it does not estimate the overall transmission map or eliminate haze. The third method uses an unsupervised algorithm for haze removal in multispectral remote sensing images [15], but it may not generalize well to other datasets. The fourth method estimates the propagation map and eliminates haze using a residual architecture [16], but it may not be effective on images with intricate haze distributions. The fifth method uses CycleGAN for autonomous dehazing of aerial data [17], but it learns the translation between hazy and haze-free images using an asymmetric contrastive loss function. The suggested approach outperforms these existing methods, especially on the SS594 multispectral dehazing dataset, by using a contrastive loss function that learns the mapping between hazy and haze-free images in a more symmetrical manner [17].
In terms of image quality measures including PSNR, MSE, and entropy, the proposed model for dehazing multispectral images using contrastive learning in CycleGAN [17] outperforms previous techniques and offers superior image restoration compared to current approaches. The approach of Makarau et al. [15] detects and removes haze from remotely sensed multispectral imagery; however, it is unable to recover low-contrast images. Although not explicitly created for multispectral images, Zhang et al.'s work focuses on employing a convolutional neural network with a residual architecture. The unsupervised haze removal method using asymmetric contrastive CycleGAN [17] likewise does not cater specifically to multispectral images. In contrast, our proposed method is specifically designed for multispectral images and leverages the advantages of contrastive learning in CycleGAN to further enhance image restoration. As a consequence, our approach significantly outperforms currently used ones and exhibits promising multispectral dehazing outcomes.
Despite showing promising results, the proposed model for dehazing multispectral
images using contrastive learning in CycleGAN has several limitations that need to
be addressed.
First, access to high-quality data for training is crucial to the model's success; performance can be significantly impacted if the training dataset is not representative of the test dataset. Obtaining high-quality datasets with diverse and challenging hazy conditions is essential for improving the model's robustness.
Secondly, the proposed model requires substantial computational resources during
training, making it less practical for real-time applications. The training process is
time-consuming and requires high-end hardware to achieve optimal results.
In summary, while the proposed model represents a significant improvement
over existing methods, there are still limitations that must be addressed to make
the approach more practical for real-world applications.
The proposed model for dehazing multispectral images using contrastive learning
in CycleGAN [17] presents several potential areas for future research. Firstly, the
4 Conclusion
has practical implications for applications such as remote sensing, surveillance, and
autonomous driving, where accurate image dehazing is essential.
The proposed system leverages the advantages of contrastive learning and
CycleGAN [17] to improve the performance of the dehazing model. Contrastive
learning is used to separate hazy and clear images by generating positive pairs of
hazy and clear multispectral images without requiring any labeled data. CycleGAN
is used to transform hazy multispectral images into their clear counterparts. Binary cross-entropy (BCE) loss is used in CycleGAN [17] to train the discriminator network.
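A minimal numpy form of this BCE objective for the discriminator is sketched below, with real samples labeled 1 and generated samples labeled 0; the toy probability values are purely illustrative.

```python
import numpy as np

def bce(labels, probs, eps=1e-12):
    """Binary cross-entropy: -mean[y*log(p) + (1-y)*log(1-p)]."""
    probs = np.clip(probs, eps, 1 - eps)   # guard against log(0)
    return float(-np.mean(labels * np.log(probs)
                          + (1 - labels) * np.log(1 - probs)))

labels = np.array([1.0, 1.0, 0.0, 0.0])        # real, real, fake, fake
confident = np.array([0.9, 0.8, 0.1, 0.2])     # discriminator nearly right
uncertain = np.array([0.5, 0.5, 0.5, 0.5])     # discriminator guessing
# A discriminator that separates real from fake well incurs a lower loss.
print(bce(labels, confident) < bce(labels, uncertain))  # True
```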
In conclusion, the proposed system for image dehazing of multispectral images
using contrastive learning in CycleGAN [17] represents a significant contribution
to the field of image dehazing. The proposed system outperforms several existing
methods, provides a new approach to image dehazing, and has practical implications
for a wide range of applications. The suggested system may be further optimized in the future, perhaps with enhancements to the CycleGAN [17] and contrastive learning components and the use of more challenging datasets to assess the system's performance. In general, the suggested strategy is poised to significantly advance image dehazing techniques and improve both the quality and clarity of images across a range of applications.
References
1. Li R, Pan J, Li Z, Tang J (2018) Single image dehazing via conditional generative adversarial
network. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 8202–8211
2. Zhang X, Wang J, Wang T, Jiang R (2021) Hierarchical feature fusion with mixed convolution
attention for single image dehazing. IEEE Trans Circuits Syst Video Technol 32(2):510–522
3. Liu Q, Gao X, He L, Lu W (2017) Haze removal for a single visible remote sensing image. Signal Process 137:33–43
4. Chen C, Do MN, Wang J (2016) Robust image and video dehazing with visual artifact suppres-
sion via gradient residual minimization. In: Computer vision–ECCV 2016: 14th European
conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part II 14.
Springer International Publishing, pp 576–591
5. Xie B, Guo F, Cai Z (2010) Improved single image dehazing using dark channel prior and
multi-scale Retinex. In: Intelligent system design and engineering application (ISDEA), 2010
international conference on, vol 1. IEEE, pp 848–851
6. Narasimhan SG, Nayar SK (2003) Contrast restoration of weather degraded images. IEEE
Trans Pattern Anal Mach Intell 25(6):713–724
7. Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans Image Process 26(7):3142–3155
8. Xu H, Guo J, Liu Q, Ye L (2012) Fast image dehazing using improved dark channel prior.
In: 2012 IEEE international conference on information science and technology. IEEE, pp
663–667
9. Ren W, Liu S, Zhang H, Pan J, Cao X, Yang MH (2016) Single image dehazing via multi-
scale convolutional neural networks. In: Computer vision–ECCV 2016: 14th European confer-
ence, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part II 14. Springer
International Publishing, pp 154–169
10. Nguyen LD, Lin D, Lin Z, Cao J (2018) Deep CNNs for microscopic image classification by
exploiting transfer learning and feature concatenation. In: 2018 IEEE international symposium
on circuits and systems (ISCAS). IEEE, pp 1–5
11. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of advances in neural information processing systems. Lake Tahoe, NV, USA, pp 1097–1105
12. Liu C, Hu J, Lin Y, Wu S, Huang W (2011) Haze detection, perfection and removal for high
spatial resolution satellite imagery. Int J Remote Sens 32(20):8685–8697
13. Moro GD, Halounova L (2007) Haze removal for high-resolution satellite. Int J Remote Sens
28(10):2187–2205
14. Monika LT, Rajiv S (2018) Efficient dehazing technique for hazy images using DCP and WAF. Int J Comput Appl 179(42):0975–8887
15. Makarau A, Richter R, Muller R, Reinartz P (2014) Haze detection and removal in remotely
sensed multispectral imagery. IEEE Trans Geosci Remote Sens 52(9):5895–5905
16. Qin M, Xie F, Li W, Shi Z, Zhang H (2018) Dehazing for multispectral remote sensing images
based on a convolutional neural network with the residual architecture. IEEE J Select Topics
Appl Earth Observ Remote Sens 11(5):1645–1655. https://doi.org/10.1109/JSTARS.2018.281
2726
17. He X, Ji W, Xie J (2022) Unsupervised haze removal for aerial imagery based on asym-
metric contrastive CycleGAN. IEEE Access 10:67316–67328. https://doi.org/10.1109/ACC
ESS.2022.3186004
Stress Analysis Prediction for Coma
Patient Using Machine Learning
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 421
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_36
422 P. Alwin Infant et al.
1 Introduction
Stress-related mental health illnesses are not unusual in the working class. In the Indian service sector, around 42% of current staff report suicidal thoughts and chronic anxiety as an outcome of overwork and demanding constraints, according to a recent study conducted by the industry body ASSOCHAM. Today, it is well known that stress at work contributes to depression. Increasing the number of stress indicators available to workers in busy situations would be very beneficial to society from a social standpoint [1]. Machine learning is defined as learning that is carried out through experience with a class of events and performance measures, and that improves with time [2]. According to the World Health Organization (WHO), harmful stress is a chronic mental condition which can disrupt a person's relationships, lead to despair, and ultimately result in suicide [3]. The ability to recognize stress can enable people to actively manage it before negative effects occur. Psychological tests and expert advice are used in traditional stress detection methods [4]. The stress measure is rather subjective because the findings of questionnaires depend heavily on the responses provided by individuals; the outcome scale would be skewed if people are hesitant to report their mental characteristics [5]. The more prevalent type of stress is distress. The other type, known as eustress, is also referred to as "good stress" because it arises from a "positive" perspective on an incident or circumstance [6]. Stress is beneficial in moderation since it can inspire you and increase your productivity, but our happiness and health may deteriorate if we consistently react negatively [7].
In the education sector, children experience stress due to test anxiety, while in the medical sector, patients experience stress due to the various diseases being treated [8]. With good mental health, an employee works constructively and accomplishes goals [9]. Biological variables as well as social and economic contexts are typically to blame for mental health issues. A student may benefit from the assistance of their family and other close friends [10]. Some life circumstances can be stressful, but whether they cause us problems depends on how we see them [11]. Stress among coma patients is seriously harmful to health [12]. By effectively merging numerous pieces of information from various domains, machine learning is able to predict future clinical outcomes and identify the domain combination that is most likely to predict results [13]. Since it is difficult for machines to distinguish between fatigue, anxiety, depression, and stress, an adequate learning algorithm is needed for a precise diagnosis [14].
According to the National Institute of Neurological Disorders and Stroke (NINDS), reflexes may also cause spontaneous motions such as a grin, laughter, or weeping [15]. In fact, machine learning enables us to enhance performance in the healthcare industry without explicit programming [16]. The inability to localize unpleasant sensations is predicted with precision [17]. A patient who is in a coma is not receptive to either internal or external stimuli [18]. Coma is an unresponsive state with closed eyelids that typically indicates serious and occasionally irreparable brain damage [19]. An organized strategy should be employed while treating a comatose patient, with an initial focus on reversible causes
2 Related Works
Bhushan et al. [26], "Review of Stress Prediction and Analysis Using Machine Learning": this work addresses the analysis of stress and its symptoms using a wide range of machine learning techniques, among them random forest, reflecting the growing demand for bioinformatics to collaborate with learning algorithms. Dutta et al. [27] proposed a method that uses a biosensor and machine learning to detect stress; the paper presents an overview of feasible computer classification models along with a comparison of them using healthcare data analytics. Bisht et al. [28] proposed machine learning for stress diagnosis in Indian school pupils; to examine one's frustration, K-nearest neighbors is utilized, resulting in an accuracy of 88%, and the method is compared with a variety of classification models, such as decision trees, logistic regression, K-nearest neighbors, and random forest. Pankajavalli et al. [29], "A Support-Vector Machine Classifier Based on Hybrid Enhancing Particle Swarm Optimization for Improved Stress Prediction": the suggested stress prediction model extracts the best characteristics using differential boost particle swarm optimization and the support vector machine (SVM), and achieves excellent accuracy with a lower prediction error rate according to the experimental data. Sudha et al. [30], "Stress Detection Using Machine Learning Techniques Depending on Human Parameters": in the proposed work, a machine learning algorithm is used to identify stress based on human parameters.
3 Methodology
Using Bayes' theorem, the naive Bayes algorithm is a supervised learning method for classification problems. In the proposed work, this procedure is employed to predict the patient's level of stress.
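The Bayes rule underlying this classifier can be shown with a small worked example; the probabilities below are invented purely for illustration and do not come from the paper's dataset.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes rule: P(stressed | symptom) = P(symptom | stressed) * P(stressed)
    divided by the total evidence P(symptom)."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Assumed prior stress rate 0.4; an indicator seen in 70% of stressed
# patients but only 20% of non-stressed ones.
p = posterior(prior=0.4, likelihood=0.7, likelihood_given_not=0.2)
print(round(p, 3))  # 0.7
```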
Step 4: Fit the random forest classifier model using the xtrain and ytrain parameters.
Step 5: Predict the output with the random forest classifier model using the xtest parameters.
Step 6: Find the accuracy value of the random forest classifier model using the xtest and ytest parameters.
Step 7: Determine the average accuracy of the KNN classifier model using the xtest and ytest parameters.
A decision tree is another example of supervised learning; here, a decision tree algorithm arranges the variables based on the person's level of anxiety.
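The fit/predict/accuracy steps above can be sketched with a dependency-free k-nearest-neighbour classifier standing in for the library models (scikit-learn would normally be used); the feature vectors and labels below are invented purely for illustration.

```python
from collections import Counter

def knn_predict(xtrain, ytrain, x, k=3):
    """Majority vote among the k nearest training points (squared L2)."""
    nearest = sorted(range(len(xtrain)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(xtrain[i], x)))[:k]
    votes = [ytrain[i] for i in nearest]
    return Counter(votes).most_common(1)[0][0]

def accuracy(xtrain, ytrain, xtest, ytest, k=3):
    """Fraction of test points predicted correctly."""
    preds = [knn_predict(xtrain, ytrain, x, k) for x in xtest]
    return sum(p == y for p, y in zip(preds, ytest)) / len(ytest)

# Toy feature vectors (e.g. heart rate, sleep hours) with stress labels.
xtrain = [(90, 4), (95, 3), (60, 8), (65, 7), (88, 5), (62, 9)]
ytrain = ["stressed", "stressed", "calm", "calm", "stressed", "calm"]
xtest = [(92, 4), (63, 8)]
ytest = ["stressed", "calm"]
print(accuracy(xtrain, ytrain, xtest, ytest))  # 1.0
```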
4 Results Analysis
This section discusses the outcome of the proposed work. The data visualization
for the proposed work, based on the plotted graph, is displayed in Fig. 2. The
counter graph of our data is shown in Fig. 3.
The confusion matrix and classification report for each algorithm in the
system are presented in Fig. 4a–e.
The prototype implementation compares the accuracy scores of all the
algorithms and plots the resulting graph, as depicted in Fig. 5.
The comparison of the existing and proposed methods is shown in Fig. 6.
The proposed system is compared with existing methods like KNN, Naive
Bayes, random forest, and decision tree. The accuracy of the proposed system
using logistic regression is 98%, which outperforms the conventional classification
methods. These results are noteworthy, but there are a few limitations that need to
be considered. First, although the machine learning model produced the
most trustworthy and consistent findings, the predictors all originated from the same
dataset. Real-time data, together with subsequent treatment and clinical indicators in
advanced-care settings, might make the model more accurate.
5 Conclusion
In conclusion, the phrase “altered mental status” covers a wide variety of patient
behaviors, from perplexity to profound non-responsiveness. The most frequent
causes of coma will be covered in this activity, along with the factors that should
be taken into account while choosing the best course of action. To assess the coma
patients’ stress levels, machine learning methods are used. When we examine the
accuracy scores of all the algorithms, we finally reach 98% for the logistic regres-
sion algorithm, which demonstrates superior performance when compared with other
system models. This model can be improved further by using a deep learning algo-
rithm and putting the user interface of the proposed work through its paces using a
Django web application. Alternatively, we construct this work using a Tkinter-based
Stress Analysis Prediction for Coma Patient Using Machine Learning 429
Fig. 4 a Confusion matrix of KNN algorithm, b Confusion matrix of decision tree algorithm,
c Confusion matrix of Naive Bayes algorithm, d Confusion matrix of logistic regression algorithm,
e Confusion matrix of random forest algorithm
graphical user interface. Human input can be entered manually, and after selecting
the predict button, an outcome based on system datasets is displayed.
References
1. Attallah O (2020) An effective mental stress state detection and evaluation system using a
minimum number of frontal brain electrodes. Diagnostics 10(5):292
2. Sophia G, Sharmila C (Sep 2019) Recognition, classification for normal, contact and cosmetic
iris images using deep learning. Int J Eng Adv Technol 8(3):4334–4340. ISSN: 2277-3878
3. Pankajavalli PB, Karthick GS, Sakthivel R (2021) An efficient machine learning framework
for stress prediction via sensor integrated keyboard data. IEEE Access 9:95023–95035
4. AlShorman O, Masadeh M, Heyat MBB, Akhtar F, Almahasneh H, Ashraf GM, Alexiou A
(2022) Frontal lobe real-time EEG analysis using machine learning techniques for mental stress
detection. J Integr
5. Gupta R, Alam MA, Agarwal P (2020) Modified support vector machine for detecting stress
level using EEG signals. Comput Intell Neurosci
6. Kang M, Shin S, Jung J, Kim YT (2021) Classification of mental stress using CNN-LSTM
algorithms with electrocardiogram signals. J Healthcare Eng 2021:1–11
7. Chung J, Teo J (2022) Mental health prediction using machine learning: taxonomy, applications,
and challenges. Appl Comput Intell Soft Comput 2022:1–19
8. Laijawala V, Aachaliya A, Jatta H, Pinjarkar V (2020, June) Classification algorithms
based mental health prediction using data mining. In: 2020 5th international conference on
communication and electronics systems (ICCES). IEEE, pp 1174–1178
9. Mutalib S (2021) Mental health prediction models using machine learning in higher education
institution. Turkish J Comput Math Educ (TURCOMAT) 12(5):1782–1792
10. Ogunseye EO, Adenusi CA, NawakwaUgwu AC, Ajagbe SA, Akinola SO (2022) Predictive
analysis of mental health conditions using AdaBoost algorithm. Paradigm Plus 3(2):11–26
11. Singer G, Golan M (2021) Applying data mining algorithms to encourage mental health
disclosure in the workplace. Int J Bus Inf Syst 36(4):553–571
12. Ge F, Li Y, Yuan M, Zhang J, Zhang W (2020) Identifying predictors of probable posttraumatic
stress disorder in children and adolescents with earthquake exposure: a longitudinal study using
a machine learning approach. J Affect Disord 264:483–493
13. Priya A, Garg S, Tigga NP (2020) Predicting anxiety, depression and stress in modern life
using machine learning algorithms. Proc Comput Sci 167:1258–1267
14. Leightley D, Williamson V, Darby J, Fear NT (2020) Identifying probable post-traumatic
stress disorder: applying supervised machine learning to data from a UK military cohort. J
Ment Health 28(1):34–41
15. Sathya D, Sudha V, Jagadeesan D (2020) Application of machine learning techniques in health-
care. In: Handbook of research on applications and implementations of machine learning
techniques. IGI Global, pp 289–304
16. Zulfiker MS, Kabir N, Biswas AA, Nazneen T, Uddin MS (2021) An in-depth analysis of
machine learning approaches to predict depression. Curr Res Behav Sci 2:100044
17. Srinivasan S, Bidkar PU (2020) Approach to a patient with Coma. Acute Neuro Care: Focused
Appr Neuro Emerg 23–34
18. Tarulli A, Tarulli A (2021) Coma and related disorders. Neurol Clin Appr 23–39
19. Rabinstein AA (2020) Acute coma. Neurol Emerg Pract Appr 1–13
20. Sujal BH, Neelima K, Deepanjali C, Bhuvanashree P, Duraipandian K, Rajan S, Sathiya-
narayanan M (2022, Jan) Mental health analysis of employees using machine learning
techniques. In: 2022 14th international conference on communication systems & networks
(COMSNETS). IEEE, pp 1–6
21. Consoli D, Galati F, Consoli A, Bosco D, Micieli G, Serrati C, Cavallini A (2021) Coma.
Decision algorithms for emergency neurology, pp 17–51
22. Pandwar U, Ramteke S, Motwani B, Agrawal A (2022) Comparison of full outline of unre-
sponsiveness score and glasgow coma scale for assessment of consciousness in children with
acute encephalitis syndrome. Indian Pediatr 59(12):933–935
23. Sahlan F, Hamidi F, Misrat MZ, Adli MH, Wani S, Gulzar Y (2021) Prediction of mental health
among university students. Int J Percept Cogn Comput 7(1):85–91
24. Lu H, Uddin S, Hajati F, Khushi M, Moni MA (2022) Predictive risk modelling in mental
health issues using machine learning on graphs. In: Australasian computer science week 2022,
pp 168–175
25. Tutun S, Johnson ME, Ahmed A, Al Bizri A, Irgil S, Yesilkaya I, Harfouche A (2022) An
AI-based decision support system for predicting mental health disorders. Inf Syst Front 1–16
26. Bhushan U, Maji S (2022, Nov) Prediction and analysis of stress using machine learning: a
review. In: Proceedings of third doctoral symposium on computational intelligence: DoSCI
2022. Singapore: Springer Nature Singapore, pp 419–432
27. Dutta A, Tripathy HK, Sen A, Pani L (2021) Biosensor for stress detection using machine
learning. In: Cognitive informatics and soft computing: proceeding of CISC 2020. Springer
Singapore, pp 85–97
28. Bisht A, Vashisth S, Gupta M, Jain E (2022, April) Stress prediction in Indian school students
using machine learning. IEEE
29. Pankajavalli PB, Karthick GS (2022) Improved stress prediction using differential boosting
particle swarm optimization based support vector machine classifier. Springer Singapore
30. Sudha V, Kaviya S, Sarika S, Raja R (2022) Stress detection based on human parameters using
machine learning algorithms
AI-Powered News Web App
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 433
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_37
434 S. Phatangare et al.
AI and its applications. It also examines our proposed system, which is an application
of AI, and discusses other technologies used in the system, such as speech-to-text
technologies. Along with this, details regarding the working and methodology of the
proposed system are also explained.
1 Introduction
Today, technology and data are used to inform a wide range of decisions we make in
our daily lives. This implies that for any industry to remain viable in the future, it must
adapt to and adopt these technologies. Although AI is frequently used in news media,
academic study on the subject has not reached its full potential. Speech recognition
is widely used in applications that respond to voice commands or answer questions
by accurately and automatically converting voice data into text data. Beyond that, NLP
is a computer program's ability to understand and react to text or speech input,
extract meaning from sentences, and compose understandable communications, much
as people do. This broad topic includes translation, categorization, and clustering,
as well as information extraction, among its subtopics. The latter describes the
procedure that turns unstructured data into structured data that can be understood.
1.1 Use of AI
Although AI is a fundamentally vast topic that is categorized according to
a number of factors, some of its well-known branches are explained below:
1. Neural networks: Neurology serves as the foundation for this field of artificial
intelligence. It makes use of cognitive science and automates activities
using Python. These may quickly and efficiently tackle difficult problems when
paired with ML.
2. Robotics: A hot area of artificial intelligence, robotics is primarily concerned with
the creation of robots. It is a multidisciplinary subject of science and engineering
in which the fundamental architecture (design of the system), manufacture, operation,
and usage of robots are primarily decided. Robots can consistently complete
activities that humans would find challenging.
3. Fuzzy Logic: Identifying whether a condition is true or false can sometimes be
quite challenging. Fuzzy logic has the capacity to represent and manipulate uncertain
information depending on how accurate it is. Here, 1.0 denotes absolute
truth, 0.0 absolute falsehood, and anything in between a degree of uncertainty.
4. Natural Language Processing (NLP): By creating strategies that facilitate
communication, NLP allows human languages to be represented. These systems
have the ability to recognize the meaning of a line of text and take the appro-
priate response. It primarily focuses on speech recognition, text translation, and
sentiment analysis. Many contemporary applications utilize it to filter out spam
and violent messages and to improve user experiences on their platforms.
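The graded truth values from the fuzzy logic item above can be sketched with the standard min/max operators; the "warm" and "fast" membership degrees are made-up example values:

```python
# Toy illustration of fuzzy truth values: degrees in [0.0, 1.0] instead of
# strict true/false. The membership degrees below are hypothetical examples.

def fuzzy_and(a, b):
    return min(a, b)   # standard (Zadeh) fuzzy conjunction

def fuzzy_or(a, b):
    return max(a, b)   # standard fuzzy disjunction

def fuzzy_not(a):
    return 1.0 - a     # fuzzy negation

# "The room is warm" holds to degree 0.7; "the fan is fast" to degree 0.4.
warm, fast = 0.7, 0.4
both = fuzzy_and(warm, fast)      # degree to which both hold
either = fuzzy_or(warm, fast)     # degree to which at least one holds
```

Under these operators, combining uncertain statements still yields a value in [0.0, 1.0], matching the description above.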
The above section highlights the importance of AI in contemporary society.
Recognizing its relevance, we have used this technology to build our proposed system.
What we are proposing is a news website where users can easily access news from
all news sites by simply searching a topic or a headline. This is accomplished using
the News API. To make the experience more convenient for users, we have imple-
mented voice processing techniques, so that the user need not type and can dictate
the command to run the search. A supporting chatbot is also implemented to help
users easily navigate the website.
One of the many crucial aspects of this study is the branch of natural language
processing. In order to fulfill a request, the programs are expected to comprehend the
user's voice. We have utilized Alan AI for the proposed system. Users may utilize
voice commands to get information from applications using Alan. In contrast to some
other voice assistants, Alan enables businesses to create custom speech experiences
for their apps. The front end is developed using ReactJS, an open-source,
JavaScript-based web framework developed and maintained by Facebook. It
has been around since 2013 and has served as the foundation for several outstanding
online apps. VS Code is a free code editor that can be used with macOS, Windows,
and Linux. It offers various excellent features, including first-class code completion,
code refactoring, and debugging assistance.
2 Literature Review
Edriss et al. [1] presented their study in a research article. Its major goal was to
use deep learning techniques for speech synthesis and evaluate how well they perform,
in terms of aperiodic distortion, compared with existing natural language
processing models.
We also read the research article presented by Sonali et al. [2]. The study describes
a voice-activated home automation system based on microcontrollers that uses smart-
phones. Users would be able to voice-control every gadget in their home with the
help of such a system. All the user needs is an Android smartphone, which almost
everyone has these days, and a control circuit.
We also studied the system developed by Han et al. [3]. Their goal was to create
a browser that would respond to voice commands. Using a microphone, a voice-
activated web browser digitizes spoken words, which are then fed into a speech-to-
text (STT) conversion tool. The STT converter application then chooses the set of
words that are the most likely from the stored commands. The program searches the
terms, and if a match is found, the specified action will be taken. The web browser
can be operated using special commands. The web browser receives these directives
as input, and as a result, the prescribed actions will be carried out. In the event that
the commands do not match, nothing will occur, and the user must try again.
In the paper written by David et al. [4], a voice-controlled system of surgical
equipment of the kind used in operating rooms to carry out surgical procedures is
presented. The invention includes a voice-controlled integrated surgical suite that
includes at least a surgical table and a surgical light-head device. As we studied many
applications of speech recognition, we came across another interesting application:
utilizing a voice-controlled device to prevent erroneous wake word detections [5],
along with a wireless communication system with voice control [6].
People can use voice commands to navigate the web using voice browsers.
Although plug-ins are used by the voice capability of the current browsers, these
plug-ins need to be downloaded separately because they are not a core part of the
browsers. Bajpei et al. [7] presented an approach to create a browser capable of
responding to voice commands.
After reviewing the papers listed above, we observed a few systems
that were similar to what we were proposing, but not entirely. The existing systems
were mostly browsers that utilized voice processing techniques to take commands
from the user to navigate the Internet. Other uses included software that assisted
in surgeries by using voice commands and a voice-command-based home automa-
tion system. Like these, our proposed system uses voice processing or voice
commands, but its application is completely different. Our system focuses only
on providing relevant news or information and will not perform generic
browser-navigation tasks.
3 Methodology
ReactJS and Material UI were used to implement the proposed system's front end.
ReactJS, used to create user interface components, was described earlier. The components
are reusable as often as the developer wants. React may be utilized with a variety of
frameworks on the client side as well as the server side. React was chosen for the
front end mostly because it enables us to build web applications with enormous
amounts of data and does not force us to reload the page every time data changes. It
also has huge community support. The application programming interface, or API,
is a messenger that receives requests and informs a system of what the user needs.
Applications may then communicate with one another through this connection to send
requests and get replies. An API is the actual back-end connecting engine between
several other apps (Fig. 1).
The API key of NewsAPI is integrated with the code on the Alan API. One may search for
and obtain accurate, up-to-date news from throughout the world using the NewsAPI [1].
Several query sets are accessible; one may look for news by keywords, genre,
or news source. The developer can tailor the news query to specific locations.
Using this HTTP REST API, the developer may obtain breaking news as needed. The
front end was created using the functionality of this API, and it has cards that display
news based on searches, categories, or news sources. By offering an SDK that is
simple to integrate and Alan Studio for JavaScript scripting, the Alan Conversational
Platform offers robust support for the app. The testing tool offered by Alan Studio
allows the developer to troubleshoot JavaScript instructions. The Alan button may
be dynamically positioned anywhere by swiping or sliding it with the mouse without
interfering with the application's user interface. Given that Alan Studio itself manages
the cloud, the platform is even more powerful; the cloud easily manages data security
and isolation (Fig. 2).
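A keyword query against NewsAPI's REST interface, as described above, might be formed as in the sketch below. The endpoint path follows NewsAPI's public documentation; the API key and parameter values here are placeholders, and no network request is actually sent:

```python
# Sketch of building a NewsAPI search request URL. The key and query values
# are placeholders; a real client would send this URL over HTTPS.
from urllib.parse import urlencode

BASE_URL = "https://newsapi.org/v2/everything"

def build_news_query(keyword, language="en", page_size=10, api_key="YOUR_API_KEY"):
    """Return the full request URL for a keyword search."""
    params = {"q": keyword, "language": language,
              "pageSize": page_size, "apiKey": api_key}
    return f"{BASE_URL}?{urlencode(params)}"

url = build_news_query("artificial intelligence")
```

The front end would render the articles from the JSON response of such a request as cards.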
The news requirement served as the foundation for the proposed system's
scripting. The script fully covers the voice assistant's reading of all the news headlines
that the user searched for. Any article that can be seen on the front end can be requested
to be opened so that the user can read it in detail. When a user requests access to a
certain article, the system leads them to that particular news article. Alan provides
some additional functions to the system, such as mathematical computations and small
talk (Fig. 3).
4 Method
The proposed system uses News API to fetch the required news articles. News API
is a RESTful API that allows you to search and retrieve live articles from all over
the web. It relies on several underlying technologies:
• HTTP: the fundamental protocol News API uses when communicating with clients.
• JSON: the format in which News API returns its data.
• Caching: to enhance performance, News API stores copies of items in memory so
that they can be accessed more rapidly.
• Rate limiting: to prevent abuse, News API restricts how many requests a
client may make in a given amount of time.
When you send a request to News API, it first looks in its cache to see if it already
has a copy of the requested article. If so, it retrieves the article from the cache and
returns it. If it does not already have a copy of the article, it downloads it from the
original website and caches it.
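The caching and rate-limiting behavior described above can be modeled with an in-memory dictionary cache and a fixed-window request counter. This is an illustrative sketch of the idea, not News API's actual implementation; the `NewsClient` class and `fetch_fn` hook are invented for this example:

```python
# Illustrative model of cache-then-fetch with fixed-window rate limiting.
# Not News API's real implementation; NewsClient is a hypothetical class.
import time

class NewsClient:
    def __init__(self, fetch_fn, max_requests=5, window_seconds=60.0):
        self.fetch_fn = fetch_fn            # downloads an article from its origin
        self.cache = {}                     # url -> article body
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.request_count = 0

    def get(self, url):
        # Serve from cache when possible, avoiding a fresh download.
        if url in self.cache:
            return self.cache[url]
        # Fixed-window rate limiting: reset the budget when the window rolls over.
        now = time.monotonic()
        if now - self.window_start >= self.window_seconds:
            self.window_start, self.request_count = now, 0
        if self.request_count >= self.max_requests:
            raise RuntimeError("rate limit exceeded")
        self.request_count += 1
        self.cache[url] = self.fetch_fn(url)   # download and cache the article
        return self.cache[url]

client = NewsClient(fetch_fn=lambda url: f"article from {url}", max_requests=2)
first = client.get("https://example.com/a")
second = client.get("https://example.com/a")   # served from the cache
```

The second call never touches the origin, which is exactly the performance benefit the text describes.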
Alan AI's speech-to-text system makes use of a deep learning model developed
using a sizable dataset of transcripts and audio recordings. The speech
recognition model learns to identify speech patterns associated with various
words and phrases, and the model is used to convert spoken words into text when a
user speaks to Alan AI. The specific algorithm employed by Alan AI is not publicly
available. It is most likely a variation of the hidden Markov model
(HMM) technique, which is covered in more detail later. Overall, Alan AI's speech-
to-text algorithm is quite precise; depending on the quality of the audio recording
and the accent of the speaker, it will probably be between 90 and 95% correct.
Text-to-speech (TTS) is an example of assistive technology that narrates digital
text aloud; another name for it is "read-aloud" technology. With the click or push of a
button, TTS can convert text on a computer or other digital device, such as a mobile
phone, PC, or tablet, into audio. Children who struggle with reading and people with
visual disabilities can both benefit greatly from TTS, and it can also aid children
in their writing, editing, and even focus.
P(A|B) = P(A, B) / P(B) (2)
2. The hidden Markov model (HMM), the alternative algorithm, essentially
moves in the other direction: HMMs look forward rather than backward. An
HMM algorithm employs probability and statistics to make an educated predic-
tion as to what will follow next without taking into account any information about
the prior state, in our case the words that came before the word in question.
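Equation (2) and the HMM's forward-prediction intuition can both be checked numerically. All the probabilities below are made up purely for illustration:

```python
# Numeric check of Eq. (2): P(A|B) = P(A, B) / P(B), with hypothetical values.
# Here A = "the next word is 'news'", B = "the previous word is 'breaking'".
p_joint = 0.03   # P(A, B): assumed joint probability
p_b = 0.05       # P(B): assumed marginal probability
p_a_given_b = p_joint / p_b

# The HMM intuition from the text: guess the most likely next word from
# transition probabilities alone, ignoring any earlier context.
transitions = {"breaking": {"news": 0.6, "point": 0.3, "bad": 0.1}}
next_word = max(transitions["breaking"], key=transitions["breaking"].get)
```

The conditional probability 0.6 and the highest transition weight both point to "news" as the forward prediction.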
B1 = X1 * f (3)
A = Sigmoid(B1) (4)
• Set the bias and weight matrices to random starting points, then apply a linear
transformation to the values.
B2 = W1^T · A + d (5)
O = Sigmoid(B2) (6)
Backward Propagation
We initialize the weights, biases, and filters at random before starting the forward
propagation procedure. These numbers are regarded as convolutional neural network
technique parameters. The model tries to modify the parameters during the backward
propagation phase so that the overall predictions are more accurate. The gradient
descent method is used to update these parameters.
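The forward pass of Eqs. (3)–(6) and a single gradient-descent update can be sketched for the scalar case as follows. The parameter values, target, and learning rate are arbitrary choices for illustration, not values from the paper:

```python
# Scalar sketch of the forward pass in Eqs. (3)-(6) plus one gradient-descent
# update. Variable names mirror the equations; all values are arbitrary.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Fixed "random" initial parameters: weights f and w1, bias d.
x1, f, w1, d = 0.5, 0.8, 1.2, -0.3
target = 1.0

# Forward propagation (Eqs. 3-6).
b1 = x1 * f                # Eq. (3): linear transformation of the input
a = sigmoid(b1)            # Eq. (4): hidden activation
b2 = w1 * a + d            # Eq. (5): second linear transformation plus bias
o = sigmoid(b2)            # Eq. (6): network output

# One backward-propagation step for w1 under squared error L = (o - target)^2.
grad_w1 = 2 * (o - target) * o * (1 - o) * a   # chain rule: dL/dw1
learning_rate = 0.1
w1 -= learning_rate * grad_w1                  # gradient-descent update
```

Since the output undershoots the target, the gradient is negative and the update increases w1, nudging the next prediction upward.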
Voice control is a key new component that is revolutionizing how people live. Voice
assistants are regularly used on both mobile phones and PCs. AI-based voice assistants are
operating systems that can recognize a human voice and respond with one of their built-
in voices [8]. Screen readers are frequently used by people with vision impairments
to engage with computer systems, and these people are increasingly using voice-based
virtual assistants (VAs) [9]. Nobody anticipated that speech recognition, which
first appeared in the early 2010s with the launch of Siri, now one of the most
well-known products, would become such a major factor in the development of new
technologies. According to estimates, one in six Americans now has a smart speaker
at home. This gives us just a glimpse of how far-reaching this technology is
and how many opportunities it offers.
In the case of speech-to-text technology, there are many technologies and models
present. Some popular applications/models include Siri, Google, Alexa, etc. We
compared these with the technology we used—Alan AI. All four of these speech-
to-text engines are remarkably close in terms of accuracy. There are some vari-
ations in how they function, though. For instance, while Alexa and Google use
deep neural networks (DNN), Siri uses a statistical model called a hidden Markov
model (HMM) to transcribe voice. HMMs and DNNs are both used by Alan AI. The
way these speech-to-text engines manage background noise is another distinction
between them. The speech-to-text engine in Google’s search engine is not as effec-
tive at handling background noise as Siri and Alexa are. When it comes to being able
to deal with background noise, Alan AI falls in between Siri and Alexa. This has
been summarized in Tables 1 and 2.
These APIs all provide a comparable set of functionalities. The cost is where they
diverge most from one another. If you want a free API with high-quality data, News
API is a fantastic choice. With a straightforward RESTful interface, it is also
simple to use.
In the realm of speech recognition, one of the most significant predictions for
2020 was that people's search habits would alter. The conversion of touch points into
listening points is now affecting brands. There is no visual interface for voice assis-
tants. The technology will let the user engage both visually and vocally [10]. Users
will have a fully user-friendly experience and interact with the technology
more if they can see and hear the content they have requested. Voice recogni-
tion technology is developing globally, and in the next few years it is predicted to
grow and generate tremendous earnings. The system will add many functions
and capabilities, as it currently meets the majority of the requirements. An increase in
the number of languages the system can handle would make using it even more
comfortable and simple for our users.
7 Future Scope
We can expand the capability of our application to advance further. If we add more
languages to the software, such as Hindi, Marathi, Japanese, etc., our website would
be able to accept commands in many different languages and be utilized all over the world.
Machine learning can also be used to develop and integrate a recommendation system
that will categorize news and offer stories based on the user's interests. For example,
if a user requests positive news, our web app will deliver stories that fall under the
positive category, and vice versa. The overall GUI can also be upgraded and made
more user-friendly.
8 Conclusion
There were a variety of factors that motivated us to come up with and build this proposed
work. It started with an interest in AI and its current applications in the industry.
After researching, we observed that its application in journalism is quite limited and
has not been explored much. Reading a newspaper takes a significant amount of
time, and the majority of that time is often spent reading about subjects in which
the reader has absolutely no interest. Other times, the user gets sucked into reading
unnecessary things present in the paper, until they are able to get to the main point.
The software is able to read the headlines of all the articles in the news, and if the
user wants it, it can also read the entire piece for a more in-depth look. It fluently
reads out the entire article for the user, allowing them to listen to the news in comfort
and ease.
Applications similar to these do exist in the market, for example, Post for Alexa and
BBC News for Alexa. The difference between these apps and our proposed work
lies in the APIs and technology used. We believe that the STT in our system is better
than the ones utilized in those apps. Another difference is that those systems are apps,
whereas our system is a website and is accessible from any device. Along with this,
our system provides a chatbot for customer support. Fields such as
healthcare, business, finance, and electronic commerce are just some of the many
domains in which the Alan voice assistant might be used. With this, we
conclude that our proposed work is complete.
References
1. Edriss A (2020) Deep learning based NLP techniques. In: Text to speech synthesis for
communication recognition. J Soft Comput Paradigm
2. Shamik C, Sonali S, Raghav T, Ankita B (2015) Design of an intelligent voice controlled home
automation system. Int J Comput Appl 121(15):39–42
3. Han S, Jung G, MinsooRyu BUC, Cha J A voice-controlled web browser to navigate hierarchical
hidden menus of web pages in a smart-tv environment. In: Proceedings of the 23rd international
conference on World Wide Web
4. Holtz BE, Sanders WL, Hinson JR, Zelina FJ, Sendak MV, Logue LM, Belinski S, McCall DF
(1999) Voice-controlled surgical suite. US Patent No. US6591239B1, Issued Since
5. Freed IW, Barton W, Rohit P (2014) Preventing false wake word detections with a voice-
controlled device. Google Patent US9368105B1
6. Stephen B, Mickey K (2016) Voice controlled wireless communication device system.
US7957975B2, Google Patents, 2016-Present
7. Bajpei AB, Shaikh MS, Ratate NS Voice operated web browser. Int J Soft Comput Artif Intell
8. Siddesh S, Subhash AUS, Srivatsa PN, Santhosh B (2020) Artificial intelligence-based
voice assistant. In: 2020 Fourth world conference on smart trends in systems, security and
sustainability
9. Adam F, Vtyurina A, Morris MR, Leah F, White RW (2019) VERSE: bridging screen readers
and voice assistants for enhanced eyes-free web search. In: Proceedings of the 21st international
ACM SIGACCESS conference on computers and accessibility (ASSETS ‘19)
10. Aaditya C, Ranjeet K, Saini A, Akash K (2021) Voice controlled news web application with
speech recognition using alan studio. Int J Comput Appl 0975–8887
Design of Efficient Pipelined Parallel
Prefix Ladner-Fischer Based on Carry
Select Adder
Abstract Every digital processing system may execute basic operations like addition
and subtraction by using a number of binary adders with various addition durations
(latency), space constraints, and power requirements. The power-delay products
(PDP) of digital signal processor (DSP) components must be kept to a minimum
in order to achieve greater effectiveness on very large-scale integration plat-
forms. Adders that utilize the prefix operation for efficient addition are called
parallel prefix adders or carry tree adders. Because of their capability for
high-speed computing, parallel prefix adders are the most employed adders nowa-
days. Such an adder is referred to as a carry tree adder; it is much faster than ripple
carry adders, carry skip adders, carry select adders (CSA), etc., and performs arith-
metic addition using the prefix function. This research compares the effectiveness of
findings on the characteristics of area, latency, and power for a 32-bit implementa-
tion of several parallel prefix Ladner-Fischer adders. The Ladner-Fischer adder with
a black cell requires a lot of memory to operate. To increase the effectiveness of
the Ladner-Fischer adder, the gray cell can thus be used as a substitute for the black
cell. The prescribed technique's operations are divided into three primary steps: pre-
processing, generation, and post-processing. The pre-processing step comprises the
propagate and generate phases. The generation phase concentrates on carry gener-
ation, whereas the post-processing step concentrates on the outcome. This research
calculates the logic and routing delay effectiveness of the suggested architecture.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 445
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_38
446 J. Harirajkumar et al.
According to the findings of the study, a CSA with a parallel prefix adder performs
better than the traditional modified CSA and uses less space.
1 Introduction
Individuals can easily understand and use decimal numbers to execute mathe-
matical operations. For particular computations in digital architectures like a micro-
processor, a digital signal processor (DSP), or an application-specific integrated
circuit (ASIC), binary values are more feasible. This is because binary values are the
most effective at expressing a wide range of values. Binary adders are among the
most crucial logic components in digital systems [1]. The embedded
system is a subset of a number of computing architectures that are primarily intended
to do a single task, including preserving, analyzing, and managing information
across various electronics-based devices. Consumer gadgets like pagers, calculators,
mobile phones, portable video games, digital cameras, etc., are examples of items
that have embedded technology [2]. As handheld technology expands to include
high-performance, computation-intensive goods like handheld computers and mobile
phones, both of these factors are currently combined. The fundamental difficulty in
electronic structure design currently is low power consumption, a signif-
icant concern in the growing electronics semiconductor business [3]. It is vital to have a
fast adder since these adders can perform such tasks. When using digital adders, the pace
of addition is determined by how quickly a carry is forwarded by the adder. In a
basic adder, the sum for every bit is produced only progressively, after
the earlier bit location has been added and a carry has been transmitted into the
following position [4].
Every digital, DSP, or control system must have addition operations. Thus, the
efficiency of adders determines how quickly and precisely a digital system operates.
Therefore, the primary focus of studies in VLSI system layout is on enhancing addi-
tion effectiveness [6]. Previously, processors used 32-bit carry adders, including
the ripple carry adder (RCA), carry propagate adder (CPA), and carry look-
ahead adder (CLA), with various addition timings (delay), areas, and power
requirements [7].
2 Literature Review
of pipelined multiplication may be performed using the suggested way. Owing to the complexity of the design process, however, this strategy is inefficient [13]. Adders now play a notable role in microprocessors, appearing on the critical paths of arithmetic operations such as multiplication and subtraction. One study proposes an improved 4-bit CLA together with a Carry Select Adder (CSA) design technique; because it consumes more power and occupies more chip area, this method is likewise inefficient [14].
3 Proposed Methodology
To overcome the significant delay of existing carry adders, this study proposes a 32-bit variable parallel prefix adder structure for low-latency, low-power VLSI applications. To build fast wide adders, a CSA employs several narrow adders; once the carry-in is finally known, a 2-to-1 multiplexer simply selects the appropriate sum. Figure 1 depicts the top view of the carry select adder, and its schematic representation is shown in Fig. 2. An adder built with this methodology is referred to as a CSA. The proposed 32-bit variable parallel prefix adder structure can be used for fast arithmetic operations and data processing.
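The mux-based selection described above can be sketched in software. The block width, helper names, and the ripple-carry inner adder below are illustrative assumptions for the sketch, not details taken from the paper:

```python
def ripple_add(a, b, cin, width):
    """Plain addition of two `width`-bit values with a carry-in."""
    s = a + b + cin
    return s & ((1 << width) - 1), (s >> width) & 1

def carry_select_add(a, b, width=32, block=8):
    """Carry-select adder model: each block is added twice (for carry-in
    0 and carry-in 1); a 2-to-1 mux choice keeps whichever result
    matches the carry that actually arrives from the previous block."""
    carry, result = 0, 0
    mask = (1 << block) - 1
    for i in range(0, width, block):
        a_blk, b_blk = (a >> i) & mask, (b >> i) & mask
        sum0, c0 = ripple_add(a_blk, b_blk, 0, block)   # assume carry-in 0
        sum1, c1 = ripple_add(a_blk, b_blk, 1, block)   # assume carry-in 1
        s, carry = (sum1, c1) if carry else (sum0, c0)  # the 2-to-1 mux
        result |= s << i
    return result, carry
```

In hardware the two block additions run in parallel while the carry ripples between blocks, which is where the speed-up comes from; the loop above only models the selection logic.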
The Ladner-Fischer adder performs addition highly efficiently. It is a parallel prefix adder whose carry network is arranged as a tree that carries out the arithmetic operations. The Ladner-Fischer adder is built from black cells and gray cells: each black cell contains two AND gates and one OR gate, while each gray cell contains one AND gate and one OR gate. In a ripple-carry adder, every bit must wait for the action of the previous bit. In a parallel prefix adder, the goal is instead to overlap the carry propagation of one addition with the computation of the succeeding addition, rather than waiting for it, since recurring additions can be handled by a multi-operand adder.
The black cell takes four inputs (Gk0, Pk0, Pk1, Gk1) and uses three logic gates: one OR gate and two AND gates. Its outputs Gout and Pout are fed, together with Pk2 and Gk2, into the next black cell, and the procedure repeats until the desired result is reached. Because black cells consume more area and latency, they are replaced by gray cells wherever possible to reduce both. Figure 3 shows the black and gray cells in an 8-bit Ladner-Fischer adder.
In the pre-processing stage, generate and propagate signals are computed from every pair of input bits: generation performs an AND operation on the input bits, while propagation performs an XOR operation. The propagate and generate functions are given in Eqs. (1) and (2), respectively.
Pk = Yk XOR Zk, (1)
Gk = Yk AND Zk. (2)
During this stage, a carry is created for every bit, referred to as the carry generate (Ck(G)), and a carry propagate (Ck(P)) is formed for the carry that is propagated through each bit. A final cell in each stage provides the carry, and the propagate and generate carries are produced for the next stage.
For the black cell, Eqs. (3) and (4) give the carry propagation and carry generation, respectively, while Eq. (5) gives the carry generation of the gray cell:

Pout = Pk1 AND Pk0, (3)
Gout = Gk1 OR (Pk1 AND Gk0), (4)
Gout = Gk1 OR (Pk1 AND Gk0). (5)

The carry out of each bit is added into the succeeding bit at the same time as its own sum is formed; Eq. (5) provides the carry generation used for the succeeding bit's sum operation.
In the last stage of the efficient Ladner-Fischer adder, given in Eq. (6), the carry into each bit position is XORed with that position's propagate signal, and the result is presented as the sum:

Ok = Pk XOR Ck−1. (6)
The adder is used for 32-bit addition: before each bit outputs its final sum, it passes through the pre-processing and carry-generation stages. The input bits first undergo pre-processing, which produces the propagate and generate signals. These then pass through the carry-generation stage, which yields the carry generate and carry propagate signals, before the final result is produced. Figure 4 depicts the step-by-step operation of the efficient Ladner-Fischer adder (Figs. 5, 6 and 7).
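The three stages described above can be modeled as a short sketch. The sequential scan below stands in for the log-depth Ladner-Fischer tree that hardware would use (the black-cell operator is associative, so the order of combination does not change the result); function and variable names are illustrative:

```python
def prefix_adder(a, b, width=32):
    """Model of a parallel-prefix adder's three stages. The black-cell
    operator (g1 + p1*g0, p1*p0) is applied as a sequential scan here;
    hardware would arrange the same operator in a log-depth tree."""
    A = [(a >> k) & 1 for k in range(width)]
    B = [(b >> k) & 1 for k in range(width)]
    # Pre-processing: propagate p_k = a_k XOR b_k, generate g_k = a_k AND b_k
    p = [A[k] ^ B[k] for k in range(width)]
    g = [A[k] & B[k] for k in range(width)]
    # Carry generation: prefix combine with the black-cell operator
    G, P = [g[0]], [p[0]]
    for k in range(1, width):
        G.append(g[k] | (p[k] & G[k - 1]))  # one OR, one AND (generate)
        P.append(p[k] & P[k - 1])           # one AND (propagate)
    # Post-processing: sum_k = p_k XOR carry into bit k (carry-in = 0)
    carries = [0] + G[:-1]
    s = sum((p[k] ^ carries[k]) << k for k in range(width))
    return s, G[width - 1]  # (sum, carry-out)
```

Replacing the scan with a tree schedule changes only the wiring depth, not the outputs, which is why the black/gray-cell arrangement can be optimized freely for delay and area.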
4 Results
The designs were simulated at several levels of abstraction using Xilinx ISE Design Suite 14.2 and ModelSim, and the corresponding delay figures were recorded.
Table 1 lists the 32-bit parallel prefix adder variants: 32-bit Ladner-Fischer, 32-bit Han-Carlson, and 32-bit Ladner-Fischer with Register.
Fig. 7 Technology schematic of 32-bit Ladner-Fischer adder
The simulation findings are evaluated in terms of area, power, and latency. Figure 8 shows the simulated output of the 32-bit signed Ladner-Fischer adder, Fig. 9 shows the simulation result for the 32-bit Ladner-Fischer adder based on a carry select adder, and Fig. 10 shows the area synthesis result. The results also include the waveforms and comparative findings for all three parallel prefix adders.
Table 1 Area and delay comparative findings for several PPA categories (device: SPARTAN 3 S5000 4FG 900)

S. No  Method                               Slice  Gate count  LUT  Max delay (ns)  Gate delay (ns)  Path delay (ns)
1      32-bit Han-Carlson                   75     879         139  23.161          11.975           11.186
2      32-bit Ladner-Fischer                50     552         85   32.548          16.530           16.018
3      32-bit Ladner-Fischer with Register  48     790         82   7.165           6.364            0.801
Fig. 9 Simulation result for 32-bit Lander Fischer adder based on carry select adder
Table 2 Comparison of various adder performances in terms of delay, area, and power usage

References  Technique                                      Area (µm²)  Delay (ns)  Power usage (mW)
[2]         Brent-Kung adder (BKA)                         105         4.04        –
[12]        Kogge-Stone adder (KSA)                        100         8.63        32.02
[2]         Han-Carlson adder (HCA)                        145         4.322       –
[12]        Pasta adder (PA)                               70          7.123       30.04
–           Proposed parallel prefix Ladner-Fischer adder  48          0.801       24.80
[Delay, area, and power comparison graphs for BKA, KSA, HCA, PA, and the proposed PPLFA: overall delay, gate delay, and path delay per technique]
5 Conclusion
This study presents a method for designing a Ladner-Fischer adder that increases efficiency and reduces area. The proposed design operates in three stages: pre-processing, carry generation, and post-processing. The pre-processing stage computes the propagate and generate signals, the carry-generation stage produces the carries, and the post-processing stage yields the final result. To speed up binary addition, the number of cells in the carry-generation stage is reduced in a tree-like structure. The proposed adder is particularly effective at minimizing latency. VHDL under Xilinx 14.5 is employed to assess the effectiveness of these adders, and the simulation results confirm the benefits of the proposed technique. Its effectiveness is compared with that of other current methodologies, and the comparison demonstrates how substantially the proposed 32-bit parallel prefix Ladner-Fischer adder reduces latency.
References
1. Santhi G, Deepika G (2016) Realization of parallel prefix adders for power and speed critical
applications. Int J VLSI Syst Des Commun Syst (IJVDOS) 4(02)
2. Al-Haija QA, Asad MM, Marouf I, Bakhuraibah A, Enshasy H (2019) FPGA synthesis and
validation of parallel prefix adders. Acta Electronica Malaysia 3(2):31–36
3. Honda S (2019) Low power 32-bit floating-point adder/subtractor design using 50 nm CMOS
VLSI technology. Int J Innov Technol Explor Eng 8(10)
4. Arunakumari S, Rajasekahr K, Sunithamani S, Suresh Kumar D (2023) Carry select adder using
binary excess-1 converter and ripple carry adder. In: Lenka TR, Misra D, Fu L (eds) Micro and
nanoelectronics devices, circuits and systems, in Lecture notes in electrical engineering, vol
904. Springer, Singapore, pp 289–294. https://doi.org/10.1007/978-981-19-2308-1_30
5. Chandrika B, Krishna GP (2016) Implementation and estimation of delay, power and area for
parallel prefix adders
6. Galphat V, Lonbale N (2016) The high speed multiplier by using prefix adder with MUX and
Vedic multiplication. Int J Sci Res (IJSR), ISSN (Online), pp 2319–7064
7. Shinde KD, Badiger S (2015) Analysis and comparative study of 8-bit adder for embedded application. In: 2015 international conference on control, instrumentation, communication and computational technologies (ICCICCT), IEEE, pp 279–283. https://doi.org/10.1109/ICCICCT.2015.7475290
8. Moqadasi H, Ghaznavi-Ghoushchi MB (2015) A new parallel prefix adder structure with efficient critical delay path and graded bits efficiency in CMOS 90 nm technology. In: 2015 23rd Iranian conference on electrical engineering, IEEE, pp 1346–1351. https://doi.org/10.1109/IranianCEE.2015.7146426
9. Hebbar R, Srivastava P, Joshi VK (2018) Design of high speed carry select adder using modified parallel prefix adder. Procedia Comput Sci 143:317–324. https://doi.org/10.1016/j.procs.2018.10.402
10. Santhakumar C, Shivakumar R, Bharatiraja C, Sanjeevikumar P (2017) Carrier shifting algo-
rithms for the mitigation of circulating current in diode clamped MLI fed induction motor
drive. Int J Power Electron Drive Syst 8(2):844–852
11. Shivakumar R, Chandrasekar S, Sankarganesh R (2017) Signal characteristics for high voltage transformer applications. Int J Comput Theor Nanosci 14(4):1770–1777
12. Shivakumar R, Rangarajan M (2016) Small signal stability analysis of power system network using water cycle optimizer. Int J Emerg Technol Comput Sci Electron 20(2):297–302
13. Silviyasara T, Harirajkumar J (2016) Efficient architecture reverse converter design using Han
Carlson structure with carry look ahead adder. In: 2016 international conference on wireless
communications, signal processing and networking (WiSPNET) Mar 23. IEEE, pp 1620–1623
14. Silviyasara T, Harirajkumar J (2016) Efficient architecture reverse converter design using Han
Carlson structure with carry look ahead adder. In: 2016 international conference on wireless
communications, signal processing and networking (WiSPNET). IEEE, pp 1620–1623
Non-invasive Diabetes Detection System
Using Photoplethysmogram Signals
Abstract Diabetes Mellitus (DM) is a chronic condition, where the body is unable
to control blood sugar levels. In this paper, a non-invasive method to classify
diabetic and non-diabetic cases is discussed. The collection of blood samples from
patients by puncturing their fingers causes discomfort, pain, and infection, which are
serious drawbacks of commercially available invasive blood glucose level monitoring
systems. A novel non-invasive device for classifying blood sugar level is developed
using a Near-Infrared sensor (NIR). Photoplethysmogram (PPG) signals, which are
sensitive to blood glucose levels, are acquired using NIR sensors. ANOVA statistical
analysis is used to identify significant features from the PPG signal. Quadratic SVM
provided better results for features such as first derivative crest, time of inflection,
and age in classifying the diabetic and non-diabetic subjects. Results from real-time
PPG signals are compared with features taken from a diabetes dataset from Kaggle.
The results showed that the proposed PPG signal analysis method performed better.
1 Introduction
The metabolic disorder known as Diabetes Mellitus (DM) is characterized by the body’s
inability to control blood sugar levels (BGLs) [1]. The amount of glucose in the blood is
controlled by the hormone Insulin, which is made in the pancreas. Type 1 and Type 2 diabetes
are the two main subtypes of DM [1–3]. When the immune system destroys the pancreatic
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 457
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_39
458 D. Sathish et al.
cells that make insulin, chronic type 1 diabetes develops [4]. A chronic disease called type
2 diabetes affects how the body metabolizes sugar [5].
2 Literature Review
Although recording data with a smartphone produced promising results, the method is highly sensitive to skin color and finger size, which vary from person to person. Laser-light acquisition methods require bulky spectroscopy devices [1], whereas acquiring samples with a simple NIR sensor produced good results. Some researchers have also implemented diabetes detection using deep learning models [13, 19] and obtained satisfactory results, but making such models robust requires a large number of data samples. Because keeping the BGL under control requires regular monitoring, there is a need for a robust, cost-effective, portable, non-invasive diabetes monitoring system. The proposed method therefore attempts to develop a biomedical device usable by a layperson, based on an NIR sensor and machine learning algorithms.
3 Methodology Used
The DM detection system, shown in Fig. 2, comprises a DAC, an emitter (NIR light source), a detector, an ADC, and a microcontroller. The digital signal from the microcontroller is first converted to an analog signal by the DAC, whose output drives the NIR light source. The finger is placed between the emitter and the detector. When light is directed at the finger, part of it is absorbed by glucose molecules and the remainder is reflected and detected by the NIR detector. The detected voltage is then converted to digital form by the ADC. According to the Beer–Lambert law, as the glucose concentration increases, the intensity of reflected light decreases; this law is used to identify the amount of light absorbed by glucose molecules. The corresponding voltage (amount of light absorbed) is used to calibrate the model. Once calibrated, the model can estimate the glucose level non-invasively, and the person is classified as diabetic or non-diabetic based on the estimated value.
The MAX30102 NIR sensor integrates LEDs, photodetectors, optical elements, low-noise circuitry, and ambient light rejection, providing a complete system solution for mobile and wearable devices. A single 1.8 V supply powers its operation, with an additional 3.3 V supply for the internal LEDs. Communication is over a standard I2C-compatible interface. The module can be shut down through software with zero standby current, allowing the power supply to remain connected at all times.
The PPG signal depicts blood flow in a blood vessel: blood flows in waves as it travels from the heart to the fingertips and toes. Two important sets of parameters measured from the PPG are the systolic and diastolic amplitudes and timings [15].
The dataset contains raw PPG signals from 20 diabetes patients and 27 healthy participants who took part in the study. Data were collected using a single reflection-based PPG sensor (MAX30102) operating at 860 nm. An 18-bit ADC was used to
constantly record PPG sensor data for one minute. Data were recorded and then
saved in csv format for future use. While acquiring the PPG signal, features such
as height, weight, and age were also recorded. The subjects were advised to remain
calm and not to speak during collection of data. It has been noticed that when the
subject speaks or moves, the signal gets distorted due to noise. The entire processing
was done offline using MATLAB. From studies of PPG signals [16, 19], it is observed that the PPG signal of a non-diabetic subject has an elongated shape at diastole, whereas in diabetic subjects it is bell-shaped without a secondary peak.
Thirty-three features were extracted from the PPG signal as described in Table 1.
Extracted features were statistically analyzed and significant features were fed into
the SVM classifiers. Figure 3 shows the graphical representation of these features.
Features such as area, slope, and intervals, which represent changes in the PPG signal's contour, were extracted. Derivatives of the PPG signal were also computed; they describe changes related to the arterial stiffness caused by diabetes and can serve as a useful indicator.
Fourteen time-domain characteristics, as well as peak and trough values, were
retrieved from each PPG pulse and normalized. On each normalized PPG pulse, the
start, end, and peak points were determined. The time of start, finish, and peak are
denoted by the letters ‘Ts’, ‘Te’, and ‘Tp’, respectively. Other parameters derived
from the normalized PPG signal include systolic peak amplitude (Sp), diastolic
peak amplitude (Dp), area, AUR, AUD, Rarea, RI, RT, DT, and RRD. The first
and second derivatives were then computed for additional feature extraction. In total, 33 features were used to distinguish diabetic from non-diabetic subjects: 14 distinct features from the PPG signal itself, plus 8 and 11 features from its first and second derivatives, respectively.
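As an illustration of this kind of time-domain extraction, the sketch below computes a handful of features from a single normalized pulse. The function name, sampling rate, and exact feature definitions are assumptions for demonstration, not the paper's implementation:

```python
def pulse_features(pulse, fs=100.0):
    """Toy extraction of a few time-domain PPG features from one
    normalized pulse (a list of samples at sampling rate `fs` Hz).
    Names such as Sp/RT/DT mirror the paper's notation, but the
    definitions here are illustrative."""
    n = len(pulse)
    tp_idx = max(range(n), key=lambda i: pulse[i])  # systolic peak index
    sp = pulse[tp_idx]                   # systolic peak amplitude Sp
    ts, te = 0, n - 1                    # pulse start/end indices
    rt = (tp_idx - ts) / fs              # rise time RT (start -> peak)
    dt = (te - tp_idx) / fs              # decay time DT (peak -> end)
    area = sum(pulse) / fs               # area under the pulse
    # First derivative and its crest (one of the significant features)
    d1 = [(pulse[i + 1] - pulse[i]) * fs for i in range(n - 1)]
    return {"Sp": sp, "RT": rt, "DT": dt, "area": area,
            "d1_crest": max(d1)}
```

The second derivative and the remaining contour features would be computed in the same fashion before being passed to the statistical screening step.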
The Kaggle dataset comprises 767 samples, 500 non-diabetic and 267 diabetic, which were used for the analysis. Each sample consists of eight features: number of pregnancies, BGL, blood pressure (BP), insulin, BMI, diabetes pedigree function, skin thickness, and age.
A statistical t-test is conducted to distinguish diabetic from non-diabetic subjects, as shown in Table 2 [14]. BP was found to be less significant in distinguishing diabetic from non-diabetic subjects. An ANOVA test is conducted to determine the influence of interactions among the features.
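A per-feature screening of this kind can be sketched as a two-sample t-statistic. This is a generic pooled-variance version for illustration, not necessarily the exact test configuration used in the study:

```python
from math import sqrt

def t_statistic(x, y):
    """Two-sample t-statistic (pooled variance) of the kind used to
    screen a single feature for diabetic vs. non-diabetic separation:
    large |t| suggests the group means differ."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)   # sample variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled
    return (mx - my) / sqrt(sp2 * (1 / nx + 1 / ny))
```

In practice the statistic is compared against a t-distribution to get a p-value; features whose p-value falls below the chosen significance level are kept for the classifier.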
Significant features of the PPG signal selected by the ANOVA test are applied to various classifiers. The quadratic SVM performed better than the other classifiers; its training ROC is shown in Fig. 5. Forty-seven samples (20 diabetic and 27 non-diabetic subjects) were used for training.
From Fig. 5a, it is observed that 12% of non-diabetic samples are misclassified as diabetic, while 86% of diabetic samples are accurately diagnosed. Figure 5b shows the ROC curve for the negative class, where 14% of diabetic samples are misclassified as non-diabetic, while 88% of non-diabetic samples are accurately diagnosed. The area under the curve for the quadratic SVM is found to be 0.93.
Table 3 compares the performance of the real-time features extracted from PPG with the features from the Kaggle dataset; only the best-performing classifiers' results on the test data are shown (Table 4).
Since the number of real data samples acquired was small, conventional classifiers were tested for classifying diabetic and non-diabetic cases. An AUC of 0.93 was obtained, with an accuracy of 86.48% on training data acquired using the NIR sensor along with the other recorded features, and 77.06% on the Kaggle training dataset [14].
Fig. 5 ROC of quadratic SVM for a positive b negative class (training model)
Table 3 Performance of classifiers in classifying diabetic and non-diabetic subjects for PPG features and Kaggle dataset features (test set)

Metric       Quadratic SVM (significant PPG signal features) (%)  Medium-Gaussian SVM (Kaggle dataset features) (%)
Accuracy     80                                                   79.20
Sensitivity  100                                                  79.31
Specificity  60                                                   79.16
5 Conclusion
In this research, conventional classifiers for distinguishing diabetic from non-diabetic subjects were evaluated. First, the various methods for PPG signal acquisition were identified and explored; acquisition from smartphone video recording [17] and sensor-based techniques [11, 18] were the most prominent. The video-recording method was not used, on the advice of doctors at Father Muller Medical College, Mangaluru, since it is highly sensitive to skin color and finger size, which vary from person to person; suppressing these variations would require additional image-processing calibration. In the second approach, an NIR sensor was used to acquire the PPG signal from the subject's fingertip. Real-time data were acquired with the MAX30102 NIR sensor: a PPG signal of one-minute duration was collected, along with each subject's age, height, and weight. From the series of PPG pulses, one good pulse was selected based on its appearance, and features were then extracted from the normalized PPG signal and its derivatives. After feature extraction, various machine learning algorithms, including linear SVM, quadratic SVM, cubic SVM, and discriminant classifiers, were explored; the quadratic SVM showed the best training and testing accuracy.
References
1. Haxha S, Jhoja J (2016) Optical based non-invasive glucose monitoring sensor prototype. IEEE
Photonics J 8(6):1–11
2. Parte RS, Patil A, Patil A, Kad A, Kharat S (2020) Non-invasive method for diabetes detection
using CNN and SVM classifier. Int J Sci Res Eng Dev 3(3):9–13
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 469
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_40
470 K. Datchanamoorthy et al.
1 Introduction
The quality of water is essential for many reasons, as it directly impacts both human and environmental health. Polluted water can contain unsafe microorganisms, chemicals, heavy metals, and other contaminants that cause a range of waterborne diseases, such as diarrhoea, cholera, typhoid, and hepatitis. Maintaining good water quality is vital to prevent the transmission of these diseases and protect public health. E-waste policies [1] also play a vital role in protecting the environment and enforcing the law against polluters. Water quality directly affects the ecosystem and everyone's general well-being. Supplementary water sources, such as aquifers and the sea, have sometimes helped to address shortages; however, using groundwater without adequate recharge may cause land subsidence, while using seawater is often linked to the spread of toxins. The use of waterways has drawn particular attention: studies of streams around the world have led to the proposal of a design discipline known as stream engineering. Biological effects, silt mobility, water quality, and pollutant transmission mechanisms [2, 3] must all be considered when designing for in-stream conditions. Given the varied uses of water in daily life, it is crucial to ensure its availability, quality, and sustainable use. Water conservation, responsible water management practices, and efforts to protect water sources are necessary to secure this vital resource for current and future generations. Figure 1 illustrates how water is used in several aspects of daily living.
It is essential to choose a machine learning approach suited to the characteristics of the data, the particular water-quality prediction task, and the available computing resources. The work is organized as follows: Sect. 2 presents an extensive study of water quality, and Sect. 3 highlights the proposed prediction method.
Fig. 1 Applications of drinking water
Section 4 gives the implementation and the algorithms used. Section 5 presents the results and a comprehensive discussion of the outcome. Section 6 presents the conclusion and future research directions.
2 Related Research
One water quality prediction model employs the weighted arithmetic index method to determine the water quality index (WQI) using a principal component regression technique [4]. The dataset undergoes principal component analysis (PCA), from which the significant WQI parameters are extracted [5]. Regression techniques are then applied to the PCA result to predict the WQI. An experimental analysis on a dataset from Gulshan Lake evaluates the approach: the principal component regression method achieves a prediction accuracy of 95%, while the gradient boosting classifier [6, 7] achieves a classification accuracy of 100%, outperforming the latest models.
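The weighted arithmetic index method referenced here is conventionally computed as WQI = Σ wᵢqᵢ / Σ wᵢ, where each sub-index qᵢ scales a measured value against its permissible standard and the unit weight wᵢ is inversely proportional to that standard. A minimal sketch, with illustrative parameter handling:

```python
def weighted_arithmetic_wqi(measured, standard, ideal=None):
    """Weighted arithmetic water quality index:
        q_i = 100 * (V_i - V_ideal) / (S_i - V_ideal)
        w_i proportional to 1 / S_i
        WQI = sum(w_i * q_i) / sum(w_i)
    `measured` and `standard` map parameter name -> value; the default
    ideal value of 0 and any example values are illustrative."""
    ideal = ideal or {k: 0.0 for k in measured}
    w = {k: 1.0 / standard[k] for k in measured}         # unit weights
    q = {k: 100.0 * (measured[k] - ideal[k]) / (standard[k] - ideal[k])
         for k in measured}                               # sub-indices
    return sum(w[k] * q[k] for k in measured) / sum(w.values())
```

A WQI near 100 means the water sits at the permissible limits; lower is better under this convention, and parameters with stricter standards automatically receive larger weights.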
Sophisticated artificial intelligence (AI) technologies have been used to calculate the water quality index (WQI) and the water quality classification (WQC) [8, 9]. The long short-term memory (LSTM) deep learning technique and the nonlinear autoregressive neural network (NARNET) have been developed for WQI prediction, and the outcomes demonstrated that these models can accurately predict the WQI and identify the water standard with improved resilience. A neural network combining long short-term memory (LSTM) with an improved grey relational analysis (IGRA) method has also been suggested [10]. First, IGRA is used to select water-quality features based on similarity and proximity, taking into account the overall multidimensional association of the information. Second, an LSTM-based water quality forecaster, whose inputs are the attributes selected by IGRA, is built to exploit the temporal sequence of the water quality information. Lastly, the methodology is tested on two real-world water quality datasets, from Victoria Bay and Tai Lake. According to the experimental results, the model achieves superior water quality prediction by fully exploiting the multiplex relationships and patterns within the water quality information, and is more effective than single-feature or non-sequential prediction methods.
Using data gathered from four monitoring sites along the La Buong River between 2010 and 2017, twelve different machine learning (ML) models were evaluated for developing a water quality index (WQI) [11, 12]. The extreme gradient boosting (XGBoost) method was found to be the most accurate and efficient (R² = 0.989 and RMSE = 0.107), although all twelve models demonstrated useful predictive accuracy for the WQI. This study highlights the potential of ML models, and XGBoost in particular, for precise WQI forecasting and for advancing better water quality management techniques.
3 Proposed Method
The proposed voting-classifier model (Fig. 2) is used to develop a machine learning model that categorizes water quality. Data collection is the first step, during which information is gathered from earlier water samples. Data analysis then identifies the dependent and independent attributes. The appropriate machine learning approaches, explained below, are applied to the dataset to capture the trend in the data. After experimenting with a number of algorithms, the most effective method is employed to predict the outcome.

System Architecture

See Fig. 2.
Data Wrangling
In this phase, the loaded data is checked, and the dataset is pruned and purified in preparation for analysis. It is essential to record the procedures followed during cleaning and to justify the decisions made in refining the data.
Data Collection
To forecast from the given data, the collected dataset is divided into two components: the training set and the test set, typically in a 7:3 ratio. The model is then fitted on the training set, and its accuracy is estimated from predictions on the test set.
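The 7:3 split described above can be sketched as follows; the seed and helper name are illustrative:

```python
import random

def split_70_30(samples, seed=42):
    """Shuffle a dataset and split it into the 7:3 train/test ratio
    described above. A fixed `seed` makes the split reproducible."""
    rng = random.Random(seed)
    data = list(samples)
    rng.shuffle(data)            # randomize order before splitting
    cut = int(0.7 * len(data))   # 70% boundary
    return data[:cut], data[cut:]
```

Shuffling before the cut matters: if the source file is ordered by class (e.g. all potable samples first), an unshuffled split would put one class almost entirely in the training set.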
Data Preprocessing
Missing data points render the dataset unreliable. To boost the algorithm's effectiveness and produce superior outcomes, the inputs must be preprocessed. Data cleansing then requires less work, and more time can be devoted to research and modelling [15, 16].
Data Visualization
Data visualization offers a crucial set of tools for acquiring qualitative insight [17]. Visualizations convey important relationships through graphs and charts, which viewers often find easier to understand and appreciate than complex correlation or importance metrics, even with limited subject knowledge.
4 Algorithm Implementation
Random forest is a popular machine learning technique (Table 1) for classification and regression problems. Each decision tree is trained on a distinct subset of the data and features, which lessens overfitting and strengthens the model against anomalies and noisy data. The approach [18] can also deliver feature-importance scores, which identify the most influential components of the data; its accuracy can be determined as seen in the results below.
data. Then, for each class, the conditional probability given each characteristic of the input is computed, and the sample is assigned to the class with the greatest probability [20, 21]. Naive Bayes is recognized for being simple to use, fast to train, and able to handle massive amounts of data; its performance, however, can suffer from sensitivity to irrelevant or correlated variables.
Voting classifiers are ensemble learning approaches that combine different machine learning techniques to produce predictions. A voting classifier works (Table 5) by combining the forecasts of numerous different models and selecting the most frequently predicted class as the final outcome. Voting classifiers (Fig. 3) are typically used to boost prediction accuracy and stability, and they are especially useful when training data is scarce or when the individual models have different strengths and weaknesses [24–26]. Models that can be combined in a voting classifier include neural networks, decision trees, and support vector machines.
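The hard-voting combination described here can be sketched in a few lines. The stand-in models below are placeholders for the trained classifiers discussed in the text (random forest, CatBoost, naive Bayes, and so on):

```python
from collections import Counter

def hard_vote(predictions):
    """Majority (hard) vote over the labels predicted by several models
    for one sample; ties resolve to the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]

def voting_classify(models, x):
    """Combine arbitrary fitted models by voting. Each `model` is any
    callable returning a class label for input `x` (an illustrative
    stand-in for a trained classifier's predict method)."""
    return hard_vote([m(x) for m in models])
```

Soft voting, which averages class probabilities instead of counting labels, is the usual alternative when every base model can emit calibrated probabilities.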
Table 6 compares the various algorithms with the proposed voting-classifier algorithm.
As a result, the intended outcome in the prior system model was not realized. The
accuracy of the current system is 62.5%. When the model was implemented, the
final result indicates whether the water is drinkable or not, and we have attained an
accuracy of 74.5% with numerous other new algorithmic advances.
The work makes use of hybrid machine learning techniques, which incorporate
different algorithms (Fig. 4). These algorithms include logical regression, Cat Boost,
Naive Bayes, random forest, and logical regression. The objective of this project
is to increase accuracy from the previous models’ maximum of 62–80% using a
combination of algorithms. Following the implementation of the aforementioned
methods, the accuracy of the random forest algorithm was 67.3%, that of the logical
regression algorithm was 59.4%, that of the Cat Boost algorithm was 69.3%, and that
of the Naive Bayes algorithm was 60.4%. The accuracy of these separate algorithm
types was subsequently raised to 74.5% by using a voting classifier. To increase the
model’s accuracy, it is advantageous to apply hybrid machine learning techniques.
Combining the algorithms exploits the advantages of each while compensating
for its shortcomings. The success of this research may be due to the
appropriate choice and blending of the algorithms utilized, as the accuracy of 74.5%
attained is a major improvement over the prior models.
478 K. Datchanamoorthy et al.
6 Conclusion
In this research, a novel method is proposed to predict the water quality. The propo-
sition of water quality predictions using hybrid algorithms is rooted in the recog-
nition of the critical importance of ensuring clean and safe water sources for both
human well-being and environmental preservation. The adoption of hybrid algo-
rithms, which combine the strengths of multiple prediction models and data-driven
techniques, represents a ground-breaking approach to address these challenges. The
numerical results show that the proposed model shows a better trade-off in contrast to
existing approaches. The prediction accuracy of this model is 74% which is compar-
atively higher than the other models. Moreover, the proposed model achieves better
results for other metrics like sensitivity, specificity, MCC, FPR, FAR, and AUC. The
learning approaches show the high significance of the prediction model. Hence,
the hybridization of learning approaches is a promising direction for upcoming research works.
References
4. Bouamar M, Ladjal M (2008) Evaluation of the performances of ANN and SVM techniques
used in water quality classification, pp 1047–1050. https://doi.org/10.1109/ICECS.2007.451
1173
5. Al-Odaini N, Zakaria M, Zali M, Juahir H, Yaziz M, Surif S (2011) Application of chemometrics
in understanding the spatial distribution of human pharmaceuticals in surface water. Environ
Monit Assess 184:6735–6748. https://doi.org/10.1007/s10661-011-2454-3
6. Sinha K, Das P (2015) Assessment of water quality index using cluster analysis and artificial
neural network modeling: a case study of the Hooghly River basin, West Bengal, India. Desalin
Water Treat 54:28–36. https://doi.org/10.1080/19443994.2014.880379
7. Aldhyani T, Al-Yaari M, Alkahtani H, Maashi M (2020) Water quality prediction using artificial
intelligence algorithms. Appl Bionics Biomech 2020:1–12. https://doi.org/10.1155/2020/665
9314
8. Berry M, Mohamed A (2020) Supervised and unsupervised learning for data science. https://
doi.org/10.1007/978-3-030-22475-2
9. Choubin B, Khalighi-Sigaroodi S, Malekian A, Kişi Ö (2016) Multiple linear regression, multi-
layer perceptron network and adaptive neuro-fuzzy inference system for forecasting precipi-
tation based on large-scale climate signals. Hydrol Sci J 61(6):1001–1009. https://doi.org/10.
1080/02626667.2014.966721
10. Diamantopoulou M, Papamichail D, Antonopoulos V (2005) The use of a neural network
technique for the prediction of water quality parameters. Oper Res Int J 5:115–125. https://doi.
org/10.1007/BF02944165
11. Rogier A, Donders T, van der Heijden GJMG, Stijnen T, Moons KGM (2006) Review: a
gentle introduction to imputation of missing values. J Clin Epidemiol 59(10):1087–1091. ISSN
0895-4356. https://doi.org/10.1016/j.jclinepi.2006.01.014
12. Seifeddine M, Bradai A, Bukhari S, Quang PTA, Ben Ahmed O, Atri M (2020) A survey on
machine learning in internet of things: algorithms, strategies, and applications. Internet Things.
https://doi.org/10.1016/j.iot.2020.100314
13. Shepur R, Shivapur A, Kumar P, Hiremath C, Dhungana S (2019) Utilization of water quality
modeling and dissolved oxygen control in river Tungabhadra, Karnataka (India). OALibJ 06:1–
17. https://doi.org/10.4236/oalib.1105397
14. Othman F, Alaaeldin ME, Seyam M, Najah A-M, Teo FY, Chow MF, Afan H, Sherif M, El-
Shafie A (2020) Efficient river water quality index prediction considering minimal number
of inputs variables. Eng Appl Comput Fluid Mech 14:751–763. https://doi.org/10.1080/199
42060.2020.1760942
15. Hildenbrand ZL, Carlton DD Jr, Fontenot BE, Meik JM, Walton JL, Taylor JT, Thacker JB,
Korlie S, Shelor CP, Henderson D, Kadjo AF, Roelke CE, Hudak PF, Burton T, Rifai HS,
Schug KA (2015) A comprehensive analysis of groundwater quality in the Barnett Shale region.
Environ Sci Technol 49(13):8254–8262. https://doi.org/10.1021/acs.est.5b01526. Epub 26 Jun
2015. PMID: 26079990
16. Chen T, Chen H, Liu R-W (1995) Approximation capability in by multilayer feedforward
networks and related problems. IEEE Trans Neural Netw 6:25–30. https://doi.org/10.1109/72.
363453
17. Hornik K, Stinchcombe MB, White HL (1989) Multilayer feedforward networks are universal
approximators. Neural Netw 2:359–366
18. Kingma D, Ba J (2014) Adam: a method for stochastic optimization. In: International
conference on learning representations
19. Lau J, Hung WT, Cheung CS (2009) Interpretation of air quality in relation to monitoring
station’s surroundings. Atmos Environ 43:769–777. https://doi.org/10.1016/j.atmosenv.2008.
11.008
20. Liu P, Wang J, Sangaiah AK, Xie Y, Yin X (2019) Analysis and prediction of water quality
using LSTM deep neural networks in IoT environment. Sustainability (Switzerland) 11. https://
doi.org/10.3390/su1102058
21. Salari M, Salami E, Afzali SH, Ehteshami M, Gea OC, Derakhshan Z, Sheibani S (2018)
Quality assessment and artificial neural networks modeling for characterization of chemical
and physical parameters of potable water. Food Chem Toxicol 118. https://doi.org/10.1016/j.
fct.2018.04.036
22. Liu Y, Zhao T, Ju W, Shi S (2017) Materials discovery and design using machine learning. J
Mater 3. https://doi.org/10.1016/j.jmat.2017.08.002
23. Shukla P (2020) Development of fuzzy knowledge-based system for water quality assessment
in river Ganga. https://doi.org/10.1007/978-981-15-3287-0_2
24. Ma C, Zhang H, Wang X (2014) Machine learning for big data analytics in plants. Trends Plant
Sci 19. https://doi.org/10.1016/j.tplants.2014.08.004
25. Ma J, Ding Y, Cheng J, Jiang F, Xu Z (2019) Soft detection of 5-day BOD with sparse matrix
in city harbor water using deep learning techniques. Water Res 170:115350. https://doi.org/10.
1016/j.watres.2019.115350
26. Rahman SS, Hossain M (2019) Gulshan Lake, Dhaka City, Bangladesh, an onset of continuous
pollution and its environmental impact: a literature review. Sustain Water Resour Manag 5:767–
777. https://doi.org/10.1007/s40899-018-0254-4
OCR-Based Ingredient Recognition
for Consumer Well-Being
Abstract Many companies utilize retail tactics to entice consumers into purchasing
their products, all the while obscuring the potential risks associated with certain
ingredients used in these products. This lack of transparency places consumers at risk
of unforeseen consequences. The development of mobile applications for recognizing
text from images and processing it for various purposes has become an increasingly
important research area. In this paper, we present an OCR-based Android application
designed to recognize ingredients on product labels and match them with a database
of toxic substances. The application uses the Google ML Kit to recognize text and the
Levenshtein algorithm to calculate the similarity between the recognized ingredients
with the database. The application retrieves information on toxic substances from
the Firebase database, which is used to validate the recognized ingredients. The
application also implements a user interface for displaying the list of toxic ingredients
found in the product, providing a warning to the user. Our application provides
a simple and effective method to check for toxic ingredients, ensuring consumer
safety.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 481
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_41
482 S. Kayalvizhi et al.
1 Introduction
In our fast-paced society, people are increasingly concerned about their health and
well-being. However, not all health products in the market are safe and reliable. This
makes it essential to have accurate information about potential hazards. Mobile apps
can provide quick access to this information. Optical character recognition (OCR)
technology scans ingredients from product labels to detect harmful substances. This
empowers consumers to make informed decisions about what they buy.
2 Motivation
Consumer safety is crucial, especially when it comes to products that may contain
harmful ingredients. Unfortunately, there are few software solutions available to iden-
tify these risks. Advanced tools are needed to help users make informed decisions.
In addition, user-friendly interfaces are necessary to provide clear and concise infor-
mation. This ensures that even users without technical expertise can easily navigate
the software. By prioritizing both software effectiveness and accessibility, we can
create solutions that benefit a wider range of users.
3 Literature Survey
See Table 1.
4 Existing Systems
Existing systems for identifying food ingredients and nutritional information have
limitations. FoodWiki uses semantic search and concept matching to provide infor-
mation on food additives, but has a static database [1]. Open Food Facts is a
crowdsourced database that lacks quality control and does not provide interpreta-
tion of nutritional information. SmartLabel requires users to scan QR codes and its
information can be difficult to navigate [2].
Several algorithms, such as Mutha’s food detection and recognition system [6],
Ocay’s NutriTrack [7], and Mokdara’s personalized food recommendation system
[8] have been proposed to improve food information systems. OCR technology has
been analyzed by Hamad and Kaya [9], Lund et al. [10], and Lázaro et al. [11], while
Bhavani surveyed coding algorithms for medical image compression [12].
However, none of these systems match the proposed OCR ingredient scanning
application in terms of comprehensiveness, reliability, and accuracy.
5 Proposed System
The proposed system solves the issues that exist in current systems and aims to
provide optimal solutions for the problem at hand. The suggested system is a mobile
optical character recognition (OCR) application that reads text from a picture and
does a Levenshtein similarity analysis to assess if the elements in the image are
harmful or not. A distance computation technique called the Levenshtein algorithm
works to compare two strings. The technology recognizes the components from a
picture and compares them with a database of harmful substances in order to help
users assess the safety of ingredients in home items or cosmetics. The software
analyzes the safety of the components once the user snaps a picture of the product
label or ingredient list.
The system helps to identify harmful ingredients in products by taking a photo
of the label or ingredient list using a mobile app. It includes a user interface, image
processing, OCR, and a database. The app extracts text using Google’s ML Kit and
compares it with a database of dangerous substances using the Levenshtein algorithm.
If toxic ingredients are detected, a warning message is displayed. Firebase Realtime
Database stores toxic ingredient data. The system is a convenient way for consumers
to identify harmful ingredients (Fig. 1).
The Levenshtein algorithm, also referred to as the edit distance algorithm, provides
precise outcomes by determining the minimum number of single-character modifi-
cations needed to convert one string into another. This algorithm considers both the
order and the spelling of the characters in the strings, ensuring that it accommodates
minor variations in spelling or order. As a result, the Levenshtein algorithm is able
to accurately match short strings such as ingredients, even in cases where the strings
have minor spelling or order differences. Compared with other string comparison
algorithms, the Levenshtein algorithm yields accurate results.
\[
\operatorname{lev}(a, b) =
\begin{cases}
|a| & \text{if } |b| = 0,\\
|b| & \text{if } |a| = 0,\\
\operatorname{lev}(\operatorname{tail}(a), \operatorname{tail}(b)) & \text{if } a[0] = b[0],\\
1 + \min\bigl\{\operatorname{lev}(\operatorname{tail}(a), b),\ \operatorname{lev}(a, \operatorname{tail}(b)),\ \operatorname{lev}(\operatorname{tail}(a), \operatorname{tail}(b))\bigr\} & \text{otherwise.}
\end{cases}
\]
Fig. 2 Database (columns: Substance, Author, Year, Effect, Toxicity)
def levenshtein_similarity(s1, s2):
    # dp[A][B] holds the edit distance between s1[:A] and s2[:B]
    dp = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for A in range(len(s1) + 1):
        dp[A][0] = A
    for B in range(len(s2) + 1):
        dp[0][B] = B
    for A in range(1, len(s1) + 1):
        for B in range(1, len(s2) + 1):
            cost = 0 if s1[A - 1] == s2[B - 1] else 1
            dp[A][B] = min(dp[A - 1][B] + 1, dp[A][B - 1] + 1,
                           dp[A - 1][B - 1] + cost)
    max_len = max(len(s1), len(s2))
    return 1.0 - dp[len(s1)][len(s2)] / max_len
6 Module Description
The OCR-based ingredient scanning app has a simple user interface created using
the Android development framework with three components: two ImageViews and
a TextView. The Google ML Kit Vision Text Recognition API extracts text from
selected images, and the Firebase Realtime Database stores a list of known toxic
substances (Fig. 2).
If the similarity score calculated is above a threshold value (0.8 in this case), the
app considers the ingredient to be toxic and adds it to the list of toxic ingredients.
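The thresholding step can be sketched as follows. The find_toxic_ingredients helper, the sample database entries, and the use of Python's difflib ratio as a stand-in for the app's Levenshtein-based similarity are all illustrative assumptions:

```python
from difflib import SequenceMatcher

def find_toxic_ingredients(recognized, toxic_db, threshold=0.8):
    """Flag recognized ingredients whose similarity to any known
    toxic substance meets or exceeds the threshold."""
    flagged = []
    for ingredient in recognized:
        for toxin in toxic_db:
            # SequenceMatcher.ratio() stands in for the Levenshtein similarity
            score = SequenceMatcher(None, ingredient.lower(), toxin.lower()).ratio()
            if score >= threshold:
                flagged.append((ingredient, toxin, round(score, 2)))
                break  # one match is enough to warn the user
    return flagged

# Hypothetical OCR output matched against a hypothetical toxic-substance list
print(find_toxic_ingredients(["paraben", "aqua"], ["parabens", "formaldehyde"]))
```

The 0.8 threshold mirrors the value stated above; lowering it makes the matcher more tolerant of OCR errors at the cost of false positives.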
Our study demonstrated the effectiveness of the app in identifying harmful ingre-
dients and providing users with valuable information to make informed decisions
about the products they consume. The app has the potential to significantly reduce
the risk of exposure to harmful ingredients and promote healthier and safer consumer
habits.
Fig. 3 a Application icon. b Home screen. c Input image when scanning pancake syrup. d Harmful
ingredients list
In Fig. 3a, the user initiates a click on the application icon. Subsequently, the
user is redirected to the home screen depicted in Fig. 3b. Once an image is captured or
selected from the gallery and fed as input to the application in Fig. 3c, it
recognizes the text and performs Levenshtein's distance algorithm to detect harmful
ingredients. The harmful ingredients are then displayed in the card view in Fig. 3d.
Table 2 shows the results of product testing for harmful ingredient detection:
accuracy, exactness, recall, F1-measure, and speed. The products tested include
soaps, shampoos, facewashes, food items, and miscellaneous items.
Fig. 4 Accuracy
Figure 4 illustrates the accuracy of the tested products in detecting harmful ingre-
dients. Most products achieved a 100% accuracy rate, except for Lakme sunscreen,
which had lower accuracy due to its font impacting OCR recognition. Improving the
font could enhance Lakme’s accuracy and ratings.
Figure 5 focuses on the exactness scores, indicating how precisely the prod-
ucts detected harmful ingredients. The majority of the tested products showed high
exactness scores, ensuring accurate identification of potentially harmful components.
Nonetheless, Ponds product had a lower exactness score, implying that its detection
mechanism might have experienced some limitations or challenges.
Figure 6 highlights the retrieval scores of the tested products, which measure
how effectively they retrieved information about harmful ingredients. Most of the
products demonstrated good retrieval performance, successfully obtaining relevant
data. By optimizing the retrieval process, Lakme can enhance its ability to access vital
information regarding harmful ingredients, thus contributing to improved product
safety.
Figure 7 presents the F1-measure scores for the products, taking into account both
accuracy and recall. Again, most of the tested products displayed high F1-measure
scores, indicating excellent performance in detecting harmful ingredients. However,
both Ponds and Lakme sunscreens showed lower F1-measure scores. Addressing
this issue would be crucial in further enhancing the ability of Ponds and Lakme to
identify harmful ingredients effectively.
In Fig. 8, the speed of the tested products is evaluated, with the focus on how
quickly they performed the harmful ingredient detection process. Interestingly, all
the products exhibited lower speed scores. This suggests that there is room for
improvement in the efficiency of the detection process across all tested items.
Fig. 5 Exactness
Fig. 6 Retrieval
Fig. 7 F1-measure
Fig. 8 Speed
8 Conclusion
9 Future Scope
The project using OCR and string comparison to detect harmful ingredients has
potential for future development in terms of improving algorithms for ingredient
analysis using machine learning and natural language processing. The system could
also expand to cover a wider range of consumer products and integrate with existing
regulatory frameworks. There is significant potential for improving consumer safety
and promoting transparency in the products we use daily. Future work includes
expanding the database, dynamically updating it, improving the matching algorithm,
and integrating the system with other applications or services.
References
1. Çelik Ertuğrul D (2016) FoodWiki: a mobile app examines side effects of food additives via
semantic web. J Med Syst 40:41
2. Giménez-Arnau E (2019) Chemical compounds responsible for skin allergy to complex
mixtures: how to identify them? Cosmetics 6(4):71
3. Zulaikha R, Norkhadijah SI, Praveena SM (2015) Hazardous ingredients in cosmetics and
personal care products and health concern: a review. Public Health Res 5(1):7–15. https://doi.
org/10.5923/j.phr.20150501.02
4. Barnett J, Leftwich J, Muncer K, Grimshaw K, Shepherd R, Raats MM, Gowland MH, Lucas
JS (2011) How do peanut and nut-allergic consumers use information on the packaging to avoid
allergens? Allergy 66:969–978
5. Al-Saleh I, Al-Enazi S, Shinwari N (2009) Assessment of lead in cosmetic products. Regul
Toxicol Pharmacol 54(2):105–113
6. Mutha A, Patel K (2020) Food detection and recognition system. IJREAM 05(12):316–319
7. Ocay AB, Fernandez JM (2017) NutriTrack: Android-based food recognition app for nutri-
tion awareness. In: 2017 3rd IEEE international conference on computer and communications
(ICCC)
8. Mokdara T, Pusawiro P (2018) Personalized food recommendation using deep neural network.
In: 2018 seventh ICT international student project conference (ICT-ISPC)
9. Hamad K, Kaya M (2016) A detailed analysis of optical character recognition technology. Int
J Appl Math Electron Comput (Special Issue 1):244–249
10. Lund WB, Kennard DJ, Ringger EK (2013) Combining multiple thresholding binarization
values to improve OCR output. Presented in document recognition and retrieval XX conference
2013, California. SPIE, USA
11. Lázaro J, Martín J, Arias J (2010) Neuro semantic thresholding using OCR software for high
precision OCR applications. Image Vis Comput
12. Bhavani S, Thanushkodi K (2010) A survey on coding algorithms in medical image
compression. Int J Comput Sci Eng 2(5):1429–1434
Design and Implementation of an SPI
to I2C Bridge for Seamless
Communication and Interoperability
Between Devices
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 493
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_42
494 J. Harirajkumar et al.
1 Introduction
2 SPI
Motorola first created SPI, a protocol for synchronous serial transfer, in the
mid-1980s. It offers full-duplex communication over a four-wire interface. The SPI
architecture is founded on a master–slave model, with a single master device and
numerous slave devices. SPI supports data rates of up to 400 Mbps [1]. The four
pins listed below are present on every SPI device:
1. SCLK (Serial Clock): This pin is responsible for the generation of the clock
signal by the master.
2. MOSI: This pin is used by the master to transmit data to the slave device.
3. MISO: This pin is used by the master to receive data from the slave equipment.
4. SS (Slave Select): With this pin, the master can choose which specific slave
device it wants to communicate with.

The combinations of clock polarity (CPOL) and clock phase (CPHA) dictate the four
data transfer modes that SPI allows. These modes specify the timing relationship
between the clock signal and the data, and CPOL/CPHA pairs are a frequent way to
express the modes.
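The relationship between the CPOL/CPHA pair and the conventional mode number can be expressed compactly; the spi_mode helper below is our illustration, not part of the paper:

```python
def spi_mode(cpol, cpha):
    """Conventional SPI mode number: CPOL is the high bit, CPHA the low bit."""
    return (cpol << 1) | cpha

# Enumerate all four modes from their polarity/phase bits
modes = {(c, p): spi_mode(c, p) for c in (0, 1) for p in (0, 1)}
print(modes)
```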
Mode 0 (CPOL = 0, CPHA = 0): the clock idles low. Data is sampled on the leading
(rising) clock edge and shifted out on the trailing (falling) edge.
Mode 1 (CPOL = 0, CPHA = 1): the clock idles low. Data is shifted out on the leading
(rising) edge and sampled on the trailing (falling) edge.
Mode 2 (CPOL = 1, CPHA = 0): the clock idles high. Data is sampled on the leading
(falling) edge and shifted out on the trailing (rising) edge.
Mode 3 (CPOL = 1, CPHA = 1): the clock idles high. Data is shifted out on the leading
(falling) edge and sampled on the trailing (rising) edge.
SPI additionally supports two bit orderings, MSB first and LSB first. To enable
proper data transfer, the communicating devices have to agree on the identical
bit order. The clock rate (SCLK) supplied by the master controls the SPI
communication's speed. The clock frequency specifies
the rate at which data is transferred. The maximum clock frequency depends on the
capabilities of the devices and is typically specified in their datasheets. SPI supports
communication with multiple slave devices using individual chip select (CS) lines
(Fig. 1).
The master activates the specific slave’s chip select line to select it for commu-
nication. Each slave device has its own CS line, enabling selective communication
while others remain inactive. SPI does not include built-in error checking mecha-
nisms like checksums or acknowledgements. It assumes a reliable communication
channel between the master and slaves. If error detection and correction are required,
additional protocols or mechanisms can be implemented on top of SPI. In terms of
topology, SPI typically follows a bus configuration. The master device is connected
to multiple slave devices through shared data lines (MISO and MOSI) and a clock
line (SCLK). Each slave device has its dedicated chip select line (CS) to enable or
disable communication. SPI is commonly used in full-duplex mode, where data can
be simultaneously transmitted and received. However, it also supports half-duplex
mode, where the master and slave devices take turns transmitting and receiving data,
as well as simplex mode, where data flows in only one direction. Remember that
specific implementation details of SPI can vary between devices and manufacturers.
It’s recommended to consult the documentation and data sheets of the devices being
used for precise information about their SPI implementation (Fig. 2).
The Steps Outline the Communication Process in SPI
1. The clock signal (SCLK) is generated by the master and normally operates in
the MHz band.
2. At the beginning, the slave select line is high. The master must pull the
selected slave device's slave select line low in order for communication to
start.
3. With each SCLK clock pulse, the data stored in the MOSI and MISO shift registers
of the master and slave circuitry is pushed out and exchanged between them [2].
4. When all of the data bits, normally eight bits, have been sent, the master
pulls the SS line of the slave high to signify the end of the transmission. At
this point, the master holds the data originally kept by the slave, and the
slave holds the data originally registered in the master [3].
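The byte exchange in the steps above can be sketched as a behavioral simulation of a mode-0, MSB-first transfer. This is a software illustration of the shift-register mechanics, not part of the paper's hardware design:

```python
def spi_transfer_mode0(master_byte, slave_byte):
    """Simulate one 8-bit, MSB-first, mode-0 SPI exchange.

    Each clock pulse shifts one bit out of each shift register
    (MOSI/MISO) and into the other, so after eight pulses the
    master and slave have swapped bytes."""
    mosi_reg, miso_reg = master_byte, slave_byte
    for _ in range(8):
        mosi_bit = (mosi_reg >> 7) & 1   # master drives MOSI with its MSB
        miso_bit = (miso_reg >> 7) & 1   # slave drives MISO with its MSB
        # On the sampling edge, each side shifts in the bit it sees
        mosi_reg = ((mosi_reg << 1) & 0xFF) | miso_bit
        miso_reg = ((miso_reg << 1) & 0xFF) | mosi_bit
    # (byte now held by the master, byte now held by the slave)
    return mosi_reg, miso_reg

print(spi_transfer_mode0(0xA5, 0x3C))
```

After eight clock pulses the master ends up holding the slave's original byte and vice versa, which is exactly the full-duplex swap described in step 4.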
I2C
The synchronous serial communication protocol I2C, also known as Inter-Integrated
Circuit, was developed by Philips Semiconductor and first announced in 1982.
It uses a two-wire connection and can communicate in half-duplex [4]. The
architecture of I2C follows a master–slave model. I2C is a serial communication protocol
that utilizes a bus topology, enabling multiple devices to be connected on the same
bus. Every equipment on the bus has an individual address, and the protocol supports
both master and slave roles, with communication initiated by the master. The I2C
protocol requires only two wires: SDA (serial data) and SCL (serial clock).
These lines are open-drain, allowing any device to pull them low, while pull-up
resistors ensure they remain in the high state when not actively driven low. In an I2C
system, the master device controls communication by initiating data transfers and
generating the clock signal. Slave devices respond to commands or requests from
the master. The flow of communication is determined by the master through start
and stop conditions, which mark the beginning and end of data transfers. To
generate these conditions, the master toggles the SDA line while the SCL line is held high.
Each I2C device connected to the bus has a distinct 7-bit or 10-bit location that makes
it easier to communicate with particular slave devices (Fig. 3).
The 7-bit addressing scheme is commonly used, with the first seven bits of a
transmitted byte representing the device address. The path of data transport is then
indicated by a read/write bit. I2C offers three data transfer speeds: standard
mode (up to 100 kbps), fast mode (up to 400 kbps), and high-speed mode (up to 3.4 Mbps).
These modes define the timing characteristics, such as clock frequencies and data
setup/hold times. After each byte of data transfer, both the master and slave devices
send an acknowledgement bit (ACK) on the SDA line. If a device receives a byte
correctly, it pulls the SDA line low during the ACK bit. If a device does not acknowl-
edge, it leaves the SDA line high (NACK), indicating an error or the end of data
transfer. In I2C, slave devices have the ability to stretch the clock signal by holding
the SCL line low if they require additional time to process data. This feature allows
slower devices to synchronize their communication speed with faster devices. In a
multi-master configuration, where multiple masters attempt to communicate simul-
taneously, arbitration takes place. Each master monitors the bus and compares the
level on the SDA line with the level it is driving. If a conflict is detected, the master
relinquishes control, allowing the winning master to continue communication. The
I2C protocol also includes support for a general call address (0x00), which enables a
master to broadcast a command or data to all devices on the bus simultaneously. I2C is
extensively used for interconnecting a wide range of devices in various applications,
including embedded systems, consumer electronics, and communication interfaces.
Its simplicity and flexibility make it suitable for connecting sensors, EEPROMs, LCD
displays, real-time clocks, and numerous other peripherals. I2C supports different
data rates, including 3.4 Mbps in high-speed mode, 100 kbps in standard mode, and
400 kbps in fast mode. The subsequent two pins are included in every I2C device:
1. SCL: The master device generates the clock signal, typically operating in the
kHz range, on this line [5].
2. SDA: This line is responsible for carrying the serial data between the master and
slave devices.
In an I2C frame, there are typically 20 bits: the Start bit, a 7-bit address, an
R/W (Read/Write) bit, an acknowledgement (ACK) bit, 8 bits of data, another ACK
bit, and then the Stop bit [6] (Fig. 4).
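The frame layout above can be sketched by assembling the bit sequence in order. The build_i2c_write_frame helper and its 'S'/'P' markers for the Start and Stop conditions are illustrative assumptions:

```python
def build_i2c_write_frame(address, data, ack_addr=0, ack_data=0):
    """Assemble the bit sequence of a single-byte I2C write frame.

    Layout: Start, 7-bit address (MSB first), R/W bit (0 = write),
    ACK, 8 data bits (MSB first), ACK, Stop -> 20 bits in total.
    'S' and 'P' mark the Start and Stop conditions; an ACK value of
    0 means the slave pulled SDA low to acknowledge."""
    bits = ["S"]
    bits += [(address >> i) & 1 for i in range(6, -1, -1)]  # 7-bit address
    bits.append(0)                                          # R/W = 0 (write)
    bits.append(ack_addr)                                   # address ACK
    bits += [(data >> i) & 1 for i in range(7, -1, -1)]     # 8 data bits
    bits.append(ack_data)                                   # data ACK
    bits.append("P")
    return bits

# Hypothetical write of 0xA5 to a device at 7-bit address 0x50
print(build_i2c_write_frame(0x50, 0xA5))
```

Counting the elements confirms the 20-bit total: 1 + 7 + 1 + 1 + 8 + 1 + 1 = 20.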
Steps Outline the Communication Process in I2C
1. Ideally, the SCL and SDA signals are both high at the beginning.
2. To start the conversation, SDA transitions from high to low, followed by SCL
transitioning from high to low. This sequence represents the Start bit.
3. The clock signal is generated by the master via the SCL line, and every data
transfer among the devices on the SDA line is timed to this clock output [7].
4. The master transmits the address of the slave unit it wants to talk to. An
R/W bit is appended to the address to specify whether the action being performed
is a read or a write; the R/W flag is set to 1 for a read and 0 for a write [8].
Of all the slave devices linked to the master, only the slave device whose
address matches sends an ACK signal on the SDA line; the other slave devices
release the SCL and SDA lines.
5. The data is transmitted following the address. After the data transmission is
finished, the SCL line changes from low to high, and subsequently the SDA line
shifts from low to high. This sequence represents the Stop bit, which denotes
the end of communication [9].
Here Are Some Key Aspects and Features Associated with SPI to I2C Bridges
Proposed Work
Designing an SPI to I2C bridge in Verilog is a complex task; this section provides
a high-level outline of the proposed work. This is a simplified overview, and the
actual implementation may require additional steps and considerations depending
on the specific requirements.
Define Interface Parameters
Determine the clock frequency and data rates for both I2C and SPI interfaces.
Define the data widths for the input and output data buses.
I2C Slave Module
Implement an I2C slave module to receive data from the I2C master.
Incorporate the necessary logic to recognize start, stop, and data transmission
sequences.
SPI Slave Module
Implement an SPI slave module to receive data from the SPI master.
Include logic to handle various SPI modes and timing requirements.
Internal Data Handling
Create a buffer or FIFO to store the incoming data from both I2C and SPI interfaces.
Design logic to handle data flow and conversions between the two interfaces.
Address Mapping
Define address mapping for the bridge to enable reading and writing from specific
registers.
Implement logic to decode incoming addresses and route data accordingly.
State Machine
Design a state machine to manage the operation of the bridge.
This state machine will handle the transitions between I2C and SPI modes and
manage data flow.
Clock Domain Crossing
If the I2C and SPI interfaces operate on different clock domains, implement
appropriate clock domain crossing techniques.
Testing
Create a test bench to verify the functionality of the bridge.
Write test cases to cover various scenarios, including different data rates, read/
write operations, and error handling.
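The buffering and state-machine steps above can be sketched, at a purely behavioral level, in Python; a hardware implementation would of course be written in Verilog, and the SpiToI2cBridge class, its states, and its method names are illustrative assumptions:

```python
from collections import deque

class SpiToI2cBridge:
    """Behavioral sketch of the bridge: bytes received on the SPI
    side are buffered in a FIFO and drained onto the I2C side."""

    IDLE, RECEIVING, FORWARDING = range(3)

    def __init__(self, depth=16):
        self.fifo = deque(maxlen=depth)   # internal data buffer
        self.state = self.IDLE

    def spi_receive(self, byte):
        # The SPI slave module pushes each completed byte into the FIFO
        self.state = self.RECEIVING
        self.fifo.append(byte & 0xFF)

    def i2c_drain(self):
        # The state machine forwards all buffered bytes as I2C writes,
        # then returns to idle
        self.state = self.FORWARDING if self.fifo else self.IDLE
        out = list(self.fifo)
        self.fifo.clear()
        self.state = self.IDLE
        return out

bridge = SpiToI2cBridge()
bridge.spi_receive(0xAB)
bridge.spi_receive(0xCD)
print(bridge.i2c_drain())
```

Clock-domain crossing, address decoding, and I2C signalling are abstracted away here; the sketch only shows the data-flow contract between the two interface modules.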
3 Results
With the help of EDA Playground, a top-level design known as “SPI to I2C” was
constructed to analyze the simulation data and produce the stated design.
The I2C slave (the recipient) is represented by an instance of the I2C master called
I2C slave. Due to the reduced performance of the I2C slave during the acknowledgement
step, it must hold the I2C SCL line low in order to synchronize and transmit the
acknowledgement effectively. The thickening of the SCL line in the simulated
waveform serves as an indicator of this. The I2C SDA line is also bi-directional,
which results in a high state on the SDA line while the I2C master waits to receive
confirmation from the I2C slave; the simulation waveform shows this behaviour as
well. The outcome of the "SPI to I2C" top-level design simulation shows that the
data supplied by the SPI master on the MOSI line arrives successfully on the I2C
SDA line (Figs. 7, 8).
The serial peripheral interface (SPI) communication waveform is composed of
multiple clock cycles during which data is transferred between a master and a slave
device. It’s important to note that the specific timing and behaviour of the SPI wave-
form can vary based on the SPI mode and the devices involved in the commu-
nication. The clock frequency, clock polarity, and clock phase are configurable
parameters that need to be set consistently between the master and slave devices
for successful communication. The typical waveforms for writing data and for reading data over the I2C (Inter-Integrated Circuit) bus both follow the I2C protocol sequence (Figs. 9, 10).
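The clock polarity (CPOL) and clock phase (CPHA) parameters mentioned above combine into the four standard SPI modes; a minimal sketch:

```python
# CPOL sets the idle level of the clock; CPHA selects whether data is
# sampled on the first or second clock edge. Together they define the mode.

def spi_mode(cpol: int, cpha: int) -> int:
    """Return the standard SPI mode number for a CPOL/CPHA pair."""
    if cpol not in (0, 1) or cpha not in (0, 1):
        raise ValueError("CPOL and CPHA must each be 0 or 1")
    return (cpol << 1) | cpha

for cpol in (0, 1):
    for cpha in (0, 1):
        print(f"CPOL={cpol} CPHA={cpha} -> mode {spi_mode(cpol, cpha)}")
```

Master and slave must be configured with the same mode number, otherwise data is sampled on the wrong clock edge and communication fails.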
Design and Implementation of an SPI to I2C Bridge for Seamless … 503
4 Conclusion
Understanding the differences between SPI and I2C, including their wave forms and
usage scenarios, is essential when selecting the appropriate protocol for a specific
application. Both SPI and I2C have their own advantages and are suitable for different
applications. SPI is generally faster and more suitable for short-distance communi-
cations, while I2C is preferred for slower-speed communication and for connecting
multiple devices on the same bus.
I2C is a multi-master, synchronous, half-duplex protocol
that uses a shared bus, accommodating multiple devices. It offers a standardized
addressing scheme, enabling straightforward scalability in larger systems. With I2C,
fewer pins are required for communication, although it operates at slower data rates
compared with SPI. It finds common usage in interconnecting sensors, low-speed
peripherals, and integrated circuits. However, the I2C protocol may face limitations
regarding signal integrity and communication reliability due to constraints such as
maximum capacitance and bus length.
Overall, both the SPI and the I2C protocols have advantages and disadvantages,
and the choice of protocol should be made depending on the particular needs and
limitations of the task at hand.
References
1. Kiran V, Nagraj V (2013) Design of SPI to I2C protocol converter and implementation of
low-power techniques. Int J Adv Res Comput Commun Eng 2
2. Singh JK, Tiwari M, Sharma V (2012) Design and implementation of I2C master controller on
FPGA using VHDL. IJET 4(4)
3. Patnaik E, Bandewar S (2017) Design and realization for interfacing two integrated device
using I2C bus—a review
4. Trujilho L, Saotome O, Öberg J (2022) Dependable I2C communication with FPGA. In:
Proceedings of the 7th Brazilian technology symposium (BTSym’21) emerging trends in
systems engineering mathematics and physical sciences, vol 2. Springer
5. Warrier AS et al (2013) FPGA implementation of SPI to I2C bridge. Int J Eng Res Technol
(IJERT) 2(11)
6. Trivedi D et al (2018) SPI to I2C protocol conversion using Verilog. In: 2018 fourth international
conference on computing communication control and automation (ICCUBEA). IEEE
7. Levshun D, Chechulin A, Kotenko I (2018) A technique for design of secure data transfer
environment: application for I2C protocol. In: 2018 IEEE industrial cyber-physical systems
(ICPS). IEEE
8. Sahu A, Mishra RS, Gour P (2011) An implementation of I2C using VHDL for DATA
surveillance. Int J Comput Sci Eng (IJCSE) 3(5):1857–1865
9. Sajjanar S, Prabhakar MD (2020) A review on BIST embedded I2C and SPI using Verilog
HDL
10. Oudjida AK et al (2009) FPGA implementation of I2C & SPI protocols: a comparative study.
In: 2009 16th IEEE international conference on electronics, circuits and systems (ICECS 2009).
IEEE
11. Sabeenian D, Harirajkumar J, Akshaya B. Review paper of multipliers-driven perturbation of
coefficients for low power operation in reconfigurable FIR filter. Turk J Physiother Rehabil
32:2
Greenhouse Automation System Using
ESP32 and ThingSpeak for Temperature,
Humidity, and Light Control
Abstract There has been a lot of excitement around greenhouse automation systems
and their potential to improve growing conditions and yield. This project describes
how to build a Greenhouse Automation System using an ESP32 WiFi Board, an
AM2301 humidity and temperature sensor, an Actuator Relay Module for light
control, and the addition of ThingSpeak for data analysis. The key functions of a
Greenhouse Automation System are the continuous monitoring and adjustment of
environmental conditions such as temperature, humidity, and lighting. It is beneficial
to plant health and yield when these elements are assessed and regulated accurately.
With the system’s automated lighting based on user-defined schedules or prede-
fined criteria, plant growth and energy use may be optimized. Connectivity with
ThingSpeak, an IoT analytics platform, further streamlines the solution for data
visualization, storage, and analysis. ThingSpeak uses a cloud-based architecture to
store and interpret data in real time from sensors. Additionally, the solution for data
R. Singh (B)
Department of ECE, Uttaranchal Institute of Technology, Uttaranchal University,
Dehradun 248007, India
e-mail: drrajeshsingh004@gmail.com
A. Hussain
Medical Technical College, Al-Farahidi University, Baghdad, Iraq
e-mail: ahmeds909091@gmail.com
L. H. A. Fezaa
Department of Optical Techniques, Al-Zahrawi University College, Karbala, Iraq
e-mail: laithfizaa@gmail.com
G. Gupta
Department of CSE, Sharda University, Greater Noida, India
e-mail: ganeshgupta81@gmail.com
A. P. Singh
Department of CSE, IES Institute of Technology and Management, IES University, Bhopal, India
e-mail: research@iesbpl.ac.in
A. Dogra
Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
e-mail: ayush123456789@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 507
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_43
508 R. Singh et al.
1 Introduction
Finding the right protocol for developing certain systems, like greenhouses, is neces-
sary for the present IoT era when machine-to-machine contact is expanding. Agricul-
tural greenhouses are controlled environments where critical variables like lighting,
temperature, moisture, and soil pH level may be tracked using sensor systems that
follow IoT standards [1]. The design and deployment of a sensor network that is
wireless for tracking and managing greenhouse settings are discussed in the cited
paper. The technology seeks to increase performance and productivity by tracking
and managing greenhouse conditions. A smart greenhouse monitoring system has been created to address the issues farmers confront. This device is built around an Arduino ATmega328 microcontroller and contains a pump, a 12 V DC fan, an LCD, and several sensors for light, temperature, humidity, and soil moisture, including an LDR [2,
3]. Due to their incapacity to manage crucial elements essential to the growth and
production of plants, farmers frequently struggle to make a profit from their crops.
The greenhouse must be kept above a specified temperature, and crop humidity must
be controlled. Monitoring the pH of the soil is also essential for optimum development
[4].
An IoT-based system is created to monitor and regulate many factors in the farming
field, such as moisture, water level, and temperature. Efficient resource management
is essential. The system uses a wireless sensor network to collect data, which gener-
ates a lot of data that needs a lot of storage space and slows down network trans-
mission from slave to control nodes [5]. A real-time embedded system employing
a WSN powered by ZigBee technology is created to regulate and monitor green-
house settings. Due to its communication long distance, WSN is regarded as the best
choice for greenhouses. The goal of this system is to give control over a variety of
environmental factors, such as the moisture in the soil, and humidity, to establish the
optimal climate for plant development [6–8].
Greenhouse Automation System Using ESP32 and ThingSpeak … 509
The presented approach optimizes water use by considering the water requirements of
plants rather than depending on presumptions made by cultivators. It contains both
static information—such as the kind of plant and soil—and dynamic information
gathered through sensors. Another similar study concentrates on two elements. The
monitoring of greenhouse conditions will first be automated [9–11]. A mobile app
makes it possible to inform users of the current state via their mobile phones, making
monitoring and maintenance easier [12, 13]. The Internet of Things (IoT) is used to
transport the gathered data to the cloud for storage [14, 15]. Plants are vulnerable to the negative impacts of temperature changes, which can cause them to die within a few hours. Global connectivity and administration of sensors, devices,
and users have been made possible by the rise of IoT. Various parameter measures,
such as moisture, temperature, and humidity, are needed for efficient monitoring
and management in contemporary greenhouses. The work under discussion suggests using the Internet of Things to automate greenhouse conditions. Several sensors, interfaced with the Netduino 3 platform, are used to measure humidity, temperature, sunshine, and wetness. The objective is to increase output rates while reducing farmers' effort [16, 17].
Current greenhouse systems frequently rely on manual monitoring, requiring repeated visits and burdening workers. Improper temperature and humidity regulation may reduce yields. The idea of greenhouse automation has evolved as a solution to these problems. By fusing IoT and embedded devices, the greenhouse atmosphere may be autonomously managed and monitored, removing the need for continuous farmer interaction [17]. One study outlines a method for creating a productive nursery plant-growing system [18]. The objective is to maintain the
ideal temperature, humidity, moisture, and light levels for plant growth. This is
accomplished by monitoring the greenhouse’s environment and controlling settings
following the requirements of each crop using a low-cost, configurable ESP8266
device [18]. It focuses on the necessity for standalone and modular devices while
talking about the positioning of IoT devices close to crops in greenhouses. The
employment of AI and IoT technology allows for environmental monitoring and
management in a greenhouse setting, since the acquired data is communicated through networks. Growing tomatoes and other vegetables is the goal of this initiative [19].
Finally, a key study refers to a work that analyzes previous studies on greenhouse
systems. This work’s major goal is to uncover research gaps and emphasize the
essential components, advantages, and difficulties of greenhouse systems [20].
The block diagram of the Greenhouse Automation System showcases the intercon-
nected components and their respective functionalities (Fig. 1). At the core of the
system is the customized ESP32 WiFi Board, serving as the central control unit
responsible for communication, data processing, and control actions. Acting as a hub,
it connects to various sensors and actuators, enabling seamless interaction among
them. One essential component in the system is the AM2301 humidity and temper-
ature sensor. This sensor accurately measures the humidity and temperature levels
within the greenhouse, providing vital data for monitoring and maintaining optimal
environmental conditions. Capturing these parameters ensures that the greenhouse
remains conducive to plant growth. Another integral part is the Actuator Relay
Module, which plays a crucial role in controlling the lighting system. The relay
module receives switching signals from the ESP32 WiFi board which processes user
inputs and user-preset thresholds, subsequently saving energy and optimizing light
delivery based on plant requirements. The electrical energy is supplied from a reliable
battery pack that is capable of powering all the components stably without causing
malfunction.
The ThingSpeak API is integrated into the system at the firmware level.
Equipped with internet connectivity and 4 MB storage, the ESP32 can effectively
store, transmit, and display the data collected over a webpage. The WPA2 protocol
ensures secure data transmission to the internet and the API secret key ensures secure
data passage to the ThingSpeak cloud infrastructure, allowing users to access and
utilize the features provided while keeping their secure data channel intact. The tools
provided can be used to analyze the greenhouse data and provide some much-required
insights into the optimum environmental conditions as per the custom requirements
of the floral species planted.
The information flow and the control signals are channeled as per the illustrated
block diagram. The ESP32 collects temperature and humidity data from the AM2301
sensor and directs the relay module accordingly, triggering the lighting system. In
summary, the block diagram demonstrates the integrated nature of the Greenhouse
Automation System. The components work harmoniously, with the ESP32 WiFi
Board as the central control unit, facilitating data acquisition, control actions, and
communication. By leveraging the AM2301 sensor, Actuator Relay Module, reliable
power supply, and ThingSpeak integration, the system achieves optimal environ-
mental conditions within the greenhouse, promoting efficient and sustainable plant
growth.
4 Hardware Development
The hardware components for the Greenhouse Automation Mote are as follows
(Fig. 2):
1. The ESP32 board is the main control system of the whole project, where the Vin
pin of the ESP32 board has been connected to the 5 V DC power supply and
ground to the common ground pin of the external power supply.
2. To monitor the ambient humidity and temperature, the AM2301 sensor has
been connected to the ESP32 board. The voltage pin of the AM2301 has been
connected to the voltage pin of the power supply and ground to the common
ground. The data pin has been connected to the GPIO19 pin of the ESP32 board.
3. The single-channel relay module has been connected to the system so that it can
control the light as per the instruction from the ESP32 board. So, the voltage
pin of the single-channel relay has been connected to the voltage pin of the
power supply and ground to the ground. The signal pin has been connected to
the GPIO16 pin of the ESP32 board.
4. The common DC power supply has been attached to this project so that it can
run properly in the environment.
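The wiring described above can be summarized as a pin map. The following sketch collects the connections in one place (the dictionary names are our own; real firmware, e.g. MicroPython on the ESP32, would use such a map to configure the GPIO pins):

```python
# Pin assignments from the wiring description above, collected in one place.
# The dictionary names are illustrative; GPIO numbers follow the text.

ESP32_PIN_MAP = {
    "AM2301_DATA": 19,   # AM2301 data line -> GPIO19
    "RELAY_SIGNAL": 16,  # single-channel relay signal -> GPIO16
}

POWER = {
    "ESP32_VIN": "5 V DC supply",  # Vin pin of the ESP32 board
    "GROUND": "common ground",     # shared by all modules
}

for name, gpio in ESP32_PIN_MAP.items():
    print(f"{name}: GPIO{gpio}")
```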
The algorithm for the Greenhouse Automation System adopts a systematic approach
to closely monitor and control the greenhouse environment, aiming to create optimal
growing conditions. The process begins with the initialization of the ESP32 WiFi
Board, where essential communication parameters are configured. By establishing
a seamless connection to the AM2301 humidity and temperature sensor, the system
continuously receives real-time data regarding temperature and humidity levels
within the greenhouse. The hardware setup for the Greenhouse Automation Mote
can be seen in Fig. 3.
Once the sensor data is obtained, the algorithm proceeds to analyze it by applying
predefined thresholds or user-defined rules. This analysis serves to evaluate whether
any adjustments are necessary to maintain the desired environmental conditions.
Based on the analysis results, the algorithm sends control signals to the Actuator
Relay Module, which regulates the lighting system accordingly. By activating or
deactivating the lights, the system ensures that the appropriate light levels are main-
tained to support plant growth at different stages. To enable seamless data transmis-
sion and storage, the algorithm establishes a secure connection between the ESP32
WiFi Board and the ThingSpeak platform. The collected sensor data is formatted
appropriately and transmitted to ThingSpeak’s cloud infrastructure through this
connection. ThingSpeak’s powerful data visualization and analysis tools are then
utilized to gain valuable insights into greenhouse conditions.
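As a sketch of this monitor–analyze–actuate loop, the following Python model mocks the sensor, relay, and ThingSpeak endpoints. The threshold value, field names, and function names are hypothetical; real firmware would read the AM2301 and POST to the ThingSpeak update API instead.

```python
# Sketch of one iteration of the feedback loop described above.
# Sensor readings are mocked; thresholds and names are illustrative.

LIGHT_ON_BELOW_C = 22.0   # hypothetical user-defined temperature threshold

def decide_light(temperature_c: float, threshold_c: float = LIGHT_ON_BELOW_C) -> bool:
    """Analysis step: turn the lights on when the greenhouse is too cold."""
    return temperature_c < threshold_c

def control_cycle(read_sensor, set_relay, publish):
    """One iteration of the continuous feedback loop."""
    temperature_c, humidity = read_sensor()          # AM2301 reading
    light_on = decide_light(temperature_c)           # apply threshold rule
    set_relay(light_on)                              # Actuator Relay Module
    publish({"field1": temperature_c, "field2": humidity})  # ThingSpeak fields
    return light_on

# Mocked hardware/cloud endpoints for demonstration:
log = []
state = control_cycle(
    read_sensor=lambda: (19.5, 61.0),
    set_relay=lambda on: log.append(("relay", on)),
    publish=lambda payload: log.append(("thingspeak", payload)),
)
print(state, log)
```

Running this cycle in an endless loop, with real sensor reads and HTTP publishing substituted in, reproduces the continuous feedback behaviour the algorithm describes.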
Adjustments or optimizations for improved plant growth can be made with confi-
dence after careful observation of past trends, recognition of patterns, and exami-
nation of the data. Greenhouse conditions may be monitored and adjusted in real
time because the algorithm runs in a continuous feedback loop. The system guar-
antees consistent optimal growing conditions by continuously reading sensor data,
analyzing it, and taking relevant management measures. By implementing this plan,
the system will be able to maximize agricultural productivity and operate reliably
regardless of external factors. ThingSpeak server dashboards provide real-time access
to the data anywhere in the world (Fig. 4). Overall, the Greenhouse Automation
6 Conclusions
The Greenhouse Automation System built from the ESP32 WiFi chip, the AM2301
temperature–humidity sensor, and an actuator relay set has proven highly effective
in optimizing and regulating isolated enclosures for testing out optimal conditions
for the growth of crops. With the hardware portion handled by the ESP32 as the
processing and communication unit, the AM2301 as the sensor module, and the
relay as the output module, the system, coupled with the ThingSpeak cloud, enables
real-time monitoring and control of key environmental parameters. The removal of
configuration jargon ensures that the plant receives the optimum temperature light
and humidity required to thrive and grow. Other than displaying the sensor readings,
the system also offers a detailed perspective of the greenhouse conditions with the integration of data analysis scripts and graphical visualization tools (Fig. 4 shows the live data in the Cloud server of the ThingSpeak channel web dashboard). The gathered data can be fed into an AI model to recognize critical patterns and events, and the
system can be programmed to respond to such stimuli with suitable adjustments as
per plant requirements. Overall, the Greenhouse Automation System presented in this
study utilizes IoT in a way that enhances the quality of agro-practices with minimal
effort from the user’s side. Coupled with a compact and rugged form factor equipped
with the sensors and actuators required, the system offers a minimalistic, efficient,
cost-effective, and sustainable solution for greenhouse management. To add to that,
the modularity offered by this system helps futureproof it and allows for embedding
newer expansions and technologies, making it open to further development efforts.
References
1. Singh TA, Chandra J (2018) IOT based green house monitoring system. J Comput Sci
14(5):639–644. https://doi.org/10.3844/jcssp.2018.639.644
2. Pidikiti T et al (2021) Wireless Green house monitoring system using Raspberry PI. Turkish J
Comput Math Educ (TURCOMAT) 12(2). https://doi.org/10.17762/turcomat.v12i2.1894
3. Visvesvaran C, Kamalakannan S, Kumar KN, Sundaram KM, Vasan SMSS, Jafrrin S (2021)
Smart Greenhouse monitoring system using wireless sensor networks. In: 2021 2nd interna-
tional conference on smart electronics and communication (ICOSEC), Oct 2021, Published.
https://doi.org/10.1109/icosec51865.2021.9591680
Abstract The 20-newsgroup dataset includes about 20,000 articles that have been
systematically classified into 20 newsgroups. The compilation of 20 newsgroups
is emerging as a common source of data for studying the performance of machine learning techniques on language tasks, such as language detection and word clustering. Each record in the dataset is a text file (a series of words). The data
is trained over different classifiers such as Support Vector Machines, Naïve Bayes
Model, Random Forests, Stochastic Gradient Descent, and Logistic Regression. We
have used Python Jupyter notebook for code execution and packages from libraries
like scikit learn (Sklearn), Natural Language Toolkit (NLTK), regular expression
(RE), Language Identification (LangID), etc., to test the model parameters and accu-
racy measurements. By tuning the parameters and relevant initialization, we aim to
test the outcomes to find out the best model with maximum accuracy.
B. Kaur (B)
Chandigarh University, Gharuan, Mohali, India
e-mail: bobbinece@gmail.com
A. Garg · B. Goyal
Department of ECE, Chandigarh University, Gharuan, Mohali, India
e-mail: arpan04garg@gmail.com
B. Goyal
e-mail: bhawnagoyal28@gmail.com
H. Alchilibi
Medical Technical College, Al-Farahidi University, Baghdad, Iraq
e-mail: haydershrf@gmail.com
L. H. A. Fezaa
Department of Optical Techniques, Al-Zahrawi University College, Karbala, Iraq
e-mail: laithfizaa@gmail.com
R. Kaur
Department of Computer Science and Engineering, Gulzar Group of Institutions, Khanna,
Punjab 141401, India
e-mail: reetkaurnagra@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 517
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_44
518 B. Kaur et al.
1 Introduction
The data has 1707 columns, and the target variable has 20 categories.
Performance Analysis of Terrain Classifiers Using Different Packages 519
Outliers have been excluded by setting min_df = 0.01 and max_df = 0.85.
We removed unwanted hyperlinks, punctuation, numeric words, words less than two
characters, @mentions by defining the remove features function and converting the
data to lowercase [4].
2.5 Lemmatization
Feature engineering is the art of drawing more information out of the available data. In data modeling, transformation refers to substituting a variable with a suitable function. We first created a TF–IDF vectorizer that ignores all
stop words of the English dictionary and removed the outliers as well. We then fit this
to our training and test data. Using a Count Vectorizer, we created the document term
matrix for training and test data. By performing the above tasks, we have numerically
encoded the text corpus, where each of the columns is taken from the text for each of
the documents [5]. In the case of TF–IDF, the values are normalized for the features
concerning the number of documents containing a term, while Count Vectorizer
simply contains a count of all the tokens appearing in the documents.
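The contrast between the two encodings can be illustrated with a tiny standard-library sketch (the project itself uses scikit-learn's CountVectorizer and TfidfVectorizer; the corpus and smoothing below are illustrative):

```python
# Minimal illustration of raw term counts vs TF-IDF weighting.
# Pure standard library; corpus and smoothing are illustrative.

import math
from collections import Counter

docs = [
    "spi bus clock data",
    "i2c bus clock data data",
    "greenhouse sensor data",
]

def counts(doc):
    """CountVectorizer-style encoding: raw token counts."""
    return Counter(doc.split())

def tfidf(doc, corpus):
    """TF-IDF-style encoding: counts scaled down for common terms."""
    tf = counts(doc)
    n = len(corpus)
    out = {}
    for term, freq in tf.items():
        df = sum(1 for d in corpus if term in d.split())
        # Smoothed IDF, similar in spirit to scikit-learn's formula.
        idf = math.log((1 + n) / (1 + df)) + 1
        out[term] = freq * idf
    return out

print(counts(docs[1])["data"])               # raw count: 2
print(round(tfidf(docs[1], docs)["bus"], 3)) # down-weighted shared term
```

A term like "data" that appears in every document gets an IDF near 1, so its TF-IDF weight stays close to its raw count, while rarer terms are boosted relative to ubiquitous ones.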
4 Data Modeling
Fig. 3 Estimated values of performance metrics for NB of the target variable for TF/IDF
The performance metrics evaluated for each category of the target variable for the Count Vectorizer are shown in Fig. 4.
Fig. 4 Estimated values of performance metrics for NB model of the target variable for Count Vectorizer

Decision Tree produces classification or regression models in a tree structure. It divides the data collection into ever smaller subsets while an associated Decision Tree is incrementally developed. The result is a graphical structure, known as a tree, consisting of decision nodes and leaf nodes. A decision node has two or more branches, while a leaf node determines the logic of the decision [7, 8]. In a tree, the highest decision node, known as the root node, corresponds to the most relevant predictor. Decision Tree Classifier is a class capable of multi-class classification on a dataset. Unlike other classifiers, Decision Tree Classifier takes two input arrays: an array X, sparse or dense, of size s = [n_samples, n_features] containing the training samples, and an array Y of integer values, size s = [n_samples], containing the class labels for the training samples [9]. The accuracy measures obtained for the model based on TF/IDF are shown in Fig. 5 and for Count Vectorizer in Fig. 6.
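The X/Y input convention described above can be illustrated with a one-level "stump" in plain Python (illustrative only; this is not sklearn's algorithm, and the data is made up):

```python
# Illustration of the X/Y input convention, using a one-level "stump"
# rather than a full tree (the project uses sklearn's DecisionTreeClassifier).

X = [[2.0], [3.0], [10.0], [12.0]]   # size [n_samples, n_features]
Y = [0, 0, 1, 1]                     # size [n_samples], integer class labels

def fit_stump(X, Y, feature=0):
    """Pick the midpoint threshold that best separates the two classes."""
    best = None
    values = sorted(set(x[feature] for x in X))
    for a, b in zip(values, values[1:]):
        thr = (a + b) / 2
        pred = [1 if x[feature] > thr else 0 for x in X]
        acc = sum(p == y for p, y in zip(pred, Y)) / len(Y)
        if best is None or acc > best[1]:
            best = (thr, acc)
    return best  # (threshold, training accuracy)

thr, acc = fit_stump(X, Y)
print(thr, acc)
```

A full decision tree repeats this kind of split recursively on each resulting subset, which is the "dividing into tiny sections" the text describes.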
Random Forest Classifiers, also known as Random Decision Forests, are an ensemble learning method for classification, regression, and other tasks. In Random Forests, a large number of decision trees are generated during training, and the output is the mode of the classes predicted by the individual trees (classification) or their mean prediction (regression) [8, 9]. It is a meta-estimator that fits multiple decision tree classifiers on different sub-samples of the data and averages them for improved predictive accuracy and control of over-fitting. The sub-sample size is usually the same as the input sample size, and if bootstrap = True (the default) [10], the samples are drawn with replacement. We made use of the sklearn Random Forest classifier with 100 estimators. The model measures based on TF/IDF are shown in Fig. 7 and for Count Vectorizer in Fig. 8.
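The bootstrap step can be sketched in a few lines (illustrative; sklearn handles this internally):

```python
# Bootstrap sampling as used by a Random Forest: each tree trains on a
# sample of the same size as the input, drawn with replacement.

import random

def bootstrap_indices(n_samples, rng):
    """Indices for one tree's training set: n draws with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

rng = random.Random(0)  # fixed seed for a reproducible illustration
sample = bootstrap_indices(10, rng)
print(len(sample), len(set(sample)))  # same size as input; repeats likely
```

Because draws are with replacement, each tree typically sees some samples several times and misses others, which is what decorrelates the trees in the forest.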
Fig. 10 Estimated values of performance metrics for SGD of the target variable for TF/IDF

Logistic Regression is a statistical method for analyzing a dataset in which one or more independent variables determine the outcome. The outcome is measured with a dichotomous variable (one with only two possible results) [13]. The purpose
of Logistic Regression is to identify the model that best describes the association between the two-state target variable of interest (the dependent or response variable) and a set of independent variables (predictors or explanatory variables). We imported Logistic Regression from scikit-learn; regularization is applied by default. Input data can be either dense or sparse; for best performance, use C-ordered arrays or CSR matrices containing 64-bit floats, as any other input will be converted [14]. The accuracy measures based on TF/IDF are shown in Fig. 12 and for Count Vectorizer in Fig. 13.
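The dichotomous outcome is modeled through the logistic (sigmoid) function; a standard-library sketch with hypothetical fitted parameters:

```python
# The logistic function maps a linear combination of predictors to the
# probability of the two-state outcome (standard-library sketch; the
# project uses sklearn.linear_model.LogisticRegression).

import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """P(y = 1 | x) for one sample under a fitted logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# Hypothetical fitted parameters for a two-feature model:
p = predict_proba([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(round(p, 3))  # 0.5 exactly, since z = 0.5 - 0.5 = 0
```

Thresholding this probability at 0.5 yields the predicted class, which is how the two-state target variable is recovered from the continuous model output.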
All six classifiers are evaluated under tenfold cross-validation for both the original and resampled data, as shown in Fig. 17. The measures are plotted to evaluate the best model, as shown in Table 1.
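The tenfold split can be sketched as follows (illustrative; the project would use sklearn's cross-validation utilities):

```python
# Tenfold cross-validation: the data is split into 10 folds, and each fold
# serves once as the test set while the remaining folds train the model.

def kfold_indices(n_samples, k=10):
    folds = []
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        test_set = set(test)
        train = [i for i in range(n_samples) if i not in test_set]
        folds.append((train, test))
        start += size
    return folds

folds = kfold_indices(100, k=10)
print(len(folds), len(folds[0][1]), len(folds[0][0]))  # 10 folds of 10/90
```

Averaging the metric over the ten held-out folds gives a more robust estimate of generalization than a single train/test split.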
5.1 Accuracy
Accuracy is the fraction of observations whose predictions match the true labels, which makes it seem like the obvious metric. However, accuracy is only a helpful statistic when the dataset is balanced, with roughly equal numbers of false positives and false negatives; otherwise, additional criteria must be taken into account when gauging the model's performance.
Fig. 16 Visualizing the common performance measures for classification tasks: accuracy, recall
(called sensitivity), and precision
For Naïve Bayes, the training accuracy is high at 98.18 and the testing accuracy is 82.58.
5.2 Precision

Precision is the ratio of correctly predicted positive observations to the total predicted positive observations: of the passengers who were officially counted as having survived, how many actually made it? High precision is associated with a low false-positive rate.
For Naïve Bayes, the precision is 98.28 on the training data and 83.75 on the testing data.
5.3 Recall

Recall is the ratio of correctly predicted positive observations to the total number of observations in the actual class. How many of the passengers who really survived were mistakenly classed as "not survivors"?
For Naïve Bayes, the recall is 97.78 on the training data and 81.36 on the testing data.
5.4 F1-Score
The F1-Score is the weighted average of precision and recall, so it takes both false positives and false negatives into account. F1 is often more helpful than accuracy, especially when the classes are unevenly distributed, although it is less intuitive. Accuracy works best when the costs of a false positive and a false negative are similar; if there is a large disparity between those costs, it is important to evaluate both precision and recall.
For Naïve Bayes, the F1-Score is 97.93 on the training data and 81.32 on the testing data.
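The four metrics above follow directly from the confusion-matrix counts; a small sketch with illustrative numbers:

```python
# Accuracy, precision, recall, and F1 computed from confusion-matrix
# counts (standard definitions; the example numbers are illustrative).

def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=80, fp=10, fn=20, tn=90)
print(acc, prec, rec, round(f1, 4))
```

With these counts the class balance is even, so accuracy and F1 agree closely; when one class dominates, the two can diverge sharply, which is exactly the case discussed above.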
When samples come from one of two categories (two classes), we can classify them as positive or negative. Positive samples are represented by the left rectangle, while negative samples are represented by the right rectangle. The circle represents all samples that were predicted to be positive. These outcomes should help the model builder understand what these parameters indicate and how well the model has done.
As evident from the graph, out of the six models, the best model is Naïve Bayes. The model does not overfit and gives the best testing and training accuracy (Fig. 17).
6 Conclusions
We observed that almost all the models performed better with TF–IDF than with Count Vectorizer, except for Logistic Regression, where Count Vectorizer wins but at the cost of precision and recall values. The six models tried are as follows: Naive Bayes, SGD classifier, SVM, Random Forest, Decision Tree, and Logistic Regression. From these models, we observed that the Naive Bayes Classifier using TF–IDF performs best, closely followed by the SGD classifier; both techniques also hold up well when looking at the precision and recall of the individual categories. To conclude, text classification is a very interesting area of unstructured data in machine learning and can be explored in depth to understand which techniques perform best and under what circumstances. Several popular ones were explored in this project, such as Naive Bayes, Support Vector Machines, and the SGD classifier.
References
1. Lee I, Shin YJ (2020) Machine learning for enterprises: applications, algorithm selection, and
challenges. Bus Horizons 63(2):157–170
2. Shatte ABR, Hutchinson DM, Teague SJ (2019) Machine learning in mental health: a scoping
review of methods and applications. Psychol Med 49(9):1426–1448
3. Van Calster B, Wynants L (2019) Machine learning in medicine. N Engl J Med 380(26):2588–
2588
4. Leo M, Sharma S, Maddulety K (2019) Machine learning in banking risk management: a
literature review. Risks 7(1):29
5. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, vol 1, no 10.
Springer Series in Statistics, New York
6. Manning CD, Raghavan P, Schutze H (2008) Introduction to information retrieval
7. Shreemali J et al (2021) Comparing performance of multiple classifiers for regression and
classification machine learning problems using structured datasets. In: Materials today:
proceedings
8. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
9. Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666
10. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
11. https://developers.google.com/machine-learning/crash-course/reducing-loss/stochastic-gradient-descent
12. Cheplygina V, de Bruijne M, Pluim JP (2019) Not-so-supervised: a survey of semi-supervised,
multi-instance, and transfer learning in medical image analysis. Med Image Anal 54:280–296
13. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
14. Patel S (2017) Chapter 2: SVM (support vector machine)—theory. Mach Learn 101
15. Van Engelen JE, Hoos HH (2020) A survey on semi-supervised learning. Mach Learn
109(2):373–440
16. Kumar MV, Venkatachalam K, Prabu P, Abdulwahab A, Abouhawwash M (2021) Secure
biometric authentication with de-duplication on distributed cloud storage. PeerJ Comput Sci
4(3):e569
Abstract In this paper, a systematic review of the Doherty power amplifier for future wireless communication applications is presented. The basic operating principle and design of the Doherty power amplifier are described. The Doherty power amplifier mainly suffers from limited-bandwidth operation while maintaining the desired efficiency and linearity; the design challenges, highlighting the bandwidth-limiting factors, are discussed in detail. The latest research trends focused on improving the performance of the Doherty amplifier using various architectures, together with their main challenges and solutions, are addressed. The latest techniques for enhancing the Doherty power amplifier's peak efficiency, back-off efficiency, and bandwidth while maintaining linearity are reviewed, making it a most suitable and preferable candidate for future wireless communication applications.
S. Singh (B)
Department of ECE, Chandigarh University, Gharuan, Mohali, India
e-mail: sukhpreet.ece@cumail.in
A. Babbar
Department of Mechanical Engineering, SGT University, Gurugram, Haryana 122505, India
A. Alkhayyat
College of Technical Engineering, The Islamic University, Najaf, Iraq
G. Gupta
Department of CSE, Sharda University, Greater Noida, India
S. Singh · J. S. Chohan
University Center of Research and Development, Chandigarh University, Chandigarh, India
e-mail: sandeepsingh.civil@cumail.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 533
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7_45
534 S. Singh et al.
1 Introduction
In the last two decades, mobile communication technology has had a profound impact
on the way people all over the world communicate with one another. Mobile
communication systems are the most widely used wireless technology ever. In terms
of capacity and capabilities, Table 1 shows the evolution of wireless communication
standards from the first generation to the approaching fifth generation. Research
and development of 5G, the next generation of wireless communication, is now the
primary focus of the telecommunications industry [1, 2]. With the advent of 5G,
users will be able to enjoy services that previously required infrastructure upgrades:
faster data rates (multiple gigabits per second), longer battery life, support for far
more devices (about 100 times more than 4G, enabling the IoT), lower latency, and
more stable connections [3]. Realizing all of these services requires a significant
shift in both the overall and individual designs of the radio frequency front-end
(RFFE) components. Coverage of a large number of frequency bands
is an essential function of RFFE. Since the power amplifier (PA) is the most energy-
intensive block in RFFE and must process relatively big signals, its design is of
paramount importance [4].
Because conventional power amplifiers are optimized for a narrow frequency
range, their usable bandwidth decreases dramatically at higher frequencies. In addition,
modern high-frequency signals occupy a larger bandwidth and exhibit a higher
peak-to-average power ratio (PAPR) [5], because their amplitude and phase vary
over time as the transmission power changes frequently. Linearity is therefore the
primary requirement of power amplifiers in wireless communication systems, and
to retain it the amplifier must operate at a large output back-off (OBO) because of
the high PAPR. If a conventional PA is operated at such large back-off, however, its
output power is small and its efficiency falls to only about 7-14%. To improve the
transmitter's overall performance, a PA with improved back-off efficiency and
linearity is therefore required; the Doherty power amplifier (DPA) meets this need.
As can be seen in Fig. 1, the DPA has two amplifiers: the "main" or "carrier" amplifier
and the "auxiliary" or "peaking" amplifier. At the heart of the DPA is an input power
splitter that divides the incoming signal in half, followed by the main amplifier and
the load, which must be modulated correctly. To counteract the phase discrepancy
caused by the impedance-inverting network, a delay line is introduced at the DPA's
input [6].
Both amplifiers are represented as current sources, with the carrier amplifier
biased for Class AB operation and the peaking amplifier biased for Class C operation.
As a result, the peaking amplifier conducts only over the upper half of the input
power range, i.e., during signal peaks. Both amplifiers produce an output current
that is linearly proportional to the input signal voltage. The DPA is based on the
principle of load modulation, meaning that the load impedance of each amplifier
varies in accordance with the input power level. At back-off, the carrier amplifier
sees a load impedance twice that of a conventional amplifier, i.e., 2Zopt, where Zopt
is the optimum load impedance for maximum power transfer. This load modulation
behavior, in which the impedance of both amplifiers is modulated to Zopt at peak
power (the optimum impedance of the transistor), is presented in Fig. 2. The
impedance inverter provides the efficient load modulation of the DPA and thereby
improves the back-off efficiency, but it also limits the operational bandwidth of the
DPA [6]. The cause of this limited bandwidth is that when the operating frequency
shifts away from the center frequency, the output impedance of the main amplifier
decreases; as a result, the drain efficiency degrades over a wide bandwidth. This is
known as the frequency dispersion effect. The various methods to improve the
bandwidth are elaborated in the literature survey, and the DPA architectures that
overcome these limitations and improve the overall performance are discussed below.
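The classical load-modulation trajectory described above can be sketched numerically. The snippet below is a simplified ideal model, not a circuit simulation: it assumes a quarter-wave inverter of characteristic impedance Zopt (taken as 50 ohms for illustration), a common load of Zopt/2, and a peaking amplifier that starts conducting at half the maximum input voltage. It reproduces the well-known 2Zopt-to-Zopt trajectory of the carrier amplifier.

```python
ZOPT = 50.0  # optimum load impedance of each device (ohms); assumed example value

def doherty_impedances(v):
    """Ideal load-modulation trajectory of a symmetric two-way DPA.

    v: normalized input voltage in (0, 1].
    Returns (Z_carrier, Z_peaking) seen by each device, using a
    quarter-wave inverter of impedance ZOPT into a common ZOPT/2 load;
    the peaking amplifier turns on at v = 0.5.
    """
    i_c = v                                # carrier current, linear in v
    i_p = max(0.0, 2 * v - 1)              # peaking current, Class C-like turn-on
    z_node = (ZOPT / 2) * (1 + i_p / i_c)  # active load-pull at the common node
    z_carrier = ZOPT**2 / z_node           # quarter-wave impedance inversion
    z_peaking = (ZOPT / 2) * (1 + i_c / i_p) if i_p > 0 else float("inf")
    return z_carrier, z_peaking

# Endpoints of the classical trajectory: 2*Zopt at back-off, Zopt at peak power.
print(doherty_impedances(0.5)[0])  # carrier sees 100.0 ohms = 2*Zopt
print(doherty_impedances(1.0))     # (50.0, 50.0): both devices see Zopt
```

Below the turn-on point the peaking branch is an open circuit (infinite impedance), matching the off-state condition that the offset lines discussed later are meant to enforce.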
[Fig. 2: normalized impedance of the amplifiers versus normalized input voltage (0-1).]
High Performance RF Doherty Power Amplifier for Future Wireless … 537
Bandwidth is a very important parameter for wireless communication, and many
factors limit the bandwidth of the DPA, as reviewed in the literature survey. In fact,
the operation of the DPA is based on the load modulation technique, which is what
gives the DPA its enhanced efficiency at back-off power levels. In this technique,
the load impedance is varied according to the input power level, i.e., a different
impedance is needed at each power level. Correct load modulation can be achieved
through the impedance inverter line, phase compensation, and offset lines, but all
of these are observed to be bandwidth-limiting factors for the DPA [9, 10]. First,
the impedance inverter is usually realized as a quarter-wave transmission line,
which has a narrow bandwidth; various methods reviewed in the literature focus on
new impedance inverter designs that enhance its bandwidth. The second limiting
factor is the phase synchronization between the main and auxiliary amplifiers;
attaining it over a large bandwidth by re-engineering the conventional DPA design
leads to the digital DPA, in which the baseband signal is divided and fed into the
carrier and peaking amplifiers individually. Another bandwidth-limiting factor is
the lack of broadband matching networks. A wideband matching network is needed
to provide appropriate impedance matching at both saturation and back-off power
levels, so that maximum power is transferred at each frequency of operation inside
the desired band; without such a network, the desired response is very hard to
achieve across the band. The last factor is the offset lines, which work at a single
frequency and are integrated into the output matching network.
These challenges open various opportunities for designers to find optimum
solutions to the bandwidth-limiting factors so that a wideband DPA can be designed.
One solution is to use offset lines with the output matching networks of both
amplifiers, with a characteristic impedance equal to the load impedance at saturation,
to optimize the load modulation. Another solution is to design the DPA in such a
way that it combines the output matching network and the power combiner, reducing
parasitic losses; moreover, with this technique the device size can be reduced [11].
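The narrowband nature of the quarter-wave impedance inverter discussed above follows directly from transmission-line theory. The sketch below is an idealized lossless-line calculation with assumed example values (Z0 = 50 ohms, RL = 25 ohms): it evaluates Zin = Z0*(ZL + j*Z0*tan(theta))/(Z0 + j*ZL*tan(theta)), with electrical length theta = 90 degrees at the center frequency, and shows how quickly Zin drifts from the ideal inverted value Z0**2/RL as the frequency moves off center.

```python
import math

Z0 = 50.0   # inverter characteristic impedance (ohms), assumed example value
RL = 25.0   # load impedance to be inverted (ohms), assumed example value

def z_in(f_ratio):
    """Input impedance of a lossless line that is a quarter wave at f_ratio = 1."""
    t = math.tan(math.pi / 2 * f_ratio)  # tangent of the electrical length
    return Z0 * (RL + 1j * Z0 * t) / (Z0 + 1j * RL * t)

# At the center frequency the line inverts the load: Zin = Z0**2 / RL = 100 ohms.
# Away from f0 the inversion degrades, which narrows the DPA bandwidth.
for f in (0.8, 0.9, 1.0, 1.1, 1.2):
    z = z_in(f)
    print(f"f/f0 = {f:.1f}: Zin = {z.real:6.1f} {z.imag:+6.1f}j ohms")
```

Even a 20% frequency offset pulls the real part of Zin well away from 100 ohms and introduces a large reactive component, which is why wideband DPAs replace or redesign this element.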
Back-off efficiency and linearity are two important parameters of an RF front-end
DPA, and the main factor degrading them is non-optimum load modulation. Various
methods have therefore been developed to maintain optimum active load modulation;
some of them are addressed in this section.
At 6 dB of output power back-off, DPAs are regarded as the best option for
amplification because of their suitable efficiency and linearity. The typical DPA
offers its efficiency enhancement at 6 dB back-off, but in practice the DPA transmits
its average output power in the range of 9-12 dB back-off, which results in low
efficiency [12]. The multistage DPA is a technique for increasing efficiency at such
deep back-off by employing several stages of amplifiers. This design also enhances
the linearity of the DPA, since the output power of several PAs is combined. An
example of a three-way DPA arrangement is shown in Fig. 3 [13] (carrier PA, power
splitter, and 50 Ω load), where the load of the main amplifier is modulated by the
first peaking amplifier and the load of the preceding DPA is modulated by the next
peaking amplifier.
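The 6 dB figure above comes from the ideal two-way Doherty efficiency curve. The sketch below is the textbook ideal class-B analysis, not a measured result: it evaluates the drain efficiency of an ideal symmetric DPA versus normalized output voltage, showing the characteristic efficiency peaks of pi/4 (about 78.5%) at both full power and the 6 dB back-off point, with the efficiency falling off again in the 9-12 dB region where real signals spend most of their time.

```python
import math

def doherty_efficiency(v):
    """Drain efficiency of an ideal symmetric two-way Doherty PA.

    v: output voltage normalized to its maximum (0 < v <= 1).
    Classic ideal class-B result: efficiency peaks at pi/4 both at
    full power (v = 1) and at the 6 dB back-off point (v = 0.5).
    """
    if v <= 0.5:                                 # only the carrier is active
        return (math.pi / 2) * v
    return (math.pi / 2) * v**2 / (3 * v - 1)    # both amplifiers active

for v in (0.25, 0.35, 0.5, 0.75, 1.0):
    obo_db = -20 * math.log10(v)                 # output back-off from peak
    eff = 100 * doherty_efficiency(v)
    print(f"OBO = {obo_db:4.1f} dB -> efficiency = {eff:.1f}%")
```

Evaluating the curve around 9-12 dB of back-off (v between roughly 0.25 and 0.35) shows the efficiency dropping well below the pi/4 peaks, which is the motivation for the multistage architectures described above.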
In spite of providing such a highly efficient amplified signal, the multistage DPA
has the limitation of a complex circuit, which becomes very difficult to implement.
This technique also results in non-optimum load modulation, because the peaking
amplifiers are biased at low voltages and therefore deliver less output power, which
directly affects the efficiency and linearity of the DPA.
If the main and auxiliary amplifiers of a DPA are symmetrical, the output current
of the auxiliary amplifier will be lower than that of the main amplifier, since it is
biased in Class C mode with a small conduction angle; this results in smaller output
power and a large back-off. For correct operation of the DPA, therefore, either the
power splitter at the input should be asymmetrical or the auxiliary device should be
twice the size of the main amplifier device [14]. Using an asymmetric DPA (ADPA)
with a doubled auxiliary device, however, degrades the overall performance of the
amplifier, because the power delivered to the main amplifier is reduced. The ideal
technique to achieve accurate load modulation is to use a power splitter that sends
more power to the auxiliary amplifier than to the main amplifier, allowing the
auxiliary amplifier to generate the correct amount of current; this yields a 9-13%
increase in drain efficiency [15]. Figure 4 presents the bias-adapted two-way DPA
(BA-DPA), which consists of the main and auxiliary amplifiers plus a generator for
an envelope-shaped voltage whose function is to change the gate voltage of the main
amplifier so as to reduce its power consumption at low power levels, when only the
main amplifier is conducting. The main limitation of this advanced architecture is
that the circuit becomes more complicated and the cost increases.
This technique [16] uses the envelope's magnitude to adjust the gate bias of the
auxiliary amplifier; that is, the auxiliary amplifier has a variable gate bias. The gate
bias of both amplifiers may be adjusted to meet specific demands, and the combined
supply-modulator and DPA efficiencies can be increased [17]. The method can also
be used with the main amplifier active in the low-power region to increase the
efficiency. A DPA using the envelope tracking method is shown in Fig. 5. In this
design, the main amplifier conducts at low power levels, and as the power level
increases the bias adjustment brings the auxiliary amplifier into conduction so that
it contributes more current to the load. Owing to the bias adjustment, both amplifiers
contribute the same amount of current to the load; the method thus helps the peaking
amplifier synchronize with the main amplifier, so that optimum load modulation is
achieved and both efficiency and linearity are enhanced. Although this method
enhances the performance of the DPA, it requires additional circuitry, which makes
the design more complicated and increases the cost.
This architecture provides a more compact inverted load network than the
conventional DPA, which makes it suitable for portable devices [12]. In the inverted
DPA (IDPA), the inverted transmission line is connected to the auxiliary amplifier's
drain instead of the main amplifier's drain [19]. In the low-power region, the main
amplifier provides maximum efficiency because it works into a 25 Ω load, the
combining node of the DPA being reversed, whereas the traditional DPA uses a
50 Ω λ/4 transmission line for impedance inversion and takes the output from the
combining point. In the IDPA, offset lines are used with the auxiliary amplifier to
provide the off-state condition for the auxiliary amplifier, as shown in Fig. 6. The
IDPA yields enhanced efficiency in the low-power region of operation, but an
important point of interest in this architecture is the correct design of the output
matching networks of both amplifiers.
5 Conclusions
In this paper, a short survey of the DPA has been presented, beginning with the
basic operation of the DPA and followed by the related literature survey. It has been
shown that the DPA is a promising technology for both linearity and efficiency
enhancement in advanced wireless communication. The paper covered the advanced
DPA architectures used for back-off efficiency, linearity, and bandwidth enhancement.
References
1. Mange MG, Braun V, Tullberg H, Zimmermann G, Bulakci Ö (2015) METIS research advances
towards the 5G mobile and wireless system definition. EURASIP J Wireless Commun Netw
1. https://doi.org/10.1186/s13638-015-0302-9
2. Tehrani M, Uysal M, Yanikomeroglu H (2014) Device-to-device communication in 5G cellular
networks: challenges, solutions, and future directions. IEEE Commun Mag 52(5):86–92
3. http://www.3gpp.org/
4. Cripps S (2006) RF power amplifiers for wireless communications. IEEE Microwave Mag
1(1):64–64
5. Agiwal M, Roy A, Saxena N (2016) Next generation 5G wireless networks: a comprehensive
survey. IEEE Commun Surv Tutorials 18(3):1617–1655. https://doi.org/10.1109/comst.2016.
2532458
6. Camarchia V, Pirola M, Quaglia R, Jee S, Cho Y, Kim B (2015) The Doherty power amplifier:
review of recent solutions and trends. IEEE Trans Microw Theor Tech 63(2):559–571
7. Kim B, Kim J, Kim I, Cha J (2006) The Doherty power amplifier. IEEE Microwave Mag
7(5):42–50
8. Singh S, Chawla P (2021) A highly efficient and broadband Doherty power amplifier design for
5G base station. In: Proceedings of international conference on data science and applications:
ICDSA 2021, vol 1. Springer Singapore, Singapore, pp 201–211
9. Moon J, Kim J, Kim I, Kim J, Kim B (2008) Highly efficient three-way saturated Doherty
amplifier with digital feedback predistortion. IEEE Microwave Wirel Compon Lett 18(8):539–
541
10. Barthwal A, Rawat K, Koul SK (2018) A design strategy for bandwidth enhancement in three-
stage Doherty power amplifier with extended dynamic range. IEEE Trans Microw Theor Tech
66(2):1024–1033
11. Golestaneh H, Malekzadeh FA, Boumaiza S (2013) An extended-bandwidth three-way Doherty
power amplifier. IEEE Trans Microw Theor Tech 61(9):3318–3328
12. Kim I, Moon J, Jee S, Kim B (2010) Optimized design of a highly efficient three-stage Doherty
PA using gate adaptation. IEEE Trans Microw Theor Tech 58(10):2562–2574
13. Wu D, Boumaiza S (2012) A modified Doherty configuration for broadband amplification using
symmetrical devices. IEEE Trans Microw Theor Tech 60(10):3201–3213
14. Singh S, Chawla P (2021) Design and performance analysis of broadband Doherty power
amplifier for 5G communication system. In: 2021 2nd international conference for emerging
technology (INCET). IEEE, pp 1–4
15. Park Y, Lee J, Jee S, Kim S, Kim B (2015) Gate bias adaptation of Doherty power amplifier
for high efficiency and high power. IEEE Microwave Wirel Compon Lett 25(2):136–138
16. Choi J, Kang D, Kim D, Kim B (2009) Optimized envelope tracking operation of Doherty power
amplifier for high efficiency over an extended dynamic range. IEEE Trans Microw Theor Tech
57(6):1508–1515
17. Yang Y, Cha J, Shin B, Kim B (2003) A microwave Doherty amplifier employing envelope
tracking technique for high efficiency and linearity. IEEE Microwave Wirel Compon Lett
13(9):370–372
18. Zhang Z, Xin Z (2014) A LTE Doherty power amplifier using envelope tracking technique.
In: Proceedings of the 15th international conference on electronic packaging technology, pp
1331–1334
19. Zhou H, Wu H (2013) Design of an S-band two-way inverted asymmetrical Doherty power
amplifier for long term evolution applications. Progr Electromagnet Res Lett 39:73–80
20. Ahn G et al (2007) Design of a high-efficiency and high-power inverted Doherty amplifier. IEEE
Trans Microwave Theor Tech 55(6):1105–1111. https://doi.org/10.1109/tmtt.2007.896807
21. Darraji R, Ghannouchi F (2011) Digital Doherty amplifier with enhanced efficiency and
extended range. IEEE Trans Microw Theor Tech 59(11):2898–2909
22. Zhao J, Wolf R, Ellinger F (2013) Fully integrated LTE Doherty power amplifier. In:
Proceedings of the international semiconductor conference Dresden-Grenoble (ISCDG), pp
1–3
23. Darraji R, Ghannouchi F, Hammi O (2011) A dual-input digitally driven Doherty amplifier
architecture for performance enhancement of Doherty transmitters. IEEE Trans Microw Theor
Tech 59(5):1284–1293
24. Gustafsson D, Andersson C, Fager C (2013) A Modified Doherty power amplifier with extended
bandwidth and reconfigurable efficiency. IEEE Trans Microw Theor Tech 61(1):533–542
Author Index
B
Badrinath Karthikeyan, 407

F
Fezaa, Laith H. A., 507, 517
© The Editor(s) (if applicable) and The Author(s), under exclusive license 545
to Springer Nature Singapore Pte Ltd. 2024
S. Tiwari et al. (eds.), Advances in Data and Information Sciences,
Lecture Notes in Networks and Systems 796,
https://doi.org/10.1007/978-981-99-6906-7
G
Gagan Sharma, 25
Ganesh Gupta, 507, 533
Gaurav Parashar, 265
Gautham, S., 359
Gayathri, T. R., 469

H
Harirajkumar, J., 445, 493
Harsha Vardhan Puvvadi, 243
Harsh Jindal, 191
Harsh Sharma, 191
Hasitha, V., 469
Himanshu Aggarwal, 203
Hussain, Ahmed, 507

I
Itisha Kumari, 215

J
Jasgurpreet Singh Chohan, 533
Jayanta Pal, 143
Jyostna Devi Bodapati, 1

K
Kavitha Datchanamoorthy, 469
Kayalvizhi, S., 407, 481
King Udayaraj, M. R., 371
Kishore Kanna, R., 337
Kishore, M. A., 347
Kriti, 95

L
Lokesh Rai, 49
Lokesh, S., 347

M
Manish Paliwal, 299
Manni Kumar, 287
Manu Gupta, 155
Manu Midha, 179
Megha Sharma, 191
Mohan Lal Kolhe, 309
Monika, 127
Mriga Jain, 309
Mukul Aggarwal, 13
Munesh Chandra, 321
Muthukumara Vadivel, B., 383

N
Naga Sridevi Piratla, 155
Nandhini, S., 395
Naveen Chauhan, 203
Neha Yadav, 13
Nikhil Kashyap, 115
Nikhil Rajora, 115
Nirmalya Kar, 143

P
Padmavathi, B., 469
Pankaj Kumar, 167
Pankaj Singh Kholiya, 95
Parva Barot, 299
Pooja, K., 337
Preethesh H. Acharya, 457

R
Ragavendra, K., 421
Raghav Shah, 287
RajaSekhar Konda, 1
Rajesh Singh, 507
Ravindra Kumar Purwar, 85
Rishikesh Unawane, 433
Rohan Godha, 179, 191
Roopal Tatiwar, 433
Rupinderjit Kaur, 517

S
Sadhana, G., 421
Sahul Kumar Parida, 179
Saiganesh, V., 347
Sai Teja Annam, 1
Samarth Shetty, 457
Sandeep Singh, 533
Santhosh, M., 493
Saransh Khulbe, 61
Sarmistha Podder, 143
Sasirekha, N., 445, 493
Saswat Kumar Panda, 179
Sathish Kabekody, 457
Saumyamani Bhardwaz, 179
Shailendra Pratap Singh, 61
Shanu Sharma, 115
Sharan Venkatesh, V., 371
Shaurya Kanwar, 167
Sheetal Phatangare, 433
Shivakumar, R., 445
Shivani Pothirajan, 481
Shiv Prakash, 255
Shubhankar Munshi, 433
Shyamala L, 243
Soma Prathibha, 347
Sonam Saluja, 321
Souhardha S. Poojary, 457
Sripath Kumar Chakrapani, 155
Sudhansh Sharma, 227
Sukhpreet Singh, 533
Sunil Prajapat, 167
Swati Sharma, 13, 105
Swetha, S., 445

T
Taran Singh Bharati, 255
Tarunaa, R. K., 481

V
Vashist, P. C., 33
Vengadeswaran, 49
Vijay Verma, 215
Vinutha, S., 383
Vishnu, E., 371
Vishnudev, A., 359
Vivek Karthick, P., 493
Vivek Tomar, 13

Y
Yeshwanth Pasem, 155